Fix confusion matrix using only predictions as source for labels #249

Open
wants to merge 24 commits into
base: master
from

Conversation

levkk
Contributor

@levkk levkk commented Oct 17, 2022

Fix the confusion matrix incorrectly taking its labels from the predictions only, instead of from both the predictions and the ground truth. Ideally we should expose a scikit-learn-like API that accepts the full list of labels, in case the labels in the test set are not all-inclusive (which would be a mistake in train/test partitioning, but can happen).
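
To make the problem concrete, here is a minimal standalone sketch (standard library only, not the code in this PR) of a confusion matrix whose axes come from the union of both label sources. The free function `confusion_matrix`, the `u32` label type, and the row = ground truth / column = prediction layout are assumptions for illustration only.

```rust
use std::collections::BTreeSet;

/// Builds a confusion matrix whose axes cover every label seen in either
/// slice. Rows index the ground truth, columns index the prediction.
/// Hypothetical free function for illustration; not linfa's API.
fn confusion_matrix(prediction: &[u32], ground_truth: &[u32]) -> (Vec<u32>, Vec<Vec<usize>>) {
    // Union of labels from both sources; BTreeSet gives a sorted, stable order.
    let labels: Vec<u32> = prediction
        .iter()
        .chain(ground_truth.iter())
        .copied()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();

    // Position of a label on the matrix axes (labels is sorted, so binary search works).
    let index = |label: u32| labels.binary_search(&label).expect("label is in the union");

    let mut counts = vec![vec![0usize; labels.len()]; labels.len()];
    for (&p, &t) in prediction.iter().zip(ground_truth) {
        counts[index(t)][index(p)] += 1;
    }
    (labels, counts)
}
```

With predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 2, 1]`, label `2` never appears in the predictions, so axes built from the predictions alone would have no row for it. The union keeps it.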

I'm somewhat confused by the way the API is written because the argument for the confusion_matrix method is called ground_truth, but shouldn't it be the predicted points instead?

@codecov-commenter

Codecov Report

Base: 39.24% // Head: 39.26% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (3356d42) compared to base (5ebe23c).
Patch coverage: 60.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage   39.24%   39.26%   +0.02%     
==========================================
  Files          92       92              
  Lines        6085     6089       +4     
==========================================
+ Hits         2388     2391       +3     
- Misses       3697     3698       +1     
| Impacted Files | Coverage Δ |
|---|---|
| src/dataset/mod.rs | 29.03% <50.00%> (-0.60%) ⬇️ |
| src/metrics_classification.rs | 38.36% <100.00%> (-0.63%) ⬇️ |
| algorithms/linfa-nn/src/linear.rs | 45.16% <0.00%> (-1.72%) ⬇️ |
| src/correlation.rs | 29.57% <0.00%> (-1.41%) ⬇️ |
| algorithms/linfa-svm/src/classification.rs | 46.49% <0.00%> (-0.88%) ⬇️ |
| ...rithms/linfa-trees/src/decision_trees/algorithm.rs | 36.60% <0.00%> (-0.45%) ⬇️ |
| algorithms/linfa-nn/tests/nn.rs | 78.04% <0.00%> (ø) |
| algorithms/linfa-linear/src/glm/mod.rs | 52.77% <0.00%> (ø) |

... and 3 more

@YuhanLiin
Collaborator

The argument is ground_truth because self is the predicted points. The point about using labels from both sources still stands though.

@@ -323,6 +323,18 @@ pub trait Labels {
fn labels(&self) -> Vec<Self::Elem> {
self.label_set().into_iter().flatten().collect()
Collaborator

@YuhanLiin YuhanLiin Oct 19, 2022

For some reason this method doesn't dedup the final vector. It should do something like union all the HashSets together. Or we can just change the return type to HashSet, but that might be too invasive.
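
A rough sketch of the union approach, assuming `label_set` yields one `HashSet` per target column (inferred from the `flatten()` call above, not confirmed here). The helper name `dedup_labels` is made up for illustration.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Unions per-target label sets into one Vec with no repeated labels.
/// Hypothetical helper; the order of the returned labels is unspecified.
fn dedup_labels<T: Eq + Hash + Clone>(label_sets: &[HashSet<T>]) -> Vec<T> {
    let mut union: HashSet<T> = HashSet::new();
    for set in label_sets {
        // Inserting into a HashSet silently drops labels already present.
        union.extend(set.iter().cloned());
    }
    union.into_iter().collect()
}
```

A plain `flatten()` over the sets `{1, 2}` and `{2, 3}` yields four elements, with `2` listed twice. The union yields three.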

@@ -323,6 +323,18 @@ pub trait Labels {
fn labels(&self) -> Vec<Self::Elem> {
self.label_set().into_iter().flatten().collect()
}

fn combined_labels(&self, other: Vec<Self::Elem>) -> Vec<Self::Elem> {
Collaborator

Better to have this method take &impl Labels or &Self as input. Then you can call label_set on both self and the input and union all the HashSets before converting the result into a Vec.
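
Something along these lines, as a sketch against a cut-down stand-in for the trait rather than the real `Labels` definition. The `Vec<HashSet<_>>` return shape of `label_set`, the trait bounds on `Elem`, and the `Targets` type are assumptions.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Simplified stand-in for the `Labels` trait (hypothetical; the real trait differs).
trait Labels {
    type Elem: Eq + Hash + Clone;

    /// One set of labels per target column (assumed return shape).
    fn label_set(&self) -> Vec<HashSet<Self::Elem>>;

    /// Labels present in `self` or in `other`, each listed once.
    fn combined_labels(&self, other: &impl Labels<Elem = Self::Elem>) -> Vec<Self::Elem> {
        self.label_set()
            .into_iter()
            .chain(other.label_set())
            .flatten()
            // Collecting into a HashSet performs the union and the dedup.
            .collect::<HashSet<_>>()
            .into_iter()
            .collect()
    }
}

/// Toy implementor: a single column of class labels.
struct Targets(Vec<u8>);

impl Labels for Targets {
    type Elem = u8;

    fn label_set(&self) -> Vec<HashSet<u8>> {
        vec![self.0.iter().copied().collect()]
    }
}
```

Taking `&impl Labels` instead of a `Vec` means the caller does not have to collect labels first, and both sides go through the same `label_set` path.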


5 participants