Slides Imbalanced Learning Intro
Learning goals
… set is
Understand disadvantage of …
[Figure: scatter plot of positives and negatives in the (x1, x2) plane]
IMBALANCED DATA SETS
Class imbalance: The class frequencies differ substantially.
Consequence: Undesirable predictive behavior for the smaller class.
Example: Sampling from two Gaussian distributions
[Figure: four panels showing Accuracy, TPR, F1 Score, and PPV per learner (Classification Tree, Logistic Regression, SVM) against the Positive/Negative Ratio (10000/10000, 1000/10000, 100/10000, 50/10000)]
In each scenario, the negative class has 10,000 observations. The number of observations
in the positive class varies between 10,000, 1,000, 100, and 50. Classifiers are trained with
10-fold stratified cross-validation and evaluated on the predictions aggregated over the test folds.
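
The setup above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the code behind the slides: the Gaussian means and covariances, the random seed, the learner hyperparameters (library defaults), and the helper names `sample_two_gaussians`, `evaluate`, and `run_experiment` are all hypothetical. Predictions are pooled over the ten test folds with `cross_val_predict`, and the four metrics are computed once on the pooled predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def sample_two_gaussians(n_pos, n_neg, rng):
    """Draw negatives (label 0) and positives (label 1) from two 2-D Gaussians."""
    # Assumption: means and identity covariances are illustrative only;
    # the slides do not state the distribution parameters.
    x_neg = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=n_neg)
    x_pos = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=n_pos)
    X = np.vstack([x_neg, x_pos])
    y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
    return X, y


def evaluate(learner, X, y, seed=0):
    """10-fold stratified CV; metrics are computed on the pooled test-fold predictions."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(learner, X, y, cv=cv)
    return {
        "accuracy": accuracy_score(y, y_pred),
        "tpr": recall_score(y, y_pred, zero_division=0),     # true positive rate
        "ppv": precision_score(y, y_pred, zero_division=0),  # positive predictive value
        "f1": f1_score(y, y_pred, zero_division=0),
    }


def run_experiment(n_neg=10_000, pos_counts=(10_000, 1_000, 100, 50), seed=0):
    """Return {(n_pos, learner_name): metrics} for each imbalance scenario."""
    rng = np.random.default_rng(seed)
    results = {}
    for n_pos in pos_counts:
        X, y = sample_two_gaussians(n_pos, n_neg, rng)
        # Assumption: default hyperparameters for all three learners.
        learners = {
            "Classification Tree": DecisionTreeClassifier(random_state=seed),
            "Logistic Regression": LogisticRegression(),
            "SVM": SVC(),
        }
        for name, learner in learners.items():
            results[(n_pos, name)] = evaluate(learner, X, y, seed)
    return results
```

Calling `run_experiment()` with its defaults mirrors the four scenarios described above; passing a smaller `n_neg` and smaller `pos_counts` gives a faster run for experimentation. The exact numbers depend on the assumed distribution parameters and will not match the figure.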