Slides Imbalanced Learning Intro

The document discusses imbalanced data sets where the ratio of classes is significantly different. This can cause undesirable predictive behavior for the smaller class. Some examples of domains with imbalanced data are medicine, information retrieval, and fraud detection. The document outlines issues with evaluating classifiers on imbalanced data and possible solutions like resampling data, cost-sensitive learning, and ensemble-based approaches.

Advanced Machine Learning

Imbalanced Learning: Introduction

Learning goals

Know what an imbalanced data set is

Understand the disadvantage of accuracy on imbalanced data

Know techniques for handling imbalanced data sets

[Figure: scatter plot of positives and negatives in the x1/x2 plane, sampled from two Gaussian distributions]
IMBALANCED DATA SETS
Class imbalance: Ratio of classes is significantly different.
Consequence: Undesirable predictive behavior for smaller class.
Example: Sampling from two Gaussian distributions
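The example above can be sketched in a few lines of numpy. This is not the slides' code; the class means, spreads, and the 50-vs-1000 imbalance ratio are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minority class ("positives"): few samples from one Gaussian
n_pos, n_neg = 50, 1000
pos = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(n_pos, 2))
# Majority class ("negatives"): many samples from a second Gaussian
neg = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(n_neg, 2))

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

# Fraction of positives shows the imbalance (50 out of 1050 samples)
print(X.shape, y.mean())
```

Plotting `X` colored by `y` reproduces a scatter like the one on this slide, with the positive cluster dwarfed by the negative one.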

© Advanced Machine Learning – 1 / 6


IMBALANCED DATA SETS: EXAMPLES

Domain                  Task                       Majority class     Minority class
Medicine                Predict tumor pathology    Benign             Malignant
Information retrieval   Find relevant items        Irrelevant items   Relevant items
Tracking criminals      Detect fraud emails        Non-fraud emails   Fraud emails
Weather prediction      Predict extreme weather    Normal weather     Tornado, hurricane

Often, the minority class is the more important class.


Imbalanced data can be a source of bias related to the concept of fairness.

© Advanced Machine Learning – 2 / 6


ISSUES WITH EVALUATING CLASSIFIERS
Ideal case: correctly classify as many instances as possible
⇒ High accuracy, preferably 100%.
In practice, on imbalanced data sets we often obtain good
performance on the majority class(es) but poor performance on
the minority class(es).
Reason: the classifier is biased towards the majority class(es), since
predicting the majority class pays off in terms of accuracy.
Focusing only on accuracy can therefore lead to bad performance on
the minority class.
Example:
Assume that only 0.5% of the patients have a disease.
Always predicting “no disease” yields an accuracy of 99.5%.
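The 99.5% figure is easy to verify with a toy calculation (illustrative code, not from the slides):

```python
# 0.5% prevalence: 50 sick patients out of 10,000
n_patients = 10_000
n_sick = 50
y_true = [1] * n_sick + [0] * (n_patients - n_sick)

# Trivial classifier: always predict "no disease" (label 0)
y_pred = [0] * n_patients

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_patients
print(accuracy)  # 0.995
```

The classifier never detects a single sick patient, yet its accuracy of 0.995 looks excellent.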

© Advanced Machine Learning – 3 / 6


ISSUES WITH EVALUATING CLASSIFIERS
[Figure: four panels showing Accuracy, TPR, PPV, and F1 score of a classification tree, logistic regression, and an SVM as the positive/negative ratio decreases from 10000/10000 to 50/10000]

In each scenario, we have 10,000 observations in the negative class; the number of
observations in the positive class varies between 10,000, 1,000, 100, and 50. Classifiers are
trained with 10-fold stratified CV and evaluated via aggregated predictions on the test set.

© Advanced Machine Learning – 4 / 6


POSSIBLE SOLUTIONS
Ideal performance metric: one under which learning is properly biased
towards the minority class(es).
Imbalance-aware performance metrics:
G-score
Balanced accuracy
Matthews Correlation Coefficient
Weighted macro F1 score

© Advanced Machine Learning – 5 / 6


POSSIBLE SOLUTIONS

Approach                  Main idea                                  Remark
Algorithm-level           Bias classifiers towards minority          Special knowledge about classifiers is needed
Data-level                Rebalance classes by resampling            No modification of classifiers is needed
Cost-sensitive learning   Introduce different costs for              Between algorithm- and data-level approaches
                          misclassification when learning
Ensemble-based            Ensemble learning plus one of the          –
                          three techniques above
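The data-level approach is the simplest to sketch. Below is a minimal random oversampling example (illustrative, not the slides' implementation): minority samples are duplicated with replacement until every class matches the majority-class size.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Data-level rebalancing: resample each class up to the majority size."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sample with replacement so small classes reach n_max
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# 1 positive vs 9 negatives
X = np.arange(20).reshape(10, 2)
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
X_bal, y_bal = random_oversample(X, y, rng=0)
print(np.bincount(y_bal))  # [9 9]
```

Because no classifier is modified, any learner can be trained on `X_bal, y_bal` unchanged; the price is duplicated minority points, which can encourage overfitting.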

© Advanced Machine Learning – 6 / 6
