10 Techniques To Solve Imbalanced Classes in ML
10 Techniques to Solve Imbalanced Classes in Machine Learning (Updated 2024)
guest_blog
14 Jun, 2024 | 10 min read
Introduction
When you work as a data scientist, many of the problem statements you encounter most frequently are binary classification problems. A common difficulty in solving them is class imbalance, which exists when one class has many more observations than the other(s). An example is detecting fraudulent credit card transactions: as shown in the graph below, there are around 400 fraudulent transactions compared with around 90,000 non-fraudulent ones.
https://www.analyticsvidhya.com/blog/2020/07/10-techniques-to-deal-with-class-imbalance-in-machine-learning/ 2/16
1/7/24, 9:32 p.m. Class Imbalance in ML: 10 Best Ways to Solve it Using Python
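from sklearn.metrics import accuracy_score
# predict on the test set (xgb_model, x_test and y_test come from the earlier steps)
xgb_y_predict = xgb_model.predict(x_test)
# accuracy score: true labels first, then predictions
xgb_score = accuracy_score(y_test, xgb_y_predict)
print('Accuracy score:', xgb_score)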
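On data this skewed, accuracy alone can look excellent even for a useless model. The minimal sketch below assumes hypothetical labels that mirror the counts quoted in the introduction (about 90,000 legitimate and 400 fraudulent transactions, not the real dataset) and scores a "model" that always predicts the majority class:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical labels mirroring the introduction's counts:
# 90,000 legitimate (0) and 400 fraudulent (1) transactions.
y_true = np.array([0] * 90000 + [1] * 400)

# A trivial "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)               # share of frauds actually caught
f1 = f1_score(y_true, y_pred, zero_division=0)   # precision is undefined here, so treat it as 0

print(f"accuracy={acc:.4f} recall={rec:.1f} f1={f1:.1f}")
```

Accuracy comes out at roughly 0.9956 while recall and F1 for the fraud class are 0, which is why metrics such as F1 and ROC AUC are worth reporting alongside accuracy.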
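# Separate the majority (Class 0) and minority (Class 1) rows
class_0 = data[data['Class'] == 0]
class_1 = data[data['Class'] == 1]
# number of rows in each class, used by the resampling steps below
class_count_0, class_count_1 = len(class_0), len(class_1)
# print the shape of each class
print('class 0:', class_0.shape)
print('class 1:', class_1.shape)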
1. Random Under-Sampling
Undersampling means removing observations from the majority class until the majority and minority classes are balanced. It can be a good choice when you have a lot of data (think millions of rows). The drawback is that we discard information that may be valuable.
# randomly keep (without replacement) as many Class 0 rows as there are Class 1 rows
class_0_under = class_0.sample(class_count_1)
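The under-sampling call relies on data loaded elsewhere in the article. As a self-contained sketch, assume a tiny toy DataFrame with an illustrative `Amount` column and 95 rows of Class 0 versus 5 rows of Class 1; the sampled majority rows are then recombined with the minority rows:

```python
import pandas as pd

# Toy stand-in for the real data (an assumption): 95 rows of Class 0, 5 of Class 1.
data = pd.DataFrame({"Amount": range(100), "Class": [0] * 95 + [1] * 5})
class_0 = data[data["Class"] == 0]
class_1 = data[data["Class"] == 1]

# Random under-sampling: draw as many majority rows as there are minority
# rows (no replacement), then stack both parts into one balanced frame.
class_0_under = class_0.sample(len(class_1), random_state=42)
data_under = pd.concat([class_0_under, class_1], axis=0)

print(data_under["Class"].value_counts())
```

Only 10 of the original 100 rows survive, which illustrates how much information under-sampling can throw away on small datasets.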
2. Random Over-Sampling
Oversampling means adding more copies of minority-class observations. It can be a good choice when you don't have a lot of data to work with.
A drawback of oversampling is that it can cause overfitting and poor generalization to your test set, because the model sees the same minority examples many times.
# draw Class 1 rows with replacement until they match the Class 0 count
class_1_over = class_1.sample(class_count_0, replace=True)
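A self-contained sketch of the same idea, again assuming a toy DataFrame (95 rows of Class 0, 5 of Class 1, with an illustrative `Amount` column) in place of the real data. It also counts how many distinct minority rows survive, to make the duplication behind the overfitting risk visible:

```python
import pandas as pd

# Toy stand-in for the real data (an assumption): 95 rows of Class 0, 5 of Class 1.
data = pd.DataFrame({"Amount": range(100), "Class": [0] * 95 + [1] * 5})
class_0 = data[data["Class"] == 0]
class_1 = data[data["Class"] == 1]

# Random over-sampling: draw minority rows WITH replacement until there
# are as many of them as majority rows, then combine.
class_1_over = class_1.sample(len(class_0), replace=True, random_state=42)
data_over = pd.concat([class_0, class_1_over], axis=0)

print(data_over["Class"].value_counts())
# Only 5 distinct minority rows exist, so most of the 95 drawn rows are repeats.
print("distinct minority rows:", class_1_over.index.nunique())
```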
How to Balance Data With the Imbalanced-Learn Python Module
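from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import TomekLinks
ros = RandomOverSampler(random_state=42)  # duplicate minority rows at random
tl = TomekLinks(sampling_strategy='majority')  # drop majority samples that form Tomek links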
6. Synthetic Minority Oversampling Technique (SMOTE)
This technique generates synthetic data for the minority class.
SMOTE works by randomly picking a point from the minority class and finding its k nearest neighbors within the minority class. Each synthetic point is placed at a random position on the line segment between the chosen point and one of those neighbors.
from imblearn.over_sampling import SMOTE
smote = SMOTE()
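The interpolation step described above can be sketched from scratch with NumPy. This is an illustrative re-implementation of the idea, not imbalanced-learn's code; the function name `smote_sketch`, its parameters and the toy points are all hypothetical. In practice you would call `SMOTE().fit_resample(X, y)` instead.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical minimal SMOTE-style oversampler (illustration only).

    X_min : (n, d) array of minority-class samples, with n > k
    n_new : number of synthetic samples to create
    k     : number of nearest minority neighbours to choose from
    """
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances between minority samples
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))             # random minority point
        b = neighbours[a, rng.integers(k)]       # one of its k nearest neighbours
        gap = rng.random()                       # random position in [0, 1) along the segment
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sketch(X_min, n_new=6, k=3)
print(new_points.shape)  # (6, 2)
```

Because every new point lies between two existing minority points, the synthetic samples stay inside the region the minority class already occupies rather than being exact copies.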
7. NearMiss
NearMiss is an under-sampling technique. Instead of resampling the minority class, it uses the distances between majority and minority samples to decide which majority samples to keep, reducing the majority class to the same size as the minority class.
from imblearn.under_sampling import NearMiss
nm = NearMiss()
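The distance rule can be sketched directly. The function below is a hypothetical illustration of the NearMiss-1 variant (keep the majority samples whose mean distance to their k nearest minority samples is smallest), not imbalanced-learn's implementation; the toy Gaussian data and `k=3` are assumptions chosen for the example.

```python
import numpy as np

def near_miss_1(X, y, majority=0, minority=1, k=3):
    """Hypothetical NearMiss-1 sketch: keep the majority samples closest
    (on average) to their k nearest minority samples, one per minority sample."""
    X_maj, X_min = X[y == majority], X[y == minority]
    # distance from every majority sample to every minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=-1)
    # mean distance to the k closest minority samples
    score = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(score)[: len(X_min)]       # closest majority samples first
    X_res = np.vstack([X_maj[keep], X_min])
    y_res = np.concatenate([np.full(len(keep), majority),
                            np.full(len(X_min), minority)])
    return X_res, y_res

# Toy data (an assumption): 50 majority points around 0, 5 minority points around 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(3, 1, size=(5, 2))])
y = np.array([0] * 50 + [1] * 5)

X_res, y_res = near_miss_1(X, y)
print(np.bincount(y_res))  # [5 5]
```

Unlike random under-sampling, the kept majority points are the ones nearest the minority class, so the classifier still sees the hardest part of the decision boundary.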
Penalized learning algorithms increase the cost of classification mistakes on the minority class.
A popular algorithm for this technique is Penalized-SVM. During training, we can pass the argument class_weight='balanced', which weights each class inversely proportional to its frequency, so mistakes on the minority class are penalized more heavily the more under-represented it is.
We should also pass probability=True if we want to enable probability estimates for the SVM.
Let’s train a model using Penalized-SVM on the original
imbalanced dataset:
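# load library
from sklearn.svm import SVC
# class_weight='balanced' penalizes minority-class mistakes; probability=True enables predict_proba
svc_model = SVC(class_weight='balanced', probability=True)
svc_model.fit(x_train, y_train)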
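To see what `class_weight='balanced'` means numerically, scikit-learn's documented heuristic is `n_samples / (n_classes * np.bincount(y))`. The sketch below assumes hypothetical labels (900 of class 0, 100 of class 1) and compares that formula with `compute_class_weight`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 900 majority (0) and 100 minority (1) samples.
y = np.array([0] * 900 + [1] * 100)
classes = np.unique(y)

# The 'balanced' heuristic: n_samples / (n_classes * count of each class)
manual = len(y) / (len(classes) * np.bincount(y))
sk = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print("manual:", manual)
print("sklearn:", sk)
```

Here class 1 gets a weight of 5.0 and class 0 about 0.56, so each minority mistake costs nine times as much as a majority mistake, exactly the inverse of the 9:1 class ratio.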
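from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score
rfc = RandomForestClassifier()
# fit on the training data
rfc.fit(x_train, y_train)
# predict
rfc_predict = rfc.predict(x_test)
# check performance (ROC AUC uses the predicted probability of class 1)
print('ROCAUC score:', roc_auc_score(y_test, rfc.predict_proba(x_test)[:, 1]))
print('Accuracy score:', accuracy_score(y_test, rfc_predict))
print('F1 score:', f1_score(y_test, rfc_predict))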
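The random-forest snippet depends on `x_train`, `x_test`, `y_train` and `y_test` prepared elsewhere in the article. A self-contained sketch, assuming a synthetic dataset from scikit-learn's `make_classification` (roughly 99% vs 1%) in place of the real data; the split size and random seeds are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (an assumption): about 1% positives.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.99, 0.01],
                           random_state=42)

# stratify=y keeps the same class ratio in the train and test splits
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

rfc = RandomForestClassifier(random_state=42)
rfc.fit(x_train, y_train)
rfc_predict = rfc.predict(x_test)
rfc_proba = rfc.predict_proba(x_test)[:, 1]   # probability of the minority class

auc = roc_auc_score(y_test, rfc_proba)
acc = accuracy_score(y_test, rfc_predict)
f1 = f1_score(y_test, rfc_predict, zero_division=0)
print(f"ROC AUC={auc:.3f}  accuracy={acc:.3f}  F1={f1:.3f}")
```

Stratifying the split matters here: with only about 1% positives, an unstratified split can leave the test set with very few minority examples, making F1 and ROC AUC unstable.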