
10 Techniques to Solve Imbalanced Classes in Machine Learning (Updated 2024)

guest_blog
14 Jun, 2024 · 10 min read

Introduction
While working as a data scientist, some of the most
frequently occurring problem statements are related to
binary classification, and a common complication when
solving them is class imbalance. A class imbalance exists
when the number of observations in one class is much
higher than in the other classes. For example, consider
detecting fraudulent credit card transactions: in the
dataset used in this article, there are around 400
fraudulent transactions compared to around 90,000
non-fraudulent ones.

Class imbalance is a common problem in machine learning,
especially in classification problems. Imbalanced data can
seriously hamper our model's accuracy. It appears in many
domains, including fraud detection, spam filtering, disease
screening, SaaS subscription churn, and advertising
click-throughs. Let's understand how to deal with
imbalanced data in machine learning.
Learning Objectives

Get familiar with class imbalance in ML through the coding
tutorials in this article.
Understand various techniques for handling imbalanced
data, such as random under-sampling, random over-sampling,
and NearMiss.


The Problem With Class Imbalance in Machine Learning

Most machine learning algorithms work best when the
number of samples in each class is about equal. This is
because most algorithms are designed to maximize
accuracy and reduce errors.
However, if the dataset has imbalanced classes, you can
get a pretty high accuracy just by predicting the majority
class, but you fail to capture the minority class, which
is most often the point of creating the model in the first
place. For example, if the class distribution shows that
99% of the data belongs to the majority class, then a
basic classification model like logistic regression or a
decision tree will not be able to identify the
minority-class data points.
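
To make the baseline effect concrete, here is a minimal sketch using scikit-learn's DummyClassifier, which always predicts the most frequent class (the variables x_train, y_train, x_test, and y_test are assumed to come from a train/test split like the one used later in this article):

# a minimal sketch: a 'model' that always predicts the majority class
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(x_train, y_train)
print('Baseline accuracy:', accuracy_score(y_test, dummy.predict(x_test)))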
Credit Card Fraud Detection Example
Let's say we have a credit card transactions dataset in
which we have to find out whether each transaction was
fraudulent or not.


But here's the catch… fraudulent transactions are
relatively rare: only 6% of the transactions are
fraudulent.
Now, before you even start, do you see how the problem
might break? Imagine if you didn't bother training a model
at all and instead just wrote a single function that
always predicts 'no fraudulent transaction':

def transaction(transaction_data):
    return 'No fraudulent transaction'

Well, guess what? Your "solution" would have 94% accuracy!
Unfortunately, that accuracy is misleading:

For all the non-fraudulent transactions, you'd have
100% accuracy.
For the fraudulent transactions, you'd have 0% accuracy.
Your overall accuracy would be high simply because
most of the transactions are not fraudulent (not
because your model is any good).
This is clearly a problem because many machine learning
algorithms are designed to maximize overall accuracy. In
this article, we will see different techniques to handle
imbalanced data.
Sample Dataset
We will use a credit card fraud detection dataset for this
article. You can find the dataset here.
After loading the data, display the first five rows of the
data set.
Python Code:
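
A minimal sketch of this step, assuming the dataset is saved as creditcard.csv with a Class column (0 = non-fraudulent, 1 = fraudulent):

# a minimal sketch, assuming the file name and column layout described above
import pandas as pd

data = pd.read_csv('creditcard.csv')

# display the first five rows and the class distribution
print(data.head())
print(data['Class'].value_counts())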


You can clearly see that there is a huge difference
between the two classes: around 90,000 non-fraudulent
transactions versus 492 fraudulent ones.
The Metric Trap
One of the major issues that new developers fall into
when dealing with unbalanced datasets relates to the
metrics used to evaluate their machine learning model.
Using a simple metric like accuracy score can be
misleading: in a dataset with highly unbalanced classes, a
classifier that always "predicts" the most common class
without performing any analysis of the features will still
have a high accuracy rate, which is obviously not a
meaningful one.
Let's do this experiment using a simple XGBClassifier
and no feature engineering:

# import libraries
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# fit the model
xgb_model = XGBClassifier().fit(x_train, y_train)

# predict
xgb_y_predict = xgb_model.predict(x_test)

# accuracy score
xgb_score = accuracy_score(y_test, xgb_y_predict)

print('Accuracy score is:', xgb_score)

OUTPUT
Accuracy score is: 0.992

We get about 99% accuracy, but only because the model is
mostly predicting the majority class, 0 (non-fraudulent).
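
You can confirm this by looking at the distribution of the predictions themselves. A quick check (a sketch, reusing the variables from the snippet above) shows that almost every prediction is 0:

# count how often each class is predicted
from collections import Counter

print('Prediction distribution:', Counter(xgb_y_predict))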
Resampling Techniques to Solve Class Imbalance
One of the widely adopted class imbalance techniques for
dealing with highly unbalanced datasets is called
resampling. It consists of removing samples from the
majority class (under-sampling) and/or adding more
examples from the minority class (over-sampling).


Despite the advantage of balancing classes, these
techniques also have their weaknesses (there is no free
lunch).

The simplest implementation of over-sampling is to
duplicate random records from the minority class, which
can cause overfitting.

The simplest implementation of under-sampling is to remove
random records from the majority class, which can cause a
loss of information.
Let's implement this with the credit card fraud detection
example. We will start by separating class 0 and class 1
and counting the records in each.

# class count
class_count_0, class_count_1 = data['Class'].value_counts()

# separate the classes
class_0 = data[data['Class'] == 0]
class_1 = data[data['Class'] == 1]

# print the shape of each class
print('class 0:', class_0.shape)
print('class 1:', class_1.shape)

1. Random Under-Sampling
Under-sampling can be defined as removing some
observations of the majority class until the majority and
minority classes are balanced. Under-sampling can be a
good choice when you have a ton of data (think millions of
rows), but a drawback is that we may be removing
information that is valuable.


# randomly sample the majority class down to the minority class count
class_0_under = class_0.sample(class_count_1)

test_under = pd.concat([class_0_under, class_1], axis=0)

print('total class of 1 and 0:', test_under['Class'].value_counts())

# plot the new class distribution
test_under['Class'].value_counts().plot(kind='bar', title='count (target)')

2. Random Over-Sampling

Over-sampling can be defined as adding more copies of the
minority class. Over-sampling in machine learning can be a
good choice when you don't have a ton of data to work
with.

A con to consider with over-sampling is that it can cause
overfitting and poor generalization to your test set.

class_1_over = class_1.sample(class_count_0, replace=True)

test_over = pd.concat([class_1_over, class_0], axis=0)

print('total class of 1 and 0:', test_over['Class'].value_counts())

# plot the new class distribution
test_over['Class'].value_counts().plot(kind='bar', title='count (target)')

How to Balance Data With the Imbalanced-Learn Python Module?

A number of more sophisticated resampling techniques
have been proposed in the scientific literature.
For example, we can cluster the records of the majority
class and do the under-sampling by removing records
from each cluster, thus seeking to preserve information. In
over-sampling, instead of creating exact copies of the
minority class records, we can introduce small variations
into those copies, creating more diverse synthetic
samples.
Let’s apply some of these resampling techniques using
the Python library imbalanced-learn. It is compatible with
scikit-learn and is part of scikit-learn-contrib projects.
import imblearn
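
As one concrete example of the cluster-based idea described above, imbalanced-learn provides ClusterCentroids, which under-samples the majority class by replacing it with the centroids of k-means clusters. A minimal sketch, assuming x and y are the features and target from our dataset:

# a sketch of cluster-based under-sampling
from collections import Counter
from imblearn.under_sampling import ClusterCentroids

cc = ClusterCentroids(random_state=42)
x_cc, y_cc = cc.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_cc))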

3. Random Under-Sampling With Imblearn


You may have heard about pandas, numpy, matplotlib, etc.
while learning data science. But there is another library,
imblearn, built for resampling imbalanced datasets to
improve your model's performance.

RandomUnderSampler is a fast and easy way to balance the
data by randomly selecting a subset of data for the
targeted classes. It under-samples the majority class(es)
by randomly picking samples, with or without replacement.
# import libraries
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=42, replacement=True)

# fit predictor and target variable
x_rus, y_rus = rus.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_rus))

4. Random Over-Sampling With imblearn


One way to fight imbalanced data is to generate new
samples in the minority classes. The most naive strategy
is to generate new samples by randomly sampling, with
replacement, from the currently available samples.
The RandomOverSampler offers such a scheme.
# import library
from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler(random_state=42)

# fit predictor and target variable
x_ros, y_ros = ros.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_ros))

5. Under-Sampling: Tomek Links

Tomek links are pairs of very close instances of opposite
classes. Removing the majority-class instance of each pair
increases the space between the two classes, facilitating
the classification process. A Tomek link exists if the two
samples are each other's nearest neighbors.

In the code below, we'll use sampling_strategy='majority'
to resample only the majority class:

# import library
from imblearn.under_sampling import TomekLinks

tl = TomekLinks(sampling_strategy='majority')

# fit predictor and target variable
x_tl, y_tl = tl.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_tl))

6. Synthetic Minority Oversampling Technique (SMOTE)
This technique generates synthetic data for the minority
class. SMOTE (Synthetic Minority Oversampling Technique)
works by randomly picking a point from the minority class
and computing the k-nearest neighbors for this point.
Synthetic points are added between the chosen point and
its neighbors.

The SMOTE algorithm works in 4 simple steps:

1. Choose a minority class sample as the input vector.
2. Find its k nearest neighbors (k_neighbors is specified
as an argument in the SMOTE() function).
3. Choose one of these neighbors and place a synthetic
point anywhere on the line joining the point under
consideration and its chosen neighbor.
4. Repeat the steps until the data is balanced.
# import library
from imblearn.over_sampling import SMOTE

smote = SMOTE()

# fit predictor and target variable
x_smote, y_smote = smote.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_smote))
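
One practical caution worth adding: apply SMOTE only to the training split, so synthetic points never leak into the test set. A minimal sketch:

# a sketch: split first, then oversample the training data only
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)
x_train_sm, y_train_sm = SMOTE(random_state=42).fit_resample(x_train, y_train)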

7. NearMiss

NearMiss is an under-sampling technique. Instead of
resampling the minority class, it uses a distance
criterion to select majority-class samples so that the
majority class becomes equal in size to the minority
class.


from imblearn.under_sampling import NearMiss

nm = NearMiss()

x_nm, y_nm = nm.fit_resample(x, y)

print('Original dataset shape:', Counter(y))
print('Resample dataset shape:', Counter(y_nm))
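
As a side note, imbalanced-learn implements three NearMiss variants, selectable through the version parameter; for example, NearMiss-1 keeps the majority samples whose average distance to the closest minority samples is smallest:

# a sketch: explicitly selecting the NearMiss-1 variant
nm1 = NearMiss(version=1)
x_nm1, y_nm1 = nm1.fit_resample(x, y)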

8. Change the Performance Metric

Accuracy is not the best metric to use when evaluating
imbalanced datasets, as it can be misleading. Metrics that
can provide better insight are:

Confusion Matrix: a table showing correct predictions
and the types of incorrect predictions.
Precision: the number of true positives divided by all
positive predictions. Precision is also called Positive
Predictive Value. It is a measure of a classifier's
exactness; low precision indicates a high number of
false positives.
Recall: the number of true positives divided by the
number of positive values in the test data. Recall is
also called Sensitivity or the True Positive Rate. It is a
measure of a classifier's completeness; low recall
indicates a high number of false negatives.
F1 Score: the harmonic mean of precision and recall.
Area Under the ROC Curve (AUROC): AUROC represents
the likelihood that your model can distinguish
observations from the two classes. In other words, if you
randomly select one observation from each class, what's
the probability that your model will be able to "rank"
them correctly?
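
A short sketch of computing these metrics with scikit-learn, assuming y_test and a model's predictions y_pred as in the earlier snippets:

# better metrics for imbalanced data (assumes y_test and y_pred exist)
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score, roc_auc_score)

print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1:', f1_score(y_test, y_pred))
print('ROC AUC:', roc_auc_score(y_test, y_pred))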
9. Penalize Algorithms (Cost-Sensitive Training)
The next tactic is to use penalized learning algorithms
that increase the cost of classification mistakes on the
minority class.
A popular algorithm for this technique is Penalized-SVM.
During training, we can use the
argument class_weight=’balanced’ to penalize mistakes
on the minority class by an amount proportional to how
under-represented it is.
We also want to include the argument probability=True if
we want to enable probability estimates for SVM
algorithms.
Let’s train a model using Penalized-SVM on the original
imbalanced dataset:
# load libraries
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score

# class_weight='balanced' penalizes mistakes on the minority class
svc_model = SVC(class_weight='balanced', probability=True)

svc_model.fit(x_train, y_train)

svc_predict = svc_model.predict(x_test)

# check performance
print('ROCAUC score:', roc_auc_score(y_test, svc_predict))
print('Accuracy score:', accuracy_score(y_test, svc_predict))
print('F1 score:', f1_score(y_test, svc_predict))
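
Because probability=True was set, you can also score ROC AUC on the predicted probability of the positive class rather than on hard labels, which is usually more informative (an added suggestion, not part of the original snippet):

# ROC AUC computed on the positive-class probability
svc_proba = svc_model.predict_proba(x_test)[:, 1]
print('ROCAUC score (probabilities):', roc_auc_score(y_test, svc_proba))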

10. Change the Algorithm


While it's a good rule of thumb to try a variety of
algorithms in every machine learning problem, doing so can
be especially beneficial with imbalanced datasets.
Decision trees frequently perform well on imbalanced data,
and in modern machine learning, tree ensembles (Random
Forests, Gradient Boosted Trees, etc.) almost always
outperform single decision trees, so we'll jump right into
those.

Tree-based algorithms work by learning a hierarchy of
if/else questions, which can force both classes to be
addressed.
# load library
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()

# fit the predictor and target
rfc.fit(x_train, y_train)

# predict
rfc_predict = rfc.predict(x_test)

# check performance
print('ROCAUC score:', roc_auc_score(y_test, rfc_predict))
print('Accuracy score:', accuracy_score(y_test, rfc_predict))
print('F1 score:', f1_score(y_test, rfc_predict))
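
Random forests can also be combined with the cost-sensitive idea from technique 9: scikit-learn's RandomForestClassifier accepts the same class_weight argument (a hedged sketch, not part of the original code):

# cost-sensitive random forest, combining techniques 9 and 10
rfc_balanced = RandomForestClassifier(class_weight='balanced', random_state=42)
rfc_balanced.fit(x_train, y_train)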

Advantages and Disadvantages of Under-Sampling

Advantage:
It can help improve run time and storage problems by
reducing the number of training samples when the
training data set is huge.

Disadvantages:
It can discard potentially useful information that
could be important for building rule classifiers.
The sample chosen by random under-sampling may be
biased, so it will not be an accurate representation of
the population, resulting in inaccurate results on the
actual test data set.

Advantages and Disadvantages of Over-Sampling

Advantages:
Unlike under-sampling, this method leads to no
information loss.
It often outperforms under-sampling.

Disadvantages:
It increases the likelihood of overfitting since it
replicates the minority class events.
Conclusion

To summarize, in this article we have seen various
techniques to handle class imbalance in a dataset. There
are many methods to try when dealing with imbalanced data;
you can check the implementation of these codes in my
GitHub repository here.

Key Takeaways

In this article, we learned about the different
techniques we can use to handle class imbalance in
machine learning.
Some of the most widely used techniques are SMOTE and
imblearn's over-sampling and under-sampling methods.
There is no "best" method for handling imbalance; it
depends on your use case.

Frequently Asked Questions

Q1. What are class imbalances?
A. Class imbalances in ML happen when the categories in
your dataset are not evenly represented. For example, in a
medical dataset, you might have many more healthy patients
than sick ones. This can make it hard for a model to learn
to recognize the less common category (the sick patients
in this case).


Responses From Readers

Shashi
05 Dec, 2021

What is x and y in the following code?

# import library
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
# fit predictor and target variable
x_ros, y_ros = ros.fit_resample(x, y)
print('Original dataset shape', Counter(y))
print('Resample dataset shape', Counter(y_ros))

Akanksha
11 Jun, 2022

I guess x holds the feature values for the records and y
holds the labels (target) for the records.

