
EXPERIMENT NO.

AIM: To overcome class imbalance in a dataset using the SMOTE technique.

THEORY

Handling Imbalanced Data

Class imbalance occurs when one class significantly outnumbers the other in a dataset.
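To make this concrete, the following minimal Python sketch builds a deliberately imbalanced toy dataset and inspects its class distribution; make_classification is used here only as a stand-in for real data.

```python
from collections import Counter
from sklearn.datasets import make_classification

# Build a deliberately imbalanced two-class toy dataset (roughly 90% vs 10%)
X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.9, 0.1], random_state=42)

# Inspect the class distribution to confirm the imbalance
print(Counter(y))
```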

1. Techniques for Handling Imbalanced Data

a) Data-Level Approaches (Resampling)

Resampling techniques modify the dataset before training the model.

● Oversampling (Minority Class Replication)
○ Increases instances of the minority class.
○ Risk: Overfitting due to duplicated samples.
● Undersampling (Majority Class Reduction)
○ Reduces instances of the majority class.
○ Risk: Loss of important data, reducing model performance.
● Synthetic Oversampling (e.g., SMOTE, ADASYN)
○ Creates synthetic examples for the minority class rather than simple replication (a usage sketch of all three strategies follows this list).
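The sketch below shows how these three resampling strategies could be applied with the imbalanced-learn library; it assumes the library is installed and that X, y hold the features and labels of an imbalanced dataset.

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Oversampling: replicate minority-class samples until the classes are balanced
X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X, y)

# Undersampling: drop majority-class samples until the classes are balanced
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)

# Synthetic oversampling: interpolate new minority-class samples with SMOTE
X_smote, y_smote = SMOTE(random_state=42).fit_resample(X, y)

print(Counter(y_over), Counter(y_under), Counter(y_smote))
```

Note that fit_resample returns a rebalanced copy of the data; the original X, y are left untouched.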

b) Algorithm-Level Approaches

Some models can handle imbalance natively.

●​ Cost-Sensitive Learning
○​ Assigns higher misclassification penalties for the minority class.
●​ Ensemble Methods (Bagging & Boosting)
○​ Random Forest & XGBoost have built-in options to address imbalance.
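As an illustration of the algorithm-level route, the snippet below uses the class_weight parameter exposed by several scikit-learn estimators and notes XGBoost's scale_pos_weight option; it is only a sketch, assuming binary labels with the minority class encoded as 1.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Cost-sensitive learning: weight classes inversely to their frequency
log_reg = LogisticRegression(class_weight="balanced", max_iter=1000)

# Ensemble method with built-in class weighting
forest = RandomForestClassifier(class_weight="balanced", random_state=42)

# XGBoost alternative (if installed): scale_pos_weight ~ n_negative / n_positive
# from xgboost import XGBClassifier
# xgb = XGBClassifier(scale_pos_weight=(y == 0).sum() / (y == 1).sum())
```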

c) Hybrid Approaches

Combining data-level and algorithm-level techniques often improves performance.

● Balanced Bagging Classifier: Uses undersampling within ensemble learning.
● SMOTE + Cost-Sensitive Learning: Synthetic oversampling combined with weighted loss functions (both hybrids are sketched below).
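A rough sketch of both hybrid options, assuming imbalanced-learn is available and that X_train, y_train come from an earlier train/test split:

```python
from imblearn.ensemble import BalancedBaggingClassifier
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Balanced Bagging: each bootstrap sample is undersampled before a tree is fit
bagging = BalancedBaggingClassifier(random_state=42)
bagging.fit(X_train, y_train)

# SMOTE + cost-sensitive learning: oversample, then also weight the loss
hybrid = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
hybrid.fit(X_train, y_train)
```
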
Synthetic Minority Oversampling Technique (SMOTE):

SMOTE is a powerful oversampling technique that generates synthetic examples for the
minority class instead of simply duplicating existing instances.

1. How SMOTE Works

1. Identify Minority Class Samples
○ SMOTE selects samples from the minority class at random.
2. Find k-Nearest Neighbors
○ For each selected sample, it identifies its k nearest minority-class neighbors.
3. Generate Synthetic Samples
○ New points are created between the selected sample and its neighbors using interpolation.
○ Example formula: x_new = x_existing + λ(x_neighbor − x_existing), where λ is a random number between 0 and 1.
4. Repeat Until Balance is Achieved
○ The process continues until the dataset reaches the desired class balance (a small numerical sketch of the interpolation step follows this list).
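To make the interpolation step concrete, here is a small NumPy sketch that generates one synthetic point from two example minority-class samples; it is illustrative only and omits the neighbor search and balancing loop of a full SMOTE implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# One minority-class sample chosen at random and one of its k nearest neighbors
x_existing = np.array([2.0, 3.0])
x_neighbor = np.array([4.0, 5.0])

# Random interpolation factor lambda drawn from [0, 1)
lam = rng.random()

# x_new = x_existing + lambda * (x_neighbor - x_existing)
x_new = x_existing + lam * (x_neighbor - x_existing)
print(x_new)  # a point on the line segment between the two samples
```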

2. Advantages of SMOTE

● Reduces overfitting caused by simple duplication of minority class samples.
● Works well for continuous data.
● Enhances classifier performance by providing more diverse training samples.

3. Limitations of SMOTE

● Can introduce noise if synthetic points are poorly generated.
● Does not work well with categorical features.
● May increase training time.

Comparison of Model Performance (With and Without SMOTE):

1. Model Without Handling Imbalance

● Trained on the original dataset.
● High accuracy, but biased towards the majority class.
● Poor recall for the minority class.

2. Model After Applying SMOTE

● Improved balance in the training data.
● Increased recall and F1-score for the minority class.
● Lower risk of model bias.

By comparing F1-scores before and after SMOTE, we can observe its effectiveness in
handling imbalance.
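The comparison could be carried out along the lines of the following sketch; the synthetic dataset is only a stand-in for the experiment's actual data.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced stand-in dataset (the real experiment would use its own data)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# 1. Model trained on the original, imbalanced data
base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
f1_base = f1_score(y_test, base.predict(X_test))

# 2. Model trained after applying SMOTE to the training split only
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
smoted = LogisticRegression(max_iter=1000).fit(X_res, y_res)
f1_smote = f1_score(y_test, smoted.predict(X_test))

print(f"F1 without SMOTE: {f1_base:.3f}  |  F1 with SMOTE: {f1_smote:.3f}")
```

Resampling is applied to the training split only, so the test set keeps the original class distribution and the comparison remains fair.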

CONCLUSION

In this experiment, we studied techniques for handling class imbalance and implemented SMOTE to overcome it. Comparing model performance before and after resampling showed that SMOTE reduces bias towards the majority class and improves recall and F1-score for the minority class.
