
Ads Lab5

The experiment aimed to address class imbalance using the SMOTE technique by generating synthetic data. It involved comparing the performance of a Random Forest classifier on original and SMOTE-resampled datasets, revealing that SMOTE had negligible effects on model performance. The study highlighted the importance of careful application of SMOTE, considering dataset characteristics and potential impacts on model interpretability.

Uploaded by

abhijaysingh66
Copyright
© All Rights Reserved

EXPERIMENT NO. 05
AIM: Use the SMOTE technique to generate synthetic data to address class imbalance.
Using any dataset, check the imbalance ratio, balance the classes by random
oversampling and by SMOTE, and compare accuracy and other evaluation metrics.
THEORY:

Class Imbalance:
In many real-world classification problems, the distribution of classes is often uneven, with
one class significantly outnumbering the others. This class imbalance can lead machine
learning models to be biased towards the majority class, resulting in poor performance on the
minority class. This is a common issue in various domains, such as fraud detection, medical
diagnosis, and rare event prediction.
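The bias described above can be made concrete with a small sketch. It assumes a hypothetical label vector with 900 majority and 100 minority examples and a trivial "model" that always predicts the majority class; none of this comes from a real dataset.

```python
import numpy as np

# Hypothetical labels: 900 majority (0) and 100 minority (1) examples.
y = np.hstack((np.zeros(900, dtype=int), np.ones(100, dtype=int)))

counts = np.bincount(y)                          # samples per class
imbalance_ratio = counts.max() / counts.min()    # majority count / minority count

# A trivial "classifier" that always predicts the majority class.
y_pred = np.zeros_like(y)
accuracy = np.mean(y_pred == y)                  # fraction correct overall
minority_recall = np.mean(y_pred[y == 1] == 1)   # fraction of minority examples found

print(f"Imbalance ratio: {imbalance_ratio:.1f}")
print(f"Majority-only accuracy: {accuracy:.2f}")
print(f"Minority recall: {minority_recall:.2f}")
```

Accuracy looks high even though no minority example is detected, which is why precision, recall, and F1 are reported alongside accuracy in this experiment.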
Synthetic Minority Over-sampling Technique (SMOTE):
1. Introduction:
● Objective: SMOTE aims to alleviate the impact of class imbalance by oversampling
the minority class through the generation of synthetic examples.
● Key Idea: Instead of replicating existing minority class instances, SMOTE creates
synthetic samples by interpolating between existing minority class instances.
2. How SMOTE Works:
● Nearest Neighbors: SMOTE selects a minority class instance and finds its k nearest
neighbors among the other minority class instances.
● Synthetic Sample Creation: A synthetic sample is generated by selecting one of the
k nearest neighbors and taking a convex combination of the feature values of
the selected instance and that neighbor.
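Written out, the convex combination takes the following form. The symbols are notation introduced here for illustration: x_i is the selected minority instance, x_nn the chosen neighbor, and lambda a random weight.

```latex
x_{\text{new}} = x_i + \lambda \, (x_{nn} - x_i), \qquad \lambda \sim \mathcal{U}(0, 1)
```

Assuming one value of lambda per synthetic sample, the new point lies on the line segment joining the two instances: lambda = 0 reproduces x_i and lambda = 1 reproduces x_nn.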
3. Algorithm Steps:
For each minority class instance:
● Identify its k nearest neighbors within the minority class.
● Randomly select one of the neighbors.
● Generate a synthetic sample by interpolating between the chosen neighbor and the
original instance.
● Repeat until the desired balance between classes is achieved.
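The steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the imbalanced-learn implementation: the function name smote_sketch and its parameters are hypothetical, base instances are drawn at random instead of being visited in order, and the data is assumed to contain no duplicate rows.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    # Request k + 1 neighbors: each point is returned as its own nearest
    # neighbor (assuming no duplicate rows), so the first column is dropped.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbor_idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]

    base = rng.integers(0, len(X_min), size=n_synthetic)  # instances to start from
    pick = rng.integers(0, k, size=n_synthetic)           # which of the k neighbors
    neighbors = X_min[neighbor_idx[base, pick]]
    lam = rng.random((n_synthetic, 1))                    # one weight per sample
    return X_min[base] + lam * (neighbors - X_min[base])

# Hypothetical minority class: 20 points with 2 features.
rng = np.random.default_rng(42)
X_min = rng.random((20, 2))
X_new = smote_sketch(X_min, n_synthetic=80)
print(X_new.shape)
```

Every synthetic point is a convex combination of two existing minority points, so it stays inside the range already covered by the minority class.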
4. Advantages of SMOTE:
● Mitigating Overfitting: Compared with random oversampling, which duplicates
existing examples, SMOTE lowers the risk of overfitting because it adds new,
varied points to the minority class.
● Improved Generalization: The synthetic samples can help the model generalize
better, especially when the available minority data is limited.
5. Considerations:
● Parameter Tuning: Users need to choose how many synthetic samples to generate
(the oversampling ratio) and how many nearest neighbors (k) to interpolate from.
●​ Impact on Model Interpretability: Introducing synthetic samples may affect the
interpretability of the model, as these examples do not correspond to actual
observations.
6. Limitations:
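● Sensitive to Noise: SMOTE interpolates from every minority instance, so mislabeled
or outlying minority points can produce synthetic samples inside majority-class
regions and amplify the noise.
● Potential Overfitting: If not used cautiously, SMOTE might lead to overfitting,
especially when the number of synthetic samples is excessive.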
In summary, SMOTE is a valuable technique for addressing class imbalance by creating
synthetic samples for the minority class. However, its application should be done carefully,
considering the characteristics of the dataset and potential impacts on model performance and
interpretability.
CODE:
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix)
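
# Synthetic dataset: 100 minority and 900 majority samples with 20 features each.
# Both classes are drawn from the same uniform distribution, so the features
# carry no class signal.
np.random.seed(42)
X_minority = np.random.rand(100, 20)  # Minority class
X_majority = np.random.rand(900, 20)  # Majority class
X = np.vstack((X_minority, X_majority))
y = np.hstack((np.ones(100), np.zeros(900)))  # Labels (1 for minority, 0 for majority)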
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
imbalance_ratio = np.sum(y_train == 0) / np.sum(y_train == 1)
print(f"Imbalance ratio before balancing: {imbalance_ratio}")
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
imbalance_ratio_after_smote = np.sum(y_resampled == 0) / np.sum(y_resampled == 1)
print(f"Imbalance ratio after SMOTE: {imbalance_ratio_after_smote}")
clf_original = RandomForestClassifier(random_state=42)
clf_original.fit(X_train, y_train)
clf_smote = RandomForestClassifier(random_state=42)
clf_smote.fit(X_resampled, y_resampled)

def evaluate_model(clf, X_test, y_test):
    """Return accuracy, precision, recall, F1, and the confusion matrix on the test set."""
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # zero_division=0 reports 0.0 instead of warning when no positives are predicted
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, zero_division=0)
    confusion_mat = confusion_matrix(y_test, y_pred)
    return accuracy, precision, recall, f1, confusion_mat

# Both models are evaluated on the same untouched test set.
accuracy_original, precision_original, recall_original, f1_original, conf_mat_original = \
    evaluate_model(clf_original, X_test, y_test)
accuracy_smote, precision_smote, recall_smote, f1_smote, conf_mat_smote = \
    evaluate_model(clf_smote, X_test, y_test)
print("\nResults for the model trained on the original data:")
print(f"Accuracy: {accuracy_original}")
print(f"Precision: {precision_original}")
print(f"Recall: {recall_original}")
print(f"F1 Score: {f1_original}")
print(f"Confusion Matrix:\n{conf_mat_original}")
print("\nResults for the model trained on SMOTE-resampled data:")
print(f"Accuracy: {accuracy_smote}")
print(f"Precision: {precision_smote}")
print(f"Recall: {recall_smote}")
print(f"F1 Score: {f1_smote}")
print(f"Confusion Matrix:\n{conf_mat_smote}")
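The aim also calls for balancing by random oversampling, which the listing above does not cover. A minimal sketch of that baseline follows, assuming the same kind of synthetic data and a stratified split (an assumption added here); imbalanced-learn's RandomOverSampler could be used instead of the manual duplication shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical data mirroring the experiment: 100 minority, 900 majority samples.
rng = np.random.default_rng(42)
X = rng.random((1000, 20))
y = np.hstack((np.ones(100), np.zeros(900)))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Random oversampling: duplicate minority rows (with replacement)
# until both classes have the same number of training samples.
minority_idx = np.flatnonzero(y_train == 1)
n_extra = int(np.sum(y_train == 0)) - len(minority_idx)
extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
X_ros = np.vstack((X_train, X_train[extra_idx]))
y_ros = np.hstack((y_train, y_train[extra_idx]))

clf_ros = RandomForestClassifier(random_state=42).fit(X_ros, y_ros)
y_pred = clf_ros.predict(X_test)
print(f"Imbalance ratio after random oversampling: {np.sum(y_ros == 0) / np.sum(y_ros == 1)}")
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"F1 Score: {f1_score(y_test, y_pred, zero_division=0)}")
```

Only the training split is resampled; the test set keeps its original class distribution so the metrics stay comparable with the other two models.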
OUTPUT:

CONCLUSION:
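On this imbalanced dataset (about 9:1), applying SMOTE resulted in negligible changes in
model performance. Both classes were drawn from the same uniform distribution, so the
features carry no class signal and resampling has little to work with.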
