Module2(ML)
Feature selection mechanisms, imbalanced data, outlier detection, artificial neural networks including
backpropagation, and their applications: give me a somewhat detailed explanation of these topics
1. Feature Selection
Feature selection involves choosing the most relevant features (or variables) for building a predictive
model. It's crucial for reducing model complexity, improving performance, and avoiding overfitting.
Filter Methods:
Use statistical tests to select features independent of the machine learning model.
Examples: Pearson correlation, Chi-square test, Mutual Information.
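A minimal filter-method sketch, assuming scikit-learn is available; mutual information is used here, but any of the statistical scores above could be swapped in:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature against the target independently of any model,
# then keep the top 2 -- this model-independence is what makes it a filter method.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("Scores per feature:", selector.scores_)
print("Kept feature indices:", selector.get_support(indices=True))
```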
Wrapper Methods:
Evaluate subsets of features using a specific machine learning algorithm.
Examples: Recursive Feature Elimination (RFE), Forward Selection, Backward Elimination.
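A wrapper-method sketch using scikit-learn's RFE, wrapped here around logistic regression as an illustrative choice of estimator:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# RFE repeatedly fits the estimator, drops the weakest feature, and refits:
# the model itself drives the selection, which makes this a wrapper method.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print("Selected features mask:", rfe.support_)
print("Feature ranking (1 = kept):", rfe.ranking_)
```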
Embedded Methods:
Feature selection is part of the model training process.
Examples: LASSO (L1 Regularization), Tree-based methods (e.g., feature importance in
Random Forest).
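An embedded-method sketch, assuming scikit-learn: the L1 penalty in LASSO drives weak coefficients exactly to zero during training, so selection falls out of the fit itself:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# L1 regularization shrinks some coefficients exactly to zero,
# so feature selection happens as a side effect of model training.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

kept = [i for i, c in enumerate(lasso.coef_) if c != 0]
print("Selected feature indices (non-zero coefficients):", kept)
```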
Applications:
Text classification (selecting informative terms), bioinformatics (gene selection from expression data),
and any high-dimensional problem where fewer features mean faster training and more interpretable models.
2. Imbalanced Data
Imbalanced data refers to datasets where the classes are not represented equally, e.g., 95% of one class
and 5% of another. This is common in real-world applications like fraud detection, medical diagnosis, or
churn prediction.
Challenges:
Standard metrics like accuracy can be misleading (e.g., always predicting the majority class may
yield high accuracy but fail to identify minority cases).
Handling Techniques:
Data-Level Methods:
Oversampling: Increase minority samples using techniques like SMOTE (Synthetic Minority
Oversampling Technique).
Undersampling: Reduce majority class samples.
Hybrid Sampling: Combine oversampling and undersampling.
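A data-level sketch, assuming the imbalanced-learn package (imblearn) is installed; the 95/5 split mirrors the ratio described above:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE  # requires the imbalanced-learn package

# A synthetic dataset with roughly 95% majority / 5% minority classes.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
print("Before:", Counter(y))

# SMOTE synthesizes new minority samples by interpolating between
# existing minority points and their nearest neighbors.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))
```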
Algorithm-Level Methods:
Modify algorithms to handle imbalanced data.
Examples: Cost-sensitive learning, ensemble methods like Random Forest with balanced
weights.
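An algorithm-level sketch using scikit-learn's built-in class weighting, a simple form of cost-sensitive learning:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)

# class_weight="balanced" reweights errors inversely to class frequency,
# so misclassifying the rare class costs more during training.
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
clf.fit(X, y)
```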
Evaluation Metrics:
Use metrics like Precision, Recall, F1-Score, ROC-AUC, or PR curves to evaluate performance.
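A sketch of evaluating with class-aware metrics (scikit-learn assumed); plain accuracy is deliberately left out:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = RandomForestClassifier(class_weight="balanced", random_state=42).fit(X_tr, y_tr)

# Per-class precision, recall, and F1 expose what plain accuracy hides.
print(classification_report(y_te, clf.predict(X_te)))
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```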
3. Outlier Detection
Outliers are data points significantly different from others. They can arise due to variability in the data or
errors during data collection.
Statistical Methods:
Z-score: Measures how far a data point lies from the mean, in units of standard deviation; |z| above about 3 (or 2 for small samples) commonly flags an outlier.
IQR (Interquartile Range): Data points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers.
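A small NumPy sketch of both rules; the |z| > 2 threshold is chosen for the tiny sample here (3 is more common in practice):

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is an obvious outlier

# Z-score rule: distance from the mean in standard deviations.
z = (data - data.mean()) / data.std()
print("Z-score outliers:", data[np.abs(z) > 2])

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
print("IQR outliers:", data[mask])
```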
Machine Learning Approaches:
Clustering methods (e.g., DBSCAN, K-means with distance thresholds).
Isolation Forest: Detects anomalies by isolating instances.
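An Isolation Forest sketch with scikit-learn; the two injected points and the contamination rate are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),   # normal cluster
               [[8, 8], [-7, 9]]])                # two injected anomalies

# Isolation Forest isolates points with random splits; anomalies need
# fewer splits to isolate. fit_predict returns -1 for outliers.
labels = IsolationForest(contamination=0.01, random_state=42).fit_predict(X)
print("Detected outliers:", X[labels == -1])
```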
Visualization Tools:
Box plots, scatter plots, and histograms.
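For example, a box plot (matplotlib assumed) draws whiskers at 1.5 × IQR and plots points beyond them individually as suspected outliers:

```python
import matplotlib.pyplot as plt

data = [10, 12, 11, 13, 12, 11, 95]

# Points beyond the whiskers (1.5 * IQR) are drawn as isolated markers.
plt.boxplot(data)
plt.title("Box plot: 95 appears as an isolated point")
plt.show()
```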
Applications:
Fraud detection, network intrusion detection, sensor fault monitoring, and cleaning data before model training.
4. Artificial Neural Networks (ANNs)
ANNs are models inspired by the brain: layers of interconnected neurons, each applying an activation
function to a weighted sum of its inputs.
Structure:
An input layer, one or more hidden layers, and an output layer, connected by trainable weights and biases.
Backpropagation:
A key algorithm for training ANNs.
Steps:
1. Forward pass: Compute output and loss.
2. Backward pass: Calculate gradients using the chain rule.
3. Update weights using Gradient Descent or its variants (e.g., Adam).
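A toy NumPy sketch of all three steps on XOR; the architecture (one hidden layer of 3 sigmoid units), learning rate, and step count are illustrative choices, and convergence depends on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# XOR: a minimal problem a linear model cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for step in range(10000):
    # 1. Forward pass: compute output and (mean squared error) loss.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    loss = np.mean((y_hat - y) ** 2)

    # 2. Backward pass: apply the chain rule layer by layer.
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    grad_W2, grad_b2 = h.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    grad_W1, grad_b1 = X.T @ d_hid, d_hid.sum(axis=0)

    # 3. Update weights (plain gradient descent; Adam etc. would adapt lr).
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("Final loss:", loss, "predictions:", y_hat.ravel().round(2))
```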
Activation Functions:
Introduce non-linearity so the network can model complex relationships.
Examples: Sigmoid, Tanh, ReLU, Softmax (typically on the output layer for classification).
5. Applications of ANNs
ANNs have a wide range of applications due to their ability to model complex patterns and relationships
in data.
Examples:
Image classification and object detection, speech recognition, natural language processing (e.g., machine
translation), recommendation systems, and medical diagnosis.