INT354 - Unit 2

The document provides an overview of supervised learning, differentiating between regression and classification with real-life examples. It discusses how to choose classification algorithms based on dataset size, feature types, interpretability, and other factors. Additionally, it covers specific algorithms like Logistic Regression, Perceptron, and Decision Trees, including their mechanisms and comparisons.


Machine Learning Classifier-I

Overview of Supervised Learning


Regression Vs Classification

Real-life examples of Supervised Learning Techniques

Classification Examples
• Classifying emails as spam or not spam
• Categorizing patients into different disease categories based on symptoms and test results
• Identifying objects in images
• Determining sentiments of text data
• Predicting the grade of a student in ML
• Predicting whether India is going to win the match or not
• Predicting whether a student is going to get a placement or not

Regression Examples
• Estimating the price of a house based on location and area
• Predicting future sales of a product based on historical data
• Forecasting the future price of a stock based on financial indicators
• Predicting temperature, rainfall and other weather conditions
• Predicting the creditworthiness of an individual or company based on financial history
• Predicting product quality based on manufacturing process variables
• Predicting student performance based on attendance, grades and test scores



Choosing a Classification Algorithm
• a) Size of Dataset
• Small dataset (<10,000 samples) → Classical ML models (Logistic Regression, SVM, Decision Tree)
• Large dataset (>10,000 samples) → Deep learning models (CNN, LSTM, Transformer)

• b) Number of Features (Dimensionality)


• Low-dimensional data (<20 features) → Logistic Regression, Naïve Bayes,
Decision Tree
• High-dimensional data (>1000 features) → SVM (with kernel trick), Deep
Learning, Feature Selection needed
Choosing a Classification Algorithm
• c) Type of Features
• Numerical → Logistic Regression, Random Forest, XGBoost, Neural Networks
• Categorical → Decision Trees, Random Forest, XGBoost
• Mixed (Numerical + Categorical) → CatBoost, XGBoost, Decision Trees

• d) Data Imbalance
• Balanced → Any model works well
• Imbalanced → Consider:
• Resampling (Oversampling, SMOTE, Undersampling)
• Weighted loss function (XGBoost, CatBoost, Deep Learning)
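
The two remedies listed above can be tried in a few lines; the sketch below is a minimal illustration that assumes scikit-learn and the imbalanced-learn package are installed, with a synthetic dataset standing in for real data.

```python
# A minimal sketch of the two imbalance-handling options listed above,
# assuming scikit-learn and the imbalanced-learn package are installed.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced dataset: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Option 1: weighted loss -- penalize mistakes on the minority class more.
weighted_model = LogisticRegression(class_weight="balanced", max_iter=1000)
weighted_model.fit(X, y)

# Option 2: resampling -- synthesize minority samples with SMOTE, then train.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
resampled_model = LogisticRegression(max_iter=1000)
resampled_model.fit(X_res, y_res)
```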
Choosing a Classification Algorithm
• a) Interpretability
• High Interpretability Required → Logistic Regression, Decision Trees, Naïve Bayes
• Low Interpretability (but High Accuracy) → Random Forest, XGBoost, Deep
Learning

• b) Computational Complexity
• Low Complexity (Fast Training & Inference) → Logistic Regression, Naïve Bayes,
SVM (linear)
• Medium Complexity → Decision Trees, Random Forest, XGBoost
• High Complexity (Slow Training, High GPU Requirements) → Deep Learning (CNN,
LSTM, Transformers)
Choosing a Classification Algorithm
• a) Accuracy vs Speed Trade-off
• High Accuracy Required → XGBoost, Random Forest, Deep Learning
• Fast Computation Needed → Logistic Regression, Naïve Bayes

• b) Sensitivity to Noise
• Robust to Noise → Random Forest, XGBoost, Deep Learning
• Sensitive to Noise → SVM, Decision Trees (prone to overfitting)

• c) Overfitting Risk
• Low Risk → Random Forest, XGBoost (with regularization), Ridge/Lasso Regression
• High Risk → Decision Trees (without pruning), Deep Learning (without dropout)
Choosing a Classification Algorithm
• a) Sequential or Time-Series Data
• Yes → LSTM, GRU, Transformer, 1D-CNN
• No → Use traditional ML models

• b) Multiclass vs Binary
• Binary Classification → Logistic Regression, SVM, Decision Trees, Deep Learning
• Multiclass (≥3 classes) → Random Forest, XGBoost, Neural Networks (softmax activation)

• c) Streaming / Real-time Processing


• Yes → Online Learning Models (Incremental Learning with SGD, Adaptive Boosting, Streaming
Decision Trees)
• No → Batch Training Methods (Deep Learning, XGBoost)
Choosing a Classification Algorithm
Use Case → Recommended Algorithm
Text Classification → Naïve Bayes, Transformer (BERT), LSTM
Image Classification → CNN, ResNet, EfficientNet, Vision Transformer
Tabular Data Classification → XGBoost, Random Forest, Logistic Regression
Anomaly Detection → Isolation Forest, Autoencoders, One-Class SVM
Medical Diagnosis → XGBoost, Random Forest, CNN for Images
Fraud Detection → Random Forest, XGBoost, Autoencoders

No Free Lunch Theorem
• No single algorithm is best for all problems.
• Performance depends on the problem context.
• Data often matters more than the algorithm.
• Trade-offs exist: Accuracy vs. Efficiency.
• Evaluate models using cross-validation methods.
• Experiment with hyperparameters for optimization.
• Choose algorithm based on specific requirements.
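
As a concrete way to follow the last three bullets, the sketch below compares a few candidate classifiers with 5-fold cross-validation instead of trusting any single default choice; it assumes scikit-learn and uses a synthetic dataset purely for illustration.

```python
# Minimal sketch: compare candidate classifiers by cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# No single model is best for every problem, so estimate each one's
# accuracy with 5-fold cross-validation and compare.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```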
Logistic Regression
• Logistic regression predicts binary class probabilities.
• Uses sigmoid function to map outputs.
• Optimized using cross-entropy loss function.
• Works well for linearly separable data.
• Extension: Multinomial logistic for multiple classes.
• Widely used for classification problems.
Logistic Regression
Logistic Regression Algorithm
Step 1: Compute Linear Combination of Features
Logistic Regression starts with a weighted sum of input features:
Z=w1X1+w2X2+...+wnXn+b

Step 2: Apply the Sigmoid Activation Function


Instead of predicting any real number, we transform Z using the sigmoid
function:
σ(Z) = 1 / (1 + e^(−Z))
where:
• σ(Z) squashes the output between 0 and 1, making it a probability.
• If σ(Z) >0.5, classify as 1 (Positive Class).
• If σ(Z) ≤0.5, classify as 0 (Negative Class).
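
A minimal NumPy sketch of Steps 1 and 2, with made-up weights and a single made-up sample:

```python
# Steps 1-2 of logistic regression: linear combination, then sigmoid.
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.4])        # hypothetical weights w1, w2
b = 0.1                          # hypothetical bias
x = np.array([1.5, 2.0])         # one sample with features X1, X2

z = np.dot(w, x) + b             # Step 1: linear combination
prob = sigmoid(z)                # Step 2: probability of the positive class
label = 1 if prob > 0.5 else 0   # classify using the 0.5 threshold
print(prob, label)
```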
Logistic Regression Algorithm
Step 3: Define the Cost Function (Log Loss)
Instead of Mean Squared Error (MSE), we use Log Loss (Binary Cross-Entropy):
J(w) = −(1/m) Σ [ yi log(ŷi) + (1 − yi) log(1 − ŷi) ], summed over the m training samples (i = 1 … m)
• Minimizing Log Loss ensures that the model maximizes the probability of
correct predictions.

Step 4: Optimize Weights Using Gradient Descent


Logistic Regression updates weights using Gradient Descent:
wj = wj − α (∂J/∂wj)
• Gradient Descent ensures the cost function is minimized.
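
Putting Steps 1–4 together, here is a from-scratch sketch in NumPy; the synthetic data, learning rate, and epoch count are illustrative choices rather than part of the original slides.

```python
# Steps 3-4: log loss plus gradient-descent weight updates.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # 100 samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # linearly separable labels

w = np.zeros(2)
b = 0.0
alpha = 0.1                                    # learning rate

for epoch in range(200):
    y_hat = sigmoid(X @ w + b)                 # predicted probabilities
    y_hat = np.clip(y_hat, 1e-12, 1 - 1e-12)   # guard against log(0)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Gradients of the log loss with respect to w and b.
    grad_w = X.T @ (y_hat - y) / len(y)
    grad_b = np.mean(y_hat - y)
    w -= alpha * grad_w                        # Step 4: gradient descent update
    b -= alpha * grad_b

print("final loss:", round(loss, 4))
```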
Perceptron
Perceptron Algorithm
• Step 1: Initialize Weights & Bias
§ Set all weights (wi) and bias (b) to small random values or
zeros.
• Step 2: Compute the Output
• For each training sample (X,y):
§ Compute the weighted sum: Z=∑wiXi+b
§ Apply the step activation function:
§ ypred = 1 if Z ≥ 0, else 0
Perceptron Algorithm
• Step 3: Update Weights Using the Perceptron Rule
• If the predicted output ypred matches the actual label y, do nothing.
• If incorrect, update the weights using the rule:
• wi=wi+η (y−ypred)Xi
• b = b + η (y - ypred)

• Step 4: Repeat Until Convergence


• Iterate over the dataset multiple times (epochs) until:
oAll samples are classified correctly.
oA stopping criterion (max iterations or minimal error) is met.
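
A from-scratch sketch of the full perceptron loop above, assuming NumPy; the AND-gate dataset, learning rate, and epoch limit are illustrative choices.

```python
# Perceptron training loop following Steps 1-4 above.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                       # AND-gate labels

w = np.zeros(2)      # Step 1: initialize weights and bias
b = 0.0
eta = 0.1            # learning rate

for epoch in range(20):                          # Step 4: repeat over epochs
    errors = 0
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b                    # Step 2: weighted sum
        y_pred = 1 if z >= 0 else 0              # step activation
        if y_pred != target:                     # Step 3: update on mistakes
            w += eta * (target - y_pred) * xi
            b += eta * (target - y_pred)
            errors += 1
    if errors == 0:                              # stop once all samples are correct
        break

print(w, b)
```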
Decision Tree Classifier
• Models decisions as tree-like structures.
• Splits nodes using feature thresholds recursively.
• Handles both categorical and numerical data.
• Overfitting mitigated using pruning techniques.
• Easy to interpret, but sometimes generalizes poorly.
• Algorithms include ID3, CART, and C4.5.
Decision Tree Classifier
• A Decision Tree might split the data as follows:
• First Split on Credit Score:
• If Credit Score > 700 → No Default
• If Credit Score ≤ 700 → Check Age.
• Second Split on Age:
• If Age ≤ 30 → Default
• If Age > 30 → No Default

• This hierarchical decision-making forms a tree-like structure.
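
A small scikit-learn sketch in the spirit of this example; the credit/age samples are made up, so the thresholds the tree learns only roughly match the 700 and 30 cut-offs on the slide.

```python
# Fit a shallow decision tree to a tiny, made-up default dataset.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical samples: [credit_score, age]; label 1 = Default, 0 = No Default.
X = [[690, 25], [660, 28], [640, 22], [655, 24],   # low credit, young -> default
     [680, 45],                                    # low credit, older -> no default
     [750, 25], [760, 28], [720, 23],              # high credit, young -> no default
     [710, 60], [730, 50]]                         # high credit, older -> no default
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits; with this data they resemble the slide's
# "Credit Score first, then Age" hierarchy.
print(export_text(tree, feature_names=["credit_score", "age"]))
```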


Decision Tree Classifier
• Root Node: The first decision point (e.g., Credit Score).
• Internal Nodes: Decision points based on feature splits.
• Leaves (Terminal Nodes): Final classification or prediction outcome.

• The algorithm determines the best feature to split on using:


• Gini Impurity → Used in CART (Classification and Regression Trees) Algorithm,
measures how “impure” a node is.
• Entropy (Information Gain) → Used in ID3 Algorithm, calculates reduction in
uncertainty.

• Pruning: Removes unnecessary branches to avoid overfitting.


• Pre-Pruning (Stopping early): Limits tree depth.
• Post-Pruning (Trimming later): Removes nodes after training.
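
For reference, both impurity measures are easy to compute directly; the sketch below assumes NumPy and uses arbitrary example class counts.

```python
# Gini impurity (CART) and entropy (ID3) from class counts at a node.
import numpy as np

def gini(counts):
    # Gini impurity: 1 minus the sum of squared class proportions.
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    # Entropy in bits, used to compute information gain.
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]                       # ignore empty classes to avoid log(0)
    return -np.sum(p * np.log2(p))

print(gini([5, 5]), entropy([5, 5]))     # maximally impure node: 0.5, 1.0
print(gini([10, 0]), entropy([10, 0]))   # pure node: 0.0, 0.0
```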
ID3 Algorithm
• ID3 builds trees using information gain metric.
• Splits data based on entropy reduction.
• Suitable for categorical feature data splitting.
• Prone to overfitting with noisy data.
• Iteratively chooses best attribute for split.
• Simpler than CART and C4.5 algorithms.
ID3 Algorithm (Flowchart)
• Start with the input dataset containing N features.
• While there are more features to split on:
  § Compute the entropy and information gain for each feature.
  § Select the best feature (minimum entropy / maximum information gain).
  § Split the dataset based on that feature.
  § Make a decision tree node containing that feature.
  § Make child nodes of the decision tree from the data subsets created.
• When no features remain to split on, stop.
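
One iteration of the loop above boils down to computing information gain per feature; below is a hedged sketch using pandas and a tiny made-up dataset.

```python
# One ID3 step: information gain of a single categorical feature.
import numpy as np
import pandas as pd

def entropy(labels):
    p = labels.value_counts(normalize=True).to_numpy()
    return -np.sum(p * np.log2(p))

# Hypothetical data: does a customer default, given their credit band?
df = pd.DataFrame({
    "credit_band": ["low", "low", "low", "high", "high", "high"],
    "default":     ["yes", "yes", "no",  "no",   "no",   "no"],
})

parent = entropy(df["default"])

# Weighted entropy of the child nodes after splitting on credit_band.
children = sum(
    len(group) / len(df) * entropy(group["default"])
    for _, group in df.groupby("credit_band")
)

info_gain = parent - children   # ID3 picks the feature with the largest gain
print(round(info_gain, 3))
```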
C4.5 Algorithm
Comparison of ID3 and C4.5

Splitting Criterion
• ID3 (Iterative Dichotomiser 3): Uses Entropy & Information Gain to choose the best feature for splitting.
• C4.5 (Successor of ID3): Uses Entropy & Gain Ratio, which normalizes Information Gain to prevent bias towards multi-valued attributes.

Handling Continuous Features
• ID3: Cannot handle numerical (continuous) attributes; requires discretization (e.g., "High Credit Score" vs. "Low Credit Score").
• C4.5: Can handle continuous (numerical) attributes by creating threshold splits (e.g., "Credit Score ≤ 680").

Handling Missing Values
• ID3: Cannot handle missing values.
• C4.5: Can handle missing values by assigning probabilities to different attribute values.

Tree Pruning
• ID3: No pruning; leads to overfitting.
• C4.5: Uses post-pruning to remove branches that do not improve accuracy, reducing overfitting.

Bias Towards Multi-Valued Attributes
• ID3: Yes, because Information Gain favors attributes with more unique values.
• C4.5: No, because Gain Ratio corrects this bias.

Output
• ID3: Produces a large tree with possible overfitting.
• C4.5: Produces a simpler tree with better generalization.

Efficiency
• ID3: Faster but less accurate due to lack of pruning.
• C4.5: Slower than ID3 but more accurate due to pruning and handling continuous values.
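
C4.5's gain ratio divides information gain by the split information of the attribute; below is a sketch reusing the made-up dataset from the ID3 example (pandas and NumPy assumed).

```python
# Gain ratio (C4.5) for a single categorical feature.
import numpy as np
import pandas as pd

def entropy(labels):
    p = labels.value_counts(normalize=True).to_numpy()
    return -np.sum(p * np.log2(p))

df = pd.DataFrame({
    "credit_band": ["low", "low", "low", "high", "high", "high"],
    "default":     ["yes", "yes", "no",  "no",   "no",   "no"],
})

parent = entropy(df["default"])
weights = df["credit_band"].value_counts(normalize=True)
children = sum(w * entropy(df.loc[df["credit_band"] == v, "default"])
               for v, w in weights.items())

info_gain = parent - children
# Split information: entropy of the attribute's own value distribution.
split_info = -np.sum(weights.to_numpy() * np.log2(weights.to_numpy()))
gain_ratio = info_gain / split_info   # penalizes attributes with many values
print(round(gain_ratio, 3))
```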
