Module IV - Machine Learning
SYLLABUS
Introduction to supervised and unsupervised learning - Regression and Classification algorithms - Clustering and Dimensionality Reduction - Model evaluation and selection.
Machine Learning
● Machine learning (ML) is a branch of artificial intelligence (AI) that focuses on developing algorithms that enable computers to learn from data and make predictions or decisions without being explicitly programmed.
● ML is used in various fields, including healthcare, finance, marketing, and robotics.
Types of Machine Learning
Machine learning methods are broadly divided into supervised and unsupervised learning. Common supervised algorithms include:
● Logistic Regression
● Decision Trees
● Random Forests
● Polynomial Regression
● Lasso Regression
Supervised Learning
1. Regression
Regression is a type of supervised learning that is used to predict continuous values, such as house prices, stock prices, or temperature. Regression algorithms learn a function that maps from the input features to a continuous output.
2. Classification
Classification is a type of supervised learning that is used to predict categorical values, such as whether a customer will churn or not, whether an email is spam or not, or whether a medical image shows a tumor or not. Classification algorithms learn a function that maps from the input features to a probability distribution over the output classes.
Common classification algorithms include Decision Trees and Random Forests; a minimal sketch follows below.
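As a sketch of how a classifier learns a mapping from features to a probability distribution over classes, the snippet below fits a Random Forest with scikit-learn. The synthetic dataset and all parameter values are illustrative assumptions, not taken from the slides.

```python
# Minimal supervised-classification sketch (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data: 200 samples, 4 features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)             # learn the mapping from features to classes
print(clf.predict(X_test[:5]))        # predicted class labels
print(clf.predict_proba(X_test[:5]))  # probability distribution over the classes
```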
Supervised Learning
Applications of Supervised Learning
Neural Networks (for Regression)
Description: Deep learning models for complex, nonlinear regressions.
Example: Estimating property values with images and data.
Regression and Classification algorithms
Classification:
Classification is the process of finding a function that divides the dataset into classes based on different parameters. In classification, a computer program is trained on the training dataset and, based on that training, categorizes new data into different classes.
The task of the classification algorithm is to find the mapping function that maps the input (x) to the discrete output (y).
Decision Trees
Description: Tree-based model splitting data by features.
Example: Customer churn prediction.
Gradient Boosting Classifier (XGBoost, LightGBM)
Description: Ensemble of weak classifiers optimized sequentially.
Example: Fraud detection, medical diagnosis.
Neural Networks (for Classification)
Description: Deep models good for complex, high-dimensional data.
Example: Voice recognition, image tagging.
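The sketch below shows the sequential-ensemble idea using scikit-learn's built-in GradientBoostingClassifier; XGBoost and LightGBM expose a similar fit/predict interface. The data and hyperparameters are illustrative assumptions.

```python
# Minimal gradient-boosting sketch (scikit-learn only).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# An ensemble of shallow trees ("weak classifiers") fitted sequentially,
# each new tree correcting the errors of the ensemble built so far.
gb = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
gb.fit(X, y)
print(gb.score(X, y))  # training accuracy; evaluate on held-out data in practice
```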
Regression and Classification algorithms
Regression Types
1. Linear Regression
Description: Models the relationship between a
dependent variable and one or more independent
variables using a straight line.
Example: Predicting house prices based on square
footage.
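A minimal sketch of this example with scikit-learn follows; the square-footage and price numbers are made up for illustration.

```python
# Linear regression: fit a straight line to (square footage, price) pairs.
import numpy as np
from sklearn.linear_model import LinearRegression

sqft = np.array([[800], [1000], [1500], [2000], [2500]])    # independent variable
price = np.array([160000, 200000, 290000, 390000, 480000])  # dependent variable

model = LinearRegression().fit(sqft, price)
print(model.coef_[0], model.intercept_)  # slope and intercept of the fitted line
print(model.predict([[1800]]))           # predicted price for 1800 sq ft
```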
2. Multiple Linear Regression
Description: Extension of linear regression with two
or more independent variables.
Example: Predicting salary based on experience and education.
Regression Types
3. Polynomial Regression
Description: Models the relationship as an nth-
degree polynomial. Useful when data shows a non-
linear trend.
Example: Predicting growth over time that
accelerates or decelerates.
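A common way to fit a polynomial is to expand the input into polynomial features and then apply ordinary linear regression, as in the sketch below; the degree and the synthetic growth data are illustrative assumptions.

```python
# Polynomial regression sketch: a 2nd-degree fit for a nonlinear trend.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

t = np.arange(10).reshape(-1, 1)   # time steps
growth = 3 + 0.5 * t.ravel() ** 2  # accelerating growth (synthetic)

# Expand t into [1, t, t^2] features, then fit linear regression on them.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(t, growth)
print(poly_model.predict([[12]]))  # extrapolated prediction
```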
4. Ridge Regression (L2 Regularization)
Description: Linear regression with a penalty for
large coefficients to reduce overfitting.
Example: When there’s multicollinearity in the data
Regression Types
5. Lasso Regression (L1 Regularization)
Description: Similar to Ridge, but can shrink some
coefficients to zero, which helps with feature
selection.
Example: When you want a simpler, more
interpretable model.
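The sketch below contrasts Ridge (L2) and Lasso (L1) on the same synthetic data; the alpha penalty strengths are illustrative choices.

```python
# Ridge vs. Lasso: both penalize large coefficients, but only Lasso
# can shrink coefficients exactly to zero (feature selection).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # shrinks all coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)  # sets some coefficients exactly to zero

print(ridge.coef_)  # small but nonzero coefficients
print(lasso.coef_)  # several coefficients are exactly 0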
6. Logistic Regression
Description: Despite its name, it is used for classification, not regression. Outputs probabilities using a logistic (sigmoid) function.
Example: Predicting whether an email is spam or not.
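The sketch below fits a logistic regression and recomputes its probability output by hand via the sigmoid, 1 / (1 + e^(-z)) with z = w*x + b; the tiny dataset is an illustrative assumption.

```python
# Logistic regression outputs class probabilities via the sigmoid function.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 1-D data: larger x is more likely to be class 1.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))  # [P(class 0), P(class 1)]

# The same class-1 probability computed by hand: sigmoid(w*x + b).
z = clf.coef_[0][0] * 2.0 + clf.intercept_[0]
print(1 / (1 + np.exp(-z)))
```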
Classification Types
1. Binary Classification
Description: Two classes only (e.g., yes/no, 0/1).
Examples:
Spam vs. Not Spam, Disease vs. No Disease
2. Multiclass Classification
Description: More than two classes, but only one correct
class per instance.
Examples:
Digit recognition (0–9), Animal classification (cat, dog, bird)
Classification Types
3. Imbalanced Classification
Description: One class is far more common than the other(s), so accuracy alone can be misleading.
Examples:
Fraud detection (fraud = rare, normal = common)
4. Multilabel Classification
Description: Each instance can belong to more than one class at the same time.
Examples:
A movie can be tagged as both Action and Comedy
Classification Types
5. Ordinal Classification
Description: Classes have a meaningful order,
but not a numeric difference.
Examples:
Rating systems (low, medium, high)
Education levels (high school, college,
postgraduate)
Clustering and Dimensionality Reduction
Clustering
Used to group similar data points when labels are not known (unsupervised learning).
1. K-Means Clustering
Divides data into K clusters based on distance (usually Euclidean).
Fast and simple, but needs the number of clusters in advance (a sketch comparing K-Means and DBSCAN follows after this list).
2. Hierarchical Clustering
Builds a tree (dendrogram) of clusters.
Clustering and Dimensionality Reduction
3. DBSCAN (Density-Based Spatial Clustering)
Groups points that are closely packed; good at handling
noise and outliers.
Does not need the number of clusters in advance.
4. Gaussian Mixture Models (GMM)
Probabilistic model; assumes data is from multiple
Gaussian distributions.
More flexible than K-Means for complex shapes.
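The sketch below contrasts K-Means (which needs k up front) with DBSCAN (which does not, and marks outliers as noise); the blob dataset and hyperparameters are illustrative assumptions.

```python
# Clustering sketch: K-Means vs. DBSCAN on synthetic blob data.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])  # cluster index assigned to each point

dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)
print(set(dbscan.labels_))  # cluster indices; -1 marks noise/outliers
```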
Clustering and Dimensionality Reduction
Dimensionality reduction is a technique that helps improve the accuracy and performance of clustering algorithms by reducing the number of features in a dataset. It can help with the following:
Sparse data: As more features are added, data can become sparse, making analysis difficult. Dimensionality reduction can help avoid this issue.
Clustering and Dimensionality Reduction
Accuracy: Dimensionality reduction can improve accuracy in classification and clustering.
Computational cost: Dimensionality reduction can reduce computational cost.
Visualization: Dimensionality reduction can improve data visualization.
Storage: Dimensionality reduction can reduce storage requirements.
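The slides do not name a specific method; PCA (principal component analysis) is one common choice, sketched below on scikit-learn's bundled digits dataset to show the feature count dropping from 64 to 2.

```python
# Dimensionality-reduction sketch: PCA projects 64-D digit images down to 2-D.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 samples, 64 features each
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)      # (1797, 64) -> (1797, 2)
print(pca.explained_variance_ratio_)  # share of variance kept per component
```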
Model evaluation and selection
TP / FP / FN / TN
The counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), obtained by comparing model inferences against the ground truth, are essential for summarizing model performance. These counts are the building blocks of many other metrics, including accuracy, precision, and recall.
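A minimal sketch of computing these counts and the derived metrics with scikit-learn's confusion_matrix follows; the label vectors are made up for illustration.

```python
# TP/FP/FN/TN and derived metrics from predictions (binary case).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground truth (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model inferences

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, accuracy, precision, recall)
```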
Model evaluation and selection
Used to measure how well a model performs.
Key Metrics:
1. Classification Metrics:
Accuracy: % of correct predictions, i.e. (TP + TN) / (TP + TN + FP + FN)
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
Approaches:
1. Train-Test Split
Simple split (e.g., 80/20) to compare models
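A minimal sketch of an 80/20 train-test split with scikit-learn follows; the synthetic dataset and the choice of classifier are illustrative assumptions.

```python
# Train-test split sketch: hold out 20% of the data for evaluation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # train on 80%
print(model.score(X_test, y_test))                  # accuracy on the held-out 20%
```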