Module IV - Machine Learning

The document provides an overview of machine learning, focusing on supervised and unsupervised learning, including regression and classification algorithms, clustering, and dimensionality reduction. It outlines the types of machine learning, popular algorithms, and their applications, advantages, and disadvantages. Additionally, it discusses semi-supervised learning and the differences between regression and classification algorithms.

Uploaded by

pinkylily097
Machine Learning
MODULE IV
SYLLABUS
Introduction to supervised and
unsupervised learning - Regression
and Classification algorithms -
Clustering and Dimensionality
Reduction - Model evaluation and
selection.
Machine Learning
●Machine learning (ML) is a branch of
artificial intelligence (AI) that focuses on
developing algorithms that enable
computers to learn from data and make
predictions or decisions without being
explicitly programmed.
●ML is used in various fields, including
healthcare, finance, marketing, and
robotics.
Types of Machine Learning
● Supervised Learning – The model learns from labeled data.
Examples: Linear regression, decision trees, support vector machines, and neural networks.
● Unsupervised Learning – The model identifies patterns in unlabeled data.
Examples: Clustering (K-Means, DBSCAN) and dimensionality reduction (PCA).
● Reinforcement Learning – The model learns by interacting with an environment and receiving rewards or penalties.
Examples: Q-learning, Deep Q-Networks (DQN).
Popular ML Algorithms
● Linear Regression
● Logistic Regression
● Decision Trees
● Random Forests
● K-Nearest Neighbors (KNN)
● Support Vector Machines (SVM)
● Neural Networks (Deep Learning)
● Gradient Boosting (XGBoost, LightGBM)
Comparative Study
Introduction to Supervised and Unsupervised Learning
● Supervised and unsupervised learning are two key
approaches in machine learning.
● In supervised learning, the model is trained with labeled
data where each input is paired with a corresponding
output.
● On the other hand, unsupervised learning involves
training the model with unlabeled data where the task is
to uncover patterns, structures or relationships within the
data without predefined outputs.
Supervised Learning
● Supervised learning is when we teach or train the machine
using data that is well-labelled, which means each example is
already tagged with the correct answer.
● After training, the machine is given a new set of examples
(data), and the supervised learning algorithm uses what it
learned from the training data to predict the correct outcome
for them.
● For example, a labeled dataset of images of Elephant, Camel
Types of Supervised Learning
Supervised learning is classified into two categories of
algorithms:
● Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.
● Classification: A classification problem is when the
output variable is a category, such as “Yes” or “No”, or
“disease” or “no disease”.
Supervised learning deals with or learns with “labeled”
data. This implies that some data is already tagged with
1. Regression
● Regression is a type of supervised learning that is used to
predict continuous values, such as house prices or stock
prices. Regression algorithms learn a function that maps from
the input features to the output value.
Some common regression algorithms include:
● Linear Regression
● Polynomial Regression
● Lasso Regression
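As a rough illustration of "learning a function that maps input features to an output value", the sketch below fits a straight line to one feature by least squares. The function name `fit_line` and the house-size data are made up for this example; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (square metres) vs. price (thousands)
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90.0 + intercept  # price estimate for an unseen 90 m^2 house
print(slope, intercept, predicted)
```

The learned `slope` and `intercept` are the "function"; prediction for new data is just evaluating it.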
2. Classification
● Classification is a type of supervised learning that is used to
predict categorical values, such as whether a customer will
churn or not, whether an email is spam or not, or whether a
medical image shows a tumor or not. Many classification
algorithms learn a function that maps from the input features
to a probability distribution over the output classes; others
output a class label directly.
Some of the most common classification algorithms include:
● Logistic Regression
● Support Vector Machines
● Decision Trees
● Random Forests
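To make "a probability distribution over the output classes" concrete, here is a minimal sketch of the softmax function, one common way of turning raw class scores into probabilities. The class names and scores are hypothetical values chosen for illustration, not the output of a trained model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to three classes for one email
classes = ["spam", "promotion", "personal"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
predicted_class = classes[probs.index(max(probs))]  # most probable class
print(dict(zip(classes, probs)), predicted_class)
```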
Applications of Supervised learning
● Spam filtering: Supervised learning algorithms can be trained to
identify and classify spam emails based on their content, helping
users avoid unwanted messages.
● Image classification: Supervised learning can automatically
classify images into different categories, such as animals, objects,
or scenes, facilitating tasks like image search, content moderation,
and image-based product recommendations.
● Medical diagnosis: Supervised learning can assist in medical
diagnosis by analyzing patient data, such as medical images, test
results, and patient history, to identify patterns that suggest
specific diseases or conditions.
● Natural language processing (NLP): Supervised learning plays
a crucial role in NLP tasks, including sentiment analysis, machine
Advantages of Supervised learning
● By using labeled examples, supervised learning builds
models that draw on prior experience to produce reliable
outputs for new, unseen data.
● It improves over time. With more data and training, these
models can refine their accuracy, leading to better
performance and more reliable predictions.
● Supervised learning is practical for a range of applications,
from detecting spam emails to predicting house prices,
thanks to its ability to handle diverse computational
challenges.
● It can be used to categorize data into predefined labels
Supervised Learning
Disadvantages of Supervised learning
 Supervised learning requires a well-labeled dataset,
where each input is paired with a corresponding output.
Creating such datasets can be time-consuming,
expensive, and prone to human error, which may limit its
applicability in scenarios where labeled data is scarce or
difficult to obtain.
 While supervised learning performs well on many tasks, it
often struggles with highly complex or unstructured
problems. For instance, it may have difficulty handling
nuanced patterns, multiple dependencies, or tasks that
involve abstract reasoning, as these typically go beyond
the model’s trained scope.
Unsupervised Learning
● It is a type of machine learning that works with data that
has no labels or categories.
● The main goal is to find patterns and relationships in the
data without any guidance.
● In this approach, the machine analyzes unorganized
information and groups it based on similarities, patterns, or
differences. Unlike supervised learning, there is no teacher:
the model is still trained, but without labeled answers to
learn from.
● The machine must uncover hidden structures in the data on
its own, which lets it discover patterns and information that
were previously undetected.
● For example, unsupervised learning can analyze animal data
and group the animals by their traits and behavior.
Types of Unsupervised Learning
Unsupervised learning is classified into two
categories of algorithms:
● Clustering: A clustering problem is where you
want to discover the inherent groupings in the
data, such as grouping customers by
purchasing behavior.
● Association: An association rule learning
problem is where you want to discover rules
that describe large portions of your data, such
Clustering
● Clustering is a type of unsupervised learning that is
used to group similar data points together.
● Centroid-based algorithms such as K-means work by
iteratively assigning each data point to its nearest cluster
center and then moving each center to the mean of the
points assigned to it. Common clustering algorithms include:
● K-means clustering
● Hierarchical clustering
● Principal Component Analysis (PCA) and Singular Value
Decomposition (SVD) are often listed alongside these, but
they are dimensionality reduction techniques rather than
clustering algorithms; they are commonly applied before
clustering.
Association rule learning
● Association rule learning is a type of unsupervised
learning that is used to identify patterns in data.
Association rule learning algorithms work by finding
relationships between different items in a dataset, for
example items that are frequently bought together.
Some common association rule learning algorithms
include:
● Apriori Algorithm
● Eclat Algorithm
● FP-Growth Algorithm
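The algorithms listed above all rank candidate rules using measures such as support and confidence. The sketch below computes those two measures directly on a tiny set of hypothetical shopping baskets. It is not an implementation of Apriori, Eclat, or FP-Growth, which add efficient search over itemsets, and the function names are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})       # 2 of the 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 baskets with bread
print(s, c)
```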
Applications of Unsupervised learning
● Anomaly detection: Unsupervised learning can identify
unusual patterns or deviations from normal behavior in data,
enabling the detection of fraud, intrusion, or system failures.
● Scientific discovery: Unsupervised learning can uncover
hidden relationships and patterns in scientific data, leading to
new hypotheses and insights in various scientific fields.
● Recommendation systems: Unsupervised learning can
identify patterns and similarities in user behavior and
preferences to recommend products, movies, or music that
align with their interests.
● Customer segmentation: Unsupervised learning can identify
groups of customers with similar characteristics, allowing
Advantages of Unsupervised learning
1. Unsupervised learning doesn’t require data to be labeled,
making it easier and faster to start working with large
datasets.
2. This approach can handle large amounts of data and
reduce it into simpler forms without losing essential patterns,
making it more manageable and efficient.
3. It can uncover patterns and relationships in the data that
were previously unknown, offering valuable insights you may
not have found otherwise.
4. By analyzing unlabeled data, unsupervised learning can
reveal meaningful trends and groupings that help you
Unsupervised Learning
Disadvantages of Unsupervised learning
1. Since there are no labeled answers to compare with, it
can be challenging to gauge how accurate or effective the
model is.
2. The lack of clear guidance can result in less precise
results, particularly for complex tasks.
3. After unsupervised learning groups the data, the user
often needs to review and label these groupings, which
can be time-consuming.
4. Unsupervised learning is easily influenced by missing
Semi-Supervised Machine Learning
 As the name suggests, this method combines supervised and
unsupervised learning.
 The technique relies on using a small amount of labeled data and
a large amount of unlabeled data to train systems.
 First, the labeled data is used to partially train the machine-
learning algorithm.
 After that, the partially trained algorithm labels the unlabeled
data. This process is called pseudo-labeling.
 The model is then re-trained on the resulting data mix without
being explicitly programmed.
 This method's advantage is that it does not require large amounts
of labeled data. This is handy when working with data like long
documents that would be too time-consuming for humans to read
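The pseudo-labeling steps described above can be sketched in a few lines. This example uses a deliberately simple nearest-centroid classifier on one-dimensional data. The classifier, the function names, and the numbers are all assumptions chosen to keep the sketch short; practical pipelines usually also keep only the pseudo-labels the model is confident about.

```python
def centroids(points, labels):
    """Mean of the points in each class (a nearest-centroid 'model')."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda y: abs(x - model[y]))

# Hypothetical 1-D data: a few labeled points, many unlabeled ones
labeled_x = [1.0, 9.0]
labeled_y = ["low", "high"]
unlabeled_x = [0.5, 2.0, 3.0, 7.0, 8.5, 10.0]

# Step 1: partially train on the small labeled set
model = centroids(labeled_x, labeled_y)

# Step 2: pseudo-label the unlabeled data with the partially trained model
pseudo_y = [predict(model, x) for x in unlabeled_x]

# Step 3: retrain on the labeled and pseudo-labeled data combined
model = centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
print(pseudo_y, model)
```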
Machine Learning Modeling Cycle
Regression and Classification algorithms
● Regression and Classification algorithms are Supervised
Learning algorithms.
● Both types of algorithm are used for prediction in machine
learning and work with labeled datasets.
● The difference between them lies in the kind of machine
learning problem each is used for.
The main difference between Regression and Classification
algorithms is that:
● Regression algorithms are used to predict continuous
values such as price, salary, age, etc., and
● Classification algorithms are used to predict/classify
discrete values such as Male or Female, True or False, Spam
Regression:
● Regression is a process of finding the relationships
between dependent and independent variables. It helps
in predicting continuous variables, such as market
trends or house prices.
● The task of the Regression algorithm is to find the
mapping function to map the input variable (x) to the
continuous output variable (y).
● Example: Suppose we want to do weather forecasting,
so for this, we will use the Regression algorithm. In
weather prediction, the model is trained on the past
Types of Regression Algorithms:
Algorithm | Description | Use Case Example
Linear Regression | Models a linear relationship between inputs and output. | Predict house price based on size.
Ridge/Lasso Regression | Variants of linear regression with regularization to prevent overfitting. | Financial forecasting with many features.
Polynomial Regression | Fits a nonlinear curve using polynomial terms. | Modeling growth curves.
Support Vector Regression (SVR) | Uses SVM principles for regression tasks. | Predicting stock prices.
Decision Tree Regression | Splits data into tree-like model for prediction. | Predict customer spending behavior.
Random Forest Regression | Ensemble of decision trees to improve accuracy. | Forecasting weather patterns.
Gradient Boosting (e.g., XGBoost, LightGBM) | Ensemble technique that builds models sequentially to correct errors. | Predicting demand, sales forecasting.
Neural Networks (for Regression) | Deep learning models for complex, nonlinear regressions. | Estimating property values with images and data.
Classification:
● Classification is a process of finding a function which helps
in dividing the dataset into classes based on different
parameters. In Classification, a computer program is
trained on the training dataset and, based on that training,
it categorizes the data into different classes.
● The task of the classification algorithm is to find the
mapping function to map the input (x) to the discrete
output (y).
● Example: The best example to understand the Classification
problem is Email Spam Detection. The model is trained on the
basis of millions of emails on different parameters, and
whenever it receives a new email, it identifies whether the
Types of ML Classification Algorithms:
Algorithm | Description | Use Case Example
Logistic Regression | Models binary or multi-class outcomes using a logistic function. | Email spam detection.
K-Nearest Neighbors (KNN) | Classifies based on the closest training examples. | Handwriting recognition.
Support Vector Machines (SVM) | Finds the best boundary (hyperplane) between classes. | Image classification.
Decision Trees | Tree-based model splitting data by features. | Customer churn prediction.
Random Forest Classifier | Ensemble of decision trees to reduce overfitting. | Loan default classification.
Naive Bayes | Based on Bayes’ theorem, assumes independence between features. | Sentiment analysis, document classification.
Gradient Boosting Classifier (XGBoost, LightGBM) | Ensemble of weak classifiers optimized sequentially. | Fraud detection, medical diagnosis.
Neural Networks (for Classification) | Deep models good for complex, high-dimensional data. | Voice recognition, image tagging.
Regression Types
1. Linear Regression
Description: Models the relationship between a
dependent variable and one or more independent
variables using a straight line.
Example: Predicting house prices based on square
footage.
2. Multiple Linear Regression
Description: Extension of linear regression with two
or more independent variables.
Example: Predicting salary based on experience,
3. Polynomial Regression
Description: Models the relationship as an nth-
degree polynomial. Useful when data shows a non-
linear trend.
Example: Predicting growth over time that
accelerates or decelerates.
4. Ridge Regression (L2 Regularization)
Description: Linear regression with a penalty for
large coefficients to reduce overfitting.
Example: When there’s multicollinearity in the data
5. Lasso Regression (L1 Regularization)
Description: Similar to Ridge, but can shrink some
coefficients to zero, which helps with feature
selection.
Example: When you want a simpler, more
interpretable model.
6. Logistic Regression
Description: Despite its name, it’s used for
classification, not regression. Outputs probabilities
using a logistic (sigmoid) function.
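The logistic (sigmoid) function mentioned in the last item can be sketched as follows. The weights and bias here are hand-picked, hypothetical values rather than parameters learned from data, and the 0.5 threshold is a common default, not a requirement.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical, hand-picked parameters (not learned from data)
weights = [1.5, -2.0]
bias = 0.0

p = predict_proba(weights, bias, [2.0, 0.5])  # z = 3.0 - 1.0 = 2.0
label = 1 if p >= 0.5 else 0                  # threshold the probability
print(p, label)
```

The model's raw output is a probability; the class label only appears once a threshold is applied, which is why logistic regression is used for classification.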
Classification Types
1. Binary Classification
Description: Two classes only (e.g., yes/no, 0/1).
Examples:
 Spam vs. Not Spam, Disease vs. No Disease

2. Multiclass Classification
Description: More than two classes, but only one correct
class per instance.
Examples:
 Digit recognition (0–9), Animal classification (cat, dog,
3. Imbalanced Classification
Description: One class is much more frequent than the others.
Examples:
● Fraud detection (fraud = rare, normal = common)
● Medical diagnosis for rare diseases
4. Multilabel Classification
Description: Each instance can belong to multiple classes at
the same time.
Examples:
● A movie can be tagged as both Action and Comedy
5. Ordinal Classification
Description: Classes have a meaningful order,
but not a numeric difference.
Examples:
 Rating systems (low, medium, high)
 Education levels (high school, college,
postgraduate)
Clustering and Dimensionality Reduction
Clustering
Used to group similar data points when labels are not known
(unsupervised learning).

1. K-Means Clustering
 Divides data into K clusters based on distance (usually
Euclidean).
 Fast & simple, but needs the number of clusters in
advance.

2. Hierarchical Clustering
 Builds a tree (dendrogram) of clusters.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
● Groups points that are closely packed; good at handling
noise and outliers.
● Doesn’t need the number of clusters in advance.
4. Gaussian Mixture Models (GMM)
● Probabilistic model; assumes data is generated from a
mixture of Gaussian distributions.
● More flexible than K-Means for complex shapes.
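As an illustration of the first method in this list, the sketch below runs K-Means (Lloyd's algorithm) on one-dimensional data using absolute distance. The function name, the starting centers, and the data points are assumptions made for this example. Practical implementations work in many dimensions and choose starting centers more carefully.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # keep an empty cluster's center
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data with two obvious groups; K = 2 must be chosen in advance
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

Note that the number of clusters is fixed by the length of the starting `centers` list, which reflects the limitation stated above: K has to be supplied in advance.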
Dimensionality reduction is a technique
that reduces the number of features in a
dataset, which can improve the accuracy and
performance of clustering algorithms.
It can help with the following:
Sparse data: As more features are added, data
can become sparse, making analysis difficult.
Dimensionality reduction can help avoid this
issue.
Accuracy: Dimensionality reduction can
improve accuracy in classification and
clustering by removing noisy or redundant
features.
Computational cost: Fewer features mean less
computation during training and prediction.
Visualization: Reducing data to two or three
dimensions makes it possible to plot.
Storage: Fewer features take less space to
store.
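A minimal sketch of one dimensionality reduction technique, PCA, is shown below: it finds the direction of greatest variance with power iteration and projects each row onto it, reducing two features to one. The function names, the fixed number of iteration steps, and the data are assumptions for illustration. Library implementations typically use a full eigendecomposition or SVD instead.

```python
import math

def first_principal_component(rows, steps=100):
    """Direction of greatest variance, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(steps):
        # Multiply by the covariance matrix, then rescale to unit length
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Reduce each row to a single coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Hypothetical 2-D data lying close to the line y = x
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
means, direction = first_principal_component(data)
reduced = project(data, means, direction)  # 2 features -> 1 feature
print(direction, reduced)
```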
Model evaluation and selection
TP / FP / FN / TN
The counts of true positives (TP), false
positives (FP), false negatives (FN), and true
negatives (TN), obtained by comparing the
model's predictions with the ground-truth
labels, are essential for summarizing model
performance.
These counts are the building blocks of many
other metrics, including accuracy, precision,
and recall.
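The four counts can be obtained by walking through predictions and ground-truth labels side by side. The sketch below does this for a binary task; the function name and the label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1   # predicted positive, actually positive
        elif pred == positive:
            fp += 1   # predicted positive, actually negative
        elif truth == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn

# Hypothetical ground-truth labels and model predictions (1 = positive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```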
Model evaluation is used to measure how well a model performs.
Key Metrics:
1. Classification Metrics:
● Accuracy: (TP + TN) / (TP + FP + FN + TN) – the fraction of
predictions that are correct
● Precision: TP / (TP + FP) – how many predicted positives
are correct
● Recall (Sensitivity): TP / (TP + FN) – how many actual
positives were captured
● F1-Score: 2 × Precision × Recall / (Precision + Recall) –
harmonic mean of precision and recall
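The classification metrics above follow directly from the four confusion-matrix counts. In this sketch the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here for simplicity.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: report 0.0 when a metric is undefined (0 / 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives,
# 4 false negatives, 86 true negatives
accuracy, precision, recall, f1 = classification_metrics(tp=8, fp=2, fn=4, tn=86)
print(accuracy, precision, recall, f1)
```

With these counts accuracy is high while recall is noticeably lower, which is typical when the positive class is rare and is one reason accuracy alone can mislead.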
2. Regression Metrics:
● MSE (Mean Squared Error): Average of squared
differences between predicted and actual values
(punishes large errors)
● RMSE (Root MSE): Square root of MSE; more
interpretable (same units as target)
● MAE (Mean Absolute Error): Average of absolute
errors (more robust to outliers than MSE)
● R² Score (Coefficient of Determination): Proportion
of the variance in the actual values that the
predictions explain; 1 is a perfect fit
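The four regression metrics can be computed together from paired actual and predicted values. The function name and the numbers below are made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, mae, r2

# Hypothetical actual vs. predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

mse, rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(mse, rmse, mae, r2)
```

Here the single error of size 2 contributes 4 of the 6 units of squared error but only 2 of the 4 units of absolute error, which shows why MSE punishes large errors more than MAE.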
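3. Cross-Validation:
● Splits data into training & validation folds to
check that the model generalizes
● k-Fold Cross-Validation is the most common: the data
is split into k folds, and each fold serves once as the
validation set while the other k − 1 folds are used
for training
● Helps detect overfitting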
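The fold bookkeeping behind k-fold cross-validation can be sketched as below. The function name is illustrative, and this version assigns folds in order without shuffling, which is an assumption made to keep the output predictable; shuffling first is common in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# 10 samples, 5 folds: each sample is used for validation exactly once
folds = list(k_fold_indices(10, 5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be trained and scored once per fold, and the k validation scores averaged to estimate how well it generalizes.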
Model Selection
Choosing the best model based on performance, complexity, and
generalization.
Approaches:
1. Train-Test Split
● Simple split (e.g., 80/20) to compare models
2. Grid Search / Random Search
● Tune hyperparameters using cross-validation
● Grid: Exhaustive, slower
● Random: Faster, good for large search space
3. Automated Tools
● AutoML frameworks (e.g., TPOT, H2O, auto-sklearn)
select the best model + hyperparams
4. Bias-Variance Tradeoff
● High Bias → underfitting
● High Variance → overfitting
● Select models that balance both
Note: Always test on unseen (test) data after tuning on
training/validation sets!
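The workflow in the note above (tune on validation data, then test once on unseen data) can be sketched with a small grid search. This example tunes the hyperparameter k of a toy 1-D k-nearest-neighbors classifier against a single validation set rather than full cross-validation. The data, the grid, and the function names are all hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train, data, k):
    """Fraction of `data` classified correctly by k-NN fitted on `train`."""
    correct = sum(1 for x, y in data if knn_predict(train, x, k) == y)
    return correct / len(data)

# Hypothetical (feature, label) pairs, already split three ways;
# (3.4, 1) plays the role of a noisy training point
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.4, 1), (5.0, 1), (6.0, 1), (7.0, 1)]
validation = [(2.5, 0), (3.3, 0), (5.5, 1), (6.5, 1)]
test = [(1.5, 0), (6.8, 1)]

# Grid search: try every candidate value of the hyperparameter k
grid = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in grid}
best_k = max(grid, key=lambda k: scores[k])  # first k with the top score

# Report performance on the untouched test set only after tuning
test_accuracy = accuracy(train, test, best_k)
print(scores, best_k, test_accuracy)
```

With k = 1 the noisy training point causes a validation error (high variance), while larger k smooths it out, which ties the search back to the bias-variance tradeoff listed above.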
THANK YOU
