What’s in it for you?
Clustering
What is Clustering?
K-Means Clustering
Flowchart to understand K-Means Clustering
Clustering of cars based on brands
Logistic Regression
What is Logistic Regression?
Logistic Regression Curve & Sigmoid function
Classify whether a tumor is malignant or benign based on features
Clustering
Suppose we have a pile of books of different genres.
Now, we divide them into different groups, such as Fiction, Horror, and Educational.
Organizing objects into groups based on their similarity is clustering!
K-Means Clustering
K-Means Clustering is an example of unsupervised learning.
It is used when you have unlabeled data, to find clusters in the data based on feature similarity.
Steps for K-Means
Suppose we have these data points and we want to assign them to clusters.
STEP 1: Initialize Cluster Centroids
We pick the number of clusters, ‘K’, and assign a random centroid to each cluster. Then, we compute the distance from each object to each centroid.
STEP 2: Compute Minimum Distance
Now, we form new clusters by assigning each object to the centroid at minimum distance, and recalculate the centroid of each cluster.
STEP 3: Assign Points to New Clusters
Repeat the previous two steps (compute distances, then reassign points and recalculate centroids) until the cluster centroids stop changing position.
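The three steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the choice to draw the random initial centroids from the data points themselves are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # STEP 1: pick K distinct data points as the initial centroids
    # (assumption: the slides only say the centroids are random).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # STEP 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # STEP 3: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping matters, not the label values.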
Shall we see a flowchart to understand?
Flowchart to understand K-Means
START
Choose K (Elbow Method)
Assign random centroids to clusters
Compute distance from objects to centroids
Form new clusters based on minimum distance and calculate their centroids
Compute distance from objects to new centroids
Repeat the last two steps until no observations change groups
Let’s see an example!
K-Means Algorithm
Suppose we have this dataset of 7 individuals and their scores on two topics (A and B):
Subject  A    B
1        1    1
2        1.5  2
3        3    4
4        5    7
5        3.5  5
6        4.5  5
7        3.5  4.5
Now, let’s take the two farthest-apart points (individuals 1 and 4) as the initial cluster centroids.
Each point is then assigned to the closest cluster, based on its distance from the centroids.
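A short sketch of this initialization on the 7 individuals from the table above. One detail is an assumption: individual 3 is exactly equidistant from the two initial centroids, so the code sends ties to Cluster 1. The variable names are illustrative.

```python
import math
from itertools import combinations

# Scores of the 7 individuals on topics A and B.
scores = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

# The two farthest-apart individuals become the initial centroids.
first, second = max(
    combinations(scores, 2),
    key=lambda pair: math.dist(scores[pair[0]], scores[pair[1]]),
)
centroids = [scores[first], scores[second]]

# Assign each individual to the closest centroid
# (assumption: a tie goes to Cluster 1).
clusters = {1: [], 2: []}
for person, point in scores.items():
    d1 = math.dist(point, centroids[0])
    d2 = math.dist(point, centroids[1])
    clusters[1 if d1 <= d2 else 2].append(person)

print(first, second)  # 1 4
print(clusters)       # {1: [1, 2, 3], 2: [4, 5, 6, 7]}
```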
Now, we again calculate the centroid of each cluster:
Cluster    Individuals   Mean vector (centroid)
Cluster 1  1, 2, 3       (1.8, 2.3)
Cluster 2  4, 5, 6, 7    (4.1, 5.4)
We compare each individual’s distance to its own cluster mean and to that of the opposite cluster, using the Euclidean distance between each point and the mean. We find:
Individual  Distance to mean (centroid) of Cluster 1  Distance to mean (centroid) of Cluster 2
1           1.5                                       5.4
2           0.4                                       4.3
3           2.1                                       1.8
4           5.7                                       1.8
5           3.2                                       0.7
6           3.8                                       0.6
7           2.8                                       1.1
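The distance table can be reproduced with a few lines of Python. This sketch assumes the table was computed from the rounded centroids (1.8, 2.3) and (4.1, 5.4) rather than the unrounded means; the names `scores`, `membership`, and `movers` are illustrative.

```python
import math

scores = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

# Rounded centroids of the current clusters (assumption: as shown on the slide).
centroid_1 = (1.8, 2.3)
centroid_2 = (4.1, 5.4)

# Current cluster of each individual.
membership = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2}

# Euclidean distance of every individual to both centroids, to one decimal.
table = {
    person: (round(math.dist(p, centroid_1), 1), round(math.dist(p, centroid_2), 1))
    for person, p in scores.items()
}
for person, (d1, d2) in table.items():
    print(person, d1, d2)

# Individuals that are nearer to the opposite cluster's centroid than to their own.
movers = [
    person for person, p in scores.items()
    if (math.dist(p, centroid_1) < math.dist(p, centroid_2)) != (membership[person] == 1)
]
print(movers)  # [3]
```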
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than to that of its own (Cluster 1), so we move point 3 to the new cluster.
Thus, individual 3 is relocated to Cluster 2, resulting in the new partition: Cluster 1 contains individuals 1 and 2, and Cluster 2 contains individuals 3, 4, 5, 6, and 7.
For the new clusters, we find the actual cluster centroids:
Cluster    Individuals     Mean vector (centroid)
Cluster 1  1, 2            (1.25, 1.5)
Cluster 2  3, 4, 5, 6, 7   (3.9, 5.1)
On comparing each individual’s distance to its own cluster mean and to that of the opposite cluster, we find that the data points are stable, hence we have our final clusters!
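A quick check of this last step, as a sketch: it recomputes the centroids for the partition obtained once individual 3 has moved to Cluster 2, then confirms that every individual is nearest to its own centroid. The helper `mean_vector` and the variable names are illustrative.

```python
import math

scores = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

# Partition after individual 3 has moved to Cluster 2.
partition = {1: [1, 2], 2: [3, 4, 5, 6, 7]}


def mean_vector(people):
    """Centroid (mean score on A and B) of a group of individuals."""
    pts = [scores[p] for p in people]
    return tuple(sum(c) / len(pts) for c in zip(*pts))


centroids = {cluster: mean_vector(members) for cluster, members in partition.items()}
print(centroids)

# The partition is stable when every individual is nearest to its own centroid.
stable = all(
    min(centroids, key=lambda c: math.dist(scores[p], centroids[c])) == own
    for own, members in partition.items()
    for p in members
)
print(stable)  # True
```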
To find the appropriate number of clusters in a dataset, we use the elbow method.
Within sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid.
Finding the optimal number of clusters using the elbow of the graph of WSS against the number of clusters is called the elbow method.
(Figure: WSS plotted against the number of clusters, with the elbow point marked.)
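To make WSS concrete, the sketch below computes it for three partitions of the 7-individual dataset: one cluster, the two final clusters from the worked example, and seven single-point clusters. The function name `wss` is illustrative, and in practice the partition for each K would come from running K-means with that K.

```python
import math

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]


def wss(clusters):
    """Sum of squared distances from every point to its cluster centroid."""
    total = 0.0
    for members in clusters:
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


partitions = {
    1: [points],                    # everything in one cluster
    2: [points[:2], points[2:]],    # final clusters from the worked example
    7: [[p] for p in points],       # every point is its own cluster
}
wss_by_k = {k: wss(clusters) for k, clusters in partitions.items()}
for k, value in wss_by_k.items():
    print(k, round(value, 1))
# 1 37.1
# 2 8.5
# 7 0.0
```

WSS always reaches 0 when every point is its own cluster, which is why the method looks for the bend in the curve rather than its minimum.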
Use Case
Using K-means clustering to cluster cars into brands, using parameters such as horsepower, cubic inches, make year, etc.
Dataset: cars data with information about 3 brands of cars, namely Toyota, Honda, and Nissan
Logistic Regression
Now, let’s look into Logistic Regression
Logistic regression is one of the simplest classification algorithms, used for binary or multi-class classification problems.
In the previous tutorial, we learnt about Linear Regression and about dependent and independent variables.
To brush up: y = mx + c
The dependent variable (y) is the target class variable we are going to predict.
The independent variables (x1…xn) are the features or attributes we are going to use to predict the target class.
(Figure: marks, from 0 to 100, plotted against the number of hours studied, with a linear regression line.)
We know what a linear regression looks like, but using this graph we cannot divide the outcome into categories.
For example, a linear regression graph can tell us that as the number of hours studied increases, the marks of a student will increase. But it will not tell us whether the student will pass or not!
In such cases, where we need the output as a categorical value, we will use logistic regression!
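Sigmoid Function
Starting from the linear model $y = m x + c$, the sigmoid function turns y into a probability p between 0 and 1:
$p = \frac{1}{1 + e^{-y}}$
Equivalently, the log-odds of p are linear in x:
$\ln\left(\frac{p}{1 - p}\right) = m x + c$
(Figure: the straight line of marks, from 0 to 100, against the number of hours studied, next to the sigmoid curve, whose output ranges from 0 to 1.)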
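The two formulas can be checked numerically. In this sketch the slope `m` and intercept `c` are hypothetical values chosen only for illustration, and the function names `sigmoid` and `log_odds` are not from the slides.

```python
import math


def sigmoid(y):
    """p = 1 / (1 + e^(-y)): squashes any real y into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def log_odds(p):
    """ln(p / (1 - p)): the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


m, c = 1.5, -6.0  # hypothetical slope and intercept
for hours in (2, 4, 6):
    y = m * hours + c
    p = sigmoid(y)
    # Applying log_odds to p recovers the linear value y.
    print(hours, round(p, 3), round(log_odds(p), 3))

print(sigmoid(0.0))  # 0.5
```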
(Figure: logistic regression curve of probability, from 0 to 1, with the threshold value marked and two example probabilities, 0.30 and 0.82.)
Threshold value
Probability > 0.50: the value is rounded off to 1, indicating that the student will pass.
Probability < 0.50: the value is rounded off to 0, indicating that the student will fail.
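The threshold rule is a one-line function. A minimal sketch, with one assumption: the slides do not say what happens at exactly 0.50, so this version rounds it down to 0. The name `classify` is illustrative.

```python
def classify(probability, threshold=0.5):
    """Return 1 (pass) when the probability exceeds the threshold, else 0 (fail)."""
    # Assumption: a probability exactly equal to the threshold maps to 0.
    return 1 if probability > threshold else 0


print(classify(0.82))  # 1, the student will pass
print(classify(0.30))  # 0, the student will fail
```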
Use Case
Problem statement: To classify whether a tumor is ‘malignant’ or ‘benign’
So, this model is able to predict the type of tumor with 91% accuracy!
Finally, let’s discuss the answers to the quiz asked in Machine Learning Tutorial Part-1
What do you understand from Measures and Dimensions?
Each field from the data source is automatically assigned a
datatype (such as string, integer) and a role (dimension or
measure)
Aggregation applied on measures is ‘Sum’ by default but you
can always change the default aggregation in the settings
Can you tell what’s happening in the following cases?
A. Grouping documents into different categories based on the topic and content of each document
“This is an example of Clustering, where K-means clustering can be used to group the documents by topic using a bag-of-words approach”
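As a rough illustration of the bag-of-words idea, the sketch below turns a few documents into word-count vectors, which is the kind of numeric input K-means needs. The three sentences are made up for the example, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

# Hypothetical documents, invented for illustration only.
docs = [
    "the team won the match",
    "the match ended in a draw",
    "stocks fell as markets closed",
]

# Vocabulary: every distinct word across all documents, in sorted order.
vocab = sorted({word for doc in docs for word in doc.lower().split()})

# Bag-of-words: one count per vocabulary word, for each document.
vectors = []
for doc in docs:
    counts = Counter(doc.lower().split())
    vectors.append([counts[word] for word in vocab])

for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Each row has one entry per vocabulary word, so documents that share many words end up close together and can be grouped by a clustering algorithm.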
B. Identifying hand-written digits in images correctly
“This is an example of Classification. The traditional approach to solving this would be to extract digit-dependent features, like the curvature of different digits, and then use a classifier like SVM to distinguish between images”
C. Behavior of a website indicating that the site is not working as designed
“This is an example of Anomaly Detection. In this case, the algorithm learns what is "normal" and what is "not normal", usually by observing the logs of the website”
D. Predicting the salary of an individual based on his/her years of experience
“This is an example of Regression. This problem can be mathematically defined as a function between the independent variable (years of experience) and the dependent variable (salary of an individual)”
Summary
What is K-Means
Elbow Method to choose K
Clustering cars with K-means
What is logistic regression
Classifying tumor with logistic regression
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners Part - 2 | Simplilearn
