M3 - Unsupervised Machine Learning
Machine Learning
3 of 3 modules
Unsupervised Learning
The data has no target attribute.
We want to explore the data to find some intrinsic structure in it.
What is Clustering?
Clustering
Clustering is a technique for finding similarity groups in data, called clusters.
I.e.,
• It groups data instances that are similar to (near) each other in one cluster
and data instances that are very different (far away) from each other into
different clusters.
Intuitive definition: grouping of data points that are close to each other
[Figure: an algorithm turns unlabeled data into structured data; an assignment step then includes new (unlabeled) data in that structure]
Think of it like this, in layman's terms (figures)
A Clustering Technique
K-Means Algorithm
K-means is a partitional clustering algorithm.
Each cluster has a cluster center, called the centroid.
k, the number of clusters, is specified by the user.
k-means clustering: the algorithm
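• (1) Choose k centroids (initial seeds)
• (2) Assign each data point to the closest centroid
• (3) Recompute centroids using the current cluster memberships
• (4) Repeat steps (2) and (3) until there is no more change to the centroids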
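The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`k_means`, `squared_distance`, `mean_point`), the random choice of initial centroids, the `max_iter` safety cap, and the use of squared Euclidean distance are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Hypothetical helper: returns (centroids, clusters) for a list of tuples."""
    rng = random.Random(seed)
    # Choose k centroids: here, k distinct data points picked at random (an assumption).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assign each data point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    centroids, clusters = k_means(data, k=2)
    print(centroids)
    print(clusters)
```

Different initial centroids can lead to different final clusters on less clearly separated data, so in practice the algorithm is often run several times with different seeds.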
k-means: simple example
In 3D, K-Means looks like this
k-means performance
• Good clustering: points are close to their cluster centroids
• Measure: within-cluster sum of squares (WCSS)
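As a rough illustration of the measure, the within-cluster sum of squares adds up the squared distance from every point to the centroid of its own cluster; lower values mean tighter clusters. The function name `wcss` and the data layout (a list of clusters plus a matching list of centroids) are assumptions made for this sketch.

```python
def wcss(clusters, centroids):
    """Within-cluster sum of squares.

    clusters:  list of clusters, each a list of points (tuples)
    centroids: list of centroids, one per cluster, in the same order
    """
    total = 0.0
    for cluster, centroid in zip(clusters, centroids):
        for point in cluster:
            # Squared Euclidean distance from the point to its own centroid.
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


if __name__ == "__main__":
    clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
    centroids = [(1.0, 0.0), (10.0, 10.0)]
    print(wcss(clusters, centroids))
```

Comparing this value across several choices of k is one common way to judge how many clusters are reasonable.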
k-means performance: adding new data
Use this data to train a classifier
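One way to read this: once the clustering has given every point a cluster label, those labels can serve as training targets, and a classifier can then label new, unseen points. The slides do not name a classifier, so the k-nearest-neighbour vote below is purely an assumed choice for illustration, and `knn_predict` is a hypothetical helper.

```python
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points.

    train_labels are assumed to be the cluster indices produced by a
    previous clustering run.
    """
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], new_point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    cluster_labels = [0, 0, 0, 1, 1, 1]  # assumed output of a clustering step
    print(knn_predict(points, cluster_labels, (2.0, 2.0)))
    print(knn_predict(points, cluster_labels, (8.0, 9.0)))
```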
k-means: strengths and weaknesses
Weaknesses:
• Optimal k is often not obvious
• Sensitive to outliers
• Scaling affects results
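To see why scaling matters: k-means relies on distances, so a feature measured in large units (say, income in dollars) can dominate a feature measured in small units (say, height in metres). A common remedy, sketched here under the assumption that z-score standardization is acceptable for the data, is to rescale every feature to mean 0 and standard deviation 1 before clustering. The helper name `standardize` is hypothetical.

```python
import statistics


def standardize(points):
    """Rescale each feature (column) to mean 0 and population std 1.

    A constant column (std 0) is mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(p, means, stdevs))
        for p in points
    ]


if __name__ == "__main__":
    # Second feature has much larger raw values than the first.
    raw = [(1.0, 1000.0), (3.0, 1010.0), (1.1, 1020.0)]
    for before, after in zip(raw, standardize(raw)):
        print(before, "->", after)
```

After rescaling, both features contribute on a comparable scale, which can change which points count as "close" and therefore which clusters are found.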
Clustering - Real life Examples
Example 1: group people of similar sizes together to make “small”, “medium” and “large” T-shirts.
• Tailor-made for each person: too expensive
• One-size-fits-all: does not fit all.
Example 2: group customers with similar characteristics to do targeted marketing.
Additional Reading:
Hierarchical Clustering
Hierarchical clustering is a popular unsupervised learning technique
used to group similar data points into clusters.
Similar (close) data pairs are MERGED together into clusters, one iteration at a time.
This merging continues, closest pairs first, until a stopping criterion (e.g. “three clusters”) is reached.
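The merging procedure can be sketched as follows. This is a minimal illustration under stated assumptions: every point starts as its own cluster, the distance between two clusters is taken as the distance between their closest members (single linkage, one choice among several), and the stopping criterion is a target number of clusters. The names `agglomerative` and `single_linkage` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_linkage(c1, c2):
    """Cluster distance = distance between the two closest members (an assumption)."""
    return min(squared_distance(a, b) for a in c1 for b in c2)


def agglomerative(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
    print(agglomerative(data, n_clusters=3))
```

This brute-force version recomputes all pairwise cluster distances on every merge, which is fine for a handful of points but slow for large datasets.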
Two types of hierarchical clustering:
• Agglomerative (bottom-up: start with single points and merge)
• Divisive (top-down: start with one cluster and split)
Hierarchical clustering results in DENDROGRAMS (“tree graphs”).
How many clusters to set as our criterion? “Prune” the dendrogram at the appropriate level.
Applications of HC
Image Segmentation
Gene Expression Analysis
What have you learned?
4/5/2023 34
Thank you!
I welcome your questions.