
M3 - Unsupervised Machine Learning

The document discusses unsupervised machine learning, focusing on clustering techniques such as K-means and hierarchical clustering. Clustering groups similar data points into clusters without labeled data, with K-means being a partitional algorithm that requires the user to specify the number of clusters. The document also highlights the strengths and weaknesses of K-means and provides real-life examples of clustering applications in tailoring and marketing.

Uploaded by chenkhoonsg

Unsupervised Machine Learning
3 of 3 modules
Unsupervised Learning
• The data has no target attribute.
• We want to explore the data to find some intrinsic structure in it.
What is Clustering?

Clustering is a technique for finding similarity groups in data, called clusters.
• It groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.
What’s a cluster?
• Intuitive definition: a grouping of data points that are close to each other.
• To make this computer friendly, we need a mathematical definition of “close.”
• Closeness (most common definitions): based on distance or density.
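To make the two notions of “close” concrete, here is a minimal sketch. It assumes Euclidean distance for the distance-based view and a simple count of neighbours within a radius for the density-based view; the function names, the sample points and the radius are illustrative choices, not taken from the slides.

```python
import math


def euclidean(a, b):
    """Distance-based closeness: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbors_within(point, points, radius):
    """Density-based closeness: how many other points lie within `radius`."""
    return sum(1 for q in points if q != point and euclidean(point, q) <= radius)


# Illustrative data: three points near the origin and one far away.
points = [(0, 0), (0, 1), (1, 0), (8, 8)]

d = euclidean((0, 0), (3, 4))                            # 3-4-5 triangle
crowded = neighbors_within((0, 0), points, radius=1.5)   # dense region
isolated = neighbors_within((8, 8), points, radius=1.5)  # sparse region
```

A point in a crowded region has many neighbours inside the radius, while an isolated point has none.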
Clustering as unsupervised learning
• Algorithm: unlabeled data → structured data
• Assignment: new data (unlabeled) → new data included in structure
Think of it like this, in layman's terms:
[Figures]
A Clustering Technique: K-Means Algorithm
K-means is a partitional clustering algorithm

The k-means algorithm partitions the given data into k clusters.

Each cluster has a cluster center, called the centroid. k is specified by the user.
k-means clustering: the algorithm

1. Choose k centroids.
2. Assign each point to the cluster of its nearest centroid.
3. Recompute the centroids.
4. Repeat steps (2) and (3) until the centroids no longer change.
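The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, initial centroids picked at random from the data, and an iteration cap; the slides do not specify any of these. The function name `kmeans` and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: choose k centroids (assumption: k data points picked at random).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Illustrative data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialisation.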
k-means: simple example
[Figures]
In 3D, K-Means looks like this
[Figure]
k-means performance
• Good clustering → points close to cluster centroids
• Within-cluster sum of squares (WCSS)
[Figure: k-means performance; axis: number of clusters (k)]
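A small sketch of how WCSS can be computed, assuming each centroid is the mean of its cluster and distance is Euclidean. The function name `wcss`, the sample points and the hand-written cluster labels are illustrative only.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: total squared distance from each
    point to the centroid (mean) of the cluster it is assigned to."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(
            (x - m) ** 2 for p in members for x, m in zip(p, centroid)
        )
    return total


# Illustrative data: two pairs of nearby points, 10 units apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

tight = wcss(points, [0, 0, 1, 1])      # k=2, nearby points grouped together
loose = wcss(points, [0, 1, 0, 1])      # k=2, distant points grouped together
one_cluster = wcss(points, [0, 0, 0, 0])   # k=1
all_separate = wcss(points, [0, 1, 2, 3])  # k=4, one point per cluster
```

A lower WCSS means points sit closer to their centroids. WCSS also shrinks as k grows and reaches zero when every point is its own cluster, so it is usually read against k rather than minimised outright.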


k-means: adding new data
• Add new data to nearest cluster
• Treat clusters as labeled data
• Use this data to train a classifier
• Apply classifier to new data
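A sketch of both routes, reading the slide as two alternatives: assign the new point directly to its nearest centroid, or treat the cluster ids as labels and train a classifier. The classifier used here is a simple k-nearest-neighbours vote, which is an assumption; the slides do not name a classifier. The centroids and training points are hypothetical stand-ins for the output of an earlier k-means run.

```python
import math
from collections import Counter


def nearest_cluster(point, centroids):
    """Route 1: index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))


def knn_predict(point, train_points, train_labels, n_neighbors=3):
    """Route 2: treat cluster ids as class labels and classify `point`
    by majority vote among its nearest already-clustered points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(point, pair[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical output of an earlier k-means run.
centroids = [(1.0, 1.0), (9.0, 9.0)]
train_points = [(0, 1), (1, 0), (2, 2), (8, 9), (9, 8), (10, 10)]
train_labels = [0, 0, 0, 1, 1, 1]

new_point = (7, 8)
by_centroid = nearest_cluster(new_point, centroids)
by_classifier = knn_predict(new_point, train_points, train_labels)
```

Both routes place the new point in cluster 1 here. They can disagree near cluster boundaries, which is one reason to prefer a trained classifier when clusters are not compact.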


k-means: strengths and weaknesses

Strengths:
• Simple: one parameter (k clusters)
• Typically fast
• Easy to implement

Weaknesses:
• Optimal k is often not obvious
• Sensitive to outliers
• Scaling affects results
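The last weakness can be illustrated with a small sketch. It assumes Euclidean distance and z-score standardisation, which is one common rescaling choice and is not prescribed by the slides. The customer data and function names are hypothetical.

```python
import math
import statistics


def standardize(points):
    """Rescale each feature to mean 0 and standard deviation 1 (z-scores).
    Assumes no feature is constant, otherwise the division would fail."""
    columns = list(zip(*points))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(p, means, stdevs))
        for p in points
    ]


def nearest_to_first(points):
    """Index of the point closest to points[0] (excluding itself)."""
    return min(range(1, len(points)),
               key=lambda i: math.dist(points[0], points[i]))


# Hypothetical customers: (age in years, annual income in dollars).
people = [(25, 50_000), (60, 50_500), (26, 50_800)]

raw_neighbor = nearest_to_first(people)                  # income dominates
scaled_neighbor = nearest_to_first(standardize(people))  # features balanced
```

On the raw data the 25-year-old is closest to the 60-year-old, because a $500 income gap outweighs a 35-year age gap. After standardisation the closest customer is the 26-year-old. Since k-means assigns points by distance, the same shift changes which clusters form.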
Clustering - Real life Examples

Example 1: Group people of similar sizes together to make “small”, “medium” and “large” T-shirts.
• Tailor-made for each person: too expensive.
• One-size-fits-all: does not fit all.

Example 2: In marketing, segment customers according to their similarities in order to do targeted marketing.
Additional Reading:
Hierarchical Clustering
Hierarchical clustering is a popular unsupervised learning technique used to group similar data points into clusters.
Similar (close) data pairs are merged into clusters iteratively.
This merging continues, closest pairs first, until a stopping criterion (e.g. “three clusters”) is reached.
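The merging procedure can be sketched as follows. This is a minimal bottom-up version that assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members). The slides do not specify a linkage rule, and the function name `agglomerative` is hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Cluster-to-cluster distance is single linkage (closest pair of
    members); this is an assumption, not something the source specifies.
    """
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair...
        del clusters[j]                  # ...and drop the absorbed cluster
    return clusters


# Illustrative data: three pairs of nearby points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
groups = agglomerative(points, n_clusters=3)  # stopping criterion: 3 clusters
```

With the stopping criterion set to three clusters, each pair of nearby points ends up in its own group.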
Two types of hierarchical clustering:
• Agglomerative
• Divisive
Hierarchical clusters result in dendrograms (“tree graphs”).
How many clusters to set as our criterion? “Prune” the dendrogram at the appropriate level.
Applications of HC

• Image Segmentation
• Gene Expression Analysis
What have you learned?

4/5/2023
Thank you!
I welcome your questions.
