Machine Learning-Lecture#7-Fall 2020
Machine Learning-Lecture#7-Fall 2020
Unsupervised Learning
(Clustering)
1
What is Cluster Analysis?
Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different
from (or unrelated to) the objects in other groups
Inter-cluster
Intra-cluster distances are
distances are maximized
minimized
Applications of Cluster Analysis
Understanding
– Group students who
succeed and fails in
the same exercises
Summarization
– Reduce the size of
large data sets
Clustering precipitation
in Australia
What is not Cluster Analysis?
Supervised classification
– Have class label information
Simple segmentation
– Dividing students into different registration groups
alphabetically, by last name
Types of Clusterings
A clustering is a set of clusters
Partitional Clustering
– A division data objects into non-overlapping subsets
(clusters) such that each data object is in exactly
one subset
Hierarchical clustering
– A set of nested clusters organized as a hierarchical
tree
K-means Clustering
Original Points
Partitional Clustering
2.5
1.5
Original Points
y
1
0.5
3 3
2.5 2 .5
2 2
1.5 1 .5
y
y
1 1
0.5 0 .5
0 0
K
SSE (mi , x)
dist 2
i 1 xCi
Advantages of K-means
Is efficient.
Can be computed in a distributive way.
Is easy to apply.
Limitations of K-means
How to determine the best K?
May give a sub-optimal solution.
K.means has problems when clusters are of
differing
– Sizes
– Densities
– Non-globular shapes
K-means is sensible to outliers.