Machine Learning-Lecture#7-Fall 2020

K-means clustering is an unsupervised learning technique that groups unlabeled data points into a specified number of clusters (K) based on feature similarity. It works by assigning each data point to the cluster with the closest centroid and iteratively updating the centroids until the clusters are stable or the maximum number of iterations is reached. While efficient and easy to apply, K-means has limitations: it is sensitive to initialization and handles clusters of varying shapes, sizes, or densities poorly.

Lecture#7

Unsupervised Learning
(Clustering)

What is Cluster Analysis?
 Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
 Intra-cluster distances are minimized; inter-cluster distances are maximized
Applications of Cluster Analysis
 Understanding
– Group students who succeed and fail in the same exercises
 Summarization
– Reduce the size of large data sets
(Figure on the original slide: clustering precipitation in Australia)
What is not Cluster Analysis?

 Supervised classification
– Have class label information
 Simple segmentation
– Dividing students into different registration groups alphabetically, by last name
Types of Clusterings
A clustering is a set of clusters
Important distinction between hierarchical and partitional sets of clusters

 Partitional Clustering
– A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
 Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
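To make the distinction concrete, here is a small illustrative sketch (not from the slides) that groups the same toy data with a partitional method (K-means) and a hierarchical method (agglomerative clustering); it assumes NumPy and scikit-learn are available, and the data is invented for the example.

# Illustrative sketch: partitional vs. hierarchical clustering of the same data
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
# Three 2-D blobs of points (toy data)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [3, 3], [0, 3])])

# Partitional: each point ends up in exactly one of K flat clusters
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: flat labels obtained by cutting a nested tree of clusters
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print(kmeans_labels[:10], agglo_labels[:10])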
K-means Clustering

 Partitional clustering approach
 Each cluster is associated with a centroid (center point)
 Each point is assigned to the cluster with the closest centroid
 Number of clusters, K, must be specified
 The basic algorithm is very simple (see the sketch below)
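The basic algorithm can be written in a few lines. This is a minimal NumPy sketch under my own naming and convergence choices, not the lecture's reference code:

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Pick K initial centroids (here: K distinct random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assign each point to the cluster with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its assigned points
        #    (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # 4. Stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

Practical implementations also handle empty clusters and usually run several random restarts, keeping the result with the lowest SSE.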
K-means Clustering: Example
The original slides step through the algorithm with figures:
 Original points
 Original points with initial centres
 Clusters after iteration 1
 New centres
 Clusters and new centres after iteration 2
 Clusters and new centres after iteration 3
 Final clusters and centres: a partitional clustering
Two different K-means Clusterings
[Figure: the same original points clustered two different ways, giving an optimal clustering and a sub-optimal clustering.]


Property of K-means

 The Sum of Squared Error (SSE) decreases (or stays the same) after each iteration.
 The final SSE is not necessarily the optimal (globally minimal) one.

SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)

where C_i is the i-th cluster and m_i is its centroid.
Advantages of K-means

 Is efficient.
 Can be computed in a distributed way.
 Is easy to apply.
Limitations of K-means
 How to determine the best K? (A common heuristic is sketched after this list.)
 May give a sub-optimal solution.
 K-means has problems when clusters are of differing
– Sizes
– Densities
– Non-globular shapes
 K-means is sensitive to outliers.
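One common heuristic for choosing K (an addition here, not from the lecture) is the elbow method: run K-means for several values of K and look for the K beyond which the SSE stops dropping sharply. A minimal sketch, assuming scikit-learn, whose inertia_ attribute is the SSE of the fitted model:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Toy data with three well-separated blobs
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(60, 2))
               for c in ([0, 0], [4, 0], [2, 3])])

for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(model.inertia_, 1))
# The "elbow" is where the SSE curve flattens; here it should be around K = 3.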
