Clustering: Prof. Ankur Sinha

This document discusses clustering, an unsupervised machine learning technique that groups unlabeled data points into clusters based on similarity. It gives examples of clustering applications in areas such as marketing and urban planning. It introduces similarity measures for comparing data points, such as Euclidean distance, and works through an example that clusters 10 customers described by age and service usage. It then outlines hierarchical and k-means clustering, describing k-means as iteratively assigning points to centroids and updating the centroids.

Uploaded by

Vibhuti Batra

Clustering

Prof. Ankur Sinha


Indian Institute of Management Ahmedabad
Gujarat, India
Clustering
• Grouping a set of data objects into different groups based on similarity
• An example of unsupervised learning
• Data objects can be vectors representing different attributes of an object, for example a customer, location, or product
Examples
• Used in a variety of areas
– Marketing
– Urban planning
– Customer segmentation
– Product segmentation
– Seismology
Similarity Measure
• If two objects i and j are represented by vectors x_i and x_j
– How do you measure similarity between the two objects?
• Euclidean distance
• Manhattan distance
• Mahalanobis distance
– The similarity measure can be chosen based on the application
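The three distance measures named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are made up for the example, and the Mahalanobis version is restricted to two attributes with a covariance matrix supplied by the caller, which is an assumption made to keep the sketch free of external libraries.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for 2-D vectors.

    cov is a 2x2 covariance matrix [[s11, s12], [s21, s22]] supplied by
    the caller (an assumption of this sketch); it is inverted in closed form.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d0, d1 = x[0] - y[0], x[1] - y[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case.
    q = (s22 * d0 * d0 - (s12 + s21) * d0 * d1 + s11 * d1 * d1) / det
    return math.sqrt(q)

# Two illustrative attribute vectors.
x_i, x_j = (3, 4), (6, 2)
print(euclidean(x_i, x_j))   # square root of 13, about 3.61
print(manhattan(x_i, x_j))   # 5
```

With the identity matrix as covariance, the Mahalanobis distance reduces to the Euclidean distance. A non-identity covariance rescales and decorrelates the attributes before measuring.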
Similarity Measure
• Consider 10 customers with two attributes
– Attribute 1: Recent usage of services
– Attribute 2: Customer age
• Objective: Cluster the data into two classes and design two marketing
campaigns for the two customer segments
[Figure: scatter plot of the 10 customers. Horizontal axis: Usage of Service (× 10 minutes), 0 to 10. Vertical axis: Customer Age (× 10 years), 0 to 10.]
Similarity Measure
• Consider 10 customers with two attributes
– Attribute 1: Usage of services
– Attribute 2: Customer age

[Figure: the same scatter plot, both axes 0 to 10, with the 10 customers split into two clusters.]
• Cluster 1: (3,4), (2,6), (4,5), (4,7), (3,8)
• Cluster 2: (6,2), (7,2), (7,4), (8,4), (8,5)
Clustering approaches
• Hierarchical clustering
– Agglomerative
– Divisive
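[Figure: hierarchy over five objects a, b, c, d, e. Agglomerative clustering (AGNES) runs from Step 0 to Step 4: starting from the single objects, a and b merge into ab, d and e merge into de, c and de merge into cde, and finally ab and cde merge into abcde. Divisive clustering (DIANA) traverses the same hierarchy in the opposite direction, from abcde at its Step 0 down to the single objects at its Step 4.]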
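A minimal sketch of the agglomerative direction follows. The slides do not say how the distance between two clusters is measured, so single linkage (distance between the closest pair of members) is assumed here. The one-dimensional coordinates for a to e are hypothetical values chosen only so that the merges nest as in the hierarchy above.

```python
import math

def agglomerative(points, target_clusters=1):
    """Bottom-up clustering with single linkage (an assumed linkage rule).

    points maps a label to a coordinate tuple.
    Returns the merges in order, each as a pair of label tuples.
    """
    clusters = [(label,) for label in points]
    merges = []

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append((clusters[i], clusters[j]))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical coordinates, not taken from the slides.
points = {"a": (0.0,), "b": (1.0,), "c": (5.0,), "d": (7.0,), "e": (8.5,)}
for left, right in agglomerative(points):
    print("".join(left), "+", "".join(right))
```

A divisive method would start from the full set and split it step by step. It is not sketched here because the slides give no splitting rule.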
Clustering approaches
• K-means Clustering
– Select k initial centroids randomly
– Assign each object to the most similar centroid according to the similarity measure
– Compute each new centroid as the mean of the objects assigned to it
– Repeat the previous two steps until the assignments no longer change
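The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: Euclidean distance is used as the similarity measure, the function name and the `initial_centroids`, `seed`, and `max_iter` parameters are made up for the example, and the data are the ten customer points listed earlier in the slides. Fixed starting centroids are passed in so the run is reproducible. Omitting them falls back to a random choice among the data points.

```python
import math
import random

def kmeans(points, k, initial_centroids=None, seed=0, max_iter=100):
    """Plain k-means using Euclidean distance (an assumed similarity measure)."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random unless starting centroids are given.
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

customers = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8),
             (6, 2), (7, 2), (7, 4), (8, 4), (8, 5)]
centroids, labels = kmeans(customers, k=2, initial_centroids=[(2, 6), (8, 4)])
print(labels)     # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centroids)  # approximately (3.2, 6.0) and (7.2, 3.4)
```

With these starting centroids the first assignment already separates the two groups, so the loop stops on the second pass. Other starting points can take more iterations or settle on a different partition.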
K-Means Clustering
[Figure: six panels showing successive k-means iterations, starting from random centroids.]
• Start with centroids randomly placed
• Assign points to the centroids
• Update centroids
• Assign points to the new centroids
• Update centroids
• Assign points to the new centroids
• Continue until there is no change in the structure of the clusters
