Lecture-11 Cluster Analysis-1
Cluster Analysis
Background
Clustering:
Finding groups of objects (e.g. patients) such that the
objects in a group are similar (or related) to one another
and different from the objects in other groups.
[Figure: two groups of points — intra-cluster distances are minimized, inter-cluster distances are maximized]
Background
Cluster Analysis:
▪ In statistics, the search for relatively homogeneous
groups of objects is called cluster analysis.
In general,
▪ It is a class of techniques used to classify observations
into groups or clusters such that:
✓ Each group or cluster is homogeneous (or compact)
with respect to certain characteristics.
✓ Each group should be different from other groups
with respect to the same characteristics.
Application
▪ In Psychiatry
✓ Clustering is frequently used on patients to form
homogeneous sub-groups using variables (e.g. cognitions)
that help identify the disease severity.
▪ In the field of Biology
✓ To classify animals into classes, orders, and families.
▪ In Agriculture
✓ The fertility of land in a region may not be
homogeneous for every type of crop.
✓ Pieces of land sharing similar fertility for a
particular crop may then be grouped together.
▪ In Economics
✓ People of a city center may be grouped together according
to their socio-economic condition.
Data Reduction Techniques
▪ Cluster Analysis
✓ Reduces the number of sample observations.
✓ A data-reduction technique that operates on the rows of the data matrix.
✓ Identifies homogeneous groups or clusters.
▪ Discriminant Analysis
✓ Similar in that it also classifies observations into groups.
✓ But it derives a rule for allocating an object to its proper
population based on prior information about the group
membership of objects.
Basic Steps of Cluster Analysis
Clustering depends on
➢ Choice of clustering algorithms
– Hierarchical clustering
– K-means clustering
➢ Choice of distance
– Euclidean distance
– Minkowski distance
– Manhattan distance
– etc.
➢ Choice of variables
➢ Standardization
➢ The number of clusters
Distances or similarity measures
➢ Distances
▪ Minkowski distance
If x = (x₁, x₂, …, xₙ) and y = (y₁, y₂, …, yₙ) are two points
in Euclidean n-space, then the Minkowski distance of order p from x to y is

$d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

When p = 2, it is the Euclidean distance, and
when p = 1, it is the Manhattan distance.
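The three distances above can be computed with a single function. A minimal sketch in plain Python; the points and values are made-up for illustration:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between points x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(minkowski(x, y, 2))  # p = 2, Euclidean: sqrt(3^2 + 4^2) = 5.0
print(minkowski(x, y, 1))  # p = 1, Manhattan: 3 + 4 = 7.0
```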
Distances
▪ Euclidean distance
$d_{xy} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}$
Ex: The body length (in mm) and body weight (in gm) of 5 randomly
selected slugs are following:
Subject 1 2 3 4 5
Length 35 35 38 35 39
Weight 1.3 4.0 3.2 1.0 1.4
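The pairwise Euclidean distances for the slug data can be tabulated directly. A sketch in plain Python (`math.dist` requires Python 3.8+); note that length, measured in mm, dominates weight, measured in g, which is one motivation for the standardization step listed earlier:

```python
import math

length = [35, 35, 38, 35, 39]       # body length (mm)
weight = [1.3, 4.0, 3.2, 1.0, 1.4]  # body weight (g)
slugs = list(zip(length, weight))

# Pairwise Euclidean distance matrix for the five slugs
for s in slugs:
    print([round(math.dist(s, t), 2) for t in slugs])
```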
▪ Pearson’s correlation
$S_{ij} = \dfrac{\mathrm{Cov}(x_i, y_j)}{\sqrt{\mathrm{Var}(x_i)\,\mathrm{Var}(y_j)}}$
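Pearson's correlation is a similarity rather than a distance (values near 1 mean similar profiles); one common conversion to a distance is 1 − r. A minimal sketch in plain Python with made-up vectors:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(r)      # proportional profiles: r = 1.0
print(1 - r)  # corresponding distance: 0.0
```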
Hierarchical Clustering
➢ Agglomerative algorithm:
– Starts with each object as a separate cluster
– Combines objects into clusters that are closest
– Ends with one cluster with all objects
– Once a cluster is formed, it cannot be split
➢ Divisive algorithm:
– The opposite of the agglomerative method: starts with a
single cluster containing all objects and successively splits it
Complete Linkage or Furthest Neighbor Method
▪ The distance between two clusters is the largest distance
between a subject in one cluster and a subject in the other.
Average Linkage
▪ The distance between two clusters is obtained by taking
the average distance between all pairs of subjects in the
two clusters.
$d_{AB} = \dfrac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} d(u_i, v_j)$

where $u_i \in A$ and $v_j \in B$ for all $i = 1, \dots, n_A$ and $j = 1, \dots, n_B$.
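The average-linkage formula translates directly into code. A sketch in plain Python on a made-up one-dimensional example:

```python
def average_linkage(dist, A, B):
    """Mean of all pairwise distances between clusters A and B (lists of indices)."""
    return sum(dist[i][j] for i in A for j in B) / (len(A) * len(B))

# Toy 1-D points 0, 4, and 10; clusters A = {0, 4}, B = {10}
points = [0, 4, 10]
dist = [[abs(p - q) for q in points] for p in points]
print(average_linkage(dist, [0, 1], [2]))  # (10 + 6) / 2 = 8.0
```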
Ward’s Minimum Variance
▪ At each step, the two clusters whose merger produces the
smallest increase in the total within-cluster sum of squares
are joined.
Dendrogram
[Figure: dendrogram with the objects along the horizontal axis]
Example of forming Hierarchical Clustering
Example 7.1: The following data represent the number of children ever born
(x1) and the number of children desired (x2) of 5 mothers residing in some parts
of north-eastern Libya:

Variables/Mothers   M1   M2   M3   M4   M5
x1                   7    8    6    3   11
x2                  10   10    5    2   10

Find the Euclidean distance matrix d and represent the clustering by a
dendrogram.
Solution:
The Euclidean distance matrix is

           M1      M2      M3      M4      M5
    M1    0.00    1.00    5.10    8.94    4.00
    M2    1.00    0.00    5.39    9.43    3.00
d = M3    5.10    5.39    0.00    4.24    7.07
    M4    8.94    9.43    4.24    0.00   11.31
    M5    4.00    3.00    7.07   11.31    0.00
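The matrix can be reproduced in a few lines, as a quick check. A sketch in plain Python using `math.dist` (Python 3.8+):

```python
import math

x1 = [7, 8, 6, 3, 11]    # children ever born
x2 = [10, 10, 5, 2, 10]  # children desired
mothers = list(zip(x1, x2))

d = [[round(math.dist(p, q), 2) for q in mothers] for p in mothers]
for row in d:
    print(row)
# First row (distances from M1): [0.0, 1.0, 5.1, 8.94, 4.0]
```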
Solution (1)
Step 1: The smallest entry of d is d₁₂ = 1.00, so mothers M1 and M2 are merged
first. The distances from cluster (1,2) to the remaining subjects are then
recomputed by average linkage, e.g. d₍₁,₂₎₃ = (5.10 + 5.39)/2 = 5.245.
The matrix d₂ is

            (1,2)     3       4       5
     (1,2)  0.00    5.245   9.185    3.50
d₂ =     3  5.245   0.00    4.24     7.07
         4  9.185   4.24    0.00    11.31
         5  3.50    7.07   11.31     0.00
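Each entry of d₂ involving the merged cluster is an average of two entries of d. A quick check in plain Python:

```python
d13, d23 = 5.10, 5.39
d14, d24 = 8.94, 9.43
d15, d25 = 4.00, 3.00

print(round((d13 + d23) / 2, 3))  # 5.245
print(round((d14 + d24) / 2, 3))  # 9.185
print(round((d15 + d25) / 2, 3))  # 3.5
```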
Solution (2)
Final Step-4: The two clusters are linked to form a single cluster of all objects.
The distance between the two clusters is
$d_{(1,2,5)(3,4)} = \dfrac{1}{3 \times 2}\left(d_{13} + d_{23} + d_{53} + d_{14} + d_{24} + d_{54}\right) = 7.87$
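The whole agglomeration can be replayed from the raw data. A minimal from-scratch sketch in plain Python, assuming average linkage as in the steps above; the last merge reproduces the 7.87:

```python
import math
from itertools import combinations

x1 = [7, 8, 6, 3, 11]
x2 = [10, 10, 5, 2, 10]
mothers = list(zip(x1, x2))

def avg_link(A, B):
    """Average-linkage distance between clusters A and B (lists of indices)."""
    total = sum(math.dist(mothers[i], mothers[j]) for i in A for j in B)
    return total / (len(A) * len(B))

clusters = [[i] for i in range(len(mothers))]
heights = []  # fusion level of each merge
while len(clusters) > 1:
    # Merge the pair of clusters with the smallest average-linkage distance
    A, B = min(combinations(clusters, 2), key=lambda ab: avg_link(*ab))
    heights.append(avg_link(A, B))
    print(f"merge {A} + {B} at distance {heights[-1]:.2f}")
    clusters = [c for c in clusters if c is not A and c is not B] + [A + B]
```

The four printed fusion levels (1.00, 3.50, 4.24, 7.87) are the heights at which the branches of the dendrogram join.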
Solution (3)
K-Means Clustering
▪ A non-hierarchical method that partitions the observations
into a pre-specified number K of clusters:
✓ Choose K initial cluster centroids.
✓ Assign each observation to the cluster with the nearest centroid.
✓ Recompute each centroid as the mean of the observations assigned to it.
✓ Repeat the last two steps until the cluster assignments no longer change.
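The alternating assign/update loop of K-means can be sketched from scratch in plain Python (the 2-D points here are made-up; a real analysis would normally use a library implementation):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K initial centroids
    for _ in range(iters):
        # Assignment: each point joins the cluster of its nearest centroid
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[i].append(p)
        # Update: recompute each centroid as the mean of its group
        centroids = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, groups = kmeans(pts, k=2)
print(sorted(len(g) for g in groups))  # two well-separated groups of 3
```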
Thank You!!