Cluster Analysis
April 2010
In terms of building prediction and classification models, cluster analysis can help
the analyst identify groups of observations, defined in terms of the input variables, that in
turn can lead to different models for each group. This is, of course, assuming that the output
relationships vis-à-vis the input variables are not the same across the groups. One can always
test the "poolability" of the models, either by conventional hypothesis tests when considering
econometric models, or by comparing accuracy measures across validation and test data partitions
when considering machine learning models.
Hierarchical Clustering
With respect to hierarchical clustering, the final clusters chosen are built in a
series of steps. If we start with N objects, each being in its own separate cluster, then
combine one of the clusters with another cluster resulting in N - 1 clusters, and continue
to combine clusters into fewer and fewer clusters with more and more objects in each
cluster, we are engaging in Agglomerative clustering. In contrast, if we start with all of
the objects being in a single cluster, then remove one of the objects to form a second
cluster, and continue to build more and more clusters with fewer and fewer objects in
each cluster until each object is in its own cluster, we are engaging in Divisive
clustering. The distinction between these two hierarchical methods is represented in the
figure below, taken from the XLMINER help file.
Figure 1
Hierarchical Clustering:
Agglomerative versus Divisive Methods
The above figure is called a dendrogram and represents the fusions or divisions made at
each successive stage of the analysis. More formally then, a dendrogram is a tree-like
diagram that summarizes the process of clustering.
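For readers who want to experiment outside XLMINER, the short Python sketch below (an illustration only, using the SciPy and Matplotlib libraries and a small made-up data matrix rather than any data set discussed here) runs agglomerative clustering and draws the resulting dendrogram.

```python
# A minimal sketch of agglomerative clustering with SciPy (illustrative data only).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Each row is an object (case); each column is a variable.
X = np.array([[1.0, 2.0],
              [1.2, 1.9],
              [5.0, 6.5],
              [5.2, 6.4],
              [9.0, 1.0]])

# linkage() starts with every object in its own cluster and merges the two
# closest clusters at each step -- i.e., agglomerative clustering.
Z = linkage(X, method="average", metric="euclidean")

# The dendrogram summarizes the sequence of fusions and the distance at which
# each fusion occurred.
dendrogram(Z, labels=[f"obj {i+1}" for i in range(len(X))])
plt.ylabel("Distance")
plt.show()
```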
Distance Measures Used in Clustering
One general measure of the distance between two objects (cases) $i$ and $j$, each measured on $p$ variables, is the weighted Euclidean distance

$$ d_{ij}^{*} = \sqrt{\, w_1 (x_{i1} - x_{j1})^2 + w_2 (x_{i2} - x_{j2})^2 + \cdots + w_p (x_{ip} - x_{jp})^2 \,} \qquad (2) $$

where the weights $w_1, w_2, \ldots, w_p$ satisfy $w_i \ge 0$ and $\sum_{i=1}^{p} w_i = 1$. For the
remaining discussion let us focus on the Euclidean measure of distance between
objects (cases).
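As a quick illustration of equation (2), the following sketch (made-up vectors and weights, purely for illustration) computes the weighted Euclidean distance directly from the formula.

```python
# Weighted Euclidean distance between two objects, mirroring equation (2).
import numpy as np

def weighted_euclidean(x_i, x_j, w):
    """d*_ij = sqrt( sum_k w_k * (x_ik - x_jk)^2 ), with w_k >= 0 and sum(w) = 1."""
    return np.sqrt(np.sum(w * (x_i - x_j) ** 2))

x_i = np.array([3.0, 1.5, 10.0])
x_j = np.array([2.0, 2.5,  7.0])
w   = np.array([1/3, 1/3, 1/3])   # equal weights give a rescaled ordinary Euclidean distance

print(weighted_euclidean(x_i, x_j, w))
```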
The Single Linkage distance between two clusters is defined as the distance
between the nearest pair of objects in the two clusters (one object in each cluster). If
cluster A is the set of objects $A_1, A_2, \ldots, A_m$ and cluster B is the set of objects
$B_1, B_2, \ldots, B_n$, the Single Linkage distance between clusters A and B is

$$ D(A, B) = \min_{i, j} \, d(A_i, B_j) . $$
At each stage of hierarchical clustering based on the Single Linkage distance measure,
the clusters A and B, for which D(A, B) is minimum, are merged. The Single Linkage
distance is represented in the XLMINER Help File figure below:
Figure 2
The Complete Linkage distance between two clusters is defined as the distance
between the most distant (farthest) pair of objects in the two clusters (one object in each
cluster). If cluster A is the set of objects $A_1, A_2, \ldots, A_m$ and cluster B is the set of
objects $B_1, B_2, \ldots, B_n$, the Complete Linkage distance between clusters A and B is

$$ D(A, B) = \max_{i, j} \, d(A_i, B_j) . $$

At each stage of hierarchical clustering based on the Complete Linkage distance measure,
the clusters A and B, for which D(A, B) is minimum, are merged. The Complete Linkage
distance is represented in the XLMINER Help File figure below:
Figure 3
Average Linkage
Under Average Linkage the distance between two clusters is defined to be the
average of the distances between all pairs of objects, where each pair is made up of one
object from each cluster. If cluster A is the set of objects $A_1, A_2, \ldots, A_m$ and cluster B is
the set of objects $B_1, B_2, \ldots, B_n$, the Average Linkage distance between clusters A and B is
$$ D(A, B) = \frac{T_{AB}}{N_A \cdot N_B} $$

where $T_{AB}$ is the sum of all pairwise distances between cluster A and cluster B, and
$N_A$ and $N_B$ are the sizes of the clusters A and B, respectively.
At each stage of hierarchical clustering based on the Average Linkage distance measure,
the clusters A and B are merged such that, after the merger, the average pairwise distance
within the newly formed cluster is minimum. The Average Linkage distance is
represented in the XLMINER Help File figure below:
Figure 4
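To make the three linkage definitions concrete, the sketch below (illustrative only, with two small made-up clusters) computes D(A, B) under Single, Complete, and Average Linkage directly from the matrix of pairwise object distances.

```python
# Single, Complete, and Average Linkage distances between two clusters,
# computed directly from the definitions (illustrative data only).
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]])   # cluster A: m = 3 objects
B = np.array([[5.0, 6.0], [5.5, 6.3]])               # cluster B: n = 2 objects

pairwise = cdist(A, B, metric="euclidean")           # m x n matrix of d(A_i, B_j)

single_linkage   = pairwise.min()    # nearest pair of objects, one from each cluster
complete_linkage = pairwise.max()    # farthest pair of objects
average_linkage  = pairwise.mean()   # T_AB / (N_A * N_B)

print(single_linkage, complete_linkage, average_linkage)
```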
Dendrograms are more useful visually when there is a smaller number of cases,
as in the Utilities.xls data set. The agglomerative procedure still works for larger
data sets, but it is computationally intensive in that n × n distance matrices are the basic
building blocks of the Agglomerative procedure.
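The n × n distance matrix referred to above can be formed in one step with SciPy; a minimal sketch (with made-up data standing in for the Utilities.xls cases) follows.

```python
# The n x n matrix of pairwise distances that agglomerative clustering starts from.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.default_rng(0).normal(size=(22, 8))   # pretend: 22 cases, 8 variables
D = squareform(pdist(X, metric="euclidean"))        # full n x n symmetric distance matrix

print(D.shape)   # (22, 22); storage and work grow quickly as n grows
```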
Dendrogram (Average Linkage) for the Utilities data: distance on the vertical axis, with the 22 utilities along the horizontal axis.
If we put our horizontal ruler at 4.0 for the maximal distance allowed between
clusters, we "cut across" 4 vertical lines and thus get 4 clusters; among them are
{7,12,21,15,17}, {5}, and {8,16,11}. If we put our horizontal ruler at 3.5 for the maximal
distance allowed between clusters, we "cut across" 7 vertical lines and thus get 7 clusters. The four
cluster group is constructed by combining the first and second clusters, the third and
fourth clusters, and the sixth and seventh clusters in the seven cluster group. You can
now see why this type of clustering is called hierarchical: the 4 cluster group is
constructed by combining the cluster groupings immediately below it. As you move up
slowly from the bottom of the dendrogram to the top, you move from n clusters to n-1
clusters to n-2 clusters, and so on, until all of the observations are contained in one cluster.
To show how sensitive the choice of clusters is to the choice of distance, consider
the Single Linkage dendrogram for the Utilities data:
Dendrogram (Single Linkage) for the Utilities data: distance on the vertical axis, with the 22 utilities along the horizontal axis.
In the case of forming 4 groups, set the maximal allowed distance to be 3.0 in the above
dendrogram. Then we get the following 4 clusters: {5}; {11}; {17}; {rest}. These
four clusters are quite different from the 4 clusters determined by using the Average
Linkage dendrogram. This just goes to show that cluster analysis is an art form; the
clusters should be interpreted with caution and accepted only if they make sense given
the domain-specific knowledge we have concerning the utilities under study.
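The "horizontal ruler" operation above corresponds to cutting the dendrogram at a chosen maximal distance. The sketch below (made-up data standing in for the Utilities.xls cases, and arbitrary cutoffs) shows how this can be done with SciPy's fcluster, and how the Average and Single Linkage trees can yield quite different groupings.

```python
# "Cutting" a dendrogram at a maximal allowed distance to obtain flat clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(22, 8))                      # pretend: 22 cases, 8 variables

Z_avg    = linkage(X, method="average")
Z_single = linkage(X, method="single")

# criterion="distance" cuts each tree at the given height (the "horizontal ruler").
labels_avg    = fcluster(Z_avg,    t=4.0, criterion="distance")
labels_single = fcluster(Z_single, t=3.0, criterion="distance")

# The two label vectors will generally partition the cases quite differently.
print(labels_avg)
print(labels_single)
```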
Non-hierarchical Clustering (K-means)
Running XLMINER's K-means clustering on the Utilities data with the following settings:

Normalized data
10 Random Starts
10 iterations per start
Fixed random seed = 12345
Number of reported clusters = 4

produces a 4 cluster grouping. One can then use domain-specific knowledge to determine whether this 4
cluster grouping makes more or less sense than the 4 group clusters determined by
either of the choices of cluster distance in the agglomerative approach.
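For readers working outside XLMINER, a roughly comparable run can be set up with scikit-learn. The sketch below is only an approximation of the settings listed above: standardizing the variables stands in for "Normalized data", n_init=10 for the 10 random starts, random_state for the fixed seed, and the data matrix is made up rather than being the actual Utilities data.

```python
# A rough scikit-learn analogue of the K-means settings listed above (approximation only).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(12345)
X = rng.normal(size=(22, 8))                 # stand-in for the Utilities data

X_norm = StandardScaler().fit_transform(X)   # "Normalized data"

km = KMeans(
    n_clusters=4,        # Number of reported clusters = 4
    n_init=10,           # 10 random starts; the best (lowest WCSS) run is kept
    max_iter=10,         # roughly "10 iterations per start"
    random_state=12345,  # fixed random seed
)
labels = km.fit_predict(X_norm)

print(labels)            # cluster membership of each case
print(km.inertia_)       # WCSS of the chosen solution
```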
Given a set of observations $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$, where each observation is a
d-dimensional real vector, K-means clustering aims to partition the n observations into
K sets (K < n), $S = \{S_1, S_2, \ldots, S_K\}$, so as to minimize the within-cluster sum of
squares (WCSS):

$$ \arg\min_{S} \sum_{i=1}^{K} \sum_{\mathbf{x}_j \in S_i} \left\| \mathbf{x}_j - \mu_i \right\|^2 \qquad (1) $$
where $\mu_i$ is the mean of the points in $S_i$. Now minimizing (1) can, in theory, be done by
the integer programming method, but this can be extremely time-consuming. Instead,
the Lloyd algorithm is more often used. The steps of the Lloyd algorithm are as follows.
Given the initial set of K means $m_1^{(1)}, \ldots, m_K^{(1)}$, which can be specified randomly or by
some heuristic, the algorithm proceeds by alternating between two steps:
Assignment Step: Assign each observation to the cluster with the closest mean:

$$ S_i^{(t)} = \left\{ \mathbf{x}_j : \left\| \mathbf{x}_j - m_i^{(t)} \right\| \le \left\| \mathbf{x}_j - m_{i^*}^{(t)} \right\| \ \text{for all } i^* = 1, 2, \ldots, K \right\} \qquad (2) $$
Update Step: Calculate the new means to be the centroids of the observations in
the clusters, i.e.,

$$ m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{\mathbf{x}_j \in S_i^{(t)}} \mathbf{x}_j \quad \text{for } i = 1, 2, \ldots, K. \qquad (3) $$
Repeat the Assignment and Update steps until WCSS (equation (1)) no longer
changes. Then the centroids and members of the K clusters are determined.
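A bare-bones implementation of these two alternating steps, written directly from equations (2) and (3), might look like the following sketch (the function name lloyd_kmeans and the random initialization from the observations are illustrative choices, not part of the original discussion).

```python
# A minimal Lloyd's algorithm sketch following the Assignment and Update steps above.
import numpy as np

def lloyd_kmeans(X, K, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Initial K means chosen randomly from the observations.
    means = X[rng.choice(len(X), size=K, replace=False)].copy()

    prev_wcss = np.inf
    for _ in range(max_iter):
        # Assignment step (equation (2)): each x_j goes to the nearest current mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # n x K
        labels = dists.argmin(axis=1)

        # Update step (equation (3)): each mean becomes the centroid of its cluster.
        for i in range(K):
            if np.any(labels == i):              # leave an empty cluster's mean unchanged
                means[i] = X[labels == i].mean(axis=0)

        # Stop once the WCSS (equation (1)) no longer changes.
        wcss = np.sum((X - means[labels]) ** 2)
        if np.isclose(wcss, prev_wcss):
            break
        prev_wcss = wcss

    return labels, means, wcss

# Example call on made-up data:
X = np.random.default_rng(3).normal(size=(50, 2))
labels, means, wcss = lloyd_kmeans(X, K=4)
print(wcss)
```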
Note: When using random assignment of the K means to start the algorithm, one might
try several random starting sets of K means and then choose the "best" one, namely the
starting set that produces the smallest WCSS among all of the random starts tried.
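Continuing with the hypothetical lloyd_kmeans sketch above, the multiple-random-starts idea amounts to a small loop that keeps the run with the smallest WCSS:

```python
# Several random starts; keep the run with the smallest WCSS (uses the sketch above).
best = min((lloyd_kmeans(X, K=4, seed=s) for s in range(10)), key=lambda r: r[2])
best_labels, best_means, best_wcss = best
print(best_wcss)
```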
Regardless of the clustering technique used, one should strive to choose clusters
that are interpretable and make sense given the domain-specific knowledge that we have
about the problem at hand.