Lecture Notes On Clustering
Clustering
Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU
14 December 2016
Contents
1 Introduction
5 Applications
© 2009, 2011, 2014 Laurenz Wiskott (homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all
figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their
own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free
copyrights (here usually figures I have the rights to publish but you don't, like my own published figures). Figures I do not
have the rights to publish are grayed out, but the word Figure, Image, or the like in the reference is often linked to a pdf.
More teaching material is available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/.
1 Introduction
Data¹ are often given as points (or vectors) x_n in a Euclidean vector space and often form groups of points that
are close to each other, so-called clusters (D: Cluster). In data analysis one is, of course, interested in
discovering such a structure, a process called clustering.

Clustering algorithms can be classified into hard or crisp clustering, where each point is assigned to
exactly one cluster, and soft or fuzzy clustering, where each point can be assigned to several clusters with
certain probabilities that add up to 1. Another distinction can be made between partitional clustering,
where all clusters are on the same level, and hierarchical clustering, where the clustering is done from fine
to coarse by successively merging points into larger and larger clusters (agglomerative hierarchical clustering),
or from coarse to fine, where the points are successively split into smaller and smaller clusters (divisive
hierarchical clustering). I will discuss clustering algorithms of different types in turn.
2 K-means

In the K-means algorithm each data point x_n is assigned to exactly one of K clusters C_k, each represented by a
center point c_k, such that the error

    E := \sum_{k=1}^{K} \sum_{n \in C_k} \| x_n - c_k \|^2                                (1)

is minimized. This can be interpreted, for instance, in terms of a reconstruction error. Imagine we replace
each data point by its associated center point. This will lead to an error, which could be quantified by (1). The
task is to minimize this error. There is actually a close link to vector quantization (D: Vektorquantisierung)
here.
To achieve the minimization in practice we split the problem into two phases. First we keep the assignment
fixed and optimize the position of the center points; then we keep the center points fixed and optimize the
assignment.
If the assignment is fixed, it is easy to show that the optimal choice of the center positions is given
by

    c_k = \frac{1}{N_k} \sum_{n \in C_k} x_n ,                                            (2)

which is simply the center of gravity of the N_k points assigned to cluster C_k.
If the center points are fixed, it is obvious that each point should be assigned to the nearest center
position. Thus, a Voronoi tessellation (D: Dirichlet-Zerlegung) is optimal.
The K-means algorithm now consists of applying these two optimizations in turn until convergence.
The initial center locations could be chosen randomly from the data points. A drawback of this and
many other clustering algorithms is that the number of clusters is not determined. One has to decide
on a proper K in advance, or one simply runs the algorithm with several different values of K and picks the
best according to some criterion.
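For illustration, here is a minimal sketch of the algorithm in Python/NumPy (the function name, the restart loop, and the convergence test on the assignment are implementation choices, not prescribed by the algorithm):

```python
import numpy as np

def kmeans(X, K, n_restarts=10, seed=0):
    """Minimal K-means sketch. X: (N, d) array of data points, K: number of clusters."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    best_error, best_centers, best_assignment = np.inf, None, None
    for _ in range(n_restarts):                      # several runs with different initial centers
        centers = X[rng.choice(len(X), size=K, replace=False)]
        assignment = None
        while True:
            # assignment step: assign each point to the nearest center (Voronoi tessellation)
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # (N, K)
            new_assignment = dists.argmin(axis=1)
            if assignment is not None and np.array_equal(new_assignment, assignment):
                break                                # converged: assignment no longer changes
            assignment = new_assignment
            # update step: move each center to the center of gravity of its points, eq. (2)
            for k in range(K):
                if np.any(assignment == k):
                    centers[k] = X[assignment == k].mean(axis=0)
        error = np.sum((X - centers[assignment]) ** 2)   # reconstruction error, eq. (1)
        if error < best_error:
            best_error, best_centers, best_assignment = error, centers, assignment
    return best_centers, best_assignment, best_error
```

The returned error is the value of (1) for the best of the restarts, which directly reflects the advice below to run the algorithm several times and keep the best result.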
Also note that the result of the algorithm is not necessarily a global optimum of the error func-
tion (1). For instance, imagine two distinct clusters of equal size and K = 4. If in such a situation three
center points are initialized to lie in one cluster and only one lies in the other, the algorithm will optimize this
only locally, with three center points in one cluster and one in the other, and it will not find the better solution
where two center points are in each cluster. It is therefore advisable to run the algorithm several times
with different initial center locations and pick the best result.
¹ Important text (but not inline formulas) is set bold face; important formulas worth remembering and less
important formulas, which I also discuss in the lecture, are marked with special symbols; + marks sections that I typically skip during my lectures.
Figure 1: Examples of a converged K-means algorithm, once with 5 (yellow) center points (left), and
two different runs with 10 center points (middle and right). The data points are drawn in black and the
Voronoi tessellation in red. (Created with DemoGNG 1.5 written by Hartmut Loos and Bernd Fritzke, see
http://www.demogng.de/js/demogng.html for a more recent version, with kind transfer of copyrights.)
For each cluster C_k with center c_k define the dispersion

    \sigma_k := \sqrt{ \frac{1}{N_k} \sum_{n \in C_k} \| x_n - c_k \|^2 } ,               (3)

which can be interpreted as a generalized standard deviation. Then define the cluster similarity of
two clusters as

    S_{kl} := \frac{\sigma_k + \sigma_l}{\| c_k - c_l \|} .                               (4)
Thus, two clusters are considered similar if they have large dispersion relative to their distance.
A good clustering should be characterized by clusters being as dissimilar as possible. This should apply in
particular to neighboring clusters, because it is clear that distant clusters are dissimilar in any case. Thus,
an overall validation of the clustering can be done by the DB index

    V_{DB} := \frac{1}{K} \sum_{k=1}^{K} \max_{l \neq k} S_{kl} .                         (5)
The DB index does not systematically depend on K and is therefore suitable for finding the
optimal number of clusters, e.g. by plotting V_DB over K and picking a pronounced minimum.
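As a sketch (assuming the root-mean-square dispersion (3) and reusing the kmeans sketch from above), the validation could be computed as follows:

```python
import numpy as np

def db_index(X, centers, assignment):
    """Davies-Bouldin index (5) for a hard clustering, e.g. the output of the kmeans sketch."""
    X, centers = np.asarray(X, dtype=float), np.asarray(centers, dtype=float)
    K = len(centers)
    # dispersion of each cluster: generalized standard deviation around its center, eq. (3)
    sigma = np.array([np.sqrt(np.mean(np.sum((X[assignment == k] - centers[k]) ** 2, axis=1)))
                      for k in range(K)])
    v = 0.0
    for k in range(K):
        # similarity (4) to all other clusters; take the most similar one, as in eq. (5)
        s = [(sigma[k] + sigma[l]) / np.linalg.norm(centers[k] - centers[l])
             for l in range(K) if l != k]
        v += max(s)
    return v / K
```

One would run kmeans for a range of K values, compute the index for each result, and pick the K at a pronounced minimum.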
3.1.1 Introduction
The K-means algorithm is a very simple method with sharp boundaries between the clusters and no particular
characterization of the shape of individual clusters. In a more refined algorithm, one might want to
model each cluster with a Gaussian, capturing the shape of the clusters. This leads naturally to
a probabilistic interpretation of the data as a superposition of Gaussian probability distributions. For
simplicity we first assume that the Gaussians are isotropic, i.e. spherical.
If we assume that the Gaussians are isotropic, the probability density function (pdf) of cluster k can
be written as

    p(x|k) := \frac{1}{(2\pi\sigma_k^2)^{d/2}} \exp\left( -\frac{\| x - c_k \|^2}{2\sigma_k^2} \right) ,    (6)

where \sigma_k controls the width of the Gaussian. There is also a prior probability P(k) that a data
point belongs to a particular cluster k. The overall pdf for the data is then given by the total probability

    p(x) = \sum_{k=1}^{K} p(x|k) P(k) ,                                                   (7)

and, assuming the data points are drawn independently, the probability density of the data given the model (7) is simply

    p(\{x_n\}) = \prod_n p(x_n) .                                                         (8)
The problem now is that we do not know the parameters of the model, i.e. the values of the centers c_k
and the widths \sigma_k of the Gaussians and the probabilities P(k) for the clusters. How could we estimate or
optimize them?
The simple idea is to choose the parameters such that the probability density of the data is
maximized. In other words, we want to choose the model such that the data becomes most probable. This
is referred to as maximum likelihood estimation (D: Maximum-Likelihood-Schätzung), and p({x_n}) as
a function of the model parameters is referred to as the likelihood function.
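As a small illustration, the likelihood (8) of an isotropic Gaussian mixture can be evaluated as follows (the log-domain computation with logsumexp is a purely numerical precaution and the function name an implementation choice, not part of the model):

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(X, centers, sigma2, priors):
    """Log of the likelihood (8) under the isotropic mixture (6)-(7).

    X: (N, d) data, centers: (K, d), sigma2: (K,) variances sigma_k^2, priors: (K,) summing to 1.
    """
    N, d = X.shape
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)   # ||x_n - c_k||^2, (N, K)
    log_p_x_given_k = -0.5 * sq_dists / sigma2 - 0.5 * d * np.log(2 * np.pi * sigma2)  # log of (6)
    log_p_x = logsumexp(log_p_x_given_k + np.log(priors), axis=1)           # log of (7) per point
    return np.sum(log_p_x)                                                  # log of (8)
```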
A standard method of optimizing a function analytically is to calculate the gradient and set it to zero. I do
not want to work this out here but only state that at a (local) optimum the following equations hold.
    c_k = \frac{\sum_n P(k|x_n)\, x_n}{\sum_m P(k|x_m)} ,                                 (9)

    \sigma_k^2 = \frac{1}{d} \, \frac{\sum_n P(k|x_n) \| x_n - c_k \|^2}{\sum_m P(k|x_m)} ,    (10)

    P(k) = \frac{1}{N} \sum_n P(k|x_n) ,                                                  (11)
where all sums go over all N data points. These equations are perfectly reasonable, as one can see if one
realizes that P(k|x_n) / \sum_m P(k|x_m) can be interpreted as a weighting factor for how much data point x_n
contributes to cluster k. The key function in these equations is P(k|x_n), which according to Bayes' theorem is

    P(k|x) = \frac{p(x|k) P(k)}{p(x)}                                                     (12)

           = \frac{p(x|k) P(k)}{\sum_l p(x|l) P(l)} .                                     (13)
3.1.5 EM algorithm
The problem with equations (9-11) is that the parameters on the left-hand side also occur im-
plicitly on the right-hand side, through P(k|x_n). Thus we cannot use these equations directly to calculate the
parameters. However, one can start with some initial parameter values and then iterate through these
equations to improve the estimate. One can actually show that the likelihood increases with each iteration,
if a change occurs. This iterative scheme is referred to as the expectation-maximization algorithm, or
simply EM algorithm. Notice that this is completely different from a gradient ascent method.
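A minimal sketch of one EM iteration for the isotropic model, directly implementing (9)-(13), might look like this (the function name and the array layout are implementation choices):

```python
import numpy as np

def em_step(X, centers, sigma2, priors):
    """One EM iteration for the isotropic Gaussian mixture, eqs. (9)-(13)."""
    N, d = X.shape
    # E-step: responsibilities P(k|x_n) via Bayes' theorem, eqs. (12)-(13)
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)            # (N, K)
    p_x_given_k = np.exp(-0.5 * sq_dists / sigma2) / (2 * np.pi * sigma2) ** (d / 2) # eq. (6)
    joint = p_x_given_k * priors                                                     # p(x|k) P(k)
    resp = joint / joint.sum(axis=1, keepdims=True)                                  # P(k|x_n)
    # M-step: re-estimate the parameters, eqs. (9)-(11)
    Nk = resp.sum(axis=0)                                                            # sum_n P(k|x_n)
    centers = (resp.T @ X) / Nk[:, None]                                             # eq. (9)
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    sigma2 = np.sum(resp * sq_dists, axis=0) / (d * Nk)                              # eq. (10)
    priors = Nk / N                                                                  # eq. (11)
    return centers, sigma2, priors
```

Iterating em_step from some initial guess until the log-likelihood from the sketch above stops increasing yields the full EM algorithm.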
Two problems might occur during optimization. Firstly, one of the Gaussians might focus on just one
data point and become infinitely narrow and infinitely high, leading to a divergence of the likelihood.
Secondly, the method can get stuck in a local optimum and miss the globally optimal solution. In
either case it helps to run the algorithm several times and discard inappropriate solutions.
Another general problem is again that the number of clusters is not determined by the algorithm but
must be chosen in advance. Again, running the algorithm several times with different values of K helps.
The Gaussian mixture model can be generalized to anisotropic Gaussians, which may be elongated
or compressed in certain directions in space. One can think of a cigar-shaped or a UFO-shaped Gaussian.
In that case one would generalize (6) to

    p(x|k) := \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x - c_k)^T \Sigma_k^{-1} (x - c_k) \right) ,    (14)

with the covariance matrix \Sigma_k playing the role of the width parameter \sigma_k^2 in (6). Note that \Sigma_k is symmetric
and positive semi-definite.
Equations (9) and (11) would stay the same, only (10) would change. Taken together we get the equations

    c_k = \frac{\sum_n P(k|x_n)\, x_n}{\sum_m P(k|x_m)} ,                                 (15)

    \Sigma_k = \frac{\sum_n P(k|x_n)\, (x_n - c_k)(x_n - c_k)^T}{\sum_m P(k|x_m)} ,       (16)

    P(k) = \frac{1}{N} \sum_n P(k|x_n) .                                                  (17)
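The corresponding EM iteration is a straightforward generalization of the isotropic sketch above; in the following sketch the density (14) is evaluated with scipy.stats.multivariate_normal, which is a convenience choice rather than a necessity:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step_full(X, centers, covs, priors):
    """One EM iteration for the general Gaussian mixture, eqs. (14)-(17)."""
    N, d = X.shape
    K = len(priors)
    # E-step: responsibilities P(k|x_n) using the full-covariance density (14)
    p_x_given_k = np.column_stack([multivariate_normal.pdf(X, mean=centers[k], cov=covs[k])
                                   for k in range(K)])
    joint = p_x_given_k * priors
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step, eqs. (15)-(17)
    Nk = resp.sum(axis=0)
    centers = (resp.T @ X) / Nk[:, None]                                       # eq. (15)
    covs = np.stack([(resp[:, k, None] * (X - centers[k])).T @ (X - centers[k]) / Nk[k]
                     for k in range(K)])                                       # eq. (16)
    priors = Nk / N                                                            # eq. (17)
    return centers, covs, priors
```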
The partition coefficient index is defined as

    V_{PC} := \frac{1}{N} \sum_{k,n} P(k|x_n)^2 .                                         (22)
While (20), the corresponding sum without the square, equals N in any case, irrespective of how the data points are assigned to the clusters, the
partition coefficient index, due to the square, lies between 1/K, if all points are assigned with equal
probability to all clusters, and 1, if each point is assigned to exactly one cluster. Thus, V_PC = 1
would be optimal and indicate clearly separated clusters.
Notice that the spatial information is taken into account only implicitly, which works only for clustering
models, such as the Gaussian mixture model, that have soft tails. For K-means, this index would always
be one by construction, regardless of whether the clustering is good or not.
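Given the responsibilities P(k|x_n), e.g. as computed in the EM sketches above, the index (22) is essentially a one-liner:

```python
import numpy as np

def partition_coefficient(resp):
    """Partition coefficient index (22); resp[n, k] = P(k|x_n), each row sums to 1."""
    return np.sum(resp ** 2) / resp.shape[0]   # between 1/K (maximally fuzzy) and 1 (crisp)
```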
4.1 Dendrograms
In agglomerative hierarchical clustering one starts by considering each single data point as a separate
cluster. Then one merges points that are near to each other into clusters, and finally merges
clusters that are near to each other into larger clusters. In the end all points form one big cluster.
Documenting the hierarchical merging process results in a tree-like structure that represents the cluster
structure of the data on all levels from fine (cluster distance slightly greater than 0) to coarse (cluster distance
→ ∞). It can be visualized with a dendrogram. Many algorithms can be viewed in this scheme and differ
only in the definition of what "near to each other" means for clusters.
Let d(x_n, x_m) be the distance between two points and C_k indicate a cluster of (possibly only one) points x_n.
If we define the distance D(C_k, C_l) between two clusters C_k and C_l as

    D_s(C_k, C_l) := \min_{x_n \in C_k,\; x_m \in C_l} d(x_n, x_m) ,

then it depends on the distance between the nearest two points of the two clusters. If we define

    D_c(C_k, C_l) := \max_{x_n \in C_k,\; x_m \in C_l} d(x_n, x_m) ,

then the distance depends on the farthest two points of the two clusters, see figure 2. The former distance
measure gives rise to the single-link method, the latter to the complete-link method. These names
come from the idea that you introduce links between all the points in the order of their distance. In the
single-link method, two clusters become merged as soon as they are connected by a single link, which then
naturally has length D_s(C_k, C_l). In the complete-link method, two clusters become merged only if all points
in one cluster have a link to all points in the other cluster. The last link added before the clusters are merged
then naturally has length D_c(C_k, C_l).
Figure 3 illustrates agglomerative hierarchical clustering with the single- and the complete-link method.
Notice that the resulting dendrograms are qualitatively different and that the distances are naturally larger
in the complete-link method.
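As a small sketch, with each cluster stored as an array of its points and the Euclidean distance taken as d(x_n, x_m), the two cluster distances could be computed as follows:

```python
import numpy as np

def d_single(Ck, Cl):
    """Single-link distance D_s: distance between the two nearest points of the clusters."""
    return np.linalg.norm(Ck[:, None, :] - Cl[None, :, :], axis=2).min()

def d_complete(Ck, Cl):
    """Complete-link distance D_c: distance between the two farthest points of the clusters."""
    return np.linalg.norm(Ck[:, None, :] - Cl[None, :, :], axis=2).max()
```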
1. Define each data point as a cluster, C_k := {x_k}. Represent each one-point cluster as a point on
the abscissa of a graph, the ordinate of which represents cluster distance.
Figure 2: Two different measures of cluster distance. D_s(C_k, C_l) measures the distance between the two
nearest points of the two clusters and D_c(C_k, C_l) measures the distance between the two farthest points of
the two clusters. Notice that with D_s cluster C_2 would first be merged with C_1, while with D_c it would first
be merged with C_3.
CC BY-SA 4.0
Figure 3: An example of the single-link and the complete-link method on a data distribution of 5 data
points a-e. The numbers at the links indicate the order in which the clusters are linked. On the left they are
linked by the minimal smallest distance between points of two clusters; on the right they are linked by the
minimal largest distance between points of two clusters. In this example the dendrograms are qualitatively
different. Also the distances are generally larger in the complete-link method.
CC BY-SA 4.0
2. Find the two clusters C_{k'} and C_{l'} that are closest to each other, i.e.

    D(C_{k'}, C_{l'}) = \min_{k \neq l} D(C_k, C_l) .

Draw vertical lines in the graph on top of each cluster up to the distance of these two closest
clusters, i.e. up to D(C_{k'}, C_{l'}).
3. Merge the two closest clusters into one, i.e. define a new cluster C_{q'} := C_{k'} ∪ C_{l'} and discard C_{k'}
and C_{l'}. Rearrange the clusters on the abscissa such that the two closest ones become neighbors
(and already connected clusters remain neighbors). Draw a vertical line between the two closest
clusters, representing the new cluster C_{q'}. Repeat from step 2 until only one cluster is left.
Depending on how the distance measure D(C_k, C_l) is defined, this algorithm results in different dendro-
grams and has different intuitive interpretations.
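In practice one rarely implements the merging and bookkeeping by hand; scipy, for instance, provides both linkage methods and the dendrogram plot. A minimal usage sketch on toy data (the data here are random and purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(30, 2))     # toy data: 30 points in 2 dimensions

for method in ("single", "complete"):                  # single-link vs. complete-link
    Z = linkage(X, method=method)                      # agglomerative merging with merge distances
    dendrogram(Z)                                      # abscissa: points, ordinate: cluster distance
    plt.title(method + "-link dendrogram")
    plt.show()
```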
5 Applications
(D) signaling and angiogenesis, and
(E) wound healing and tissue remodeling.
Figure: (Eisen et al., 1998, Fig. 1)¹ non-free.
Semantics of words: Words can be clustered based on a large text corpus by defining similarity between
words depending on common context, i.e. if two words co-occur with the same words they are considered
similar, otherwise they are not. Clustering can then reveal semantic similarities.
Figure: (Gries and Stefanowitsch, 2010, Fig. 3)² non-free.
Table: (Mokhtarian and Ory, 2007, Tab. 3)³ unclear.
References

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK.

Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of
genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A., 95:14863-14868.

Gan, G., Ma, C., and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM
Series on Statistics and Applied Probability. SIAM, Philadelphia, PA, USA.

Gries, S. T. and Stefanowitsch, A. (2010). Cluster analysis and the identification of collexeme classes. In
Rice, S. and Newman, J., editors, Empirical and Experimental Methods in Cognitive/Functional Research.
CSLI Publications.

Mokhtarian, P. L. and Ory, D. T. (2007). Shopping-related attitudes: A factor and cluster analysis of
northern California shoppers. Manuscript downloaded 2016-12-14 from http://www.wctrs-society.com/wp/wp-content/uploads/abstracts/berkeley/D5/149/shoppingAttitudes.bergenfinalwctrsubmit.070417.doc.
Notes
¹ Eisen et al., 1998, Proc. Natl. Acad. Sci. U.S.A. 95:14863-8, Fig. 1, non-free, http://gene-quantification.org/eisen-et-al-cluster-1998.pdf

² Gries & Stefanowitsch, 2010, Fig. 3, non-free, http://www.linguistics.ucsb.edu/faculty/stgries/research/2010_STG-AS_ClusteringCollexemes_EmpExpMeth.pdf

³ Mokhtarian & Ory, 2007, Tab. 3, unclear, http://www.wctrs-society.com/wp/wp-content/uploads/abstracts/berkeley/D5/149/shoppingAttitudes.bergenfinalwctrsubmit.070417.doc