Data Mining
Lecture 12: Clustering
Clustering:
The process of grouping a set of objects into classes of similar objects, such that:
– Intra-cluster distances are minimized
– Inter-cluster distances are maximized
Goal of Clustering
– Intra-cluster similarity is maximized
– Inter-cluster similarity is minimized
[Figure: points plotted against features F1 and F2, forming distinct groups]
Applications of Cluster Analysis
Understanding
– Group related objects, e.g., stocks with similar price fluctuations (one discovered cluster: Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, ..., all belonging to the same industry group)
Summarization
– Reduce the size of large data sets (e.g., clustering precipitation measurements in Australia)
Clustering is Subjective
The same set of objects can be grouped in different, equally valid ways, depending on which features and similarity measure are used.
Types of Clusterings
Partitional Clustering
– A division of data objects into non-overlapping subsets (clusters)
– Construct various partitions and then evaluate them by some criterion
Hierarchical Clustering
– A set of nested clusters organized as a hierarchical tree
– Create a hierarchical decomposition of the set of objects using some criterion
Partitional Clustering
[Figure: original points p1–p4 and a partitional clustering of them into non-overlapping groups]
Types of Clusters
– Well-separated clusters
– Prototype-based clusters
– Contiguity-based clusters
– Density-based clusters
Types of Clusters: Well-Separated
– A cluster is a set of points such that any point in a cluster is
closer (or more similar) to every other point in the cluster than
to any point not in the cluster.
3 well-separated clusters
Types of Clusters: Prototype-Based
Prototype-based (center-based)
– A cluster is a set of objects such that an object in a cluster is
closer (more similar) to the prototype or “center” of a cluster,
than to the center of any other cluster
– The center of a cluster is often a centroid (continuous attributes), the average of all the points in the cluster, or a medoid (categorical attributes), the most “representative” point of a cluster
4 center-based clusters
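To make the centroid/medoid distinction concrete, here is a small sketch (plain NumPy, with made-up points) computing both for a single cluster:

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [2.0, 1.5]])

# Centroid: the mean of all points in the cluster (need not be an actual data point).
centroid = points.mean(axis=0)

# Medoid: the actual point whose total distance to all other points is smallest.
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
medoid = points[dists.sum(axis=1).argmin()]

print("centroid:", centroid)  # [1.875 2.125] -- not one of the input points
print("medoid:  ", medoid)    # always one of the input points
```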
Types of Clusters: Contiguity-Based
Contiguity-based (nearest neighbor)
– A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.
8 contiguous clusters
Types of Clusters: Density-Based
Density-based
– A cluster is a dense region of points, separated from other regions of high density by low-density regions.
– Used when the clusters are irregular or intertwined, and when noise and outliers are present.
6 density-based clusters
Types of Clusters: Objective Function
– Clusters defined by an objective function: find the clustering that minimizes or maximizes the objective.
Clustering Algorithms
– K-means and its variants
– Hierarchical clustering
– Density-based clustering
K-means Clustering
[Figure: successive K-means iterations shown as x–y scatter plots; in each iteration, points are assigned to the nearest centroid and the centroids are recomputed, until the centroids stop moving]
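The algorithm itself is not spelled out on the slide; the following is a minimal sketch of the standard K-means loop (random initialization and Euclidean distance; the function name and toy data are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: assign each point to its nearest centroid, recompute
    the centroids as cluster means, and repeat until nothing moves."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = np.linalg.norm(X[:, None] - centroids[None, :], axis=-1).argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (kept in place if its cluster happens to be empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, labels

# Toy usage: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
```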
Example (one cluster of values 2, 3, 4, 7, 9)
Centroid 3 → mean of {2, 3, 4, 7, 9} = 25/5 = 5, so the new centroid is 5.
Centroid 5 → mean of {2, 3, 4, 7, 9} = 5; the centroid no longer moves, so the algorithm has converged.
Importance of Choosing Initial Centroids
[Figure: the original points and the iterations of two K-means runs started from different initial centroids; the runs converge to different clusterings]
Solutions to Initial Centroids Problem
Multiple runs
– Helps, but probability is not on your side
Use some strategy to select the k initial centroids and then select among these initial centroids
– Select the most widely separated centroids (K-means++ is a robust way of doing this selection)
– Use hierarchical clustering to determine initial centroids
Bisecting K-means
– Not as susceptible to initialization issues
K-means++ for initialization
CLUTO: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
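As a sketch of how K-means++ makes that selection, assuming the standard squared-distance weighting (the function name is made up):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """K-means++ seeding: pick the first centroid at random, then pick each
    subsequent centroid with probability proportional to its squared distance
    from the nearest centroid chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance of every point to its closest centroid so far.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()  # far-away points are more likely to be chosen
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)
```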
Bisecting K-means Example
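The example figure is not reproduced here. As a rough sketch of the idea, assuming scikit-learn's KMeans is available, bisecting K-means repeatedly splits the cluster with the largest squared error using 2-means:

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k):
    """Repeatedly split the cluster with the largest SSE using 2-means."""
    clusters = [X]
    while len(clusters) < k:
        # Pick the cluster with the largest total squared error to split next.
        sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sse)))
        # Split it in two with ordinary K-means (k = 2).
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(target)
        clusters += [target[labels == 0], target[labels == 1]]
    return clusters
```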
Limitations of K-means
– K-means has problems when clusters differ in size or density, when they are non-globular, and when the data contains outliers.
Overcoming K-means Limitations
One solution is to find a large number of clusters, each of which represents a part of a natural cluster. These small clusters must then be put together in a post-processing step, as sketched below.
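A hedged sketch of that post-processing idea, assuming scikit-learn and SciPy are available and using single-link merging of the centroids (the data and cluster counts are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

# Over-cluster with a large k, then merge the small clusters back together
# by agglomerative (single-link) clustering on their centroids, so that
# non-globular natural clusters can still be recovered.
X = np.random.default_rng(0).normal(size=(300, 2))  # stand-in data
km = KMeans(n_clusters=20, n_init=10).fit(X)        # many small clusters
Z = linkage(km.cluster_centers_, method="single")   # merge nearby centroids
merged = fcluster(Z, t=3, criterion="maxclust")     # down to 3 final clusters
final_labels = merged[km.labels_]                   # map each point through its small cluster
```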
Hierarchical Clustering
[Figure: a clustering of points 1–6 shown as nested clusters and as a dendrogram with merge heights between 0 and 0.2]
Strengths of Hierarchical Clustering
– No need to assume any particular number of clusters: any desired number can be obtained by cutting the dendrogram at the proper level.
Two main types of hierarchical clustering
– Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters until only one cluster (or k clusters) remains.
– Divisive: start with one, all-inclusive cluster; at each step, split a cluster until each cluster contains an individual point (or there are k clusters).
An example
Hierarchically cluster the values 100, 200, 500, 900, 1100 (agglomerative, with each cluster represented by its centroid).
SOLUTION:
The closest two values are 100 and 200
=> the centroid of these two values is 150.
Now we are clustering the values: 150, 500, 900, 1100.
The closest two values are 900 and 1100
=> the centroid of these two values is 1000.
The remaining values to be joined are: 150, 500, 1000.
The closest two values are 150 and 500
=> the centroid of these two values is 325.
Finally, the two resulting subtrees (centroids 325 and 1000) are joined in the root of the tree.
An example:
Two hierarchical clusters of the expression values of a single
gene measured in 5 experiments.
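A small sketch reproducing the arithmetic of the SOLUTION above; as on the slide, each merged cluster is represented only by its centroid value:

```python
# One-dimensional agglomerative clustering, centroid representation.
values = [100, 200, 500, 900, 1100]

while len(values) > 1:
    # Find the two closest centroids.
    i, j = min(((a, b) for a in range(len(values)) for b in range(a + 1, len(values))),
               key=lambda p: abs(values[p[0]] - values[p[1]]))
    new = (values[i] + values[j]) / 2  # centroid of the merged pair
    print(f"merge {values[i]} and {values[j]} -> centroid {new}")
    values = [v for k, v in enumerate(values) if k not in (i, j)] + [new]
# Prints the merges 100+200 -> 150, 900+1100 -> 1000, 150+500 -> 325,
# and finally joins the two subtrees at the root.
```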
Starting Situation
Agglomerative clustering starts with clusters of individual points p1, p2, ..., p12 and a proximity matrix.
[Figure: the individual points and their proximity matrix]
Intermediate Situation
After some merging steps, we have a number of clusters C1, ..., C5.
[Figure: the current clusters over points p1–p12 and the proximity matrix between them]
Step 4
We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
[Figure: the proximity matrix over C1–C5, with the entries for C2 and C5 highlighted]
Step 5
After merging C2 and C5, the question is: how do we update the proximity matrix? The new row and column for C2 U C5 must be filled in, as sketched below.
[Figure: the proximity matrix with a new row and column for the merged cluster C2 U C5, whose entries are marked “?”]
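One way to answer that question in code, sketched here for the single-link (MIN) rule; MAX or group average would change only the row update:

```python
import numpy as np

def merge_single_link(D, i, j):
    """Merge clusters i and j in distance matrix D using the MIN rule:
    the merged cluster's distance to any other cluster k is min(D[i,k], D[j,k]).
    (MAX would use max; group average, a size-weighted mean.)"""
    row = np.minimum(D[i], D[j])
    keep = [k for k in range(len(D)) if k not in (i, j)]
    # Drop rows/columns i and j, then append the merged cluster last.
    new_D = np.block([[D[np.ix_(keep, keep)], row[keep][:, None]],
                      [row[keep][None, :],    np.zeros((1, 1))]])
    return new_D  # merged cluster occupies the last row/column
```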
How to Define Inter-Cluster Distance
[Figure: proximity matrix over points p1, ..., p5 — which entries define the similarity between two clusters?]
– MIN
– MAX
– Group Average
– Distance Between Centroids
– Other methods driven by an objective function (e.g., Ward’s Method uses squared error)
MIN or Single Link
– The proximity of two clusters is defined by the two closest points in the different clusters.
[Distance matrix example omitted]
Hierarchical Clustering: MIN
[Figure: nested clusters of points 1–6 produced by MIN (single link), and the corresponding dendrogram]
Strength of MIN
– Can handle non-elliptical shapes.
[Figure: original points correctly separated into two clusters]
Limitations of MIN
– Sensitive to noise.
[Figure: with noise present, the same points are broken into three clusters]
MAX or Complete Linkage
– The proximity of two clusters is defined by the two most distant points in the different clusters.
[Distance matrix example omitted]
Hierarchical Clustering: MAX
[Figure: nested clusters of points 1–6 produced by MAX (complete link), and the corresponding dendrogram]
Strength of MAX
– Less susceptible to noise.
Limitations of MAX
– Tends to break large clusters; biased towards globular clusters.
Group Average
– The proximity of two clusters is the average of the pairwise proximities between points in the two clusters:

$$\text{proximity}(\text{Cluster}_i, \text{Cluster}_j) = \frac{\sum_{p_i \in \text{Cluster}_i,\; p_j \in \text{Cluster}_j} \text{proximity}(p_i, p_j)}{|\text{Cluster}_i| \cdot |\text{Cluster}_j|}$$
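As a direct translation of this formula, assuming Euclidean proximity between points stored as NumPy arrays:

```python
import numpy as np

def group_average_proximity(A, B):
    """Average pairwise distance between points of cluster A and cluster B:
    the sum of all |A|*|B| pairwise proximities divided by |A|*|B|."""
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return pairwise.sum() / (len(A) * len(B))
```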
Hierarchical Clustering: Group Average
[Figure: nested clusters of points 1–6 produced by Group Average, and the corresponding dendrogram]
Hierarchical Clustering: Group Average
Strengths
– Less susceptible to noise
Limitations
– Biased towards globular clusters
Hierarchical Clustering: Comparison
[Figure: side-by-side comparison of the nested clusters of points 1–6 found by MIN, MAX, and Group Average]
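Such a comparison can be reproduced with SciPy's hierarchical clustering, where method "single" is MIN, "complete" is MAX, and "average" is Group Average (the data here is a random stand-in for the 6-point example):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))  # stand-in data

for method in ("single", "complete", "average"):     # MIN, MAX, Group Average
    Z = linkage(X, method=method)                    # build the dendrogram
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut it into 3 clusters
    print(method, labels)
```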
Hierarchical Clustering: Problems and Limitations
– Once a decision is made to combine two clusters, it cannot be undone.
– No global objective function is directly minimized.
– Different schemes have problems with one or more of: sensitivity to noise, difficulty handling clusters of different sizes, and a tendency to break large clusters.