Clustering in Machine Learning
• Partitioning methods:
→ construct partitions of the data points
→ evaluate the partitions by some criterion
→ examples: k-means, k-medoids
• Density-based methods:
→ are based on connectivity and density functions
→ examples: DBSCAN, DJCluster
Density-Based Clustering
Density-based clustering locates regions of high density that are
separated from one another by regions of low density.
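• Two parameters:
1. Eps → maximum radius of the neighbourhood
2. MinPts → minimum number of points in the Eps-neighbourhood of a point
• Eps-neighbourhood of a point p in the dataset D:
N_Eps(p) = {q ∈ D s.t. dist(p, q) ≤ Eps}
• Key idea: the density of the neighbourhood has to exceed
some threshold, i.e. |N_Eps(p)| ≥ MinPts.
• The shape of a neighbourhood depends on the dist function
(e.g. a disc for the Euclidean distance in 2-D, a diamond for the
Manhattan distance).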
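The neighbourhood definition above can be sketched in a few lines of plain Python. This is only an illustration: the function names `eps_neighbourhood`, `is_dense` and `chebyshev`, as well as the toy dataset `D`, are assumptions made for this example and do not come from the slides.

```python
import math

def eps_neighbourhood(points, p, eps, dist=math.dist):
    """Return N_Eps(p): all points q of the dataset with dist(p, q) <= eps."""
    return [q for q in points if dist(p, q) <= eps]

def is_dense(points, p, eps, min_pts, dist=math.dist):
    """True if the Eps-neighbourhood of p holds at least min_pts points."""
    return len(eps_neighbourhood(points, p, eps, dist)) >= min_pts

def chebyshev(a, b):
    """Maximum-coordinate distance; its neighbourhoods are squares, not discs."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical toy dataset (not from the source).
D = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.9, 0.9), (5.0, 5.0)]
p = (0.0, 0.0)

print(len(eps_neighbourhood(D, p, eps=1.0)))                  # 3 (Euclidean)
print(len(eps_neighbourhood(D, p, eps=1.0, dist=chebyshev)))  # 4 (Chebyshev)
```

The point (0.9, 0.9) lies outside the Euclidean disc of radius 1 but inside the Chebyshev square, which shows how the choice of dist changes the neighbourhood. Note that p itself counts as a member of its own neighbourhood in this sketch.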
Directly density-reachable:
→ A point p is directly density-reachable from a point q wrt. Eps,
MinPts if:
1. p ∈ N_Eps(q), and
2. |N_Eps(q)| ≥ MinPts (q is a core point)
Density-reachable:
→ A point p is density-reachable from a point q wrt. Eps,
MinPts if there is a chain of points p_1, ..., p_n, with
p_1 = q, p_n = p, s.t. p_{i+1} is directly density-reachable from p_i
→ transitive but not symmetric
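The chain condition can be checked with a breadth-first search. The sketch below assumes the standard DBSCAN definition of direct density-reachability (q must have at least MinPts points in its Eps-neighbourhood, and p must lie in that neighbourhood). All function names and the toy dataset are hypothetical.

```python
import math
from collections import deque

def neighbours(points, q, eps):
    """N_Eps(q) under the Euclidean distance."""
    return [r for r in points if math.dist(q, r) <= eps]

def is_core(points, q, eps, min_pts):
    return len(neighbours(points, q, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """Assumed standard definition: q is a core point and p is in N_Eps(q)."""
    return is_core(points, q, eps, min_pts) and math.dist(p, q) <= eps

def density_reachable(points, p, q, eps, min_pts):
    """Search for a chain q = p_1, ..., p_n = p of directly reachable steps."""
    seen, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        if cur == p:
            return True
        if not is_core(points, cur, eps, min_pts):
            continue  # only core points can extend the chain
        for nxt in neighbours(points, cur, eps):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Four points on a line: the two inner points are core, the two ends are not.
D = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(density_reachable(D, (3.0, 0.0), (1.0, 0.0), eps=1.0, min_pts=3))  # True
print(density_reachable(D, (1.0, 0.0), (3.0, 0.0), eps=1.0, min_pts=3))  # False
```

The second call fails because the end point (3, 0) is not a core point and therefore cannot start a chain, which illustrates why the relation is not symmetric.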
Density-connected:
→ A point p is density-connected to a point q wrt. Eps, MinPts if
there is a point o s.t. both p and q are density-reachable from o
wrt. Eps and MinPts
→ symmetric
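DBSCAN is usually described as grouping points into maximal sets of density-connected points and labelling everything else as noise. The following is a minimal, unoptimised sketch of that idea (quadratic neighbourhood search, Euclidean distance). The names `dbscan`, `region` and `NOISE` and the toy data are assumptions for illustration, not a reference implementation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = {}  # point index -> label

    def region(i):
        # Indices of all points in the Eps-neighbourhood of points[i].
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if j in labels:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return [labels[i] for i in range(len(points))]

# Hypothetical data: two dense squares and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
        (10, 10), (10, 11), (11, 10), (11, 11)]
print(dbscan(data, eps=1.5, min_pts=3))
# [0, 0, 0, 0, -1, 1, 1, 1, 1]
```

A production implementation would normally use a spatial index for the neighbourhood queries instead of the linear scan used here.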
Erik Kropat, University of the Bundeswehr Munich
Observation:
• For points in a cluster, their k-th nearest neighbours are at
roughly the same distance.
• Noise points have their k-th nearest neighbour at a farther
distance.
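This observation is commonly used to choose Eps: compute the distance from every point to its k-th nearest neighbour, sort the values, and look for a sharp jump. The sketch below computes the sorted k-distances for a toy dataset. The function name `k_distances` and the data are hypothetical.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour
    (the point itself excluded), sorted in descending order."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result, reverse=True)

# Hypothetical data: two dense squares and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
        (10, 10), (10, 11), (11, 10), (11, 11)]

kd = k_distances(data, k=2)
print([round(d, 2) for d in kd])
# [6.4, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The cluster points all share a 2-distance of 1.0, while the isolated point stands out with about 6.4. An Eps value between the two levels would separate noise from clusters on this toy data.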
Izabela Moise, Evangelos Pournaras, Dirk Helbing
Pros and Cons
Pros:
• discovers clusters of arbitrary shape
• handles noise
• needs density parameters (Eps, MinPts) as termination condition
Cons:
• cannot handle clusters of varying density
• sensitive to parameters → hard to determine the correct set of
parameters
• sampling affects density measures