Module 10

The document provides an overview of clustering, focusing on K-Means and DBSCAN algorithms. It explains the concepts of unsupervised learning, the definition of clustering, and the characteristics of both clustering methods, including their limitations and differences. Additionally, it discusses the types of points in DBSCAN and the algorithm's sensitivity to parameters.

Uploaded by

satyam.kumar10
Copyright
© All Rights Reserved

Learning Objectives

• Introduction to Clustering
• Understand K-Means clustering
• Learn DBSCAN, a density-based clustering algorithm
• Compare K-Means clustering with DBSCAN

Let’s Look At…

Unsupervised Learning

We will take a look at an unsupervised learning algorithm to understand
how data without labels can also be useful.

And Now Clustering…

What is Clustering?

Clustering is the task of dividing a population of data points into a
number of groups such that data points in the same group are more
similar to each other than to the data points in other groups.

Clustering

Clustering: K Means

Limitations: K Means

Clustering: K Means
Hard clustering and soft clustering

In hard clustering, each data point belongs to exactly one cluster.
K-Means is a hard clustering method.

In soft clustering, the output is instead a probability (likelihood) of
the data point belonging to each of the predefined number of clusters.
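
The difference between the two kinds of output can be sketched in a few lines of Python. This is only an illustration: the centroids and the query point are made-up values, and the soft memberships are computed as a softmax over negative squared distances, which is one possible choice and not something the slides prescribe.

```python
import math

def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_assign(point, centroids):
    """Soft clustering: return one membership probability per cluster.

    Assumption: memberships are a softmax over negative squared distances.
    """
    scores = [-math.dist(point, c) ** 2 for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical centroids and query point, chosen only for illustration.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.5, 0.0)

hard_label = hard_assign(point, centroids)   # a single cluster index
soft_probs = soft_assign(point, centroids)   # one probability per cluster
print(hard_label)
print([round(p, 3) for p in soft_probs])
```

The hard result is one label, while the soft result is a list of probabilities that sums to 1 and still records how close the point is to the other cluster.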

DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN)
● DBSCAN is a density-based spatial clustering algorithm proposed in 1996
by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.
● It is a clustering method that identifies groups of data points that are
densely packed together while flagging outliers that are isolated in low-
density areas.
● DBSCAN is widely recognized and frequently cited as one of the most
common clustering algorithms in data analysis and machine learning!

Density Based Clustering
Basic idea
● Clusters are dense regions in
the data space, separated by
regions of lower object density
● Discovers clusters of arbitrary
shape
Where are the clusters?

Density Definition
● ε-Neighborhood: All points q within a radius of ε from a given point p.

Any distance function can be used based on the application
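
A minimal sketch of the ε-neighborhood, assuming points are stored as coordinate tuples and that the boundary is inclusive (distance ≤ ε). The function names and the sample points are hypothetical. The `distance` argument shows how a different distance function can be swapped in.

```python
import math

def eps_neighborhood(points, p, eps, distance=math.dist):
    """Return every point q in `points` with distance(p, q) <= eps.

    p itself is included when it is in `points`, since distance(p, p) = 0.
    """
    return [q for q in points if distance(p, q) <= eps]

def manhattan(a, b):
    """Alternative distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical sample data.
points = [(0, 0), (1, 1), (2, 0), (3, 3)]
p = (0, 0)

euclid_nbrs = eps_neighborhood(points, p, eps=1.5)
manhattan_nbrs = eps_neighborhood(points, p, eps=1.5, distance=manhattan)
print(euclid_nbrs)
print(manhattan_nbrs)
```

With the same ε, the point (1, 1) is a neighbor of p under Euclidean distance (about 1.41) but not under Manhattan distance (2), so the choice of distance function changes which points count as neighbors.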

Density Definition
● High Density: ε-Neighborhood of a point contains at least MinPts
number of points.

Example:
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)

Type of Points
● A point is a core point if it has at least MinPts points within ε.
These are points in the interior of a cluster.
● A border point has fewer than MinPts points within ε, but is in the
neighborhood of a core point.
● An outlier (noise) point is any point that is neither a core point nor
a border point.
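
The three definitions above can be sketched as a small labeling function. This is an illustration under stated assumptions: points are distinct coordinate tuples, Euclidean distance is used, and a point counts itself among the points within ε. The sample data, ε and MinPts values are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier'.

    Assumption: a point counts itself among the points within eps.
    """
    neighbors = {
        p: [q for q in points if math.dist(p, q) <= eps] for p in points
    }
    core = {p for p in points if len(neighbors[p]) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbors[p]):
            labels[p] = "border"   # not dense itself, but next to a core point
        else:
            labels[p] = "outlier"
    return labels

# Hypothetical data: a small square, one point hanging off it, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
labels = classify_points(points, eps=1.0, min_pts=3)
for point, label in labels.items():
    print(point, label)
```

Here the four corners of the square are core points, (2, 1) is a border point because its only other neighbor is the core point (1, 1), and (5, 5) is an outlier.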

Type of Points

MinPts = 5

Density Reachability

Directly density-reachable

A point q is directly density-reachable from a point p if p is a core
point and q lies within the ε-neighborhood of p.

Is q directly density-reachable from p? ✅
Is p directly density-reachable from q? ❌ (q is not a core point)
Is density-reachability symmetric? ❌

Density Reachability

Density-reachable (Indirectly)

● A point p is directly density-reachable from p2
● p2 is directly density-reachable from p1
● p1 is directly density-reachable from q
● p ← p2 ← p1 ← q form a chain, so p is (indirectly) density-reachable
from q
● q is not density-reachable from p
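
The chain idea can be checked with a short search that only extends a chain through core points. This is a sketch under assumptions: Euclidean distance, an inclusive ε boundary, a point counting itself toward MinPts, and made-up points on a line standing in for q, p1, p2 and p.

```python
import math
from collections import deque

def neighbors(points, p, eps):
    """All points within eps of p (p included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core(points, p, eps, min_pts):
    return len(neighbors(points, p, eps)) >= min_pts

def density_reachable(points, source, target, eps, min_pts):
    """True if `target` is density-reachable from `source`.

    Every point on the chain except possibly the last must be a core point.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if not is_core(points, current, eps, min_pts):
            continue  # only core points can extend the chain
        for nxt in neighbors(points, current, eps):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical points on a line; q, p1, p2 are core points, p is not.
points = [(-1, 0), (0, 0), (1, 0), (2, 0), (3, 0)]
q, p = (0, 0), (3, 0)

print(density_reachable(points, q, p, eps=1.0, min_pts=3))  # p from q
print(density_reachable(points, p, q, eps=1.0, min_pts=3))  # q from p
```

The first call follows the chain q → (1, 0) → (2, 0) → p. The second call stops immediately because p is not a core point, which is why the relation is not symmetric.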

Density Connectivity

Density-connected

Two points p and q are density-connected if both are density-reachable
from some common point o.

https://www.geeksforgeeks.org/

DBSCAN Algorithm

● ε = 2 cm
● MinPts = 3

for each p ∈ D do
    if p is not yet classified then
        if p is a core-object then
            collect all objects density-reachable from p
            and assign them to a new cluster
        else
            assign p to outlier

What is the time complexity?
What is the space complexity?
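
The pseudocode above can be turned into a short runnable sketch. It is one possible reading, not a reference implementation: it assumes Euclidean distance, an inclusive ε boundary, and that a point counts itself toward MinPts. The names `dbscan`, `region_query` and `NOISE` and the sample coordinates are hypothetical. The values ε = 2 and MinPts = 3 are reused from the example parameters.

```python
import math

NOISE = -1  # label used for outliers

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index (0, 1, ...) or NOISE."""
    def region_query(i):
        # Linear scan: indices of all points within eps of points[i].
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not yet classified"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1               # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue             # already classified
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:  # j is core: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=2.0, min_pts=3)
print(labels)
```

Each point triggers at most one neighborhood query, and each query here is a linear scan over all n points, so this sketch performs on the order of n² distance computations. A spatial index is commonly used to make each query cheaper.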

DBSCAN is Sensitive to Parameters

DBSCAN vs. K Means

DBSCAN
● Two parameters (ε and MinPts) are required for training the model.
● Clusters formed in DBSCAN can be of any arbitrary shape.
● DBSCAN can work well with datasets having noise and outliers.

K Means
● Only one parameter (the number of clusters) is required for training
the model.
● Clusters formed in K-Means are spherical or convex in shape.
● K-Means is very sensitive to the number of clusters, so it needs to be
specified.
● K-Means does not work well with data that contains outliers.
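
The K-Means points in this comparison can be illustrated with a minimal sketch of the usual assign-and-update loop (Lloyd's algorithm). Everything concrete here is an assumption for illustration: the data, the hand-picked initial centroids, and the fixed iteration count. The number of clusters is set implicitly by how many initial centroids are passed in.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids

# Hypothetical data: two tight groups on a line.
data = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
        (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
start = [(1.0, 0.0), (12.0, 0.0)]  # hand-picked initial centroids, K = 2

clean = k_means(data, start)
skewed = k_means(data + [(100.0, 0.0)], start)  # same data plus one outlier
print(clean)
print(skewed)
```

Without the outlier, the two centroids settle on the two groups. With a single far-away point added, one centroid is pulled toward the outlier and ends up serving it alone, while the two real groups are merged under the other centroid.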
