Clustering: KMeans,
Agglomerative, and
DBSCAN
Welcome to this lecture on clustering techniques! Clustering
is a fundamental concept in machine learning, focusing on
grouping similar data points together. Today, we'll explore
three popular methods: KMeans, Agglomerative Clustering,
and DBSCAN. Each offers unique advantages for different
datasets and problems. Let's dive in and discover how these
algorithms can unlock valuable insights from your data.
by Props
KMeans Clustering: An Overview
KMeans partitions data into \(k\) clusters, aiming to minimize the sum of squared distances between data points and their respective cluster centroids. This optimization objective is represented mathematically as:

\[ J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \]

where \(C_i\) is the \(i\)-th cluster and \(\mu_i\) is its centroid.
KMeans assumes that clusters are spherical and roughly equal in size, which keeps the algorithm very fast. However, it is sensitive to the initialization of the centroids and to the choice of \(k\), and it works best when these assumptions are met.
Advantages
• Fast
• Easy to implement
Disadvantages
• Assumes spherical clusters
• Sensitive to initial centroids
• Requires the number of clusters \(k\) to be chosen in advance
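As a minimal sketch (the dataset and parameter choices here are illustrative assumptions), scikit-learn's KMeans exposes this objective directly through its `inertia_` attribute:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, roughly spherical blobs: the case KMeans is designed for.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# n_init=10 reruns the algorithm from different initial centroids and keeps
# the best result, mitigating the initialization sensitivity noted above.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(km.inertia_)  # the minimized sum of squared distances J
```

In practice, the "sensitive to \(k\)" caveat means rerunning this for several values of \(k\) and comparing the resulting inertias.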
KMeans vs. Agglomerative vs. DBSCAN
• KMeans: fast; assumes spherical clusters; requires \(k\) in advance.
• Agglomerative: builds a hierarchy of merges; best suited to smaller datasets.
• DBSCAN: handles arbitrary cluster shapes; identifies noise; no \(k\) required.
Dendrogram Visualization
• Single linkage: uses the shortest distance between any two points in the clusters.
• Complete linkage: uses the longest distance between any two points in the clusters.
• Average linkage: uses the average distance between all pairs of points in the clusters.
• Ward's method: minimizes the variance within clusters.
Agglomerative clustering is particularly useful for smaller datasets or when a hierarchical structure is expected in the data.
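A short sketch of how the linkage criterion is selected in scikit-learn (the cluster count and synthetic data are illustrative assumptions):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# The same data clustered under each linkage criterion; each one defines
# "distance between two clusters" differently, as described above.
results = {}
for method in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=3, linkage=method)
    results[method] = model.fit_predict(X)
```

On well-separated blobs the four criteria tend to agree; on elongated or noisy data their results diverge, which is why the choice of linkage matters.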
DBSCAN: Density-Based
Spatial Clustering
DBSCAN forms clusters based on data density, grouping points that are
closely packed while marking as outliers points that lie alone in
low-density regions. The algorithm relies on two key parameters:
\(\epsilon\) (eps) and \(min\_samples\). Core points have at least
\(min\_samples\) neighbors within a radius of \(\epsilon\), while border
points fall within \(\epsilon\) of a core point but do not meet the
density threshold themselves. Points that are neither core nor border
points are considered noise, i.e. outliers.
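A brief sketch using scikit-learn's DBSCAN (the eps and min_samples values are illustrative assumptions chosen for this synthetic dataset):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: a non-spherical shape where density-based
# clustering shines and KMeans struggles.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # noise/outlier points are labeled -1

# Recover the core / border / noise split described above.
core = np.zeros(len(X), dtype=bool)
core[db.core_sample_indices_] = True
border = (labels != -1) & ~core
noise = labels == -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Every point falls into exactly one of the three roles, and the cluster count emerges from the data rather than being specified up front.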
How DBSCAN Works
Let's delve deeper into the workings of DBSCAN. First, the algorithm selects an unvisited data point and examines its neighborhood within the \(\epsilon\) radius. If the
neighborhood contains at least \(min\_samples\) data points, a new cluster is formed, and the algorithm expands it by recursively finding all connected data points
that meet the density requirement. If the initial point does not meet the density threshold, it is provisionally marked as noise; it may later be re-labeled as a border point if it falls within the \(\epsilon\)-neighborhood of a core point.
1. Select an unvisited point.
2. Check the density of its \(\epsilon\)-neighborhood.
3. Form and expand a cluster if the density threshold is met.
4. Mark the point as noise if no cluster can be expanded from it.
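The steps above can be sketched in plain Python (a simplified, unoptimized illustration; in practice you would use a library implementation):

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Label each row of X with a cluster id; -1 means noise."""
    n = len(X)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

    for i in range(n):                    # step 1: select an unvisited point
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(neighbors(i))
        if len(seeds) < min_samples:      # step 2: density check
            continue                      # step 4: stays noise (for now)
        labels[i] = cluster               # step 3: form a new cluster
        while seeds:                      # expand over density-connected points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster       # border (or core) point joins cluster
            if not visited[j]:
                visited[j] = True
                nbrs = neighbors(j)
                if len(nbrs) >= min_samples:   # j is itself a core point
                    seeds.extend(nbrs)         # keep expanding through it
        cluster += 1
    return labels
```

Note how a point first marked noise can still be absorbed into a later cluster as a border point, exactly as described above.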
Use Cases: Agglomerative & DBSCAN
Agglomerative Clustering
• Gene expression analysis: identify hierarchical relationships between genes.
• Customer segmentation in marketing: group customers based on purchasing behavior and demographics.
• Document clustering: identify topics based on textual analysis.

DBSCAN
• Geographic data: cluster cities based on population density, isolating noise.
• Image processing: segment complex textures in satellite imagery or medical scans.
• Anomaly detection: detect unusual behavior in network traffic or financial transactions.
Dendrograms Explained
A dendrogram serves as a visual tool for understanding the hierarchical structure produced by agglomerative clustering.
It displays the sequence of cluster merges, with the height of each branch indicating the distance between the merged
clusters. By cutting the dendrogram at a chosen height, you can select a suitable number of clusters for your
data. A higher cut leads to fewer, larger clusters, while a lower cut results in more, smaller clusters.
• Nodes: represent data points or clusters.
• Branches: show the sequence of cluster merges.
• Height: indicates the distance between the merged clusters.
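To make the cutting idea concrete, here is a small sketch using SciPy's hierarchical-clustering utilities (the data and cut height are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

# Z records the full merge history; scipy.cluster.hierarchy.dendrogram(Z)
# would draw it as a dendrogram.
Z = linkage(X, method="ward")

# "Cutting" at height t undoes every merge above that distance,
# leaving the clusters that existed below the cut.
labels = fcluster(Z, t=5.0, criterion="distance")
```

Raising `t` undoes fewer merges from the top and therefore yields fewer, larger clusters; lowering it yields more, smaller ones.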
Conclusion & Choosing Techniques
Choosing the right clustering technique depends on your data and goals. KMeans is fast and suitable for
spherical data with a known number of clusters, but sensitive to initialization. Agglomerative clustering is
valuable when a hierarchical structure is expected, particularly useful for small datasets, but can be
computationally expensive. DBSCAN excels at handling arbitrary shapes and identifying noise, making it ideal
for data with complex relationships.
When deciding which technique to use, always experiment and compare results. Each dataset has its own
story to tell. By understanding the strengths and weaknesses of each algorithm, you can unlock valuable
insights and make informed decisions.
• KMeans: fast, spherical data, fixed \(k\)
• Agglomerative: hierarchical needs, small datasets
• DBSCAN: arbitrary shapes, handles noise
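To illustrate the "experiment and compare" advice (the dataset and parameter values are assumptions for demonstration), the three algorithms can be run side by side on a non-spherical dataset:

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Interleaved half-moons: non-spherical clusters with known ground truth.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

models = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=2),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),
}

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
scores = {name: adjusted_rand_score(y, m.fit_predict(X))
          for name, m in models.items()}
```

On this shape the density-based method tends to score highest, while KMeans, whose spherical assumption is violated, lags behind; on spherical, well-separated data the ranking would differ.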