CLUSTERING
Supervised learning vs. unsupervised learning
• Supervised learning: discover patterns in the data that relate data attributes to a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute
in future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
Clustering:
• Clustering is a technique for finding similarity groups in data, called clusters. That is,
• it groups data instances that are similar to (near) each other into one cluster and data instances that are very different (far away) from each other into different clusters.
• Clustering is often called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as they are in supervised learning.
• For historical reasons, clustering is often considered synonymous with unsupervised learning.
• In fact, association rule mining is also unsupervised.
An illustration
• The data set has three natural groups of data points, i.e., 3
natural clusters.
What is clustering for?
• Let us see some real-life examples
• Example 1: group people of similar sizes together to make “small”, “medium” and “large” T-shirts.
• Example 2: In marketing, segment customers according to
their similarities
• To do targeted marketing.
• Example 3: Given a collection of text documents, we want to
organize them according to their content similarities,
• To produce a topic hierarchy
Applications:
• In fact, clustering is one of the most utilized data mining
techniques.
• It has a long history and has been used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
• In recent years, due to the rapid increase in the number of online documents, text clustering has become important.
Common ways to represent clusters:
• 1. Use the centroid of each cluster to represent the cluster.
• Compute the radius and standard deviation of the cluster to determine its spread in each dimension (see the sketch below).
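The following is a minimal sketch of this centroid-based representation, assuming NumPy; the function name and toy data are illustrative, not from the original slides.

```python
import numpy as np

def describe_cluster(points):
    """points: (n, d) array of the data points assigned to one cluster."""
    centroid = points.mean(axis=0)                     # center of the cluster
    dists = np.linalg.norm(points - centroid, axis=1)  # distance of each point to the centroid
    radius = dists.max()                               # distance to the farthest member
    spread = points.std(axis=0)                        # standard deviation per dimension
    return centroid, radius, spread

# Example: a small 2-D cluster
cluster = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]])
print(describe_cluster(cluster))
```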
2. Use a classification model
• All the data points in a cluster are regarded as having the same class label, e.g., the cluster ID.
• Run a supervised learning algorithm on the data to find a classification model (see the sketch below).
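One way this could look in practice, assuming scikit-learn: cluster IDs are treated as class labels and a decision tree is trained to predict them, so the tree's rules summarize each cluster. The data here are random and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.random.rand(100, 2)                     # toy data; stands in for a real data set
cluster_ids = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # cluster IDs become class labels

tree = DecisionTreeClassifier(max_depth=3).fit(X, cluster_ids)
print(export_text(tree))                       # human-readable rules describing the cluster regions
```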
3. Use frequent values to represent the cluster
• This method is mainly for clustering of categorical data (e.g.,
k-modes clustering).
• It is also the main method used in text clustering, where a small set of frequent words in each cluster is selected to represent the cluster (see the sketch below).
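A minimal sketch of the frequent-words representation, using only the Python standard library; the toy documents and stop-word list are made up for illustration.

```python
from collections import Counter

cluster_docs = [
    "clustering groups similar documents together",
    "document clustering finds topic groups",
    "similar documents form one cluster",
]
stop_words = {"the", "a", "one", "together"}

# Count word occurrences across all documents in the cluster, ignoring stop words.
counts = Counter(
    word
    for doc in cluster_docs
    for word in doc.lower().split()
    if word not in stop_words
)
print(counts.most_common(3))   # the few most frequent words label the cluster
```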
Clusters of arbitrary shapes:
• Hyper-elliptical and hyper-spherical clusters
are usually easy to represent, using their
centroid together with spreads.
• Irregular shape clusters are hard to represent.
They may not be useful in some applications.
• Using centroids is not suitable in general (upper figure).
• K-means clusters may be more useful (lower figure), e.g., for making two sizes of T-shirts.
Hierarchical Clustering:
• Produce a nested sequence of clusters, a tree, also called a dendrogram.
Explanation:
• At the bottom of the tree, there are 5 clusters (5 data points).
• At the next level, cluster 6 contains data points 1 and 2, and cluster 7
contains data points 4 and 5.
• As we move up the tree, we have fewer and fewer clusters.
• Since the whole clustering tree is stored, the user can choose to view
clusters at any level of the tree.
Types of hierarchical clustering:
• Agglomerative (bottom up) clustering: It builds the dendrogram
(tree) from the bottom level, and
• merges the most similar (or nearest) pair of clusters
• stops when all the data points are merged into a single cluster (i.e., the
root cluster).
• Divisive (top down) clustering: It starts with all data points in one
cluster, the root.
• Splits the root into a set of child clusters. Each child cluster is recursively
divided further
• stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
Agglomerative clustering:
It is more popular than divisive methods.
• At the beginning, each data point forms a cluster (also called a node).
• Merge nodes/clusters that have the least distance.
• Go on merging
• Eventually all nodes belong to one cluster
Agglomerative clustering algorithm:
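The algorithm itself appears as a figure in the original slides. The following is a minimal single-link sketch in plain Python, illustrative only and not a reproduction of that figure: start with one cluster per point, repeatedly merge the nearest pair, and record the merge sequence.

```python
import math

def single_link_distance(c1, c2):
    """Distance between two clusters = distance between their two closest points."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points):
    # At the beginning, each data point forms its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the least distance ...
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        # ... and merge them into one cluster.
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges   # the merge sequence defines the dendrogram
```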
An example: working of the algorithm
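The worked example in the original slides is also a figure. As a stand-in, here is how SciPy's agglomerative implementation (scipy.cluster.hierarchy.linkage, assumed available) traces the merges on five made-up 1-D points, echoing the five-point dendrogram described earlier.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1.0], [1.2], [3.0], [5.0], [5.1]])   # 5 toy data points
Z = linkage(points, method='single')                      # bottom-up, nearest-pair merges

# Each row of Z records one merge: [cluster_a, cluster_b, distance, new cluster size].
print(Z)
```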
Measuring the distance of two clusters:
• There are several ways to measure the distance between two clusters.
• Each results in a different variation of the algorithm.
• Single link
• Complete link
• Average link
• Centroids
• …
Single link method:
• The distance between two clusters
is the distance between two closest
data points in the two clusters, one
data point from each cluster.
• It can find arbitrarily shaped
clusters, but
• It may suffer from an undesirable “chain effect” caused by noisy points.
Complete link method:
• The distance between two clusters is the distance between the two furthest data points in the two clusters.
• It is sensitive to outliers because they are far away.
Average link and centroid methods:
• Average link: A compromise between
• the sensitivity of complete-link clustering to outliers and
• the tendency of single-link clustering to form long chains that do
not correspond to the intuitive notion of clusters as compact,
spherical objects.
• In this method, the distance between two clusters is the average of all pairwise distances between the data points in the two clusters.
• Centroid method: In this method, the distance between two clusters is the distance between their centroids (all four linkage choices are illustrated in the sketch below).
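A minimal sketch, assuming SciPy and NumPy, showing that single, complete, average, and centroid linkage are simply different cluster-distance definitions passed to the same agglomerative procedure; the random data and the choice of cutting the tree into 3 clusters are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)   # toy 2-D data

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)                      # build the dendrogram with this linkage
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 clusters
    print(method, labels)
```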