Clustering Notes

Unsupervised Machine Learning Algorithm

Uploaded by

SUGATA SENGUPTA

Machine Learning: Unsupervised Classification

Dr. Muhammad Shaheen

Clustering

© M. Shahbaz – 2006

Lecture Outline
• What is Clustering
• Supervised and Unsupervised Classification
• Types of Clustering Algorithms
• Most Common Techniques
• Areas of Applications
• Discussion
• Result

Clustering - Definition

─ Process of grouping similar items together

─ Objects in the same cluster should be very similar
to each other, but…
─ …should be very different from the objects of
other clusters
─ We can say that intra-cluster similarity between
objects is high and inter-cluster similarity is low
─ Important human activity: used from early
childhood to distinguish between different
items, such as cars and cats, or animals and plants

Supervised and Unsupervised Classification

─ What is Classification?
─ What is Supervised Classification/Learning?
─ What is Unsupervised Classification/Learning?
─ SOM – Self-Organizing Maps

Types of Clustering Algorithms

─ Clustering has been a popular area of research

─ Several methods and techniques have been
developed to determine natural grouping among
the objects

Jain, A. K., Murty, M. N., and Flynn, P. J., Data Clustering: A Survey.
ACM Computing Surveys, 1999. 31: pp. 264-323.

Jain, A. K. and Dubes, R. C., Algorithms for Clustering Data.
Englewood Cliffs, NJ: Prentice Hall, 1988. ISBN 013022278X.

Types of Clustering Algorithms

Clustering
• Hierarchical Methods
  – Agglomerative Algorithms
  – Divisive Algorithms
• Partitioning Methods
  – Relocation Algorithms: Probabilistic Clustering, K-medoids Methods, K-means Methods
  – Density-Based Algorithms: Density-Based Connectivity Clustering, Density Functions Clustering
• Grid-Based Methods
• Clustering Algorithms Used in Machine Learning
  – Gradient Descent and Artificial Neural Networks
  – Evolutionary Methods
• Algorithms for High-Dimensional Data
  – Subspace Clustering
  – Projection Techniques
  – Co-Clustering Techniques

Classification vs. Clustering
Classification:
Supervised learning: learns a method for predicting the
instance class from pre-labeled (classified) instances
Clustering:
Unsupervised learning: finds the “natural” grouping of
instances given unlabeled data

Clustering Evaluation

• Manual inspection
• Benchmarking on existing labels
• Cluster quality measures
– distance measures
– high similarity within a cluster, low across
clusters
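To make "high similarity within a cluster, low across clusters" concrete, here is an illustrative Python sketch (not from the original slides) that compares the average distance between points sharing a cluster label with the average distance between points in different clusters. Euclidean distance is assumed as the measure, and the function names and toy points are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def _average(xs):
    """Mean of a list, or 0.0 for an empty list."""
    return sum(xs) / len(xs) if xs else 0.0


def intra_inter_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Every unordered pair of points is visited once: the pair counts as
    intra-cluster if both points share a label, inter-cluster otherwise.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return _average(intra), _average(inter)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
# Grouping the two nearby pairs: small intra, large inter distance.
print(intra_inter_distances(pts, [0, 0, 1, 1]))  # (1.0, ~10.02)
# Mixing far-apart points: the relationship is reversed.
print(intra_inter_distances(pts, [0, 1, 0, 1]))  # (10.0, ~5.52)
```

A good clustering should show a mean intra-cluster distance well below the mean inter-cluster distance; the second labeling fails that test.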

The Distance Function

• Simplest case: one numeric attribute A

– Distance(X,Y) = |A(X) – A(Y)|
• Several numeric attributes:
– Distance(X,Y) = Euclidean distance between
X,Y

• Are all attributes equally important?

– Weighting the attributes might be necessary
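A minimal Python sketch of the distance function, assuming numeric attribute vectors. The optional `weights` list is a hypothetical parameter showing one simple way to weight attributes; with a single attribute and unit weight it reduces to |A(X) – A(Y)|.

```python
import math


def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between numeric vectors x and y.

    `weights` (hypothetical parameter for this sketch) scales each
    attribute's squared difference; None means all attributes count equally.
    """
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Single numeric attribute: the distance reduces to |A(X) - A(Y)|.
print(weighted_euclidean([3.0], [7.5]))                # 4.5
# Two attributes, equal weights: the classic 3-4-5 triangle.
print(weighted_euclidean([0, 0], [3, 4]))              # 5.0
# A zero weight removes the second attribute's influence entirely.
print(weighted_euclidean([0, 0], [3, 4], [1.0, 0.0]))  # 3.0
```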

Simple Clustering: K-means

Works with numeric data only

1) Pick a number (K) of cluster centers (at
random)
2) Assign every item to its nearest cluster
center (e.g. using Euclidean distance)
3) Move each cluster center to the mean of
its assigned items
4) Repeat steps 2,3 until convergence
(change in cluster assignments less than
a threshold)
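The four steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: the K initial centers are K randomly chosen data points, distance is Euclidean (`math.dist`, Python 3.8+), and "convergence" means that no assignment changes (a threshold of zero). The name `kmeans` and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of numeric tuples.

    Returns (centers, labels). Stops when no point changes cluster
    or after max_iter rounds.
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as initial centers, at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # Step 4: converged, no assignment changed
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(data, k=2)
print(labels)
```

On this well-separated toy data the first three points end up in one cluster and the last three in the other, with centers near (1.17, 1.17) and (8.5, 8.5). Keeping the old center when a cluster empties is one simple choice; other implementations re-seed it instead.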

K-means example, step 1
[Figure: points in the X–Y plane; pick 3 initial cluster centers k1, k2, k3 (randomly)]

K-means example, step 2
[Figure: assign each point to the closest cluster center]

K-means example, step 3
[Figure: move each cluster center (k1, k2, k3) to the mean of its cluster]

K-means example, step 4
[Figure: reassign points that are now closest to a different cluster center]
Q: Which points are reassigned?

K-means example, step 4 …
A: three points (shown with animation)

K-means example, step 4b
[Figure: re-compute the cluster means]

K-means example, step 5
[Figure: move the cluster centers to the cluster means]

Squared Error Criterion
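The squared error criterion is commonly written as E = Σ_{i=1..k} Σ_{p ∈ C_i} ‖p − m_i‖², where C_i is the i-th cluster and m_i its mean; k-means does not increase E at any iteration. A minimal Python sketch of computing E, with hypothetical names and toy data:

```python
def squared_error(points, labels, centers):
    """Sum over all points of the squared Euclidean distance to the
    center of the cluster the point is assigned to."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


pts = [(1, 1), (3, 1), (10, 10)]
labels = [0, 0, 1]
centers = [(2, 1), (10, 10)]  # each center is the mean of its cluster
print(squared_error(pts, labels, centers))  # 2
```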

Pros and cons of K-Means

K-means variations

• K-medoids – instead of the mean, use the medoid
(the most centrally located object) of each cluster;
like the median, it resists extreme values
– Mean of 1, 3, 5, 7, 9 is 5
– Mean of 1, 3, 5, 7, 1009 is 205
– Median of 1, 3, 5, 7, 1009 is 5
– Median advantage: not affected by extreme
values
• For large databases, use sampling
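The mean and median figures above can be checked with Python's standard `statistics` module. The last computation is an illustrative addition: it finds the one-dimensional medoid, the data value with the smallest total distance to all the others, which here coincides with the median.

```python
from statistics import mean, median

values = [1, 3, 5, 7, 9]
skewed = [1, 3, 5, 7, 1009]  # one extreme value

print(mean(values))    # 5
print(mean(skewed))    # 205
print(median(skewed))  # 5

# Medoid: the actual data value minimizing the sum of distances to all values.
medoid = min(skewed, key=lambda c: sum(abs(c - v) for v in skewed))
print(medoid)          # 5
```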

k-Medoids

The k-Medoids Algorithm

Evaluating Cost of Swapping Medoids

Four Cases

Total Cost of Swap
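The swap-cost slides can be illustrated with a minimal Python sketch of a PAM-style k-medoids search. In the four-case analysis, each object's cost may change when medoid i is replaced by non-medoid h; adding up those changes gives the total cost of the swap, which equals the difference in total cost before and after the swap. The sketch simply recomputes that difference, which is easier to read but slower than the case analysis. The function names are hypothetical, and starting from the first k points is a simplification (PAM normally has its own initialization phase).

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """PAM-style k-medoids sketch (swap phase only).

    Starts from the first k points as medoids, then evaluates every
    (medoid i, non-medoid h) swap. The total cost of a swap is the change
    in total_cost; the best improving swap is applied until no swap
    lowers the cost.
    """
    medoids = list(points[:k])
    cost = total_cost(points, medoids)
    while True:
        best_delta, best_medoids = 0.0, None
        for i, h in product(range(k), points):
            if h in medoids:
                continue  # h must be a non-medoid object
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            delta = total_cost(points, candidate) - cost  # total cost of swap
            if delta < best_delta:
                best_delta, best_medoids = delta, candidate
        if best_medoids is None:  # no swap has negative total cost: stop
            return medoids
        medoids = best_medoids
        cost = total_cost(points, medoids)
```

For example, with the points `[(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]` and `k=2`, the sketch ends with medoids `(1, 1)` and `(8, 8)`, one per natural group.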

K-means clustering summary

Advantages
• Simple, understandable
• Items automatically assigned to clusters

Disadvantages
• Must pick number of clusters beforehand
• All items forced into a cluster
• Too sensitive to outliers, since an object with an
extremely large value may substantially distort the
distribution of data

Hierarchical clustering
• Agglomerative Clustering
– Start with single-instance clusters
– At each step, join the two closest clusters
– Design decision: distance between clusters
• Divisive Clustering
– Start with one universal cluster
– Find two clusters
– Proceed recursively on each subset
– Can be very fast
• Both methods produce a dendrogram
[Figure: example dendrogram over items g, a, c, i, e, d, k, b, j, f, h]
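Agglomerative clustering, as described above, can be sketched in a few lines of Python. The "design decision" of how to measure the distance between clusters is exposed as a `linkage` argument: `min` over pairwise point distances gives single link and `max` gives complete link (both are standard choices; the function name and argument are hypothetical). The returned merge history is what a dendrogram draws. This naive version rescans all cluster pairs at every step, so it is only suitable for small data; divisive clustering is not shown.

```python
import math


def agglomerative(points, linkage=min):
    """Agglomerative clustering sketch.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters. Cluster distance = linkage() over all pairwise
    point distances (min -> single link, max -> complete link).
    Returns a list of (cluster_a, cluster_b, merge_distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid after deletion
        clusters[a] = merged
    return merges


pts = [(0,), (1,), (5,), (6,), (20,)]
for step in agglomerative(pts):
    print(step)
```

With single link the merge distances are 1.0, 1.0, 4.0 and 14.0; passing `linkage=max` gives 1.0, 1.0, 6.0 and 20.0 for the same points, showing how the design decision changes the dendrogram heights.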

Partial Supervision of Clustering

A two-dimensional image of supervised clusters

Partial Supervision of Clustering

A two-dimensional image of supervised clusters (real case)

Partial Supervision of Clustering

[Figure: a disputed data point where clusters overlap]

A two-dimensional image of the different zones of overlapping clusters
that both claim a data point (more than two clusters claiming a point is
also common)

Research Problems

─ Effective and efficient methods of clustering
─ Scalability
─ Handling different types of data
─ Handling complex multidimensional data
─ Complex shapes of clusters
─ Subspace clustering
─ Overlapping clusters, etc.

Examples of Clustering Applications

• Marketing: discover customer groups and use
them for targeted marketing and re-organization
• Astronomy: find groups of similar stars and
galaxies
• Earthquake studies: observed earthquake
epicenters should be clustered along continental
faults
• Genomics: find groups of genes with similar
expression patterns
• …

Clustering Summary
• unsupervised
• many approaches
– K-means – simple, sometimes useful
• K-medoids is less sensitive to outliers
– Hierarchical clustering – works for symbolic
attributes
– Can be used to fill in missing values
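One possible way to use clustering to fill in missing values, sketched here as an assumption rather than as the method of these slides: find the nearest cluster center using only the attributes the record actually has, then copy that center's value into each missing attribute. The function `impute_from_clusters` and the center values are hypothetical.

```python
import math


def impute_from_clusters(record, centers):
    """Fill None entries of `record` from the nearest cluster center.

    Nearness is measured only on the attributes the record actually has.
    """
    known = [i for i, v in enumerate(record) if v is not None]
    nearest = min(
        centers,
        key=lambda c: math.sqrt(sum((record[i] - c[i]) ** 2 for i in known)),
    )
    return [nearest[i] if v is None else v for i, v in enumerate(record)]


centers = [(1.0, 1.0, 0.0), (8.0, 8.0, 5.0)]  # hypothetical cluster means
print(impute_from_clusters([7.5, 8.2, None], centers))  # [7.5, 8.2, 5.0]
```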

Questions
