Clustering Algorithms

The document discusses different clustering techniques, including hierarchical and partitional clustering. It describes hierarchical agglomerative clustering and three linkage methods - single, complete, and average linkage. It also explains k-means clustering, including how it works, the algorithm, updating cluster means, and stopping criteria. Clustering and biclustering of microarray data are also briefly mentioned.

Clustering

Dr. Zoya Khalid


[email protected]
Clustering techniques

• Hierarchical: Organizes elements into a tree; the leaves represent objects (genes, etc.), and the length of the path between two leaves represents the distance between those objects. Similar objects lie within the same subtrees. It has
two types:
• Agglomerative (Bottom-Up): Start with every element in its own cluster,
and iteratively join clusters together
• Divisive (Top-Down): Start with one cluster and iteratively divide it into
smaller clusters
Cont'd…
Measures of similarity and dissimilarity (distance)
• There are many different ways of calculating similarity and distance
• Knowing your data is important
• When working with distances, pay attention to three properties:
positivity, symmetry, and the triangle inequality.
• Examples
• Euclidean distance
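As a quick check of the three properties just listed, here is a plain-Python Euclidean distance (the function name and sample points are illustrative, not from the slides):

```python
def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

# Positivity: d(x, y) >= 0, and d(x, x) == 0
assert euclidean(a, b) == 5.0 and euclidean(a, a) == 0.0
# Symmetry: d(x, y) == d(y, x)
assert euclidean(a, b) == euclidean(b, a)
# Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert euclidean(a, c) <= euclidean(a, b) + euclidean(b, c)
```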
Hierarchical Agglomerative Clustering
Most hierarchical clustering algorithms are agglomerative.
Three linkage techniques are used to recompute distances between clusters: single, complete, and average linkage.
Hierarchical clustering: Recomputing distances
d_min(C, C*) = min d(x, y)  over all x ∈ C, y ∈ C*
• Distance between two clusters is the smallest distance between any pair of their elements
(single-linkage)

d_max(C, C*) = max d(x, y)  over all x ∈ C, y ∈ C*
• Distance between two clusters is the largest distance between any pair of their elements
(complete-linkage)

d_avg(C, C*) = (1 / (|C| · |C*|)) Σ d(x, y)  over all x ∈ C, y ∈ C*
• Distance between two clusters is the average distance over all pairs of their elements
(average-linkage)
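The three linkage rules above can be sketched in plain Python (the `euclidean` helper, `cluster_distance` name, and sample clusters are illustrative, not from the slides):

```python
from itertools import product

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def cluster_distance(C, C_star, linkage="single"):
    """Distance between clusters C and C* under a linkage rule."""
    pair_dists = [euclidean(x, y) for x, y in product(C, C_star)]
    if linkage == "single":    # d_min: smallest pairwise distance
        return min(pair_dists)
    if linkage == "complete":  # d_max: largest pairwise distance
        return max(pair_dists)
    if linkage == "average":   # d_avg: mean over all |C|·|C*| pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage: {linkage}")

C = [(0.0, 0.0), (1.0, 0.0)]
C_star = [(4.0, 0.0), (5.0, 0.0)]
# single = 3.0, complete = 5.0, average = 4.0 for these clusters
```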
Single Linkage example
Single Linkage continued

[Figure: single-linkage dendrogram over points A, B, D, F]
Continued
Complete Linkage Method
Cont'd… [worked complete-linkage example continues in figures]
Which Distance Measure is Better?
• Each method has advantages and disadvantages; the choice is application-dependent. Single-link and complete-link are the most common
methods
• Single-link:
• Can find irregular-shaped clusters
• Sensitive to outliers
• Complete-link, Average-link:
• Robust to outliers
• Tend to break large clusters
• Prefer spherical (smaller-sized) clusters
Partitional clustering
• It determines all clusters at once, rather than building a hierarchy
Partitional methods include:
• K-means and derivatives
• Fuzzy c-means clustering
• QT clustering algorithm
K-Means Clustering
• Consider an example in which our vectors have 2 dimensions
[Figure: 2-D profiles plotted as points, with each cluster center marked "+"]
K-Means clustering
• each iteration involves two steps
• assignment of profiles to clusters
• re-computation of the cluster centers (means)
[Figure: left, profiles assigned to the nearest cluster center; right, cluster centers recomputed as the means of their assigned profiles]


Example
Distance between two clusters
Distance from Cluster 1 – Cluster 2
Tabulate them
Tabulate the new dataset
[Worked example figures: x-y scatter plots of the points, with distance tables recomputed after each update]
Elbow Method
• It involves running the algorithm in a loop with an increasing number of clusters k, and then plotting a clustering score (e.g., the within-cluster sum of squares) as a function of k; the bend ("elbow") in the curve suggests a good number of clusters.
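A minimal sketch of the elbow loop, assuming a simplified 1-D k-means and the within-cluster sum of squares (WCSS) as the clustering score; all function names and the sample data below are illustrative:

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Minimal 1-D k-means; returns the final cluster centers."""
    centers = random.Random(seed).sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

def wcss(xs, centers):
    """Clustering score: within-cluster sum of squared distances."""
    return sum(min((x - c) ** 2 for c in centers) for x in xs)

def elbow_scores(xs, max_k, restarts=5):
    """WCSS for k = 1..max_k, keeping the best of several random starts."""
    return [min(wcss(xs, kmeans_1d(xs, k, seed=s)) for s in range(restarts))
            for k in range(1, max_k + 1)]

# Three well-separated 1-D groups: the score should drop sharply until k = 3
data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.2, 8.9]
scores = elbow_scores(data, max_k=4)
```

Plotting `scores` against k, the curve falls steeply up to the true number of groups and flattens afterwards; that bend is the elbow.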
How K-means algorithm works
K-Means clustering algorithm
• Input: K, the number of clusters, and a set X = {x1, …, xN} of data points, where
each xi is a p-dimensional vector
• Initialize
• Select initial cluster means f1, ….., fK
• Repeat until convergence
• Assign each xi to cluster C(i) such that

C(i) = argmin_{1≤k≤K} ‖xi − fk‖²

• Re-estimate the mean of each cluster based on new members


K-means: updating the mean
• To compute the mean of the kth cluster

fk = (1 / Nk) Σ_{i : C(i) = k} xi

where Nk is the number of points in cluster k, and the sum runs over all points assigned to cluster k


K-means stopping criteria
1. The assignment of objects to clusters doesn't change (convergence)

2. The maximum number of iterations is reached
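The assignment step, mean update, and both stopping criteria above can be combined into a minimal plain-Python sketch (the `kmeans` name, the choice of the first K points as initial means, and the sample data are illustrative, not from the slides):

```python
def sq_dist(x, f):
    """Squared Euclidean distance ||x - f||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, f))

def kmeans(X, K, max_iters=100):
    means = [X[i] for i in range(K)]  # initial means f1..fK: first K points
    assign = None
    for _ in range(max_iters):        # stopping criterion 2: max iterations
        # Assignment step: C(i) = argmin_k ||xi - fk||^2
        new_assign = [min(range(K), key=lambda k: sq_dist(x, means[k]))
                      for x in X]
        if new_assign == assign:      # stopping criterion 1: no change
            break
        assign = new_assign
        # Mean update: fk = (1/Nk) * sum of all xi with C(i) = k
        for k in range(K):
            members = [X[i] for i in range(len(X)) if assign[i] == k]
            if members:               # keep the old mean if a cluster empties
                dims = len(members[0])
                means[k] = tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dims))
    return means, assign

means, assign = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], K=2)
# assign == [0, 0, 1, 1]; means == [(0.0, 0.5), (10.0, 10.5)]
```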


Microarray data
Clustering of microarray data
Clustering and Biclustering
• Biclustering - identifies groups of genes with similar/coherent
expression patterns under a specific subset of the conditions.
• Clustering - identifies groups of genes (or conditions) that show similar
activity patterns across all of the conditions (or all of the genes)
under analysis.
