K-Means and Hierarchical Clustering
What is Clustering?
● Clustering groups a set of objects so that objects in the same cluster are more similar to one another than to objects in other clusters.
K-Means Clustering
● K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms.
K-Means Clustering Algorithm with an Example
Given dataset: {2, 3, 4, 10, 11, 12, 20, 25, 30}
Number of clusters: k = 2
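The worked example above can be sketched in code. The following is a minimal pure-Python Lloyd's-algorithm implementation for 1-D data; the choice of initial centroids (the two extreme values) is an assumption, since the slide does not specify an initialization.

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data; returns (clusters, centroids)."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = [sum(c) / len(c) for c in clusters]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters, centroids

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
# Assumed initialization: the two extreme values of the dataset.
clusters, centroids = kmeans_1d(data, [min(data), max(data)])
```

With this start the algorithm converges in two iterations to the clusters {2, 3, 4, 10, 11, 12} and {20, 25, 30}, with centroids 7 and 25.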
Weaknesses of K-Means Clustering
● When there are only a few data points, the initial grouping largely determines the final clusters.
● The true clusters are never known: with few data points, presenting the same data in a different order may produce different clusters.
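The sensitivity to the initial grouping can be demonstrated on a toy 2-D dataset (invented here for illustration): two different starting centroids lead Lloyd's algorithm to two different local optima with very different within-cluster sums of squared errors (SSE).

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm on 2-D points; returns (clusters, within-cluster SSE)."""
    def d2(p, q):  # squared Euclidean distance
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: d2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its cluster.
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               for c in clusters]
        if new == centroids:  # converged
            break
        centroids = new
    sse = sum(d2(p, centroids[i]) for i, c in enumerate(clusters) for p in c)
    return clusters, sse

# Two tight pairs of points; the "bad" start places both centroids in one pair.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
_, sse_good = kmeans(pts, [(0.0, 0.5), (10.0, 0.5)])
_, sse_bad = kmeans(pts, [(0.0, 0.0), (0.0, 1.0)])
```

The good start finds the two vertical pairs (SSE = 1), while the bad start converges to a horizontal split (SSE = 100): both are fixed points of the algorithm, but only one is the intended clustering.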
Applications of K-Means Clustering
Hierarchical Clustering
● Agglomerative approach
Initialization: each object is its own cluster.
Iteration: merge the two clusters that are most similar to each other.
Termination: stop when all objects have been merged into a single cluster.
[Figure: dendrogram of the merges: a and b merge into ab; d and e merge into de; c joins de to form cde; finally ab and cde merge into abcde]
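The agglomerative loop above can be sketched as a naive O(n³) single-link implementation. The 1-D coordinates for a through e are invented so that the merges reproduce the pattern on the slide (a with b, d with e, then c with de, then everything).

```python
def agglomerative_single_link(points):
    """Naive single-link agglomerative clustering of labeled 1-D points.
    points: dict mapping label -> coordinate. Returns the merged sets, in order."""
    clusters = [{lab} for lab in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance
        # (minimum distance over all cross-cluster point pairs).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append(merged)
    return merges

# Hypothetical coordinates chosen so the merge pattern matches the slide's
# dendrogram; the relative order of the first two merges depends on the gaps.
coords = {"a": 1.0, "b": 2.0, "c": 6.0, "d": 8.0, "e": 8.5}
merges = agglomerative_single_link(coords)
```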
Dendrogram
● A binary tree that shows how clusters are
merged/split hierarchically
● Each node on the tree is a cluster; each leaf node is a
singleton cluster
Hierarchical Agglomerative Clustering: Linkage Methods
● The single linkage method is based on minimum distance,
or the nearest neighbor rule.
● The complete linkage method is based on the maximum
distance or the furthest neighbor approach.
● In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects, one from each cluster.
Centroid Method
● In the centroid method, the distance between two clusters is the distance between their centroids.
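The four linkage definitions can be computed directly. A small sketch on two illustrative 1-D clusters (values chosen here, not from the slides):

```python
from itertools import product
from statistics import mean

def inter_cluster_distances(c1, c2):
    """The four linkage distances between two clusters of 1-D points."""
    pair_dists = [abs(a - b) for a, b in product(c1, c2)]
    return {
        "single (min)":   min(pair_dists),           # nearest-neighbor pair
        "complete (max)": max(pair_dists),           # farthest-neighbor pair
        "average":        mean(pair_dists),          # mean over all pairs
        "centroid":       abs(mean(c1) - mean(c2)),  # distance of the means
    }

d = inter_cluster_distances([1, 2, 3], [7, 8, 9])
```

For these clusters the single-link distance is 4 (between 3 and 7), the complete-link distance is 8 (between 1 and 9), and both the average and centroid distances are 6; average and centroid need not coincide in general.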
[Figure: single linkage uses the minimum distance between Cluster 1 and Cluster 2; complete linkage uses the maximum distance; average linkage uses the average distance over all pairs; the centroid method uses the distance between the cluster centroids]
How to Merge Clusters?
Which inter-cluster distance should be used?
● Single-link
● Complete-link
● Average-link
● Centroid distance
How to Define Inter-Cluster Distance
● Single-link: the distance between two clusters is the distance of the closest pair of data objects belonging to different clusters.
● Complete-link: the distance between two clusters is the distance of the farthest pair of data objects belonging to different clusters.
● Average-link: the distance between two clusters is the average distance of all pairs of data objects belonging to different clusters.
● Centroid distance: the distance between two clusters is the distance between the means (centroids) of the clusters.
An Example of the Agglomerative Hierarchical Clustering Algorithm
[Figure: six data points, labeled 1–6, in the plane]
Result of the Single-Link Algorithm
[Figure: the nested single-link clusters of points 1–6 and the corresponding dendrograms]
Hierarchical Clustering: Comparison
[Figure: single-link and complete-link clusterings of the same six points, shown with their dendrograms]
Strength of Single-Link
Limitations of Single-Link
[Figure: original points vs. the two clusters found]
Strength of Complete-Link
Which Distance Measure is Better?
● Each method has advantages and disadvantages; the choice is application-dependent. Single-link and complete-link are the most common methods.
● Single-link
  ● Can find irregular-shaped clusters
  ● Sensitive to outliers; suffers from the so-called chaining effect
● Complete-link, average-link, and centroid distance
  ● Robust to outliers
  ● Tend to break large clusters
  ● Prefer spherical clusters
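The contrast can be seen on a small invented 1-D dataset: an elongated "chain" of points plus one compact pair. Cutting at three clusters, single-link keeps growing the chain, while complete-link breaks the elongated run into pieces.

```python
def agglomerative_1d(points, k, linkage):
    """Naive agglomerative clustering of 1-D points down to k clusters.
    linkage: 'single' (min pair distance) or 'complete' (max pair distance)."""
    agg = min if linkage == "single" else max
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)] + [merged]
    return sorted(clusters)

# An elongated chain (0 .. 8.5, roughly evenly spaced) plus a compact pair.
pts = [0, 1.9, 4, 6.2, 8.5, 20, 20.8]
single = agglomerative_1d(pts, 3, "single")
complete = agglomerative_1d(pts, 3, "complete")
```

Single-link returns {0, 1.9, 4, 6.2}, {8.5}, {20, 20.8}: the chain keeps absorbing its nearest neighbor. Complete-link returns {0, 1.9}, {4, 6.2, 8.5}, {20, 20.8}: the elongated run is split, illustrating why complete-link prefers compact, roughly spherical clusters.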