Clustering Techniques - Hierarchical, K-Means Clustering

Hierarchical and k-means clustering are common clustering techniques. Hierarchical clustering finds successive clusters using previously established clusters, in either an agglomerative or a divisive manner. K-means clustering partitions data into k clusters by minimizing distances between data points and cluster centroids, iteratively reassigning points until the centroids converge. While useful for data exploration, k-means clustering has weaknesses such as sensitivity to initialization and the need to pre-specify the number of clusters.


Clustering Techniques –
Hierarchical, K-means Clustering

1
INTRODUCTION
What is clustering?

• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.

2
Types of clustering:
1. Hierarchical algorithms: these find successive clusters using previously established clusters (a short agglomerative-clustering sketch follows this slide).
   1. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
   2. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
2. Partitional clustering: partitional algorithms determine all clusters at once. They include:
   – K-means and derivatives
   – Fuzzy c-means clustering
   – QT clustering algorithm
3
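
To make the agglomerative ("bottom-up") idea concrete, here is a minimal sketch using SciPy's hierarchical-clustering routines. The sample points and the choice of single linkage are illustrative assumptions, not part of the original slides.

# Minimal agglomerative clustering sketch (illustrative data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming two loose groups (hypothetical example data).
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [5.0, 7.0], [4.5, 6.5], [5.5, 7.5]])

# "Bottom-up": start with every point as its own cluster and repeatedly
# merge the two closest clusters; 'single' linkage uses nearest-neighbor distance.
Z = linkage(X, method="single", metric="euclidean")

# Cut the merge tree so that exactly two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]

Cutting the same merge tree at different heights yields different numbers of clusters, which is what distinguishes hierarchical from partitional methods.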
Common distance measures:

• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
They include:
1. The Euclidean distance (also called 2-norm distance) is given by:
   $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
2. The Manhattan distance (also called taxicab norm or 1-norm) is given by:
   $d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$
4
3. The maximum norm is given by:
   $d(p, q) = \max_{1 \le i \le n} |p_i - q_i|$
4. The Mahalanobis distance corrects the data for different scales and correlations in the variables.
5. Inner product space: the angle between two vectors can be used as a distance measure when clustering high-dimensional data.
6. Hamming distance (sometimes edit distance) measures the minimum number of substitutions required to change one member into another.
5
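
As a quick reference, the sketch below evaluates several of these measures with NumPy; the two example points are made up for illustration.

# Sketch of several distance measures (illustrative points).
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))  # 2-norm distance
manhattan = np.sum(np.abs(p - q))          # 1-norm (taxicab) distance
maximum   = np.max(np.abs(p - q))          # maximum norm

# Angle between the vectors, usable as a dissimilarity for high-dimensional data.
cos_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
angle = np.arccos(np.clip(cos_sim, -1.0, 1.0))

print(euclidean, manhattan, maximum, angle)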
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
6
• An algorithm for partitioning (or clustering) N data points into K disjoint subsets $S_j$, each containing $N_j$ data points, so as to minimize the sum-of-squares criterion

$J = \sum_{j=1}^{K} \sum_{n \in S_j} \| x_n - \mu_j \|^2$

where $x_n$ is a vector representing the nth data point and $\mu_j$ is the geometric centroid of the data points in $S_j$.
7
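
The criterion is straightforward to state in code. The small sketch below computes J for a given assignment; the function and argument names are my own, chosen for illustration.

# Sketch: the sum-of-squares criterion J for a given clustering.
import numpy as np

def sum_of_squares(X, labels, centroids):
    # J = sum over clusters j of sum over points n in S_j of ||x_n - mu_j||^2
    J = 0.0
    for j, mu in enumerate(centroids):
        members = X[labels == j]          # points currently assigned to cluster j
        J += np.sum((members - mu) ** 2)  # squared Euclidean distances to centroid
    return J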
• Simply speaking, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between each data point and the corresponding cluster centroid.

8
How does the K-Means Clustering algorithm work?

9
• Step 1: Begin with a decision on the value of k = number of clusters.
• Step 2: Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N - k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.

10
• Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroids of both the cluster gaining the new sample and the cluster losing the sample.
• Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments (a code sketch of these steps follows).

11
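
A minimal NumPy sketch of Steps 1-4 is given below. The names are my own, the initial centroids are drawn at random from the data, and, for brevity, centroids are recomputed once per pass (the batch variant) rather than after every single switch as Step 3 describes; both variants converge in the same sense.

# Sketch of the k-means loop from Steps 1-4 (batch-update variant).
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k, and take k distinct data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iters):
        # Step 3: distance from every point to every centroid, then
        # reassign each point to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: converged once a full pass causes no new assignments.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the mean of its current members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids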
A simple example showing the working of the k-means algorithm
(using K=2)

12
Step 1:
Initialization: we randomly choose the following two centroids (k=2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

13
Step 2:
• Using these centroids, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}.
• Their new centroids are the means of the member points (the table of values appears only in the original slide).
14
Step 3:
• Now using these centroids, we compute the Euclidean distance of each object to each centroid, as shown in the slide's table (not reproduced here).
• Therefore, the new clusters are: {1, 2} and {3, 4, 5, 6, 7}.
• The next centroids are: m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

15
• Step 4:
The clusters obtained are {1, 2} and {3, 4, 5, 6, 7}.
• Since there is no change in the clusters, the algorithm halts here, and the final result consists of the two clusters {1, 2} and {3, 4, 5, 6, 7}.

16
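
The slides' table of data points is not reproduced in this text version. The sketch below re-runs the K=2 example starting from the slide's initial centroids; the seven points are an assumed reconstruction, chosen to be consistent with the clusters and final centroids quoted above, not the original table.

# Re-running the K=2 example; the points are an assumed reconstruction
# consistent with the final centroids m1=(1.25,1.5) and m2=(3.9,5.1).
import numpy as np

X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])  # the slide's initial m1, m2

for _ in range(10):  # a few passes suffice for this small example
    # Assign each point to its nearest centroid (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute the centroids as the cluster means.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])

print(labels)     # [0 0 1 1 1 1 1] -> clusters {1,2} and {3,4,5,6,7}
print(centroids)  # approximately [[1.25 1.5 ] [3.9  5.1 ]]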
PLOT (figure omitted)
17
(with K=3)
Step 1 and Step 2 (figures omitted)
18
PLOT (figure omitted)
19
Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping determines the clusters significantly.
2. The number of clusters, K, must be determined beforehand. A further disadvantage is that the algorithm does not yield the same result on each run, since the resulting clusters depend on the initial random assignments.
3. We never know the real clusters: with the same data, if the points are presented in a different order the algorithm may produce different clusters when the number of data points is small.
4. It is sensitive to the initial conditions: different initial conditions may produce different clusterings, and the algorithm may become trapped in a local optimum (the sketch below illustrates this).

20
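
Weaknesses 2 and 4 are easy to observe by running k-means several times with different random initializations and comparing the final sum-of-squares. The sketch below uses scikit-learn's KMeans with a single random initialization per run; the blob data set is made up for illustration.

# Sketch: sensitivity of k-means to initialization (illustrative data).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# One random initialization per run; different seeds can land in different
# local optima with different final sum-of-squares (inertia).
for seed in range(5):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")

In practice, k-means++ initialization and several restarts (scikit-learn's defaults) mitigate, but do not eliminate, this weakness.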
Applications of K-Means Clustering
• It is relatively efficient and fast: it computes its result in O(tkn), where n is the number of objects or points, k is the number of clusters, and t is the number of iterations.
• k-means clustering can be applied to machine learning or data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories (a process known as vector quantization); the same idea underlies k-means image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical display devices and for image quantization (a short palette sketch follows).
21
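
As an example of the palette application, the following sketch quantizes an RGB image to k colors by clustering its pixels. The image array is assumed to come from elsewhere, and the use of scikit-learn here is my choice, not the slides'.

# Sketch: color-palette selection / image quantization with k-means.
# `image` is assumed to be an (H, W, 3) uint8 RGB array loaded elsewhere.
import numpy as np
from sklearn.cluster import KMeans

def quantize(image, k=16):
    pixels = image.reshape(-1, 3).astype(float)     # one row per pixel
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_.astype(np.uint8)  # the k representative colors
    # Replace every pixel with its nearest palette color.
    return palette[km.labels_].reshape(image.shape)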
CONCLUSION
• The k-means algorithm is useful for undirected knowledge discovery and is relatively simple.
• K-means has found widespread usage in many fields, ranging from unsupervised learning for neural networks to pattern recognition, classification analysis, artificial intelligence, image processing, machine vision, and many others.

22
