6 Clustering


ML Methods

 Supervised: Prediction (Classification, Regression)
 Unsupervised: Description (Clustering)
What is Clustering?
“It is an unsupervised descriptive data analytics.”
Definition: Clustering is the task of dividing the population or
data points into a number of groups such that data points in the
same group are more similar to one another than to data points in
other groups.

 A cluster is a group of objects that belong to the same
category or share similar properties.
 While doing cluster analysis, we first partition the set
of data into groups based on data similarity and
then assign labels to the groups.
 The main advantage of clustering over classification
is that it helps find useful features that
distinguish different groups.
Problem Statement (Objective function):
Given a set of data points, group them into clusters so that:
 Points within each cluster are similar to each other (intra-
cluster distances are minimized) - Homogeneity
 Points from different clusters are dissimilar (inter-cluster
distances are maximized) - Heterogeneity
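These two objectives can be made concrete with a small numeric check. The sketch below assumes Python, a toy set of 2-D points, and illustrative helper names (`dist`, `mean_pairwise`); none of these come from the notes:

```python
import math

# Two illustrative clusters of 2-D points (assumed data, for demonstration)
cluster_a = [(1.0, 1.0), (2.0, 1.0)]
cluster_b = [(4.0, 3.0), (5.0, 4.0)]

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mean_pairwise(ps, qs):
    """Average distance over all distinct cross pairs of two point lists."""
    pairs = [(p, q) for p in ps for q in qs if p != q]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

intra = mean_pairwise(cluster_a, cluster_a)  # homogeneity: want this small
inter = mean_pairwise(cluster_a, cluster_b)  # heterogeneity: want this large
print(intra < inter)  # True for a good clustering
```

A good clustering keeps the intra-cluster average well below the inter-cluster average; here intra is 1.0 while inter is about 3.9.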
Applications of Cluster Analysis
 Cluster analysis is broadly used in many applications such as market
research, pattern recognition, data analysis, and image processing.
 Clustering can also help marketers discover distinct groups in their customer
base and characterize those groups based on their purchasing
patterns.
 In the field of biology, it can be used to derive plant and animal taxonomies,
categorize genes with similar functionalities and gain insight into structures
inherent to populations.
 Clustering also helps in identification of areas of similar land use in an earth
observation database. It also helps in the identification of groups of houses in a
city according to house type, value, and geographic location.
 Clustering also helps in classifying documents on the web for information
discovery.
 Clustering is also used in outlier detection applications such as detection of
credit card fraud.
 Detecting anomalous behavior, such as unauthorized network intrusions, by
identifying patterns of use falling outside the known clusters.

Clustering Methods
Clustering methods can be classified into the following
categories:

 Partitioning Method (K-Means)
 Hierarchical Method (Agglomerative)
 Density-based Method (DBSCAN)

1-Partitioning Method

Suppose we are given a database of ‘n’ objects, and the
partitioning method constructs ‘k’ partitions of the data. Each
partition represents a cluster, and k ≤ n. This means that
the method classifies the data into k groups, which satisfy the
following requirements:
 Each group contains at least one object.
 Each object must belong to exactly one group (Hard
Clustering).

The k-means clustering algorithm


The k-means algorithm is perhaps the most commonly
used clustering method. Having been studied for several
decades, it serves as the foundation for many more
sophisticated clustering techniques.

Key points:
 The k-means algorithm assigns each of the n examples
to one of k clusters.
 Here k is a suitable number of clusters that must be
determined ahead of time and given to the algorithm in
advance.
 The goal is to minimize the differences within each
cluster and maximize the differences between the
clusters.
Procedure:

The algorithm essentially involves two phases (assignment and
update), applied iteratively.
 First, it assigns examples to an initial set of k clusters.
 Then, it updates the assignments by adjusting the
cluster boundaries.
 The process of assigning and updating repeats
until the changes no longer improve the cluster fit.
 At this point, the process stops and the clusters are
finalized.

Basically, there are three stopping criteria:

(i) Changes (movement of data points between clusters) no
longer improve the cluster-fit criterion (i.e., homogeneity
within clusters).
(ii) Data points stop shifting (no data point changes cluster).
(iii) A number of iterations, set in advance, is reached.
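The two phases and the stopping criteria above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production implementation; the `kmeans` name, the toy points, and the initial centroids are all assumptions:

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members, repeating until
    assignments stop changing (criteria i/ii) or max_iter is hit (iii)."""
    labels = None
    for _ in range(max_iter):
        # Assignment phase: index of the nearest centroid for every point.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:   # no point moved: clusters are stable
            break
        labels = new_labels
        # Update phase: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Illustrative run on four assumed 2-D points with k = 2.
labels, centroids = kmeans([(1, 1), (2, 1), (4, 3), (5, 4)],
                           [(1.0, 1.0), (2.0, 1.0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

The loop exits as soon as an assignment pass changes no labels, which is exactly stopping criterion (ii).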
Using distance to assign and update clusters

Euclidean distance

If the points (x1, y1) and (x2, y2) are in 2-dimensional space,
then the Euclidean distance between them is:

dist = √((x1 - x2)² + (y1 - y2)²)

For points (x1, y1, z1) and (x2, y2, z2) in 3-dimensional space,
the Euclidean distance between them is:

dist = √((x1 - x2)² + (y1 - y2)² + (z1 - z2)²)

As an example, the (Euclidean) distance
between points (2, -1) and (-2, 2) is found to be:
dist((2, -1), (-2, 2)) = √((2 - (-2))² + ((-1) - 2)²)
= √((4)² + (-3)²)
= √(16 + 9)
= √25
= 5
Note: Other distance functions can also be used, e.g., Manhattan
distance, Minkowski distance, Chebyshev distance,
Spearman correlation, edit distance, etc.
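As a sketch, the 2-D Euclidean and Manhattan distances can be written in Python (the function names are illustrative):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((2, -1), (-2, 2)))  # 5.0, matching the worked example above
print(manhattan((2, -1), (-2, 2)))  # 7
```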

K-Means Working Example

Medicine data:

Medicine   Weight (x)   PH (y)
A          1            1
B          2            1
C          4            3
D          5            4

Each medicine can be represented by a point (x, y).

Step: centroid initialization
Suppose k = 2, and the algorithm selects two random points (A
and B) as initial clusters. Then the centroids of the clusters are:
C1 = (1, 1)
C2 = (2, 1)

Step: distance calculation

Calculate the distance between each cluster centroid and each
data point. Let's use Euclidean distance.
Final distance matrix (values rounded to two decimals):

       A      B      C      D
C1     0      1      3.61   5
C2     1      0      2.83   4.24

The first row of the distance matrix gives the distance of each
data point to the first centroid, and likewise the second row
for the second centroid.

Step: data-point labeling

Assign each point to the cluster with the minimum distance:
A → Cluster 1; B, C, D → Cluster 2.

Step: Iteration (1)

New centroids
Now re-calculate the centroid of each cluster based on the new
members.
Group 1 has one member (A), so its centroid is (1, 1).
Group 2 has three members (B, C, D), so its centroid is:
C2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (3.67, 2.67)

Distance calculation:

       A      B      C      D
C1     0      1      3.61   5
C2     3.14   2.36   0.47   1.89

Data-point labeling: A, B → Cluster 1; C, D → Cluster 2.

Step: Iteration (2)

New centroids:
C1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
C2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)

Distance calculation:

       A      B      C      D
C1     0.5    0.5    3.2    4.61
C2     4.3    3.54   0.71   0.71

Data-point labeling: A, B → Cluster 1; C, D → Cluster 2.

Note: No changes were found in the clusters, i.e., the data
points are no longer moving. This means k-means clustering has
reached stability, and so no more iterations are needed.
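The stability reached above can be double-checked numerically. A short sketch, assuming Python and an illustrative `centroid` helper, that recomputes the final centroids and verifies that another assignment pass would move nothing:

```python
import math

# Final clusters from the worked example: {A, B} and {C, D}
g1 = [(1, 1), (2, 1)]   # A, B
g2 = [(4, 3), (5, 4)]   # C, D

def centroid(pts):
    """Mean of each coordinate over the points in a cluster."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

c1, c2 = centroid(g1), centroid(g2)
print(c1, c2)  # (1.5, 1.0) (4.5, 3.5)

# Every point is already nearest to its own centroid, so a further
# assignment pass would change nothing: k-means has converged.
stable = (all(math.dist(p, c1) <= math.dist(p, c2) for p in g1)
          and all(math.dist(p, c2) <= math.dist(p, c1) for p in g2))
print(stable)  # True
```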
Choosing the appropriate number of clusters

 A technique known as the elbow method attempts to
gauge how the homogeneity or heterogeneity within the
clusters changes for various values of k.
 As illustrated in the elbow diagrams, the
homogeneity within clusters is expected to increase as
additional clusters are added; similarly, heterogeneity
will also continue to increase with more clusters, so you
could continue to see improvements until each example
is in its own cluster.
 The goal is not to maximize homogeneity and
heterogeneity (think it over: otherwise you would end up
with single-data-point clusters), but rather to find the k
beyond which there are diminishing returns. This
value of k is known as the elbow point, because the curve
bends there like an elbow.
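The elbow can be seen on the medicine data from the worked example. A sketch, assuming Python, that computes the within-cluster sum of squares (WCSS, a standard homogeneity measure) for each k; the partitions are hand-picked for this tiny data set, an assumption made for illustration:

```python
def wcss(clusters):
    """Within-cluster sum of squares: squared distance of every point
    to its own cluster centroid, summed over all clusters."""
    total = 0.0
    for pts in clusters:
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts)
    return total

A, B, C, D = (1, 1), (2, 1), (4, 3), (5, 4)
partitions = {
    1: [[A, B, C, D]],
    2: [[A, B], [C, D]],
    3: [[A, B], [C], [D]],
    4: [[A], [B], [C], [D]],
}
for k, clusters in partitions.items():
    print(k, wcss(clusters))
# WCSS drops sharply from 16.75 (k=1) to 1.5 (k=2), then barely
# improves (0.5 at k=3, 0.0 at k=4): the elbow is at k=2.
```

Plotting WCSS against k and looking for the bend in the curve is exactly the elbow method described above.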
