CLUSTERING
Clustering: Introduction
Clustering is the task of dividing the population or data points into a number of groups such
that data points in the same group are more similar to each other than to data points in other
groups. In simple words, the aim is to segregate groups with similar traits and assign them into
clusters.
Let’s understand this with an example. Suppose you are the head of a rental store and wish to
understand the preferences of your customers to scale up your business. Is it possible for you to
look at the details of each customer and devise a unique business strategy for each one of them?
Definitely not. But what you can do is cluster all of your customers into, say, 10 groups based
on their purchasing habits and use a separate strategy for the customers in each of these 10
groups. This is what we call clustering.
Now that we understand what clustering is, let’s take a look at the types of clustering.
Types of Clustering
Broadly speaking, clustering can be divided into two subgroups :
Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or it
does not. For example, in the scenario above each customer is put into exactly one of the 10
groups.
Soft Clustering: In soft clustering, instead of putting each data point into exactly one cluster, a
probability or likelihood of that data point belonging to each cluster is assigned. For example,
in the scenario above each customer is assigned a probability of belonging to each of the 10
clusters of the retail store.
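To make the distinction concrete, here is a minimal sketch in Python (assuming scikit-learn and NumPy are installed; the two-dimensional "customer" data are made up for illustration). KMeans returns a single cluster label per point (hard clustering), while a Gaussian mixture model returns a probability for every cluster (soft clustering).

# Hard vs. soft clustering on toy customer data (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# 60 synthetic customers described by two purchasing features
X = np.vstack([rng.normal(loc, 0.5, size=(20, 2)) for loc in ([0, 0], [3, 3], [0, 4])])

# Hard clustering: every customer receives exactly one cluster label
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(hard_labels[:5])                     # one integer label per customer

# Soft clustering: every customer receives a probability for each cluster
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.predict_proba(X)[0].round(3))    # probabilities over the 3 clusters, summing to 1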
K-Means Clustering
K-means is an iterative algorithm that partitions the data into a chosen number of clusters, K. It works in the following steps:
1. Specify the desired number of clusters K: Here we choose K = 2 for the 5 data points in the example.
2. Randomly assign each data point to a cluster: Let’s assign three points to cluster 1, shown
using red color, and two points to cluster 2, shown using grey color.
3. Compute cluster centroids: The centroid of the data points in the red cluster is shown using a
red cross, and that of the grey cluster using a grey cross.
4. Re-assign each point to the closest cluster centroid: Note that only the data point at the
bottom changes: it was assigned to the red cluster even though it is closer to the centroid of the
grey cluster. Thus, we re-assign that data point to the grey cluster.
5. Re-compute cluster centroids: Now, re-compute the centroids for both clusters.
6. Repeat steps 4 and 5 until no improvements are possible: We repeat the 4th and 5th steps
until the cluster assignments converge (K-means reaches a local optimum, not necessarily the
global one). When no data point switches between the two clusters for two successive
iterations, the algorithm terminates, unless a different stopping criterion is explicitly specified.
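To tie the steps together, here is a minimal from-scratch sketch in Python (assuming NumPy; the function name, the toy data and the choice of k are illustrative, not part of the original notes):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-means: random assignment, centroid update, re-assignment,
    repeated until the labels stop changing (steps 2-6 above)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))               # step 2: random assignment
    centroids = None
    for _ in range(max_iters):
        # steps 3 / 5: centroid of each cluster (re-seed an empty cluster with a random point)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else X[rng.integers(len(X))]
                              for j in range(k)])
        # step 4: re-assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):              # step 6: no point switched clusters
            break
        labels = new_labels
    return labels, centroids

# Example: two obvious groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9]])
labels, centroids = kmeans(X, k=2)
print(labels)        # e.g. [0 0 0 1 1] (label numbering may differ)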
Hierarchical Clustering
Hierarchical clustering, as the name suggests, is an algorithm that builds a hierarchy of clusters.
The algorithm starts with every data point assigned to a cluster of its own. The two nearest
clusters are then merged into a single cluster. The algorithm terminates when only one cluster is
left.
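As a minimal sketch of this bottom-up merging in Python (assuming SciPy and NumPy are installed; the 2-D data points are made up for illustration), scipy.cluster.hierarchy.linkage starts with every point as its own cluster and repeatedly merges the two closest clusters until only one remains:

import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# two well-separated groups of five 2-D points each
X = np.vstack([rng.normal([0, 0], 0.3, size=(5, 2)),
               rng.normal([4, 4], 0.3, size=(5, 2))])

# Each row of Z records one merge: (cluster i, cluster j, merge distance, new cluster size)
Z = linkage(X, method="single", metric="euclidean")
print(Z)    # 9 merge steps for 10 points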
The results of hierarchical clustering can be shown using a dendrogram. The dendrogram can be
interpreted as follows:
At the bottom, we start with 25 data points, each assigned to its own cluster. The two closest
clusters are then merged repeatedly until just one cluster remains at the top. The height in the
dendrogram at which two clusters are merged represents the distance between those two clusters
in the data space.
The number of clusters that best depicts the different groups can be chosen by observing the
dendrogram. The best choice is the number of vertical lines in the dendrogram cut by a
horizontal line that can traverse the maximum distance vertically without intersecting a cluster.
In the above example, the best choice for the number of clusters is 4, since the red horizontal
line in the dendrogram below covers the maximum vertical distance AB.
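Continuing the SciPy sketch above (the cut height of 2.0 is an illustrative choice, not taken from the original notes), the dendrogram can be drawn and then cut at a chosen height with fcluster, which returns one flat cluster label per point; the number of vertical lines crossed at that height is the number of clusters obtained:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster

dendrogram(Z)                         # Z is the linkage matrix from the earlier sketch
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()

# Cut the tree at height 2.0: every vertical line crossed at that height
# becomes one flat cluster.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)                         # e.g. [1 1 1 1 1 2 2 2 2 2]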
Two important things that you should know about hierarchical clustering are:
This algorithm has been described above using the bottom-up (agglomerative) approach. It is
also possible to follow a top-down (divisive) approach, starting with all data points assigned to
the same cluster and recursively performing splits until each data point forms a separate cluster.
The decision to merge two clusters is taken on the basis of the closeness of these clusters. There
are multiple metrics for measuring the closeness of two clusters:
o Euclidean distance: ||a − b||₂ = √(Σᵢ (aᵢ − bᵢ)²)
o Squared Euclidean distance: ||a − b||₂² = Σᵢ (aᵢ − bᵢ)²
o Manhattan distance: ||a − b||₁ = Σᵢ |aᵢ − bᵢ|
o Maximum distance: ||a − b||∞ = maxᵢ |aᵢ − bᵢ|
o Mahalanobis distance: √((a − b)ᵀ S⁻¹ (a − b)), where S is the covariance matrix
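As a short sketch (assuming NumPy and SciPy; the vectors a and b and the covariance matrix are illustrative), each of these metrics can be computed as follows:

import numpy as np
from scipy.spatial.distance import mahalanobis

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

euclidean    = np.sqrt(np.sum((a - b) ** 2))   # ||a-b||2
sq_euclidean = np.sum((a - b) ** 2)            # ||a-b||2 squared
manhattan    = np.sum(np.abs(a - b))           # ||a-b||1
maximum      = np.max(np.abs(a - b))           # ||a-b||infinity (Chebyshev)

# Mahalanobis distance needs the inverse covariance matrix S^-1 of the data;
# the identity matrix is used here purely for illustration.
S_inv = np.linalg.inv(np.eye(3))
mahal = mahalanobis(a, b, S_inv)

print(euclidean, sq_euclidean, manhattan, maximum, mahal)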
*****END******