Machine Learning Notes-1 (Clustering-1)

Uploaded by rwt91848

CLUSTERING

• Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data.
• The clustering problem is about partitioning a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters.
• In the clustering process, there are no predefined classes and no examples that would show what kind of desirable relations should hold among the data; that is why clustering is regarded as an unsupervised process.
• Classification, by contrast, is the procedure of assigning a data item to one of a predefined set of categories, which makes it a supervised process.
Figure: Types of clustering algorithms (https://www.researchgate.net/figure/Types-of-clustering-algorithms_fig3_362507241)
bNMF: Bayesian non-negative matrix factorization
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
Partitional Clustering  It aims to partition a
dataset into K clusters.

 It groups similar data

points together while
maximizing differences
between the clusters.

 Partitioning methods
work by iteratively
refining the cluster
centroids until
convergence is reached.

 These algorithms are

popular for their speed
and scalability in
https://medium.com/analytics-vidhya/partitional-clustering-181d42049670 handling large datasets.
https://www.scaler.com/topics/data-mining-tutorial/partitioning-methods-in-
data-mining/
Hierarchical Clustering

In this algorithm, we build a hierarchy of clusters in the form of a tree; this tree-shaped structure is known as a dendrogram.

Agglomerative Clustering
Agglomerative clustering is a bottom-up approach: the algorithm starts by treating every data point as a single cluster and repeatedly merges the two closest clusters until only one cluster is left.

Divisive Clustering
The divisive algorithm is the reverse of the agglomerative algorithm: it is a top-down approach that starts with all data points in one cluster and recursively splits clusters until each point is on its own (or a stopping criterion is met).

https://www.google.com/search?sca_esv=ad2aacbf6bcb86d2&q=hierarchical+clustering&tbm=isch&source=lnms&sa=X&ved=2ahUKEwiDr9vr4dmEAxXufGwGHQdVDC0Q0pQJegQIDRAB&biw=1366&bih=587&dpr=1#imgrc=UR-6ylprlb0lAM
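The bottom-up merging described above can be sketched in Python as follows. This is a minimal illustration, not the method of the cited sources: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance of their closest pair of points), and the function names are hypothetical. It records each merge with its distance, which is the information a dendrogram plots.

```python
import math


def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points):
    """Bottom-up clustering: start with singleton clusters and repeatedly merge
    the closest pair until one cluster is left.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        merges.append((clusters[best_i], clusters[best_j], best_d))
        merged = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so index best_i is still valid
        clusters[best_i] = merged
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerative(points):
    print(f"merge {a} + {b} at distance {d:.3f}")
```

Cutting the merge history at a chosen distance (for example, stopping before any merge above 5) yields a flat partition, which is how a dendrogram is usually turned into clusters.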
Density-based Clustering

• Density-based clustering is an unsupervised machine learning algorithm that groups similar data points in a dataset based on their density.
• The algorithm identifies core points, i.e., points with a minimum number of neighboring points within a specified distance (known as the epsilon radius).
• It expands clusters by connecting these core points to their neighboring points until the density falls below a certain threshold.
• Points that do not belong to any cluster are considered outliers or noise.
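The core-point expansion described above can be sketched as a small DBSCAN-style routine. Assumptions: Euclidean distance, an epsilon radius `eps`, and a minimum neighbor count `min_pts` that counts the point itself; the function names are illustrative, and the brute-force neighbor search is O(n²), so real implementations typically use a spatial index instead.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """DBSCAN-style clustering. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core too: keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here (2.2, 0) joins cluster 0 only because a core point lies within `eps` of it, while (50, 50) has no dense neighborhood and stays noise; unlike k-means, the number of clusters is not given in advance.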
Core: a point that has at least m points within distance n of itself.

Border: a point that is not a Core point but has at least one Core point within distance n.

Noise: a point that is neither a Core nor a Border point; it has fewer than m points within distance n of itself.
https://www.graduatetutor.com/statistics-tutor/k-means-clustering-hierarchical-clustering-density-based-clustering-partitional-clustering/
https://www.kdnuggets.com/2020/04/dbscan-clustering-algorithm-machine-learning.html
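The three point types can be computed directly from their definitions. In this sketch (the function name is hypothetical), m is the minimum point count, assumed here to include the point itself, and n is the distance threshold; distances are Euclidean.

```python
import math


def point_types(points, m, n):
    """Classify each point as 'core', 'border' or 'noise'.

    m: minimum number of points within distance n (assumed to count the point itself)
    n: distance threshold, i.e. the epsilon radius
    """
    counts = [sum(1 for q in points if math.dist(p, q) <= n) for p in points]
    is_core = [c >= m for c in counts]
    types = []
    for p, core in zip(points, is_core):
        if core:
            types.append("core")
        elif any(is_core[j] and math.dist(p, q) <= n for j, q in enumerate(points)):
            types.append("border")  # not core, but a core point lies within n
        else:
            types.append("noise")  # neither core nor border
    return types


print(point_types([(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0), (9, 9)], m=3, n=1.5))
```

The point (2.2, 0) has only one other point within 1.5, so it is not core, but that neighbor (1, 0) is core, which makes (2.2, 0) a border point; (9, 9) has no neighbors and is noise.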
Steps of clustering process

Figure 1: Steps of Clustering process


The basic steps of the clustering process are presented in Figure 1 and can be summarized as follows:
• Feature selection: The goal is to properly select the features on which clustering is to be performed, so as to encode as much information as possible about the task of interest.
• Clustering algorithm: This step refers to the choice of an algorithm that results in a good clustering scheme for the data set. It involves two choices:
i) Proximity measure: a measure that quantifies how “similar” two data points (i.e., feature vectors) are.
ii) Clustering criterion: the criterion, expressed via a cost function or some other type of rules, that defines a “good” clustering, i.e., a partitioning that fits the data set well.
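To make the two choices concrete, here is a sketch of two common proximity measures (Euclidean and Manhattan distance) and one possible clustering criterion: the sum of squared distances of points to their cluster mean, used as a cost function. These specific choices and the function names are assumptions for illustration; the notes leave both open.

```python
import math


def euclidean(x, y):
    """Dissimilarity: straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Dissimilarity: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def sse_cost(clusters):
    """Clustering criterion as a cost function: total squared Euclidean distance
    from every point to the mean of its own cluster (lower is better)."""
    total = 0.0
    for pts in clusters:
        mean = [sum(coord) / len(pts) for coord in zip(*pts)]
        total += sum(sum((a - mu) ** 2 for a, mu in zip(p, mean)) for p in pts)
    return total


good = [[(0, 0), (2, 0)], [(10, 10)]]
bad = [[(0, 0), (10, 10)], [(2, 0)]]
print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
print(sse_cost(good), sse_cost(bad))
```

The same three points score a cost of 2.0 when the two nearby points share a cluster and 100.0 when the far-apart points are grouped together, so the criterion ranks the first partitioning as the better fit.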

• Validation of the results:

Since clustering algorithms define clusters that are not known a priori, the final partition of the data requires some kind of evaluation in most applications, irrespective of the clustering method used.

• Interpretation of the results:

In many cases, experts in the application area have to integrate the clustering results with other experimental evidence and analysis in order to draw the right conclusions.
https://techvidvan.com/tutorials/cluster-analysis-in-r/
https://www.linkedin.com/pulse/k-means-clustering-its-use-cases-security-domain-gaurav-sharma
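One common way to validate a partition when no labels are available is the silhouette coefficient; the notes do not prescribe a specific index, so this choice is an assumption. The sketch below computes the mean silhouette in pure Python with Euclidean distance (the function name is hypothetical); it needs at least two clusters, and singleton clusters score 0 by convention.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters.

    Requires at least two distinct labels. For point i:
      a = mean distance to the other members of its own cluster
      b = smallest mean distance to the members of any other cluster
      s = (b - a) / max(a, b), with s = 0 for a singleton cluster
    """
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(points, [0, 0, 1, 1]), 3))  # well-separated grouping
print(round(silhouette(points, [0, 1, 0, 1]), 3))  # poor grouping
```

A value close to 1 indicates that points sit much closer to their own cluster than to the next one, while a negative value suggests many points would fit better elsewhere; the score is a guide for comparing partitions, not a replacement for the expert interpretation step.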
Clustering applications
• Data reduction: Clustering can be used to partition a data set into a number of “interesting” clusters. Then, instead of processing the data set as a whole, we use the representatives of the defined clusters in our process.
• Prediction based on groups: Assume, for example, that cluster analysis is applied to a data set concerning patients infected by the same disease. The result is a number of clusters of patients, grouped according to their reaction to specific drugs. Then, for a new patient, we identify the cluster to which he/she belongs, and his/her medication can be decided based on that cluster.
• Business: In business, clustering may help marketers discover significant groups in their customer database and characterize them based on purchasing patterns.
• Biology: In biology, clustering can be used to define taxonomies and to categorize genes with similar functionality.
• Spatial data analysis: Clustering helps analyze the huge amounts of spatial data that may be obtained from satellite images, medical equipment, Geographical Information Systems (GIS), image database exploration, etc.
• Web mining: In this case, clustering is used to discover significant groups of documents in the huge collection of semi-structured documents on the Web.
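The data-reduction and prediction-based-on-groups applications both come down to one operation: mapping a point to its nearest cluster representative. A minimal sketch, assuming the clusters are already summarized by centroids and that Euclidean distance is appropriate; the function names and toy coordinates are hypothetical.

```python
import math


def assign_to_cluster(x, centroids):
    """Index of the centroid nearest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))


def reduce_to_representatives(points, centroids):
    """Data reduction: replace every point by the centroid of its cluster."""
    return [centroids[assign_to_cluster(p, centroids)] for p in points]


# Hypothetical centroids of two groups found earlier by a clustering run.
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_point = (8.5, 9.0)
print(assign_to_cluster(new_point, centroids))
print(reduce_to_representatives([(1, 2), (9, 11), (0, 1)], centroids))
```

In the patient example, the feature vector of a new patient would play the role of `new_point`, and the returned index would identify the group whose treatment response is used as a guide.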
k-Means Clustering

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centroid), which serves as a prototype of the cluster.

The computational complexity of the algorithm is O(nkdT), where n is the number of observations, k the number of clusters, d the number of features, and T the number of iterations.

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

where μi is the mean of the points in Si.
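The WCSS objective is usually minimized with the iterative procedure known as Lloyd's algorithm: assign every point to its nearest mean, recompute each mean, and repeat until the means stop moving. The sketch below is a minimal pure-Python illustration under stated assumptions (Euclidean distance, initial means chosen as k random data points with a fixed seed, an empty cluster keeping its previous mean); the function names are hypothetical. Each iteration compares every point with every mean, which is why the cost per iteration grows with n, k and d.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for k-means. Returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    # Initial means: k distinct data points chosen at random.
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old mean
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wcss


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels, wcss = kmeans(points, k=2)
print(sorted(centroids), labels, wcss)
```

Lloyd's iteration only guarantees a local minimum of the WCSS, so in practice it is common to run several seeds and keep the partition with the lowest WCSS.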

Reference
https://www.javatpoint.com/k-means-clustering-algorithm-in-machine-learning
https://www.researchgate.net/figure/Clustering-algorithms-and-their-applications_fig1_309461986
https://www.linkedin.com/pulse/k-means-clustering-its-use-cases-security-domain-gaurav-sharma
