Clustering

Clustering is an unsupervised machine learning technique used to group unlabeled data points into meaningful clusters. There are several types of clustering algorithms, including k-means clustering, DBSCAN clustering, and self-organizing maps (SOM). K-means clustering partitions data into k groups by finding cluster centroids. DBSCAN clustering identifies core data points that form dense clusters based on a distance epsilon. SOM creates a visual map to represent similarities between data by adjusting neuron weights.

Uploaded by

ExpoMed ExpoMed

7. Clustering

Clustering is the process of finding meaningful groups in data.

For example, customers of a company can be grouped based on their purchase
behavior. In recent years, clustering has even found use in political elections.

Clustering to describe the data

Clustering for pre-processing

Types of clustering
1. Exclusive or strict partitioning clusters
2. Overlapping clusters
3. Hierarchical clusters
4. Fuzzy or probabilistic clusters

Based on algorithmic approach

1. Prototype-based clustering
2. Density clustering
3. Hierarchical clustering
4. Model-based clustering
k-Means Clustering
k-Means
k-means clustering is a prototype-based clustering method where the data set
is divided into k clusters.

Objective: find a prototype data point for each cluster; all the data points are
then assigned to the nearest prototype, which then forms a cluster
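
The assignment rule in the objective above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and points stored as tuples; the names `nearest_prototype` and `assign_clusters` and the toy data are hypothetical, not taken from the slides.

```python
import math

def nearest_prototype(point, prototypes):
    """Return the index of the prototype closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, p) for p in prototypes]
    return distances.index(min(distances))

def assign_clusters(points, prototypes):
    """Group points by the index of their nearest prototype."""
    clusters = {i: [] for i in range(len(prototypes))}
    for point in points:
        clusters[nearest_prototype(point, prototypes)].append(point)
    return clusters

# Hypothetical toy data: two prototypes and four data points.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, -0.5), (11.0, 10.0)]
clusters = assign_clusters(points, prototypes)
```

Each key of `clusters` is a prototype index, and the list under it holds the points that form that prototype's cluster.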
k Partitions

The k-means algorithm divides the data space into k partitions, where the
centroid of each partition is the prototype of its cluster.

Voronoi partition. (“Euclidean Voronoi Diagram” by Raincomplex – personal work. Licensed under Creative
Commons Zero, Public Domain Dedication, via Wikimedia Commons.)
Step 1: Initiate Centroids
Step 2: Assign Data Points
Step 3: Calculate New Centroids

Each new centroid is the point that minimizes the sum of squared errors (SSE)
for its cluster.
Step 4: Repeat Assignment and Calculate New Centroids
Step 5: Termination

The algorithm stops when no further change in the assignment of data points
happens or, in other words, when no significant change in the centroids is noted.
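
Steps 1 to 5 can be put together as a short sketch. It assumes Euclidean distance and, when no starting centroids are given, picks k random data points as the initial centroids; the slides do not say how centroids are initiated, so that choice, the function name `k_means`, and its parameters (`init`, `max_iter`, `tol`, `seed`) are illustrative assumptions.

```python
import math
import random

def k_means(points, k, init=None, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means on a list of equal-length tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: initiate centroids (given explicitly, or k random data points).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every data point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        # Step 5: terminate when assignments stop changing or centroids barely move.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        converged = new_labels == labels or shift <= tol
        centroids, labels = new_centroids, new_labels
        if converged:
            break
        # Step 4: otherwise repeat the assignment and the centroid calculation.
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, so in practice the algorithm is often run several times with different initializations and the run with the lowest SSE is kept.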

Evaluation of Clusters

- Minimize total SSE

- Davies-Bouldin index
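
Both evaluation measures can be computed directly from the clusters. The sketch below assumes Euclidean distance, at least two clusters, and distinct centroids; the helper names (`centroid`, `total_sse`, `davies_bouldin`) and the toy clusters are hypothetical. For both measures, lower values indicate tighter and better separated clusters.

```python
import math

def centroid(members):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def total_sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    sse = 0.0
    for members in clusters:
        c = centroid(members)
        sse += sum(math.dist(p, c) ** 2 for p in members)
    return sse

def davies_bouldin(clusters):
    """Davies-Bouldin index: for each cluster, take the worst ratio of
    (scatter_i + scatter_j) to the distance between centroids i and j,
    then average these worst ratios over all clusters."""
    cents = [centroid(m) for m in clusters]
    # Scatter of a cluster: average distance of its points to its centroid.
    scatter = [
        sum(math.dist(p, c) for p in m) / len(m) for m, c in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = []
    for i in range(k):
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        ))
    return sum(worst) / k

# Hypothetical toy clusters on a line.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
sse = total_sse(clusters)
db = davies_bouldin(clusters)
```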
DBSCAN CLUSTERING
Density in the dataset
Density of a data point

The density of a data point is the number of points within a circular space
of radius ε (epsilon) around it. For data point A in the example, that number
is six.
Step 1: Defining Epsilon and MinPoints

Density is measured as the number of data points inside the space defined by
radius ε. If MinPoints is defined as 5, the space of radius ε surrounding data
point A is considered a high-density region, since it contains at least
MinPoints data points.
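
The density test can be sketched as a neighbor count compared against MinPoints. This is an illustration only: it assumes Euclidean distance and does not count the point itself as its own neighbor (implementations differ on that convention); the function names and the toy coordinates are hypothetical.

```python
import math

def neighbors_within(points, index, eps):
    """Indices of all other points within distance eps of points[index]."""
    center = points[index]
    return [
        j for j, p in enumerate(points)
        if j != index and math.dist(center, p) <= eps
    ]

def is_high_density(points, index, eps, min_points):
    """True when the eps-space around the point holds at least min_points neighbors."""
    return len(neighbors_within(points, index, eps)) >= min_points

# Hypothetical toy data: point A at index 0, six nearby points, one distant point.
points = [(0.0, 0.0),
          (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5), (0.3, 0.3), (-0.3, -0.3),
          (5.0, 5.0)]
count_a = len(neighbors_within(points, 0, eps=1.0))
dense_a = is_high_density(points, 0, eps=1.0, min_points=5)
```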
Step 2: Classification of Data Points
Step 3: Clustering

Groups of core points form distinct clusters. If two core points are within ε of
each other, then both core points are within the same cluster.
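
A compact sketch of the three steps follows. It uses the usual DBSCAN classification into core points (high-density), border points (within ε of a core point but not dense themselves), and noise points (everything else). The function name `dbscan`, the `NOISE` marker, and the convention of excluding a point from its own neighborhood are assumptions made for illustration.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster

def dbscan(points, eps, min_points):
    """Minimal DBSCAN sketch. Returns one cluster label per point."""
    n = len(points)
    # Step 1: eps-neighborhood of every point (the point itself is excluded).
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 2: a point is a core point when its neighborhood is dense enough.
    is_core = [len(nb) >= min_points for nb in neighbors]
    labels = [NOISE] * n
    cluster_id = 0
    # Step 3: core points within eps of each other end up in the same cluster.
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster_id
        frontier = [i]  # holds core points only
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == NOISE:
                    labels[j] = cluster_id   # core or border point joins the cluster
                    if is_core[j]:
                        frontier.append(j)   # only core points extend the cluster
        cluster_id += 1
    return labels

# Hypothetical toy data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
```

Points that remain labeled `NOISE` after the loop are neither core points nor within ε of one.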

Optimize the Parameters: ε and a minimum threshold (MinPoints)

Special Cases: Varying Densities
SELF-ORGANIZING MAPS
SOM
Powerful visual clustering technique

A neural network. Output is an organized visual matrix. SOM output is a
two-dimensional grid with data objects placed next to each other based on
their similarity to one another.
Step 1: Topology Specification
Step 2: Initialize Centroids

The initial centroids are values of random data objects from the data set.

Step 3: Assignment of Data Objects

Data objects are selected one by one and assigned to the nearest centroid.
Step 4: Centroid Update

Update the data values of the nearest centroid of the data object, proportional
to the difference between the centroid and the data object.
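
The update described above can be written as a one-line rule: the centroid moves toward the data object by a fraction of their difference. The sketch assumes that fraction is a learning rate between 0 and 1; the name `update_centroid` and the default value are hypothetical.

```python
def update_centroid(centroid, data_object, learning_rate=0.5):
    """Move the centroid toward the data object by a fraction of their difference."""
    return [w + learning_rate * (x - w) for w, x in zip(centroid, data_object)]

# With a learning rate of 0.5 the centroid moves halfway toward the data object.
updated = update_centroid([0.0, 0.0], [2.0, 4.0], learning_rate=0.5)
```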
Step 5: Termination

Repeat steps 3 and 4 until no significant centroid updates take place in a run.

Step 6: Mapping a New Data Object

A new data object is placed on the grid based on its proximity to the centroids.
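
The six steps can be combined into a small training loop. This is a sketch under several assumptions that the slides do not state: a rectangular grid topology, a learning rate that decays linearly over the runs, and a Gaussian neighborhood so that grid neighbors of the nearest centroid are also pulled toward the data object, as in the standard SOM formulation. All names and parameter values are hypothetical.

```python
import math
import random

def train_som(data, rows, cols, max_runs=50, learning_rate=0.5,
              radius=1.0, tol=1e-4, seed=0):
    """Train a small SOM. Returns a dict {grid position: centroid}."""
    rng = random.Random(seed)
    # Step 1: topology, here a rows x cols rectangular grid.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Step 2: the initial centroids are values of random data objects.
    centroids = {pos: list(rng.choice(data)) for pos in grid}
    for run in range(max_runs):
        rate = learning_rate * (1 - run / max_runs)  # decaying rate (assumption)
        largest_change = 0.0
        for x in data:
            # Step 3: find the nearest centroid for this data object.
            best = min(grid, key=lambda pos: math.dist(x, centroids[pos]))
            # Step 4: pull the nearest centroid toward x, and its grid
            # neighbors to a lesser degree (Gaussian weight, an assumption).
            for pos in grid:
                influence = math.exp(-math.dist(pos, best) ** 2 / (2 * radius ** 2))
                new = [w + rate * influence * (xi - w)
                       for w, xi in zip(centroids[pos], x)]
                largest_change = max(largest_change, math.dist(new, centroids[pos]))
                centroids[pos] = new
        # Step 5: stop once a full run produces no significant update.
        if largest_change < tol:
            break
    return centroids

def map_object(x, centroids):
    """Step 6: place a new data object on the grid cell with the nearest centroid."""
    return min(centroids, key=lambda pos: math.dist(x, centroids[pos]))

# Hypothetical toy data and a 2 x 2 grid.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
som = train_som(data, rows=2, cols=2)
cell = map_object((4.8, 5.1), som)
```

`cell` is a `(row, column)` grid position; data objects that are similar to one another tend to land on the same or adjacent cells.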
