
Cluster Analysis

• Cluster analysis is a statistical method for processing
data. It works by organising items into groups – or
clusters – based on how closely associated they are.
• The objective of cluster analysis is to find similar groups
of subjects, where the "similarity" between each pair of
subjects reflects a characteristic of the group relative
to the larger population or sample. Strong differentiation
between groups shows up as well-separated clusters;
a single cluster indicates extremely homogeneous data.
Cluster analysis algorithms

• Your choice of cluster analysis algorithm is important,
particularly when you have mixed data. Major statistics
packages offer a range of preset algorithms ready to
number-crunch your matrices.
• K-means and K-medoids are two of the most widely used
clustering methods. In both cases, K is the number of
clusters.
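As a concrete illustration, here is a minimal K-means sketch in pure Python; the data points, the value of K, and the random initialisation below are assumptions for this example, not taken from any particular statistics package:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k distinct initial centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids, clusters

# Two well-separated groups of three points each (assumed toy data).
pts = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
centroids, clusters = kmeans(pts, k=2)
```

K-medoids follows the same assign/update loop, but the update step picks the cluster member minimising total distance to the others instead of the mean, which makes it more robust to outliers.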
Properties of Clustering:
1. Scalability: Modern applications involve huge databases, so the clustering algorithm
should be scalable. An algorithm that cannot handle large datasets may produce
misleading results.
2. High dimensionality: The algorithm should be able to handle high-dimensional data,
even when the number of samples is small.
3. Usability with multiple data types: The algorithm should be capable of dealing with
different kinds of data, such as discrete, categorical, interval-based, and binary data.
4. Dealing with unstructured data: Some databases contain missing values or noisy,
erroneous data. Algorithms that are sensitive to such data may produce poor-quality
clusters, so a clustering method should tolerate it and still organise the data into groups
of similar objects. This makes it easier for the analyst to process the data and discover
new patterns.
5. Interpretability: The clustering outcomes should be interpretable, comprehensible, and
usable; interpretability reflects how easily the results are understood.
Advantages of Cluster Analysis:

1. It can be used for exploratory data analysis and can help with feature selection.

2. It can be used to reduce the dimensionality of the data.

3. It can be used for anomaly detection and outlier identification.

4. It can be used for market segmentation and customer profiling.

Disadvantages of Cluster Analysis:

1. It can be sensitive to the choice of initial conditions and the number of clusters.

2. It can be sensitive to the presence of noise or outliers in the data.

3. It can be difficult to interpret the results of the analysis if the clusters are not well defined.

4. It can be computationally expensive for large datasets.

5. The results of the analysis can be affected by the choice of clustering algorithm used.

Note that the success of cluster analysis depends on the data, the goals of the
analysis, and the ability of the analyst to interpret the results.
Density-based clustering
• Density-based clustering refers to methods that are
based on a local cluster criterion, such as density-
connected points. This section discusses density-based
clustering with examples.
What is Density-based clustering?
• Density-based clustering is one of the most popular
unsupervised learning methodologies used in model
building and machine learning. Data points lying in the
low-density region that separates two clusters are
treated as noise. The neighborhood within a radius ε of
a given object is known as the ε-neighborhood of the
object. If the ε-neighborhood of an object contains at
least a minimum number of objects, MinPts, the object
is called a core object.
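The ε-neighborhood and core-object definitions can be sketched directly in Python; the point set, eps, and MinPts values below are assumed for illustration:

```python
import math

def eps_neighborhood(p, D, eps):
    """All points of D within distance eps of p (p itself included)."""
    return [q for q in D if math.dist(p, q) <= eps]

def is_core_object(p, D, eps, min_pts):
    """p is a core object if its eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(p, D, eps)) >= min_pts

# Assumed toy dataset: a tight square of four points plus one isolated point.
D = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
```

With eps = 1.5 and MinPts = 4, each point of the square is a core object (its neighborhood contains all four square points), while the isolated point (5, 5) is not.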
• Parameters Required For DBSCAN Algorithm

1. eps: Defines the neighborhood around a data point: if the
distance between two points is less than or equal to eps,
they are considered neighbors. If eps is chosen too small,
a large part of the data will be treated as outliers; if it is
chosen too large, clusters will merge and most data points
will fall into the same cluster. One way to choose eps is
the k-distance graph.

2. MinPts: The minimum number of neighbors (data points)
within the eps radius. The larger the dataset, the larger the
value of MinPts that should be chosen. As a rule of thumb,
MinPts can be derived from the number of dimensions D in
the dataset as MinPts >= D + 1, and MinPts should be at
least 3.
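The k-distance heuristic mentioned above can be sketched as follows: compute each point's distance to its k-th nearest neighbor and sort the values in descending order; the "elbow" in the resulting curve suggests a value for eps. The data points here are an assumed toy example:

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    out = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points if q is not p)
        out.append(d[k - 1])
    return sorted(out, reverse=True)

# Two dense 2x2 squares plus one far outlier (assumed data).
pts = [(1, 1), (1, 2), (2, 1), (2, 2),
       (8, 8), (8, 9), (9, 8), (9, 9),
       (25, 25)]
kd = k_distances(pts, k=3)
# Plotting kd shows a sharp drop after the first value: the outlier's
# 3rd-neighbor distance is huge, while cluster points sit near sqrt(2).
# Reading eps just below the jump separates clusters from noise.
```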
Density-Based Clustering - Background
Two parameters govern density-based clustering:
Eps: the maximum radius of the neighborhood.
MinPts: the minimum number of points in an Eps-neighborhood of a point.
The Eps-neighborhood of a point i is NEps(i) = { k in D | dist(i, k) <= Eps }.
Directly density-reachable:
A point i is directly density-reachable from a point k with respect to Eps and
MinPts if
i belongs to NEps(k), and
k satisfies the core point condition: |NEps(k)| >= MinPts.
Density-reachable:
A point i is density-reachable from a point j with respect to Eps and MinPts
if there is a chain of points p1, ..., pn with p1 = j and pn = i such that each
pm+1 is directly density-reachable from pm.
Density-connected:
A point i is density-connected to a point j with respect to Eps and MinPts if
there is a point o such that both i and j are density-reachable from o with
respect to Eps and MinPts.
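The reachability definitions above can be checked mechanically. The following sketch, on an assumed toy point set, implements density-reachability as a breadth-first search that extends chains only through core points, and density-connectedness as reachability of both points from some common point o:

```python
import math

def n_eps(p, D, eps):
    """Eps-neighborhood of p (p itself included)."""
    return [q for q in D if math.dist(p, q) <= eps]

def is_core(p, D, eps, min_pts):
    """Core point condition: |N_Eps(p)| >= MinPts."""
    return len(n_eps(p, D, eps)) >= min_pts

def density_reachable(i, j, D, eps, min_pts):
    """True if i is density-reachable from j: a chain of directly
    density-reachable points leads from j to i."""
    frontier, seen = [j], {j}
    while frontier:
        p = frontier.pop()
        if not is_core(p, D, eps, min_pts):
            continue                       # only core points extend the chain
        for q in n_eps(p, D, eps):
            if q == i:
                return True
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return False

def density_connected(i, j, D, eps, min_pts):
    """True if some point o density-reaches both i and j."""
    return any(density_reachable(i, o, D, eps, min_pts) and
               density_reachable(j, o, D, eps, min_pts) for o in D)

# Assumed data: a dense square, one border point (2, 1), one isolated point.
D = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
```

Note the asymmetry the slides imply: with eps = 1.2 and MinPts = 3, the border point (2, 1) is density-reachable from the core point (0, 0), but not vice versa, because (2, 1) is not core.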
Major Features of Density-Based Clustering
• The primary features of density-based clustering are
given below.
• It needs only a single scan of the data.
• It requires density parameters as a termination
condition.
• It can handle noise in the data.
• It can discover clusters of arbitrary shape and size.
Example
• MinPts: 4
• Eps: 1.9
• Distance measure: Euclidean
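The slide's data points are not reproduced here, so this sketch runs a minimal pure-Python DBSCAN with the stated parameters (MinPts = 4, Eps = 1.9, Euclidean distance) on an assumed point set:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    NOISE = -1
    labels = {p: None for p in points}
    cid = 0
    for p in points:
        if labels[p] is not None:
            continue
        neigh = [q for q in points if math.dist(p, q) <= eps]
        if len(neigh) < min_pts:
            labels[p] = NOISE              # may be relabelled later as a border point
            continue
        labels[p] = cid                    # start a new cluster from this core point
        seeds = [q for q in neigh if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cid            # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cid
            q_neigh = [r for r in points if math.dist(q, r) <= eps]
            if len(q_neigh) >= min_pts:    # q is also core: expand through it
                seeds.extend(q_neigh)
        cid += 1
    return labels

# Assumed data: two dense squares and one isolated point.
pts = [(1, 1), (1, 2), (2, 1), (2, 2),
       (8, 8), (8, 9), (9, 8), (9, 9),
       (15, 1)]
labels = dbscan(pts, eps=1.9, min_pts=4)
```

With these parameters each square forms its own cluster (every square point has four neighbors within Eps = 1.9, itself included, so all are core points), while the isolated point (15, 1) is labelled noise.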
