07Clustering

Uploaded by

hussienayman366


Cluster Analysis

What is Cluster Analysis?

• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
• Intra-cluster distances are minimized; inter-cluster distances are maximized
What is Cluster Analysis?
• Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
• Cluster analysis
  - Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Clustering is used:
  - As a stand-alone tool to get insight into data distribution
    (visualization of clusters may unveil important information)
  - As a preprocessing step for other algorithms
    (efficient indexing or compression often relies on clustering)
Some Applications of Clustering
What Is Good Clustering?
• A good clustering method will produce high quality clusters with
  - high intra-class similarity
  - low inter-class similarity
• The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
• The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
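The idea of high intra-class similarity and low inter-class similarity can be made concrete with a small sketch. It assumes Euclidean distance as the (dis)similarity measure and uses hypothetical toy points; the function name `mean_intra_inter` is an illustration, not something from the slides.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(euclidean(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two tight pairs of points that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # nearby points share a cluster
bad = [0, 1, 0, 1]   # distant points share a cluster

good_intra, good_inter = mean_intra_inter(points, good)
bad_intra, bad_inter = mean_intra_inter(points, bad)
print("good:", good_intra, good_inter)
print("bad: ", bad_intra, bad_inter)
```

For the good labeling the intra-cluster mean is small relative to the inter-cluster mean; for the bad labeling the relation is reversed.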
Requirements of Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to determine input parameters
• Ability to deal with noise and outliers
• Insensitivity to the order of input records
• Ability to handle high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Clustering Algorithms
Four of the most used clustering algorithms
Distance Measures
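The K-means slides in this deck name Euclidean distance, cosine similarity, and correlation as ways to measure closeness. The following is a minimal sketch of those three measures in plain Python; the function names and the sample vectors are assumptions chosen for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance; smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical vectors: y is x scaled by 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(euclidean_distance(x, y))
print(cosine_similarity(x, y))
print(pearson_correlation(x, y))
```

Because `y` points in the same direction as `x`, the cosine similarity and correlation are both (up to rounding) 1 even though the Euclidean distance is not zero, which shows how the choice of measure changes what counts as "close".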
K-Means Clustering Algorithm
K-means Clustering

• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
K-means Clustering – Details
• Initial centroids are often chosen randomly.
  - Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• 'Closeness' is measured by Euclidean distance, cosine similarity, correlation, etc.
• Most of the convergence happens in the first few iterations.
  - Often the stopping condition is changed to 'Until relatively few points change clusters'
• Complexity is O( n * K * I * d )
  - n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
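The points above can be sketched as a short program. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and mean centroids, and the parameter names `max_iter`, `min_changes`, and `seed` are hypothetical. Setting `min_changes` above zero gives the relaxed stopping rule "until relatively few points change clusters".

```python
import math
import random


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(cluster):
    """Centroid of a non-empty list of points: the per-attribute mean."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, min_changes=0, seed=None):
    """Basic K-means; stops once at most `min_changes` points switch clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        changes = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
        if changes <= min_changes:
            break
    return labels, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2, seed=0)
print(labels)
print(centroids)
```

Each iteration computes n * K distances over d attributes, which is where the O( n * K * I * d ) cost comes from.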
Two different K-means Clusterings
[Figure: the same set of points plotted on x and y axes in three panels: Original Points, Optimal Clustering, and Sub-optimal Clustering]
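Why two runs can end in different clusterings is easiest to see with fixed starting centroids. The sketch below is an assumption-laden illustration: it uses hypothetical points (not the data in the figure), passes the initial centroids explicitly instead of drawing them at random so the result is reproducible, and compares the runs by the sum of squared errors (SSE), a quality measure this sketch introduces for the comparison.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_from(points, initial_centroids, max_iter=100):
    """K-means started from explicit centroids; returns (labels, centroids)."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[label])
               for p, label in zip(points, labels))


# Hypothetical data: corners of a wide rectangle (width 4, height 1).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

labels_a, cents_a = kmeans_from(points, [(0.0, 0.5), (4.0, 0.5)])  # left/right start
labels_b, cents_b = kmeans_from(points, [(2.0, 0.0), (2.0, 1.0)])  # bottom/top start
sse_a = sse(points, labels_a, cents_a)
sse_b = sse(points, labels_b, cents_b)
print(labels_a, sse_a)
print(labels_b, sse_b)
```

Both runs converge, but the second settles on a bottom/top split with a much larger SSE than the left/right split. A common remedy is to run K-means several times from different starting centroids and keep the result with the lowest SSE.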

How does the K-means clustering algorithm work?
Example of K-Means Clustering
The iteration stops when G_i = G_{i+1}, that is, when no object moves to a different group anymore.
Hierarchical Clustering Algorithms
How They Work
Step 3 can be done in different ways:
Example
How do we calculate the distance between the newly formed cluster (D,F) and the other clusters?
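One way to answer this question is to derive the distance from the pairwise distances between the members of the two clusters. The sketch below shows three common choices (single, complete, and average linkage). The numeric distances and the object name A are hypothetical, since the original example's values are not reproduced in this text, and `linkage_distance` is an illustrative name.

```python
def linkage_distance(dist, cluster_a, cluster_b, method="single"):
    """Distance between two clusters from pairwise object distances.

    `dist` maps frozenset({x, y}) to the distance between objects x and y.
    """
    pairwise = [dist[frozenset((x, y))] for x in cluster_a for y in cluster_b]
    if method == "single":    # distance of the closest pair
        return min(pairwise)
    if method == "complete":  # distance of the farthest pair
        return max(pairwise)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")


# Hypothetical distances, NOT taken from the original example.
dist = {
    frozenset(("D", "F")): 0.5,
    frozenset(("D", "A")): 3.0,
    frozenset(("F", "A")): 5.0,
}
merged = ("D", "F")
for method in ("single", "complete", "average"):
    print(method, linkage_distance(dist, merged, ("A",), method))
```

With these made-up values, single linkage takes the smaller of the two distances from A to D and F, complete linkage takes the larger, and average linkage takes their mean.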
Assignment
A hierarchical clustering of distances in kilometers between some Italian
cities. The method used is single-linkage.
Input distance matrix (L = 0 for all the clusters):

The process is summarized by the following hierarchical tree:
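The single-linkage procedure used in this assignment can be sketched as follows. The distance matrix here is a small hypothetical one with objects A to D, not the Italian-cities matrix, which is not reproduced in this text. Each merge is recorded with its level L, the distance at which the two clusters were joined; before any merge, every object is its own cluster at L = 0.

```python
def single_linkage(names, matrix):
    """Agglomerative single-linkage clustering on a symmetric distance matrix.

    Returns a list of (level, merged_cluster) tuples, one per merge.
    """
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]  # start: one cluster per object

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(matrix[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        level, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((level, merged))
    return merges


# Hypothetical symmetric distance matrix.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
merges = single_linkage(names, matrix)
for level, cluster in merges:
    print(level, cluster)
```

Read from first to last, the recorded merges describe the hierarchical tree from the leaves upward: A and B join first, then C and D, and finally the two pairs join at the smallest distance between any of their members.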
