DM Chapter 5 (Clustering)
Clustering
CRISP-DM Model
• Business Understanding: This initial phase focuses on setting project
objectives and requirements from a business perspective, and then
converting this knowledge into a data mining problem definition and a
preliminary plan for achieving the objectives.
• Data Understanding: This phase starts with an initial data collection and
proceeds with activities to become familiar with the data and to detect
interesting subsets that can form hypotheses about hidden information.
• Data Preparation: This phase covers data cleaning, data reduction and data
transformation for modeling tools.
• Modeling: In this phase, various modeling techniques are applied to create a
model, adjusting their parameters to optimal values.
• Evaluation: At this stage you have built a model. Next, evaluate the model
more thoroughly and review the steps executed to construct it, to be
certain it properly achieves the business objectives.
• Deployment: Even if the purpose of the model is to increase knowledge of
the data, the knowledge gained will need to be organized and presented in
a way that the customer can easily understand and use.
Clustering
• Clustering is a data mining (machine learning) technique that
finds similarities between data according to the characteristics
found in the data and groups similar data objects into one cluster
• Given a set of points, with a notion of distance between points,
group the points into some number of clusters, so that members of a
cluster are in some sense as close to each other as possible.
• While data points in the same cluster are similar, those in
separate clusters are dissimilar to one another.
[Figure: a 2-D scatter of points falling into three natural groups]
Clustering (cont.)
Cluster: a collection of data objects
▶ Similar to one another within the same cluster
▶ Dissimilar to the objects in other clusters
Cluster analysis
▶ Grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined
classes
Typical applications
▶ As a stand-alone tool to get insight into data
distribution
▶ As a preprocessing step for other algorithms
Example: clustering
• The example below demonstrates clustering padlocks of the same
kind. There are a total of 10 padlocks, which vary in color, size,
shape, etc.
[Figure: 10 padlocks grouped into clusters by their characteristics]
• Cosine Similarity
– If X and Y are two vectors representing data objects, then the
cosine similarity measure is given by:
sim(X, Y) = (X · Y) / (||X|| ||Y||)
where X · Y indicates the vector dot product and ||X|| is the
length (Euclidean norm) of vector X.
Example: Similarity measure
• Ex: Find the similarity between documents 1 and 2.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
d1 · d2 = 5*3 + 0*0 + 3*2 + 0*0 + 2*1 + 0*1 + 0*0 + 2*1 + 0*0 + 0*1 = 25
||d1|| = sqrt(25 + 9 + 4 + 4) = sqrt(42) ≈ 6.48
||d2|| = sqrt(9 + 4 + 1 + 1 + 1 + 1) = sqrt(17) ≈ 4.12
cos(d1, d2) = 25 / (6.48 * 4.12) ≈ 0.94 (see the sketch below)
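A minimal sketch of this computation in plain Python (standard library only; the function name cosine_similarity is illustrative):

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
print(round(cosine_similarity(d1, d2), 2))  # 0.94
```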
Binary Variables
▶ A binary variable is a variable which has only two
possible values (1 or 0, yes or no, etc.)
▶ For example: smoker, educated, Ethiopian, female, etc.
▶ If all attributes of the objects are binary valued, we
can construct a dissimilarity matrix from the given
binary data
▶ If all the binary valued attributes have the same
weight, we can construct a 2-by-2 contingency table
for any two objects i and j as shown below
Binary variables
▶ Contingency table for two objects i and j, where q counts the
attributes that are 1 for both, r those that are 1 for i and 0 for j,
s those that are 0 for i and 1 for j, and t those that are 0 for both:

              object j: 1   object j: 0   sum
object i: 1        q             r        q + r
object i: 0        s             t        s + t
sum              q + s         r + t      p = q + r + s + t
Binary Variables – distance functions
▶ Hence, the distance between the two objects can be
measured as follows
▶ Simple matching coefficient, for binary valued attributes
in which the two values are equally relevant (symmetric),
for example sex as female or male:
d(i, j) = (r + s) / (q + r + s + t)
▶ Jaccard coefficient, for binary valued attributes in which
the two values are not equally important (asymmetric), for
example smoker (=1) being more informative than non-smoker (=0):
d(i, j) = (r + s) / (q + r + s)
▶ Both coefficients are sketched in code below.
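A small sketch of both coefficients in Python, given the contingency counts q, r, s, t defined above (function names are illustrative):

```python
def simple_matching_distance(q, r, s, t):
    """Symmetric binary dissimilarity: mismatches over all attributes."""
    return (r + s) / (q + r + s + t)

def jaccard_distance(q, r, s, t):
    """Asymmetric binary dissimilarity: negative matches (t) are ignored."""
    return (r + s) / (q + r + s)

print(simple_matching_distance(2, 0, 1, 3))  # 1/6 ≈ 0.17
print(jaccard_distance(2, 0, 1, 3))          # 1/3 ≈ 0.33
```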
Dissimilarity between Binary Variables
▶ Example: a relational table of patients, where gender is a symmetric
binary attribute and the remaining attributes (fever, cough,
test-1 ... test-4) are asymmetric binary attributes:

Name   Gender   Fever   Cough   Test-1   Test-2   Test-3   Test-4
Jack     M        Y       N       P        N        N        N
Mary     F        Y       N       P        N        P        N
Jim      M        Y       P       N        N        N        N

▶ Let the values Y and P be set to 1 and the value N be set to 0, and
use the Jaccard coefficient on the asymmetric attributes:
▶ Contingency table between Jack and Mary: q = 2, r = 0, s = 1, t = 3,
so d(Jack, Mary) = (0 + 1) / (2 + 0 + 1) = 0.33
▶ Contingency table between Jack and Jim: q = 1, r = 1, s = 1, t = 3,
so d(Jack, Jim) = (1 + 1) / (1 + 1 + 1) = 0.67
▶ Contingency table between Jim and Mary: q = 1, r = 1, s = 2, t = 2,
so d(Jim, Mary) = (1 + 2) / (1 + 1 + 2) = 0.75
▶ Jim and Mary are thus the most dissimilar pair (these computations
are sketched in code below).
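A sketch that derives the Jaccard dissimilarities directly from the 0/1 vectors above (names and encoding as assumed in the table):

```python
def jaccard(x, y):
    """Jaccard dissimilarity for asymmetric binary vectors (0/1)."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    return (r + s) / (q + r + s)

# Asymmetric attributes only (Y/P -> 1, N -> 0): fever, cough, test-1..test-4
jack = (1, 0, 1, 0, 0, 0)
mary = (1, 0, 1, 0, 1, 0)
jim  = (1, 1, 0, 0, 0, 0)
print(round(jaccard(jack, mary), 2))  # 0.33
print(round(jaccard(jack, jim), 2))   # 0.67
print(round(jaccard(jim, mary), 2))   # 0.75
```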
Major Clustering Approaches
• Partitioning clustering approach:
– Construct various partitions and then evaluate them by some
criterion, e.g., minimizing the sum of square errors
– Typical methods:
• distance-based: K-means clustering
• model-based: expectation maximization (EM) clustering.
• Hierarchical clustering approach:
– Create a hierarchical decomposition of the set of data (or
objects) using some criterion
– Typical methods:
• agglomerative vs. divisive
• single link vs. complete link (both approaches are sketched below)
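Both families are available off the shelf; a hedged sketch using scikit-learn (assuming it is installed; the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

# Partitioning approach: k-means with 3 clusters
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical approach: agglomerative clustering with single linkage
ag_labels = AgglomerativeClustering(n_clusters=3, linkage="single").fit_predict(X)

print(km_labels, ag_labels)
```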
Partitioning Algorithms: Basic Concept
• Partitioning method: Construct a partition of a database D of
n objects into a set of k clusters, such that the sum of squared
distances is minimized (see the sketch below)
• Given k, find a partition of k clusters that optimizes the
chosen partitioning criterion
– Global optimal: exhaustively enumerate all partitions
– Heuristic methods: k-means and k-medoids algorithms
– k-means: Each cluster is represented by the center of the
cluster
– k-medoids or PAM (Partition around medoids): Each cluster is
represented by one of the objects in the cluster
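A minimal sketch of the sum-of-squared-errors criterion mentioned above (Euclidean distance assumed; the helper name sse is illustrative):

```python
def sse(clusters, centroids):
    """Sum of squared Euclidean distances of each point to its cluster centroid."""
    total = 0.0
    for points, (cx, cy) in zip(clusters, centroids):
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return total

# Example: two clusters of 2-D points with their centroids
print(sse([[(1, 1), (2, 2)], [(8, 8)]], [(1.5, 1.5), (8, 8)]))  # 1.0
```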
The K-Means Clustering Method
• Algorithm:
– Given k, the k-means algorithm is implemented as follows:
• Select k points as initial centroids (the initial
centroids are selected randomly)
• Repeat
– Assign each object to the cluster with the nearest
centroid (seed point)
– Recompute the centroid of each of the k clusters of the
current partition (the centroid is the center, i.e.,
mean point, of the cluster)
• Until the centroids do not change (see the sketch below)
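A minimal, self-contained sketch of the algorithm in Python (Euclidean distance and random initialization assumed; an empty cluster simply keeps its old centroid):

```python
import random

def kmeans(points, k, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, recompute means, repeat."""
    centroids = random.sample(points, k)  # initial centroids chosen randomly
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean point of its cluster
        new_centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
                         for i, cl in enumerate(clusters)]
        if new_centroids == centroids:  # stop when the centroids don't change
            return centroids, clusters
        centroids = new_centroids
    return centroids, clusters
```

For example, kmeans([(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)], 3) groups the eight points used in the worked example that follows.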
The K-Means Clustering Method
• Example (K = 2):
[Figure: k-means iterations on a 2-D scatter plot — arbitrarily choose
K objects as initial cluster centers; assign each object to the most
similar center; update the cluster means; reassign objects; repeat
until the means no longer change]
Example Problem
• Cluster the following eight points (with (x, y)
representing locations) into three clusters: A1(2, 10),
A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4),
A7(1, 2), A8(4, 9).
– Assume that the initial cluster centers are: A1(2, 10), A4(5,
8) and A7(1, 2).
• The distance function between two points a = (x1,
y1) and b = (x2, y2) is defined as:
dis(a, b) = |x2 – x1| + |y2 – y1| (see the sketch below)
• Use the k-means algorithm to find optimal centroids
that group the given data into three clusters.
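A quick sketch of this distance function in Python (the name dist is illustrative):

```python
def dist(a, b):
    """Manhattan (city-block) distance from the problem statement."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

print(dist((2, 10), (5, 8)))  # |5-2| + |8-10| = 3 + 2 = 5
```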
Iteration 1
First we list all points in the first column of the table below. The
initial cluster centers (centroids) are (2, 10), (5, 8) and (1, 2),
chosen randomly. We then calculate the distance from each point to
each of the three centroids, using the distance function
dis(point, mean) = |x2 – x1| + |y2 – y1|:

Point        Mean 1 (2, 10)   Mean 2 (5, 8)   Mean 3 (1, 2)   Cluster
A1 (2, 10)         0                5               9             1
A2 (2, 5)          5                6               4             3
A3 (8, 4)         12                7               9             2
A4 (5, 8)          5                0              10             2
A5 (7, 5)         10                5               9             2
A6 (6, 4)         10                5               7             2
A7 (1, 2)          9               10               0             3
A8 (4, 9)          3                2              10             2
Iteration 1
• Starting from point A1, calculate the distance to each of the three
means using the distance function:
dis(A1, mean1) = |2 – 2| + |10 – 10| = 0 + 0 = 0
dis(A1, mean2) = |5 – 2| + |8 – 10| = 3 + 2 = 5
dis(A1, mean3) = |1 – 2| + |2 – 10| = 1 + 8 = 9
– Fill these values in the table and decide which cluster the point (2,
10) should be placed in: the one where the point has the shortest
distance to the mean, i.e. mean 1 (cluster 1), since the distance is 0.
• Next, go to the second point A2 and calculate the distance:
dis(A2, mean1) = |2 – 2| + |10 – 5| = 0 + 5 = 5
dis(A2, mean2) = |5 – 2| + |8 – 5| = 3 + 3 = 6
dis(A2, mean3) = |1 – 2| + |2 – 5| = 1 + 3 = 4
– So, we fill these values in the table and assign the point (2, 5)
to cluster 3, since mean 3 is the shortest distance from A2.
• Analogously, we fill in the rest of the table and place each point in
one of the clusters (see the sketch below).
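A sketch that reproduces the whole Iteration 1 table (point names and initial means taken from above):

```python
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
means = [(2, 10), (5, 8), (1, 2)]  # initial centroids A1, A4, A7

def dist(a, b):
    """Manhattan distance, as defined in the problem statement."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

for name, p in points.items():
    d = [dist(p, m) for m in means]
    # 1-based index of the nearest mean decides the cluster
    print(name, d, "-> cluster", d.index(min(d)) + 1)
```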
Iteration 1
• Next, we need to re-compute the new cluster centers (means). We
do so by taking the mean of all points in each cluster.
• For Cluster 1, we only have one point, A1(2, 10), which was the old
mean, so the cluster center remains the same.
• For Cluster 2, we have five points and need to take their average
as the new centroid, i.e.
( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• For Cluster 3, we have two points. The new centroid is:
( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
• That was Iteration 1 (epoch 1). Next, we go to Iteration 2 (epoch 2),
Iteration 3, and so on until the centroids do not change anymore.
– In Iteration 2, we basically repeat the process from Iteration 1,
this time using the new means we computed.
Second epoch
• Using the new centroids we recompute the cluster members (the full
run to convergence is sketched below).

Point        Mean 1 (2, 10)   Mean 2 (6, 6)   Mean 3 (1.5, 3.5)   Cluster
A1 (2, 10)         0                8                 7               1
A2 (2, 5)          5                5                 2               3
A3 (8, 4)         12                4                 7               2
A4 (5, 8)          5                3                 8               2
A5 (7, 5)         10                2                 7               2
A6 (6, 4)         10                2                 5               2
A7 (1, 2)          9                9                 2               3
A8 (4, 9)          3                5                 8               1
[Figure: scatter plots of the clusters and centroids after each epoch]
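Continuing these epochs by hand is mechanical, so here is a minimal sketch that runs the full loop on the eight points (helper names such as dist and assignment are illustrative):

```python
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centroids = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

def dist(a, b):
    """Manhattan distance from the problem statement."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

assignment = None
for epoch in range(1, 100):
    # Assignment step: index of the nearest centroid for every point
    new_assignment = [min(range(3), key=lambda c: dist(p, centroids[c])) for p in points]
    if new_assignment == assignment:  # cluster members unchanged: converged
        break
    assignment = new_assignment
    # Update step: each centroid becomes the mean of its cluster members
    for c in range(3):
        members = [p for p, a in zip(points, assignment) if a == c]
        centroids[c] = (sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members))
    print("epoch", epoch, "centroids:", centroids)
```

Running this sketch, the cluster members stop changing after the third update of the means, leaving clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7} with final centroids of roughly (3.67, 9), (7, 4.33) and (1.5, 3.5).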