Data Mining
Lecture # 18
Clustering
(Ch # 10)
The Problem of Clustering
Given a set of points, with a notion of
distance between points, group the
points into some number of clusters, so
that members of a cluster are, in some
sense, as close to each other as possible.
Clustering is unsupervised
classification: no predefined classes.
Formally, clustering is the process of
grouping data points such that intra-
cluster distance is minimized and inter-
cluster distance is maximized.
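A minimal sketch of this objective on toy data (the points and labels below are made up for illustration): for a fixed assignment, we can compare the average intra-cluster distance against the average inter-cluster distance.

```python
# Measure average intra-cluster distance (to be minimized) and
# average inter-cluster distance (to be maximized) for a given clustering.
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0],
                   [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
labels = np.array([0, 0, 1, 1, 1, 1])  # a hypothetical 2-cluster assignment

intra, inter = [], []
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        d = np.linalg.norm(points[i] - points[j])  # Euclidean distance
        (intra if labels[i] == labels[j] else inter).append(d)

print("mean intra-cluster distance:", np.mean(intra))
print("mean inter-cluster distance:", np.mean(inter))
```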
Types of Clustering
A clustering is a set of clusters
Important distinction between
hierarchical and partitional sets of
clusters
Partitional Clustering
• A division of data objects into non-overlapping
subsets (clusters) such that each data object
is in exactly one subset
Hierarchical clustering
• A set of nested clusters organized as a
hierarchical tree
Other distinctions – coming slides
Partitional Clustering
[Figure: original points divided into non-overlapping clusters]

Hierarchical Clustering
[Figure: nested clusters over points p1, p2, p3, p4, shown as a traditional hierarchical clustering and the corresponding traditional dendrogram]
Other Distinctions Between Sets of Clusters
Exclusive versus non-exclusive
In non-exclusive clusterings, points may belong to
multiple clusters.
Can represent multiple classes or ‘border’ points
Types of Clusters
Well-separated clusters
Center-based clusters
Contiguous clusters
Density-based clusters
Property or Conceptual
Types of Clusters: Well-Separated
[Figure: 3 well-separated clusters]
Types of Clusters: Center-Based
Center-based
A cluster is a set of objects such that an object in a
cluster is closer (more similar) to the “center” of its
cluster than to the center of any other cluster.
The center of a cluster is often a centroid, the average of
all the points in the cluster, or a medoid, the most
“representative” point of a cluster.
[Figure: 4 center-based clusters]
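To make the centroid/medoid distinction concrete, here is a small sketch (the sample points are hypothetical): the centroid is the coordinate-wise mean and need not be an actual data point, while the medoid is the data point with the smallest total distance to the rest.

```python
import numpy as np

cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 3.0], [9.0, 9.0]])

# Centroid: the coordinate-wise mean (need not be an actual data point).
centroid = cluster.mean(axis=0)

# Medoid: the actual point with the smallest total distance to the others.
dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=2)
medoid = cluster[dists.sum(axis=1).argmin()]

print("centroid:", centroid)  # pulled toward the outlier (9, 9)
print("medoid:", medoid)      # stays on a real, representative point
```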
Types of Clusters: Density-Based
Density-based
A cluster is a dense region of points, separated from
other regions of high density by regions of low
density.
Used when the clusters are irregular or intertwined, and
when noise and outliers are present.
[Figure: 6 density-based clusters]
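Density-based clustering is usually illustrated with DBSCAN, which this slide does not cover in detail; the following sketch (using scikit-learn, with assumed eps and min_samples values and made-up data) shows the idea that dense regions become clusters while sparse points are labeled noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered outliers (hypothetical data).
blob1 = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
blob2 = rng.normal(loc=[5, 5], scale=0.3, size=(50, 2))
noise = rng.uniform(low=-2, high=7, size=(5, 2))
X = np.vstack([blob1, blob2, noise])

# eps and min_samples are assumed values for this toy data.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print("cluster labels found:", set(labels))  # -1 marks noise/outliers
```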
Data Structures Used
Data matrix (n objects × p attributes):

$$\begin{bmatrix} x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nf} & \cdots & x_{np} \end{bmatrix}$$

Dissimilarity (distance) matrix (n × n; symmetric, so only the lower triangle is stored):

$$\begin{bmatrix} 0 & & & & \\ d(2,1) & 0 & & & \\ d(3,1) & d(3,2) & 0 & & \\ \vdots & \vdots & \vdots & \ddots & \\ d(n,1) & d(n,2) & \cdots & \cdots & 0 \end{bmatrix}$$
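As a small sketch of both structures, the code below builds a 4 × 2 data matrix (reusing the four medicine points from the worked example later in this lecture) and derives the n × n distance matrix from it.

```python
import numpy as np

X = np.array([[1.0, 1.0],   # data matrix: n = 4 objects, p = 2 attributes
              [2.0, 1.0],
              [4.0, 3.0],
              [5.0, 4.0]])

# d(i, j) = Euclidean distance between object i and object j.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

print(np.round(D, 2))  # symmetric, zeros on the diagonal
```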
Partitioning (Centroid-Based) Algorithms
Construct a partition of a database D of n
objects into a set of k clusters
Given k, find a partition into k clusters that
optimizes the chosen partitioning criterion
k-means (MacQueen’67)
• Each cluster is represented by the center of the
cluster
• A Euclidean-distance-based method, mostly used
for interval/ratio-scaled data
k-medoids
• Each cluster is represented by one of the objects
in the cluster
• For categorical data
K-means Clustering
Partitional clustering approach
Each cluster is associated with a centroid
(center point)
Each point is assigned to the cluster with the
closest centroid
Number of clusters, K, must be specified
The basic algorithm is very simple
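A minimal sketch of that basic algorithm (random initial centroids, Euclidean distance, stop when centroids no longer change):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Select k of the data points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assign every point to the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its assigned points
        #    (this simple sketch assumes no cluster ever becomes empty).
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # 4. Stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```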
Clustering Example
[Figure: k-means on a 2-D point set, showing the cluster assignments at iteration 0 and after iterations 1 through 6]
K-means Clustering – Details
Initial centroids are often chosen randomly.
Clusters produced vary from one run to another.
The centroid is (typically) the mean of the points in
the cluster.
‘Closeness’ is measured by Euclidean distance,
cosine similarity, correlation, etc.
K-means will converge for common similarity
measures mentioned above.
Most of the convergence happens in the first few
iterations.
Often the stopping condition is changed to ‘Until relatively
few points change clusters’
Complexity is O(n × K × I × d)
n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
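For example, with hypothetical values n = 10,000 points, K = 10 clusters, I = 20 iterations, and d = 2 attributes, k-means performs on the order of 10,000 × 10 × 20 × 2 = 4,000,000 elementary distance computations.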
A Simple example showing the
implementation of k-means algorithm
(using K=2)
Step 1:
Initialization: we randomly choose the following two
centroids (k = 2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and
m2 = (5.0, 7.0).
Step 2:
Computing the distance of each object to these
centroids, we obtain two clusters containing
{1, 2, 3} and {4, 5, 6, 7}.
Their new centroids are shown below.
[Figure: distance table and the updated centroids]
Step 3:
Now, using these new centroids, we compute the
Euclidean distance of each object, as shown in the table.
[Table: distances of each object to the updated centroids]
Object 3 moves to the second cluster, and a further
iteration produces no change, so the algorithm
comes to a halt here. The final result consists
of 2 clusters, {1, 2} and {3, 4, 5, 6, 7}.
[Plot: the resulting K = 2 clusters]

(with K = 3)
[Plots: steps 1 and 2 of the same procedure with K = 3]
Real-Life Numerical Example of K-Means Clustering
We have 4 medicines as our training data points, and
each medicine has 2 attributes. Each attribute
represents a coordinate of the object. We have to
determine which medicines belong to cluster 1 and
which medicines belong to the other cluster.
Object       Attribute 1 (X): weight index   Attribute 2 (Y): pH
Medicine A   1                               1
Medicine B   2                               1
Medicine C   4                               3
Medicine D   5                               4
Step 1:
Initial value of centroids: suppose we use
medicine A and medicine B as the first
centroids. Let c1 and c2 denote the
coordinates of the centroids; then
c1 = (1, 1) and c2 = (2, 1).
Objects-centroids distances: we calculate the distance
from each cluster centroid to each object.
Using Euclidean distance, the distance matrix at
iteration 0 is

D⁰ = [ 0.00  1.00  3.61  5.00 ]   (distances to c1 = (1, 1))
     [ 1.00  0.00  2.83  4.24 ]   (distances to c2 = (2, 1))

Assigning each object to its nearest centroid gives
Group 1 = {A} and Group 2 = {B, C, D}; the iteration-1
centroids are then c1 = (1, 1) and
c2 = ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3).
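The iteration-0 matrix above can be reproduced with a few lines of code (a sketch, using the coordinates from the table):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
centroids = np.array([[1, 1], [2, 1]], dtype=float)          # c1, c2

# Row i holds the distances of all objects to centroid i.
D0 = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)
print(np.round(D0, 2))
# -> [[0.   1.   3.61 5.  ]
#     [1.   0.   2.83 4.24]]
```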
Iteration 2, determine centroids: now we repeat
step 4 to calculate the new centroid coordinates
based on the clustering of the previous iteration
(with the iteration-1 centroids, medicine B joins
group 1, giving G¹ = {A, B} and {C, D}). Group 1 and
group 2 both have two members, so the new centroids are
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1) and
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5).
Iteration 2, objects-centroids distances: repeating
step 2, we have the new distance matrix at
iteration 2:

D² = [ 0.50  0.50  3.20  4.61 ]   (distances to c1 = (1.5, 1))
     [ 4.30  3.54  0.71  0.71 ]   (distances to c2 = (4.5, 3.5))
Iteration 2, objects clustering: again, we assign
each object based on the minimum distance. The
grouping is unchanged (G² = G¹ = {A, B} and {C, D}),
so the objects no longer move between groups and the
algorithm stops.
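As an end-to-end sketch, the following code runs the whole example (medicines A, B, C, D, with A and B as the initial centroids) and prints the grouping and centroids at each pass until nothing changes:

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
centroids = np.array([[1, 1], [2, 1]], dtype=float)          # c1, c2

for it in range(1, 10):
    # Assign each object to its nearest centroid.
    dists = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)
    labels = dists.argmin(axis=0)
    # Recompute centroids as group means.
    new = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    print(f"iteration {it}: groups={labels.tolist()}, "
          f"centroids={np.round(new, 3).tolist()}")
    if np.allclose(new, centroids):  # converged: centroids stopped moving
        break
    centroids = new
# iteration 1: groups=[0, 1, 1, 1], centroids=[[1.0, 1.0], [3.667, 2.667]]
# iteration 2: groups=[0, 0, 1, 1], centroids=[[1.5, 1.0], [4.5, 3.5]]
# iteration 3: groups=[0, 0, 1, 1], centroids=[[1.5, 1.0], [4.5, 3.5]] -> stop
```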