
Cluster Analysis I

Presidency University

September, 2016
Classification

- The aim of any multivariate study is to learn about the nature of some object.

- One way of learning about that nature is to find out which class the object belongs to.

- For example, a candidate is asked many questions in an examination in order to classify him as either pass or fail.

- Hence classifying an object is one of the most important issues in multivariate statistics.
Types of classification

Classification is of two types:

- Supervised
- Unsupervised
Supervised vs Unsupervised Learning

- A kid is trying to learn the alphabet.

- The English letters are only pictures to him.

- The teacher tells him the name of each letter.

- This is supervised classification.


Supervised vs Unsupervised Learning

- Suppose the same kid grows up and visits China for the first time.

- At some place he sees some Chinese writing (shown as an image on the slide).

- Even now, he cannot read the characters.

- But he can at least identify that some of them are repeated.

- He can pick out the unique ones.

- This is unsupervised classification.


Supervised vs Unsupervised Learning

- Given data x1, x2, ..., xn with labels (i.e. group tags) y1, y2, ..., yn, we learn to predict the label associated with a new input xnew.

(The slide shows labelled pictures, e.g. "Chair" and "Duck", and a new picture captioned "What is this?")
Supervised vs Unsupervised Learning

- Given only x1, x2, ..., xn, we try to infer some underlying structure (i.e. find similarities and identify groups).

(The slide shows a collection of unlabelled items captioned "Group these items according to common features".)
Supervised vs Unsupervised Learning

- Discriminant Analysis is supervised classification.

- Clustering is unsupervised classification.


What is clustering? And why?

- Clustering is the task of dividing data into groups (clusters), so that points in any one group are more similar to each other than to points outside the group.

Two main uses are:

- Summary: deriving a reduced representation of the full data set.

- Discovery: looking for new insights into the structure of the data, e.g. finding groups of students that commit similar mistakes, or groups of 80's songs that sound alike.
Example

- Let's have a look at a satellite image (shown on the slide).

- The image shows a blue ocean and two land masses with green vegetation.

- The human eye can detect large regions of similar colour in an image.

- Our brain is performing a simple clustering:

- It groups the pixels into two clusters, land and sea (a toy computational sketch follows below).
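As a toy illustration of this idea (not from the slides), the sketch below groups synthetic pixel colours into two clusters with hierarchical clustering; the colour values and the use of SciPy are assumptions made for the example.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Synthetic stand-in for a satellite image: RGB values for "sea" and
    # "land" pixels (invented for illustration; a real image would be
    # loaded and reshaped into an (n_pixels, 3) array).
    rng = np.random.default_rng(1)
    sea = rng.normal(loc=[0.1, 0.2, 0.8], scale=0.05, size=(50, 3))
    land = rng.normal(loc=[0.2, 0.7, 0.2], scale=0.05, size=(50, 3))
    pixels = np.vstack([sea, land])

    # Group the pixel colours into two clusters, mimicking what the eye does.
    Z = linkage(pixels, method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(np.bincount(labels)[1:])   # roughly 50 pixels in each cluster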


Proximity Measures

- Cluster analysis is the art of grouping similar items.

- How do we define the similarity of items?

- Usually we define a dissimilarity measure dij or a similarity measure sij between xi and xj.

- Generally dij is taken to be a distance measure, e.g.

  dij = ||xi − xj||2

  (a small computational sketch follows below).

- This holds good if we are working with numerical variables.

- For categorical data, we will discuss suitable proximity measures later.
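A minimal computational sketch of this dissimilarity (the five toy points are invented for the example):

    import numpy as np

    # Toy data: five points in two dimensions (invented for illustration).
    X = np.array([[0.0, 0.0],
                  [0.1, 0.2],
                  [3.0, 3.1],
                  [3.2, 2.9],
                  [6.0, 0.5]])

    # Pairwise Euclidean dissimilarities dij = ||xi - xj||2, computed by
    # broadcasting the difference of every pair of rows.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    print(np.round(D, 2))   # symmetric n x n matrix with zeros on the diagonal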
Types of Clustering

Clustering splits into two branches:

- Hierarchical Clustering, which is either Agglomerative or Divisive
- Non-Hierarchical Clustering
Hierarchical Clustering

Consider four shapes: a sphere, a cone, a small cube and a large cube (pictured on the slide). Suppose we want to group the similar objects together.
Hierarchical Clustering

- We might put the two cubes in one cluster, and the sphere and the cone in the other, as both of the latter have round surfaces.

- So the two clusters are now (sphere, cone) and (small cube, large cube).
Hierarchical Clustering

- Suppose we further split these clusters.

- Then we shall split the (sphere, cone) cluster into the (sphere) and (cone) clusters.
Hierarchical Clustering

- If we are to increase the number of clusters further, we will split the second cluster.
Hierarchical Clustering

This step-by-step clustering process can be expressed using a tree diagram (shown on the slide).
Agglomerative vs divisive

Agglomerative (i.e., bottom-up):

- Start with every point in its own group.

- Until there is only one cluster, repeatedly merge the two groups that have the smallest dissimilarity.

Divisive (i.e., top-down):

- Start with all points in one cluster.

- Until every point is in its own cluster, repeatedly split a group into the two parts that have the biggest dissimilarity.
Simple Example

The agglomerative process on seven points (pictured on the slides) proceeds step by step:

Step 1: {1}, {2}, {3}, {4}, {5}, {6}, {7}
Step 2: {1}, {2, 3}, {4}, {5}, {6}, {7}
Step 3: {1, 7}, {2, 3}, {4}, {5}, {6}
Step 4: {1, 7}, {2, 3}, {4, 5}, {6}
Step 5: {1, 7}, {2, 3, 6}, {4, 5}
Step 6: {1, 7}, {2, 3, 4, 5, 6}
Step 7: {1, 2, 3, 4, 5, 6, 7}

We can represent this sequence of clustering assignments by a tree called a dendrogram.
What's a dendrogram?

- A dendrogram is a convenient graphic for displaying a hierarchical sequence of clustering assignments.

It is simply a tree where:

- Each node represents a group.

- Each leaf node is a singleton (i.e., a group containing a single data point).

- The root node is the group containing the whole data set.

- Each internal node has two daughter nodes (children), representing the two groups that were merged to form it.

- If we fix the leaf nodes at height zero, then each internal node is drawn at a height proportional to the dissimilarity between its two daughter nodes.
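As a hedged sketch of how such a dendrogram could be drawn in practice, here is SciPy's hierarchical clustering applied to invented toy data (the data and the choice of single linkage are assumptions for the example):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Toy data: seven points in the plane (made up for illustration).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(7, 2))

    # Z encodes the hierarchical merge history (here with single linkage).
    Z = linkage(X, method="single", metric="euclidean")

    # Leaves sit at height zero; each internal node is drawn at the
    # dissimilarity at which its two daughter groups were merged.
    dendrogram(Z, labels=[str(i + 1) for i in range(len(X))])
    plt.ylabel("dissimilarity at merge")
    plt.show()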
Cutting a dendrogram

- The dendrogram represents clusterings at every possible level of granularity.

- In order to use it, we have to cut it at a particular height.

- The height at which we cut the dendrogram represents the distance beyond which we choose to call points "dissimilar"; in practice it depends on the number of clusters we want.

- Depending on the linkage we use, this cut has its own interpretation, which we will study (a small sketch of cutting a tree follows below).
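A small sketch of cutting a tree with SciPy (the toy data, the cut height 0.9 and the cluster count 3 are arbitrary choices for illustration):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Toy data and its merge history (invented for illustration).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(7, 2))
    Z = linkage(X, method="single")

    # Cut the tree at height 0.9: points sharing a label form one cluster
    # at that dissimilarity level.
    labels_by_height = fcluster(Z, t=0.9, criterion="distance")

    # Alternatively, ask directly for a fixed number of clusters, say 3.
    labels_by_count = fcluster(Z, t=3, criterion="maxclust")

    print(labels_by_height)
    print(labels_by_count)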
Linkage

- We have studied distances between two data points xi and xj.

- While performing hierarchical clustering we need to merge the clusters which are most similar.

- This requires a notion of distance between clusters, which is known as linkage.
Linkage

- We are given the data points x1, x2, ..., xn and the distance between any pair xi and xj as dij.

- At any level, a clustering assignment can be expressed by sets G = {i1, i2, ..., inG} giving the indices ir of the points belonging to that group.

- At the bottom level the groups are singletons, i.e. G = {i}.

- At the topmost level there is only one group, G = {1, 2, 3, ..., n}.

- A linkage is a function d(G, H) that takes two groups G, H and returns a dissimilarity score between them.

- Remember: the choice of linkage determines how we measure dissimilarity between groups of points.
Agglomerative Clustering

Given the linkage, the agglomerative clustering algorithm consists of the following steps:

- Start with every point in its own group.

- Until there is only one cluster, repeatedly merge the two groups G, H such that d(G, H) is smallest.

A naive computational sketch of this loop is given below.
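The following is a naive, unoptimized sketch of this loop (the function names and toy data are my own; the example linkage is the smallest cross-group distance, i.e. single linkage, defined on the next slides):

    import numpy as np

    def agglomerate(D, linkage_fn):
        """Naive agglomerative clustering on a dissimilarity matrix D.

        linkage_fn(D, G, H) returns the linkage score d(G, H) for two groups
        given as lists of point indices. Returns the merge history as a list
        of (G, H, d(G, H)) triples.
        """
        groups = [[i] for i in range(len(D))]   # every point starts in its own group
        merges = []
        while len(groups) > 1:
            # Find the pair of groups with the smallest linkage score.
            best = None
            for a in range(len(groups)):
                for b in range(a + 1, len(groups)):
                    d = linkage_fn(D, groups[a], groups[b])
                    if best is None or d < best[2]:
                        best = (a, b, d)
            a, b, d = best
            merges.append((list(groups[a]), list(groups[b]), d))
            groups[a] = groups[a] + groups[b]   # merge the two groups
            del groups[b]
        return merges

    # Example linkage: smallest cross-group distance (single linkage).
    def single_linkage(D, G, H):
        return min(D[i, j] for i in G for j in H)

    X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9]])
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    for G, H, d in agglomerate(D, single_linkage):
        print(G, "+", H, "merged at", round(d, 2))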
Single Linkage

- In single linkage (i.e., nearest-neighbor linkage), the dissimilarity between G and H is the smallest dissimilarity between two points in opposite groups:

  dsingle(G, H) = min_{i ∈ G, j ∈ H} dij
Single Linkage

- The single linkage score dsingle(G, H) is the distance of the closest pair.

- Cut interpretation: if we cut the dendrogram at height 0.9, then we can say that, for each point xi, there is another point xj in its cluster with dij ≤ 0.9.
Complete Linkage

- In complete linkage (i.e., farthest-neighbor linkage), the dissimilarity between G and H is the largest dissimilarity between two points in opposite groups:

  dcomplete(G, H) = max_{i ∈ G, j ∈ H} dij
Complete Linkage

- The complete linkage score dcomplete(G, H) is the distance of the farthest pair.

- Cut interpretation: if we cut the dendrogram at height 0.9, then for each point xi, every other point xj in its cluster satisfies dij ≤ 0.9.
Average Linkage

- In average linkage, the dissimilarity between G and H is the average dissimilarity over all pairs of points in opposite groups:

  daverage(G, H) = (1 / (nG nH)) Σ_{i ∈ G, j ∈ H} dij

- The average linkage score daverage(G, H) is the average distance across all pairs.

- There is not really any good cut interpretation.
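As a sketch, the three linkages can be written as small functions of the dissimilarity matrix D (these could be plugged into the naive agglomerative loop sketched earlier; the function names and toy data are my own):

    import numpy as np

    def d_single(D, G, H):
        """Smallest dissimilarity between points in opposite groups."""
        return D[np.ix_(G, H)].min()

    def d_complete(D, G, H):
        """Largest dissimilarity between points in opposite groups."""
        return D[np.ix_(G, H)].max()

    def d_average(D, G, H):
        """Average dissimilarity over all pairs in opposite groups."""
        return D[np.ix_(G, H)].mean()

    # Tiny example on a made-up dissimilarity matrix.
    X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9]])
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    G, H = [0, 1], [2, 3]
    print(d_single(D, G, H), d_complete(D, G, H), d_average(D, G, H))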


Common Properties

Single, complete, and average linkage share the following properties:

- These linkages operate on the dissimilarities dij, and don't need the points x1, x2, ..., xn to lie in Euclidean space.

- Running agglomerative clustering with any of these linkages produces a dendrogram with no inversions (a quick check is sketched below).

- The second property can be restated as: the dissimilarity scores between merged clusters only increase as we run the algorithm.

- This means that we can draw a proper dendrogram, where the height of a parent node is always greater than the heights of its daughters.
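A quick empirical check of the no-inversion property on made-up data (in SciPy's encoding, column 2 of the linkage output holds the merge heights):

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))          # toy data, invented for illustration

    for method in ["single", "complete", "average"]:
        heights = linkage(X, method=method)[:, 2]
        # Merge heights should only increase, i.e. no inversions.
        print(method, bool(np.all(np.diff(heights) >= 0)))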
Shortcomings of single and complete linkage

Single and complete linkage can have some practical problems:

- Single linkage suffers from chaining. In order to merge two groups, we only need one pair of points to be close, irrespective of all the others. Therefore clusters can be too spread out, and not compact enough.

- Complete linkage avoids chaining, but suffers from crowding. Because its score is based on the worst-case dissimilarity between pairs, a point can end up closer to points in other clusters than to points in its own cluster. Clusters are compact, but not far enough apart.

- Average linkage tries to strike a balance. It uses the average pairwise dissimilarity, so clusters tend to be relatively compact and relatively far apart.
Shortcomings of average linkage

Average linkage isn't perfect; it has its own problems:

- It is not clear what properties the resulting clusters have when we cut an average linkage tree at a given height h. Single and complete linkage trees each had simple cut interpretations.

- The results of average linkage clustering can change under a monotone increasing transformation of the dissimilarities dij: if g is such that g(x) ≤ g(y) whenever x ≤ y, and we use the dissimilarities g(dij) instead of dij, then we may get different answers.
Centroid Linkage

- Centroid linkage is commonly used. Let x̄G, x̄H denote the group averages (centroids) of G and H. Then

  dcentroid(G, H) = d(x̄G, x̄H)


Centroid Linkage

- The centroid linkage score dcentroid(G, H) is the distance between the group centroids (i.e., the group averages).

- There is not really any good cut interpretation.
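A minimal sketch of centroid linkage (the function name is my own); note that, unlike the previous three linkages, it needs the raw points rather than just the dissimilarities:

    import numpy as np

    def d_centroid(X, G, H):
        """Distance between the group averages (centroids) of G and H."""
        return float(np.linalg.norm(X[G].mean(axis=0) - X[H].mean(axis=0)))

    # Tiny example on made-up points.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9], [3.0, 3.0]])
    print(round(d_centroid(X, [0, 1], [2, 3]), 2))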


Shortcomings of centroid linkage

- It can produce dendrograms with inversions, which really messes up the visualization (a tiny demonstration follows below).

- Even if we were lucky enough to have no inversions, there is still no interpretation for the clusters resulting from cutting the tree.

- The answers change with a monotone transformation of the dissimilarity measure.
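A tiny demonstration of such an inversion (the three toy points form a tall, nearly equilateral triangle, chosen so that the centroid of the first merged pair is closer to the third point than the pair were to each other):

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Three made-up points: the closest pair is at distance 1.0, but the
    # centroid of that pair is only 0.9 away from the remaining point.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])

    Z = linkage(X, method="centroid")
    print(Z[:, 2])   # merge heights [1.0, 0.9]: the second merge is lower
                     # than the first, i.e. an inversion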
Linkages in a nutshell

Linkage   | No inversions? | Unchanged by monotone transformations? | Cut interpretation? | Note
Single    | yes            | yes                                    | yes                 | chaining
Complete  | yes            | yes                                    | yes                 | crowding
Average   | yes            | no                                     | no                  |
Centroid  | no             | no                                     | no                  | simple

- This doesn't tell us what the best linkage is.

- Choosing a linkage can be very situation dependent.