
Lecture 2: Classification, Clustering

STATS 202: Data mining and analysis

Rajan Patel

1 / 19
Classification problem

Recall:

I X = (X1, X2) are inputs.
I Color Y ∈ {Yellow, Blue} is the output.
I (X, Y) have a joint distribution.
I Purple line is the Bayes boundary: the best we could do if we
  knew the joint distribution of (X, Y).

Figure 2.13: the training observations plotted in the (X1, X2) plane
and colored by class, with the Bayes decision boundary drawn in purple.

2 / 19
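
Since the joint distribution behind Figure 2.13 is not given here, the following sketch assumes a simple two-class Gaussian mixture just to illustrate how a Bayes classifier would be computed if the distribution were known; the means, covariance, and prior are made up.

import numpy as np
from scipy.stats import multivariate_normal

prior_blue = 0.5                                                  # P(Y = Blue), assumed
f_blue = multivariate_normal(mean=[1.0, 1.0], cov=np.eye(2))      # f(x | Y = Blue), assumed
f_yellow = multivariate_normal(mean=[-1.0, -1.0], cov=np.eye(2))  # f(x | Y = Yellow), assumed

def bayes_classifier(x):
    # Predict the class with the larger posterior probability P(Y = k | X = x).
    p_blue = prior_blue * f_blue.pdf(x)
    p_yellow = (1 - prior_blue) * f_yellow.pdf(x)
    return "Blue" if p_blue > p_yellow else "Yellow"

print(bayes_classifier([0.5, 0.8]))  # close to the Blue mean, so "Blue"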
K-nearest neighbors
To assign a color to the input ×, we look at its K = 3 nearest
neighbors. We predict the color of the majority of the neighbors.

Figure 2.14: a test point × shown together with its three nearest
training observations; the majority of their colors gives the prediction.

3 / 19
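
A minimal sketch of the K = 3 majority-vote rule described above, written with NumPy; the five training points and their labels are assumptions for illustration, not the data of Figure 2.14.

import numpy as np
from collections import Counter

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.1], [0.9, 1.0], [1.2, 0.8]])
y_train = np.array(["Yellow", "Yellow", "Blue", "Blue", "Blue"])

def knn_predict(x_new, K=3):
    # Predict the majority class among the K nearest training points.
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:K]                  # indices of the K closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([1.0, 0.9])))  # -> "Blue"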
K-nearest neighbors also has a decision boundary

Figure 2.15: the KNN decision boundary for K = 10, plotted over the
training data in the (X1, X2) plane.
4 / 19
The higher K, the smoother the decision boundary

Figure 2.16: KNN decision boundaries for K = 1 (left) and K = 100
(right); the K = 1 boundary is highly irregular, while the K = 100
boundary is much smoother.

5 / 19
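
The effect of K can be reproduced with scikit-learn, assuming it is available; the two-moons data below are a stand-in for the simulated data of Figure 2.16.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=600, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for K in (1, 10, 100):
    clf = KNeighborsClassifier(n_neighbors=K).fit(X_train, y_train)
    # Small K: very flexible, irregular boundary; large K: smoother boundary.
    print(K, clf.score(X_train, y_train), clf.score(X_test, y_test))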
Clustering

As in classification, we assign a class to each sample in the data
matrix. However, the class is not an output variable; we only use
input variables.

Clustering is an unsupervised procedure, whose goal is to find
homogeneous subgroups among the observations.

We will discuss 2 algorithms:

I K-means clustering
I Hierarchical clustering

6 / 19
K-means clustering
I K is the number of clusters and must be fixed in advance.
I The goal of this method is to maximize the similarity of
  samples within each cluster:
  \[
  \min_{C_1, \ldots, C_K} \sum_{\ell=1}^{K} W(C_\ell);
  \qquad
  W(C_\ell) = \frac{1}{|C_\ell|} \sum_{i, j \in C_\ell}
  \mathrm{Distance}^2(x_{i,:}, x_{j,:}).
  \]

Figure 10.5: the same data partitioned by K-means with K = 2, K = 3,
and K = 4 (one panel per value of K).
7 / 19
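
A direct NumPy translation of the within-cluster dissimilarity W(C_ℓ) above, using squared Euclidean distance; the four points and the two-cluster assignment are made up for illustration.

import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
clusters = {1: [0, 1], 2: [2, 3]}     # cluster label -> row indices of X

def W(indices):
    # (1 / |C|) * sum over all pairs (i, j) in C of the squared Euclidean distance
    pts = X[indices]
    diffs = pts[:, None, :] - pts[None, :, :]
    return (diffs ** 2).sum() / len(indices)

objective = sum(W(idx) for idx in clusters.values())  # quantity K-means minimizes
print(objective)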
K-means clustering algorithm

1. Assign each sample to a cluster from 1 to K arbitrarily, e.g. at
   random.

2. Iterate these two steps until the clustering is constant:

   I Find the centroid of each cluster ℓ, i.e. the average \(\bar{x}_{\ell,:}\) of all
     the samples in the cluster:
     \[
     \bar{x}_{\ell, j} = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} x_{i, j}
     \quad \text{for } j = 1, \ldots, p.
     \]
   I Reassign each sample to the nearest centroid.

8 / 19
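
A minimal sketch of the two-step algorithm exactly as stated: a random initial assignment, then alternating centroid computation and nearest-centroid reassignment until the clustering stops changing. Reseeding empty clusters at a random data point is an implementation detail the slide does not specify.

import numpy as np

def kmeans(X, K, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(0, K, size=n)               # step 1: arbitrary assignment
    while True:
        # step 2a: centroid (mean) of each cluster; empty clusters are reseeded
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else X[rng.integers(n)]
            for k in range(K)
        ])
        # step 2b: reassign each sample to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):         # clustering is constant: stop
            return labels, centroids
        labels = new_labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 4])
labels, centroids = kmeans(X, K=2)
print(centroids)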
K-means clustering algorithm

Figure 10.6: progress of the algorithm on example data; panels show the
raw data, the arbitrary initial assignment (Step 1), Steps 2a and 2b of
the first iteration, Step 2a of the second iteration, and the final results.
9 / 19
Properties of K-means clustering

I The algorithm always converges to a local minimum of
  \[
  \min_{C_1, \ldots, C_K} \sum_{\ell=1}^{K} W(C_\ell);
  \qquad
  W(C_\ell) = \frac{1}{|C_\ell|} \sum_{i, j \in C_\ell}
  \mathrm{Distance}^2(x_{i,:}, x_{j,:}).
  \]

I Each initialization could yield a different local minimum.

10 / 19
Example: K-means output with different initializations

In practice, we start from many random initializations and choose
the output which minimizes the objective function.

Figure 10.7: six runs of K-means from different random initializations;
the objective value attained by each run is printed above its panel
(320.9, 235.8, 235.8, 235.8, 235.8, and 310.9).

11 / 19
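
In practice this restarting is usually delegated to a library. Assuming scikit-learn is available, KMeans runs n_init random initializations and keeps the one with the lowest inertia (the within-cluster sum of squared distances to the centroids, which equals the objective above up to a factor of 2); the three-cluster toy data below are an assumption.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)) + c for c in ([0, 0], [4, 0], [0, 4])])

km = KMeans(n_clusters=3, n_init=50, random_state=0).fit(X)
print(km.inertia_)        # objective value of the best of the 50 runs
print(km.labels_[:10])    # cluster assignments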
Hierarchical clustering

Most algorithms for hierarchical clustering are agglomerative.

Figure: nine observations, labeled 1 to 9, plotted against X1 and X2;
successive panels fuse the closest pairs of clusters, and the final panel
shows the resulting dendrogram (heights from 0.0 to about 3.0).

The output of the algorithm is a dendrogram.

We must be careful about how we interpret the dendrogram.

12 / 19
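
A sketch of agglomerative clustering with SciPy, assuming scipy and matplotlib are available: linkage records the sequence of merges and dendrogram draws the tree. The nine random points stand in for the labeled observations on the slide.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(9, 2))            # nine observations in (X1, X2)

Z = linkage(X, method="complete")      # agglomerative merges, complete linkage
dendrogram(Z, labels=np.arange(1, 10)) # leaves labeled 1..9
plt.ylabel("Height")
plt.show()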
Hierarchical clustering

I The number of clusters is not fixed: cutting the dendrogram at
  different heights gives different numbers of clusters.

I Hierarchical clustering is not always appropriate.
  e.g. Market segmentation for consumers of 3 different
  nationalities.
  I Natural 2 clusters: gender
  I Natural 3 clusters: nationality
  These clusterings are not nested or hierarchical.

Figure 10.9: a dendrogram with heights from 0 to 10, shown cut at
different heights to produce different clusterings.

13 / 19
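
Because the number of clusters is not fixed in advance, the same linkage tree can be cut at different levels after the fact. A sketch with SciPy's fcluster, on assumed synthetic data:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 2)) + c for c in ([0, 0], [5, 0], [0, 5])])

Z = linkage(X, method="complete")
two_clusters = fcluster(Z, t=2, criterion="maxclust")    # cut into 2 clusters
three_clusters = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
print(np.unique(two_clusters), np.unique(three_clusters))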
Notion of distance between clusters

At each step, we link the 2 clusters that are “closest” to each other.

Hierarchical clustering algorithms are classified according to the
notion of distance between clusters.

Complete linkage:
The distance between 2 clusters is the maximum distance between
any pair of samples, one in each cluster.

Average linkage:
The distance between 2 clusters is the average of all pairwise
distances.

Single linkage:
The distance between 2 clusters is the minimum distance between
any pair of samples, one in each cluster.
Suffers from the chaining phenomenon.

14 / 19
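
The three linkage rules can be computed directly from the matrix of pairwise distances between two clusters. A small sketch with made-up points:

import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[0.0, 0.0], [0.0, 1.0]])
cluster_b = np.array([[3.0, 0.0], [4.0, 1.0], [5.0, 0.0]])

D = cdist(cluster_a, cluster_b)        # all pairwise Euclidean distances

print("complete linkage:", D.max())    # maximum pairwise distance
print("average linkage:", D.mean())    # average of all pairwise distances
print("single linkage:", D.min())      # minimum pairwise distance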
Example

Figure 10.12: the same data clustered with average linkage, complete
linkage, and single linkage (one panel per method).
15 / 19
Clustering is riddled with questions and choices

I Is clustering appropriate? i.e. Could a sample belong to more
  than one cluster?
  I Mixture models, soft clustering, topic models.
I How many clusters are appropriate?
  I Choose subjectively; it depends on the inference sought.
  I There are formal methods based on gap statistics, mixture
    models, etc.
I Are the clusters robust? (see the sketch after this slide)
  I Run the clustering on different random subsets of the data. Is
    the structure preserved?
  I Try different clustering algorithms. Are the conclusions
    consistent?
I Most important: temper your conclusions.

16 / 19
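
One way to probe robustness, as suggested above, is to recluster a random subset and compare the two labelings on the shared points. A sketch assuming scikit-learn, using the adjusted Rand index (1 means identical partitions up to relabeling); the data are an assumption.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(60, 2)) + c for c in ([0, 0], [5, 0], [0, 5])])

full_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

subset = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
sub_labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X[subset])

# Compare the two clusterings restricted to the subsampled points.
print(adjusted_rand_score(full_labels[subset], sub_labels))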
Clustering is riddled with questions and choices

I Should we scale the variables before doing the clustering?
  (see the sketch after this slide)
  I Variables with larger variance have a larger effect on the
    Euclidean distance between two samples.
I Does Euclidean distance capture dissimilarity between samples?

17 / 19
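
A small sketch of why scaling matters: when one variable has a much larger variance than another, it dominates the Euclidean distance until both are standardized. The two made-up variables below are only an illustration.

import numpy as np

rng = np.random.default_rng(0)
income = rng.normal(50_000, 10_000, size=100)    # large variance
age = rng.normal(40, 10, size=100)               # small variance
X = np.column_stack([income, age])

d_raw = np.linalg.norm(X[0] - X[1])              # dominated by income
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])

print(d_raw, d_scaled)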
Correlation distance

Example: Suppose that we want to cluster customers at a store
for market segmentation.
I Samples are customers.
I Each variable corresponds to a specific product and measures
  the number of items bought by the customer during a year.

Figure: number of items purchased (from 0 to 20) plotted against the
variable index (1 to 20) for Observations 1, 2, and 3.

18 / 19
Correlation distance
I Euclidean distance would cluster all customers who purchase
  few things (orange and purple).
I Perhaps we want to cluster customers who purchase similar
  things (orange and teal).
I Then, the correlation distance may be a more appropriate
  measure of dissimilarity between samples.

Figure: the same three observations, with the number of items
purchased plotted against the variable index (1 to 20).
19 / 19
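
A sketch contrasting Euclidean distance with correlation distance for three hypothetical customers, in the spirit of the figure: observations 1 and 3 buy few items overall, while observations 1 and 2 buy the products in similar proportions. The purchase vectors are made up.

import numpy as np
from scipy.spatial.distance import euclidean, correlation

rng = np.random.default_rng(0)
pattern = rng.uniform(size=20)        # shared purchasing pattern over 20 products

obs_1 = 2 * pattern                   # low-volume customer
obs_2 = 20 * pattern                  # high-volume customer, same pattern
obs_3 = 2 * rng.uniform(size=20)      # low-volume customer, unrelated pattern

# Euclidean distance groups the two low-volume customers (1 and 3) ...
print(euclidean(obs_1, obs_3) < euclidean(obs_1, obs_2))
# ... while correlation distance groups the two similar patterns (1 and 2).
print(correlation(obs_1, obs_2) < correlation(obs_1, obs_3))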
