Clustering
Computer Vision
CSE M164
Today’s class
• Fitting and alignment
– One more algorithm: ICP
– Review of all the algorithms
• Clustering algorithms
– K-means
– Hierarchical clustering
– Spectral clustering
What if you want to align but have no prior
matched pairs?
• Important applications
[Figure: two point sets, A1–A3 and B1–B3, to be aligned]
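This is the setting for ICP (iterative closest point) from today's agenda: alternate between matching each point to its current nearest neighbor and re-fitting the transformation. Below is a translation-only sketch, assuming 2-D point arrays; this is a simplification, since full ICP also estimates rotation.

```python
import numpy as np

def icp_translation(A, B, n_iters=20):
    """Translation-only ICP sketch: align point set A to point set B.

    A: (n, 2) array, B: (m, 2) array of 2D points.
    Returns the estimated translation t such that A + t ~ B.
    """
    t = np.zeros(2)
    for _ in range(n_iters):
        moved = A + t
        # Match each moved point of A to its nearest neighbor in B.
        d = np.linalg.norm(moved[:, None, :] - B[None, :, :], axis=2)
        nearest = B[d.argmin(axis=1)]
        # Least-squares translation update: mean of the residuals.
        t += (nearest - moved).mean(axis=0)
    return t
```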
Given matched points in {A} and {B}, estimate the translation of the object
$$x_i^B = x_i^A + t_x, \qquad y_i^B = y_i^A + t_y$$
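With known correspondences, the least-squares estimate of the translation is simply the mean displacement between matched points; a small NumPy check (the example points are made up):

```python
import numpy as np

# A and B are (n, 2) arrays of corresponding points (A[i] matches B[i]).
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = A + np.array([2.0, 3.0])  # B is A translated by (tx, ty) = (2, 3)

t = (B - A).mean(axis=0)  # least-squares estimate of (tx, ty)
print(t)                  # -> [2. 3.]
```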
Example: solving for translation
[Figure: matched points A1–A3 translated by (tx, ty) onto B1–B3]
[Figure: the same example with additional points A4, A5, B4, B5]
Problem: outliers
[Figure: translation estimation with outlier matches; points A4–A6 have no correct counterpart]
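A standard remedy from the fitting-and-alignment toolbox is RANSAC (named here as the usual fix, not quoted from these slides): fit the translation to a randomly sampled match, count how many matches agree, repeat, and re-fit on the best inlier set.

```python
import numpy as np

def ransac_translation(A, B, n_trials=100, inlier_thresh=1.0, seed=None):
    """RANSAC sketch for translation: A[i] is putatively matched to B[i].

    Returns the translation re-fit on the largest inlier set found.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(A), dtype=bool)
    for _ in range(n_trials):
        i = rng.integers(len(A))   # one match fully determines a translation
        t = B[i] - A[i]
        residuals = np.linalg.norm(B - (A + t), axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine with least squares on all inliers of the best hypothesis.
    return (B[best_inliers] - A[best_inliers]).mean(axis=0)
```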
Key Challenges:
1) What makes two points/images/patches similar?
2) How do we compute an overall grouping from
pairwise similarities?
Why do we cluster?
• Summarizing data
– Look at large amounts of data
– Patch-based compression or denoising
– Represent a large continuous vector with the cluster number
• Counting
– Histograms of texture, color, SIFT vectors
• Segmentation
– Separate the image into different regions
• Prediction
– Images in the same cluster may have the same labels
How do we cluster?
• K-means
– Iteratively re-assign points to the nearest cluster center
• Agglomerative clustering
– Start with each point as its own cluster and iteratively
merge the closest clusters
• Spectral clustering
– Split the nodes of a graph whose links are weighted by pairwise similarity
Clustering for Summarization
Goal: cluster to minimize variance in data given clusters
– Preserve information

$$c^*, \delta^* = \operatorname*{argmin}_{c,\,\delta}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta_{ij}\,\big(c_i - x_j\big)^2$$

where $\delta_{ij}$ indicates whether data point $x_j$ is assigned to cluster center $c_i$.
K-means
1. Initialize cluster centers $c^0$ and set $t = 0$.
2. Assign each point to the closest center:
$$\delta^t = \operatorname*{argmin}_{\delta}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta_{ij}\,\big(c_i^{t-1} - x_j\big)^2$$
3. Update the cluster centers as the mean of their assigned points:
$$c^t = \operatorname*{argmin}_{c}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta^t_{ij}\,\big(c_i - x_j\big)^2$$
4. Repeat steps 2–3 until no points are re-assigned.
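A minimal NumPy sketch of steps 1–4 (function and parameter names are illustrative, not the lecture's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=None):
    """Basic K-means on (N, d) data; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(n_iters):
        # Step 2: assign each point to the closest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points.
        new_centers = np.array([
            X[assign == i].mean(axis=0) if (assign == i).any() else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):  # step 4: stop at convergence
            break
        centers = new_centers
    return centers, assign
```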
• Distance measures
– Traditionally Euclidean, could be others
• Optimization
– Will converge to a local minimum
– May want to perform multiple restarts
How to evaluate clusters?
• Generative
– How well are points reconstructed from the clusters?
– Example: Predict the next word in a sequence
• Discriminative
– How well do the clusters correspond to labels?
• Purity
– Example: Spectral clustering
– Note: unsupervised clustering does not aim to be
discriminative
How to choose the number of clusters?
• Validation set
– Try different numbers of clusters and look at
performance
• When building dictionaries (discussed later), more
clusters typically work better
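One concrete version of the validation-set recipe: fit clusterings for several values of K on training data and score each on held-out data, looking for diminishing returns. The reconstruction-error metric below is an assumption; any task-specific performance measure works the same way.

```python
import numpy as np

def held_out_error(X_val, centers):
    """Mean squared distance from held-out points to their nearest center."""
    d = np.linalg.norm(X_val[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).mean()

# e.g., using the kmeans sketch above:
# scores = {k: held_out_error(X_val, kmeans(X_train, k)[0]) for k in (8, 16, 32)}
```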
Conclusions: K-means
Good
• Finds cluster centers that minimize conditional variance (good
representation of data)
• Simple to implement, widespread application
Bad
• Prone to local minima
• Need to choose K
• All clusters have the same parameters (e.g., distance measure
is non-adaptive)
• Can be slow: each iteration is O(KNd) for N d-dimensional
points
Building Visual Dictionaries
1. Sample patches from a database
– E.g., 128-dimensional SIFT vectors
2. Cluster the patches
– The cluster centers form the dictionary of codewords
3. Assign a codeword (number) to each new patch, according to the nearest cluster
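These three steps map directly onto the kmeans function sketched earlier; a sketch in which random data stands in for real SIFT descriptors, with the dictionary size of 256 chosen arbitrarily:

```python
import numpy as np

# 1. Sample patch descriptors from a database (e.g., 128-D SIFT vectors).
descriptors = np.random.default_rng(0).normal(size=(10000, 128))

# 2. Cluster them; the cluster centers are the dictionary of codewords.
dictionary, _ = kmeans(descriptors, k=256)

# 3. Assign each new patch descriptor the index of its nearest codeword.
def codeword(patch_descriptor, dictionary):
    return np.argmin(np.linalg.norm(dictionary - patch_descriptor, axis=1))
```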
Examples of learned codewords
Common similarity/distance measures
• Mahalanobis
– Scaled Euclidean
• Cosine distance
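For concreteness, both measures in NumPy (a sketch; `cov` stands for an assumed feature covariance matrix):

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Scaled Euclidean distance: accounts for feature covariance."""
    d = x - y
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

def cosine_distance(x, y):
    """1 minus the cosine similarity of the two vectors."""
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```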
K-medoids
• Just like K-means except
– Represent the cluster with one of its members,
rather than the mean of its members
– Choose the member (data point) that minimizes cluster dissimilarity (e.g., total distance to the other members)
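Only the center-update step differs from K-means; a minimal sketch of choosing a medoid (assignment to the nearest medoid works as in K-means):

```python
import numpy as np

def medoid(points):
    """Return the member of `points` (n, d) that minimizes total
    distance to the other members, i.e., the least-dissimilar point."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[np.argmin(d.sum(axis=1))]
```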
Conclusions: Agglomerative Clustering
Good
• Simple to implement, widespread application
• Clusters have adaptive shapes
• Provides a hierarchy of clusters
Bad
• May have imbalanced clusters
• Still have to choose number of clusters or threshold
• Need to use an “ultrametric” to get a meaningful
hierarchy
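For reference, a short example with SciPy's hierarchical-clustering utilities (the "average" linkage choice is an assumption, not prescribed by the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).normal(size=(50, 2))  # toy 2-D points

# Start with each point as its own cluster and iteratively merge the
# closest pair; "average" linkage here ("single"/"complete" also common).
Z = linkage(X, method="average")

# Cut the resulting hierarchy into a fixed number of clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
```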
Spectral clustering
Group points based on links in a graph
[Figure: similarity graph with two groups, A and B]
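A minimal sketch of a standard spectral-clustering pipeline (similarity graph, graph Laplacian, eigenvector embedding, then K-means); the Gaussian affinity and its bandwidth `sigma` are assumptions, not values from the lecture:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(X, k, sigma=1.0):
    """Cluster (n, d) points via the spectrum of the graph Laplacian."""
    # Links weighted by pairwise similarity (Gaussian affinity).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    # Embed each point using the k eigenvectors of smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]
    # Group points in the embedded space with ordinary K-means.
    _, labels = kmeans2(embedding, k, minit="points")
    return labels
```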
Cuts in a graph
[Figure: a cut through the graph separating group A from group B]
Normalized Cut
• A cut penalizes large segments
• Fix by normalizing for the size of the segments
Source: Seitz
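Concretely, the normalized-cut criterion of Shi and Malik divides the cut cost by each side's total association with the whole graph, so cutting off a small isolated segment is no longer cheap:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$

where cut(A, B) is the total weight of edges crossing the cut and assoc(A, V) is the total weight of edges touching A.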
Normalized cuts for segmentation
Visual PageRank
• Determining importance by random walk
– What’s the probability that you will randomly walk
to a given node?
• Create adjacency matrix based on visual similarity
• Edge weights determine probability of transition
http://www.cs.berkeley.edu/~arbelaez/UCM.html
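The stationary distribution of that random walk can be computed by power iteration on the row-normalized similarity matrix; a sketch, where the damping factor follows the usual PageRank convention and is an assumption here:

```python
import numpy as np

def visual_pagerank(W, damping=0.85, n_iters=100):
    """Importance scores from a random walk on similarity matrix W (n x n)."""
    P = W / W.sum(axis=1, keepdims=True)   # edge weights -> transition probs
    n = len(W)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        # With prob. `damping` follow a similarity edge, else jump uniformly.
        r = damping * (r @ P) + (1 - damping) / n
    return r
```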
Which algorithm to use?
• Image segmentation: spectral clustering
– Can provide more regular regions
– Spectral methods also used to propagate global
cues
Things to remember
• K-means useful for summarization,
building dictionaries of patches,
general clustering