Dimensionality Reduction
Motivation
• Clustering
• One way to summarize a complex real-valued data point with a single
categorical variable
• Dimensionality reduction
• Another way to simplify complex high-dimensional data
• Summarize data with a lower-dimensional real-valued vector
Data Compression
Reduce data from 2D to 1D
[Figure: axes labeled (inches) and (cm)]
Andrew Ng
Data Compression
Reduce data from 3D to 2D
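The 2D to 1D reduction can be illustrated with a minimal sketch. It assumes hypothetical data in which the same length is recorded once in inches and once in cm with small measurement noise, so the points lie almost on a line. The data, the noise level, and the names `X`, `u1`, and `z` are illustrative assumptions, not taken from the slides. The sketch finds the best-fit direction with NumPy's SVD and keeps one number per point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the same lengths recorded in inches and in centimetres,
# with a little independent measurement noise on each reading.
inches = rng.uniform(1.0, 10.0, size=200)
X = np.column_stack([
    inches + rng.normal(0.0, 0.05, size=200),
    2.54 * inches + rng.normal(0.0, 0.05, size=200),
])

mean = X.mean(axis=0)
Xc = X - mean                       # centre the data

# Right singular vectors of the centred data are the principal directions.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1 = Vt[0]                          # unit vector along the best-fit line

z = Xc @ u1                         # 1D code: one number per 2D point
X_approx = mean + np.outer(z, u1)   # map the 1D code back into 2D

# Fraction of the total variation lost by keeping only one dimension.
rel_err = np.linalg.norm(X - X_approx) ** 2 / np.linalg.norm(Xc) ** 2
print(z.shape, rel_err)
```

Because the two columns are nearly redundant, the relative error should be very small: almost nothing is lost by storing `z` instead of both coordinates.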
Principal Component Analysis (PCA) problem formulation
Covariance
• Variance and Covariance:
• Measure of the “spread” of a set of points around their center of mass (mean)
• Variance:
• Measure of the deviation from the mean for points in one dimension
• Covariance:
• Measure of how much two dimensions vary from their means with
respect to each other
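The definitions above can be checked numerically. The sketch below assumes the sample (n - 1) normalization, i.e. var(x) = sum_i (x_i - mean_x)^2 / (n - 1) and cov(x, y) = sum_i (x_i - mean_x)(y_i - mean_y) / (n - 1). The slides do not say which normalization they use, and the five data points are made up for illustration.

```python
import numpy as np

# Hypothetical data: 5 points (rows) in 2 dimensions (columns).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 5.0],
              [8.0, 9.0],
              [10.0, 12.0]])

n = X.shape[0]
mean = X.mean(axis=0)               # centre of mass of the points
Xc = X - mean                       # deviations from the mean

# Variance: spread of each dimension on its own.
variance = (Xc ** 2).sum(axis=0) / (n - 1)

# Covariance matrix: entry (i, j) measures how dimensions i and j vary
# together; the diagonal holds the per-dimension variances.
cov = Xc.T @ Xc / (n - 1)

print(variance)
print(cov)
```

The hand-computed matrix should agree with `np.cov(X, rowvar=False)`, which uses the same (n - 1) normalization by default.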
Example
Eigenvector and Eigenvalue
Ax = λx
Ax - λx = 0
(A - λI)x = 0
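A small numerical check of these identities, as a sketch: the 2 x 2 symmetric matrix `A` below is an arbitrary example chosen for illustration. `np.linalg.eigh` is used because covariance matrices are symmetric; for a general square matrix, `np.linalg.eig` would be the call to use instead.

```python
import numpy as np

# Arbitrary symmetric example matrix (an assumption for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and unit eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(A)

residuals = []
determinants = []
for lam, x in zip(eigvals, eigvecs.T):
    # Ax = λx, so (A - λI)x should be the zero vector.
    residuals.append(np.linalg.norm((A - lam * np.eye(2)) @ x))
    # A nonzero solution x exists only when A - λI is singular.
    determinants.append(np.linalg.det(A - lam * np.eye(2)))

print(eigvals)
print(residuals, determinants)
```

Both the residuals and the determinants should be zero up to floating-point rounding.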
• The top three principal components of SIFT descriptors from a set of images are computed
• Map these principal components to the principal components of the RGB space
• Pixels with similar colors share similar structures
Application: Image compression
Original Image
[Figure: image panels with both axes ticked from 2 to 12]
PCA compression: 144D → 6D
6 most important eigenvectors
[Figure: image panels with both axes ticked from 2 to 12]
PCA compression: 144D → 3D
3 most important eigenvectors
[Figure: image panels with both axes ticked from 2 to 12]
PCA compression: 144D → 1D
60 most important eigenvectors
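The compression above can be sketched in code under some assumptions: the image is cut into non-overlapping 12 x 12 patches (which gives the 144 dimensions), each patch is projected onto the k most important eigenvectors of the patch covariance, and the image is rebuilt from those k numbers per patch. The synthetic image and the helper names `image_to_patches`, `patches_to_image`, `pca_compress`, and `pca_reconstruct` are hypothetical; the slides do not state how their patches were extracted.

```python
import numpy as np

def image_to_patches(img, p):
    """Cut an (h, w) image into non-overlapping p x p patches, one per row."""
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .swapaxes(1, 2)
               .reshape(-1, p * p))

def patches_to_image(patches, shape, p):
    """Inverse of image_to_patches."""
    h, w = shape
    return (patches.reshape(h // p, w // p, p, p)
                   .swapaxes(1, 2)
                   .reshape(h, w))

def pca_compress(patches, k):
    """Project centred patches onto the top-k principal directions."""
    mean = patches.mean(axis=0)
    centred = patches - mean
    # Rows of Vt are eigenvectors of the patch covariance, most important first.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    basis = Vt[:k]                    # shape (k, 144)
    codes = centred @ basis.T         # shape (n_patches, k)
    return codes, basis, mean

def pca_reconstruct(codes, basis, mean):
    """Map k-dimensional codes back to 144-dimensional patches."""
    return mean + codes @ basis

# Synthetic 240 x 240 test image (an assumption; any grayscale array
# whose sides are multiples of 12 would do).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:240, 0:240]
img = np.sin(xx / 17.0) + np.cos(yy / 23.0) + 0.1 * rng.normal(size=(240, 240))

patches = image_to_patches(img, 12)   # shape (400, 144)
errors = {}
for k in (6, 3, 1):
    codes, basis, mean = pca_compress(patches, k)
    recon = patches_to_image(pca_reconstruct(codes, basis, mean), img.shape, 12)
    errors[k] = np.mean((img - recon) ** 2)   # mean squared pixel error

print(errors)
```

The reconstruction error cannot decrease as k shrinks, since each smaller basis is a subset of the larger one. This matches the progression from 6D to 3D to 1D on the slides.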