Dimensionality Reduction 22-01-22

The document discusses dimensionality reduction techniques. It begins by explaining the motivations for dimensionality reduction, such as computational efficiency, better generalization with fewer dimensions, visualization of data structure, and anomaly detection. It then describes the basic setup for linear dimensionality reduction, which involves projecting high-dimensional data points into a lower-dimensional subspace. Finally, it introduces principal component analysis (PCA) and explains that PCA aims to choose projection directions that minimize reconstruction error of the original data and maximize the projected variance of the data.


Pattern Recognition and Machine Learning

Dimensionality Reduction
Dipanjan Roy
Associate Professor
School of AIDE
Indian Institute of Technology Jodhpur
Numerous examples of high-dimensional data…
• Documents, e.g. news text: "According to media reports, a pair of hackers said on Saturday that the Firefox Web browser, commonly perceived as the safer and more customizable alternative to market leader Internet Explorer, is critically flawed. A presentation on the flaw was shown during the ToorCon hacker conference in San Diego." — "Zambian President Levy Mwanawasa has won a second term in office in an election his challenger Michael Sata accused him of rigging, official results showed on Monday."
• Face images
• Neural population recordings
• MEG readings
• Gene expression data
• High-dimensional brain fMRI data
Motivation and context
Why do dimensionality reduction?
• Computational: compress data ⇒ time/space efficiency
• Statistical: fewer dimensions ⇒ better generalization
• Visualization: understand structure of data
• Anomaly detection: describe normal data, detect outliers

Dimensionality reduction in this course:
• Linear methods (this week)
• Nonlinear methods (later)
Why reduce dimensions?
 High dimensionality has many costs
– Redundant and irrelevant features degrade performance of some ML algorithms
– Difficulty in interpretation and visualization
– Computation may become infeasible
 what if your algorithm scales as O(n^3)?
– Curse of dimensionality
Types of problems
• Prediction x → y: classification, regression
Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

• Structure discovery x → z: find an alternative representation z of data x
Applications: visualization
Techniques: clustering, linear dimensionality reduction

• Density estimation p(x): model the data
Applications: anomaly detection, language modeling
Techniques: clustering, linear dimensionality reduction
Linear dimensionality reduction
 Best k-dimensional subspace for projection depends on the task
– Unsupervised: retain as much data variance as possible
Example: principal component analysis (PCA)
– Classification: maximize separation among classes
Example: linear discriminant analysis (LDA)
– Regression: maximize correlation between projected data and response variable
Example: partial least squares (PLS)
Basic idea of linear dimensionality reduction

Represent each face as a high-dimensional vector x ∈ R^361

Map it to a low-dimensional vector z = U^T x, with z ∈ R^10

How do we choose U?
Outline
• Principal component analysis (PCA)
– Basic principles
– Case studies

• Linear discriminant analysis (LDA)

• Fisher discriminant analysis (FDA)

• Canonical correlation analysis (CCA)

• Independent component analysis (ICA)

• Summary
Dimensionality reduction setup

Given n data points in d dimensions: x_1, . . . , x_n ∈ R^d

X = (x_1 · · · x_n) ∈ R^{d×n}

Want to reduce dimensionality from d to k

Choose k directions u_1, . . . , u_k

U = (u_1 · · · u_k) ∈ R^{d×k}

For each u_j, compute the "similarity" z_j = u_j^T x
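To make the setup concrete, here is a minimal NumPy sketch (not from the slides) that builds a data matrix X and encodes it with an arbitrary orthonormal U; the dimensions d = 361, n = 500, k = 10 are illustrative assumptions, and PCA will later tell us how to choose U well.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 361, 500, 10                    # e.g., 19x19 face images flattened to d = 361

# Data matrix with one column per data point: X in R^{d x n}
X = rng.standard_normal((d, n))

# Some k orthonormal directions u_1, ..., u_k (here random, via QR).
# PCA will give a principled choice of U; this only shows the mechanics.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U in R^{d x k}, U^T U = I_k

# Encode every point: z_j = u_j^T x, i.e. Z = U^T X in R^{k x n}
Z = U.T @ X
print(Z.shape)                            # (10, 500): each column is a 10-dimensional code z
```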
PCA objective 1: reconstruction error

U serves two functions:
• Encode: z = U^T x, with z_j = u_j^T x
• Decode: x̃ = U z = Σ_{j=1}^{k} z_j u_j

Want the reconstruction error ‖x − x̃‖ to be small

Objective: minimize the total squared reconstruction error

min_{U ∈ R^{d×k}}  Σ_{i=1}^{n} ‖x_i − U U^T x_i‖^2
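A minimal NumPy sketch of this objective (my own illustration, on synthetic data with a non-optimized orthonormal U) that performs the encode/decode pair and computes the total squared reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 361, 500, 10
X = rng.standard_normal((d, n))                    # columns are the data points x_i
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # an orthonormal U (not yet the PCA choice)

Z = U.T @ X                # encode:  z_i = U^T x_i
X_hat = U @ Z              # decode:  x~_i = U z_i = sum_j z_{ij} u_j

# Total squared reconstruction error  sum_i ||x_i - U U^T x_i||^2
total_error = np.sum((X - X_hat) ** 2)
print(total_error)
```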
PCA objective 2: projected variance

Empirical distribution: uniform over x_1, . . . , x_n

Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^{n} f(x_i)

Variance (think sum of squares if centered): v̂ar[f(x)] = Ê[f(x)^2] − (Ê[f(x)])^2

Assume the data is centered: Ê[x] = 0

Objective: maximize the total projected variance

max_{U ∈ R^{d×k}, U^T U = I}  Ê[‖U^T x‖^2] = (1/n) Σ_{i=1}^{n} ‖U^T x_i‖^2
Dimensionality reduction from multi-trial recordings
Trial-averaged and concatenated PCA
Equivalence in two objectives
For orthonormal U (U^T U = I), ‖x_i − U U^T x_i‖^2 = ‖x_i‖^2 − ‖U^T x_i‖^2, so minimizing the total reconstruction error is the same as maximizing the total projected variance.

Finding one principal component
The best single direction u_1 (with ‖u_1‖ = 1) is the leading eigenvector of the sample covariance matrix C = (1/n) Σ_{i=1}^{n} x_i x_i^T, and the variance it captures is the corresponding eigenvalue.
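The derivation itself lives in the slides' equations and is not reproduced here; as an illustration of the stated result, the sketch below (synthetic data, arbitrary dimensions) finds the top-k principal directions as leading eigenvectors of the sample covariance and checks the identity behind the equivalence of the two objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 20, 1000, 3
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)          # center the data: empirical mean is zero

C = (X @ X.T) / n                              # sample covariance matrix, d x d
evals, evecs = np.linalg.eigh(C)               # eigh returns eigenvalues in ascending order
U = evecs[:, ::-1][:, :k]                      # top-k eigenvectors = principal directions

# For orthonormal U:  ||x_i - U U^T x_i||^2 = ||x_i||^2 - ||U^T x_i||^2,
# so minimizing reconstruction error is the same as maximizing projected variance.
recon_error = np.sum((X - U @ (U.T @ X)) ** 2)
proj_variance = np.sum((U.T @ X) ** 2)
print(np.isclose(recon_error + proj_variance, np.sum(X ** 2)))   # True
```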
How many principal components?
• Similar to the question "How many clusters?"
• The magnitude of the eigenvalues indicates the fraction of variance captured.
• Eigenvalues on a face image dataset:
[Figure: eigenvalues λ_i plotted against component index i (i = 2, . . . , 11); vertical axis ticks range from 287.1 to 1353.2.]
• Eigenvalues typically drop off sharply, so we don't need that many.
• Of course, variance isn't everything...
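One common, informal recipe (not prescribed by the slides) is to keep the smallest k whose eigenvalues account for a fixed fraction of the total variance; a sketch on synthetic data with an arbitrary 90% threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 2000
# Synthetic data with a decaying variance spectrum (purely illustrative)
X = np.linspace(3.0, 0.1, d)[:, None] * rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)

# Eigenvalues of the sample covariance, sorted largest first
evals = np.linalg.eigvalsh((X @ X.T) / n)[::-1]

# Fraction of total variance captured by the first k components
frac = np.cumsum(evals) / evals.sum()
k90 = int(np.searchsorted(frac, 0.90)) + 1     # smallest k reaching 90% (threshold is arbitrary)
print(k90, np.round(frac[:10], 3))
```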
Summary of PCA
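As a practical recap (my addition, not part of the slides), the whole encode/decode pipeline is available off the shelf; a minimal scikit-learn sketch on synthetic data (note that scikit-learn stores one data point per row, the transpose of the column convention used above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 361))          # scikit-learn expects one data point per ROW

pca = PCA(n_components=10)
Z = pca.fit_transform(X)                     # encode: 500 x 10 scores
X_hat = pca.inverse_transform(Z)             # decode: reconstruction from 10 components
print(Z.shape, pca.explained_variance_ratio_.sum())
```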
Reducing Matrix Dimensions
◾ Often, our data can be represented by an m-by-n matrix A
◾ And this matrix can be closely approximated by the product of three matrices that share a small common dimension r:

A (m×n) ≈ U (m×r) Σ (r×r) V^T (r×n)

Jure Leskovec & Mina Ghashami
SVD Definition

A ≈ U Σ V^T

◾ A: Input data matrix
 m x n matrix (e.g., m documents, n terms)
◾ U: Left singular vectors
 m x r matrix (m documents, r concepts)
◾ Σ: Singular values
 r x r diagonal matrix (strength of each ‘concept’)
 (r: rank of the matrix A)
◾ V: Right singular vectors
 n x r matrix (n terms, r concepts)
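A quick way to see these pieces is NumPy's built-in SVD; the sketch below (my own, on a small random matrix) prints the factor shapes and checks column orthonormality and the reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))                    # input data matrix (e.g., documents x terms)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # U: m x r,  s: r singular values,  Vt: r x n
print(U.shape, s.shape, Vt.shape)                  # here r = min(m, n) = 4

# Column orthonormality and exact reconstruction A = U diag(s) V^T
print(np.allclose(U.T @ U, np.eye(len(s))),
      np.allclose(Vt @ Vt.T, np.eye(len(s))),
      np.allclose(A, U @ np.diag(s) @ Vt))
```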
SVD steps for estimation of eigenvectors

A ≈ σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + . . .
 σ_i … scalar (singular value)
 u_i … vector (left singular vector)
 v_i … vector (right singular vector)

If we set σ_2 = 0, the second term drops out of the sum, so the corresponding columns of U and V may as well not exist.

It is always possible to decompose a real matrix A into A = U Σ V^T, where
◾ U, Σ, V: unique
◾ U, V: column orthonormal
 U^T U = I; V^T V = I (I: identity matrix)
 (Columns are orthogonal unit vectors)
◾ Σ: diagonal
 Entries (singular values) are non-negative and sorted in decreasing order (σ_1 ≥ σ_2 ≥ . . . ≥ 0)

Nice proof of uniqueness: https://www.cs.cornell.edu/courses/cs322/2008sp/stuff/TrefethenBau_Lec4_SVD.pdf
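Zeroing the trailing singular values is exactly how a rank-k approximation is formed; a short NumPy sketch (illustrative matrix size, arbitrary k = 5) that also checks that the Frobenius error equals the square root of the sum of the squared dropped singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 40))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
s_trunc = s.copy()
s_trunc[k:] = 0.0                          # drop the weakest "concepts": sigma_{k+1} = ... = 0
A_k = U @ np.diag(s_trunc) @ Vt            # rank-k approximation of A

# Frobenius error of the truncation = sqrt(sum of the squared dropped singular values)
print(np.allclose(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2))))
```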


Large-scale Brain Networks in M/EEG and dimension reduction?
• What is happening at faster time-scales?
• What are the specific neuronal interactions?
• Can we use MEG to answer these questions?
– excellent temporal resolution (milliseconds)
– good spatial resolution
– non-invasive
SVD on High-dimensional Brain Data
[Figure panels: Spectrum; Cross-Spectral Matrix; Karhunen-Loève transform; Global Coherence.]

Sahoo et al. (2020) NeuroImage
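The slide only shows figure panels from Sahoo et al. (2020), so the sketch below is not their pipeline; it merely illustrates one common definition of global coherence, the largest eigenvalue of the cross-spectral matrix at a frequency divided by the sum of its eigenvalues, on purely synthetic multichannel data (sampling rate, frequency bin, and data sizes are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 50, 32, 256
fs = 256.0                                           # assumed sampling rate (Hz)
X = rng.standard_normal((n_trials, n_channels, n_times))   # synthetic MEG-like epochs

# Cross-spectral matrix at a single frequency bin, averaged over trials
F = np.fft.rfft(X, axis=-1)                          # trials x channels x frequencies
freqs = np.fft.rfftfreq(n_times, d=1.0 / fs)
fi = int(np.argmin(np.abs(freqs - 10.0)))            # ~10 Hz bin, purely illustrative
Xf = F[:, :, fi]                                     # trials x channels
S = np.einsum('tc,td->cd', Xf, Xf.conj()) / n_trials # channels x channels, Hermitian

# Global coherence: fraction of cross-spectral power along the leading eigenvector
evals = np.linalg.eigvalsh(S)                        # real, ascending, for a Hermitian matrix
global_coherence = evals[-1] / evals.sum()
print(global_coherence)
```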


Linear dimensionality reduction
 Best k-dimensional subspace for projection depends on the task
– Unsupervised: retain as much data variance as possible
Example: principal component analysis (PCA)
– Classification: maximize separation among classes
Example: linear discriminant analysis (LDA)
– Regression: maximize correlation between projected data and response variable
Example: partial least squares (PLS)
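Before moving on to LDA, a minimal scikit-learn sketch (mine, not from the slides) contrasting the three projections above on the Iris dataset; the class and parameter names are standard scikit-learn, while the choice of data and n_components is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cross_decomposition import PLSRegression

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes

# Unsupervised: directions of maximum variance
Z_pca = PCA(n_components=2).fit_transform(X)

# Classification: directions that maximize class separation (at most n_classes - 1 of them)
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Regression: directions chosen for their covariance with the response (treated as numeric here)
Z_pls = PLSRegression(n_components=2).fit(X, y.astype(float)).transform(X)

print(Z_pca.shape, Z_lda.shape, Z_pls.shape)   # each is (150, 2)
```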
LDA for two classes
