The document discusses dimensionality reduction techniques in advanced data mining, focusing on linear methods like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), as well as nonlinear methods such as Locally Linear Embedding (LLE). It explains the concepts of parametric and nonparametric learning, comparing their effectiveness based on data distribution and sample size. Additionally, it introduces the concept of manifolds and their role in modeling complex data distributions.


CIS 530—Advanced Data Mining

9 - Dimensionality Reduction
Computer and Information Science
University of Massachusetts Dartmouth
Attribute Dimensions and Orders

• Dimensions
  • 1D: scalar
  • 2D: two-dimensional vector
  • 3D: three-dimensional vector
  • >3D: multi-dimensional vector
• Orders
  • scalars
  • vectors (1st order)
  • matrices (2nd order)
  • tensors (higher order)
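
As a quick illustration (not from the original slides), these orders map directly onto NumPy array ranks:

```python
import numpy as np

# Orders illustrated as NumPy array ranks (ndim).
scalar = np.float64(3.0)                     # 0th order: a single value
vector = np.array([1.0, 2.0, 3.0])           # 1st order: 1-D array
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])              # 2nd order: 2-D array
tensor = np.zeros((2, 3, 4))                 # higher order: 3-D array

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```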
Bivariate Data Representations

Courtesy of Prof. Hanspeter Pfister, Harvard University.


Original figures were from the slides of Stasko
Trivariate Data Representations

Courtesy of Prof. Hanspeter Pfister, Harvard University.


Original figures were from the slides of Stasko
Multi-Dimensional Data

Courtesy of Prof. Hanspeter Pfister, Harvard University.


Original figures were from the slides of Stasko
What if the dimension of the data is 4, 5, 6, or even more?
Dimensionality Reduction
• Linear Methods
  • Principal Component Analysis (PCA), M.A. Turk & A.P. Pentland
  • Linear Discriminant Analysis (LDA), R. Fisher
• Nonlinear Methods
  • Locally Linear Embedding (LLE), S.T. Roweis & L.K. Saul
Parametric vs. Nonparametric Learning

• Parametric Model
  • Uses a parameterized family of probability distributions to describe the nature of a set of data (Moghaddam & Pentland, 1997).
  • The data distribution is empirically assumed or estimated.
  • Learning is conducted by estimating a set of fixed parameters, such as the mean and variance.
  • Effective for large samples, but degrades for complicated data distributions.

• Nonparametric Model
  • Distribution-free.
  • Learning is conducted by measuring the pair-wise data relationships in both global and local manners.
  • Effective and robust due to the reliance on fewer assumptions and parameters.
  • Works for cases with small samples, high dimensionality, and complicated data distributions.
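
A minimal sketch (not from the slides) contrasting the two models: a parametric Gaussian fit estimates only a mean and variance, while a nonparametric kernel density estimate adapts to the shape of the sample. The bimodal toy data and the choice of SciPy's gaussian_kde are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
# Bimodal toy data: a single Gaussian is a poor parametric assumption here.
data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])

# Parametric model: learn a fixed set of parameters (mean, variance).
mu, sigma = data.mean(), data.std()

# Nonparametric model: distribution-free kernel density estimate.
kde = gaussian_kde(data)

x = 0.0  # a point between the two modes, where the true density is low
print("parametric density at 0:   ", norm.pdf(x, loc=mu, scale=sigma))
print("nonparametric density at 0:", kde([x])[0])
```

The single Gaussian places substantial probability mass between the two modes, while the kernel estimate correctly reports a low density there, illustrating how the parametric assumption degrades for complicated distributions.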
Linear Models
• Two representative models
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA)
• PCA tries to capture the "principal" variations in the data.
• It is computed by finding the eigenvectors of the covariance matrix of the data.
• Geometrically, PCA finds the directions of largest variation in the underlying data.
• LDA, on the other hand, uses the label information: it maximizes the distance between classes and minimizes the distance within each class.
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
Principal Component Analysis
• Two views:
  • Variance: maximize the data variance in the lower-dimensional space.
  • Reconstruction: find the projections that minimize the reconstruction error.
• Assuming zero mean, the projection line is represented as x = s·w, where w is the basis vector, s.t. ||w|| = 1.

Figure: face reconstruction using different numbers of eigenvectors.
PCA: View 1
• Given a dataset X = {x_1, ..., x_n}, we first centralize the dataset by subtracting the mean: x_i ← x_i − μ, where μ = (1/n) Σ_i x_i.
• We want to find a low-dimensional space, spanned by a unit vector w, such that the variance of the data in this new space is maximized.
• Let y_i = w^T x_i be the new representation in this space; then we should maximize the following:
  J(w) = (1/n) Σ_i (w^T x_i)^2 = w^T C w,  where C = (1/n) Σ_i x_i x_i^T is the covariance matrix of the centered data.
PCA: View 1

• The maximum of J(w) is affected by the magnitude of the vector w.
• To mitigate this effect, we introduce another constraint: w^T w = 1.
• Using the Lagrange multiplier method, the problem turns into
  L(w, λ) = w^T C w − λ (w^T w − 1),  and setting the gradient to zero gives C w = λ w.
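
A minimal NumPy sketch of this derivation (an illustration, not the course's code): center the data, form the covariance matrix C, and take the eigenvectors of C with the largest eigenvalues as the principal directions.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n_samples x n_features) onto the top-k principal directions."""
    # Centralize the dataset: subtract the mean of each feature.
    Xc = X - X.mean(axis=0)
    # Covariance matrix C = (1/n) * Xc^T Xc.
    C = Xc.T @ Xc / Xc.shape[0]
    # Solve C w = lambda w; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # columns are unit-norm directions w
    return Xc @ W, W                               # projected data and the basis

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y, W = pca(X, 2)
print(Y.shape, W.shape)  # (200, 2) (5, 2)
```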


Eigenvectors
• For a square matrix A, if there exists a nonzero vector v such that A v = λ v, then
• v is an eigenvector and λ is the eigenvalue associated with this eigenvector.
• For an eigenvector v, the transform A is just a scaling function.
• Example (figure): a matrix, its computed eigenvalues, and the corresponding eigenvectors.

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


http://en.wikipedia.org/wiki/Eigenvalue,_eigenvector_and_eigenspace
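
To make the definition concrete, a small numeric check (illustrative; the matrix is arbitrary) that A v = λ v holds for each eigenpair returned by NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # an arbitrary symmetric 2x2 matrix
eigvals, eigvecs = np.linalg.eig(A)     # columns of eigvecs are the eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    # Applying A to an eigenvector only scales it by the eigenvalue lambda.
    print(lam, np.allclose(A @ v, lam * v))   # 3.0 True, then 1.0 True
```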
PCA: View 2
• Preliminary:
  • A subspace is represented by an orthogonal basis W = [w_1, ..., w_k] of this space, with W^T W = I.
  • Projection – from high- to low-dimensional space: y = W^T x.
  • Reconstruction – projecting back: x̂ = W y = W W^T x.
• Objective:
  • Find a subspace spanned by W that minimizes the reconstruction error Σ_i ||x_i − W W^T x_i||^2.
  • We add an additional constraint, W^T W = I, to make this problem tractable.

Figure: face reconstruction using different numbers of eigenvectors.
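
A short sketch (illustrative) of the projection and reconstruction maps for an orthonormal basis W, here taken as the top-3 eigenvectors of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
Xc = X - X.mean(axis=0)                          # zero-mean data

# Orthonormal basis W of a 3-D subspace: top-3 eigenvectors of the covariance.
C = Xc.T @ Xc / Xc.shape[0]
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs[:, np.argsort(eigvals)[::-1][:3]]    # W^T W = I

Y = Xc @ W        # projection:     y = W^T x (applied row-wise)
X_hat = Y @ W.T   # reconstruction: x_hat = W W^T x

print("reconstruction error:", np.sum((Xc - X_hat) ** 2))
```

PCA chooses W to make this reconstruction error as small as possible among all orthonormal bases of the same dimension, which is the "View 2" objective above.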
Principal Component Analysis

Figure: eigenvectors visualized as eigenfaces.
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
Linear Discriminant Analysis

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Linear Discriminant Analysis

• Unlike PCA, LDA finds the discriminant subspace by including class label information in the subspace modeling (supervised learning).
  – Compute the within-class scatter.
  – Compute the between-class scatter.
  – Maximize the between-class scatter and minimize the within-class scatter.

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


LDA Problem Definition
• In LDA, we want to find a projection w that achieves two goals in one shot:
  • Make samples from the same class compact.
  • Make samples from different classes far apart.
• Assume μ_i is the center of class i; w is the projection to be optimized.
• We hope that in the projected 1-D space, each sample w^T x_j stays close to its own class center w^T μ_i, while different class centers stay far apart.


Figure: LDA projection of samples x_j and class centers μ_i onto w.
Within-Class Scatter Matrix
• For all samples from class i, we add up their scatter around the class center:
  S_i = Σ_{x_j ∈ class i} (x_j − μ_i)(x_j − μ_i)^T   (the within-class scatter matrix of class i).
• If we have more than one class (most likely!), the within-class scatter matrix is
  S_w = Σ_i S_i.
Between-Class Scatter Matrix
• Only two classes:
  (w^T (μ_1 − μ_2))^2 = w^T (μ_1 − μ_2)(μ_1 − μ_2)^T w = w^T S_b w,
  where S_b = (μ_1 − μ_2)(μ_1 − μ_2)^T is the between-class scatter matrix.
• More than two classes: S_b sums the scatter of each class mean around the overall mean, S_b = Σ_i n_i (μ_i − μ)(μ_i − μ)^T.
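
A tiny numeric check (illustrative; the vectors are random) of the two-class identity (w^T(μ_1 − μ_2))^2 = w^T S_b w:

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, mu2, w = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

d = mu1 - mu2
Sb = np.outer(d, d)            # between-class scatter for the two-class case

lhs = (w @ d) ** 2             # (w^T (mu1 - mu2))^2
rhs = w @ Sb @ w               # w^T S_b w
print(np.isclose(lhs, rhs))    # True
```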
Learning Objective
• To achieve the two goals, we maximize the Fisher criterion
  J(w) = (w^T S_b w) / (w^T S_w w).
• This is equivalent to the following problem:
  max_w w^T S_b w  subject to  w^T S_w w = 1.
• Again, using the Lagrange multiplier method, we have
  S_b w = λ S_w w,  i.e.,  S_w^{-1} S_b w = λ w.
• This is a typical eigen-decomposition problem.
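
A compact NumPy sketch of the whole LDA pipeline (an illustration under the standard formulation above, not the course's code): build S_w and S_b from labeled data, then solve S_w^{-1} S_b w = λ w.

```python
import numpy as np

def lda(X, y, k):
    """Return the top-k LDA projection directions for data X (n x d) and integer labels y."""
    d = X.shape[1]
    mu = X.mean(axis=0)                      # overall mean
    Sw = np.zeros((d, d))                    # within-class scatter
    Sb = np.zeros((d, d))                    # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Generalized eigenproblem S_b w = lambda S_w w  <=>  eig(S_w^{-1} S_b).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
print(lda(X, y, 1).shape)  # (4, 1)
```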


Different Subspace Base Vectors
• Different subspace base vectors correspond to different projection directions.
• Each subspace base vector forms a Fisherface.
PCA vs. LDA

Figure panels: PCA, LDA.
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
PCA vs. LDA
• PCA performs worse under this condition.
• LDA (FLD, Fisher Linear Discriminant) provides a better low-dimensional representation.

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


When LDA Fails
• LDA fails in the case shown on the right (w is the projection direction). Think about why…

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Manifold
• "A manifold is an abstract mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated." ---from Wikipedia
• "A manifold is a topological space that is locally Euclidean." ---from MathWorld
• e.g., a 2D map of the 3D Earth is a manifold.
• A manifold can be obtained by projecting the original data to a low-dimensional representation via subspace learning.
• Manifold criteria can provide more effective ways to model the data distribution than conventional learning methods based on the Gaussian distribution.
Manifold

http://en.wikipedia.org/wiki/Manifold
Manifold Learning
Figure: the Swiss roll dataset, unrolled by dimensionality reduction.

Courtesy of Sam T. Roweis and Lawrence K. Saul, Science 2000.


Locally Linear Embedding

http://www.cs.toronto.edu/~roweis/lle/
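
For a hands-on reference, a minimal Swiss-roll example (illustrative; it uses scikit-learn's implementation rather than the authors' original code):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3-D Swiss roll: the points lie on a 2-D manifold embedded in 3-D space.
X, _ = make_swiss_roll(n_samples=1500, random_state=0)

# LLE unrolls the manifold by preserving each point's local linear
# reconstruction weights from its nearest neighbors.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)

print(X.shape, "->", Y.shape)  # (1500, 3) -> (1500, 2)
```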
LEA for Pose Manifold

Yun Fu et al., "Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval," CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Expression Manifold

Yun Fu et al., "Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval," CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
