
Data Mining and Machine Learning:

Fundamental Concepts and Algorithms


dataminingbook.info

Mohammed J. Zaki
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

Wagner Meira Jr.
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 7: Dimensionality Reduction

Dimensionality Reduction
The goal of dimensionality reduction is to find a lower dimensional representation
of the data matrix D to avoid the curse of dimensionality.
Given an n × d data matrix D, each point x_i = (x_{i1}, x_{i2}, ..., x_{id})^T is a vector in the
ambient d-dimensional vector space spanned by the d standard basis vectors
e_1, e_2, ..., e_d.
Given any other set of d orthonormal vectors u_1, u_2, ..., u_d, we can re-express each
point x as

    x = a_1 u_1 + a_2 u_2 + ... + a_d u_d

where a = (a_1, a_2, ..., a_d)^T represents the coordinates of x in the new basis. More
compactly:

    x = U a

where U is the d × d orthogonal matrix whose ith column comprises the ith basis
vector u_i. Thus U^{-1} = U^T, and we have

    a = U^T x
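As an illustration (not from the original slides), a minimal NumPy sketch of this change of basis; the orthonormal basis U here is just an example obtained from a QR decomposition:

import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.normal(size=d)                    # a point expressed in the standard basis

# Any d orthonormal vectors work; here U comes from a QR decomposition (example only).
U, _ = np.linalg.qr(rng.normal(size=(d, d)))

a = U.T @ x                               # coordinates in the new basis: a = U^T x
assert np.allclose(U @ a, x)              # reconstruct x = U a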
Optimal Basis: Projection in Lower Dimensional Space
There are infinitely many choices for the orthonormal basis vectors. Our goal
is to choose an optimal basis that preserves essential information about D.
We are interested in finding the optimal r-dimensional representation of D, with
r ≪ d. The projection of x onto the first r basis vectors is given as

    x' = a_1 u_1 + a_2 u_2 + ... + a_r u_r = \sum_{i=1}^{r} a_i u_i = U_r a_r

where U_r and a_r comprise the r basis vectors and coordinates, respectively. Also,
restricting a = U^T x to the first r terms, we have

    a_r = U_r^T x

The r-dimensional projection of x is thus given as

    x' = U_r U_r^T x = P_r x

where P_r = U_r U_r^T = \sum_{i=1}^{r} u_i u_i^T is the orthogonal projection matrix for the
subspace spanned by the first r basis vectors.
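The following NumPy sketch (not from the original slides) illustrates the reduced coordinates a_r and the projection matrix P_r; the orthonormal basis is again just a random example:

import numpy as np

rng = np.random.default_rng(1)
d, r = 4, 2
x = rng.normal(size=d)

U, _ = np.linalg.qr(rng.normal(size=(d, d)))   # example orthonormal basis
U_r = U[:, :r]                                 # first r basis vectors

a_r = U_r.T @ x                                # reduced coordinates: a_r = U_r^T x
P_r = U_r @ U_r.T                              # orthogonal projection matrix P_r = U_r U_r^T
x_proj = P_r @ x                               # r-dimensional projection of x

assert np.allclose(x_proj, U_r @ a_r)          # x' = U_r a_r = P_r x
assert np.allclose(P_r @ P_r, P_r)             # P_r is idempotent, as a projector must be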
Optimal Basis: Error Vector

Given the projected vector x' = P_r x, the corresponding error vector is the
projection onto the remaining d − r basis vectors:

    ε = \sum_{i=r+1}^{d} a_i u_i = x − x'

The error vector ε is orthogonal to x'.

The goal of dimensionality reduction is to seek an r-dimensional basis that gives
the best possible approximations x'_i over all the points x_i ∈ D. Alternatively, we
seek to minimize the error ε_i = x_i − x'_i over all the points.
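A short numerical check (an illustration, not from the slides) that the error vector is orthogonal to the projection:

import numpy as np

rng = np.random.default_rng(2)
d, r = 5, 2
x = rng.normal(size=d)

U, _ = np.linalg.qr(rng.normal(size=(d, d)))   # example orthonormal basis
P_r = U[:, :r] @ U[:, :r].T                    # projector onto the first r basis vectors

x_proj = P_r @ x                               # x'
eps = x - x_proj                               # error vector, lies in the remaining d - r directions
assert np.isclose(eps @ x_proj, 0.0)           # the error is orthogonal to x'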

Iris Data: Optimal One-dimensional Basis

(Figure: Iris data (3D) with the optimal 1D basis vector u_1; axes X1, X2, X3.)
Iris Data: Optimal 2D Basis

(Figure: Iris data (3D) with the optimal 2D basis spanned by u_1 and u_2; axes X1, X2, X3.)
Principal Component Analysis

Principal Component Analysis (PCA) is a technique that seeks an r-dimensional
basis that best captures the variance in the data.
The direction with the largest projected variance is called the first principal
component.
The orthogonal direction that captures the second largest projected variance is
called the second principal component, and so on.
The direction that maximizes the variance is also the one that minimizes the mean
squared error.

Principal Component: Direction of Most Variance
We seek to find the unit vector u that maximizes the projected variance of the
points. Let D be centered, and let Σ be its covariance matrix.
The projection of x_i on u is given as

    x'_i = ((u^T x_i) / (u^T u)) u = (u^T x_i) u = a_i u

Across all the points, the projected variance along u is

    σ_u^2 = (1/n) \sum_{i=1}^{n} (a_i − μ_u)^2 = (1/n) \sum_{i=1}^{n} u^T x_i x_i^T u = u^T ( (1/n) \sum_{i=1}^{n} x_i x_i^T ) u = u^T Σ u

since D is centered and thus the projected mean μ_u = 0.

We have to find the optimal basis vector u that maximizes the projected variance
σ_u^2 = u^T Σ u, subject to the constraint that u^T u = 1. The maximization objective
is given as

    max_u J(u) = u^T Σ u − α(u^T u − 1)

Principal Component: Direction of Most Variance
Given the objective max_u J(u) = u^T Σ u − α(u^T u − 1), we solve it by setting the
derivative of J(u) with respect to u to the zero vector, to obtain

    ∂/∂u ( u^T Σ u − α(u^T u − 1) ) = 0
    that is, 2 Σ u − 2 α u = 0
    which implies Σ u = α u

Thus α is an eigenvalue of the covariance matrix Σ, with the associated
eigenvector u.
Taking the dot product with u on both sides, we have

    σ_u^2 = u^T Σ u = u^T α u = α u^T u = α

To maximize the projected variance σ_u^2, we thus choose the largest eigenvalue λ_1
of Σ, and the dominant eigenvector u_1 specifies the direction of most variance,
also called the first principal component.
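The derivation above suggests the following NumPy sketch (not from the slides) for the first principal component, using synthetic data for illustration:

import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.3])   # synthetic, anisotropic data

Z = D - D.mean(axis=0)                  # center the data
Sigma = (Z.T @ Z) / Z.shape[0]          # covariance matrix

lam, U = np.linalg.eigh(Sigma)          # eigh returns ascending eigenvalues for symmetric matrices
u1, lam1 = U[:, -1], lam[-1]            # dominant eigenvector and largest eigenvalue

a = Z @ u1                              # projected coordinates along u1
assert np.isclose(a.var(), lam1)        # projected variance equals lambda_1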
Iris Data: First Principal Component

(Figure: Iris data (3D) with the first principal component direction u_1.)
Minimum Squared Error Approach
The direction that maximizes the projected variance is also the one that minimizes
the average squared error. The mean squared error (MSE) optimization condition
is

    MSE(u) = (1/n) \sum_{i=1}^{n} ||ε_i||^2 = (1/n) \sum_{i=1}^{n} ||x_i − x'_i||^2 = \sum_{i=1}^{n} ||x_i||^2 / n − u^T Σ u

Since the first term is fixed for a dataset D, we see that the direction u_1 that
maximizes the variance is also the one that minimizes the MSE. Further,

    \sum_{i=1}^{n} ||x_i||^2 / n = var(D) = tr(Σ) = \sum_{i=1}^{d} σ_i^2

so that MSE(u) = var(D) − u^T Σ u. Thus, for the direction u_1 that minimizes MSE, we have

    MSE(u_1) = var(D) − u_1^T Σ u_1 = var(D) − λ_1
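A quick numerical check of this identity (an illustration, not from the slides), using synthetic centered data:

import numpy as np

rng = np.random.default_rng(4)
Z = rng.normal(size=(200, 4)) * np.array([2.0, 1.0, 0.5, 0.1])
Z = Z - Z.mean(axis=0)                           # centered data

Sigma = (Z.T @ Z) / Z.shape[0]
lam, U = np.linalg.eigh(Sigma)
u1, lam1 = U[:, -1], lam[-1]                     # first principal component

X_proj = Z @ np.outer(u1, u1)                    # x'_i = u1 u1^T x_i for every point
mse = np.mean(np.sum((Z - X_proj) ** 2, axis=1)) # (1/n) sum ||x_i - x'_i||^2
var_D = np.trace(Sigma)                          # var(D) = tr(Sigma)

assert np.isclose(mse, var_D - lam1)             # MSE(u1) = var(D) - lambda_1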

Best 2-dimensional Approximation
The best 2D subspace that captures the most variance in D comprises the
eigenvectors u_1 and u_2 corresponding to the largest and second largest
eigenvalues λ_1 and λ_2, respectively.

Let U_2 = (u_1  u_2) be the matrix whose columns correspond to the two principal
components. Given a point x_i ∈ D, its projected coordinates are computed as
follows:

    a_i = U_2^T x_i

Let A denote the projected 2D dataset. The total projected variance for A is
given as

    var(A) = u_1^T Σ u_1 + u_2^T Σ u_2 = u_1^T λ_1 u_1 + u_2^T λ_2 u_2 = λ_1 + λ_2

The first two principal components also minimize the mean squared error objective,
since

    MSE = (1/n) \sum_{i=1}^{n} ||x_i − x'_i||^2 = var(D) − (1/n) \sum_{i=1}^{n} x_i^T P_2 x_i = var(D) − var(A)

Optimal and Non-optimal 2D Approximations
The optimal subspace maximizes the variance and minimizes the squared error,
whereas the non-optimal subspace captures less variance and has a higher mean
squared error, as seen from the lengths of the error vectors (line segments).

(Figure: Iris data (3D) with an optimal and a non-optimal 2D subspace; the error vectors are shown as line segments.)
Best r -dimensional Approximation
To find the best r-dimensional approximation to D, we compute the eigenvalues of Σ.
Because Σ is positive semidefinite, its eigenvalues are non-negative:

    λ_1 ≥ λ_2 ≥ ... ≥ λ_r ≥ λ_{r+1} ≥ ... ≥ λ_d ≥ 0

We select the r largest eigenvalues, and their corresponding eigenvectors, to form the
best r-dimensional approximation.

Total Projected Variance: Let U_r = (u_1 ... u_r) be the r-dimensional basis vector
matrix, with the projection matrix given as P_r = U_r U_r^T = \sum_{i=1}^{r} u_i u_i^T.

Let A denote the dataset formed by the coordinates of the projected points in the
r-dimensional subspace. The projected variance is given as

    var(A) = (1/n) \sum_{i=1}^{n} x_i^T P_r x_i = \sum_{i=1}^{r} u_i^T Σ u_i = \sum_{i=1}^{r} λ_i

Mean Squared Error: The mean squared error in r dimensions is

    MSE = (1/n) \sum_{i=1}^{n} ||x_i − x'_i||^2 = var(D) − \sum_{i=1}^{r} λ_i = \sum_{i=1}^{d} λ_i − \sum_{i=1}^{r} λ_i

Choosing the Dimensionality

One criterion for choosing r is to compute the fraction of the total variance
captured by the first r principal components:

    f(r) = (λ_1 + λ_2 + ... + λ_r) / (λ_1 + λ_2 + ... + λ_d) = \sum_{i=1}^{r} λ_i / \sum_{i=1}^{d} λ_i = \sum_{i=1}^{r} λ_i / var(D)

Given a desired variance threshold, say α, starting from the first principal
component we keep adding components and stop at the smallest value r for which
f(r) ≥ α. In other words, we select the fewest dimensions such that the subspace
spanned by those r dimensions captures at least an α fraction (say 0.9) of the
total variance.
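In code, this amounts to a cumulative sum over the sorted eigenvalues. A small sketch (not from the slides; the function name is ours):

import numpy as np

def choose_dimensionality(eigenvalues, alpha=0.9):
    """Smallest r such that the first r eigenvalues capture at least an alpha fraction of the variance."""
    lam = np.sort(np.asarray(eigenvalues))[::-1]   # decreasing order
    f = np.cumsum(lam) / lam.sum()                 # f(r) for r = 1, ..., d
    r = int(np.searchsorted(f, alpha) + 1)         # first r with f(r) >= alpha
    return min(r, len(lam))

# Using the Iris eigenvalues quoted later in these slides:
print(choose_dimensionality([3.662, 0.239, 0.059], alpha=0.95))   # prints 2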

Principal Component Analysis: Algorithm

PCA (D, α):
1  μ = (1/n) \sum_{i=1}^{n} x_i // compute mean
2  Z = D − 1 · μ^T // center the data
3  Σ = (1/n) Z^T Z // compute covariance matrix
4  (λ_1, λ_2, ..., λ_d) = eigenvalues(Σ) // compute eigenvalues
5  U = (u_1  u_2  ...  u_d) = eigenvectors(Σ) // compute eigenvectors
6  f(r) = \sum_{i=1}^{r} λ_i / \sum_{i=1}^{d} λ_i, for all r = 1, 2, ..., d // fraction of total variance
7  Choose smallest r so that f(r) ≥ α // choose dimensionality
8  U_r = (u_1  u_2  ...  u_r) // reduced basis
9  A = {a_i | a_i = U_r^T x_i, for i = 1, ..., n} // reduced dimensionality data
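A minimal NumPy sketch of this pseudocode (an illustration, not the book's implementation):

import numpy as np

def pca(D, alpha):
    n = D.shape[0]
    mu = D.mean(axis=0)                        # step 1: mean
    Z = D - mu                                 # step 2: center the data
    Sigma = (Z.T @ Z) / n                      # step 3: covariance matrix
    lam, U = np.linalg.eigh(Sigma)             # steps 4-5: eigenvalues/eigenvectors (ascending)
    lam, U = lam[::-1], U[:, ::-1]             # reorder to decreasing eigenvalues
    f = np.cumsum(lam) / lam.sum()             # step 6: fraction of total variance
    r = int(np.searchsorted(f, alpha) + 1)     # step 7: smallest r with f(r) >= alpha
    U_r = U[:, :r]                             # step 8: reduced basis
    A = Z @ U_r                                # step 9: coordinates a_i = U_r^T x_i (centered points)
    return A, U_r, lam[:r]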

Iris Principal Components
Covariance matrix:

    Σ = [  0.681  −0.039   1.265
          −0.039   0.187  −0.320
           1.265  −0.320   3.092 ]

The eigenvalues and eigenvectors of Σ are

    λ_1 = 3.662    λ_2 = 0.239    λ_3 = 0.059

    u_1 = (−0.390, 0.089, −0.916)^T    u_2 = (−0.639, −0.742, 0.200)^T    u_3 = (−0.663, 0.664, 0.346)^T

The total variance is therefore λ_1 + λ_2 + λ_3 = 3.662 + 0.239 + 0.059 = 3.96.
The fraction of total variance for different values of r is given as

    r      1      2      3
    f(r)   0.925  0.985  1.0

Thus, r = 2 principal components are needed to capture an α = 0.95 fraction of the total variance.
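As a quick sanity check (not from the slides), diagonalizing the covariance matrix printed above with NumPy reproduces these values:

import numpy as np

Sigma = np.array([[ 0.681, -0.039,  1.265],
                  [-0.039,  0.187, -0.320],
                  [ 1.265, -0.320,  3.092]])

lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]                   # decreasing order
print(np.round(lam, 3))                          # approx. [3.662 0.239 0.059]
print(np.round(np.cumsum(lam) / lam.sum(), 3))   # approx. [0.925 0.985 1.   ]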


Iris Data: Optimal 3D PC Basis

(Figure: Iris data (3D) with the optimal 3D PC basis u_1, u_2, u_3.)
Iris Principal Components: Projected Data (2D)

(Figure: the Iris data projected onto the first two principal components, u_1 and u_2.)
Geometry of PCA
Geometrically, when r = d, PCA corresponds to an orthogonal change of basis, so that the
total variance is captured by the sum of the variances along each of the principal
directions u_1, u_2, ..., u_d, and further, all covariances are zero.
Let U be the d × d orthogonal matrix U = (u_1  u_2  ...  u_d), with U^{-1} = U^T. Let
Λ = diag(λ_1, ..., λ_d) be the diagonal matrix of eigenvalues. Each principal component u_i
corresponds to an eigenvector of the covariance matrix Σ:

    Σ u_i = λ_i u_i   for all 1 ≤ i ≤ d

which can be written compactly in matrix notation:

    Σ U = U Λ, which implies Σ = U Λ U^T

Thus, Λ represents the covariance matrix in the new PC basis.

In the new PC basis, the equation

    x^T Σ^{-1} x = 1

defines a d-dimensional ellipsoid (or hyper-ellipse). The eigenvectors u_i of Σ, that is, the
principal components, are the directions of the principal axes of the ellipsoid. The
square roots of the eigenvalues, √λ_i, give the lengths of the semi-axes.
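A small NumPy sketch (not from the slides) verifying the diagonalization Σ = U Λ U^T and that the covariances vanish in the PC basis, on synthetic data:

import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))   # synthetic correlated data
Z = Z - Z.mean(axis=0)

Sigma = (Z.T @ Z) / Z.shape[0]
lam, U = np.linalg.eigh(Sigma)
Lambda = np.diag(lam)

assert np.allclose(Sigma, U @ Lambda @ U.T)      # Sigma = U Lambda U^T

A = Z @ U                                        # data expressed in the PC basis
Sigma_pc = (A.T @ A) / A.shape[0]                # covariance in the new basis
assert np.allclose(Sigma_pc, Lambda)             # diagonal: all covariances are zero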
Iris: Elliptic Contours in Standard Basis

(Figure: the Iris data with elliptic contours in the standard basis; the principal axes are u_1, u_2, u_3.)
Iris: Axis-Parallel Ellipsoid in PC Basis

(Figure: in the PC basis u_1, u_2, u_3, the same ellipsoid becomes axis-parallel.)
Kernel Principal Component Analysis

Principal component analysis can be extended to find nonlinear “directions” in the
data using kernel methods. Kernel PCA finds the directions of most variance in
the feature space instead of the input space. Using the kernel trick, all PCA
operations can be carried out in terms of the kernel function in input space,
without having to transform the data into feature space.

Let φ be a function that maps a point x_i in input space to its image φ(x_i) in
feature space. Let the points in feature space be centered, and let Σ_φ be the
covariance matrix. The first PC in feature space corresponds to the dominant
eigenvector:

    Σ_φ u_1 = λ_1 u_1

where

    Σ_φ = (1/n) \sum_{i=1}^{n} φ(x_i) φ(x_i)^T

Kernel Principal Component Analysis

It can be shown that u_1 = \sum_{i=1}^{n} c_i φ(x_i). That is, the PC direction in feature space
is a linear combination of the transformed points.
The coefficients are captured in the weight vector

    c = (c_1, c_2, ..., c_n)^T

Substituting into the eigen-decomposition of Σ_φ and simplifying, we get

    K c = n λ_1 c = η_1 c

Thus, the weight vector c is the eigenvector corresponding to the largest
eigenvalue η_1 of the kernel matrix K.

Kernel Principal Component Analysis
The weight vector c can then be used to find u_1 via u_1 = \sum_{i=1}^{n} c_i φ(x_i).

The only constraint we impose is that u_1 should be normalized to be a unit vector, which
implies ||c||^2 = 1/η_1.
We cannot compute the principal direction u_1 directly, but we can project any point φ(x)
onto it as follows:

    u_1^T φ(x) = \sum_{i=1}^{n} c_i φ(x_i)^T φ(x) = \sum_{i=1}^{n} c_i K(x_i, x)

which requires only kernel operations.

We can obtain the additional principal components by solving for the other eigenvalues
and eigenvectors of K:

    K c_j = n λ_j c_j = η_j c_j

If we sort the eigenvalues of K in decreasing order, η_1 ≥ η_2 ≥ ... ≥ η_n ≥ 0, we can obtain
the jth principal component from the corresponding eigenvector c_j. The variance along the
jth principal component is given as λ_j = η_j / n.

Kernel PCA Algorithm

KernelPCA (D, K, α):
1  K = { K(x_i, x_j) }_{i,j=1,...,n} // compute n × n kernel matrix
2  K = (I − (1/n) 1_{n×n}) K (I − (1/n) 1_{n×n}) // center the kernel matrix
3  (η_1, η_2, ..., η_d) = eigenvalues(K) // compute eigenvalues
4  (c_1  c_2  ...  c_n) = eigenvectors(K) // compute eigenvectors
5  λ_i = η_i / n for all i = 1, ..., n // compute variance for each component
6  c_i = √(1/η_i) · c_i for all i = 1, ..., n // ensure that u_i^T u_i = 1
7  f(r) = \sum_{i=1}^{r} λ_i / \sum_{i=1}^{d} λ_i, for all r = 1, 2, ..., d // fraction of total variance
8  Choose smallest r so that f(r) ≥ α // choose dimensionality
9  C_r = (c_1  c_2  ...  c_r) // reduced basis
10 A = {a_i | a_i = C_r^T K_i, for i = 1, ..., n} // reduced dimensionality data
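A minimal NumPy sketch of this pseudocode (an illustration, not the book's implementation), using the homogeneous quadratic kernel that appears on the following slides:

import numpy as np

def kernel_pca(D, kernel, alpha):
    n = D.shape[0]
    K = np.array([[kernel(xi, xj) for xj in D] for xi in D])   # step 1: kernel matrix
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ K @ J                                  # step 2: center the kernel matrix
    eta, C = np.linalg.eigh(K)                     # steps 3-4: eigenvalues/eigenvectors
    eta, C = eta[::-1], C[:, ::-1]                 # decreasing order
    keep = eta > 1e-9                              # drop numerically zero/negative eigenvalues
    eta, C = eta[keep], C[:, keep]
    lam = eta / n                                  # step 5: variance per component
    C = C / np.sqrt(eta)                           # step 6: rescale so that u_i^T u_i = 1
    f = np.cumsum(lam) / lam.sum()                 # step 7: fraction of total variance
    r = int(np.searchsorted(f, alpha) + 1)         # step 8: smallest r with f(r) >= alpha
    C_r = C[:, :r]                                 # step 9: reduced basis
    A = K @ C_r                                    # step 10: a_i = C_r^T K_i for each point
    return A, lam[:r]

# Example with the homogeneous quadratic kernel K(x_i, x_j) = (x_i^T x_j)^2:
rng = np.random.default_rng(6)
D = rng.normal(size=(30, 2))
A, lam = kernel_pca(D, lambda x, y: (x @ y) ** 2, alpha=0.95)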

Nonlinear Iris Data: PCA in Input Space

(Figure: a nonlinear Iris dataset in two dimensions (axes X1, X2), shown with the linear PCA directions u_1 and u_2 found in input space.)
Nonlinear Iris Data: Projection onto PCs

(Figure: the nonlinear Iris data projected onto the principal components u_1 and u_2 in input space.)
Kernel PCA: 3 PCs (Contours of Constant Projection)
Homogeneous quadratic kernel: K(x_i, x_j) = (x_i^T x_j)^2

(Figure: contours of constant projection onto the first three kernel principal components: (a) λ_1 = 0.2067, (b) λ_2 = 0.0596, (c) λ_3 = 0.0184.)
Kernel PCA: Projected Points onto 2 PCs
Homogeneous quadratic kernel: K(x_i, x_j) = (x_i^T x_j)^2

(Figure: the points projected onto the first two kernel principal components, u_1 and u_2.)
Singular Value Decomposition

Principal components analysis is a special case of a more general matrix decomposition
method called Singular Value Decomposition (SVD). PCA yields the following
decomposition of the covariance matrix:

    Σ = U Λ U^T

where the covariance matrix has been factorized into the orthogonal matrix U containing
its eigenvectors, and a diagonal matrix Λ containing its eigenvalues (sorted in decreasing
order).
SVD generalizes the above factorization to any matrix. In particular, for an n × d data
matrix D with n points and d columns, SVD factorizes D as follows:

    D = L Δ R^T

where L is an orthogonal n × n matrix, R is an orthogonal d × d matrix, and Δ is an n × d
“diagonal” matrix, defined as Δ(i, i) = δ_i, and 0 otherwise. The columns of L are called
the left singular vectors and the columns of R (or rows of R^T) are called the right singular
vectors. The entries δ_i are called the singular values of D, and they are all non-negative.
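For concreteness (not from the slides), NumPy computes this factorization directly:

import numpy as np

rng = np.random.default_rng(7)
D = rng.normal(size=(8, 3))                            # an n x d matrix, n = 8, d = 3

L, delta, Rt = np.linalg.svd(D, full_matrices=True)    # D = L Delta R^T; delta holds the singular values
Delta = np.zeros(D.shape)
Delta[:len(delta), :len(delta)] = np.diag(delta)       # n x d "diagonal" matrix with Delta(i, i) = delta_i

assert np.allclose(D, L @ Delta @ Rt)
assert np.all(delta >= 0) and np.all(np.diff(delta) <= 0)   # non-negative, decreasing singular values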

Reduced SVD

If the rank of D is r ≤ min(n, d), then there are only r nonzero singular values, ordered
as follows: δ_1 ≥ δ_2 ≥ ... ≥ δ_r > 0.
We discard the left and right singular vectors that correspond to zero singular values, to
obtain the reduced SVD:

    D = L_r Δ_r R_r^T

where L_r is the n × r matrix of the left singular vectors, R_r is the d × r matrix of the
right singular vectors, and Δ_r is the r × r diagonal matrix containing the positive singular
values.
The reduced SVD leads directly to the spectral decomposition of D, given as

    D = \sum_{i=1}^{r} δ_i l_i r_i^T

The best rank-q approximation to the original data D is the matrix D_q = \sum_{i=1}^{q} δ_i l_i r_i^T
that minimizes the expression ||D − D_q||_F, where ||A||_F = √( \sum_{i=1}^{n} \sum_{j=1}^{d} A(i, j)^2 ) is called
the Frobenius norm of A.
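A sketch of the rank-q truncation and the Frobenius error it incurs (illustrative code, not the book's; the squared error equals the sum of the discarded squared singular values):

import numpy as np

def best_rank_q(D, q):
    """Rank-q truncated SVD: D_q = sum_{i=1}^{q} delta_i l_i r_i^T."""
    L, delta, Rt = np.linalg.svd(D, full_matrices=False)
    return L[:, :q] @ np.diag(delta[:q]) @ Rt[:q, :]

rng = np.random.default_rng(8)
D = rng.normal(size=(20, 5))
D2 = best_rank_q(D, 2)

delta = np.linalg.svd(D, compute_uv=False)
assert np.isclose(np.linalg.norm(D - D2, 'fro') ** 2, np.sum(delta[2:] ** 2))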

Connection Between SVD and PCA
Assume D has been centered, and let D = L Δ R^T via SVD. Consider the scatter matrix
for D, given as D^T D. We have

    D^T D = (L Δ R^T)^T (L Δ R^T) = R Δ^T L^T L Δ R^T = R (Δ^T Δ) R^T = R Δ_d^2 R^T

where Δ_d^2 is the d × d diagonal matrix defined as Δ_d^2(i, i) = δ_i^2, for i = 1, ..., d.
The covariance matrix of the centered D is given as Σ = (1/n) D^T D, so we get

    D^T D = n Σ = n U Λ U^T = U (n Λ) U^T

The right singular vectors R are therefore the same as the eigenvectors of Σ. The singular
values of D are related to the eigenvalues of Σ as

    n λ_i = δ_i^2, which implies λ_i = δ_i^2 / n, for i = 1, ..., d

Likewise, the left singular vectors in L are the eigenvectors of the n × n matrix D D^T,
and the corresponding eigenvalues are given as δ_i^2.
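A numerical check of this connection (illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(9)
D = rng.normal(size=(50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
Z = D - D.mean(axis=0)                           # center the data first
n = Z.shape[0]

_, delta, Rt = np.linalg.svd(Z, full_matrices=False)
Sigma = (Z.T @ Z) / n
lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # eigenvalues of Sigma, decreasing

assert np.allclose(lam, delta ** 2 / n)          # lambda_i = delta_i^2 / n
for i in range(Z.shape[1]):
    r_i = Rt[i]                                  # ith right singular vector
    assert np.allclose(Sigma @ r_i, (delta[i] ** 2 / n) * r_i)   # r_i is an eigenvector of Sigma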