
Pattern Classification
Lecture 05: Component and Discriminant Analysis

Kundan Kumar
https://github.com/erkundanec/PatternClassification

© 2020 Kundan Kumar, All Rights Reserved



Topics to be covered

- Dimensionality Problem
- Dimensionality/Feature reduction
- Principal component analysis
- Linear discriminant analysis
- Fisher linear discriminant
- Multiple discriminant analysis
- Feature selection


Dimensionality Problem


Introduction

- In practical multicategory applications, it is not unusual to encounter problems involving tens or hundreds of features.
- Intuitively, it may seem that each feature is useful for at least some of the discriminations.
- In general, if the performance obtained with a given set of features is inadequate, it is natural to consider adding new features.
- Even though increasing the number of features increases the complexity of the classifier, this may be acceptable for improved performance.


Introduction

Figure: Two three-dimensional distributions have non-overlapping densities, so the Bayes error vanishes in the (x1, x2, x3) space. There is a non-zero Bayes error in the one-dimensional x1 space or the two-dimensional (x1, x2) space, because the projected densities overlap.

Problems of Dimensionality

- Unfortunately, it has frequently been observed in practice that, beyond a certain point, adding new features leads to worse rather than better performance.
- This is called the curse of dimensionality.
- There are two issues that we must be careful about:
  - How is the classification accuracy affected by the dimensionality (relative to the amount of training data)?
  - How is the complexity of the classifier affected by the dimensionality?


Problems of Dimensionality

- Potential reasons for an increase in error include
  - wrong assumptions in model selection,
  - estimation errors due to the finite number of training samples for high-dimensional observations (overfitting).
- Potential solutions include
  - reducing the dimensionality,
  - simplifying the estimation.


Problems of Dimensionality

- Dimensionality can be reduced by
  - redesigning the features,
  - selecting an appropriate subset among the existing features,
  - combining existing features.
- Estimation can be simplified by
  - assuming equal covariance for all classes (for the Gaussian case),
  - using regularization,
  - using prior information and a Bayes estimate,
  - using heuristics such as conditional independence,
  - etc.

Problem of Dimensionality

- A tenth-degree polynomial fits the given training data perfectly, but we do not expect that a tenth-degree polynomial is required here; a straight line could easily be superior.
- In general, reliable interpolation or extrapolation cannot be obtained unless the solution is overdetermined, i.e., there are more points than function parameters to be set.

Figure: The "training data" (black dots) were selected from a quadratic function plus Gaussian noise, i.e., f(x) = ax^2 + bx + c + ε where p(ε) ∼ N(0, σ^2). The 10th-degree polynomial shown fits the data perfectly, but we desire instead the second-order function f(x), since it would lead to better predictions for new samples.

Problem of Dimensionality

- All of the commonly used classifiers can suffer from the curse of dimensionality.
- While an exact relationship between the probability of error, the number of training samples, the number of features, and the number of parameters is very difficult to establish, some guidelines have been suggested.
- It is generally accepted that using at least ten times as many training samples per class as the number of features (n/d > 10) is good practice.


Feature/Dimensionality Reduction


Component Analysis and Discriminants

- One way of coping with the problem of high dimensionality is to reduce the dimensionality by combining features.
- Issues in feature/dimensionality reduction:
  - linear vs. non-linear transformations,
  - use of class labels or not (depends on the availability of training data).
- Linear combinations are particularly attractive because they are simple to compute and analytically tractable.
- Linear methods project the high-dimensional data onto a lower-dimensional space.
- Advantages of these projections include
  - reduced complexity in estimation and classification,
  - the ability to visually examine the multivariate data in two or three dimensions.


Component Analysis and Discriminants

- Given x ∈ R^d, the goal is to find a linear transformation A that gives y = A^T x, y ∈ R^{d'}, where d' < d.
- Two classical approaches for finding optimal linear transformations are:
  - Principal Component Analysis (PCA): seeks a projection that best represents the data in a least-squares sense.
  - Multiple Discriminant Analysis (MDA): seeks a projection that best separates the data in a least-squares sense.


Principal Component Analysis


Principal Component Analysis

- Given x_1, x_2, ..., x_n ∈ R^d, the goal is to find a d'-dimensional subspace in which the reconstruction error of the x_i is minimized.
- Define the squared-error criterion function J_0(x_0) by
  J_0(x_0) = \sum_{k=1}^{n} \| x_0 - x_k \|^2
  and seek the value of x_0 that minimizes J_0.
- It is simple to show that the solution to this problem is given by x_0 = m, where m is the sample mean
  m = (1/n) \sum_{k=1}^{n} x_k


Principal Component Analysis


- This can be easily verified by writing
  J_0(x_0) = \sum_{k=1}^{n} \| (x_0 - m) - (x_k - m) \|^2
           = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 \sum_{k=1}^{n} (x_0 - m)^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2
           = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 (x_0 - m)^T \sum_{k=1}^{n} (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2
           = \sum_{k=1}^{n} \| x_0 - m \|^2 + \sum_{k=1}^{n} \| x_k - m \|^2
  where the cross term vanishes because \sum_{k=1}^{n} (x_k - m) = 0.
- Since the second sum is independent of x_0, the expression is obviously minimized by the choice x_0 = m.

Principal Component Analysis

- The sample mean is a zero-dimensional representation of the data set. It is simple, but it does not reveal any of the variability in the data.
- A one-dimensional representation is obtained by projecting the data onto a line running through the sample mean.
- Let e be a unit vector in the direction of the line. Then the equation of the line is
  x = m + a e
  where the real scalar a corresponds to the distance of the point x from the mean m.
- Writing x_k ≈ m + a_k e, we can find the optimal set of coefficients a_k by minimizing the squared-error criterion function.


Principal Component Analysis


- The squared-error criterion function is
  J_1(a_1, a_2, ..., a_n, e) = \sum_{k=1}^{n} \| (m + a_k e) - x_k \|^2
                             = \sum_{k=1}^{n} \| a_k e - (x_k - m) \|^2
                             = \sum_{k=1}^{n} a_k^2 \|e\|^2 - 2 \sum_{k=1}^{n} a_k e^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2
- Recognizing that \|e\| = 1, partially differentiating with respect to a_k, and setting the derivative to zero, we obtain
  a_k = e^T (x_k - m)
- Geometrically, this result merely says that we obtain a least-squares solution by projecting the vector x_k onto the line in the direction of e that passes through the sample mean.

Principal Component Analysis


- The solution to the problem involves the scatter matrix S defined by
  S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T
- The scatter matrix is n times the sample covariance matrix.
- Substituting a_k = e^T (x_k - m) into the criterion function,
  J_1(e) = \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \| x_k - m \|^2
         = - \sum_{k=1}^{n} [ e^T (x_k - m) ]^2 + \sum_{k=1}^{n} \| x_k - m \|^2
         = - \sum_{k=1}^{n} e^T (x_k - m)(x_k - m)^T e + \sum_{k=1}^{n} \| x_k - m \|^2
         = - e^T S e + \sum_{k=1}^{n} \| x_k - m \|^2


Principal Component Analysis

- The resulting criterion function is
  J_1(e) = - e^T S e + \sum_{k=1}^{n} \| x_k - m \|^2
  so minimizing J_1(e) is equivalent to maximizing e^T S e.
- Use Lagrange multipliers to maximize e^T S e subject to the constraint \|e\| = 1.
- Letting λ be the undetermined multiplier, we differentiate
  u = e^T S e - λ (e^T e - 1)
  ∂u/∂e = 2 S e - 2 λ e
  and setting the derivative to zero gives
  S e = λ e
- In particular, because e^T S e = λ e^T e = λ, it follows that to maximize e^T S e we select the eigenvector corresponding to the largest eigenvalue of the scatter matrix.
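
As a minimal numerical sketch of this result (NumPy; the helper name first_principal_direction is my own, not from the lecture), the best one-dimensional projection direction is simply the leading eigenvector of the scatter matrix:

```python
import numpy as np

def first_principal_direction(X):
    """Return the sample mean m and the unit eigenvector e of the scatter
    matrix with the largest eigenvalue (the best 1-D projection direction)."""
    m = X.mean(axis=0)                    # sample mean
    Xc = X - m                            # centered data
    S = Xc.T @ Xc                         # scatter matrix S = sum_k (x_k - m)(x_k - m)^T
    eigvals, eigvecs = np.linalg.eigh(S)  # S is symmetric; eigenvalues in ascending order
    e = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    return m, e

# Projection coefficients: a_k = e^T (x_k - m), i.e. a = (X - m) @ e
```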


Principal Component Analysis

- To find the best one-dimensional projection of the data (best in the least-sum-of-squared-error sense), we project the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue.
- This result can be readily extended from a 1-D to a d'-dimensional projection:
  x = m + \sum_{i=1}^{d'} a_i e_i
  where d' ≤ d.


Principal Component Analysis

- It is not difficult to show that the criterion function
  J_{d'} = \sum_{k=1}^{n} \| ( m + \sum_{i=1}^{d'} a_{ki} e_i ) - x_k \|^2
  is minimized when the vectors e_1, e_2, ..., e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues.
- Because the scatter matrix is real and symmetric, these eigenvectors are orthogonal.
- The coefficients a_i are the components of x in that basis, and are called the principal components.


Principal Component Analysis


- Given x_1, x_2, ..., x_n ∈ R^d, the goal is to find a d'-dimensional subspace in which the reconstruction error of the x_i is minimized.
- The squared-error criterion function J_0(x_0) is minimized by selecting x_0 = m, where m is the sample mean.
- The sample mean is a zero-dimensional representation of the data set. It is simple, but it does not reveal any of the variability in the data.
- We therefore consider at least a one-dimensional representation of the data by choosing
  x = m + a e
  and computing the optimal value of a such that the squared-error criterion function J_1 is minimized.
- We obtained the solution
  a_k = e^T (x_k - m)

Principal Component Analysis

- Given x_1, x_2, ..., x_n ∈ R^d, the goal is to find a d'-dimensional subspace in which the reconstruction error of the x_i is minimized.
- The criterion function for the reconstruction error can be defined in the least-squares sense as
  J_{d'} = \sum_{k=1}^{n} \| ( m + \sum_{i=1}^{d'} a_{ki} e_i ) - x_k \|^2
  where e_1, e_2, ..., e_{d'} are the basis vectors of the subspace (stored as the columns of A) and a_i is the projection of x_i onto that subspace.


Principal Component Analysis

- It can be shown that J_{d'} is minimized when e_1, e_2, ..., e_{d'} are the eigenvectors corresponding to the d' largest eigenvalues of the scatter matrix
  S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T
- The coefficients a = (a_1, ..., a_{d'})^T are called the principal components.
- When the eigenvectors are sorted in descending order of the corresponding eigenvalues, the greatest variance of the data lies along the first principal component, the second greatest variance along the second component, and so on.
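
Extending the earlier sketch to d' components (again NumPy; the function name pca is my own), the top-d' eigenvectors of the scatter matrix give the basis and the projections give the principal components:

```python
import numpy as np

def pca(X, d_prime):
    """Project X (n x d) onto its d'-dimensional principal subspace.

    Returns the mean m, the basis E (d x d'), the principal components A (n x d'),
    and the fraction of total scatter carried by each eigenvalue."""
    m = X.mean(axis=0)
    Xc = X - m
    S = Xc.T @ Xc                          # scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort descending by eigenvalue
    E = eigvecs[:, order[:d_prime]]        # top-d' eigenvectors as columns
    A = Xc @ E                             # a_k = E^T (x_k - m), one row per sample
    var_fraction = eigvals[order] / eigvals.sum()
    return m, E, A, var_fraction

# Reconstruction in the subspace: X_hat = m + A @ E.T
```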


Example to be solved
Question: Given the following sets of feature vectors belonging to two classes ω1 and ω2, each Gaussian distributed:
  (1, 2)^T, (3, 4)^T, (4, 3)^T, (5, 5)^T, (7, 5)^T ∈ ω1
  (6, 2)^T, (9, 4)^T, (7, 3)^T, (11, 4)^T, (13, 6)^T ∈ ω2
Find the direction of the line of projection that best represents the data in a one-dimensional feature space.

Figure: Scatter plot of the two classes in the (x1, x2) plane.
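
A quick numerical sketch for this exercise (NumPy; note that PCA ignores the class labels, so the two classes are simply pooled):

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [4, 3], [5, 5], [7, 5],      # omega_1
              [6, 2], [9, 4], [7, 3], [11, 4], [13, 6]],   # omega_2
             dtype=float)

m = X.mean(axis=0)
S = (X - m).T @ (X - m)               # scatter matrix of the pooled data
eigvals, eigvecs = np.linalg.eigh(S)
e1 = eigvecs[:, -1]                   # direction that best represents the data
print("sample mean:", m)
print("best projection direction e1:", e1)
```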

Examples: Iris dataset representation


- The "Iris" dataset is a very famous dataset used for data analysis problems (classification, feature reduction, and many more).
- It is available in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Iris.
- The Iris dataset contains measurements for 150 iris flowers from three different species:
  - Iris-setosa (n1 = 50)
  - Iris-versicolor (n2 = 50)
  - Iris-virginica (n3 = 50)
- The four features in the Iris dataset are:
  - sepal length in cm
  - sepal width in cm
  - petal length in cm
  - petal width in cm


Examples: Iris data representation

Figure: Scatter plot of the iris data. Diagonal cells show the histogram for each feature. Other cells show scatter plots of pairs of features x1, x2, x3, x4 in top-down and left-right order. Red, green, and blue points represent samples from the setosa, versicolor, and virginica classes, respectively.

Examples: Iris data representation


Figure: Scatter plot of the projection of the iris data onto the first two and the first three principal axes (e1, e2, e3), with the Iris-setosa, Iris-versicolor, and Iris-virginica classes shown in different colors.
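
A minimal sketch that reproduces such a projection, assuming scikit-learn is available (the calls below are standard scikit-learn; the figure itself was not generated from this snippet):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)    # 150 samples, 4 features, 3 classes
pca = PCA(n_components=3)            # keep the first three principal axes
Z = pca.fit_transform(X)             # centered data projected onto the principal axes

print("explained variance ratio:", pca.explained_variance_ratio_)
# Z[:, :2] gives the 2-D projection; Z gives the 3-D projection, colored by y when plotted.
```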


Linear Discriminant Analysis


Fisher Linear Discriminant


- PCA seeks directions that are efficient for representation; discriminant analysis seeks directions that are efficient for discrimination.

Figure: Projection of the same set of samples onto two different lines in the directions marked as w. The figure on the right shows greater separation between the red and black projected points.

Fisher Linear Discriminant

- Suppose x_1, x_2, ..., x_n ∈ R^d are divided into two subsets D_1 (n_1 samples) and D_2 (n_2 samples) corresponding to the classes ω1 and ω2, respectively. The goal is to find a projection onto a line, defined as
  y = w^T x
  such that the points corresponding to D_1 and D_2 are well separated.
- The corresponding set of n projected samples y_1, y_2, ..., y_n is divided into the subsets Y_1 and Y_2.

Fisher Linear Discriminant

- A measure of the separation between the projected points is the difference of the sample means. If m_i = (1/n_i) \sum_{x ∈ D_i} x is the d-dimensional sample mean of class ω_i, then the mean of the projected points is
  m̃_i = (1/n_i) \sum_{y ∈ Y_i} y = w^T m_i
  and the distance between the projected means is |m̃_1 - m̃_2| = |w^T (m_1 - m_2)|.
- Rather than forming sample variances, we define the scatter of the projected samples labeled ω_i by
  s̃_i^2 = \sum_{y ∈ Y_i} (y - m̃_i)^2
  so that s̃_1^2 + s̃_2^2 is the total within-class scatter of the projected samples.
- The criterion function for the best separation can then be defined as
  J(w) = |m̃_1 - m̃_2|^2 / (s̃_1^2 + s̃_2^2)
- This is called Fisher's linear discriminant, with the geometric interpretation that the best projection makes the difference between the projected means as large as possible relative to the within-class scatter.

Fisher Linear Discriminant

- To obtain J(·) as an explicit function of w, we define the scatter matrices S_i and the within-class scatter matrix S_W:
  S_i = \sum_{x ∈ D_i} (x - m_i)(x - m_i)^T
  S_W = S_1 + S_2
  S_W is proportional to the sample covariance matrix for the pooled d-dimensional data; it is symmetric and positive semidefinite.
- Then we can write
  s̃_i^2 = \sum_{x ∈ D_i} (w^T x - w^T m_i)^2 = w^T S_i w
  so the sum of the projected scatters is
  s̃_1^2 + s̃_2^2 = w^T S_W w
- Similarly, the separation of the projected means obeys
  (m̃_1 - m̃_2)^2 = (w^T m_1 - w^T m_2)^2 = w^T (m_1 - m_2)(m_1 - m_2)^T w = w^T S_B w
  where the between-class scatter matrix is
  S_B = (m_1 - m_2)(m_1 - m_2)^T

Fisher Linear Discriminant


- Then, the criterion function becomes
  J(w) = |m̃_1 - m̃_2|^2 / (s̃_1^2 + s̃_2^2) = (w^T S_B w) / (w^T S_W w)
- This expression is well known in mathematical physics as the generalized Rayleigh quotient.
- A vector w that maximizes J(·) must satisfy
  S_B w = λ S_W w,   i.e.,   S_W^{-1} S_B w = λ w
- In this particular case, it is unnecessary to solve for the eigenvalues and eigenvectors of S_W^{-1} S_B, because S_B w is always in the direction of m_1 - m_2.

Fisher Linear Discriminant


- Since the scale factor for w is immaterial, we can immediately write the solution that optimizes J(·):
  w = S_W^{-1} (m_1 - m_2)
- Note that S_W is symmetric and positive semidefinite, and it is usually nonsingular if n > d. S_B is also symmetric and positive semidefinite, but its rank is at most 1.
- Thus, we have obtained w for Fisher's linear discriminant: the linear function yielding the maximum ratio of between-class scatter to within-class scatter.
- The classification problem has thus been converted from a d-dimensional one to a hopefully more manageable one-dimensional one; all that remains is to find a threshold along the projection direction.

Example to be solved
Question: Given the following sets of feature vectors belonging to two classes ω1 and ω2, each Gaussian distributed:
  (1, 2)^T, (3, 4)^T, (4, 3)^T, (5, 5)^T, (7, 5)^T ∈ ω1
  (6, 2)^T, (9, 4)^T, (7, 3)^T, (11, 4)^T, (13, 6)^T ∈ ω2
Find the direction of the line of projection that best separates the data in a one-dimensional feature space.

Figure: Scatter plot of the two classes in the (x1, x2) plane.
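
A quick numerical sketch of the Fisher direction for this exercise, computed as w = S_W^{-1}(m_1 - m_2) (NumPy; variable names are my own):

```python
import numpy as np

X1 = np.array([[1, 2], [3, 4], [4, 3], [5, 5], [7, 5]], dtype=float)     # omega_1
X2 = np.array([[6, 2], [9, 4], [7, 3], [11, 4], [13, 6]], dtype=float)   # omega_2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)        # class scatter matrices
S2 = (X2 - m2).T @ (X2 - m2)
SW = S1 + S2                        # within-class scatter matrix

w = np.linalg.solve(SW, m1 - m2)    # Fisher direction, w = SW^{-1} (m1 - m2)
w /= np.linalg.norm(w)              # normalize; the scale of w is immaterial
print("Fisher direction w:", w)
```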


Multiple Discriminant Analysis


Multiple Discriminant Analysis

Figure: Three three-dimensional distributions are projected onto two-dimensional subspaces, described by the normal vectors W1 and W2. Informally, multiple discriminant methods seek the optimum such subspace, i.e., the one with the greatest separation of the projected distributions for a given total within-class scatter.

Multiple Discriminant Analysis

- For the c-class problem, the natural generalization of Fisher's linear discriminant involves c − 1 discriminant functions. Thus, the projection is from a d-dimensional space to a (c − 1)-dimensional space, and it is tacitly assumed that d ≥ c.
- The generalization of the within-class scatter matrix is obvious:
  S_W = \sum_{i=1}^{c} S_i
  where, as before,
  S_i = \sum_{x ∈ D_i} (x - m_i)(x - m_i)^T   and   m_i = (1/n_i) \sum_{x ∈ D_i} x
- The proper generalization for S_B is not quite so obvious. Suppose that we define a total mean vector m and a total scatter matrix S_T by
  m = (1/n) \sum_{x} x = (1/n) \sum_{i=1}^{c} n_i m_i
  S_T = \sum_{x} (x - m)(x - m)^T

Multiple Discriminant Analysis

- Then we can write
  S_T = \sum_{i=1}^{c} \sum_{x ∈ D_i} (x - m_i + m_i - m)(x - m_i + m_i - m)^T
      = \sum_{i=1}^{c} \sum_{x ∈ D_i} (x - m_i)(x - m_i)^T + \sum_{i=1}^{c} \sum_{x ∈ D_i} (m_i - m)(m_i - m)^T
      = S_W + \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T
- It is natural to define this second term as a general between-class scatter matrix, so that the total scatter is the sum of the within-class scatter and the between-class scatter:
  S_T = S_W + S_B
  where
  S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T
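
A small helper sketch for building these matrices from labeled data (NumPy; the function name scatter_matrices is my own):

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (SW) and between-class (SB) scatter matrices for data X (n x d)
    with integer class labels y; SW + SB equals the total scatter matrix ST."""
    m = X.mean(axis=0)                    # total mean vector
    d = X.shape[1]
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        SW += (Xc - mc).T @ (Xc - mc)     # adds the class scatter matrix S_i
        diff = (mc - m).reshape(-1, 1)
        SB += len(Xc) * (diff @ diff.T)   # n_i (m_i - m)(m_i - m)^T
    return SW, SB
```
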
Multiple Discriminant Analysis

- If we check the two-class case, we find that the resulting between-class scatter matrix is n_1 n_2 / n times our previous definition.
- The projection from a d-dimensional space to a (c − 1)-dimensional space is accomplished by c − 1 discriminant functions
  y_i = w_i^T x,   i = 1, ..., c − 1
- If the y_i are viewed as the components of a vector y and the weight vectors w_i are viewed as the columns of a d-by-(c − 1) matrix W, then the projection can be written as a single matrix equation
  y = W^T x



Multiple Discriminant Analysis


- The samples x_1, x_2, ..., x_n project to a corresponding set of samples y_1, y_2, ..., y_n, which can be described by their own mean vectors and scatter matrices. Thus, if we define
  m̃_i = (1/n_i) \sum_{y ∈ Y_i} y
  m̃ = (1/n) \sum_{i=1}^{c} n_i m̃_i
  S̃_W = \sum_{i=1}^{c} \sum_{y ∈ Y_i} (y - m̃_i)(y - m̃_i)^T
  S̃_B = \sum_{i=1}^{c} n_i (m̃_i - m̃)(m̃_i - m̃)^T

Multiple Discriminant Analysis

- It is a straightforward matter to show that
  S̃_W = W^T S_W W   and   S̃_B = W^T S_B W
- These equations show how the within-class and between-class scatter matrices are transformed by the projection to the lower-dimensional space.
- What we seek is a transformation matrix W that in some sense maximizes the ratio of the between-class scatter to the within-class scatter.

Multiple Discriminant Analysis: Solution

- A simple scalar measure of scatter is the determinant of the scatter matrix: the determinant is the product of the eigenvalues, and hence the product of the "variances" in the principal directions, thereby measuring the square of the hyperellipsoidal scattering volume. Using this measure, we obtain the criterion function
  J(W) = |S̃_B| / |S̃_W| = |W^T S_B W| / |W^T S_W W|
- The problem of finding a rectangular matrix W that maximizes J(·) is tricky, though fortunately the solution is relatively simple.
- The columns of an optimal W are the generalized eigenvectors that correspond to the largest eigenvalues in
  S_B w_i = λ_i S_W w_i

Multiple Discriminant Analysis: Observation

the solution is relatively simple. The columns of
an optimal W are the generalized eigenvectors that correspond to the largest eigen-
values in W
(SB − λi S1 W )wi = 0 (109)
W2
directly for the eigenvectors wi . Because SB wi = SBλiisSW the
wisum
. of c matrices of rank one(107) or
less, and
If SWAisfew because only
nonsingular, c −
this 1 of these
can this are independent,
be converted to in S is
a conventional of rank c − 1 or less. Thus,
ifeigenvalue problem as
 B
no more thanobservations
c − 1 of about
the eigenvalues solution
are are
nonzero, order.
and First,
the desiredSW weight
is non-singular,
vectors
before.
Figure
this can4.28:
be Three three-dimensional distributions are projected onto two-dimensional
correspond toconverted
these nonzero to a conventional
eigenvalues. eigenvalue problem
If the within-class as before.
scatter However,the
is isotropic, this
subspaces,
is actually described
undesirable, by a
sincenormal
it vectors
requires an w and w . Informally, multiple discrimi-
 Computationare
eigenvectors of merely
the inversethe eigenvectors of SB , and the eigenvectors with nonzeroof
of SW is expensive. unnecessary
1 2 computation of the inverse
nant
S . methods
Instead, seek
one canthefindoptimum
the such
eigenvalues subspace, i.e., the one with the greatest sepa-
W
eigenvalues
Instead, one span thefindspace spanned by as the as the roots
vectors mofi −of m.
the characteristic
In this special polynomial
case the
thecan thedistributions
eigenvalues for the rootstotal the characteristic polynomial

ration of projected a given within-scatter matrix, here as
columns of W can be found simply by applying the Gram-Schmidt orthonormalization
associated with w .
procedure to the c −1 1 vectors mi −|Sm, B −i λ= i S1, | =c0− 1. Finally, we observe that(108)
W ..., in
general
andthen the
thensolvesolution
solve for W is not unique. The allowable transformations include
 and
rotating and scaling the axes in various ways. These are all linear transformations
from a (c − 1)-dimensional space to(SaB(c−−λi1)-dimensional SW )wi = 0 space, however, and do not (109)
change things in any significant way; in particular, they leave the criterion function
directly for theand
J(W) eigenvectors w . Because S is the sum of c matrices of rank one or
directlyinvariant the classifier
for the eigenvectors wi .i unchanged. B
less,
If we and because
have very only − 1 ofwe
little cdata, these
wouldare tend
independent,
to project rank c −of
SBtoisaofsubspace 1 or
lowless. Thus,
dimen-
no more
sion, while than c −is1more
if there
45/56 of the eigenvalues
data, we can
Kundan are anonzero,
use
Kumar and the desired
higher dimension, weight
as we shall vectors
explore
Pattern Classification
Dimensionality Problem Component Analysis PCA LDA Feature Selection References
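
A hedged sketch of this step, assuming SciPy is available: scipy.linalg.eigh solves the symmetric generalized eigenproblem S_B w = λ S_W w directly, without forming S_W^{-1} (the helper name mda_projection is my own, and S_W is assumed positive definite):

```python
import numpy as np
from scipy.linalg import eigh

def mda_projection(SW, SB, n_components):
    """Columns of W are the generalized eigenvectors of S_B w = lambda S_W w,
    sorted by decreasing eigenvalue (at most c - 1 are meaningful)."""
    eigvals, eigvecs = eigh(SB, SW)           # generalized problem; no explicit inverse of SW
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]   # d x n_components projection matrix W

# Usage sketch, with c the number of classes:
#   SW, SB = scatter_matrices(X, y); W = mda_projection(SW, SB, c - 1); Y = X @ W
```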

Feature Selection


Feature Selection

- Feature reduction uses a linear or non-linear combination of features.
- An alternative to feature reduction is feature selection, which reduces dimensionality by selecting subsets of the existing features.
- Benefits of performing feature selection:
  - avoiding the curse of dimensionality,
  - reducing the computational cost,
  - improving accuracy,
  - avoiding overfitting.
- The first step in feature selection is to define a criterion function, which is often a function of the classification error.
- Note that the use of classification error in the criterion function makes feature selection procedures dependent on the specific classifier used.


Feature Selection

- The most straightforward approach would require
  - examining all \binom{d}{m} possible subsets of size m,
  - selecting the subset that performs best according to the criterion function.
- The number of subsets grows combinatorially, making an exhaustive search impractical.
- There are two main types of feature selection algorithms:
  - wrapper feature selection methods,
  - filter feature selection methods.
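
As a minimal wrapper-style sketch (assuming scikit-learn is available; the classifier and the number of selected features below are arbitrary illustrative choices, not from the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Wrapper method: greedily add the feature that most improves cross-validated
# accuracy of the chosen classifier (direction="backward" removes features instead).
sfs = SequentialFeatureSelector(knn, n_features_to_select=2, direction="forward")
sfs.fit(X, y)
print("selected feature mask:", sfs.get_support())
```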


Examples: Iris data representation


Figure: Histogram plot of the Iris features (sepal length, sepal width, petal length, petal width), with the setosa, versicolor, and virginica classes shown separately.

Examples: Iris data representation

Figure: Scatter plot of the iris data. Off-diagonal cells show scatter plots of pairs of features x1, x2, x3, x4.

Feature Selection
Examples
- Sequential forward selection:
  1. First, the best single feature is selected.
  2. Then, pairs of features are formed using one of the remaining features and this best feature, and the best pair is selected.
  3. Next, triplets of features are formed using one of the remaining features and these two best features, and the best triplet is selected.
  4. This procedure continues until all or a predefined number of features are selected.

Figure: Results of sequential forward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features added at each iteration.

Feature Selection
Examples
- Sequential backward selection:
  1. First, the criterion function is computed for all d features.
  2. Then, each feature is deleted one at a time, the criterion function is computed for all subsets with d − 1 features, and the worst feature is discarded.
  3. Next, each feature among the remaining d − 1 is deleted one at a time, and the worst feature is discarded to form a subset with d − 2 features.
  4. This procedure continues until one feature or a predefined number of features are left.

Figure: Results of sequential backward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features removed at each iteration.
Summary

 The choice between feature reduction and feature selection depends on the
application domain and the specific training data.
 Feature selection leads to savings in computational costs and the selected features
retain their original physical interpretation.
 Feature reduction with transformations may provide a better discriminative ability
but these new features may not have a clear physical meaning.


Assignment Problem

Question:
(a) Given the following sets of feature vectors belonging to two classes ω1 and ω2, each Gaussian distributed:
  (1, 2)^T, (3, 5)^T, (4, 3)^T, (5, 6)^T, (7, 5)^T ∈ ω1
  (6, 2)^T, (9, 4)^T, (10, 1)^T, (12, 3)^T, (13, 6)^T ∈ ω2
The vectors are projected onto a line so that each feature vector is represented by a single feature. Find the direction of the line of projection that best maintains the separability of the two classes.
(b) Assuming the mean of the projected points belonging to ω1 to be the origin of the projection line, identify the point on the projection line that optimally separates the two classes. Assume the classes are equally probable and that the projected features also follow Gaussian distributions.


References

[1] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley & Sons, 2012.
