Lecture 3
Dimensionality Reduction
How Can We Visualize High-Dimensional Data?
• E.g., 53 blood and urine tests for 65 patients
Instance  H-WBC   H-RBC   H-Hgb    H-Hct    H-MCV    H-MCH    H-MCHC
A1        8.0000  4.8200  14.1000  41.0000  85.0000  29.0000  34.0000
A2        7.3000  5.0200  14.7000  43.0000  86.0000  29.0000  34.0000
(Rows are instances; columns are features.)
Could we find the smallest subspace of the 53-D space that keeps the most information about the original data?
Principal Component Analysis
The Principal Components
2D Gaussian Dataset
1st PCA axis
2nd PCA axis
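The 1st and 2nd PCA axes of a 2D Gaussian dataset can be obtained as the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue. The following is a minimal sketch in Python with NumPy, not the lecture's own code: the helper name `pca_axes`, the covariance values, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

def pca_axes(X):
    """Return (variances, axes) for data X with one instance per row.

    Columns of `axes` are unit-length principal directions, sorted so that
    column 0 is the direction of largest variance.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    return eigvals[order], eigvecs[:, order]

# Synthetic 2D Gaussian data (assumed parameters, illustration only).
rng = np.random.default_rng(0)
true_cov = np.array([[3.0, 1.5],
                     [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

variances, axes = pca_axes(X)
first_axis = axes[:, 0]    # 1st PCA axis: direction of largest variance
second_axis = axes[:, 1]   # 2nd PCA axis: orthogonal to the first
```

Because the covariance matrix is symmetric, the two axes come out orthogonal. The sign of each axis is arbitrary, so a plot may show either direction along the same line.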
Dimensionality Reduction
Can ignore the components of lesser significance
[Figure: variance (%) explained by each principal component, PC1 to PC10]
Slide by Eric Eaton
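One way to decide which components are of lesser significance is to compute the share of total variance each component explains and keep the smallest number of components that reaches a chosen threshold. The sketch below assumes that approach. The function names, the 90% threshold, and the synthetic 65-by-53 data are hypothetical stand-ins, not values from the lecture.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the component variances:
    # variance_i = s_i**2 / (n - 1). SVD returns them in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / (X.shape[0] - 1)
    return var / var.sum()

def components_needed(ratios, threshold):
    """Smallest k whose first k components reach `threshold` of the total variance."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))   # guard against floating-point shortfall at threshold 1.0

# Synthetic stand-in for a 65-instance, 53-feature dataset (made-up values).
rng = np.random.default_rng(0)
scales = np.linspace(5.0, 0.5, 53)           # features with unequal spread
X = rng.normal(size=(65, 53)) * scales

ratios = explained_variance_ratio(X)
k = components_needed(ratios, 0.90)          # components kept at an assumed 90% threshold
```

Plotting `ratios * 100` against the component index gives a variance-per-component chart of the kind shown on the slide.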
PCA
• We can apply these formulas to get the new representation for each instance x
[Example: a data matrix X with entries 0 and 1, a matrix Q of real-valued weights, and the projected instance x̂]
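The formulas referred to here are not reproduced in this text. The sketch below assumes the standard PCA projection, in which each centered instance is multiplied by a matrix Q whose columns are the top-k principal directions: x̂ = Qᵀ(x − mean). The function names and the small binary matrix are hypothetical and do not reproduce the slide's values.

```python
import numpy as np

def fit_pca(X, k):
    """Return (mean, Q): feature means and a d-by-k matrix of the top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k].T

def project(x, mean, Q):
    """New k-dimensional representation of one instance, or of a matrix with one instance per row."""
    return (x - mean) @ Q

def reconstruct(x_hat, mean, Q):
    """Map a k-dimensional representation back to the original feature space (approximate if k < d)."""
    return x_hat @ Q.T + mean

# Hypothetical binary data matrix: 6 instances, 5 features (made-up values).
X = np.array([[0, 1, 0, 1, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 0, 0, 0, 1]], dtype=float)

mean, Q = fit_pca(X, k=2)
X_hat = project(X, mean, Q)       # all instances at once, shape (6, 2)
x3_hat = project(X[2], mean, Q)   # the third instance alone, shape (2,)
```

Projecting the whole matrix and projecting a single row give the same coordinates for that row, so the per-instance formula and the matrix form are interchangeable.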