Lecture 3

Dimensionality Reduction

How Can We Visualize High-Dimensional Data?
• E.g., 53 blood and urine tests for 65 patients

Instances (rows) × Features (columns):

     H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV      H-MCH     H-MCHC
A1   8.0000   4.8200   14.1000   41.0000    85.0000   29.0000   34.0000
A2   7.3000   5.0200   14.7000   43.0000    86.0000   29.0000   34.0000
A3   4.3000   4.4800   14.1000   41.0000    91.0000   32.0000   35.0000
A4   7.5000   4.4700   14.9000   45.0000   101.0000   33.0000   33.0000
A5   7.3000   5.5200   15.4000   46.0000    84.0000   28.0000   33.0000
A6   6.9000   4.8600   16.0000   47.0000    97.0000   33.0000   34.0000
A7   7.8000   4.6800   14.7000   43.0000    92.0000   31.0000   34.0000
A8   8.6000   4.8200   15.8000   42.0000    88.0000   33.0000   37.0000
A9   5.1000   4.7100   14.0000   43.0000    92.0000   30.0000   32.0000

Difficult to see the correlations between the features...
Data Visualization
• Is there a representation better than the raw features?
• Is it really necessary to show all 53 dimensions?
• ... what if there are strong correlations between the features?

Could we find the smallest subspace of the 53-D space
that keeps the most information about the original data?

One solution: Principal Component Analysis
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (the purple line in the slide's figure)
• minimizes the mean squared distance between the data points and their projections (the sum of the blue lines)
The Principal Components

• Vectors originating from the center of mass of the data

• Principal component #1 points in the direction of the largest variance

• Each subsequent principal component...
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace
2D Gaussian Dataset

1st PCA axis

2nd PCA axis
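The three slides above are figures in the original deck. As a rough stand-in, the following NumPy sketch (my own illustration, not code from the lecture; the Gaussian's mean and covariance are arbitrary choices) generates a 2D Gaussian dataset and recovers the two PCA axes the figures depict.

```python
import numpy as np

# Sample a 2D Gaussian dataset (mean and covariance are illustrative assumptions)
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.5],
                [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# Center the data and form the covariance matrix
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / (len(Xc) - 1)

# Eigendecomposition of the symmetric matrix Sigma; sort by decreasing eigenvalue
evals, evecs = np.linalg.eigh(Sigma)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

print("1st PCA axis (direction of largest variance):", evecs[:, 0])
print("2nd PCA axis (orthogonal to the 1st):        ", evecs[:, 1])
```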
Dimensionality Reduction
Can ignore the components of lesser significance

[Bar chart: variance (%) captured by each principal component, PC1 through PC10, decreasing from PC1 onward]

You do lose some information, but if the eigenvalues are small, you don't lose much:
– choose only the first k eigenvectors, based on their eigenvalues
– the final data set has only k dimensions
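One common way to act on this rule is to keep the smallest k whose eigenvalues cover a chosen fraction of the total variance. The sketch below is my own illustration, not from the lecture; the helper name `choose_k`, the 90% target, and the example eigenvalues (shaped roughly like the bar chart above) are all assumptions.

```python
import numpy as np

def choose_k(eigenvalues, target=0.90):
    """Smallest k such that the first k eigenvalues keep `target` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    kept = np.cumsum(lam) / lam.sum()                          # fraction kept by first k PCs
    return int(np.searchsorted(kept, target)) + 1

# Example eigenvalues, made up for illustration
lam = [25, 20, 15, 10, 8, 7, 6, 4, 3, 2]
print(choose_k(lam, target=0.90))   # number of principal components to keep
```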
PCA Algorithm
• Given data {x1, ..., xn}, compute the covariance matrix Σ
  • X is the n × d data matrix (one row per example)
  • Compute the data mean (average over all rows of X)
  • Subtract the mean from each row of X (centering the data)
  • Compute the covariance matrix Σ = XᵀX (Σ is d × d; the usual 1/(n−1) scaling does not change the eigenvectors)

• The PCA basis vectors are given by the eigenvectors of Σ
  • Λ, Q = numpy.linalg.eig(Σ) (note: eig returns the eigenvalues first and does not sort them)
  • {qi, λi}, i = 1..d, are the eigenvectors/eigenvalues of Σ, ordered so that λ1 ≥ λ2 ≥ ... ≥ λd

• Larger eigenvalue ⇒ more important eigenvector
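Putting these steps together, here is a minimal NumPy sketch of the algorithm (my own code, not the lecture's). It uses `numpy.linalg.eigh` instead of `eig` because Σ is symmetric, and it sorts the eigenvectors explicitly, since NumPy does not return them in decreasing eigenvalue order.

```python
import numpy as np

def pca(X, k):
    """PCA as outlined above: center X, eigendecompose the covariance, keep k directions.

    X is an n x d data matrix (one row per example).  Returns (Q_k, mean), where the
    columns of Q_k are the top-k principal components (eigenvectors of Sigma).
    """
    mean = X.mean(axis=0)                  # data mean (average over all rows of X)
    Xc = X - mean                          # subtract the mean from each row (centering)
    Sigma = Xc.T @ Xc / (len(X) - 1)       # d x d covariance; the 1/(n-1) factor does not
                                           # change the eigenvectors
    evals, Q = np.linalg.eigh(Sigma)       # eigh: Sigma is symmetric; eigenvalues ascending
    order = np.argsort(evals)[::-1]        # reorder so that lambda_1 >= lambda_2 >= ...
    return Q[:, order[:k]], mean
```

Projecting data then amounts to `(X - mean) @ Q_k`, which is the X̂ = XQ step shown on the later slides.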
PCA

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...
      ...
      1 0 1 0 1 0 0 0 ... ]        (X has d columns)

The columns of Q are the eigenvectors of Σ, ordered by importance!        (Q is d × d)

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]
Slide  by  Eric  Eaton  
PCA

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...
      ...
      1 0 1 0 1 0 0 0 ... ]

Each row of Q corresponds to a feature; keep only the first k columns of Q

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]
Slide  by  Eric  Eaton  
PCA
• Each column of Q gives the weights for a linear combination of the original features

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]

(first column)  =  0.34 feature1 + 0.04 feature2 - 0.64 feature3 + ...
Slide  by  Eric  Eaton  
PCA
• We can apply these formulas to get the new representation for each instance x

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...      ← x3
      ...
      1 0 1 0 1 0 0 0 ... ]

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]

• The new 2D representation for x3 is given by:
  x̂31 = 0.34(0) + 0.04(0) - 0.64(1) + ...
  x̂32 = 0.23(0) + 0.13(0) + 0.93(1) + ...

• The re-projected data matrix is given by X̂ = XQ
Slide  by  Eric  Eaton  
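As a concrete check of X̂ = XQ, here is a short NumPy sketch (my own illustration on random data, not the binary matrix from the slides): it keeps the first two columns of a sorted Q and verifies that the 2D representation of x3 is exactly the dot products spelled out above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))         # made-up n x d data matrix
Xc = X - X.mean(axis=0)                   # center, as in the algorithm slide

# Eigenvectors of the covariance, sorted by decreasing eigenvalue
evals, Q = np.linalg.eigh(Xc.T @ Xc / (len(Xc) - 1))
order = np.argsort(evals)[::-1]
Q = Q[:, order]

Q_k = Q[:, :2]                            # keep only the first k = 2 columns of Q
X_hat = Xc @ Q_k                          # re-projected data matrix X_hat = X Q_k (n x 2)

# Each new coordinate is a dot product of an instance with a column of Q,
# e.g. the 2D representation of x3 (the third row of X):
x3_hat = np.array([Xc[2] @ Q[:, 0], Xc[2] @ Q[:, 1]])
print(np.allclose(x3_hat, X_hat[2]))      # True
```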
