Lecture 3

Dimensionality Reduction

How Can We Visualize High-Dimensional Data?
• E.g., 53 blood and urine tests for 65 patients

Instances (rows) × Features (columns):

     H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV      H-MCH     H-MCHC
A1   8.0000   4.8200   14.1000   41.0000    85.0000   29.0000   34.0000
A2   7.3000   5.0200   14.7000   43.0000    86.0000   29.0000   34.0000
A3   4.3000   4.4800   14.1000   41.0000    91.0000   32.0000   35.0000
A4   7.5000   4.4700   14.9000   45.0000   101.0000   33.0000   33.0000
A5   7.3000   5.5200   15.4000   46.0000    84.0000   28.0000   33.0000
A6   6.9000   4.8600   16.0000   47.0000    97.0000   33.0000   34.0000
A7   7.8000   4.6800   14.7000   43.0000    92.0000   31.0000   34.0000
A8   8.6000   4.8200   15.8000   42.0000    88.0000   33.0000   37.0000
A9   5.1000   4.7100   14.0000   43.0000    92.0000   30.0000   32.0000

Difficult to see the correlations between the features...
Data Visualization
• Is there a representation better than the raw features?
• Is it really necessary to show all 53 dimensions?
• ... what if there are strong correlations between the features?

Could we find the smallest subspace of the 53-D space
that keeps the most information about the original data?

One solution: Principal Component Analysis
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (the purple line in the slide's figure)
• minimizes the mean squared distance between the data points and their projections (the sum of the blue lines)
The Principal Components

• Vectors originating from the center of mass of the data

• Principal component #1 points in the direction of the largest variance

• Each subsequent principal component...
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace
2D Gaussian Dataset

1st PCA axis

2nd PCA axis
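The three slides above are figures in the original deck. As a rough stand-in, the following NumPy sketch (my own illustration, not code from the lecture; the Gaussian's mean and covariance are arbitrary choices) generates a 2D Gaussian dataset and recovers the two PCA axes the figures depict.

```python
import numpy as np

# Sample a 2D Gaussian dataset (mean and covariance are illustrative assumptions)
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.5],
                [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# Center the data and form the covariance matrix
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / (len(Xc) - 1)

# Eigendecomposition of the symmetric matrix Sigma; sort by decreasing eigenvalue
evals, evecs = np.linalg.eigh(Sigma)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

print("1st PCA axis (direction of largest variance):", evecs[:, 0])
print("2nd PCA axis (orthogonal to the 1st):        ", evecs[:, 1])
```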
Dimensionality Reduction
Can ignore the components of lesser significance

[Bar chart: variance (%) captured by each principal component, PC1 through PC10, decreasing from PC1 onward]

You do lose some information, but if the eigenvalues are small, you don't lose much:
– choose only the first k eigenvectors, based on their eigenvalues
– the final data set has only k dimensions
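One common way to act on this rule is to keep the smallest k whose eigenvalues cover a chosen fraction of the total variance. The sketch below is my own illustration, not from the lecture; the helper name `choose_k`, the 90% target, and the example eigenvalues (shaped roughly like the bar chart above) are all assumptions.

```python
import numpy as np

def choose_k(eigenvalues, target=0.90):
    """Smallest k such that the first k eigenvalues keep `target` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    kept = np.cumsum(lam) / lam.sum()                          # fraction kept by first k PCs
    return int(np.searchsorted(kept, target)) + 1

# Example eigenvalues, made up for illustration
lam = [25, 20, 15, 10, 8, 7, 6, 4, 3, 2]
print(choose_k(lam, target=0.90))   # number of principal components to keep
```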
PCA Algorithm
• Given data {x1, ..., xn}, compute the covariance matrix Σ
  • X is the n × d data matrix (one row per example)
  • Compute the data mean (average over all rows of X)
  • Subtract the mean from each row of X (centering the data)
  • Compute the covariance matrix Σ = XᵀX (Σ is d × d; the usual 1/(n−1) scaling does not change the eigenvectors)

• The PCA basis vectors are given by the eigenvectors of Σ
  • Λ, Q = numpy.linalg.eig(Σ) (note: eig returns the eigenvalues first and does not sort them)
  • {qi, λi}, i = 1..d, are the eigenvectors/eigenvalues of Σ, ordered so that λ1 ≥ λ2 ≥ ... ≥ λd

• Larger eigenvalue ⇒ more important eigenvector
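Putting these steps together, here is a minimal NumPy sketch of the algorithm (my own code, not the lecture's). It uses `numpy.linalg.eigh` instead of `eig` because Σ is symmetric, and it sorts the eigenvectors explicitly, since NumPy does not return them in decreasing eigenvalue order.

```python
import numpy as np

def pca(X, k):
    """PCA as outlined above: center X, eigendecompose the covariance, keep k directions.

    X is an n x d data matrix (one row per example).  Returns (Q_k, mean), where the
    columns of Q_k are the top-k principal components (eigenvectors of Sigma).
    """
    mean = X.mean(axis=0)                  # data mean (average over all rows of X)
    Xc = X - mean                          # subtract the mean from each row (centering)
    Sigma = Xc.T @ Xc / (len(X) - 1)       # d x d covariance; the 1/(n-1) factor does not
                                           # change the eigenvectors
    evals, Q = np.linalg.eigh(Sigma)       # eigh: Sigma is symmetric; eigenvalues ascending
    order = np.argsort(evals)[::-1]        # reorder so that lambda_1 >= lambda_2 >= ...
    return Q[:, order[:k]], mean
```

Projecting data then amounts to `(X - mean) @ Q_k`, which is the X̂ = XQ step shown on the later slides.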
PCA

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...
      ...
      1 0 1 0 1 0 0 0 ... ]        (X has d columns)

The columns of Q are the eigenvectors of Σ, ordered by importance!        (Q is d × d)

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]
Slide  by  Eric  Eaton  
PCA

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...
      ...
      1 0 1 0 1 0 0 0 ... ]

Each row of Q corresponds to a feature; keep only the first k columns of Q

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]
Slide  by  Eric  Eaton  
PCA
• Each column of Q gives the weights for a linear combination of the original features

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]

(first column)  =  0.34 feature1 + 0.04 feature2 - 0.64 feature3 + ...
Slide  by  Eric  Eaton  
PCA
• We can apply these formulas to get the new representation for each instance x

X = [ 0 1 0 1 1 0 0 1 ...
      1 1 0 1 1 1 0 0 ...
      0 0 1 1 1 0 0 0 ...      ← x3
      ...
      1 0 1 0 1 0 0 0 ... ]

Q = [  0.34   0.23   0.30   0.23  ...
       0.04   0.13   0.40   0.21  ...
      -0.64   0.93   0.61   0.28  ...
       ...
       0.20   0.83   0.78   0.93  ... ]

• The new 2D representation for x3 is given by:
  x̂31 = 0.34(0) + 0.04(0) - 0.64(1) + ...
  x̂32 = 0.23(0) + 0.13(0) + 0.93(1) + ...

• The re-projected data matrix is given by X̂ = XQ
Slide  by  Eric  Eaton  
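As a concrete check of X̂ = XQ, here is a short NumPy sketch (my own illustration on random data, not the binary matrix from the slides): it keeps the first two columns of a sorted Q and verifies that the 2D representation of x3 is exactly the dot products spelled out above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))         # made-up n x d data matrix
Xc = X - X.mean(axis=0)                   # center, as in the algorithm slide

# Eigenvectors of the covariance, sorted by decreasing eigenvalue
evals, Q = np.linalg.eigh(Xc.T @ Xc / (len(Xc) - 1))
order = np.argsort(evals)[::-1]
Q = Q[:, order]

Q_k = Q[:, :2]                            # keep only the first k = 2 columns of Q
X_hat = Xc @ Q_k                          # re-projected data matrix X_hat = X Q_k (n x 2)

# Each new coordinate is a dot product of an instance with a column of Q,
# e.g. the 2D representation of x3 (the third row of X):
x3_hat = np.array([Xc[2] @ Q[:, 0], Xc[2] @ Q[:, 1]])
print(np.allclose(x3_hat, X_hat[2]))      # True
```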
