
5. Dimensionality Reduction

COMP3314 Machine Learning

Motivation
● Many ML problems have thousands or even millions of features
● As a result, the problem can become intractable
○ Training is slow
○ Finding a solution is difficult (Curse of Dimensionality)
○ Data visualization is impossible
● Solution
○ Dimensionality Reduction using feature extraction
○ Often possible without losing much relevant information
■ E.g., Merge neighboring pixels of the MNIST dataset

The Curse of Dimensionality



High Dimensional Weirdness


● 2D
○ A point picked at random in a unit square has a <0.4% chance of
being located <0.001 from a border
● 10,000D
○ A point picked at random in a unit hypercube has a >99.99999%
chance of being located <0.001 from a border
● I.e., the high-dimensional unit hypercube can be said to consist
almost entirely of borders with almost no middle

High Dimensional Weirdness


● 2D
○ Pick two random points in a unit square
○ The distance between them is about 0.52 on average
● 1,000,000D
○ Pick two random points in a unit hypercube
○ The distance between them is about 408.25 on average (a quick
simulation below illustrates this)
● How can two points be so far apart when they both lie within the
same unit hypercube?
● As a result, new test samples will likely be far away from training
samples in high dimensional space
○ Overfitting risk is much higher in high-dimensional space
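The average distances quoted above follow from the geometry of the unit hypercube (the mean grows roughly as sqrt(d/6), which gives 408.25 for d = 1,000,000). A minimal Monte Carlo sketch, not part of the course notebooks, can check this for smaller d (d = 1,000,000 is too large to simulate directly here):

```python
# Estimate the average distance between two uniformly random points
# in a d-dimensional unit hypercube.
import numpy as np

rng = np.random.default_rng(42)

def avg_pair_distance(d, n_pairs=100_000):
    """Mean Euclidean distance between two random points in [0, 1]^d."""
    a = rng.random((n_pairs, d))
    b = rng.random((n_pairs, d))
    return np.linalg.norm(a - b, axis=1).mean()

print(avg_pair_distance(2))                      # ~0.52
print(avg_pair_distance(10_000, n_pairs=1_000))  # ~40.8, i.e. roughly sqrt(d / 6)
```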

Idea: Projection
● In most problems, training
instances are not spread out
uniformly across all dimensions
● Many features are almost
constant, while others are highly
correlated
● As a result, all training instances
lie within (or close to) a much
lower-dimensional subspace of
the high-dimensional space

When Projection Fails


● Projection is not always the
best approach
● Consider the following toy
dataset to illustrate this
problem
○ The Swiss roll
● Simply projecting onto a plane
(e.g., dropping x3) would squash
the different layers of the roll together

Solution: Manifold Learning


● The Swiss roll is an example of a 2D manifold
○ A 2D manifold is a 2D shape that can be bent and twisted in a
higher-dimensional space
● It is possible to learn the manifold on which the training instances
lie and then to unroll the Swiss roll

Manifold Learning
● Note
○ The decision boundary may not always be simpler in lower
dimensions

Outline
● PCA
○ Principal Component Analysis
○ Projects data points onto (few) principal components
● LLE
○ Locally Linear Embedding
○ Powerful nonlinear dimensionality reduction technique
○ Manifold Learning technique that does not rely on projections

PCA - Principal Component Analysis


● By far the most popular dimensionality reduction algorithm
● Identifies a hyperplane and then projects data onto it
How to choose the hyperplane?

Preserving the Variance


● Select the axis that preserves the maximum amount of variance
○ I.e., the one that loses the least information compared with other projections
● Example: projecting the same 2D data onto three different 1D hyperplanes (axes)
○ One axis preserves the maximum variance
○ A second preserves an intermediate amount of variance
○ The third preserves very little variance

Principal Components (PC)


● The first PC is the axis that accounts for the
largest amount of variance
○ E.g., PC1 in the figure
● The second PC is orthogonal to the first one
and accounts for the largest amount of
remaining variance
○ E.g., PC2 in the figure
○ In this 2D example there is no choice
● In a higher-dimensional dataset, the third PC
would be orthogonal to both previous axes,
then a fourth, a fifth, and so on, with as many
PCs as there are dimensions in the dataset

How to find PCs?


● There is a standard matrix factorization technique
called Singular Value Decomposition (SVD)
● It decomposes the training set matrix X into the
matrix multiplication of three matrices
X = U Σ V⊺, where V contains the unit vectors
that define all the principal components that we
are looking for
● Note that PCs are highly sensitive to data scaling
● We need to standardize the features prior to PCA
if the features were measured on different scales

PCA - Principal Component Analysis


● An unsupervised linear transformation technique
○ Finds PCs
■ Using e.g., SVD
○ Projects data onto a subspace with fewer (or equal) dimensions using
some (or all) of the found PCs
■ Multiply the original data by a transformation matrix whose
columns are some (or all) of the PCs

Projecting Down to k Dimensions


● Once you have identified all the principal components, you can reduce the
dimensionality of the dataset down to k dimensions by projecting it onto
the hyperplane defined by the first k principal components
● To project the training set onto the hyperplane and obtain a reduced
dataset of dimensionality k, compute the matrix multiplication of the
training set vector (or matrix) x (or X) by the matrix W, defined as the
matrix containing the first k columns of V
● W is a d × k transformation matrix
○ Maps a d-dimensional row vector x to a
k-dimensional vector z = x W (or Z = X W for the whole training set)
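A minimal sketch of this projection step (the toy data and variable names here are illustrative, not taken from the course notebook):

```python
import numpy as np

# Toy data: any (m x d) matrix works; PCA assumes the data is centered
X = np.random.rand(100, 5)
X_centered = X - X.mean(axis=0)

# SVD: X_centered = U @ np.diag(s) @ Vt, where the rows of Vt are the unit
# vectors defining the principal components
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2
W = Vt.T[:, :k]        # d x k matrix whose columns are the first k PCs
Z = X_centered @ W     # m x k projection of the training set
```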

Code - PCA.ipynb
● Available here on Colab

Load and Standardize Data


● Let’s apply PCA on the wine dataset
○ Load the wine dataset and split it into separate train and test sets
○ Standardize the (d=13)-dimensional dataset
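A hedged sketch of these two steps, assuming the copy of the Wine dataset bundled with scikit-learn and a 70/30 split (the course notebook may load the data and split it differently):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)      # 178 samples, d = 13 features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)   # fit the scaler on the training data only
X_test_std = sc.transform(X_test)         # reuse the same scaling for the test data
```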

Projecting Down

(notebook code: manual projection of the standardized wine data onto the first principal components; see the SVD sketch above)

Using Scikit-Learn’s PCA


● Scikit-Learn’s PCA class uses SVD decomposition to implement
PCA
○ Just like we did manually
● The following code applies PCA to reduce the dimensionality of the
dataset down to two dimensions
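A minimal sketch of that step, reusing X_train_std and X_test_std from the standardization sketch above (variable names are assumptions, not the notebook's exact code):

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)   # fit the PCs on the training data, then project
X_test_pca = pca.transform(X_test_std)         # project the test data with the same PCs
```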

Explained Variance Ratio


● Another useful piece of information is the explained variance ratio
of each principal component
○ Available via the explained_variance_ratio_ variable
● The ratio indicates the proportion of the dataset’s variance that lies
along each principal component

● This output tells us that about 36.9% of the dataset's variance lies along
the first PC and about 18.4% lies along the second PC, and so on
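A short sketch of inspecting the ratios, continuing the wine example above (the approximate values in the comment are the ones quoted on the slide):

```python
from sklearn.decomposition import PCA

pca_full = PCA()              # keep all 13 components
pca_full.fit(X_train_std)     # X_train_std: standardized wine training data from above
print(pca_full.explained_variance_ratio_)
# Per the slide, the first two entries are roughly 0.369 and 0.184 for this dataset
```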

Choosing the Right Number of Dimensions


● Choose the number of dimensions that add up to a sufficiently large
portion of the variance (e.g., 95%)
○ Unless, of course, you are reducing dimensionality for data
visualization—in that case you will want to reduce the dimensionality
down to 2 or 3
● The following code performs PCA without reducing dimensionality, then
computes the minimum number of dimensions required to preserve 90%
of the training set’s variance
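A hedged sketch of that procedure (X_train_std is the standardized training data from the earlier sketches; names are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

pca = PCA()                        # no dimensionality reduction yet
pca.fit(X_train_std)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.90) + 1  # smallest number of dims preserving 90% of the variance
print(d)
```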

Choosing the Right Number of Dimensions


● You could then set n_components=k and run PCA again
○ But there is a much better option: instead of specifying the
number of principal components you want to preserve, you can
set n_components to be a float between 0.0 and 1.0, indicating
the ratio of variance you wish to preserve
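For example, a minimal sketch of that option (again assuming the standardized training data from above):

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=0.90)               # preserve 90% of the variance
X_reduced = pca.fit_transform(X_train_std)
print(pca.n_components_)                   # number of components actually kept
```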

Choosing the Right Number of Dimensions


● Yet another option is to plot the explained variance as a function of
the number of dimensions
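A minimal plotting sketch, reusing cumsum from the sketch above (styling details are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

plt.plot(np.arange(1, len(cumsum) + 1), cumsum, marker="o")
plt.axhline(y=0.90, linestyle="--")            # target variance threshold
plt.xlabel("Number of dimensions")
plt.ylabel("Cumulative explained variance")
plt.show()
```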

PCA for Compression


● Let’s apply PCA to the MNIST dataset while preserving 90% of its variance
○ 87 features instead of the original 784 features
○ This size reduction can speed up a classification algorithm (such as an SVM
classifier) tremendously
● It is also possible to decompress the reduced dataset back to 784 dimensions
○ This won’t give you back the original data, since the projection lost a bit of
information (within the 10% variance that was dropped)
○ The following code compresses the MNIST dataset down to 87 dimensions, then
uses the inverse_transform() method to decompress it back to 784 dimensions
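A hedged sketch of that round trip (MNIST is fetched via OpenML here; the notebook may load it differently, and the exact dimension count can vary slightly):

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

mnist = fetch_openml("mnist_784", as_frame=False)
X_train = mnist.data[:60_000]

pca = PCA(n_components=0.90)                    # keep 90% of the variance (~87 dims per the slide)
X_reduced = pca.fit_transform(X_train)          # compress: 784 -> ~87 dimensions
X_recovered = pca.inverse_transform(X_reduced)  # decompress back to 784 dimensions (lossy)
```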

PCA for Compression

(figure: original MNIST digits compared with digits decompressed after the 90%-variance projection)

Randomized PCA
● If you set the svd_solver hyperparameter to "randomized", Scikit-Learn uses a stochastic
algorithm called Randomized PCA that quickly finds an approximation of the first k
principal components
○ It is dramatically faster than full SVD when k is much smaller than d
● By default, svd_solver is actually set to "auto"
○ Scikit-Learn automatically uses the randomized algorithm if d is greater than
500 and k is less than 80% of d; otherwise it uses the full SVD approach
○ If you want to force Scikit-Learn to use full SVD, you can set the svd_solver
hyperparameter to "full"
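A one-line sketch of forcing the randomized solver (X_train as in the MNIST sketch above; the random_state value is an assumption):

```python
from sklearn.decomposition import PCA

rnd_pca = PCA(n_components=87, svd_solver="randomized", random_state=42)
X_reduced = rnd_pca.fit_transform(X_train)   # X_train: e.g., the MNIST images from above
```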

Incremental PCA
● The previous PCA implementations require the whole training set to fit in memory
● Incremental PCA (IPCA) allows you to feed the algorithm one mini-batch at a time
○ Useful for large training sets and online training (i.e., on the fly, as new data arrive)
● The following code splits the MNIST dataset into 100 mini-batches (using NumPy’s
array_split() function) and feeds them to Scikit-Learn’s IncrementalPCA class
○ Note that you must call the partial_fit() method with each mini-batch, rather than the
fit() method with the whole training set
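A minimal sketch along those lines, reusing the MNIST X_train from the sketches above:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_batches = 100
inc_pca = IncrementalPCA(n_components=87)
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)        # feed one mini-batch at a time

X_reduced = inc_pca.transform(X_train)  # project the full set once the PCs are learned
```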

Outline
● PCA
○ Principal Component Analysis
○ Projects data points onto (few) principal components
● LLE
○ Locally Linear Embedding
○ Powerful nonlinear dimensionality reduction technique
○ Manifold Learning technique that does not rely on projections

Code - LLE.ipynb
● Available here on Colab

LLE
● How it works
○ Measures how each training instance linearly relates to its
closest neighbors
○ Then looks for a low-dimensional representation of the training
set where these local relationships are best preserved
● This approach makes it particularly good at unrolling twisted
manifolds, especially when there is not too much noise

Example: Unrolling the Swiss roll
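A minimal sketch of this example (the dataset generation and parameter values are assumptions, not the course notebook's exact code):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)   # 3D Swiss roll -> 2D "unrolled" embedding
```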



LLE - Details
● For each training sample x(i), the algorithm identifies its
n_neighbors closest neighbors
○ E.g., n_neighbors = 10
● Then it tries to reconstruct x(i) as a linear function of these
neighbors
● More specifically, it finds the weights wi,j such that the squared
distance between x(i) and ∑j wi,j x(j) is as small as possible,
assuming wi,j = 0 if x(j) is not one of the n_neighbors closest neighbors of x(i)

LLE - Details
● Thus the first step of LLE is the constrained optimization problem
below, where W is the weight matrix containing all the weights wi,j
● The second constraint simply normalizes the weights for each
training instance x(i)
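The equation itself did not survive the export; reconstructed from the standard LLE formulation (matching the description above), the first step is:

```latex
\hat{W} = \underset{W}{\operatorname{argmin}}
          \sum_{i=1}^{m} \Bigl\| \mathbf{x}^{(i)} - \sum_{j=1}^{m} w_{i,j}\,\mathbf{x}^{(j)} \Bigr\|^{2}
\quad \text{subject to} \quad
\begin{cases}
  w_{i,j} = 0 & \text{if } \mathbf{x}^{(j)} \text{ is not one of the closest neighbors of } \mathbf{x}^{(i)} \\
  \sum_{j=1}^{m} w_{i,j} = 1 & \text{for } i = 1, 2, \dots, m
\end{cases}
```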

LLE - Details
● After this step, the weight matrix Ŵ (containing the weights ŵi,j)
encodes the local linear relationships between the training instances
● The second step is to map the training instances into a k-
dimensional space (where k < d) while preserving these local
relationships as much as possible
● If z(i) is the image of x(i) in this k-dimensional space, then we want
the squared distance between z(i) and ∑j ŵi,j z(j) to be as small as possible

LLE - Details
● This idea leads to the following unconstrained optimization
problem
● It looks very similar to the first step, but instead of keeping the
instances fixed and finding the optimal weights, we are doing the
reverse
○ Keeping the weights fixed and finding the optimal position of
the instances’ images in the low-dimensional space
● Note that Z is the matrix containing all z(i)
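Reconstructed in the same way as the first-step equation, the second-step objective is:

```latex
\hat{Z} = \underset{Z}{\operatorname{argmin}}
          \sum_{i=1}^{m} \Bigl\| \mathbf{z}^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\,\mathbf{z}^{(j)} \Bigr\|^{2}
```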

Other Dimensionality Reduction Techniques


● There are many other dimensionality reduction techniques, several
of which are available in Scikit-Learn
● Here are some of the most popular ones
○ Random Projections
○ Multidimensional Scaling (MDS)
○ Isomap
○ t-Distributed Stochastic Neighbor Embedding (t-SNE)
○ Linear Discriminant Analysis (LDA)

References
● Most materials in this chapter are
based on
○ Book
○ Code

References
● Some materials in this chapter
are based on
○ Book
○ Code

Exercise 1
● What are the main motivations for reducing a dataset’s dimensionality?
○ What are the main drawbacks?
● What is the curse of dimensionality?
● Once a dataset’s dimensionality has been reduced, is it possible to reverse
the operation?
○ If so, how? If not, why?
● Can PCA be used to reduce the dimensionality of a highly nonlinear
dataset?
● Suppose you perform PCA on a 1,000-dimensional dataset, setting the
explained variance ratio to 95%
○ How many dimensions will the resulting dataset have?

Exercise 2
● In what cases would you use vanilla PCA, Incremental PCA,
Randomized PCA?
● How can you evaluate the performance of a dimensionality
reduction algorithm on your dataset?
● Does it make any sense to chain two different dimensionality
reduction algorithms?

Exercise 3
● Load the MNIST dataset and split it into a training set and a test set (take the first
60,000 instances for training, and the remaining 10,000 for testing)
● Train a Random Forest classifier on the dataset and time how long it takes, then
evaluate the resulting model on the test set
● Next, use PCA to reduce the dataset’s dimensionality, with an explained variance
ratio of 95%
● Train a new Random Forest classifier on the reduced dataset and see how long it
takes
● Was training much faster?
● Next, evaluate the classifier on the test set
● How does it compare to the previous classifier?
