
Dimensionality Reduction

Course: Artificial Intelligence Fundamentals
Instructor: Marco Bonzanini


Machine Learning Tasks

                   Supervised              Unsupervised
Discrete Data      Classification          Clustering
                   (predict a label)       (group similar items)
Continuous Data    Regression              Dimensionality Reduction
                   (predict a quantity)    (reduce n. of variables)

Section Agenda

• Introduction to Dimensionality Reduction

• Dimensionality Reduction with PCA


Introduction to Dimensionality Reduction
Motivation

• Too many dimensions! Do we need all of them?

• Some ML algorithms cannot handle a large number of variables well:
  - they become too slow
  - they become inaccurate

• How do you visualise N dimensions on a 2D screen?


Dimensionality Reduction

• Statistical techniques to reduce the number of dimensions have been
  developed since the early 1900s

• Still extremely valuable in ML nowadays

• Different strategies:
- Feature Selection
- Feature Projection
Feature Selection
• A subset of the original variables is used

• Filter methods: the least interesting variables are suppressed
  (regardless of the model), e.g. using information gain or
  correlation (see the sketch below)

• Wrapper methods: a subset of variables is used to train a model,
  and features are added or removed from this subset iteratively
  (watch out for overfitting)

• Embedded methods: similar to wrapper methods, but an intrinsic
  metric is used when building the model
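As an illustration of a filter method, here is a minimal Python sketch (assuming scikit-learn is available; the iris data set and k=2 are just placeholder choices) that scores each variable with mutual information and keeps the top-scoring ones:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Load a small example data set (4 variables).
X, y = load_iris(return_X_y=True)

# Filter method: score each variable independently of any model, here with
# mutual information (an information-gain-style measure), and keep the k best.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)                       # (150, 4) -> (150, 2)
print("kept variables:", selector.get_support(indices=True))
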
Feature Projection

• Transform the data from a high-dimensional space into a space with
  fewer dimensions

• The extracted set of dimensions is intended to be informative and
  non-redundant

• Example: Principal Component Analysis (PCA), sketched below
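As a quick illustration of feature projection (using PCA, discussed in the next section), a minimal scikit-learn sketch that projects a 64-dimensional example data set down to 2 dimensions, e.g. for plotting; the digits data set is just a convenient stand-in:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Project 64-dimensional data onto 2 new, non-redundant dimensions.
X, y = load_digits(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(X)

print(X.shape, "->", X_2d.shape)   # (1797, 64) -> (1797, 2)
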


Example

From: [figure: the original high-dimensional data]
To: [figure: the same data projected onto fewer dimensions]
Dimensionality Reduction Algorithms
Principal Component Analysis

• Converts a set of observations (with a number of variables) into a
  set of values of linearly uncorrelated variables called principal
  components

• With N samples and P variables there are at most min(N-1, P)
  principal components (see the quick check below)

• But how do we find these variables / dimensions?
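A quick numerical check of the min(N-1, P) bound, sketched with scikit-learn on random data (the shapes are only for illustration):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# N = 5 samples, P = 10 variables.
X = rng.normal(size=(5, 10))

pca = PCA().fit(X)

# scikit-learn keeps min(N, P) = 5 directions by default, but after centring
# only min(N-1, P) = 4 of them can carry any variance: the last explained
# variance is numerically zero.
print(pca.components_.shape)                 # (5, 10): each row is a linear combination of the P variables
print(np.round(pca.explained_variance_, 6))  # last value is approximately 0
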


PCA by Example
Suppose we have a data set:
N samples with 2 variables

VAR 1   VAR 2
12      32
54      56
34      34
…       …

at most min(N-1, 2) principal components
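To make the walkthrough below concrete, here is the toy data set as a NumPy array; only the first three rows come from the slide, and the remaining rows are made-up values added purely so that later snippets have something to run on:

import numpy as np

# N samples with 2 variables (var 1, var 2).
X = np.array([
    [12, 32],
    [54, 56],
    [34, 34],
    [20, 25],   # hypothetical row
    [45, 48],   # hypothetical row
], dtype=float)

N, P = X.shape
print("at most", min(N - 1, P), "principal components")   # 2
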


PCA by Example
[figure: scatter plot of the data set, var 1 on the x-axis and var 2 on the y-axis]

PCA by Example
Take the average for the first dimension
[figure: scatter plot with the average of var 1 marked]

PCA by Example
and for the other dimension
[figure: scatter plot with the average of var 2 marked]

PCA by Example
centre of the data set
[figure: scatter plot with the point (average of var 1, average of var 2) marked]

PCA by Example
shift the data set so that the centre corresponds to the origin
[figure: the data set translated so that its centre sits at the origin]

PCA by Example
Relative positions: still the same
[figure: the centred scatter plot, identical to the original up to translation]
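The centring step in code, continuing the sketch above (X is the toy array defined earlier):

import numpy as np

# Subtract the per-variable mean: the centre of the data set moves to the
# origin, while the relative positions of the points stay the same.
centre = X.mean(axis=0)
X_centred = X - centre

print(np.allclose(X_centred.mean(axis=0), 0.0))   # True
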
PCA by Example
Fit a line that goes through the origin
[figure: centred data with a candidate line through the origin]

PCA by Example
Start random
[figure: an initial, randomly oriented line through the origin]

PCA by Example
Rotate the line
[figure: the line rotating around the origin]

PCA by Example
until you find the best fit
[figure: the best-fitting line through the origin]

PCA by Example

How to decide what’s a good fit?


PCA by Example
Find the projections
[figure: each data point projected onto the candidate line]

PCA by Example
- minimise: d(point, line)
- maximise: d(projection, origin)
[figure: for each point, its distance to the line and the distance of its projection from the origin]

PCA by Example
• Minimising the distance from the point to the line and maximising
  the distance from the projection to the origin are equivalent
  (why? the distance from each point to the origin is fixed, and by
  Pythagoras its square is the sum of the other two squared distances,
  so one sum goes down exactly when the other goes up)

• Intuitively, it makes sense to minimise the point-to-line distance,
  but in practice it can be easier to calculate the
  projection-to-origin distance

• Use the sum of squared distances (the individual signed distances
  could be negative)
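A small numerical check of this equivalence, continuing the sketch above (X_centred is the centred toy data; the 30-degree direction is an arbitrary candidate line):

import numpy as np

theta = np.deg2rad(30.0)                       # arbitrary candidate direction
u = np.array([np.cos(theta), np.sin(theta)])   # unit vector along the line

proj = X_centred @ u                           # signed projection-to-origin distances
dist = X_centred @ np.array([-u[1], u[0]])     # signed point-to-line distances

# Pythagoras: point-to-origin^2 = projection-to-origin^2 + point-to-line^2.
# The left-hand side does not depend on the line, so maximising the sum of
# squared projections is the same as minimising the sum of squared
# point-to-line distances.
print(np.allclose(proj**2 + dist**2, (X_centred**2).sum(axis=1)))   # True
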
PCA by Example
This line maximises the sum of squared projection-to-origin distances
[figure: the best-fitting line through the centred data]

PCA by Example
This line represents the first principal component (or PC1)
[figure: the same line, labelled PC1]
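Putting the "rotate until best fit" idea into code: a brute-force sketch that tries many directions and keeps the one with the largest sum of squared projection-to-origin distances, which is (up to sign) the PC1 direction; X_centred is reused from the sketch above:

import numpy as np

angles = np.linspace(0.0, np.pi, 1800)
directions = np.column_stack([np.cos(angles), np.sin(angles)])   # candidate unit vectors

ssd = ((X_centred @ directions.T) ** 2).sum(axis=0)   # one score per candidate line
pc1_approx = directions[np.argmax(ssd)]

print("approximate PC1 direction:", pc1_approx)
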
PCA by Example

• A Principal Component is a linear combination of the original
  variables

• We are essentially maximising the spread (variance) of the
  projection

• Note: principal components are orthogonal to each other
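In practice these directions can be read off a singular value decomposition of the centred data; a minimal sketch checking that the components are orthogonal unit vectors (X_centred as above):

import numpy as np

# Rows of Vt are the principal directions, ordered by the variance of the
# projections onto them.
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]

print(np.isclose(pc1 @ pc2, 0.0))            # True: orthogonal
print(np.isclose(np.linalg.norm(pc1), 1.0))  # True: unit length

# Each component is a linear combination of the original variables:
print("PC1 =", round(pc1[0], 3), "* var1 +", round(pc1[1], 3), "* var2")
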


PCA by Example
In 2D, it's easy to find PC2 (do we really need it?)
[figure: the centred data with PC1 and PC2 drawn as orthogonal lines]

PCA by Example
Finding the values: rotate the PCs so that PC1 is horizontal
[figure: the data and both principal components rotated step by step until PC1 lies along the horizontal axis]
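These values are just the coordinates of each point along the principal components; a sketch of computing them directly, checked against scikit-learn's transform (X, X_centred and Vt come from the earlier sketches):

import numpy as np
from sklearn.decomposition import PCA

# "Rotate so that PC1 is horizontal" expressed as a projection onto the PCs.
scores_manual = X_centred @ Vt.T

pca = PCA(n_components=2).fit(X)
scores_sklearn = pca.transform(X)

# The two agree up to the (arbitrary) sign of each component.
print(np.allclose(np.abs(scores_manual), np.abs(scores_sklearn)))   # True
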
PCA Discussion

• The components are ordered by variance, i.e. the first component is
  the one with the highest variance

• How many components should we keep? (see notebook, and the sketch
  below)

• What do the new dimensions mean?

• Objects that are similar in high-dimensional space should also be
  similar in the transformed space
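One common way to answer the "how many components?" question is to look at the cumulative explained variance ratio; a minimal scikit-learn sketch (the digits data set and the 95% threshold are just illustrative choices):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)            # 64-dimensional example data

pca = PCA().fit(X)

# Cumulative share of the total variance captured by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Keep enough components to explain, say, 95% of the variance.
k = int(np.argmax(cumulative >= 0.95)) + 1
print("components needed for 95% of the variance:", k)

# Equivalently, PCA(n_components=0.95) selects this number automatically.
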
Questions?
