Autoencoders and Principal Components Analysis (PCA)
One of the purposes of machine learning is to automatically learn how to use data, without
writing code by hand. When we started the course with linear regression, we saw that we
could represent complicated functions if we hand-engineered features (or basis functions).
Those functions can then be turned into “neural networks”, where — given enough labelled
data — we can learn the features that are useful for classification automatically.
For some data science tasks the amount of labelled data is small. In these situations it is
useful to have pre-existing basis functions that were fitted as part of solving some other task.
We can then fit a linear regression model on top of these basis functions. Or perhaps use the
basis functions to initialize a neural network, and only train for a short time.
The basis functions could come from fitting another supervised task. For example, neural
networks trained on the large ImageNet dataset are often used to initialize the training of
image recognition models for tasks with only a few labels.
We may also wish to use completely unlabelled data, such as relevant but unannotated
images or text. Recently (2018–), there has been an explosion of interest in Natural Language
Processing in using pre-trained deep neural networks based on unlabelled data. See the
Further Reading for papers.
1 Autoencoders
Autoencoders solve an “unsupervised” task: find a representation of feature vectors, without
any labels. This representation might be useful for other tasks. An autoencoder is a neural
network representing a vector-valued function, which when fitted well, approximately
returns its input:
f(x) ≈ x. (1)
If we were allowed to set up the network arbitrarily, this function is easy to represent. For
example, we could use a single "weight matrix" set to the identity:
f(x) = Wx, with W = I. (2)
The task only becomes interesting if the network is restricted. A common choice is to force
the data through a narrow hidden layer of K < D units:
h = g^(1)(W^(1) x + b^(1)) (3)
f = g^(2)(W^(2) h + b^(2)), (4)
where W^(1) is a K × D weight matrix, and the g's are element-wise functions. If the function
output manages to closely match its inputs, then we have a good lossy compressor. The
network can compress D numbers down into K numbers, and then decode them again,
approximately reconstructing the original input.
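As a rough sketch of equations (3)–(4) in NumPy, with random placeholder weights standing in for fitted parameters and tanh as one possible choice of non-linearity:
import numpy as np

D, K = 10, 2                                        # input size and bottleneck size
rng = np.random.default_rng(0)

# Placeholder parameters; fitting would tune these to minimize the
# reconstruction error sum((f - x)**2) over the training inputs.
W1, b1 = rng.standard_normal((K, D)), np.zeros(K)   # encoder parameters, eq. (3)
W2, b2 = rng.standard_normal((D, K)), np.zeros(D)   # decoder parameters, eq. (4)

x = rng.standard_normal(D)                          # an example input vector
h = np.tanh(W1 @ x + b1)                            # K-dimensional code, eq. (3)
f = W2 @ h + b2                                     # reconstruction, eq. (4) with identity g^(2)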
One application of dimensionality reduction is visualization. When K = 2 we can plot our
transformed data as a 2-dimensional scatter-plot.
When an autoencoder works well, the transformed values h contain most of the information
from the original input. We should therefore be able to use these transformed vectors as
input to a classifier instead of our original data. It might then be possible to fit a classifier
using less labelled data, because we are fitting a function with lower-dimensional inputs.
Σ = QΛQ⊤, (5)
where Λ is a diagonal matrix containing the eigenvalues of Σ, and the columns of Q contain
the eigenvectors of Σ.
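As a quick numerical check of this decomposition, using a made-up data matrix:
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))              # made-up N x D data
Sigma = np.cov(X, rowvar=False)                # D x D covariance matrix

evals, Q = np.linalg.eigh(Sigma)               # eigenvalues (ascending) and eigenvectors
assert np.allclose(Sigma, Q @ np.diag(evals) @ Q.T)   # Sigma = Q Lambda Q^T, eq. (5)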
[Figure: scatter plot for K = 1. The '+' markers show the data X, the '·' markers show the reconstructions X_proj, and the red line shows the principal direction V[:,0]; both axes run from −1 to 1.]
The two-dimensional coordinate of each + is reduced to one number, giving the position
along the red line that it has been projected onto (the principal component). Transforming
back up to two dimensions gives the coordinates of the •’s in the full 2-dimensional space.
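A similar picture can be produced with a few lines of NumPy; the data here are made up, and the principal direction is taken from the eigendecomposition of the covariance:
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 2)) @ np.array([[1.0, 0.6], [0.0, 0.3]])  # made-up 2D data
x_bar = X.mean(0)

evals, V = np.linalg.eigh(np.cov(X - x_bar, rowvar=False))
v1 = V[:, -1]                        # principal direction (largest eigenvalue)

z = (X - x_bar) @ v1                 # each point reduced to one number (K = 1)
X_proj = np.outer(z, v1) + x_bar     # coordinates mapped back into 2 dimensions

Plotting X as '+', X_proj as '·', and the line through x_bar along v1 gives a figure like the one above.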
3. The data might not be Gaussian distributed, so this summary could be misleading, just as the standard deviation
can be a misleading indicator of width for a 1D distribution.
4 PCA Examples
PCA is widely used across many different types of data. It can give a quick first visualization
of a dataset, or reduce the number of dimensions of a data matrix if overfitting or
computational cost are concerns.
An example where we expect data to be largely controlled by a few numbers is body
shape. The location of a point on a triangular mesh representing a human body is strongly
constrained by the surrounding mesh-points, and could be accurately predicted with linear
regression. PCA describes the principal ways in which variables can jointly change when
moving away from the mean object.4 The principal components are often interpretable,
and can be animated. Starting at a mean body mesh, one can move along each of the
principal components, showing taller/shorter people, and then thinner/fatter people. The
later principal components will correspond to more subtle, less interpretable combinations
of features that covary.
A striking PCA visualization was obtained by reducing the dimensionality of ≈ 200,000
features of people’s DNA to two dimensions (Novembre et al., 2008).5 The coordinates along
the two principal axes closely correspond to a map of Europe showing where the people
came from(!). The people were carefully chosen.
As is often the case with useful algorithms, we can choose how to put our data into them, and
solve different tasks with the same code. Given an N × D matrix, we can run PCA to visualize
the N rows. Or we can transpose the matrix and instead visualize the D columns. As an
example, we took a binary S × C matrix M relating students and courses, with Msc = 1 if student
s was taking course c. In terms of these features, each course is a length-S vector, or each
student is a length-C vector. We can reduce either of these sets of vectors to 2-dimensions
and visualize them.
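A minimal sketch of both uses, with a random binary matrix standing in for the real student–course data:
import numpy as np

def pca_2d(A):
    # 2D embedding of the rows of A: centre, then keep the top two SVD components.
    U, s, VT = np.linalg.svd(A - A.mean(0), full_matrices=False)
    return U[:, :2] * s[:2]

rng = np.random.default_rng(3)
M = (rng.random((200, 30)) < 0.2).astype(float)  # made-up S x C student-course matrix

student_points = pca_2d(M)     # one 2D point per student (rows of M)
course_points = pca_2d(M.T)    # one 2D point per course (columns of M)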
The 2D scatter plot of courses was somewhat interpretable:
[Figure: 2D scatter plot of courses along the first two principal directions, with points labelled by course acronym: CPSLP, MT, ANLP, NLU, SProc, ASR, ALE1, TCM, CCS, MASWS, MI, CCN.]
4. While they’re doing something a little more complicated, you can get an idea of what the principal components
of body shape look like from the figures in the paper: Lie bodies: a manifold representation of 3D human shape,
Freifeld and Black, ECCV 2012.
5. Genes mirror geography within Europe. https://www.nature.com/articles/nature07331
Finally, PCA doesn’t always work well. One of the papers that helped convince people that
it was feasible to fit deep neural networks showed impressive results with non-linear deep
autoencoders in cases where PCA worked poorly: Reducing the dimensionality of data with
neural networks, Hinton and Salakhutdinov (2006). Science, Vol. 313, no. 5786, pp. 504–507,
28 July 2006. Available from https://www.cs.utoronto.ca/~hinton/papers.html
5 Pre-processing matters
The units that data are measured in affect the principal components. Given the ages, weights,
and heights of people, it matters if their height is measured in centimeters or meters. The
numbers are 100 times bigger if we use centimeters, making the square error for an equivalent
mistake in reconstructing height 10,000 times bigger. Therefore, if we use centimeters the
principal component will be more aligned with height to reduce overall square error than
if we use meters. To give each feature similar importance, it’s common to standardize all
features so they have unit standard deviation, but the best scaling could depend on the
application.
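As a small illustration with made-up numbers, dividing each column by its standard deviation removes the dependence on the choice of units:
import numpy as np

def standardize(X):
    # Scale each feature (column) to have unit standard deviation.
    return X / X.std(0)

# Columns: age (years), weight (kg), height in metres vs centimetres.
X_m = np.array([[25, 70.0, 1.80], [40, 55.0, 1.65], [31, 80.0, 1.75]])
X_cm = X_m * np.array([1.0, 1.0, 100.0])
assert np.allclose(standardize(X_m), standardize(X_cm))  # same features either way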
Given DNA data, xd ∈ {A, C, G, T}, we have to decide how to encode categorical data. We
could use one-hot encoding. In the example above, Novembre et al. used a lossy binary
encoding indicating if the subject had the most common letter or not.
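A sketch of the two encodings for a single DNA position, using made-up letters:
import numpy as np

letters = np.array(list("ACGT"))
x = np.array(list("AACGTAAC"))                   # made-up letters for one position

# One-hot: each letter becomes a length-4 binary vector (N x 4 matrix).
one_hot = (x[:, None] == letters[None, :]).astype(float)

# Lossy binary encoding: 1 if the subject has the most common letter here.
values, counts = np.unique(x, return_counts=True)
binary = (x == values[np.argmax(counts)]).astype(float)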
As usual, given positive data, we may wish to take logarithms. There are lots of free choices
in data analysis.
The truncated singular value decomposition (SVD) factorizes an N × D matrix as X ≈ USV⊤,
where U has size N × K, S is a diagonal K × K matrix, and V⊤ has size K × D. The columns of
the V matrix (or the rows of V⊤) contain eigenvectors of X⊤X. The columns of U contain
eigenvectors of XX⊤. The rows of U give a K-dimensional embedding of the rows of X. The
columns of V⊤ (or the rows of V) give a K-dimensional embedding of the columns of X.
When K = min(N, D), the SVD exactly reconstructs the matrix. For smaller K, the truncated SVD is
known to be the best low-rank approximation of a matrix, as measured by square error.
When applied to centred data (the mean feature vector has been subtracted from every row of
X, so that ∑n Xnd = 0 for each feature d), SVD gives the same solution as PCA. The V matrix
contains the eigenvectors of the covariance (Σ = (1/N) X⊤X, where the 1/N scaling makes no
difference to the directions). The U matrix contains the eigenvectors of the covariance if we
were to transpose our data matrix before applying PCA.
Python demo:
import numpy as np

# PCA via SVD, for an NxD matrix X reduced to K dimensions.
# (Random example inputs, assumed here so the snippet runs on its own.)
N, D, K = 100, 5, 2
X = np.random.randn(N, D)

x_bar = np.mean(X, 0)                             # mean feature vector
U, vecS, VT = np.linalg.svd(X - x_bar, full_matrices=False)  # SVD of centred data
U = U[:, :K]         # NxK "datapoints" transformed into K-dims
vecS = vecS[:K]      # The diagonal elements of diagonal matrix S, in a vector
V = VT[:K, :].T      # DxK "features" transformed into K-dims
X_kdim = U * vecS    # = np.dot(U, np.diag(vecS))
X_proj = np.dot(X_kdim, V.T) + x_bar              # SVD approx USV' + mean
8 Further reading
Different tutorials will focus on different use-cases of PCA. Some practitioners are mostly
interested in reducing the dimensionality of their data. Others are interested in inspecting
and interpreting the components.
You may also find that different tutorials put different emphasis on the two different
principles from which PCA can be derived: 1) Auto-encoding / error minimization: PCA gives the K-dimensional linear projection that minimizes the square error of the reconstructed data. 2) Variance maximization: the principal components are the orthogonal directions along which the projected data have the largest variance.
6. https://homepages.inf.ed.ac.uk/imurray2/pub/14dnade/
7. https://arxiv.org/abs/1810.04805
8. https://thegradient.pub/nlp-imagenet/
9. https://blog.google/products/search/search-language-understanding-bert