SML unit 3
Unsupervised Learning
The main difference between autoencoders and Principal Component Analysis (PCA) is that
while PCA finds the directions along which the data can be projected with maximum
variance, an autoencoder learns to reconstruct the original input given just a compressed version of it.
Anyone who needs the original data can therefore approximately recover it from the
compressed code by running it through the autoencoder's decoder.
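The linear compression-and-reconstruction that PCA performs can be sketched in a few lines of NumPy. This is an illustrative example on toy data, not part of the course material; the choice of 10 features and 3 retained components is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # toy data: 100 samples, 10 features
X = X - X.mean(axis=0)                  # centre the data, as PCA requires

# PCA via SVD: the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 3                                   # keep the top-3 directions
Z = X @ Vt[:k].T                        # compressed representation, shape (100, 3)
X_hat = Z @ Vt[:k]                      # linear reconstruction, shape (100, 10)

print(Z.shape, X_hat.shape)
```

The reconstruction `X_hat` is the best possible *linear* approximation from 3 numbers per sample; an autoencoder replaces this linear projection with a learned nonlinear encoder and decoder.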
If the input fed to the network is 784 pixels (a flattened 28x28 pixel image from
the MNIST dataset), then the first layer of the deep autoencoder should have 1000
nodes; i.e. it is slightly larger than the input.
This may seem counterintuitive, because having more parameters than inputs is a
good way to overfit a neural network.
In this case, however, expanding the number of parameters, and in a sense expanding the
features of the input itself, is what makes the eventual decoding of the autoencoded data possible.
The layers will be 1000, 500, 250 and 100 nodes wide, respectively, until the end,
where the net produces a vector 30 numbers long. This 30-number vector is the
last layer of the first half of the deep autoencoder, the pretraining half, and it is the
product of a normal RBM.
Dept of Networking & Communications
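The encoding half described above can be sketched as a stack of matrix multiplications. This is a shape-checking sketch only: the weights are random stand-ins for RBM-pretrained ones, and the tanh nonlinearity is an assumption, since the text does not specify an activation function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths from the text: 784 -> 1000 -> 500 -> 250 -> 100 -> 30
widths = [784, 1000, 500, 250, 100, 30]

# Random weights stand in for the RBM-pretrained ones
weights = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(widths, widths[1:])]

def encode(x):
    for W in weights:
        x = np.tanh(x @ W)   # nonlinearity per layer; tanh is illustrative
    return x

x = rng.random((1, 784))     # one flattened 28x28 "image"
code = encode(x)
print(code.shape)            # the 30-number encoding
```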
Decoding Representations
Those 30 numbers are an encoded version of the 28x28 pixel image. The second half of the deep
autoencoder learns how to decode this condensed vector, which becomes the input for the journey
back to the full reconstruction.
The decoding half of a deep autoencoder is a feed-forward net with layers 100, 250, 500 and 1000 nodes
wide, respectively. Layer weights are initialized randomly.
784 (output) <---- 1000 <---- 500 <---- 250 <---- 100 <---- 30
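The decoding half can be sketched the same way, mirroring the encoder's widths back out to 784 pixels. As before, the weights are randomly initialised (as the text states) and the tanh hidden activation is an assumption; a sigmoid on the output keeps reconstructed pixel values in (0, 1).

```python
import numpy as np

rng = np.random.default_rng(1)

# Decoder widths from the text: 30 -> 100 -> 250 -> 500 -> 1000 -> 784
widths = [30, 100, 250, 500, 1000, 784]
weights = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(widths, widths[1:])]

def decode(code):
    h = code
    for W in weights[:-1]:
        h = np.tanh(h @ W)                       # hidden layers
    return 1.0 / (1.0 + np.exp(-(h @ weights[-1])))  # sigmoid output in (0, 1)

recon = decode(rng.normal(size=(1, 30)))          # decode one 30-number vector
print(recon.shape)                                # back to 784 pixel values
```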
The decoding half of the deep autoencoder is the part that learns to reconstruct the image. It does so with a
second feed-forward net, also trained by backpropagation. Here backpropagation minimises the
reconstruction entropy, i.e. the cross-entropy between the original input and its reconstruction.
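A common concrete form of reconstruction entropy is the binary cross-entropy between input pixels and reconstructed pixels, both treated as values in [0, 1]. The sketch below is one reasonable reading of the term, not necessarily the exact loss the course has in mind.

```python
import numpy as np

def reconstruction_entropy(x, x_hat, eps=1e-12):
    """Binary cross-entropy between input x and reconstruction x_hat,
    both assumed to lie in [0, 1]; summed over pixels, averaged over samples."""
    x_hat = np.clip(x_hat, eps, 1 - eps)   # avoid log(0)
    per_sample = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat), axis=1)
    return np.mean(per_sample)

rng = np.random.default_rng(0)
x = rng.random((4, 784))                   # four "images" with pixels in (0, 1)
good = reconstruction_entropy(x, x)        # reconstruction matches the input
bad = reconstruction_entropy(x, 1 - x)     # reconstruction is maximally wrong
print(good < bad)
```

Driving this quantity down with backpropagation pushes the decoder's output toward the original pixel values.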