An autoencoder is a type of neural network that learns a compressed representation of the input data. It consists of an encoder that compresses the input into a latent representation, and a decoder that tries to reconstruct the input from the latent representation. The autoencoder is trained to minimize the reconstruction error between the input and output. This allows it to learn an efficient encoding of the input data distribution. Autoencoders can be used for dimensionality reduction by learning representations using fewer dimensions than the input. They have applications in tasks like denoising, classification using the learned representations, and generating new data samples from the learned distribution.


Autoencoders

A function has a structure and a set of parameters.

f = ax²
Structure: square of 'x'
Parameter: {a}

f = ae^(bx) + cx² + dx + k sin(x)
Structure: Exponential + Polynomial + Trigonometric
Parameters: {a, b, c, d, k}
A function:

x ∈ ℝ^n → f → f(x) ∈ ℝ^m

Let us capture the structure (S) and the parameters (P) explicitly.

x ∈ ℝ^n → f(S; P) → y = f(x) ∈ ℝ^m

What is the "training part" of the learning?

Find 'P' given the input, the output, and 'S'.

We should know how far we are from the output. Hence, we need 'something' to minimize or maximize!

x ∈ ℝ^n → f(S; P) → y = f(x) ∈ ℝ^m

What is a neural network?

S: number of layers, neurons, etc.
P: weights (w)

Find 'P' such that the 'error' is minimized or the 'likelihood' is maximized.

x ∈ ℝ^n → f(S; P) → y = f(x) ∈ ℝ^m

Let us ask: if 'y' = 'x', what is the meaning of 'f'?   y = x ∈ ℝ^n

Hold on for some time. We will see why this question is important.

x ∈ ℝ^n → f → y = x ∈ ℝ^n

The function does not give any 'insight'.

We do not want this. We want something else which is meaningful. What to do?

The single block (or one function) is preventing us from moving to a "more meaningful" structure.

Change the structure then!!!

x ∈ ℝ^n → f → y = x ∈ ℝ^n

x ∈ ℝ^n → g(S1; P1) → h ∈ ℝ^q → f(S2; P2) → y = x ∈ ℝ^n

Now we have a richer 'structure'. So what?

h = g(x)
y = f(h)
y = x = f(g(x))

Why do we take this much trouble to write 'x' as f(g(x))?

Here, we can represent 'x' as 'h', provided we can get 'x' back. The second block is that part: we can get 'x' from 'h'.

The advantage of two blocks (or functions) is very clear. It does create a rich structure. What is 'rich' there? We get a representation for 'x'. If you make 'q < n', we get a compressed representation too!!!

We have special names:
g: Encoder
f: Decoder
h: Latent representation
Together, the whole structure is an Autoencoder.

x ∈ ℝ^n → g(S1; P1) (Encoder) → h ∈ ℝ^q (Latent representation) → f(S2; P2) (Decoder) → y = x ∈ ℝ^n

An autoencoder is a type of algorithm that 'learns' a representation of the data, which can then be used in different applications, by learning to reconstruct the data.

How to 'learn'?
Dataset: M unlabeled observations
x1, x2, …, xM

xi ∈ ℝ^n (Input) → g(S1; P1) (Encoder) → hi ∈ ℝ^q (Latent representation) → f(S2; P2) (Decoder) → x̃i ∈ ℝ^n (Reconstructed input)

x̃i = f(hi) = f(g(xi))

Training an autoencoder (unsupervised learning):
Find f and g such that

argmin_{f, g} 𝔼[ Error(xi, x̃i) ]

where 𝔼[·] denotes the average over all observations of a measure that quantifies the error between xi and x̃i.

Loss Function

argmin_{f, g} 𝔼[ |xi − x̃i|² ]

xi → g → h → f → x̃i     (an (N, q, N) architecture)

When q is very small, "bottleneck" is another popular name for 'h'.
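As a concrete illustration, here is a minimal Keras sketch of an (N, q, N) feed-forward autoencoder trained with the mean-square-error loss above; the sizes N and q and the random data are placeholders, not values from these slides.

```python
import numpy as np
import tensorflow as tf

N, q = 32, 4  # placeholder input and bottleneck sizes

# Encoder g and decoder f as single Dense layers -> an (N, q, N) autoencoder
inputs  = tf.keras.Input(shape=(N,))
h       = tf.keras.layers.Dense(q, activation="relu", name="latent")(inputs)  # g
outputs = tf.keras.layers.Dense(N)(h)                                         # f (linear)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")  # E[ |xi - x~i|^2 ]

# Unsupervised training: the target is the input itself
X = np.random.rand(1000, N).astype("float32")  # dummy data
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)

# The latent representation hi = g(xi)
encoder = tf.keras.Model(inputs, h)
print(encoder.predict(X[:3], verbose=0).shape)  # (3, q)
```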
We can bring other techniques to the loss function and get variants of autoencoders.

Add an l2 regularization term:

L = argmin_{f, g} 𝔼[ |xi − x̃i|² ] + λ Σ_k θk²

Or add an l1 regularization term:

L = argmin_{f, g} 𝔼[ |xi − x̃i|² ] + λ Σ_k |θk|

θk: parameters in the functions 'f' and 'g', i.e. weights in the case of neural networks.

The l1 term will enforce sparsity in the latent representation.
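In Keras-style code these variants amount to attaching penalties to the layers. The sketch below (with an assumed λ and placeholder sizes) puts an l2 penalty on the weights θk of g and f and, as one common way to obtain a sparse latent representation, an l1 penalty on the activations of 'h'.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

N, q, lam = 32, 4, 1e-4  # placeholder sizes and regularization strength (lambda)

inputs = tf.keras.Input(shape=(N,))
# l2 penalty on the weights theta_k of the encoder g ...
h = layers.Dense(q, activation="relu",
                 kernel_regularizer=regularizers.l2(lam),
                 # ... and an l1 penalty on the activations of h, a common way
                 # to push the latent representation towards sparsity
                 activity_regularizer=regularizers.l1(lam))(inputs)
# l2 penalty on the weights of the decoder f
outputs = layers.Dense(N, kernel_regularizer=regularizers.l2(lam))(h)

regularized_ae = tf.keras.Model(inputs, outputs)
regularized_ae.compile(optimizer="adam", loss="mse")
```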
One more technique:

Tie the weights of the encoder to the weights of the decoder.

A Feed-forward Autoencoder is easy to implement. The Feed-forward structure has an odd number of layers and is symmetrical with respect to the middle layer.

'h' is also called a learned representation of the input observations 'x'.

A decoder can reconstruct the input by using only a smaller number of features, i.e. q.
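Tying the encoder weights to the decoder weights is not built into Keras; one possible sketch uses a small custom layer that reuses the transpose of an encoder layer's kernel. The DenseTranspose helper and all sizes below are illustrative assumptions, not part of these slides.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

class DenseTranspose(layers.Layer):
    """Decoder layer whose kernel is the transpose of a given Dense layer's kernel."""
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Only the bias is a new weight; the kernel is shared with the tied layer.
        self.bias = self.add_weight(name="bias",
                                    shape=[self.dense.input_shape[-1]],
                                    initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.bias)

dense_1 = layers.Dense(128, activation="relu")
dense_2 = layers.Dense(16, activation="relu")          # middle (bottleneck) layer

encoder = models.Sequential([tf.keras.Input(shape=(784,)), dense_1, dense_2])
decoder = models.Sequential([DenseTranspose(dense_2, activation="relu"),
                             DenseTranspose(dense_1, activation="sigmoid")])
tied_ae = models.Sequential([encoder, decoder])         # symmetric around the middle layer
tied_ae.compile(optimizer="adam", loss="binary_crossentropy")
```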

The output layer's activation function plays a key role in autoencoders based on neural networks.

The common functions are ReLU and sigmoid.

If the activation function of the output layer is sigmoid in a Feed-forward autoencoder, the neuron outputs are between 0 and 1.

If the input features are also normalized to be between 0 and 1, then we can use Binary Cross Entropy as the loss function, which is otherwise commonly used for classification problems.

L = −(1/M) Σ_{i=1}^{M} Σ_{j=1}^{n} [ xj,i log x̃j,i + (1 − xj,i) log(1 − x̃j,i) ]

xj,i: jth component of the ith observation.
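A direct NumPy transcription of this loss, as a small sketch for checking the formula (not the training code linked from the slides):

```python
import numpy as np

def bce_reconstruction_loss(x, x_tilde, eps=1e-12):
    """Binary cross entropy between inputs x and reconstructions x_tilde.

    x, x_tilde: arrays of shape (M, n) with values in [0, 1];
    rows are observations i, columns are components j, as in the formula above.
    """
    x_tilde = np.clip(x_tilde, eps, 1.0 - eps)  # avoid log(0)
    per_observation = np.sum(x * np.log(x_tilde) + (1 - x) * np.log(1 - x_tilde), axis=1)
    return -np.mean(per_observation)            # minus the average over the M observations

# Tiny sanity check on random data in [0, 1]; for a given x, the loss is minimized when x_tilde == x
rng = np.random.default_rng(0)
x = rng.random((4, 784))
print(bce_reconstruction_loss(x, x))
```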
Applications

MNIST dataset

This contains 70,000 hand-written digits from 0 to 9.

Each image is 28x28 pixels with only gray values. We have 784 features.

Let us see the output (reconstructed image) of a Feed-forward autoencoder with the following configurations:

(784, 8, 784)
(784, 16, 784)
(784, 64, 784)
[Figure: 10 random digits: original images and reconstructions from Feed-forward Autoencoders with configurations (784, 8, 784), (784, 16, 784), and (784, 64, 784). Loss function used: Binary cross entropy. (See 'References' for the links to the codes.)]
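The published codes are linked in the references; purely as an illustration, a (784, 16, 784) configuration with the binary cross entropy loss could be set up along these lines (the hyperparameters are assumptions, not the ones used for the figures).

```python
import tensorflow as tf

# MNIST: flatten 28x28 images to 784 features and normalize pixels to [0, 1]
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test  = x_test.reshape(-1, 784).astype("float32") / 255.0

# (784, 16, 784) feed-forward autoencoder with a sigmoid output layer
inputs  = tf.keras.Input(shape=(784,))
h       = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(h)
ae = tf.keras.Model(inputs, outputs)

ae.compile(optimizer="adam", loss="binary_crossentropy")
ae.fit(x_train, x_train, epochs=10, batch_size=256,
       validation_data=(x_test, x_test), verbose=0)

reconstructions = ae.predict(x_test[:10], verbose=0)  # 10 digits, as in the figure
```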
[Figure: 10 random digits: original images and reconstructions from a (784, 16, 784) Feed-forward Autoencoder trained with the Binary Cross Entropy loss function and with the Mean square error loss function. Both loss functions worked well in this case!]
An (N, q, N) autoencoder creates a latent representation using only 'q' numbers. Dimensionality reduction!!!

Do you know any other dimensionality reduction technique?

PCA
This algorithm projects a dataset on the eigenvectors of its covariance matrix, a linear transformation of the features.
PCA uses the entire dataset.

Feed-forward AE
A non-linear transformation of the features. Why? Because of the non-linear activation functions.
This can handle large amounts of data efficiently, as the training can be done in batches.

But…
It has been shown that a Feed-forward AE is equivalent to PCA when:

The encoder 'g' is linear.
The decoder 'f' is linear.
The loss function is mean square error.
The input is normalized as

x̃i,j = (1/√M) ( xi,j − (1/M) Σ_{k=1}^{M} xk,j )
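A small sketch of this equivalence (the data, sizes, and training length below are assumptions): train a linear autoencoder with the MSE loss on the normalized input and compare its reconstruction error with PCA using the same number of components.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

M, n, q = 2000, 20, 3                                   # assumed sizes
rng = np.random.default_rng(0)
X = rng.normal(size=(M, n)) @ rng.normal(size=(n, n))   # correlated dummy data

# Normalization from the slide: remove the feature means, scale by 1/sqrt(M)
Xn = (X - X.mean(axis=0)) / np.sqrt(M)

# Linear encoder g and linear decoder f, mean square error loss
inputs  = tf.keras.Input(shape=(n,))
h       = tf.keras.layers.Dense(q, use_bias=False)(inputs)   # linear g
outputs = tf.keras.layers.Dense(n, use_bias=False)(h)        # linear f
linear_ae = tf.keras.Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")
linear_ae.fit(Xn, Xn, epochs=500, batch_size=64, verbose=0)

# PCA with the same number of components for comparison
pca = PCA(n_components=q).fit(Xn)
err_pca = np.mean((pca.inverse_transform(pca.transform(Xn)) - Xn) ** 2)
err_ae  = np.mean((linear_ae.predict(Xn, verbose=0) - Xn) ** 2)
print(err_pca, err_ae)  # once training has converged, the two errors should be close
```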
Classification

Instead of classifying using the original input data, the AE's latent representation can be used for classification. What's the use?

Input Data                            Accuracy    Training Time
Original data, xi ∈ ℝ^784             96.4%       ~1000 sec.
Latent representation, g(xi) ∈ ℝ^8    89%         ~1 sec.

Minor degradation in accuracy but a huge advantage!!
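One way to run this kind of experiment (the classifier, sizes, and training settings here are illustrative assumptions, so the accuracies will not match the table exactly): train a (784, 8, 784) autoencoder, then fit a simple classifier on g(xi) ∈ ℝ^8.

```python
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test  = x_test.reshape(-1, 784).astype("float32") / 255.0

# (784, 8, 784) autoencoder; its 8-dimensional code will feed the classifier
inputs  = tf.keras.Input(shape=(784,))
h       = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(h)
ae = tf.keras.Model(inputs, outputs)
ae.compile(optimizer="adam", loss="binary_crossentropy")
ae.fit(x_train, x_train, epochs=10, batch_size=256, verbose=0)

encoder = tf.keras.Model(inputs, h)
z_train = encoder.predict(x_train, verbose=0)   # g(xi) in R^8
z_test  = encoder.predict(x_test,  verbose=0)

clf = LogisticRegression(max_iter=1000).fit(z_train, y_train)
print("accuracy on the latent representation:", clf.score(z_test, y_test))
```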
Autoencoder as Anomaly Detector

Consider a 'shoe image' added to the MNIST dataset. Can the AE detect that?

The AE has seen only hand-written digit images during training. It will not be able to reconstruct the 'shoe'. Therefore, the reconstruction error will be larger. Label that as an outlier!

In general, outlier detection is an involved process; information about the data and contextual knowledge are needed.
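A sketch of this idea, using an image from Fashion-MNIST to play the role of the 'shoe' (the dataset choice, threshold, and sizes are assumptions):

```python
import numpy as np
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
(f_train, _), _ = tf.keras.datasets.fashion_mnist.load_data()     # source of a shoe-like image
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
outlier = f_train[0].reshape(1, 784).astype("float32") / 255.0    # stands in for the 'shoe'

# Autoencoder trained on digits only
inputs  = tf.keras.Input(shape=(784,))
h       = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(h)
ae = tf.keras.Model(inputs, outputs)
ae.compile(optimizer="adam", loss="mse")
ae.fit(x_train, x_train, epochs=10, batch_size=256, verbose=0)

def reconstruction_error(x):
    return np.mean((ae.predict(x, verbose=0) - x) ** 2, axis=1)

digit_errors  = reconstruction_error(x_train[:1000])
outlier_error = reconstruction_error(outlier)[0]
threshold = np.percentile(digit_errors, 99)        # an arbitrary cut-off for illustration
print(outlier_error, threshold, outlier_error > threshold)  # the outlier error is typically much larger
```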
Denoising Autoencoder

Input: Noisy image (say, noise is added to MNIST images)
Output: Original MNIST image

Train the autoencoder.
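A minimal sketch of such a training setup, assuming Gaussian pixel noise at an arbitrary level:

```python
import numpy as np
import tensorflow as tf

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test  = x_test.reshape(-1, 784).astype("float32") / 255.0

# Corrupt the inputs with Gaussian noise (the noise level 0.3 is an assumption)
rng = np.random.default_rng(0)
x_train_noisy = np.clip(x_train + 0.3 * rng.normal(size=x_train.shape), 0, 1).astype("float32")
x_test_noisy  = np.clip(x_test  + 0.3 * rng.normal(size=x_test.shape),  0, 1).astype("float32")

inputs  = tf.keras.Input(shape=(784,))
h       = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(h)
denoiser = tf.keras.Model(inputs, outputs)
denoiser.compile(optimizer="adam", loss="binary_crossentropy")

# The key difference from a plain autoencoder: noisy input, clean target
denoiser.fit(x_train_noisy, x_train, epochs=10, batch_size=256,
             validation_data=(x_test_noisy, x_test), verbose=0)
```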


Convolutional Autoencoder

Instead of the single hidden layer of the simple autoencoder, you can use convolutional layers to implement a convolutional autoencoder. (In the corresponding figure, the convolutional layers are shown without pooling layers.)

The overall structure is the same as before:

xi ∈ ℝ^n → g(S1; P1) (Encoder) → hi ∈ ℝ^q (Latent representation) → f(S2; P2) (Decoder) → x̃i ∈ ℝ^n
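A sketch of one possible convolutional autoencoder for 28x28 grayscale images; strided convolutions and transposed convolutions are used instead of pooling/upsampling, in line with the "without pooling layers" remark (the filter counts are assumptions).

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(28, 28, 1))
# Encoder: strided convolutions shrink the spatial resolution (no pooling layers)
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)  # 14x14
x = layers.Conv2D(8,  3, strides=2, padding="same", activation="relu")(x)       # 7x7 latent feature maps

# Decoder: transposed convolutions bring the resolution back to 28x28
x = layers.Conv2DTranspose(8,  3, strides=2, padding="same", activation="relu")(x)  # 14x14
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)  # 28x28
outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)

conv_ae = tf.keras.Model(inputs, outputs)
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
conv_ae.summary()
```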

Variational Autoencoder

xi ∈ ℝ^n → g(S1; P1) (Encoder: statistics of the data such as mean, variance, etc., and a sample generator) → hi ∈ ℝ^r (Latent data representation, drawn from the latent distribution) → f(S2; P2) (Decoder) → x̃i ∈ ℝ^n

Learn the distribution of the data and then represent the data as samples generated from that distribution.
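The core pieces of that idea, sketched in TensorFlow (the layer sizes and the latent dimension r are assumptions): the encoder outputs the mean and log-variance of the latent distribution, a sample generator draws hi with the reparameterization trick, and a KL term keeps the latent distribution close to a standard normal. A full VAE also adds a decoder and a reconstruction loss, trained jointly with the KL term.

```python
import tensorflow as tf

n, r = 784, 2                                   # input and latent dimensions (assumed)

# Encoder heads: instead of a single code, output the statistics of the latent distribution
encoder_body = tf.keras.layers.Dense(256, activation="relu")
mean_head    = tf.keras.layers.Dense(r)
logvar_head  = tf.keras.layers.Dense(r)

def encode(x):
    e = encoder_body(x)
    return mean_head(e), logvar_head(e)

def sample_latent(z_mean, z_log_var):
    # Sample generator (reparameterization trick): h = mu + sigma * eps, eps ~ N(0, I)
    eps = tf.random.normal(tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps

def kl_to_standard_normal(z_mean, z_log_var):
    # KL( N(mu, sigma^2) || N(0, I) ), summed over the latent dimensions
    return -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)

x = tf.random.uniform((4, n))                   # dummy batch just to exercise the pieces
z_mean, z_log_var = encode(x)
h = sample_latent(z_mean, z_log_var)
print(h.shape, kl_to_standard_normal(z_mean, z_log_var).shape)   # (4, 2) (4,)
```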
History

David E. Rumelhart (1942-2011), Stanford Obituary Link
Geoffrey Hinton (1947-), https://www.cs.toronto.edu/~hinton/
Ronald J. Williams, http://www.ccs.neu.edu/home/rjw/pubs.html

D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing. Vol 1: Foundations. MIT Press, Cambridge, MA, 1986.

Dana Ballard (1946-), https://www.cs.utexas.edu/~dana/

Ballard, D.H., 1987, July. Modular learning in neural networks. In AAAI (Vol. 647, pp. 279-284).
For further reading…

Autoencoder example: http://adl.toelt.ai/Chapter25/Your_first_autoencoder_with_Keras.html

Autoencoder codes: https://adl.toelt.ai

Autoencoder ensembles: https://saketsathe.net/downloads/autoencode.pdf

Autoencoders: http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/slides/lec20.pdf

Bank, D., Koenigstein, N. and Giryes, R., 2020. Autoencoders. arXiv preprint arXiv:2003.05991.

Michelucci, U., 2022. An Introduction to Autoencoders. arXiv preprint arXiv:2201.03898.

MNIST: http://yann.lecun.com/exdb/mnist/

PCA equivalent: http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Teaching/pdf/Lecture7.pdf
