Autoencoders
Autoencoders
• Clustering
• Recommendation systems
• Anomaly detection
• An autoencoder learns to compress the data while minimizing the reconstruction error.
• Suppose we have a set of data points {x^(1), x^(2), …, x^(m)}, where each data point has
many dimensions
• Encoder: compress the input into a latent-space representation, h = f(x)
• Decoder: reconstruct the input from the latent space, r = g(f(x)), with r as close to x as possible
Autoencoder
• Our goal is to have x̃^(i) approximate x^(i) using an objective function, which is the sum of
squared differences between x̃^(i) and x^(i)
• Lossy: The decompressed outputs will be degraded compared to the original inputs.
A bottleneck constrains the amount of information that can traverse the full network, forcing
a learned compression of the input data.
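As a concrete illustration of the objective above, here is a minimal NumPy sketch of the sum-of-squared-differences reconstruction error; the toy data and the placeholder reconstructions are assumptions made for illustration (in a real autoencoder, x̃ comes from g(f(x))).

```python
import numpy as np

# Toy data (assumed): m = 100 data points x^(1), ..., x^(m), each with 8 dimensions.
X = np.random.rand(100, 8)

# Placeholder reconstructions x̃^(i); in a real autoencoder these come from g(f(x)).
X_tilde = X + 0.1 * np.random.randn(*X.shape)

# Objective: sum of squared differences between x̃^(i) and x^(i).
reconstruction_error = np.sum((X_tilde - X) ** 2)
print(reconstruction_error)
```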
Autoencoder
The ideal autoencoder model balances the following:
• Sensitive enough to the inputs to accurately build a reconstruction.
• Insensitive enough to the inputs that the model doesn't simply memorize or overfit the
training data.
Undercomplete autoencoder
• Training the autoencoder to perform the input-copying task can result in the code h (the hidden
units) taking on useful properties of the input
• An autoencoder whose code dimension is less than the input dimension is called
undercomplete.
Linear Autoencoder
[Figure: linear autoencoder mapping input x to code z and reconstruction x̃]
• The autoencoder maps data from 4 dimensions to 2 dimensions with one hidden layer; the
activation function of the hidden layer is linear
• Works for the case where the data lie on a linear surface (a minimal sketch follows)
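A minimal PyTorch sketch of the 4 → 2 → 4 linear autoencoder described above; the synthetic data, learning rate, and number of steps are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Synthetic 4-D data assumed to lie on a 2-D linear subspace.
torch.manual_seed(0)
x = torch.randn(256, 2) @ torch.randn(2, 4)

# Linear autoencoder: 4 -> 2 -> 4, linear activation in the hidden layer.
encoder = nn.Linear(4, 2)
decoder = nn.Linear(2, 4)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)

for step in range(200):
    z = encoder(x)                                   # 2-D code z
    x_tilde = decoder(z)                             # reconstruction x̃
    loss = ((x_tilde - x) ** 2).sum(dim=1).mean()    # squared reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```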
Non-linear Autoencoder
[Figure: nonlinear autoencoder mapping input x to code z and reconstruction x̃]
• If the data lie on a nonlinear surface it makes more sense to use a nonlinear autoencoder
• If the data are highly nonlinear, one can add more hidden layers to the network to obtain a
deep autoencoder (sketched below)
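Assuming the same PyTorch setting as the previous sketch, the encoder and decoder can be made nonlinear (and deeper) by adding activation functions and extra layers; the layer sizes below are illustrative.

```python
import torch.nn as nn

# Deep nonlinear autoencoder: nonlinear activations and an extra hidden layer
# on each side (sizes are assumptions for illustration).
encoder = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 2),
)
decoder = nn.Sequential(
    nn.Linear(2, 8), nn.ReLU(),
    nn.Linear(8, 4),
)
```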
Types of Autoencoders
• Undercomplete autoencoders
• Sparse autoencoders
• Denoising autoencoders
• Contractive autoencoders (CAE)
Sparse Autoencoder
• In an autoencoder, the number of hidden units may be large (perhaps even greater than the
number of input pixels).
• It can still discover interesting structure by imposing other constraints on the network.
• In particular, if we impose a sparsity constraint on the hidden units, the autoencoder will still
discover interesting structure in the data, even if the number of hidden units is large
• We would like the average activation of each hidden neuron to be close to a small value ρ (e.g., 0.05)
• To achieve this, we add an extra penalty term to our optimization objective that penalizes ρ̂ⱼ
deviating significantly from ρ
Sparse Autoencoder (cont’d)
• An autoencoder with a sparsity penalty Ω(h) on the code layer h is trained to minimize the
reconstruction error plus the penalty: L(x, g(f(x))) + Ω(h)
• This penalizes neurons that are too active, forcing them to activate less
• It forces the model to have only a small number of hidden units activated at the same
time
Sparse Autoencoder (cont’d)
• The sparsity penalty can yield a model that has learned useful features as a byproduct
• The sparsity penalty prevents the neural network from activating too many neurons and
serves as a regularizer
• There are two main ways by which we can impose this sparsity constraint; both involve
measuring the hidden layer activations for each training batch and adding some term to the
loss function in order to penalize excessive activations. These terms are:
• L1 Regularization: add a term to the loss function that penalizes the absolute value of the
activations a^(h) in layer h for observation i, scaled by a tuning parameter λ (see the sketch below):
ℒ(x, x̂) + λ Σᵢ | aᵢ^(h) |
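A minimal PyTorch sketch of the L1 activation penalty added to the reconstruction loss; the layer sizes, data, and λ value are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Overcomplete sparse autoencoder with an L1 penalty on the hidden activations.
x = torch.rand(64, 784)                                        # e.g. flattened images (assumed)
encoder = nn.Sequential(nn.Linear(784, 1024), nn.Sigmoid())    # more hidden units than inputs
decoder = nn.Linear(1024, 784)
lam = 1e-4                                                     # tuning parameter λ (assumed)

a = encoder(x)                                                 # hidden activations a^(h)
x_hat = decoder(a)
# ℒ(x, x̂) + λ Σ_i |a_i^(h)|
loss = ((x_hat - x) ** 2).sum(dim=1).mean() + lam * a.abs().sum(dim=1).mean()
loss.backward()
```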
Sparse Autoencoder (cont’d)
• KL-Divergence: In essence, KL-divergence is a measure of the difference between two
probability distributions. We can define a sparsity parameter ρ which denotes the average
activation of a neuron over a collection of samples. This expectation can be calculated as
ρ̂ⱼ = (1/m) Σᵢ [ aᵢ^(h)(x) ]
where the subscript j denotes the specific neuron in layer h, summing the activations for the
m training observations denoted individually as x.
• In essence, by constraining the average activation of a neuron over a collection of samples we're
encouraging neurons to only fire for a subset of the observations.
• ρ can be described as a Bernoulli random variable, so we can leverage the KL
divergence (expanded next) to compare the ideal distribution to the observed distributions over
all hidden layer nodes (a code sketch follows):
ℒ(x, x̂) + Σⱼ KL( ρ || ρ̂ⱼ )
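A sketch of the KL-based sparsity penalty under the same assumptions as the previous sketch (sigmoid hidden units whose activations lie in (0, 1)); the weighting factor β is an assumed hyperparameter.

```python
import torch

def kl_sparsity_penalty(activations, rho=0.05, eps=1e-8):
    """Sum over hidden units j of KL(ρ || ρ̂_j), where ρ̂_j is the average
    activation of unit j over the batch (activations assumed to be in (0, 1))."""
    rho_hat = activations.mean(dim=0)   # ρ̂_j for each hidden unit j
    kl = rho * torch.log(rho / (rho_hat + eps)) \
         + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat + eps))
    return kl.sum()

# Usage with the encoder/decoder from the previous sketch (assumed):
# a = encoder(x); x_hat = decoder(a)
# loss = ((x_hat - x) ** 2).sum(dim=1).mean() + beta * kl_sparsity_penalty(a, rho=0.05)
```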
KL Divergence
• A measure of how one probability distribution P is different from a second, reference probability
distribution Q.
• The KL divergence between two probability distributions P and Q is the sum, over all possible
outcomes x, of the probability of x under distribution P multiplied by the logarithm of the ratio of the
probability of x under P to the probability of x under Q:
D_KL(P || Q) = Σ_{x∈X} P(x) log( P(x) / Q(x) )
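A quick check of the definition in Python; the two distributions are made-up examples, and scipy.stats.entropy with two arguments computes exactly this sum (in nats).

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) returns sum(p * log(p / q))

P = np.array([0.1, 0.4, 0.5])     # example distributions (assumed)
Q = np.array([0.3, 0.3, 0.4])

d_kl = entropy(P, Q)              # D_KL(P || Q), natural log
print(d_kl)
```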
Sparse Autoencoder (cont’d)
• Note: A Bernoulli distribution is "the probability distribution of a random variable which
takes the value 1 with probability p and the value 0 with probability q = 1 − p ". This
corresponds quite well with establishing the probability a neuron will fire.
Denoising Autoencoder
• The denoising autoencoder (DAE) is an autoencoder that receives a corrupted data point as input and is trained to
predict the original, uncorrupted data point as its output
• We train the autoencoder to reconstruct the input from a corrupted copy of the inputs; this forces the codings to
learn more robust features of the inputs
• The input is partially corrupted by adding noise to, or masking some values of, the input vector in a stochastic
manner (a sketch follows below)
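A minimal PyTorch sketch of the corruption-and-reconstruction setup described above; the noise level, masking rate, and architecture are assumptions made for illustration.

```python
import torch
import torch.nn as nn

x = torch.rand(64, 784)                         # clean inputs (assumed toy data)

# Stochastic corruption: additive Gaussian noise plus random masking of values.
noise = 0.2 * torch.randn_like(x)
mask = (torch.rand_like(x) > 0.3).float()       # zero out roughly 30% of the entries
x_corrupted = (x + noise) * mask

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)

x_hat = autoencoder(x_corrupted)                # reconstruct from the corrupted copy
loss = ((x_hat - x) ** 2).mean()                # target is the original, uncorrupted x
loss.backward()
```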
Variational Autoencoder
• The variational autoencoder (VAE) was proposed in 2013 by Diederik P. Kingma and Max Welling.
• It has many applications, such as data compression, synthetic data creation, etc.
• For variational autoencoders, the encoder model is sometimes referred to as the recognition
model whereas the decoder model is sometimes referred to as the generative model.
How does VAE work?
• The encoder network takes raw input data and transforms it into a probability distribution
within the latent space.
• The latent code generated by the encoder is a probabilistic encoding, allowing the VAE to
express not just a single point in the latent space but a distribution of potential
representations.
• The decoder network, in turn, takes a sampled point from the latent distribution and
reconstructs it back into data space. During training, the model refines both the encoder and
decoder parameters to minimize the reconstruction loss – the disparity between the input data
and the decoded output.
• The goal is not just to achieve accurate reconstruction but also to regularize the latent space,
ensuring that it conforms to a specified distribution.
• The reconstruction loss compels the model to accurately reconstruct the input, while the
regularization term encourages the latent space to adhere to the chosen distribution,
preventing overfitting and promoting generalization.
• The regularization term is the KL divergence Σⱼ KL( qⱼ(z | x) || p(z) ), where q(z | x) is the learned distribution and p(z) is the true prior distribution, which we'll
assume follows a unit Gaussian distribution, for each dimension j of the latent space (a code sketch follows).
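A minimal PyTorch sketch of the VAE training objective described above (reconstruction loss plus KL regularization toward a unit Gaussian), using the reparameterization trick to sample from q(z | x); the layer sizes, latent dimension, and data are assumptions made for illustration.

```python
import torch
import torch.nn as nn

x = torch.rand(64, 784)              # toy input data (assumed)
enc = nn.Linear(784, 2 * 8)          # encoder outputs mean and log-variance of an 8-D latent
dec = nn.Linear(8, 784)              # decoder maps a sampled code back to data space

mu, logvar = enc(x).chunk(2, dim=1)  # parameters of q(z | x)
std = torch.exp(0.5 * logvar)
z = mu + std * torch.randn_like(std)              # reparameterization: sample z ~ q(z | x)
x_hat = torch.sigmoid(dec(z))                     # reconstruction from the sampled code

recon_loss = ((x_hat - x) ** 2).sum(dim=1).mean()
# Closed-form KL( q(z | x) || p(z) ) with p(z) a unit Gaussian, summed over dimensions j.
kl_loss = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
loss = recon_loss + kl_loss
loss.backward()
```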
Statistical
p(z | x) = p(x | z) p(z) / p(x)

p(x) = ∫ p(x | z) p(z) dz

min KL( q(z | x) || p(z | x) )

• The posterior p(z | x) is intractable because p(x) requires integrating over all possible z, so we
approximate it with the learned distribution q(z | x) by minimizing the KL divergence between them.
Exercise
Generator G1: P_G1 = [0.3, 0.3, 0.2, 0.2]
Generator G2: P_G2 = [0.25, 0.25, 0.25, 0.25]
Real data: P_real = [0.35, 0.3, 0.2, 0.15]
Compute the KL divergence between P_real and P_G1, and between P_real and P_G2.
Based on KL divergence, which generator better approximates the real data distribution? Discuss how KL
divergence helps compare the quality of the two generators in GAN training.
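A small Python sketch of the computation asked for above, using natural logarithms (nats); switch to log base 2 if bits are preferred.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = Σ_x P(x) log(P(x) / Q(x)), natural log (nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

P_real = [0.35, 0.30, 0.20, 0.15]
P_G1   = [0.30, 0.30, 0.20, 0.20]
P_G2   = [0.25, 0.25, 0.25, 0.25]

print(kl_divergence(P_real, P_G1))   # ≈ 0.011 nats
print(kl_divergence(P_real, P_G2))   # ≈ 0.051 nats
# The smaller divergence for G1 suggests G1 better approximates the real distribution:
# a lower KL(P_real || P_G) means the generator's distribution diverges less from the data.
```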