
Autoencoders

Autoencoders are a specific type of feedforward neural network where the input is the same as the output. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact “summary” or “compression” of the input, also called the latent-space representation.

An autoencoder consists of 3 components: encoder, code and decoder. The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.

To build an autoencoder we need 3 things: an encoding method, a decoding method, and a loss function to compare the output with the target. We will explore these in the next section.

Autoencoders are mainly a dimensionality reduction (or compression) algorithm with a couple of important properties:

● Data-specific: Autoencoders can only meaningfully compress data similar to what they have been trained on. Since they learn features specific to the given training data, they are different from a standard data compression algorithm like gzip. So we can’t expect an autoencoder trained on handwritten digits to compress landscape photos.
● Lossy: The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation. If you want lossless compression, autoencoders are not the way to go.
● Unsupervised: To train an autoencoder we don’t need to do anything fancy, just throw the raw input data at it. Autoencoders are considered an unsupervised learning technique since they don’t need explicit labels to train on. But to be more precise, they are self-supervised because they generate their own labels from the training data.

Architecture
Let’s explore the details of the encoder, code and decoder. Both the encoder and decoder are fully-connected feedforward neural networks, essentially the ANNs we covered in Part 1. The code is a single layer of an ANN with the dimensionality of our choice. The number of nodes in the code layer (code size) is a hyperparameter that we set before training the autoencoder.

This is a more detailed visualization of an autoencoder. First the input passes through the encoder, which is a fully-connected ANN, to produce the code. The decoder, which has a similar ANN structure, then produces the output using only the code. The goal is to get an output identical to the input. Note that the decoder architecture is the mirror image of the encoder. This is not a requirement, but it’s typically the case. The only requirement is that the dimensionality of the input and output must be the same. Anything in the middle can be played with.

There are 4 hyperparameters that we need to set before training an autoencoder:

● Code size: the number of nodes in the middle layer. A smaller size results in more compression.
● Number of layers: the autoencoder can be as deep as we like. In the figure above we have 2 layers in both the encoder and decoder, not counting the input and output.
● Number of nodes per layer: the autoencoder architecture we’re working on is called a stacked autoencoder since the layers are stacked one after another. Stacked autoencoders usually look like a “sandwich”: the number of nodes per layer decreases with each subsequent layer of the encoder, and increases back in the decoder. The decoder is also symmetric to the encoder in terms of layer structure. As noted above this is not necessary, and we have total control over these parameters.
● Loss function: we use either mean squared error (MSE) or binary crossentropy. If the input values are in the range [0, 1] we typically use crossentropy, otherwise we use mean squared error. For more details check out this video.

Autoencoders are trained the same way as ANNs, via backpropagation. Check out the introduction of Part 1 for more details on how neural networks are trained; it applies directly to autoencoders.
Implementation
Now let’s implement an autoencoder with the following architecture: 1 hidden layer in both the encoder and the decoder.

We will use the extremely popular MNIST dataset as input. It contains black-and-white images of handwritten digits. They’re of size 28x28 and we use them as a vector of 784 numbers in the range [0, 1]. Check the jupyter notebook for the details.
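The exact preprocessing code is in the notebook; a minimal sketch of that step, assuming the MNIST loader bundled with Keras rather than the notebook's own loading code, could look like this:

from keras.datasets import mnist

# Load MNIST, scale pixels to [0, 1] and flatten each 28x28 image to a 784-vector.
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape((len(x_train), 784))
x_test = x_test.reshape((len(x_test), 784))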

We will now implement the autoencoder with Keras. The hyperparameters are: 128 nodes in the hidden layer, a code size of 32, and binary crossentropy as the loss function.
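The full code lives in the notebook; a minimal sketch with these hyperparameters (the layer names and the number of epochs below are my own choices, not necessarily the notebook's) might look like this:

from keras.layers import Input, Dense
from keras.models import Model

input_size = 784
hidden_size = 128
code_size = 32

# Encoder: 784 -> 128 -> 32, decoder mirrors it: 32 -> 128 -> 784.
input_img = Input(shape=(input_size,))
hidden_1 = Dense(hidden_size, activation='relu')(input_img)
code = Dense(code_size, activation='relu')(hidden_1)
hidden_2 = Dense(hidden_size, activation='relu')(code)
output_img = Dense(input_size, activation='sigmoid')(hidden_2)

autoencoder = Model(input_img, output_img)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train, epochs=5)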

This is very similar to the ANNs we worked on, but now we’re using the
Keras functional API. Refer to this guide for details, but here’s a quick
comparison. Before we used to add layers using the sequential API as
follows:

model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))

With the functional API we do this:

layer_1 = Dense(16, activation='relu')(input)
layer_2 = Dense(8, activation='relu')(layer_1)

It’s more verbose, but a more flexible way to define complex models. We can easily grab parts of our model, for example only the decoder, and work with that (see the sketch below). Calling Dense returns a callable layer; using the functional API we call it with the input and store the output. The output of a layer becomes the input of the next layer. With the sequential API the add method handled this for us implicitly.
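For instance, reusing the trained layers from the sketch above (again my own example, not the article's code), we could build a standalone decoder that maps a 32-dimensional code back to an image:

from keras.layers import Input
from keras.models import Model

# The last two Dense layers of the autoencoder form the decoder; calling them
# on a new Input reuses their trained weights.
decoder_hidden = autoencoder.layers[-2]
decoder_out = autoencoder.layers[-1]

code_input = Input(shape=(code_size,))
decoder = Model(code_input, decoder_out(decoder_hidden(code_input)))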

Note that all the layers use the relu activation function, as it’s the standard
with deep neural networks. The last layer uses the sigmoid activation
because we need the outputs to be between [0, 1]. The input is also in the
same range.

Also note the call to the fit function. Before, with ANNs we used to do:

model.fit(x_train, y_train)

But now we do:

model.fit(x_train, x_train)

Remember that the targets of the autoencoder are the same as the input.
That’s why we supply the training data as the target.

Visualization

Now let’s visualize how well our autoencoder reconstructs its input.

We run the autoencoder on the test set simply by using the predict function
of Keras. For every image in the test set, we get the output of the
autoencoder. We expect the output to be very similar to the input.
They are indeed pretty similar, but not exactly the same. We can notice it
more clearly in the last digit “4”. Since this was a simple task our
autoencoder performed pretty well.
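A minimal sketch of this step (the plotting code here is my own, not the notebook's):

import matplotlib.pyplot as plt

# Reconstruct the test images and compare them with the originals.
reconstructed = autoencoder.predict(x_test)

n = 5  # number of digits to show
plt.figure(figsize=(10, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)        # top row: original
    ax.imshow(x_test[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax = plt.subplot(2, n, n + i + 1)    # bottom row: reconstruction
    ax.imshow(reconstructed[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()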

Advice

We have total control over the architecture of the autoencoder. We can make it very powerful by increasing the number of layers, the number of nodes per layer and, most importantly, the code size. Increasing these hyperparameters lets the autoencoder learn more complex codings. But we should be careful not to make it too powerful. Otherwise the autoencoder will simply learn to copy its inputs to the output without learning any meaningful representation; it will just mimic the identity function. The autoencoder will reconstruct the training data perfectly, but it will be overfitting, unable to generalize to new instances, which is not what we want.

This is why we prefer a “sandwich” architecture and deliberately keep the code size small. Since the coding layer has a lower dimensionality than the input data, the autoencoder is said to be undercomplete. It won’t be able to directly copy its inputs to the output, and will be forced to learn intelligent features. If the input data has a pattern, for example the digit “1” usually contains a somewhat straight line and the digit “0” is circular, the autoencoder will learn this fact and encode it in a more compact form. If the input data were completely random, without any internal correlation or dependency, then an undercomplete autoencoder wouldn’t be able to recover it perfectly. But luckily there is a lot of dependency in real-world data.
Denoising Autoencoders
Keeping the code layer small forced our autoencoder to
learn an intelligent representation of the data. There is
another way to force the autoencoder to learn useful
features, which is adding random noise to its inputs and
making it recover the original noise-free data. This way the
autoencoder can’t simply copy the input to its output
because the input also contains random noise. We are asking
it to subtract the noise and produce the underlying
meaningful data. This is called a denoising autoencoder.

The top row contains the original images. We add random Gaussian noise to them, and the noisy data becomes the input to the autoencoder. The autoencoder doesn’t see the original image at all. But then we expect the autoencoder to regenerate the noise-free original image.
There is only one small difference between the implementation of the denoising autoencoder and the regular one. The architecture doesn’t change at all, only the fit function. We trained the regular autoencoder as follows:

autoencoder.fit(x_train, x_train)

The denoising autoencoder is trained as:

autoencoder.fit(x_train_noisy, x_train)

Simple as that, everything else is exactly the same. The input to the autoencoder is the noisy image, and the expected target is the original noise-free one.
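For completeness, here is a sketch of how the noisy inputs could be produced; the noise level is an arbitrary choice of mine, not a value from the article:

import numpy as np

# Add Gaussian noise to the images and clip back into the valid [0, 1] range.
noise_factor = 0.4
x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)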

Visualization

Now let’s visualize whether we are able to recover the noise-free images.
Looks pretty good. The bottom row is the autoencoder output. We can do better by using a more complex autoencoder architecture, such as convolutional autoencoders. We will cover convolutions in the upcoming article.

Sparse Autoencoders
We introduced two ways to force the autoencoder to learn useful features: keeping the code size small and denoising the inputs. The third method is regularization. We can regularize the autoencoder by using a sparsity constraint such that only a fraction of the nodes have nonzero values; these are called active nodes.

In particular, we add a penalty term to the loss function such that only a fraction of the nodes become active. This forces the autoencoder to represent each input as a combination of a small number of nodes, and demands that it discover interesting structure in the data. This method works even if the code size is large, since only a small subset of the nodes will be active at any time.

It’s pretty easy to do this in Keras with just one parameter. As a reminder, previously we created the code layer as follows:

code = Dense(code_size, activation='relu')(input_img)

We now add another parameter called activity_regularizer to specify the regularization strength. This is typically a value in the range [0.001, 0.000001]. Here we chose 10e-6.

from keras.regularizers import l1

code = Dense(code_size, activation='relu', activity_regularizer=l1(10e-6))(input_img)

The final loss of the sparse model is 0.01 higher than the
standard one, due to the added regularization term.

Let’s demonstrate that the encodings generated by the regularized model are indeed sparse. If we look at the histogram of code values for the images in the test set, the distribution is as follows:
The mean for the standard model is 6.6 but for the
regularized model it’s 0.8, a pretty big reduction. We can see
that a large chunk of code values in the regularized model
are indeed 0, which is what we wanted. The variance of the
regularized model is also fairly low.
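A sketch of how such a histogram and mean could be computed, reusing the input_img and code names from the sketches above for whichever model we want to inspect (again my own code, not the notebook's):

from keras.models import Model
import matplotlib.pyplot as plt

# Build an encoder model that maps an image to its code, then inspect the codes.
encoder = Model(input_img, code)
codes = encoder.predict(x_test)

print('mean code value:', codes.mean())
plt.hist(codes.flatten(), bins=50)
plt.xlabel('code value')
plt.ylabel('count')
plt.show()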

Use Cases
Now we might ask the following questions. How good are
autoencoders at compressing the input? And are they a
commonly used deep learning technique?

Unfortunately, autoencoders are not widely used in real-world applications. As a compression method, they don’t perform better than the alternatives; for example, JPEG does photo compression better than an autoencoder. And the fact that autoencoders are data-specific makes them impractical as a general technique.

They have 3 common use cases though:

● Data denoising: we have seen an example of this on images.
● Dimensionality reduction: visualizing high-dimensional data is challenging. t-SNE is the most commonly used method, but it struggles with a large number of dimensions (typically above 32). So autoencoders are used as a preprocessing step to reduce the dimensionality, and this compressed representation is used by t-SNE to visualize the data in 2D space. For great articles on t-SNE refer here and here. (A short sketch of this pipeline follows this list.)
● Variational Autoencoders (VAE): this is a more modern and complex use case of autoencoders, and we will cover them in another article. But as a quick summary, a VAE learns the parameters of the probability distribution modeling the input data, instead of learning an arbitrary function as vanilla autoencoders do. By sampling points from this distribution we can also use the VAE as a generative model. Here is a good reference.
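As an illustration of the dimensionality reduction use case, here is a rough sketch of the pipeline (my own example, using scikit-learn's t-SNE and the encoder model built in the sketch earlier):

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# First compress the 784-dimensional images to 32-dimensional codes with the
# trained encoder, then let t-SNE map those codes down to 2D for plotting.
codes = encoder.predict(x_test)
codes_2d = TSNE(n_components=2).fit_transform(codes)

plt.scatter(codes_2d[:, 0], codes_2d[:, 1], s=2)
plt.show()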

Conclusion
Autoencoders are a very useful dimensionality reduction
technique. They are very popular as a teaching material in
introductory deep learning courses, most likely due to their
simplicity.
