
UNIT – II AUTOENCODERS AND AUTOREGRESSIVE MODELS

Autoencoders – Regularised autoencoders – Stochastic Encoders and Decoders – Autoregressive Models – Fully Visible Sigmoid Belief Network (FVSBN) – Neural Autoregressive Density Estimation (NADE) – Masked Autoencoder for Distribution Estimation (MADE)
Autoencoders
Regularized Autoencoder
• An undercomplete autoencoder, with code dimension less than the input dimension, can learn the most salient features of the data distribution.
• Undercomplete autoencoders fail to learn anything useful if the encoder and decoder are given too much capacity. A similar problem occurs if the hidden code is allowed to have dimension equal to the input, and in the overcomplete case, in which the hidden code has dimension greater than the input.
• In these cases, even a linear encoder and linear decoder can learn to
copy the input to the output without learning anything useful about
the data distribution.
• Rather than limiting the model capacity by keeping the encoder
and decoder shallow and the code size small, regularized
autoencoders use a loss function that encourages the model to have
other properties besides the ability to copy its input to its output.
• These other properties include sparsity of the representation,
smallness of the derivative of the representation, and robustness to
noise or to missing inputs.
• A regularized autoencoder can be nonlinear and overcomplete but
still learn something useful about the data distribution even if the
model capacity is great enough to learn a trivial identity function.
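A minimal sketch of such a regularized objective, assuming PyTorch; the layer sizes, the penalty weight lam, and the choice of an L1 penalty on the code (one way to encourage sparse representations) are illustrative assumptions, not part of the original notes:

```python
import torch
import torch.nn as nn

# Sketch: an overcomplete autoencoder whose training loss adds a penalty on
# the code h, so copying the input is no longer the best solution.
# Layer sizes and the penalty weight lam are illustrative choices.
encoder = nn.Sequential(nn.Linear(784, 1024), nn.ReLU())
decoder = nn.Sequential(nn.Linear(1024, 784), nn.Sigmoid())

def regularized_loss(x, lam=1e-3):
    h = encoder(x)                       # code (larger than the input here)
    x_hat = decoder(h)                   # reconstruction
    recon = ((x_hat - x) ** 2).mean()    # reconstruction term
    penalty = h.abs().mean()             # L1 penalty encouraging sparse codes
    return recon + lam * penalty
```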
Types of Autoencoders
• Denoising Autoencoder
• A denoising autoencoder works on a partially corrupted input and is trained to recover the original, undistorted input. This is an effective way to prevent the network from simply copying the input, forcing it to learn the underlying structure and important features of the data (a training-step sketch follows this subsection).
• Advantages
• This type of autoencoder can extract important features and reduce the
noise or the useless features.
• Denoising autoencoders can be used as a form of data augmentation: the restored images can be used as additional training samples.
• Disadvantages
• Selecting the right type and level of noise to introduce can be
challenging and may require domain knowledge.
• The denoising process can discard some information that is needed from the original input; this loss can impact the accuracy of the output.
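A minimal training-step sketch for a denoising autoencoder, assuming PyTorch; the architecture, the Gaussian corruption with noise_std, and the optimizer settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch of a denoising-autoencoder step: corrupt the input with Gaussian
# noise, reconstruct, but compare against the *clean* input.
autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),    # encoder
    nn.Linear(128, 784), nn.Sigmoid()  # decoder
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

def denoising_step(x_clean, noise_std=0.3):
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)  # corrupt input
    x_hat = autoencoder(x_noisy)
    loss = nn.functional.mse_loss(x_hat, x_clean)  # target is the clean input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```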
• Sparse Autoencoder
• This type of autoencoder typically contains more hidden units than input units, but only a few are allowed to be active at once. This property is called the sparsity of the network.
• The sparsity of the network can be controlled by manually zeroing the required hidden units, tuning the activation functions, or adding a sparsity penalty to the cost function (a sketch of such a penalty follows this subsection). This avoids relying on a narrow bottleneck layer.
• Advantages
• The sparsity constraint in sparse autoencoders helps in filtering out
noise and irrelevant features during the encoding process.
• These autoencoders often learn important and meaningful features due
to their emphasis on sparse activations.
• Disadvantages
• The choice of hyperparameters plays a significant role in the performance of this autoencoder; different inputs should result in the activation of different nodes of the network.
• The application of sparsity constraint increases computational
complexity.
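A minimal sketch of one common sparsity penalty added to the cost function, assuming PyTorch; the KL-based penalty with target activation rho and weight beta, and the layer sizes, are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch: push the average activation of each hidden unit towards a small
# target rho via a KL term, on top of the reconstruction loss.
encoder = nn.Sequential(nn.Linear(784, 1024), nn.Sigmoid())  # overcomplete code
decoder = nn.Linear(1024, 784)

def sparse_loss(x, rho=0.05, beta=3.0):
    h = encoder(x)
    x_hat = decoder(h)
    recon = ((x_hat - x) ** 2).mean()
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)  # mean activation per hidden unit
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
    return recon + beta * kl
```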
• Variational Autoencoder
• Variational autoencoder makes strong assumptions about the
distribution of latent variables and uses the Stochastic Gradient
Variational Bayes (SGVB) estimator in the training process.
• It assumes that the data is generated by a directed graphical model p_θ(x | z) and tries to learn an approximation q_φ(z | x) to the posterior p_θ(z | x), where φ and θ are the parameters of the encoder and the decoder respectively (a minimal sketch follows this subsection).
• Advantages
• Variational Autoencoders are used to generate new data points that
resemble the original training data. These samples are learned from the
latent space.
• A variational autoencoder is a probabilistic framework for learning a compressed representation of the data that captures its underlying structure and variations, which makes it useful for anomaly detection and data exploration.
• Disadvantages
• Variational autoencoders use approximations to estimate the true distribution of the latent variables. This approximation introduces some error, which can affect the quality of generated samples.
• The generated samples may only cover a limited subset of the true data
distribution. This can result in a lack of diversity in generated samples.
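A minimal sketch of the VAE loss with the reparameterisation trick, assuming PyTorch; the 32-dimensional latent space, the single-layer encoder and decoder, and the assumption that inputs lie in [0, 1] are illustrative choices:

```python
import torch
import torch.nn as nn

# Sketch: encoder outputs mean and log-variance, the reparameterisation trick
# draws z, and the loss is the negative ELBO (reconstruction + KL to N(0, I)).
enc = nn.Linear(784, 2 * 32)                    # assumed 32-dimensional latent space
dec = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

def vae_loss(x):                                # x assumed to lie in [0, 1]
    mu, logvar = enc(x).chunk(2, dim=-1)
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps      # reparameterisation trick
    x_hat = dec(z)
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl                           # negative ELBO
```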
• Convolutional Autoencoder
• Convolutional autoencoders are a type of autoencoder that use
convolutional neural networks (CNNs) as their building blocks.
• The encoder consists of multiple layers that take an image (or grid) as input and pass it through convolution layers, forming a compressed representation of the input.
• The decoder is the mirror image of the encoder: it deconvolves the compressed representation and tries to reconstruct the original image (a sketch of such a pair follows this subsection).
• Advantages
• Convolutional autoencoders can compress high-dimensional image data into a lower-dimensional representation, improving the storage and transmission efficiency of image data.
• Convolutional autoencoder can reconstruct missing parts of an
image. It can also handle images with slight variations in object
position or orientation.
• Disadvantages
• These autoencoders are prone to overfitting; proper regularization techniques should be used to tackle this issue.
• Compression can cause information loss, which can result in the reconstruction of a lower-quality image.
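A minimal sketch of a convolutional encoder/decoder pair for 1×28×28 images, assuming PyTorch; the channel counts and layer depths are illustrative assumptions:

```python
import torch.nn as nn

# Sketch: the encoder downsamples with strided convolutions, the decoder
# mirrors it with transposed convolutions back to the input resolution.
conv_encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
conv_decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                       padding=1, output_padding=1),         # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                       padding=1, output_padding=1),         # 14x14 -> 28x28
    nn.Sigmoid(),
)
```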
Stochastic Encoders and Decoders
• Autoencoders are just feedforward networks. The same loss
functions and output unit types that can be used for traditional
feedforward networks are also used for autoencoders.
• The general strategy for designing the output units and the loss function of a feedforward network is to define an output distribution p(y | x) and minimize the negative log-likelihood −log p(y | x).
• In that setting, y was a vector of targets, such as class labels.
• In the case of an autoencoder, x is now the target as well as the input.
• Given a hidden code h, we may think of the decoder as providing a conditional distribution p_decoder(x | h). We may then train the autoencoder by minimizing −log p_decoder(x | h).
• The exact form of this loss function will change depending on the form of p_decoder.
• As with traditional feedforward networks, we usually use linear
output units to parametrize the mean of a Gaussian distribution if x is
real-valued. In that case, the negative log-likelihood yields a mean
squared error criterion.
• Similarly, binary x values correspond to a Bernoulli distribution whose
parameters are given by a sigmoid output unit, discrete x values
correspond to a softmax distribution, and so on.
• Stochastic encoder: p_encoder(h | x) = p_model(h | x)
• Stochastic decoder: p_decoder(x | h) = p_model(x | h)
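A short sketch of the correspondence between the decoder's output distribution and the training loss, assuming PyTorch; the tensors and shapes are illustrative placeholders:

```python
import torch
import torch.nn as nn

# A Bernoulli p_decoder(x | h) with sigmoid outputs gives a binary
# cross-entropy criterion; a unit-variance Gaussian with linear (mean)
# outputs gives a mean squared error criterion.
x_binary = torch.randint(0, 2, (8, 784)).float()  # binary targets
x_real = torch.randn(8, 784)                      # real-valued targets
logits = torch.randn(8, 784)   # decoder output before the sigmoid
mean = torch.randn(8, 784)     # decoder output parametrising a Gaussian mean

# -log p_decoder(x | h) for a Bernoulli decoder:
nll_bernoulli = nn.functional.binary_cross_entropy_with_logits(
    logits, x_binary, reduction='sum')

# -log p_decoder(x | h) for a unit-variance Gaussian decoder (up to a constant):
nll_gaussian = 0.5 * ((x_real - mean) ** 2).sum()
```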
Autoregressive models
• An autoregressive model is a statistical model that describes a sequence of
observations where each observation depends on its preceding values. In
other words, the model predicts the next value in a sequence based on the
previous values. Autoregressive models are commonly used in time series
analysis, signal processing, and various other fields.
• The term "autoregressive" is derived from the idea that the model
regresses a variable onto itself. The mathematical representation of an
autoregressive model of order p, often denoted as AR(p), is given by the
following equation:
• X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + … + φ_p X_{t−p} + ε_t
• X_t is the value at time t; φ_1, φ_2, …, φ_p are the parameters of the model, representing the weights;
• X_{t−1}, X_{t−2}, …, X_{t−p} are the lagged values of the series (past observations); ε_t is a white-noise term representing the error or randomness at time t.
• The order p indicates how many past observations are considered in
predicting the current value. If p=1, it's an AR(1) model, and if p=2, it's
an AR(2) model, and so on.
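A short NumPy sketch of this regression-on-past-values view: simulate an AR(2) series and recover the coefficients by least squares on the lagged values; the true coefficients and the series length are illustrative assumptions:

```python
import numpy as np

# Simulate X_t = 0.6 X_{t-1} + 0.3 X_{t-2} + eps_t, then regress X_t on its lags.
rng = np.random.default_rng(0)
phi_true = np.array([0.6, 0.3])
x = np.zeros(500)
for t in range(2, 500):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + rng.normal()

# Lagged design matrix [X_{t-1}, X_{t-2}] and target X_t.
X_lagged = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
phi_hat, *_ = np.linalg.lstsq(X_lagged, y, rcond=None)
print(phi_hat)   # should be close to [0.6, 0.3]
```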
Types of Autoregressive Models
• Autoregressive models are used in various domains and have
different formulations:
• 2.1 Traditional Autoregressive Models
• These models are used in time series forecasting. The two most
common ones are:
• AR (Autoregressive) Model
• The AR model predicts the future value of a variable based on a weighted
sum of its past values.
• Example: AR(1) (first-order AR model)
• ARMA (Autoregressive Moving Average) Model
• Combines AR and MA (Moving Average) models.
• ARMA(p, q) includes both autoregressive terms and past error terms:

X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}
• ARIMA (Autoregressive Integrated Moving Average) Model
• Extends ARMA by incorporating differencing to handle non-stationary time series: ARIMA(p, d, q) fits an ARMA(p, q) model to the series after differencing it d times.
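A hedged usage sketch with the statsmodels library (assuming it is installed); the toy random-walk-with-drift series and the ARIMA(1, 1, 1) order are illustrative assumptions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA  # assumes statsmodels is installed

# Fit an ARIMA(1, 1, 1) to a toy non-stationary series and forecast ahead.
rng = np.random.default_rng(1)
y = np.cumsum(0.5 + rng.normal(size=200))   # random walk with drift

model = ARIMA(y, order=(1, 1, 1))           # (p, d, q): AR order, differencing, MA order
result = model.fit()
print(result.params)
print(result.forecast(steps=5))             # forecasts on the original scale
```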
• Two examples of data from autoregressive models with different parameters. Left: AR(1) with y_t = 18 − 0.8 y_{t−1} + ε_t. Right: AR(2) with y_t = 8 + 1.3 y_{t−1} − 0.7 y_{t−2} + ε_t. In both cases, ε_t is normally distributed white noise with mean zero and variance one.
Sigmoid Belief Network
Learning Rule for Sigmoid Belief Network
Fully Visible Sigmoid Belief Network
(FVSBN)
• A sigmoid belief network without any hidden units is denoted a fully visible sigmoid belief network (FVSBN).
• The conditional variables x_i | x_1, …, x_{i−1} in an FVSBN are Bernoulli with parameters given by the model.
• Some conditionals are too complex to represent exactly, so FVSBN assumes a logistic-regression form:

p(x_i = 1 | x_1, x_2, …, x_{i−1}) = f_i(x_1, x_2, …, x_{i−1}; α^{(i)}) = σ(α_0^{(i)} + α_1^{(i)} x_1 + ⋯ + α_{i−1}^{(i)} x_{i−1})
• σ denotes the sigmoid function.
• The conditional for variable x_i requires i parameters, and hence the total number of parameters in the model is given by

Σ_{i=1}^{n} i = O(n²) ≪ O(2^n)
Gan Z., Henao R., Carlson D., et al. Learning Deep Sigmoid Belief Networks with Data Augmentation. Artificial Intelligence and Statistics (AISTATS), 2015.
FVSBN Example
• Suppose we have a dataset D of handwritten digits (binarised
MNIST)

• Each image has n = 28×28×1 = 784 pixels. Each pixel can either be black (0) or white (1).
• We want to learn a probability distribution p(x) = p(x_1, …, x_784) over x ∈ {0,1}^784 such that when x ~ p(x), x looks like a digit.
• Idea: define an FVSBN model, then pick a good one based on the training data D.
FVSBN Example
• We can pick an ordering, i.e., order the variables (pixels) from top-left (x_1) to bottom-right (x_784).
• Use the product-rule factorisation, which holds without loss of generality:

p(x_1, …, x_784) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ⋯ p(x_784 | x_1, …, x_783)
• FVSBN modelling assumption (fewer parameters):

x̂_i = p(x_i = 1 | x_1, x_2, …, x_{i−1}) = f_i(x_1, x_2, …, x_{i−1}; α^{(i)}) = σ(α_0^{(i)} + α_1^{(i)} x_1 + ⋯ + α_{i−1}^{(i)} x_{i−1})
• Note: this is a modelling assumption. We are using logistic regression to predict the next pixel's distribution based on the previous ones; this is why the model is called autoregressive.
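A NumPy sketch of an FVSBN over n = 784 binary pixels: each conditional is a logistic regression over the previous pixels. The parameters below are random placeholders for illustration, not values learned from MNIST:

```python
import numpy as np

n = 784
rng = np.random.default_rng(0)
alpha0 = rng.normal(size=n) * 0.01                       # biases alpha_0^{(i)}
alpha = [rng.normal(size=i) * 0.01 for i in range(n)]    # weights alpha^{(i)} over x_1..x_{i-1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample():
    """Sample an image pixel by pixel in the chosen ordering."""
    x = np.zeros(n)
    for i in range(n):
        p_i = sigmoid(alpha0[i] + alpha[i] @ x[:i])  # p(x_i = 1 | x_1..x_{i-1})
        x[i] = rng.random() < p_i
    return x

def log_prob(x):
    """Log-likelihood of a binary image under the product of conditionals."""
    lp = 0.0
    for i in range(n):
        p_i = sigmoid(alpha0[i] + alpha[i] @ x[:i])
        lp += x[i] * np.log(p_i) + (1 - x[i]) * np.log(1 - p_i)
    return lp
```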
Neural Autoregressive Density Estimation (NADE)

https://www.youtube.com/watch?v=uLVo6KtWk2
Masked Autoencoder for Distribution Estimation (MADE)
Thank you
