Module 5

The document provides an overview of various types of autoencoders, including Stacked, Convolutional, Denoising, Sparse, and Variational Autoencoders, detailing their architectures and functionalities. It also discusses Generative Adversarial Networks (GANs), their training challenges, and proposed solutions, as well as the advancements in diffusion models for image generation. Key concepts include the probabilistic nature of variational autoencoders and the competitive dynamics of GANs, highlighting their respective advantages and limitations.

Stacked Autoencoders

stacked_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
])

stacked_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])

stacked_ae = tf.keras.Sequential([stacked_encoder, stacked_decoder])
stacked_ae.compile(loss="mse", optimizer="nadam")
history = stacked_ae.fit(X_train, X_train, epochs=20,
                         validation_data=(X_valid, X_valid))
Convolutional Autoencoders
Encoder:

conv_encoder = tf.keras.Sequential([
    tf.keras.layers.Reshape([28, 28, 1]),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),  # output: 14 x 14 x 16
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),  # output: 7 x 7 x 32
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),  # output: 3 x 3 x 64
    tf.keras.layers.Conv2D(30, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAvgPool2D()        # output: 30
])
Decoder:
conv_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(3 * 3 * 16),
    tf.keras.layers.Reshape((3, 3, 16)),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, activation="relu"),
    tf.keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
    tf.keras.layers.Reshape([28, 28])
])

conv_ae = tf.keras.Sequential([conv_encoder, conv_decoder])
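The convolutional autoencoder is trained the same way as the stacked one; a minimal sketch follows, assuming X_train and X_valid are the same 28 x 28 image arrays used earlier (the number of epochs is illustrative):

conv_ae.compile(loss="mse", optimizer="nadam")
history = conv_ae.fit(X_train, X_train, epochs=10,
                      validation_data=(X_valid, X_valid))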


Denoising Autoencoders
Encoder:

dropout_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu")
])

Decoder:

dropout_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])

dropout_ae = tf.keras.Sequential([dropout_encoder, dropout_decoder])
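Training follows the same pattern as before; a minimal sketch is shown below (the number of epochs is illustrative). Note that Keras applies Dropout only during training, so the input corruption is switched off automatically at inference time.

dropout_ae.compile(loss="mse", optimizer="nadam")
history = dropout_ae.fit(X_train, X_train, epochs=10,
                         validation_data=(X_valid, X_valid))

# At inference time Dropout is inactive, so this reconstructs uncorrupted inputs.
reconstructions = dropout_ae.predict(X_valid)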
Sparse Autoencoders
Encoder:

sparse_l1_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(300, activation="sigmoid"),
    tf.keras.layers.ActivityRegularization(l1=1e-4)
])

Decoder:

sparse_l1_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])

sparse_l1_ae = tf.keras.Sequential([sparse_l1_encoder, sparse_l1_decoder])
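The ActivityRegularization layer passes its inputs through unchanged while adding an L1 penalty on the activations (the codings) to the training loss, which pushes many of them toward zero. An equivalent way to express this, sketched below, is to attach the penalty directly to the coding layer via Keras's activity_regularizer argument:

# Equivalent sketch: put the L1 activity penalty on the coding layer itself.
sparse_l1_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(300, activation="sigmoid",
                          activity_regularizer=tf.keras.regularizers.l1(1e-4))
])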
Variational Autoencoders
Introduction and Historical Context

● Introduced in 2013 by Diederik Kingma and Max Welling


● Quickly became one of the most popular autoencoder variants

Probabilistic Nature

● Outputs are partially determined by chance, even after training


● This differs from denoising autoencoders, which only use randomness during
training

Generative Capabilities
● Can generate new instances that appear to come from the training set
● Similar to Restricted Boltzmann Machines (RBMs) but with advantages:
○ Easier to train
○ Faster sampling process (no need to wait for thermal equilibrium)
Technical Foundation
● Based on variational Bayesian inference
● Performs approximate Bayesian inference efficiently
● Updates probability distributions using Bayes' theorem
● Works with prior and posterior distributions
Architecture and Operation
● Has the standard encoder-decoder structure
● Encoder produces two outputs:
○ Mean coding (μ)
○ Standard deviation (σ)
● Actual coding is sampled from a Gaussian distribution using μ and σ
● Decoder then processes this sampled coding
● Final output resembles the training instance
Variational Autoencoder latent loss (the sum runs over the dimensions of the codings):
L = -1/2 * Σᵢ [1 + log(σᵢ²) - σᵢ² - μᵢ²]
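To make the sampling step concrete, here is a minimal sketch (not the full VAE from these notes): a custom Keras layer that draws a coding z = μ + σ·ε with ε ~ N(0, I), together with the latent loss above written in terms of the encoder's mean and log-variance outputs. The names Sampling and latent_loss are placeholders introduced for this sketch.

import tensorflow as tf

class Sampling(tf.keras.layers.Layer):
    # Reparameterization trick: z = mean + sigma * epsilon, with epsilon ~ N(0, I)
    def call(self, inputs):
        mean, log_var = inputs                        # encoder outputs: mu and log(sigma^2)
        epsilon = tf.random.normal(tf.shape(log_var))
        return mean + tf.exp(log_var / 2) * epsilon

def latent_loss(mean, log_var):
    # Matches the formula above with log_var = log(sigma^2):
    # -1/2 * sum_i [1 + log(sigma_i^2) - sigma_i^2 - mu_i^2], averaged over the batch
    return tf.reduce_mean(
        -0.5 * tf.reduce_sum(1 + log_var - tf.exp(log_var) - tf.square(mean), axis=-1))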
Generative Adversarial Networks (GANs)
Historical Context
● Proposed in 2014 by Ian Goodfellow and colleagues
● Generated immediate excitement in the research community
● Initial training difficulties took years to overcome
Core Concept
● Based on competition between neural networks
● Competition drives improvement in both networks
● Composed of two distinct neural networks working against each
other
Generator Network
● Takes random noise (typically Gaussian) as input
● Produces data (typically images) as output
● Random inputs serve as latent representations
● Functions similarly to a decoder in a VAE
● Can generate new images from random noise
Discriminator Network
● Takes images as input from two sources:
○ Fake images from the generator
○ Real images from the training set
● Must classify whether each input image is real or fake
● Acts as a binary classifier
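As a minimal sketch of these two roles (layer sizes are illustrative, not taken from these notes), the generator below maps a 30-dimensional Gaussian noise vector to a 28 x 28 image, and the discriminator is an ordinary binary classifier ending in a sigmoid unit:

codings_size = 30   # size of the random noise vector (illustrative)

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(150, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape([28, 28])
])

discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(150, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")   # 1 = real, 0 = fake
])

gan = tf.keras.Sequential([generator, discriminator])

# The discriminator is trained as a plain binary classifier; when the generator is
# trained through the combined gan model, the discriminator's weights are frozen.
discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
discriminator.trainable = False
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")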
The Difficulties of Training GANs
1. Training Dynamics
● Functions as a zero-sum game between generator and discriminator
● Aims to reach a Nash equilibrium state
● Only one theoretical optimal equilibrium exists:
○ Generator produces perfectly realistic images
○ Discriminator is forced to random guessing (50/50)

2. Nash Equilibrium Concept


● State where no player benefits from changing strategy alone
● Can have a single optimal strategy (e.g., everyone driving on the same side of the road)
● Can involve multiple competing strategies (predator-prey example)
3. Major Challenges

● Reaching equilibrium isn't guaranteed


● Mode collapse is a significant issue:
○ Generator focuses on one type of output
○ Gradually loses diversity
○ May cycle between different classes
● Training instability:
○ Parameters can oscillate
○ Training may suddenly diverge
○ Very sensitive to hyperparameters
4. Proposed Solutions
● Experience replay:
○ Stores generated images in buffer
○ Trains discriminator on mix of current and stored fake images
○ Reduces discriminator overfitting (a minimal sketch follows this list)
● Mini-batch discrimination:
○ Measures similarity across image batches
○ Helps discriminator identify lack of diversity
○ Encourages generator variety
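A minimal sketch of the experience-replay idea, assuming a generator and real images X_real like those sketched earlier; the buffer size, batch size, and function name are illustrative. Fresh fakes are mixed with older fakes drawn from the buffer before the discriminator is trained on them together with real images.

from collections import deque
import numpy as np
import tensorflow as tf

replay_buffer = deque(maxlen=10_000)        # stores previously generated (fake) images

def discriminator_batch(generator, X_real, batch_size=32, codings_size=30):
    # Generate fresh fakes with the current generator and remember them.
    noise = tf.random.normal([batch_size // 2, codings_size])
    fresh_fakes = generator(noise).numpy()
    replay_buffer.extend(fresh_fakes)

    # Mix the fresh fakes with older fakes sampled from the replay buffer.
    idx = np.random.randint(len(replay_buffer), size=batch_size // 2)
    old_fakes = np.array([replay_buffer[i] for i in idx])
    fakes = np.concatenate([fresh_fakes, old_fakes])

    # Label fakes 0 and real images 1 for the discriminator.
    X = np.concatenate([fakes, X_real[:batch_size]])
    y = np.concatenate([np.zeros(len(fakes)), np.ones(batch_size)])
    return X, y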
5. Current State
● Remains active research field
● GAN dynamics not fully understood
● Significant progress made
● Results can be impressive
● Moving toward more complex architectures
Diffusion Model
The modern formalization of diffusion models came from a 2015 paper by Sohl-Dickstein
et al. from Stanford and UC Berkeley. They used thermodynamics principles to model a
diffusion process similar to milk mixing in tea, but aimed to reverse the process.

In 2020, Jonathan Ho et al. from UC Berkeley created the denoising diffusion probabilistic model (DDPM), which could generate highly realistic images. Their work marked a significant advancement in the field.

A 2021 paper by OpenAI researchers (Nichol and Dhariwal) improved DDPMs to surpass
GANs in performance. The advantages were:

● Easier to train than GANs


● Generated more diverse images
● Produced higher quality images
● Main drawback: Much slower image generation compared to GANs or VAEs
The DDPM process works as follows:
● Start with an initial image (x0)
● Add Gaussian noise at each time step t (with mean 0 and variance βₜ)
● Noise is added independently for each pixel (isotropic)
● Process continues until the original image is completely hidden
Technical implementation details:
● Original DDPM paper used 1,000 time steps
● Improved version increased to 4,000 time steps
● Pixel values are rescaled at each step by √(1 − βₜ)
● Mean of pixel values approaches 0
● Variance converges to 1
The forward process probability distribution is defined by:
q(xₜ | xₜ₋₁) = N(√(1 − βₜ) · xₜ₋₁, βₜI)
Where:
● N represents a Gaussian distribution
● βₜ is the variance at time step t
● I is the identity matrix
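A minimal sketch of this forward (noising) step, assuming inputs already scaled to roughly zero mean and unit variance and using an illustrative constant βₜ schedule (the papers use carefully tuned schedules):

import tensorflow as tf

T = 1000                    # number of time steps, as in the original DDPM paper
beta = tf.fill([T], 0.02)   # illustrative constant variance schedule

def forward_step(x_prev, t):
    # q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I):
    # rescale the previous image, then add isotropic Gaussian noise.
    noise = tf.random.normal(tf.shape(x_prev))
    return tf.sqrt(1.0 - beta[t]) * x_prev + tf.sqrt(beta[t]) * noise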
The ultimate goal is image generation:
● Train a model to perform the reverse process (xt to xt-1)
● Start with pure Gaussian noise
● Gradually remove noise until a new image emerges
● Model trained on specific image types (like cats) will generate
similar images
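For completeness, here is a heavily simplified sketch of that generation loop under the standard DDPM formulation (these symbols are not defined in the notes: αₜ = 1 − βₜ, ᾱₜ is the cumulative product of the αs, and model(x, t) is assumed to be a trained network that predicts the noise added at step t). It reuses T and beta from the previous sketch.

alpha = 1.0 - beta                     # uses the beta schedule from the sketch above
alpha_bar = tf.math.cumprod(alpha)

def generate(model, shape=(1, 28, 28)):
    x = tf.random.normal(shape)        # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = model(x, t)              # predicted noise at step t
        x = (x - (1 - alpha[t]) / tf.sqrt(1 - alpha_bar[t]) * eps) / tf.sqrt(alpha[t])
        if t > 0:                      # add a little fresh noise except at the final step
            x += tf.sqrt(beta[t]) * tf.random.normal(shape)
    return x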
