Module 5
Stacked Autoencoders
Decoder (maps the codings back to a flat 28 × 28 image):
import tensorflow as tf

stacked_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])
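A matching encoder is not shown above; a minimal sketch that pairs with this decoder (the names stacked_encoder/stacked_ae, the 100-unit hidden layer, and the 30-dimensional coding size are assumptions, not from the source):

stacked_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),                       # 28 × 28 image -> 784-dimensional vector
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu")     # codings
])
stacked_ae = tf.keras.Sequential([stacked_encoder, stacked_decoder])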
Convolutional Autoencoders
Encoder:
conv_encoder = tf.keras.Sequential([
    tf.keras.layers.Reshape([28, 28, 1]),                               # add a channel axis
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),   # output: 14 × 14 × 16
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),   # output: 7 × 7 × 32
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),   # output: 3 × 3 × 64
    tf.keras.layers.Conv2D(30, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAvgPool2D()          # output: 30
])
Decoder:
conv_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(3 * 3 * 16),
    tf.keras.layers.Reshape((3, 3, 16)),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, activation="relu"),
    tf.keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
    tf.keras.layers.Reshape([28, 28])
])
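The two halves chain into one model that is trained to reproduce its own input. The compile/fit call below is an illustrative sketch: the Nadam optimizer, MSE reconstruction loss, epoch count, and the X_train/X_valid arrays of 28 × 28 grayscale images scaled to [0, 1] are assumptions, not from the source.

conv_ae = tf.keras.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(loss="mse", optimizer="nadam")   # reconstruction loss: targets = inputs
history = conv_ae.fit(X_train, X_train, epochs=10,
                      validation_data=(X_valid, X_valid))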
Denoising Autoencoders
The full dropout-based denoising autoencoder chains its encoder and decoder:
dropout_ae = tf.keras.Sequential([dropout_encoder, dropout_decoder])
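dropout_encoder and dropout_decoder are not defined above; a minimal sketch consistent with the other autoencoders in this module (the 0.5 dropout rate and all layer sizes are assumptions) is:

dropout_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),                    # corrupt the input by dropping pixels
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu")     # codings
])
dropout_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])

The dropout layer is only active during training, so at inference time the autoencoder sees clean inputs.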
Sparse Autoencoders
Encoder (adds an L1 penalty on the coding activations to encourage sparsity):
sparse_l1_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation="sigmoid"),
    tf.keras.layers.ActivityRegularization(l1=1e-4)
])
The full sparse autoencoder chains the encoder and decoder:
sparse_l1_ae = tf.keras.Sequential([sparse_l1_encoder, sparse_l1_decoder])
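sparse_l1_decoder is used above but not defined; a minimal definition mirroring the other decoders in this module (the 100-unit hidden layer is an assumption) could be:

sparse_l1_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])
])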
Variational Autoencoders
Introduction and Historical Context
Probabilistic Nature
Generative Capabilities
● Can generate new instances that appear to come from the training set
● Similar to Restricted Boltzmann Machines (RBMs) but with advantages:
○ Easier to train
○ Faster sampling process (no need to wait for thermal equilibrium)
Technical Foundation
● Based on variational Bayesian inference
● Performs approximate Bayesian inference efficiently
● Updates probability distributions using Bayes' theorem (see the formula after this list)
● Works with prior and posterior distributions
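For reference, with z denoting the codings (latent variables) and x the observed data (our notation, not the source's), Bayes' theorem gives the posterior that the VAE approximates:

p(z | x) = p(x | z) · p(z) / p(x)

where p(z) is the prior over the codings and p(x | z) is the likelihood of the data given the codings.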
Architecture and Operation
● Has the standard encoder-decoder structure
● Encoder produces two outputs:
○ Mean coding (μ)
○ Standard deviation (σ)
● Actual coding is sampled from a Gaussian distribution using μ and σ
● Decoder then processes this sampled coding
● Final output resembles the training instance
Variational Autoencoder latent loss:
L = -1/2 * Σᵢ[1 + log(σᵢ²) - σᵢ² - μᵢ²]
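A minimal Keras sketch of the reparameterization step and of this latent loss (the Sampling layer, the 10-dimensional coding size, and the 150-unit hidden layer are assumptions, not from the source):

class Sampling(tf.keras.layers.Layer):
    def call(self, inputs):
        mean, log_var = inputs
        # reparameterization trick: codings = mu + sigma * epsilon, with epsilon ~ N(0, I)
        return mean + tf.exp(log_var / 2) * tf.random.normal(tf.shape(log_var))

codings_size = 10
inputs = tf.keras.layers.Input(shape=[28, 28])
h = tf.keras.layers.Flatten()(inputs)
h = tf.keras.layers.Dense(150, activation="relu")(h)
codings_mean = tf.keras.layers.Dense(codings_size)(h)     # mu
codings_log_var = tf.keras.layers.Dense(codings_size)(h)  # log(sigma^2)
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = tf.keras.Model(
    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])

def latent_loss(codings_mean, codings_log_var):
    # L = -1/2 * sum_i (1 + log(sigma_i^2) - sigma_i^2 - mu_i^2), per the formula above
    return -0.5 * tf.reduce_sum(
        1 + codings_log_var - tf.exp(codings_log_var) - tf.square(codings_mean),
        axis=-1)

During training this latent loss is added to the usual reconstruction loss, pushing the codings toward a standard Gaussian while keeping the reconstructions accurate.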
Generative Adversarial Networks (GANs)
Historical Context
● Proposed in 2014 by Ian Goodfellow and colleagues
● Generated immediate excitement in the research community
● Initial training difficulties took years to overcome
Core Concept
● Based on competition between neural networks
● Competition drives improvement in both networks
● Composed of two distinct neural networks working against each other
Generator Network
● Takes random noise (typically Gaussian) as input
● Produces data (typically images) as output
● Random inputs serve as latent representations
● Functions similarly to a decoder in a VAE
● Can generate new images from random noise
Discriminator Network
● Takes images as input from two sources:
○ Fake images from the generator
○ Real images from the training set
● Must classify whether each input image is real or fake
● Acts as a binary classifier (a minimal Keras sketch of both networks follows this list)
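A minimal Keras sketch of the two networks for 28 × 28 grayscale images (the 30-dimensional coding size and all layer sizes are assumptions, not from the source):

codings_size = 30

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(150, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),  # pixel intensities in [0, 1]
    tf.keras.layers.Reshape([28, 28])
])

discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(150, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")         # P(image is real)
])

gan = tf.keras.Sequential([generator, discriminator])

Feeding the generator Gaussian noise of shape [batch_size, codings_size] produces a batch of fake 28 × 28 images, which the discriminator then scores as real or fake.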
The Difficulties of Training GANs
1. Training Dynamics
● Functions as a zero-sum game between the generator and the discriminator (the alternating training loop that plays this game is sketched after this list)
● Aims to reach a Nash equilibrium state
● Only one theoretical optimal equilibrium exists:
○ Generator produces perfectly realistic images
○ Discriminator is forced to random guessing (50/50)
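A sketch of the alternating two-phase loop that plays this game, assuming the generator, discriminator, and gan models above, a tf.data dataset of real image batches (floats scaled to [0, 1]), labels 1 = real and 0 = fake, and an RMSprop optimizer (all of these are assumptions, not from the source):

discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
discriminator.trainable = False  # frozen when training the generator through the gan model
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")

def train_gan(gan, dataset, batch_size, codings_size, n_epochs):
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        for X_batch in dataset:
            # phase 1: train the discriminator on a mix of fake and real images
            noise = tf.random.normal(shape=[batch_size, codings_size])
            fake_images = generator(noise)
            X_fake_and_real = tf.concat([fake_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.train_on_batch(X_fake_and_real, y1)
            # phase 2: train the generator so the discriminator labels its images as real
            noise = tf.random.normal(shape=[batch_size, codings_size])
            y2 = tf.constant([[1.]] * batch_size)
            gan.train_on_batch(noise, y2)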
Diffusion Models
In 2021, OpenAI researchers Nichol and Dhariwal published improvements to denoising diffusion probabilistic models (DDPMs) that allowed them to surpass GANs in image-generation performance. The advantages were: