Deep Generative Models
https://arxiv.org/pdf/2108.02774v1
• Realistic samples
• Artwork, super-resolution, colorization, customization
• Data augmentation
• Robust model training, bias & fairness
Supervised Learning
[Figure: shallow learning vs. deep learning]
https://www.mlguru.ai/Learn/concepts-deep-learning
Unsupervised Learning
[Figure: unlabeled data projected onto principal components PC1 and PC2]
https://arxiv.org/pdf/2308.04395
Self-Supervised Learning
• Supervisory signals
• Generating pseudo labels from the input data
• Example: masked patches are used as labels in masked autoencoders for reconstruction (a minimal sketch follows this list)
• Pretext tasks
• Learning meaningful context-aware representations of the data with a given task
• Example: contrastive learning, which pulls together augmented views of the same image and pushes apart views of different images
• Transferable representations
• Using the learned representations for downstream tasks
• Example: fine-tuning the encoder for classification or segmentation
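To make the masked-autoencoder example above concrete, here is a minimal PyTorch-style sketch of turning unlabeled images into (input, pseudo-label) pairs by masking patches; the patch size, mask ratio, and tensor shapes are illustrative assumptions, not values from a specific paper.

```python
import torch

def make_mae_targets(images, patch=16, mask_ratio=0.75):
    """Split images into patches and hide a random subset from the encoder.

    The masked patches themselves become the regression targets,
    so no human labels are needed (illustrative sketch).
    """
    B, C, H, W = images.shape
    # Cut into non-overlapping patches and flatten each patch into a vector
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    num_patches = patches.shape[1]
    num_masked = int(mask_ratio * num_patches)
    # Random per-image mask: True = hidden from the encoder
    idx = torch.rand(B, num_patches).argsort(dim=1)
    mask = torch.zeros(B, num_patches, dtype=torch.bool)
    mask.scatter_(1, idx[:, :num_masked], True)
    targets = patches[mask]    # pseudo labels: pixels of the masked patches
    visible = patches[~mask]   # what the encoder actually sees
    return visible, targets, mask

# usage: visible patches go to the encoder, the decoder predicts `targets`
x = torch.randn(8, 3, 224, 224)
visible, targets, mask = make_mae_targets(x)
```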
• Supervised pre-training
• Uses labeled data
• Learns task-specific representations
• Unsupervised pre-training
• Uses unlabeled data
• Learns generic representations
• New task
• What are the strategies for training?
• What factors to consider?
• New task
• Large enough data & resources → training from scratch; with limited data, fine-tuning a pretrained model is the usual alternative (both options are sketched below)
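A minimal sketch of the two options, assuming torchvision is available; the choice of ResNet-18 and a 10-class head is purely illustrative.

```python
import torch.nn as nn
from torchvision import models

# Option 1: fine-tune a pretrained encoder (limited data/compute).
# Freeze the generic representations and train only a new task head.
model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new head, trainable by default

# Option 2: train from scratch (large enough data & resources).
scratch = models.resnet18(weights=None, num_classes=10)
```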
• Explicit models
• Learn a model that explicitly defines and estimates density pmodel(x)
• Example: VAEs, denoising diffusion models (DDMs)
• Implicit models
• Learn a model that samples from pmodel(x) w/o explicitly defining it
• Example: GANs
https://link.springer.com/chapter/10.1007/978-3-031-72744-3_19
Autoencoders
https://lilianweng.github.io/posts/2018-08-12-vae/
Variational Autoencoders
https://lilianweng.github.io/posts/2018-08-12-vae/
• Training
• Use an encoder/inference network qɸ(z|x) = N(µɸ(x), ∑ɸ(x)) that approximates the intractable posterior pθ(z|x)
• Use maximum likelihood estimation to fit the model parameters θ and ɸ: maximizing the ELBO lower-bounds log pθ(x) and simultaneously minimizes the KL divergence between qɸ(z|x) and the true posterior pθ(z|x) (the decomposition is written out below)
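Written out, this is the standard VAE identity (stated here for completeness, since the slide's figure is not reproduced): the KL term on the right is non-negative, so maximizing the ELBO both pushes up a lower bound on the data likelihood and pulls qɸ(z|x) toward the true posterior.

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{z \sim q_\phi(z|x)}\!\big[\log p_\theta(x, z) - \log q_\phi(z|x)\big]}_{\text{ELBO (maximize)}}
  + \underbrace{D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)}_{\ge 0\ \text{(minimized as the bound tightens)}}
```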
https://link.springer.com/book/10.1007/978-3-030-93158-2
VAEs in Practice
• Training
• Data likelihood (evidence) lower bound is tractable
• Maximize log pθ(x) ≥ ELBO = Ez∼qɸ(z|x)[log pθ(x|z)] − Ez∼qɸ(z|x)[log(qɸ(z|x) / pθ(z))]
• Equivalently, minimize Reconstruction loss(x ; x′) + KLD(N(µɸ(x), ∑ɸ(x)) ‖ N(0, I)) (the closed form of the KLD term is given below)
https://lilianweng.github.io/posts/2018-08-12-vae/
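The KLD term above has a simple closed form when, as here, both distributions are Gaussian with diagonal covariance (standard result, stated for reference):

```latex
\mathrm{KLD}\big(\mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  = \tfrac{1}{2} \sum_{j=1}^{d} \big(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\big)
```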
VAEs in Practice
• Training
• The encoder learns to output 𝜇 and 𝜎 for each input data point
• A latent vector 𝑧 is sampled for reconstruction, using the reparameterization trick
𝑧 = 𝜇 + 𝜎 ⊙ ϵ,  ϵ ∼ N(0, I)
• Generation
• Sample latent 𝑧 ∼ N(0, I)
• Pass 𝑧 through the decoder to produce a new sample (a minimal code sketch of training and generation follows)
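A minimal PyTorch-style sketch of this recipe; the MLP architecture, the MSE reconstruction loss, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # encoder head for μ
        self.logvar = nn.Linear(h_dim, z_dim)   # encoder head for log σ²
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        z = mu + std * eps                       # reparameterization trick
        return self.dec(z), mu, logvar

def loss_fn(x, x_rec, mu, logvar):
    recon = F.mse_loss(x_rec, x, reduction="sum")                   # reconstruction loss
    kld = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1 - logvar)    # KLD to N(0, I)
    return recon + kld                                              # negative ELBO (up to constants)

# Training step (illustrative): x_rec, mu, logvar = vae(x); loss_fn(x, x_rec, mu, logvar).backward()
# Generation: z = torch.randn(64, 16); samples = vae.dec(z)
```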
Reparameterization Trick
• [Figure from the reference below: red nodes are non-differentiable sampling operations, blue nodes are loss layers]
• Backpropagation can be applied to the reparameterized network on the right, where the noise ϵ ∼ N(0, I) enters as an input and 𝑧 = 𝜇 + 𝜎 ⊙ ϵ is differentiable with respect to 𝜇 and 𝜎
https://arxiv.org/pdf/1606.05908v3
Summary
• Pros
• Generalization -> VAEs can generate diverse images because they explicitly model the data density
• Interpretability -> VAE latent representations can be used for interpretability
• Cons
• Quality -> VAEs tend to generate smoother, blurrier, less detailed images
• Data -> VAEs require data diverse enough to span the entire distribution
• Dimensionality -> It is not clear how to choose the latent dimension
• Optimization -> The ELBO enforces an information bottleneck at the latent variables, making the optimization prone to bad local minima
Generative Adversarial Networks (GANs)
• How it works
• Two networks: a generator that maps random noise to samples, and a discriminator that classifies samples as real or fake
• Iteration: both networks continuously update their strategies over time
• Learning: the generator learns to fool the discriminator, while the discriminator becomes better at detecting fakes, mimicking the feedback loop seen in game theory (a minimal training-loop sketch follows)
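A minimal sketch of that feedback loop with the standard binary cross-entropy losses; the MLP generator/discriminator, learning rates, and data shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, x_dim = 64, 784
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))      # generator
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))  # discriminator (logit)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator step: get better at telling real from fake
    fake = G(torch.randn(b, z_dim)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones) +
              F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: learn to fool the discriminator
    fake = G(torch.randn(b, z_dim))
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# usage (illustrative real batch): losses = train_step(torch.rand(32, x_dim))
```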
GANs in Practice
https://newsletter.theaiedge.io/p/how-generative-adversarial-networks
Summary
• Pros
• Quality -> GANs can generate high-quality, sharp images
• Utility -> Adversarial concepts can be used to improve the generation process
• Cons
• Training instability -> Jointly training two networks can result in mode collapse
• Bias and fairness -> GANs can reflect the biases present in the training data
• Interpretability -> GANs are implicit models and difficult to interpret or explain
Appendix
• Bayes’ rule
• Kullback-Leibler divergence
• Jensen’s inequality
• Linear function (E[f(X)] = f(E[X]))
• Convex function (E[f(X)] ≥ f(E[X]))
• Concave function (E[f(X)] ≤ f(E[X]))
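For reference, the standard formulas behind these appendix items, written out:

```latex
% Bayes' rule
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}

% Kullback-Leibler divergence between q and p (non-negative, zero iff q = p)
D_{\mathrm{KL}}(q \,\|\, p) = \mathbb{E}_{z \sim q}\!\left[ \log \frac{q(z)}{p(z)} \right]

% Jensen's inequality: for convex f, f(\mathbb{E}[X]) \le \mathbb{E}[f(X)];
% the inequality reverses for concave f and becomes equality for linear f.
```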
Thank you!
Any questions?