
Deep Generative Models

The document covers deep generative models, focusing on their training methods, including supervised and unsupervised learning, as well as specific models like autoencoders, variational autoencoders (VAEs), and generative adversarial networks (GANs). It discusses applications of generative models, such as generating realistic samples and data augmentation, while also highlighting the pros and cons of VAEs and GANs. The document emphasizes the importance of understanding the underlying structures of data and the challenges associated with training these models.


Deep Generative Models

Mostafa Mehdipour Ghazi ([email protected])

Pioneer Centre for Artificial Intelligence


Department of Computer Science
2

Intended Learning Outcomes

• Deep network training


• Supervised learning, unsupervised learning
• Pre-training, transfer learning
• Deep generative models
• Explicit models, implicit models
• Autoencoders (AEs)
• Variational autoencoders (VAEs)
• From reconstruction to generation
• Generative adversarial networks (GANs)
• From generation to discrimination
3

Generative Models Applications

• Realistic samples
• Artwork, super-resolution, colorization, customization

https://arxiv.org/pdf/2108.02774v1
4

Generative Models Applications

• Realistic samples
• Artwork, super-resolution, colorization, customization

• Learning general latent representations


• Inference, interpretability, denoising & reconstruction

• Data augmentation
• Robust model training, bias & fairness
5

• Deep network training


• Supervised learning, unsupervised learning
• Pre-training, transfer learning
• Deep generative models
• Explicit models, implicit models
• Autoencoders (AEs)
• Variational autoencoders (VAEs)
• From reconstruction to generation
• Generative adversarial networks (GANs)
• From generation to discrimination
6

Supervised Learning

• Data: (predictor x, target y)


• Goal: learn a function to map x ↦ y
• Example: classification

[Figure: shallow learning vs. deep learning pipelines]

https://www.mlguru.ai/Learn/concepts-deep-learning
7


Supervised Learning

• Data: (predictor x, target y)


• Goal: learn a function to map x ↦ y
• Example: detection, regression
10

Supervised Learning

• Data: (predictor x, target y)


• Goal: learn a function to map x ↦ y
• Example: pixel classification, segmentation
11

Unsupervised Learning

• Data: (predictor x), no labels!


• Goal: learn some underlying hidden structure of the data
• Example: clustering
12

Unsupervised Learning

• Data: (predictor x), no labels!


• Goal: learn some underlying hidden structure of the data
• Example: dimensionality reduction (PCA)

[Figure: data scattered along principal components PC1 and PC2]
13
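As a concrete sketch of the PCA example above, the following NumPy snippet (data and shapes are illustrative, not from the slides) projects centered data onto its principal components via the SVD:

```python
import numpy as np

# Toy PCA sketch: project 2-D correlated data onto its principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])  # correlated data
Xc = X - X.mean(axis=0)                      # center the data

# SVD of the centered data gives the principal directions (rows of Vt).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                           # coordinates along PC1, PC2

# Variance explained per component; singular values come sorted, so PC1 first.
var = S**2 / (len(X) - 1)
print(var[0] >= var[1])                      # prints True
```

Keeping only the first column of `scores` would be the dimensionality-reduced representation.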

Unsupervised Learning

• Data: (predictor x), no labels!


• Goal: learn some underlying hidden structure of the data
• Example: density estimation
14

Unsupervised Learning

• Data: (predictor x), no labels! (self-supervised if pseudo labels from x)


• Goal: learn some underlying hidden structure of the data
• Example: reconstruction, imputation, denoising

[Figure: MRIs with rotation and motion artifacts reconstructed into a high-resolution MRI]

https://arxiv.org/pdf/2308.04395
15

Unsupervised Learning

• Data: (predictor x), no labels! (self-supervised if pseudo labels from x)


• Goal: learn some underlying hidden structure of the data
• Example: generation

training data ~ pdata(x) generated samples ~ pmodel(x)

learning pmodel(x) similar to pdata(x)


16


Self-Supervised Learning

• Supervisory signals
• Generating pseudo labels from the input data
• Example: masked patches are used as labels in masked autoencoders for reconstruction

• Pretext tasks
• Learning meaningful context-aware representations of the data with a given task
• Example: contrastive learning to differentiate pairs like augmented views of the same image

• Transferable representations
• Using the learned representations for downstream tasks
• Example: fine-tuning the encoder for classification or segmentation
18
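The masked-autoencoder supervisory signal above can be sketched in a few lines of NumPy; the mask ratio and array shapes are illustrative assumptions:

```python
import numpy as np

# Sketch of pseudo-label construction in a masked autoencoder: the masked
# entries of the input become the reconstruction targets.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))            # 4 samples, 16 features ("patches")
mask = rng.random(x.shape) < 0.25       # roughly 25% of entries are masked

corrupted = np.where(mask, 0.0, x)      # model input: masked entries zeroed
targets = x[mask]                       # pseudo labels: the hidden values

# A reconstruction loss would compare model outputs at the masked positions
# against `targets`; no labels beyond x itself are needed.
```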

Deep Network Training

• Supervised pre-training
• Uses labeled data
• Learns task-specific representations

• Unsupervised pre-training
• Uses unlabeled data
• Learns generic representations

• New task
• What are the strategies for training?
• What factors to consider?
19

Deep Network Training

• New task
• Large enough data & resources → training from scratch

• Limited data & time → transfer learning

• Similarities between the data & targets

• More differences → fine-tuning

• Multiple targets → multitask learning


20

• Deep network training


• Supervised learning, unsupervised learning
• Pre-training, transfer learning
• Deep generative models
• Explicit models, implicit models
• Autoencoders (AEs)
• Variational autoencoders (VAEs)
• From reconstruction to generation
• Generative adversarial networks (GANs)
• From generation to discrimination
21

Deep Generative Models

• Explicit models
• Learn a model that explicitly defines and estimates density pmodel(x)
• Example: VAEs, denoising diffusion models (DDMs)

• Implicit models
• Learn a model that samples from pmodel(x) w/o explicitly defining it
• Example: GANs

training data ~ pdata(x) generated samples ~ pmodel(x)

https://link.springer.com/chapter/10.1007/978-3-031-72744-3_19
22

Autoencoders

• Train such that features can be used to reconstruct original data


23

Autoencoders

• Train such that features can be used to reconstruct original data


• To learn without labels for denoising, imputation, or enhancement
24

Autoencoders

• Train such that features can be used to reconstruct original data


• Learn lower-dimensional features from unlabeled data
25

Autoencoders

• Train such that features can be used to reconstruct original data


• Learn lower-dimensional features from unlabeled data
• To capture meaningful factors of data variation for downstream tasks
26

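A minimal sketch of this reconstruction objective, assuming a purely linear encoder/decoder on toy data (all sizes, the learning rate, and the step count are illustrative):

```python
import numpy as np

# Linear autoencoder sketch: encode 8-D data into a 2-D latent code and
# reconstruct it, trained by plain gradient descent on the MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)) * 0.5          # toy unlabeled data

d, k = 8, 2
W_enc = rng.normal(size=(d, k)) * 0.1        # encoder weights
W_dec = rng.normal(size=(k, d)) * 0.1        # decoder weights

def recon_loss(X, W_enc, W_dec):
    X_hat = X @ W_enc @ W_dec
    return float(np.mean((X_hat - X) ** 2))

loss0 = recon_loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(500):
    Z = X @ W_enc                            # latent codes (features)
    X_hat = Z @ W_dec                        # reconstruction
    E = 2.0 * (X_hat - X) / X.shape[0]       # gradient of loss w.r.t. X_hat
    g_dec = Z.T @ E                          # decoder gradient
    g_enc = X.T @ (E @ W_dec.T)              # encoder gradient
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

loss1 = recon_loss(X, W_enc, W_dec)
print(loss1 < loss0)                         # reconstruction error decreased
```

Because the bottleneck is 2-D, the model is forced to keep only the directions of largest variation, which is exactly the "lower-dimensional features" idea above.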

Autoencoders vs. U-Nets

• Similarities: both follow an encoder-decoder design that compresses the input and expands it back to full resolution
28

Autoencoders vs. U-Nets

• Different in architectures: with or without skip connections


• Different in tasks: representation learning or segmentation
29

Autoencoders

• Can reconstruct data and learn features to initialize supervised models


• Can capture factors of variation in latent space from training data

https://lilianweng.github.io/posts/2018-08-12-vae/
30

Autoencoders

• Can reconstruct data and learn features to initialize supervised models


• Can capture factors of variation in latent space from training data
• Cannot generate/sample (new) data

https://lilianweng.github.io/posts/2018-08-12-vae/
31


Autoencoders

• Can reconstruct data and learn features to initialize supervised models


• Can capture factors of variation in latent space from training data
• Cannot generate/sample (new) data
• Different types
• Stacked/Denoising autoencoders (DAE)
• Sparse autoencoders (SAE)
• Contractive autoencoders (CAE)
• Masked autoencoders (MAE)

https://lilianweng.github.io/posts/2018-08-12-vae/
33

• Deep network training


• Supervised learning, unsupervised learning
• Pre-training, transfer learning
• Deep generative models
• Explicit models, implicit models
• Autoencoders (AEs)
• Variational autoencoders (VAEs)
• From reconstruction to generation
• Generative adversarial networks (GANs)
• From generation to discrimination
34

Variational Autoencoders

• Probabilistic spin on autoencoders to sample from the model


• Gaussian prior pθ(z) = N(0, I)
• The latent distribution is often non-Gaussian and complex
• Decoder likelihood pθ(x|z) = N(µ θ(z), ∑θ(z))
• Gaussian assumptions for tractability and flexibility

https://lilianweng.github.io/posts/2018-08-12-vae/
35

Variational Autoencoders

• Probabilistic spin on autoencoders to sample from the model


• Marginal data likelihood pθ(x) = ∫ pθ(x|z) pθ(z) dz is intractable
• Posterior density pθ(z|x) = pθ(x|z) pθ(z) / pθ(x) is also intractable
• Integrating over all possible latent configurations is computationally expensive and impractical

https://lilianweng.github.io/posts/2018-08-12-vae/
36

Variational Autoencoders

• Training
• Use an encoder/inference network qɸ(z|x) = N(µɸ(x), ∑ɸ(x)) that approximates pθ(z|x)
• Use maximum likelihood estimation to estimate the model parameters

https://lilianweng.github.io/posts/2018-08-12-vae/
37

Evidence Lower Bound (ELBO)

• Maximize the likelihood of the observed data 𝑥 under the model

log pθ(x) = Ez∼qɸ(z|x)[log pθ(x)]
= Ez[log (pθ(x|z) pθ(z) / pθ(z|x))]    (Bayes' rule)
= Ez[log pθ(x|z)] − Ez[log (qɸ(z|x) / pθ(z))] + Ez[log (qɸ(z|x) / pθ(z|x))]
= Ez[log pθ(x|z)] − KLD(qɸ(z|x) ; pθ(z)) + KLD(qɸ(z|x) ; pθ(z|x))
≥ Ez[log pθ(x|z)] − KLD(qɸ(z|x) ; pθ(z)) = ELBO

• The last KLD term is non-negative, so the ELBO lower-bounds log pθ(x)

https://link.springer.com/book/10.1007/978-3-030-93158-2
42

VAEs in Practice

• Training
• Data likelihood (evidence) lower bound is tractable
• Maximize log pθ(x) ≥ ELBO = Ez[log pθ(x|z)] − KLD(qɸ(z|x) ; pθ(z))
= Reconstruction loss (x ; x’) − KLD(N(µɸ(x), ∑ɸ(x)) ; N(0, I))

https://lilianweng.github.io/posts/2018-08-12-vae/
43
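The KLD term above has a closed form when comparing a diagonal Gaussian against N(0, I); the snippet below evaluates the standard formula for illustrative values of µ and log-variance:

```python
import numpy as np

# Closed-form KL term of the VAE loss: KLD(N(mu, diag(sigma^2)) ; N(0, I)).
def kl_to_standard_normal(mu, log_var):
    # 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2), summed over latent dims
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

mu = np.array([[0.0, 0.0], [1.0, -1.0]])   # two example posteriors
log_var = np.zeros((2, 2))                 # sigma = 1 in both cases

kl = kl_to_standard_normal(mu, log_var)
print(kl)  # first posterior equals the prior, so its KL is 0
```

This term pulls every posterior qɸ(z|x) toward the prior N(0, I), which is what later makes sampling z ∼ N(0, I) for generation meaningful.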

VAEs in Practice

• Training
• The encoder learns to output 𝜇 and 𝜎 for each input data point
• A latent vector 𝑧 is sampled for reconstruction, using the reparameterization trick
𝑧 = 𝜇 + 𝜎 ϵ, ϵ ∼ N(0, I)

• Generation
• Sample latent 𝑧 ∼ N(0, I)
• Pass 𝑧 through the decoder
44

Reparameterization Trick

• Red nodes are non-differentiable sampling operations and blue nodes are loss layers
• Backpropagation can be applied to the reparameterized (right) network

https://arxiv.org/pdf/1606.05908v3
45
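The trick can be checked numerically: all randomness sits in ϵ, so z is a deterministic (and differentiable) function of µ and σ. The values of µ and σ below are illustrative:

```python
import numpy as np

# Reparameterization trick sketch: z = mu + sigma * eps with eps ~ N(0, I).
rng = np.random.default_rng(0)
mu, sigma = np.array([1.0, -2.0]), np.array([0.5, 1.5])

eps = rng.standard_normal(size=(100_000, 2))   # parameter-free noise
z = mu + sigma * eps                           # deterministic in mu, sigma

# Empirically, z has the intended mean and standard deviation.
print(np.allclose(z.mean(axis=0), mu, atol=0.02))   # prints True
print(np.allclose(z.std(axis=0), sigma, atol=0.02)) # prints True
```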



Summary

• Pros
• Generalization -> VAEs can generate diverse images due to better density modeling
• Interpretability -> VAE latent representations can be inspected and interpreted

• Cons
• Quality -> VAEs generate smoother/blurry and less detailed images
• Data -> VAEs require diverse enough data to span the entire distribution
• Dimensionality -> It is not clear how to choose the latent dimension
• Optimization -> The ELBO enforces an information bottleneck at the latent variables,
making the optimization prone to bad local minima
48

• Deep network training


• Supervised learning, unsupervised learning
• Pre-training, transfer learning
• Deep generative models
• Explicit models, implicit models
• Autoencoders (AEs)
• Variational autoencoders (VAEs)
• From reconstruction to generation
• Generative adversarial networks (GANs)
• From generation to discrimination
49

Generative Adversarial Networks

• GAN is a dynamic 2-player game


• Generator (Player 1) tries to create images that look real
• Discriminator (Player 2) tries to distinguish between real and fake images
• When Nash equilibrium is achieved, generated images are indistinguishable from real
• When no player can gain by changing strategy while the other player’s strategy is fixed

• How it works
• Iteration: both networks continuously update their strategies over time
• Learning: the generator learns to fool the discriminator, while the discriminator
becomes better at detecting fakes, mimicking the feedback loop seen in game theory
50
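The two-player objective described above can be evaluated directly for assumed discriminator outputs; the snippet uses the standard minimax discriminator loss and the non-saturating generator loss, with illustrative numbers rather than a trained model:

```python
import numpy as np

# D(x) is the probability the discriminator assigns to "real".
def d_loss(d_real, d_fake):
    # Discriminator minimizes -[log D(x) + log(1 - D(G(z)))]
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z))
    return -np.mean(np.log(d_fake))

# At the Nash equilibrium D cannot tell real from fake: D(.) = 0.5 everywhere.
d_real = np.full(4, 0.5)
d_fake = np.full(4, 0.5)
print(d_loss(d_real, d_fake))   # 2*log(2), about 1.386, at equilibrium
```

During training, each update of one network changes the loss surface of the other, which is the feedback loop the slide describes.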

GANs in Practice

• Training GANs: the discriminator and the generator are updated in alternation

https://newsletter.theaiedge.io/p/how-generative-adversarial-networks
51

Summary

• Pros

• Cons
52


Summary

• Pros
• Quality -> GANs can generate high-quality, sharp images
• Utility -> Adversarial concepts can be used to improve the generation process

• Cons
• Training instability -> Jointly training two networks can result in mode collapse
• Bias and fairness -> GANs can reflect the biases present in the training data
• Interpretability -> GANs are implicit models and difficult to interpret or explain
54

Appendix

• Bayes’ rule
pθ(z|x) = pθ(x|z) pθ(z) / pθ(x)

• Kullback-Leibler divergence
KLD(q ; p) = Ez∼q[log (q(z) / p(z))] ≥ 0

• Jensen’s inequality
• Linear function: E[f(x)] = f(E[x])
• Convex function: E[f(x)] ≥ f(E[x])
• Concave function: E[f(x)] ≤ f(E[x])
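A quick numerical check of the concave case for f = log, which is the case used in the ELBO bound; the samples below are illustrative:

```python
import numpy as np

# Jensen's inequality for a concave function: E[log x] <= log E[x].
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=10_000)   # positive samples

lhs = np.log(x).mean()                   # E[log x]
rhs = np.log(x.mean())                   # log E[x]
print(lhs <= rhs)                        # prints True
```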
Thank you!

Any questions?
