
CISC 867 Deep Learning

15. Generative Adversarial Networks

Credits: Vassilis Athitsos, Justin Johnson

1
Synthetic Data

• Synthetic data is data that is generated by a machine.


• For example:
– A “real” face image is a photograph of someone’s face.
– A “synthetic” face is an image made up by a computer program; it
may have been designed to resemble someone, or designed not to
resemble anyone.
• Synthetic data can have many forms:
– Synthetic text, for example a story or script or joke (or attempted
joke) produced by a computer program.
– Synthetic music.
– Synthetic images and video.

2
Uses of Synthetic Data

• What are possible uses of synthetic data?


• Synthetic data is often used as training data, especially
when “real” data is not as abundant as we would like.
• One example is hand pose estimation.

[Figure: input image of a hand; the estimated hand pose consists of the hand shape and the 3D hand orientation.]
3
Realistic Scenes in Games and Movies

• Realistic synthetic data is highly valued in the gaming


and entertainment industry.
• For example:
– Scenes in sci-fi and fantasy movies may integrate real actors and
landscapes with imaginary creatures and landscapes.
– Scenes in action movies showing explosions and massive
destruction can be much safer and cheaper to produce if they are
not real.
– In computer games, it may be important for people, objects and/or
scenery to look realistic.
– Realistic motion is also important, and can be very challenging to
synthesize (for example, realistic motion of smoke, fire, water,
humans and animals).

4
Generative Adversarial Networks

• Generative Adversarial Networks (GANs) were


introduced in 2014 by this paper:
Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing;
Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio,
Yoshua (2014). Generative Adversarial Nets (PDF). Proceedings
of the International Conference on Neural Information
Processing Systems (NIPS 2014). pp. 2672–2680.
https://arxiv.org/abs/1406.2661
• GANs have become very popular and are commonly
used to generate realistic synthetic data.

5
Generator and Discriminator

• What we really want is a “generator”: a module that


produces realistic synthetic data.
• However, in a GAN model, we essentially train two
separate modules that compete with each other:
– The generator module, which produces synthetic data that is
hopefully very realistic.
– A discriminator module, which is trained to recognize if a piece of
data is real or synthetic.

6
Generator and Discriminator

• The word “adversarial” in Generative Adversarial


Networks refers to the fact that the generator and the
discriminator actually compete with each other.
• The goal of the generator is to be so good that it can fool
the discriminator as often as possible.
– A good generator produces synthetic data that cannot be
distinguished from real data, so the discriminator fails at that task.
• The goal of the discriminator is to be so good that it
cannot be fooled by the generator.
– The discriminator should tell with high accuracy if a piece of data
is real or synthetic.

7
How It (Hopefully) Works

• The first version of the generator is initialized with random


weights. Consequently, it produces random images that are
not realistic at all.
• The discriminator is trained on a training set that combines:
– A hopefully large number of real images.
– An equally large number of images produced by the generator.
• Since the generated images are not realistic, the discriminator
should achieve very high accuracy on this initial training set.
• Now we can train a second version of the generator.
– Each input is just a random vector, which is used to make sure that the
output images are not identical to each other.
– The loss function is computed by giving the output of the generator to
the discriminator. The more confident the discriminator is that the
image is synthetic, the higher the loss.

8
How It (Hopefully) Works

• The second version of the generator should be better than


the initial version with random weights.
– The output images should now be more realistic.
• We now train a second version of the discriminator,
incorporating into the training set the output images of
the second version of the generator.
• Then, we train a third version of the generator, using the
second version of the discriminator.
• And so on, we keep training alternatively:
– a new version of the discriminator, using the latest version of the
discriminator.
– a new version of the generator, using the latest version of the
discriminator.

9
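The alternating scheme above can be sketched in a few lines of PyTorch. Everything concrete here is an illustrative assumption rather than part of the slides: the toy 2-D “real” data, the tiny network sizes, the Adam learning rates, and the use of binary cross-entropy as the discriminator's loss.

import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, data_dim = 8, 2

# Generator: random vector z -> synthetic data point; Discriminator: data point -> P(real)
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    # Train the discriminator on a batch of real data (label 1) and generated data (label 0).
    real = torch.randn(64, data_dim) * 0.5 + 3.0      # stand-in for a batch of real samples
    z = torch.randn(64, z_dim)                        # random input vectors
    fake = G(z).detach()                              # freeze G while D is being trained
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Train the generator against the current discriminator: the more confident D is
    # that G(z) is synthetic, the higher the generator's loss.
    z = torch.randn(64, z_dim)
    g_loss = bce(D(G(z)), torch.ones(64, 1))          # "fool D" objective
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()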
Problems With Convergence

• In the models we have trained previously this semester, we


were optimizing a single loss function.
– Both theoretically and practically, we knew that we would get the
model weights to converge to a local optimum.
• Here, we have two competing loss functions:
– The generator loss function, that is optimized as the generator gets
better at fooling the discriminator.
– The discriminator loss function, that is optimized as the discriminator
gets better at NOT being fooled by the generator.
• We optimize these losses iteratively, one after the other.
• It would be nice to be able to guarantee that after each
iteration, both the generator and the discriminator are better
(or at least not worse) than they were before that iteration.
– Unfortunately, the opposite can also happen.

10
Problems With Convergence

• For example, suppose that we get to a point where the


generator is really great, and it fools the discriminator to
the maximum extent.
• What is the “maximum extent”?
– The discriminator has to solve a binary classification problem: “real”
vs. “synthetic”.
– A random classifier would attain 50% accuracy.
– With a perfect generator, the discriminator will be no better and no
worse than a random classifier.
• If the generator is perfect, training the discriminator will
produce a useless model, equivalent to a random classifier.
• The previous version of the discriminator, trained with
data from an imperfect generator, would probably be
better than the current version.

11
Problems With Convergence

• Conversely, suppose that we get to a point where the


discriminator is 100% accurate, so that it is never fooled.
– In that case, training the generator will produce a useless model,
equivalent to a random image generator, since producing more realistic
images will have no effect on the loss function.
– The previous version of the generator, trained with data from an
imperfect discriminator, would probably be better than the current
version.
• So, overall, if one of the two components gets too good,
then that makes it harder to improve the other component.
• In practice, GANs are widely used and often produce great
results, but the system designer may need to manually
intervene to guide the training in the right direction.
– Overall, training GANs is somewhat complicated and heuristic.

12
Needed Math
Expectation

The expected value of a function of a random variable is its average
(mean) value with respect to the variable's distribution:

E_{X \sim P(x)}[f(x)] \equiv \sum_{x \in X} f(x)\, P(x) \qquad \text{(discrete case)}

E_{X \sim p(x)}[f(x)] \equiv \int_{x \in X} f(x)\, p(x)\, dx \qquad \text{(continuous case)}

13
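A small numeric check of the discrete case above (NumPy); the example distribution and the function f are made up for illustration.

import numpy as np

x = np.array([0, 1, 2, 3])            # support of X
P = np.array([0.1, 0.2, 0.3, 0.4])    # P(x), sums to 1
f = lambda v: v ** 2

expectation = np.sum(f(x) * P)        # E_{X~P}[f(X)] = sum over x of f(x) P(x)
print(expectation)                    # 0*0.1 + 1*0.2 + 4*0.3 + 9*0.4 = 5.0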
Kullback–Leibler Divergence

KL (Kullback–Leibler) divergence measures how one probability


distribution 𝑝 diverges from a second expected probability
distribution 𝑞
D_{KL}(p \,\|\, q) = \int_x p(x) \log \frac{p(x)}{q(x)}\, dx

KL divergence is asymmetric; it attains its minimum of zero when p(x) = q(x) everywhere.

From GAN to WGAN (Weng, 2017):
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html

14
Jensen–Shannon Divergence

Jensen–Shannon Divergence is another measure of similarity


between two probability distributions

D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\!\left(q \,\Big\|\, \frac{p+q}{2}\right)

JS divergence is symmetric.

From GAN to WGAN (Weng, 2017):
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html

15
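A small NumPy sketch of both divergences for discrete distributions; the example distributions p and q below are made up for illustration.

import numpy as np

def kl(p, q):
    # D_KL(p || q) = sum over x of p(x) log(p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    # D_JS(p || q) = 0.5 * D_KL(p || m) + 0.5 * D_KL(q || m), with m = (p + q) / 2
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]
print(kl(p, q), kl(q, p))    # asymmetric: the two values differ
print(jsd(p, q), jsd(q, p))  # symmetric: the two values are equal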
Generative Adversarial Networks

Setup: Assume we have data xi drawn from distribution pdata(x). Want to sample from pdata.

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

16
Generative Adversarial Networks
Setup: Assume we have data xi drawn from distribution pdata(x). Want to sample from pdata.

Idea: Introduce a latent variable z with simple prior p(z).


Sample 𝑧 ∼ 𝑝(𝑧) and pass to a Generator Network x = G(z)
Then x is a sample from the Generator distribution pG. Want pG = pdata!

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

17
Generative Adversarial Networks
Setup: Assume we have data xi drawn from distribution pdata(x). Want to sample from pdata.

Idea: Introduce a latent variable z with simple prior p(z).


Sample 𝑧 ∼ 𝑝(𝑧) and pass to a Generator Network x = G(z)
Then x is a sample from the Generator distribution pG. Want pG = pdata!

[Diagram: sample z from pz → Generator Network G → generated sample.]
Train Generator Network G to convert z into fake data x sampled from pG.

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

18
Generative Adversarial Networks
Setup: Assume we have data xi drawn from distribution pdata(x). Want to sample from pdata.

Idea: Introduce a latent variable z with simple prior p(z).


Sample 𝑧 ∼ 𝑝(𝑧) and pass to a Generator Network x = G(z)
Then x is a sample from the Generator distribution pG. Want pG = pdata!

[Diagram: sample z from pz → Generator Network G → generated sample; generated and real samples → Discriminator Network D → real / fake.]
Train Generator Network G to convert z into fake data x sampled from pG by “fooling” the discriminator D.
Train Discriminator Network D to classify data as real or fake (1/0).
Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

19
Generative Adversarial Networks
Setup: Assume we have data xi drawn from distribution pdata(x). Want to sample from pdata.

Idea: Introduce a latent variable z with simple prior p(z).


Sample 𝑧 ∼ 𝑝(𝑧) and pass to a Generator Network x = G(z)
Then x is a sample from the Generator distribution pG. Want pG = pdata!
Jointly train G and D. Hopefully pG converges to pdata!
[Diagram: sample z from pz → Generator Network G → generated sample; generated and real samples → Discriminator Network D → real / fake.]
Train Generator Network G to convert z into fake data x sampled from pG by “fooling” the discriminator D.
Train Discriminator Network D to classify data as real or fake (1/0).
Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

20
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

21
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

[Diagram: sample z from pz → Generator Network G → generated sample → Discriminator Network D → real / fake.]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

22
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game
Discriminator wants
D(x) = 1 for real data

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

[Diagram: sample z from pz → Generator Network G → generated sample → Discriminator Network D → real / fake.]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

23
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game
Discriminator wants D(x) = 1 for real data. Discriminator wants D(x) = 0 for fake data.

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

[Diagram: sample z from pz → Generator Network G → generated sample → Discriminator Network D → real / fake.]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

24
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game
Discriminator wants D(x) = 1 for real data. Discriminator wants D(x) = 0 for fake data.

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

Generator wants D(x) = 1 for fake data.
[Diagram: sample z from pz → Generator Network G → generated sample → Discriminator Network D → real / fake.]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

25
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

Train G and D using alternating gradient updates

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

26
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

Train G and D using alternating gradient updates

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D V(G, D)

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

27
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

Train G and D using alternating gradient updates

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D V(G, D)

For t in 1, ..., T:
1. (Update D) D = D + \alpha_D \frac{\partial V}{\partial D}
2. (Update G) G = G - \alpha_G \frac{\partial V}{\partial G}
Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

28
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

Train G and D using alternating gradient updates

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D V(G, D)

We are not minimizing any overall loss! No training curves to look at!

For t in 1, ..., T:
1. (Update D) D = D + \alpha_D \frac{\partial V}{\partial D}
2. (Update G) G = G - \alpha_G \frac{\partial V}{\partial G}
Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

29
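The alternating gradient updates can also be written directly in terms of V(G, D), as in the PyTorch sketch below. The toy data, small networks, and plain SGD step sizes are assumptions, not the reference implementation; the point is that D takes a gradient ascent step on V while G takes a descent step, and there is no single loss curve to monitor.

import torch
import torch.nn as nn

z_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_G = torch.optim.SGD(G.parameters(), lr=1e-3)

def V(real, z, eps=1e-8):
    # V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], estimated on one minibatch
    return torch.log(D(real) + eps).mean() + torch.log(1 - D(G(z)) + eps).mean()

for t in range(100):
    real = torch.randn(64, data_dim) + 3.0        # stand-in for a batch of real samples
    z = torch.randn(64, z_dim)

    # 1. Update D: gradient ascent on V (implemented as descent on -V)
    opt_D.zero_grad(); (-V(real, z)).backward(); opt_D.step()

    # 2. Update G: gradient descent on V
    z = torch.randn(64, z_dim)
    opt_G.zero_grad(); V(real, z).backward(); opt_G.step()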
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

At start of training, generator is very bad


and discriminator can easily tell apart
real/fake, so D(G(z)) close to 0

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

30
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

At start of training, generator is very bad


and discriminator can easily tell apart
real/fake, so D(G(z)) close to 0
Problem: Vanishing gradients for G

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

31
Generative Adversarial Networks:
Training Objective
Jointly train generator G and discriminator D with a minimax game
\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]
At start of training, generator is very bad
and discriminator can easily tell apart
real/fake, so D(G(z)) close to 0
Problem: Vanishing gradients for G
Solution: Right now G is trained to
minimize log(1 − D(G(z))). Instead, train G
to minimize −log(D(G(z))). Then G gets
strong gradients at start of training!

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

32
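A quick numeric check of the gradient claim (PyTorch); the value 0.01 for D(G(z)) is just an assumed "early in training" number, when the discriminator easily spots the fake.

import torch

d_out = torch.tensor([0.01], requires_grad=True)  # D(G(z)) close to 0

saturating = torch.log(1 - d_out)                 # original generator objective log(1 - D(G(z)))
saturating.backward()
print(d_out.grad)                                 # about -1.01: weak gradient signal

d_out.grad = None
non_saturating = -torch.log(d_out)                # alternative objective -log(D(G(z)))
non_saturating.backward()
print(d_out.grad)                                 # -100: strong gradient signal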
Generative Adversarial Networks: Optimality
Jointly train generator G and discriminator D with a minimax game

Why is this particular objective a good idea?

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

This minimax game achieves its global minimum when pG = pdata!

Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

33
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

(Our objective so far)

34
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

(Change of variables on second term)

35
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \max_D \int_X \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx
(Definition of expectation)

36
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

(Push maxD inside integral)

37
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

f(y) = a \log y + b \log(1 - y)

(Side computation to compute max)

38
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

f(y) = a \log y + b \log(1 - y)

f'(y) = \frac{a}{y} - \frac{b}{1 - y}

39
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

f(y) = a \log y + b \log(1 - y) \qquad f'(y) = 0 \iff y = \frac{a}{a + b} \quad \text{(local max)}

f'(y) = \frac{a}{y} - \frac{b}{1 - y}

40
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

f(y) = a \log y + b \log(1 - y) \qquad f'(y) = 0 \iff y = \frac{a}{a + b} \quad \text{(local max)}

f'(y) = \frac{a}{y} - \frac{b}{1 - y}

Optimal Discriminator: D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}

41
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \max_D \left( p_{data}(x) \log D(x) + p_G(x) \log(1 - D(x)) \right) dx

Optimal Discriminator: D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}

42
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \left( p_{data}(x) \log D_G^*(x) + p_G(x) \log(1 - D_G^*(x)) \right) dx

Optimal Discriminator: D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}

43
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{x \sim p_G}[\log(1 - D(x))]

= \min_G \int_X \left( p_{data}(x) \log D_G^*(x) + p_G(x) \log(1 - D_G^*(x)) \right) dx

= \min_G \int_X \left( p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} + p_G(x) \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right) dx

Optimal Discriminator: D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}

44
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \int_X \left( p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} + p_G(x) \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right) dx

45
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \int_X \left( p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} + p_G(x) \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right) dx

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right]

(Definition of expectation)

46
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \int_X \left( p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} + p_G(x) \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right) dx

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{2\,(p_{data}(x) + p_G(x))} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{2\,(p_{data}(x) + p_G(x))} \right]

(Multiply by a constant)

47
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \int_X \left( p_{data}(x) \log \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} + p_G(x) \log \frac{p_G(x)}{p_{data}(x) + p_G(x)} \right) dx

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{2\,(p_{data}(x) + p_G(x))} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{2\,(p_{data}(x) + p_G(x))} \right]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

48
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

49
Generative Adversarial Networks: Optimality
\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

Kullback-Leibler Divergence: KL(p, q) = E_{x \sim p}\left[ \log \frac{p(x)}{q(x)} \right]

50
Generative Adversarial Networks: Optimality
\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

Kullback-Leibler Divergence: KL(p, q) = E_{x \sim p}\left[ \log \frac{p(x)}{q(x)} \right]

51
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

Kullback-Leibler Divergence: KL(p, q) = E_{x \sim p}\left[ \log \frac{p(x)}{q(x)} \right]

52
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

53
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

Jensen-Shannon Divergence: JSD(p, q) = \frac{1}{2} KL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p + q}{2}\right)

54
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

Jensen-Shannon Divergence: JSD(p, q) = \frac{1}{2} KL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p + q}{2}\right)

55
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

= \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4

Jensen-Shannon Divergence: JSD(p, q) = \frac{1}{2} KL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p + q}{2}\right)

56
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; E_{x \sim p_{data}}\left[ \log \frac{2\, p_{data}(x)}{p_{data}(x) + p_G(x)} \right] + E_{x \sim p_G}\left[ \log \frac{2\, p_G(x)}{p_{data}(x) + p_G(x)} \right] - \log 4

= \min_G \; KL\left(p_{data}, \frac{p_{data} + p_G}{2}\right) + KL\left(p_G, \frac{p_{data} + p_G}{2}\right) - \log 4

= \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4

JSD is always nonnegative, and zero only when the two distributions are equal!
Thus pdata = pG is the global min.

Jensen-Shannon Divergence: JSD(p, q) = \frac{1}{2} KL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p + q}{2}\right)

57
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4

58
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4

Summary: The global minimum of the minimax game happens when:
1. D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} (Optimal discriminator for any G)
2. p_G(x) = p_{data}(x) (Optimal generator for optimal D)

59
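A quick sanity check (not on the original slide, but it follows directly from the two conditions above): plugging the optimal generator into the optimal discriminator and into the objective gives

p_G = p_{data} \;\Rightarrow\; D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{data}(x)} = \frac{1}{2}, \qquad V(G, D_G^*) = E_{x \sim p_{data}}\left[\log \tfrac{1}{2}\right] + E_{x \sim p_G}\left[\log \tfrac{1}{2}\right] = -\log 4,

which matches \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4 with JSD(p_{data}, p_G) = 0.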
Generative Adversarial Networks: Optimality

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

= \min_G \; 2 \cdot JSD(p_{data}, p_G) - \log 4

Summary: The global minimum of the minimax game happens when:
1. D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)} (Optimal discriminator for any G)
2. p_G(x) = p_{data}(x) (Optimal generator for optimal D)
Caveats:
1. G and D are neural nets with fixed architecture. We don’t know
whether they can actually represent the optimal D and G.
2. This tells us nothing about convergence to the optimal solution

60
Generative Adversarial Networks: Results

[Figure: generated samples, with the nearest neighbor from the training set shown for comparison.]


Goodfellow et al, “Generative Adversarial Nets”, NeurIPS 2014

61
Generative Adversarial Networks: DC-GAN

[Figure: DC-GAN generator architecture.]
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016

62
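A minimal PyTorch sketch in the spirit of the DC-GAN generator shown above: a stack of transposed convolutions with batch normalization and ReLU that maps a latent vector z to a 64x64 RGB image. The exact channel counts and layer count here follow the common DC-GAN recipe but are assumptions, not values read off the slide.

import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0, bias=False),   # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # 8x8 -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # 16x16 -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),           # 32x32 -> 64x64
            nn.Tanh(),                                                 # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

images = DCGANGenerator()(torch.randn(16, 100))   # 16 generated 3x64x64 images (untrained weights)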
Generative Adversarial Networks: DC-GAN

Samples from the model look much better!

Radford et al, ICLR 2016

63
Generative Adversarial Networks: Interpolation

Interpolating between points in latent z space

Radford et al, ICLR 2016

64
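The interpolation itself is simple: pick two latent vectors, move along the straight line between them, and decode each point with the generator. The generator below is an untrained stand-in, used only to make the sketch self-contained.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 28 * 28), nn.Tanh())   # stand-in for a trained generator

z0, z1 = torch.randn(100), torch.randn(100)
steps = torch.linspace(0, 1, 8).view(-1, 1)
z_path = (1 - steps) * z0 + steps * z1                  # 8 latent points on the line from z0 to z1
images = G(z_path).view(8, 28, 28)                      # with a trained G, these morph smoothly from G(z0) to G(z1)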
Generative Adversarial Networks: Vector Math
[Figure: samples from the model, grouped into three sets: smiling woman, neutral woman, neutral man.]

Radford et al, ICLR 2016

65
Generative Adversarial Networks: Vector Math
[Figure: samples from the model, grouped into three sets: smiling woman, neutral woman, neutral man.]
Average the Z vectors of each group, then do arithmetic on the averages.
Radford et al, ICLR 2016

66
Generative Adversarial Networks: Vector Math
[Figure: samples from the model, grouped into three sets: smiling woman, neutral woman, neutral man.]
Average the Z vectors of each group, then do arithmetic on the averages:
smiling woman − neutral woman + neutral man = smiling man.
Radford et al, ICLR 2016

67
Generative Adversarial Networks: Vector Math
[Figure: samples from the model, grouped into three sets: man with glasses, man without glasses, woman without glasses.]
Average the Z vectors of each group, then do arithmetic on the averages.
Radford et al, ICLR 2016

68
Generative Adversarial Networks: Vector Math
[Figure: samples from the model, grouped into three sets: man with glasses, man without glasses, woman without glasses.]
Average the Z vectors of each group, then do arithmetic on the averages:
man with glasses − man without glasses + woman without glasses = woman with glasses.
Radford et al, ICLR 2016

69
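The arithmetic above can be sketched as follows. The stand-in generator and the random "concept" vectors are placeholders to keep the sketch self-contained; with a trained DC-GAN and z vectors chosen from samples showing each concept, the decoded result should show the new concept (here, a woman with glasses).

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 28 * 28), nn.Tanh())   # stand-in for a trained generator

# z vectors whose generated samples were judged to show each concept (3 per concept)
z_man_glasses = torch.randn(3, 100)
z_man_plain = torch.randn(3, 100)
z_woman_plain = torch.randn(3, 100)

# Average each group, then: man with glasses - man without glasses + woman without glasses
z = z_man_glasses.mean(0) - z_man_plain.mean(0) + z_woman_plain.mean(0)
image = G(z).view(28, 28)                                # decoded result: ideally a woman with glasses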
2017 to present: Explosion of GANs

https://github.com/hindupuravinash/the-gan-zoo

70
GAN Improvements: Improved Loss Functions
Wasserstein GAN (WGAN):
Arjovsky, Chintala, and Bottou, “Wasserstein GAN”, 2017

WGAN with Gradient Penalty (WGAN-GP):
Gulrajani et al, “Improved Training of Wasserstein GANs”, NeurIPS 2017

71
