For the optimal discriminator $D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}$:
\[
\begin{aligned}
V(G, D^*_G(x))
&= \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x)+p_G(x)}\right]
 + \mathbb{E}_{x\sim p_G}\!\left[\log \frac{p_G(x)}{p_{\text{data}}(x)+p_G(x)}\right] \\
&= \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{\tfrac{p_{\text{data}}(x)+p_G(x)}{2}}\right]
 + \mathbb{E}_{x\sim p_G}\!\left[\log \frac{p_G(x)}{\tfrac{p_{\text{data}}(x)+p_G(x)}{2}}\right] - \log 4 \\
&= \underbrace{D_{\mathrm{KL}}\!\left[p_{\text{data}}, \tfrac{p_{\text{data}}+p_G}{2}\right]
 + D_{\mathrm{KL}}\!\left[p_G, \tfrac{p_{\text{data}}+p_G}{2}\right]}_{2\times\text{Jensen-Shannon divergence (JSD)}} - \log 4 \\
&= 2\, D_{\mathrm{JSD}}[p_{\text{data}}, p_G] - \log 4
\end{aligned}
\]
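As a sanity check on this identity, the following sketch (a toy discrete example of my own, not from the slides) evaluates both sides with NumPy and also confirms that plugging in p_G = p_data gives the minimum value −log 4:

```python
import numpy as np

# Toy discrete distributions (hypothetical, not from the slides).
p_data = np.array([0.1, 0.4, 0.3, 0.2])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

# Optimal discriminator D*_G(x) = p_data(x) / (p_data(x) + p_G(x)).
d_star = p_data / (p_data + p_g)

# GAN objective evaluated at D*_G.
v = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))


def kl(p, q):
    return np.sum(p * np.log(p / q))


# 2 * JSD(p_data, p_G) - log 4, via the mixture m = (p_data + p_G) / 2.
m = 0.5 * (p_data + p_g)
jsd = 0.5 * (kl(p_data, m) + kl(p_g, m))

assert np.isclose(v, 2.0 * jsd - np.log(4.0))

# With p_G = p_data, D*_G(x) = 1/2 everywhere, JSD = 0, and V = -log 4.
v_min = np.sum(p_data * np.log(0.5)) + np.sum(p_data * np.log(0.5))
assert np.isclose(v_min, -np.log(4.0))
```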
Properties
D_JSD[p, q] ≥ 0
D_JSD[p, q] = 0 iff p = q
D_JSD[p, q] = D_JSD[q, p]
√(D_JSD[p, q]) satisfies the triangle inequality → Jensen-Shannon distance
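These properties can be checked numerically; note that SciPy's `jensenshannon` returns the Jensen-Shannon *distance* (the square root of the divergence), which is exactly the quantity that satisfies the triangle inequality. The distributions below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon  # JS distance = sqrt(JSD)

rng = np.random.default_rng(0)
p, q, r = (rng.dirichlet(np.ones(5)) for _ in range(3))

d_pq, d_qp = jensenshannon(p, q), jensenshannon(q, p)
assert d_pq >= 0.0                                          # non-negativity
assert np.isclose(d_pq, d_qp)                               # symmetry
assert np.isclose(jensenshannon(p, p), 0.0)                 # zero iff p = q
assert jensenshannon(p, r) <= d_pq + jensenshannon(q, r)    # triangle inequality
```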
Optimal generator for the JSD / negative cross-entropy GAN: p_G = p_data
For the optimal discriminator D^*_{G^*}(·) and generator G^*(·), we have
V(G^*, D^*_{G^*}(x)) = − log 4
Recap of GANs
\[
\min_\theta \max_\phi V(G_\theta, D_\phi) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D_\phi(x)] + \mathbb{E}_{z\sim p(z)}[\log(1 - D_\phi(G_\theta(z)))]
\]
[Figure: GAN samples. Image source: Ian Goodfellow. Samples from Goodfellow et al., 2014; Radford et al., 2015; Liu et al., 2016; Karras et al., 2017; Karras et al., 2018.]
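A minimal sketch of the alternating updates this objective suggests, assuming a toy 1-D "data" distribution, small MLPs for Gθ and Dϕ, and Adam; this is illustrative only and is not the training code behind the cited samples:

```python
import torch
import torch.nn as nn

# Toy 1-D setup (assumptions: Gaussian stand-in for p_data, small MLPs, Adam).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # G_theta
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # D_phi
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
eps = 1e-8  # numerical safety inside the logs

for step in range(1000):
    x = 0.5 * torch.randn(64, 1) + 2.0   # stand-in for samples from p_data
    z = torch.randn(64, 8)               # z ~ p(z)

    # Discriminator ascent on V: maximize E[log D(x)] + E[log(1 - D(G(z)))].
    d_loss = -(torch.log(D(x) + eps).mean()
               + torch.log(1.0 - D(G(z).detach()) + eps).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator descent on V: minimize E[log(1 - D(G(z)))].
    g_loss = torch.log(1.0 - D(G(torch.randn(64, 8))) + eps).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```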
Optimization challenges
Theorem (informal): If the generator updates are made in function space and the discriminator is optimal at every step, then the generator is guaranteed to converge to the data distribution.
Unrealistic assumptions!
In practice, the generator and discriminator losses keep oscillating during GAN training, as the toy example below illustrates.
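A toy illustration (not from the lecture) of this behaviour: simultaneous gradient descent/ascent on the bilinear game V(θ, ϕ) = θϕ, whose equilibrium is θ = ϕ = 0, produces iterates that rotate around the equilibrium and slowly spiral outward instead of converging:

```python
import numpy as np

# theta: generator parameter (descends on V); phi: discriminator parameter (ascends on V).
theta, phi, lr = 1.0, 1.0, 0.1
trace = []
for t in range(500):
    g_theta, g_phi = phi, theta                         # dV/dtheta = phi, dV/dphi = theta
    theta, phi = theta - lr * g_theta, phi + lr * g_phi  # simultaneous updates
    trace.append(theta * phi)

# The value V = theta * phi keeps flipping sign as the iterates orbit (0, 0),
# mirroring the oscillating losses seen in GAN training.
print(np.sign(trace[::50]))
```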
Likelihood-free training
Training objective for GANs
The GAN Zoo (a list of all named GANs): https://github.com/hindupuravinash/the-gan-zoo
Today
Rich class of likelihood-free objectives via f-GANs
Wasserstein GAN
Inferring latent representations via BiGAN
Application: Image-to-image translation via CycleGANs
Jensen's inequality (for convex f with f(1) = 0):
\[
\mathbb{E}_{x\sim q}\!\left[f\!\left(\tfrac{p(x)}{q(x)}\right)\right] \;\ge\; f\!\left(\mathbb{E}_{x\sim q}\!\left[\tfrac{p(x)}{q(x)}\right]\right) = f\!\left(\int q(x)\,\tfrac{p(x)}{q(x)}\,\mathrm{d}x\right) = f\!\left(\int p(x)\,\mathrm{d}x\right) = f(1) = 0
\]
Example: KL divergence with f(u) = u log u
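A quick discrete check (hypothetical distributions, not from the slides) that the f-divergence E_{x∼q}[f(p(x)/q(x))] with f(u) = u log u recovers KL(p, q) and is non-negative, as the inequality above guarantees:

```python
import numpy as np


def f(u):
    return u * np.log(u)          # f(u) = u log u, convex with f(1) = 0


p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

d_f = np.sum(q * f(p / q))        # E_{x~q}[f(p(x)/q(x))]
kl = np.sum(p * np.log(p / q))    # KL(p || q)

assert np.isclose(d_f, kl) and d_f >= 0.0
print(d_f, kl)
```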
f-divergences
Wasserstein distance
Kantorovich-Rubinstein duality