L18 GAN Slides
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching
Lecture 18: Introduction to Generative Adversarial Networks with Applications in Python
https://arxiv.org/abs/1406.2661
https://thiscatdoesnotexist.com
https://thisponydoesnotexist.net
https://thispersondoesnotexist.com
[Figure: GAN overview. Noise is fed to the generator, which produces a generated image; the discriminator receives real images from the training set and generated images and is trained (TRAIN) to predict real vs. generated.]
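Since the course uses PyTorch for the applications part, a minimal sketch of the two players as modules might look as follows; the MLP layer sizes and the flattened 28x28 image dimension are illustrative assumptions, not taken from the slides.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Maps a latent noise vector z to a flattened image (sizes are illustrative).
    def __init__(self, latent_dim=100, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # outputs in [-1, 1], assuming images are scaled to that range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    # Maps an image to a single logit for p(y = "real image" | x).
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # raw logit; pair with sigmoid or BCEWithLogitsLoss
        )

    def forward(self, x):
        return self.net(x)

generator, discriminator = Generator(), Discriminator()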
[Figure: Step 1 — train the discriminator (TRAIN). Images from the training set are fed to the discriminator, which outputs p(y = "real image" | x).]
[Figure: Step 2 — train the generator (TRAIN) while the discriminator is frozen (FREEZE). Noise is fed to the generator, the generated image is passed to the frozen discriminator, and the generator is updated based on the discriminator's output p(y = "real image" | x).]
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
Discriminator update (ascend this stochastic gradient):

\nabla_{W_D} \frac{1}{n} \sum_{i=1}^{n} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
Generator update (descend this stochastic gradient):

\nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)
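Both updates are binary cross-entropy terms in disguise. A short check (my own rewriting, not part of the slides), using labels y = 1 for real and y = 0 for generated images:

\log p(y \mid \hat{y}) = y \log \hat{y} + (1 - y) \log(1 - \hat{y})

For a real image, \hat{y} = D(x^{(i)}) and y = 1, giving \log D(x^{(i)}); for a generated image, \hat{y} = D(G(z^{(i)})) and y = 0, giving \log(1 - D(G(z^{(i)}))). So ascending the discriminator objective is the same as descending the standard cross-entropy loss, which is how it is usually implemented.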
Algorithm 1 Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, k, is a hyperparameter. We used k = 1, the least expensive option, in our experiments.

for number of training iterations do
    for k steps do
        • Sample minibatch of m noise samples {z^{(1)}, ..., z^{(m)}} from noise prior p_g(z).
        • Sample minibatch of m examples {x^{(1)}, ..., x^{(m)}} from data generating distribution p_data(x).
        • Update the discriminator by ascending its stochastic gradient:
          \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
    end for
    • Sample minibatch of m noise samples {z^{(1)}, ..., z^{(m)}} from noise prior p_g(z).
    • Update the generator by descending its stochastic gradient:
      \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)
end for
The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.
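A compact PyTorch sketch of this training loop; the MLP architectures, the SGD-with-momentum settings, and the random tensors standing in for a real data loader are all illustrative assumptions, not the lecture's reference code:

import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, img_dim, batch_size = 100, 784, 64  # illustrative sizes

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
                          nn.Linear(256, img_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1))  # logit for p(y = "real" | x)

opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.01, momentum=0.9)
opt_g = torch.optim.SGD(generator.parameters(), lr=0.01, momentum=0.9)
bce = nn.BCEWithLogitsLoss()  # minimizing BCE = ascending the log-likelihood terms

real_label = torch.ones(batch_size, 1)
fake_label = torch.zeros(batch_size, 1)

for iteration in range(1000):                       # "for number of training iterations do"
    for _ in range(1):                              # k = 1 discriminator step, as in the paper
        x_real = torch.randn(batch_size, img_dim)   # stand-in for a real minibatch
        z = torch.randn(batch_size, latent_dim)     # sample from the noise prior
        x_fake = generator(z).detach()              # detach: do not update G in this step

        # ascend log D(x) + log(1 - D(G(z)))  <=>  descend BCE with labels 1 / 0
        loss_d = bce(discriminator(x_real), real_label) + \
                 bce(discriminator(x_fake), fake_label)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # generator step: descend log(1 - D(G(z)))  (original "saturating" form;
    # numerically fragile when D(G(z)) is close to 1, but it mirrors Algorithm 1)
    z = torch.randn(batch_size, latent_dim)
    logits_fake = discriminator(generator(z))
    loss_g = torch.log(1.0 - torch.sigmoid(logits_fake)).mean()  # minimizing pushes D(G(z)) toward 1
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Minimizing loss_d with labels 1 for real and 0 for fake is equivalent to ascending log D(x) + log(1 - D(G(z))), which is exactly the discriminator step in Algorithm 1.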
• Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative Adversarial Nets." In Advances in Neural Information Processing Systems, pp. 2672-2680. 2014.

4.1 Global Optimality of p_g = p_data
We first consider the optimal discriminator D for any given generator G.
Proposition 1. For G fixed, the optimal discriminator D is

D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
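The argument (condensed from the proof in the paper; not shown on this slide): for fixed G,

V(D, G) = \int_x \big[ p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x)) \big] \, dx,

and for any a, b > 0 the function y \mapsto a \log y + b \log(1 - y) attains its maximum on (0, 1) at y = a / (a + b); applying this pointwise with a = p_{\text{data}}(x) and b = p_g(x) gives the expression above.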
GAN Convergence
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
Figure 1: Generative adversarial nets are trained by simultaneously updating the discriminative distribution (D, blue, dashed line) so that it discriminates between samples from the data generating distribution (black, dotted line) p_x from those of the generative distribution p_g (G) (green, solid line). The lower horizontal line is the domain from which z is sampled, in this case uniformly. The horizontal line above is part of the domain of x. The upward arrows show how the mapping x = G(z) imposes the non-uniform distribution p_g on transformed samples. G contracts in regions of high density and expands in regions of low density of p_g. (a) Consider an adversarial pair near convergence: p_g is similar to p_data and D is a partially accurate classifier. (b) In the inner loop of the algorithm D is trained to discriminate samples from data, converging to D^*(x) = p_data(x) / (p_data(x) + p_g(x)). (c) After an update to G, the gradient of D has guided G(z) to flow to regions that are more likely to be classified as data. (d) After several steps of training, if G and D have enough capacity, they will reach a point at which both cannot improve because p_g = p_data. The discriminator is unable to differentiate between the two distributions, i.e. D(x) = 1/2.
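For completeness, the global-optimality result this figure illustrates (Theorem 1 in the same paper; restated here, not on the slide): substituting D^*_G into the objective gives

C(G) = \max_D V(G, D) = -\log 4 + 2 \cdot \mathrm{JSD}\left(p_{\text{data}} \,\|\, p_g\right),

which is minimized exactly when p_g = p_{\text{data}}, at which point the optimal discriminator outputs D(x) = 1/2 everywhere.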
Improving stochastic gradient descent for the generator
The original generator step descends the gradient of the "saturating" loss:

\nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

The improved step instead works with

\nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log D\left(G\left(z^{(i)}\right)\right)

and ascends it (equivalently: flip the labels and minimize the cross-entropy, i.e., descend -log D(G(z^{(i)}))).
Discriminator
• Maximize prediction probability of classifying real as real and fake as fake
• Remember maximizing log likelihood is the same as minimizing negative
log likelihood (i.e., minimizing cross-entropy)
Generator
• Minimize the likelihood that the discriminator makes correct predictions (predicts fake as fake and real as real), which can be achieved by maximizing the discriminator's cross-entropy
• This doesn't work well in practice, though, because of small-gradient issues
• Better: flip the labels and minimize the cross-entropy (force the discriminator to output a high probability for "real" if an image is fake); see the code sketch after the formulas below
Discriminator update:

\nabla_{W_D} \frac{1}{n} \sum_{i=1}^{n} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]

Generator update (original formulation; fake images labeled y = 0):

\nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)
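A minimal PyTorch sketch of this label-flipping trick; the network shapes and the use of BCEWithLogitsLoss are illustrative assumptions, not the lecture's reference code:

import torch
import torch.nn as nn

batch_size, latent_dim, img_dim = 64, 100, 784  # illustrative sizes

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
                          nn.Linear(256, img_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1))  # logit for p(y = "real" | x)

z = torch.randn(batch_size, latent_dim)
logits_fake = discriminator(generator(z))

# Original ("saturating") generator loss: minimize log(1 - D(G(z))).
# Early in training D rejects the fakes confidently, so this term is nearly flat
# and its gradients are tiny.
loss_saturating = torch.log(1.0 - torch.sigmoid(logits_fake)).mean()

# Flipped-label trick: pretend the fake images are real (y = 1) and minimize the
# cross-entropy, i.e. minimize -log D(G(z)). Same fixed point, but larger gradients
# exactly where the saturating loss is flat.
bce = nn.BCEWithLogitsLoss()
loss_nonsaturating = bce(logits_fake, torch.ones(batch_size, 1))

Both losses push D(G(z)) toward 1; the flipped-label (non-saturating) version is the one used in most practical GAN implementations.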
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning
with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434.