L18 GAN Slides

This document is a lecture overview of Generative Adversarial Networks (GANs) presented by Sebastian Raschka in the course STAT 453: Introduction to Deep Learning. It covers the main idea behind GANs, the GAN objective, practical modifications to the loss function, and applications in Python, including generating handwritten digits and face images. The lecture emphasizes the adversarial game between the generator and the discriminator and provides insights into training strategies and loss functions.

STAT 453: Introduction to Deep Learning and Generative Models

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching

Lecture 18
Introduction to Generative Adversarial Networks
with Applications in Python

Sebastian Raschka STAT 453: Intro to Deep Learning 1
https://arxiv.org/abs/1406.2661

Sebastian Raschka STAT 453: Intro to Deep Learning 2


https://thisstartupdoesnotexist.com

https://thiscatdoesnotexist.com

https://thisponydoesnotexist.net
https://thispersondoesnotexist.com

Sebastian Raschka STAT 453: Intro to Deep Learning 3


Lecture Overview

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 4


Letting two neural networks
compete with each other

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 5


Generative Adversarial Networks (GAN)

• The original purpose is to generate new data

• Classically used for generating new images, but applicable to a
wide range of domains

• Learns the training set distribution and can generate new
images that have never been seen before

• Similar to VAEs, and in contrast to, e.g., autoregressive
models or RNNs (which generate one word at a time), GANs
generate the whole output all at once

Sebastian Raschka STAT 453: Intro to Deep Learning 6


Deep Convolutional GAN (DCGAN, or just GAN)

[Diagram: the generator maps noise to a generated image; the discriminator receives real images from the training set as well as generated images and predicts "real" vs. "generated".]
Sebastian Raschka STAT 453: Intro to Deep Learning 7
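To make the diagram above concrete, here is a minimal PyTorch sketch of the two players for 28x28 grayscale images (e.g., MNIST). The layer sizes, the latent dimension, and the class names are illustrative assumptions, not the lecture's exact code.

```python
# Minimal sketch (assumptions: MLP architecture, latent dimension, layer widths)
import torch
import torch.nn as nn

LATENT_DIM = 100  # dimensionality of the noise vector z (assumption)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 28 * 28),
            nn.Tanh(),  # outputs in [-1, 1]; scale real images to the same range
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # one logit: "real" vs. "generated"
        )

    def forward(self, x):
        return self.net(x)  # raw logit; apply sigmoid to get p(y = "real image" | x)
```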


Step 1.1: Train Discriminator

[Diagram: a real image from the training set is fed into the discriminator (TRAIN), which outputs p(y = "real image" | x).]

Train the discriminator to predict that a real image is real.
Sebastian Raschka STAT 453: Intro to Deep Learning 8


Step 1.2: Train Discriminator

[Diagram: the frozen generator (FREEZE) maps noise to a generated image, which is fed into the discriminator (TRAIN); the discriminator outputs p(y = "real image" | x).]

Train the discriminator to predict that a fake image is fake.
Sebastian Raschka STAT 453: Intro to Deep Learning 9


Step 2: Train Generator

[Diagram: the generator (TRAIN) maps noise to a generated image, which is fed into the frozen discriminator (FREEZE); the discriminator outputs p(y = "real image" | x).]

Train the generator so that the discriminator predicts that a fake image is real.
Sebastian Raschka STAT 453: Intro to Deep Learning 10
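A minimal sketch of one training iteration that puts steps 1.1, 1.2, and 2 together. It assumes the Generator/Discriminator modules sketched earlier, real images scaled to [-1, 1], and the binary cross-entropy formulation discussed later in this lecture; the optimizer choice and learning rates are illustrative assumptions.

```python
# One training iteration: train D on real + fake (G frozen), then train G (D frozen)
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_images):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Steps 1.1 and 1.2: update the discriminator; detach() freezes the generator
    z = torch.randn(batch_size, LATENT_DIM)
    fake_images = G(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real_images), real_labels)
              + F.binary_cross_entropy_with_logits(D(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Step 2: update the generator; the discriminator is "frozen" in the sense
    # that its optimizer is not stepped. Fake images are labeled "real" so the
    # generator learns to fool the discriminator.
    z = torch.randn(batch_size, LATENT_DIM)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```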


Adversarial Game

Discriminator: learns to become better at distinguishing real from
generated images

Generator: learns to generate better images to fool the discriminator

Sebastian Raschka STAT 453: Intro to Deep Learning 11


What do the loss functions
look like?

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 12


When Does a GAN Converge?
[Diagram: the same GAN setup as before — the generator maps noise to a generated image; the discriminator receives real images from the training set as well as generated images and predicts "real" vs. "generated".]
Sebastian Raschka STAT 453: Intro to Deep Learning 13


GAN Objective

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Sebastian Raschka STAT 453: Intro to Deep Learning 14


\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Discriminator gradient for the update (gradient ascent):

predict well on real images => want probability close to 1
predict well on fake images => want probability close to 0

\nabla_{W_D} \frac{1}{n} \sum_{i=1}^{n} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]

[Diagram: Random Noise -> Generator -> New Image; Real Image and New Image -> Discriminator]
Sebastian Raschka STAT 453: Intro to Deep Learning 15




\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Generator gradient for the update (gradient descent):

predict badly on fake images => want probability close to 1

\nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

[Diagram: Random Noise -> Generator -> New Image -> Discriminator]
Sebastian Raschka STAT 453: Intro to Deep Learning 17
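For reference, a hedged sketch of how the two objective terms above could be computed directly from discriminator probabilities (the sigmoid of the logits). The function names and the reuse of the modules sketched earlier are assumptions.

```python
# Direct computation of the minimax objective terms for one minibatch
import torch

def discriminator_value(D, G, x_real, z):
    # Ascend this: log D(x) + log(1 - D(G(z)))
    d_real = torch.sigmoid(D(x_real))
    d_fake = torch.sigmoid(D(G(z).detach()))  # detach(): generator is frozen here
    return (torch.log(d_real) + torch.log(1.0 - d_fake)).mean()

def generator_value(D, G, z):
    # Descend this: log(1 - D(G(z))) -- the "saturating" loss; when the
    # discriminator is confident, this term is nearly flat and gives the
    # generator very small gradients (motivating the fix in Section 3).
    d_fake = torch.sigmoid(D(G(z)))
    return torch.log(1.0 - d_fake).mean()

# Ascending a value with a gradient-descent optimizer means minimizing its
# negative, e.g.: (-discriminator_value(D, G, x_real, z)).backward()
```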


Algorithm 1 Minibatch stochastic gradient descent training of generative adversarial nets. The number of
steps to apply to the discriminator, k, is a hyperparameter. We used k = 1, the least expensive option, in our
experiments.

for number of training iterations do
    for k steps do
        • Sample minibatch of m noise samples {z^{(1)}, ..., z^{(m)}} from noise prior p_g(z).
        • Sample minibatch of m examples {x^{(1)}, ..., x^{(m)}} from data generating distribution p_data(x).
        • Update the discriminator by ascending its stochastic gradient:

            \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]

    end for
    • Sample minibatch of m noise samples {z^{(1)}, ..., z^{(m)}} from noise prior p_g(z).
    • Update the generator by descending its stochastic gradient:

        \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

end for
The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.

• Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. "Generative Adversarial Nets." In Advances in Neural Information Processing Systems, pp.
2672-2680. 2014.
Sebastian Raschka STAT 453: Intro to Deep Learning 18
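As a rough translation of Algorithm 1, the sketch below runs k discriminator updates per generator update. The data loader, the value functions from the earlier sketch, and the optimizer settings are assumptions (the paper used k = 1 and momentum).

```python
# Sketch of Algorithm 1's structure: k discriminator steps per generator step
import torch

k = 1  # discriminator steps per generator step (hyperparameter)
opt_d = torch.optim.SGD(D.parameters(), lr=0.01, momentum=0.9)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01, momentum=0.9)

for x_real, _ in data_loader:            # "for number of training iterations do"
    m = x_real.size(0)
    for _ in range(k):                   # "for k steps do" (with k = 1 this runs once)
        z = torch.randn(m, LATENT_DIM)   # sample minibatch of noise
        opt_d.zero_grad()
        (-discriminator_value(D, G, x_real, z)).backward()  # ascend by minimizing the negative
        opt_d.step()
    z = torch.randn(m, LATENT_DIM)
    opt_g.zero_grad()
    generator_value(D, G, z).backward()  # descend log(1 - D(G(z)))
    opt_g.step()
```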
GAN Convergence

• Converges when a Nash equilibrium (a game-theory concept) is
reached in the minimax (zero-sum) game

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

• A Nash equilibrium is reached when the actions of one player
won't change depending on the opponent's actions

• Here, this means that the generator produces realistic images and the
discriminator outputs random predictions (probabilities close to 0.5)

Sebastian Raschka STAT 453: Intro to Deep Learning 19


[Figure 1 from Goodfellow et al. (2014), panels (a)-(d); caption below.]

Figure 1: Generative adversarial nets are trained by simultaneously updating the discriminative distribution
(D, blue, dashed line) so that it discriminates between samples from the data generating distribution (black,
dotted line) p_x from those of the generative distribution p_g (G) (green, solid line). The lower horizontal line is
the domain from which z is sampled, in this case uniformly. The horizontal line above is part of the domain
of x. The upward arrows show how the mapping x = G(z) imposes the non-uniform distribution p_g on
transformed samples. G contracts in regions of high density and expands in regions of low density of p_g. (a)
Consider an adversarial pair near convergence: p_g is similar to p_data and D is a partially accurate classifier.
(b) In the inner loop of the algorithm D is trained to discriminate samples from data, converging to
D^*(x) = p_data(x) / (p_data(x) + p_g(x)). (c) After an update to G, gradient of D has guided G(z) to flow to
regions that are more likely to be classified as data. (d) After several steps of training, if G and D have enough
capacity, they will reach a point at which both cannot improve because p_g = p_data. The discriminator is unable
to differentiate between the two distributions, i.e. D(x) = 1/2.

• Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. "Generative Adversarial Nets." In Advances in Neural Information Processing Systems, pp.
2672-2680. 2014.

Sebastian Raschka STAT 453: Intro to Deep Learning 20
Improving stochastic
gradient descent for the
generator
1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 21


GAN Training Problems

• Oscillation between generator and discriminator loss

• Mode collapse (the generator produces examples of a particular
kind only)

• Discriminator is too strong, such that the gradient for the
generator vanishes and the generator can't keep up

• Discriminator is too weak, and the generator produces non-
realistic images that fool it too easily (a rare problem, though)

Sebastian Raschka STAT 453: Intro to Deep Learning 22


GAN Training Problems
• Discriminator is too strong, such that the gradient for the
generator vanishes and the generator can't keep up

• Can be fixed as follows:

Instead of doing gradient descent with

    \nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

do gradient ascent with

    \nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(D\left(G\left(z^{(i)}\right)\right)\right)

Sebastian Raschka STAT 453: Intro to Deep Learning 23
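A short sketch contrasting the two generator objectives; it assumes the discriminator returns a logit, and the helper names are illustrative.

```python
# Saturating vs. non-saturating generator objective
import torch

def generator_loss_saturating(D, G, z):
    # gradient DESCENT on log(1 - D(G(z))): nearly flat when D is confident
    return torch.log(1.0 - torch.sigmoid(D(G(z)))).mean()

def generator_loss_nonsaturating(D, G, z):
    # gradient ASCENT on log D(G(z)), implemented as descent on its negative
    return -torch.log(torch.sigmoid(D(G(z)))).mean()
```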


GAN Loss Function in Practice
(will become clearer in the code examples)

Discriminator
• Maximize the prediction probability of classifying real as real and fake as fake
• Remember that maximizing the log-likelihood is the same as minimizing the
negative log-likelihood (i.e., minimizing the cross-entropy)

Generator
• Minimize the likelihood of the discriminator making correct predictions (predict
fake as fake; real as real), which can be achieved by maximizing the cross-entropy
• This doesn't work well in practice, though, because of small-gradient issues
• Better: flip the labels and minimize the cross-entropy (force the discriminator to
output a high probability for "real" if an image is fake)

Sebastian Raschka STAT 453: Intro to Deep Learning 24


Gradient ascent:

predict well on real images => want probability close to 1
predict well on fake images => want probability close to 0

    \nabla_{W_D} \frac{1}{n} \sum_{i=1}^{n} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]

Discriminator objective in the negative log-likelihood (binary cross-entropy) perspective:

Real images, y^{(i)} = 1:

    \mathcal{L}(\mathbf{w}) = -y^{(i)} \log \hat{y}^{(i)} - \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right)

    Want \hat{y}^{(i)} = 1

Fake images, y^{(i)} = 0:

    \mathcal{L}(\mathbf{w}) = -y^{(i)} \log \hat{y}^{(i)} - \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right)

    Want \hat{y}^{(i)} = 0
Sebastian Raschka STAT 453: Intro to Deep Learning 25
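In code, the discriminator objective above can be written with binary cross-entropy: real images get the label 1, fake images the label 0, and minimizing the summed loss is equivalent to ascending log D(x) + log(1 - D(G(z))). A hedged sketch (names are assumptions):

```python
# Discriminator loss as binary cross-entropy on real (y=1) and fake (y=0) batches
import torch
import torch.nn.functional as F

def discriminator_bce_loss(D, G, x_real, z):
    logits_real = D(x_real)
    logits_fake = D(G(z).detach())                     # freeze the generator
    loss_real = F.binary_cross_entropy_with_logits(
        logits_real, torch.ones_like(logits_real))     # want y_hat -> 1
    loss_fake = F.binary_cross_entropy_with_logits(
        logits_fake, torch.zeros_like(logits_fake))    # want y_hat -> 0
    return loss_real + loss_fake
```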


Gradient descent with

    \nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

Generator objective in the negative log-likelihood (binary cross-entropy) perspective:

Fake images, y^{(i)} = 0:

    \mathcal{L}(\mathbf{w}) = -y^{(i)} \log \hat{y}^{(i)} - \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right)

    Want \hat{y}^{(i)} = 0

Flip the sign to "+" so that it turns into "want \hat{y}^{(i)} = 1"
Sebastian Raschka STAT 453: Intro to Deep Learning 26


It is better to flip the labels instead of the sign

Gradient descent with

    \nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

Generator objective in the negative log-likelihood (binary cross-entropy) perspective:

Fake images, y^{(i)} = 0:

    \mathcal{L}(\mathbf{w}) = -y^{(i)} \log \hat{y}^{(i)} - \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right)

    Want \hat{y}^{(i)} = 0

Flip the sign to "+" so that it turns into "want \hat{y}^{(i)} = 1"
Sebastian Raschka STAT 453: Intro to Deep Learning 27


Do gradient ascent with

    \nabla_{W_G} \frac{1}{n} \sum_{i=1}^{n} \log\left(D\left(G\left(z^{(i)}\right)\right)\right)

and flip the labels.

Generator objective in the negative log-likelihood (binary cross-entropy) perspective:

Fake-image label flipped -> real-image label, y^{(i)} = 1:

    \mathcal{L}(\mathbf{w}) = -y^{(i)} \log \hat{y}^{(i)} - \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right)

    Want \hat{y}^{(i)} = 1
Sebastian Raschka STAT 453: Intro to Deep Learning 28
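In code, the flipped-label trick amounts to giving the generator's fake images the "real" label (y = 1) and minimizing the ordinary binary cross-entropy, which is the same as ascending log D(G(z)). A hedged sketch (names are assumptions):

```python
# Generator loss with flipped labels: fake images are labeled as real
import torch
import torch.nn.functional as F

def generator_flipped_label_loss(D, G, z):
    logits_fake = D(G(z))                             # no detach(): gradients must reach G
    flipped_labels = torch.ones_like(logits_fake)     # fake labeled as "real" (y = 1)
    return F.binary_cross_entropy_with_logits(logits_fake, flipped_labels)
```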


Implementing our first GAN

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 29


What do the loss functions
look like?

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 30


https://github.com/soumith/ganhacks

Sebastian Raschka STAT 453: Intro to Deep Learning 31
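Two of the tips from the ganhacks list, sketched in PyTorch as an illustration: normalize images to [-1, 1] to match a tanh generator output, and smooth the "real" labels for the discriminator. The exact values (e.g., the 0.9 smoothing target) are illustrative assumptions.

```python
# Two GAN training tricks, sketched under the assumptions stated above
import torch
import torch.nn.functional as F
from torchvision import transforms

# (1) Scale real images to [-1, 1] so they match a tanh generator output
transform = transforms.Compose([
    transforms.ToTensor(),                        # values in [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # -> values in [-1, 1]
])

# (2) One-sided label smoothing for the discriminator's "real" targets
def d_loss_real_smoothed(logits_real, smooth=0.9):
    targets = torch.full_like(logits_real, smooth)  # 0.9 instead of 1.0
    return F.binary_cross_entropy_with_logits(logits_real, targets)
```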


What do the loss functions
look like?

1. The Main Idea Behind GANs

2. The GAN Objective

3. Modifying the GAN Loss Function for Practical Use

4. A Simple GAN Generating Handwritten Digits in PyTorch

5. Tips and Tricks to Make GANs Work

6. A DCGAN for Generating Face Images in PyTorch

Sebastian Raschka STAT 453: Intro to Deep Learning 32


Deep Convolutional GAN

Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning
with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434.

Sebastian Raschka STAT 453: Intro to Deep Learning 33
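A minimal DCGAN-style generator sketch in the spirit of Radford et al. (2015): transposed convolutions with batch normalization and ReLU, and a tanh output. The latent size, channel widths, and 64x64 output resolution are illustrative assumptions rather than the lecture's exact architecture.

```python
# DCGAN-style generator: upsample a latent vector to a 64x64 image
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # latent_dim x 1 x 1 -> (feat*8) x 4 x 4
            nn.ConvTranspose2d(latent_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            # -> (feat*4) x 8 x 8
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            # -> (feat*2) x 16 x 16
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            # -> feat x 32 x 32
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat), nn.ReLU(True),
            # -> channels x 64 x 64, in [-1, 1]
            nn.ConvTranspose2d(feat, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):           # z has shape (batch, latent_dim, 1, 1)
        return self.net(z)
```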
