Lecture 4: GAN (b)
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app
Slides are based on Prof. Hung-yi Lee’s Machine Learning courses at National Taiwan University
$G^{*} = \arg\min_{G} \mathrm{Div}(P_G, P_{data})$, where $\mathrm{Div}(P_G, P_{data})$ is replaced by $\max_{D} V(D, G)$ in practice.
JS divergence (binary classifier) is not suitable
• In most cases, $P_G$ and $P_{data}$ do not overlap.
• 1. The nature of the data: both $P_{data}$ and $P_G$ are low-dimensional manifolds in a high-dimensional space, so their overlap is negligible.
• 2. Sampling: even if $P_{data}$ and $P_G$ do overlap, with too few samples the discriminator can still separate the two sets of samples perfectly.
What is the problem of JS divergence (binary classifier)?
JS divergence is always $\log 2$ whenever the two distributions do not overlap, so all non-overlapping generators look equally bad:
$JS(P_{G_0}, P_{data}) = \log 2, \quad JS(P_{G_1}, P_{data}) = \log 2, \quad \dots, \quad JS(P_{G_{100}}, P_{data}) = 0$
[Figure: standard GAN training loop. G maps input vectors to images; D is fixed while G is updated so that D outputs 1 on the generated images.]
What is the problem of JS divergence (binary classifier)?
[Figure: $P_{G_0}, P_{G_1}, \dots, P_{G_{100}}$ move progressively closer to $P_{data}$ (distances $d_0, d_1, \dots$), yet the JS divergence stays at $\log 2$ until the distributions overlap.]
Wasserstein distance
• Consider one distribution $P$ as a pile of earth and the other distribution $Q$ as the target.
• The Wasserstein distance is the average distance the earth mover has to move the earth.
• In the simplest case, if all the mass of $P$ sits a distance $d$ away from $Q$, then $W(P, Q) = d$.
Wasserstein distance
Different moving plans give different average distances (some smaller, some larger). The Wasserstein distance is defined by the moving plan with the smallest average distance.
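As a minimal illustration (not part of the original slides), the 1-D Wasserstein distance between two sets of samples can be computed with SciPy; the sample arrays below are made-up placeholders:

```python
# Minimal illustration: 1-D Wasserstein (earth mover's) distance between samples.
import numpy as np
from scipy.stats import wasserstein_distance

p_samples = np.random.normal(loc=0.0, scale=1.0, size=10_000)  # "pile of earth" P
q_samples = np.random.normal(loc=5.0, scale=1.0, size=10_000)  # target Q

# Average distance the earth has to be moved; here approximately 5.
print(wasserstein_distance(p_samples, q_samples))
```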
WGAN
Evaluate the Wasserstein distance between $P_{data}$ and $P_G$:
$W(P_{data}, P_G) = \max_{D \in \text{1-Lipschitz}} \left\{ E_{y \sim P_{data}}[D(y)] - E_{y \sim P_G}[D(y)] \right\}$
$D$ has to be smooth (1-Lipschitz). Without this constraint, the training of $D$ does not converge: $D$ simply pushes $D(y)$ toward $+\infty$ on real examples and $-\infty$ on generated ones. Keeping $D$ smooth prevents $D(y)$ from blowing up to $\pm\infty$.
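A minimal sketch of one critic (discriminator) update in the original weight-clipping WGAN, assuming PyTorch; `G`, `D`, `opt_D`, `z_dim`, and `real_batch` are placeholder names, not from the slides:

```python
import torch

def critic_step(D, G, real_batch, z_dim, opt_D, clip=0.01):
    z = torch.randn(real_batch.size(0), z_dim)
    fake_batch = G(z).detach()                       # fix G while updating D
    # Estimate E[D(real)] - E[D(fake)] and maximize it (minimize its negative).
    loss_D = -(D(real_batch).mean() - D(fake_batch).mean())
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    # Crude 1-Lipschitz enforcement: clip every weight of D to [-clip, clip].
    for p in D.parameters():
        p.data.clamp_(-clip, clip)
    return -loss_D.item()                            # current Wasserstein estimate
```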
GAN is still challenging …
• The generator and the discriminator need to match each other: the generator produces fake images to fool the discriminator, while the discriminator learns to tell generated images from real ones. If either one stops improving, the other cannot improve.
Evaluation of Generation
Quality of Image
• Human evaluation is expensive (and sometimes unfair/unstable).
• How can we evaluate the quality of generated images automatically?
Feed a generated image $y$ into an off-the-shelf image classifier (e.g., Inception net, VGG) to obtain the class distribution $P(c|y)$. A concentrated distribution means higher visual quality.
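A minimal sketch of this idea, assuming torchvision's pretrained Inception-v3 as the off-the-shelf classifier; `images` is a placeholder batch of appropriately resized and normalized tensors. Low entropy of $P(c|y)$ means a more concentrated distribution, i.e., higher estimated quality:

```python
import torch
from torchvision.models import inception_v3, Inception_V3_Weights

classifier = inception_v3(weights=Inception_V3_Weights.DEFAULT).eval()

@torch.no_grad()
def class_entropy(images):
    probs = torch.softmax(classifier(images), dim=1)            # P(c|y) per image
    # Lower entropy = more concentrated P(c|y) = higher visual quality (by this proxy).
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
```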
Diversity - Mode Collapse
The generated data collapse onto only a few modes of the real data distribution: individual samples look fine, but they all come from a small region of the data.
Diversity - Mode Dropping
The generated distribution covers only part of the real data distribution, and the part that is covered can shift from the generator at iteration t to the generator at iteration t+1 (example: BEGAN on CelebA).
Diversity
Feed every generated image $y^n$ into the CNN classifier to get $P(c|y^n)$, then average over all $N$ images:
$P(c) = \frac{1}{N} \sum_{n} P(c|y^n)$
A flat (close to uniform) averaged distribution $P(c)$ means large diversity; a peaked one means the generator keeps producing the same classes.
Inception Score (IS):
Good quality (each $P(c|y)$ is concentrated) and large diversity (the average $P(c)$ is flat) → large IS.
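For reference, a minimal sketch of the Inception Score computation on precomputed class probabilities (assumes NumPy; `probs` is a placeholder N x C array whose rows are $P(c|y^n)$ from the classifier):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    p_c = probs.mean(axis=0, keepdims=True)                       # P(c): average over images
    # IS = exp( E_y[ KL( P(c|y) || P(c) ) ] ): large when each P(c|y) is sharp and P(c) is flat.
    kl = (probs * (np.log(probs + eps) - np.log(p_c + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```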
Fréchet Inception Distance (FID)
Feed real and generated images into a CNN and take the representation at the layer just before the softmax (blue points: generated images). Treat each set of representations as a Gaussian; FID is the Fréchet distance between the two Gaussians. Smaller is better.
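A minimal sketch of the FID formula on precomputed CNN features (assumes NumPy/SciPy; `real_feats` and `fake_feats` are placeholder N x d arrays taken from the layer before the softmax):

```python
import numpy as np
from scipy import linalg

def fid(real_feats, fake_feats):
    mu_r, mu_f = real_feats.mean(0), fake_feats.mean(0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    # Frechet distance between two Gaussians:
    # ||mu_r - mu_f||^2 + Tr(cov_r + cov_f - 2 (cov_r cov_f)^(1/2))
    covmean = linalg.sqrtm(cov_r @ cov_f).real       # drop tiny imaginary parts from numerics
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * covmean))
```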
We also do not want the generator to just memorize: if the generated data are the same as the real data, or simply flipped real data, the quality and FID scores look great, yet nothing new has been generated. (https://arxiv.org/pdf/1511.01844.pdf)
Conditional Generation
Text-to-image
The condition $x$ is a text description (e.g., "red eyes", "black hair", "yellow hair"); the generator outputs an image $y$ that matches the description (e.g., "red hair, green eyes", "blue hair, red eyes").
Conditional GAN
The generator $G$ takes the condition $x$ (e.g., "red eyes") and a vector $z$ sampled from a normal distribution, and produces an image $y = G(x, z)$.
https://arxiv.org/abs/1605.05396
Conditional GAN (better discriminator)
The discriminator $D$ takes both the condition $x$ and the image $y$ and outputs a scalar that is high only when $y$ is realistic AND $x$ and $y$ are matched.
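A minimal sketch of the discriminator loss this implies, assuming PyTorch; `D(x, y)` returns a logit, and `real_y` / `fake_y` / `mismatched_y` (a real image paired with the wrong condition) are placeholder batches:

```python
import torch
import torch.nn.functional as F

def cgan_discriminator_loss(D, x, real_y, fake_y, mismatched_y):
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)
    loss_real = F.binary_cross_entropy_with_logits(D(x, real_y), ones)            # matched, real
    loss_fake = F.binary_cross_entropy_with_logits(D(x, fake_y), zeros)           # generated image
    loss_mismatch = F.binary_cross_entropy_with_logits(D(x, mismatched_y), zeros) # real image, wrong x
    return loss_real + loss_fake + loss_mismatch
```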
Conditional GAN
$G$ takes the condition $x$ and noise $z$ and outputs $y = G(x, z)$.
Conditional GAN
$G$ produces an image from $x$ and $z$; $D$ outputs a scalar.
https://arxiv.org/abs/1808.04108
Conditional GAN
The condition $x$ can also be a sound, e.g., "a dog barking sound", and $G$ generates a matching image. Training data can be collected from videos, which provide paired audio and frames.
Conditional GAN
Talking Head Generation
https://arxiv.org/abs/1905.08233
Conditional GAN
Video-to-Video Synthesis
https://github.com/NVIDIA/vid2vid
Learning from Unpaired Data
Learning from Unpaired Data
We want a deep network that maps $x$ to $y$, but the training data are unpaired: we only have examples $x^1, x^3, x^5, x^7, x^9$ from the input domain and $y^2, y^4, y^6, y^8, y^{10}$ from the output domain, with no $(x, y)$ correspondences.
Learning from Unpaired Data: Image Style Transfer
The network maps an image in domain $\mathcal{X}$ to an image in domain $\mathcal{Y}$, trained only with unpaired collections of images from the two domains.
Cycle GAN
A generator $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ takes an image from domain $\mathcal{X}$ and makes it similar to domain $\mathcal{Y}$; a discriminator $D_{\mathcal{Y}}$ outputs a scalar indicating whether its input image belongs to domain $\mathcal{Y}$ or not.
Cycle GAN
The problem: $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ could simply ignore its input and output any image that looks like domain $\mathcal{Y}$. That fools $D_{\mathcal{Y}}$ but has nothing to do with the input image.
Cycle GAN: Cycle Consistency
A second generator $G_{\mathcal{Y}\rightarrow\mathcal{X}}$ maps the output back to domain $\mathcal{X}$, and the reconstruction should be as close as possible to the original input. If $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ ignored its input, the intermediate image would lack the information needed for this reconstruction.
Cycle GAN
With cycle consistency, the output of $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ must stay "related" to the input so that $G_{\mathcal{Y}\rightarrow\mathcal{X}}$ can reconstruct it, while $D_{\mathcal{Y}}$ still checks that the intermediate image belongs to domain $\mathcal{Y}$.
Cycle GAN (both directions)
Train the cycle in both directions at once: $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ followed by $G_{\mathcal{Y}\rightarrow\mathcal{X}}$ with $D_{\mathcal{Y}}$, and $G_{\mathcal{Y}\rightarrow\mathcal{X}}$ followed by $G_{\mathcal{X}\rightarrow\mathcal{Y}}$ with $D_{\mathcal{X}}$ (a scalar indicating whether an image belongs to domain $\mathcal{X}$), each with its own cycle-consistency constraint.
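A minimal sketch of the combined generator objective, assuming PyTorch and a least-squares adversarial term; `G_xy`, `G_yx`, `D_x`, `D_y`, `real_x`, `real_y`, and the weight `lam` are placeholder names:

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G_xy, G_yx, D_x, D_y, real_x, real_y, lam=10.0):
    fake_y = G_xy(real_x)                    # X -> Y
    fake_x = G_yx(real_y)                    # Y -> X
    # Adversarial terms: make each translated image look like the target domain.
    adv = ((D_y(fake_y) - 1) ** 2).mean() + ((D_x(fake_x) - 1) ** 2).mean()
    # Cycle-consistency terms: translating back should reconstruct the original.
    cyc = F.l1_loss(G_yx(fake_y), real_x) + F.l1_loss(G_xy(fake_x), real_y)
    return adv + lam * cyc
```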
Concluding Remarks
• Wasserstein distance and WGAN
• Evaluation of Generation
• Conditional Generation
• Learning from Unpaired Data