Lecture 4: GAN (b)

This document outlines a lecture from the Applied Machine Learning course taught by Dr. Tao Han at NJIT, focusing on Generative Adversarial Networks (GANs) and their evaluation. It discusses why JS divergence, as estimated by a binary classifier, is a poor training signal for GANs, and introduces the Wasserstein distance as a more effective metric for GAN training. It also covers GAN applications, including conditional generation and learning from unpaired data, along with evaluation techniques such as the Inception Score and the Fréchet Inception Distance.


ECE 381: Applied Machine Learning

• Tao Han, Ph.D.
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app

Slides are designed based on Prof. Hung-yi Lee’s Machine Learning courses at National Taiwan University
$G^* = \arg\min_G \mathrm{Div}(P_G, P_{data}) = \arg\min_G \max_D V(G, D)$

$D^* = \arg\max_D V(D, G)$

The maximum objective value $\max_D V(D, G)$ is related to JS divergence.

• Initialize generator and discriminator


• In each training iteration:
Step 1: Fix generator G, and update discriminator D
Step 2: Fix discriminator D, and update generator G

JS divergence (binary classifier) is not suitable
• In most cases, P_G and P_data do not overlap.
• 1. The nature of data: both P_data and P_G are low-dimensional manifolds in high-dimensional space, so their overlap can be ignored.
• 2. Sampling: even if P_data and P_G do overlap, with too few samples the two sampled sets can still be separated perfectly.
What is the problem of JS divergence (binary classifier)?
JS divergence is always log 2 if two distributions do not overlap:

$JS(P_{G_0}, P_{data}) = \log 2$   $JS(P_{G_1}, P_{data}) = \log 2$   ……   $JS(P_{G_{100}}, P_{data}) = 0$

Even though $P_{G_1}$ is closer to $P_{data}$ than $P_{G_0}$, the two are "equally bad" under JS divergence: the value stays at log 2 until the distributions overlap, so the generator gets no credit for partial progress.

Intuition: if two distributions do not overlap, a binary classifier achieves 100% accuracy. The accuracy (or loss) therefore means nothing during GAN training.
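For reference, the standard definition behind the log 2 claim (not spelled out on the slide), with $M = (P + Q)/2$:

$$JS(P, Q) = \tfrac{1}{2} KL(P \,\|\, M) + \tfrac{1}{2} KL(Q \,\|\, M)$$

If P and Q have disjoint supports, then on the support of P we have $M = P/2$, so $KL(P \,\|\, M) = \log 2$, and likewise for Q; hence $JS(P, Q) = \log 2$ no matter how far apart the two distributions are.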
Algorithm
• Initialize generator G and discriminator D.
• In each training iteration:
  Learning D (fix G, update D): sample some real objects (target score 1, 1, 1, 1) and feed random vectors through G to generate some fake objects (target score 0, 0, 0, 0); train D to separate the two.
  Learning G (fix D, update G): feed random vectors through G into D, and update G so that D's score on the generated images approaches 1.
A runnable sketch of this loop is given below.
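As a concrete illustration, here is a minimal PyTorch sketch of one such training iteration. It is not from the slides: the networks `G` and `D`, their optimizers, and the real batch are assumed to be defined elsewhere, and `D` is assumed to output a raw logit.

```python
import torch
import torch.nn.functional as F

def gan_train_step(G, D, opt_G, opt_D, real, z_dim=100):
    """One GAN iteration: update D on real vs. fake, then update G."""
    bs = real.size(0)
    ones = torch.ones(bs, 1)
    zeros = torch.zeros(bs, 1)

    # Step 1: fix G, update D (real objects -> 1, fake objects -> 0)
    fake = G(torch.randn(bs, z_dim)).detach()  # detach: G stays fixed here
    loss_D = F.binary_cross_entropy_with_logits(D(real), ones) \
           + F.binary_cross_entropy_with_logits(D(fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Step 2: fix D, update G (push D's score on generated images toward 1)
    loss_G = F.binary_cross_entropy_with_logits(D(G(torch.randn(bs, z_dim))), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```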
What is the problem of JS divergence (binary classifier)?
Let $d_0$ and $d_1$ be the distances separating $P_{G_0}$ from $P_{data}$ and $P_{G_1}$ from $P_{data}$, with $d_0 > d_1$.

$JS(P_{G_0}, P_{data}) = \log 2$   $JS(P_{G_1}, P_{data}) = \log 2$   ……   $JS(P_{G_{100}}, P_{data}) = 0$

$W(P_{G_0}, P_{data}) = d_0$   $W(P_{G_1}, P_{data}) = d_1$   ……   $W(P_{G_{100}}, P_{data}) = 0$

Better! Unlike JS divergence, the Wasserstein distance keeps shrinking (from $d_0$ to $d_1$ to 0) as $P_G$ moves toward $P_{data}$, so it rewards partial progress even while the distributions do not overlap.
Wasserstein distance
• Consider one distribution P as a pile of earth, and another distribution Q as the target.
• The Wasserstein distance is the average distance the earth mover has to move the earth.
• In the simplest case, if all the earth in P sits a distance d from where Q needs it, then $W(P, Q) = d$.
Wasserstein distance
• There are many possible "moving plans", some with smaller and some with larger average moving distance.
• Use the "moving plan" with the smallest average distance to define the Wasserstein distance.
Source of image: https://vincentherrmann.github.io/blog/wasserstein/
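For completeness, the standard optimal-transport form of this "best moving plan" definition (not written out on the slide), where $\Pi(P, Q)$ is the set of all joint distributions (moving plans $\gamma$) whose marginals are P and Q:

$$W(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, \|x - y\| \,\big]$$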
WGAN (https://arxiv.org/abs/1701.07875)
Evaluate the Wasserstein distance between P_data and P_G:

$$W(P_{data}, P_G) = \max_{D \in \text{1-Lipschitz}} \; E_{y \sim P_{data}}[D(y)] - E_{y \sim P_G}[D(y)]$$

D has to be smooth enough (1-Lipschitz). Without this constraint, the training of D would not converge: D(y) would be pushed toward $+\infty$ on real samples and $-\infty$ on generated samples. Keeping D smooth forces D(y) to stay finite.
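A minimal sketch of one critic update under this objective, using the weight clipping from the original WGAN paper as a crude stand-in for the 1-Lipschitz constraint. Names such as `opt_D`, `z_dim`, and `clip` are illustrative; `G`, `D`, and the real batch are assumed to be defined elsewhere.

```python
import torch

def wgan_critic_step(G, D, opt_D, real, z_dim=100, clip=0.01):
    """One WGAN critic update: maximize E_data[D(y)] - E_G[D(y)]."""
    fake = G(torch.randn(real.size(0), z_dim)).detach()
    # Negate the objective because optimizers minimize
    loss = -(D(real).mean() - D(fake).mean())
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    # Clip weights so D stays roughly 1-Lipschitz and scores cannot blow up
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip, clip)
    return -loss.item()  # current estimate of the Wasserstein distance
```

Later variants replace clipping with a gradient penalty, but the clipped form matches the constraint as stated on the slide.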
GAN is still challenging …
• Generator and discriminator need to match each other: if either one fails to improve, the other stalls as well.
  The generator tries to generate fake images that fool the discriminator; if it cannot fool the discriminator, it fails to improve.
  The discriminator tries to tell the difference between real and fake; if it cannot tell the difference, it fails to improve.
Readings
• Tips from Soumith
  • https://github.com/soumith/ganhacks
• Tips in DCGAN: guidelines for network architecture design for image generation
  • https://arxiv.org/abs/1511.06434
• Improved techniques for training GANs
  • https://arxiv.org/abs/1606.03498
• Tips from BigGAN
  • https://arxiv.org/abs/1809.11096
Evaluation of Generation

Quality of Image
• Human evaluation is expensive (and sometimes unfair/unstable).
• How can we evaluate the quality of the generated images automatically?
• Feed a generated image y into an off-the-shelf image classifier (e.g., Inception net, VGG) to obtain the class distribution P(c|y). A concentrated distribution means higher visual quality.
Diversity - Mode Collapse
[Figure: real data vs. generated data. The generated samples cluster around a single mode of the real distribution, which the generator keeps reproducing.]
Diversity - Mode Dropping
[Figure: real data vs. generated data. The generated distribution looks reasonable at iteration t and again at iteration t+1, but each covers only part of the real distribution (BEGAN on CelebA).]
Diversity
Feed each generated image $y^n$ into a CNN classifier to obtain $P(c|y^n)$, then average over all N samples:

$$P(c) = \frac{1}{N} \sum_n P(c|y^n)$$

If every $P(c|y^n)$ concentrates on the same class (e.g., class 2), the averaged $P(c)$ is also concentrated: low diversity.
Inception Score (IS):
Good quality, large diversity → large IS
Using the same averaged distribution $P(c) = \frac{1}{N} \sum_n P(c|y^n)$: if the generated images spread across different classes (class 1, class 2, class 3, ……), the averaged $P(c)$ is close to uniform, and uniform means higher variety.

What is the problem here? ☺
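For reference, the usual formula that combines the two ingredients (a sharp $P(c|y)$ per image and a flat averaged $P(c)$) is:

$$IS = \exp\Big( \mathbb{E}_{y \sim P_G}\big[ KL\big(P(c \mid y) \,\|\, P(c)\big) \big] \Big)$$

One caveat worth pondering for the question above: the score only sees the classifier's class labels, so a generator whose images all legitimately fall into a single class (e.g., faces) can look low-diversity to IS no matter how varied the images really are.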


Fréchet Inception Distance (FID) (https://arxiv.org/pdf/1706.08500.pdf)
• Feed each image through a CNN and take the latent vector just before the softmax layer (orange points: real images; blue points: generated images).
• Model each set of latent vectors as a Gaussian; FID is the Fréchet distance between the two Gaussians. Smaller is better.
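A minimal NumPy/SciPy sketch of the final step, assuming `real_feats` and `fake_feats` are (N, d) arrays of pre-softmax CNN features extracted elsewhere:

```python
import numpy as np
from scipy import linalg

def fid(real_feats, fake_feats):
    """Fréchet distance between Gaussians fitted to the two feature sets."""
    mu1, mu2 = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    s1 = np.cov(real_feats, rowvar=False)
    s2 = np.cov(fake_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(s1 @ s2, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```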
We don't want a "memory GAN" (https://arxiv.org/pdf/1511.01844.pdf).
• Generated data that is exactly the same as the real data scores well, but the generator has merely memorized the training set.
• Generated data that simply flips the real data is equally uninformative, even though it no longer matches any training image pixel-for-pixel.
Conditional Generation

Text-to-image
Input a text condition x (e.g., "red eyes", "black hair", "yellow hair, dark circles") into the generator to produce an image y; for example, x = "red hair, green eyes" or x = "blue hair, red eyes" should yield faces with those attributes.
Conditional GAN
x: "red eyes" → G (with z sampled from a normal distribution) → image y = G(x, z)
D (original design): input y alone, output a scalar scoring whether y is a real image or not.
  Real images: 1; generated images: 0.
Problem: the generator will learn to generate realistic images, but completely ignore the input conditions.
https://arxiv.org/abs/1605.05396

Conditional GAN
x: "red eyes" → G (with z sampled from a normal distribution) → image y = G(x, z)
D (better design): input the pair (x, y), output a scalar scoring whether y is realistic AND whether x and y are matched.
  True text-image pairs, e.g., ("red eyes", image with red eyes): 1
  Mismatched or fake pairs, e.g., ("red eyes", generated image) or ("red eyes", real image without red eyes): 0
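A minimal sketch of such a pair-scoring discriminator. The architecture and sizes are illustrative assumptions, and the text condition is assumed to be already embedded as a vector `x_txt`:

```python
import torch
import torch.nn as nn

class ConditionalD(nn.Module):
    """Scores whether y is realistic AND whether (x, y) are matched."""
    def __init__(self, txt_dim=128):
        super().__init__()
        self.img_net = nn.Sequential(  # image -> flat feature vector
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(1)  # (image features + text embedding) -> scalar

    def forward(self, x_txt, y_img):
        feat = self.img_net(y_img)
        # Concatenating the condition lets one scalar judge realism and matching
        return self.head(torch.cat([feat, x_txt], dim=1))
```

Training targets then mirror the slide: matched true pairs get 1, while mismatched pairs and generated images get 0.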
Conditional GAN
x (input image) → G (with z) → y = G(x, z)
Image translation, or pix2pix (https://arxiv.org/abs/1611.07004).

Conditional GAN
z → G → image → D → scalar (https://arxiv.org/abs/1808.04108)

Conditional GAN
x: sound (e.g., "a dog barking sound") → G → image
Training data collection: videos provide the paired data, since each frame comes with the corresponding audio track.
Conditional GAN
Talking Head Generation (https://arxiv.org/abs/1905.08233)
Conditional GAN
Video-to-Video Synthesis (https://github.com/NVIDIA/vid2vid)
Learning from Unpaired Data
Learning from Unpaired Data
We want a deep network that maps x to y, but the training data is unpaired: we only have samples $x^1, x^3, x^5, x^7, x^9$ on the input side and $y^2, y^4, y^6, y^8, y^{10}$ on the output side, with no (x, y) pairs.
Learning from Unpaired Data
• Example: image style transfer, where a deep network maps x in Domain 𝒳 to y in Domain 𝒴, given only unpaired images from the two domains.
• Can we learn the mapping without any paired data?
• This setting is called unsupervised conditional generation.
Learning from Unpaired Data
Train a network that takes images from Domain 𝒳 and outputs images in Domain 𝒴, using only unpaired samples from the two domains.
Cycle GAN
• 𝐺𝒳→𝒴 takes an image from Domain 𝒳 and makes it similar to Domain 𝒴.
• 𝐷𝒴 takes an image and outputs a scalar: does the input image belong to Domain 𝒴 or not?
Cycle GAN
• Problem: 𝐺𝒳→𝒴 can fool 𝐷𝒴 while ignoring its input entirely; it only needs to output some image that looks like Domain 𝒴, not one related to the input.
Cycle GAN
• Cycle consistency: pass the input through 𝐺𝒳→𝒴 and then 𝐺𝒴→𝒳; the reconstruction should be as close as possible to the original input.
• If 𝐺𝒳→𝒴 ignores its input, the intermediate image lacks the information needed for reconstruction. Cycle consistency therefore forces the output of 𝐺𝒳→𝒴 to stay "related" to the input, which makes reconstruction possible.
• 𝐷𝒴 still scores whether the translated image belongs to Domain 𝒴 or not.
Cycle GAN (both directions)
• Forward cycle: 𝐺𝒳→𝒴 then 𝐺𝒴→𝒳, with the reconstruction as close as possible to the input; 𝐷𝒴 scores whether an image belongs to Domain 𝒴.
• Backward cycle: 𝐺𝒴→𝒳 then 𝐺𝒳→𝒴, likewise; 𝐷𝒳 scores whether an image belongs to Domain 𝒳.
A sketch of the combined generator loss is given below.
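As a concrete illustration, a minimal sketch of the combined generator objective. It assumes the four networks above are defined, the discriminators output raw scores, the least-squares adversarial form is used (one common choice), and `lam` weights the cycle term:

```python
import torch
import torch.nn.functional as F

def cycle_gan_g_loss(G_xy, G_yx, D_x, D_y, real_x, real_y, lam=10.0):
    """Adversarial terms in both directions plus the two cycle-consistency terms."""
    fake_y = G_xy(real_x)  # X -> Y
    fake_x = G_yx(real_y)  # Y -> X
    rec_x = G_yx(fake_y)   # X -> Y -> X, should reconstruct real_x
    rec_y = G_xy(fake_x)   # Y -> X -> Y, should reconstruct real_y
    # Fool D_y and D_x (least-squares adversarial loss)
    adv = ((D_y(fake_y) - 1) ** 2).mean() + ((D_x(fake_x) - 1) ** 2).mean()
    # "As close as possible": L1 reconstruction error in both cycles
    cyc = F.l1_loss(rec_x, real_x) + F.l1_loss(rec_y, real_y)
    return adv + lam * cyc
```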
Concluding Remarks
• Generative Adversarial Network (GAN)
• Theory behind GAN
• Evaluation of generative models
• Conditional generation
• Learning from unpaired data
