
Lecture 4: Variational Divergence Minimization / Adversarial Learning

Background reading:

i)   Convex functions & Fenchel conjugates
ii)  Euler-Lagrange equations
iii) Definitions of supremum / infimum


Issue with the KL divergence

Recall that maximization of the data log-likelihood is equivalent to minimization of the KL divergence between the model and the true data distributions.

Suppose p_X and p_\theta represent the densities imposed by the true data and the model, respectively. We have

    D_{KL}(P_X \| P_\theta) = \int_X p_X(x) \log \frac{p_X(x)}{p_\theta(x)} \, dx

Let us study the behavior of D_KL over different regions of X.

Consider a subset X_1 \subset X where p_X(x) >> p_\theta(x) for x \in X_1. D_KL is high over X_1, which is desirable.

However, suppose there exists a subset X_2 \subset X where p_\theta(x) >> p_X(x) for x \in X_2. D_KL will be relatively lower over X_2, which is undesirable, since optimizing D_KL does not discourage coverage of regions with low data density.
Now suppose we consider the reverse KL,

    D_{KL}(P_\theta \| P_X) = \int_X p_\theta(x) \log \frac{p_\theta(x)}{p_X(x)} \, dx

While the reverse KL is high over X_2, it is relatively lower over X_1, which does not encourage the model to cover the entire support of the data.

Therefore, it is desirable to consider alternate metrics other than the KL divergence.
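The asymmetry described above is easy to see numerically. Below is a minimal sketch (my own illustration, not part of the notes) comparing forward and reverse KL when a unimodal model covers only one mode of a bimodal data density; the densities and grid are arbitrary choices.

```python
# Toy comparison of forward vs. reverse KL by numerical integration on a grid.
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# "Data": a bimodal density. "Model": a single Gaussian covering only one mode.
p_data = 0.5 * gauss(x, -3, 0.5) + 0.5 * gauss(x, 3, 0.5)
p_model = gauss(x, 3, 0.5)

eps = 1e-12  # avoid log(0) in the numerical integration
kl_forward = np.sum(p_data * np.log((p_data + eps) / (p_model + eps))) * dx
kl_reverse = np.sum(p_model * np.log((p_model + eps) / (p_data + eps))) * dx

print(f"forward KL(data || model) = {kl_forward:.3f}")   # large: a data mode is missed
print(f"reverse KL(model || data) = {kl_reverse:.3f}")   # small: model hides inside one mode
```

Here the model misses a data mode (the X_1 situation): the forward KL reacts strongly, while the reverse KL barely notices, so minimizing the reverse KL would not push the model to cover the full support.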
The f-family of divergences

The f-divergence constitutes a family of divergence metrics between distributions.

Definition: Given two distributions P and Q with respective absolutely continuous density functions p and q defined on the domain X, the f-divergence between them is defined as below:

    D_f(P \| Q) := \int_X q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) dx

where f: \mathbb{R}_+ \to \mathbb{R} is called the generator function, which is convex, lower semi-continuous, and satisfies f(1) = 0.

According to this definition, there can exist infinitely many divergence metrics depending upon the choice of f. Below are a few popular examples of D_f:

i)   f(u) = u log u,                             D_f = KL divergence
ii)  f(u) = -log u,                              D_f = Reverse KL
iii) f(u) = (u - 1)^2,                           D_f = Pearson \chi^2
iv)  f(u) = u log u - (u + 1) log((u + 1)/2),    D_f = Jensen-Shannon divergence
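As a quick sanity check of the definition, here is a small sketch (mine, not from the notes) that encodes the four generators above and evaluates D_f(P || Q) by numerical quadrature of q(x) f(p(x)/q(x)) for two arbitrary 1-D Gaussians.

```python
import numpy as np

generators = {
    "KL":             lambda u: u * np.log(u),
    "Reverse KL":     lambda u: -np.log(u),
    "Pearson chi^2":  lambda u: (u - 1.0) ** 2,
    "Jensen-Shannon": lambda u: u * np.log(u) - (u + 1.0) * np.log((u + 1.0) / 2.0),
}

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-15, 15, 30001)
dx = x[1] - x[0]
p = gauss(x, 0.0, 1.0)   # "data" density p(x)
q = gauss(x, 1.0, 1.5)   # "model" density q(x)

for name, f in generators.items():
    u = p / q                       # density ratio p(x)/q(x)
    d_f = np.sum(q * f(u)) * dx     # D_f(P || Q) = \int q(x) f(p(x)/q(x)) dx
    print(f"{name:15s} D_f = {d_f:.4f}")

# sanity check: every generator satisfies f(1) = 0, so D_f(P || P) = 0
assert all(abs(f(1.0)) < 1e-12 for f in generators.values())
```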

With this, the goal is to develop a variational method for estimation / optimization of D_f given samples from p and q.
Variational Estimation of f-divergences

Fenchel conjugate: Every convex, lower semi-continuous function f has a convex conjugate function f^*, defined as follows:

    f^*(t) = \sup_{u \in \mathrm{dom}_f} \{ ut - f(u) \}
The function f^* can be shown to be convex and lower semi-continuous, with f^{**} = f. Therefore, f(u) can also be represented as below:

    f(u) = \sup_{t \in \mathrm{dom}_{f^*}} \{ tu - f^*(t) \}

Each fixed t gives a point-wise (affine) lower bound on f. Intuitively, this implies that any convex function can be represented as the point-wise maximum of many linear functions.
[Sketch: the graph of a convex f(u) together with several affine lower bounds tu - f^*(t); at any given u, one of them is tight.]
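A small numerical sketch (my own, not from the notes) makes the conjugate pair concrete for the KL generator f(u) = u log u, whose conjugate is known in closed form as f*(t) = exp(t - 1); both the conjugate and the biconjugate representation are approximated by grid search.

```python
import numpy as np

f = lambda u: u * np.log(u)

# f*(t) = sup_u { ut - f(u) }, approximated by a grid search over u > 0
u_grid = np.linspace(1e-4, 50.0, 200001)
def f_star_numeric(t):
    return np.max(t * u_grid - f(u_grid))

for t in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    print(t, f_star_numeric(t), np.exp(t - 1.0))   # numeric vs. closed form agree

# biconjugate: f(u) = sup_t { tu - f*(t) }, again by grid search over t
t_grid = np.linspace(-10.0, 5.0, 200001)
def f_biconj(u):
    return np.max(t_grid * u - np.exp(t_grid - 1.0))

for u in [0.5, 1.0, 2.0, 4.0]:
    print(u, f_biconj(u), f(u))                    # recovers the original f
```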

Let us use this representation of f in D_f:

    D_f(P \| Q) = \int_X q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) dx
                = \int_X q(x) \sup_{t \in \mathrm{dom}_{f^*}} \left\{ t \, \frac{p(x)}{q(x)} - f^*(t) \right\} dx

Since the supremum is over t and q(x) is non-negative, q(x) can be moved inside the supremum without changing its value. However, the argument inside the supremum is point-wise over x, and thus when the supremum is taken out of the integral, it has to range over a class of functions T: X \to \mathbb{R}:

    D_f(P \| Q) \ge \sup_{T \in \mathcal{T}} \left( \int_X p(x) \, T(x) \, dx - \int_X q(x) \, f^*(T(x)) \, dx \right)

The above becomes a lower bound since \mathcal{T} may not contain all functions that form the solutions of the point-wise optimization problems.


:

Df [ PILE] ≥ sup
TET
[ El B-
(17×1) -
E-
Qx
( f- ( TCM ))]
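This bound is what makes sample-based estimation possible: both expectations can be replaced by Monte Carlo averages. Below is a minimal sketch (my construction, not from the notes) for the KL generator with two hand-picked critics T; the distributions, sample sizes and critics are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=100_000)   # samples from P = N(0, 1)
x_q = rng.normal(1.0, 1.5, size=100_000)   # samples from Q = N(1, 1.5)

f_star = lambda t: np.exp(t - 1.0)         # conjugate of the KL generator f(u) = u log u

def lower_bound(T):
    # sample-based estimate of E_{x~P}[T(x)] - E_{x~Q}[f*(T(x))]
    return T(x_p).mean() - f_star(T(x_q)).mean()

T_const = lambda x: np.zeros_like(x)                  # trivial constant critic
T_quad  = lambda x: 1.6 - 0.44 * x - 0.28 * x ** 2    # hand-tuned quadratic critic

print(lower_bound(T_const))   # ~ -0.37: a valid but very loose lower bound
print(lower_bound(T_quad))    # ~  0.35: close to the true KL(P || Q) of ~0.35
```

Any measurable T in the class gives a valid lower bound; a better critic gives a tighter one.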
Question: supposing \mathcal{T} can represent all functions, what T(x) would drive the gap between D_f and the lower bound to zero?

Solution: the bound becomes tight for

    T^*(x) = f'\!\left(\frac{p(x)}{q(x)}\right)

Proof: Notice that D_f is a functional of T(x). To optimize D_f over a class of functions T(x), one needs to consider the Euler-Lagrange equations.
    D_f = \int_X p(x) \, T(x) \, dx - \int_X q(x) \, f^*(T(x)) \, dx    [equality, assuming \mathcal{T} can recover all functions]
        = \int_X \big[ \, p(x) \, T(x) - q(x) \, f^*(T(x)) \, \big] \, dx =: \int_X L \, dx

To get the optimal T(x), we need to solve the following Euler-Lagrange equation:

    \frac{\partial L}{\partial T(x)} - \frac{d}{dx} \frac{\partial L}{\partial T'(x)} = 0

Since T'(x) does not explicitly appear in L, the second term vanishes for all T(x). Therefore, consider

    \frac{\partial L}{\partial T(x)} = 0

    p(x) - q(x) \, (f^*)'(T(x)) = 0

    (f^*)'(T(x)) = \frac{p(x)}{q(x)}

    T(x) = \big( (f^*)' \big)^{-1}\!\left(\frac{p(x)}{q(x)}\right) = f'\!\left(\frac{p(x)}{q(x)}\right)

where the last step uses the fact that, for a convex f, the derivative maps f' and (f^*)' are inverses of each other.
Therefore, one can find the optimal T depending on the choice made for f.
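The following sketch (mine, not from the notes) checks this numerically for the KL generator: evaluating the variational form at T*(x) = f'(p(x)/q(x)) reproduces D_f computed directly from the definition. The two Gaussians are arbitrary toy choices.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
p = gauss(x, 0.0, 1.0)
q = gauss(x, 1.0, 1.5)

f       = lambda u: u * np.log(u)       # KL generator
f_prime = lambda u: np.log(u) + 1.0
f_star  = lambda t: np.exp(t - 1.0)

# direct definition: D_f = \int q(x) f(p(x)/q(x)) dx
d_f_direct = np.sum(q * f(p / q)) * dx

# variational form evaluated at the optimal critic T* = f'(p/q)
T_star = f_prime(p / q)
d_f_variational = np.sum(p * T_star) * dx - np.sum(q * f_star(T_star)) * dx

print(d_f_direct, d_f_variational)   # the two values coincide (up to quadrature error)
```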

Variational Divergence Minimization

We have so far established a lower bound on the f-divergence. We now seek to obtain a sampler for the unknown distribution via minimization of the thus obtained lower bound.

First, to apply the lower bound, the domain of the conjugate function f^* has to be respected. To achieve this, the variational function T_w is represented as follows:

    T_w(x) = g_f(V_w(x))

where V_w: X \to \mathbb{R} is without range constraints, and g_f: \mathbb{R} \to \mathrm{dom}_{f^*} is an output activation function specific to the f-divergence used.
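As an illustration (my own summary following the f-GAN construction of Nowozin et al., 2016, not stated in the notes), here are output activations g_f for two of the divergences listed earlier; for other choices of f, g_f must be picked so that its range matches dom_{f*}.

```python
import numpy as np

# KL: f*(t) = exp(t - 1) is finite for all t, so dom_{f*} = R and the identity works.
g_f_kl = lambda v: v

# Jensen-Shannon: f*(t) = -log(2 - exp(t)) requires t < log 2,
# so squash the unconstrained critic output into (-inf, log 2).
g_f_js = lambda v: np.log(2.0) - np.log1p(np.exp(-v))

v = np.linspace(-10, 10, 5)
print(g_f_kl(v))          # unconstrained
print(g_f_js(v))          # always strictly below log 2 ~= 0.693
```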

Collecting everything together, the final objective would be

    F(\theta, w) = \mathbb{E}_{x \sim P}\big[ g_f(V_w(x)) \big] - \mathbb{E}_{x \sim Q_\theta}\big[ f^*\big( g_f(V_w(x)) \big) \big]
G_\theta is a parametric function, typically a Deep Neural Network, which takes samples from an arbitrary distribution as input. V_w is another neural network which operates on the data space X.

G_\theta is often called the Generator network and V_w the Discriminator network.


[Diagram: noise z ~ N(0, I) is passed through G_\theta to produce samples x_\theta; both x_\theta and real samples x ~ P are fed to V_w, whose output (after g_f) lies in dom_{f^*}.]

The optimal \theta^* and w^* are found by solving the following saddle-point problem, alternating between \theta and w:

    \theta^*, w^* = \arg\min_{\theta} \max_{w} F(\theta, w)
Since the networks \theta and w are respectively trying to minimize and maximize the same objective function, this procedure is often referred to as Adversarial Learning.
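To make the alternating procedure concrete, below is a compact PyTorch sketch (my own, not from the notes) of the min-max loop on a 1-D toy problem using the Jensen-Shannon generator, for which both g_f(v) = log 2 - softplus(-v) and the composition f*(g_f(v)) = softplus(v) - log 2 reduce to stable softplus expressions. Architectures, learning rates and the target distribution are arbitrary illustrative choices.

```python
import math
import torch
import torch.nn as nn
from torch.nn.functional import softplus

torch.manual_seed(0)
LOG2 = math.log(2.0)

G = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # generator G_theta
V = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # critic V_w

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_V = torch.optim.Adam(V.parameters(), lr=1e-3)

def objective(x_real, x_fake):
    # F(theta, w) = E_P[g_f(V(x))] - E_{Q_theta}[f*(g_f(V(x)))] for the JS generator
    return (LOG2 - softplus(-V(x_real))).mean() - (softplus(V(x_fake)) - LOG2).mean()

for step in range(5000):
    x_real = torch.randn(256, 1) * 0.5 + 2.0      # samples from the "data" distribution P
    x_fake = G(torch.randn(256, 1))               # samples from Q_theta via the generator

    # critic step: maximize F over w (generator held fixed)
    loss_V = -objective(x_real, x_fake.detach())
    opt_V.zero_grad(); loss_V.backward(); opt_V.step()

    # generator step: minimize F over theta
    loss_G = objective(x_real, x_fake)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

print(G(torch.randn(10000, 1)).mean().item())     # should drift towards the data mean of 2.0
```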
