
401-4634-24L: Diffusion Models, Sampling and Stochastic Localization

Lecture 3 – Langevin algorithms


Lecturer: Yuansi Chen Spring 2024

Key concepts:

• Sampling from a smooth density

• Langevin diffusion

• Unadjusted Langevin Algorithm (ULA) and Metropolis-adjusted Langevin Algorithm (MALA)

• Convergence of continuous Langevin diffusion

– in Wasserstein distance, using strong logconcavity via coupling

– in χ² distance, using a Poincaré inequality via the Fokker-Planck equation

• Convergence of ULA (included in these notes but only discussed in the 4th lecture)

The material of this lecture is based on Chapters 1 and 4 of [Che23].

3.1 Introduction
In the previous lecture, we saw that the corners of a convex body cause a lot of
problems for the Ball walk sampling algorithm: Ball walk has to choose a small
step-size, otherwise it would have a close-to-zero acceptance rate in many places. In
practice, we do not always encounter distributions that are as nonsmooth as the
uniform distribution on a convex body. In this lecture, we avoid the nonsmoothness
problem entirely by making the simplifying assumption that we are dealing with
smooth densities of the form

µ ∝ e−f

where f is twice continuously differentiable. We would like to know whether there exist
sampling algorithms better than Ball walk.


3.1.1 Langevin diffusion


Given a twice-differentiable function f : Rn → R, the Langevin diffusion is the
following stochastic differential equation (SDE)

$$dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t, \qquad (3.1)$$

where Bt is the Brownian motion in Rn.

Brownian motion. We define Brownian motion in Rn , denoted by {Bt }t≥0 , to be a


stochastic process, i.e. a collection of random variables in Rn indexed by t ≥ 0, satisfying
the following four properties
1. B0 = 0
2. {Bt }t≥0 is continuous with probability 1

3. (independent increments) For any k ∈ N and times 0 = t0 < t1 < · · · < tk,
the random variables Bti+1 − Bti, for 0 ≤ i ≤ k − 1, are mutually independent
4. (Gaussian increment) for any 0 ≤ s < t, Bt − Bs is distributed as N (0, (t − s)In ).
Intuitively, we may think of dBt in Eq. (3.1) as Gaussian noise with mean 0 and variance
dt. Then Eq. (3.1) may be thought of as a noisy gradient descent, with a deterministic
gradient step −∇f(Xt) dt and a diffusion component √2 dBt. For a rigorous treatment
of Brownian motion and stochastic calculus, readers are referred to [Pro04].
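As a quick illustration (our own, not part of the lecture notes), the following Python sketch simulates Brownian motion paths via independent Gaussian increments, i.e. property 4 above; all function names and parameter choices are illustrative assumptions.

```python
import numpy as np

def simulate_brownian_motion(n_dim, T, n_steps, rng=None):
    """Simulate one Brownian motion path in R^n on [0, T] using Gaussian increments."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    # Property 4: increments B_{t+dt} - B_t ~ N(0, dt * I_n), mutually independent.
    increments = rng.normal(scale=np.sqrt(dt), size=(n_steps, n_dim))
    path = np.vstack([np.zeros((1, n_dim)), np.cumsum(increments, axis=0)])  # B_0 = 0
    return path  # shape (n_steps + 1, n_dim)

# Sanity check: the marginal B_1 should be approximately N(0, I_n).
paths = np.stack([simulate_brownian_motion(2, T=1.0, n_steps=100) for _ in range(2000)])
print(np.cov(paths[:, -1, :].T))  # close to the 2x2 identity
```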

3.1.2 Sampling algorithms connected to Langevin diffusion


Langevin diffusion in Eq (3.1) is a continuous process. To simulate it in practice, we
need to discretize it.

Unadjusted Langevin Algorithm (ULA). Unadjusted Langevin Algorithm is the


outcome of the Euler discretization of the Langevin diffusion. Starting from X0 drawn from
an initial distribution, it iterates as follows: from the current state Xk, it produces the
next state by

$$X_{k+1} = X_k - h\nabla f(X_k) + \sqrt{2h}\,\xi_k, \qquad (3.2)$$
where h > 0 is the step-size (or the discretization size) to be chosen by the user and
ξk ∼ N (0, In ) is independent Gaussian noise. Intuitively, taking the limit h → 0 in
ULA would get us back to the Langevin diffusion in Eq (3.1). For small step-size and
large k, we expect the distribution of Xk to be close to the stationary measure of the
Langevin diffusion (hopefully the target measure µ, but we haven’t proved it yet) with
an error that depends on h.
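For concreteness, here is a minimal Python sketch of the ULA iteration (3.2); the quadratic test potential and the function names are our own illustrative choices, not part of the lecture notes.

```python
import numpy as np

def ula(grad_f, x0, h, n_steps, rng=None):
    """Unadjusted Langevin Algorithm: X_{k+1} = X_k - h grad_f(X_k) + sqrt(2h) xi_k."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = [x.copy()]
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)            # xi_k ~ N(0, I_n), independent
        x = x - h * grad_f(x) + np.sqrt(2 * h) * xi
        samples.append(x.copy())
    return np.array(samples)

# Illustrative target: f(x) = ||x||^2 / 2, so mu = N(0, I_n).
grad_f = lambda x: x
samples = ula(grad_f, x0=np.zeros(2), h=0.05, n_steps=20000)
print(samples[1000:].var(axis=0))  # roughly 1, up to an O(h) discretization bias
```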


Metropolis-adjusted Langevin Algorithm (MALA). To ensure that a Markov


chain has the correct stationary measure, we can always add a Metropolis-Hastings filter
(or accept-reject step) to it. This is what Metropolis-adjusted Langevin Algorithm does
in addition to ULA. It iterates as follows: from the current state Xk , it has a proposal
step and an accept-reject step
• Proposal step: same as in ULA

$$Z_{k+1} = X_k - h\nabla f(X_k) + \sqrt{2h}\,\xi_k$$

• Accept-reject step: set

$$X_{k+1} = \begin{cases} Z_{k+1} & \text{with probability } \min\Big\{1, \dfrac{\mu(Z_{k+1})\,P_{Z_{k+1}}(X_k)}{\mu(X_k)\,P_{X_k}(Z_{k+1})}\Big\}, \\ X_k & \text{with the remaining probability.} \end{cases}$$

Note that conditioned on Xk, the proposal step boils down to drawing a Gaussian with
mean Xk − h∇f(Xk) and covariance 2hIn. Hence, the proposal kernel has an explicit
form

$$P_z(x) = \frac{1}{(2\pi \cdot 2h)^{n/2}} \exp\left(-\frac{\|x - (z - h\nabla f(z))\|_2^2}{4h}\right).$$

Then, the acceptance rate also has an explicit form

$$\min\left\{1, \frac{\mu(z)P_z(x)}{\mu(x)P_x(z)}\right\} = \min\left\{1, \exp\left(-f(z) - \frac{1}{4h}\|x - (z - h\nabla f(z))\|_2^2 + f(x) + \frac{1}{4h}\|z - (x - h\nabla f(x))\|_2^2\right)\right\}.$$
In addition to one gradient evaluation step in the proposal step, MALA requires two
more gradient evaluation steps and two function evaluation steps per iteration.
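A hedged Python sketch of one MALA iteration follows; here f denotes the potential (so log µ = −f up to a constant), and the helper names and test target are our own illustrative choices.

```python
import numpy as np

def mala_step(x, f, grad_f, h, rng):
    """One MALA iteration: Langevin proposal followed by a Metropolis-Hastings accept-reject."""
    # Proposal: Z ~ N(x - h grad_f(x), 2h I_n), i.e. one ULA step.
    z = x - h * grad_f(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)

    def log_q(to, frm):
        # log proposal density P_frm(to): Gaussian with mean frm - h grad_f(frm), cov 2h I_n.
        diff = to - (frm - h * grad_f(frm))
        return -np.dot(diff, diff) / (4 * h)     # normalizing constants cancel in the ratio

    # log acceptance ratio: log[mu(z) P_z(x) / (mu(x) P_x(z))] = -f(z) + f(x) + log P_z(x) - log P_x(z)
    log_alpha = -f(z) + f(x) + log_q(x, z) - log_q(z, x)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return z, True
    return x, False

# Illustrative use with f(x) = ||x||^2 / 2:
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):
    x, _ = mala_step(x, f=lambda v: 0.5 * (v @ v), grad_f=lambda v: v, h=0.1, rng=rng)
```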

Metropolized random walk (MRW). We can always introduce a Ball-walk-like


sampling algorithm for sampling a smooth density. In each iteration, it has a Gaussian
proposal followed by an accept-reject step.
• Proposal step:

$$Z_{k+1} = X_k + \sqrt{2h}\,\xi_k.$$

• Accept-reject step: set

$$X_{k+1} = \begin{cases} Z_{k+1} & \text{with probability } \min\Big\{1, \dfrac{\mu(Z_{k+1})\,P^{\mathrm{MRW}}_{Z_{k+1}}(X_k)}{\mu(X_k)\,P^{\mathrm{MRW}}_{X_k}(Z_{k+1})}\Big\}, \\ X_k & \text{with the remaining probability.} \end{cases}$$



Here, because of the symmetry of the proposal kernel P^MRW, the term P^MRW_{Z_{k+1}}(X_k) cancels with P^MRW_{X_k}(Z_{k+1}), and the acceptance rate boils down to min{1, µ(Z_{k+1})/µ(X_k)}.
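Because the Gaussian proposal is symmetric, the MRW acceptance step needs only function values of f. A minimal sketch (illustrative names, same conventions as the MALA sketch above):

```python
import numpy as np

def mrw_step(x, f, h, rng):
    """One Metropolized random walk step: symmetric Gaussian proposal,
    accepted with probability min{1, mu(Z)/mu(X)}."""
    z = x + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    # Symmetric proposal => acceptance ratio reduces to mu(z)/mu(x) = exp(f(x) - f(z)).
    if np.log(rng.uniform()) < min(0.0, f(x) - f(z)):
        return z
    return x
```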

Main questions. We are interested in the convergence of the continuous Langevin


diffusion and of the three sampling algorithms for sampling a smooth density. In this
lecture, we ask the following three main questions and try to answer them in the
sections that follow:
1. What is the stationary measure of Langevin diffusion (3.1)? We hope it to be
µ ∝ e−f .
2. How fast does Langevin diffusion converge to its stationary measure?
3. What is the mixing time of ULA?

3.2 Convergence of Langevin diffusion


Because both sampling algorithms ULA and MALA are closely related to the Langevin
diffusion, it is natural to make use of the convergence of the Langevin diffusion in
continuous time to analyze the two discrete-time algorithms. We call this the SDE-based
mixing proof technique, in contrast to the conductance-based mixing proof technique
of Lecture 2.
We first introduce the Fokker-Planck equation associated with the Langevin diffu-
sion in Eq. (3.1), assume its correctness, and then analyze the Langevin diffusion based
on it. Once we have a good understanding of the convergence of Langevin diffusion,
the mixing time analysis of ULA follows from a careful discretization analysis.

3.2.1 Fokker-Planck equation


Consider a drift-diffusion process {Xt }t≥0 on R driven by a drift term a : R × R → R
and a diffusion term b : R × R → R, and characterized by the following SDE

dXt = a(Xt , t)dt + b(Xt , t)dBt , (3.3)

where Bt is the Brownian motion in R. We assume the following fact without proving
it.

Fokker-Planck equation. Let {Xt }t≥0 be a drift-diffusion process following SDE (3.3),
starting from X0 ∼ µ0 . Then for all t ≥ 0, denoting the law of Xt by µt , we have
$$\frac{\partial}{\partial t}\mu_t(x) = -\frac{\partial}{\partial x}\big[a(x,t)\,\mu_t(x)\big] + \frac{\partial^2}{\partial x^2}\big[D(x,t)\,\mu_t(x)\big], \qquad \forall x \in \mathbb{R}, \qquad (3.4)$$


where D(x, t) = b(x, t)²/2. The above equation is called the Fokker-Planck equation
associated to the drift-diffusion process {Xt }t≥0 . The Fokker-Planck equation describes
the time evolution of the probability density function via a partial differential equa-
tion (PDE). Unlike Eq. (3.3), the Fokker-Planck equation in Eq. (3.4) is completely
deterministic.
In general, there are two main approaches to interpret a drift-diffusion process in
Eq. (3.3) as illustrated in Figure 3.1. The first approach is the pathwise view: given
a random draw of the Brownian motion Bt , Eq. (3.3) becomes an ordinary differential
equation and, it generates a continuous path in R. Each random draw of the Brownian
motion generates a path. The collection of all paths describes the SDE. The second
approach is the density evolution view: since we do not really care about the identity
of each path, we can focus on the evolution of the law (i.e., the density) of Xt at any
time t > 0. The Fokker-Planck equation enables this second approach via a PDE. The
two approaches are complementary and are related via Markov semigroup theory and
Kolmogorov’s forward and backward equations. For a detailed exposition and a proof
of the Fokker-Planck equation, see Chapter 1.2 of [Che23].
Example 1 (heat equation). Taking a = 0, b = 1 in Eq. (3.4), we obtain the heat
equation
$$\frac{\partial}{\partial t}\mu_t = \frac{1}{2}\frac{\partial^2}{\partial x^2}\mu_t.$$

Starting from a point mass at 0, the PDE has the closed-form solution

$$\mu_t(x) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{x^2}{2t}\right).$$
The above density is exactly the law of Xt defined via dXt = dBt .
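A quick Monte Carlo sanity check of this example (our own illustration, not part of the notes): simulate dXt = dBt from X0 = 0 by Euler steps and compare the empirical variance at time t with the variance t of the closed-form solution.

```python
import numpy as np

t, n_steps, n_paths = 2.0, 200, 50_000
dt = t / n_steps
rng = np.random.default_rng(1)
# Euler simulation of dX_t = dB_t from X_0 = 0: just sum the Gaussian increments.
x_t = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)).sum(axis=1)
# The closed-form solution of the heat equation says X_t ~ N(0, t).
print("empirical variance:", x_t.var(), " closed-form variance:", t)
```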
Finally, one can also introduce a higher dimensional formulation of the Fokker-
Planck equation. Consider a drift-diffusion process {Xt }t≥0 on Rn driven by a drift
term a : Rn × R → Rn and a diffusion term b : Rn × R → Rn×m , characterized by the
following SDE
dXt = a(Xt , t)dt + b(Xt , t)dBt , (3.5)
where Bt is the Brownian motion in Rm .

Fokker-Planck equation in n-dimension. Let {Xt }t≥0 be a drift-diffusion process


following SDE (3.5), starting from X0 ∼ µ0 . Then for all t ≥ 0, denoting the law of Xt
by µt , we have
$$\frac{\partial}{\partial t}\mu_t(x) = -\sum_{i=1}^n \frac{\partial}{\partial x_i}\big[a_i(x,t)\,\mu_t(x)\big] + \sum_{i=1}^n\sum_{j=1}^n \frac{\partial^2}{\partial x_i \partial x_j}\big[D_{ij}(x,t)\,\mu_t(x)\big], \qquad \forall x \in \mathbb{R}^n, \qquad (3.6)$$

where D = ½ bb⊤.


Figure 3.1. Two interpretations of a drift-diffusion process. Left: pathwise view.
Right: density evolution view.


Example 2 (Langevin diffusion). Taking a(x, t) = −∇f(x) and b(x, t) = √2 In, the
SDE corresponds to the Langevin diffusion

$$dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t.$$

The associated Fokker-Planck equation is

$$\frac{\partial}{\partial t}\mu_t = \nabla \cdot (\mu_t \nabla f) + \Delta \mu_t. \qquad (3.7)$$

Differential operator notation.

• The divergence of a continuously differentiable vector function F : Rn → Rn is

$$\nabla \cdot F = \sum_{i=1}^n \frac{\partial}{\partial x_i} F_i,$$

where Fi : Rn → R is the i-th coordinate output of F.

• The Laplacian of a twice-differentiable function g : Rn → R is

$$\Delta g = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}\, g.$$
∂x i

Note that it is also the divergence of the gradient (∇g), i.e., ∆g = ∇ · ∇g.


3.2.2 The stationary measure of Langevin diffusion


Assuming the correctness of the Fokker-Planck equation (3.6), we are ready to show
that µ is a stationary measure of Langevin diffusion. To show that µ is a stationary
measure, it suffices to show that

$$\frac{\partial}{\partial t}\mu_t$$

vanishes pointwise when µt is evaluated at µ. We already know the Fokker-Planck
equation of the Langevin diffusion in Eq. (3.7). It remains to show that

$$0 \overset{?}{=} \nabla \cdot (\mu \nabla f) + \Delta\mu.$$
We have by definition of the divergence

$$\nabla \cdot (\mu \nabla f) = \sum_{i=1}^n \partial_i (\mu \cdot \partial_i f),$$

and

$$\Delta\mu = \sum_{i=1}^n \partial_i^2 \mu \overset{(i)}{=} \sum_{i=1}^n \partial_i(-\mu \cdot \partial_i f) = -\sum_{i=1}^n \partial_i (\mu \cdot \partial_i f),$$

where ∂i is shorthand for ∂/∂xi and (i) uses the assumption µ = c e−f with c a
constant. So the two terms above sum to 0, which proves that µ is a stationary measure.
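The cancellation can also be checked symbolically. Here is a small one-dimensional sketch with SymPy (our own illustration; the Gaussian potential f(x) = x²/2 is an assumed example, not from the notes).

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 / 2                      # illustrative potential; mu is proportional to exp(-f)
mu = sp.exp(-f)                   # the normalizing constant plays no role in the identity
# Fokker-Planck right-hand side evaluated at mu: d/dx (mu * f') + d^2/dx^2 (mu)
rhs = sp.diff(mu * sp.diff(f, x), x) + sp.diff(mu, x, 2)
print(sp.simplify(rhs))           # prints 0, so mu is stationary for this example
```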

3.2.3 Convergence of Langevin diffusion in Wasserstein distance

We prove the convergence of Langevin diffusion in Wasserstein distance under strong
logconcavity.

Wasserstein distance. Let µ, ν be two measures on Rn with finite second moments,


i.e., EX∼ν [∥X∥22 ] < ∞ and EX∼µ [∥X∥22 ] < ∞. We define the Wasserstein-2 distance
between µ and ν by
$$W_2(\mu, \nu) := \inf_{\gamma \in \mathcal{C}(\mu,\nu)} \left( \int \|x - y\|_2^2\, \gamma(x, y)\, dx\, dy \right)^{1/2},$$

where C(µ, ν) is the set of all couplings of µ and ν. We say γ is a coupling of µ and ν,
if its marginal on the first variable is µ and its marginal on the second is ν.


Strong logconcavity. We say a measure µ is m-strongly logconcave if µ ∝ exp(−f )


with f being m-strongly convex, i.e., mIn ⪯ ∇2 f .
Theorem 3.2.1 (Convergence of Langevin diffusion in Wasserstein distance). Let
{Xt }t≥0 be generated according to the Langevin diffusion (3.1) with initialization X0 ∼
µ0 and stationary measure µ ∝ e−f . Assume µ is m-strongly logconcave. Let µt denote
the law of Xt , then

$$W_2^2(\mu_t, \mu) \le \exp(-2mt)\, W_2^2(\mu_0, \mu).$$

Proof. The main proof strategy is to construct a coupling between µt and µ by taking
advantage of the Langevin SDE (3.1), and then show that the expected squared distance
under this coupling decays exponentially. We construct
a coupling as follows. Let γ0 be an optimal coupling of (µ0, µ) which achieves W₂²(µ0, µ).
Draw (X0 , X0∗ ) ∼ γ0 . Let X0 and X0∗ evolve through the Langevin SDE with the same
copy of Brownian motion {Bs }s≥0 . Let γt denote the law of the resulting (Xt , Xt∗ ). γt
is a coupling of (µt , µ) because
• Marginally, we just followed the Langevin SDE. So the law of Xt is µt

• µ is a stationary measure, so the law of Xt∗ remains µ.


Next, we control $\mathbb{E}_{(X_t, X_t^*) \sim \gamma_t} \|X_t - X_t^*\|_2^2$. We have

$$d\left(\|X_t - X_t^*\|_2^2\right) = 2\,\langle X_t - X_t^*,\, dX_t - dX_t^* \rangle \overset{(i)}{=} -2\,\langle X_t - X_t^*,\, \nabla f(X_t) - \nabla f(X_t^*) \rangle\, dt \overset{(ii)}{\le} -2m\, \|X_t - X_t^*\|_2^2\, dt. \qquad (3.8)$$

(i) uses the fact that Xt and Xt∗ share the same Brownian motion. (ii) uses the mean
value theorem in the following way:

$$\langle y - x, \nabla f(y) - \nabla f(x) \rangle = \langle y - x, \nabla f(\omega_t) - \nabla f(\omega_0) \rangle \big|_{t=1} \overset{(iii)}{=} \langle y - x, \nabla^2 f(\omega_\tau)(y - x) \rangle \ge m\, \|y - x\|_2^2,$$

where ωt = (1 − t)x + ty. (iii) uses the mean value theorem for the function
t ↦ ⟨y − x, ∇f(ωt) − ∇f(ω0)⟩, whose derivative is ⟨y − x, ∇²f(ωt)(y − x)⟩, so there exists
τ ∈ [0, 1] such that (iii) holds. The last step follows from m-strong convexity.
Solving the ODE inequality (3.8) or applying Grönwall’s inequality, we obtain

∥Xt − Xt∗ ∥22 ≤ exp(−2mt) ∥X0 − X0∗ ∥22 .


Taking expectation on both sides, we obtain

$$\mathbb{E}_{\gamma_t} \|X_t - X_t^*\|_2^2 \le \exp(-2mt)\, \mathbb{E}_{\gamma_0} \|X_0 - X_0^*\|_2^2 = \exp(-2mt)\, W_2^2(\mu_0, \mu).$$

We complete the proof by noticing that γt is one particular coupling, while W₂²(µt, µ) takes the
infimum over all couplings.
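To illustrate the synchronous coupling argument numerically, the sketch below runs two Euler-discretized Langevin chains driven by the same Gaussian noise and tracks their squared distance; with f m-strongly convex, the ratio should decay roughly like exp(−2mt), up to discretization error. The quadratic target and step size are our own assumed choices.

```python
import numpy as np

m, h, n_steps, n_dim = 1.0, 0.01, 500, 5
rng = np.random.default_rng(2)
grad_f = lambda v: m * v                      # f(x) = m ||x||^2 / 2 is m-strongly convex

x, x_star = np.full(n_dim, 5.0), rng.standard_normal(n_dim)   # two different starting points
d0 = np.sum((x - x_star) ** 2)
for _ in range(n_steps):
    xi = rng.standard_normal(n_dim)           # the SAME Brownian increment for both chains
    x      = x      - h * grad_f(x)      + np.sqrt(2 * h) * xi
    x_star = x_star - h * grad_f(x_star) + np.sqrt(2 * h) * xi

t = n_steps * h
print(np.sum((x - x_star) ** 2) / d0, "vs theory", np.exp(-2 * m * t))
```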
The following result shows that, when sampling a strongly log-concave density, it is not
hard to obtain reasonable control of the initial Wasserstein distance.
Lemma 1. Let µ ∝ e−f, where f is m-strongly convex and minimized at x∗. Then

$$\mathbb{E}_{X \sim \mu} \|X - x^*\|_2^2 \le \frac{2n}{m}.$$
Remark that when f satisfies mIn ⪯ ∇²f ⪯ LIn, x∗ can be obtained up to ε-error
in (L/m) log(1/ε) iterations via the gradient descent method (see e.g., [B+15]).
Proof. Let µ = c exp(−f ), where c is a constant. We have
$$\mathbb{E}_{X \sim \mu} \|X - x^*\|_2^2 = c \int \|x - x^*\|_2^2 \exp(-f(x))\, dx \overset{(i)}{\le} \frac{2c}{m} \int \langle \nabla f(x), x - x^* \rangle \exp(-f(x))\, dx \overset{(ii)}{=} \frac{2c}{m} \int \operatorname{trace}(I_n) \exp(-f(x))\, dx = \frac{2n}{m}.$$
(i) follows from the strong convexity of f: ⟨∇f(x) − ∇f(x∗), x − x∗⟩ ≥ (m/2)‖x − x∗‖₂²
and ∇f(x∗) = 0. (ii) follows from integration by parts: for a differentiable function
g : Rn → R and a vector field v : Rn → Rn with sufficiently fast decay at infinity, we
have

$$\int \langle v(x), \nabla g(x) \rangle\, dx = -\int g(x)\, (\nabla \cdot v)(x)\, dx. \qquad (3.9)$$

Here we moved the derivative from g onto v (with g = exp(−f) and v(x) = x − x∗); the
boundary term vanishes because of the decay at infinity.
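As a quick numerical illustration of Lemma 1 (our own example, not part of the notes): for the Gaussian µ = N(x∗, (1/m) In), whose potential is m-strongly convex, E‖X − x∗‖² equals n/m, comfortably below the 2n/m bound.

```python
import numpy as np

n, m = 10, 4.0
rng = np.random.default_rng(3)
x_star = rng.standard_normal(n)
# Sample from mu = N(x*, (1/m) I_n); its potential f is m-strongly convex, minimized at x*.
samples = x_star + rng.standard_normal((200_000, n)) / np.sqrt(m)
second_moment = np.mean(np.sum((samples - x_star) ** 2, axis=1))
print(second_moment, "<=", 2 * n / m)   # observe roughly n/m = 2.5; the bound is 5.0
```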

3.2.4 Convergence of Langevin diffusion in χ2 -divergence


To show that the above convergence is not merely an artifact of the choice of the Wasserstein
distance, we prove the convergence of Langevin diffusion in χ²-divergence under a Poincaré
inequality.


χ²-divergence. Let ν, µ be two measures on Rn. We define the χ²-divergence between
ν and µ by

$$\chi^2(\nu \,\|\, \mu) := \operatorname{Var}_\mu\!\left[\frac{\nu}{\mu}\right] = \int \left(\frac{\nu(x)}{\mu(x)}\right)^2 \mu(x)\, dx - 1.$$

The χ²-divergence upper bounds the squared total variation distance (up to a constant),
by the Cauchy-Schwarz inequality.

Poincaré inequality. We say a measure µ satisfies a Poincaré inequality with constant
CPI if for every differentiable function g that is square-integrable with respect to µ, we
have

$$\operatorname{Var}_\mu[g] \le C_{\mathrm{PI}}\, \mathbb{E}_\mu\!\left[\|\nabla g(x)\|_2^2\right].$$

Here Eµ and Varµ denote the expectation and the variance with respect to µ, respectively:

$$\mathbb{E}_\mu[g] := \int g(x)\, \mu(x)\, dx, \qquad \operatorname{Var}_\mu[g] := \mathbb{E}_\mu[g^2] - (\mathbb{E}_\mu[g])^2.$$

Similar to the isoperimetry in Lecture 2, the Poincaré inequality is an intrinsic property
of the measure µ, and this definition has nothing to do with the sampling algorithm.
Intuitively, a large Poincaré constant also indicates that the measure µ has
a bottleneck (see Figure 3.2). Additionally, the isoperimetric constant and the Poincaré
constant are related as ψ ≤ 2/√CPI according to [Maz60] and [Che69].
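A hedged numerical illustration (our own example): the standard Gaussian is known to satisfy a Poincaré inequality with CPI = 1, and a Monte Carlo estimate confirms Varµ[g] ≤ Eµ‖∇g‖² for a simple test function.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal((500_000, 3))         # mu = N(0, I_3); Gaussian Poincare constant C_PI = 1

g      = np.sin(x[:, 0]) + x[:, 1] ** 2       # a smooth test function g(x) = sin(x1) + x2^2
grad_g = np.stack([np.cos(x[:, 0]), 2 * x[:, 1], np.zeros(len(x))], axis=1)

var_g = g.var()
dirichlet = np.mean(np.sum(grad_g ** 2, axis=1))
print(var_g, "<=", 1.0 * dirichlet)           # Var_mu[g] <= C_PI * E_mu ||grad g||^2
```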

Figure 3.2. Illustration of a large Poincaré constant. µ is bimodal, with one mode
in region A and the other in region C, and has a bottleneck in region B where the
density is close to 0. When the two modes are far apart, it becomes possible to design
a g with small gradient, whose variation hides inside region B, yet which has a large
variance. In this case, the Poincaré constant of µ has to be large.


Theorem 3.2.2 (Convergence of Langevin diffusion in χ²-divergence). Let {Xt}t≥0 be
generated according to the Langevin diffusion (3.1) with initialization X0 ∼ µ0 and
stationary measure µ ∝ e−f. Assume µ satisfies a Poincaré inequality with constant CPI.
Let µt denote the law of Xt, then

$$\chi^2(\mu_t \,\|\, \mu) \le \exp\left(-\frac{2t}{C_{\mathrm{PI}}}\right) \chi^2(\mu_0 \,\|\, \mu).$$

Proof. Taking the derivative with respect to t, we have

$$\begin{aligned}
\frac{d}{dt}\chi^2(\mu_t \,\|\, \mu) &= \frac{d}{dt} \int \left(\frac{\mu_t(x)^2}{\mu(x)^2} - 1\right) \mu(x)\, dx \\
&\overset{(i)}{=} 2 \int \frac{\mu_t(x)}{\mu(x)}\, \frac{\partial}{\partial t}\!\left(\frac{\mu_t(x)}{\mu(x)}\right) \mu(x)\, dx \\
&\overset{(ii)}{=} 2 \int \frac{\mu_t(x)}{\mu(x)} \cdot \frac{\nabla \cdot \big(\mu \nabla \tfrac{\mu_t}{\mu}\big)(x)}{\mu(x)}\, \mu(x)\, dx \\
&\overset{(iii)}{=} -2 \int \left\|\nabla \frac{\mu_t}{\mu}\right\|_2^2 \mu(x)\, dx \\
&\overset{(iv)}{\le} -\frac{2}{C_{\mathrm{PI}}}\, \chi^2(\mu_t \,\|\, \mu).
\end{aligned}$$
In (i) we switched the order of derivative and integral, which can be done after verifying
the conditions for dominated convergence. (ii) follows from the Fokker-Planck equation for
µt in Eq. (3.7) and the observation that

$$\nabla \cdot (\mu_t \nabla f) + \Delta\mu_t = \nabla \cdot \left(\mu \nabla \frac{\mu_t}{\mu}\right).$$

(iii) follows from integration by parts (3.9). (iv) follows from the Poincaré inequality applied
to g = µt/µ, noting that Varµ[µt/µ] = χ²(µt ‖ µ). Solving the ODE for the χ²-divergence,
or applying Grönwall's inequality, we obtain the desired result.

3.3 Mixing time of ULA


Recall the iteration of ULA from Eq. (3.2),

$$X_{k+1} = X_k - h\nabla f(X_k) + \sqrt{2h}\,\xi_k. \qquad (3.10)$$

Let µ^k denote the law of the ULA iterate Xk. Then we have the following mixing time result.


Theorem 3.3.1. Assume that the target measure µ ∝ exp(−f) satisfies mIn ⪯ ∇²f ⪯ LIn.
Let κ := L/m. Then, given h ≲ 1/(Lκ), for K ≥ 1,

$$W_2(\mu^K, \mu) \le \exp\left(-\frac{mhK}{2}\right) W_2(\mu^0, \mu) + c\, h^{1/2} n^{1/2} \kappa,$$

where c is a universal constant.

A few remarks

• If we set the initial measure µ^0 to be the point mass at x∗, the mode of µ, then

$$W_2(\mu^0, \mu)^2 = \mathbb{E}_{X \sim \mu} \|X - x^*\|_2^2 \le \frac{n}{m},$$

as a result of integration by parts and strong log-concavity.

• For mixing, we want to achieve √m W₂ ≤ ε. It is more convenient to use the
metric √m W₂ instead of W₂ because the former is scale-invariant.

• In order to have √m W₂ ≤ ε in Theorem 3.3.1, we need both terms to be less than ε.
This results in the step-size choice

$$h \lesssim \frac{\epsilon^2}{L\kappa n},$$

and the choice of the number of steps K

$$K \gtrsim \frac{\kappa^2 n}{\epsilon^2} \log\left(\frac{\sqrt{m}\, W_2(\mu^0, \mu)}{\epsilon}\right).$$
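The remarks above translate into a simple back-of-the-envelope recipe. The sketch below (our own illustration, ignoring the universal constants in Theorem 3.3.1) computes the suggested step size and iteration count from m, L, n, ε, and an initial Wasserstein distance.

```python
import numpy as np

def ula_parameters(m, L, n, eps, w2_init):
    """Back-of-the-envelope ULA tuning from Theorem 3.3.1 (universal constants omitted)."""
    kappa = L / m
    h = eps ** 2 / (L * kappa * n)                                   # h ~ eps^2 / (L kappa n)
    K = (kappa ** 2 * n / eps ** 2) * np.log(np.sqrt(m) * w2_init / eps)
    return h, int(np.ceil(K))

# Example: kappa = 10, n = 100, target sqrt(m) W2 accuracy eps = 0.1,
# starting from the mode so that W2(mu^0, mu) <= sqrt(n / m).
print(ula_parameters(m=1.0, L=10.0, n=100, eps=0.1, w2_init=np.sqrt(100 / 1.0)))
```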

Proof sketch. Since ULA is the Euler discretization of the continuous Langevin
diffusion in Eq. (3.1), it is natural to analyze the convergence of ULA by comparing
it to the continuous Langevin diffusion. We know from Section 3.2.3 that the continuous
Langevin diffusion converges exponentially fast to the target measure µ, with a rate
that depends on the strong log-concavity parameter m. It remains to analyze the discretization
error and how it accumulates as a function of the total number of steps K.
Given the above intuition, the main problem becomes how to write W₂(µ^{k+1}, µ) as
a function of W₂(µ^k, µ). In other words, we want to upper bound E‖X_{k+1} − X_{(k+1)h}‖₂²
as a function of E‖X_k − X_{kh}‖₂², where X_k is the ULA iterate and X_{kh} is the continuous
Langevin diffusion at time kh. This analysis separates into two parts:

• The one-step discretization error when both the discrete process and the continuous
process are started at the same distribution: the distance between X_{k+1} and
X̄_{(k+1)h} in Figure 3.3.

Figure 3.3: Illustration of the ULA discretization analysis. Panel A: the laws µ^1, µ^2, ..., µ^k
of the ULA iterates and the laws µ_h, µ_{2h}, ..., µ_{kh} of the Langevin diffusion (LD) run for
time h between consecutive laws, both started from µ_0. Panel B: one ULA step from X_k to
X_{k+1}, compared with the Langevin diffusion run for time h from X_k (one-step discretization,
reaching X̄_{(k+1)h}) and from X_{kh} (one-step coupling, reaching X_{(k+1)h}); the distance
between X_{k+1} and X_{(k+1)h} is what we want to bound.

• The Wasserstein distance contraction result for the continuous Langevin diffusion run
for time h, which we already know how to handle from Section 3.2.3: the distance
between X̄_{(k+1)h} and X_{(k+1)h} in Figure 3.3.

See Section 4.1 of [Che23] for a full proof and for further proof techniques.

Bibliography

[B+15] Sébastien Bubeck et al. Convex optimization: Algorithms and complexity.
Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.

[Che69] Jeff Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In
Proceedings of the Princeton conference in honor of Professor S. Bochner,
pages 195–199, 1969.

[Che23] Sinho Chewi. Log-concave sampling. Book draft available at
https://chewisinho.github.io, 2023.

[Maz60] Vladimir Gilelevich Maz’ya. Classes of domains and imbedding theorems for
function spaces. In Doklady Akademii Nauk, volume 133, pages 527–530. Rus-
sian Academy of Sciences, 1960.

[Pro04] Philip E. Protter. Stochastic integration and differential equations. Springer, 2004.

