
A Survey on Diffusion Models for Inverse Problems

Giannis Daras1 , Hyungjin Chung2 , Chieh-Hsin Lai3 , Yuki Mitsufuji3 ,


Jong Chul Ye2 , Peyman Milanfar4 , Alexandros G. Dimakis1 , Mauricio Delbracio4
1 UT Austin   2 KAIST   3 Sony AI   4 Google

Abstract
Diffusion models have become increasingly popular for generative modeling due
to their ability to generate high-quality samples. This has unlocked exciting new
possibilities for solving inverse problems, especially in image restoration and
reconstruction, by treating diffusion models as unsupervised priors. This survey
provides a comprehensive overview of methods that utilize pre-trained diffusion
models to solve inverse problems without requiring further training. We introduce
taxonomies to categorize these methods based on both the problems they address
and the techniques they employ. We analyze the connections between different
approaches, offering insights into their practical implementation and highlighting
important considerations. We further discuss specific challenges and potential
solutions associated with using latent diffusion models for inverse problems. This
work aims to be a valuable resource for those interested in learning about the
intersection of diffusion models and inverse problems.

1 Introduction
1.1 Problem Setting.

Inverse problems are ubiquitous and the associated reconstruction problems have tremendous ap-
plications across different domains such as seismic imaging [81, 146], weather prediction [65],
oceanography [153], audio signal processing [83, 123, 96, 97, 98, 62], medical imaging [134, 26, 2,
29], etc. Despite their generality, inverse problems across different domains follow a fairly unified
mathematical setting. Specifically, in inverse problems, the goal is to recover an unknown sample
x ∈ Rn from a distribution pX , assuming access to measurements y ∈ Rm and a corruption model
Y = A(X) + σy Z, Z ∼ N (0, Im ). (1.1)
In what follows, we present some well-known examples of measurement models that fit under this
general formulation.
Example 1.1 (Denoising). The simplest interesting example is the denoising inverse problem, i.e.
when A is the identity matrix and σy > 0. In fact, the noise model does not have to be Gaussian:
it can be generalized to other distributions, including the Laplace distribution or the Poisson
distribution [52]. For the purposes of this survey, we focus on additive Gaussian noise.
A lot of practical applications arise from the non-invertible linear setting, i.e. for A(X) = AX and
A being an m × n matrix with m < n.
Example 1.2 (Inpainting). A is a masking matrix, i.e. Aij = 0 for i ̸= j and Aii is either 0 or 1,
based on whether the value at this location is observed.
Example 1.3 (Compressed Sensing). A is a matrix with entries sampled from a Gaussian random
variable.
Example 1.4 (Convolutions). Here A(X) represents the convolution of X with a (Gaussian or other)
kernel, which is again a linear operation.

Preprint. Work in progress.


| Category | Method | Non-linear | Blind | Handle noise | Pixel/Latent | Text-conditioned | Optimization Technique | Code |
|---|---|---|---|---|---|---|---|---|
| Explicit approximations for measurement matching | Score-ALD [67] | ✗ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | Score-SDE [135] | ✗ | ✗ | ✗ | Pixel | ✗ | Proj | code |
| | ILVR [22] | ✗ | ✗ | ✗ | Pixel | ✗ | Proj | code |
| | DPS [24] | ✓ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | ΠGDM [130] | ✓ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | Moment Matching [119] | ✓ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | BlindDPS [23] | ✓ | ✓ | ✓ | Pixel | ✗ | Grad | code |
| | SNIPS [74] | ✗ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | DDRM [73] | ✗ | ✗ | ✓ | Pixel | ✗ | Grad | code |
| | GibbsDDRM [99] | ✗ | ✓ | ✓ | Pixel | ✗ | Samp | code |
| | DDNM [149] | ✗ | ✗ | ✓ | Pixel | ✗ | Proj | code |
| | DDS [25] | ✗ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| | DiffPIR [164] | ✗ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| | PSLD [118] | ✓ | ✗ | ✓ | Latent | ✗ | Grad | code |
| | STSL [116] | ✓ | ✗ | ✓ | Latent | ✗ | Grad | ✗ |
| Variational inference | RED-Diff [94] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| | Blind RED-Diff [6] | ✓ | ✓ | ✓ | Pixel | ✗ | Opt | ✗ |
| | Score Prior [54] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| | Efficient Score Prior [53] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| CSGM methods | DMPlug [147] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| | SHRED [21] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | ✗ |
| | Consistent-CSGM [154] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | ✗ |
| | Score-ILO [32] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | code |
| Asymptotically exact methods | PnP-DM [152] | ✓ | ✗ | ✓ | Pixel | ✗ | Opt | ✗ |
| | FPS [47] | ✓ | ✗ | ✓ | Pixel | ✗ | Samp | code |
| | PMC [138] | ✓ | ✗ | ✓ | Pixel | ✗ | Samp | code |
| | SMCDiff [142] | ✗ | ✗ | ✓ | Pixel | ✗ | Samp | code |
| | MCGDiff [17] | ✗ | ✗ | ✓ | Pixel | ✗ | Samp | code |
| | TDS [151] | ✗ | ✗ | ✓ | Pixel | ✗ | Samp | code |
| Other methods | Implicit denoiser prior [68] | ✗ | ✗ | ✗ | Pixel | ✗ | Proj | code |
| | MCG [27] | ✗ | ✗ | ✗ | Pixel | ✗ | Grad/Proj | code |
| | Resample [128] | ✓ | ✗ | ✓ | Latent | ✗ | Grad/Opt | code |
| | MPGD [61] | ✓ | ✗ | ✓ | Pixel/Latent | ✓ | Grad/Opt | code |
| | P2L [30] | ✓ | ✗ | ✓ | Latent | ✓ | Grad/Opt | ✗ |
| | TReg [77] | ✓ | ✗ | ✓ | Latent | ✓ | Grad/Opt | ✗ |
| | DreamSampler [78] | ✓ | ✗ | ✓ | Latent | ✓ | Grad/Opt | code |
Table 1: Categorization of Diffusion-Based Inverse Problem Solvers. This table categorizes meth-
ods by their approach to solving inverse problems with diffusion models. We identified four families
of methods. Explicit Approximations for Measurement Matching: These methods approximate the
measurement matching score, ∇ log pt (y|xt ), with a closed-form expression. Variational Inference:
These methods approximate the true posterior distribution, p(x|y), with a simpler, tractable distribu-
tion. Variational formulations are then used to optimize the parameters of this simpler distribution.
CSGM-type methods: The works in this category use backpropagation to change the initial noise of
the deterministic diffusion sampler, essentially optimizing over a latent space for the diffusion model.
Asymptotically Exact Methods: These methods aim to sample from the true posterior distribution.
This is typically achieved by constructing Markov chains (MCMC) or by propagating particles
through a sequence of distributions (SMC) to obtain samples that approximate the posterior. Further
categorization is based on being able to address non-linear problems, blind formulations (unknown
forward model), noise handling, pixel/latent space operation, text-conditioning, and the type of
optimization technique used (gradient-based, projection, etc.). Code availability is also indicated.

The same inverse problem can appear across vastly different scientific fields. To illustrate this point,
we can take the inpainting case as an example. In Computer Vision, inpainting can be useful for
applications such as object removal or object replacement [109, 118, 159]. In the proteins domain,
inpainting can be useful for protein engineering, e.g. by mutating certain amino acids of the protein
sequence to achieve better thermodynamic properties [104, 42, 156, 155]. MRI acceleration is also
an inpainting problem but in the Fourier domain [1, 162, 40, 163, 141]. Particularly, for each coil
measurement yi within the multi-coil setting, we have Ai = P F Si , where P is the masking operator,
F is the 2D discrete Fourier transform, and Si denotes the element-wise sensitivity value. For
single-coil, Si is the identity matrix [92]. Similarly, CT can be considered an inpainting problem in
the Radon-transformed domain A = P R, where R is the Radon transform [105, 57, 13]. Depending
on the circumstances such as sparse-view or limited-angle, the pattern of the masking operator P
differs [70]. Finally, in the audio domain, the bandwidth extension problem, i.e. the task of recovering
high-frequency content from an observed signal, is another example of inpainting in the spectrogram
domain [43].
Inpainting is just one of many useful linear inverse problems in scientific applications and there are
plenty of other important examples to consider. Cryo-EM [49] is a blind inverse problem that is

Score-ALD:        ∝ − A^⊤ ( y − Axt )
Score-SDE:        ∝ − A^⊤ ( yt − Axt )
ILVR:             ∝ − (A^⊤A)^{-1} A^⊤ ( yt − Axt )
DPS:              ∝ − ( I + ∇²xt log pt(xt) )^⊤ A^⊤ ( y − AE[X0|Xt = xt] )
ΠGDM:             ∝ − (∂E[X0|Xt = xt]/∂xt) A^⊤ ( rt² AA^⊤ + σy² I )^{-1} ( y − AE[X0|Xt = xt] )
Moment Matching:  ∝ − ∇xt E[x0|xt]^⊤ A^⊤ ( σy² I + σt² A ∇xt E[x0|xt] A^⊤ )^{-1} ( y − AE[x0|xt] )
SNIPS:            ∝ − Σ^⊤ ( σy² Im − σt² ΣΣ^⊤ )^{-1} ( ȳ − Σx̄t )
DDRM:             ∝ − Σ^⊤ ( σy² Im − σt² ΣΣ^⊤ )^{-1} ( ȳ − Σx̄0|t )
DDNM:             ∝ − Σt A† ( y − AE[X0|Xt = xt, y] )

Figure 1: Approximations for the measurements score proposed by different methods.

defined by A = CSR, where C is a blur kernel and S is a shifting matrix, i.e. additional (unknown)
shift and blur is applied to the projections. Deconvolution appears in several applications such as
super-resolution [106, 122] of images and removing reverberant corruption [101] in audio signals.
There are many interesting non-linear inverse problems too, i.e. where A is a nonlinear operator.
Example 1.5 (Phase Retrieval [55]). Phase retrieval considers the nonlinear operator A(X) := |F X|,
where the measurement contains only the magnitude of the Fourier signal.

Example 1.6 (Compression Removal). Here A(X; α) represents a (non-linear) compression operator
(e.g., JPEG) whose strength is controlled by the parameter α.
A famous non-linear inverse problem is the problem of imaging a black hole, where the relationship
between the image to be reconstructed and the interferometric measurement can be considered as a
sparse and noisy Fourier phase retrieval problem [3].

1.2 Recovery types

One common characteristic of these problems is that information is lost and perfect recovery is
impossible [140], i.e. they are ill-posed. Hence, the type of “recovery” we are looking for should be
carefully defined [124]. For instance, one might be looking for the point that maximizes the posterior
distribution p(x|y) [10, 107]. Often, the Maximum a posteriori (MAP) estimation coincides with the
Minimum Mean Squared Error (MMSE) Estimator, i.e. the conditional expectation E[x|y] [158, 100]. MMSE
estimation attempts to minimize the distortion of the unknown signal x, but often leads to unrealistic
recoveries. A different approach is to sample from the full posterior distribution, p(x|y). Posterior
sampling accounts for the uncertainty of the estimation, and typically produces samples that have
higher perceptual quality. [14] show that, in general, it is impossible to find a sample that maximizes
perception and minimizes distortion at the same time. Yet, posterior sampling is nearly optimal [67]
in terms of distortion error.

1.3 Approaches for Solving Inverse Problems

Inverse problems have a rich history, with approaches evolving significantly over the decades [113, 9].
While a comprehensive review is beyond the scope of this survey, we highlight key trends to provide
context. Early approaches, prevalent in the 2000s, often framed inverse problems as optimization
tasks [38, 16, 46, 56, 38, 59, 127]. These methods sought to balance data fidelity with regularization
terms that encouraged desired solution properties like smoothness [120, 12] or sparsity in specific
representations (e.g., wavelets, dictionaries) [56, 38, 16, 46, 59].
The advent of deep learning brought a paradigm shift [103]. Researchers began leveraging large paired
datasets to directly learn mappings from measurements to clean signals using neural networks [45,
85, 139, 19, 160, 20, 143, 161]. These approaches focus on minimizing some reconstruction loss
during training, with various techniques employed to penalize distortions, and optimize for specific
application goals (e.g., perceptual quality [66, 80]). Traditional point estimates aim to recover a
single reconstruction by, for example, minimizing the average reconstruction error (i.e., MMSE) or by
finding the most probable reconstruction through the Maximum a Posteriori (MAP) estimate, i.e., finding
the x that maximizes p(x|y). While powerful, this approach can suffer from “regression to the mean”,
where the network predicts an average solution that may lack important details or even be outside the
desired solution space [14, 39]. In fact, learning a mapping to minimize a certain distortion metric
will lead, in the best case, to an average of all the plausible reconstructions (e.g., when using an L2
reconstruction loss, the best-case solution will be the posterior mean). This reconstruction might not
be in the target space (e.g., a blurry image being the average of all plausible reconstructions) [14].
Recent research has revealed a striking connection between denoising algorithms and inverse prob-
lems. Powerful denoisers, often based on deep learning, implicitly encode valuable information about
natural signals. By integrating these denoisers into optimization frameworks, we can harness their
learned priors to achieve exceptional results in a variety of inverse problems [144, 136, 18, 114, 31,
69, 71, 95]. This approach bridges the gap between traditional regularization methods and modern
denoising techniques, offering a promising new paradigm for solving these challenging tasks.
An alternative perspective views inverse problems through the lens of Bayesian inference. Given
measurements y, the goal becomes generating plausible reconstructions by sampling from the
posterior distribution p(X|Y = y) – the distribution of possible signals x given the observed
measurements y.
In this survey we explore a specific class of methods that utilize diffusion models as priors for pX ,
and then try to generate plausible reconstructions (e.g., by sampling from the posterior). While other
approaches exist, such as directly learning conditional diffusion models or flows for specific inverse
problems [84, 122, 121, 150, 90, 91, 5, 4, 86, 87, 88, 126], these often require retraining for each

new application. In contrast, the methods covered in this survey offer a more general framework
applicable to arbitrary inverse problems without retraining or fine-tuning.

Unsupervised methods. We use the term unsupervised methods for those methods that focus on characterizing the
distribution of target signals, pX, and applying this knowledge during the inversion process. Since
they don’t rely on paired data, they can be flexibly applied to different inverse problems using the
same prior knowledge.
Unsupervised methods can be used to maximize the posterior p(x|y) or to sample from this
distribution. Algorithmically, the former problem is typically solved with (some variation of)
gradient descent and the latter with (some variation of) Monte Carlo simulation (e.g., Langevin
dynamics). Either way, one typically needs to compute the gradient of the conditional log-
likelihood, i.e., ∇x log p(x|y).
A simple application of Bayes Rule reveals that:

∇x log p(x|y) = ∇x log p(x) + ∇x log p(y|x),    (1.2)

where the left-hand side is the conditional score, the first term on the right-hand side is the unconditional score, and the second term is the measurements matching term.

The last term typically has a closed-form expression; e.g., for the linear case with Gaussian noise, we have
∇x log p(y|x) = A^⊤(y − Ax) / σy². However, the first term, known as the score function, might be hard
to estimate when the data lie on low-dimensional manifolds. The problem arises from the fact that
we do not get observations outside of the manifold and hence the vector-field estimation is inaccurate
in these regions.
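To make the decomposition concrete, here is a small sketch that evaluates both terms of Equation 1.2 for a noisy linear forward model; the toy prior, the problem sizes, and the `score_fn` interface are illustrative assumptions and are not part of any specific method discussed in this survey.

```python
import numpy as np

def measurement_matching_score(x, y, A, sigma_y):
    """Closed-form grad_x log p(y|x) for the linear model y = A x + sigma_y * z, z ~ N(0, I)."""
    return A.T @ (y - A @ x) / sigma_y**2

def conditional_score(x, y, A, sigma_y, score_fn):
    """Bayes rule (Equation 1.2): conditional score = unconditional score + measurement matching term."""
    return score_fn(x) + measurement_matching_score(x, y, A, sigma_y)

# Toy usage with a standard Gaussian prior, whose score is simply -x.
rng = np.random.default_rng(0)
n, m, sigma_y = 16, 8, 0.1
A = rng.standard_normal((m, n))
y = A @ rng.standard_normal(n) + sigma_y * rng.standard_normal(m)
g = conditional_score(rng.standard_normal(n), y, A, sigma_y, score_fn=lambda x: -x)
```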
One way to sidestep this issue is by using a “smoothed” version of the score function, representing
the score function of noisy data that will be supported everywhere. The central idea behind diffusion
generative models is to learn score functions that correspond to different levels of smoothing. Specifi-
cally, in diffusion modeling, we attempt to learn the smoothed score functions, ∇xt log pt (xt ), where
Xt = X0 + σt Z, Z ∼ N (0, I), for different noise levels t. During sampling, we progressively
move from more smoothed vector fields to the true score function. At the very end, the score function
corresponding to the data distribution is only queried at points for which the estimation is accurate
because of the warm-start effect of the sampling method.
Even though estimating the unconditional score becomes easier (because of the smoothing), the
measurement matching term becomes time dependent and loses its closed form expression. Indeed,
the likelihood of the measurements is given by the intractable integral:
pt(y|xt) = ∫ p(y|x0) p(x0|xt) dx0.    (1.3)

The computational challenge that emerges from the intractability of the conditional likelihood has led
to the proposal of numerous approaches to use diffusion models to solve inverse problems [67, 24,
130, 135, 22, 73, 74, 149, 25, 164, 119, 89, 117, 94, 54, 53, 152, 47, 17, 151, 68, 147, 21, 28, 125].
The sheer number of the proposed methods, but also the different perspectives under which these
methods have been developed, make it hard for both newcomers and experts in the field to understand
the connections between them and the unifying underlying principles. This work attempts to explain,
taxonomize and relate prominent methods in the field of using diffusion models for inverse problems.
Our list of methods is by no means exhaustive. The goal of this manuscript is not to list all the
methods that have been proposed but to review some representative methods of different approaches
and present them under a unifying framework. We believe this survey will be useful as a reference
point for people interested in this field.

2 Background
2.1 Diffusion Processes

Forward and Reverse Processes. The idea of a diffusion model is to transform a simple distribution
(e.g., a normal distribution) into the unknown data distribution p0(x), which we do not know
explicitly but from which we have access to some samples. The first step is to define a corruption process.
The popular Denoising Diffusion Probabilistic Models (DDPM) [63, 133], adopt a discrete time

Markovian process to transform the input Normal distribution into the target one by incrementally
adding Gaussian noise. More generally, the corruption processes of interest can be generalized to
continuous time by a stochastic differential equation (SDE) [135]:

dxt = f(xt, t) dt + g(t) dWt,    (2.1)

where f(xt, t) is the drift coefficient and g(t) is the diffusion coefficient, with x0 ∼ p0, x0 ∈ Rn, and Wt denoting a Wiener process (i.e., Brownian motion). This SDE gradually transforms the data distribution into Gaussian noise. We denote with pt the distribution that arises by running this dynamical system up to time t.
A remarkable result by Anderson [7] shows that we can sample from p0 by running backwards in
time the reverse SDE:

dxt = [ f(xt, t) − g²(t) ∇xt log pt(xt) ] dt + g(t) dWt,    (2.2)

where ∇xt log pt(xt) is the score function. The reverse SDE is
initialized at xT ∼ pT . For sufficiently large T and for linear drift functions f (·, ·), the latter
distribution approaches a Gaussian distribution with known parameters that can be used for initializing
the process. Hence, the remaining goal becomes to estimate the score function ∇xt log pt (xt ).

Probability Flow ODE. Song et al. [135] and Maoutsa, Reich, and Opper [93] observe that the
(deterministic) differential equation:
 
dxt/dt = f(xt, t) − (g²(t)/2) ∇xt log pt(xt)    (2.3)

corresponds to the same Fokker-Planck equation as the SDE of Equation 2.2. An implication of this
is that we can use the deterministic sampling scheme of Equation 2.3. Any well-built numerical ODE
solver can be used to solve Equation 2.3, such as the Euler solver:

xt−∆t = xt + ∆t [ f(xt, t) − (g²(t)/2) ∇xt log pt(xt) ].    (2.4)
SDE variants: Variance Exploding and Variance Preserving Processes. The drift coefficients,
f(xt, t), and the diffusion coefficients g(t) are design choices. One popular choice, known as the
Variance Exploding (VE) SDE, is setting f(xt, t) = 0 and g(t) = √(dσt²/dt) for some variance
scheduling {σt}_{t=0}^{T}. Under these choices, the marginal distribution at time t of the forward process of
Equation 2.1 can be alternatively described as:
Xt = X0 + σt Z, X0 ∼ p(X0 ), Z ∼ N (0, In ). (2.5)

The typical noise scheduling for this SDE is σt = t (that corresponds to g(t) = 1).
Another popular choice is to set the drift function to be f (xt , t) = −xt , which is known as the
Variance Preserving (VP) SDE. A famous process in the VP SDE family is the Ornstein–Uhlenbeck
(OU) process:

dxt = −xt dt + √2 dWt,    (2.6)
which gives:
Xt = exp(−t) X0 + √(1 − exp(−2t)) Z, Z ∼ N(0, In).    (2.7)
The VP SDE [63] takes a more general form:

Xt = √αt X0 + √(1 − αt) Z, X0 ∼ p(X0), Z ∼ N(0, In).    (2.8)
With reparametrization and the Euler solver, this leads to an efficient solution to Equation 2.3, known
as DDIM [129]:
xt−1 = √αt−1 · ( xt + (1 − αt) ∇xt log pt(xt) ) / √αt + √(1 − αt−1 − σt²) · ( −√(1 − αt) ∇xt log pt(xt) ).    (2.9)

The first term contains the predicted x0, namely x̂0 := ( xt + (1 − αt) ∇xt log pt(xt) ) / √αt, and the second term is the direction pointing toward xt. For convenience, in the rest of the paper, this update will be written as: xt−1 ← UnconditionalDDIM(x̂0, xt).
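As a concrete illustration of the update, here is a minimal single-step sketch written in terms of the score; the function name and the assumption that a score estimate is available at (xt, t) are ours and are not part of [129].

```python
import numpy as np

def ddim_step(x_t, score_t, alpha_t, alpha_prev, sigma_t=0.0, rng=None):
    """One DDIM update (Equation 2.9), using the score in place of a noise-prediction network."""
    x0_hat = (x_t + (1.0 - alpha_t) * score_t) / np.sqrt(alpha_t)      # predicted x0
    eps_hat = -np.sqrt(1.0 - alpha_t) * score_t                        # implied noise estimate
    x_prev = np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev - sigma_t**2) * eps_hat
    if sigma_t > 0:                                                    # optional stochasticity
        x_prev = x_prev + sigma_t * (rng or np.random.default_rng()).standard_normal(x_t.shape)
    return x_prev, x0_hat
```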

2.2 Tweedie’s Formula and Denoising Score Matching

In what follows, we will discuss how one can learn the score function ∇xt log pt (xt ) that appears in
Equation 2.17. We will focus on the VE SDE, since the mathematical calculations are simpler.
Tweedie’s formula [50] is a famous result in statistics that shows that for an additive Gaussian
corruption, Xt = X0 + σt Z, Z ∼ N (0, In ), it holds that:
∇xt log pt(xt) = ( E[X0|Xt = xt] − xt ) / σt².    (2.10)
The formal statement and a self-contained proof can be found in the Appendix, Lemma A.2.
Tweedie’s formula gives us a way to derive the unconditional score function needed in Equation 2.17,
by optimizing for the conditional expectation, E[X0 |Xt = xt ]. The conditional expectation
E[X0 |Xt = xt ], is nothing more than the minimum mean square error estimator (MMSE) of
the clean image given the noisy observation xt , that is a denoiser.
In practice, we do not know this denoiser analytically, but we can parametrize it using a neural network
hθ(xt) and learn it in a supervised way by minimizing the following objective:

JDSM(θ) = E_{x0, xt} [ ||hθ(xt) − x0||² ].    (2.11)

Assuming a rich enough family Θ, the minimizer of Equation 2.11 is hθ(xt) = E[x0|Xt = xt] (see
Lemma A.1) and the score in Equation 2.10 is approximated as (hθ(xt) − xt)/σt². Note that for each
σt we would need to learn a different denoiser (since the noise strength is different), or alternatively
the neural network hθ should also take as input the value of t or σt. Diffusion models are trained
following the latter paradigm, i.e. the same neural network approximates the optimal denoisers at all
noise levels by conditioning on the noise level through t.
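A minimal training-loss sketch of Equation 2.11, assuming the VE corruption of Equation 2.5 and a denoiser network that takes the noise level as an extra input (the `denoiser(xt, sigma)` signature is an assumption for illustration):

```python
import torch

def dsm_loss(denoiser, x0, sigmas):
    """Monte Carlo estimate of J_DSM (Equation 2.11) with one random noise level per sample."""
    idx = torch.randint(len(sigmas), (x0.shape[0],))
    sigma = sigmas[idx].view(-1, *([1] * (x0.dim() - 1)))     # broadcast over remaining dims
    xt = x0 + sigma * torch.randn_like(x0)                    # VE corruption: X_t = X_0 + sigma_t Z
    return ((denoiser(xt, sigma) - x0) ** 2).mean()

# The trained denoiser yields the score via Tweedie: (denoiser(xt, sigma) - xt) / sigma**2.
```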
Interestingly, Vincent [145] independently discovered that the score function can be learned by
minimizing an l2 objective, similar to Equation 2.11. The formal statement and a self-contained proof
of this alternative derivation is included in the Appendix, Theorem A.3.

2.3 Latent Diffusion Processes

For high-dimensional distributions, diffusion models training (see Equation 2.11) and sampling (see
Equation 2.3) require massive computational resources. To make the training and sampling more
efficient, the authors of Stable Diffusion [115] propose performing the diffusion in the latent space of
a pre-trained powerful autoencoder. Specifically, given an encoder Enc : Rn → Rk and a decoder
Dec : Rk → Rn , one can create noisy samples:
XtE = X0E +σt Z, Z ∼ N (0, Ik ), (2.12)
|{z}
Enc(X0 )
and train a denoiser network in the latent space. At inference time, one starts with pure noise, samples
a clean latent x̃E E 1
0 by running the reverse process, and outputs x0 = Dec(x̃0 ) . Solving inverse
problems with Latent Diffusion models requires special treatment. We discuss the reasons and
approaches in this space in Section 3.5.

2.4 Conditional Sampling

2.4.1 Stochastic Samplers for Inverse Problems


The goal in inverse problems is to sample from p0 (·|y) assuming a corruption model Y =
A(X0 ) + σy Z, Z ∼ N (0, Im ). We can easily adapt the original unconditional formulation
given by Equation 2.2 into a conditional one to generate samples from p0 (·|y). Specifically, the
associated reverse process is given by the stochastic dynamical system [102]:
 

dxt = f (xt , t) − g 2 (t) ∇xt log pt (xt | y) dt + g(t)dWt , (2.13)


| {z }
conditional score
initialized at xT ∼ pT(·|y). For sufficiently large T and for linear drift functions f(·, ·), the
distribution pT(·|y) is a Gaussian distribution with parameters independent of y. In the conditional
case, the goal becomes to estimate the score function ∇xt log pt(xt|y).

¹ To give an idea, the original work introducing latent diffusion models [115] proposes to use a latent space of
dimension 64 × 64 × 4 to represent images of size 512 × 512 × 3.

2.4.2 Deterministic Samplers for Inverse Problems


It is worth noting that (as in the unconditional setting) it is possible to derive deterministic sampling
algorithms as well. Particularly, one can use the following dynamical system [135, 102]:
dxt/dt = f(xt, t) − (g²(t)/2) ∇xt log pt(xt|y),    (2.14)
initialized at pT (·|y) to get sample from the conditional distribution p0 (·|y). Once again, to run this
discrete dynamical system, one needs to know the conditional score, ∇xt log pt (xt |y).

2.4.3 Conditional Diffusion Models


Similarly to the unconditional setting, one can directly train a network to approximate the conditional
score, ∇xt log pt (xt |y). A generalization of Tweedie’s formula gives that:
∇xt log pt(xt|y) = ( E[X0|Xt = xt, Y = y] − xt ) / σt².    (2.15)
Hence, one can train a network using a generalized version of the Denoising Score Matching,
Jcond,DSM(θ) = E_{x0, xt, y} [ ||hθ(xt, y) − x0||² ],    (2.16)
and then use it in Equation 2.15 in place of the conditional expectation. The main issue with this
approach is that the forward model (degradation operator) needs to be known at training time. If the
corruption model A(X) changes, then the model needs to be retrained. Further, with this approach
we need to train new models and we cannot directly leverage powerful unconditional models that are
already available. The focus of this work is on methods that use pre-trained unconditional diffusion
models to solve inverse problems, without further training.

2.4.4 Using pre-trained diffusion models to solve inverse problems


As we showed earlier, the conditional score can be decomposed using Bayes Rule into:
∇xt log pt(xt|y) = ∇xt log pt(xt) + ∇xt log p(y|xt),    (2.17)

that is, the (smoothed) score function and the measurements matching term, where the latter is determined by the
inverse problem we are interested in solving. Applying this to Equation 2.13, we get that:

dxt = [ f(xt, t) − g²(t) ( ∇xt log pt(xt) + ∇xt log p(y|xt) ) ] dt + g(t) dWt.    (2.18)
Similarly, one can use the deterministic process:
 
dxt = [ f(xt, t) − (1/2) g²(t) ( ∇xt log pt(xt) + ∇xt log p(y|xt) ) ] dt.    (2.19)

We have already discussed how to train a neural network to approximate ∇xt log pt (xt ) (using
Tweedie’s Formula / Denoising Score Matching). However, here we further need access to the term
∇xt log p(y|xt ). The likelihood of the measurements is given by the intractable integral:
pt(y|xt) = ∫ p(y|x0) p(x0|xt) dx0.    (2.20)

Gupta et al. [58] prove that there are instances of the posterior sampling problem for which every
algorithm takes superpolynomial time, even though unconditional sampling is provably fast. Hence,
diffusion models excel at performing unconditional sampling but are hard to use as priors for solving
inverse problems because of the time dependence in the measurements matching term. Since the very
introduction of diffusion models, there has been a plethora of methods proposed to use them to solve
inverse problems without retraining. This survey serves as a reference point for different techniques
that have been developed in this space.
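Schematically, most of the methods reviewed below plug some approximation of ∇xt log p(y|xt) into the reverse dynamics. A minimal Euler–Maruyama sketch of such a guided step is given below; the `approx_meas_score` callable is a placeholder for whichever approximation a given method uses, and the drift/diffusion callables are generic assumptions.

```python
import numpy as np

def guided_reverse_step(x_t, t, dt, f, g, score_fn, approx_meas_score, rng):
    """One Euler-Maruyama step of the conditional reverse SDE (Equation 2.18), going from t to t - dt."""
    drift = f(x_t, t) - g(t) ** 2 * (score_fn(x_t, t) + approx_meas_score(x_t, t))
    noise = g(t) * np.sqrt(dt) * rng.standard_normal(x_t.shape)
    return x_t - dt * drift + noise
```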

2.4.5 Ambient Diffusion: Learning to solve inverse problems using only measurements
The goal of the unsupervised learning approach for solving inverse problems (Section 2.4.4) is to
use a prior p(x) to approximate the measurements matching term, ∇ log pt (y|xt ). However, in
certain applications, it is expensive or even impossible to get data from (and hence learn) p(x) in the
first place. For instance, in MRI the quality of the data is proportionate to the time spent under the
scanner [162] and it is infeasible to acquire full measurements from black holes [3]. This creates a
chicken-egg problem: we need access to p(x) to solve inverse problems and we do not have access
to samples from p(x) unless we can solve inverse problems. In certain scenarios, it is possible to
break this seemingly impossible cycle.
Ambient Diffusion [37] was one of the first frameworks to train diffusion models with linearly
corrupted data. The key concept behind the Ambient Diffusion framework is the idea of further
corruption. Specifically, the given measurements get further corrupted and the model is trained to
predict a clean image by using the measurements before further corruption for validation. Ambient
DPS [2] shows that priors learned from corrupted data can even outperform (in terms of usefulness for
inverse problems), at the high-corruption regime, priors learned from clean data. Ambient Diffusion
was extended to handle additive Gaussian Noise in the measurements. The paper Consistent Diffusion
Meets Tweedie [36] was the first diffusion-based framework to provide guarantees for sampling from
the distribution of interest, given only access to noisy data. This paper extends the idea of further
corruption to the noisy case and proposes a novel consistency loss [33] to learn the score function for
diffusion times that correspond to noise levels below the level of the noise in the dataset.
Both Ambient Diffusion and Consistent Diffusion Meets Tweedie have connections to deep ideas
from the literature in learning restoration models from corrupted data, such as Stein’s Unbiased
Risk Estimate (SURE) [51, 137] and Noise2X [82, 79, 11]. These connections are also leveraged
by alternative frameworks to Ambient Diffusion, as in [74, 1]. A different approach for learning
diffusion models from measurements is based on the Expectation-Maximization (EM) algorithm [8,
119, 148]. The convergence of these methods to the true distribution depends on the convergence of
the EM algorithm, which might get stuck in a local minimum.
In this survey, we focus on the setting where a pre-trained prior p(x) is available, regardless of
whether it was learned from clean or corrupted data.

3 Reconstruction Algorithms
We summarize all the methods analyzed in this work in Table 1. The methods have been taxonomized
based on the approach they use to solve the inverse problem (explicit score approximations, variational
methods, CSGM-type methods and asymptotically exact methods), the type of inverse problems
they can solve and the optimization techniques used to solve the problem at hand (gradient descent,
sampling, projections, parameter optimization). Additionally, we provide links to the official code
repositories associated with the papers included in this survey. Please note that we have not conducted
a review or evaluation of these codebases to verify their consistency with the corresponding papers.
These links are included for informational purposes only.

Taxonomy based on the type of the reconstruction algorithm. We identified four families of
methods. Explicit Approximations for Measurement Matching: These methods approximate the
measurement matching score, ∇ log pt (y|xt ), with a closed-form expression. Variational Inference:
These methods approximate the true posterior distribution, p(x|y), with a simpler, tractable distribu-
tion. Variational formulations are then used to optimize the parameters of this simpler distribution.
CSGM-type methods: The works in this category use backpropagation to change the initial noise of
the deterministic diffusion sampler, essentially optimizing over a latent space for the diffusion model.
Asymptotically Exact Methods: These methods aim to sample from the true posterior distribution.
This is typically achieved by constructing Markov chains (MCMC) or by propagating particles
through a sequence of distributions (SMC) to obtain samples that approximate the posterior. Methods
that do not fall into any of these categories are classified as Others.

Taxonomy based on the type of optimization techniques used. The objective of all methods is to
explain the measurements. The measurement consistency can be enforced with different optimization
techniques, e.g. through gradients (Grad), projections (Proj), sampling (Samp), or other optimization

techniques (Opt). Methods that belong to the Grad-type take a single gradient step (be it
deterministic or stochastic) to xt to enforce measurement consistency. Proj-type projects xt or
E[X0 |Xt = xt ] to the measurement subspace. Samp-type samples the next particles by defining
a proposal distribution, and propagates multiple chains of particles to solve the problem. Opt-type
either defines and solves an optimization problem for every timestep, or defines a global optimization
problem that encompasses all timesteps. When the method belongs to more than one type, we
separate them with /. Note that the categorization of different “types” is subjective, and more often
than not, the category that the method belongs to may be interpreted in multiple ways. For instance, a
projection step is also a gradient descent step with a specific step size.

Taxonomy based on the type of the inverse problem. Based on the linearity of the corruption
operator A, the inverse problems can be classified as linear or nonlinear. The inverse problems can
be further categorized based on whether there is noise in the measurements. Additionally, they are
classified as non-blind or blind depending on whether full information about A is available. In blind
problems, the form of the degradation operator (e.g., a convolution kernel or an inpainting mask) is known, while its
coefficients are unknown but parametrized. For example, we might know that we have measurements
with additive Gaussian noise, but the variance of the noise might be unknown. Finally, in certain
inverse problems, there is additional text-conditioning. Such inverse problems are typically solved
with text-to-image latent diffusion models [115].

3.1 Explicit Approximations for the Measurements Matching Term

The first family of reconstruction algorithms we identify is the one where explicit approximations for
the measurements matching term, ∇xt log p(y|Xt = xt ), are made. It is important to underline that
these approximations are not always clearly stated in the works that propose them, which makes it
hard to understand the differences and commonalities between different methods. In what follows,
we attempt to elucidate the different approximations that are being made and present different works
under a common framework. To provide some insights, we often provide the explicit approximation
formulas for the measurements matching term in the setting of linear inverse problems. In general, it
follows the template form:

∇xt log p(y|Xt = xt) ≈ − (Lt Mt) / Gt.    (3.1)

Here,
• Mt represents the error vector measuring the discrepancy between the observation y and
the estimated restored vector; for example, in Score ALD [67], Mt = y − Axt .
• Lt denotes a matrix that projects the error vector Mt from Rm back into an appropriate
space in Rn ; for instance, in Score ALD, Lt = A⊤ .
• Gt is the re-scaling scalar for the guidance vector Lt Mt ; for example, in Score ALD,
Gt = σy2 + γt2 with a hyperparameter γt .
In Figure 1, we summarize the approximation-based methods in this section using the template above.
We use ∝ to omit the guidance strength terms Gt .

3.1.0 Sampling from a Denoiser [68]
Kadkhodaie and Simoncelli [68] introduce a method for solving linear inverse problems by using the
implicit prior knowledge captured by a pre-trained denoiser on multiple noise levels. The method is
anchored on Tweedie’s formula that connects the least-squares solution for Gaussian denoising to the
gradient of the log-density of noisy images given in Equation 2.10
x̂(y) = y + σ² ∇y log p(y),    (3.2)
where y = x + n, n ∼ N(0, σ² In).
By interpreting the denoiser’s output as an approximation of this gradient, the authors develop a
stochastic gradient ascent algorithm to generate high-probability samples from the implicit prior
yt = yt−1 + ht r(yt−1 ) + ϵt zt , (3.3)
where r(y) = x̂(y) − y is the denoiser residual, ht is a step size (parameter), and ϵt controls the
amount of newly introduced Gaussian noise zt .
To solve linear inverse problems such as deblurring, super-resolution, and compressive sensing, the
generative method is extended to handle constrained sampling. Given a set of linear measurements
xc = M ⊤ x of an image x, where M is a low-rank measurement matrix, the goal is to reconstruct
the original image by utilizing the following gradient:
∇y log p(y|xc ) = (I − M M ⊤ )r(y) + M (xc − M ⊤ y), (3.4)

This approach is particularly interesting because its mathematical foundation relies solely on
Tweedie’s formula, providing a simple yet powerful framework for tackling inverse problems using
denoisers.
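A minimal sketch of one constrained ascent iteration combining Equations (3.3) and (3.4); the `denoiser` interface and the step-size choices are illustrative assumptions, not the authors’ code.

```python
import numpy as np

def constrained_ascent_step(y, x_c, M, denoiser, h_t, eps_t, rng):
    """One step of the constrained sampler: Eq. (3.4) gives the gradient, Eq. (3.3) the update."""
    r = denoiser(y) - y                                  # denoiser residual, proportional to grad log p(y)
    grad = r - M @ (M.T @ r) + M @ (x_c - M.T @ y)       # (I - M M^T) r + M (x_c - M^T y)
    return y + h_t * grad + eps_t * rng.standard_normal(y.shape)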

3.1.1 Score ALD [67]


One of the first proposed methods for solving linear inverse problems with diffusion models is the
Score-Based Annealed Langevin Dynamics (Score ALD) [67] method. The approximation of this
work is that:
∇xt log p(y|Xt = xt) ≈ − A^⊤ ( y − Axt ) / (σy² + γt²),    (3.5)

where γt is a parameter to be tuned. Here A^⊤ plays the role of the “lifting” matrix, ( y − Axt ) is the measurements error, and σy² + γt² is the guidance strength.
It is pretty straightforward to understand what this term is doing. The diffusion process is guided
towards the opposite direction of the “lifting” (application of the A⊤ operator) of the measurements
error, i.e. (y − Axt ), where the denominator controls the guidance strength.

3.1.2 Score-SDE [135]


Score-SDE [135] is another one of the first works that discussed solving inverse problems with
pre-trained diffusion models. For linear inverse problems, the difference between Score-ALD and
Score-SDE is that the latter noises the measurements before computing the measurements error.
Specifically, for t : σt > σy , the approximation becomes:
∇xt log p(y|Xt = xt) ≈ − A^⊤ ( y + σt ϵ − Axt ),    (3.6)

where y + σt ϵ =: yt is the noised measurement vector, and
where ϵ is sampled from N (0, Im ). Here, A is an orthogonal matrix, and taking a gradient step with
Equation 3.6 yields a noisy projection to yt = Axt where yt = y + σt ϵ. Hence, we categorize
Score-SDE as “projection”.
Disregarding the guidance strength of Equation 3.5, Equation 3.5 and Equation 3.6 look very similar.
Indeed, the only difference is that the latter has stochasticity that arises from the noising of the
measurements.

Special case: Inpainting (Repaint [89]) Observe that for the simplest case of inpainting, Equa-
tion 3.6 would be replacing the pixel values in the current estimate xt with the known pixel values
from the noised yt . Coincidentally, this is exactly the Repaint [89] algorithm that was proposed for
solving the inpainting inverse problem with pre-trained diffusion models. RePaint++ [117] improves
upon this approximation to run the forward-reverse diffusion processes multiple times, so that the
errors arising (e.g. boundaries) can be mitigated. This can be thought of as analogous to running
MCMC corrector steps as in predictor-corrector sampling [135].
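For the inpainting special case just described, the replacement step can be sketched as follows (the mask and array conventions are ours, for illustration):

```python
import numpy as np

def noisy_replacement(x_t, y, mask, sigma_t, rng):
    """Replace observed pixels of x_t with measurements noised to level sigma_t (Repaint-style step)."""
    y_t = y + sigma_t * rng.standard_normal(y.shape)     # bring measurements to the current noise level
    return mask * y_t + (1.0 - mask) * x_t               # keep the diffusion estimate on unobserved pixels
```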

3.1.3 ILVR [22]


ILVR is a similar approach that was initially proposed for the task of super-resolution. The approxi-
mation made here is the following:
∇xt log p(y|Xt = xt) ≈ − A† ( yt − Axt ) = − (A^⊤A)^{-1} A^⊤ ( yt − Axt ),    (3.7)
where A† is the Moore-Penrose pseudo-inverse of A, and similar to Score-SDE, yt = y + σt ϵ.
ILVR can be regarded as a pre-conditioned version of score-SDE. In ILVR, the projection to the
space of images happens using the Moore-Penrose pseudo-inverse of A, instead of the simple A⊤ .

3.1.4 DPS
All of the previous algorithms were proposed for linear inverse problems. Diffusion Posterior
Sampling (DPS) is one of the most well known reconstruction algorithms for solving non-linear
inverse problems. The underlying approximation behind DPS is that:

∇xt log p(y|Xt = xt ) ≈ ∇xt log p(y|X0 = E[X0 |Xt = xt ]). (3.8)
It is easy to see that:
p(y|X0 = E[X0|Xt = xt]) = N( y; µ = A(E[X0|Xt = xt]), Σ = σy² I ).    (3.9)
Hence, the DPS approximation can be stated as:

∇xt log p(y|Xt = xt) ≈ ∇xt log N( y; µ = A(E[X0|Xt = xt]), Σ = σy² I )    (3.10)
  = ∇xt ( (1/(2σy²)) ||y − A(E[X0|Xt = xt])||² )    (3.11)
  = (1/(2σy²)) ∇⊤xt A(E[X0|Xt = xt]) ( A(E[X0|Xt = xt]) − y ).    (3.12)

For linear inverse problems, this simplifies to:

∇xt log p(y|Xt = xt) ≈ − (1/(2σy²)) ∇⊤xt E[X0|Xt = xt] A^⊤ ( y − AE[X0|Xt = xt] ),    (3.13)

where ∇⊤xt E[X0|Xt = xt] A^⊤ is the “lifting” matrix, ( y − AE[X0|Xt = xt] ) is the measurements error, and 1/(2σy²) is the guidance strength. We can further use Tweedie’s formula to write it as:

∇xt log p(y|Xt = xt) ≈ − (1/(2σy²)) ( I + ∇²xt log pt(xt) )^⊤ A^⊤ ( y − AE[X0|Xt = xt] ).    (3.14)
In practice, DPS does not use the theoretical guidance strength but instead proposes to use a reweight-
ing with a step size inversely proportional to the norm of the measurement error.
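In practice, the DPS gradient is obtained by differentiating the measurement misfit through the denoiser with automatic differentiation. The sketch below is an illustrative reading of that computation; the `denoiser` and `forward_op` interfaces and the exact step-size rule are assumptions, not the reference implementation.

```python
import torch

def dps_correction(x_t, y, forward_op, denoiser, zeta):
    """Gradient of ||y - A(E[X0|Xt=xt])||^2 w.r.t. x_t, scaled as in the DPS heuristic."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t)                         # approximates E[X0 | Xt = x_t]
    residual = y - forward_op(x0_hat)
    grad = torch.autograd.grad(residual.pow(2).sum(), x_t)[0]
    # Step size inversely proportional to the measurement-error norm, as described above.
    return -zeta / (residual.norm() + 1e-8) * grad
```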
MCG [27] provides a geometric interpretation of DPS by showing that the approximation used in
DPS can guarantee the noisy samples stay on the manifold. DSG [157] showed that one can choose a
theoretically “correct” step size under the geometric view of MCG, and combined with projected
gradient descent, one can achieve superior sample quality. MPGD [61] showed that by constraining
the gradient update step to stay on the low dimensional subspace by autoencoding, one can acquire
better results.

3.1.5 ΠGDM [130]
Recall the intractable integral in Equation 1.3. According to this relation, the DPS approximation is
achieved by setting
p(X0 |Xt ) ≈ δ(X0 − E[X0 |Xt = xt ]). (3.15)
In ΠGDM, the authors propose to use a Gaussian distribution for approximation
p(X0 |Xt ) ≈ N (E[X0 |Xt = xt ], rt2 In ), (3.16)
where rt is a hyperparameter. For linear inverse problems, this leads to
p(y|Xt) ≈ N( A E[X0|Xt = xt], rt² AA^⊤ + σy² Im ).    (3.17)
Subsequently, we have
∇xt log p(y|Xt = xt) ≈ − (∂E[X0|Xt = xt]/∂xt) A^⊤ ( rt² AA^⊤ + σy² Im )^{-1} ( y − AE[X0|Xt = xt] ),    (3.18)

where A^⊤ ( rt² AA^⊤ + σy² Im )^{-1} acts as the “lifting” matrix and ( y − AE[X0|Xt = xt] ) is the measurements error.

3.1.6 Moment Matching [119]


In ΠGDM, the distribution p(x0 |xt ) was assumed to be isotropic Gaussian. However, one can
calculate explicitly the variance matrix, V [x0 |xt ]. As shown in Lemma A.4, it holds that:
V [x0 |xt ] = σt4 H(log pt (xt )) + σt2 In (3.19)
= σt2 ∇xt E[x0 |xt ]. (3.20)
The Moment Matching [119] method approximates the distribution p(x0 |xt ) with an anisotropic
Gaussian:
p(x0 |xt ) ≈ N (E[x0 |xt ], V [x0 |xt ]). (3.21)
For linear inverse problems, this leads to the following approximation for the measurements’ score:
∇xt log p(y|Xt = xt) ≈ − ∇xt E[x0|xt]^⊤ A^⊤ ( σy² I + σt² A ∇xt E[x0|xt] A^⊤ )^{-1} ( y − AE[x0|xt] ),    (3.22)

where ∇xt E[x0|xt]^⊤ A^⊤ ( σy² I + σt² A ∇xt E[x0|xt] A^⊤ )^{-1} is the “lifting” matrix and ( y − AE[x0|xt] ) is the measurements error.

In high-dimensions, even materializing the matrix ∇xt E[x0 |xt ] is computationally intensive. Instead,
the authors of [119] use automatic differentiation to compute the Jacobian-vector products.
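The matrix-free computation hinted at above can be sketched with forward- and reverse-mode automatic differentiation. This is our illustrative reading of it (the `denoiser` interface, the conjugate-gradient solver, and flattened-vector shapes are assumptions), not the authors’ implementation.

```python
import torch
from torch.autograd.functional import jvp, vjp

def moment_matching_guidance(x_t, y, A, denoiser, sigma_y, sigma_t, cg_iters=10):
    """Apply Eq. (3.22) without materializing the Jacobian of E[x0|xt] (here `denoiser`)."""
    x_t = x_t.detach()

    def cov_matvec(u):
        # (sigma_y^2 I + sigma_t^2 A J A^T) u, with the J-product computed as a Jacobian-vector product.
        _, j = jvp(denoiser, (x_t,), (A.T @ u,))
        return sigma_y**2 * u + sigma_t**2 * (A @ j)

    # Conjugate gradient on the (symmetric positive definite) m x m system.
    b = y - A @ denoiser(x_t)
    u, r = torch.zeros_like(b), b.clone()
    p = r.clone()
    for _ in range(cg_iters):
        Ap = cov_matvec(p)
        alpha = (r @ r) / (p @ Ap + 1e-12)
        u = u + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r + 1e-12)) * p
        r = r_new
    # Lift back through J^T A^T with a vector-Jacobian product; minus sign as in Eq. (3.22).
    _, pullback = vjp(denoiser, x_t, A.T @ u)
    return -pullback
```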

3.1.7 BlindDPS [23]


Methods that were considered so far were designed for non-blind inverse problems, where A is fully
known. BlindDPS targets the case where we have a parametrized unknown forward model Aϕ (e.g.
blurring with an unknown kernel ϕ). In BlindDPS, on top of the posterior mean approximation of xt ,
one approximates the parameter of the forward model, again, with the posterior mean. Specifically,
we design two parallel generative SDEs
dxt = [ f(xt, t) − g²(t) ∇xt log pt(xt, ϕt|y) ] dt + g(t) dWt,    (3.23)
dϕt = [ f(ϕt, t) − g²(t) ∇ϕt log pt(xt, ϕt|y) ] dt + g(t) dWt,    (3.24)
where the two SDEs are coupled through log pt (xt , ϕt |y), where under the independence between
Xt and Φt , the Bayes rule reads
∇xt log pt(xt, ϕt|y) = ∇xt log pt(xt) + ∇xt log pt(y|Xt = xt, Φt = ϕt),    (3.25)
∇ϕt log pt(xt, ϕt|y) = ∇ϕt log pt(ϕt) + ∇ϕt log pt(y|Xt = xt, Φt = ϕt),    (3.26)

where we see that Xt and Φt are coupled through the likelihood p(y|Xt , Φt ). In BlindDPS, the
approximation used in DPS is applied to both the image and the operator, leading to
p(y|Xt = xt , Φt = ϕt ) ≈ p(y|X0 = E[X0 |Xt = xt ], Φ0 = E[Φ0 |Φt = ϕt ]). (3.27)
The gradient of the coupled likelihood with respect to xt leads to
∇xt log p(y|Xt = xt, Φt = ϕt) ≈ − (1/(2σy²)) ∇⊤xt E[X0|Xt = xt] A^⊤_{E[Φ0|Φt=ϕt]} ( y − A_{E[Φ0|Φt=ϕt]} E[X0|Xt = xt] ),    (3.28)

where, as before, 1/(2σy²) is the guidance strength, ∇⊤xt E[X0|Xt = xt] A^⊤_{E[Φ0|Φt=ϕt]} the “lifting” matrix, and the term in parentheses the measurements error.
Similarly, for ϕt , we have
∇ϕt log p(y|Xt = xt, Φt = ϕt) ≈ − (1/(2σy²)) ∇⊤ϕt ( A_{E[Φ0|Φt=ϕt]} E[X0|Xt = xt] ) ( y − A_{E[Φ0|Φt=ϕt]} E[X0|Xt = xt] ),    (3.29)

with the guidance strength, “lifting” matrix, and measurements error playing the same roles.

3.1.8 DDRM Family


The methods in the DDRM family pose every linear inverse problem as a noisy inpainting
problem, by decomposing the measurement matrix with singular value decomposition (SVD), i.e.
A = U ΣV ⊤ , where U ∈ Rm×m , V ∈ Rn×n are orthogonal matrices, and Σ ∈ Rm×n is a
rectangular diagonal matrix with singular values {sj}_{j=1}^{m} as its elements. One can then rewrite
y = Ax + σy z, z ∼ N (0, Im ) as
ȳ = Σx̄ + σy z̄, where ȳ := U ⊤ y, x̄ := V ⊤ x, z̄ := U ⊤ z. (3.30)
Subsequently, Equation 3.30 becomes an inpainting problem in the spectral space.

SNIPS [74]. SNIPS proceeds by first solving the inverse problem posed as Equation 3.30 in the
spectral space to achieve a sample x̄ ∼ p(x̄|ȳ), then retrieving the posterior sample with x̂ = V x̄.
The key approximation can be concisely represented as

∇x̄t log p(ȳ|X̄t = x̄t) ≈ − Σ^⊤ ( σy² Im − σt² ΣΣ^⊤ )^{-1} ( ȳ − Σx̄t ),    (3.31)

where Σ^⊤ ( σy² Im − σt² ΣΣ^⊤ )^{-1} is the “lifting” matrix and ( ȳ − Σx̄t ) is the measurements error.

For the simplest case of denoising where m = n and Σ = A = I, the method becomes [75]
∇xt log p(y|Xt = xt) ≈ ( y − xt ) / |σy² − σt²|,    (3.32)
which produces a vector direction that is weighted by the absolute difference between the diffusion
noise level σt2 , and the measurement noise level σy2 . For the fully general case in Equation 3.31,
elements in different indices are weighted according to the singular values contained in Σ. In practice,
SNIPS uses pre-conditioning with the approximate negative inverse Hessian of log p(x̄t |ȳ) when
running annealed Langevin dynamics.

DDRM [73]. DDRM extends SNIPS by leveraging the posterior mean x̄0|t := V^⊤ E[X0|Xt = xt]
in place of the x̄t used in SNIPS, i.e.,

∇x̄t log p(ȳ|X̄t = x̄t) ≈ − Σ^⊤ ( σy² Im − σt² ΣΣ^⊤ )^{-1} ( ȳ − Σx̄0|t ),    (3.33)

with the same “lifting” matrix and the measurements error now computed on the posterior mean.

Expressing Equation 3.33 element-wise, we get


p(x̄t^(i) | Xt+1 = xt+1, y) =
  N( x̄t^(i); x̄0|t+1^(i), σt² )                 if si = 0,
  N( x̄t^(i); x̄0|t+1^(i), σt² )                 if σt < σy/si,
  N( x̄t^(i); ȳ^(i), σt² − σy²/si² )            if σt ≥ σy/si,    (3.34)
where x(i) denotes the i-th element of the vector, and si its corresponding singular value. Here,
DDRM introduces another hyper-parameter η to control the stochasticity of the sampling process
p(x̄t^(i) | Xt+1 = xt+1, y) =
  N( x̄t^(i); x̄0|t+1^(i) + √(1 − η²) σt ( x̄t+1^(i) − x̄0|t+1^(i) ) / σt+1, η² σt² )    if si = 0,
  N( x̄t^(i); x̄0|t+1^(i) + √(1 − η²) σt ( ȳ^(i) − x̄0|t+1^(i) ) / (σy/si), η² σt² )     if σt < σy/si,
  N( x̄t^(i); ȳ^(i), σt² − σy²/si² )                                                   if σt ≥ σy/si,    (3.35)
with η ∈ (0, 1] such that η = 1.0 recovers Equation 3.34.
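As a small illustration of the change of variables in Equation 3.30 that the whole DDRM family relies on (using a reduced SVD for brevity; the function and variable names are ours):

```python
import numpy as np

def to_spectral(A, y):
    """Rotate y = A x + noise into the spectral domain: returns singular values s, V^T, and U^T y."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)      # A = U diag(s) V^T
    return s, Vt, U.T @ y

def from_spectral(Vt, x_bar):
    """Map a spectral-space sample back to signal space, x = V x_bar."""
    return Vt.T @ x_bar
```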

GibbsDDRM. GibbsDDRM [99] extends DDRM to the following blind linear problem y =
Aφ x + σy z, where Aφ is a linear operator parameterized by φ. Here, Aφ = Uφ Σφ Vφ^⊤ has a φ-dependent
SVD decomposition, with singular values {sj,φ}_{j=1}^{m} as the elements of the diagonal
matrix Σφ. In the spectral space, ȳφ := Uφ^⊤ y, x̄φ := Vφ^⊤ x, z̄φ := Uφ^⊤ z. Subsequently, the
posterior mean in DDRM is replaced with x̄0|t,φ := Vφ^⊤ E[X0|Xt = xt], also depending on φ. Thus,
it leads to the sampling process

p(x̄t,φ^(i) | Xt+1 = xt+1, y, φ) =
  N( x̄t,φ^(i); x̄0|t+1,φ^(i) + √(1 − η²) σt ( x̄t+1,φ^(i) − x̄0|t+1,φ^(i) ) / σt+1, η² σt² )    if si,φ = 0,
  N( x̄t,φ^(i); x̄0|t+1,φ^(i) + √(1 − η²) σt ( ȳφ^(i) − x̄0|t+1,φ^(i) ) / (σy/si,φ), η² σt² )    if σt < σy/si,φ,
  N( x̄t,φ^(i); ȳφ^(i), σt² − σy²/si,φ² )                                                      if σt ≥ σy/si,φ.    (3.36)
At time step t, φ is sampled by using the conditional distribution p(φ|xt:T , y) and updated for
several iterations in a Langevin manner:
φ ← φ + (ξ/2) ∇φ log p(φ|xt:T, y) + √ξ ϵ,
where ξ is a stepsize and ϵ ∼ N (0, In ). Here, ∇φ log p(φ|xt:T , y) ≈ ∇φ log p(φ|x̄0|t,φ , y), and
the gradient can be computed as:
∇φ log p(φ|x̄0|t,φ, y) = − (1/(2σy²)) ∇φ ||y − Aφ x̄0|t,φ||².    (3.37)

3.1.9 DDNM [149] family


A different way to find meaningful approximations for the conditional score is to look at the condi-
tional version of Tweedie’s formula, see Equation 2.15. Using Bayes rule and rearranging [111], we
have
E[X0 |Xt = xt , y] = xt + σt2 ∇xt log pt (Xt |y) (3.38)
= xt + σt2 ∇xt log pt (xt ) + σt2 ∇xt log pt (y|Xt = xt ) (3.39)
= E[X0 |Xt = xt ] + σt2 ∇xt log pt (y|Xt = xt ). (3.40)
The methods that belong to the DDNM family make approximations to E[X0 |Xt = xt , y] by making
certain data consistency updates to E[X0 |Xt = xt ].

DDNM [149]. The simplest form of update when considering no noise can be obtained through
range-null space decomposition, assuming that one can compute the pseudo-inverse. In DDNM, this
condition is trivially met by considering operations that are SVD-decomposable. DDNM proposes to
use the following projection step to the posterior mean to obtain an approximation of the conditional
posterior mean
E[X0 |Xt = xt , y] ≈ (I − A† A)E[X0 |Xt = xt ] + A† y, (3.41)

where A† is the Moore-Penrose pseudo-inverse of A. One can also express Equation 3.41 as an
approximation of the likelihood, consistent to other methods in the chapter. Specifically, notice that
by using the relation in Equation 3.40,
∇xt log pt(y|Xt = xt) = (1/σt²) ( E[X0|Xt = xt, y] − E[X0|Xt = xt] ).    (3.42)
Plugging in Equation 3.41 to Equation 3.42,
∇xt log pt(y|Xt = xt) ≈ − (1/σt²) A† ( y − AE[X0|Xt = xt, y] ),    (3.43)

where A† is the “lifting” matrix, ( y − AE[X0|Xt = xt, y] ) is the measurements error, and 1/σt² is the guidance strength.

When there is noise in the measurement, one can make soft updates
E[X0|Xt = xt, y] ≈ (I − Σt A† A) E[X0|Xt = xt] + Σt A† y,  Σt ∈ R^{n×n}.    (3.44)
Also, similar to Equation 3.43,
∇xt log pt(y|Xt = xt) ≈ − (1/σt²) Σt A† ( y − AE[X0|Xt = xt, y] ),    (3.45)

with Σt A† the “lifting” matrix, ( y − AE[X0|Xt = xt, y] ) the measurements error, and 1/σt² the guidance strength.

Here, one can choose a simple Σt = λt I with λt set as a hyper-parameter, or use different scaling for
each spectral component. Observe that due to the relationship between the (conditional) score function
and the posterior mean established in Equation 3.40, we can also easily rewrite the approximation in
terms of the score of the posterior.
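A minimal sketch of the DDNM consistency update (Equations 3.41 and 3.44), with the simple choice Σt = λt I; the dense pseudo-inverse is for illustration only, since in practice the method relies on SVD-decomposable operators, as noted above.

```python
import numpy as np

def ddnm_consistency_update(x0_hat, y, A, lam=1.0):
    """Range-null space update: (I - lam * A^+ A) x0_hat + lam * A^+ y, written residual-style."""
    A_pinv = np.linalg.pinv(A)                     # Moore-Penrose pseudo-inverse of A
    return x0_hat + lam * (A_pinv @ (y - A @ x0_hat))
```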

DDS [25], DiffPIR [164]. Both DDS and DiffPIR propose a proximal update to approximate the
conditional posterior mean, albeit from different motivations. The resulting approximation reads
E[X0|Xt = xt, y] ≈ arg min_x (1/2) ∥y − Ax∥² + (λt/2) ∥x − E[X0|Xt = xt]∥².    (3.46)
The difference between the two algorithms comes from how one solves the optimization problem
in Equation 3.46, and how one chooses the hyperparameter λt . In DDS, the optimization is solved
with a few-step conjugate gradient (CG) update steps, by showing that DPS gradient update steps
can be effectively replaced with the CG steps under assumptions on the data manifold [25]. λt is
taken to be a constant value across all t. DiffPIR uses a closed-form solution for Equation 3.46, and
proposes a schedule for λt that is proportional to the signal-to-noise ratio (SNR) of the diffusion at
time t. Specifically, one chooses λt = σt ζ, where ζ is a constant.
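The proximal problem in Equation 3.46 is a ridge-regularized least-squares problem; a direct dense-solve sketch is shown below (DDS would instead run a few CG iterations and DiffPIR an operator-specific closed form, as described above). The function name is ours.

```python
import numpy as np

def proximal_data_update(x0_hat, y, A, lam):
    """Minimize 0.5*||y - A x||^2 + 0.5*lam*||x - x0_hat||^2  =>  (A^T A + lam I) x = A^T y + lam x0_hat."""
    n = x0_hat.shape[0]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y + lam * x0_hat)
```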

3.2 Variational Inference

These methods approximate the true posterior distribution, p(x|y), with a simpler, tractable distribu-
tion. Variational formulations are then used to optimize the parameters of this simpler distribution.

3.2.1 RED-Diff [94]


Mardani et al. [94] introduce RED-diff, a new approach for solving inverse problems by leverag-
ing stochastic optimization and diffusion models. The core idea is to use a variational method by
introducing a simpler distribution, q := N (µ, σ 2 In ), to approximate the true posterior p(x0 |y) by
minimizing the KL divergence DKL between them:
min_q DKL( q(x0|y) ∥ p(x0|y) ).    (3.47)

Here, DKL (q(x0 |y)∥p(x0 |y)) can be written as follows:



DKL( q(x0|y) ∥ p(x0|y) ) = −E_{q(x0|y)}[ log p(y|x0) ] + DKL( q(x0|y) ∥ p(x0) ) + constant,    (3.48)

where the right-hand side (up to the constant) is the Variational Bound (VB), obtained via the classic variational inference argument. The first term in the VB can be simplified into a reconstruction
loss, and the second term can be decomposed as a score-matching objective which involves matching
the score function of the variational distribution with the score function of the true posterior denoisers
at different timesteps:

min_µ ||y − A(µ)||² / (2σy²) + E_{t,ϵ}[ λt ||ϵθ(xt; t) − ϵ||² ],    (3.49)

where µ is the mean of the variational distribution, σy² is the noise variance in the observation,
ϵθ(xt; t) is the diffusion model’s noise prediction at timestep t, and λt is a time-weighting factor.
Sampling as optimization. The goal is then to find an image µ that reconstructs the observation y
under the forward model, while having a high likelihood under the denoising diffusion prior (regularizer). This
score-matching objective is optimized using stochastic gradient descent, effectively turning the
sampling problem into an optimization problem. The weighting factor (λt ) is chosen based on the
signal-to-noise ratio (SNR) at each timestep to balance the contribution of different denoisers in the
diffusion process.
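A compact sketch of optimizing the objective in Equation 3.49; for simplicity it backpropagates through the noise-prediction network and uses a generic Adam loop, whereas the reference method relies on a more refined stochastic-gradient scheme with SNR-based weights. The `eps_net(x_t, t)` interface and the schedules (1-D tensors `alphas`, `lambdas`) are assumptions for illustration.

```python
import torch

def red_diff_reconstruct(y, forward_op, eps_net, alphas, lambdas, sigma_y, mu_init,
                         steps=200, lr=0.1):
    """Minimize Eq. (3.49) over the variational mean mu with stochastic gradient descent."""
    mu = mu_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([mu], lr=lr)
    for _ in range(steps):
        t = int(torch.randint(len(alphas), (1,)))
        eps = torch.randn_like(mu)
        x_t = alphas[t].sqrt() * mu + (1 - alphas[t]).sqrt() * eps     # re-noise current estimate
        data_fit = ((y - forward_op(mu)) ** 2).sum() / (2 * sigma_y ** 2)
        prior_reg = lambdas[t] * ((eps_net(x_t, t) - eps) ** 2).sum()
        loss = data_fit + prior_reg
        opt.zero_grad(); loss.backward(); opt.step()
    return mu.detach()
```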

3.2.2 Blind RED-Diff [6]


In [6], the authors introduce blind RED-diff, an extension of the RED-diff framework [94] to solve blind
inverse problems. The main idea is to use variational inference to jointly estimate the latent image
and the unknown forward model parameters.
Similar to RED-Diff, the key mathematical formulation is the minimization of the KL-divergence
between the true posterior distribution p(x0 , γ|y) and a variational approximation q(x0 , γ|y):
min_q DKL( q(x0, γ|y) ∥ p(x0, γ|y) ).

If we assume the latent image and the forward model parameters are independent, the KL-divergence
can be decomposed as:
DKL (q(x0 |y)||p(x0 )) + DKL (q(γ|y)∥p(γ)) − Eq(x0 ,γ|y) [log p(y|x0 , γ)] + log p(y).
The minimization with respect to q involves three terms:

i. DKL (q(x0 |y)∥p(x0 )) represents the KL divergence between the variational distribution of
the image (x0 ) and its prior distribution. This term is approximated using a score-matching
loss, which leverages denoising score matching with a diffusion model (as in RED-Diff).
ii. DKL (q(γ|y)∥p(γ)) is the KL divergence between the variational distribution of the forward
model parameters (γ) and their prior distribution. This term acts as a regularizer on γ.
iii. −Eq(x0 ,γ|y) [log p(y|x0 , γ)] is the expectation of the negative log-likelihood of the observed
data y given the image x0 and the forward model parameters γ. This term ensures data
consistency.

The resulting optimization can be achieved using alternating stochastic optimization, where the image
x0 and the forward model parameters γ are updated iteratively.
The formulation assumes conditional independence between x0 and γ given the measurement y, and
it also requires a specific form for the prior distribution p(γ).

3.2.3 Score Prior [54]


We again start by introducing a variational distribution qϕ (x0 ) that aims to approximate the posterior
distribution determined by the diffusion prior. The optimization problem becomes
min_ϕ DKL( qϕ(x0) ∥ pθ(x0|y) ),    (3.50)

or, equivalently up to an additive constant,

min_ϕ ∫ qϕ(x0|y) [ − log p(y|x0) − log pθ(x0) + log qϕ(x0) ] dx0.    (3.51)

One of the most expressive yet tractable proposal distributions is normalizing flows (NF) [112, 44].
Choosing qϕ to be an NF, we can transform the optimization problem to
 
min_ϕ E_{z∼N(0,In)} [ − log p(y|Gϕ(z)) − log pθ(Gϕ(z)) + log π(z) − log |det( dGϕ(z)/dz )| ],    (3.52)

where the first term is the likelihood term, the second is the prior term, and the last two terms form the entropy term.

where the expectation is over the input latent variable z, and π is the reference Gaussian distribu-
tion. Observe that the likelihood term and the entropy can be efficiently computed with a single
forward/backward pass through the NF due to the parametrization of qϕ with an NF. All that is left
for us is to compute the prior term log pθ (Gϕ (z)). In score prior [54], this is solved by leveraging
the instantaneous change-of-variables formula with the diffusion PF-ODE, as originally proposed
in [135]
log pθ(x0) = log pT(xT) + ∫₀^T ∇ · f̃θ(xt, t) dt,    (3.53)
where f̃θ(xt, t) is the drift term of the reverse SDE in Equation 2.2 with the score replaced by
the network approximation. Notice that by plugging in Equation 3.53 to Equation 3.52, we can
optimize the NF model in an unsupervised fashion. Notice that while this formulation does not incur
approximation errors, it is very costly as every optimization steps involve computing Equation 3.53.
Moreover, observe that the training of NF is done for a specific measurement y. One has to run
Equation 3.52 for every different measurement that one wishes to recover.
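The structure of the objective in Equation 3.52 can be written compactly as in the sketch below, where the expensive prior term log p_\theta(\cdot) of Equation 3.53 is abstracted as a callable. The affine flow and the toy likelihood and prior in the usage are stand-ins, not the setup of [54].

```python
import torch

def score_prior_loss(z, flow, log_lik, log_p_theta):
    """Monte-Carlo estimate of Eq. (3.52) for one latent batch z ~ N(0, I) (sketch).

    `flow` maps z -> (x, log_det_jacobian); `log_lik` is log p(y|x);
    `log_p_theta` is the diffusion-prior log-density (e.g., via the PF-ODE, Eq. 3.53).
    """
    x, logdet = flow(z)
    log_pi = -0.5 * (z ** 2).sum(dim=-1)     # reference Gaussian, up to a constant
    # negative likelihood - prior + entropy term, as in Eq. (3.52)
    return (-log_lik(x) - log_p_theta(x) + log_pi - logdet).mean()

# toy usage with an affine flow x = a*z + b (stand-ins for a real NF and the true priors)
a = torch.ones(3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)
flow = lambda z: (a * z + b, torch.log(a.abs()).sum() * torch.ones(z.shape[0]))
log_lik = lambda x: -((x - 1.0) ** 2).sum(dim=-1)     # fake Gaussian likelihood
log_p_theta = lambda x: -(x ** 2).sum(dim=-1)         # fake prior log-density
loss = score_prior_loss(torch.randn(8, 3), flow, log_lik, log_p_theta)
loss.backward()
```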

3.2.4 Efficient Score Prior [53]


As computing Equation 3.53 is costly, Feng et al. proposed to optimize q_\phi with an evidence lower
bound (ELBO) b_\theta(x_0) \le \log p_\theta(x_0), originally presented in the work on Score-Flow [132]:

b_\theta(x_0) = \mathbb{E}_{p(x_T|x_0)}[\log \pi(x_T)] - \frac{1}{2}\int_0^T g(t)^2 h(t)\, dt,   (3.54)

where

h(t) := \mathbb{E}_{p(x_t|x_0)}\Big[ \underbrace{\frac{1}{\sigma_t^4 g(t)^2}\|h_\theta(x_t) - x_0\|_2^2}_{\text{Denoising loss}} - \|\nabla_{x_t}\log p(x_t|x_0)\|_2^2 - 2\,\nabla_{x_t}\cdot f(x_t, t) \Big].   (3.55)

Intuitively, the bound b_\theta is large when the denoising loss is small, and small when our diffusion
denoiser h_\theta cannot properly denoise the given image. Replacing the exact likelihood of Equation 3.53,
which requires hundreds to thousands of NFEs, with the surrogate denoising likelihood of Equation 3.54,
which requires only a single NFE, makes the method much more efficient and scalable to higher dimensions.

3.3 Asymptotically Exact Methods

These methods aim to sample from the true posterior distribution. Of course, the intractability
of the posterior cannot be circumvented; instead, these methods trade compute for
approximation error: as the number of network evaluations increases to infinity, they
asymptotically converge to samples from the true posterior (assuming no other approximation errors).

3.3.1 Plug and Play Diffusion Models (PnP-DM) [152]


As explained in the introduction, the end goal is to sample from the distribution p(x_0|y) \propto
p(x_0)\, p(y|x_0). The authors of [152] introduce an auxiliary variable z and an auxiliary distribution:

\pi(x_0, z \mid y) \propto p(x_0)\cdot p(y|z)\cdot \exp\Big(-\frac{1}{2\rho^2}\|x_0 - z\|^2\Big).   (3.56)
It is easy to see that as ρ → 0, the auxiliary distribution converges to the target distribution p(x0 |y).
To sample from the joint distribution \pi(x_0, z|y), the authors use Gibbs sampling, i.e. they alternate
between sampling from the two conditional distributions. Specifically, the sampling algorithm alternates between two
steps:

• Likelihood term:

  z^{(k)} \sim \pi\big(z \mid y, x_0^{(k)}\big) \propto p(y|z)\cdot \exp\Big(-\frac{1}{2\rho^2}\|x_0^{(k)} - z\|^2\Big).   (3.57)

• Prior term:

  x_0^{(k+1)} \sim \pi\big(x_0 \mid y, z^{(k)}\big) \propto p(x_0)\cdot \exp\Big(-\frac{1}{2\rho^2}\|x_0 - z^{(k)}\|^2\Big).   (3.58)
The likelihood term samples a vector that satisfies the measurements and is close to x_0^{(k)}. The prior
term samples a vector that is likely under p(x_0) and is close to z^{(k)}. For most problems of interest,
sampling from Equation 3.57 is easy because the distribution is log-concave; this is the case, for example,
for linear inverse problems. The interesting observation is that sampling from Equation 3.58 corresponds
to a denoising problem, for which diffusion models excel. Indeed, for any x_t at noise level \sigma_t, we
have that:

p(x_0|x_t) \propto p(x_0)\, p(x_t|x_0) = p(x_0)\exp\Big(-\frac{1}{2\sigma_t^2}\|x_0 - x_t\|^2\Big).   (3.59)

Hence, to sample from Equation 3.58, one initializes the reverse process at z^{(k)} and at a time t such that
\sigma_t = \rho.
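A minimal sketch of one Gibbs round follows. The likelihood step (Equation 3.57) is solved exactly for a linear-Gaussian model, and the prior step (Equation 3.58) is abstracted as a call to a diffusion sampler started at noise level ρ; the dense matrix inversion and the stand-in prior_sampler are illustrative choices, not the authors' implementation.

```python
import torch

def pnp_dm_round(x0, y, A, sigma_y, rho, prior_sampler):
    """One Gibbs round of PnP-DM (sketch) for a linear operator given as a matrix A."""
    n = x0.numel()
    # exact Gaussian sample from pi(z | y, x0) in the linear-Gaussian case (Eq. 3.57)
    prec = A.T @ A / sigma_y**2 + torch.eye(n) / rho**2
    cov = torch.linalg.inv(prec)
    mean = cov @ (A.T @ y / sigma_y**2 + x0 / rho**2)
    z = mean + torch.linalg.cholesky(cov) @ torch.randn(n)
    # prior step (Eq. 3.58): denoise z with the diffusion model at noise level rho
    x0_next = prior_sampler(z, rho)
    return x0_next, z

# toy usage with a stand-in "diffusion denoiser" that simply shrinks toward zero
A = torch.eye(4)
y = torch.randn(4)
prior = lambda z, rho: z / (1.0 + rho**2)
x0, z = pnp_dm_round(torch.zeros(4), y, A, sigma_y=0.1, rho=0.5, prior_sampler=prior)
```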

3.3.2 FPS [47]


FPS connects posterior sampling to Bayesian filtering and uses sequential Monte Carlo methods to
solve the filtering problem, avoiding the need to handcraft approximations to p(y|x_t).
Given an observation y, FPS proposes to first construct a sequence \{y_t\}_{t=0}^{N} from y, and then
determine a tractable distribution p(x_{t-1}|x_t, y_{t-1}). Starting from x_N \sim \mathcal{N}(0, I_n), FPS can then
recursively sample x_t for t = N-1, \dots, 1, and finally obtain x_0. Specifically, FPS consists of two
steps:

Step 1. Generating a sequence \{y_t\}_{t=0}^{N} from the observation y. This can be done either using
the forward process or unconditional DDIM backward sampling.
For the construction via the forward process, we recursively construct y_t as follows:

y_t = y_{t-1} + \sigma_t A z_t, \quad \text{initialized with } y_0 := y.   (3.60)

This arises from x_t = x_{t-1} + \sigma_t z_t and applying the linear operator A to it.
For the construction via backward sampling, FPS uses methods such as unconditional DDIM
as in Equation 2.9,

y_{t-1} = \underbrace{u_t y_0}_{\text{clean}} + \underbrace{v_t y_t}_{\text{direction to time } t} + \underbrace{w_t A z_t}_{\text{sample noise}}, \quad \text{initialized with } y_N \sim \mathcal{N}(0, AA^\top).   (3.61)

Here, u_t, v_t, and w_t are DDIM coefficients that can be explicitly computed. Note that y_N is
sampled from \mathcal{N}(0, AA^\top) because the prior distribution of the diffusion model is a standard
Gaussian x_N \sim \mathcal{N}(0, I), and due to the linearity of the inverse problem, y_N = A x_N.
Step 2. Generating a backward sequence \{x_t\}_{t=0}^{N} from Step 1's \{y_t\}_{t=0}^{N}. First, note that
p(x_{t-1}|x_t, y_{t-1}) is a tractable normal distribution. This results from applying Bayes' rule
and the conditional independence of x_t and the random vector Y_{t-1} given x_{t-1}:

p(x_{t-1}|x_t, Y_{t-1}) \propto p(x_{t-1}|x_t)\, p(Y_{t-1}|x_{t-1}).   (3.62)

Here, p(x_{t-1}|x_t) is approximated via backward diffusion sampling with learned scores, and
p(Y_{t-1}|x_{t-1}) = \mathcal{N}(A x_{t-1}, c_{t-1}^2 I), where c_{t-1}, dependent on \sigma_y > 0, can be computed
explicitly [134]. Thus, with \{y_t\}_{t=0}^{N} and initial condition x_N \sim \mathcal{N}(0, I_n), FPS recursively
samples x_{N-1}, \dots, x_1 using p(x_{t-1}|x_t, Y_{t-1} = y_{t-1}), ultimately yielding x_0.

The FPS algorithm is theoretically shown to recover the true posterior p(x|y) as the step size becomes
sufficiently small.
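As a small illustration of Step 1, the forward-process construction of Equation 3.60 can be sketched as follows; the noise schedule sigmas and the matrix A are placeholders.

```python
import torch

def fps_forward_measurement_sequence(y, A, sigmas):
    """Construct {y_t} from y via the forward process (Eq. 3.60), sketch."""
    ys = [y]
    for sigma_t in sigmas:
        z = torch.randn(A.shape[1])          # fresh noise z_t in image space
        ys.append(ys[-1] + sigma_t * (A @ z))  # y_t = y_{t-1} + sigma_t * A z_t
    return ys                                 # ys[t] plays the role of y_t

# toy usage with a random 3x5 measurement matrix and a short schedule
A = torch.randn(3, 5)
ys = fps_forward_measurement_sequence(torch.randn(3), A, sigmas=[0.1, 0.2, 0.4])
```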

3.3.3 PMC [138]
Plug-and-Play (PnP) [71] and RED [114] are two representative methods of using denoisers as priors
for solving inverse problems. Let g_y(x) = \frac{1}{2\sigma_y^2}\|y - Ax\|_2^2 be the data-fidelity (negative log-likelihood) term, h_\theta^\sigma(\cdot) an
MMSE denoiser from Equation 2.11 conditioned on the noise level \sigma, and R_\theta^\sigma(\cdot) := \mathrm{Id} - h_\theta^\sigma(\cdot) the
residual projector. Note that conditioning on the noise level \sigma is equivalent to the network being
conditioned on t, since the mapping is one-to-one. A single iteration of these methods reads
• PnP proximal gradient method [72]:

  x_{k+1} = h_\theta^\sigma\big(x_k - \gamma \nabla_{x_k} g_y(x_k)\big)   (3.63)
          = x_k - \gamma\Big(\nabla_{x_k} g_y(x_k) + \frac{1}{\gamma} R_\theta^\sigma\big(x_k - \gamma \nabla_{x_k} g_y(x_k)\big)\Big).   (3.64)

• RED gradient descent [114]:

  x_{k+1} = x_k - \gamma\big(\nabla_{x_k} g_y(x_k) + \tau (x_k - h_\theta^\sigma(x_k))\big)   (3.65)
          = x_k - \gamma\big(\nabla_{x_k} g_y(x_k) + \tau R_\theta^\sigma(x_k)\big).   (3.66)
Notice that by using Tweedie’s formula, we see that Rθσ (x) = −σ 2 ∇x log pσ (x). Rearranging
Equation 3.64 and Equation 3.66,
• PnP:

  \frac{x_{k+1} - x_k}{\gamma} = -P(x_k), \qquad P(x) := \nabla_x g_y(x) - \frac{\sigma^2}{\gamma}\nabla_x \log p_\sigma\big(x - \gamma\nabla_x g_y(x)\big),   (3.67)

• RED:

  \frac{x_{k+1} - x_k}{\gamma} = -G(x_k), \qquad G(x) := \nabla_x g_y(x) - \tau\sigma^2 \nabla_x \log p_\sigma(x).   (3.68)
Moreover, by setting \gamma = \sigma^2 and \tau = 1/\sigma^2, one can show that

\lim_{\sigma\to 0} P(x) = \nabla_x g_y(x) - \lim_{\sigma\to 0}\nabla_x \log p_\sigma\big(x - \sigma^2\nabla_x g_y(x)\big)
                        = \nabla_x g_y(x) - \lim_{\sigma\to 0}\nabla_x \log p_\sigma(x) = \lim_{\sigma\to 0} G(x)
                        = -\nabla_x \log p(y|x) - \nabla_x \log p(x) = -\nabla_x \log p(x|y).   (3.69)
In other words, we see that as \sigma^2 = \gamma \to 0, the iterations of PnP/RED in Equation 3.64 and Equation 3.66
converge to sampling from the posterior by following the dynamics

dx_t = \nabla_{x_t}\log p(x_t|y)\, dt,   (3.70)

where t indexes the continuous-time flow of x, as opposed to the discrete formulations in Equation 3.64
and Equation 3.66. Note that this notion of t does not match the diffusion time t, where the time
index corresponds to a specific noise level. In PMC, the authors propose to incorporate noise-level annealing,
as done in the usual reverse diffusion process, by starting from a large noise level \sigma and gradually
reducing it. Solving Equation 3.70 with PMC then boils down to iterative application
of Equation 3.64 or Equation 3.66 with the annealing strategy. Moreover, introducing Langevin
diffusion yields a stochastic version

dx_t = \nabla_{x_t}\log p(x_t|y)\, dt + \sqrt{2}\, dW_t,   (3.71)

which can be solved in the same way, but with additional stochasticity.
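A sketch of the annealed RED iteration (Equation 3.68 with γ = σ² and τ = 1/σ²), with an optional Langevin noise term for the stochastic variant of Equation 3.71; the data-fidelity gradient, the stand-in denoiser, and the noise schedule below are placeholders, not the PMC implementation.

```python
import torch

def pmc_red_sampler(x, grad_gy, denoiser, sigmas, stochastic=False):
    """Annealed RED iteration (Eq. 3.68 with gamma = sigma^2, tau = 1/sigma^2), sketch."""
    for sigma in sigmas:                        # anneal from large to small noise levels
        gamma, tau = sigma**2, 1.0 / sigma**2
        residual = x - denoiser(x, sigma)       # R_theta^sigma(x) = -sigma^2 * score
        x = x - gamma * (grad_gy(x) + tau * residual)
        if stochastic:                          # Langevin variant of Eq. (3.71)
            x = x + torch.sqrt(torch.tensor(2.0 * gamma)) * torch.randn_like(x)
    return x

# toy usage with stand-ins for the data-fidelity gradient and the MMSE denoiser
A = torch.eye(4)
y = torch.randn(4)
grad_gy = lambda x: A.T @ (A @ x - y)
den = lambda x, s: x / (1.0 + s**2)
x = pmc_red_sampler(torch.randn(4), grad_gy, den, sigmas=[1.0, 0.5, 0.2, 0.1])
```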

3.3.4 Sequential Monte Carlo-based methods


SMCDiff [142], MCGDiff [17], and TDS [151] belong to the category of sequential Monte Carlo
(SMC)-based methods [48]. SMC aims to sample from the posterior by constructing a sequence of
distributions over X_{1:T} that terminates at the target distribution. The evolution of the distribution is
approximated by K particles. At a high level, SMC can be described with three steps: 1) transition
with a proposal kernel \{x_t^{1:K}\} \sim p(X_t | X_{t-1}); 2) computing importance weights to re-weight the particles;
and 3) resampling from the reweighted multinomial distribution. Methods that belong to this category
propose different ways of constructing the proposal distribution and the weighting function.
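The generic SMC skeleton shared by these methods can be sketched as follows; the proposal kernel and weighting function are the method-specific pieces and are passed in here as placeholder callables.

```python
import torch

def smc_posterior_sketch(num_particles, propose, log_weight, T):
    """Generic SMC loop behind SMCDiff / MCGDiff / TDS (sketch)."""
    particles = torch.randn(num_particles, 2)            # toy 2-D particles from the prior
    for t in range(T, 0, -1):
        particles = propose(particles, t)                 # 1) transition with the proposal kernel
        logw = log_weight(particles, t)                    # 2) importance log-weights
        w = torch.softmax(logw, dim=0)
        idx = torch.multinomial(w, num_particles, replacement=True)
        particles = particles[idx]                         # 3) resample
    return particles

# toy usage: particles drift toward the origin and are weighted by closeness to y = 0
out = smc_posterior_sketch(
    64,
    propose=lambda p, t: 0.9 * p + 0.1 * torch.randn_like(p),
    log_weight=lambda p, t: -(p ** 2).sum(dim=1),
    T=10,
)
```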

3.4 CSGM-Type methods

3.4.1 DMPlug [147], SHRED [21]


Compressed sensing generative model (CSGM) [15, 34] is a general method for solving inverse
problems with deep generative models by aiming to find the input latent vector z through
z^* = \arg\min_z \|y - A G_\theta(z)\|^2,   (3.72)

where Gθ is an arbitrary generative model. DMPlug and SHRED can be seen as extensions of CSGM
to the case where one uses a diffusion model. Unlike GANs or Flows where the mapping from the
latent space to the image space is done through a single NFE, diffusion models require multiple NFE
to solve the generative SDE/ODE. One can rewrite Equation 3.72 as
z^* = \arg\min_z \|y - A\hat{x}(z)\|^2,   (3.73)

where x̂ = x̂(z) is the solution of the deterministic sampler initialized at z. Essentially, the models
in this category optimize over the “latent” space of noises that are fed to the deterministic ODE
sampler. One caveat of Equation 3.73 is the exploding memory required for backpropagation through
time. To mitigate this, when sampling from pθ (x0 |xT ), a few-step sampling (e.g. 3 for DMPlug and
10 for SHRED) is used to approximate the true sampling process.
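A sketch of this optimization over the initial noise is given below. The few-step deterministic sampler, forward operator, and optimizer settings are placeholders; the only point illustrated is that Equation 3.73 becomes a standard gradient-based optimization through a differentiable sampler.

```python
import torch

def csgm_diffusion_invert(y, apply_A, few_step_sampler, z_shape, steps=200, lr=1e-2):
    """Optimize the initial noise z of a few-step deterministic sampler (Eq. 3.73), sketch."""
    z = torch.randn(z_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((y - apply_A(few_step_sampler(z))) ** 2).sum()
        loss.backward()                     # backprop through the (few-step) sampler
        opt.step()
    return few_step_sampler(z).detach()

# toy usage where the "sampler" is a fixed differentiable map (stand-in for the PF-ODE)
W = torch.randn(4, 4)
sampler = lambda z: torch.tanh(W @ z)
A = lambda x: x[:2]                         # observe the first two coordinates
x_hat = csgm_diffusion_invert(torch.randn(2), A, sampler, z_shape=(4,))
```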

3.4.2 CSGM with consistent diffusion models [154]


Diffusion models can be distilled into one-step models, known as Consistency Models [131], that
solve the Probability Flow ODE in a single step. These models can be used in Equation 3.73, replacing
the ODE sampling, to reduce the computational requirements [154].

3.4.3 Intermediate Layer Optimization [34, 32]


CSGM has been extended to perform the optimization in some intermediate latent space [34]. The
problem is that the intermediate latents need to be regularized to avoid exiting the manifold of realistic
images. Score-Guided Intermediate Layer Optimization (Score-ILO) [32] uses diffusion models to
regularize the intermediate solutions.

3.5 Latent Diffusion Models

3.5.1 Motivation
In this subsection, we focus on algorithms that have been developed for solving inverse problems
with latent diffusion models (see Section 2.3). There are a few additional challenges when dealing
with latent diffusion models that have led to a growing literature of papers that are trying to address
them.

Loss of linearity. The first challenge in solving inverse problems with latent diffusion models is
that linear inverse problems become essentially non-linear. The problem stems from the fact that
diffusion happens in the latent space but the measurements live in pixel space. To guide the
diffusion, there are two potential solutions: i) project the measurements to the latent space
through the encoder, or ii) project the latents to pixel space through the decoder as we diffuse.
Both approaches depend on non-linear functions (Enc and Dec, respectively), and hence even linear
inverse problems require a more general treatment.

Decoding is expensive. The other issue that arises is computational. Most of the time, we need to
decode the latent to pixel-space to compare with the measurements. The motivation behind latent
diffusion models is to accelerate training and sampling. Hence, we want to avoid repeated calls to the
decoder as we solve inverse problems.

Decoding-encoding map is not one-to-one. Even if we ignore the computational challenges, it
is not straightforward to decode the latent to pixel space, compare with the measurements, and
get meaningful guidance in the latent space, since the decoding-encoding map is not a one-to-one
function.

Text-conditioning. Finally, latent diffusion models typically get a textual prompt as an additional
input. Many of the algorithms developed for solving inverse problems with latent diffusion models
innovate on how they use this text conditioning.

3.5.2 Latent DPS


The first algorithm we review in the space of solving inverse problems with latent diffusion models is
Latent DPS, i.e. the straightforward extension of DPS for latent diffusion models. The approximation
made in this algorithm is:
\nabla_{x_t^E} \log p(y \mid X_t^E = x_t^E) \approx \nabla_{x_t^E} \log p\big(y \mid X_0 = \mathrm{Dec}(\mathbb{E}[X_0^E \mid X_t^E = x_t^E])\big).   (3.74)

The algorithm works by performing one-step denoising in the latent space and measuring how well
the decoding of the denoised latent matches the measurements y.
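A sketch of how the gradient in Equation 3.74 can be computed with automatic differentiation follows. The latent denoiser, decoder, and forward operator are abstracted as placeholder callables; this illustrates only the structure of the guidance term, not the authors' implementation, and the returned gradient is then combined with the unconditional update using a step size.

```python
import torch

def latent_dps_grad(xt, y, denoise_latent, decode, apply_A, t):
    """Gradient of the log-likelihood approximation in Eq. (3.74) via autograd (sketch)."""
    xt = xt.detach().requires_grad_(True)
    x0_latent = denoise_latent(xt, t)            # one-step denoising E[x0^E | xt^E]
    residual = y - apply_A(decode(x0_latent))    # compare in pixel space
    loss = (residual ** 2).sum()
    grad, = torch.autograd.grad(loss, xt)
    return -grad                                  # ascent direction on log p(y | xt^E)

# toy usage with linear stand-ins for the denoiser, decoder and operator
den = lambda x, t: 0.9 * x
dec = lambda z: torch.cat([z, z])                 # "decoder" doubling the dimension
A = lambda x: x[:3]
g = latent_dps_grad(torch.randn(4), torch.randn(3), den, dec, A, t=0.5)
```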

3.5.3 PSLD [118]


The performance of Latent DPS is hindered by the fact that the decoding-encoding map is not a
one-to-one function, as discussed earlier. The approximation made above could pull x_t^E towards any
latent x_0^E whose decoding matches the measurements, while the score function pulls x_t^E
towards a specific x_0^E, i.e. towards \mathbb{E}[x_0^E \mid x_t^E].
PSLD mitigates this problem by adding an additional term that pulls towards latents that are fixed
points of the decoder-encoder map. Concretely, the approximation made in PSLD is:
\nabla_{x_t^E}\log p(y \mid X_t^E = x_t^E) \approx \nabla_{x_t^E}\log p\big(y \mid X_0 = \mathrm{Dec}(\mathbb{E}[X_0^E \mid X_t^E = x_t^E])\big)
  + \gamma_t \nabla_{x_t^E}\big\|\mathbb{E}[x_0^E \mid x_t^E] - \mathrm{Enc}(\mathrm{Dec}(\mathbb{E}[x_0^E \mid x_t^E]))\big\|^2,   (3.75)
where γt is a tunable parameter.
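The extra PSLD "gluing" term can be sketched in the same autograd style as the Latent DPS gradient above; encode and decode are placeholder VAE maps, and in practice this gradient is scaled by γ_t and added to the Latent DPS gradient.

```python
import torch

def psld_gluing_grad(xt, denoise_latent, encode, decode, t):
    """Gradient of the gluing penalty in Eq. (3.75), pulling toward Enc(Dec) fixed points (sketch)."""
    xt = xt.detach().requires_grad_(True)
    x0 = denoise_latent(xt, t)
    penalty = ((x0 - encode(decode(x0))) ** 2).sum()
    grad, = torch.autograd.grad(penalty, xt)
    return -grad                               # descent direction on the penalty

# toy usage with linear stand-ins (the encoder is not an exact inverse of the decoder)
den = lambda x, t: 0.9 * x
dec = lambda z: torch.cat([z, z])
enc = lambda x: 0.5 * (x[:4] + x[4:]) + 0.01 * x[:4]
g = psld_gluing_grad(torch.randn(4), den, enc, dec, t=0.5)
```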

3.5.4 Resample [128]


Resample, a concurrent work to PSLD, proposes an alternative way to improve the performance
of Latent DPS. After each clean prediction \hat{x}_0(x_{t+1}^E) is obtained from the previous sample x_{t+1}^E
via Tweedie's formula in Equation 2.10, the unconditional reverse denoising process is updated
using, say, DDIM:

x'_t := \mathrm{UnconditionalDDIM}\big(\hat{x}_0(x_{t+1}^E),\, x_{t+1}^E\big).   (3.76)

The authors then project the latent back to a point \hat{x}_t that satisfies the measurements by sampling from

\mathcal{N}\left(\hat{x}_t;\; \frac{\sigma_t^2\sqrt{\bar{\alpha}_t}\,\hat{x}_0(y) + (1-\bar{\alpha}_t)\,x'_t}{\sigma_t^2 + (1-\bar{\alpha}_t)},\; \frac{\sigma_t^2(1-\bar{\alpha}_t)}{\sigma_t^2 + (1-\bar{\alpha}_t)}\, I_k\right).   (3.77)

Here, \sigma_t^2 is a hyperparameter used to tune the alignment with the measurements, \bar{\alpha}_t is predefined by the
forward process, and \hat{x}_0(y) is found by solving

\hat{x}_0(y) \in \arg\min_x \frac{1}{2}\|y - A(\mathrm{Dec}(x))\|_2^2, \quad \text{initialized at } \hat{x}_0(x_{t+1}^E).   (3.78)

3.5.5 MPGD [60]

The MPGD authors note that some methods require expensive computations for measurement
alignment during gradient updates, as they involve passing through the gradient (chain rule) of the
pre-trained diffusion model \epsilon_\theta(x_t^E, t):

x_t^E \leftarrow x_t^E - \eta_t \nabla_{x_t^E}\big\|y - A(\mathrm{Dec}(x_{0|t}))\big\|_2^2,   (3.79)

where x_{0|t} := \frac{1}{\sqrt{\bar{\alpha}_t}}\big(x_t^E - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t^E, t)\big) is a clean estimate via Tweedie's formula in Equation 2.10. This gradient bottleneck slows down the overall inverse problem solving. MPGD proposes
bypassing the direct gradient \nabla_{x_t^E}, with theoretical guarantees, by updating with \nabla_{x_{0|t}}:

x'_{0|t} \leftarrow x_{0|t} - \eta_t \nabla_{x_{0|t}}\big\|y - A(\mathrm{Dec}(x_{0|t}))\big\|_2^2   (3.80)

with

x_{0|t} := \frac{1}{\sqrt{\bar{\alpha}_t}}\big(x_t^E - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t^E, t)\big),   (3.81)

and uses the obtained x'_{0|t} for the unconditional reverse denoising process

x'_{t-1} := \mathrm{UnconditionalDDIM}\big(x'_{0|t}, x_t^E\big).   (3.82)
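The difference between Equation 3.79 and Equation 3.80 is only where the gradient is taken. A sketch with placeholder decode and apply_A callables follows; eps_pred stands for the already-computed output of ε_θ(x_t^E, t), treated as a constant so that no backpropagation through the diffusion network is needed.

```python
import torch

def mpgd_update(xt_E, eps_pred, y, decode, apply_A, alpha_bar_t, eta_t):
    """MPGD-style step (sketch): take the measurement gradient w.r.t. x_{0|t}, not x_t^E."""
    # Tweedie / DDIM clean estimate (Eq. 3.81), with eps_pred as a fixed tensor
    x0t = (xt_E - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    x0t = x0t.detach().requires_grad_(True)
    loss = ((y - apply_A(decode(x0t))) ** 2).sum()
    grad, = torch.autograd.grad(loss, x0t)          # no chain rule through eps_theta
    x0t_prime = (x0t - eta_t * grad).detach()       # Eq. (3.80)
    return x0t_prime                                # then fed to UnconditionalDDIM (Eq. 3.82)

# toy usage with stand-ins for the decoder and forward operator
dec = lambda z: torch.cat([z, z])
A = lambda x: x[:3]
out = mpgd_update(torch.randn(4), torch.randn(4), torch.randn(3), dec, A,
                  alpha_bar_t=0.8, eta_t=0.1)
```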

3.5.6 P2L [30]


While text conditioning is a viable option for modern latent diffusion models such as Stable Diffusion,
its actual use has been underexplored due to ambiguity about which text to use. P2L addresses this question
by proposing an algorithm that optimizes the text embedding on the fly while solving an inverse
problem:

c_t^* = \arg\min_c \big\|y - A\,\mathrm{Dec}(\mathbb{E}[x_0^E \mid x_t^E, c])\big\|^2,   (3.83)

where c is the text embedding, and one can approximate \mathbb{E}[x_0^E \mid x_t^E, c] using Tweedie's formula
with the denoiser conditioned on c. Using the optimized embedding c_t^* at each timestep, sampling
follows the procedure of Latent DPS:

\nabla_{x_t^E}\log p(y \mid X_t^E = x_t^E, c) \approx \nabla_{x_t^E}\log p\big(y \mid X_0 = \mathrm{Dec}(\mathbb{E}[x_0^E \mid X_t^E = x_t^E, c_t^*])\big).   (3.84)

In addition to optimizing the text embedding, P2L further tries to leverage the VAE prior by
decoding, running an optimization in pixel space, and re-encoding:

x^* = \arg\min_x \|y - Ax\|_2^2 + \lambda\big\|x - \mathrm{Dec}(\mathbb{E}[x_0^E \mid X_t^E = x_t^E])\big\|_2^2,   (3.85)

x^E = \mathrm{Enc}(x^*).   (3.86)
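A sketch of the per-step prompt optimization of Equation 3.83 follows. The conditional latent denoiser, decoder, and forward operator are placeholders; real P2L optimizes a text embedding fed to a latent diffusion model, which is abstracted here as denoise_latent(xt, t, c).

```python
import torch

def p2l_prompt_step(xt_E, y, c_init, denoise_latent, decode, apply_A, t, steps=10, lr=1e-2):
    """Optimize the text embedding c at one timestep (Eq. 3.83), sketch."""
    c = c_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([c], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x0_latent = denoise_latent(xt_E, t, c)           # E[x0^E | xt^E, c] via Tweedie
        loss = ((y - apply_A(decode(x0_latent))) ** 2).sum()
        loss.backward()
        opt.step()
    return c.detach()

# toy usage: the "denoiser" mixes the latent with a projection of the embedding
den = lambda x, t, c: 0.9 * x + 0.1 * c[: x.shape[0]]
dec = lambda z: torch.cat([z, z])
A = lambda x: x[:3]
c_star = p2l_prompt_step(torch.randn(4), torch.randn(3), torch.zeros(8), den, dec, A, t=0.5)
```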
3.5.7 TReg [77], DreamSampler [78]
Instead of automatically finding a suitable text embedding to achieve maximal reconstruction performance,
another advantage of text conditioning is that it can be used as an additional guiding signal that leads
to a specific mode. This may seem trivial, as one has access to a conditional diffusion model.
However, in practice, simply using a conditional diffusion model does not induce enough guidance, as
reported in [41, 64], and naively using classifier-free guidance (CFG) [64] does not lead to satisfactory
results. In addition to using data-consistency-imposing steps as in P2L, TReg proposes adaptive
negation to update the null text embedding used for CFG:

c_\emptyset^* = \arg\min_c \mathrm{sim}(T(x^*), c),   (3.87)

where x^* comes from Equation 3.85, \mathrm{sim} denotes the CLIP similarity score [110], and T is the CLIP
image encoder. In essence, Equation 3.87 minimizes the similarity between the current estimate of
the image and the null text embedding. Hence, when the optimized c_\emptyset^* is used for CFG with

\epsilon_\theta(x_t^E, c_\emptyset^*) + \omega\big(\epsilon_\theta(x_t^E, c) - \epsilon_\theta(x_t^E, c_\emptyset^*)\big),   (3.88)

the conditioning-vector direction \epsilon_\theta(x_t^E, c) - \epsilon_\theta(x_t^E, c_\emptyset^*) is amplified. Later, DreamSampler [78] further
advanced TReg by combining score distillation sampling [108] into the sampling framework to make
better use of CFG.

3.5.8 STSL [116]


Most methods leverage the mean of the reverse diffusion distribution p(X0 |Xt ), and take a single
gradient step with Equation 3.74. To further leverage the covariance of p(X0 |Xt ), Rout et al. [116]
propose to use the following fidelity loss
L(x_t^E, y) = \nabla_{x_t^E}\log p\big(y \mid X_0 = \mathrm{Dec}(\mathbb{E}[x_0^E \mid X_t^E = x_t^E])\big) + \gamma\,\nabla_{x_t^E}\mathrm{Trace}\big(\nabla^2_{x_t^E}\log p(x_t^E)\big),   (3.89)

where \gamma is a constant. To effectively compute the trace, one can further use the following approximation

\mathrm{Trace}\big(\nabla^2_{x_t^E}\log p(x_t^E)\big) \approx \mathbb{E}_{\epsilon\sim\pi}\Big[\epsilon^\top\big(\nabla_{x_t^E}\log p(x_t^E + \epsilon) - \nabla_{x_t^E}\log p(x_t^E)\big)\Big],   (3.90)

where π can be a Gaussian or a Rademacher distribution. Using the loss in Equation 3.89 with
Equation 3.90, STSL uses multiple steps of stochastic gradient updates per timestep.
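The approximation in Equation 3.90 can be read as a finite-difference, Hutchinson-style trace estimator. A sketch with a placeholder score function follows; the number of probes and the toy score are illustrative choices, not the STSL configuration.

```python
import torch

def stsl_trace_estimate(xt, score, num_probes=8):
    """Estimate Trace of the Hessian of log p(x_t) as in Eq. (3.90) with random probes (sketch)."""
    est = torch.zeros(())
    for _ in range(num_probes):
        eps = torch.randn_like(xt)              # Gaussian probe; Rademacher is also possible
        est = est + (eps * (score(xt + eps) - score(xt))).sum()
    return est / num_probes

# toy check with the score of a standard Gaussian, score(x) = -x (Hessian is -I)
score = lambda x: -x
trace_hat = stsl_trace_estimate(torch.randn(4), score)   # about -4 in expectation
```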

4 Thoughts from the authors
In the previous section, we presented several works in the space of using diffusion models to solve
inverse problems. A natural question that both experts and newcomers to the field might have is:
"which approach works best?". Unfortunately, we cannot provide a conclusive
answer to this question within the scope of this survey, but we can share a few thoughts.

Thoughts about Explicit Approximations. In this survey we tried to express seemingly very
different works, such as DPS and DDRM, under a common mathematical language that contains
the explicit approximations made for the measurement score. We observed that all the methods
compute an error metric that measures consistency with the measurements and then lift the error back
to the image-space dimensions to perform the gradient update. Some of the methods use noised
versions of the measurements to compute the error, while others use the clean measurements. To the
best of our knowledge, it is not clear which choice works best, and one can derive new approximation
algorithms by simply making the dual change to any of the existing methods, e.g. one could
propose Score-ALD++ by using the noisy measurements to compute the error. By looking at
Figure 1, it is also evident that methods propose increasingly more complex "lifting" matrices. Some
of these approximations require increased computation, e.g. the Moments Matching method. We
strongly believe that the field would benefit from a standardized benchmark for diffusion models and
inverse problems to better understand the computation-performance trade-offs of different methods.
We also believe that under certain distributional assumptions, it should be possible to characterize
analytically the propagation of the approximation errors induced by the different methods.

Thoughts about Variational Methods. Variational methods fit the parameters of a simpler distribution
that approximates the posterior. The benefit is that one can employ well-known optimization techniques
to solve the resulting optimization problem. A potential drawback of this approach is that the
proposed distribution might not be able to capture the complexity of the true posterior distribution.

Thoughts about CSGM-type Methods. CSGM-type frameworks can benefit from the plethora of
techniques that have been previously developed to solve inverse problems with GANs and other deep
generative modeling frameworks. The main issue is computational, since the generative model
to be inverted is the Probability Flow ODE mapping, which requires several calls to the diffusion
model. Consistency Models [131, 76] and other approaches such as Intermediate Layer Optimization
could mitigate this issue.

Thoughts about Asymptotically Exact Methods. Asymptotically exact methods, usually based
on Monte Carlo, can be useful when sampling from the true posterior is critical. However,
the theoretical guarantees of these methods only hold in the limit of infinite computation, and it
remains to be seen whether they can scale to more practical settings.

5 Conclusion
In this survey, we discussed different types of inverse problems and different approaches that have
been developed to solve them using diffusion priors. We identified four distinct families: methods
that propose explicit approximations for the measurement score, variational inference methods,
CSGM-type frameworks and finally approaches that asymptotically guarantee exact sampling (at
the cost of increased computation). The different frameworks and the works therein are all trying
to address the fundamental problem of the intractability of the posterior distribution. We tried
to unify these seemingly different approaches and explain their trade-offs.
We hope that this survey will serve as a reference point for the vibrant field of diffusion models for
inverse problems.

Acknowledgments
This research has been supported by NSF Grants AF 1901292, CNS 2148141, Tripods CCF 1934932,
IFML CCF 2019844 and research gifts by Western Digital, Amazon, WNCG IAP, UT Austin Machine
Learning Lab (MLL), Cisco and the Stanly P. Finch Centennial Professorship in Engineering. Giannis
Daras has been supported by the Onassis Fellowship (Scholarship ID: F ZS 012-1/2022-2023), the
Bodossaki Fellowship and the Leventis Fellowship. The authors would like to thank our colleagues
Viraj Shah, Miki Rubinstein, Murata Naoki, Yutong He, and Stefano Ermon for helpful discussions.

References
[1] Asad Aali, Marius Arvinte, Sidharth Kumar, and Jonathan I Tamir. “Solving Inverse Prob-
lems with Score-Based Generative Priors learned from Noisy Data”. In: arXiv preprint
arXiv:2305.01166 (2023) (pages 2, 9).
[2] Asad Aali, Giannis Daras, Brett Levac, Sidharth Kumar, Alexandros G Dimakis, and Jonathan
I Tamir. “Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion
Models trained on Corrupted Data”. In: arXiv preprint arXiv:2403.08728 (2024) (pages 1, 9).
[3] Kazunori Akiyama, Antxon Alberdi, Walter Alef, Keiichi Asada, Rebecca Azulay, Anne-
Kathrin Baczko, David Ball, Mislav Baloković, John Barrett, Dan Bintley, et al. “First M87
event horizon telescope results. IV. Imaging the central supermassive black hole”. In: The
Astrophysical Journal Letters 875.1 (2019), p. L4 (pages 4, 9).
[4] Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. “Stochastic interpolants: A
unifying framework for flows and diffusions”. In: arXiv preprint arXiv:2303.08797 (2023)
(page 4).
[5] Michael Samuel Albergo and Eric Vanden-Eijnden. “Building Normalizing Flows with
Stochastic Interpolants”. In: The Eleventh International Conference on Learning Representa-
tions. 2023. URL: https://openreview.net/forum?id=li7qeBbCR1t (page 4).
[6] Cagan Alkan, Julio Oscanoa, Daniel Abraham, Mengze Gao, Aizada Nurdinova, Kawin
Setsompop, John M Pauly, Morteza Mardani, and Shreyas Vasanawala. “Variational Diffusion
Models for Blind MRI Inverse Problems”. In: NeurIPS 2023 Workshop on Deep Learning
and Inverse Problems. 2023 (pages 2, 17).
[7] Brian D.O. Anderson. “Reverse-time diffusion equation models”. In: Stochastic Processes
and their Applications 12.3 (1982), pp. 313–326 (page 6).
[8] Weimin Bai, Yifei Wang, Wenzheng Chen, and He Sun. “An Expectation-Maximization
Algorithm for Training Clean Diffusion Models from Corrupted Observations”. In: arXiv
preprint arXiv:2407.01014 (2024) (page 9).
[9] Harrison H Barrett and Kyle J Myers. Foundations of image science. John Wiley & Sons,
2013 (page 4).
[10] Robert Bassett and Julio Deride. “Maximum a posteriori estimators as a limit of Bayes
estimators”. In: Mathematical Programming 174 (2019), pp. 129–144 (page 4).
[11] Joshua Batson and Loic Royer. “Noise2self: Blind denoising by self-supervision”. In: Inter-
national Conference on Machine Learning. PMLR. 2019, pp. 524–533 (page 9).
[12] Amir Beck and Marc Teboulle. “Fast gradient-based algorithms for constrained total variation
image denoising and deblurring problems”. In: IEEE transactions on image processing 18.11
(2009), pp. 2419–2434 (page 4).
[13] Gregory Beylkin. “The inversion problem and applications of the generalized Radon trans-
form”. In: Communications on pure and applied mathematics 37.5 (1984), pp. 579–599
(page 2).
[14] Yochai Blau and Tomer Michaeli. “The Perception-Distortion Tradeoff”. In: 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition. IEEE. 2018 (page 4).
[15] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. “Compressed sensing using
generative models”. In: International conference on machine learning. PMLR. 2017, pp. 537–
546 (page 21).
[16] Emmanuel J Candès, Justin Romberg, and Terence Tao. “Robust uncertainty principles: Exact
signal reconstruction from highly incomplete frequency information”. In: IEEE Transactions
on information theory 52.2 (2006), pp. 489–509 (page 4).
[17] Gabriel Cardoso, Sylvain Le Corff, Eric Moulines, et al. “Monte Carlo guided Denois-
ing Diffusion models for Bayesian linear inverse problems.” In: The Twelfth International
Conference on Learning Representations. 2023 (pages 2, 5, 20).
[18] Stanley H Chan, Xiran Wang, and Omar A Elgendy. “Plug-and-play ADMM for image restora-
tion: Fixed-point convergence and applications”. In: IEEE Transactions on Computational
Imaging 3.1 (2016), pp. 84–98 (page 4).

[19] Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. “Learning to see in the dark”. In: IEEE
Conference on Computer Vision and Pattern Recognition. 2018, pp. 3291–3300 (page 4).
[20] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. “Simple baselines for image
restoration”. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel,
October 23–27, 2022, Proceedings, Part VII. Springer. 2022, pp. 17–33 (page 4).
[21] Hamadi Chihaoui, Abdelhak Lemkhenter, and Paolo Favaro. Zero-shot Image Restoration
via Diffusion Inversion. 2024. URL: https://openreview.net/forum?id=ZnmofqLWMQ
(pages 2, 5, 21).
[22] Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon.
“ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models”. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 14367–
14376 (pages 2, 5, 12).
[23] Hyungjin Chung, Jeongsol Kim, Sehui Kim, and Jong Chul Ye. “Parallel diffusion models of
operator and image for blind inverse problems”. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. 2023, pp. 6059–6069 (pages 2, 13).
[24] Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and
Jong Chul Ye. “Diffusion Posterior Sampling for General Noisy Inverse Problems”. In:
The Eleventh International Conference on Learning Representations. 2023. URL: https:
//openreview.net/forum?id=OnD9zGAGT0k (pages 2, 5).
[25] Hyungjin Chung, Suhyeon Lee, and Jong Chul Ye. “Decomposed Diffusion Sampler for
Accelerating Large-Scale Inverse Problems”. In: arXiv preprint arXiv:2303.05754 (2023)
(pages 2, 5, 16).
[26] Hyungjin Chung, Dohoon Ryu, Michael T McCann, Marc L Klasky, and Jong Chul Ye.
“Solving 3d inverse problems using pre-trained 2d diffusion models”. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 22542–22551
(page 1).
[27] Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, and Jong Chul Ye. “Improving diffusion
models for inverse problems using manifold constraints”. In: Advances in Neural Information
Processing Systems 35 (2022), pp. 25683–25696 (pages 2, 12).
[28] Hyungjin Chung and Jong Chul Ye. “Deep Diffusion Image Prior for Efficient OOD Adap-
tation in 3D Inverse Problems”. In: Proceedings of the European Conference on Computer
Vision (ECCV). 2024 (page 5).
[29] Hyungjin Chung and Jong Chul Ye. “Score-based diffusion models for accelerated MRI”. In:
Medical image analysis 80 (2022), p. 102479 (page 1).
[30] Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, and Mauricio Delbracio. “Prompt-tuning
latent diffusion models for inverse problems”. In: International Conference on Machine
Learning. PMLR. 2014 (pages 2, 23).
[31] Regev Cohen, Michael Elad, and Peyman Milanfar. “Regularization by denoising via fixed-
point projection (RED-PRO)”. In: SIAM Journal on Imaging Sciences 14.3 (2021), pp. 1374–
1406 (page 4).
[32] Giannis Daras, Yuval Dagan, Alex Dimakis, and Constantinos Daskalakis. “Score-Guided
Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems”. In: Proceed-
ings of the 39th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri,
Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato. Vol. 162. Pro-
ceedings of Machine Learning Research. PMLR, 17–23 Jul 2022, pp. 4722–4753. URL:
https://proceedings.mlr.press/v162/daras22a.html (pages 2, 21).
[33] Giannis Daras, Yuval Dagan, Alexandros G Dimakis, and Constantinos Daskalakis. “Con-
sistent diffusion models: Mitigating sampling drift by learning to be consistent”. In: arXiv
preprint arXiv:2302.09057 (2023) (page 9).
[34] Giannis Daras, Joseph Dean, Ajil Jalal, and Alex Dimakis. “Intermediate Layer Optimization
for Inverse Problems using Deep Generative Models”. In: Proceedings of the 38th Interna-
tional Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139.
Proceedings of Machine Learning Research. PMLR, 18–24 Jul 2021, pp. 2421–2432. URL:
https://proceedings.mlr.press/v139/daras21a.html (page 21).

[35] Giannis Daras, Mauricio Delbracio, Hossein Talebi, Alex Dimakis, and Peyman Milanfar.
“Soft Diffusion: Score Matching with General Corruptions”. In: Transactions on Machine
Learning Research (2023). ISSN: 2835-8856. URL: https://openreview.net/forum?
id=W98rebBxlQ (page 36).
[36] Giannis Daras, Alex Dimakis, and Constantinos Costis Daskalakis. “Consistent Diffusion
Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data”. In: Proceedings
of the 41st International Conference on Machine Learning. Ed. by Ruslan Salakhutdinov,
Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix
Berkenkamp. Vol. 235. Proceedings of Machine Learning Research. PMLR, 2024, pp. 10091–
10108. URL: https://proceedings.mlr.press/v235/daras24a.html (page 9).
[37] Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alex Dimakis, and Adam
Klivans. “Ambient Diffusion: Learning Clean Distributions from Corrupted Data”. In: Ad-
vances in Neural Information Processing Systems. Ed. by A. Oh, T. Naumann, A. Globerson,
K. Saenko, M. Hardt, and S. Levine. Vol. 36. Curran Associates, Inc., 2023, pp. 288–
313. URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/
012af729c5d14d279581fc8a5db975a1-Paper-Conference.pdf (page 9).
[38] Ingrid Daubechies, Michel Defrise, and Christine De Mol. “An iterative thresholding algo-
rithm for linear inverse problems with a sparsity constraint”. In: Communications on Pure and
Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences
57.11 (2004), pp. 1413–1457 (page 4).
[39] Mauricio Delbracio and Peyman Milanfar. “Inversion by Direct Iteration: An Alternative to
Denoising Diffusion for Image Restoration”. In: Transactions on Machine Learning Research
(2023). Featured Certification. ISSN: 2835-8856. URL: https://openreview.net/forum?
id=VmyFF5lL3F (page 4).
[40] Arjun D Desai, Andrew M Schmidt, Elka B Rubin, Christopher M Sandino, Marianne S
Black, Valentina Mazzoli, Kathryn J Stevens, Robert Boutin, Christopher Ré, Garry E Gold,
Brian A Hargreaves, and Akshay S Chaudhari. “SKM-TEA: A Dataset for Accelerated MRI
Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation”. In: (2022).
arXiv: 2203.06823 [eess.IV] (page 2).
[41] Prafulla Dhariwal and Alexander Nichol. “Diffusion models beat gans on image synthesis”.
In: Advances in neural information processing systems 34 (2021), pp. 8780–8794 (page 23).
[42] Daniel J Diaz, Chengyue Gong, Jeffrey Ouyang-Zhang, James M Loy, Jordan Wells, David
Yang, Andrew D Ellington, Alexandros G Dimakis, and Adam R Klivans. “Stability Oracle:
a structure-based graph-transformer framework for identifying stabilizing mutations”. In:
Nature Communications 15.1 (2024), p. 6170 (page 2).
[43] Martin Dietz, Lars Liljeryd, Kristofer Kjorling, and Oliver Kunz. “Spectral Band Replication,
a novel approach in audio coding”. In: Audio Engineering Society Convention 112. Audio
Engineering Society. 2002 (page 2).
[44] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. “Density estimation using real nvp”.
In: arXiv preprint arXiv:1605.08803 (2016) (page 18).
[45] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. “Image super-resolution
using deep convolutional networks”. In: IEEE transactions on pattern analysis and machine
intelligence 38.2 (2015), pp. 295–307 (page 4).
[46] David L Donoho. “Compressed sensing”. In: IEEE Transactions on information theory 52.4
(2006), pp. 1289–1306 (page 4).
[47] Zehao Dou and Yang Song. “Diffusion posterior sampling for linear inverse problem solving:
A filtering perspective”. In: The Twelfth International Conference on Learning Representa-
tions. 2023 (pages 2, 5, 19).
[48] Arnaud Doucet, Nando De Freitas, and Neil Gordon. “An introduction to sequential Monte
Carlo methods”. In: Sequential Monte Carlo methods in practice (2001), pp. 3–14 (page 20).
[49] Jacques Dubochet, Marc Adrian, Jiin-Ju Chang, Jean-Claude Homo, Jean Lepault, Alasdair
W McDowall, and Patrick Schultz. “Cryo-electron microscopy of vitrified specimens”. In:
Quarterly reviews of biophysics 21.2 (1988), pp. 129–228 (page 2).
[50] Bradley Efron. “Tweedie’s formula and selection bias”. In: Journal of the American Statistical
Association 106.496 (2011), pp. 1602–1614 (page 7).

[51] Yonina C. Eldar. “Generalized SURE for Exponential Families: Applications to Regu-
larization”. In: IEEE Transactions on Signal Processing 57.2 (2009), pp. 471–481. DOI:
10.1109/TSP.2008.2008212 (page 9).
[52] Linwei Fan, Fan Zhang, Hui Fan, and Caiming Zhang. “Brief review of image denoising
techniques”. In: Visual Computing for Industry, Biomedicine, and Art 2.1 (2019), p. 7 (page 1).
[53] Berthy T Feng and Katherine L Bouman. “Efficient bayesian computational imaging with a
surrogate score-based prior”. In: arXiv preprint arXiv:2309.01949 (2023) (pages 2, 5, 18).
[54] Berthy T Feng, Jamie Smith, Michael Rubinstein, Huiwen Chang, Katherine L Bouman, and
William T Freeman. “Score-Based diffusion models as principled priors for inverse imaging”.
In: arXiv preprint arXiv:2304.11751 (2023) (pages 2, 5, 17, 18).
[55] James R Fienup. “Phase retrieval algorithms: a comparison”. In: Applied optics 21.15 (1982),
pp. 2758–2769 (page 3).
[56] Mário AT Figueiredo and Robert D Nowak. “An EM algorithm for wavelet-based image
restoration”. In: IEEE Transactions on Image Processing 12.8 (2003), pp. 906–916 (page 4).
[57] Martin Genzel, Ingo Gühring, Jan Macdonald, and Maximilian März. “Near-exact recovery for
tomographic inverse problems via deep learning”. In: International Conference on Machine
Learning. PMLR. 2022, pp. 7368–7381 (page 2).
[58] Shivam Gupta, Ajil Jalal, Aditya Parulekar, Eric Price, and Zhiyang Xun. “Diffusion Poste-
rior Sampling is Computationally Intractable”. In: arXiv preprint arXiv:2402.12727 (2024)
(page 8).
[59] Elaine T Hale, Wotao Yin, and Yin Zhang. “A fixed-point continuation method for l1-
regularized minimization with applications to compressed sensing”. In: CAAM TR07-07, Rice
University 43.44 (2007), p. 2 (page 4).
[60] Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun
Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, et al. “Mani-
fold Preserving Guided Diffusion”. In: The Twelfth International Conference on Learning
Representations. 2023 (page 22).
[61] Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim,
Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, et al. “Manifold
preserving guided diffusion”. In: arXiv preprint arXiv:2311.16424 (2023) (pages 2, 12).
[62] Carlos Hernandez-Olivan, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A Martínez-
Ramirez, Wei-Hsiang Liao, and Yuki Mitsufuji. “Vrdmg: Vocal restoration via diffusion
posterior sampling with multiple guidance”. In: ICASSP 2024-2024 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2024, pp. 596–600
(page 1).
[63] Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models”. In:
Advances in Neural Information Processing Systems 33 (2020), pp. 6840–6851 (pages 5, 6).
[64] Jonathan Ho and Tim Salimans. “Classifier-free diffusion guidance”. In: arXiv preprint
arXiv:2207.12598 (2022) (page 23).
[65] Sixun Huang, Jie Xiang, Huadong Du, and Xiaoqun Cao. “Inverse problems in atmospheric
science and their application”. In: Journal of Physics: Conference Series. Vol. 12. 1. IOP
Publishing. 2005, p. 45 (page 1).
[66] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. “Image-to-image translation
with conditional adversarial networks”. In: IEEE conference on computer vision and pattern
recognition. 2017, pp. 1125–1134 (page 4).
[67] Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon
Tamir. “Robust compressed sensing mri with deep generative priors”. In: Advances in Neural
Information Processing Systems 34 (2021), pp. 14938–14954 (pages 2, 4, 5, 10, 11).
[68] Zahra Kadkhodaie and Eero P Simoncelli. “Solving linear inverse problems using the prior
implicit in a denoiser”. In: arXiv preprint arXiv:2007.13640 (2020) (pages 2, 5, 11).
[69] Zahra Kadkhodaie and Eero P Simoncelli. “Stochastic Solutions for Linear Inverse Problems
using the Prior Implicit in a Denoiser”. In: Thirty-Fifth Conference on Neural Information
Processing Systems. 2021 (page 4).
[70] Avinash C Kak and Malcolm Slaney. Principles of computerized tomographic imaging.
SIAM, 2001 (page 2).

[71] Ulugbek S Kamilov, Charles A Bouman, Gregery T Buzzard, and Brendt Wohlberg. “Plug-
and-play methods for integrating physical and learned models in computational imaging:
Theory, algorithms, and applications”. In: IEEE Signal Processing Magazine 40.1 (2023),
pp. 85–97 (pages 4, 20).
[72] Ulugbek S Kamilov, Hassan Mansour, and Brendt Wohlberg. “A plug-and-play priors ap-
proach for solving nonlinear imaging inverse problems”. In: IEEE Signal Processing Letters
24.12 (2017), pp. 1872–1876 (page 20).
[73] Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. “Denoising Diffusion
Restoration Models”. In: Advances in Neural Information Processing Systems. 2022 (pages 2,
5, 14).
[74] Bahjat Kawar, Gregory Vaksman, and Michael Elad. “SNIPS: Solving noisy inverse problems
stochastically”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 21757–
21769 (pages 2, 5, 9, 14).
[75] Bahjat Kawar, Gregory Vaksman, and Michael Elad. “Stochastic image denoising by sampling
from the posterior distribution”. In: Proceedings of the IEEE/CVF International Conference
on Computer Vision. 2021, pp. 1866–1875 (page 14).
[76] Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu
Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. “Consistency Trajectory Models:
Learning Probability Flow ODE Trajectory of Diffusion”. In: The Twelfth International
Conference on Learning Representations. 2023 (page 24).
[77] Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, and Jong Chul Ye. “Regularization by
Texts for Latent Diffusion Inverse Solvers”. In: arXiv preprint arXiv:2311.15658 (2023)
(pages 2, 23).
[78] Jeongsol Kim, Geon Yeong Park, and Jong Chul Ye. “DreamSampler: Unifying Dif-
fusion Sampling and Score Distillation for Image Manipulation”. In: arXiv preprint
arXiv:2403.11415 (2024) (pages 2, 23).
[79] Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. “Noise2void-learning denoising
from single noisy images”. In: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition. 2019, pp. 2129–2137 (page 9).
[80] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas.
“Deblurgan: Blind motion deblurring using conditional adversarial networks”. In: Proceedings
of the IEEE conference on computer vision and pattern recognition. 2018, pp. 8183–8192
(page 4).
[81] Patrick Lailly and J Bednar. “The seismic inverse problem as a sequence of before stack migra-
tions”. In: Conference on inverse scattering: theory and application. Vol. 1983. Philadelphia,
Pa. 1983, pp. 206–220 (page 1).
[82] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala,
and Timo Aila. “Noise2Noise: Learning image restoration without clean data”. In: arXiv
preprint arXiv:1803.04189 (2018) (page 9).
[83] Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, and Timo
Gerkmann. “Diffusion Models for Audio Restoration”. In: arXiv preprint arXiv:2402.09821
(2024) (page 1).
[84] Haoying Li, Yifan Yang, Meng Chang, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen.
“SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models”. In: arXiv
preprint arXiv:2104.14951 (2021) (page 4).
[85] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. “Enhanced deep
residual networks for single image super-resolution”. In: Proceedings of the IEEE conference
on computer vision and pattern recognition workshops. 2017, pp. 136–144 (page 4).
[86] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le.
“Flow Matching for Generative Modeling”. In: The Eleventh International Conference
on Learning Representations. 2023. URL: https : / / openreview . net / forum ? id =
PqvMRDCJT9t (page 4).
[87] Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A Theodorou, Weili Nie, and
Anima Anandkumar. “I2 SB: Image-to-Image Schrödinger Bridge”. In: arXiv preprint
arXiv:2302.05872 (2023) (page 4).

[88] Xingchao Liu, Chengyue Gong, and qiang liu. “Flow Straight and Fast: Learning to Generate
and Transfer Data with Rectified Flow”. In: The Eleventh International Conference on Learn-
ing Representations. 2023. URL: https://openreview.net/forum?id=XVjTT1nw5z
(page 4).
[89] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van
Gool. “Repaint: Inpainting using denoising diffusion probabilistic models”. In: Proceedings
of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 11461–
11471 (pages 5, 12).
[90] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. “Im-
age restoration with mean-reverting stochastic differential equations”. In: arXiv preprint
arXiv:2301.11699 (2023) (page 4).
[91] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. “Re-
fusion: Enabling large-size realistic image restoration with latent-space diffusion models”.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
2023, pp. 1680–1691 (page 4).
[92] Michael Lustig, David L Donoho, Juan M Santos, and John M Pauly. “Compressed sensing
MRI”. In: IEEE signal processing magazine 25.2 (2008), pp. 72–82 (page 2).
[93] Dimitra Maoutsa, Sebastian Reich, and Manfred Opper. “Interacting particle solutions of
fokker–planck equations through gradient–log–density estimation”. In: Entropy 22.8 (2020),
p. 802 (page 6).
[94] Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. “A Variational Perspective on
Solving Inverse Problems with Diffusion Models”. In: The Twelfth International Conference
on Learning Representations. 2024 (pages 2, 5, 16, 17).
[95] Peyman Milanfar and Mauricio Delbracio. “Denoising: A Powerful Building-Block for
Imaging, Inverse Problems, and Machine Learning”. In: arXiv preprint arXiv:2409.06219
(2024) (page 4).
[96] Eloi Moliner, Filip Elvander, and Vesa Välimäki. “Blind audio bandwidth extension: A
diffusion-based zero-shot approach”. In: arXiv preprint arXiv:2306.01433 (2023) (page 1).
[97] Eloi Moliner, Jaakko Lehtinen, and Vesa Välimäki. “Solving audio inverse problems with
a diffusion model”. In: ICASSP 2023-2023 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). IEEE. 2023, pp. 1–5 (page 1).
[98] Eloi Moliner and Vesa Välimäki. “Diffusion-based audio inpainting”. In: arXiv preprint
arXiv:2305.15266 (2023) (page 1).
[99] Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mit-
sufuji, and Stefano Ermon. “Gibbsddrm: A partially collapsed gibbs sampler for solving
blind inverse problems with denoising diffusion restoration”. In: International Conference on
Machine Learning. PMLR. 2023, pp. 25501–25522 (pages 2, 15).
[100] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012 (page 4).
[101] Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, and Biing-Hwang
Juang. “Speech dereverberation based on variance-normalized delayed linear prediction”. In:
IEEE Transactions on Audio, Speech, and Language Processing 18.7 (2010), pp. 1717–1731
(page 3).
[102] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. 6th ed.
Berlin: Springer Science & Business Media, 2010. ISBN: 9783642143946 (pages 7, 8).
[103] Gregory Ongie, Ajil Jalal, Christopher A Metzler, Richard G Baraniuk, Alexandros G
Dimakis, and Rebecca Willett. “Deep learning techniques for inverse problems in imaging”.
In: IEEE Journal on Selected Areas in Information Theory 1.1 (2020), pp. 39–56 (page 4).
[104] Jeffrey Ouyang-Zhang, Daniel J Diaz, Adam Klivans, and Philipp Krähenbühl. “Predicting a
Protein’s Stability under a Million Mutations”. In: NeurIPS (2023) (page 2).
[105] Xiaochuan Pan, Emil Y Sidky, and Michael Vannier. “Why do commercial CT scanners still
employ traditional, filtered back-projection for image reconstruction?” In: Inverse problems
25.12 (2009), p. 123009 (page 2).
[106] Sung Cheol Park, Min Kyu Park, and Moon Gi Kang. “Super-resolution image reconstruction:
a technical overview”. In: IEEE signal processing magazine 20.3 (2003), pp. 21–36 (page 3).
[107] Marcelo Pereyra. “Revisiting maximum-a-posteriori estimation in log-concave models”. In:
SIAM Journal on Imaging Sciences 12.1 (2019), pp. 650–670 (page 4).

[108] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. “DreamFusion: Text-to-3D
using 2D Diffusion”. In: The Eleventh International Conference on Learning Representations.
2023. URL: https://openreview.net/forum?id=FjNys5c7VyY (page 23).
[109] Weize Quan, Jiaxi Chen, Yanli Liu, Dong-Ming Yan, and Peter Wonka. “Deep learning-based
image and video inpainting: A survey”. In: International Journal of Computer Vision 132.7
(2024), pp. 2367–2400 (page 2).
[110] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini
Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. “Learning
transferable visual models from natural language supervision”. In: International conference
on machine learning. PMLR. 2021, pp. 8748–8763 (page 23).
[111] Sriram Ravula, Brett Levac, Ajil Jalal, Jonathan I Tamir, and Alexandros G Dimakis. “Opti-
mizing sampling patterns for compressed sensing MRI with diffusion generative models”. In:
arXiv preprint arXiv:2306.03284 (2023) (page 15).
[112] Danilo Rezende and Shakir Mohamed. “Variational inference with normalizing flows”. In:
International conference on machine learning. PMLR. 2015, pp. 1530–1538 (page 18).
[113] Alejandro Ribes and Francis Schmitt. “Linear inverse problems in imaging”. In: IEEE Signal
Processing Magazine 25.4 (2008), pp. 84–99 (page 4).
[114] Yaniv Romano, Michael Elad, and Peyman Milanfar. “The Little Engine That Could: Regular-
ization by Denoising (RED)”. In: SIAM Journal on Imaging Sciences 10.4 (2017), pp. 1804–
1844. DOI: 10.1137/16M1102884 (pages 4, 20).
[115] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer.
“High-resolution image synthesis with latent diffusion models”. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 10684–
10695 (pages 7, 10).
[116] Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-
Sheng Chu. “Beyond first-order tweedie: Solving inverse problems using latent diffusion”.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
2024, pp. 9472–9481 (pages 2, 23).
[117] Litu Rout, Advait Parulekar, Constantine Caramanis, and Sanjay Shakkottai. “A theoretical
justification for image inpainting using denoising diffusion probabilistic models”. In: arXiv
preprint arXiv:2302.01217 (2023) (pages 5, 12).
[118] Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay
Shakkottai. “Solving linear inverse problems provably via posterior sampling with latent
diffusion models”. In: Advances in Neural Information Processing Systems 36 (2024) (pages 2,
22).
[119] François Rozet, Gérôme Andry, François Lanusse, and Gilles Louppe. “Learning Diffusion
Priors from Observations by Expectation Maximization”. In: arXiv preprint arXiv:2405.13712
(2024) (pages 2, 5, 9, 13).
[120] Leonid I Rudin, Stanley Osher, and Emad Fatemi. “Nonlinear total variation based noise
removal algorithms”. In: Physica D: Nonlinear Phenomena 60.1-4 (1992), pp. 259–268
(page 4).
[121] Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans,
David Fleet, and Mohammad Norouzi. “Palette: Image-to-image diffusion models”. In: ACM
SIGGRAPH 2022 Conference Proceedings. 2022, pp. 1–10 (page 4).
[122] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mo-
hammad Norouzi. “Image super-resolution via iterative refinement”. In: arXiv preprint
arXiv:2104.07636 (2021) (pages 3, 4).
[123] Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui,
and Yuki Mitsufuji. “Unsupervised vocal dereverberation with diffusion-based generative
models”. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP). IEEE. 2023, pp. 1–5 (page 1).
[124] Jonathan Scarlett, Reinhard Heckel, Miguel R. D. Rodrigues, Paul Hand, and Yonina C. Eldar.
“Theoretical Perspectives on Deep Learning Methods in Inverse Problems”. In: IEEE Journal
on Selected Areas in Information Theory 3.3 (Sept. 2022), pp. 433–453. ISSN: 2641-8770.
DOI : 10.1109/jsait.2023.3241123. URL : http://dx.doi.org/10.1109/JSAIT.
2023.3241123 (page 4).

[125] Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, and Dongsheng Li.
“Understanding Training-free Diffusion Guidance: Mechanisms and Limitations”. In: arXiv
preprint arXiv:2403.12404 (2024) (page 5).
[126] Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. "Diffusion Schrödinger
Bridge Matching". In: arXiv preprint arXiv:2303.16852 (2023) (page 4).
[127] Nir Shlezinger, Jay Whang, Yonina C Eldar, and Alexandros G Dimakis. “Model-based deep
learning”. In: Proceedings of the IEEE 111.5 (2023), pp. 465–499 (page 4).
[128] Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, and Liyue Shen. “Solving
Inverse Problems with Latent Diffusion Models via Hard Data Consistency”. In: The Twelfth
International Conference on Learning Representations. 2024. URL: https://openreview.
net/forum?id=j8hdRqOUhN (pages 2, 22).
[129] Jiaming Song, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models”. In:
arXiv preprint arXiv:2010.02502 (2020) (page 6).
[130] Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. “Pseudoinverse-guided diffu-
sion models for inverse problems”. In: International Conference on Learning Representations.
2022 (pages 2, 5, 13).
[131] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. “Consistency Models”. In:
International Conference on Machine Learning. PMLR. 2023, pp. 32211–32252 (pages 21,
24).
[132] Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. “Maximum likelihood training
of score-based diffusion models”. In: Advances in neural information processing systems 34
(2021), pp. 1415–1428 (page 18).
[133] Yang Song and Stefano Ermon. “Generative modeling by estimating gradients of the data
distribution”. In: Advances in Neural Information Processing Systems 32 (2019) (page 5).
[134] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. “Solving inverse problems in medical
imaging with score-based generative models”. In: arXiv preprint arXiv:2111.08005 (2021)
(pages 1, 19).
[135] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon,
and Ben Poole. “Score-based generative modeling through stochastic differential equations”.
In: arXiv preprint arXiv:2011.13456 (2020) (pages 2, 5, 6, 8, 11, 12, 18).
[136] Suhas Sreehari, S Venkat Venkatakrishnan, Brendt Wohlberg, Gregery T Buzzard, Lawrence F
Drummy, Jeffrey P Simmons, and Charles A Bouman. “Plug-and-play priors for bright field
electron tomography and sparse interpolation”. In: IEEE Transactions on Computational
Imaging 2.4 (2016), pp. 408–423 (page 4).
[137] Charles M Stein. “Estimation of the mean of a multivariate normal distribution”. In: The
annals of Statistics (1981), pp. 1135–1151 (page 9).
[138] Yu Sun, Zihui Wu, Yifan Chen, Berthy T Feng, and Katherine L Bouman. “Provable proba-
bilistic imaging using score-based generative priors”. In: IEEE Transactions on Computa-
tional Imaging (2024) (pages 2, 20).
[139] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. “Scale-Recurrent Network
for Deep Image Deblurring”. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR). 2018 (page 4).
[140] Albert Tarantola. Inverse problem theory and methods for model parameter estimation. SIAM,
2005 (page 4).
[141] U Tariq, P Lai, M Lustig, M Alley, M Zhang, G Gold, and Vasanawala Shreyas S. “MRI
Data: Undersampled Knees”. In: Undersampled Knees | MRI Data (). URL: http://old.
mridata.org/undersampled/knees (page 2).
[142] Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay,
and Tommi S. Jaakkola. “Diffusion Probabilistic Modeling of Protein Backbones in 3D
for the motif-scaffolding problem”. In: The Eleventh International Conference on Learning
Representations. 2023. URL: https : / / openreview . net / forum ? id = 6TxBxqNME1Y
(pages 2, 20).
[143] Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and
Yinxiao Li. “Maxim: Multi-axis mlp for image processing”. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. 2022, pp. 5769–5780 (page 4).

[144] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. “Plug-and-play
priors for model based reconstruction”. In: 2013 IEEE Global Conference on Signal and
Information Processing. IEEE. 2013, pp. 945–948 (page 4).
[145] Pascal Vincent. “A connection between score matching and denoising autoencoders”. In:
Neural computation 23.7 (2011), pp. 1661–1674 (pages 7, 36).
[146] Jean Virieux and Stéphane Operto. “An overview of full-waveform inversion in exploration
geophysics”. In: Geophysics 74.6 (2009), WCC1–WCC26 (page 1).
[147] Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, and Ju Sun. “DMPlug:
A Plug-in Method for Solving Inverse Problems with Diffusion Models”. In: arXiv preprint
arXiv:2405.16749 (2024) (pages 2, 5, 21).
[148] Yifei Wang, Weimin Bai, Weijian Luo, Wenzheng Chen, and He Sun. “Integrating Amortized
Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images”.
In: arXiv preprint arXiv:2407.11162 (2024) (page 9).
[149] Yinhuai Wang, Jiwen Yu, and Jian Zhang. “Zero-shot image restoration using denoising
diffusion null-space model”. In: arXiv preprint arXiv:2212.00490 (2022) (pages 2, 5, 15).
[150] Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G Dimakis,
and Peyman Milanfar. “Deblurring via stochastic refinement”. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 16293–
16303 (page 4).
[151] Luhuan Wu, Brian L. Trippe, Christian A Naesseth, John Patrick Cunningham, and David
Blei. “Practical and Asymptotically Exact Conditional Sampling in Diffusion Models”. In:
Thirty-seventh Conference on Neural Information Processing Systems. 2023. URL: https:
//openreview.net/forum?id=eWKqr1zcRv (pages 2, 5, 20).
[152] Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, and Katherine L. Bouman.
Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors. 2024.
arXiv: 2405.18782 [eess.IV] (pages 2, 5, 18).
[153] Carl Wunsch. The ocean circulation inverse problem. Cambridge University Press, 1996
(page 1).
[154] Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin,
Yan Wang, Jingjing Liu, and Ya-Qin Zhang. “Consistency Model is an Effective Posterior
Sample Approximation for Diffusion Inverse Solvers”. In: (2024). arXiv: 2403.12063
[cs.CV] (pages 2, 21).
[155] Yuting Xu, Deeptak Verma, Robert P Sheridan, Andy Liaw, Junshui Ma, Nicholas M Marshall,
John McIntosh, Edward C Sherer, Vladimir Svetnik, and Jennifer M Johnston. “Deep dive
into machine learning models for protein engineering”. In: Journal of chemical information
and modeling 60.6 (2020), pp. 2773–2790 (page 2).
[156] Kevin K Yang, Zachary Wu, and Frances H Arnold. “Machine-learning-guided directed
evolution for protein engineering”. In: Nature methods 16.8 (2019), pp. 687–694 (page 2).
[157] Lingxiao Yang, Shutong Ding, Yifan Cai, Jingyi Yu, Jingya Wang, and Ye Shi. “Guid-
ance with Spherical Gaussian Constraint for Conditional Diffusion”. In: arXiv preprint
arXiv:2402.03201 (2024) (page 12).
[158] G Alastair Young and Richard L Smith. Essentials of statistical inference. Vol. 16. Cambridge University Press, 2005 (page 4).
[159] Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, and Zhibo
Chen. “Inpaint Anything: Segment Anything Meets Image Inpainting”. In: arXiv preprint
arXiv:2304.06790 (2023) (page 2).
[160] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and
Ming-Hsuan Yang. “Restormer: Efficient transformer for high-resolution image restoration”.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
2022, pp. 5728–5739 (page 4).
[161] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,
Ming-Hsuan Yang, and Ling Shao. “Multi-stage progressive image restoration”. In: Pro-
ceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021,
pp. 14821–14831 (page 4).

[162] Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J
Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, et al. “fastMRI: An
open dataset and benchmarks for accelerated MRI”. In: arXiv preprint arXiv:1811.08839
(2018) (pages 2, 9).

[163] Tao Zhang, John Pauly, Shreyas Vasanawala, and Michael Lustig. “MRI Data: Undersampled Abdomens”. In: Undersampled Abdomens | MRI Data. URL: http://old.mridata.org/undersampled/abdomens (page 2).

[164] Yuanzhi Zhu, Kai Zhang, Jingyun Liang, Jiezhang Cao, Bihan Wen, Radu Timofte, and Luc
Van Gool. “Denoising diffusion models for plug-and-play image restoration”. In: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 1219–
1229 (pages 2, 5, 16).

A Proofs
Lemma A.1 (Conditional Expectation and MMSE). Let X0 and Xt be two random variables, and
hθ (xt , t) be a function parameterized by θ. Then:

argminθ E[ ||hθ(xt, t) − x0||² ] = argminθ E[ ||hθ(xt, t) − E[x0|xt]||² ]   (A.1)

That is, the function hθ (xt , t) that minimizes the mean squared error with respect to x0 is the one
that best approximates the conditional expectation E[x0 |xt ].

Proof.

argminθ E[ ||hθ(xt, t) − x0||² ]   (A.2)
= argminθ E[ ||hθ(xt, t) − E[x0|xt] + E[x0|xt] − x0||² ]   (A.3)
= argminθ E[ ||hθ(xt, t) − E[x0|xt]||² − 2(hθ(xt, t) − E[x0|xt])⊤(x0 − E[x0|xt]) + ||x0 − E[x0|xt]||² ]   (A.4)
= argminθ E[ ||hθ(xt, t) − E[x0|xt]||² − 2hθ(xt, t)⊤(x0 − E[x0|xt]) ].   (A.5)

Now, for the second term, we have:

Ex0,xt[ hθ(xt, t)⊤(x0 − E[x0|xt]) ] = Ext[ Ex0|xt[ hθ(xt, t)⊤(x0 − E[x0|xt]) ] ]   (A.6)
= Ext[ hθ(xt, t)⊤ Ex0|xt[ x0 − E[x0|xt] ] ] = 0,   (A.7)

which concludes the proof.
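
To make Lemma A.1 concrete, the following minimal numerical sketch (our own illustration, not part of the original derivation) fits a least-squares predictor of x0 from xt for a one-dimensional Gaussian prior, where the conditional expectation E[x0|xt] = xt/(1 + σ²) is known in closed form; the fitted predictor should approximately recover it.

```python
import numpy as np

# Toy check of Lemma A.1: the MSE-optimal predictor of x0 from xt is E[x0 | xt].
# Assumptions (ours): x0 ~ N(0, 1), xt = x0 + sigma * z, so E[x0 | xt] = xt / (1 + sigma**2).
rng = np.random.default_rng(0)
sigma = 0.5
x0 = rng.standard_normal(200_000)
xt = x0 + sigma * rng.standard_normal(200_000)

# Least-squares fit of an affine predictor h(xt) = a * xt + b (a simple stand-in for h_theta).
A = np.stack([xt, np.ones_like(xt)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, x0, rcond=None)

print(f"fitted slope     a = {a:.3f}   (theory: {1 / (1 + sigma**2):.3f})")
print(f"fitted intercept b = {b:.3f}   (theory: 0.000)")
```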

A.1 Tweedie’s Formula

Lemma A.2 (Tweedie’s Formula). Let:


Xt = X0 + σt Z, (A.8)
for X0 ∼ pX0 and Z ∼ N (0, I). Then,
∇xt log pt(xt) = (E[X0|xt] − xt) / σt².   (A.9)

Proof.

∇xt log pt(xt) = (1/pt(xt)) ∇xt pt(xt) = (1/pt(xt)) ∇xt ∫ pt(xt, x0) dx0   (A.10)
= (1/pt(xt)) ∇xt ∫ pt(xt|x0) p0(x0) dx0   (A.11)
= (1/pt(xt)) ∫ ∇xt pt(xt|x0) p0(x0) dx0   (A.12)
= (1/pt(xt)) ∫ pt(xt|x0) ∇xt log pt(xt|x0) p0(x0) dx0   (A.13)
= ∫ p0(x0|xt) (x0 − xt)/σt² dx0   (A.14)
= (E[X0|xt] − xt) / σt².   (A.15)
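
As a sanity check on Tweedie's formula, the short script below (an illustrative sketch we add here, with an assumed two-component Gaussian mixture prior) compares the analytic score of pt with (E[X0|xt] − xt)/σt², both of which are available in closed form for this prior; the two quantities should agree up to floating-point precision.

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Assumed prior (ours): p(x0) = sum_k w_k N(mu_k, s_k^2); corruption xt = x0 + sigma_t * z.
w, mu, s2 = np.array([0.3, 0.7]), np.array([-2.0, 1.5]), np.array([0.5, 1.2])
sigma_t = 0.8
xt = np.linspace(-4, 4, 9)[:, None]          # a few query points

# p_t is again a mixture of N(mu_k, s_k^2 + sigma_t^2); gamma are posterior responsibilities.
var_t = s2 + sigma_t ** 2
comp = w * gauss(xt, mu, var_t)
gamma = comp / comp.sum(axis=1, keepdims=True)

score = (gamma * (mu - xt) / var_t).sum(axis=1)                           # analytic grad log p_t(xt)
post_mean = (gamma * (s2 * xt + sigma_t ** 2 * mu) / var_t).sum(axis=1)   # E[x0 | xt]
tweedie = (post_mean - xt[:, 0]) / sigma_t ** 2                           # right-hand side of (A.9)

print(np.max(np.abs(score - tweedie)))       # should be ~1e-16
```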

A.2 Denoising Score Matching

By leveraging the MMSE interpretation of the conditional expectation and Tweedie's formula, one
can approximate the score function by training a model to predict the clean image from a corrupted
observation (via supervised learning). At inference time, the trained network is converted into a
model that approximates the score through Tweedie's formula. This training procedure is typically
known as the x0-prediction loss. An alternative, but equivalent, approach is to train for the score
directly. Vincent [145] independently discovered Denoising Score Matching (DSM), whose unique
minimizer is the score function. DSM and the x0-prediction objective are identical up to a simple
network reparametrization.
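
The following minimal PyTorch-style sketch (our own illustration; h_theta is a placeholder denoiser, and the corruption follows Xt = X0 + σt Z as in Lemma A.2) shows the x0-prediction training objective and the Tweedie conversion of the trained denoiser into a score estimate.

```python
import torch

def x0_prediction_loss(h_theta, x0, sigma_t, t):
    """MMSE / x0-prediction objective (Lemma A.1): regress the clean sample."""
    z = torch.randn_like(x0)
    xt = x0 + sigma_t * z                        # corruption Xt = X0 + sigma_t * Z
    return ((h_theta(xt, t) - x0) ** 2).mean()

def score_from_denoiser(h_theta, xt, sigma_t, t):
    """Tweedie's formula (Lemma A.2): score(xt) = (E[X0 | xt] - xt) / sigma_t^2."""
    return (h_theta(xt, t) - xt) / sigma_t ** 2
```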

Theorem A.3 (Denoising Score Matching [145]). Let p0 , pt be two distributions in Rn . Assume that
all the conditional distributions, pt (xt |x0 ), are supported and differentiable in Rn . Let:

J1(θ) = (1/2) Ext∼pt[ ||sθ(xt) − ∇xt log pt(xt)||² ],   (A.16)

J2(θ) = (1/2) E(x0,xt)∼p0(x0)pt(xt|x0)[ ||sθ(xt) − ∇xt log pt(xt|x0)||² ].   (A.17)

Then, J1 and J2 have the same minimizer.

We include the proof from [35] for completeness.

Proof.

J1(θ) = (1/2) Ext∼pt[ ||sθ(xt)||² − 2 sθ(xt)⊤∇xt log pt(xt) + ||∇xt log pt(xt)||² ]   (A.18)
= (1/2) Ext∼pt[ ||sθ(xt)||² ] − Ext∼pt[ sθ(xt)⊤∇xt log pt(xt) ] + C1.   (A.19)

Similarly,

J2(θ) = (1/2) Ext∼pt[ ||sθ(xt)||² ] − E(x0,xt)[ sθ(xt)⊤∇xt log pt(xt|x0) ] + C2.   (A.20)

It suffices to show that:

Ext∼pt[ sθ(xt)⊤∇xt log pt(xt) ] = E(x0,xt)∼p0(x0)pt(xt|x0)[ sθ(xt)⊤∇xt log pt(xt|x0) ].   (A.21)

We start with the second term.

E(x0,xt)∼p0(x0)pt(xt|x0)[ sθ(xt)⊤∇xt log pt(xt|x0) ]
= ∫x0 ∫xt p0(x0) pt(xt|x0) sθ(xt)⊤ ∇xt log pt(xt|x0) dxt dx0   (A.22)
= ∫x0 ∫xt sθ(xt)⊤ ( p0(x0) pt(xt|x0) ∇xt log pt(xt|x0) ) dxt dx0   (A.23)
= ∫x0 ∫xt sθ(xt)⊤ ( p0(x0) pt(xt|x0) (1/pt(xt|x0)) ∇xt pt(xt|x0) ) dxt dx0   (A.24)
= ∫x0 ∫xt sθ(xt)⊤ ( p0(x0) ∇xt pt(xt|x0) ) dxt dx0   (A.25)
= ∫xt ∫x0 sθ(xt)⊤ ( p0(x0) ∇xt pt(xt|x0) ) dx0 dxt   (A.26)
= ∫xt sθ(xt)⊤ ( ∫x0 p0(x0) ∇xt pt(xt|x0) dx0 ) dxt   (A.27)
= ∫xt sθ(xt)⊤ ( ∫x0 ∇xt (p0(x0) pt(xt|x0)) dx0 ) dxt   (A.28)
= ∫xt sθ(xt)⊤ ∇xt ( ∫x0 p0(x0) pt(xt|x0) dx0 ) dxt   (A.29)
= ∫xt sθ(xt)⊤ ∇xt pt(xt) dxt   (A.30)
= ∫xt pt(xt) sθ(xt)⊤ ∇xt log pt(xt) dxt   (A.31)
= Ext∼pt(xt)[ sθ(xt)⊤ ∇xt log pt(xt) ].   (A.32)
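
Because ∇xt log pt(xt|x0) = −(xt − x0)/σt² = −z/σt for the Gaussian corruption used throughout this appendix, the tractable objective J2 takes the following simple form in code (again an illustrative sketch of ours, with s_theta a placeholder score network).

```python
import torch

def dsm_loss(s_theta, x0, sigma_t, t):
    """Tractable DSM objective J2 of Theorem A.3 for Xt = X0 + sigma_t * Z."""
    z = torch.randn_like(x0)
    xt = x0 + sigma_t * z
    target = -z / sigma_t                        # = grad_{xt} log p_t(xt | x0)
    return 0.5 * ((s_theta(xt, t) - target) ** 2).mean()
```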

A.3 Jacobian of the score

Lemma A.4 (Jacobian of the score function). Let:

Xt = X0 + σt Z,   (A.33)

for X0 ∼ pX0 and Z ∼ N(0, I). Then, the Hessian of log pXt satisfies

H(log pXt)(xt) = ( E[X0X0⊤|xt] − E[X0|xt]E[X0|xt]⊤ ) / σt⁴ − I/σt².   (A.34)

Proof.

∇xt log pXt(xt) = (E[X0|xt] − xt) / σt²   (A.35)
⇒ σt² H(log pXt)(xt) = Jacob(E[X0|xt]) − I.   (A.36)

We will now analyze the Jacobian.
Jacob(E[X0|xt]) = ∫ ∇xt (pX0(x0|xt) x0) dx0   (A.37)
= ∫ x0 ∇xt⊤ pX0(x0|xt) dx0   (A.38)
= ∫ x0 pX0(x0|xt) ∇xt⊤ log ( pXt(xt|x0) pX0(x0) / pXt(xt) ) dx0   (A.39)
= ∫ x0 pX0(x0|xt) ∇xt⊤ log ( pXt(xt|x0) / pXt(xt) ) dx0   (A.40)
= ∫ x0 pX0(x0|xt) ∇xt⊤ log pXt(xt|x0) dx0 − ∫ x0 pX0(x0|xt) ∇xt⊤ log pXt(xt) dx0   (A.41)
= ∫ x0 pX0(x0|xt) (x0⊤ − xt⊤)/σt² dx0 − ∫ x0 pX0(x0|xt) ∇xt⊤ log pXt(xt) dx0   (A.42)
= ∫ x0 pX0(x0|xt) (x0⊤ − xt⊤)/σt² dx0 − ∫ x0 pX0(x0|xt) (E[X0|xt]⊤ − xt⊤)/σt² dx0   (A.43)
= (1/σt²) ( E[x0x0⊤|xt] − E[x0|xt]E[x0|xt]⊤ ).   (A.44)

Corollary A.5. Let:

Xt = X0 + σt Z,   (A.45)

for X0 ∼ pX0, X0 ∈ Rn and Z ∼ N(0, I). Then,

∇²xt log pt(xt) = ( E[||X0||² | xt] − ||E[X0|xt]||² ) / σt⁴ − n/σt²,   (A.46)

which follows from Lemma A.4 by taking the trace of both sides of (A.34).
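
To see Lemma A.4 and Corollary A.5 in action, the sketch below (our own 1-D illustration with an assumed Gaussian mixture prior) compares a finite-difference estimate of d²/dxt² log pt(xt) against Var[X0|xt]/σt⁴ − 1/σt²; in one dimension these are exactly the quantities appearing in (A.34) and (A.46).

```python
import numpy as np

def log_pt(x, w, mu, var_t):
    # p_t for a GMM prior corrupted by N(0, sigma_t^2) noise is again a GMM.
    return np.log(np.sum(w * np.exp(-(x - mu) ** 2 / (2 * var_t))
                         / np.sqrt(2 * np.pi * var_t)))

# Assumed 1-D example (ours): two-component Gaussian mixture prior.
w, mu, s2 = np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([0.3, 0.8])
sigma_t, xt, h = 0.7, 0.5, 1e-3
var_t = s2 + sigma_t ** 2

# Left-hand side: second derivative of log p_t via central finite differences.
lhs = (log_pt(xt + h, w, mu, var_t) - 2 * log_pt(xt, w, mu, var_t)
       + log_pt(xt - h, w, mu, var_t)) / h ** 2

# Right-hand side: posterior variance Var[x0 | xt] / sigma_t^4 - 1 / sigma_t^2.
gamma = w * np.exp(-(xt - mu) ** 2 / (2 * var_t)) / np.sqrt(2 * np.pi * var_t)
gamma /= gamma.sum()
m = (s2 * xt + sigma_t ** 2 * mu) / var_t      # per-component posterior means
v = s2 * sigma_t ** 2 / var_t                  # per-component posterior variances
post_var = np.sum(gamma * (v + m ** 2)) - np.sum(gamma * m) ** 2
rhs = post_var / sigma_t ** 4 - 1 / sigma_t ** 2

print(lhs, rhs)   # the two values should match up to finite-difference error
```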
