Unrolled Optimization with Deep Priors
Steven Diamond∗ Vincent Sitzmann∗ Felix Heide Gordon Wetzstein
December 20, 2018
Abstract
A broad class of problems at the core of computational imaging, sensing, and low-level computer
vision reduces to the inverse problem of extracting latent images that follow a prior distribution,
from measurements taken under a known physical image formation model. Traditionally, hand-
crafted priors along with iterative optimization methods have been used to solve such problems. In
this paper we present unrolled optimization with deep priors, a principled framework for infusing
knowledge of the image formation into deep networks that solve inverse problems in imaging,
inspired by classical iterative methods. We show that instances of the framework outperform the
state-of-the-art by a substantial margin for a wide variety of imaging problems, such as denoising,
deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover, we conduct
experiments that explain how the framework is best used and why it outperforms previous methods.
1 Introduction
In inverse imaging problems, we seek to reconstruct a latent image from measurements taken under a
known physical image formation. Such inverse problems arise throughout computational photography,
computer vision, medical imaging, and scientific imaging. Residing in the early vision layers of every
autonomous vision system, they are essential for all vision based autonomous agents. Recent years have
seen tremendous progress in both classical and deep methods for solving inverse problems in imaging.
Classical and deep approaches have relative advantages and disadvantages. Classical algorithms, based
on formal optimization, exploit knowledge of the image formation model in a principled way, but
struggle to incorporate sophisticated learned models of natural images. Deep methods easily learn
complex statistics of natural images, but lack a systematic approach to incorporating prior knowledge
of the image formation model. What is missing is a general framework for designing deep networks
that incorporate prior information, as well as a clear understanding of when prior information is useful.
In this paper we propose unrolled optimization with deep priors (ODP): a principled, general
purpose framework for integrating prior knowledge into deep networks. We focus on applications of
the framework to inverse problems in imaging. Given an image formation model and a few generic,
high-level design choices, the ODP framework provides an easy to train, high performance network
architecture. The framework suggests novel network architectures that outperform prior work across a
variety of imaging problems.
The ODP framework is based on unrolled optimization, in which we truncate a classical iterative
optimization algorithm and interpret it as a deep network. Unrolling optimization has been a common
practice among practitioners in imaging, and training unrolled optimization models has recently been
explored for various imaging applications, all using variants of field-of-experts priors [12, 21, 5, 24].
We differ from existing approaches in that we propose a general framework for unrolling optimization
methods along with deep convolutional prior architectures within the unrolled optimization. By
training deep CNN priors within unrolled optimization architectures, instances of ODP outperform
state-of-the-art results on a broad variety of inverse imaging problems.
∗ These authors contributed equally.
Our empirical results clarify the benefits and limitations of encoding prior information for inverse
problems in deep networks. Layers that (approximately) invert the image formation operator are useful
because they simplify the reconstruction task to denoising and correcting artifacts introduced by the
inversion layers. On the other hand, prior layers improve network generalization, boosting performance
on unseen image formation operators. For deblurring and compressed sensing MRI, we found that a
single ODP model trained on many image formation operators outperforms existing state-of-the-art
methods where a specialized model was trained for each operator.
Moreover, we offer insight into the open question of what iterative algorithm is best for unrolled
optimization, given a linear image formation model. Our main finding is that simple primal algorithms
that (approximately) invert the image formation operator each iteration perform best.
In summary, our contributions are as follows:
1. We introduce ODP, a principled, general purpose framework for inverse problems in imaging,
which incorporates prior knowledge of the image formation into deep networks.
2. We demonstrate that instances of the ODP framework for denoising, deblurring, and compressed
sensing MRI outperform state-of-the-art results by a large margin.
3. We present empirically derived insights on how the ODP framework and related approaches are
best used, such as when exploiting prior information is advantageous and which optimization
algorithms are most suitable for unrolling.
2 Motivation
Bayesian model The proposed ODP framework is inspired by an extensive body of work on solving
inverse problems in imaging via maximum-a-posteriori (MAP) estimation under a Bayesian model. In
the Bayesian model, an unknown image x is drawn from a prior distribution Ω(θ) with parameters θ.
The imaging system applies a linear operator A to this image, representing all optical processes in the
capture, and then measures an image y on the sensor, drawn from a noise distribution ω(Ax) that
models sensor noise, e.g., read noise, and noise in the signal itself, e.g., photon-shot noise.
Let P (y|Ax) be the probability of sampling y from ω(Ax) and P (x; θ) be the probability of sampling
x from Ω(θ). Then the probability of an unknown image x yielding an observation y is proportional to
P (y|Ax)P (x; θ).
The MAP point-estimate of x is given by x = argmax_x P(y|Ax)P(x; θ), or equivalently

    x = argmin_x f(y, Ax) + r(x, θ),    (1)
where the data term f (y, Ax) = − log P (y|Ax) and prior term r(x, θ) = − log P (x; θ) are negative
log-likelihoods. Computing x thus involves solving an optimization problem [3, Chap. 7].
Unrolled iterative methods A large variety of algorithms have been developed for solving
problem (1) efficiently for different convex data terms and priors (e.g., FISTA [1], Chambolle-
Pock [4], ADMM [2]). The majority of these algorithms are iterative methods, in which a mapping
Γ(x^k, A, y, θ) → x^{k+1} is applied repeatedly, starting from an initial point x^0, to generate a series of
iterates that converge to a solution x⋆.
Iterative methods are usually terminated based on a stopping condition that ensures theoretical
convergence properties. An alternative approach is to execute a pre-determined number of iterations
N , in other words unrolling the optimization algorithm. This approach is motivated by the fact
that for many imaging applications very high accuracy, e.g., convergence below a tolerance of 10^{-6} for
every pixel, is not needed in practice, as opposed to optimization problems in, for instance,
control. Fixing the number of iterations allows us to view the iterative method as an explicit function
Γ^N(·, A, y, θ) → x^N of the initial point x^0. Parameters such as θ may be fixed across all iterations or
vary by iteration. The unrolled iterative algorithm can be interpreted as a deep network [24].
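To make the unrolling concrete, the following is a minimal NumPy sketch (all names are illustrative, not from the paper) of a gradient method on a least-squares data term truncated to a fixed number of iterations; the result is an explicit function of the initial point whose parameters, here the step sizes, can be trained like any other network.

```python
import numpy as np

def unrolled_gradient_method(x0, A, y, step_sizes):
    """Run a fixed number of gradient steps on 0.5 * ||A x - y||^2.

    Truncating at N = len(step_sizes) iterations turns the iterative
    solver into an explicit function of the initial point x0 whose
    parameters (here, the step sizes) can be learned end to end.
    """
    x = x0
    for alpha in step_sizes:          # one "layer" per unrolled iteration
        grad = A.T @ (A @ x - y)      # gradient of the data term
        x = x - alpha * grad
    return x

# Tiny usage example on a random least-squares instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
x_hat = unrolled_gradient_method(np.zeros(10), A, y, step_sizes=[0.01] * 5)
```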
Figure 1: A proximal gradient ODP network for deblurring under Gaussian noise, mapping the
observation y into an estimate x̂ of the latent image x. Here F is the DFT, k is the blur kernel, and
K is its Fourier transform.
Parameterization The parameters θ in an unrolled iterative algorithm are the algorithm hyper-
parameters, such as step sizes, and model parameters defining the prior. Generally the number of
algorithm hyperparameters is small (1-5 per iteration), so the model capacity of the unrolled algorithm
is primarily determined by the representation of the prior.
Many efficient iterative optimization methods do not interact with the prior term r directly, but
instead minimize r via its (sub)gradient or proximal operator prox_{r(·,θ)}, defined as

    prox_{r(·,θ)}(v) = argmin_z r(z, θ) + (1/2)‖z − v‖_2^2.
The proximal operator is a generalization of Euclidean projection. In the ODP framework, we propose
to parameterize the gradient or proximal operator of r directly and define r implicitly.
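For intuition, the proximal operator of the hand-crafted sparsity prior r(z) = λ‖z‖_1 has the familiar soft-thresholding form sketched below; in ODP this closed-form operator is replaced by the output of a learned CNN. (A minimal NumPy sketch; names are illustrative.)

```python
import numpy as np

def prox_l1(v, lam):
    """Soft thresholding: the proximal operator of r(z) = lam * ||z||_1.

    Solves argmin_z lam * ||z||_1 + 0.5 * ||z - v||_2^2 in closed form.
    A hand-crafted sparsity prior would use this operator as the prior
    step; ODP instead parameterizes the prior's gradient or proximal
    operator directly with a CNN, leaving r defined only implicitly.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(prox_l1(np.array([-2.0, -0.3, 0.1, 1.5]), lam=0.5))  # [-1.5  0.   0.   1. ]
```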
3 Unrolled optimization with deep priors
We propose the ODP framework to incorporate knowledge of the image formation into deep convolutional
networks. The framework factors networks into data steps, which are functions of the measurements y
encoding prior information about the image formation model, and CNN steps, which represent statistical
image priors. The factorization follows a principled approach inspired by classical optimization methods,
thereby combining the best of deep models and classical algorithms.
3.1 Framework
The ODP framework is summarized by the network template in Algorithm 1. The design choices in
the template are the optimization algorithm, which defines the data step Γ and algorithm state z^k,
the number of iterations N for which the algorithm is unrolled, the function φ that initializes the algorithm
from the measurements y, and the CNN used in the prior step, whose output x^{k+1/2} represents either
∇r(x^k, θ^k) or prox_{r(·,θ^k)}(x^k), depending on the optimization algorithm. Figure 1 shows an example
ODP instance for deblurring under Gaussian noise.
Instances of the ODP framework have two complementary interpretations. From the perspective of
classical optimization based methods, an ODP architecture applies a standard optimization algorithm
but learns a prior defined by a CNN. From the perspective of deep learning, the network is a CNN
with layers tailored to the image formation model.
Algorithm 1 ODP network template to solve problem (1)
1: Initialization: (x^0, z^0) = φ(f, A, y, θ^0).
2: for k = 0 to N − 1 do
3:   x^{k+1/2} ← CNN(x^k, z^k, θ^k).
4:   (x^{k+1}, z^{k+1}) ← Γ(f, A, y, x^{k+1/2}, z^k, θ^k).
5: end for

ODP networks are motivated by minimizing the objective in problem (1), but they are trained
to minimize a higher-level loss, which is defined on a metric between the network output and the
ground-truth latent image over a training set of image/measurement pairs. Classical metrics for images
are mean-squared error, PSNR, or SSIM. Let Γ(y, θ) be the network output given measurements y and
parameters θ. Then we train the network by (approximately) solving the optimization problem
    minimize_θ  E_{x∼Ω} E_{y∼ω(Ax)} ℓ(x, Γ(y, θ)),    (2)

where θ is the optimization variable, ℓ is the chosen reconstruction loss, e.g., PSNR, and Ω is the
true distribution over images (as opposed to the parameterized approximation Ω(θ)). The ability to
train directly for expected reconstruction loss is a primary advantage of deep networks and unrolled
optimization over classical image optimization methods. In contrast to pretraining priors on natural
image training sets, directly training priors within the optimization algorithm allows ODP networks to
learn application-specific priors and efficiently share information between prior and data steps, requiring
drastically fewer iterations than classical methods.
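As a rough illustration of how Algorithm 1 composes prior and data steps into the network Γ(y, θ), a forward pass might be organized as in the following Python sketch, where cnn_prior, data_step, and init stand in for the prior CNN, the algorithm-specific data step Γ, and the initialization φ (all hypothetical placeholders, not the paper's implementation).

```python
def odp_network(f, A, y, theta, cnn_prior, data_step, init, N):
    """Illustrative forward pass through the template of Algorithm 1.

    cnn_prior(x, z, theta_k) stands in for the CNN prior step and
    data_step(f, A, y, x_half, z, theta_k) stands in for Gamma, the
    algorithm-specific data step that uses the image formation model.
    """
    x, z = init(f, A, y, theta[0])             # (x^0, z^0) = phi(f, A, y, theta^0)
    for k in range(N):
        x_half = cnn_prior(x, z, theta[k])     # x^{k+1/2} <- CNN(x^k, z^k, theta^k)
        x, z = data_step(f, A, y, x_half, z, theta[k])
    return x                                   # reconstruction Gamma(y, theta)
```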
Since ODPs are close to conventional CNNs, we can approximately solve problem (2) using the
many effective stochastic gradient based methods developed for CNNs (e.g., Adam [13]). Similarly, we
can initialize the portions of θ corresponding to the CNN prior steps using standard CNN initialization
schemes (e.g., Xavier initialization [11]). The remaining challenge in training ODPs is initializing the
portions of θ corresponding to algorithm parameters in the data step Γ. Most optimization algorithms
have only one or two parameters per data step, however, so an effective initialization can be found
through standard grid search.
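For example, the data-step schedule of the proximal gradient network in Section 3.2 is governed by only two scalars C_0 and C, so a coarse grid search such as the following sketch suffices to initialize them before end-to-end training (build_and_eval is a hypothetical helper, not part of the paper's code).

```python
import itertools
import numpy as np

def init_data_step_params(build_and_eval, c0_grid, c_grid):
    """Coarse grid search over the two data-step scalars (C0, C).

    build_and_eval(c0, c) is assumed to construct an ODP network with the
    step-size schedule alpha_k = c0 * c**(-k), train it briefly, and
    return a validation PSNR; the best pair then initializes end-to-end
    training of all parameters.
    """
    best_score, best_params = -np.inf, None
    for c0, c in itertools.product(c0_grid, c_grid):
        score = build_and_eval(c0, c)
        if score > best_score:
            best_score, best_params = score, (c0, c)
    return best_params
```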
3.2 Design choices
The ODP framework makes it straightforward to design a state-of-the-art network for solving an
inverse problem in imaging. The design choices are the choice of the optimization algorithm to unroll,
the CNN parameterization of the prior, and the initialization scheme. In this section we discuss these
design choices in detail and present defaults guided by the empirical results in Section 5.
Optimization algorithm The choice of optimization algorithm to unroll plays an important but
poorly understood role in the performance of unrolled optimization networks. The only formal
requirement is that each iteration of the unrolled algorithm be almost everywhere differentiable. Prior
work has unrolled the proximal gradient method [5], the half-quadratic splitting (HQS) algorithm
[21, 10], the alternating direction method of multipliers (ADMM) [27, 2], the Chambolle-Pock algorithm
[24, 4], ISTA [12], and a primal-dual algorithm with Bregman distances [17]. No clear consensus has
emerged as to which methods perform best in general or even for specific problems.
In the context of solving problem (1), we propose the proximal gradient method as a good default
choice. The method requires that the proximal operator of g(x) = f (Ax, y) and its Jacobian can be
computed efficiently. Algorithm 2 lists the ODP framework for the proximal gradient method. We
interpret the CNN prior as −α^k ∇r(x, θ^k). Note that for the proximal gradient network, the CNN
prior is naturally a residual network because its output x^{k+1/2} is summed with its input x^k in Step 4.
The algorithm parameters α^0, . . . , α^{N−1} represent the gradient step sizes. The proposed initialization
α^k = C_0 C^{−k} is based on an alternate interpretation of Algorithm 2 as an unrolled HQS method.
Adopting the aggressively decreasing α^k schedule from HQS minimizes the number of iterations needed [10].
Algorithm 2 ODP proximal gradient network.
1: Initialization: x^0 = φ(f, A, y, θ^0), α^k = C_0 C^{−k}, C_0 > 0, C > 0.
2: for k = 0 to N − 1 do
3:   x^{k+1/2} ← CNN(x^k, θ^k).
4:   x^{k+1} ← argmin_x α^k f(Ax, y) + (1/2)‖x − x^k − x^{k+1/2}‖_2^2.
5: end for
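For the deblurring setting of Figure 1, Step 4 of Algorithm 2 has a closed-form solution in the Fourier domain. The following NumPy sketch assumes circular (periodic) convolution and a Gaussian data term f(Ax, y) = (1/(2σ^2))‖k ∗ x − y‖_2^2; function and variable names are illustrative.

```python
import numpy as np

def deblur_data_step(v, y, kernel, alpha, sigma):
    """Closed-form Step 4 of Algorithm 2 for Gaussian deblurring.

    Solves argmin_x (alpha / (2 sigma^2)) ||k * x - y||^2 + 0.5 ||x - v||^2
    in the Fourier domain, where v = x^k + x^{k+1/2} is the CNN-corrected
    iterate and '*' denotes circular convolution.
    """
    K = np.fft.fft2(kernel, s=y.shape)       # Fourier transform of the blur kernel
    rho = alpha / sigma ** 2
    numerator = rho * np.conj(K) * np.fft.fft2(y) + np.fft.fft2(v)
    denominator = rho * np.abs(K) ** 2 + 1.0
    return np.real(np.fft.ifft2(numerator / denominator))
```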
In Section 5.5, we compare deblurring and compressed sensing MRI results for ODP with proximal
gradient, ADMM, linearized ADMM (LADMM), and gradient descent. The ODP formulations of
ADMM, LADMM, and gradient descent can be found in the supplement. We find that all algorithms
that approximately invert the image formation operator each iteration perform on par. Algorithms
such as ADMM and LADMM that incorporate Lagrange multipliers were at best slightly better than
simple primal algorithms like proximal gradient and gradient descent for the low number of iterations
typical for unrolled optimization methods.
CNN prior The choice of parameterizing each prior step as a separate CNN offers tremendous
flexibility, even allowing the learning of a specialized function for each step. Algorithm 2 naturally
introduces a residual connection to the CNN prior, so a standard residual CNN is a reasonable default
architecture choice. The experiments in Section 5 show this architecture achieves state-of-the-art
results, while being easy to train with random initialization.
Choosing a CNN prior presents a trade-off between increasing the number of algorithm iterations
N , which adds alternation between data and prior steps, and making the CNN deeper. For example,
in our experiments we found that for denoising, where the data step is trivial, larger CNN priors with
fewer algorithm iterations gave better results, while for deconvolution and MRI, where the data step is
a complicated global operation, smaller priors and more iterations gave better results.
Initialization The initialization function (x^0, z^0) = φ(f, A, y, θ^0) could in theory be an arbitrarily
complicated algorithm or neural network. We found that the simple initialization x^0 = A^H y, which is
known as backprojection, was sufficient for our applications [23, Ch. 25].
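Concretely, backprojection is cheap for the applications in Section 5: for deblurring with circular convolution A^H y correlates the observation with the kernel, and for CS MRI it is the inverse DFT of the zero-filled spectrum. A minimal NumPy sketch, assuming an orthonormal DFT scaling for the MRI case (helper names are illustrative):

```python
import numpy as np

def backproject_deblurring(y, kernel):
    """x^0 = A^H y for circular convolution: correlate y with the kernel."""
    K = np.fft.fft2(kernel, s=y.shape)
    return np.real(np.fft.ifft2(np.conj(K) * np.fft.fft2(y)))

def backproject_cs_mri(y, mask):
    """x^0 = A^H y for A = P F: inverse DFT of the zero-filled spectrum."""
    spectrum = np.zeros(mask.shape, dtype=complex)
    spectrum[mask] = y                       # place measured samples on the grid
    return np.fft.ifft2(spectrum, norm="ortho")
```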
4 Related work
The ODP framework generalizes and improves upon previous work on unrolled optimization and deep
models for inverse imaging problems.
Unrolled optimization networks An immediate choice in constructing unrolled optimization
networks for inverse problems in imaging is to parameterize the prior r(x, θ) as r(x, θ) = ‖Cx‖_1, where
C is a filterbank, meaning a linear operator given by Cx = (c_1 ∗ x, . . . , c_p ∗ x) for convolution kernels
c_1, . . . , c_p. The parameterization is inspired by hand-engineered priors that exploit the sparsity of
images in a particular (dual) basis, such as anisotropic total variation. Unrolled optimization networks
with learned sparsity priors have achieved good results but are limited in representational power by
the choice of a simple ℓ_1-norm [12, 24].
Field-of-experts A more sophisticated approach than learned sparsity priors is to parameterize
the prior gradient or proximal operator as a field-of-experts (FoE) g(Cx, θ), where C is again a
filterbank and g is a separable nonlinearity parameterized by θ, such as a sum of radial basis functions
[20, 21, 5, 27]. The ODP framework improves upon the FoE approaches both empirically, as shown in
the model comparisons in Section 5, and theoretically, as the FoE model is essentially a 2-layer CNN
and so has less representational power than deeper CNN priors.
Figure 2: A qualitative overview of results with ODP networks.
Deep models for direct inversion Several recent approaches propose CNNs that directly solve
specific imaging problems. These architectures resemble instances of the ODP framework, though
with far different motivations for the design. Schuler et al. propose a network for deblurring that
applies a single, fixed deconvolution step followed by a learned CNN, akin to a prior step in ODP
with a different initial iterate [22]. Xu et al. propose a network for deblurring that applies a single,
learned deconvolution step followed by a CNN, similar to a one iteration ODP network [26]. Wang
et al. propose a CNN for MRI whose output is averaged with observations in k-space, similar to a
one iteration ODP network but without jointly learning the prior and data steps [25]. We improve on
these deep models by recognizing the connection to classical optimization based methods via the ODP
framework and using the framework to design more powerful, multi-iteration architectures.
5 Experiments
In this section we present results and analysis of ODP networks for denoising, deblurring, and
compressed sensing MRI. Figure 2 shows a qualitative overview for this variety of inverse imaging
problems. Please see the supplement for details of the training procedure for each experiment.
5.1 Denoising
We consider the Gaussian denoising problem with image formation y = x + z, where z ∼ N(0, σ^2). The
corresponding Bayesian estimation problem (1) is

    minimize_x  (1/(2σ^2))‖x − y‖_2^2 + r(x, θ).    (3)
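For this problem the data step of Algorithm 2 reduces to a pixelwise blend of the noisy observation with the CNN-corrected iterate, as in the following sketch (illustrative names):

```python
def denoise_data_step(v, y, alpha, sigma):
    """Step 4 of Algorithm 2 for the denoising problem (3).

    argmin_x (alpha / (2 sigma^2)) ||x - y||^2 + 0.5 ||x - v||^2 has the
    pixelwise closed form below, where v = x^k + x^{k+1/2}; with A = I
    the data step is just a convex combination of y and v.
    """
    rho = alpha / sigma ** 2
    return (rho * y + v) / (rho + 1.0)
```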
We trained a 4 iteration proximal gradient ODP network with a 10 layer, 64 channel residual CNN
prior on the 400 image training set from [24]. Table 1 shows that the ODP network outperforms all
state-of-the-art methods on the 68 image test set evaluated in [24].
Table 1: Average PSNR (dB) for the noise level and test set from [24]. Timings were done on Xeon
3.2 GHz CPUs and a Titan X GPU.

Method         σ = 25   Time per image
BM3D [6]       28.56    2.57s (CPU)
EPLL [29]      28.68    108.72s (CPU)
Schmidt [21]   28.72    5.10s (CPU)
Wang [24]      28.79    0.011s (GPU)
Chen [5]       28.95    1.42s (CPU)
ODP            29.04    0.13s (GPU)

Table 2: Average PSNR (dB) for the kernels and test set from [26].

Method          Disk    Motion
Krishnan [14]   25.94   25.07
Levin [15]      24.54   24.47
Schuler [22]    24.67   25.27
Schmidt [21]    24.71   25.49
Xu [26]         26.01   27.92
ODP             26.11   28.49
Table 3: Average PSNR (dB) for kernels and test set of 11 images from [22]. We also evaluate Schuler
et al. and ODP on the BSDS500 data set for a lower variance comparison [16].
Model Gaussian (a) Gaussian (b) Gaussian (c) Box (d) Motion (e)
σ = 10.2 σ=2 σ = 10.2 σ = 2.5 σ = 2.5
EPLL [29] 24.04 26.64 21.36 21.04 29.25
Levin [15] 24.09 26.51 21.72 21.91 28.33
Krishnan [14] 24.17 26.60 21.73 22.07 28.17
IDD-BM3D [7] 24.68 27.13 21.99 22.69 29.41
Schuler [22] 24.76/26.19 27.23/28.41 22.20/24.06 22.75/24.37 29.42/29.91
ODP 24.89/26.20 27.44/28.44 22.26/23.97 23.17/24.37 30.54/30.50
5.2 Deblurring
We consider the problem of joint Gaussian denoising and deblurring, in which the latent image x is
convolved with a known blur kernel in addition to being corrupted by Gaussian noise. The image
formation model is y = k ∗ x + z, where k is the blur kernel and z ∼ N (0, σ 2 ). The corresponding
Bayesian estimation problem (1) is
    minimize_x  (1/(2σ^2))‖k ∗ x − y‖_2^2 + r(x, θ).    (4)
To demonstrate the benefit of ODP networks specialized to specific problem instances, we first train
models per kernel, as proposed in [26]. Specifically, we train 8 iteration proximal gradient ODP
networks with residual 5 layer, 64 channel CNN priors for the out-of-focus disk kernel and motion blur
kernel from [26]. Following Xu et al., we train one model per kernel on ImageNet [8], including clipping
and JPEG artifacts as required by the authors. Table 2 shows that the ODP networks outperform
prior work slightly on the low-pass disk kernel, which completely removes high frequency content, while
a substantial gain is achieved for the motion blur kernel, which preserves more frequency content and
thus benefits from the inverse image formation steps in ODP.
Next, we show that ODP networks can generalize across image formation models. Table 3 compares
ODP networks (same architecture as above) on the test scenarios from [22]. We trained one ODP
network for the four out-of-focus kernels and associated noise levels in [22]. The out-of-focus model is
on par with Schuler et al., even though Schuler et al. train a specialized model for each kernel and
associated noise level. We trained a second ODP network on randomly generated motion blur kernels.
This model outperforms Schuler et al. on the unseen test set motion blur kernel, even though Schuler
et al. trained specifically on the motion blur kernel from their test set.
5.3 Compressed sensing MRI
In compressed sensing (CS) MRI, a latent image x is measured in the Fourier domain with subsampling.
Following [27], we assume noise-free measurements. The image formation model is y = P F x, where
F is the DFT and P is a diagonal binary sampling matrix for a given subsampling pattern. The
corresponding Bayesian estimation problem (1) is

    minimize_x  r(x, θ)   subject to  P F x = y.    (5)

Table 4: Average PSNR (dB) on the brain MRI test set from [27] for different sampling ratios. Timings
were done on an Intel i7-4790k 4 GHz CPU and Titan X GPU.

Method          20%     30%     40%     50%     Time per image
PBDW [18]       36.34   38.64   40.31   41.81   35.36s (CPU)
PANO [19]       36.52   39.13   41.01   42.76   53.48s (CPU)
FDLCP [28]      36.95   39.13   40.62   42.00   52.22s (CPU)
ADMM-Net [27]   37.17   39.84   41.56   43.00   0.79s (CPU)
BM3D-MRI [9]    37.98   40.33   41.99   43.47   40.91s (CPU)
ODP             38.50   40.71   42.34   43.85   0.090s (GPU)
We trained an 8 iteration proximal gradient ODP network with a residual 7 layer, 64 channel CNN
prior on the 100 training images and pseudo-radial sampling patterns from [27], with sampling ratios
ranging from 20% to 50% of the Fourier domain. We evaluate on the 50 test images from the same work. Table 4
shows that the ODP network outperforms even the best alternative method, BM3D-MRI, for all
sampling patterns. The improvement over BM3D-MRI is larger for sparser (and thus more challenging)
sampling patterns. The fact that a single ODP model performed so well on all four sampling patterns,
particularly in comparison to the ADMM-Net models, which were trained separately per sampling
pattern, again demonstrates that ODP generalizes across image formation models.
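Because the measurements are assumed noise-free, the natural data step for problem (5) is the Euclidean projection onto the constraint set {x : P F x = y}, which simply re-inserts the measured Fourier coefficients. A minimal NumPy sketch, assuming an orthonormal DFT and a boolean sampling mask (names are illustrative):

```python
import numpy as np

def mri_data_step(v, y, mask):
    """Projection of v onto {x : P F x = y} for noise-free CS MRI.

    Measured Fourier coefficients (mask == True) are overwritten with the
    measurements y (assumed ordered consistently with the mask); the
    unmeasured coefficients keep the values of the CNN-corrected iterate
    v = x^k + x^{k+1/2}.
    """
    V = np.fft.fft2(v, norm="ortho")
    V[mask] = y                              # enforce data consistency exactly
    return np.fft.ifft2(V, norm="ortho")
```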
5.4 Contribution of prior information
In this section, we analyze how much of the reconstruction performance is due to incorporating the
image formation model in the data step and how much is due to the CNN prior steps. To this end, we
performed an ablation study where ODP models were trained without the data steps. The resulting
models are pure residual networks.
Table 5: Performance with and without the data step. Average PSNR (dB) for the noise level σ = 25
on the test set from [24], over 8 unseen motion blur kernels from [15] for the noise level σ = 2.55 on
BSDS500 [16], and over the MRI sampling patterns and test set from [27].

Application   Proximal gradient   Prior only
Denoising     29.04               29.04
Deblurring    31.04               26.04
CS MRI        41.35               37.70

Table 6: Comparison of different algorithms.

Model               Deblurring   CS MRI
Proximal gradient   31.04        41.35
ADMM                31.01        41.39
LADMM               30.17        41.37
Gradient descent    29.96        N/A
Table 5 shows that for denoising the residual network performed as well as the proximal gradient
ODP network, while for deblurring and CS MRI the proximal gradient network performed substantially
better. The difference between denoising and the other two inverse problems is that in the case of
denoising the ODP data step is trivial: inverting A = I. However, for deblurring the data step applies
B = (A^H A + γI)^{−1} A^H, a complicated global operation that for motion blur satisfies BA ≈ I. Similarly,
for CS MRI the ODP proximal gradient network applies B = A† in the data step, a complicated global
operation (involving a DFT) that satisfies BA ≈ I.
The results suggest an interpretation of the ODP proximal gradient networks as alternating between
applying local corrections in the CNN prior step and applying a global operator B such that BA ≈ I
in the data step. The CNNs in the ODP networks conceptually learn to denoise and correct errors
introduced by the approximate inversion of A with B, whereas the pure residual networks must learn
to denoise and invert A directly. When approximately inverting A is a complicated global operation,
direct inversion using residual networks poses an extreme challenge, overcome by the indirect approach
taken by the ODP networks.
5.5 Comparing algorithms
The existing work on unrolled optimization has done little to clarify which optimization algorithms
perform best when unrolled. We investigate the relative performance of unrolled optimization algorithms
in the context of ODP networks. We have compared ODP networks for ADMM, LADMM, proximal
gradient and gradient descent in Table 6, using the same initialization and network parameters as for
deblurring and CS MRI.
For deblurring, the proximal gradient and ADMM models, which apply the regularized pseudoinverse
(A^H A + γI)^{−1} A^H in the data step, outperform the LADMM and gradient descent models, which only
apply A and A^H. The results suggest that taking more aggressive steps to approximately invert A in
the data step improves performance. For CS MRI, all algorithms apply the pseudoinverse A† (because
A^H = A†) and have similar performance, which matches the observations for deblurring.
Our algorithm comparison shows minimal benefits to incorporating Lagrange multipliers, which is
expected for the relatively low number of iterations N = 8 in our models. The ODP ADMM networks
for deblurring and CS MRI are identical to the respective proximal gradient networks except that
ADMM includes Lagrange multipliers, and the performance is on par. For deblurring, LADMM and
gradient descent are similar architectures, but LADMM incorporates Lagrange multipliers and shows a
small performance gain. Note that gradient descent cannot be applied to CS MRI as problem (5) is
constrained.
6 Conclusion
The proposed ODP framework offers a principled approach to incorporating prior knowledge of the
image formation into deep networks for solving inverse problems in imaging, yielding state-of-the-art
results for denoising, deblurring, and CS MRI. The framework generalizes and outperforms previous
approaches to unrolled optimization and deep networks for direct inversion. The presented ablation
studies offer general insights into the benefits of prior information and what algorithms are most
suitable for unrolled optimization.
Although the class of imaging problems considered in this work lies at the core of imaging and
sensing, it is only a small fraction of the potential applications for ODP. In future work we will explore
ODP for blind inverse problems, in which the image formation operator A is not fully known, as well
as nonlinear image formation models. Outside of imaging, control is a promising field in which to
apply the ODP framework because deep networks may potentially benefit from prior knowledge of
physical dynamics.
References
[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.
SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical
learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning,
3(1):1–122, 2011.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to
imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[5] Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image
restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
5261–5269, 2015.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain
collaborative filtering. IEEE Trans. Image Processing, 16(8):2080–2095, 2007.
[7] A. Danielyan, V. Katkovnik, and K. Egiazarian. BM3D frames and variational image deblurring. IEEE
Trans. Image Processing, 21(4):1715–1728, 2012.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical
image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 248–255, 2009.
[9] E. Eksioglu. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-
MRI. Journal of Mathematical Imaging and Vision, 56(3):430–440, 2016.
[10] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions
on Image Processing, 4(7):932–946, 1995.
[11] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks.
In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 9, pages
249–256, 2010.
[12] K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In Proceedings of the International
Conference on Machine Learning, pages 399–406, 2010.
[13] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
2014.
[14] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural
Information Processing Systems, pages 1033–1041, 2009.
[15] A. Levin, R. Fergus, F. Durand, and W. Freeman. Image and depth from a conventional camera with a
coded aperture. ACM Transactions on Graphics, 26(3), 2007.
[16] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its
application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of
the IEEE International Conference on Computer Vision, pages 416–423, 2001.
[17] P. Ochs, R. Ranftl, T. Brox, and T. Pock. Techniques for gradient-based bilevel optimization with
non-smooth lower level problems. Journal of Mathematical Imaging and Vision, pages 1–20, 2016.
[18] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen. Undersampled MRI reconstruction with
patch-based directional wavelets. Magnetic Resonance Imaging, 30(7):964–977, 2012.
[19] X. Qu, Y. Hou, F. Lam, D. Guo, J. Zhong, and Z. Chen. Magnetic resonance image reconstruction from
undersampled measurements using a patch-based nonlocal operator. Medical Image Analysis, 18(6):843–856,
2014.
[20] S. Roth and M. Black. Fields of experts: A framework for learning image priors. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages 860–867, 2005.
[21] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.
[22] C. Schuler, H. Burger, S. Harmeling, and B. Scholkopf. A machine learning approach for non-blind image
deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 1067–1074, 2013.
[23] S. Smith. The Scientist and Engineer’s Guide to Digital Signal Processing. California Technical Publishing,
San Diego, 1997.
[24] S. Wang, S. Fidler, and R. Urtasun. Proximal deep structured models. In Advances in Neural Information
Processing Systems 29, pages 865–873. 2016.
[25] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang. Accelerating magnetic
resonance imaging via deep learning. In Proceedings of the IEEE International Symposium on Biomedical
Imaging, pages 514–517, 2016.
[26] L. Xu, J. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances
in Neural Information Processing Systems, pages 1790–1798, 2014.
[27] Y. Yang, J. Sun, H. Li, and Z. Xu. Deep ADMM-Net for compressive sensing MRI. In Advances in Neural
Information Processing Systems, pages 10–18, 2016.
[28] Z. Zhan, J.-F. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu. Fast multiclass dictionaries learning with
geometrical directions in MRI reconstruction. IEEE Transactions on Biomedical Engineering, 63(9):1850–
1861, 2016.
[29] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In
Proceedings of the IEEE International Conference on Computer Vision, pages 479–486, 2011.