Unrolled Optimization with Deep Priors
Steven Diamond∗ Vincent Sitzmann∗ Felix Heide Gordon Wetzstein
December 20, 2018
Abstract
A broad class of problems at the core of computational imaging, sensing, and low-level computer
vision reduces to the inverse problem of extracting latent images that follow a prior distribution,
from measurements taken under a known physical image formation model. Traditionally, hand-
crafted priors along with iterative optimization methods have been used to solve such problems. In
this paper we present unrolled optimization with deep priors, a principled framework for infusing
knowledge of the image formation into deep networks that solve inverse problems in imaging,
inspired by classical iterative methods. We show that instances of the framework outperform the
state-of-the-art by a substantial margin for a wide variety of imaging problems, such as denoising,
deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover, we conduct
experiments that explain how the framework is best used and why it outperforms previous methods.
1 Introduction
In inverse imaging problems, we seek to reconstruct a latent image from measurements taken under a
known physical image formation. Such inverse problems arise throughout computational photography,
computer vision, medical imaging, and scientific imaging. Residing in the early vision layers of every
autonomous vision system, they are essential for all vision based autonomous agents. Recent years have
seen tremendous progress in both classical and deep methods for solving inverse problems in imaging.
Classical and deep approaches have relative advantages and disadvantages. Classical algorithms, based
on formal optimization, exploit knowledge of the image formation model in a principled way, but
struggle to incorporate sophisticated learned models of natural images. Deep methods easily learn
complex statistics of natural images, but lack a systematic approach to incorporating prior knowledge
of the image formation model. What is missing is a general framework for designing deep networks
that incorporate prior information, as well as a clear understanding of when prior information is useful.
In this paper we propose unrolled optimization with deep priors (ODP): a principled, general
purpose framework for integrating prior knowledge into deep networks. We focus on applications of
the framework to inverse problems in imaging. Given an image formation model and a few generic,
high-level design choices, the ODP framework provides an easy to train, high performance network
architecture. The framework suggests novel network architectures that outperform prior work across a
variety of imaging problems.
The ODP framework is based on unrolled optimization, in which we truncate a classical iterative
optimization algorithm and interpret it as a deep network. Unrolling optimization has been a common
practice among practitioners in imaging, and training unrolled optimization models has recently been
explored for various imaging applications, all using variants of field-of-experts priors [12, 21, 5, 24].
We differ from existing approaches in that we propose a general framework for unrolling optimization
methods along with deep convolutional prior architectures within the unrolled optimization. By
training deep CNN priors within unrolled optimization architectures, instances of ODP outperform
state-of-the-art results on a broad variety of inverse imaging problems.
∗ These authors contributed equally.
Our empirical results clarify the benefits and limitations of encoding prior information for inverse
problems in deep networks. Layers that (approximately) invert the image formation operator are useful
because they simplify the reconstruction task to denoising and correcting artifacts introduced by the
inversion layers. On the other hand, prior layers improve network generalization, boosting performance
on unseen image formation operators. For deblurring and compressed sensing MRI, we found that a
single ODP model trained on many image formation operators outperforms existing state-of-the-art
methods where a specialized model was trained for each operator.
Moreover, we offer insight into the open question of what iterative algorithm is best for unrolled
optimization, given a linear image formation model. Our main finding is that simple primal algorithms
that (approximately) invert the image formation operator each iteration perform best.
In summary, our contributions are as follows:
1. We introduce ODP, a principled, general purpose framework for inverse problems in imaging,
which incorporates prior knowledge of the image formation into deep networks.
2. We demonstrate that instances of the ODP framework for denoising, deblurring, and compressed
sensing MRI outperform state-of-the-art results by a large margin.
3. We present empirically derived insights on how the ODP framework and related approaches are
best used, such as when exploiting prior information is advantageous and which optimization
algorithms are most suitable for unrolling.
2 Motivation
Bayesian model The proposed ODP framework is inspired by an extensive body of work on solving
inverse problems in imaging via maximum-a-posteriori (MAP) estimation under a Bayesian model. In
the Bayesian model, an unknown image x is drawn from a prior distribution Ω(θ) with parameters θ.
The imaging system applies a linear operator A to this image, representing all optical processes in the
capture, and then measures an image y on the sensor, drawn from a noise distribution ω(Ax) that
models sensor noise, e.g., read noise, and noise in the signal itself, e.g., photon-shot noise.
Let P (y|Ax) be the probability of sampling y from ω(Ax) and P (x; θ) be the probability of sampling
x from Ω(θ). Then the probability of an unknown image x yielding an observation y is proportional to
P (y|Ax)P (x; θ).
The MAP point-estimate of x is given by x = argmax_x P(y|Ax)P(x; θ), or equivalently

    x = argmin_x f(y, Ax) + r(x, θ),    (1)
where the data term f (y, Ax) = − log P (y|Ax) and prior term r(x, θ) = − log P (x; θ) are negative
log-likelihoods. Computing x thus involves solving an optimization problem [3, Chap. 7].
Unrolled iterative methods A large variety of algorithms have been developed for solving
problem (1) efficiently for different convex data terms and priors (e.g., FISTA [1], Chambolle-
Pock [4], ADMM [2]). The majority of these algorithms are iterative methods, in which a mapping
Γ(x^k, A, y, θ) → x^{k+1} is applied repeatedly, starting from an initial point x^0, to generate a series of
iterates that converge to a solution x⋆.
Iterative methods are usually terminated based on a stopping condition that ensures theoretical
convergence properties. An alternative approach is to execute a pre-determined number of iterations
N , in other words unrolling the optimization algorithm. This approach is motivated by the fact
that for many imaging applications very high accuracy, e.g., convergence below a tolerance of 10^{-6} for
every pixel, is not needed in practice, as opposed to optimization problems in, for instance,
control. Fixing the number of iterations allows us to view the iterative method as an explicit function
Γ^N(·, A, y, θ) → x^N of the initial point x^0. Parameters such as θ may be fixed across all iterations or
vary by iteration. The unrolled iterative algorithm can be interpreted as a deep network [24].
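To make the unrolling concrete, the following is a minimal NumPy sketch (all names are illustrative, not from the paper) of a gradient method on a least-squares data term truncated to a fixed number of iterations; the result is an explicit function of the initial point whose parameters, here the step sizes, can be trained like any other network.

```python
import numpy as np

def unrolled_gradient_method(x0, A, y, step_sizes):
    """Run a fixed number of gradient steps on 0.5 * ||A x - y||^2.

    Truncating at N = len(step_sizes) iterations turns the iterative
    solver into an explicit function of the initial point x0 whose
    parameters (here, the step sizes) can be learned end to end.
    """
    x = x0
    for alpha in step_sizes:          # one "layer" per unrolled iteration
        grad = A.T @ (A @ x - y)      # gradient of the data term
        x = x - alpha * grad
    return x

# Tiny usage example on a random least-squares instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
x_hat = unrolled_gradient_method(np.zeros(10), A, y, step_sizes=[0.01] * 5)
```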
Figure 1: A proximal gradient ODP network for deblurring under Gaussian noise, mapping the
observation y into an estimate x̂ of the latent image x. Here F is the DFT, k is the blur kernel, and
K is its Fourier transform.
Parameterization The parameters θ in an unrolled iterative algorithm are the algorithm hyper-
parameters, such as step sizes, and model parameters defining the prior. Generally the number of
algorithm hyperparameters is small (1-5 per iteration), so the model capacity of the unrolled algorithm
is primarily determined by the representation of the prior.
Many efficient iterative optimization methods do not interact with the prior term r directly, but
instead minimize r via its (sub)gradient or proximal operator prox_{r(·,θ)}, defined as

    prox_{r(·,θ)}(v) = argmin_z r(z, θ) + (1/2)‖z − v‖_2^2.
The proximal operator is a generalization of Euclidean projection. In the ODP framework, we propose
to parameterize the gradient or proximal operator of r directly and define r implicitly.
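For intuition, the proximal operator of the hand-crafted sparsity prior r(z) = λ‖z‖_1 has the familiar soft-thresholding form sketched below; in ODP this closed-form operator is replaced by the output of a learned CNN. (A minimal NumPy sketch; names are illustrative.)

```python
import numpy as np

def prox_l1(v, lam):
    """Soft thresholding: the proximal operator of r(z) = lam * ||z||_1.

    Solves argmin_z lam * ||z||_1 + 0.5 * ||z - v||_2^2 in closed form.
    A hand-crafted sparsity prior would use this operator as the prior
    step; ODP instead parameterizes the prior's gradient or proximal
    operator directly with a CNN, leaving r defined only implicitly.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(prox_l1(np.array([-2.0, -0.3, 0.1, 1.5]), lam=0.5))  # [-1.5  0.   0.   1. ]
```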
3 Unrolled optimization with deep priors
We propose the ODP framework to incorporate knowledge of the image formation into deep convolutional
networks. The framework factors networks into data steps, which are functions of the measurements y
encoding prior information about the image formation model, and CNN steps, which represent statistical
image priors. The factorization follows a principled approach inspired by classical optimization methods,
thereby combining the best of deep models and classical algorithms.
3.1 Framework
The ODP framework is summarized by the network template in Algorithm 1. The design choices in
the template are the optimization algorithm, which defines the data step Γ and algorithm state z^k,
the number of iterations N for which the algorithm is unrolled, the function φ that initializes the algorithm
from the measurements y, and the CNN used in the prior step, whose output x^{k+1/2} represents either
∇r(x^k, θ^k) or prox_{r(·,θ^k)}(x^k), depending on the optimization algorithm. Figure 1 shows an example
ODP instance for deblurring under Gaussian noise.
Instances of the ODP framework have two complementary interpretations. From the perspective of
classical optimization based methods, an ODP architecture applies a standard optimization algorithm
but learns a prior defined by a CNN. From the perspective of deep learning, the network is a CNN
with layers tailored to the image formation model.
Algorithm 1 ODP network template to solve problem (1)
1: Initialization: (x^0, z^0) = φ(f, A, y, θ^0).
2: for k = 0 to N − 1 do
3:   x^{k+1/2} ← CNN(x^k, z^k, θ^k).
4:   (x^{k+1}, z^{k+1}) ← Γ(f, A, y, x^{k+1/2}, z^k, θ^k).
5: end for

ODP networks are motivated by minimizing the objective in problem (1), but they are trained
to minimize a higher-level loss, which is defined on a metric between the network output and the
ground-truth latent image over a training set of image/measurement pairs. Classical metrics for images
are mean-squared error, PSNR, or SSIM. Let Γ(y, θ) be the network output given measurements y and
parameters θ. Then we train the network by (approximately) solving the optimization problem
    minimize_θ  E_{x∼Ω} E_{y∼ω(Ax)} ℓ(x, Γ(y, θ)),    (2)

where θ is the optimization variable, ℓ is the chosen reconstruction loss, e.g., PSNR, and Ω is the
true distribution over images (as opposed to the parameterized approximation Ω(θ)). The ability to
train directly for expected reconstruction loss is a primary advantage of deep networks and unrolled
optimization over classical image optimization methods. In contrast to pretraining priors on natural
image training sets, directly training priors within the optimization algorithm allows ODP networks to
learn application-specific priors and efficiently share information between prior and data steps, requiring
drastically fewer iterations than classical methods.
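As a rough illustration of how Algorithm 1 composes prior and data steps into the network Γ(y, θ), a forward pass might be organized as in the following Python sketch, where cnn_prior, data_step, and init stand in for the prior CNN, the algorithm-specific data step Γ, and the initialization φ (all hypothetical placeholders, not the paper's implementation).

```python
def odp_network(f, A, y, theta, cnn_prior, data_step, init, N):
    """Illustrative forward pass through the template of Algorithm 1.

    cnn_prior(x, z, theta_k) stands in for the CNN prior step and
    data_step(f, A, y, x_half, z, theta_k) stands in for Gamma, the
    algorithm-specific data step that uses the image formation model.
    """
    x, z = init(f, A, y, theta[0])             # (x^0, z^0) = phi(f, A, y, theta^0)
    for k in range(N):
        x_half = cnn_prior(x, z, theta[k])     # x^{k+1/2} <- CNN(x^k, z^k, theta^k)
        x, z = data_step(f, A, y, x_half, z, theta[k])
    return x                                   # reconstruction Gamma(y, theta)
```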
Since ODPs are close to conventional CNNs, we can approximately solve problem (2) using the
many effective stochastic gradient based methods developed for CNNs (e.g., Adam [13]). Similarly, we
can initialize the portions of θ corresponding to the CNN prior steps using standard CNN initialization
schemes (e.g., Xavier initialization [11]). The remaining challenge in training ODPs is initializing the
portions of θ corresponding to algorithm parameters in the data step Γ. Most optimization algorithms
have only one or two parameters per data step, however, so an effective initialization can be found
through standard grid search.
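For example, the data-step schedule of the proximal gradient network in Section 3.2 is governed by only two scalars C_0 and C, so a coarse grid search such as the following sketch suffices to initialize them before end-to-end training (build_and_eval is a hypothetical helper, not part of the paper's code).

```python
import itertools
import numpy as np

def init_data_step_params(build_and_eval, c0_grid, c_grid):
    """Coarse grid search over the two data-step scalars (C0, C).

    build_and_eval(c0, c) is assumed to construct an ODP network with the
    step-size schedule alpha_k = c0 * c**(-k), train it briefly, and
    return a validation PSNR; the best pair then initializes end-to-end
    training of all parameters.
    """
    best_score, best_params = -np.inf, None
    for c0, c in itertools.product(c0_grid, c_grid):
        score = build_and_eval(c0, c)
        if score > best_score:
            best_score, best_params = score, (c0, c)
    return best_params
```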
3.2 Design choices
The ODP framework makes it straightforward to design a state-of-the-art network for solving an
inverse problem in imaging. The design choices are the choice of the optimization algorithm to unroll,
the CNN parameterization of the prior, and the initialization scheme. In this section we discuss these
design choices in detail and present defaults guided by the empirical results in Section 5.
Optimization algorithm The choice of optimization algorithm to unroll plays an important but
poorly understood role in the performance of unrolled optimization networks. The only formal
requirement is that each iteration of the unrolled algorithm be almost everywhere differentiable. Prior
work has unrolled the proximal gradient method [5], the half-quadratic splitting (HQS) algorithm
[21, 10], the alternating direction method of multipliers (ADMM) [27, 2], the Chambolle-Pock algorithm
[24, 4], ISTA [12], and a primal-dual algorithm with Bregman distances [17]. No clear consensus has
emerged as to which methods perform best in general or even for specific problems.
In the context of solving problem (1), we propose the proximal gradient method as a good default
choice. The method requires that the proximal operator of g(x) = f (Ax, y) and its Jacobian can be
computed efficiently. Algorithm 2 lists the ODP framework for the proximal gradient method. We
interpret the CNN prior as −α^k ∇r(x, θ^k). Note that for the proximal gradient network, the CNN
prior is naturally a residual network because its output x^{k+1/2} is summed with its input x^k in Step 4.
The algorithm parameters α^0, . . . , α^{N−1} represent the gradient step sizes. The proposed initialization
α^k = C_0 C^{−k} is based on an alternate interpretation of Algorithm 2 as an unrolled HQS method.
Adopting the aggressively decreasing α^k schedule from HQS minimizes the number of iterations needed [10].
Algorithm 2 ODP proximal gradient network.
1: Initialization: x^0 = φ(f, A, y, θ^0), α^k = C_0 C^{−k}, C_0 > 0, C > 0.
2: for k = 0 to N − 1 do
3:   x^{k+1/2} ← CNN(x^k, θ^k).
4:   x^{k+1} ← argmin_x α^k f(Ax, y) + (1/2)‖x − x^k − x^{k+1/2}‖_2^2.
5: end for
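For the deblurring setting of Figure 1, Step 4 of Algorithm 2 has a closed-form solution in the Fourier domain. The following NumPy sketch assumes circular (periodic) convolution and a Gaussian data term f(Ax, y) = (1/(2σ^2))‖k ∗ x − y‖_2^2; function and variable names are illustrative.

```python
import numpy as np

def deblur_data_step(v, y, kernel, alpha, sigma):
    """Closed-form Step 4 of Algorithm 2 for Gaussian deblurring.

    Solves argmin_x (alpha / (2 sigma^2)) ||k * x - y||^2 + 0.5 ||x - v||^2
    in the Fourier domain, where v = x^k + x^{k+1/2} is the CNN-corrected
    iterate and '*' denotes circular convolution.
    """
    K = np.fft.fft2(kernel, s=y.shape)       # Fourier transform of the blur kernel
    rho = alpha / sigma ** 2
    numerator = rho * np.conj(K) * np.fft.fft2(y) + np.fft.fft2(v)
    denominator = rho * np.abs(K) ** 2 + 1.0
    return np.real(np.fft.ifft2(numerator / denominator))
```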
In Section 5.5, we compare deblurring and compressed sensing MRI results for ODP with proximal
gradient, ADMM, linearized ADMM (LADMM), and gradient descent. The ODP formulations of
ADMM, LADMM, and gradient descent can be found in the supplement. We find that all algorithms
that approximately invert the image formation operator each iteration perform on par. Algorithms
such as ADMM and LADMM that incorporate Lagrange multipliers were at best slightly better than
simple primal algorithms like proximal gradient and gradient descent for the low number of iterations
typical for unrolled optimization methods.
CNN prior The choice of parameterizing each prior step as a separate CNN offers tremendous
flexibility, even allowing the learning of a specialized function for each step. Algorithm 2 naturally
introduces a residual connection to the CNN prior, so a standard residual CNN is a reasonable default
architecture choice. The experiments in Section 5 show this architecture achieves state-of-the-art
results, while being easy to train with random initialization.
Choosing a CNN prior presents a trade-off between increasing the number of algorithm iterations
N , which adds alternation between data and prior steps, and making the CNN deeper. For example,
in our experiments we found that for denoising, where the data step is trivial, larger CNN priors with
fewer algorithm iterations gave better results, while for deconvolution and MRI, where the data step is
a complicated global operation, smaller priors and more iterations gave better results.
Initialization The initialization function (x^0, z^0) = φ(f, A, y, θ^0) could in theory be an arbitrarily
complicated algorithm or neural network. We found that the simple initialization x^0 = A^H y, which is
known as backprojection, was sufficient for our applications [23, Ch. 25].
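Concretely, backprojection is cheap for the applications in Section 5: for deblurring with circular convolution A^H y correlates the observation with the kernel, and for CS MRI it is the inverse DFT of the zero-filled spectrum. A minimal NumPy sketch, assuming an orthonormal DFT scaling for the MRI case (helper names are illustrative):

```python
import numpy as np

def backproject_deblurring(y, kernel):
    """x^0 = A^H y for circular convolution: correlate y with the kernel."""
    K = np.fft.fft2(kernel, s=y.shape)
    return np.real(np.fft.ifft2(np.conj(K) * np.fft.fft2(y)))

def backproject_cs_mri(y, mask):
    """x^0 = A^H y for A = P F: inverse DFT of the zero-filled spectrum."""
    spectrum = np.zeros(mask.shape, dtype=complex)
    spectrum[mask] = y                       # place measured samples on the grid
    return np.fft.ifft2(spectrum, norm="ortho")
```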
4 Related work
The ODP framework generalizes and improves upon previous work on unrolled optimization and deep
models for inverse imaging problems.
Unrolled optimization networks An immediate choice in constructing unrolled optimization
networks for inverse problems in imaging is to parameterize the prior r(x, θ) as r(x, θ) = ‖Cx‖_1, where
C is a filterbank, meaning a linear operator given by Cx = (c_1 ∗ x, . . . , c_p ∗ x) for convolution kernels
c_1, . . . , c_p. The parameterization is inspired by hand-engineered priors that exploit the sparsity of
images in a particular (dual) basis, such as anisotropic total variation. Unrolled optimization networks
with learned sparsity priors have achieved good results but are limited in representational power by
the choice of a simple ℓ_1-norm [12, 24].
Field-of-experts A more sophisticated approach than learned sparsity priors is to parameterize
the prior gradient or proximal operator as a field-of-experts (FoE) g(Cx, θ), where C is again a
filterbank and g is a separable nonlinearity parameterized by θ, such as a sum of radial basis functions
[20, 21, 5, 27]. The ODP framework improves upon the FoE approaches both empirically, as shown in
the model comparisons in Section 5, and theoretically, as the FoE model is essentially a 2-layer CNN
and so has less representational power than deeper CNN priors.
Figure 2: A qualitative overview of results with ODP networks.
Deep models for direct inversion Several recent approaches propose CNNs that directly solve
specific imaging problems. These architectures resemble instances of the ODP framework, though
with far different motivations for the design. Schuler et al. propose a network for deblurring that
applies a single, fixed deconvolution step followed by a learned CNN, akin to a prior step in ODP
with a different initial iterate [22]. Xu et al. propose a network for deblurring that applies a single,
learned deconvolution step followed by a CNN, similar to a one iteration ODP network [26]. Wang
et al. propose a CNN for MRI whose output is averaged with observations in k-space, similar to a
one iteration ODP network but without jointly learning the prior and data steps [25]. We improve on
these deep models by recognizing the connection to classical optimization based methods via the ODP
framework and using the framework to design more powerful, multi-iteration architectures.
5 Experiments
In this section we present results and analysis of ODP networks for denoising, deblurring, and
compressed sensing MRI. Figure 2 shows a qualitative overview for this variety of inverse imaging
problems. Please see the supplement for details of the training procedure for each experiment.
5.1 Denoising
We consider the Gaussian denoising problem with image formation y = x + z, where z ∼ N(0, σ^2). The
corresponding Bayesian estimation problem (1) is

    minimize_x  (1/(2σ^2))‖x − y‖_2^2 + r(x, θ).    (3)
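For this problem the data step of Algorithm 2 reduces to a pixelwise blend of the noisy observation with the CNN-corrected iterate, as in the following sketch (illustrative names):

```python
def denoise_data_step(v, y, alpha, sigma):
    """Step 4 of Algorithm 2 for the denoising problem (3).

    argmin_x (alpha / (2 sigma^2)) ||x - y||^2 + 0.5 ||x - v||^2 has the
    pixelwise closed form below, where v = x^k + x^{k+1/2}; with A = I
    the data step is just a convex combination of y and v.
    """
    rho = alpha / sigma ** 2
    return (rho * y + v) / (rho + 1.0)
```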
We trained a 4 iteration proximal gradient ODP network with a 10 layer, 64 channel residual CNN
prior on the 400 image training set from [24]. Table 1 shows that the ODP network outperforms all
state-of-the-art methods on the 68 image test set evaluated in [24].
Table 1: Average PSNR (dB) for the noise level and test set from [24]. Timings were done on Xeon
3.2 GHz CPUs and a Titan X GPU.

Method         σ = 25   Time per image
BM3D [6]       28.56    2.57s (CPU)
EPLL [29]      28.68    108.72s (CPU)
Schmidt [21]   28.72    5.10s (CPU)
Wang [24]      28.79    0.011s (GPU)
Chen [5]       28.95    1.42s (CPU)
ODP            29.04    0.13s (GPU)

Table 2: Average PSNR (dB) for the kernels and test set from [26].

Method          Disk    Motion
Krishnan [14]   25.94   25.07
Levin [15]      24.54   24.47
Schuler [22]    24.67   25.27
Schmidt [21]    24.71   25.49
Xu [26]         26.01   27.92
ODP             26.11   28.49
Table 3: Average PSNR (dB) for kernels and test set of 11 images from [22]. We also evaluate Schuler
et al. and ODP on the BSDS500 data set for a lower variance comparison [16].
Model Gaussian (a) Gaussian (b) Gaussian (c) Box (d) Motion (e)
σ = 10.2 σ=2 σ = 10.2 σ = 2.5 σ = 2.5
EPLL [29] 24.04 26.64 21.36 21.04 29.25
Levin [15] 24.09 26.51 21.72 21.91 28.33
Krishnan [14] 24.17 26.60 21.73 22.07 28.17
IDD-BM3D [7] 24.68 27.13 21.99 22.69 29.41
Schuler [22] 24.76/26.19 27.23/28.41 22.20/24.06 22.75/24.37 29.42/29.91
ODP 24.89/26.20 27.44/28.44 22.26/23.97 23.17/24.37 30.54/30.50
5.2 Deblurring
We consider the problem of joint Gaussian denoising and deblurring, in which the latent image x is
convolved with a known blur kernel in addition to being corrupted by Gaussian noise. The image
formation model is y = k ∗ x + z, where k is the blur kernel and z ∼ N (0, σ 2 ). The corresponding
Bayesian estimation problem (1) is
    minimize_x  (1/(2σ^2))‖k ∗ x − y‖_2^2 + r(x, θ).    (4)
To demonstrate the benefit of ODP networks specialized to specific problem instances, we first train
models per kernel, as proposed in [26]. Specifically, we train 8 iteration proximal gradient ODP
networks with residual 5 layer, 64 channel CNN priors for the out-of-focus disk kernel and motion blur
kernel from [26]. Following Xu et al., we train one model per kernel on ImageNet [8], including clipping
and JPEG artifacts as required by the authors. Table 2 shows that the ODP networks outperform
prior work slightly on the low-pass disk kernel, which completely removes high frequency content, while
a substantial gain is achieved for the motion blur kernel, which preserves more frequency content and
thus benefits from the inverse image formation steps in ODP.
Next, we show that ODP networks can generalize across image formation models. Table 3 compares
ODP networks (same architecture as above) on the test scenarios from [22]. We trained one ODP
network for the four out-of-focus kernels and associated noise levels in [22]. The out-of-focus model is
on par with Schuler et al., even though Schuler et al. train a specialized model for each kernel and
associated noise level. We trained a second ODP network on randomly generated motion blur kernels.
This model outperforms Schuler et al. on the unseen test set motion blur kernel, even though Schuler
et al. trained specifically on the motion blur kernel from their test set.
5.3 Compressed sensing MRI
In compressed sensing (CS) MRI, a latent image x is measured in the Fourier domain with subsampling.
Following [27], we assume noise-free measurements. The image formation model is y = P F x, where
F is the DFT and P is a diagonal binary sampling matrix for a given subsampling pattern. The
corresponding Bayesian estimation problem (1) is

    minimize_x  r(x, θ)   subject to  P F x = y.    (5)

Table 4: Average PSNR (dB) on the brain MRI test set from [27] for different sampling ratios. Timings
were done on an Intel i7-4790k 4 GHz CPU and Titan X GPU.

Method          20%     30%     40%     50%     Time per image
PBDW [18]       36.34   38.64   40.31   41.81   35.36s (CPU)
PANO [19]       36.52   39.13   41.01   42.76   53.48s (CPU)
FDLCP [28]      36.95   39.13   40.62   42.00   52.22s (CPU)
ADMM-Net [27]   37.17   39.84   41.56   43.00   0.79s (CPU)
BM3D-MRI [9]    37.98   40.33   41.99   43.47   40.91s (CPU)
ODP             38.50   40.71   42.34   43.85   0.090s (GPU)
We trained an 8 iteration proximal gradient ODP network with a residual 7 layer, 64 channel CNN
prior on the 100 training images and pseudo-radial sampling patterns from [27], with sampling ratios
ranging from 20% to 50% of the Fourier domain. We evaluate on the 50 test images from the same work. Table 4
shows that the ODP network outperforms even the best alternative method, BM3D-MRI, for all
sampling patterns. The improvement over BM3D-MRI is larger for sparser (and thus more challenging)
sampling patterns. The fact that a single ODP model performed so well on all four sampling patterns,
particularly in comparison to the ADMM-Net models, which were trained separately per sampling
pattern, again demonstrates that ODP generalizes across image formation models.
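Because the measurements are assumed noise-free, the natural data step for problem (5) is the Euclidean projection onto the constraint set {x : P F x = y}, which simply re-inserts the measured Fourier coefficients. A minimal NumPy sketch, assuming an orthonormal DFT and a boolean sampling mask (names are illustrative):

```python
import numpy as np

def mri_data_step(v, y, mask):
    """Projection of v onto {x : P F x = y} for noise-free CS MRI.

    Measured Fourier coefficients (mask == True) are overwritten with the
    measurements y (assumed ordered consistently with the mask); the
    unmeasured coefficients keep the values of the CNN-corrected iterate
    v = x^k + x^{k+1/2}.
    """
    V = np.fft.fft2(v, norm="ortho")
    V[mask] = y                              # enforce data consistency exactly
    return np.fft.ifft2(V, norm="ortho")
```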
5.4 Contribution of prior information
In this section, we analyze how much of the reconstruction performance is due to incorporating the
image formation model in the data step and how much is due to the CNN prior steps. To this end, we
performed an ablation study where ODP models were trained without the data steps. The resulting
models are pure residual networks.
Table 5: Performance with and without the data step. Average PSNR (dB) for the noise level σ = 25
on the test set from [24], over 8 unseen motion blur kernels from [15] for the noise level σ = 2.55 on
BSDS500 [16], and over the MRI sampling patterns and test set from [27].

Application   Proximal gradient   Prior only
Denoising     29.04               29.04
Deblurring    31.04               26.04
CS MRI        41.35               37.70

Table 6: Comparison of different algorithms.

Model               Deblurring   CS MRI
Proximal gradient   31.04        41.35
ADMM                31.01        41.39
LADMM               30.17        41.37
Gradient descent    29.96        N/A
Table 5 shows that for denoising the residual network performed as well as the proximal gradient
ODP network, while for deblurring and CS MRI the proximal gradient network performed substantially
better. The difference between denoising and the other two inverse problems is that in the case of
denoising the ODP data step is trivial: inverting A = I. However, for deblurring the data step applies
B = (A^H A + γI)^{−1} A^H, a complicated global operation that for motion blur satisfies BA ≈ I. Similarly,
for CS MRI the ODP proximal gradient network applies B = A† in the data step, a complicated global
operation (involving a DFT) that satisfies BA ≈ I.
The results suggest an interpretation of the ODP proximal gradient networks as alternating between
applying local corrections in the CNN prior step and applying a global operator B such that BA ≈ I
in the data step. The CNNs in the ODP networks conceptually learn to denoise and correct errors
introduced by the approximate inversion of A with B, whereas the pure residual networks must learn
to denoise and invert A directly. When approximately inverting A is a complicated global operation,
direct inversion using residual networks poses an extreme challenge, overcome by the indirect approach
taken by the ODP networks.
5.5 Comparing algorithms
The existing work on unrolled optimization has done little to clarify which optimization algorithms
perform best when unrolled. We investigate the relative performance of unrolled optimization algorithms
in the context of ODP networks. We have compared ODP networks for ADMM, LADMM, proximal
gradient and gradient descent in Table 6, using the same initialization and network parameters as for
deblurring and CS MRI.
For deblurring, the proximal gradient and ADMM models, which apply the regularized pseudoinverse
(A^H A + γI)^{−1} A^H in the data step, outperform the LADMM and gradient descent models, which only
apply A and A^H. The results suggest that taking more aggressive steps to approximately invert A in
the data step improves performance. For CS MRI, all algorithms apply the pseudoinverse A† (because
A^H = A†) and have similar performance, which matches the observations for deblurring.
Our algorithm comparison shows minimal benefits to incorporating Lagrange multipliers, which is
expected for the relatively low number of iterations N = 8 in our models. The ODP ADMM networks
for deblurring and CS MRI are identical to the respective proximal gradient networks except that
ADMM includes Lagrange multipliers, and the performance is on par. For deblurring, LADMM and
gradient descent are similar architectures, but LADMM incorporates Lagrange multipliers and shows a
small performance gain. Note that gradient descent cannot be applied to CS MRI as problem (5) is
constrained.
6 Conclusion
The proposed ODP framework offers a principled approach to incorporating prior knowledge of the
image formation into deep networks for solving inverse problems in imaging, yielding state-of-the-art
results for denoising, deblurring, and CS MRI. The framework generalizes and outperforms previous
approaches to unrolled optimization and deep networks for direct inversion. The presented ablation
studies offer general insights into the benefits of prior information and what algorithms are most
suitable for unrolled optimization.
Although the class of imaging problems considered in this work lies at the core of imaging and
sensing, it is only a small fraction of the potential applications for ODP. In future work we will explore
ODP for blind inverse problems, in which the image formation operator A is not fully known, as well
as nonlinear image formation models. Outside of imaging, control is a promising field in which to
apply the ODP framework because deep networks may potentially benefit from prior knowledge of
physical dynamics.
References
[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.
SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical
learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning,
3(1):1–122, 2011.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to
imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[5] Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image
restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
5261–5269, 2015.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain
collaborative filtering. IEEE Trans. Image Processing, 16(8):2080–2095, 2007.
[7] A. Danielyan, V. Katkovnik, and K. Egiazarian. BM3D frames and variational image deblurring. IEEE
Trans. Image Processing, 21(4):1715–1728, 2012.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical
image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 248–255, 2009.
[9] E. Eksioglu. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-
MRI. Journal of Mathematical Imaging and Vision, 56(3):430–440, 2016.
[10] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions
on Image Processing, 4(7):932–946, 1995.
[11] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks.
In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 9, pages
249–256, 2010.
[12] K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In Proceedings of the International
Conference on Machine Learning, pages 399–406, 2010.
[13] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
2014.
[14] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural
Information Processing Systems, pages 1033–1041, 2009.
[15] A. Levin, R. Fergus, F. Durand, and W. Freeman. Image and depth from a conventional camera with a
coded aperture. ACM Transactions on Graphics, 26(3), 2007.
[16] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its
application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of
the IEEE International Conference on Computer Vision, pages 416–423, 2001.
[17] P. Ochs, R. Ranftl, T. Brox, and T. Pock. Techniques for gradient-based bilevel optimization with
non-smooth lower level problems. Journal of Mathematical Imaging and Vision, pages 1–20, 2016.
[18] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen. Undersampled MRI reconstruction with
patch-based directional wavelets. Magnetic Resonance Imaging, 30(7):964–977, 2012.
[19] X. Qu, Y. Hou, F. Lam, D. Guo, J. Zhong, and Z. Chen. Magnetic resonance image reconstruction from
undersampled measurements using a patch-based nonlocal operator. Medical Image Analysis, 18(6):843–856,
2014.
[20] S. Roth and M. Black. Fields of experts: A framework for learning image priors. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages 860–867, 2005.
[21] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.
[22] C. Schuler, H. Burger, S. Harmeling, and B. Scholkopf. A machine learning approach for non-blind image
deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 1067–1074, 2013.
[23] S. Smith. The Scientist and Engineer’s Guide to Digital Signal Processing. California Technical Publishing,
San Diego, 1997.
[24] S. Wang, S. Fidler, and R. Urtasun. Proximal deep structured models. In Advances in Neural Information
Processing Systems 29, pages 865–873. 2016.
[25] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang. Accelerating magnetic
resonance imaging via deep learning. In Proceedings of the IEEE International Symposium on Biomedical
Imaging, pages 514–517, 2016.
[26] L. Xu, J. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances
in Neural Information Processing Systems, pages 1790–1798, 2014.
[27] Y. Yang, J. Sun, H. Li, and Z. Xu. Deep ADMM-Net for compressive sensing MRI. In Advances in Neural
Information Processing Systems, pages 10–18, 2016.
[28] Z. Zhan, J.-F. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu. Fast multiclass dictionaries learning with
geometrical directions in MRI reconstruction. IEEE Transactions on Biomedical Engineering, 63(9):1850–
1861, 2016.
[29] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In
Proceedings of the IEEE International Conference on Computer Vision, pages 479–486, 2011.