
Deep Algorithm Unrolling For Blind Image Deblurring

1) The document proposes a neural network architecture called Deep Unrolling for Blind Deblurring (DUBLID) for blind image deblurring. 2) DUBLID is based on the idea of "algorithm unrolling" which connects iterative algorithms like those used for sparse coding to neural network architectures. 3) The authors first present an iterative algorithm that can be seen as a generalization of traditional total variation regularization in the gradient domain, then unroll it to construct the DUBLID neural network, learning key parameters from training images.

Deep Algorithm Unrolling for Blind Image Deblurring
Yuelong Li, Student Member, IEEE, Mohammad Tofighi, Student Member, IEEE,
Junyi Geng, Student Member, IEEE, Vishal Monga, Senior Member, IEEE, and Yonina C. Eldar, Fellow, IEEE

arXiv:1902.03493v3 [eess.IV] 29 May 2019

Abstract—Blind image deblurring remains a topic of enduring interest. Learning-based approaches, especially those that employ neural networks, have emerged to complement traditional model-based methods and in many cases achieve vastly enhanced performance. That said, neural network approaches are generally empirically designed, and the underlying structures are difficult to interpret. In recent years, a promising technique called algorithm unrolling has been developed that has helped connect iterative algorithms, such as those for sparse coding, to neural network architectures. However, such connections have not yet been made for blind image deblurring. In this paper, we propose a neural network architecture based on this idea. We first present an iterative algorithm that may be considered a generalization of the traditional total-variation regularization method in the gradient domain. We then unroll the algorithm to construct a neural network for image deblurring, which we refer to as Deep Unrolling for Blind Deblurring (DUBLID). Key algorithm parameters are learned with the help of training images. Our proposed deep network DUBLID achieves significant practical performance gains while also being interpretable. Extensive experimental results show that DUBLID outperforms many state-of-the-art methods and is, in addition, computationally faster.

I. INTRODUCTION

Blind image deblurring refers to the process of recovering a sharp image from its blurred observation without explicitly knowing the blur function. In real-world imaging, images frequently suffer from degraded quality as a consequence of blurring artifacts, which blind deblurring algorithms are designed to remove. These artifacts may come from different sources, such as atmospheric turbulence, diffraction, optical defocusing, camera shake, and more [1]. In the computational imaging literature, motion deblurring is an important topic because camera shake is common during photography. In recent years, this topic has attracted growing attention thanks to the popularity of smartphone cameras. On such platforms, the motion deblurring algorithm plays an especially crucial role, because effective hardware solutions such as professional camera stabilizers are difficult to deploy due to space restrictions.

In this work we focus on motion deblurring in particular because of its practical importance. However, our development makes no assumptions on the blur type and hence may be extended to cases other than motion blur. Motion blur occurs as a consequence of relative movement between the camera and the imaged scene during exposure. Assuming the scene is planar and the camera motion is translational, the image degradation process may be modelled as a discrete convolution [1]:

$$y = k * x + n, \tag{1}$$

where $y$ is the observed blurry image, $x$ is the latent sharp image, $k$ is the unknown point spread function (blur kernel), and $n$ is random noise, which is often modelled as Gaussian. Blind motion deblurring corresponds to estimating both $k$ and $x$ given $y$; this estimation problem is also commonly called blind deconvolution.

Related Work: The majority of existing blind motion deblurring methods are based on iterative optimization. Early works can be traced back several decades [2], [3], [4], [1], [5]. These methods are only effective when the blur kernel is relatively small. In the last decade, significant breakthroughs have been made both practically and conceptually. Because both the image and the kernel need to be estimated, infinitely many pairs of solutions produce the same blurred observation, which renders blind deconvolution an ill-posed problem. A popular remedy is to add regularization, so that many blind deblurring algorithms essentially reduce to solving regularized inverse problems. The vast majority of these techniques hinge on sparsity-inducing regularizers, either in the gradient domain [6], [7], [8], [9], [10], [11], [12], [13] or in more general sparsifying transformation domains [14], [15], [16], [17]. Variants of such methods may also arise indirectly from a statistical estimation perspective, e.g., [18], [19], [20], [21]. From a conceptual perspective, Levin et al. [22] study the limitations and remedies of the commonly employed Maximum a Posteriori (MAP) approach, while Perrone et al. [23] extend their study with a particular focus on Total-Variation (TV) regularization. Despite the performance improvements achieved along the way, iterative optimization approaches generally suffer from several limitations. First, their performance depends heavily on an appropriate selection of parameter values. Second, handcrafted regularizers play an essential role, and designing versatile regularizers that generalize well to a variety of real datasets can be challenging. Finally, hundreds or even thousands of iterations are often required to reach an acceptable performance level, so these approaches can be slow in practice.
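The degradation model (1) and the ill-posedness noted above can be made concrete with a short simulation. The sketch below is illustrative only and is not part of DUBLID: it assumes circular (periodic) convolution computed with NumPy's FFT and a kernel zero-padded to the image size, and the helper names `blur` and `pad_kernel` are hypothetical. It shows that rescaling the image and kernel in opposite directions, or shifting them in opposite directions, yields exactly the same observation $y$.

```python
import numpy as np


def blur(x, k, noise_std=0.0, seed=0):
    """Simulate the model y = k * x + n of (1).

    Assumptions (illustrative only): circular 2-D convolution computed via the
    FFT, and a kernel k already zero-padded to the same shape as x.
    """
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
    if noise_std > 0:
        # Additive white Gaussian noise n with standard deviation noise_std.
        y = y + noise_std * np.random.default_rng(seed).standard_normal(x.shape)
    return y


def pad_kernel(k_small, shape):
    """Zero-pad a small kernel to the image size (top-left aligned)."""
    k = np.zeros(shape)
    k[: k_small.shape[0], : k_small.shape[1]] = k_small
    return k


rng = np.random.default_rng(0)
x = rng.random((32, 32))                            # stand-in for a sharp image
k = pad_kernel(np.full((1, 7), 1.0 / 7), x.shape)   # horizontal motion blur, unit sum
y = blur(x, k)

# Scale ambiguity: (a * x, k / a) produces the same noiseless observation.
same_scale = np.allclose(y, blur(2.0 * x, k / 2.0))

# Shift ambiguity: shifting x by s and k by -s also produces the same observation.
s = (3, -2)
x_shifted = np.roll(x, s, axis=(0, 1))
k_shifted = np.roll(k, (-s[0], -s[1]), axis=(0, 1))
same_shift = np.allclose(y, blur(x_shifted, k_shifted))

print(same_scale, same_shift)  # True True
```

Note that the rescaled kernel `k / 2.0` violates a unit-sum constraint on the kernel, whereas the shifted kernel still satisfies it; a non-negativity and unit-sum constraint therefore removes the scale ambiguity but not the shift ambiguity.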
Y. Li, M. Tofighi, and V. Monga are with the Department of Electrical Engineering, The Pennsylvania State University, University Park, PA, 16802 USA.
Y. C. Eldar is with the Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa, Israel.

Complementary to the aforementioned approaches, learning-based methods have been developed that determine a non-linear deblurring mapping while adapting parameter choices to an underlying set of training images. Principally
important in this class are techniques that employ deep neural networks. The history of leveraging neural networks for blind deblurring actually dates back to the last century [24]. In the past few years, there has been a growing trend of applying neural networks to various imaging problems [25], and blind motion deblurring has followed that trend. Xu et al. [26] use large convolution kernels with carefully chosen initializations in a Convolutional Neural Network (CNN); Yan et al. [27] concatenate a classification network with a regression network to deblur images without prior information about the blur kernel type. Chakrabarti et al. [28] work in the frequency domain and employ a neural network to predict Fourier transform coefficients of image patches; Xu et al. [29] employ a CNN for edge enhancement prior to kernel and image estimation. These works often outperform iterative optimization algorithms, especially for linear motion kernels; however, the structures of the networks are often empirically determined and their actual functionality is hard to interpret.

In the seminal work of Gregor et al. [30], a novel technique called algorithm unrolling was proposed. Although it focuses on approximating sparse coding algorithms, it provides a principled framework for expressing traditional iterative algorithms as neural networks, and offers promise in developing interpretable network architectures. Specifically, each iteration step may be represented as one layer of the network, and concatenating these layers forms a deep neural network. Passing through the network is equivalent to executing the iterative algorithm a finite number of times. The network may be trained using back-propagation [31], and the trained network can be naturally interpreted as a parameter-optimized algorithm. An additional benefit is that prior knowledge about the conventional algorithms may be transferred. There has been limited recent exploration of neural network architectures obtained by unrolling iterative algorithms for problems such as super-resolution and clutter/noise suppression [32], [33], [34], [35]. In blind deblurring, Schuler et al. [36] employ neural networks as feature extraction modules within a trainable deblurring system. However, the network portions are still empirically designed. Other aspects of deblurring have also been investigated, such as spatially varying blur [37], [38], including some recent neural network approaches [39], [40], [41], [42]. Other algorithms benefit from device measurements [43], [44], [45] or leverage multiple images [46], [47].

Motivations and Contributions: Conventional iterative algorithms have the merit of interpretability, but reaching acceptable performance levels demands much longer execution time than modern neural network approaches. Despite previous efforts, the link between the two categories remains largely unexplored for the problem of blind deblurring, and a method that simultaneously enjoys the benefits of both is lacking. In this regard, we make the following contributions:¹

• Deep Unrolling for BLind Deblurring (DUBLID): We propose an interpretable neural network structure called DUBLID. We first present an iterative algorithm that may be considered a generalization of the traditional total-variation regularization method in the gradient domain, and subsequently unroll the algorithm to construct a neural network. Key algorithm parameters are learned with the help of training images using back-propagation, for which we derive analytically simple forms that are amenable to fast implementation.
• Performance and Computational Benefits: Through extensive experimental validation over three benchmark datasets, we verify the superior performance of the proposed DUBLID over both conventional iterative algorithms and more recent neural network approaches. Both traditional linear and more recently developed non-linear kernels are used in our experiments. Besides quality gains, we show that DUBLID is computationally simpler. In particular, the carefully designed interpretable layers enable DUBLID to learn with far fewer parameters than state-of-the-art deep learning approaches, hence leading to much faster inference.
• Reproducibility: To ensure reproducibility, we share our code and the datasets used to generate all our experimental results freely online.

¹ A preliminary 4-page version of this work has been submitted to IEEE ICASSP 2019 [48]. This paper involves substantially more analytical development in the form of: a) the unrolling mechanism and associated optimization problem for learning parameters, b) derivation of custom back-propagation rules, c) handling of color images, and d) demonstration of computational benefits. Experimentally, we have added a new dataset and several new state-of-the-art methods and scenarios to our comparisons. Finally, ablation studies have been included to better explain the proposed DUBLID and its merits.

The rest of the paper is organized as follows. Generalized gradient-domain deblurring is reviewed in Section II. We identify the roles of the (gradient/feature extraction) filters and other key parameters, which are usually assumed fixed. Based on a half-quadratic optimization procedure that solves the aforementioned gradient-domain deblurring problem, we develop in Section III a new unrolling method that realizes the iterative optimization as a neural network. In particular, we show that the various linear and non-linear operators in the optimization can be cascaded to generate an interpretable deep network, such that the number of layers in the network corresponds to the number of iterations. The previously fixed filters and parameters become learnable, and a custom back-propagation procedure is proposed to optimize them based on training images. Experimental results that provide insights into DUBLID, as well as comparisons with state-of-the-art methods, are reported in Section IV. Section V concludes the paper.

II. GENERALIZED BLIND DEBLURRING VIA ITERATIVE MINIMIZATION: A FILTERED DOMAIN REGULARIZATION PERSPECTIVE

A. Blind Deblurring in the Filtered Domain

A common practice for blind motion deblurring is to estimate the kernel in the image gradient domain [18], [7], [8], [9], [19], [10], [23]. Because the gradient operator $\nabla$ commutes with convolution, taking derivatives on both sides of (1) gives

$$\nabla y = k * \nabla x + n', \tag{2}$$

where $n' = \nabla n$ is Gaussian noise. Formulation in the gradient domain, as opposed to the pixel domain, has several desirable strengths: first, the kernel generally serves as a low-pass filter, and low-frequency components of the image are barely
informative about the kernel. Intuitively, the kernel may be inferred along edges rather than in homogeneous regions of the image. Therefore, a gradient-domain approach can lead to improved performance in practice [19], as the gradient operator effectively filters out the uninformative regions. Additionally, from a computational perspective, gradient-domain formulations help to better condition the linear system, resulting in more reliable estimation [8].

The model (2) alone, however, is insufficient for recovering both the image and the kernel; thus regularizers on both are needed. The gradients of natural images are generally sparse, i.e., most of their values are of small magnitude [18], [22]. This fact motivates the development of various sparsity-inducing regularizers on $\nabla x$. Among them, one of particular interest is the $\ell_1$-norm (often called TV) thanks to its convexity [5], [23]. To regularize the kernel, it is common practice to assume that the kernel coefficients are non-negative and of unit sum. Consolidating these facts, blind motion deblurring may be carried out by solving the following optimization problem [23]:

$$\begin{aligned} \min_{k,\,g_1,\,g_2}\ & \frac{1}{2}\left(\|D_x y - k * g_1\|_2^2 + \|D_y y - k * g_2\|_2^2\right) + \lambda_1\|g_1\|_1 + \lambda_2\|g_2\|_1 + \frac{\varepsilon}{2}\|k\|_2^2, \\ \text{subject to}\ & \|k\|_1 = 1,\ k \geq 0, \end{aligned} \tag{3}$$

where $D_x y$ and $D_y y$ are the partial derivatives of $y$ in the horizontal and vertical directions, respectively. The notation $\|\cdot\|_p$ denotes the $\ell_p$ vector norm, while $\lambda_1, \lambda_2, \varepsilon$ are positive constant parameters that balance the contributions of each term. The $\geq$ sign is to be interpreted elementwise. The solutions $g_1$ and $g_2$ of (3) are estimates of the gradients of the sharp image $x$, i.e., we may expect $g_1 \approx D_x x$ and $g_2 \approx D_y x$.

In practice, numerical gradients of images are usually computed using discrete filters, such as the Prewitt and Sobel filters. From this viewpoint, $D_x y$ and $D_y y$ may be viewed as the result of filtering $y$ through two derivative filters of orthogonal directions [49]. Therefore, a straightforward generalization of (3) is to use more than two filters, i.e., to pass $y$ through a filter bank. This generalization increases the flexibility of (3), and an appropriate choice of filters can significantly boost performance. In particular, by steering the filters towards directions other than horizontal and vertical, local features (such as lines, edges and textures) of different orientations are captured more effectively [50], [51], [52]. Moreover, the filters can adapt their shapes to enhance representation sparsity [53], [54], a desirable property to pursue.

Suppose we have determined a desired collection of $C$ filters $\{f_i\}_{i=1}^C$. By the commutativity of convolution, we have

$$f_i * y = f_i * k * x + n_i' = k * (f_i * x) + n_i', \quad i = 1, 2, \ldots, C, \tag{4}$$

where the filtered noise terms $n_i' = f_i * n$ are still Gaussian. To encourage sparsity of the filtered image, we formulate the following optimization problem (which may similarly be regarded as a generalization of [23]):

$$\begin{aligned} \min_{k,\,\{g_i\}_{i=1}^C}\ & \sum_{i=1}^C \left(\frac{1}{2}\|f_i * y - k * g_i\|_2^2 + \lambda_i\|g_i\|_1\right) + \frac{\varepsilon}{2}\|k\|_2^2, \\ \text{subject to}\ & \|k\|_1 = 1,\ k \geq 0. \end{aligned} \tag{5}$$

B. Efficient Minimization via Half-quadratic Splitting

Problem (5) is non-smooth, so traditional gradient-based optimization algorithms cannot be directly applied. Moreover, to facilitate the subsequent unrolling procedure, the algorithm needs to be simple (to simplify the network structure) and to converge quickly (to reduce the number of layers required). Based on these considerations, we adopt the half-quadratic splitting algorithm [55]. This algorithm is simple but effective, and has been successfully employed in many previous deblurring techniques [56], [13], [16].

The basic idea is to perform variable splitting and then alternating minimization on the penalty function. To this end, we first cast (5) into the following approximate model:

$$\begin{aligned} \min_{k,\,\{g_i, z_i\}_{i=1}^C}\ & \sum_{i=1}^C \left(\frac{1}{2}\|f_i * y - k * g_i\|_2^2 + \lambda_i\|z_i\|_1 + \frac{1}{2\zeta_i}\|g_i - z_i\|_2^2\right) + \frac{\varepsilon}{2}\|k\|_2^2, \\ \text{subject to}\ & \|k\|_1 = 1,\ k \geq 0, \end{aligned} \tag{6}$$

by introducing auxiliary variables $\{z_i\}_{i=1}^C$, where $\zeta_i$, $i = 1, \ldots, C$, are regularization parameters. It is well known that as $\zeta_i \to 0$ the sequence of solutions to (6) converges to that of (5) [57]. In a similar manner to [13], we then alternately minimize over $\{g_i\}_{i=1}^C$, $\{z_i\}_{i=1}^C$ and $k$, and iterate until convergence.² Specifically, at the $l$-th iteration, we execute the following minimizations sequentially:

$$\begin{aligned} g_i^{l+1} &\leftarrow \arg\min_{g_i}\ \frac{1}{2}\|f_i * y - k^l * g_i\|_2^2 + \frac{1}{2\zeta_i}\|g_i - z_i^l\|_2^2, \quad \forall i, \\ z_i^{l+1} &\leftarrow \arg\min_{z_i}\ \frac{1}{2\zeta_i}\|g_i^{l+1} - z_i\|_2^2 + \lambda_i\|z_i\|_1, \quad \forall i, \\ k^{l+1} &\leftarrow \arg\min_{k}\ \sum_{i=1}^C \frac{1}{2}\|f_i * y - k * g_i^{l+1}\|_2^2 + \frac{\varepsilon}{2}\|k\|_2^2, \quad \text{subject to } \|k\|_1 = 1,\ k \geq 0. \end{aligned} \tag{7}$$

² In the non-blind deconvolution literature, a formal convergence proof has been given in [55], while for blind deconvolution, empirical convergence has frequently been observed, as shown in [11], [13], etc.

For notational brevity, we will consistently use $i$ to index the filters and $l$ to index the layers (iterations) henceforth. The notations $\{\cdot\}_i$ and $\{\cdot\}_l$ collect all filter and layer components, respectively. As it stands, problem (5) is non-convex over the joint variables $k$ and $\{g_i\}_i$, and proper initialization is crucial to obtaining good solutions. However, it is difficult to find initializations that perform well under various practical scenarios. An alternative strategy that has been commonly employed is to use different parameters per iteration [55], [11], [23], [13]. For example, the $\lambda_i$'s are typically chosen as large values at the beginning and then gradually decreased towards a small constant. In [55], the values of the $\zeta_i$'s decrease as the algorithm proceeds, for faster convergence. In numerical analysis and optimization, this strategy is called the continuation method, and it is known to be effective for solving non-convex problems [58]. Adopting this strategy, we choose different parameters $\{\zeta_i^l, \lambda_i^l\}_{i,l}$ across the iterations. We take this idea one step further by optimizing the filters across
iterations as well, i.e., we design filters $\{f_i^l\}_{i,l}$. Consequently, the alternating minimization scheme in (7) becomes:

$$g_i^{l+1} \leftarrow \arg\min_{g_i}\ \frac{1}{2}\|f_i^l * y - k^l * g_i\|_2^2 + \frac{1}{2\zeta_i^l}\|g_i - z_i^l\|_2^2, \quad \forall i, \tag{8}$$

$$z_i^{l+1} \leftarrow \arg\min_{z_i}\ \frac{1}{2\zeta_i^l}\|g_i^{l+1} - z_i\|_2^2 + \lambda_i^l\|z_i\|_1, \quad \forall i, \tag{9}$$

$$\begin{aligned} k^{l+1} \leftarrow \arg\min_{k}\ & \sum_{i=1}^C \frac{1}{2}\|f_i^l * y - k * g_i^{l+1}\|_2^2 + \frac{\varepsilon}{2}\|k\|_2^2, \\ \text{subject to}\ & \|k\|_1 = 1,\ k \geq 0. \end{aligned} \tag{10}$$

We summarize the complete algorithm in Algorithm 1, where $\delta$ in Step 1 is the impulse function. Problem (8) can be solved efficiently by making use of the Discrete Fourier Transform (DFT); its solution is given in Step 5 of Algorithm 1, where $(\cdot)^*$ denotes complex conjugation and $\odot$ is the Hadamard (elementwise) product operator. Operations are to be interpreted elementwise when acting on matrices and vectors. We let $\widehat{\cdot}$ denote the DFT and $\mathcal{F}^{-1}$ the inverse DFT. The closed-form solution to problem (9) is well known and can be found in Step 6, where $\mathcal{S}_\lambda(\cdot)$ is the soft-thresholding operator defined as

$$\mathcal{S}_\lambda(x) = \operatorname{sgn}(x) \cdot \max\{|x| - \lambda, 0\}.$$

Subproblem (10) is a quadratic programming problem with a simplex constraint. While in principle it may be solved using iterative numerical algorithms, an approximation scheme is often adopted in the blind deblurring literature: the unconstrained quadratic programming problem is first solved (again using the DFT), its negative coefficients are then thresholded out, and the result is finally normalized to have unit sum (Steps 8–10 of Algorithm 1). We define $[x]_+ = \max\{x, 0\}$; this function is commonly called the Rectified Linear Unit (ReLU) in neural network terminology [59]. Note that in Step 9 of Algorithm 1 we adopt a common practice [8], [16] of thresholding the kernel coefficients using a positive constant (usually set as a constant parameter multiplying the maximum of the kernel coefficients); to avoid the non-smoothness of the maximum operation, we use the log-sum-exp function as a smooth surrogate.

We note that the quality of the outputs and the convergence speed depend crucially on the filters $\{f_i^l\}_{i,l}$ and parameters $\{\zeta_i^l, \lambda_i^l\}_{i,l}$, which are difficult to infer due to the huge variety of real-world data. In traditional settings, they are usually determined by hand-crafting or domain knowledge. For example, in [23], [16] the $\{f_i^l\}_{i,l}$ are taken as Prewitt filters, while the $\{\lambda_i^l\}_l$ are chosen as geometric sequences. Their optimal values thus remain unclear. To optimize performance, we learn (adapt) the filters and parameters using training images in a deep learning set-up via back-propagation. A detailed visual comparison between filters commonly used in conventional algorithms and filters learned by DUBLID from real datasets is provided in Section IV-B.

Algorithm 1 Half-quadratic Splitting Algorithm for Blind Deblurring with Continuation
Input: Blurred image $y$, filter banks $\{f_i^l\}_{i,l}$, positive constant parameters $\{\zeta_i^l, \lambda_i^l\}_{i,l}$, number of iterations³ $L$, and $\varepsilon$.
Output: Estimated kernel $\widetilde{k}$, estimated feature maps $\{\widetilde{g}_i\}_{i=1}^C$.
1: Initialize $k^1 \leftarrow \delta$; $z_i^1 \leftarrow 0$, $i = 1, \ldots, C$.
2: for $l = 1$ to $L$ do
3:   for $i = 1$ to $C$ do
4:     $y_i^l \leftarrow f_i^l * y$,
5:     $g_i^{l+1} \leftarrow \mathcal{F}^{-1}\left\{\dfrac{\zeta_i^l\, \widehat{k^l}^* \odot \widehat{y_i^l} + \widehat{z_i^l}}{\zeta_i^l\, |\widehat{k^l}|^2 + 1}\right\}$,
6:     $z_i^{l+1} \leftarrow \mathcal{S}_{\lambda_i^l \zeta_i^l}\left(g_i^{l+1}\right)$,
7:   end for
8:   $k^{l+\frac{1}{3}} \leftarrow \mathcal{F}^{-1}\left\{\dfrac{\sum_{i=1}^C \widehat{z_i^{l+1}}^* \odot \widehat{y_i^l}}{\sum_{i=1}^C |\widehat{z_i^{l+1}}|^2 + \varepsilon}\right\}$,
9:   $k^{l+\frac{2}{3}} \leftarrow \left[k^{l+\frac{1}{3}} - \beta^l \log\left(\sum_i \exp\left(k_i^{l+\frac{1}{3}}\right)\right)\right]_+$,
10:  $k^{l+1} \leftarrow k^{l+\frac{2}{3}} / \|k^{l+\frac{2}{3}}\|_1$,
11:  $l \leftarrow l + 1$.
12: end for

³ $L$ refers to the number of outer iterations, which subsequently becomes the number of network layers in Section III-A. While in traditional iterative algorithms it is commonly determined by certain termination criteria, this approach is difficult to implement for neural networks. Therefore, in this work we choose it through cross-validation, as is done in [30], [32].

After Algorithm 1 converges, we obtain the estimated feature maps $\{\widetilde{g}_i\}_i$ and the estimated kernel $\widetilde{k}$. Because of the low-pass nature of $\widetilde{k}$, using it alone is inadequate to reliably recover the sharp image $x$, and regularization is needed. We may infer from (4) that, as $\widetilde{k}$ approximates $k$, $\widetilde{g}_i$ should approximate $f_i * x$. Therefore, we retrieve $x$ by solving the following optimization problem:

$$\widetilde{x} \leftarrow \arg\min_x\ \frac{1}{2}\|y - \widetilde{k} * x\|_2^2 + \sum_{i=1}^C \frac{\eta_i}{2}\|f_i^L * x - \widetilde{g}_i\|_2^2 = \mathcal{F}^{-1}\left\{\frac{\widehat{\widetilde{k}}^* \odot \widehat{y} + \sum_{i=1}^C \eta_i\, \widehat{f_i^L}^* \odot \widehat{\widetilde{g}_i}}{|\widehat{\widetilde{k}}|^2 + \sum_{i=1}^C \eta_i\, |\widehat{f_i^L}|^2}\right\}, \tag{11}$$

where the $\eta_i$'s are positive regularization parameters.

III. ALGORITHM UNROLLING FOR DEEP BLIND IMAGE DEBLURRING (DUBLID)

A. Network Construction via Algorithm Unrolling

Each step of Algorithm 1 is in analytic form and can be implemented using a series of basic functional operations. In particular, Steps 5 and 8 of Algorithm 1 can be implemented according to the diagrams in Fig. 1a and Fig. 1b, respectively. The soft-thresholding operation in Step 6 may be implemented using two ReLU operations by recognizing that $\mathcal{S}_\lambda(x) = [x - \lambda]_+ - [-x - \lambda]_+$. Similarly, (11) may be implemented according to Fig. 1c. Therefore, each iteration of
Algorithm 1 admits a diagram representation, and repeating it $L$ times yields an $L$-layer neural network (as shown in Fig. 2), which corresponds to executing Algorithm 1 with $L$ iterations. For notational brevity, we concatenate the parameters in each layer and let $f^l = (f_i^l)_{i=1}^C$, $\zeta^l = (\zeta_i^l)_{i=1}^C$, $\lambda^l = (\lambda_i^l)_{i=1}^C$ and $\eta = (\eta_i)_{i=1}^C$. We also concatenate the $y_i^l$'s, $z_i^l$'s and $g_i^l$'s by letting $y^l = (y_i^l)_{i=1}^C$, $z^l = (z_i^l)_{i=1}^C$ and $g^l = (g_i^l)_{i=1}^C$, respectively.

Fig. 1. Block diagram representations of (a) Step 5 in Algorithm 1, (b) Step 8 in Algorithm 1 and (c) Equation (11). After unrolling the iterative algorithm to form a multi-layer network, the diagrammatic representations serve as building blocks that repeat themselves from layer to layer. The parameters ($\zeta$ and $\eta$) are learned from real datasets and colored in blue.

When the blur kernel is large (which may happen due to fast motion), it is desirable to vary the spatial size of the filter banks $\{f_i\}_i$ across layers. In blind deblurring, kernel recovery is frequently performed stage-wise in a coarse-to-fine scheme: the kernel is initially estimated from a high-level summary of the image features, and then progressively refined by introducing lower-level details. For example, in [9] an initial kernel is obtained by masking out detrimental gradients, followed by iterative refinements. Other works [8], [23], [16] use a multi-scale pyramid implementation, decomposing the images into a Gaussian pyramid and progressively refining the kernel at each scale. We may integrate this scheme into Algorithm 1 by choosing large filters in early iterations, so that they are capable of capturing high-level features, and gradually decreasing their size to let fine details emerge. Translating this into the network, we may expect the following relationship among the sizes of the filters in different layers:

$$\text{size of } f_i^1 \geq \text{size of } f_i^2 \geq \text{size of } f_i^3 \geq \cdots.$$

In practice, large filters may be difficult to train due to the large number of parameters they contain. To address this issue, we produce large filters by cascading small $3 \times 3$ filters, following the same principle as [60]. Formally, we set $f_i^L = w_{i1}^L$, where $\{w_{i1}^L\}_{i=1}^C$ is a collection of $3 \times 3$ filters, and recursively obtain $f_i^l$ by filtering $f^{l+1}$ through $3 \times 3$ filters $\{w_{ij}^l\}_{i,j=1}^C$:

$$f_i^l \leftarrow \sum_{j=1}^C w_{ij}^l * f_j^{l+1}, \quad i = 1, 2, \ldots, C.$$

Embedding the above into the network, we obtain the structure depicted in Fig. 3. Note that $y^l$ can now be obtained more efficiently by filtering $y^{l+1}$ through $w^l$. Also note that the $\{w_{ij}^l\}_{i,j=1}^C$ are to be learned, as marked in Fig. 3. Experimental justification of cascaded filtering is provided in Fig. 5.

B. Training

In a given training set, for each blurred image $y_t^{\text{train}}$ ($t = 1, \ldots, T$), we let the corresponding sharp image and kernel be $x_t^{\text{train}}$ and $k_t^{\text{train}}$, respectively. We do not train the parameter $\varepsilon$ in Step 8 of Algorithm 1 because it simply serves as a small constant to avoid division by zero. We re-parametrize $\zeta_i^l$ in Step 6 of Algorithm 1 by letting $b_i^l = \lambda_i^l \zeta_i^l$ and denote $b^l = (b_i^l)_{i=1}^C$, $l = 1, \ldots, L$. The network outputs $\widetilde{x}_t, \widetilde{k}_t$ corresponding to $y_t^{\text{train}}$ depend on the parameters $w^l, b^l, \zeta^l, \beta^l$, $l = 1, 2, \ldots, L$. In addition, $\widetilde{x}_t$ depends on $\eta$. We train the network to determine these parameters by solving the following optimization problem:

$$\begin{aligned} \min_{\{w^l, b^l, \zeta^l, \beta^l\}_l,\ \eta,\ \{\tau_t\}_t}\ & \sum_{t=1}^T \frac{\kappa_t}{2}\,\mathrm{MSE}\left(\widetilde{k}_t\left(\{w^l, b^l, \zeta^l, \beta^l\}_l\right), \mathcal{T}_{\tau_t}\{k_t^{\text{train}}\}\right) + \frac{1}{2}\,\mathrm{MSE}\left(\widetilde{x}_t\left(\{w^l, b^l, \zeta^l, \beta^l\}_l, \eta\right), \mathcal{T}_{-\tau_t}\{x_t^{\text{train}}\}\right), \\ \text{subject to}\ & b_i^l \geq 0,\ \zeta_i^l \geq 0,\ \beta^l \geq 0,\ l = 1, \ldots, L,\ i = 1, \ldots, C, \end{aligned} \tag{12}$$

where $\kappa_t > 0$ is a constant parameter fixed to $\kappa_0 / \max_i |(k_t^{\text{train}})_i|^2$, with $\kappa_0 = 10^5$ determined through cross-validation. $\mathrm{MSE}(\cdot, \cdot)$ is the (empirical) Mean-Square-Error loss function, and $\mathcal{T}_\tau\{\cdot\}$ is the 2D translation operator that performs a shift by $\tau \in \mathbb{R}^2$. The shift operation is used here to compensate for the inherent shift ambiguity in blind deconvolution [23].

In the training process, when working on each mini-batch, we alternate between minimizing over $\{\tau_t\}_t$ and performing a projected stochastic gradient descent step. Specifically, we first determine the optimal $\{\tau_t\}_t$ efficiently via a grid search in the Fourier domain. We then take one stochastic gradient descent step; the analytic derivation of the gradient is included in Appendix A. Finally, we threshold out the negative coefficients of $\{b_i^l, \zeta_i^l, \beta^l\}_{i,l}$ to enforce the non-negativity constraints. We
use the Adam algorithm [61] to accelerate training. The learning rate is initially set to $1 \times 10^{-3}$ and decayed by a factor of 0.5 every 20 epochs. We terminate training after 160 epochs. The parameters $\{b_i^l\}_{i,l}$ are initialized to 0.02, $\{\zeta_i^l\}_{i,l}$ to 1, $\{\beta^l\}_l$ to 0, and $\{\eta_i\}_i$ to 20. These values are again determined through cross-validation. The upper part (feature extraction portion) of the network in Fig. 3 resembles a CNN with linear activations (identities), and thus we initialize the weights according to [62].

Fig. 2. Algorithm 1 unrolled as a neural network. The parameters that are learned from real datasets are colored in blue.

C. Handling Color Images

For color images, the red, green and blue channels $y_r$, $y_g$ and $y_b$ are blurred by the same kernel, and thus the following model holds instead of (1):

$$y_c = k * x_c + n_c, \quad c \in \{r, g, b\}.$$

To be consistent with the existing literature, we modify $w^L$ in Fig. 3 to allow for multi-channel inputs. More specifically, $y^L$ is produced by the following formula:

$$y_i^L = \sum_{c \in \{r,g,b\}} w_{ic}^L * y_c, \quad i = 1, \ldots, C.$$

It is easy to check that, with $w^L$ and $y^L$ replaced, all the components of the network can be left unchanged except for the module in Fig. 1c. This is because (4) no longer holds and is modified to the following:

$$\sum_{c \in \{r,g,b\}} w_{ic} * y_c = k * \left(\sum_{c \in \{r,g,b\}} w_{ic} * x_c\right) + n_i', \quad i = 1, \ldots, C,$$

where $n_i' = \sum_{c \in \{r,g,b\}} w_{ic} * n_c$ represents Gaussian noise. Problem (11) then becomes:

$$\{\widetilde{x}_r, \widetilde{x}_g, \widetilde{x}_b\} \leftarrow \arg\min_{x_r, x_g, x_b}\ \sum_{c \in \{r,g,b\}} \frac{1}{2}\|y_c - \widetilde{k} * x_c\|_2^2 + \sum_{i=1}^C \frac{\eta_i}{2}\left\|\sum_{c \in \{r,g,b\}} w_{ic} * x_c - \widetilde{g}_i\right\|_2^2,$$

whose solution is given as follows:

$$\widetilde{x}_r = \mathcal{F}^{-1}\left\{\frac{m_{rr} b_r + m_{rg} b_g + m_{rb} b_b}{d}\right\}, \quad \widetilde{x}_g = \mathcal{F}^{-1}\left\{\frac{m_{rg}^* b_r + m_{gg} b_g + m_{gb} b_b}{d}\right\}, \quad \widetilde{x}_b = \mathcal{F}^{-1}\left\{\frac{m_{rb}^* b_r + m_{gb}^* b_g + m_{bb} b_b}{d}\right\},$$

where

$$\begin{aligned} m_{rr} &= c_{gg} c_{bb} - |c_{gb}|^2, & m_{rg} &= c_{rb} c_{gb}^* - c_{bb} c_{rg}, \\ m_{rb} &= c_{rg} c_{gb} - c_{gg} c_{rb}, & m_{gg} &= c_{bb} c_{rr} - |c_{rb}|^2, \\ m_{gb} &= c_{rg}^* c_{rb} - c_{rr} c_{gb}, & m_{bb} &= c_{rr} c_{gg} - |c_{rg}|^2. \end{aligned}$$

Here,

$$c_{cc'} = \sum_{i=1}^C \eta_i\, \widehat{w_{ic}}^* \odot \widehat{w_{ic'}} + |\widehat{\widetilde{k}}|^2 \delta_{cc'}, \quad c, c' \in \{r, g, b\},$$

$$b_c = \widehat{\widetilde{k}}^* \odot \widehat{y_c} + \sum_{i=1}^C \eta_i\, \widehat{w_{ic}}^* \odot \widehat{\widetilde{g}_i}, \quad c \in \{r, g, b\},$$

$$d = \left(c_{gg} c_{rr} - |c_{rg}|^2\right) c_{bb} + 2\,\mathrm{Re}\{c_{rb}^* c_{rg} c_{gb}\} - |c_{gb}|^2 c_{rr} - |c_{rb}|^2 c_{gg},$$

and $\delta_{cc'}$ is the Kronecker delta function. These analytical formulas may be represented using diagrams similar to Fig. 1c and embedded into a network.

IV. EXPERIMENTAL VERIFICATION

A. Experimental Setups

1) Datasets, Training and Test Setup:
• Training for linear kernels: For the images, we used the Berkeley Segmentation Data Set 500 (BSDS500) [63], a large dataset of 500 natural images that is explicitly divided into disjoint training, validation and test subsets. Here we use 300 images for training by combining the training and validation images.

[Fig. 3 diagram: “Feature Extraction Operators” (convolutional layers $w^L, w^{L-1}, \ldots, w^1$ applied to the blurred image $y$, producing $y^L, y^{L-1}, \ldots, y^1$) feeding the “Deconvolution” stages (Figs. 1a to 1c), which produce $g^l$, $z^l$ and $k^l$ and output the “Estimated Image” $\tilde{x}$ and the “Estimated Kernel” $\tilde{k}$.]
Fig. 3. DUBLID: using a cascade of small 3 × 3 filters instead of large filters (as compared to the network in Fig. 2) reduces the dimensionality of the parameter space and can make the network easier to train. Intermediate data (hidden layers) of the trained network are also shown; as $l$ increases, $g^l$ and $y^l$ evolve in a coarse-to-fine manner. The parameters that are learned from real datasets are colored in blue.

We generated 256 linear kernels by varying the length and angle of the kernels (refer to Section IV-C for details).
• Training for nonlinear kernels: We used the Microsoft COCO [64] dataset, a large-scale object detection, segmentation, and captioning dataset containing 330K images.
• Nonlinear kernels: We generated around 30,000 real-world kernels by recording camera motion trajectories (refer to Section IV-D for details).
• Testing for the linear kernel experiments: We use 200 images from the test portion of BSDS500 as test images, and we randomly choose four kernels of different angles and lengths as test kernels.
• Testing for the nonlinear kernel experiments: We test on two benchmark datasets specifically developed for evaluating blind deblurring with non-linear kernels: 1.) 4 images and 8 kernels by Levin et al. [22] and 2.) 80 images and 8 kernels from Sun et al. [12].
2) Comparisons Against State-of-the-Art Methods: We compare against five methods:
• Perrone et al. [23], a representative iterative blind image deblurring method based on total-variation minimization, which demonstrated state-of-the-art performance among traditional iterative methods. (TPAMI 2016)
• Chakrabarti et al. [28], one of the first neural network blind image deblurring methods. (CVPR 2016)
• Nah et al. [40], a recent deep learning method based on the state-of-the-art ResNet [65]. (CVPR 2017)
• Kupyn et al. [66], a recent deep learning method based on state-of-the-art generative adversarial networks (GAN) [67]. (CVPR 2018)
• Xu et al. [29], a recent state-of-the-art deep learning method focused on motion kernel estimation. (TIP 2018)
3) Quantitative Performance Measures: To quantitatively assess the performance of the various methods in different scenarios, we use the following metrics:
• Peak Signal-to-Noise Ratio (PSNR);
• Improvement in Signal-to-Noise Ratio (ISNR), given by $\mathrm{ISNR} = 10 \log_{10} \frac{\|y - x\|_2^2}{\|\tilde{x} - x\|_2^2}$, where $\tilde{x}$ is the reconstructed image;
• Structural Similarity Index (SSIM) [68];
• (Empirical) Root-Mean-Square Error (RMSE) computed between the estimated kernel and the ground truth kernel.
We note that for selected methods, RMSE numbers (against the ground truth kernel) are not reported in Tables III, IV and V because those methods directly estimate only the deblurred image (and not the blur kernel).

B. Ablation Study of the Network
To provide insight into the design of our network, we first carry out a series of experimental studies to investigate the influence of two key design variables: 1.) the number of layers L, and 2.) the number of filters C. We monitor performance using PSNR and RMSE. The results in Tables I and II are for linear kernels with the same training-test configuration as in Section IV-A. The trends over L and C were found to be similar for non-linear kernels as well.
We first study the influence of the number of layers L, retraining DUBLID as in Section III-B each time L is varied. Table I summarizes the numerical scores for the different values of L. PSNR increases monotonically with L, consistent with common observations [69], [65], and the kernel RMSE is lowest for the two deepest networks. However, the gain beyond L = 10 is marginal (0.05 dB in PSNR), so we fix L = 10 subsequently. For all results in Table I, the number of filters C is fixed to 16.

TABLE I
EFFECTS OF DIFFERENT NUMBERS OF LAYERS L.
Number of layers L    6      8      10     12
PSNR (dB)             26.55  26.94  27.30  27.35
RMSE (×10⁻³)          1.96   2.06   1.67   1.66
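As a concrete reading of the measures in Section IV-A3, the following is a minimal pure-Python sketch of PSNR, ISNR and the kernel RMSE on flattened intensity lists. It assumes intensities in [0, 1] (so the PSNR peak value is 1, matching the noise convention of Section IV-C); SSIM is omitted because it needs the windowed statistics of [68]. The function names are illustrative, not taken from the authors' code.

```python
import math


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((u - v) ** 2 for u, v in zip(a, b))


def psnr(estimate, truth, peak=1.0):
    """PSNR in dB, assuming intensities lie in [0, peak]."""
    mse = squared_error(estimate, truth) / len(truth)
    return 10.0 * math.log10(peak ** 2 / mse)


def isnr(blurred, estimate, truth):
    """ISNR = 10 log10(||y - x||^2 / ||x_tilde - x||^2), in dB."""
    return 10.0 * math.log10(squared_error(blurred, truth)
                             / squared_error(estimate, truth))


def kernel_rmse(kernel_est, kernel_true):
    """Root-mean-square error between two flattened kernels."""
    return math.sqrt(squared_error(kernel_est, kernel_true) / len(kernel_true))


# Toy 1-D "images": x (ground truth), y (blurred), x_tilde (estimate).
x = [0.0, 0.5, 1.0, 0.5]
y = [0.25, 0.5, 0.75, 0.5]
x_tilde = [0.05, 0.5, 0.95, 0.5]
# ||y - x||^2 / ||x_tilde - x||^2 = 0.125 / 0.005 = 25, i.e. about 13.98 dB
print(round(isnr(y, x_tilde, x), 2))
```

Note that ISNR is measured relative to the blurred input, so it rewards improvement over $y$, whereas PSNR is an absolute fidelity score; this is why the two can rank methods differently in Tables III to V.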

TABLE II
EFFECTS OF DIFFERENT NUMBERS OF FILTERS C.
Number of filters C   8      16     32
PSNR (dB)             26.55  27.30  27.16
RMSE (×10⁻³)          1.99   1.67   1.93

We next study the effects of different values of C in a similar fashion. The network performance for different choices of C is summarized in Table II. Performance clearly improves as C increases from 8 to 16, but drops slightly when C increases further to 32, presumably because of overfitting. We thus fix C = 16 henceforth.

[Fig. 4 panels: (a) DUBLID-Learned, filter banks $\{f_i^1\}_{i=1}^C, \{f_i^2\}_{i=1}^C, \{f_i^3\}_{i=1}^C, \ldots, \{f_i^L\}_{i=1}^C$; (b) DUBLID-Sobel, filters $D_x$ and $D_y$; (c) to (e) kernels.]
Fig. 4. Comparison of learned filters with analytic Sobel filters: (a) DUBLID learned filters. (b) Sobel filters that are commonly employed in traditional iterative blind deblurring algorithms. (c) An example motion blur kernel. (d) Reconstructed kernel using Sobel filters and (e) using learned filters.

Fig. 5. The effectiveness of cascaded filtering: (a) a sample motion kernel. (b) Reconstructed kernel obtained by fixing all $f_i^l$ to be of size 3 × 3, which can be implemented by enforcing $w_{ij}^l$ to be of size 1 × 1 whenever $l < L$. (c) Reconstructed kernel using the cascaded filtering structure in Fig. 3.

To corroborate the network design choices made in Section III-A, we illustrate DUBLID performance for different filter choices. We first verify the significance of learning the filters $\{w^l\}_l$ (and in turn $\{f_i^l\}_{i,l}$) by comparing against a typical choice of analytical filters, the Sobel filters, in Fig. 4. Note that with Sobel filters, the network reduces to executing TV-based deblurring for a small number of iterations, equal to the number of layers L. For fairness in comparison, the fixed Sobel filter version of DUBLID (called DUBLID-Sobel) is trained exactly as in Section III-B to optimize the other parameters. As Fig. 4 reveals, DUBLID-Sobel is unable to accurately recover the kernel. Indeed, this phenomenon has been observed and analytically studied by Levin et al. [22], who point out that traditional gradient-based approaches can easily get stuck at a delta solution. To gain further insight, we visualize the learned filters as well as the Sobel filters in Fig. 4a and Fig. 4b. The learned filters show richer variations than the analytic (Sobel) filters and are able to capture higher-level image features as $l$ grows, which in turn enables the DUBLID network to better recover the kernel coefficients. Quantitatively, DUBLID-Sobel achieves a PSNR of 18.60 dB for L = 10 and C = 16 on the same training-test setup, versus 27.30 dB for DUBLID (Table I); that is, DUBLID gains 8.7 dB by explicitly optimizing the filters in a data-adaptive fashion.
Finally, we show the effectiveness of cascaded filtering. To this end, we compare with the alternative scheme of fixing the size of $\{f_i^l\}_{i,l}$ by restricting $\{w^l\}_l$ to be of size 1 × 1 whenever $l < L$. The results are shown in Fig. 5. With learnable filters, the network is capable of capturing the correct directions of the blur kernels, as shown in Fig. 5b. In the absence of cascaded filtering, however, the recovered kernel is still coarse; this limitation is overcome by cascaded filtering, as verified in Fig. 5c.
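Why the cascade enlarges the effective filters: fully convolving a p × p kernel with a q × q kernel yields a (p + q − 1) × (p + q − 1) kernel, so each additional 3 × 3 layer grows the support by 2, while 1 × 1 layers leave it unchanged. The pure-Python sketch below composes single-channel kernels to check this; it deliberately ignores the channel mixing inside $w^l$ and only illustrates support growth, so it is not the authors' implementation.

```python
def conv2d_full(a, b):
    """Full 2-D convolution of two list-of-lists kernels."""
    ah, aw, bh, bw = len(a), len(a[0]), len(b), len(b[0])
    out = [[0.0] * (aw + bw - 1) for _ in range(ah + bh - 1)]
    for i in range(ah):
        for j in range(aw):
            for p in range(bh):
                for q in range(bw):
                    out[i + p][j + q] += a[i][j] * b[p][q]
    return out


def cascade(kernels):
    """Effective filter obtained by convolving a sequence of kernels."""
    effective = kernels[0]
    for k in kernels[1:]:
        effective = conv2d_full(effective, k)
    return effective


box3 = [[1.0] * 3 for _ in range(3)]
ten_layers = cascade([box3] * 10)                 # L = 10 layers of 3 x 3
print(len(ten_layers), len(ten_layers[0]))        # 21 21

no_cascade = cascade([box3] + [[[1.0]]] * 9)      # 1 x 1 whenever l < L
print(len(no_cascade), len(no_cascade[0]))        # 3 3
```

With L = 10, the deepest effective filters span 21 × 21 pixels from only 3 × 3 parameters per layer, which is consistent with the coarse versus refined kernels compared in Fig. 5.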

C. Evaluation on Linear Kernels
Fig. 6. Examples of images and kernels used for training. (a) The sharp images are collected from the BSDS500 dataset [70]. (b) The blur kernels are linear motions of different lengths and angles. (c) The blurred images are synthesized by convolving the sharp images with the blur kernels.

We use the training and validation portions of the BSDS500 [70], [63] dataset as training images. The linear motion kernels are generated by uniformly sampling 16 angles in [0, π] and 16 lengths in [5, 20], followed by 2D spatial interpolation, which gives a total of 256 kernels. We then generate T = 256 × 300 = 76,800 blurred images by convolving each kernel with each image and adding white Gaussian noise with standard deviation 0.01 (assuming the image intensity is
in the range [0, 1]) to each blurred image individually. Examples of training samples (images and kernels) are shown in Fig. 6. We use 200 images from the test portion of the BSDS500 dataset [70] for evaluation. We randomly choose angles in [0, π] and lengths in [5, 20] to generate 4 test kernels. The images and kernels are convolved to synthesize 800 blurred images, and white Gaussian noise (again with standard deviation 0.01) is added. Note that some of the state-of-the-art methods compared against, including [29] and [23], are only designed to recover the kernels; to obtain their deblurred images, the non-blind method in [13] is used consistently. The scores are averaged and summarized in Table III. The RMSE values are computed over kernels, and smaller values indicate more accurate recovery; for all other metrics, which are computed on images, higher scores generally imply better performance. We do not include results from Chakrabarti et al. [28] here because that method works on grayscale images only. Table III confirms that DUBLID outperforms competing state-of-the-art algorithms by a significant margin: 27.30 dB PSNR versus 24.82 dB for the next-best method, Nah et al. [40].
Fig. 7 shows four example images and kernels for a qualitative comparison. The two competing methods with the highest ISNR in Table III, Perrone et al. [23] and Nah et al. [40], are included as representatives of iterative methods and deep learning methods, respectively. Although [23] can roughly infer the directions of the blur kernels, the recovered coefficients clearly differ from the ground truth, as evidenced by the spread-out branches. Consequently, destroyed local structures and false colors are clearly observed in the reconstructed images. Nah et al.'s method [40] does not suffer from false colors, yet the recovered images appear blurry. In contrast, DUBLID recovers kernels close to the ground truth and produces significantly fewer visually objectionable artifacts in the recovered images.

D. Evaluation on Non-linear Kernels
It has been observed in several previous works [22], [71] that realistic motion kernels often have non-linear shapes due to irregular camera motion, such as those shown in Fig. 8. Therefore, the capability to handle such kernels is crucial for a blind motion deblurring method.
We generate training kernels by interpolating the paths provided by [71] together with paths we recorded ourselves: specifically, we record camera motion trajectories using the Vicon system and then interpolate the trajectories spatially to create motion kernels. We further augment these kernels by scaling over 4 different scales and rotating over 8 directions. In this way, we build around 30,000 training kernels in total.⁴ The blurred images for training are synthesized by randomly picking a kernel and convolving the image with it. Gaussian noise of standard deviation 0.01 is again added. We use the standard image set from [22] (comprising 4 images and 8 kernels) and that from [12] (comprising 80 images and 8 kernels) as the test sets. The average scores for the two datasets are presented in Table IV and Table V, respectively. On both datasets, DUBLID emerges overall as the best method. The method of Chakrabarti et al. [28] performs second best in Table V. In Table IV, Perrone et al. [23] and the recent deep learning method of Xu et al. [29] perform comparably to each other and mildly worse than DUBLID. DUBLID, however, achieves the deblurring at a significantly lower computational cost, as verified in Section IV-E.
Visual examples are shown in Figs. 9 and 10 for qualitative comparison. DUBLID recovers the kernels more faithfully and hence produces reconstructed images of higher visual quality. In particular, DUBLID preserves local details better, as shown in the zoom boxes of Figs. 9 and 10, while providing sharper images than Nah et al. [40], Chakrabarti et al. [28] and Kupyn et al. [66]. Finally, DUBLID is free of the visually objectionable artifacts observed in Perrone et al. [23] and Xu et al. [29].

⁴ To re-emphasize, all learning-based methods use the same training-test configuration for fairness in comparison.

E. Computational Comparisons Against State of the Art
Table VI summarizes the execution (inference) times of each method for processing a typical blurred image of resolution 480 × 320 and a blur kernel of size 31 × 31. The number of parameters for DUBLID is estimated as follows: for the 3 × 3 filters $w_{ij}$, there are a total of L = 10 layers, and each layer has $C^2$ = 16 × 16 filters, which contribute 3 × 3 × 16 × 16 × 10 ≈ 2.3 × 10⁴ parameters. The other parameters have negligible dimensions compared with $w_{ij}$ and thus do not contribute significantly.
We include measurements of running time on both CPU and GPU. The − symbol indicates inapplicability: for instance, Chakrabarti et al. [28] and Nah et al. [40] only provide GPU implementations of their work, and likewise Perrone et al.'s iterative method [23] is only implemented on a CPU. The two benchmark platforms are: 1.) an Intel Core i7-6900K 3.20 GHz CPU with 8 GB of RAM, and 2.) an NVIDIA TITAN X GPU. The results in Table VI deliver two messages. First, the deep/neural network based methods are faster than their iterative counterparts, which is to be expected. Second, among the deep neural network methods, DUBLID runs significantly faster than the others on both GPU and CPU, largely because it has significantly fewer parameters, as seen in the final row of Table VI. Note that the numbers of parameters for competing deep learning methods are computed based on the descriptions in their respective papers.

V. CONCLUSION
We propose an Algorithm Unrolling approach for Deep Blind image Deblurring (DUBLID). Our approach is based on recasting a generalized TV-regularized algorithm into a neural network and optimizing its parameters via a custom-designed back-propagation procedure. Unlike most existing neural network approaches, our technique has the benefit of interpretability, while sharing the performance benefits of modern neural network approaches. While some existing approaches excel for linear kernels and others for non-linear kernels, our method is versatile across a variety of scenarios and kernel choices, as verified both visually and quantitatively. Further, DUBLID requires far fewer parameters, leading to significant computational benefits over iterative methods as well as competing deep learning techniques.

(a) Groundtruth (b) Perrone et al. [23] (c) Nah et al. [40] (d) DUBLID
Fig. 7. Qualitative comparisons on the BSDS500 [70] dataset. The blur kernels are placed at the right bottom corner. DUBLID recovers the kernel at higher
accuracy and therefore the estimated images are more faithful to the groundtruth.
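The color results in Fig. 7 and Table III rely on the multi-channel update of Section III-C, which at every DFT frequency solves the 3 × 3 Hermitian system $\sum_{c'} c_{cc'}\,\widehat{x}_{c'} = b_c$ through the adjugate entries $m_{cc'}$ and the determinant $d$. The pure-Python sketch below evaluates those formulas for a single frequency and checks them against a directly multiplied system. The function name and the dictionary layout for the $c$ entries are illustrative assumptions; an actual implementation would apply the same arithmetic elementwise over all frequencies and then take $\mathcal{F}^{-1}$.

```python
def solve_color_frequency(c, b):
    """Closed-form 3x3 Hermitian solve for one DFT frequency.

    c: dict with real diagonal entries 'rr', 'gg', 'bb' and complex
       off-diagonal entries 'rg', 'rb', 'gb' (c_{c'c} = conj(c_{cc'})).
    b: tuple (b_r, b_g, b_b) of complex right-hand sides.
    Returns the Fourier coefficients (x_r, x_g, x_b).
    """
    crr, cgg, cbb = c["rr"], c["gg"], c["bb"]
    crg, crb, cgb = c["rg"], c["rb"], c["gb"]
    # Adjugate entries m_{cc'} as written in Section III-C.
    mrr = cgg * cbb - abs(cgb) ** 2
    mrg = crb * cgb.conjugate() - cbb * crg
    mrb = crg * cgb - cgg * crb
    mgg = cbb * crr - abs(crb) ** 2
    mgb = crg.conjugate() * crb - crr * cgb
    mbb = crr * cgg - abs(crg) ** 2
    # Determinant d (real for a Hermitian matrix).
    d = ((cgg * crr - abs(crg) ** 2) * cbb
         + 2 * (crb.conjugate() * crg * cgb).real
         - abs(cgb) ** 2 * crr - abs(crb) ** 2 * cgg)
    br, bg, bb = b
    xr = (mrr * br + mrg * bg + mrb * bb) / d
    xg = (mrg.conjugate() * br + mgg * bg + mgb * bb) / d
    xb = (mrb.conjugate() * br + mgb.conjugate() * bg + mbb * bb) / d
    return xr, xg, xb


# Usage: build b = M x for a known x and recover x.
c = {"rr": 3.0, "gg": 4.0, "bb": 5.0, "rg": 1 + 1j, "rb": 0.5 - 0.5j, "gb": 1j}
M = [[c["rr"], c["rg"], c["rb"]],
     [c["rg"].conjugate(), c["gg"], c["gb"]],
     [c["rb"].conjugate(), c["gb"].conjugate(), c["bb"]]]
x_true = [1 + 2j, -1j, 0.5]
b = tuple(sum(M[r][k] * x_true[k] for k in range(3)) for r in range(3))
x_est = solve_color_frequency(c, b)
```

Writing the inverse out explicitly, rather than calling a generic solver, keeps every step a pointwise arithmetic operation, which is what allows these formulas to be drawn as a diagram like Fig. 1c and differentiated through.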

TABLE III
QUANTITATIVE COMPARISON OVER AN AVERAGE OF 200 IMAGES AND 4 KERNELS. THE BEST SCORES ARE IN BOLD FONTS.
Metric          DUBLID  Perrone et al. [23]  Nah et al. [40]  Xu et al. [29]  Kupyn et al. [66]
PSNR (dB)       27.30   22.23                24.82            24.02           23.98
ISNR (dB)       4.45    2.06                 1.92             1.12            1.05
SSIM            0.88    0.76                 0.80             0.78            0.78
RMSE (×10⁻³)    1.67    5.21                 −                2.40            −
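The linear-kernel protocol behind Table III (Section IV-C) can be sketched as below: a 16 × 16 grid of angles and lengths gives the 256 training kernels, and each blurred image is a convolution plus white Gaussian noise of standard deviation 0.01. The paper only states that the kernels are obtained by "2D spatial interpolation", so the rasterization by dense point sampling with nearest-pixel rounding, the 21 × 21 kernel grid, the exclusion of the duplicate angle π, the integer lengths 5, 6, ..., 20 and the circular boundary handling are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
import random


def linear_motion_kernel(length, angle, size=21, samples=200):
    """Rasterize a centred line segment into a normalized size x size kernel."""
    kernel = [[0.0] * size for _ in range(size)]
    c = size // 2
    for s in range(samples):
        t = (s / (samples - 1) - 0.5) * length      # position along the segment
        col = int(round(c + t * math.cos(angle)))
        row = int(round(c - t * math.sin(angle)))
        if 0 <= row < size and 0 <= col < size:
            kernel[row][col] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in r] for r in kernel]


angles = [k * math.pi / 16 for k in range(16)]      # assumed: pi itself excluded
lengths = [5 + k for k in range(16)]                # assumed: 5, 6, ..., 20
kernels = [linear_motion_kernel(n, a) for n in lengths for a in angles]


def blur_with_noise(image, kernel, sigma=0.01, rng=random):
    """Circular 2-D convolution plus white Gaussian noise (intensities in [0, 1])."""
    h, w, k = len(image), len(image[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(k):
                for q in range(k):
                    acc += kernel[p][q] * image[(i - p + c) % h][(j - q + c) % w]
            out[i][j] = acc + rng.gauss(0.0, sigma)
    return out


print(len(kernels), len(kernels) * 300)             # 256 76800
```

Pairing every kernel with each of the 300 training images reproduces the count T = 256 × 300 stated in Section IV-C; the same `blur_with_noise` call with 4 randomly drawn kernels and the 200 test images gives the 800 test inputs.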

TABLE IV
QUANTITATIVE COMPARISON OVER AN AVERAGE OF 4 IMAGES AND 8 KERNELS FROM [22].
Metric          DUBLID  Perrone et al. [23]  Nah et al. [40]  Chakrabarti et al. [28]  Xu et al. [29]  Kupyn et al. [66]
PSNR (dB)       27.15   26.79                24.51            23.21                    26.75           23.98
ISNR (dB)       3.79    3.63                 1.35             0.06                     3.59            0.43
SSIM            0.89    0.89                 0.81             0.81                     0.89            0.80
RMSE (×10⁻³)    3.87    3.83                 −                4.33                     3.98            −

TABLE V
QUANTITATIVE COMPARISON OVER AN AVERAGE OF 80 IMAGES AND 8 NONLINEAR MOTION KERNELS FROM [12].
Metric          DUBLID  Perrone et al. [23]  Nah et al. [40]  Chakrabarti et al. [28]  Xu et al. [29]  Kupyn et al. [66]
PSNR (dB)       29.91   29.82                26.98            29.86                    26.55           25.84
ISNR (dB)       4.11    4.02                 0.86             4.06                     0.43            0.15
SSIM            0.93    0.92                 0.85             0.91                     0.87            0.83
RMSE (×10⁻³)    2.33    2.68                 −                2.72                     2.79            −
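The non-linear training kernels behind Tables IV and V are augmented in Section IV-D by 4 scales and 8 rotations, i.e. 32 variants per recorded trajectory. A minimal way to picture this is to transform the trajectory points before rasterizing them into a kernel, as sketched below. The scale factors, the uniform 45° rotation step, the 31 × 31 grid and the nearest-pixel splatting are assumptions for illustration only; the paper does not specify them, and a real pipeline would interpolate the trajectory densely before rasterizing.

```python
import math


def rasterize(points, size=31):
    """Splat (x, y) trajectory points (centred at 0) into a normalized kernel."""
    kernel = [[0.0] * size for _ in range(size)]
    c = size // 2
    for x, y in points:
        col, row = int(round(c + x)), int(round(c + y))
        if 0 <= row < size and 0 <= col < size:
            kernel[row][col] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in r] for r in kernel]


def augment(trajectory, scales=(0.5, 0.75, 1.0, 1.25), n_rotations=8, size=31):
    """Return len(scales) x n_rotations kernels from one camera trajectory."""
    # Centre the trajectory so rotations are about its mean position.
    mx = sum(p[0] for p in trajectory) / len(trajectory)
    my = sum(p[1] for p in trajectory) / len(trajectory)
    centred = [(x - mx, y - my) for x, y in trajectory]
    kernels = []
    for s in scales:
        for r in range(n_rotations):
            theta = 2 * math.pi * r / n_rotations
            ct, st = math.cos(theta), math.sin(theta)
            pts = [(s * (ct * x - st * y), s * (st * x + ct * y)) for x, y in centred]
            kernels.append(rasterize(pts, size))
    return kernels


# A toy wavy trajectory standing in for a recorded camera path.
traj = [(0.5 * t, 2.0 * math.sin(t / 3.0)) for t in range(20)]
variants = augment(traj)
```

Rotating and scaling the path, rather than the rasterized kernel image, avoids resampling blur in the kernels themselves; whether the authors did the same is not stated.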

TABLE VI
RUNNING TIME COMPARISONS OVER DIFFERENT METHODS. THE IMAGE SIZE IS 480 × 320 AND THE KERNEL SIZE IS 31 × 31.
                      DUBLID     Chakrabarti et al. [28]  Nah et al. [40]  Perrone et al. [23]  Xu et al. [29]  Kupyn et al. [66]
CPU time (s)          1.47       −                        −                1462.90              6.89            10.29
GPU time (s)          0.05       227.80                   7.32             −                    2.01            0.13
Number of parameters  2.3 × 10⁴  1.1 × 10⁸                2.3 × 10⁷        −                    6.0 × 10⁶       1.2 × 10⁷
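The DUBLID entry in the last row of Table VI follows from the count in Section IV-E: L = 10 layers, each holding $C^2$ = 16 × 16 filters of size 3 × 3. The arithmetic is reproduced below, together with a tally of the scalar parameters $\{b_i^l\}$, $\{\zeta_i^l\}$, $\{\beta^l\}$ and $\{\eta_i\}$, counting one scalar per index as their subscripts suggest; that per-index reading, and ignoring the three-channel input layer used for color images, are assumptions of this sketch.

```python
def dublid_filter_params(layers=10, channels=16, ksize=3):
    """Weights of the 3 x 3 filters w_ij^l: ksize^2 * C^2 * L."""
    return ksize * ksize * channels * channels * layers


def dublid_scalar_params(layers=10, channels=16):
    """One scalar per b_i^l and zeta_i^l (C*L each), beta^l (L) and eta_i (C)."""
    return 2 * channels * layers + layers + channels


w = dublid_filter_params()
s = dublid_scalar_params()
print(w, s, w + s)  # 23040 346 23386
```

The scalar parameters add about 1.5% to the filter count, which supports the statement that they "do not contribute significantly"; the total stays at roughly 2.3 × 10⁴.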

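Fig. 8. Examples of realistic non-linear kernels [22].

(a) Groundtruth (b) Perrone et al. [23] (c) Nah et al. [40] (d) Chakrabarti [28] (e) Xu et al. [29] (f) Kupyn et al. [66] (g) DUBLID
Fig. 9. Qualitative comparisons on the dataset from [22]. The blur kernels are shown in the bottom-right corner. DUBLID generates fewer artifacts and preserves more details than competing state-of-the-art methods.

(a) Groundtruth (b) Perrone et al. [23] (c) Nah et al. [40] (d) Chakrabarti [28] (e) Xu et al. [29] (f) Kupyn et al. [66] (g) DUBLID
Fig. 10. Qualitative comparisons on the dataset from [12]. The blur kernels are shown in the bottom-right corner.

APPENDIX A
GRADIENTS COMPUTATION BY BACK-PROPAGATION
Here we develop the back-propagation rules for computing the gradients of DUBLID. We use $F$ to denote the DFT operator and $F^*$ its adjoint operator; $\mathbf{1}$ is a vector whose entries are all ones, and $I$ is the identity matrix. The symbol $\mathbb{I}_{\{\cdot\}}$ denotes an indicator vector, and $\operatorname{diag}(\cdot)$ embeds a vector into a diagonal matrix. The operators $P_g$ and $P_k$ are projections that restrict the operand to the domain of the image and of the kernel, respectively. Let $\mathcal{L}$ be the cost function defined in (12). We derive its gradients with respect to its variables using the chain rule as follows:
$$\nabla_{w_i^l}\mathcal{L} = \nabla_{w_i^l} y_i^l\,\nabla_{y_i^l}\mathcal{L} = \mathcal{R}_{w_i^l}\,F\operatorname{diag}\big(\widehat{y^{l+1}}\big)F^{*}\,\nabla_{y_i^l}\mathcal{L},$$
$$\nabla_{\zeta_i^l}\mathcal{L} = \nabla_{\zeta_i^l} z_i^{l+1}\,\nabla_{z_i^{l+1}}\mathcal{L} = \left(\frac{\widehat{k^l}^{*}\big(\widehat{k^l}\,\widehat{g}_i^l - \widehat{y}_i^l\big)}{\big(|\widehat{k^l}|^2 + \zeta_i^l\big)^2}\right)^{T} F^{*}\,\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\,\nabla_{z_i^{l+1}}\mathcal{L},$$
$$\nabla_{b_i^l}\mathcal{L} = \nabla_{b_i^l} z_i^{l+1}\,\nabla_{z_i^{l+1}}\mathcal{L} = \big(\mathbb{I}_{\{g_i^{l+1} < -b_i^l\}} - \mathbb{I}_{\{g_i^{l+1} > b_i^l\}}\big)\,\nabla_{z_i^{l+1}}\mathcal{L},$$
where $\mathcal{R}_{w_i^l}$ is the operator that extracts the components lying in the support of $w_i^l$. Again using the chain rule,
$$\frac{\partial\mathcal{L}}{\partial k^l} = \frac{\partial\mathcal{L}}{\partial z_i^{l+1}}\frac{\partial z_i^{l+1}}{\partial k^l}, \qquad \frac{\partial\mathcal{L}}{\partial z_i^l} = \frac{\partial\mathcal{L}}{\partial z_i^{l+1}}\frac{\partial z_i^{l+1}}{\partial z_i^l} + \frac{\partial\mathcal{L}}{\partial k^l}\frac{\partial k^l}{\partial z_i^l},$$
$$\frac{\partial\mathcal{L}}{\partial y_i^l} = \frac{\partial\mathcal{L}}{\partial z_i^{l+1}}\frac{\partial z_i^{l+1}}{\partial y_i^l} + \frac{\partial\mathcal{L}}{\partial k^{l+1}}\frac{\partial k^{l+1}}{\partial y_i^l} + \frac{\partial\mathcal{L}}{\partial y_i^{l-1}}\frac{\partial y_i^{l-1}}{\partial y_i^l}. \tag{13}$$
We next derive each individual term in (13) as follows:
$$\frac{\partial z_i^{l+1}}{\partial \widehat{g}_i^{l+1}} = \frac{\partial z_i^{l+1}}{\partial g_i^{l+1}}\frac{\partial g_i^{l+1}}{\partial \widehat{g}_i^{l+1}} = \operatorname{diag}\big(\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\big)F^{*},$$
$$\frac{\partial z_i^{l+1}}{\partial z_i^l} = \frac{\partial z_i^{l+1}}{\partial \widehat{g}_i^{l+1}}\frac{\partial \widehat{g}_i^{l+1}}{\partial \widehat{z}_i^l}\frac{\partial \widehat{z}_i^l}{\partial z_i^l} = \operatorname{diag}\big(\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\big)F^{*}\operatorname{diag}\left(\frac{\zeta_i^l}{|\widehat{k^l}|^2 + \zeta_i^l}\right)F, \tag{14}$$
$$\frac{\partial z_i^{l+1}}{\partial y_i^l} = \frac{\partial z_i^{l+1}}{\partial \widehat{g}_i^{l+1}}\frac{\partial \widehat{g}_i^{l+1}}{\partial \widehat{y}_i^l}\frac{\partial \widehat{y}_i^l}{\partial y_i^l} = \operatorname{diag}\big(\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\big)F^{*}\operatorname{diag}\left(\frac{\widehat{k^l}^{*}}{|\widehat{k^l}|^2 + \zeta_i^l}\right)F, \tag{15}$$
$$\frac{\partial z_i^{l+1}}{\partial k^l} = \frac{\partial z_i^{l+1}}{\partial \widehat{g}_i^{l+1}}\frac{\partial \widehat{g}_i^{l+1}}{\partial \widehat{k^l}}\frac{\partial \widehat{k^l}}{\partial k^l} + \frac{\partial z_i^{l+1}}{\partial \widehat{g}_i^{l+1}}\frac{\partial \widehat{g}_i^{l+1}}{\partial \widehat{k^l}^{*}}\frac{\partial \widehat{k^l}^{*}}{\partial k^l} = \operatorname{diag}\big(\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\big)F^{*}\left(\operatorname{diag}\left(\frac{\zeta_i^l\,\widehat{y}_i^l}{\big(|\widehat{k^l}|^2 + \zeta_i^l\big)^2}\right)F^{*} - \operatorname{diag}\left(\frac{\big(\widehat{k^l}^{*}\big)^2\,\widehat{y}_i^l}{\big(|\widehat{k^l}|^2 + \zeta_i^l\big)^2}\right)F\right), \tag{16}$$
$$\frac{\partial k^{l+1}}{\partial \widehat{k^{l+1/3}}} = \frac{\partial k^{l+1}}{\partial k^{l+2/3}}\frac{\partial k^{l+2/3}}{\partial k^{l+1/3}}\frac{\partial k^{l+1/3}}{\partial \widehat{k^{l+1/3}}} = \frac{I\,\mathbf{1}^T k^{l+2/3} - k^{l+2/3}\mathbf{1}^T}{\big(\mathbf{1}^T k^{l+2/3}\big)^2}\operatorname{diag}\big(\mathbb{I}_{\{P_k k^{l+1/3} > 0\}}\big)F^{*},$$
$$\frac{\partial k^{l+1}}{\partial y_i^l} = \frac{\partial k^{l+1}}{\partial \widehat{k^{l+1/3}}}\frac{\partial \widehat{k^{l+1/3}}}{\partial \widehat{y}_i^l}\frac{\partial \widehat{y}_i^l}{\partial y_i^l} = \frac{I\,\mathbf{1}^T k^{l+2/3} - k^{l+2/3}\mathbf{1}^T}{\big(\mathbf{1}^T k^{l+2/3}\big)^2}\operatorname{diag}\big(\mathbb{I}_{\{P_k k^{l+1/3} > 0\}}\big)F^{*}\operatorname{diag}\left(\frac{\sum_{i=1}^{C}\widehat{z_i^{l+1}}^{*}}{\sum_{i=1}^{C}|\widehat{z_i^{l+1}}|^2 + \varepsilon}\right)F, \tag{17}$$
$$\frac{\partial k^{l+1}}{\partial z_i^{l+1}} = \frac{\partial k^{l+1}}{\partial \widehat{k^{l+1/3}}}\frac{\partial \widehat{k^{l+1/3}}}{\partial \widehat{z_i^{l+1}}}\frac{\partial \widehat{z_i^{l+1}}}{\partial z_i^{l+1}} + \frac{\partial k^{l+1}}{\partial \widehat{k^{l+1/3}}}\frac{\partial \widehat{k^{l+1/3}}}{\partial \widehat{z_i^{l+1}}^{*}}\frac{\partial \widehat{z_i^{l+1}}^{*}}{\partial z_i^{l+1}}$$
$$= \frac{I\,\mathbf{1}^T k^{l+2/3} - k^{l+2/3}\mathbf{1}^T}{\big(\mathbf{1}^T k^{l+2/3}\big)^2}\operatorname{diag}\big(\mathbb{I}_{\{P_k k^{l+1/3} > 0\}}\big)F^{*}\cdot\left[-\operatorname{diag}\left(\frac{\Big(\sum_{j=1}^{C}\widehat{z_j^{l+1}}^{*}\widehat{y_j^l}\Big)\widehat{z_i^{l+1}}^{*}}{\Big(\sum_{j=1}^{C}|\widehat{z_j^{l+1}}|^2 + \varepsilon\Big)^2}\right)F + \operatorname{diag}\left(\frac{\widehat{y_i^l}\Big(\sum_{j=1}^{C}|\widehat{z_j^{l+1}}|^2 + \varepsilon\Big) - \Big(\sum_{j=1}^{C}\widehat{z_j^{l+1}}^{*}\widehat{y_j^l}\Big)\widehat{z_i^{l+1}}}{\Big(\sum_{j=1}^{C}|\widehat{z_j^{l+1}}|^2 + \varepsilon\Big)^2}\right)F^{*}\right]. \tag{18}$$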
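Two building blocks recur in these Jacobians: the soft-thresholding $z = S_b(g)$, whose derivative with respect to $g$ is the indicator $\mathbb{I}_{\{|g| > b\}}$ and with respect to $b$ is $\mathbb{I}_{\{g < -b\}} - \mathbb{I}_{\{g > b\}}$, and the kernel normalization $u \mapsto u / \mathbf{1}^T u$ applied after a positivity step, whose Jacobian has the form $(I\,\mathbf{1}^T u - u\,\mathbf{1}^T)/(\mathbf{1}^T u)^2 \operatorname{diag}(\mathbb{I}_{\{k > 0\}})$ seen in (17) and (18). The pure-Python sketch below checks both against central finite differences. It treats the projection $P_k$ as the identity and uses a ReLU for the positivity step, which is our reading of the indicator $\mathbb{I}_{\{P_k k > 0\}}$, not code from the authors; all function names are hypothetical.

```python
def soft_threshold(g, b):
    """Elementwise S_b(g) = sign(g) * max(|g| - b, 0)."""
    return [(abs(v) - b) * (1 if v > 0 else -1) if abs(v) > b else 0.0 for v in g]


def soft_threshold_grads(g, b):
    """Analytic dz_j/dg_j = 1{|g_j| > b} and dz_j/db = 1{g_j < -b} - 1{g_j > b}."""
    dz_dg = [1.0 if abs(v) > b else 0.0 for v in g]
    dz_db = [(1.0 if v < -b else 0.0) - (1.0 if v > b else 0.0) for v in g]
    return dz_dg, dz_db


def normalize_relu(k):
    """k -> max(k, 0) / sum(max(k, 0)) (assumed positivity + normalization)."""
    pos = [max(v, 0.0) for v in k]
    s = sum(pos)
    return [v / s for v in pos]


def normalize_relu_jacobian(k):
    """J[m][n] = (delta_mn * s - u_m) / s^2 * 1{k_n > 0}, u = relu(k), s = sum(u)."""
    u = [max(v, 0.0) for v in k]
    s = sum(u)
    return [[((s if m == n else 0.0) - u[m]) / s ** 2 * (1.0 if k[n] > 0 else 0.0)
             for n in range(len(k))] for m in range(len(k))]


def central_diff(f, x, idx, eps=1e-6):
    """Central finite difference of vector-valued f along coordinate idx."""
    xp, xm = list(x), list(x)
    xp[idx] += eps
    xm[idx] -= eps
    return [(a - c) / (2 * eps) for a, c in zip(f(xp), f(xm))]


g, b = [0.5, -0.3, 0.01, -0.02, 0.2], 0.05
dz_dg, dz_db = soft_threshold_grads(g, b)
k = [0.4, -0.1, 0.3, 0.2]
J = normalize_relu_jacobian(k)
```

The test points are kept away from the kinks at $|g| = b$ and $k = 0$, where the indicators in the formulas above are only subgradients and finite differences would not agree.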
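$$\frac{\partial y_i^{l-1}}{\partial y_i^l} = \frac{\partial y_i^{l-1}}{\partial \widehat{y_i^{l-1}}}\frac{\partial \widehat{y_i^{l-1}}}{\partial \widehat{y_i^l}}\frac{\partial \widehat{y_i^l}}{\partial y_i^l} = F^{*}\operatorname{diag}\big(\widehat{w_i^{l-1}}\big)F. \tag{19}$$
Plugging (14), (15), (16), (17), (18) and (19) into (13), we obtain
$$\nabla_{g_i^l}\mathcal{L} = F\operatorname{diag}\left(\frac{\zeta_i^l}{|\widehat{k^l}|^2 + \zeta_i^l}\right)F^{*}\,\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\nabla_{z_i^{l+1}}\mathcal{L}$$
$$+ \left[-F\operatorname{diag}\left(\frac{\Big(\sum_{j=1}^{C}\widehat{g_j^l}^{*}\widehat{y_j^{l-1}}\Big)\widehat{g_i^l}^{*}}{\Big(\sum_{j=1}^{C}|\widehat{g_j^l}|^2 + \varepsilon\Big)^2}\right) + F^{*}\operatorname{diag}\left(\frac{\widehat{y_i^{l-1}}\Big(\sum_{j=1}^{C}|\widehat{g_j^l}|^2 + \varepsilon\Big) - \Big(\sum_{j=1}^{C}\widehat{g_j^l}^{*}\widehat{y_j^{l-1}}\Big)\widehat{g_i^l}}{\Big(\sum_{j=1}^{C}|\widehat{g_j^l}|^2 + \varepsilon\Big)^2}\right)\right]F^{*}\left(\frac{\mathbb{I}_{\{P_k k^{l-2/3} > 0\}}}{\mathbf{1}^T k^{l-1/3}}\nabla_{k^l}\mathcal{L} - \frac{\mathbb{I}_{\{P_k k^{l-2/3} > 0\}}\big(k^{l-1/3}\big)^T}{\big(\mathbf{1}^T k^{l-1/3}\big)^2}\nabla_{k^l}\mathcal{L}\right),$$
$$\nabla_{k^l}\mathcal{L} = \left(F^{*}\operatorname{diag}\left(\frac{\zeta_i^l\,\widehat{y}_i^l}{\big(|\widehat{k^l}|^2 + \zeta_i^l\big)^2}\right) - F\operatorname{diag}\left(\frac{\big(\widehat{k^l}^{*}\big)^2\,\widehat{y}_i^l}{\big(|\widehat{k^l}|^2 + \zeta_i^l\big)^2}\right)\right)F^{*}\,\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\nabla_{z_i^{l+1}}\mathcal{L},$$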
$$\nabla_{y_i^l}\mathcal{L} = F\operatorname{diag}\left(\frac{\widehat{k^l}^{*}}{|\widehat{k^l}|^2 + \zeta_i^l}\right)F^{*}\,\mathbb{I}_{\{|P_g g_i^{l+1}| > b_i^l\}}\nabla_{z_i^{l+1}}\mathcal{L} + F\operatorname{diag}\left(\frac{\sum_{i=1}^{C}\widehat{z_i^{l+1}}^{*}}{\sum_{i=1}^{C}|\widehat{z_i^{l+1}}|^2 + \varepsilon}\right)F^{*}\left(\frac{\mathbb{I}_{\{P_k k^{l+1/3} > 0\}}}{\mathbf{1}^T k^{l+2/3}}\nabla_{k^{l+1}}\mathcal{L} - \frac{\mathbb{I}_{\{P_k k^{l+1/3} > 0\}}\big(k^{l+2/3}\big)^T}{\big(\mathbf{1}^T k^{l+2/3}\big)^2}\nabla_{k^{l+1}}\mathcal{L}\right) + F\operatorname{diag}\big(\widehat{w_i^{l-1}}\big)F^{*}\,\nabla_{y_i^{l-1}}\mathcal{L}.$$

REFERENCES
[1] D. Kundur and D. Hatzinakos, “Blind image deconvolution,” IEEE Signal Process. Mag., vol. 13, no. 3, pp. 43–64, May 1996.
[2] William Hadley Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am., vol. 62, no. 1, pp. 55–59, 1972.
[3] L. A. Shepp and Y. Vardi, “Maximum Likelihood Reconstruction for Emission Tomography,” IEEE Trans. Med. Imaging, vol. 1, no. 2, pp. 113–122, Oct. 1982.
[4] G. R. Ayers and J. Ch. Dainty, “Iterative blind deconvolution method and its applications,” Opt. Lett., vol. 13, no. 7, pp. 547–549, 1988.
[5] Tony F. Chan and Chiu-Kwong Wong, “Total variation blind deconvolution,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 370–375, 1998.
[6] N. Joshi, R. Szeliski, and D. J. Kriegman, “PSF estimation using sharp edge prediction,” in Proc. IEEE Conf. CVPR, June 2008.
[7] Qi Shan, Jiaya Jia, and Aseem Agarwala, “High-quality Motion Deblurring from a Single Image,” in Proc. ACM SIGGRAPH, 2008.
[8] S. Cho and S. Lee, “Fast Motion Deblurring,” in Proc. ACM SIGGRAPH Asia, 2009.
[9] Li Xu and Jiaya Jia, “Two-phase kernel estimation for robust motion deblurring,” in Proc. ECCV, 2010.
[10] Dilip Krishnan, Terence Tay, and Rob Fergus, “Blind deconvolution using a normalized sparsity measure,” in Proc. IEEE Conf. CVPR, 2011.
[11] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 Sparse Representation for Natural Image Deblurring,” in Proc. IEEE Conf. CVPR, June 2013.
[12] L. Sun, S. Cho, J. Wang, and J. Hays, “Edge-based blur kernel estimation using patch priors,” in Proc. IEEE ICCP, Apr. 2013.
[13] J. Pan, Z. Hu, Z. Su, and M. H. Yang, “L0-Regularized Intensity and Gradient Prior for Deblurring Text Images and Beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 2, pp. 342–355, Feb. 2017.
[14] Jian-Feng Cai, Hui Ji, Chaoqiang Liu, and Zuowei Shen, “Framelet-Based Blind Motion Deblurring From a Single Image,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 562–572, Feb. 2012.
[15] Sh. Xiang, G. Meng, Y. Wang, Ch. Pan, and Ch. Zhang, “Image Deblurring with Coupled Dictionary Learning,” Int. J. Comput. Vis., vol. 114, no. 2-3, pp. 248–271, Sept. 2015.
[16] J. Pan, D. Sun, H. Pfister, and M. H. Yang, “Deblurring Images via Dark Channel Prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PP, no. 99, pp. 1–1, 2018.
[17] M. Tofighi, Y. Li, and V. Monga, “Blind image deblurring using row–column sparse representations,” IEEE Signal Processing Letters, vol. 25, no. 2, pp. 273–277, 2018.
[18] Rob Fergus, Barun Singh, Aaron Hertzmann, Sam T. Roweis, and William T. Freeman, “Removing Camera Shake from a Single Photograph,” in Proc. ACM SIGGRAPH, New York, NY, USA, 2006.
[19] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Efficient marginal likelihood optimization in blind deconvolution,” in Proc. IEEE Conf. CVPR, June 2011.
[20] S. Derin Babacan, Rafael Molina, Minh N. Do, and Aggelos K. Katsaggelos, “Bayesian Blind Deconvolution with General Sparse Image Priors,” in Proc. ECCV, Oct. 2012.
[21] David Wipf and Haichao Zhang, “Revisiting Bayesian blind deconvolution,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 3595–3634, 2014.
[22] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding Blind Deconvolution Algorithms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2354–2367, Dec. 2011.
[23] Daniele Perrone and Paolo Favaro, “A Clearer Picture of Total Variation Blind Deconvolution,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 6, pp. 1041–1055, June 2016.
[24] R. J. Steriti and M. A. Fiddy, “Blind deconvolution of images by use of neural networks,” Opt. Lett., vol. 19, no. 8, pp. 575–577, 1994.
[25] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using Deep Neural Networks for Inverse Problems in Imaging: Beyond Analytical Methods,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 20–36, 2018.
[26] Li Xu, Jimmy SJ Ren, Ce Liu, and Jiaya Jia, “Deep convolutional neural network for image deconvolution,” in Proc. NIPS, 2014.
[27] R. Yan and L. Shao, “Blind Image Blur Estimation via Deep Learning,” IEEE Trans. Image Process., vol. 25, no. 4, pp. 1910–1921, Apr. 2016.
[28] Ayan Chakrabarti, “A Neural Approach to Blind Motion Deblurring,” in Proc. ECCV, Oct. 2016.
[29] X. Xu, J. Pan, Y. J. Zhang, and M. H. Yang, “Motion Blur Kernel Estimation via Deep Learning,” IEEE Trans. Image Process., vol. 27, no. 1, pp. 194–205, Jan. 2018.
[30] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse coding,” in Proc. ICML, 2010.
[31] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[32] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proc. IEEE ICCV, 2015.
[33] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep Convolutional Neural Network for Inverse Problems in Imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, Sept. 2017.
[34] Y. Chen and T. Pock, “Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, 2017.
[35] Oren Solomon, Regev Cohen, Yi Zhang, Yi Yang, He Qiong, Jianwen Luo, Ruud J. G. van Sloun, and Yonina C. Eldar, “Deep Unfolded Robust PCA with Application to Clutter Suppression in Ultrasound,” arXiv:1811.08252 [cs, stat], Nov. 2018.
[36] Christian J. Schuler, Michael Hirsch, Stefan Harmeling, and Bernhard Scholkopf, “Learning to Deblur,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 7, pp. 1439–1451, July 2016.
[37] Y. W. Tai, P. Tan, and M. S. Brown, “Richardson-Lucy Deblurring for Scenes under a Projective Motion Path,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1603–1618, Aug. 2011.
[38] Oliver Whyte, Josef Sivic, Andrew Zisserman, and Jean Ponce, “Non-uniform Deblurring for Shaken Images,” Int. J. Comput. Vis., vol. 98, no. 2, pp. 168–186, June 2012.
[39] Jian Sun, Wenfei Cao, Zongben Xu, and Jean Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proc. IEEE Conf. CVPR, 2015.
[40] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in Proc. IEEE Conf. CVPR, 2017, vol. 1, p. 3.
[41] T. M. Nimisha, A. K. Singh, and A. N. Rajagopalan, “Blur-Invariant Deep Learning for Blind-Deblurring,” in Proc. IEEE ICCV, Oct. 2017, pp. 4762–4770.
[42] Shuochen Su, Mauricio Delbracio, Jue Wang, Guillermo Sapiro, Wolfgang Heidrich, and Oliver Wang, “Deep Video Deblurring for Hand-held Cameras,” in Proc. IEEE Conf. CVPR, 2017.
[43] R. Raskar, A. Agrawal, and J. Tumblin, “Coded Exposure Photography: Motion Deblurring Using Fluttered Shutter,” in ACM SIGGRAPH, 2006.
[44] T. S. Cho, A. Levin, F. Durand, and W. T. Freeman, “Motion blur removal with orthogonal parabolic exposures,” in IEEE ICCP, 2010.
[45] N. Joshi, S. B. Kang, C. L. Zitnick, and R. Szeliski, “Image Deblurring Using Inertial Measurement Sensors,” in Proc. ACM SIGGRAPH, 2010.
[46] J.-F. Cai et al., “Blind motion deblurring using multiple images,” J. Comput. Phys., vol. 228, no. 14, pp. 5057–5071, Aug. 2009.
[47] F. Sroubek and P. Milanfar, “Robust Multichannel Blind Deconvolution via Fast Alternating Minimization,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1687–1700, Apr. 2012.
[48] Y. Li, M. Tofighi, V. Monga, and Y. C. Eldar, “An algorithm unrolling approach to deep image deblurring,” http://signal.ee.psu.edu/icassp19.pdf, submitted to the 2019 44th IEEE International Conference on Acoustics, Speech, and Signal Processing.
[49] Rafael C. Gonzalez and Richard E. Woods, “Digital image processing second edition,” Beijing: Publishing House of Electronics Industry, vol. 455, 2002.
[50] W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 9, pp. 891–906, Sept. 1991.
[51] Jean-Luc Starck, Emmanuel J. Candes, and David L. Donoho, “The curvelet transform for image denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, 2002.

[52] M. Unser, N. Chenouard, and D. Van De Ville, “Steerable Pyramids and Tight Wavelet Frames in $L_2(\mathbb{R}^d)$,” IEEE Trans. Image Process., vol. 20, no. 10, pp. 2705–2721, Oct. 2011.
[53] B. Mailhé, S. Lesage, R. Gribonval, F. Bimbot, and P. Vandergheynst, “Shift-invariant dictionary learning for sparse representations: Extending K-SVD,” in EUSIPCO, Aug. 2008, pp. 1–5.
[54] Q. Barthelemy, A. Larue, A. Mayoue, D. Mercier, and J. I. Mars, “Shift & 2D Rotation Invariant Sparse Coding for Multivariate Signals,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1597–1611, Apr. 2012.
[55] Y. Wang, J. Yang, W. Yin, and Y. Zhang, “A New Alternating
Minimization Algorithm for Total Variation Image Reconstruction,”
SIAM J. Imaging Sci., vol. 1, no. 3, pp. 248–272, Jan. 2008.
[56] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth, “Discrim-
inative Non-blind Deblurring,” in IEEE Conf. CVPR, June 2013.
[57] Dimitri P Bertsekas, Constrained optimization and Lagrange multiplier
methods, Academic press, 2014.
[58] Andrew Blake and Andrew Zisserman, Visual Reconstruction, MIT
Press, Cambridge, MA, USA, 1987.
[59] Vinod Nair and Geoffrey E. Hinton, “Rectified linear units improve
restricted boltzmann machines,” in Proc. ICML, 2010, pp. 807–814.
[60] Karen Simonyan and Andrew Zisserman, “Very deep convolutional
networks for large-scale image recognition,” in Proc. ICLR, 2015.
[61] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic
optimization,” in Proc. ICLR, 2015.
[62] Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of
training deep feedforward neural networks,” in Proc. ICAIS, Mar. 2010.
[63] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour Detection
and Hierarchical Image Segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 33, no. 5, pp. 898–916, May 2011.
[64] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Proc. ECCV. Springer, 2014, pp. 740–755.
[65] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep
Residual Learning for Image Recognition,” in Proc. IEEE Conf. CVPR,
June 2016, pp. 770–778.
[66] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “DeblurGAN: Blind motion deblurring using conditional adversarial networks,” in Proc. IEEE Conf. CVPR, June 2018.
[67] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio,
“Generative adversarial nets,” in Proc. NIPS, 2014, pp. 2672–2680.
[68] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli, “Image
quality assessment: from error visibility to structural similarity,” IEEE
Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[69] J. Kim, J. K. Lee, and K. M. Lee, “Accurate Image Super-Resolution
Using Very Deep Convolutional Networks,” in Proc. IEEE Conf. CVPR,
June 2016, pp. 1646–1654.
[70] D. Martin et al., “A database of human segmented natural images
and its application to evaluating segmentation algorithms and measuring
ecological statistics,” in Proc. IEEE ICCV, July 2001.
[71] R. Köhler et al., “Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database,” in Proc. ECCV. Springer, 2012, pp. 27–40.
