Algorithm_Unrolling_Interpretable_Efficient_Deep_Learning_for_Signal_and_Image_Processing
Eldar
Algorithm Unrolling
Interpretable, efficient deep learning for signal and image processing
Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature, i.e., a lack of interpretability and the need for very large training sets. An emerging technique called algorithm unrolling, or unfolding, offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are widely used in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention, and it is rapidly growing in both theoretical investigations and practical applications. The increasing popularity of unrolled deep networks is due, in part, to their potential in developing efficient, high-performance (yet interpretable) network architectures from reasonably sized training sets.
In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing, including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on the current limitations of unrolling and suggest possible future research directions.
Introduction
The past decade has witnessed a deep learning revolution. The availability of large-scale training data sets, which is often facilitated by Internet content; the accessibility of powerful computational resources thanks to breakthroughs in microelectronics; and advances in neural network research, such as the development of effective network architectures and efficient training algorithms, have resulted in the unprecedented success of deep learning in innumerable applications of computer vision, pattern recognition, and speech processing. For instance, deep learning has provided significant accuracy gains in image recognition, which is one of the core tasks in computer vision. Groundbreaking performance improvements have been demonstrated via AlexNet [1], and fewer classification errors than human-level performance [2] were reported for the ImageNet data set [3].
FIGURE 1. A high-level overview of algorithm unrolling. Given (a) an iterative algorithm, (b) a corresponding deep network can be generated by cascading the algorithm's iterations $h$. The iteration step $h$ in (a) is executed a number of times, resulting in the network layers $h^1, h^2, \ldots$ in (b). Each iteration $h$ depends on algorithm parameters $\theta$, which are transferred into network parameters $\theta^1, \theta^2, \ldots$. Instead of determining these parameters through cross-validation and analytical derivations, we learn $\theta^1, \theta^2, \ldots$ from training data sets through end-to-end training. In this way, the resulting network could achieve better performance than the original iterative algorithm. In addition, the network layers naturally inherit interpretability from the iteration procedure. The learnable parameters are colored in blue.
The W's and b's are generally trainable parameters that are learned from data sets through training, during which backpropagation [18] is often employed for gradient computation.
Today, MLPs are rarely seen in practical imaging and vision applications. The fully connected nature of MLPs contributes to a rapid increase in the number of parameters, making training difficult. To address this limitation, Fukushima et al. [19] designed a neural network by mimicking the visual nervous system [20]. The neuron connections are restricted to local neighbors, and weights are shared across different spatial locations. The linear operations then become convolutions (or correlations, in a strict sense), and thus the networks employing such localizing structures are generally called convolutional neural networks (CNNs). A representation of a CNN can be seen in Fig-
FIGURE 2. Conventional neural network architectures that are popular in signal/image processing and computer vision applications. (a) An MLP, where all the neurons are fully connected. (b) A CNN, where the neurons are sparsely connected and the weights are shared among different neurons. Therefore, the weight matrices $W^l$, $l = 1, 2, \ldots, L$ effectively become convolution operators. (c) An RNN, where the inputs $x^l$, $l = 1, 2, \ldots, L$ are fed in sequentially and the parameters U, V, and W are shared across different time steps.
FIGURE 3. The LISTA. One iteration of the ISTA executes a linear operation and then a nonlinear one and thus can be recast into a network layer; by stacking the layers together, a deep network is formed. The network is subsequently trained using paired inputs and outputs by backpropagation to optimize the parameters $W_e$, $W_t$, and $\lambda$; $\mu$ is a constant parameter that controls the step size of each iteration. The trained network, a LISTA, is computationally more efficient compared with the original ISTA. The trainable parameters in the network are colored in blue. For details, see “Learned Iterative Shrinkage and Thresholding Algorithm.” In practice, $W_e$, $W_t$, and $\lambda$ may vary in each layer. (a) An ISTA. (b) A single network layer. (c) An unrolled deep network.
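The layer structure described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the iterate is assumed to start from zero, and the ISTA weights are assumed to take the usual form $W_t = I - \frac{1}{\mu} W^T W$ and $W_e = \frac{1}{\mu} W^T$. Training of the per-layer parameters by backpropagation is not shown.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Elementwise soft thresholding: sign(x) * max(|x| - thresh, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista(y, W, thresh, mu, n_iter):
    """Run n_iter ISTA iterations for y ~ W x with sparse x, starting from zero."""
    m = W.shape[1]
    W_t = np.eye(m) - (W.T @ W) / mu   # fixed map applied to the current estimate
    W_e = W.T / mu                     # fixed map applied to the observation y
    x = np.zeros(m)
    for _ in range(n_iter):
        x = soft_threshold(W_t @ x + W_e @ y, thresh)
    return x

def lista_forward(y, layers):
    """Forward pass of the unrolled network.

    layers is a list of (W_t, W_e, thresh) tuples, one per layer; in a LISTA
    these would be learned from data instead of being derived from W.
    """
    x = np.zeros(layers[0][0].shape[0])
    for W_t, W_e, thresh in layers:
        x = soft_threshold(W_t @ x + W_e @ y, thresh)
    return x

def init_lista_from_ista(W, thresh, mu, n_layers):
    """Initialize every layer with the analytic ISTA weights (a common starting point)."""
    m = W.shape[1]
    return [(np.eye(m) - (W.T @ W) / mu, W.T / mu, thresh) for _ in range(n_layers)]
```

With the analytic initialization, `lista_forward` reproduces `ista` exactly for the same number of layers and iterations; any speedup would come from training the layer parameters afterward.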
where $I \in \mathbb{R}^{m \times m}$ is the identity matrix, $\mu$ is a positive parameter that controls the iteration step size, and $S_\lambda(\cdot)$ is the soft-thresholding operator defined elementwise as

$$S_\lambda(x) = \operatorname{sign}(x) \cdot \max\{|x| - \lambda, 0\}. \quad \text{(S3)}$$

Basically, the ISTA is equivalent to a gradient step of $\|y - Wx\|_2^2$ followed by a projection onto the $\ell_1$ ball. As depicted in Figure 3, the iteration (S2) can be recast into a single network layer. This layer includes a series of analytic operations (matrix–vector multiplication, summation, and soft thresholding), which is of the same nature as a neural network. Executing the ISTA L times can be inter-

$$\ell(W_t, W_e, \lambda) = \frac{1}{N} \sum_{n=1}^{N} \left\| \hat{x}^n(y^n; W_t, W_e, \lambda) - x^{*n} \right\|_2^2, \quad \text{(S4)}$$

and the network is trained through loss minimization, using popular gradient-based learning techniques, such as stochastic gradient descent [18], to learn $W_t$, $W_e$, and $\lambda$. It has been empirically shown that the number of layers L in the (trained) LISTA can be an order of magnitude smaller than the number of iterations required for the ISTA [13] to achieve convergence corresponding to a new observed input.
Reference
[S1] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2012.
FIGURE 4. The general idea of algorithm unrolling. Starting with an abstract iterative algorithm, we map one iteration (described as the function $h$ parametrized by $\theta^l$, $l = 0, \ldots, L-1$) into a single network layer and stack a finite number of layers together to form a deep network. Feeding the data forward through an L-layer network is equivalent to executing the iteration L times (finite truncation). The parameters $\theta^l$, $l = 0, 1, \ldots, L-1$ are learned from real data sets by training the network end to end to optimize the performance. The parameters can either be shared across different layers or vary from layer to layer. The trainable parameters are colored in blue. (a) An iterative algorithm. (b) An unrolled deep network.
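The mapping from an iteration to a stack of layers can be written generically. The sketch below is only an illustration under stated assumptions: `unroll` and `gradient_step` are hypothetical names, and the example iteration (a gradient step on a least-squares objective whose only parameter is the step size) is chosen for simplicity and is not taken from the article.

```python
import numpy as np

def unroll(h, thetas, x0):
    """Apply the iteration h once per entry of thetas (one entry per layer).

    Passing the same theta L times corresponds to shared parameters;
    passing different values corresponds to layer-specific parameters.
    """
    x = x0
    for theta in thetas:
        x = h(x, theta)
    return x

def gradient_step(A, b):
    """Return an iteration h(x, step) performing one gradient step on 0.5 * ||A x - b||^2."""
    def h(x, step):
        return x - step * (A.T @ (A @ x - b))
    return h
```

For example, `unroll(gradient_step(A, b), [0.1] * 10, x0)` executes ten iterations with a shared step size, whereas a list of ten different values gives a ten-layer network with layer-specific parameters that could then be learned end to end.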
Table 1. Recent methods employing algorithm unrolling in practical signal processing and imaging applications.
FIGURE 5. The SCN [26] architecture. The patches extracted from input low-resolution image $y$ are fed into a LISTA subnetwork to estimate the associated sparse codes $\alpha$, and then high-resolution patches are reconstructed through a linear layer. The predicted high-resolution image $\hat{x}$ is formed by putting these patches into their corresponding spatial locations. The whole network is trained on pairs of low- and high-resolution images using the standard stochastic gradient descent algorithm. The high-resolution dictionary $D$ (colored in blue) and the LISTA parameters are trainable from real data sets.
FIGURE 6. Sample experimental results from [26] for visual comparison in single-image superresolution. (a) Ground-truth images. (b) Results from [45]. (c) Results from [46]. (d) Results from [26]. (b)–(d) include a state-of-the-art iterative algorithm as well as a deep learning technique. Note that the magnified portions show that the SCN better recovers sharp edges and spatial details.
FIGURE 7. The architecture in [31]. The network is formed by concatenating multiple stages of essential blind image deblurring modules. Stages 2 and 3
repeat the same operations as stage 1, with different trainable parameters. From a conceptual standpoint, each stage imitates one iteration of a typical
blind image deblurring algorithm. The training data can be formed by synthetically blurring sharp images to obtain their blurred versions.
FIGURE 8. Sample experimental results from [39] for visual comparison of blind image deblurring. (a) Ground-truth images and kernels. (b) A top-performing
iterative algorithm from Perrone et al. [48]. Two state-of-the-art deep learning techniques (c) and (d), from Nah et al. [49] and Tao et al. [50], respectively, are
compared against (e) the DUBLID method.
image, $k$ is the unknown blur kernel, and $n$ is Gaussian random noise. A popular class of image deblurring algorithms perform total variation minimization, which solves the following optimization problem:

$$\min_{k, g_1, g_2} \; \frac{1}{2}\left( \|D_x y - k * g_1\|_2^2 + \|D_y y - k * g_2\|_2^2 \right) + \lambda_1 \|g_1\|_1 + \lambda_2 \|g_2\|_1 + \frac{\epsilon}{2}\|k\|_2^2,$$
$$\text{subject to } \|k\|_1 = 1, \; k \geq 0, \quad \text{(S6)}$$

where $D_x y$ and $D_y y$ are the partial derivatives of $y$ in horizontal and vertical directions, respectively, and $\lambda_1$, $\lambda_2$, and $\epsilon$ are positive regularization coefficients. Upon convergence, the variables $g_1$ and $g_2$ are estimates of the sharp image gradients in the x and y directions, respectively. In [15] and [39], (S6) was generalized by realizing that $D_x$ and $D_y$ are computed using linear filters, which can be generalized into a set of C filters $\{f_i\}_{i=1}^C$:

$$\min_{k, \{g_i\}_{i=1}^C} \; \sum_{i=1}^{C} \left( \frac{1}{2}\|f_i * y - k * g_i\|_2^2 + \lambda_i \|g_i\|_1 \right) + \frac{\epsilon}{2}\|k\|_2^2,$$
$$\text{subject to } \|k\|_1 = 1, \; k \geq 0. \quad \text{(S7)}$$

An efficient optimization algorithm to solve (S7) is the half-quadratic splitting algorithm, which alternately minimizes the surrogate problem

$$\min_{k, \{g_i, z_i\}_{i=1}^C} \; \sum_{i=1}^{C} \left( \frac{1}{2}\|f_i * y - k * g_i\|_2^2 + \lambda_i \|z_i\|_1 + \frac{1}{2\zeta_i}\|g_i - z_i\|_2^2 \right) + \frac{\epsilon}{2}\|k\|_2^2,$$
$$\text{subject to } \|k\|_1 = 1, \; k \geq 0 \quad \text{(S8)}$$

sequentially across the variables $\{g_i\}_{i=1}^C$, $\{z_i\}_{i=1}^C$ and $k$. Here, $\zeta_i$, $i = 1, \ldots, C$ are regularization coefficients.

$$g_i^{l+1} = \mathcal{F}^{-1}\left\{ \frac{\zeta_i^l \, \widehat{k^l}^{*} \odot \widehat{f_i^l * y} + \widehat{z_i^l}}{\zeta_i^l \, |\widehat{k^l}|^2 + 1} \right\} := \mathcal{M}^1\{f^l * y, z^l; \zeta^l\}, \quad \forall i,$$
$$z_i^{l+1} = S_{\lambda_i^l \zeta_i^l}\{g_i^{l+1}\} := \mathcal{M}^2\{g^{l+1}; \beta^l\}, \quad \forall i,$$
$$k^{l+1} = \mathcal{N}_1\left( \left[ \mathcal{F}^{-1}\left\{ \frac{\sum_{i=1}^{C} \widehat{z_i^{l+1}}^{*} \odot \widehat{f_i^l * y}}{\sum_{i=1}^{C} |\widehat{z_i^{l+1}}|^2 + \epsilon} \right\} \right]_+ \right) := \mathcal{M}^3\{f^l * y, z^{l+1}\}, \quad \text{(S9)}$$

where $[\cdot]_+$ is the rectified linear unit operator, $\hat{x}$ denotes the discrete Fourier transform (DFT) of $x$, $\mathcal{F}^{-1}$ indicates the inverse DFT operator, $\odot$ refers to elementwise multiplication, $S$ is the soft-thresholding operator defined elementwise in (S3), and the operator $\mathcal{N}_1(\cdot)$ normalizes its operand into the unit sum. In this case, $\zeta^l = \{\zeta_i^l\}_{i=1}^C$, $\beta^l = \{\lambda_i^l \zeta_i^l\}_{i=1}^C$, and $g^l$, $f^l * y$, and $z^l$ refer to $\{g_i^l\}_{i=1}^C$, $\{f_i^l * y\}_{i=1}^C$, and $\{z_i^l\}_{i=1}^C$ stacked together. Note that layer-specific parameters $\zeta^l$, $\beta^l$, and $f^l$ are used. The parameter $\epsilon > 0$ is a fixed constant.

As with most existing unrolling methods, only L iterations are performed. The sharp image is retrieved from $g^L$ and $k^L$ by solving the following linear least-squares problem:

$$\tilde{x} = \arg\min_x \; \frac{1}{2}\|y - \tilde{k} * x\|_2^2 + \sum_{i=1}^{C} \frac{\eta_i}{2}\|f_i^L * x - g_i^L\|_2^2$$
$$= \mathcal{F}^{-1}\left\{ \frac{\widehat{\tilde{k}}^{*} \odot \hat{y} + \sum_{i=1}^{C} \eta_i \, \widehat{f_i^L}^{*} \odot \widehat{g_i^L}}{|\widehat{\tilde{k}}|^2 + \sum_{i=1}^{C} \eta_i \, |\widehat{f_i^L}|^2} \right\} := \mathcal{M}^4\{y, g^L, k^L; \eta, f^L\}, \quad \text{(S10)}$$
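One layer of the updates in (S9) can be sketched with NumPy FFTs. This is a hedged illustration only: it assumes circular (periodic) convolution, a kernel zero-padded to the image size, and precomputed filtered images $f_i^l * y$; the function and argument names are hypothetical, and neither the filter learning nor the final image estimate (S10) is shown.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Elementwise soft thresholding, as in (S3)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def deblur_layer(fy, z, k, zeta, lam, eps=1e-3):
    """One pass of the g, z, and k updates.

    fy, z : arrays of shape (C, H, W) holding the filtered images f_i * y and z_i
    k     : array of shape (H, W), the current kernel zero-padded to image size
    zeta, lam : arrays of shape (C,), per-filter coefficients for this layer
    """
    zeta = np.asarray(zeta, dtype=float)[:, None, None]
    lam = np.asarray(lam, dtype=float)[:, None, None]
    k_hat = np.fft.fft2(k)
    fy_hat = np.fft.fft2(fy)          # fft2 acts on the last two axes
    z_hat = np.fft.fft2(z)

    # g update: closed-form quadratic minimization in the Fourier domain
    g_hat = (zeta * np.conj(k_hat) * fy_hat + z_hat) / (zeta * np.abs(k_hat) ** 2 + 1.0)
    g = np.real(np.fft.ifft2(g_hat))

    # z update: soft thresholding with threshold lam_i * zeta_i
    z_new = soft_threshold(g, lam * zeta)

    # k update: regularized least squares, then ReLU and unit-sum normalization
    zn_hat = np.fft.fft2(z_new)
    num = np.sum(np.conj(zn_hat) * fy_hat, axis=0)
    den = np.sum(np.abs(zn_hat) ** 2, axis=0) + eps
    k_new = np.maximum(np.real(np.fft.ifft2(num / den)), 0.0)
    k_new = k_new / max(k_new.sum(), 1e-12)
    return g, z_new, k_new
```

Stacking L calls of `deblur_layer`, each with its own `zeta`, `lam`, and filters, would mirror the layer-specific parameterization described above.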
recovering low-dose CT images. While this technique offers merits in reconstruction, the extracted features may not favor detection tasks. Therefore, Wu et al. [37] extend the method by concatenating it with a detection network and apply joint fine-tuning after individually training both networks. Their jointly fine-tuned network outperforms state-of-the-art alternatives.
Another important imaging modality is ultrasound, which has the advantage of being a radiation-free approach. When used for blood flow depiction, one of the challenges is the fact that the tissue reflections tend to be much stronger than those of the blood, leading to strong clutter resulting from the tissue. Thus, an important task is to separate the tissue from the blood. Various filtering methods have been used in this context, such as high-pass filtering and filtering based on the singular value decomposition. Solomon et al. [33] suggest using a robust PCA technique by modeling the received ultrasound movie as a low-rank and sparse matrix, where the tissue is low rank and the blood vessels are sparse. They then unroll an ISTA approach to robust PCA into a deep network, which is called convolutional robust PCA (CORONA). As the name suggests, the authors replace matrix multiplications with convolutional layers, effectively converting the
FIGURE S1. Deep unrolling for blind deblurring [15]. The analytical operations $\mathcal{M}^1$, $\mathcal{M}^2$, $\mathcal{M}^3$, and $\mathcal{M}^4$ correspond to casting the analytic expressions in (S9) and (S10) into the network. Trainable parameters are colored in blue. In particular, the parameters $f^l$, $l = 1, \ldots, L$ denote trainable filter coefficients in the $l$th layer.
FIGURE 9. Sample experimental results demonstrating the recovery of ultrasound contrast agents (UCAs) from cluttered maximum-intensity projection
(MIP) images [33]. (a) An MIP image of the input movie, composed from 50 frames of simulated UCAs cluttered by tissue. (b) A ground-truth UCA MIP
image. (c) A recovered UCA MIP image via CORONA. (d) A ground-truth tissue MIP image. (e) A recovered MIP tissue image via CORONA. The color bar
is measured in decibels. (Source: [33]; used with permission.)
$$\min_x \; \frac{1}{2}\|\Phi x - y\|_2^2 + \sum_{i=1}^{C} \lambda_i \, g(D_i x), \quad \text{(S11)}$$

$$:= \mathcal{U}^1\{y, \alpha_i^{l-1}, z_i^{l-1}; \rho_i, D_i\},$$
$$z_i^l = \mathcal{P}_g\left\{ D_i x^l + \alpha_i^{l-1}; \frac{\lambda_i}{\rho_i} \right\}$$
FIGURE S2. The ADMM-CSNet [14]. Each stage includes a series of interrelated operations whose analytic forms are given in (S14). The trainable
parameters are colored in blue.
In ultrasound imaging, a series of pulses is transmitted into the imaged medium, and the pulses' echoes are received in each transducer element. After beamforming and

$$L^{l+1} = \mathcal{T}_{\lambda_1/\mu}\left\{ \left( I - \frac{1}{\mu} H_1^H H_1 \right) L^l - H_1^H H_2 S^l + H_1^H D \right\},$$
$$S^{l+1} = \mathcal{S}^{1,2}_{\lambda_2/\mu}\left\{ \left( I - \frac{1}{\mu} H_2^H H_2 \right) S^l - H_2^H H_1 L^l + H_2^H D \right\},$$
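A single pass of the two updates above can be sketched as follows. The sketch rests on assumptions that this excerpt does not spell out: $\mathcal{T}$ is taken to be singular value thresholding, $\mathcal{S}^{1,2}$ is taken to be row-wise (mixed $\ell_{1,2}$) soft thresholding, and $H_1$, $H_2$ are kept as plain matrices, whereas CORONA replaces them with convolutional layers. All function names are hypothetical.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink each singular value of X by tau."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def row_soft_threshold(X, tau):
    """Shrink the l2 norm of each row of X by tau (rows with norm below tau become zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def rpca_layer(L, S, D, H1, H2, lam1, lam2, mu):
    """One iteration of the low-rank (L) and sparse (S) updates, written as printed above."""
    H1h, H2h = H1.conj().T, H2.conj().T
    I1 = np.eye(H1.shape[1])
    I2 = np.eye(H2.shape[1])
    L_new = svt((I1 - H1h @ H1 / mu) @ L - H1h @ H2 @ S + H1h @ D, lam1 / mu)
    S_new = row_soft_threshold((I2 - H2h @ H2 / mu) @ S - H2h @ H1 @ L + H2h @ D, lam2 / mu)
    return L_new, S_new
```

In an unrolled network, each call would carry its own learned operators and thresholds instead of the fixed `H1`, `H2`, `lam1`, `lam2`, and `mu` used here.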
FIGURE S3. The CRF–RNN network [30]. An FCN is concatenated with an RNN, called the CRF–RNN, to form a deep network. This RNN essentially
performs MF iterations and acts like CRF-based postprocessing. The concatenated network can be trained end to end to optimize its performance.
$$M \approx WH, \quad W \geq 0, \; H \geq 0, \quad \text{(S21)}$$

where $W$ has $w_l$ as its $l$th column, $\geq 0$ denotes elementwise nonnegativity, and $H = (h_{l,t})$. To remove multiplicative ambiguity, it is commonly assumed that each column of $W$ has a unit $\ell_2$ norm; i.e., occurrences of $w_l$ are unit vectors. The model (S21) is commonly called nonnegative matrix factorization (NMF) [S5] and has found wide applications in signal and image processing. In practice, the nonnegativity constraints prevent the mutual canceling of basis vectors and thus encourage semantically meaningful decompositions, which turns out to be highly beneficial.
Assuming that the phases among different sources are approximately the same, the power or magnitude spectrogram of the mixture can be decomposed as a summation of those from each source. Therefore, after performing NMF, the sources can be separated by selecting basis vectors corresponding to each individual source and recombining the source-specific basis vectors to recover the magnitude spectrograms. In practical implementation, typically, a filtering process similar to classical Wiener filtering is performed for magnitude spectrogram recovery.

$$W^l = W^{l-1} \odot \frac{\left[ M \odot (W^{l-1} H^l)^{\beta - 2} \right] {H^l}^T}{(W^{l-1} H^l)^{\beta - 1} {H^l}^T}, \quad \text{(S24)}$$

Normalize $W^l$ so that the columns of $W^l$ have unit norm and scale $H^l$ accordingly, (S25)

for $l = 1, 2, \ldots$. In [29], a slightly different update scheme for $W$ was employed to encourage the discriminative power. We omit discussing it for brevity.
A deep network can be formed by unfolding these iterative updates. In [29], instances of $W^l$ are untied from the update rule (S24) and considered trainable parameters. In other words, only (S23) and (S25) are executed in each layer. Similar to (S22), the $\beta$ divergence, with a different $\beta$ value, was employed in the training loss function. A splitting scheme was also designed to preserve the nonnegativity of $W^l$ during training.
References
[S5] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999. doi: 10.1038/44565.
[S6] C. Févotte, N. Bertin, and J. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Comput., vol. 21, no. 3, pp. 793–830, Mar. 2009. doi: 10.1162/neco.2008.04-08-771.
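The multiplicative update (S24) and the normalization step (S25) can be sketched as below. The H update is not reproduced in this excerpt, so the sketch assumes the standard multiplicative rule for the β divergence; the function name, the random initialization, and the small constant `eps` that guards against division by zero are all illustrative choices.

```python
import numpy as np

def nmf_beta(M, rank, beta=2.0, n_iter=100, eps=1e-12, seed=0):
    """Nonnegative factorization M ~ W H via multiplicative beta-divergence updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # H update (assumed standard form, standing in for the omitted rule)
        WH = W @ H + eps
        H *= (W.T @ (M * WH ** (beta - 2))) / (W.T @ WH ** (beta - 1) + eps)
        # W update, following (S24)
        WH = W @ H + eps
        W *= ((M * WH ** (beta - 2)) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
        # normalization, following (S25): unit-norm columns of W, H rescaled to match
        norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
        W /= norms
        H *= norms.T
    return W, H
```

Because every factor in the updates is nonnegative, `W` and `H` stay nonnegative throughout, and the rescaling of `H` leaves the product `W @ H` unchanged by the normalization.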
Algorithm: Input $x^0$, Output $x^L$
for $l = 0, 1, \ldots, L-1$ do
  $x^{l+1} \leftarrow \sigma(W^{l+1} x^l + b^{l+1})$,
end for
FIGURE 10. An MLP can be interpreted as executing an underlying iterative algorithm with finite iterations and layer-specific parameters.
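The loop in this figure translates almost directly into code. The following is a minimal sketch, assuming a ReLU as the default activation function σ; `mlp_forward` is a hypothetical name.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, used here as an example activation function."""
    return np.maximum(x, 0.0)

def mlp_forward(x0, weights, biases, sigma=relu):
    """Run x <- sigma(W x + b) once per layer, with layer-specific W and b."""
    x = x0
    for W, b in zip(weights, biases):
        x = sigma(W @ x + b)
    return x
```

Each pass through the loop is one "iteration" of the underlying algorithm, and the number of layers fixes the number of iterations in advance.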
may be treated as observations, and, if the MSE loss is chosen, network training essentially performs MMSE estimation that is conditional on observations.
Let $\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ be a collection of training pairs. We view the training samples as sequentially observed data following a time order. At time step $k$, when feeding $x_k$ into the neural network with parameters $w$, the network performs a nonlinear mapping $h_k(\cdot; w_k)$ and outputs an estimate $\hat{y}_k$ of $y_k$. This process can be formally described as the following nonlinear state-transition model:

$$w_{k+1} = w_k + \omega_k, \quad \text{(S26)}$$
$$y_k = h_k(x_k; w_k) + \nu_k, \quad \text{(S27)}$$

where $\omega_k$ and $\nu_k$ are zero-mean white Gaussian noises with covariance $E(\omega_k \omega_l^T) = \delta_{k,l} Q_k$ and $E(\nu_k \nu_l^T) = \delta_{k,l} R_k$,

$$h_k(x_k; w_k) \approx h_k(x_k; \hat{w}_k) + H_k^T (w_k - \hat{w}_k), \quad \text{(S28)}$$

where $H_k = \partial h_k / \partial w_k \,|_{w_k = \hat{w}_k}$. For a neural network, $H_k$ is essentially the derivative of its output $\hat{y}_k$ across its parameters $w_k$ and therefore can be computed via backpropagation. The following recursion is then executed:

$$K_k = P_k H_k \left( H_k^T P_k H_k + R_k \right)^{-1},$$
$$\hat{w}_{k+1} = \hat{w}_k + K_k (y_k - \hat{y}_k),$$
$$P_{k+1} = P_k - K_k H_k^T P_k + Q_k, \quad \text{(S29)}$$

where $K_k$ is commonly called the Kalman gain. For details on deriving the update rules (S29), see [61, Ch. 1]. In summary, neural networks can be trained with the EKF by the following steps:
1) Initialize $\hat{w}_0$ and $P_0$.
2) For $k = 0, 1, \ldots$,
a) feed $x_k$ into the network to obtain the output $\hat{y}_k$
b) use backpropagation to compute $H_k$ in (S28)
c) apply the recursion in (S29).
The matrix $P_k$ is the approximate error covariance matrix, which models the correlations between network parameters and thus delivers second-order derivative information, effectively accelerating the training speed. For example, in [62], it was shown that training a multilayer perceptron using the EKF requires an orders-of-magnitude-lower number of epochs than standard backpropagation.
In [61], some variants of the EKF training paradigm are discussed. The neural network represented by $h_k$ can be a recurrent network and trained in a similar fashion, and the noise covariance matrix $R_k$ is scaled to play a role similar to the learning rate adjustment. To reduce the computational complexity, a decoupling scheme is employed, which divides parameters $w_k$ into mutually exclusive groups and turns $P_k$ into a block-diagonal matrix.
FIGURE S4. The state-transition model for neural network training. The training data can be viewed as sequentially feeding through the neural network, and the network parameters can be viewed as system states.
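The three training steps can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: `ekf_train` and `tanh_neuron` are hypothetical names, the Jacobian $H_k$ is supplied analytically by the model function instead of by backpropagation, and $Q$ and $R$ are held constant over time.

```python
import numpy as np

def ekf_train(xs, ys, model, w0, P0, Q, R):
    """Train model parameters with the EKF recursion (S29), one sample per time step.

    model(x, w) must return (y_hat, H), where y_hat has shape (n_out,) and
    H has shape (n_params, n_out) and holds the derivatives of y_hat w.r.t. w.
    """
    w = np.array(w0, dtype=float)
    P = np.array(P0, dtype=float)
    for x, y in zip(xs, ys):
        y_hat, H = model(x, w)                       # step a) and step b)
        K = P @ H @ np.linalg.inv(H.T @ P @ H + R)   # Kalman gain
        w = w + K @ (y - y_hat)                      # parameter update
        P = P - K @ H.T @ P + Q                      # error covariance update
    return w, P

def tanh_neuron(x, w):
    """Example model: a single neuron y_hat = tanh(w . x) with its analytic Jacobian."""
    a = np.tanh(w @ x)
    y_hat = np.array([a])
    H = ((1.0 - a ** 2) * x)[:, None]
    return y_hat, H
```

A call such as `ekf_train(xs, ys, tanh_neuron, np.zeros(d), 100.0 * np.eye(d), np.zeros((d, d)), 0.01 * np.eye(1))` would fit the single neuron to pairs `(xs[k], ys[k])`, where each `ys[k]` has shape `(1,)`; the specific values of `P0`, `Q`, and `R` here are placeholders.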
[Figure: labels “Generic Neural Network,” “Unrolled Network,” “Iterative Algorithm,” and “Target.”]
studies around LISTA, refer to “Convergence and Optimality Analysis of the Learned Iterative Shrinkage and Thresholding Algorithm.”
[12] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse problems in imaging: Beyond analytical methods,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 20–36, 2018. doi: 10.1109/MSP.2017.2760358.
[13] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. Int. Conf. Machine Learning, 2010, pp. 399–406. doi: 10.5555/3104322.3104374.
[14] Y. Yang, J. Sun, H. Li, and Z. Xu, “ADMM-CSNet: A deep learning approach for image compressive sensing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 3, pp. 521–538, 2020. doi: 10.1109/TPAMI.2018.2883941.
[15] Y. Li, M. Tofighi, V. Monga, and Y. C. Eldar, “An algorithm unrolling approach to deep image deblurring,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2019, pp. 7675–7679. doi: 10.1109/ICASSP.2019.8682542.
[16] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, 2017. doi: 10.1109/TPAMI.2016.2596743.
[17] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. Int. Conf. Artificial Intelligence and Statistics, 2011, pp. 315–323.
[18] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient BackProp,” in Neural Networks: Tricks of the Trade (Lecture Notes in Computer Science), Berlin: Springer-Verlag, 2012, pp. 9–48.
[19] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, no. 4, pp. 193–202, Apr. 1980. doi: 10.1007/BF00344251.
[20] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” J. Physiol., vol. 160, no. 1, pp. 106–154, 1962. doi: 10.1113/jphysiol.1962.sp006837.
[21] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart, J. L. McClelland, and PDP Research Group, Eds. Cambridge, MA: MIT Press, 1986, pp. 318–362.
[22] B. Xin, Y. Wang, W. Gao, D. Wipf, and B. Wang, “Maximal sparsity with deep networks?” in Proc. Advances Neural Information Processing Systems, 2016, pp. 4340–4348. doi: 10.5555/3157382.3157583.
[23] J. Liu, X. Chen, Z. Wang, and W. Yin, “ALISTA: Analytic weights are as good as learned weights in LISTA,” in Proc. Int. Conf. Learning Representations, 2019.
[24] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,” in Proc. 32nd Int. Conf. Information Processing Systems, 2018, pp. 9079–9089. doi: 10.5555/3327546.3327581.
[25] Y. Li and S. Osher, “Coordinate descent optimization for L1 minimization with application to compressed sensing: A greedy algorithm,” Inverse Probl. Imag., vol. 3, no. 3, pp. 487–503, 2009. doi: 10.3934/ipi.2009.3.487.
[26] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 370–378. doi: 10.1109/ICCV.2015.50.
[35] Z. Q. Wang, J. L. Roux, D. Wang, and J. R. Hershey, “End-to-end speech separation with unfolded iterative phase reconstruction,” in Proc. Interspeech, 2018, pp. 2708–2712. doi: 10.21437/Interspeech.2018-1629.
[36] J. Adler and O. Öktem, “Learned primal-dual reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1322–1332, 2018. doi: 10.1109/TMI.2018.2799231.
[37] D. Wu, K. Kim, B. Dong, G. E. Fakhri, and Q. Li, “End-to-end lung nodule detection in computed tomography,” in Machine Learning in Medical Imaging (Lecture Notes in Computer Science), Y. Shi, H. I. Suk, and M. Liu, Eds. Cham: Springer-Verlag, 2018, pp. 37–45.
[38] S. A. H. Hosseini, B. Yaman, S. Moeller, M. Hong, and M. Akçakaya, “Dense recurrent neural networks for inverse problems: History-cognizant unrolling of optimization algorithms,” 2019, arXiv:1912.07197.
[39] Y. Li, M. Tofighi, J. Geng, V. Monga, and Y. C. Eldar, “Efficient and interpretable deep blind image deblurring via algorithm unrolling,” IEEE Trans. Comput. Imag., vol. 6, pp. 666–681, Jan. 2020. doi: 10.1109/TCI.2020.2964202.
[40] L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Trans. Signal Process., vol. 67, no. 15, pp. 4069–4077, Aug. 2019. doi: 10.1109/TSP.2019.2926023.
[41] X. Zhang, Y. Lu, J. Liu, and B. Dong, “Dynamically unfolding recurrent restorer: A moving endpoint control method for image restoration,” in Proc. Int. Conf. Learning Representations, 2019.
[42] S. Lohit, D. Liu, H. Mansour, and P. T. Boufounos, “Unrolled projected gradient descent for multi-spectral image fusion,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2019, pp. 7725–7729. doi: 10.1109/ICASSP.2019.8683124.
[43] G. Dardikman-Yoffe and Y. Eldar, “Learned SPARCOM: Unfolded deep super-resolution microscopy,” Opt. Express, vol. 28, no. 19, pp. 27,736–27,763, 2020. doi: 10.1364/OE.401925.
[44] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Machine Learning, 2015, pp. 448–456.
[45] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Proc. Asian Conf. Computer Vision, 2014, pp. 111–126.
[46] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016. doi: 10.1109/TPAMI.2015.2439281.
[47] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010. doi: 10.1109/TIP.2010.2050625.
[48] D. Perrone and P. Favaro, “A clearer picture of total variation blind deconvolution,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 6, pp. 1041–1055, June 2016. doi: 10.1109/TPAMI.2015.2477819.
[49] S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 257–265. doi: 10.1109/CVPR.2017.35.