Ke Chen, Carola-Bibiane Schönlieb, Xue-Cheng Tai, Laurent Younes (Editors)

Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging

Mathematical Imaging and Vision
Preface
The rapid development of new imaging hardware, the advance in medical imaging,
the advent of multi-sensor data fusion and multimodal imaging, as well as the
advances in computer vision have sparked numerous research endeavours leading
to highly sophisticated and rigorous mathematical models and theories. Motivated
by the increasing use of variational models, shapes and flows, differential geome-
try, optimisation theory, numerical analysis, statistical/Bayesian graphical models,
machine learning, and deep learning, we have invited contributions from leading
researchers and publish this handbook to review and capture the state of the art of
research in Computer Vision and Imaging.
This constantly improving technology that generates new demands not readily
met by existing mathematical concepts and algorithms provides a compelling
justification for such a book to meet the ever-growing challenges in applications
and to drive future development. As a consequence, new mathematical models
have to be found, analysed and realised in practice. Knowing the precise state-of-
the-art developments is key, and hence this book will serve the large community
of mathematics, imaging, computer vision, computer science, statistics, and, in
general, imaging and vision research. Our primary audience includes:
• Graduate students
• Researchers
• Imaging and vision practitioners
• Applied mathematicians
• Medical imagers
• Engineers
• Computer scientists
Viewing discrete images as data sampled from functional surfaces enables the use of
advanced tools from calculus, the calculus of variations, and optimisation, and provides
the basis of high-resolution imaging through variational models. No other framework
provides comparable accuracy and precision for imaging and vision.
to facilitate browsing the content list. However, such a division is artificial because,
these days, research is becoming increasingly intra-disciplinary as well as inter-
disciplinary, and ideas from one topic often directly or indirectly inspire developments
in another. This is very exciting.
For newcomers to the field, the book provides a comprehensive and fast-track
introduction to the core research problems, helping them save time and move on to
tackling new and emerging challenges rather than risk reproducing or reinventing
existing results. For researchers, exposure to the state of the art of research leads to
an overall view of the entire field, guiding new research directions, helping avoid
pitfalls in moving the field forward, and looking ahead to the next 25 years of imaging
and information sciences.
The dreadful Covid-19 pandemic, which began in 2020, has affected the lives of
everyone, researchers included, and we are still not out of the woods.
The editors are very grateful to the book authors, who endured much hardship
during the last 3 years and overcame many difficulties to complete their chapters
on time. We are also indebted to the many anonymous reviewers who provided
valuable reviews and helpful criticism to improve the presentation of the chapters.
The original gathering of all editors was in 2017, when the first three editors
co-organised the prestigious Isaac Newton Institute programme titled “Variational
methods and effective algorithms for imaging and vision” (https://www.newton.ac.uk/event/vmv/),
partially supported by UK EPSRC GR/EP F005431 and the Isaac Newton Institute
for Mathematical Sciences. During the programme, Mr Jan Holland from Springer
Nature kindly suggested the idea of a book. We are grateful for his suggestion, which
sparked the editors’ fruitful collaboration over the last few years. The large team of
publishers who have offered immense help to us includes Michael Hermann (Springer),
Allan Cohen (Palgrave), and Salmanul Faris Nedum Palli (Springer). We thank them all.
Finally, we wish all readers happy reading.
The editorial team
Contents
Introduction
Convex or Non-convex: Main Idea and Related Works
Sparsity-Inducing Separable Regularizers
CNC Models with Sparsity-Inducing Separable Regularizers
Sparsity-Inducing Non-separable Regularizers
CNC Models with Sparsity-Inducing Non-separable Regularizers
Construction of Matrix B
A Simple CNC Example
Path of Solution Components
Forward-Backward Minimization Algorithms
FB Strategy for Separable CNC Models
FB Strategy for Non-separable CNC Models
Efficient Solution of the Backward Steps by ADMM
Numerical Examples
Examples Using CNC Separable Models
Examples Using CNC Non-separable Models
Conclusion
References
Abstract
Variational models for imaging are typically composed of two terms:
the data-fidelity term and the regularization term. Much research has focused
on models where both terms are convex, which leads to convex optimization
problems. However, there is evidence that non-convex regularization can significantly
improve the output quality for images characterized by some sparsity
property. This has fostered recent research into optimization
problems with non-convex terms. Non-convex models are notoriously difficult
to handle, as classical optimization algorithms can get trapped at unwanted local
minimizers. To avoid the intrinsic difficulties related to non-convex optimization,
the convex non-convex (CNC) strategy has been proposed, which allows the
use of non-convex regularization while maintaining convexity of the total cost
function. This work focuses on a general class of parameterized non-convex
sparsity-inducing separable and non-separable regularizers and their associated
CNC variational models. Convexity conditions for the total cost functions and
related theoretical properties are discussed, together with suitable algorithms for
their minimization based on a general forward-backward (FB) splitting strategy.
Experiments on the two classes of considered separable and non-separable CNC
variational models show their superior performance with respect to the purely convex
counterparts when applied to the discrete inverse problem of restoring sparsity-
characterized images corrupted by blur and noise.
Introduction
A wide class of linear systems derived from the discretization of linear ill-posed
inverse problems in data processing is characterized by high dimensionality, ill-
conditioned matrices, and noise-corrupted data. In this class of discrete inverse
problems, a noisy indirect observation b ∈ Rm of an original unknown image
x ∈ Rn is modeled as
b = Ax, (1)
where A ∈ Rm×n accounts for the data-acquisition system. For instance, A can be a
convolution matrix modeling optical blurring, a wavelet or Fourier transform matrix
in image synthesis, a Radon transform matrix in X-ray computerized tomography,
a sampling matrix in compressed sensing, a binary selection matrix in image
inpainting, or the identity matrix in image denoising and segmentation.
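As a concrete illustration of the role of A, the following NumPy sketch (ours, not part of the chapter) builds the acquisition operator for two of the cases listed above: the identity matrix for denoising and a dense circular-convolution matrix for space-invariant Gaussian blurring. The helper names and the small image size are illustrative assumptions.

```python
import numpy as np

def gaussian_psf(size=9, sigma=1.5):
    """Normalized 2-D Gaussian point-spread function."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def blur_matrix(psf, img_shape):
    """Dense matrix A of the circular convolution with `psf`, acting on
    vectorized images (only practical for small images)."""
    n = img_shape[0] * img_shape[1]
    psf_hat = np.fft.fft2(psf, s=img_shape)
    A = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        col = np.real(np.fft.ifft2(np.fft.fft2(e.reshape(img_shape)) * psf_hat))
        A[:, j] = col.ravel()
    return A

img_shape = (16, 16)
A_denoise = np.eye(img_shape[0] * img_shape[1])    # A = I_n : denoising/segmentation
A_deblur = blur_matrix(gaussian_psf(), img_shape)  # A : space-invariant optical blur
```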
When m < n, the linear system (1) is underdetermined and, among the infinitely many
solutions, it is common to seek an approximate solution of minimal norm, that is,
one solves the constrained optimization problem

min_{x ∈ R^n} ‖x‖₂   subject to   Ax = b.
Even in the most favorable case where m = n, so that the linear system (1) may
admit a unique solution, the ill-conditioning of matrix A typically makes the problem
very difficult from a numerical point of view.
Indeed, for many image processing applications of practical interest, problems of
the form (1) are ill-posed linear inverse problems. The term ill-posed was coined in the
early twentieth century by Hadamard, who defined a linear problem to be well-posed
if it satisfies the following three requirements: a solution exists, the solution is unique,
and the solution depends continuously on the data.
In the presence of measurement noise, the acquisition model takes the form

b = Ax + η,   (4)

where η ∈ R^m denotes additive noise, here assumed to be white Gaussian. A regularized
solution x* of the inverse problem is then sought by solving a variational model of the form

x* ∈ arg min_{x ∈ R^n} J(x),   J(x) = (1/2)‖Ax − b‖₂² + μ Ψ(x).   (5)
The quadratic term in (5) is the so-called ℓ₂ fidelity term, which forces closeness
of the solution(s) x* to the data b according to the linear acquisition model (4) and to the
assumed Gaussian noise distribution. The term Ψ(x) in (5) represents the sparsity-
inducing regularization term and encodes some sparsity priors on the unknown
sought image. Finally, the positive scalar μ, referred to as the regularization
parameter of the variational model (5), is a free parameter which allows one to control
the trade-off between data fidelity and regularization.
In this work, we are particularly interested in sparsity-promoting regularization
terms Ψ : R^n → R having the following general form:

Ψ(x) = Φ( G(Lx) ),   (6)

with L ∈ R^{r×n} the matrix of a linear operator, G : R^r → R^s a (possibly nonlinear)
vector-valued map extracting the feature vector y = G(Lx) to be sparsified, and
Φ : R^s → R a sparsity-promoting penalty function.
It is important for the purposes of this work to introduce a partition of the class
of sparsity-promoting regularizers defined in (6) into two sub-classes, based on
separable and non-separable penalty functions Φ.
Fig. 1 Prototypical example images characterized by different sparse feature vectors (first row)
and their associated normalized histograms (second row)
Φ(y) = Σ_{i=1}^{s} φ_i( y_i ),   with φ_i : R → R,   (7)
Fig. 2 Realistic images characterized, from left to right, by increasing level of sparsity of the
gradient magnitudes (first row) and their associated normalized histograms (second row)
Some interesting models of the form (5)–(6) are characterized by the following
well-known matrices A and L:
2016; Selesnick and Bayram 2014; Lanza et al. 2017), including 1D and 2D total
variation denoising (Lanza et al. 2016a; Malek-Mohammadi et al. 2016; Zou et al.
2019; Du and Liu 2018), transform-based denoising (Parekh and Selesnick 2015;
Ding and Selesnick 2015), low-rank matrix estimation (Parekh and Selesnick 2016),
decomposition and segmentation of images and scalar fields over surfaces (Chan
et al. 2017; Huska et al. 2019a,b), as well as machine fault detection (Cai et al.
2018; Wang et al. 2019).
The flexibility and effectiveness of the CNC approach depend on the construction
of non-trivial separable and non-separable convex functions. It turns
out that Moreau envelopes and infimal convolutions are useful for this purpose
(Selesnick 2017a,b; Carlsson 2016; Soubies et al. 2015). Based on convex analysis,
families of non-convex non-separable penalty functions have been proposed in
Selesnick (2017a) that do maintain convexity of the cost functional J for any
matrix A, but only in the special case where both G and L in (6) are identity
operators. More recently, a convex approach was applied in Lanza et al. (2019),
where a general CNC framework is proposed for constructing non-separable non-
convex regularizers starting from any convex regularizer, any matrices A and L,
and quite general functions G. In particular, an infimal convolution is subtracted
from a convex regularizer, such as the ℓ₁ norm, leading to a resulting non-convex
regularizer.
Non-convex penalties of various functional forms have also been proposed for
overcoming limitations of the ℓ₁ norm by using penalties that promote sparsity more
strongly (Castella and Pesquet 2015; Candès et al. 2008; Nikolova 2011; Nikolova
et al. 2010; Chartrand 2014; Chouzenoux et al. 2013; Portilla and Mancera 2007;
Shen et al. 2016). However, these methods do not aim to maintain convexity of the
cost function to be minimized. Moreover, concerning non-separable sparsity-
inducing penalties in (6), pioneering work has been conducted in Tipping (2001)
and Wipf et al. (2011); however, such penalties were also not designed to maintain
cost function convexity.
We finally note that infimal convolution (related to the Moreau envelope) has
been used to define generalized TV regularizers (Setzer et al. 2011; Chambolle and
Lions 1997; Burger et al. 2016; Becker and Combettes 2014). However, the aims
and methodologies of these past works are quite different from those considered
here. In fact, in these works, the ℓ₁ norm is replaced by an infimal convolution; the
resulting regularizer is convex.
Sparsity-Inducing Separable Regularizers

In this section, we first recall some definitions which will be useful for the rest
of the work and, in particular, we report some results from convex analysis. We
then review some popular sparsity-inducing separable regularizers and discuss their
properties.
In this work, we denote by R_+ and R_{++} the sets of nonnegative and positive real
numbers, respectively, by I_n the identity matrix of order n, by 0_n the n-dimensional
null vector, by null(M) the null space of a matrix M, and by Γ_0(R^n) the set of proper,
lower semicontinuous, convex functions from R^n to R̄ := R ∪ {+∞}.
Definition 2 (Infimal convolution). Let f, g ∈ Γ_0(R^n). The infimal convolution of
f and g is the function defined by

( f □ g )(x) := inf_{v ∈ R^n} { f(v) + g(x − v) },   (9)

and it is said to be exact, and denoted by f ⊡ g, if the infimum above is attained for
any x ∈ R^n, namely, ( f ⊡ g )(x) = min_{v ∈ R^n} { f(v) + g(x − v) } for any x ∈ R^n.
Definition 3 (Moreau envelope). Let f ∈ Γ_0(R^n) and let a ∈ R_{++}. The Moreau
envelope of f with parameter a is defined by

env^a_f(x) := ( f □ (a/2)‖·‖₂² )(x) = min_{v ∈ R^n} { f(v) + (a/2)‖x − v‖₂² }.   (10)

We notice that, for any f ∈ Γ_0(R^n) and a ∈ R_{++}, the cost function f(v) + (a/2)‖x − v‖₂²
in (10)–(11) is strongly convex in v; hence it admits a unique (global) minimizer.
Proof. Recalling the Huber function definition in (12), the function f_a in (15) takes
the explicit form

f_a(x) = { (a/2) Σ_{i=1}^{n} x_i²,                  for ‖x‖₂ ∈ [0, 1/a],
         { ( Σ_{i=1}^{n} x_i² )^{1/2} − 1/(2a),     for ‖x‖₂ ∈ ]1/a, +∞[.      (17)

The two pieces of the function f_a in (17) are clearly both continuously differentiable
on their domains, with gradients given by

∇f_a(x) = { a x,          for ‖x‖₂ ∈ [0, 1/a],
          { x / ‖x‖₂,     for ‖x‖₂ ∈ ]1/a, +∞[.      (18)

It follows easily from (18) that, for any a ∈ R_{++}, the gradient function ∇f_a(x)
is continuous also at points x on the spherical surface ‖x‖₂ = 1/a separating the
two pieces. Finally, the compact form of ∇f_a given in (16) comes straightforwardly
from (18).
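To make the Huber/MC relationship used throughout the chapter concrete, here is a minimal NumPy sketch (ours) of the Huber function h_a of (12) and of the MC penalty of (13), written as φ_MC(t; a) = |t| − h_a(t) in accordance with the decomposition exploited in Lemma 1 below; the function names are our own.

```python
import numpy as np

def huber(t, a):
    """Huber function h_a(t): quadratic for |t| <= 1/a, linear beyond."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0 / a, 0.5 * a * t**2, np.abs(t) - 0.5 / a)

def phi_mc(t, a):
    """MC penalty written as |t| - h_a(t)."""
    t = np.asarray(t, dtype=float)
    return np.abs(t) - huber(t, a)

# consistency check against the explicit piecewise form of the MC penalty
t, a = np.linspace(-3.0, 3.0, 601), 1.0
explicit = np.where(np.abs(t) <= 1.0 / a, np.abs(t) - 0.5 * a * t**2, 0.5 / a)
assert np.allclose(phi_mc(t, a), explicit)
```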
Fig. 3 Sparsity-inducing scalar penalties: (a) ℓ_p penalty for some different p values, (b) some
parameterized non-convex penalties (Log, Rat, Exp) satisfying assumptions 1–5 (see Table 1) and
the MC penalty (see definition in (13)), all with concavity parameter a = 1, and (c) MC penalty for
some different values of the concavity parameter a (a = 0.5, 1.0, 2.0)
the solid red curve in Fig. 3a). In fact, this choice very likely leads to a convex
sparsity-inducing regularizer and, hence, to a convex variational model which can
be solved numerically by standard convex optimization algorithms. However, it is
well known that the ℓ₁ norm penalty function tends to underestimate the high-amplitude
components of the vector to which it is applied, in our case y = G(Lx). More
generally, it is well known that non-convex penalty functions hold the potential for
inducing sparsity more effectively than convex penalty functions. A natural non-
convex separable alternative to the ℓ₁ norm is the ℓ_p quasi-norm penalty (Sidky
et al. 2014), Φ(y) = (1/p)‖y‖_p^p = (1/p) Σ_{i=1}^{s} |y_i|^p, 0 < p < 1; see the solid blue
and black curves in Fig. 3a, corresponding to p = 0.5 and p = 0.1, respectively.
However, such a non-convex family of penalties cannot be used for the purpose
of constructing CNC variational models. In fact, since the infimum of the second-
order derivative of the ℓ_p penalty is equal to −∞ for any p ∈ ]0, 1[, it is not possible
to obtain an overall convex model even when coupling the regularizer with a strongly
convex quadratic fidelity term.
CNC Models with Sparsity-Inducing Separable Regularizers

This section is concerned with the formulation of CNC variational models with
separable sparsity-promoting regularization terms; see Definition 1. The general
form of such models reads

x* ∈ arg min_{x ∈ R^n} J_S(x; a),   (19)

J_S(x; a) = (1/2)‖Ax − b‖₂² + μ Ψ_S(x; a),   Ψ_S(x; a) = Σ_{i=1}^{s} φ_MC( g_i(Lx) ; a_i ),   (20)
where, we recall, A ∈ R^{m×n} and L ∈ R^{r×n} are the coefficient matrices of two
bounded linear operators, g_i : R^r → R, i = 1, ..., s, are the components of
a possibly nonlinear vector-valued function G : R^r → R^s, μ ∈ R_{++} is the
regularization parameter, φ_MC : R → R_+ is the non-convex MC penalty function
defined in (13), and where we introduced the vector a := (a_1, ..., a_s)^T ∈ R^s_{++}
containing the concavity parameters of all the s instances of the MC penalty in the
regularizer Ψ_S. We refer to (19)–(20) as the class of CNC separable (least-squares)
models, abbreviated CNC-S-L2 models.
In order to refer to models (19)–(20) as CNC, we clearly need to derive and then
impose convexity conditions for the objective function JS . More precisely, we seek
sufficient conditions on the operators A, L, and G and on the parameters μ and ai ,
i = 1, . . . , s, to ensure that the function JS in (20) is convex (strongly convex) on
its entire domain x ∈ Rn . It is worth noting that, in practice, the operators A, L, and
G are commonly prescribed by the specific application at hand. In fact, operator A
typically comes from a (more or less accurate) modeling of the image acquisition
process, whereas operators L and G are related to the expected properties of sparsity
of the sought solution. This implies that the derived convexity conditions can be
regarded as constraints on the free parameters μ and ai of model (19)–(20).
In Lemma 1, we give some useful reformulations of the separable regularizer Ψ_S
defined in (20); then, in Theorem 1, we derive conditions for the convexity of J_S.
Lemma 1. The separable regularizer Ψ_S defined in (20) can be equivalently rewritten as

Ψ_S(x; a) = ‖G(Lx)‖₁ − H_S(x; a),   (21)

where

H_S(x; a) = Σ_{i=1}^{s} h_{a_i}( g_i(Lx) )   (22)
          = ( ‖·‖₁ □ (1/2)‖W ·‖₂² )( G(Lx) )   (23)
          = env^1_{‖W^{-1}·‖₁}( W G(Lx) ),   (24)

with h_{a_i} the Huber function defined in (12) and W ∈ R^{s×s} the diagonal matrix defined by

W := diag( √a_1, ..., √a_s ).   (25)

Proof. Recalling the definition of the MC penalty function in (13), the regularizer Ψ_S
in (20) can be rewritten as

Ψ_S(x; a) = Σ_{i=1}^{s} [ g_i(Lx) − h_{a_i}( g_i(Lx) ) ] = ‖G(Lx)‖₁ − Σ_{i=1}^{s} h_{a_i}( g_i(Lx) ) = ‖G(Lx)‖₁ − H_S(x; a),   (27)
which proves (21)–(22). Then, based on the Huber function definition in (12), the
function H_S(x; a) in (27) can be manipulated as follows:

H_S(x; a) = Σ_{i=1}^{s} env^{a_i}_{|·|}( g_i(Lx) )

 = Σ_{i=1}^{s} min_{v_i ∈ R} { |v_i| + (a_i/2)( g_i(Lx) − v_i )² }

 = min_{v ∈ R^s} Σ_{i=1}^{s} { |v_i| + (a_i/2)( g_i(Lx) − v_i )² }

 = min_{v ∈ R^s} { Σ_{i=1}^{s} |v_i| + (1/2) Σ_{i=1}^{s} ( √a_i ( g_i(Lx) − v_i ) )² }

 = min_{v ∈ R^s} { ‖v‖₁ + (1/2)‖W( G(Lx) − v )‖₂² }   (28)

 = ( ‖·‖₁ □ (1/2)‖W ·‖₂² )( G(Lx) ),   (29)

with matrix W defined in (25). The last equality (29), which proves (23), comes
straightforwardly from the definition of infimal convolution in (9).
Starting from (28), and noting that by assumption the square diagonal matrix W
in (25) is invertible (in fact, a_i ∈ R_{++} ∀ i = 1, ..., s), we can write

H_S(x; a) = min_{v ∈ R^s} { ‖v‖₁ + (1/2)‖W G(Lx) − W v‖₂² }

 = min_{z ∈ R^s} { ‖W^{-1} z‖₁ + (1/2)‖W G(Lx) − z‖₂² }

 = env^1_{‖W^{-1}·‖₁}( W G(Lx) ),

which proves (24).
Theorem 1. Let the functions g_i, i = 1, ..., s, be either linear or lower semicontinuous,
convex, and nonnegative. Then, the function J_S in (20) is proper, lower semicontinuous,
and bounded from below by zero. Moreover, a sufficient condition for J_S to be convex
(strongly convex) is that

J_1(x) := (1/2)‖Ax‖₂² − (μ/2)‖W G(Lx)‖₂²   is convex (strongly convex).   (30)

If G is the identity operator or has the form

G(z) = ( ‖z^{(1)}‖₂, ..., ‖z^{(s)}‖₂ )^T,   with {z^{(i)}}_{i=1}^{s} a partition of z ∈ R^r,   (31)

then a sufficient condition for J_S to be convex (strongly convex) is

Q := A^T A − μ L^T W² L ⪰ 0  ( ≻ 0 ).   (32)

In the case of a space-invariant concavity parameter, a_1 = ··· = a_s = ã, condition (32)
reduces to

Q = A^T A − μ ã L^T L ⪰ 0  ( ≻ 0 ),   (33)

that is,

ã = τ_c ρ_{A,L} / μ,   τ_c ∈ [0, 1]  ( τ_c ∈ [0, 1[ ),   ρ_{A,L} := σ²_{A,min} / σ²_{L,max},   (34)

with σ_{A,min} and σ_{L,max} denoting the minimum singular value of matrix A and the
maximum singular value of matrix L, respectively.
Proof. Since the MC penalty function defined in (13) is continuous and bounded
from below by zero, if functions gi are all lower semicontinuous, then the regularizer
S and, hence, the total objective function JS in (20) are both lower semicontinuous
and bounded from below by zero.
In order to derive convexity conditions for JS , we first introduce the function
q_a : R → R_+ defined by

q_a(t) := (a/2) t² + |t| − h_a(t) = { |t|,                     for |t| ∈ [0, 1/a],
                                    { (a/2) t² + 1/(2a),       for |t| ∈ ]1/a, +∞[,      (35)
where the second equality in (35) comes from the Huber function definition in (12).
It is easy to prove that, for any value of the parameter a ∈ R++ , the function qa
in (35) is convex on R, continuously differentiable on R \ {0}, and monotonically
increasing on R+ .
Based on results in Lemma 1, in particular (21)–(22), and on definition of the
Huber function in (12), the expression of function JS in (20) can be manipulated
and equivalently rewritten as follows:
J_S(x; a) = (1/2)‖Ax − b‖₂² + μ ( ‖G(Lx)‖₁ − Σ_{i=1}^{s} h_{a_i}( g_i(Lx) ) )

 = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{s} [ g_i(Lx) − h_{a_i}( g_i(Lx) ) ]

 = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{s} [ g_i(Lx) − h_{a_i}( g_i(Lx) ) + (a_i/2) g_i(Lx)² − (a_i/2) g_i(Lx)² ]

 = (1/2)‖Ax − b‖₂² − (μ/2) Σ_{i=1}^{s} a_i g_i(Lx)² + μ Σ_{i=1}^{s} q_{a_i}( g_i(Lx) )

 = (1/2)‖Ax − b‖₂² − (μ/2)‖W G(Lx)‖₂² + μ Σ_{i=1}^{s} q_{a_i}( g_i(Lx) )

 = J_1(x) + ( (1/2)‖b‖₂² − b^T A x ) + μ Σ_{i=1}^{s} q_{a_i}( g_i(Lx) )  =:  J_1(x) + J_2(x) + J_3(x),   (36)
with function J1 (x) defined in (30). Function J2 (x) in (36) is affine; hence it clearly
does not affect convexity of the total objective function JS . Recalling that, given
two convex functions f1 : Rn → R and f2 : R → R, if f1 is linear or f2 is
monotonically increasing, then the composite function f2 ◦ f1 : Rn → R is convex,
function J3 (x) in (36) is convex. In fact, since the functions qai are all convex on R
and monotonically increasing on R+ and, by assumption in the theorem statement,
all functions gi are either linear or lower semicontinuous, convex, and nonnegative,
each term of the summation defining J3 in (36) is a convex function of x. Finally,
since μ ∈ R++ , it follows that a sufficient condition for JS to be convex (strongly
convex) is that the term J1 in (30) is convex (strongly convex). This proves (30).
If G is the identity operator or G has the form in (31), then we have

J_1(x) = (1/2) x^T ( A^T A − μ L^T W² L ) x,   (37)

so that J_1 is convex (strongly convex) if and only if the matrix Q defined in (32) is
positive semidefinite (positive definite). This proves (32) and, hence, (33)–(34).
In order to apply in practice the CNC strategy with separable regularizers, one
has to compute the value of the scalar ρA,L defined in (34), depending on the
minimum singular value of the measurement matrix A, σA,min , and on the maximum
singular value of the regularization matrix L, σL,max . In many important imaging
applications, the values of σA,min and σL,max can be obtained by explicit formulas.
In a general case where no explicit expressions for σA,min and σL,max are available,
efficient numerical procedures can be used for their accurate estimation.
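The following sketch (ours) shows one simple way to compute ρ_{A,L} and the resulting space-invariant concavity parameter ã of (34) when A and L are small enough for dense SVDs; for large structured operators one would use explicit formulas or iterative estimates, as noted above.

```python
import numpy as np

def concavity_parameter(A, L, mu, tau_c=0.99):
    """Space-invariant concavity parameter a_tilde = tau_c * rho_{A,L} / mu,
    with rho_{A,L} = sigma_min(A)^2 / sigma_max(L)^2 (dense SVDs)."""
    sigma_A_min = np.linalg.svd(A, compute_uv=False).min()
    sigma_L_max = np.linalg.svd(L, compute_uv=False).max()
    rho_AL = sigma_A_min**2 / sigma_L_max**2
    return tau_c * rho_AL / mu

# small illustrative operators
rng = np.random.default_rng(0)
A = np.eye(64) + 0.1 * rng.standard_normal((64, 64))
L = rng.standard_normal((128, 64))
a_tilde = concavity_parameter(A, L, mu=1.5)
```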
The parameter τ_c in (34) is referred to as the convexity coefficient of the
separable CNC variational model (19)–(20), as it allows one to tune the degree
of convexity of the model cost function J_S. In particular, we notice that for τ_c
approaching 0 from above, the separable regularizer Ψ_S tends toward the standard
convex ℓ₁-type regularizer ‖G(Lx)‖₁, whereas for τ_c approaching its upper limit the
regularizer becomes maximally non-convex (hence, potentially, maximally
sparsity-promoting) under the constraint that the total cost function J_S remains convex.
Corollary 1. Under the same settings of Theorem 1 with G the identity operator or
a function of the form in (31), in case that null(A) ∩ null(L) = {0n } we have:
C1. Convexity condition (32) can be satisfied (with strict or weak inequality) only
if matrix A has full column rank.
C2. If A has full column rank, and condition (32) is satisfied with strict inequality,
then the function JS in (20) is strongly convex; hence it admits a unique
global minimizer.
C3. If A has full column rank, and condition (32) is satisfied with weak inequality,
then the function JS in (20) is convex and coercive; hence it admits a compact
convex set of global minimizers.
Proof. We prove C1 by contradiction. Let us assume that A does not have full column
rank, so that A^T A has at least one null eigenvalue. Let v be an eigenvector
associated with a null eigenvalue of A^T A, and let us consider the restriction Z(t) of
the quadratic function x^T Q x – with Q the matrix defined in (32) – to the line tv,
t ∈ R:

Z(t) = (tv)^T Q (tv) = ‖A(tv)‖₂² − μ (tv)^T L^T W^T W L (tv) = − t² μ ‖W L v‖₂².   (38)

Since Av = 0_n and null(A) ∩ null(L) = {0_n}, we have Lv ≠ 0_n and, hence,
‖W L v‖₂ > 0, so that Z(t) is negative for any t ≠ 0 and condition (32) cannot be
satisfied. This proves C1.

This limitation can be overcome by CNC models with non-separable regularizers,
which will be illustrated in the next two sections.

Sparsity-Inducing Non-separable Regularizers
As pointed out in previous section, when the measurement matrix A is not full
column rank, then a CNC formulation is not possible using a separable sparsity-
promoting regularizer. However, in Lanza et al. (2019) and Selesnick et al. (2020),
a general strategy to construct parameterized sparsity-promoting non-convex non-
separable regularizers has been proposed which allows to tackle also the case of A
not being full column rank. This is of great importance, as it enables us to apply the
CNC approach to practically any linear inverse problem in imaging.
In accordance with Lanza et al. (2019) and Selesnick et al. (2020), we present
a general strategy for constructing non-separable sparsity-promoting regularizers
Ψ_NS starting from any convex sparsity-promoting regularizer R and then subtracting
its generalized Moreau envelope. In particular, we consider convex regularizers R of
the form

R(x) := Θ( G(Lx) ),   (39)

and define the associated non-convex non-separable regularizer as

Ψ_NS(x; B) := R(x) − H_NS(x; B),   (40)

with

H_NS(x; B) := ( R □ (1/2)‖B ·‖₂² )(x) = min_{v ∈ R^n} { R(v) + (1/2)‖B(x − v)‖₂² },   (41)
where B is a matrix-valued parameter which plays the same role as the parameter
vector a in the class of separable regularizers illustrated in section “CNC Models
with Sparsity-Inducing Separable Regularizers”. Indeed, the introduced class of
non-separable regularizers in (40)–(41) can be regarded as a sort of generalization
of the separable regularizers in (21)–(24).
Proposition 3. The function H_NS(x; B) in (41) exhibits the following properties:
1. For any matrix B, H_NS(x; B) is proper, continuous, and convex and satisfies (42)–(43).
2. For any full row rank matrix B, H_NS(x; B) is a differentiable function, with
gradient given by

∇H_NS(x; B) = B^T B ( x − arg min_{v ∈ R^n} { (1/2)‖B(x − v)‖₂² + R(v) } ).   (44)
CNC Models with Sparsity-Inducing Non-separable Regularizers

This section is concerned with the formulation of CNC variational models containing
non-separable sparsity-promoting regularizers (see Definition 1) having the
form introduced in (40)–(41). The considered non-separable CNC models thus read

x* ∈ arg min_{x ∈ R^n} J_NS(x; B),   (48)

J_NS(x; B) := (1/2)‖Ax − b‖₂² + μ Ψ_NS(x; B),   Ψ_NS(x; B) := R(x) − H_NS(x; B),   (49)

with function H_NS defined in (41) and the matrix B and the regularizer R satisfying
assumptions B1–B3 outlined in the previous section. We refer to (48)–(49) as the
class of CNC non-separable (least-squares) models, abbreviated CNC-NS-L2.
Theorem 2. Let R and B satisfy assumptions B1–B3, and let Ψ_NS be the function
defined in (49) with H_NS given in (41). Then, the function J_NS in (49) is proper,
lower semicontinuous, and bounded below by zero. Moreover, a sufficient condition
for J_NS to be convex (strongly convex) is that the matrix of parameters B satisfies

Q := A^T A − μ B^T B ⪰ 0  ( ≻ 0 ).   (50)
The proofs of Theorem 2 and Corollary 2 are reported in Lanza et al. (2019).
Remark 1. All the previous derivations are valid for any function Θ : R^s → R in
the definition of the convex regularizer R in (39), provided that assumptions B1–B3
are satisfied. However, since R = Θ(G(L ·)) must be a convex regularizer inducing
(as effectively as possible) sparsity of the feature vector y = G(Lx), it is very
reasonable to consider convex, sparsity-promoting, additively separable functions
Θ of the form

Θ(y) = Σ_{i=1}^{s} θ( y_i ),   (51)
Construction of Matrix B
Convexity condition (50) for the cost function J_NS in (49) sets an inequality
constraint on B^T B, hence on the matrix B of free parameters in the non-separable
regularizer Ψ_NS. In the sequel, we illustrate a few simple strategies for choosing B.
26 A. Lanza et al.
The first and simplest strategy consists in setting B = √(γ/μ) A, that is,

B^T B = (γ/μ) A^T A,   γ ∈ [0, 1],   (52)
which clearly fulfills condition (50). We notice that, analogously to τc in (34) for
the CNC separable models, the scalar parameter γ in (52) controls the degree of
non-convexity of the non-separable regularization term NS , hence the degree of
convexity of the total objective JNS : the greater the γ , the more non-convex the NS
and, hence, the less convex the JNS . In particular, for γ approaching 0 from above,
B tends to the null matrix, and hence, the non-separable regularizer NS tends to
the convex regularizer R. On the other side, for γ approaching 1 from below, the
regularizer NS tends to be maximally non-convex (hence, potentially, maximally
sparsity-promoting) under the CNC constraint that the total cost function JNS must
be convex.
A more sophisticated and flexible strategy for constructing a matrix B^T B
satisfying convexity condition (50) can be derived by considering the eigenvalue
decomposition of the symmetric, positive semidefinite matrix A^T A,

A^T A = V E V^T,   (53)

and then setting

B^T B = (1/μ) V Γ E V^T,   Γ := diag(γ_1, ..., γ_n),   γ_i ∈ [0, 1]  ∀ i ∈ {1, ..., n},   (54)

so that, replacing (54) into convexity condition (50), we have

Q = V ( E − Γ E ) V^T ⪰ 0  ( ≻ 0 )   ⟺   E ( I_n − Γ ) ⪰ 0  ( ≻ 0 ),   (55)

which is clearly satisfied given the definition of matrix Γ in (54). We notice that
when one chooses γ_1 = γ_2 = ··· = γ_n = γ ∈ [0, 1], then (54) reduces to (52), that
is, strategy (52) is included in the more general strategy (54).
Finally, in Park and Burrus (1987) another method for prescribing the matrix
B^T B, hence B, for the specific purpose of image processing with TV regularization
has been proposed. In particular, the diagonal matrix Γ in (54) is set to represent a
two-dimensional dc-notch filter (a type of band-stop filter) defined by Γ := I − H,
where H is a two-dimensional low-pass filter with a dc-gain of unity and H ⪯ I.
A simple choice for H is H = H_0^T H_0, with H_0 a moving-average filter having
square support. Hence, H is a row-column separable two-dimensional filter given
by convolution with a triangle sequence (Park and Burrus 1987).
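A possible NumPy realization (ours) of the first two strategies for constructing B is sketched below: strategy (52) with a single scalar γ, and strategy (54) based on the eigendecomposition A^T A = V E V^T; the function names are illustrative assumptions.

```python
import numpy as np

def B_from_A(A, mu, gamma=0.99):
    """Strategy (52): B = sqrt(gamma/mu) * A, so B^T B = (gamma/mu) A^T A."""
    return np.sqrt(gamma / mu) * A

def BtB_from_eig(A, mu, gammas):
    """Strategy (54): B^T B = (1/mu) V diag(gammas) E V^T, where
    A^T A = V diag(E) V^T and gammas[i] in [0, 1] for every i."""
    E, V = np.linalg.eigh(A.T @ A)
    return (V * (np.asarray(gammas) * E)) @ V.T / mu

# with constant gammas the two strategies coincide
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
mu, gamma = 1.5, 0.8
assert np.allclose(B_from_A(A, mu, gamma).T @ B_from_A(A, mu, gamma),
                   BtB_from_eig(A, mu, gamma * np.ones(20)))
```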
1 Convex Non-convex Variational Models 27
A Simple CNC Example

In this section, we provide some visual insights on the properties of the considered
non-convex separable and non-separable sparsity-promoting regularizers, Ψ_S and
Ψ_NS, respectively defined in (20) and (40). To this aim, we consider the three two-
dimensional variational models defined by minimizing the cost functions

J_R(x) := (1/2)‖Ax − b‖₂² + μ R(x),   R(x) = ‖Lx‖₁,   (56)

J_S(x; a) := (1/2)‖Ax − b‖₂² + μ Ψ_S(x; a),   Ψ_S(x; a) = R(x) − H_S(x; a),   (57)

J_NS(x; B) := (1/2)‖Ax − b‖₂² + μ Ψ_NS(x; B),   Ψ_NS(x; B) = R(x) − H_NS(x; B),   (58)

where (56) represents the model containing the baseline convex ℓ₁ norm-based
sparsity-inducing regularizer, the functions H_S in (57) and H_NS in (58) are defined
in (24) and (41), respectively, and we set

μ = 1.5,   b = ( 0, 0 )^T,   A = [ 0.4  1.5 ; −1.0  0.8 ],   L = [ −2.0  −1.0 ; 0.5  −2.5 ].   (59)
Moreover, according to the convexity conditions in (34) and (52), for the CNC
separable and non-separable models in (57) and (58), we choose

a_1 = a_2 = ā = τ_c ρ_{A,L} / μ,   τ_c = 0.99,   (60)

B = √(γ/μ) A,   γ = 0.99,   (61)

respectively, so that both the CNC models are pushed toward their convexity limit.
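The small sketch below (ours) reproduces the setting (59)–(60) numerically: it evaluates the convex baseline cost J_R of (56) and the separable CNC cost J_S of (57), the latter using the explicit form φ_MC(t; ā) = |t| − h_ā(t), and empirically checks the convexity of J_S along random segments for τ_c = 0.99.

```python
import numpy as np

mu, tau_c = 1.5, 0.99
b = np.zeros(2)
A = np.array([[0.4, 1.5], [-1.0, 0.8]])
L = np.array([[-2.0, -1.0], [0.5, -2.5]])

rho_AL = np.linalg.svd(A, compute_uv=False).min() ** 2 / \
         np.linalg.svd(L, compute_uv=False).max() ** 2
a_bar = tau_c * rho_AL / mu                      # concavity parameter (60)

def huber(t, a):
    return np.where(np.abs(t) <= 1 / a, 0.5 * a * t**2, np.abs(t) - 0.5 / a)

def J_R(x):                                      # convex baseline model (56)
    return 0.5 * np.sum((A @ x - b) ** 2) + mu * np.sum(np.abs(L @ x))

def J_S(x):                                      # separable CNC model (57)
    y = np.abs(L @ x)
    return 0.5 * np.sum((A @ x - b) ** 2) + mu * np.sum(y - huber(y, a_bar))

# empirical convexity check of J_S along random segments
rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2, lam = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2), rng.uniform()
    assert J_S(lam * x1 + (1 - lam) * x2) <= lam * J_S(x1) + (1 - lam) * J_S(x2) + 1e-10
```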
In Fig. 5, we show the regularizer R and the total cost function J_R of the baseline
convex model (56), in Fig. 6 the regularizer Ψ_S and the total cost function J_S of the
separable CNC model (57), and in Fig. 7 the regularizer Ψ_NS and the total cost function
J_NS of the non-separable CNC model (58). All function graphs are accompanied,
in the bottom row, by their associated contour plots. The solid red and blue lines
in the contour plot figures represent the hyperplanes Y_1 and Y_2, respectively, with
Y_i := {x ∈ R² : L_i x = 0}, i ∈ {1, 2}, and L_i the i-th row of matrix L.

Fig. 5 Graphs of functions R(x) and J_R(x) in (56) with associated contour plots

From the left columns of Figs. 5, 6, and 7, it can be noticed that the baseline
regularizer R(x) is clearly convex, but not strictly convex, whereas the separable
and non-separable regularizers Ψ_S(x; a) and Ψ_NS(x; B) are non-convex. In fact,
according to their definitions in (57) and (58), they are obtained by subtracting
from R(x) the convex terms H_S(x; a) and H_NS(x; B), respectively. The non-convex
regularizers Ψ_S(x; a) and Ψ_NS(x; B) thus hold the potential for promoting sparsity
of the vector Lx = (L_1 x, L_2 x)^T more effectively than the convex regularizer R(x).
The plots in the right columns of Figs. 5, 6, and 7 confirm, first, that the total
cost function JR (x) is clearly convex and then, more interestingly, that the cost
functions JS (x; a) and JNS (x; B) of the separable and non-separable CNC models
in (57) and (58) are also both convex, as prescribed by the CNC rationale and as
expected due to our settings τc = γ = 0.99 < 1.
As a final interesting experiment, we push both the separable and non-separable
CNC models in (57), (58) outside their guaranteed convexity regimes, as defined
by sufficient conditions (34), (52), respectively. More precisely, we set τc , γ > 1
in (60), (61), thus obtaining the total cost functions JS (x; a), JNS (x; B) depicted
in Fig. 8. It can be noticed from the graphs in the top row and, more clearly, from
the associated contour plots in the bottom row that both the cost functions are non-
convex, as expected from theory.
Fig. 6 Graphs of functions Ψ_S(x; a) and J_S(x; a) in (57) with associated contour plots
Path of Solution Components

The different behavior of standard ℓ₁ norm convex regularization versus its associated
non-convex non-separable regularization can be illustrated by observing the
solution path as the regularization parameter μ varies. In particular, we denote by
x_{L1} the solution of the minimization problem (56) with L = I and by x_{NS} the
solution of its associated non-separable CNC model (58). When μ is sufficiently
large, both the solutions x_{L1} and x_{NS} will be the all-zero vector. When μ is
sufficiently close to zero, the solution using either regularization will approximate
the unconstrained least-squares solution. However, as μ varies between these two
extremes, the solutions obtained using the two regularization methods sweep
different paths. This is illustrated in Fig. 2.1 in Hastie et al. (2015), which concerns
an example of a least-squares problem with ℓ₁ norm regularization where matrix A is
of size 50 × 5. This example is reproduced in Fig. 9. As in Hastie et al. (2015), the
solution path is shown as a function of the fraction given by the ℓ₁ norm of x_{L1} divided
by the ℓ₁ norm of the unconstrained (unregularized) least-squares solution x_{LS};
this fraction varies between zero and one. Repeating the same example using the
non-separable CNC regularization yields the solution path shown in black in Fig. 9.
Fig. 7 Graphs of functions Ψ_NS(x; B) and J_NS(x; B) in (58) with associated contour plots
Fig. 8 Graphs of the total cost functions JS (x; a), JNS (x;B) and associated contour plots for the
separable and non-separable variational models in (57), (58) pushed beyond their convexity limit,
that is, for τc , γ > 1
Forward-Backward Minimization Algorithms

The forward-backward (FB) splitting algorithm (2009) has attracted extensive interest
due to its simplicity and several important advantages. It is well known that this method
uses little storage, readily exploits the separable structure of the minimization problem,
and is easily implemented in practical applications. It relies on a forward gradient step
(an explicit step) followed by a backward proximal step (an implicit step).
In the separable case (section “FB Strategy for Separable CNC Models”), it
reduces to a standard proximal gradient or subgradient splitting minimization
algorithm. In the non-separable case (section “FB Strategy for Non-separable CNC
Models”), a more general form of the FB algorithm aimed to solve monotone
inclusion problems is used. The solution of the minimization problems in the
backward steps of the FB applied to both the separable and non-separable cases
relies on a very efficient ADMM strategy (section “Efficient Solution of the
Backward Steps by ADMM”).
Fig. 9 Path of the five solution components for the regularized least-squares example in Hastie
et al. (2015); x_μ/‖x_{LS}‖ is shown as the red colored path for x_μ = x_{L1} and as the black colored
path for x_μ = x_{NS}, for increasing values of the regularization parameter μ
FB Strategy for Separable CNC Models

We seek to compute solutions of the CNC separable model (19)–(20), that is,

x* ∈ arg min_{x ∈ R^n} J_S(x; a),   (62)

where, recalling Lemma 1, the cost function J_S can be written as

J_S(x; a) = (1/2)‖Ax − b‖₂² − μ Σ_{i=1}^{s} h_{a_i}( g_i(Lx) ) + μ ‖G(Lx)‖₁   (63)

 = [ (1/2)‖Ax − b‖₂² − μ env^1_{‖W^{-1}·‖₁}( W G(Lx) ) ] + μ ‖G(Lx)‖₁  =:  J_1(x; a) + J_2(x).   (64)

The term J_2 is in general – i.e., for the great majority of reasonable
functions G – a non-differentiable function, whereas J_1 can be differentiable or
non-differentiable depending on G. Indeed, some popular regularizers are defined
in terms of G functions yielding differentiability of J_1, as it will be illustrated in
Proposition 4.
Hence, we propose to compute approximate solutions x ∗ of the CNC separable
model in (62), (63), and (64) by means of the FB iterative scheme outlined in
Proposition 5. The forward step consists of a subgradient – or gradient, depending
on G – descent step of the term J1 , whereas the backward step is a proximal step
of J2 . In Proposition 4, we preliminarily derive the expression of the subgradient –
or gradient – of the function J1 .
Proof. The quadratic term in J1 – namely, the data fidelity term – is clearly
differentiable with gradient given by AT (Ax − b). Recalling that the Moreau
envelope is a differentiable function (see Proposition 1), the second term in J1 is
differentiable if the function G is differentiable. In fact, in this case the term is
composition of differentiable functions. If G is non-differentiable, then the term can
be non-differentiable or, for some special G, also differentiable.
In the general case of a possibly non-differentiable function G, expression (65)
for the subdifferential of J1 comes from applying the chain rule of differentiation to
34 A. Lanza et al.
the calculus of the subdifferential of function J1 in the form (64) and from recalling
the expression of the gradient of the Moreau envelope function given in (14).
To demonstrate (66)–(67), first we notice that if G has the form in (31), we can
write:

H_S(x; a) = Σ_{i=1}^{s} h_{a_i}( g_i(Lx) ) = Σ_{i=1}^{s} h_{a_i}( ‖z^{(i)}‖₂ ) = Σ_{i=1}^{s} f_{a_i}( P^{(i)} z ),   z = Lx,

where P^{(i)} denotes the binary matrix extracting from z the sub-vector z^{(i)} of the
partition in (31). It follows from Proposition 2 that the function H(z; a) above is
differentiable (sum of differentiable functions) with gradient given by

∇_z H(z) = Σ_{i=1}^{s} (P^{(i)})^T ∇f_{a_i}( P^{(i)} z )

 = Σ_{i=1}^{s} (P^{(i)})^T min{ a_i, 1/‖P^{(i)} z‖₂ } P^{(i)} z

 = ( Σ_{i=1}^{s} min{ a_i, 1/‖P^{(i)} z‖₂ } (P^{(i)})^T P^{(i)} ) z

 = P^T C(z) P z,
The FB iterative scheme of Proposition 5 for the minimization of J_S = J_1 + J_2 reads:

for k = 0, 1, 2, ...
  ω^(k) ∈ ∂J_1( x^(k) )
  w^(k) = x^(k) − λ^(k) ω^(k)
  x^(k+1) = prox^{1/λ^(k)}_{J_2}( w^(k) ) = arg min_{x ∈ R^n} { ‖G(Lx)‖₁ + (1/(2 λ^(k) μ)) ‖x − w^(k)‖₂² }
end

where the variable stepsizes λ^(k) are chosen according to the strategy in Bello Cruz
(2017) if J_1 is non-differentiable, or λ^(k) = λ ∈ ]0, 2/ρ[, with ρ the Lipschitz
constant of the gradient of J_1, if J_1 is differentiable.
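As a concrete instance of the FB scheme above, the following sketch (ours) handles the particularly simple separable case A = L = I_n with G = |·| (sparse-signal denoising), for which ∇J_1 is explicit and the backward step reduces to scalar soft-thresholding; the step size, iteration count, and test data are illustrative assumptions.

```python
import numpy as np

def grad_huber(t, a):
    """Gradient of the Huber function h_a."""
    return np.where(np.abs(t) <= 1.0 / a, a * t, np.sign(t))

def soft_threshold(w, thr):
    """Proximal map of thr * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)

def fb_cnc_denoise(b, mu, tau_c=0.99, n_iter=200):
    """FB iterations for the separable CNC model with A = L = I_n, G = |.|:
    J_1(x) = 0.5*||x - b||^2 - mu * sum_i h_a(x_i) (smooth), J_2(x) = mu*||x||_1."""
    a = tau_c / mu          # convexity condition (34) with rho_{A,L} = 1
    lam = 1.0               # 0 < lam < 2; grad J_1 is 1-Lipschitz when mu*a <= 1
    x = b.copy()
    for _ in range(n_iter):
        w = x - lam * ((x - b) - mu * grad_huber(x, a))   # forward step
        x = soft_threshold(w, lam * mu)                   # backward (prox) step
    return x

# sparse signal corrupted by additive white Gaussian noise
rng = np.random.default_rng(0)
x_true = np.zeros(200)
x_true[rng.choice(200, 15, replace=False)] = rng.uniform(1.0, 3.0, 15)
b = x_true + 0.2 * rng.standard_normal(200)
x_hat = fb_cnc_denoise(b, mu=0.4)
```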
FB Strategy for Non-separable CNC Models

Solutions of the CNC non-separable model (48)–(49) can be computed by solving the
saddle-point problem

{ x*, v* } ∈ arg min_{x ∈ R^n} max_{v ∈ R^n} F(x, v; B),   (69)

where

F(x, v; B) = (1/2)‖Ax − b‖₂² + μ R(x) − μ R(v) − (μ/2)‖B(x − v)‖₂²,   (70)

and where, we recall, the regularization function R(x) = Θ(G(Lx)) and the parameter
matrix B satisfy assumptions B1–B3 outlined at the beginning of section “Sparsity-Inducing
Non-separable Regularizers”.
The solution of the saddle-point problem above can be calculated using a general
form of the FB algorithm (Theorem 25.8 in Bauschke and Combettes 2011). This
form of the FB algorithm is formulated to solve the general class of monotone
inclusion problems, of which the saddle-point problem (69)–(70) is a special case.
Proposition 6. Let F(x, v; B) : R^{2n} → R be the function defined in (70) with the
parameter matrix B set as in (53)–(54). Then, a saddle point {x*, v*} of F can be
obtained as the limit point of the sequence of iterates { x^(k), v^(k) }_{k=1}^{∞} generated by
the following PDFB iterative scheme:

set ρ = max_i { e_i ( 1 − 2γ_i + 2γ_i² ) / ( 1 − γ_i ) }
set λ ∈ ] 0, 2/ρ [
for k = 0, 1, 2, ...
  w^(k) = x^(k) − λ [ A^T( A x^(k) − b ) + μ B^T B ( v^(k) − x^(k) ) ]
Efficient Solution of the Backward Steps by ADMM

The backward steps of both the FB and PDFB schemes outlined above require the
solution of a minimization problem of the form

t* = prox^α_R( p ) = arg min_{t ∈ R^n} { R(t) + (α/2)‖t − p‖₂² } = arg min_{t ∈ R^n} { Υ( G(L t) ) + (α/2)‖t − p‖₂² }.   (71)
For both the FB and PDFB cases, the matrix L and the function G – hence,
the image feature vector y = G(L ·) to be sparsified – are defined as in
section “Introduction”, whereas the function Υ : R^s → R is set to the ℓ₁ norm function
‖·‖₁ for FB and to the function Θ for PDFB. In both cases, it follows from the
considered convexity assumptions/conditions that the regularizer R = Υ(G(L ·))
is convex; hence the cost function in (71) is strongly convex and admits a unique
(global) minimizer t*.
As it will be later discussed, in most cases of practical interest the function
Υ(G(·)) is easily proximable, that is, its proximity operator admits a closed-form
expression or can be calculated very efficiently. Hence, we suggest solving the
minimization problem in (71) by means of the following ADMM-based approach.
First, we rewrite (71) in the equivalent linearly constrained form:

{ t*, z* } = arg min_{t, z} { Υ( G(z) ) + (α/2)‖t − p‖₂² }   s.t.   z = L t,   (72)

where z ∈ R^r is an auxiliary variable (the notation has been chosen for coherence
with definition (6)). Then, we introduce the augmented Lagrangian function

L(t, z, ρ) = Υ( G(z) ) + (α/2)‖t − p‖₂² − ⟨ ρ, z − L t ⟩ + (β/2)‖z − L t‖₂²,   (73)
2 2
where β > 0 is a scalar penalty parameter and ρ ∈ Rr is the dual variable, i.e., the
vector of Lagrange multipliers associated with the set of r linear constraints in (72).
Solving (72) is tantamount to seeking a saddle point of the augmented Lagrangian
function in (73), that is, solving

{ t*, z*; ρ* } = arg min_{t, z} max_{ρ} L(t, z, ρ).

The ADMM iterations applied to this saddle-point problem read:

t^(j+1) = arg min_{t ∈ R^n} L( t, z^(j), ρ^(j) ) = arg min_{t ∈ R^n} { (α/2)‖t − p‖₂² + (β/2)‖ z^(j) − L t − ρ^(j)/β ‖₂² },   (75)

z^(j+1) = arg min_{z ∈ R^r} L( t^(j+1), z, ρ^(j) ) = arg min_{z ∈ R^r} { Υ( G(z) ) + (β/2)‖z − q^(j)‖₂² } = prox^β_{Υ(G(·))}( q^(j) ),   q^(j) = L t^(j+1) + (1/β) ρ^(j),   (76)

ρ^(j+1) = ρ^(j) − β ( z^(j+1) − L t^(j+1) ).   (77)
The ADMM scheme outlined above has guaranteed convergence and, in most cases
of practical interest, allows to compute very efficiently the solution t ∗ of (71).
In the general case, the computational cost of the ADMM iteration (75), (76),
and (77) is dominated by the solution of the two subproblems for the primal
variables t and z, as the cost for updating the dual variable ρ ∈ Rr by (77) is linear
in r, hence in the number of pixels n (we do not consider the cost of multiplication
by matrix L since the term L t (j +1) in (77) must have been previously computed for
solving (76)).
The subproblem for t in (75) consists in solving an n × n linear system with
the symmetric positive definite (hence, invertible) coefficient matrix α I_n + β L^T L.
For ADMM implementations with iteration-independent penalty parameter β, the
matrix is constant along the ADMM iterations, and for FB (or PDFB) implemen-
tations with iteration-independent stepsize λ, it is also constant along the (outer)
FB (or PDFB) iterations. The linear system can thus be solved by direct methods,
namely, Cholesky factorization carried out once for all before starting iterations
and solution of (75) by forward and backward substitution, or by iterative methods.
In particular, when L is a sparse matrix, the (suitably preconditioned) conjugate
gradient method equipped with some variable stopping tolerance strategy represents
a good (i.e., efficient) choice. If L is a diagonal matrix, or some unitary matrix (e.g.,
the 2D discrete Fourier or cosine transform matrix, so as to sparsify the sought image
coefficients in the Fourier or cosine basis), or the matrix of some overcomplete
dictionary satisfying the tight frame condition LT L = δIn , δ ∈ R++ , then the
coefficient matrix is diagonal and (75) can be solved very efficiently. Finally, in
T
the special but practically very important case where L = LT1 , . . . , LTc with
Li ∈ Rn×n convolution matrices, i = 1, . . . , c, the linear system can also be solved
very efficiently by fast 2D discrete transforms. In particular, by assuming periodic,
symmetric, or anti-symmetric boundary conditions for the unknown image t, the
linear system in (75) can be solved by using the fast 2D discrete Fourier, cosine,
or sine transforms, respectively, all characterized by O(n log2 (n)) computational
complexity. This is the case of the TV regularizer (isotropic and anisotropic), the
Hessian-based regularizers and, more in general, of the whole important class of
widely used regularizers aimed to sparsify some (discretized) differential quantity
of the sought image.
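A minimal NumPy sketch (ours) of the ADMM scheme (75)–(77) for the isotropic-TV instance of the backward step is given below: L stacks the first-order periodic finite-difference operators, so the t-update is solved by 2D FFTs, and the z-update is a multidimensional soft-thresholding; parameter values and helper names are our own assumptions.

```python
import numpy as np

def admm_backward_step_tv(p, alpha, beta=1.0, n_iter=50):
    """ADMM sketch for the backward step (71) with L the discrete gradient
    (periodic boundaries) and Upsilon(G(.)) the isotropic TV penalty."""
    ny, nx = p.shape
    dh = lambda u: np.roll(u, -1, axis=1) - u       # D_h
    dv = lambda u: np.roll(u, -1, axis=0) - u       # D_v
    dht = lambda u: np.roll(u, 1, axis=1) - u       # D_h^T
    dvt = lambda u: np.roll(u, 1, axis=0) - u       # D_v^T

    # eigenvalues of L^T L = D_h^T D_h + D_v^T D_v in the Fourier domain
    eig = (np.abs(1 - np.exp(-2j * np.pi * np.fft.fftfreq(nx))) ** 2)[None, :] + \
          (np.abs(1 - np.exp(-2j * np.pi * np.fft.fftfreq(ny))) ** 2)[:, None]

    t = p.copy()
    zh, zv = dh(t), dv(t)                           # auxiliary variable z = L t
    rh, rv = np.zeros_like(p), np.zeros_like(p)     # dual variable rho

    for _ in range(n_iter):
        # t-update (75): (alpha*I + beta*L^T L) t = alpha*p + L^T(beta*z - rho)
        rhs = alpha * p + dht(beta * zh - rh) + dvt(beta * zv - rv)
        t = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (alpha + beta * eig)))
        # z-update (76): block soft-thresholding of q = L t + rho/beta
        qh, qv = dh(t) + rh / beta, dv(t) + rv / beta
        mag = np.sqrt(qh**2 + qv**2)
        shrink = np.maximum(1.0 - 1.0 / (beta * np.maximum(mag, 1e-12)), 0.0)
        zh, zv = shrink * qh, shrink * qv
        # dual update (77)
        rh -= beta * (zh - dh(t))
        rv -= beta * (zv - dv(t))
    return t
```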
Based on Remark 1 in section “CNC Models with Sparsity-Inducing Non-separable
Regularizers”, for both the FB and PDFB cases the subproblem for z in (76)
can be written as

ẑ = arg min_{z ∈ R^r} { Σ_{i=1}^{s} υ( g_i(z) ) + (β/2)‖z − q‖₂² },   (78)

Recalling that, based on definition (31), the vectors q^{(i)} form a partition of q ∈
R^r and, hence, s ≤ r, the computation in (81) – including the calculation of all the ℓ₂
norm terms ‖q^{(i)}‖₂ – has linear complexity in the dimension r of the codomain of
matrix L ∈ R^{r×n}, hence in the number of pixels n.
is proper, lower semicontinuous, and convex, and its proximal map with proximity
parameter β ∈ R_{++} evaluated at the point q ∈ R^r is given by

prox^β_{υ(‖·‖₂)}( q ) = arg min_{z ∈ R^r} { υ( ‖z‖₂ ) + (β/2)‖z − q‖₂² }   (82)

 = { 0_r,                  if ‖q‖₂ = 0,
   { ξ̂ q/‖q‖₂,             if ‖q‖₂ > 0,   with  ξ̂ = arg min_{ξ ∈ [0, ‖q‖₂]} { υ(ξ) + (β/2)( ξ − ‖q‖₂ )² }.   (83)

In particular, when υ is the identity function, the scalar problem in (83) admits the
closed-form solution

ξ̂ = max{ ‖q‖₂ − 1/β, 0 },   (84)

so that (82)–(83) reduce to the well-known multidimensional soft-thresholding operator.
Proof. First, all the stated properties of the composite function υ(‖·‖₂) come easily
from the assumed properties of the function υ and from the ℓ₂ norm function ‖·‖₂
being continuous and convex on the entire domain R^r.
Then, convexity of υ(‖·‖₂) yields strong convexity of the cost function in
(82), which, hence, admits a unique (global) minimizer ẑ ∈ R^r. If ‖q‖₂ = 0
or, equivalently, q is the null vector, then the cost function in (82) reduces to
υ(‖z‖₂) + (β/2)‖z‖₂², which is a monotonically increasing function of ‖z‖₂. The
solution of (82) in this case is thus ẑ = 0_r. If q is not the null vector, i.e., ‖q‖₂ > 0,
then it is easy to prove (see the initial part of the proof of Proposition 1 in Sidky
et al. 2014) that, under the considered assumptions on the function υ, the solution of
(82) must belong to the closed segment with extremes 0_r and q. By thus considering
the restriction of the cost function in (82) to that segment, parameterized by
z = ξ q/‖q‖₂, ξ ∈ [0, ‖q‖₂], one easily obtains the 1-D constrained minimization
problem in (83). Finally, the closed-form solution in (84) obtained when υ is the
identity function represents the quite popular multidimensional soft-thresholding
operator. Its derivation can be found, e.g., in the proof of Proposition 1 in Sidky
et al. (2014).
It is worth noticing that in the special case where the regularization matrix L
is the identity matrix (e.g., when one wants to sparsify the image itself as it is
expected to be predominantly zero-valued, or in general in the synthesis-based
sparse reconstruction framework), then the backward step in (71) reduces to
t* = arg min_{t ∈ R^n} { Υ( G(t) ) + (α/2)‖t − p‖₂² } = prox^α_{Υ(G(·))}( p ).   (85)
Hence, ADMM is not required since problem (85) consists in computing only one
proximal map of the same type as in the ADMM sub-problem for z in (76), which
in its turn reduces to solving the s lower-dimensional problems in (79)–(80) by, e.g.,
(81).
Finally, we notice that a suitable warm-start strategy can be used in both the FB
and PDFB approaches in order to further speed up the backward step computation by
ADMM. More precisely, at each (outer) iteration of the FB and PDFB algorithms,
the (inner) iterative ADMM scheme in (75), (76), and (77) is initialized with the
results of the previous (outer) iteration, in terms of both the primal variables t, z and
the dual variable ρ. This allows a significant decrease in the number of ADMM
iterations.
Numerical Examples
The binary sparsity masks depicted in Fig. 10 provide an immediate idea of the level
of sparsity of each image in terms of the three feature vectors considered in (86). In
Table 2, we report, for each image, the total number of pixels n and the three total
numbers of 0-value pixels of the binary sparsity masks, defined by
ζ^{(j)} := n − Σ_{i=1}^{n} M_i^{(j)}, j = 0, 1, 2. As expected, the SPD0, SPD1, and SPD2
images exhibit the highest level of sparsity, i.e., the largest number of 0-value pixels,
for the feature vectors y^{(0)}, y^{(1)}, and y^{(2)}, respectively.

Fig. 10 The three test images SPD0, SPD1, SPD2 (first row) and their associated binary sparsity
masks M^{(j)}, j = 0, 1, 2 (second-fourth rows) defined in (87) in terms of the feature vectors y^{(j)},
j = 0, 1, 2, given in (86)
Fig. 11 The three test images SPD0, SPD1, SPD2 corrupted by AWG noise of standard deviation
σ yielding BSNR(b, x̄) = 15 (first row) and the associated ISNR graphs for the three purely
convex baseline models L1-L2, TV-L2, S2H-L2 defined in (89), (90), and (91) (second row)
In accordance with the sparsity properties of the three considered test images,
to evaluate the performance of the proposed CNC separable and non-separable
models, we will compare them with the corresponding purely convex (i.e., with
convex regularizers) models, namely, the minimal ℓ₁ norm model (89), referred
to as the L1-L2 model, the isotropic TV-L2 model (90), and the S2H-L2 model (91)
containing the S2H regularizer, which induces sparsity of the image Hessian Schatten
2-norms (Lefkimmiatis et al. 2013). More precisely, we consider the following three
variational models:

L1-L2 :   J^{(0)}(x) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} |x_i|,   (89)
TV-L2 :   J^{(1)}(x) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} ‖(∇x)_i‖₂,   (90)

S2H-L2 :   J^{(2)}(x) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} ‖(H x)_i‖_F,   (91)

where the three regularization terms are denoted by L1(x), TV(x), and S2H(x), respectively.
We thus assume that the above three models are representative of the class of purely
convex models, and we compare their performance with those of the proposed
associated separable CNC-S-L2 and non-separable CNC-NS-L2 models which, we
recall, are also convex but contain non-convex regularizers.
It is worth pointing out that the three models in (89), (90), and (91) can be
represented in a unified form according to definition (6) of the considered class
of sparsity-promoting regularizers:

J^{(j)}(x) = (1/2)‖Ax − b‖₂² + μ ‖y^{(j)}‖₁,   y^{(j)} = G^{(j)}( L^{(j)} x ),   j = 0, 1, 2.   (92)
G^{(j)}(z) = ( g_1^{(j)}(z), ..., g_n^{(j)}(z) )^T,   z ∈ R^{r_j},   r_j = (j + 1) n,   j = 0, 1, 2,   (93)

L^{(0)} = I_n,   L^{(1)} = [ D_h^T, D_v^T ]^T,   L^{(2)} = [ D_hh^T, D_vv^T, √2 D_hv^T ]^T,   (95)
with Dh , Dv , Dhh , Dvv , Dhv ∈ Rn×n finite difference operators discretizing the
first-order horizontal and vertical and the second-order horizontal, vertical, and
mixed horizontal-vertical partial derivatives, respectively. The discrete gradient and
Hessian operators in (90) and (91) are thus defined in terms of these matrices as
follows:
(∇x)_i = [ (D_h x)_i ; (D_v x)_i ],   (H x)_i = [ (D_hh x)_i  (D_hv x)_i ; (D_hv x)_i  (D_vv x)_i ],   i = 1, ..., n.   (96)
Finally, for what concerns the actual discretization of the gradient and Hessian
operators, in all the experiments the matrices D_h, D_v, D_hh, D_vv, D_hv are the 2-D
convolution matrices (with periodic boundary conditions) associated with the
following point-spread functions:

D_h → [ +1, −1 ],   D_v → [ +1 ; −1 ],
D_hh → [ +1, −2, +1 ],   D_vv → [ +1 ; −2 ; +1 ],   D_hv → [ +1, −1 ; −1, +1 ].
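The sketch below (ours) applies the above finite-difference stencils with periodic boundary conditions to compute the three feature maps that the regularizers in (89)–(91) aim to sparsify, namely pixel magnitudes, gradient magnitudes, and Hessian Frobenius norms; function and variable names are illustrative.

```python
import numpy as np

def feature_vectors(x):
    """Feature maps to be sparsified, following (92)-(96): y0 = pixel magnitudes,
    y1 = gradient magnitudes, y2 = Frobenius norms of the pixel-wise Hessians.
    Periodic boundary conditions are assumed."""
    Dh  = np.roll(x, -1, axis=1) - x
    Dv  = np.roll(x, -1, axis=0) - x
    Dhh = np.roll(x, -1, axis=1) - 2 * x + np.roll(x, 1, axis=1)
    Dvv = np.roll(x, -1, axis=0) - 2 * x + np.roll(x, 1, axis=0)
    Dhv = np.roll(Dh, -1, axis=0) - Dh          # mixed second difference
    y0 = np.abs(x)
    y1 = np.sqrt(Dh**2 + Dv**2)
    y2 = np.sqrt(Dhh**2 + Dvv**2 + 2 * Dhv**2)
    return y0.ravel(), y1.ravel(), y2.ravel()

# fraction of zero entries of each feature vector for a piecewise constant image
x = np.zeros((64, 64)); x[16:48, 16:48] = 1.0
sparsity_levels = [np.mean(y < 1e-12) for y in feature_vectors(x)]
```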
The quality of the observed degraded images b and of the restored images x*
(in comparison with the original uncorrupted image x̄) is measured by means of
the Blurred Signal-to-Noise Ratio (BSNR) and the Improved Signal-to-Noise Ratio
(ISNR), respectively. They are defined by

BSNR(b, x̄) = SNR(b, Ax̄),   ISNR(x*, b, x̄) = SNR(x*, x̄) − SNR(b, x̄),   (98)

where, for a generic image I and reference image Ī,

SNR(I, Ī) := 10 log₁₀( ‖Ī − E[Ī]‖₂² / ‖I − Ī‖₂² ),   (99)

and E[Ī] denotes the image with constant intensity equal to the mean value of
Ī. The larger the BSNR value, the lower is the intensity, i.e., the standard deviation
σ, of the AWG noise corrupting the observation b (hence, the easier is the image
restoration problem); the larger the ISNR value, the higher the quality of the restored
image x* obtained by the considered variational model. In all the experiments, after
choosing the blurring operator A and computing the blurred image Ax̄, we set the
desired BSNR value of the observation b and then exploit the BSNR definition
in (98)–(99) in order to determine the (unique) value of the AWG noise standard
deviation σ yielding the selected BSNR value:

σ = ‖Ax̄ − E[Ax̄]‖₂ / ( √n · 10^{BSNR/20} ).   (100)
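A direct transcription (ours) of the quality measures (98)–(100) is given below, assuming the standard SNR definition stated in (99); the function names are our own.

```python
import numpy as np

def snr(I, I_ref):
    """SNR(I, I_ref) = 10*log10(||I_ref - mean(I_ref)||^2 / ||I - I_ref||^2)."""
    num = np.sum((I_ref - I_ref.mean()) ** 2)
    return 10.0 * np.log10(num / np.sum((I - I_ref) ** 2))

def bsnr(b, Ax_bar):
    """BSNR(b, x_bar) = SNR(b, A x_bar), see (98)."""
    return snr(b, Ax_bar)

def isnr(x_star, b, x_bar):
    """ISNR(x_star, b, x_bar) = SNR(x_star, x_bar) - SNR(b, x_bar), see (98)."""
    return snr(x_star, x_bar) - snr(b, x_bar)

def noise_std_for_bsnr(Ax_bar, target_bsnr):
    """Noise standard deviation sigma yielding a prescribed BSNR, as in (100)."""
    n = Ax_bar.size
    return np.linalg.norm(Ax_bar - Ax_bar.mean()) / (np.sqrt(n) * 10.0 ** (target_bsnr / 20.0))
```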
Examples Using CNC Separable Models

We consider the problem of denoising the three test images SPD0, SPD1, SPD2
corrupted only by AWG noise (no blur, i.e., A = I_n in the acquisition model (4) as
well as in the baseline convex variational models (89), (90), and (91)) with standard
deviation σ yielding BSNR(b, x̄) = 15, as shown in the first row of Fig. 11. The
three separable CNC variational models, referred to as CNC-S-L1-L2, CNC-S-TV-L2,
and CNC-S-S2H-L2, to be compared with the baseline purely convex models L1-L2,
TV-L2, and S2H-L2 defined in (89), (90), and (91), read as follows:
CNC-S-L1-L2 :   J_S^{(0)}(x; a) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} φ_MC( |x_i| ; a ),   (102)

CNC-S-TV-L2 :   J_S^{(1)}(x; a) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} φ_MC( ‖(∇x)_i‖₂ ; a ),   (103)

CNC-S-S2H-L2 :   J_S^{(2)}(x; a) = (1/2)‖Ax − b‖₂² + μ Σ_{i=1}^{n} φ_MC( ‖(H x)_i‖_F ; a ),   (104)
where φ_MC is the scalar MC penalty function defined in (13) and where we assume
a space-invariant, i.e., constant for all pixel locations, concavity parameter
a ∈ R_{++} for φ_MC. It follows from Theorem 1, in particular condition (33),
that sufficient conditions for the three cost functions above to be convex (strongly
convex) are the following:

Q^{(j)} = I_n − μ a ( L^{(j)} )^T L^{(j)} ⪰ 0  ( ≻ 0 ),   j = 0, 1, 2,   (105)

where we used the fact that A = I_n for the considered case of image denoising
and where the regularization matrices L^{(j)} are defined in (95). According to the
statement of Theorem 1, the sufficient conditions in (105) can be equivalently and
usefully rewritten as follows:

a = τ_c / ( μ κ^{(j)} ),   τ_c ∈ [0, 1]  ( τ_c ∈ [0, 1[ ),   j = 0, 1, 2,   (106)

with κ^{(j)} denoting the maximum eigenvalue of ( L^{(j)} )^T L^{(j)}.
Fig. 12 ISNR results of separable CNC models CNC-S-L1-L2, CNC-S-TV-L2, and CNC-S-S2H-
L2 defined in (102), (103), and (104) when applied to the noise-corrupted images SPD0, SPD1,
and SPD2, respectively. First column: ISNR values as a function of the regularization parameter
μ for some different τ_c values. Second column: highest achieved ISNR values as a function of
the convexity coefficient τ_c. The dashed vertical red lines, corresponding to τ_c = 1, separate, for
each model, the pure convex and CNC regimes (τ_c ∈ [0, 1]) from the pure non-convex regime
(τ_c ∈ ]1, +∞[).
in Fig. 11 (second row) represent the ISNR values achieved by the three models
as a function of the regularization parameter μ for the three corresponding noise-
corrupted test images SPD0, SPD1, and SPD2 illustrated in the first row. From a
visual inspection, column by column, of Fig. 11 (second row), we observe that, as
expected, the best ISNR values are obtained by models L1 -L2 , TV-L2 , and S2 H-L2
on images SPD0, SPD1 and SPD2, respectively. This is completely in accordance
with the sparsity properties of the three images. The regularizers of models L1 -L2 ,
TV-L2 , and S2 H-L2 are in fact suitable for predominantly zero, piecewise constant,
and piecewise affine images, respectively, as they promote sparsity of the intensities
and of the first- and second-order intensity derivatives of the restored image.
In the next experiment, we compare the best assessed regularization models in
the three convexity regimes: pure convex (τc = 0), CNC (τc ∈ (0, 1]), and pure
non-convex regime (τc > 1). In other words, we now test the three separable CNC
models CNC-S-L1 -L2 , CNC-S-TV-L2 , and CNC-S-S2 H-L2 defined in (102), (103),
and (104) on the corresponding test images for different τc values. In Fig. 12, for
each test image SPD0 (first row), SPD1 (second row), and SPD2 (third row), we
report some interesting ISNR curves for the associated best-performing models
CNC-S-L1 -L2 , CNC-S-TV-L2 , and CNC-S-S2 H-L2 , respectively. In particular, the
plots in the first column represent, for some different τc values, the achieved ISNR
values as a function of the regularization parameter μ. The curves in the second
column depict, for a fine grid of τc values, the highest ISNR values achieved by
letting μ vary in its entire domain.
In Figs. 13, 14, and 15, we report the best (i.e., with highest associated ISNR
value) denoising results obtained by applying models CNC-S-L1 -L2 , CNC-S-TV-
L2 , and CNC-S-S2 H-L2 to the noise-corrupted test images SPD0, SPD1, and SPD2,
respectively, with different τc values. In particular, in the first column of Figs. 13,
14, and 15, we show the denoised images, whereas in the second column we report
the associated absolute error images.
From the ISNR plots reported in the second column of Fig. 12, we first observe that the benefit of using high τc values grows with the order of the image derivatives sparsified by the regularizer. For model CNC-S-L1-L2, the best results are obtained in the CNC regime, i.e., for τc ∈ ]0, 1]. We recall that in this case the upper limit of the CNC regime (τc = 1) corresponds to using ‖x‖₀ as the regularizer, so that the solution is obtained by a pixel-wise hard thresholding of the noisy observation b. For the CNC-S-TV-L2 model, the ISNR gain provided by the CNC regime is remarkable, whereas for the CNC-S-S2H-L2 model such gain is smaller. In other words, pushing the model into the pure non-convex regime (τc > 1) is much more appealing for CNC-S-S2H-L2 than for CNC-S-TV-L2.
Fig. 13 Separable CNC models. Best denoising results obtained by CNC-S-L1 -L2 on image SPD0
for different τc values (left column) and associated absolute error images (right column)
Fig. 14 Separable CNC models. Best denoising results obtained by CNC-S-TV-L2 on image
SPD1 for different τc values (left column) and associated absolute error images (right column)
Fig. 15 Separable CNC models. Best denoising results obtained by CNC-S-S2 H-L2 on image
SPD2 for different τc values (left column) and associated absolute error images (right column)
The non-separable CNC models can be usefully applied for any acquisition matrix A, also when A is very ill-conditioned or even numerically singular, as is often the case in deblurring problems. More
precisely, we consider the restoration of the piecewise constant image SPD1 and the
piecewise affine image SPD2 depicted in the first row of Fig. 10 which, we recall,
are characterized by sparse first- and second-order derivatives, respectively.
In accordance with the considered degradation model in (4), the two test images
SPD1 and SPD2 have been synthetically corrupted by space-invariant Gaussian blur
and AWG noise, as described at the beginning of section “Numerical Examples”. In
particular, for the denoising experiment, clearly A is the identity operator, and no
synthetic blur is applied, whereas for the deblurring experiment, the Gaussian point-
spread function is generated with parameters band = 7, sigma = 1.5. We then add
AWG noise corruptions of standard deviations σ yielding BSNR(b, x̄) = 15 for
the denoising case and BSNR(b, x̄) = 7.6 for the deblurring case.
For the restoration, i.e., denoising and/or deblurring, of the degraded SPD1 and SPD2 test images, we consider the non-separable CNC versions, referred to as CNC-NS-TV-L2 and CNC-NS-S2H-L2, of the two separable CNC models CNC-S-TV-L2 and CNC-S-S2H-L2 defined in (103) and (104), respectively. We also consider a slightly different but interesting version of the CNC-NS-S2H-L2 model, referred to as CNC-NS-S1H-L2, where the Schatten 2-norm (Frobenius norm) has been replaced by the Schatten 1-norm (nuclear norm).
The three considered non-separable CNC models thus read
CNC-NS-TV-L2:

    J_{NS}^{(1)}(x; B) = \frac{1}{2} \|Ax - b\|_2^2 + \mu \underbrace{\Big[ \mathrm{TV}(x) - \mathrm{TV}^{\frac{1}{2}\|B\,\cdot\,\|_2^2}(x) \Big]}_{NS\text{-}TV(x;B)} ,   (109)

CNC-NS-S2H-L2:

    J_{NS}^{(2)}(x; B) = \frac{1}{2} \|Ax - b\|_2^2 + \mu \underbrace{\Big[ \mathrm{S_2H}(x) - \mathrm{S_2H}^{\frac{1}{2}\|B\,\cdot\,\|_2^2}(x) \Big]}_{NS\text{-}S_2H(x;B)} ,   (110)

CNC-NS-S1H-L2:

    J_{NS}^{(3)}(x; B) = \frac{1}{2} \|Ax - b\|_2^2 + \mu \underbrace{\Big[ \mathrm{S_1H}(x) - \mathrm{S_1H}^{\frac{1}{2}\|B\,\cdot\,\|_2^2}(x) \Big]}_{NS\text{-}S_1H(x;B)} .   (111)
Table 3 ISNR values obtained by restoring the test images SPD1 and SPD2 corrupted by zero-mean AWG noise (Denoise) and by space-invariant Gaussian blur plus AWG noise (Deblur)

Image  Model            Denoise  Deblur    Image  Model            Denoise  Deblur
SPD1   TV-L2              8.84    6.64     SPD2   TV-L2              7.21    3.00
       S1H-L2             5.56    2.00            S1H-L2             7.67    2.50
       S2H-L2             4.54    1.90            S2H-L2             6.65    2.73
       CNC-NS-TV-L2      20.35    6.72            CNC-NS-TV-L2       4.11    3.20
       CNC-NS-S1H-L2     11.34    2.11            CNC-NS-S1H-L2     12.33    2.83
       CNC-NS-S2H-L2      9.13    2.00            CNC-NS-S2H-L2     10.57    2.73
The parameter matrix B has been constructed using dc-notch filters as described at
the end of section “Construction of Matrix B”, so that the three total cost functions
above are all convex, and hence, the three models are CNC.
Quantitative and qualitative (visual) results have been produced. In Table 3,
we report the ISNR values obtained by the three considered non-separable CNC
models on the two test images for both the denoising and deblurring experiments.
For comparison, we also report the ISNR values achieved by using the associated
purely convex baseline models. For each experiment, the best ISNR results within
each class of models are marked in boldface. Figures 16 and 17 show the corrupted
images (top rows) and the best restored images computed by the two classes of
purely convex models (center rows) and non-separable CNC models (bottom rows),
for denoising and deblurring, respectively; see the associated ISNR values marked in boldface in Table 3.
From the ISNR values in Table 3 and the visual inspection of the restored
images in Figs. 16 and 17, the improvement in accuracy provided by the considered
non-convex non-separable regularizers versus the corresponding convex separable
baseline regularizers is evident, particularly for the denoising case, and in agreement
with the sparsity characteristics of the two images. It is worth remarking that such
improvement is obtained without renouncing any of the well-known advantages
of (strongly) convex optimization, namely, the existence of a unique (global)
minimizer and of numerical algorithms with proven convergence toward such
minimizer.
Furthermore, for the denoising results we could also extend the comparison
to the CNC models with separable regularizers, which were demonstrated in
section “Examples Using CNC Separable Models” to outperform the baseline purely
convex models in inducing sparsity of the gradient magnitudes or the Hessian Schatten 2-norms in the denoised images.
To conclude, we notice that for both the considered separable and non-separable CNC models, the regularization parameter μ has been set manually so as to achieve the best accuracy results in terms of ISNR. In practical applications, this procedure clearly cannot be used (the true image x̄ is unknown), and manually tuning μ by visually inspecting the attained results is not practical either. Hence, some sort of automatic parameter selection strategy is always highly desirable. Actually,
Fig. 16 Non-separable CNC models. Denoising results on SPD1 (left column) and SPD2
(right column) corrupted by AWG noise. First row: degraded images (BSNR = 15). Second
row: restorations by TV-L2 (ISNR=8.84), left, and by S1 H-L2 (ISNR=7.67), right. Third row:
restorations by CNC-NS-TV-L2 (ISNR=20.35), left, and by CNC-NS-S1 H-L2 (ISNR=12.33), right
Fig. 17 Non-separable CNC models. Deblurring results on SPD1 (left column) and SPD2 (right
column) corrupted by blur and AWG noise. First row: degraded images (BSNR = 7.6). Second
row: restorations by TV-L2 (ISNR=6.64), left, and by S1 H-L2 (ISNR=3.00), right. Third row:
restorations by CNC-NS-TV-L2 (ISNR=6.72), left, and by CNC-NS-S1H-L2 (ISNR=3.20), right
the proposed FB and PDFB numerical solution algorithms can be quite easily
equipped with such an automatic strategy. In particular, if one wants to select μ
according to the very popular discrepancy principle or to the less popular but very
effective residual whiteness principle, the ADMM approach proposed for solving
the backward denoising step can benefit from the adaptive strategies proposed in
Lanza et al. (2016b, 2020, 2021) for the more general class of deblurring
problems.
Conclusion
Acknowledgments This research was supported in part by the National Group for Scientific
Computation (GNCS-INDAM), Research Projects 2019/2020.
References
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert
Spaces. Springer, New York (2011)
Bayram, I.: On the convergence of the iterative shrinkage/thresholding algorithm with a weakly
convex penalty. IEEE Trans. Signal Process. 64(6), 1597–1608 (2016)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Becker, S., Combettes, P.L.: An algorithm for splitting parallel sums of linearly composed
monotone operators, with applications to signal recovery. J. Nonlinear Convex Anal. 15(1),
137–159 (2014)
Bello Cruz, J.Y.: On proximal subgradient splitting method for minimizing the sum of two
nonsmooth convex functions. Set-Valued Var. Anal. 25, 245–263 (2017)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge, MA (1987)
Bruckstein, A., Donoho, D., Elad, M.: From sparse solutions of systems of equations to sparse
modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)
Burger, M., Papafitsoros, K., Papoutsellis, E., Schönlieb, C.B.: Infimal convolution regularisation
functionals of BV and Lp spaces. J. Math. Imaging Vis. 55(3), 343–369 (2016)
Cai, G., Selesnick, I.W., Wang, S., Dai, W., Zhu, Z.: Sparsity enhanced signal decomposition via
generalized minimax-concave penalty for gearbox fault diagnosis. J. Sound Vib. 432, 213–234
(2018)
Candès, E.J., Wakin, M.B., Boyd, S.: Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)
Carlsson, M.: On convexification/optimization of functionals including an l2-misfit term. arXiv
preprint arXiv:1609.09378 (2016)
Castella, M., Pesquet, J.C.: Optimization of a Geman-McClure like criterion for sparse signal
deconvolution. In: IEEE International Workshop on Computational Advances Multi-sensor
Adaptive Processing, pp. 309–312 (2015)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems.
Numerische Mathematik 76, 167–188 (1997)
Chan, R., Lanza, A., Morigi, S., Sgallari, F.: Convex non-convex image segmentation. Numerische
Mathematik 138(3), 635–680 (2017)
Chartrand, R.: Shrinkage mappings and their induced penalty functions. In: International Confer-
ence on Acoustics, Speech and Signal Processing (ICASSP), pp. 1026–1029 (2014)
Chen, P.Y., Selesnick, I.W.: Group-sparse signal denoising: non-convex regularization, convex
optimization. IEEE Trans. Signal Proc. 62, 3464–3478 (2014)
Chouzenoux, E., Jezierska, A., Pesquet, J., Talbot, H.: A majorize-minimize subspace approach
for l2–l0 image regularization. SIAM J. Imag. Sci. 6(1), 563–591 (2013)
Ding, Y., Selesnick, I.W.: Artifact-free wavelet denoising: nonconvex sparse regularization, convex
optimization. IEEE Signal Process. Lett. 22(9), 1364–1368 (2015)
Du, H., Liu, Y.: Minmax-concave total variation denoising. Signal Image Video Process. 12(6),
1027–1034 (2018)
Geiger, D., Girosi, F.: Parallel and deterministic algorithms from MRF’s: surface reconstruction.
IEEE Trans. Pattern Anal. Mach. Intell. 13(5), 410–412 (1991)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of
images. IEEE PAMI 6(6), 721–741 (1984)
Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia (1997)
Hartman, P.: On functions representable as a difference of convex functions. Pac. J. Math. 9(3),
707–713 (1959)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and
Generalizations. CRC Press, Boca Raton (2015)
Huska, M., Lanza, A., Morigi, S., Sgallari, F.: Convex non-convex segmentation of scalar fields
over arbitrary triangulated surfaces. J. Comput. Appl. Math. 349, 438–451 (2019a)
Huska, M., Lanza, A., Morigi, S., Selesnick, I.W.: A convex-nonconvex variational method for the
additive decomposition of functions on surfaces. Inverse Problems 35, 124008–124041 (2019b)
Jensen, J.B., Nielsen, M.: A simple genetic algorithm applied to discontinuous regularization. In:
Proceedings IEEE workshop on NNSP, Copenhagen (1992)
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via non-convex regularization. Scale
Space Variat. Methods Comput. Vis. 9087, 666–677 (2015)
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via nonconvex regularization with
parameter selection. J. Math. Imaging Vis. 56(2), 195–220 (2016a)
Lanza, A., Morigi, S., Sgallari, F.: Constrained TVp -2 model for image restoration. J. Sci.
Comput. 68, 64–91 (2016b)
Lanza, A., Morigi, S., Selesnick, I.W., Sgallari, F.: Nonconvex nonsmooth optimization via convex-
nonconvex majorization minimization. Numerische Mathematik 136(2), 343–381 (2017)
Lanza, A., Morigi, S., Sgallari, F.: Automatic parameter selection based on residual whiteness for convex non-convex variational restoration. In: Tai, X.C., Wei, S., Liu, H. (eds.) Mathematical Methods in Image Processing and Inverse Problems, vol. 360. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-2701-9
Lanza, A., Morigi, S., Selesnick, I.W., Sgallari, F.: Sparsity-inducing nonconvex nonseparable
regularization for convex image processing. SIAM J. Imag. Sci. 12(2), 1099–1134 (2019)
Lanza, A., Pragliola, M., Sgallari, F.: Residual whiteness principle for parameter-free image
restoration. Electron. Trans. Numer. Anal. 53, 329–351 (2020)
Lefkimmiatis, S., Ward, J., Unser, M.: Hessian Schatten-Norm regularization for linear inverse
problems. IEEE Trans. Image Process. 22, 1873–1888 (2013)
Malek-Mohammadi, M., Rojas, C.R., Wahlberg, B.: A class of nonconvex penalties preserving
overall convexity in optimization based mean filtering. IEEE Trans. Signal Process. 64(24),
6650–6664 (2016)
Nikolova, M.: Estimation of binary images by minimizing convex criteria. Proc. IEEE Int. Conf.
Image Process. 2, 108–112 (1998)
Nikolova, M.: Energy minimization methods. In: Scherzer, O. (ed.) Handbook of Mathematical
Methods in Imaging, Chapter 5, pp. 138–186. Springer, Berlin (2011)
Nikolova, M., Ng, M.K., Tam, C.P.: Fast nonconvex nonsmooth minimization methods for image
restoration and reconstruction. IEEE Trans. Image Process. 19(12), 3073–3088 (2010)
Parekh, A., Selesnick, I.W.: Convex denoising using non-convex tight frame regularization. IEEE
Signal Process. Lett. 22(10), 1786–1790 (2015)
Parekh, A., Selesnick, I.W.: Enhanced low-rank matrix approximation. IEEE Signal Process. Lett.
23(4), 493–497 (2016)
Parks, T.W., Burrus, C.S.: Digital Filter Design. Wiley, New York (1987)
Portilla, J., Mancera, L.: L0-based sparse approximation: two alternative methods and some
applications. In: Proceedings of SPIE, San Diego, vol. 6701 (Wavelets XII) (2007)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms.
Physica D 60(1–4), 259–268 (1992)
Selesnick, I.W.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17),
4481–4494 (2017a)
Selesnick, I.W.: Total variation denoising via the Moreau envelope. IEEE Signal Process. Lett.
24(2), 216–220 (2017b)
Selesnick, I.W., Bayram, I.: Sparse signal estimation by maximally sparse convex optimization.
IEEE Trans. Signal Proc. 62(5), 1078–1092 (2014)
Selesnick, I.W., Parekh, A., Bayram, I.: Convex 1-D total variation denoising with non-convex
regularization. IEEE Signal Process. Lett. 22, 141–144 (2015)
Selesnick, I.W., Lanza, A., Morigi, S., Sgallari, F.: Non-convex total variation regularization for
convex denoising of signals. J. Math. Imag. Vis. 62, 825–841 (2020)
Setzer, S., Steidl, G., Teuber, T.: Infimal convolution regularizations with discrete l1-type function-
als. Commun. Math. Sci. 9(3), 797–827 (2011)
Shen, L., Xu, Y., Zeng, X.: Wavelet inpainting with the l0 sparse regularization. J. Appl. Comp.
Harm. Anal. 41(1), 26–53 (2016)
Sidky, E.Y., Chartrand, R., Boone, J.M., Pan, X.: Constrained TpV–minimization for enhanced
exploitation of gradient sparsity: application to CT image reconstruction. IEEE J. Trans. Eng.
Health Med. 2, 1–18 (2014)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A continuous exact L0 penalty (CEL0) for least squares
regularized problem. SIAM J. Imag. Sci. 8(3), 1607–1639 (2015)
Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res.
1, 211–244 (2001)
Tuy, H.: DC optimization: theory, methods and algorithms. In: Handbook of Global Optimization,
pp. 149–216. Springer, Boston, (1995)
Wang, S., Selesnick, I.W., Cai, G., Ding, B., Chen, X.: Synthesis versus analysis priors via
generalized minimax-concave penalty for sparsity-assisted machinery fault diagnosis. Mech.
Syst. Signal Process. 127, 202–233 (2019)
Wipf, D.P., Rao, B.D., Nagarajan, S.: Latent variable Bayesian models for promoting sparsity. IEEE Trans. Inf. Theory 57(9), 6236–6255 (2011)
Yuille, A.L., Rangarajan, A.: The concave-convex procedure. Neural Comput. 15(4), 915–936
(2003)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2),
894–942 (2010)
Zou, J., Shen, M., Zhang, Y., Li, H., Liu, G., Ding, S.: Total variation denoising with non-convex
regularizers. IEEE Access 7, 4422–4431 (2019)
Subsampled First-Order Optimization
Methods with Applications in Imaging 2
Stefania Bellavia, Tommaso Bianconcini, Nataša Krejić,
and Benedetta Morini
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Max Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Stochastic Gradient and Variance Reduction Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Gradient Methods with Adaptive Steplength Selection Based on
Globalization Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Accuracy Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Stochastic Line Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Adaptive Regularization and Trust-Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
The Neural Network in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Training the Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Abstract
This work presents and discusses optimization methods for solving finite-sum
minimization problems which are pervasive in applications, including image
processing. The procedures analyzed employ first-order models for the objective
function and stochastic gradient approximations based on subsampling. Among
the variety of methods in the literature, the focus is on selected algorithms which
can be cast into two groups: algorithms using gradient estimates evaluated on
samples of very small size and algorithms relying on gradient estimates and
machinery from standard globally convergent optimization procedures. Neural
networks and convolutional neural networks widely used for image processing
tasks are considered, and a classification problem of images is solved with some
of the methods presented.
Keywords
Introduction
    f(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x),   (2)
Considering first-order methods, let k be the iteration index and let f_k^0 and g_k be subsampled approximations of f(x_k) and ∇f(x_k), respectively, i.e.,
    f_k^0 = \frac{1}{|S_{k,f}|} \sum_{i \in S_{k,f}} f_i(x_k),   (3)

    g_k = \frac{1}{|S_{k,g}|} \sum_{i \in S_{k,g}} \nabla f_i(x_k),   (4)
where Sk,f and Sk,g are random subsets of {1, . . . , N } and |Sk,f |, |Sk,g | denote
their cardinality. Then, the kth iteration of the stochastic gradient procedures we are
dealing with has the form
xk+1 = xk − αk gk , (5)
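A minimal sketch of the estimates (3)–(4) and of the update (5), assuming that the component functions and gradients are available as Python callables (names and interfaces are ours):

```python
import numpy as np

def subsampled_estimates(x, fs, grads, sample_size, rng):
    """Subsampled function and gradient estimates, cf. (3)-(4)."""
    S = rng.choice(len(fs), size=sample_size, replace=False)
    f_est = np.mean([fs[i](x) for i in S])
    g_est = np.mean([grads[i](x) for i in S], axis=0)
    return f_est, g_est

def stochastic_gradient_step(x, grads, alpha, sample_size, rng):
    """One iteration of the recursion (5) with a freshly drawn sample set."""
    S = rng.choice(len(grads), size=sample_size, replace=False)
    g = np.mean([grads[i](x) for i in S], axis=0)
    return x - alpha * g
```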
quantities. The choice of the sample size can vary from simple heuristics to
sophisticated schemes that take into account the progress made by the optimization
process itself. A further relevant distinction from the methods in the first group
is that, except for Curtis et al. (2019), the accuracy of the function and gradient
estimates is controlled adaptively along the iterations and plays a central role in the
convergence analysis. Assuming that the variance of random functions and gradients
is bounded, specific accuracy requirements can be fulfilled by means of a sufficiently
large sample size estimated using probabilistic arguments (Bellavia et al. 2019;
Tripuraneni et al. 2018; Tropp 2015). Some approaches (Bellavia et al. 2020c; Birgin et al. 2018; Krejić et al. 2013, 2015, 2016) eventually reach full precision in functions and gradients, and thus the convergence results are deterministic; in the remaining methods, convergence is stated in terms of probability statements, either in mean square or in the almost sure sense.
The work is organized as follows. In section “Convolutional Neural Networks”,
we briefly introduce neural networks and convolutional neural networks which
are widely used for image processing tasks. In section “Stochastic Gradient and
Variance Reduction Methods”, we describe subsampled first-order methods in the
first group, while in section “Gradient Methods with Adaptive Steplength Selection
Based on Globalization Strategies” we present methods belonging to the second
group. Finally, in section “Numerical Experiments”, we solve a classification
problem of images, discussing the neural network used, implementation issues, and
results obtained with some of the methods presented. All norms in the paper are
Euclidean, i.e., ‖·‖ := ‖·‖₂, and, given a random variable A, the symbols Pr(A) and E[A] denote the probability and expected value of A, respectively.

Convolutional Neural Networks
Fig. 1 An example of neural network with two hidden layers, s=4, t=3
    v_{i,j} = \sigma_{i,j}\Big( \sum_{k=1}^{n_{i-1}} x_{i,j,k}\, v_{i-1,k} + b_{i,j} \Big),   (6)
where b_{i,j} ∈ R is called the bias and the parameters x_{i,j,k} are called weights. The vector v_1 coincides with the input data d. Letting X_i ∈ R^{n_i × n_{i-1}} be the matrix with (j, k)-entry given by x_{i,j,k}, for 1 ≤ j ≤ n_i, 1 ≤ k ≤ n_{i-1}, and b_i = (b_{i,1}, . . . , b_{i,n_i})^T ∈ R^{n_i}, the output of the whole layer L_i is

    v_i = \sigma_i\big( X_i v_{i-1} + b_i \big).   (7)
In fact, the output of each layer is defined recursively by (7) and depends on the
output of the previous layer.
Common examples of activation functions are (Bishop 2006; Goodfellow et al.
2016):
• Linear: σ (z) = z
• Sigmoid or logistic: σ (z) = 1/(1 + e−z )
• Tanh: σ (z) = tanh(z)
• Relu: σ (z) = max(0, z)
• Elu: σ(z) = z · X[z≥0] + (e^z − 1) · X[z<0]
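A small sketch of a fully connected layer (7) using the activations listed above (NumPy; the function and variable names are ours):

```python
import numpy as np

# Activation functions listed above, applied componentwise
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu    = lambda z: np.maximum(0.0, z)
elu     = lambda z: np.where(z >= 0.0, z, np.exp(z) - 1.0)

def layer_forward(v_prev, X, b, sigma=elu):
    """Output of layer L_i: v_i = sigma(X_i v_{i-1} + b_i), cf. (7)."""
    return sigma(X @ v_prev + b)

# Tiny example: an input of size 4 feeding a layer with 3 neurons
rng = np.random.default_rng(1)
v0 = rng.random(4)
X1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
v1 = layer_forward(v0, X1, b1)
```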
The procedure for choosing the parameters {X_i, b_i}_{i=1,...,m} is referred to as the training phase. Let x be the vectorization of {X_i, b_i}_{i=1,...,m}. Given the set of known data {d_i, ŷ_i}_{i=1,...,N} (training set), the aim is to choose the parameters so that the output v_m(x; d_i) of the neural network corresponding to the input d_i is as close as possible to the value ŷ_i for every i = 1, . . . , N.
In order to do that, it is necessary to select a function E : R^t × R^t → R measuring the error made by the network on the prediction of each given datum and to minimize the so-called loss function:

    \frac{1}{N} \sum_{i=1}^{N} E\big( v_m(x; d_i), \hat{y}_i \big).   (9)
Since d_i and ŷ_i are known, the loss function is a special case of (2) with f_i(x) = E(v_m(x; d_i), ŷ_i).
Convolutional Layer
    (I * W)(i, j) = \sum_{s} \sum_{t} \sum_{u=1}^{c} I(s, t, u) \cdot W(s - i + k + 1,\, t - j + k + 1,\, u) + b,   (10)

    (I ** W)(i, j, \ell) = (I * W_\ell)(i, j),

where W_\ell denotes the \ell-th filter of the layer.
CNNs are networks composed of at least one convolutional layer together with standard layers. In convolutional layers, the entries of the filters are the parameters which are
updated during the training. Hence, a convolutional layer consists of m · ((2k + 1) ·
(2k + 1) · c + 1) trainable parameters, (2k + 1) · (2k + 1) · c + 1 for each filter,
bias term included. Each element of the array resulting from a convolution can be
viewed as a neuron of the type shown in Fig. 2, where some of the connections,
corresponding to the indices falling outside the ranges defined in (10), have been
dropped (i.e., the corresponding weights are set to 0). In contrast with standard
NN layers, convolutional layers share weights among different neurons. The kernel
weights are in fact the same in each output neuron, as shown in Fig. 3.
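A naive NumPy sketch of the convolution (10) and of the stacking of the m filter outputs (indexing conventions may differ slightly from (10); function names are ours):

```python
import numpy as np

def conv_single_filter(I, W, bias=0.0):
    """'Valid' convolution of an (H, W, c) image with one (2k+1, 2k+1, c) filter."""
    h, w, c = I.shape
    kh, kw, _ = W.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw, :] * W) + bias
    return out

def conv_layer(I, filters, biases):
    """Stack the m filter responses into an (H-2k, W-2k, m) tensor, cf. I ** W."""
    return np.stack([conv_single_filter(I, Wf, bf)
                     for Wf, bf in zip(filters, biases)], axis=-1)
```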
Max Pooling Layer

In order to speed up the training phase by reducing the dimension of the objects involved, the max pooling strategy is commonly used in CNN architectures for imaging (Strang 2019). It consists in replacing, for every channel, each square neighborhood of pixels with its maximum value, thus reducing the spatial dimensions of the tensor.
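A compact sketch of max pooling over non-overlapping p × p windows (NumPy; cropping of ragged borders is an assumption not stated in the text):

```python
import numpy as np

def max_pool(I, p=2):
    """Channel-wise max pooling with a p x p window and stride p."""
    h, w, c = I.shape
    I = I[: (h // p) * p, : (w // p) * p, :]          # drop ragged borders
    return I.reshape(h // p, p, w // p, p, c).max(axis=(1, 3))

# Example: a 28 x 28 x 64 tensor is reduced to 14 x 14 x 64
print(max_pool(np.random.rand(28, 28, 64)).shape)
```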
Stochastic Gradient and Variance Reduction Methods
In this section, we present the widely used stochastic gradient descent (SGD)
method (Robbins et al. 1951) and incremental gradient algorithms based on variance
reduction such as stochastic variance reduction gradient (SVRG) method (Johnson
et al. 2013), SVRG method with Barzilai-Borwein steplengths (SVRG - BB)
(Tan et al. 2016), StochAstic Recursive grAdient algoritHm (SARAH) method
(Nguyen et al. 2017), stochastic average gradient (SAG) method (Schmidt et al.
2017), and SAGA (Defazio et al. 2014). In the presentation of the convergence
properties of these methods, we will make use of the specific form (2) of the problem
and of the following assumptions.
This assumption clearly implies that the gradient of the objective function is also L-Lipschitz continuous. Assumption 2 requires, in addition, that f is μ-strongly convex with μ > 0, i.e.,

    f(x) \ge f(y) + (\nabla f(y))^T (x - y) + \frac{\mu}{2} \|x - y\|^2 \quad \text{for all } (x, y) \in \mathbb{R}^n \times \mathbb{R}^n.   (12)
The full gradient (FG) method generates iterates of the form x_{k+1} = x_k − α_k ∇f(x_k).
The steplength α_k can be fixed in a number of ways; for example, one can apply a line search procedure based on specific requirements on f or take a constant value, α_k = α, ∀k ≥ 0. If f is convex and Assumption 1 holds, the FG method with fixed steplength α converges sublinearly, with an error bound of order O(1/k) on the optimality gap f(x_k) − f(x_∗), provided that 0 < α < 2/L (Nesterov 1998, Th. 2.1.13). If additionally f is strongly convex and 0 < α < 2/(μ + L), then FG achieves linear convergence, f(x_k) − f(x_∗) = O(ρ^k), with ρ ∈ (0, 1) depending on the condition number L/μ (Nesterov 1998, Th. 2.1.14).
In the case where the number of component functions fi is large, such as in
machine learning applications, the computation of the full gradient is very expensive, and SGD (stochastic gradient descent) appears as an appealing alternative.
The method was first proposed in the seminal paper of Robbins and Monro as SA
(stochastic approximation) method (Robbins et al. 1951). The main idea of SGD
is to replace the expensive gradient ∇f (xk ) with a significantly cheaper stochastic
vector gk . Here we focus on the case where gk is an unbiased approximation to
∇f (xk ), i.e., E[gk ] = ∇f (xk ), built via (4) with Sk,g chosen uniformly at random
from {1, . . . , N }.
Intuition for using subsampled functions evaluated on random small size sample
sets comes from the fact that the training set is often highly redundant, see, e.g.,
(Bottou et al. 2018). Sample sets S_{k,g} with small cardinality |S_{k,g}|, in the limit equal to one, are generally used. Whenever |S_{k,g}| > 1, the stochastic approximation of the
full gradient is denoted as mini-batch; on the other hand, if the sample set reduces
to a single element, the stochastic approximation is called simple or basic. In the
following algorithm, without loss of generality, we present SGD referring to the
latter case.
ALGORITHM SGD
Step 0: Initialization. Choose an initial point x0 and a sequence of strictly
positive steplengths {αk }. Set k = 0.
Step 1. Stochastic gradient computation. Choose randomly and uniformly
ik ∈ {1, . . . , N }. Set gk = ∇fik (xk ).
Step 2: Iterate update. Set x_{k+1} = x_k − α_k g_k. Increment k by one and go to Step 1.
If α ≤ μ/(L M_2), then the expected optimality gap f(x_k) − f(x_∗) falls below a problem-dependent value (Bottou et al. 2018, Th. 4.6).
Convergence in expectation can be proved assuming diminishing steplengths, i.e., a sequence {α_k} satisfying \sum_{k=1}^{\infty} \alpha_k = \infty and \sum_{k=1}^{\infty} \alpha_k^2 < \infty. It can be shown (see Nemirovski et al. (2009, p. 1578)) that for strongly convex functions, properly chosen steplengths such as α_k = θ/k with θ > 1/(2μ), and random gradient approximations having bounded variance, one can get

    E[\|x_k - x_*\|] = O(1/\sqrt{k}).
A further result on expected optimality gap for strongly convex functions is given
below.
Theorem 1 (Bottou et al. 2018, Th. 4.7). Suppose that Assumptions 1 and 2 hold and let x_∗ be the minimizer of f. Assume that (13) holds at each iteration. Then, if SGD is run with α_k = β/(γ + k), with β > 1/μ and γ > 0 such that α_1 ≤ 1/(L M_2), there exists a scalar ν > 0 such that

    E[f(x_k)] - f(x_*) \le \frac{\nu}{\gamma + k}.   (14)
The theorem above shows that, in the case of strongly convex problems, SGD converges more slowly (sublinearly) than the FG method, and the rate depends on the variance of the random sampling. Note that the larger M_2 is, the smaller the steplength must be, and this implies slow convergence.
Theoretical results for SGD applied to nonconvex optimization problems are available (Bottou et al. 2018, §4.3). In particular, if f is bounded, −g_k is in expectation a direction of sufficient descent for f at x_k, and SGD is applied with diminishing steplengths {α_k} satisfying \sum_{k=1}^{\infty} \alpha_k = \infty and \sum_{k=1}^{\infty} \alpha_k^2 < \infty, then it can be shown
that the expected gradient norms cannot stay bounded away from zero (Bottou et al.
2018, Th. 4.9).
If the approximate gradient gk has a large variance, SGD may show slow
convergence and bad performance. Taking a larger sample size for Sk,g could help to
reduce the gradient variance, but a large sample may deteriorate the overall computational
efficiency of stochastic gradient optimization. In order to improve convergence with
respect to SGD, stochastic variance reduction methods have been proposed, see,
e.g., Defazio et al. (2014), Johnson et al. (2013), Nguyen et al. (2017), Tan et al.
(2016), Schmidt et al. (2017), Wang et al. (2013). In particular, in Wang et al.
(2013), a variance reduction technique is proposed by making use of control variates
(Ross 2006) to augment the gradient approximation and consequently reduce its
variance.
Variance reduction is the core of SVRG (stochastic variance reduction gradient)
method presented in Johnson et al. (2013); the algorithm is given below.
ALGORITHM SVRG
Step 0: Initialization. Choose an initial point x0 ∈ Rn , an inner loop size
m > 0, a steplength α > 0, and the option for the iterate update. Set k = 1.
Step 1: Outer iteration, full gradient evaluation.
Set x̃0 = xk−1 . Compute ∇f (x̃0 ).
Step 2: Inner iterations
For t = 0, . . . , m − 1
Uniformly and randomly choose it ∈ {1, . . . , N }.
Set x̃t+1 = x̃t − α(∇fit (x̃t ) − ∇fit (x̃0 ) + ∇f (x̃0 )).
Step 3: Outer iteration, iterate update.
Set xk = x̃m (Option I). Increment k by one and go to Step 1.
Set xk = x̃t for randomly chosen t ∈ {0, . . . , m − 1} (Option II). Increment k
by one and go to Step 1.
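A direct transcription of Algorithm SVRG into NumPy code (a sketch; grads[i](x) is assumed to return ∇f_i(x)):

```python
import numpy as np

def svrg(x0, grads, alpha, m, outer_iters, rng, option="II"):
    """Sketch of Algorithm SVRG with Options I and II for the iterate update."""
    N = len(grads)
    full_grad = lambda x: np.mean([g(x) for g in grads], axis=0)
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        x_tilde0 = x.copy()
        g_full = full_grad(x_tilde0)                      # Step 1: full gradient
        xt, inner = x_tilde0.copy(), [x_tilde0.copy()]
        for _ in range(m):                                # Step 2: inner iterations
            i = rng.integers(N)
            v = grads[i](xt) - grads[i](x_tilde0) + g_full
            xt = xt - alpha * v
            inner.append(xt.copy())
        # Step 3: Option I takes the last inner iterate, Option II a random one
        x = inner[-1] if option == "I" else inner[rng.integers(m)]
    return x
```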
SVRG consists of outer and inner iterations. At each outer iteration k, the
full gradient at xk is computed. Then a prefixed number m of inner iterations is
performed using stochastic gradients and fixed steplength α; the internal iterates are
x̃_0, x̃_1, . . . , x̃_m. At the tth inner iteration, the stochastic gradient used has the form ∇f_{i_t}(x̃_t) − ∇f_{i_t}(x̃_0) + ∇f(x̃_0), with i_t chosen uniformly and randomly in {1, . . . , N}. This quantity is an unbiased
estimation of the gradient. Finally, the new iterate is either the last computed iterate
x̃m (Option I) or one of the vectors x̃0 , . . . , x̃m−1 (Option II). Although Option I,
taking the new iterate as the last outcome of inner loop, is intuitively more appealing,
the convergence results from Johnson et al. (2013) are valid for Option II only. The
2 Subsampled First-Order Optimization Methods with Applications in Imaging 73
results presented in Johnson et al. (2013) cover both the convex and nonconvex
cases. For the sake of simplicity, here, we consider the strongly convex case.
Theorem 2 (Johnson et al. 2013, Th. 1). Suppose that Assumptions 1 and 2 hold and that all f_i are convex, and let x_∗ be the minimizer of f. If m and α satisfy

    \theta = \frac{1}{\mu \alpha (1 - 2L\alpha)\, m} + \frac{2L\alpha}{1 - 2L\alpha} < 1,   (15)

then Algorithm SVRG with Option II generates a sequence which converges linearly in expectation,

    E[f(x_k)] - f(x_*) \le \theta^k \big( f(x_0) - f(x_*) \big).
Theorem 3 (Tan et al. 2016, Corollary 1). Suppose that Assumptions 1 and 2 hold and let x_∗ be the minimizer of f. If m and α satisfy

    \theta = \big( 1 - 2\alpha\mu(1 - \alpha L) \big)^{m} + \frac{4\alpha L^2}{\mu(1 - \alpha L)} < 1,

then Algorithm SVRG with Option I generates a sequence which converges linearly in expectation,

    E[\|x_k - x_*\|^2] \le \theta^k \|x_0 - x_*\|^2.
In practical variants, the full gradient computed at the outer iterations is typically replaced with a mini-batch stochastic gradient (Lei et al. 2017).
Further, we mention a limited memory approach which gives rise to k-SVRG (Raj
et al. 2018).
A variant of SVRG borrows ideas from the spectral gradient method (Barzilai
et al. 1988; Raydan et al. 1997), which is a very popular modification of the classical FG method. The spectral gradient method is based on the idea of approximating the Hessian
matrix in each iteration with a multiple of the identity matrix which minimizes the
discrepancy from the secant equation and yields an adaptive steplength in each
iteration of the gradient method. This steplength is known as Barzilai-Borwein
steplength or the spectral coefficient. The adaptive steplengths overcome hand-
tuning and do not need to be small, i.e., of order 1/L when the Lipschitz constant
is large. Therefore, it is reasonable to expect that some advantages of similar type
might be expected in the framework of SGD and SVRG methods. The following
algorithm is developed in Tan et al. (2016), introducing the Barzilai-Borwein
steplengths in the SVRG framework.
ALGORITHM SVRG - BB
Step 0: Initialization. Choose an initial point x0 ∈ Rn , an inner loop size
m > 0, an initial steplength α0 > 0. Set k = 1.
Step 1: Outer iteration, full gradient evaluation.
Set x̃0 = xk−1 . Compute ∇f (x̃0 ).
If k > 0, then set

    \alpha_k = \frac{1}{m} \, \frac{\| x_k - x_{k-1} \|_2^2}{(x_k - x_{k-1})^T \big( \nabla f(x_k) - \nabla f(x_{k-1}) \big)}
Step 2: Inner iterations
For t = 0, . . . , m − 1
Uniformly and randomly choose it ∈ {1, . . . , N }.
Set x̃t+1 = x̃t − αk (∇fit (x̃t ) − ∇fit (x̃0 ) + ∇f (x̃0 ))
Step 3: Outer iteration, iterate update. Set xk = x̃m . Increment k by one
and go to Step 1.
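The Barzilai-Borwein steplength of Step 1 can be written as the following small helper (a sketch; names are ours):

```python
import numpy as np

def bb_steplength(x_curr, x_prev, g_curr, g_prev, m):
    """Spectral (Barzilai-Borwein) steplength used by SVRG-BB, scaled by 1/m."""
    s = x_curr - x_prev          # difference of consecutive outer iterates
    y = g_curr - g_prev          # difference of the corresponding full gradients
    return np.dot(s, s) / (m * np.dot(s, y))
```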
Note that at the first outer iteration, the steplength is the input data α0 , while at
the successive outer iterations, the steplengths αk are adaptively chosen and used
within inner iterations. The following results are established for strongly convex
functions.
Theorem 4 (Tan et al. 2016, Th. 3.8). Suppose that Assumptions 1 and 2 hold and let x_∗ be the minimizer of f. Let θ satisfy 0 < θ < (1 − e^{−2μ/L})/2. If m is chosen such that

    m > \max\left\{ \frac{2}{\log(1 - 2\theta) + 2\mu/L}, \; \frac{4L^2}{\theta \mu^2} + \frac{L}{\mu} \right\},

then Algorithm SVRG-BB generates a sequence which converges linearly in expectation to x_∗.
ALGORITHM SAG
Step 0: Initialization. Choose an initial point x0 ∈ Rn , positive steplengths
{αk }, yi = 0, for i = 1, . . . , N. Set k = 0.
Step 1: Stochastic gradient update. Uniformly and randomly choose ik ∈
{1, . . . , N }. Set yik = ∇fik (xk ).
Step 2: Iterate update. Set x_{k+1} = x_k − (α_k/N) \sum_{i=1}^{N} y_i. Increment k by one
and go to Step 1.
SAG method uses a gradient estimation for ∇f (xk ) composed of the sum along
all terms in the gradient, in the spirit of FG, but the cost of each iteration is the same
as SGD. Remarkably, at the price of keeping track of an N × n matrix containing the gradient values computed through the iterations, SAG achieves almost the same convergence rate as FG. In fact, unlike SGD, convergence of SAG can be achieved taking a constant steplength α_k = 1/(16L), ∀k ≥ 0, and the optimality gap on average iterates achieves the same error bound O(1/k) as FG for convex functions and
linear convergence for strongly convex functions (Schmidt et al. 2017, Th. 1). If the
Lipschitz constant is not available, a strategy for its estimation is given in Schmidt
et al. (2017, §4.6). The following result concerns strongly convex problems.
Theorem 5 (Schmidt et al. 2017, Th. 1). Suppose that Assumptions 1 and 2 hold. Let x_∗ be the minimizer of f. If α = 1/(16L), then

    E[f(x_k)] - f(x_*) \le \left( 1 - \min\left\{ \frac{\mu}{16L}, \frac{1}{8N} \right\} \right)^{k} C_0,

where C_0 > 0 is a constant specified in Schmidt et al. (2017).
Note that for ill-conditioned problems where N < (2L)/μ, N does not play
a role in the convergence rate, and the SAG algorithm has nearly the same
convergence rate as the FG method with a step size of 1/(16L), even though it uses
iterations which are N times cheaper. This indicates that in case of ill-conditioned
problems, the convergence rate is not affected by the use of out-of-date gradient
values. A SAG extension, called SAGA, has been also proposed in Defazio et al.
(2014). SAGA exploits SVRG-like unbiased approximations of the gradient and
combines ideas of SAG and SVRG algorithms; a fixed steplength is employed. The
interested reader can find additional details about SAGA in Defazio et al. (2014).
The SARAH method (Nguyen et al. 2017) is a further variant of SGD based on
accumulated stochastic information. Unlike SAGA, SARAH is based on the idea of
variance reduction and biased estimations of the gradient; the algorithm is sketched
below.
ALGORITHM SARAH
Step 0: Initialization. Choose an initial point x0 ∈ Rn , an inner loop size
m > 0, a steplength α > 0. Set k = 1.
Step 1: Outer iteration, full gradient evaluation.
Set x̃0 = xk−1 . Compute y0 = ∇f (x̃0 ). Set x̃1 = x̃0 − αy0 .
Step 2: Inner iterations.
For t = 1, . . . , m − 1
Uniformly and randomly choose it ∈ {1, . . . , N}.
Compute yt = ∇fit (x̃t ) − ∇fit (x̃t−1 ) + yt−1 .
Set x̃t+1 = x̃t − αyt .
Step 3: Outer iteration, iterate update. Set xk = x̃t for randomly chosen
t ∈ {0, . . . , m}. Increment k by one and go to Step 1.
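A sketch of Algorithm SARAH highlighting the recursive (biased) gradient estimate (again, grads[i](x) is assumed to return ∇f_i(x)):

```python
import numpy as np

def sarah(x0, grads, alpha, m, outer_iters, rng):
    """Sketch of Algorithm SARAH."""
    N = len(grads)
    full_grad = lambda x: np.mean([g(x) for g in grads], axis=0)
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        xt_prev = x.copy()
        y = full_grad(xt_prev)                         # Step 1: y_0 = full gradient
        xt = xt_prev - alpha * y
        iterates = [xt_prev.copy(), xt.copy()]
        for _ in range(1, m):                          # Step 2: inner iterations
            i = rng.integers(N)
            y = grads[i](xt) - grads[i](xt_prev) + y   # recursive estimate
            xt_prev, xt = xt, xt - alpha * y
            iterates.append(xt.copy())
        x = iterates[rng.integers(len(iterates))]      # Step 3: random iterate
    return x
```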
The convergence results presented in Nguyen et al. (2017) cover both the convex
and strongly convex cases, as well as address complexity analysis; the result for the
strongly convex case is given below.
Theorem 6 (Nguyen et al. 2017, Th. 4). Suppose that Assumptions 1 and 2 hold and that each function f_i, 1 ≤ i ≤ N, is convex. If α and m are such that

    \sigma = \frac{1}{\mu \alpha (m + 1)} + \frac{\alpha L}{2 - \alpha L} < 1,   (16)

then the sequence {∇f(x_k)} generated by Algorithm SARAH satisfies

    E[\|\nabla f(x_k)\|^2] \le \sigma^{k} \, \|\nabla f(x_0)\|^2.
We observe that condition (16) imposes the upper bound 1/L on the steplength α,
while the analogous condition (15) governing the convergence of SVRG imposes the
tighter bound α < 1/(4L); further, for any α and m, it holds σ < θ . An additional
advantage of SARAH is that if α is small enough, then the stochastic steps computed
converge linearly in the inner loop in expectation.
Theorem 7 (Nguyen et al. 2017, Th. 1b). Suppose that Assumption 1 holds and each function f_i, 1 ≤ i ≤ N, is μ-strongly convex with μ > 0. If α ≤ 2/(μ + L), then for any t ≥ 1

    E[\|y_t\|^2] \le \left( 1 - \frac{2\mu L \alpha}{\mu + L} \right) E[\|y_{t-1}\|^2] \le \left( 1 - \frac{2\mu L \alpha}{\mu + L} \right)^{t} E[\|\nabla f(\tilde{x}_0)\|^2].

Gradient Methods with Adaptive Steplength Selection Based on Globalization Strategies
Gradient methods discussed in the previous section employ stochastic (possibly and
occasionally full) gradient estimates and do not rely on any machinery from standard
globally convergent optimization procedures such as line search, trust-region, or
adaptive overestimation strategies. On the other hand, a few and recent papers
(Bellavia et al. 2019, 2020c; Blanchet et al. 2019; Cartis et al. 2018; Chen et al.
2018; Curtis et al. 2019; Paquette et al. 2020) rely on such strategies for selecting the
steplength and part of them mimic traditional step acceptance rules using stochastic
estimates of functions and gradients. The purpose of these methods is to partially
overcome the dependence of the steplengths on the Lipschitz constant of the
gradient, i.e., lack of natural scaling, which appears in the convergence results of
SGD and its variants given in section “Stochastic Gradient and Variance Reduction
Methods”; see Curtis et al. (2019, §1).
One relevant proposal in the field of stochastic trust-region methods is TRish
(Trust-Region-ish) algorithm (Curtis et al. 2019). TRish uses a stochastic gradient
estimate gk of ∇f (xk ) and a careful steplength selection which, to a certain extent,
mimics a trust-region strategy. The TRish algorithm is sketched below.
ALGORITHM TR I S H
Step 0: Initialization. Choose an initial point x0 ∈ Rn , positive steplengths
{αk }, positive {γ1,k } and {γ2,k } such that γ1,k > γ2,k , ∀k ≥ 0. Set k = 0.
Step 1: Step computation. Compute a gradient estimate gk ∈ Rn .
Step 2: Iterate update. Set

    x_{k+1} = x_k - \begin{cases} \gamma_{1,k}\, \alpha_k\, g_k & \text{if } \|g_k\| \in [0, \tfrac{1}{\gamma_{1,k}}), \\ \alpha_k\, g_k / \|g_k\| & \text{if } \|g_k\| \in [\tfrac{1}{\gamma_{1,k}}, \tfrac{1}{\gamma_{2,k}}], \\ \gamma_{2,k}\, \alpha_k\, g_k & \text{if } \|g_k\| \in (\tfrac{1}{\gamma_{2,k}}, \infty). \end{cases}   (18)

Increment k by one and go to Step 1.
If the norm of the stochastic gradient is below 1/γ1,k , then the steplength is γ1,k αk ,
while if the norm is larger than 1/γ2,k , then the steplength is γ2,k αk with γ2,k <
γ1,k . Note that the trust-region machinery is used for building the step, but unlike
standard trust-region strategies, it does not employ step acceptance conditions and
therefore it does not affect the choice of the steplengths {αk }. Examples in Curtis
et al. (2019, §2) show that a pure trust-region algorithm, taking steps from (18)
independently of the norm of the stochastic gradient, is not guaranteed to converge;
this would be the case if γ_{1,k} were very large and γ_{2,k} ≈ 0. Hence, the convergence theory
of TRish is based on an appropriate upper bound for γ1,k /γ2,k . The theoretical
results for TRish are similar to those of SGD since both methods take steps along
the stochastic gradient; on the other hand, SGD possesses no natural scaling, while
TRish exploits normalized steps whenever ‖g_k‖ ∈ [1/γ_{1,k}, 1/γ_{2,k}]. This feature can be interpreted as an adaptive choice of the steplength, which is α_k/‖g_k‖ instead of α_k
itself; it is expected to improve numerical performance upon traditional SGD, and
this is confirmed by the numerical results provided in Curtis et al. (2019, §2) and in
the subsequent section “Numerical Experiments”.
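The TRish step selection can be sketched as follows, directly encoding the three regimes described above (the treatment of the boundary cases is an assumption, since (18) is only summarized here):

```python
import numpy as np

def trish_step(x, g, alpha, gamma1, gamma2):
    """TRish-style update: the steplength depends on the norm of the stochastic
    gradient g, with a normalized step in the intermediate regime
    (gamma1 > gamma2 > 0)."""
    gnorm = np.linalg.norm(g)
    if gnorm < 1.0 / gamma1:
        return x - gamma1 * alpha * g
    elif gnorm <= 1.0 / gamma2:
        return x - alpha * g / gnorm          # normalized step
    else:
        return x - gamma2 * alpha * g
```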
We summarize some results from the convergence analysis presented in Curtis
et al. (2019). Let us assume that Assumption 1 holds, gk is an unbiased estimator
of ∇f (xk ) satisfying inequality (13) for any k ≥ 0, f is bounded below by f∗ =
infx∈Rn f (x) ∈ R, and the Polyak-Lojasiewicz condition holds at any x ∈ Rn with
μ > 0, i.e.,

    2\mu \big( f(x) - f_* \big) \le \| \nabla f(x) \|^2, \qquad \forall x \in \mathbb{R}^n.   (19)
where ω > 0 and ρ ∈ (0, 1). Assumption (20) on gradients can be satisfied if gk is
computed by subsampling with increasing sample size.
A further convergence result covers the cases of sublinearly diminishing
steplengths Curtis et al. (2019, Theorem 2) and resembles the corresponding
result for the SGD method. If the steplengths α_k are sublinearly diminishing, i.e.,
αk = β/(ν + k) for some positive β and ν properly chosen, γ1,k = γ1 > 0,
γ1 − γ2,k = ηαk , ∀k and some η ∈ (0, 1), then
    E[f(x_k)] - f_* \le \frac{\phi}{\nu + k},
for all k, with φ positive. We refer to Curtis et al. (2019) for more conver-
gence results, including the case where the Polyak-Lojasiewicz condition is not
satisfied.
Other approaches exploit globalization procedures more closely than TRish,
with the aim of computing the steplength adaptively and testing, at each iteration,
some verifiable criterion on progress toward optimality. To establish such control,
they need stochastic estimates of functions, in addition to gradient estimates
required in all the approaches described so far, and impose dynamic accuracy
in stochastic function and gradient approximations. The general scheme for such
procedures is given below. We will say that iteration k is successful whenever the
acceptance criterion tested in Step 2 is fulfilled, unsuccessful otherwise. Acceptance
criteria employed in literature will be presented in the sections “Stochastic Line
Search” and “Adaptive Regularization and Trust-Region”.
ALGORITHM LSANDTR
Step 0: Initialization. Choose an initial point x0 ∈ Rn , α0 > 0, parameters
governing the steplength selection, and the accuracy requirement in gradient
and function. Set k = 0.
Step 1: Step computation. Compute a gradient estimate gk ∈ Rn and form a
step sk = −αk gk .
Step 2: Step acceptance. Compute estimates fk0 and fks of f (xk ) and
f (xk + sk ) and test for acceptance of xk + sk . If the iteration is successful,
set xk+1 = xk + sk ; otherwise, set xk+1 = xk .
Step 3: Parameters’ update. Compute αk+1 and update parameters govern-
ing the accuracy requirements in the computation of functions and gradients.
Increment k by one and go to Step 1.
The above scheme includes the stochastic line search method proposed in
Paquette et al. (2020), the stochastic trust-region method proposed in Blanchet et al.
(2019) and Chen et al. (2018), and the adaptive overestimation method proposed
in Bellavia et al. (2019). Accuracy in function and gradient approximations is
controlled acknowledging that f has a central role since it is the quantity we
ultimately wish to decrease. Specifically, it is assumed that fk0 , fks , and gk are
sufficiently accurate in probability, conditioned on the past, and an adaptive absolute
accuracy for the objective function and an adaptive relative accuracy for the gradient
are imposed. These requirements are supposed to be satisfied probabilistically.
The method given in Cartis et al. (2018) belongs to the previous framework but
uses the exact function in Step 2. Thus, it only imposes adaptive relative accuracy
on the gradient.
Accuracy Requirements
As for problem (2), the computation of fk0 , fks and gk can be performed by
averaging functions fi and gradients ∇fi in uniformly and randomly selected
subsamples of the set {1, . . . , N }. In order to satisfy (22) and (24) probabilistically,
the size of uniform sampling |Sk,f | and |Sk,g | can be bounded below via the
Bernstein inequality (Tropp 2015). In particular, in Bellavia et al. (2019, Theorem
6.2) it is shown that, given ε_g > 0, g_k is p_g-probabilistically sufficiently accurate if
the cardinality |Sk,g | of the set Sk,g in (4) satisfies
    |S_{k,g}| \ge \min\left\{ N, \; \left\lceil \left( \frac{2 V_g}{\varepsilon_g} + \frac{2\, \omega_g(x_k)}{3\, \varepsilon_g} \right) \log\!\left( \frac{n+1}{1 - p_g} \right) \right\rceil \right\},   (25)
Similarly, given ε_f > 0, the function estimate is p_f-probabilistically sufficiently accurate if the cardinality |S_{k,f}| of the set S_{k,f} in (3) satisfies

    |S_{k,f}| \ge \min\left\{ N, \; \left\lceil \left( \frac{2 V_f}{\varepsilon_f} + \frac{2\, \omega_f(x_k)}{3\, \varepsilon_f} \right) \log\!\left( \frac{2}{1 - p_f} \right) \right\rceil \right\},   (27)

Stochastic Line Search
A stochastic line search method, which falls into the general scheme LSandTR, is
given in Paquette et al. (2020). At iteration k, the computation of the step sk and
the stochastic line search are performed using a constant θ ∈ (0, 1) and a positive
parameter δ_k. Given α_k, a probability p_g ∈ (0, 1), a constant κ > 0, and letting ε_g = κ α_k ‖g_k‖, the gradient estimate g_k formed in Step 1 is supposed to be p_g-probabilistically sufficiently accurate, i.e., to satisfy (22) with ε_g = κ α_k ‖g_k‖.
With g_k at hand, the step s_k in Step 1 takes the form s_k = −α_k g_k, and in Step 2 the Armijo-type condition

    f_k^s \le f_k^0 - \theta\, \alpha_k \| g_k \|^2   (29)

is tested on the stochastic estimates.
Note that both accuracy requirements on functions and gradients are adaptive
and the function has to be approximated with higher accuracy than the gradient.
Moreover, observe that the variance condition depends on the parameter δk , the
steplength αk , and the norm of the true gradient.
The kth iteration is successful if (29) is met, unsuccessful otherwise. Whenever
the iteration is successful, parameters are updated in Step 3 as

    \alpha_{k+1} = \min\{ \gamma\, \alpha_k, \, \alpha_{\max} \}, \qquad \delta_{k+1}^2 = \begin{cases} \gamma\, \delta_k^2 & \text{if } \alpha_k \|g_k\|^2 \ge \delta_k^2, \\ \gamma^{-1} \delta_k^2 & \text{otherwise}, \end{cases}

for some fixed γ > 1 and α_max > 0. On the other hand, when the iteration is
unsuccessful, Step 3 consists in updating
    \alpha_{k+1} = \gamma^{-1} \alpha_k, \qquad \delta_{k+1}^2 = \gamma^{-1} \delta_k^2.
The rules for choosing αk and δk either enlarge or reduce accuracy in stochastic
estimates based on fulfillment of the decrease condition (29) and the magnitude
of the expected improvement of fks over fk0 . In fact, the parameter αk affects the
accuracy of gradient and function estimates and is enlarged when the iteration
is successful, diminished otherwise. On the other hand, the parameter δk affects
the variance of function estimates and is intended to guess how much the true
function decreases. In fact, the decrease obtained in (29) does not guarantee a
similar reduction in the true function as well. Hence, δk2 is enlarged only in the
case where the iteration is successful and α_k ‖g_k‖² is not smaller than δ_k², that is, when the variance of the function values is not larger than the square of the decrease in the approximate function. Interestingly, α_k ‖g_k‖ may not diminish as ‖g_k‖ decreases, and consequently the accuracy requirements do not necessarily become more stringent along the iterations.
In Paquette et al. (2020) stochastic complexity results have been established
for convex, strongly convex, and general nonconvex, smooth problems; they
imply convergence results. In case of μ-strongly convex problems, under suitable
assumptions on the stochastic process, Paquette et al. (2020, Th. 4.18) shows that
there exist probabilities p_g, p_f sufficiently close to one and satisfying p_g p_f > 1/2, and a constant ν ∈ (0, 1), such that the expected number T of iterations needed to satisfy

    f(x_k) - f(x_*) \le \varepsilon

is such that

    E[T] \le O(1)\, \frac{p_g\, p_f}{2 p_g p_f - 1}\, \frac{(L \kappa \alpha_{\max})^3}{\mu} \big( \log(\Phi_0) + \log(\varepsilon^{-1}) \big).
The practical implementation of the above stochastic line search method encounters the problem that ε_g = κ α_k ‖g_k‖ depends on the norm of the vector g_k, which has to be computed.
Following Cartis et al. (2018), the computation of the approximated gradient gk by
subsampling can be performed via an inner iterative process. The approximated
gradient gk is computed via (25) or (26) using a predicted sample size. Then,
if the predicted accuracy is larger than the required accuracy, the sample size is
progressively increased until the accuracy requirement is satisfied.
Adaptive Regularization and Trust-Region

At each iteration, the step is obtained by minimizing a regularized first-order model of the objective,

    \min_{s \in \mathbb{R}^n} m_k(s) = f_k^0 + g_k^T s + \frac{\sigma_k}{2} \| s \|^2,
with f_k^0 being an approximation to f(x_k) and σ_k > 0 a regularization parameter. Trivially, the step takes the form s_k = −(1/σ_k) g_k, i.e., α_k = 1/σ_k in Step 1 of the general scheme LSandTR.
Acceptance of the step is tested using the rules employed in trust-region and reg-
ularization methods, but different from the standard approaches, here the function
values and the gradient involved are approximated. Using function estimates fk0 and
fks for f (xk ) and f (xk + sk ), the test for acceptance is
    \rho_k = \frac{f_k^0 - f_k^s}{f_k^0 - \big( f_k^0 + g_k^T s_k \big)} \ge \eta_1, \qquad \eta_1 \in (0, 1).   (30)
Values f_k^0 and f_k^s are supposed to be p_f-probabilistically sufficiently accurate with ε_f = ω_k |g_k^T s_k|.
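A sketch of one tentative iteration of the adaptively regularized scheme, combining the step s_k = −g_k/σ_k with the acceptance test (30) (the function handle fs_fun and the value of eta1 are illustrative assumptions):

```python
import numpy as np

def ar_iteration(x, f0, fs_fun, g, sigma, eta1=0.1):
    """Tentative step s_k = -g_k / sigma_k and acceptance ratio (30)."""
    s = -g / sigma                       # minimizer of the regularized model
    fs = fs_fun(x + s)                   # approximate value of f at x + s
    predicted = -np.dot(g, s)            # f_k^0 - (f_k^0 + g_k^T s_k) > 0
    rho = (f0 - fs) / predicted
    if rho >= eta1:                      # successful iteration
        return x + s, True
    return x, False                      # unsuccessful iteration
```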
In case of a successful iteration, Step 3 sets σ_{k+1} = max{σ_min, γ^{-1} σ_k} and

    \omega_{k+1} = \min\left\{ \kappa_\omega, \frac{1}{\sigma_{k+1}} \right\},
for some fixed γ > 1, σmin > 0, and κω ∈ (0, 1/(2η1 )). Specifically, in case of
successful iterations, the regularization parameter is decreased, and the parameter
that rules the accuracy requirements is increased. On the other hand, in case
of unsuccessful iterations, σk is increased and tighter accuracy requirements are
imposed on function and gradient approximations.
In Bellavia et al. (2019), a complexity analysis in high probability for AR1DA is carried out. Assume for the sake of simplicity p_g = p_f and let p̄ ∈ (0, 1) be a prescribed probability for meeting the approximate first-order optimality condition

    \| \nabla f(x_k) \| \le \varepsilon,   (31)

with ε > 0. In Bellavia et al. (2019, Th. 7.1), it is shown that if 1 − p_g = O\big( (1 - \bar{p})\, \varepsilon^{2}/3 \big), then AR1DA needs at most O(\varepsilon^{-2}) iterations and approximate evaluations of the objective function to satisfy (31) with probability at least p̄.
From a practical point of view, the approximated gradient g_k is computed via (25) or (26) using a predicted accuracy requirement, say ε_p. Then, with g_k at hand, if ε_p > ω_k ‖g_k‖, then ε_p is progressively decreased and g_k recomputed until ε_p ≤ ω_k ‖g_k‖ or ε_p < ε. We finally mention that the algorithm is stopped whenever the condition

    \| g_k \| \le \frac{\varepsilon}{1 + \omega_k}

is satisfied.
Numerical Experiments
Fig. 5 Some random images from each class of cifar-10 dataset (Image taken from https://www.
cs.toronto.edu/~kriz/cifar.html)
The cifar-10 dataset is split into a training set (5/6 of the images) and a testing set (1/6 of the images). The
images are classified into ten homogeneously distributed classes: airplanes, cars,
birds, cats, deer, dogs, frogs, horses, ships, and trucks. In Fig. 5, we show some
images from the dataset. The color model of cifar-10 images is RGB, i.e., each
pixel of an image is represented by three numbers (typically integers) which vary
between 0 and 255 and represent the intensity of each channel; hence, the image can
be viewed as a 32 × 32 × 3 matrix. It is common to normalize the intensity of each
channel between 0 and 1.
The training set is constituted by N = 50000 data {d_i, ŷ_i}_{i=1,...,N}, where d_i ∈ R^{3072} is the vector containing the ith image stacked by columns and ŷ_i ∈ R^{10} contains the value 1 for the actual category of the ith image and 0 for any other category.
Fig. 6 Architecture of the neural network used for cifar-10. Four convolutional layers mixed with
max pooling layers are followed by two dense layers
The spatial dimensions are changed too, according to section “Convolutional Layer”, and both become equal to 30. Summarizing, the output of the first layer has size 30 × 30 × 32 and
is received by the second layer which is again a convolutional layer with 64 filters,
a 3 × 3 kernel, and elu as the activation function. After the second layer, the tensor
shape becomes 28×28×64. The third layer is a max pooling layer (see section “Max
Pooling Layer”), which applies a 2 × 2 max filter on every channel; this halves the
dimension of every slice of the tensor. The fourth layer is a Dropout layer with rate
0.25 which does not alter the shape of the tensor but randomly selects 25% of the
values of the tensor and sets them to 0; this phase is commonly performed to avoid
overfitting. Next, we apply two times a convolutional layer with 128 filters and a
3 × 3 kernel followed by a max pooling. After such four layers, a further Dropout
layer with rate 0.25 is used; the resulting tensor shape is 2 × 2 × 128. At this stage,
the process for transforming the tensor into an array of probabilities is started. First,
a Flatten layer vectorizes the 2 × 2 × 128 tensor and returns a one-dimensional
array with 512 values. Second, a Dense layer with 1024 neurons is used; the input
array with 512 entries is transformed using the elu activation function. Third, a
Dropout layer with rate 0.5 is used, and, finally, a Dense layer with ten neurons
returns an array with 10 entries. Since the network output is expected to be a vector
vm = (vm,1 , . . . , vm,10 )T such that vm,j represents the probability of an input image
of being part of the j th category for j = 1, . . . , 10, in the last layer, we use the
softmax function defined as
    SM(z) = \frac{e^{z}}{\sum_{j=1}^{t} e^{z_j}},   (32)
where z ∈ R^t and the exponential is applied componentwise. This function combines all the outputs of the neurons within the very last layer and produces positive estimates that sum up to 1.
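A numerically stable sketch of (32) (subtracting max(z) before exponentiating does not change the result and avoids overflow):

```python
import numpy as np

def softmax(z):
    """Softmax (32): positive outputs summing to one."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())     # probabilities, sum equals 1
```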
Every layer of the network, except the last, can be viewed as a step forward
in generating information to be used for classification. The vector of dimension
1024 built at the penultimate layer is essentially a set of features which have been
extracted from the original image. To gain more insight into the outputs of intermediate layers, after training our network, we fed it with the image of the frog in Fig. 7 and analyzed the output of the four convolutional layers. These outputs are displayed in
analyzed the output of the four convolutional layers. These outputs are displayed in
Fig. 8; the channels are plotted side by side for a total of 16 channels per row. In
the first plot, we display the 32 channels of the tensor built at the first convolutional
layer; the shape of the frog is pretty recognizable in all channels. After the second
and the third layer, the image of the frog is no longer recognizable. Even if, after the fourth convolutional layer, the 4 × 4 pixels of each channel have no apparent connection with the original image, they still contain enough information. The
dimension of the input has been reduced, and the condensed information contained
in the array is used to generate the 1024 entries which provide the features needed
for the final classification. As we will see in the numerical results subsection, the information extracted by the network allows it, after training, to correctly classify new entries with satisfactory accuracy.
In the training phase, in order to measure the error made by the network on the prediction of each data point, we used the loss function (9), where E is the categorical cross-entropy function defined as

E(v_m(x; d_i), ŷ_i) = − \sum_{j=1}^{10} ŷ_{ij} \log\left( v_{m,j}(x; d_i) \right).
In the training phase, the weights of each layer of the network are updated via
the minimization of the loss function; any of the methods previously described can
be applied.
Fig. 8 Intermediate activation: output of intermediate convolutional layers. The network is fed with the image of a frog in Fig. 7. The color gradient we used for the intensity spans from yellow (lowest intensity) to blue (highest)
The training procedure consists of shuffling the training dataset and splitting it into mini-batches. The neural network is fed with each of these mini-batches in order to compute the approximate value of the gradient and to update the network weights using any of the methods described in the previous sections. Once the whole dataset has been used, the procedure is repeated. In machine learning terminology, the number of iterations needed for the neural network to process every entry of the dataset once is called an epoch of the training.
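A minimal sketch of this training loop (with our own variable names and a plain SGD update as a placeholder for any of the methods discussed above) could read:

import numpy as np

# One epoch shuffles the data, splits it into mini-batches of size 32 and performs
# one stochastic gradient update per mini-batch; grad_fn returns a subsampled
# gradient as in (4), evaluated on the given mini-batch.
def train(x, data, labels, grad_fn, alpha=1e-2, batch_size=32, epochs=25):
    n = data.shape[0]
    for _ in range(epochs):
        perm = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = perm[start:start + batch_size]
            g = grad_fn(x, data[batch], labels[batch])
            x = x - alpha * g   # plain SGD step; replace with SVRG or TRish updates
    return x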
Implementation Details
We implemented the neural network and the training routine using the Python
library Keras (https://keras.io/) and Tensorflow (https://www.tensorflow.org/) for
handling the backend on the GPU, an NVIDIA Quadro M1000M. Keras comes with a utility to get the cifar-10 dataset already split into training and test sets. We adapted one of the examples contained in the Keras library (https://www.tensorflow.org/tutorials/images/cnn) to develop the network architecture previously described.
The SGD optimizer, presented in section “Stochastic Gradient and Variance Reduction Methods”, is included in Keras. After fine-tuning, we ran it using the steplength α_k = 10^{-2} for all k ≥ 0. SVRG, presented in the same section, was run using an available implementation (https://github.com/idiap/importance-sampling); in that implementation, the SVRG gradient update rules are wrapped around the Keras framework. The full gradient in the outer iteration of SVRG was replaced by a stochastic gradient computed on a mini-batch of 1000 training samples; the outer iteration was scheduled to be performed 32 times per epoch. The steplength for the inner iteration was set to 10^{-2}. The TRish optimizer, presented in section “Gradient Methods with Adaptive Steplength Selection Based on Globalization Strategies”, has been implemented from scratch. After fine-tuning, the hyperparameters were set as follows: α_k = 10^{-1}, γ_{1,k} = 1, and γ_{2,k} = 10^{-3}, for all k ≥ 0.
All three methods have been implemented in a mini-batch manner as described at the end of the previous section. The batch size used for all training runs is 32, i.e., g_k was computed through (4) with |S_{k,g}| = 32. The methods under comparison do not use the objective function at all, so its approximation is not needed.
Results
SGD, SVRG, and TRish were run for 25 epochs. At the end of each epoch, the accuracy on both the training and the testing set was measured. The accuracy is defined as the percentage of samples for which the classifier assigned the highest probability to the actual class. In Fig. 9, we report the accuracy achieved by each method on both the training and the testing set during the training; the accuracy is evaluated at the end of each epoch.
Fig. 9 The trend of training and test accuracy during the epochs
The TRish method appears to be the most effective in classification. Our experience showed that in the large majority of TRish iterations, the normalized step arising from the minimization of the trust-region subproblem (18) is selected. We recall that the key difference among the gradient methods under investigation is that TRish can take normalized steps, and this can be viewed as an adaptive steplength selection, as the step taken is s_k = −(α_k/\|g_k\|) g_k instead of −α_k g_k. The adaptive approach used in TRish clearly improves classification on the testing set with respect to SGD and SVRG run with prefixed steplengths. In fact, after only two epochs, TRish is already more accurate than SGD and SVRG, and it reaches approximately 74% accuracy on the test set after 12 epochs.
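For completeness, we sketch the step selection that distinguishes TRish from SGD. The code below assumes the three-regime rule of Curtis et al. (2019) with the hyperparameters reported above; only the middle regime produces the normalized step s_k = −(α_k/‖g_k‖) g_k discussed in the text.

import numpy as np

# Schematic TRish-type step selection; g is the subsampled gradient g_k.
def trish_step(g, alpha=1e-1, gamma1=1.0, gamma2=1e-3):
    norm_g = np.linalg.norm(g)
    if norm_g < 1.0 / gamma1:
        return -gamma1 * alpha * g    # short gradients: scaled gradient step
    elif norm_g <= 1.0 / gamma2:
        return -alpha * g / norm_g    # normalized step, as in the text
    else:
        return -gamma2 * alpha * g    # very long gradients: damped gradient step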
Conclusion
Optimization methods play a key role in machine learning applications. In this work,
several subsampled first-order optimization methods suited for machine learning applications have been reviewed from both a theoretical and an algorithmic point of
view. Stochastic procedures for solving convex and nonconvex problems applicable
to neural networks and convolutional neural networks have been discussed, and
numerical experience on a convolutional neural network designed for classifying
images has been presented. Our presentation aims to show how the specific features
of the optimization problems arising in the training phase of neural networks give
rise to stochastic procedures which can address the numerical solution of convex
and nonconvex problems.
The presented procedures are recent and part of the state of the art in optimization for machine learning. The literature on this topic is immense and steadily increasing, and this presentation does not cover the full variety of existing first-order methods. We focused on methods with a well-assessed convergence analysis. However, we are aware of widely adopted methods which are less theoretically well founded than the procedures presented but are successful in machine learning. In this regard, we would like to mention SGD with momentum (Rumelhart et al. 1986; Loizou 2017) and ADAM (Kingma and Ba 2015; Sashank 2018). Both methods aim to speed up the convergence of the SGD method in the solution of ill-conditioned
problems where the surface in a neighborhood of local optima curves more steeply
in one direction than in another. In fact, in such cases a common drawback of
steepest descent methods is that iterates zigzag toward the solution (Nocedal et al.
1999; Sutton 1986). To avoid that, SGD with momentum makes use of a search
direction which is a combination of the current gradient approximation and the step
(first-order momentum of the stochastic gradient) used at the previous iteration.
ADAM method computes individual adaptive steplengths for updating the iterate
component-wise on the basis of the current first- and second-order momentum of
the stochastic gradient.
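As an illustration of the two update rules just described, a minimal sketch (with our own variable names, and with ADAM in its standard bias-corrected form) is the following:

import numpy as np

# g is the current stochastic gradient; state variables are carried between calls.
def sgd_momentum_step(x, g, velocity, alpha=1e-2, beta=0.9):
    velocity = beta * velocity - alpha * g   # combine previous step and current gradient
    return x + velocity, velocity

def adam_step(x, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first-order momentum
    v = beta2 * v + (1 - beta2) * g ** 2     # second-order momentum
    m_hat = m / (1 - beta1 ** t)             # bias corrections, t = iteration counter
    v_hat = v / (1 - beta2 ** t)
    return x - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v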
We conclude by underlining a current growing interest in second-order methods for
nonconvex finite-sum optimization problems; see, e.g., Aggarwal (2018), Bellavia
et al. (2020, 2021, 2019, 2020a,b), Berahas et al. (2020), Bollapragada et al. (2019),
Bottou et al. (2018), Byrd et al. (2016), Byrd et al. (2012), Erdogdu et al. (2015),
Liu et al. (2018), Roosta-Khorasani et al. (2019), Strang (2019), Xu et al. (2016,
2019).
Acknowledgments The financial support of INdAM-GNCS Projects 2019 and 2020 is gratefully acknowledged by the first and the fourth authors. Thanks are due to the referee, whose comments improved the presentation of this paper.
References
Aggarwal, C.C.: Neural Networks and Deep Learning. Springer (2018)
Andradottir, S.: A scaled stochastic approximation algorithm. Manag. Sci. 42, 475–498 (1996)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16, 1–3 (1966)
Babanezhad, R., Ahmed, M.O., Virani, A., Schmidt, M., Konečný, J., Sallinen S.: Stop wasting
my gradients: Practical SVRG. In: Proceedings of the 28th International Conference on Neural
Information Processing Systems, Vol. 2 pp. 2251–2259 (2015)
Barzilai, J., Borwein, J.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
Bellavia, S., Gurioli, G.: Complexity analysis of a Stochastic cubic regularisation method under
inexact gradient evaluations and dynamic Hessian accuracy, (2020). arXiv:2001.10827
Bellavia, S., Gurioli, G., Morini, B.: Adaptive cubic regularization methods with dynamic inexact
Hessian information and applications to finite-sum minimization. IMA J. Numer. Anal. 41, 764–
799 (2021). https://doi.org/10.1093/imanum/drz076
Bellavia, S., Gurioli, G., Morini, B., Toint, P.L.: Adaptive regularization algorithms with inexact
evaluations for nonconvex optimization. SIAM J. Optimiz. 29, 2881–2915 (2019)
Bellavia, S., Gurioli, G., Morini, B., Toint, P.L.: High-order Evaluation Complexity of a Stochastic
Adaptive Regularization Algorithm for Nonconvex Optimization Using Inexact Function
Evaluations and Randomly Perturbed Derivatives (2020a) arXiv:2005.04639
Bellavia, S., Krejić, N., Krklec Jerinkić, N.: Subsampled Inexact Newton methods for minimizing
large sums of convex function. IMA J. Numer. Anal. 40, 2309–2341 (2020b)
Bellavia, S., Krejić, N., Morini, B.: Inexact restoration with subsampled trust-region methods for
finite-sum minimization. Comput. Optim. Appl. 76, 701–736 (2020c)
Berahas, A.S., Bollapragada, R., Nocedal, J.: An investigation of Newton-sketch and subsampled
Newton methods. Optim. Method Softw. 35, 661–680 (2020)
Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J.
Optimiz. 10, 627–642 (2000)
Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer (2006)
Birgin, G.E., Krejić, N., Martínez, J.M.: On the employment of Inexact Restoration for the
minimization of functions whose evaluation is subject to programming errors. Math. Comput.
87, 1307–1326 (2018)
Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic
trust region method via submartingales. INFORMS J. Optim. 1, 92–119 (2019)
Bollapragada, R., Byrd, R., Nocedal, J.: Exact and inexact subsampled Newton methods for
optimization. IMA J. Numer. Anal. 39, 545–578 (2019)
Bottou, L., Curtis, F.C., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM
Rev. 60, 223–311 (2018)
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer Y.: A stochastic quasi-Newton method for large-scale
optimization. SIAM J. Optimiz. 26, 1008–1021 (2016)
Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for
machine learning. Math. Program. 134, 127–155 (2012)
Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization
methods based on probablistic models. Math. Program. 169, 337–375 (2018)
Chen, R., Menickelly, M., Scheinberg, K.: Stochastic optimization using a trust-region method and
random models. Math. Program. 169, 447–487 (2018)
Chollet, F.: Deep Learning with Python. Manning Publications Co. (2017)
Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region algorithm based on careful step
normalization. INFORMS J. Optimiz. 1, 200–220 (2019)
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support
for non-strongly convex composite objectives. Advances in Neural Information Processing
Systems 27 (NIPS 2014)
Delyon, B., Juditsky, A.: Accelerated stochastic approximation. SIAM J. Optimiz. 3, 868–881
(1993)
Erdogdu, M.A., Montanari, A.: Convergence rates of sub-sampled Newton methods, Advances in
Neural Information Processing Systems 28 (NIPS 2015)
Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Pearson (2002)
Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J.
Sci. Comput. 34, 1380–1405 (2012)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer New York
Inc. (2001)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduc-
tion. In: Proceedings of the 26th International Conference on Neural Information Processing
Systems 26 (NIPS 2013)
Lei, L., Jordan, M.I.: Less than a single pass: Stochastically controlled stochastic gradient method.
In: Proceedings of the Twentieth Conference on Artificial Intelligence and Statistics (AISTATS)
(2017)
Liu, L., Liu, X., Hsieh, C.-J., Tao, D.: Stochastic second-order methods for non-convex optimiza-
tion with inexact Hessian and gradient (2018) arXiv:1809.09853
Loizou, N., Richtárik, P.: Momentum and stochastic momentum for stochastic gradient, Newton,
proximal point and subspace descent methods (2017) arXiv:1712.09677
Kesten, H.: Accelerated stochastic approximation. Ann. Math. Statist. 29, 41–59 (1958)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann.
Math. Stat. 23, 462–466 (1952)
Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization, 3rd International Conference
on Learning Representations, ICLR 2015 (2015) arXiv: 1412.6980
Krejić, N., Lužanin, Z., Stojkovska, I.: A gradient method for unconstrained optimization in noisy
environment. App. Numer. Math. 70, 1–21 (2013)
Krejić, N., Lužanin, Z., Ovcin, Z., Stojkovska, I.: Descent direction method with line search for
unconstrained optimization in noisy environment. Optim. Method. Soft. 30, 1164–1184 (2015)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research.
Springer (1999)
Krejić, N., Martínez, J.M.: Inexact restoration approach for minimization with inexact evaluation
of the objective function. Math. Comput. 85, 1775–1791 (2016)
Krejić, N., Krklec, N.: Line search methods with variable sample size for unconstrained optimiza-
tion. J. Comput. Appl. Math. 245, 213–231 (2013)
Krejić, N., Krklec Jerinkić N.: Nonmonotone line search methods with variable sample size.
Numer. Algorithms 68, 711–739 (2015)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical Report, Univer-
sity of Toronto (2009)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to
stochastic programming. SIAM J. Optimiz. 19, 1574–1609 (2009)
Nesterov, Y.: Introductory lectures on convex programming, Volume I: Basic course. Lecture Notes
(1998)
Nocedal, J., Sartenaer, A., Zhu, C.: On the behavior of the gradient norm in the steepest descent
method. Comput. Optim. Appl. 22, 5–35 (2002)
Nguyen, L.M., Liu, J., Scheinberg, K., Takač, M., SARAH: A novel method for machine
learning problems using stochastic recursive gradient. In: Proceedings of the 34th International
Conference on Machine Learning (2017) pp. 2613–2621
Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis.
SIAM J. Optim. 30, 349–376 (2020)
Patterson, J., Gibson, A.: Deep Learning: A Practitioner’s Approach. O’Reilly Media, Inc (2017)
Pilanci, M., Wainwright, M.J.: Newton sketch: A near linear-time optimization algorithm with
linear-quadratic convergence. SIAM J. Optimiz. 27, 205–245 (2017)
Polak, E., Royset, J.O.: Efficient sample sizes in stochastic nonlinear programing. J. Comput. Appl.
Math. 217, 301–310 (2008)
Raj, A., Stich, S.U.: k-SVRG: Variance Reduction for Large Scale Optimization (2018)
arXiv:1805.00982
Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained
minimization problem. SIAM J. Optim. 7, 26–33 (1997)
Ross, S.: Simulation. Elsevier, 4th edn. (2006)
Roosta-Khorasani, F., Mahoney, M.W.: Sub-sampled Newton methods. Math. Progr. 174, 293–326
(2019)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Bregman Methods for Large-Scale Optimization with Applications in Imaging
M. Benning and E. S. Riis
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Bregman Proximal Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A Unified Framework for Implicit and Explicit Gradient Methods . . . . . . . . . . . . . . . . . . . 101
Bregman Proximal Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Bregman Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Linearized Bregman Iteration as Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Bregman Iterations as Iterative Regularization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Inverse Scale Space Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Accelerated Bregman Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Incremental and Stochastic Bregman Proximal Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Stochastic Mirror Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
The Sparse Kaczmarz Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Bregman Incremental Aggregated Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Bregman Coordinate Descent Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
The Bregman Itoh–Abe Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Equivalencies of Certain Bregman Coordinate Descent Methods . . . . . . . . . . . . . . . . . . . . 119
Saddle-Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Alternating Direction Method of Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Primal-Dual Hybrid Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Robust Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Student-t Regularized Image Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Conclusions and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
M. Benning ()
The School of Mathematical Sciences, Queen Mary University of London, London, UK
e-mail: [email protected]
E. S. Riis
The Department of Applied Mathematics and Theoretical Physics, Cambridge, UK
Abstract
Keywords
Introduction
Bregman methods have a long history in mathematical research areas such as opti-
mization, inverse and ill-posed problems, statistical learning theory, and machine
learning. In this review, we mainly focus on the areas of optimization and inverse
and ill-posed problems and the application of popular Bregman methods to poten-
tially large-scale problems. Following Lev Bregman’s seminal work in 1967
(Bregman 1967), it was not until the work of Censor and Lent (1981) that the use of Bregman methods was slowly but steadily popularized in the area of mathematical optimization, shortly followed by the advent of the mirror descent
algorithm (Nemirovsky and Yudin 1983). Bregman proximal methods, which we
discuss in greater detail in the following section, were first introduced by Censor and
Zenios in their seminal work in 1992 (Censor and Zenios 1992), shortly followed
by Teboulle (1992), Teboulle and Chen (1993), and Eckstein (1993). Bregman
methods have been extensively studied since, see, for example, Bauschke et al.
(2003) and references therein, and many notable extensions were developed, with
one of the most popular ones in the context of inverse and ill-posed problems
being the so-called Bregman iteration (Osher et al. 2005), which is based on a
generalized Bregman distance notion (Kiwiel 1997b). Bregman iterations have been
shown to possess favorable regularization properties over traditional linear iterative
regularization methods, especially in the context of imaging and image processing
applications, and therefore gained a lot of attention in those research fields. We
refer to Osher et al. (2005), Burger (2016), and Benning and Burger (2018) for an
overview on Bregman iterations.
The goal of this chapter is to provide a non-exhaustive overview of some recent developments in the adaptation of Bregman methods to handle potentially large-scale problems.
Bregman Proximal Methods
For a convex and continuously differentiable function R : R^n → R, the associated Bregman distance is defined as

D_R(x, y) := R(x) − R(y) − \langle \nabla R(y), x − y \rangle,

for all x, y ∈ R^n, see Bregman (1967) and Censor and Lent (1981). In the following example, we recall a few relevant Bregman distances.
Example 1. For R(x) = \frac{1}{2}\langle Qx, x \rangle with a symmetric, positive definite matrix Q ∈ R^{n×n}, the corresponding Bregman distance reads

D_R(x, y) = \frac{1}{2}\langle Q(x − y), x − y \rangle.

Special cases include the squared Euclidean distance if Q is the identity matrix and the squared Mahalanobis distance (cf. Mahalanobis 1936) if Q is the inverse of a covariance matrix. Another relevant choice is the negative Boltzmann–Shannon entropy R(x) = \sum_{j=1}^{n} (x_j \log(x_j) − x_j) (with the convention 0 \log(0) ≡ 0), whose Bregman distance is the generalized Kullback–Leibler divergence

D_R(x, y) = \sum_{j=1}^{n} \left( x_j \log(x_j / y_j) − x_j + y_j \right).
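A minimal sketch of how these two Bregman distances can be evaluated numerically (our notation; strictly positive entries are assumed in the entropy case) reads:

import numpy as np

def bregman_quadratic(x, y, Q):
    # D_R(x, y) = <Q(x - y), x - y>/2 for R(x) = <Qx, x>/2
    d = x - y
    return 0.5 * d @ (Q @ d)

def bregman_entropy(x, y):
    # generalized Kullback-Leibler divergence for the negative Boltzmann-Shannon entropy
    return np.sum(x * np.log(x / y) - x + y)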
F(x^k) − F(\hat{x}) \le \frac{D_R(\hat{x}, x^0) − D_R(\hat{x}, x^k)}{k},

for all k ∈ N.
which implies

\sum_{k=0}^{K−1} F(x^{k+1}) − K F(\hat{x}) \le D_R(\hat{x}, x^0) − D_R(\hat{x}, x^K).

Dividing by K and using the monotonic decrease of the sequence {F(x^k)}_{k∈N} then yields the assertion, which concludes the proof.
A Unified Framework for Implicit and Explicit Gradient Methods
Let us now turn our attention to implicit and explicit gradient methods and how they can both be formulated as special cases of (1).
Hence, we can construct Bregman methods that are either implicit or explicit w.r.t. ∇F. Whenever we use J as the notation of our function throughout this manuscript, we implicitly refer to J as defined in (6). Whenever we use R, we refer to a function R that is not of the form \frac{1}{\tau} R − F. Note that we rediscover the traditional gradient descent algorithm for the choice R(x) = \frac{1}{2}\|x\|^2 as a special case of the explicit formulation. Furthermore, note that the explicit formulation

x^{k+1} = \arg\min_{x \in R^n} \left\{ F(x^k) + \langle \nabla F(x^k), x − x^k \rangle + \frac{1}{\tau} D_R(x, x^k) \right\}   (7)
is also known as mirror descent (Ben-Tal et al. 2001; Beck and Teboulle 2003; Juditsky et al. 2011), the Bregman gradient method (Teboulle 2018), or recently also as NoLips (Bauschke et al. 2017). In order to guarantee convergence of (5), one usually has to guarantee convexity of J. In the explicit setting, this implies that τ and R have to be chosen to ensure convexity of \frac{1}{\tau} R − F, or equivalently that F is 1/τ-smooth if R is a quadratic function. The latter condition has basically been proposed in Bauschke et al. (2017) and further discussed in Benning et al. (2017a,b) and Bolte et al. (2018). It has also been shown that if the step size τ is chosen such that c R − F is convex for some constant c > 0, the estimate 0 < τ ≤ (1 + γ(R) − δ)/c is sufficient to guarantee convergence under mild assumptions that are outlined in detail in Bauschke et al. (2017). Here γ(R) denotes the symmetry coefficient defined as

\gamma(R) := \inf \left\{ D_R(x, y) / D_R(y, x) \;\middle|\; (x, y) \in (\operatorname{int} \operatorname{dom} R)^2, \; x \neq y \right\} \in [0, 1],
and δ is a constant that satisfies δ ∈ (0, 1 + γ (R)). In the following section, we want
to review the special case of Bregman gradient methods where F is the sum of two
functions.
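To make the explicit formulation (7) concrete, the following sketch contrasts the choice R(x) = ‖x‖²/2 (plain gradient descent) with the negative Boltzmann–Shannon entropy, which, for a problem constrained to the probability simplex, yields the well-known exponentiated gradient update; the simplex constraint and the renormalization are our illustrative assumptions.

import numpy as np

def gradient_descent_step(x, grad_F, tau):
    # mirror descent with R(x) = ||x||^2/2
    return x - tau * grad_F(x)

def entropy_mirror_descent_step(x, grad_F, tau):
    # mirror descent with the negative entropy: multiplicative update on the simplex
    y = x * np.exp(-tau * grad_F(x))
    return y / y.sum()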
Bregman Proximal Gradient Method
An interesting special case frequently considered in the literature is the case where F is the sum of two functions L and S, i.e., the Bregman method reads

x^{k+1} = \arg\min_{x \in R^n} \left\{ L(x) + S(x) + D_J(x, x^k) \right\}.   (8)

A typical example in this setting uses the difference of the negative Boltzmann–Shannon entropy as defined in Example 1 and the function L, i.e., J(x) := \frac{1}{\tau} \sum_{j=1}^{n} \left( x_j \log(x_j) − x_j \right) − L(x), with the convention 0 \log(0) ≡ 0, and the characteristic function

S(x) := \begin{cases} 0 & x \in \Sigma \\ +\infty & x \notin \Sigma \end{cases}

of a constraint set Σ ⊆ R^n,
Bregman Iteration
A very important generalization of (1), first proposed in Osher et al. (2005), allows
us to also use convex but nonsmooth functions J as defined in (6) instead of convex
and continuously differentiable functions J . Suppose we are given a proper, l.s.c.
and convex function J : Rn → R. Then its subdifferential, defined as
\partial J(y) := \left\{ p \in R^n \;\middle|\; J(x) − J(y) \ge \langle p, x − y \rangle, \; \forall x \in R^n \right\},

can be multivalued, and for a subgradient p ∈ ∂J(y) the corresponding generalized Bregman distance (Kiwiel 1997b) reads

D_J^p(x, y) = J(x) − J(y) − \langle p, x − y \rangle.

The Bregman iteration then reads

x^{k+1} = \arg\min_{x \in R^n} \left\{ F(x) + D_J^{p^k}(x, x^k) \right\}, \qquad p^{k+1} = p^k − \nabla F(x^{k+1}) \in \partial J(x^{k+1}),   (9)

for an initial value x^0 and a subgradient p^0 ∈ ∂J(x^0). With the particular choice J(x) = \frac{1}{2\tau}\|x\|^2 + \frac{1}{\tau} R(x) − F(x), the Bregman iteration (9) turns into the linearized Bregman iteration, which reads
x^{k+1} = \arg\min_{x \in R^n} \left\{ F(x^k) + \langle \nabla F(x^k), x − x^k \rangle + \frac{1}{2\tau}\|x − x^k\|^2 + \frac{1}{\tau} D_R^{q^k}(x, x^k) \right\}
        = (I + \partial R)^{-1}\left( x^k + q^k − \tau \nabla F(x^k) \right),   (10a)
q^{k+1} = q^k − \left( x^{k+1} − x^k + \tau \nabla F(x^k) \right),   (10b)
where (I + ∂R)^{-1} denotes the proximal mapping w.r.t. the function R, and q^k ∈ ∂R(x^k) is the subgradient of R at x^k that is iteratively defined via (10b) and some initial value q^0 ∈ ∂R(x^0). Suppose we assume that (x^k + q^k)/τ − ∇F(x^k) lies in the range of A^\top for some matrix A ∈ R^{m×n}, so that we can substitute τ A^\top b^k := x^k + q^k − τ ∇F(x^k). For the special case F(x) = \frac{1}{2}\|Ax − b^\delta\|^2, the iteration (10) can then be written as

x^{k+1} = (I + \partial R)^{-1}\left( \tau A^\top b^k \right), \qquad b^{k+1} = b^k − \left( A x^{k+1} − b^\delta \right),   (12)

with initial value b^0 = b^δ, given the assumption that the initial values of the original formulation were x^0 = 0 and q^0 = 0. Note that we can also write (12) as

b^{k+1} = b^k − \left( A (I + \partial R)^{-1}\left( \tau A^\top b^k \right) − b^\delta \right).   (13)
Iteration (13) can in turn be interpreted as a gradient descent (with unit step size) applied to the function

G_\tau(b) := \frac{\tau}{2}\|A^\top b\|^2 − \langle b, b^\delta \rangle − \frac{1}{\tau} \tilde{R}(\tau A^\top b),

where \tilde{R}(z) := \min_x \left\{ \frac{1}{2}\|x − z\|^2 + R(x) \right\} denotes the Moreau envelope of R.
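For the sparsity-promoting choice R(x) = λ‖x‖₁, the proximal mapping (I + ∂R)⁻¹ is the soft-thresholding operator, and the linearized Bregman iteration (10) for F(x) = ‖Ax − b^δ‖²/2 can be sketched as follows (our variable names):

import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(A, b_delta, lam, tau, num_iter=500):
    n = A.shape[1]
    x, q = np.zeros(n), np.zeros(n)
    for _ in range(num_iter):
        grad = A.T @ (A @ x - b_delta)                    # gradient of F at x^k
        x_new = soft_threshold(x + q - tau * grad, lam)   # update (10a)
        q = q - (x_new - x + tau * grad)                  # update (10b)
        x = x_new
    return x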
Bregman Iterations as Iterative Regularization Methods
Bregman iterations are not only useful for solving optimization problems but are
also extremely important in the context of solving inverse and ill-posed problems.
The reason for this is that Bregman iterations can be used as iterative regularization
methods. If we consider the deterministic linear inverse problem
Ax † = b† , (14)
for a given matrix A ∈ Rm×n , the aim of solving this inverse problem is to
approximate x † in (14), for given A and data bδ with b† − bδ ≤ δ. Here, δ is
a known, positive bound on the error of the measured data bδ and the data b† that
satisfies (14).
Suppose we consider a convex function F that depends on A and bδ , which we
will denote as Fbδ . It then can easily be shown that the iterates of (9) satisfy
D_J^{p^{k+1}}(x^\dagger, x^{k+1}) < D_J^{p^k}(x^\dagger, x^k),

for all indices k ≤ k^*(δ) that satisfy Morozov’s discrepancy principle (Morozov 1966), i.e.,

F_{b^\delta}(x^{k^*(\delta)}) \le \eta\delta < F_{b^\delta}(x^k),
for a parameter η ≥ 1, see Osher et al. (2005) and Burger et al. (2007). Note that
for η > 1 it can be guaranteed that k ∗ (δ) is finite. With the additional regularity
assumption that x^† satisfies the so-called range condition (Benning and Burger 2018, Definition 5.8), i.e.,

x^\dagger \in \arg\min_{x \in R^n} \left\{ F_g(x) + R(x) \right\},
one can further derive the error estimate

D_J^{p^k}(x^\dagger, x^k) \le \frac{\|w\|^2}{2k} + \delta \|w\| + \delta^2 k,

for the special case F_{b^\delta}(x) := \frac{1}{2}\|Ax − b^\delta\|^2, see Burger et al. (2007, Theorem 4.3). Here, w is defined as w := g − Ax^\dagger ∈ R^m, which satisfies the source condition A^* w ∈ ∂J(x^\dagger), cf. (Chavent and Kunisch 1997; Burger and Osher 2004). If k^*(δ) is of order 1/δ, we therefore observe
D_J^{p^{k^*(\delta)}}(x^\dagger, x^{k^*(\delta)}) = O(\delta);

hence, x^{k^*(δ)} converges to x^† in terms of the Bregman distance as δ converges to zero.
For more details on how to use Bregman iterations in the context of (linear)
inverse problems, we refer the reader to Osher et al. (2005), Resmerita and Scherzer
(2006), Schuster et al. (2012), Burger (2016), and Benning and Burger (2018). For
the remainder of this chapter, we want to discuss modifications of Bregman iterations and Bregman proximal methods that are suitable for large-scale optimization and inverse problems.
Inverse Scale Space Flows
In what follows, we describe the inverse scale space (ISS) flow, a system of differential equations which can be derived as the continuous-time limit of the Bregman iterations. For a Bregman function J : R^n → R and an objective function F : R^n → R, this flow is given by

\dot{p}(t) = −\nabla F(x(t)), \qquad p(t) \in \partial J(x(t)),   (15)

with initial conditions x(0) = x^0 and p(0) = p^0 ∈ ∂J(x^0). Well-posedness results were first established for solutions to the flow when J is the total variation seminorm (see Burger et al. 2006, 2007). These results were extended by
Frick and Scherzer (2007) to all convex, proper, lower semicontinuous functions J, while Burger et al. (2013) characterize the solution to the flow explicitly for the case J = \|\cdot\|_1. We note that while these studies do not assume strict convexity of J, strong convexity is ensured for F by the \|\cdot\|^2 term in F (restricted to the range of the linear operator A), so that the iterations (and flow) are still well-defined.
By supposing that J were twice continuously differentiable and μ-convex for
some μ > 0 (i.e., strongly convex with parameter μ, see Hiriart-Urruty and
Lemaréchal 1993), we can provide an additional interpretation of the ISS flow,
rewriting (15) as

\nabla^2 J(x(t)) \, \dot{x}(t) = −\nabla F(x(t)).   (16)
With this formulation, one can interpret the Hessian of J (x(t)) as a preconditioner
for the flow. Furthermore, by using the chain rule, we derive an energy dissipation
law for the system
\frac{d}{dt} F(x(t)) = \langle \dot{x}(t), \nabla F(x(t)) \rangle = −\langle \dot{x}(t), \nabla^2 J(x(t)) \dot{x}(t) \rangle \le −\mu \|\dot{x}(t)\|^2,
where the final inequality follows from μ-convexity of J . Furthermore, observe that
if J = F , (16) reduces to a continuous-time variant of Newton’s method. One may
tie this back to the variable metric proximal gradient methods, which were designed
to incorporate quasi-Newton preconditioning into proximal gradient methods.
In section “The Bregman Itoh–Abe Method,” we describe the Bregman Itoh–
Abe (BIA) method (Benning et al. 2020), an iterative system derived by applying
structure-preserving methods from numerical integration to the flow. Thus, the ISS flow provides an alternative to variational formulations for deriving Bregman schemes.
Accelerated Bregman Methods
Reducing the number of iterations is an important goal when designing an algorithm, and not only when dealing with large-scale problems. In Theorem 1 we have
seen that the Bregman proximal method (1) has a convergence rate of order 1/k.
In the wake of Nesterov (1983), many acceleration strategies have been developed
for first-order optimization methods that aim at minimizing convex functions. As
we focus on Bregman methods, we want to highlight the following adaptation of
Nesterov (1983), first developed in Huang et al. (2013) for quadratic functions F .
There, the authors consider the linearized Bregman iteration, i.e., (9) for the choice J(x) = \frac{1}{2\tau}\|x\|^2 + \frac{1}{\tau} R(x) − F(x), as shown in (10). We have seen that (10) can be formulated as the gradient descent (13) for the special case F(x) = \frac{1}{2}\|Ax − b^\delta\|^2. The authors in Huang et al. (2013) have applied the idea of Nesterov acceleration to
formulation (13), which reads

b^{k+1} = (1 + \beta_k) b^k − \beta_k b^{k−1} − \left( A (I + \partial R)^{-1}\left( \tau A^\top \big( (1 + \beta_k) b^k − \beta_k b^{k−1} \big) \right) − b^\delta \right),   (17)

where {β_k}_{k∈N} is a sequence of positive scalars. Applying τ A^\top to both sides of the equation and substituting τ A^\top b^k = x^k + q^k − τ A^\top (A x^k − b^\delta) then yields the equivalent formulation

x^{k+1} = \arg\min_{x \in R^n} \left\{ F(x) + (1 + \beta_k) D_J^{p^k}(x, x^k) − \beta_k D_J^{p^{k−1}}(x, x^{k−1}) \right\},   (18a)
p^{k+1} = (1 + \beta_k) p^k − \beta_k p^{k−1} − \nabla F(x^{k+1}),   (18b)
Remark 2. We want to emphasize that the equivalence between (17) and (18) does
not hold for arbitrary functions F as we have exploited the linearity of ∇F by
making use of ∇F ((1 + βk )x k − βk x k−1 ) = (1 + βk )∇F (x k ) − βk ∇F (x k−1 ).
methods that use previous gradient and Bregman proximal evaluations. However,
for more restrictive function classes, faster convergence rates can be achieved, as
has been shown in Hanzely et al. (2018) and Gutman and Peña (2018).
Acceleration strategies such as Nesterov acceleration have also been analyzed
in the context of iterative regularization strategies (e.g., (9) combined with early
stopping as described in section “Bregman Iterations as Iterative Regularization
Methods”), see, for instance, Matet et al. (2017), Neubauer (2017), Garrigos et al.
(2018), and Calatroni et al. (2019).
Incremental and Stochastic Bregman Proximal Methods
In many data science and machine learning applications, the objective function is a finite sum of the form

F(x) := \frac{1}{m} \sum_{i=1}^{m} f_i(x).   (21)

As in the previous sections, we also consider splittings of F of the form

F(x) = L(x) + S(x) = \frac{1}{m} \sum_{i=1}^{m} \ell_i(x) + \frac{1}{m} \sum_{i=1}^{m} s_i(x).   (22)

An incremental Bregman proximal method processes one index i(k) ∈ {1, . . . , m} per iteration; in analogy to (8), a linearized proximal gradient step is obtained for the choice of J_k(x) = \frac{1}{2\tau_k}\|x\|^2 − \ell_{i(k)}(x). If we further pick s_i ≡ 0 for all i, we obtain the classical incremental gradient descent (Widrow and Hoff 1960; Bertsekas et al. 2011a), i.e.,
x^k = x^{k−1} − \tau_k \nabla \ell_{i(k)}(x^{k−1}) = x^{k−1} − \tau_k \nabla f_{i(k)}(x^{k−1}),   (24)

as a special case.
In the following sections, we discuss extensions of stochastic gradient descent
(SGD) and Kaczmarz methods in the Bregman framework, before highlighting the
connection between single cycles of incremental Bregman proximal methods and
deep neural network architectures.
Stochastic Mirror Descent
Stochastic mirror descent (SMD) replaces the exact gradient in the mirror descent update with a stochastic gradient based on a randomly chosen index i(k), i.e.,

x^{k+1} = \arg\min_{x \in R^n} \left\{ \tau_k \langle \nabla f_{i(k)}(x^k), x \rangle + D_J^{p^k}(x, x^k) \right\}.   (25)
The Sparse Kaczmarz Method
The Kaczmarz method is a scheme for solving quadratic problems of the form \min_x \left\{ \langle x, Ax \rangle / 2 − \langle b, x \rangle \right\}. The method was originally introduced by Kaczmarz
(1937) and later by Gordon et al. (1970) under the name algebraic reconstruction
technique. In this section, we review the extension of Kaczmarz methods to sparse
Kaczmarz methods (Lorenz et al. 2014b) and their block variants. The motivation
for sparse Kaczmarz methods is to find sparse solutions to linear problems Ax = b
via the problem formulation
\min_{x \in R^n} \left\{ \frac{1}{2}\|x\|^2 + \lambda \|x\|_1 \;:\; Ax = b \right\}.   (26)
We first briefly review the original Kaczmarz method. For x^0 = 0, time steps τ_k > 0, and a sequence of indices (i(k))_{k∈N}, the (randomized) Kaczmarz method is given by

x^{k+1} = x^k − \tau_k \, a_{i(k)} \left( \langle a_{i(k)}, x^k \rangle − b_{i(k)} \right).   (27)

Here a_{i(k)} denotes the i(k)-th row vector of A. If i(k) comprises a subset of indices, then the block variant of the Kaczmarz method is given by
x^{k+1} = x^k − \tau_k \, a_{i(k)}^\dagger \left( a_{i(k)} x^k − b_{i(k)} \right),

where a_{i(k)} denotes the submatrix formed by the row vectors of A indexed by i(k), and a_{i(k)}^\dagger denotes the Moore–Penrose pseudo-inverse of a_{i(k)}. The iterates of the
randomized Kaczmarz methods converge linearly to a solution of Ax = b (Gower
and Richtárik 2015).
Lorenz et al. (2014b) proposed a sparse Kaczmarz method as follows. Given
starting points x 0 = z0 = 0, the updates are given by
zk+1 = zk − τk ai(k) (ai(k) x k − bi(k) ),
(29)
x k+1 = Sλ (zk+1 ).
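Assuming that S_λ denotes the soft-shrinkage (soft-thresholding) operator, a minimal sketch of the randomized sparse Kaczmarz iteration (29) reads (our variable names):

import numpy as np

def sparse_kaczmarz(A, b, lam, tau=1.0, num_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    z, x = np.zeros(n), np.zeros(n)
    for _ in range(num_iter):
        i = rng.integers(m)                                  # random row index i(k)
        a_i = A[i]
        z = z - tau * a_i * (a_i @ x - b[i])                 # update of z^{k+1}
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)    # x^{k+1} = S_lambda(z^{k+1})
    return x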
Deep Neural Networks
Consider quadratic functions of the form

\ell_k(x) := \frac{1}{2}\langle (I − M_k)x − 2 b_k, x \rangle,

for square matrices {M_k}_{k=1}^{l} and vectors {b_k}_{k=1}^{l} with M_k ∈ R^{n_k × n_k} and b_k ∈ R^{n_k}, which has the gradient

\nabla \ell_k(x) = \left( I − \frac{1}{2}\left( M_k + M_k^\top \right) \right) x − b_k.
If we further choose

s_k(x) := \chi_{\ge 0}(x) = \begin{cases} 0 & \forall j : x_j \ge 0 \\ +\infty & \exists j : x_j < 0 \end{cases}

for all k ∈ {1, . . . , l}, then we easily verify that for the choice J_k(x) = \|x\|^2/2 − \ell_k(x) the update

x^k = \max\left( 0, A_k x^{k−1} + b_k \right),   (30)

with A_k := \frac{1}{2}(M_k + M_k^\top), coincides with a layer of a feedforward neural network with componentwise ReLU activation.
Bregman Incremental Aggregated Gradient
Incremental aggregated gradient (IAG) methods (cf. Blatt et al. 2007) replace the single gradient in (24) with an aggregate of the m most recent component gradients, i.e.,

x^{k+1} = x^k − \frac{\tau_k}{m} \sum_{l=0}^{m−1} \nabla f_{i(k−l)}(x^{k−l}),   (32)
then it becomes evident from computing the optimality condition of (33a) that the
update (33b) is equivalent to (32) and hence (31) for k ≥ m. Note that we can
rewrite (33a) to
x^{k+1} = \arg\min_{x \in R^n} \left\{ F(x) − \frac{1}{m} \sum_{l=1}^{m−1} D_{f_{i(k−l)}}(x, x^{k−l}) + D_{J_k}(x, x^k) \right\},   (34)
for J_k(x) := \frac{1}{2\tau_k}\|x\|^2 − \frac{1}{m} f_{i(k)}(x). The notable difference to the conventional IAG method is that we can replace the Bregman distance D_{J_k}(x, x^k) in (34) with more generic Bregman distances. As in section “A Unified Framework for Implicit and Explicit Gradient Methods,” we can for example choose J_k(x) = \frac{1}{2\tau_k}\|x\|^2 + \frac{1}{\tau_k} R(x) − \frac{1}{m} f_{i(k)}(x) and therefore derive incremental Bregman iterations of the form
x^{k+1} = (I + \partial R)^{-1}\left( x^k + q^k − \frac{\tau_k}{m} g^k \right),
q^{k+1} = q^k − \left( x^{k+1} − x^k + \frac{\tau_k}{m} g^k \right),
g^{k+1} = g^k − \nabla f_{i(k+1)}(x^{k+1−m}) + \nabla f_{i(k+1)}(x^{k+1}),
where q^k ∈ ∂R(x^k) for all k. Hence, substituting y^k = x^k + q^k − \frac{\tau_k}{m} g^k yields the equivalent formulation
x^{k+1} = (I + \partial R)^{-1}\left( y^k \right),
Needless to say, many different IAG or SAG methods can be derived for different choices of {J_k}. Choosing J_k such that convergence of the above algorithms is guaranteed is a delicate issue and involves carefully chosen assumptions, cf. Zhang et al. (2017, Section 2.3). Establishing convergence guarantees for J_k as defined above with an arbitrary (proper, convex, and l.s.c.) function R is an open problem. Having considered incremental variants of Bregman proximal algorithms, we now want to review coordinate descent adaptations in the following section.
Bregman Coordinate Descent Methods
Coordinate descent adaptations of Bregman methods have recently received considerable attention; see, for example, Hua and Yamashita (2016), Corona et al. (2019a,b), Ahookhosh et al. (2019), Benning et al. (2020), and Gao et al. (2020). In the following, we want to give a brief overview of Bregman coordinate descent-type methods, with particular emphasis on an Itoh–Abe discrete gradient-based method and its equivalencies to other coordinate descent schemes.
The Bregman Itoh–Abe Method
The Bregman Itoh–Abe (BIA) method (Benning et al. 2020) is a particular form of coordinate descent, derived by applying the discrete gradient method to the ISS
flow (15). Discrete gradients are methods from geometric numerical integration for
solving differential equations while preserving geometric structures – for details on
geometric numerical integration, see, e.g., Hairer et al. (2006) and McLachlan and
Quispel (2001) – and have found several applications to optimization, e.g., Benning
et al. (2020), Grimm et al. (2017), Ehrhardt et al. (2018), Riis et al. (2018), and
Ringholm et al. (2018) due to their ability to preserve energy dissipation laws.
A discrete gradient is an approximation to a gradient that must satisfy the following two properties.
Definition (Discrete gradient). Let F be a continuously differentiable function. A discrete gradient is a continuous map ∇F : R^n × R^n → R^n such that, for all x, y ∈ R^n,

\langle \nabla F(x, y), y − x \rangle = F(y) − F(x) \qquad \text{and} \qquad \nabla F(x, x) = \nabla F(x).
Given a choice of ∇F, starting points x^0, p^0 ∈ ∂J(x^0), and time steps (τ_k)_{k∈N}, the Bregman discrete gradient scheme is defined as

p^{k+1} = p^k − \tau_k \, \nabla F(x^k, x^{k+1}), \qquad p^{k+1} \in \partial J(x^{k+1}).   (35)
Remark 3. When J (x) = x 2 /2, then the ISS flow reduces to the Euclidean
gradient flow, and we refer to the corresponding BIA method simply as the Itoh–
Abe (IA) method.
Proposition. Suppose J is μ-convex and that (x k+1 , pk+1 ) solves the update (35)
given (x k , pk ) and time step τk > 0. Then
F(x^{k+1}) − F(x^k) = −\frac{1}{\tau_k} D_J^{symm}(x^k, x^{k+1}) \le −\frac{\mu}{\tau_k} \|x^k − x^{k+1}\|^2,   (38)
where D_J^{symm}(x, y) is the symmetrized Bregman distance defined as

D_J^{symm}(x, y) := D_J^p(x, y) + D_J^q(y, x) = \langle p − q, y − x \rangle \qquad \text{for } p \in \partial J(y), \; q \in \partial J(x).

Proof. By the first property of discrete gradients and the update (35), we have

F(x^{k+1}) − F(x^k) = \langle \nabla F(x^k, x^{k+1}), x^{k+1} − x^k \rangle = −\frac{1}{\tau_k} \langle p^{k+1} − p^k, x^{k+1} − x^k \rangle.

The result then follows from monotonicity of convex functions, see, e.g., Hiriart-Urruty and Lemaréchal (1993, Theorem 6.1.2).
While there are various discrete gradients (see, e.g., McLachlan et al. 1999), the
Itoh–Abe discrete gradient (Itoh and Abe 1988) (also known as the coordinate incre-
ment discrete gradient) is of particular interest in optimization as it is derivative-free
and can be implemented for nonsmooth functions. It is defined as
\nabla F(x, y) = \begin{pmatrix} \dfrac{F(y_1, x_2, \ldots, x_n) − F(x)}{y_1 − x_1} \\ \dfrac{F(y_1, y_2, x_3, \ldots, x_n) − F(y_1, x_2, \ldots, x_n)}{y_2 − x_2} \\ \vdots \\ \dfrac{F(y) − F(y_1, \ldots, y_{n−1}, x_n)}{y_n − x_n} \end{pmatrix},   (39)
Applying the Itoh–Abe discrete gradient to (35) coordinate-wise yields the updates

p_i^{k+1} = p_i^k − \tau_{k,i} \, \frac{F(y^{k,i}) − F(y^{k,i−1})}{x_i^{k+1} − x_i^k}, \qquad p_i^{k+1} \in \partial J_i(y_i^{k,i}),
y^{k,i} = [x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k], \qquad i = 1, \ldots, n.   (40)
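To make the coordinate-wise structure of (39) and (40) concrete, a minimal sketch of the Itoh–Abe discrete gradient is given below (our variable names; for coinciding coordinates the exact definition uses the partial derivative, which we replace by zero for simplicity):

import numpy as np

def itoh_abe_discrete_gradient(F, x, y, eps=1e-12):
    # coordinate-wise difference quotients, cf. (39); F is a callable on R^n
    n = len(x)
    z = x.astype(float).copy()
    g = np.zeros(n)
    f_prev = F(z)
    for i in range(n):
        z[i] = y[i]                          # replace the i-th coordinate
        f_new = F(z)
        denom = y[i] - x[i]
        g[i] = (f_new - f_prev) / denom if abs(denom) > eps else 0.0
        f_prev = f_new
    return g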
Equivalencies of Certain Bregman Coordinate Descent Methods
In the following, we consider the Bregman functions

J(x) = \frac{1}{2}\|x\|^2, \qquad\qquad J(x) = \frac{1}{2}\|x\|^2 + \lambda \|x\|_1,

and the objective functions

F(x) = \frac{1}{2}\|Ax − b^\delta\|^2, \qquad F(x) = \frac{1}{2}\|Ax − b^\delta\|^2 + \gamma \|x\|_1.

For quadratic objective functions F and J(x) = \frac{1}{2}\|x\|^2, the Itoh–Abe discrete gradient method is closely related to the successive over-relaxation (SOR) method; compare the explicit coordinate descent scheme
y^{k,0} = x^k,
y^{k,i} = y^{k,i−1} − \tau_i \, [\nabla F(y^{k,i−1})]_i \, e_i, \qquad i = 1, \ldots, n,   (41)
x^{k+1} = y^{k,n},
where τ i > 0 is the time step and ei denotes the i th basis vector. As mentioned in
Wright (2015), the SOR method is also equivalent to the coordinate descent method
with F as above and the time steps scaled coordinate-wise by 1/Ai,i . Hence, in
this setting, the Itoh–Abe discrete gradient method is equivalent not only to SOR
methods but to explicit coordinate descent.
Furthermore, these equivalencies extend to discretizations of the inverse scale
space flow for certain quadratic objective functions and certain forms of Bregman
functions J . Consider a quadratic function F (x) = x, Ax/2 − b, x where A
is symmetric and positive definite, and denote by B the diagonal matrix for which
Ai,i = Bi,i for each i. Given a scaling parameter ω > 0 and the Bregman function
J(x) = \frac{1}{2\omega} \langle x, Bx \rangle + \lambda \|x\|_1,   (42)

the Itoh–Abe method yields a sparse SOR scheme as detailed in Benning et al. (2020). We may compare this to a Bregman linearized coordinate descent scheme
y^{k,0} = x^k, \qquad p^k \in \partial J(x^k),
z_i = \arg\min_{y} \left\{ [\nabla F(y^{k,i−1})]_i \cdot y + D_J^{p^k}(y^{k,i−1}, \, y^{k,i−1} + y e_i) \right\},
y^{k,i} = y^{k,i−1} + z_i e_i,
x^{k+1} = y^{k,n},
where J is given by (42) for some ω = ω_E ∈ (0, 2). One can verify that these schemes are equivalent if one sets ω_E = \frac{1}{1/\omega + 1/2}. We furthermore mention that
these equivalencies also hold if we were to consider (implicit) Bregman iterations
rather than linearized ones.
Remark 4. It is worth noting at this stage that while the Kaczmarz method (27) is
closely related to SOR (Oswald and Zhou 2015), this connection does not carry over
to the BIA method versus the sparse Kaczmarz method.
Saddle-Point Methods
Many problems in imaging (Chambolle and Pock 2016a) and machine learning (Goldstein et al. 2015; Adler and Öktem 2018) can be formulated as minimization problems of the form

\min_{x \in R^n, \, z \in R^m} \; G(x) + F(z) \quad \text{subject to} \quad K(x, z) = c,   (43)

where δ > 0 is a positive scalar. For the special case K(x, z) = Ax − z and c ≡ 0, one can replace F(Ax) with its convex conjugate F^* and formulate the alternative saddle-point problem

\min_{x \in R^n} \max_{y \in R^m} \left\{ G(x) + \langle Ax, y \rangle − F^*(y) \right\}.   (45)
We want to emphasize that extensions for nonconvex functions (Li and Pong 2015;
Moeller et al. 2015; Möllenhoff et al. 2015) and extensions for nonlinear operators
A (Valkonen 2014; Benning et al. 2015; Clason and Valkonen 2017) or nonlinear
replacements of the dual product (Clason et al. 2019) exist. In the following, we
review Bregman algorithms for the numerical computation of solutions of those
saddle-point formulations.
Alternating Direction Method of Multipliers
Bregman generalizations of the alternating direction method of multipliers (ADMM) (cf. Boyd et al. 2011) can be obtained by choosing Bregman functions J_x, J_z, and J_y for the primal and dual updates. Depending on the choices of J_x, J_z, and J_y, many other useful variants are possible, such as

x^{k+1} = (I + \tau_x \delta \, \partial G)^{-1}\left( x^k − \tau_x A^\top\left( A x^k + B z^k + \delta y^k − c \right) \right),
z^{k+1} = (I + \tau_z \delta \, \partial F)^{-1}\left( z^k − \tau_z B^\top\left( A x^{k+1} + B z^k + \delta y^k − c \right) \right),
y^{k+1} = y^k + \tau_y \left( A x^{k+1} + B z^{k+1} − c \right),
Primal-Dual Hybrid Gradient Method
A saddle point (x̂, ŷ) of (45) is characterized by the optimality system

\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial G(\hat{x}) + A^\top \hat{y} \\ \partial F^*(\hat{y}) − A \hat{x} \end{pmatrix}.   (47)

It is sensible and has indeed been suggested in Chambolle and Pock (2016b) and Hohage and Homann (2014) to solve this nonlinear inclusion problem with a fixed-point algorithm of the form

\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \partial G(x^{k+1}) + A^\top y^{k+1} \\ \partial F^*(y^{k+1}) − A x^{k+1} \end{pmatrix} + \partial J(x^{k+1}, y^{k+1}) − \partial J(x^k, y^k).   (48)
A popular choice is

J(x, y) := \frac{1}{2} \left\| \begin{pmatrix} x \\ y \end{pmatrix} \right\|_M^2 \qquad \text{with} \qquad \left\| \begin{pmatrix} x \\ y \end{pmatrix} \right\|_M := \sqrt{ \left\langle \begin{pmatrix} x \\ y \end{pmatrix}, M \begin{pmatrix} x \\ y \end{pmatrix} \right\rangle }

and

M := \begin{pmatrix} \frac{1}{\tau} I & −A^\top \\ −A & \frac{1}{\sigma} I \end{pmatrix},

which recovers the primal-dual hybrid gradient (PDHG) method (Chambolle and Pock 2011).
Adding the optimality system (47) to (48), for a saddle point (x̂, ŷ), and taking a dual product of

\begin{pmatrix} \partial G(x^{k+1}) − \partial G(\hat{x}) + A^\top (y^{k+1} − \hat{y}) \\ \partial F^*(y^{k+1}) − \partial F^*(\hat{y}) − A(x^{k+1} − \hat{x}) \end{pmatrix}

with (x^{k+1} − \hat{x}, y^{k+1} − \hat{y})^\top, the monotonicity of ∂G and ∂F^* yields an estimate in terms of the symmetric Bregman distances D_G^{symm} and D_{F^*}^{symm}. Here D_J^{symm}(x, y) denotes the symmetric Bregman distance D_J^{symm}(x, y) = D_J^q(x, y) + D_J^p(y, x) = \langle p − q, x − y \rangle, for subgradients p ∈ ∂J(x) and
q ∈ ∂J (y), which is also known as Jeffreys–Bregman divergence and closely
related to other symmetrizations such as Jensen–Bregman divergences (Nielsen and
Boltz 2011) and Burbea Rao distances (Burbea and Rao 1982a,b). As an immediate
consequence, we observe
0 \ge \left\langle \partial J(x^{k+1}, y^{k+1}) − \partial J(x^k, y^k), \begin{pmatrix} x^{k+1} − \hat{x} \\ y^{k+1} − \hat{y} \end{pmatrix} \right\rangle
= D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix} \right) − D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right) + D_J\!\left( \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right),
where we have made use of the three-point identity for Bregman distances (Chen
and Teboulle 1993). Thus, we can conclude
D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix} \right) + D_J\!\left( \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right) \le D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right)
for all iterates. Consequently, the iterates are bounded in the Bregman distance
setting with respect to J . Summing up the dual product of (48) with (x k+1 −
x̂, y k+1 − ŷ) therefore yields
\sum_{k=0}^{N} \left[ D_G^{symm}(x^{k+1}, \hat{x}) + D_{F^*}^{symm}(y^{k+1}, \hat{y}) \right] + \sum_{k=0}^{N} D_J\!\left( \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right)
= \sum_{k=0}^{N} \left[ D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^k \\ y^k \end{pmatrix} \right) − D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^{k+1} \\ y^{k+1} \end{pmatrix} \right) \right] \le D_J\!\left( \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}, \begin{pmatrix} x^0 \\ y^0 \end{pmatrix} \right) < +\infty.
Hence, we can conclude D_G^{symm}(x^N, \hat{x}) → 0, D_{F^*}^{symm}(y^N, \hat{y}) → 0, and D_J\big( (x^{N+1}, y^{N+1})^\top, (x^N, y^N)^\top \big) → 0 for N → ∞. If G and F^* are at least convex
and if J is strongly convex with respect to some norm, one can further guarantee
convergence of the corresponding iterates in norm to a saddle-point (x, y) solution
of (45) with standard arguments. For more details, analysis, and extensions of
PDHG methods, we refer the reader to Chambolle and Pock (2016a).
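For completeness, we sketch a basic PDHG iteration for the saddle-point problem (45); as an illustrative example we choose G(x) = λ‖x‖₁ and F(z) = ‖z − b‖²/2, for which both proximal mappings are explicit, and we recall that the step sizes should satisfy τσ‖A‖² ≤ 1 (variable names are ours):

import numpy as np

def pdhg(A, b, lam, tau, sigma, num_iter=500):
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    x_bar = x.copy()
    for _ in range(num_iter):
        # dual step: proximal mapping of sigma*F^* for F(z) = ||z - b||^2/2
        y = (y + sigma * (A @ x_bar) - sigma * b) / (1.0 + sigma)
        # primal step: proximal mapping of tau*G for G(x) = lam*||x||_1
        v = x - tau * (A.T @ y)
        x_new = np.sign(v) * np.maximum(np.abs(v) - tau * lam, 0.0)
        # extrapolation step
        x_bar = 2.0 * x_new - x
        x = x_new
    return x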
Applications
In the following we want to show applications for some of the Bregman algorithms
discussed in this review chapter. We want to emphasize that none of the applications
shown are really large-scale applications. The idea of this section is rather to
demonstrate that the algorithms are applicable to a wide range of different problems,
offering the potential to enhance actual large-scale problems. We focus on three
combinations of applications and algorithms: robust principal component analysis
via the accelerated linearized Bregman iteration, deep learning with an incremental
proximal Bregman architecture, and image denoising via the Bregman Itoh–Abe
method.
Robust Principal Component Analysis
Fig. 1 From left to right: the first image of the Yale B faces database, its approximation which is
the sum of a low-rank and a sparse matrix, the low-rank matrix, and the sparse matrix. (a) Original
(b) Approximation (c) Low-rank part (d) Sparse part
Fig. 2 This is an empirical validation of the different convergence rates of the linearized Bregman
iteration and its accelerated counterpart (with regular scaling of the iterations on the left-hand side
and a logarithmic scaling on the right-hand side)
shadow, from Benning et al. (2007). Figure 1 shows the first image of the Yale
B faces database, its approximation, and its decomposition into a low-rank and a
sparse part.
More important in terms of this review chapter is certainly the comparison between the linearized Bregman iteration and its accelerated counterpart. A log-scale plot of the decrease of the loss function \frac{1}{2}\|L + S − X\|_F^2, where \|\cdot\|_F denotes the Frobenius norm, over the course of the iterations of the two algorithms is visualized in Fig. 2. The plot is an empirical validation that (18) converges at rate O(1/k^2), as opposed to the O(1/k) rate of its non-accelerated counterpart.
Fig. 3 First row: the 1st, 50th, 100th, and 150th frame of the original video sequence from
Benning et al. (2007). Second row: the same frames of the computed low-rank part. Third row:
the same frames of the computed sparse part
In Fig. 3 we see the 1st, 50th, 100th, and 150th frame of the original Cornell
box video sequence from Benning et al. (2007), together with a low-rank approxi-
mation and a sparse component computed with the accelerated linearized Bregman
iteration.
Deep Learning
Ever since AlexNet entered the scene in 2012 (Krizhevsky et al. 2012), surpassing the then state-of-the-art image classification approaches in terms of accuracy, deep neural networks (DNNs) have been central to research in computer
vision and imaging. In this section, we merely want to support the analogy between
incremental Bregman proximal methods and DNNs as shown in section “Deep
Neural Networks” with a practical example, rather than engaging in a discussion
of when and why DNNs based on (30) should be used or what advantages or
shortcomings they possess compared to other neural network architectures. For a simple proof of concept, we consider layers of the form

x^k = \left( I + \partial \|\cdot\|_1 \right)^{-1}\left( A_k x^{k−1} + b_k \right) = S_1\left( A_k x^{k−1} + b_k \right),

where S_1 denotes the soft-thresholding operator with threshold one.
Denoting by Φ_Θ the resulting network with parameters Θ, we train an auto-encoder on s samples {x_i}_{i=1}^{s} of the MNIST dataset by computing

\hat{\Theta} = \arg\min_{\Theta} \frac{1}{2s} \sum_{i=1}^{s} \left\| \Phi_\Theta(x_i) − x_i \right\|^2.
Fig. 4 Top row: random samples from the MNIST dataset. Bottom row: the corresponding
approximations with the trained auto-encoder
Fig. 5 Top row: random samples from the MNIST dataset. Bottom row: the corresponding
approximations with the first 49 singular vectors
Student-t Regularized Image Denoising
As a final example, we consider student-t regularized denoising of an image x^δ corrupted by impulse noise, comparing the BIA method with the regular Itoh–Abe discrete gradient method (i.e., J(x) = \|x\|^2/2). The application of the BIA method for this example was previously presented in Benning et al. (2020).
The objective function is given by
F : R^n \to R, \qquad F(x) := \sum_{i=1}^{N} \varphi_i \, \Phi(K_i x) + \| x − x^\delta \|_1.   (52)
Here {K_i}_{i=1}^{N} is a collection of linear filters, (\varphi_i)_{i=1}^{N} ⊂ [0, ∞) are coefficients, and the student-t regularizer Φ is defined as

\Phi(x) := \sum_{j=1}^{n} \psi(x_j), \qquad \psi(x) := \log(1 + x^2),
Fig. 6 Comparison of BIA and IA methods, for student-t regularized image denoising. First:
convergence rate for relative objective. Second: convergence rate for relative gradient norm. Third:
input data. Fourth: reconstruction
As impulse noise only affects a fraction of the pixels, we use the data fidelity term x ↦ \|x − x^\delta\|_1 to promote sparsity of x^* − x^\delta for x^* ∈ \arg\min F(x). As linear
filters, we consider the simple case of finite difference approximations to first-order
derivatives of x. We note that by applying a gradient flow to this regularization
function, we observe a similarity to Perona–Malik diffusion (Perona and Malik
1990).
For the BIA method, we consider the Bregman function

J(x) := \frac{1}{2}\|x\|^2 + \gamma \|x − x^\delta\|_1,
to account for the sparsity of the residual x ∗ − x δ and compare the method to the
regular Itoh–Abe discrete gradient method (abbreviated to IA).
We set the starting point x^0 = x^δ and the parameters to τ_k = 1 for all k, γ = 0.5, and φ_i = 2, i = 1, 2. For the impulse noise, we use a noise density of 10%. In the case where x_i^{k+1} is not set to x_i^δ, we use the scalar root solver scipy.optimize.brenth in Python; otherwise, the updates are in closed form.
See Fig. 6 for numerical results. By gradient norm, we mean dist(∂^C F(x^k), 0).
al. investigate accelerating mirror descent via the ODE interpretation of Nesterov’s
acceleration (Su et al. 2016).
Going from optimization to sampling, some recent papers consider methods
for sampling of distributions which incorporate elements of mirror descent in the
underlying dynamics. Hsieh et al. (2018) propose a framework for sampling from
constrained distributions, termed mirrored Langevin dynamics. In a similar vein,
Zhang et al. (2020) propose a Mirror Langevin Monte Carlo algorithm, to improve
the smoothness and convexity properties for the distribution.
Acknowledgments MB thanks Queen Mary University of London for their support. ESR
acknowledges support from the London Mathematical Society.
References
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Ahookhosh, M., Hien, L.T.K., Gillis, N., Patrinos, P.: Multi-block Bregman proximal alternating
linearized minimization and its application to sparse orthogonal nonnegative matrix factoriza-
tion. arXiv preprint arXiv:1908.01402 (2019)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numerica 28, 1–174 (2019)
Attouch, H., Buttazzo, G., Michaille, G.: Variational analysis in Sobolev and BV spaces:
applications to PDEs and optimization. SIAM (2014)
Azizan, N., Hassibi, B.: Stochastic gradient/mirror descent: Minimax optimality and implicit
regularization. arXiv preprint arXiv:1806.00952 (2018)
Bachmayr, M., Burger, M.: Iterative total variation schemes for nonlinear inverse problems. Inverse
Prob. 25(10), 105004 (2009)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity:
first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms.
SIAM J. Control. Optim. 42(2), 596–636 (2003)
Beck, A.: First-Order Methods in Optimization, Vol. 25. SIAM (2017)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex
optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J.
Optim. 23(4), 2037–2060 (2013)
Ben-Tal, A., Margalit, T., Nemirovski, A.: The ordered subsets mirror descent optimization method
with applications to tomography. SIAM J. Optim. 12(1), 79–108 (2001)
Benning, M., Betcke, M., Ehrhardt, M., Schönlieb, C.-B.: Gradient descent in a generalised
Bregman distance framework. In: Geometric Numerical Integration and its Applications,
Vol. 74, pp. 40–45. MI Lecture Notes series of Kyushu University (2017)
Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.-B.: Choose your path wisely: gradient
descent in a Bregman distance framework. SIAM Journal on Imaging Sciences (SIIMS). arXiv
preprint arXiv:1712.04045 (2017)
Benning, M., Burger, M.: Error estimates for general fidelities. Electron. Trans. Numer. Anal.
38(44–68), 77 (2011)
Benning, M., Burger, M.: Modern regularization methods for inverse problems. Acta Numerica 27,
1–111 (2018)
Benning, M., Knoll, F., Schönlieb, C.-B., Valkonen, T.: Preconditioned ADMM with nonlinear
operator constraint. In: IFIP Conference on System Modeling and Optimization, pp. 117–126.
Springer (2015)
Benning, M., Lee, E., Pao, H., Yacoubou-Djima, K., Wittman, T., Anderson, J.: Statistical filtering
of global illumination for computer graphics. IPAM Research in Industrial Projects for Students
(RIPS) Report (2007)
Benning, M., Riis, E.S., Schönlieb, C.-B.: Bregman Itoh–Abe methods for sparse optimisation. In
print: J. Math. Imaging Vision (2020)
Bertocchi, C., Chouzenoux, E., Corbineau, M.-C., Pesquet, J.-C., Prato, M.: Deep unfolding of a
proximal interior point method for image restoration. Inverse Prob. 36, 034005 (2019)
Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization:
A survey. Optim. Mach. Learn. 2010(1–38), 3 (2011)
Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math.
Program. 129(2), 163 (2011)
Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant
step size. SIAM J. Optim. 18(1), 29–51 (2007)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and
Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim.
28(3), 2131–2151 (2018)
Bonettini, S., Rebegoldi, S., Ruggiero, V.: Inertial variable metric techniques for the inexact
forward–backward algorithm. SIAM J. Sci. Comput. 40(5), A3180–A3210 (2018)
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: A family of variable metric
proximal methods. Math. Program. 68(1–3), 15–47 (1995)
Bouwmans, T., Javed, S., Zhang, H., Lin, Z., Otazo, R.: On the applications of robust PCA in image
and video processing. Proc. IEEE 106(8), 1427–1457 (2018)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1),
1–122 (2011)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its
application to the solution of problems in convex programming. USSR Comput. Math. Math.
Phys. 7(3), 200–217 (1967)
Brunton, S.L., Kutz, J.N.: Data-Driven Science and Engineering: Machine Learning, Dynamical
Systems, and Control. Cambridge University Press (2019)
Burbea, J., Rao, C.: On the convexity of higher order Jensen differences based on entropy functions
(corresp.). IEEE Trans. Inf. Theory 28(6), 961–963 (1982)
Burbea, J., Rao, C.: On the convexity of some divergence measures based on entropy functions.
IEEE Trans. Inf. Theory 28(3), 489–495 (1982)
Burger, M.: Bregman distances in inverse problems and partial differential equations. In:
Advances in Mathematical Modeling, Optimization and Optimal Control, pp. 3–33. Springer
(2016)
Burger, M., Frick, K., Osher, S., Scherzer, O.: Inverse total variation flow. Multiscale Model. Simul.
6(2), 366–395 (2007)
Burger, M., Gilboa, G., Moeller, M., Eckardt, L., Cremers, D.: Spectral decompositions using one-
homogeneous functionals. SIAM J. Imag. Sci. 9(3), 1374–1408 (2016)
Burger, M., Gilboa, G., Osher, S., Xu, J.: Nonlinear inverse scale space methods. Commun. Math.
Sci. 4(1), 179–212 (2006)
Burger, M., Moeller, M., Benning, M., Osher, S.: An adaptive inverse scale space method for
compressed sensing. Math. Comput. 82(281), 269–299 (2013)
Burger, M., Osher, S.: Convergence rates of convex variational regularization. Inverse Prob. 20(5),
1411 (2004)
Burger, M., Resmerita, E., He, L.: Error estimation for Bregman iterations and inverse scale space
methods in image restoration. Computing 81(2–3), 109–135 (2007)
Cai, J.-F., Osher, S., Shen, Z.: Convergence of the linearized Bregman iteration for 1 -norm
minimization. Math. Comput. 78(268), 2127–2136 (2009)
Cai, J.-F., Osher, S., Shen, Z.: Linearized Bregman iterations for compressed sensing. Math.
Comput. 78(267), 1515–1536 (2009)
Cai, J.-F., Osher, S., Shen, Z.: Linearized Bregman iterations for frame-based image deblurring.
SIAM J. Imag. Sci. 2(1), 226–252 (2009)
134 M. Benning and E. S. Riis
Calatroni, L., Garrigos, G., Rosasco, L., Villa, S.: Accelerated iterative regularization via dual
diagonal descent. arXiv preprint arXiv:1912.12153 (2019)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11
(2011)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim.
Theory Appl. 34(3), 321–353 (1981)
Censor, Y., Stavros Zenios, A.: Proximal minimization algorithm with d-functions. J. Optim.
Theory Appl. 73(3), 451–464 (1992)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrink-
age/thresholding algorithm. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Carola-Schonlieb, B.: Stochastic primal-dual hybrid
gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim. 28(4),
2783–2808 (2018)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numerica
25, 161–319 (2016)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm.
Math. Prog. 159(1–2), 253–287 (2016)
Chavent, G., Kunisch, K.: Regularization of linear least squares problems by total bounded
variation. ESAIM Control Optim. Calc. Var. 2, 359–376 (1997)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using
Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric forward–backward algorithm for
minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl.
162(1), 107–132 (2014)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Classics in Applied Mathematics, 1st edn.
SIAM, Philadelphia (1990)
Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order
primal-dual method for nonconvex problems. SIAM J. Optim. 29(1), 933–963 (2019)
Clason, C., Valkonen, T.: Primal-dual extragradient methods for nonlinear nonsmooth PDE-
constrained optimization. SIAM J. Optim. 27(3), 1314–1339 (2017)
Combettes, P.L., Pesquet, J.-C.: Deep neural network structures solving variational inequalities.
arXiv preprint arXiv:1808.07526 (2018)
Combettes, P.L., Vũ, B.C.: Variable metric forward–backward splitting with applications to
monotone inclusions in duality. Optimization 63(9), 1289–1318 (2014)
Corona, V., Benning, M., Ehrhardt, M.J., Gladden, L.F., Mair, R., Reci, A., Sederman, A.J.,
Reichelt, S., Schönlieb, C.-B.: Enhancing joint reconstruction and segmentation with non-
convex Bregman iteration. Inverse Prob. 35(5), 055001 (2019)
Corona, V., Benning, M., Gladden, L.F., Reci, A., Sederman, A.J., Schoenlieb, C.-B.: Joint phase
reconstruction and magnitude segmentation from velocity-encoded MRI data. arXiv preprint
arXiv:1908.05285 (2019)
Doan, T.T., Bose, S., Nguyen, D.H., Beck, C.L.: Convergence of the iterates in mirror descent
methods. IEEE Control Syst. Lett. 3(1), 114–119 (2018)
Dragomir, R.-A., Taylor, A., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of
Bregman first-order methods. arXiv preprint arXiv:1911.08510 (2019)
Duchi, J.C., Agarwal, A., Johansson, M., Jordan, M.I.: Ergodic mirror descent. SIAM J. Optim.
22(4), 1549–1578 (2012)
Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to
convex programming. Math. Oper. Res. 18(1), 202–226 (1993)
Ehrhardt, M.J., Riis, E.S., Ringholm, T., Schönlieb, C.-B.: A geometric integration approach
to smooth optimisation: Foundations of the discrete gradient method. ArXiv e-prints
(2018)
3 Bregman Methods for Large-Scale Optimization with Applications in Imaging 135
Esser, E., Zhang, X., Chan, T.F.: A general framework for a class of first order primal-dual
algorithms for convex optimization in imaging science. SIAM J. Imag. Sci. 3(4), 1015–1046
(2010)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–
łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900
(2015)
Frerix, T., Möllenhoff, T., Moeller, M., Cremers, D.: Proximal backpropagation. arXiv preprint
arXiv:1706.04638 (2017)
Frick, K., Scherzer, O.: Convex inverse scale spaces. In: International Conference on Scale Space
and Variational Methods in Computer Vision, pp. 313–325. Springer (2007)
Gabay, D.: Chapter ix applications of the method of multipliers to variational inequalities. In:
Studies in Mathematics and Its Applications, Vol. 15, pp. 299–331. Elsevier (1983)
Gao, T., Lu, S., Liu, J., Chu, C.: Randomized Bregman coordinate descent methods for non-
Lipschitz optimization. arXiv preprint arXiv:2001.05202 (2020)
Garrigos, G., Rosasco, L., Villa, S.: Iterative regularization via dual diagonal descent. J. Math.
Imaging Vision 60(2), 189–215 (2018)
Gilboa, G., Moeller, M., Burger, M.: Nonlinear spectral analysis via one-homogeneous functionals:
Overview and future prospects. J. Math. Imaging Vision 56(2), 300–319 (2016)
Goldstein, T., Li, M., Yuan, X.: Adaptive primal-dual splitting methods for statistical learning
and image processing. In: Advances in Neural Information Processing Systems, pp. 2089–2097
(2015)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
Gordon, R., Bender, R., Herman, G.T.: Algebraic reconstruction techniques (ART) for three-
dimensional electron microscopy and X-ray photography. J. Theor. Biol. 29(3), 471–481 (1970)
Gower, R.M., Richtárik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix
Anal. Appl. 36(4), 1660–1690 (2015)
Grimm, V., McLachlan, R.I., McLaren, D.I., Quispel, G.R.W., Schönlieb, C.-B.: Discrete gradient
methods for solving variational image regularisation models. J. Phys. A 50(29), 295201 (2017)
Gutman, D.H., Peña, J.F.: A unified framework for Bregman proximal methods: subgradient,
gradient, and accelerated gradient schemes. arXiv preprint arXiv:1812.10198 (2018)
Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving
Algorithms for Ordinary Differential Equations, Vol. 31, 2nd edn. Springer Science & Business
Media, Berlin (2006)
Hanzely, F., Richtarik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively
smooth convex optimization. arXiv preprint arXiv:1808.03045 (2018)
Hellinger, E.: Neue begründung der theorie quadratischer formen von unendlichvielen veränder-
lichen. Journal für die reine und angewandte Mathematik (Crelles Journal) 1909(136), 210–271
(1909)
Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex analysis and minimization algorithms I: Fundamen-
tals, volume 305 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles
of Mathemati- cal Sciences], 2nd edn. Springer, Berlin (1993)
Hohage, T., Homann, C.: A generalization of the chambolle-pock algorithm to Banach spaces with
applications to inverse problems. arXiv preprint arXiv:1412.0126 (2014)
Hsieh, Y.-P., Kavis, A., Rolland, P., Cevher, V.: Mirrored Langevin dynamics. In: Advances in
Neural Information Processing Systems, pp. 2878–2887 (2018)
Hua, X., Yamashita, N.: Block coordinate proximal gradient methods with variable Bregman
functions for nonsmooth separable optimization. Math. Program. 160(1–2), 1–32 (2016)
Huang, B., Ma, S., Goldfarb, D.: Accelerated linearized Bregman method. J. Sci. Comput. 54(2–3),
428–453 (2013)
Itakura, F.: Analysis synthesis telephony based on the maximum likelihood method. In: The 6th
International Congress on Acoustics, 1968, pp. 280–292 (1968)
Itoh, T., Abe, K.: Hamiltonian-conserving discrete canonical equations based on variational
difference quotients. J. Comput. Phys. 76(1), 85–102 (1988)
136 M. Benning and E. S. Riis
Juditsky, A., Nemirovski, A., et al.: First order methods for nonsmooth convex large-scale
optimization, I: General purpose methods. Optim. Mach. Learn. 121–148 (2011). https://doi.
org/10.7551/mitpress/8996.003.0007
Kaczmarz, M.S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin Interna-
tional de l’Académie Polonaise des Sciences et des Lettres. Classe des Sciences Mathématiques
et Naturelles. Série A, Sciences Mathématiques 35, 355–357 (1937)
Kiwiel, K.C.: Free-steering relaxation methods for problems with strictly convex costs and linear
constraints. Math. Oper. Res. 22(2), 326–349 (1997)
Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM
J. Control. Optim. 35(4), 1142–1168 (1997)
Kobler, E., Klatzer, T., Hammernik, K., Pock, T.: Variational networks: connecting variational
methods and deep learning. In: German Conference on Pattern Recognition, pp. 281–293.
Springer (2017)
Krichene, W., Bayen, A., Bartlett, P.L.: Accelerated mirror descent in continuous and discrete time.
In: Advances in Neural Information Processing Systems, pp. 2845–2853 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural
networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
LeCun, Y., Cortes, C., Burges, C.J.C.: The mnist database of handwritten digits (1998). http://yann.
lecun.com/exdb/mnist 10:34 (1998)
Lee, K.-C., Ho, J., Kriegman, D.J.: Acquiring linear subspaces for face recognition under variable
lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization.
SIAM J. Optim. 25(4), 2434–2460 (2015)
Li, H., Schwab, J., Antholzer, S., Haltmeier, M.: Nett: Solving inverse problems with deep neural
networks. Inverse Prob. 36, 065005 (2020)
Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J.
Numer. Anal. 16(6), 964–979 (1979)
Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility
problems: Analysis and generalizations. SIAM J. Imag. Sci. 7(2), 1237–1262 (2014)
Lorenz, D.A., Wenger, S., Schöpfer, F., Magnor, M.: A sparse Kaczmarz solver and a linearized
Bregman method for online compressed sensing. arXiv e-prints (2014)
Prasanta, P.C.: On the generalized distance in statistics. National Institute of Science of India
(1936)
Matet, S., Rosasco, L., Villa, S., Vu, B.L.: Don’t relax: Early stopping for convex regularization.
arXiv preprint arXiv:1707.05422 (2017)
McLachlan, R.I., Quispel, G.R.W.: Six lectures on the geometric integration of ODEs, pp. 155–210.
London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge
(2001)
McLachlan, R.I., Quispel, G.R.W., Robidoux, N.: Geometric integration using discrete gradients.
Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 357(1754), 1021–1045 (1999)
Miyatake, Y., Sogabe, T., Zhang, S.-L.: On the equivalence between SOR-type methods for linear
systems and the discrete gradient methods for gradient systems. J. Comput. Appl. Math. 342,
58–69 (2018)
Moeller, M., Benning, M., Schönlieb, C., Cremers, D.: Variational depth from focus reconstruction.
IEEE Trans. Image Process. 24(12), 5369–5378 (2015)
Möllenhoff, T., Strekalovskiy, E., Moeller, M., Cremers, D.: The primal-dual hybrid gradient
method for semiconvex splittings. SIAM J. Imag. Sci. 8(2), 827–857 (2015)
Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique
de France 93, 273–299 (1965)
Morozov, V.A.: Regularization of incorrectly posed problems and the choice of regularization
parameter. USSR Comput. Math. Math. Phys. 6(1), 242–251 (1966)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Pro-
ceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814
(2010)
3 Bregman Methods for Large-Scale Optimization with Applications in Imaging 137
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to
stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Nemirovsky, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization (1983)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of conver-
gence O(1/k 2 ). In: Doklady AN USSR, Vol. 269, pp. 543–547 (1983)
Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–
259 (2009)
Neubauer, A.: On Nesterov acceleration for Landweber iteration of linear ill-posed problems.
J. Inverse Ill-posed Prob. 25(3), 381–390 (2017)
Nielsen, F., Boltz, S.: The Burbea-Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory 57(8),
5455–5466 (2011)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: Inertial proximal algorithm for nonconvex
optimization. SIAM J. Imag. Sci. 7(2), 1388–1419 (2014)
Ochs, P., Ranftl, R., Brox, T., Pock, T.: Bilevel optimization with nonsmooth lower level problems.
In: International Conference on Scale Space and Variational Methods in Computer Vision, pp.
654–665. Springer (2015)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total
variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005)
Oswald, P., Zhou, W.: Convergence analysis for Kaczmarz-type methods in a Hilbert space
framework. Linear Algebra Appl. 478, 131–161 (2015)
Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In:
International Conference on Machine Learning, pp. 80–88 (2013)
Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends® Optim. 1(3), 127–239 (2014)
Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford-
Shah functional. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 1133–
1140. IEEE (2009)
Resmerita, E., Scherzer, O.: Error estimates for non-quadratic regularization and the relation to
enhancement. Inverse Prob. 22(3), 801 (2006)
Riis, E.S., Ehrhardt, M.J., Quispel, G.R.W., Schönlieb, C.-B.: A geometric integration approach
to nonsmooth, nonconvex optimisation. Foundations of Computational Mathematics (FOCM).
ArXiv e-prints (2018)
Ringholm, T., Lazić, J., Schönlieb, C.-B.: Variational image regularization with Euler’s elastica
using a discrete gradient scheme. SIAM J. Imag. Sci. 11(4), 2665–2691 (2018)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Scherzer, O., Groetsch, C.: Inverse scale space theory for inverse problems. In: International
Conference on Scale-Space Theories in Computer Vision, pp. 317–325. Springer (2001)
Marie Schmidt, F., Benning, M., Schönlieb, C.-B.: Inverse scale space decomposition. Inverse
Prob. 34(4), 179–212 (2018)
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient.
Math. Program. 162(1–2), 83–112 (2017)
Schöpfer, F., Lorenz, D.A.: Linear convergence of the randomized sparse Kaczmarz method. Math.
Program. 173(1), 509–536 (2019)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization methods in Banach
spaces, Vol. 10. Walter de Gruyter (2012)
Su, W., Boyd, S., Candes, E.J.: A differential equation for modeling Nesterov’s accelerated gradient
method: Theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)
Teboulle, M.: Entropic proximal mappings with applications to nonlinear programming. Math.
Oper. Res. 17(3), 670–690 (1992)
Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170(1),
67–96 (2018)
Teboulle, M., Chen, G.: Convergence analysis of a proximal-like minimization algorithm using
Bregman function. SIAM J. Optim. 3(3), 538–543 (1993)
138 M. Benning and E. S. Riis
Valkonen, T.: A primal–dual hybrid gradient method for nonlinear operators with applications to
MRI. Inverse Prob. 30(5), 055012 (2014)
Wang, H., Banerjee, A.: Bregman alternating direction method of multipliers. In: Advances in
Neural Information Processing Systems, pp. 2816–2824 (2014)
Widrow, B., Hoff, M.E.: Adaptive switching circuits. Technical report, Stanford Univ Ca Stanford
Electronics Labs (1960)
Wright, S.J.: Coordinate descent algorithms. Math. Program. 1(151), 3–34 (2015)
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization.
J. Mach. Learn. Res. 11, 2543–2596 (2010)
Yin, W.: Analysis and generalizations of the linearized Bregman method. SIAM J. Imag. Sci. 3(4),
856–877 (2010)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \ell_1-minimization
with applications to compressed sensing. SIAM J. Imag. Sci. 1(1), 143–168 (2008)
Yosida, K.: Functional Analysis. Springer (1964)
Young, D.M.: Iterative Solution of Large Linear Systems. Computer Science and Applied
Mathematics, 1st edn. Academic Press, Inc., Orlando (1971)
Zhang, H., Dai, Y.-H., Guo, L., Peng, W.: Proximal-like incremental aggregated gradient
method with linear convergence under Bregman distance growth conditions. arXiv preprint
arXiv:1711.01136 (2017)
Zhang, K.S., Peyré, G., Fadili, J., Pereyra, M.: Wasserstein control of mirror Langevin Monte
Carlo. arXiv e-prints (2020)
Zhang, X., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman
iteration. J. Sci. Comput. 46(1), 20–46 (2011)
Zhou, Z., Mertikopoulos, P., Bambos, N., Boyd, S., Glynn, P.W.: Stochastic mirror descent in
variationally coherent optimization problems. In: Advances in Neural Information Processing
Systems, pp. 7040–7049 (2017)
Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image
restoration. UCLA CAM Report 34 (2008)
4 Fast Iterative Algorithms for Blind Phase Retrieval: A Survey
Huibin Chang, Li Yang, and Stefano Marchesini
Contents
Introduction
Mathematical Formula and Nonlinear Optimization Model for BPR
  Mathematical Formula
  Optimization Problems and Proximal Mapping
Fast Iterative Algorithms
  Alternating Projection (AP) Algorithms
  ePIE-Type Algorithms
  Proximal Algorithms
  ADMM
  Convex Programming
  Second-Order Algorithm Using Hessian
  Subspace Method
Discussions
  Experimental Issues
  Theoretical Analysis
  Further Discussions
Conclusions
References
Abstract
This survey first presents the mathematical formula of BPR and the related nonlinear optimization problems, and then gives a brief review of recent iterative algorithms. It mainly covers three types of algorithms: operator-splitting-based first-order optimization methods, second-order algorithms using the Hessian, and subspace methods. Future research directions regarding experimental issues and theoretical analysis are further discussed.
Introduction
Phase retrieval (PR) plays a key role in nanoscale imaging techniques (Pfeiffer 2018; Elser et al. 2018; Zheng et al. 2021; Gürsoy et al. 2022) and ultrafast laser science (Trebino et al. 1997). Retrieving images of the sample from phaseless data is a long-standing problem. Generally speaking, designing fast and reliable algorithms is challenging, since directly solving the quadratic polynomial systems of PR is NP-hard and the involved optimization problem is nonconvex and possibly nonsmooth. Thus, it has drawn the attention of researchers for several decades (Luke 2005; Shechtman et al. 2015; Grohs et al. 2020; Fannjiang and Strohmer 2020). Among general PR problems, besides the recovery of the sample, it is also of great importance to reconstruct the probe. The motivation for blind recovery is twofold: (1) characterization of the probe (wave front sensing) and (2) improvement of the reconstruction quality of the sample. In practice, as the probe is almost never completely known, one essentially has to solve such a blind phase retrieval (BPR) problem, e.g., in coherent diffractive imaging (CDI) (Thibault and Guizar-Sicairos 2012), conventional ptychography (Thibault et al. 2009; Maiden and Rodenburg 2009), Fourier ptychography (Zheng et al. 2013; Ou et al. 2014), convolutional PR (Ahmed et al. 2018), frequency-resolved optical gating (FROG) (Trebino et al. 1997), and others.
An early work by Chapman (1996) used the Wigner-distribution deconvolution method to retrieve the probe for the blind problem. In the optics community, alternating projection (AP) algorithms are very popular for nonblind PR problems (Marchesini 2007; Elser et al. 2018). Some AP algorithms have also been applied to BPR problems, e.g., the Douglas-Rachford (DR)-based algorithm (Thibault et al. 2009), the extended ptychographic iterative engine (ePIE) and its variants (Maiden and Rodenburg 2009; Maiden et al. 2017), and the relaxed averaged alternating reflections (Luke 2005)-based projection algorithm (Marchesini et al. 2016). More advanced first-order optimization methods include proximal algorithms (Hesse et al. 2015; Yan 2020; Huang et al. 2021), alternating direction method of multipliers (ADMM) (Chang et al. 2019a; Fannjiang and Zhang 2020), and convex programming methods (Ahmed et al. 2018). To further accelerate the first-order methods, several second-order algorithms utilizing the Hessian have also been developed (Qian et al. 2014; Yeh et al. 2015; Ma et al. 2018; Gao and Xu 2017; Kandel et al. 2021). Moreover, subspace methods (Xin et al. 2021) have been successfully applied to BPR, as in Thibault and Guizar-Sicairos (2012), Chang et al. (2019a), and Fung and Wendy (2020).
The purpose of this survey is to give a brief review of recent iterative algorithms for the BPR problem, so as to provide guidance for practical use and to draw the attention of applied mathematicians for further improvement. The remainder of the survey is organized as follows: Section "Mathematical Formula and Nonlinear Optimization Model for BPR" gives the mathematical formula for BPR and the related nonlinear optimization models, as well as closed-form expressions of the proximal mapping. Fast iterative algorithms are reviewed in Section "Fast Iterative Algorithms". Section "Discussions" further discusses experimental issues and theoretical analysis. Section "Conclusions" summarizes this survey.
In its general form, PR seeks to recover an unknown $u$ from measurements
$$f = |Au|^2, \qquad (1)$$
or, in the presence of photon counting noise, $f = \operatorname{Poi}(|Au|^2)$, where Poi denotes a random variable following the i.i.d. Poisson distribution. See more advanced models for practical noise, such as outliers and structured or randomly distributed uncorrelated noise sources, in Godard et al. (2012), Reinhardt et al. (2017), Wang et al. (2017), Odstrčil et al. (2018), Chang et al. (2019b), and the references therein.
Mathematical Formula
We state the BPR problem starting from conventional ptychography (Rodenburg 2008), since the principles of the other BPR problems can be explained in a similar manner; all of them can be unified as a blind recovery problem.
As shown in Fig. 1, a detector in the far field measures a series of phaseless intensities as a localized coherent X-ray probe $w$ scans through the sample $u$. Let the 2D image and the localized 2D probe be denoted by $u \in \mathbb{C}^{n}$ with $\sqrt{n}\times\sqrt{n}$ pixels and $w \in \mathbb{C}^{\bar m}$ with $\sqrt{\bar m}\times\sqrt{\bar m}$ pixels, respectively. Here both the sample and the probe are rewritten as vectors in lexicographical order. Let $f_j^P \in \mathbb{R}_+^{\bar m}$, $\forall\, 0\le j\le J-1$, denote the phaseless measurements satisfying
$$f_j^P = |\mathcal{F}(w\circ S_j u)|^2 \quad \forall\, 0\le j\le J-1,$$
where the symbols $|\cdot|$, $(\cdot)^2$, and $\circ$ represent the element-wise absolute value, the element-wise square of a vector, and the element-wise multiplication of two vectors, respectively; the symbol $S_j\in\mathbb{R}^{\bar m\times n}$ represents a binary matrix extracting a patch (with index $j$ and size $\bar m$) from the entire sample; and $\mathcal F$ denotes the normalized discrete Fourier transform (DFT). In practice, to get an accurate estimate of the probe, one has to solve a blind ptychographic PR problem. Note that the coherent CDI problem (Thibault and Guizar-Sicairos 2012) can be interpreted as a special blind ptychography problem with only one scanned frame ($J=1$).
A recent super-resolution technique based on visible light, called the Fourier ptychography (FP) method, was developed by Zheng et al. (2013) and quickly spread out to fruitful applications (Zheng et al. 2021). Letting $w$ and $u$ (reusing the notation for simplicity) be the point spread function (PSF) of the imaging system and the sample of interest, the collected phaseless data $f_j^{FP}$ of FP can be expressed as
$$f_j^{FP} = \big|\mathcal F^{-1}\big(\mathcal Fw \circ S_j(\mathcal Fu)\big)\big|^2$$
(cf. Case II of (6) below). Similarly, for frequency-resolved optical gating (FROG) (Trebino et al. 1997), the phaseless data take the form
$$f_j^{FROG} = \big|\mathcal F(w\circ T_j u)\big|^2,$$
where the symbol $T_j$ denotes the translation operator. From the measurements $\{f_j^{FROG}\}_j$, one may also formulate a BPR problem by treating the two factors in the element-wise multiplication as independent variables.
All the mentioned problems can be unified as the BPR problem, i.e., to recover the probe (pupil, convolution kernel, or the signal itself) and the sample jointly. Essentially, the relation between these two variables is bilinear. For conventional ptychography, the bilinear operators $A:\mathbb C^{\bar m}\times\mathbb C^n\to\mathbb C^m$ and $A_j:\mathbb C^{\bar m}\times\mathbb C^n\to\mathbb C^{\bar m}$, $\forall\, 0\le j\le J-1$, are denoted as follows:
$$A(w,u) := \big(A_0^T(w,u),\, A_1^T(w,u),\,\cdots,\, A_{J-1}^T(w,u)\big)^T, \qquad(5)$$
with $A_j(w,u) := \mathcal F(w\circ S_j u)$ and $m := J\bar m$.
Actually, for all BPR problems, the bilinear operators can be unified as
$$A_j(w,u) := \begin{cases}
\mathcal F(w\circ S_j u); & \text{Case I: CDI and ptychography}\\
\mathcal F^{-1}\big(\mathcal Fw\circ S_j(\mathcal Fu)\big); & \text{Case II: Fourier ptychography}\\
\mathcal F(w\circ T_j u); & \text{Case III: FROG}\\
w\ast u,\ \text{or}\ \mathcal F(w\ast u); & \text{Case IV: Convolution PR}
\end{cases}\qquad(6)$$
where there is only one frame ($J=1$) in the last case, convolutional PR. Hence, by introducing the general bilinear operator $A(\cdot,\cdot)$, the BPR problem can be given below:
$$\text{BPR: find the ``probe'' } w \text{ and the sample } u, \text{ s.t. } |A(w,u)|^2 = f, \qquad(7)$$
where $A$ is defined in (5) and (6), and each frame $f_j$ of the phaseless measurements corresponds to one of the four cases. Note that the BPR problem is not limited to the cases with forward propagation as in (6).
Fixing one of the two variables, the bilinear operator induces the linear operators $A_w$ and $A_u$ defined by
$$A_w u = A(w,u)\ \ \forall u; \qquad A_u w = A(w,u)\ \ \forall w; \qquad(8)$$
their adjoints are given by
$$A_w^* z = \begin{cases}
\sum_j S_j^T\big(\operatorname{conj}(w)\circ\mathcal F^{-1}z_j\big); & \text{Case I}\\
\mathcal F^{-1}\Big(\sum_j S_j^T\big(\operatorname{conj}(\mathcal Fw)\circ\mathcal Fz_j\big)\Big); & \text{Case II}\\
\sum_j T_j^T\big(\operatorname{conj}(w)\circ\mathcal F^{-1}z_j\big); & \text{Case III}\\
\operatorname{conj}(w)\star z,\ \text{or}\ \operatorname{conj}(w)\star\mathcal F^{-1}z; & \text{Case IV}
\end{cases}\qquad(9)$$
and
$$A_u^* z = \begin{cases}
\sum_j \operatorname{conj}(S_j u)\circ\mathcal F^{-1}z_j; & \text{Case I}\\
\mathcal F^{-1}\Big(\sum_j \operatorname{conj}(S_j \mathcal Fu)\circ\mathcal Fz_j\Big); & \text{Case II}\\
\sum_j \operatorname{conj}(T_j u)\circ\mathcal F^{-1}z_j; & \text{Case III}\\
\operatorname{conj}(u)\star z,\ \text{or}\ \operatorname{conj}(u)\star\mathcal F^{-1}z; & \text{Case IV}
\end{cases}\qquad(10)$$
$\forall z = (z_0^T, z_1^T,\cdots,z_{J-1}^T)^T\in\mathbb C^m$. Here $\sum_j$ is a simplified form of $\sum_{j=0}^{J-1}$.
Consequently, one obtains
$$A_w^* A_w u = \begin{cases}
\big(\sum_j S_j^T|w|^2\big)\circ u; & \text{Case I}\\
\mathcal F^{-1}\Big(\big(\sum_j S_j^T|\mathcal Fw|^2\big)\circ\mathcal Fu\Big); & \text{Case II}\\
\big(\sum_j T_j^T|w|^2\big)\circ u; & \text{Case III}\\
\operatorname{conj}(w)\star w\ast u; & \text{Case IV}
\end{cases}\qquad(11)$$
and
$$A_u^* A_u w = \begin{cases}
\big(\sum_j S_j|u|^2\big)\circ w; & \text{Case I}\\
\mathcal F^{-1}\Big(\big(\sum_j S_j|\mathcal Fu|^2\big)\circ\mathcal Fw\Big); & \text{Case II}\\
\big(\sum_j T_j|u|^2\big)\circ w; & \text{Case III}\\
\operatorname{conj}(u)\star u\ast w. & \text{Case IV}
\end{cases}\qquad(12)$$
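To make the Case I operators concrete, here is a minimal NumPy sketch of $A(w,u)$ and the adjoints (9)-(10) for conventional ptychography. The scan convention (top-left corner positions of square patches), the 2D array shapes, and the function names are illustrative assumptions, not part of the survey.

```python
# Minimal sketch of the Case I bilinear operator and its adjoints (assumptions:
# 2D arrays, square probe of side ps, scan given by top-left corners).
import numpy as np

def forward(w, u, positions, ps):
    """A(w, u): stack of F(w o S_j u) over all scan positions (Case I)."""
    return np.stack([np.fft.fft2(w * u[r:r + ps, c:c + ps], norm="ortho")
                     for (r, c) in positions])

def adjoint_Aw(w, z, positions, ps, shape):
    """A_w^* z: maps stacked data back to the sample domain, cf. (9), Case I."""
    out = np.zeros(shape, dtype=complex)
    for zj, (r, c) in zip(z, positions):
        out[r:r + ps, c:c + ps] += np.conj(w) * np.fft.ifft2(zj, norm="ortho")
    return out

def adjoint_Au(u, z, positions, ps):
    """A_u^* z: maps stacked data back to the probe domain, cf. (10), Case I."""
    out = np.zeros((ps, ps), dtype=complex)
    for zj, (r, c) in zip(z, positions):
        out += np.conj(u[r:r + ps, c:c + ps]) * np.fft.ifft2(zj, norm="ortho")
    return out
```

Note that the corresponding normal operators (11)-(12) are diagonal in the spatial domain for Cases I-III, which is why they can be inverted pointwise.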
Optimization Problems and Proximal Mapping

In order to solve the BPR problem (7) from noisy data, one typically considers the nonlinear optimization model
$$\min_{w,u}\ M\big(|A(w,u)|^2,\, f\big), \qquad(13)$$
where the symbol $M(\cdot,\cdot)$ represents the error between the unknown intensity $|A(w,u)|^2$ and the collected phaseless data $f$. Various metrics have been proposed under different noise settings, including the amplitude-based metric for Gaussian measurements (AGM) (Wen et al. 2012; Chang et al. 2016), the intensity-based metric for Poisson measurements (IPM) (Thibault and Guizar-Sicairos 2012; Chen and Candes 2015; Chang et al. 2018b), and the intensity-based metric for Gaussian measurements (IGM) (Qian et al. 2014; Candes et al. 2015; Sun et al. 2016), all of which can be expressed as
$$M(g,f) := \begin{cases}
\tfrac12\big\|\sqrt{g}-\sqrt{f}\big\|^2; & \text{(AGM)}\\[4pt]
\tfrac12\big\langle g - f\circ\log(g),\ \mathbf 1\big\rangle; & \text{(IPM)}\\[4pt]
\tfrac12\|g-f\|^2; & \text{(IGM)}
\end{cases}\qquad(14)$$
where the operations on vectors such as $\sqrt{\cdot}$, $\log(\cdot)$, $|\cdot|$, $(\cdot)^2$ are all defined pointwise in this survey, $\mathbf 1$ denotes a vector whose entries all equal one, and $\|\cdot\|$ denotes the $\ell_2$ norm in Euclidean space.
The proximal mapping for functions defined on complex Euclidean space is
introduced below.
146 H. Chang et al.
$$\operatorname{Prox}_{h;\beta}(v) = \arg\min_x\ h(x) + \frac{\beta}{2}\|x-v\|^2. \qquad(15)$$
Namely, the proximal operator for the function M(| · |2 , f ) defined in (14) has a
closed-form formula (Chang et al. 2018c) as below:
$$\operatorname{Prox}_{M(|\cdot|^2,f);\beta}(z) = \begin{cases}
\dfrac{\sqrt f + \beta|z|}{1+\beta}\circ\operatorname{sign}(z), & \text{for AGM;}\\[8pt]
\dfrac{\beta|z| + \sqrt{(\beta|z|)^2 + 4(1+\beta)f}}{2(1+\beta)}\circ\operatorname{sign}(z), & \text{for IPM;}\\[8pt]
\psi_\beta(|z|)\circ\operatorname{sign}(z), & \text{for IGM;}
\end{cases}\qquad(16)$$
where, $\forall z\in\mathbb C^m$, $(\operatorname{sign}(z))(t) := \operatorname{sign}(z(t))$ $\forall\, 0\le t\le m-1$; for a scalar $x\in\mathbb C$, $\operatorname{sign}(x) := \frac{x}{|x|}$ if $x\neq 0$, and otherwise $\operatorname{sign}(0) := c$ with an arbitrary constant $c\in\mathbb C$ of unit length; and
$$\psi_\beta(|z|)(t) = \begin{cases}
\sqrt[3]{\dfrac{\beta|z(t)|}{4}+\sqrt{D(t)}} + \sqrt[3]{\dfrac{\beta|z(t)|}{4}-\sqrt{D(t)}}, & \text{if } D(t)\ge 0;\\[8pt]
2\sqrt{\dfrac{f(t)-\frac\beta2}{3}}\,\cos\Big(\dfrac{\arccos\theta(t)}{3}\Big), & \text{otherwise},
\end{cases}\qquad(17)$$
for $0\le t\le m-1$, with $D(t) = \dfrac{\beta^2|z(t)|^2}{16} + \dfrac{\big(\frac\beta2 - f(t)\big)^3}{27}$ and $\theta(t) = \dfrac{\beta|z(t)|/4}{\sqrt{\big(f(t)-\frac\beta2\big)^3/27}}$.
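As an illustration, the AGM and IPM cases of (16) can be implemented elementwise as in the following NumPy sketch; the handling of zero entries in $\operatorname{sign}(z)$ (set to one) and the variable names are assumptions.

```python
# Sketch of the proximal operators (16) for the AGM and IPM metrics.
import numpy as np

def csign(z, eps=1e-12):
    """Elementwise complex sign, with sign(0) set to 1 (a valid unit constant)."""
    a = np.abs(z)
    return np.where(a > eps, z / np.maximum(a, eps), 1.0 + 0.0j)

def prox_agm(z, f, beta):
    """Prox of M(|.|^2, f) with the AGM metric, first case of (16)."""
    return (np.sqrt(f) + beta * np.abs(z)) / (1.0 + beta) * csign(z)

def prox_ipm(z, f, beta):
    """Prox of M(|.|^2, f) with the IPM (Poisson) metric, second case of (16)."""
    bz = beta * np.abs(z)
    rho = (bz + np.sqrt(bz**2 + 4.0 * (1.0 + beta) * f)) / (2.0 * (1.0 + beta))
    return rho * csign(z)
```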
Note that the alternating direction method of multipliers (ADMM) was adopted in Wen et al. (2012) and Chang et al. (2016, 2018b) to solve the variational PR model (13). However, due to the lack of globally Lipschitz differentiable terms in the objective function, it seems difficult to guarantee its convergence. Some other variants of the metric have recently been proposed, such as the penalized metric $M(|\cdot|^2 + \varepsilon\mathbf 1,\ f + \varepsilon\mathbf 1)$ obtained by adding a small positive scalar $\varepsilon$, as in Guizar-Sicairos and Fienup (2008), Chang et al. (2019a), and Gao et al. (2020). Although it has a simple form, this technique causes the related proximal mapping to lose its closed-form expression, so that additional computational cost, e.g., an inner loop, may have to be introduced (Chang et al. 2019a). By truncating the AGM near the origin and then adding back a smooth function, one can keep the global minimizer unchanged. Hence, a smooth truncated AGM (ST-AGM) $G_\varepsilon(\cdot; f)$ with truncation parameter $\varepsilon>0$ (Chang et al. 2021) was designed as follows:
$$M_\varepsilon(z, f) := \sum_j M_\varepsilon\big(z(j), f(j)\big),\qquad(18)$$
where, $\forall\, x\in\mathbb C,\ b\in\mathbb R_+$,
$$M_\varepsilon(x, b) := \begin{cases}
\tfrac12(1-\varepsilon)\,b - \tfrac{1-\varepsilon}{2\varepsilon}\,|x|^2, & \text{if } |x| < \varepsilon\sqrt b;\\[4pt]
\tfrac12\big(|x| - \sqrt b\big)^2, & \text{otherwise}.
\end{cases}\qquad(19)$$
The closed form of the corresponding proximal mapping can readily be found in Chang et al. (2021). More elaborate metrics can be found in Luke (2005), Cai et al. (2021), and the references therein.
Fast Iterative Algorithms

In this section, the main iterative algorithms for BPR are introduced. Note that each algorithm may have been designed originally for a specific case of (6). Hence, the basic idea is explained first for the original case, and possible extensions to the other cases are discussed afterwards.
Alternating Projection (AP) Algorithms

Introducing the exit waves
$$\Psi_j := \mathcal F(w\circ S_j u)\quad \forall\, 0\le j\le J-1,$$
the optimal exit wave $\Psi$ lies in the intersection of the two following sets, i.e., $\Psi\in\mathcal X_1\cap\mathcal X_2$, with
$$\begin{aligned}
\mathcal X_1 &:= \big\{\Psi := (\Psi_0^T,\Psi_1^T,\cdots,\Psi_{J-1}^T)^T\in\mathbb C^m :\ |\Psi_j| = \sqrt{f_j}\ \ \forall\, 0\le j\le J-1\big\},\\
\mathcal X_2 &:= \big\{\Psi\in\mathbb C^m :\ \exists\, w\in\mathbb C^{\bar m},\, u\in\mathbb C^n, \text{ s.t. } w\circ S_j u = \mathcal F^{-1}\Psi_j\ \ \forall\, 0\le j\le J-1\big\}.
\end{aligned}\qquad(20)$$
The AP algorithm seeks a point in this intersection by alternately computing the projections onto the two sets $\mathcal X_1$ and $\mathcal X_2$. The projection onto $\mathcal X_1$ is given framewise by
$$\big(\mathcal P_1(\Psi)\big)_j = \sqrt{f_j}\circ\operatorname{sign}(\Psi_j),\quad 0\le j\le J-1.$$
For the projection onto $\mathcal X_2$, given $\Psi^k$ as the solution in the $k$th iteration, one computes
$$(w^{k+1},u^{k+1}) = \arg\min_{w,u}\ F(w,u,\Psi^k) := \tfrac12\sum_j\big\|\mathcal F^{-1}\Psi_j^k - w\circ S_j u\big\|^2.\qquad(21)$$
Unfortunately, (21) does not have a closed-form solution. One can solve it approximately by alternating minimization (with $T$ inner steps); each half step is a weighted least squares problem with a damped, closed-form solution of the form
$$w \leftarrow \frac{\sum_j \operatorname{conj}(S_j u)\circ\mathcal F^{-1}\Psi_j^k}{\sum_j |S_j u|^2 + \bar\alpha_1},\qquad
u \leftarrow \frac{\sum_j S_j^T\big(\operatorname{conj}(w)\circ\mathcal F^{-1}\Psi_j^k\big)}{\sum_j S_j^T|w|^2 + \bar\alpha_2},\qquad(23)$$
where the parameters $0<\bar\alpha_1,\bar\alpha_2\ll 1$ are introduced in order to avoid division by zero.
Letting $\Psi^k$ be the iterative solution in the $k$th iteration, the standard AP for BPR can be given directly as below:

(1) Compute $\tilde\Psi^k$ by $\tilde\Psi_j^k = \mathcal F(w^{k+1}\circ S_j u^{k+1})$, where the pair $(w^{k+1},u^{k+1})$ is approximately solved by (23).
(2) Compute $\Psi^{k+1}$ by $\Psi^{k+1} = \mathcal P_1(\tilde\Psi^k)$.
The DR algorithm for BPR can be formulated in two steps (Thibault et al. 2009): first compute $\tilde\Psi^k$ as in step (1) above, and then update
$$\Psi^{k+1} = \Psi^k + \mathcal P_1\big(2\tilde\Psi^k - \Psi^k\big) - \tilde\Psi^k.\qquad(24)$$
Note that the formula (24), utilizing the DR operator, is essentially Fienup's hybrid input-output map, which can also be derived, with proper parameters, from the difference map (Elser 2003).

Since a fixed point of the DR iteration may not exist, Marchesini et al. (2016) adopted the relaxed version of DR (dubbed RAAR by Luke 2005) to further improve the stability of the reconstruction from noisy measurements; it simply takes the weighted average of the right-hand side of (24) and $\tilde\Psi^k$ with a tunable parameter $\delta\in(0,1)$:
$$\Psi^{k+1} = \delta\Big(\Psi^k + \mathcal P_1\big(2\tilde\Psi^k - \Psi^k\big) - \tilde\Psi^k\Big) + (1-\delta)\,\tilde\Psi^k.$$
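The DR update (24) and its RAAR relaxation can be written compactly as below; here Psi_tilde stands for the (approximate) projection of Psi onto $\mathcal X_2$ computed as above, and both inputs are assumed to be given.

```python
# Sketch of the DR update (24) and the RAAR relaxation.
import numpy as np

def magnitude_projection(Psi, f):
    return np.sqrt(f) * np.exp(1j * np.angle(Psi))

def dr_update(Psi, Psi_tilde, f):
    # Psi^{k+1} = Psi^k + P1(2*Psi_tilde^k - Psi^k) - Psi_tilde^k, cf. (24)
    return Psi + magnitude_projection(2 * Psi_tilde - Psi, f) - Psi_tilde

def raar_update(Psi, Psi_tilde, f, delta=0.75):
    # weighted average of the DR right-hand side and Psi_tilde^k
    return delta * dr_update(Psi, Psi_tilde, f) + (1 - delta) * Psi_tilde
```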
In the same manner, one can define two constraint sets and establish the AP algorithms for the four cases of BPR. The only differences lie in the calculation of the projection onto the bilinear constraint set. As in (21), consider
$$\min_{w,u}\ \tfrac12\big\|A(w,u) - \Psi^k\big\|^2,\qquad(25)$$
whose alternating minimization leads to the normal equations
$$w^{k+1} = \big(A_{u^k}^* A_{u^k}\big)^{-1} A_{u^k}^*\Psi^k,\qquad u^{k+1} = \big(A_{w^{k+1}}^* A_{w^{k+1}}\big)^{-1} A_{w^{k+1}}^*\Psi^k.\qquad(26)$$
The detailed forms of these operators can be found in (9), (10), (11), and (12). Notably, the inverses in (26) can be efficiently computed by pointwise division or DFTs.
ePIE-Type Algorithms
Randomly selecting a single frame index $n_k$ in the $k$th iteration, letting $\Psi_{n_k}^k = \mathcal F(w^k\circ S_{n_k}u^k)$, and then updating $w^{k+1}$ and $u^{k+1}$ by one gradient descent step (an inexact projection) for (21), the ePIE algorithm proposed by Maiden and Rodenburg (2009) can be expressed as the parallel update
$$\begin{cases}
w^{k+1} = w^k - \dfrac{d_2}{\|S_{n_k}u^k\|_\infty^2}\ S_{n_k}\operatorname{conj}(u^k)\circ\mathcal F^{-1}\big(\Psi_{n_k}^k - \mathcal P_1^{n_k}(\Psi_{n_k}^k)\big),\\[10pt]
u^{k+1} = u^k - \dfrac{d_1}{\|S_{n_k}^T w^k\|_\infty^2}\ S_{n_k}^T\Big(\operatorname{conj}(w^k)\circ\mathcal F^{-1}\big(\Psi_{n_k}^k - \mathcal P_1^{n_k}(\Psi_{n_k}^k)\big)\Big),
\end{cases}\qquad(27)$$
where $\mathcal P_1^{n_k}$ denotes the magnitude projection onto the $n_k$th constraint (replacing the modulus by $\sqrt{f_{n_k}}$) and $d_1, d_2$ are stepsize constants.
The rPIE variant modifies the denominators in (27) using a scalar constant $\delta\in(0,1)$: it can be interpreted as a hybrid stepsize for gradient descent, taking a weighted average of the denominator of the ePIE scheme (27) and the first term in the denominator of the AP scheme (23). The rPIE algorithm was further accelerated by momentum (Maiden et al. 2017).
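A single ePIE update (27) for one randomly chosen frame may look as follows; the top-left-corner scan convention and the default stepsizes are assumptions.

```python
# Sketch of one ePIE update (27) for frame n_k at scan position `pos`.
import numpy as np

def epie_step(w, u, pos, ps, f_j, d1=1.0, d2=1.0):
    r, c = pos
    patch = u[r:r + ps, c:c + ps]
    Psi = np.fft.fft2(w * patch, norm="ortho")
    # magnitude projection of the current exit wave, P1^{n_k}(Psi)
    Psi_proj = np.sqrt(f_j) * np.exp(1j * np.angle(Psi))
    residual = np.fft.ifft2(Psi - Psi_proj, norm="ortho")
    w_new = w - d2 / (np.max(np.abs(patch)) ** 2) * np.conj(patch) * residual
    u_new = u.copy()
    u_new[r:r + ps, c:c + ps] -= d1 / (np.max(np.abs(w)) ** 2) * np.conj(w) * residual
    return w_new, u_new
```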
One can directly obtain the ePIE and rPIE schemes for FP (Zheng et al. 2021) by replacing the variables $w$ and $u$ by $\mathcal Fw$ and $\mathcal Fu$. The ePIE-type algorithms are very popular in the optics community, since it suffices to implement them once one knows how to calculate the gradient of the objective functions, and the memory footprint is much smaller than that of more advanced AP algorithms such as DR and RAAR. However, they tend to become unstable when the data redundancy is insufficient (e.g., noisy data, large scan steps), as reported in Chang et al. (2019a). Moreover, the theoretical convergence is unknown and seems challenging to establish due to the nonsmooth objective functions involved.

Note that, with only $J=1$ frame as in CDI, the differences between ePIE (with $d_1=d_2=1$) and standard AP lie in the preconditioning matrices: AP utilizes the spatially weighted diagonal matrices $A_u^*A_u$ and $A_w^*A_w$, while ePIE utilizes the scalar weights $\|S_{n_k}u^k\|_\infty^2$ and $\|S_{n_k}^T w^k\|_\infty^2$.
Proximal Algorithms
Consider the following constrained model:
$$\min_{w,u,\Psi}\ F(w,u,\Psi) + I_{\mathcal X_1}(\Psi) + I_{X_1}(w) + I_{X_2}(u),\qquad(29)$$
with $F(w,u,\Psi)$ and $\mathcal X_1$ denoted in (21) and (20), respectively, and the indicator function $I_X$ defined as
$$I_X(\Psi) := \begin{cases}0, & \text{if } \Psi\in X,\\ +\infty, & \text{otherwise},\end{cases}$$
where the amplitude constraints of the probe and image are incorporated (in Hesse et al. (2015), the authors further considered compact support conditions on the probe and image), with
$$X_1 := \{w\in\mathbb C^{\bar m} : \|w\|_\infty \le C_w\};\qquad X_2 := \{u\in\mathbb C^{n} : \|u\|_\infty \le C_u\}.\qquad(30)$$
Proximal splitting algorithms (Hesse et al. 2015) then alternate between the subproblems in $\Psi$, $w$, and $u$; for example, the subproblem with respect to the probe reduces to the projection
$$\min_{\|\tilde w\|_\infty\le C_w}\ \tfrac12\|\tilde w - w\|^2.$$
For the general BPR problem (all four cases of (6)), one may consider
$$\min_{w,u,\Psi}\ \|\Psi - A(w,u)\|^2 + I_{\mathcal X_1}(\Psi) + I_{X_1}(w) + I_{X_2}(u),$$
where the first term adopts the same form as (25). The detailed algorithms are omitted, since one only needs to update the gradient of the first term following (9), (10), (11), and (12).
Variant of Proximal Algorithm Here we introduce a general constraint set for the bilinear relation,
$$\mathcal X := \big\{z\in\mathbb C^m :\ \exists\, w\in\mathbb C^{\bar m},\ u\in\mathbb C^n,\ \text{s.t. } z = A(w,u)\big\},$$
and consider the feasibility problem
$$\min_z\ I_{\mathcal X_1}(z) + I_{\mathcal X}(z).\qquad(33)$$
By replacing the indicator functions by the metrics, and further combining alternating minimization with proximal algorithms, Yan (2020) derived a new proximal algorithm for the conventional ptychography problem. Specifically, the proposed algorithm, in a generalized form for BPR, has the following steps:
$$\begin{aligned}
\text{Step 1: } & z^{k+1} = \arg\min_z\ M(|z|^2; f) + \tfrac\beta2\|z - A(w^k,u^k)\|^2;\\
\text{Step 2: } & w^{k+1} = \arg\min_w\ \|z^{k+1} - A_{u^k} w\|^2;\\
\text{Step 3: } & u^{k+1} = \arg\min_u\ \|z^{k+1} - A_{w^{k+1}} u\|^2.
\end{aligned}\qquad(34)$$
Here the last two steps can be solved in the same manner as (26). The above algorithm has deep connections with the ADMM (Chang et al. 2019a): if one removes the boundedness constraints on the two variables, sets the penalization parameter to zero in (35), and solves the constrained problem (37) by adding a penalization term $\|z - A(w,u)\|^2$ without introducing the multiplier $\Lambda$, one obtains exactly the same iterative scheme as (34). Besides, it was further improved by an accelerated proximal gradient method in Yan (2020), and recently by stochastic gradient descent (Huang et al. 2021) for FP.
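The scheme (34) can be sketched abstractly as below. The callables `A`, `solve_w`, and `solve_u` are placeholders for the bilinear operator and the two least squares solves (e.g., via (26)), and the AGM prox from (16) is used in Step 1; all names are assumptions for illustration.

```python
# Sketch of the three-step proximal scheme (34) with the AGM metric.
import numpy as np

def prox_agm(z, f, beta):
    a = np.abs(z)
    s = np.where(a > 1e-12, z / np.maximum(a, 1e-12), 1.0 + 0.0j)
    return (np.sqrt(f) + beta * a) / (1.0 + beta) * s

def proximal_bpr_step(w, u, f, beta, A, solve_w, solve_u):
    z = prox_agm(A(w, u), f, beta)      # Step 1: prox of M(|.|^2, f)
    w = solve_w(z, u)                   # Step 2: argmin_w ||z - A(w, u)||^2
    u = solve_u(z, w)                   # Step 3: argmin_u ||z - A(w, u)||^2
    return w, u
```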
ADMM
Chang et al. (2019a) considered the following nonconvex optimization model:
$$\min_{w\in\mathbb C^{\bar m},\, u\in\mathbb C^n}\ G\big(A(w,u)\big) + I_{X_1}(w) + I_{X_2}(u),\qquad(35)$$
with $G(z) := M(|z|^2 + \varepsilon\mathbf 1,\ f + \varepsilon\mathbf 1)$ and the constraint sets defined in (30). The authors further leveraged additional data $c\in\mathbb R_+^{\bar m}$ to eliminate structural artifacts caused by the grid scan and therefore obtained the following variant:
$$\min_{w\in\mathbb C^{\bar m},\, u\in\mathbb C^n}\ G\big(A(w,u)\big) + I_{X_1}(w) + I_{X_2}(u) + \tau\, G(\mathcal Fw),\qquad(36)$$
As the procedures for solving the two models above are quite similar using ADMM, only the details for the first optimization model (35) are given below. By introducing an auxiliary variable $z = A(w,u)\in\mathbb C^m$, an equivalent form of (35) is formulated below:
$$\min_{w,u,z}\ G(z) + I_{X_1}(w) + I_{X_2}(u),\qquad \text{s.t. } z = A(w,u).\qquad(37)$$
Its augmented Lagrangian reads
$$\Upsilon_\beta(w,u,z,\Lambda) := G(z) + I_{X_1}(w) + I_{X_2}(u) + \Re\langle\Lambda,\ z - A(w,u)\rangle + \tfrac\beta2\|z - A(w,u)\|^2,\qquad(38)$$
with the multiplier $\Lambda\in\mathbb C^m$ and a positive parameter $\beta$, where $\Re(\cdot)$ denotes the real part of a complex number. Then one considers the following saddle point problem:
$$\max_{\Lambda}\ \min_{w,u,z}\ \Upsilon_\beta(w,u,z,\Lambda).\qquad(39)$$
Given the approximate solution $(w^k, u^k, z^k, \Lambda^k)$ in the $k$th iteration, the four-step iteration of the generalized ADMM (only the subproblems w.r.t. $w$ or $u$ have proximal terms) is given as follows:
$$\begin{aligned}
\text{Step 1: } & w^{k+1} = \arg\min_w\ \Upsilon_\beta(w, u^k, z^k, \Lambda^k) + \tfrac{\alpha_1}{2}\|w - w^k\|_{M_1^k}^2,\\
\text{Step 2: } & u^{k+1} = \arg\min_u\ \Upsilon_\beta(w^{k+1}, u, z^k, \Lambda^k) + \tfrac{\alpha_2}{2}\|u - u^k\|_{M_2^k}^2,\\
\text{Step 3: } & z^{k+1} = \arg\min_z\ \Upsilon_\beta(w^{k+1}, u^{k+1}, z, \Lambda^k),\\
\text{Step 4: } & \Lambda^{k+1} = \Lambda^k + \beta\big(z^{k+1} - A(w^{k+1}, u^{k+1})\big),
\end{aligned}\qquad(40)$$
where $\alpha_1,\alpha_2>0$ and $M_1^k, M_2^k$ are weight matrices.
Letting $\hat z^k := z^k + \frac{\Lambda^k}{\beta}$, one has
$$\Upsilon_\beta(w, u, z^k, \Lambda^k) = G(z^k) + I_{X_1}(w) + I_{X_2}(u) + \tfrac\beta2\big\|\hat z^k - A(w,u)\big\|^2 - \tfrac{\|\Lambda^k\|^2}{2\beta},$$
so that Steps 1 and 2 of (40) reduce to regularized least squares problems in $w$ and $u$, respectively.
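One sweep of the generalized ADMM (40) can be sketched as follows, dropping the proximal terms ($\alpha_1=\alpha_2=0$) and the box constraints for brevity; `A`, `solve_w`, `solve_u`, and `prox_G` are placeholder callables under the same assumptions as in the earlier sketches.

```python
# Sketch of one sweep of the ADMM iteration (40) (simplified).
def admm_bpr_step(w, u, z, Lam, f, beta, A, solve_w, solve_u, prox_G):
    z_hat = z + Lam / beta
    w = solve_w(z_hat, u)                         # Step 1: least squares in w
    u = solve_u(z_hat, w)                         # Step 2: least squares in u
    z = prox_G(A(w, u) - Lam / beta, f, beta)     # Step 3: prox of G
    Lam = Lam + beta * (z - A(w, u))              # Step 4: dual update
    return w, u, z, Lam
```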
From the viewpoint of fixed point analysis, for nonblind problems (i.e., with known probe $w$), Fannjiang and Zhang (2020) presented a variant of ADMM for a constrained formulation of the phase retrieval problem (see Fannjiang and Zhang 2020 for the precise model). To further apply the idea to BPR, alternating minimization between the probe and the sample was adopted, where the two subproblems are solved via ADMM as inner loops; here one has to adjust the constraint sets with the updated probe and sample. Note that the probe and sample can be readily determined by solving least squares problems such as
$$w^{k+1} = \arg\min_w\ \|z^{k+1} - A(w, u^{k+1})\|^2,\qquad(49)$$
which can be solved by (23). Although the algorithms worked well with suitable initialization, as reported in Fannjiang and Zhang (2020), the theoretical convergence for the blind recovery is still open.
Convex Programming
Ahmed et al. (2018) proposed a nontrivial convex relaxation of convolutional PR based on a lifted matrix recovery formulation. Consider the convolutional PR problem (Case IV of (6)), where the phaseless data are generated by the convolution of a kernel $\kappa$ with the sample $u$.
One basic assumption for unique recovery is that the variables $\kappa$ and $u$ belong to known low-dimensional subspaces of $\mathbb C^n$, i.e.,
$$\kappa = Bh,\qquad u = Cm,$$
where $h\in\mathbb C^{k_1}$ and $m\in\mathbb C^{k_2}$ with known matrices $B\in\mathbb C^{n\times k_1}$ and $C\in\mathbb C^{n\times k_2}$ ($k_1,k_2\ll n$). Then one is concerned with the following problem with $h, m$ as unknowns:
$$f^{Cov} = \tfrac{1}{\sqrt n}\,\big|\hat Bh\circ\hat Cm\big|^2,\qquad(50)$$
with $\hat B := \sqrt n\,\mathcal FB$ and $\hat C := \sqrt n\,\mathcal FC$. Further, by the lifting technique of semidefinite programming (SDP), with $H := hh^*$, $M := mm^*$, and $b_l^*$, $c_l^*$ denoting the $l$th rows of $\hat B$ and $\hat C$, the above problem reduces to
$$f^{Cov}(l) = \tfrac1n\,\langle b_l b_l^*,\ H\rangle\,\langle c_l c_l^*,\ M\rangle,\qquad(51)$$
with $\bar f := n f^{Cov}$.
An ADMM scheme was further developed (Ahmed et al. 2018) to solve (52). By introducing the convex constraint set $\mathcal C$ for auxiliary variables $(v_1, v_2)$ (see Ahmed et al. 2018 for its precise form) and copies $H', M'$ for the positive semidefinite constraints, the relaxation is split as
$$\min\ I_{\mathcal C}(v_1, v_2) + \operatorname{Tr}(H) + \operatorname{Tr}(M) + I_{\{X\succeq 0\}}(H') + I_{\{X\succeq 0\}}(M'),$$
$$\text{s.t. } v_1(l) - \langle b_l b_l^*, H\rangle = 0,\quad v_2(l) - \langle c_l c_l^*, M\rangle = 0,\quad H - H' = 0,\quad M - M' = 0.$$
With the multipliers $\Lambda_k$, $k=1,2,3,4$, for the four constraints, the augmented Lagrangian in scaled form reads
$$\begin{aligned}
L_c\big(H, H', M, M', v_1, v_2; \{\Lambda_k\}_{k=1}^4\big) :=\ & I_{\mathcal C}(v_1,v_2) + \operatorname{Tr}(H) + \operatorname{Tr}(M) + I_{\{X\succeq 0\}}(H') + I_{\{X\succeq 0\}}(M')\\
& + \beta_1\sum_l\Big[\big\langle\Lambda_1(l),\, v_1(l) - \langle b_l b_l^*, H\rangle\big\rangle + \tfrac12\big(v_1(l) - \langle b_l b_l^*, H\rangle\big)^2\Big]\\
& + \beta_1\sum_l\Big[\big\langle\Lambda_2(l),\, v_2(l) - \langle c_l c_l^*, M\rangle\big\rangle + \tfrac12\big(v_2(l) - \langle c_l c_l^*, M\rangle\big)^2\Big]\\
& + \beta_2\Big[\langle\Lambda_3,\, H - H'\rangle + \tfrac12\|H - H'\|^2\Big] + \beta_2\Big[\langle\Lambda_4,\, M - M'\rangle + \tfrac12\|M - M'\|^2\Big],
\end{aligned}\qquad(53)$$
with two positive scalar parameters $\beta_1,\beta_2$. Then, with alternating minimization and updates of the dual variables $\Lambda_k$, the iterative scheme is obtained. First, one can optimize the variables $H$ and $M$ in parallel, considering
$$\begin{aligned}
H := \arg\min_H\ & \operatorname{Tr}(H) + \beta_1\sum_l\Big[\big\langle\Lambda_1(l),\, v_1(l) - \langle b_l b_l^*, H\rangle\big\rangle + \tfrac12\big(v_1(l) - \langle b_l b_l^*, H\rangle\big)^2\Big]\\
& + \beta_2\Big[\langle\Lambda_3,\, H - H'\rangle + \tfrac12\|H - H'\|^2\Big],
\end{aligned}$$
whose solution is given by
$$\operatorname{vec}(H) = \mathbf T_1^{-1}\operatorname{vec}\Big(\beta_1\sum_l\big(v_1(l)+\Lambda_1(l)\big)\, b_l b_l^* + \beta_2\big(H' - \Lambda_3\big) - \mathrm I\Big),$$
with
$$\mathbf T_1 := \beta_1\sum_l \operatorname{vec}(b_l b_l^*)\operatorname{vec}(b_l b_l^*)^* + \beta_2\mathrm I.$$
Similarly, one can determine the optimal $M$ for the subproblem w.r.t. $M$ by
$$\operatorname{vec}(M) = \mathbf T_2^{-1}\operatorname{vec}\Big(\beta_1\sum_l\big(v_2(l)+\Lambda_2(l)\big)\, c_l c_l^* + \beta_2\big(M' - \Lambda_4\big) - \mathrm I\Big),$$
with
$$\mathbf T_2 := \beta_1\sum_l \operatorname{vec}(c_l c_l^*)\operatorname{vec}(c_l c_l^*)^* + \beta_2\mathrm I.$$
The subproblems with respect to the copies $H'$ and $M'$ are projections onto the positive semidefinite cone, e.g.,
$$H' := \arg\min_{H'}\ I_{\{X\succeq 0\}}(H') + \tfrac12\|H' - \tilde H\|^2,\qquad(54)$$
with a Hermitian matrix $\tilde H$ (if the multipliers $\Lambda_3$ and $\Lambda_4$ are initialized with Hermitian matrices, it is readily guaranteed that all iterates of these two multipliers remain Hermitian). The closed-form solution of (54) is directly given as
$$H' = \operatorname{Proj}_+(\tilde H),\qquad M' = \operatorname{Proj}_+(M - \Lambda_4).$$
Second-Order Algorithm Using Hessian

To further accelerate first-order optimization, second-order algorithms exploit curvature information of the objective $Q$ (e.g., $Q(u) := M(|A_w u|^2, f)$ for the nonblind subproblem). Consider the second-order Taylor expansion around a point $u_0$:
$$Q(u)\approx Q(u_0) + \big\langle\nabla_u Q(u_0),\ u-u_0\big\rangle + \tfrac12\,\Re\big\langle\nabla_u^2 Q(u_0)(u-u_0),\ u-u_0\big\rangle,\qquad(57)$$
where $\nabla_u^2$ denotes the Hessian operator. A new estimate $u_1$ of the stationary point can then be obtained by solving the Newton system
$$\nabla_u^2 Q(u_0)\,(u_1 - u_0) = -\nabla_u Q(u_0).$$
The required gradients can be computed following (9), (10), (11), and (12); the Hessian matrices for the three metrics are more involved, and we refer to Appendix A of Yeh et al. (2015).
More efficient algorithms, including the Levenberg-Marquardt (LM) and Gauss-Newton (GN) methods, were developed for the nonlinear least squares (NLS) problems (56) with the AGM and IGM metrics (see (14)). Namely, denoting the residual function
$$r(u) = \begin{cases} |A_w u| - \sqrt f; & \text{(AGM)}\\ |A_w u|^2 - f; & \text{(IGM)}\end{cases}$$
one considers
$$\min_u\ Q(u) = \tfrac12\|r(u)\|^2.$$
Then, with the Jacobian matrix
$$J(u) := \nabla_u r(u) = \begin{cases}\operatorname{diag}\big(\operatorname{sign}(\operatorname{conj}(A_w u))\big)\,A_w; & \text{(AGM)}\\ \operatorname{diag}\big(\operatorname{conj}(A_w u)\big)\,A_w; & \text{(IGM)}\end{cases}$$
and taking $J^*(u)J(u)$ as an estimate of the Hessian matrix, one obtains the following iterative scheme:
$$\text{Gauss-Newton method: } u^{k+1} = u^k - \big(J^*(u^k)J(u^k)\big)^{-1}\nabla_u Q(u^k) = u^k - \big(J^*(u^k)J(u^k)\big)^{-1}J^*(u^k)\,r(u^k)\quad\forall k.\qquad(60)$$
Gao and Xu (2017) further proposed a globally convergent GN algorithm with resampling for the PR problem, in which partial phaseless data are used to reformulate the GN matrix and the gradient in each loop.
In practice, the Hessian matrix or the GN matrix cannot be guaranteed to be nonsingular. Hence, the LM method, interpreted as a regularized variant of GN, was proposed:
$$\text{LM method: } u^{k+1} = u^k - \big(J^*(u^k)J(u^k) + \mu_k\mathrm I\big)^{-1}J^*(u^k)\,r(u^k)\quad\forall k,\qquad(61)$$
with the adaptive parameter $\mu_k$. Clearly $\mu_k$ cannot be too large; otherwise, the Hessian information is useless. To obtain fast convergence, Marquardt (1963) proposed replacing $\mu_k\mathrm I$ in (61) by $\mu_k D_g\big(J^*(u^k)J(u^k)\big)$, with $D_g(A)$ denoting the diagonal matrix formed from the main diagonal of the matrix $A$. Yamashita and Fukushima (2001) and Fan and Yuan (2005) proposed the scheme depending on the objective function value below:
$$\mu_k = \big(Q(u^k)\big)^{\nu/2},\qquad(62)$$
with $\nu\in[1,2]$. Ma et al. (2018) further improved the scheme (62) by choosing a larger value when the iterative solution $u^k$ is far away from the global minimizer, i.e.,
$$\mu_k = \operatorname{Thresh}(u^k)\,\big(Q(u^k)\big)^{\nu/2},\qquad\text{with }\operatorname{Thresh}(u^k) = \begin{cases}\tau, & \text{if } Q(u^k)\ge c_0\|u^k\|^2,\\ 1, & \text{otherwise},\end{cases}$$
with $M(g,f)$ defined in (14). Following the same manner with alternating minimization, one can derive the second-order algorithm for the blind problem by applying the GN or LM iteration (60)-(61) alternately to the subproblems in $w$ and in $u$.
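As a toy illustration of one LM step (61) for the nonblind sample subproblem with the AGM residual, consider a small dense matrix `Aw` standing in for $A_w$; the dense formulation and the variable names are purely for exposition and would not scale to imaging-sized problems.

```python
# Toy dense sketch of one Levenberg-Marquardt step (61), AGM residual.
import numpy as np

def lm_step(u, Aw, f, mu):
    Au = Aw @ u
    r = np.abs(Au) - np.sqrt(f)                        # AGM residual r(u)
    phase = np.exp(1j * np.angle(Au))                  # sign(A_w u)
    J = np.diag(np.conj(phase)) @ Aw                   # Jacobian, cf. the AGM case
    g = J.conj().T @ r                                 # gradient J^* r
    H = J.conj().T @ J + mu * np.eye(Aw.shape[1])      # damped GN matrix
    return u - np.linalg.solve(H, g)
```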
Subspace Method
The subspace method (Saad 2003; Xin et al. 2021) is a powerful class of algorithms that iteratively refines the variable within a subspace of the solution space; it includes the Krylov subspace methods, such as the well-known conjugate gradient method, domain decomposition methods, and multigrid methods. It originally focused on solving linear equations or least squares problems and has now been successfully extended to nonlinear optimization problems of the form $\min_x f(x)$.
By the nonlinear conjugate gradient (NLCG) algorithm, the iterative scheme can be given below:
$$x^{k+1} = x^k + \alpha^k d^k;\qquad d^k = -\nabla_x f(x^k) + \beta^{k-1}d^{k-1}\quad\forall k\ge 1,\qquad(64)$$
with the stepsize $\alpha^k$ and weight $\beta^{k-1}$, where $d^k$ is the search direction. One may notice that the search direction $d^k$ in NLCG is the combination of the gradient and the previous search direction $d^{k-1}$ with the weight $\beta^{k-1}$. To get optimal parameters, the stepsize $\alpha^k$ is selected by monotone line search procedures, while the weight is determined based on the gradients $\nabla_x f(x^{k-1})$, $\nabla_x f(x^k)$ and the search direction $d^{k-1}$ (typically via one of five different formulas (Xin et al. 2021)).
The NLCG has been successfully applied to the BPR problem (Thibault and Guizar-Sicairos 2012; Qian et al. 2014). For example, Thibault and Guizar-Sicairos (2012) adopted the NLCG to solve the CDI problem. The iterative scheme reads
$$(w^{k+1}, u^{k+1}) = (w^k, u^k) + \alpha^k\Delta^k;\qquad \Delta^k = -g^k + \beta^{k-1}\Delta^{k-1}\quad\forall k\ge 1,\qquad(65)$$
where $g^k$ denotes the gradient of the metric with respect to $(w,u)$ at the $k$th iterate and
$$\beta^{k-1} = \frac{\langle g^k, g^k\rangle - \Re\langle g^k, g^{k-1}\rangle}{\langle g^{k-1}, g^{k-1}\rangle}.$$
Since $A$ is bilinear, the objective along the search direction can be expanded (or well approximated) as a polynomial of degree eight in the stepsize,
$$M\big(|A(w^k + \alpha\Delta_w^k,\ u^k + \alpha\Delta_u^k)|^2,\, f\big) \approx \sum_{t=0}^{8} c_t\alpha^t,$$
so that the stepsize is chosen as $\alpha^k := \arg\min_\alpha \sum_{t=0}^8 c_t\alpha^t$.
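The NLCG recursion (64)-(65) with the weight formula above can be sketched as follows; `grad` and `line_search` are placeholder callables (e.g., the degree-eight polynomial fit just described), introduced only for illustration.

```python
# Sketch of one NLCG step, cf. (64)-(65), with a Polak-Ribiere-type weight.
import numpy as np

def nlcg_step(x, d_prev, g_prev, grad, line_search):
    g = grad(x)                                   # stacked gradient in (w, u)
    beta = np.real(np.vdot(g, g - g_prev)) / np.real(np.vdot(g_prev, g_prev))
    d = -g + beta * d_prev                        # new search direction
    alpha = line_search(x, d)                     # e.g., polynomial fit above
    return x + alpha * d, d, g
```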
Another subspace-type approach is the overlapping domain decomposition (DD) method of Chang et al. (2021), which splits the scanned domain $\Omega$ into two overlapping subdomains, i.e., $\Omega = \bigcup_{d=1}^{2}\Omega_d$.

Fig. 2 (a) Ptychography scan in the domain Ω (grid scan): the starting scan centers at point O and then moves up (or to the right) to the center point O′ (or O′′); (b) two-subdomain DD (totally 4 × 4 frames): the subdomains Ω1, Ω2 are generated by two 4 × 2 scans, and the overlapping region Ω1,2 = Ω1 ∩ Ω2
Then two groups of localized shift operators $\{S^d_{j_d}\}_{j_d=0}^{J_d-1}$, $d=1,2$, can be introduced, with $\sum_d J_d = J$. For the nonblind problem, denote the linear operators $A_1, A_2$ on the subdomains as
$$A_d u_d := \Big(\big(\mathcal F(w\circ S^d_0 u_d)\big)^T,\ \big(\mathcal F(w\circ S^d_1 u_d)\big)^T,\ \cdots,\ \big(\mathcal F(w\circ S^d_{J_d-1} u_d)\big)^T\Big)^T,\qquad(66)$$
where $\pi_{1,2}$ and $\pi_{2,1}$ denote the restriction operators from $\Omega_1$ and from $\Omega_2$ onto the overlapping region $\Omega_{1,2}$, respectively, and the subdomain data are given by
$$f_d := |A_d R_d u|^2,$$
with $R_d$ the restriction of the sample onto $\Omega_d$.
The two-subdomain DD model for the nonblind problem reads
$$\min_{u_1,u_2,v}\ \sum_{d=1}^2 G_\varepsilon(A_d u_d;\, f_d),\qquad \text{s.t. } \pi_{d,3-d}u_d - v = 0,\ d=1,2.\qquad(68)$$
In order to develop an iterative scheme without inner loops and with fast convergence for large-step scans, two auxiliary variables $z_1, z_2$ are introduced:
$$\min_{u_1,u_2,v,z_1,z_2}\ \sum_{d=1}^2 G_\varepsilon(z_d;\, f_d),\qquad \text{s.t. } \pi_{d,3-d}u_d - v = 0,\ A_d u_d - z_d = 0,\ d=1,2.\qquad(69)$$
Then it is quite standard to solve the resulting saddle point problem by ADMM. The details are omitted here; please see Chang et al. (2021) for more details.
For blind recovery, in order to reduce the grid pathology (Chang et al. 2019a) (the ambiguity arising from multiplying the true solution by any periodic function) due to the grid scan, a support constraint on the probe is introduced, i.e., $\mathcal O := \{w : (\mathcal Fw)(j) = 0,\ j\in\mathbb J\}$, where $\mathbb J$ is the index set of zero values of the Fourier transform of the probe and its complement $\bar{\mathbb J}$ is the support set. Then consider the blind ptychography problem for the two-subdomain DD:
$$\begin{aligned}
\min_{\{w,u_1,u_2,v\}}\ & \sum_{d=1}^2 G_\varepsilon\big(A_d(w,u_d);\, f_d\big) + I_{\mathcal O}(w),\\
\text{s.t. }\ & \pi_{d,3-d}u_d - v = 0,\ d=1,2,
\end{aligned}$$
with $(A_d)_{j_d}(w,u_d) := \mathcal F(w\circ S^d_{j_d}u_d)$, $\forall\, 0\le j_d\le J_d-1$, $\sum_{d=1}^2 J_d = J$, and the indicator function $I_{\mathcal O}$. To enable parallel computing, consider the following constrained optimization problem:
$$\min_{\{w,w_1,w_2,u_1,u_2,v,z_1,z_2\}}\ \sum_{d=1}^2 G_\varepsilon(z_d;\, f_d) + I_{\mathcal O}(w),$$
subject to consensus constraints coupling $w, w_1, w_2$ and $u_1, u_2, v$ and the bilinear constraints linking $z_d$ to the subdomain operators, analogously to (69); see Chang et al. (2021) for the precise formulation.
For the nonblind problem, a multigrid approach (Fung and Wendy 2020) considers the reformulation
$$\min_u\ \sum_j\Big\|\mathcal F^*\big(\sqrt{f_j}\circ\operatorname{sign}(\mathcal F(w\circ S_j u))\big) - w\circ S_j u\Big\|^2,\qquad(70)$$
using the AGM metric. A multigrid optimization framework based on the full approximation scheme (FAS) was then further developed, where the coarse-grid subproblem is interpreted as a first-order approximation to the fine-grid problem. However, it is unclear how to extend the current algorithm to the blind problem.
Discussions
Experimental Issues
Probe Drift Probe drift happens in ptychography when the data are very noisy: the mass center of the iterated probe eventually touches the boundary, so that the iterative algorithms fail. Hence, the joint reconstruction can cause instability of the iterative algorithms on noisy experimental data. One simple strategy proposed by Marchesini et al. (2016) is to periodically shift the probe to the mass center of the complex image. Another possible way is to impose a compact support condition on the probe, or to acquire an additional measurement of the probe by letting the light pass through vacuum, as in Marchesini et al. (2016) and Chang et al. (2019a). The related numerical stability should be investigated further; one can refer to Huang and Xu (2020, 2021) for nonblind PR.
Flat Samples When the sample is nearly flat (such as weakly absorbing or weakly scattering biological specimens with hard X-ray sources), the measured phaseless data lack sufficient diversity even with a very dense scan. In such cases, the iterative algorithms mentioned in this survey become slow, and the recovered image quality deteriorates. Acquiring a scattering map by linearization for large features of the sample (Dierolf et al. 2010b) or modeling with the additional Kramers-Kronig relation (KKR) (Hirose et al. 2017) has been exploited to improve the reconstruction quality. Besides, pairwise relations between adjacent frames were considered in Marchesini and Wu (2014) to accelerate projection algorithms for flat samples.
High-Dimensional Problems The formulas for all four cases of BPR hold for a thin (2D) object under the paraxial approximation. For thick samples, the linear propagation model in (7) causes obvious errors, and one has to consider nonlinear transforms as in Dierolf et al. (2010a). Besides 3D imaging, high-dimensional problems may also result from spectromicroscopy (Maiden et al. 2013), multimode decomposition of partial coherence (Thibault and Menzel 2013; Chang et al. 2018a), and dichroic ptychography (Chang et al. 2020; Lo et al. 2021). Such strong nonlinearity, coupled with high-dimensional optimization, makes it difficult to design stable and high-throughput algorithms.
Theoretical Analysis
Further Discussions
Recently, some efficient algorithms have been developed for nonblind PR, such as second-order algorithms (Ma et al. 2018; Gao and Xu 2017) and the multigrid method (Fung and Wendy 2020); however, it is not clear how they can be applied to the blind problem. Hence, we only list the algorithms for the BPR problem included in this survey; see the overview in Table 1.
We now discuss the advantages and disadvantages of the listed algorithms. As the only convex method, convex programming (Ahmed et al. 2018) provides a convex relaxation and can therefore attain the global minimizer. However, the dimension of the lifted matrices is much higher than that of the original variables, leading to iterative algorithms with high complexity, so it seems impractical for real experimental data. Moreover, it is limited to convolutional PR, a special case of BPR, since it relies on the structure of (50). All the other listed algorithms, designed for the nonconvex optimization problem, work well for ideal data (small scan stepsizes to guarantee enough redundancy, and long exposures with high signal-to-noise ratio (SNR)). The AP, ePIE-type, proximal, and ADMM algorithms are first-order methods with closed-form expressions for all iterative steps, and they have been efficiently implemented on practical ptychography and Fourier ptychography imaging instruments with low computational complexity. As reported in Chang et al. (2019a), the ePIE algorithm may become unstable for noisy measurements and seems more sensitive to the scan stepsizes for ptychography imaging, while the ADMM algorithm (Chang et al. 2019a) offers promising performance even for noisy and insufficient data. The second-order algorithms utilizing the Hessian usually require much more computational cost, which can be reduced by Gauss-Newton-type approximations.
Conclusions
In this survey, a short review of iterative algorithms for the nonlinear optimization problems arising from BPR is provided, mainly consisting of three types of algorithms: first-order operator-splitting algorithms, second-order algorithms, and subspace methods. There still exist sophisticated experimental issues and challenging theoretical questions, which are discussed in the last part. Learning-based methods have become a powerful tool for solving inverse problems and PR problems, but they are not included in this survey.
This survey focuses on the BPR problems with the forms expressed in (6). However, not every BPR problem belongs to the categories of (6). Very recently, a resolution-enhanced parallel coded ptychography (CP) technique (Jiang et al. 2021, 2022) was reported, which achieves the highest numerical aperture. With the sample $u$ and the transmission profile $w$ of the engineered surface, the phaseless data are generated as
$$f_j^{CP} = \big|w\circ(S_j u \ast \kappa_1)\ast\kappa_2\big|^2,$$
with $\kappa_1, \kappa_2$ two known PSFs. Such advanced cases should be further investigated.
Table 1 Overview of all iterative algorithms for the blind phase retrieval (BPR) problem in this survey. "Y" and "N" are short for "yes" and "no", respectively

| Name | Refs | Convex (Y/N) | Convergence (Y/N) |
| Alternating projection | Thibault et al. (2009); Marchesini et al. (2016) | N | N |
| ePIE-type algorithms | Maiden and Rodenburg (2009); Maiden et al. (2017) | N | N |
| Proximal algorithms | Hesse et al. (2015); Yan (2020) | N | Y (Hesse et al. 2015) |
| ADMM | Chang et al. (2019a); Fannjiang and Zhang (2020) | N | Y (Chang et al. 2019a) |
| Convex programming | Ahmed et al. (2018) | Y | Y |
| Second-order algorithms | Yeh et al. (2015); Kandel et al. (2021) | N | N |
| Subspace method | Thibault and Guizar-Sicairos (2012); Qian et al. (2014); Chang et al. (2021) | N | Y (Chang et al. 2021) |
Acknowledgments The work of the first author was partially supported by the NSFC (Nos.
11871372, 11501413) and Natural Science Foundation of Tianjin (18JCYBJC16600). The authors
would like to thank Prof. Guoan Zheng for the helpful discussions.
References
Ahmed, A., Aghasi, A., Hand, P.: Blind deconvolutional phase retrieval via convex programming
(2018). NeurIPS (arXiv:1806.08091)
Bendory, T., Sidorenko, P., Eldar, Y.C.: On the uniqueness of FROG methods. IEEE Sig. Process.
Lett. 24(5), 722–726 (2017)
Bendory, T., Edidin, D., Eldar, Y.C.: Blind phaseless short-time fourier transform recovery. IEEE
Trans. Inf. Theory 66(5), 3232–3241 (2019)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex
and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Borzi, A., Schulz, V.: Multigrid methods for pde optimization. SIAM Rev. 51(2), 361–395 (2009)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1),
1–122 (2011)
Brandt, A., Livne, O.E.: Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamics,
Revised Edition. SIAM, Philadelphia (2011)
Cai, J.-F., Huang, M., Li, D., Wang, Y.: The global landscape of phase retrieval II: quotient intensity
models (2021). arXiv preprint arXiv:2112.07997
Candes, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via wirtinger flow: Theory and algorithms.
IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)
Chang, H., Tai, X.-C., Wang, L.-L., Yang, D.: Convergence rate of overlapping domain
decomposition methods for the Rudin-Osher-Fatemi model based on a dual formulation. SIAM
J. Imaging Sci. 8, 564–591 (2015)
Chang, H., Lou, Y., Ng, M.K., Zeng, T.: Phase retrieval from incomplete magnitude information
via total variation regularization. SIAM J. Sci. Comput. 38(6), A3672–A3695 (2016)
Chang, H., Enfedaque, P., Lou, Y., Marchesini, S.: Partially coherent ptychography by gradient
decomposition of the probe. Acta Crystallogr. Sect. A: Found. Adv. 74(3), 157–169 (2018a)
Chang, H., Lou, Y., Duan, Y., Marchesini, S.: Total variation–based phase retrieval for Poisson
noise removal. SIAM J. Imaging Sci. 11(1), 24–55 (2018b)
Chang, H., Marchesini, S., Lou, Y., Zeng, T.: Variational phase retrieval with globally convergent
preconditioned proximal algorithm. SIAM J. Imaging Sci. 11(1), 56–93 (2018c)
Chang, H., Enfedaque, P., Marchesini, S.: Blind ptychographic phase retrieval via convergent
alternating direction method of multipliers. SIAM J. Imaging Sci. 12(1), 153–185 (2019a)
Chang, H., Enfedaque, P., Zhang, J., Reinhardt, J., Enders, B., Yu, Y.-S., Shapiro, D., Schroer,
C.G., Zeng, T., Marchesini, S.: Advanced denoising for x-ray ptychography. Opt. Express
27(8), 10395–10418 (2019b)
Chang, H., Marcus, M.A., Marchesini, S.: Analyzer-free linear dichroic ptychography. J. Appl.
Crystallogr. 53(5), 1283–1292 (2020)
Chang, H., Glowinski, R., Marchesini, S., Tai, X.-C., Wang, Y., Zeng, T.: Overlapping domain
decomposition methods for ptychographic imaging. SIAM J. Sci. Comput. 43(3), B570–B597
(2021)
Chapman, H.N.: Phase-retrieval x-ray microscopy by wigner-distribution deconvolution. Ultrami-
croscopy 66(3), 153–172 (1996)
Chen, Y., Candes, E.: Solving random quadratic systems of equations is nearly as easy as solving
linear systems. In: Advances in Neural Information Processing Systems, pp. 739–747 (2015)
Chen, P., Fannjiang, A.: Fourier phase retrieval with a single mask by douglas–rachford algorithms.
Appl. Comput. Harmon. Anal. 44(3), 665-699 (2016)
Dierolf, M., Menzel, A., Thibault, P., Schneider, P., Kewish, C.M., Wepf, R., Bunk, O., Pfeiffer,
F.: Ptychographic x-ray computed tomography at the nanoscale. Nature 467(7314), 436–439
(2010a)
Dierolf, M., Thibault, P., Menzel, A., Kewish, C.M., Jefimovs, K., Schlichting, I., von König,
K., Bunk, O., Pfeiffer, F.: Ptychographic coherent diffractive imaging of weakly scattering
specimens. New J. Phys. 12(3), 035017 (2010b)
Elser, V.: Phase retrieval by iterated projections. J. Opt. Soc. Am. A 20(1), 40–55 (2003)
Elser, V., Lan, T.-Y., Bendory, T.: Benchmark problems for phase retrieval. SIAM J. Imaging Sci.
11(4), 2429–2455 (2018)
Enfedaque, P., Chang, H., Enders, B., Shapiro, D., Marchesini, S.: High performance partial
coherent x-ray ptychography. In: International Conference on Computational Science, pp. 46–
59. Springer (2019)
Fan, J.-Y., Yuan, Y.-X.: On the quadratic convergence of the levenberg-marquardt method without
nonsingularity assumption. Computing 74(1), 23–39 (2005)
Fannjiang, A.: Raster grid pathology and the cure. Multiscale Model. Simul. 17(3), 973–995
(2019)
Fannjiang, A., Strohmer, T.: The numerics of phase retrieval. Acta Numer. 29, 125–228 (2020)
Fannjiang, A., Zhang, Z.: Fixed point analysis of douglas–rachford splitting for ptychography and
phase retrieval. SIAM J. Imaging Sci. 13(2), 609–650 (2020)
Fung, S.W., Wendy, Z.: Multigrid optimization for large-scale ptychographic phase retrieval.
SIAM J. Imaging Sci. 13(1), 214–233 (2020)
Gao, B., Xu, Z.: Phaseless recovery using the Gauss–Newton method. IEEE Trans. Sig. Process.
65(22), 5885–5896 (2017)
Gao, B., Wang, Y., Xu, Z.: Solving a perturbed amplitude-based model for phase retrieval. IEEE
Trans. Sig. Process. 68, 5427–5440 (2020)
Godard, P., Allain, M., Chamard, V., Rodenburg, J.: Noise models for low counting rate coherent
diffraction imaging. Opt. Express 20(23), 25914–25934 (2012)
Grohs, P., Koppensteiner, S., Rathmair, M.: Phase retrieval: uniqueness and stability. SIAM Rev.
62(2), 301–350 (2020)
Guizar-Sicairos, M., Fienup, J.R.: Phase retrieval with transverse translation diversity: a nonlinear
optimization approach. Opt. Express 16(10), 7264–7278 (2008)
Guizar-Sicairos, M., Johnson, I., Diaz, A., Holler, M., Karvinen, P., Stadler, H.-C., Dinapoli, R.,
Bunk, O., Menzel, A.: High-throughput ptychography using eiger: scanning x-ray nano-imaging
of extended regions. Opt. Express 22(12), 14859–14870 (2014)
Gürsoy, D., Chen-Wiegart, Y.-C.K., Jacobsen, C.: Lensless x-ray nanoimaging: revolutions and
opportunities. IEEE Sig. Process. Mag. 39(1), 44–54 (2022)
Hackbusch, W.: Multi-grid Methods and Applications, Springer, Berlin, Heidelberg (1985)
Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental
algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013)
Hesse, R., Luke, D.R., Sabach, S., Tam, M.K.: Proximal heterogeneous block implicit-explicit
method and application to blind ptychographic diffraction imaging. SIAM J. Imaging Sci. 8(1),
426–457 (2015)
Hirose, M., Shimomura, K., Burdet, N., Takahashi, Y.: Use of Kramers-Kronig relation in
phase retrieval calculation in x-ray spectro-ptychography. Opt. Express 25(8), 8593–8603
(2017)
Huang, M., Xu, Z.: The estimation performance of nonlinear least squares for phase retrieval.
IEEE Trans. Inf. Theory 66(12), 7967–7977 (2020)
Huang, M., Xu, Z.: Uniqueness and stability for the solution of a nonlinear least squares problem
(2021). arXiv preprint arXiv:2104.10841
Huang, X., Yan, H., Harder, R., Hwu, Y., Robinson, I.K., Chu, Y.S.: Optimization of overlap
uniformness for ptychography. Opt. Express 22(10), 12634–12644 (2014)
Huang, Y., Jiang, S., Wang, R., Song, P., Zhang, J., Zheng, G., Ji, X., Zhang, Y.: Ptychography-
based high-throughput lensless on-chip microscopy via incremental proximal algorithms. Opt.
Express 29(23), 37892–37906 (2021)
Jaganathan, K., Eldar, Y.C., Hassibi, B.: Stft phase retrieval: uniqueness guarantees and recovery
algorithms. IEEE J. Sel. Top. Sig. Process. 10(4), 770–781 (2016)
Jiang, S., Guo, C., Song, P., Zhou, N., Bian, Z., Zhu, J., Wang, R., Dong, P., Zhang, Z., Liao, J.
et al.: Resolution-enhanced parallel coded ptychography for high-throughput optical imaging.
ACS Photon. 8(11), 3261–3271 (2021)
Jiang, S., Guo, C., Bian, Z., Wang, R., Zhu, J., Song, P., Hu, P., Hu, D., Zhang, Z., Hoshino, K. et al.:
Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal
resolution. Biosens. Bioelectron. 196, 113699 (2022)
Kandel, S., Maddali, S., Nashed, Y.S., Hruszkewycz, S.O., Jacobsen, C., Allain, M.: Efficient
ptychographic phase retrieval via a matrix-free levenberg-marquardt algorithm. Opt. Express
29(15), 23019–23055 (2021)
Kane, D.J., Vakhtin, A.B.: A review of ptychographic techniques for ultrashort pulse measurement.
Prog. Quantum Electron. 81, 100364 (2021)
Langer, A., Gaspoz, F.: Overlapping domain decomposition methods for total variation denoising.
SIAM J. Numer. Anal. 57(3), 1411–1444 (2019)
Lee, C.-O., Park, E.-H., Park, J.: A finite element approach for the dual Rudin–Osher–Fatemi
model and its nonoverlapping domain decomposition methods. SIAM J. Sci. Comput. 41(2),
B205–B228 (2019)
Lo, Y.H., Zhou, J., Rana, A., Morrill, D., Gentry, C., Enders, B., Yu, Y.-S., Sun, C.-Y., Shapiro,
D.A., Falcone, R.W., Kapteyn, H.C., Murnane, M.M., Gilbert, P.U.P.A., Miao, J.: X-ray linear
dichroic ptychography. Proc. Natl. Acad. Sci. 118(3), 2019068118 (2021)
Luke, D.R.: Relaxed averaged alternating reflections for diffraction imaging. Inverse Probl. 21(1),
37–50 (2005)
Ma, C., Liu, X., Wen, Z.: Globally convergent levenberg-marquardt method for phase retrieval.
IEEE Trans. Inf. Theory 65(4), 2343–2359 (2018)
Maiden, A.M., Rodenburg, J.M.: An improved ptychographical phase retrieval algorithm for
diffractive imaging. Ultramicroscopy 109(10), 1256–1262 (2009)
Maiden, A., Morrison, G., Kaulich, B., Gianoncelli, A., Rodenburg, J.: Soft x-ray spectromi-
croscopy using ptychography with randomly phased illumination. Nat. Commun. 4, 1669
(2013)
Maiden, A., Johnson, D., Li, P.: Further improvements to the ptychographical iterative engine.
Optica 4(7), 736–745 (2017)
Marchesini, S.: Invited article: a unified evaluation of iterative projection algorithms for phase
retrieval. Rev. Sci. Instrum. 78(1), 011301 (2007)
Marchesini, S., Wu, H.-T.: Rank-1 accelerated illumination recovery in scanning diffractive
imaging by transparency estimation (2014). arXiv preprint arXiv:1408.1922
Marchesini, S., Schirotzek, A., Yang, C., Wu, H.-T., Maia, F.: Augmented projections for
ptychographic imaging. Inverse Probl. 29(11), 115009 (2013)
Marchesini, S., Tu, Y.-C., Wu, H.-T.: Alternating projection, ptychographic imaging and phase
synchronization. Appl. Comput. Harmon. Anal. 41(3), 815-851 (2015)
Marchesini, S., Krishnan, H., Shapiro, D.A., Perciano, T., Sethian, J.A., Daurer, B.J., Maia, F.R.:
SHARP: a distributed, GPU-based ptychographic solver. J. Appl. Crystallogr. 49(4), 1245–1252
(2016)
Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind.
Appl. Math. 11(2), 431–441 (1963)
Nash, S.G.: A multigrid approach to discretized optimization problems. Optim. Methods Softw.
14(1–2), 99–116 (2000)
Nashed, Y.S., Vine, D.J., Peterka, T., Deng, J., Ross, R., Jacobsen, C.: Parallel ptychographic
reconstruction. Opt. Express 22(26), 32082–32097 (2014)
Odstrčil, M., Menzel, A., Guizar-Sicairos, M.: Iterative least-squares solver for generalized
maximum-likelihood ptychography. Opt. Express 26(3), 3108–3123 (2018)
Ou, X., Zheng, G., Yang, C.: Embedded pupil function recovery for fourier ptychographic
microscopy. Opt. Express 22(5), 4960–4972 (2014)
Pfeiffer, F.: X-ray ptychography. Nat. Photon 12, 9–17 (2018)
Qian, J., Yang, C., Schirotzek, A., Maia, F., Marchesini, S.: Efficient algorithms for ptychographic
phase retrieval. Inverse Probl. Appl. Contemp. Math. 615, 261–280 (2014)
Qu, Q., Zhang, Y., Eldar, Y.C., Wright, J.: Convolutional phase retrieval. In: Proceedings of
the 31st International Conference on Neural Information Processing Systems, pp. 6088–6098
(2017)
Qu, Q., Zhang, Y., Eldar, Y.C., Wright, J.: Convolutional phase retrieval via gradient descent. IEEE
Trans. Inf. Theory 66(3), 1785–1821 (2019)
Reinhardt, J., Hoppe, R., Hofmann, G., Damsgaard, C.D., Patommel, J., Baumbach, C., Baier, S.,
Rochet, A., Grunwaldt, J.-D., Falkenberg, G., Schroer, C.G.: Beamstop-based low-background
ptychography to image weakly scattering objects. Ultramicroscopy 173, 52–57 (2017)
Rodenburg, J.M.: Ptychography and related diffractive imaging methods. Adv. Imaging Electron
Phys. 150, 87–184 (2008)
Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied
Mathematics (2003)
Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with
application to optical imaging: a contemporary overview. Sig. Process. Mag. IEEE 32(3), 87–
109 (2015)
Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International
Symposium on Information Theory (ISIT), pp. 2379–2383. IEEE (2016)
Thibault, P., Guizar-Sicairos, M.: Maximum-likelihood refinement for coherent diffractive
imaging. New J. Phys. 14(6), 063004 (2012)
Thibault, P., Menzel, A.: Reconstructing state mixtures from diffraction measurements. Nature
494(7435), 68–71 (2013)
Thibault, P., Dierolf, M., Bunk, O., Menzel, A., Pfeiffer, F.: Probe retrieval in ptychographic
coherent diffractive imaging. Ultramicroscopy 109(4), 338–343 (2009)
Trebino, R., DeLong, K.W., Fittinghoff, D.N., Sweetser, J.N., Krumbügel, M.A., Richman, B.A.,
Kane, D.J.: Measuring ultrashort laser pulses in the time-frequency domain using frequency-
resolved optical gating. Rev. Sci. Instrum. 68(9), 3277–3295 (1997)
Wang, C., Xu, Z., Liu, H., Wang, Y., Wang, J., Tai, R.: Background noise removal in x-ray
ptychography. Appl. Opt. 56(8), 2099–2111 (2017)
Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and
ptychographic phase retrieval. Inverse Probl. 28(11), 115010 (2012)
Wu, C., Tai, X.-C.: Augmented Lagrangian method, dual methods and split-Bregman iterations for
ROF, vectorial TV and higher order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Xin, L., Zaiwen, W., Ya-Xiang, Y.: Subspace methods for nonlinear optimization. CSIAM Trans.
Appl. Math. 2(4), 585–651 (2021)
Xu, J., Zikatanov, L.: Algebraic multigrid methods. Acta Numer. 26, 591–721 (2017)
Xu, J., Tai, X.-C., Wang, L.-L.: A two-level domain decomposition method for image restoration.
Inverse Probl. Imaging 4(3), 523–545 (2010)
Yamashita, N., Fukushima, M.: On the rate of convergence of the levenberg-marquardt method.
In: Alefeld, G., Chen, X. (eds.) Topics in Numerical Analysis, pp. 239–249. Springer, Vienna
(2001)
Yan, H.: Ptychographic phase retrieval by proximal algorithms. New J. Phys. 22(2), 023035 (2020)
Yeh, L.-H., Dong, J., Zhong, J., Tian, L., Chen, M., Tang, G., Soltanolkotabi, M., Waller, L.:
Experimental robustness of fourier ptychography phase retrieval algorithms. Opt. Express
23(26), 33214–33240 (2015)
Zheng, G., Horstmeyer, R., Yang, C.: Wide-field, high-resolution fourier ptychographic
microscopy. Nat. Photon. 7, 739–745 (2013)
Zheng, G., Shen, C., Jiang, S., Song, P., Yang, C.: Concept, implementations and applications of
fourier ptychography. Nat. Rev. Phys. 3(3), 207–223 (2021)
5  Modular ADMM-Based Strategies for Optimized Compression, Restoration, and Distributed Representations of Visual Data
Contents
Introduction 176
Modular ADMM-Based Optimization: General Construction and Guidelines 178
  Unconstrained Lagrangian Optimizations via ADMM 178
  Employing Black-Box Modules 180
  Another Splitting Structure 181
Image Restoration Based on Denoising Modules 183
Modular Optimizations Based on Standard Compression Techniques 185
  Preliminaries: Lossy Compression via Operational Rate-Distortion Optimization 185
  Restoration by Compression 189
  Modular Strategies for Intricate Compression Problems 191
Distributed Representations Using Black-Box Modules 198
  The General Framework 198
  Modular Optimizations for Holographic Compression of Images 199
Conclusion 202
References 205
Abstract
Y. Dar ()
Electrical and Computer Engineering Department, Rice University, Houston, TX, USA
e-mail: [email protected]
A. M. Bruckstein ()
Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel
e-mail: [email protected]
Keywords
Introduction
During the last several decades, significant attention and efforts were invested
in establishing solutions for a wide variety of imaging problems. The proposed
methods often rely on models and techniques adapted to visual signals and
the relevant problem settings. Naturally, alongside the contemporary challenges and
open questions of the field, there are excellent solutions to various fundamental
problems that have been extensively studied over the years. This situation suggests
addressing currently open problems by exploring their relations to existing methods
developed for basic tasks.
A lot of work has been devoted to fundamental problems such as denoising of a
single image contaminated by additive white Gaussian noise and lossy compression
of still images with respect to squared errors as quality assessment measures.
Persistent and thorough studies of such basic problems (in their classical settings)
led to excellent solutions that are believed to be nearly perfect (see, e.g., Chatterjee
and Milanfar 2009). However, the techniques for many other imaging tasks are at
various degrees of maturity, leaving room for possibly considerable improvements.
Examples of currently active research lines include jointly addressing
multiple imaging tasks (Burger et al. 2018; Corona et al. 2019a,b; Dar et al.
2018a,b,c,d), restoration with uncertainty about the degradation operator (Lai et al.
2016; Bahat et al. 2017), image compression with respect to modern perceptual
quality measures (Ballé et al. 2017; Laparra et al. 2017), and tasks (also fundamental
ones such as denoising and compression) involving visual data beyond a single
natural image (this includes video, hyperspectral, medical, etc.).
In this chapter, we overview a recent and fascinating approach for elegant
utilization of existing knowledge and available imaging tools for complex problems
of interest. The general idea is to define an optimization problem such that when
where t denotes the iteration index, u(t) ∈ RN is the scaled dual variable, and β is
an auxiliary parameter introduced by the augmented Lagrangian. Then, the ADMM
form of the problem is derived by applying one iteration of alternating minimization
on (3), yielding a series of simpler optimizations:
v̂(t) = argmin_{v ∈ M} { R(v) + (β/2) ‖v − z̃(t)‖₂² }    (5)

ẑ(t) = argmin_{z ∈ R^N} { λ D(x, z) + (β/2) ‖z − ṽ(t)‖₂² }    (6)

u(t+1) = u(t) + v̂(t) − ẑ(t)    (7)
where z̃(t) = ẑ(t−1) − u(t) and ṽ(t) = v̂(t) + u(t) . Importantly, in the last ADMM-
based structure, the possibly nontrivial domain M and the related function R
are decoupled from the second, perhaps intricate, function D. Accordingly, the
new subtasks in the process are much simpler. Specifically, note that (6) is a
continuous optimization problem over RN , regardless of the original domain of
problem (1), which may even be discrete. Note that in the general case, where
R, D, and M can induce non-convexity and discreteness to the problem, there
are no convergence guarantees corresponding to the ADMM process formulated
above, and its usefulness should be evaluated empirically. However, this common
practice has already provided many useful methods for various applications, and
selected examples of those are presented in sections “Image Restoration Based on
Denoising Modules” and “Modular Optimizations Based on Standard Compression
Techniques”.
While the ADMM form in (5), (6), and (7) indeed seems easier to carry out than a
complex instance of (1), the explicit definition and deployment of M and/or R in
the optimization stage (5) may still require some engineering efforts (such as design,
implementation, etc.). In the case of restoration tasks, this means detailed definitions
and implementations of regularization functions. For compression architectures,
one should establish binary compressed representations matching signal-domain
reconstructions. As explained next, the fundamental idea of using black-box
modules is to avoid explicit treatment of such details and still achieve excellent,
or even state-of-the-art, results with respect to the actual goal.
The main guideline when addressing a problem based on modular optimization
strategies is to formulate the initial optimization problem (in our case, an instance
of (1)) and choose an iterative optimization technique (here, ADMM) such that the
resulting sequential process includes:
fundamental problem (e.g., denoising, compression), then one can replace the
direct treatment of (5) with the application of a module addressing the same basic
problem – possibly based on another formulation or even an algorithm that does
not correspond to an explicit mathematical expression. Such a module is applied
as a black box and denoted here as

  v̂(t) = BlackBoxModule( z̃(t) ; θ_β )    (8)

where θ_β (which will be denoted from now on as θ) is a parameter generalizing
the role of the Lagrange multiplier β in determining the implicit trade-off
between the components that appeared in (5) before the replacement with the module.
The generic method is summarized in Algorithm 1, where the number of
parameters is reduced based on the relation β̃ ≜ β/(2λ), such that only the parameters
θ and β̃ are required as inputs for the method (for simplicity, we do not use the
fact that both θ and β̃ originally depend on β).
• A subproblem considering the distance function D while having a form that can
be practically solved. This refers here to subproblem (6). In many interesting
applications, the distance function is a particular case of

  D(x, z) = Σ_{j=1}^{K} α_j ‖A_j x − B_j z‖₂²    (9)

for some positive integer K, positive real values {α_j}_{j=1}^{K}, and matrices
A_j ∈ R^{Ñ×M}, B_j ∈ R^{Ñ×N}. Then, for the form (9), the optimization
step is a least-squares problem that can be easily addressed for many structures
of the matrices {A_j}_{j=1}^{K} and {B_j}_{j=1}^{K} (a minimal code sketch of the resulting
generic process is given right after this list).
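To make the above two building blocks concrete, the following minimal Python sketch (our illustration, not the authors' implementation; all names and default parameters are ours) assembles them into the generic Type-I process of (5)–(8): an abstract black-box module stands in for step (5) via (8), and the data step (6) is solved in closed form for the simplest K = 1 instance of (9), D(x, z) = ‖Ax − Bz‖₂².

```python
import numpy as np

def modular_admm_type1(x, black_box_module, A, B, lam=1.0, beta=1.0, n_iter=30):
    """Generic modular ADMM sketch: the module stands in for (5) via (8),
    the z-step solves (6) for D(x, z) = ||A x - B z||_2^2, and (7) updates the dual."""
    N = B.shape[1]
    z = np.zeros(N)              # initialization (zeros for simplicity)
    u = np.zeros(N)
    # normal equations of the z-step: (2*lam*B^T B + beta*I) z = 2*lam*B^T A x + beta*(v + u)
    G = 2.0 * lam * (B.T @ B) + beta * np.eye(N)
    rhs_const = 2.0 * lam * (B.T @ (A @ x))
    v = None
    for _ in range(n_iter):
        v = black_box_module(z - u)                          # (8): black-box replacement of step (5)
        z = np.linalg.solve(G, rhs_const + beta * (v + u))   # step (6) in closed form
        u = u + v - z                                        # step (7)
    return v                                                 # Type-I output: the last module output
```

Any routine mapping an N-dimensional signal to a cleaner (or compressed-and-decompressed) signal of the same dimension can be passed as black_box_module; its internal trade-off parameter plays the role of θ.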
One should note that the modular optimization process in Algorithm 1 provides
a result that is an output of the black-box module applied in the last iteration. This
eventual output can be the signal v̂(t) ∈ M produced by the module at the last
iteration and/or other relevant data possibly outputted by the module. This structure
is useful, for example, in the case of compression where the important output is
a binary compressed representation (i.e., a direct output of the module which is
coupled with the signal v̂(t) ∈ M). Various applications may benefit from an
alternative form of the process that is described next.
overviewed, here, we assume that the output domain of the basic optimization
problem (1) satisfies M = RN .
The alternative process stems from a delicate difference in the variable splitting
applied on the basic problem, namely, instead of (2), we write
v̂(t) = argmin_{v ∈ R^N} { λ D(x, v) + (β/2) ‖v − z̃(t)‖₂² }    (11)

ẑ(t) = argmin_{z ∈ R^N} { R(z) + (β/2) ‖z − ṽ(t)‖₂² }    (12)

u(t+1) = u(t) + v̂(t) − ẑ(t)    (13)
where z̃(t) = ẑ(t−1) − u(t) and ṽ(t) = v̂(t) + u(t) . Note that the current procedure
in (11), (12), and (13) includes the same subproblems as in (5), (6), and (7) but in a
different order (and also up to the setting M = RN used in this subsection).
Like in section “Employing Black-Box Modules”, we identify the stage con-
sidering the function R, here in (12), as a solution to a fundamental problem that
Algorithm 2 General Modular Optimization – Type II: Overall Results Are Not Module Outputs
1: Inputs: x, θ, β̃.
2: Initialize t = 0, ẑ(0) = x, u(1) = 0.
3: repeat
4:   t ← t + 1
5:   z̃(t) = ẑ(t−1) − u(t)
6:   v̂(t) = argmin_{v ∈ R^N} { D(x, v) + β̃ ‖v − z̃(t)‖₂² }
x = Hv0 + n, (14)
where p_R(z) ∝ exp(−R(z)) is the prior probability function assumed for the
clean signal and pη is the probability density function of an additive Gaussian noise
vector η with i.i.d. components having zero mean and 1/β variance. Accordingly,
the correspondence of (12) to denoising problems motivates the usage of black-box
denoisers as the modules applied at stage 8 of Algorithm 2. These denoisers should
be set to remove noise having variance of 1/β from the signal ṽ(t) . Importantly,
the substitution of (12) with applications of Gaussian denoisers was experimentally
shown to be beneficial also for denoisers that do not follow the MAP estimation form
or the regularized inverse problem approach. Specifically, one can even employ
algorithmic denoisers that were designed based on completely different mindsets.
The denoising-based restoration procedure for an arbitrary degradation operator H
is summarized in Algorithm 3.
The decoupling induced by the ADMM structure leads to an additional concep-
tual simplification: stage 6 of Algorithm 3 can be interpreted as an ℓ₂-constrained
deconvolution problem (or ℓ₂-regularized least squares computation) with respect
to the degradation operator H. Note that this is one of the simplest restoration
formulations addressing the degradation process (14) from the regularized inverse-
problem perspective. The corresponding analytic solution is
v̂(t) = (Hᵀ H + β̃ I)⁻¹ (Hᵀ x + β̃ z̃(t)).    (17)
practical convergence (Figs. 1d and 2d). See Chan et al. (2017) for details and
analysis of the convergence appearing here.
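For illustration, the sketch below (our own, not taken from Chan et al. 2017; the function names and the FFT-based variant's assumptions are ours) implements the least-squares stage (17) in two common ways: a dense solve for a generic H, and a pointwise Fourier-domain division when H is a circular 2-D blur, in which case HᵀH + β̃I is diagonalized by the FFT. Here psf_fft is assumed to be the 2-D FFT of the blur kernel, zero-padded to the image size and properly centered.

```python
import numpy as np

def ls_stage_dense(H, x, z_tilde, beta_t):
    """The closed form (17): (H^T H + beta_t I)^{-1} (H^T x + beta_t z_tilde), dense solve."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + beta_t * np.eye(n), H.T @ x + beta_t * z_tilde)

def ls_stage_circular_blur(psf_fft, x, z_tilde, beta_t):
    """Same computation when H is a circular 2-D convolution: (17) reduces to a
    pointwise division in the Fourier domain."""
    num = np.conj(psf_fft) * np.fft.fft2(x) + beta_t * np.fft.fft2(z_tilde)
    den = np.abs(psf_fft) ** 2 + beta_t
    return np.real(np.fft.ifft2(num / den))
```

Alternating either routine with an off-the-shelf Gaussian denoiser applied to ṽ(t) (BM3D in the experiments of Figs. 1 and 2) and the dual update mirrors the procedure summarized in Algorithm 3.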
C : RN → B, (18)
b = C (x) , (19)
Fig. 1 Deblurring using denoising-based Plug-and-Play method (Chan et al. 2017). The utilized
denoiser is BM3D (Dabov et al. 2007). The degradation includes a Gaussian blur (of 9 × 9 pixels
kernel and 1.75 standard deviation), followed by additive noise with σn = 10, applied on the House
image (256×256 pixels). (a) The original image. (b) Deteriorated image. (c) Restored image using
the method by Chan et al. (2017) (29.33 dB). (d) The PSNR evolution of the intermediate estimate
v̂(t) along the restoration-process iterations
v = F(b),    (20)
where
F : B → S    (21)
Fig. 2 Inpainting using denoising-based Plug-and-Play method (Chan et al. 2017). The employed
denoiser is BM3D (Dabov et al. 2007). The degradation includes 80% missing pixels and additive
noise with σn = 10, applied on the House image (256 × 256 pixels). (a) The original image. (b)
Deteriorated image. (c) Restored image using the method by Chan et al. (2017) (30.98 dB). (d) The
PSNR evolution of the intermediate estimate v̂(t) along the iterations
further processed or outputted to a user. For example, in the case of visual signals,
v is usually displayed.
Modern compression architectures (see, e.g., Ortega and Ramchandran 1998;
Sullivan and Wiegand 1998; Shukla et al. 2005; Sullivan et al. 2012) implement
the compression function C using operational rate-distortion optimizations, a tool
established by Shoham and Gersho (1988), Chou et al. (1989), and Ortega and
Ramchandran (1998), which can be explained using our notions as follows. A given
deterministic signal x is compressed based on an optimization process searching for
its best compressed representation b ∈ B, coupled with the decompressed signal
v ∈ S. The optimization trades off two opposing aspects of the representation: bit-
cost and reconstruction quality. The bit-cost of the binary representation b ∈ B
is its length. Since, by (20), each b ∈ B corresponds to one decompressed signal
v ∈ S, we define the bit-cost of a decompressed signal v ∈ S as the length of its
binary representation b = F⁻¹(v). We also define the function R_S(v) to evaluate
the bit-cost of the compressed binary representation associated with v. Specifically,
for v ∈ S that satisfies v = F(b), the bit-cost is

R_S(v) ≜ length{b},    (22)
where length {·} counts the length of a binary description. The second part of the
trade-off is the reconstruction distortion, D (x, v), evaluating the distance between
the compression input x and its decompressed form v. Note that the distortion value
is real and nonnegative. Then, the optimization task including bit-cost constraints,
corresponding to storage space or transmission bandwidth limitations, is
v̂ = argmin_{v ∈ S} D(x, v)  subject to  R_S(v) ≤ r    (23)

v̂ = argmin_{v ∈ S} R_S(v)  subject to  D(x, v) ≤ d    (24)
Restoration by Compression
R(z) = R_S(z) for z ∈ S, and R(z) = ∞ for z ∉ S,    (26)
Fig. 3 The deblurring experiment (settings #2 in Dar et al. 2018c) for the Cameraman image
(256 × 256 pixels). (a) The underlying image. (b) Degraded image (20.76 dB). (c) Restored image
using Algorithm 5 with JPEG2000 compression (28.10 dB). (d) Restored image using Algorithm 5
with HEVC compression (30.14 dB)
Fig. 4 The inpainting experiment (80% missing pixels) for the Barbara image (512 × 512
pixels). (a) The original image. (b) Deteriorated image. (c) Restored image using Algorithm 5
with JPEG2000 compression (24.83 dB). (d) Restored image using Algorithm 5 with HEVC
compression (28.80 dB)
(Dar et al. 2018a,b,d), where ADMM-based modular strategies are employed for
optimizing end-to-end performance of systems involving acquisition, compression,
and rendering stages. The main idea is to decouple unusual distortion metrics
from the actual compression tasks that, in turn, can be applied using black-
box compression modules (which are operated with respect to the elementary
squared-error metric). Hence, this methodology opens a new research path for
addressing complex compression problems including, for example, optimizations
for nonlocal processing/prediction architectures, enhancement filters or degradation
processes, and perceptual metrics assessing subjective quality of audio/visual
signals. Indeed, a successful implementation of this approach for perceptually
oriented image compression (using an alternating minimization procedure) was
proposed by Rott Shaham and Michaeli (2018).
In this section, we overview the System-Aware Compression concept (Dar et al.
2018a,b,d), demonstrating the main aspects of using modular optimizations for
intricate compression problems. The motivation for the System-Aware Compression
framework stems from a structure common to many imaging systems (see Fig. 5),
where a source signal is first acquired, then compressed for its storage or trans-
mission, and eventually decompressed and rendered back into a signal that can
be displayed or further processed. Obviously, in such systems, the quality of the
eventual output depends on the entire acquisition-rendering chain and not solely
on the lossy compression component. Yet, the employed compression technique is
often system independent, hence inducing suboptimal rate-distortion performance
for the entire system. The System-Aware Compression architecture is a practical
and modular way for optimizing the end-to-end performance (in its rate-distortion
trade-off sense) of such acquisition-rendering systems.
Fig. 5 The general imaging system structure motivating the System-Aware Compression approach
Let us describe the system structure considered for the mathematical development
of the method (Fig. 6). A source signal, an N-length column vector x ∈ R^N,
undergoes a linear processing represented by the M × N matrix A and is then
deteriorated by an additive white Gaussian noise vector n ∼ N(0, σ_n² I), resulting
in the signal

w = Ax + n    (29)
where w and n are M-length column vectors. We represent the lossy compression
procedure via the mapping C : RM → B from the M-dimensional signal domain
to a discrete set B of binary compressed representations (which may have different
lengths). The signal w is the input to the compression component of the system,
producing the compressed binary data b = C (w) that can be stored or transmitted
in an error-free manner. Then, on a device and in settings depending on the specific
application, the compressed data b ∈ B is decompressed to provide the signal v = F(b),
where F : B → S represents the decompression mapping from the binary
compressed representations in B to the corresponding decompressed signals in the
discrete set S ⊂ R^M. The decompressed signal v is further processed by the linear
operator denoted as the N × M matrix B, resulting in the system output signal
y = Bv, (30)
D_s(w, y) = (1/M) ‖w − Ay‖₂².    (31)
This metric conforms with the fact that if y is close to x, then, by (29), w will be
close to Ay up to the noise n. Indeed, for the ideal case of y = x, the metric (31)
becomes
D_s(w, x) = (1/M) ‖n‖₂² ≈ σ_n²    (32)
D_c(w, v) = (1/M) ‖w − ABv‖₂².    (33)
Since the operator B produces the output signal y, an ideal result will be y = PB x,
where PB is the matrix projecting onto B’s range. The corresponding ideal distortion
is
d₀ ≜ D_s(w, P_B x) = (1/M) ‖A(I − P_B)x + n‖₂².    (34)
We use the distortion metric (33) to constrain the bit-cost minimization in the
following rate-distortion optimization
v̂ = argmin_{v ∈ S} R(v)  subject to  d₀ ≤ (1/M) ‖w − ABv‖₂² ≤ d₀ + d    (35)
where R (v) evaluates the length of the binary compressed description of the decom-
pressed signal v and d ≥ 0 determines the allowed distortion. By (34), the value d0
depends on the operator A, the null space of B, the source signal x, and the noise
realization n. Since x and n are unknown, d0 cannot be accurately calculated in the
operational case (in Dar et al. (2018d) we formulate the expected value of d0 for
the case of a cyclo-stationary Gaussian source signal). We address the optimization
(35) using its unconstrained Lagrangian form
v̂ = argmin_{v ∈ S} { R(v) + (λ/M) ‖w − ABv‖₂² }    (36)
described in Algorithm 1, taking here the form of Algorithm 6. Note that we use
the form of Algorithm 1 where the eventual output is the output of the module
applied in the last iteration, which in our case corresponds to the output of the
compression module in the last iteration (this is the desired output because
this section considers a compression application, unlike the Restoration by
Compression method in Algorithm 4, whose purpose is restoration by means of
compression-based regularization). The interested reader is referred to Dar et al.
(2018d) for a rate-distortion theoretic analysis for cyclo-stationary Gaussian signals
and linear shift-invariant operators, explaining various aspects of the proposed
procedure.
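As a concrete illustration of this modular treatment of (36) (a sketch under our own assumptions: the codec interface, the dense operators, and the parameter defaults are hypothetical, and the listing does not reproduce Algorithm 6 from the source), a standard codec can be used as the black-box module while the data step absorbs the composite acquisition/rendering operator AB:

```python
import numpy as np

def system_aware_compression(w, codec, A, B, lam=1.0, beta=1.0, n_iter=15):
    """Sketch of a modular treatment of (36): 'codec' is an assumed black-box
    compress+decompress routine returning (decompressed signal, bitstream);
    the z-step handles the composite operator AB of the acquisition/rendering chain."""
    AB = A @ B                                     # per (29)-(30), w ~ AB v up to noise
    M = AB.shape[0]
    G = (2.0 * lam / M) * (AB.T @ AB) + beta * np.eye(AB.shape[1])
    rhs_const = (2.0 * lam / M) * (AB.T @ w)
    z = w.copy()
    u = np.zeros_like(w)
    v, bits = w.copy(), None
    for _ in range(n_iter):
        v, bits = codec(z - u)                                # black-box module applied to z-tilde
        z = np.linalg.solve(G, rhs_const + beta * (v + u))    # least-squares step for the fidelity in (36)
        u = u + v - z
    return bits, v              # compressed data and decompressed signal from the last iteration
```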
Fig. 7 Comparison of the System-Aware Compression approach and regular compression. The
settings consider a Gaussian blur operator degrading the decompressed image. The intermediate
and eventual images of the regular and the modular optimization process are presented. (a) Input.
(b) Regular Decompression (3.75 bpp). (c) Regular Degraded Decompression (34.32 dB). (d)
System-Aware Compression: Input to Last Iteration Compression. (e) System-Aware Compression:
Decompression (2.21 bpp). (f) System-Aware Compression: Degraded Decompression (41.84 dB)
applied on a sharpened version (see Fig. 7d) adjusted to the known blur operator;
then, the compressed image at bit-rate 2.21 bpp eventually results in a degraded
decompression with moderate blur effects (Fig. 7f) and a PSNR gain of 7.52 dB
with respect to the regular compression (which used even a higher bit-rate). See
Dar et al. (2018b) for extensive experimental evaluation including PSNR-bitrate
performance curves and comparison to additional alternatives. Furthermore, LCD
display degradations associated with motion blur are also examined by Dar et al.
(2018b).
Additional evaluations of the System-Aware Compression approach are provided
by Dar et al. (2018d) for video compression settings including acquisition degrada-
tion of low-pass filtering and subsampling and post-decompression nearest-neighbor
upsampling. In Dar et al. (2018a), the idea is demonstrated for a simplified model
of multimedia distribution networks, where a set of possible degradation operators
and their probabilities are considered by the optimized compression process.
All the above problems conduct optimizations for finding one signal (or compressed
representation) that minimizes a Lagrangian cost of interest. As shown, these tasks
are addressed very well by modular optimizations, relying on sequential black-box
module applications. In this section, we demonstrate that the modular optimization
approach is also useful for problems seeking a set of signals (or representations)
that collaboratively minimize a joint Lagrangian cost.
The following extends the settings and developments given in section “Uncon-
strained Lagrangian Optimizations via ADMM”. The general optimization form for
distributed representations broadens the single-representation problem in (1) to an
unconstrained Lagrangian form optimizing several signals, i.e.,
K
v̂1 , . . . , v̂K = argmin R (vi ) + λD x; v1 , . . . , vK (37)
v1 ,...,vK ∈M i=1
K
K
K
v̂i i=1
, ẑi i=1 = argmin R (vi ) + λD x; z1 , . . . , zK
i=1 ∈M i=1
{vi }K
(38)
{zi }K
i=1 ∈R
N
subject to vi = zi for i = 1, . . . , K
Then, the scaled form of the augmented Lagrangian and the method of multipliers
(Boyd et al. 2011, Ch. 2) renders (38) into the sequential process
({v̂ᵢ^(t)}_{i=1}^{K}, {ẑᵢ^(t)}_{i=1}^{K}) =
argmin_{ {vᵢ}_{i=1}^{K} ∈ M, {zᵢ}_{i=1}^{K} ∈ R^N } Σ_{i=1}^{K} R(vᵢ) + λ D(x; z₁, …, z_K) + (β/2) Σ_{i=1}^{K} ‖vᵢ − zᵢ + uᵢ^(t)‖₂²    (39)

uᵢ^(t+1) = uᵢ^(t) + v̂ᵢ^(t) − ẑᵢ^(t)  for i = 1, …, K    (40)
where t denotes the iteration index, {uᵢ^(t)}_{i=1}^{K} ⊂ R^N are the scaled dual variables,
and β is an auxiliary parameter originating from the augmented Lagrangian (note that
a single β is intentionally shared among all representations for the purpose of easing the parameter tuning
process). The corresponding ADMM process is obtained by applying one iteration
of alternating minimization on (39), leading to
v̂ᵢ^(t) = argmin_{vᵢ ∈ M} { R(vᵢ) + (β/2) ‖vᵢ − z̃ᵢ^(t)‖₂² }  for i = 1, …, K    (41)

ẑᵢ^(t) = argmin_{zᵢ ∈ R^N} { λ D(x; {ẑⱼ^(t)}_{j=1}^{i−1}, zᵢ, {ẑⱼ^(t−1)}_{j=i+1}^{K}) + (β/2) ‖zᵢ − ṽᵢ^(t)‖₂² }  for i = 1, …, K    (42)

uᵢ^(t+1) = uᵢ^(t) + v̂ᵢ^(t) − ẑᵢ^(t)  for i = 1, …, K    (43)
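The listing below is a rough sketch (ours) of how (41), (42), and (43) translate into code for K representations; it uses a generic L-BFGS solver for the zᵢ-updates purely for illustration, whereas practical instantiations exploit closed-form or specialized solvers, and the module and D callables (as well as their signatures) are assumptions supplied by the user.

```python
import numpy as np
from scipy.optimize import minimize

def distributed_modular_admm(x, module, D, K, lam=1.0, beta=1.0, n_iter=20):
    """Generic distributed modular process: (41) applies the black-box module per
    representation, (42) updates each z_i sequentially (Gauss-Seidel style),
    and (43) updates the duals. x and the representations are assumed 1-D arrays."""
    v = [x.copy() for _ in range(K)]
    z = [x.copy() for _ in range(K)]
    u = [np.zeros_like(x) for _ in range(K)]
    for _ in range(n_iter):
        for i in range(K):                          # (41)
            v[i] = module(z[i] - u[i])
        for i in range(K):                          # (42): z[j] for j < i already hold new values
            v_tilde = v[i] + u[i]
            cost = lambda zi, i=i, vt=v_tilde: (
                lam * D(x, [zi if j == i else z[j] for j in range(K)])
                + 0.5 * beta * np.sum((zi - vt) ** 2))
            z[i] = minimize(cost, z[i], method="L-BFGS-B").x
        for i in range(K):                          # (43)
            u[i] = u[i] + v[i] - z[i]
    return v
```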
We now turn to exemplify the generic approach in Algorithm 7 for the purpose of
optimizing distributed representations in compressed, standard-compatible, forms.
Our recent framework for holographic compression (Dar and Bruckstein 2021)
represents a given signal using a set of distinct compressed descriptions, such that any
subset of them enables reconstruction of the signal at a quality determined solely
by the number of compressed representations utilized. This property of holographic
representations is useful for designing progressive refinement mechanisms independent
of the order in which the representations become accessible (Bruckstein et al. 1998, 2000,
2018).
In Dar and Bruckstein (2021) we identified the shift sensitivity of standard
compression techniques as a property useful for constructing holographic rep-
resentations in binary compressed forms. Specifically, compressions of shifted
versions of an image provide a set of distinct decompressed images of similar
individual qualities, but combining subsets of them (by back-shifts and averaging)
achieves remarkable quality gains (see details in Dar and Bruckstein 2021). While
this architecture is new and interesting, it does not include optimization aspects.
This led us to suggest an optimization procedure unleashing the potential benefits
of the shift-based holographic compression settings. Here we can consider this
optimization framework as a special case of the generic process described in
Algorithm 7, as follows. First, the general M, R notions are set to
the respective components S, RS of a standard compression method (as defined
in section “Preliminaries: Lossy Compression via Operational Rate-Distortion Opti-
mization”). This makes the first component in (37) the accumulated bit-cost of all
the compressed representations. In Dar and Bruckstein (2021) we set the function D
to evaluate the average MSE of reconstructions formed using subsets of m out of the
K representations, where m ∈ {2, . . . , K} and assuming K > 1. This improves the
reconstruction quality for subsets of m representations, at the inevitable expense of
reducing their individual qualities. Therefore, we also include in D a regularization
D(x; v₁, …, v_K) = (1/C(K, m)) Σ_{(i₁,…,i_m)} D^(m)(x; v_{i₁}, …, v_{i_m}) + (η/K) Σ_{i=1}^{K} D^(1)(x; vᵢ),    (44)

where the first sum runs over all m-element subsets (i₁, …, i_m) of {1, …, K} and C(K, m) denotes their number,

D^(1)(x; vᵢ) ≜ (1/N) ‖x − Sᵢᵀ vᵢ‖₂²    (45)

is the MSE of the reconstruction using the single representation vᵢ, and

D^(m)(x; v_{i₁}, …, v_{i_m}) ≜ (1/N) ‖x − (1/m) Σ_{j=1}^{m} S_{i_j}ᵀ v_{i_j}‖₂²    (46)
is the MSE of the reconstruction using the m representations v_{i₁}, …, v_{i_m}. The matrices
Sᵢᵀ and S_{i_j}ᵀ correspond to back-shift operators matching the shift forms used to create
the compressed representations (further details are available in Dar and Bruckstein
2021). Then, by the settings of M, R, and D, Algorithm 7 is specified for optimizing
shift-based holographic compressed representations – this process is described in
Algorithm 8.
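To make (44), (45), and (46) concrete, the following small function (our illustration; the circular shifts in the usage comment are stand-ins for the actual shift/back-shift operators of Dar and Bruckstein 2021) evaluates the distortion D for a 1-D signal x and K candidate representations.

```python
import numpy as np
from itertools import combinations
from math import comb

def holographic_distortion(x, v_list, back_shift_ops, m, eta=0.5):
    """D(x; v_1, ..., v_K) per (44): average m-subset reconstruction MSE (46)
    plus an eta-weighted average of the individual MSEs (45)."""
    K = len(v_list)
    back = [S_T(v) for S_T, v in zip(back_shift_ops, v_list)]        # S_i^T v_i
    d_ind = sum(np.mean((x - b) ** 2) for b in back) / K             # averaged (45)
    d_sub = sum(np.mean((x - sum(back[i] for i in idx) / m) ** 2)    # (46) for each m-subset
                for idx in combinations(range(K), m)) / comb(K, m)
    return d_sub + eta * d_ind

# usage with circular shifts as stand-ins for the shift/back-shift pairs:
# shifts = [0, 1, 2, 3]
# back_shift_ops = [lambda v, s=s: np.roll(v, -s) for s in shifts]
# D_val = holographic_distortion(x, v_list, back_shift_ops, m=2)
```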
In Figs. 8 and 9, we provide representative results taken from Dar and Bruckstein
(2021). First, Fig. 8 presents reconstructions obtained from JPEG2000-compatible
holographic compressions optimized for using sets of four representations. Specifi-
cally note the similar quality obtained using the individual representations and how
they collaboratively achieve progressive refinement. This behavior is also clearly
demonstrated in Fig. 9 by the curves of PSNR versus number of representations
(packets) used for reconstructions. In particular, Fig. 9 shows the curves obtained
for all the subset combinations in each of the examined methods. This exhibits the
ability of the proposed method for optimizing reconstructions that rely on a specified
number of representations (independent of the actual participating signals). The
interested reader is referred to Dar and Bruckstein (2021) for additional details and
experimental demonstrations.
Conclusion
Fig. 8 Examples (taken from Dar and Bruckstein 2021) for reconstructions of the “Cameraman”
image using several representations out of a set of four holographic compressed descriptions. The
compression employed is JPEG2000 at a compression ratio of 1:50. (a)–(d) the 1-packet recon-
structions using each of the individual packets. (e)–(g) examples for the m-packet reconstructions
for m = 2, 3, 4
D(x, v) = ‖x − v‖₂²,    (47)
Fig. 9 PSNR versus the number of representations used for the reconstructions. The entire set
contains four packets, each formed by JPEG2000 compression at 1:50 compression ratio. The
black, red, green, and blue curves, respectively, represent the methods of exact duplications,
baseline (unoptimized), optimized for reconstruction from pairs of packets, and optimized for
reconstruction from four packets. (a) Cameraman. (b) House. (c) Lena. (d) Barbara
D(x, v) = Σ_{i∈I} ‖xᵢ − vᵢ‖₂²,    (48)
exhibiting that, for squared-error measures, the total distortion can be computed
as the sum of distortions associated with its nonoverlapping blocks. While this
property is satisfied for any segmentation of the signal into nonoverlapping blocks,
we will exemplify it here for blocks of equal sizes that allow using one block-level
compression procedure for all the blocks.
Mirroring the definitions described in section “Preliminaries: Lossy Compression
via Operational Rate-Distortion Optimization” for full-signal compression architec-
tures, the block-level process corresponds to a function Cb : RNb → Bb , mapping
the Nb -dimensional signal-block domain to a discrete set Bb of binary compressed
representations of blocks. The associated block decompression process is denoted
R(v) = Σ_{i∈I} R_b(vᵢ).    (49)
Plugging the block-based compression design into the Lagrangian form (25)
gives
{v̂ᵢ}_{i∈I} = argmin_{ {vᵢ}_{i∈I} ⊂ S_b } { Σ_{i∈I} R_b(vᵢ) + λ Σ_{i∈I} ‖xᵢ − vᵢ‖₂² }    (50)
Note that the block-level optimizations in (51) are independent and refer to the
same Lagrange multiplier λ. Commonly, compression designs are based on
processing of low-dimensional blocks, making it practical to address the block-
level optimizations in (51). For example, one can evaluate the Lagrangian cost for
all the elements in Sb (since this set is sufficiently small).
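A minimal sketch of this per-block Lagrangian search (ours; the codebook of candidate block reconstructions paired with their bit-costs is a hypothetical stand-in for a real coder's mode/quantization options) is given below.

```python
import numpy as np

def encode_block(x_block, codebook, lam):
    """Independent block-level optimization as in (50)/(51): pick the codebook entry
    minimizing R_b(v_i) + lam * ||x_i - v_i||_2^2 by an exhaustive scan over S_b."""
    costs = [rate + lam * np.sum((x_block - cand) ** 2) for cand, rate in codebook]
    return codebook[int(np.argmin(costs))]

def encode_blockwise(x, block_size, codebook, lam):
    """Apply the same Lagrange multiplier lam to all nonoverlapping blocks;
    x's dimensions are assumed divisible by block_size."""
    out = np.empty_like(x, dtype=float)
    total_bits = 0
    for r in range(0, x.shape[0], block_size):
        for c in range(0, x.shape[1], block_size):
            cand, rate = encode_block(x[r:r + block_size, c:c + block_size], codebook, lam)
            out[r:r + block_size, c:c + block_size] = cand
            total_bits += rate
    return out, total_bits
```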
References
Afonso, M.V., Bioucas-Dias, J.M., Figueiredo, M.A.: Fast image recovery using variable splitting
and constrained optimization. IEEE Trans. Image Process. 19(9), 2345–2356 (2010)
Ahmad, R., Bouman, C.A., Buzzard, G.T., Chan, S., Reehorst, E.T., Schniter, P.: Plug and play
methods for magnetic resonance imaging. arXiv preprint arXiv:1903.08616 (2019)
Bahat, Y., Efrat, N., Irani, M.: Non-uniform blind deblurring by reblurring. In: Proceedings of the
IEEE International Conference on Computer Vision, pp. 3286–3294 (2017)
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. In: Proceedings
of ICLR (2017)
Bellard, F.: BPG 0.9.6. [Online]. Available: http://bellard.org/bpg/
Beygi, S., Jalali, S., Maleki, A., Mitra, U.: Compressed sensing of compressible signals. In: IEEE
International Symposium on Information Theory (ISIT), pp. 2158–2162 (2017a)
Beygi, S., Jalali, S., Maleki, A., Mitra, U.: An efficient algorithm for compression-based
compressed sensing. arXiv preprint arXiv:1704.01992 (2017b)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1),
1–122 (2011)
Brifman, A., Romano, Y., Elad, M.: Turning a denoiser into a super-resolver using plug and play
priors. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 1404–1408.
IEEE (2016)
Brifman, A., Romano, Y., Elad, M.: Unified single-image and video super-resolution via denoising
algorithms. IEEE Trans. Image Process. 28(12), 6063–6076 (2019)
Bruckstein, A.M., Holt, R.J., Netravali, A.N.: Holographic representations of images. IEEE Trans.
Image Process. 7(11), 1583–1597 (1998)
Bruckstein, A.M., Holt, R.J., Netravali, A.N.: On holographic transform compression of images.
Int. J. Imag. Syst. Technol. 11(5), 292–314 (2000)
Bruckstein, A.M., Ezerman, M.F., Fahreza, A.A., Ling, S.: Holographic sensing. arXiv preprint
arXiv:1807.10899 (2018)
Burger, M., Dirks, H., Schonlieb, C.-B.: A variational model for joint motion estimation and image
reconstruction. SIAM J. Imag. Sci. 11(1), 94–128 (2018)
Buzzard, G.T., Chan, S.H., Sreehari, S., Bouman, C.A.: Plug-and-play unplugged: optimization-
free reconstruction using consensus equilibrium. SIAM J. Imag. Sci. 11(3), 2001–2020 (2018)
Chan, S.H.: Performance analysis of plug-and-play ADMM: a graph signal processing perspective.
IEEE Trans. Comput. Imag. 5(2), 274–286 (2019)
Chan, S.H., Wang, X., Elgendy, O.A.: Plug-and-play ADMM for image restoration: fixed-point
convergence and applications. IEEE Trans. Comput. Imag. 3(1), 84–98 (2017)
Chatterjee, P., Milanfar, P.: Is denoising dead? IEEE Trans. Image Process. 19(4), 895–911 (2009)
Chou, P.A., Lookabaugh, T., Gray, R.M.: Optimal pruning with applications to tree-structured
source coding and modeling. IEEE Trans. Inf. Theory 35(2), 299–315 (1989)
Corona, V., Aviles-Rivero, A.I., Debroux, N., Graves, M., Le Guyader, C., Schönlieb, C.-B.,
Williams, G.: Multi-tasking to correct: motion-compensated mri via joint reconstruction and
registration. In: International Conference on Scale Space and Variational Methods in Computer
Vision, pp. 263–274. Springer (2019a)
Corona, V., Benning, M., Ehrhardt, M.J., Gladden, L.F., Mair, R., Reci, A., Sederman, A.J.,
Reichelt, S., Schönlieb, C.-B.: Enhancing joint reconstruction and segmentation with non-
convex bregman iteration. Inverse Probl. 35(5), 055001 (2019b)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-
domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Dar, Y., Bruckstein, A.M.: Benefiting from duplicates of compressed data: shift-based holographic
compression of images. J. Math. Imag. Vis. 63, 380–393 (2021)
Dar, Y., Bruckstein, A.M., Elad, M.: Image restoration via successive compression. In: Picture
Coding Symposium (PCS), pp. 1–5 (2016a)
Dar, Y., Bruckstein, A.M., Elad, M., Giryes, R.: Postprocessing of compressed images via
sequential denoising. IEEE Trans. Image Process. 25(7), 3044–3058 (2016b)
Dar, Y., Elad, M., Bruckstein, A.M.: Compression for multiple reconstructions. In: IEEE Interna-
tional Conference on Image Processing (ICIP), pp. 440–444 (2018a)
Dar, Y., Elad, M., Bruckstein, A.M.: Optimized pre-compensating compression. IEEE Trans.
Image Process. 27(10), 4798–4809 (2018b)
Dar, Y., Elad, M., Bruckstein, A.M.: Restoration by compression. IEEE Trans. Sig. Process. 66(22),
5833–5847 (2018c)
Dar, Y., Elad, M., Bruckstein, A.M.: System-aware compression. In: IEEE International Sympo-
sium on Information Theory (ISIT), pp. 2226–2230 (2018d)
Hong, T., Romano, Y., Elad, M.: Acceleration of red via vector extrapolation. J. Vis. Commun.
Image Represent. 63, 102575 (2019)
Kamilov, U.S., Mansour, H., Wohlberg, B.: A plug-and-play priors approach for solving nonlinear
imaging inverse problems. IEEE Sig. Process. Lett. 24(12), 1872–1876 (2017)
Kwan, C., Choi, J., Chan, S., Zhou, J., Budavari, B.: A super-resolution and fusion approach to
enhancing hyperspectral images. Remote Sens. 10(9), 1416 (2018)
Lai, W.-S., Huang, J.-B., Hu, Z., Ahuja, N., Yang, M.-H.: A comparative study for single image
blind deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1701–1709 (2016)
Laparra, V., Berardino, A., Ballé, J., Simoncelli, E.P.: Perceptually optimized image rendering.
J. Opt. Soc. Am. A 34, 1511 (2017)
Liu, J., Moulin, P.: Complexity-regularized image denoising. IEEE Trans. Image Process. 10(6),
841–851 (2001)
Moulin, P., Liu, J.: Statistical imaging and complexity regularization. IEEE Trans. Inf. Theory
46(5), 1762–1777 (2000)
Natarajan, B.K.: Filtering random noise from deterministic signals via data compression. IEEE
Trans. Sig. Process. 43(11), 2595–2605 (1995)
Ono, S.: Primal-dual plug-and-play image restoration. IEEE Sig. Process. Lett. 24(8), 1108–1112
(2017)
Ortega, A., Ramchandran, K.: Rate-distortion methods for image and video compression. IEEE
Sig. Process. Mag. 15(6), 23–50 (1998)
Rissanen, J.: MDL denoising. IEEE Trans. Inf. Theory 46(7), 2537–2543 (2000)
Romano, Y., Elad, M., Milanfar, P.: The little engine that could: regularization by denoising (RED).
SIAM J. Imag. Sci. 10(4), 1804–1844 (2017)
Rond, A., Giryes, R., Elad, M.: Poisson inverse problems by the plug-and-play scheme. J. Vis.
Commun. Image Represent. 41, 96–108 (2016)
Rott Shaham, T., Michaeli, T.: Deformation aware image compression. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 2453–2462 (2018)
Shoham, Y., Gersho, A.: Efficient bit allocation for an arbitrary set of quantizers. IEEE Trans.
Acoust. Speech Sig. Process. 36(9), 1445–1453 (1988)
Shukla, R., Dragotti, P.L., Do, M.N., Vetterli, M.: Rate-distortion optimized tree-structured
compression algorithms for piecewise polynomial images. IEEE Trans. Image Process. 14(3),
343–359 (2005)
Sreehari, S., Venkatakrishnan, S., Wohlberg, B., Buzzard, G.T., Drummy, L.F., Simmons, J.P.,
Bouman, C.A.: Plug-and-play priors for bright field electron tomography and sparse interpo-
lation. IEEE Trans. Comput. Imag. 2(4), 408–423 (2016)
Sullivan, G.J., Wiegand, T.: Rate-distortion optimization for video compression. IEEE Sig. Process.
Mag. 15(6), 74–90 (1998)
Sullivan, G.J., Ohm, J., Han, W.-J., Wiegand, T.: Overview of the high efficiency video coding
(HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
Sun, Y., Wohlberg, B., Kamilov, U.S.: An online plug-and-play algorithm for regularized image
reconstruction. IEEE Trans. Comput. Imag. 5, 395–408 (2019a)
Sun, Y., Xu, S., Li, Y., Tian, L., Wohlberg, B., Kamilov, U.S.: Regularized fourier ptychography
using an online plug-and-play algorithm. In: ICASSP 2019-2019 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), pp. 7665–7669. IEEE (2019b)
Tirer, T., Giryes, R.: Image restoration by iterative denoising and backward projections. IEEE
Trans. Image Process. 28(3), 1220–1234 (2018a)
Tirer, T., Giryes, R.: An iterative denoising and backwards projections method and its advantages
for blind deblurring. In: 2018 25th IEEE International Conference on Image Processing (ICIP),
pp. 973–977. IEEE (2018b)
Tirer, T., Giryes, R.: Back-projection based fidelity term for ill-posed linear inverse problems.
arXiv preprint arXiv:1906.06794 (2019)
Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based
reconstruction. In: IEEE GlobalSIP (2013)
Yazaki, Y., Tanaka, Y., Chan, S.H.: Interpolation and denoising of graph signals using plug-and-
play ADMM. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 5431–5435. IEEE (2019)
Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration.
In: IEEE International Conference on Computer Vision (ICCV), pp. 479–486 (2011)
6  Connecting Hamilton-Jacobi Partial Differential Equations with Maximum a Posteriori and Posterior Mean Estimators for Some Non-convex Priors
Contents
Introduction 210
First-Order Hamilton-Jacobi PDEs and Optimization Problems 212
  Single-Time HJ PDEs and Image Denoising Models 213
  Multi-time HJ PDEs and Image Decomposition Models 214
  Min-Plus Algebra for HJ PDEs and Certain Non-convex Regularizations 216
  Application to Certain Decomposition Problems 220
Viscous Hamilton-Jacobi PDEs and Bayesian Estimation 224
  Viscous HJ PDEs and Posterior Mean Estimators for Log-Concave Models 225
  On Viscous HJ PDEs with Certain Non-log-Concave Priors 227
Conclusion 230
References 230
Abstract
Keywords
Introduction
Many low-level signal, image processing, and computer vision problems are for-
mulated as inverse problems that can be solved using variational (Aubert and Korn-
probst 2002; Scherzer et al. 2009; Vese et al. 2016) or Bayesian approaches (Winkler
2003). Both approaches have been very effective, for example, at solving image
restoration (Bouman and Sauer 1993; Likas and Galatsanos 2004; Rudin et al.
1992), segmentation (Boykov et al. 2001; Chan et al. 2006; Chan and Vese 2001),
and image decomposition problems (Aujol et al. 2005; Osher et al. 2003).
As an illustration, let us consider the following image denoising problem in finite
dimension that formally reads as follows:
x = ū + η,
where x ∈ Rn is the observed image that is the sum of an unknown ideal image ū ∈
Rn and an additive perturbation or noise realization η ∈ Rn . We aim to estimate ū.
A standard variational approach for solving this problem consists of estimating
ū as a minimizer of the following optimization problem:
min_{u ∈ R^n} { λ D(x − u) + J(u) },    (1)

where D is the data fidelity term and J is called the regularization term and encodes the knowledge on the image we wish
to reconstruct. The nonnegative parameter λ relatively weights the data fidelity
and the regularization terms. Note that minimizers of (1) are called maximum a
posteriori (MAP) estimators in a Bayesian setting. Also note that variational-based
approaches for estimating ū are particularly appealing when both the data fidelity
and regularization terms are convex because (1) becomes a convex optimization
problem that can be efficiently solved using convex optimization algorithms (see,
e.g., Chambolle and Pock 2016). Many regularization terms have been proposed
in the literature (Aubert and Kornprobst 2002; Winkler 2003). Popular choices for
these regularization terms involve robust edge-preserving priors (Bouman and Sauer
1993; Charbonnier et al. 1997; Geman and Yang 1995; Geman and Reynolds 1992;
Nikolova and Chan 2007; Rudin et al. 1992) because they allow the reconstructed
image to have sharp edges. For the sake of simplicity, we only describe in this
introduction regularizations that are expressed using pairwise interactions which
take the following form:
J(u) = Σ_{i,j=1}^n w_{ij} f(u_i − u_j),                                                  (2)

where f : R → R ∪ {+∞} and w_{i,j} ≥ 0. Note that our results that will be presented
later do not rely on pairwise interaction-based models and work for more general
regularization terms. A popular choice is the celebrated Total Variation (Bouman
and Sauer 1993; Rudin et al. 1992), which corresponds to considering f(z) = |z|
in (2). The use of Total Variation as a regularization term has been very popular
since the seminal works of Bouman and Sauer (1993); Rudin et al. (1992) because
it is convex and allows the reconstructed image to preserve edges well. When the
data fidelity D is quadratic, this model is known as the celebrated Rudin-Osher-
Fatemi model (Rudin et al. 1992). Following the seminal works of Charbonnier
et al. (1997), Geman and Yang (1995) and Geman and Reynolds (1992), another
class of edge-preserving priors corresponds to half-quadratic-based regularizations
that read as follows:
f(z) = { |z|^2   if |z| ≤ 1,
         1       otherwise.                                                              (3)
Note that the quadratic term above can be replaced by | · |, i.e., we consider:
f(z) = { |z|   if |z| ≤ 1,
         1     otherwise,                                                                (4)
which corresponds to the truncated Total Variation regularization (see Darbon et al.
2009; Dou et al. 2017 for instance).
S(x, t) = min_{u∈R^n} { J(u) + tH*((x − u)/t) }.                                         (5)
Equation (7) in this proposition gives the relation between the minimizer u in
the Lax-Oleinik formula (5) and the spatial gradient of the solution to the HJ PDE
(6). In other words, one can compute the minimizer in the corresponding denoising
model (1) using the spatial gradient ∇x S(x, t) of the solution, and vice versa.
There is another set of assumptions for the conclusion of the proposition above
to hold. For the details, we refer the reader to Darbon (2015).
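To illustrate this connection on a toy problem, the following short Python sketch (not part of the chapter) evaluates the Lax-Oleinik formula (5) in one dimension for the quadratic Hamiltonian H(p) = p^2/2, in which case tH*((x − u)/t) = (x − u)^2/(2t), and checks numerically that the minimizer satisfies u* = x − t ∂S/∂x, the one-dimensional analogue of the relation discussed above. The prior J (a Huber-type penalty), the grid, and all parameter values are illustrative assumptions.

```python
import numpy as np

# 1D sketch of the Lax-Oleinik formula (5) for H(p) = p^2/2, so that
# t*H*((x-u)/t) = (x-u)^2/(2t) and S(x,t) is the Moreau envelope of J at x.
def J(u):
    # illustrative convex prior (Huber-type penalty), not taken from the chapter
    return np.where(np.abs(u) <= 1.0, 0.5 * u ** 2, np.abs(u) - 0.5)

u_grid = np.linspace(-10.0, 10.0, 20001)   # dense grid for the inner minimization
t = 0.8

def S(x):
    vals = J(u_grid) + (x - u_grid) ** 2 / (2.0 * t)
    k = vals.argmin()
    return vals[k], u_grid[k]

x0 = 2.3
_, u_star = S(x0)
h = 1e-4
dS_dx = (S(x0 + h)[0] - S(x0 - h)[0]) / (2.0 * h)   # finite-difference spatial gradient

print("minimizer u*  :", u_star)
print("x - t * dS/dx :", x0 - t * dS_dx)            # should agree with u*
```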
min_{u_1,...,u_N∈R^n} { J(x − Σ_{i=1}^N u_i) + Σ_{i=1}^N λ_i f_i(u_i) },                 (8)

whose associated value function S : R^n × (0, +∞)^N → R is given by

S(x, t_1, . . . , t_N) = min_{u_1,...,u_N∈R^n} { J(x − Σ_{j=1}^N u_j) + Σ_{j=1}^N t_j H_j*(u_j/t_j) }.   (9)
This formula is called the generalized Lax-Oleinik formula (Lions and Rochet 1986;
Tho 2005) which solves the following multi-time HJ PDE system:
∂S(x, t_1, . . . , t_N)/∂t_1 + H_1(∇_x S(x, t_1, . . . , t_N)) = 0,    x ∈ R^n, t_1, . . . , t_N > 0,
    ⋮
∂S(x, t_1, . . . , t_N)/∂t_j + H_j(∇_x S(x, t_1, . . . , t_N)) = 0,    x ∈ R^n, t_1, . . . , t_N > 0,
    ⋮
∂S(x, t_1, . . . , t_N)/∂t_N + H_N(∇_x S(x, t_1, . . . , t_N)) = 0,    x ∈ R^n, t_1, . . . , t_N > 0,
S(x, 0, . . . , 0) = J(x),                                             x ∈ R^n,
                                                                                         (10)
where H1 , . . . , HN : Rn → R are called Hamiltonians and J : Rn → R ∪ {+∞}
is the initial data. Under certain assumptions (see Prop. 2), the generalized Lax-
Oleinik formula (9) gives the solution S(x, t1 , . . . , tN ) to the multi-time HJ PDE
system (10). In Darbon and Meng (2020), the relation between the minimizer in (9)
and the spatial gradient ∇x S(x, t1 , . . . , tN ) of the solution to the multi-time HJ PDE
system (10) is studied. This relation is described in the following proposition.
As a result, when the assumptions in the proposition above are satisfied, one
can compute the minimizer to the corresponding decomposition model (8) using
equation (11) and the spatial gradient ∇x S(x, t1 , . . . , tN ) of the solution to the
multi-time HJ PDE (10).
In the previous two subsections, we considered the optimization models (1) and (8)
where each term was assumed to be convex. When J is non-convex, solutions to (6)
may not be classical (in the sense that they are not differentiable). It is well-known that
the concept of viscosity solutions (Bardi and Capuzzo-Dolcetta 1997; Barles 1994;
Barron et al. 1984; Crandall et al. 1992; Evans 2010; Fleming and Soner 2006)
is generally the appropriate notion of solutions for these HJ PDEs. Note that Lax-
Oleinik formulas (1) and (8) yield viscosity solutions to their respective HJ PDEs (6)
and (10). However, these Lax-Oleinik formulas result in non-convex optimization
problems.
In this subsection, we use the min-plus algebra technique (Akian et al. 2006,
2008; Dower et al. 2015; Fleming and McEneaney 2000; Gaubert et al. 2011;
Kolokoltsov and Maslov 1997; McEneaney 2006, 2007; McEneaney et al. 2008;
McEneaney and Kluberg 2009) to handle the cases when the term J in (1) and (8)
is assumed to be a non-convex function of the following form:

J(x) = min_{i∈{1,...,m}} J_i(x),                                                         (12)

where each J_i : R^n → R ∪ {+∞} is convex. If, for each i ∈ {1, . . . , m}, the
Lax-Oleinik formula (5) solves the HJ PDE (6) with initial data J_i and the
minimizer u exists (for instance, when J_i ∈ Γ_0(R^n) for each i ∈ {1, . . . , m}, and
H : R^n → R is a differentiable, strictly convex, and 1-coercive function), then we
have:
S(x, t) = min_{u∈R^n} { J(u) + tH*((x − u)/t) }
        = min_{u∈R^n} { min_{i∈{1,...,m}} J_i(u) + tH*((x − u)/t) }
        = min_{u∈R^n} min_{i∈{1,...,m}} { J_i(u) + tH*((x − u)/t) }                      (13)
        = min_{i∈{1,...,m}} { min_{u∈R^n} { J_i(u) + tH*((x − u)/t) } }.
Therefore, the solution S(x, t) is given by the pointwise minimum of Si (x, t) for
i ∈ {1, . . . , m}. Note that the Lax-Oleinik formula (5) yields a convex problem
for each Si (x, t) with i ∈ {1, . . . , m}. Therefore this approach seems particularly
appealing to solve these non-convex optimization problems and associated HJ
PDEs. Note that such an approach is embarrassingly parallel since we can solve the
subproblem with initial data J_i for each i ∈ {1, . . . , m} independently and compute
the pointwise minimum in linear time. However, this approach is only feasible if m is not too large.
We will see later in this subsection that robust edge-preserving priors (e.g., truncated
Total Variation or truncated quadratic) can be written in the form of (12), but m is
exponential in n.
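The following small Python sketch (an illustration, not the chapter's code) demonstrates the min-plus strategy in one dimension with H(p) = p^2/2: the m convex subproblems with initial data J_i are solved independently and the solution of the non-convex problem is recovered by the pointwise minimum as in (13). The choice of the pieces J_i (shifted quadratics) and of all parameter values is an assumption made for the example.

```python
import numpy as np

# Min-plus sketch in 1D with H(p) = p^2/2 and J(u) = min_i J_i(u).
t = 0.5
centers = np.array([-2.0, 0.5, 3.0])       # defines the convex pieces J_i (illustrative)

def J_piece(u, c):
    return 0.5 * (u - c) ** 2              # convex piece J_i

u_grid = np.linspace(-8.0, 8.0, 16001)

def S_piece(x, c):                          # solve the convex problem for one J_i
    vals = J_piece(u_grid, c) + (x - u_grid) ** 2 / (2.0 * t)
    k = vals.argmin()
    return vals[k], u_grid[k]

x0 = 1.1
per_piece = [S_piece(x0, c) for c in centers]
i_star = int(np.argmin([p[0] for p in per_piece]))   # pointwise minimum over i
print("min-plus   : S =", per_piece[i_star][0], ", u =", per_piece[i_star][1])

# brute-force check on the original non-convex problem
J_min = np.min([J_piece(u_grid, c) for c in centers], axis=0)
vals = J_min + (x0 - u_grid) ** 2 / (2.0 * t)
print("brute force: S =", vals.min(), ", u =", u_grid[vals.argmin()])
```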
We can also compute the set of minimizers u(x, t) as follows. Here, we abuse
notation and use u(x, t) to denote the set of minimizers, which may not be a
singleton when the minimizer is not unique. We can write
u(x, t) = arg min_{u∈R^n} { min_{i∈{1,...,m}} J_i(u) + tH*((x − u)/t) }
        = arg min_{u∈R^n} min_{i∈{1,...,m}} { J_i(u) + tH*((x − u)/t) }                  (14)
        = ⋃_{i∈I(x,t)} arg min_{u∈R^n} { J_i(u) + tH*((x − u)/t) },
J(x) = min_{Ω⊆E} J_Ω,   where   J_Ω := Σ_{(i,j)∈E∖Ω} w_{ij} + Σ_{(i,j)∈Ω} w_{ij} g(x_i − x_j),
where Ω is any subset of E. The truncated regularization term (16) can therefore
be written in the form of (12), and hence the minimizer to the corresponding
optimization problem (1) with the non-convex regularization term J in (16) can
be computed using (14).
We give here two examples of truncated regularization terms with pairwise
interactions in the form of (16). First, let g be the ℓ1 norm. Then J is the truncated
discrete Total Variation regularization term defined by

J(x) = Σ_{(i,j)∈E} w_{ij} min{|x_i − x_j|, 1},   for each x = (x_1, . . . , x_n) ∈ R^n.  (17)
This function J can be written as the formula (16) with f : R → R given by Eq. (4).
Second, let g be the quadratic function. Then J is the half-quadratic regularization
term defined by
J(x) = Σ_{(i,j)∈E} w_{ij} min{(x_i − x_j)^2, 1},   for each x = (x_1, . . . , x_n) ∈ R^n.  (18)
This function J can be written as the formula (16) with f : R → R given by Eq. (3).
This specific form of edge-preserving prior was investigated in the seminal works
of Charbonnier et al. (1997), Geman and Yang (1995) and Geman and Reynolds
(1992). Several algorithms have been proposed to solve the resultant non-convex
optimization problem (13), i.e., the solution to the corresponding HJ PDE, for some
specific choice of data fidelity terms (e.g., Allain et al. 2006; Idier 2001; Geman and
Yang 1995; Geman and Reynolds 1992; Nikolova and Ng 2005; Champagnat and
Idier 2004; Nikolova and Ng 2001).
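As a concrete illustration of these truncated regularizers, the short Python sketch below (not part of the chapter) evaluates (17) and (18) on a 2D image using a 4-neighborhood edge set E and unit weights w_{ij}; the test image and the choice of weights are assumptions made purely for the example.

```python
import numpy as np

# Evaluate the truncated Total Variation (17) and the half-quadratic term (18)
# on a 2D image, with E the 4-neighborhood edge set and unit weights w_ij.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 64))                         # illustrative image

diffs = (x[1:, :] - x[:-1, :], x[:, 1:] - x[:, :-1])  # x_i - x_j over the edges of E

truncated_tv = sum(np.minimum(np.abs(d), 1.0).sum() for d in diffs)   # Eq. (17)
half_quadratic = sum(np.minimum(d ** 2, 1.0).sum() for d in diffs)    # Eq. (18)

print("truncated TV  :", truncated_tv)
print("half-quadratic:", half_quadratic)
```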
Suppose now, for general regularization terms J in the form of (16), that we have
Gaussian noise. Then the data fidelity term is quadratic, H(p) = (1/2)‖p‖_2^2, and
t = 1/λ. Hence, for this example, using (14), we obtain the set of minimizers:
u(x, t) = ⋃_{Ω∈I(x,t)} arg min_{u∈R^n} { J_Ω(u) + tH*((x − u)/t) }
        = ⋃_{Ω∈I(x,t)} arg min_{u∈R^n} { Σ_{(i,j)∈Ω} w_{ij} g(u_i − u_j) + (1/2t)‖x − u‖_2^2 }
        = ⋃_{Ω∈I(x,t)} { x − t∇_x S_Ω(x, t) },
where

S_Ω(x, t) = Σ_{(i,j)∈E∖Ω} w_{ij} + min_{u∈R^n} { Σ_{(i,j)∈Ω} w_{ij} g(u_i − u_j) + (1/2t)‖x − u‖_2^2 }
and I(x, t) := arg min_{Ω⊆E} S_Ω(x, t).
The same result also holds for the multi-time HJ PDE system (10). Indeed, if J
is a non-convex regularization term given by (12), and S, S_i : R^n × (0, +∞)^N → R
are the solutions to the multi-time HJ PDE system (10) with initial data J and J_i,
respectively, then similarly we have the min-plus linearity of the semigroup under
certain assumptions. Specifically, if the Lax-Oleinik formula (9) solves the multi-
time HJ PDE system (10) for each i ∈ {1, . . . , m} (for instance, when H and Ji
satisfy the assumptions in Prop. 2 for each i ∈ {1, . . . , m}), then there holds
S(x, t_1, . . . , t_N) = min_{u_1,...,u_N∈R^n} { min_{i∈{1,...,m}} J_i(x − Σ_{j=1}^N u_j) + Σ_{j=1}^N t_j H_j*(u_j/t_j) }
                       = min_{i∈{1,...,m}} { min_{u_1,...,u_N∈R^n} { J_i(x − Σ_{j=1}^N u_j) + Σ_{j=1}^N t_j H_j*(u_j/t_j) } }
                       = min_{i∈{1,...,m}} S_i(x, t_1, . . . , t_N).
                                                                                         (19)
Let M ⊂ Rn×N be the set of minimizers of (9) with J given by (12). Then M
satisfies
M = arg min_{u_1,...,u_N∈R^n} { min_{i∈{1,...,m}} J_i(x − Σ_{j=1}^N u_j) + Σ_{j=1}^N t_j H_j*(u_j/t_j) }
  = ⋃_{i∈I(x,t_1,...,t_N)} arg min_{u_1,...,u_N∈R^n} { J_i(x − Σ_{j=1}^N u_j) + Σ_{j=1}^N t_j H_j*(u_j/t_j) },
                                                                                         (20)

where I(x, t_1, . . . , t_N) denotes the corresponding index set defined in (21).
As a result, we can use (20) to obtain the minimizers of the decomposition model (8)
with the non-convex regularization term J in the form of (12), such as the function
in (16) and the truncated Total Variation function (17).
In summary, one can compute the minimizers of the optimization problems (1)
and (8) with a non-convex function J in the form of (12) using the aforementioned
min-plus algebra technique. Furthermore, this technique can be extended to handle
other cases. For instance, in the denoising model (1), if the data fidelity term D
is in the form of (12) and the prior term J(u)/λ can be written as tH*(u/t), then
one can still compute the minimizer of this problem using the min-plus algebra
technique on the HJ PDE with initial data D. Similarly, because of the symmetry in
the decomposition model (8), if there is only one non-convex term fj and if it can
be written in the form of (12), then one can apply the min-plus algebra technique to
the multi-time HJ PDE with initial data fj .
In general, however, there is a drawback to the min-plus algebra technique. To
compute the minimizers using (14) and (20), we need to compute the index set
I (x, t) and I (x, t1 , . . . , tN ) defined in (15) and (21), which involves solving m
HJ PDEs to obtain the solutions S1 , . . . , Sm . When m is too large, this approach
is impractical since it involves solving too many HJ PDEs. For instance, if J is
the truncated Total Variation in (17), the number m equals the number of subsets
of the set E, i.e., m = 2^{|E|}, which is computationally intractable. Hence, in
general, it is impractical to use (14) and (20) to solve the problems (1) and (8)
where the regularization term J is given by the truncated Total Variation. The
same issue arises when the truncated Total Variation is replaced by half-quadratic
regularization. Several authors attempted to address this intractability for half-
quadratic regularizations by proposing heuristic optimization methods that aim to
compute a global minimizer (Allain et al. 2006; Idier 2001; Geman and Yang 1995;
Geman and Reynolds 1992; Nikolova and Ng 2005; Champagnat and Idier 2004;
Nikolova and Ng 2001).
with respect to Meyer’s norm is used in Aujol et al. (2003, 2005), and the ℓ1 norm is
used in Le Guen (2014). Note that each texture model has some pros and cons and,
to our knowledge, it remains an open problem whether one specific texture model is
better than the others. In this example, we combine different texture regularizations
proposed in the literature by taking the minimum of the indicator function of the
unit ball with respect to Meyer’s norm and the ℓ1 norm. In other words, we consider
the following variational problem:
min_{u_1,u_2∈R^n} { J(x − u_1 − u_2) + t_1 g(u_1/t_1) + (1/2t_2)‖u_2‖_2^2 },             (22)

where g := min{g_1, g_2}; this problem can be equivalently written as

min_{u_1,u_2∈R^n} min_{k∈{1,2}} { J(x − u_1 − u_2) + t_1 g_k(u_1/t_1) + (1/2t_2)‖u_2‖_2^2 },   (23)
where g_1(y) := J*(y) and g_2(y) := ‖y‖_1 for each y ∈ R^n. Note that solving mixed
discrete-continuous optimization is hard in general (see Floudas and Pardalos 2009
for instance). However, we shall see that our proposed approach yields efficient
optimization algorithms. Since the function g is the minimum of two convex
functions, the problem (22) fits into our formulation, and can be solved using a
similar idea as in (19) and (20). To be specific, define the two functions S1 and S2
by
S_1(x, t_1, t_2) := min_{u_1,u_2∈R^n} { J(x − u_1 − u_2) + t_1 J*(u_1/t_1) + (1/2t_2)‖u_2‖_2^2 },
                                                                                         (24)
S_2(x, t_1, t_2) := min_{u_1,u_2∈R^n} { J(x − u_1 − u_2) + ‖u_1‖_1 + (1/2t_2)‖u_2‖_2^2 },
where the sets of the minimizers in the two minimization problems above are
denoted by M1 (x, t1 , t2 ) and M2 (x, t1 , t2 ), respectively. Using a similar argu-
ment as in (19) and (20), we conclude that the minimal value in (22) equals
min{S1 (x, t1 , t2 ), S2 (x, t1 , t2 )}, and the set of minimizers in (22), denoted by
M(x, t1 , t2 ), satisfies
M(x, t_1, t_2) = { M_1(x, t_1, t_2)                        if S_1(x, t_1, t_2) < S_2(x, t_1, t_2),
                   M_2(x, t_1, t_2)                        if S_1(x, t_1, t_2) > S_2(x, t_1, t_2),
                   M_1(x, t_1, t_2) ∪ M_2(x, t_1, t_2)     if S_1(x, t_1, t_2) = S_2(x, t_1, t_2).
                                                                                         (25)
As a result, we solve the two minimization problems in (24) first, and then obtain
the minimizers using (25) by comparing the minimal values S1 (x, t1 , t2 ) and
S2 (x, t1 , t2 ).
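The selection step (25) itself is straightforward to implement; the following small Python sketch (illustrative only) shows the comparison logic once the two problems in (24) have been solved by a method of choice. The numerical values and the representation of the minimizer sets are placeholders, not results from the chapter.

```python
def select_minimizers(S1, S2, M1, M2, tol=1e-12):
    """Selection rule (25): pick the minimizer set of (22) by comparing the
    optimal values S1, S2 of the two problems in (24)."""
    if S1 < S2 - tol:
        return M1
    if S2 < S1 - tol:
        return M2
    return M1 + M2   # equal values: union of both minimizer sets

# placeholder values standing in for the output of the two splitting methods
S1, S2 = 12.7, 13.4
M1 = [("u1_from_first_problem", "u2_from_first_problem")]
M2 = [("v1_from_second_problem", "v2_from_second_problem")]
print(select_minimizers(S1, S2, M1, M2))
```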
Here, we present a numerical result. We solve the first optimization problem
in (24) by a splitting method, where each subproblem can be solved using the prox-
imal operator of the anisotropic Total Variation (for more details, see Darbon and
Meng 2020). Similarly, a splitting method is used to split the second optimization
problem in (24) into two subproblems, which are solved using the proximal operators
of the anisotropic Total Variation and the ℓ1 norm, respectively. To compute the
proximal point of the anisotropic Total Variation, the algorithm in Chambolle and
Darbon (2009), Darbon and Sigelle (2006), and Hochbaum (2001) is adopted, and
it computes the proximal point without numerical errors. The input image x is
the image “Barbara” shown in Fig. 1. The parameters are set to be t1 = 0.07
and t2 = 0.01. Let (u1 , u2 ) ∈ M1 (x, t1 , t2 ) and (v 1 , v 2 ) ∈ M2 (x, t1 , t2 ) be
respectively the minimizers of the two minimization problems in (24) solved by the
aforementioned splitting methods. We show these minimizers and the related images
in Figs. 2 and 3. To be specific, the decomposition components x −u1 −u2 , u1 +0.5,
and u2 + 0.5 given by the first optimization problem in (24) are shown in Fig. 2a, b,
and c, respectively. The decomposition components x−v 1 −v 2 , v 1 +0.5, and v 2 +0.5
Fig. 2 The minimizer of the first problem in (24). The output images x − u1 − u2 , u1 + 0.5, and
u2 + 0.5 are shown in (a), (b), and (c), respectively
given by the second optimization problem in (24) are shown in Fig. 3a, b, and c,
respectively. We also compute the optimal values S1 (x, t1 , t2 ) and S2 (x, t1 , t2 ), and
obtain
Fig. 3 The minimizer of the second problem in (24). The output images x − v 1 − v 2 , v 1 + 0.5
and v 2 + 0.5 are shown in (a), (b) and (c), respectively
algebra technique discussed in section “Min-Plus Algebra for HJ PDEs and Certain
Non-convex Regularizations” for certain Bayesian posterior mean estimators.
where x ∈ R^n is the observed image with n pixels, and t and ε are positive
parameters. The posterior distribution (26) is proportional to the product of a
(possibly improper) log-concave prior u ↦ e^{−J(u)/ε} and a Gaussian likelihood
function u ↦ e^{−‖x−u‖_2^2/(2tε)}. This class of posterior distributions generates the
family of posterior mean estimators u_PM : R^n × (0, +∞) × (0, +∞) → R^n defined
in (27) as the expectation of u under the posterior density q(u|(x, t, ε)).
These are Bayesian estimators because they minimize the mean squared error (Kay
1993, pages 344–345):
u_PM(x, t, ε) = arg min_{u∈R^n} ∫_{R^n} ‖ū − u‖_2^2 q(ū|(x, t, ε)) dū.                   (28)
They are frequently called minimum mean squared error estimators for this reason.
The class of posterior distributions (26) also generates the family of maximum a
posteriori estimators uMAP : Rn × (0, +∞) → Rn defined by
u_MAP(x, t) = arg min_{u∈R^n} { J(u) + (1/2t)‖x − u‖_2^2 },                              (29)
where uMAP (x, t) is the mode of the posterior distribution (26). Note that the MAP
estimator is also the minimizer in the Lax-Oleinik formula (5), which solves the first-order
HJ PDE (6) with Hamiltonian H = (1/2)‖·‖_2^2 and initial data J.
There is a large body of literature on posterior mean estimators for image
restoration problems (see e.g., Demoment 1989; Kay 1993; Winkler 2003). In
particular, original connections between variational problems and Bayesian methods
have been investigated in Louchet (2008), Louchet and Moisan (2013), Burger
and Lucka (2014), Burger and Sciacchitano (2016), Gribonval (2011), Gribonval
and Machart (2013), Gribonval and Nikolova (2018), and Darbon and Langlois
(2020). In particular, in Darbon and Langlois (2020), the authors described original
connections between Bayesian posterior mean estimators and viscous HJ PDEs
when J ∈ Γ0 (Rn ) and the data fidelity term is Gaussian. We now briefly describe
these connections here.
Consider the function S_ε : R^n × (0, +∞) → R defined by

S_ε(x, t) = −ε ln( (1/(2πεt)^{n/2}) ∫_{R^n} e^{−[J(u) + (1/2t)‖x−u‖_2^2]/ε} du ),        (30)
where J is the initial data. The function S_ε is also related to the first-order
HJ PDE (6) when the Hamiltonian is H = (1/2)‖·‖_2^2. The following proposition, which
is given in Darbon and Langlois (2020), describes these connections.
and

∫_{R^n} ‖u_PM(x, t, ε) − u‖_2^2 q(u|(x, t, ε)) du = nεt − εt^2 Δ_x S_ε(x, t).            (33)
In addition, for every x ∈ R^n and t > 0, the limits lim_{ε→0, ε>0} S_ε(x, t) and
lim_{ε→0, ε>0} u_PM(x, t, ε) exist, and the convergence is uniform over every compact
set of R^n × (0, +∞) in (x, t). Specifically, we have

lim_{ε→0, ε>0} S_ε(x, t) = min_{u∈R^n} { J(u) + (1/2t)‖x − u‖_2^2 },                     (34)
where the right-hand side uniquely solves the first-order HJ PDE (6) with Hamiltonian
H = (1/2)‖·‖_2^2 and initial data J, and

lim_{ε→0, ε>0} u_PM(x, t, ε) = arg min_{u∈R^n} { J(u) + (1/2t)‖x − u‖_2^2 }.             (35)
So far, we have assumed that the regularization term J in the posterior distribu-
tion (26) and Proposition 3 is convex. Here, we consider an analogue of the min-plus
algebra technique designed for certain first-order HJ PDEs, tailored to viscous HJ
PDEs, which will enable us to derive representation formulas for posterior mean
estimators of the form of (27) whose priors are sums of log-concave priors, i.e., to
certain mixture distributions.
Remember that the min-plus algebra technique for first-order HJ PDEs described
in section “Min-Plus Algebra for HJ PDEs and Certain Non-convex Regulariza-
tions” involves initial data of the form min_{i∈{1,...,m}} J_i(x), where each J_i : R^n →
R ∪ {+∞} is convex. Consider now initial data of the form
J_ε(x) = −ε ln( Σ_{i=1}^m e^{−J_i(x)/ε} ).                                               (37)
Note that formula (37) approximates the non-convex term (12) in that
lim_{ε→0, ε>0} −ε ln( Σ_{i=1}^m e^{−J_i(x)/ε} ) = min_{i∈{1,...,m}} J_i(x)   for each x ∈ R^n.
Now, assume int(dom J_i) ≠ ∅ for each i ∈ {1, . . . , m}, and let

S_{i,ε}(x, t) = −ε ln( (1/(2πεt)^{n/2}) ∫_{R^n} e^{−[J_i(u) + (1/2t)‖x−u‖_2^2]/ε} du ),

and

u_{i,PM}(x, t, ε) = ( ∫_{R^n} u e^{−[J_i(u) + (1/2t)‖x−u‖_2^2]/ε} du ) / ( ∫_{R^n} e^{−[J_i(u) + (1/2t)‖x−u‖_2^2]/ε} du )
denote, respectively, the solution to the viscous HJ PDE (31) with initial data Ji and
its associated posterior mean. Then, a short calculation shows that for every ε > 0,
the function S_ε : R^n × (0, +∞) → R defined by

S_ε(x, t) = −ε ln( (1/(2πεt)^{n/2}) Σ_{i=1}^m ∫_{R^n} e^{−[J_i(u) + (1/2t)‖x−u‖_2^2]/ε} du )
          = −ε ln( Σ_{i=1}^m e^{−S_{i,ε}(x,t)/ε} )                                       (38)
is the unique smooth solution to the viscous HJ PDE (31) with initial data (37). As
stated in section “Viscous HJ PDEs and Posterior Mean Estimators for Log-Concave
Models”, the posterior mean estimate u_PM(x, t, ε) is given by the representation
formula

u_PM(x, t, ε) = x − t∇_x S_ε(x, t),                                                      (39)
which can be expressed in terms of the solutions S_{i,ε}(x, t), their spatial gradients
∇_x S_{i,ε}(x, t), and the posterior mean estimates u_{i,PM}(x, t, ε) as the weighted sums

u_PM(x, t, ε) = x − t ( Σ_{i=1}^m ∇_x S_{i,ε}(x, t) e^{−S_{i,ε}(x,t)/ε} ) / ( Σ_{i=1}^m e^{−S_{i,ε}(x,t)/ε} )
              = ( Σ_{i=1}^m u_{i,PM}(x, t, ε) e^{−S_{i,ε}(x,t)/ε} ) / ( Σ_{i=1}^m e^{−S_{i,ε}(x,t)/ε} ).     (40)
S_0(x, t) = min_{u∈R^n} { min_{i∈{1,...,m}} (1/(2σ_i^2))‖u − μ_i‖_2^2 + (1/2t)‖x − u‖_2^2 }
          = min_{i∈{1,...,m}} { min_{u∈R^n} { (1/(2σ_i^2))‖u − μ_i‖_2^2 + (1/2t)‖x − u‖_2^2 } }              (41)
          = min_{i∈{1,...,m}} (1/(2(σ_i^2 + t)))‖x − μ_i‖_2^2.
Letting I(x, t) = arg min_{i∈{1,...,m}} (1/(2(σ_i^2 + t)))‖x − μ_i‖_2^2, the MAP estimator is then
the collection:

u_MAP(x, t) = { (σ_i^2 x + tμ_i)/(σ_i^2 + t) }_{i∈I(x,t)}.
The solution S_ε(x, t) to the viscous HJ PDE (31) with initial data J_ε(x) is given by
formula (38), which in this case can be computed analytically:

S_ε(x, t) = −ε ln( Σ_{i=1}^m (σ_i^2/(σ_i^2 + t))^{n/2} e^{−‖x−μ_i‖_2^2/(2(σ_i^2+t)ε)} ).                     (42)
Since e^{−S_{i,ε}(x,t)/ε} = (σ_i^2/(σ_i^2 + t))^{n/2} e^{−‖x−μ_i‖_2^2/(2(σ_i^2+t)ε)}, we can write the
corresponding posterior mean estimator using the representation formulas (39) and (40):

u_PM(x, t, ε) = ( Σ_{i=1}^m ((σ_i^2 x + tμ_i)/(σ_i^2 + t)) (σ_i^2/(σ_i^2+t))^{n/2} e^{−‖x−μ_i‖_2^2/(2(σ_i^2+t)ε)} )
                / ( Σ_{i=1}^m (σ_i^2/(σ_i^2+t))^{n/2} e^{−‖x−μ_i‖_2^2/(2(σ_i^2+t)ε)} ).
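The following Python sketch (not part of the chapter) evaluates these closed-form expressions in the scalar case n = 1: it computes the weights e^{−S_{i,ε}/ε}, the resulting posterior mean (40), the value S_ε(x, t) from (42), and the MAP estimate obtained from the index set I(x, t). The means μ_i, variances σ_i^2, and the values of t, ε, and x are illustrative assumptions.

```python
import numpy as np

# Gaussian-mixture example in 1D: J_i(u) = (u - mu_i)^2 / (2 sigma_i^2).
mu = np.array([-2.0, 0.0, 3.0])        # illustrative means
sig2 = np.array([0.5, 1.0, 0.25])      # illustrative variances sigma_i^2
t, eps, x, n = 0.4, 0.05, 1.2, 1

# closed-form weights e^{-S_{i,eps}/eps} and per-component posterior means
w = (sig2 / (sig2 + t)) ** (n / 2) * np.exp(-(x - mu) ** 2 / (2 * (sig2 + t) * eps))
u_i_pm = (sig2 * x + t * mu) / (sig2 + t)

u_pm = np.sum(u_i_pm * w) / np.sum(w)        # weighted sum, Eq. (40)
S_eps = -eps * np.log(np.sum(w))             # Eq. (42)

i_map = int(np.argmin((x - mu) ** 2 / (2 * (sig2 + t))))   # index set I(x,t)
u_map = (sig2[i_map] * x + t * mu[i_map]) / (sig2[i_map] + t)

print("S_eps(x,t)     :", S_eps)
print("posterior mean :", u_pm)
print("MAP estimator  :", u_map)
```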
Conclusion
References
Akian, M., Bapat, R., Gaubert, S.: Max-plus algebra. In: Handbook of Linear Algebra, 39 (2006)
Akian, M., Gaubert, S., Lakhoua, A.: The max-plus finite element method for solving deterministic
optimal control problems: basic properties and convergence analysis. SIAM J. Control. Optim.
47, 817–848 (2008)
Allain, M., Idier, J., Goussard, Y.: On global and local convergence of half-quadratic algorithms.
IEEE Trans. Image Process. 15, 1130–1142 (2006)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing. Springer (2002)
Aujol, J.-F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition application to
SAR images. In: L.D. Griffin, Lillholm, M. (eds.) Scale Space Methods in Computer Vision.
Springer, Berlin/Heidelberg, pp. 297–312 (2003)
Aujol, J.-F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded
variation component and an oscillating component. J. Math. Imaging Vision 22, 71–88 (2005)
Bardi, M., Capuzzo-Dolcetta, I.: Optimal control and viscosity solutions of Hamilton-Jacobi-
Bellman equations. Systems & Control: Foundations & Applications, Birkhäuser Boston, Inc.,
Boston (1997). With appendices by Maurizio Falcone and Pierpaolo Soravia
Bardi, M., Evans, L.: On Hopf’s formulas for solutions of Hamilton-Jacobi equations. Nonlinear
Anal. Theory Methods Appl. 8, 1373–1381 (1984)
Barles, G.: Solutions de viscosité des équations de Hamilton-Jacobi. Mathématiques et Applica-
tions. Springer, Berlin/Heidelberg (1994)
Barron, E., Evans, L., Jensen, R.: Viscosity solutions of Isaacs’ equations and differential games
with Lipschitz controls. J. Differ. Equ. 53, 213–233 (1984)
Bouman, C., Sauer, K.: A generalized Gaussian image model for edge-preserving MAP estimation.
IEEE Trans. Image Process. 2, 296–310 (1993)
Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE
Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001)
Burger, M., Lucka, F.: Maximum a posteriori estimates in linear inverse problems with log-
concave priors are proper Bayes estimators. Inverse Probl. 30, 114004 (2014)
Burger, M., Dong, Y., Sciacchitano, F.: Bregman cost for non-Gaussian noise. arXiv preprint
arXiv:1608.07483 (2016)
Chambolle, A., Darbon, J.: On total variation minimization and surface evolution using parametric
maximum flows. Int. J. Comput. Vis. 84, 288–307 (2009)
Chambolle, A., Novaga, M., Cremers, D., Pock, T.: An introduction to total variation for image
analysis. In: Theoretical Foundations and Numerical Methods for Sparse Recovery, De Gruyter
(2010)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer.
25, 161–319 (2016)
Champagnat, F., Idier, J.: A connection between half-quadratic criteria and em algorithms. IEEE
Signal Processing Lett. 11, 709–712 (2004)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. SIAM J. Appl. Math. 66, 1632–1648 (2006)
Chan, T.F., Shen, J.: Image processing and analysis, Society for Industrial and Applied Mathemat-
ics (SIAM), Philadelphia (2005). Variational, PDE, wavelet, and stochastic methods
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10, 266–277
(2001)
Charbonnier, P., Blanc-Feraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regu-
larization in computed imaging. IEEE Trans. Image Process. 6, 298–311 (1997)
Crandall, M.G., Ishii, H., Lions, P.-L.: User’s guide to viscosity solutions of second order partial
differential equations. Bull. Am. Math. Soc. 27, 1–67 (1992)
Darbon, J.: On convex finite-dimensional variational methods in imaging sciences and Hamilton–
Jacobi equations. SIAM J. Imag. Sci. 8, 2268–2293 (2015)
Darbon, J., Ciril, I., Marquina, A., Chan, T.F., Osher, S.: A note on the bregmanized total variation
and dual forms. In: 2009 16th IEEE International Conference on Image Processing (ICIP), Nov
2009, pp. 2965–2968
Darbon, J., Langlois, G.P.: On Bayesian posterior mean estimators in imaging sciences and
Hamilton-Jacobi partial differential equations. arXiv preprint arXiv: 2003.05572 (2020)
Darbon, J., Meng, T.: On decomposition models in imaging sciences and multi-time Hamilton-
Jacobi partial differential equations. SIAM Journal on Imaging Sciences. 13(2), 971–1014
(2020). https://doi.org/10.1137/19M1266332
Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and
exact optimization. J. Math. Imaging Vision 26, 261–276 (2006)
Demoment, G.: Image reconstruction and restoration: Overview of common estimation structures
and problems. IEEE Trans. Acoust. Speech Signal Process. 37, 2024–2036 (1989)
Dou, Z., Song, M., Gao, K., Jiang, Z.: Image smoothing via truncated total variation. IEEE Access
5, 27337–27344 (2017)
Dower, P.M., McEneaney, W.M., Zhang, H.: Max-plus fundamental solution semigroups for opti-
mal control problems. In: 2015 Proceedings of the Conference on Control and its Applications.
SIAM, 2015, pp. 368–375
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley (2012)
Evans, L.C.: Partial differential equations, vol. 19 of Graduate Studies in Mathematics, 2nd edn.
American Mathematical Society, Providence (2010)
Fleming, W., McEneaney, W.: A max-plus-based algorithm for a Hamilton–Jacobi–Bellman
equation of nonlinear filtering. SIAM J. Control. Optim. 38, 683–710 (2000)
Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions, vol. 25.
Springer Science & Business Media (2006)
Floudas, C.A., Pardalos, P.M. (eds.): Encyclopedia of Optimization, 2nd edn. (2009)
Gaubert, S., McEneaney, W., Qu, Z.: Curse of dimensionality reduction in max-plus based
approximation methods: Theoretical estimates and improved pruning algorithms. In: 2011 50th
IEEE Conference on Decision and Control and European Control Conference. IEEE, 2011,
pp. 1054–1061
Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans.
Image Process. 4, 932–946 (1995)
Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE Trans.
Pattern Anal. Mach. Intell. 14, 367–383 (1992)
Gribonval, R.: Should penalized least squares regression be interpreted as maximum a posteriori
estimation? IEEE Trans. Signal Process. 59, 2405–2410 (2011)
Gribonval, R., Machart, P.: Reconciling “priors” & “priors” without prejudice? In: Advances in
Neural Information Processing Systems, 2013, pp. 2193–2201
Gribonval, R., Nikolova, M.: On Bayesian estimation and proximity operators. arXiv preprint
arXiv:1807.04021 (2018)
Hochbaum, D.S.: An efficient algorithm for image segmentation, Markov random fields and related
problems. J. ACM 48, 686–701 (2001)
Hopf, E.: Generalized solutions of non-linear equations of first order. J. Math. Mech. 14, 951–973
(1965)
Idier, J.: Convex half-quadratic criteria and interacting auxiliary variables for image restoration.
IEEE Trans. Image Process. 10, 1001–1009 (2001)
Kay, S.M.: Fundamentals of Statistical Signal Processing. Prentice Hall PTR (1993)
Kolokoltsov, V.N., Maslov, V.P.: Idempotent Analysis and Its Applications, vol. 401 of Mathematics
and its Applications. Kluwer Academic Publishers Group, Dordrecht (1997). Translation of
Idempotent Analysis and Its Application in Optimal Control (Russian), Nauka, Moscow, 1994;
translated by V.E. Nazaikinskii, with an appendix by Pierre Del Moral
Le Guen, V.: Cartoon + Texture Image Decomposition by the TV-L1 Model. Image Process. Line
4, 204–219 (2014)
Likas, A.C., Galatsanos, N.P.: A variational approach for Bayesian blind image deconvolution.
IEEE Trans. Signal Process. 52, 2222–2233 (2004)
Lions, P.L., Rochet, J.-C.: Hopf formula and multitime Hamilton-Jacobi equations. Proc. Am.
Math. Soc. 96, 79–84 (1986)
Louchet, C.: Modèles variationnels et bayésiens pour le débruitage d’images: de la variation totale
vers les moyennes non-locales. Ph.D. thesis, Université René Descartes-Paris V (2008)
Louchet, C., Moisan, L.: Posterior expectation of the total variation model: properties and
experiments. SIAM J. Imaging Sci. 6, 2640–2684 (2013)
McEneaney, W.: Max-plus methods for nonlinear control and estimation. Springer Science &
Business Media (2006)
McEneaney, W.: A curse-of-dimensionality-free numerical method for solution of certain HJB
PDEs. SIAM J. Control. Optim. 46, 1239–1276 (2007)
McEneaney, W.M., Deshpande, A., Gaubert, S.: Curse-of-complexity attenuation in the curse-
of-dimensionality-free method for HJB PDEs. In: 2008 American Control Conference. IEEE,
2008, pp. 4684–4690
McEneaney, W.M., Kluberg, L.J.: Convergence rate for a curse-of-dimensionality-free method for
a class of HJB PDEs. SIAM J. Control. Optim. 48, 3052–3079 (2009)
Nikolova, M., Chan, R.H.: The equivalence of half-quadratic minimization and the gradient
linearization iteration. IEEE Trans. Image Process. 16, 1623–1627 (2007)
Nikolova, M., Ng, M.: Fast image reconstruction algorithms combining half-quadratic regulariza-
tion and preconditioning. In: Proceedings 2001 International Conference on Image Processing
(Cat. No. 01CH37205), vol. 1. IEEE, 2001, pp. 277–280
Nikolova, M., Ng, M.K.: Analysis of half-quadratic minimization methods for signal and image
recovery. SIAM J. Sci. Comput. 27, 937–966 (2005)
Osher, S., Solé, A., Vese, L.: Image decomposition and restoration using total variation
minimization and the H^{−1} norm. Multiscale Model. Simul. 1, 349–370 (2003)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60, 259–268 (1992)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational methods in
imaging, vol. 167 of Applied Mathematical Sciences. Springer, New York (2009)
Tho, N.: Hopf-Lax-Oleinik type formula for multi-time Hamilton-Jacobi equations. Acta Math.
Vietnamica 30, 275–287 (2005)
Vese, L.A., Le Guyader, C.: Variational methods in image processing, Chapman & Hall/CRC
Mathematical and Computational Imaging Sciences. CRC Press, Boca Raton (2016)
Winkler, G.: Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Applications
of Mathematics. Springer, 2nd edn. (2003)
7  Multi-modality Imaging with Structure-Promoting Regularizers

Matthias J. Ehrhardt
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Variational Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Mathematical Models for Structural Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Measuring Structural Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Structure-Promoting Regularizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Isotropic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Anisotropic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Algorithmic Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Prewhitening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Numerical Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Software, Data, and Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Discussion on Computational Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Open Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Abstract
M. J. Ehrhardt ()
Institute for Mathematical Innovation, University of Bath, Bath, UK
e-mail: [email protected]
Introduction
Many tasks in almost all scientific fields can be posed as an inverse problem of the
form
Ku = f (1)
Application Examples
Historically the first application where information from several modalities was
combined was positron emission tomography (PET) and magnetic resonance
imaging (MRI) in the early 1990s (Leahy and Yan 1991). Sharing information
between two different imaging modalities is motivated by the fact that all images
Fig. 1 PET-MR and PET-CT. A low resolution functional PET image (left) is to be recon-
structed with the help of an anatomical MRI (middle) or CT image (right). As is evident from the
images, all three images share many edges due to the same underlying anatomy. Note that the high
soft tissue contrast in MRI makes it favorable over CT for this application. (Images courtesy of P.
Markiewicz and J. Schott)
will be highly influenced by the same underlying anatomy; see Fig. 1. Since single-
photon emission computed tomography (SPECT) imaging is both mathematically
and physically similar to PET imaging, most of the proposed models can be directly
translated and often models are proposed for both modalities simultaneously; see,
e.g., Bowsher et al. (1996), Rangarajan et al. (2000), Chan et al. (2007) and
Nuyts (4154). Over the years there has always been research in this direction (see,
e.g., Bowsher et al. (1996), Rangarajan et al. (2000), Comtat et al. (2002), Bowsher
et al. (2004), Baete et al. (2004), Chan et al. (2007), Chan et al. (2009), Tang and
Rahmim (2009), Bousse et al. (2010), Pedemonte et al. (2011), Somayajula et al.
(2011), Cheng-Liao and Qi (2011), Vunckx et al. (2012), Kazantsev et al. (2012),
Bousse et al. (2012) and Bai et al. (2013)), which was intensified with the advent
of the first simultaneous PET-MR scanner in 2011 (Delso et al. 2011); see, e.g.,
(Knoll et al. 2014; Ehrhardt et al. 2014, 2015; Tang and Rahmim 2015; Ehrhardt
et al. 2016; Knoll et al. 2016; Schramm et al. 2017; Mehranian et al. 2018, 2017;
Tsai et al. 2018; Zhang and Zhang 2018; Ehrhardt et al. 2019; Deidda et al. 2019).
The same motivation applies to other medical imaging techniques, for example,
multi-contrast MRI; see, e.g., Bilgic et al. (2011), Ehrhardt and Betcke (2016),
Huang et al. (2014), Sodickson et al. (2015), Song et al. (2018) and Xiang et al.
(2019). In multi-contrast MRI multiple acquisition sequences are used to acquire
data of the same patient; see Fig. 2 for a T1 - and a T2 -weighted image with shared
anatomy. Other special cases are the combination of anatomical MRI (e.g., T1 -
weighted) and magnetic particle imaging (Bathke et al. 2017), functional MRI
(fMRI) and anatomical MRI (Rasch et al. 2018b), as well as anatomical (1 H) and
fluorinated gas (19 F) MRI (Obert et al. 2020). A related imaging task is quantitative
MRI (such as Magnetic Resonance Fingerprinting Ma et al. 7440) (Davies et al.
2013; Tang et al. 2018; Dong et al. 2019; Golbabaee et al. 2020) where one aims
to reconstruct quantitative maps of tissue parameters (e.g., T1 , T2 , proton density,
off-resonance frequency), but regularizers coupling these maps have not been used
to date. The idea to couple channels has also been used for parallel MRI (Chen et al.
2013).
Fig. 2 Multi-contrast MRI. The same MRI scanner can produce different images depending
on the acquisition sequence such as T1 -weighted (left) and T2 -weighted images (right). (Images
courtesy of N. Burgos)
Fig. 3 Color imaging. The color image (left) is composed of three color channels (right) all of
which show similar edges due to the same scenery. (Images courtesy of M. Ehrhardt)
Starting from the 1990s, mathematical models were developed that make use
of the expected correlations between color channels of RGB images (Sapiro and
Ringach 1996; Blomgren and Chan 1998; Sochen et al. 1998); see Fig. 3. Research
in this field is still very active today; see, e.g., Tschumperlé and Deriche (2005),
Bresson and Chan (2008), Goldluecke et al. (2012), Holt (2014), Ehrhardt and
Arridge (2014), and Möller et al. (2014).
In remote sensing observations are often available from multiple sensors either
mounted on a plane or on a satellite. For example, a hyperspectral camera with
low spatial resolution and a digital camera with higher spatial resolution may be
used simultaneously; see Fig. 4. This situation naturally invites the fusion of
information; see Ballester et al. (2006), Möller et al. (2012), Fang et al. (2013),
Loncan et al. (2015), Yokoya et al. (2017), Duran et al. (2017), Bungert et al. (2018),
Bungert et al. (2018) and references therein. In some situations the response of the
cameras to certain wavelengths is (assumed to be) known such that the data can be
fused making use of this knowledge. This is commonly referred to as pansharpening
(Loncan et al. 2015; Yokoya et al. 2017; Duran et al. 2017). It is important to note
that this assumption is sometimes not fulfilled, and many of the aforementioned
algorithms are flexible enough to fuse data in this more general situation.
Fig. 5 Spectral CT. Standard (white-beam) CT on the left and three channels (28, 34, and 39 keV)
of spectral CT on the right of an iodine-stained lizard head reconstructed by CIL (Ametova et al.
2019). The spectral channels clearly show a large increase in intensity from 28 to 34 keV, thereby
revealing the presence, location, and concentration of iodine. (Images courtesy of J. Jorgensen and
R. Warr)
et al. 2019), CT and MRI (Xi et al. 2015), photoacoustic and optical coherence
tomography (Elbau et al. 2018), x-ray fluorescence and transmission tomography
(Di et al. 2016), and various channels in multi-modal electron tomography (Huber
et al. 2019). The combination of various imaging modalities into one system may
eventually lead to what is sometimes referred to as omni-tomography (Wang et al.
2012).
Image reconstruction with side information is mathematically similar to multi-
modal image registration, and thus it is not surprising that both fields share a lot of
mathematical models; see, e.g., Wells III et al. (1996), Maes et al. (1997), Pluim
et al. (2000), and Haber and Modersitzki (2006).
Variational Regularization
Inverse problems of the form (1) can be solved using variational regularization, i.e.,
framed as the optimization problem
formulations which are well-defined even when u is not smooth. For simplicity, we
do not go into more detail in this direction but refer the interested reader to the
literature, e.g., Bredies et al. (2010) and Burger and Osher (2013).
All three regularizers promote solutions with different smoothness properties. H1
promotes smooth solutions with small gradients everywhere, whereas TV promotes
solutions which have sparse gradients, i.e., the images are piecewise constant
and appear cartoon-like. The latter also leads to the staircase artifact which can
be overcome by TGV which promotes piecewise linear solutions. None of these
regularizers are able to encode additional information on the location or direction of
edges.
Contributions
Related Work
Joint Reconstruction
One can think of the setting (1) with extra information v as a special case when
multiple measurements
K_i u_i = f_i,   i = 1, . . . , m   (6)
are taken. If m = 2 and one inverse problem is considerably less ill-posed, then this
can be solved first to guide the inversion of the other. Some of the described models
can be extended to the more general case (e.g., an arbitrary number of modalities)
or the joint recovery of both/all unknowns (see, e.g., (Sapiro and Ringach 1996;
Haber and Oldenburg 1997; Arridge and Simmons 1997; Gallardo and Meju 2003,
2004, 2011; Chen et al. 2013; Haber and Holtzman-Gazit 2013; Knoll et al. 2014;
Ehrhardt and Arridge 2014; Holt 2014; Ehrhardt et al. 2015; Rigie and La Riviere
2015; Knoll et al. 2016; Di et al. 2016; Mehranian et al. 2018; Zhang and Zhang
2018; Meju et al. 2019; Huber et al. 2019)), but it is out of the scope of this chapter
to provide an overview on those. For an overview up to 2015, see Ehrhardt (2015).
A few recent contributions are summarized in Arridge et al. (2020).
Model (6) may include several special cases: (i) multiple measurements of the
same unknown, i.e., ui = u, and (ii) measurements correspond to different states
of the same unknown, e.g., in dynamic imaging ui = u(·, ti ). The former case is
covered by the standard literature when concatenating the measurements and the
systems models, i.e., (Ku)i := Ki u and f = (f1 , . . . , fm ). The latter has been
widely studied in the literature, too; see, e.g., (Schmitt and Louis 2002; Schmitt
et al. 2002; Schuster et al. 2018) and references therein. Both of these are in general
unrelated to multi-modality imaging.
E_u = E_v,                                                                               (7)

where E_u := {x ∈ Ω | ∇u(x) ≠ 0}. We also write u ∼_e v to denote that u and v are
structurally similar in the sense of edge sets.
We also write u ∼_d v to denote that u and v are structurally similar in the sense of
parallel level sets.
Remark 1. For smooth images u and v, their gradients are perpendicular to their
level sets, i.e., u^{−1}(s) = {x ∈ Ω | u(x) = s}. Thus parallel gradients are equivalent
to parallel level sets which explains the naming. The notion that the structure of an
image is contained in its level sets dates back to Caselles et al. (2002).
same edge set since E_u = E_v = Ω, but they do not have parallel level sets since
∇u(x) = [1, 0] but ∇v(x) = [0, 1].
Remark 4. It has been argued in the literature that many multi-modality images
z : Ω → R^m essentially decompose as
where ρ(x) describes its structure and τ is a material property; see, e.g., Kimmel
et al. (2000) and Holt (2014). Since the material does not change arbitrarily, it is
natural to assume that τi is slowly varying or even piecewise constant. In the latter
case, if x is such that ∇τi (x) = 0, then we have
in particular if τ_i, τ_j ≠ 0, then z_i ∼_d z_j. This property is also related to the material
decomposition in spectral CT; see, e.g., Fessler et al. (2002), Heismann et al. (2012)
and Long and Fessler (2014).
Measuring the degree of similarity with respect to the previous two definitions of
structural similarity is not easy, and we will now discuss a couple of ideas from the
literature. Here and for the rest of this chapter, we will make frequent use of the
vector-valued representation of a set of images z : Ω → R^2, z(x) := [u(x), v(x)].
We denote by J its Jacobian, i.e., J : Ω → R^{d×2}, J_{i,j} = ∂_i z_j.
With the definition of the Jacobian, we see that u ∼_e v if and only if

∫_Ω |J(x)|_0 dx = ∫_Ω |∇u(x)|_0 dx = ∫_Ω |∇v(x)|_0 dx                                    (11)
Similarly, by definition u ∼ v if and only if u ∼ v and (a) rank J (x) = 1 for all
x ∈ Eu. (a) is equivalent to (b) a vanishing determinant, i.e., det J (x)J (x) = 0.
Simple calculations (see, e.g., Ehrhardt (2015)) show that
where we use the notation ⟨x, y⟩ = x^T y for the inner product of two column vectors
x and y. In order to get further equivalent statements, we turn to the singular values
of the Jacobian which are given by

σ_1^2(x) = (1/2) ( |J(x)|^2 + √( |J(x)|^4 − 4 det(J(x)^T J(x)) ) )                       (13)

σ_2^2(x) = (1/2) ( |J(x)|^2 − √( |J(x)|^4 − 4 det(J(x)^T J(x)) ) )                       (14)
with |J(x)|^2 = |∇u(x)|^2 + |∇v(x)|^2; see, e.g., Ehrhardt (2015). Since σ_1(x) ≥
σ_2(x) ≥ 0, we have that (a) holds if and only if (c) the second singular value
vanishes, i.e., σ_2(x) = 0, or (d) the vector of singular values σ(x) = [σ_1(x), σ_2(x)]
is 1-sparse.
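A direct way to inspect these conditions numerically is to compute the per-pixel singular values from (13) and (14). The following Python sketch (not from the chapter) does this with forward differences; the synthetic test images are assumptions made purely for illustration.

```python
import numpy as np

# Per-pixel singular values of J(x) = [grad u(x), grad v(x)] via (13)-(14).
def grad(img):
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:-1, :] = img[1:, :] - img[:-1, :]      # forward differences
    gy[:, :-1] = img[:, 1:] - img[:, :-1]
    return np.stack([gx, gy], axis=-1)         # shape (H, W, 2)

yy, xx = np.mgrid[0:64, 0:64]
u = (xx + yy > 60).astype(float)                       # synthetic image
v = 2.0 * (xx + yy > 60) + 0.1 * xx / 64               # side image with the same edge

gu, gv = grad(u), grad(v)
frob2 = (gu ** 2).sum(-1) + (gv ** 2).sum(-1)                     # |J|^2
det = (gu[..., 0] * gv[..., 1] - gu[..., 1] * gv[..., 0]) ** 2    # det(J^T J)
disc = np.sqrt(np.maximum(frob2 ** 2 - 4.0 * det, 0.0))
sigma1 = np.sqrt(0.5 * (frob2 + disc))
sigma2 = np.sqrt(np.maximum(0.5 * (frob2 - disc), 0.0))

# sigma2(x) is zero wherever the two gradients are parallel, cf. condition (c)
print("max sigma2 over the image:", sigma2.max())
```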
Structure-Promoting Regularizers
Many of the abstract models from the previous section to measure the degree
of similarity with respect to the previous two definitions of structural similarity
are computationally challenging as they relate to non-convex constraints. In this
section we will define convex structure-promoting regularizers which make them
computationally tractable.
Isotropic Models
We first look at isotropic models which only depend on gradient magnitudes rather
than directions, thus promoting structural similarity in the sense of edge sets,
Definition 1.
First, based on (11), if we approximate |J(x)|_0 by |J(x)|, then

JTV(u) = ∫_Ω |J(x)| dx = ∫_Ω √( |∇u(x)|^2 + |∇v(x)|^2 ) dx                               (15)
       ≤ ∫_Ω |∇u(x)| + |∇v(x)| dx = TV(u) + TV(v)                                        (16)
Remark 5. Note that JTV has the favorable property that if ∇v = 0, then JTV(u) =
TV(u), so that it reduces to a well-defined regularization in u in this degenerate
case. Note that this property also holds locally.
Remark 6. We would also like to note that there is a connection between JTV and
the singular values of J. Let σ_1, σ_2 : Ω → [0, ∞) be the two singular values of J,
and then we have

JTV(u) = ∫_Ω √( σ_1^2(x) + σ_2^2(x) ) dx.                                                (17)
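A quick numerical check of (15) and (16), i.e., of the bound JTV ≤ TV(u) + TV(v), can be done with finite differences, as in the following Python sketch (illustrative only; the random test images are assumptions).

```python
import numpy as np

# Discrete check of (15)-(16): JTV <= TV(u) + TV(v) for finite-difference gradients.
def grad(img):
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:-1, :] = img[1:, :] - img[:-1, :]
    gy[:, :-1] = img[:, 1:] - img[:, :-1]
    return gx, gy

rng = np.random.default_rng(1)
u, v = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
gux, guy = grad(u); gvx, gvy = grad(v)

jtv = np.sqrt(gux**2 + guy**2 + gvx**2 + gvy**2).sum()            # Eq. (15)
tv_sum = np.sqrt(gux**2 + guy**2).sum() + np.sqrt(gvx**2 + gvy**2).sum()
print("JTV         :", jtv)
print("TV(u)+TV(v) :", tv_sum, "(upper bound, Eq. (16))")
```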
w(x) = η / √( η^2 + |∇v(x)|^2 )                                                          (18)
which is illustrated in Fig. 6. The figure shows that with a medium η the weight w
in (18) shows the main structures of the images so that these can be promoted in
the other image. If η is too small, then also unwanted structures are captured in w
such as a smooth background variation. If η is too large, then the structures start to
disappear.
For regularizers which are based on the image gradient ∇u, the weighting w can
be used to favor edges at certain locations by replacing ∇ by w∇. For instance, for
H1 (3), TV (4), and TGV (5), this strategy results in
wH1(u) = ∫_Ω |w(x)∇u(x)|^2 dx = ∫_Ω w^2(x)|∇u(x)|^2 dx                                   (19)

wTV(u) = ∫_Ω |w(x)∇u(x)| dx = ∫_Ω w(x)|∇u(x)| dx                                         (20)

wTGV(u) = inf_ζ ∫_Ω |w(x)∇u(x) − ζ(x)| + β|Eζ(x)| dx                                     (21)
which we will refer to as weighted squared H 1 -semi norm, weighted total variation,
and weighted total generalized variation. wTV was used in Arridge et al. (2008),
Lenzen and Berger (2015) and Ehrhardt and Betcke (2016). A variant of wTV has
been considered for single modality imaging in Hintermüller and Rincon-Camacho
Fig. 6 Influence of the parameter η on estimation of edge location. The images on the right show
the scalar field w : Ω → [0, 1] which locally weights the influence of the regularizer; see (18).
Here “black” denotes 0 and “white” denotes 1
(2010) and Dong et al. (2011) and extended to a variant of wTGV (Bredies et al.
2012).
Anisotropic Models
D(x) = I − γ ξ(x)ξ(x)^T,   ξ(x) = ∇v(x) / √( η^2 + |∇v(x)|^2 )                           (22)
for γ ∈ (0, 1] (usually close to 1) and η > 0 satisfies all of these properties. Clearly
if ∇v(x) = 0, then ξ(x) = 0 such that D(x) = I. Moreover, if ∇u(x) ∥ ∇v(x), then
there exists an α such that ∇u(x) = α∇v(x) and
Fig. 7 Influence of the parameter η on estimation of edge location and direction. The images on
the right show the vector field ξ : Ω → R^d which locally defines the influence of the regularizer;
see, e.g., (22). Here “black” denotes that the magnitude of ξ , i.e., |ξ(x)|, is 0, and a bright color
denotes that |ξ(x)| is 1. The colors show the direction of the vector field ξ modulo its sign
D(x)∇u(x) = ( I − (γ / (η^2 + |∇v(x)|^2)) ∇v(x)∇v(x)^T ) ∇u(x)                           (23)
          = ( 1 − γ|∇v(x)|^2 / (η^2 + |∇v(x)|^2) ) ∇u(x).                                (24)
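The damping expressed by (23) and (24) can be seen directly in a single-pixel computation, as in the following Python sketch (illustrative values only): gradients of u parallel to ∇v are scaled by the factor in (24), while orthogonal gradients pass through unchanged.

```python
import numpy as np

# Directional weighting (22): D = I - gamma * xi xi^T at a single pixel.
gamma, eta = 1.0, 1e-2
grad_v = np.array([2.0, 0.0])                       # side-image gradient (illustrative)
xi = grad_v / np.sqrt(eta**2 + grad_v @ grad_v)
D = np.eye(2) - gamma * np.outer(xi, xi)

grad_u_parallel = np.array([3.0, 0.0])              # parallel to grad v -> strongly damped
grad_u_orthogonal = np.array([0.0, 3.0])            # orthogonal to grad v -> unchanged

print("D @ parallel  :", D @ grad_u_parallel)       # factor eta^2/(eta^2+|grad v|^2), cf. (24)
print("D @ orthogonal:", D @ grad_u_orthogonal)
```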
Another strategy to promote parallel level sets is via the nuclear norm of the Jacobian,
which is defined as |J(x)|_* = Σ_{i=1}^{min(d,2)} σ_i(x), where σ_i(x) denotes the ith singular
value of J(x). Using the nuclear norm promotes sparse vectors of singular values
σ(x) = [σ_1(x), σ_2(x)] and thereby parallel level sets. As a regularizer

TNV(u) = ∫_Ω |J(x)|_* dx                                                                 (31)
this strategy became known as total nuclear variation; see Holt (2014), Rigie and
La Riviere (2015), Knoll et al. (2016), and Rigie et al. (2017).
All first-order regularizers of this section can be readily summarized in the
following standard form
J(u) = ∫_Ω φ[B(x)∇u(x)] dx                                                               (32)
Algorithmic Solution
Note that the solution to variational regularization (2) with either first- (32) or
second-order structural regularization (5), (21), (27) can be cast into the general
non-smooth composite optimization form

min_x { F(Ax) + G(x) }                                                                   (33)

with F(y) = Σ_{i=1}^n F_i(y_i) and Ax = [A_1x, . . . , A_nx]; see Table 2. We denote by
‖·‖_{2,1}, ‖·‖_2^2, and ‖·‖_{*,1} discretizations of

z ↦ ∫_Ω |z(x)| dx,   z ↦ ∫_Ω |z(x)|^2 dx,   and   z ↦ ∫_Ω |z(x)|_* dx.                   (34)
Note that all functionals F_i and G in Table 2 are proper, convex, and lower semicontinuous.
Algorithm
A popular algorithm to solve (33) and therefore (2) is the primal-dual hybrid
gradient (PDHG) (Esser et al. 2010; Chambolle and Pock 2011); see Algorithm 1.
It consists of two simple steps only involving basic linear algebra and the evaluation
of the operator A and its adjoint A∗ . Moreover, it involves the computation of the
proximal operator of τ G and the convex conjugate of σ F∗ where τ and σ are scalar
step sizes. The proximal operator of a functional H is defined as
prox_H(z) := arg min_x { (1/2)‖x − z‖_2^2 + H(x) }.                                      (35)
Table 2 Mapping the variational regularization models into the composite optimization framework (33). In all cases we choose A_1x = Ku, F_1(y_1) = D(y_1, b), and G(x) = ı_{≥0}(u)

Regularizer | Definition | x      | A_2x     | A_3x | F_2(y_2)                | F_3(y_3)
H1          | (3)        | u      | ∇u       | –    | α‖y_2‖_2^2              | –
wH1         | (19)       | u      | w∇u      | –    | α‖y_2‖_2^2              | –
dH1         | (19)       | u      | D∇u      | –    | α‖y_2‖_2^2              | –
TV          | (4)        | u      | ∇u       | –    | α‖y_2‖_{2,1}            | –
wTV         | (20)       | u      | w∇u      | –    | α‖y_2‖_{2,1}            | –
dTV         | (26)       | u      | D∇u      | –    | α‖y_2‖_{2,1}            | –
JTV         | (16)       | u      | [∇u, 0]  | –    | α‖y_2 − [0, ξ]‖_{2,1}   | –
TNV         | (31)       | u      | [∇u, 0]  | –    | α‖y_2 − [0, ξ]‖_{*,1}   | –
TGV         | (5)        | (u, ζ) | ∇u − ζ   | Eζ   | α‖y_2‖_{2,1}            | αβ‖y_3‖_{2,1}
wTGV        | (21)       | (u, ζ) | w∇u − ζ  | Eζ   | α‖y_2‖_{2,1}            | αβ‖y_3‖_{2,1}
dTGV        | (27)       | (u, ζ) | D∇u − ζ  | Eζ   | α‖y_2‖_{2,1}            | αβ‖y_3‖_{2,1}
or the dimension of the domain are strictly less than 5, i.e., m, d < 5; see Holt
(2014) for more details. Note also that the proximal operator of αF(· − ξ ) can be
readily computed based on the proximal operator of F. More details on proximal
operators, convex conjugates, and examples can be found, for example, in Bauschke
and Combettes (2011), Combettes and Pesquet (2011), Parikh and Boyd (2014), and
Chambolle and Pock (2016).
For some applications (e.g., x-ray tomography), a preconditioned (Pock and
Chambolle 2011; Ehrhardt et al. 2019) or randomized (Chambolle et al. 2018;
Ehrhardt et al. 2019) variant can be useful, but we will not consider these here for
simplicity.
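For completeness, the following compact Python/NumPy sketch shows a PDHG iteration of the form used here on the simplest instance of (33), a TV-denoising problem with G(u) = (1/2)‖u − b‖_2^2, F(y) = α‖y‖_{2,1}, and A = ∇. It is only an illustration of the two steps of the algorithm: the phantom, noise level, regularization parameter, step sizes, and iteration count are assumptions, and the actual experiments in this chapter are run with ODL/ASTRA.

```python
import numpy as np

# PDHG (Chambolle-Pock) for  min_u 0.5||u - b||^2 + alpha ||grad u||_{2,1}.
def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div(p):                                   # div = -grad^T (adjoint up to sign)
    px, py = p
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
b = clean + 0.1 * rng.normal(size=clean.shape)       # noisy data (illustrative)

alpha = 0.2
tau = sigma = 1.0 / np.sqrt(8.0)                     # tau * sigma * ||grad||^2 <= 1
u = b.copy(); u_bar = u.copy(); y = np.zeros((2,) + b.shape)

for _ in range(300):
    # dual step: prox of sigma*F^* = pointwise projection onto {|y|_2 <= alpha}
    y = y + sigma * grad(u_bar)
    y = y / np.maximum(1.0, np.sqrt((y ** 2).sum(0)) / alpha)
    # primal step: prox of tau*G with G(u) = 0.5||u - b||^2
    u_old = u
    u = (u + tau * div(y) + tau * b) / (1.0 + tau)
    u_bar = 2.0 * u - u_old

print("squared error to ground truth, noisy vs. denoised:",
      ((b - clean) ** 2).sum(), ((u - clean) ** 2).sum())
```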
Prewhitening
min_x { F̃(Ãx) + G(x) }.                                                                 (36)
with F̃(y) := Σ_{i=1}^n F_i(‖A_i‖ y_i) and Ã_ix := A_ix/‖A_i‖. Then trivially ‖Ã_i‖ = 1,
i = 1, . . . , n, so that all operator norms are equal. Note that the proximal operator
of σF̃ is simple to compute if the proximal operators of σF_i, i = 1, . . . , n, are
simple to compute, since

prox_{σF_i(λ_i ·)}(z) = (1/λ_i) prox_{σλ_i^2 F_i}(λ_i z)
for any λi > 0; see, for instance, Bredies and Lorenz (2018, Lemma 6.136).
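A minimal sketch of this prewhitening step is given below (illustrative only): the operator norms ‖A_i‖ are estimated by power iteration and the operators are rescaled to unit norm as in (36); the random matrices merely stand in for the actual operators K, ∇, and E.

```python
import numpy as np

# Prewhitening: estimate ||A_i|| by power iteration and rescale A_i -> A_i/||A_i||.
rng = np.random.default_rng(0)
ops = [rng.normal(size=(40, 30)) / s for s in (1.0, 10.0, 0.1)]   # very different scales

def op_norm(A, iters=100):
    x = rng.normal(size=A.shape[1])
    for _ in range(iters):                  # power iteration on A^T A
        x = A.T @ (A @ x)
        x /= np.linalg.norm(x)
    return np.linalg.norm(A @ x)

norms = [op_norm(A) for A in ops]
ops_tilde = [A / n for A, n in zip(ops, norms)]
print("original norms:", [round(n, 3) for n in norms])
print("rescaled norms:", [round(op_norm(A), 3) for A in ops_tilde])
# the F_i must be rescaled accordingly: F~_i(y_i) := F_i(||A_i|| * y_i), cf. (36)
```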
Numerical Comparison
Software The numerical computations are carried out in Python using ODL
(version 1.0.0.dev0) (Adler et al. 2017) and ASTRA (van Aarle et al. 2015, 2016)
for computing line integrals in the tomography example. The source code which
reproduces all experiments in this chapter can be found at https://github.com/
mehrhardt/Multi-Modality-Imaging-with-Structural-Priors.
Data We consider two test cases with different characteristics, both of which are
visualized in Fig. 8. The first test case, later referred to as x-ray, is parallel beam
x-ray reconstruction from only 15 views where additionally some detectors are
broken. The latter is modeled by salt-and-pepper noise where 5% of all detectors are
corrupted. We aim to recover an image with domain [−1, 1]2 discretized with 2002
pixels. The simulated x-ray camera has 100 detectors and a width of 3 in the same
dimensions as the image domain. Therefore, the challenges are (1) sparse views, (2)
small number of detectors, and (3) broken detectors.
The second test case, which we refer to as super-resolution, considers the
task of super-resolution. Also here we aim to recover an image with domain [−1, 1]2
discretized with 2002 pixels. The forward operator is integrating over 52 pixels, thus
mapping images of size 2002 to images of size 402 . In addition, Gaussian noise of
mean zero and standard deviation of 0.01 is added.
Algorithmic parameters We chose the default value ρ = 1 for balancing the step
sizes in PDHG and ran the algorithm for 3,000 iterations without choosing a specific
stopping criterion.
Fig. 8 Test cases for numerical experiments. Top: x-ray reconstruction from sparse views and
failed detectors. Bottom: super-resolution by a factor of 5 and Gaussian noise
Numerical Results
Fig. 9 Effect of edge weighting on locally weighted models for test case x-ray: increasing edge
parameter η from left to right. All other parameters were tuned to maximize the PSNR and visual
image quality
Comparison of regularizers All eleven regularizers are compared in Fig. 15. It can
be seen that the structure-promoting regularizers perform much better in terms of
PSNR and SSIM than their non-structure-promoting counterparts. Moreover, one can
Fig. 10 Effect of edge weighting on directional models for test case x-ray: increasing edge
parameter η from left to right. All other parameters were tuned to maximize the PSNR and visual
image quality (γ = 1)
Fig. 11 Effect of edge weighting on joint total variation and total nuclear variation for test case
x-ray: increasing edge parameter η from left to right. All other parameters were tuned to
maximize the PSNR and visual image quality
Comparison of regularizers All regularizers are compared in Fig. 19 for the test
case super-resolution. It can be noted from all images that introducing
structural information makes it possible to resolve some of the inner circles which have been
merged for regularizers which are not structure-promoting. Moreover, all total
generalized variation-based regularizers do not perform much better than the total
variation-based regularizers. The directional regularizers as well as JTV and TNV
perform best in terms of PSNR for this example.
The median computing times for the numerical experiments are reported in Table 3.
The computing time of PDHG is mainly influenced by the dimensions of the models,
the proximal operator, and the forward model. As can be seen from the table, H1 and
TV are roughly equally fast. TGV, which uses a second primal variable in the space
of the image gradient, is significantly slower, with about twice the computational
cost. In all three cases, introducing isotropic weights (i.e., wH1, wTV, and wTGV)
increases the cost by about 6 s, and anisotropic weights (i.e., dH1, dTV, and
Fig. 12 H1-seminorm-based structure-promoting regularizers for test case x-ray: increasing
the regularization parameter α from left to right. All other parameters were tuned to maximize
the PSNR and visual image quality. All regularizers in this figure reduce to the H1-seminorm in
areas where the side information is flat
dTGV) by about 12 s. JTV is more costly than dTV but not as costly as TGV. TNV
is by far the most costly of all algorithms due to the need to compute singular value
decompositions of 2 × 2 matrices at every pixel.
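The per-pixel singular value decompositions behind TNV can be carried out with NumPy's batched SVD, as in the following sketch; the (H, W, 2, 2) layout of the Jacobian field is an assumption made for illustration.

```python
import numpy as np


def nuclear_norm_field(jacobian):
    """Pixelwise nuclear norm of a field of 2x2 Jacobians stored as an (H, W, 2, 2) array."""
    singular_values = np.linalg.svd(jacobian, compute_uv=False)   # shape (H, W, 2)
    return singular_values.sum(axis=-1)                           # sum of singular values per pixel
```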
Since we always run PDHG for 3,000 iterations, we report the computational cost
for the full 3,000 iterations rather than the time "until convergence". It
was observed on several occasions (see, e.g., Ehrhardt et al. 2019) that including
side information in the regularizer not only improves the reconstruction but also
Fig. 13 Total variation based structure-promoting regularizers for test case x-ray: increasing
the regularization parameter α from left to right. All other parameters were tuned to maximize
the PSNR and visual image quality. All regularizers in this figure reduce to the total variation in
areas where the side information is flat
Fig. 14 Total generalized variation-based structure-promoting regularizers for test case x-ray:
increasing the regularization parameter α from left to right. All other parameters were tuned to
maximize the PSNR and visual image quality (β = 5e−2). All regularizers in this figure reduce to
the total generalized variation in areas where the side information is flat
Fig. 15 Comparison of structure-promoting regularizers for test case x-ray. All parameters
were tuned to maximize the PSNR and visual image quality
Fig. 16 Effect of edge weighting on locally weighted models for test case super-
resolution: increasing edge parameter η from left to right. All other parameters were tuned
to maximize the PSNR and visual image quality
Conclusions
Fig. 17 Effect of edge weighting on directional models for test case super-resolution:
increasing edge parameter η from left to right. All other parameters were tuned to maximize
the PSNR and visual image quality (γ = 0.9)
of these regularizers for the promotion of structure has been observed in many
applications and was also illustrated in this chapter on two simulation studies.
Open Problems
Fig. 18 Effect of edge weighting on joint total variation and total nuclear variation for test case
super-resolution: increasing edge parameter η from left to right. All other parameters were
tuned to maximize the PSNR and visual image quality
Extensions beyond two modalities It is natural to consider the case in which more than
one image is available as side information. For instance, in some remote sensing
applications, a color photograph with high spatial resolution is available. Similarly,
in PET-MR, images of more than one MR sequence might be available. This setting
has also been considered in Mehranian et al. (2017) for a purely discrete model.
Some of the regularizers promoting structural similarity in this chapter naturally
extend to multiple images as side information, but this has not yet been properly
investigated.
Joint reconstruction Throughout this chapter, the focus was on improving the
reconstruction of one image with the aid of another modality used as side infor-
mation. Since the other image is rarely acquired directly, it is natural to aim to
reconstruct both images simultaneously rather than sequentially. While conceptually
appealing, this strategy leads to many more complications than the approach
discussed in this chapter, which is sometimes referred to as one-sided reconstruction.
While the mathematical framework for one-sided reconstruction is quite mature,
the framework for joint reconstruction is, despite a lot of research effort over the last
10 years, still in its infancy. Fundamental problems such as computationally tractable
and efficient coupling of modalities remain unsolved. Making use of the solid
mathematical foundations of one-sided reconstruction for joint reconstruction in a
mathematically sound and computationally tractable way is still not possible to date.
Acknowledgments The author acknowledges support from the EPSRC grant EP/S026045/1 and
the Faraday Institution EP/T007745/1. Moreover, the author is grateful to all his collaborators
who indirectly contributed to this chapter over the last couple of years.
References
van Aarle, W., Palenstijn, W.J., Cant, J., Janssens, E., Bleichrodt, F., Dabravolski, A., De
Beenhouwer, J., Joost Batenburg, K., Sijbers, J.: Fast and flexible X-ray tomography using the
ASTRA toolbox. Optics Express 24(22), 25129 (2016). https://doi.org/10.1364/OE.24.025129
van Aarle, W., Palenstijn, W.J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K.J.,
Sijbers, J.: The ASTRA Toolbox: A platform for advanced algorithm development in electron
tomography. Ultramicroscopy 157, 35–47 (2015). https://doi.org/10.1016/j.ultramic.2015.05.
002
Adler, J., Kohr, H., Öktem, O.: Operator Discretization Library (ODL) (2017). https://doi.org/10.
5281/zenodo.249479
Ametova, E., Fardell, G., Jørgensen, J.S., Lionheart, W.R.B., Papoutsellis, E., Pasca, E., Sykes, D.,
Turner, M., Warr, R., Withers, P.J.: Core Imaging Library (CIL) (2019). https://www.ccpi.ac.
uk/cil
Arridge, S.R., Burger, M., Ehrhardt, M.J.: Preface to special issue on joint reconstruction and
multi-modality/multi-spectral imaging. Inverse Prob. 36, 020302 (2020)
Arridge, S.R., Kolehmainen, V., Schweiger, M.J.: Reconstruction and regularisation in optical
tomography. In: Censor, Y., Jiang, M., Louis, A.K. (eds.) Mathematical Methods in Biomedical
Imaging and Intensity-Modulated Radiation Therapy (IMRT). Scuola Normale Superiore
(2008)
Arridge, S.R., Simmons, A.: Multi-spectral probabilistic diffusion using Bayesian classification.
In: ter Haar Romeny, B.M., Florack, L., Koenderink, J.J., Viergever M.A. (eds.) Scale-Space
Theories in Computer Vision, pp. 224–235. Springer, Berlin (1997). https://doi.org/10.1007/3-
540-63167-4_53
Baete, K., Nuyts, J., Van Paesschen, W., Suetens, P., Dupont, P.: Anatomical-based FDG-PET
reconstruction for the detection of hypo-metabolic regions in epilepsy. IEEE Trans. Med.
Imaging 23(4), 510–519 (2004). https://doi.org/10.1109/TMI.2004.825623
Bai, B., Li, Q., Leahy, R.M.: Magnetic resonance-guided positron emission tomography image
reconstruction. Semin. Nucl. Med. 43, 30–44 (2013). https://doi.org/10.1053/j.semnuclmed.
2012.08.006
Ballester, C., Caselles, V., Igual, L., Verdera, J., Rougé, B.: A variational model for P+XS image
fusion. Int. J. Comput. Vis. 69(1), 43–58 (2006). https://doi.org/10.1007/s11263-006-6852-x
Bathke, C., Kluth, T., Maass, P.: Improved image reconstruction in magnetic particle imaging using
structural a priori information. Int. J. Magn. Part. Imaging 3(1) (2017)
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces
(2011). https://doi.org/10.1007/978-1-4419-9467-7
Benning, M., Burger, M.: Modern regularization methods for inverse problems. Acta Numerica 27,
1–111 (2018). https://doi.org/10.1017/S0962492918000016
Bilgic, B., Goyal, V.K., Adalsteinsson, E.: Multi-contrast reconstruction with Bayesian compressed
sensing. Magn. Reson. Med. 66(6), 1601–1615 (2011). https://doi.org/10.1002/mrm.22956
Blomgren, P., Chan, T.F.: Color TV: Total variation methods for restoration of vector-valued
images. IEEE Trans. Image Process. 7(3), 304–309 (1998). https://doi.org/10.1109/83.661180
Bousse, A., Pedemonte, S., Kazantsev, D., Ourselin, S., Arridge, S.R., Hutton, B.F.: Weighted
MRI-based Bowsher priors for SPECT brain image reconstruction. In: IEEE Nuclear Science
Symposium and Medical Imaging Conference, pp. 3519–3522 (2010)
Bousse, A., Pedemonte, S., Thomas, B.A., Erlandsson, K., Ourselin, S., Arridge, S.R., Hutton, B.F.:
Markov random field and Gaussian mixture for segmented MRI-based partial volume correction
in PET. Phys. Med. Biol. 57(20), 6681–6705 (2012). https://doi.org/10.1088/0031-9155/57/20/
6681
Bowsher, J.E., Johnson, V.E., Turkington, T.G., Jaszczak, R.J., Floyd, C.E., Coleman, R.E.:
Bayesian reconstruction and use of anatomical a priori information for emission tomography.
IEEE Trans. Med. Imaging 15(5), 673–686 (1996). https://doi.org/10.1109/42.538945
Bowsher, J.E., Yuan, H., Hedlund, L.W., Turkington, T.G., Akabani, G., Badea, A., Kurylo, W.C.,
Wheeler, C.T., Cofer, G.P., Dewhirst, M.W., Johnson, G.A.: Utilizing MRI information to
estimate F18-FDG distributions in rat flank tumors. In: IEEE Nuclear Science Symposium and
Medical Imaging Conference, pp. 2488–2492 (2004). https://doi.org/10.1109/NSSMIC.2004.
1462760
Bredies, K., Dong, Y., Hintermüller, M.: Spatially dependent regularization parameter selection
in total generalized variation models for image restoration. Int. J. Comput. Math. 1–15 (2012).
https://doi.org/10.1080/00207160.2012.700400
Bredies, K., Holler, M.: Regularization of linear inverse problems with total generalized variation.
J. Inverse Ill-Posed Prob. 22(6), 871–913 (2014). https://doi.org/10.1515/jip-2013-0068
Bredies, K., Holler, M.: A TGV-based framework for variational image decompression, zooming,
and reconstruction. Part II: Numerics. SIAM J. Imag. Sci. 8(4), 2851–2886 (2015). https://doi.
org/10.1137/15M1023877
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imag. Sci. 3(3), 492–526
(2010). https://doi.org/10.1137/090769521
Bredies, K., Lorenz, D.A.: Mathematical Image Processing, 1 edn. Birkhäuser Basel (2018).
https://doi.org/10.1007/978-3-030-01458-2
Bresson, X., Chan, T.F.: Fast dual minimization of the vectorial total variation norm and
applications to color image processing. Inverse Prob. Imaging 2(4), 455–484 (2008). https://
doi.org/10.3934/ipi.2008.2.455
Bungert, L., Coomes, D.A., Ehrhardt, M.J., Rasch, J., Reisenhofer, R., Schönlieb, C.B.: Blind
image fusion for hyperspectral imaging with the directional total variation. Inverse Prob. 34(4),
044003 (2018). https://doi.org/10.1088/1361-6420/aaaf63
Bungert, L., Ehrhardt, M.J.: Robust image reconstruction with misaligned structural information
(2020). http://arxiv.org/abs/2004.00589
Bungert, L., Ehrhardt, M.J., Reisenhofer, R.: Robust blind image fusion for misaligned hyperspec-
tral imaging data. In: Proceedings in Applied Mathematics & Mechanics, vol. 18, p. e201800033
(2018). https://doi.org/10.1002/pamm.201800033
Burger, M., Osher, S.: A guide to the TV zoo. In: Level Set and PDE Based Reconstruction
Methods in Imaging, Lecture Notes in Mathematics, vol. 2090, pp. 1–70. Springer (2013).
https://doi.org/10.1007/978-3-319-01712-9
Caselles, V., Coll, B., Morel, J.M.: Geometry and color in natural images. J. Math. Imaging Vision
16(Section 2), 89–105 (2002). https://doi.org/10.1023/A:1013943314097
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.B.: Stochastic primal-dual hybrid
gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim. 28(4),
2783–2808 (2018)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011). https://doi.org/10.1007/s10851-
010-0251-1
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numerica
25, 161–319 (2016). https://doi.org/10.1017/S096249291600009X
Chan, C., Fulton, R., Feng, D.D., Cai, W., Meikle, S.: An anatomically based regionally adaptive
prior for MAP reconstruction in emission tomography. In: IEEE Nuclear Science Symposium
and Medical Imaging Conference, pp. 4137–4141 (2007). https://doi.org/10.1109/NSSMIC.
2007.4437032
Chan, C., Fulton, R., Feng, D.D., Meikle, S.: Regularized image reconstruction with an anatom-
ically adaptive prior for positron emission tomography. Phys. Med. Biol. 54(24), 7379–400
(2009). https://doi.org/10.1088/0031-9155/54/24/009
Chen, C., Li, Y., Huang, J.: Calibrationless parallel MRI with joint total variation regularization. In:
Medical Image Computing and Computer-Assisted Intervention, pp. 106–114 (2013). https://
doi.org/10.1007/978-3-642-40760-4_14
Cheng-Liao, J., Qi, J.: PET image reconstruction with anatomical edge guided level set prior. Phys.
Med. Biol. 56, 6899–6918 (2011). https://doi.org/10.1088/0031-9155/56/21/009
Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. Springer Optim.
Appl. 49, 185–212 (2011). https://doi.org/10.1007/978-1-4419-9569-8_10
Comtat, C., Kinahan, P.E., Fessler, J.A., Beyer, T., Townsend, D.W., Defrise, M., Michel, C.J.:
Clinically feasible reconstruction of 3D whole-body PET/CT data using blurred anatomical
labels. Phys. Med. Biol. 47(1), 1–20 (2002)
Davies, M., Puy, G., Vandergheynst, P., Wiaux, Y.: A compressed sensing framework for magnetic
resonance fingerprinting. SIAM J. Imag. Sci. 7(4), 2623–2656 (2013). https://doi.org/10.1137/
130947246
Deidda, D., Karakatsanis, N.A., Robson, P.M., Tsai, Y.J., Efthimiou, N., Thielemans, K., Fayad,
Z.A., Aykroyd, R.G., Tsoumpas, C.: Hybrid PET-MR list-mode kernelized expectation maxi-
mization reconstruction. Inverse Prob. 35(4) (2019). https://doi.org/10.1088/1361-6420/ab013f
Deligiannis, N., Mota, J.F., Cornelis, B., Rodrigues, M.R., Daubechies, I.: Multi-modal dictionary
learning for image separation with application in art investigation. IEEE Trans. Image Process.
26(2), 751–764 (2017). https://doi.org/10.1109/TIP.2016.2623484
Delso, G., Furst, S., Jakoby, B., Ladebeck, R., Ganter, C., Nekolla, S.G., Schwaiger, M., Ziegler,
S.I., Fürst, S., Jakoby, B., Ladebeck, R., Ganter, C., Nekolla, S.G., Schwaiger, M., Ziegler, S.I.:
Performance measurements of the Siemens mMR integrated whole-body PET/MR scanner. J.
Nucl. Med. 52(12), 1914–22 (2011). https://doi.org/10.2967/jnumed.111.092726
Di, Z.W., Leyffer, S., Wild, S.M.: Optimization-based approach for joint X-Ray fluorescence and
transmission tomographic inversion. SIAM J. Imag. Sci. 9(1), 1–23 (2016)
Dong, G., Hintermüller, M., Papafitsoros, K.: Quantitative magnetic resonance imaging: From
fingerprinting to integrated physics-based models. SIAM J. Imag. Sci. 12(2), 927–971 (2019).
https://doi.org/10.1137/18M1222211
Dong, Y., Hintermüller, M., Rincon-Camacho, M.M.: Automated regularization parameter selec-
tion in multi-scale total variation models for image restoration. J. Math. Imaging Vision 40(1),
82–104 (2011). https://doi.org/10.1007/s10851-010-0248-9
Duran, J., Buades, A., Coll, B., Sbert, C., Blanchet, G.: A survey of pansharpening methods with
a new band-decoupled variational model. ISPRS J. Photogramm. Remote Sens. 125, 78–105
(2017). https://doi.org/10.1016/j.isprsjprs.2016.12.013
Ehrhardt, M.J.: Joint reconstruction for multi-modality imaging with common structure. Ph.d.
thesis, University College London (2015)
Ehrhardt, M.J., Arridge, S.R.: Vector-valued image processing by parallel level sets. IEEE Trans.
Image Process. 23(1), 9–18 (2014). https://doi.org/10.1109/TIP.2013.2277775
Ehrhardt, M.J., Betcke, M.M.: Multi-contrast MRI reconstruction with structure-guided total
variation. SIAM J. Imag. Sci. 9(3), 1084–1106 (2016). https://doi.org/10.1137/15M1047325
Ehrhardt, M.J., Markiewicz, P.J., Liljeroth, M., Barnes, A., Kolehmainen, V., Duncan, J., Pizarro,
L., Atkinson, D., Hutton, B.F., Ourselin, S., Thielemans, K., Arridge, S.R.: PET reconstruction
with an anatomical MRI prior using parallel level sets. IEEE Trans. Med. Imaging 35(9), 2189–
2199 (2016). https://doi.org/10.1109/TMI.2016.2549601
Ehrhardt, M.J., Markiewicz, P.J., Schönlieb, C.B.: Faster PET reconstruction with non-smooth
priors by randomization and preconditioning. Phys. Med. Biol. 64(22), 225019 (2019). https://
doi.org/10.1088/1361-6560/ab3d07
Ehrhardt, M.J., Thielemans, K., Pizarro, L., Atkinson, D., Ourselin, S., Hutton, B.F., Arridge, S.R.:
Joint reconstruction of PET-MRI by exploiting structural similarity. Inverse Prob. 31(1), 015001
(2015). https://doi.org/10.1088/0266-5611/31/1/015001
Ehrhardt, M.J., Thielemans, K., Pizarro, L., Markiewicz, P.J., Atkinson, D., Ourselin, S., Hut-
ton, B.F., Arridge, S.R.: Joint reconstruction of PET-MRI by parallel level sets. In: IEEE
Nuclear Science Symposium and Medical Imaging Conference (2014). https://doi.org/10.1109/
NSSMIC.2014.7430895
Elbau, P., Mindrinos, L., Scherzer, O.: Quantitative reconstructions in multi-modal photoacoustic
and optical coherence tomography imaging. Inverse Prob. 34(1) (2018). https://doi.org/10.1088/
1361-6420/aa9ae7
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Mathematics and Its
Applications. Springer (1996)
Esser, E., Zhang, X., Chan, T.F.: A general framework for a class of first order primal-dual
algorithms for convex optimization in imaging science. SIAM J. Imag. Sci. 3(4), 1015–1046
(2010). https://doi.org/10.1137/09076934X
Estellers, V., Soatto, S., Bresson, X.: Adaptive regularization with the structure tensor. IEEE Trans.
Image Process. 24(6), 1777–1790 (2015). https://doi.org/10.1109/TIP.2015.2409562
Estellers, V., Thiran, J., Bresson, X.: Enhanced compressed sensing recovery with level set
normals. IEEE Trans. Image Process. 22(7), 2611–2626 (2013). https://doi.org/10.1109/TIP.
2013.2253484
Fang, F., Li, F., Shen, C., Zhang, G.: A variational approach for pan-sharpening. IEEE Trans. Image
Process. 22(7), 2822–2834 (2013). https://doi.org/10.1109/TIP.2013.2258355
Fessler, J.A., Elbakri, I., Sukovic, P., Clinthorne, N.H.: Maximum-likelihood dual-energy tomo-
graphic image reconstruction. In: SPIE: Medical Imaging, vol. 4684, pp. 1–25 (2002). https://
doi.org/doi:10.1117/12.467189
Foygel Barber, R., Sidky, E.Y., Gilat Schmidt, T., Pan, X.: An algorithm for constrained one-step
inversion of spectral CT data. Phys. Med. Biol. 61(10), 3784–3818 (2016). https://doi.org/10.
1088/0031-9155/61/10/3784
Gallardo, L.A., Meju, M.A.: Characterization of heterogeneous near-surface materials by joint 2D
inversion of DC resistivity and seismic data. Geophys. Res. Lett. 30(13), 1658 (2003). https://
doi.org/10.1029/2003GL017370
Gallardo, L.A., Meju, M.A.: Joint two-dimensional DC resistivity and seismic travel time inversion
with cross-gradients constraints. J. Geophys. Res. 109(B3), 1–11 (2004). https://doi.org/10.
1029/2003JB002716
Gallardo, L.A., Meju, M.A.: Structure-coupled multiphysics imaging in geophysical sciences. Rev.
Geophys. 49, 1–19 (2011). https://doi.org/10.1029/2010RG000330
Golbabaee, M., Chen, Z., Wiaux, Y., Davies, M.: CoverBLIP: accelerated and scalable itera-
tive matched-filtering for magnetic resonance fingerprint reconstruction. Inverse Prob. 36(1),
015003 (2020). https://doi.org/10.1088/1361-6420/ab4c9a
Goldluecke, B., Strekalovskiy, E., Cremers, D.: The natural vectorial total variation which arises
from geometric measure theory. SIAM J. Imag. Sci. 5(2), 537–563 (2012). https://doi.org/10.
1137/110823766
Haber, E., Holtzman-Gazit, M.: Model fusion and joint inversion. Surv. Geophys. (34), 675–695
(2013). https://doi.org/10.1007/s10712-013-9232-4
Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multi-modal images.
In: Medical Image Computing and Computer-Assisted Intervention, vol. 46, pp. 726–733.
Springer, Berlin/Heidelberg (2006). https://doi.org/10.1160/ME9046
Haber, E., Oldenburg, D.W.: Joint inversion: A structural approach. Inverse Prob. 13, 63–77 (1997).
https://doi.org/10.1088/0266-5611/13/1/006
Heismann, B., Schmidt, B., Flohr, T.: Spectral Computed Tomography. SPIE Press (2012)
Hintermüller, M., Rincon-Camacho, M.M.: Expected absolute value estimators for a spatially
adapted regularization parameter choice rule in L1-TV-based image restoration. Inverse Prob.
26(8), 085005 (2010). https://doi.org/10.1088/0266-5611/26/8/085005
Holt, K.M.: Total nuclear variation and jacobian extensions of total variation for vector fields. IEEE
Trans. Image Process. 23(9), 3975–3989 (2014). https://doi.org/10.1109/TIP.2014.2332397
Huang, J., Chen, C., Axel, L.: Fast Multi-contrast MRI reconstruction. Magn. Reson. Imaging
32(10), 1344–52 (2014). https://doi.org/10.1016/j.mri.2014.08.025
Huber, R., Haberfehlner, G., Holler, M., Bredies, K.: Total generalized variation regularization for
multi-modal electron tomography. Nanoscale 1–38 (2019). https://doi.org/10.1039/c8nr09058k
Ito, K., Jin, B.: Inverse Problems – Tikhonov Theory and Algorithms. World Scientific Publishing
(2014). https://doi.org/10.1142/9120
Kaipio, J.P., Kolehmainen, V., Vauhkonen, M., Somersalo, E.: Inverse problems with structural
prior information. Inverse Prob. 15(3), 713–729 (1999). https://doi.org/10.1088/0266-5611/15/
3/306
Kazantsev, D., Arridge, S.R., Pedemonte, S., Bousse, A., Erlandsson, K., Hutton, B.F., Ourselin,
S.: An anatomically driven anisotropic diffusion filtering method for 3D SPECT reconstruction.
Phys. Med. Biol. 57(12), 3793–3810 (2012). https://doi.org/10.1088/0031-9155/57/12/3793
Kazantsev, D., Jørgensen, J.S., Andersen, M.S., Lionheart, W.R., Lee, P.D., Withers, P.J.: Joint
image reconstruction method with correlative multi-channel prior for x-ray spectral computed
tomography. Inverse Prob. 34(6) (2018). https://doi.org/10.1088/1361-6420/aaba86
Kazantsev, D., Lionheart, W.R.B., Withers, P.J., Lee, P.D.: Multimodal image reconstruction using
supplementary structural information in total variation regularization. Sens. Imaging 15(1), 97
(2014). https://doi.org/10.1007/s11220-014-0097-5
Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: movies,
color, texture, and volumetric medical images. Int. J. Comput. Vis. 39(2), 111–129 (2000).
https://doi.org/10.1023/A:1008171026419
Knoll, F., Holler, M., Koesters, T., Otazo, R., Bredies, K., Sodickson, D.K.: Joint MR-PET
reconstruction using a multi-channel image regularizer. IEEE Trans. Med. Imaging 36(1)
(2016). https://doi.org/10.1109/TMI.2016.2564989
Knoll, F., Koesters, T., Otazo, R., Boada, F., Sodickson, D.K.: Simultaneous MR-PET reconstruc-
tion using multi sensor compressed sensing and joint sparsity. In: International Society for
Magnetic Resonance in Medicine, vol. 22 (2014)
Kolehmainen, V., Ehrhardt, M.J., Arridge, S.R.: Incorporating structural prior information and
sparsity into EIT using parallel level sets. Inverse Prob. Imaging 13(2), 285–307 (2019). https://
doi.org/10.3934/ipi.2019015
Leahy, R.M., Yan, X.: Incorporation of anatomical MR data for improved functional imaging with
PET. In: Information Processing in Medical Imaging, pp. 105–120. Springer (1991). https://doi.
org/10.1007/BFb0033746
Lenzen, F., Berger, J.: Solution-driven adaptive total variation regularization. In: SSVM, pp. 203–
215 (2015). https://doi.org/10.1007/978-3-642-24785-9
Loncan, L., De Almeida, L.B., Bioucas-Dias, J.M., Briottet, X., Chanussot, J., Dobigeon, N., Fabre,
S., Liao, W., Licciardi, G.A., Simoes, M., Tourneret, J.Y., Veganzones, M.A., Vivone, G., Wei,
Q., Yokoya, N.: Hyperspectral pansharpening: a review. IEEE Geosci. Remote Sens. Mag. 3(3),
27–46 (2015). https://doi.org/10.1109/MGRS.2015.2440094
Long, Y., Fessler, J.A.: Multi-material decomposition using statistical image reconstruction for
spectral CT. IEEE Trans. Med. Imaging 33(8), 1614–1626 (2014). https://doi.org/10.1109/TMI.
2014.2320284
Ma, D., Gulani, V., Seiberlich, N., Liu, K., Sunshine, J.L., Duerk, J.L., Griswold, M.A.:
Magnetic resonance fingerprinting. Nature 495(7440), 187–92 (2013). https://doi.org/10.1038/
nature11971
Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image
registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–
98 (1997). https://doi.org/10.1109/42.563664
Mehranian, A., Belzunce, M., Prieto, C., Hammers, A., Reader, A.J.: Synergistic PET and SENSE
MR image reconstruction using joint sparsity regularization. IEEE Trans. Med. Imaging 37(1),
20–34 (2018). https://doi.org/10.1109/TMI.2017.2691044
Mehranian, A., Belzunce, M.A., Niccolini, F., Politis, M., Prieto, C., Turkheimer, F., Hammers, A.,
Reader, A.J.: PET image reconstruction using multi-parametric anato-functional priors. Phys.
Med. Biol. (2017)
Meju, M.A., Mackie, R.L., Miorelli, F., Saleh, A.S., Miller, R.V.: Structurally-tailored 3D
anisotropic CSEM resistivity inversion with cross-gradients criterion and simultaneous model
calibration. Geophysics 84(6), 1–62 (2019). https://doi.org/10.1190/geo2018-0639.1
Möller, M., Brinkmann, E.M., Burger, M., Seybold, T.: Color Bregman TV. SIAM J. Imag. Sci.
7(4), 2771–2806 (2014). https://doi.org/10.1137/130943388
Möller, M., Wittman, T., Bertozzi, A.L., Burger, M.: A variational approach for sharpening
high dimensional images. SIAM J. Imag. Sci. 5(1), 150–178 (2012). https://doi.org/10.1137/
100810356
Nuyts, J.: The use of mutual information and joint entropy for anatomical priors in emission
tomography. In: IEEE Nuclear Science Symposium and Medical Imaging Conference, pp.
4149–4154. IEEE (2007). https://doi.org/10.1109/NSSMIC.2007.4437034
Obert, A.J., Gutberlet, M., Kern, A.L., Kaireit, T.F., Grimm, R., Wacker, F., Vogel-Claussen, J.:
1H-guided reconstruction of 19F gas MRI in COPD patients. Magn. Reson. Med. 1–11 (2020).
https://doi.org/10.1002/mrm.28209
Parikh, N., Boyd, S.P.: Proximal algorithms. Found Trends Optim 1(3), 123–231 (2014). https://
doi.org/10.1561/2400000003
Pedemonte, S., Bousse, A., Hutton, B.F., Arridge, S.R., Ourselin, S.: Probabilistic graphical model
of SPECT/MRI. In: Machine Learning in Medical Imaging, pp. 167–174 (2011). https://doi.org/
10.1007/978-3-642-24319-6_21
Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A.: Image registration by maximization of combined
mutual information and gradient information. IEEE Trans. Med. Imaging 19(8), 809–14 (2000).
https://doi.org/10.1109/42.876307
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex
optimization. In: Proceedings of the IEEE International Conference on Computer Vision, pp.
1762–1769 (2011). https://doi.org/10.1109/ICCV.2011.6126441
Rangarajan, A., Hsiao, I.T., Gindi, G.: A Bayesian joint mixture framework for the integration
of anatomical information in functional image reconstruction. J. Math. Imaging Vision 12(3),
199–217 (2000). https://doi.org/10.1023/A:1008314015446
Rasch, J., Brinkmann, E.M., Burger, M.: Joint reconstruction via coupled bregman iterations with
applications to PET-MR imaging. Inverse Prob. 34(1), 014001 (2018a). https://doi.org/10.1088/
1361-6420/aa9425
Rasch, J., Kolehmainen, V., Nivajarvi, R., Kettunen, M., Gröhn, O., Burger, M., Brinkmann, E.M.:
Dynamic MRI reconstruction from undersampled data with an anatomical prescan. Inverse Prob.
34(7) (2018b). https://doi.org/10.1088/1361-6420/aac3af
Rigie, D., La Riviere, P.: Joint reconstruction of multi-channel, spectral CT data via constrained
total nuclear variation minimization. Phys. Med. Biol. 60, 1741–1762 (2015). https://doi.org/
10.1088/0031-9155/60/4/1741
Rigie, D.S., Sanchez, A.A., La Riviére, P.J.: Assessment of vectorial total variation penalties on
realistic dual-energy CT data. Phys. Med. Biol. 62(8), 3284–3298 (2017). https://doi.org/10.
1088/1361-6560/aa6392
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algo-
rithms. Physica D: Nonlinear Phenom. 60(1), 259–268 (1992). https://doi.org/10.1016/0167-
2789(92)90242-F
Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color
filtering. IEEE Trans. Image Process. 5(11), 1582–1586 (1996). https://doi.org/10.1109/83.
541429
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging, vol. 167 . Springer, New York/London (2008)
Schmitt, U., Louis, A.K.: Efficient algorithms for the regularization of dynamic inverse problems:
I. Theory. Inverse Problems 18(3), 645–658 (2002). https://doi.org/10.1088/0266-5611/18/3/
308
Schmitt, U., Louis, A.K., Wolters, C., Vaukhonen, M.: Efficient algorithms for the regularization
of dynamic inverse problems: II. Applications. Inverse Prob. 18(3), 659–676 (2002). https://doi.
org/10.1088/0266-5611/18/3/308
Schramm, G., Holler, M., Rezaei, A., Vunckx, K., Knoll, F., Bredies, K., Boada, F., Nuyts,
J.: Evaluation of parallel level sets and Bowsher’s method as segmentation-free anatomical
priors for time-of-flight PET reconstruction. IEEE Trans. Med. Imaging 62(2), 590–603 (2017).
https://doi.org/10.1109/TMI.2017.2767940
Schuster, T., Hahn, B., Burger, M.: Dynamic inverse problems: Modelling – Regularization –
numerics. Inverse Prob. 34(4) (2018). https://doi.org/10.1088/1361-6420/aab0f5
Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image
Process. 7(3), 310–318 (1998). https://doi.org/10.1109/83.661181
Sodickson, D.K., Feng, L., Knoll, F., Cloos, M., Ben-Eliezer, N., Axel, L., Chandarana, H., Block,
K.T., Otazo, R.: The rapid imaging renaissance: Sparser samples, denser dimensions, and
glimmerings of a grand unified tomography. In: Proceedings of SPIE, vol. 9417, pp. 94170G1–
9417014 (2015). https://doi.org/10.1117/12.2085033
Somayajula, S., Panagiotou, C., Rangarajan, A., Li, Q., Arridge, S.R., Leahy, R.M.: PET image
reconstruction using information theoretic anatomical priors. IEEE Trans. Med. Imaging 30(3),
537–549 (2011). https://doi.org/10.1109/TMI.2010.2076827
Song, P., Deng, X., Mota, J.F.C., Deligiannis, N., Dragotti, P.L., Rodrigues, M.: Multimodal image
super-resolution via joint sparse representations induced by coupled dictionaries. IEEE Trans.
Comput. Imaging 1–1 (2019). https://doi.org/10.1109/tci.2019.2916502
Song, P., Weizman, L., Mota, J.F., Eldar, Y.C., Rodrigues, M.R.: Coupled dictionary learning for
multi-contrast MRI reconstruction. In: International Conference on Image Processing, 2, pp.
2880–2884 (2018). https://doi.org/10.1109/ICIP.2018.8451341
Tang, J., Rahmim, A.: Bayesian PET image reconstruction incorporating anato-functional joint
entropy. Phys. Med. Biol. 54(23), 7063–75 (2009). https://doi.org/10.1088/0031-9155/54/23/
002
Tang, J., Rahmim, A.: Anatomy assisted PET image reconstruction incorporating multi-resolution
joint entropy. Phys. Med. Biol. 60(1), 31–48 (2015). https://doi.org/10.1088/0031-9155/60/1/31
Tang, S., Fernandez-Granda, C., Lannuzel, S., Bernstein, B., Lattanzi, R., Cloos, M., Knoll, F.,
Asslander, J.: Multicompartment magnetic resonance fingerprinting. Inverse Prob. 34(9) (2018).
https://doi.org/10.1088/1361-6420/aad1c3
Tsai, Y.J., Bousse, A., Ahn, S., Charles, W., Arridge, S., Hutton, B.F., Thielemans, K.: Algorithms
for solving misalignment issues in penalized PET/CT reconstruction using anatomical priors.
In: IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC).
IEEE (2018)
Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: A common frame-
work for different applications. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 506–517 (2005).
https://doi.org/10.1109/TPAMI.2005.87
Vunckx, K., Atre, A., Baete, K., Reilhac, A., Deroose, C.M., Van Laere, K., Nuyts, J.: Evaluation
of three MRI-based anatomical priors for quantitative PET brain imaging. IEEE Trans. Med.
Imaging 31(3), 599–612 (2012). https://doi.org/10.1109/TMI.2011.2173766
Wang, G., Zhang, J., Gao, H., Weir, V., Yu, H., Cong, W., Xu, X., Shen, H., Bennett, J., Furth,
M., Wang, Y., Vannier, M.: Towards omni-tomography – grand fusion of multiple modalities
for simultaneous interior tomography. PloS one 7(6), e39700 (2012). https://doi.org/10.1371/
journal.pone.0039700
Wells III, W.M., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration
by maximization of mutual information. Med. Image Anal. 1(1), 35–51 (1996)
Xi, Y., Zhao, J., Bennett, J., Stacy, M., Sinusas, A., Wang, G.: Simultaneous CT-MRI reconstruction
for constrained imaging geometries using structural coupling and compressive sensing. IEEE
Trans. Biomed. Eng. (2015). https://doi.org/10.1109/TBME.2015.2487779
Xiang, L., Chen, Y., Chang, W., Zhan, Y., Lin, W., Wang, Q., Shen, D.: Deep-learning-based multi-
modal fusion for fast MR reconstruction. IEEE Trans. Biomed. Eng. 66(7), 2105–2114 (2019).
https://doi.org/10.1109/TBME.2018.2883958
Yokoya, N., Grohnfeldt, C., Chanussot, J.: Hyperspectral and multispectral data fusion: A
comparative review of the recent literature. IEEE Geosci. Remote Sens. Mag. 5(2), 29–56
(2017). https://doi.org/10.1109/MGRS.2016.2637824
Zhang, Y., Zhang, X.: PET-MRI joint reconstruction with common edge weighted total variation
regularization. Inverse Prob. 34(6), 065006 (2018). https://doi.org/10.1088/1361-6420/aabce9
Diffraction Tomography, Fourier
Reconstruction, and Full Waveform Inversion 8
Florian Faucher, Clemens Kirisits, Michael Quellmalz,
Otmar Scherzer, and Eric Setterqvist
Contents
Introduction  274
Contribution and Outline  276
Experimental Setup  276
Forward Models  278
Incident Plane Wave  279
Modeling the Total Field Using Line and Point Sources  281
Numerical Comparison of Forward Models  282
Modeling the Scattered Field Assuming Incident Plane Waves  283
Modeling the Total Field Using Line and Point Sources  283
Fourier Diffraction Theorem  284
Rotating the Object  286
Varying Wave Number  287
Rotating the Object with Multiple Wave Numbers  288
Reconstruction Methods  289
Reconstruction Using Full Waveform Inversion  289
Reconstruction Based on the Born and Rytov Approximations  291
Numerical Experiments  294
Reconstruction of Circular Contrast with Various Amplitudes and Sizes  294
Reconstruction of Embedded Shapes: Phantom 1  299
Reconstruction of Embedded Shapes: Phantom 2  304
Computational Costs  305
Conclusion  308
References  309
F. Faucher
Faculty of Mathematics, University of Vienna, Vienna, Austria
Project-Team Makutu, Inria Bordeaux Sud-Ouest, Talence, France
e-mail: [email protected]; [email protected]
C. Kirisits
Faculty of Mathematics, University of Vienna, Vienna, Austria
e-mail: [email protected]
M. Quellmalz
Institute of Mathematics, Technical University Berlin, Berlin, Germany
e-mail: [email protected]
O. Scherzer
Faculty of Mathematics, University of Vienna, Vienna, Austria
Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria
Christian Doppler Laboratory for Mathematical Modeling and Simulation of Next Generations of
Ultrasound Devices (MaMSi), Vienna, Austria
e-mail: [email protected]
E. Setterqvist
Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria
e-mail: [email protected]
Abstract
Keywords
Introduction
inversion of this relation is the Fourier slice theorem. Roughly speaking, it says
that the Fourier transformed measurements are equal to the Fourier transform of f
evaluated along slices through the origin (Natterer 1986).
The straight ray assumption of computerized tomography can be considered
valid as long as the wavelength of the incident field is much smaller than the
size of the relevant details in the object. As soon as the wavelength is similar to
or greater than those details, for instance, in situations where X-rays are replaced
by visible light, diffraction effects are no longer negligible. As an example of a
medical application, an optical diffraction experiment in Sung et al. (2009) utilized
a red laser of wavelength 633 nm to illuminate human cells of diameter around
10 µm, which include smaller subcellular organelles. One way to achieve better
reconstruction quality in such cases is to drop the straight ray assumption and adopt
a propagation model based on the wave equation instead.
The theoretical groundwork for DT was laid more than half a century ago (Wolf
1969). The central result derived there, sometimes called the Fourier diffraction
theorem, says that the Fourier transformed measurements of the scattered wave
are equal to the Fourier transform of the scattering potential evaluated along a
hemisphere. This result relies on a series of assumptions: (i) the object is immersed
in a homogeneous background, (ii) the incident field is a monochromatic plane
wave, (iii) the scattered wave is measured on a plane in R3 , and (iv) the first Born
approximation of the scattered field is valid.
On the one hand, the Born approximation greatly simplifies the relationship
between scattered wave and scattering potential. On the other hand, however, it
generally requires the object to be weakly scattering, thus limiting the applicability
of the Fourier diffraction theorem. An alternative is to assume validity of the first
Rytov approximation instead (Iwata and Nagata 1975). While mathematically this
amounts to essentially the same reconstruction problem, the underlying physical
assumptions are not identical to those of the Born approximation, leading to a
different range of applicability in general (Chen and Stamnes 1998; Slaney et al.
1984). Nevertheless, the restriction to weakly scattering objects remains.
Full waveform inversion (FWI) is a different approach that can overcome some
of the limitations of the first-order methods, typically at the cost of being computa-
tionally more demanding. It relies on the iterative minimization of a cost functional
which penalizes the misfit between measurements and forward simulations of the
total field, cf. Bamberger et al. (1979), Lailly (1983), Pratt et al. (1998), Tarantola
(1984), and Virieux and Operto (2009). Here, the forward model consists of the
solution of the full wave equation, without simplification of first-order approxima-
tions. It results in a nonlinear minimization problem to be solved, typically with
Newton-type methods (Virieux and Operto 2009; Nocedal and Wright 2006).
In practical experiments, sometimes only measurements of the intensity,
i.e., the absolute value of the complex-valued wave, are available. Different phase
retrieval methods were investigated, e.g., in Maleki and Devaney (1993), Gbur and
Wolf (2002), Horstmeyer et al. (2016), and Beinert and Quellmalz (2022). For this
chapter, we assume that both phase and amplitude information are available, which
can be achieved by interferometry, cf. Wedberg and Stamnes (1995).
Experimental Setup
1 https://ffaucher.gitlab.io/hawen-website/
and the resulting field is measured on the line x2 = rM . The distance between
the measurement line and the origin, rM > 0, is sufficiently large so that it does
not intersect the object. From the measurements, we aim to reconstruct the object’s
scattering properties. In order to improve the reconstruction quality, we generate
additional data by rotating the object or changing the incident field’s wavelength.
See Fig. 1 for an illustration of the experimental setup.
We now introduce the physical quantities needed subsequently. Let λ > 0 denote
the wavelength of the incident wave and k0 = 2π /λ the background wave number.
Furthermore, let n(x) denote the refractive index at position x ∈ R2 and n0 the
constant refractive index of the background. From these quantities, we define the
wave number
k(x) := k0 n(x) / n0 .
Equivalently, in terms of the angular frequency ω and the wave speed c(x),

k(x) = ω / c(x)   and   k0 = ω / c0 ,          (1)
where c0 is the constant wave speed in the background. The scattering potential f
is obtained by subtracting the squared background wave number k0²:

f(x) := k(x)² − k0² = k0² ( n(x)²/n0² − 1 ) .          (2)
Note that, for all practical purposes, f can be assumed to be bounded and compactly
supported in the disk B_rM = {x ∈ R² : ‖x‖ < rM}.
In our subsequent reconstructions with the Born and Rytov approximations, f is the
quantity to be reconstructed from the measured data and k0 is known. With FWI, on the
other hand, we reconstruct c; see Remark 2. These two quantities are related to each other via

c(x) = √( ω² / (k0² + f(x)) ) .          (3)
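As a small sketch of these relations (assuming NumPy arrays for the refractive index and the scattering potential):

```python
import numpy as np


def scattering_potential(n, n0, k0):
    """Scattering potential f = k0^2 (n^2 / n0^2 - 1), cf. Equation 2."""
    return k0 ** 2 * (n ** 2 / n0 ** 2 - 1.0)


def wave_speed(f, k0, omega):
    """Wave speed c = omega / sqrt(k0^2 + f), cf. Equation 3."""
    return omega / np.sqrt(k0 ** 2 + f)
```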
Forward Models
In this section, we propose several forward models for the experiment presented
above. For all of them, the starting point is the system of equations
(−Δ − k(x)²) u^tot(x) = g(x),
(−Δ − k0²) u^inc(x) = g(x),          x ∈ R² .          (4)
u^tot(x) = u^inc(x) + u^sca(x),
Here, uinc is the given incident field, the total field utot is what is recorded on
the measurement line {x ∈ R2 : x2 = rM }, and the difference between the two
constitutes the scattered field usca . We describe different sources g in the following
subsections. The scattered field usca is assumed to satisfy the Sommerfeld radiation
condition which requires that
lim_{‖x‖→∞} √‖x‖ ( ∂u^sca/∂‖x‖ − i k0 u^sca ) = 0

uniformly for all directions x/‖x‖. It guarantees that u^sca is an outgoing wave.
Further details concerning derivation and analytical properties of problems like
Equation 4 can be found, for instance, in Colton and Kress (2013).
The models considered below are based on the following specifications of
Equation 4; each model is explained in more detail in the corresponding subsection.
• Point source: g represents a point source located far from the object.
• Line source: g represents simultaneous point sources positioned along a
straight line. We refer to this configuration as a “line source”.
Incident Plane Wave
Plane waves take the form u(x) = e^{i k0 x·s}, where the unit vector s specifies the direction
of propagation of u. Plane waves are widely studied in imaging applications and
theory, and we refer to Colton and Kress (2013), Devaney (2012), and Kak and
Slaney (2001) for further information.
In the first model, we consider an incident field that is a monochromatic plane wave
propagating in direction e2,

u^inc(x) = e^{i k0 x2} .          (5)

In this case, we obtain from Equation 4 the following equation for the scattered field:

(−Δ − k(x)²) u^sca(x) = f(x) e^{i k0 x2} .          (6)
If the scattered field u^sca is negligible compared to the incident field e^{i k0 x2}, we can
ignore u^sca on the right-hand side and obtain

(−Δ − k0²) u^Born(x) = f(x) e^{i k0 x2} ,          (7)

where u^Born is the (first-order) Born approximation to the scattered field. Supplementing
this equation with the Sommerfeld radiation condition, we have a unique
solution corresponding to an outgoing wave (Colton and Kress 2013). It can be
written as a convolution
u^Born(x) = ∫_{R²} G(x − y) f(y) e^{i k0 y2} dy ,          (8)
with

G(x) = (i/4) H_0^{(1)}(k0 ‖x‖) ,   x ∈ R² \ {0} ,          (9)

where H_0^{(1)} denotes the zeroth-order Hankel function of the first kind; see Colton
and Kress (2013, Sect. 3.4). We note that, in spite of a singularity at the origin, G is
locally integrable in R².
The second-order Born approximation can be obtained by replacing the plane
wave e^{i k0 y2} in Equation 8 by the sum e^{i k0 y2} + u^Born(y). Iterating this procedure
yields Born approximations of arbitrary order. For more details, we refer to Kak and
Slaney (2001, Sect. 6.2.1) and Devaney (2012).
The Rytov approximation is based on writing the total and incident fields as complex exponentials,

u^tot = e^{ϕ^tot} ,   u^inc = e^{ϕ^inc} ,   ϕ^tot = ϕ^inc + ϕ^sca ,          (10)

which leads to the equation

(−Δ − k0²) ( u^inc ϕ^sca ) = ( |∇ϕ^sca|² + f ) u^inc ,          (11)

where |∇ϕ^sca|² = (∂ϕ^sca/∂x1)² + (∂ϕ^sca/∂x2)². The details of this derivation can
be found, for instance, in Kak and Slaney (2001, Sect. 6.2.2). Neglecting |∇ϕ^sca|²
in Equation 11, we obtain

(−Δ − k0²) ( u^inc ϕ^Rytov ) = f u^inc ,          (12)
where ϕ Rytov is the Rytov approximation to ϕ sca . Note that we still assume uinc to be
a monochromatic plane wave, as given in Equation 5. Thus, the product uinc ϕ Rytov
solves the same equation as uBorn . If we define the Rytov approximation to the
scattered field, uRytov , in analogy to Equation 10 via
u^Rytov = e^{ϕ^Rytov + ϕ^inc} − u^inc ,

and replace ϕ^Rytov by u^Born / u^inc, we obtain a relation between the two approximate
scattered fields that can be expressed as

u^Born = u^inc log( u^Rytov / u^inc + 1 ) .          (13)
The relation between Born and Rytov in Equation 13 is not unique because of
the multiple branches of the complex logarithm. In practical computations, this is
addressed by a phase unwrapping as we will see in Equation 30.
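One possible numerical evaluation of Equation 13 is sketched below; the one-dimensional unwrapping of the phase along the measurement line is only a simple stand-in for the unwrapping of Equation 30 and is not the chapter's implementation.

```python
import numpy as np


def rytov_to_born(u_rytov, u_inc):
    """Convert Rytov-type data to Born-type data via Equation 13 (1D arrays along the detector line)."""
    ratio = u_rytov / u_inc + 1.0
    phase = np.unwrap(np.angle(ratio))                  # pick a branch of the complex logarithm
    log_ratio = np.log(np.abs(ratio)) + 1j * phase
    return u_inc * log_ratio
```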
Remark 1. There have been many investigations of the validity of the Born and
Rytov approximations; see, e.g., Chen and Stamnes (1998), Slaney et al. (1984),
or Kak and Slaney (2001, Chap. 6). The Born approximation is reasonable only
for objects that are small relative to the wavelength. In particular, for a homogeneous
cylinder of radius a, the Born approximation is valid if a(n − n0) < λ/4, where
λ is the wavelength of the incident wave and n is the constant refractive index
inside the object. In contrast, the Rytov approximation only requires that n − n0 >
(∇ϕ^sca)²/k0², i.e., that the change of the scattered phase ϕ^sca, see Equation 10, is
small over one wavelength; it has no direct requirements on the object size and
is therefore applicable to a larger class of objects. The latter is also observed in the
numerical simulations in Chen and Stamnes (1998). However, for objects that are
small and have a low contrast n − n0, the Born and Rytov approximations produce
approximately the same results.
Modeling the Total Field Using Line and Point Sources

Point Source
We let g be a Dirac function centered at a point x0, i.e., we consider

(−Δ − k(x)²) u^tot_P(x) = δ(x − x0),
(−Δ − k0²) u^inc_P(x) = δ(x − x0).          (14)

If the position of the source is given by x0 = −r0 e2 with r0 > 0 sufficiently large,
then, after appropriate rescaling, u^inc_P approximates a plane wave with wave number
k0 and propagation direction e2 in a neighborhood of 0.
Line Source
Alternatively, we let g be a sum of Dirac functions and consider
(−Δ − k(x)²) u^tot_L(x) = Σ_{j=1}^{N_sim} δ(x − x_j),
(−Δ − k0²) u^inc_L(x) = Σ_{j=1}^{N_sim} δ(x − x_j),          (15)
where the number Nsim of simultaneous point sources should be sufficiently large.
Moreover, the positions xj should be arranged uniformly along a line perpendicular
to the propagation direction e2 of the plane wave. This is illustrated in section “Mod-
eling the Total Field Using Line and Point Sources”.
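For illustration, such source positions can be generated as follows; the default values mirror the configuration used in the comparison below (441 points between x1 = −22 and x1 = 22 at height x2 = −15) and are otherwise arbitrary.

```python
import numpy as np


def line_source_positions(n_sim=441, x1_min=-22.0, x1_max=22.0, x2=-15.0):
    """Positions x_j spaced uniformly on a line perpendicular to the propagation direction e2."""
    x1 = np.linspace(x1_min, x1_max, n_sim)
    return np.stack([x1, np.full(n_sim, x2)], axis=1)   # shape (n_sim, 2)
```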
Numerical Comparison of Forward Models
In this section, we numerically compare the forward models presented above. For
the discretization of partial differential equations, several approaches exist; we
mention, for instance, finite differences, which approximate the problem on a nodal
grid (e.g., Virieux 1984), or methods based on the variational formulation, such as
finite elements (Monk 2003) or discontinuous Galerkin methods (Hesthaven and
Warburton 2007). In our work, we use the hybridizable discontinuous Galerkin
(HDG) method for the discretization and refer to Cockburn et al. (2009),
Kirby et al. (2012), and Faucher and Scherzer (2020) for more details. The
implementation precisely follows the steps prescribed in Faucher and Scherzer
(2020), and it is carried out in the open-source parallel software hawen; see Faucher
(2021) and Footnote 1. While the propagation is assumed on infinite space, the
numerical simulations are performed on a finite discretization domain Ω ⊂ R²,
with absorbing boundary conditions (Engquist and Majda 1977) implemented to
simulate free space. This corresponds to the following Robin-type condition applied
on the boundary ∂Ω of the discretization domain:

∂_n u(x) − i k0 u(x) = 0 ,   x ∈ ∂Ω ,          (16)
where ∂n u denotes the normal derivative of u. The test sample used below is a
homogeneous medium encompassing a circular object of radius 4.5 around the
origin with contrast f = 1. This corresponds to the characteristic function
1_a^disk(x) := 1 for x ∈ B_a and 0 for x ∈ R² \ B_a          (17)
of the disk Ba with radius a > 0. The incident plane wave has wave number
k0 = 2π .
Modeling the Scattered Field Assuming Incident Plane Waves
We consider the solutions u^sca of Equation 6 and u^Born of Equation 7, both satisfying
the boundary condition Equation 16 and simulated following the HDG discretization
indicated above. As an alternative for computing the Born approximation u^Born,
we discretize the convolution Equation 8 with the Green's function G given in
Equation 9. In particular, applying an N × N quadrature on the uniform grid
R_N = {−rM, −rM + 2rM/N, . . . , rM − 2rM/N}² to Equation 8, we obtain
u^Born(x) ≈ u^Born_conv,N(x) := (2rM/N)² Σ_{y ∈ R_N} G(x − y) f(y) e^{i k0 y2} ,   x ∈ R² .          (18)
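The quadrature of Equation 18 can be evaluated directly with the Hankel function from SciPy, as in the sketch below; it evaluates u^Born_conv,N at a single point x that does not coincide with a grid node and is only an illustration, not the HDG-based code used for the experiments.

```python
import numpy as np
from scipy.special import hankel1


def born_convolution(f_samples, r_m, k0, x):
    """Discrete Born approximation of Equation 18.

    f_samples: (N, N) samples of the scattering potential on the grid R_N
    (first index along y1, second along y2); x: evaluation point of shape (2,).
    """
    n = f_samples.shape[0]
    h = 2.0 * r_m / n
    grid = -r_m + h * np.arange(n)
    y1, y2 = np.meshgrid(grid, grid, indexing="ij")
    dist = np.hypot(x[0] - y1, x[1] - y2)
    green = 0.25j * hankel1(0, k0 * dist)               # Green's function, Equation 9
    return h ** 2 * np.sum(green * f_samples * np.exp(1j * k0 * y2))
```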
Modeling the Total Field Using Line and Point Sources
The objective is to evaluate how considering line and point sources differs from
using incident plane waves and whether the data obtained with both approaches are
comparable. To compare the scattered fields obtained from Equations 14 and 15
with the solution u^sca of Equation 6, one needs to rescale according to
u^sca_P = α_P ( u^tot_P − u^inc_P ) ,     u^sca_L = α_L ( u^tot_L − u^inc_L ) ,
where αP and αL are constants depending only on the number and positions of
the point sources x_j. We illustrate this in Fig. 3, where the line source is positioned
at the fixed height x2 = −15 and composed of 441 points between x1 = −22
and x1 = 22. For the case of a point source, we have to consider a very wide
domain, namely, [−500, 500] × [−500, 500], with the point source positioned at
x0 = (0, −480).
Fig. 2 Comparison of the scattered wave u^sca and the Born approximation u^Born. The computations
are performed on a domain [−25, 25] × [−25, 25] with boundary conditions given in Equation 16.
Further, we display u^Born_conv,200, which is the Born approximation obtained by the convolution
Equation 18. (a) Perturbation model f = 1_4.5^disk. (b) Real part of u^sca. (c) Real part of u^Born
Fourier Diffraction Theorem
In this section, we discuss the inverse problem of recovering the scattering potential
from measurements of the scattered wave under the Born or Rytov approximations.
Before stating the fundamental result in this context (Theorem 1 below), we have
to introduce further notation.
Fig. 3 The computational domain for the point source is very large such that the perturbation is barely
visible and the source is positioned at x0 = (0, −480). (a) Computational domain [−25, 25]²
for the line source with perturbation model f = 1_4.5^disk. (b) Real part of u^tot_L. (c) Real part of u^sca_L. (d)
Computational domain [−500, 500]² for the point source with perturbation model f = 1_4.5^disk. (e) Real
part of u^tot_P
We can now formulate the Fourier diffraction theorem; see, for instance, Kak and
Slaney (2001, Sect. 6.3), Natterer and Wübbeling (2001, Thm. 3.1), or Wolf (1969).
It states that, for |k1| < k0,

F1 u^Born(k1, rM) = √(2π) · (i e^{iκ rM} / (2κ)) · F f(k1, κ − k0) .          (19)
Rotating the Object
Suppose the object rotates around the origin during the experiment. Then the
resulting orientation-dependent scattering potential can be written as f^α := f ∘ R_α,
where R_α denotes the rotation by the angle α ∈ A. The corresponding measurements are

u^α(x1, rM) ,   x1 ∈ R ,   α ∈ A ,

and the Fourier diffraction theorem yields

F1 u^α(k1, rM) = √(2π) · (i e^{iκ rM} / (2κ)) · F f(R_α(k1, κ − k0)) .
Thus, the k-space coverage, that is, the set of all spatial frequencies y ∈ R2 at which
Ff is accessible via the Fourier diffraction theorem, is given by
Fig. 4 k-space coverage for a rotating object. Left: half turn, A = [0, π ]. Right: full turn,
A = [0, 2π ]. The k-space coverage (light red) is a union of infinitely many semicircles, each
corresponding to a different orientation of the object. Some of the semicircles are depicted in red:
solid arc (α = 0), dashed (α = π/2), dotted (α = π ), dash-dotted (α = 3π/2)
Y = { y = R_α(k1, κ − k0) ∈ R² : |k1| < k0 , α ∈ A } .
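For illustration, this coverage can be sampled numerically as in the following sketch, which assumes the usual definition κ = √(k0² − k1²) for |k1| < k0 used in the Fourier diffraction theorem.

```python
import numpy as np


def kspace_coverage(k0, angles, n_k1=200):
    """Sample the frequencies R_alpha (k1, kappa - k0) covered by a rotating object."""
    k1 = np.linspace(-k0, k0, n_k1)
    kappa = np.sqrt(np.maximum(k0 ** 2 - k1 ** 2, 0.0))
    arc = np.stack([k1, kappa - k0])                    # semicircle for alpha = 0
    points = []
    for alpha in angles:
        c, s = np.cos(alpha), np.sin(alpha)
        rotation = np.array([[c, -s], [s, c]])
        points.append(rotation @ arc)
    return np.concatenate(points, axis=1)               # shape (2, n_k1 * len(angles))
```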
Varying Wave Number
Now suppose the object is illuminated or insonified by plane waves with wave
numbers ranging over a set K ⊂ (0, +∞). Recall the definition of the scattering
potential f_k0 = k0² (n²/n0² − 1) from Equation 2, but note that we have now added
a subscript to indicate the dependence of f on k0. If the variation of the object's
refractive index n with k0 ∈ K is negligible, we can write f_k0 = k0² f1 with
f1 := n²/n0² − 1. The measurements are now

u_k0(x1, rM) ,   x1 ∈ R ,   k0 ∈ K ,

and the Fourier diffraction theorem yields

F1 u_k0(k1, rM) = √(2π) · (i e^{iκ rM} / (2κ)) · k0² F f1(k1, κ − k0) .
Notice that now κ also varies with k_0. The resulting k-space coverage is

Y = { y = (k_1, κ − k_0) ∈ R^2 : k_0 ∈ K, |k_1| < k_0 };

see Fig. 5. Note that, in contrast to the scenario of section “Rotating the Object,” there are large missing parts near the origin.
Rotating the Object with Multiple Wave Numbers

We combine the two previous observations by picking a finite set of wave numbers K ⊂ (0, ∞) and performing a full rotation of the object for each k_0 ∈ K. Let u^α_{k_0} be the Born approximation to the wave scattered by f^α_{k_0} = k_0^2 f_1 ∘ R_α. Then the full set of measurements is given by
F_1 u^α_{k_0}(k_1, r_M) = √(2π) · ( i e^{iκ r_M} / (2κ) ) · k_0^2 Ff_1( R_α(k_1, κ − k_0) ).  (22)
We deduce from sections “Rotating the Object” and “Varying Wave Number” that the resulting k-space coverage Y is the union of disks with radii √2 k_0, all centered at the origin. Hence, Y is just the largest disk, that is, the one corresponding to the largest wave number max K. However, smaller disks in k-space are covered more often, which might improve the reliability of the reconstruction for noisy data.
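The coverage sets above lend themselves to a direct numerical illustration. The following minimal sketch (not from the chapter; function and parameter names are ours) tabulates points of Y for a full rotation and several wave numbers, assuming κ = (k_0^2 − k_1^2)^{1/2} as in the Fourier diffraction theorem.

```python
import numpy as np

def kspace_coverage(wave_numbers, angles, n_k1=201):
    """Spatial frequencies y = R_alpha (k1, kappa - k0) covered by the experiment."""
    points = []
    for k0 in wave_numbers:
        k1 = np.linspace(-k0, k0, n_k1)[1:-1]        # interior points, |k1| < k0
        kappa = np.sqrt(k0**2 - k1**2)               # assumed relation from Theorem 1
        base = np.stack([k1, kappa - k0], axis=1)    # semicircle through the origin
        for alpha in angles:
            c, s = np.cos(alpha), np.sin(alpha)
            rot = np.array([[c, -s], [s, c]])        # rotation matrix R_alpha
            points.append(base @ rot.T)
    return np.concatenate(points, axis=0)

# Full turn with several wave numbers: the coverage fills the disk of radius sqrt(2)*max(K).
Y = kspace_coverage([0.7, 1.0, 1.2, 1.4],
                    np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False))
print(np.linalg.norm(Y, axis=1).max())   # close to sqrt(2) * 1.4
```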
Reconstruction Methods
In the following, we assume data generated by line sources according to the setup described in section “Modeling the Total Field Using Line and Point Sources”. We simulate the total fields as solutions to Equation 15; these serve as the synthetic data used for the reconstruction.
For the identification of the physical properties of the medium, Full Waveform Inversion (FWI) relies on an iterative minimization of a misfit functional that evaluates a distance between numerical simulations and measurements of the total field. The FWI method arises in the context of seismic inversion for sub-surface Earth imaging, cf. Bamberger et al. (1979), Lailly (1983), Tarantola (1984), Pratt et al. (1998), and Virieux and Operto (2009), where the measured seismograms are compared to simulated waves.
With FWI, we invert with respect to the wave speed c, from which the wave number is defined according to Equation 1; it further connects with the model perturbation f according to Equation 3. In our experiments, c_0 is used as an initial guess (i.e., we start from a constant background), and c is inverted rather than f, as discussed in Remark 2. Given measurements d of the total field, the quantitative reconstruction of the wave speed c is performed by minimizing the misfit functional J such that

min_c J(c),   J(c) = dist( R u^tot, d ),   where u^tot solves Equation 15.  (23)
Here, dist(·) is a distance function to evaluate the difference between the measure-
ments and the simulations, and R is a linear operator to restrict the solution to the
positions of the receivers. For simplicity, we do not encode a regularization term
in Equation 23 and refer the readers to, e.g., Faucher et al. (2020c), Kaltenbacher
(2018), and the references therein.
Several formulations of the distance function have been studied for FWI (in
particular, for seismic applications), such as a logarithmic criterion, Shin et al.
(2007), the use of the signal’s phase or amplitude, Bednar et al. (2007) and Pyun
et al. (2007), the use of the envelope of the signal, Fichtner et al. (2008), criteria
based upon cross-correlation, Luo and Schuster (1991), Van Leeuwen and Mulder
(2010), Faucher et al. (2020a), and Faucher et al. (2021), or optimal transport
distance, Métivier et al. (2016). Here, we rely on a least-squares approach where
the misfit functional is defined as the L2 distance between the data and simulations:
J(c) := (1/2) Σ_{ω ∈ c_0 K} Σ_{α ∈ A} ‖ R u^tot(c, ω, α) − d(ω, α) ‖²_{L²(−l_M, l_M)},  (24)
where d(ω, α) refers to the measurement data of the total field at the measurement plane for the object rotated by angle α, and u^tot(c, ω, α) is the solution of Equation 15 with k(x) = ω/c(R_α x). The last term R_α x encodes the rotation of the object; we note that a rotation of the object is equivalent to rotating both the measurement line and the direction of the incident field. We have encoded a sum over the frequencies ω, which are chosen in accordance with the frequency content available in the measurements. In the computational experiments, we further investigate uni- and multi-frequency reconstructions.
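For reference, here is a minimal sketch of evaluating the least-squares misfit of Equation 24 on discrete receiver data; the solver callback `simulate` and the dictionary `data` are hypothetical placeholders, not part of the chapter's implementation.

```python
import numpy as np

def misfit(simulate, data, frequencies, angles):
    """0.5 * sum over (omega, alpha) of || R u_tot(c, omega, alpha) - d(omega, alpha) ||_2^2.

    `simulate(omega, alpha)` returns the simulated total field restricted to the
    receivers; `data[(omega, alpha)]` holds the corresponding measurements.
    """
    value = 0.0
    for omega in frequencies:
        for alpha in angles:
            residual = simulate(omega, alpha) - data[(omega, alpha)]
            value += 0.5 * np.vdot(residual, residual).real
    return value
```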
The minimization of the misfit functional in Equation 23 follows an iterative Newton-type method, as depicted in Algorithm 1. Due to the computational cost, we use first-order information and avoid the Hessian computation (Virieux and Operto 2009): namely, we rely on the nonlinear conjugate gradient method for the model update, cf. Nocedal and Wright (2006) and Faucher (2017). Furthermore, to avoid the formation of the dense Jacobian, the gradient of the misfit functional
Algorithm 1 Iterative reconstruction of the wave speed model following the minimization of the misfit functional. At each iteration, the total field solution to Equation 15 is computed, and the gradient of the misfit functional is used to update the wave speed model. The algorithm stops when the prescribed number of iterations has been performed for all of the frequencies of interest.
Input: Initial wave speed model c_0.
Initiate global iteration number ℓ := 1;
for frequency ω ∈ c_0 K do
  for iteration j = 1, . . . , n_iter do
    Compute the solution to the wave equation using the current wave speed model c_ℓ and frequency ω, that is, the solution to Equation 15 with k(x) = ω/c_ℓ(x);
    Evaluate the misfit functional J in Equation 24;
    Compute the gradient of the misfit functional using the adjoint-state method;
    Update the wave speed model using the nonlinear conjugate gradient method to obtain c_{ℓ+1};
    Update global iteration number ℓ ← ℓ + 1;
  end
end
Output: Approximate wave speed c, from which the scattering potential f can be computed via Equations 2 and 1.
is computed using the adjoint-state method, cf. Pratt et al. (1998), Plessix (2006),
Barucq et al. (2019), and Faucher and Scherzer (2020). In Algorithm 1, we further
implement a progression in the frequency content, which is common to mitigate
the ill-posedness of the nonlinear inverse problem, Bunks et al. (1995). We further
invert each frequency independently, from low to high, as advocated by Barucq et al.
(2019) and Faucher et al. (2020b). For the implementation details using the HDG
discretization, we refer to Faucher and Scherzer (2020).
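The structure of Algorithm 1 can be summarized in a short skeleton. This is only a sketch under strong assumptions: the Helmholtz solver `solve_helmholtz` and the adjoint-state gradient routine `adjoint_state_gradient` are user-supplied placeholders, a fixed step length replaces a proper line search, and a Polak–Ribière update stands in for the nonlinear conjugate gradient variant used in hawen.

```python
import numpy as np

def fwi(c0, frequencies, data, solve_helmholtz, adjoint_state_gradient,
        n_iter=50, step=1e-2):
    """Frequency-by-frequency FWI loop mirroring Algorithm 1 (illustrative only)."""
    c = c0.copy()
    for omega in frequencies:                      # invert frequencies from low to high
        prev_grad, direction = None, None
        for _ in range(n_iter):
            u = solve_helmholtz(c, omega)          # total field, Equation 15 with k = omega/c
            grad = adjoint_state_gradient(c, omega, u, data[omega])
            if direction is None:
                direction = -grad
            else:                                  # Polak-Ribiere nonlinear conjugate gradient
                beta = max(0.0, np.vdot(grad, grad - prev_grad).real
                                / np.vdot(prev_grad, prev_grad).real)
                direction = -grad + beta * direction
            c = c + step * direction               # wave speed update (fixed step length)
            prev_grad = grad
    return c
```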
In this section, we present numerical methods for the computation of the Born
and Rytov approximations from Equations 7 and 12, respectively, as well as the
reconstruction of the scattering potential. We concentrate on the case of full rotations
of the object using incident waves with different wave numbers k0 ∈ K; see
section “Rotating the Object with Multiple Wave Numbers”. The tomographic
reconstruction is based on the Fourier diffraction theorem, Theorem 1, and the
nonuniform discrete Fourier transform. Nonuniform Fourier methods have also been
applied in computerized tomography (Potts and Steidl 2001), magnetic resonance
imaging (Knopp et al. 2007), spherical tomography (Hielscher and Quellmalz 2015,
2016), and surface wave tomography (Hielscher et al. 2018).
In the following, we describe the discretization steps we apply. For N ∈ 2N, let

I_N := { −N/2 + j : j = 0, . . . , N − 1 }.

We assume that the scattering potential f is supported in the square [−r_s, r_s]^2 for some r_s > 0 and that we are given measurements of the Born approximation.
The one-dimensional Fourier transform F_1 is approximated by the discrete Fourier transform

F_{1,m} u(k_1, r_M) := (1/√(2π)) · (2l_M/m) Σ_{x_1 ∈ (2l_M/m) I_m} u(x_1, r_M) e^{−i x_1 k_1},   k_1 ∈ (π/l_M) I_m,  (26)
for |k_1| ≤ k_0. Considering that we sample the angle α on the equispaced, discrete grid A = (2π/n_A){0, 1, . . . , n_A − 1} and some finite set K ⊂ (0, ∞), Equation 26 provides an approximation of Ff on the non-uniform grid

Y_{m,n_A} := { R_α(k_1, κ − k_0) : k_1 ∈ (π/l_M) I_m, |k_1| ≤ k_0, α ∈ (2π/n_A){0, 1, . . . , n_A − 1}, k_0 ∈ K }.

On this grid, we approximate Ff by the nonuniform discrete Fourier transform
F_N f_N(y) := (1/(2π)) · ((2r_s)^2/N^2) Σ_{x ∈ R_N} f(x) e^{−i x·y},   y ∈ Y_{m,n_A},  (28)

where R_N := (2r_s/N)(I_N × I_N) denotes the spatial sampling grid; see Plonka et al. (2018, Section 7.1). It provides an approximation of the Fourier transform Ff on Y_{m,n_A}.
Algorithm 2 Fourier reconstruction of the scattering potential, based on the Fourier diffraction theorem and the nonuniform discrete Fourier transform.
Input: Measurements u^α_{k_0}(x_1, r_M), x_1 ∈ (2l_M/m) I_m, α ∈ A = (2π/n_A){0, . . . , n_A − 1}, k_0 ∈ K.
for k_0 ∈ K do
  for α ∈ A do
    Compute −i √(2/π) κ e^{−iκ r_M} F_{1,m} u^α_{k_0}(k_1, r_M), k_1 ∈ (π/l_M) I_m, with a DFT in Equation 26;
  end
end
Solve Equation 27 with Equation 29 for f_N using the conjugate gradient method;
Output: Approximate scattering potential f_N ≈ (f(x))_{x∈R_N}.
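As an illustration of Equation 28, the following sketch assembles the nonuniform DFT matrix naively (dense, for small N only); the grid R_N is assumed to be the equispaced grid (2r_s/N)(I_N × I_N), and the least-squares solve mentioned in the comment stands in for the NFFT-accelerated conjugate gradient iteration used in Algorithm 2.

```python
import numpy as np

def ndft_matrix(N, r_s, y_points):
    """Dense matrix of the nonuniform DFT of Equation 28: rows correspond to the
    nonuniform frequencies y, columns to the points of the spatial grid R_N."""
    idx = np.arange(N) - N // 2                       # index set I_N
    x1d = (2.0 * r_s / N) * idx                       # equispaced samples in [-r_s, r_s)
    X1, X2 = np.meshgrid(x1d, x1d, indexing="ij")
    X = np.stack([X1.ravel(), X2.ravel()], axis=1)    # grid R_N with N^2 points
    phase = np.exp(-1j * (y_points @ X.T))            # e^{-i x.y} for all pairs (y, x)
    return (2.0 * r_s) ** 2 / (2.0 * np.pi * N ** 2) * phase

# Usage (illustrative): given Fourier data g on the nonuniform grid Y, a least-squares
# estimate of f_N could be obtained via np.linalg.lstsq(ndft_matrix(N, r_s, Y), g);
# in practice the system is solved matrix-free with NFFT-accelerated conjugate gradients.
```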
The Rytov approximation u^Rytov, see Equation 12, is closely related to the Born approximation, but it has a different physical interpretation. Assuming that the measurements arise from the Rytov approximation, we apply Equation 13 to obtain u^Born, from which we can proceed to recover f as shown above. We note that the actual implementation of Equation 13 requires phase unwrapping, because the complex logarithm is unique only up to the addition of integer multiples of 2πi, cf. Müller et al. (2015). In particular, we use in the two-dimensional case
u^Born = u^inc ( i · unwrap( arg( u^Rytov/u^inc + 1 ) ) + ln | u^Rytov/u^inc + 1 | ),  (30)
where arg denotes the principal argument of a complex number and unwrap denotes a standard unwrapping algorithm. For the reconstruction with the Rytov approximation, we can use Algorithm 2 as well, but we have to preprocess the data u by Equation 30.
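Here is a minimal sketch of the preprocessing step in Equation 30, assuming the fields are sampled as complex 1D arrays along the measurement line and that NumPy's standard phase unwrapping is an acceptable choice for `unwrap`.

```python
import numpy as np

def rytov_to_born(u_rytov, u_inc):
    """Convert Rytov data to Born data according to Equation 30."""
    ratio = u_rytov / u_inc + 1.0
    phase = np.unwrap(np.angle(ratio))            # principal argument + unwrapping
    return u_inc * (1j * phase + np.log(np.abs(ratio)))
```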
Numerical Experiments

The accuracy of a reconstruction g of the true model f is quantified by the peak signal-to-noise ratio

PSNR(f, g) := 10 log_10 ( max_{x∈R_N} f(x)^2 / ( N^{−2} Σ_{x∈R_N} (f(x) − g(x))^2 ) ).
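For completeness, a direct transcription of the PSNR above into code (assuming f and g are real arrays on the N × N grid R_N):

```python
import numpy as np

def psnr(f, g):
    """PSNR between the true model f and a reconstruction g on the N x N grid R_N."""
    mse = np.mean((f - g) ** 2)                 # N^{-2} * sum of squared errors
    return 10.0 * np.log10(np.max(f ** 2) / mse)
```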
Fig. 6 Different perturbation models f used for the computational experiments, given for frequency ω/(2π) = 1, with the relation to the wave speed given in Equation 3. Both the size and contrast vary: we consider two radii (4.5 and 2) and three contrasts (1, 5, and −5, with corresponding wave speeds c = 0.9876, c = 0.9421, and c = 1.0701, respectively), for a total of six configurations. The computations are carried out on the domain [−50, 50] × [−50, 50], i.e., a slightly larger setup than Fig. 3, and we only picture the area near the origin for clearer visualization. (a) Perturbation f for radius 4.5 and amplitude 5: model f = 5 · 1^disk_4.5. (b) Perturbation f for radius 4.5 and amplitude −5: model f = −5 · 1^disk_4.5. (c) Perturbation f for radius 2 and amplitude 1: model f = 1^disk_2
Since we consider a radially symmetric object, the data for each angle are similar and correspond to those of Fig. 3 for f = 1^disk_4.5.
Fig. 7 Reconstruction using iterative minimization with data of frequency ω/(2π) = 1 only. In each case, 50 iterations are performed and the initial model consists of a constant background where k_0 = 2π. The data consist of n_A = 40 different angles of incidence from 0° to 351°. (a) Reconstruction for model f = 1^disk_2 (PSNR 23.50). (b) Reconstruction for model f = 5 · 1^disk_2 (PSNR 24.43). (c) Reconstruction for model f = −5 · 1^disk_2 (PSNR 24.25). (d) Reconstruction for model f = 1^disk_4.5 (PSNR 14.79). (e) Reconstruction for model f = 5 · 1^disk_4.5 (PSNR 15.71). (f) Reconstruction for model f = −5 · 1^disk_4.5
Fig. 8 Reconstruction using multi-frequency data from ω/(2π) = 0.2 to ω/(2π) = 1. The initial model consists of a constant wave speed c_0 = 1. The data consist of n_A = 40 different angles of incidence from 0° to 351°. (a) Reconstruction for model f = 5 · 1^disk_4.5 (PSNR 19.43). (b) Reconstruction for model f = −5 · 1^disk_4.5 (PSNR 19.02)
Each frequency is inverted separately. The reconstructions for the object of radius 4.5 and contrast f = ±5 are pictured in Fig. 8. Contrary to the case of a single frequency (see Fig. 7d), the reconstruction is now accurate and stable: the amplitude is accurately retrieved and the circular shape is clear, avoiding the circular artifacts observed in Fig. 7e and f.
Fig. 9 Reconstruction using frequency ω/(2π) = 0.7. The initial model consists of a constant wave speed c_0 = 1. The data consist of n_A = 40 different angles of incidence from 0° to 351°. (a) Reconstruction for model f = 5 · 1^disk_4.5 (PSNR 15.33). (b) Reconstruction for model f = −5 · 1^disk_4.5 (PSNR 15.03)
Remark 3. It is possible to recover the model with a single frequency, which needs
to be carefully chosen depending on the size of the object and the amplitude of the
contrast. We have seen in Fig. 7 that the frequency ω/(2π ) = 1 is sufficient for the
object of radius 2, but for the radius 4.5, we need a lower frequency (i.e., larger
wavelength) to uncover the larger object. We illustrate in Fig. 9 the reconstruction
using data at only ω/(2π ) = 0.7, where we see that the shape and contrast are
retrieved accurately. Nonetheless, it is hard to predict this frequency a priori, and
we believe it remains more natural to use multiple frequencies (when available in
the data), to ensure the robustness of the algorithm.
Fig. 10 Reconstructions with the Born and Rytov approximations, where the data u(·, r_M) are generated with the line source model. The incident field has frequency ω/(2π) = 1. Only the cut-out center is visible, on which we compute the PSNR. (a) Model f = 1^disk_4.5. (b) Born reconstruction (PSNR 19.35). (c) Rytov reconstruction (PSNR 19.28). (d) Model f = 1^disk_2. (e) Born reconstruction (PSNR 24.13). (f) Rytov reconstruction (PSNR 24.01)
Fig. 11 Same setting as in Fig. 10, but with a higher amplitude of 5. (a) Model f = 5 · 1^disk_4.5. (b) Born reconstruction (PSNR 4.68). (c) Rytov reconstruction (PSNR 11.91). (d) Model f = 5 · 1^disk_2. (e) Born reconstruction (PSNR 19.39). (f) Rytov reconstruction (PSNR 21.31)
We see that the FWI and the Born/Rytov reconstructions contain different kinds of artifacts. Therefore, a comparison based on visual image quality does not necessarily lead to the same conclusions as the computed PSNR values. Furthermore, the size of the object has a considerable effect on the PSNR: e.g., the images in Fig. 11c and f show a comparable visual quality, but the latter's PSNR is considerably better because of the lower error in the background farther away from the object; see also Huynh-Thu and Ghanbari (2010) for a study on the PSNR.
In Fig. 12, we use the same setup as before, but with the frequency ω/(2π ) =
0.7 instead of ω/(2π ) = 1 and thus the wave number k0 = ω. Apparently, the
reconstruction becomes worse with lower frequency, because it provides a smaller
k-space coverage.
Fig. 13 Illustration of the acquisition setup and generated data. The computations are carried out on the domain [−20, 20] × [−20, 20]. While FWI uses the total field, the reconstructions based upon the Born and Rytov approximations use the scattered solutions, obtained after removing a reference solution corresponding to propagation in a homogeneous medium, cf. section “Forward Models”. (a) Perturbation model at frequency ω/(2π) = 1; the wave speed is equal to 1 in the background. The positions of the source and the receivers recording transmission data are pictured in white. (b) Real part of the global solution to Equation 15 at frequency ω/(2π) = 1; the source is discretized by N_sim = 1361 simultaneous excitations at fixed height x_2 = −10. (c) Real part of the scattering solution at frequency ω/(2π) = 1. (d) Zoom near the origin of panel (b). (e) Zoom near the origin of panel (c). (f) Scattered solution measured at the 201 receivers positioned at fixed height x_2 = 6
Fig. 14 Reconstruction of the model of Fig. 13a with FWI, starting from a homogeneous
background with f = 0. The models are given at frequency ω/(2π ) = 1 and the wave speed is
equal to 1 in the background. (a) Using single-frequency, ω/(2π ) = 0.7 (PSNR 22.38). (b) Using
single-frequency, ω/(2π ) = 1. (PSNR 22.91). (c) Using single-frequency, ω/(2π ) = 1.2. (PSNR
23.10). (d) Using single-frequency, ω/(2π ) = 1.4. (PSNR 23.31). (e) Using multi-frequency,
ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 27.28)
Fig. 15 Reconstruction with FWI starting from a homogeneous background with f = 0. The
models are given at frequency ω/(2π ) = 1 and the wave speed is equal to 1 in the background.
(a) True model. (b) Using single-frequency, ω/(2π ) = 0.7 (PSNR 25.04). (c) Using single-
frequency, ω/(2π ) = 1 (PSNR 25.91). (d) Using single-frequency, ω/(2π ) = 1.2 (PSNR
26.19). (e) Using single-frequency, ω/(2π ) = 1.4 (PSNR 26.75). (f) Using multi-frequency,
ω/(2π ) ∈ {0.7, 1, 1.2, 1.4} (PSNR 28.76)
Fig. 16 Reconstructions for different frequencies of the incident wave. The PSNR is computed
on the visible part of the grid for the real part of the reconstruction, since we know that f must be
real. (a) True model f. (b) Born reconstruction at frequency ω/(2π ) = 1. (PSNR 24.69). (c) Rytov
reconstruction at frequency ω/(2π ) = 1. (PSNR 24.66). (d) Rytov reconstruction at frequency
ω/(2π ) = 0.7. (PSNR 22.72). (e) Rytov reconstruction at frequency ω/(2π ) = 1.2. (PSNR 25.32).
(f) Rytov reconstruction at frequency ω/(2π ) = 1.4. (PSNR 26.14). (g) Born reconstruction using
multi-frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 26.37). (h) Rytov reconstruction using multi-
frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 26.37)
Fig. 17 Reconstructions with a higher contrast, where the rest of the setting is the same as in
Fig. 16. (a) True model f. (b) Born reconstruction at frequency ω/(2π ) = 1. (PSNR 23.53).
(c) Rytov reconstruction at frequency ω/(2π ) = 1. (PSNR 24.47). (d) Rytov reconstruction at
frequency ω/(2π ) = 0.7. (PSNR 21.78). (e) Rytov reconstruction at frequency ω/(2π ) = 1.2.
(PSNR 25.55). (f) Rytov reconstruction at frequency ω/(2π ) = 1.4. (PSNR 26.40). (g) Born
reconstruction using multi-frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 23.77). (h) Rytov
reconstruction using multi-frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 25.92)
Lower frequencies lead to a smaller k-space coverage, which is the disk of radius √2 k_0 = √2 ω; see section “Fourier Diffraction Theorem”. Moreover, the multi-frequency reconstruction is shown in Fig. 16g and h. Even though the multi-frequency setup covers the same disk in k-space, it still seems superior because we have more data points of the Fourier transform Ff.
For the model similar to Fig. 15a but with a higher contrast, the reconstructions with the Born and Rytov approximations differ more from the FWI reconstruction because of the more severe scattering; see Fig. 17. In general, we can expect the FWI reconstruction to be better, since it relies on a numerical approximation of the full wave equation, of which the Born and Rytov approximations are only linearizations.
We now consider the case with combinations of smaller convex and non-convex
shapes included in the background medium.
Fig. 18 Reconstruction with FWI starting from a homogeneous background with f = 0. The
models are given at frequency ω/(2π ) = 1 and the wave speed is equal to 1 in the background.
(a) True contrast. (b) Using single-frequency, ω/(2π ) = 0.7 (PSNR 18.91). (c) Using single-
frequency, ω/(2π ) = 1 (PSNR 19.50). (d) Using single-frequency, ω/(2π ) = 1.2 (PSNR 19.90).
(e) Using single-frequency, ω/(2π ) = 1.4 (PSNR 20.10). (f) Using multi-frequency, ω/(2π ) ∈
{0.7, 1, 1.2, 1.4} (PSNR 23.04)
Fig. 19 Reconstruction with FWI starting from a homogeneous background with f = 0. The
models are given at frequency ω/(2π ) = 1 and the wave speed is equal to 1 in the background.
(a) True contrast. (b) Using single-frequency, ω/(2π ) = 0.7 (PSNR 20.75). (c) Using single-
frequency, ω/(2π ) = 1 (PSNR 21.50). (d) Using single-frequency, ω/(2π ) = 1.2 (PSNR 22.27).
(e) Using single-frequency, ω/(2π ) = 1.4 (PSNR 22.72). (f) Using multi-frequency, ω/(2π ) ∈
{0.7, 1, 1.2, 1.4} (PSNR 22.82)
Computational Costs
Computational cost of FWI. The computational cost of FWI comes from the discretization and solution of the wave problem in Equation 15 for each of the sources in the acquisition, coupled with the iterative procedure of Algorithm 1. In our numerical experiments, we use the software hawen for the iterative inversion, Faucher (2021), which relies on the hybridizable discontinuous Galerkin (HDG) discretization, Cockburn et al. (2009) and Faucher and Scherzer (2020). The number of degrees of freedom for the discretization depends on the number of cells in the mesh and the polynomial order. In the inversion experiments, we use a fixed mesh for all iterations, with about fifty thousand cells. On the other hand, the polynomial order is selected depending on the wavelength on each cell.
Fig. 20 Reconstructions of the more complicated shapes. The models are given at frequency
ω/(2π ) = 1. (a) True model f . (b) Born reconstruction at frequency ω/(2π ) = 1. (PSNR 20.14).
(c) Rytov reconstruction at frequency ω/(2π ) = 1. (PSNR 20.10). (d) Rytov reconstruction at
frequency ω/(2π ) = 0.7. (PSNR 18.00). (e) Rytov reconstruction at frequency ω/(2π ) = 1.2.
(PSNR 21.29). (f) Rytov reconstruction at frequency ω/(2π ) = 1.4. (PSNR 22.17). (g) Born
reconstruction using multi-frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 21.91). (h) Rytov
reconstruction using multi-frequency, ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 21.93)
That is, each of the cells in the mesh is allowed to have a different order (here between 3 and 7); see Faucher and Scherzer (2020). Then, when the frequency changes, while the mesh remains the same, the polynomial order evolves according to the change of wavelength. Once the wave equation, Equation 15, is discretized, we obtain a linear system whose size is the number of degrees of freedom and which must be solved for the different sources (i.e., the different incident angles). We rely on the direct solver MUMPS, Amestoy et al. (2019), such that once the matrix factorization is computed, the numerical cost of having several sources (i.e., multiple right-hand sides in the linear system) is drastically mitigated, motivating the use of a direct solver instead of an iterative one.
Fig. 21 Reconstructions of the more complicated shapes with a higher contrast than in Fig. 20.
The models are given at frequency ω/(2π ) = 1. (a) True model f . (b) Born reconstruction
at frequency ω/(2π ) = 1. (PSNR 17.16). (c) Rytov reconstruction at frequency ω/(2π ) =
1. (PSNR 18.73). (d) Rytov reconstruction at frequency ω/(2π ) = 0.7. (PSNR 16.10). (e)
Rytov reconstruction at frequency ω/(2π ) = 1.2. (PSNR 20.14). (f) Rytov reconstruction
at frequency ω/(2π ) = 1.4. (PSNR 21.15). (g) Born reconstruction using multi-frequency,
ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 16.92). (h) Rytov reconstruction using multi-frequency,
ω/(2π ) ∈ {0.7, 1, 1.2, 1.4}. (PSNR 20.33)
Our numerical experiments have been carried out on the Vienna Scientific Cluster VSC-4 (https://vsc.ac.at/), using 48 cores. For the reconstructions of Figs. 14, 15, 18, and 19, the size of the computational domain is 40 × 40, with about 350,000 degrees of freedom. Using single-frequency data, 50 iterations are performed in Algorithm 1, and the total computational time is about 40 min. In the case of multiple frequencies, we have a total of 120 iterations, and the computational time is about 1 h 45 min.
Conclusion
We study the imaging problem of diffraction tomography, where wave measurements are used to quantitatively reconstruct the physical properties, i.e., the refractive index. The forward operator that describes the wave propagation corresponds to the Helmholtz equation, which, under the assumption of small background perturbations, can be represented via the Born and Rytov approximations.
Firstly, we have compared different forward models in terms of the resulting measured data u. This comparison highlights that, even in the case of a small circular object, the Born approximation is not entirely accurate in representing the total wave field given by the Helmholtz equation. In addition, the source that initiates the phenomenon (e.g., a point source located very far from the object, or simultaneous point sources along a line) also plays an important role, as it changes the resulting signals, hence leading to systematic differences depending on the choice of forward model. We found that the line source model approximates the plane wave well.
Secondly, we have carried out the reconstruction using data from the total field u^tot and compared the efficiency of the Full Waveform Inversion (FWI) method with that of the Born and Rytov approximations. FWI works directly with the Helmholtz problem, Equation 15, hence giving a robust approach that can be implemented in all configurations, albeit at the cost of possibly intensive computations. On the other hand, the Born and Rytov approximations are computationally cheap, but lack accuracy when the object is too large or the contrast too strong. We have also noted that the Rytov approximation gives better results than the Born one. Furthermore, for all reconstruction methods, we have shown that using data of multiple frequencies improves the robustness of the reconstruction by providing information on multiple wavelengths.
Acknowledgments We thank the anonymous reviewer for carefully reading the manuscript and
making various suggestions for its improvement. This work is supported by the Austrian Science
Fund (FWF) within SFB F68 (“Tomography across the Scales”), Projects F68-06 and F68-07. FF
is funded by the Austrian Science Fund (FWF) under the Lise Meitner fellowship M 2791-N.
Funding by the DFG under Germany’s Excellence Strategy – The Berlin Mathematics Research
Center MATH+ (EXC-2046/1, Projektnummer: 390685689) as well as by the DFG project STE
571/19 (Projektnummer: 495365311) is gratefully acknowledged. For the numerical experiments,
we acknowledge the use of the Vienna Scientific Cluster VSC-4 (https://vsc.ac.at/).
References
Amestoy, P.R., Buttari, A., L’excellent, J.-Y., Mary, T.: Performance and scalability of the block
low-rank multifrontal factorization on multicore architectures. ACM Trans. Math. Softw.
(TOMS) 45(1), 1–26 (2019). https://doi.org/10.1145/3242094
Bamberger, A., Chavent, G., Lailly, P.: About the stability of the inverse problem in the 1-d wave
equation. J. Appl. Math. Optim. 5, 1–47 (1979)
Barucq, H., Chavent, G., Faucher, F.: A priori estimates of attraction basins for nonlinear least
squares, with application to Helmholtz seismic inverse problem. Inverse Probl. 35(11), 115004
(2019). https://doi.org/10.1088/1361-6420
Bednar, J.B., Shin, C., Pyun, S.: Comparison of waveform inversion, part 2: phase approach.
Geophys. Prospect. 55(4), 465–475 (2007). ISSN: 1365-2478. https://doi.org/10.1111/j.1365-
2478.2007.00618.x
Beinert, R., Quellmalz, M.: Total variation-based reconstruction and phase retrieval for diffraction tomography. SIAM J. Imaging Sci. 15(3), 1373–1399 (2022). ISSN: 1936-4954. https://doi.org/10.1137/22M1474382
Bunks, C., Saleck, F.M., Zaleski, S., Chavent, G.: Multiscale seismic waveform inversion.
Geophysics 60(5), 1457–1473 (1995). https://doi.org/10.1190/1.1443880
Chen, B., Stamnes, J.J.: Validity of diffraction tomography based on the first Born and the first
Rytov approximations. Appl. Opt. 37(14), 2996 (1998). https://doi.org/10.1364/ao.37.002996
Clément, F., Chavent, G., Gómez, S.: Migration-based traveltime wave-form inversion of 2-D
simple structures: a synthetic example. Geophysics 66(3), 845–860 (2001). https://doi.org/10.
1190/1.1444974
Cockburn, B., Gopalakrishnan, J., Lazarov R.: Unified hybridization of discontinuous Galerkin,
mixed, and continuous Galerkin methods for second order elliptic problems. SIAM J. Numer.
Anal. 47(2), 1319–1365 (2009). https://doi.org/10.1137/070706616
Colton, D., Kress, R.: Inverse Acoustic and Electromagnetic Scattering Theory. Applied Mathe-
matical Sciences, vol. 93, 3rd edn. Springer, Berlin (2013). ISBN: 978-1-4614-4941-6. https://
doi.org/10.1007/978-1-4614-4942-3
Devaney, A.: A filtered backpropagation algorithm for diffraction tomography. Ultrason. Imaging
4(4), 336–350 (1982). https://doi.org/10.1016/0161-7346(82)90017-7
Devaney, A.: Mathematical Foundations of Imaging, Tomography and Wave-Field Inversion.
Cambridge University Press (2012). https://doi.org/10.1017/CBO9781139047838
Engquist, B., Majda, A.: Absorbing boundary conditions for numerical simulation of waves. Proc.
Natl. Acad. Sci. 74(5), 1765–1766 (1977)
Fan, S., Smith-Dryden, S., Li, G., Saleh, B.E.A.: An iterative reconstruction algorithm for optical
diffraction tomography. In: IEEE Photonics Conference (IPC), pp. 671–672 (2017). https://doi.
org/10.1109/ipcon.2017.8116276
Faucher, F.: Contributions to seismic full waveform inversion for time harmonic wave equations:
Stability estimates, convergence analysis, numerical experiments involving large scale optimiza-
tion algorithms. PhD thesis. Université de Pau et Pays de l’Ardour, pp. 1–400 (2017)
Faucher, F.: Hawen: time-harmonic wave modeling and inversion using hybridizable discontinuous
Galerkin discretization. J. Open Source Softw. 6(57) (2021). https://doi.org/10.21105/joss.
02699
Faucher, F., Scherzer, O.: Adjoint-state method for Hybridizable Discontinuous Galerkin dis-
cretization, application to the inverse acoustic wave problem. Comput. Methods Appl. Mech.
Eng. 372, 113406 (2020). ISSN: 0045-7825. https://doi.org/10.1016/j.cma.2020.113406
Faucher, F., Alessandrini, G., Barucq, H., de Hoop, M., Gaburro, R., Sincich, E.: Full Reciprocity-
Gap Waveform Inversion, enabling sparse-source acquisition. Geophysics 85(6), R461–R476
(2020a). https://doi.org/10.1190/geo2019-0527.1
Faucher, F., Chavent, G., Barucq, H., Calandra, H.: A priori estimates of attraction basins
for velocity model reconstruction by time-harmonic Full Waveform Inversion and Data-
Space Reflectivity formulation. Geophysics 85(3), R223–R241 (2020b). https://doi.org/10.
1190/geo2019-0251.1
Faucher, F., Scherzer O., Barucq, H.: Eigenvector models for solving the seismic inverse problem
for the Helmholtz equation. Geophys. J. Int. (2020c). ISSN: 0956-540X. https://doi.org/10.
1093/gji/ggaa009
Faucher, F., de Hoop, M.V., Scherzer, O.: Reciprocity-gap misfit functional for Distributed Acoustic Sensing, combining data from passive and active sources. Geophysics 86(2), R211–R220 (2021). ISSN: 0016-8033. https://doi.org/10.1190/geo2020-0305.1
Fichtner, A., Kennett, B.L., Igel, H., Bunge, H.-P.: Theoretical background for continental- and global-scale full-waveform inversion in the time–frequency domain. Geophys. J. Int. 175(2), 665–685 (2008). https://doi.org/10.1111/j.1365-246X.2008.03923.x
Gbur, G., Wolf, E.: Hybrid diffraction tomography without phase information. J. Opt. Soc. Am. A
19(11), 2194–2202 (2002). https://doi.org/10.1364/OL27.001890
Hanke, M.: Conjugate Gradient Type Methods for Ill-Posed Problems. Pitman Research Notes in
Mathematics Series, vol. 327. Longman Scientific & Technical, Harlow (1995)
Hesthaven, J.S., Warburton, T.: Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and
Applications. Springer Science & Business Media (2007). https://doi.org/10.1007/978-0-387-
72067-8
Hielscher, R., Potts, D., Quellmalz, M.: An SVD in spherical surface wave tomography. In:
Hofmann, B., Leitao, A., Zubelli, J.P. (eds.) New Trends in Parameter Identification for
Mathematical Models. Trends in Mathematics, pp. 121–144. Birkhäuser, Basel (2018). ISBN:
978-3-319-70823-2. https://doi.org/10.1007/978-3-319-70824-9_7
Hielscher, R., Quellmalz, M.: Optimal mollifiers for spherical deconvolution. Inverse Probl. 31(8), 085001 (2015). https://doi.org/10.1088/0266-5611/31/8/085001
Hielscher, R., Quellmalz, M.: Reconstructing a function on the sphere from its means along vertical
slices. Inverse Probl. Imaging 10(3), 711–739 (2016). ISSN: 1930-8337. https://doi.org/10.
3934/ipi.2016018
Horstmeyer, R., Chung, J., Ou, X., Zheng, G., Yang, C.: Diffraction tomography with Fourier
ptychography. Optica 3(8), 827–835 (2016). https://doi.org/10.1364/OPTICA.3.000827
Huynh-Thu, Q., Ghanbari, M.: The accuracy of PSNR in predicting video quality for different video scenes and frame rates. Telecommun. Syst. 49(1), 35–48 (2010). https://doi.org/10.1007/s11235-010-9351-x
Iwata, K., Nagata, R.: Calculation of refractive index distribution from interferograms using the
Born and Rytov’s approximation. Jpn. J. Appl. Phys. 14(S1), 379–383 (1975). https://doi.org/
10.7567/jjaps.14s1.379
Kak, A.C., Slaney M.: Principles of Computerized Tomographic Imaging. Classics in Applied
Mathematics, vol. 33. Society for Industrial and Applied Mathematics (SIAM), Philadelphia
(2001)
Kaltenbacher, B.: Minimization based formulations of inverse problems and their regularization.
SIAM J. Optim. 28(1), 620–645 (2018). https://doi.org/10.1137/17M1124036
Keiner, J., Kunis, S., Potts, D.: Using NFFT3 – a software library for various nonequispaced fast
Fourier transforms. ACM Trans. Math. Softw. 36, Article 19, 1–30 (2009). https://doi.org/10.
1145/1555386.1555388
Keiner, J., Kunis, S., Potts, D.: NFFT 3.5, C subroutine library (n.d.). https://www.tu-chemnitz.
de/~potts/nfft
Kirby, R.M., Sherwin, S.J., Cockburn, B.: To CG or to HDG: a comparative study. J. Sci. Comput.
51(1), 183–212 (2012). https://doi.org/10.1007/s10915-011-9501-7
Kirisits, C., Quellmalz, M., Ritsch-Marte, M., Scherzer, O., Setterqvist, E., Steidl, G.: Fourier
reconstruction for diffraction tomography of an object rotated into arbitrary orientations. Inverse
Probl. (2021). ISSN: 0266-5611. https://doi.org/10.1088/1361-6420/ac2749
Knopp, T., Kunis, S., Potts, D.: A note on the iterative MRI reconstruction from nonuniform k-
space data. Int. J. Biomed. Imag. (2007). https://doi.org/10.1155/2007/24727
Kunis, S., Potts, D.: Stability results for scattered data interpolation by trigonometric polynomials.
SIAM J. Sci. Comput. 29, 1403–1419 (2007). https://doi.org/10.1137/060665075
Lailly, P.: The seismic inverse problem as a sequence of before stack migrations. In: Bednar,
J.B. (ed.) Conference on Inverse Scattering: Theory and Application, pp. 206–220. Society for
Industrial and Applied Mathematics (1983)
Luo, Y., Schuster, G.T.: Wave-equation traveltime inversion. Geophysics 56(5), 645–653 (1991).
https://doi.org/10.1190/1.1443081
Maleki, M.H., Devaney, A.: Phase-retrieval and intensity-only reconstruction algorithms for optical diffraction tomography. J. Opt. Soc. Am. A 10(5), 1086 (1993). https://doi.org/10.1364/josaa.10.001086
Métivier, L., Brossier, R., Mérigot, Q., Oudet, E., Virieux, J.: Measuring the misfit between
seismograms using an optimal transport distance: application to full waveform inversion.
Geophys. J. Int. 205(1), 345–377 (2016). https://doi.org/10.1093/gji/ggw014
Monk, P.: Finite Element Methods for Maxwell’s Equations. Oxford University Press, Oxford
(2003)
Müller, P., Schürmann, M., Guck, J.: ODTbrain: a Python library for full-view, dense diffraction
tomography. BMC Bioinform. 16(367) (2015). https://doi.org/10.1186/s12859-015-0764-0
Müller, P., Schürmann, M., Guck, J.: The Theory of Diffraction Tomography (2016). arXiv:
1507.00466 [q-bio.QM]
Natterer, F.: The Mathematics of Computerized Tomography. B. G. Teubner, Stuttgart (1986). ISBN: 3-519-02103-X
Natterer, F., Wübbeling, F.: Mathematical Methods in Image Reconstruction. Monographs on
Mathematical Modeling and Computation, vol. 5. SIAM, Philadelphia (2001)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research, 2nd
edn. Springer, Berlin (2006)
Plessix, R.-E.: A review of the adjoint-state method for computing the gradient of a functional with
geophysical applications. Geophys. J. Int. 167(2), 495–503 (2006). https://doi.org/10.1111/j.
1365-246X.2006.02978.x
Plonka, G., Potts, D., Steidl, G., Tasche, M.: Numerical Fourier Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser (2018). ISBN: 978-3-030-04305-6. https://doi.org/10.1007/978-3-030-04306-3
Potts, D., Steidl, G.: A new linogram algorithm for computerized tomography. IMA J. Numer.
Anal. 21, 769–782 (2001). https://doi.org/10.1093/imanum/21.3.769
Pratt, R.G., Shin, C., Hick, G.J.: Gauss–Newton and full Newton methods in frequency–space
seismic waveform inversion. Geophys. J. Int. 133(2), 341–362 (1998). https://doi.org/10.1046/
j.1365-246X.1998.00498.x
Pyun, S., Shin, C., Bednar, J.B.: Comparison of waveform inversion, part 3: amplitude approach.
Geophys. Prospect. 55(4), 477–485 (2007). ISSN: 1365-2478. https://doi.org/10.1111/j.1365-
2478.2007.00619.x
Shin, C., Pyun, S., Bednar, J.B.: Comparison of waveform inversion, part 1: conventional wavefield vs logarithmic wavefield. Geophys. Prospect. 55(4), 449–464 (2007). ISSN: 1365-2478. https://doi.org/10.1111/j.1365-2478.2007.00617.x
Slaney, M., Kak, A.C., Larsen, L.E.: Limitations of imaging with first-order diffraction tomog-
raphy. IEEE Trans. Microw. Theory Techn. 32(8), 860–874 (1984). https://doi.org/10.1109/
TMTT.1984.1132783
Sung, Y., Choi, W., FangYen, C., Badizadegan, K., Dasari, R.R., Feld, M.S.: Optical diffraction
tomography for high resolution live cell imaging. Opt. Express 17(1), 266–277 (2009)
Tarantola, A.: Inversion of seismic reflection data in the acoustic approximation. Geophysics 49,
1259–1266 (1984). https://doi.org/10.1190/1.1441754
Van Leeuwen, T., Mulder, W.: A correlation-based misfit criterion for wave-equation traveltime
tomography. Geophys. J. Int. 182(3), 1383–1394 (2010)
Virieux, J.: SH-wave propagation in heterogeneous media: velocity-stress finite-difference method.
Geophysics 49(11), 1933–1942 (1984)
Virieux, J., Operto, S.: An overview of full-waveform inversion in exploration geophysics.
Geophysics 74(6), WCC1–WCC26 (2009). https://doi.org/10.1190/1.3238367
Wedberg, T.C., Stamnes, J.J.: Comparison of phase retrieval methods for optical diffraction
tomography. Pure Appl. Opt. 4, 39–54 (1995). https://doi.org/10.1088/0963-9659/4/1/005
Wolf, E.: Three-dimensional structure determination of semi-transparent objects from holographic
data. Opt. Commun. 1, 153–156 (1969)
9 Models for Multiplicative Noise Removal

Xiangchu Feng and Xiaolong Zhu
Contents
Introduction
Variational Methods with Different Data Fidelity Terms
  Statistical Property Based Models
  MAP-Based Models
  Root and Inverse Transformation-Based Models
Variational Methods with Different Regularizers
  TV Regularization
  Sparse Regularization
  Nonconvex Regularization
Multitasks
  Root Transformation
  Fractional Transformation
Nonlocal Methods
  Indirect Method
  Direct Method
DNN Method
  Indirect Method
  Direct Method
Conclusion
References
Abstract
Image denoising is the most important step in image processing for further image analysis and an important topic in many applications, such as object recognition and digital entertainment. A digital image can be corrupted by noise during …
Keywords
Introduction
Poisson noise; and in synthetic aperture radar (SAR), where the noise follows a
Gamma distribution. In fact, speckle in a SAR image is caused by constructive
and destructive interference of coherent waves reflected by the many elementary
scatterers contained within the imaged resolution cell. The magnitude of the
complex observations of SAR can usually be modeled as corrupted by multiplicative
Rayleigh noise. As a consequence, the noise present in the square of the magnitude,
the so-called intensity, is exponentially distributed. To improve the quality of such
data, a common approach in SAR imaging is to average independent intensity
observations of the same scene to obtain so-called multi-look data, which are then
contaminated by multiplicative Gamma noise.
The Gamma noise model and Gamma distribution are given below. If we use
f to denote the image intensity that the SAR measures for a given pixel whose
backscattering coefficient is u, and assume that the SAR image represents an
average of L looks (independent samples or pixels), then f is related to u by the
multiplicative model
f = u n,  (1)

where the noise n follows a Gamma distribution with mean one and probability density function

p_n(n) = ( L^L n^{L−1} e^{−Ln} ) / Γ(L),   n ≥ 0, L ≥ 1.  (2)
Taking logarithms, with f̃ = log f, ũ = log u, and ñ = log n, the model (1) becomes

f̃ = ũ + ñ,  (3)

which leads to the density of the log-transformed noise

p_ñ(ñ) = ( L^L e^{ñL} e^{−L e^{ñ}} ) / Γ(L),  (5)

with mean

E[ñ] = ψ(L) − ln(L),  (6)

where

ψ(x) = (d/dx) ln Γ(x)  (7)

is the digamma function.
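As a minimal sketch (not from the chapter) of the model in Equations 1 and 2, the following code draws the multiplicative noise n from a Gamma distribution with shape L and mean one, as for L-look SAR intensity data; the image and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle(u, L=4):
    """Multiplicative Gamma (speckle) noise: f = u * n with E[n] = 1, Var[n] = 1/L."""
    n = rng.gamma(shape=L, scale=1.0 / L, size=u.shape)
    return u * n

u = np.ones((64, 64))          # toy "clean" image with constant reflectivity
f = add_speckle(u, L=4)
print(f.mean(), f.var())       # close to 1 and 1/L = 0.25
```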
Using the I-divergence as a data similarity term, an energy function (SST-model) was presented (Steidl and Teuber 2010). By applying an exponential transformation to the AA model, a globally convex model for speckle noise removal was obtained in Jin and Yang (2010). The regularization term is commonly the total variation (TV) and its variants (Xiao et al. 2010; Hu et al. 2013; Na et al. 2018).
Due to the sparse nature of the l1 norm, TV requires the image to have some
sparsity in the gradient domain. We know that the wavelet coefficients, ridgelet coef-
ficients, or curvelet coefficients of a sharp image are sparse. Based on these, Durand
et al. gave a hybrid method of curvelet field for removing multiplicative noise in
Durand et al. (2010). A combination of the total generalized variation filter (which has been shown to reduce blocky effects by accounting for higher-order smoothness) and the shearlet transform (which effectively preserves anisotropic image features such as sharp edges and curves) was proposed in Ullah et al. (2017).
In Huang et al. (2012), dictionary learning is used as a regularization term, and
experimental results suggest that in terms of visual quality, peak signal-to-noise
ratio, and mean absolute deviation error, the proposed algorithm outperforms many
other methods. In addition to variational models, nonlocal methods (Teuber and Lang 2012; Huang et al. 2017) have also been proposed. Recently, deep neural network methods (Wang et al. 2017, 2019) have been presented; extensive experiments on synthetic and real images show that they achieve significant improvements over state-of-the-art speckle reduction methods.
Variational Methods with Different Data Fidelity Terms

Usually, a variational method has two terms: a data fidelity term and a regularization term. More specifically, our interest is in recovering a true underlying image u from the noise-corrupted observation f = un, where n is a random variable following a Gamma distribution. To obtain an estimate û, the problem (10) is considered:

û = arg min_{u∈X} E(u) := φ(u, f) + λ ρ(u),  (10)

where λ > 0 is a tuning parameter and X is the space in which the solution lies. Depending on the model, X may be L²(Ω), BV(Ω), etc. In the discrete case, usually X = R^d. In general, the data fidelity term φ reflects characteristics of the noise corrupting our observation, and the regularization term ρ(·) is a prior on the clean image u. A common choice for ρ(·) is the total variation (TV),

ρ(u) = ∫_Ω |∇u| dx := J(u), or |u|_{TV(Ω)}.  (11)
Statistical Property Based Models

(1) RLO-model
Under the assumption that the mean of the multiplicative noise is equal to 1 and
the variance is known, Rudin, Lions, and Osher introduced the following denoising
model (RLO) in Rudin et al. (2003):
min_{u∈X} J(u) + λ_1 ∫_Ω (f/u) dx + λ_2 ∫_Ω (f/u − 1)² dx  (12)
However, only basic statistical properties, namely the mean and the variance of the noise, are considered in the RLO model, which somewhat limits its denoising ability. We know that the data fidelity terms derived from the likelihoods of multiplicative Poisson noise and multiplicative Rayleigh noise (Setzer et al. 2010; Denis et al. 2009) are ∫_Ω (u − f log u) dx and ∫_Ω ( f²/(2u) + log u ) dx, respectively.
Based on the MAP models of Poisson noise and Rayleigh noise, the above model (12) can be generalized into the SO-model

∫_Ω ( a (f/u) + (b/2)(f/u)² + c log u ) dx.  (13)

With the change of variable w = log u and c = a + b, this leads to

ŵ = arg min_{w∈BV(Ω)} J(w) + λ ∫_Ω ( a f exp(−w) + (b/2) f² exp(−2w) + (a + b) w ) dx,   û = e^ŵ.  (15)
The fidelity term H(w, f) = a f exp(−w) + (b/2) f² exp(−2w) + (a + b) w is globally strictly convex. Using gradient descent and the Euler–Lagrange equation for this total variation-based problem, (16) can be obtained:

w_t = ∇ · ( ∇w / |∇w| ) + λ ( a f exp(−w) + b f² exp(−2w) − (a + b) ).  (16)
Shi and Osher extended this convex model to obtain a nonlinear inverse scale space flow and its corresponding relaxed inverse scale space flow. The numerical results, in terms of SNR, show a significant improvement over the RLO model (Shi and Osher 2008).
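Here is a minimal sketch of explicit time marching for Equation 16; the curvature term is regularized by replacing |∇w| with (|∇w|² + ε²)^{1/2}, boundary handling relies on NumPy's `gradient`, and the values of a, b, λ, and the step size are illustrative only.

```python
import numpy as np

def evolve(f, a=1.0, b=1.0, lam=10.0, dt=1e-3, n_steps=300, eps=0.1):
    """Explicit gradient-flow iteration for Equation 16 in the log domain w = log u."""
    w = np.log(np.maximum(f, 1e-6))                  # initialize with the log of the data
    for _ in range(n_steps):
        wy, wx = np.gradient(w)                      # derivatives along rows and columns
        norm = np.sqrt(wx**2 + wy**2 + eps**2)       # regularized |grad w|
        div = np.gradient(wx / norm, axis=1) + np.gradient(wy / norm, axis=0)
        fidelity = a * f * np.exp(-w) + b * f**2 * np.exp(-2 * w) - (a + b)
        w = w + dt * (div + lam * fidelity)          # small step for stability
    return np.exp(w)                                 # recover u = e^w
```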
MAP-Based Models
(3) AA-model
Based on the MAP estimator for multiplicative Gamma noise, Aubert and Aujol (2008) proposed to determine the denoised image as a minimizer in S(Ω) = {u ∈ BV(Ω) : u > 0} of the following functional:

min_{u∈S(Ω)} λ J(u) + ∫_Ω ( log u + f/u ) dx  (17)
The AA model (17) is nonconvex, and finding its global solution is a challenging task. Convex optimization methods have vast applications in image processing; therefore, many works have been devoted to relaxing or convexifying the nonconvex AA model.
(4) SO-model
In Shi and Osher (2008), it was suggested to keep the data fitting term in (17) but to replace the regularizer ∫|∇u| by ∫|∇ log u|. Moreover, setting, as in the log-model, w := log u, this results in the convex problem

ŵ = arg min_{w∈BV(Ω)} ∫_Ω ( f e^{−w} + w ) dx + λ J(w),   û = e^ŵ  (18)
I(f, u) := ∫_Ω ( f log(f/u) − f + u ) dx  (20)
The gradients of the data fitting terms in (18) and (21) coincide if we use again the relation log û = ŵ. Moreover, if we add TV-regularization, then both functionals have the same minimizer. Since ∇e^w = e^w ∇w, for u = e^w we have ∇u(x) = 0 if and only if ∇w(x) = 0. The minimizers ŵ and û of the functionals (18) and (21) are unique and given by

0 = 1 − f e^{−ŵ} − λ div( ∇ŵ / |∇ŵ| )   for ∇ŵ(x) ≠ 0,  (22)

0 = 1 − f/û − λ div( ∇û / |∇û| )   for ∇û(x) ≠ 0.  (23)

Since ∇w/|∇w| = e^w ∇w / (e^w |∇w|) = ∇u/|∇u|, we obtain the assertion.
gradient operator is applied to the mth root transformed images. The transformed variational model (the m-V model) is then expressed as follows:

u* = arg min_{u ∈ ᵐ√U} ⟨ m log u + f u^{−m}, 1 ⟩ + λ J(u),   û = (u*)^m.  (24)
The probability density function (25) of n_m := n^{1/m} is a special case of the generalized Gamma distribution. Hence, the mean value and the variance of n_m are

E(n_m) = Γ(L + 1/m) / ( Γ(L) ᵐ√L ),  (26)

var(n_m) = ( Γ(L) Γ(L + 2/m) − Γ(L + 1/m)² ) / ( Γ(L)² ᵐ√(L²) ).  (27)
We know that if u ∈ (0, C], then the objective function of the m-V model (24) is convex on the set { u : 0 < u ≤ min( ᵐ√((m + 1) f), ᵐ√C ) }. We call this property …
(7) DZ-model
Since the performance of the m-V model critically depends on the choice of m, a relaxation was proposed in Kang et al. (2013). Nevertheless, that method is convex only when m is large enough. In Dong and Zeng (2013), the authors suggested the following model:

min_{u∈S̄(Ω)} E(u) := ∫_Ω ( log u + f/u ) dx + α ∫_Ω ( √(u/f) − 1 )² dx + λ J(u),  (28)

with the penalty parameter α > 0. Here S̄(Ω) := { v ∈ BV(Ω) : v ≥ 0 } is a closed and convex set, with the conventions log 0 = −∞ and log(1/0) = +∞ on S̄(Ω). They proved that if α ≥ 2√6/9, the model (28) is strictly convex.
(8) Exp-model
It was pointed out that the model (28) is mainly suitable for a large value of L. Lu et al. (2016) replace √(u/f) − 1 in the DZ model with √(u/f) − β1, yielding the following optimization problem:

min_{u ∈ R₊^d} ⟨ log u + f/u, 1 ⟩ + α ‖ √(u/f) − β1 ‖₂² + λ J(u).  (29)

The objective function of this model is strictly convex if αβ ≥ 2√6/9, where β is no less than 1 and varies with the level of the noise.
Furthermore, owing to the constraint u > 0 and the observation that exponent-like models usually provide better quality denoised images than their logarithm-like counterparts, the authors used the log transformation w = log u and proposed the following model, called the exp model:

min_{w∈BV(Ω)} λ ∫_Ω ( w + f e^{−w} + α ( √(e^w/f) − β )² ) dx + J(w).  (30)
Next, the log transformation u = log ũ is applied, resulting in

S_u^r(x) = ∫_Ω w_r(x, y) q̄(u)(y) dy.  (32)

Using (32), we can obtain the following TV minimization problem with local constraints:

min_{u∈BV(Ω)} J(u) = ∫_Ω |Du|,   s.t. S_u^r(x) ≤ C a.e. in Ω.  (34)
(9) Convex-model
Another work is the so-called discrete convex model (Zhao et al. 2014)

min_{w,u ∈ R^d} (1/2) ‖ w − μe ‖₂² + α₁ ‖ F w − u ‖₁ + α₂ J(u),  (35)

where μ is a constant, e is a vector whose components are all equal to one, F = diag(f) is the diagonal matrix of the noisy image f with main diagonal entries given by f_i, and w is expected to approximate the inverse of the multiplicative noise: w = 1/n. In fact, from f = un we obtain f w = u, and using the matrix form F = diag(f), we have F w = u. The data fidelity term is ‖ F w − u ‖₁, i.e., ‖ diag(f) w − u ‖₁, which is equivalent to ‖ f − un ‖₁. It replaces the nonconvex data fidelity term in the AA model and leads to an unconditionally convex problem. In addition to the fidelity term and the TV regularization term, the third term (1/2)‖ w − μe ‖₂² is introduced to avoid the trivial solution.
f_m ζ_m = u_m.  (36)

Then take the L¹ norm ∫ |f_m ζ_m − u_m| dx and the TV semi-norm ∫ |∇u_m| dx as the data fidelity term and the regularization term, respectively, and introduce the quadratic penalty term ∫ (ζ_m − u_m)² dx as the prior on the noise. Consequently, the proposed model is formulated as (Zhao et al. 2018)

(a) (ζ_m*, u_m*) = arg min_{ζ_m, u_m} ∫ |f_m ζ_m − u_m| dx + (α/2) ∫ (ζ_m − u_m)² dx + λ ∫ |∇u_m| dx,
(b) û = (u_m*)^m,  (37)

where m ∈ N with m ≥ 1, and α and λ are parameters that control the trade-off among the three terms in the objective function. The model is based on the following theorems.
The probability density function of ζ_m = n^{−1/m} is

p_{ζ_m}(y) = ( L^L m / Γ(L) ) y^{−mL−1} e^{−L y^{−m}}.  (38)
lim_{m→+∞} E[ (ζ_m − 1)² ] = 0,  (41)

and the distribution of ζ_m satisfies

D_KL( ζ_m ‖ N(μ_{m,L}, σ²_{m,L}) ) = o(1/m) + o(1/L²),  (42)

where μ_{m,L} = E[ζ_m] and σ²_{m,L} = E[ (ζ_m − E[ζ_m])² ].
The proposed model (37) is an unconditionally convex problem with a parameter m. It is noted that it reduces to the work in Zhao et al. (2014) when m = 1. However, it is known from Fig. 1 that the probability density function of ζ = 1/n (m = 1) is far from a Gaussian distribution, especially for small L. That is to say, the model of Zhao et al. (2014) cannot describe the prior of the noise very well, which
Fig. 1 Plots of the PDFs of ζ_m and N(μ_{m,L}, σ²_{m,L}) with different m and L. (a) m = 4, L = 3. (b) m = 4, L = 4
Fig. 2 Results of different methods when removing multiplicative noise with L = 4. From first to last: the original image, the noisy image, and the restored images of the AA, convex, DZ, m-V, and mth root transformation models, respectively
restricts its denoising performance. Comparatively, the new model is more flexible and extensible.
Moreover, it is worth noting that the data fidelity term in (37) is the L¹ norm ∫ |f_m ζ_m − u_m| dx. The main reason is that multiplicative noise mostly corrupts the image in the form of speckles or outliers, so the L¹ norm outperforms the L² norm and other convex representations as a data fidelity term (Zhao et al. 2014) (Fig. 2).
Variational Methods with Different Regularizers

The regularizer ρ(·) has been extensively studied, and there are a few examples widely used in image recovery techniques. The choice of this regularizer depends on the assumptions made about the underlying image structure. Popular choices include the total variation (TV) semi-norm for image gradient sparsity, the l₁ norm for coefficient sparsity in a wavelet basis or other dictionary, and Huber-like functions, which are akin to the l₁ norm but smooth. In general, image processors choose a regularizer according to two desiderata: one is that the objective function can be minimized efficiently, and the other is that the regularizer accurately reflects the image structure. The regularization term can be classified as TV, sparse, and nonconvex regularization.
TV Regularization
J (u) = |∇u| dx (43)
In the case of additive Gaussian noise, the minimizer û of the whole ROF function
326 X. Feng and X. Zhu
1 2
f − u dx + λJ (u) (44)
2
/
n /
n
ûi = fi (45)
i=1 i=1
The drawback of the model (44) consists of its staircasing effect so that meanwhile
various alternative regularizers were considered.
(1) Non-local TV
The examples given above are standard TV regularization. Non-local TV is a
promotion of NL-means. The idea of nonlocal means goes back to Buades et al.
(2005) and was incorporated into the variational framework in Gilboa et al. (2006)
and Gilboa and Osher (2009). We refer to these papers for further information on
NL-means. Based on some pre-computed weights w, the regularization term is given
by
2 1/
2
ρ (u) = |∇w u| dx, |∇w u| := u y − u (x) w x, y dy .
(46)
(2) Weberized TV
Inspiring from the Weberized TV regularization method, a nonconvex Weberized
TV regularization-based multiplicative noise removal model was proposed in Xiao
et al. (2010):
|∇u| f
û = arg min E (u) = dx + λ + log u dx (47)
u∈X u u
(3) Modified TV
Another variation of TV is proposed in Hu et al. (2013). When the gradient is small,
a log is multiplied, and when the gradient is large, an affine transformation is made.
Consider the following variational problem:
f
min E (u) := min ρ (Du) + λ log u + (48)
u∈X u∈X u
9 Models for Multiplicative Noise Removal 327
where b = M (1 + M) + log (1 + M), M is a positive constant, and its value is
determined by the size of an image.
(4) TGV
To overcome these staircasing effects, higher-order regularization-based models
were suggested in Chambolle and Lions (1997); Chan et al. (2000), and Li et al.
(2007). As an early work, an inf-convolution TV (ICTV) model was proposed
in Chambolle and Lions (1997), which takes the infimal convolution of TV and
second-order TV. Moreover, Li et al. (2007) proposed a denoising model, involving
a convex combination of TV and second-order TV as a regularizer. On the other
hand, as a generalization of the ICTV, the TGV regularizer was proposed in Bredies
et al. (2010). In particular, the second-order TGV is as follows:
T GV (u) = min
2
α1 ∇u − p + α0 ε p dx (50)
p∈P
T
where ε p = 12 ∇p + ∇p represents the distributional symmetrized deriva-
tive, and α1 , α0 > 0 are the weighted parameters that control the balance between
the first- and second-order terms. From the formulation (50) of TGV, it can be
interpreted that T GV 2 (u) can automatically find an appropriate balance between
the first- and the second-order derivative of u with respect to αi .
Sparse Regularization
Due to the sparse nature of the l1 norm, TV requires the image to have some sparsity
in the gradient domain. We know that the wavelet coefficients, ridgelet coefficients,
or curvelet coefficients of a sharp image are sparse. Based on these, Durand et al.
gave a hybrid method of curvelet field for removing multiplicative noise in Durand
et al. (2010).
# #2
# #
α̂ = arg min #W log f − α # + λ α 0 (51)
α∈R d 2
where W̃ is a left inverse of W , = diag {λi } is some weights. At last step, they
restored the image by exponential transformation and bias correction according to
the Gamma distribution, e.g.,
û = exp W̃ x̂ 1 + ψ1 (L) 2 (53)
2
+∞
where ψ1 (z) = dz d
log (z), (z) = 0 exp (−t) t z−1 dt. In equation (51),
they used curvelet and L2 loss function to preserve the information of edges.
Experimental results in Durand et al. (2010) show that the algorithm can obtain
better results than SO (Shi and Osher 2008), AA (Aubert and Aujol 2008), and BS
(Chesneau et al. 2010).
δ# #
#d − ω∗ #2 + d
min 2 TV (55)
d 2
At the last stage, they transform the result obtained from the second step via an
exponential function and bias correction. Let d ∗ be the solution to (55). d ∗ can be
seen as the estimator of ω∗ ; it is prone to bias, which leads to the fact that the
restored image is bias too. Using bias correction, we have (Figs. 3 and 4)
û = exp d ∗ 1 + ψ1 (L) 2 (56)
330 X. Feng and X. Zhu
Differ from this above approach, the following model adds a TV rule for the log
domain (Huang et al. 2012).
! "
f # #
D̂, âij , û = arg min λ log u + , 1 + γ #log u#T V
D,aij ,u>0 u
1 / # #
#Daij − Rij log u#2 +
/ # #
+ uij #aij #0 (57)
2
(i,j )∈P (i,j )∈P
√ √ 2
where λ, γ are positive regularization parameters, P = 1, 2, · · · , N − n + 1 .
u ∈ R N is the estimated image. The · T V term is defined, in the discrete
setting, by summing over the image domain the norm of ∇u, the classical
2-neighbors discrete gradient estimate. Ri,j ∈ R n×N is the matrix corresponding
to the extraction of the patch located in (i, j ), and ai,j ∈ R K is the sparse vector
the patch Ri,j log u with the dictionary D ∈ R
of coefficients to represent n×K .
The hidden parameters ui,j (i,j )∈P are determined by the optimization procedure
described in Elad and Aharon (2006).
Nonconvex Regularization
(8) Fractional-Order TV
Toenhance the edge-preserving ability of TV, several nonconvex TV regularizers
were proposed in Na et al. (2018); Krishnan and Fergus (2009), and Mei et al.
(2018), which have the form ρ (|∇u|) = ϕ (|∇u|) dx, where ϕ is the nonconvex
function defined as
9 Models for Multiplicative Noise Removal 331
ρs 2 1
ϕ (s) = s q 0 < q < 1 , , 1 + ρs 0 < q < 1 (58)
1 + ρs 2 ρ
⎧
⎪
⎪ α 1
K−1
⎪
⎨ ∇1 u = (−1)k Ckα ui−k,j
i,j
k=0
⎪
⎪ α 1
K−1
⎪
⎩ ∇2 u i,j = (−1)k Ckα ui,j −k
k=0
4
where K ≥ 3 is an integer constant, Ckα = (α + 1) (k + 1) (α − k + 1) ,
(·) is the gamma function, and u is an image of size M × N.
where ϕ (s) = αs (1 + αs), and λ is a constant parameter balancing the data term
and the regularization term, which are based on the following two points:
Firstly, they point out the advantages of the proposed regularizer in the sparse
framework. In fact, both the TV regularizer and the proposed regularizer can be
seen as sparse measurements on the gradient modulus of u. The TV regularizer
1n # #
|∇i u| is equals to the l1 -norm of the gradient modulus #|∇i u|# , while 1
i=1
1n
the proposed nonconvex regularizer ϕ (|∇i u|) which can be converted into
5
i=1
1n # #
|∇i u| |∇i u| + α −1 tends to be #|∇i u|#0 . Note that the parameter α used
i=1
here should be set large enough. The approximating l0 -norm is a much sparser
measurement than the l1 -norm. In sparse representation (Daubechies et al. 2010;
Candes et al. 2008), the sparse property of the approximating l0 -norm has been
widely used, which will lead to preserving edges of images.
Secondly, they present the underlying reason why the regularizer can protect
edges from oversmoothing. This is equivalent to finding out what a good function
ϕ (·) should be. On one hand, in order to protect edges from oversmoothing, ϕ (s)
should be imposed a “growth” condition of the type lim ϕ (s) = c (c is a constant)
s→+∞
so that the contribution of the regularizer would not penalize the formation of strong
gradients of u. In other words, the growth condition is used to protect large details
of images. On the other hand, at near zero points (s → 0+ ), ϕ (s) is preferable
to have the same behavior as the TV regularizer so that u can be better smoothed
in homogeneous regions of images. To make a balance between preserving edges
and smoothing homogeneous regions, necessarily ϕ (s) should have a nonconvex
shape like the type ϕ (s) = αs (1 + αs). Three different choices of ϕ (s) are shown
in Fig. 5. Therefore, the nonconvex sparse regularization is better than convex TV
because the TV regularizer does not satisfy the growth condition (Fig. 6).
N T GV (u) = min α1 ϕ ∇u − p + α0 ϕ ε p dx (62)
p∈P
where ϕ (x) = ρ1 log 1 + ρx with the parameter ρ > 0 controlling the
nonconvexity of the regularization term. This regularization takes advantage of both
nonconvex regularization and TGV regularization.
The authors propose the following model Na et al. (2018) for the removal of
heavy multiplicative noise, which utilizes an NTGV and λ : → R+ :
9 Models for Multiplicative Noise Removal 333
Fig. 5 Nonconvex and convex functions ϕ (·). The nonconvex function ϕ (s) = s /(1 + s) (resp.
ϕ (s) = 10s /(1 + 10s)) corresponds to α = 1 (resp. α = 10). Both of their limits are 1 as
s → +∞. The convex function ϕ (s) = s corresponds to the case of the TV regularizer
Fig. 6 Local enlarged denoising results. From left to right, the clean image, the denoising results
of the AA model, the BF model, and the nonconvex sparse regularizer model are listed
⎡ ⎞2 ⎤
⎛)
⎢ eu ⎥
min λ (x) ⎣u + f e−u + α ⎝ − β1⎠ ⎦ dx + NT GV (u), (63)
u∈X f
where ϕi (i = 0, 1) are the nonconvex log functions, ϕi (x) = ρ1i log 1 + ρi x ,
where ρi > 0 control the nonconvexity of regularization terms. The parameters
α > 0 and β ≥ 0 satisfy the condition αβ 4 ≤ 4096
27 to enforce the convexity of the
data fidelity term. X and P are the corresponding solution spaces.
Multitasks
One of the advantages of using the variational method to build a model is that it can
be easily extended to multitasking situations.
Root Transformation
f = (Au) n, (65)
⎛) ⎞2
f ⎝ Au
min EA (u) := log (Au) + dx + α − 1⎠ dx + λ |Du|
u∈S̄() Au f
(68)
where S̄ () := v ∈ BV () : v ≥ 0 .
√
Proposition 1. If α ≥ 2 9 6 , then the model (68) is convex.
Inspired by Dong and Zeng’s model (Dong and Zeng 2013), the following TGV
regularized model was presented in Shama et al. (2016)
⎛) ⎞2
f ⎝ H u
min E (u) = log H u + dx + β − 1⎠ dx + T GVα2 (u)
u∈LP () Hu f
√ (69)
where β ≥ 2 9 6 , p ∈ (1, ∞) and p ≤ d (d − 1), and d = 2 for the two-
dimensional case.
Fractional Transformation
Zhao et al. (2014) introduced a new convex total variation-based model for restoring
images contaminated with multiplicative noise and blur. The main notion is to
reformulate a blur and multiplicative noise equation such that both the image
variable and noise variable are decoupled. As a result, the concluding energy
function involves the total variational filter, the term of the variance of the inverse
of noise, the l1 -norm of the data fidelity term among the observed image, noise, and
image variables. The convex optimization model is given by
1# #
#w − μe#2 + α1 F w − H u
min 2 1 + α2 Du 2 (70)
w,u∈R d 2
where α1 and α2 are two positive regularization parameters to control the balance
between the three terms in the objective function, μ can be set to be the mean value
of w, and e is a vector with all entries equal to 1.
Nonlocal Methods
Indirect Method
In Huang et al. (2017), the Box-Cox transform is used to transform the random
variable into an approximately normal distribution, and then the similar block
BM3D method is used to denoise. We know Box-Cox transformation (Box and
Cox 1964) can effectively transform a random variable and force it to follow
normal distribution exactly or approximately if a suitable transformation parameter
is selected. Furthermore, BM3D (Dabov et al. 2007) proposed by Dabov et al. is a
rather novel method for additive Gaussian white noise removal. Therefore, inspired
by the work proposed in Makitalo and Foi (2010, 2014), the authors proposed to
transform the multiplicative noise removal to additive Gaussian noise removal by
applying the Box-Cox transformation in Huang et al. (2017), and the images are
finally recovered by an unbiased denoising algorithm. The Box-Cox transformation
parameter is determined through a maximum likelihood method. After applying the
Box-Cox transformation to the observed images, the BM3D method is utilized to
restore the transformed image, and an unbiased improvement is performed so that
the recovered image can finally be obtained.
Applying Box-Cox transformation with parameter λ to each pixel variable of
f =un to get
(un)λ − 1
f (λ) = (71)
λ
(L + λ) λ 1
f (λ) = u − +ε (73)
λLλ (L) λ
where the random variable ε ∼ N 0, σ 2 is based on the assumptions in the Box-
(L+λ) λ
Cox transformation. In (73), if we consider λLλ (L)
u − 1
λ as the original image
and f (λ) as the observed image, the additive noise removal methods can be applied
9 Models for Multiplicative Noise Removal 337
(L+λ) λ
to (73) and a denoised approximation w of λLλ (L)
u − λ1 can be recovered. Finally,
the reconstructed image can be obtained.
1
(L) (λw + 1) λ
û = L (74)
(L + λ)
Direct Method
If we use nonlocal mean directly, the key is how to correctly estimate similar
blocks under multiplicative noise. Now, to measure whether u1 = u2 by the noisy
observations f1 , f2 , Deledalle et al. (2009) suggest using an approximate
sDDT f1 , f2 := p f1 |u1 f1 u p f2 |u2 f2 u du (75)
S
as a measure of similarity. SDDT is equal to the NL-mean filter under additive noise.
When the above method is generalized to multiplicative noise, the conditional
density
pu1 (u) pu2 (u)
p u1 −u2 |(f1 ,f2 ) 0| f1 , f2 = p f1 |u1 f1 u p f2 |u2 f2 u du
S pf1 f1 pf2 f2
pu1 (u) pu2 (u) 1 f1 f2
pn pV2 du
S pf1 f1 pf2 f2 u2 1 u u
(77)
is approximated by SDDT .
For multiplicative Gamma noise,
L−1
(2L − 1) f1 f2
sDDT f1 , f2 = L 2L−1
(L)2 f1 + f2
(2L − 1) 1 1
=L L−1 (78)
(L) 2 f1 + f2 f1 f2
2+ f2 + f1
However, this measure does not seem to be optimal for multiplicative noise.
338 X. Feng and X. Zhu
ln fi = ln (ui ni ) = ln (ui ) + ln (ni ), i = 1, 2. (79)
6 78 9 6 78 9 6 78 9
f˜i ũi ñi
Lemma 1. For f1 , f2 > 0 with pfi fi and S = supp pũi , it holds that
p 0| ln f1 , ln f2
ũ1 −ũ2 | f˜1 ,f˜2
pũ1 (t) pũ2 (t) p f˜ ũ ln f1 t p f˜ ũ ln f2 t
1 1 2 2
= dt
S̃ pf˜1 ln f1 pf˜2 ln f2
= p u1 0| f1 , f2 (80)
u2 (f1 ,f2 )
L +∞
s f1 , f2 = L2L
f1 f2 1
exp −L f1 +f 2
du
(L)2 u2L+1 u
0
L (81)
(2L) (f1 f2 ) (2L)
= =
1
(L)2 (f1 +f2 )2L (L)2 2+ f1 + f2 L
f f 2 1
DNN Method
Indirect Method
The splitting method is used to solve the variational problem, and one of the
subproblems is replaced by DNN, which is essentially a plug-and-play model. Wang
et al. (2019) propose a model for general multiplicative noise removal in (82).
uk+1 , w k+1 = arg min E (u, w)
2
= af e−w + b2 f 2 e−2w + cw dx + θ1
2 u − ew − d k dx + λ (u) dx
(82)
2
where θ1 is the balance parameter. The second term u−e −d
w k dx makes
u = ew , and d k is the Bregman distance. The last term is the deep CNN denoiser
prior. Getting the solution of (82) directly is hard because of the term of (u).
Using the split method, we can import auxiliary variable z = u. Then (82) can be
transformed into (83).
uk+1 , w k+1 , zk+1 = arg min E (u, w, z)
⎧
⎪
⎨
−w b 2 −2w θ1 2
= af e + f e + cw dx + u − ew − d k dx
⎪
⎩ 2 2
⎫
⎪
⎬
θ2
+λ (z) dx + (z − u)2 dx (83)
2 ⎪
⎭
Each variable will be solved separately after dissociation; by using the alternating
optimization strategy, the optimization (84) can be divided into the following
subproblems on (u, w, z):
340 X. Feng and X. Zhu
⎧ ⎫
⎪
⎨ ⎪
⎬
1 2 −2w θ1 2
w = arg min E (w) =
k+1
f e − f e−w dx+ u−e −d
w k
dx
w ⎪
⎩ 2 2 ⎪
⎭
⎧ ⎫ (85)
⎪
⎨ ⎪
⎬
θ2
zk+1 = arg min E (z) = λ (z) dx + (z − u)2 dx (86)
z ⎪
⎩ 2 ⎪
⎭
⎧ ⎫
⎪
⎨ 2 ⎪
⎬
θ1 θ2
uk+1 = arg min E (u) = u − ew − d k dx + (z − u)2 dx
u ⎪
⎩ 2 2 ⎪
⎭
(87)
For calculating w, we can deduce the corresponding Euler-Lagrange equation:
f e−w − f 2 e−2w − θ1 uk+1 − ew − d k = 0 (88)
According to Bayesian theory, (91) is the Gaussian denoiser and the noise
variance is λ θ2 . In Wang et al. (2019), the authors use the CNN Gaussian denoiser
for solving (91) by considering the performance and discriminative image prior
modeling. The reason for using CNN is that it has achieved great success in
Gaussian denoising and better performance (such as PSNR results outperforms
BM3D’s (Dabov et al. 2007)) than a model-based method. By incorporating CNN
Gaussian denoiser into the model, we need not retrain the multiplicative noise
removal model for different types of noise. We can deal with different types of noise
only by changing the data fidelity.
9 Models for Multiplicative Noise Removal 341
k+1
The Bregman distance can be expressed as d k+1 = d k + ew − uk+1 .
Taking into account the above equations, we obtain the complete iteration used
in the algorithm for multiplicative noise removal (Fig. 7).
1. Initialization: u0 = f, d 0 = 0, w 0 = log u0 k = 0
2. Repeat
3. Compute w k using Eq. (89);
4. Compute zk using Eq. (91);
5. Compute uk using Eq. (93);
6. Compute Bergman parameter d
k+1
using d k+1 = d k + ew − uk+1 ;
1. k = k + 1;
2. Until k achieved the presetting value.
3⁕3 Conv ReLu
3⁕3 Conv
5⁕5 Conv ReLu
5 layers
Direct Method
F = UN (94)
1
p (N) = LL N L−1 e−LN (95)
(L)
W H # #2
1 / /# #
#ϕ F w,h − U w,h #
LE ϕE = # # (96)
WH 2
w=1 h=1
/ H
W / 2 2
LT V = Û w+1,h − Û w,h + Û w,h+1 − Û w,h (97)
w=1 h=1
L = L E + λ T V LT V (98)
Conclusion
References
Abolhassani, M., Rostami, Y.: Speckle noise reduction by division and digital processing of a
hologram. Optik 123(10), 937–939 (2012)
Aubert, G., Aujol, J.-F.: A variational approach to removing multiplicative noise. SIAM J. Appl.
Math. 68(4), 925–946 (2008)
Box, G.E.P., Cox, D.R.: An analysis of transformations. J. R. Stat. Soc.: Ser. B (Methodol.) 26(2),
211–243 (1964)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3), 492–
526 (2010)
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 2,
pp. 60–65. IEEE (2005)
Burger, H.C., Schuler, C.J., Harmeling, S.: Image denoising: can plain neural networks compete
with BM3D? In: 2012 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 2392–2399. IEEE (2012)
Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted l1 minimization. J.
Fourier Anal. Appl. 14(5–6), 877–905 (2008)
Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems.
Numer. Math. 76(2), 167–188 (1997)
344 X. Feng and X. Zhu
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer.
25, 161–319 (2016)
Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J.
Sci. Comput. 22(2), 503–516 (2000)
Chatterjee, P., Milanfar, P.: Is denoising dead? IEEE Trans. Image Process. 19(4), 895–911 (2009)
Chen, D.-Q., Cheng, L.-Z.: Spatially adapted total variation model to remove multiplicative noise.
IEEE Trans. Image Process. 21(4), 1650–1662 (2011)
Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: a flexible framework for fast
and effective image restoration. IEEE Trans. Pattern Anal. Mach Intell. 39(6), 1256–1272
(2016)
Chesneau, C., Fadili, J., Starck, J.-L.: Stein block thresholding for image denoising. Appl. Comput.
Harmon. Anal. 28(1), 67–88 (2010)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-
domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares
minimization for sparse recovery. Commun. Pure Appl. Math.: J. Issued Courant Inst. Math.
Sci. 63(1), 1–38 (2010)
Deledalle, C.-A., Denis, L., Tupin, F.: Iterative weighted maximum likelihood denoising with
probabilistic patch-based weights. IEEE Trans. Image Process. 18(12), 2661–2672 (2009)
Denis, L., Tupin, F., Darbon, J., Sigelle, M.: SAR image regularization with fast approximate
discrete minimization. IEEE Trans. Image Process. 18(7), 1588–1600 (2009)
Dong, Y., Zeng, T.: A convex variational model for restoring blurred images with multiplicative
noise. SIAM J. Imaging Sci. 6(3), 1598–1625 (2013)
Durand, S., Fadili, J., Nikolova, M.: Multiplicative noise removal using L1 fidelity on frame
coefficients. J. Math. Imaging Vis. 36(3), 201–226 (2010)
Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned
dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)
Gilboa, G., Darbon, J., Osher, S., Chan, T.: Nonlocal convex functionals for image regularization.
UCLA CAM-report, pp. 06–57 (2006)
Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Multiscale Model.
Simul. 7(3), 1005–1028 (2009)
Han, Y., Feng, X.-C., Baciu, G., Wang, W.-W.: Nonconvex sparse regularizer based speckle noise
removal. Pattern Recogn. 46(3), 989–1001 (2013)
Hao, Y., Feng, X., Xu, J.: Multiplicative noise removal via sparse and redundant representations
over learned dictionaries and total variation. Signal Process. 92(6), 1536–1549 (2012)
Hoekman, D.H.: Speckle ensemble statistics of logarithmically scaled data (radar). IEEE Trans.
Geosci. Remote Sens. 29(1), 180–182 (1991)
Hu, X., Wu, Y.H., Li, L.: Analysis of a new variational model for image multiplicative denoising.
J. Inequal. Appl. 2013(1), 568 (2013)
Huang, Y.-M., Moisan, L., Ng, M.K., Zeng, T.: Multiplicative noise removal via a learned
dictionary. IEEE Trans. Image Process. 21(11), 4534–4543 (2012)
Huang, Y.-M., Yan, H.-Y., Zeng, T.: Multiplicative noise removal based on unbiased box-cox
transformation. Commun. Comput. Phys. 22(3), 803–828 (2017)
Jin, Z., Yang, X.: Analysis of a new variational model for multiplicative noise removal. J. Math.
Anal. Appl. 362(2), 415–426 (2010)
Kang, M., Yun, S., Woo, H.: Two-level convex relaxed variational model for multiplicative
denoising. SIAM J. Imaging Sci. 6(2), 875–903 (2013)
Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-Laplacian priors. In: Advances in
Neural Information Processing Systems, pp. 1033–1041 (2009)
Laus, F., Steidl, G.: Multivariate myriad filters based on parameter estimation of Student-t
distributions. SIAM J. Imaging Sci. 12(4), 1864–1904 (2019)
Le, T., Vese, L.: Additive and multiplicative piecewise-smooth segmentation models in a
variational level set approach. UCLA CAM Report 03-52, University of California at Los
Angeles, Los Angeles (2003)
9 Models for Multiplicative Noise Removal 345
Lebrun, M., Colom, M., Buades, A., Morel, J.-M.: Secrets of image denoising cuisine. Acta
Numer. 21, 475 (2012)
Li, F., Shen, C., Fan, J., Shen, C.: Image restoration combining a total variational filter and a
fourth-order filter. J. Vis. Commun. Image Represent. 18(4), 322–330 (2007)
Lu, J., Shen, L., Xu, C., Xu, Y.: Multiplicative noise removal in imaging: an exp-model and its
fixed-point proximity algorithm. Appl. Comput. Harmon. Anal. 41(2), 518–539 (2016)
Makitalo, M., Foi, A.: Optimal inversion of the Anscombe transformation in low-count Poisson
image denoising. IEEE Trans. Image Process. 20(1), 99–109 (2010)
Makitalo, M., Foi, A.: Noise parameter mismatch in variance stabilization, with an application to
Poisson–Gaussian noise estimation. IEEE Trans. Image Process. 23(12), 5348–5359 (2014)
Mei, J.-J., Dong, Y., Huang, T.-Z., Yin, W.: Cauchy noise removal by nonconvex ADMM with
convergence guarantees. J. Sci. Comput. 74(2), 743–766 (2018)
Na, H., Kang, M., Jung, M., Kang, M.: Nonconvex TGV regularization model for multiplicative
noise removal with spatially varying parameters. Inverse Probl. Imaging 13(1), 117 (2018)
Na, H., Kang, M., Jung, M., Kang, M.: An exp model with spatially adaptive regularization
parameters for multiplicative noise removal. J. Sci. Comput. 75(1), 478–509 (2018)
Nikolova, M., Ng, M.K., Tam, C.-P.: Fast nonconvex nonsmooth minimization methods for image
restoration and reconstruction. IEEE Trans. Image Process. 19(12), 3073–3088 (2010)
Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth
nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8(1), 331–372 (2015)
Rudin, L., Lions, P.-L., Osher, S.: Multiplicative denoising and deblurring: theory and algorithms.
In: Geometric Level Set Methods in Imaging, Vision, and Graphics, pp. 103–119. Springer,
New York (2003)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D: Nonlinear Phenom. 60(1–4), 259–268 (1992)
Setzer, S., Steidl, G., Teuber, T.: Deblurring Poissonian images by split Bregman techniques. J.
Vis. Commun. Image Represent. 21(3), 193–199 (2010)
Shama, M.-G., Huang, T.-Z., Liu, J., Wang, S.: A convex total generalized variation regularized
model for multiplicative noise and blur removal. Appl. Math. Comput. 276, 109–121 (2016)
Shao, L., Yan, R., Li, X., Liu, Y.: From heuristic optimization to dictionary learning: a review
and comprehensive comparison of image denoising algorithms. IEEE Trans. Cybern. 44(7),
1001–1013 (2013)
Shi, J., Osher, S.: A nonlinear inverse scale space method for a convex multiplicative noise model.
SIAM J. Imaging Sci. 1(3), 294–321 (2008)
Singh, P., Jain, L.: A review on denoising of images under multiplicative noise. Int. Res. J. Eng.
Technol. (IRJET) 03(04), 574–579 (2016)
Steidl, G., Teuber, T.: Removing multiplicative noise by Douglas-Rachford splitting methods. J.
Math. Imaging Vis. 36(2), 168–184 (2010)
Teuber, T., Lang, A.: A new similarity measure for nonlocal filtering in the presence of
multiplicative noise. Comput. Stat. Data Anal. 56(12), 3821–3842 (2012)
Tian, D., Du, Y., Chen, D.: An adaptive fractional-order variation method for multiplicative noise
removal. J. Inf. Sci. Eng. 32(3), 747–762 (2016)
Ulaby, F., Dobson, M.C., Álvarez-Pérez, J.L.: Handbook of Radar Scattering Statistics for Terrain.
Artech House, Norwood (2019)
Ullah, A., Chen, W., Khan, M.A.: A new variational approach for restoring images with
multiplicative noise. Comput. Math. Appl. 71(10), 2034–2050 (2016)
Ullah, A., Chen, W., Khan, M.A., Sun, H.: A new variational approach for multiplicative noise and
blur removal. PloS One 12(1), e0161787 (2017)
Wang, P., Zhang, H., Patel, V.M.: SAR image despeckling using a convolutional neural network.
IEEE Signal Process. Lett. 24(12), 1763–1767 (2017)
Wang, G., Pan, Z., Zhang, Z.: Deep CNN Denoiser prior for multiplicative noise removal.
Multimed. Tools Appl. 78(20), 29007–29019 (2019)
Xiao, L., Huang, L.-L., Wei, Z.-H.: A Weberized total variation regularization-based image
multiplicative noise removal algorithm. EURASIP J. Adv. Signal Process. 2010, 1–15 (2010)
346 X. Feng and X. Zhu
Xie, H., Pierce, L.E., Ulaby, F.T.: Statistical properties of logarithmically transformed speckle.
IEEE Trans. Geosci. Remote Sens. 40(3), 721–727 (2002)
Yun, S., Woo, H.: A new multiplicative denoising variational model based on mth root
transformation. IEEE Trans. Image Process. 21(5), 2523–2533 (2012)
Zhao, X.-L., Wang, F., Ng, M.K.: A new convex optimization model for multiplicative noise and
blur removal. SIAM J. Imaging Sci. 7(1), 456–475 (2014)
Zhao, C.-P., Feng, X.-C., Jia, X.-X., He, R.-Q., Xu, C.: Root-transformation based multiplicative
denoising model and its statistical analysis. Neurocomputing 275, 2666–2680 (2018)
Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: residual learning
of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)
Recent Approaches to Metal Artifact
Reduction in X-Ray CT Imaging 10
Soomin Jeon and Chang-Ock Lee
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Background: CT Image Formation and Metal Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Normalized Metal Artifact Reduction (NMAR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Surgery-Based Metal Artifact Reduction (SMAR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Convolutional Neural Network-Based MAR (CNN-MAR) . . . . . . . . . . . . . . . . . . . . . . . . . 357
Industrial Application: 3D Cone Beam CT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Simulations and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Simulation Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
NMAR vs. SMAR: Patient Image Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
SMAR vs. CNN-MAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
NMAR vs. SMAR for 3D CBCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
Abstract
S. Jeon
Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston,
MA, USA
e-mail: [email protected]
C.-O. Lee ()
Department of Mathematical Sciences, KAIST, Daejeon, Republic of Korea
e-mail: [email protected]
Keywords
Introduction
X-ray computed tomography (CT) is one of the most widely used tomographic
imaging techniques for non-destructive visualization of structures inside objects.
X-ray CT uses radiation from X-rays whose energy is absorbed according to the
attenuation coefficients of the tissues in its path (Deans 2007). The cross-sectional
image is reconstructed slice by slice from the measured X-ray data at different
angles around the scanned object.
X-ray CT produces detailed, high-quality images, and its applicability is promis-
ing, but there are various artifacts that severely degrade the quality of CT images:
beam hardening artifacts, scattering artifacts, and artifacts due to partial volume
effects, photon starvation, undersampling, etc. (Barrett and Keat 2004). Artifacts in
CT images are defined as system-induced discrepancies between the reconstructed
CT image and the ground truth. These artifacts can be classified according to their
causes: (i) physics-based artifacts arising from the physical processes involved
during CT data acquisition; (ii) patient-based artifacts caused by factors such as
patient movement or the presence of metallic objects in or on the patient; (iii)
scanner-based artifacts due to defects in certain scanner functions; and (iv) others
such as helical and multi-section artifacts. Among the various causes, implanted
metals such as chest screws, dental fillings, and hip prostheses bring the most serious
artifacts in CT images. They can also be classified according to their shape as streak
artifacts, ring artifacts, cupping artifacts, etc.
The term metal artifact is a generic term for all artifacts caused by metallic
objects such as dental implants and surgical clips which lead to various effects
such as beam hardening, photon starvation, scattering, and noise increases (Boas
and Fleischman 2012). Metal artifacts spread over the entire image in a bright and
shadowy crown shape, damaging the quality of CT images and preventing accurate
diagnoses. For this reason, as CT imaging becomes more popular, the importance
of metal artifact reduction (MAR) technique increases.
Various studies have been attempted to understand metal artifacts, and several
approaches have been proposed to reduce them. Existing MAR methods can
be roughly classified into three categories: inpainting methods in the projection
domain, iterative reconstruction methods, and other methods. For methods based
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 349
the sinogram. The sinogram completion process is iteratively performed using the
basic principles of CT image reconstruction to remove the metal effect from the
sinogram.
Meanwhile, attempts are underway to exploit deep learning in almost all fields
of science and technology. In particular, the concept of deep learning was also
introduced to MAR in Ghani and Karl (2020), Gjesteby et al. (2017), Hwang et al.
(2018), and Zhang and Yu (2018). Among these, the convolutional neural network
(CNN)-based MAR (CNN-MAR) method (Zhang and Yu 2018) is best known as
a general open framework. The CNN-MAR method consists of two phases: CNN
phase and surgery phase with a prior image. In the CNN phase, CNN is used as
an information fusion tool to produce a reduced artifact image by combining the
uncorrected CT image and two pre-corrected ones from some model-based MAR
methods as the input data of the neural network. The surgery phase further reduces
the remaining artifacts by adding seamless surgery process with a prior image based
on tissue classification.
This work introduces the NMAR algorithm (Meyer et al. 2010), the surgery-
based MAR (SMAR) algorithm (Jeon and Lee 2018), and the CNN-MAR meth-
ods (Zhang and Yu 2018). It also reviews a methodology for reducing metal artifacts
in three-dimensional industrial cone beam CT systems (Jeon et al. 2021).
dI
(x) = −fE (x)I (x), x = s + t⊥ , = (cos θ, sin θ ),
dt
where I is the intensity of X-ray, s the distance along the detector, and t the distance
along the path of X-ray; see Fig. 1. Solving the above equation gives the formula for
I at the detector:
Fig. 1 Illustration of the Radon transform and the sinogram. (Reprinted from Jeon and Lee (2018)
with permission from IOS Press)
where I0 (E) is the initial intensity of the X-ray with energy level E. Here, the
projection data Rθ fE (s) is the Radon transform of fE defined by
Rθ fE (s) := fE (x)δ( · x − s) dx,
R2
where Emin and Emax are the minimum and maximum energy levels of the X-ray,
respectively. Then the sinogram PfE is given by
PfE = Pθ fE θ ,
for
Iθ (s)
Pθ fE (s) = − ln , (3)
I0
E
where I0 = Emin max
I0 (E) dE. Then, the CT image is reconstructed from the
sinogram using (1) with PfE instead of RfE . The CT image reconstruction is shown
in Fig. 2.
A smooth function is called a Schwartz function if all its derivatives including
itself decay at infinity faster than the inverse of any polynomial. A function g(θ, s)
defined on [0, 2π ) × R is said to satisfy the homogeneous polynomial condition if
for k = 1, 2, . . ., the integral
g(θ, s)s k ds (4)
R
Fig. 2 Illustration of the reconstruction of CT image from sinogram with inverse Radon transform
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 353
Methods
This method gives excellent results in the absence of high contrast. If bones
or metals are present, normalization with thickness cannot produce a very flat
sinogram, resulting in new artifacts. To extend this idea to objects composed
of bones, metals, and other high-contrast materials, NMAR uses a prior image
that takes these materials into account. Through denormalization, NMAR restores
traces of high-contrast objects buried in metal shadows. This is because the shape
information of these objects is contained in the sinogram of the prior image. NMAR
ensures a certain level of smoothness at the boundaries of the metal traces in the
corrected sinogram and recovers traces of objects contained in the prior image.
NMAR Algorithm
Figure 3 provides a diagram of the different steps of NMAR algorithm. An
uncorrected image is reconstructed from the original sinogram p. The metal image
is then obtained by thresholding. The prior image f prior is created by segmenting
soft tissues and bones. Forward projection produces the corresponding sinograms.
The original sinogram p is then normalized by division by pprior projected from
f prior . The division is only performed on pixels where the divisor is greater than a
Fig. 3 Scheme of NMAR algorithm – from the original sinogram, an uncorrected image is
reconstructed. By thresholding, the metal image and the prior image are obtained. Forward
projection yields the corresponding sinograms. The normalized sinogram is then obtained by
dividing the original sinogram by the sinogram of the prior image. The metal projections determine
where data in the normalized sinogram are replaced by interpolation. The interpolated and
normalized sinogram is denormalized by multiplying it with the sinogram of the prior image.
Reconstruction yields the corrected image. (Reprinted from Meyer et al. (2010) with permission
from John Wiley and Sons)
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 355
small positive value to avoid division by zero. A simple interpolation operation Mint
is performed on the normalized sinogram pnorm to obtain a sinogram with metal
traces removed. Subsequently, the corrected sinogram pcorr is obtained through
denormalization, which multiplies Mint pnorm by pprior :
pprior = Rf prior ,
p
pnorm = ,
pprior
pcorr = pprior Mint pnorm .
In this step, the structure information from the prior image is brought back into
the metal trace because traces of high-contrast objects are included in the sinogram
of the prior image. Normalization and multiplication procedures ensure that there
is no difference between the original sinogram and corrected sinogram, except
for metal trace. Hence, only sinogram values around metal traces are needed for
normalization and denormalization. After reconstruction, the metal is inserted back
into the corrected image.
An important step in NMAR algorithm is finding a good prior image. It should
be modeled as close as possible to the uncorrected image, but should not contain
artifacts. To achieve this, it is necessary to identify air regions, soft tissue regions,
and bone regions. After smoothing the image with Gaussian, simple thresholding
can be applied to segment air, soft tissue, and bone. It is also useful to smooth the
steak structure as described in Müller and Buzug (2009) to reduce streak artifacts
before segmentation. See Meyer et al. (2010) for more details.
Even though NMAR algorithm removes metal artifacts very well, it still generates
streaking artifacts because the corrected sinogram is not consistent. Recently, a
new metal artifact reduction algorithm called SMAR, based on sinogram surgery,
was proposed to reduce metal artifacts by calibrating the sinogram to be nearly
consistent (Jeon and Lee 2018).
SMAR algorithm consists of two steps: a preprocessing step and an iterative
reconstruction step. In the preprocessing step, the metal part from the given CT
image is extracted, and then its metal trace is determined by the forward projection
as in the NMAR algorithm. In the iterative reconstruction step, in order to moderate
metal artifacts, several processes are performed such as average fill-in, sinogram
surgery, and reconstruction from the updated sinogram. Detailed descriptions of
each of these are given below.
Preprocessing Step
(1) Metal extraction: The metal region M can be extracted by simple thresholding.
356 S. Jeon and C.-O. Lee
(2) Surgery region designation: Once the metal region M has been extracted, its
forward projection using the Radon transform R establishes the surgery region
Mproj = supp{RχM },
This region coincides with the corrupted part of the sinogram due to metal.
(1) Average fill-in: For the reconstructed CT image from the previous step, f (n−1) ,
a connected region C is segmented which is surrounding M. Using v (n−1) , the
average of the attenuation coefficients f (n−1) of the region C, the average fill-in
step is evaluated as
(n−1) = Rf(n−1)
p
p(n) = p
(n−1) χMproj + p(0) (1 − χMproj ).
The resulting image has less streak artifacts compared to f (n−1) . Here, other
sophisticated reconstruction methods can be also applied for the image quality
improvement.
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 357
As the iterative reconstruction step is repeated, streak artifacts are reduced gradu-
ally because the missing data is complementarily replaced for both the sinogram and
the reconstructed CT image. The iterative reconstruction step is terminated when
the relative difference between the sinogram data becomes less than the tolerance
level. The convergence of the SMAR algorithm is given empirically in the Appendix
of Jeon and Lee (2018).
Figure 4 provides a schematic diagram of SMAR algorithm.
and two pre-corrected auxiliary images. The loss function Loss : {U, V, W } → R
is defined by
1
N
Loss = CL (un , W ) − vn 2F ,
N
n=1
where · F denotes the Frobenius norm and N is the number of input data,
U = {u1 , · · · , uN } the input data where each ui consists of a raw image and
two auxiliary images, V = {v1 , · · · , vN } a target data of reference images, and
CL a CNN containing a parameter set W . To optimize the loss function, stochastic
gradient descent with momentum (SGDM) is used. SGDM is a variant of stochastic
gradient descent (SGD) by adding momentum to accelerate the SGD algorithm
which updates parameters randomly in order to avoid the situation “trap in local
minima.” SGD is based on traditional gradient descent (GD) algorithm. SGDM is
formulated as
C0 (u) = u0 ,
Cm (u) = ReLU(Wm ∗ Cm−1 (u) + bm ), m = 1, . . . , L − 1,
CL (u) = WL ∗ CL−1 (u) + bL ,
where ∗ stands for convolution, Wm is the m-th kernel, and bm is the bias in the m-
th layer. Each layer consists of 32 channels. The last layer generates an image that
is close to the target. The convolutional kernel is 3 × 3 in each layer. Zero padding
is used in each layer to maintain the size of output data as the same as input data.
Whole architecture of the CNN is shown in Fig. 5.
CNN-MAR Method
The CNN-MAR algorithm consists of the following five steps:
Fig. 5 Architecture of the CNN for metal artifact reduction. (Reprinted from Zhang and Yu (2018)
with permission from IEEE)
f CNN = CL (uinput ) .
Here, the parameters in Cm have been found in advance from the CNN training. In
implementation, L = 5 was used.
Even after the CNN processing, f CNN still has considerable artifacts. Therefore,
additional process is applied to reduce these artifacts; a prior image is generated
from f CNN by the tissue processing in Zhang and Yu (2018). First, because
the water-equivalent tissues have similar attenuations and are accounted for a
dominant proportion in a patient, the pixels corresponding to these tissues are
assigned uniform values. For simple calculation, it is assumed that f CNN consists
of bone, water, and air. Using the k-means clustering on f CNN , two thresholds are
determined; one threshold is the bone-water threshold, and the other is the water-air
threshold. Then, a binary image B is obtained with the water region set to 1 and the
rest set to 0.
To replace the metal trace of the sinogram, a distance image D is introduced,
which is made from the binary image B as follows. The pixel value of D is set to
the distance between the pixel and its nearest 0 pixel if it is not greater than 5 and set
to 5 if it is greater than 5. Hence, in the image D = {Di }, most of the water pixels
have the value 5, and there are 5-pixel transition regions, while the other pixels are
zero. We compute the weighted average of the water pixel values:
360 S. Jeon and C.-O. Lee
Di f CNN
f¯water = i
i
.
i Di
prior
This prior image f prior = {fi } is smoother than f CNN . Using the prior image,
sinogram correction and image reconstruction are performed as follows. First, let the
metal trace occupy from the (jn + 1)-th pixel to the (jn + n )-th pixel in the n-th
projection view according to θ . Then the metal trace is replaced by the following:
prior CNN − R f prior
Rθ fjCNN
n +n +1 − R θ fj n + n +1 − Rθ fj n θ j n
corr
pθ,k = (kn − jn )
n
n + 1
prior prior
+ Rθ fkn + Rθ fjCNN
n
− R θ fjn , jn ≤ kn ≤ jn + n + 1
of pθ is kept in Rθ f
and the other part corr CNN . This produces a new projection data
pcorr = pθcorr θ , which connects the correction of the metal trace to the surrounding
unaffected projection data. It is kind of a seamless surgery of sinogram. Finally, a
corrected CT image is reconstructed by the FBP algorithm for pcorr , and metals are
inserted back into the corrected image. Note that this seamless surgery can also be
used for the SMAR algorithm.
There are two key factors for the success of the CNN-MAR method: selection
of the appropriate pre-corrected auxiliary CT images and preparation of training
data. The first factor provides information to help CNN distinguish between tissue
structures and artifacts. The second factor ensures the generality of the trained CNN
by including as many kinds of metal artifact cases as possible.
Section “Surgery-Based Metal Artifact Reduction (SMAR)” to 3D, and CAD data
is adopted as a shape prior information. In the SMAR algorithm, it is essential to
accurately segment the average fill-in region for the success of the algorithm, and
for this purpose, a registration algorithm is proposed to register the CAD data to the
reconstructed CT volume.
Data Preparation
First, using the given CAD data, a binary volume data VCAD such as
⎧
⎨1, x ∈ inside of the object
VCAD (x) =
⎩0, x ∈ outside of the object.
where the moments Txx , Tyy , Tzz and the products of moment Txy , Txz , Tyz are
given by
Txx = (x ) dV , Tyy =
2
(y ) dV , Tzz =
2
(z2 ) dV ,
V V V
and
Txy = Tyx = − xy dV , Tyz = Tzy = − yz dV , Txz = Izx = − zx dV .
V V V
Since the moment tensor T is symmetric, by the principal axis theorem, the
eigenvectors of T are the principal axes of V .
Let v1 , v2 , v3 be unit eigenvectors of T . Then corresponding eigenvalues
λ1 , λ2 , λ3 satisfy the relation
T vi = λi vi , i = 1, 2, 3.
For the binary volumes V1 and V2 , the ratio of object sizes can be obtained by using
the relationship of the eigenvalues. If r denotes the ratio of sizes between V1 and
V2 , then
⎧
⎨T V1 = 2 ) dV
xx V1 (x
⎩Txx
V2
= 2 r 3 dV
V2 (rx)
V1 V2
implies that r 5 Txx = Txx . Therefore, the scaling constant r becomes
5 λV2
r= .
λV1
Using matrices of principal axes, QCAD and QCT , and scaling ratio r, the
transformation matrix Q to align VCAD to VCT is expressed as
Q = rQCT Q−1
CAD = rQCT QCAD .
T
φ(x, y, z)
⎡ ⎤
(x−a)(n21 (1− cos θ )+ cos θ )+(y−b)(n1 n2 (1− cos θ )−n3 sin θ )+(z−c)(n1 n3 (1− cos θ )+n2 sin θ )
⎢ ⎥
= φ0 ⎢ 2 ⎥
⎣(x−a)(n1 n2 (1− cos θ )+n3 sin θ )+(y−b)(n2 (1− cos θ )+ cos θ )+(z−c)(n2 n3 (1− cos θ )−n1 sin θ )⎦ ,
2
(x−a)(n1 n3 (1− cos θ )−n2 sin θ )+(y−b)(n2 n3 (1− cos θ )+n1 sin θ )+(z−c)(n3 (1− cos θ )+ cos θ )
(6)
Fig. 6 Three-dimensional rotation. (Reprinted from Jeon et al. (2021) with permission from
Taylor & Francis)
axis is aligned with z-axis and (n1 , n2 , n3 )(0) = (0, 0, 1) is set. Finally, the angle
between the center slices of VCT and Valign is computed and set as θ (0) . We generate
particles in the proper intervals centered at (a, b, c)(0) , (n1 , n2 , n3 )(0) , and θ (0) .
Which variable is updated first depends on how much the updated value affects
other variables: updates in order of rotation angle, translation, and rotation axis.
The three-dimensional computation is highly time-consuming, and the most
time-consuming part is the PSO process of finding parameters that minimize (5).
As the number of particles increases, computation time is linearly increasing.
Therefore, a two-resolution approach can be adopted to reduce the computation
time of the registration process. For the down-sampled data, less particles can be
used. The parameter obtained from the down-sampled data is used as an initial for
the registration of the original sized data.
Simulation Conditions
In the simulation study, to generate the polychromatic sinogram, the parallel beam
were modeled with 512 channels per detector and 1800 views per half rotation.
Seven discrete energy bins (10, 20, 30, 40, 60, 80, 100 keV) were defined (Table 1),
and all X-ray coefficients were obtained from the National Institute of Standards
and Technology (NIST) database (Hubbell and Seltzer 2004).
For a quantitative analysis, the metal effect-free CT images f were used as
references, and the performance of MAR algorithms are measured with three error
measurements: the relative l2 error, the relative l∞ error, and peak signal-to-noise
ratio (PSNR). PSNR is defined as
peak value
PSNR = 20 log ,
RMSE
where peak value is the range of window and RMSE is the root mean squared error.
A simple notation is used for the relative error between a CT image f and the
reference CT image f ,
f − f ∗
f ∗, := , ∗ = 2, ∞,
f ∗
where
f 2 := |fi |2 , f ∞ := max |fi |.
i
i
The iteration process of the SMAR algorithm was terminated when the relative
difference of the sinogram data in the sinogram surgery region became less than
Tol = 10−4 . In all reconstructed images, the window level with a width of 1000
centered at 0 (C/W = 0/1000 (HU)) is used.
In this section, the numerical results in Jeon and Lee (2018) are presented.
To compare NMAR and SMAR algorithms, patient images were tested (Fig. 8).
Three cross-sectional images (pelvis, chest, and dental) were selected from a CT
dataset acquired in a dosimetry study of 68 Ga- NOTA-RGD PET/CT (Kim et al.
2012). All study procedures were approved by the Institutional Review Board
of Seoul National University Hospital, Seoul, Korea. Simulated metallic objects
were inserted into the patient images while assuming that the metallic objects are
titanium. The X-ray energy spectrum in Table 1 was used. Because it is difficult
366
Table 1 X-ray intensity and nominal Hounsfield units (HU) and linear attenuation coefficients for materials used in simulations
Air, dry [/mm] Adipose [/mm] Tissue [/mm] Bone [/mm] Iron [/mm] Titanium [/mm]
Energy [keV] X-ray intensity (−1000 HU) (−100 HU) Water [/mm] (0 HU) (150 HU) (1000 HU) (1000 HU ≤) (1000 HU ≤)
10 0.0000 6.169E-04 0.3037 0.5329 0.6455 4.989 134.26 49.815
20 1.604 9.372E-05 0.0528 0.08096 0.09876 0.7002 20.21 7.1325
30 26.93 4.263E-05 0.02847 0.03756 0.04548 0.2329 6.435 2.2374
40 49.12 2.994E-05 0.02227 0.02683 0.03226 0.1165 2.856 0.9963
60 42.78 2.506E-05 0.01835 0.02059 0.02458 0.05509 0.9483 0.3447
80 46.31 2.002E-05 0.01673 0.01837 0.02188 0.03901 0.4684 0.1823
100 14.00 1.857E-05 0.01569 0.01707 0.02032 0.03246 0.2925 0.1224
S. Jeon and C.-O. Lee
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 367
Fig. 8 Patient images (pelvis, chest, and dental). (Reprinted from Jeon and Lee (2018) with
permission from IOS Press)
Fig. 9 Patient pelvis experiment. Uncorrected CT image (left) and results of NMAR (middle) and
SMAR (right): C/W = 0/1000 (HU). (Reprinted from Jeon and Lee (2018) with permission from
IOS Press)
to assign energy level-varying X-ray attenuation coefficients for each tissue type in
patient images, it is assumed that only the X-ray attenuation coefficient of a metallic
object depends on the X-ray energy level.
Figure 9 shows the simulation result for the pelvis of a patient with metallic
hips. In the uncorrected CT image, there are streak artifacts between the metallic
hips, and they corrupt the anatomical structure. NMAR reduces most of the streak
artifacts; however, the resulting image contains bright and dark artifacts which blur
the anatomical structure. In comparison, SMAR reduces streak artifacts effectively
without generating such bright and dark artifacts. As a result, a clean CT image is
obtained and the textures are also preserved well.
As shown in Table 2, the initial relative l∞ and l2 errors, 5.0085 and 0.7044, are
decreased by nearly half to 3.5728 and 0.2341, respectively, for NMAR. The SMAR
algorithm drops the errors more significantly, with the resulting relative l∞ and l2
errors becoming 0.2293 and 0.0269, respectively, the values which are decreased by
a factor of 20 from the initial levels.
Figure 10 presents the experimental results for the chest of a patient. Two metallic
screws inserted into the spine generate streak artifacts, which severely damage
the anatomical structure near the spine. While NMAR does reduce the major part
of the streak artifacts, it generates additional artifacts near the metallic objects,
368 S. Jeon and C.-O. Lee
Table 2 The performance comparison between NMAR and SMAR for the patient image sim-
ulations. The number of iterations is denoted by n. (Reprinted from Jeon and Lee (2018) with
permission from IOS Press)
Initial error NMAR SMAR
Phantom f (0) ∞, f (0) 2, · ∞, · 2, n f (n) ∞, f (n) 2,
Pelvis 5.0085 0.7044 3.5728 0.2341 16 0.2293 0.0269
Chest 12.1957 0.9187 7.8182 0.3100 26 0.5878 0.0314
Dental 11.9255 1.7471 3.4632 0.4378 14 0.3734 0.0476
Fig. 10 Patient chest experiment. Uncorrected CT image (left) and results of NMAR (middle) and
SMAR (right): C/W = 0/1000 (HU). (Reprinted from Jeon and Lee (2018) with permission from
IOS Press)
resulting in bright and dark patterns. These newly generated artifacts appear near the
metallic objects and thus corrupt the anatomical structure. Moreover, in the NMAR
result, the metallic objects are thicker than the original metallic objects, whereas
SMAR improves the image quality without generating additional artifacts, so that
the anatomical structures near the spine can be successfully distinguished. As shown
in Table 2, the initial relative l∞ and l2 errors, 12.1957 and 0.9187, are decreased in
the NMAR result to 7.8182 and 0.3100, respectively. The resulting relative l∞ and
l2 errors of SMAR are 0.5878 and 0.0314, respectively, values which are lower by a
factor of 20 from the initial values.
In the dental image simulations, streak artifacts appear to connect three metallic
objects, as shown in Fig. 11. As shown in the zoomed images of the solid boxes,
both NMAR and SMAR reduce the streak artifacts. However, NMAR produces the
shadow effects even in the region near the teeth and shows undulated artifacts across
the entire image domain. Even in a region far from the metallic objects, undulated
artifacts also appear, as shown in the zoomed images, and they degrade the image
quality. As shown in Table 2, the initial relative l∞ and l2 errors, 11.9255 and 1.7471,
are decreased for NMAR to 3.4632 and 0.4378, respectively. The resulting relative
l∞ and l2 errors for SMAR are 0.3734 and 0.0476, respectively, showing decreases
by a factor of 30 from the initial levels.
In patient image simulations, unlike NMAR, SMAR does not generate undulated
artifacts. SMAR produces clear images and performs noticeably better than NMAR.
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 369
Fig. 11 Patient dental experiment. Uncorrected CT image (left) and results of NMAR (middle)
and SMAR (right): C/W = 0/1000 (HU). (Reprinted from Jeon and Lee (2018) with permission
from IOS Press)
a b
Fig. 12 Reference images for the test of CNN-MAR. (a) Dental image. (b) Chest image
Data Acquisition
For the training of CNN in the CNN-MAR method, data patches are obtained from
dental, head, and pelvis images collected from “The 2016 Low-Dose CT Grand
Challenge” training dataset (AAPM 2016). To show the performance of the CNN-
MAR, a dental image and a chest image in Fig. 12 were chosen from a CT dataset
acquired in a dosimetry study of 68 Ga-NOTA-RGD PET/CT (Kim et al. 2012). It is
expected that the CNN works well when the dental image is used for test, but it will
not work when the chest image is used since it is not in the training set.
370 S. Jeon and C.-O. Lee
Results
First, the CNN-MAR method in Zhang and Yu (2018) used BHC and LI results as
auxiliary images for the input data, but in this work, SMAR and LI results are also
provided as auxiliary images. Since the SMAR algorithm produces better results
than the BHC method, it is expected that CNN-MAR will produce a better output
if the BHC image input is replaced by an SMAR image. Furthermore, since the
SMAR algorithm gives better results than LI, CNN-MAR with only one auxiliary
image from the SMAR algorithm is considered.
Figure 13 is the results of the dental case. Indeed, streak artifacts between the
teeth are reduced in all methods. However, there are big differences in the red-
colored squares. From the numerical results in Table 3, CNN-MAR with SMAR
and LI is the best.
Figure 14 is the results of the chest case. Note that the CNN did not learn the
chest image patches. From the figure, it can be seen that CNN-MAR with SMAR
and LI reduces the streak artifacts well compared with LI. However, comparing with
SMAR, breastbone structure is not clearly reconstructed. In Table 4, SMAR shows
the best performance in terms of PSNR and the l2 error. Furthermore, CNN-MAR
with SMAR only shows the best performance in terms of the maximum error.
In this section, the numerical results in Jeon et al. (2021) are presented.
Fig. 15 Sample 1: Data acquisition setting (left) and upper view of the scanned object (right).
(Reprinted from Jeon et al. (2021) with permission from Taylor & Francis)
Fig. 16 Sample 2: Data acquisition setting (left), upper view of sample body (middle), and lead
(Pb) poles. (Reprinted from Jeon et al. (2021) with permission from Taylor & Francis)
a b c d
e f g h
Fig. 17 MAR results for Sample 1: (a) uncorrected CT image, (b) NMAR, (c) SMAR, and (d)
shape prior SMAR; (e), (f), (g), and (h) are zoomed-in images of (a), (b), (c), and (d), respectively.
(Reprinted from Jeon et al. (2021) with permission from Taylor & Francis)
a b c d
Fig. 18 Center slice views: (a) Sample 2 with an air bubble, (b) reconstructed CT image
containing severe beam hardening artifacts, (c) VCAD , and (d) corrected result. (Reprinted
from Jeon et al. (2021) with permission from Taylor & Francis)
is included near the three tiny cylindrical pores as shown in Fig. 18a. However,
as shown in Fig. 18c, the CAD data does not have the information about the air
bubble. For implementation, three discrete bins (40, 60, 100 keV) are defined, and
all X-ray attenuation coefficients are obtained from Table 1. For convenience, it
is assumed that only the X-ray attenuation coefficient of the lead poles depends
on the X-ray energy level. Using the registration results in the previous section,
sinogram surgery is performed to reduce the metal artifacts due to two lead poles.
As shown in Fig. 18c, an air bubble is not contained in the CAD data. Due to the
severe artifacts, the bubble is hardly identified in Fig. 18b. The shape prior SMAR
algorithm successfully reduces most of the streak artifacts, and it can also accurately
detect the hidden air bubble as shown in Fig. 18d.
Conclusion
In this work, three recent approaches for metal artifact reduction in X-ray CT were
investigated: NMAR, SMAR, and CNN-MAR.
NMAR has shown good performance for various types of metallic implants
and thus been considered as one of the best currently available MAR algorithms.
However, finding a good prior image is at the heart of this algorithm. Incorrect
segmentation results can lead to residual artifacts. A more advanced segmentation
algorithm will definitely improve the results compared to simple thresholding.
SMAR algorithm was applied to various patient images. As in other MAR
approaches based on tissue classification, it is essential for the SMAR algorithm
to find a good tissue classification. The average fill-in region is decided based on
the tissue classification. The advantage of the SMAR algorithm stems from this
point. By filling in the region surrounding metallic objects with the average values,
a resulting image is obtained with less streak artifacts. Then this image is used as
new input data for the next iteration. The SMAR algorithm tends to converge to a
moderate value of the image intensity. Results can be improved when using a more
sophisticated segmentation method rather than simple segmentation based on CT
numbers.
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 375
References
AAPM: Low dose CT grand challenge. Resource document. American Association of Physicists
in Medicine (2016). http://www.aapm.org/GrandChallenge/LowDoseCT/
Abdoli, M., Ay, M.R., Ahmadian, A., Dierckx, R., Zaidi, H.: Reduction of dental filling metallic
artifacts in CT-based attenuation correction of PET data using weighted virtual sinograms
optimized by a genetic algorithm. Med. Phys. 37(12), 6166–6177 (2010)
Bal, M., Spies, L.: Metal artifact reduction in CT using tissue-class modeling and adaptive
prefiltering. Med. Phys. 33(8), 2852–2859 (2006)
Bal, M., Celik, H., Subramanyan, K., Eck, K., Spies, L.: A radial adaptive filter for metal artifact
reduction. Proc. SPIE 5747, 2075–2082 (2005)
Barrett, J.F., Keat, N.: Artifacts in CT: recognition and avoidance. Radiographics 24(6), 1679–1691
(2004)
376 S. Jeon and C.-O. Lee
Boas, F.E., Fleischmann, D.: CT artifacts: causes and reduction techniques. Imaging Med. 4(2),
229–240 (2012)
Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Process. 10, 266–277 (2001)
De Man, B., Nuyts, J., Dupont, P., Marchal, G., Suetens, P.: An iterative maximum-likelihood
polychromatic algorithm for CT. IEEE Trans. Med. Imaging 20(10), 999–1008 (2001)
Deans, S.R.: The Radon Transform and Some of Its Applications. Dover, New York (2007)
Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. In: MHS’95, Proceedings
of the Sixth International Symposium on Micro Machine and Human Science, pp. 39–43 (1995).
https://doi.org/10.1109/MHS.1995.494215
Ghani, M.U., Karl, W.C.: Fast enhanced CT metal artifact reduction using data domain deep
learning. IEEE Trans. Comput. Imaging 6, 181–193 (2020). https://doi.org/10.1109/TCI.2019.
2937221
Gjesteby, L., Yang, Q., Xi, Y., Shan, H., Claus, B., Jin, Y., De Man, B., Wang, G.: Deep learning
methods for CT image-domain metal artifact reduction. Proc. SPIE 10391, 103910W (2017).
https:doi.org/10.1117/12.2274427
Gu, J., Zhang, L., Yu, G., Xing, Y., Chen, Z.: X-ray CT metal artifacts reduction through curvature
based sinogram inpainting. J. X-Ray Sci. Technol. 14(2), 73–82 (2006)
Helgason, S.: The Radon transform on Euclidean spaces, compact two point homogeneous spaces
and Grassmann manifolds. Acta Math. 113, 153–180 (1965)
Huang, X., Wang, J., Tang, F., Zhong, T., Zhang, Y.: Metal artifact reduction on cervical CT
images by deep residual learning. BioMed. Eng. OnLine 17, 175 (2018). https://doi.org/10.
1186/s12938-018-0609-y
Hubbell, J.H., Seltzer, S.M.: X-ray mass attenuation coefficients. Resource document.
National Institute of Standards and Technology (2004). https://www.nist.gov/pml/x-ray-mass-
attenuation-coefficients/
Jaberipour, M., Khorram, E., Karimi, B.: Particle swarm algorithm for solving systems of nonlinear
equations. Comput. Math. Appl. 62(2), 566–576 (2011). https://doi.org/10.1016/j.camwa.2011.
05.031
Jeon, S., Lee, C.-O.: A CT metal artifact reduction algorithm based on sinogram surgery. J. X-Ray
Sci. Technol. 26, 413–434 (2018)
Jeon, S., Kim, S., Lee, C.-O.: Shape prior metal artefact reduction algorithm for industrial 3D cone
beam CT. Nondestruct. Test. Eval. 36(2), 176–194 (2021). https://doi.org/10.1080/10589759.
2019.1709457
Kachelrieß, M., Watzke, O., Kalender, W.A.: Generalized multi-dimensional adaptive filtering
(MAF) for conventional and spiral single-slice, multi-slice, and cone-beam CT. Med. Phys.
28(4), 475–490 (2001)
Kalender, W.A., Hebel, R., Ebersberger, J.: Reduction of CT artifacts caused by metallic implants.
Radiology 164(2), 576–577 (1987)
Kano, T., Koseki, M.: A new metal artifact reduction algorithm based on a deteriorated CT image.
J. X-Ray Sci. Technol. 24(6), 901–912 (2016)
Kim, Y., Yoon, S., Yi, J.: Effective sinogram-inpainting for metal artifacts reduction in X-ray CT
images. In: Proceedings of 2010 IEEE 17th International Conference on Image Processing,
pp. 597–600 (2010)
Kim, J.H., Lee, J.S., Kang, K.W., Lee, H.-Y., Han, S.-W., Kim, T.-Y., Lee, Y.-S., Jeong, J.M.,
Lee, D.S.: Whole-body distribution and radiation dosimetry of 68 Ga-NOTA-RGD, a positron
emission tomography agent for angiogenesis imaging. Cancer Biother. Radiopharm. 27, 65–71
(2012)
Klotz, E., Kalender, W., Sokiranski, R., Felsenberg, D.: Algorithm for the reduction of CT artifacts
caused by metallic implants. Proc. SPIE 1234, 642–650 (1990)
Koehler, T., Brendel, B., Brown, K.: A new method for metal artifact reduction. In: The Second
International Conference on Image Formation in X-Ray Computed Tomography, Salt Lake City
(2012)
Lemmens, C., Faul, D., Nuyts, J.: Suppression of metal artifacts in CT using a reconstruction
procedure that combines MAP and projection completion. IEEE Trans. Med. Imaging 28(2),
250–260 (2009)
10 Recent Approaches to Metal Artifact Reduction in X-Ray CT Imaging 377
Mahnken, A.H., Raupach, R., Wildberger, J.E., Jung, B., Heussen, N., Flohr, T.G., Günther, R.W.,
Schaller, S.: A new algorithm for metal artifact reduction in computed tomography: in vitro and
in vivo evaluation after total hip replacement. Investig. Radiol. 38(12), 769–775 (2003)
Meyer, E., Raupach, R., Lell, M., Schmidt, B., Kachelrieß, M.: Normalized metal artifact reduction
(NMAR) in computed tomography. Med. Phys. 37(10), 5482–5493 (2010)
Müller, J., Buzug, T.M.: Spurious structures created by interpolation-based CT metal artifact
reduction. Proc. SPIE 7258, 72581Y (2009)
Osher, S., Rudin, L.I.: Feature-oriented image enhancement using shock filters. SIAM J. Numer.
Anal. 27, 919–940 (1990)
Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on
Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
Park, H.S., Hwang, D., Seo, J.K.: Metal artifact reduction for polychromatic X-ray CT based on a
beam-hardening corrector. IEEE Trans. Med. Imaging 35, 480–487 (2016)
Perona, P., Shiota, T., Malik, J.: Anisotropic diffusion. In: ter Haar Romeny, B.M. (ed.) Geometry-
Driven Diffusion in Computer Vision, pp. 73–92. Springer, Dordrecht (1994)
Philips Healthcare: Metal artifact reduction for orthopedic implants (O-MAR), White Paper,
Philips CT Clinical Science, Andover (2012)
Prell, D., Kyriakou, Y., Beister, M., Kalender, W.A.: A novel forward projection-based metal
artifact reduction method for at-detector computed tomography. Phys. Med. Biol. 54, 6575–
6591 (2009)
Timmer, J.: Metal artifact correction in computed tomography. US Patent, 7,340,027 (2008)
Verburg, J.M., Seco, J.: CT metal artifact reduction method correcting for beam hardening and
missing projections. Phys. Med. Biol. 57(9), 2803–2818 (2012)
Wang, G., Snyder, D.L., O’Sullivan, J.A., Vannier, M.W.: Iterative deblurring for CT metal artifact
reduction. IEEE Trans. Med. Imaging 15(5), 657–664 (1996)
Watzke, O., Kalender, W.A.: A pragmatic approach to metal artifact reduction in CT: merging of
metal artifact reduced images. Eur. J. Radiol. 14(5), 849–856 (2004)
Wei, J., Chen, L., Sandison, G.A., Liang, Y., Xu, L.X.: X-ray CT high-density artifact suppression
in the presence of bones. Phys. Med. Biol. 49(24), 5407–5418 (2004)
Zhang, Y., Yu, H.: Convolutional neural network based metal artifact reduction in X-ray computed
tomography. IEEE Trans. Med. Imaging 37, 1370–1381 (2018)
Zhang, Y., Yan, H., Jia, X., Yang, J., Jiang, S.B., Mou, X.: A hybrid metal artifact reduction
algorithm for X-ray CT. Med. Phys. 40, 041910 (2013)
Zhang, K., Han, Q., Xu, X., Jiang, H., Ma, L., Zhang, Y., Yang, K., Chen, B., Wang, J.: Metal
artifact reduction of orthopedics metal artifact reduction algorithm in total hip and knee
arthroplasty. Medicine (Baltimore) 99(11), e19268 (2020)
Zhao, S., Bae, K.T., Whiting, B., Wang, G.: A wavelet method for metal artifact reduction with
multiple metallic objects in the field of view. J. X-Ray Sci. Technol. 10, 67–76 (2002)
Domain Decomposition for Non-smooth
(in Particular TV) Minimization 11
Andreas Langer
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Basic Idea of Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Difficulty for Non-smooth and Non-separable Optimization Problems . . . . . . . . . . . . . . . . 385
Domain Decomposition for Smoothed Total Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
Direct Splitting Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
Decomposition Based on the Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Decomposition for Predual Total Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Overlapping Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
Non-overlapping Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Decomposition for Primal Total Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Basic Domain Decomposition Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Domain Decomposition Approach Based on the (Pre)Dual . . . . . . . . . . . . . . . . . . . . . . . . . 412
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
Abstract
A. Langer ()
Centre for Mathematical Sciences, Lund University, Lund, Sweden
e-mail: [email protected]
Keywords
Introduction
Fig. 1 Overlapping
decomposition into two
domains
Γ2
Ω1 Ω2
Γ1
Γ
Ω2
Ω2
Ω1 n
n Γ
Ω1
for a decomposition of the spatial domain into two subdomains. Here u is the
unknown function; denotes the Laplace operator; is a two-dimensional domain,
i.e., ⊂ R2 , with Lipschitz boundary ∂; and f is a given function.
where each n is the outward pointed normal on from 1 . Here we see that due
to the partition of , the original problem (1) is replaced by two subproblems
on each subdomain by imposing both Neumann and Dirichlet conditions on .
These conditions transmit information from one domain patch to the other and
therefore they are called transmission conditions. The equivalence between the
Poisson problem (1) and the multi-domain problem (2) is in general not obvious,
but can be shown under suitable regularity assumptions on f , typically f ∈ L2 (),
by considering the associated variational formulation; see, for example, Quarteroni
and Valli (1999).
⎧ ⎧
⎪ ⎪
⎪ k+1
⎨Lu2 = f
k+1
⎨Lu1 = f
⎪ in 1 ⎪ in 2
k+1
uk+1 =0 on ∂1 \ and u2 = 0 on ∂2 \ (3)
⎪
⎪
1 ⎪
⎪
⎩uk+1 = λk ⎪ ∂uk+1
⎩ ∂uk+1
∂n = ∂n
1 on 2 1
on
with
λk+1 := α̂uk+1
2| + (1 − α̂)λ ,
k
(1989) for a convergence proof based on a functional analysis argument for partial
differential equations.
find uk+1
1 ∈ W1 : a1 (uk+1
1 , v1 ) = (f, v1 )1 ∀v1 ∈ H01 (1 )
uk+1 = λk on
find uk+1
2 ∈ W2 : a2 (uk+1
2 , v2 ) = (f, v2 )2 ∀v2 ∈ H01 (2 )
a2 (uk+1 k+1
2 , R2 μ) = (f, R2 μ)2 + (f, R1 μ)1 − a2 (u1 , R1 μ) ∀μ ∈ W
⎧ ⎧
⎪
⎪ k+1 ⎪
⎪ k+1
⎪Lu1 = f
⎨ in 1 ⎪Lu2 = f
⎨ in 2
uk+1 = uk| on 1 and uk+1 = uk+1 on 2 , (4)
⎪
⎪
1 1 ⎪
⎪
2 1|2
⎪
⎩uk+1 = 0 ⎪
⎩ k+1
1 on ∂1 \ 1 u2 = 0 on ∂2 \ 2
384 A. Langer
1
J(w, u) := a(w, w) − (f, w) + a(u, w). (5)
2
The variational formulation of method (4) reads as follows: initialize u0 ∈ H01 ()
and for k ≥ 0 solve
⎧
⎪
⎪ w1 ∈ W1 : a(w1 , v1 ) = (f, v1 ) − a(u , v1 )
k 0 k k for all v1 ∈ W10
⎪
⎪
⎪
⎨ uk+1/2 = uk + w k
1
(6)
⎪
⎪ w2k ∈ W20 : a(w2k , v2 ) = (f, v2 ) − a(uk+1/2 , v2 ) for all v2 ∈ W20
⎪
⎪
⎪
⎩ uk+1 = uk+1/2 + w k
2
or equivalently
⎧
⎪
⎪ w1k = arg minw1 ∈W 0 J(w1 , uk )
⎪
⎪
⎨ k+1/2 1
u = uk + w1k
(7)
⎪
⎪ w2k = arg minw2 ∈W 0 J(w2 , uk+1/2 )
⎪
⎪ 2
⎩ uk+1 = uk+1/2 + w k .
2
Additive Schwarz method If we make the two steps in (4) independent from
each other, which allows for parallelization, then we obtain the additive alternating
Schwarz method, which computes the sequence of approximations by solving
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 385
⎧ ⎧
⎪
⎪ k+1 ⎪
⎪ k+1
⎨Lu1 = f
⎪ in 1 ⎨Lu2 = f
⎪ in 2
uk+1 = uk| on 1 and uk+1 = uk| on 2 . (8)
⎪
⎪
1 1 ⎪
⎪
2 2
⎪ k+1
⎩ ⎪ k+1
⎩
u1 = 0 on ∂1 \ 1 u2 = 0 on ∂2 \ 2
or
⎧
⎪
⎨ w1 = arg minw1 ∈W10 J(w1 , u )
⎪ k k
where J is defined as in (5). By relation (9) we verify that the original formulation
(8) is equivalent to the variational formulation.
Note that in the overlapping domain decomposition methods presented above,
the subdomain problems are of the same type in each subdomain, while for the non-
overlapping methods the subdomain problems differ due two interface conditions,
which are distributed among the subdomain problems.
For a broader discussion on domain decomposition approaches for partial
differential equations, we refer to Chan and Mathew (1994), Dolean et al. (2015),
Mathew (2008), Quarteroni and Valli (1999), Toselli and Widlund (2006), and Smith
et al. (2004).
Three main issues are of high interest when analyzing domain decomposition
methods: (i) convergence, (ii) rate of convergence, and (iii) the independence of the
rate of convergence on the mesh size, which can be interpreted as a preconditioning
strategy. When talking about convergence, one usually means convergence to a
386 A. Langer
solution of the global problem. However, we will also learn to know domain
decomposition methods that do converge but not necessarily to a solution of
the global problem. Hence, in the sequel when we talk about convergence, we
distinguish between convergence to some point, which may not be a solution of the
global problem, and convergence to a solution of the global problem. For smooth
energies, the convergence to a solution of the global problem and the other two
concerns are at large well established. We remark, that for non-smooth problems,
decomposition algorithms may still work fine as long as the energy splits additively
with respect to the domain decomposition. For such problems convergence to a
solution of the original problem and sometimes even the rate of convergence are
ensured; see, for example, Fornasier (2007), Tseng (2001), Tseng and Yun (2009),
and Wright (2015). In (2009) Vonesch and Unser could provide preconditioning
effects of a subspace correction algorithm for minimizing a non-smooth energy
when applied to deblurring problems. Let us mention that there is a tremendous
amount of literature devoted to splitting methods for non-smooth but separable
problems in the context of coordinate descent methods (Wright 2015). We are not
revising these methods, but concentrate on non-smooth and non-separable problems,
where the situation to construct splitting methods that converge to the correct
solution seems more complicated as the following counterexample by Warga (1963)
indicates.
Example 1. Let V := [0, 1]2 , V1 := {(c, 0) : c ∈ [0, 1]}, V2 := {(0, c) : c ∈ [0, 1]}
and ϕ : V → R given by ϕ(x) = |x1 − x2 | − min{x1 , x2 }, where x = (x1 , x2 ).
We observe that ϕ is convex but non-smooth and non-additive with respect to the
splitting, i.e., ϕ(x) = ϕ((x1 , 0)) + ϕ((0, x2 )). We have that 0 ∈ arg minx∈Vi ϕ(x)
for i ∈ {1, 2} and thus x2k = x1k = 0 for all k ≥ 0. On the contrary (1, 1) ∈
arg minx∈V ϕ(x).
the total variation of u in (Ambrosio et al. 2000; Giusti 1984), where C01 (, Rd )
is the space of continuously differentiable vector-valued functions with compact
support in and | · | 2 denotes the standard Euclidean norm. Here and in the rest
of this chapter, bold letters indicate vector-valued functions. If u ∈ W 1,1 (), the
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 387
Sobolev
space of L1 functions with L1 distributional derivative, then |Du| =
|∇u| 2 dx. Note that different vector norms may be used in the definition of the
total variation. More precisely one may use | · | r with 1 < r ≤ ∞. For example,
the case r = ∞ is considered in Hintermüller and Kunisch (2004).
It is well established that the total variation preserves edges and discontinuities
in images (Chambolle et al. 2010; Chan and Shen 2005), which is one of the reasons
why it has been introduced to image processing as a regularization technique (Rudin
et al. 1992). In this approach one typically minimizes an energy consisting of a data-
fidelity term D, which enforces the consistency between the observed image and the
solution, a total variation term, as a regularizer, and a positive parameter λ weighting
the importance of these two terms. That is, one solves
min D(u) + λ |Du|.
u
The choice of the data term usually depends on the type of noise contamination. For
example, in the case of Gaussian noise, a quadratic L2 data fidelity term is used,
while for impulsive noise an L1 term is suggested (Alliney 1997) and seems more
successful than an L2 term (Nikolova 2002, 2004). Other and different fidelity terms
have been considered in connection with other types of noise models as Poisson
noise (Le et al. 2007), multiplicative noise (Aubert and Aujol 2008), and Rician
noise (Getreuer et al. 2011). For images which are simultaneously contaminated by
Gaussian and impulse noise (Cai et al. 2008), a combined L1 -L2 data fidelity term
has been suggested and demonstrated to work satisfactorily (Calatroni et al. 2017;
Hintermüller and Langer 2013; Langer 2017b, 2019). We will restrict ourselves
to Gaussian noise removal, i.e., L2 data fidelity, as it will cover the fundamental
domain decomposition approaches for total variation minimization proposed so far.
That is, we consider the so-called L2 -TV model
1
min J (u) := T u − g2L2 () + λ |Du| , (12)
u∈BV () 2
where BV () = {u ∈ L1 () : |Du| < ∞} is the space of bounded variation
functions (Ambrosio et al. 2000), g ∈ L2 () is the observation, and T ∈ L(L2 ())
is a bounded linear operator modeling the image formation device. Typical examples
for T are (i) convolution operators, which describe blur in an image; (ii) the identity
operator I , if an image is only corrupted by noise; (iii) the characteristic function of
a subdomain marking missing parts, i.e., the inpainting domain; or (iv) the Fourier
transform, if the observed data are given as corresponding frequencies. Since ⊂
Rd , d = 1, 2, the embedding BV () → L2 () is continuous (Attouch et al.
2014, Theorem 10.1.3), and hence problem (12) is equivalent to minu∈L2 () J (u).
In order to ensure the existence of a minimizer of J , we assume that J is coercive in
BV (), i.e., for every sequence (un )n∈N ⊂ BV () with un BV () → ∞, we have
J (un ) → ∞ or equivalently {u ∈ BV () : J (u) ≤ c} is bounded in BV () for all
388 A. Langer
constants c > 0. This condition holds if T does not annihilate constant functions,
i.e., 1 ∈ ker(T) (Acar and Vogel 1994).
In the context of total variation minimization, the crucial difficulty in deriv-
ing suitable domain decomposition methods lies in the correct treatment of the
interfaces of the domain decomposition patches, i.e., the preservation of crossing
discontinuities and the correct matching where the solution is continuous. This
difficulty is reflected by various effects of the total variation: (i) it is non-smooth,
(ii) it preserves discontinuities and edges in images, and (iii) it is non-additive
(non-separable) with respect to a non-overlapping domain decomposition, since
the total variation of a function on the whole domain equals the sum of the total
variation on the subdomains plus the size of the possible jumps at the interface.
That is, let 1 and 2 be a disjoint (non-overlapping) decomposition of , then
the total variation has the following splitting property (cf. Ambrosio et al. (2000,
Theorem 3.84, p. 177)):
|D(u|1 + u|2 )| = |D(u|1 )| + |D(u|2 )|
1 2
+ |u+ −
| − u| | dH
d−1
(x), (13)
∂1 ∩∂2 1 2
1
min || div p + g||2L2 () over p ∈ H0 (div, )
2 (14)
subject to (s.t.) |p(x)| 2 ≤ λ for almost all (f.a.a.) x ∈ ,
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 389
(see Hintermüller and Kunisch (2004) and Hintermüller and Rautenberg (2015))
where H0 (div, ) := {v ∈ L2 ()d : div v ∈ L2 (), v · n = 0 on ∂} with n being
the outward unit normal on ∂. Instead of (14), one may write equivalently
1
min {F (p) := || div p + g||2L2 () + χK (p)}, (15)
p∈H0 (div,) 2
or
1
min {F(p) := div p + g2L2 () dx + Iλ (p)}
p∈H0 (div,) 2
and
⎧
⎨0 if |p(x)| ≤ λ f.a.a. x ∈ Dom(p)
2
Iλ (p) :=
⎩∞ otherwise.
u∗ = div p∗ + g, (16)
(see Hintermüller and Kunisch (2004)). Note that (14) is separable with respect to
a disjoint decomposition of the spatial domain . Let be decomposed into M
j =1 , then for p ∈ H0 (div, ) we have
disjoint subdomains (j )M
M
| div p + g|2 dx + Iλ (p) = | div(p|j ) + g|2 dx + Iλ (p|j ). (17)
j =1 j
For ease of notation, in the sequel for any sequence (v n )n∈N , we write (v n )n
instead.
If one seeks a minimizer of (12) in the Sobolev space W 1,1 (), then (12) becomes
1
min {J (u) = T u − g2L2 () + λ |∇u| dx}. (18)
u∈W 1,1 () 2
We note that the total variation of u ∈ W 1,1 () is additive with respect to a disjoint
decomposition of , i.e., the interface term in (13) vanishes.
Due to the optimality of vin we get that (J (un ))n is monotonically decreasing and
hence (un )n ⊂ W 1,1 () is bounded, since J is coercive, i.e., (un )n ⊂ {u ∈
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 391
W 1,1 () : J (u) ≤ J (u0 )}. Note that W 1,1 () is non-reflexive and (18) is convex
but neither strongly nor strictly convex and still non-smooth, due to the presence of
the L1 term. Hence, the convergence theory of Tai (2003), Tai and Xu (2002), Tai
and Tseng (2002), and Tseng and Yun (2009) does not cover this splitting algorithm.
A similar decomposition method is considered in Chen and Tai (2007) but without
any rigorous theoretical convergence analysis.
1
Due to the presence of the term |∇u| , this equation is not well defined at points
∇u = 0. To overcome this shortcoming, we introduce an additional small parameter
> 0 to slightly perturb the total variation semi-norm, such that (12) becomes
1
min T u − g2L2 () + λ |∇u|2 + dx. (19)
u∈W 1,1 () 2
Note that the functional in (19) is now strictly convex, Gâteaux differentiable, and
separable. Hence, domain decomposition methods which converge to a solution of
the global problem may be constructed following Tseng and Yun (2009). Domain
decomposition methods for (19) and (20) have been considered, for example,
in Chen and Tai (2007) and Xu et al. (2010, 2014).
While these smoothed problems possess the advantage that domain decomposi-
tion methods with desired convergence properties could be possibly designed, they
do not generate solutions that preserve discontinuities and edges.
In order to avoid the difficulties due to the minimization of a non-smooth and non-
additive energy over a non-reflexive Banach space in (12), the predual problem
(14) of (12) may be tackled instead. In particular the smooth objective and the box
constraint in (14) seem more amenable to domain decomposition than the structure
392 A. Langer
of (12). In fact, in Chang et al. (2015) overlapping and in Hintermüller and Langer
(2015), Lee et al. (2019b), and Lee and Park (2019a,b) non-overlapping domain
decomposition methods for (14) are proposed. Let us review the main ideas and
results for these approaches.
end for
also view it from the other way round, namely, that the partition of unity provides the
overlapping splitting of the spatial domain. From a practical point of view, this has
the advantage, that a partition of unity can always be easily constructed and hence
an overlapping decomposition of the domain. In the case of a rectangle, which is a
usual shape of an image, a simple example for a partition of unity for a splitting into
three subdomains is shown in Fig. 3.
The first convergent overlapping domain decomposition method for the mini-
mization of (14) is presented in Chang et al. (2015), here presented in Algorithm 2.
There the partition of unity (θj )j is chosen such that (21), (22), and (23), θj ≥ 0
and
Cθ
∇θj L∞ () ≤ , (24)
δ
where Cθ > 0 and δ > 0 denotes the overlapping size, hold. The estimate (24)
seems reasonable, as for small overlapping sizes we would expect a larger gradient
and it may allow to get a feeling on how the convergence of the algorithm depends
on the overlapping size; see Theorem 1.
Fig. 3 Partition of unity for a decomposition into three overlapping subdomains. (a) θ1 (b) θ2 . (c) θ3
A. Langer
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 395
Remark that the relaxation parameter α̂ is now only in the interval (0, M 1
], whose
range is theoretically justified, in particular to guarantee the monotonic decay of
(F (pn ))n ; see Chang et al. (2015) for more details.
The convergence of Algorithms 2 and 3 to a solution of the global problem (15)
with rate O( n1 ) is guaranteed. We recall this main result by referring to Chang et al.
(2015) for its proof.
C2
un − u∗ 2L2 () ≤ F (pn ) − F (p∗ ) ≤
n
with
√
2 √ 1
− 1 M M √
C := ζ0 (2M + 1) + 8 2Cθ λ|| 2 (ζ ) 2 √ + 2 − 1
2 0
α̂ δ α̂
We observe that the constant C in Theorem 1 depends on the tunable parameters α̂,
δ, and M. Some comments according to these parameters are in order. Observe that
if the number of subdomains M grows, C grows as well. In order to overcome this
behavior, we may use a so-called coloring technique; see, e.g., Toselli and Widlund
(2006). That is, is partitioned into Mc classes of overlapping subdomains, where
each class has a different color and each class is the union of disjoint subdomains
396 A. Langer
with the same color. We note that in general the disjoint domains with the same
color cannot be solved in parallel without introducing additional new constraints, as
the following example, borrowed from Warga (1963), shows.
√
√
2 1
− 1 Mc N0 √
C= ζ 0 (2Mc + 1) + 8 2Cθ λ|| 2 (ζ ) 2 √
2 0
+ 2−1 ,
α̂ δ α̂
where Mc and N0 may be small, e.g., 2 or 4 (see above), even if the total number
of subdomains grows. A complementary behavior is observed for the parameters α̂
and δ. That is, the smaller these parameters, the larger the constant C. Consequently
one may choose α̂ = 1 in Algorithm 2 and α̂ = M 1
(or respectively α̂ = M1c when
using a coloring technique) in Algorithm 3, which will lead to a faster convergence,
n, +1
θj λp̂n, n,
j + θj λτ ∇(div p̂j − gj )
p̂n,0
j ∈Kj (e.g., p̂n,0 n
j =p̂j ), p̂j = for ≥ 0,
θj λ + τ |∇(div p̂n,
j − gj )| 2
(25)
n+1
where gj := g − div + i>j θi
i<j pi pn for Algorithm 2 and gj := g −
div i =j θi p
n for Algorithm 3. For 0 < τ ≤ 1
8 one shows analogous to the proof
of Chambolle (2004, Theorem 3.1) that the iterates (p̂n,
j ) converge to a respective
minimizer p̂n+1
j of the subdomain problems as → ∞. Due to the presence of θj
in the nominator in (25), the update of p̂n,
j is only performed in j and hence the
subdomain problems are indeed restricted to the respective subdomains.
As already mentioned above, the convergence analysis carried out for the overlap-
ping domain decomposition algorithms in Chang et al. (2015) leading to Theorem 1
cannot be directly applied to a non-overlapping splitting. In particular, till now
it is still an open problem to construct a non-overlapping domain decomposition
method for (14) in an infinite dimensional setting which is guaranteed to converge
to a minimizer of the original global problem. However, for a finite difference and
finite element discretization of (14), splitting methods which converge to the desired
optimum are introduced in Hintermüller and Langer (2015), Lee et al. (2019b),
and Lee and Park (2019a,b). The first method in this series has been proposed
in Hintermüller and Langer (2015) for a finite difference discretization of (14),
where instead of |p| 2 ≤ λ the constraint |p| ∞ ≤ λ is originally used. Nevertheless,
398 A. Langer
the algorithms in Hintermüller and Langer (2015) and its convergence results can
be easily transformed into our setting, i.e., |p| 2 ≤ λ, in which we will review them.
the discrete div : Y → X are defined in a standard way by forward and backward
h
differences such that divh = −(∇ h )∗ ; see, for example, Hintermüller and Langer
(2015). Using this notation the discrete version of (15) is then written as
norms uh 2Xj := x∈h |uh (x)|2 , ph 2Yj := ph,1 2Xj + ph,2 2Xj for uh ∈ Xj
j
and ph ∈ Yj , j = 1, . . . , M.
splitting, we set
⎧
⎨1 if x ∈ h
θjh (x) := j
for j = 1, . . . , M,
⎩0 if x ∈ h \ h ,
j
C
F h (ph,n ) − F h (ph,∗ ) ≤ ,
n
where
⎛ ⎞
M
1
C := M ⎝ divh θjh (ph,∗ − ph,0 )2X ⎠ + (M − 1) F h (ph,0 ) − F h (ph,∗ ) .
2
j =1
(27)
400 A. Langer
Mc
1
divh θjh (ph,∗ − ph,0 )2X ≤ divh (ph,∗ − ph,0 )2X
2
j =1
2
+ c1 max |ph,∗ (x) − ph,0 (x)| 2
x∈h
Fig. 5 Non-overlapping
domain decomposition of h
into h1 and h2 with the Ωh1
h ⊂ (h \ h )
small stripes
for i = 1, 2
i i
h
Ω 2
h
Ω 1
Ωh2
where phj ∈ Kjh , phjc ∈ i =j Kih , and ζ is a suitable function independent on phj ,
j ∈ {1, . . . , M}. Here divh ∪ h , where
is the usual discrete divergence on hj ∪
j j j
h ⊂ h \ h is a small stripe around the interface between h and h \ h . A
j j j j
typical choice for h for which this splitting property holds is shown in Fig. 5 for
j
a decomposition into two domains. Note that the stripe h may be arbitrarily small
j
and hence in the limit case it may be viewed as the boundary of hj inside h , i.e.,
h = ∂h \ ∂h and ∂h ∩ h = ∅. Then using the above splitting property of
j j j j
the divergence operator, a solution ph,n+1
j of the subspace minimization problem of
Algorithms 5 and 6 in j is given as
1
ph,n+1 ∈ arg min divh ∪ ph + fj 2X
j + Iλθjh (pj )
h
j | h ∪
h 2 j j j
j j
h j
p ∈Y
j
(28)
ph,n+1 = 0,
j | h \(h h
j ∪j )
1
min || divh ∪ (ξ j ) + g|h h h + ζ ((1 − θjh )ph,n )| h h ||2X
j 2 j j
j ∪ j ∪j j
ξj ∈Y j
(29)
s.t. projVj ξ j = projVj p h,n
and |ξ j (x)| 2 ≤ λ for all x ∈ hj hj ,
∪
402 A. Langer
divergence operator where this new boundary condition is considered. Then the
optimization problem in the subdomains can be written as
1 h h
min ||div (pj ) + g|h h + ζ ((1 − θjh )ph,n )| h ||2Xj
phj ∈Yj 2 j j
(30)
s.t. |phj (x)| 2 ≤ λ for all x ∈ hj
or equivalently
1 h h
min ||div (pj ) + g|h h + ζ ((1 − θjh )ph,n )| h ||2Xj + Iλ (phj ) (31)
phj ∈Yj 2 j j
p= (p)i ψi ,
i∈I
where I is the set of indices of the basis functions (ψi )i∈I of Y and (p)i denotes
the respective degree of freedom. Based on these definitions, the finite element
discretization of (15) is
1
min div p + g2L2 () + χC (p), (32)
p∈Y 2
where Tj and Ej are the collections of all elements and edges in j for j =
1, . . . , M. Let Ij be the set of indices of the basis functions for Yj and I the
set of indices of degree of freedom of Y on := j <i ∂j ∩ ∂i . By Y =
M
span{ψi }i∈I we denote the interface function space. Further let YI := j =1 Yj ,
Cj := {p ∈ Yj : |(p)i | 2 ≤ λ ∀i ∈ Ij }, CI := M C
j =1 j and C := {p ∈ Y :
|(p )i | 2 ≤ λ ∀i ∈ I }. Note that for p ∈ Y there exists a unique decomposition
such that
⎛ ⎞
M
p = pI ⊕ p = ⎝ pj ⎠ ⊕ p
j =1
1
min div(pI ⊕ p ) + g2L2 () + χCI (pI ) (33)
pI ∈YI 2
for a fixed p ∈ C . We remark that thanks to the splitting property (17) a solution
of (33) can be obtained by independently solving on each subspace
1
min div(pj ⊕ p | ) + g2L2 () + χCj (pj ).
pj ∈Yj 2 j
1
p ∈ arg min div(HI p ⊕ p ) + g2L2 () + χC (p ). (34)
p ∈Y 2
404 A. Langer
It can be shown that (i) if p∗ ∈ Y is a solution of (32), then p∗ = p∗| is a solution
of (34) and (ii) if p∗ ∈ Y is a solution of (34), then p∗ = HI p∗ ⊕ p∗ is a solution
of (32) (Lee and Park 2019a). Using FISTA to solve (34) the domain decomposition
algorithm presented in Algorithm 7 is obtained (Lee and Park 2019a), where projC
is the orthogonal projection onto C .
We remark once more that the minimizer HI qn in Algorithm 7 may be obtained by
solving independently on each subdomain
1
pnj ∈ arg min div(pj ⊕ pn | ) + g2L2 () + χCj (pj )
pj ∈Yj 2 j
M n
and setting HI qn = j =1 pj . Due to the utilization of FISTA, Algorithm 7
converges with order O(1/n2 ) to a solution p∗ ∈ Y of (34).
This approach relies on a splitting into a problem defined on YI and a subdomain
problem defined on the interface Y , which are alternately solved. A similar
decoupling approach is presented in Lee et al. (2019a), where the functional to be
minimized is additively separated with respect to a finite difference discretization
into a problem on disjoint subdomains and one interface problem. By utilizing
the primal-dual algorithm of Chambolle and Pock (2011) these two problems are
successively solved. Note that due to a disjoint splitting, a parallelization of the
problem on these disjoint subdomains is possible. This method is used to minimize
a functional consisting of a total variation term and an L1 date fidelity term with
applications to image denoising, inpainting, and deblurring. For block coordinate
descent methods, a similar splitting approach is presented in Chambolle and Pock
(2015).
A FETI Approach
In contrary to the above finite element approach, in Lee et al. (2019b) a further
and different domain decomposition method is proposed, where the local function
spaces Ỹj are defined in the tearing-and-interconnecting fashion by
For p ∈ Ỹj the jump p · n might be non-zero, which is related to tearing the
subdomain solutions apart. Further let I˜j be the set of indices of the basis functions
M
Ỹj and Ỹ = j =1 Ỹj . Then based on the splitting (17) on each subdomain j ,
j = 1, . . . , M, the following optimization problem might be solved:
1
p̃j ∈ arg min div pj + g2L2 ( ) + χC̃j (pj ), (35)
p ∈Ỹ
2 j
j j
M
1
min div p̃j + g2L2 ( ) + χC̃j (p̃j ) s.t. B p̃ = 0.
p̃∈Ỹ j =1 2 j
M
1
min max div p̃j + g2L2 ( ) + χC̃j (p̃j ) + B p̃, μR|I | . (36)
p̃∈Ỹ μ∈R|I | 2 j
j =1
Since B is bounded, the saddle point problem (36) can be solved by the primal-dual
algorithm proposed in Chambolle and Pock (2011) which yields Algorithm 8.
terms are considered. Also for these applications, the convergence of the splitting
method to a minimizer of the global problem is ensured. A similar tearing-and-
interconnecting strategy together with the primal-dual algorithm (Chambolle and
Pock 2011) has been used in Duan et al. (2016) for image segmentation, more
precisely for the convex Chan-Vese model (Chan et al. 2006). However, in this
setting the convergence of the algorithm to a minimizer of the global problem seems
unclear, as the existence of an isomorphism, similar to the one above, is not shown.
Note that in Duan et al. (2016) the minimization of the total variation is directly
considered and not its predual counterpart.
M
L2 () into M ∈ N appropriate subspaces Uj such that L2 () = j =1 Uj .
In terms of domain decomposition, let be separated into M subdomains j ,
j = 1, . . . , M. Here the decomposition of the domain may be overlapping or non-
overlapping. Then Uj := {u ∈ L2 () : supp(u) ⊂ j } for j = 1, . . . , M. With
this splitting we aim to solve (12) by Algorithm 9.
end for
∞ M
j =1 ⊂ L () is a partition of unity with the properties (i)
Here (θj )M i=j θj =
1 and (ii) θj ∈ Uj for j = 1, . . . , M. From the assumptions on θj we obtain
M
un = M j =1 (θj u ). Further, if the Uj s are orthogonal, i.e., U =
n
j =1 Uj , then
θj un = unj for all n ∈ N and hence there is no need to introduce a partition of unity.
The successive version of Algorithm 9 is stated in Algorithm 10.
end for
or equivalently
Convergence Properties
It can be shown that Algorithms 9 and 10 generate sequences (un )n in L2 (), which
have subsequences that weakly converge in L2 () and BV (), such that (J (un ))n
is non-increasing for all n ∈ N (Hintermüller and Langer 2013, Proposition 3.1).
As a consequence (J (un ))n is also convergent, since it is bounded from below.
Unfortunately the limit point of such subsequences is not guaranteed to be a
solution of the global problem (12), as the following one-dimensional (d = 1)
counterexample demonstrates:
l(c − 1) + λ = 0
which is equivalent to
λ
c =1− .
l
Hence, for λ = l (in particular for λ ≥ l) the minimizer is c = 0 and hence u11 = 0.
In this situation b = 0 for all j ∈ {1, . . . , M} and both algorithms. Consequently
u1j = 0 for all j ∈ {1, . . . , M} and hence u1 = 0 = u0 . If λ = l, a repetition of
these steps shows that un = 0 for all n ∈ N.
On the contrary the minimizer of the global optimization problem (12) is u∗ = 1
for any λ ≥ 0.
Note that this example works for an overlapping as well as for a non-overlapping
decomposition of the spatial domain . Moreover, this counterexample can be
easily extended to a multi-domain decomposition and to R2 by letting ⊂ R2
be a rectangle decomposed into stripes, for example, as in Fig. 4b. A similar
counterexample has been presented in Lee and Nam (2017) for a finite difference
discretization by using the relation to the predual problem.
Despite this quite negative result, in a finite difference setting in Hintermüller
and Langer (2013) an estimate of the distance of a limit point uh,∞ obtained by
discrete version of Algorithm 9 or Algorithm 10 to the true global minimizer uh,∗
is obtained. Let us use the finite difference setting of section “Finite Difference
Setting”, define Xjc := i =j Xi for j ∈ {1, . . . , M}, and consider the discrete
version of J defined as
∗
where T h : X → X is a bounded linear operator. Then, if T h T h is positive
definite in the direction uh,∞ − uh,∗ with smallest eigenvalue σ > 0 and η̂h ∈
arg min h M h h,∞ c ηh X , then
η ∈ j =1 ∂J (u )∩Xj
η̂h X
uh,∞ − uh,∗ X ≤ . (38)
α2 σ
Note that the Lagrange multiplier η̂h indicates the influence of the constraint on
the solution. If η̂h = 0, then the minimizer of the discrete version of (37) is
equivalent to the minimizer of J h in X and hence is indeed a solution of the global
problem. On the contrary, if η̂h = 0, then the discrete version of the constraint in
410 A. Langer
(37) has influence on the solution, which consequently does not coincide with the
global solution. Hence, this estimate does not contradict with the counterexample,
but instead provides an a posteriori upper bound to check whether the algorithm
is indeed converged for a considered example. In particular if η̂jh,nk X → 0 for
k → ∞ along a suitable subsequence (nk )k for at least one j ∈ {1, . . . , M}, then
any accumulation point of the sequence (uh,n )n generated by the discrete version
of Algorithm 9 or Algorithm 10 minimizes J h . By this observation, with the help
of this estimate in Hintermüller and Langer (2013), it is demonstrated by numerical
experiments that Algorithms 9 and 10 generate sequences which seem to converge
to the global minimizer, because η̂h X tends to zero.
It is worth mentioning that Algorithms 9 and 10 have not only been proposed for
the L2 -TV model but also for total variation minimization with a combined L1 /L2
data fidelity term, which seems in particular suitable for removing simultaneously
Gaussian and impulsive noise in images (Hintermüller and Langer 2013). For
a non-overlapping decomposition of the domain , these algorithms have been
also utilized for total variation minimization with an H −1 constraint, i.e., for
solving
1
min T u − g2−1 + |Du|,
u∈BV () 2
where · −1 denotes the H −1 () norm (Schönlieb 2009). In Chang et al. (2014)
a similar splitting method for minimizing the nonlocal total variation (see Gilboa
and Osher (2009), Peyré et al. (2008), Zhang et al. (2010), and the references
therein for more information on nonlocal total variation) is described without any
rigorous theoretical analysis. For total variation image segmentation in Duan et al.
(2016) and Duan and Tai (2012), the domain decomposition methods based on
an additive decomposition of the objective have been proposed. Nevertheless, a
proof of convergence of these methods to a solution of the global problem is
missing.
Subspace Minimization
Algorithms 9 and 10 require that the subspace minimization problems are solved
exactly, which is in general not easily possible. Moreover, due to the presence of the
operator T , which acts on the variable to be minimized, a restriction of the subspace
minimization problems to the respective subdomains and subspaces seems in
general difficult, in particular if T is a global operator. Therefore, in Fornasier et al.
(2010), Fornasier and Schönlieb (2009), and Hintermüller and Langer (2014) the
subproblems are approximated by the so-called surrogate functionals (Daubechies
et al. 2004): assume a, uj ∈ Uj , b ∈ i =j Ui and define
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 411
1
Jjs (uj + b, a + b) : = J (uj + b) + δuj + b − (a + b)2L2 ()
2
−T (uj + b − (a + b))2L2 ()
δ 1
= uj − a + T ∗ g − T (a + b) 2L2 ()
2 δ
+λ |D(uj + b)| + Φ(a, b, g)
un,k+1
j = arg min Jjs (uj + b, un,k
j + b), k ≥ 0, (39)
uj ∈Uj
n+1
where b = i<j ui + i>j (θi )un for the alternating algorithm (cf. Algo-
rithm 10) and b = (1 − θj )un for the parallel version (cf. Algorithm 9) for
j = 1, . . . , M. Note that the sequence (un,k
j )k generated by (39) converges to
a minimizer un+1j of the corresponding subproblems of Algorithms 9 and 10
(Daubechies et al. 2007).
By introducing small stripes around the interfaces of the subdomains as in Fig. 5,
j ⊂ \ j is a small stripe around the interface between j and \ j and
i.e.,
by the splitting property of the total variation
|D(uj + b)| = |D(uj + b)| ∪ | + |D(b)|\( ∪ ) |
j
j ∪ j j j )
\(j ∪ j j
+ |b+ − b− | dHd−1 (x),
j )∩∂(\(j ∪
∂(j ∪ j ))
n,k 1 ∗ n,k
where Uj :={u ∈ L () : supp(u)⊂j }, zj = uj + δ T g−T (uj +b)
2
| j
j ∪
and b ∈ i =j Ui as above. Note that such a splitting holds for overlapping and non-
overlapping domain decompositions. Moreover, in case of an overlapping domain
decomposition in Fornasier et al. (2010) for a discrete setting, the subproblems are
completely restricted to j , j = 1, . . . , M, respectively, due to an induced trace
j is replaced by j := ∂j \ ∂ and the constraint in (40) is then a
condition, i.e.,
trace condition on j . In Fornasier et al. (2010) and Fornasier and Schönlieb (2009)
the resulting subspace minimization problems are solved by oblique thresholding,
which is based on an iterative proximity map algorithm and the computation of a
Lagrange multiplier by a fixed point iteration. In order to speed up the computation,
in Langer et al. (2013) each subproblem is suggested to be solved by a Bregmanized
operator splitting – split Bregman algorithm.
In practice in order to obtain an approximation of the subspace minimization
problems of Algorithms 9 and 10, only a finite number of (inner) iterations of
(39) can be performed. Nevertheless, the respective generated sequence (un )n of
Algorithms 9 and 10 still satisfies the following convergence properties:
Of course this does not imply the convergence of the sequence (un )n to a minimizer
of J ; cf. Example 3. Nevertheless, it means that independently how accurately
the subdomain problems are solved, the overall convergence is untouched. In a
finite difference setting a similar estimate as the one in (38) can be shown (see
Hintermüller and Langer (2014)), which again provides an upper bound of the
distance between the obtained limit and a minimizer of the global problem.
We have seen that for the predual problem (14), the domain decomposition methods,
which are guaranteed to converge to a minimizer of the original global problem, can
be constructed. Based on these methods one can pursue the following strategy in
order to design a domain decomposition method for problem (12): The domain
decomposition methods in Algorithms 1, 2, 3, 4, and 5 are constituted by its
subdomain problems. Then the dual problems of these subdomain problems are
computed, yielding a sequence of subdomain problems of the primal problem. Due
to predualization and dualization, the final constituted domain decomposition meth-
ods of the primal problem (12) look different than the splitting strategies presented
in section “Basic Domain Decomposition Approach”. Using this idea in Langer
and Gaspoz (2019) and Lee and Nam (2017) overlapping and non-overlapping
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 413
1
arg min || div v + f ||2L2 () : v ∈ H0 (div, ), |v(x)| 2 ≤ β(x) f.a.a. x ∈
2
(42)
where
β := λθj with θj ≥ 0 defined as in (21), (22), and
(23), f =
n+1
div i<j pi + i>j θi p + g for Algorithm 2 and f = div
n
i =j θi p
n +g
for Algorithm 3 for any n ≥ 0. If β : → R+ 0 , β ∈ H () ∩ C(),
1
∇βL∞ () < ∞, and supp(β) ⊆ , then a Fenchel dual of (42) is given by
1
arg min u − f 2L2 () + β|Du| , (43)
u∈L2 () 2
whose minimizer
is unique (Langer and Gaspoz 2019). Here and in the sequel, the
expression β|Du| describes the integral of β on with respect to the measure
|Du|, where Du is the distributional gradient of u. Hence, the subdomain problems
of our domain decomposition method are of the form (43). In order that (43) is well
defined, a partition of unity function needs to have the following properties:
M
θj ≡ 1 and θj ≥ 0 a.e. on for j = 1, 2, . . . , M, (44)
i=1
414 A. Langer
u∗ = div p∗ + f.
M−1
n+1
fM = un+1
j − fjn+1 + g.
j =1
end for
un+1 = g + M n+1
j =1 uj − fjn+1 (= un+1
M )
end for
The strong convergence is due to the fact that (un L2 () )n is monotonically
decreasing. We remark that the boundedness assumption on (fjn )n , j = 1, . . . , M,
is essential for the convergence proof, but this assumption automatically holds in a
finite dimensional setting, which is, for example, the situation when the considered
problem is discretized.
The parallel version of Algorithm 11 is presented in Algorithm 12.
Note that here for the update of fjn+1 an averaging (relaxation) is introduced, which
is necessary for theoretical reasons in order to guarantee a similar convergence result
as for the successive algorithm.
Subspace Minimization
Let us turn now to the question how to realize the subspace minimization problems
of Algorithms 11 and 12 and restrict them to the respective subdomains. We
consider, for example, the subspace minimization with respect to u1 , i.e.,
1
un+1
1 = arg min u1 − f1n+1 2L2 () + λ θ1 |Du1 |, (48)
u1 ∈L ()
2 2
by anticipating that the arguments are analogue for the other subdomain problems.
There are two different approaches on how to compute the solution of (48) by
solving a minimization on 1 only. These two approaches relate to “First optimize
then discretize” and “First discretize then optimize,” where the optimization part
allows to restrict the problem to the subdomain. Hence, the first approach restricts
the minimization problem in an infinite dimensional setting before discretization,
while the second approach first discretizes (48) and then restricts the optimization
process to the subdomain 1 .
First optimize then discretize The restriction of the subproblem is based on the
following statement, cf. Langer and Gaspoz (2019, Lemma 2.2).
Utilizing Lemma 1 one can show that the minimizer of (48) can be computed by
solving a minimization problem in 1 only.
(f1n+1 − un+1
1 ,v − un+1
1 )+λ θ1 |Dun+1
1 | ≤λ θ1 |Dv| ∀v ∈ L2 ().
1 1
Due to the presence of the function θ1 , the usual total variation minimization
techniques cannot be used directly to compute a minimizer of the optimization
problem in (49), but may be used after being adapted to locally weighted total
variation minimization. We note that the minimization of locally weighted total
variation has been already considered in the literature (see, for example, Langer
(2017a)), where an algorithm for solving a minimization problem of the type (48) is
already presented. An alternative method modifying the split Bregman algorithm
(Goldstein and Osher 2009) to locally weighted total variation minimization is
proposed in Langer and Gaspoz (2019). Utilizing one of these methods for a
practical implementation would then require a suitable discretization.
First discretize then optimize Since Algorithms 11 and 12 are designed for an
overlapping splitting, let h be a discrete rectangular image domain containing N1 ×
N2 pixels, N1 , N2 ∈ N, and decomposed into overlapping subdomains hi , i =
1, . . . , M such that h = M i=1 i and for any i ∈ {1, . . . , M} there exists at least
h
one j ∈ {1, . . . , M} \ {i} such that hi ∩ hj = ∅. Moreover, we use the finite
difference discretization introduced in section “Finite Difference Setting”. Then the
discretized version of (48) is written as
1
uh,n+1
1 = arg min uh1 − f1h,n+1 2X + λ θ1h (x)|∇
h h
u1 (x)| 2 , (50)
uh ∈X 2
1 x∈h
where θ1h ∈ X is the discrete version of the above introduced θ1 satisfying (44), (45),
and (46). Since θ1h (x) = 0 for all x ∈ h \ h1 we can write the above minimization
problem as
⎧
⎪
⎪f h,n+1 in h \ h1
⎪
⎨ 1
uh,n+1 = arg min 12 uh1 − f1h,n+1 2X1 + λ θ1h (x)|∇
h h
u1 (x)| 2 in h1 ,
1 ⎪
⎪
⎪u1 | ∈X1
h
⎩ h h
x∈1
1
(51)
where uh1 ∈ X is such that uh1 (x) = f1h,n+1 (x) for x ∈ h \ h1 . Hence, in order to
obtain uh,n+1
1 , only a minimization problem in h1 has to be solved, i.e.,
1 h
arg min u − f1h,n+1 2X1 + λ θ1h (x)|(∇
h h
u1 )| h (x)| 2 .
uh1 | ∈X1 2 1 1
h
x∈h1
1
Note that ∇
h is not a local operator, but nonetheless quite local, i.e., it affects only
the neighboring pixels. Hence, by carefully considering the restriction to h1 (i.e.,
we use Dirichlet boundary conditions on the interface between h1 and h \ h1 ),
uh,n+1
h ∈ X1 is obtained by solving an optimization in h1 only. Consequently
1,1
locally weighted total variation minimization techniques may be used by carefully
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 419
adjusting the gradient operator of the total variation term. An implementation based
on the split Bregman algorithm is presented in Langer and Gaspoz (2019), which
allows to obtain uh,n+1
h by solving a linear system only of size |1 |.
h
1,1
Let us mention that all the results presented in this section hold symmetrically
for the minimization with respect to ui , i = 2, . . . , M and that the notations should
be just adjusted accordingly.
1
arg min uhj − fjh,n+1 2Xj + λ |∇
h h
uj (x)| 2 ,
uj ∈Xj 2
x∈hj
1
J h,s (uh , a h ) := J h (uh ) + δuh − a h 2X − T h (uh − a h )2X ,
2
1 1 ∗ ∗
arg min J h,s (uh , a h ) = arg min uh − (T h g h + (δ − T h T h )a h )2X
uh ∈X uh ∈X 2 δ
λ
+ |∇uh (x)| 2
δ
x∈h
Since in each iteration we have to solve a problem which is of the same type as
(12) with T = I , we may use Algorithm 11 or Algorithm 12 now in a non-
overlapping and finite difference setting to speed up the solution process, leading
to Algorithms 13 and 14.
k =k+1
end for
end while
h,k
uh,n+1 = f h,n+1 − M
j =1 qj and vjh,n+1 = qjh,k for j = 1, . . . , M
end for
In Lee and Nam (2017) it is shown for M = 2 that these algorithms produce
sequences (un )n whose accumulation points are minimizers of J h .
11 Domain Decomposition for Non-smooth (in Particular TV) Minimization 421
Conclusion
Domain decomposition methods are known to be one of the most successful meth-
ods to construct efficient solvers for large-scale problems. Nevertheless, only quite
recently such methods are developed for total variation minimization. Therefore,
the research in this direction is far from being complete, as only very little is known
yet. We summarize that the domain decomposition algorithms for total variation
minimization with a theoretical guarantee to convergence to the minimizer of the
global problem are till now given for (i) the discrete predual problem with a non-
overlapping decomposition using finite differences (Hintermüller and Langer 2015)
or finite elements (Lee et al. 2019b; Lee and Park 2019b), (ii) the continuous
predual problem with an overlapping decomposition (Chang et al. 2015), (iii) the
discrete primal problem with a non-overlapping decomposition (Lee and Nam
2017), (iv) and the continuous primal problem with an overlapping decomposition
(Langer and Gaspoz 2019). This list of achievements indicates that constructing
overlapping domain decomposition methods in an infinite dimensional setting
seems easier than non-overlapping domain decomposition methods. A reason for
this may be guessed when one looks at the Poisson problem (see section “Basic Idea
of Domain Decomposition”). There one sees that in order to construct convergent
non-overlapping methods, the subdomain problems differ in each subdomain due to
the interface conditions, while in the overlapping situation all subdomain problems
are of the same type. This ostensible flexibility in creating subdomain problems for
a non-overlapping splitting may lead to additional difficulties for problems where
the solution is discontinuous, as the interface conditions are not clear. In particular,
neither of the interface conditions in (2) are suitable.
For the domain decomposition methods tackling the predual problem (14), not
only the convergence but also the convergence order is known. We note that the
decomposition methods for the continuous problems only cover the image denoising
case, i.e., the L2 -TV model with T = I , while the methods for the discretized
objectives can also handle image inpainting and image segmentation problems. The
primal-dual approach in Lee et al. (2019a) is even successfully applied to image
deblurring. Of course, by using the surrogate idea (also called operator splitting
(Combettes and Wajs 2005)), the L2 -TV model can be cast to an image denoising
type of problem for any operator T . But it is in general unclear how accurately
the solution of the domain decomposition iteration has to be computed in order to
guarantee the convergence of the outer surrogate iteration. Interesting tasks arising,
for example, in medical imaging where T might be a sampled Fourier transform
or Radon transform, which are very global operators, have not yet been thoroughly
considered.
422 A. Langer
References
Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Probl. 10(6), 1217–1229 (1994)
Alliney, S.: A property of the minimum vectors of a regularizing functional defined by means of
the absolute norm. IEEE Trans. Signal Process. 45(4), 913–917 (1997)
Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity
Problems. Oxford Mathematical Monographs. The Clarendon Press/Oxford University Press,
New York (2000)
Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and BV Spaces. MOS-
SIAM Series on Optimization, 2nd edn. Society for Industrial and Applied Mathematics
(SIAM)/Mathematical Optimization Society, Philadelphia (2014). Applications to PDEs and
optimization
Aubert, G., Aujol, J.-F.: A variational approach to removing multiplicative noise. SIAM J. Appl.
Math. 68(4), 925–946 (2008)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press,
New York (2014)
Burger, M., Sawatzky, A., Steidl, G.: First order algorithms in variational image processing. In:
Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computa-
tion, pp. 345–407. Springer, Cham (2016)
Cai, J.-F., Chan, R.H., Nikolova, M.: Two-phase approach for deblurring images corrupted by
impulse plus Gaussian noise. Inverse Probl. Imaging 2(2), 187–204 (2008)
Calatroni, L., De Los Reyes, J.C., Schönlieb, C.-B.: Infimal convolution of data discrepancies for
mixed noise removal. SIAM J. Imaging Sci. 10(3), 1196–1233 (2017)
Carstensen, C.: Domain decomposition for a non-smooth convex minimization problem and its
application to plasticity. Numer. Linear Algebra Appl. 4(3), 177–190 (1997)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging
Vis. 20(1–2), 89–97 (2004). Special issue on mathematics and image analysis
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: A remark on accelerated block coordinate descent for computing the
proximity operators of a sum of convex functions. SMAI J. Comput. Math. 1, 29–54 (2015)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25,
161–319 (2016)
Chambolle, A., Caselles, V., Cremers, D., Novaga, M., Pock, T.: An introduction to total variation
for image analysis. Theor. Found. Numer. Methods Sparse Recovery 9, 263–340 (2010)
Chan, T.F., Mathew, T.P.: Domain decomposition algorithms. In: Acta Numerica, pp. 61–143.
Cambridge University Press, Cambridge (1994)
Chan, T.F., Shen, J.J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic
Methods. SIAM, Philadelphia (2005)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Chang, H., Zhang, X., Tai, X.-C., Yang, D.: Domain decomposition methods for nonlocal total
variation image restoration. J. Sci. Comput. 60(1), 79–100 (2014)
Chang, H., Tai, X.-C., Wang, L.-L., Yang, D.: Convergence rate of overlapping domain decom-
position methods for the Rudin–Osher–Fatemi model based on a dual formulation. SIAM J.
Imaging Sci. 8(1), 564–591 (2015)
Chen, K., Tai, X.-C.: A nonlinear multigrid method for total variation minimization from image
restoration. J. Sci. Comput. 33(2), 115–138 (2007)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale
Model. Simul. 4(4), 1168–1200 (electronic) (2005)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Daubechies, I., Teschke, G., Vese, L.: Iteratively solving linear inverse problems under general
convex constraints. Inverse Probl. Imaging 1(1), 29–46 (2007)
Dolean, V., Jolivet, P., Nataf, F.: An Introduction to Domain Decomposition Methods: Algorithms,
Theory, and Parallel Implementation, vol. 144. SIAM, Philadelphia (2015)
Duan, Y., Tai, X.-C.: Domain decomposition methods with graph cuts algorithms for total variation
minimization. Adv. Comput. Math. 36(2), 175–199 (2012)
Duan, Y., Chang, H., Tai, X.-C.: Convergent non-overlapping domain decomposition methods for
variational image segmentation. J. Sci. Comput. 69(2), 532–555 (2016)
Fornasier, M.: Domain decomposition methods for linear inverse problems with sparsity con-
straints. Inverse Probl. Int. J. Theory Pract. Inverse Probl. Inverse Methods Comput. Inversion
Data 23(6), 2505–2526 (2007)
Fornasier, M., Schönlieb, C.-B.: Subspace correction methods for total variation and l1 -
minimization. SIAM J. Numer. Anal. 47(5), 3397–3428 (2009)
Fornasier, M., Langer, A., Schönlieb, C.-B.: Domain decomposition methods for compressed
sensing. In: Proceedings of the International Conference of SampTA09, Marseilles, arXiv
preprint arXiv:0902.0124 (2009)
Fornasier, M., Langer, A., Schönlieb, C.-B.: A convergent overlapping domain decomposition
method for total variation minimization. Numerische Mathematik 116(4), 645–685 (2010)
Fornasier, M., Kim, Y., Langer, A., Schönlieb, C.: Wavelet decomposition method for L2 /TV-
image deblurring. SIAM J. Imaging Sci. 5(3), 857–885 (2012)
Getreuer, P., Tong, M., Vese, L.A.: A variational model for the restoration of MR images corrupted by blur and Rician noise. In: International Symposium on Visual Computing, pp. 686–698.
Springer (2011)
Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Multiscale Model.
Simul. 7(3), 1005–1028 (2009)
Giusti, E.: Minimal Surfaces and Functions of Bounded Variation. Monographs in Mathematics,
vol. 80. Birkhäuser Verlag, Basel (1984)
Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging
Sci. 2(2), 323–343 (2009)
Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained
optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
Hintermüller, M., Langer, A.: Subspace correction methods for a class of nonsmooth and
nonadditive convex variational problems with mixed L1 /L2 data-fidelity in image processing.
SIAM J. Imaging Sci. 6(4), 2134–2173 (2013)
Hintermüller, M., Langer, A.: Surrogate functional based subspace correction methods for image
processing. In: Domain Decomposition Methods in Science and Engineering XXI, pp. 829–837.
Springer, Cham (2014)
Hintermüller, M., Langer, A.: Non-overlapping domain decomposition methods for dual total
variation based image denoising. J. Sci. Comput. 62(2), 456–481 (2015)
Hintermüller, M., Rautenberg, C.: On the density of classes of closed convex sets with pointwise
constraints in Sobolev spaces. J. Math. Anal. Appl. 426(1), 585–593 (2015)
Hintermüller, M., Rautenberg, C.N.: Optimal selection of the regularization function in a weighted
total variation model. Part I: Modelling and theory. J. Math. Imaging Vis. 59(3), 498–514 (2017)
Ito, K., Kunisch, K.: Lagrange Multiplier Approach to Variational Problems and Applications,
vol. 15. SIAM, Philadelphia (2008)
Langer, A.: Automated parameter selection for total variation minimization in image restoration.
J. Math. Imaging Vis. 57(2), 239–268 (2017a)
Langer, A.: Automated parameter selection in the L1 -L2 -TV model for removing Gaussian plus
impulse noise. Inverse Probl. 33(7), 74002 (2017b)
Langer, A.: Locally adaptive total variation for removing mixed Gaussian–impulse noise. Int. J.
Comput. Math. 96(2), 298–316 (2019)
Langer, A., Gaspoz, F.: Overlapping domain decomposition methods for total variation denoising.
SIAM J. Numer. Anal. 57(3), 1411–1444 (2019)
Langer, A., Osher, S., Schönlieb, C.-B.: Bregmanized domain decomposition for image restoration.
J. Sci. Comput. 54(2–3), 549–576 (2013)
Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by Poisson noise. J. Math. Imaging Vis. 27(3), 257–263 (2007)
Lee, C.-O., Nam, C.: Primal domain decomposition methods for the total variation minimization,
based on dual decomposition. SIAM J. Sci. Comput. 39(2), B403–B423 (2017)
Lee, C.-O., Park, J.: Fast nonoverlapping block Jacobi method for the dual Rudin–Osher–Fatemi
model. SIAM J. Imaging Sci. 12(4), 2009–2034 (2019a)
Lee, C.-O., Park, J.: A finite element nonoverlapping domain decomposition method with Lagrange multipliers for the dual total variation minimizations. J. Sci. Comput. 81(3), 2331–2355 (2019b)
Lee, C.-O., Lee, J.H., Woo, H., Yun, S.: Block decomposition methods for total variation by
primal–dual stitching. J. Sci. Comput. 68(1), 273–302 (2016)
Lee, C.-O., Nam, C., Park, J.: Domain decomposition methods using dual conversion for the total
variation minimization with L1 fidelity term. J. Sci. Comput. 78(2), 951–970 (2019a)
Lee, C.-O., Park, E.-H., Park, J.: A finite element approach for the dual Rudin–Osher–Fatemi
model and its nonoverlapping domain decomposition methods. SIAM J. Sci. Comput. 41(2),
B205–B228 (2019b)
Lions, J.L.: Optimal Control of Systems Governed by Partial Differential Equations. Die
Grundlehren der mathematischen Wissenschaften, vol. 170. Springer (1971)
Lions, P.-L.: On the Schwarz alternating method. I. In: First International Symposium on Domain
Decomposition Methods for Partial Differential Equations, Paris, pp. 1–42 (1988)
Marini, L.D., Quarteroni, A.: A relaxation procedure for domain decomposition methods using
finite elements. Numerische Mathematik 55(5), 575–598 (1989)
Mathew, T.: Domain Decomposition Methods for the Numerical Solution of Partial Differential
Equations, vol. 61. Springer Science & Business Media, Berlin (2008)
Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application
to the processing of outliers. SIAM J. Numer. Anal. 40(3), 965–994 (electronic) (2002)
Nikolova, M.: A variational approach to remove outliers and impulse noise. J. Math. Imaging Vis.
20(1–2), 99–120 (2004)
Peyré, G., Bougleux, S., Cohen, L.: Non-local regularization of inverse problems. In: European
Conference on Computer Vision, pp. 57–68. Springer (2008)
Pock, T., Unger, M., Cremers, D., Bischof, H.: Fast and exact solution of total variation models on the GPU. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8. IEEE (2008)
Quarteroni, A., Valli, A.: Domain Decomposition Methods for Partial Differential Equations.
Oxford University Press, New York (1999)
Raviart, P.-A., Thomas, J.-M.: A mixed finite element method for 2-nd order elliptic problems. In:
Mathematical Aspects of Finite Element Methods, pp. 292–315. Springer, Berlin (1977)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D: Nonlinear Phenom. 60(1), 259–268 (1992)
Schönlieb, C.-B.: Total variation minimization with an H −1 constraint. CRM Ser. 9, 201–232
(2009)
Schwarz, H.A.: Über einige Abbildungsaufgaben. Journal für die reine und angewandte Mathe-
matik 1869(70), 105–120 (1869)
Smith, B., Bjorstad, P., Gropp, W.: Domain Decomposition: Parallel Multilevel Methods for
Elliptic Partial Differential Equations. Cambridge University Press, Dordrecht (2004)
Tai, X.-C.: Rate of convergence for some constraint decomposition methods for nonlinear
variational inequalities. Numerische Mathematik 93(4), 755–786 (2003)
Tai, X.-C., Tseng, P.: Convergence rate analysis of an asynchronous space decomposition method
for convex minimization. Math. Comput. 71(239), 1105–1135 (2002)
Tai, X.-C., Xu, J.: Global and uniform convergence of subspace correction methods for some
convex optimization problems. Math. Comput. 71(237), 105–124 (2002)
Toselli, A., Widlund, O.: Domain Decomposition Methods: Algorithms and Theory, vol. 34.
Springer Science & Business Media, Dordrecht (2006)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization.
J. Optim. Theory Appl. 109(3), 475–494 (2001)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization.
Math. Prog. 117(1–2), 387–423 (2009)
Vonesch, C., Unser, M.: A fast multilevel algorithm for wavelet-regularized image restoration.
IEEE Trans. Image Process. 18(3), 509–523 (2009)
Warga, J.: Minimizing certain convex functions. J. Soc. Indust. Appl. Math. 11, 588–593 (1963)
Wright, S.J.: Coordinate descent algorithms. Math. Prog. 151(1), 3–34 (2015)
Wu, C., Tai, X.-C.: Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Xu, J., Tai, X.-C., Wang, L.-L.: A two-level domain decomposition method for image restoration.
Inverse Probl. Imaging 4(3), 523–545 (2010)
Xu, J., Chang, H.B., Qin, J.: Domain decomposition method for image deblurring. J. Comput.
Appl. Math. 271, 401–414 (2014)
Zhang, X., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvo-
lution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
12  Fast Numerical Methods for Image Segmentation Models
Noor Badshah
Contents
Introduction  428
Mathematical Models for Image Segmentation  429
Two-Phase Segmentation Models  429
Snakes: Active Contour Model  429
Geodesic Active Contour Model (GAC)  430
Chan-Vese Model  431
Fast Numerical Methods  433
Multigrid Solver for Solving a Class of Variational Problems with Application to Image Segmentation  446
Sobolev Gradient Minimization of Curve Length in Chan-Vese Model  449
Multiphase Image Segmentation  452
Multigrid Method for Multiphase Segmentation Model  452
Multigrid Method with Typical and Modified Smoother  454
Local Fourier Analysis and a Modified Smoother  455
Convex Multiphase Image Segmentation Model  460
A Three-Stage Approach for Multiphase Segmentation of Degraded Color Images  466
Stage 2: Dimension Lifting with Secondary Color Space  468
Selective Segmentation Models  469
Image Segmentation Under Geometrical Conditions  469
Active Contour-Based Image Selective Model  471
Dual-Level Set Selective Segmentation Model  475
One-Level Selective Segmentation Model  477
Reproducible Kernel Hilbert Space-Based Image Segmentation  479
An Optimization-Based Multilevel Algorithm for Selective Image Segmentation Models  485
N. Badshah ()
Department of Basic Sciences, University of Engineering and Technology, Peshawar, Pakistan
Abstract
Keywords
Introduction
Image segmentation is one of the fundamental tasks in image analysis and computer vision. The purpose of image segmentation is to partition a given image into different meaningful regions based on intensity homogeneity, pattern similarity, color similarity, etc. The goal is to represent an image in such a way that it can be easily analyzed. There are two main concerns related to image segmentation: (i) modeling image segmentation problems and (ii) fast and advanced numerical methods for the solution of the partial differential equations arising from the minimization of these models. Many algorithms/models are available in the literature for the solution of image segmentation problems; some of them use edge information and others region information of the image for segmentation purposes. The most basic edge-based models are the snake model (Kass et al. 1988) and the geodesic active contour model (Caselles et al. 1997), which rely on edge information in the image: a gradient-based stopping term attracts the contour to object boundaries, where sudden changes in the gradient occur. The Chan-Vese model (Chan and Vese 2001) is based on the variation within regions; for that purpose it uses region statistics as a stopping criterion.
The Allen-Cahn (AC) equation was originally introduced as a phenomenological model for antiphase domain coarsening in a binary alloy (Allen and Cahn 1979). This equation can be used to model flow problems based on mean curvature. This type of flow is one of the important elements of active contour-based image segmentation models. For these types of methods, there exist very fast computational methods such as the multigrid method (Badshah and Chen 2008). This chapter is dedicated to the minimization techniques of various models developed for the segmentation of images, which lead to highly nonlinear partial differential equations. Some well-known numerical methods for the solution of these partial differential equations are discussed.
Mathematical Models for Image Segmentation
Image segmentation links low-level vision with high-level vision. It is the process of partitioning an image into a collection of objects which can later be used for performing high-level tasks such as object detection, tracking, recognition, etc. The current section is about existing mathematical models developed for image segmentation. Active contour models have attracted much attention in image segmentation. These models are divided into two groups, namely, (i) edge-based segmentation models and (ii) region-based segmentation models. In the next section, edge-based active contour models are discussed in detail.
Snakes: Active Contour Model
$$F^K(\Upsilon(p)) = \alpha\int_0^1\left|\frac{\partial\Upsilon(p)}{\partial p}\right|^2 dp + \beta\int_0^1\left|\frac{\partial^2\Upsilon(p)}{\partial p^2}\right|^2 dp + \lambda\int_0^1 e^2\big(\nabla(z*K_\sigma)(\Upsilon(p))\big)\,dp,\qquad(1)$$
430 N. Badshah
where $\alpha > 0$, $\beta > 0$, and $\lambda > 0$ are the trade-off parameters, and $e$ is the edge detector function given by
$$e\big(\nabla(z*K_\sigma)\big) = \frac{1}{1 + \gamma\,|\nabla(z*K_\sigma)|^2},\qquad(2)$$
where $K_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\exp\!\big(-\big((x-\mu_x)^2 + (y-\mu_y)^2\big)/2\sigma^2\big)$ is the well-known Gaussian kernel and $\gamma$ is a positive parameter. $F^K$ is a nonconvex functional (Kass et al. 1988), so its minimization can easily get stuck at local minima. A local minimum of $F^K$ is a solution of the following Euler-Lagrange equation:
$$-\alpha\,\frac{\partial^2\Upsilon}{\partial p^2} + \beta\,\frac{\partial^4\Upsilon}{\partial p^4} + \lambda\,\nabla e^2 = 0.\qquad(3)$$
The numerical solution of this fourth-order partial differential equation can be found by using the finite difference method (Kass et al. 1988).
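A minimal sketch of such a finite difference (semi-implicit) step is given below, assuming a closed contour sampled at n points (xs, ys) and external force components fx, fy (e.g., the gradient of the edge term sampled at the contour points); the periodic penta-diagonal matrix carries the internal-energy terms of Eq. (3).

```python
import numpy as np

def snake_step(xs, ys, fx, fy, alpha, beta, dt):
    """One semi-implicit step for a closed snake: the internal-energy
    matrix (from alpha and beta) is treated implicitly, the external
    force (fx, fy) explicitly."""
    n = len(xs)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = -alpha - 4 * beta
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
    M = np.linalg.inv(np.eye(n) + dt * A)   # (I + dt*A)^{-1}
    return M @ (xs + dt * fx), M @ (ys + dt * fy)
```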
Geodesic Active Contour Model (GAC)
In 1997, Caselles et al. proposed another edge-based model using a new type of curve parametrization. This is an improvement over the snake energy functional (Kass et al. 1988). The energy functional of the GAC model is given by the following:
$$F^C(\Upsilon(p)) = \int_0^1 e\big(|\nabla z(\Upsilon(p))|\big)\,|\Upsilon'(p)|\,dp.\qquad(4)$$
Given that $L(\Upsilon)$ represents the Euclidean length of the moving contour $\Upsilon$ and since $L(\Upsilon) = \int_0^1 |\Upsilon'(p)|\,dp = \int_0^{L(\Upsilon)} ds$, where $ds$ is the Euclidean length element, Eq. (4) may be written as follows:
$$F^C(\Upsilon(p)) = \int_0^{L(\Upsilon)} e\big(|\nabla z(\Upsilon(p))|\big)\,ds.\qquad(5)$$
This energy functional introduces a new length measure, namely the Euclidean length element $ds$ weighted by the edge detector $e$, which uses edge information (Aubert and Kornprobst 2002). The function $e$ is the same as given in (2). The equivalence between minimizing $F^C$ and minimizing $F^K$ for $\beta = 0$ was studied in Caselles et al. (1997). The direction in which $F^C$ decreases most rapidly provides the following minimization flow (more details of its derivation can be found in Caselles et al. 1997):
$$\frac{\partial\Upsilon}{\partial t} = e\,\kappa\,\vec N - (\nabla e\cdot\vec N)\,\vec N,\qquad(6)$$
where $\kappa$ denotes the curvature and $\vec N$ the unit normal vector. This equation drives the contour toward the optimal length. The steady-state solution of (6) is the solution of the Euler-Lagrange equation for the energy functional given in Eq. (5). Introducing the level set idea, the evolution equation takes the following form:
$$\frac{\partial\phi}{\partial t} = |\nabla\phi|\left(\nabla\cdot\Big(e\,\frac{\nabla\phi}{|\nabla\phi|}\Big) + \nu_1 e\right),\qquad(7)$$
where $\phi$ is a level set function and the contour $\Upsilon$ is its zero level set $\{\phi(x, y) = 0\}$. A balloon term $\nu_1 e$, with $\nu_1 > 0$, is included to speed up the convergence.
These models are based on the edge detector $e$, which uses the gradient of the image, so they can only detect objects whose boundaries are defined by gradients. Moreover, in practice the discrete gradients are bounded, and hence the stopping function $e$ may not vanish on the boundaries, so the contour may leak through the image edges (Chan and Vese 2001). These models may also not work very well on noisy images.
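The following small sketch (an illustration, not the authors' code) shows how the edge-stopping function of Eq. (2) and one explicit update of the level set flow (7) could be implemented; an explicit time step requires a small dt for stability.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_stopping(z, sigma=1.0, gamma=1.0):
    """Edge detector e of Eq. (2): small where |grad(G_sigma * z)| is large."""
    zs = gaussian_filter(z.astype(float), sigma)
    gy, gx = np.gradient(zs)
    return 1.0 / (1.0 + gamma * (gx**2 + gy**2))

def gac_step(phi, e, dt=0.1, nu1=1.0, eps=1e-8):
    """One explicit update of the GAC level set flow (7):
    phi_t = |grad phi| ( div(e grad phi/|grad phi|) + nu1*e )."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx**2 + gy**2) + eps
    qx, qy = e * gx / mag, e * gy / mag              # e times unit normal
    div = np.gradient(qx, axis=1) + np.gradient(qy, axis=0)
    return phi + dt * np.sqrt(gx**2 + gy**2) * (div + nu1 * e)
```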
Chan-Vese Model
In 2001, Chan and Vese proposed a region-based energy functional which uses data
fitting statistics as a stopping process and is a special case of piecewise constant
Mumford-Shah model (Mumford and Shah 1989). Let z be the known bounded
function (image data) and assume that z has two regions (say foreground and
background) of approximately constant intensities zi and zo . Assume that the object
to be detected is represented by the region with intensity zi and its boundary is Γ0 .
Let the average intensities approximating $z_i$ and $z_o$ be $c_1$ and $c_2$, respectively, and let $\Gamma$ be the interface separating the regions where the average intensities are $c_1$ and $c_2$. Based on the assumption of constant average intensities in the two regions, the following energy is introduced:
$$F^{CV}(c_1, c_2, \Gamma) = \mu\,\mathrm{Length}(\Gamma) + \nu\,\mathrm{Area}\big(\mathrm{inside}(\Gamma)\big) + \eta\int_{\mathrm{inside}(\Gamma)}(z - c_1)^2\,d\Omega + \gamma\int_{\mathrm{outside}(\Gamma)}(z - c_2)^2\,d\Omega,\qquad(8)$$
and the segmentation is obtained by solving
$$\min_{c_1, c_2, \Gamma} F^{CV}(c_1, c_2, \Gamma),\qquad(9)$$
where $F^{CV}$ is given in Eq. (8). This functional is a special case of the piecewise constant Mumford and Shah energy functional (Mumford and Shah 1989). Representing $\Gamma$ as the zero level set of a Lipschitz function $\phi$ and using the Heaviside function $H$, the energy can be written in level set form as
$$F^{CV}(c_1, c_2, \phi) = \mu\int_\Omega \delta(\phi)|\nabla\phi|\,d\Omega + \nu\int_\Omega H(\phi)\,d\Omega + \eta\int_\Omega (z - c_1)^2 H(\phi)\,d\Omega + \gamma\int_\Omega (z - c_2)^2\big(1 - H(\phi)\big)\,d\Omega.\qquad(10)$$
Once the optimal value φ is obtained, the final solution (segmented image) can be
found by using the following:
u = c1 H (φ) + c2 (1 − H (φ)).
For the existence of minimizers and the relation to the Mumford and Shah model, please see Chan and Vese (2001). It must be noted that $c_1$, $c_2$ are the optimal average constant intensities inside and outside the curve $\{\phi = 0\}$, and $H(\phi)$ is the Heaviside function, used as a region descriptor. Due to the discontinuity of the Heaviside function at the origin, a regularized Heaviside function $H_\varepsilon(\phi)$ is introduced, and the functional (10) is minimized with respect to $\phi$ to get the following differential equation:
$$\begin{cases}\delta_\varepsilon(\phi)\left[\mu\,\nabla\cdot\dfrac{\nabla\phi}{|\nabla\phi|} - \nu - \eta(z - c_1)^2 + \gamma(z - c_2)^2\right] = 0 & \text{in }\Omega,\\[2mm] \dfrac{\delta_\varepsilon(\phi)}{|\nabla\phi|}\,\dfrac{\partial\phi}{\partial n} = 0 & \text{on }\partial\Omega.\end{cases}\qquad(11)$$
Note that the steady-state solution of this parabolic partial differential equation
will give solution of the corresponding elliptic partial differential equation given
in Eq. (11). This is a nonlinear partial differential equation whose solution is done
through fast numerical methods which are discussed in the next section.
Semi-implicit Method
Consider the following evolution problem which is obtained from minimization of
Chan-Vese model:
$$c_1(\phi) = \frac{\int_\Omega z\,H_\varepsilon(\phi)\,d\Omega}{\int_\Omega H_\varepsilon(\phi)\,d\Omega},\qquad c_2(\phi) = \frac{\int_\Omega z\,\big(1 - H_\varepsilon(\phi)\big)\,d\Omega}{\int_\Omega \big(1 - H_\varepsilon(\phi)\big)\,d\Omega},$$
$$\frac{\partial\phi}{\partial t} = \delta_\varepsilon(\phi)\left[\mu\,\nabla\cdot\frac{\nabla\phi}{|\nabla\phi|} - \nu - \eta(z - c_1)^2 + \gamma(z - c_2)^2\right]\quad\text{in }\Omega,$$
$$\phi(0, x, y) = \phi_0(x, y)\ \text{in }\Omega,\qquad \frac{\partial\phi}{\partial n} = 0\ \text{on }\partial\Omega.\qquad(13)$$
For a given initial $\phi$, the constant average intensities $c_1(\phi)$ and $c_2(\phi)$ are computed first, and then $\phi$ is computed by solving the nonlinear PDE in Eq. (13). The steps of the semi-implicit method for the solution of this equation are as follows. Suppose the given input image $z$ has size $m_1\times m_2$, and use a finite difference scheme for the discretization. Let $x, y\in\Omega$ be the spatial variables, $h_1, h_2$ the horizontal and vertical space step sizes, and $\Delta t$ the time step. Divide the image domain into $m_1\times m_2$ grid points, and let $(x_i, y_j) = (ih_1, jh_2)$ for $i = 1, 2, \ldots, m_1$ and $j = 1, 2, \ldots, m_2$. Also let $\phi^k_{i,j} = \phi(k\Delta t, x_i, y_j)$ be an approximation of $\phi(t, x, y)$ at the $k$th iteration, where $k\ge 0$ and $\phi^0 = \phi_0$ is the initial value. Discretizing the parabolic PDE in Eq. (13) by finite differences gives the following nonlinear difference equation used to update $\phi^{(k)}$:
$$\frac{\phi^{k+1}_{i,j} - \phi^{k}_{i,j}}{\Delta t} = \delta_\varepsilon(\phi^k_{i,j})\Bigg[\frac{\mu}{h_1^2}\,\Delta^x_-\!\left(\frac{\Delta^x_+\phi^{k+1}_{i,j}}{\sqrt{(\Delta^x_+\phi^k_{i,j}/h_1)^2 + \big((\phi^k_{i,j+1} - \phi^k_{i,j-1})/2h_2\big)^2 + \beta_1}}\right) + \frac{\mu}{h_2^2}\,\Delta^y_-\!\left(\frac{\Delta^y_+\phi^{k+1}_{i,j}}{\sqrt{\big((\phi^k_{i+1,j} - \phi^k_{i-1,j})/2h_1\big)^2 + (\Delta^y_+\phi^k_{i,j}/h_2)^2 + \beta_1}}\right) - \nu - \eta\big(z_{i,j} - c_1(\phi^k)\big)^2 + \gamma\big(z_{i,j} - c_2(\phi^k)\big)^2\Bigg].$$
Collecting the unknown $\phi^{k+1}_{i,j}$, with $A_1, \ldots, A_4$ denoting the (frozen) diffusion coefficients of the four neighbours, this can be rearranged as
$$\phi^{k+1}_{i,j}\Big[1 + \Delta t\,\mu\,\delta_\varepsilon(\phi^k_{i,j})\big(A_1 + A_2 + A_3 + A_4\big)\Big] = \phi^{k}_{i,j} + \Delta t\,\delta_\varepsilon(\phi^k_{i,j})\Big[\mu\big(A_1\phi^{k+1}_{i+1,j} + A_2\phi^{k+1}_{i-1,j} + A_3\phi^{k+1}_{i,j+1} + A_4\phi^{k+1}_{i,j-1}\big) - \nu - \eta\big(z_{i,j} - c_1(\phi^k)\big)^2 + \gamma\big(z_{i,j} - c_2(\phi^k)\big)^2\Big].\qquad(14)$$
Aφ (k+1) = f (k) ,
where A is a block tri-diagonal matrix, which can be solved by using any iterative
method.
Re-initialization of the level set function is done to prevent the level set function
from becoming too flat. This may be done by solving the following initial value
problem; see for reference Sussman et al. (1994):
$$\begin{cases}\dfrac{\partial\xi}{\partial t} = \operatorname{sgn}\big(\phi(t)\big)\,\big(1 - |\nabla\xi|\big),\\[2mm] \xi(0, \cdot) = \phi(t, \cdot),\end{cases}\qquad(15)$$
where φ is obtained from solution of Eq. (14) (Chan and Vese 2001).
Algorithm: $\phi^{k+1} \leftarrow CV(\phi^k, \mu, tol)$
1. For a given $\phi_0$, calculate the average intensities $c_1$ and $c_2$ using the first two formulas in Eq. (13).
2. Keeping $c_1$ and $c_2$ fixed, find the numerical solution of the PDE in Eq. (13) to obtain $\phi^{k+1}$.
3. Compute $c_1$ and $c_2$ using $\phi^{k+1}$.
4. If $|\phi^{k+1} - \phi^k| < tol$, stop;
5. else re-initialize $\phi$ by solving Eq. (15) and go to step 2.
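Steps 1 and 3 of the algorithm are straightforward to implement; the following sketch (an illustration under the usual arctan regularization of the Heaviside function used by Chan and Vese 2001) computes the regularized Heaviside and the region averages c1, c2 of Eq. (13).

```python
import numpy as np

def heaviside_eps(phi, eps=1.0):
    """Regularized Heaviside H_eps(phi) = 0.5*(1 + (2/pi)*arctan(phi/eps))."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def region_averages(z, phi, eps=1.0):
    """Average intensities c1, c2 inside/outside {phi = 0}, cf. Eq. (13)."""
    H = heaviside_eps(phi, eps)
    c1 = (z * H).sum() / (H.sum() + 1e-12)
    c2 = (z * (1.0 - H)).sum() / ((1.0 - H).sum() + 1e-12)
    return c1, c2
```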
Note that the semi-implicit method for the solution of parabolic partial differential equations is unconditionally stable (Weickert and Kühne 2002) and therefore converges even for large time steps in low-dimensional problems. As the dimension of the problem increases, the bandwidth of the system matrix becomes much larger, which results in a large condition number if the time step is chosen large, whereas a small time step requires a large number of iterations and hence leads to slow convergence. This drawback of the semi-implicit method was also observed in experimental results; see Badshah and Chen (2008) and Weickert et al. (1997) for details.
Applying the additive operator splitting (AOS) scheme (Weickert et al. 1997) to the evolution equation
$$\frac{\partial\phi}{\partial t} = \delta_\varepsilon(\phi)\left[\mu\,\nabla\cdot\frac{\nabla\phi}{|\nabla\phi|} - \nu - \eta(z - c_1)^2 + \gamma(z - c_2)^2\right],\qquad(16)$$
each spatial direction is treated by a one-dimensional semi-implicit solve; in the $x$-direction, for example,
$$\frac{\phi^{k+1}_{i} - \phi^{k}_{i}}{\Delta t} = \delta_\varepsilon(\phi^k_{i})\left[\frac{\phi^{k+1}_{i+1} - \phi^{k+1}_{i}}{|\Delta^x_+\phi^k_{i}|} - \frac{\phi^{k+1}_{i} - \phi^{k+1}_{i-1}}{|\Delta^x_+\phi^k_{i-1}|} + F_i\right],\qquad(17)$$
where $F_i$ collects the remaining (reaction) terms.
Thus, with the AOS method, one solves the problems in the $x$- and $y$-directions with a double time step to obtain two separate solutions, say $\phi_1$ and $\phi_2$, and then averages:
$$\phi = \tfrac12(\phi_1 + \phi_2).$$
Although no stability constraint on the time step is present when the AOS scheme is used, the time step cannot be taken very large, because splitting-related artifacts associated with the loss of rotational invariance emerge. The practical implication is that the number of iterations needed for the contour to converge remains quite large. For large images, the methods discussed so far converge very slowly. To avoid this problem, the multigrid method is the best option.
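The directional solves required by the AOS step are tridiagonal systems. The sketch below (an illustration, not the original implementation) solves one such system per row/column with a banded solver and averages the two directional solutions as described above; here g plays the role of the frozen diffusivity from the previous iterate, and the reaction term F of (17) is omitted for brevity.

```python
import numpy as np
from scipy.linalg import solve_banded

def aos_1d(rhs, g, dt):
    """Solve (I - 2*dt*A_g) u = rhs for one 1D line, where A_g is the
    standard 1D diffusion stencil with diffusivity g (frozen)."""
    n = len(rhs)
    gh = 0.5 * (g[:-1] + g[1:])                 # diffusivity on half-points
    upper = np.zeros(n); lower = np.zeros(n); diag = np.ones(n)
    upper[1:] = -2 * dt * gh
    lower[:-1] = -2 * dt * gh
    diag[:-1] += 2 * dt * gh
    diag[1:] += 2 * dt * gh
    ab = np.vstack([upper, diag, lower])        # banded storage (1,1)
    return solve_banded((1, 1), ab, rhs)

def aos_step(phi, g, dt):
    """One AOS update: x- and y-direction solves with double time step,
    then averaging, phi = 0.5*(phi1 + phi2)."""
    phi1 = np.array([aos_1d(r, gr, dt) for r, gr in zip(phi, g)])
    phi2 = np.array([aos_1d(c, gc, dt) for c, gc in zip(phi.T, g.T)]).T
    return 0.5 * (phi1 + phi2)
```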
Multigrid Method
A multigrid method for the Chan-Vese model proposed by Badshah and Chen (2008)
is presented here. The proposed method is based on the global formulation of
the Chan-Vese model proposed by Chan et al. (2006). Consider the Euler-Lagrange equation deduced from the minimization of the Chan-Vese energy functional, given in (11):
$$\delta_\varepsilon(\phi)\left[\mu\,\mathrm{div}\Big(\frac{\nabla\phi}{|\nabla\phi|}\Big) - \eta\big(z(x, y) - c_1\big)^2 + \gamma\big(z(x, y) - c_2\big)^2\right] = 0.$$
Since $\delta_\varepsilon(\phi)$ has non-compact support, the above equation may be written as
$$\mu\,\mathrm{div}\Big(\frac{\nabla\phi}{|\nabla\phi|}\Big) - \eta\big(z(x, y) - c_1\big)^2 + \gamma\big(z(x, y) - c_2\big)^2 = 0.\qquad(19)$$
This is the convex formulation of the Chan-Vese model (Chan and Vese 2001) proposed by Chan et al. (2006). The corresponding functional (20) is, however, homogeneous of degree 1 in $\phi$ (Chan et al. 2006), which means that it has no stationary point, so the extra constraint $0\le|\phi|\le1$ has to be imposed on $\phi$.
Use finite difference scheme to discretize Equation (19) for φ. The corresponding
discrete equation at a grid point (i, j ) is given by the following:
$$\mu\left[\frac{\Delta^x_-}{h_1}\!\left(\frac{\Delta^x_+\phi_{i,j}/h_1}{\sqrt{(\Delta^x_+\phi_{i,j}/h_1)^2 + (\Delta^y_+\phi_{i,j}/h_2)^2 + \beta_1}}\right) + \frac{\Delta^y_-}{h_2}\!\left(\frac{\Delta^y_+\phi_{i,j}/h_2}{\sqrt{(\Delta^x_+\phi_{i,j}/h_1)^2 + (\Delta^y_+\phi_{i,j}/h_2)^2 + \beta_1}}\right)\right] - \eta(z_{i,j} - c_1)^2 + \gamma(z_{i,j} - c_2)^2 = 0,\qquad(21)$$
where $\beta_1 > 0$ is a small parameter that avoids a zero denominator. With the rescaled quantities $\lambda = h_1/h_2$ and $\bar\beta = h_1^2\beta_1$ (and $\mu$ rescaled accordingly), Eq. (21) may be written as
$$\mu\left[\Delta^x_-\!\left(\frac{\Delta^x_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}}\right) + \lambda^2\,\Delta^y_-\!\left(\frac{\Delta^y_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}}\right)\right] - \eta(z_{i,j} - c_1)^2 + \gamma(z_{i,j} - c_2)^2 = 0,\qquad(22)$$
or, moving the fidelity terms to the right-hand side,
$$\mu\left[\Delta^x_-\!\left(\frac{\Delta^x_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}}\right) + \lambda^2\,\Delta^y_-\!\left(\frac{\Delta^y_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}}\right)\right] = \eta(z_{i,j} - c_1)^2 - \gamma(z_{i,j} - c_2)^2 =: f_{i,j},\qquad(23)$$
with Neumann boundary conditions
$$\phi_{i,0} = \phi_{i,1},\quad \phi_{i,m_2+1} = \phi_{i,m_2},\quad \phi_{0,j} = \phi_{1,j},\quad \phi_{m_1+1,j} = \phi_{m_1,j},\qquad(24)$$
for $i = 1, \ldots, m_1$, $j = 1, \ldots, m_2$, and $0\le|\phi_{i,j}|\le1$.
Note that the left-hand side of Eq. (23) resembles the denoising model of Rudin et al. (1992) based on total variation (TV) regularization. The parameter $\bar\beta > 0$ should be a small quantity to avoid singularities.
N h (φ h ) = f h (25)
N h (Φ h + eh ) − N h (Φ h ) = r h . (26)
Smooth components of the error $e^h$ may not be visible on the fine grid $\Omega^h$ and hence cannot be well approximated there, but they can be well approximated on the coarse grid $\Omega^{2h}$. Therefore, any iterative method which smooths the error on the fine grid can be complemented by a coarse grid correction. Note that on the coarse grid the residual equation is solved, which is less expensive since there are far fewer grid points. Once a coarse grid approximation of the error is obtained, it is transferred back to the fine grid to correct the approximation $\Phi^h$. This is known as a two-grid cycle, and the recursive use of the two-grid cycle is called a multigrid method. Restriction and interpolation operators for transferring grid functions between $\Omega^h$ and $\Omega^{2h}$ for a cell-centered discretization are defined as follows:
Restriction: $I_h^{2h}\Psi^h = \Psi^{2h}$, where
$$\Psi^{2h}_{\ell,m} = \tfrac14\big(\Psi^h_{2\ell-1,2m-1} + \Psi^h_{2\ell-1,2m} + \Psi^h_{2\ell,2m-1} + \Psi^h_{2\ell,2m}\big),\qquad 1\le\ell\le m_1/2,\ 1\le m\le m_2/2.$$
Interpolation: $I_{2h}^{h}\Psi^{2h} = \Psi^{h}$, where
$$\begin{aligned}
\Psi^h_{2\ell,2m} &= \tfrac1{16}\big(9\Psi^{2h}_{\ell,m} + 3\Psi^{2h}_{\ell+1,m} + 3\Psi^{2h}_{\ell,m+1} + \Psi^{2h}_{\ell+1,m+1}\big),\\
\Psi^h_{2\ell-1,2m} &= \tfrac1{16}\big(9\Psi^{2h}_{\ell,m} + 3\Psi^{2h}_{\ell-1,m} + 3\Psi^{2h}_{\ell,m+1} + \Psi^{2h}_{\ell-1,m+1}\big),\\
\Psi^h_{2\ell,2m-1} &= \tfrac1{16}\big(9\Psi^{2h}_{\ell,m} + 3\Psi^{2h}_{\ell+1,m} + 3\Psi^{2h}_{\ell,m-1} + \Psi^{2h}_{\ell+1,m-1}\big),\\
\Psi^h_{2\ell-1,2m-1} &= \tfrac1{16}\big(9\Psi^{2h}_{\ell,m} + 3\Psi^{2h}_{\ell-1,m} + 3\Psi^{2h}_{\ell,m-1} + \Psi^{2h}_{\ell-1,m-1}\big),
\end{aligned}$$
for $1\le\ell\le m_1/2$, $1\le m\le m_2/2$.
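These transfer operators are easy to vectorize. The sketch below implements the cell-centered averaging restriction and the 9/3/3/1-weighted bilinear interpolation given above; the treatment of the boundary (replication of the nearest coarse value) is an assumption, since the formulas above only cover interior cells.

```python
import numpy as np

def restrict(psi_h):
    """Cell-centered full weighting: average each 2x2 block of the fine grid."""
    return 0.25 * (psi_h[0::2, 0::2] + psi_h[1::2, 0::2]
                   + psi_h[0::2, 1::2] + psi_h[1::2, 1::2])

def interpolate(psi_2h):
    """Cell-centered bilinear interpolation with weights 9/16, 3/16, 1/16."""
    p = np.pad(psi_2h, 1, mode='edge')       # replicate coarse values at the boundary
    c = p[1:-1, 1:-1]
    n_, s_ = p[:-2, 1:-1], p[2:, 1:-1]
    w_, e_ = p[1:-1, :-2], p[1:-1, 2:]
    nw, ne = p[:-2, :-2], p[:-2, 2:]
    sw, se = p[2:, :-2], p[2:, 2:]
    m1, m2 = psi_2h.shape
    out = np.empty((2 * m1, 2 * m2))
    out[0::2, 0::2] = (9 * c + 3 * n_ + 3 * w_ + nw) / 16.0
    out[0::2, 1::2] = (9 * c + 3 * n_ + 3 * e_ + ne) / 16.0
    out[1::2, 0::2] = (9 * c + 3 * s_ + 3 * w_ + sw) / 16.0
    out[1::2, 1::2] = (9 * c + 3 * s_ + 3 * e_ + se) / 16.0
    return out
```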
Expanding the backward differences in Eq. (23) gives, at each pixel $(i, j)$,
$$\mu\left\{\left[\frac{\Delta^x_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}} - \frac{\Delta^x_+\phi_{i-1,j}}{\sqrt{(\Delta^x_+\phi_{i-1,j})^2 + (\lambda\Delta^y_+\phi_{i-1,j})^2 + \bar\beta}}\right] + \lambda^2\left[\frac{\Delta^y_+\phi_{i,j}}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}} - \frac{\Delta^y_+\phi_{i,j-1}}{\sqrt{(\Delta^x_+\phi_{i,j-1})^2 + (\lambda\Delta^y_+\phi_{i,j-1})^2 + \bar\beta}}\right]\right\} = \eta(z_{i,j} - c_1)^2 - \gamma(z_{i,j} - c_2)^2.$$
Introducing the differential coefficients
$$D(\phi)_{i,j} = \frac{1}{\sqrt{(\Delta^x_+\phi_{i,j})^2 + (\lambda\Delta^y_+\phi_{i,j})^2 + \bar\beta}},$$
this can be written compactly as
$$\mu\Big\{D(\phi)_{i,j}(\phi_{i+1,j} - \phi_{i,j}) - D(\phi)_{i-1,j}(\phi_{i,j} - \phi_{i-1,j}) + \lambda^2\big[D(\phi)_{i,j}(\phi_{i,j+1} - \phi_{i,j}) - D(\phi)_{i,j-1}(\phi_{i,j} - \phi_{i,j-1})\big]\Big\} = f_{i,j}.\qquad(27)$$
Note that all differential coefficients $D(\phi)_{i,j}$, $D(\phi)_{i-1,j}$, and $D(\phi)_{i,j-1}$ contain $\phi_{i,j}$; they are evaluated at the previous iterate and kept fixed in the rest of the process. Let $\widehat\varphi$ be an approximation to $\phi$. Substituting the values of $\widehat\varphi$ at every grid point other than $(i, j)$ into Eq. (27), and evaluating $D$ at each grid point, yields a linear equation in $\phi_{i,j}$:
$$D(\widehat\varphi)_{i,j}\big(\widehat\varphi_{i+1,j} - \phi_{i,j}\big) - D(\widehat\varphi)_{i-1,j}\big(\phi_{i,j} - \widehat\varphi_{i-1,j}\big) + \lambda^2\Big[D(\widehat\varphi)_{i,j}\big(\widehat\varphi_{i,j+1} - \phi_{i,j}\big) - D(\widehat\varphi)_{i,j-1}\big(\phi_{i,j} - \widehat\varphi_{i,j-1}\big)\Big] \equiv f_{i,j}/\mu \equiv \bar f_{i,j}.$$
Algorithm for solving this equation for φi,j to update the approximation at each
pixel (i, j ):
for i = 1 : m1
    for j = 1 : m2
        D(φ^h)_{i,j} = 1 / sqrt( (Δ^x_+ φ^h_{i,j})^2 + (λ Δ^y_+ φ^h_{i,j})^2 + β̄ )
    end
end
ϕ^h = φ^h
for iter = 1 : maxit
    for i = 1 : m1
        for j = 1 : m2
            ϕ^h_{i,j} = [ D(φ^h)_{i,j} ϕ^h_{i+1,j} + D(φ^h)_{i-1,j} ϕ^h_{i-1,j}
                          + λ^2 ( D(φ^h)_{i,j} ϕ^h_{i,j+1} + D(φ^h)_{i,j-1} ϕ^h_{i,j-1} ) − f̄_{i,j} ]
                        / [ D(φ^h)_{i,j} + D(φ^h)_{i-1,j} + λ^2 ( D(φ^h)_{i,j} + D(φ^h)_{i,j-1} ) ]
        end
    end
end
φ^h ← ϕ^h
Multigrid algorithm (one cycle), with Φ_0 = Φ^h:
1. On the coarsest grid, solve Eq. (25) by the SI or AOS method (Weickert et al. 1997) and stop. On finer grids, perform (pre-)smoothing.
2. Restriction: φ^{2h} = I_h^{2h} φ^h, φ̄^{2h} = φ^{2h}; update f̄^h.
   (Coarse-grid correction, interpolation back to the fine grid, and post-smoothing follow in the usual way.)
Stopping test: if rr = ‖φ^h − Φ_0‖_2 / ‖Φ_0‖_2 < tol, stop; else go to Start.
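For readers unfamiliar with nonlinear multigrid, the following generic full approximation scheme (FAS) two-grid cycle sketches the structure that such an algorithm follows; it is written with problem-specific callables (e.g., the fixed-point smoother above) and is an illustration, not a reproduction of the Badshah-Chen algorithm.

```python
def fas_two_grid(phi, f, apply_N_h, apply_N_2h, smooth_h, solve_2h,
                 restrict, interpolate, nu1=2, nu2=2):
    """Generic FAS two-grid cycle for a nonlinear equation N^h(phi^h) = f^h,
    such as Eq. (25); recursing inside solve_2h yields a V-cycle."""
    for _ in range(nu1):                          # pre-smoothing
        phi = smooth_h(phi, f)
    r_h = f - apply_N_h(phi)                      # fine-grid residual
    phi_2h = restrict(phi)
    f_2h = apply_N_2h(phi_2h) + restrict(r_h)     # FAS coarse right-hand side
    phi_2h_new = solve_2h(phi_2h, f_2h)           # coarse solve (SI/AOS or recursion)
    phi = phi + interpolate(phi_2h_new - phi_2h)  # coarse-grid correction
    for _ in range(nu2):                          # post-smoothing
        phi = smooth_h(phi, f)
    return phi
```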
For the local smoother, introduce the notation $g_1 = D(\widehat\phi)_{i-1,j} = D(\phi^{(k)})_{i-1,j}$, $g_2 = D(\widehat\phi)_{i,j} = D(\phi^{(k)})_{i,j}$, and $g_3 = D(\widehat\phi)_{i,j-1} = D(\phi^{(k)})_{i,j-1}$; for the global smoother, $g_1$, $g_2$, $g_3$ are computed globally as $g_1 = D(\widehat\Phi)_{i-1,j}$, $g_2 = D(\widehat\Phi)_{i,j}$, $g_3 = D(\widehat\Phi)_{i,j-1}$, where $\widehat\Phi$ is the iterate from the previous sweep (global fixed point). Since $h_1 = h_2$, we have $\lambda^2 = 1$, and the smoother update reads
$$-(g_1 + 2g_2 + g_3)\phi^{k+1}_{i,j} + g_1\phi^{k+1}_{i-1,j} + g_3\phi^{k+1}_{i,j-1} + g_2\big(\phi^{k}_{i,j+1} + \phi^{k}_{i+1,j}\big) = \bar f_{i,j},$$
where $e^{k+1}_{i,j} = \phi_{i,j} - \phi^{k+1}_{i,j}$ and $e^{k}_{i,j} = \phi_{i,j} - \phi^{k}_{i,j}$ are the local error functions after and before the relaxation step.
Here $\alpha_1 = \frac{2\theta_1\pi}{m}$, $\alpha_2 = \frac{2\theta_2\pi}{m} \in [-\pi, \pi]$. The LFA involves expanding the errors in Fourier components,
$$e^{k+1} = \sum_{\theta_1,\theta_2=-m/2}^{m/2}\psi^{k+1}_{\theta_1,\theta_2}\,B_{\theta_1,\theta_2}(x_i, y_j),\qquad e^{k} = \sum_{\theta_1,\theta_2=-m/2}^{m/2}\psi^{k}_{\theta_1,\theta_2}\,B_{\theta_1,\theta_2}(x_i, y_j),$$
in the high-frequency range $(\alpha_1, \alpha_2)\in[-\pi, \pi]^2\setminus\big[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\big)^2$, which defines the smoothing rate (Trottenberg and Schuller 2001).
Now replace all grid functions by their Fourier series and consider the so-called amplification factor, i.e., the ratio between $\psi^{k+1}_\theta$ and $\psi^{k}_\theta$ for each $\theta = (\theta_1, \theta_2)$. For the Fourier components of the error functions $e^{k}_{i,j}$ and $e^{k+1}_{i,j}$ before and after a relaxation sweep, consider
$$e^{k}_{i,j} = \psi^{k}_{\theta}\,e^{i(2\pi\theta_1 i + 2\pi\theta_2 j)/m}\quad\text{and}\quad e^{k+1}_{i,j} = \psi^{k+1}_{\theta}\,e^{i(2\pi\theta_1 i + 2\pi\theta_2 j)/m},\qquad(29)$$
and introduce $|\theta| = \max(|\theta_1|, |\theta_2|)$. The smoothing factor $\bar\mu$ is then obtained as
$$\bar\mu = \max_{\hat\rho\pi\le|\theta|\le\pi}\mu(\theta),$$
where $\hat\rho$ is the mesh size ratio and the range $\hat\rho\pi\le|\theta|\le\pi$ is the range of high-frequency components, i.e., the components that cannot be approximated on the coarser grid; for standard coarsening $\hat\rho = \tfrac12$ (Brandt 1977). The smoothing factor $\bar\mu$ is computed for both smoothers. To proceed with the analysis, the coefficients $g_1$, $g_2$, $g_3$ of the function
$$D(\phi) = \frac{1}{\sqrt{(\Delta^x_+\phi)^2 + (\Delta^y_+\phi)^2 + \bar\beta}}$$
are computed numerically, the smoothing factor $\bar\mu$ is worked out for each set of coefficients $g_1$, $g_2$, $g_3$ within a smoother, and the maximum of such factors is displayed:
Table 1  μ̂ in the first 4 cycles of the MG algorithm

MG cycle   Smoothing step      Rate I: μ̂_I   Rate II: μ̂_II
1          Pre-smoothing-1     0.4942        0.6776
1          Pre-smoothing-2     0.4941        0.9317
1          Post-smoothing-1    0.4942        0.9135
1          Post-smoothing-2    0.4942        0.9427
2          Pre-smoothing-1     0.6003        0.9561
2          Pre-smoothing-2     0.6003        0.9174
2          Post-smoothing-1    0.6003        0.9581
2          Post-smoothing-2    0.6003        0.9577
3          Pre-smoothing-1     0.7760        0.9533
3          Pre-smoothing-2     0.7760        0.9193
3          Post-smoothing-1    0.7757        0.9092
3          Post-smoothing-2    0.7749        0.9040
4          Pre-smoothing-1     0.6025        0.9594
4          Pre-smoothing-2     0.6026        0.9456
4          Post-smoothing-1    0.6026        0.9286
4          Post-smoothing-2    0.6026        0.9678
From Table 2 it can be observed that the MG method is as fast as the SI and AOS methods for small images, but it is considerably more efficient for large images, where the abovementioned methods are very slow or fail to converge.
Multigrid Solver for Solving a Class of Variational Problems with Application to Image Segmentation
This section considers a class of convex (selective) segmentation models of the form (30), combining an edge-weighted total variation term with fidelity, distance, and penalty terms, where F is the data fitting term, D is the distance metric, and $\nu_{\varepsilon_2}$ is the convex-relaxation penalty term which enforces the constraint $0\le u\le 1$; see Chan et al. (2006) for the choice of $\nu_{\varepsilon_2}$. The corresponding Euler-Lagrange equation, obtained by minimizing the above functional, is given by the following:
$$\mu\,\nabla\cdot\left(g(|\nabla z(x)|)\,\frac{\nabla u}{|\nabla u|_{\varepsilon_1}}\right) - \lambda F - \theta\alpha D - \nu'_{\varepsilon_2}(u) = 0\qquad(31)$$
with Neumann boundary conditions, where $\varepsilon_1$ and $\varepsilon_2$ are small positive parameters. The multigrid methods discussed in sections “Multigrid Method” and “Multigrid Method for Multiphase Segmentation Model” may not be applicable to PDEs of the type (31), due to the following reasons:
1. In the PDE given in (31), the Euler-Lagrange equation arose from minimization
of convex formulation of CV model, which has an extra constraint of 0 ≤ u ≤ 1,
which means that the solution of the PDE will be a binary function everywhere.
And hence there will be significant jumps in the values of vε 2 (u); this leads
to instability of pixel-wise fixed point smoother, and hence the basic multigrid
method fails.
2. Small value of ε2 can lead to the divergence of the algorithm due to discontinuity
of the function vε 2 (u), whereas large value of ε2 may guarantee the convergence
of the algorithm but change the nature of the problem.
3. ε1 is the parameter which avoids singularity in the PDE. Most of the multigrid
method’s convergence depends on the value of ε1 ; small value can lead to the
nonconvergence of the algorithm, and large value changes the nature of the
problem.
4. In the discretization step, all functions will be approximated at the half pixels
and due to nonsmoothness of the edge function, its approximation at the half
pixel may be very inaccurate.
5. Divergence term in the PDE (31) is highly nonlinear. Approximation of this term
around the interfaces in g and u may be inaccurate due to the use of singularity
parameter ε1 as discussed above.
To address these issues and to apply multigrid methods, the authors in Roberts et al. (2019) introduced a new formulation of the models given in (30).
First Algorithm
Model (30) is reformulated by removing the penalty term $\nu_{\varepsilon_2}(u)$, which is done by introducing a new variable $v$. The reformulated model becomes
$$\min_{u,v}\Big\{\mu\int_\Omega g(|\nabla z(x)|)|\nabla u|\,d\Omega + \lambda\int_\Omega F v\,d\Omega + \theta\int_\Omega D v\,d\Omega + \alpha\int_\Omega \nu_{\varepsilon_2}(v)\,d\Omega + \frac{\theta_B}{2}\|u - v\|_{L^2}^2\Big\},\qquad(32)$$
where $\theta_B$ is a tuning parameter. This model is minimized alternately with respect to $u$ and $v$. For the minimization with respect to $u$, the model reduces to
$$\min_{u}\Big\{\mu\int_\Omega g(|\nabla z(x)|)|\nabla u|\,d\Omega + \frac{\theta_B}{2}\|u - v\|_{L^2}^2\Big\}\qquad(33)$$
with Neumann boundary conditions. For the minimization with respect to $v$, the following problem is considered:
$$\min_{v}\Big\{\lambda\int_\Omega F v\,d\Omega + \theta\int_\Omega D v\,d\Omega + \alpha\int_\Omega \nu_{\varepsilon_2}(v)\,d\Omega + \frac{\theta_B}{2}\|u - v\|_{L^2}^2\Big\}.\qquad(35)$$
It can be noted that neither of the resulting PDEs contains $\nu'_{\varepsilon_2}$, which is one of the achievements of the proposed algorithm. For the detailed steps of the algorithm, see Roberts et al. (2019).
Furthermore, the authors introduced Split-Bregman iterations for removing
nonlinearity in the weighted TV term. This is done by introducing a new variable
d for the weighted TV, and hence the minimization problem given in (30) becomes
the following:
$$\min_{u,d}\Big\{\mu\int_\Omega |d|_g\,d\Omega + \lambda\int_\Omega F u\,d\Omega + \theta\int_\Omega D u\,d\Omega + \alpha\int_\Omega \nu_{\varepsilon_2}(u)\,d\Omega + \frac{\lambda_B}{2}\|d - \nabla u - b\|_{L^2}^2\Big\},\qquad(37)$$
where $|d|_g = g(|\nabla z|)\,|d|$ and $\lambda_B\ge0$. Note that $b$ is the Bregman variable, which has a simple update formula. To find the optimal $u$, the following minimization problem is solved:
$$\min_{u}\Big\{\lambda\int_\Omega F u\,d\Omega + \theta\int_\Omega D u\,d\Omega + \alpha\int_\Omega \nu_{\varepsilon_2}(u)\,d\Omega + \frac{\lambda_B}{2}\|d - \nabla u - b\|_{L^2}^2\Big\}.\qquad(38)$$
The minimization problem for $d$ takes the form
$$\min_{d}\Big\{\mu\int_\Omega |d|_g\,d\Omega + \frac{\lambda_B}{2}\|d - \nabla u - b\|_{L^2}^2\Big\},\qquad(39)$$
using Bregman splitting, where $\theta_B$ and $\lambda_B$ are fixed nonnegative parameters. For $u$, the following subproblem is considered:
$$u^{(k+1)} = \arg\min_{u}\Big\{\frac{\theta_B}{2}\big\|u - v^{(k)}\big\|_{L^2}^2 + \frac{\lambda_B}{2}\big\|d^{(k)} - \nabla u - b^{(k)}\big\|_{L^2}^2\Big\}\qquad(43)$$
with Neumann boundary conditions. This is a linear PDE which can be solved by
a multigrid method. d, b, and v will be updated as given in (40), (41), and (36),
respectively. PDEs obtained from minimization of various subproblems are solved
by using additive operator splitting method and multigrid methods; for detail see
Sect. 5 in Roberts et al. (2019).
Sobolev Gradient Minimization of Curve Length in Chan-Vese Model
In Yuan and He (2012), the Sobolev gradient is used to minimize the length term in the Chan-Vese segmentation model. Denote the length term in the Chan-Vese model by $\text{Ł}(\phi) = \int_\Omega \delta(\phi)|\nabla\phi|\,d\Omega$. The Sobolev gradient of the curve length functional $\text{Ł}(\phi)$ may be represented through the $L^2$ gradient. As before, the Gâteaux derivative of $\text{Ł}(\phi)$ in the direction of a test function $h\in C_0^\infty$ is given by the following:
$$\text{Ł}'(\phi)h = \lim_{\epsilon\to0}\frac{\text{Ł}(\phi + \epsilon h) - \text{Ł}(\phi)}{\epsilon} = \left\langle \delta(\phi)\frac{\nabla\phi}{|\nabla\phi|},\,\nabla h\right\rangle_{L^2(\Omega)^2} + \int_\Omega \delta'(\phi)|\nabla\phi|\,h\,d\Omega.\qquad(45)$$
The inner product can be simplified by integration by parts, provided $\phi$ belongs to the Sobolev space $H^{2,2}(\Omega)$. The $L^2$ gradient $\nabla\text{Ł}(\phi)$ is then defined as the unique element that represents the bounded linear functional $\text{Ł}'(\phi)$ in $L^2(\Omega)$:
$$\nabla\text{Ł}(\phi) = -\delta(\phi)\,\mu\,\mathrm{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right),\qquad(47)$$
where $\phi\in H^{1,2}(\Omega)$. In $H^{1,2}(\Omega)$, the inner product may be defined as
$$\langle\phi, h\rangle_{H^{1,2}(\Omega)} = \int_\Omega\big(\phi h + \langle\nabla\phi, \nabla h\rangle\big)\,d\Omega = \langle D\phi, Dh\rangle_{L^2(\Omega)^3},\qquad h\in H^{1,2}(\Omega).$$
For any functions $\phi, h\in H^{1,2}(\Omega)$, it is well known that the Gâteaux derivative $\text{Ł}'(\phi)$ given in Eq. (45) exists and is a bounded linear functional on $H^{1,2}(\Omega)$. By the Riesz theorem, $\text{Ł}'(\phi)h$ is represented by the unique element $R(\phi)$ representing the bounded linear functional $\text{Ł}'(\phi)$ on $H^{1,2}(\Omega)$ as follows:
$$\text{Ł}'(\phi)h = \langle R(\phi), h\rangle_{H^{1,2}(\Omega)} = \big\langle D\big(\nabla_s\text{Ł}(\phi)\big), Dh\big\rangle_{L^2(\Omega)^3} = \big\langle D^*D\big(\nabla_s\text{Ł}(\phi)\big), h\big\rangle_{L^2(\Omega)},\qquad(49)$$
where D ∗ = (I, −∇) is the adjoint of D. The two gradients may be related in the
following way:
Combine all these results to get the Sobolev gradient of the length term, which is
given as follows:
$$\nabla_s\text{Ł}(\phi) = -(I - \Delta)^{-1}\left[\delta(\phi)\,\mu\,\mathrm{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right)\right].\qquad(51)$$
Numerical Method
The evolution equation given in Eq. (52) is solved as follows: the Sobolev gradient term is computed by introducing an intermediate variable $\Phi$, i.e.,
$$\Phi = (I - \Delta)^{-1}\left[\delta_\varepsilon(\phi)\,\mu\,\nabla\cdot\frac{\nabla\phi}{|\nabla\phi|}\right],\qquad(53)$$
or
$$(I - \Delta)\Phi = \delta_\varepsilon(\phi)\,\mu\,\nabla\cdot\frac{\nabla\phi}{|\nabla\phi|}.\qquad(54)$$
For a given value of $\phi^{(k)}_{i,j}$, this equation is solved by a fast Poisson-type solver to obtain $\Phi\big(\phi^{(k)}_{i,j}, \phi^{(k+1)}_{i,j}\big)$. To find the numerical solution of the evolution equation (52), the following procedure is used: starting with the initial value of $\phi$, compute $c_1$ and $c_2$; the numerical approximation of Eq. (52) is then found by solving the discrete equation
$$\frac{\phi^{(k+1)}_{i,j} - \phi^{(k)}_{i,j}}{\Delta t} = \mu\,\Phi\big(\phi^{(k)}_{i,j}, \phi^{(k+1)}_{i,j}\big) + \delta\big(\phi^{(k)}_{i,j}\big)\Big[-\lambda_1\big(z_{i,j} - c_1\big)^2 + \lambda_2\big(z_{i,j} - c_2\big)^2\Big].\qquad(55)$$
For more details and algorithm, please see Yuan and He (2012).
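The screened Poisson problem (54) with Neumann boundary conditions can be solved very quickly with a cosine-transform-based solver. The sketch below is one possible such solver (an assumption; not necessarily the exact solver used in Yuan and He 2012), using the standard 5-point Laplacian with reflective boundaries, which is diagonalized by the DCT-II.

```python
import numpy as np
from scipy.fft import dctn, idctn

def solve_screened_poisson_dct(f, h=1.0):
    """Solve (I - Laplacian) Phi = f with Neumann (reflective) boundary
    conditions via a fast cosine transform, cf. Eq. (54)."""
    m, n = f.shape
    j = np.arange(m)[:, None]
    k = np.arange(n)[None, :]
    # eigenvalues of the 5-point Laplacian under DCT-II diagonalization
    lam = (2.0 * np.cos(np.pi * j / m) - 2.0
           + 2.0 * np.cos(np.pi * k / n) - 2.0) / h**2
    fhat = dctn(f, type=2, norm='ortho')
    phihat = fhat / (1.0 - lam)          # eigenvalues of (I - Laplacian)
    return idctn(phihat, type=2, norm='ortho')
```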
Table 3 compares the speed of the two gradient flows, i.e., the $L^2$ gradient and the $L^2$ gradient combined with the Sobolev gradient. Both methods are tested on five different types of problems, and the number of iterations and the CPU time in seconds are recorded. The table shows that the combination with the Sobolev gradient gives better results than the $L^2$ gradient alone.
Multiphase Image Segmentation
In the previous section, the Chan-Vese model was discussed, which divides a gray-scale image into two phases, say foreground and background. Another model, proposed by Vese and Chan (2002), which divides an image into four phases, is discussed here. Using one level set function, an image is divided into two phases; increasing the number of level set functions increases the number of phases. To segment an image into $n$ phases, $\log_2 n$ level set functions are required.
Consider $p = \log_2 n$ level set functions $\phi_\ell : \Omega\to\mathbb{R}$ for $\ell = 1, 2, \ldots, p$. The union of the zero level sets of all $\phi_\ell$ determines the set of edges in the segmented image. For $1\le s\le n = 2^p$, denote by $c_s = \mathrm{mean}(z)$ the average value of the image gray levels in phase $s$ and by $\chi_s$ the characteristic function of phase $s$. The following energy functional is proposed; see Vese and Chan (2002) for details:
$$F_n(c, \Phi) = \sum_{1\le s\le n}\int_\Omega\big(z(x, y) - c_s\big)^2\chi_s\,dx\,dy + \mu\sum_{1\le\ell\le p}\int_\Omega|\nabla H(\phi_\ell)|\,dx\,dy,\qquad(56)$$
In particular, for $n = 4$ phases (i.e., $p = 2$ level set functions), the functional reads
$$\begin{aligned}F_4(c, \Phi) ={}& \int_\Omega\big(z(x, y) - c_{11}\big)^2 H(\phi_1)H(\phi_2)\,dx\,dy + \int_\Omega\big(z(x, y) - c_{10}\big)^2 H(\phi_1)\big(1 - H(\phi_2)\big)\,dx\,dy\\ &+ \int_\Omega\big(z(x, y) - c_{01}\big)^2\big(1 - H(\phi_1)\big)H(\phi_2)\,dx\,dy + \mu\int_\Omega|\nabla H(\phi_1)|\,dx\,dy\\ &+ \int_\Omega\big(z(x, y) - c_{00}\big)^2\big(1 - H(\phi_1)\big)\big(1 - H(\phi_2)\big)\,dx\,dy + \mu\int_\Omega|\nabla H(\phi_2)|\,dx\,dy,\end{aligned}\qquad(58)$$
where c = (c11 , c10 , c01 , c00 ) is the vector of average intensities in different phases
of the given image and Φ = (φ1 , φ2 ) is the vector of level sets used for segmentation
of an image into various phases. Minimization of (57) with respect to Φ leads to the
following system of equations:
$$\begin{cases}\delta_\varepsilon(\phi_1)\left[\mu\,\nabla\cdot\dfrac{\nabla\phi_1}{|\nabla\phi_1|} - \big(T_1 H(\phi_2) + T_2(1 - H(\phi_2))\big)\right] = 0,\\[2mm] \delta_\varepsilon(\phi_2)\left[\mu\,\nabla\cdot\dfrac{\nabla\phi_2}{|\nabla\phi_2|} - \big(T_1 H(\phi_1) + T_2(1 - H(\phi_1))\big)\right] = 0,\end{cases}\qquad(59)$$
Such systems can be solved with the semi-implicit and AOS methods discussed in the previous sections, which are effective for problems of small size. For large-size problems, the best option is the multigrid method.
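The encoding of four phases by two level set functions is easy to make concrete: each phase corresponds to one sign combination of (φ1, φ2). The following small sketch (illustrative only) computes the four characteristic functions and the phase averages c_s.

```python
import numpy as np

def four_phase_membership(phi1, phi2):
    """Characteristic functions of the four phases encoded by two level
    set functions (one per sign combination of phi1, phi2)."""
    H1, H2 = (phi1 > 0).astype(float), (phi2 > 0).astype(float)
    chi11 = H1 * H2
    chi10 = H1 * (1 - H2)
    chi01 = (1 - H1) * H2
    chi00 = (1 - H1) * (1 - H2)
    return chi11, chi10, chi01, chi00

def phase_averages(z, chis, eps=1e-12):
    """Average intensity c_s of the image z in each phase s."""
    return [(z * chi).sum() / (chi.sum() + eps) for chi in chis]
```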
Using finite difference schemes to discretize (59) for $\phi_\ell$, the equations at a pixel $(i, j)$ are given by
$$\begin{cases}\delta_\varepsilon(\phi_1)_{i,j}\left[\dfrac{\mu}{h_1}\Delta^x_-\!\left(\dfrac{\Delta^x_+(\phi_1)_{i,j}/h_1}{\sqrt{(\Delta^x_+(\phi_1)_{i,j}/h_1)^2 + (\Delta^y_+(\phi_1)_{i,j}/h_2)^2 + \beta}}\right) + \dfrac{\mu}{h_2}\Delta^y_-\!\left(\dfrac{\Delta^y_+(\phi_1)_{i,j}/h_2}{\sqrt{(\Delta^x_+(\phi_1)_{i,j}/h_1)^2 + (\Delta^y_+(\phi_1)_{i,j}/h_2)^2 + \beta}}\right) - (T_1)_{i,j}H(\phi_2)_{i,j} - (T_2)_{i,j}\big(1 - H(\phi_2)_{i,j}\big)\right] = 0,\\[4mm]
\delta_\varepsilon(\phi_2)_{i,j}\left[\dfrac{\mu}{h_1}\Delta^x_-\!\left(\dfrac{\Delta^x_+(\phi_2)_{i,j}/h_1}{\sqrt{(\Delta^x_+(\phi_2)_{i,j}/h_1)^2 + (\Delta^y_+(\phi_2)_{i,j}/h_2)^2 + \beta}}\right) + \dfrac{\mu}{h_2}\Delta^y_-\!\left(\dfrac{\Delta^y_+(\phi_2)_{i,j}/h_2}{\sqrt{(\Delta^x_+(\phi_2)_{i,j}/h_1)^2 + (\Delta^y_+(\phi_2)_{i,j}/h_2)^2 + \beta}}\right) - (T_1)_{i,j}H(\phi_1)_{i,j} - (T_2)_{i,j}\big(1 - H(\phi_1)_{i,j}\big)\right] = 0.\end{cases}\qquad(60)$$
Let $\mu := \mu/h_1$, $\bar\beta = h_1^2\beta$, and $\lambda = h_1/h_2$. Also denote $(f_1)_{i,j} = (T_1)_{i,j}H(\phi_2)_{i,j} + (T_2)_{i,j}\big(1 - H(\phi_2)_{i,j}\big)$ and $(f_2)_{i,j} = (T_1)_{i,j}H(\phi_1)_{i,j} + (T_2)_{i,j}\big(1 - H(\phi_1)_{i,j}\big)$.
For $\ell = 1, 2$, introduce the following notation for the differential coefficients:
$$D_\ell(\phi_\ell)_{i,j} = \frac{1}{\sqrt{(\Delta^x_+(\phi_\ell)_{i,j})^2 + (\lambda\Delta^y_+(\phi_\ell)_{i,j})^2 + \bar\beta}},\quad D_\ell(\phi_\ell)_{i-1,j} = \frac{1}{\sqrt{(\Delta^x_+(\phi_\ell)_{i-1,j})^2 + (\lambda\Delta^y_+(\phi_\ell)_{i-1,j})^2 + \bar\beta}},\quad D_\ell(\phi_\ell)_{i,j-1} = \frac{1}{\sqrt{(\Delta^x_+(\phi_\ell)_{i,j-1})^2 + (\lambda\Delta^y_+(\phi_\ell)_{i,j-1})^2 + \bar\beta}}.$$
With this notation the discrete equations take the compact form
$$D_\ell(\phi_\ell)_{i,j}\big((\phi_\ell)_{i+1,j} - (\phi_\ell)_{i,j}\big) - D_\ell(\phi_\ell)_{i-1,j}\big((\phi_\ell)_{i,j} - (\phi_\ell)_{i-1,j}\big) + \lambda^2\Big[D_\ell(\phi_\ell)_{i,j}\big((\phi_\ell)_{i,j+1} - (\phi_\ell)_{i,j}\big) - D_\ell(\phi_\ell)_{i,j-1}\big((\phi_\ell)_{i,j} - (\phi_\ell)_{i,j-1}\big)\Big] = (\bar f_\ell)_{i,j}.\qquad(61)$$
Let $\widehat\phi_\ell$ be the approximation to $\phi_\ell$ at the current iteration. Then, from Eq. (61), keeping only the local unknown $(\phi_\ell)_{i,j}$ gives the following linear equations:
$$D_\ell(\widehat\phi_\ell)_{i,j}\big((\widehat\phi_\ell)_{i+1,j} - (\phi_\ell)_{i,j}\big) - D_\ell(\widehat\phi_\ell)_{i-1,j}\big((\phi_\ell)_{i,j} - (\widehat\phi_\ell)_{i-1,j}\big) + \lambda^2\Big[D_\ell(\widehat\phi_\ell)_{i,j}\big((\widehat\phi_\ell)_{i,j+1} - (\phi_\ell)_{i,j}\big) - D_\ell(\widehat\phi_\ell)_{i,j-1}\big((\phi_\ell)_{i,j} - (\widehat\phi_\ell)_{i,j-1}\big)\Big] = (\bar f_\ell)_{i,j}.$$
These equations are solved for $(\phi_\ell)_{i,j}$, and the values are stored in $(\widehat\phi_\ell)_{i,j}$ for use in the next iteration. This update is used as a smoother in the multigrid algorithm; for further details, see Badshah and Chen (2009). Local Fourier analysis is usually used to check the convergence of the smoother, and this is discussed in the next section.
Local Fourier Analysis and a Modified Smoother
Local Fourier analysis (LFA) is a suitable tool to analyze the convergence rate of an iterative method for linear equations. However, the underlying equations here are nonlinear, so the LFA is applied to a linearized equation; since the linearization occurs locally at each pixel, the maximum rate over all pixel locations is considered.
Consider a square image with m = m1 = m2 and h1 = h2 = h for
simplicity, then λ = 1. Given the previous iterate at step k, φ ' = φ (k) , denote
a1 = D1 (φ'1 )i−1,j , a2 = D1 (φ'1 )i,j , a3 = D1 (φ'1 )i,j −1 , b1 = D2 (φ'2 )i−1,j , b2 =
D2 (φ'2 )i,j , b3 = D2 (φ'2 )i,j −1 which are to be considered as local constants. From
(61), the grid equation at (i, j ) is the following local smoother:
$$\begin{cases}-(a_1 + 2a_2 + a_3)(\phi_1)^{(k+1)}_{i,j} + a_1(\phi_1)^{(k+1)}_{i-1,j} + a_3(\phi_1)^{(k+1)}_{i,j-1} + a_2\big[(\phi_1)^{(k)}_{i+1,j} + (\phi_1)^{(k)}_{i,j+1}\big] = (\bar f_1)_{i,j},\\[1mm] -(b_1 + 2b_2 + b_3)(\phi_2)^{(k+1)}_{i,j} + b_1(\phi_2)^{(k+1)}_{i-1,j} + b_3(\phi_2)^{(k+1)}_{i,j-1} + b_2\big[(\phi_2)^{(k)}_{i+1,j} + (\phi_2)^{(k)}_{i,j+1}\big] = (\bar f_2)_{i,j}.\end{cases}\qquad(63)$$
$$\begin{cases}a_1(e_1)^{(k+1)}_{i-1,j} + a_3(e_1)^{(k+1)}_{i,j-1} + a_2\big[(e_1)^{(k)}_{i+1,j} + (e_1)^{(k)}_{i,j+1}\big] - (a_1 + 2a_2 + a_3)(e_1)^{(k+1)}_{i,j} = 0,\\[1mm] b_1(e_2)^{(k+1)}_{i-1,j} + b_3(e_2)^{(k+1)}_{i,j-1} + b_2\big[(e_2)^{(k)}_{i+1,j} + (e_2)^{(k)}_{i,j+1}\big] - (b_1 + 2b_2 + b_3)(e_2)^{(k+1)}_{i,j} = 0.\end{cases}\qquad(64)$$
Recall that the LFA measures the largest amplification factor in a relaxation scheme (Brandt 1977; Chen 2005; Trottenberg and Schuller 2001). Let a general Fourier component be
$$\Theta_{\alpha,\beta}(x_i, y_j) = \exp\!\left(i\theta_\alpha\frac{x_i}{h} + i\theta_\beta\frac{y_j}{h}\right) = \exp\!\left(\frac{2i\alpha i\pi}{m} + \frac{2i\beta j\pi}{m}\right).$$
$$e_1^{(k)} = \sum_{\alpha,\beta=-m/2}^{m/2}\psi^{(k)}_{1,\alpha\beta}\,\Theta_{\alpha,\beta}(x_i, y_j),\qquad e_2^{(k)} = \sum_{\alpha,\beta=-m/2}^{m/2}\psi^{(k)}_{2,\alpha\beta}\,\Theta_{\alpha,\beta}(x_i, y_j).$$
At the $k$th iteration, each rate $\bar\mu^{(k)}(i, j) = \max_{\alpha,\beta}\rho(A_{\alpha,\beta})$ in the high-frequency range $(\theta_\alpha, \theta_\beta)\in[-\pi, \pi]^2\setminus[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]^2$, measuring the effectiveness of a smoother (Brandt 1977), depends on $a_\ell$, $b_\ell$, $\ell = 1, 2, 3$, which in turn depend on the pixel location $(i, j)$. Therefore one looks for the largest smoothing rate over all $i, j$ (i.e., among all such pixels):
$$\hat\mu^{(k)} = \max_{1\le i\le m_1,\,1\le j\le m_2}\bar\mu^{(k)}(i, j).$$
Table 4  The smoothing rate for a local smoother with 3 inner iterations

Outer iterations s   Smoothing rate μ̂_s   Smoothing rate taking out "odd pixels" μ̂*_s
1                    0.6862               0.5720
2                    0.6861               0.3170
3                    0.6861               0.2747
However, due to the high nonlinearity, it is useful to define the smoothing rate
as the maximum of the above-accumulated rates out of all s relaxation steps by the
following:
A modified smoother. To motivate the idea, consider the particular case of an odd
pixel assigned with the following:
$$\begin{cases}a_1(\phi_1)^{(k+1)}_{i-1,j} + a_3(\phi_1)^{(k+1)}_{i,j-1} + a_2\big[(\phi_1)^{(k)}_{i+1,j} + (\phi_1)^{(k)}_{i,j+1}\big] - (a_1 + 2a_2 + a_3)(1+\omega)(\phi_1)^{(k+1)}_{i,j} + \omega(a_1 + 2a_2 + a_3)(\phi_1)^{(k)}_{i,j} = (\bar f_1)_{i,j},\\[1mm] b_1(\phi_2)^{(k+1)}_{i-1,j} + b_3(\phi_2)^{(k+1)}_{i,j-1} + b_2\big[(\phi_2)^{(k)}_{i+1,j} + (\phi_2)^{(k)}_{i,j+1}\big] - (b_1 + 2b_2 + b_3)(1+\omega)(\phi_2)^{(k+1)}_{i,j} + \omega(b_1 + 2b_2 + b_3)(\phi_2)^{(k)}_{i,j} = (\bar f_2)_{i,j},\end{cases}\qquad(66)$$
for some 0 ≤ ω ≤ 1 (note ω = 0 reduces to the previous local smoother). The new
error equation is as follows:
$$\begin{cases}a_1(e_1)^{(k+1)}_{i-1,j} + a_3(e_1)^{(k+1)}_{i,j-1} + a_2\big[(e_1)^{(k)}_{i+1,j} + (e_1)^{(k)}_{i,j+1}\big] - (a_1 + 2a_2 + a_3)(1+\omega)(e_1)^{(k+1)}_{i,j} + \omega(a_1 + 2a_2 + a_3)(e_1)^{(k)}_{i,j} = 0,\\[1mm] b_1(e_2)^{(k+1)}_{i-1,j} + b_3(e_2)^{(k+1)}_{i,j-1} + b_2\big[(e_2)^{(k)}_{i+1,j} + (e_2)^{(k)}_{i,j+1}\big] - (1+\omega)(b_1 + 2b_2 + b_3)(e_2)^{(k+1)}_{i,j} + \omega(b_1 + 2b_2 + b_3)(e_2)^{(k)}_{i,j} = 0.\end{cases}\qquad(67)$$
$$A_{\alpha,\beta} = \begin{pmatrix}\dfrac{a_2\big(e^{2i\alpha\pi/m} + e^{2i\beta\pi/m}\big) + \omega(a_1 + 2a_2 + a_3)}{(1+\omega)(a_1 + 2a_2 + a_3) - a_1 e^{-2i\alpha\pi/m} - a_3 e^{-2i\beta\pi/m}} & 0\\[4mm] 0 & \dfrac{b_2\big(e^{2i\alpha\pi/m} + e^{2i\beta\pi/m}\big) + \omega(b_1 + 2b_2 + b_3)}{(1+\omega)(b_1 + 2b_2 + b_3) - b_1 e^{-2i\alpha\pi/m} - b_3 e^{-2i\beta\pi/m}}\end{pmatrix}.$$
With ω = 0.7 in Eq. (66), the new scheme yields a much better rate of μ = 0.75026. The choice ω = 0.7 is based on numerical experience.
$$D_\ell(\widehat\phi_\ell)_{i,j}\big((\widehat\phi_\ell)_{i+1,j} - (1+\omega)(\phi_\ell)_{i,j} + \omega(\widehat\phi_\ell)_{i,j}\big) - D_\ell(\widehat\phi_\ell)_{i-1,j}\big((1+\omega)(\phi_\ell)_{i,j} - \omega(\widehat\phi_\ell)_{i,j} - (\widehat\phi_\ell)_{i-1,j}\big) + \lambda^2 D_\ell(\widehat\phi_\ell)_{i,j}\big((\widehat\phi_\ell)_{i,j+1} - (1+\omega)(\phi_\ell)_{i,j} + \omega(\widehat\phi_\ell)_{i,j}\big) - \lambda^2 D_\ell(\widehat\phi_\ell)_{i,j-1}\big((1+\omega)(\phi_\ell)_{i,j} - \omega(\widehat\phi_\ell)_{i,j} - (\widehat\phi_\ell)_{i,j-1}\big) = (\bar f_\ell)_{i,j}.\qquad(68)$$
Table 5  The smoothing rate for a modified local smoother

Outer iterations s   Smoothing rate μ̂_s
1                    0.5720
2                    0.3170
3                    0.2747
$$A_\ell = D_\ell(\widehat\phi_\ell)^h_{i,j}\big((\widehat\phi_\ell)^h_{i+1,j} + \omega(\widehat\phi_\ell)^h_{i,j}\big) + D_\ell(\widehat\phi_\ell)^h_{i-1,j}\big((\widehat\phi_\ell)^h_{i-1,j} + \omega(\widehat\phi_\ell)^h_{i,j}\big),$$
$$B_\ell = D_\ell(\widehat\phi_\ell)^h_{i,j}\big((\widehat\phi_\ell)^h_{i,j+1} + \omega(\widehat\phi_\ell)^h_{i,j}\big) + D_\ell(\widehat\phi_\ell)^h_{i,j-1}\big((\widehat\phi_\ell)^h_{i,j-1} + \omega(\widehat\phi_\ell)^h_{i,j}\big),$$
$$(\phi_\ell)^h_{i,j} = \frac{A_\ell + \lambda^2 B_\ell - (\bar f_\ell)_{i,j}}{(1+\omega)\big(D_\ell(\widehat\phi_\ell)^h_{i,j} + D_\ell(\widehat\phi_\ell)^h_{i-1,j} + \lambda^2\big(D_\ell(\widehat\phi_\ell)^h_{i,j} + D_\ell(\widehat\phi_\ell)^h_{i,j-1}\big)\big)}.$$
The smoothing analysis of the improved smoother is done again in the same steps
and is given in Table 4. Clearly, the smoothing rates of the modified smoother are
much more acceptable (note the accumulated number of smoothing steps is 3s since
3 inner iterations for each outer step are used) (Table 5).
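The LFA estimate itself is a small numerical computation: for frozen coefficients (a1, a2, a3) at a pixel, maximize the amplification factor displayed above over the high-frequency range. The sketch below (an illustration; the sampling resolution n is an assumption) does this for one component; ω = 0 recovers the unmodified smoother.

```python
import numpy as np

def smoothing_rate(a1, a2, a3, omega=0.0, n=64):
    """LFA estimate of the smoothing factor for the (modified) pointwise
    smoother: amplification factor maximized over |theta| in [pi/2, pi]."""
    th = np.linspace(-np.pi, np.pi, n, endpoint=False)
    ta, tb = np.meshgrid(th, th, indexing='ij')
    high = np.maximum(np.abs(ta), np.abs(tb)) >= np.pi / 2
    s = a1 + 2 * a2 + a3
    num = np.abs(a2 * (np.exp(1j * ta) + np.exp(1j * tb)) + omega * s)
    den = np.abs((1 + omega) * s - a1 * np.exp(-1j * ta) - a3 * np.exp(-1j * tb))
    return (num / den)[high].max()
```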
In Table 6, speed comparison of the multigrid with typical local smoother (MG1),
multigrid with modified smoother (MG1m), and additive operator splitting method
(AOS) in terms of the number of iterations and CPU time is given. Fast convergence
of the MG method can clearly be observed from the table. MG algorithms yield a
computation time of O(N log N) where N = m1 × m2 .
Table 6  Speed comparison of MG1 (multigrid with typical local smoother), MG1m (multigrid with modified smoother), and AOS methods in terms of number of iterations ("Itr") and CPU time ("CPU"). Here "–" means no convergence (to the tolerance) or very slow convergence

Image size      AOS Itr   AOS CPU   MG1 Itr   MG1 CPU   MG1m Itr   MG1m CPU
128 × 128       80        22        3         5         2          2
256 × 256       150       193       4         13        2          7
512 × 512       1500      42,600    4         74        2          33
1024 × 1024     –         –         4         525       2          148
Convex Multiphase Image Segmentation Model
The Vese-Chan model (Vese and Chan 2002) discussed in the previous section has the advantage that the segmented phases cannot produce vacuum or overlap, by construction. Moreover, it considerably reduces the number of level set functions needed and can represent complex boundaries. One drawback of the Vese-Chan model is that its energy functional is nonconvex, and hence minimization may get stuck at local minima, which can lead to wrong segmentations. In Yang et al. (2014), a convex formulation of the Vese-Chan model is proposed; the energy functional of the Vese-Chan model is given in Eq. (57). The convex model is then solved by using Bregman iterations (Bregman 1967), which are discussed here.
Definition 1. For an energy functional $E(\cdot)$, the Bregman distance between two functions $u$ and $v$ is given by
$$D_E^q(u, v) = E(u) - E(v) - \langle q, u - v\rangle,\qquad q\in\partial E(v).$$
For a minimization problem of the form (69), where $W(\cdot)$ is convex with $\min_u W(u) = 0$, Bregman iterations are defined in the following way:
Definition 2. For a given parameter $\beta > 0$, the Bregman iterations are defined as
$$u^{(k+1)} = \arg\min_u D_E^{q^{(k)}}\big(u, u^{(k)}\big) + \beta W(u),\qquad q^{(k)}\in\partial E\big(u^{(k)}\big).$$
The next theorem plays an important role in the minimization of problems of the type (69).
Theorem 1. The minimization problem given in (69) can be solved by the following Bregman iterations:
$$u^{(k+1)} = \arg\min_u D_E^{q^{(k)}}\big(u, u^{(k)}\big) + \beta W(u).\qquad(70)$$
$$\begin{cases}\mu\,\nabla\cdot\dfrac{\nabla\phi_1}{|\nabla\phi_1|} - \big[T_1 H(\phi_2) + T_2(1 - H(\phi_2))\big] = 0,\\[2mm] \mu\,\nabla\cdot\dfrac{\nabla\phi_2}{|\nabla\phi_2|} - \big[T_1 H(\phi_1) + T_2(1 - H(\phi_1))\big] = 0.\end{cases}\qquad(74)$$
Thus the simplified gradient flow equation for (74) becomes the following:
$$\begin{cases}\dfrac{\partial\phi_1}{\partial t} = \mu\,\nabla\cdot\dfrac{\nabla\phi_1}{|\nabla\phi_1|} - r_1,\\[2mm] \dfrac{\partial\phi_2}{\partial t} = \mu\,\nabla\cdot\dfrac{\nabla\phi_2}{|\nabla\phi_2|} - r_2.\end{cases}\qquad(76)$$
where |∇(·)|1 is the total variation (TV) norm and ·, · is the inner product,
respectively, and may be written as follows:
$$|\nabla\phi_i|_1 = \int_\Omega|\nabla\phi_i(x)|\,dx = TV(\phi_i),\qquad \langle\phi_i, r_i\rangle = \int_\Omega\phi_i(x)\,r_i(x)\,dx.\qquad(78)$$
$$\min_{0\le\phi_1,\phi_2\le1}\widetilde F(\phi_1, \phi_2) = \min_{0\le\phi_1,\phi_2\le1}\big(\mu|\nabla\phi_1|_1 + \mu|\nabla\phi_2|_1 + \langle\phi_1, r_1\rangle + \langle\phi_2, r_2\rangle\big).\qquad(79)$$
$$\bar r_1 = T_1\phi_2 + T_2(1 - \phi_2),\qquad \bar r_2 = T_1\phi_1 + T_2(1 - \phi_1).\qquad(80)$$
To use the edge information of the image, the weighted TV norm is used, given by
$$TV_g(\phi_i) = \int_\Omega g(|\nabla z|)\,|\nabla\phi_i|\,dx = |\nabla\phi_i|_g,\qquad(81)$$
$$\min_{0\le\phi_1,\phi_2\le1}\widetilde F(\phi_1, \phi_2) = \min_{0\le\phi_1,\phi_2\le1}\big(\mu|\nabla\phi_1|_g + \mu|\nabla\phi_2|_g + \langle\phi_1, \bar r_1\rangle + \langle\phi_2, \bar r_2\rangle\big).\qquad(82)$$
Also note that $c = [c_{11}, c_{10}, c_{01}, c_{00}]$ is the vector of average intensities of the image inside $\Omega_1$, $\Omega_2$, $\Omega_3$, $\Omega_4$, respectively. The model in Eq. (82) gives a four-phase segmentation and can be extended to $n$ phases, for which $m = \log_2 n$ level set functions are required. The functional can then be written as
$$\min_{0\le\phi_i\le1}\widetilde F_n(\phi_1, \phi_2, \ldots, \phi_m) = \min_{0\le\phi_i\le1}\Big(\mu\sum_{i=1}^m|\nabla\phi_i|_g + \sum_{i=1}^m\langle\phi_i, \bar r_i\rangle\Big).\qquad(84)$$
where $\alpha > 0$ is a constant. Bregman iterations for the solution of this unconstrained minimization problem are given in the following theorem.
Theorem 3. The minimization problem (79) of the proposed model can be converted to a series of optimization problems:
$$\big(\phi_1^{(k+1)}, \phi_2^{(k+1)}, p_1^{(k+1)}, p_2^{(k+1)}\big) = \arg\min_{\substack{0\le\phi_1,\phi_2\le1\\ p_1,\,p_2}}\Big\{\mu|p_1|_g + \mu|p_2|_g + \langle\phi_1, \bar r_1\rangle + \langle\phi_2, \bar r_2\rangle + \frac{\alpha}{2}\big\|p_1 - \nabla\phi_1 - b_1^{(k)}\big\|^2 + \frac{\alpha}{2}\big\|p_2 - \nabla\phi_2 - b_2^{(k)}\big\|^2\Big\},\qquad(87)$$
where bi = (bix , biy ), i = 1, 2 are the Bregman variables, which can be updated by
the following Bregman iterations with initial values b0i = (0, 0), i = 1, 2:
To solve the minimization problem given in Equation (79), it is enough to solve the
minimization problem given in Equation (87). The iterative minimization scheme
can be achieved through the following two steps for solution of Equation (87).
• Keeping $p_1^{(k)}$ and $p_2^{(k)}$ fixed, minimizing Eq. (87) with respect to $\phi_1$ and $\phi_2$ gives
$$\big(\phi_1^{(k+1)}, \phi_2^{(k+1)}\big) = \arg\min_{0\le\phi_1,\phi_2\le1}\Big\{\langle\phi_1, \bar r_1\rangle + \langle\phi_2, \bar r_2\rangle + \frac{\alpha}{2}\big\|p_1^{(k)} - \nabla\phi_1 - b_1^{(k)}\big\|^2 + \frac{\alpha}{2}\big\|p_2^{(k)} - \nabla\phi_2 - b_2^{(k)}\big\|^2\Big\}.\qquad(89)$$
• Secondly, keeping $\phi_1^{(k+1)}$ and $\phi_2^{(k+1)}$ fixed, minimizing Eq. (87) with respect to $p_1$ and $p_2$ gives
$$\big(p_1^{(k+1)}, p_2^{(k+1)}\big) = \arg\min_{p_1, p_2}\Big\{\mu|p_1|_g + \mu|p_2|_g + \frac{\alpha}{2}\big\|p_1 - \nabla\phi_1^{(k+1)} - b_1^{(k)}\big\|^2 + \frac{\alpha}{2}\big\|p_2 - \nabla\phi_2^{(k+1)} - b_2^{(k)}\big\|^2\Big\}.\qquad(90)$$
$$\Delta\phi_1^{(k+1)} = \frac{1}{\alpha}\bar r_1^{(k)} + \nabla\cdot\big(p_1^{(k)} - b_1^{(k)}\big),\qquad 0\le\phi_1^{(k+1)}\le1,\qquad(91)$$
$$\Delta\phi_2^{(k+1)} = \frac{1}{\alpha}\bar r_2^{(k)} + \nabla\cdot\big(p_2^{(k)} - b_2^{(k)}\big),\qquad 0\le\phi_2^{(k+1)}\le1.\qquad(92)$$
These Laplace equations are solved by the Gauss-Seidel method with a projection onto $[0, 1]$, giving the following relation for $\phi_\ell^{(k+1)}$:
$$\begin{cases}\gamma_{\ell,i,j}^{(k)} = p_{\ell,x,i-1,j}^{(k)} - p_{\ell,x,i,j}^{(k)} + p_{\ell,y,i,j-1}^{(k)} - p_{\ell,y,i,j}^{(k)} - b_{\ell,x,i-1,j}^{(k)} + b_{\ell,x,i,j}^{(k)} - b_{\ell,y,i,j-1}^{(k)} + b_{\ell,y,i,j}^{(k)},\\[1mm] \tau_{\ell,i,j}^{(k)} = \dfrac14\Big(\phi_{\ell,i-1,j}^{(k)} + \phi_{\ell,i+1,j}^{(k)} + \phi_{\ell,i,j-1}^{(k)} + \phi_{\ell,i,j+1}^{(k)} - \dfrac1\alpha\,\bar r_{\ell,i,j}^{(k)} + \gamma_{\ell,i,j}^{(k)}\Big),\\[1mm] \phi_{\ell,i,j}^{(k+1)} = \max\Big(\min\big(\tau_{\ell,i,j}^{(k)}, 1\big), 0\Big),\end{cases}\qquad(93)$$
where $\ell = 1, 2$.
    p2^(k+1) = shrinkage_g( b2^(k) + ∇φ2^(k+1), 1/α ) = shrinkage( b2^(k) + ∇φ2^(k+1), ρ/α ).    (95)
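For concreteness, the isotropic shrinkage (soft-thresholding) operator used in such split Bregman updates has a simple closed form. The following NumPy sketch is a generic illustration only; the function name, array layout, and the exact threshold passed to it are assumptions and not taken from Yang et al. (2014).

```python
import numpy as np

def shrink(w, t):
    """Isotropic vector shrinkage: (w/|w|) * max(|w| - t, 0), applied pixel-wise.

    w : array of shape (2, H, W) holding the two components of a vector field
    t : scalar or (H, W) array of thresholds (e.g. a weighted threshold g/alpha)
    """
    norm = np.sqrt(w[0]**2 + w[1]**2)
    scale = np.maximum(norm - t, 0.0) / (norm + 1e-12)  # guard against division by zero
    return scale * w
```

A typical call for an update such as (95) would then look like `p = shrink(grad_phi + b, g / alpha)`, assuming the arrays `grad_phi`, `b`, and the edge weight `g` are available.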
For further details and experimental results of the proposed model and method,
see Yang et al. (2014). In Fig. 1, the proposed method is tested on an artificial image.
In Fig. 2, results of the proposed model on a real MRI image are given.
In 2017, Cai et al. proposed a three-stage smoothing, lifting, and thresholding method for multiphase segmentation of color images corrupted by different degradations: noise, information loss, and blur. The method works in the following steps: in the first stage, a smooth restored image is obtained by applying the convex models of Cai et al. (2013) and Chan et al. (2014) to each channel of the original color image. In the second stage, the smooth color image is transformed to an additional, less correlated color space (dimension lifting).
Fig. 1 Application of the proposed model to a simple synthetic image. (a)–(d): The active contour
evolving process from the initial contour to the final contour. (e)–(h): The corresponding fitting
images z at different iterations
Fig. 2 Application of the proposed model to a brain MR image. (a)–(d): The active contour
evolving process. (e)–(h): The evolution process of the fitting image z. (i)–(l): The final four
segments with four averages c11 = 113.1278, c10 = 48.3514, c01 = 167.2793, and c00 = 4.0692
    E(zi) = (λ/2) ∫_Ω Ψi (zi − K ui)² dx + (μ/2) ∫_Ω |∇ui|² dx + ∫_Ω |∇ui| dx,   i = 1, 2, . . . , d,    (97)
where Ψi (·) is the characteristic function and is a region descriptor. For existence
and uniqueness of the minimizer of the above functional, see Cai et al. (2017).
The above model (97) is considered in discrete setting and is solved for the
unique minimizer ūi for each channel i by using different methods such as primal-
dual method (Chambolle and Pock 2011; Chen et al. 2014), alternating direction
method (Boyd et al. 2010), and split Bregman method (Goldstein and Osher 2009;
Bregman 1967). Once ūi is found, it is rescaled onto [0, 1] and hence {ūi }di=1 ∈
[0, 1]d .
In the first stage, a restored smooth image ūi is obtained, whereas in this stage dimension lifting is performed on ūi to extract additional information from a different color space that helps the segmentation in the later stage. Popular choices for less correlated color spaces are HSV (hue, saturation, and value), CB (chromaticity-brightness), HSI (hue, saturation, and intensity), and Lab (perceived lightness, red-green, and yellow-blue). Note that Lab is a better color space than RGB, HSV, and HSI for segmentation. The authors chose the Lab space, which was designed to be perceptually uniform in the sense that the numerical difference between two colors is proportional to the perceived color difference. Here, Lab is used as the additional color space, where the L channel correlates with perceived lightness, while the a and b channels correlate approximately with red-green and yellow-blue, respectively.
Let û denote the Lab transform of ū; rescaling all channels of û to the interval [0, 1] yields an image ūt ∈ [0, 1]³. A new image ū∗ = (ū, ūt) ∈ [0, 1]⁶ is then introduced by stacking ū and ūt, so that ū∗ has six channels.
Stage 3: Segmentation
Segmentation of the vector-valued image ū∗, obtained from the second stage in K
segments, is done by using thresholding. This is based on the K-means algorithm
(Kanungo et al. 2002) because of its simplicity and good asymptotic properties.
According to the value of K, the algorithm clusters all points of {ū ∗ (x) : x ∈ Ω}
into K Voronoi-shaped cells, say Ω1 ∪ Ω2 ∪ . . . ∪ ΩK = Ω. The mean vector ck ∈ R⁶ is computed on each cell Ωk by the following:

    ck = ( ∫_{Ωk} ū∗(x) dx ) / ( ∫_{Ωk} dx ),   k = 1, 2, . . . , K.    (98)
Clearly ∪_{k=1}^{K} Sk = Ω and ∩_{k=1}^{K} Sk = ∅. For further details see Cai et al. (2013, 2017).
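As an illustration of the thresholding stage, the sketch below clusters the stacked six-channel image with the standard K-means implementation from scikit-learn. The function and variable names are illustrative and are not taken from Cai et al. (2017).

```python
import numpy as np
from sklearn.cluster import KMeans

def slat_stage3(u_star, K):
    """Cluster the stacked image u_star (H x W x 6, values in [0, 1]) into K segments.

    Returns a label map (H x W) and the mean vector of each segment,
    which plays the role of c_k in Eq. (98).
    """
    H, W, C = u_star.shape
    features = u_star.reshape(-1, C)
    km = KMeans(n_clusters=K, n_init=10).fit(features)
    labels = km.labels_.reshape(H, W)
    return labels, km.cluster_centers_
```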
Usually, two types of image segmentation problems are discussed in image pro-
cessing: one is global segmentation, in which the complete image is segmented
into all possible segments/regions, and the other one is the selective segmentation,
in which a region of interest is segmented in an image. In previous sections,
all discussions were about global segmentation. Another possible name used for
selective segmentation in literature is interactive segmentation. This section is
mainly devoted to selective segmentation.
    g(w) = 1 / (1 + w²).
It must be noted that g(|∇z(x, y)|) approaches zero near edges in an image, as discussed earlier. The purpose of the edge detector function g is to stop the evolving
curve on edges/boundaries of the objects (ROI). A function d(x, y) (distance metric)
is introduced to stop the evolving curve near the geometrical points given in set B.
This function d(x, y) can be defined in the following way (Guyader and Gout 2008):
    ∀(x, y) ∈ Ω,   d(x, y) = Π_{i=1}^{np} ( 1 − e^{−(x − xi∗)²/(2σ²)} e^{−(y − yi∗)²/(2σ²)} ),    (100)
for all (x, y) ∈ Ω and i = 1, 2, . . . , np; see Gout et al. (2005) for other choices. Clearly d(x, y) acts locally and is approximately 0 in the neighborhood of the points of set B. The aim is to find a contour Γ along which either d ≈ 0 or g ≈ 0. The following
energy functional is proposed:

    F(Γ) = ∫_Γ d(x, y) g(|∇z(x, y)|) ds.    (101)
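For a discrete image, the two weights entering (101) can be computed in a few lines. The NumPy sketch below is illustrative only: the names are hypothetical, and the product form of d follows the reconstruction of Equation (100) given above and should be checked against Guyader and Gout (2008).

```python
import numpy as np

def edge_and_distance_maps(z, markers, sigma=4.0):
    """Edge detector g = 1/(1 + |grad z|^2) and marker distance function d.

    z       : 2D image array
    markers : list of (x_i, y_i) marker coordinates (the set B)
    """
    gy, gx = np.gradient(z.astype(float))          # gradients along rows / columns
    g = 1.0 / (1.0 + gx**2 + gy**2)

    X, Y = np.meshgrid(np.arange(z.shape[1]), np.arange(z.shape[0]))
    d = np.ones_like(z, dtype=float)
    for (xs, ys) in markers:
        d *= 1.0 - np.exp(-(X - xs)**2 / (2*sigma**2)) * np.exp(-(Y - ys)**2 / (2*sigma**2))
    return g, d
```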
The contour Γ will stop at local minima where d ≈ 0 (in the neighborhood of the points of B) or g ≈ 0 (near object boundaries). By introducing a level set function φ, the functional in Equation (101) becomes the following:

    F(φ(x, y)) = ∫_Ω d(x, y) g(|∇z(x, y)|) δ(φ) |∇φ(x, y)| dx dy,    (102)
where δ(φ) is the regularized delta function. The functional F(φ(x, y)) is minimized with respect to φ(x, y) by considering the minimization problem

    min_{φ(x,y)} F(φ(x, y)),    (103)

where F(φ(x, y)) is given in Equation (102). The first variation of the functional in Equation (103) leads to the following Euler-Lagrange equation:

    −δ(φ(x, y)) ∇·( d(x, y) g(|∇z(x, y)|) ∇φ(x, y) / |∇φ(x, y)| ) = 0.
Guyader and Gout (2008) solved the following evolution equation by introducing an artificial time step t:

    ∂φ(x, y)/∂t = δ(φ(x, y)) ∇·( d(x, y) g(|∇z(x, y)|) ∇φ(x, y) / |∇φ(x, y)| ),    (104)
    ∂φ(x, y)/∂n = 0,
where n is the outward unit normal to the boundary ∂Ω. Clearly the quantity ∂φ(x, y)/∂t tends to 0 when a local minimum is achieved. In other words, if the model converges, the curve will not evolve any more since a steady state has been reached. A rescaling can be made so that the motion is applied to all level sets by replacing δ(φ(x, y)) by |∇φ(x, y)|. Furthermore, it makes the flow independent of
the scaling of φ (Alvarez et al. 1992; Zhao et al. 2000). Thus they considered the
following evolution problem:

    φ(x, y, 0) = φ0(x, y),
    ∂φ(x, y)/∂t = |∇φ(x, y)| ∇·( d(x, y) g(|∇z(x, y)|) ∇φ(x, y) / |∇φ(x, y)| ),

where φ0(x, y) is the initial value of φ(x, y). To avoid the evolving curve getting stuck at local minima, an extra term, known as a "balloon term" and given by α d(x, y) g(|∇z(x, y)|) with α > 0, is added. Thus the following evolution problem is considered for solution:

    φ(x, y, 0) = φ0(x, y),
    ∂φ(x, y)/∂t = |∇φ(x, y)| ∇·( d(x, y) g(|∇z(x, y)|) ∇φ(x, y) / |∇φ(x, y)| ) + α d(x, y) g(|∇z(x, y)|) |∇φ(x, y)|.
This evolution equation can be solved by using any time marching scheme. One of the best among these is the additive operator splitting (AOS) method (Weickert et al. 1997), which was discussed earlier.
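For illustration only, a single explicit (forward Euler) step of the rescaled flow can be written as below. This is not the AOS scheme recommended above, merely a minimal sketch of the update that AOS approximates more stably; the names and the time step are illustrative.

```python
import numpy as np

def explicit_gac_step(phi, G, dt=0.1, eps=1e-8):
    """One forward-Euler step of  phi_t = |grad phi| * div( G * grad phi / |grad phi| ).

    phi : level set function (2D array); G : weight d(x, y) * g(|grad z|).
    """
    py, px = np.gradient(phi)
    mag = np.sqrt(px**2 + py**2) + eps
    fx, fy = G * px / mag, G * py / mag          # flux components G * grad(phi)/|grad(phi)|
    div = np.gradient(fx, axis=1) + np.gradient(fy, axis=0)
    return phi + dt * mag * div
```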
The Badshah-Chen selective segmentation model (Badshah and Chen 2010) combines the geodesic term above with Chan-Vese fitting terms:

    F(Γ, c1, c2) = μ ∫_Γ d(x, y) g(|∇z(x, y)|) ds + λ1 ∫_{inside(Γ)} |z(x, y) − c1|² dx dy + λ2 ∫_{outside(Γ)} |z(x, y) − c2|² dx dy.    (109)
In the level set formulation, the functional becomes the following:

    F(φ(x, y), c1, c2) = μ ∫_Ω d(x, y) g(|∇z(x, y)|) δ(φ(x, y)) |∇φ(x, y)| dx dy
      + λ1 ∫_Ω |z(x, y) − c1|² H(φ(x, y)) dx dy + λ2 ∫_Ω |z(x, y) − c2|² ( 1 − H(φ(x, y)) ) dx dy.
Note that c1 and c2 are the average intensities as discussed earlier. Introducing G(x, y) = d(x, y)g(|∇z(x, y)|) and taking the first variation of the proposed functional with respect to φ through Gâteaux derivatives lead to the following Euler-Lagrange equation:

    δ(φ) [ μ ∇·( G(x, y) ∇φ / |∇φ| ) − λ1 ( z(x, y) − c1 )² + λ2 ( z(x, y) − c2 )² ] = 0.
The solution of this elliptic PDE is the steady-state solution of the following evolution equation (parabolic PDE):

    ∂φ/∂t = δ(φ) [ μ ∇·( G(x, y) ∇φ / |∇φ| ) − λ1 ( z(x, y) − c1 )² + λ2 ( z(x, y) − c2 )² ],

where n is the unit normal vector to the boundary of Ω. At steady state ∂φ/∂t = 0, which means a local minimum has been reached. After some manipulation, the above equation becomes the following:
    φ(x, y, 0) = φ0(x, y),
    ∂φ/∂t = μ δ(φ(x, y)) ∇·( G(x, y) ∇φ / |∇φ| ) − δ(φ) ( λ1 (z(x, y) − c1)² − λ2 (z(x, y) − c2)² ),    (114)
    ( δ(φ) / |∇φ| ) G(x, y) ∂φ/∂n = 0   on ∂Ω.
A term αG(x, y)|∇φ| (known as a balloon term) could be added to speed up the
convergence of the evolution equation as discussed in the previous section, where α
is a positive constant. This term prevents the curve from stopping on a nonsignificant
local minimum and is also of importance when initializing the process with a curve
inside the object to be detected (Guyader and Gout 2008). Thus Equation (114) with
balloon term can be written as follows:
    φ(x, y, 0) = φ0(x, y),
    ∂φ/∂t = μ δ(φ(x, y)) ∇·( G(x, y) ∇φ / |∇φ| ) − δ(φ) ( λ1 (z(x, y) − c1)² − λ2 (z(x, y) − c2)² ) + α G(x, y) |∇φ|,    (115)
    ( δ(φ) / |∇φ| ) G(x, y) ∂φ/∂n = 0   on ∂Ω,

or, writing z for z(x, y),

    φ(x, y, 0) = φ0(x, y),
    ∂φ/∂t = μ δ(φ) ∇·( G(x, y) ∇φ / |∇φ| ) − δ(φ) ( λ1 (z − c1)² − λ2 (z − c2)² ) + α G(x, y) |∇φ|,    (116)
    ( δ(φ) / |∇φ| ) G(x, y) ∂φ/∂n = 0   on ∂Ω.
Fig. 3 Detection of a tumor in a real brain MRI image using four markers, with initial guess φ0 = √((x − x0)² + (y − y0)²) − r0, where x0 and y0 are the averages of the x- and y-components of the markers, respectively; μ = (size of z)²/10, λ1 = 0.0001, λ2 = 0.0001, α = −1.51 × 10⁻², and σ = 4
Existence and uniqueness of the solution can be proven along similar lines to Guyader and Gout (2008). Equation (116) is solved by using time marching schemes such as the semi-implicit and additive operator splitting methods, which were discussed in the previous sections.
In Fig. 3, the proposed model is tested on a real brain MRI image to detect a tumor by taking four marker points near the tumor. The initial condition is φ0 = √((x − x0)² + (y − y0)²) − r0, where x0 and y0 are the averages of the x, y-components of the markers, respectively. The other parameters used are μ = (size of z)²/10, λ1 = 0.0001, λ2 = 0.0001, α = −1.51 × 10⁻², and σ = 4. The top left figure is the original image with initial data, and the top right figure is the result after 10 iterations. The bottom left figure is the result after 40 iterations, and the bottom right figure is the final result after 200 iterations.
Parameter selection. Initialization of the level set φ0 = √((x − x0)² + (y − y0)²) − r0 is done automatically by taking x0 and y0 as the averages of the x, y-components of the marker points, and r0 as the minimum distance of the center from all marker points. In most cases λ1 = λ2, and small values of them may be taken. The parameter α controls the expansion of the contour near the edges of the object region; its values are near zero and can be positive or negative. The parameter μ is usually taken as a multiple of the size of the given image, and it must be chosen very carefully because the model is very sensitive to the selection of this parameter.
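The automatic initialization described above amounts to a few lines of NumPy. The sketch below (with illustrative names) builds φ0 from a list of marker points.

```python
import numpy as np

def init_level_set(shape, markers):
    """Initial level set phi0 = sqrt((x - x0)^2 + (y - y0)^2) - r0.

    shape   : (H, W) of the image
    markers : iterable of (x_i, y_i) marker coordinates
    x0, y0 are the averages of the marker coordinates; r0 is the minimum
    distance from (x0, y0) to the markers, as described above.
    """
    markers = np.asarray(markers, dtype=float)
    x0, y0 = markers.mean(axis=0)
    r0 = np.sqrt(((markers - [x0, y0])**2).sum(axis=1)).min()
    X, Y = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]))
    return np.sqrt((X - x0)**2 + (Y - y0)**2) - r0
```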
    ΓG = ∂ΩG = { (x, y) ∈ Ω | φG(x, y) = 0 },
    inside(ΓG) = ΩG = { (x, y) ∈ Ω | φG(x, y) > 0 },    (118)
    outside(ΓG) = Ω \ ΩG = { (x, y) ∈ Ω | φG(x, y) < 0 }.
Note that ΩL ⊂ ΩG ⊂ Ω. To look for all features ΩG in the whole image domain Ω
and the selective features ΩL in the local domain ΩG , they proposed the following
energy functional by using regularized Heaviside function:
    min_{φL(x,y), φG(x,y), c1, c2} F( φL(x, y), φG(x, y), c1, c2 )
      = μ1 ∫_Ω d(x, y) g(|∇z(x, y)|) δ(φL(x, y)) |∇φL(x, y)| [ H(φG(x, y)) + γ ] dx dy
      + (μL/2) ∫_Ω ( |∇φL(x, y)| − 1 )² dx dy
      + μ2 ∫_Ω g(|∇z(x, y)|) δ(φG(x, y)) |∇φG(x, y)| dx dy
      + (μG/2) ∫_Ω ( |∇φG(x, y)| − 1 )² dx dy
      + λ1G ∫_Ω ( z(x, y) − c1 )² H(φG(x, y)) dx dy
      + λ2G ∫_Ω ( z(x, y) − c2 )² ( 1 − H(φG(x, y)) ) dx dy
      + λ1 ∫_Ω ( z(x, y) − c1 )² H(φL(x, y)) dx dy
      + λ2 ∫_Ω ( z(x, y) − c1 )² ( 1 − H(φL(x, y)) ) H(φG(x, y)) dx dy
      + λ3 ∫_Ω ( z(x, y) − c2 )² ( 1 − H(φL(x, y)) ) ( 1 − H(φG(x, y)) ) dx dy.    (119)
Here μL, μG are positive. Keeping φ fixed and minimizing with respect to c1 and c2 lead to the following:
    c1 = [ λ1G ∫_Ω z H(φG) dx dy + λ1 ∫_Ω z H(φL) dx dy + λ2 ∫_Ω z (1 − H(φL)) H(φG) dx dy ]
         / [ λ1G ∫_Ω H(φG) dx dy + λ1 ∫_Ω H(φL) dx dy + λ2 ∫_Ω (1 − H(φL)) H(φG) dx dy ],

    c2 = [ λ2G ∫_Ω z (1 − H(φG)) dx dy + λ3 ∫_Ω z (1 − H(φL)) (1 − H(φG)) dx dy ]
         / [ λ2G ∫_Ω (1 − H(φG)) dx dy + λ3 ∫_Ω (1 − H(φL)) (1 − H(φG)) dx dy ].
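These closed-form updates are straightforward to evaluate numerically. The sketch below is illustrative only: it assumes the common arctangent regularization of the Heaviside function (not restated here) and uses hypothetical names.

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """A common regularized Heaviside H_eps (assumed form)."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def update_c1_c2(z, phiL, phiG, lam1G, lam2G, lam1, lam2, lam3, eps=1.0):
    """Closed-form c1, c2 updates of the dual level set model (as reconstructed above)."""
    HL, HG = heaviside(phiL, eps), heaviside(phiG, eps)
    w1 = lam1G * HG + lam1 * HL + lam2 * (1 - HL) * HG
    w2 = lam2G * (1 - HG) + lam3 * (1 - HL) * (1 - HG)
    c1 = (z * w1).sum() / (w1.sum() + 1e-12)   # small constant guards empty regions
    c2 = (z * w2).sum() / (w2.sum() + 1e-12)
    return c1, c2
```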
The first variation of the functional in Equation (119) with respect to φL, with G(x, y) = d(x, y)g(|∇z(x, y)|), leads to the following:
    μ1 δ(φL) ∇·( G(x, y) ( H(φG) + γ ) ∇φL / |∇φL| ) + μL ∇·( ( 1 − 1/|∇φL| ) ∇φL )
      + δ(φL) [ −λ1 ( z(x, y) − c1 )² + λ2 ( z(x, y) − c1 )² Hε(φG) + λ3 ( z(x, y) − c2 )² ( 1 − Hε(φG) ) ] = 0   in Ω,
    ∂φL/∂n = 0   on ∂Ω.    (120)
The first variation with respect to φG leads analogously to Equation (124), whose data terms, balloon term, and boundary condition read:

      + λ2G ( z(x, y) − c2 )² − λ2 ( z(x, y) − c1 )² ( 1 − H(φL) )
      + λ3 ( z(x, y) − c2 )² ( 1 − H(φL) ) + α g(x, y) |∇φG|   in Ω,
    ∂φG/∂n = 0   on ∂Ω.    (124)
For further solution steps and experimental results, see Rada and Chen (2012).
The model produces good and accurate results in hard images and images having
overlapped regions but has high computational cost due to solution of system of
PDEs for updating two level sets.
Rada and Chen (2013) later proposed a selective segmentation model using a single level set function, which adds area constraints based on the marker points:

    F(Γ, c2) = μ ∫_Γ g(|∇z(x, y)|) ds + λ1 ∫_{inside(Γ)} ( z(x, y) − c1 )² dx dy + λ2 ∫_{outside(Γ)} ( z(x, y) − c2 )² dx dy
      + ν { ( ∫_{inside(Γ)} dx dy − A1 )² + ( ∫_{outside(Γ)} dx dy − A2 )² },    (125)
where λ1, λ2, μ, ν are positive constants and g is the edge detector function defined earlier. Note that c1 is known: it is the average intensity of the polygon constructed in the image from the marker points. c2 and Γ are unknown and need to be found by minimizing the functional in (125). A1 and A2 are the areas of the regions inside and outside the polygon constructed from the marker points. Using a level set function and the regularized Heaviside function, the functional in (125) takes the following form:
    min_{φ(x,y), c2} F(φ(x, y), c2) = μ ∫_Ω g(|∇z(x, y)|) δ(φ(x, y)) |∇(φ(x, y))| dx dy
      + λ1 ∫_Ω ( z(x, y) − c1 )² H(φ(x, y)) dx dy
      + λ2 ∫_Ω ( z(x, y) − c2 )² ( 1 − H(φ(x, y)) ) dx dy
      + ν { ( ∫_Ω H(φ(x, y)) dx dy − A1 )² + ( ∫_Ω ( 1 − H(φ(x, y)) ) dx dy − A2 )² }.    (126)
Keeping φ fixed and minimizing this functional with respect to c2 give the following:

    c2(φ(x, y)) = ( ∫_Ω z(x, y) ( 1 − H(φ(x, y)) ) dx dy ) / ( ∫_Ω ( 1 − H(φ(x, y)) ) dx dy ).

Now keep c2 fixed and assume that the marker points are not near the boundary of the region of interest. The first variation with respect to φ then gives the following Euler-Lagrange equation:
    δ(φ) { μ ∇·( g(|∇z(x, y)|) ∇φ / |∇φ| ) − λ1 ( z(x, y) − c1 )² + λ2 ( z(x, y) − c2 )²
      − ν [ ( ∫_Ω H dx dy − A1 ) − ( ∫_Ω (1 − H) dx dy − A2 ) ] } = 0   in Ω,    (127)
with Neumann boundary condition. If the marker points are near the boundary of the ROI, then Equation (127) becomes the following after introducing the distance function d(x, y):

    δ(φ) { μ ∇·( d(x, y) g(|∇z(x, y)|) ∇φ / |∇φ| ) − λ1 ( z(x, y) − c1 )² + λ2 ( z(x, y) − c2 )²
      − ν [ ( ∫_Ω H dx dy − A1 ) − ( ∫_Ω (1 − H) dx dy − A2 ) ] } = 0   in Ω.
One of the basic problems in image segmentation is handling low contrast and missing edge information. This problem has been addressed in many papers; one such work is Burrows et al. (2021), in which methods are proposed for segmentation of images containing low-contrast objects by making weak edges more prominent. To make the unclear/weak edges more prominent, the authors used a reproducing kernel Hilbert space (RKHS) and approximated Heaviside functions. Deng et al. (2016) used RKHS and approximated Heaviside functions for another type of imaging problem, namely image super-resolution. The RKHS models the smooth parts of an image, while edges may be represented by a set of approximated Heaviside functions. For details about RKHS and approximated Heaviside functions, see Deng et al. (2016) and Burrows et al. (2021).
    min_{d,β}  (1/2) ‖z − (Kd + Ψβ)‖² + p1 dᵀKd + p2 ‖β‖1 + p3 gᵀ|∇(Kd + Ψβ)|,    (129)

with

    ψ(t) = 1/2 + (1/π) arctan(t/δ).
β is a vector of the weights used for computing the edge part of the image, which is modeled by the set of functions ψ(t). K is an N × N matrix with K_{j,k} = K(x_j, x_k); g is the edge detector function based on Ψβ, which performs better than a gradient-based one. The final term encourages the contrast to be low in homogeneous regions and high near edges.
The model given in Eq. (129) is solved by introducing auxiliary variables θ = β, W = Kd + Ψβ, and v = ∇W, to obtain the following scheme:

    min_{d,β,θ,W,v}  (1/2) ‖z − (Kd + Ψβ)‖² + p1 dᵀKd + p2 ‖θ‖1 + p3 gᵀ|v|
      + (ρ1/2) ‖θ − β + b1‖² + (ρ2/2) ‖W − (Kd + Ψβ) + b2‖² + (ρ3/2) ‖v − ∇W + b3‖².    (130)
To implement a block coordinate descent scheme, take initial approximations d^(0), β^(0), θ^(0), W^(0), v^(0), and update them alternately and iteratively as follows.
The d problem in proximal form:

    d^(k) = arg min_d  (1/2) ‖z − (Kd + Ψβ^(k−1))‖² + p1 dᵀKd + (ζ1/2) ‖d − d^(k−1)‖²
      + (ρ2/2) ‖W^(k−1) − (Kd + Ψβ^(k−1)) + b2^(k−1)‖².    (131)
Linearizing the β problem and solving give the following proximal linear form:

    β^(k) = arg min_β  ⟨p̂^(k), β − β̂^(k−1)⟩ + (ρ1/2) ‖θ^(k−1) − β + b1^(k−1)‖² + (ζ2/2) ‖β − β̂^(k−1)‖²,    (133)

where β̂^(k−1) = β^(k−1) + ω^(k−1) ( β^(k−1) − β^(k−2) ) and p̂^(k) = ∇f(β̂^(k−1)), with

    f(β̂^(k−1)) = (1/2) ‖z − (Kd^(k) + Ψβ̂^(k−1))‖² + μ gᵀ|v^(k−1)|
      + (ρ2/2) ‖W^(k−1) − (Kd^(k) + Ψβ̂^(k−1)) + b2^(k−1)‖².    (134)
    β^(k) = ( 1 / (ρ1 + ζ2) ) ( ρ1 ( θ^(k−1) + b1^(k−1) ) + ζ2 β̂^(k−1) − p̂^(k) ).    (135)

    θ^(k) = arg min_θ  α ‖θ‖1 + (ρ1/2) ‖θ − β^(k) + b1^(k−1)‖²,
    W^(k) = arg min_W  (ρ2/2) ‖W − ( Kd^(k) + Ψβ^(k) ) + b2^(k−1)‖² + (ρ3/2) ‖v^(k−1) − ∇W + b3^(k−1)‖².    (136)

    v^(k) = arg min_v  ν gᵀ|v| + (ρ3/2) ‖v − ∇W^(k) + b3^(k−1)‖²,    (138)

    v^(k) = shrink( ∇W^(k) − b3^(k−1), (ν/ρ3) · g ),    (139)

    b1^(k) = b1^(k−1) + θ^(k) − β^(k),    (140)
    b2^(k) = b2^(k−1) + W^(k) − ( Kd^(k) + Ψβ^(k) ),    (141)
    b3^(k) = b3^(k−1) + v^(k) − ∇W^(k).    (142)
The first-stage model given in Eq. (129) separates the edges from the rest of the image and gives a clean image, say M = Kd + Ψβ. This clean image M is used in the next stage as the input to the segmentation model (Chan et al. 2006), which is given by the following:
    F(u) = ∫_Ω g(|Ψβ|) |∇u| dx + λ1 ∫_Ω (M − c1)² u dx + λ2 ∫_Ω (M − c2)² (1 − u) dx + ξ ∫_Ω ν(u) dx.    (143)
Using a similar framework, the authors proposed a combined model which couples the RKHS model with the convex CV model. The resulting model minimizes

    F(d, β, u, c1, c2) = (1/2) ‖z − (Kd + Ψβ)‖² + γ dᵀKd + α ‖β‖1 + μ gᵀ|∇u|
      + λ [ ( Kd + Ψβ − c1 )² u + ( Kd + Ψβ − c2 )² (1 − u) ].    (145)
Introducing splitting variables as before, the augmented functional is

    F(d, β, θ, w, u, c1, c2) = (1/2) ‖z − (Kd + Ψβ)‖² + γ dᵀKd + α ‖β‖1 + μ gᵀ|w|
      + λ [ ( Kd + Ψβ − c1 )² u + ( Kd + Ψβ − c2 )² (1 − u) ]
      + (ρ1/2) ‖θ − β + b1‖²₂ + (ρ2/2) ‖w − ∇u + b2‖²₂.    (147)
Subproblem 1.

    d^(k) = arg min_d  (1/2) ‖z − (Kd + Ψβ^(k−1))‖² + γ dᵀKd + (ζ1/2) ‖d − d^(k−1)‖²
      + λ [ u^(k−1) ( Kd + Ψβ^(k−1) − c1^(k−1) )² + ( 1 − u^(k−1) ) ( Kd + Ψβ^(k−1) − c2^(k−1) )² ],    (148)

where A = (1 + 2λ) K∗K + 2γK + ζ1 I.
    β^(k) = arg min_β  ⟨p̂^(k), β − β̂^(k−1)⟩ + (ρ1/2) ‖θ^(k−1) − β + b1^(k−1)‖² + (ζ2/2) ‖β − β̂^(k−1)‖²,    (150)

where β̂^(k−1) = β^(k−1) + ω^(k−1) ( β^(k−1) − β^(k−2) ) and p̂^(k) = ∇f(β̂^(k−1)), where f is given by the following:
    f(β̂^(k−1)) = (1/2) ‖z − (Kd^(k) + Ψβ̂^(k−1))‖² + μ gᵀ|w^(k−1)|
      + λ [ u^(k−1) ( Kd^(k) + Ψβ̂^(k−1) − c1^(k−1) )² + ( 1 − u^(k−1) ) ( Kd^(k) + Ψβ̂^(k−1) − c2^(k−1) )² ],    (151)

    ∇f(β̂^(k−1)) = −Ψᵀ( z − (Kd^(k) + Ψβ̂^(k−1)) ) − 2μι Ψᵀ( w^(k−1) g(Ψβ̂^(k−1)) Ψβ̂^(k−1) )
      + 2λ Ψᵀ[ ( Kd^(k) + Ψβ̂^(k−1) − c1^(k−1) ) u^(k−1) + ( Kd^(k) + Ψβ̂^(k−1) − c2^(k−1) ) ( 1 − u^(k−1) ) ],    (152)
Subproblem 4. This subproblem is solved for finding c1 and c2, for which the following minimization problems are solved:

    c1^(k) = arg min_{c1}  λ ⟨ u^(k−1), ( Kd^(k) + Ψβ^(k) − c1 )² ⟩ + (ζ3/2) ( c1 − c1^(k−1) )²,    (156)

    c2^(k) = arg min_{c2}  λ ⟨ 1 − u^(k−1), ( Kd^(k) + Ψβ^(k) − c2 )² ⟩ + (ζ4/2) ( c2 − c2^(k−1) )²,    (157)

and the solutions are given by the following:

    c1^(k) = ( ζ3 c1^(k−1) + 2λ ⟨ u^(k−1), Kd^(k) + Ψβ^(k) ⟩ ) / ( ζ3 + 2λ ⟨ u^(k−1), I ⟩ ),    (158)

    c2^(k) = ( ζ4 c2^(k−1) + 2λ ⟨ 1 − u^(k), Kd^(k) + Ψβ^(k) ⟩ ) / ( ζ4 + 2λ ⟨ 1 − u^(k), I ⟩ ).    (159)
(161)
where r^(k) = ( Kd^(k) + Ψβ^(k) − c1^(k) )² − ( Kd^(k) + Ψβ^(k) − c2^(k) )², F is the fast Fourier transform operator, and F∗ is its inverse.
    w^(k) = arg min_w  μ gᵀ|w| + (ρ2/2) ‖w − ∇u^(k) + b2^(k−1)‖²₂,    (162)

    w^(k) = shrink( ∇u^(k) − b2^(k−1), (μ/ρ2) · g ),    (163)

    b2^(k) = b2^(k−1) + w^(k) − ∇u^(k).    (164)
In 2017, Jumaat and Chen proposed a multilevel method for the solution of the Badshah-Chen selective segmentation model discussed in section "Active Contour-Based Image Selective Model" and the Rada-Chen selective segmentation model discussed in section "One-Level Selective Segmentation Model". Here G(x, y) = d(x, y)g(|∇z(x, y)|) and |∇H(φ(x, y))| = δ(φ)|∇φ(x, y)|. Suppose that the average intensities c1 and c2 are found at the start by using (13); to update φ, the following minimization problem is considered:
    min_{φ(x,y)} F(φ(x, y)) = μ ∫_Ω G(x, y) δ(φ) |∇φ(x, y)| dx dy
      + λ1 ∫_Ω |z(x, y) − c1|² H(φ(x, y)) dx dy + λ2 ∫_Ω |z(x, y) − c2|² ( 1 − H(φ(x, y)) ) dx dy.    (165)
Here assume that the given image z(x, y) has size n × n, where n = 2^L. The standard coarsening defines L + 1 levels k = 1 (finest level), 2, . . . , L, L + 1 (coarsest level); furthermore, the k-th level has τk × τk "superpixels," and each "superpixel" has bk × bk pixels, where τk = n/bk and bk = 2^{k−1}. By using the discrete form of TV |∇φ|, Equation (165) can be written as follows:
    F(φ) = μ̃ Σ_{i=1}^{m1−1} Σ_{j=1}^{m2−1} G_{i,j} √( ( φ_{i+1,j} − φ_{i,j} )² + ( φ_{i,j+1} − φ_{i,j} )² ) δ(φ_{i,j})
      + Σ_{i=1}^{m1−1} Σ_{j=1}^{m2−1} [ λ1 ( z_{i,j} − c1 )² − λ2 ( z_{i,j} − c2 )² ] H(φ_{i,j}) + terms independent of φ,    (166)

where the bracketed factor in the second sum is denoted r_{i,j}, μ̃ = μ/h, and the minimization is done with respect to φ, so the last term will not be considered from here onward. Consider the fine-level local minimization first, which is done by using the coordinate descent method.
    min_C F(φ̂ + C),

where C is a local and piecewise constant function. Consider a particular pixel (i, j). Clearly, if only φ_{i,j} is allowed to vary, we simply consider the local subproblem:
    min_{φ_{i,j}} F^{local}(φ_{i,j}) = μ̃ [ G_{i,j} √( ( φ_{i,j} − φ̂_{i+1,j} )² + ( φ_{i,j} − φ̂_{i,j+1} )² ) δ(φ_{i,j})
      + G_{i−1,j} √( ( φ_{i,j} − φ̂_{i−1,j} )² + ( φ̂_{i−1,j} − φ̂_{i−1,j+1} )² ) δ(φ̂_{i−1,j})
      + G_{i,j−1} √( ( φ_{i,j} − φ̂_{i,j−1} )² + ( φ̂_{i,j−1} − φ̂_{i+1,j−1} )² ) δ(φ̂_{i,j−1}) ]
      + r_{i,j} H(φ_{i,j}),

and iterate the following (Richardson-type) scheme to obtain an approximation for φ_{i,j}:

    φ_{i,j}^{new} = RHS / LHS,    (167)
where

    RHS = μ̃ [ G_{i,j} ( ( φ̂_{i+1,j} + φ̂_{i,j+1} ) / L1 ) δ(φ_{i,j}^{old}) + G_{i−1,j} φ̂_{i−1,j} δ(φ̂_{i−1,j}) / L2
      + G_{i,j−1} φ̂_{i,j−1} δ(φ̂_{i,j−1}) / L3 ] + r_{i,j} δ(φ̂_{i,j}),

    LHS = μ̃ [ 2 δ(φ_{i,j}^{old}) / L1 + δ(φ̂_{i−1,j}) / L2 + δ(φ̂_{i,j−1}) / L3 ] + 2 r_{i,j} φ_{i,j}^{old} / ( π ( ε² + (φ_{i,j}^{old})² )² ),

and

    L1 = ( φ_{i,j}^{old} − φ̂_{i+1,j} )² + ( φ_{i,j}^{old} − φ̂_{i,j+1} )² + β,
    L2 = ( φ_{i,j}^{old} − φ̂_{i−1,j} )² + ( φ̂_{i−1,j} − φ̂_{i−1,j+1} )² + β,
    L3 = ( φ_{i,j}^{old} − φ̂_{i,j−1} )² + ( φ̂_{i,j−1} − φ̂_{i+1,j−1} )² + β,

where β > 0 is a small regularizing parameter. Equation (167) is usually iterated for only a few steps to update φ̂_{i,j}.
    min_C F(φ̂ + C),    (168)

where C is a local and piecewise constant function of support τk × τk = 2^{k−1} × 2^{k−1} at each block (i, j) of pixels. On the k-th level, the subproblem may be taken as follows:

    ĉ = arg min_{c ∈ R^{τk × τk}} F(φ̂ + Ik Bk c),   Ck = Ik Bk ĉ.    (169)
The details of solving the local minimization subproblem (169) are as follows. Set, on level k, b = τk = 2^{k−1}, k1 = (i − 1)b + 1, k2 = ib, ℓ1 = (j − 1)b + 1, ℓ2 = jb. Firstly, note that on level k there are only m1/τk × m2/τk subproblems, each of which is essentially one dimensional (mimicking a coarse grid of a geometric multigrid method). Secondly, we introduce the Richardson-type iterative method adopted for each subproblem.
At each block (i, j) of pixels, solve (169) for c_{i,j}. Observe that each TV term |∇φ| does not change within the interior pixels of each block on level k because of the following:
    [ ( c_{i,j} + φ̂_{k,ℓ} ) − ( c_{i,j} + φ̂_{k+1,ℓ} ) ]² + [ ( c_{i,j} + φ̂_{k,ℓ} ) − ( c_{i,j} + φ̂_{k,ℓ+1} ) ]²
      = [ φ̂_{k,ℓ} − φ̂_{k+1,ℓ} ]² + [ φ̂_{k,ℓ} − φ̂_{k,ℓ+1} ]² ≡ T_{k,ℓ}.
    min_{c_{i,j}} F(φ̂ + Ik Bk c_{i,j})
      = μ̃ Σ_{ℓ=ℓ1}^{ℓ2} G_{k1−1,ℓ} √( [ c_{i,j} − ( φ̂_{k1−1,ℓ} − φ̂_{k1,ℓ} ) ]² + [ φ̂_{k1−1,ℓ} − φ̂_{k1−1,ℓ+1} ]² ) δ( c_{i,j} + φ̂_{k1−1,ℓ} )
      + μ̃ Σ_{k=k1}^{k2−1} G_{k,ℓ2} √( [ c_{i,j} − ( φ̂_{k,ℓ2+1} − φ̂_{k,ℓ2} ) ]² + [ φ̂_{k,ℓ2} − φ̂_{k+1,ℓ2} ]² ) δ( c_{i,j} + φ̂_{k,ℓ2} )
      + μ̃ G_{k2,ℓ2} √( [ c_{i,j} − ( φ̂_{k2,ℓ2+1} − φ̂_{k2,ℓ2} ) ]² + [ c_{i,j} − ( φ̂_{k2+1,ℓ2} − φ̂_{k2,ℓ2} ) ]² ) δ( c_{i,j} + φ̂_{k2,ℓ2} )
      + μ̃ Σ_{ℓ=ℓ1}^{ℓ2−1} G_{k2,ℓ} √( [ c_{i,j} − ( φ̂_{k2+1,ℓ} − φ̂_{k2,ℓ} ) ]² + [ φ̂_{k2,ℓ} − φ̂_{k2,ℓ+1} ]² ) δ( c_{i,j} + φ̂_{k2,ℓ} )
      + μ̃ Σ_{k=k1}^{k2} G_{k,ℓ1} √( [ c_{i,j} − ( φ̂_{k,ℓ1−1} − φ̂_{k,ℓ1} ) ]² + [ φ̂_{k,ℓ1−1} − φ̂_{k+1,ℓ1−1} ]² ) δ( c_{i,j} + φ̂_{k,ℓ1} )
      + Σ_{k=k1+1}^{k2−1} Σ_{ℓ=ℓ1+1}^{ℓ2−1} T_{k,ℓ} δ( c_{i,j} + φ̂_{k,ℓ} ) + Σ_{k=k1}^{k2} Σ_{ℓ=ℓ1}^{ℓ2} r_{k,ℓ} H( c_{i,j} + φ̂_{k,ℓ} ).    (170)
Define

    Φ_{k,ℓ} = φ̂_{k,ℓ+1} − φ̂_{k,ℓ},   Θ_{k,ℓ} = φ̂_{k+1,ℓ} − φ̂_{k,ℓ},

and:
    F(c_{i,j}) = μ̃ Σ_{ℓ=ℓ1}^{ℓ2} G_{k1−1,ℓ} √( ( c_{i,j} − Θ_{k1−1,ℓ} )² + Φ²_{k1−1,ℓ} ) δ( c_{i,j} + φ̂_{k1−1,ℓ} )
      + μ̃ Σ_{k=k1}^{k2−1} G_{k,ℓ2} √( ( c_{i,j} − Φ_{k,ℓ2} )² + Θ²_{k,ℓ2} ) δ( c_{i,j} + φ̂_{k,ℓ2} )
      + μ̃ Σ_{ℓ=ℓ1}^{ℓ2−1} G_{k2,ℓ} √( ( c_{i,j} − Θ_{k2,ℓ} )² + Φ²_{k2,ℓ} ) δ( c_{i,j} + φ̂_{k2,ℓ} )
      + μ̃ Σ_{k=k1}^{k2} G_{k,ℓ1} √( ( c_{i,j} − Φ_{k,ℓ1} )² + Θ²_{k,ℓ1} ) δ( c_{i,j} + φ̂_{k,ℓ1} )
      + μ̃ G_{k2,ℓ2} √2 √( ( c_{i,j} − P_{k2,ℓ2} )² + ( Q_{k2,ℓ2} )² ) δ( c_{i,j} + φ̂_{k2,ℓ2} )
      + μ̃ Σ_{k=k1+1}^{k2−1} Σ_{ℓ=ℓ1+1}^{ℓ2−1} T_{k,ℓ} δ( c_{i,j} + φ̂_{k,ℓ} ) + Σ_{k=k1}^{k2} Σ_{ℓ=ℓ1}^{ℓ2} r_{k,ℓ} H( c_{i,j} + φ̂_{k,ℓ} ).
Setting the first-order condition F′(c_{i,j}) = 0 and doing some manipulation, the following iterative scheme for c_{i,j} is obtained:

    c_{i,j}^{new} = RHS^{old} / LHS^{old},    (171)

starting from c_{i,j}^{old} = 0, where:
    RHS^{old} = 2μ̃ Σ_{ℓ=ℓ1}^{ℓ2} G_{k1−1,ℓ} φ̂_{k1−1,ℓ} √( ( c_{i,j}^{old} − Θ_{k1−1,ℓ} )² + Φ²_{k1−1,ℓ} ) / ( ε² + ( c_{i,j}^{old} + φ̂_{k1−1,ℓ} )² )²
      + μ̃ Σ_{ℓ=ℓ1}^{ℓ2} Θ_{k1−1,ℓ} / [ ( ε² + ( c_{i,j}^{old} + φ̂_{k1−1,ℓ} )² )² √( ( c_{i,j}^{old} − Θ_{k1−1,ℓ} )² + Φ²_{k1−1,ℓ} ) ]
      + · · · + 2μ̃ Σ_{k=k1+1}^{k2−1} Σ_{ℓ=ℓ1+1}^{ℓ2−1} T_{k,ℓ} φ̂_{k,ℓ} / ( ε² + ( c_{i,j}^{old} + φ̂_{k,ℓ} )² )²
      − Σ_{k=k1}^{k2} Σ_{ℓ=ℓ1}^{ℓ2} r_{k,ℓ} [ 2 c_{i,j}^{old} φ̂_{k,ℓ} / ( ε² + φ̂²_{k,ℓ} )² + 1 / ( ε² + ( c_{i,j}^{old} + φ̂_{k,ℓ} )² ) ],

and:

    LHS^{old} = −2μ̃ Σ_{ℓ=ℓ1}^{ℓ2} G_{k1−1,ℓ} ( ( c_{i,j}^{old} − Θ_{k1−1,ℓ} )² + Φ²_{k1−1,ℓ} ) / ( ε² + ( c_{i,j}^{old} + φ̂_{k1−1,ℓ} )² )²
      + μ̃ Σ_{ℓ=ℓ1}^{ℓ2} 1 / [ ( ε² + ( c_{i,j}^{old} + φ̂_{k1−1,ℓ} )² )² √( ( c_{i,j}^{old} − Θ_{k1−1,ℓ} )² + Φ²_{k1−1,ℓ} ) ]
      + · · · − 2μ̃ Σ_{k=k1+1}^{k2−1} Σ_{ℓ=ℓ1+1}^{ℓ2−1} T_{k,ℓ} / ( ε² + ( c_{i,j}^{old} + φ̂_{k,ℓ} )² )²
      − 2 Σ_{k=k1}^{k2} Σ_{ℓ=ℓ1}^{ℓ2} r_{k,ℓ} φ̂_{k,ℓ} / ( ε² + φ̂²_{k,ℓ} )².
Once c_{i,j} is found, the block values are updated by φ_{k,ℓ} = φ̂_{k,ℓ} + c_{i,j}.
On the coarsest level, the subproblem is

    min_c F(φ̂ + Ik Bk c) = min_c  μ̃ Σ_{i=1}^{m1} Σ_{j=1}^{m2} T_{i,j} δ( φ̂_{i,j} + c ) + Σ_{i=1}^{m1} Σ_{j=1}^{m2} r_{i,j} H( φ̂_{i,j} + c ),

and setting its derivative with respect to c to zero gives
    −μ̃ Σ_{i=1}^{m1} Σ_{j=1}^{m2} G_{i,j} T_{i,j} ( φ̂_{i,j} + c^{new} ) / ( ε² + ( φ̂_{i,j} + c )² )²
      + Σ_{i=1}^{m1} Σ_{j=1}^{m2} r_{i,j} [ 2 c^{old} φ̂_{i,j} / ( ε² + φ̂²_{i,j} )² + 1 / ( ε² + ( c^{old} + φ̂_{i,j} )² ) − 2 c^{new} φ̂_{i,j} / ( ε² + φ̂²_{i,j} )² ] = 0.    (172)
Along exactly the same lines, a multilevel method for the RC model can be derived; this is left as an exercise for the reader. For comparisons and experimental results, see Jumaat and Chen (2017).
Proposed Framework
The proposed framework can be constructed from any algorithm used for classi-
fication, which is combined with a region-based model with a level set method.
The matrix of classifier probability scores is generated by using KNN and support
vector machine (SVM). The matrix is then regularized and combined with Chan-
Vese (CV) active contour model (Chan and Vese 2001) which is discussed in
section “Chan-Vese Model” in detail.
KNN. KNN provides scores in the range [0, 1], which can be implemented easily using the fuzzy KNN rule. This rule is derived from fuzzy sets and the KNN classifier in machine learning. For a reference set XR = {xi : 1 ≤ i ≤ mR} and a set of l-dimensional vectors W = {wi : 1 ≤ i ≤ mR}, wi = (w_{i,1}, w_{i,2}, . . . , w_{i,l}), l and mR are the number of classes and the number of elements in the reference set XR, respectively. Due to the fuzziness of the vectors, the following condition must be satisfied:

    Σ_{j=1}^{l} w_{i,j} = 1,   0 ≤ w_{i,j} ≤ 1.    (173)
The fuzzy score ν is then computed as

    ν = (1/k) Σ_{s∈K} w_s.    (174)
Support vector machine (SVM). In a support vector machine, the given data is divided into two classes by finding a hyperplane between the classes with the largest margin. This is done by using a sign function class(x) = sgn(h(x)), where h(x) is the separating hyperplane for the two classes and is given by the following:
It is still hard to find ϕ explicitly, so a kernel K(x, xi) is introduced, and thus (176) may be written as follows:
    h(x) = Σ_{i=1}^{N} αi yi K(x, xi) + b0,    (177)
where αi is the estimated SVM parameter and yi ∈ {1, −1} is the desired class for the corresponding xi. The value of h(x) is the SVM evaluation score, and its sign is the predicted class. Note that the scores of KNN fall in the range [0, 1] and those of SVM in the range (−∞, ∞), which can be converted to a prior probability score. Instead of reducing these scores to binary labels, the probability scores are retained and processed further by applying a region-based active contour model. This aims to find an optimal solution, where the function ρ(s) can be simply expressed by the following:

    ρ2(s) = s.    (179)
A nonlinear function ρ approximately lying under ρ2 for s > 0.5 and above ρ2 for
s < 0.5 leads to better results. The regularization function in general should satisfy
the following conditions:
There are some more options for taking regularization functions ρ(s); for details see
Pratondo et al. (2017).
The map of ρ is then fed to a region-based active contour model. Through energy
minimization using the level set method, the optimum solution for the desired region
can be obtained. For experimental results, data set utilization, and comparisons, see
Pratondo et al. (2017).
In 2022, Badshah and Ahmad proposed a new CNN-based architecture, ResBCU-Net, for segmentation of skin/medical images. The network is an extension of U-Net that utilizes residual blocks, batch normalization, and bidirectional ConvLSTM. In addition, an extended form, ResBCU-Net(d = 3), is presented, which takes advantage of densely connected layers in its bottleneck section.
Proposed Work
Based on U-Net (Olaf et al. 2015) and inspired by residual blocks (He et al. 2016), batch normalization (Ioffe and Szegedy 2015), and the bidirectional convolutional long short-term memory (BConvLSTM) network (Song et al. 2018), a neural network named ResBCU-Net, shown in Fig. 4, was proposed for segmentation of skin/medical images. The authors made changes in the encoding and decoding paths of the classical U-Net, which are explained here in detail by considering encoding and decoding separately.
Fig. 4 ResBCU-Net architecture with residual blocks in the encoding path and BConvLSTM in
the decoding path. The numbers on top of the rectangles show number of channels
Encoding
Unlike the U-Net (Olaf et al. 2015), the encoding/contracting path of ResBCU-Net consists of residual blocks (He et al. 2016) and batch normalization layers (Ioffe and Szegedy 2015) with nine convolution layers. The path consists of three blocks; each block contains three convolution layers followed by a batch normalization layer. The output of the first convolution layer in each block is added to the output of the batch normalization layer, which is then followed by a max pooling layer. At the same time, before the max pooling layer, the output of each block is passed for concatenation with the corresponding output of the decoding/expanding path.
Residual Blocks
Successive sequences of convolution layers learn different features; in some cases they may also learn redundant features, and adding more layers can lead to higher training error. To solve this problem in deeper models, residual blocks were introduced in He et al. (2016). The input to a few convolution layers is added to the output of those layers, and the result is fed to the successive convolution layers; an example of a residual block is shown in Fig. 5.
The authors utilized this approach in the ResBCU-Net encoding path. Instead of blocks of two convolution layers, blocks of three convolution layers, each followed by a batch normalization layer, are introduced. Each block is then converted to a residual block by adding the output of the first convolution layer to the output of the batch normalization layer in the block, as shown in Fig. 6; a sketch is given below.
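A PyTorch sketch of such an encoder block follows. The kernel sizes, activations, and the exact placement of ReLU are assumptions; only the overall structure (three convolutions, one batch normalization, a skip connection from the first convolution, max pooling) comes from the description above.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Encoder block sketch: three convolutions, batch norm, residual skip, max pool."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.conv3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.relu(self.conv1(x))        # output of the first convolution
        f = self.relu(self.conv2(f1))
        f = self.bn(self.conv3(f))
        skip = self.relu(f + f1)             # residual addition; passed to the decoder for concatenation
        pooled = F.max_pool2d(skip, 2)       # fed to the next encoder block
        return pooled, skip
```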
Fig. 5 A residual block: the input x passes through weight layers to give F(x), the identity x is added, and the sum F(x) + x is passed through a ReLU activation
Batch Normalization
To avoid over-fitting and to accelerate the training process, batch normalization layers (Ioffe and Szegedy 2015) are included in the encoding and decoding paths of ResBCU-Net. The batch normalization layer controls variation in the feature distribution by computing per-channel mean and standard deviation values and adjusting the mean to 0 and the variance to 1; the equation for batch normalization (BN) is given below:

    BN(I_{n,c,h,w}) = γc ( I_{n,c,h,w} − μc ) / √( σc² + ε ) + βc .
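In NumPy, this map reads as follows (a sketch; `gamma` and `beta` are the learned per-channel parameters, assumed to broadcast over shape (1, C, 1, 1)):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel standardization followed by the learned affine map.

    x : array of shape (N, C, H, W); gamma, beta : shape (1, C, 1, 1) or scalars.
    """
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```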
In the encoding path, batch normalization layers, each followed by a ReLU activation function, are used, while in the decoding path the batch normalization layers are placed after each up-sampling layer and are then followed by ReLU activations before proceeding to the next block.
Decoding
The decoding/expanding path of ResBCU-Net, inspired by BCDU-Net (Azad et al. 2019), contains convolution layers, up-sampling layers, batch normalization layers (Ioffe and Szegedy 2015), and bidirectional convolutional LSTMs (BConvLSTM) (Song et al. 2018). Right after the bottleneck portion of the network, an up-sampling convolution with a 2 × 2 filter, followed by a batch normalization layer, is used, which is then followed by two convolution layer blocks. Features from the corresponding blocks in the encoding path are passed into the BConvLSTM after concatenation with the outputs of the corresponding block of the decoding path. In each block, outputs of the BConvLSTMs are passed into two convolutional layers. At the end of the decoding path, a convolution layer with a 1 × 1 filter is used, followed by a sigmoid activation function.
Here, ∗ and ◦ denote the convolution operator and the Hadamard product, respectively, and ht is the hidden state (output) of a single ConvLSTM block. In the case of a bidirectional ConvLSTM (BConvLSTM), the output can be represented as follows:

    Yt = tanh( ω_y^{→h} ∗ h⃗_t + ω_y^{←h} ∗ h⃖_t + b ),

where h⃗_t and h⃖_t are the output states of the forward and backward feature passes, and Yt is the final output of a BConvLSTM block.
The copied features from the encoding path are concatenated with the corresponding outputs from the decoding path and are then passed into BConvLSTM blocks. The output of these blocks then proceeds forward to the two convolution layer blocks. For training, testing, and comparison, see Badshah and Ahmad (2022) and references therein.
Conclusion
Some of the well-known active contour models for image segmentation are pre-
sented. Here both types of segmentations (global and selective) are discussed. In
this chapter two-phase and multiphase segmentation models are discussed in detail.
Minimization techniques for finding the optimal values and discussion about the
fast numerical methods for solution of partial differential equations arising from the
minimization of the models were key points of discussion in this chapter.
References
Allen, A.M., Cahn, J.W.: A microscopic theory for antiphase boundary motion and its application
to antiphase domain coarsening. Acta Metall. 27, 1085–1095 (1979)
Alvarez, L., Lions, P.-L., Morel, J.M.: Image selective smoothing and edge detection by nonlinear
diffusion. SIAM J. Numer. Anal. 29(3), 845–866 (1992)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential
Equations and the Calculus of Variations. Springer, New York (2002)
Azad, R., Asadi-Aghbolaghi, M., Fathy, M., Escalera, S.: Bi-directional convlstm U-Net with
densley connected convolutions. In: Proceedings of the IEEE/CVF International Conference
on Computer Vision Workshops (2019)
Badshah, N., Ahmad, A.: ResBCU-Net: deep learning approach for segmentation of skin images.
Biomed. Sig. Process. Control 71, 103137 (2022)
Badshah, N., Chen, K.: Multigrid method for the Chan-Vese model in variational segmentation.
Commun. Comput. Phys. 4(2), 294–316 (2008)
Badshah, N., Chen, K.: On two multigrid algorithms for modeling variational multiphase image
segmentation. IEEE Trans. Image Process. 18(5), 1097–1106 (2009)
Badshah, N., Chen, K.: Image selective segmentation under geometrical constraints using an active
contour approach. Commun. Comput. Phys. 7(4), 759–778 (2010)
Barash, D., Schlick, T., Israeli, M., Kimmel, R.: Multiplicative operator splittings in nonlinear
diffusion: from spatial splitting to multiple timesteps. J. Math. Imaging Vis. 19, 33–48 (2003)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3,
1–122 (2010)
Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31(2), 333–
390 (1977)
Bregman, L.: The relaxation method of finding the common point of convex sets and its application
to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3),
200–217 (1967)
Briggs, W.L.: A Multigrid Tutorial. (1999)
Burrows, L., Guo, W., Chen, K., Torella, F.: Reproducible kernel Hilbert space based global and
local image segmentation. Inverse Probl. Imaging 15(1), 1–25 (2021)
Cai, X., Chan, R., Zeng, T.: A two-stage image segmentation method using a convex variant of the
Mumford-Shah model and thresholding. SIAM J. Imaging Sci. 6(1), 368–390 (2013)
Cai, X., Chan, R., Nikolova, M., Zeng, T.: A three-stage approach for segmenting degraded color
images: smoothing, lifting and thresholding (SLaT). J. Sci. Comput. 72, 1313–1332 (2017)
Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22, 61–79
(1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Chan, T.F., Chen, K.: An optimization based multilevel agorithm for total variation image
denoising. SIAM J. Multiscale Model. Simul. (MMS) 5(2), 615–645 (2006)
Chan, T.F., Vese, L.A.: Active Contours without Edges. IEEE Trans. Image Process. 10(2), 266–
277 (2001)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Chan, R., Yang, H., Zeng, T.: A two-stage image segmentation method for blurry images with
poisson or multiplicative gamma noise. SIAM J. Imaging Sci. 7(1), 98–127 (2014)
Chen, K.: Matrix Preconditioning Techniques and Applications. Cambridge University Press,
Cambridge (2005)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems.
SIAM J. Optim. 24(4), 1779–1814 (2014)
Cleeremans, A., Servan-Schreiber, D., McClelland, J.: Finite state automata and simple recurrent
networks. Neural Comput. 1(3), 372–381 (1989). MIT Press
Cui, Z., Ke, R., Pu, Z., Wang, Y.: Deep bidirectional and unidirectional LSTM recurrent neural
network for network-wide traffic speed prediction. arXiv preprint, 1801.02143 (2018)
Deng, L.-J., Guo, W., Huang, T.-Z.: Single-image super-resolution via an iterative reproducing
kernel hilbert space method. IEEE Trans. Circuits Syst. Video Technol. 26, 2001–2014 (2016)
Geiser, J., Bartecki, K.: Additive, multiplicative and iterative splitting methods for Maxwell
equations: algorithms and applications. In: International Conference of Numerical Analysis and
Applied Mathematics (ICNAAM 2017)
Goldstein, T., Osher, S.: The split bregman algorithm for l1 regularized problems. SIAM J. Imaging
Sci. 2, 323–343 (2009)
Gout, C., Guyader, C.L., Vese, L.: Segmentation under geometrical consitions with geodesic active
contour and interpolation using level set methods. Numer. Algorithms 39, 155–173 (2005)
Guo, Y., Stein, J., Wu, G., Krishnamurthy, A.: SAU-Net: a universal deep network for cell counting.
In: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational
Biology and Health Informatics, pp. 299–306 (2019)
Guyader, C.L., Gout, C.: Geodesic active contour under geometrical conditions theory and 3D
applications. Numer. Algorithms 48, 105–133 (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal
covariate shift. arXiv preprint, 1502.03167 (2015)
Jeon, M., Alexander, M., Pedrycz, W., Pizzi, N.: Unsupervised hierarchical image segmentation
with level set and additive operator splitting. Pattern Recogn. Lett. 26(10), 1461–1469 (2005)
Jordan, M.I.: Attractor dynamics and parallelism in a connectionist sequential machine. Artif.
Neural Netw.: Concept Learn. 112–127 (1990)
Jumaat, A.K., Chen, K.: An optimization based multilevel algorithm for variational image
segmentation models. Electron. Trans. Numer. Anal. 46, 474–504 (2017)
Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: An efficient k-means
clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24,
881–892 (2002)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 6(4),
321–331 (1988)
Lu, T., Neittaanmaki, P., Tai, X.-C.: A parallel splitting up method for partial differential equations
and its application to navier-stokes equations. RAIRO Math. Model. Numer. Anal. 26(6), 673–
708 (1992)
Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. 42, 577–685 (1989)
Olaf, R., Philipp, F., Thomas, B.: U-Net: convolutional networks for biomedical image seg-
mentation. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention. Springer, pp. 234–241 (2015)
Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on
Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total
variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005)
Pearlmutter, B.: Learning state space trajectories in recurrent neural networks. Neural Comput.
1(2), 263–269 (1989). MIT Press
Pratondo, A., Chee-Kong, C., Sim-Heng, O.: Integrating machine learning with region-based active
contour models in medical image segmentation. J. Vis. Commun. Image R. 43, 1–9 (2017)
Rada, L., Chen, K.: A new variational model with dual level set functions for selective segmenta-
tion. Commun. Comput. Phys. 12(1), 261–283 (2012)
Rada, L., Chen, K.: Improved selective segmentation model using one level set. J. Algorithms
Comput. Technol. 7(4), 509–541 (2013)
Roberts, M., Chen, K., Li, J., Irion, K.: On an effective multigrid solver for solving a class of
variational problems with application to image segmentation. Int. J. Comput. Math. 97(10),
1–21 (2019)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithm. Physica
D 60(1–4), 259–268 (1992)
Savage, J., Chen, K.: An improved and accelerated non-linear multigrid method for total-variation
denoising. Int. J. Comput. Math. 82(8), 1001–1015 (2005)
Song, H., Wang, W., Zhao, S., Shen, J., Lam, K.: Pyramid dilated deeper ConvLSTM for video
salient object detection. In: Proceedings of the European Conference on Computer Vision
(ECCV), pp. 715–731 (2018)
Sussman, M., Smereka, P., Osher, S.: A level set approach for computing solutions to incompress-
ible two-phase flow. J. Comput. Phys. 114, 146–159 (1994)
Trottenberg, U., Schuller, A.: Multigrid. Academic, Orlando (2001)
Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the
Mumford and Shah model. Int. J. Comput. Vis. 50, 271–293 (2002)
Weickert, J., Kühne, G.: Fast methods for implicit active contours models, preprint 61. Universität
des Saarlandes, Saarbrücken (2002)
Weickert, J., ter Haar Romeny, B.M., Viergever, M.A.: Efficient and reliable schemes for nonlinear
diffusion filtering. Scale-space theory in computer vision. Lect. Notes Comput. Sci. 1252, 260–
271 (1997)
Yang, Y., Zhao, Y., Wu, B., Wang, H.: A fast multiphase image segmentation model for gray
images. Comput. Math. Appl. 67, 1559–1581 (2014)
Yuan, Y., He, C.: Variational level set methods for image segmentation based on both L2 and
Sobolev gradients. Non Linear Anal. Real World Appl. 13, 959–966 (2012)
Zhao, H.-K., Osher, S., Merriman, B., Kang, M.: Implicit and non parametric shape reconstruction
from unorganized data using a variational level set method. Comput. Vis. Image Underst. 80(3),
295–314 (2000)
13  On Variable Splitting and Augmented Lagrangian Method for Total Variation-Related Image Restoration Models
Contents
Introduction  504
Basic Notation  507
Augmented Lagrangian Method for Total Variation-Related Image Restoration Models  508
Augmented Lagrangian Method for TV-L2 Restoration  510
Augmented Lagrangian Method for TV-L2 Restoration with Box Constraint  516
Augmented Lagrangian Method for TV Restoration with Non-quadratic Fidelity  519
Extension to Multichannel Image Restoration  524
The Multichannel TV Restoration Model  524
Augmented Lagrangian Method for Multichannel TV Restoration  526
Extension to High-Order Models  528
Augmented Lagrangian Method for Second-Order Total Variation Model  528
Augmented Lagrangian Method for Total Generalized Variation Model  531
Augmented Lagrangian Method for Euler Elastic-Based Model  536
Augmented Lagrangian Method for Mean Curvature-Based Model  539
Numerical Experiments  541
Conclusions  546
References  546
Z. Liu
School of Mathematical Sciences, Tianjin Normal University, Tianjin, China
e-mail: [email protected]
Y. Duan
Center for Applied Mathematics, Tianjin University, Tianjin, China
e-mail: [email protected]
C. Wu ()
School of Mathematical Sciences, Nankai University, Tianjin, China
e-mail: [email protected]
X.-C. Tai
Hong Kong Center for Cerebro-cardiovascular Health Engineering (COCHE), Shatin, Hong Kong
e-mail: [email protected]
Abstract
Variable splitting and augmented Lagrangian method are widely used in image
processing. This chapter briefly reviews its applications for solving the total vari-
ation (TV) related image restoration problems. Due to the nonsmoothness of TV,
related models and variants are nonsmooth convex or nonconvex minimization
problems. Variable splitting and augmented Lagrangian method can benefit from
the separable structure and efficient subsolvers, and has convergence guarantee in
convex cases. We present this approach for a number of TV minimization models
including TV-L2 , TV-L1 , TV with nonquadratic fidelity term, multichannel TV,
high-order TV, and curvature minimization models.
Keywords
Introduction
This short survey provides a brief review of the variable splitting and augmented
Lagrangian method for total variation (TV)-related image restoration models. We
will focus on this computational problem closely, and do not plan to touch other
related topics like theoretical model analysis and algorithmic connections, which
can be referred to, e.g., Aubert and Kornprobst (2010) and Glowinski et al. (2016)
and references therein. Also, to keep the context as compact as possible, we do not expand all the details, although there are definitely many excellent works in the literature.
Total variation, which is a semi-norm of the space of functions of bounded
variation, was first proposed for image denoising by Rudin, Osher, and Fatemi
(ROF) in Rudin et al. (1992). In the discrete setting, it is essentially the L1 norm
of gradients and can maintain the sparse discontinuities. Therefore, it is appropriate
to preserve image edges that are usually the most important features for images to
recover. Owing to its edge-preserving property and convexity, total variation has
been demonstrated very successful and become popular in image restoration like
image denoising (Rudin et al. 1992; Le et al. 2007), image deblurring (Chan and
Wong 1998; Wu and Tai 2010) and image inpainting (Bertalmio et al. 2003) and also
various other types of image processing tasks including image decomposition (Vese
and Osher 2003), image segmentation (Chan and Vese 2001), CT reconstruction
(Persson et al. 2001), phase retrieval (Chang et al. 2016) and so on.
The total variation model has been generalized in many ways for different
purposes. The original total variation regularization was proposed for gray image
restoration (Rudin et al. 1992), which is the single channel case. To restore
multichannel data, such as color images with RGB channels, people extended it
to color TV and vectorial TV regularizations (Blomgren and Chan 1998; Sapiro
algorithms. There are some close connections between the augmented Lagrangian
method and other approaches such as split Bregman method (Goldstein and Osher
2009) and Chambolle’s projection method (Chambolle 2004), and some works
for improving classical augmented Lagrangian method can be found in Li et al.
(2013), etc.
The content included here is organized as follows. In section "Basic Notation", we present some basic notation. In section "Augmented Lagrangian Method for Total Variation-Related Image Restoration Models", we present augmented Lagrangian methods for TV restoration models with L2 fidelity term and for TV restoration models with non-quadratic fidelity. In section "Extension to Multichannel Image Restoration", we present augmented Lagrangian methods for multichannel TV restoration. In section "Extension to High-Order Models", we present augmented Lagrangian methods for high-order models, including the second-order total variation model, the total generalized variation model, Euler's elastica model, and the mean curvature model. In section "Numerical Experiments", we show some numerical experiments. We conclude this chapter in section "Conclusions".
Basic Notation
We follow Wu and Tai (2010) for most notations. As a gray image is a 2D array, we represent it by an N × N matrix, without loss of generality. It is useful to denote the Euclidean space R^{N×N} as X and write Y = X × X. We recall the discrete gradient operator

    ∇ : X → Y,   x ↦ ∇x,

where ∇x is given by (∇x)_{i,j} = ( (D̊1+ x)_{i,j}, (D̊2+ x)_{i,j} ), with
    (D̊1+ x)_{i,j} = x_{i,j+1} − x_{i,j}  for 1 ≤ j ≤ N − 1,   (D̊1+ x)_{i,N} = x_{i,1} − x_{i,N},
    (D̊2+ x)_{i,j} = x_{i+1,j} − x_{i,j}  for 1 ≤ i ≤ N − 1,   (D̊2+ x)_{N,j} = x_{1,j} − x_{N,j}.
Here D̊1+ and D̊2+ are used to denote forward difference operators with periodic
boundary condition for FFT algorithm implementation. We mention that other
boundary conditions with corresponding implementation tricks can also be adopted.
The usual inner products and L2 norms in the spaces X and Y are as follows. We denote

    ⟨x, z⟩ = Σ_{1≤i,j≤N} x_{i,j} z_{i,j}   and   ‖x‖ = √⟨x, x⟩,

for x, z ∈ X; and

    ⟨w, y⟩ = ⟨w¹, y¹⟩ + ⟨w², y²⟩   and   ‖y‖ = √⟨y, y⟩,

for w = (w¹, w²), y = (y¹, y²) ∈ Y, with |y_{i,j}| = √( (y¹_{i,j})² + (y²_{i,j})² ) as the usual Euclidean norm in R². We mention that ‖x‖_{Lp} is used to denote the general Lp norm of x ∈ X.
By using the inner products of X and Y, it is clear that the discrete divergence operator, as the adjoint operator of −∇, is as follows:

    div : Y → X,   y = (y¹, y²) ↦ div y,

where

    (div y)_{i,j} = y¹_{i,j} − y¹_{i,j−1} + y²_{i,j} − y²_{i−1,j} = (D̊1− y¹)_{i,j} + (D̊2− y²)_{i,j},

with backward difference operators D̊1− and D̊2− and periodic boundary conditions y¹_{i,0} = y¹_{i,N} and y²_{0,j} = y²_{N,j}.
applies. Here the noise is not necessarily additive and could be Gaussian, impulsive, Poisson, or of other types. The task of image restoration is to recover x from d. In this survey we only consider the case where the linear operator K is given. Even so, we usually cannot directly solve for x from (1), because this is a typical inverse problem. Both the random measurement noise and the bad condition number of K bring computational difficulties. Regularization of the solution should be considered to overcome the ill-posedness.
Although the classical Tikhonov regularization has achieved great success in many general inverse problems, it tends to over-smooth image edges, the most important image structure. Indeed, one of the most basic and successful image restoration models is the total variation regularization model (2), in which F(Kx) is a fidelity term and R(∇x) is the total variation of x (Rudin et al. 1992), defined by

    R(∇x) = TV(x) = Σ_{1≤i,j≤N} |(∇x)_{i,j}|.    (3)
Much research (Le et al. 2007; Beck and Teboulle 2009; Chan et al. 2013) shows that involving this kind of constraint is useful when the intensity range is known. Otherwise, one can simply let the box parameters be −∞ and +∞. This model includes numerous particular cases studied in the literature.
For further analysis and interpretation, we make the following assumptions:
where Null(·) is the null space of ·; dom(F ) = {z ∈ X : F (z) < +∞} is the
domain of F ; and dom(R ◦ ∇), dom(B), dom(F ◦ K) are similar. Here we have
some comments on these assumptions, which are quite natural. Since most linear operators K, like blur kernels, correspond essentially to averaging operations, Assumption 1 is reasonable. Moreover, although the fidelity terms F(·) are diverse, following the statistics of the noise models, many of them meet Assumptions 3 and 4, like the following typical ones:
    F(Kx) = (α/2) ‖Kx − d‖²,
where α > 0 is a parameter. Note for Poisson noise, we use the definition of the
fidelity on the whole space for analysis convenience, compared to Le et al. (2007)
(where K = I ) and (Brune et al. 2009).
Under the Assumptions 1, 2, 3, and 4, it is not difficult to see that the functional
E(x) in (2) is convex, proper, coercive, and lower semi-continuous. Thus we
have the following existence and uniqueness result, by the generalized Weierstrass
theorem and Fermat’s rule (Glowinski and Tallec 1989; Rockafellar and Wets 1998).
Theorem 1. The minimization problem (2) has at least one solution x, which
satisfies
with ∂F (Kx) and ∂R(∇x) being the sub-differentials (Rockafellar and Wets 1998)
of F at Kx and R at ∇x, respectively. Moreover, if F ◦ K(x) is strictly convex, the
minimizer is unique.
In this section, we review the augmented Lagrangian method proposed for the TV restoration model with L2 fidelity term (Tai and Wu 2009; Wu and Tai 2010):

    min_{x∈X}  E_TV(x) = (α/2) ‖Kx − d‖² + R(∇x),    (5)
where α > 0 and R(∇x) is defined as in (3). This model is a special case of model (2), in which F(Kx) = (α/2)‖Kx − d‖² and the box constraint vanishes. In the literature, model (5) is commonly called the TV-L2 model.
The TV-L2 model is a fundamental model in image restoration, which is usually applied for removing Gaussian-type noise and linear degradations like blur (Rudin et al. 1992; Acar and Vogel 1994). By standard Bayesian estimation, the L2 fidelity term is deduced from the statistical distribution of i.i.d. Gaussian noise, which guarantees that the recovered image closely resembles the underlying true image. Meanwhile, the total variation regularization preserves sharp edges.
As mentioned before, the total variation term is non-smooth and is a composition of the L1 norm and the gradient operator. A basic idea is to decouple the total variation term and to treat the L1 norm and the gradient operator separately. The augmented Lagrangian method realizes this idea by combining it with the variable splitting technique.
First, we introduce an auxiliary variable y ∈ Y for ∇x and convert the
minimization problem (5) to an equivalent constrained optimization problem
min_{x∈X, y∈Y} { G_TV(x, y) = (α/2) ‖Kx − d‖² + R(y) },
s.t.  y = ∇x.   (6)
Then, we define the following augmented Lagrangian function for the con-
strained optimization problem (6)
L_TV(x, y; λ) = (α/2) ‖Kx − d‖² + R(y) + ⟨λ, y − ∇x⟩ + (β/2) ‖y − ∇x‖²,   (7)

where λ ∈ Y is the Lagrange multiplier and β > 0 is a constant, and consider the following saddle-point problem:

Find (x*, y*, λ*) ∈ X × Y × Y,
s.t.  L_TV(x*, y*; λ) ≤ L_TV(x*, y*; λ*) ≤ L_TV(x, y; λ*),   (8)
∀ (x, y, λ) ∈ X × Y × Y.
The following theorem (Glowinski and Tallec 1989; Wu and Tai 2010) reveals
the relation between the solution of problem (5) and the saddle-point of problem (8).
λ^{k+1} = λ^k + β(y^k − ∇x^k).
We can see that the minimization problem (9) still cannot be solved directly and exactly. Our strategy is to separate problem (9) into two subproblems with respect to x and y and to minimize them alternately.
min_{x∈X} (α/2) ‖Kx − d‖² − ⟨λ^k, ∇x⟩ + (β/2) ‖y − ∇x‖².
where F and F⁻¹ denote the Fourier transform and the inverse Fourier transform, respectively. The Fourier transforms of the operators K*, K, div, and ∇ mean the transforms of the corresponding convolution kernels. If K is not a convolution operator, such as a Radon transform or a subsampling operator, we can solve the above equation (10) by other well-developed linear solvers like the conjugate gradient (CG) method.
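For illustration, the following minimal NumPy sketch shows how such an FFT-based x-update can be realized, under the assumptions of periodic boundary conditions, a forward-difference discrete gradient, and that K is a circular convolution whose optical transfer function k_otf (the FFT of the centered, padded kernel) is available; the function and variable names are ours and not part of the original formulation.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def grad(x):
    # forward differences with periodic boundary conditions
    return np.roll(x, -1, axis=0) - x, np.roll(x, -1, axis=1) - x

def div(p1, p2):
    # discrete divergence, the negative adjoint of grad
    return (p1 - np.roll(p1, 1, axis=0)) + (p2 - np.roll(p2, 1, axis=1))

def solve_x_subproblem(d, y1, y2, l1, l2, k_otf, alpha, beta):
    """One x-update: solve (alpha K*K - beta Laplacian) x = alpha K* d - div(lam + beta y)."""
    n1, n2 = d.shape
    w1 = np.exp(-2j * np.pi * np.arange(n1) / n1)[:, None]
    w2 = np.exp(-2j * np.pi * np.arange(n2) / n2)[None, :]
    lap = np.abs(1 - w1) ** 2 + np.abs(1 - w2) ** 2        # Fourier symbol of -Laplacian
    rhs = alpha * np.real(ifft2(np.conj(k_otf) * fft2(d))) \
          - div(l1 + beta * y1, l2 + beta * y2)
    denom = alpha * np.abs(k_otf) ** 2 + beta * lap        # nonzero if the kernel sums to 1
    return np.real(ifft2(fft2(rhs) / denom))
```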
min_{y∈Y} R(y) + ⟨λ^k, y⟩ + (β/2) ‖y − ∇x‖².   (11)
y_{i,j} = max{0, 1 − 1/(β|η_{i,j}|)} η_{i,j},   (13)

where η = ∇x − λ^k/β ∈ Y.
min_{u∈R²} |u| + (β/2) |u − v|²,   (14)
|u − v| ≥ |u∗ − v|,
which indicates
Fig. 1 A geometric interpretation of the formula (13)
|u| + (β/2) |u − v|² ≥ |u*| + (β/2) |u* − v|².
The above inequality implies that the solution of (14) lies on the line segment Ov. Therefore, we let u = γv with 0 ≤ γ ≤ 1 and simplify problem (14) into a univariate optimization problem
min_{0≤γ≤1} γ|v| + (β/2)(γ − 1)²|v|².   (15)
The above problem (15) can be solved exactly and has the closed-form solution

γ* = max{0, 1 − 1/(β|v|)}.
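The resulting closed-form minimizer is the familiar isotropic shrinkage (soft-thresholding) operator. A minimal NumPy sketch, assuming the two gradient components are stored as separate arrays and using 1/β as the threshold, is given below; the names are illustrative.

```python
import numpy as np

def shrink_isotropic(eta1, eta2, thresh):
    """Pointwise minimizer of |y| + (1/(2*thresh))*|y - eta|^2, i.e. formula (13) with thresh = 1/beta."""
    norm = np.maximum(np.sqrt(eta1 ** 2 + eta2 ** 2), 1e-12)   # avoid division by zero
    factor = np.maximum(0.0, 1.0 - thresh / norm)
    return factor * eta1, factor * eta2
```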
Algorithm 2 Augmented Lagrangian method for TV-L2 model – solve the mini-
mization problem (9)
Initialization: x k,0 = x k−1 , y k,0 = y k−1 .
Iteration: For l = 0, 1, . . . , L − 1:
Here L can be chosen using some convergence test. In fact, setting L = 1 is already sufficient to establish the convergence of the sequence generated by Algorithm 1 (Wu and Tai 2010). In this case, the augmented Lagrangian method is known as the alternating direction method of multipliers (Boyd 2010).
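To illustrate the L = 1 case, the following sketch assembles the two earlier code fragments into a plain ADMM loop for the TV-L2 model; it assumes the helper functions grad, solve_x_subproblem, and shrink_isotropic sketched above, and is meant only to show the structure of the iteration, not a tuned implementation.

```python
import numpy as np

def admm_tv_l2(d, k_otf, alpha, beta, n_iter=100):
    """Algorithm 1 with L = 1 (ADMM): FFT x-update, shrinkage y-update, multiplier update."""
    x = d.copy()
    y1 = np.zeros_like(d); y2 = np.zeros_like(d)   # splitting variable y ~ grad(x)
    l1 = np.zeros_like(d); l2 = np.zeros_like(d)   # Lagrange multiplier
    for _ in range(n_iter):
        x = solve_x_subproblem(d, y1, y2, l1, l2, k_otf, alpha, beta)
        gx, gy = grad(x)
        # y-subproblem (11): shrink eta = grad(x) - lambda/beta with threshold 1/beta
        y1, y2 = shrink_isotropic(gx - l1 / beta, gy - l2 / beta, 1.0 / beta)
        # multiplier update: lambda <- lambda + beta*(y - grad(x))
        l1 += beta * (y1 - gx)
        l2 += beta * (y2 - gy)
    return x
```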
Convergence Analysis
In this section, we present some convergence results for Algorithm 1. Actually, we can verify that Algorithm 1 is convergent in two cases, i.e., when the minimization problem (9) is solved exactly in each iteration and when problem (9) is solved only approximately in each iteration (Glowinski and Tallec 1989; Wu and Tai 2010). We remark that the convergence proof in Wu and Tai (2010) is based on Glowinski and Tallec (1989) but removes the uniform convexity assumption on R(·). Here, we just state the main convergence results from Wu and Tai (2010) and omit the details.
In the first case, we should set L → ∞ in Algorithm 2, and the inner iteration is
guaranteed to converge.
Since R(y) is continuous, (16) indicates that x^k is a minimizing sequence of E_TV(·). If we further have Null(K) = {0}, then

lim_{k→∞} x^k = x*,   lim_{k→∞} y^k = y*.
Since R(y) is continuous, (17) indicates that x^k is a minimizing sequence of E_TV(·). If we further have Null(K) = {0}, then

lim_{k→∞} x^k = x*,   lim_{k→∞} y^k = y*.
In this section, we review the augmented Lagrangian method for the TV restoration
model with the L2 fidelity term and the box constraint (Chan et al. 2013), which
reads
min_{x∈X} { E_TVB(x) = (α/2) ‖Kx − d‖² + R(∇x) + B(x) },   (18)
where α > 0, R(∇x) is defined as in (3), and −∞ < b ≤ b̄ < +∞ in B(x). This model is also a special case of model (2), where F(Kx) = (α/2)‖Kx − d‖².
The box constraint is inherent in digital image processing. A natural image is stored as a discrete numerical array in digital media. The typically used intensity ranges are [0, 1] and [0, 255]. It has been shown that adding the box constraint in image restoration can improve the quality of the recovered image (Beck and Teboulle 2009; Chan et al. 2013).
The original method proposed in Chan et al. (2013) is formulated under the framework of the alternating direction method of multipliers, which is a special case of the augmented Lagrangian method. For the sake of clarity, we reformulate it in our notation and style.
Compared with the TV-L2 model (5), this model has one more non-differentiable term B(x). Thus, we need another auxiliary variable to handle the non-differentiability with respect to x. We introduce two auxiliary variables y ∈ Y and z ∈ X and rewrite problem (18) as the following constrained optimization problem
min_{x∈X, y∈Y, z∈X} G_TVB(x, y, z) = (α/2) ‖Kx − d‖² + R(y) + B(z)
s.t.  y = ∇x,  z = I₁x,   (19)
We define the augmented Lagrangian function for problem (19) as follows:

L_TVB(x, y, z; λ_y, λ_z) = (α/2) ‖Kx − d‖² + R(y) + B(z) + ⟨λ_y, y − ∇x⟩ + ⟨λ_z, z − x⟩ + (1/2) ‖(y − ∇x; z − x)‖²_S,   (20)

where (λ_y; λ_z) is the Lagrange multiplier and S = diag(β_y I₂, β_z I₁), with the identity operator I₂ : Y → Y and positive parameters β_y, β_z. Here ‖u‖_S denotes the S-norm, defined by ‖u‖_S = √⟨u, Su⟩.
For the augmented Lagrangian method, we consider the saddle-point problem
Algorithm 3 Augmented Lagrangian method for TV-L2 model with box constraint
Initialization: x^{−1} = 0, (y^{−1}; z^{−1}) = (0; 0), (λ_y^0; λ_z^0) = (0; 0).
Iteration: For k = 0, 1, . . .:

(λ_y^{k+1}; λ_z^{k+1}) = (λ_y^k; λ_z^k) + (β_y(y^k − ∇x^k); β_z(z^k − x^k)).
Similar to the equation (10), the above equation can be efficiently solved by fast
linear solvers such as FFT and CG.
• y-subproblem:
  min_{y∈Y} R(y) + ⟨λ_y^k, y⟩ + (β_y/2) ‖y − ∇x‖²,   (26)
• z-subproblem:
  min_{z∈X} B(z) + ⟨λ_z^k, z⟩ + (β_z/2) ‖z − x‖².   (27)
We can obtain the minimizer of (26) from (13), and the minimizer of (27) is given by the projection

z = P_{[b,b̄]}(ξ),   (28)

where P_{[b,b̄]}(·) is the projection onto the interval [b, b̄] and ξ = x − λ_z^k/β_z ∈ X.
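The z-update is thus just a componentwise clipping; a one-line NumPy sketch with illustrative names is:

```python
import numpy as np

def z_update_box(x, lam_z, beta_z, lower, upper):
    """Minimizer of (27): project xi = x - lam_z/beta_z onto the box [lower, upper]."""
    return np.clip(x - lam_z / beta_z, lower, upper)
```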
After knowing the solutions of the subproblems (23) and (25), we use the following alternating minimization procedure to solve (22); see Algorithm 4.
• Compute x^{k,l+1} by solving (24) for (y; z) = (y^{k,l}; z^{k,l});
• Compute (y^{k,l+1}; z^{k,l+1}) from (13) and (28) for x = x^{k,l+1}.
Output: x^k = x^{k,L}, (y^k; z^k) = (y^{k,L}; z^{k,L}).
and for Poisson noise removal, we commonly choose the Kullback-Leibler (KL)
divergence fidelity (Le et al. 2007; Brune et al. 2009)
F(Kx) = α Σ_{1≤i,j≤N} ((Kx)_{i,j} − d_{i,j} log(Kx)_{i,j})  if (Kx)_{i,j} > 0 for all i, j, and F(Kx) = +∞ otherwise.   (31)
In this section, we focus on the augmented Lagrangian method for image restoration with these two non-quadratic fidelities. For other non-quadratic fidelities, one can extend the method accordingly.
The non-quadratic fidelities (30) and (31) are non-smooth. Adopting the idea used to cope with the total variation term, we require one more auxiliary variable to remove the nonlinearity arising from F(Kx). We first introduce two auxiliary variables y and z and reformulate (29) as an equivalent constrained optimization problem
min_{x∈X, y∈Y, z∈X} G_TVNQ(y, z) = R(y) + F(z)
s.t.  y = ∇x,  z = Kx.   (32)
with Lagrange multiplier (λ_y; λ_z) and S = diag(β_y I₂, β_z I₁), and consider the saddle-point problem
(λ_y^{k+1}; λ_z^{k+1}) = (λ_y^k; λ_z^k) + (β_y(y^k − ∇x^k); β_z(z^k − Kx^k)).
We can use fast linear solvers to solve the above equation, such as FFT and CG.
We can split it into two distinct minimization problems with respect to y and z as
follows
• y-subproblem:
  min_{y∈Y} R(y) + ⟨λ_y^k, y⟩ + (β_y/2) ‖y − ∇x‖²;   (39)
• z-subproblem:
  min_{z∈X} F(z) + ⟨λ_z^k, z⟩ + (β_z/2) ‖z − Kx‖².   (40)
Problem (39) is the same as problem (11) and can be solved via (13). For problem (40), we next present its solution according to the choice of F(·).
For the L1 fidelity (30), we can rewrite the z-subproblem (40) as

min_{z∈X} α ‖z − d‖_{L¹} + (β_z/2) ‖z − ξ‖²,

where ξ = Kx − λ_z^k/β_z.
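The closed-form minimizer (41) is not reproduced above; it is presumably the standard componentwise soft-thresholding of ξ around the data d, which the following hedged NumPy sketch implements (the names are ours):

```python
import numpy as np

def z_update_l1(Kx, d, lam_z, alpha, beta_z):
    """Pointwise minimizer of alpha*|z - d| + (beta_z/2)*(z - xi)^2 with xi = Kx - lam_z/beta_z."""
    xi = Kx - lam_z / beta_z
    r = xi - d
    return d + np.sign(r) * np.maximum(np.abs(r) - alpha / beta_z, 0.0)
```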
• Compute x^{k,l+1} by solving (37) for (y; z) = (y^{k,l}; z^{k,l});
• Compute (y^{k,l+1}; z^{k,l+1}) from (13) and (41) for x = x^{k,l+1}.
Output: x^k = x^{k,L}, (y^k; z^k) = (y^{k,L}; z^{k,L}).
where ξ = Kx − λ_z^k/β_z.
Now, the alternating minimization procedure to solve the problem (35) with the KL
divergence fidelity (31) can be described in Algorithm 7.
• Compute x^{k,l+1} from (37) for (y; z) = (y^{k,l}; z^{k,l});
• Compute (y^{k,l+1}; z^{k,l+1}) from (13) and (42) for x = x^{k,l+1}.
Output: x^k = x^{k,L}, (y^k; z^k) = (y^{k,L}; z^{k,L}).
In this section, we review the augmented Lagrangian method for multichannel TV restoration (Wu and Tai 2010). Multichannel images are widely used, a typical example being the three-channel RGB color image.
Let us define

𝒳 = X × X × ⋯ × X (M copies),   𝒴 = Y × Y × ⋯ × Y (M copies).
The usual inner products and L2 norms in the spaces 𝒳 and 𝒴 are as follows. We denote

⟨x, z⟩ = Σ_{1≤m≤M} ⟨x_m, z_m⟩,   ‖x‖ = √⟨x, x⟩;
⟨y, w⟩ = Σ_{1≤m≤M} ⟨y_m, w_m⟩,   ‖y‖ = √⟨y, y⟩,

for x ∈ 𝒳 and y ∈ 𝒴.
With reference to the degradation model (1) of the gray image, here we model
the multichannel image degradation procedure as
K = ( K_{11}  K_{12}  ⋯  K_{1M}
      K_{21}  K_{22}  ⋯  K_{2M}
       ⋮       ⋮      ⋱    ⋮
      K_{M1}  K_{M2}  ⋯  K_{MM} ),
where each Kij is a convolution matrix. The diagonal elements of K denote within-
channel blurs, while the off-diagonal elements describe cross-channel blurs. To
solve x, we consider the following multichannel image restoration model (Sapiro
and Ringach 1996)
min_{x∈𝒳} { E_MTV(x) = (α/2) ‖Kx − d‖² + R_MTV(∇x) },   (43)
where

R_MTV(∇x) = TV(x) = Σ_{1≤i,j≤N} √( Σ_{1≤m≤M} |(∇x_m)_{i,j}|² )
is the vectorial TV semi-norm (Sapiro and Ringach 1996) (see Blomgren and Chan
1998 for some other choices).
Similarly as for the single channel image restoration model, here we make the
following assumption:
Under this assumption, one can verify that the functional EMTV (x) in (43) is convex,
proper, coercive, and continuous. Hence, we have the following result (Wu and Tai
2010).
Theorem 6. The problem (43) has at least one solution x, which satisfies
min_{x∈𝒳, y∈𝒴} G_MTV(x, y) = (α/2) ‖Kx − d‖² + R_MTV(y)
s.t.  y = ∇x.   (44)
L_MTV(x, y; λ) = (α/2) ‖Kx − d‖² + R_MTV(y) + ⟨λ, y − ∇x⟩ + (β/2) ‖y − ∇x‖²,

Find (x*, y*, λ*) ∈ 𝒳 × 𝒴 × 𝒴
s.t.  L_MTV(x*, y*; λ) ≤ L_MTV(x*, y*; λ*) ≤ L_MTV(x, y; λ*),
∀ (x, y; λ) ∈ 𝒳 × 𝒴 × 𝒴.   (45)
1. Compute (x k , y k ) from
2. Update
λ^{k+1} = λ^k + β(y^k − ∇x^k).
As for the minimization problem (46), we separate it into two subproblems with respect to x and y and minimize them alternately.
min_{x∈𝒳} (α/2) ‖Kx − d‖² − ⟨λ^k, ∇x⟩ + (β/2) ‖y − ∇x‖².   (47)
from which F(x) can be found and then x recovered via an inverse Fourier transform (Yang et al. 2009; Wu and Tai 2010). Here applying the Fourier transform to a block matrix is understood as applying the Fourier transform to each block.
min_{y∈𝒴} { R_MTV(y) + ⟨λ^k, y⟩ + (β/2) ‖y − ∇x‖² }.   (48)
It has the following closed-form solution (Yang et al. 2009; Wu and Tai 2010)

y_{i,j} = max{1 − 1/(β|η_{i,j}|), 0} η_{i,j},   (49)

where η = ∇x − λ^k/β ∈ 𝒴. Indeed, this solution is a high-dimensional version of (13), which can also be derived from the geometric method.
According to (48) and (49), we then have an alternating minimization procedure for (46); see Algorithm 9.
To overcome the staircase effect, Lysaker, Lundervold, and Tai suggested regulariz-
ing the total variation of the gradient and proposed a model based on second-order
derivatives (Lysaker et al. 2003). We begin with some notations to establish this
second-order total variation (TV2 ) model.
Let

Ỹ = X × X × X × X,

and define the operator

H : X → Ỹ,  x ↦ Hx,

with

(Hx)_{i,j} = ( (D̊_{11}^{−+}x)_{i,j}  (D̊_{12}^{++}x)_{i,j} ; (D̊_{21}^{++}x)_{i,j}  (D̊_{22}^{−+}x)_{i,j} ),

where D̊_{11}^{−+}, D̊_{12}^{++}, D̊_{21}^{++}, and D̊_{22}^{−+} are second-order difference operators given by

(D̊_{11}^{−+}x)_{i,j} := (D̊₁⁻(D̊₁⁺x))_{i,j},
(D̊_{12}^{++}x)_{i,j} := (D̊₁⁺(D̊₂⁺x))_{i,j},
(D̊_{21}^{++}x)_{i,j} := (D̊₂⁺(D̊₁⁺x))_{i,j},
(D̊_{22}^{−+}x)_{i,j} := (D̊₂⁻(D̊₂⁺x))_{i,j}.
The usual inner product and L2 norm in the space Ỹ are as follows. We denote

⟨y, w⟩ = ⟨y¹, w¹⟩ + ⟨y², w²⟩ + ⟨y³, w³⟩ + ⟨y⁴, w⁴⟩   and   ‖y‖ = √⟨y, y⟩,

for y = (y¹ y²; y³ y⁴) ∈ Ỹ and w = (w¹ w²; w³ w⁴) ∈ Ỹ. At each pixel (i, j),

|y_{i,j}| = √( (y¹)²_{i,j} + (y²)²_{i,j} + (y³)²_{i,j} + (y⁴)²_{i,j} )
is the usual Euclidean norm in R⁴. By using the inner products of Ỹ and X and the definitions of the finite difference operators, the adjoint operator of H is as follows:

H* : Ỹ → X,  y = (y¹ y²; y³ y⁴) ↦ H*y,

where

(H*y)_{i,j} = (D̊_{11}^{+−}y¹)_{i,j} + (D̊_{21}^{−−}y²)_{i,j} + (D̊_{12}^{−−}y³)_{i,j} + (D̊_{22}^{+−}y⁴)_{i,j},

where D̊_{11}^{+−}, D̊_{12}^{−−}, D̊_{21}^{−−}, and D̊_{22}^{+−} are second-order difference operators.
By regularizing the norm of the discrete Hessian, the TV2 model (Lysaker et al.
2003) reads
min_{x∈X} { E_TV2(x) = (α/2) ‖Kx − d‖² + R_HO(Hx) },   (50)
Similarly as for the total variation restoration model, we make the following
assumption:
Under this assumption, the functional ETV2 (x) in (50) is convex, proper, coercive,
and continuous. Hence, we have the following result.
Theorem 7. The problem (50) has at least one solution x, which satisfies
min_{x∈X, ŷ∈Ỹ} G_TV2(x, ŷ) = (α/2) ‖Kx − d‖² + R_HO(ŷ)
s.t.  ŷ = Hx.   (52)
L_TV2(x, ŷ; λ) = (α/2) ‖Kx − d‖² + R_HO(ŷ) + ⟨λ, ŷ − Hx⟩ + (β/2) ‖ŷ − Hx‖²,   (53)

with the multiplier λ ∈ Ỹ and a positive constant β, and consider the following saddle-point problem:

Find (x*, ŷ*, λ*) ∈ X × Ỹ × Ỹ
s.t.  L_TV2(x*, ŷ*; λ) ≤ L_TV2(x*, ŷ*; λ*) ≤ L_TV2(x, ŷ; λ*),
∀ (x, ŷ; λ) ∈ X × Ỹ × Ỹ.   (54)
1. Compute (x k , ŷ k ) from
2. Update
λ^{k+1} = λ^k + β(ŷ^k − Hx^k).
min_{x∈X} (α/2) ‖Kx − d‖² − ⟨λ^k, Hx⟩ + (β/2) ‖ŷ − Hx‖²,   (56)
This equation can be solved by well-developed linear solvers such as FFT and CG.
min_{ŷ∈Ỹ} R_HO(ŷ) + ⟨λ^k, ŷ⟩ + (β/2) ‖ŷ − Hx‖²,   (58)
ŷ_{i,j} = max{0, 1 − 1/(β|η_{i,j}|)} η_{i,j},   (59)

where η = Hx − λ^k/β ∈ Ỹ. This solution is again a high-dimensional version of (13), which can also be derived from the geometric method.
According to (57) and (59), we then use an iterative procedure to alternately compute x and ŷ; see Algorithm 11.
Algorithm 11 Augmented Lagrangian method for the TV2 model – solve the
minimization problem (55)
Initialization: x k,0 = x k−1 , ŷ k,0 = ŷ k−1 .
Iteration: For l = 0, 1, 2, . . . , L − 1:
et al. 2010). In this section, we consider the following discrete second-order total
generalized variation (Bredies et al. 2010)-based image restoration model
min_{x∈X, w∈Y} (1/2) ‖Kx − d‖² + α₁ R(∇x − w) + α₀ R_HO(Ew),   (60)
where

E : Y → Ỹ,  w = (w¹, w²) ↦ Ew = (1/2)(∇w + ∇wᵀ),

with

(Ew)_{ij} = (1/2)(∇w + ∇wᵀ)_{ij}
          = ( (D̊₁⁺w¹)_{ij}                         (1/2)((D̊₂⁺w¹)_{ij} + (D̊₁⁺w²)_{ij}) ;
              (1/2)((D̊₂⁺w¹)_{ij} + (D̊₁⁺w²)_{ij})   (D̊₂⁺w²)_{ij} )
and R_HO(·) is defined in (51). Similarly, by using the inner products of Ỹ and Y and the definitions of the finite difference operators, the adjoint operator of −E is as follows:

div² : Ỹ → Y,  z = (z¹ z³; z³ z²) ↦ div²z,

where

div²z = ( D̊₁⁻z¹ + D̊₂⁻z³ ;  D̊₁⁻z³ + D̊₂⁻z² )
with auxiliary variables y = (y¹, y²) ∈ Y and z = (z¹ z³; z³ z²) ∈ Ỹ, and transform it into an equivalent constrained optimization problem

min_{x∈X, w∈Y, y∈Y, z∈Ỹ} G_TGV(x, y, z) = (1/2) ‖Kx − d‖² + α₁ R(y) + α₀ R_HO(z)
s.t.  y = ∇x − w,  z = Ew.   (61)
We then define the augmented Lagrangian function as follows

L_TGV(x, w, y, z; λ_y, λ_z) = (1/2) ‖Kx − d‖² + α₁ R(y) + α₀ R_HO(z) + ⟨λ_y, y − ∇x + w⟩ + ⟨λ_z, z − Ew⟩ + (1/2) ‖(y − ∇x + w; z − Ew)‖²_S,   (62)

where (λ_y; λ_z) is the Lagrange multiplier and S = diag(β_y I₂, β_z Ĩ₂) with the identity operator Ĩ₂ : Ỹ → Ỹ, and consider the saddle-point problem

Find (x*, w*, y*, z*, λ_y*, λ_z*) ∈ X × Y × Y × Ỹ × Y × Ỹ
s.t.  L_TGV(x*, w*, y*, z*; λ_y, λ_z) ≤ L_TGV(x*, w*, y*, z*; λ_y*, λ_z*) ≤ L_TGV(x, w, y, z; λ_y*, λ_z*),
∀ (x, w, y, z, λ_y, λ_z) ∈ X × Y × Y × Ỹ × Y × Ỹ.   (63)
Finally, the iterative algorithm for seeking a saddle point is given by Algo-
rithm 12.
1. Compute (x^k, w^k, y^k, z^k) from (λ_y^k; λ_z^k), i.e.,
2. Update

(λ_y^{k+1}; λ_z^{k+1}) = (λ_y^k; λ_z^k) + (β_y(y^k − ∇x^k + w^k); β_z(z^k − Ew^k)).
min_{(x,w)∈X×Y} { (1/2) ‖Kx − d‖² − ⟨λ_y^k, ∇x − w⟩ − ⟨λ_z^k, Ew⟩ + (1/2) ‖(y − ∇x + w; z − Ew)‖²_S }.   (65)

The first-order optimality condition of (65) is the linear system

( K*K − β_y div∇    β_y div ;  −β_y ∇    β_y I − β_z div² E ) (x; w) = ( K*d − div(λ_y^k + β_y y) ;  −λ_y^k − β_y y − div²(λ_z^k + β_z z) ),
i.e.
(K*K − β_y D̊₁⁻D̊₁⁺ − β_y D̊₂⁻D̊₂⁺) x + β_y D̊₁⁻ w¹ + β_y D̊₂⁻ w² = g¹,
−β_y D̊₁⁺ x + (β_y I − β_z D̊₁⁻D̊₁⁺ − (β_z/2) D̊₂⁻D̊₂⁺) w¹ − (β_z/2) D̊₂⁻D̊₁⁺ w² = g²,
−β_y D̊₂⁺ x − (β_z/2) D̊₁⁻D̊₂⁺ w¹ + (β_y I − (β_z/2) D̊₁⁻D̊₁⁺ − β_z D̊₂⁻D̊₂⁺) w² = g³,   (66)
where

g¹ = K*d − D̊₁⁻((λ_y^k)¹ + β_y y¹) − D̊₂⁻((λ_y^k)² + β_y y²),
g² = −(λ_y^k)¹ − β_y y¹ − D̊₁⁻((λ_z^k)¹ + β_z z¹) − D̊₂⁻((λ_z^k)³ + β_z z³),
g³ = −(λ_y^k)² − β_y y² − D̊₁⁻((λ_z^k)³ + β_z z³) − D̊₂⁻((λ_z^k)² + β_z z²).
This linear system with periodic boundary condition can be efficiently solved by
Fourier transform via FFT implementation (Yang et al. 2009). Firstly, we apply
FFTs to both sides of (66) to get
( a¹¹ a¹² a¹³ ; a²¹ a²² a²³ ; a³¹ a³² a³³ ) ( F(x); F(w¹); F(w²) ) = ( F(g¹); F(g²); F(g³) ).   (67)
min_{(y,z)∈Y×Ỹ} { α₁ R(y) + α₀ R_HO(z) + ⟨λ_y^k, y⟩ + ⟨λ_z^k, z⟩ + (1/2) ‖(y − ∇x + w; z − Ew)‖²_S }.   (68)
• y-subproblem:
  min_{y∈Y} α₁ R(y) + ⟨λ_y^k, y⟩ + (β_y/2) ‖y − ∇x + w‖²;   (69)
• z-subproblem:
  min_{z∈Ỹ} α₀ R_HO(z) + ⟨λ_z^k, z⟩ + (β_z/2) ‖z − Ew‖².   (70)
Problems (69) and (70) have the closed-form solutions

y_{i,j} = max{0, 1 − α₁/(β_y|η_{i,j}|)} η_{i,j}   and   z_{i,j} = max{0, 1 − α₀/(β_z|ξ_{i,j}|)} ξ_{i,j},   (71)

where

η = ∇x − w − λ_y^k/β_y ∈ Y   and   ξ = Ew − λ_z^k/β_z ∈ Ỹ.
After obtaining the solutions of the subproblems (65) and (68), we use the following alternating minimization procedure to solve (64); see Algorithm 13.
• Compute (x^{k,l+1}; w^{k,l+1}) from (67) for (y; z) = (y^{k,l}; z^{k,l});
• Compute (y^{k,l+1}; z^{k,l+1}) from (71) for (x; w) = (x^{k,l+1}; w^{k,l+1}).
Output: (x^k; w^k) = (x^{k,L}; w^{k,L}), (y^k; z^k) = (y^{k,L}; z^{k,L}).
As basic geometric measurements of curves, both length and curvature are natural regularities that are widely used in various image processing problems. Euler's elastica is defined as the line energy of a smooth planar curve γ:

E(γ) = ∫_γ (a + bκ²) ds,   (72)

where κ is the curvature of the curve, s is the arc length, and a, b are positive constants.
By summing the Euler's elastica energies of all the level sets of an image x, one obtains the following energy for the image denoising task:

min_x R_EE(κ(x), ∇x) + (1/2) ‖Kx − d‖²,   (73)

where κ(x) = div(∇x/|∇x|) and R_EE(κ(x), ∇x) is defined by

R_EE(κ(x), ∇x) = Σ_{1≤i,j≤N} (a + bκ²(x_{i,j})) |(∇x)_{i,j}|.
min_{x,y,n,m} R_EE(div n, y) + (1/2) ‖Kx − d‖² + I_M(m)
s.t.  y = ∇x,  n = m,  |y| = m · y,   (74)

where M = {m : |m_{i,j}| ≤ 1, ∀ 1 ≤ i, j ≤ N}.
Note that the variable m is introduced to relax the constraint on the variable n. By requiring m to lie in the set M, the term |y| − y · m is guaranteed to be non-negative, which makes the sub-minimization problem with respect to m easy to handle. We can further define the augmented Lagrangian functional as follows:
L_EE(x, y, n, m; λ_y, λ_n, λ_m) = R_EE(div n, y) + (1/2) ‖Kx − d‖² + I_M(m)
  + ⟨λ_y, y − ∇x⟩ + (β_y/2) ‖y − ∇x‖² + ⟨λ_n, n − m⟩ + (β_n/2) ‖n − m‖²
  + ⟨λ_m, |y| − m · y⟩ + ⟨β_m, |y| − m · y⟩,   (75)
1. Compute (x k , y k , nk , mk ) from
2. Update
λ_y^{k+1} = λ_y^k + β_y(y^k − ∇x^k),
λ_n^{k+1} = λ_n^k + β_n(n^k − m^k),
λ_m^{k+1} = λ_m^k + β_m(|y^k| − m^k · y^k).
min_x (1/2) ‖Kx − d‖² + (β_y/2) ‖y − ∇x‖² − ⟨λ_y^k, ∇x⟩,   (77)

the first-order optimality condition of which gives a linear equation for x. Fast numerical methods, such as the fast Fourier transform (FFT) and iterative schemes, can be used to solve this equation.
min_y ⟨a + b(div n)², |y|⟩ + ⟨λ_y^k, y⟩ + ⟨λ_m^k + β_m, |y| − m · y⟩ + (β_y/2) ‖y − ∇x‖²,   (78)

which can be rewritten as

min_y (β_y/2) ‖ y − ∇x + λ_y^k/β_y − ((λ_m^k + β_m)/β_y) m ‖² + ⟨|y|, a + b(div n)² + λ_m^k + β_m⟩.
min_m I_M(m) − ⟨λ_n^k, m⟩ + (β_n/2) ‖n − m‖² − ⟨(λ_m^k + β_m) y, m⟩,   (79)

which can be rewritten as

min_m I_M(m) + (β_n/2) ‖ m − ((λ_m^k + β_m) y + λ_n^k)/β_n − n ‖²,

the optimal solution of which is obtained by a one-step projection onto M of the solution of the quadratic minimization.
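This projection onto M is a simple pointwise normalization; a NumPy sketch with illustrative names is:

```python
import numpy as np

def project_unit_ball(m1, m2):
    """Pointwise projection of a 2D vector field onto the set M = { |m| <= 1 }."""
    norm = np.maximum(np.sqrt(m1 ** 2 + m2 ** 2), 1.0)   # vectors with |m| <= 1 stay unchanged
    return m1 / norm, m2 / norm
```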
min_n ⟨b(div n)², |y|⟩ + ⟨λ_n^k, n⟩ + (β_n/2) ‖n − m‖²,   (80)

which can be solved by a frozen-coefficient method for easier implementation (Tai et al. 2011; Yashtini and Kang 2016).
The mean curvature-based model (Zhu and Chan 2012) considers the image restoration problem as a surface smoothing task. A basic model is as follows:

min_x ∫ | div( ∇x / √(1 + |∇x|²) ) | dx + (α/2) ∫ (Kx − d)² dx.   (81)
Originally, the smoothed mean curvature model (81) was numerically solved by
the gradient descent method, which involves high-order derivatives and converges
slowly in practice. Zhu et al. (2013) developed an augmented Lagrangian method
for a mean curvature-based image denoising model (81), with similar ideas further
studied in Myllykoski et al. (2015). Following Zhu et al. (2013), we rewrite the mean
curvature-regularized model into the following constrained minimization problem
min_{x,y,q,n,m} R_MC(q) + (α/2) ‖Kx − d‖² + I_M(m)
s.t.  y = ⟨∇x, 1⟩,  q = div n,  n = m,  |y| = y · m,   (82)
L_MC(x, y, q, n, m; λ_y, λ_q, λ_n, λ_m) = R_MC(q) + (α/2) ‖Kx − d‖² + I_M(m)
  + ⟨λ_y, y − ⟨∇x, 1⟩⟩ + (β_y/2) ‖y − ⟨∇x, 1⟩‖² + ⟨λ_q, q − ∇·n⟩ + (β_q/2) ‖q − ∇·n‖²
  + ⟨λ_n, n − m⟩ + (β_n/2) ‖n − m‖² + ⟨λ_m, |y| − y · m⟩ + ⟨β_m, |y| − y · m⟩,   (83)
1. Compute (x k , y k , q k , nk , mk ) from
2. Update
λ_y^{k+1} = λ_y^k + β_y(y^k − ⟨∇x^k, 1⟩),
λ_q^{k+1} = λ_q^k + β_q(q^k − ∇·n^k),
λ_n^{k+1} = λ_n^k + β_n(n^k − m^k),
λ_m^{k+1} = λ_m^k + β_m(|y^k| − y^k · m^k).
We can separate the minimization problem (84) into subproblems and obtain the solutions in an alternating way. Similarly to what was discussed for the Euler's elastica model, the minimizers with respect to the variables y, q, and m have closed-form solutions, while the minimizers with respect to x and n are obtained by solving the associated Euler-Lagrange equations by either FFT or fast iterative schemes. Therefore, we omit the details here.
Numerical Experiments
We measure the quality of the recovered images by the improved signal-to-noise ratio (ISNR),

ISNR = 10 log₁₀( ‖d − x‖² / ‖x* − x‖² ),

where x is the ground truth image, d is the observed image, and x* is the recovered image. For the multichannel case, the definition of ISNR is similar. For each model, the parameter α is tuned to obtain the highest ISNR. The performances of the augmented Lagrangian methods are demonstrated in Figs. 3, 4, 5, 6, 7, 8, 9, 10, and 11.
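For completeness, a small NumPy helper that evaluates this quality measure (with illustrative names) is:

```python
import numpy as np

def isnr(x_true, d, x_rec):
    """Improved SNR in dB: 10*log10(||d - x||^2 / ||x_rec - x||^2)."""
    return 10.0 * np.log10(np.sum((d - x_true) ** 2) / np.sum((x_rec - x_true) ** 2))
```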
Figure 3 shows the results of the augmented Lagrangian method for solving the TV-L2 model. In this experiment, we corrupt the clean image (size 512 × 512) with Gaussian blur and Gaussian noise. We set the parameters by following the recommendations in Wu and Tai (2010) and let β = 10. We report the recovered image and its ISNR in Fig. 3c. We also record the CPU time t used when the algorithm terminates. We can see that the augmented Lagrangian method can solve the TV-L2 model efficiently and obtain a high-quality recovered image.
Figure 4 shows the results of the augmented Lagrangian method for solving the TV-L2 model with box constraint and the comparison with the TV-L2 model. In this experiment, the degraded image (size 217 × 181) is corrupted with Gaussian blur and Gaussian noise. We set the parameters β = β_y = 10 and β_z = 400. We
Fig. 3 Augmented Lagrangian method (ALM) for solving the TV-L2 model. (b) is a corruption of (a) with Gaussian blur fspecial('gaussian',11,3) and Gaussian noise with variance 1e−2; (c) is the recovered result
Fig. 4 Augmented Lagrangian method for solving the TV-L2 model with box constraint (TVBox). (b) is a corruption of (a) with Gaussian blur fspecial('gaussian',5,1.5) and Gaussian noise with variance 1e−3; (c) and (d) are the recovered results
Fig. 5 Augmented Lagrangian method for solving TV-L1 model. (b) is a corruption of (a) with
Gaussian blur fspecial(’gaussian’,11,3) and 50% salt and pepper noise; (c) is the
recovered result
Fig. 6 Augmented Lagrangian method for solving TV-KL model. (b) is a corruption of (a) with
Gaussian blur fspecial(’gaussian’,11,3) and Poisson noise; (c) is the recovered result
Fig. 7 Augmented Lagrangian method for multichannel TV (MTV) restoration. (b) is a corruption of (a) with within-channel Gaussian blur fspecial('gaussian',21,5) and Gaussian noise with variance 1e−3; (c) is the recovered result
report the recovered images and their ISNRs in Fig. 4c, d. We also record the CPU times t used when the algorithms terminate. We can see that the augmented Lagrangian method can solve the TV-L2 model with box constraint efficiently and obtain high-quality recovered images. The TV-L2 model with box constraint gains a higher ISNR than the TV-L2 model.
Figures 5 and 6 show the results of the augmented Lagrangian methods for the TV-L1 model and the TV-KL model. In the experiment for the TV-L1 model, the observed image (size 512 × 512) is degraded with Gaussian blur and 50% salt and pepper noise. We set β_y = 20 and β_z = 100. In the experiment for the TV-KL model, the observed image (size 256 × 256) is corrupted with Gaussian blur and Poisson noise. We let β_y = 20 and β_z = 20. We can see that the augmented Lagrangian methods can recover high-quality images in these two experiments and the CPU costs are low.
Figure 7 shows the results of augmented Lagrangian method for multichannel TV
restoration. In this experiment, the degraded image is generated by first blurring the
ground truth image (size 512 × 512 × 3) with within-channel Gaussian blur and then
adding Gaussian noise to the blurred image. We set β = 100. We also can see that
Fig. 8 Augmented Lagrangian method for solving the TV2 model. (b) is a corruption of (a) with Gaussian blur fspecial('gaussian',11,3) and Gaussian noise with variance 1e−2; (c) and (d) are the recovered results
Fig. 9 Augmented Lagrangian method for solving the TGV model. (b) is a corruption of (a) with Gaussian blur fspecial('gaussian',5,1.5) and Gaussian noise with variance 1e−2; (c) and (d) are the recovered results
Fig. 10 Augmented Lagrangian method for solving the Euler's elastica (EE) based image denoising model. (b) is a corruption of (a) with Gaussian noise with variance 1e−2; (c) is the recovered result
Fig. 11 Augmented Lagrangian method for solving the mean curvature (MC)-based image denoising model. (b) is a corruption of (a) with Gaussian noise with variance 1e−2; (c) is the recovered result
Conclusions
References
Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Prob. 10(6), 1217–1229 (1994)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential
Equations and the Calculus of Variations, 2nd edn. Springer, New York (2010)
Bae, E., Shi, J., Tai, X.C.: Graph cuts for curvature based image denoising. IEEE Trans. Image Process. 20(5), 1199–1210 (2010)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image
denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434
(2009)
Bertalmio, M., Vese, L., Sapiro, G., Osher, S.: Simultaneous structure and texture image inpainting.
IEEE Trans. Image Process. 12(8), 882–889 (2003)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Optimization and Neural Computation Series, Athena Scientific, Belmont, Mass. (1996; first published in 1982)
Blomgren, P., Chan, T.F.: Color TV: Total variation methods for restoration of vector-valued
images. IEEE Trans. Image Process. 7(3), 304–309 (1998)
Boyd, S.: Distributed optimization and statistical learning via the alternating direction method of
multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2010)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3), 492–
526 (2010)
Bredies, K., Pock, T., Wirth, B.: A convex, lower semicontinuous approximation of Euler’s elastica
energy. SIAM J. Math. Anal. 47(1), 566–613 (2015)
Brune, C., Sawatzky, A., Burger, M.: Bregman-em-tv methods with application to optical
nanoscopy. In: Tai, X.C., Mørken, K., Lysaker, M., Lie, K.A. (eds.) Scale Space and Variational
Methods in Computer Vision. Springer, Berlin/Heidelberg, pp. 235–246 (2009)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging
Vis. 20(1/2), 89–97 (2004)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems.
Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chan, R.H., Tao, M., Yuan, X.: Constrained total variation deblurring models and fast algorithms
based on alternating direction method of multipliers. SIAM J. Imaging Sci. 6(1), 680–697
(2013)
Chan, T., Wong, C.K.: Total variation blind deconvolution. IEEE Trans. Image Process. 7(3), 370–
375 (1998)
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277
(2001)
Chan, T.F., Kang, S.H., Shen, J.: Euler’s elastica and curvature-based inpainting. SIAM J. Appl.
Math. 63(2), 564–592 (2002)
Chang, H., Lou, Y., Ng, M., Zeng, T.: Phase retrieval from incomplete magnitude information via
total variation regularization. SIAM J. Sci. Comput. 38(6), A3672–A3695 (2016)
Chen, C., Chen, Y., Ouyang, Y., Pasiliao, E.: Stochastic accelerated alternating direction method
of multipliers with importance sampling. J. Optim. Theory Appl. 179(2), 676–695 (2018)
Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz ℓp-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)
Chen, Y., Levine, S., Rao, M.: Variable exponent, linear growth functionals in image restoration.
SIAM J. Appl. Math. 66(4), 1383–1406 (2006)
Deng, L.J., Glowinski, R., Tai, X.C.: A new operator splitting method for the Euler elastica model
for image smoothing. SIAM J. Imaging Sci. 12(2):1190–1230 (2019)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction
method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Duan, Y., Wang, Y., Hahn, J.: A fast augmented Lagrangian method for Euler’s elastica models.
Numer. Math. Theory Methods Appl. 006(001), 47–71 (2013)
Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications to
system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)
Feng, X., Wu, C., Zeng, C.: On the local and global minimizers of ℓ0 gradient regularized model with box constraints for image restoration. Inverse Prob. 34(9), 095007 (2018)
Gao, Y., Liu, F., Yang, X.: Total generalized variation restoration with non-quadratic fidelity.
Multidim. Syst. Sign. Process. 29(4), 1459–1484 (2018)
Glowinski, R., Tallec, P.L.: Augmented Lagrangians and Operator-Splitting Methods in Nonlinear
Mechanics. SIAM, Philadelphia (1989)
Glowinski, R., Osher, S.J., Yin, W. (eds.): Splitting Methods in Communication, Imaging, Science, and Engineering. Springer, Cham (2016)
Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging
Sci. 2(2), 323–343 (2009)
Güven, H.E., Güngör, A., Çetin, M.: An augmented Lagrangian method for complex-valued compressed SAR imaging. IEEE Trans. Comput. Imag. 2(3), 235–250 (2016)
Hahn, J., Wu, C., Tai, X.C.: Augmented Lagrangian method for generalized TV-Stokes model. J.
Sci. Comput. 50(2), 235–264 (2012)
He, B., Yuan, X.: On the o(1/n) convergence rate of the Douglas-Rachford alternating direction
method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian
for convexification and denoising. Computing 76(1–2), 109–133 (2006)
Hintermüller, M., Wu, T.: Nonconvex TVq -models in image restoration: Analysis and a trust-region
regularization–based superlinearly convergent solver. SIAM J. Imaging Sci. 6(3), 1385–1415
(2013)
Kang, S.H., Zhu, W., Shen, J.: Illusory shapes via corner fusion. SIAM J. Imaging Sci. 7(4), 1907–1936 (2014)
Lai, R., Chan, T.F.: A framework for intrinsic image processing on surfaces. Comput. Vis. Image
Und 115(12), 1647–1661 (2011)
Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by
Poisson noise. J. Math. Imaging Vis. 27, 257–263 (2007)
Li, C., Yin, W., Jiang, H., Zhang, Y.: An efficient augmented Lagrangian method with applications
to total variation minimization. Comput. Optim. Appl. 56(3), 507–530 (2013)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization.
SIAM J. Optim. 25(4), 2434–2460 (2015)
Liu, Z., Wali, S., Duan, Y., Chang, H., Wu, C., Tai, X.C.: Proximal ADMM for Euler’s elastica
based image decomposition model. Numer. Math. Theory Methods Appl. 12(2), 370–402
(2018)
Lou, Y., Zhang, X., Osher, S., Bertozzi, A.L.: Image recovery via nonlocal operators. J. Sci.
Comput. 42(2), 185–197 (2010)
Lysaker, M., Lundervold, A., Tai, X.: Noise removal using fourth-order partial differential equation
with applications to medical magnetic resonance images in space and time. IEEE Trans. Image
Process. 12(12), 1579–1590 (2003)
Micchelli, C.A., Shen, L., Xu, Y.: Proximity algorithms for image models: denoising. Inverse Prob. 27(4), 045009 (2011)
Myllykoski, M., Glowinski, R., Karkkainen, T., Rossi, T.: A new augmented Lagrangian approach
for L1 -mean curvature image denoising. SIAM J. Imaging Sci. 8(1), 95–125 (2015)
Nikolova, M.: A variational approach to remove outliers and impulse noise. J. Math. Imaging Vis.
20(1–2), 99–120 (2004)
Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex
regularized least-squares. Multiscale Model. Simul. 4(3), 960–991 (2005)
Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E.: An accelerated linearized alternating direction method
of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
Persson, M., Bone, D., Elmqvist, H.: Total variation norm for three-dimensional iterative recon-
struction in limited view angle tomography. Phys. Med. Biol. 46(3), 853–866 (2001)
Ramani, S., Fessler, J.A.: Parallel MR image reconstruction using augmented Lagrangian methods.
IEEE Trans. Med. Imaging 30(3), 694–706 (2011)
Rockafellar, R.T.: Augmented Lagrange multiplier functions and duality in nonconvex program-
ming. SIAM J. Control 12(2), 268–285 (1974)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin/Heidelberg (1998)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60, 259–268 (1992)
Sapiro, G., Ringach, D.: Anisotropic diffusion of multivalued images with applications to color
filtering. IEEE Trans. Image Process. 5, 1582–1586 (1996)
Selesnick, I., Lanza, A., Morigi, S., Sgallari, F.: Non-convex total variation regularization for
convex denoising of signals. J. Math. Imaging Vis. 62(6), 825–841 (2020)
Tai, X.C., Wu, C.: Augmented Lagrangian method, dual methods and split Bregman iteration for
ROF model. In: Scale Space and Variational Methods in Computer Vision, Second International
Conference, SSVM 2009, Voss, 1–5 June 2009. Proceedings, pp 502–513 (2009)
Tai, X.C., Hahn, J., Chung, G.J.: A fast algorithm for Euler’s elastica model using augmented
Lagrangian method. SIAM J. Imaging Sci. 4(1), 313–344 (2011)
Vese, L.A., Osher, S.J.: Modeling textures with total variation minimization and oscillating patterns
in image processing. J. Sci. Comput. 19(1/3), 553–572 (2003)
Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for dantzig selector.
SIAM J. Sci. Comput. 34(5), A2792–A2811 (2012)
Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation
image reconstruction. SIAM J. Imaging Sci. 1(3), 248–272 (2008)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization.
J. Sci. Comput. 78(1), 29–63 (2019)
Wu, C., Tai, X.C.: Augmented Lagrangian method, dual methods, and split Bregman iteration for
ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Wu, C., Zhang, J., Tai, X.C.: Augmented Lagrangian method for total variation restoration with
non-quadratic fidelity. Inverse Probl. Imaging 5(1), 237–261 (2011)
Wu, C., Zhang, J., Duan, Y., Tai, X.C.: Augmented lagrangian method for total variation based
image restoration and segmentation over triangulated surfaces. J. Sci. Comput. 50(1), 145–166
(2012)
Wu, C., Liu, Z., Wen, S.: A general truncated regularization framework for contrast-preserving
variational signal and image restoration: Motivation and implementation. Sci. China Math.
61(9), 1711–1732 (2018)
Yan, M., Duan, Y.: Nonlocal elastica model for sparse reconstruction. J. Math. Imaging Vis. 62,
532–548 (2020)
Yang, J., Yin, W., Zhang, Y., Wang, Y.: A fast algorithm for edge-preserving variational
multichannel image restoration. SIAM J. Imaging Sci. 2(2), 569–592 (2009)
Yashtini, M., Kang, S.H.: A fast relaxed normal two split method and an effective weighted TV
approach for Euler’s elastica image inpainting. SIAM J. Imaging Sci. 9(4), 1552–1581 (2016)
Zeng, C., Wu, C.: On the edge recovery property of noncovex nonsmooth regularization in image
restoration. SIAM J. Numer. Anal. 56(2), 1168–1182 (2018)
Zeng, C., Wu, C.: On the discontinuity of images recovered by noncovex nonsmooth regularized
isotropic models with box constraints. Adv. Comput. Math. 45(2), 589–610 (2019)
Zhang, H., Wu, C., Zhang, J., Deng, J.: Variational mesh denoising using total variation and
piecewise constant function space. IEEE Trans. Vis. Comput. Graphics 21(7), 873–886 (2015)
Zhang, J., Chen, K.: A total fractional-order variation model for image restoration with nonhomo-
geneous boundary conditions and its numerical solution. SIAM J. Imaging Sci. 8(4), 2487–2518
(2015)
Zhu, W., Chan, T.: Image denoising using mean curvature of image surface. SIAM J. Imaging Sci.
5(1), 1–32 (2012)
Zhu, W., Tai, X.C., Chan, T.: Augmented Lagrangian method for a mean curvature based image
denoising model. Inverse Prob. Imaging 7(4), 1409–1432 (2013)
Sparse Regularized CT Reconstruction: An
Optimization Perspective 14
Elena Morotti and Elena Loli Piccolomini
Contents
Introduction 552
Tomographic Imaging 554
Mathematics of Sparse Tomography 557
  Lambert Beer's Law 557
  The Radon Transform and Its Discretization 559
  The Filtered Back Projection Algorithm 559
Model-Based Approaches for Sparse-View CT 561
  From Lambert-Beer's Law to a Linear System 561
  Implementation of the Forward Operator M 562
  The Optimization Framework 564
  Iterative Algorithms for Optimization 566
  Regularization: Little or Too Much? 566
  Toward the Convergence of the Iterative Method 568
  New Frontiers of CT Reconstruction with Deep Learning 569
Case Study: Reconstruction of Digital Breast Tomosynthesis Images 570
  DBT 3D Imaging 571
  Model and Analysis 572
  Reconstructions of the Accreditation Phantom 574
  Reconstructions of a Human Dataset 576
  Distance-Driven Approach for 3D CT Imaging 578
  Code Parallelization 579
Conclusion 581
References 581
E. Morotti
Department of Political and Social Sciences, University of Bologna, Bologna, Italy
e-mail: [email protected]
E. L. Piccolomini
Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
e-mail: [email protected]
Abstract
Keywords
Introduction
X-ray computed tomography (CT) is an imaging technique that was first experimented in the medical area as the evolution of projection radiography. In particular, medical imaging was born not long after Wilhelm Röntgen discovered X-rays in 1895, as soon as scientists realized the capability of X-rays to pass through objects: for decades, 2D planar images (projection radiographies) have been used to investigate the inner parts of human bodies. However, these images represent an average of the information of the 3D scanned object, squeezed onto a 2D plane. In the 1930s, the mathematical theory published by Johann Radon in 1917 and the studies by the physician Grossmann, together with the desire to overcome the averaging process of conventional X-ray radiography, led to the definition of tomography as a new tool for object inspection. Since the advent of computers in the 1970s, CT has risen and revolutionized non-intrusive diagnostic imaging by allowing the three-dimensional anatomy to be reconstructed in transverse (cross-sectional) sections.
To achieve this, the CT imaging device acquires several projections of the same slice of the object under exam, from angled views along a circular trajectory. Then, a software reconstructs the digital image from the acquired projection data. Hence, tomographic image reconstruction mathematically represents an inverse problem.
Fig. 1 In CT imaging, the direct problem (from the object to the data) is represented by the
acquisition of the sinogram, whereas the inverse problem (from the data to the object) is the
reconstruction of the image
Traditional methods for CT cannot cope with the ill-posedness and compute images with unwanted artifacts and noise. To address this, a more recent approach models the CT imaging process as an optimization problem where the inverse problem is solved by inverting the discrete model, represented by a linear system, constrained by means of regularization functions. Imposing regularization allows one to choose a good solution among the infinitely many possible ones.
In particular, the optimization problem is solved by iterative algorithms (called model-based iterative algorithms). They converge to the problem solution in many iterations, but they should possibly compute a good solution well before convergence. In fact, a slow convergence would make model-based iterative algorithms unusable on real systems, where very fast executions are required for clinical needs. However, acceleration techniques make iterative algorithms produce good solutions in a few iterations, and efficient parallel executions on low-cost GPU boards greatly reduce the execution time; hence, the optimization approach is effective in real applications.
The aim of this chapter is:
The chapter is organized as follows. The next section contains a brief survey of both the CT scan geometries (with particular attention to few-view protocols) and the mathematics of CT imaging. Then, the regularized optimization framework for CT image reconstruction is presented; examples of iterative reconstructions obtained as solutions of the optimization problem on a 2D phantom prove the effectiveness of the approach. Finally, a case study on 3D breast tomosynthesis is analyzed, with results from a parallel implementation on GPUs.
Tomographic Imaging
From the primordial systems to the most modern gantries currently used in medicine and industrial applications, many studies have been carried out by different research groups, gathering engineers, physicists, mathematicians, and computer scientists, with the aim of improving both the technologies and the reconstruction software. For each prefixed angled position of the X-ray source, first-generation CT devices performed long-lasting projections where parallel rays allowed simple reconstruction algorithms (top-left image in Fig. 2). Among the numerous developments, the shift from parallel- to fan-beam X-ray projections has been the most significant. Fan-beam geometries are preferred today, since they enable acquiring all the single-view measurements in one fan simultaneously (top-right image in Fig. 2). However, computational speedups are required when recovering objects from fan-beam projections in real scenarios (Averbuch et al. 2011).
Historically, a further step forward has been the emergence of 3D CT imaging systems. The first developments led to helical CT, where the X-ray source moves along a narrow helical trajectory, scanning a volume with fan beams slice by slice. As depicted in Fig. 2, another approach exploits cone-beam projections to cover a volume in just one scan. In this case, the X-ray source rotates on a circular planar trajectory.
In recent years, many tomographic devices have been designed to fit different medical needs, and, on the other hand, interesting technical, anthropomorphic, forensic, and archeological as well as paleontological applications of CT have been developed too (Hughes 2011; De Chiffre et al. 2014). As a consequence, the CT technique is evolving into new forms of inquiry. In particular, motivated by an increasing focus on the potentially harmful effects of X-ray ionizing radiation, a recent trend in CT research is to develop safer protocols to reduce the radiation dose per patient. This allows CT techniques to be applied to a wider class of medical examinations, including vascular, dental, orthopedic, musculoskeletal, chest, and mammographic imaging. Safer protocols are of interest not only for medicine but
Fig. 2 Sketches of tomographic devices, from the primordial technology with parallel X-ray scans
(top left) to the most modern solution exploiting fan beams for 2D (top right) and cone beams
(bottom) for 3D CT
also for material science and cultural heritage, to prevent damage to the subject under study due to excessive radiation.
Specifically, there are two main techniques allowing for a significant reduction of the total radiation exposure per patient. The first one, usually named low-dose CT, consists in reducing the X-ray tube current at each scan. In this case, the geometry traditionally used in CT, where up to one thousand projections are taken along the circular trajectory, does not change, but the measured data present higher quantum noise. The second practical way to lower the radiation consists in reducing the number of X-ray projections. The resulting protocols are labeled as sparse tomography (or sparse-view, few-view tomography); they lead to incomplete tomographic data but very fast examinations (Kubo et al. 2008; Yu et al. 2009).
Figure 3 shows a graphical draft of the reconstruction process. In the first row,
the classical full-dose CT case is represented; in the second row, a sparse-view
Fig. 3 Sketches of the tomographic image reconstruction workflow, for full-view, sparse-view
full-angle, and limited-angle protocols (from top to bottom, respectively). From the different
geometries on the left, the acquired projections and the reconstructed image of the Shepp-Logan
phantom. The missing portions of sinogram in the sparse-view and limited-angle protocols are
depicted in light gray
interpretation. As depicted in Fig. 3, the sets of projection data are severely sub-sampled in the case of sparse-view and limited-angle acquisitions with respect to the full-dose case. The resulting lack of information causes well-studied artifacts in the images reconstructed with the algorithms traditionally used for full-view protocols. However, thanks to the efficiency of new reconstruction approaches, some low-dose and sparse-view protocols have already been approved for screening tests: safer tomographic exams can indeed be performed without compromising the reliability of their diagnosis (Mueller and Siltanen 2012; He et al. 2018).
What lies behind X-ray imaging techniques? From a physical point of view, the projection data reflect the absorption of the photons constituting the X-rays, and the image of the scanned object is a picture of the attenuation coefficient map in pseudo-colors. The physical model describing photon absorption in terms of attenuation coefficients is given by the Lambert-Beer law.
where m(w) is the intensity of the incoming beam. Rearranging (1) and taking the limit, it holds:
Imposing the initial condition m(0) = m₀ (where m₀ is the known emitted photon count, as in Fig. 4) and considering that all the measured intensities are positive quantities, the previous equation can be written as:
Since the direction of the vector θ is uniquely determined by the rotation angle
Φ, it is convenient to denote with θ also the rotation angle. Now, by considering the
X-ray parallel beam emitted from the θ -angled position, the projection of the whole
object described by μ is the map Pθ μ : R → R such that:
P_θ μ(t) = ∫_{−∞}^{+∞} μ(t θ^⊥ + s θ) ds,  ∀ t ∈ R.   (9)
In other words, the Radon transform R of an object slice described by μ is the set of
projections acquired along the full-angle circular trajectory, in a continuous model.
The process defining the full-dose tomography represents a discrete realization of
the (continuous) Radon transform. The graphical representation of all the measured
data, in the bidimensional case, is called sinogram, and it is represented in Fig. 3
for full-view, sparse-view, and limited-angle geometries. As it is clearly visible, in
case of sparse-view and limited-angle protocols, the incomplete projections provide
only a portion of the entire sinogram, making the corresponding inverse problems
trickier and the reconstruction process more complicated than in the full-view case.
data onto the original ray path causing such absorption (see Kak and Slaney (2001) for more details).
The FBP algorithm is still implemented in many commercial systems, since it computes the output image in a very short time, which is a fundamental requirement in the medical setting. However, it is well known that in the case of few views the FBP algorithm produces images corrupted by artifacts and noise (Natterer 2001).
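To make the classical pipeline concrete, the following compact NumPy sketch implements a parallel-beam FBP (ramp filtering in Fourier followed by backprojection with linear interpolation); it is a didactic simplification under assumed geometry conventions, not the filter or interpolation scheme of any commercial system.

```python
import numpy as np

def fbp(sino, angles, n):
    """Filtered back projection for parallel-beam data: sino has shape (n_angles, n_det)."""
    n_det = sino.shape[1]
    ramp = np.abs(np.fft.fftfreq(n_det))                       # simple ramp filter
    filtered = np.real(np.fft.ifft(np.fft.fft(sino, axis=1) * ramp, axis=1))
    half, t0 = (n - 1) / 2.0, (n_det - 1) / 2.0
    xs, ys = np.meshgrid(np.arange(n) - half, np.arange(n) - half, indexing="ij")
    recon = np.zeros((n, n))
    for a, theta in enumerate(angles):
        t = xs * (-np.sin(theta)) + ys * np.cos(theta) + t0    # detector coordinate of each pixel
        valid = (t >= 0) & (t <= n_det - 1)
        i = np.clip(np.floor(t).astype(int), 0, n_det - 2)
        w = t - i
        recon += np.where(valid, (1 - w) * filtered[a, i] + w * filtered[a, i + 1], 0.0)
    return recon * np.pi / len(angles)
```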
Figure 6 shows some FBP reconstructions of the well-known Shepp-Logan digital phantom obtained with different sparse geometries. The sparsity is increased by decreasing the angular range (from top to bottom) and the number of views (from left to right). The FBP image quality deteriorates: the large angular step characterizing sparse-view projections leads to streaking artifacts in the image, whereas a limited-angle acquisition produces a swiped band corresponding to the lost projecting directions. In the last row, where the scan is limited to a 60-degree arc, the object inside the brain is deformed and not distinguishable, regardless of the number of projections.
Fig. 6 Shepp-Logan reconstructions by the popular FBP algorithm at different geometric settings: 360, 120, and 30 projections (columns) over the angular ranges [0, 360], [0, 180], and [0, 60] (rows)
In the real discrete setting, both the scanned object and the system detector are
discrete. The attenuation coefficient function μ(x, y) is discretized into an image
of N = Nx × Ny picture elements (pixels), with values fi,j , ∀i ∈ 1, . . . , Nx , j ∈
1, . . . , Ny , which can be re-ordered in a vector f.
The detector is made of n_p recording units of length δx μm; hence, at each X-ray shot, n_p is the number of measured data. Figure 7 depicts a graphical example of the discrete CT configuration where N_x = N_y = 4 and n_p = 7. The whole scan is constituted by N_θ projections acquired at equally spaced angles θ_k, ∀ k = 1, . . . , N_θ, and performed in the angular range [−Θ, +Θ]. Let N_d = N_θ · n_p be the total number of data: in classical CT N_d ≥ N, while N_d < N in the case of sparse tomography.
Fixing the k-th projection (acquired from the θ_k-angled position) and calling m_i the photon count measured at the i-th recording unit (with i ∈ 1, . . . , n_p), from equation (7) it is possible to define:

g_i = − ln(m_i / m_0),  ∀ i ∈ 1, . . . , n_p.   (11)
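In practice this log transform is applied to the raw counts before reconstruction; a trivial NumPy sketch (with a guard against zero counts, names illustrative) is:

```python
import numpy as np

def counts_to_projections(m, m0):
    """g = -ln(m/m0), applied to the measured photon counts; counts are clipped at 1 to avoid log(0)."""
    return -np.log(np.maximum(m, 1.0) / m0)
```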
The line integral of equation (7) can be discretized into a sum over all the pixels;
hence:
Fig. 7 Scheme of the scanning process for three different angled projections. The sources rotate
around the 2D object along a circular trajectory. The slice of interest is discretized into N = 16
pixels and the detector has np = 7 recording units
g_i = Σ_{j=1}^{N} M_{i,j}^{θ_k} f_j,  ∀ i ∈ 1, . . . , n_p.   (12)

g^{θ_k} = M^{θ_k} f   (13)

Equation (14) represents the discretization of the Radon transform (10). Using a more compact notation, the CT process is described by the linear system:

M f = g   (15)
The most crucial issue in the discrete formulation concerns the computation of the matrix coefficients M_{i,j}: although very simple in principle, elaborate computer algorithms and a significant amount of computing time are required to determine its entries. Indeed, the matrix M is the mathematical description of the physical process of CT data acquisition; hence, it must mirror the forward projection of a slice onto the detector units, for all the scanning views. We recall that M is obtained by collecting the matrices M^{θ_k} corresponding to the single projections at angles θ_k, k = 1, . . . , N_θ, as in (14). Different algorithms have been proposed in the literature
to efficiently compute the value M_{i,j}^{θ_k} as the contribution of the object element f_j onto the detector unit g_i. The most common are the pixel-driven, ray-driven, distance-driven, and separable trapezoid footprint approaches. Figure 8 schematically draws the idea behind each approach.
Historically, the first proposed approach has been the pixel-driven (Peters 1981)
one: according to the geometry of the device, the fj pixel is projected from its
center onto the element gi of the detector; its contribution is split among the adjacent
measuring units with a linear (or more complex) interpolation routine (Harauz and
Ottensmeyer 1983; Fessler 1997). When the spatial resolution of the reconstruction
is much bigger than the detector cell size, too few rays are taken into account, and
it may happen that some detector cells do not receive any values at all (which is, of
course, unrealistic).
In the ray-driven (or ray-casting) approach (Lacroute and Levoy 1994; Matej et al. 2004; O'Connor and Fessler 2006), only a straight line is considered, reaching the center of each detector unit g_i from the source; then, for each element f_j crossed by the line, M_{i,j}^{θ_k} is proportional to the length of the segment intersecting f_j.
In the distance-driven approach, proposed by De Man in 2002, the idea is to project onto the detector, for each element f_j, not only a point but the element's in-plane extent. This provides a linear shadow, enlarged by the height of f_j, creating a rectangular footprint over one or more detector elements. For each element g_i, the value of M_{i,j}^{θ_k} is proportional to the area of the portion of the rectangle built on it. An extension of the distance-driven algorithm to the 3D case is presented for the case study on limited-angle tomography in a following section.
The separable trapezoid footprint algorithm was introduced in 2010 by Long
and Fessler. In this method, all the vertices of the element fj are projected onto the
detector, and the element footprint is approximated by a trapezoid, yielding a more
accurate footprint than in the distance-driven case.
The last two methods better model the physical nature of X-ray beams; hence,
they compute more accurate projection matrices at the expense of a higher computa-
tional cost. All these approaches are conceptually straightforward to generalize to the 3D case.
Some final considerations about the matrix M implementing the forward opera-
tor:
• M is a very sparse matrix, because only a few pixels contribute to a single projection value; hence, each row has mostly zero elements;
• the linear system (15) is under-determined in the case of few views; hence, no unique solution exists;
• M cannot be stored, not even in sparse form, because of its huge dimensions in most real CT imaging settings: whenever a matrix product is needed, M must be recalculated element by element, which represents a noticeable computational effort (a matrix-free wrapper is sketched after this list).
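Because M cannot be stored, practical implementations expose it as a matrix-free operator: only routines applying Mf and M^T g are provided, and the entries are recomputed on the fly. A minimal SciPy sketch is given below; forward_project and back_project are placeholder callables (for instance, built on a ray- or distance-driven kernel), not a specific published code.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def matrix_free_ct_operator(forward_project, back_project, N, Nd):
    """Wrap on-the-fly projectors as a linear operator of shape (Nd, N),
    so that the full matrix M is never formed or stored in memory."""
    return LinearOperator(
        shape=(Nd, N),
        matvec=lambda f: forward_project(f),    # computes M f   (forward projection)
        rmatvec=lambda g: back_project(g),      # computes M^T g (backprojection)
        dtype=np.float64,
    )
```

Iterative solvers (gradient-type methods, CGLS, and similar) only ever need these two products, which is exactly what such an interface provides.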
where ε ≥ 0 and σ ≥ 0 are estimates of the noise and of the value of R(f ) in the
object, respectively.
A meaningful physical constraint to impose is the non-negativity of the solution, which reflects the non-negativity of the linear attenuation coefficient μ; hence, model (16) can be reinforced as:

$$\min_{f \ge 0}\; F(f) + \lambda\, R(f). \qquad (19)$$

Typical choices for the data-fidelity term F(f) and for the regularizer R(f) are the following:
$$LS(f) = \|Mf - g\|_2^2 \qquad (20)$$

$$WLS(f) = \sum_{i=1}^{N_d} W_i\, \big\|(Mf - g)_i\big\|_2^2 \qquad (21)$$

$$TV(f) = \sum_{j=1}^{N} \|\nabla f_j\|_2 \qquad (22)$$

$$TV_\beta(f) = \sum_{j=1}^{N} \sqrt{\|\nabla f_j\|_2^2 + \beta^2} \qquad (23)$$

$$T_pV(f) = \|\nabla f\|_p^p, \qquad 0 \le p \le 1 \qquad (24)$$
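The smoothed total variation (23) is differentiable, which is what gradient-based solvers exploit. The sketch below evaluates TVβ and its gradient on a 2D image with forward finite differences and replicated boundary; these discretization details are illustrative choices, not necessarily the ones adopted in the chapter.

```python
import numpy as np

def tv_beta(f, beta=1e-3):
    """Smoothed total variation (23) of a 2D image f and its gradient w.r.t. f."""
    gx = np.diff(f, axis=1, append=f[:, -1:])     # horizontal forward differences
    gy = np.diff(f, axis=0, append=f[-1:, :])     # vertical forward differences
    mag = np.sqrt(gx**2 + gy**2 + beta**2)        # sqrt(|grad f_j|^2 + beta^2), pixelwise
    value = mag.sum()
    # chain rule: the derivative of the sum is a (negative) divergence of (gx, gy) / mag
    px, py = gx / mag, gy / mag
    grad = -px - py
    grad[:, 1:] += px[:, :-1]
    grad[1:, :] += py[:-1, :]
    return value, grad
```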
For the solution of the minimization problem expressed in one of the formulations
(16), (17), (18), and (19), a suitable optimization algorithm is used. For clinical
applications, not only an accurate reconstruction but also a low computational
time is required. Hence, the optimization algorithm should meet demands on both
reconstruction accuracy and computational speed.
Both the unconstrained (16) and the constrained (17), (18), and (19) minimization
formulations depend on a parameter: λ, ε, or σ. The amount of regularization on
the solution depends on the choice of this parameter.
To investigate the effects of regularization on the reconstructed image, a dataset
freely downloadable from the web page of the Finnish Inverse Problems Society
(www.fips.fi/dataset.php) is considered; the related documentation can be found in
Bubba et al. (2016). The object under examination is a lotus root (see Fig. 9), which has been
filled with several objects of different shapes, sizes, and attenuation coefficients.
The scanning process consists of 120 fan-beam projections, performed from a
circular trajectory with angular step size Δθ = 3 degrees; each real projection
array has been downsampled into 429 recorded values; hence, the sinogram is a
data matrix of size 429 × 120 and it is shown in Fig. 9. The dataset also provides
the forward projector, as a sparse matrix of size 51,480 × 65,536; hence, the
reconstruction will be an image of 256 × 256 pixels.
The reconstructions in Fig. 10 are obtained with the minimization model (19)
setting F(f ) as the LS function (20) and R(f ) as the T Vβ function defined in
(23) (with βTV = 10⁻³). The images are computed with different values of the
Fig. 9 On the left: a picture of the lotus root, filled with different materials. At the center: the
sinogram of the lotus dataset with 120 sparse projections. On the right: the sinogram with 20
highly sparse views
Fig. 10 Results achieved at convergence, for increasing values of the regularization parameter
λ = 0.01, 0.1, 1, 10
regularization parameter λ, namely the increasing values λ = 0.01, 0.1, 1, 10. The
artifacts visible in the reconstruction with the lowest value of λ disappear when the
regularization parameter increases. However, too large a value of λ blurs the image,
as shown in the bottom row of Fig. 10.
Figure 11 reports the lotus images reconstructed at 10, 50, and 100 iterations and
at convergence (about 1000 iterations) using the sinogram with 120 projections
over 360 degrees. The regularization parameter λ is set to 1 in all the tests. From
the zoomed crops beside each reconstructed image, it is visible how the objects
of interest are better enhanced and detected with increasing iterations. It is also
evident that after very few iterations, the contours of the objects are defined, whereas
more iterations are necessary to obtain a good contrast. Moreover, Fig. 11 confirms
that the chosen model well approximates the desired image and that the iterative
method is converging toward the problem solution. In practice, the more iterations
are executed, the better the reconstructed image will be.
Finally, some considerations about the model when applied to a sparser geometry
can be deduced from Fig. 12, where the images are reconstructed from only 20
Fig. 11 Results obtained with λ = 1 at 10, 50, and 100 iterations and at convergence, from 120
projections
projections over 360 degrees (with angular step of 18 degrees) in 20 and 50 iterations
and at convergence (145 iterations).
In this case of very sparse-view full-angle CT, some artifacts are present in all
the reconstructions, and more iterations must be performed to achieve reasonable
results, compared to the previous geometry with many more projections. In
20 iterations, not all the objects are detectable and they have low contrast with the
background. However, increasing the iterations progressively enhances the images, and the
results obtained at the algorithm convergence are very promising.
Overall, these tests show the importance of running the reconstruction solvers
for a longer time when the CT problem is characterized by severe subsampling;
this mirrors the difficulty of back-projecting and fitting the data when only a few
tomographic projections are available.
In the last few years, deep learning (DL)-based methods have emerged alongside fully
conventional or variational approaches for sparse-view tomographic reconstruction
(Wang et al. 2018). In the first experiments, neural networks were mainly used
as a postprocessing tool to remove artifacts and noise from fast reconstructions
(typically obtained with an analytical solver, such as FBP). Such an approach is usually called
learnt postprocessing (LPP). Here the network learns from a set of ground truth
images reconstructed from full-dose acquisitions (see, e.g., Han and Ye (2018),
Pelt et al. (2018), Zhang et al. (2019), Schnurr et al. (2019), Urase et al. (2020),
Morotti et al. (2021) and the references therein). However, in their inspiring work
(Sidky et al. 2020), Sidky et al. have claimed that the popular LPP schemes lack a
mathematical characterization; a new framework has recently been proposed in
Evangelista et al. (2022) to address this drawback.
Neural networks have also been introduced into model-based schemes to improve
their efficiency. In the so-called unrolling (or unfolding) strategies, each iteration is
executed by a layer of the neural network which learns, in the training phase, some
parameters of the optimization algorithm (Monga et al. 2021). The proposals differ
in the considered iterative scheme and in the per-iteration block learned by the
neural network. For instance, in 2017, Adler and Öktem have developed a partially
learned gradient descent algorithm, whereas they have worked on the Chambolle-
Pock scheme in Adler and Öktem (2018). In Gupta et al. (2018) a convolutional
neural network is trained to act like a projector in a gradient descent algorithm,
whereas in Xiang et al. (2021) both the proximal operator and gradient operator of
an unrolled FISTA scheme are learned. In Zhang et al. (2020) the neural network
learns the initial iterate of the inner conjugate gradient solver in a splitting scheme
for optimization. A different approach is constituted by the plug-and-play scheme.
In this case, the minimization problem is solved by a splitting optimization method,
such as ADMM, and the neural network is plugged in the denoising substep of the
method at each iteration (Venkatakrishnan et al. 2013; He et al. 2018).
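The plug-and-play idea can be summarized in a few lines: a splitting method such as ADMM alternates a data-consistency step with a denoising step, and the latter is performed by a (possibly learned) denoiser instead of a proximal operator. The sketch below is generic and hedged: data_prox and denoiser are placeholder callables, and the scaled-ADMM form shown is one common variant rather than the scheme of any specific paper cited above.

```python
import numpy as np

def pnp_admm(data_prox, denoiser, x0, rho=1.0, n_iter=20):
    """Plug-and-play ADMM sketch: the regularizer's proximal step is replaced by a denoiser.

    data_prox(v, rho) should return argmin_x F(x) + (rho/2) * ||x - v||^2 for the
    data-fidelity term F; denoiser(v) is any Gaussian denoiser (e.g., a trained CNN)."""
    x = x0.copy()
    z = x0.copy()
    u = np.zeros_like(x0)
    for _ in range(n_iter):
        x = data_prox(z - u, rho)   # data-consistency step
        z = denoiser(x + u)         # regularization step, performed by the network
        u = u + x - z               # scaled dual (multiplier) update
    return z
```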
Fig. 13 A modern DBT device on the left and a comparison between a 2D mammographic image
and a DBT image slice, showing the same spiculated mass
higher accuracy in the most important breast diagnostic imaging tasks, i.e., finding
microcalcifications and suspected masses (Andersson et al. 2008; Das et al. 2010).
DBT 3D Imaging
DBT puts into practice a limited-angle sparse tomographic protocol for three-
dimensional imaging; hence, its image reconstruction is technically non-trivial. As
schematically reported in Fig. 14 where a Cartesian axis system is introduced for
clarity, in a modern DBT machinery, the X-ray source moves on the Y Z−plane,
drawing an arc which spans 11 to 60 degrees typically (hence Θ ≈ 5 to 30 degrees,
according to the notation previously introduced). From equally spaced angled points
on such a trajectory, Nθ = 9–25 projection images are acquired by the detector. The
detector is flat, built as a nx × ny grid of recording units with a uniform sensitive
area of δx × δy μm². Typically, δx and δy are 85–160 μm. Moreover, the detector is
fixed on an XY-plane and stationary during the whole scanning process.
The breast volume is numerically discretized into Nv = Nx ×Ny ×Nz volumetric
elements (called voxels) of size Δx × Δy × Δz μm³. Due to the high resolution of the
projection images, DBT allows for very high in-plane resolution (i.e., the resolution
on the reconstructed slices which are parallel to the detector plane): Δx and Δy
are smaller than 0.1 mm. On the contrary, because of the severe narrowness of the
scanning range [−Θ, +Θ], DBT cannot reconstruct slices as thin as in classical
CT, and its Z-axis resolution Δz is typically 1 to 1.5 mm.
Fig. 14 On the left, a sketch of a modern DBT device where the Cartesian axis system is added
for clarity. On the right, a view of the DBT geometry, projected onto the Y Z-plane
In contrast to classical medical CT, DBT also makes use of soft X-rays of a few
tens of kiloelectronvolts: this choice helps to reduce the delivered radiation dose and it is
further motivated by the anatomical structure of the breast. In breast imaging, there
are no bones or metallic objects, but adipose and fibro-glandular tissues with
very low attenuating properties: breast materials would not capture many photons
from higher-energy X-rays. Since much more photon scattering occurs, this choice
provides noisier data; nevertheless, it also allows the breast structures to be detected in a
more distinguishable way.
A further relevant feature of DBT imaging is due to its actual use in hospitals and
clinics, where the high frequency of DBT screening tests makes long executions
too expensive for a variety of reasons. As a consequence, an iterative solver can
typically perform only a few iterations and is stopped far before convergence. Such
disadvantage is partially alleviated by parallel implementations (Jia et al. 2010;
Matenine et al. 2015; Cavicchioli et al. 2020), but as the allowed computational
time is shorter than 1 min, the huge amount of data and the complexity of the matrix
computation make only four or five iterations feasible.
TV-Based Framework
All the following reconstructions are computed as solutions of the non-negativity
constrained and differentiable optimization problem (25), obtained by combining a
data-fidelity term with the smoothed total variation regularizer under the constraint f ≥ 0.
As solver, the scaled gradient projection (SGP) method, which is a gradient descent-
like algorithm, is used (Loli Piccolomini et al. 2018). It is a first-order accelerated
method, already proposed in Loli Piccolomini and Morotti (2021) for real 3D
subsampled tomography. Essentially, the method follows a gradient projection
approach accelerated by choosing the step lengths with Barzilai-Borwein techniques
and by introducing a suitable scaling matrix that improves the conditioning of the problem. Its
convergence to the unique minimum of (25) is proved in Bonettini and Prato (2015)
under suitable assumptions. Numerically, the SGP solver runs until the following
stopping condition on the objective function J is satisfied by an iterate f^{(k)}:

$$\frac{\big|J(f^{(k)}) - J(f^{(k-1)})\big|}{J(f^{(k)})} < 10^{-6}. \qquad (26)$$
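For illustration, a bare-bones projected-gradient loop for a problem of this type is sketched below. It reuses the tv_beta routine sketched earlier in this chapter, uses a constant step length instead of SGP's Barzilai-Borwein rules and scaling matrix, and stops on the relative decrease of the objective as in (26). It is a simplified stand-in, not the SGP implementation of Loli Piccolomini et al.

```python
import numpy as np

def projected_gradient_tv(M, MT, g, f0, lam, beta=1e-3, step=1e-4,
                          max_iter=100, tol=1e-6):
    """Sketch of min_{f >= 0} ||M f - g||_2^2 + lam * TV_beta(f) by projected gradient.

    M(f) and MT(r) apply the (possibly matrix-free) forward and transposed projector
    to 2D arrays; 'step' is a fixed, problem-dependent step length that must be tuned.
    tv_beta is the smoothed-TV sketch given earlier (returns value and gradient)."""
    f = np.maximum(np.asarray(f0, dtype=float), 0.0)
    J_prev = None
    for _ in range(max_iter):
        residual = M(f) - g
        tv_val, tv_grad = tv_beta(f, beta)
        J = np.sum(residual**2) + lam * tv_val          # objective J(f)
        grad = 2.0 * MT(residual) + lam * tv_grad       # gradient of J
        f = np.maximum(f - step * grad, 0.0)            # gradient step + projection onto f >= 0
        if J_prev is not None and abs(J - J_prev) / abs(J) < tol:   # stopping rule (26)
            break
        J_prev = J
    return f
```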
The contrast-to-noise ratio (CNR) measured on a mass is defined as:

$$CNR_{MS} = \frac{\mu_{MS} - \mu_{BG}}{\sigma_{MS} - \sigma_{BG}} \qquad (27)$$
where μ and σ are the mean and standard deviation computed on the reconstructed
volume, in small regions located inside the mass (MS) or in the background (BG).
Similarly, the CNR measure on a microcalcification is defined as:
$$CNR_{MC} = \frac{M_{MC} - \mu_{BG}}{\sigma_{BG}} \qquad (28)$$
The FWHM index of the plane profile (PP) across a microcalcification is computed from a Gaussian fit as

$$FWHM = 2\sqrt{2 \ln 2}\; d \qquad (29)$$

where d is the standard deviation of the Gaussian curve fitting the PP. In particular,
$$w = FWHM \cdot \Delta y \qquad (30)$$
approximates the width of the examined microcalcification. The plane profiles are
also useful tools to evaluate the reconstruction accuracy on the transverse plane.
To estimate the solver effectiveness along the Z direction, which is the most
challenging task in DBT imaging, it is convenient to extract the artifact spread
function (ASF) vector from the digital reconstruction. The ASF components are
computed on a microcalcification as:

$$ASF(z) = \frac{\mu_{MC}(z) - \mu_{BG}(z)}{\mu_{MC}(\bar z) - \mu_{BG}(\bar z)}, \qquad z = 1, \dots, N_z,$$
where μ(z) is the mean of the reconstructed values inside a circular region of three
pixels in diameter, taken inside the considered MC and in the background, respectively; z̄ corresponds to
the slice where the object is on focus, and Nz is the total number of discrete slices.
Similarly, we compute the ASF for the masses.
The tests here reported are performed on the Giotto Class digital system by the
Italian I.M.S. Giotto Spa company in Bologna (IMS Giotto Class). To get the
considered data, the source executes Nθ = 11 scans from equally spaced angles in
an approximately 30-degree range. The detector has a square pixel pitch of 85 μm,
whereas the reconstructed voxel dimensions along the three Cartesian axes are
Δx = Δy = 90 μm and Δz = 1 mm, respectively.
The scanned object is a breast imaging phantom, the model 020 of BR3D, pro-
duced by CIRS Tissue Simulation and Phantom company (Computerized Imaging
Reference Systems). It is characterized by a heterogeneous background, where
adipose-like and gland-like tissues are mixed in about 50:50 ratio. Inside, objects
of interest for breast cancer detection are inserted at the same depth: they are acrylic
spheres simulating breast masses (MSs), acrylic short fibers, and clusters of calcium
carbonate specks simulating microcalcifications (MCs), of different dimensions and
thickness.
Running the gradient descent solver, the convergence criterion (26) stops the
execution after 44 iterations. The fast decreasing behavior of the J function along
the iterative reconstruction process is remarkable, as visible in Fig. 15. The objective
function exhibits a very fast reduction in the first five iterations, whereas it has a very
flat trend from ten iterations on, as confirmed by the red-labeled values. Indeed, the
reconstructed images are visually almost indistinguishable after 30 iterations.
Figure 16 presents the reconstructions of a 4.7 mm mass and of a cluster with the
165-μm-thick MCs, obtained in 5, 15, and 30 iterations. In Fig. 17 the corresponding
PP and ASF plots are reported.
Simulated and anatomical masses are larger than microcalcifications, but their
lower photon absorption capability makes their detection difficult. In fact, even if
Fig. 16 Crops of a reconstructed slice on BR3D phantom, obtained in 5, 15, and 30 iterations
(from left to right). First row: zooms in of a mass. Last row: zooms in of a MC cluster
visible in only five iterations, the mass tends to present smooth edges, and more
iterations are required to enhance the mass contrast to the background (see the first
row of Fig. 16 and the corresponding plane profile in Fig. 17). The perfect location
of the mass at its correct depth still remains critical, since it tends to be out of focus
and blurred along the Z direction.
In spite of their smallness, microcalcifications are immediately visible in the
earliest model-based reconstructions, as highly absorbing structures of the breast. In
fact, all six MCs of the reported cluster are clearly detected in only five iterations,
but again the TV-regularized model needs longer executions to make them
less blurry and more contrasted with the background, as confirmed by the PP
Fig. 17 Plane profiles (first row) and ASF plots (second row) of the mass and one microcalci-
fication from the BR3D phantom reconstructions shown in Fig. 16. In all the plots: black line
corresponds to five iterations, red line to 15 iterations, and blue line to 30 iterations
Table 1 FWHM index (29) and w measures (30) computed on images reconstructed in 5, 15, and
30 SGP iterations. In the first column are the actual diameters of the microcalcification spheres of
the BR3D phantom
Diameters (μm) FWHM w (μm)
of the MC 5 it. 15 it. 30 it. 5 it. 15 it. 30 it.
230 4.77 3.32 2.70 430 299 243
165 3.52 2.65 2.32 317 238 209
130 – 2.05 1.52 – 185 137
plots of Fig. 17. Even the object detection along the Z axis improves with ongoing
iterations, as can be deduced from the depth-oriented inspection of the ASF plots. The
FWHM values and the corresponding MCs width w (reported in Table 1) denote
that the regularized iterative approach is indeed effective in recovering very small
microcalcifications: MCs of 130 μm width, which should fill only about two
voxels, are not discernible from the background in only five iterations (the
FWHM is not measurable here), but they can be well recovered after more iterations
with a good approximation of their real size.
Finally, the increasing CNR values in Table 2 denote the strong denoising effect of the
regularized model on the objects of interest.
Table 2 CNR measure for microcalcifications as in (28) and for masses as in (27) computed on
images reconstructed in 5, 15, and 30 SGP iterations. In the first column are the actual diameters
of the considered objects of the BR3D phantom
Diameters (μm) 5 it. 15 it. 30 it.
MS 4700 0.82 1.07 1.66
MS 3100 0.87 1.00 1.33
MC 230 24.21 33.34 38.00
MC 165 10.03 19.00 28.00
MC 130 7.27 11.02 17.00
Fig. 18 Results obtained after 5, 15, and 30 SGP iterations on a human breast dataset. First row:
zooms in of a 440 × 400 pixels region presenting both a spherical mass (pointed by the arrow)
and a microcalcification (identified by the circle). Last row: plane profiles on the mass and on the
microcalcification. In the plots: black line corresponds to 5 iterations and blue line to 30 iterations
the images in Fig. 18 zoom over such objects of interest on the reconstructions
computed in 5, 15, and 30 iterations. Figure 18 also shows the plane profiles of the
mass and the microcalcification. In this case, the mass detection is already effective
in the earliest reconstruction and its gray level intensity does not change remarkably,
but the denoising effect of the TV prior in the last iterations is evident on the PP. Also
the microcalcification is detected in few iterations, even if a more time-consuming
SGP execution enhances the contrast of the object with respect to the background
and the corresponding FWHM values (reported in Table 3) confirm that it becomes more
and more defined from 5 to 30 iterations.
Fig. 19 The distance-driven approach for the forward projection on a DBT-like device. On the
left, the process is seen on the Y Z-plane (hence the detector is reduced to a 1D array and the
volume to a grid of voxels); on the right, one volume slice is considered over the detector. In all
the images, one detector unit is considered and remarked in blue, whereas its backward projection
cone is highlighted in pink, defining the voxels that are indeed involved in the forward projections
Fig. 20 The distance-driven approach for the forward projection on a DBT-like device. On the
left, the process is seen on the Y Z-plane for one slice; on the right, it is projected onto the XY -
plane. In all the images, one detector unit is considered and remarked in blue, whereas its backward
projection area is highlighted in pink, and the considered voxel is green
the backward footprints onto the slice (dashed pink in Fig. 20) of the i-th pixel and
ai,j be the area of the intersection between the pink and the gray squares, as denoted
in Fig. 20. The matrix element is computed as:
$$M^{\theta_k}_{i,j} = \frac{\Delta z\; a_{i,j}}{\alpha_i\, \gamma_i\, A_i} \qquad (32)$$
Code Parallelization
The required accuracy on the breast digital volume and the resolution of the detector
make the DBT problem very high-dimensional. The magnitude of the involved
numerical objects prevents the storage of the system matrix M on the hardware;
hence, its entries must be computed at each invocation of the matrix itself. This
causes an extremely long execution of the optimization solver (which also impacts
on the number of iterations allowed in a real clinical setting). In fact, by profiling a
serial execution of an iterative solver, two main kernels can be identified as heavy
computational tasks in each iteration, and they are the forward and the backward
applications of the matrix operator, i.e., the steps with the matrix-vector products
involving M and M T , respectively.
To set a realistic example, consider a volume with N = 1.5 · 10⁸ voxels to be
recovered from Nθ = 11 views of 3000 × 1500 pixel projection images (resulting in
Nd ≈ 5 · 10⁷ data). Table 4 reports the output of the profiling analysis of the scaled
gradient projection algorithm, compiled on an i7 high-end computer with 32 GB of
RAM and 1 TB of solid state disk (Cavicchioli et al. 2020). In such a configuration,
almost 90% of the computational time is spent for forward and backward projections
in a gradient descent solver, where both the kernels occur only once per iteration.
A third task addressing all the computations for the TV function covers 5% of the
execution time per iteration, whereas only 8% is spent on all the remaining SGP
steps.
By parallelizing the C code on an NVIDIA GPU by means of the CUDA SDK,
the execution times drop drastically: the GPU implementation exploits the massively
parallel architecture of graphics boards and distributes work to hundreds of small
cores. However, if the algorithm cannot store all the necessary variables in the GPU
memory entirely, many data transfers between the CPU and the GPU are required
during each iteration of the solver (see Fig. 21): as visible from the second row of
Table 4 Results of the profiling of the iterative solver, according to its different implementations
on a CPU (Intel i7 7700K CPU at 4.3 GHz, 32 GB of RAM, and 1 TB of solid state disk) and on the
Titan V board by NVIDIA (12 GB of RAM and 5120 CUDA cores). In each row: the computing
time of the four considered kernels, the whole iteration time, the number of feasible iterations in
50 s, and the resulting speedup (with respect to the serial implementation). All the times are relative
to a single iteration of a gradient descent-like solver and are expressed in milliseconds
                   Forward (ms)  Backward (ms)  TV (ms)  Other (ms)  1 iter. (ms)  Iters. in 50 s  Speedup
  Serial               235,368        237,556    23,841      39,735       536,500            –          –
  Parallel on CPU           270            263     1,229       7,613         9,375            5        57×
  Parallel on GPU           116            110       372         548         1,146           50       468×
Table 4, the resulting parallel execution achieves a 57× speedup with respect to the
serial one, allowing for only five iterations in less than 1 min. On the contrary, if
the GPU has a larger global memory, a higher level of parallelism can be exploited
to completely run the SGP solver on the GPU so that one iteration requires about
only 1 s (reflecting an impressive speedup of almost 470×). This makes it possible to achieve a
close-to-convergence reconstruction in less than 1 min.
Conclusion
Nowadays the medical world aims at enlarging the class of CT exams with new,
safe, and fast X-ray protocols, which can be defined by reducing the number of
projection views. Model-based iterative methods are efficient methods for sparse-
view CT image reconstruction, since they solve an optimization problem where
a priori information is embedded by means of a regularization function. When
approaching convergence, iterative solvers achieve very accurate images where
low-contrast objects and very small structures are well detected and shaped. On a
case study on real projections of 3D breast tomosynthesis, model-based approaches
reconstruct in very few iterations images where the objects of interest, such as
masses and microcalcifications, are clearly distinguishable. Moreover, a parallel
reconstruction of breast imaging on a GPU board can be obtained from real data
in less than 1 min, a time compatible with clinical requests.
Indeed, while the main drawback of iterative solvers lies in their high computational
costs and slow executions, the ongoing development of GPU boards (which are
more and more powerful and affordable) paves the way to almost real-time
reconstructions, making this approach feasible for real-life applications.
Finally, the flexibility of the optimization framework also allows the incorporation of
external information by means of neural networks to improve the quality of the
reconstructed image.
References
Computerized Imaging Reference Systems: https://www.cirsinc.com/products/a11/51/br3d-
breast-imaging-phantom/. BR3D Breast Imaging Phantom, Model 020
IMS Giotto Class: http://www.imsgiotto.com/
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33(12), 124007 (2017)
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Andersson, I., Ikeda, D.M., Zackrisson, S., Ruschin, M., Svahn, T., Timberg, P., Tingberg, A.:
Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and
birads classification in a population of cancers with subtle mammographic findings. Eur. Radiol.
18(12), 2817–2825 (2008)
Averbuch, A., Sedelnikov, I., Shkolnisky, Y.: CT reconstruction from parallel and fan-beam
projections by a 2-d discrete radon transform. IEEE Trans. Image Process. 21(2), 733–741
(2011)
Barca, P., Lamastra, R., Tucciariello, R., Traino, A., Marini, C., Aringhieri, G., Caramella, D.,
Fantacci, M.: Technical evaluation of image quality in synthetic mammograms obtained from
15◦ and 40◦ digital breast tomosynthesis in a commercial system: a quantitative comparison.
Phys. Eng. Sci. Med. 44(1), 23–35 (2021). cited By 0
Bonettini, S., Prato, M.: New convergence results for the scaled gradient projection method. Inverse
Probl. 31(9), 1196–1211 (2015)
Bubba, T.A., Hauptmann, A., Huotari, S., Rimpeläinen, J., Siltanen, S.: Tomographic x-ray data of
a lotus root filled with attenuating objects. arXiv preprint arXiv:1609.07299 (2016)
Buzug, T.M.: Computed tomography. In: Springer Handbook of Medical Technology, pp. 311–342.
Springer, Berlin, Heidelberg (2011)
Cavicchioli, R., Hu, J., Loli Piccolomini, E., Morotti, E., Zanni, L.: GPU acceleration of a
model-based iterative method for digital breast tomosynthesis. Sci. Rep. 10(1), 120–145
(2020)
Choi, K., Wang, J., Zhu, L., Suh, T.-S., Boyd, S.P., Xing, L.: Compressed sensing based cone-beam
computed tomography reconstruction with a first-order method. Med. Phys. 37(9), 5113–5125
(2010)
Das, M., Gifford, H.C., O’Connor, J.M., Glick, S.J.: Penalized maximum likelihood reconstruction
for improved microcalcification detection in breast tomosynthesis. IEEE Trans. Med. Imaging
30(4), 904–914 (2010)
De Chiffre, L., Carmignato, S., Kruth, J.-P., Schmitt, R., Weckenmann, A.: Industrial applications
of computed tomography. CIRP Ann. 63(2), 655–677 (2014)
De Man, B., Basu, S.: Distance-driven projection and backprojection. In: 2002 IEEE Nuclear
Science Symposium Conference Record, vol. 3, pp. 1477–1480. IEEE (2002)
De Man, B., Basu, S.: Distance-driven projection and backprojection in three dimensions. Phys.
Med. Biol. 49(11), 2463 (2004)
Evangelista, D., Morotti, E., Piccolomini, E.L.: Rising a new framework for few-view tomographic
image reconstruction with deep learning. arXiv preprint arXiv:2201.09777 (2022)
Feldkamp, L.A., Davis, L.C., Kress, J.W.: Practical cone-beam algorithm. J. Opt. Soc. Am. A 1(6),
612–619 (1984)
Fessler, J.A.: Equivalence of pixel-driven and rotation-based backprojectors for tomographic image
reconstruction (1997)
Graff, C., Sidky, E.: Compressive sensing in medical imaging. Appl. Opt. 54(8), C23–C44 (2015)
Gupta, H., Jin, K.H., Nguyen, H.Q., McCann, M.T., Unser, M.: CNN-based projected gradient
descent for consistent CT image reconstruction. IEEE Trans. Med. Imaging 37(6), 1440–1453
(2018)
Hadamard, J.: Sur les problèmes aux dérivées partielles et leur signification physique, pp. 49–52.
Princeton University Bulletin (1902)
Han, Y., Ye, J.C.: Framing U-NET via deep convolutional framelets: application to sparse-view
CT. IEEE Trans. Med. Imaging 37(6), 1418–1429 (2018)
Harauz, G., Ottensmeyer, F.: Interpolation in computing forward projections in direct three-
dimensional reconstruction. Phys. Med. Biol. 28(12), 1419 (1983)
Hashemi, S., Beheshti, S., Gill, P.R., Paul, N.S., Cobbold, R.S.: Fast fan/parallel beam CS-based
low-dose CT reconstruction. In: 2013 IEEE International Conference on Acoustics, Speech and
Signal Processing, pp. 1099–1103. IEEE (2013)
He, J., Yang, Y., Wang, Y., Zeng, D., Bian, Z., Zhang, H., Sun, J., Xu, Z., Ma, J.: Optimizing
a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction. IEEE Trans.
Med. Imaging 38(2), 371–382 (2018)
He, Y., Luo, S., Wu, X., Yang, H., Zhang, B.B., Bleyer, M., Chen, G.: Computed tomography
angiography with 3d reconstruction in diagnosis of hydronephrosis cause by aberrant renal
vessel: a case report and mini review. J. X-Ray Sci. Technol. 26(1), 125–131 (2018)
Huang, J., Zhang, Y., Ma, J., Zeng, D., Bian, Z., Niu, S., Feng, Q., Liang, Z., Chen, W.: Iterative
image reconstruction for sparse-view CT using normal-dose image induced total variation prior.
PloS One 8(11), e79709 (2013)
Schnurr, A.-K., Chung, K., Russ, T., Schad, L.R., Zöllner, F.G.: Simulation-based deep artifact
correction with convolutional neural networks for limited angle artifacts. Zeitschrift für
Medizinische Physik 29(2), 150–161 (2019)
Sidky, E., Chartrand, R., Boone, J., Pan, X.: Constrained TpV-minimization for enhanced
exploitation of gradient sparsity: application to CT image reconstruction. IEEE J. Transl. Eng.
Health Med. 2, 1800418 (2014)
Sidky, E.Y., Kao, C.M., Pan, X.: Accurate image reconstruction from few-views and limited-angle
data in divergent-beam CT. J. Xray Sci. Technol. 14(2), 119–139 (2009)
Sidky, E.Y., Lorente, I., Brankov, J.G., Pan, X.: Do CNNs solve the CT inverse problem? IEEE
Trans. Biomed. Eng. 68(6), 1799–1810 (2020)
Thibault, J.-B., Sauer, K.D., Bouman, C.A., Hsieh, J.: A three-dimensional statistical approach to
improved image quality for multislice helical CT. Med. Phys. 34(11), 4526–4544 (2007)
Urase, Y., Nishio, M., Ueno, Y., Kono, A.K., Sofue, K., Kanda, T., Maeda, T., Nogami, M., Hori,
M., Murakami, T.: Simulation study of low-dose sparse-sampling CT with deep learning-based
reconstruction: usefulness for evaluation of ovarian cancer metastasis. Appl. Sci. 10(13), 4446
(2020)
Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based
reconstruction. In: 2013 IEEE Global Conference on Signal and Information Processing,
pp. 945–948. IEEE (2013)
Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)
Wang, C., Tao, M., Nagy, J.G., Lou, Y.: Limited-angle CT reconstruction via the l_1/l_2
minimization. SIAM J. Imaging Sci. 14(2), 749–777 (2021)
Wang, G., Ye, J.C., Mueller, K., Fessler, J.A.: Image reconstruction is a new frontier of machine
learning. IEEE Trans. Med. Imaging 37(6), 1289–1296 (2018)
Wu, T., Moore, R.H., Rafferty, E.A., Kopans, D.B.: A comparison of reconstruction algorithms for
breast tomosynthesis. Med. Phys. 31(9), 2636 (2004)
Xiang, J., Dong, Y., Yang, Y.: Fista-net: learning a fast iterative shrinkage thresholding network for
inverse problems in imaging. IEEE Trans. Med. Imaging 40(5), 1329–1339 (2021)
Yu, L., Liu, X., Leng, S., Kofler, J.M., Ramirez-Giraldo, J.C., Qu, M., Christner, J., Fletcher, J.G.,
McCollough, C.H.: Radiation dose reduction in computed tomography: techniques and future
perspective. Imaging Med. 1(1), 65 (2009)
Yu, W., Zeng, L.: A novel weighted total difference based image reconstruction algorithm for few-
view computed tomography. PloS One 9(10), e109345 (2014)
Zhang, H., Liu, B., Yu, H., Dong, B.: Metainv-net: meta inversion network for sparse view CT
image reconstruction. IEEE Trans. Med. Imaging 40(2), 621–634 (2020)
Zhang, T., Gao, H., Xing, Y., Chen, Z., Zhang, L.: Dualres-UNET: limited angle artifact reduction
for computed tomography. In: 2019 IEEE Nuclear Science Symposium and Medical Imaging
Conference (NSS/MIC), pp. 1–3. IEEE (2019)
Zhang, Y., Chan, H.H.-P., Sahiner, B., Wei, J., Goodsitt, M., Hadjiiski, L.M.L., Ge, J., Zhou, C.: A
comparative study of limited-angle cone-beam reconstruction methods for breast tomosynthesis.
Med. Phys. 33(10), 3781 (2006)
Recent Approaches for Image Colorization
15
Fabien Pierre and Jean-François Aujol
Contents
Context and Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
Mathematical Modeling of Colorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
Range of Chrominance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
Color Diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
State-of-the-Art of Color Diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
Coupled Total Variation for Image Colorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Constrained TV-L2 Debiasing Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
Exemplar-Based Colorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Morphing-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Segmentation-Based Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
Patch-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
A Variational Model for Image Colorization with Channel Coupling . . . . . . . . . . . . . . . . 605
Colorization from Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
Coupled Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Coupling Manual Approach with Exemplar-Based Colorization . . . . . . . . . . . . . . . . . . . . . 609
Coupling CNN with a Variational Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Conclusion and Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
F. Pierre ()
LORIA, UMR CNRS 7503, Université de Lorraine, INRIA projet Tangram, Nancy, France
e-mail: [email protected]
J.-F. Aujol ()
Univ. Bordeaux, Bordeaux INP, CNRS, IMB, UMR 5251, Talence, France
e-mail: [email protected]
Abstract
In recent years, image and video colorization has been considered from many
points of view. The technique consists of the addition of a color component to a
grayscale image. This operation needs additional priors, which can be given by
manual intervention of the user, by an example image, or extracted from a
large dataset of color images. A very large variety of approaches has been used to
solve this problem, like PDE models, non-local methods, variational frameworks,
learning approaches, etc. In this chapter, we aim at providing a general overview
of state-of-the-art approaches with a focus on a few representative methods.
Moreover, some recent techniques from the different types of priors (manual,
exemplar-based, dataset-based) are explained and compared. The organization
of the chapter aims at describing the evolution of the techniques in relation to
each other. A focus on some efficient strategies is proposed for each kind of
methodology.
Keywords
Challenge
(there are 256 gray levels and about 16 million colors displayable on standard
screens). In existing approaches, this information can be added in three ways: the
first one lets the user directly add colors to the image (see, e.g., the approach of Levin
et al. 2004), the second one provides an example image (also called source image,
see, e.g., the method of Welsh et al. 2002), and the third one uses a deep learning
approach based on a large database (see for instance the method of Zhang et al.
2016).
In this chapter, we propose a general overview of colorization methods which
have been described in the literature with a focus on few representative approaches.
This review is organized from a methodological perspective rather than from the
application point of view. The term “automatic” has been widely used, but it
means in fact that the algorithms are able to assist the user. For manual methods,
the diffusion of the colors put by the user is automatic, and for exemplar-based
approaches, the diffusion of colors from a given reference image to the target one
is automatic but actually it requires the choice of the source image. For dataset-
based colorization, the colorization is automatic after training on a large dataset
given by the user. In this chapter, an overview of the three different approaches
to colorize images (manual, exemplar-based, and dataset-based) is proposed. In
particular, a highlight on a variational model is used as a thread along the chapter
because this model enables some coupling of different approaches such as manual
with exemplar-based. More generally, we focus on the different strategies available
among state-of-the-art methods for each kind of methodology. Moreover, a final
section proposes an overview of coupled strategies.
In this chapter, the mathematical modeling of the colorization problem is
reviewed in section “Mathematical Modeling of Colorization”. Next, in sec-
tion “Range of Chrominance”, we recall the definition of the range of the solution,
and we present an algorithm to compute an orthogonal projection onto this set. The
three next sections deal with, respectively, the manual, the exemplar-based, and the
dataset-based colorization. Finally, in section “Coupled Approaches”, we propose
an overview about the coupling of some techniques within a variational formulation.
Some other definitions are also sometimes used; for instance, the L channel of the
CIE Lab color space. In order to preserve its content, colorization methods
always require that the luminance channel of the colorized image be equal
to the target image. Most methods compute only the two chrominance channels,
complementary to the luminance, which is enough to provide a displayable color
image.
Some different spaces have been introduced, such as YUV, YCbCr, YIQ, etc. The
transformation from RGB to YUV is linear and defined with the following matrices:
Let us notice that the main problem raised by these color spaces is that all the
luminance-chrominance values cannot be converted into a RGB color between 0
and 255. Thus, some additional techniques have to be employed to recover the RGB
color image (Pierre et al. 2015c). These techniques are out of the scope of this
chapter, but the reader has to keep in mind that they are essential to compute the
final result. The next section recalls the basis of gamut problem in the case of the
YUV color space.
Range of Chrominance
The natural problem arising when editing a color while keeping its luminance
or intensity constant is the preservation of the RGB standard range of the pro-
duced image. Most of the methods of the literature work directly in the RGB
space (Nikolova and Steidl 2014; Fitschen et al. 2015; Pierre et al. 2015c), since
it is easier to maintain the standard range. Nevertheless, working in the RGB space
requires processing three channels, while two chrominance channels are enough to edit
a color image while keeping the luminance.
Remark 1. For a given luminance, the chrominance values out of this polygon can
be transformed into the RGB space, but they are out of the bounds of the RGB cube.
A truncation of the coordinates is usually done, but it generally changes both the
luminance and the hue of the result.
Fig. 1 The set of the RGB colors with a particular luminance is a convex polygon. The map from
RGB to YUV being affine, the set of the corresponding chrominances is also a convex polygon.
(a) Set of the RGB colors with a fixed luminance. (b) Corresponding colors in the YUV space
Proof. [of Proposition 1] The intuition of the proof is given in Fig. 1. The set of the
colors in the RGB cube whose luminance is equal to a particular value y is a convex
polygon (see, e.g., Pierre et al. 2015c). Indeed, the set of colors with a particular
luminance is an affine plane in R3 and the intersection of the RGB cube with it is
a polygon. The transformation of the RGB values into the YUV space being affine,
the set of corresponding colors is thus also a convex polygon included in the set
Y = y.
Color Diffusion
Fig. 2 To compute the orthogonal projection, different cases can appear. If the YUV color respects
the constraint, the projection is the identity. Otherwise, the orthogonal projection onto the closest
edge or vertex should be done
Some papers of the literature aim at helping the user to perform manual colorization.
This is done by a diffusion of the colors over the grayscale image by various tech-
niques. The diffusion approaches can also take inspiration from manual colorization
where r ∼ s means that pixels r and s are neighbors and U is a chrominance channel
(the same functional is minimized for the channel V ). wrs denotes the weights which
can be either:
$$w_{rs} \propto e^{-\,(Y(r) - Y(s))^2 / (2\sigma_r^2)},$$
or:

$$w_{rs} \propto 1 + \frac{1}{\sigma_r^2}\,(Y(r) - \mu_r)(Y(s) - \mu_r),$$
where μr and σr denote the mean and the variance of the neighborhood of the pixel
r. The two types of weights are more or less sensitive to the variation of contrast.
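As an illustration of how such weights can be used, the sketch below propagates scribbled chrominance values with the first (Gaussian) weight on a 4-neighborhood graph: scribbled pixels are fixed, and every other pixel is constrained to be the weighted average of its neighbors. It is a didactic, heavily simplified sketch (dense Python loops, one global σ), not the optimization actually solved in Levin et al. (2004).

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def diffuse_scribbles(Y, U_scribble, mask, sigma=0.05):
    """Propagate scribbled chrominances over the grayscale image Y (values in [0, 1]).
    mask is True where a scribble fixes the value U_scribble; elsewhere the result is
    the Gaussian-weighted average of the 4-neighbors, yielding a sparse linear system."""
    h, w = Y.shape
    idx = np.arange(h * w).reshape(h, w)
    A = lil_matrix((h * w, h * w))
    b = np.zeros(h * w)
    for r in range(h):
        for c in range(w):
            k = idx[r, c]
            A[k, k] = 1.0
            if mask[r, c]:
                b[k] = U_scribble[r, c]            # scribbled pixel: value is imposed
                continue
            nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < h and 0 <= c + dc < w]
            wts = np.array([np.exp(-(Y[r, c] - Y[rr, cc]) ** 2 / (2 * sigma ** 2))
                            for rr, cc in nbrs])
            wts /= wts.sum()
            for (rr, cc), wt in zip(nbrs, wts):
                A[k, idx[rr, cc]] = -wt            # U(k) - sum_s w_ks U(s) = 0
    return spsolve(A.tocsc(), b).reshape(h, w)
```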
The authors of Luan et al. (2007) include texture similarity in the model of Levin
et al. (2004) to improve the diffusion process.
The authors of Yatziv and Sapiro (2006) have proposed a simple and fast method
using geodesic distances to weight, for each pixel, the blending of the colors given by
the scribbles. For each pixel of the grayscale image, the geodesic distance from the
scribble is computed with respect to the gradient of the image. Next, a weighted
average of the chrominances given by the scribbles is computed. The weights are
computed from a function of the geodesic distance. This method enables
a diffusion of the chrominance over constant parts of the image, using a weighting
function with properties similar to those of the inverse function:
• limr→0 w(r) = ∞;
• limr→∞ w(r) = 0;
• limr→∞ w(r + r0 )/w(r) = 1.
Yatziv et al. have reported experimental results with the function $w(r) = 1/r^b$ with $1 \le b \le 6$.
The authors of Kawulok et al. (2012) have extended this method to textured
images by introducing texture descriptors in the diffusion potential.
Some methods are designed as a propagation of the colors from neighbors to
neighbors. Some colors are given by strokes drawn by the user. In this way, some
of the image pixels are colored. The algorithm then propagates the color to their
neighbors with a rule based on the values of the grayscale image. To this aim, the
authors of Heu et al. (2009) give an explicit formula for melting the neighbor colors,
whereas the ones of Lagodzinski and Smolka (2008) provide a modeling based on
probabilistic distance transform, and the authors of Kim et al. (2010) use random
walks.
It was also proposed to use diffusion through the regularization of non-local
graphs. The method proposed by Lézoray et al. (2008) is based on the regularity
of the image. This is modeled as a graph, each pixel being represented by a vertex
and each neighborhood relationship by an edge. A local graph is considered, where
each edge represents an 8-neighborhood relationship. Since the weight of an edge
is inversely proportional to the difference between gray levels, the minimization
of an energy depending on these weights (see, e.g., Lézoray et al. 2007a) makes it possible
to diffuse the chrominances over the constant parts of the image. If a non-local graph
is designed with a weight which depends on the distance between patches, a set
of pixels is considered constant if the patches are similar. Thus, the color of the
scribbles is diffused between pixels close in the graph, therefore belonging to similar
textures.
Inspired by the PDE diffusion scheme of Perona and Malik (1990), chrominance
diffusion guided by the Di Zenzo structure tensor computed from the grayscale image
was proposed independently by Peter et al. (2017) and by Drew and Finlayson (2011).
The authors of Quang et al. (2010) have proposed a variational approach in
chromaticity-brightness color space (see, e.g., Chan et al. 2001) to interpolate the
missing colors. The reproducing kernel Hilbert spaces (RKHS) are used to compute
a link between the chromaticity and brightness channels. Jin et al. (2016) introduced
a variational model with the coupling of contour directions. Based on a Mumford-
Shah-type functional, the authors of Jung and Kang (2016) introduced a novel
variational image colorization model. In the following, we present a recent state-
of-the-art method based on total variation minimization. This approach makes it possible to
combine various strategies from the literature.
with

$$TV_C(u) = \int \sqrt{\gamma\, \|\nabla Y(x)\|_2^2 + \|\nabla U(x)\|_2^2 + \|\nabla V(x)\|_2^2}\; dx, \qquad (6)$$
where Y , U , and V are the luminance and chrominance channels. This term is a
coupled total variation which enforces the chrominance channels to have a contour
at the same location as the luminance ones. γ is a parameter which enforces
the coupling of the channels. Some other total variation formulations have been
proposed to couple the channels; see for instance Kang and March (2007) or
Caselles et al. (2009).
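A direct discretization of the coupled term (6) is straightforward to evaluate; the sketch below uses forward differences with replicated boundaries, and the value of γ is purely illustrative.

```python
import numpy as np

def coupled_tv(Y, U, V, gamma=25.0):
    """Discrete coupled total variation (6): luminance and chrominance gradients share
    one pointwise norm, so chrominance contours are encouraged at luminance contours."""
    def grads(C):
        gx = np.diff(C, axis=1, append=C[:, -1:])
        gy = np.diff(C, axis=0, append=C[-1:, :])
        return gx, gy
    Yx, Yy = grads(Y)
    Ux, Uy = grads(U)
    Vx, Vy = grads(V)
    return np.sqrt(gamma * (Yx**2 + Yy**2)
                   + Ux**2 + Uy**2 + Vx**2 + Vy**2).sum()
```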
The data-fidelity term is a classical L2 norm between chrominance channels of
the unknown u and the data c. For each pixel, the chrominance values live onto the
convex polygon denoted by R and described in section “Range of Chrominance”.
This last assumption ensures that the final solution lies onto the RGB cube, avoiding
the final truncation that leads to modification of the luminance channel. Model (5)
is convex and it can be turned into a saddle-point problem of the form:
$$\min_{u \in \mathbb{R}^2}\; \max_{z \in \mathbb{R}^6}\; \frac{\lambda}{2}\,\|u - c\|_2^2 + \langle \nabla u \mid z_{1,\dots,4}\rangle + \gamma\,\langle \nabla Y \mid z_{5,6}\rangle - \chi_{B(0,1)}(z) + \chi_R(u). \qquad (7)$$
The primal-dual algorithm (Chambolle and Pock 2011) used to compute such
saddle-point problem is recalled in Algorithm 2, where PR is the orthogonal
projection described in Algorithm 1 and PB is defined as follows for one pixel:
$$P_B(z) = \frac{\big(z_{1,\dots,4},\; z_{5,6} - \sigma \nabla Y\big)}{\max\big(1,\; \big\|\big(z_{1,\dots,4},\; z_{5,6} - \sigma \nabla Y\big)\big\|_2\big)}. \qquad (8)$$
6: u^{n+1} ← 2u^{n+1} − u^n
7: end for
8: set û(c) = u^{n+1} and ẑ = z^{n+1}.
The results produced by Algorithm 2 are promising, but with a low data
parameter λ, they are drab (see, e.g., Pierre et al. 2017b).
In this section we present a debiasing algorithm for correcting the loss of colorfulness
of the solution given by the optimum of (5).
where F is a convex data-fidelity term with respect to the data c and G is a convex
regularizer. For G being the total variation regularization, the estimator û(c) is
generally computed by an iterative algorithm, and it presents a loss of contrast with
respect to the data c. In order to debias this estimator, the CLEAR method refits the
data c with respect to some structural information contained in the biased estimator
û. This information is encoded by the Jacobian of the biased estimator with respect
to the data c:
For instance, when G is the anisotropic TV regularization, the Jacobian contains the
information concerning the support of the solution û, on which a projection of the
data can be computed.
In the general case, the CLEAR method relies on the refitting estimator Rû (c) of the
data c from the biased estimation û(c):
where δ = c − û(c). In practice, the global value ρ allows most of the bias to be
recovered over the whole image domain.
An algorithm is then proposed in Deledalle et al. (2017) to compute the numerical
value of Jû(c) (c − û(c)). The process is based on the differentiation of the algorithm
providing û(c).
It is important to notice that the CLEAR method applies well to estimators
obtained by solving unconstrained minimization problems of the
form (9). Nevertheless, it is not adapted to the problem (5), which contains
the additional constraint χR (u), as CLEAR may violate this constraint.
In the case of the constrained problem, the function to be minimized is written as:
$$\rho \mapsto \big\|\hat u(c) + \rho\, J_{\hat u(c)}\big(c - \hat u(c)\big) - c\big\|_2^2 + \chi_R\Big(\hat u(c) + \rho\, J_{\hat u(c)}\big(c - \hat u(c)\big)\Big). \qquad (15)$$
Let us denote by ρ the value defined in Equation (13). In the case when the constraint
is fulfilled, i.e., when û(c) + ρ Jû(c)(c − û(c)) ∈ R, then the minimum of (15) is
reached at ρ.
If not, since function (15) is convex, it is possible to compute explicitly the
minimizer. The value ρ = 0 is in the domain of the functional because û(c) ∈ R.
The idea is to find the maximum value of ρ such that û(c) + ρJû(c) δ ∈ R. In this
case, since R is a convex polygon, this computation can be done with a ray-tracing
algorithm (Williams et al. 2005). To this aim, we can parametrize the segment
[û(c), û(c) + ρJû(c) δ]:
$$\tilde\rho = \max_{t \in [0,1]}\; t\rho \quad \text{such that} \quad \hat u(c) + t\rho\, J_{\hat u(c)}\big(c - \hat u(c)\big) \in R. \qquad (16)$$
Equation (16) can thus be directly solved by finding the maximum value of t such that û(c) +
tρ Jû(c)(c − û(c)) intersects the border of R.
Finally, the ray-tracing is applied to obtain ρ̃ and to get the debiased solution as
û(c) + ρ̃ Jû(c)(c − û(c)).
Unfortunately, this direct approach does not lead to valuable results in general.
Indeed, if for one particular pixel the solution û(c) is saturated, and if the debiased
solution is out of R, then ρ̃ = 0 is the unique global ρ satisfying û(c) +
ρ Jû(c)(c − û(c)) ∈ R. Thus, the debiased solution is equal to the biased one, and
the debiasing algorithm has no action.
In the next section, we propose a model with an adaptive ρ parameter, depending
on the pixel, to tackle this saturated value issue.
$$\tilde\rho_\omega = \max_{t_\omega \in [0,1]}\; t_\omega \rho \quad \text{such that} \quad \hat u(c)_\omega + t_\omega \rho\, J_{\hat u(c),\omega}\big(c_\omega - \hat u(c)_\omega\big) \in R. \qquad (18)$$
This definition ensures that the debiased estimation fulfills the constraint.
Moreover, if the debiasing method of Deledalle et al. (2017) produces an estimation
that fulfills the constraint, this solution is retained. Notice however that the CLEAR
hypothesis Jh (c) = ρJû (c) for some ρ ∈ R in model (11) is not fulfilled anymore.
In numerical experiments, for most pixels, the values of ρ̃ω computed with this
method are the same as with Model (11).
As illustrated by Fig. 3, such a local debiasing strategy realizes an oblique
projection onto R (Figs. 4 and 5).
Fig. 3 The refitting of the method of Deledalle et al. (2017) may be out of the constraint. An
oblique projection onto this constraint is able to respect most of the hypotheses of Model (11) while
fulfilling the constraint
The transformation of the chrominance values u = (U, V ) to the RGB space with
the luminance value Y is denoted by TY (u). From the expression of the standard
transformation from RGB to YUV, we have TY (u) = Y (1, 1, 1)t + L(U, V ) with
L a linear function. It follows that

$$\tilde\rho_R^{\,255} = \frac{255 - T_Y(u)_R}{T_Y(c)_R - Y}. \qquad (23)$$
For each of the six values $\tilde\rho_c^{\,v}$ computed as in Equation (23), one can compute
$t_c^v = \tilde\rho_c^{\,v} / \rho$. The values $t_c^v$ that are between 0 and 1 correspond to an intersection of the
segment $[u, u + \rho c]$ with the boundaries of R. One finally takes $t^* = \min_{t_c^v \in [0,1]} t_c^v$,
and the result of Equation (18) is given by $t^* \rho$.
Figures 4 and 5 show some numerical results comparing Models (5) and (19).
One can remark that Model (5) fits the contours of the image well in comparison to the
standard TV-L2 model on the chrominance channels. Moreover, the debiasing approach
improves the colorfulness of the results in comparison with Model (5), while keeping
well-fitted contours.
To summarize, to design a suitable variational model for image colorization, the
three main ingredients are the coupled total variation, the orthogonal projection
onto the range of the problem, and the debiasing algorithm. This variational model
is a basis for image colorization in many paradigms. In the next sections, some
concrete cases of application of this model are presented in the case of exemplar-
based approaches or coupled with manual techniques or CNN-based framework.
Fig. 4 Results of chrominance channels with a TV-L2 model on chrominance, with the biased
method, and with the unbiased method. The debiasing algorithm produces more colorful results
Fig. 5 The advantage of the coupled total variation (5) on the TV-L2 model has been shown
in Pierre et al. (2015a). In Pierre et al. (2017b), it is refined in a better colorfulness-preserving
model
Exemplar-Based Colorization
The manual methods enable the user to choose the color in each pixel of the image.
Nevertheless, their main drawback is the tedious work needed for complex scenes,
for instance with textures. In exemplar-based image colorization methods, the color
information is provided by a color image called source image. The grayscale image
to colorize is called target image. This color image can be chosen by the user or
automatically provided from a database with an indexation algorithm.
The results available in this chapter are based on Pierre et al. (2014b,c, 2015a)
which are among the most recent methods in patch-based colorization and on Persch
et al. (2017) which is the current most competitive method for exemplar-based
colorization of face images.
In order to transfer the colors from the source image to the target one, three
concepts have been proposed in the literature. One of them is based on geometry,
the two others are based on texture similarities. The first one is specifically well
adapted to face colorization. In the first part of this section, we will review the
work of Persch et al. (2017) which is the current most competitive method for
exemplar-based colorization of face images. Next, we will present an overview of
segmentation-based approaches which use the texture similarities on the segmented
parts of the images to transfer colors. Finally, we present patch-based technique
which avoids the requirement of an efficient segmentation method and which can be
coupled with a variational model.
Morphing-Based Approach
In this section, we describe the model of Persch et al. (2017). The authors compute
the morphing map between the two grayscale images Itemp and Itar with a model
inspired by Berkels et al. (2015). This results in the deformation sequence ϕ which
produces the resulting map from the template image to the target one. Due to the
discretization of the images, the map is defined, for images of size n × m, on the
discrete grid G := {1, . . . , n} × {1, . . . , m}, and ϕ(x) denotes the position in the source image
which corresponds to the pixel x ∈ G
in the target image. Now we colorize the target image by computing its chrominance
channels, denoted by (Utar (x), Vtar (x)) at position x as
Utar (x), Vtar (x) := U ((x)), V ((x)) . (25)
The chrominance channels of the target image are defined on the image grid G, but usually ϕ(x) ∉ G. Therefore, the values of the chrominance channels at ϕ(x) have to be computed by interpolation. In the algorithm, a simple bilinear interpolation is used, which is defined for ϕ(x) = (p, q) with (p, q) ∈ [i, i + 1] × [j, j + 1], (i, j) ∈ {1, . . . , m − 1} × {1, . . . , n − 1} by

$$U(\varphi(x)) = U(p, q) := \begin{pmatrix} i + 1 - p & p - i \end{pmatrix} \begin{pmatrix} U(i, j) & U(i, j+1) \\ U(i+1, j) & U(i+1, j+1) \end{pmatrix} \begin{pmatrix} j + 1 - q \\ q - j \end{pmatrix}. \qquad (26)$$
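As an illustration, a minimal NumPy sketch of this bilinear chrominance lookup might look as follows (the array layout and the name `chroma` are assumptions, not the authors' code; the code uses 0-based indices whereas (26) is written with 1-based indices):

```python
import numpy as np

def bilinear_lookup(chroma, p, q):
    """Bilinear interpolation of a chrominance channel at a non-integer
    position (p, q), following the weighting of Equation (26).
    `chroma` is a 2D array indexed as chroma[row, col]."""
    i, j = int(np.floor(p)), int(np.floor(q))
    i = min(max(i, 0), chroma.shape[0] - 2)   # keep the 2x2 stencil inside the image
    j = min(max(j, 0), chroma.shape[1] - 2)
    wr = np.array([i + 1 - p, p - i])          # row weights
    wc = np.array([j + 1 - q, q - j])          # column weights
    block = chroma[i:i + 2, j:j + 2]           # the four surrounding samples
    return wr @ block @ wc

# toy usage: interpolate a 4x4 channel at the mapped position phi(x) = (1.25, 2.5)
U = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear_lookup(U, 1.25, 2.5))
```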
Fig. 6 Overview of the color transfer. The mapping ϕ is computed from a model inspired
by Berkels et al. (2015) between the luminance channel of the source image and the target
one. From this map, the chrominances of the source image are mapped. Finally, from these
chrominances and the target image the colorization result is computed
Finally, a colorized RGB image is computed from its luminance Itar = Ytar and the
chrominance channels.
Figure 6 summarizes the color transfer method.
The technique proposed in Persch et al. (2017) is adapted to faces. For the colorization of textured images, geometric similarities are not reliable, and texture similarities have to be compared instead. Such approaches are reviewed in the next sections.
Segmentation-Based Techniques
In order to transfer the colors from the source image to the target one, many approaches rely on an image segmentation technique in order to compare the statistical attributes of the textures. For instance, the authors of Irony et al. (2005) proposed to compute the best correspondence between the target image and some segmented parts of the source image. From these correspondences, some micro-scribbles are drawn on the target image from the source image, and the color strokes are then propagated by the diffusion technique of Levin et al. (2004). In Sỳkora et al. (2004), the authors used a segmentation approach to colorize images of old cartoons. The method of Gupta et al. (2012) extracts various descriptors (SURF, mean, standard deviation, Gabor filters, etc.) from a superpixel segmentation (see, e.g., Ren and Malik 2003; Achanta et al. 2012) of the target image and matches them with those of the source image. The method hence draws one
scribble for each superpixel from this matching. The final color is computed from
the optimization of a criterion which favors a spatial consistency of the colors as
done in Levin et al. (2004). A similar approach has been proposed in Kuzovkin
et al. (2015).
Patch-Based Methods
The first patch-based method for image colorization is the one proposed by Welsh et al. (2002), which is largely inspired by the texture synthesis algorithm of Efros and Leung (1999). It relies on patch similarities in the colorization process.
A luminance remapping (see, e.g., Hertzmann et al. 2001) is done as a first step: in order to make the luminance values more comparable between the source image and the target one, an affine mapping is applied to the luminance of the source image so that it better matches the histogram of the luminance channel of the target. Indeed, the ranges of the luminance channels can differ, and a direct comparison of these channels could be meaningless.
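A minimal sketch of such an affine remapping, matching mean and standard deviation (a common choice; the variable names are illustrative assumptions), could be:

```python
import numpy as np

def luminance_remap(lum_source, lum_target):
    """Affinely remap the source luminance so that its mean and standard
    deviation match those of the target luminance."""
    mu_s, sigma_s = lum_source.mean(), lum_source.std()
    mu_t, sigma_t = lum_target.mean(), lum_target.std()
    return (lum_source - mu_s) * (sigma_t / (sigma_s + 1e-8)) + mu_t

# toy usage with random "images"
rng = np.random.default_rng(0)
src = rng.uniform(0.2, 0.6, size=(64, 64))
tgt = rng.uniform(0.0, 1.0, size=(64, 64))
print(luminance_remap(src, tgt).mean(), tgt.mean())
```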
Next, for each pixel of the target image, the algorithm compares the patch centered at this pixel with a set of patches extracted from the luminance channel of the source image. Once the closest patch is found, the chrominance values of the pixel at the center of that source patch are extracted and assigned to the considered pixel in the target image (see, e.g., Fig. 7). Combining the luminance of the target image with the chrominance values extracted from the source image yields an RGB color.
The set of reference patches extracted from the source image is a randomly chosen subset built as follows: the image is divided by a regular grid, and one pixel is chosen randomly in each cell of this grid (see, e.g., Fig. 7b).
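The following minimal sketch (illustrative names; grayscale images as 2D NumPy arrays, chrominance channels stacked in a third dimension and aligned with the source luminance) shows the basic Welsh-style matching loop; it is a simplification, not the original implementation:

```python
import numpy as np

def welsh_color_transfer(lum_tgt, lum_src, chroma_src, patch=5, n_samples=200, seed=0):
    """For each target pixel, find the most similar source patch (in luminance)
    among a random subset of source patches, and copy the chrominance of its center."""
    r = patch // 2
    rng = np.random.default_rng(seed)
    # random subset of source patch centers (away from the borders)
    ys = rng.integers(r, lum_src.shape[0] - r, n_samples)
    xs = rng.integers(r, lum_src.shape[1] - r, n_samples)
    src_patches = np.stack([lum_src[y - r:y + r + 1, x - r:x + r + 1].ravel()
                            for y, x in zip(ys, xs)])
    out = np.zeros(lum_tgt.shape + (chroma_src.shape[-1],))
    pad = np.pad(lum_tgt, r, mode="edge")
    for i in range(lum_tgt.shape[0]):
        for j in range(lum_tgt.shape[1]):
            p = pad[i:i + patch, j:j + patch].ravel()
            k = np.argmin(((src_patches - p) ** 2).sum(axis=1))  # closest patch
            out[i, j] = chroma_src[ys[k], xs[k]]                  # copy its center chrominance
    return out
```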
The authors of Di Blasi and Reforgiato (2003) proposed an improvement which
speeds up the patch research with a tree-clustering algorithm inspired from Wei and
Levoy (2000). Next, the authors of Chen and Ye (2011) proposed an improvement
based on a Bayesian approach.
The patch-based approaches suffer from two drawbacks: the difficulty of choosing a reliable metric to compare the patches, and the lack of spatial coherency at the border between two areas with different textures. We will see in the following how to overcome these limitations.
The patch-based approaches need metrics in order to compare the patches. Unfortunately, no perfect metric exists, each of them having its advantages and drawbacks. In most computer vision problems, the algorithms have to distinguish objects or textures with the same accuracy and sensitivity as the human visual system. Metrics for texture comparison are based on numerical data. The link between these data and the human visual system is made through features, i.e., vectors which describe the local statistics of the image.
The simplest metrics are based on the mean or the standard deviation of the patches, whereas others use histograms, the Fourier transform, SURF features (Bay et al. 2006), structure tensors, covariance matrices, Gabor features, etc.
Fig. 8 Some methods of the literature begin with the search of C candidates per pixel (here C = 8)
These descriptors are used by the authors of Bugeau and Ta (2012) to extract eight
color candidates for each pixel in the same way as done in Welsh et al. (2002).
For each metric, the method retains the pixel of the source image corresponding to the closest patch with respect to this metric. After this step, each pixel of the target image is matched with eight pixels of the source image. To summarize, since each pixel has its luminance and eight chrominance values coming from the matched pixels (see, e.g., Fig. 8), eight colors, called color candidates, are available. In the
work of Bugeau and Ta (2012), the colors are used directly, whereas in Pierre et al.
(2014a) an oblique projection in the RGB color space is proposed in order to avoid
some artificial modification of the hue due to gamut problems.
Some other metrics could be used. For instance, although the method of Charpiat et al. (2008) is not based on a patch decomposition, it uses a local representation with SURF descriptors to predict a color at each pixel. Let us mention that this method also requires numerous and complex steps.
With multiple color candidates coming from various descriptors, a choice has to be made among them. In the following, we will consider a generic number of color candidates denoted by C. The aim of the methods described hereinafter is the selection of one of the color candidates. Let us note that the choice of an ideal metric based on metric learning has been proposed in Pierre et al. (2015b), but with worse results than the state of the art, due to a lack of spatial regularization
of the results. In order to retain only one color per pixel, the authors of Bugeau
and Ta (2012) proposed to compute a median of the candidates based on an order
between them computed with a standard PCA of the set of colors. This PCA is
required because there is no natural order in the RGB space of colors. The method
of Lézoray et al. (2005, 2007b) provides an order in the set of colors, but it requires
some neighborhood information which is not available here.
Let us remark that the method of Bugeau and Ta (2012) does not use any spatial regularization or spatial coherency of the colors to choose a color candidate. The authors of Jin et al. (2019) proposed an extension of Jung and Kang (2016) to exemplar-based colorization, with color inference based on patch descriptors (DFT and variance of patches). A variational method similar to Pierre et al. (2015a) is proposed to regularize the final results.
In Pierre et al. (2015a), the authors have proposed a functional that selects a color among candidates extracted from a patch-based method, inspired by the method of Bugeau et al. (2014), in order to tackle some issues (numerical cost of the numerical scheme, halo effects, etc.). Assume that C candidates are available at each pixel of a domain Ω and that two chrominance channels are available for each candidate. Let us denote, for each pixel at position x, the i-th candidate by c_i(x); u(x) = (U(x), V(x)) stands for the chrominances to compute, and w(x) = {w_i(x)}, i = 1, . . . , C, for the candidate weights. Let us minimize the following functional with respect to (u, w):

$$F(u, w) := TV_C(u) + \frac{\lambda}{2} \int_{\Omega} \sum_{i=1}^{C} w_i(x)\, \|u(x) - c_i(x)\|_2^2 \, dx + \chi_{\mathcal{R}}(u(x)) + \chi_{\Delta}(w(x)). \qquad (27)$$

The second term of (27) is the data-fidelity term

$$\int_{\Omega} \sum_{i=1}^{C} w_i(x)\, \|u(x) - c_i(x)\|_2^2 \, dx. \qquad (28)$$
This term is a weighted average of some L2 norms with respect to the candidates ci .
The weights wi can be seen as a probability distribution of the ci . For instance, if
w_1 = 1 and w_i = 0 for 2 ≤ i ≤ C, the minimization of F with respect to u reduces to the minimization of

$$TV_C(u) + \frac{\lambda}{2} \int_{\Omega} \|u(x) - c_1(x)\|_2^2 \, dx + \chi_{\mathcal{R}}(u(x)). \qquad (29)$$
To simplify the notation, the dependence of each quantity on the position x of the current pixel will be dropped in the following. For instance, the second term of (27) will be denoted by $\int_{\Omega} \sum_{i=1}^{C} w_i \|u - c_i\|_2^2 \, dx$.
This model is a classical one, with a data-fidelity term $\sum_{i=1}^{C} w_i \|u - c_i\|_2^2$ and a regularization term $TV_C(u)$ defined in Equation (6). Since the first step of the method extracts many candidates, we propose averaging the data-fidelity terms issued from each candidate. This average is weighted by the $w_i$. Thus, the term
$$\sum_{i=1}^{C} w_i \|u - c_i\|_2^2 \qquad (30)$$
connects the candidate color ci to the color u that will be retained. The minimum
of this term with respect to u is reached when u is equal to the weighted average of
candidates ci .
Since the average is weighted by the $w_i$, these weights are constrained to lie on the probability simplex. This constraint is formalized by $\chi_{\Delta}(w)$, whose value is 0 if $w \in \Delta$ and $+\infty$ otherwise, with $\Delta$ defined as

$$\Delta := \left\{ (w_1, \cdots, w_C) \ \text{s.t.} \ 0 \le w_i \le 1 \ \text{and} \ \sum_{i=1}^{C} w_i = 1 \right\}. \qquad (31)$$
In order to compute a suitable solution for the problem in (27), the authors of Pierre et al. (2015a) propose a primal-dual algorithm with alternating minimization of the terms depending on w. They also provided numerical experiments showing the convergence of their algorithm. Let us note that recent work shows that the convergence of such numerical schemes can be demonstrated after smoothing of the total variation term. Among all the numerical schemes proposed in the references (Pierre et al. 2015a; Tan et al. 2019), we choose the methodology having the best convergence rate as well as a convergence proof. This scheme is given in Algorithm 2 of Tan et al. (2019); it is a block-coordinate forward-backward algorithm. To speed up the convergence, Algorithm 2 of Tan et al. (2019) is initialized with the result of 500 iterations of the primal-dual algorithm of Pierre et al. (2015a). Although the latter algorithm has no convergence guarantee, the authors of Tan et al. (2019) have experimentally observed that it converges faster numerically.
Unfortunately, the functional (27) is highly non-convex and contains many critical points. More precisely, the functional is convex with respect to u for fixed w and, conversely, convex with respect to w for fixed u. Nevertheless, it is not convex with respect to the joint variables (u, w). Thus, even though the numerical scheme converges to a local minimum, the solution of the problem depends on the initialization.
The dependence on the initialization implies an influence of the source image for exemplar-based colorization, and it does not enable a fully automatic image colorization within this paradigm. In the next section, we will show how colorization from datasets can be used to tackle this last limitation.
The third colorization approach uses some large image databases (Zhang et al.
2016). Neural networks (convolutional neural networks, generative adversarial
networks, autoencoder, recursive neural networks) have also been used successfully
leading to a significant number of recent contributions. The survey proposed in this
section is based on the paper (Mouzon et al. 2019). This literature can be divided into
two categories of methods. The first evaluates the statistical distribution of colors
for each pixel (Zhang et al. 2016; Royer et al. 2017; Chen et al. 2018). The network
computes, for each pixel of the grayscale images, the probability distribution of
the possible colors. The second takes a grayscale image as input and provides a
color image as output, mostly in the form of chrominance channels (Iizuka et al.
2016; Larsson et al. 2016; Cao et al. 2017; Isola et al. 2017; Deshpande et al. 2017;
Guadarrama et al. 2017; He et al. 2018; Su et al. 2018). Some methods use a mixture
of both (e.g., Zhang et al. 2017).
Both techniques require image resizing that is either done by deconvolution
layers or performed a posteriori with standard interpolation techniques.
In the case of Zhang et al. (2016), the network computes a probability distribution of the colors on a down-sampled version of the original image. The choice of a color at each pixel of the high-resolution image is made by linear interpolation, without taking the grayscale image into account. Hence, the contours of the chrominance and of the luminance may not be aligned, producing halo effects. Figure 9 shows some gray halo effects at the bottom of the cat, visible on the red part near the tail. On the other hand, in comparison with the other approaches of the state of the art, the method of Zhang et al. (2016) produces images which are shinier.
Fig. 9 Example of halo effects produced by the method of Zhang et al. (2016). Based on a
variational model, the approach of Mouzon et al. (2019) is able to remove such artifacts
Below, the CNN described in Zhang et al. (2016) is presented in detail. The method of Zhang et al. (2016) is based on a discretization of the CIE Lab color space into C = 313 colors. This number of reference colors comes from the intersection of the RGB gamut with the discretization of the Lab space. The authors designed a CNN based on a VGG network (Simonyan and Zisserman 2015) in order to compute a statistical distribution of the C colors at each pixel. The input of the network is the L lightness channel of the Lab transform of an image of size 256 × 256. The output is a probability distribution over a set of 313 (a, b) chrominance pairs for each pixel of a 64 × 64 image. The quantization of the color space into 313 colors follows from two assumptions. First, the colors are regularly spaced in the CIE Lab color space; in this color space, two colors are close with respect to the Euclidean norm when the human visual system perceives them as close. The second assumption ruling the set of colors is the respect of the RGB gamut: the colors have to be displayable on a standard screen.
To train this CNN, the ImageNet database (Deng et al. 2009) is used, without its grayscale images. The images are resized to 256 × 256 and then transformed into the CIE Lab color space. The images are further resized to 64 × 64 to compute the a and b channels. The loss function is the cross-entropy between the (soft-encoded) chrominance (a, b) of the training image and the predicted distribution over the 313 reference colors. Let us denote by Δ the probability simplex in C = 313 dimensions.
Denoting by $(\hat{w}_i(x))_{i=1..C} \in \Delta^N$ the predicted probability distributions of dimension C at the N pixels of the 64 × 64 image (over a domain Ω), and by $(w_i(x))$ the ground-truth distribution computed with a soft-encoding scheme (see Zhang et al. 2016 for details), the loss function is given by:

$$L(\hat{w}, w) = - \sum_{x \in \Omega} \sum_{i=1}^{C} w_i(x) \log(\hat{w}_i(x)). \qquad (32)$$
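For concreteness, a minimal NumPy sketch of this per-pixel cross-entropy (the array shapes are assumptions: `w_hat` and `w` of shape (N, C), each row summing to one):

```python
import numpy as np

def colorization_loss(w_hat, w, eps=1e-12):
    """Cross-entropy of Equation (32): w is the soft-encoded ground-truth
    distribution over the C color bins, w_hat the predicted distribution,
    both of shape (num_pixels, C)."""
    return -np.sum(w * np.log(w_hat + eps))

# toy usage with N = 4096 pixels and C = 313 bins
rng = np.random.default_rng(0)
w_hat = rng.random((4096, 313)); w_hat /= w_hat.sum(axis=1, keepdims=True)
w = rng.random((4096, 313)); w /= w.sum(axis=1, keepdims=True)
print(colorization_loss(w_hat, w))
```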
Coupled Approaches
Neither the exemplar-based methods, nor the manual techniques, nor the deep learning approaches are able to colorize images without some defects. Since all of them have advantages and drawbacks, we describe in this section some approaches coupling them.
A method can be considered interactive when the user can influence the result of the colorization process. Nevertheless, interactivity can be difficult to achieve. Indeed, if a method takes too long to compute a result, the user cannot rely on an intermediate result to see the influence of their intervention. The results and the survey proposed in this section are based on the papers (Pierre et al. 2014b, 2015a), which have led to a software (Pierre et al. 2016).
Some of the exemplar-based methods enable some interaction with the user, for instance the swatches approach of Welsh et al. (2002), in which the user distinguishes some parts of the image by drawing rectangles on the source and target images where the textures are similar. The method then colorizes some parts of the target image with the specified parts of the source image. Finally, the method computes a solution for all the remaining uncolored pixels of the image based on the already colorized parts. The advantage of this framework is that the user can easily distinguish or associate the textures of the different images, which is difficult to do automatically. Conversely, the exemplar-based method reliably colorizes an image from its own parts, because the textures are more similar. With this method, contextual information is added.
The framework of Chia et al. (2011) exploits the huge quantity of data available on the Internet. Nevertheless, the user has to manually segment and label the objects of the target image. Next, for each labeled object, images with the same label are retrieved from the Internet and used as source images. The image retrieval is based on superpixel extraction (Comaniciu and Meer 2002) as well as graph-based optimization.
In the work of Ding et al. (2012), the scribbles are automatically generated and the user is invited to associate a color to each scribble. Then, the phases of the wavelet transform in the quaternion space are computed in order to propagate the colors along the lines of equal phase. Indeed, the wavelets in quaternion space provide a measure of contours.
The method proposed in Pierre et al. (2014b) consists of a combination of
the method of Bugeau and Ta (2012) and the one of Yatziv and Sapiro (2006).
The approach uses a GPU implementation to compute a solution of model (27), which makes it possible to colorize an image of size 370 × 600 in approximately 1 s. This computation time enables an extension of the exemplar-based approach of Pierre et al. (2014c) that includes an interaction with the user, leading to a colorization software (Pierre et al. 2016).
The scribbles can be given in advance or added step by step by the user. When a source image is added, the first step consists of the extraction of C candidates as in section “Patch-Based Methods”, and the corresponding weights are initialized with the value w = 1/C.
The information given by the scribbles influences the weights and the number of candidates. More precisely, for each pixel of the image, a new candidate is added for each scribble. When a candidate is introduced, its weight is initialized for the minimization process with a value depending on the geodesic distance, in a similar way to Yatziv and Sapiro (2006).
The geodesic distance, denoted by D, is computed with the fast marching algorithm (Sethian 1999), with a potential equal to $(0.001 + \|\nabla u\|_2^2)^{-4}$ as given by Chan and Vese (2001). D is normalized to get values between 0 and 1. The implementation of Peyré (2008) can be used to compute it.
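As an illustration (not the authors' implementation, which relies on fast marching), the geodesic distance from a scribble can be approximated by a Dijkstra-type propagation on the pixel grid; the per-pixel `cost` below plays the role of the reciprocal of the fast-marching potential and is assumed to be precomputed:

```python
import heapq
import numpy as np

def geodesic_distance(cost, scribble_mask):
    """Dijkstra-style approximation of the geodesic distance from a scribble.
    `cost` is the local cost of crossing a pixel (typically large near contours,
    so that the distance grows quickly across edges); `scribble_mask` is boolean."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    heap = [(0.0, int(i), int(j)) for i, j in zip(*np.nonzero(scribble_mask))]
    for _, i, j in heap:
        dist[i, j] = 0.0
    heapq.heapify(heap)
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and d + cost[ni, nj] < dist[ni, nj]:
                dist[ni, nj] = d + cost[ni, nj]
                heapq.heappush(heap, (dist[ni, nj], ni, nj))
    return dist / dist[np.isfinite(dist)].max()   # normalized to [0, 1]
```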
The pixels having a low geodesic distance from a scribble get its color, whereas
those having a high geodesic distance are not influenced by the user intervention.
The w variable is composed of the concatenation of the uniform weights of the color candidates coming from the source image (via the patch extraction) and of the weights coming from the geodesic distance. The values are then projected onto the probability simplex with the algorithm of Chen and Ye (2011). The u variable is initialized with $\sum_i w_i c_i$, and the functional (27) is minimized using this initialization.
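A minimal sketch of the Euclidean projection onto the probability simplex (a standard sort-based scheme, not necessarily the exact routine of Chen and Ye 2011):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector v onto {w : w_i >= 0, sum_i w_i = 1}."""
    u = np.sort(v)[::-1]                           # sort in decreasing order
    css = np.cumsum(u) - 1.0                       # cumulative sums shifted by the target sum
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)                 # optimal shift
    return np.maximum(v - theta, 0.0)

# toy usage: mix uniform patch weights with a scribble weight, then project
w = np.array([1 / 8] * 8 + [0.9])
print(project_simplex(w), project_simplex(w).sum())
```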
In Fig. 10, we show a first example of colorization using both manual and exemplar-based approaches. Figure 10a and b show the source and target images. Figure 10c corresponds to an exemplar-based colorization done without manual scribbles. In this first result, the sky is not suitably colorized, since it appears brown instead of blue, as does the door in ruins. Moreover, some blue blotches appear on the floor. Figure 10d shows the corrections done by the user by adding three scribbles on the exemplar-based result (Fig. 10c). Figure 10e illustrates the advantage of the combination of the two methods. Indeed, the work required from the user is lower than for a fully manual colorization. It also shows that Model (27) is able to enhance contours.
Figure 11 shows additional results and illustrates the advantage of using the joint model instead of only the source image (fourth column) or only the scribbles (fifth column). The colorization results in the last column of Fig. 11 are visually better than the ones computed from only one source of information. This experiment shows
Fig. 10 Colorization using manual and exemplar-based approach. (a) Source image. (b) Target
image with three scribbles. (c) Exemplar-based colorization. (d) Manual colorization. (e) Both
Fig. 11 Advantage of the joint approach, compared to manual and exemplar-based colorization.
From left to right: source, target with scribbles added by the user, exemplar-based result, scribble-
based results, and finally the joint approach
also that old photographs and faces are difficult to colorize with exemplar-based approaches, since they require more scribbles. This observation was already made in Chen et al. (2004). Indeed, old pictures contain a lot of noise and textures, while face images contain smooth parts, for instance skin or background, with no texture. This kind of image is hard to colorize under the assumption of texture similarity. Nevertheless, it is possible to compute a suitable result with the joint method, as well as with the morphing-based approach presented in section “Morphing-Based Approach”. Let us remark that the scribbles given by the user naturally have a local influence, but this influence can also be global. For instance, on the last row of Fig. 11, the blue scribble in the arch also improves the color of the sky in the left-hand part of the image.
In the following, we recall the results given in the paper Mouzon et al. (2019) which
consists of a coupling between a variational approach and the output of the CNN
of Zhang et al. (2016). Next, we perform numerical comparisons with the original
CNN approach of Zhang et al. (2016).
The idea is to select, at each pixel, a color among candidates drawn from the distribution provided by the CNN described in Zhang et al. (2016). In addition, the numerical results of Pierre et al. (2015a) demonstrate the ability to remove halos, which addresses one of the limitations of Zhang et al. (2016). This functional has to face two main problems: on the one hand, the transition from a low to a high resolution and, on the other hand, the maintenance of a higher saturation than that of current methods.
In this section, a method coupling the prediction power of CNNs with the precision of variational methods is described. To this aim, let us remark that the variable w of the functional (27) represents the proportion of each color candidate in the final result. This comes from the fact that, for a given vector of weights $w \in \mathbb{R}^C$ summing to one, the minimum of

$$\sum_{i=1}^{C} w_i \|u - c_i\|_2^2 \qquad (33)$$

with respect to u is reached at the weighted average

$$\sum_{i=1}^{C} w_i c_i. \qquad (34)$$
Fig. 12 Overview of the method of Mouzon et al. (2019). A CNN computes a color distribution at each pixel. A variational method then selects a color for each pixel based on a regularity hypothesis
Numerical Results
In this section we show a qualitative comparison between Zhang et al. (2016) and the framework of Mouzon et al. (2019). Many results provided by Zhang et al. (2016) are accurate and reliable; we show on these examples that the method of Mouzon et al. (2019) does not reduce the quality of the images. We then propose some comparisons with erroneous results of Zhang et al. (2016), which show that the method of Mouzon et al. (2019) is able to colorize images fully automatically without artifacts and halo effects. A time comparison between the CNN inference and the variational step is also given to show that the regularization of the result is not a burden compared with the CNN approach. Finally, to show the limitations of CNNs in image colorization, we show some results where neither the approach of Zhang et al. (2016) nor the framework of Mouzon et al. (2019) is able to produce reliable results.
Figure 13 shows the colorization results of the method of Zhang et al. (2016). While it is hard to see that the method of Mouzon et al. (2019) produces a
Fig. 13 Results of Zhang et al. (2016) compared with Mouzon et al. (2019). The histogram of the saturation shows that the second result is shinier than the first one. Indeed, the average value of the saturation is higher for the model of Mouzon et al. (2019) (0.4228) than for the one of Zhang et al. (2016) (0.3802). (a) Original image. (b) Zhang et al. 2016. (c) Mouzon et al. 2019. (d) Histograms of saturation
shinier result than that of Zhang et al. (2016) unless one is a calibration expert, the histogram of the saturation shows the improvement. Indeed, since the histogram is shifted to the right, the saturation is globally higher for the result of Mouzon et al. (2019). Quantitatively, the average saturation is equal to 0.4228 for the method of Mouzon et al. (2019), while it is equal to 0.3802 for the method of Zhang et al. (2016). This improvement comes from the fact that the method of Mouzon et al. (2019) selects one color among the ones given by the CNN, whereas the method of Zhang et al. (2016) computes their annealed mean. The averaging of the chrominances of the colors decreases the saturation and makes the colors drabber. By using a selection algorithm based on the image regularity, the method of Mouzon et al. (2019) is able to avoid this drawback.
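A toy computation illustrates why averaging chrominances drains saturation, whereas selecting one candidate preserves it (the (a, b) values below are arbitrary examples):

```python
import numpy as np

# two equally likely, strongly saturated chrominance candidates (Lab a/b values)
candidates = np.array([[60.0, 20.0],     # reddish
                       [-60.0, -20.0]])  # greenish
weights = np.array([0.5, 0.5])

chroma = lambda ab: float(np.hypot(*ab))             # distance to the gray axis
mean_ab = weights @ candidates                       # averaging, as a mean-based decoding does
selected_ab = candidates[np.argmax(weights)]         # selecting one candidate instead

print(chroma(mean_ab))      # 0.0   -> completely desaturated (gray)
print(chroma(selected_ab))  # ~63.2 -> saturation preserved
```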
Fig. 14 Comparison of Mouzon et al. (2019) with Zhang et al. (2016). This example provides a
proof of concept. The method of Mouzon et al. (2019) is able to remove the halo effects on the
colorization result of Zhang et al. (2016)
The result in Fig. 14 is a proof of concept for the proposed framework. A toy example is automatically colorized by the method of Zhang et al. (2016). The result given by the method of Zhang et al. (2016) produces some halo effect near the only contour of the image, which is unnatural. The regularization of the result is able to remove this halo effect and to recover an image that looks less artificial. This toy example contains only two constant parts. The aim of the variational method is to couple the contours of the chrominance channels with those of the luminance. The result produced with the method of Mouzon et al. (2019) contains no halo effect, showing the benefits of their framework.
In Fig. 15, we review some results and compare them with the method of Zhang et al. (2016). For the lion (first line), a misalignment of the colors with the grayscale image is visible (a part of the lion is colorized in blue and a part of the sky in brown-beige). This is a typical case of halo effect where the framework of Mouzon et al. (2019) is able to remove the artifacts. For the image of the mountaineer, some pink stains appear in the result of Zhang et al. (2016). With the method of Mouzon et al. (2019), the minimization of the total variation ensures the regularity of the image and thus removes these stains.
Figure 16 shows additional results. The first line is an old postcard. Its colorization with the CNN is reliable and, in addition, the variational approach makes it a little shinier. This example shows the ability of the approach of Mouzon et al. (2019) to colorize historical images. In the second example, most of the image is well colorized by the original method of Zhang et al. (2016). Nevertheless, the lighthouse and the right-side building contain some unrealistic orange halos. With the variational method, the colors are convincing. Additional results are available at http://www.fabienpierre.fr/colorization.
The computational time of the CNN forward pass is about 1.5 s on GPU, whereas the minimization of the variational model (27) takes about 15 s in Matlab on CPU.
Fig. 15 Comparison between Mouzon et al. (2019) and Zhang et al. (2016)
In Pierre et al. (2017a), the authors report a computation time of almost 1 s with an unoptimized GPU implementation. Since the minimization scheme of Tan et al. (2019) is approximately the same, its computational time would be comparable. Thus, the computational time of the approach of Mouzon et al. (2019) is not a burden in comparison with the method of Zhang et al. (2016).
In Fig. 17, a failure case is shown. In this case, since the minimization of the variational model strongly depends on its initialization, the method of Mouzon et al. (2019) is not able to recover realistic colors. Fully automatic colorization actually remains an open problem.
Fig. 16 Additional comparisons of Mouzon et al. (2019) with Zhang et al. (2016)
Fig. 17 Failure case. The prediction of the CNN is not able to recover a reliable color
In this chapter, we have shown that image colorization has made huge progress during the last 10 years, with the introduction of a wide number of methods and approaches. Some extensions of these techniques have been proposed for video colorization, but only for a limited number of frames; future works could address this application with more success. In this work, we have also pointed out some limitations of colorization which leave the topic open for active research. Joint approaches have shown their efficiency, and a combination of deep learning with manual approaches could enhance the human-system interface for image and video colorization.
Acknowledgments This study has been carried out with financial support from the French
Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01).
References
Abidi, B.R., Zheng, Y., Gribok, A.V., Abidi, M.A.: Improving weapon detection in single energy
x-ray images through pseudocoloring. IEEE Trans. Syst. Man Cybern. Part C 36(6), 784–796
(2006)
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: Slic superpixels compared to
state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282
(2012)
Bay, H., Tuytelaars, T., Van Gool, L.: Surf: speeded up robust features. In: European Conference
on Computer Vision, pp. 404–417. Springer (2006)
Berkels, B., Effland, A., Rumpf, M.: Time discrete geodesic paths in the space of images. SIAM J.
Imaging Sci. 8(3), 1457–1488 (2015)
Bugeau, A., Ta, V.T.: Patch-based image colorization. In: IEEE International Conference on Pattern
Recognition, pp. 3058–3061 (2012)
Bugeau, A., Ta, V.T., Papadakis, N.: Variational exemplar-based image colorization. IEEE Trans.
Image Proces. 23(1), 298–307 (2014)
Cao, Y., Zhou, Z., Zhang, W., Yu, Y.: Unsupervised diverse colorization via generative adversarial
networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in
Databases, pp. 151–166. Springer (2017)
Caselles, V., Facciolo, G., Meinhardt, E.: Anisotropic cheeger sets and applications. SIAM J.
Imaging Sci. 2(4), 1211–1254 (2009)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Proces. 10(2), 266–277
(2001)
Chan, T.F., Kang, S.H., Shen, J.: Total variation denoising and enhancement of color images
based on the cb and hsv color models. J. Vis. Commun. Image Represent. 12(4), 422–435
(2001)
Charpiat, G., Hofmann, M., Schölkopf, B.: Automatic image colorization via multimodal predic-
tions. In: European Conference on Computer Vision, pp. 126–139. Springer (2008)
Chen, Y., Ye, X.: Projection onto a simplex. arXiv preprint arXiv:1101.6081 (2011)
Chen, T., Wang, Y., Schillings, V., Meinel, C.: Grayscale image matting and colorization. In: Asian
Conference on Computer Vision, pp. 1164–1169 (2004)
Chen, Y., Luo, Y., Ding, Y., Yu, B.: Automatic colorization of images from chinese black and
white films based on cnn. In: 2018 IEEE International Conference on Audio, Language and
Image Processing, pp. 97–102 (2018)
Chia, A.Y.S., Zhuo, S., Kumar, R.G., Tai, Y.W., Cho, S.Y., Tan, P., Lin, S.: Semantic colorization
with internet images. In: ACM SIGGRAPH ASIA (2011)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans.
Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Cui, M., Hu, J., Razdan, A., Wonka, P.: Color-to-gray conversion using isomap. Vis. Comput.
26(11), 1349–1360 (2010)
Deledalle, C.A., Papadakis, N., Salmon, J., Vaiter, S.: Clear: covariant least-square re-fitting with
applications to image restoration. SIAM J. Imaging Sci. 10(1), 243–284 (2017)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical
image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–
255 (2009)
Deshpande, A., Lu, J., Yeh, M.C., Chong, M.J., Forsyth, D.A.: Learning diverse image colorization.
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2877–2885 (2017)
Di Blasi, G., Reforgiato, D.: Fast colorization of gray images. Eurographics Italian (2003). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.6839&rep=rep1&type=pdf
Ding, X., Xu, Y., Deng, L., Yang, X.: Colorization using quaternion algebra with automatic scribble
generation. In: Advances in Multimedia Modeling (2012)
Drew, M.S., Finlayson, G.D.: Improvement of colorization realism via the structure tensor. Int. J.
Image Graph. 11(04), 589–609 (2011)
Efros, A.A., Leung, T.K.: Texture synthesis by non-parametric sampling. In: IEEE International
Conference on Computer Vision, vol. 2, pp. 1033–1038 (1999)
Fitschen, J.H., Nikolova, M., Pierre, F., Steidl, G.: A variational model for color assignment. In:
Scale Space and Variational Methods in Computer Vision, pp. 437–448 (2015)
Fornasier, M.: Nonlinear projection recovery in digital inpainting for color image restoration. J.
Math. Imaging Vis. 24(3), 359–373 (2006)
Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Upper Saddle River, Pearson
(2008)
Guadarrama, S., Dahl, R., Bieber, D., Shlens, J., Norouzi, M., Murphy, K.: Pixcolor: pixel recursive
colorization. In: British Machine Vision Conference (2017)
Gupta, R.K., Chia, A.Y.S., Rajan, D., Ng, E.S., Zhiyong, H.: Image colorization using similar
images. In: ACM International Conference on Multimedia, pp. 369–378 (2012)
He, M., Chen, D., Liao, J., Sander, P.V., Yuan, L.: Deep exemplar-based colorization. ACM Trans.
Graph. 37(4), 47:1–47:16 (2018)
Hertzmann, A., Jacobs, C.E., Oliver, N., Curless, B., Salesin, D.H.: Image analogies. In: ACM
Computer Graphics and Interactive Techniques, pp. 327–340 (2001)
Heu, J.H., Hyun, D.Y., Kim, C.S., Lee, S.U.: Image and video colorization based on prioritized
source propagation. In: IEEE International Conference on Image Processing, pp. 465–468
(2009)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Let there be color!: joint end-to-end learning of global
and local image priors for automatic image colorization with simultaneous classification. ACM
Trans. Graph. 35(4), 1–11 (2016)
Irony, R., Cohen-Or, D., Lischinski, D.: Colorization by example. In: Eurographics Symposium on
Rendering, vol. 2. Citeseer (2005)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial
networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
Jin, Z., Zhou, C., Ng, M.K.: A coupled total variation model with curvature driven for image
colorization. Inverse Prob. Imaging 10(1930–8337), 1037 (2016). https://doi.org/10.3934/ipi.
2016031
Jin, Z., Min, L., Ng, M.K., Zheng, M.: Image colorization by fusion of color transfers based on
DFT and variance features. Comput. Math. Appl. 77, 2553–2567 (2019)
Jung, M., Kang, M.: Variational image colorization models using higher-order mumford–
shah regularizers. J. Sci. Comput 68(2), 864–888 (2016). https://doi.org/10.1007/s10915-015-
0162-9
Kang, S.H., March, R.: Variational models for image colorization via chromaticity and brightness
decomposition. IEEE Trans. Image Proces. 16(9), 2251–2261 (2007)
Kawulok, M., Kawulok, J., Smolka, B.: Discriminative textural features for image and video
colorization. IEICE Trans. Inf. Syst. 95-D(7), 1722–1730 (2012)
Kim, T.H., Lee, K.M., Lee, S.U.: Edge-preserving colorization using data-driven random walks
with restart. In: IEEE International Conference on Image Processing, pp. 1661–1664 (2010)
Kuhn, G.R., Oliveira, M.M., Fernandes, L.A.: An improved contrast enhancing approach for color-
to-grayscale mappings. Vis. Comput. 24(7–9), 505–514 (2008)
Kuzovkin, D., Chamaret, C., Pouli, T.: Descriptor-based image colorization and regularization. In:
Computational Color Imaging, pp. 59–68. Springer, Cham (2015)
Lagodzinski, P., Smolka, B.: Digital image colorization based on probabilistic distance transfor-
mation. In: 50th International Symposium ELMAR, vol. 2, pp. 495–498 (2008)
Lannaud, C.: Fallait-il coloriser la guerre? L'Express (2009). Available online at http://www.lexpress.fr/culture/tele/fallait-il-coloriser-la-guerre_789380.html
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization.
In: European Conference on Computer Vision, pp. 1–16. Springer (2016)
Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. In: ACM Transactions on
Graphics, vol. 23–3, pp. 689–694 (2004)
Lézoray, O., Meurie, C., Elmoataz, A.: A graph approach to color mathematical morphology. In:
IEEE International Symposium on Signal Processing and Information Technology, pp. 856–861
(2005)
Lézoray, O., Elmoataz, A., Bougleux, S.: Graph regularization for color image processing. Comput.
Vis. Image Underst. 107(1), 38–55 (2007a)
Lézoray, O., Elmoataz, A., Meurie, C.: Mathematical morphology in any color space. In:
IAPR/IEEE International Conference on Image Analysis and Processing, Computational Color
Imaging Workshop (2007b)
Lézoray, O., Ta, V.T., Elmoataz, A.: Nonlocal graph regularization for image colorization. In: IEEE
International Conference on Pattern Recognition, pp. 1–4 (2008)
Luan, Q., Wen, F., Cohen-Or, D., Liang, L., Xu, Y.Q., Shum, H.Y.: Natural image colorization.
In: Proceedings of the 18th Eurographics Conference on Rendering Techniques, EGSR’07,
pp. 309–320. Eurographics Association, Aire-la-Ville (2007). https://doi.org/10.2312/EGWR/
EGSR07/309-320
Mouzon, T., Pierre, F., Berger, M.O.: Joint CNN and variational model for fully-automatic
image colorization. In: SSVM 2019 – Seventh International Conference on Scale Space and
Variational Methods in Computer Vision, Hofgeismar (2019). https://hal.archives-ouvertes.fr/
hal-02059820
Nikolova, M., Steidl, G.: Fast hue and range preserving histogram specification: theory and
new algorithms for color image enhancement. IEEE Trans. Image Proces. 23(9), 4087–4100
(2014)
Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
Persch, J., Pierre, F., Steidl, G.: Exemplar-based face colorization using image morphing.
J. Imaging 3(4), 48 (2017)
Peter, P., Kaufhold, L., Weickert, J.: Turning diffusion-based image colorization into efficient color
compression. IEEE Trans. Image Proces. 26(2), 860–869 (2017)
Peyré, G.: Toolbox fast marching – a toolbox for fast marching and level sets computations (2008). http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=6110&objectType=FILE
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: Hue constrained image colorization in the RGB
space. Preprint (2014a). Available online at https://hal.archives-ouvertes.fr/hal-00995724/
document
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: A unified model for image colorization. In: Color and
Photometry in Computer Vision (ECCV Workshop), pp. 1–12 (2014b)
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T., Papadakis, N.: Exemplar-based colorization in RGB
color space. In: IEEE International Conference on Image Processing, pp. 1–5 (2014c)
Pierre, F., Aujol, J.F., Bugeau, A., Papadakis, N., Ta, V.T.: Luminance-chrominance model for
image colorization. SIAM J. Imaging Sci. 8(1), 536–563 (2015a)
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: Combinaison linéaire optimale de métriques pour la
colorisation d’images. In: XXVème colloque GRETSI, pp. 1–4 (2015b)
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: Luminance-hue specification in the RGB space. In:
Scale Space and Variational Methods in Computer Vision, pp. 413–424 (2015c)
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: Colociel. Dépôt Agence de Protection des Programmes
No IDDN.FR.001.080021.000.S.P.2016.000.2100 (2016). Available online at http://www.labri.fr/perso/fpierre/colociel_v1.zip
Pierre, F., Aujol, J.F., Bugeau, A., Ta, V.T.: Interactive video colorization within a variational
framework. SIAM J. Imaging Sci. 10(4), 2293–2325 (2017a)
Pierre, F., Aujol, J.F., Deledalle, C.A., Papadakis, N.: Luminance-guided chrominance denoising
with debiased coupled total variation. In: International Workshop on Energy Minimization
Methods in Computer Vision and Pattern Recognition, pp. 235–248. Springer (2017b)
Quang, M.H., Kang, S.H., Le, T.M.: Image and video colorization using vector-valued reproducing
kernel hilbert spaces. J. Math. Imaging Vis. 37(1), 49–65 (2010)
Ren, X., Malik, J.: Learning a classification model for segmentation. In: IEEE International
Conference on Computer Vision, pp. 10–17 (2003)
Royer, A., Kolesnikov, A., Lampert, C.H.: Probabilistic image colorization. In: British Machine
Vision Conference (2017)
Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computa-
tional Geometry, Fluid Mechanics, Computer Vision, and Materials Science, vol. 3. Cambridge
University Press, Cambridge (1999)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition.
In: International Conference on Learning Representations (2015)
Song, M., Tao, D., Chen, C., Bu, J., Yang, Y.: Color-to-gray based on chance of happening
preservation. Neurocomputing 119, 222–231 (2013)
Su, Z., Liang, X., Guo, J., Gao, C., Luo, X.: An edge-refined vectorized deep colorization model
for grayscale-to-color images. Neurocomputing 311, 305–315 (2018)
Sỳkora, D., Buriánek, J., Žára, J.: Unsupervised colorization of black-and-white cartoons. In: Pro-
ceedings of the 3rd International Symposium on Non-photorealistic Animation and Rendering,
pp. 121–127. ACM (2004)
Tan, P., Pierre, F., Nikolova, M.: Inertial alternating generalized forward–backward splitting for
image colorization. J. Math. Imaging Vis. 61(5), 672–690 (2019)
Wei, L.Y., Levoy, M.: Fast texture synthesis using tree-structured vector quantization. In: ACM
Computer Graphics and Interactive Techniques, pp. 479–488. Press/Addison-Wesley Publishing
Co. (2000)
Welsh, T., Ashikhmin, M., Mueller, K.: Transferring color to greyscale images. In: ACM
Transactions on Graphics, vol. 21–3, pp. 277–280. ACM (2002)
Williams, A., Barrus, S., Morley, R.K., Shirley, P.: An efficient and robust ray-box intersection
algorithm. In: ACM SIGGRAPH 2005 Courses, p. 9 (2005)
Baatz, W., Fornasier, M., Markowich, P.A., Schönlieb, C.B.: Inpainting of ancient Austrian frescoes. In: Sarhangi, R., Séquin, C.H. (eds.) Bridges Leeuwarden: Mathematics, Music, Art, Architecture, Culture, pp. 163–170. Tarquin Publications, London (2008). Available online at http://archive.bridgesmathart.org/2008/bridges2008-163.html
Yatziv, L., Sapiro, G.: Fast image and video colorization using chrominance blending. IEEE Trans.
Image Proces. 15(5), 1120–1129 (2006)
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: European Conference on
Computer Vision, pp. 1–16. Springer (2016)
Zhang, R., Zhu, J.Y., Isola, P., Geng, X., Lin, A.S., Yu, T., Efros, A.A.: Real-time user-guided
image colorization with learned deep priors. ACM Trans. Graph. 9(4), 119:1–119:11 (2017)
Zheng, Y., Essock, E.A.: A local-coloring method for night-vision colorization utilizing image
analysis and fusion. Inf. Fusion 9(2), 186–199 (2008)
Numerical Solution for Sparse
PDE Constrained Optimization 16
Xiaoliang Song and Bo Yu
Contents
Introduction 624
Finite Element Approximation and Error Estimates 632
An Inexact Heterogeneous ADMM Algorithm 642
    An Inexact Heterogeneous ADMM Algorithm 642
    Convergence Results of ihADMM 645
An Inexact Majorized Accelerated Block Coordinate Descent Method for (Dh) 652
    An Inexact Block Symmetric Gauss-Seidel Iteration 653
    Inexact Majorized Accelerate Block Coordinate Descent (imABCD) Method 656
    A sGS-imABCD Algorithm for (Dh) 659
Numerical Results 663
    Algorithmic Details 663
    Examples 664
Conclusion 673
References 673
Abstract
X. L. Song · B. Yu ()
School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
e-mail: [email protected]; [email protected]
Keywords
Introduction
$$
\begin{aligned}
\min_{(y,u)\in Y\times U}\ \ & J(y,u) = \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2 + \beta\|u\|_{L^1(\Omega)} \\
\text{s.t.}\ \ & Ly = u + y_r \quad \text{in } \Omega, \\
& y = 0 \quad \text{on } \Gamma, \\
& u \in U_{ad} = \{v(x)\,|\; a \le v(x) \le b, \ \text{a.e. on } \Omega\} \subseteq U,
\end{aligned}
\qquad \text{(P)}
$$
where $Y := H_0^1(\Omega)$, $U := L^2(\Omega)$, and $\Omega \subseteq \mathbb{R}^n$ ($n = 2$ or $3$) is a convex, open, and bounded domain with a $C^{1,1}$- or polygonal boundary $\Gamma$; the desired state $y_d \in L^2(\Omega)$ and the source term $y_r \in L^2(\Omega)$ are given; and $a \le 0 \le b$ and $\alpha, \beta > 0$. Moreover, the operator $L$ is a second-order linear elliptic differential operator. It is well known that the $L^1$-norm leads to sparse optimal controls, i.e., optimal controls with small support. Such an optimal control problem (P) plays an important role for the placement of control devices (Stadler 2009). In some cases, it is difficult or undesirable to place control devices all over the control domain, and one hopes to localize controllers in small and effective regions; the $L^1$-solution gives information about the optimal location of the control devices.
Throughout this chapter, we suppose that the elliptic PDEs involved in (P) are of the form
$$Ly = u + y_r \ \ \text{in } \Omega, \qquad y = 0 \ \ \text{on } \partial\Omega, \qquad (1)$$

with

$$(Ly)(x) := -\sum_{i,j=1}^{n} \partial_{x_j}\big(a_{ij}(x)\, y_{x_i}\big) + c_0(x)\, y(x), \qquad (2)$$

where the functions $a_{ij}(x), c_0(x) \in L^\infty(\Omega)$, $c_0 \ge 0$, and $L$ is uniformly elliptic, i.e., $a_{ij}(x) = a_{ji}(x)$ and there is a constant $\theta > 0$ such that

$$\sum_{i,j=1}^{n} a_{ij}(x)\,\xi_i\,\xi_j \ \ge\ \theta\, \|\xi\|^2 \quad \text{for a.a. } x \in \Omega \ \text{and} \ \forall\, \xi \in \mathbb{R}^n. \qquad (3)$$
does not have a decoupled form with respect to the coefficients $\{u_i\}$, where $\{\phi_i(x)\}$ are the piecewise linear nodal basis functions. Hence, the authors introduced an alternative discretization of the $L^1$-norm which relies on a nodal quadrature formula:

$$\|u_h\|_{L_h^1(\Omega_h)} := \sum_{i=1}^{N_h} |u_i| \int_{\Omega_h} \phi_i(x)\, dx. \qquad (7)$$
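For illustration, a minimal NumPy sketch of this lumped L¹ quadrature, assuming the basis-function integrals (equivalently, the diagonal of the lumped mass matrix) have already been assembled; all variable names are illustrative:

```python
import numpy as np

def lumped_l1_norm(u, phi_integrals):
    """Nodal quadrature (7): sum_i |u_i| * integral of the i-th hat function.
    `u` are nodal coefficients, `phi_integrals` the precomputed values
    of int_Omega phi_i(x) dx, i.e. the lumped mass matrix diagonal."""
    return np.sum(np.abs(u) * phi_integrals)

# toy usage on a uniform 1D mesh with spacing h (interior hat functions integrate to h)
h = 0.1
u = np.array([0.0, 2.0, -1.0, 0.0, 3.0])
phi_integrals = np.full_like(u, h)
print(lumped_l1_norm(u, phi_integrals))   # 0.6
```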
Obviously, this quadrature incurs an additional error, although the authors proved
that this approximation does not change the order of error estimates. In a sequence
of papers (Casas et al. 2012a,b), for the non-convex case governed by a semi-
linear elliptic equation, Casas et al. proved second-order necessary and sufficient
method (Beck and Teboulle 2009), ADMM (Fazel et al. 2013; Chen and Toh 2017; Li et al. 2015, 2016), etc., have become the state-of-the-art algorithms. Thanks to its O(1/k²) iteration complexity, a fast inexact proximal (FIP) method in function space, which is actually an APG method, was proposed to solve problem (P) in Schindele and Borzì (2016). As is known, the efficiency of the FIP method depends on how accurately the step length reflects the Lipschitz constant. However, in general, choosing an appropriate step length is difficult, since the Lipschitz constant is usually not available analytically. This disadvantage largely limits the efficiency of the APG method.
In this chapter, we will focus first on the ADMM algorithm. The classical
ADMM was originally proposed by Glowinski and Marroco (1975) and Gabay and
Mercier (1976), and it has found lots of efficient applications in a broad spectrum
of areas. In particular, we refer to Boyd et al. (2011) for a review of the applications
of ADMM in the areas of distributed optimization and statistical learning. We give
a brief sketch of ADMM for the following finite dimensional linearly constrained
convex programming problem with two blocks of functions and variables:
$$
\begin{aligned}
\min\ \ & f(u) + g(z) \\
\text{s.t.}\ \ & A_1 u + A_2 z = c, \\
& u \in U,\ z \in Z,
\end{aligned}
\qquad (8)
$$

where $f(u): \mathbb{R}^n \to \mathbb{R}$ and $g(z): \mathbb{R}^m \to \mathbb{R}$ are both closed, proper, and convex functions (but not necessarily smooth); $A_1 \in \mathbb{R}^{p\times n}$, $A_2 \in \mathbb{R}^{p\times m}$, and $c \in \mathbb{R}^p$; $U \subset \mathbb{R}^n$ and $Z \subset \mathbb{R}^m$ are given closed, convex, and non-empty sets. Let

$$L_\sigma(u, z, \lambda) = f(u) + g(z) + \langle \lambda,\, A_1 u + A_2 z - c\rangle + \frac{\sigma}{2}\,\|A_1 u + A_2 z - c\|^2 \qquad (9)$$

be the augmented Lagrangian function, where $\sigma > 0$ is a penalty parameter. The iterative scheme of the classical ADMM reads

$$
\begin{aligned}
u^{k+1} &= \arg\min_{u \in U}\ L_\sigma(u, z^k, \lambda^k), \\
z^{k+1} &= \arg\min_{z \in Z}\ L_\sigma(u^{k+1}, z, \lambda^k), \\
\lambda^{k+1} &= \lambda^k + \tau\sigma\,(A_1 u^{k+1} + A_2 z^{k+1} - c),
\end{aligned}
\qquad (10)
$$

where $\tau > 0$ is a step-length parameter.
Thanks to the separable structure of the objective function, each subproblem in (10)
involves only one block of f (u) and g(z) and could be solved easily. Under some
trivial assumptions, the classical ADMM for solving (8) has global convergence and
sublinear convergence rate at least.
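To fix ideas, a minimal sketch of scheme (10) for a toy instance (quadratic f, g = β‖·‖₁, A₁ = I, A₂ = −I, c = 0, i.e. a lasso-type splitting; all names and the problem instance are illustrative assumptions, not the PDE-constrained problem itself):

```python
import numpy as np

def admm_lasso(Q, q, beta, sigma=1.0, tau=1.0, iters=200):
    """Classical two-block ADMM (10) for min_u 0.5*u'Qu + q'u + beta*||z||_1 s.t. u = z."""
    n = Q.shape[0]
    u, z, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    H = Q + sigma * np.eye(n)                                     # Hessian of the u-subproblem
    for _ in range(iters):
        u = np.linalg.solve(H, sigma * z - lam - q)               # u-update: smooth quadratic
        v = u + lam / sigma
        z = np.sign(v) * np.maximum(np.abs(v) - beta / sigma, 0)  # z-update: soft thresholding
        lam = lam + tau * sigma * (u - z)                         # multiplier update
    return u, z

# toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
Q, q = A.T @ A, -A.T @ rng.standard_normal(20)
u, z = admm_lasso(Q, q, beta=0.5)
print(np.abs(u - z).max())
```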
Motivated by the success of the finite dimensional ADMM algorithm, it is
reasonable to consider extending the ADMM to infinite dimensional optimal control
problems. To this end, an artificial variable z is introduced and problem (P) is equivalently rewritten as

$$
\begin{aligned}
\min_{(y,u,z)\in Y\times U\times U}\ \ & J(y,u,z) = \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{4}\|u\|_{L^2(\Omega)}^2 + \frac{\alpha}{4}\|z\|_{L^2(\Omega)}^2 + \beta\|z\|_{L^1(\Omega)} \\
\text{s.t.}\ \ & Ay = u + y_r \quad \text{in } \Omega, \\
& y = 0 \quad \text{on } \partial\Omega, \\
& u = z, \\
& z \in U_{ad} = \{v(x)\,|\; a \le v(x) \le b, \ \text{a.e. on } \Omega\} \subseteq U.
\end{aligned}
\qquad (\widetilde{\mathrm{P}})
$$
However, when the classical ADMM is directly applied to solve (DP_h), i.e., the discrete version of (P̃), the well-structured form of the continuous case is lost and the corresponding subproblems cannot be solved efficiently. Thus, making use of the inherent structure of (DP_h), a heterogeneous ADMM is proposed. Meanwhile, it is sometimes unnecessary to compute the solution of each subproblem exactly, even if this is doable, especially at the early stage of the whole process. For example, if a subproblem amounts to solving a large-scale or ill-conditioned linear system, it is natural to use iterative methods such as Krylov-based methods. Hence, taking the inexactness of the solutions of the associated subproblems into account, a more practical inexact heterogeneous ADMM (ihADMM) is proposed. Different from the classical ADMM, we utilize two different weighted inner products to define the augmented Lagrangian functions for the two subproblems. Specifically, based on the $M_h$-weighted inner product, the augmented Lagrangian function with respect to the u-subproblem at the k-th iteration is defined as

$$L_\sigma(u, z^k; \lambda^k) = f(u) + g(z^k) + \langle \lambda^k,\, M_h(u - z^k)\rangle + \frac{\sigma}{2}\,\|u - z^k\|_{M_h}^2,$$

where $M_h$ is the mass matrix. On the other hand, for the z-subproblem, based on the $W_h$-weighted inner product, the augmented Lagrangian function at the k-th iteration is defined as

$$L_\sigma(u^{k+1}, z; \lambda^k) = f(u^{k+1}) + g(z) + \langle \lambda^k,\, M_h(u^{k+1} - z)\rangle + \frac{\sigma}{2}\,\|u^{k+1} - z\|_{W_h}^2,$$
where the lumped mass matrix Wh is diagonal.
Benefiting from these different weighting techniques, each subproblem of ihADMM for (DP_h) can be solved efficiently. Specifically, the u-subproblem of ihADMM, which results in a large-scale linear system, is the main computational cost of the whole algorithm. The $W_h$-weighting makes the z-subproblem decouple and admit a closed-form solution given by the soft-thresholding operator and the projection onto the box constraint [a, b]. Moreover, global convergence and an iteration complexity of o(1/k) in the non-ergodic sense for our ihADMM will be proved. Taking the precision of the discretization error into account, we should mention that using our ihADMM algorithm to solve problem (DP_h) is sufficient and efficient for obtaining an approximate solution with moderate accuracy.
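A minimal sketch of such a closed-form z-update (componentwise soft thresholding followed by projection onto [a, b]); the scaling of the thresholding parameter is only schematic and depends on the precise form of the W_h-weighted subproblem:

```python
import numpy as np

def z_update(v, beta, sigma, a, b):
    """Closed-form minimizer of beta*|z| + (sigma/2)*(z - v)^2 over z in [a, b],
    applied componentwise: soft thresholding followed by projection onto the box."""
    z = np.sign(v) * np.maximum(np.abs(v) - beta / sigma, 0.0)   # soft thresholding
    return np.clip(z, a, b)                                       # projection onto [a, b]

# toy usage
v = np.array([-2.0, -0.05, 0.3, 1.5])
print(z_update(v, beta=0.1, sigma=1.0, a=-1.0, b=1.0))
```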
As far as we know, most of the aforementioned papers are devoted to solving the primal problem. Based on the special structure of the dual problem, we will also consider a duality-based approach for (P). The dual of problem (P) can be written, in its equivalent minimization form, as

$$\min\ \Phi(\lambda, \mu, p) := \frac{1}{2}\|A^* p - y_d\|_{L^2(\Omega)}^2 + \frac{1}{2\alpha}\|{-p} + \lambda + \mu\|_{L^2(\Omega)}^2 + \langle p,\, y_r\rangle_{L^2(\Omega)} + \delta_{\beta B_\infty(0)}(\lambda) + \delta^*_{U_{ad}}(\mu) - \frac{1}{2}\|y_d\|_{L^2(\Omega)}^2, \qquad \text{(D)}$$

where $p \in H_0^1(\Omega)$, $\lambda, \mu \in L^2(\Omega)$, $B_\infty(0) := \{\lambda \in L^2(\Omega) : \|\lambda\|_{L^\infty(\Omega)} \le 1\}$, and, for any given non-empty, closed, convex subset $C$ of $L^2(\Omega)$, $\delta_C(\cdot)$ is the indicator function of $C$. Based on the $L^2$-inner product, we define the conjugate of $\delta_C(\cdot)$ as

$$\delta_C^*(\mu) := \sup_{\lambda \in C}\ \langle \mu, \lambda\rangle_{L^2(\Omega)}.$$
Although a duality-based approach has been introduced in Clason and Kunisch (2011) for elliptic control problems without control constraints in nonreflexive Banach spaces, the authors did not take advantage of the structure of the dual problem and still used semismooth Newton methods to solve the Moreau-Yosida regularization of the dual problem. In this chapter, in view of the structure of problem (D), we aim to design an algorithm which can solve the dual problem (D) efficiently and fast.
By setting x = (μ, λ, p), x₀ = μ, and x₁ = λ, it is quite clear that our dual problem (D) belongs to a general class of multi-block convex optimization problems of the form
Under suitable assumptions and certain inexactness criteria, the author proved that the inexact mABCD method also enjoys the impressive O(1/k²) iteration complexity.
In this chapter, inspired by the success of the iABCD and imABCD methods, we combine their virtues and propose an inexact sGS-based majorized ABCD method (called sGS-imABCD) to solve problem (D). The design of this method combines an inexact 2-block majorized ABCD and recent advances in the inexact sGS technique. Owing to the convergence results of the imABCD method given in Cui (2016, Chapter 3), our proposed algorithm can be proven to have the O(1/k²) iteration complexity as well. Moreover, some truly implementable inexactness criteria controlling the accuracy of the generated imABCD
The goal of this section is to study the approximation of problems (P) and (P) by
finite elements.
To achieve our aim, we first consider a family of regular and quasi-uniform
triangulations $\{T_h\}_{h>0}$ of $\bar\Omega$. For each cell $T \in T_h$, let us define the diameter of
the set $T$ by $\rho_T := \operatorname{diam} T$ and define $\sigma_T$ to be the diameter of the largest ball
contained in $T$. The mesh size of the grid is defined by $h = \max_{T\in T_h}\rho_T$. We
suppose that the following regularity assumptions on the triangulation, which are standard in the context of error estimates, are satisfied:
$$\frac{\rho_T}{\sigma_T} \le \kappa \quad\text{and}\quad \frac{h}{\rho_T} \le \tau$$
hold for all $T \in T_h$ and all $h > 0$. Let us define $\bar\Omega_h = \bigcup_{T\in T_h}T$, and let $\Omega_h \subset \Omega$
and $\Gamma_h$ denote its interior and its boundary, respectively. In the case that $\Omega$ is
a convex polyhedral domain, we have $\Omega = \Omega_h$. In the case that $\Omega$ has a $C^{1,1}$-
boundary $\Gamma$, we assume that $\bar\Omega_h$ is convex and that all boundary vertices of $\bar\Omega_h$
are contained in $\Gamma$, such that $|\Omega\setminus\Omega_h| \le \hat c h^2$, where $|\cdot|$ denotes the measure of the set.
as the discretized state space, where P1 denotes the space of polynomials of degree
less than or equal to 1. For a given source term yr and right-hand side u ∈ L2 (),
we denote by yh (u) the approximated state associated with u, which is the unique
solution for the following discretized weak formulation:
$$\int_{\Omega_h}\Big(\sum_{i,j=1}^{n} a_{ij}\, y_{h,x_i} v_{h,x_j} + c_0\, y_h v_h\Big)\,dx = \int_{\Omega_h}(u + y_r)\,v_h\,dx \quad \forall v_h \in Y_h. \quad (13)$$
Lemma 1 (Ciarlet 1978, Theorem 4.4.6). For a given u ∈ L2 (), let y and yh (u)
be the unique solution of (4) and (13), respectively. Then there exists a constant
c1 > 0 independent of h, u, and yr such that
$$\|y - y_h(u)\|_{L^2(\Omega)} + h\|\nabla y - \nabla y_h(u)\|_{L^2(\Omega)} \le c_1 h^2\big(\|u\|_{L^2(\Omega)} + \|y_r\|_{L^2(\Omega)}\big). \quad (14)$$
$$\varphi_i(x) \ge 0, \quad \|\varphi_i(x)\|_\infty = 1 \quad \forall i = 1,2,\dots,N_h, \qquad \sum_{i=1}^{N_h}\varphi_i(x) = 1. \quad (15)$$
$$u_h = \sum_{i=1}^{N_h} u_i\varphi_i(x), \qquad z_h = \sum_{i=1}^{N_h} z_i\varphi_i(x), \qquad y_h = \sum_{i=1}^{N_h} y_i\varphi_i(x),$$
$$U_{ad,h} := U_h \cap U_{ad} = \Big\{z_h = \sum_{i=1}^{N_h} z_i\varphi_i(x)\ \Big|\ a \le z_i \le b,\ \forall i = 1,\dots,N_h\Big\} \subset U_{ad}.$$
Following the approach of Carstensen (1999), for the error analysis further below,
let us introduce a quasi-interpolation operator $\Pi_h : L^1(\Omega_h) \to U_h$ which provides
interpolation estimates. For an arbitrary $w \in L^1(\Omega)$, the operator $\Pi_h$ is constructed
as follows:
$$\Pi_h w = \sum_{i=1}^{N_h}\pi_i(w)\varphi_i(x), \qquad \pi_i(w) = \frac{\int_{\Omega_h} w(x)\varphi_i(x)\,dx}{\int_{\Omega_h}\varphi_i(x)\,dx}. \quad (16)$$
Based on the assumption on the mesh and the control discretization, we extend $\Pi_h w$
to $\Omega$ by taking $\Pi_h w = w$ for every $x \in \Omega\setminus\Omega_h$ and have the following estimates
of the interpolation error. For the detailed proofs, we refer to Carstensen (1999)
and de Los Reyes et al. (2008).
where
$$\|z_h\|^2_{L^2(\Omega_h)} = \int_{\Omega_h}\Big(\sum_{i=1}^{N_h} z_i\varphi_i(x)\Big)^2 dx, \quad (18)$$
$$\|z_h\|_{L^1(\Omega_h)} = \int_{\Omega_h}\Big|\sum_{i=1}^{N_h} z_i\varphi_i(x)\Big|\,dx. \quad (19)$$
This implies, for problem (P), we have the following discretized version:
$$\begin{cases}\displaystyle\min_{(y_h,u_h,z_h)\in Y_h\times U_h\times U_h}\ J_h(y_h,u_h,z_h) = \frac{1}{2}\|y_h - y_d\|^2_{L^2(\Omega_h)} + \frac{\alpha}{2}\|u_h\|^2_{L^2(\Omega_h)} + \beta\|u_h\|_{L^1(\Omega_h)}\\[4pt] \text{s.t.}\ \ y_h = S_h(u_h + y_r),\\[2pt] \phantom{\text{s.t.}\ \ } u_h \in U_{ad,h}.\end{cases} \quad (P_h)$$
For problem (Ph ), in Wachsmuth and Wachsmuth (2011), the authors gave the
following error estimates results.
However, the resulting discretized problem $(P_h)$ is not in a decoupled form, as
finite-dimensional $l^1$-regularized optimization problems usually are, since (18)
and (19) do not have a decoupled form. Thus, if we directly apply the ADMM algorithm
to the discretized problem, the z-subproblem does not have a closed-form
solution; directly solving $(P_h)$ therefore cannot make full use of the advantages of
ADMM. In order to overcome this bottleneck, we introduce nodal quadrature
formulas to approximately discretize the $L^2$-norm and the $L^1$-norm. Let
$$\|z_h\|_{L^2_h(\Omega_h)} := \Big(\sum_{i=1}^{N_h}|z_i|^2\int_{\Omega_h}\varphi_i(x)\,dx\Big)^{\frac{1}{2}}, \quad (21)$$
$$\|z_h\|_{L^1_h(\Omega_h)} := \sum_{i=1}^{N_h}|z_i|\int_{\Omega_h}\varphi_i(x)\,dx. \quad (22)$$
It is obvious that the $L^2_h$-norm and the $L^1_h$-norm can be considered as a weighted
$l^2$-norm and a weighted $l^1$-norm of the coefficients of $z_h$, respectively. Both of them
are norms on $U_h$. In addition, the $L^2_h$-norm is induced by the following inner
product:
$$\langle z_h, v_h\rangle_{L^2_h(\Omega_h)} = \sum_{i=1}^{N_h}(z_i v_i)\int_{\Omega_h}\varphi_i(x)\,dx \quad \text{for } z_h, v_h \in U_h. \quad (23)$$
Thus, based on (22) and (21), we derive a new discretized optimal control
problem
$$\begin{cases}\displaystyle\min_{(y_h,u_h,z_h)\in Y_h\times U_h\times U_h}\ J_h(y_h,u_h,z_h) = \frac{1}{2}\|y_h - y_d\|^2_{L^2(\Omega_h)} + \frac{\alpha}{4}\|u_h\|^2_{L^2(\Omega_h)} + \frac{\alpha}{4}\|z_h\|^2_{L^2_h(\Omega_h)} + \beta\|z_h\|_{L^1_h(\Omega_h)}\\[4pt] \text{s.t.}\ \ y_h = S_h u_h,\\[2pt] \phantom{\text{s.t.}\ \ } u_h = z_h,\\[2pt] \phantom{\text{s.t.}\ \ } z_h \in U_{ad,h}.\end{cases} \quad (\widehat{DP}_h)$$
It should be mentioned that the approximate $L^1_h$-norm was already used in Wachsmuth
and Wachsmuth (2011, Section 4.4). However, different from their discretization
schemes, in this chapter, in order to keep the separability of the discrete $L^2$-norm
with respect to $z$, we use (21) to approximately discretize it. In addition, although
these nodal quadrature formulas incur additional discretization errors, it will be proven
that these approximation steps do not change the order of the error estimates shown
in (20); see Theorem 1.
To give the error estimates, we first introduce the Karush-Kuhn-Tucker (KKT)
conditions. It is clear that problem (P) is continuous and strongly convex. Therefore,
the existence and uniqueness of the solution of (P) are obvious.
$$y^* = S(u^* + y_r), \quad (26a)$$
$$p^* = S^*(y^* - y_d), \quad (26b)$$
$$\frac{\alpha}{2}u^* + p^* + \lambda^* = 0, \quad (26c)$$
$$u^* = z^*, \quad (26d)$$
$$z^* \in U_{ad}, \quad (26e)$$
$$\Big\langle\frac{\alpha}{2}z^* - \lambda^*,\ \tilde z - z^*\Big\rangle_{L^2(\Omega)} + \beta\big(\|\tilde z\|_{L^1(\Omega)} - \|z^*\|_{L^1(\Omega)}\big) \ge 0, \quad \forall \tilde z \in U_{ad}. \quad (26f)$$
Moreover, we have
$$u^* = P_{U_{ad}}\Big(\frac{1}{\alpha}\,\mathrm{soft}(-p^*, \beta)\Big), \quad (27)$$
where the projection operator PUad (·) and the soft thresholding operator soft(·, ·)
are defined as follows, respectively:
Analogous to the continuous problem (P), the discretized problem $(\widehat{DP}_h)$ is also
a strictly convex problem, which is uniquely solvable. We derive the following
first-order optimality conditions, which are necessary and sufficient for the optimal
solution of $(\widehat{DP}_h)$.
$$y_h = S_h(u_h + y_r), \quad (29a)$$
$$p_h = S_h^*(y_h - y_d), \quad (29b)$$
$$\frac{\alpha}{2}u_h + p_h + \lambda_h = 0, \quad (29c)$$
$$u_h = z_h, \quad (29d)$$
$$z_h \in U_{ad,h}, \quad (29e)$$
$$\Big\langle\frac{\alpha}{2}z_h,\ \tilde z_h - z_h\Big\rangle_{L^2_h(\Omega_h)} - \langle\lambda_h, \tilde z_h - z_h\rangle_{L^2(\Omega_h)} + \beta\big(\|\tilde z_h\|_{L^1_h(\Omega_h)} - \|z_h\|_{L^1_h(\Omega_h)}\big) \ge 0, \quad \forall \tilde z_h \in U_{ad,h}. \quad (29f)$$
Now, let us start the error estimation. Let $(y, u, z)$ be the optimal solution of
problem (P), and $(y_h, u_h, z_h)$ be the optimal solution of problem $(\widehat{DP}_h)$. We have
the following results.
Theorem 4. Let $(y, u, z)$ be the optimal solution of problem (P), and $(y_h, u_h, z_h)$
be the optimal solution of problem $(\widehat{DP}_h)$. For any $h > 0$ small enough and $\alpha_0 > 0$,
there is a constant $C$ such that for all $0 < \alpha \le \alpha_0$,
$$\frac{\alpha}{2}\|u - u_h\|^2_{L^2(\Omega)} + \frac{1}{2}\|y - y_h\|^2_{L^2(\Omega)} \le C\big(h^2 + \alpha h^2 + \alpha^{-1}h^2 + h^3 + \alpha^{-1}h^4 + \alpha^{-2}h^4\big),$$
Proof. Due to the optimality of z and zh , z and zh satisfy (26f) and (29f),
respectively. Let us use the test function zh ∈ Uad,h ⊂ Uad in (26f) and the test
function z̃h := Πh z ∈ Uad,h in (29f); thus, we have
$$\Big\langle\frac{\alpha}{2}z - \lambda,\ z_h - z\Big\rangle_{L^2(\Omega)} + \beta\big(\|z_h\|_{L^1(\Omega)} - \|z\|_{L^1(\Omega)}\big) \ge 0, \quad (30)$$
$$\Big\langle\frac{\alpha}{2}z_h,\ \tilde z_h - z_h\Big\rangle_{L^2_h(\Omega_h)} - \langle\lambda_h, \tilde z_h - z_h\rangle_{L^2(\Omega_h)} + \beta\big(\|\tilde z_h\|_{L^1_h(\Omega_h)} - \|z_h\|_{L^1_h(\Omega_h)}\big) \ge 0. \quad (31)$$
$$\Big\langle\frac{\alpha}{2}z - \lambda,\ z - z_h\Big\rangle_{L^2(\Omega_h)} + \beta\big(\|z\|_{L^1(\Omega_h)} - \|z_h\|_{L^1(\Omega_h)}\big) \le \Big\langle\lambda - \frac{\alpha}{2}z,\ z\Big\rangle_{L^2(\Omega\setminus\Omega_h)} \le ch^2,$$
where the last inequality follows from the boundedness of $\lambda$ and $z$ and the
assumption $|\Omega\setminus\Omega_h| \le \hat c h^2$.
$$\cdots + \beta\big(\|z_h\|_{L^1_h(\Omega_h)} - \|z\|_{L^1_h(\Omega_h)} + \|\tilde z_h\|_{L^1_h(\Omega_h)} - \|z_h\|_{L^1_h(\Omega_h)}\big) + ch^2$$
$$= \underbrace{\Big\langle\frac{\alpha}{2}(u_h - u) + p_h - p,\ z - z_h\Big\rangle_{L^2(\Omega_h)}}_{I_1} + \underbrace{\Big\langle\frac{\alpha}{2}u_h + p_h,\ \tilde z_h - z\Big\rangle_{L^2(\Omega_h)}}_{I_2} + \underbrace{\beta\big(\|z_h\|_{L^1_h(\Omega_h)} - \|z\|_{L^1_h(\Omega_h)} + \|\tilde z_h\|_{L^1_h(\Omega_h)} - \|z_h\|_{L^1_h(\Omega_h)}\big)}_{I_3} + ch^2, \quad (35)$$
$$\frac{\alpha}{2}\|z - z_h\|^2_{L^2(\Omega_h)} + \frac{\alpha}{2}\|u - u_h\|^2_{L^2(\Omega_h)} \le \underbrace{\langle p_h - p,\ \tilde z_h - z_h\rangle_{L^2(\Omega_h)}}_{I_4} + \underbrace{\Big\langle\frac{\alpha}{2}u + p,\ \tilde z_h - z\Big\rangle_{L^2(\Omega_h)}}_{I_5} + \underbrace{\Big\langle\frac{\alpha}{2}(u_h - u),\ \tilde z_h - z\Big\rangle_{L^2(\Omega_h)}}_{I_6} + ch^2. \quad (37)$$
For the term I4 , let p̃h = S∗h (y − yd ), and we have
Consequently,
$$\frac{\alpha}{2}\|z - z_h\|^2_{L^2(\Omega_h)} + \frac{\alpha}{2}\|u - u_h\|^2_{L^2(\Omega_h)} + \|y - y_h\|^2_{L^2(\Omega_h)} \le I_5 + I_6 + I_7 + I_8 + ch^2. \quad (38)$$
In order to further estimate (38), we will discuss each of these items from I5 to
I8 in turn. Firstly, from the regularity of the optimal control u, i.e., u ∈ H 1 (), and
(27), we know that
$$\|u\|_{H^1(\Omega)} \le \frac{1}{\alpha}\|p\|_{H^1(\Omega)} + \Big(\frac{\beta}{\alpha} + |a| + b\Big)M(\Omega), \quad (39)$$
$$\Big\|\frac{\alpha}{2}u + p\Big\|_{H^1(\Omega)} \le \frac{3}{2}\|p\|_{H^1(\Omega)} + \frac{1}{2}(\beta + \alpha|a| + \alpha b)M(\Omega).$$
Moreover, due to the boundedness of the optimal control $u$, the state $y$, the adjoint
state $p$, and the operator $S$, we can choose a large enough constant $L > 0$
independent of $\alpha$, $h$ and a constant $\alpha_0$, such that for all $0 < \alpha \le \alpha_0$ and $h > 0$, the
following inequality holds:
$$\frac{3}{2}\|p\|_{H^1(\Omega)} + (\beta + \alpha|a| + \alpha b)M(\Omega) + \|y - y_d\|_{L^2(\Omega)} + \|y_r\|_{L^2(\Omega)} + \|S\|_{\mathcal{L}(H^{-1},L^2)} + \sup_{u_h\in U_{ad,h}}\|u_h\| \le L. \quad (40)$$
From (40) and $u = z$, we have $\|z\|_{H^1(\Omega)} \le \alpha^{-1}L$. Thus, for the term $I_5$, utilizing
Lemma 2, we have
$$I_5 \le \Big\|\frac{\alpha}{2}u + p\Big\|_{H^1(\Omega_h)}\|\tilde z_h - z\|_{H^{-1}(\Omega_h)} \le c_2 L\|z\|_{H^1(\Omega_h)}h^2 \le c_2 L^2\alpha^{-1}h^2. \quad (41)$$
For terms I6 and I7 , using Hölder’s inequality, Lemma 1, and Lemma 2, we have
$$I_6 \le \frac{\alpha}{4}\|u_h - u\|^2_{L^2(\Omega_h)} + \frac{\alpha}{4}\|\tilde z_h - z\|^2_{L^2(\Omega_h)} \le \frac{\alpha}{4}\|u_h - u\|^2_{L^2(\Omega_h)} + \frac{c_2^2 L^2\alpha^{-1}}{4}h^2, \quad (42)$$
and
$$I_7 \le \frac{1}{2}\|y - y_h\|^2_{L^2(\Omega_h)} + 2\|S_h - S\|^2_{\mathcal{L}(L^2,L^2)}\big(\|\tilde z_h\|^2_{L^2(\Omega_h)} + \|y_r\|^2_{L^2(\Omega_h)}\big) \le \frac{1}{2}\|y - y_h\|^2_{L^2(\Omega_h)} + 2c_1^2 L^2 h^4 + c_2^2 L^3\alpha^{-2}h^4.$$
$$I_8 \le \|y - y_d\|_{L^2(\Omega_h)}\|S_h - S\|_{\mathcal{L}(L^2,L^2)}\big(\|\tilde z_h - z\|_{L^2(\Omega_h)} + \|z - z_h\|_{L^2(\Omega_h)}\big)$$
$$\frac{\alpha}{2}\|u - u_h\|^2_{L^2(\Omega_h)} + \frac{1}{2}\|y - y_h\|^2_{L^2(\Omega_h)} \le C\big(h^2 + \alpha^{-1}h^2 + \alpha^{-1}h^3 + \alpha^{-1}h^4 + \alpha^{-2}h^4\big),$$
where $C > 0$ is a properly chosen constant. Using again the assumption $|\Omega\setminus\Omega_h| \le \hat c h^2$, we get
$$\frac{\alpha}{2}\|u - u_h\|^2_{L^2(\Omega)} + \frac{1}{2}\|y - y_h\|^2_{L^2(\Omega)} \le C\big(h^2 + \alpha h^2 + \alpha^{-1}h^2 + h^3 + \alpha^{-1}h^4 + \alpha^{-2}h^4\big).$$
Corollary 1. Let $(y, u, z)$ be the optimal solution of problem (P), and $(y_h, u_h, z_h)$
be the optimal solution of problem $(\widehat{DP}_h)$. For every $h_0 > 0$, $\alpha_0 > 0$, there is a
constant $C > 0$ such that for all $0 < \alpha \le \alpha_0$, $0 < h \le h_0$ it holds
$$\|u - u_h\|_{L^2(\Omega)} \le C\big(\alpha^{-1}h + \alpha^{-\frac{3}{2}}h^2\big),$$
In this section, we will introduce the ihADMM algorithm with the aim of solving
$(\widehat{DP}_h)$ to moderate accuracy. Firstly, let us define the following stiffness and mass
matrices:
$$K_h = \big(a_h(\varphi_i, \varphi_j)\big)_{i,j=1}^{N_h}, \qquad M_h = \Big(\int_{\Omega_h}\varphi_i\varphi_j\,dx\Big)_{i,j=1}^{N_h}.$$
Due to the quadrature formulas (21) and (22), a lumped mass matrix $W_h = \mathrm{diag}\Big(\int_{\Omega_h}\varphi_i(x)\,dx\Big)_{i=1}^{N_h}$ is introduced. Moreover, by (24) in Proposition 1, we have
the following results about the mass matrix $M_h$ and the lumped mass matrix $W_h$.
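For P1 elements, $W_h$ need not be assembled separately: because the basis functions sum to one (cf. (15)), the row sums of the consistent mass matrix are exactly the lumped weights. The following sketch, assuming $M_h$ is available as a SciPy sparse matrix, illustrates this row-sum lumping.

```python
import numpy as np
import scipy.sparse as sp

def lump_mass(Mh):
    """Row-sum lumping of a consistent P1 mass matrix.

    Since sum_j (M_h)_{ij} = integral of phi_i (the basis functions form a
    partition of unity), the row sums give the diagonal of W_h directly.
    """
    w = np.asarray(Mh.sum(axis=1)).ravel()   # w_i = integral of phi_i
    return sp.diags(w)                       # W_h = diag(w_1, ..., w_{N_h})
```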
Denoting by $y_{d,h} := \sum_{i=1}^{N_h} y_{d_i}\varphi_i(x)$ and $y_{r,h} := \sum_{i=1}^{N_h} y_{r_i}\varphi_i(x)$ the $L^2$-projections of
$y_d$ and $y_r$ onto $Y_h$, respectively, and identifying discretized functions with their
coefficient vectors, we can rewrite the problem $(\widehat{DP}_h)$ in matrix-vector form:
$$\begin{cases}\displaystyle\min_{(y,u,z)\in\mathbb{R}^{3N_h}}\ \frac{1}{2}\|y - y_d\|^2_{M_h} + \frac{\alpha}{4}\|u\|^2_{M_h} + \frac{\alpha}{4}\|z\|^2_{W_h} + \beta\|W_h z\|_1\\[4pt] \text{s.t.}\ \ K_h y = M_h(u + y_r),\\[2pt] \phantom{\text{s.t.}\ \ } u = z,\\[2pt] \phantom{\text{s.t.}\ \ } z \in [a,b]^{N_h}.\end{cases} \quad (DP_h)$$
$$\begin{cases}\displaystyle\min_{(u,z)\in\mathbb{R}^{2N_h}}\ f(u) + g(z)\\[2pt] \text{s.t.}\ \ u = z,\end{cases} \quad (RDP_h)$$
where
$$f(u) = \frac{1}{2}\|K_h^{-1}M_h(u + y_r) - y_d\|^2_{M_h} + \frac{\alpha}{4}\|u\|^2_{M_h}, \qquad g(z) = \frac{\alpha}{4}\|z\|^2_{W_h} + \beta\|W_h z\|_1 + \delta_{[a,b]^{N_h}}(z). \quad (45)$$
$$\begin{cases} u^{k+1} = \arg\min_u\ f(u) + \langle\lambda^k, u - z^k\rangle + \frac{\sigma}{2}\|u - z^k\|^2,\\[2pt] z^{k+1} = \arg\min_z\ g(z) + \langle\lambda^k, u^{k+1} - z\rangle + \frac{\sigma}{2}\|u^{k+1} - z\|^2,\\[2pt] \lambda^{k+1} = \lambda^k + \tau\sigma(u^{k+1} - z^{k+1}).\end{cases} \quad \text{(ADMM1)}$$
$$\begin{cases} u^{k+1} = \arg\min_u\ f(u) + \langle\lambda^k, M_h(u - z^k)\rangle + \frac{\sigma}{2}\|u - z^k\|^2_{W_h},\\[2pt] z^{k+1} = \arg\min_z\ g(z) + \langle\lambda^k, M_h(u^{k+1} - z)\rangle + \frac{\sigma}{2}\|u^{k+1} - z\|^2_{W_h},\\[2pt] \lambda^{k+1} = \lambda^k + \tau\sigma(u^{k+1} - z^{k+1}).\end{cases} \quad \text{(ADMM2)}$$
$$\begin{cases} u^{k+1} = \arg\min_u\ f(u) + \langle\lambda^k, M_h(u - z^k)\rangle + \frac{\sigma}{2}\|u - z^k\|^2_{M_h},\\[2pt] z^{k+1} = \arg\min_z\ g(z) + \langle\lambda^k, M_h(u^{k+1} - z)\rangle + \frac{\sigma}{2}\|u^{k+1} - z\|^2_{M_h},\\[2pt] \lambda^{k+1} = \lambda^k + \tau\sigma(u^{k+1} - z^{k+1}).\end{cases} \quad \text{(ADMM3)}$$
$$\begin{cases} u^{k+1} = \arg\min_u\ f(u) + \langle\lambda^k, M_h(u - z^k)\rangle + \frac{\sigma}{2}\|u - z^k\|^2_{M_h},\\[2pt] z^{k+1} = \arg\min_z\ g(z) + \langle\lambda^k, M_h(u^{k+1} - z)\rangle + \frac{\sigma}{2}\|u^{k+1} - z\|^2_{W_h},\\[2pt] \lambda^{k+1} = \lambda^k + \tau\sigma(u^{k+1} - z^{k+1}).\end{cases} \quad \text{(ADMM4)}$$
As one may know, (ADMM1) is actually the classical ADMM for (RDPh ). The
remaining three ADMM-type algorithms are proposed based on the structure of
(RDPh ). Now, let us start to analyze and compare the advantages and disadvantages
of the four algorithms. Firstly, we focus on the z-subproblem in each algorithm.
Since both identity matrix I and lumped mass matrix Wh are diagonal, it is clear that
all the z-subproblems in (ADMM1), (ADMM2), and (ADMM4) have a closed form
solution, except for the z-subproblem in (ADMM3). Specifically, for the z-subproblem
in (ADMM1), the closed-form solution is given by
$$z^{k+1} = P_{U_{ad}}\Big(\big(\tfrac{\alpha}{2}W_h + \sigma I\big)^{-1}W_h\,\mathrm{soft}\big(W_h^{-1}(\sigma u^{k+1} + \lambda^k),\ \beta\big)\Big). \quad (49)$$
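Since $W_h$ is diagonal, (49) is a purely componentwise operation. The sketch below spells it out in NumPy; the function and argument names are illustrative, not taken from any package.

```python
import numpy as np

def z_update_admm1(u_new, lam, w, alpha, beta, sigma, a, b):
    """Closed-form z-subproblem of (ADMM1), in the spirit of (49):
    componentwise soft-thresholding followed by projection onto [a, b].

    u_new, lam : iterates u^{k+1} and lambda^k (coefficient vectors)
    w          : diagonal of the lumped mass matrix W_h
    """
    v = (sigma * u_new + lam) / w                       # W_h^{-1}(sigma u + lambda)
    v = np.sign(v) * np.maximum(np.abs(v) - beta, 0.0)  # soft(., beta)
    z = (w * v) / (0.5 * alpha * w + sigma)             # (alpha/2 W_h + sigma I)^{-1} W_h v
    return np.clip(z, a, b)                             # projection onto the box
```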
Similarly, the u-subproblem in (ADMM2) can be converted into the following linear
system:
$$\begin{bmatrix} M_h & 0 & K_h\\ 0 & \frac{\alpha}{2}M_h + \sigma W_h & -M_h\\ K_h & -M_h & 0\end{bmatrix}\begin{bmatrix} y^{k+1}\\ u^{k+1}\\ p^{k+1}\end{bmatrix} = \begin{bmatrix} M_h y_d\\ \sigma W_h z^k - M_h\lambda^k\\ M_h y_r\end{bmatrix}. \quad (52)$$
Meanwhile, the reduced forms of (51) and (52) both involve the inversion of $M_h$.
For the abovementioned reasons, we prefer to use (ADMM4), which is called
the heterogeneous ADMM (hADMM). However, in general, it is expensive and
unnecessary to exactly compute the solution of saddle point system (54) even if it is
doable, especially at the early stage of the whole process. Based on the structure
of (54), it is a natural idea to use the iterative methods such as some Krylov-
based methods. Hence, taking the inexactness of the solution of u-subproblem into
account, a more practical inexact heterogeneous ADMM (ihADMM) algorithm is
proposed.
Due to the inexactness of the proposed algorithm, we first introduce an error
tolerance. Throughout this chapter, let $\{\epsilon_k\}$ be a summable sequence of nonnegative
numbers, and define
$$C_1 := \sum_{k=0}^{\infty}\epsilon_k < \infty, \qquad C_2 := \sum_{k=0}^{\infty}\epsilon_k^2 < \infty. \quad (55)$$
For the ihADMM (Algorithm 1), in this section, we establish the global convergence
and the iteration complexity results in non-ergodic sense for the sequence generated
by Algorithm 1.
Set $k = 1$.
Output: $u^k, z^k, \lambda^k$.
Step 1. Find an approximate minimizer (inexactly)
$$u^{k+1} = \arg\min_u\ f(u) + \langle M_h\lambda^k, u - z^k\rangle + \frac{\sigma}{2}\|u - z^k\|^2_{M_h} - \langle\delta^k, u\rangle,$$
Before giving the proof of Theorem 5, we first provide a lemma, which is useful
for analyzing the non-ergodic iteration complexity of ihADMM and introduced in
Chen and Toh (2017).
For the convenience of the iteration complexity analysis below, we define the
function $R_h : (u, z, \lambda) \to [0, \infty)$ by
$$R_h(u, z, \lambda) := \|M_h\lambda + \nabla f(u)\|^2 + \mathrm{dist}^2\big(0,\ -M_h\lambda + \partial g(z)\big) + \|u - z\|^2.$$
By the definitions of f (u) and g(z) in (45), it is obvious that f (u) and g(z) are
both closed, proper, and convex functions. Since Mh and Kh are symmetric positive
which are the exact solutions at the (k+1)-th iteration in Algorithm 1. The following
results show the gap between (uk+1 , zk+1 ) and (ūk+1 , z̄k+1 ) in terms of the given
error tolerance δ k 2 ≤ k .
Proof. By the optimality conditions at point (uk+1 , zk+1 ) and (ūk+1 , z̄k+1 ), we
have
thus,
which implies (61). From (50) and (60), and the fact that the projection operator
[a,b] (·) and soft thresholding operator soft(·, ·) are nonexpansive, we get
$$\|z^{k+1} - \bar z^{k+1}\| \le \frac{\sigma}{\sigma + 0.5\alpha}\|u^{k+1} - \bar u^{k+1}\|,$$
$$r^k = u^k - z^k, \qquad \bar r^k = \bar u^k - \bar z^k,$$
$$\tilde\lambda^{k+1} = \lambda^k + \sigma r^{k+1}, \qquad \bar\lambda^{k+1} = \lambda^k + \tau\sigma\bar r^{k+1}, \qquad \hat\lambda^{k+1} = \lambda^k + \sigma\bar r^{k+1},$$
and give two inequalities which are essential for establishing both the global
convergence and the iteration complexity of our ihADMM. For the details of the
proof, one can see the Appendix.
$$\langle\delta^k, u^{k+1} - u^*\rangle + \frac{1}{2\tau\sigma}\|\lambda^k - \lambda^*\|^2_{M_h} + \frac{\sigma}{2}\|z^k - z^*\|^2_{M_h} - \frac{1}{2\tau\sigma}\|\lambda^{k+1} - \lambda^*\|^2_{M_h} - \frac{\sigma}{2}\|z^{k+1} - z^*\|^2_{M_h}$$
$$\ge \|u^{k+1} - u^*\|^2_T + \frac{\sigma}{2}\|z^{k+1} - z^*\|^2_{2W_h - M_h} + \frac{\sigma}{2}\|r^{k+1}\|^2_{W_h - \tau M_h} + \frac{\sigma}{2}\|u^{k+1} - z^k\|^2_{M_h}, \quad (63)$$
where $T := \Sigma_f - \frac{\sigma}{2}(W_h - M_h)$.
$$\frac{1}{2\tau\sigma}\|\lambda^k - \lambda^*\|^2_{M_h} + \frac{\sigma}{2}\|z^k - z^*\|^2_{M_h} - \frac{1}{2\tau\sigma}\|\bar\lambda^{k+1} - \lambda^*\|^2_{M_h} - \frac{\sigma}{2}\|\bar z^{k+1} - z^*\|^2_{M_h}$$
$$\ge \|\bar u^{k+1} - u^*\|^2_T + \frac{\sigma}{2}\|\bar z^{k+1} - z^*\|^2_{2W_h - M_h} + \frac{\sigma}{2}\|\bar r^{k+1}\|^2_{W_h - \tau M_h} + \frac{\sigma}{2}\|\bar u^{k+1} - z^k\|^2_{M_h}, \quad (64)$$
where $T := \Sigma_f - \frac{\sigma}{2}(W_h - M_h)$.
Moreover, there exists a constant C only depending on the initial point (u0 , z0 , λ0 )
and the optimal solution (u∗ , z∗ , λ∗ ) such that for k ≥ 1,
$$\min_{1\le i\le k}\{R_h(u^i, z^i, \lambda^i)\} \le \frac{C}{k}, \qquad \lim_{k\to\infty}\Big(k\times\min_{1\le i\le k}\{R_h(u^i, z^i, \lambda^i)\}\Big) = 0. \quad (67)$$
Proof. It is easy to see that (u∗ , z∗ ) is the unique optimal solution of discrete
problem (RDPh ) if and only if there exists a Lagrangian multiplier λ∗ such that
the following Karush-Kuhn-Tucker (KKT) conditions hold:
In the inexact heterogeneous ADMM iteration scheme, the optimality conditions for
(uk+1 , zk+1 ) are
Next, let us first prove the global convergence of the iteration sequences, i.e.,
establish the proofs of (65) and (66).
The first step is to show that {(uk , zk , λk )} is bounded. We define the following
sequences $\theta^k$ and $\bar\theta^k$ with
$$\theta^k = \begin{pmatrix}\dfrac{1}{\sqrt{2\tau\sigma}}M_h^{1/2}(\lambda^k - \lambda^*)\\[6pt] \sqrt{\dfrac{\sigma}{2}}\,M_h^{1/2}(z^k - z^*)\end{pmatrix}, \qquad \bar\theta^k = \begin{pmatrix}\dfrac{1}{\sqrt{2\tau\sigma}}M_h^{1/2}(\bar\lambda^k - \lambda^*)\\[6pt] \sqrt{\dfrac{\sigma}{2}}\,M_h^{1/2}(\bar z^k - z^*)\end{pmatrix}. \quad (70)$$
By (64), we have $\|\bar\theta^{k+1}\|^2 \le \|\theta^k\|^2$. As a result, we have
$$\|\theta^{k+1}\| \le \|\bar\theta^{k+1}\| + \|\bar\theta^{k+1} - \theta^{k+1}\| \le \|\theta^k\| + \|\bar\theta^{k+1} - \theta^{k+1}\|. \quad (71)$$
$$\|\bar\theta^{k+1} - \theta^{k+1}\|^2 = \frac{1}{2\tau\sigma}\|\bar\lambda^{k+1} - \lambda^{k+1}\|^2_{M_h} + \frac{\sigma}{2}\|\bar z^{k+1} - z^{k+1}\|^2_{M_h} \le (2\tau + 1/2)\,\sigma\|M_h\|\rho^2\epsilon_k^2 \le \frac{5}{2}\sigma\|M_h\|\rho^2\epsilon_k^2, \quad (72)$$
which implies $\|\bar\theta^{k+1} - \theta^{k+1}\| \le \sqrt{\frac{5}{2}\sigma\|M_h\|}\,\rho\epsilon_k$. Hence, for any $k \ge 0$, we have
$$\|\theta^{k+1}\| \le \|\theta^k\| + \sqrt{\tfrac{5}{2}\sigma\|M_h\|}\,\rho\epsilon_k \le \|\theta^0\| + \sqrt{\tfrac{5}{2}\sigma\|M_h\|}\,\rho\sum_{k=0}^{\infty}\epsilon_k = \|\theta^0\| + \sqrt{\tfrac{5}{2}\sigma\|M_h\|}\,\rho C_1 \equiv \bar\rho. \quad (73)$$
From $\|\bar\theta^{k+1}\| \le \|\theta^k\|$, for any $k \ge 0$, we also have $\|\bar\theta^{k+1}\| \le \bar\rho$. Therefore,
the sequences $\{\theta^k\}$ and $\{\bar\theta^k\}$ are bounded. From the definition of $\{\theta^k\}$ and the
fact that $M_h \succ 0$, we can see that the sequences $\{\lambda^k\}$ and $\{z^k\}$ are bounded.
Moreover, from the updating rule for $\lambda^k$, we know $\{u^k\}$ is also bounded. Thus,
due to the boundedness of the sequence $\{(u^k, z^k, \lambda^k)\}$, the sequence has
a subsequence $\{(u^{k_i}, z^{k_i}, \lambda^{k_i})\}$ which converges to an accumulation point $(\bar u, \bar z, \bar\lambda)$.
Next we show that $(\bar u, \bar z, \bar\lambda)$ is a KKT point and equal to $(u^*, z^*, \lambda^*)$.
Again employing Proposition 4, we can derive
$$\sum_{k=0}^{\infty}\Big(\|\bar u^{k+1} - u^*\|^2_T + \frac{\sigma}{2}\|\bar z^{k+1} - z^*\|^2_{2W_h - M_h} + \frac{\sigma}{2}\|\bar r^{k+1}\|^2_{W_h - \tau M_h} + \frac{\sigma}{2}\|\bar u^{k+1} - z^k\|^2_{M_h}\Big)$$
$$\le \sum_{k=0}^{\infty}\big(\|\theta^k\|^2 - \|\theta^{k+1}\|^2 + \|\theta^{k+1}\|^2 - \|\bar\theta^{k+1}\|^2\big) \le \|\theta^0\|^2 + 2\bar\rho\sqrt{\tfrac{5}{2}\sigma\|M_h\|}\,\rho C_1 < \infty. \quad (74)$$
Note that $T \succ 0$, $W_h - M_h \succeq 0$, $W_h - \tau M_h \succ 0$, and $M_h \succ 0$; then we have
From the fact that $\lim_{k\to\infty}\epsilon_k = 0$ and (75), by taking the limit of both sides of (76),
we have
$$R_h(w^{k+1}) = \|M_h\lambda^{k+1} + \nabla f(u^{k+1})\|^2 + \mathrm{dist}^2\big(0, -M_h\lambda^{k+1} + \partial g(z^{k+1})\big) + \|u^{k+1} - z^{k+1}\|^2 \le 2\|\delta^k\|^2 + \eta\|r^{k+1}\|^2 + 4\sigma^2\|M_h\|\,\|u^{k+1} - z^k\|^2_{M_h}, \quad (79)$$
where $\eta := 2(\tau - 1)^2\sigma^2\|M_h\|^2 + 2\sigma^2\|M_h\|^2 + \sigma^2\|W_h - \tau M_h\|^2 + 1$.
In order to get an upper bound for $R_h(w^{k+1})$, we will use (63) in Proposition 3.
First, by the definition of $\theta^k$ and (73), for any $k \ge 0$ we can easily obtain
$$\|\lambda^k - \lambda^*\| \le \sqrt{2\tau\sigma\|M_h^{-1}\|}\,\bar\rho, \qquad \|z^k - z^*\| \le \sqrt{\frac{2\|M_h^{-1}\|}{\sigma}}\,\bar\rho.$$
$$\sum_{k=0}^{\infty}\Big(\frac{\sigma}{2}\|r^{k+1}\|^2_{W_h - \tau M_h} + \frac{\sigma}{2}\|u^{k+1} - z^k\|^2_{M_h}\Big) \le \sum_{k=0}^{\infty}\big(\|\theta^k\|^2 - \|\theta^{k+1}\|^2\big) + \sum_{k=0}^{\infty}\langle\delta^k, u^{k+1} - u^*\rangle$$
$$\le \|\theta^0\|^2 + \bar\eta\sum_{k=0}^{\infty}\|\delta^k\| \le \|\theta^0\|^2 + \bar\eta\sum_{k=0}^{\infty}\epsilon_k = \|\theta^0\|^2 + \bar\eta C_1. \quad (81)$$
Hence,
$$\sum_{k=0}^{\infty}\|r^{k+1}\|^2 \le \frac{2(\|\theta^0\|^2 + \bar\eta C_1)}{\sigma\|(W_h - \tau M_h)^{-1}\|^{-1}}, \qquad \sum_{k=0}^{\infty}\|u^{k+1} - z^k\|^2_{M_h} \le \frac{2(\|\theta^0\|^2 + \bar\eta C_1)}{\sigma}. \quad (82)$$
By substituting (82) into (79), we have
$$\sum_{k=0}^{\infty}R_h(w^{k+1}) \le 2\sum_{k=0}^{\infty}\|\delta^k\|^2 + \eta\sum_{k=0}^{\infty}\|r^{k+1}\|^2 + 4\sigma^2\|M_h\|\sum_{k=0}^{\infty}\|u^{k+1} - z^k\|^2_{M_h} < \infty.$$
$$\min_{\mu,\lambda,p\in\mathbb{R}^{N_h}}\ \Phi_h(\mu,\lambda,p) := \frac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \frac{1}{2\alpha}\|\lambda + \mu - p\|^2_{M_h} + \langle M_h y_r, p\rangle + \delta_{[-\beta,\beta]^{N_h}}(\lambda) + \delta^*_{[a,b]^{N_h}}(M_h\mu) - \frac{1}{2}\|y_d\|^2_{M_h}. \quad (D_h)$$
$$\min_{x}\ \varphi(x_1) + \frac{1}{2}\langle x, \mathcal{H}x\rangle - \langle r, x\rangle, \quad (84)$$
where $x \equiv (x_1, \dots, x_s) \in \mathcal{X}$ with $x_i \in \mathcal{X}_i$, $i = 1, \dots, s$, $\varphi : \mathcal{X}_1 \to (-\infty, +\infty]$
is a closed proper convex function, $\mathcal{H} : \mathcal{X} \to \mathcal{X}$ is a given self-adjoint positive
semidefinite linear operator, and $r \equiv (r_1, \dots, r_s) \in \mathcal{X}$ is a given vector.
For notational convenience, we denote the quadratic function in (84) as
$$h(x) := \frac{1}{2}\langle x, \mathcal{H}x\rangle - \langle r, x\rangle, \quad (85)$$
$$\mathcal{H} = \mathcal{D} + \mathcal{U} + \mathcal{U}^*, \quad (87)$$
where
$$\mathcal{U} := \begin{pmatrix} 0 & \mathcal{H}_{12} & \cdots & \mathcal{H}_{1s}\\ & \ddots & \ddots & \vdots\\ & & \ddots & \mathcal{H}_{(s-1)s}\\ & & & 0\end{pmatrix}, \quad (88)$$
with the convention $x_{\le 0} = x_{\ge s+1} = \emptyset$. Moreover, in order to solve problem (84)
inexactly, we introduce the following two error tolerance vectors $\hat\delta$ and $\delta$:
where $\Delta(\hat\delta, \delta)$ can be regarded as the error term. Then, the following sGS
decomposition theorem, established by Li et al. (2015), shows that
computing $x^+$ in (91) is equivalent to an inexact block symmetric
Gauss-Seidel-type sequential update of the variables $x_1, \dots, x_s$.
Theorem 6 (Li et al. 2015, Theorem 2.1). Assume that the self-adjoint linear
operators $\mathcal{H}_{ii}$ are positive definite for all $i = 1, \dots, s$. Then, it holds that
$$\widehat{\mathcal{H}} := \mathcal{H} + \mathcal{T} = (\mathcal{D} + \mathcal{U})\mathcal{D}^{-1}(\mathcal{D} + \mathcal{U}^*) \succ 0. \quad (92)$$
then the optimal solution x + defined by (91) can be obtained exactly via
$$\begin{cases} x_1^+ = \displaystyle\arg\min_{x_1\in\mathcal{X}_1}\ \varphi(x_1) + h(x_1, x_{\ge 2}) - \langle\delta_1, x_1\rangle,\\[6pt] x_i^+ = \displaystyle\arg\min_{x_i\in\mathcal{X}_i}\ \varphi(x_1^+) + h(x^+_{\le i-1}, x_i, x_{\ge i+1}) - \langle\delta_i, x_i\rangle\\[4pt] \phantom{x_i^+} = \mathcal{H}_{ii}^{-1}\Big(r_i + \delta_i - \displaystyle\sum_{j=1}^{i-1}\mathcal{H}^*_{ji}x_j^+ - \sum_{j=i+1}^{s}\mathcal{H}_{ij}x_j\Big), \qquad i = 2, \dots, s.\end{cases} \quad (94)$$
Remark 2. (a) In (93) and (94), $x_i$ and $x_i^+$ should be regarded as inexact solutions
to the corresponding minimization problems without the linear error terms $\langle\hat\delta_i, x_i\rangle$
and $\langle\delta_i, x_i\rangle$. Once these approximate solutions have been computed, they
generate the error vectors $\hat\delta_i$ and $\delta_i$ as follows:
$$\hat\delta_i = \mathcal{H}_{ii}x_i - \Big(r_i - \sum_{j=1}^{i-1}\mathcal{H}^*_{ji}\bar x_j - \sum_{j=i+1}^{s}\mathcal{H}_{ij}x_j\Big), \quad i = s, \dots, 2,$$
$$\delta_1 \in \partial\varphi(x_1^+) + \mathcal{H}_{11}x_1^+ - \Big(r_1 - \sum_{j=2}^{s}\mathcal{H}_{1j}x_j\Big),$$
$$\delta_i = \mathcal{H}_{ii}x_i^+ - \Big(r_i - \sum_{j=1}^{i-1}\mathcal{H}^*_{ji}x_j^+ - \sum_{j=i+1}^{s}\mathcal{H}_{ij}x_j\Big), \quad i = 2, \dots, s.$$
With the above known error vectors, $x_i$ and $x_i^+$ are the exact solutions
to the minimization problems in (93) and (94), respectively.
(b) In actual implementations, assuming that for $i = s, \dots, 2$ we have computed
$x_i$ in the backward GS sweep for solving (93), then when solving the subproblems
in the forward GS sweep (94) for $i = 2, \dots, s$, we may try to estimate $x_i^+$ by
using $x_i$, and in this case the corresponding error vector $\delta_i$ is given by
$$\delta_i = \hat\delta_i + \sum_{j=1}^{i-1}\mathcal{H}^*_{ji}(x_j^+ - \bar x_j).$$
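For orientation, the following sketch spells out one exact sGS sweep for the two-block case (all error vectors set to zero), which is the situation encountered later for the $(\lambda, p)$-block of $(D_h)$. The routine `prox_phi(v, A)` is an assumed helper returning $\arg\min_x \varphi(x) + \tfrac12\langle x, Ax\rangle - \langle v, x\rangle$; for $\varphi = 0$ it reduces to a linear solve.

```python
import numpy as np

def sgs_sweep(H11, H12, H22, r1, r2, x1, x2, prox_phi):
    """One exact symmetric Gauss-Seidel sweep (s = 2) for
    min phi(x1) + 1/2 <x, Hx> - <r, x>, cf. Theorem 6 with zero errors."""
    # backward sweep (i = 2): update block 2 with the current x1
    x2_bar = np.linalg.solve(H22, r2 - H12.T @ x1)
    # forward sweep (i = 1): block carrying the nonsmooth term phi
    x1_plus = prox_phi(r1 - H12 @ x2_bar, H11)
    # forward sweep (i = 2): update block 2 again with the new x1
    x2_plus = np.linalg.solve(H22, r2 - H12.T @ x1_plus)
    return x1_plus, x2_plus
```

This backward/forward ordering is exactly the $\hat p \to \tilde\lambda \to \tilde p$ pattern used in (103) below.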
In order to estimate the error term $\Delta(\hat\delta, \delta)$ in (90), we have the following
proposition.
Proposition 5 (Li et al. 2015, Proposition 2.1). Suppose that $\widehat{\mathcal{H}} = \mathcal{H} + \mathcal{T}$ is
positive definite. Let $\xi = \widehat{\mathcal{H}}^{-1/2}\Delta(\hat\delta, \delta)$. It holds that
$$\|\xi\| \le \|\mathcal{D}^{-1/2}(\delta - \hat\delta)\| + \|\widehat{\mathcal{H}}^{-1/2}\hat\delta\|. \quad (95)$$
$$\phi(v, w) = \frac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \frac{1}{2\alpha}\|\lambda + \mu - p\|^2_{M_h} + \langle M_h y_r, p\rangle - \frac{1}{2}\|y_d\|^2_{M_h}, \quad (98)$$
where f : V → (−∞, +∞] and g : W → (−∞, +∞] are two convex functions
(possibly nonsmooth), φ : V × W → (−∞, +∞] is a smooth convex function,
and V, W are real finite dimensional Hilbert spaces.
$$\phi(z) = \phi(z') + \langle\nabla\phi(z'), z - z'\rangle + \frac{1}{2}\|z - z'\|^2_{G},$$
where $\partial^2\phi(z')$ denotes Clarke's generalized Hessian at the given $z'$ and $[z', z]$
denotes the line segment connecting $z'$ and $z$. Under Assumption 3, it is obvious
that there exist two self-adjoint positive semidefinite linear operators $\mathcal{Q}$ and $\widehat{\mathcal{Q}} :
\mathcal{V}\times\mathcal{W} \to \mathcal{V}\times\mathcal{W}$ such that for any $z \in \mathcal{V}\times\mathcal{W}$,
$$\mathcal{Q} \preceq G \preceq \widehat{\mathcal{Q}}, \quad \forall\, G \in \partial^2\phi(z).$$
$$\phi(z) \ge \phi(z') + \langle\nabla\phi(z'), z - z'\rangle + \frac{1}{2}\|z - z'\|^2_{\mathcal{Q}},$$
and
$$\phi(z) \le \hat\phi(z; z') := \phi(z') + \langle\nabla\phi(z'), z - z'\rangle + \frac{1}{2}\|z - z'\|^2_{\widehat{\mathcal{Q}}}.$$
Assumption 4 (Cui 2016, Assumption 3.1). There exist two self-adjoint positive
semidefinite linear operators $\mathcal{D}_1 : \mathcal{V} \to \mathcal{V}$ and $\mathcal{D}_2 : \mathcal{W} \to \mathcal{W}$ such that
$\widehat{\mathcal{Q}} := \mathcal{Q} + \mathrm{Diag}(\mathcal{D}_1, \mathcal{D}_2)$.
(BIQ) programming, the SDP relaxation for computing lower bounds for quadratic
assignment problems (QAPs), and so on, and one can refer to Sun et al. (2016).
Fortunately, it should be noted that the function φ defined in (98) for our problem
(Dh ) is quadratic and thus we can choose Q = ∇ 2 φ.
We can now present the inexact majorized ABCD algorithm for the general
problem (99) as follows.
$$\max\{\|\delta_v^k\|, \|\delta_w^k\|\} \le \epsilon_k.$$
Compute
$$\begin{cases}\tilde v^k = \displaystyle\arg\min_{v\in\mathcal{V}}\ \{f(v) + \hat\phi(v, w^k; v^k, w^k) - \langle\delta_v^k, v\rangle\},\\[6pt] \tilde w^k = \displaystyle\arg\min_{w\in\mathcal{W}}\ \{g(w) + \hat\phi(\tilde v^k, w; v^k, w^k) - \langle\delta_w^k, w\rangle\}.\end{cases}$$
Step 2. Set $t_{k+1} = \dfrac{1 + \sqrt{1 + 4t_k^2}}{2}$ and $\beta_k = \dfrac{t_k - 1}{t_{k+1}}$, and compute
Here we state the convergence result without proof. For the detailed proof, one
can see Cui (2016, Chapter 3). This theorem builds a solid foundation for our
subsequently proposed algorithm.
Theorem 7 (Cui 2016, Theorem 3.2). Suppose that Assumption 4 holds and the
solution set of problem (99) is non-empty. Let $z^* = (v^*, w^*)$ belong to the solution set. Assume
that $\sum_{k=1}^{\infty}k\epsilon_k < \infty$. Then the sequence $\{\tilde z^k\} := \{(\tilde v^k, \tilde w^k)\}$ generated by Algorithm
2 satisfies
$$\theta(\tilde z^k) - \theta(z^*) \le \frac{2\|\tilde z^0 - z^*\|^2_{\mathcal{S}} + c_0}{(k+1)^2}, \quad \forall k \ge 1,$$
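The $O(1/k^2)$ rate comes from the Nesterov-type extrapolation schedule in Step 2. A minimal sketch of that schedule (nothing else from the algorithm) is:

```python
def beta_schedule(n_iters):
    """Extrapolation factors of Step 2: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,
    beta_k = (t_k - 1) / t_{k+1}; this is the acceleration behind Theorem 7."""
    t, betas = 1.0, []
    for _ in range(n_iters):
        t_next = 0.5 * (1.0 + (1.0 + 4.0 * t * t) ** 0.5)
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas
```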
Now, we can apply Algorithm 2 to our problem (Dh ), where μ is taken as one block,
and (λ, p) are taken as the other one. Let us denote z = (μ, λ, p). Since φ defined
in (98) for (Dh ) is quadratic, we can take
$$\mathcal{Q} := \frac{1}{\alpha}\begin{pmatrix} M_h & M_h & -M_h\\ M_h & M_h & -M_h\\ -M_h & -M_h & M_h + \alpha K_h M_h^{-1}K_h\end{pmatrix}, \quad (100)$$
where
$$\mathcal{Q}_{11} := \frac{1}{\alpha}M_h, \qquad \mathcal{Q}_{22} := \frac{1}{\alpha}\begin{pmatrix} M_h & -M_h\\ -M_h & M_h + \alpha K_h M_h^{-1}K_h\end{pmatrix}.$$
$$\tilde\mu^k = \arg\min_{\mu}\ \delta^*_{[a,b]^{N_h}}(M_h\mu) + \phi(\mu, \lambda^k, p^k) + \frac{1}{2}\|\mu - \mu^k\|^2_{\mathcal{D}_1} - \langle\delta_\mu^k, \mu\rangle,$$
$$(\tilde\lambda^k, \tilde p^k) = \arg\min_{\lambda, p}\ \delta_{[-\beta,\beta]^{N_h}}(\lambda) + \phi(\tilde\mu^k, \lambda, p) + \frac{1}{2}\left\|\begin{pmatrix}\lambda\\ p\end{pmatrix} - \begin{pmatrix}\lambda^k\\ p^k\end{pmatrix}\right\|^2_{\mathcal{D}_2} - \langle\delta_\lambda^k, \lambda\rangle - \langle\delta_p^k, p\rangle.$$
Step 2. Set $t_{k+1} = \dfrac{1 + \sqrt{1 + 4t_k^2}}{2}$ and $\beta_k = \dfrac{t_k - 1}{t_{k+1}}$, and compute
$$\mu^{k+1} = \tilde\mu^k + \beta_k(\tilde\mu^k - \tilde\mu^{k-1}), \quad p^{k+1} = \tilde p^k + \beta_k(\tilde p^k - \tilde p^{k-1}), \quad \lambda^{k+1} = \tilde\lambda^k + \beta_k(\tilde\lambda^k - \tilde\lambda^{k-1}).$$
Next, another key issue that should be considered is how to choose the operators
$\mathcal{D}_1$ and $\mathcal{D}_2$. Choosing appropriate and effective operators $\mathcal{D}_1$ and
$\mathcal{D}_2$ is important from the perspective of both theoretical analysis and numerical
implementation. For numerical efficiency, the general principle is that both
$\mathcal{D}_1$ and $\mathcal{D}_2$ should be chosen as small as possible, so that $\tilde\mu^k$ and $(\tilde\lambda^k, \tilde p^k)$ can
take larger step-lengths while the corresponding subproblems can still be solved
relatively easily.
First, for the proximal term $\frac{1}{2}\|\mu - \mu^k\|^2_{\mathcal{D}_1}$, in order to make the subproblem for
the block $\mu$ have an analytical solution, and by Proposition 1, we choose
$$\mathcal{D}_1 := \frac{c_n}{\alpha}M_h W_h^{-1}M_h - \frac{1}{\alpha}M_h, \qquad \text{where } c_n = \begin{cases}4 & \text{if } n = 2,\\ 5 & \text{if } n = 3.\end{cases}$$
Next, we focus on how to choose the operator $\mathcal{D}_2$. If we ignore the proximal
term $\frac{1}{2}\left\|\begin{pmatrix}\lambda\\ p\end{pmatrix} - \begin{pmatrix}\lambda^k\\ p^k\end{pmatrix}\right\|^2_{\mathcal{D}_2}$ and the error terms, it is obvious that the subproblem
of the block $(\lambda, p)$ belongs to the form (84), which can be rewritten as
$$\min\ \delta_{[-\beta,\beta]^{N_h}}(\lambda) + \frac{1}{2}\left\langle\begin{pmatrix}\lambda\\ p\end{pmatrix}, \mathcal{H}\begin{pmatrix}\lambda\\ p\end{pmatrix}\right\rangle - \left\langle r, \begin{pmatrix}\lambda\\ p\end{pmatrix}\right\rangle, \quad (102)$$
where $\mathcal{H} = \mathcal{Q}_{22} = \dfrac{1}{\alpha}\begin{pmatrix} M_h & -M_h\\ -M_h & M_h + \alpha K_h M_h^{-1}K_h\end{pmatrix}$ and
$$r = \begin{pmatrix}\frac{1}{\alpha}M_h\tilde\mu^k\\ M_h y_r - K_h y_d - \frac{1}{\alpha}M_h\tilde\mu^k\end{pmatrix}.$$
Since the objective function of (102) is the
sum of a two-block quadratic function and a nonsmooth function involving only the
first block, the inexact sGS technique introduced above can be used to solve (102).
To achieve our goal, we choose
$$\mathcal{D}_2 = \mathrm{sGS}(\mathcal{Q}_{22}) = \frac{1}{\alpha}\begin{pmatrix} M_h(M_h + \alpha K_h M_h^{-1}K_h)^{-1}M_h & 0\\ 0 & 0\end{pmatrix}.$$
Then according to Theorem 6, we can solve the (λ, p)-subproblem by the following
procedure:
$$\begin{cases} \hat p^k = \arg\min\ \dfrac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \dfrac{1}{2\alpha}\|p - \lambda^k - \tilde\mu^k + \alpha y_r\|^2_{M_h} - \langle\hat\delta_p^k, p\rangle,\\[8pt] \tilde\lambda^k = \arg\min\ \dfrac{1}{2\alpha}\|\lambda - (\hat p^k - \tilde\mu^k)\|^2_{M_h} + \delta_{[-\beta,\beta]^{N_h}}(\lambda),\\[8pt] \tilde p^k = \arg\min\ \dfrac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \dfrac{1}{2\alpha}\|p - \tilde\lambda^k - \tilde\mu^k + \alpha y_r\|^2_{M_h} - \langle\delta_p^k, p\rangle.\end{cases} \quad (103)$$
However, it is easy to see that the $\lambda$-subproblem is coupled in the variable $\lambda$,
since the mass matrix $M_h$ is not diagonal; thus, there is no closed-form solution for
$\lambda$. To overcome this difficulty, we can take advantage of the relationship between the
mass matrix $M_h$ and the lumped mass matrix $W_h$ and add a proximal term $\frac{1}{2\alpha}\|\lambda - \lambda^k\|^2_{W_h - M_h}$ to the $\lambda$-subproblem. Fortunately, we have
$$\mathrm{sGS}(\mathcal{Q}_{22}) = \mathrm{sGS}\left(\mathcal{Q}_{22} + \frac{1}{\alpha}\begin{pmatrix} W_h - M_h & 0\\ 0 & 0\end{pmatrix}\right),$$
which implies that the proximal term $\frac{1}{2\alpha}\|\lambda - \lambda^k\|^2_{W_h - M_h}$ has no influence on the sGS
technique. Thus, we can choose $\mathcal{D}_2$ as follows:
$$\mathcal{D}_2 = \mathrm{sGS}(\mathcal{Q}_{22}) + \frac{1}{\alpha}\begin{pmatrix} W_h - M_h & 0\\ 0 & 0\end{pmatrix}.$$
Theorem 8. Assume that $\sum_{k=1}^{\infty}k\epsilon_k < \infty$. Let $\{\tilde z^k\} := \{(\tilde\mu^k, \tilde\lambda^k, \tilde p^k)\}$ be the sequence
generated by Algorithm 4. Then we have
$$\Phi_h(\tilde z^k) - \Phi_h(z^*) \le \frac{2\|\tilde z^0 - z^*\|^2_{\mathcal{S}} + c_0}{(k+1)^2}, \quad \forall k \ge 1,$$
Compute
$$\begin{aligned} \tilde\mu^k &= \arg\min\ \frac{1}{2\alpha}\|\mu - (p^k - \lambda^k)\|^2_{M_h} + \delta^*_{[a,b]^{N_h}}(M_h\mu) + \frac{1}{2}\|\mu - \mu^k\|^2_{\mathcal{D}_1} - \langle\delta_\mu^k, \mu\rangle,\\ \hat p^k &= \arg\min\ \frac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \frac{1}{2\alpha}\|p - \lambda^k - \tilde\mu^k + \alpha y_r\|^2_{M_h} - \langle\hat\delta_p^k, p\rangle,\\ \tilde\lambda^k &= \arg\min\ \frac{1}{2\alpha}\|\lambda - (\hat p^k - \tilde\mu^k)\|^2_{M_h} + \delta_{[-\beta,\beta]^{N_h}}(\lambda) + \frac{1}{2\alpha}\|\lambda - \lambda^k\|^2_{W_h - M_h},\\ \tilde p^k &= \arg\min\ \frac{1}{2}\|K_h p - M_h y_d\|^2_{M_h^{-1}} + \frac{1}{2\alpha}\|p - \tilde\lambda^k - \tilde\mu^k + \alpha y_r\|^2_{M_h} - \langle\delta_p^k, p\rangle.\end{aligned}$$
Step 2. Set $t_{k+1} = \dfrac{1 + \sqrt{1 + 4t_k^2}}{2}$ and $\beta_k = \dfrac{t_k - 1}{t_{k+1}}$, and compute
$$\mu^{k+1} = \tilde\mu^k + \beta_k(\tilde\mu^k - \tilde\mu^{k-1}), \quad p^{k+1} = \tilde p^k + \beta_k(\tilde p^k - \tilde p^{k-1}), \quad \lambda^{k+1} = \tilde\lambda^k + \beta_k(\tilde\lambda^k - \tilde\lambda^{k-1}).$$
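To summarize the structure of Algorithm 4, the following Python skeleton traces one possible realization. The $\mu$-subproblem is delegated to an assumed user-supplied routine `solve_mu` (its closed form via $\mathcal{D}_1$ is not spelled out here), the $p$-systems $(K_h M_h^{-1}K_h + \frac{1}{\alpha}M_h)p = K_h y_d + \frac{1}{\alpha}M_h q$ are solved with a fixed number of CG iterations in place of an inexact solve with tolerance $\epsilon_k$, and the $\lambda$-update exploits the $W_h - M_h$ proximal trick; names and defaults are illustrative.

```python
import numpy as np
import scipy.sparse.linalg as spla

def sgs_imabcd(Kh, Mh, w, y_r, y_d, alpha, beta, solve_mu, n_iters=50, cg_its=30):
    """Skeleton of the sGS-imABCD iteration (Algorithm 4) for (D_h); w = diag(W_h)."""
    N = Mh.shape[0]
    Mlu = spla.splu(Mh.tocsc())                           # factorize M_h once
    A = spla.LinearOperator((N, N),                       # K M^{-1} K + M / alpha
                            matvec=lambda v: Kh @ Mlu.solve(Kh @ v) + (Mh @ v) / alpha)

    def solve_p(q, p0):
        # p-subproblem: argmin 1/2||K p - M y_d||^2_{M^-1} + 1/(2a)||p - q||^2_M
        rhs = Kh @ y_d + (Mh @ q) / alpha
        p, _ = spla.cg(A, rhs, x0=p0, maxiter=cg_its)     # inexact stand-in
        return p

    mu = lam = p = np.zeros(N)
    mu_old, lam_old, p_old, t = mu, lam, p, 1.0
    for k in range(n_iters):
        mu_t = solve_mu(p, lam, mu)                       # assumed mu-update
        p_hat = solve_p(lam + mu_t - alpha * y_r, p)      # backward sGS step
        # lambda-update: diagonal solve (thanks to W_h - M_h proximal) + box clip
        lam_t = np.clip((Mh @ (p_hat - mu_t) + w * lam - Mh @ lam) / w, -beta, beta)
        p_t = solve_p(lam_t + mu_t - alpha * y_r, p_hat)  # forward sGS step
        # Step 2: Nesterov-type extrapolation
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t)); bk = (t - 1.0) / t_next
        mu, lam, p = (mu_t + bk * (mu_t - mu_old),
                      lam_t + bk * (lam_t - lam_old),
                      p_t + bk * (p_t - p_old))
        mu_old, lam_old, p_old, t = mu_t, lam_t, p_t, t_next
    return mu_t, lam_t, p_t
```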
Numerical Results
In this section, we will first use Example 1 and Example 2 to evaluate the numerical
behavior of the ihADMM and use Example 3 and Example 4 to evaluate the
numerical behavior of the sGS-imABCD.
Algorithmic Details
We begin by describing the algorithmic details which are common to all examples.
Discretization. The discretization was carried out by using piecewise linear and
continuous finite elements. The assembly of mass and the stiffness matrices, as well
as the lump mass matrix, was left to the iFEM software package. To present the
finite element error estimate results, it is convenient to introduce the experimental
order of convergence (EOC), which for some positive error functional E(h) with
$h > 0$ is defined as follows: given two grid sizes $h_1 \ne h_2$, let
$$\mathrm{EOC} := \frac{\ln E(h_1) - \ln E(h_2)}{\ln h_1 - \ln h_2}.$$
It follows from this definition that if $E(h) = O(h^\gamma)$, then $\mathrm{EOC} \approx \gamma$. The
error functional $E(\cdot)$ investigated in the present section is given by $E_2(h) := \|u - u_h\|_{L^2(\Omega)}$.
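For instance, the EOC column of the tables below can be reproduced directly from two consecutive error values; a one-line check (using the first two rows of Table 1 as an example):

```python
import math

def eoc(e1, h1, e2, h2):
    """Experimental order of convergence for two grid sizes h1 != h2."""
    return (math.log(e1) - math.log(e2)) / (math.log(h1) - math.log(h2))

print(round(eoc(0.2925, 2**-3, 0.1127, 2**-4), 4))  # approximately 1.3759
```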
Initialization. For all numerical examples, we choose u = 0 as initialization u0
for all algorithms.
In Example 1 and Example 2, for comparison with ihADMM, we will also show
the numerical results obtained by the classical ADMM and the APG algorithm,
and the PDAS with line search. For the classical ADMM and our ihADMM, the
penalty parameter σ was chosen as σ = 0.1α. About the step-length τ , we choose
τ = 1.618 for the classical ADMM, and τ = 1 for our ihADMM. For the PDAS
method, the parameter in the active set strategy was chosen as c = 1. For the
APG method, we estimate an approximation for the Lipschitz constant L with a
backtracking method. In the numerical experiments, we measure the accuracy of
an approximate optimal solution by using the corresponding K-K-T residual error
for each algorithm. For the purpose of showing the efficiency of our ihADMM,
we report the numerical results obtained by running the classical ADMM and the
APG method to compare with the results obtained by our ihADMM. In this case, we
terminate all the algorithms when η < 10−6 with the maximum number of iterations
set at 500.
In Example 3 and Example 4, for comparison with sGS-imABCD, we will
also show the numerical results obtained by the ihADMM and APG methods for
(DPh ). For the ihADMM method, the step-length τ for Lagrangian multipliers λ
was chosen as τ = 1, and the penalty parameter σ was chosen as σ = 0.1α. For
Examples
Example 1.
$$\begin{cases}\displaystyle\min_{(y,u)\in H_0^1(\Omega)\times L^2(\Omega)}\ J(y,u) = \frac{1}{2}\|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}\\[4pt] \text{s.t.}\ \ -\Delta y = u + y_c\ \text{in }\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } y = 0\ \text{on }\partial\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } u \in U_{ad} = \{v(x)\mid a \le v(x) \le b,\ \text{a.e. on }\Omega\}.\end{cases}$$
Here, we consider the problem with control $u \in L^2(\Omega)$ on the unit square $\Omega = (0,1)^2$ with $\alpha = 0.5$, $\beta = 0.5$, $a = -0.5$, and $b = 0.5$. It is a constructed problem;
thus, we set $y^* = \sin(\pi x_1)\sin(\pi x_2)$ and $p^* = 2\beta\sin(2\pi x_1)\exp(0.5x_1)\sin(4\pi x_2)$.
Then through $u^* = P_{U_{ad}}\big(\frac{1}{\alpha}\mathrm{soft}(-p^*, \beta)\big)$, $y_c = y^* - Su^*$, and $y_d = S^{-*}p^* + y^*$,
we construct an example for which the exact solution is known.
The error of the control u w.r.t the L2 -norm and the EOC for control are presented
in Table 1. They also confirm that indeed the convergence rate is of order O(h).
Numerical results for the accuracy of solution, number of iterations, and CPU
time obtained by our ihADMM, classical ADMM, and APG methods are shown
in Table 1. From Table 1, we can see that our proposed ihADMM method
is an efficient algorithm to solve problem $(DP_h)$ to medium accuracy. Moreover,
it is obvious that our ihADMM outperforms the classical ADMM and the APG
method in terms of CPU time, especially when the discretization is at a fine level.
It is worth noting that although the APG method requires fewer iterations
to satisfy the termination condition, it spends much time
on the backtracking step aimed at finding an appropriate approximation of the
Lipschitz constant. This is the reason why our ihADMM performs better than
the APG method in actual numerical implementations. Furthermore, the numerical
results in terms of iteration numbers illustrate the mesh-independent performance
of the ihADMM and the APG method, but not of the classical ADMM.
Table 1 Example 1: The convergence behavior of our ihADMM, classical ADMM, and APG for
(DPh ). In the table, #dofs stands for the number of degrees of freedom for the control variable on
each grid level
h #dofs E2 EOC Index ihADMM Classical ADMM APG
2−3 49 0.2925 – iter 27 32 13
residual η 7.15e-07 7.55e-07 6.88e-07
CPU time/s 0.19 0.23 0.18
2−4 225 0.1127 1.3759 iter 31 44 13
residual η 9.77e-07 9.91e-07 8.23e-07
CPU times/s 0.37 0.66 0.32
2−5 961 0.0457 1.3390 iter 31 58 12
residual η 7.41e-07 8.11e-07 7.58e-07
CPU time/s 1.02 2.32 1.00
2−6 3969 0.0161 1.3944 iter 32 76 14
residual η 7.26e-07 8.10e-07 7.88e-07
CPU time/s 4.18 9.12 4.25
2−7 16129 0.0058 1.4132 iter 31 94 14
residual η 5.33e-07 7.85e-07 4.45e-07
CPU time/s 17.72 65.82 26.25
2−8 65025 0.0019 1.4503 iter 32 127 13
residual η 6.88e-07 8.93e-07 7.47e-07
CPU time/s 70.45 312.65 80.81
2−9 261121 0.0007 1.4542 iter 31 255 13
residual η 7.43e-07 7.96e-07 6.33e-07
CPU time/s 525.28 4845.31 620.55
Example 2.
$$\begin{cases}\displaystyle\min_{(y,u)\in Y\times U}\ J(y,u) = \frac{1}{2}\|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}\\[4pt] \text{s.t.}\ \ -\Delta y = u\ \text{in }\Omega = (0,1)\times(0,1),\\[2pt] \phantom{\text{s.t.}\ \ } y = 0\ \text{on }\partial\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } u \in U_{ad} = \{v(x)\mid a \le v(x) \le b,\ \text{a.e. on }\Omega\},\end{cases}$$
where the desired state yd = 16 sin(2π x) exp(2x) sin(2πy) and the parameters α =
10−5 , β = 10−3 , a = −30, and b = 30. In addition, the exact solutions of the
problem are unknown. Instead, we use the numerical solutions computed on a grid
with h∗ = 2−10 as reference solutions.
The error of the control u w.r.t the L2 norm with respect to the solution on the
finest grid (h∗ = 2−10 ) and the experimental order of convergence (EOC) for control
are presented in Table 2. They confirm the linear rate of convergence w.r.t. h.
Numerical results for the accuracy of solution, number of iterations, and CPU
time obtained by our ihADMM, classical ADMM, and APG methods are also
shown in Table 2. Experimental results show that the ihADMM has an evident advantage
over the classical ADMM and the APG method in computing time. Furthermore,
the numerical results in terms of iteration numbers also illustrate the mesh-
independent performance of our ihADMM. These results demonstrate that our
ihADMM is highly efficient in obtaining an approximate solution with moderate
accuracy.
Table 2 Example 2: The convergence behavior of ihADMM, classical ADMM, and APG for
(DPh )
h #dofs E2 EOC Index ihADMM Classical ADMM APG
2−3 49 6.6122 – iter 40 48 18
residual η 8.22e-07 8.65e-07 7.96e-07
CPU time/s 0.30 0.51 0.24
2−4 225 2.6314 1.3293 iter 41 56 18
residual η 7.22e-07 8.01e-07 7.58e-07
CPU times/s 0.45 0.71 0.44
2−5 961 1.2825 1.1831 iter 40 69 19
residual η 8.12e-07 8.01e-07 7.90e-07
CPU time/s 1.60 3.05 1.58
2−6 3969 0.7514 1.0458 iter 42 85 18
residual η 6.11e-07 7.80e-07 6.45e-07
CPU time/s 7.25 14.62 7.45
2−7 16129 0.2930 1.1240 iter 40 108 18
residual η 6.35e-07 7.11e-07 5.62e-07
CPU time/s 33.85 101.36 34.39
2−8 65025 0.1357 1.1213 iter 41 132 19
residual η 7.55e-07 7.83e-07 7.57e-07
CPU time/s 158.62 508.65 165.75
2−9 261121 0.0958 1.0181 iter 42 278 18
residual η 5.25e-07 5.56e-07 4.85e-07
CPU time/s 1781.98 11788.52 1860.11
2−10 1046529 – – iter 41 500 19
residual η 8.78e-07 Error 8.47e-07
CPU time/s 42033.79 Error 44131.27
Example 3.
$$\begin{cases}\displaystyle\min_{(y,u)\in H_0^1(\Omega)\times L^2(\Omega)}\ J(y,u) = \frac{1}{2}\|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}\\[4pt] \text{s.t.}\ \ -\Delta y = u + y_r\ \text{in }\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } y = 0\ \text{on }\partial\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } u \in U_{ad} = \{v(x)\mid a \le v(x) \le b,\ \text{a.e. on }\Omega\}.\end{cases}$$
Here, we consider the problem with control u ∈ L2 () on the unit square =
(0, 1)2 with α = 0.5, β = 0.5, a = −0.5, and b = 0.5. It is a con-
structed problem; thus, we set y ∗ = sin(2π x1 ) exp(0.5x1 ) sin(4π x2 ) and p∗ =
2β sin(2π x1 ) exp(0.5x1 ) sin(4π x2 ).
The error of the control u w.r.t the L2 norm and the experimental order of
convergence (EOC) for control are presented in Tables 3 and 5. They also confirm
that indeed the convergence rate is of order O(h). Comparing the error results from
Tables 3 and 5, it is obvious to see that solving the dual problem (Dh ) could get
better error results than that from solving (DPh ).
Numerical results for the accuracy of solution, number of iterations, and CPU
time obtained by our proposed sGS-imABCD method for (Dh ) are also shown
in Table 3. As a result we obtain from Table 3, one can see that our proposed
sGS-imABCD method is an efficient algorithm to solve problem (Dh ) to high
accuracy. It should be pointed out that iter.p̃-block denotes the iterations of p̃ in
Table 3. It is clear that the $\tilde p$-subproblem almost never needs to be solved twice, which
demonstrates the efficiency of our strategy of predicting the solution of the $\tilde p$-subproblem.
Furthermore, the numerical results in terms of iteration numbers illustrate the mesh-
independent performance of our proposed sGS-imABCD method. Additionally, in
Table 4, we list the numbers of iteration steps and the relative residual errors of
PMHSS-preconditioned GMRES method for the p̂-subproblem on mesh h = 2−7
Table 3 Example 3: The performance of sGS-imABCD for (Dh ). In the table, #dofs stands for
the number of degrees of freedom for the control variable on each grid level
h #dofs iter.sGS-imABCD iter.p̃-block residual η CPU time/s E2 EOC
2−3 49 13 4 6.60e-08 0.14 0.1784 -
2−4 225 13 4 6.32e-08 0.20 0.0967 0.8834
2−5 961 12 3 7.38e-08 0.33 0.0399 1.0803
2−6 3969 13 3 9.78e-08 2.04 0.0155 1.1749
2−7 16129 12 3 6.66e-08 8.25 0.0052 1.2754
2−8 65025 10 3 7.05e-08 52.15 0.0017 1.3388
2−9 261121 9 2 5.19e-08 312.82 0.0006 1.3617
and h = 2−8 . From Table 4, we can see that the number of iteration steps of the
PMHSS-preconditioned GMRES method is roughly independent of the mesh size h.
As a comparison, numerical results obtained by our proposed sGS-imABCD
method for $(D_h)$ and by the ihADMM and APG methods for $(DP_h)$ are shown in
Table 5. From Table 5, it can be observed that our sGS-imABCD is faster
and more efficient than the ihADMM and APG methods in terms of iterations
and CPU times.
At last, in order to show the robustness of our proposed sGS-imABCD method
with respect to the parameters α and β, we also test the same problem with different
values of α and β on mesh h = 2−8 . The results are presented in Table 6. From
Table 6, it is obvious to see that our method could solve problem (Dh ) to high
accuracy for all tested values of α and β within 50 iterations. More importantly, from
the results, we can see that when α is fixed, the number of iteration steps of the sGS-
imABCD method remains nearly constant for β ranging from 0.005 to 1. However,
for a fixed β, as α increases from 0.005 to 0.5, the number of iteration steps of
the sGS-imABCD method changes drastically. These observations indicate that the
sGS-imABCD method shows the β-independent convergence property, whereas it
does not have the same convergence property with respect to the parameter α.
Table 5 Example 3: The convergence behavior of sGS-imABCD for (Dh ), ihADMM, and APG
for (DPh ). In the table, #dofs stands for the number of degrees of freedom for the control variable
on each grid level. E2 = min{E2 (sGS − imABCD), E2 (ihADMM), E2 (AP G)}
Index of
h #dofs E2 EOC performance sGS-imABCD ihADMM APG
2−3 49 0.2925 – iter 13 32 16
residual η 6.25e-08 6.33e-08 3.51e-08
CPU time/s 0.16 0.23 0.22
2−4 225 0.1127 1.3759 iter 12 36 18
residual η 6.34e-08 8.91e-08 7.23e-08
CPU times/s 0.24 0.44 0.45
2−5 961 0.0457 1.3390 iter 13 40 16
residual η 7.10e-08 7.42e-08 8.88e-08
CPU time/s 0.47 1.17 2.98
2−6 3969 0.0161 1.3944 iter 14 44 16
residual η 4.05e-08 9.10e-08 6.60e-08
CPU time/s 2.62 6.04 4.86
2−7 16129 0.0058 1.4132 iter 12 50 16
residual η 6.43e-08 9.80e-08 8.45e-08
CPU time/s 10.22 29.53 30.63
2−8 65025 0.0019 1.4503 iter 10 53 17
residual η 7.05e-08 8.93e-08 8.88e-08
CPU time/s 60.45 160.24 92.60
2−9 261121 0.0007 1.4542 iter 10 54 18
residual η 5.21e-08 7.96e-08 3.24e-08
CPU time/s 395.78 915.71 859.22
It should be pointed out that the numerical results are also consistent with the
theoretical conclusion based on Theorem 8.
Example 4.
$$\begin{cases}\displaystyle\min_{(y,u)\in Y\times U}\ J(y,u) = \frac{1}{2}\|y - y_d\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}\\[4pt] \text{s.t.}\ \ -\Delta y = u\ \text{in }\Omega = (0,1)\times(0,1),\\[2pt] \phantom{\text{s.t.}\ \ } y = 0\ \text{on }\partial\Omega,\\[2pt] \phantom{\text{s.t.}\ \ } u \in U_{ad} = \{v(x)\mid a \le v(x) \le b,\ \text{a.e. on }\Omega\},\end{cases}$$
where the desired state yd = 16 sin(2π x) exp(2x) sin(2πy) and the parameters
α = 10−5 , β = 10−3 , a = −30, and b = 30. In addition, the exact solution of
Table 6 Example 3: The performance of sGS-imABCD for (Dh ) with different values of α and β
h α β iter.sGS-imABCD residual error η about K-K-T
2−8 0.005 0.005 49 7.59e-08
0.05 48 8.86e-08
0.5 46 6.76e-08
1 48 5.49e-08
0.05 0.005 23 8.74e-08
0.05 25 7.26e-08
0.5 22 5.77e-08
1 23 7.63e-08
0.5 0.005 12 6.51e-08
0.05 11 8.80e-08
0.5 10 7.05e-08
1 12 8.53e-08
Table 7 Example 4: The performance of sGS-imABCD for (Dh ). In the table, #dofs stands for
the number of degrees of freedom for the control variable on each grid level
iter.
h #dofs sGS-imABCD No.p̃-block residual η CPU time/s E2 EOC
2−3 49 37 12 8.67e-08 0.64 5.5408 –
2−4 225 30 10 7.32e-08 0.65 2.4426 1.1817
2−5 961 22 8 8.38e-08 0.73 1.1504 1.1340
2−6 3969 22 7 6.83e-08 4.65 0.4380 1.2203
2−7 16129 16 5 6.46e-08 16.60 0.1774 1.2413
2−8 65025 15 3 6.36e-08 105.70 0.1309 1.0807
2−9 261121 15 3 5.65e-08 1158.62 0.0406 1.1821
2−10 1046529 16 3 4.50e-08 24008.07 – –
the problem is unknown. In this case, using a numerical solution as the reference
solution is a common approach; for more details, one can see Hinze et al. (2009).
In our practical implementation, we use the numerical solution computed on a grid
with $h^* = 2^{-10}$ as the reference solution. It should be emphasized that choosing
the solution computed on mesh $h^* = 2^{-10}$ is reliable: as shown below, at
$h^* = 2^{-10}$ the number of degrees of freedom is 1,046,529.
sGS-imABCD method for solving $(D_h)$ and the ihADMM and APG methods for $(DP_h)$.
Comparing the error results from Tables 7 and 9, we can see that directly solving
$(D_h)$ yields better error results than those obtained by solving $(DP_h)$. Obviously,
this conclusion shows the efficiency of our dual-based approach, which avoids
the additional error caused by the approximation of the $L^1$-norm. Furthermore, from
Table 7, the numerical results in terms of iteration numbers illustrate the mesh-
independent performance of our proposed sGS-imABCD method.
In addition, in Table 8, numbers of iteration steps and the relative residual errors
of PMHSS-preconditioned GMRES method for the p̂-subproblem on mesh h = 2−7
Table 9 Example 4: The convergence behavior of sGS-imABCD, ihADMM, and APG for (DPh )
Index of
h #dofs E2 EOC performance sGS-imABCD ihADMM APG
2−3 49 6.6122 – iter 40 56 44
residual η 6.06e-08 8.36e-08 9.92e-08
CPU time/s 0.72 0.42 0.60
and h = 2−8 are presented, which shows that the PMHSS-preconditioned GMRES
method is roughly independent of the mesh size h.
From Table 9, it can also be observed that our sGS-imABCD is
faster and more efficient than the ihADMM and APG methods in terms of
iteration numbers and CPU times. The numerical performance clearly demonstrates
the effectiveness of our proposed sGS-imABCD method.
Finally, to show the influence of the parameters α and β on our proposed sGS-
imABCD method, we also test Example 4 with different values of α and β on mesh
h = 2−8 . The results are presented in Table 10. From Table 10, it is obvious to
see that our proposed sGS-imABCD method is independent of the parameter β.
However, its convergence rate depends on α. It also confirms the convergence results
of Theorem 8.
Table 10 Example 4: The performance of sGS-imABCD for (Dh ) with different values of α and β
h α β iter.sGS-imABCD residual error η about K-K-T
2−8 10−6 0.0005 26 8.37e-08
0.001 27 8.40e-08
0.005 26 9.77e-08
0.008 28 2.47e-08
10−5 0.0005 13 5.44e-08
0.001 15 6.36e-08
0.005 14 8.60e-08
0.008 13 8.17e-08
10−4 0.0005 5 9.84e-08
0.001 4 3.71e-08
0.005 5 9.23e-08
0.008 5 5.22e-08
Conclusion
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Bergounioux, M., Ito, K., Kunisch, K.: Primal-dual strategy for constrained optimal control
problems, SIAM J. Control Optim. 37, 1176–1194 (1999)
Blumensath, T., Davies, M.E.: Iterative Thresholding for Sparse Approximations. J. Fourier Anal.
Appl. 14, 629–654 (2008)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical
learning via the alternating direction method of multipliers, Found. Trends® Mach. Learn. 3,
1–122 (2011)
Carstensen, C.: Quasi-interpolation and a posteriori error analysis in finite element methods.
ESAIM: Math. Model. Numer. Anal. 33, 1187–1202 (1999)
Casas, E.: Using piecewise linear functions in the numerical approximation of semilinear elliptic
control problems. Adv. Comput. Math. 26, 137–153 (2007)
Casas, E., Tröltzsch, F.: Error estimates for linear-quadratic elliptic control problems. Analysis and
optimization of differential systems, pp. 89–100. Springer (2003)
Casas, E., Clason,C., Kunisch, K.: Approximation of elliptic control problems in measure spaces
with sparse solutions. SIAM J. Control Optim. 50, 1735–1752 (2012)
Casas, E., Herzog, R., Wachsmuth, G.: Approximation of sparse controls in semilinear equations
by piecewise linear functions. Numer. Math. 122, 645–669 (2012a)
Casas, E., Herzog, R., Wachsmuth, G.: Optimality conditions and error analysis of semilinear
elliptic control problems with L1 cost functional. SIAM J. Optim. 22, 795–820 (2012b)
Chambolle, A., Dossal, C.: A remark on accelerated block coordinate descent for computing the
proximity operators of a sum of convex functions (2015). https://hal.archives-ouvertes.fr/hal-
01099182
Chen, L. Sun, D.F., Toh, K.C.: An efficient inexact symmetric Gauss-Seidel based majorized
ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1),
237–270 (2017)
Ciarlet, P.G.: The finite element method for elliptic problems. Math. Comput. 36, xxviii+530
(1978)
Clason, C., Kunisch, K.: A duality-based approach to elliptic control problems in non-reflexive
Banach spaces. ESAIM Control Optim. Calc. Var. 17, 243–266 (2011)
Collis, S.S., Heinkenschloss, M.: Analysis of the streamline upwind/Petrov Galerkin method
applied to the solution of optimal control problems. CAAM TR02–01 (2002)
Cui, Y.: Large scale composite optimization problems with coupled objective functions: theory,
algorithms and applications. PhD thesis, National University of Singapore (2016)
de Los Reyes, J.C., Meyer, C., Vexler, B.: Finite element error analysis for state-constrained
optimal control of the Stokes equations. Control. Cybern. 37, 251–284 (2008)
Elvetun, O.L., Nielsen, B.F.: The split bregman algorithm applied to PDE-constrained optimization
problems with total variation regularization. Comput. Optim. Appl. 64, 1–26 (2014)
Falk, R.S.: Approximation of a class of optimal control problems with order of convergence
estimates. J. Math. Anal. Appl. 44, 28–47 (1973)
Fazel, M., Pong, T.K., Sun, D.F., Tseng, P.: Hankel matrix rank minimization with applications to
system identification and realization. SIAM J. Matrix Anal. Appl. 34, 946–977 (2013)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via
finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)
Geveci, T.: On the approximation of the solution of an optimal control problem problem governed
by an elliptic equation. RAIRO-Analyse numérique. 13, 313–328 (1979)
Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution,
par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires, Revue française
d’automatique, informatique, recherche opérationnelle. Analyse numérique 9, 41–76 (1975)
Hintermüller, M., Ulbrich, M.: A mesh-independence result for semismooth Newton methods.
Math. Program. 101, 151–184 (2004)
Hiriart-Urruty, J.-B., Strodiot, J.-J., Nguyen, V.H.: Generalized Hessian matrix and second-order
optimality conditions for problems with C 1,1 data. Appl. Math. Optim. 11, 43–56 (1984)
Herzog, R., Sachs, E.: Preconditioned conjugate gradient method for optimal control problems
with control and state constraints. SIAM J. Matrix Anal. Appl. 31, 2291–2317 (2010)
Hinze, M.: A variational discretization concept in control constrained optimization: the linear-
quadratic case. Comput. Optim. Appl. 30, 45–61 (2005)
Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints, Mathematical
Modelling: Theory and Applications, p. 23. Springer, New York (2009)
Li, X.D., Sun, D.F., Toh, K.C.: QSDPNAL: A two-phase Newton-CG proximal augmented
Lagrangian method for convex quadratic semidefinite programming problems (2015).
arXiv:1512.08872
Li, X.D., Sun, D.F., Toh, K.C.: A Schur complement based semi-proximal ADMM for convex
quadratic conic programming and extensions. Math. Program. 155, 333–373 (2016)
Meyer, C., Rösch, A.: Superconvergence properties of optimal control problems. SIAM J. Control
Optim. 43, 970–985 (2004)
Porcelli, M., Simoncini, V., Stoll, M.: Preconditioning PDE-constrained optimization with L1 -
sparsity and control constraints. Comput. Math. Appl. 74, 1059–1075 (2017)
Rösch, A.: Error estimates for linear-quadratic control problems with control constraints. Optim.
Methods Softw. 21, 121–134 (2006)
Schindele, A., Borzì, A.: Proximal methods for elliptic optimal control problems with sparsity cost
functional. Appl. Math. 7, 967–992 (2016)
Sun, D.F., Toh, K.C., Yang, L.Q.: An Efficient Inexact ABCD Method for Least Squares
Semidefinite Programming. SIAM J. Optim. 26, 1072–1100 (2016)
Stadler, G.: Elliptic optimal control problems with L1 -control cost and applications for the
placement of control devices. Comp. Optim. Appls. 44, 159–181 (2009)
Ulbrich, M.: Nonsmooth Newton-like methods for variational inequalities and constrained opti-
mization problems in function spaces. Habilitation thesis, Fakultät für Mathematik, Technische
Universität München (2002)
Ulbrich, M.: Semismooth Newton methods for operator equations in function spaces. SIAM J.
Optim. 13, 805–842 (2003)
Wachsmuth, G., Wachsmuth D.: Convergence and regularisation results for optimal control
problems with sparsity functional. ESAIM Control Optim. Calc. Var. 17, 858–886 (2011)
Wathen, A.J.: Realistic eigenvalue bounds for the Galerkin mass matrix. IMA J. Numer. Anal. 7,
449–457 (1987)
Game Theory and Its Applications in
Imaging and Vision 17
Anis Theljani, Abderrahmane Habbal, Moez Kallel, and Ke Chen
Contents
Introduction to Game Theory and Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
Applications of Game Theory in Image Restoration and Segmentation . . . . . . . . . . . . . . . . . 681
Applications of Game Theory in Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Introduction to Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Application of Game Theory to a Simple Registration Model . . . . . . . . . . . . . . . . . . . . . . . 685
Application of Game Theory to Registering Images Requiring Bias Correction . . . . . . . . 688
Game Models in Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
Generative Adversarial Networks (GANs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
GANs for Image Generation: A Two-Player Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
GANs for Image Segmentation: A Two-Player Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
Abstract
It is very common to see many terms in a variational model from Imaging and
Vision, each aiming to optimize some desirable measure. This is naturally so
because we desire several objectives in an objective functional. Among these
is data fidelity which in itself is not unique and often one hopes to have both
L1 and L2 norms to be small for instance, or even two differing fidelities: one
for geometric fitting and the other for statistical closeness. Regularity is another
demanding quantity to be settled on. Apart from combination models where
one wants both minimizations to be achieved (e.g., total generalized variation or
infimal convolution) in some balanced way through an internal parameter, quite
often, we demand both gradient and curvature based terms to be minimized;
such demand can be conflicted. A conflict is resolved by a suitable choice of
parameters which can be a daunting task. Overall, it is fair to state that many
variational models for Imaging and Vision try to make multiple decisions through
one complicated functional.
Game theory deals with situations involving multiple decision makers, each
making its optimal strategies. When assigning a decision (objective) by a
variational model to a player by associating it with a game framework, many
complicated functionals from Imaging and Vision modeling may be simplified
and studied by game theory. The decoupling effect resulting from game theory
reformulation is often evident when dealing with the choice of competing
parameters. However, the existence of solutions and equivalence to the original
formulations are emerging issues to be tackled.
This chapter first presents a brief review of how game theory works and then
focuses on a few typical Imaging and Vision problems, where game theory has
been found useful for solving joint problems effectively.
Keywords
Game theory deals with situations involving multiple decision makers. Each
decision maker owns the control on some variable known as his action. All actions
are collected in an overall variable known as a strategy. Each of the decision
makers owns a specific cost function, to be minimized, which depends on the
overall strategy variable. Decision makers are also termed by players or agents,
and cost functions could also be replaced by payoffs, to be maximized instead. For
readers who are familiar with it, let us rephrase classical optimization problems
as follows: optimization deals with situations where a single decision maker owns
control over one single overall strategy (all optimization variables), and optimizes a
single cost/payoff function, possibly subject to constraints.
To start with some comprehensive and easy-reading reference, the book (Gibbons
1992) introduces, most if not all, the must-have material, including the earliest
models of Cournot and Bertrand, those of Stackelberg and actually illustrates with
many examples how the game theory first emerged from the need to model economic
behavior.
We focus in this introduction on noncooperative games, which means that the
players do not share the same cost function, or they do not aggregate their costs
into a single one (e.g., a weighted sum). We do not consider as well finite or
discrete games, where the set of strategies is either finite (e.g., prisoner’s dilemma)
or discrete (e.g., games on graphs).
Noncooperative games may be static or dynamic. Roughly speaking, in a
dynamic game, players sequentially observe actions of other players and then
choose their optimal responses. In a static game, players choose their best responses
to the others without exchange (or communication) of information. Remark that the
notion of time involved in games is not necessarily the physical time involved in, for
example, state equations. As well, a static game could be played by players whose
cost functions are constrained by, for example, unsteady fluid mechanics. Games
may also be with complete information, meaning that all players know each other’s
strategy spaces and cost functionals (including their own ones). The failure of this
assumption is termed as a game with incomplete information, see Gibbons (1992)
for details.
Noncooperative games may also be differential and/or stochastic. Differential games involve state equations governed by systems of differential equations. They model a huge variety of competitive interactions, in social behavior, economics, and biology among many other fields, including predator-prey and pursuit-evasion games (Isaacs 1999). Stochastic game theory, starting from the seminal paper by Shapley (1953), nowadays accounts for a large share of the game theory literature, and a vast literature is dedicated to stochastic differential games (Friedman 1972), robust games (Nishimura et al. 2009), games on random graphs, and agents learning games (Hu and Wellman 2003), among many other branches; it is definitely beyond the scope of this introductory section to review all aspects of the field. See also the introductory book (Neyman and Sorin 2003) for the basic concepts of stochastic game theory.
Solutions to noncooperative games are called equilibria. In contrast to classical optimization, the definition of an equilibrium depends on the game setting (the game rules). Within the static, complete-information setting, a relevant notion is the so-called Nash equilibrium (NE).
We consider primarily the standard static Nash equilibrium problem (NEP) under complete information (Gibbons 1992): find x* = (x*_1, …, x*_p) ∈ X such that, for every player i,

x*_i ∈ argmin_{x_i} y_i(x*_1, …, x*_{i−1}, x_i, x*_{i+1}, …, x*_p),

where y(x) = (y_1(x), …, y_p(x)) : X ⊂ R^n → R^p (with n ≥ p) denotes a vector of cost functions (a.k.a. pay-off or utility functions), y_i denotes the specific cost function of player i, and the strategy variable x consists of the block components x_1, …, x_p, that is, x = (x_j)_{1≤j≤p}.
In other words, when all players have chosen to play an NE, then no single player has an incentive to deviate unilaterally from his x*_i. Let us already mention that, generically, Nash equilibria are not efficient, that is, they do not belong to the underlying set of best compromise solutions, called the Pareto front, of the objective vector (y_i(x))_{x∈X}.
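To make the notion of a Nash equilibrium concrete, here is a minimal Python sketch (our own toy example, not taken from the chapter: the quadratic costs, the data a1 and a2, and the coupling weight lam are all hypothetical) in which two players repeatedly play exact best responses until neither can improve by a unilateral deviation.

```python
import numpy as np

# Two players control x1 and x2. Each minimizes his own quadratic cost,
# which also depends on the other player's action (a noncooperative game).
a1, a2, lam = 1.0, -2.0, 0.5          # hypothetical problem data

def best_response_1(x2):
    # argmin_{x1} (x1 - a1)**2 + lam*(x1 - x2)**2, in closed form
    return (a1 + lam * x2) / (1.0 + lam)

def best_response_2(x1):
    # argmin_{x2} (x2 - a2)**2 + lam*(x2 - x1)**2, in closed form
    return (a2 + lam * x1) / (1.0 + lam)

x1, x2 = 0.0, 0.0
for k in range(100):                   # Gauss-Seidel best-response iteration
    x1 = best_response_1(x2)
    x2 = best_response_2(x1)

# At a Nash equilibrium neither player gains from a unilateral deviation:
print(x1, best_response_1(x2))         # identical up to round-off
print(x2, best_response_2(x1))
```

The fixed point of the best-response iteration is exactly the Nash equilibrium of this toy game; for the coupled quadratic costs above, the iteration is a contraction and converges quickly.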
An important class of games is that of the so-called potential games. As explained in the survey paper (David and Hernández-Lerma Onésimo 2016), in the static case a noncooperative game is said to be a potential game if there is a real-valued function, called a potential function, such that a strategy profile that optimizes the potential function is a Nash equilibrium of the game. This is precisely one of the key properties of potential games; namely, in a potential game one can find Nash equilibria by optimizing a single function rather than using a fixed-point argument, as is typically done for noncooperative games.
On the applications side, a few papers are dedicated to engineering applications
involving partial differential state equations where distributed parameters are seen as
Nash strategies. In Habbal et al. (2004), a Nash game is set up between two physical
processes, heat transfer and structural mechanics, using cooling and structural
material densities (as in topology optimization) as Nash strategies. Nash games
could also be used to model biological processes, as introduced in Habbal (2005),
where tumoral angiogenesis is modeled as a Nash game between pro- and anti-
angiogenic factors and involves porous media and elasticity state equations. In Roy
et al. (2017), Nash strategies are used to model the cognitive process of pedestrian
avoidance, with Fokker-Planck state equations.
Engineering applications involving multidisciplinary optimization may also ben-
efit from reframing within a Nash game framework, see Desideri et al. (2014) for an
overview and Benki et al. (2015) for an original application in nonlinear mechanics.
Finally, and in close connection to image processing, ill-posed inverse problems may benefit strikingly from being reformulated as Nash games. See Habbal and Kallel (2013) for a novel approach to solving data recovery problems,
and Habbal et al. (2019) and Chamekh et al. (2019) for new algorithms solving the coupled data recovery and parameter or shape identification problems.
There are two classical problems associated with image processing: image denoising (restoration) and contour identification (segmentation). To address these problems there are various approaches, such as stochastic modeling, the wavelet approach, and the variational approach leading to partial differential equations. Image restoration is an inverse problem which consists of finding the original image from an observed one, the two often being linked by the equation I0 = TI + v, where T is a linear operator modeling the blur, I is a (mathematical) image defined by its intensity (or gray level), and v represents the noise (Gaussian, for example). Image segmentation is the process of extracting objects from an image and can be formulated as finding a finite collection {Ω_i}_{i=1}^K of disjoint open subsets of Ω, where Ω is an open and bounded subset of R² representing the image domain. The restoration and segmentation of the image can be performed simultaneously. In this case, one has to solve a minimization problem for a sum of two energies (see, e.g., the Mumford-Shah functional (Mumford 1989)). One favors image regularization, and the other detects and enhances the contours present in the image. If the regularization term of the energy is favored over the segmentation term, then the contours are smoothed and hence destroyed. On the other hand, if the segmentation contribution to the energy is made stronger than the regularization contribution, then we might obtain an oversegmented image.
A game-theoretic approach was proposed in Kallel et al. (2014) to simultaneously restore and segment noisy images. The method is based on an iterative negotiation between the two antagonistic processes, segmentation and restoration, where acceptable solutions arise as stationary (noncooperative) decisions. In this work, game theory concepts are used to define two players: one is interested in the regularization of the image, and the other is concerned with its segmentation. Each of the two players tries to improve his own outcome by making an adequate decision until a "Nash equilibrium" is reached. More specifically, the restoration player's goal is to minimize the functional

J1(I, C) = ∫_Ω (I − I0)² dx + μ ∫_{Ω\C} |∇I|² dx,   (3)

while the segmentation player minimizes

J2(I, C) = Σ_{i=1}^K ∫_{Ω_i} (I0 − Ī_i)² dx + ν|C|,   where   Ī_i = (1/|Ω_i|) ∫_{Ω_i} I(x) dx.   (4)
The functional (4) is inspired by the Mumford-Shah functional and is obtained by replacing the restriction of I to each connected component Ω_i of Ω by its mean over Ω_i. To summarize this approach, the authors consider a two-player static game with complete information where the first player is restoration and the second is segmentation. Restoration minimizes the cost J1(I, C) with action on the intensity field I, while segmentation minimizes the cost J2(I, C) with action on the discontinuity set C. In this case, solving the game amounts to finding a Nash equilibrium (NE), defined as a pair of strategies (I*, C*) such that

I* = argmin_I J1(I, C*),   C* = argmin_C J2(I*, C).   (5)
Finally, the authors use a level-set approach to get rid of the tricky control dependence of the functional spaces. A numerical study is then carried out on some real images in order to evaluate the effectiveness of the proposed algorithm. In particular, they show that by decoupling the Mumford-Shah functional using the game algorithm, the dependence on the regularization parameters μ and ν becomes decoupled, and the choice of their values becomes more flexible and natural. On the other hand, the dependence of the functional J2 only on the mean of I in each connected component has a significant effect on the speed of convergence. In Fig. 1, a numerical result using only one level-set function is shown. The top row displays the evolution of the curves over the corresponding images I^(k), k ∈ {0, 10, 50}. The bottom row displays the final segmentation result (second image) and the denoised image (third image), with PSNR = 31.98. For this case, the algorithm converges after 135 iterations.
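To convey the alternating structure of (5) in code, the following simplified numpy sketch stands in for the authors' algorithm: the level-set machinery is replaced by a crude two-region mean-based segmentation, and the restoration player takes a few explicit gradient steps on a discretized J1 in which the smoothing is simply switched off across the current region boundary. All parameter values and the synthetic test image are our own choices.

```python
import numpy as np

def restore(I0, mask, mu=0.2, tau=0.2, iters=50):
    """Gradient steps on sum (I - I0)^2 + mu*|grad I|^2, with smoothing
    switched off across the boundary of 'mask' (a crude surrogate for
    restricting the Dirichlet term to the complement of the contour C)."""
    m = mask.astype(float)
    wx = (np.diff(m, axis=0) == 0).astype(float)   # 0 on edges crossing the contour
    wy = (np.diff(m, axis=1) == 0).astype(float)
    I = I0.copy()
    for _ in range(iters):
        gx = np.diff(I, axis=0) * wx               # masked forward differences
        gy = np.diff(I, axis=1) * wy
        gs = np.zeros_like(I)                      # gradient of the smoothing term
        gs[:-1, :] -= gx; gs[1:, :] += gx
        gs[:, :-1] -= gy; gs[:, 1:] += gy
        I -= tau * (2.0 * (I - I0) + 2.0 * mu * gs)
    return I

def segment(I):
    """Two-region segmentation by the nearer of two region means
    (a Chan-Vese-like surrogate for the segmentation player)."""
    c1, c2 = I.min(), I.max()                      # dark and bright region means
    for _ in range(20):
        mask = np.abs(I - c2) < np.abs(I - c1)     # pixels closer to the bright mean
        c1, c2 = I[~mask].mean(), I[mask].mean()
    return mask                                    # True on the bright object

rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
clean = ((xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2).astype(float)
I0 = clean + 0.3 * rng.standard_normal(clean.shape)

mask = segment(I0)
for k in range(5):                                 # the two players take turns, cf. (5)
    I = restore(I0, mask)
    mask = segment(I)
print("pixel agreement with ground truth:", (mask == (clean > 0.5)).mean())
```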
Fig. 1 Top row: noisy image with Gaussian noise (variance = 0.2) and initial contour, evolution
by iterations. Bottom row: segmentation and restoration of image by the proposed algorithm with
(ν = 0.2, μ = 0.01), for k = 135. CPU time = 117 sec
There exist many image registration models; each is designed for one class of problems. It is challenging to find a universally robust model that can deal with all registration problems, owing to the inherent difficulties of image registration. The previous section discussed how game theory can be used to enhance a model for image restoration and segmentation. Here we shall see that game theory is also a natural tool to reformulate an image registration model in order to achieve better performance and robustness.
In this section, we review recent works on using game theory to design and reformulate traditional variational models for deformable image registration. The advantage gained is a reduction of the burden of tuning many parameters; hence a more robust model is obtained. The ideas are generally applicable to most other variational models.
Image registration (Chen et al. 2019) aims to align two given images by mapping one (the template image T) to the other (the reference image R) so that the aligned (or registered) image T(φ) may be used to give us complementary information from T to R, or to highlight the differences between T and R. Here φ(x) = x + u(x), where u(x) = (u1(x), u2(x), …, ud(x)) is the unknown map to be found for x ∈ Ω ⊂ R^d. In practice, d = 2, 3 are the most common cases.
To find φ, a typical variational model (Chen et al. 2019) takes the form of a weighted sum

min_u  F(u) + α S(u) + β C(u),   (6)

where F(u), S(u), C(u) are, respectively, the fitting term to align T and R, the regularization term to overcome the ill-posedness of minimizing the fitting term alone, and the control term to ensure that the underlying map φ does not have folding (e.g., by making φ diffeomorphic).
Flexibility exists in specifying each of the three terms in (6) differently, though none of these flexibilities is sufficient to construct a robust model for a wider class of problems than with a fixed choice of terms.
First, since the fitting term F is supposed to measure the dissimilarity of T and R, it has many possible choices, especially for multi-modality pairs T, R (e.g., T from MRI and R from ultrasound).
For single-modality images (e.g., when both T and R are CT images), a popular choice for F is the SSD (sum of squared differences)

F(u) = ∫_Ω |T(x + u) − R(x)|² dx.

For multimodal image pairs, one may take the popular choice of mutual information (Maes et al. 1997). This statistical measure has been improved several times since 1997. One alternative is the normalized gradient differences (NGD)

F(u) = ∫_Ω |∇_n T(x + u) − ∇_n R(x)|² dx,

where ∇_n T = ∇T/|∇T|; however, we remark that this fitting term is not very robust, and a better variant is proposed in Theljani and Chen (2019a).
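For concreteness, the two fitting terms can be discretized as below; this is a minimal numpy sketch in which the warped image Tu = T(x + u) is assumed to be precomputed, and the forward-difference gradient and the small constant eps in the normalization are our own implementation choices.

```python
import numpy as np

def ssd(Tu, R):
    """Sum of squared differences between the warped template Tu = T(x+u) and R."""
    return np.sum((Tu - R) ** 2)

def normalized_gradient(I, eps=1e-6):
    """Forward-difference gradient, normalized pixelwise (grad I / |grad I|)."""
    gx = np.diff(I, axis=0, append=I[-1:, :])
    gy = np.diff(I, axis=1, append=I[:, -1:])
    norm = np.sqrt(gx ** 2 + gy ** 2) + eps
    return gx / norm, gy / norm

def ngd(Tu, R):
    """Normalized gradient difference between Tu and R."""
    tx, ty = normalized_gradient(Tu)
    rx, ry = normalized_gradient(R)
    return np.sum((tx - rx) ** 2 + (ty - ry) ** 2)

# toy test: a bright square and a re-illuminated copy of it
R = np.zeros((32, 32)); R[8:20, 8:20] = 1.0
Tu = 0.5 * R + 0.2                     # intensity change only, no misalignment
print(ssd(Tu, R), ngd(Tu, R))          # SSD is large, NGD stays near zero
```

The small test at the end illustrates the point made above: a pure brightness change inflates the SSD but leaves the normalized gradients essentially unchanged.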
Second, as for designing the regularizer S, one way is to regularize all deformation directions individually, for example S(u) = Σ_{l=1}^d ∫_Ω |∇u_l|² dx, but one may also introduce some coupling between these individual terms.
Finally, the control term C is designed to ensure det(∇φ) > 0. If it makes sense to achieve volume or area preservation of features of T and R, that is, det(∇φ) = 1, a simple method is to define

C(u) = ∫_Ω (det(∇φ) − 1)² dx.
However, if this is not appropriate for other applications, a robust method is to define

C(u) = ∫_Ω ψ(μ(φ))² dx,

where ψ is some smooth function (Zhang and Chen 2018) and μ is the Beltrami coefficient of the mapping φ viewed on the complex plane, φ = φ1(x) + iφ2(x), for d = 2. The central idea is the equivalence |μ| < 1 ⇔ det(∇φ) > 0, which facilitates the design of an unconstrained optimization problem (Lam and Lui 2014).
Of course, it is entirely possible to propose a minimization problem like (6) without its third term and to add the constraint det(∇φ) > 0 instead, as done in Zhang et al. (2016) and Thompson and Chen (2019). However, nonlinear constraints are not easy to deal with in numerical implementations.
One drawback of the Beltrami coefficient is that such a quantity μ does not exist for d ≥ 3, though there are some recent attempts to generalize it to higher dimensions. The recent work by Zhang and Chen (2020) designed a 3D Beltrami-like coefficient that possesses the same property as in 2D and hence extended the classical framework.
Another method to replace the third term in (6) is the so-called inverse consistent
formulation where the folding is avoided by simultaneously registering T to R by φ
and also R to T by ψ. The central idea is φ(ψ) = I or ψ(φ) = I so that the map is
inversely consistent and does not fold. See Christensen et al. (2007), Thompson and
Chen (2019), Theljani and Chen (2019c) and Chen and Ye (2010).
To illustrate the idea of using game theory, let us first consider the diffusion registration model for single-modality images before we elaborate on more robust models in later subsections.
Let us start with the simple diffusion model (Fischer and Modersitzki 2002), which takes the following form:

min_{u ∈ W^{1,2}(Ω)}  J(u) = ∫_Ω |∇u|² dx + α M(u),   (8)
where M(·) is a similarity measure. One application of game theory to this model is to consider two different similarity measures. In the simple case of monomodal images, the sum of squared differences is used because of the grey value constancy assumption. However, in some scenarios, the SSD has a big drawback: it is quite susceptible to slight changes in brightness, which often appear in natural scenes. Therefore, it is useful to allow some small variations in the grey values and to help determine the displacement vector by a criterion that is invariant under grey value changes. Thus, to obtain a model which is less sensitive to illumination variations, it is interesting to combine the SSD with another measure which can capture more information, such as gradients, and fulfill the gradient constancy assumption.
This approach may, however, lead to a solution which is sensitive to the choice of the weighting parameters λ1 and λ2 between the two measures. In fact, if more weight is put on the SSD term, the model does not work well, because the SSD cannot handle the regions in the images that are distorted by varying illumination. Only a few regions, where there is no big difference in the intensity variation between the two images, are well registered. Conversely, if the NGD contribution to the model is made too strong by taking a large value of λ2, then the solution is well registered in the regions of varying intensity, whereas the registration quality is poorer than with the SSD model in clean regions, that is, regions without varying intensities.
The game reformulation therefore introduces two displacement fields, u and v, one controlled by each player, with the costs

J1(u, v) = ∫_Ω |∇u|² dx + ∫_Ω |T(x + u) − R(x)|² dx + λ ∫_Ω (u − v)² dx,   (11)

J2(u, v) = ∫_Ω |∇v|² dx + ∫_Ω |∇_n T(x + v) − ∇_n R(x)|² dx + λ ∫_Ω (u − v)² dx.   (12)
The first energy uses the sum of squared differences as similarity measure, whereas the second energy uses the normalized gradient difference (NGD) term. The third part is a coupling term which serves as the communication between the two players u and v. The first player tries to minimize his own cost J1(·), taking into account the information about the gradient consistency coming from the second player v through the coupling term, and vice versa.
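A one-dimensional toy version of the game (11)-(12) illustrates the mechanism; in the sketch below the signals, the brightness change, the parameter values, and the use of a generic quasi-Newton solver (scipy) for each player's subproblem are all our own simplifications, so it mirrors only the structure of the model, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

# 1D toy: the template is a shifted, re-illuminated copy of the reference,
# so SSD alone struggles while the gradient-based player copes with it.
n = 64
x = np.arange(n, dtype=float)
R = np.exp(-0.5 * ((x - 32) / 6) ** 2)
T = 1.5 * np.exp(-0.5 * ((x - 36) / 6) ** 2) + 0.2    # shift by 4, brightness change

def warp(I, u):                        # T(x + u) by linear interpolation
    return np.interp(x + u, x, I)

def ngrad(I, eps=1e-2):                # normalized gradient, as in the NGD measure
    g = np.gradient(I)
    return g / (np.abs(g) + eps)

def J1(u, v, lam):                     # SSD player, cf. (11)
    return (np.sum(np.diff(u) ** 2) + np.sum((warp(T, u) - R) ** 2)
            + lam * np.sum((u - v) ** 2))

def J2(v, u, lam):                     # NGD player, cf. (12)
    return (np.sum(np.diff(v) ** 2) + np.sum((ngrad(warp(T, v)) - ngrad(R)) ** 2)
            + lam * np.sum((u - v) ** 2))

lam = 1.0
u = np.zeros(n)
v = np.zeros(n)
for k in range(5):                     # alternating best responses (Nash iteration)
    u = minimize(J1, u, args=(v, lam), method="L-BFGS-B").x
    v = minimize(J2, v, args=(u, lam), method="L-BFGS-B").x

# the true displacement near the bump is about +4 grid points
print("mean displacement near the bump:", u[26:38].mean())
```

Setting lam = 0 in the sketch decouples the two players entirely, which is the "no communication" case discussed in the example below.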
Examples
Figure 2 shows an example of using the game model for registering a pair of MRI images. We assess the registration quality by measuring the normalized cross correlation coefficient (NCC) between the registered image T(u) and R (the closer the NCC is to 1, the better the registration). The example mainly illustrates how two players in a game model can cooperate to achieve better registration quality. By contrast, when the two models are considered separately, that is, with no communication (λ = 0), the first model in (11) is unable to achieve an acceptable result.
Fig. 2 Example 1: the game approach for registering a pair of MRI images. The template image
T contains some undesirable artifact. Clearly, the game approach is able to cope with this case
because of the use of two different measures. (a) The reference R (b) The template T (c)
Model (11) for λ = 0: T (u), NCC=0.61 (d) Model (12) for λ = 0: T (u), NCC=0.79 (e) Game:
Model (11) for λ = 1: T (u), NCC=0.81 (f) Game: Model (12) for λ = 0: T (u), NCC=0.81
R1 = mR + s,   T(u) ≈ R1,   with   Tc(u) = (T(u) − s)/m ≈ R,   (13)

where T(u) is the uncorrected but registered image, carrying the bias field features from T while aligned with R; that is, one may minimize, in some norm, one of the fidelity terms

‖mR + s − T(u)‖   or   ‖(T(u) − s)/m − R‖

for m, s, u.
Non-game Approach
A classical variational approach for joint full bias correction and image registration consists in solving the multivariate optimization problem

JM:   min_{u,m,s}  J(u, m, s) := λ ∫_Ω |mR + s − T(u)|² dx + R(u, s, m),   (14)

CV:   min_{u,s,m_l,K_l}  { J(u, s, m_l, K_l) = R(u, s, m_l, K_l) + λ1 ∫_Ω |T(u) − e^{K_l} − s|² dx + λ2 ∫_Ω |m_l + R_l − K_l|² dx },   (16)
where u is the main deformation field variable, R(·) contains regularization terms associated with all four unknowns (to be specified), and the rest of the energy consists of two fidelity terms. Clearly, there are no multiplicative terms in (16) by design. One would normally specify R(·) and try to solve the joint optimization problem by some technique, for example, the alternating direction method of multipliers (ADMM) (Boyd et al. 2011) or an augmented Lagrangian method (Bonnans et al. 2006; Boyd et al. 2011). The problem (16) is split into 4 subproblems, one for each of the main variables u, s, m_l, K_l. There are two challenges: (i) choosing the 5 parameters (assuming there are 3 new parameters from R(·)) suitably is a highly nontrivial task; (ii) one cannot avoid coupling all 4 variables in any subproblem. These challenges can be addressed using a game theory formulation, as described in the sequel.
Game Model
It was shown in Theljani and Chen (2019b) that it is more convenient to reformulate (16) into another form using the Nash game idea, where both of these two
challenges are overcome: first, each subproblem has only one parameter, which can be tuned for that subproblem more easily; second, it is possible to modify the above subproblems to reduce the couplings and hence improve convergence. The authors demonstrated that the game model offers a better solution in two main respects: the choice of the underlying parameters and the proof of solution existence. In fact, the K_l subproblem in model (16) has three terms and involves two penalty parameters λ1 and λ2, which are assumed to be large enough. The solution is sensitive to these two parameters, and their optimal choice is nontrivial. We shall reformulate this problem to yield only one parameter (instead of two) by considering a game approach that has a separable structure and makes the model less sensitive to these parameters. The joint model (16) was reformulated as a game whose solution is a Nash equilibrium defined by (A1, A2, A3, A4) = (u, s, m_l, K_l) in the space X = W × W^{1,2}(Ω) × W^{1,2}(Ω) × W^{1,2}(Ω), where W = W^{2,2}(Ω, R²) ∩ W_0^{1,2}(Ω, R²). The space X is endowed with the norm
‖z‖_X = ( ‖u‖²_W + ‖∇s‖²_{W^{1,2}(Ω)} + ‖∇m_l‖²_{W^{1,2}(Ω)} + ‖∇K_l‖²_{W^{1,2}(Ω)} )^{1/2},

where ‖u‖_W = ( ‖∇u‖²_2 + ‖∇²u‖²_2 )^{1/2}. The game formulation allows many choices of energies R_i(·) and G_i(·) whose terms need not be part of each other. The choice of the different energies leads to either potential or non-potential games (Monderer and Shapley 1996).
(17)
where R_i(·) is the regularization term in energy i. There are many possible choices of regularization, leading to different solution spaces. For the deformation u, the authors in Theljani and Chen (2019b) used regularizers based on combined first- and second-order derivatives. Using only first-order derivatives, that is, the H¹ semi-norm, is sensitive to affine preregistration. This problem is avoided by combining it with a second-order derivative term, which is not sensitive to (affine) preregistration as it has the affine transformations in its kernel. Moreover, this choice
penalizes oscillations and also allows smooth transformations in order to obtain visually pleasing registration results. The variables K_l, m_l, and s are chosen in the space W^{1,2}(Ω), and one could consider different spaces such as W^{2,2}(Ω) or the space of functions of bounded variation BV(Ω). The formulation in (17) is a special case of a game formulation known as a potential game (PG) (Monderer and Shapley 1996), which amounts to finding a minimizer of the energy L(·) = Σ_{i=1}^4 J_i(u, s, m_l, K_l) in (16); the game model then reduces to an ADMM algorithm if alternating iterations are used, and a Nash equilibrium of (16) is a minimizer of Σ_{i=1}^4 J_i(u, s, m_l, K_l). We refer the reader to Monderer and Shapley (1996), Attouch and Soueycatt (2008), and Attouch et al. (2008) for more details about potential games in PDEs.
(18)
where 𝒦 = {K_l ∈ L²(Ω); K_min ≤ K_l ≤ K_max} is a closed and convex set and ι_𝒦(·) is a projection into 𝒦. The variables K_l are bounded for theoretical reasons, in order to prove the existence of a Nash equilibrium (NE). In this case, an NE is not a minimizer of Σ_{i=1}^4 J_i(u, s, m_l, K_l), which makes the proof of existence difficult. Formally, this Nash game problem is called a non-potential game (denoted NPG). Clearly, the essential simplification is in G4, and there are other possible alternative formulations, for example, using the L¹ semi-norm. These changes simplify the K_l-problem in (17), equivalently in (16), where the K_l-energy has three terms and necessitates two regularization parameters λ4 and λ5. In contrast, in the game approach (18), the same problem consists only of a regularization and one fidelity term, that is, it has only one parameter λ4. Moreover, to establish any theory for (18), the non-convexity must be addressed; for example, the energy G1(·) is non-convex with respect to u. The non-convexity means that we cannot apply the Nash theorem (Nash 1951) to show the existence of an NE.
Iterative Algorithm
To compute the NE, the authors in Theljani and Chen (2019b) used an alternating forward-backward (ADMM-like) algorithm, by means of an iterative process over the four subproblems with the stopping test:

• If ‖z^(k+1) − z^(k)‖²₂ / ‖z^(k)‖²₂ ≤ ε, stop. Otherwise set k = k + 1 and go to Step 1.
Examples
The experiments show that the game approach can be significantly robust in the presence of bias noise and varying illumination. In all examples, the weighting parameters were fixed as λ1 = 200 for the u-subproblem, λ2 = 20 for the s-subproblem, λ3 = 1 for the m_l-subproblem, and λ4 = 5 for the K_l-subproblem. A multi-resolution technique was used to initialize the displacement u in order to avoid local minima and to speed up the registration. The game model, denoted by "Game," is compared with the joint model (14), denoted "JM," and the classical variational model (16), denoted "CV." The latter models are the more natural choices for this class of joint problems. The authors also compared with a mutual-information-based multi-modality model in which an energy with the same regularizer R1(·) and mutual information as similarity measure is minimized (denoted by "MI" below). The numerical experiments on "MI" are performed using the publicly available image registration toolbox Flexible Algorithms for Image Registration (FAIR) (http://www.siam.org/books/fa06/), whose implementation is based on the Gauss-Newton method.
In the examples, they show the registered images T(u) and the corrected images Tc(u). The latter are defined by the formula Tc = (T(u) − s)/e^{m_l} for "Game" and "CV", and Tc = (T(u) − s)/m for "JM". In contrast, the final registered image for "MI" is just T(u). The normalized cross correlation coefficient (NCC) between Tc and R and between T(u) and R was used as the evaluation metric to quantify the performance of the models in the comparison (the closer the NCC is to 1, the better the alignment).
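One common way to compute this metric is sketched below (a minimal numpy version; other normalizations of the NCC exist, so this is only indicative of how such values may be obtained).

```python
import numpy as np

def ncc(A, B):
    """Normalized cross correlation between two images (1 = perfect linear match)."""
    a = A.ravel() - A.mean()
    b = B.ravel() - B.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# example: the NCC of an image with a brightened copy of itself is (almost) 1
rng = np.random.default_rng(0)
R = rng.random((64, 64))
print(ncc(R, 2.0 * R + 0.1))
```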
Fig. 3 Example 2: Comparison of 3 different models to register MRI T-1 and T-2 images. From
this figure and Fig. 4, we see that Game model gives the best registration result. (a) The reference
R (b) The template T (c) Game: T (u) only, NCC=0.81 (d) JM: T (u) only, NCC=0.78 (e) MI:
T (u), NCC=0.77 (f) CV: Tc (u), NCC=0.79
in the middle of the images, the game model is the most advantageous, as can be observed in the zoomed details in Fig. 4. For parameter tuning, the authors tested different values; these are tabulated in Table 1, which reports the registration results for the different parameters λi (i = 1, ..., 4). The table shows that the game approach is stable.
Fig. 4 Example 2: Zoomed regions compared for the different models registering MRI T-1 and T-2 images. Again the Game model is the best at solving the registration and the intensity correction jointly, whereas the JM model cannot solve both problems jointly; only the image correction task is successful. (a) The reference R (b) The template T (c) Game: T(u) (d) JM: T(u) (e) MI: T(u) (f) CV: T(u)
Table 1 Parameter tuning for the pair of MRI images in Fig. 3 using Game. In the first column group, we fix λ3 and λ4 and vary λ1 and λ2. In the second group, we vary λ1 and λ3 with λ2 and λ4 fixed, whereas, in the last group, we vary λ1 and λ4 for fixed λ2 and λ3. The NCC values for the different parameter choices are comparable

λ1    | λ2    NCC   | λ3    NCC   | λ4    NCC
100   | 5     0.77  | 0.5   0.78  | 1     0.78
150   | 15    0.79  | 1     0.80  | 5     0.80
200   | 20    0.80  | 5     0.80  | 20    0.79
250   | 40    0.79  | 10    0.77  | 50    0.78
      | (λ3 = 1, λ4 = 5) | (λ2 = 20, λ4 = 5) | (λ2 = 20, λ3 = 1)
The result of both registration and correction is satisfactory, and this underlines the performance of this model in solving both problems jointly and efficiently, which is not the case for CV, JM, and MI, as they only handle the correction task correctly and fail in the registration. For this particular example, T(u) is very useful, as clinicians like to see where the contrasts from perfusion CT ("artifacts") would be located on the CT.
true and generated data. They are trained in an adversarial and iterative manner until convergence is achieved when both are satisfied, that is, an equilibrium situation is reached. The illustration in Fig. 6 gives a rough idea of the work-flow of the generator and discriminator in generative adversarial networks.
Fig. 7 The model architecture of GANs model for the image generation problem
min_G max_D  J(D, G) = E_{x∼p_data(x)} log[D(x)] + E_{z∼p_g(z)} log[1 − D(G(z))],   (23)

where E_{x∼p_data(x)} denotes the expected value over all real data instances and E_{z∼p_g(z)} that over the noise prior. It is easy to prove the existence of a Nash equilibrium for this model, as it is a two-player zero-sum minimax game. However, the main challenge in GANs is the training, as finding a Nash equilibrium is not straightforward. The model is trained in an alternating way; the D-problem consists of solving the maximization problem

max_D  J(D, G) = E_{x∼p_data(x)} log[D(x)] + E_{z∼p_g(z)} log[1 − D(G(z))],   (24)
where the first term helps the discriminator recognize real images, whereas the second helps it recognize fake ones. The G-problem consists in solving the minimization problem

min_G  E_{z∼p_g(z)} log[1 − D(G(z))].   (25)
The GANs training algorithm involves training both the discriminator and the
generator nets in parallel. The algorithm used in the original 2014 paper by
Goodfellow (Goodfellow et al. 2014) can be summarized as follows. For each training iteration:

• For k steps: sample a mini-batch of m noise samples {z^(1), …, z^(m)} from the noise prior p_g(z) and a mini-batch of m examples {x^(1), …, x^(m)} from the data distribution p_data(x), and update the discriminator by ascending its stochastic gradient

∇_{θ_d} (1/m) Σ_{i=1}^m [ log D(x^(i)) + log(1 − D(G(z^(i)))) ].

• Sample a mini-batch of m noise samples {z^(1), …, z^(m)} from the noise prior p_g(z).
• Update the generator by descending its stochastic gradient

∇_{θ_g} (1/m) Σ_{i=1}^m log(1 − D(G(z^(i)))).

End for.
As the iterations proceed, the generator G becomes better and better at generating realistic images, while the discriminator D becomes better and better at identifying which images are real and which are fake.
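The alternating scheme translates almost directly into code. The PyTorch sketch below trains a tiny fully connected GAN on a synthetic two-dimensional Gaussian blob standing in for MNIST, so that the example stays self-contained; the network sizes, learning rates, and data distribution are our own choices, and the generator update uses the common non-saturating variant rather than descending log(1 − D(G(z))) directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
zdim, xdim = 8, 2                       # noise and data dimensions (toy choice)

G = nn.Sequential(nn.Linear(zdim, 32), nn.ReLU(), nn.Linear(32, xdim))
D = nn.Sequential(nn.Linear(xdim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(m):                      # stand-in for samples from p_data
    return torch.randn(m, xdim) * 0.3 + torch.tensor([2.0, -1.0])

m = 64
for step in range(2000):
    # --- discriminator step: maximize log D(x) + log(1 - D(G(z))) ---
    x = real_batch(m)
    z = torch.randn(m, zdim)
    fake = G(z).detach()                # do not backpropagate into G here
    loss_D = bce(D(x), torch.ones(m, 1)) + bce(D(fake), torch.zeros(m, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- generator step: non-saturating variant, maximize log D(G(z)) ---
    z = torch.randn(m, zdim)
    loss_G = bce(D(G(z)), torch.ones(m, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

print("generated sample mean:", G(torch.randn(1000, zdim)).mean(0).detach())
```

After training, the generated samples should concentrate around the mean of the synthetic "real" distribution, which is the toy analogue of the generator learning the features of the training dataset.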
Examples
A few examples of images created by GANs for the MNIST dataset are given in Fig. 8.
Several approaches for the image segmentation problem based on the GANs
framework were proposed in Luc et al. (2016), Mahapatra et al. (2018), and Tanner
et al. (2018). We describe here the GANs model proposed in Luc et al. (2016) for the particular case of semantic segmentation.
Fig. 8 Starting from random noise images, the generator gradually learns over the iterations to emulate the features of the training dataset; it produces images resembling handwritten digits
The idea consists of using a generative adversarial network (GAN) for RGB image segmentation, where the trained network takes an RGB image x of size H × W × 3 as input and outputs the segmented image, represented as a class label at each pixel location.
Model Loss
The generator and the discriminator are trained together to optimize a global loss function which is a weighted sum of two terms. Given a dataset of N training color images x_n of size H × W × C and corresponding label maps y_n, the authors defined the global loss as
Fig. 9 Figure taken from Luc et al. (2016). Overview of the proposed approach. Left: segmenta-
tion net takes RGB image as input, and produces per-pixel class predictions. Right: Adversarial
net takes label map as input and produces class label (1=ground truth, or 0=synthetic). Adversarial
optionally also takes RGB image as input
J(G, D) = Σ_{n=1}^N ℓ_mce(G(x_n), y_n) − λ [ ℓ_bce(D(x_n, y_n), 1) + ℓ_bce(D(x_n, G(x_n)), 0) ],   (26)

where λ = 10 controls the contribution of the two terms and the multi-class cross-entropy loss is

ℓ_mce(G(x_n), y_n) = − Σ_{i=1}^{H×W} Σ_{c=1}^{C} y_n^{ic} log(G(x_n)^{ic}).
The term ℓ_mce(G(x_n), y_n) denotes the multi-class cross-entropy loss for the predictions G(x_n) and is a standard loss for semantic segmentation models. It encourages the segmentation model to predict the right class label at each pixel location independently. The discriminator output D(x_n, y_n) ∈ [0, 1] represents the scalar probability of y_n being the ground-truth label map of x_n rather than a map produced by the generator G. The second part of the loss corresponds to the adversarial convolutional network and is large if the adversarial network can discriminate the segmentation maps generated by G from ground-truth label maps.
Similar to all GANs models, this is a min-max game model where the full loss is minimized with respect to the segmentation generator G and maximized with respect to the adversarial network D.
Fig. 10 Figure taken from Luc et al. (2016). Segmentations on Stanford Background. Class
probabilities without (first row) and with (second row) adversarial training. In the last row the
class labels are superimposed on the image
Training
The model is trained in an alternating way. The D-problem consists in solving the minimization problem

min_D  ℓ_bce(D(x_n, y_n), 1) + ℓ_bce(D(x_n, G(x_n)), 0),

where the first term helps the adversarial network recognize real (ground-truth) labels, whereas the second helps it recognize fake ones. The G-problem consists in solving the minimization problem

min_G  ℓ_mce(G(x_n), y_n) − λ ℓ_bce(D(x_n, G(x_n)), 0).   (29)
The GANs training algorithm involves training both the discriminator and the
generator nets in parallel.
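A rough PyTorch sketch of the two training sub-problems is given below; the tiny networks, the random stand-in data, and the fact that the adversarial net sees only the label map (the optional RGB input of Fig. 9 is omitted) are our own simplifications of Luc et al. (2016).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, H, W, N = 5, 32, 32, 8                     # classes, image size, batch (toy sizes)
seg = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, C, 3, padding=1))                 # generator: per-pixel logits
adv = nn.Sequential(nn.Conv2d(C, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(16 * (H // 2) * (W // 2), 1))   # adversarial net
mce = nn.CrossEntropyLoss()                   # multi-class cross-entropy (per pixel)
bce = nn.BCEWithLogitsLoss()                  # binary cross-entropy for the adversarial net
opt_s = torch.optim.Adam(seg.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adv.parameters(), lr=1e-3)
lam = 10.0                                    # weighting as reported above

x = torch.rand(N, 3, H, W)                    # random stand-ins for images
y = torch.randint(0, C, (N, H, W))            # and for ground-truth label maps
y_onehot = F.one_hot(y, C).permute(0, 3, 1, 2).float()

# D-problem: classify ground-truth label maps as real, predicted ones as fake
pred = torch.softmax(seg(x), dim=1).detach()
loss_D = bce(adv(y_onehot), torch.ones(N, 1)) + bce(adv(pred), torch.zeros(N, 1))
opt_a.zero_grad(); loss_D.backward(); opt_a.step()

# G-problem, cf. (29): per-pixel cross-entropy plus the adversarial term
logits = seg(x)
loss_G = mce(logits, y) - lam * bce(adv(torch.softmax(logits, dim=1)), torch.zeros(N, 1))
opt_s.zero_grad(); loss_G.backward(); opt_s.step()
print(float(loss_D), float(loss_G))
```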
Example
The numerical example in Fig. 10 illustrates a comparison between the segmentation results using the adversarial (GANs) approach and a non-adversarial approach. The results show that the GANs approach clearly enhances the segmentation compared with a classical, non-adversarial deep learning approach.
Conclusion
References
Aghajani, K., Manzuri, M.T., Yousefpour, R.: A robust image registration method based on total
variation regularization under complex illumination changes. Comput. Meth. Prog. Biomed.
134, 89–107 (2016)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Alternating proximal algorithms for weakly
coupled convex minimization problems. applications to dynamical games and pde’s. J. Convex
Anal. 15(3), 485 (2008)
Attouch, H., Soueycatt, M.: Augmented lagrangian and proximal alternating direction methods
of multipliers in hilbert spaces. applications to games, pde’s and control. Pac. J. Optim. 5(1),
17–37 (2008)
Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., Graepel, T.: The mechanics of
n-player differentiable games. arXiv preprint arXiv:1802.05642 (2018)
Bansal, R., Staib, L.H., Peterson, B.S.: Correcting nonuniformities in MRI intensities using entropy
minimization based on an elastic model. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 78–86. Springer (2004)
Benki, A., Habbal, A., Mathis, G., Beigneux, O.: Multicriteria shape design of an aerosol can. J.
Comput. Design Eng. 11 (2015). https://doi.org/10.1016/j.jcde.2015.03.003. https://hal.inria.fr/
hal-01144269
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theo-
retical and Practical Aspects. Springer Science & Business Media (2006)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1),
1–122 (2011)
Chamekh, R., Habbal, A., Kallel, M., Zemzemi, N.: A nash game algorithm for the solution of
coupled conductivity identification and data completion in cardiac electrophysiology. Math.
Modell. Nat. Phenom. 14(2), 15 (2019). https://doi.org/10.1051/mmnp/2018059. https://hal.
archives-ouvertes.fr/hal-01923819
Chang, H., Huang, W., Wu, C., Huang, S., Guan, C., Sekar, S., Bhakoo, K.K., Duan, Y.: A new
variational method for bias correction and its applications to rodent brain extraction. IEEE
Trans. Med. Imaging 36(3), 721–733 (2017)
Chen, K., Lui, L.M., Modersitzki, J.: Image and surface registration. In: Handbook of Numerical
Analysis – Processing, Analyzing and Learning of Images, Shapes, and Forms, vol. 20. Elsevier
(2019)
Chen, Y., Ye, X.: Inverse consistent deformable image registration. In: The Legacy of Alladi
Ramakrishnan in the Mathematical Sciences, pp. 419–440. Springer (2010)
Christensen, G.E., Song, J.H., Lu, W., ElNaqa, I., Low, D.A.: Tracking lung tissue motion and
expansion/compression with inverse consistent image registration and spirometry. Med. Phys.
34, 2155–2163 (2007)
Chumchob, N., Chen, K.: Improved variational image registration model and a fast algorithm for its
numerical approximation. Numer. Meth. Partial Differen. Equations 28(6), 1966–1995 (2012)
Mumford, D.J.S.: Optimal approximations by piecewise smooth functions and variational prob-
lems. Commun. Pure Appl. Math. 42, 577–685 (1989)
Desideri, J.A., Duvigneau, R., Habbal, A.: Multiobjective design optimization using nash Games.
In: M. Vasile, V.M. Becerra (eds.) Computational Intelligence in the Aerospace Sciences,
Progress in Astronautics and Aeronautics. American Institute of Aeronautics and Astronautics
(AIAA) (2014). https://hal.inria.fr/hal-00923584
Duan, Y., Chang, H., Huang, W., Zhou, J., Lu, Z., Wu, C.: The l_{0} regularized mumford–shah
model for bias correction and segmentation of medical images. IEEE Trans. Image Process.
24(11), 3927–3938 (2015)
Ebrahimi, M., Martel, A.L.: A general pde-framework for registration of contrast enhanced
images. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 811–819. Springer (2009)
Fischer, B., Modersitzki, J.: Fast diffusion registration. Contemp. Math. 313, 117–12 (2002)
Friedman, A.: Stochastic differential games. J. Differen. Equ. 11(1), 79–108 (1972)
Gemp, I., Mahadevan, S.: Global convergence to the equilibrium of gans using variational
inequalities. arXiv preprint arXiv:1808.01531 (2018)
Ghaffari, A., Fatemizadeh, E.: Image registration based on low rank matrix: Rank-regularized ssd.
IEEE Trans. Med. Imaging 37(1), 138–150 (2018)
Gibbons, R.S.: Game Theory for Applied Economists. Princeton University Press (1992)
Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality
perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551 (2018a)
Gidel, G., Hemmat, R.A., Pezeshki, M., Lepriol, R., Huang, G., Lacoste-Julien, S., Mitliagkas, I.:
Negative momentum for improved game dynamics. arXiv preprint arXiv:1807.04740 (2018b)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing
Systems, pp. 2672–2680 (2014)
Habbal, A.: A topology Nash game for tumoral antiangiogenesis. Struct. Multidiscip. Optim. 30(5),
404–412 (2005)
Habbal, A., Kallel, M.: Neumann-Dirichlet nash strategies for the solution of elliptic cauchy
problems. SIAM J. Control. Optim. 51(5), 4066–4083 (2013). https://hal.inria.fr/hal-00923574
Habbal, A., Kallel, M., Ouni, M.: Nash strategies for the inverse inclusion Cauchy-Stokes problem.
Inverse Prob. Imag. 13(4), 36 (2019). https://doi.org/10.3934/ipi.2019038. https://hal.inria.fr/
hal-01945094
Habbal, A., Petersson, J., Thellner, M.: Multidisciplinary topology optimization solved as a Nash
game. Int. J. Numer. Meth. Engng 61, 949–963 (2004)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-
scale update rule converge to a local nash equilibrium. In: Advances in Neural Information
Processing Systems, pp. 6626–6637 (2017)
Hu, J., Wellman, M.P.: Nash q-learning for general-sum stochastic games. J. Mach. Learn. Res.
4(Nov), 1039–1069 (2003)
Isaacs, R.: Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit,
Control and Optimization. Courier Corporation (1999)
Kallel, M., Aboulaich, R., Habbal, A., Moakher, M.: A nash-game approach to joint image
restoration and segmentation. Appl. Math. Model. 38(11-12), 3038–3053 (2014)
Kim, Y., Tagare, H.D.: Intensity nonuniformity correction for brain mr images with known voxel
classes. SIAM J. Imag. Sci. 7(1), 528–557 (2014)
Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion
deblurring using conditional adversarial networks. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 8183–8192 (2018)
Lam, K.C., Lui, L.M.: Landmark- and intensity-based registration with large deformations via
quasi-conformal maps. SIAM J. Imag. Sci. 7(4), 2364–2392 (2014)
Li, C., Gatenby, C., Wang, L., Gore, J.C.: A robust parametric method for bias field estimation and
segmentation of mr images. In: IEEE Conference on Computer Vision and Pattern Recognition,
2009. CVPR 2009, pp. 218–223. IEEE (2009)
Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks.
arXiv preprint arXiv:1611.08408 (2016)
Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image
registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–
198 (1997)
Mahapatra, D., Antony, B., Sedai, S., Garnavi, R.: Deformable medical image registration using
generative adversarial networks. In: 2018 IEEE 15th International Symposium on Biomedical
Imaging (ISBI 2018), pp. 1449–1453. IEEE (2018)
Modersitzki, J.: FAIR: Flexible Algorithms for Image Registration. SIAM publications (2009)
Modersitzki, J., Wirtz, S.: Combining homogenization and registration. In: International Workshop
on Biomedical Image Registration, pp. 257–263. Springer (2006)
Monderer, D., Shapley, L.S.: Potential games. Games Econom. Behav. 14(1), 124–143 (1996)
Nagarajan, V., Kolter, J.Z.: Gradient descent gan optimization is locally stable. In: Advances in
Neural Information Processing Systems, pp. 5585–5595 (2017)
Nash, J.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 36(1), 48–49 (1950)
Nash, J.: Non-cooperative games. Ann. Math. 286–295 (1951)
Neyman, A., Sorin, S.: Stochastic Games and Applications, vol. 570. Springer Science & Business
Media (2003)
Nishimura, R., Hayashi, S., Fukushima, M.: Robust nash equilibria in n-person non-cooperative
games: Uniqueness and reformulation. Pac. J. Optim. 5(2), 237–259 (2009)
Nowozin, S., Cseke, B., Tomioka, R.: f-gan: Training generative neural samplers using variational
divergence minimization. In: Advances in Neural Information Processing Systems, pp. 271–279
(2016)
Park, C.R., Kim, K., Lee, Y.: Development of a bias field-based uniformity correction in magnetic
resonance imaging with various standard pulse sequences. Optik 178, 161–166 (2019)
Rak, M., König, T., Tönnies, K.D., Walke, M., Ricke, J., Wybranski, C.: Joint deformable liver
registration and bias field correction for mr-guided hdr brachytherapy. Int. J. Comput. Assist.
Radiol. Surg. 12(12), 2169–2180 (2017)
Roy, S., Borzì, A., Habbal, A.: Pedestrian motion modeled by FP-constrained Nash games. R. Soc.
Open Sci. (2017). https://doi.org/10.1098/rsos.170648. https://hal.inria.fr/hal-01586678
Uryas’ev, S., Rubinstein, R.Y.: On relaxation algorithms in computation of noncooperative
equilibria. IEEE Trans. Autom. Control 39, 1263–1267 (1994)
David, S., Hernández-Lerma Onésimo, G.: A survey of static and dynamic potential games. Sci.
China Math. 59(11), 2075–2102 (2016)
Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39(10), 1095–1100 (1953)
Sutton, R.S., Barto, A.G., et al.: Introduction to Reinforcement Learning, 2nd edn. MIT Press
Cambridge (2018)
Tanner, C., Ozdemir, F., Profanter, R., Vishnevsky, V., Konukoglu, E., Goksel, O.: Generative
adversarial networks for mr-ct deformable image registration. arXiv preprint arXiv:1807.07349
(2018)
Theljani, A., Chen, K.: An augmented lagrangian method for solving a new variational model based
on gradients similarity measures and high order regularization for multimodality registration.
Inv. Prob. Imag. 13, 309–335 (2019a)
Theljani, A., Chen, K.: A nash game based variational model for joint image intensity correction
and registration to deal with varying illumination. Inv. Prob. 36, 034002 (2019b)
Theljani, A., Chen, K.: A variational model for diffeomorphic multi-modal image registration using
a new correlation like measure. submitted (2019c)
Thompson, T., Chen, K.: An effective diffeomorphic model and its fast multigrid algorithm for
registration of lung ct images improved optimization methods for image registration problems.
J. Comput. Meth. Appl. Math. (2019)
Thompson, T., Chen, K.: A more robust multigrid algorithm for diffusion type registration models.
J. Comput. Appl. Math. 361, 502–527 (2019)
Van Leemput, K., Maes, F., Vandermeulen, D., Suetens, P.: Automated model-based bias field
correction of mr images of the brain. IEEE Trans. Med. Imaging 18(10), 885–896 (1999)
Vovk, U., Pernus, F., Likar, B.: A review of methods for correction of intensity inhomogeneity in
MRI. IEEE Trans. Med. Imaging 26(3), 405–421 (2007)
Wang, L., Pan, C.: Nonrigid medical image registration with locally linear reconstruction.
Neurocomputing 145, 303–315 (2014)
Zhang, D., Chen, K.: A novel diffeomorphic model for image registration and its algorithm. J.
Math. Imaging Vision 60, 1261–1283 (2018)
Zhang, D., Chen, K.: 3D orientation-preserving variational models for accurate image registration.
SIAM J. Imaging Sci. 13, 1653–1691 (2020)
Zhang, J., Chen, K., Yu, B.: A novel high-order functional based image registration model with
inequality constraint. Comput. Math. Appl. 72, 2887–2899 (2016)
First-Order Primal–Dual Methods for
Nonsmooth Non-convex Optimization 18
Tuomo Valkonen
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
Sample Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Bregman Divergences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Primal–Dual Proximal Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
Optimality Conditions and Proximal Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
Algorithm Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
Block Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
Convergence Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
A Fundamental Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
Ellipticity of the Bregman Divergences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
Ellipticity for Block-Adapted Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
Nonsmooth Second-Order Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
Second-Order Growth Conditions for Block-Adapted Methods . . . . . . . . . . . . . . . . . . . . . 727
Convergence of Iterates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
Convergence of Gaps in the Convex-Concave Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
Inertial Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
A Generalization of the Fundamental Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
Inertia (Almost) as Usually Understood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
Improvements to the Basic Method Without Dual Affinity . . . . . . . . . . . . . . . . . . . . . . . . . 737
Further Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
Stochastic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
Alternative Bregman Divergences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
Functions on Manifolds and Hadamard Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
T. Valkonen ()
Center for Mathematical Modeling, Escuela Politécnica Nacional, Quito, Ecuador
Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
e-mail: [email protected]
Abstract
Keywords
Introduction
We now discuss sample imaging and inverse problems of the types (S) and (1) and
then outline our approach to solving them in the rest of the chapter.
Sample Problems
Optimization problems of the type (1) can effectively model linear inverse problems; typically one would attempt to minimize the sum of a data term and a regularizer.
The last two examples would frequently be combined with subsampling for
reconstruction from limited data.
In many important problems, T is, however, nonlinear:
In the last example, the PDE governs the physics of measurement, typically relating
boundary measurements and excitations to interior data. The methods we study in
this chapter are applied to electrical impedance tomography in Jauhiainen et al.
(2020) and Mazurenko et al. (2020).
How can we fit a nonlinear forward operator T into the framework (S), which requires both F and G* to be convex? If the noise model Φ : R^n → R is convex, proper, and lower semicontinuous, we can write (2) using the Fenchel conjugate Φ* and the coupling term K_{TA}(x, (y_1, y_2)) := ⟨z − T(x)|y_1⟩ + ⟨Ax|y_2⟩ as a saddle-point problem (3). This is of the form (S) for the functions F̃ ≡ 0 and G̃*(y_1, y_2) := Φ*(y_1) − G*(y_2). Even for linear T, although (2) is readily of the form (1) and hence (S), this reformulation may allow expressing (2) in the form (S) with both F̃ and G̃* "prox-simple." We will make this concept, important for the effective realization of algorithms, more precise in section "Primal–Dual Proximal Splitting."
Finally, the fully general K in (S) was shown in Clason et al. (2020) to be useful for highly nonsmooth and non-convex problems, such as the model of Geman and Geman (1984). Indeed, the "0-function"

|t|_0 := 0 if t = 0,   and   1 if t ≠ 0,

can be written in the form (S) through a suitable coupling function ρ(t, y).
For the (anisotropic) Potts model, this is applied pixelwise to a discretized image gradient computed for an n_1 × n_2 image by ∇_h : R^{n_1 n_2} → R^{2×n_1 n_2} (Clason et al. 2020):

min_{x ∈ R^{n_1 n_2}}  max_{y ∈ R^{2×n_1 n_2}}   ½‖b − x‖²_2 + Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} ρ([∇_h x]_{ij}, y_{ij}).   (4)
Outline
method. We work in Banach spaces, as was done in Hohage and Homann (2014).
To be able to define proximal-type methods in Banach spaces, in section “Bregman
Divergences,” we introduce and recall the crucial properties of the so-called
Bregman divergences.
Our main reason for working with Bregman divergences is, however, not the
generality of Banach spaces. Rather, they provide a powerful proof tool to deal with
the general K in (S). This approach allows us in section “Convergence Theory”
to significantly simplify and better explain the original convergence proofs and
conditions of Chambolle and Pock (2011), Valkonen (2014), Clason et al. (2019),
Clason et al. (2020), and Mazurenko et al. (2020). Without additional effort, they
also allow us to present block-adapted methods like those in Valkonen and Pock
(2017), Valkonen (2019), and Mazurenko et al. (2020).
Our overall approach and the internal organization of section “Convergence
Theory” centers around the following three main ingredients of the convergence
proof:
Bregman Divergences
The norm and inner product in a (real) Hilbert space X satisfy the three-point identity

⟨x − y, x − z⟩_X = ½‖x − y‖²_X − ½‖y − z‖²_X + ½‖x − z‖²_X   (x, y, z ∈ X).   (5)
2 2 2
This is crucial for convergence proofs of optimization methods (Valkonen 2020), so
we would like to have something similar in Banach spaces—or other more general
spaces. Towards this end, we let J : X → R be a Gâteaux-differentiable function1 .
Then one can define the asymmetric Bregman divergence:
Moreover, the Bregman divergence satisfies for any x̄ ∈ X the three-point identity
4.1.1.
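For the energy J(x) = ½‖x‖² the Bregman divergence reduces to the squared distance, and the three-point identity can be checked numerically; the short numpy verification below is our own illustration, not part of the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def J(x):  return 0.5 * np.dot(x, x)
def DJ(x): return x                          # Gateaux derivative of J

def B(x, y):                                 # Bregman divergence B_J(x, y)
    return J(x) - J(y) - np.dot(DJ(y), x - y)

x, y, z = rng.standard_normal((3, 5))
print(np.isclose(B(x, y), 0.5 * np.linalg.norm(x - y) ** 2))   # squared distance here
# three-point identity (7):
lhs = np.dot(DJ(z) - DJ(y), x - z)
rhs = B(x, y) - B(z, y) - B(x, z)
print(np.isclose(lhs, rhs))
```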
(1/(2L)) ‖DJ(x) − DJ(y)‖²_{X*} ≤ J(x) − J(y) − ⟨DJ(y)|x − y⟩.   (10)

Then, for any α > 0,

|⟨D₁B_J(x, y)|z − x⟩| ≤ (L/α) B_J(x, y) + (α/γ) B_J(z, x).

Proof. Since D₁B_J(x, y) = DJ(x) − DJ(y), the Cauchy–Schwarz and Young inequalities give

|⟨D₁B_J(x, y)|z − x⟩| ≤ (1/(2α)) ‖DJ(x) − DJ(y)‖²_{X*} + (α/2) ‖z − x‖²_X.

By the strong convexity, (γ/2)‖z − x‖²_X ≤ B_J(z, x), and by the smoothness property (10), (1/(2L))‖DJ(x) − DJ(y)‖²_{X*} ≤ B_J(x, y). Together these estimates yield the claim.
We now formulate a basic version of our primal–dual method. Later, in section "Inertial Terms", we improve the algorithm to be more effective when K is not affine in y.

> Notation. Throughout the manuscript, we combine the primal and dual variables x and y into variables involving the letter u: u = (x, y), u^k = (x^k, y^k), û = (x̂, ŷ), ū = (x̄, ȳ), and so on.
3 In
Banach spaces strong subdifferentiability is implied by strong convexity, as defined without
subdifferentials. In Hilbert spaces the two properties are equivalent.
Writing D_x K and D_y K for the Gâteaux derivatives of K with respect to the two variables, if K is convex-concave, basic results in convex analysis (Ekeland and Temam 1999; Bauschke and Combettes 2017) show that ū = (x̄, ȳ) solves (S) when

0 ∈ H(ū) := ( ∂F(x̄) + D_x K(x̄, ȳ),  ∂G*(ȳ) − D_y K(x̄, ȳ) ).   (12)

If X and Y were Hilbert spaces, we could in principle use the classical proximal point method (Minty 1961; Rockafellar 1976) to solve (12): given step length parameters τ_k > 0, iteratively solve u^{k+1} from

0 ∈ τ_k H(u^{k+1}) + (u^{k+1} − u^k)
Algorithm Formulation
for uk+1 . Inserting (12) and (7) for J = J 0 as defined in (14), we expand and
rearrange this implicitly defined method as:
When this map has an analytical closed-form expression, we say that F is prox-
simple (without reference to JX ). In finite dimensions, several worked out proximal
maps may be found online (Chierchia et al. 2019) or in the book (Beck 2017). Some
extend directly to Hilbert spaces or by superposition to L2 .
If K is bilinear, the two variants are exactly the same as the PDPS of Chambolle and Pock (2011). For K not affine in y, the method coincides neither with the generalized PDPS of Clason et al. (2020) nor with the version for convex-concave K from Hamedani and Aybat (2018).
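For bilinear K(x, y) = ⟨Ax|y⟩ the updates reduce to the familiar PDPS steps; the numpy sketch below applies them to the toy problem min_x ½‖x − b‖² + ‖Ax‖₁ (our own choice of F, G*, and data), for which both proximal maps are prox-simple.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(n)

# min_x 0.5*||x - b||^2 + ||A x||_1, i.e. F(x) = 0.5*||x - b||^2, G*(y) = ind{||y||_inf <= 1}
L = np.linalg.norm(A, 2)                 # operator norm ||A||
tau = sigma = 0.9 / L                    # step sizes with tau*sigma*||A||^2 < 1

def prox_F(x, tau):                      # proximal map of tau*F
    return (x + tau * b) / (1.0 + tau)

def prox_Gstar(y, sigma):                # projection onto the infinity-ball (independent of sigma)
    return np.clip(y, -1.0, 1.0)

x = np.zeros(n); y = np.zeros(m)
for k in range(500):
    x_new = prox_F(x - tau * A.T @ y, tau)
    x_bar = 2 * x_new - x                # over-relaxation step of the PDPS
    y = prox_Gstar(y + sigma * A @ x_bar, sigma)
    x = x_new

print("primal objective:", 0.5 * np.linalg.norm(x - b) ** 2 + np.linalg.norm(A @ x, 1))
```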
Block Adaptation

F(x) = Σ_{j=1}^m F_j(x_j)   and   G*(y) = Σ_{ℓ=1}^n G*_ℓ(y_ℓ),

J_X(x) = Σ_{j=1}^m τ_j^{-1} N_{X_j}(x_j)   and   J_Y(y) = Σ_{ℓ=1}^n σ_ℓ^{-1} N_{Y_ℓ}(y_ℓ).
The idea is that the blockwise step length parameters adapt the algorithm to the structure of the problem. We will return to their choice in the examples of section "Ellipticity for Block-Adapted Methods."
Recall the saddle-point formulation (3) for inverse problems with nonlinear forward operators. We can now adapt the step lengths to the constituent dual blocks.

Example 2. Let A_1 ∈ C¹(X; Y_1*) and A_2 ∈ L(X; Y_2*), and suppose the convex functions G_1 : Y_1* → R and G_2 : Y_2* → R have the preconjugates G_{1*} and G_{2*}. Then we can write the corresponding problem in the form (S) with G*(y_1, y_2) = G_{1*}(y_1) + G_{2*}(y_2) and K(x, y) = ⟨A_1(x)|y_1⟩ + ⟨A_2x|y_2⟩. The algorithm (18) then specializes accordingly, with a primal step length τ and dual step lengths σ_1, σ_2 > 0, one for each dual block. We return to their choices and the local neighborhood of convergence in Examples 8 and 17 after developing the necessary convergence theory.
Convergence Theory
We now seek to understand when the basic version (15) of the PDBS converges. The organization of this section centers around the three main ingredients of the convergence proof, as discussed in the Introduction: (i) a fundamental estimate for the iterates, (ii) the ellipticity of the Bregman divergences generated by the method, and (iii) nonsmooth second-order growth conditions guaranteeing the nonnegativity of the gap terms. With these basic ingredients, we then prove various convergence results in sections "Convergence of Iterates" and "Convergence of Gaps in the Convex-Concave Setting." The usefulness of both (ii) and (iii) will become apparent from the fundamental estimates and examples of the next section "A Fundamental Estimate."
A Fundamental Estimate

B(ū, u^N) + Σ_{k=0}^{N−1} B(u^{k+1}, u^k) + Σ_{k=0}^{N−1} G(u^{k+1}, ū) ≤ B(ū, u^0).   (D)
Inserting (C), we obtain (F). Summing the latter over k = 0, . . . , N − 1 yields (D).
(1/(2τ))‖u^N − ū‖²_X + Σ_{k=0}^{N−1} ½‖u^{k+1} − u^k‖²_X + Σ_{k=0}^{N−1} τ(F(u^{k+1}) − F(ū)) ≤ ½‖ū − u^0‖²_X.   (20)
With ū a minimizer, this clearly forces F(u^N) → F(ū) as N → ∞, suggesting why we call (D) the "descent inequality."
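This descent behavior is easy to observe numerically; the minimal sketch below runs the proximal point method on F(u) = ‖u‖₁, whose proximal map is soft thresholding, and prints the monotonically decreasing function values. It is a toy illustration of our own, not tied to the chapter's general setting.

```python
import numpy as np

def prox_l1(u, tau):                 # proximal map of tau*||.||_1 (soft thresholding)
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

F = lambda u: np.linalg.norm(u, 1)
rng = np.random.default_rng(0)
u = rng.standard_normal(5)
tau = 0.1
for k in range(6):
    u = prox_l1(u, tau)              # u^{k+1} solves 0 in tau*dF(u^{k+1}) + (u^{k+1} - u^k)
    print(k, F(u))                   # F(u^k) decreases monotonically towards min F = 0
```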
As just discussed, for Theorem 1 to provide estimates that we can use to prove the
convergence of the PDBS, we need at least the semi-ellipticity of B 0 generated by
J 0 given in (14). Deriving simple conditions that ensure such semi-ellipticity or
ellipticity is the topic of the present subsection. To do this, we need the “basic”
Bregman divergences BX and BY on both spaces X and Y to be elliptic:
The examples that follow the next general lemma will provide improved
estimates.
B_K(u', u) ≤ (L_{DK}/2) ‖u' − u‖²_{X×Y}.   (21)

Proof. By the mean value equality,

B_K(u', u) = ∫_0^1 ⟨DK(u + t(u' − u)) − DK(u)|u' − u⟩ dt ≤ ∫_0^1 t L_{DK} ‖u' − u‖²_{X×Y} dt.
Using (21), therefore

B^0(u', u) ≥ ((τ^{-1} − L_{DK})/2) ‖x' − x‖²_X + ((σ^{-1} − L_{DK})/2) ‖y' − y‖²_Y.

Thus B^0 is γ-elliptic when τ^{-1}, σ^{-1} ≥ L_{DK} + γ. This gives the claim.
1 > τσ‖A‖².

Indeed

1 > τσ L²_A + (τ/2) L_{DA} ρ_y.
Indeed, for any w > 1, using the mean value equality as in the proof of Lemma 2,
we deduce
Example 7. As in Example 2, take K(x, (y_1, y_2)) = ⟨A_1(x)|y_1⟩ + ⟨A_2x|y_2⟩ with A_1 ∈ C¹(X; Y_1*) and A_2 ∈ L(X; Y_2*). Then B^0 is elliptic within X × B(0, ρ_{y_1}) × Y_2 if

1 > τσ(L²_{A_1} + ‖A_2‖²) + (τ/2) L_{DA_1} ρ_{y_1}.
Indeed, we bound B_K by summing (23) for A_1 and (24) for A_2. This yields, for any w_1, w_2 > 0, an estimate from which, taking w_1 = σL_{A_1}/(1 − σγ) and w_2 = σ‖A_2‖/(1 − σγ) and using (22), we deduce the claimed ellipticity for small enough γ > 0.
In the Hilbert case, Clason et al. (2019, 2020) secure such bounds by taking the primal step length τ small enough and arguing as in Theorem 1 individually on the primal and dual iterates.
We now study ellipticity for block-adapted methods. The goal is to obtain faster
convergence by adapting the blockwise step length parameters to the problem
structure (connections between blocks) and the local (blockwise) properties of the
problem.
B^0(u', u) = Σ_{j=1}^m (1/(2τ_j)) ‖x'_j − x_j‖²_{X_j} + Σ_{ℓ=1}^n (1/(2σ_ℓ)) ‖y'_ℓ − y_ℓ‖²_{Y_ℓ} − B_K(u', u).   (26)
Example 8. Let K(x, (y_1, y_2)) = ⟨A_1(x)|y_1⟩ + ⟨A_2x|y_2⟩ with A_1 ∈ C¹(X; Y_1*) and A_2 ∈ L(X; Y_2*) as in Examples 2 and 7. Write τ = τ_1. Using (25) in (26) for m = 1 and n = 2, we see B^0 to be γ-elliptic within X × B(0, ρ_{y_1}) × Y_2 if τ^{-1} ≥ w_1 L_{A_1} + L_{DA_1} ρ_{y_1} + w_2‖A_2‖ + γ, σ_1^{-1} ≥ w_1^{-1} L_{A_1}, and σ_2^{-1} ≥ w_2^{-1}‖A_2‖ + γ. Taking w_1 = σ_1 L_{A_1}/(1 − σ_1γ) and w_2 = σ_2‖A_2‖/(1 − σ_2γ), B^0 is therefore elliptic (for some γ > 0) within this set if 1 > τ(σ_1 L²_{A_1} + σ_2‖A_2‖²) + (τ/2) L_{DA_1} ρ_{y_1}.
B_K(u', u) ≤ Σ_{j=1}^m Σ_{ℓ=1}^n (L_{jℓ}/2)(‖x'_j − x_j‖² + ‖y'_ℓ − y_ℓ‖²).    (27)

Consequently, using (26), we see that B^0 is ε-elliptic if 1 ≥ τ_j(Σ_{ℓ=1}^n L_{jℓ} + ε) and 1 ≥ σ_ℓ(Σ_{j=1}^m L_{jℓ} + ε) for all j = 1, …, m and ℓ = 1, …, n.
B_K(u', u) ≤ Σ_{j=1}^m Σ_{ℓ=1}^n ‖A_{jℓ}‖ ‖x'_j − x_j‖ ‖y'_ℓ − y_ℓ‖
    ≤ Σ_{j=1}^m Σ_{ℓ=1}^n ( (w_{jℓ}‖A_{jℓ}‖/2)‖x'_j − x_j‖² + (w_{jℓ}^{−1}‖A_{jℓ}‖/2)‖y'_ℓ − y_ℓ‖² ).
We now study conditions for (C) to hold with G( · , ū) ≥ 0. We start by writing out
the condition for the PDBS.
Lemma 3. Let ū = (x̄, ȳ) ∈ X × Y , and suppose for some G(u, ū) ∈ R and
a neighborhood ū ⊂ X × Y that for all u = (x, y) ∈ ū , x ∗ ∈ ∂F (x) and
y ∗ ∈ ∂G∗ (y)
Let {uk+1 }k∈N be generated by the PDBS (16) for some u0 ∈ X × Y , and
suppose {uk }k∈N ⊂ ū . Then with B = B 0 the fundamental condition (C) and
the quantitative -Féjer monotonicity (F) hold for all k ∈ N, and the descent
inequality (D) holds for all N ≥ 1.
Proof. Theorem 1 proves (F) and (D) if we show (C2). For H in (12), we have

h^{k+1} = ( x*_{k+1} + D_x K(x^{k+1}, y^{k+1}),  y*_{k+1} − D_y K(x^{k+1}, y^{k+1}) ) ∈ H(u^{k+1})   with   x*_{k+1} ∈ ∂F(x^{k+1}),  y*_{k+1} ∈ ∂G*(y^{k+1}).

Thus (C) expands as (C2) for u = u^{k+1} and (x*, y*) = (x*_{k+1}, y*_{k+1}).

0 = ( x̂* + D_x K(x̂, ŷ),  ŷ* − D_y K(x̂, ŷ) ) ∈ H(û)   with   x̂* ∈ ∂F(x̂),  ŷ* ∈ ∂G*(ŷ).    (28)
If γ = 0, we drop the word “strong.” For T = ∂F , (29) follows from the γ -strong
subdifferentiability of F .
γ_F‖x − x̂‖² + γ_{G*}‖y − ŷ‖² ≥ B_K(û, u) + B_K(u, û) + G(u, û)   (u ∈ ū),    (30)

equivalently

γ_F‖x − x̂‖² + γ_{G*}‖y − ŷ‖² ≥ a_K(û, u) + a_K(u, û) + G(u, û)   (u ∈ ū)    (30′)

for

a_K(u, ū) := K(x, y) − K(x̄, ȳ) + ⟨D_x K(x, y) | x̄ − x⟩ + ⟨D_y K(x̄, ȳ) | ȳ − y⟩.    (31)
Note that (30) involves the symmetrized Bregman divergence B_K^S(u, u') := B_K(u, u') + B_K(u', u) generated by K.
Using the assumed strong monotonicities, and the definitions of B_K and a_K, this is immediately seen to hold when (30) or (30′) does.
Example 12. If K is convex-concave, the next Lemma 5 and Lemma 4 prove (C2 )
for
This is in particular true for K(x, y) = ⟨Ax | y⟩ + E(x) with A ∈ L(X; Y*) and E ∈ C¹(X) convex.

G(u, û) = (γ_F − L_{DK})‖x − x̂‖² + (γ_{G*} − L_{DK})‖y − ŷ‖².
Example 14. Let K(x, y) = ⟨A(x) | y⟩ for some A ∈ C¹(X; Y*) such that DA is Lipschitz with the factor L_{DA} ≥ 0. For some γ̃_F, γ̃_{G*} ≥ 0 and ρ_y, ρ̂_x, α > 0, let either

(a) γ̃_F ≥ (L_{DA}/2)(ρ_y + ‖ŷ‖_Y), γ̃_{G*} ≥ 0, and the neighborhood of û is X × B(0, ρ_y); or
(b) γ̃_F > L_{DA}‖ŷ‖_Y + α/2, γ̃_{G*} ≥ (L_{DA}/(2α)) ρ̂_x², and the neighborhood of û is B(x̂, ρ̂_x) × Y.

Then (C2) holds with

G(u, û) = (γ_F − γ̃_F)‖x − x̂‖² + (γ_{G*} − γ̃_{G*})‖y − ŷ‖².

Arguing with the mean value equality and the Lipschitz assumption as in Lemma 2, we get a_K(û, u) + a_K(u, û) ≤ (L_{DA}/2)(‖y‖_Y + ‖ŷ‖_Y)‖x − x̂‖². Thus (a) implies (30′).
By (32), the mean-value equality, and the Lipschitz assumption, also
Remark 4. In the last two examples, we need to bound some of the iterates and
to initialize close enough to a solution. Showing that the iterates stay in a local
neighborhood is a large part of the work in Clason et al. (2019, 2020), as discussed
in Remark 3.
As only some of the component functions may have γFj , γG ∗ > 0, through
detailed analysis of the block structure, we hope to obtain (strong) convergence on
some subspaces even if the entire primal or dual variables might not converge.
Similarly to Lemma 4 we prove:
Σ_{j=1}^m γ̃_{F_j} ‖x_j − x̂_j‖²_{X_j} + Σ_{ℓ=1}^n γ̃_{G*_ℓ} ‖y_ℓ − ŷ_ℓ‖²_{Y_ℓ} ≥ a_K(û, u) + a_K(u, û)

for some γ̃_{F_j}, γ̃_{G*_ℓ} ≥ 0 and all u ∈ û. Then (C2) holds with
G(u, û) = Σ_{j=1}^m (γ_{F_j} − γ̃_{F_j})‖x_j − x̂_j‖²_{X_j} + Σ_{ℓ=1}^n (γ_{G*_ℓ} − γ̃_{G*_ℓ})‖y_ℓ − ŷ_ℓ‖²_{Y_ℓ}.    (33)

γ̃_{F_j} = Σ_{ℓ=1}^n L_{jℓ}  (j = 1, …, m)   and   γ̃_{G*_ℓ} = Σ_{j=1}^m L_{jℓ}  (ℓ = 1, …, n).

Thus G(·, û) ≥ 0 if γ_{F_j} ≥ Σ_{ℓ=1}^n L_{jℓ} and γ_{G*_ℓ} ≥ Σ_{j=1}^m L_{jℓ} for all ℓ and j.
Example 17. As in Example 2, let K(x, y) = ⟨A_1(x) | y_1⟩ + ⟨A_2 x | y_2⟩ for A_1 ∈ C¹(X; Y_1*) and A_2 ∈ L(X; Y_2*). Then, as in (32),

which does not depend on A_2. For any α, ρ_y, ρ̂_x > 0, let either

(a) γ̃_F ≥ (L_{DA_1}/2)(ρ_{y_1} + ‖ŷ_1‖_{Y_1}), γ̃_{G_1*} ≥ 0, and the neighborhood of û is X × B(0, ρ_{y_1}); or
(b) γ̃_F > L_{DA_1}‖ŷ_1‖_{Y_1} + α/2, γ̃_{G_1*} ≥ (L_{DA_1}/(2α)) ρ̂_x², and the neighborhood of û is B(x̂, ρ̂_x) × Y.

Arguing as in Example 14 and using Lemma 6, we then see (C2) to hold with G as in (33) and γ̃_{G_2*} = 0. In this case G(·, û) is non-negative if γ_F ≥ γ̃_F and γ_{G_1*} ≥ γ̃_{G_1*}.
Convergence of Iterates
We are now ready to prove the convergence of the iterates. We start with weak
convergence and proceed to strong and linear convergence. For weak convergence
in infinite dimensions, we need some further technical assumptions. We recall
that a set-valued map T : X ⇒ X* is weak-to-strong (weak-∗-to-strong) outer semicontinuous if x_k^* ∈ T(x^k) and x^k ⇀ x (respectively x^k ⇀* x) and x_k^* → x^* imply x^* ∈ T(x). The nonreflexive case of the next assumption covers spaces of functions
of bounded variation (Ambrosio et al. 2000, Remark 3.12), important for total
variation based imaging.
Assumption 1. Each of the spaces X and Y is, individually, either a reflexive Banach
space or the dual of a separable space. The operator H : X × Y ⇒ X* × Y* is weak(-
∗)-to-strong outer semicontinuous, where we mean by “weak(-∗)” that we take the
weak topology if the space is reflexive and weak-∗ otherwise, individually on X
and Y .
Verification of the conditions
To verify the nonsmooth second-order growth condition (C2 ) for each of the
following Theorems 2, 3, and 4, we point to sections “Nonsmooth Second-Order
Conditions” and “Second-Order Growth Conditions for Block-Adapted Methods.”
For the verification of the (semi-)ellipticity of B 0 , we point to sections “Ellipticity of
the Bregman Divergences” and “Ellipticity for Block-Adapted Methods.” As special
cases of the PDBS (16), the theorems apply to the Hilbert-space PDPS (17) and its
block adaptation (18). Then JX and JY are continuously differentiable and convex.
5 This result seems difficult to find in the literature for Banach spaces but follows easily from the definition of the subdifferential: If F(x) ≥ F(x^k) + ⟨x_k^* | x − x^k⟩ and x_k^* → x̂^* as well as x^k ⇀ x̂ (or x^k ⇀* x̂), then, using the fact that {x^k − x̂}_{k∈N} is bounded, in the limit F(x) ≥ F(x̂) + ⟨x̂^* | x − x̂⟩.
Let {uk+1 }k∈N be generated by the PDBS (16) for any initial u0 , and suppose
{uk }k∈N ⊂ ∩ û . Then there exists at least one cluster point of {uk }k∈N , and
all weak(-∗) cluster points belong to H −1 (0).
Proof. Lemma 3 establishes (D) for B = B^0 and all N ≥ 1. With ε > 0 the factor of ellipticity of B^0, it follows that

(ε/2)‖u^N − û‖²_{X×Y} + Σ_{k=0}^{N−1} (ε/2)‖u^{k+1} − u^k‖²_{X×Y} ≤ B^0(û, u^0)   (N ≥ 1).
Remark 5. For a unique weak limit, we may in Hilbert spaces use the quantitative
Féjer monotonicity (F) with Opial’s lemma (Opial 1967; Browder 1967). For
bilinear K the result is relatively immediate, as B 0 is a squared matrix-weighted
norm; see Valkonen (2020). Otherwise a variable-metric Opial’s lemma (Clason
et al. 2019) and additional work based on the Brezis–Crandall–Pazy lemma (Brezis
et al. 1970, Corollary 20.59 (iii)) are required; see Clason et al. (2019) for K(x, y) =
A(x)|y and Clason et al. (2020) for general K.
Let {uk+1 }k∈N be generated by the PDBS (16) for any initial u0 , and suppose
{uk }k∈N ⊂ ∩ û . Then G(uk+1 , û) → 0 as N → ∞.
O(1/N).
Proof. Lemma 3 establishes (D). By the semi-ellipticity of B^0, then Σ_{k=0}^{N−1} G(u^{k+1}, û) ≤ B^0(û, u^0) (N ∈ N). Since G(u^{k+1}, û) ≥ 0, this shows that G(u^{k+1}, û) → 0. The strong convergence of the primal variable for quadratically minorized G is then immediate, whereas Jensen's inequality gives the ergodic convergence claim.
Let {uk+1 }k∈N be generated by the PDBS (16) for any initial u0 , and suppose
{uk }k∈N ⊂ ∩ û . Then B 0 (û, uN ) → 0 and uN → û at a linear rate.
In particular, if G(u, û) ≥ γ‖u − û‖² (k ∈ N) for some γ > 0, and J^0 is
Lipschitz-continuously differentiable, then uN → û at a linear rate.
Proof. Lemma 3 establishes the quantitative Féjér monotonicity (F). Using (i), this yields (1 + γ)B^0(û, u^{k+1}) ≤ B^0(û, u^k). By the semi-ellipticity of B^0, the claimed linear convergence of B^0(û, u^N) → 0 follows. Since B^0 is assumed elliptic, also u^N → û linearly. If J^0 is Lipschitz-continuously differentiable, then, similarly to Lemma 2, B^0(û, u^{k+1}) ≤ L_{DJ}‖u^{k+1} − û‖² for some L_{DJ} > 0. Thus G(u^{k+1}, û) ≥ γ L_{DJ}^{−1} B^0(û, u^{k+1}), so the main claim establishes the particular claim.
We finish this section by studying the convergence of gap functionals in the convex-
concave setting.
Lemma 7. Suppose F and G∗ are convex, proper, and lower semicontinuous and
K ∈ C 1 (X × Y ) is convex-concave on dom F × dom G∗ . Then (C2 ) holds for all
ū ∈ X × Y with ū = X × Y and G = GL the Lagrangian gap
(i) 0 ≤ (1/N) Σ_{k=0}^{N−1} G^L(u^{k+1}, û) → 0 at the rate O(1/N) for û ∈ H^{−1}(0).
(ii) 0 ≤ G^L(ũ^N, û) → 0 at the rate O(1/N) for û ∈ H^{−1}(0).
(iii) If M ∈ C(X × Y) and the supremum below is taken over a bounded subset of X × Y that intersects H^{−1}(0), then 0 ≤ G(ũ^N) → 0 at the rate O(1/N) for the partial gap G(u) := sup_ū G^L(u, ū), ū ranging over that subset.
for all (x, y) ∈ X × Y. Also using x* ∈ ∂F(x^{k+1}) and y* ∈ ∂G*(y^{k+1}) with the definition of the convex subdifferential, we see that G = G^L satisfies (C2). The non-negativity of G(·, û) follows by similar reasoning, first using that

K(x, ŷ) − K(x̂, y) ≥ ⟨D_x K(x̂, ŷ) | x − x̂⟩ − ⟨D_y K(x̂, ŷ) | y − ŷ⟩    (34)

for all (x, y) ∈ X × Y and following by the definition of the subdifferential applied to −D_x K(x̂, ŷ) ∈ ∂F(x̂) and D_y K(x̂, ŷ) ∈ ∂G*(ŷ).

For (i)–(iii), we first observe that the semi-ellipticity of B^0 and (C2) imply Σ_{k=0}^{N−1} G^L(u^{k+1}, ū) ≤ M(ū). Dividing by N and using that G^L(u^{k+1}, û) ≥ 0 for ū ∈ H^{−1}(0), we obtain (i). Jensen's inequality then gives G^L(ũ^N, ū) ≤ M(ū)/N, hence (ii) for ū ∈ H^{−1}(0). Finally, taking the supremum over ū in the bounded subset gives (iii) because M is bounded on bounded sets.
Inertial Terms
0 ∈ H(u^{k+1}) + D_1 B_{k+1}(u^{k+1}, u^k) + D_1 B^−_{k+1}(u^k, u^{k−1}),    (IPP)

for B_{k+1} := B_{J_{k+1}} and B^−_{k+1} := B_{J^−_{k+1}} generated by J_{k+1}, J^−_{k+1} : U → R. We take
u−1 := u0 for this to be meaningful for k = 0. Our main reason for introducing
the dependence on uk−1 is to improve (16) and (17) to be explicit in K when K is
not affine in y: Otherwise the dual step of those methods is in general not practical
to compute unlike the affine case of Remark 1. Along the way we also construct a
more conventional inertial method.
⟨h^{k+1} | u^{k+1} − ū⟩ ≥ [(B_{k+2} + B^−_{k+3}) − (B_{k+1} + B^−_{k+2})](ū, u^{k+1}) + G(u^{k+1}, ū)    (IC)
holds, and B^−_{k+1} satisfies the general Cauchy inequality

⟨D_1 B^−_{k+1}(u^k, u) | u^k − u'⟩ ≤ B'_{k+1}(u^k, u) + B''_{k+1}(u', u^k)   (u, u' ∈ X)    (35)

for some B'_{k+1}, B''_{k+1} : U × U → R, then we have the modified descent inequality

[B_{N+1} + B^−_{N+2} − B''_{N+1}](ū, u^N) + Σ_{k=0}^{N−1} [B_{k+1} + B^−_{k+2} − B''_{k+1} − B'_{k+2}](u^{k+1}, u^k) + Σ_{k=0}^{N−1} G(u^{k+1}, ū) ≤ [B_1 + B^−_2](ū, u^0).    (ID)
0 = h^{k+1} + D_1 B_{k+1}(u^{k+1}, u^k) + D_1 B^−_{k+1}(u^k, u^{k−1})  for some h^{k+1} ∈ H(u^{k+1}).    (36)

Testing (IPP) by applying ⟨· | u^{k+1} − ū⟩, we obtain

0 = ⟨h^{k+1} + D_1 B_{k+1}(u^{k+1}, u^k) + D_1 B^−_{k+1}(u^k, u^{k−1}) | u^{k+1} − ū⟩.

Summing over k = 0, …, N − 1 and rearranging, this gives

0 = S_N + Σ_{k=0}^{N−1} ⟨h^{k+1} + D_1[B_{k+1} + B^−_{k+2}](u^{k+1}, u^k) | u^{k+1} − ū⟩    (37)

for

S_N := ⟨D_1 B_{J^−_{N+1}}(u^N, u^{N−1}) | ū − u^N⟩ + Σ_{k=0}^{N−1} ⟨D_1 B_{J^−_{k+1}}(u^k, u^{k−1}) | u^{k+1} − u^k⟩.

Abbreviating B̄_{k+1} := B_{k+1} + B^−_{k+2} and using (IC) and the three-point identity (8) in (37), we obtain

0 ≥ S_N + Σ_{k=0}^{N−1} [ B̄_{k+2}(ū, u^{k+1}) − B̄_{k+1}(ū, u^k) + B̄_{k+1}(u^{k+1}, u^k) + G(u^{k+1}, ū) ].
Using the generalized Cauchy inequality (35) and, again, that u−1 = u0 , we get
S_N ≥ −B'_{N+1}(u^N, u^{N−1}) − B''_{N+1}(ū, u^N) − Σ_{k=0}^{N−1} [ B'_{k+1}(u^k, u^{k−1}) + B''_{k+1}(u^{k+1}, u^k) ]
    = −B''_{N+1}(ū, u^N) − Σ_{k=0}^{N−1} [B''_{k+1} + B'_{k+2}](u^{k+1}, u^k).
Inertial PDBS
Iteratively over k ∈ N, solve for x k+1 and y k+1 :
More generally, however, (38) does not directly apply inertia to the iterates. It
applies inertia to K.
The general Cauchy inequality (35) automatically holds by the three-point identity (8) with J'_{k+1} = J''_{k+1} = J^−_{k+1} if B^−_{k+1} ≥ 0, which is to say that J^−_{k+1} is
convex. This is the case if λk ≤ 0. For usual inertia we, however, want λk > 0. We
will therefore use Lemma 1, requiring:
Moreover, the parameters {λ_k}_{k∈N} are non-increasing and for some ε > 0

0 ≤ λ_{k+1} ≤ (1 − ε − λ_k β)/2   (k ∈ N).    (41)

Example 25. The bound (41) holds for some ε > 0 if λ_k ≡ λ for 0 ≤ λ < 1/(2 + β).
Lemma 8. Suppose Assumption 2 holds and that (C2 ) holds within ū for some
ū ∈ and G(u, ū). Given u0 ∈ , suppose the iterates generated by the inertial
PDBS (38) satisfy {u^k}_{k=0}^N ⊂ ū ∩ . Then

B^0(ū, u^N) + Σ_{k=0}^{N−1} B^0(u^{k+1}, u^k) + Σ_{k=0}^{N−1} G(u^{k+1}, ū) ≤ (1 − λ_1)B^0(ū, u^0).    (42)
Proof. Since B_{k+1} = B^0 and B^−_{k+1} = −λ_k B^0 for all k ∈ N,

(B_{k+2} + B^−_{k+3}) − (B_{k+1} + B^−_{k+2}) = (λ_{k+1} − λ_{k+2})B^0.

[B_{N+1} + B^−_{N+2} − B''_{N+1}](ū, u^N) = (1 − λ_{k+1} − λ_k β)B^0(ū, u^N)

and

[B_{k+1} + B^−_{k+2} − B''_{k+1} − B'_{k+2}](u^{k+1}, u^k) = (1 − λ_{k+1} − λ_k β − λ_{k+1})B^0(u^{k+1}, u^k).
Proof. We replace Lemma 3 and (D) by Lemma 8 and (42) in the proofs of
Theorems 2, 3, and 5. Observe that Assumption 2 implies that B 0 is (semi-)elliptic.
Remark 8. Since Theorem 6 does not provide the quantitative -Féjer monotonic-
ity used in Theorem 4, we cannot prove linear convergence using our present
simplified “testing” approach lacking the “testing parameters” of Valkonen (2020).
We now have the tools to improve the basic PDBS (16) to enjoy prox-simple steps
for general K not affine in y. Compared to (14) we amend Jk+1 = J 0 by taking
J^−_{k+1}(u) := [J^0 − J_k](u) = −2K(x^k, y).    (44)

As always, we write B_{k+1}, B^0, and B^−_{k+1} for the Bregman divergences generated by J_{k+1}, J^0, and J^−_{k+1}. Since

D_1[B_{k+1} − B^0](u^k, u^{k−1}) + D_1 B^−_{k+1}(u^k, u^{k−1}) = (0, ỹ*_{k+1})

for

ỹ*_{k+1} = 2[D_y K(x^{k+1}, y^{k+1}) − D_y K(x^{k+1}, y^k) − D_y K(x^k, y^k) + D_y K(x^k, y^{k−1})],
Modified PDBS
Iteratively over k ∈ N, solve for x k+1 and y k+1 :
The method reduces to the basic PDBS (16) when K is affine in y. In Hilbert
spaces X and Y with JX = τ −1 NX and JY = σ −1 NY , we can rearrange (45) as
Modified PDPS
Iterate over k ∈ N:
Remark 9. The modified PDPS (46) is slightly more complicated than the method
in Clason et al. (2020), which would update
Likewise, (45) is different from the algorithm presented in Hamedani and Aybat
(2018) for convex-concave K. It would, for the standard generating functions,
update6
We could produce this method by taking J^−_{k+1}(u) = −K(x^k, y). However, the
convergence proofs would require some additional steps.
−
D1 Bk+1 (uk , u)|uk − u = 2Dy K(x k , y k ) − Dy K(x k , y)|y k − y (48)
2 2
≤ LDK,y y − y k + LDK,y y − y k
=: Bk+1 (uk , u) + Bk+1 (u , uk ).
6 Note that Hamedani and Aybat (2018) uses the historical ordering of the primal and dual updates
from Chambolle and Pock (2011), prior to the proof-simplifying discovery of the proximal point
formulation in He and Yuan (2012). Hence our y k is their y k+1 .
Lemma 9. Suppose Assumption 3 holds and (C2 ) holds within ū for some ū ∈
X × Y and G(u, ū). Given u0 ∈ X × Y , suppose the iterates generated by the
modified PDBS (45) satisfy {u^k}_{k=0}^N ⊂ ū. Then

B^0(ū, u^N) + Σ_{k=0}^{N−1} B^0(u^{k+1}, u^k) + Σ_{k=0}^{N−1} G(u^{k+1}, ū) ≤ [B_1 + B^−_2](ū, u^0).    (50)

Proof. Inserting (43) and (44), (IC) reduces to (C), which follows from (C2) as in Lemma 3. We verify (35) via (48) and Assumption 3. Thus Theorem 6 proves (ID). Inserting (47) and (49) with B'_{k+1} and B''_{k+1} from (48) into (ID) proves (50).
Proof. We replace Lemma 3 and (D) by Lemma 9 and (50) in Theorems 2, 3, and 5.
Observe that (strong) Assumption 3 implies the (semi-)ellipticity of B 0 .
Now we have a locally convergent method (46) with easily implementable steps
to tackle problems such as Potts segmentation (4) (Clason et al. 2020).
Further Directions
We close by briefly reviewing some things not covered, other possible extensions,
and alternative algorithms.
Acceleration
To avoid technical detail, we did not cover O(1/N 2 ) acceleration. The fundamental
ingredients of proof are, however, exactly the same as we have used: sufficient
second-order growth and ellipticity of the Bregman divergences Bk0 , which are now
iteration-dependent. Additionally, a portion of the second-order growth must be
used to make the metrics Bk0 grow as k → ∞. For bilinear K in Hilbert spaces, such
an argument can be found in Valkonen (2020); for K(x, y) = A(x)|y in Clason
et al. (2019); and for general K in Clason et al. (2020). As mentioned in Remarks 1
and 9, the algorithms in the latter two differ slightly from the ones presented here.
Stochastic Methods
It is possible to refine the block-adapted (18) and its accelerated version into
stochastic methods. The idea is to take on each step subsets of primal-blocks
S(i) ⊂ {1, . . . , m} and dual blocks V (i + 1) ⊂ {1, . . . , n} and to only update the
corresponding xjk+1 and y k+1 . Full discussion of such technical algorithms is out-
side the scope of our present overview. We refer to Valkonen (2019) for an approach
covering block-adapted acceleration and both primal and dual randomization in the
case of bilinear K, but see also Chambolle et al. (2018) for a more basic version.
For more general K affine in y, see Mazurenko et al. (2020).
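As a rough illustration of the general idea (not the precise sampling rules of the cited works), one sampled iteration might be organized as follows in Python, with primal_step and dual_step standing in for the blockwise updates of (18) and the probabilities left to the user:

    import random

    # Hedged sketch: randomized block selection for a stochastic block-adapted method.
    # primal_step(j, x_blocks, y_blocks) and dual_step(l, x_blocks, y_blocks) are
    # placeholders for the deterministic blockwise updates.
    def stochastic_block_iteration(x_blocks, y_blocks, primal_step, dual_step,
                                   p_primal=0.5, p_dual=0.5):
        # Sample the primal blocks S(i) and dual blocks V(i+1) to be updated.
        S = [j for j in range(len(x_blocks)) if random.random() < p_primal]
        V = [l for l in range(len(y_blocks)) if random.random() < p_dual]
        for j in S:
            x_blocks[j] = primal_step(j, x_blocks, y_blocks)
        for l in V:
            y_blocks[l] = dual_step(l, x_blocks, y_blocks)
        return x_blocks, y_blocks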
We have used Bregman divergences as a proof tool, in the end opting for the standard
quadratic generating functions on Hilbert spaces. Nevertheless, our theory works for
arbitrary Bregman divergences. The practical question is whether F and G∗ remain
prox-simple with respect to such a divergence. This can be the case for the “entropic
distance" generated on L¹(Ω; [0, ∞)) by

J(x) := ∫_Ω x(t) ln x(t) dt  if x ≥ 0 a.e. on Ω,  and  J(x) := ∞ otherwise.
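For orientation, the Bregman divergence generated by this entropy is (formally, for suitable nonnegative x, x') the generalized Kullback–Leibler divergence

B_J(x, x') = ∫_Ω ( x(t) ln(x(t)/x'(t)) − x(t) + x'(t) ) dt,

so prox-simplicity with respect to it amounts to the corresponding entropic (multiplicative) proximal updates being available in closed form.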
See, for example, Burger et al. (2019) for a Landweber method (gradient descent on
regularized least squares) based on such a distance.
Alternative Approaches
as the “generalized iterative soft thresholding” (GIST), but has also been called
the primal–dual fixed point method (PDFP, Chen et al. 2013) and the proximal
alternating predictor corrector (PAPC, Drori et al. 2015).
The classical Augmented Lagrangian method solves the saddle point problem
min_x max_y F(x) + (τ/2)‖E(x)‖² + ⟨E(x) | y⟩,    (53)
The PDPS has been extended in Bergmann et al. (2019) to functions on Riemannian
manifolds: the problem minx∈M F (x) + G(Ex), where E : M → N with M and
N Riemannian manifolds. In general, between manifolds, there are no linear maps,
so E is nonlinear. Indeed, besides introducing a theory of conjugacy for functions
on manifolds, the algorithm presented in Bergmann et al. (2019) is based on the
NL-PDPS of Valkonen (2014); Clason et al. (2019).
Convergence could only be proved on Hadamard manifolds, which are special:
a type of three-point inequality holds (do Carmo 2013, Lemma 12.3.1). Indeed,
in even more general Hadamard spaces with the metric d, for any three points x^{k+1}, x^k, x̄, we have (Bačák 2014, Corollary 1.2.5)

(1/2) d(x^k, x^{k+1})² + (1/2) d(x^{k+1}, x̄)² − (1/2) d(x^k, x̄)² ≤ d(x^k, x^{k+1}) d(x̄, x^{k+1}).    (54)

Multiplying this inequality by d(x̄, x^{k+1}) and using the three-point inequality (54),

(1/2) d(x^k, x^{k+1})² + (1/2) d(x^{k+1}, x̄)² + [f(x^{k+1}) − f(x^k)] d(x̄, x^{k+1}) ≤ (1/2) d(x^k, x̄)².
Glossary
F(x') − F(x) ≥ ⟨x* | x' − x⟩   (x' ∈ X).
References
Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity
Problems. Oxford University Press (2000)
Arridge, S.R., Kaipio, J.P., Kolehmainen, V., Tarvainen, T.: Optical imaging. In: Scherzer, O. (ed.)
Handbook of Mathematical Methods in Imaging, pp. 735–780. Springer, New York (2011).
https://doi.org/10.1007/978-0-387-92920-0_17
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford
University Press (1958)
Bačák, M.: Convex Analysis and Optimization in Hadamard Spaces, Nonlinear Analysis and
Applications. De Gruyter (2014)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert
Spaces. CMS Books in Mathematics, 2 edition. Springer (2017). https://doi.org/10.1007/978-3-
319-48311-5
Beck, A.: First-Order Methods in Optimization. SIAM (2017). https://doi.org/10.1137/1.
9781611974997
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image
denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419–2434 (2009). https://
doi.org/10.1109/tip.2009.2028250
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM J. Imaging Sci. 2, 183–202 (2009). https://doi.org/10.1137/080716542
Bergmann, R., Herzog, R., Tenbrinck, D., Vidal-Núñez, J.: Fenchel duality for convex optimization
and a primal dual algorithm on Riemannian manifolds (2019). arXiv:1908.02022
Benning, M., Knoll, F., Schönlieb, C.B., Valkonen, T.: Preconditioned ADMM with nonlinear
operator constraint. In: System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO
2015, Sophia Antipolis, 29 June–3 July 2015, Revised Selected Papers, pp. 117–126. Springer
(2016). https://doi.org/10.1007/978-3-319-55795-3_10. arXiv:1511.00425
Bredies, K., Sun, H.: Preconditioned Douglas–Rachford splitting methods for convex-concave
saddle-point problems. SIAM J. Numer. Anal. 53, 421–444 (2015). https://doi.org/10.1137/
140965028
Brezis, H., Crandall, M.G., Pazy, A.: Perturbations of nonlinear maximal monotone sets in
Banach space. Commun. Pure Appl. Math. 23, 123–144 (1970). https://doi.org/10.1002/cpa.
3160230107
Browder, F.E.: Convergence theorems for sequences of nonlinear operators in Banach spaces.
Mathematische Zeitschrift 100, 201–225 (1967). https://doi.org/10.1007/bf01109805
Burger, M., Resmerita, E., Benning, M.: An entropic Landweber method for linear ill-posed
problems (2019) arXiv:1906.10032
Chambolle, A., DeVore, R.A., Lee, N.Y., Lucier, B.J.: Nonlinear wavelet image processing:
variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans.
Image Process. 7, 319–335 (1998). https://doi.org/10.1109/83.661182
Chambolle, A., Ehrhardt, M., Richtárik, P., Schönlieb, C.: Stochastic primal-dual hybrid gradient
algorithm with arbitrary sampling and imaging applications. SIAM J. Optim. 28, 2783–2808
(2018). https://doi.org/10.1137/17m1134834
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-
0251-1
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm.
Math. Program. 1–35 (2015). https://doi.org/10.1007/s10107-015-0957-3
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25,
161–319 (2016). https://doi.org/10.1017/s096249291600009x
Chen, P., Huang, J., Zhang, X.: A primal-dual fixed point algorithm for convex separable
minimization with applications to image restoration. Inverse Probl. 29, 025011 (2013). https://
doi.org/10.1088/0266-5611/29/2/025011
Chierchia, G., Chouzenoux, E., Combettes, P.L., Pesquet, J.C.: The Proximity Operator Repository
(2019). http://proximity-operator.net. Online resource
Clarke, F.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics
(1990). https://doi.org/10.1137/1.9781611971309
Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order
primal-dual method for nonconvex problems. SIAM J. Optim. 29, 933–963 (2019). https://doi.
org/10.1137/18m1170194. arXiv:1802.03347
Clason, C., Mazurenko, S., Valkonen, T.: Primal-dual proximal splitting and generalized conju-
gation in nonsmooth nonconvex optimization. Appl. Math. Optim. (2020). https://doi.org/10.
1007/s00245-020-09676-1. arXiv:1901.02746
Clason, C., Valkonen, T.: Introduction to Nonsmooth Analysis and Optimization (2020).
arXiv:2001.00216. Work in progress
Condat, L.: A primal–dual splitting method for convex optimization involving lipschitzian,
proximable and linear composite terms. J. Optim. Theory Appl. 158, 460–479 (2013). https://
doi.org/10.1007/s10957-012-0245-9
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004). https://
doi.org/10.1002/cpa.20042
do Carmo, M.P.: Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser (2013)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two
and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956). https://doi.org/10.2307/
1993056
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave
saddle-point problems. Oper. Res. Lett. 43, 209–214 (2015). https://doi.org/10.1016/j.orl.2015.
02.001
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM (1999)
Federer, H.: Geometric Measure Theory. Springer (1969)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M.,
Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution
of Boundary-Value Problems. Studies in Mathematics and Its Applications, vol. 15, pp. 299–
331. North-Holland (1983)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration
of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984). https://doi.org/10.1109/
tpami.1984.4767596
Hamedani, E.Y., Aybat, N.S.: A primal-dual algorithm for general convex-concave saddle point
problems (2018). arXiv:1803.01401
He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem:
from contraction perspective. SIAM J. Imaging Sci. 5, 119–149 (2012). https://doi.org/10.1137/
100814494
Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Grundlehren Text Edi-
tions. Springer (2004)
Hohage, T., Homann, C.: A Generalization of the Chambolle-Pock Algorithm to Banach Spaces
with Applications to Inverse Problems (2014). arXiv:1412.0126
Hunt, A.: Weighing without touching: applying electrical capacitance tomography to mass flowrate
measurement in multiphase flows. Meas. Control 47, 19–25 (2014). https://doi.org/10.1177/
0020294013517445
Jauhiainen, J., Kuusela, P., Seppänen, A., Valkonen, T.: Relaxed Gauss–Newton methods with
applications to electrical impedance tomography. SIAM J. Imaging Sci. 13, 1415–1445 (2020).
https://doi.org/10.1137/20m1321711. arXiv:2002.08044
Kingsley, P.: Introduction to diffusion tensor imaging mathematics: Parts I–III. Concepts Magn.
Reson. Part A 28, 101–179 (2006). https://doi.org/10.1002/cmr.a.20048
Kuchment, P., Kunyansky, L.: Mathematics of photoacoustic and thermoacoustic tomography. In:
Scherzer, O. (ed.) Handbook of Mathematical Methods in Imaging, pp. 817–865. Springer, New
York (2011). https://doi.org/10.1007/978-0-387-92920-0_19
Lions, P., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer.
Anal. 16, 964–979 (1979). https://doi.org/10.1137/0716071
Lipponen, A., Seppänen, A., Kaipio, J.P.: Nonstationary approximation error approach to imaging
of three-dimensional pipe flow: experimental evaluation. Meas. Sci. Technol. 22, 104013
(2011). https://doi.org/10.1088/0957-0233/22/10/104013
Loris, I., Verhoeven, C.: On a generalization of the iterative soft thresholding algorithm for the case
of non-separable penalty. Inverse Probl. 27, 125007 (2011). https://doi.org/10.1088/0266-5611/
27/12/125007
Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: the application of compressed sensing for rapid
MR imaging. Magn. Reson. Med. 58, 1182–1195 (2007). https://doi.org/10.1002/mrm.21391
Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without
cocoercivity (2018). arXiv:1808.04162
Mazurenko, S., Jauhiainen, J., Valkonen, T.: Primal-dual block-proximal splitting for a class of
non-convex problems, Electron. Trans. Numer. Anal. 52, 509–552 (2020). https://doi.org/10.
1553/etna_vol52s509. arXiv:1911.06284
Minty, G.J.: On the maximal domain of a “monotone” function. Mich. Math. J. 8, 135–137 (1961)
Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization
(Translated from Russian). Wiley Interscience Series in Discrete Mathematics. Wiley (1983)
Nishimura, D.: Principles of Magnetic Resonance Imaging. Stanford University (1996)
Ollinger, J.M., Fessler, J.A.: Positron-emission tomography. IEEE Signal Process. Mag. 14, 43–55
(1997). https://doi.org/10.1109/79.560323
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive
mappings. Bull. Am. Math. Soc. 73, 591–597 (1967). https://doi.org/10.1090/s0002-9904-
1967-11761-0
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex
optimization. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1762–
1769. IEEE (2011). https://doi.org/10.1109/iccv.2011.6126441
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford-
Shah functional. In: 12th IEEE Conference on Computer Vision, pp. 1133–1140. IEEE (2009).
https://doi.org/10.1109/iccv.2009.5459348
Rockafellar, R.T.: Convex Analysis. Princeton University Press (1972)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Optim. 14, 877–
898 (1976). https://doi.org/10.1137/0314056
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60, 259–268 (1992)
Shen, J., Chan, T.F.: Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math.
62, 1019–1043 (2002). https://doi.org/10.1137/s0036139900368844
Trucu, D., Ingham, D.B., Lesnic, D.: An inverse coefficient identification problem for the
bio-heat equation. Inverse Probl. Sci. Eng. 17, 65–83 (2009). https://doi.org/10.1080/
17415970802082880
Uhlmann, G.: Electrical impedance tomography and Calderón’s problem. Inverse Probl. 25,
123011 (2009). https://doi.org/10.1088/0266-5611/25/12/123011
Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications
to MRI. Inverse Probl. 30, 055012 (2014). https://doi.org/10.1088/0266-5611/30/5/055012.
arXiv:1309.5032
Valkonen, T.: Block-proximal methods with spatially adapted acceleration. Electron. Trans. Numer.
Anal. 51, 15–49 (2019). https://doi.org/10.1553/etna_vol51s15. arXiv:1609.07373
Valkonen, T.: Inertial, corrected, primal-dual proximal splitting. SIAM J. Optim. 30, 1391–1420
(2020). https://doi.org/10.1137/18m1182851. arXiv:1804.08736
Valkonen, T.: Testing and non-linear preconditioning of the proximal point method. Appl. Math.
Optim. 82 (2020). https://doi.org/10.1007/s00245-018-9541-6. arXiv:1703.05705
Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex func-
tions. J. Math. Imaging Vis. 59, 394–414 (2017) https://doi.org/10.1007/s10851-016-0692-2.
arXiv:1511.06566
Vogel, C.R., Oman, M.E.: Fast, robust total variation-based reconstruction of noisy, blurred images.
IEEE Trans. Image Process. 7, 813–824 (1998). https://doi.org/10.1109/83.679423
Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv.
Comput. Math. 38, 667–681 (2013). https://doi.org/10.1007/s10444-011-9254-8
Zhang, X., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman
iteration. J. Sci. Comput. 46, 20–46 (2011). https://doi.org/10.1007/s10915-010-9408-8
Part II
Model- and Data-Driven Variational
Imaging Approaches
Learned Iterative Reconstruction
19
Jonas Adler
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
Gradient-Based Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
Proximal-Based Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757
Primal-Dual Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758
Other Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759
Training Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
Engineering Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
Architectures for Learned Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
Parameter Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
Further Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
Learned Step Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
Scalable Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
Abstract
J. Adler
Department of Mathematics, KTH – Royal Institute of Technology, Stockholm, Sweden
e-mail: [email protected]
Now with DeepMind, London, UK
about the physics of the image formation process, represented by the forward
operator, with soft knowledge about how the reconstructions should look like,
represented by deep neural networks. A diverse set of such methods have been
proposed, and this chapter seeks to give an overview of their similarities and
differences, as well as discussing some of the commonly used methods to
improve their performance.
Keywords
Introduction
Tx = y + δ
should be a scalar field in space. Since the input and output live in different spaces,
we cannot even perform standard linear operations on them, such as addition, much
less hope that a convolution would take us from one to the other.
One way to solve this would be to generalize the concept of convolutions,
and a significant effort has actually been spent on how to connect these spaces
in mathematically rigorous ways. Notably the field of Fourier integral operators
(FIO) (Hörmander 1971) has been developed, and these operators can be seen
as generalizations of convolutions. However, the simple point-correspondence of
convolutions breaks down, and instead we get a point-to-set correspondence, the
canonical relation. Fourier integral operators are also notoriously complicated
to work with and often computationally expensive. For this reason, the gener-
alization of convolutional neural networks, perhaps FIO-neural networks (Feliu-
Faba et al. 2019; Alizadeh et al. 2019), has so far not been applied to inverse
problems.
While some have gone for a fully learned approach, ignoring the inherent
symmetries, this does not seem to scale to realistic problem sizes (Zhu et al. 2018).
Instead researchers have taken a middle way of incorporating more knowledge about
the forward operator in a separate non-learned way into their learned reconstruction
techniques. A very successful early approach has been to somehow convert the
reconstruction problem into an image processing problem, which is easier than
one might expect. Simply start by applying any reconstruction operator to the data
to obtain a suboptimal initial reconstruction and then train a convolutional neural
network to map the initial reconstruction to a more high-quality reconstruction (Jin
et al. 2017; Kang et al. 2017).
While such methods incorporate significant components of the physics of the
problem, encapsulated in the initial reconstruction, this also gives the methods a
strong bias toward the result of the initial reconstruction, and in particular if there
is any information lost in the initial reconstruction, it cannot be recovered by the
post-processing.
An alternative, learned iterative reconstruction, has been developed in recent
years. In learned iterative reconstruction, the physics of the problem is not seen as
a separate component to be done prior to applying learning, but rather it is seen
as an integral component of the learned reconstruction operator of equal footing
with other commonly used components in neural networks such as convolutions and
pointwise nonlinearities, thus allowing us to learn a reconstruction method acting on
measured raw data.
This chapter will survey the development of these learned iterative reconstruction
schemes and try to give an overview of architectures, training procedures, and
practical and theoretical results. We note that several other high-quality review
papers have looked at deep learning for inverse problems (Wang et al. 2018;
McCann and Unser 2019; Arridge et al. 2019; Hammernik and Knoll 2020) and
invite the reader to look at them for a broader overview of other techniques to use
deep learning for image reconstruction.
Deep Learning
L(θ) = Σ_i ℓ(N_θ(y_i), x_i)
Architectures
Over the last years, a range of architectures for learned iterative reconstruction
have been investigated, and although there have been steps toward it (Leuschner
et al. 2019; Zbontar et al. 2018; Ramzi 2019), there is as of yet no consistent
comparison of their performance in a benchmark, with each architecture sporting
different upsides and downsides. We’ll here give a broad overview of the most
common architectures used in the literature.
The core idea of learned iterative reconstruction is to interlace application of
knowledge-driven operators, e.g., the forward operator, with learned operators such
as convolutional neural networks. There are multiple ways to motivate specific
learned iterative reconstruction architectures, but the most popular is to see them as
neural network architectures inspired by unrolling of optimization solvers (Hershey
et al. 2014). Specifically one notes that an optimization solver stopped after a finite
number of iterations almost satisfies our conditions for a neural network (Banert
et al. 2018). It is an operator that takes the data as input, processes it with simple
components such as computing linear combinations and gradients, and returns a
reconstruction. The individual components are also often differentiable, so the only
thing missing is parametrizing the scheme so that there is something to learn.
There are many optimization problems to be inspired by and even more solvers.
Learned iterative reconstruction methods can be broadly classified according to
what type of optimization solver they were inspired by, and by now most commonly
used classes of optimization solvers have been converted into reconstruction
schemes. The learning on the other hand is introduced by replacing certain
components, such as gradients or proximals, with learned counterparts in the form
of neural networks.
Here we must stop and stress that the architectures are merely inspired by
optimization solvers. Learned iterative reconstruction schemes do not actually try
to solve any optimization problem as part of computing the reconstruction, not even
approximately.
We’ll now introduce some of the most common such constructions in a structured
manner. We’ll then follow up with various engineering tricks that have been found
to sometimes vastly improve performance before finally turning to the training.
Gradient-Based Architectures
A set of very well-studied optimization problems are those associated with the
maximum a posteriori solution given some prior. These optimization problems have
been extensively explored over the years, including in both Tikhonov and total
variation (TV) regularization. It can be studied using Bayes’ theorem, according
to which the posterior distribution P (x | y) can be decomposed into components
P(x | y) = P(y | x) P(x) / P(y).
this very basic form of gradient-based learned iterative reconstruction was never
published on its own, but a wide range of closely related schemes have been
considered (Hauptmann et al. 2019; Chen et al. 2018).
Variational Networks
Variational networks (Hammernik et al. 2018) are a widely used class of gradient-
based learned iterative reconstruction methods that more closely follow the inspira-
tion from optimization than other schemes. In particular, the learned operator Λθ is
required to be the gradient of some function which is learned
Λθ (x) = ∇x hθ (x).
h_θ(x) = Σ_{k=1}^{K} φ_{θ_k}(K_{θ_k} x)
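Concretely, the learned gradient then has the closed form Λ_θ(x) = Σ_k K_{θ_k}^T φ'_{θ_k}(K_{θ_k} x); a toy NumPy sketch (with fixed smooth potentials rather than the learned activations of the cited works, and plain matrices in place of convolutions):

    import numpy as np

    # Hedged toy sketch of a variational-network style regularizer h(x) = sum_k phi(K_k x)
    # and its gradient; Ks is a list of matrices, phi a smooth scalar potential applied
    # elementwise (here a fixed smoothed absolute value, in practice learned).
    def phi(z):
        return np.sqrt(z**2 + 1e-3)          # smooth potential

    def dphi(z):
        return z / np.sqrt(z**2 + 1e-3)      # its derivative

    def h(x, Ks):
        return sum(np.sum(phi(K @ x)) for K in Ks)

    def grad_h(x, Ks):
        # gradient of sum_k sum_i phi((K_k x)_i) is sum_k K_k^T phi'(K_k x)
        return sum(K.T @ dphi(K @ x) for K in Ks)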
Proximal-Based Architectures
The proximal gradient algorithm (Parikh et al. 2014) is a method for solving convex
optimization problems given by the sum of two functionals where only one of the
functional is required to be differentiable; the other needs only to have a proximal
operator defined. The method is an excellent fit for inverse problems since the log
data likelihood is typically smooth while the prior is not.
Given a specific (log-)prior, the proximal operator can be seen as a backward
gradient step and is given by
prox_{x ↦ −α log P(x)}(x̂) = arg min_{x∈X} ( (1/2)‖x − x̂‖² − α log P(x) ).
Using this, the proximal gradient algorithm, given in the setting of Bayesian
inversion, is given in Algorithm 3.
As an opportunity for learning, we note that this is very similar to the gradient
ascent scheme except that instead of an additive gradient, the proximal of the log-
prior acts on the updated point. The corresponding learned iterative reconstruction
scheme can be obtained by replacing the proximal operator by a learned component.
This type of scheme was first published under the name recurrent inference
machines with applications to image processing problems (Putzky and Welling
2017). Several other papers extended the methods by adding further components
but also by applying the method to CT (Adler and Öktem 2017; Gupta et al. 2018), MRI (Lønning et al. 2018), and photoacoustic tomography (Hauptmann et al. 2018;
Yang et al. 2019).
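For Gaussian noise, where ∇_x log P(y | x) = −T*(Tx − y) up to scaling, one step of such a scheme might be sketched as follows; net stands for the learned operator replacing the proximal of the log-prior, and T, T_adjoint for the forward operator and its adjoint (a schematic pattern, not the exact update of any one cited method):

    # Hedged sketch: one step of a learned proximal-gradient scheme.
    # net plays the role of the learned operator Lambda_theta; T and T_adjoint
    # implement the forward operator and its adjoint.
    def learned_prox_grad_step(x, y, T, T_adjoint, net, alpha):
        grad_loglik = -T_adjoint(T(x) - y)     # gradient of log P(y | x) for Gaussian noise
        return net(x + alpha * grad_loglik)    # learned "proximal" applied to the updated point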
We should however note that there is a different way to use the proximal gradient
scheme. In particular, it is sometimes the case that the proximal of the data log-
This scheme often has very fast convergence since the proximal (depending on
the data likelihood) can be seen as a projection onto the feasible set in a single
iteration. For this reason, the algorithm is sometimes called projected gradient
descent, and we’ll adopt that name here for disambiguation purposes. Algorithms
such as ADMM can also be seen as variations of the general idea.
One can then introduce learning as usual by replacing the knowledge-driven prior
with a data-driven prior as in Algorithm 6.
This class of algorithms has become very popular in MRI reconstruction due
to their ease of implementation and speed improvements over gradient-based
schemes. In this domain, the proximal step is often called a data consistency term,
since the proximal enforces the result to be (approximately) consistent with the
data (Schlemper et al. 2017; Aggarwal et al. 2018; Kofler et al. 2018). One of the
first learned iterative reconstruction schemes, ADMM-Net (Sun et al. 2016) used a
related approach for MRI reconstruction, and a range of works have followed with
some interesting variations (Mardani et al. 2017a,b, 2018), and there has even been
some analysis of their convergence (Schwab et al. 2018).
Primal-Dual Networks
Algorithm 7 Primal-Dual
1: Select x^0 ∈ X, y^0 ∈ Y
2: for n = 1, . . . do
3:   y^n ← prox_{−α(log L)^*}(y^{n−1} + T x^{n−1})
4:   x^n ← prox_{−α log P}(x^{n−1} − T^* y^n)
5: end for
P (y | x) = L(y | Tx),
then the problem can be solved using a proximal-based scheme with only knowledge
about the proximal of the functional Tx → − log P (y | Tx). The most simple
of such scheme, the Arrow-Hurwich algorithm (Arrow et al. 1958), is given in
Algorithm 7. Accelerated versions of the scheme using momentum, including the
primal-dual hybrid gradient algorithm (Chambolle and Pock 2011), are very popular
for optimization in inverse problems due to their speed and versatility.
Following the recipe from before, we can convert the primal-dual algorithm into
a learned scheme by replacing the proximals with learned operators. Here one could
replace only the proximal related to the prior or both proximals, but most authors
prefer to learn both and this gives rise to the learned primal dual scheme, as in
Algorithm 8.
Given the versatility of this kind of algorithm, practically only requiring access
to the forward operator, it can be applied to almost any inverse problem. So far,
applications have been to CT (Adler and Öktem 2018b; Wu et al. 2018, 2019b),
possibly with incomplete data (Zhang et al. 2019), and image processing (Vogel and
Pock 2017).
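Schematically, and heavily simplified relative to the cited implementations (in particular, the extra memory channels discussed under "Engineering Aspects" are omitted), an unrolled learned primal-dual reconstruction might read:

    # Hedged sketch of an unrolled learned primal-dual scheme: learned operators
    # replace both proximal maps, and the forward operator T and its adjoint couple
    # the primal and dual variables in each unrolled iteration.
    def learned_primal_dual(y, T, T_adjoint, primal_nets, dual_nets, x0, h0):
        x, h = x0, h0
        for primal_net, dual_net in zip(primal_nets, dual_nets):  # N unrolled iterations
            h = dual_net(h, T(x), y)           # learned dual step, sees the data y
            x = primal_net(x, T_adjoint(h))    # learned primal step, sees the back-projected dual
        return x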
Other Schemes
To round off our expose on classical methods that have been converted into
learned iterative reconstruction schemes, we note that some authors have found their
inspiration in iterative schemes outside of optimization. One such idea is Neumann
networks (Gilton et al. 2019), which gain inspiration from the Neumann series for
the inverse
T^{−1} = Σ_{n=0}^{∞} (I − ηT^*T)^n ηT^*

where η < ‖T^*T‖^{−1} is a step length. The authors view the partial sums as an iteration
and add a learning component as a small offset to the update, which leads to
Algorithm 9.
We note that the algorithm is very similar to a gradient-based scheme, but that
the result is given as the sum of all partial iterates, and that the data only enters in
the beginning.
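A rough sketch of this pattern (a simplified reading, not a faithful reimplementation of Gilton et al. 2019): each new term applies (I − ηT^*T) plus a small learned offset to the previous one, and the output is the running sum of the terms.

    # Hedged sketch of a Neumann-network style reconstruction: a truncated Neumann
    # series with a learned correction added to each update; net is a placeholder.
    def neumann_net(y, T, T_adjoint, net, eta, n_terms):
        term = eta * T_adjoint(y)             # the data enters only here
        total = term
        for _ in range(n_terms - 1):
            term = term - eta * T_adjoint(T(term)) - net(term)  # (I - eta T*T) term + learned offset
            total = total + term              # partial sums form the reconstruction
        return total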
Others have taken inspiration from classical iterative reconstruction schemes,
e.g., the Landweber algorithm (Aspri et al. 2018), and there is nothing to stop
researchers from using other methods such as conjugate gradient in the future.
Training Procedure
Given an architecture, the next step is to select the optimal parameters. The
definition of what’s meant by “optimal” is however a hot area for both research
and debate. By far the most popular definition of “optimal” for neural networks
in general and learned iterative reconstruction schemes in particular is to view the
problem as an inference problem where the data is seen as a sample from a random
variable y, and we seek to infer the unknown signal which is a sample from another
random variable, x. Our training data is seen as N samples (y_n, x_n) from the joint random variable (y, x). Further, as in the introduction, we introduce a loss function ℓ : X × X → R which characterizes how good a single reconstruction is. Given all of this, the optimal parameter choice is defined as the parameters which minimize the risk function

L(θ) = E_{(y,x)} [ ℓ(N_θ(y), x) ].
Since the risk involves an expectation over the random variables y and x, which
we don’t have access to since they should represent all possible inputs/outputs, we
need to approximate it using our training data. Thankfully, the sample mean is an
unbiased estimator for the expectation, so we can instead choose to minimize the empirical risk function

L̂(θ) = (1/N) Σ_{n=1}^{N} ℓ(N_θ(y_n), x_n).
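In practice this minimization is carried out with stochastic gradient methods; stripped of any framework, the pattern is roughly the following, where recon plays the role of N_θ and grad_risk is assumed to return the gradient of ℓ(recon(y, θ), x) with respect to θ (normally supplied by automatic differentiation):

    import random

    # Hedged sketch of supervised training by stochastic gradient descent on the
    # empirical risk; data_pairs is a list of (y, x) samples and theta a parameter array.
    def train(theta, data_pairs, recon, grad_risk, lr=1e-3, epochs=10):
        for _ in range(epochs):
            random.shuffle(data_pairs)
            for y, x in data_pairs:
                theta = theta - lr * grad_risk(theta, y, x)   # SGD step on one sample
        return theta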
Some authors have looked further than these relatively simple losses and have
looked toward using neural networks to define a loss function. The earliest such
attempts were to use perceptual losses (Johnson et al. 2016), which consider an
image as good if it looks like the true image according to a neural network. The
definition of “looks like the true image” is taken to have similar intermediate
activations, and the neural network is typically taken to be an ImageNet classifier. This
approach has been applied to CT and MRI denoising, where it gave more visually
appealing results (Yang et al. 2017a,b, 2018).
A related type of loss is adversarial losses (Goodfellow et al. 2014), where
one trains a neural network to judge how good a reconstruction is. In the most
simple setting, a discriminator network is trained to determine if an image is a
reconstruction or a true image, and the reconstruction operator is trained to generate
true-looking images. In order to make sure that the network returns a reconstruction
that is related to the input, one typically combines this with some form of classical
loss and sometimes a cycle-consistency (data-fit) condition. The latter case is
especially interesting, since it allows training without paired training data (Mardani
2017; Lei et al. 2019).
Another way of using a neural network to define the loss is to ask “how useful
is the reconstruction?”, where we define usefulness by how well another network
can be trained on the reconstruction to solve some task (Adler et al. 2018). This
general and straightforward idea can be applied to practically any downstream task,
but initial work has focused on segmentation (Boink et al. 2019), object detection
(Wu et al. 2018), and classification (Effland et al. 2018; Diamond et al. 2017).
All of the above methods (possibly excluding adversarial losses) require super-
vised training data. However, access to this kind of data, especially in large amounts,
is often a luxury. Many hence see training using unsupervised data as something
of a grand challenge in order to get truly scalable learned iterative reconstruction
that is applicable in practice. Some algorithmic advances have been made in this
direction, notably the Noise2Noise (Lehtinen et al. 2018) method which uses the
fact that when trained with squared norm loss, the result should only depend on
the conditional mean of the data. Hence, it is possible to train using noisy ground
truth samples, and the learned reconstruction should approximate their mean. Other
methods have been developed with the same goal, e.g., the SURE estimator (Raphan
and Simoncelli 2007). These methods have just started being used for image
reconstruction, but with promising results (Soltanayev and Chun 2018; Cha et al.
2019).
Finally there is great potential in combining learned reconstruction with advances
in deep generative models in order to achieve true Bayesian reconstruction methods
where one can sample from the posterior distribution instead of computing a single
estimator (Adler and Öktem 2018a; Anonymous 2020). Such methods are especially
relevant in the low signal/high noise setting, such as ultralow dose CT and dynamic
imaging or for highly complicated imaging modalities such as seismic imaging
(Herrmann et al. 2019).
To conclude, supervised training with simple losses is still by far the most popular
way to train learned iterative reconstruction schemes, but their combination of
expressive power, speed, and versatility allows a huge range of other options for
training, and we can only expect this field to grow in the future.
Engineering Aspects
Initialization
Just like optimization, all learned iterative reconstruction methods begin with an
initial estimate x 0 which is then refined. Since only a finite number of steps are
used, it’s reasonable to expect this choice to have quite significant impact on the
final result. Authors have converged on two different initialization schemes. These
are zero-initialization (Adler and Öktem 2018b), x 0 = 0, and pseudo-inverse
initialization x 0 = T† y, where T† : Y → X is some pseudo-inverse, e.g., zero
filled Fourier inversion (Hammernik et al. 2018) or filtered back projection (Adler
et al. 2017b). In some cases where the forward operator is approximately unitary,
e.g., in photoacoustic tomography, the adjoint has been used in place of a pseudo-
inverse (Hauptmann et al. 2018). Some have also tried learning some parameters
of the initial reconstruction, e.g., learning the filters in filtered back projection
(Hammernik et al. 2017). These more advanced initialization schemes have possible
speed and accuracy advantages over zero-initialization since the learned operator
only needs to learn a correction from the initial reconstruction, but they run a risk
of overfitting to the initial reconstruction, giving worse generalization.
Parameter Sharing
The algorithms as presented here have been shown with a single learned gradi-
ent/proximal operator that is used in all iterations. However, it has been found
by several authors that a significant improvement can be obtained by relaxing this
requirement and instead learning a different operator Λθ n for each iteration, where
the full parameter vector is θ = [θ 1 , θ 2 , . . . , θ N ]. For example, Adler and Öktem
(2018b) reports a very noticeable 4.5 dB uplift when learning ten different proximals
instead of one.
The reason for this uplift has not been thoroughly explained, but the most simple
explanation is that it gives the network ten times more learned parameters. However,
making a single proximal ten times larger has not been found to give the same uplift,
so perhaps the explanation lies in the ability of different parts of the network to focus
on different tasks, with early iterations focusing on large-scale structure while the
last iterations finalize the finer structures.
Further Memory
Preconditioning
(T∗ T + λI )−1 .
However, this is only feasible when the above operator is easily computed, which
is only really the case for image processing problems and Fourier inversion.
Others have used approximations by, e.g., filtering (Hauptmann et al. 2019) or
diagonal approximations to the Hessian (Ravishankar et al. 2019). Finally, some
have investigated other optimization-based ways of speeding up convergence, e.g.,
Nesterov momentum (Li et al. 2018).
where Λθ : X2 → X in this case. This should have some upsides in that the
network could in theory learn, e.g., a preconditioner. Similar ideas can be applied
to most proximal-based learned iterative schemes, e.g., learned primal-dual (Adler
and Öktem 2018b).
Scalable Training
them using the backpropagation algorithm (LeCun et al. 1989) is extremely memory
intense since every step of the algorithm has to be stored in memory. For this reason,
researchers have had significant issues in scaling the algorithms beyond slice-by-
slice cases of roughly 5122 pixels.
A method to train on full 3d volumes of about 5123 voxels hence either needs a
very expensive supercomputer (Laanait et al. 2019) or to be trained without standard
backpropagation. Several researchers have investigated the latter. One such method
is to train the network one iteration at a time, which significantly reduces the amount
of memory needed (Hauptmann et al. 2018; Wu et al. 2019a). Another method is to
use gradient checkpointing (Chen et al. 2016) which reduces the amount of memory
used by recomputing on the fly. An extreme case of this is invertible networks (Dinh et al. 2014; Jacobsen et al. 2018), which totally remove the need for storing intermediate results, enabling 3d reconstruction (Putzky et al. 2019).
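As an illustration of the checkpointing option, the following sketch wraps each unrolled iteration in `torch.utils.checkpoint.checkpoint`, which discards intermediate activations during the forward pass and recomputes them during backpropagation; the per-iteration `step` modules are assumed to take the current iterate and the data-fidelity gradient.

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedUnrolled(nn.Module):
    """Unrolled scheme whose per-iteration activations are recomputed on the fly."""
    def __init__(self, steps, grad_fn):
        super().__init__()
        self.steps = nn.ModuleList(steps)   # one learned update per iteration
        self.grad_fn = grad_fn              # data-fidelity gradient x -> T*(Tx - y)

    def forward(self, x0):
        x = x0
        for step in self.steps:
            # Only the inputs and outputs of each step are kept; the activations
            # inside `step` are recomputed during the backward pass.
            x = checkpoint(step, x, self.grad_fn(x), use_reentrant=False)
        return x
```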
It is common to combine several, if not all, of the above ideas in a single algorithm.
To give a more practical example in CT, let us assume that T is the Radon transform and that we have Gaussian noise, in which case log P(y | x) = −½‖y − Tx‖², up to an additive constant. A learned iterative reconstruction scheme for this inverse problem using the learned proximal gradient method can be obtained by combining pseudo-inverse initialization with unshared parameters, learned steps, extra memory, and preconditioning, which should give a state-of-the-art reconstruction
method. Most parts are straightforward, except for the choice of preconditioner.
Here one could use the fact that, due to the Fourier slice theorem, the inverse Hessian (T*T)^{-1} can be approximated by a convolution with a sharpening kernel K. Using this, we arrive at Algorithm 10, which is a state-of-the-art learned iterative reconstruction algorithm.
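Algorithm 10 itself is not reproduced here, but the sketch below indicates how the listed ingredients could fit together for this CT example; the `radon`, `radon_adjoint`, and `fbp` operators and the sharpening kernel K are placeholders supplied by the user, and the architecture is a rough illustration rather than the exact algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedProximalGradientCT(nn.Module):
    """Hedged sketch: FBP initialization, per-iteration networks (no parameter
    sharing), extra memory channels, and a convolutional preconditioner
    approximating (T*T)^(-1)."""
    def __init__(self, radon, radon_adjoint, fbp, sharpen_kernel, n_iter=10, memory=4):
        super().__init__()
        self.radon, self.radon_adj, self.fbp = radon, radon_adjoint, fbp
        self.register_buffer("K", sharpen_kernel)    # (1, 1, k, k), k odd
        self.memory = memory
        self.steps = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2 + memory, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1 + memory, 3, padding=1),
            )
            for _ in range(n_iter)
        ])

    def forward(self, y):
        x = self.fbp(y)                               # pseudo-inverse initialization
        s = torch.zeros(x.shape[0], self.memory, *x.shape[2:], device=x.device)
        for step in self.steps:
            grad = self.radon_adj(self.radon(x) - y)  # gradient of 1/2 ||Tx - y||^2
            grad = F.conv2d(grad, self.K, padding=self.K.shape[-1] // 2)
            out = step(torch.cat([x, grad, s], dim=1))
            x, s = x + out[:, :1], out[:, 1:]         # update iterate and memory
        return x
```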
Conclusions
Learned iterative reconstruction has attracted significant interest in just a few years,
and research has quickly gone from a wild-west of architecture exploration to a more
structured view. Given the enormous success of deep learning methods in general
in solving supervised learning problems, research has started shifting toward new
frontiers. The first is moving into more practically applicable domains, where we
need to learn from large amounts of data without a ground truth and with various
artifacts. The second frontier is the ability to solve previously unsolvable problems
such as reconstructing the posterior distribution or integrating reconstruction with
image analysis tasks. A final frontier is to gain a theoretical understanding of why
these algorithms work so well. Some steps toward this have been taken (Effland et al.
2019; Mardani et al. 2019), but there is still a huge gap between theory and practice.
I suspect that we will see an explosive development in this field in the coming
years and can only hope that this chapter can serve as an introduction to its many
possibilities in the future.
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving,
G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. In: 12th {USENIX}
Symposium on Operating Systems Design and Implementation ({OSDI} 16), pp. 265–283
(2016)
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Prob. 33(12), 124007 (2017)
Adler, J., Öktem, O.: Deep Bayesian Inversion. arXiv1811.05910 (2018a)
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018b)
Adler, J., Kohr, H., Öktem, O.: ODL-A Python Framework for Rapid Prototyping in Inverse
Problems. Royal Institute of Technology (2017a)
Adler, J., Ringh, A., Öktem, O., Karlsson, J.: Learning to Solve Inverse Problems Using
Wasserstein Loss. arXiv1710.10898 (2017b)
Adler, J., Lunz, S., Verdier, O., Schönlieb, C.B., Öktem, O.: Task Adapted Reconstruction for
Inverse Problems. arXiv1809.00948 (2018)
Aggarwal, H.K., Mani, M.P., Jacob, M.: MoDL: model-based deep learning architecture for inverse
problems. IEEE Trans. Med. Imaging 38(2), 394–405 (2018)
Alizadeh, K., Farhadi, A., Rastegari, M.: Butterfly Transform: An Efficient FFT Based Neural
Architecture Design. arXiv1906.02256 (2019)
Anonymous: Closed loop deep Bayesian inversion: uncertainty driven acquisition for fast MRI. In:
Submitted to International Conference on Learning Representations (2020). https://openreview.
net/forum?id=BJlPOlBKDB. Under review
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.B.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford
University Press, Stanford (1958)
Aspri, A., Banert, S., Öktem, O., Scherzer, O.: A Data-Driven Iteratively Regularized Landweber
Iteration. arXiv1812.00272 (2018)
Banert, S., Ringh, A., Adler, J., Karlsson, J., Öktem, O.: Data-Driven Nonsmooth Optimization.
arXiv1808.00946 (2018)
Boink, Y.E., Manohar, S., Brune, C.: A Partially Learned Algorithm for Joint Photoacoustic
Reconstruction and Segmentation. arXiv1906.07499 (2019)
Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Wanderman-Milne,
S.: JAX: composable transformations of Python+NumPy programs (2018). http://github.com/
google/jax
Cha, E., Jang, J., Lee, J., Lee, E., Ye, J.C.: Boosting CNN Beyond Label in Inverse Problems.
arXiv1906.07330 (2019)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost.
arXiv1604.06174 (2016)
Chen, H., Zhang, Y., Chen, Y., Zhang, J., Zhang, W., Sun, H., Lv, Y., Liao, P., Zhou, J., Wang, G.:
LEARN: learned experts’ assessment-based reconstruction network for sparse-data CT. IEEE
Trans. Med. Imaging 37(6), 1333–1347 (2018)
Diamond, S., Sitzmann, V., Boyd, S., Wetzstein, G., Heide, F.: Dirty Pixels: Optimizing Image
Classification Architectures for Raw Sensor Data. arXiv1701.06487 (2017)
Dinh, L., Krueger, D., Bengio, Y.: Nice: Non-linear Independent Components Estimation.
arXiv1410.8516 (2014)
Effland, A., Hölzel, M., Klatzer, T., Kobler, E., Landsberg, J., Neuhäuser, L., Pock, T., Rumpf,
M.: Variational networks for joint image reconstruction and classification of tumor immune cell
interactions in melanoma tissue sections. In: Bildverarbeitung für die Medizin 2018, pp. 334–
340. Springer (2018)
Effland, A., Kobler, E., Kunisch, K., Pock, T.: An Optimal Control Approach to Early Stopping
Variational Methods for Image Restoration. arXiv preprint arXiv:1907.08488 (2019)
Feliu-Faba, J., Fan, Y., Ying, L.: Meta-learning Pseudo-differential Operators with Deep Neural
Networks. arXiv1906.06782 (2019)
Gilton, D., Ongie, G., Willett, R.: Neumann Networks for Inverse Problems in Imaging.
arXiv1901.03707 (2019)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing
Systems, pp. 2672–2680 (2014)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, London (2016)
Gupta, H., Jin, K.H., Nguyen, H.Q., McCann, M.T., Unser, M.: CNN-based projected gradient
descent for consistent CT image reconstruction. IEEE Trans. Med. Imaging 37(6), 1440–1453
(2018)
Hammernik, K., Knoll, F.: Machine learning for image reconstruction. In: Handbook of Medical
Image Computing and Computer Assisted Intervention, pp. 25–64. Elsevier, London (2020)
Hammernik, K., Würfl, T., Pock, T., Maier, A.: A deep learning architecture for limited-angle
computed tomography reconstruction. In: Bildverarbeitung für die Medizin 2017, pp. 92–97.
Springer (2017)
Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F.: Learning
a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6),
3055–3071 (2018)
Hauptmann, A., Lucka, F., Betcke, M., Huynh, N., Adler, J., Cox, B., Beard, P., Ourselin, S.,
Arridge, S.: Model-based learning for accelerated, limited-view 3-d photoacoustic tomography.
IEEE Trans. Med. Imaging 37(6), 1382–1393 (2018)
Hauptmann, A., Adler, J., Arridge, S., Öktem, O.: Multi-Scale Learned Iterative Reconstruction.
arXiv1908.00936 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Herrmann, F.J., Siahkoohi, A., Rizzuti, G.: Learned Imaging with Constraints and Uncertainty
Quantification. arXiv1909.06473 (2019)
Hershey, J.R., Roux, J.L., Weninger, F.: Deep Unfolding: Model-Based Inspiration of Novel Deep
Architectures. arXiv1409.2574 (2014)
Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot,
B., Azar, M., Silver, D.: Rainbow: combining improvements in deep reinforcement learning. In:
Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Hörmander, L.: Fourier integral operators. I. Acta Math. 127(1), 79–183 (1971)
Innes, M., Edelman, A., Fischer, K., Rackauckus, C., Saba, E., Shah, V.B., Tebbutt, W.: Zygote:
A Differentiable Programming System to Bridge Machine Learning and Scientific Computing.
arXiv1907.07587 (2019)
Jacobsen, J.H., Smeulders, A., Oyallon, E.: i-Revnet: Deep Invertible Networks. arXiv1802.07088
(2018)
Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse
problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-
resolution. In: European Conference on Computer Vision, pp. 694–711. Springer (2016)
Kang, E., Min, J., Ye, J.C.: A deep convolutional neural network using directional wavelets for
low-dose x-ray CT reconstruction. Med. Phys. 44(10), e360–e375 (2017)
Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. arXiv1412.6980 (2014)
Knoll, F., Hammernik, K., Zhang, C., Moeller, S., Pock, T., Sodickson, D.K., Akcakaya, M.: Deep
Learning Methods for Parallel Magnetic Resonance Image Reconstruction. arXiv1904.01112
(2019)
Kobler, E., Muckley, M., Chen, B., Knoll, F., Hammernik, K., Pock, T., Sodickson, D., Otazo,
R.: Variational deep learning for low-dose computed tomography. In: 2018 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6687–6691. IEEE
(2018)
Kofler, A., Haltmeier, M., Kolbitsch, C., Kachelrieß, M., Dewey, M.: A u-nets cascade for sparse
view computed tomography. In: International Workshop on Machine Learning for Medical
Image Reconstruction, pp. 91–99. Springer (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural
networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Laanait, N., Romero, J., Yin, J., Young, M.T., Treichler, S., Starchenko, V., Borisevich, A., Sergeev,
A., Matheson, M.: Exascale Deep Learning for Scientific Inverse Problems. arXiv1909.11150
(2019)
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.:
Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551
(1989)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., Aila, T.: Noise2noise:
Learning Image Restoration Without Clean Data. arXiv1803.04189 (2018)
Lei, K., Mardani, M., Pauly, J.M., Vasawanala, S.S.: Wasserstein GANs for MR Imaging: From
Paired to Unpaired Training. arXiv1910.07048 (2019)
Leuschner, J., Schmidt, M., Baguer, D.O., Maaß, P.: The LoDoPaB-CT Dataset: A Benchmark
Dataset for Low-Dose CT Reconstruction Methods. arXiv1910.01113 (2019)
Li, H., Yang, Y., Chen, D., Lin, Z.: Optimization Algorithm Inspired Deep Neural Network
Structure Design. arXiv1810.01638 (2018)
Lønning, K., Putzky, P., Caan, M.W., Welling, M.: Recurrent Inference Machines for Accelerated
MRI Reconstruction. arXiv (2018)
Mardani, L.L.M.: Semi-supervised super-resolution GANs for MRI. In: 31st Conference on Neural
Information Processing Systems (NIPS 2017), Long Beach (2017)
Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S., Zaharchuk, G., Alley, M., Thakur, N., Han, S.,
Dally, W., Pauly, J.M., et al.: Deep Generative Adversarial Networks for Compressed Sensing
Automates MRI. arXiv1706.00051 (2017a)
Mardani, M., Monajemi, H., Papyan, V., Vasanawala, S., Donoho, D., Pauly, J.: Recurrent
Generative Adversarial Networks for Proximal Learning and Automated Compressive Image
Recovery. arXiv1711.10046 (2017b)
Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S.S., Zaharchuk, G., Xing, L., Pauly, J.M.:
Deep generative adversarial neural networks for compressive sensing MRI. IEEE Trans. Med.
Imaging 38(1), 167–179 (2018)
Mardani, M., Sun, Q., Papyan, V., Vasanawala, S., Pauly, J., Donoho, D.: Degrees of Freedom
Analysis of Unrolled Neural Networks. arXiv preprint arXiv:1906.03742 (2019)
McCann, M.T., Unser, M.: Algorithms for Biomedical Image Reconstruction. arXiv1901.03565
(2019)
Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A.,
Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
Putzky, P., Welling, M.: Recurrent Inference Machines for Solving Inverse Problems.
arXiv1706.04008 (2017)
Putzky, P., Karkalousos, D., Teuwen, J., Miriakov, N., Bakker, B., Caan, M., Welling, M.: i-RIM
Applied to the fastMRI Challenge. arXiv1910.08952 (2019)
Ramzi, Z.: fastMRI reproducible benchmark. https://github.com/zaccharieramzi/fastmri-
reproducible-benchmark (2019)
Raphan, M., Simoncelli, E.P.: Learning to be Bayesian without supervision. In: Advances in Neural
Information Processing Systems, pp. 1145–1152 (2007)
Ravishankar, S., Ye, J.C., Fessler, J.A.: Image Reconstruction: From Sparsity to Data-Adaptive
Methods and Machine Learning. arXiv1904.02816 (2019)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 234–241. Springer (2015)
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A., Rueckert, D.: A deep cascade of convolutional
neural networks for MR image reconstruction. In: International Conference on Information
Processing in Medical Imaging, pp. 647–658. Springer (2017)
Schlemper, J., Salehi, S.S.M., Kundu, P., Lazarus, C., Dyvorne, H., Rueckert, D., Sofka, M.:
Nonuniform variational network: deep learning for accelerated nonuniform MR image recon-
struction. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 57–64. Springer (2019)
Schwab, J., Antholzer, S., Haltmeier, M.: Deep null space learning for inverse problems:
convergence analysis and rates. Inverse Prob. https://iopscience.iop.org/article/10.1088/1361-
6420/aaf14a (2018)
Soltanayev, S., Chun, S.Y.: Training deep learning based denoisers without ground truth data. In:
Advances in Neural Information Processing Systems, pp. 3257–3267 (2018)
Sun, J., Li, H., Xu, Z., et al.: Deep ADMM-Net for compressive sensing MRI. In: Advances in
Neural Information Processing Systems, pp. 10–18 (2016)
Syben, C., Michen, M., Stimpel, B., Seitz, S., Ploner, S., Maier, A.K.: PYRO-NN: Python
Reconstruction Operators in Neural Networks. arXiv1904.13342 (2019)
van Aarle, W., Palenstijn, W.J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K.J.,
Sijbers, J.: The ASTRA Toolbox: a platform for advanced algorithm development in electron
tomography. Ultramicroscopy 157, 35–47 (2015)
Vishnevskiy, V., Sanabria, S.J., Goksel, O.: Image reconstruction via variational network for real-
time hand-held sound-speed imaging. In: International Workshop on Machine Learning for
Medical Image Reconstruction, pp. 120–128. Springer (2018)
Vishnevskiy, V., Rau, R., Goksel, O.: Deep Variational Networks with Exponential Weighting for
Learning Computed Tomography. arXiv1906.05528 (2019)
Vogel, C., Pock, T.: A primal dual network for low-level vision problems. In: German Conference
on Pattern Recognition, pp. 189–202. Springer (2017)
Wang, G., Ye, J.C., Mueller, K., Fessler, J.A.: Image reconstruction is a new frontier of machine
learning. IEEE Trans. Med. Imaging 37(6), 1289–1296 (2018)
Wu, D., Kim, K., Dong, B., El Fakhri, G., Li, Q.: End-to-end lung nodule detection in computed
tomography. In: International Workshop on Machine Learning in Medical Imaging, pp. 37–45.
Springer (2018)
Wu, D., Kim, K., El Fakhri, G., Li, Q.: Computational-efficient cascaded neural network for CT
image reconstruction. In: Medical Imaging 2019: Physics of Medical Imaging, vol. 10948,
p. 109485Z. International Society for Optics and Photonics (2019a)
Wu, D., Kim, K., Kalra, M.K., De Man, B., Li, Q.: Learned primal-dual reconstruction for dual
energy computed tomography with reduced dose. In: 15th International Meeting on Fully
Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 11072, p.
1107206. International Society for Optics and Photonics (2019b)
Yang, G., Yu, S., Dong, H., Slabaugh, G., Dragotti, P.L., Ye, X., Liu, F., Arridge, S., Keegan, J.,
Guo, Y., et al.: DAGAN: deep de-aliasing generative adversarial networks for fast compressed
sensing MRI reconstruction. IEEE Trans. Med. Imaging 37(6), 1310–1321 (2017a)
Yang, Q., Yan, P., Kalra, M.K., Wang, G.: CT Image Denoising with Perceptive Deep Neural
Networks. arXiv1702.07019 (2017b)
Yang, Q., Yan, P., Zhang, Y., Yu, H., Shi, Y., Mou, X., Kalra, M.K., Zhang, Y., Sun, L., Wang, G.:
Low-dose CT image denoising using a generative adversarial network with Wasserstein distance
and perceptual loss. IEEE Trans. Med. Imaging 37(6), 1348–1357 (2018)
Yang, C., Lan, H., Gao, F.: Accelerated photoacoustic tomography reconstruction via recurrent
inference machines. In: 2019 41st Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), pp. 6371–6374. IEEE (2019)
Zbontar, J., Knoll, F., Sriram, A., Muckley, M.J., Bruno, M., Defazio, A., Parente, M., Geras,
K.J., Katsnelson, J., Chandarana, H., et al.: FastMRI: An Open Dataset and Benchmarks for
Accelerated MRI. arXiv1811.08839 (2018)
Zhang, H., Dong, B., Liu, B.: JSR-Net: a deep network for joint spatial-radon domain CT
reconstruction from incomplete data. In: ICASSP 2019–2019 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), pp. 3657–3661. IEEE (2019)
Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-
transform manifold learning. Nature 555(7697), 487 (2018)
An Analysis of Generative Methods for
Multiple Image Inpainting 20
Coloma Ballester, Aurélie Bugeau, Samuel Hurault,
Simone Parisotto, and Patricia Vitoria
Contents
Introduction 774
A Walk Through the Image Inpainting Literature 775
How to Achieve Multiple and Diverse Inpainting Results? 778
  Generative Adversarial Networks 780
  Variational Autoencoders and Conditional Variational Autoencoders 784
  Autoregressive Models 788
  Image Transformers 790
From Single-Image Evaluation Metrics to Diversity Evaluation 793
Experimental Results 795
  Experimental Settings 796
  Quantitative Performance 797
  Qualitative Performance 803
Conclusions 809
Appendix 809
  Additional Quantitative Results 809
  Additional Qualitative Results 810
References 813
Abstract
Keywords
Introduction
Most existing inpainting methods attempt to generate one single result from a given image,
ignoring many other plausible solutions. In this chapter, we focus on analyzing
recent advances in the inpainting literature, concentrating on the learning-based
approaches for multiple and diverse inpainting. The goal of those methods is to
estimate multiple plausible inpainted solutions that are as diverse as possible. These methods mainly build on the idea of exploiting image coherency at several levels, along with the power of neural networks trained on large datasets of images. Unlike previous one-to-one methods, multiple-image inpainting offers the advantage of exploring a large space of possible solutions. This gives the user the capacity to choose the preferred result according to their own judgment, instead of leaving the task of singling out one solution to the algorithm itself.
This chapter is structured as follows: Section “A Walk Through the Image
Inpainting Literature” provides a brief overview of both model-based and learning-
based inpainting methods in the literature. Section “How to Achieve Multiple and
Diverse Inpainting Results?” presents the underlying theory of several approaches
for multiple and diverse inpainting together with a review of the most representative
(to the best of our knowledge) state-of-the-art proposals using those particular
strategies. Section “From Single-Image Evaluation Metrics to Diversity Evaluation”
presents the evaluation metrics for both inpainting quality and diverse inpainting.
The multiple inpainting results of the methods of section “How to Achieve Multiple
and Diverse Inpainting Results?” are presented and compared in section “Exper-
imental Results” both quantitatively and qualitatively, on common datasets and
masks, concerning three aspects: proximity to ground truth, perceptual quality,
and inpainting diversity. Finally, section “Conclusions” concludes the presented
analysis.
A Walk Through the Image Inpainting Literature
In the literature, inpainting methods can fall under different categories, e.g., local
vs. nonlocal depending on the ability to capture and exploit non-nearby content, or
geometric vs. exemplar-based methods depending on the action on points or patches.
For our purposes, it is more convenient to distinguish between learning- and model-
based approaches, according to the usage or not of machine learning techniques.
For extensive reviews of existing inpainting methods, we refer the reader to the
works in Guillemot and Le Meur (2014), Schonlieb (2015), Buyssens et al. (2015),
and Parisotto et al. (2022).
Model-Based Inpainting
Model-based inpainting methods are designed to manipulate an image by exploit-
ing its regularity and coherency features with an explicit model governing the
inpainting workflow. One approach for restoring geometric image content is to
locally propagate the intensity values and regularity of the image level lines into the inpainting domain with curvature-driven (Nitzberg et al. 1993; Masnou and
Morel 1998; Ballester et al. 2001; Chan and Shen 2001; Esedoglu and Shen 2002;
Shen et al. 2003) and diffusion-based (Caselles et al. 1998; Shen and Chan 2002;
Tschumperle and Deriche 2005) evolutionary partial differential equations (PDEs),
possibly of fluid-dynamic nature (Bertalmío et al. 2000, 2001; Tai et al. 2007) or
with coherent transport mechanisms (Bornemann and März 2007), also by invoking
variational principles (Grossauer and Scherzer 2003; Bertozzi et al. 2007) and
regularization (possibly of higher order) priors (Papafitsoros and Schönlieb 2013).
These methods are most effective at filling in geometry, especially small scratches and homogeneous content in small inpainting domains, but they perform poorly in recovering texture. This issue is overcome by considering
a patch (a group of neighboring points in the image domain) as the imaging atom
containing the essential texture element. The variational formulation of dissimilarity
metrics based on the estimation of a correspondence map between patches (Efros
and Leung 1999; Bornard et al. 2002; Demanet et al. 2003; Criminisi et al. 2004;
Aujol et al. 2010) has led to the design of optimal copying-pasting strategies for
inpainting large domains. However, these methods still fail, e.g., in the presence of
different scale-space features. Thus, some researchers have exploited, also using a
variational approach, the efficiency of PatchMatch (Barnes et al. 2009) in computing
a probabilistic approximation of correspondence maps between patches to average
the contribution of multiple-source patches during the synthesis step. For example,
Arias et al. (2011) and Newson et al. (2014) use it in a non-local mean fashion
(Wexler et al. 2004), to inpaint rescaled versions of the original image with
results propagated from the coarser to the finer scale; Cao et al. (2011) to guide
the inpainting with geometric-sketches; Sun et al. (2005) to guide structures; or
Mansfield et al. (2011), Eller and Fornasier (2016), and Fedorov et al. (2016) to
account for geometric transformations of patches. However, these mathematical and numerical advances can be computationally expensive, suffer from having only a single image as source, and depend on the quality of the initialization and on the selection of the associated parameters (e.g., the patch size). Thus, it seems natural to study whether image coherency, smoothness, and
self-similarity patterns can be further exploited by augmenting the dataset of source
images and eventually synthesize multiple inpainting solutions: this is where diverse
inpainting with deep learning-based generative approaches is a significant step
forward.
One of the earliest model-based inpainting works dealing with multiple-source
images is Kang et al. (2002), where salient landmarks are extracted in a scene
under different perspectives and then synthesized by interpolation, guiding the
imaging restoration. As mentioned, model-based methods are sensitive to initializations and to the chosen parameters: one way to diminish these drawbacks is to perform inpainting
of the input image multiple times, by varying parameters like the patch size, the
number of pyramid scales, initializations, and inpainting methodologies. Thus,
a final assembling step will produce an inpainted image, which encodes locally
the most coherent content (Hays and Efros 2007; Le Meur et al. 2013; Kumar
et al. 2016). Still, the computational effort of estimating several solutions with different parameters, and of fine-tuning them, remains a key issue, leading to the need for an encompassing strategy that can locally adapt the synthesis step from multiple-
source images. This task can be solved with learning-based methods.
Learning-Based Methods
Learning-based methods address image inpainting by learning a mapping from a
corrupted input to the estimated restoration by training on a large-scale dataset.
Besides capturing local or non-local regularities and redundancy inside the image
or the entire dataset, those methods also exploit high-level information inherent in
the image itself, such as global regularities and patterns, or perceptual clues and
semantics over the images.
Early learning-based methods tackled the problem as a blind inpainting problem
(Ren et al. 2015; Cai et al. 2015) by minimizing the distance between the predicted
image and the ground truth. This type of method behaved as an image denoising
algorithm and was limited to tiny inpainting domains. To deal with bigger and
more realistic inpainting regions, later approaches incorporated in the model the
information provided by the mask, e.g., Köhler et al. (2014), Ren et al. (2015),
Pathak et al. (2016), and Lempitsky et al. (2018). Also, several modifications to
vanilla convolutions have been proposed to explicitly use the information of the
mask, like partial convolutions (Liu et al. 2018) and gated convolutions (Yu et al.
2019), where the output of those layers only depends on non-corrupted points.
Additionally, attempts to increase the receptive field without increasing the number
of layers have been proposed with dilated convolutions (Iizuka et al. 2017; Wang
et al. 2018) and contextual attention (Yu et al. 2018, 2019). Learning to inpaint in
a single step has proven to be a complex endeavor. Progressive learning approaches
have also been introduced to split the learning into several steps: for instance, Zhang
et al. (2018a) progressively fills the holes from outside to inside; similarly, Guo
et al. (2019), Zeng et al. (2020), and Li et al. (2020) also learn how to update the
inpainting mask for next iteration, and Li et al. (2019) learns jointly structure and
feature information.
To train the network, early approaches minimized some distance between the
ground-truth and the predicted image. However, this approach takes into account just one of the several plausible solutions to the inpainting problem. Several
approaches have been proposed to overcome this drawback. Some works use
perceptual metrics based on generative adversarial networks (GANs) aiming to
generate more perceptually realistic results (Pathak et al. 2016; Yeh et al. 2017;
Iizuka et al. 2017; Yu et al. 2018; Vitoria et al. 2019, 2020; Dapogny et al. 2020; Liu
et al. 2019; Lahiri et al. 2020). Other works tackle the problem in the feature space
by minimizing distances at feature space level (Fawzi et al. 2016; Yang et al. 2017;
Vo et al. 2018) by using an additional pre-trained network, or by directly inpainting
those features (Yan et al. 2018; Zeng et al. 2019). Also, two-step approaches have
been proposed. They are based on a first coarse inpainting (Yang et al. 2017; Yu et al.
2018; Liu et al. 2019), edge learning (Liao et al. 2018; Nazeri et al. 2019; Li et al.
2019), or structure prediction (Xiong et al. 2019; Ren et al. 2019) and followed by a
refinement step adding finer texture details. Furthermore, Liu et al. (2020) aimed to
ensure consistency between structure and texture generation. Another major limitation of early deep learning methods is that they can only handle input images of limited resolution. While the first approaches were only able to deal with images of maximum size 64 × 64, the latest methods can deal with 1024 × 1024 images by using, for example, a multiscale approach (Yang et al. 2017; Zeng et al. 2019), or even with 8K resolution by first generating a low-resolution solution and then its high-frequency residuals (Yi et al. 2020b).
Recent works (e.g., Zheng et al. 2019; Zhao et al. 2020b; Cai and Wei 2020; Peng
et al. 2021; Wan et al. 2021; Liu et al. 2021) deal with the ill-posed nature of the
problem by allowing more than one possible plausible solution to a given image.
They aim to generate multiple and diverse solutions by using deep probabilistic
models based on variational autoencoders (VAEs), GANs, autoregressive models,
transformers, or a combination of them. Note that those types of methods have been
also used for real case applications such as diverse fashion image inpainting (Han
et al. 2019) and cosmic microwave background radiation (CMB) image inpainting
(Yi et al. 2020a). Besides, it is worth mentioning that there are several single-
image generation methods that estimate complete images with some variations. For
instance, SinGAN (Rott Shaham et al. 2019) produces several random images which
are deviations of an input image by learning the distribution of its patches. Park
et al. (2019) synthesizes new images by controlling style and semantics. However,
these strategies do not completely fit within the multiple inpainting problems where
regions of the image are known and should not be changed. In this chapter, we will
focus on the study of multiple-image inpainting methods. More precisely, we will
review, analyze, and compare, theoretically as well as experimentally, the different
approaches proposed in the literature to generate inpainting diversity.
How to Achieve Multiple and Diverse Inpainting Results?
In this section, we will describe the different tools and methods that successfully
addressed multiple image inpainting. Later in section “Experimental Results”, we
will conduct a thorough experimental study comparing these methods visually and
quantitatively.
As previously mentioned, image inpainting is an inverse problem with multiple
plausible solutions. Generally, ill-posed problems are solved by incorporating some
knowledge or priors into the solution. Mathematically, this is frequently done using
a variational approach where a prior is added to a data-fidelity term to create an
overall objective functional that is then optimized. The selected prior promotes
the singling out of a particular solution. Traditionally, the incorporated priors were
model-based, founded on properties of the expected solution.
More recently, data-driven proposals have emerged where the prior knowledge
on the image distribution is implicitly or explicitly learned via neural network
optimization (we refer to the recent survey Arridge et al. 2019 and references
therein). Among them, generative methods have been used to learn the underlying
geometric and semantic priors of a set of non-corrupted images. Indeed, generative
methods aim to estimate the probability distribution of a large set, X, of data.
Table 1 Generative methods used in the analyzed state-of-the-art proposals for diverse inpainting
We are interested in modeling the conditional distribution, p(x|y), over the values
of the variable x (corresponding to the complete image) conditioned on the value of
the observed variable y. As possibly many plausible images are consistent with the
same input image y, the distribution p(x|y) will likely be multimodal. Then, each
of the multiple solutions can be generated by sampling from that distribution using
a given sampling strategy. Thus, the goal is not only to obtain a generative model
that minimizes d(PG , PXs ), where Xs ⊂ X is the set of possible solutions, but also
to design a mechanism able to sample the conditional distribution p(x|y), i.e., for a
given damaged incomplete image y, output a set of plausible completions x of y.
In this section, we will analyze the different families of generative models
proposed in the literature to realize diverse image inpainting. We will in particular
describe generative adversarial networks (GAN), variational autoencoders (VAE),
autoregressive models, and transformers. We will also detail the different objective
losses proposed to train these networks. Finally, for each family of models, we will
review several state-of-the-art diverse inpainting methods that relate to this model.
Table 1 lists all the methods that will be reviewed in this section.
Generative Adversarial Networks
Generative adversarial networks (GANs) are a type of generative model that has received a lot of attention since the seminal work of Goodfellow et al. (2014). The
GAN strategy is based on a game theory scenario between two networks, a generator
network and a discriminator network, that are jointly trained competing against each
other in the sense of a Nash equilibrium. The generator maps a vector from the
latent space, z ∼ PZ , to the image space trying to trick the discriminator, while
the discriminator receives either a generated or a real image and must distinguish
between both. The parameters of the generator and the discriminator are learned
jointly by optimizing a GAN objective by a min-max procedure. This procedure
drives the probability distribution of the generated data to be as close as possible, in some distance, to that of the real data. Several GAN variants have appeared.
They mainly differ on the choice of the distance d(P1 , P2 ) between two probability
distributions P1 and P2 . The first GAN by Goodfellow et al. (2014) (also referred
to as vanilla GAN) makes use of the Jensen–Shannon divergence, which is defined
from the Kullback–Leibler divergence (KL), by
$$
d_{JS}(P_1, P_2) = \frac{1}{2}\left[\,\mathrm{KL}\!\left(P_1 \,\Big\|\, \frac{P_1+P_2}{2}\right) + \mathrm{KL}\!\left(P_2 \,\Big\|\, \frac{P_1+P_2}{2}\right)\right], \qquad (2)
$$
where
$$
\mathrm{KL}(P_1, P_2) = \sum_{x} P_1(x)\,\log \frac{P_1(x)}{P_2(x)}. \qquad (3)
$$
The Wasserstein GAN (Arjovsky et al. 2017) uses the Wasserstein-1 distance, given by
$$
W_1(P_1, P_2) = \inf_{\pi \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y)\sim\pi}\big[\|x - y\|\big], \qquad (5)
$$
where $\Pi(P_1, P_2)$ is the set of all joint distributions $\pi$ whose marginals are, respectively, $P_1$ and $P_2$. By Kantorovich–Rubinstein duality, the Wasserstein-1 distance can be computed as
$$
W_1(P_1, P_2) = \sup_{D \in \mathcal{D}} \; \mathbb{E}_{x\sim P_1}[D(x)] - \mathbb{E}_{y\sim P_2}[D(y)], \qquad (6)
$$
where $\mathcal{D}$ denotes the set of 1-Lipschitz functions. In practice, the dual variable $D$ is parametrized by a neural network and represents the so-called discriminator.
Both the generator and the discriminator are jointly trained to solve
$$
\min_G \sup_{D \in \mathcal{D}} \; \mathbb{E}_{x\sim P_X}[D(x)] - \mathbb{E}_{y\sim P_G}[D(y)] \qquad (7)
$$
for the Wasserstein GAN, and
$$
\min_G \max_D \; \mathbb{E}_{x\sim P_X}\big[\log D(x)\big] + \mathbb{E}_{z\sim P_Z}\big[\log\big(1 - D(G(z))\big)\big] \qquad (8)
$$
for the vanilla GAN. In (8), the discriminator $D$ is simply a classifier that tries to distinguish samples in the training set $X$ (real samples) from the generated samples $G(z)$ (fake samples) by assigning a probability $D(x) \in [0, 1]$ for its likelihood to be from the same distribution as the samples in $X$.
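The sketch below writes out the two training objectives for a single batch; `generator` and `discriminator` are generic placeholders, the vanilla GAN discriminator is assumed to output probabilities in [0, 1], and the 1-Lipschitz constraint of the Wasserstein case (enforced in practice by weight clipping or a gradient penalty) is not shown.

```python
import torch
import torch.nn.functional as F

def vanilla_gan_losses(discriminator, generator, x_real, z):
    """Discriminator and generator losses for the min-max objective (8).
    The discriminator is assumed to output a probability D(x) in [0, 1]."""
    x_fake = generator(z)
    d_real = discriminator(x_real)
    d_fake = discriminator(x_fake.detach())
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    # Non-saturating generator loss, commonly used instead of minimizing
    # log(1 - D(G(z))) directly.
    d_on_fake = discriminator(x_fake)
    g_loss = F.binary_cross_entropy(d_on_fake, torch.ones_like(d_on_fake))
    return d_loss, g_loss

def wasserstein_gan_losses(discriminator, generator, x_real, z):
    """Losses corresponding to objective (7); the 1-Lipschitz constraint
    on the discriminator is not enforced here."""
    x_fake = generator(z)
    d_loss = -(discriminator(x_real).mean() - discriminator(x_fake.detach()).mean())
    g_loss = -discriminator(x_fake).mean()
    return d_loss, g_loss
```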
GANs are sometimes referred to as implicit probabilistic models due to the fact
that they are defined through a sampling procedure where the generator learns
to generate new image samples. This is in contrast to variational autoencoders,
autoregressive models, and methods that explicitly maximize the likelihood.
For the task of inpainting, several proposals set the problem as a conditioned
one. The GAN approach is modified such that the input of the generator G is both
an incomplete image y and a latent vector z ∼ PZ , and G performs conditional
image synthesis where the conditioning input is y. In the GAN-based works that
we present in this section (Cai and Wei 2020; Liu et al. 2021), the authors focus on
multimodal conditioned generation where the goal is to generate multiple plausible
output images for the same given incomplete image.
Finally, let us mention that in these works, and more generally in several works described in this chapter, the generative methods are combined with consistency losses that encourage the inpainted images to be close to the ground truth. Examples of such consistency losses include value and feature reconstruction losses as well as perceptual losses. Nonetheless, several researchers working on multiple inpainting acknowledge that relying on consistency losses can be counterproductive, since the ground truth is only one of the multiple plausible solutions.
Fig. 1 Overview of the architecture of PiiGAN: Generative Adversarial Networks for Pluralistic
Image Inpainting (Cai and Wei 2020). (Figure from Cai and Wei 2020)
Fig. 2 Overview of the architecture of PD-GAN: Probabilistic Diverse GAN for Image Inpainting
(Liu et al. 2021). (Figure from Liu et al. 2021)
PD-GAN: Probabilistic Diverse GAN for Image Inpainting (Liu et al. 2021)
The authors of Liu et al. (2021) propose a method to perform diverse image
inpainting called PD-GAN. PD-GAN takes advantage of the benefits of GANs in
generating diverse content from different random noise inputs. Figure 2 displays an
overview of the algorithm pipeline. In contrast to the original vanilla GAN, in PD-
GAN all the decoder deep features are modulated from coarse to fine by injecting
prior information at each scale. This prior information is extracted from an initially
restored image at a coarser resolution together with the inpainting mask. For that
purpose, they introduce a probabilistic diversity normalization (SPDNorm) module
based on the Spatially-adaptive denormalization (SPADE) module proposed in Park
et al. (2019). SPDNorm works by modeling the probability of generating a pixel
conditioned on the context information. It allows more diversity toward the center
of the inpainted hole and more deterministic content around the inpainting boundary.
The objective loss is a combination of several losses, including a diversity loss,
a reconstruction loss, an adversarial loss, and a feature matching loss (difference
in the output feature layers computed with the learned discriminator). In general,
in the context of multiple-image synthesis, diversity losses aim at ensuring that the
different reconstructed images are diverse enough. In particular, the authors of PD-
GAN (Liu et al. 2021) use the so-called perceptual diversity loss, defined as
$$
\mathcal{L}_{pdiv}(x_{out_a}, x_{out_b}) = \frac{1}{\sum_{l} \big\| M \odot \big(F_l(x_{out_a}) - F_l(x_{out_b})\big) \big\|_1 + \epsilon}, \qquad (9)
$$
where $x_{out_a}$ and $x_{out_b}$ are two inpainted results, $F_l$ denotes the deep features extracted at layer $l$ of a pre-trained network, $\epsilon$ is a small constant, and $M$ is the inpainting mask (with 1 values on the missing pixels and 0 elsewhere). The minimization of (9) favors
the maximization of the perceptual distance of inpainted regions in xout a and xout b .
Notice that the non-masked pixels are not affected by this loss. A similar diversity
loss was proposed in Mao et al. (2019).
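A minimal sketch of a perceptual diversity loss in the spirit of (9), assuming the deep features F_l have already been extracted with any pre-trained network and that `eps` plays the role of the small stabilizing constant in the denominator:

```python
import torch
import torch.nn.functional as F

def perceptual_diversity_loss(feats_a, feats_b, mask, eps=1e-5):
    """feats_a, feats_b: lists of feature maps F_l(x_out_a), F_l(x_out_b), each of
    shape (B, C_l, H_l, W_l); mask: (B, 1, H, W) with 1 on missing pixels."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        # Resize the inpainting mask to the spatial size of the current features.
        m = F.interpolate(mask, size=fa.shape[-2:], mode="nearest")
        total = total + (m * (fa - fb)).abs().sum()
    # Minimizing 1 / (sum + eps) pushes the masked features of the two
    # inpainted results apart, i.e., it encourages diversity.
    return 1.0 / (total + eps)
```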
Variational Autoencoders and Conditional Variational Autoencoders
Variational autoencoders (VAE) (Kingma and Welling 2013) are generative models
for which the considered distance between probability distributions is the Kullback–
Leibler divergence. Maximization of the log-likelihood criterion is equivalent to
the minimization of a Kullback–Leibler divergence between the data and model
distributions. In the VAE context, the generator Gθ is referred to as the decoder.
Let us first derive the vanilla VAE formulation in the general context of non-corrupted images x ∈ X. Using Bayes rule, the likelihood $p_{G_\theta}(x) = p_\theta(x)$, for x ∼ P_X and z ∼ P_Z, is given by
$$
p_\theta(x) = \frac{p_\theta(x\mid z)\, p_\theta(z)}{p_\theta(z\mid x)}. \qquad (10)
$$
Introducing an approximate posterior $q_\psi(z\mid x)$, parametrized by an encoder network with parameters $\psi$, and taking the expectation of the log-likelihood with respect to it yields
$$
\log p_\theta(x) = \mathbb{E}_{q_\psi(z|x)}\big[\log p_\theta(x)\big] = \mathbb{E}_{q_\psi(z|x)}\left[\log \frac{p_\theta(x\mid z)\, p_\theta(z)}{p_\theta(z\mid x)}\right] \qquad (11)
$$
$$
= \mathbb{E}_{q_\psi(z|x)}\left[\log \frac{p_\theta(x\mid z)\, p_\theta(z)\, q_\psi(z\mid x)}{p_\theta(z\mid x)\, q_\psi(z\mid x)}\right] \qquad (12)
$$
$$
= \mathcal{L}_{\theta,\psi}(x) + \mathrm{KL}\big(q_\psi(z\mid x)\,\|\,p_\theta(z\mid x)\big), \quad \text{with}\quad \mathcal{L}_{\theta,\psi}(x) = \mathbb{E}_{q_\psi(z|x)}\big[\log p_\theta(x\mid z)\big] - \mathrm{KL}\big(q_\psi(z\mid x)\,\|\,p_\theta(z)\big). \qquad (13)
$$
$\mathcal{L}_{\theta,\psi}$ is the so-called evidence lower bound (ELBO). By positivity of the KL, it verifies
$$
\mathcal{L}_{\theta,\psi}(x) = \log p_\theta(x) - \mathrm{KL}\big(q_\psi(z\mid x)\,\|\,p_\theta(z\mid x)\big) \le \log p_\theta(x), \qquad (14)
$$
and $\mathcal{L}_{\theta,\psi}(x) = \log p_\theta(x)$ if and only if $q_\psi(z\mid x)$ is equal to $p_\theta(z\mid x)$.
VAE training consists in maximizing Lθ,ψ in (14) with respect to the parameters
{θ, ψ} of the encoder and of the decoder, simultaneously. The goal is to obtain a
good approximation qψ (z|x) of the true posterior pθ (z|x) while maximizing the
marginal likelihood pθ (x).
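For concreteness, the sketch below evaluates the negative ELBO for the common choice of a Gaussian encoder q_ψ(z|x) with a standard normal prior and a simple squared-error reconstruction term; this is one standard instantiation, not necessarily the one used by the methods discussed later.

```python
import torch

def negative_elbo(x, decoder, mu, logvar):
    """-L_{theta,psi}(x) for a Gaussian encoder q_psi(z|x) = N(mu, diag(exp(logvar)))
    and a standard normal prior p(z); mu, logvar are the encoder outputs."""
    # Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I).
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    x_rec = decoder(z)
    # Reconstruction term -E_q[log p_theta(x|z)], here a squared error up to constants.
    rec = 0.5 * ((x_rec - x) ** 2).flatten(1).sum(dim=1)
    # Closed-form KL(q_psi(z|x) || N(0, I)).
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=-1)
    return (rec + kl).mean()
```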
The work Sohn et al. (2015) extends VAEs by proposing conditional variational
autoencoders (CVAE). Their targeted distribution is the conditional distribution of
x given an input “conditional” variable c and the maximization of the log-likelihood
criterion becomes
$$
\max_{\theta}\; \mathbb{E}_{(x,c)}\big[\log p_\theta(x \mid c)\big]. \qquad (15)
$$
The CVAE loss is obtained with a similar argument as in (11), (12), (13), and (14) by maximizing the conditional log-likelihood, which gives the variational lower bound of the conditional log-likelihood
$$
\mathbb{E}_{q_\psi(z|x,c)}\big[\log p_\theta(x \mid c, z)\big] - \mathrm{KL}\big(q_\psi(z\mid x, c)\,\|\,p_\theta(z\mid x)\big) \le \log p_\theta(x \mid c). \qquad (16)
$$
Then, the idea of the deep conditional generative modeling is simple: given an
observation (input) x, z is drawn from a prior distribution pθ (z|x). Then, the output
is generated from the distribution pθ (x|z, c). Bao et al. (2017) combines a CVAE
with a GAN (CVAE-GAN) for fine-grained category image generation. Even if
inpainting results are shown, the network is not trained explicitly for inpainting but
for image generation conditioned on image labels.
In the context of multiple-image inpainting, or more generally of multiple-
image restoration, a straightforward idea is to condition the generative model on
the input degraded image y and to generate multiple images x sampling from
pθ (x|z, c = y). BicycleGAN (Zhu et al. 2017) uses this idea for diverse image-
to-image translation. Their goal is to learn a bijective mapping between two image
domains with a multimodal conditional distribution. They combine CVAE-GAN
with latent regressors and show that their method can produce both diverse and
realistic results across various image-to-image translation problems. However, their
method is not explicitly applied for image inpainting. Moreover, as observed by
several authors (see, e.g., Zheng et al. 2019; Wan et al. 2021), using standard
conditional VAEs or CVAE-GAN for the specific task of image inpainting still leads
to minimal diversity and quality. Several extensions of these models have recently
appeared for diverse image inpainting. They are presented below with more details.
Finally, let us notice that the VAE model has been extended in van den Oord et al.
(2017) and Razavi et al. (2019) to the so-called vector quantized–variational autoen-
coder (VQ-VAE) that uses vector quantization to model discrete latent variables.
Such discretization is done to avoid posterior collapse. The quantization codebook
is trained at the same time as the autoencoder with an objective loss made of a
reconstruction term and a regularization term that ensures that the embedding fits
the encoder and outputs, respectively. The work Razavi et al. (2019) is a hierarchical
extension of van den Oord et al. (2017). In particular, the authors of Razavi et al.
Fig. 3 Overview of the PIC architecture of pluralistic image completion (Zheng et al. 2019).
(Figure from Zheng et al. 2019)
the whole training loss is a combination of three types of terms. First, they use the KL divergences between the mentioned distributions. Second, they use appearance terms based on the L1 norm of the error, which in the generative path only take into account the visible pixels. Lastly, the third term is an adversarial discriminator-based term: it is based on the L1 difference between the discriminator features of the ground truth and of the reconstructed image for the reconstructive path, and on the
discriminator value on the generated image for the generative path. Additionally, to
exploit the distant relation among the encoder and decoder, they use a modified self-
attention layer that captures fine-grained features in the encoder and more semantic
generative features in the decoder.
Autoregressive Models
In autoregressive models (Van Oord et al. 2016; Oord et al. 2016; Chen et al. 2018),
the likelihood pθ (x) is learned by choosing an order of the data variables x =
(x1 , x2 , . . . , xn ) ∈ X, frequently related to values on the n pixels of an image, and
exploiting the fact that the joint distribution can be decomposed as
$$
p(x) = p(x_1, x_2, \ldots, x_n) = p(x_1) \prod_{i=2}^{n} p(x_i \mid x_1, \ldots, x_{i-1}). \qquad (17)
$$
More generally, a similar decomposition to (17) can be obtained by splitting the set
of variables in smaller disjoint subsets. In this case, and considering the variable
order of x1 , x2 , . . . , xn to be represented by a directed and noncyclic graph, one has
$$
p(x) = p(x_1, x_2, \ldots, x_n) = p(x_1) \prod_{i=2}^{m} p(x_i \mid S(x_i)), \qquad (18)
$$
i=2
Fig. 5 Overview of the architecture of generating diverse structure for image inpainting with
hierarchical VQ-VAE (DSI-VQVAE) (Peng et al. 2021). (Figure from Peng et al. 2021)
The first stage of Peng et al. (2021), known as diverse structure generator,
generates multiple low-resolution results, each of which has a different structure by
sampling from a conditional autoregressive distribution. The second stage, known as
texture generator, uses an encoder–decoder architecture with a structural attention
module that refines each low-resolution result separately by augmenting texture.
The structural information module facilitates the capture of distant correlations.
They further reuse the VQ-VAE to calculate two feature losses, which help improve
structure coherence and texture realism, respectively.
The authors first train the hierarchical VQ-VAE and, afterward, the diverse
structure generator (Gs depending on parameters θ ) and the texture generator (Gt
depending on parameters ϕ) are trained separately. These generators are later on
used for inference. The structure generator Gs is constructed via a conditional
autoregressive network for the distribution over structural features. In inference, it
will generate different structural features via sampling. Its objective loss is defined
as the negative log-likelihood
$$
\mathcal{L}_{s}(\theta) = \mathbb{E}_{x \sim P_X}\big[-\log p_\theta\big(s_{gt} \mid y, M\big)\big], \qquad (19)
$$
where y is the input image to be inpainted on the points of O where the hole mask M is equal to 1, $P_X$ denotes the distribution of the training dataset, $s_{gt}$ denotes the vector quantized structural features of the ground truth at the coarser scale given by the hierarchical VQ-VAE, and θ the parameters of Gs.
Besides, the objective loss for the texture generator Gt is composed of: (i)
the L1 norm comparing the inpainted solution to the ground truth at pixel level,
(ii) an adversarial loss using the discriminator trained with the SN-PatchGAN
hinge version (Yu et al. 2019) applied to the resulting image and, moreover, (iii)
a structural feature loss Lsf (ϕ), and (iv) a textural feature loss Ltt (ϕ). These last
two losses are defined similarly using a multiclass cross-entropy loss. In particular,
the structural feature loss is defined as
$$
\mathcal{L}_{sf}(\varphi) = -\sum_{k,j} \alpha_{k,j} \log\big(\mathrm{softmax}(\lambda_2\, \delta_{k,j})\big), \qquad (20)
$$
where δk,j denotes the truncated distance similarity score between the k-th feature
vector of scomp (computed from the inpainted image using the trained encoder) and
the j-th prototype vector of the structural codebook of VQ-VAE, λ2 is a parameter set
to 10, and αk,j is an indicator of the prototype vector class. That is, αk,j = 1 when
the k-th feature vector of sgt belongs to the j-th class of the structural codebook;
otherwise, αk,j = 0. The authors define the textural feature loss Ltt (ϕ) in an
analogous way.
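Since α_{k,j} is a one-hot indicator of the codebook class of the ground-truth feature vector, the loss in (20) amounts to a multiclass cross-entropy over the (scaled) similarity scores, as in the following sketch with our own variable names:

```python
import torch
import torch.nn.functional as F

def structural_feature_loss(delta, gt_classes, lambda2=10.0):
    """delta: (K, J) similarity scores between the K structural feature vectors of the
    inpainted image and the J codebook prototypes; gt_classes: (K,) codebook indices
    of the corresponding ground-truth features s_gt."""
    # Equivalent to -sum_k log softmax(lambda2 * delta[k])[gt_classes[k]], i.e., (20)
    # with alpha_{k,j} one-hot.
    return F.cross_entropy(lambda2 * delta, gt_classes, reduction="sum")
```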
As mentioned, in section “Experimental Results”, we will experimentally ana-
lyze this method. It will be denoted there as DSI-VQVAE.
Image Transformers
Transformers operate on a sequence of token embeddings and use self-attention layers to relate embedded inputs to each other. It is worth noticing that transformers will
maintain the number of tokens throughout all computations. If tokens were related
to pixels, each pixel would have a one-to-one correspondence with the output, thus,
maintaining the spatial resolution of the original input image. Since transformers
are set-to-set functions, they do not intrinsically retain the information of the
spatial position for each individual token; thus, the embedding is concatenated to a
learnable position embedding to add the positional information to the representation.
One advantage of using a transformer for image restoration is that it naturally
supports pluralistic outputs by directly optimizing the underlying data distribution.
One drawback is the computational complexity that increases quadratically with the
input length, thus making it difficult to directly synthesize high-resolution images.
Fig. 6 Overview of the architecture of high-fidelity pluralistic image completion with transform-
ers (Wan et al. 2021), referred to as ICT. (Figure from Wan et al. 2021)
Denoting by $X_-$ the tokens of the visible (unmasked) regions and by $\pi_1, \ldots, \pi_K$ the masked positions, the transformer is optimized by minimizing the negative log-likelihood of the masked tokens $x_{\pi_k}$, conditioned on the visible regions $X_-$, that is,
$$
\mathcal{L}_{MLM}(\theta) = \mathbb{E}_{X}\left[-\frac{1}{K}\sum_{k=1}^{K} \log p\big(x_{\pi_k} \mid X_-;\, \theta\big)\right], \qquad (21)
$$
where θ contains the parameters of the transformer and the subindex MLM stands
for the masked language model which is similar to the one in BERT (Devlin
et al. 2018). One particularity of the ICT model is that each token attends
simultaneously to all positions thanks to bidirectional attention. This enables the
generated distribution to capture the full context, thus leading to a consistency
between generated contents and unmasked region.
Once the transformer is trained, instead of directly sampling the entire set
of masked positions which would lead to non-plausible results due to the inde-
pendence property, they apply Gibbs sampling to iteratively sample tokens at
different locations. To do so, in each iteration, a grid position is sampled from
$p(x_{\pi_k} \mid X_-, X_{<\pi_k}, \theta)$ with the top-K predicted elements, where $X_{<\pi_k}$ denotes the previously generated tokens.
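A schematic version of this sampling loop is sketched below, assuming a `transformer` callable that returns per-position logits over the token vocabulary; the ordering of the masked positions and the absence of a temperature parameter are simplifications of ours.

```python
import torch

@torch.no_grad()
def iterative_topk_sampling(transformer, tokens, masked_positions, k=50):
    """tokens: (L,) long tensor with placeholder values at masked_positions.
    Each masked position is sampled in turn, conditioned on the visible tokens
    and on the positions that have already been filled in."""
    for pos in masked_positions:
        logits = transformer(tokens.unsqueeze(0))[0, pos]          # (vocab_size,)
        topk_vals, topk_idx = torch.topk(logits, k)                # keep top-K tokens
        probs = torch.softmax(topk_vals, dim=-1)
        tokens[pos] = topk_idx[torch.multinomial(probs, 1)].item() # fill the position
    return tokens
```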
The second step is to perform texture refinement at the original resolution using
a CNN, which is optimized by minimizing the L1 loss between the predicted image
and the ground truth, together with an adversarial loss based on the vanilla GAN
(cf. (8) in section “Generative Adversarial Networks”).
Fig. 7 Overview of the BAT architecture of diverse image inpainting with bidirectional and
autoregressive transformers (Yu et al. 2021). (Figure from Yu et al. 2021)
$$
\mathcal{L}_{BAT}(\theta) = \mathbb{E}_{X}\left[-\frac{1}{K}\sum_{k=1}^{K} \log p\big(x_{\pi_k} \mid X_-, M, X_{<\pi_k};\, \theta\big)\right], \qquad (22)
$$
where we have used the same notations as in (21), namely, K is the length of masked
tokens, and X− are all the unmasked tokens (corresponding to the visible regions).
Finally, $X_{<\pi_k}$ denotes the previously predicted tokens, and M the masked positions.
Finally, they construct a CNN-based texture generator, which
is optimized by minimizing the L1 loss between the predicted image and the ground
truth together with an adversarial loss and a perceptual loss (Johnson et al. 2016).
In inference, each masked token is predicted bidirectionally and autoregressively.
As in Wan et al. (2021), they iteratively use top-K sampling to randomly sample
from the K most likely next tokens.
From Single-Image Evaluation Metrics to Diversity Evaluation
Closeness to the ground truth in terms of classical pixel-wise metrics does not ensure being realistic. Other perceptual metrics have been proposed and are supposed to be more consistent with human judgment. In particular,
Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al. 2018b) has been
demonstrated to correlate well with the human perceptual similarity. It relies on
the observation that hidden activations in CNNs trained for image classification
are indeed a space where distance can strongly correlate with human judgment.
Precisely, LPIPS computes a weighted L2 norm between deep features of a pair of images:
$$
\mathrm{LPIPS}(x, x_{gt}) = \sum_{l} \frac{1}{M_l N_l} \sum_{i,j} \big\| w_l \odot \big(\phi_l^{r}(i,j) - \phi_l^{gt}(i,j)\big)\big\|_2^2, \qquad (23)
$$
where x is the reconstructed image, $x_{gt}$ is the ground truth, l is a layer number, (i, j) is a pixel, $w_l$ are weights for each feature, and $\phi_l^{r}, \phi_l^{gt} \in \mathbb{R}^{M_l \times N_l \times C_l}$ are the features of x and $x_{gt}$, unit-normalized in the channel dimension. LPIPS has been used in
the context of inpainting when generating one image (e.g., Zheng et al. 2021). In
Kettunen et al. (2019), it was shown that standard adversarial attack techniques can
easily fool LPIPS. Therefore, a slightly different metric called E-LPIPS (Ensemble
LPIPS) is proposed by applying random simple image transformations and dropout.
Nonetheless, to the best of our knowledge, it has never been used in the context of
inpainting.
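The following sketch computes an LPIPS-style distance from lists of pre-extracted deep features following (23), with unit weights w_l and without the learned linear layers of the official implementation, so it is a simplified stand-in rather than the actual LPIPS metric.

```python
import torch

def lpips_like(feats_x, feats_gt, eps=1e-10):
    """feats_x, feats_gt: lists of (C_l, M_l, N_l) feature maps of x and x_gt."""
    dist = 0.0
    for fx, fg in zip(feats_x, feats_gt):
        # Unit-normalize in the channel dimension, as in (23).
        fx = fx / (fx.norm(dim=0, keepdim=True) + eps)
        fg = fg / (fg.norm(dim=0, keepdim=True) + eps)
        # Average the squared differences over the spatial dimensions (w_l = 1 here).
        dist = dist + ((fx - fg) ** 2).sum(dim=0).mean()
    return dist
```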
When, apart from the set of images, corresponding image categories are available, other metrics that are also supposed to follow human judgment can be used. The inception score (IS) (Salimans et al. 2016) was designed to measure
how realistic the output from a GAN is. This score measures the variety of a set of
generated images as well as the probability distribution of each image classification.
This is done by comparing the class distribution of each image, which should have
a low entropy, with the marginal distribution of the whole set, which should have
high entropy:
$$
\mathrm{IS}(G) = \exp\Big(\mathbb{E}_{x \sim p_g}\big[\mathrm{KL}\big(p(y|x)\,\|\,p(y)\big)\big]\Big), \qquad (24)
$$
where pg is the model distribution of the whole set given by the generative model
G; x, an image sampled from pg ; p(y|x), the conditional class distribution; KL,
the Kullback–Leibler divergence; and p(y), the marginal class distribution. As
detailed in Barratt and Sharma (2018), inception score has its own limitations:
sensitivity to small changes in network weights, misleading results when used
beyond the ImageNet dataset (Rosca et al. 2017), and adversarial examples when
used for model optimization. The IS score was adapted to diverse inpainting in Zhao
et al. (2020b), leading to the Modified Inception Score (MIS). When performing
inpainting, there is only one kind of image, and so p(y) can be removed. The MIS
is then defined as
$$
\mathrm{MIS}(G) = \exp\Big(\mathbb{E}_{x \sim p_g}\Big[\sum_i p(y_i|x)\,\log p(y_i|x)\Big]\Big), \qquad (25)
$$
where $y_i$ is the class label of the ith generated sample. Another improvement of
the IS is the Fréchet Inception Distance (FID) (Heusel et al. 2017) that compares the
statistics of generated images to the ones of original images. FID uses the inception
pre-trained model to extract the feature vectors of real images and fake images and
compare their feature-wise means (μ_r, μ_f) and covariances (Σ_r, Σ_f):
$$
\mathrm{FID} = \|\mu_r - \mu_f\|_2^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_f - 2\,(\Sigma_r \Sigma_f)^{1/2}\big). \qquad (26)
$$
Fréchet Inception Distance has been widely used for validating single and diverse
inpainting results in recent papers (e.g., Peng et al. 2021; Liu et al. 2021; Yu et al.
2021).
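Given Inception features that have already been extracted for the real and generated sets, the FID above can be computed as in this sketch (the matrix square root is taken with `scipy.linalg.sqrtm`):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """feats_real, feats_fake: (N, D) arrays of Inception features."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):          # discard tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```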
Measuring Diversity
In the context of pluralistic inpainting, following the idea proposed for image-to-
image translation in Zhu et al. (2017), LPIPS has been used as a diversity score to
measure how perceptually different the generated images are (Cai and Wei 2020;
Zhao et al. 2020b; Liu et al. 2021). The higher the LPIPS, the more diversity is
present in the results. For instance, in Cai and Wei (2020), they compute the average
distance between the 10,000 pairs randomly generated from the 1000 center-masked
image samples. LPIPS is computed on the full-inpainting results and mask-region
inpainting results, respectively.
Experimental Results
Table 2 Generative methods for diverse inpainting: experimental conditions. Random regular and irregular masks are generated as in Zheng et al. (2019)

Method | Input size | Train datasets | Training masks | Code
PIC | 256 × 256 | Celeba-HQ, ImageNet, Paris, Places2 | Regular (center 128 × 128 + random), irregular (random) | ✓
PiiGAN | 128 × 128 | CelebA, Mauflex, Agricultural Disease | Center 64 × 64 | ✓
UCTGAN | 256 × 256 | Celeba-HQ, ImageNet, Paris, Places2 | Regular (center 128 × 128 + random), irregular (random) | ✗
DSI-VQVAE | 256 × 256 | Celeba-HQ, ImageNet, Places2 | Regular (center 128 × 128 + random), irregular (random) | ✓
ICT | 256 × 256 | FFHQ, ImageNet, Places2 | Irregular Pconv (Liu et al. 2018) | ✓
PD-GAN | 256 × 256 | Celeba-HQ, Paris StreetView, Places2 | Irregular Pconv (Liu et al. 2018) | ✗
BAT | 256 × 256 | CelebA-HQ, Paris StreetView, Places2 | Irregular Pconv (Liu et al. 2018) | ✓
Experimental Settings
Table 2 lists all the explained methods together with the training dataset and
corresponding training masks. Aiming for a fair comparison, we compare and test
the methods trained on the same training images, i.e., the VAE-based model PIC
(Zheng et al. 2019), the VQVAE-based model DSI-VQVAE (Peng et al. 2021), and
the two transformer-based models ICT (Wan et al. 2021) and BAT (Yu et al. 2021).
Notice that we do not analyze the performance of PiiGAN (Cai and Wei 2020), as
its training datasets and image sizes differ from those of the other methods.
Datasets
We evaluate the methods on the three datasets Celeba-HQ (Karras et al. 2018),
Places2 (Zhou et al. 2017), and ImageNet (Russakovsky et al. 2015). All the
evaluated models take as input images of resolution 256 × 256. Due to the
long inference time of DSI-VQVAE and ICT methods (see Table 7), quantitative
experiments are made on 100 randomly selected images from each training dataset.
For each kind of mask (see below) and for each image, we sample 25 different
results.
Fig. 8 Example for each kind of mask considered for evaluation. In gray are the hidden pixels.
From left to right: center, random regular, random irregular, and irregular Pconv masks from Liu
et al. (2018) with <20%, [20%, 40%], and [40%, 60%] hidden pixels
For Celeba-HQ, the 1024 × 1024 resolution images are resized to 256 × 256. For
Places2 and ImageNet, the compared methods were trained on 256 × 256 patches
either by resizing the input images (PIC), by cropping them randomly (DSI) or
to the center patch (BAT), or by both cropping and resizing (ICT). We therefore consider
both center-cropped and resized versions of the input images to ensure a fair
comparison among the trained models.
Note that ICT is not trained on Celeba-HQ but on the FFHQ face dataset (Karras
et al. 2019). FFHQ contains higher variation than Celeba-HQ in terms of age,
ethnicity, and image background. It also has a good coverage of accessories. Images
from both datasets are, however, similarly aligned and cropped. Therefore, we still
give the results of the ICT method tested on the Celeba-HQ dataset, but the reader
should remember this difference when analyzing the results.
Inpainting Masks
We use the following types of masks: center, random regular, random irregular,
and irregular masks from Liu et al. (2018) with different proportions of hidden
pixels. Figure 8 shows an example of each kind of mask. The random masks are
generated once for each test image so that all the methods are evaluated on the same
degradation.
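For illustration only, the sketch below (NumPy assumed) generates a center mask and a simple random regular mask; the actual generators used in the evaluation follow Zheng et al. (2019) and, for the irregular Pconv masks, Liu et al. (2018), and may differ in their details:

```python
import numpy as np

def center_mask(h=256, w=256, hole=128):
    # 1 marks hidden pixels: a hole x hole square in the image center.
    m = np.zeros((h, w), dtype=np.uint8)
    top, left = (h - hole) // 2, (w - hole) // 2
    m[top:top + hole, left:left + hole] = 1
    return m

def random_regular_mask(h=256, w=256, n_rects=3, seed=0):
    # A few axis-aligned rectangles at random positions and sizes.
    rng = np.random.default_rng(seed)
    m = np.zeros((h, w), dtype=np.uint8)
    for _ in range(n_rects):
        rh, rw = rng.integers(h // 8, h // 3), rng.integers(w // 8, w // 3)
        top, left = rng.integers(0, h - rh), rng.integers(0, w - rw)
        m[top:top + rh, left:left + rw] = 1
    return m
```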
We would like to highlight that the methods PIC and DSI-VQVAE train a
different model for regular and for irregular holes: testing on centered or random
regular masks is performed with the former model, and testing on irregular masks
with the latter. The transformer-based methods ICT and BAT are only trained on "irregular
Pconv" holes given by Liu et al. (2018); testing on each type of mask is therefore done
with this single model.
Quantitative Performance
Table 3 Quantitative comparison of four pluralistic image inpainting methods (PIC, DSI-VQVAE, ICT, and BAT) on Celeba-HQ and for different kinds of masks (central, random regular, random irregular, and from Liu et al. 2018). Best and second-best results by column are in bold and italics, respectively. Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

Mask | Method | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
Irregular <20% | PIC | 34.63 | 0.964 | 1.17 | 0.0206 | 16.8 | 0.0009
Irregular <20% | DSI-VQVAE | 35.49 | 0.968 | 1.41 | 0.0216 | 11.0 | 0.0081
Irregular <20% | ICT | 34.72 | 0.968 | 2.09 | 0.0200 | 9.84 | 0.0084
Irregular <20% | BAT | 36.25 | 0.974 | 1.20 | 0.0208 | 9.90 | 0.0056
Irregular 20%–40% | PIC | 26.69 | 0.879 | 4.19 | 0.0216 | 34.2 | 0.0091
Irregular 20%–40% | DSI-VQVAE | 27.36 | 0.888 | 4.06 | 0.0223 | 28.8 | 0.0357
Irregular 20%–40% | ICT | 26.83 | 0.891 | 4.71 | 0.0189 | 26.7 | 0.0383
Irregular 20%–40% | BAT | 27.28 | 0.900 | 3.85 | 0.0214 | 20.7 | 0.0269
Irregular 40%–60% | PIC | 21.47 | 0.745 | 10.36 | 0.0153 | 65.4 | 0.0527
Irregular 40%–60% | DSI-VQVAE | 22.53 | 0.770 | 9.01 | 0.0156 | 51.9 | 0.0916
Irregular 40%–60% | ICT | 21.92 | 0.773 | 9.82 | 0.0153 | 50.7 | 0.0970
Irregular 40%–60% | BAT | 22.35 | 0.787 | 8.91 | 0.0183 | 39.7 | 0.0731
Central 128 × 128 | PIC | 24.46 | 0.868 | 5.26 | 0.0212 | 23.8 | 0.0288
Central 128 × 128 | DSI-VQVAE | 25.25 | 0.880 | 5.08 | 0.0210 | 21.7 | 0.0243
Central 128 × 128 | ICT | 24.45 | 0.872 | 6.06 | 0.0170 | 27.3 | 0.0486
Central 128 × 128 | BAT | 25.10 | 0.882 | 5.21 | 0.0218 | 21.5 | 0.0365
Random regular | PIC | 24.16 | 0.840 | 7.23 | 0.0188 | 33.4 | 0.0402
Random regular | DSI-VQVAE | 24.98 | 0.850 | 6.46 | 0.0200 | 30.5 | 0.0642
Random regular | ICT | 24.51 | 0.852 | 7.24 | 0.0180 | 31.3 | 0.0665
Random regular | BAT | 24.85 | 0.860 | 6.52 | 0.0209 | 24.6 | 0.0541
Random irregular | PIC | 23.47 | 0.759 | 8.45 | 0.0161 | 73.5 | 0.0280
Random irregular | DSI-VQVAE | 24.27 | 0.785 | 7.56 | 0.0167 | 58.8 | 0.0744
Random irregular | ICT | 23.26 | 0.781 | 9.26 | 0.0148 | 52.2 | 0.0855
Random irregular | BAT | 24.36 | 0.810 | 7.13 | 0.0186 | 40.8 | 0.0495
Average | PIC | 25.65 | 0.843 | 6.11 | 0.0189 | 41.2 | 0.0266
Average | DSI-VQVAE | 26.65 | 0.857 | 5.60 | 0.0195 | 33.8 | 0.0497
Average | ICT | 25.95 | 0.855 | 6.53 | 0.0173 | 33.0 | 0.0575
Average | BAT | 26.70 | 0.869 | 5.47 | 0.0203 | 26.2 | 0.0410
Table 4 Quantitative comparison of four pluralistic image inpainting methods (PIC, DSI-VQVAE, ICT, and BAT) on 256 × 256 center-cropped images from Places2, for different kinds of masks (central, random regular, random irregular, and from Liu et al. 2018). Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

Mask | Method | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
Irregular <20% | PIC | 30.48 | 0.937 | 2.02 | 0.0507 | 36.8 | 0.0050
Irregular <20% | DSI-VQVAE | 31.58 | 0.952 | 2.11 | 0.0482 | 19.3 | 0.0187
Irregular <20% | ICT | 29.86 | 0.943 | 3.64 | 0.0463 | 22.8 | 0.0198
Irregular <20% | BAT | 32.20 | 0.957 | 1.83 | 0.0463 | 14.2 | 0.0158
Irregular 20%–40% | PIC | 23.88 | 0.820 | 6.46 | 0.0378 | 97.6 | 0.0344
Irregular 20%–40% | DSI-VQVAE | 24.20 | 0.844 | 6.14 | 0.0438 | 63.6 | 0.0707
Irregular 20%–40% | ICT | 23.08 | 0.831 | 8.05 | 0.0428 | 70.0 | 0.0769
Irregular 20%–40% | BAT | 24.10 | 0.853 | 6.14 | 0.0423 | 53.2 | 0.0671
Irregular 40%–60% | PIC | 19.92 | 0.667 | 13.75 | 0.0326 | 156.1 | 0.1309
Irregular 40%–60% | DSI-VQVAE | 20.34 | 0.703 | 12.52 | 0.0398 | 110.2 | 0.1566
Irregular 40%–60% | ICT | 19.49 | 0.686 | 14.66 | 0.0371 | 128.7 | 0.1668
Irregular 40%–60% | BAT | 19.98 | 0.705 | 13.10 | 0.0364 | 107.0 | 0.1610
Central 128 × 128 | PIC | 20.98 | 0.812 | 9.00 | 0.0435 | 96.8 | 0.1080
Central 128 × 128 | DSI-VQVAE | 21.41 | 0.819 | 8.85 | 0.0416 | 79.8 | 0.1234
Central 128 × 128 | ICT | 20.93 | 0.812 | 10.22 | 0.0476 | 92.2 | 0.1204
Central 128 × 128 | BAT | 21.20 | 0.822 | 8.76 | 0.0442 | 81.8 | 0.1190
Random regular | PIC | 21.70 | 0.783 | 10.14 | 0.0425 | 103.8 | 0.1124
Random regular | DSI-VQVAE | 22.36 | 0.805 | 9.21 | 0.0412 | 75.8 | 0.1167
Random regular | ICT | 21.75 | 0.796 | 10.77 | 0.0405 | 87.1 | 0.1237
Random regular | BAT | 22.34 | 0.808 | 9.15 | 0.0436 | 76.6 | 0.1200
Random irregular | PIC | 20.86 | 0.658 | 12.80 | 0.0255 | 165.4 | 0.0979
Random irregular | DSI-VQVAE | 21.18 | 0.701 | 11.78 | 0.0360 | 114.4 | 0.1450
Random irregular | ICT | 20.07 | 0.681 | 14.14 | 0.0334 | 131.9 | 0.1548
Random irregular | BAT | 20.85 | 0.708 | 12.00 | 0.0374 | 103.2 | 0.1454
Average | PIC | 22.97 | 0.780 | 9.02 | 0.0388 | 109.2 | 0.0814
Average | DSI-VQVAE | 23.51 | 0.804 | 8.44 | 0.0418 | 76.9 | 0.1052
Average | ICT | 22.53 | 0.792 | 10.25 | 0.0413 | 88.9 | 0.1107
Average | BAT | 23.44 | 0.809 | 8.97 | 0.0417 | 72.7 | 0.1047
Tables 3, 4, and 5 report the quantitative results on Celeba-HQ and on center-cropped
images from Places2 and ImageNet; the Appendix additionally reports, in Tables 8 and 9,
the results on these last two datasets for resized images. The ICT
method is run, as proposed in the original paper, with its top-K parameter (cf.
section "Image Transformers") set to 50. We investigate the influence of the top-K
parameter in Table 6. Note that, for a fair quantitative comparison and unlike Zheng
et al. (2019), we do not use any discriminator score to select the best generated samples.
Table 5 Quantitative comparison of three pluralistic image inpainting methods (PIC, DSI-VQVAE, and ICT) on 256 × 256 center-cropped images from ImageNet, for different kinds of masks (central, random regular, random irregular, and from Liu et al. 2018). Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

Mask | Method | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
Irregular <20% | PIC | 30.33 | 0.941 | 2.02 | 0.2416 | 20.2 | 0.0036
Irregular <20% | DSI-VQVAE | 30.44 | 0.946 | 2.38 | 0.2361 | 12.1 | 0.0199
Irregular <20% | ICT | 29.23 | 0.940 | 3.98 | 0.2323 | 10.7 | 0.0185
Irregular 20%–40% | PIC | 23.02 | 0.797 | 7.37 | 0.1709 | 83.7 | 0.0289
Irregular 20%–40% | DSI-VQVAE | 22.98 | 0.809 | 7.56 | 0.2015 | 53.4 | 0.0855
Irregular 20%–40% | ICT | 22.24 | 0.802 | 9.23 | 0.1970 | 24.9 | 0.0771
Irregular 40%–60% | PIC | 18.33 | 0.623 | 16.34 | 0.0792 | 183.9 | 0.1269
Irregular 40%–60% | DSI-VQVAE | 18.92 | 0.651 | 14.82 | 0.1192 | 126.3 | 0.1907
Irregular 40%–60% | ICT | 18.52 | 0.646 | 16.41 | 0.1329 | 101.7 | 0.1700
Central 128 × 128 | PIC | 19.87 | 0.794 | 9.77 | 0.1591 | 95.8 | 0.1067
Central 128 × 128 | DSI-VQVAE | 20.06 | 0.795 | 9.99 | 0.1754 | 85.6 | 0.1291
Central 128 × 128 | ICT | 20.34 | 0.795 | 10.76 | 0.1753 | 73.8 | 0.1162
Random regular | PIC | 19.81 | 0.737 | 13.24 | 0.0934 | 129.2 | 0.1027
Random regular | DSI-VQVAE | 20.54 | 0.756 | 11.52 | 0.1305 | 89.3 | 0.1540
Random regular | ICT | 20.32 | 0.752 | 12.76 | 0.1420 | 77.6 | 0.1360
Random irregular | PIC | 19.51 | 0.598 | 14.78 | 0.0645 | 193.0 | 0.0982
Random irregular | DSI-VQVAE | 19.81 | 0.636 | 14.04 | 0.1147 | 136.8 | 0.1757
Random irregular | ICT | 19.02 | 0.628 | 16.08 | 0.1363 | 108.5 | 0.1574
Average | PIC | 21.81 | 0.748 | 10.59 | 0.1347 | 117.6 | 0.0735
Average | DSI-VQVAE | 22.13 | 0.765 | 10.07 | 0.1629 | 83.9 | 0.1258
Average | ICT | 21.61 | 0.761 | 11.54 | 0.1693 | 66.2 | 0.1125
To measure inpainting quality, we take into account three factors: the similarity to
the ground truth, the realism of the inpainting outputs, and the diversity of those outputs.
Definitions and details on the metrics for each factor can be found in section "From
Single-Image Evaluation Metrics to Diversity Evaluation".
In each table, the best and second-best results by column are in bold and italics,
respectively.
Table 6 Influence of the top-K parameter on the ICT results. Results obtained on Places2 dataset, with central mask. Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

top-K | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
5 | 21.76 | 0.820 | 6.52 | 0.0510 | 87.6 | 0.0854
25 | 21.16 | 0.813 | 10.03 | 0.0495 | 90.2 | 0.1146
50 | 20.93 | 0.812 | 10.22 | 0.0476 | 92.2 | 0.1204
Perceptual Quality
Second, to measure realism in the outputs, we measure perceptual quality by using
Modified Inception Score (MIS) and Fréchet Inception Distance (FID) metrics
(defined by (25) and (26), respectively). These two metrics are computed directly
on the whole sets of generated or ground truth images.
BAT, ICT, and DSI-VQVAE are the methods that provide the best scores on
average on all datasets. In contrast, PIC gives the worst results quantitatively
and, as we will see later, also qualitatively. We argue that a possible reason for
the superior performance of BAT, ICT, and DSI-VQVAE is that, with different
strategies, they separate the tasks of texture and structure recovery. Each task
is handled with a specific subnetwork, first reconstructing structures that then
guide the texture recovery. From a more practical point of view, BAT and ICT
use transformers for global structure understanding and high-level semantics at
a coarse resolution and CNNs for generating textures at the original resolution.
DSI-VQVAE incorporates the multiscale hierarchical organization of VQ-VAE
where the information corresponding to the texture is disentangled from the one
about structure and geometry. Accordingly, DSI-VQVAE incorporates two different
generators respectively devoted to both levels (cf. section “How to Achieve Multiple
and Diverse Inpainting Results?”). Although DSI-VQVAE and PIC are VAE-based
methods, DSI-VQVAE has the advantage that first, at low resolution, it proposes
diverse completions of structure inside the hole. These different structures then
guide the completion of texture at high resolution. PIC does not have this global
structure completion (at least, not explicitly). All in all, splitting the estimation of
coarse and fine details in two distinct steps seems like a successful approach for
high-quality image inpainting.
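Schematically, this structure-then-texture strategy shared (with different ingredients) by DSI-VQVAE, ICT, and BAT can be summarized as follows; `structure_sampler` and `texture_generator` are placeholders for the method-specific networks (transformer or VQ-VAE prior for the coarse stage, CNN-based generator for the fine stage), not actual APIs of the released codes:

```python
def diverse_inpainting(image, mask, structure_sampler, texture_generator, n_samples=25):
    # Stage 1 is stochastic and operates at low resolution; stage 2 is
    # deterministic and produces the full-resolution completion.
    results = []
    for _ in range(n_samples):
        coarse = structure_sampler(image, mask)                  # sample one plausible structure
        results.append(texture_generator(image, mask, coarse))   # fill textures guided by it
    return results
```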
Note also that BAT is the method that achieves the best scores in terms of
realism. Indeed, as explained before, autoregressive transformers have the ability to
model longer dependencies across the image than CNN-based methods, which can
be crucial for image inpainting. Note that BAT outperforms the other transformer-
based method ICT, especially on irregular masks and large holes. As explained in
section “Autoregressive Models”, one can explain this difference by the fact that
BAT was trained, not only with bidirectional attention but also with autoregressive
sampling. Therefore, it creates better consistency of the reconstructed structures,
especially for large missing regions. The very good results of the DSI-VQVAE
method also confirm that autoregressive modeling performs well for realistic image
inpainting.
Finally, one can observe the influence of the complexity of the training dataset
on the performance. Notice that the underlying probability distribution of CelebA-
HQ dataset is semantically less complex and diverse than the one of Places2 and
ImageNet, and, thus, training is more difficult in the latter cases. We hypothesize
that this affects both inpainting quality and inpainting diversity. Regarding quality,
the average FID score over all the studied methods trained on CelebA-HQ is
33.55, while for Places2 and ImageNet it is 86.92 and 128.43,
respectively. This gives us an idea of the difference in complexity for each particular
dataset.
Inpainting Diversity
To measure diversity, we rely on the LPIPS metric. The higher the LPIPS is, the
more diverse are the outputs. For each generated sample, we compute the LPIPS
distance with another sample randomly selected from the other 24 results from
the same corrupted image. The reported LPIPS score corresponds to this distance
averaged over the 2500 selected pairs.
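A minimal sketch of this protocol for one corrupted image is given below; `lpips_distance` is assumed to be an available callable implementing the metric of Zhang et al. (2018b), and the averaging over the 100 test images (yielding the 2500 pairs) is left to the caller:

```python
import random

def diversity_score(samples, lpips_distance, seed=0):
    # For each generated sample, pick one of the other samples at random
    # and average the LPIPS distances over the resulting pairs.
    rng = random.Random(seed)
    scores = []
    for i, s in enumerate(samples):
        j = rng.choice([k for k in range(len(samples)) if k != i])
        scores.append(lpips_distance(s, samples[j]))
    return sum(scores) / len(scores)
```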
First and foremost, from the range of LPIPS values on the different datasets, one
can again observe the influence of the complexity of the training dataset. The CelebA-
HQ dataset is semantically more constrained and less complex than Places2
and ImageNet, leading to lower diversity in the outputs. Indeed, the LPIPS is, on
average, ∼2 times smaller on CelebA-HQ than on Places2 or ImageNet. Similarly,
as expected, all the methods create more diverse samples on larger holes than on
smaller holes.
These observations argue for the existence of a trade-off between inpainting
quality and inpainting diversity. The harder the inpainting problem gets (on a more
complex dataset or for a larger hole), the more diverse outputs will be created. This
trade-off, already highlighted in Yu et al. (2021), also arises when parametrizing
a method itself. We study in Table 6 the influence of the top-K parameter on the
performance of the ICT algorithm. One can observe that using a smaller K creates
outputs that are, on the one hand, closer to GT and more realistic but, on the other
hand, less diverse.
PIC is the method giving the least diverse results on all datasets. One reason
could be the aforementioned disentanglement of structure and texture in BAT,
DSI-VQVAE, and ICT. In practice, these three methods first attempt to produce
a multiplicity of coherent structures and then fill each of the sampled structures with
a deterministic texture generator. This divide-and-conquer approach makes it easier
to create diversity, as it only has to be enforced on low-resolution structures and not
on the whole reconstructed output.
ICT slightly outperforms DSI-VQVAE and BAT in terms of LPIPS on
the Celeba-HQ testing images. Recall that for this experiment ICT was trained
on the more diverse FFHQ dataset. This observation highlights again the influence
of the training dataset on the capacity of the model to create diverse outputs.
Qualitative Performance
Similar to Zheng et al. (2019), Peng et al. (2021), Wan et al. (2021), and Yu et al.
(2021), for qualitative comparison, we select for each method the 5 samples with
the highest discriminator score out of the 25 generated samples. We use pretrained
discriminators given by each of the models, i.e., for PIC, the discriminator of the
generative pipeline; for DSI-VQVAE, the discriminator of the texture generation
module; and for ICT and BAT, the discriminator of the upsampling module. We
perform this comparison on a representative selection of testing images and masks.
Figures 9, 10, 11, and 12 show some results on CelebA-HQ, Places2, and ImageNet
datasets for the methods PIC, DSI-VQVAE, ICT, and BAT. BAT does not provide
weights for ImageNet. Remember that ICT was not trained on Celeba-HQ but on
FFHQ. Additional visual results are also given in the Appendix.
At first glance, we observe that DSI-VQVAE, ICT, and BAT provide more
visually plausible results than PIC. PIC struggles to recover information on less
constrained datasets, like Places2 and Imagenet, and creates strong artifacts when
applied to large missing regions (see second examples in Figs. 10 and 12). Among
these methods, BAT and ICT propose the most realistic outputs. For instance, in
Fig. 9, PIC generates results that do not maintain the proportions and harmony of a
face (see the second example). DSI-VQVAE does not have a full understanding
of the image either: for example, in the second example in Fig. 9 and the third
example in Fig. 12, one eye is visible in the input image while the other is masked,
and DSI-VQVAE fails to complete it consistently. In contrast, the transformer-based
methods are able to reconstruct a left eye similar to
the right one. This can be explained by the capability of transformers to have a global
structure understanding and high-level semantics. Other examples strengthening
this observation are the first images of Fig. 10, where the inpainting of the snow
is sometimes not realistic, and all the ImageNet results in Fig. 12.
When images contain strong structures, like Figs. 10 and 11, transformer-based
methods again estimate more realistic reconstructions. This can be explained by
[Image panels: outputs on 256 × 256 images masked with a 128 × 128 center hole and with a random regular hole; rows show PIC, DSI-VQVAE, ICT, and BAT.]
Fig. 9 Diverse inpainting output on 256 × 256 images from Celeba dataset with center, random regular, and random irregular masks. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
[Image panels: outputs on 256 × 256 images masked with a 128 × 128 center hole, a Pconv 40%–60% hole, and a Pconv 20%–40% hole; rows show PIC, DSI-VQVAE, ICT, and BAT.]
Fig. 10 Diverse inpainting output on 256 × 256 images from Places2 dataset with center and irregular masks with various proportions of hidden pixels. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
[Image panels: outputs on 256 × 256 images masked with a 128 × 128 center hole and a Pconv 40%–60% hole; rows show PIC, DSI-VQVAE, ICT, and BAT.]
Fig. 11 Diverse inpainting output on 256 × 256 images from Places2 dataset with center and irregular masks with various proportions of hidden pixels. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
[Image panels: outputs on 256 × 256 images masked with a center hole, random regular holes, and a Pconv 20%–40% hole; rows show PIC, DSI-VQVAE, and ICT.]
Fig. 12 Diverse inpainting output on 256 × 256 images from ImageNet dataset with center and irregular masks with various proportions of hidden pixels. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
the fact that they include previously predicted tokens in the training objective, and
thus, global consistency is imposed over the results. This consistency helps avoid
problems in the center of large holes. In some situations, such as the middle example
in Fig. 10, the structure and texture disentanglement of DSI-VQVAE also provides
good reconstructions.
Conclusions
In this chapter, we have tackled the question of whether generative methods are
a suitable strategy to obtain multiple solutions to problems that do not have a
unique solution. By focusing on the inpainting problem, we have reviewed the
main generative models and recent learning-based image completion methods for
multiple and diverse inpainting. We have compared the methods with available code
and model weights on three public datasets. We have shown that the transformer-
based method BAT (or BAT-Fill) and the VQ-VAE-based method DSI-VQVAE
provide the best results in both inpainting quality and multiple inpainting diversity.
This is true both quantitatively and qualitatively. Our analysis highlights that their
advantageous results are due to their strategy that consists in, first, sampling multiple
structures inside the missing regions, and, second, generating textures at higher
resolution in a deterministic way. The PIC method is, however, computationally
much faster than its competitors. Moreover, our analysis shows that the multiple
inpainting problem is not solved yet, as the results still lack diversity or, in general,
visual quality. The difficulty of learning the underlying probability distribution,
which depends on the training dataset, is also evident from our study. Therefore, we
argue that most efforts should be devoted to improving and exploring new generative
strategies to enhance both the quality and the diversity of the solutions of such ill-
posed inverse problems with multiple solutions. For instance, following the spirit
of the structure/texture division, one could further separate the problem into different
subtasks or tackle different regions of the scene separately. Another way to improve
inpainting quality would be to control the solution by constraining it
through an input condition, such as the semantics of the object to fill in or a reference
image, among others. Finally, the computational burden of some of the transformer-based
or autoregressive methods is prohibitive for sampling a high number of solutions in a
reasonable time. We think that this limitation has been overlooked in favor of image
quality but should now be addressed as a priority.
Appendix
Table 8 Quantitative comparison of three pluralistic image inpainting methods (PIC, DSI-VQVAE, ICT) on 256 × 256 resized images from Places2. Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

Mask | Method | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
Irregular <20% | PIC | 29.86 | 0.934 | 2.14 | 0.0489 | 32.3 | 0.0055
Irregular <20% | DSI-VQVAE | 30.64 | 0.948 | 2.30 | 0.0533 | 20.0 | 0.0214
Irregular <20% | ICT | 29.05 | 0.939 | 3.83 | 0.0450 | 23.0 | 0.0224
Irregular [20%, 40%] | PIC | 22.98 | 0.808 | 7.06 | 0.0394 | 91.0 | 0.0375
Irregular [20%, 40%] | DSI-VQVAE | 23.04 | 0.832 | 6.92 | 0.0443 | 64.4 | 0.0789
Irregular [20%, 40%] | ICT | 22.11 | 0.818 | 8.81 | 0.0423 | 72.8 | 0.0831
Irregular [40%, 60%] | PIC | 19.01 | 0.649 | 14.71 | 0.0273 | 144.2 | 0.1357
Irregular [40%, 60%] | DSI-VQVAE | 19.15 | 0.684 | 13.90 | 0.0287 | 115.0 | 0.1700
Irregular [40%, 60%] | ICT | 18.50 | 0.669 | 15.78 | 0.0330 | 127.4 | 0.1755
Central 128 × 128 | PIC | 19.50 | 0.797 | 10.27 | 0.0335 | 104.5 | 0.1129
Central 128 × 128 | DSI-VQVAE | 19.46 | 0.797 | 10.60 | 0.0387 | 94.6 | 0.1364
Central 128 × 128 | ICT | 19.42 | 0.796 | 11.72 | 0.0352 | 101.0 | 0.1284
Random regular | PIC | 20.80 | 0.773 | 10.95 | 0.0359 | 93.8 | 0.1152
Random regular | DSI-VQVAE | 21.15 | 0.791 | 10.48 | 0.0426 | 79.0 | 0.1233
Random regular | ICT | 21.03 | 0.787 | 11.51 | 0.0382 | 84.3 | 0.1239
Random irregular | PIC | 19.91 | 0.640 | 13.85 | 0.0246 | 157.7 | 0.1023
Random irregular | DSI-VQVAE | 20.05 | 0.682 | 12.98 | 0.0329 | 116.5 | 0.1539
Random irregular | ICT | 19.10 | 0.662 | 15.41 | 0.0285 | 131.4 | 0.1607
Average | PIC | 22.01 | 0.767 | 9.83 | 0.0349 | 103.9 | 0.0848
Average | DSI-VQVAE | 22.25 | 0.789 | 9.53 | 0.0401 | 81.6 | 0.1140
Average | ICT | 21.54 | 0.779 | 11.18 | 0.0370 | 90.0 | 0.1157
Another explanation is that the training datasets are large enough, and the models
have enough capacity, to be robust to such a transformation.
Table 9 Quantitative comparison of three pluralistic image inpainting methods (PIC, DSI-VQVAE, ICT) on 256 × 256 resized images from ImageNet. Column groups: similarity to GT (PSNR↑, SSIM↑, L1↓), realism (MIS↑, FID↓), diversity (LPIPS↑)

Mask | Method | PSNR↑ | SSIM↑ | L1↓ | MIS↑ | FID↓ | LPIPS↑
Irregular <20% | PIC | 31.37 | 0.944 | 1.82 | 0.1885 | 21.5 | 0.0028
Irregular <20% | DSI-VQVAE | 31.83 | 0.952 | 2.08 | 0.1913 | 12.8 | 0.0175
Irregular <20% | ICT | 30.21 | 0.946 | 3.41 | 0.2002 | 12.2 | 0.0203
Irregular [20%, 40%] | PIC | 23.13 | 0.807 | 6.91 | 0.1401 | 93.7 | 0.0323
Irregular [20%, 40%] | DSI-VQVAE | 23.45 | 0.825 | 6.72 | 0.1617 | 61.8 | 0.0790
Irregular [20%, 40%] | ICT | 22.36 | 0.817 | 8.34 | 0.1739 | 52.1 | 0.0810
Irregular [40%, 60%] | PIC | 18.39 | 0.636 | 15.84 | 0.0497 | 198.0 | 0.1314
Irregular [40%, 60%] | DSI-VQVAE | 18.95 | 0.672 | 14.14 | 0.0737 | 147.9 | 0.1901
Irregular [40%, 60%] | ICT | 18.34 | 0.663 | 15.85 | 0.0822 | 120.4 | 0.1764
Central 128 × 128 | PIC | 19.31 | 0.795 | 10.35 | 0.0583 | 153.9 | 0.1091
Central 128 × 128 | DSI-VQVAE | 19.47 | 0.800 | 10.25 | 0.0700 | 172.1 | 0.1293
Central 128 × 128 | ICT | 19.91 | 0.796 | 11.27 | 0.0790 | 120.3 | 0.1247
Random regular | PIC | 19.63 | 0.745 | 13.13 | 0.0690 | 150.5 | 0.1071
Random regular | DSI-VQVAE | 20.13 | 0.769 | 11.59 | 0.1048 | 113.8 | 0.1457
Random regular | ICT | 20.13 | 0.766 | 12.56 | 0.1028 | 101.7 | 0.1376
Random irregular | PIC | 19.70 | 0.618 | 14.04 | 0.0457 | 194.6 | 0.1021
Random irregular | DSI-VQVAE | 20.11 | 0.665 | 12.85 | 0.0642 | 155.9 | 0.1648
Random irregular | ICT | 18.94 | 0.649 | 15.33 | 0.0859 | 131.2 | 0.1652
Average | PIC | 21.92 | 0.758 | 10.35 | 0.0919 | 135.4 | 0.0949
Average | DSI-VQVAE | 22.32 | 0.781 | 9.61 | 0.1110 | 107.7 | 0.1211
Average | ICT | 21.64 | 0.773 | 11.13 | 0.1207 | 89.7 | 0.1175
Acknowledgments PV, CB, and AB acknowledge the EU Horizon 2020 research and innovation
program NoMADS (Marie Skłodowska-Curie grant agreement No 777826). SP acknowledges the
Leverhulme Trust Research Project Grant “Unveiling the invisible: Mathematics for Conservation
in Arts and Humanities.” CB and PV also acknowledge partial support by MICINN/FEDER UE
project, ref. PGC2018-098625-B-I00, and RED2018-102511-T. AB also acknowledges the French
Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01). SH acknowledges
the French Ministry of Research through a CDSN grant of ENS Paris-Saclay.
[Image panels: outputs on 256 × 256 images masked with a 128 × 128 center hole and a Pconv 20%–40% hole; rows show PIC, DSI-VQVAE, ICT, and BAT.]
Fig. 13 Diverse inpainting output on 256 × 256 images from Celeba dataset with center and irregular masks. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
[Image panels: outputs on 256 × 256 images masked with a 128 × 128 center hole, a Pconv <20% hole, and a Pconv 40%–60% hole; rows show PIC, DSI-VQVAE, and ICT.]
Fig. 14 Diverse inpainting output on 256 × 256 images from ImageNet dataset with center and irregular masks with different hidden proportions. For each method, out of 25 generated samples, the five samples with highest discriminator score are displayed
References
Arias, P., Facciolo, G., Caselles, V., Sapiro, G.: A variational framework for exemplar-based image
inpainting. Int. J. Comput. Vis. 93(3), 319–347 (2011)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In:
International Conference on Machine Learning, pp. 214–223. PMLR (2017)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Aujol, J.-F., Ladjal, S., Masnou, S.: Exemplar-based inpainting from a variational point of view.
SIAM J. Math. Anal. 42(3), 1246–1285 (2010)
Baatz, W., Fornasier, M., Markowich, P.A., Schönlieb, C.-B.: Inpainting of ancient Austrian
frescoes. In: Conference Proceedings of Bridges, pp. 150–156 (2008)
Ballester, C., Bertalmío, M., Caselles, V., Sapiro, G., Verdera, J.: Filling-in by joint interpolation
of vector fields and gray levels. IEEE Trans. Image Process. 10(8), 1200–1211 (2001)
Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: Cvae-gan: fine-grained image generation through
asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision,
pp. 2745–2754 (2017)
Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B.: PatchMatch: a randomized correspondence
algorithm for structural image editing. In: ACM SIGGRAPH 2009 Papers (SIGGRAPH'09). ACM Press (2009)
Barratt, S., Sharma, R.: A note on the inception score (2018). arXiv preprint arXiv:1801.01973
Bertalmío, M., Bertozzi, A., Sapiro, G.: Navier-stokes, fluid dynamics, and image and video
inpainting. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition. CVPR 2001. IEEE Computer Society (2001)
Bertalmío, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proceedings of the
27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH’00,
pp. 417–424. ACM Press/Addison-Wesley Publishing Co (2000)
Bertozzi, A.L., Esedoglu, S., Gillette, A.: Inpainting of binary images using the cahn–hilliard
equation. IEEE Trans. Image Process. 16(1), 285–291 (2007)
Bevilacqua, M., Aujol, J.-F., Biasutti, P., Brédif, M., Bugeau, A.: Joint inpainting of depth and
reflectance with visibility estimation. ISPRS J. Photogram. Rem. Sens. 125, 16–32 (2017)
Biasutti, P., Aujol, J.-F., Brédif, M., Bugeau, A.: Diffusion and inpainting of reflectance and height
LiDAR orthoimages. Comput. Vis. Image Underst. 179, 31–40 (2019)
Bornard, R., Lecan, E., Laborelli, L., Chenot, J.-H.: Missing data correction in still images and
image sequences. In: Proceedings of the Tenth ACM International Conference on Multimedia
– MULTIMEDIA’02. ACM Press (2002)
Bornemann, F., März, T.: Fast image inpainting based on coherence transport. J. Math. Imag. Vis.
28(3), 259–278 (2007)
Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Jozefowicz, R., Bengio, S.: Generating sentences
from a continuous space (2015). arXiv preprint arXiv:1511.06349
Buyssens, P., Daisy, M., Tschumperle, D., Lezoray, O.: Exemplar-based inpainting: Technical
review and new heuristics for better geometric reconstructions. IEEE Trans. Image Process.
24(6), 1809–1824 (2015)
Cai, N., Su, Z., Lin, Z., Wang, H., Yang, Z., Ling, B.W.-K.: Blind inpainting using the fully
convolutional neural network. Vis. Comput. 33(2), 249–261 (2015)
Cai, W., Wei, Z.: Piigan: generative adversarial networks for pluralistic image inpainting. IEEE
Access 8, 48451–48463 (2020)
Calatroni, L., d’Autume, M., Hocking, R., Panayotova, S., Parisotto, S., Ricciardi, P., Schönlieb,
C.-B.: Unveiling the invisible: mathematical methods for restoring and interpreting illuminated
manuscripts. Herit. Sci. 6(1), 56 (2018)
Cao, F., Gousseau, Y., Masnou, S., Pérez, P.: Geometrically guided exemplar-based inpainting.
SIAM J. Imag. Sci. 4(4), 1143–1179 (2011)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object
detection with transformers. In: European Conference on Computer Vision, pp. 213–229.
Springer (2020)
Caselles, V., Morel, J.-M., Sbert, C.: An axiomatic approach to image interpolation. IEEE Trans.
Image Process. 7(3), 376–386 (1998)
Chan, T.F., Shen, J.: Nontexture inpainting by curvature-driven diffusions. J. Vis. Commun. Image
Rep. 12(4), 436–449 (2001)
Chen, X., Mishra, N., Rohaninejad, M., Abbeel, P.: Pixelsnail: An improved autoregressive
generative model. In: International Conference on Machine Learning, pp. 864–872. PMLR
(2018)
Chen, Y., Li, Y., Guo, H., Hu, Y., Luo, L., Yin, X., Gu, J., Toumoulin, C.: CT metal artifact
reduction method based on improved image segmentation and sinogram in-painting. Math.
Probl. Eng. 2012, 1–18 (2012)
Criminisi, A., Perez, P., Toyama, K.: Region filling and object removal by exemplar-based image
inpainting. IEEE Trans. Image Process. 13(9), 1200–1212 (2004)
Dahl, R., Norouzi, M., Shlens, J.: Pixel recursive super resolution. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 5439–5448 (2017)
Dapogny, A., Cord, M., Pérez, P.: The missing data encoder: cross-channel image completion
with hide-and-seek adversarial network. Proc. AAAI Conf. Artif. Intell. 34(07), 10688–10695
(2020)
Demanet, L., Song, B., Chan, T.: Image inpainting by correspondence maps: a deterministic
approach. Appl. Comput. Math. 1100, 217–50 (2003)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional
transformers for language understanding (2018). arXiv preprint arXiv:1810.04805
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani,
M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers
for image recognition at scale (2020). arXiv preprint arXiv:2010.11929
Efros, A., Leung, T.: Texture synthesis by non-parametric sampling. In: Proceedings of the Seventh
IEEE International Conference on Computer Vision. IEEE (1999)
Eller, M., Fornasier, M.: Rotation invariance in exemplar-based image inpainting. In: Bergounioux, M.,
Peyré, G., Schnörr, C., Caillau, J.-B., Haberkorn, T. (eds.) Variational Methods: In Imaging and
Geometric Control, pp. 108–183. De Gruyter, Berlin/Boston (2017). https://doi.org/10.1515/9783110430394-004
Esedoglu, S., Shen, J.: Digital inpainting based on the mumford–shah–euler image model. Eur. J.
Appl. Math. 13(04), 353–370 (2002)
Fawzi, A., Samulowitz, H., Turaga, D., Frossard, P.: Image inpainting through neural networks hal-
lucinations. In: IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop,
pp. 1–5. IEEE (2016)
Fedorov, V., Arias, P., Facciolo, G., Ballester, C.: Affine invariant self-similarity for exemplar-
based inpainting. In: Proceedings of the 11th Joint Conference on Computer Vision, Imaging
and Computer Graphics Theory and Applications. SCITEPRESS – Science and Technology
Publications (2016)
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A.C., Bengio, Y.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680
(2014)
Grossauer, H.: Inpainting of movies using optical flow. In: Mathematics in Industry, pp. 151–162.
Springer, Berlin/Heidelberg (2006)
Grossauer, H., Scherzer, O.: Using the complex ginzburg-landau equation for digital inpainting
in 2d and 3d. In: Scale Space Methods in Computer Vision, pp. 225–236. Springer,
Berlin/Heidelberg (2003)
Guadarrama, S., Dahl, R., Bieber, D., Norouzi, M., Shlens, J., Murphy, K.: Pixcolor: Pixel recursive
colorization (2017). arXiv preprint arXiv:1705.07208
Guillemot, C., Le Meur, O.: Image inpainting: overview and recent advances. IEEE Sig. Process.
Mag. 31(1), 127–144 (2014)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of
wasserstein gans (2017). arXiv preprint arXiv:1704.00028
Guo, Z., Chen, Z., Yu, T., Chen, J., Liu, S.: Progressive image inpainting with full-resolution
residual network. In: Proceedings of the 27th ACM International Conference on Multimedia,
MM’19, New York, pp. 2496–2504. Association for Computing Machinery (ACM) (2019)
Han, X., Wu, Z., Huang, W., Scott, M.R., Davis, L.S.: Finet: compatible and diverse fashion image
inpainting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision,
pp. 4481–4491 (2019)
Hays, J., Efros, A.A.: Scene completion using millions of photographs. ACM Trans. Graph. 26(3),
87–94 (2007)
Hervieu, A., Papadakis, N., Bugeau, A., Gargallo, P., Caselles, V.: Stereoscopic image inpainting:
distinct depth maps and images inpainting. In: 2010 20th International Conference on Pattern
Recognition, pp. 4101–4104. IEEE (2010)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two
time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 30,
6629–6640 (2017)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. In:
ACM Transactions on Graphics (ToG), vol. 36(4), pp. 1–14. ACM, New York, NY, USA (2017)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-
resolution. In: European Conference on Computer Vision, pp. 694–711. Springer (2016)
Kang, S.H., Chan, T., Soatto, S.: Inpainting from multiple views. In: Proceedings. First Inter-
national Symposium on 3D Data Processing Visualization and Transmission. IEEE Computer
Society (2002)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality,
stability, and variation. In: International Conference on Learning Representations (2018)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial
networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 4401–4410 (2019)
Kettunen, M., Härkönen, E., Lehtinen, J.: E-lpips: robust perceptual image similarity via random
transformation ensembles (2019). arXiv preprint arXiv:1906.03973
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: International Conference on
Learning Representations (2013)
Kingma, D.P., Welling, M., et al.: An introduction to variational autoencoders. Found. Trends®
Mach. Learn. 12(4), 307–392 (2019)
Köhler, R., Schuler, C., Schölkopf, B., Harmeling, S.: Mask-specific inpainting with deep neural
networks. In: Jiang, X., Hornegger, J., Koch, R. (eds.) Pattern Recognition, pp. 523–534,
Springer International Publishing, Cham (2014)
Kumar, M., Weissenborn, D., Kalchbrenner, N.: Colorization transformer (2021). arXiv preprint
arXiv:2102.04432
Kumar, V., Mukherjee, J., Mandal, S.K.D.: Image inpainting through metric labeling via guided
patch mixing. IEEE Trans. Image Process. 25(11), 5212–5226 (2016)
Lahiri, A., Jain, A.K., Agrawal, S., Mitra, P., Biswas, P.K.: Prior guided GAN based semantic
inpainting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 13696–13705 (2020)
Le Meur, O., Ebdelli, M., Guillemot, C.: Hierarchical super-resolution-based inpainting. IEEE
Trans. Image Process. 22(10), 3779–3790 (2013)
Lempitsky, V., Vedaldi, A., Ulyanov, D.: Deep image prior. In: 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pp. 9446–9454. IEEE (2018)
Li, J., He, F., Zhang, L., Du, B., Tao, D.: Progressive reconstruction of visual structure for image
inpainting. In: 2019 IEEE/CVF International Conference on Computer Vision. IEEE (2019)
Li, J., Wang, N., Zhang, L., Du, B., Tao, D.: Recurrent feature reasoning for image inpainting.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 7760–7768 (2020)
Liao, L., Hu, R., Xiao, J., Wang, Z.: Edge-aware context encoder for image inpainting. In: 2018
IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3156–3160.
IEEE (2018)
Liu, G., Reda, F.A., Shih, K.J., Wang, T.-C., Tao, A., Catanzaro, B.: Image inpainting for irregular
holes using partial convolutions. In: European Conference on Computer Vision, pp. 89–105
(2018)
Liu, H., Jiang, B., Song, Y., Huang, W., Yang, C.: Rethinking image inpainting via a mutual
encoder-decoder with feature equalizations. In: Computer Vision – ECCV 2020, pp. 725–741.
Springer International Publishing (2020)
Liu, H., Jiang, B., Xiao, Y., Yang, C.: Coherent semantic attention for image inpainting. In: 2019
IEEE/CVF International Conference on Computer Vision. IEEE (2019)
Liu, H., Wan, Z., Huang, W., Song, Y., Han, X., Liao, J.: Pd-gan: probabilistic diverse gan for image
inpainting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 9371–9381 (2021)
Mansfield, A., Prasad, M., Rother, C., Sharp, T., Kohli, P., Gool, L.V.: Transforming image
completion. In: Procedings of the British Machine Vision Conference 2011. British Machine
Vision Association (2011)
Mao, Q., Lee, H.-Y., Tseng, H.-Y., Ma, S., Yang, M.-H.: Mode seeking generative adversarial
networks for diverse image synthesis. In: Conference on Computer Vision and Pattern
Recognition, pp. 1429–1437 (2019)
Masnou, S., Morel, J.-M.: Level lines based disocclusion. In: Proceedings 1998 International
Conference on Image Processing. ICIP98 (Cat. No.98CB36269). IEEE Computer Society
(1998)
Nazeri, K., Ng, E., Joseph, T., Qureshi, F., Ebrahimi, M.: EdgeConnect: generative image
inpainting with adversarial edge learning. In: The IEEE International Conference on Computer
Vision Workshops (2019)
Newson, A., Almansa, A., Fradet, M., Gousseau, Y., Pérez, P.: Video inpainting of complex scenes.
SIAM J. Imag. Sci. 7(4), 1993–2019 (2014)
Nitzberg, M., Mumford, D., Shiota, T.: Filtering, Segmentation and Depth. Springer,
Berlin/Heidelberg (1993)
Oord, A., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G.,
Lockhart, E., Cobo, L., Stimberg, F., et al.: Parallel wavenet: Fast high-fidelity speech synthesis.
In: International Conference on Machine Learning, pp. 3918–3926. PMLR (2018)
Oord, A.V.D., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., Kavukcuoglu, K.: Con-
ditional image generation with pixelcnn decoders. In: Proceedings of the 30th International
Conference on Neural Information Processing Systems, pp. 4797–4805 (2016)
Papafitsoros, K., Schönlieb, C.B.: A combined first and second order variational approach for
image reconstruction. J. Math. Imag. Vis. 48(2), 308–338 (2013)
Parisotto, S., Lellmann, J., Masnou, S., Schönlieb, C.-B.: Higher-order total directional variation:
imaging applications. SIAM J. Imag. Sci. 13(4), 2063–2104 (2020)
Parisotto, S., Vitoria, P., Ballester, C., Bugeau, A., Reynolds, S., Schonlieb, C.-B.: The Art of
Inpainting – A Monograph on Mathematical Methods for the Virtual Restoration of Illuminated
Manuscripts (2022) (submitted)
Park, T., Liu, M.-Y., Wang, T.-C., Zhu, J.-Y.: Semantic image synthesis with spatially-adaptive
normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 2337–2346 (2019)
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature
learning by inpainting. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 2536–2544. IEEE (2016)
Peng, J., Liu, D., Xu, S., Li, H.: Generating diverse structure for image inpainting with hierarchical
vq-vae. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 10775–10784 (2021)
Peter, P., Weickert, J.: Compressing images with diffusion- and exemplar-based inpainting.
In: Lecture Notes in Computer Science, pp. 154–165. Springer International Publishing
(2015)
Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with vq-vae-2.
In: Advances in Neural Information Processing Systems, pp. 14866–14876 (2019)
Ren, J.S., Xu, L., Yan, Q., Sun, W.: Shepard convolutional neural networks. In: Proceedings
of the 28th International Conference on Neural Information Processing Systems. NIPS’15,
Cambridge, MA, vol. 1, pp. 901–909. The MIT Press (2015)
Ren, Y., Yu, X., Zhang, R., Li, T.H., Liu, S., Li, G.: StructureFlow: image inpainting via structure-
aware appearance flow. In: 2019 IEEE/CVF International Conference on Computer Vision,
pp. 181–190. IEEE (2019)
Rosca, M., Lakshminarayanan, B., Warde-Farley, D., Mohamed, S.: Variational approaches for
auto-encoding generative adversarial networks (2017). arXiv preprint arXiv:1706.04987
Rott Shaham, T., Dekel, T., Michaeli, T.: Singan: Learning a generative model from a single natural
image. In: International Conference on Computer Vision (2019)
Royer, A., Kolesnikov, A., Lampert, C.H.: Probabilistic image colorization (2017). arXiv preprint
arXiv:1705.04258
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A.,
Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J.
Comput. Vis. 115(3), 211–252 (2015)
Ružić, T., Cornelis, B., Platiša, L., Pižurica, A., Dooms, A., Philips, W., Martens, M., Mey,
M.D., Daubechies, I.: Virtual restoration of the ghent altarpiece using crack detection and
inpainting. In: Advanced Concepts for Intelligent Vision Systems, pp. 417–428. Springer,
Berlin/Heidelberg (2011)
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved
techniques for training gans. In: Advances in Neural Information Processing Systems (2016)
Schonlieb, C.-B.: Partial Differential Equation Methods for Image Inpainting. Cambridge
University Press, New York (2015)
Shen, J., Chan, T.F.: Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math.
62(3), 1019–1043 (2002)
Shen, J., Kang, S.H., Chan, T.F.: Euler’s elastica and curvature-based inpainting. SIAM J. Appl.
Math. 63(2), 564–592 (2003)
Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional
generative models. Adv. Neural Inf. Process. Syst. 28, 3483–3491 (2015)
Sun, J., Yuan, L., Jia, J., Shum, H.-Y.: Image completion with structure propagation. ACM Trans.
Graph. 24(3), 861–868 (2005)
Tai, X.-C., Osher, S., Holm, R.: Image inpainting using a TV-stokes equation. In: Image Processing
Based on Partial Differential Equations, pp. 3–22. Springer, Berlin/Heidelberg (2007)
Tovey, R., Benning, M., Brune, C., Lagerwerf, M.J., Collins, S.M., Leary, R.K., Midgley, P.A.,
Schönlieb, C.-B.: Directional sinogram inpainting for limited angle tomography. Inverse Probl.
35(2), 024004 (2019)
Tschumperle, D., Deriche, R.: Vector-valued image regularization with PDEs: a common
framework for different applications. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 506–517
(2005)
van den Oord, A., Vinyals, O., Kavukcuoglu, K.: Neural discrete representation learning. In:
Proceedings of the 31st International Conference on Neural Information Processing Systems,
pp. 6309–6318 (2017)
Van Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In:
International Conference on Machine Learning, pp. 1747–1756. PMLR (2016)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin,
I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–
6008 (2017)
Vitoria, P., Ballester, C.: Automatic flare spot artifact detection and removal in photographs. J.
Math. Imag. Vis. 61(4), 515–533 (2019)
Vitoria, P., Sintes, J., Ballester, C.: Semantic image inpainting through improved Wasserstein
generative adversarial networks. In: Proceedings of the 14th International Joint Conference on
Computer Vision, Imaging and Computer Graphics Theory and Applications. VISAPP, vol. 4,
pp. 249–260. INSTICC, SciTePress (2019)
Vitoria, P., Sintes, J., Ballester, C.: Semantic image completion through an adversarial strategy. In:
Communications in Computer and Information Science, pp. 520–542. Springer International
Publishing (2020)
Vo, H.V., Duong, N.Q.K., Pérez, P.: Structural inpainting. In: 2018 ACM Multimedia Conference
on Multimedia Conference, MM’18, New York, pp. 1948–1956. Association for Computing
Machinery (ACM) (2018)
Wan, Z., Zhang, J., Chen, D., Liao, J.: High-fidelity pluralistic image completion with transformers
(2021). arXiv preprint arXiv:2103.14031
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility
to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, Y., Tao, X., Qi, X., Shen, X., Jia, J.: Image inpainting via generative multi-column
convolutional neural networks. In: Proceedings of the 32nd International Conference on Neural
Information Processing Systems, pp. 329–338. Curran Associates Inc., Montréal, Canada
(2018)
Wexler, Y., Shechtman, E., Irani, M.: Space-time video completion. In: Proceedings of the 2004
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004.
IEEE (2004)
Xiong, W., Yu, J., Lin, Z., Yang, J., Lu, X., Barnes, C., Luo, J.: Foreground-aware image inpainting.
In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5840–5848.
IEEE (2019)
Yan, Z., Li, X., Li, M., Zuo, W., Shan, S.: Shift-net: image inpainting via deep feature
rearrangement. In: Computer Vision – ECCV 2018, pp. 3–19. Springer International Publishing
(2018)
Yang, C., Lu, X., Lin, Z., Shechtman, E., Wang, O., Li, H.: High-resolution image inpainting using
multi-scale neural patch synthesis. In: 2017 IEEE Conference on Computer Vision and Pattern
Recognition, pp. 6721–6729. IEEE (2017)
Yang, F., Yang, H., Fu, J., Lu, H., Guo, B.: Learning texture transformer network for image super-
resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 5791–5800 (2020)
Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image
inpainting with deep generative models. In: 2017 IEEE Conference on Computer Vision and
Pattern Recognition, pp. 5485–5493. IEEE (2017)
Yi, K., Guo, Y., Fan, Y., Hamann, J., Wang, Y.G.: Cosmovae: variational autoencoder for CMB
image inpainting (2020a). arXiv preprint arXiv:2001.11651
Yi, Z., Tang, Q., Azizi, S., Jang, D., Xu, Z.: Contextual residual aggregation for ultra high-
resolution image inpainting. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 7505–7514. IEEE (2020b)
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.: Free-form image inpainting with gated
convolution. In: International Conference on Computer Vision, pp. 4470–4479 (2019)
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual
attention. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 5505–5514. IEEE (2018)
Yu, Y., Zhan, F., Wu, R., Pan, J., Cui, K., Lu, S., Ma, F., Xie, X., Miao, C.: Diverse image inpainting
with bidirectional and autoregressive transformers (2021). arXiv preprint arXiv:2104.12335
Zeng, Y., Fu, J., Chao, H., Guo, B.: Learning pyramid-context encoder network for high-
quality image inpainting. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 1486–1494. IEEE (2019)
Zeng, Y., Lin, Z., Yang, J., Zhang, J., Shechtman, E., Lu, H.: High-resolution image inpainting with
iterative confidence feedback and guided upsampling. In: European Conference on Computer
Vision, pp. 1–17. Springer (2020)
Zhang, H., Hu, Z., Luo, C., Zuo, W., Wang, M.: Semantic image inpainting with progressive
generative networks. In: 2018 ACM Multimedia Conference on Multimedia Conference,
MM’18, pp. 1939–1947. ACM Press (2018a)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep
features as a perceptual metric. In: Conference on Computer Vision and Pattern Recognition
(2018b)
Zhao, J., Han, J., Shao, L., Snoek, C.G.: Pixelated semantic colorization. Int. J. Comput. Vis.
128(4), 818–834 (2020a)
Zhao, L., Mo, Q., Lin, S., Wang, Z., Zuo, Z., Chen, H., Xing, W., Lu, D.: UCTGAN: diverse image
inpainting based on unsupervised cross-space translation. In: Conference on Computer Vision
and Pattern Recognition, pp. 5741–5750 (2020b)
Zheng, C., Cham, T.-J., Cai, J.: Pluralistic image completion. In: Conference on Computer Vision
and Pattern Recognition, pp. 1438–1447 (2019)
Zheng, C., Cham, T.-J., Cai, J.: Tfill: image completion via a transformer-based architecture
(2021). arXiv preprint arXiv:2104.00845
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database
for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2017)
Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E.: Multimodal
image-to-image translation by enforcing bi-cycle consistency. In: Advances in Neural Informa-
tion Processing Systems, pp. 465–476 (2017)
21 Analysis of Different Losses for Deep Learning Image Colorization

Coloma Ballester, Hernan Carrillo, Michaël Clément, and Patricia Vitoria
Contents
Introduction  822
Losses in the Colorization Literature  823
  Error-Based Losses  824
  Generative Adversarial Network-Based Losses  826
  Distribution-Based Losses  828
Proposed Colorization Framework  831
  Detailed Architecture  831
  Quantitative Evaluation Metrics Used in Colorization Methods  833
Experimental Analysis  836
  Quantitative Evaluation  836
  Qualitative Evaluation  838
Generalization to Archive Images  844
Conclusion  844
References  844
C. Ballester · P. Vitoria
University Pompeu Fabra, Barcelona, Spain
e-mail: [email protected]; [email protected]

H. Carrillo · M. Clément
LaBRI, CNRS, Bordeaux INP, Université de Bordeaux, Bordeaux, France
e-mail: [email protected]; [email protected]

Abstract
losses rely on the comparison of perceptual properties. But, is the choice of the
objective function that crucial, i.e., does it play an important role in the results?
In this chapter, we aim to answer this question by analyzing the impact of the
loss function on the estimated colorization results. To that goal, we review the
different losses and evaluation metrics that are used in the literature. We then
train a baseline network with several of the reviewed objective functions, classic
L1 and L2 losses, as well as more complex combinations such as Wasserstein
GAN and VGG-based LPIPS loss. Quantitative results show that the models
trained with VGG-based LPIPS provide overall slightly better results for most
evaluation metrics. Qualitative results exhibit more vivid colors when trained
with Wasserstein GAN plus the L2 loss or again with the VGG-based LPIPS.
Finally, the convenience of quantitative user studies is also discussed to overcome
the difficulty of properly assessing colorized images, notably in the case of
old archive photographs where no ground truth is available.
Keywords
Introduction
all possible correct solutions. Alternatively, instead of directly learning the per-
pixel chrominance information, some methods learn a per-pixel color distribution
from which the color at each pixel is afterward sampled. In principle, this could encourage
the mapping to be one to many, which can be desirable. However, how to properly
exploit and train such networks so that the different possible solutions have both
geometric and semantic meaning remains an open problem.
This chapter aims to analyze the influence of the optimized objective function
on the results of automatic deep learning methods for image colorization. Some of
the chosen objective functions favor colorization results perceptually as plausible as
the associated color ground truth image, no matter the pixel-wise color differences
between them, while others aim to recover the ground truth values. To the best of
our knowledge, there is currently no study about their influence over the results.
Additionally, besides the selected objective function used to train the model,
another important choice is the color space we will work on. Almost all colorization
methods work either on a Luminance–Chrominance or on the RGB color space.
Only a few of them, such as Larsson et al. (2016), work on Hue-Saturation-
based color spaces. Thus, together with this chapter, another chapter of the current
handbook, called Chap. 22, “Influence of Color Spaces for Deep Learning Image
Colorization” has been added for completeness. It focuses on the influence of color
spaces. It also contains a more detailed review of the literature on image colorization
and of the used datasets. We refer the reader to the mentioned chapter for these
reviews.
The rest of this chapter is organized as follows. In Section “Losses in the
Colorization Literature,” we first make a review of the loss functions that have been
used in the field of image colorization while connecting them with the colorization-
related works. Section “Proposed Colorization Framework” details the framework
used to analyze the influence of the different losses, including both the chosen
architecture and evaluation metrics. Finally, in Section “Experimental Analysis,” we
present quantitative and qualitative colorization results on a classical image dataset,
and Section “Generalization to Archive Images” shows extended results on archive
images. Conclusions can be found in Section “Conclusion.”
Losses in the Colorization Literature

The objective loss function summarizes the desired properties that we want the estimated outcome to satisfy. In this section, we review the losses and evaluation methods used in the literature.
Throughout this chapter, a color image is assumed to be defined on a bounded domain Ω, a subset of R². With a slight abuse of notation, we use the same notation both to refer to the continuous setting, where Ω ⊂ R² is an infinite resolution image domain and u : Ω → R^C, and to the discrete setting, where Ω represents a discrete domain given by a grid of M × N pixels, M, N ∈ N, and u is a function defined on this discrete Ω with values in R^C. In the latter case, u is usually given by a real-valued matrix of size M × N × C representing the image values. Finally, C denotes the number of channels of the image.
Table 1 Losses used to train deep learning methods for image colorization. CE stands for cross-entropy and KL for Kullback–Leibler divergence. [Columns: the surveyed methods, grouped as using GANs, histogram prediction, user guided, diverse, object aware, and survey (e.g., He et al. 2018; Su et al. 2020; Antic 2019). Rows: the losses (MAE, smooth-L1, MSE, GAN, KL on distributions, CE on distributions, KL for classification, CE for classification, negative log-likelihood, and perceptual), with marks indicating which methods use each loss.]
Error-Based Losses
In the following, the different losses used in the literature of image colorization are
described and related to some representative works that capitalize on them. Table 1
summarizes it.
MSE or squared L2 loss. Given two functions u and v defined on Ω and with values in R^C, C ∈ N, the so-called Mean Square Error (MSE) between u and v is defined as the squared L2 loss of their difference. That is,

\mathrm{MSE}(u, v) = \| u - v \|^2_{L^2(\Omega;\mathbb{R}^C)} = \int_\Omega \| u(x) - v(x) \|_2^2 \, dx,   (1)

or, in the discrete setting,

\mathrm{MSE}(u, v) = \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{k=1}^{C} (u_{i,j,k} - v_{i,j,k})^2.   (2)
It has been extensively used for image colorization methods (Cheng et al. 2015;
Larsson et al. 2016; Zhang et al. 2016; Iizuka et al. 2016; Isola et al. 2017; Nazeri
et al. 2018; Vitoria et al. 2020) (see also Table 1), where C = 3 if u and v are color
images (usually the predicted and the ground truth data) or C = 2 in the case that u
and v are chrominance images. Although training with this loss can lead to a more stable optimization, it is not robust to outliers in the data: it strongly penalizes large errors while being more tolerant to small ones.
MAE or L1 loss with ℓ¹-coupling. The Mean Absolute Error is defined as the L1 loss with ℓ¹-coupling, that is,

\mathrm{MAE}(u, v) = \int_\Omega \| u(x) - v(x) \|_{\ell^1} \, dx = \sum_{k=1}^{C} \int_\Omega | u_k(x) - v_k(x) | \, dx.   (3)

In the discrete setting, it coincides with the sum of the absolute differences |u_{i,j,k} - v_{i,j,k}|. Some authors use an ℓ²-coupled version of it:

\mathrm{MAE}_c(u, v) = \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{ \sum_{k=1}^{C} (u_{i,j,k} - v_{i,j,k})^2 }.   (4)
The smooth-L1 (or Huber) loss combines both behaviors, penalizing a difference value g quadratically when it is small and linearly otherwise; in its standard form,

\mathrm{smooth}_{L1}(g) = \begin{cases} 0.5\, g^2 & \text{if } |g| < 1, \\ |g| - 0.5 & \text{otherwise,} \end{cases} \quad \text{for } g \in \mathbb{R}.

Several works (Su et al. 2020; Cao et al. 2017; Yoo et al. 2019; Zhang et al. 2017; He et al. 2018; Guadarrama et al. 2017) use MAE, MAE_c, or smooth-L1 losses, either alone or combined with other losses (cf. Table 1).
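To make these error-based losses concrete, the following is a minimal NumPy sketch of the discrete formulas above; function names and the |g| < 1 threshold of the smooth-L1 rule are illustrative choices, not values prescribed by the chapter.

```python
# Minimal NumPy sketch of the discrete error-based losses of Eqs. (2)-(4) and of
# the standard smooth-L1 rule; u and v are M x N x C arrays (predicted and
# ground-truth images or chrominance maps).
import numpy as np

def mse(u, v):
    return np.sum((u - v) ** 2)                              # Eq. (2)

def mae(u, v):
    return np.sum(np.abs(u - v))                             # discrete Eq. (3)

def mae_coupled(u, v):
    # Eq. (4): l2-coupling over the channel dimension, summed over pixels
    return np.sum(np.sqrt(np.sum((u - v) ** 2, axis=-1)))

def smooth_l1(u, v):
    g = u - v
    quad = 0.5 * g ** 2
    lin = np.abs(g) - 0.5
    return np.sum(np.where(np.abs(g) < 1.0, quad, lin))
```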
Previous error-based losses aim to find a solution close to the ground truth. This runs counter to the fact that image colorization admits multiple plausible solutions. Additionally, both losses are poorly related to perceptual quality. Nonetheless, they remain the most used ones to train deep learning approaches.
In Section “Experimental Analysis,” we present some numerical results together
with a comparison with other kinds of losses.
Aiming at favoring a solution that keeps from the ground truth not its exact values but rather its perceptual or style features, the following losses have been proposed and used for colorization purposes.
Feature Loss. The feature reconstruction loss (Gatys et al. 2016; Johnson et al.
2016) is a perceptual loss that encourages images to have similar feature representa-
tions as the ones computed by a pretrained network, denoted here by Φ. Let Φl (u)
be the activation of the l-th layer of the network Φ when processing the image u; if l
is a convolutional layer, then Φl (u) will be a feature map of size Cl × Wl × Hl . The
feature reconstruction loss is the normalized squared Euclidean distance between
feature representations, that is
\mathcal{L}^l_{\mathrm{feat}}(u, v) = \frac{1}{C_l W_l H_l} \, \| \Phi_l(u) - \Phi_l(v) \|_2^2.   (6)
It penalizes the output reconstructed image when it deviates in feature content from
the target.
In our experimental analysis in Section “Experimental Analysis,” we analyze
the influence of the perceptual loss given by the VGG-based LPIPS (21), which
was introduced in Ding et al. (2021) as a generalization of the perceptual loss
above (Johnson et al. 2016).
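As an illustration, here is a hedged PyTorch sketch of the feature reconstruction loss of Eq. (6) with a pretrained VGG16 playing the role of Φ; the chosen layer index and the absence of input normalization are simplifying assumptions, not the exact setup of the cited works.

```python
# Hedged PyTorch sketch of the feature reconstruction loss of Eq. (6); Phi is a
# frozen, ImageNet-pretrained VGG16 and `layer` selects Phi_l (illustrative).
import torch
import torch.nn.functional as F
from torchvision import models

vgg_features = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def feature_loss(u, v, layer=16):
    """u, v: (N, 3, H, W) RGB batches; returns the normalized squared feature distance."""
    fu, fv = u, v
    for i, module in enumerate(vgg_features):
        fu, fv = module(fu), module(fv)
        if i == layer:
            break
    n, c, h, w = fu.shape
    return ((fu - fv) ** 2).sum() / (n * c * h * w)
```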
Aiming to favor more diverse and perceptually plausible colorization results, losses
based on Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) have
been introduced in the colorization literature (Isola et al. 2017; Cao et al. 2017;
Nazeri et al. 2018; Yoo et al. 2019; Vitoria et al. 2020). GANs are a class of generative methods whose goal is to learn the probability distribution of the considered dataset by learning to generate new samples as if they were coming from that dataset. In the case of GANs, the learning is done through an adversarial strategy.
Vanilla GAN. The first GAN proposal by Goodfellow et al. (2014) is based on a
game theory scenario between two networks competing one against another. The
first network called generator, denoted by G, aims to generate samples of data
as similar as possible to the ones of real data Pr . The second network, called
discriminator, aims to classify between real and generated data. To do so, the
discriminator, denoted here by D, is trained to maximize the probability of correctly
distinguishing between real examples and samples created by the generator. On the
other hand, G is trained to fool the discriminator by generating realistic examples.
The adversarial loss of the vanilla GAN is defined through the min-max game

\min_G \max_D \; \mathbb{E}_{u \sim \mathbb{P}_r}[\log D(u)] + \mathbb{E}_{v \sim \mathbb{P}_G}[\log(1 - D(v))],

where \mathbb{P}_G denotes the distribution of generated samples.
Wasserstein GAN. Although vanilla GANs have achieved good results in many
domains, they have some drawbacks such as convergence issues, vanishing gradients, and mode collapse. Therefore, some modifications of the original GAN
have been proposed. For example, the Wasserstein GAN (WGAN), proposed by
Arjovsky et al. (2017), replaces the underlying Jensen–Shannon divergence from
the original proposal with the Wasserstein−1 distance (or Earth Mover distance)
between two probability distributions. Then, the WGAN loss, Ladv,wgan , and WGAN
optimization problem can be defined as
\min_{G_\theta} \max_{D_\phi \in \mathcal{D}} \mathcal{L}_{\mathrm{adv,wgan}}(G_\theta, D_\phi) = \min_{G_\theta} \max_{D_\phi \in \mathcal{D}} \; \mathbb{E}_{u \sim \mathbb{P}_r}[D_\phi(u)] - \mathbb{E}_{v \sim \mathbb{P}_{G_\theta}}[D_\phi(v)]   (9)
where D denotes the set of 1-Lipschitz functions. To enforce the 1-Lipschitz
condition, in Gulrajani et al. (2017), the authors propose a Gradient Penalty (GP)
term constraining the L2 norm of the gradient while optimizing the original WGAN
during training. The resulting loss for the WGAN-GP can be defined as
\min_{G_\theta} \max_{D_\phi} \; \mathbb{E}_{u \sim \mathbb{P}_r}[D_\phi(u)] - \mathbb{E}_{v \sim \mathbb{P}_{G_\theta}}[D_\phi(v)] - \lambda \, \mathbb{E}_{\hat{u} \sim \mathbb{P}_{\hat{u}}}\big[ (\| \nabla_{\hat{u}} D_\phi(\hat{u}) \|_2 - 1)^2 \big]   (10)

where \hat{u} is a sample defined as \hat{u} = t\, u + (1 - t)\, v, with t uniformly sampled in [0, 1], u ∼ \mathbb{P}_r, and v ∼ \mathbb{P}_{G_\theta}. The last term in (10)
provides a tractable approximation to enforce the norm of the gradient of D to be
less than 1. The authors of Gulrajani et al. (2017) motivated it by a theoretical result
showing that the optimal discriminator D has gradients of unit norm along straight lines connecting samples
in the ground truth space and samples in the space of generated data. Moreover,
they experimentally observed that this technique exhibits good performance in
practice. Finally, let us observe that the minus before the gradient penalty term
in (10) corresponds to the fact that the WGAN min-max objective (10) implies
maximization with respect to the discriminator parameters.
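The gradient penalty term of Eq. (10) can be sketched as follows in PyTorch; `critic` stands for the discriminator D_φ and λ = 10 is a common default, both being assumptions rather than values prescribed by the chapter.

```python
# Sketch of the WGAN-GP penalty of Eq. (10): interpolate between real and
# generated samples and push the gradient norm of the critic towards 1.
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    t = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    u_hat = (t * real + (1.0 - t) * fake).requires_grad_(True)
    scores = critic(u_hat)
    grads, = torch.autograd.grad(outputs=scores, inputs=u_hat,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()

def wgan_critic_loss(critic, real, fake):
    # Maximized w.r.t. the critic: E[D(real)] - E[D(fake)] - penalty
    return critic(real).mean() - critic(fake).mean() - gradient_penalty(critic, real, fake)
```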
In our experimental results in Section “Experimental Analysis,” we will present
a comparison of several losses, and we will include a combination of WGAN
loss and a VGG-based LPIPS loss. To the best of our knowledge, it has not
been proposed yet.
Distribution-Based Losses

KL divergence loss. Given two discrete probability distributions ρ and ρ̂ over a set of bins, the Kullback–Leibler divergence

\mathrm{KL}(\rho \,\|\, \hat{\rho}) = \sum_i \rho_i \log \frac{\rho_i}{\hat{\rho}_i}   (11)

measures how the predicted distribution ρ̂ deviates from ρ. Here, ρ is usually taken as the ground truth density (sometimes as a Dirac delta or a one-hot vector on the ground truth value, or a regularized one) and ρ̂ the predicted one.
Some works predict a color distribution density per pixel where the color bins
are associated to a fixed 2D grid in a chrominance space (e.g., CIE Lab in Zhang
et al. 2016). In Zhang et al. (2016), the final color of each pixel in the inferred color
image is given by the expectation (sum over the color bin centroids weighted by the
histogram). Others, such as Larsson et al. (2016), learn Hue-Saturation-based color distributions. More precisely, Larsson et al. (2016) learn the marginal distributions ρ̂^Hue and ρ̂^Chroma of Hue and Chroma, per pixel, where Chroma is related to Saturation by the formula Saturation = Chroma / Value, with Value = Luminance + Chroma / 2. They use the KL divergence to measure the deviation between the estimated distributions and the ground truth ones. The marginal ground truth distributions, ρ^Chroma and ρ^Hue, are again defined as either a one-hot vector on the bin associated to the ground truth color or a regularized version of it. Then, their loss is

\mathcal{L}(\rho \,\|\, \hat{\rho}) = \mathrm{KL}(\rho^{\mathrm{Chroma}} \,\|\, \hat{\rho}^{\mathrm{Chroma}}) + \lambda\, c\, \mathrm{KL}(\rho^{\mathrm{Hue}} \,\|\, \hat{\rho}^{\mathrm{Hue}}),   (12)

where c ∈ [0, 1] is the ground truth Chroma of the considered pixel and λ = 5 in Larsson et al. (2016). The authors introduce this Chroma-dependent weight on the Hue KL term to avoid Hue instability issues when the Chroma
approaches zero. For inference and to sample a color value per pixel from the
estimated marginal distributions, they experimentally tested that a median-based
selection (a periodically modified version in the case of Hue) gives the best results.
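A possible implementation of the chroma-weighted loss of Eq. (12) is sketched below; the histogram shapes, the smoothing constant, and the argument order of the KL terms follow the reconstruction above and are assumptions rather than the exact formulation of Larsson et al. (2016).

```python
# Hedged sketch of Eq. (12): per-pixel KL divergences between ground-truth and
# predicted Hue/Chroma histograms, with the Hue term weighted by the
# ground-truth Chroma value. Shapes: (num_pixels, bins) histograms and
# (num_pixels,) chroma values in [0, 1].
import torch

def kl_div(p, q, eps=1e-8):
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=-1)

def hue_chroma_loss(chroma_gt, chroma_pred, hue_gt, hue_pred, c_gt, lam=5.0):
    return (kl_div(chroma_gt, chroma_pred) + lam * c_gt * kl_div(hue_gt, hue_pred)).mean()
```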
Besides, Vitoria et al. (2020) use the KL loss (11) to learn, for each image,
the distribution density of semantic classes, for a fixed number of classes. It
provides information about the semantic content and objects present in the image.
In particular, they define the ground truth probability density ρ of semantic classes
to be the output distribution of a pre-trained VGG-16 model applied to the grayscale
image and ρ̂ the estimated class distribution density.
Cross-entropy loss. The cross-entropy between two discrete distributions ρ and ρ̂ is defined as

\mathrm{CE}(\rho, \hat{\rho}) = - \sum_i \rho_i \log \hat{\rho}_i,   (13)

where, again, ρ is usually taken as the ground truth density and ρ̂ the predicted one. In the classification context, ρ is often a one-hot vector equal to 1 on the ground truth class, or a regularized version of it. Let us also note, from (11) and (13), that there is a relationship between the Kullback–Leibler and the cross-entropy losses given by

\mathrm{CE}(\rho, \hat{\rho}) = \mathrm{KL}(\rho \,\|\, \hat{\rho}) + H(\rho),   (14)

where H(ρ) = −∑_i ρ_i log ρ_i denotes the entropy of ρ.

Negative log-likelihood. Autoregressive approaches model the joint distribution of the pixel values by factorizing it into a product of per-pixel conditional distributions,

p(u) = p(u_1, u_2, \ldots, u_n) = p(u_1) \prod_{i=2}^{n} p(u_i \mid u_1, \ldots, u_{i-1}).   (15)

In Guadarrama et al. (2017), such a factorization is applied to the (Cb, Cr) chrominance values of the pixels, where u^b_i denotes the Cb value of pixel i, u^r_i its Cr value, and u^{b,r}_i its (Cb, Cr)
chrominance. They train the model using maximum likelihood, with a cross-entropy
loss per pixel. Afterward, they perform high-resolution refinement to upscale the
chrominance image at the dimensions of the original grayscale image.
In Royer et al. (2017), a feed-forward network followed by an autoregressive
network is used to predict for each pixel a probability distribution over all possible
chrominances conditioned to the luminance. They work in the Lab color space.
p(u^{a,b} \mid L) is factorized again, as in (15) and (16), as the product of terms of the form p(u^{a,b}_i \mid u^{a,b}_1, \ldots, u^{a,b}_{i-1}, L), which are learned on a set of training images D by minimizing the negative log-likelihood of the chrominance channels in the training data:

\arg\min \sum_{u \in D} - \log p(u^{a,b} \mid L).   (17)
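In practice, when the conditional distributions are parameterized by per-pixel logits over discretized chrominance bins, the negative log-likelihood of Eq. (17) reduces to a per-pixel cross-entropy; the sketch below only illustrates this reduction, with shapes chosen for illustration.

```python
# Sketch of the per-pixel negative log-likelihood objective of Eq. (17) when the
# model outputs logits over discretized chrominance bins.
import torch.nn.functional as F

def nll(logits, target_bins):
    # logits: (N, bins, H, W); target_bins: (N, H, W) ground-truth bin indices
    return F.cross_entropy(logits, target_bins)
```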
Proposed Colorization Framework

In this section, we present the framework used to study the influence of the chosen objective loss on the estimated colorization results. First, we detail the architecture and, second, the dataset used for both training and testing. Note that
the same architecture and training procedure is used in Chap. 22, “Influence of
Color Spaces for Deep Learning Image Colorization” of this handbook.
Detailed Architecture
[Figure 1 diagram: conv2 to conv8 blocks with feature map sizes 256 × 256 × 64, 64 × 64 × 128, 64 × 64 × 256, 32 × 32 × 256, 32 × 32 × 512, 16 × 16 × 512, and a 256 × 256 × C output; legend: convolutional + ReLU, transpose convolution + BN, max pooling.]
Fig. 1 Summary of the baseline U-Net architecture used in our experiments. It outputs a 256 × 256 × C image, where C stands for the number of channels,
being equal to 2 when estimating the missing chrominance channels and to 3 when estimating the RGB components
encoder and decoder blocks are linked with skip connections: feature maps from the
encoder are concatenated with the ones from the corresponding upsampling path
and fused using 1 × 1 convolutions. More details can be found in Table 2.
The encoder architecture is identical to the CNN part of a VGG net-
work (Simonyan and Zisserman 2015). It allows us to start from pretrained weights
initially used for ImageNet classification.
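The following PyTorch fragment sketches the two ingredients just described: a VGG16 encoder initialized with ImageNet-pretrained weights, and a decoder block that upsamples with a transposed convolution plus batch normalization and fuses the skip connection through a 1 × 1 convolution. Channel sizes and block counts are illustrative, not the exact configuration of the chapter.

```python
# Rough sketch (not the exact architecture of the chapter) of a VGG16 encoder
# reused in a U-Net, with skip connections fused by 1x1 convolutions.
import torch
import torch.nn as nn
from torchvision import models

encoder = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features  # pretrained encoder

class UpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=1)  # 1x1 fusion of the skip

    def forward(self, x, skip):
        x = torch.relu(self.bn(self.up(x)))
        return torch.relu(self.fuse(torch.cat([x, skip], dim=1)))

# Final layer producing C = 2 (ab) or C = 3 (RGB) channels
head = nn.Conv2d(64, 2, kernel_size=1)
```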
The training settings are described as follows (a configuration sketch is given after the list):
• Optimizer: Adam
• Learning rate: 2e-5.
• Batch size: 16 images (10–11 GB RAM on Nvidia Titan V).
• All images are resized to 256 × 256 for training which enables using batches.
In practice, to keep the aspect ratio, the image is resized such that the smallest
dimension matches 256. If the other dimension remains larger than 256, we then
apply a random crop to obtain a square image. Note that the random crop is
performed using the same seed for all trainings.
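These settings translate, for instance, into the configuration sketch below; the stand-in model and the torchvision preprocessing pipeline are assumptions used only to make the listed hyperparameters concrete.

```python
# Training configuration matching the settings listed above: Adam, lr = 2e-5,
# batches of 16 images resized/cropped to 256 x 256.
import torch
import torch.nn as nn
from torchvision import transforms

# Resize the smallest side to 256 (keeping the aspect ratio), then random square crop
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])

model = nn.Conv2d(1, 2, kernel_size=3, padding=1)      # stand-in for the U-Net of Fig. 1
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
batch_size = 16                                        # ~10-11 GB on an Nvidia Titan V in this setup
```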
More details regarding this framework are given in Chap. 22, “Influence of
Color Spaces for Deep Learning Image Colorization”.
Quantitative Evaluation Metrics Used in Colorization Methods

For the last 20 years, colorization methods have mostly been evaluated with MAE, MSE, Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM) metrics (Wang et al. 2004).
In the context of colorization, the PSNR measures the ratio between the maximum value of a color target image u : Ω → R^C and the Mean Square Error (MSE) between u and a colorized image v : Ω → R^C, with Ω ⊂ Z² a discrete grid of size M × N. That is,

\mathrm{PSNR}(u, v) = 10 \log_{10} \frac{C\, M\, N\, \max(u)^2}{\| u - v \|_2^2},   (18)

where C = 3 when working in the RGB color space and C = 2 in any luminance–chrominance color space such as YUV, Lab, or YCbCr. The PSNR score is considered
as a reconstruction measure tending to favor methods that will output results as close
as possible to the ground truth image in terms of the MSE.
SSIM intends to measure the perceived change in structural information between two images. It combines three measures to compare images, luminance (l), contrast (c), and structure (s):

\mathrm{SSIM}(u, v) = l(u, v)\, c(u, v)\, s(u, v) = \frac{2\mu_u \mu_v + c_1}{\mu_u^2 + \mu_v^2 + c_1} \cdot \frac{2\sigma_u \sigma_v + c_2}{\sigma_u^2 + \sigma_v^2 + c_2} \cdot \frac{\sigma_{uv} + c_3}{\sigma_u \sigma_v + c_3},   (19)

where μ_u (resp. σ_u) is the mean value (resp. the standard deviation) of the values of image u and σ_{uv} the covariance of u and v. c_1, c_2, and c_3 are regularization constants used to stabilize the division for images with mean or standard deviation close to zero.
More recently, other perceptual metrics based on deep learning have been
proposed: the Fréchet Inception Distance (FID) (Heusel et al. 2017) and a Learned
Perceptual Image Patch Similarity (LPIPS) (Zhang et al. 2018). They have been
widely used in image editing for their ability to correlate well with human perceptual
similarity. FID (Heusel et al. 2017) is a quantitative measure used to evaluate the
quality of a generative model's outputs and which aims at approximating human
perceptual evaluation. It is based on the Fréchet distance (Dowson and Landau 1982)
which measures the distance between two multivariate Gaussian distributions. FID
is computed between the feature-wise mean and covariance matrices of the features
extracted from an Inception v3 neural network applied to the input images (μr , Σr )
and those of the generated images (μ_g, Σ_g):

\mathrm{FID} = \| \mu_r - \mu_g \|_2^2 + \mathrm{Tr}\big( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \big).   (20)

LPIPS (Zhang et al. 2018) compares deep features of the two images extracted by a pretrained network Φ, spatially averaged and weighted per layer:

\mathrm{LPIPS}(u, v) = \sum_l \frac{1}{H_l W_l} \sum_{i=1}^{H_l} \sum_{j=1}^{W_l} \| \omega_l \odot (\Phi_l(u)_{i,j} - \Phi_l(v)_{i,j}) \|_2^2,   (21)

where H_l (resp. W_l) is the height (resp. the width) of feature map Φ_l at layer l and ω_l are per-channel weights. Note that the features are unit-normalized in the channel dimension.
Other quantitative metrics can be found in the literature for image colorization.
Accuracy (Nazeri et al. 2018) measures the ratio between the number of pixels
that have the same color information as the source and the total number of pixels.
Raw accuracy (AuC) (Zhang et al. 2016) computes the percentage of predicted
pixel colors within a threshold of the L2 distance from the ground truth in ab
color space. The result is then swept across thresholds from 0 to 150 to produce a
cumulative mass function. Deshpande et al. (2017) evaluate colorfulness as the MSE
on histograms. Royer et al. (2017) verify if the framework produces vivid colors by
computing the average perceptual saturation (Lübbe 2010). Other works evaluate
the capability of a classification network to infer the right class for the generated
image (Zhang et al. 2016; He et al. 2018). Zhang et al. (2016) feed the generated
image to a classification network and observe if the classifier performs well.
Table 3 Evaluation metrics used by deep learning methods for image colorization. [For each surveyed method, the table indicates which quantitative metrics were used and whether a user study was conducted.]
Note that all models trained with an L2 loss are likely to obtain better PSNR or MSE scores, as these metrics are directly correlated with the L2 training loss.
Table 3 summarizes the quantitative evaluation metrics most commonly used in the image colorization literature. In our experiments, we choose to rely on the most commonly used and most recent ones, namely, L1 (MAE), L2 (MSE), PSNR, SSIM, LPIPS, and FID.
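For reference, the retained metrics can be computed as in the following hedged sketch, relying on scikit-image for PSNR/SSIM and on the lpips Python package for the VGG-based LPIPS; package versions, the [0, 1] data range, and the omission of FID are simplifying assumptions.

```python
# Hedged evaluation sketch: MAE, MSE, PSNR, SSIM via scikit-image, LPIPS via the
# `lpips` package. pred and gt are H x W x 3 RGB float arrays with values in [0, 1].
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_vgg = lpips.LPIPS(net="vgg")                 # VGG-based LPIPS, cf. Eq. (21)

def to_tensor(img):
    # (H, W, 3) in [0, 1] -> (1, 3, H, W) in [-1, 1], as expected by lpips
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() * 2.0 - 1.0

def evaluate(pred, gt):
    return {
        "MAE": float(np.abs(pred - gt).mean()),
        "MSE": float(((pred - gt) ** 2).mean()),
        "PSNR": peak_signal_noise_ratio(gt, pred, data_range=1.0),
        "SSIM": structural_similarity(gt, pred, channel_axis=-1, data_range=1.0),
        "LPIPS": float(lpips_vgg(to_tensor(pred), to_tensor(gt)).item()),
    }
```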
Experimental Analysis
To compare the influence of the objective loss on the resulting colorizations,
we train the network described in Section “Proposed Colorization Framework” by
changing the objective loss. In particular, we train the network with the L1 loss, the
L2 loss, the VGG-based LPIPS, the combination of WGAN plus L2 losses, and
the combination of WGAN and VGG-based LPIPS. To the best of our knowledge,
the combination of the VGG-based LPIPS loss with a WGAN training procedure is
novel and has not been proposed in the recent literature.
For each of these losses, depending on the chosen color space, we estimate:
• either the two (a, b) chrominance channels given the luminance channel L as
input;
• or the three (R, G, B) color channels given a grayscale image as input.
Quantitative Evaluation
Table 4 shows the quantitative results comparing five losses, namely, the L1 loss,
the L2 loss, the VGG-based LPIPS, the combination of WGAN plus L2 losses, and
Table 4 Quantitative evaluation of colorization results for different loss functions. Metrics are
used to compare the ground truth to every image in the 40k test set. Best and second best results by
column are in bold and italics, respectively
Color space Loss function MAE ↓ MSE ↓ PSNR ↑ SSIM ↑ LPIPS ↓ FID ↓
Lab L1 0.04407 0.00589 22.3020 0.9268 0.1587 8.8109
Lab L2 0.04488 0.00585 22.3283 0.9250 0.1613 8.1517
Lab LPIPS 0.04374 0.00566 22.4699 0.9228 0.1403 3.2221
Lab WGAN+L2 0.04459 0.00582 22.3512 0.9243 0.1609 7.6127
Lab WGAN+LPIPS 0.04383 0.00568 22.4541 0.9223 0.1406 3.1045
RGB L1 0.04385 0.00587 22.3119 0.9268 0.1583 8.0125
RGB L2 0.04458 0.00587 22.3136 0.9255 0.1606 7.4223
RGB LPIPS 0.04573 0.00577 22.3892 0.9196 0.1429 3.0576
RGB WGAN+L2 0.05256 0.00651 21.8667 0.8559 0.2469 15.4780
RGB WGAN+LPIPS 0.04901 0.00679 21.6806 0.9137 0.1495 2.6719
the combination of WGAN and VGG-based LPIPS (denoted in Table 4 as L1, L2,
LPIPS, WGAN+L2, and WGAN+LPIPS, respectively). The first five rows display
this assessment when the used color space is Lab (i.e., the model estimates the two
ab chrominance channels), while for the last five rows, the used color space is RGB
(i.e., the model estimates the three RGB color channels). In particular, let us remark
that the quantitative evaluations are always performed in the final RGB color space.
Thus, even when the model is trained to estimate the ab chrominance channels,
the resulting Lab color image is converted to the RGB color space to compute the
evaluation metrics.
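This conversion step can be sketched with scikit-image as follows; the L/ab value ranges follow scikit-image conventions and are assumptions about the rest of the pipeline.

```python
# Sketch of the Lab -> RGB conversion performed before evaluation: the predicted
# ab channels are stacked with the input luminance L and converted with
# scikit-image (L in [0, 100], a/b roughly in [-128, 127]).
import numpy as np
from skimage import color

def lab_prediction_to_rgb(L, ab):
    lab = np.concatenate([L[..., None], ab], axis=-1)   # (H, W, 3) Lab image
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```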
From the results in Table 4, we observe that for the analyzed dataset, the models
trained with the VGG-based LPIPS loss function provide overall better quantitative
results, for both Lab and RGB color spaces. This is especially true for the perceptual
metrics LPIPS and FID, as they are strongly correlated to this loss function. The fact that the VGG-based LPIPS training loss is computed in the RGB color space (as this loss relies on a pre-trained VGG expecting RGB images as input) and is also used as one of the quantitative metrics might be related to this performance (see also Chap. 22
“Influence of Color Spaces for Deep Learning Image Colorization”). In the same
spirit, we can observe a slight correlation between the used training loss and the
quantitative metric. For instance, when training with L1, MAE results are better.
However, the L2 loss does not rank first for any of the metrics, even though one could have expected it to in the case of MSE or PSNR.
Nevertheless, no strong tendency clearly emerges from this table: for many
metrics, the different losses do not differ so much from one another and could be in
the margin of error. From our analysis, we hypothesize that, apart from the chosen objective function, the network architecture design and the training process may play a very important role as a prior on the colorization operator. Further analysis
will be done on that matter.
Fig. 2 Examples where multiple objects are in the same image. Five losses are compared, namely,
L1, L2, LPIPS, WGAN+L2, and WGAN+LPIPS. The used color space is Lab for all the cases
(i.e., the model estimates two ab chrominance channels)
Qualitative Evaluation
Fig. 3 Examples to evaluate shininess of the results. Five losses are compared, namely, L1, L2,
LPIPS, WGAN+L2, and WGAN+LPIPS. The used color space is Lab for all the cases
Fig. 4 Colorization results on images that contain objects with strong structures and that have
been seen many times in the training set. Five losses are compared, namely, L1, L2, LPIPS,
WGAN+L2, and WGAN+LPIPS. The used color space is Lab for all the cases
In Fig. 2, we can see some results obtained for each of the studied losses in
images with multiple objects. We can observe that each loss brings slightly different
colors to objects. Overall, VGG-based LPIPS and WGAN losses generate shinier
and more colorful images (it can be seen, for instance, in the sky, grass, and
vegetables), although we can observe colorful examples in the case of the L2 loss
in the example of the flowers or vegetables. However, WGAN hallucinates more
unrealistic colors as can be seen on the table or the wall on the image with a flower
Fig. 5 Examples where multiple objects are in the same image. Five losses are compared, namely,
L1, L2, LPIPS, WGAN+L2, and WGAN+LPIPS perceptual. The used color space is RGB for all
the cases (i.e., the model estimates three RGB color channels)
of the last row of Fig. 2. This effect can be reduced by improving architecture and
semantic features (e.g., Vitoria et al. 2020) or by introducing spatial localization
(e.g., Su et al. 2020). Besides, by comparing the two last columns obtained with the
models trained with the adversarial strategy WGAN combined with, respectively,
the L2 or the VGG-based LPIPS, one can observe that WGAN+VGG-based LPIPS
tends to homogenize colors (e.g., some of the balloons take a color similar to the sky in the second row; the flowers in the fifth row have grayish colors, more similar to the wall). WGAN+VGG-based LPIPS also tends to produce less color bleeding than
WGAN+L2.
The generation of more vivid colors with VGG-based LPIPS and WGAN losses is also visible in Fig. 3. The grass and bushes are greener and look more natural. However, none of the losses gives consistent colors to all the limbs of the tennis player in the first row (e.g., the right leg).
Figure 4 shows results on objects, here a zebra and a stop sign, with strong contours that were highly present in the training set. The colorization of these objects is impressive for any loss. None of the losses manages to properly colorize the person
Fig. 6 Examples to evaluate shininess of the results. Five losses are compared, namely, L1, L2,
LPIPS, WGAN+L2, and WGAN+LPIPS perceptual. The used color space is RGB for all the cases
Fig. 7 Colorization results on images that contain objects which have strong structures and that
have been seen many times in the training set. Four losses are compared, namely, L1, L2, LPIPS,
and WGAN+L2. The used color space is RGB for all the cases
near the center car in the first row. This type of example could be improved by learning high-level semantics of the image content.
Figures 5, 6, and 7 show an additional experimental comparison of five losses,
namely, L1, L2, VGG-based LPIPS, WGAN+L2, and WGAN+VGG-based LPIPS,
but when the network is trained to learn the three RGB color channels for all the
cases. For these test images, more realistic and consistent results are obtained in
general for this configuration. Let us notice from the results in these three figures
that more colorful images are obtained compared to the ones of Figs. 2, 3, and 4,
although less textured. Further analysis on the influence of the chosen color space
can be found in Chap. 22, “Influence of Color Spaces for Deep Learning Image
Colorization”.
Fig. 8 Examples on original black and white images. These colorization results have been obtained
using the five networks trained, respectively, with L1, L2, LPIPS, WGAN+L2, and WGAN+LPIPS
losses, and learning the two ab chrominance channels
Fig. 9 Examples on original black and white images. These colorization results have been obtained
using the five networks trained, respectively, with L1, L2, LPIPS, WGAN+L2, and WGAN+LPIPS
losses, and learning the three RGB color channels
Generalization to Archive Images

Finally, in Figs. 8 and 9, we can see additional colorization results on real black and white images from the Pascal VOC dataset. These results have been obtained using the networks trained with the five different losses, namely, L1, L2, VGG-based LPIPS, WGAN+L2, and WGAN+VGG-based LPIPS. For Fig. 8, only the two
ab chrominance channels are learned, while in Fig. 9, the three RGB color channels
are learned. Again, none of the losses manages to consistently colorize the skin of all the people in the images of the first, second, and fourth rows of Fig. 8, although results are arguably slightly better when using the perceptual and GAN losses. Notice that, also in these cases, the colors are slightly more vivid, which is especially visible in the first two rows of Fig. 8. However, color inconsistencies and failures in spatial localization appear, most visibly in the first four rows. As mentioned, this effect can be reduced
by introducing semantic information (e.g., Vitoria et al. 2020) or spatial localization
(e.g., Su et al. 2020).
Conclusion
In this chapter, we have studied the role of loss functions on automatic colorization
with deep learning methods. Using a fixed standard network, we have shown that
the choice of the right loss does not seem to play a crucial role in the colorization
results. We therefore argue that most efforts should be made on the influence of
the architecture design, as it is related to the type of colorization operator one can
expect to obtain. Indeed, in our analysis, we used a U-Net-based architecture, which has been shown to have a strong impact on the experimental results. For the employed
architecture, the models including the VGG-based LPIPS loss function provide
overall slightly better results, especially for the perceptual metrics LPIPS and FID.
Likewise, the role of both architectures and losses for obtaining a real diversity of
colorization results could be explored in future works.
Acknowledgments This study has been carried out with financial support from the French
Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01) and from the
EU Horizon 2020 research and innovation programme NoMADS (Marie Skłodowska-Curie
grant agreement No 777826). The first and fourth authors acknowledge partial support by
MICINN/FEDER UE project, ref. PGC2018-098625-B-I00, and RED2018-102511-T. This chapter
was written together with another chapter of the current handbook, called Chap. 22, “Influence
of Color Spaces for Deep Learning Image Colorization”. All authors have contributed to both
chapters.
References
Antic, J.: Deoldify. https://github.com/jantic/DeOldify (2019)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein Generative Adversarial Networks. In:
International Conference on Machine Learning, vol 70, pp. 214–223 (2017)
Cao, Y., Zhou, Z., Zhang, W., Yu, Y.: Unsupervised diverse colorization via Generative Adversarial
Networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in
Databases, pp. 151–166 (2017)
Chen, X., Mishra, N., Rohaninejad, M., Abbeel, P.: Pixelsnail: an improved autoregressive
generative model. In: International Conference on Machine Learning, pp. 864–872 (2018)
Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: IEEE International Conference on Computer
Vision, pp. 415–423 (2015)
Deshpande, A., Lu, J., Yeh, M.-C., Jin Chong, M., Forsyth, D.: Learning diverse image
colorization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6837–
6845 (2017)
Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Comparison of full-reference image quality models
for optimization of image processing systems. Int. J. Comput. Vis. 129(4), 1258–1281 (2021)
Dowson, D., Landau, B.: The Fréchet distance between multivariate normal distributions. J.
Multivar. Anal. 12(3), 450–455 (1982)
Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. J. Vis. 16(12), 326
(2016)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems
(2014)
Guadarrama, S., Dahl, R., Bieber, D., Norouzi, M., Shlens, J., Murphy, K.: Pixcolor: pixel recursive
colorization. In: British Machine Vision Conference (2017)
Gu, S., Timofte, R., Zhang, R.: Ntire 2019 challenge on image colorization: report. In: Conference
on Computer Vision and Pattern Recognition Workshops (2019)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of
Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5769–5779
(2017)
He, M., Chen, D., Liao, J., Sander, P.V., Yuan, L.: Deep exemplar-based colorization. ACM Trans.
Graph. 37(4), 1–16 (2018)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-
scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information
Processing Systems, vol. 30 (2017)
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T.: Axial attention in multidimensional
transformers (2019). arXiv preprint arXiv:1912.12180
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Let there be color!: joint end-to-end learning of global
and local image priors for automatic image colorization with simultaneous classification. ACM
Trans. Graph. 35(4), 1–11 (2016)
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial
networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134
(2017)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-
resolution. In: European Conference on Computer Vision, pp. 694–711 (2016)
Kong, G., Tian, H., Duan, X., Long, H.: Adversarial edge-aware image colorization with semantic
segmentation. IEEE Access 9, 28194–28203 (2021)
Kumar, M., Weissenborn, D., Kalchbrenner, N.: Colorization transformer (2021). arXiv preprint
arXiv:2102.04432
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization.
In: European Conference on Computer Vision, pp. 577–593 (2016)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.:
Microsoft COCO: common objects in context. In: European Conference on Computer Vision,
pp. 740–755 (2014)
Lübbe, E.: Colours in the Mind-Colour Systems in Reality: A Formula for Colour Saturation.
BoD–Books on Demand, Norderstedt (2010)
Mouzon, T., Pierre, F., Berger, M.-O.: Joint CNN and variational model for fully-automatic image
colorization. In: Scale Space and Variational Methods in Computer Vision, pp. 535–546 (2019)
Nazeri, K., Ng, E., Ebrahimi, M.: Image colorization using Generative Adversarial Networks. In:
International Conference on Articulated Motion and Deformable Objects, pp. 85–94 (2018)
Oord, A.V.D., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., Kavukcuoglu, K.: Con-
ditional image generation with PixelCNN decoders. In: Advances in Neural Information
Processing Systems (2016)
Pierre, F., Aujol, J.-F.: Recent approaches for image colorization. In: Handbook of Mathematical
Models and Algorithms in Computer Vision and Imaging (2020)
Pierre, F., Aujol, J.-F., Bugeau, A., Papadakis, N., Ta, V.-T.: Luminance-chrominance model for
image colorization. SIAM J. Imag. Sci. 8(1), 536–563 (2015)
Pucci, R., Micheloni, C., Martinel, N.: Collaborative image and object level features for image
colourisation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2160–
2169 (2021)
Riba, E., Mishkin, D., Ponsa, D., Rublee, E., Bradski, G.: Kornia: an open source differentiable
computer vision library for PyTorch. In: Winter Conference on Applications of Computer
Vision, pp. 3674–3683 (2020)
Royer, A., Kolesnikov, A., Lampert, C.H.: Probabilistic image colorization. In: British Machine
Vision Conference (2017)
Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural
Information Processing Systems, vol. 30 (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition.
In: International Conference on Learning Representations (2015)
Su, J.-W., Chu, H.-K., Huang, J.-B.: Instance-aware image colorization. In: IEEE Conference on
Computer Vision and Pattern Recognition, pp. 7968–7977 (2020)
Van Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In:
International Conference on Machine Learning, pp. 1747–1756 (2016)
Vitoria, P., Raad, L., Ballester, C.: ChromaGAN: adversarial picture colorization with semantic
class distribution. In: Winter Conference on Applications of Computer Vision, pp. 2445–2454
(2020)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error
visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., Choo, J.: Coloring with limited data: few-shot
colorization via memory augmented networks. In: IEEE Conference on Computer Vision and
Pattern Recognition (2019)
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: European Conference on
Computer Vision, pp. 649–666 (2016)
Zhang, R., Zhu, J.-Y., Isola, P., Geng, X., Lin, A.S., Yu, T., Efros, A.A.: Real-time user-guided
image colorization with learned deep priors. ACM Trans. Graph. 36, 1–11 (2017)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of
deep features as a perceptual metric. In: IEEE Conference on Computer Vision and Pattern
Recognition, pp. 586–595 (2018)
Influence of Color Spaces for Deep Learning
Image Colorization 22
Aurélie Bugeau, Rémi Giraud, and Lara Raad
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
On Color Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
Review of Colorization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
Datasets Used in Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
Proposed Colorization Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
Detailed Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
Training and Testing Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862
Learning Strategy for Different Color Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863
Analysis of the Influence of Color Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863
Quantitative Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 866
Qualitative Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868
Generalization to Archive Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 872
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875
A. Bugeau ()
LaBRI, CNRS, UMR5800, Univ. Bordeaux, F-33400 Talence, France
Institut universitaire de France (IUF), Paris, France
e-mail: [email protected]
R. Giraud
Univ. Bordeaux, CNRS, IMS UMR5251, Bordeaux INP, F-33400 Talence, France
e-mail: [email protected]
L. Raad
LIGM, CNRS, Univ Gustave Eiffel, F-77454 Marne-la-Vallée, France
e-mail: [email protected]
Abstract
Colorization is a process that converts a grayscale image into a colored one that
looks as natural as possible. Over the years this task has received a lot of atten-
tion. Existing colorization methods rely on different color spaces: RGB, YUV,
Lab, etc. In this chapter, we aim to study their influence on the results obtained
by training a deep neural network, to answer the following question: “Is it crucial
to correctly choose the right color space in deep learning-based colorization?”
First, we briefly summarize the literature and, in particular, deep learning-based
methods. We then compare the results obtained with the same deep neural
network architecture with RGB, YUV, and Lab color spaces. Qualitative and
quantitative analyses do not conclude similarly on which color space is better. We
then show the importance of carefully designing the architecture and evaluation
protocols depending on the types of images that are being processed and their
specificities: strong/small contours, few/many objects, recent/archive images.
Keywords
Introduction
to work directly on RGB to cope with this limitation by constraining the luminance
channel (Pierre et al. 2014).
The objective of this chapter is to analyze the influence of color spaces
on the results of automatic deep learning methods for image colorization. This
chapter comes together with another chapter of this handbook. This other chapter,
Chap. 21, “Analysis of Different Losses for Deep Learning Image Colorization”,
focuses on the influence of losses. We refer the reader to it for a review of the tradi-
tionally used different losses and evaluation metrics. Here, after reviewing existing
works in image colorization and, in particular, works based on deep learning, we
will focus on the influence of color spaces. Based on our analysis of the literature,
a baseline architecture is defined and later used in all comparisons. Additionally,
again based on the literature review, we set a uniform training procedure to ensure
fair comparisons. Experiments encompass qualitative and quantitative analysis.
The chapter is organized as follows. Section “Related Work” first recalls some
basics on color spaces and then provides a detailed survey of the literature on
colorization methods and finally lists the datasets traditionally used. Next, in
section “Proposed Colorization Framework”, we present the chosen architecture
and in section “Learning Strategy for Different Color Spaces” the learning strategy.
Section “Analysis of the Influence of Color Spaces” presents the results of the
different experiments. A discussion on the generalization of this work to archive
images is later provided in section “Generalization to Archive Images” before a
conclusion is drawn.
Related Work
On Color Spaces
This section presents the different color spaces that have been used for colorization
in the literature. For more information about color theory and color constancy (i.e.,
the underlying ability of human vision to perceive colors very robustly with respect
to changes of illumination), see, for instance, Ebner (2007) and Fairchild (2013).
Colored images are traditionally saved in the RGB color space. A grayscale
image contains only one channel that encodes the luminosity (perceived brightness
of that object by a human observer) or the luminance (absolute amount of light
emitted by an object per unit area). A way to model this luminance Y, which is close to the human perception of luminance, is Y = 0.299 R + 0.587 G + 0.114 B. Colorization in a luminance–chrominance color space then amounts to the retrieval of two chrominance channels given the luminance Y. There exist several
luminance-chrominance spaces. Two of them are mostly used for colorization. The
first one, YUV, historically used for a specific analog encoding of color information
in television systems, is the result of the linear transformation:
\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.14713 & -0.28886 & 0.436 \\ 0.615 & -0.51498 & -0.10001 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.
The reverse conversion from YUV to RGB is simply obtained by inverting the
matrix. The other linear space that has been used for colorization is YCbCr.
The CIELAB color space, also referred to as Lab or L*a*b*, defined by the
International Commission on Illumination (CIE) in 1976, is also frequently used
for colorization. It has been designed such that the distances between colors in this
space correspond to the perceptual distances of colors for a human observer. The
three channels become uncorrelated. The transformation from RGB to Lab (and the
reverse) is nonlinear. First, it is necessary to convert the RGB values to the CIEXYZ
color space:
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 2.769 & 1.7518 & 1.13 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.0565 & 5.5943 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.
The L*, a*, and b* components are then given by L* = 116 f(Y/Y_n) − 16, a* = 500 (f(X/X_n) − f(Y/Y_n)), and b* = 200 (f(Y/Y_n) − f(Z/Z_n)), where (X_n, Y_n, Z_n) are the coordinates of the reference white point, with

f(t) = \begin{cases} t^{1/3} & \text{if } t > \left(\frac{6}{29}\right)^3, \\ \frac{1}{3} \left(\frac{29}{6}\right)^2 t + \frac{4}{29} & \text{otherwise.} \end{cases}
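These conversions can be sketched as follows; the YUV matrix is the one quoted in the text, while the Lab conversion is delegated to scikit-image, whose white point and sRGB assumptions may differ slightly from the formulas given here.

```python
# Sketch of the color-space conversions discussed above: linear RGB <-> YUV with
# the matrix quoted in the text, and Lab via scikit-image.
import numpy as np
from skimage import color

RGB2YUV = np.array([[0.299, 0.587, 0.114],
                    [-0.14713, -0.28886, 0.436],
                    [0.615, -0.51498, -0.10001]])

def rgb_to_yuv(img):                       # img: (H, W, 3) floats in [0, 1]
    return img @ RGB2YUV.T

def yuv_to_rgb(img):                       # inverse linear transform
    return img @ np.linalg.inv(RGB2YUV).T

def rgb_to_lab(img):                       # nonlinear conversion through CIEXYZ
    return color.rgb2lab(img)
```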
Table 1 Color spaces used in deep learning methods for image colorization. [Columns: the surveyed methods, grouped as using GANs, histogram prediction, user guided, diverse, object aware, and survey (e.g., He et al. 2018; Su et al. 2020; Antic 2019). Rows: the color spaces (RGB, YUV, YCbCr, Lab, hue/chroma) and comparisons of several spaces, with marks indicating which methods use each one.]
Review of Colorization Methods

This section presents an overview of the colorization methods in the three categories: scribble-based, exemplar-based, and deep learning. For a more detailed review with the same classification, we refer the reader to the recent review by Li et al. (2020). Another survey focused on deep learning approaches proposes a taxonomy
to separate these methods into seven categories (Anwar et al. 2020). The authors of
this review have redrawn all network architectures, thus allowing to easily compare architectural specificities. Comparisons of methods are made on a new Natural-Color
Dataset made of objects with white background.
Fig. 1 Example of scribble-based image colorization taken from Levin et al. (2004). The user draws color scribbles that are successively diffused to neighboring pixels under constraints that depend on the different methods
these approaches propose a global optimization over the image, thus leading to
spatial consistency in the result.
Fig. 2 Principle of exemplar-based image colorization. Methods in this category have proposed
different strategies to search for similar patches and techniques to add spatial consistency when copying
patch colors
Gupta et al. (2012) extract different features from the superpixels (Ren and Malik
2003) of the target image and match them with the source ones. The final colors
are computed by imposing spatial consistency as in Levin et al. (2004). Li et al.
(2017b) extract low- and high-level features on superpixels of the reference to
form a dictionary; colorization is then cast as a dictionary-based sparse reconstruction problem.
Sparse representation was previously used for colorization in Pang et al. (2013)
where images are segmented from scribbles. These approaches incorporate local
consistency into automatic methods via segmentation. In Charpiat et al. (2008),
spatial consistency is solved with graph cuts after estimating for each pixel the
conditional probability of colors. In Bugeau et al. (2014) and Pierre et al. (2014)
each pixel can only take its chrominance (or RGB color) among a reduced set of
possible candidates chosen from the reference image. The final color is chosen using
a variational formulation. In the same trend, Fang et al. (2019) propose a superpixel-
based variational model. In Li et al. (2017a), the distribution of intensity deviation
for uniform and nonuniform regions is learned and used in a Markov random field
(MRF) model for improved consistency. Finally, Li et al. (2019) propose cross-scale
local texture matching, the results of which are then fused using global graph-cut optimization.
A major problem of this family of methods is the high dependency on the
reference image. Chia et al. (2011) therefore propose to rely on several reference
images obtained from an Internet search based on semantic information.
For instance, the paper that won both tracks of the NTIRE 2019 Challenge on Image Colorization (Gu et al. 2019) was the end-to-end method proposed by IPCV_IIMT. It implements an encoder-decoder structure that resembles a U-
Net with the encoder built using deep dense-residual blocks. Wan et al. (2020a)
proposed to combine neural networks with color propagation. It first trains a neural
network in order to colorize interest points of extracted superpixels. Then those
colors are propagated by optimizing an objective function. In an older work, Iizuka
et al. (2016) presented an end-to-end colorization framework based on CNNs to
infer the ab channels of the CIE Lab color space. This work is built on the basis
that a classification of the images can help to provide global priors that will improve
the colorization performance. The network extracts global and local features and is
jointly trained for classification and colorization in a labeled dataset.
Using GANs: Still being end-to-end, other methods use generative adversarial
networks (GANs) (Goodfellow et al. 2014). Isola et al. (2017) propose the so-
called image-to-image method pix2pix. It maps an input image to an output image
using a U-Net generator and a patch GANs discriminator. The method is used
in many applications including colorization. This method was extended in Nazeri
et al. (2018) using deep convolutional GAN (DCGAN) (Radford et al. 2016). In
Cao et al. (2017), a fully convolutional generator with a conditional GAN is
considered. This architecture does not use downsampling to avoid extracting global
features which are not suitable to recover accurate boundaries. To avoid noise
attenuation and make the colorization results more diversified, they concatenate a
noise channel onto the first half of the generator layers. GANs have also been used
in chromaGAN (Vitoria et al. 2020) which extends Iizuka et al. (2016) by proposing
to learn the semantic image distribution without any need of a labeled dataset.
This method combines three losses: a color error loss by computing MSE on ab
channels, a class distribution loss by computing the Kullback-Leibler divergence on
VGG-16 class distribution vectors, and an adversarial Wasserstein GAN (WGAN)
loss (Arjovsky et al. 2017). To prevent the need for training on a huge amount of
data, Yoo et al. (2019) introduce MemoPainter, a few-shot colorization framework.
MemoPainter is able to colorize an image with limited data by using an external
memory network in addition to a colorization network. The memory network learns
to retrieve a color feature that best matches the ground-truth color feature of
the query image, while the generator-discriminator colorization network learns to
effectively inject the color feature to the target grayscale image.
of a U-Net architecture trained as follows: the generator is first trained with the
perceptual loss (Johnson et al. 2016), followed by training the critic as a binary
classifier distinguishing between real images and those generated by the generator,
and finally the generator and critic are trained together in an adversarial manner on
1–3% of the ImageNet (Deng et al. 2009) data. The latter is the so-called NoGAN
strategy which is enough to add color realism to the results and which also allows
to avoid flickering across video frames while the colorization is applied individually
frame per frame.
Considering user priors: Few methods give the possibility to add user inputs as
additional priors. The architecture in Zhang et al. (2017) learns to propagate color
hints by fusing low-level cues and high-level semantic information. He et al. (2018)
use a reference colored image to guide the output of their deep exemplar-based
colorization method.
Restoring and colorizing: Luo et al. (2020) propose to specifically restore and
colorize old black and white portrait photos in a unified framework. It uses an
additional high-quality color reference image (the sibling) automatically generated
by first training a network that projects images into the StyleGAN2 (Karras et al.
2020) latent space and then uses the pretrained StyleGAN2 generator to create the
sibling. Fine details and colors are extracted from the sibling. A latent code is
then optimized through a three-term cost function and decoded by a StyleGAN2
generator yielding a high-quality color version of the antique input. The cost
function is composed of a color term inspired by the style loss in Gatys et al. (2016a)
between the features of the sibling and those of the generated high-quality colored
image, a perceptual term (Johnson et al. 2016) between a degraded version of the
generative model’s output and the antique input, and a contextual term between the
VGG features of the sibling and those of the generated high-quality colored image.
Decomposing the scene into objects: Recently, some methods try to explicitly
deal with the decomposition of the scene into objects in order to tackle one of the
main drawbacks of most deep learning-based colorization methods which is color
bleeding across different objects. Su et al. (2020) proposed to colorize a grayscale
image in an instance-aware fashion. They train three separate networks: a first one
that performs global colorization, a second one that achieves instance colorization,
and a third one that fuses both colorization networks. These networks are trained
by minimizing the Huber loss (also called smooth-L1 loss). In general, fusing both results enhances the global colorization. The instances per image are
obtained by using a standard pretrained object detection network, Mask R-CNN (He
et al. 2017). Pucci et al. (2021) propose to improve Zhang et al. (2016) by using a
network which is more aware of image instances, in the spirit of Su et al. (2020),
by combining convolutional and capsule networks. They train from end to end a
single network which first generates a per-pixel color distribution followed by a
final convolutional layer that recovers the missing chrominance channels as opposed
to Zhang et al. (2016) that computes the annealed mean on the per-pixel color
distribution network’s output. They train the network by minimizing the cross-
entropy between per pixel color distributions and L2 loss on the chrominance
channels. Kong et al. (2021) propose to colorize a grayscale image by training
a multitask network for colorization and semantic segmentation in an adversarial
manner. They train a U-Net-type network with a three-term cost function: a color
regression loss in terms of hue, saturation, and lightness; the cross-entropy on the
ground-truth and generated semantic labels; and a GAN term. The main objective
of the proposal is to reduce color bleeding across edges.
Table 2 summarizes all these deep learning methods, providing details on their particular inputs (other than the grayscale image), their outputs, their architectures, and their pre- and post-processing steps. This summary table is only provided for deep learning-based methods since we focus on deep learning-based strategies in the remainder of the chapter.
Table 2 Short description of deep networks for image colorization: their inputs (other than the grayscale image), their outputs, and their post-processing. FCONV stands for fully convolutional, FC for fully connected, and U-Net for a U-Net-like network (not the vanilla U-Net)

Method | Additional inputs | Network | Network's output | Post-processing
Cheng et al. (2015) | Handcrafted features | 3 layers FC | UV | Joint bilateral filtering
Iizuka et al. (2016) | − | CNNs (local/global) | ab | Upsampling
Wan et al. (2020a) | Superpixels' handcrafted features | FC net | Interest points' color | Propagation and refinement
Using GANs
Vitoria et al. (2020) | − | CNNs (local/global) + PatchGAN | ab | Upsampling
Nazeri et al. (2018) | − | U-Net (Isola et al. 2017) + DCGAN | Lab | −
Cao et al. (2017) | − | FCONV generator with multi-layer noise + PatchGAN | UV/RGB (diverse) | −
Yoo et al. (2019) | Color thief features | Colorization U-Net + memory nets + noise | − | −
Antic (2019) | − | U-Net + self-attention + GAN | RGB | YUV conversion + cat(original Y/UV) + RGB conversion
Histogram prediction
Larsson et al. (2016) | − | VGG-16 + FC layers | Distributions | Expectation
Zhang et al. (2016) | − | VGG-styled net | Distributions | Annealed mean
Mouzon et al. (2019) | − | Zhang et al. (2016) | Distributions | Variational model
User guided
Zhang et al. (2017) | User points, global histograms, and average saturation | U-Net | Distributions + ab | −
He et al. (2018) | Color reference | Similarity sub-net + U-Net (gray VGG-19) | Bidirectional similarity maps + ab | −
Diverse colorization and autoregressive models
Deshpande et al. (2017) | − | cVAE + MDN | Diverse colorization | −
Guadarrama et al. (2017) | − | PixelCNN + CNN | Diverse colorization | −
Royer et al. (2017) | − | CNN + PixelCNN++ | Diverse colorization | −
Kumar et al. (2021) | − | Axial transformer + color/spatial upsamplers (self-attention blocks) | Diverse colorization | −
Object aware
Su et al. (2020) | Object bounding boxes | U-Net (global/instance) + CNN (fusion) | ab | −
Pucci et al. (2021) | − | CNN + capsule net | ab | −
Kong et al. (2021) | − | U-Net + PatchGAN | ab + semantic segmentation | −
Survey
Gu et al. (2019) | − | U-Net | RGB | −
This summary table is only provided for deep learning-based methods since we focus
on deep learning-based strategies in the remainder of the chapter.
To train and test the deep learning methods presented in previous subsection,
different datasets have been used. Table 3 summarizes the use of these datasets
in colorization methods. They contain from one thousand images (DIV2K (Agustsson
and Timofte 2017)) to millions of images (ImageNet (Deng et al. 2009)). Image
dimensions also vary a lot, from 32 × 32 in CIFAR-10 (Krizhevsky et al. 2009) to
2K resolution in DIV2K.
Other differences concern the content of the images themselves. Some datasets are
very specific to a type of image: faces (LFW (Huang et al. 2007)) and bedrooms
(LSUN (Yu et al. 2015)). Others present various scenes, such as Places (Zhou et al. 2017)
with 205 scene categories, COCO (Lin et al. 2014) with 80 object categories and 91
stuff categories, and SUN (Xiao et al. 2010) with 899 scene categories.
Table 3 Overview of the datasets (Pascal VOC, CIFAR-10, DIV2K, COCO, Places, SUN, and
others) used for training and testing by the colorization methods discussed above
In this section, we present the framework that we will use for evaluating the influ-
ence of color spaces on image colorization results. First, we detail the architecture
and, second, the dataset used for training and testing.
Note that the same architecture and training procedure are used in Chap. 21,
“Analysis of Different Losses for Deep Learning Image Colorization” of this
handbook.
Detailed Architecture
Fig. 4 Summary of the baseline U-Net architecture used in our experiments. It outputs a 256 ×
256 × C image, where C stands for the number of channels, being equal to 2 when estimating the
missing chrominance channels and to 3 when estimating the RGB components
• Optimizer: Adam
• Learning rate: 2e-5 as in ChromaGAN (Vitoria et al. 2020).
• Batch size: 16 images (approx. 11 GB RAM usage on Nvidia Titan V).
• All images are resized to 256 × 256 for training, which enables using batches.
In practice, to keep the aspect ratio, the image is resized such that the smallest
dimension matches 256. If the other dimension remains larger than 256, we then
apply a random crop to obtain a square image. Note that the random crop is
performed using the same seed for all trainings; a minimal sketch of this
preprocessing is given below.
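The following sketch illustrates this resizing strategy, assuming PyTorch and torchvision are
available; the seed value and the exact transform pipeline are illustrative choices, not the
authors' training code.

```python
# Minimal sketch (assumptions: PyTorch/torchvision; SEED is a placeholder value) of the
# resizing strategy described above: keep the aspect ratio, then random-crop to a square.
import torch
from torchvision import transforms

SEED = 0                      # hypothetical value; the text only says the seed is fixed
torch.manual_seed(SEED)       # same seed for all trainings -> identical crop sequences

train_transform = transforms.Compose([
    transforms.Resize(256),       # smallest dimension is resized to 256, aspect ratio kept
    transforms.RandomCrop(256),   # random square crop when the other dimension exceeds 256
    transforms.ToTensor(),        # HxWxC PIL image -> CxHxW float tensor in [0, 1]
])
```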
Throughout our experiments we use the COCO dataset (Lin et al. 2014), containing
various natural images of different sizes. COCO is divided into three sets that
approximately contain 118k, 5k, and 40k images that, respectively, correspond to
the training, validation, and test sets. Note that we carefully remove all grayscale
images, which represent around 3% of the overall amount of each set. Although
larger datasets such as ImageNet have been regularly used in the literature, COCO
offers a sufficient number and a good variety of images so we can efficiently train
and compare numerous models.
Learning Strategy for Different Color Spaces
The goal of the whole colorization process is to generate RGB images that look
visually natural. When training on different color spaces, one must decide in which
color space the losses are computed and when the conversion back to RGB is
performed. In this chapter, we propose to experiment with three learning strategies
to compare RGB, YUV, and Lab color spaces (see Fig. 5):
• RGB: in this case, the network takes as input a grayscale image L and directly
estimates a three-channel RGB image of size 256 × 256 × 3. The loss is computed
directly in the RGB color space. This strategy is illustrated in Fig. 5a.
• YUV and Lab Luminance/chrominance: in this case, the network takes as input a
grayscale image considered as the luminance (L for Lab, Y for YUV) and outputs
two chrominance channels (a, b or U , V ). The loss compares the output with
the corresponding chrominance channels of the ground-truth image converted
to the luminance/chrominance space. After concatenating the initial luminance
channel to the inferred chrominances, the image is converted back to RGB for
visualization purposes. This strategy is illustrated in Fig. 5b.
• LabRGB: as in the previous case, the network takes as input the luminance and
estimates the corresponding two chrominance channels. After concatenating with
the corresponding luminance channel, they are converted to the RGB color space
and the loss is computed directly there. Notice that in this last case, as the loss
is computed on RGB color space, the conversion must be done in a way that is
differentiable to be able to compute the gradient and allow the backpropagation
step. We perform the color conversion using the color module in the Kornia
library. Kornia (Riba et al. 2020) is a differentiable library that consists of a set of
routines and differentiable modules to solve generic computer vision problems. It
allows classical computer vision tasks to be integrated into deep learning models.
Computing the loss on RGB images instead of chrominance ones makes it possible to
ensure that images are similar to the ground truth after the clipping operation needed
to fit into the RGB cube. This strategy is illustrated in Fig. 5c (a minimal code
sketch is given right after this list).
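To make the LabRGB strategy concrete, the following is a minimal sketch of a loss computed in
RGB after a differentiable Lab-to-RGB conversion with Kornia; the colorization network `net`
and the use of the mean squared error are placeholders for illustration, not the chapter's
exact model.

```python
# Minimal sketch of the LabRGB strategy (Fig. 5c), assuming PyTorch and Kornia.
# 'net' is any colorization network mapping a luminance channel (B,1,H,W) to two
# chrominance channels (B,2,H,W); it is a placeholder, not the chapter's exact model.
import torch
import torch.nn.functional as F
import kornia

def labrgb_loss(net, rgb_gt: torch.Tensor) -> torch.Tensor:
    """Compute the L2 loss in RGB after a differentiable Lab -> RGB conversion."""
    lab_gt = kornia.color.rgb_to_lab(rgb_gt)           # L in [0,100], ab roughly [-128,127]
    L = lab_gt[:, :1]                                  # network input: luminance only
    ab_pred = net(L)                                   # predicted chrominance channels
    rgb_pred = kornia.color.lab_to_rgb(torch.cat([L, ab_pred], dim=1))
    rgb_pred = rgb_pred.clamp(0.0, 1.0)                # clipping into the RGB cube
    return F.mse_loss(rgb_pred, rgb_gt)                # loss computed directly in RGB
```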
Remark. During training, all images are resized to 256 × 256. One advantage of
using luminance/chrominance spaces is that only chrominance channels are resized.
It is therefore possible to keep the original content of the luminance channels
without manipulating it with the resizing steps.
This section presents quantitative and qualitative results obtained with the three
strategies discussed above. For this analysis, we have considered, as loss functions,
the L2 loss and the VGG-based LPIPS, which was introduced in Ding et al. (2021)
as a generalization of the feature loss (Johnson et al. 2016). These loss functions are
defined hereafter.
Fig. 5 The three learning strategies for the different color spaces: (a) RGB, (b) YUV and Lab
luminance/chrominance, and (c) LabRGB
MSE or squared L2 loss. The L2 loss between two functions u and v defined
on Ω and with values in R^C, C ∈ N, is defined as the squared L2 norm of their
difference. That is,
MSE(u, v) = ‖u − v‖^2_{L^2(Ω; R^C)} = ∫_Ω ‖u(x) − v(x)‖_2^2 dx.    (2)
In the discrete setting, for images of size M × N with C channels, it reads
MSE(u, v) = Σ_{i=1}^{M} Σ_{j=1}^{N} Σ_{k=1}^{C} (u_{i,j,k} − v_{i,j,k})^2.    (3)
Feature Loss. The feature reconstruction loss (Gatys et al. 2016b; Johnson et al.
2016) is a perceptual loss that encourages images to have similar feature represen-
tations as the ones computed by a pretrained network, denoted here by Φ. Let Φ_l(u)
be the activation of the l-th layer of the network Φ when processing the image u; if l
is a convolutional layer, then Φ_l(u) will be a feature map of size C_l × W_l × H_l. The
feature reconstruction loss is the normalized squared Euclidean distance between
feature representations, that is,
L^l_feat(u, v) = (1 / (C_l W_l H_l)) ‖Φ_l(u) − Φ_l(v)‖_2^2.    (4)
It penalizes the output reconstructed image when it deviates in feature content from
the target.
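As an illustration of (4), the following sketch evaluates the feature reconstruction loss with
a pretrained VGG-16 from torchvision (version 0.13 or later is assumed for the weights API);
the choice of the relu2_2 layer and the omission of ImageNet input normalization are
simplifications made for the example, not choices taken from the chapter.

```python
# Minimal sketch of the feature reconstruction loss (4); inputs are (B,3,H,W) RGB tensors
# in [0, 1]. ImageNet mean/std normalization is omitted here for brevity.
import torch
from torchvision.models import vgg16, VGG16_Weights

_vgg_features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:9].eval()  # up to relu2_2
for p in _vgg_features.parameters():
    p.requires_grad_(False)

def feature_loss(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Normalized squared distance between VGG feature maps of u and v, averaged over the batch."""
    fu, fv = _vgg_features(u), _vgg_features(v)
    c, h, w = fu.shape[1:]
    return ((fu - fv) ** 2).sum(dim=(1, 2, 3)).mean() / (c * h * w)
```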
LPIPS. LPIPS (Zhang et al. 2018) computes a weighted L2 distance between deep
features of a pair of images u and v:
LPIPS(u, v) = Σ_l (1 / (H_l W_l)) Σ_{i=1}^{H_l} Σ_{j=1}^{W_l} ‖ω_l ⊙ (Φ_l(u)_{i,j} − Φ_l(v)_{i,j})‖_2^2,    (5)
where H_l (resp. W_l) is the height (resp. the width) of the feature map Φ_l at layer l and ω_l
is the weight for each feature channel. Note that features are unit-normalized in the channel
dimension. We will denote by VGG-based LPIPS the case where the feature maps Φ_l are taken from
a VGG network.
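In practice, the VGG-based LPIPS can be evaluated with the reference lpips package of
Zhang et al. (2018); a minimal sketch, assuming RGB inputs in [0, 1], is given below.

```python
# Minimal sketch of the VGG-based LPIPS evaluation, assuming the reference 'lpips'
# package of Zhang et al. (2018) is installed.
import torch
import lpips

_lpips_vgg = lpips.LPIPS(net='vgg')   # VGG backbone with learned per-channel weights w_l

def lpips_vgg(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """LPIPS distance between two (B,3,H,W) RGB images given in [0, 1]."""
    # the package expects inputs scaled to [-1, 1]
    return _lpips_vgg(2.0 * u - 1.0, 2.0 * v - 1.0).mean()
```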
Note that to compute the VGG-based LPIPS loss, the output colorization always
has to be converted to RGB, even for YUV and Lab color spaces (as in Fig. 5c),
because this loss is computed with a pretrained VGG expecting RGB images as
input. Since VGG-based LPIPS is computed on RGB images, the two strategies
Lab and LabRGB are the same. For more details on the various losses usually
used in colorization, we refer the reader to Chap. 21, “Analysis of Different
Losses for Deep Learning Image Colorization”. Our experiments have shown that the
same conclusions can be drawn with other losses.
For testing, we apply the network to images at their original resolution, while
training is done on batches of square 256 × 256 images.
Quantitative Evaluation
MAE or L1 loss with ℓ^1-coupling. The mean absolute error is defined as the L1
loss with ℓ^1-coupling, that is,
MAE(u, v) = ∫_Ω ‖u(x) − v(x)‖_{ℓ^1} dx = Σ_{k=1}^{C} ∫_Ω |u_k(x) − v_k(x)| dx.    (6)
In the discrete setting, it coincides with the sum of the absolute differences
|u_{i,j,k} − v_{i,j,k}|. Some authors use an ℓ^2-coupled version of it:
MAE_c(u, v) = Σ_{i=1}^{M} Σ_{j=1}^{N} ( Σ_{k=1}^{C} (u_{i,j,k} − v_{i,j,k})^2 )^{1/2}.    (7)
PSNR. The PSNR measures the ratio between the maximum value of a color target
image u : Ω → R^C and the mean square error (MSE) between u and a colorized
image v : Ω → R^C, with Ω ⊂ Z^2 a discrete grid of size M × N. That is,
PSNR(u, v) = 10 log_{10}( (M N C · max(u)^2) / MSE(u, v) ),    (8)
where C = 3 when working in the RGB color space and C = 2 in any luminance-
chrominance color space such as YUV, Lab, and YCbCr. The PSNR score is considered
as a reconstruction measure tending to favor methods that will output results as close
as possible to the ground-truth image in terms of the MSE.
SSIM. The structural similarity index (Wang et al. 2004) compares the luminance,
contrast, and structure of the two images:
SSIM(u, v) = l(u, v) c(u, v) s(u, v)
           = [ (2 μ_u μ_v + c_1) (2 σ_u σ_v + c_2) (σ_{uv} + c_3) ] / [ (μ_u^2 + μ_v^2 + c_1) (σ_u^2 + σ_v^2 + c_2) (σ_u σ_v + c_3) ],    (9)
where μ_u (resp. σ_u) is the mean value (resp. the standard deviation) of the image u values and
σ_{uv} the covariance of u and v. c_1, c_2, c_3 are regularization constants that are used to
stabilize the division for images with mean or standard deviation close to zero.
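For completeness, the following sketch computes PSNR and SSIM for one image pair with
scikit-image (version 0.19 or later is assumed for the channel_axis argument); it is an
illustration, not the evaluation code used for Table 5.

```python
# Minimal sketch of the PSNR/SSIM evaluation on a single image pair; images are HxWx3
# float arrays in [0, 1], ground truth first.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(gt: np.ndarray, pred: np.ndarray) -> tuple:
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```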
FID. FID (Heusel et al. 2017) is a quantitative measure used to evaluate the quality
of a generative model's outputs and which aims at approximating human perceptual
evaluation. It is based on the Fréchet distance (Dowson and Landau 1982), which
measures the distance between two multivariate Gaussian distributions. FID is
computed from the feature-wise mean and covariance matrix of the Inception v3 features
of the ground-truth images, (μ_r, Σ_r), and those of the generated images, (μ_g, Σ_g):
FID((μ_r, Σ_r), (μ_g, Σ_g)) = ‖μ_r − μ_g‖_2^2 + Tr( Σ_r + Σ_g − 2 (Σ_r Σ_g)^{1/2} ).    (10)
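The formula (10) can be evaluated directly once the Inception v3 features of the real and
generated sets have been extracted; a minimal NumPy/SciPy sketch is given below (the feature
extraction itself is assumed to be done elsewhere).

```python
# Minimal sketch of the FID formula (10), assuming the Inception-v3 features of the real
# and generated image sets are already available as two (N, d) arrays.
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_real: np.ndarray, feat_gen: np.ndarray) -> float:
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    sigma_r = np.cov(feat_real, rowvar=False)
    sigma_g = np.cov(feat_gen, rowvar=False)
    cov_sqrt = sqrtm(sigma_r @ sigma_g)          # matrix square root of Sigma_r Sigma_g
    if np.iscomplexobj(cov_sqrt):                # small imaginary parts are numerical noise
        cov_sqrt = cov_sqrt.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(sigma_r + sigma_g - 2.0 * cov_sqrt))
```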
The results are presented in Table 5. In terms of these metrics, the best results are
obtained with YUV color space except for L1 and Fréchet Inception Distance, even
if not by much. The results in Table 5 also indicate that Lab does not outperform
other color spaces when using a classic reconstruction loss (L2), while better
results are obtained when using the VGG-based LPIPS. Thus, using a feature-
based reconstruction loss is better suited as was already the case in exemplar-based
image colorization methods where different features for patch-based metrics were
proposed for matching pixels. LabRGB strategy gets the worst quantitative results
based on Table 5. One would expect to get the “best of both” color spaces while
recovering from the loss of information in the conversion process. However, this is
not reflected with these particular evaluation metrics. The LabRGB line for VGG-
based LPIPS is not included, as it would be identical to the Lab one. Also, note
that the quantitative evaluation is performed on RGB images as opposed to training
which is done for specific color spaces (RGB, YUV, Lab, and LabRGB).
Table 5 Quantitative evaluation of colorization results for different color spaces. Metrics are used
to compare ground truth to every image in the 40k test set. Best and second best results by column
are in bold and italicized, respectively
Color space | Loss function | L1 ↓ | L2 ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓
RGB | L2 | 0.04458 | 0.00587 | 22.3136 | 0.9255 | 0.1606 | 7.4223
YUV | L2 | 0.04469 | 0.00562 | 22.5052 | 0.9278 | 0.1593 | 7.6642
Lab | L2 | 0.04488 | 0.00585 | 22.3283 | 0.9250 | 0.1613 | 8.1517
LabRGB | L2 | 0.04608 | 0.00589 | 22.2989 | 0.9209 | 0.1698 | 8.3413
RGB | LPIPS | 0.04573 | 0.00577 | 22.3892 | 0.9197 | 0.1429 | 3.0576
YUV | LPIPS | 0.04460 | 0.00557 | 22.5438 | 0.9097 | 0.1400 | 3.3260
Lab | LPIPS | 0.04374 | 0.00566 | 22.4699 | 0.9228 | 0.1403 | 3.2221
Qualitative Evaluation
In this section, we qualitatively analyze the results obtained by training the network
with different color spaces as explained in section “Learning Strategy for Different
Color Spaces”.
Figure 6 shows results on images and objects (here person skiing, stop sign
and zebra) with strong contours that were highly present in the training set. The
colorization of these images is really impressive for any color space. Nevertheless,
YUV has the tendency to sometimes create artifacts that are not predictable. This
is visible with the blue stain in the YUV-L2 zebra and the yellow spot in the YUV-
LPIPS zebra. One can also notice that the overall colorization tends to be more
homogeneous with LabRGB-L2 than with Lab-L2, as can be seen, for instance,
on the wall behind the stop signs, the grass, and the tree leaves in the zebra image,
which suggests that it might be better to compute losses over RGB images. A similar
remark is valid for the VGG-based LPIPS results, as can be seen, for instance, in the
homogeneous colorization of the sky in the skiing image, where the loss is
again computed over the RGB image. This indicates that there could be an additional
influence on the results when using the VGG-based LPIPS, given that the predicted
colorized image is converted back to RGB before backpropagation.
Figure 7 presents results on images where the final colorization is not consistent
over the whole image. On the first row, the color of the water is stopped by the
chair legs. On the second row, the colors of the grass and the sky are not always
similar on both sides of the hydrant. LabRGB seems to reduce this effect. This
happens when strong contours stop the colorization, independently of
the color space. Global coherency can only be obtained if the receptive field is
large enough and if the self-similarities present in natural images are preserved. These
results highlight that effort must be put into the design of architectures that would
impose these constraints.
Fig. 6 Colorization results with different color spaces on images that contain objects, have strong
structures, and have been seen many times in the training set. The first three rows use the L2 loss
and the last three the VGG-based LPIPS
One major problem in automatic colorization results comes from color bleeding
that occurs as soon as contours are not strong enough. Figure 8 illustrates this
problem in different contexts. On the first row, the color from the flowers bleeds
to the wall. On the second row, the green of the grass bleeds to the shorts. Finally,
on the last row, the green of the grass bleeds to the neck of the background cow.
These effects are independent of the color space and of the loss. Some methods
reduce this effect by introducing semantic information (e.g., Vitoria et al. 2020)
or spatial localization (e.g., Su et al. 2020), while others manage to reduce it by
considering segmentation as an additional task (e.g., Kong et al. 2021). Note that
with the VGG-based LPIPS, the Lab color space provides a more realistic result on the
tennis player image.
Finally, Fig. 9 presents colorization of images containing many different objects.
We see that the final colors might depend on the color space and are more diverse
and colorful with the Lab color space. The LabRGB strategy with the L2 loss is probably
the most realistic, a statement that also holds with the VGG-based LPIPS.
Fig. 7 Colorization results with different color spaces on images that exhibit strong structures that
may lead to inconsistent spatial colors. The first two rows use the L2 loss and the last two the
VGG-based LPIPS
The qualitative evaluation does not point to the same conclusion as the quanti-
tative one. According to Table 5, the best colorization is obtained for YUV color
space. However, the qualitative analysis shows that even if in some cases colors
are brighter and more saturated, in other ones YUV creates unpredictable color stains
(yellowish and bluish). This raises the question of the necessity of designing specific
metrics for the colorization task, which should be combined with user studies. Also,
in the qualitative evaluation, one can observe that when working with LabRGB
instead of Lab, the overall colorization result looks more stable and homogeneous
as opposed to what is concluded in the quantitative evaluation.
Summary of qualitative analysis: Our analysis leads us to the following conclu-
sions:
• There is no major difference in the results regarding the color space that is used.
• YUV color space sometimes generates color artifacts that are hardly predictable.
This is probably due to clipping that is necessary to remain in the color space
range of values.
• More realistic and consistent results are obtained when losses are computed in
the RGB color space.
• There is no evidence justifying why most colorization methods in the literature
choose to work with Lab. One can assume that this is mainly done to ease the
colorization problem by working in a perceptual luminance-chrominance color
space. In addition, differentiable color conversion libraries were not available up
to 2020 to apply a strategy as in Fig. 5c. In fact, the qualitative results show
that when training on RGB, the luminance reconstruction is satisfactory in all
examples. Hence, there is no obvious reason not to work directly in the RGB
color space.
• Same conclusions hold with different losses.
Fig. 8 Colorization results with different color spaces on images that contain small contours
which lead to color bleeding. The first two rows use the L2 loss and the last two the
VGG-based LPIPS
Fig. 9 Colorization results with different color spaces on images that contain several small objects
which end up with different colors depending on the color spaces used. The first three rows use the
L2 loss and the last three the VGG-based LPIPS
Fig. 10 Colorization results with different color spaces and L2 or VGG-based LPIPS on archive
black and white images
enable artists to reach high-quality images but require long human inter-
vention. The current pipeline for professional colorization usually starts with
restoration: denoising, deblurring, completion, and super-resolution with off-the-shelf
tools (e.g., Diamant) and manual correction. Next, images are segmented into
objects and manually colorized by specialists with a color spectrum that must be
historically and artistically correct.
Automatic colorization methods could at least help professionals in the last step.
Very few papers in the literature tackle old black and white images’ colorization. In
deep learning-based approaches, Vitoria et al. (2020) and Antic (2019) present some
results on Legacy Black and White Photographs, while Luo et al. (2020) restore and
colorize old black and white portraits. Wan et al. (2020b) focus on the restoration of
old photos by training two variational autoencoders (VAE) to project clean and old
photos to two latent spaces and to learn the translation between these latent spaces
on synthetic paired data. Old photos are synthesized using Pascal VOC dataset’s
images.
Conclusion
This chapter has presented the role of color spaces in automatic colorization
with deep learning. Using a fixed standard network, we have shown, qualitatively
and quantitatively, that the choice of the right color space is not straightforward and
might depend on several factors such as the architecture or the type of images. With
our architecture, the best quantitative results are obtained in YUV, while qualitative
results rather teach us to compute losses in the RGB color space. We therefore
argue that most effort should be put into the architecture design. Furthermore,
for all methods the final step consists in clipping final values to fit in the RGB
color cube. This abrupt operation sometimes leads to artifacts with saturated pixels.
An interesting topic for future research would be a model that learns a
projection into the color cube while preserving good image quality, similar to the
geometric model from Pierre et al. (2015b). Future works should also include the
development of methods that would make it possible to produce several outputs, in
the same spirit as HistoGAN (Afifi et al. 2021). Finally, although the purpose of colorization
is often to enhance old black and white images, research papers rarely focus on this
application. Strategies for better training or transfer learning must be developed
in the future along with complete architectures that perform colorization together
with other quality improvement methods such as super resolution, denoising, or
deblurring.
Acknowledgments This study has been carried out with financial support from the French
Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01) and from the
EU Horizon 2020 research and innovation program NoMADS (Marie Skłodowska-Curie grant
agreement No 777826). This chapter was written together with another chapter of the current
handbook, Chap. 21, “Analysis of Different Losses for Deep Learning Image Colorization”.
All authors have contributed to both chapters.
References
Afifi, M., Brubaker, M.A., Brown, M.S.: HistoGAN: controlling colors of gan-generated and real
images via color histograms. In: IEEE Conference on Computer Vision and Pattern Recognition,
pp. 7941–7950 (2021)
Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: dataset and
study. In: Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135
(2017)
Antic, J.: Deoldify (2019). https://github.com/jantic/DeOldify
Anwar, S., Tahir, M., Li, C., Mian, A., Khan, F.S., Muzaffar, A.W.: Image colorization: a survey
and dataset (2020). arXiv preprint arXiv:2008.10774
Arbelot, B., Vergne, R., Hurtut, T., Thollot, J.: Automatic texture guided color transfer and
colorization. In: Expressive, Elsevier, pp. 21–32 (2016)
Arbelot, B., Vergne, R., Hurtut, T., Thollot, J.: Local texture-based color transfer and colorization.
Comput. Graph. 62, 15–27 (2017)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In:
International Conference on Machine Learning, vol. 70, pp. 214–223 (2017)
Bugeau, A., Ta, V.-T.: Patch-based image colorization. In: International Conference on Pattern
Recognition, pp. 3058–3061 (2012)
Bugeau, A., Ta, V.-T., Papadakis, N.: Variational exemplar-based image colorization. IEEE Trans.
Image Process. 23(1), 298–307 (2014)
Cao, Y., Zhou, Z., Zhang, W., Yu, Y.: Unsupervised diverse colorization via generative adversarial
networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in
Databases, pp. 151–166 (2017)
Charpiat, G., Hofmann, M., Schölkopf, B.: Automatic image colorization via multimodal
predictions. In: European Conference on Computer Vision, pp. 126–139 (2008)
Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: IEEE International Conference on Computer
Vision, pp. 415–423 (2015)
Chia, A.Y.-S., Zhuo, S., Gupta, R.K., Tai, Y.-W., Cho, S.-Y., Tan, P., Lin, S.: Semantic colorization
with internet images. In: ACM SIGGRAPH ASIA (2011)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical
image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–
255 (2009)
Deshpande, A., Rock, J., Forsyth, D.: Learning large-scale automatic image colorization. In: IEEE
International Conference on Computer Vision (2015)
Deshpande, A., Lu, J., Yeh, M.-C., Jin Chong, M., Forsyth, D.: Learning diverse image
colorization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6837–
6845 (2017)
Di Blasi, G., Reforgiato, D.: Fast colorization of gray images. In: Eurographics Italian,
Eurographics Association (2003)
Ding, X., Xu, Y., Deng, L., Yang, X.: Colorization using quaternion algebra with automatic scribble
generation. In: Advances in Multimedia Modeling, pp. 103–114 (2012)
Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Comparison of full-reference image quality models
for optimization of image processing systems. Int. J. Comput. Vis. 129(4), 1258–1281 (2021)
Dowson, D., Landau, B.: The Fréchet distance between multivariate normal distributions. J.
Multivar. Anal. 12(3), 450–455 (1982)
Drew, M.S., Finlayson, G.D.: Improvement of colorization realism via the structure tensor. Int. J.
Image Graph. 11(4), 589–609 (2011)
Ebner, M.: Color Constancy, vol. 7. Wiley, Hoboken (2007)
Efros, A., Leung, T.: Texture synthesis by non-parametric sampling. In: IEEE International
Conference on Computer Vision, pp. 1033–1038 (1999)
Fairchild, M.D.: Color Appearance Models. Wiley, Hoboken (2013)
Fang, F., Wang, T., Zeng, T., Zhang, G.: A superpixel-based variational model for image
colorization. IEEE Trans. Vis. Comput. Graph. 26(10), 2931–2943 (2019)
Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural net-
works. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423
(2016a)
Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. J. Vis. 16(12), 326
(2016b)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems
(2014)
Guadarrama, S., Dahl, R., Bieber, D., Norouzi, M., Shlens, J., Murphy, K.: Pixcolor: pixel recursive
colorization. In: British Machine Vision Conference (2017)
Gu, S., Timofte, R., Zhang, R.: Ntire 2019 challenge on image colorization: report. In: Conference
on Computer Vision and Pattern Recognition Workshops (2019)
Gupta, R.K., Chia, A.Y.-S., Rajan, D., Ng, E.S., Zhiyong, H.: Image colorization using similar
images. In: ACM International Conference on Multimedia, pp. 369–378 (2012)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 2961–2969 (2017)
He, M., Chen, D., Liao, J., Sander, P.V., Yuan, L.: Deep exemplar-based colorization. ACM Trans.
Graph. 37(4), 1–16 (2018)
Heu, J., Hyun, D.-Y., Kim, C.-S., Lee, S.-U.: Image and video colorization based on prioritized
source propagation. In: IEEE International Conference on Image Processing, pp. 465–468
(2009)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-
scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information
Processing Systems, vol. 30 (2017)
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T.: Axial attention in multidimensional
transformers. arXiv preprint arXiv:1912.12180 (2019)
Huang, Y.-C., Tung, Y.-S., Chen, J.-C., Wang, S.-W., Wu, J.-L.: An adaptive edge detection based
colorization algorithm and its applications. In: ACM International Conference on Multimedia,
pp. 351–354 (2005)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for
studying face recognition in unconstrained environments. Technical Report 07-49, University
of Massachusetts, Amherst (2007)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Let there be color!: joint end-to-end learning of global
and local image priors for automatic image colorization with simultaneous classification. ACM
Trans. Graph. 35(4), 1–11 (2016)
Irony, R., Cohen-Or, D., Lischinski, D.: Colorization by example. In: Eurographics Conference on
Rendering Techniques (2005)
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial
networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134
(2017)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-
resolution. In: European Conference on Computer Vision, pp. 694–711 (2016)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the
image quality of stylegan. In: IEEE Conference on Computer Vision and Pattern Recognition,
pp. 8110–8119 (2020)
Kawulok, M., Kawulok, J., Smolka, B.: Discriminative textural features for image and video
colorization. IEICE Trans. Inf. Syst. 95-D(7), 1722–1730 (2012)
Kong, G., Tian, H., Duan, X., Long, H.: Adversarial edge-aware image colorization with semantic
segmentation. IEEE Access 9, 28194–28203 (2021)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical
report, University of Toronto (2009)
Kumar, M., Weissenborn, D., Kalchbrenner, N.: Colorization transformer. arXiv preprint
arXiv:2102.04432 (2021)
Lagodzinski, P., Smolka, B.: Digital image colorization based on probabilistic distance transfor-
mation. In: 50th International Symposium ELMAR, vol. 2, pp. 495–498 (2008)
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization.
In: European Conference on Computer Vision, pp. 577–593 (2016)
Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. ACM Trans. Graph. 23(3),
689–694 (2004)
Lézoray, O., Ta, V.-T., Elmoataz, A.: Nonlocal graph regularization for image colorization. In:
International Conference on Pattern Recognition, pp. 1–4 (2008)
Li, B., Lai, Y.-K., Rosin, P.L.: Example-based image colorization via automatic feature selection
and fusion. Neurocomputing 266, 687–698 (2017a)
Li, B., Zhao, F., Su, Z., Liang, X., Lai, Y.-K., Rosin, P.L.: Example-based image colorization
using locality consistent sparse representation. IEEE Trans. Image Process. 26(11), 5188–5202
(2017b)
Li, B., Lai, Y.-K., John, M., Rosin, P.L.: Automatic example-based image colorization using
location-aware cross-scale matching. IEEE Trans. Image Process. 28(9), 4606–4619 (2019)
Li, B., Lai, Y.-K., Rosin, P.L.: A review of image colourisation. In: Handbook of Pattern
Recognition and Computer Vision, p. 139. World Scientific, Singapore (2020)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.:
Microsoft COCO: common objects in context. In: European Conference on Computer Vision,
pp. 740–755 (2014)
Ling, Y., Au, O.C., Pang, J., Zeng, J., Yuan, Y., Zheng, A.: Image colorization via color propagation
and rank minimization. In: IEEE International Conference on Image Processing, pp. 4228–4232
(2015)
Liu, S., Zhang, X.: Automatic grayscale image colorization using histogram regression. Pattern
Recogn. Lett. 33(13), 1673–1681 (2012)
Luan, Q., Wen, F., Cohen-Or, D., Liang, L., Xu, Y.-Q., Shum, H.-Y.: Natural image colorization.
In: Eurographics Conference on Rendering Techniques, pp. 309–320 (2007)
Luo, X., Zhang, X., Yoo, P., Martin-Brualla, R., Lawrence, J., Seitz, S.M.: Time-travel
rephotography. arXiv preprint arXiv:2012.12261 (2020)
Mouzon, T., Pierre, F., Berger, M.-O.: Joint CNN and variational model for fully-automatic image
colorization. In: Scale Space and Variational Methods in Computer Vision, pp. 535–546 (2019)
Nazeri, K., Ng, E., Ebrahimi, M.: Image colorization using generative adversarial networks. In:
International Conference on Articulated Motion and Deformable Objects, pp. 85–94 (2018)
Oord, A.V.D., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., Kavukcuoglu, K.: Con-
ditional image generation with PixelCNN decoders. In: Advances in Neural Information
Processing Systems (2016)
Pang, J., Au, O.C., Tang, K., Guo, Y.: Image colorization using sparse representation. In: IEEE
International Conference on Acoustics, Speech, and Signal Processing, pp. 1578–1582 (2013)
Pierre, F., Aujol, J.-F.: Recent approaches for image colorization. In: Handbook of Mathematical
Models and Algorithms in Computer Vision and Imaging. Springer (2020)
Pierre, F., Aujol, J.-F., Bugeau, A., Ta, V.-T.: A unified model for image colorization. In: European
Conference on Computer Vision Workshops, pp. 297–308 (2014)
Pierre, F., Aujol, J.-F., Bugeau, A., Papadakis, N., Ta, V.-T.: Luminance-chrominance model for
image colorization. SIAM J. Imaging Sci. 8(1), 536–563 (2015a)
Pierre, F., Aujol, J.-F., Bugeau, A., Ta, V.-T.: Luminance-Hue Specification in the RGB Space.
In: Scale Space and Variational Methods in Computer Vision, pp. 413–424. Springer, Cham
(2015b)
Pucci, R., Micheloni, C., Martinel, N.: Collaborative image and object level features for image
colourisation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2160–
2169 (2021)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional
generative adversarial networks. In: International Conference on Learning Representations
(2016)
Ren, X., Malik, J.: Learning a classification model for segmentation. In: IEEE International
Conference on Computer Vision, pp. 10–17 (2003)
Riba, E., Mishkin, D., Ponsa, D., Rublee, E., Bradski, G.: Kornia: an open source differentiable
computer vision library for PyTorch. In: Winter Conference on Applications of Computer
Vision, pp. 3674–3683 (2020)
Royer, A., Kolesnikov, A., Lampert, C.H.: Probabilistic image colorization. In: British Machine
Vision Conference (2017)
Salimans, T., Karpathy, A., Chen, X., Kingma, D.P.: PixelCNN++: improving the Pixel-
CNN with discretized logistic mixture likelihood and other modifications. arXiv preprint
arXiv:1701.05517 (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition.
In: International Conference on Learning Representations (2015)
Su, J.-W., Chu, H.-K., Huang, J.-B.: Instance-aware image colorization. In: IEEE Conference on
Computer Vision and Pattern Recognition, pp. 7968–7977 (2020)
Tai, Y.-W., Jia, J., Tang, C.-K.: Local color transfer via probabilistic segmentation by expectation-
maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 747–754
(2005)
Vitoria, P., Raad, L., Ballester, C.: ChromaGAN: adversarial picture colorization with semantic
class distribution. In: Winter Conference on Applications of Computer Vision, pp. 2445–2454
(2020)
Wan, S., Xia, Y., Qi, L., Yang, Y.-H., Atiquzzaman, M.: Automated colorization of a grayscale
image with seed points propagation. IEEE Trans. Multimedia 22(7), 1756–1768 (2020a)
Wan, Z., Zhang, B., Chen, D., Zhang, P., Chen, D., Liao, J., Wen, F.: Bringing old photos back to
life. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2747–2757 (2020b)
Wang, S., Zhang, Z.: Colorization by matrix completion. In: AAAI Conference on Artificial
Intelligence (2012)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error
visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Welsh, T., Ashikhmin, M., Mueller, K.: Transferring color to greyscale images. ACM Trans. Graph.
21(3), 277–280 (2002)
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene
recognition from abbey to zoo. In: IEEE Conference on Computer Vision and Pattern
Recognition, pp. 3485–3492 (2010)
Yao, Q., James, T.K.: Colorization by patch-based local low-rank matrix completion. In: AAAI
Conference on Artificial Intelligence (2015)
Yatziv, L., Sapiro, G.: Fast image and video colorization using chrominance blending. IEEE Trans.
Image Process. 15(5), 1120–1129 (2006)
Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., Choo, J.: Coloring with limited data: few-shot
colorization via memory augmented networks. In: IEEE Conference on Computer Vision and
Pattern Recognition (2019)
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: LSUN: construction of a large-scale
image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365
(2015)
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: European Conference on
Computer Vision, pp. 649–666 (2016)
Zhang, R., Zhu, J.-Y., Isola, P., Geng, X., Lin, A.S., Yu, T., Efros, A.A.: Real-time user-guided
image colorization with learned deep priors. ACM Trans. Graph. 36, 1–11 (2017)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of
deep features as a perceptual metric. In: IEEE Conference on Computer Vision and Pattern
Recognition, pp. 586–595 (2018)
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database
for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2017)
Variational Model-Based Deep Neural
Networks for Image Reconstruction 23
Yunmei Chen, Xiaojing Ye, and Qingchao Zhang
Contents
Introduction  880
Learned Algorithm for Specified Optimization Problem  882
Structured Image Reconstruction Networks  885
  Proximal Point Network  886
  ISTA-Net  888
  ADMM-Net  891
  Variational Network  894
  Primal-Dual Network  896
  Learnable Descent Algorithm  898
Concluding Remarks  901
References  905
Abstract
Keywords
Introduction
The variational method has been one of the most mature and effective approaches for
solving inverse problems in imaging (Aubert and Vese 1997; Dal Maso et al.
1992; Koepfler et al. 1994; Scherzer et al. 2009). In the context of image
reconstruction, the inverse problem can be formulated as an optimization in a
general form as follows:
min g(u) + h(u), (1)
u
where u is the image to be reconstructed, h(u) is the data fidelity that measures the
discrepancy between u and the acquired data (often in the transformed domain), and
g(u) is a regularization term which imposes the prior knowledge or our preference
on the solution u.
To instantiate the variational method (1), we may consider the image recon-
struction problem with total-variation (TV) regularization for compressive sensing
magnetic resonance imaging (CS-MRI) in the discretized form: Suppose that the
gray-scale image u to be reconstructed is defined on the two-dimensional √n × √n
mesh grid (thus a total of n pixels) representing its square domain [0, 1]^2. Then u
can be interpreted as a vector in Rn where its ith component ui ∈ R is the integral
(or average) of the image intensity value over the ith pixel for i = 1, . . . , n. MRI
scanners can acquire the Fourier coefficients of u, from which one can recover u
simply by applying inverse Fourier transform. For fast imaging in CS-MRI, we only
acquire a fraction of Fourier coefficients b ∈ Cm with m < n, which relates to
u by b = P Fu + e where F ∈ Cn×n is the discrete Fourier transform matrix,
P ∈ Rm×n is a binary selection matrix (one entry as 1 and the rest as 0 in each row)
indicating the indices of the sampled Fourier coefficients, and e ∈ Cm represents the
unknown noise in data acquisition. Then the data fidelity term h(u) in (1) can be set
to (1/2) · P Fu − b22 . For fast imaging, m is often much smaller than n and hence
we need additional regularization g(u) in (1) to ensure robust and stable recovery of
u. TV is one of the most commonly used regularization in image reconstruction–the
simplified version of TV in the discrete setting is T V (u) = ni=1 Di u2 where
Di ∈ R2×n is binary and has only two nonzero entries (1 and −1) corresponding
to the forward finite difference approximations to partial derivatives along the
coordinate axes at pixel i. Hence the regularization can be set to g(u) = μ T V (u)
for some user-chosen weight parameter μ > 0 in (1). The motivation of using TV as
regularization is that images with small TV tend to have distinct constant intensity
values in different regions and sharp intensity change on the boundary between two
regions, hence displaying the included objects with clear intensity contrasts. The
minimization in (1) thus reflects the principle of the variational method for image
recovery—we want to find the minimizer u such that it is consistent to the observed
data (small value of h(u)) and meanwhile has desired regularity (small value of
g(u)). To this point, (1) becomes an optimization problem of u ∈ Rn , for which we
can apply a proper numerical optimization algorithm and solve for u from (1).
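As an illustration of the discrete TV regularizer above, the following NumPy sketch evaluates
TV(u) with forward finite differences and replicated boundary values; it is a simplified example
written for this exposition, not part of any reconstruction algorithm discussed later.

```python
# Minimal sketch of the discrete (isotropic) TV regularizer for a square image stored
# as a sqrt(n) x sqrt(n) array.
import numpy as np

def total_variation(u: np.ndarray) -> float:
    """TV(u) = sum_i ||D_i u||_2 with forward differences and replicated last row/column."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal forward differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical forward differences
    return float(np.sqrt(dx ** 2 + dy ** 2).sum())
```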
The variational method yields a concise and elegant formulation of image
reconstruction as in (1). It has achieved great success in image reconstruction thanks
to the fast developments of numerical optimization techniques in the past decades.
However, there are several main issues associated with this approach.
The first issue with (1) is the choice of regularization g(u). There are numerous
regularization terms proposed in the literature. Although many of them have proven
robust in practice, they are often overly simplified and cannot capture the fine details
in medical images which are critical in diagnosis and treatment. For example, TV
regularization is known for its “staircase” effect due to its promotion of sparse
gradients, such that the reconstructed images tend to be piecewise constant which
are not ideal approximations to the real-world images. For example, important fine
structures and minor contrast changes can be smeared in the reconstructed image
using TV regularization, which is unacceptable for applications that require high
image quality.
The second issue is the parameter tuning. To achieve desired balance between
noise reduction and faithful structural reconstruction, the parameters of a recon-
struction model (e.g., μ > 0 mentioned above) and its associated optimization
algorithm (such as step sizes) need to be carefully tuned. Unfortunately, the image
quality is often very sensitive to these parameters; and the optimal parameters are
also shown to be highly dependent on the specific acquisition settings and imaging
datasets.
Last but not least, the reconstruction time of iterative optimization algorithms
is also a major concern on their applications in real-world problems. Despite that
the efficiency of optimization algorithms is continuously being improved, these
algorithms, even for convex problems, often require hundreds of iterations or more
to converge, which result in long computational time.
The issues with the classical variational methods and optimization algorithms
mentioned above inspired a new class of deep learning-based approaches. Deep
learning Goodfellow et al. (2016) with deep neural networks (DNNs) as the core
component has achieved great success in a variety of real-world applications,
including computer vision (He et al. 2016; Krizhevsky et al. 2012; Zeiler and
Fergus 2014), natural language processing (Devlin et al. 2018; Hinton et al. 2012;
Sarikaya et al. 2014; Socher et al. 2012; Vaswani et al. 2017), medical imaging
(Hammernik et al. 2018; Schlemper et al. 2018; Sun et al. 2016), etc. DNNs have
provable representation power and can be trained with little or no knowledge about
the underlying functions. However, there are several major issues of such standard
deep learning approaches: (i) Generic DNNs may fail to approximate the desired
functions if the training data is scarce; (ii) the training of these DNNs is prone to
overfitting, noises, and outliers; and (iii) the trained DNNs are mostly “blackboxes”
without rigorous mathematical justification and can be very difficult to interpret.
To mitigate the aforementioned issues of DNNs, a class of learnable optimization
algorithms (LOAs) has been proposed recently. In brief, the architectures of the
neural networks in LOAs mimic the iterative scheme of the optimization algo-
rithms, also known of “unrolling” the optimization algorithms. More specifically,
these reconstruction networks are composed of a small number of phases, where
each phase mimics one iteration of a classical, optimization-based reconstruction
algorithm. In most cases, the terms corresponding to the manually designed regular-
ization in the classical methods are parameterized by multilayer perceptrons whose
parameters are to be learned adaptively in the offline training process with lots of
imaging data. After training, these networks work as fast feedforward mappings
with extremely low computational cost, so that the reconstruction of new images
can be performed on the fly. These methods combine the best parts of variational
methods and deep learning for fast and adaptive image reconstruction. In the next
section, we first consider the algorithms that are designed to solve a prescribed
model in the form of (1). Section “Structured Image Reconstruction Networks” is
dedicated to the class of deep reconstruction networks that can learn the variational
model or algorithm such that the outputs are high-quality reconstructions of the
images.
Learned Algorithm for Specified Optimization Problem
Consider the following instance of the variational model (1):
min_u μ‖u‖_1 + (1/2) ‖Au − b‖^2,    (2)
where A ∈ Rm×n , b ∈ Rm , and the parameter μ > 0 are given. The solution of (2)
is also known as the least absolute shrinkage and selection operator (lasso) or sparse
recovery since the solution u fits the observed data b in the data fidelity term h(u) :=
(1/2)·‖Au − b‖^2 and meanwhile tends to have only a small amount of nonzero
entries promoted by the ℓ^1 regularization g(u) := μ‖u‖_1. The iterative
shrinkage-thresholding algorithm (ISTA) solves (2) by replacing, at each iteration k, the
data fidelity h with its quadratic approximation around the current iterate u^{(k)} with
step size α > 0:
h(u) ≈ h(u^{(k)}) + ⟨∇h(u^{(k)}), u − u^{(k)}⟩ + (1/(2α)) ‖u − u^{(k)}‖^2
     = (1/(2α)) ‖u − (u^{(k)} − α∇h(u^{(k)}))‖^2 + const,    (3)
2α
where we completed the square to obtain the equality above, and the term “const”
represents a constant independent of u. As a result, ISTA generates the next iterate
u(k+1) by
u^{(k+1)} = arg min_u { g(u) + (1/(2α)) ‖u − (u^{(k)} − α∇h(u^{(k)}))‖^2 },    (4)
where the constant term is omitted since it does not affect the result u(k+1) in (4).
To obtain u(k+1) in (4), it is essential to find the solution of the proximity operator
proxg defined below for any given z ∈ Rn :
prox_g(z) := arg min_x { g(x) + (1/2) ‖x − z‖^2 }.    (5)
With g(x) := μ‖x‖_1, the proximity operator prox_g has a closed-form solution,
called the shrinkage operator S_μ. That is, the ith component of S_μ(z) = prox_g(z) ∈
R^n is
[S_μ(z)]_i = sign(z_i) · max(|z_i| − μ, 0),  i = 1, . . . , n.    (6)
Combining (3), (4), and (6) yields the ISTA iteration
u^{(k+1)} = S_{μ/L}( u^{(k)} − (1/L) A^⊤(Au^{(k)} − b) ),    (7)
where α is set to the optimal value 1/L in (7) and L is the largest eigenvalue of A^⊤A
(i.e., the Lipschitz constant of ∇h(u) = A^⊤(Au − b)). It can be shown that, starting
from any initial guess u(0) , ISTA (7) generates a sequence {u(k) } that converges to a
solution of (2) at a sublinear rate of O(1/k) in function value.
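For concreteness, the following NumPy sketch implements the ISTA iteration (7) with the
shrinkage operator (6); the number of iterations K and the zero initialization are illustrative
choices, not prescriptions from the text.

```python
# Minimal sketch of ISTA (7) for the lasso problem (2); A, b, mu are assumed given.
import numpy as np

def soft_threshold(z: np.ndarray, tau: float) -> np.ndarray:
    """Component-wise shrinkage operator S_tau from (6)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A: np.ndarray, b: np.ndarray, mu: float, K: int = 500) -> np.ndarray:
    L = np.linalg.norm(A, 2) ** 2          # largest eigenvalue of A^T A
    u = np.zeros(A.shape[1])
    for _ in range(K):
        grad = A.T @ (A @ u - b)           # gradient of the data fidelity h(u)
        u = soft_threshold(u - grad / L, mu / L)   # gradient step followed by shrinkage
    return u
```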
However, the practical performance of ISTA is not satisfactory as it often requires
hundreds to thousands of iterations to obtain an acceptable approximation to the
solution. Although there are a variety of optimization techniques to improve the
convergence of ISTA, the learned ISTA (LISTA) of Gregor and LeCun (2010) takes a
different, learning-based route: it unrolls K iterations of (7) into a feedforward
network whose kth phase reads
u^{(k+1)} = σ_k( W_1^{(k)} b + W_2^{(k)} u^{(k)} ),    (8)
for k = 0, . . . , K − 1. In LISTA (8), the linear mappings W_1^{(k)}, W_2^{(k)} and the
nonlinear mapping (can also be a preselected nonlinear activation function) σk
can be learned, such that the final output u(K) , as a function of these parameters
Θ := (. . . , W_1^{(k)}, W_2^{(k)}, σ_k, . . . ), is close to a solution u^* of (2) for a given b. More
precisely, given N training pairs (b_j, u_j^*), LISTA learns Θ by solving
min_Θ (1/N) Σ_{j=1}^{N} ‖u^{(K)}(b_j; Θ) − u_j^*‖^2,
where u^{(K)}(b; Θ) denotes the output of the K-phase network with parameter Θ and
input data b. By training the parameter Θ with various instances of b and the corresponding
u^*, LISTA can find an effective path from u^{(0)} to u^{(K)} using the learned Θ^*. If the
training result is satisfactory with a small K (e.g., K = 10), then LISTA, as a
feedforward neural network, is expected to compute a good approximation of u^* given
new input b on the fly. Note that LISTA (8) reduces to ISTA (7) if the parameters
are not learned but pre-defined as W_1^{(k)} = A^⊤/L, W_2^{(k)} = I − A^⊤A/L, and σ_k(·) =
S_{μ/L}(·) for all k. It is shown that LISTA can achieve similar solution accuracy with
an iteration number K that is 18 to 35 times smaller than that required by ISTA or FISTA for
problems with dimension 100 to 400 (Gregor and LeCun 2010).
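A minimal PyTorch sketch of a LISTA network of the form (8) is given below; the number of
phases, the initialization of the thresholds, and the use of soft-thresholding as the
nonlinearity σ_k are assumptions made for illustration rather than the configuration of
Gregor and LeCun (2010).

```python
# Minimal sketch of a K-phase LISTA network (8): u^{k+1} = sigma_k(W1^k b + W2^k u^k).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LISTA(nn.Module):
    def __init__(self, m: int, n: int, K: int = 10):
        super().__init__()
        self.W1 = nn.ModuleList([nn.Linear(m, n, bias=False) for _ in range(K)])
        self.W2 = nn.ModuleList([nn.Linear(n, n, bias=False) for _ in range(K)])
        self.theta = nn.Parameter(torch.full((K,), 0.1))   # learnable shrinkage thresholds

    def forward(self, b: torch.Tensor) -> torch.Tensor:
        u = torch.zeros(b.shape[0], self.W2[0].in_features, device=b.device)
        for k, (W1, W2) in enumerate(zip(self.W1, self.W2)):
            z = W1(b) + W2(u)                                     # linear part of phase k
            u = torch.sign(z) * F.relu(z.abs() - self.theta[k])   # sigma_k = shrinkage
        return u
```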
In recent years, there have been a number of follow-up research works that
exploit the properties and variations of LISTA. In Chen et al. (2018), a simplified
version of LISTA is proposed:
u^{(k+1)} = S_{μ/L}( u^{(k)} − (1/L) W^⊤(Au^{(k)} − b) ),    (9)
with learnable W , and the convergence of (9) for solving (2) is also established in
Chen et al. (2018) and Liu et al. (2019). In Sprechmann et al. (2015), LISTA is
extended to learnable pursuit process architectures for structured sparse and robust
low rank models derived from the proximal gradient algorithm. It is shown that such a
network architecture can approximate the exact sparse or low rank representation
at a fraction of the complexity of the standard optimization methods. In Xin et al.
(2016), a learned iterative hard thresholding (IHT) algorithm where σk is replaced
by a hard thresholding operator Hk is developed, and its potential to recover minimal
l0 norm solution is shown both theoretically and empirically. The work Borgerding
et al. (2017) developed a learned approximate message passing (LAMP) algorithm
for the lasso problem (2).
In contrast to LISTA, LAMP (10) includes a residual v (k) in each layer k, which
performs shrinkage dependent on k. By the inclusion of the “Onsager correction”
term βk v (k) to decouple errors across layers, LAMP appears to outperform LISTA
in accuracy empirically. For example, on synthetic data with Gaussian matrix
A, LAMP takes 7 iterations to reach a normalized mean square error
(NMSE) of −34 dB, whereas LISTA takes 15 iterations (Borgerding et al. 2017).
The aforementioned learned optimization algorithms are for unconstrained mini-
mizations. Recently, the work in Xie et al. (2019) developed an algorithm, called the
differentiable linearized alternating direction method of multipliers (D-LADMM), which
can be used to solve problems with linear equality constraints. D-LADMM is a
K-layer linearized ADMM-inspired deep neural network, which is obtained by
using learnable weights in the classical linearized ADMM and generalizing the
proximal operator to learnable activation functions. It is proved that there exist a set
of learnable parameters for D-LADMM to generate globally converged solutions.
To this point, we have seen several instances of modifying the ISTA (7) to obtain
deep neural networks with trainable components to solve (2). Each iteration of ISTA
is transformed into one layer of a neural network, the parameters of which are
then trained using available imaging data. Once properly trained, these networks
can often achieve more accurate approximations of the solution in much less time
than the traditional approaches. Global convergence results, sometimes even stronger
than those of the original optimization algorithms, have been established for several of these
methods. However, most of these methods are restricted to the variational model (1)
with l1 or l0 regularization, so that the proximity operators can yield closed-form
shrinkage as the nonlinear activation function. It remains as an open problem on
extending this type of methods to handle more general or learnable regularization.
Structured Image Reconstruction Networks
Proximal Point Network
The proximal gradient method applied to the variational model (1) alternates a gradient
step on the data fidelity h and a proximal step on the regularization g:
r^{(k)} = u^{(k−1)} − α ∇h(u^{(k−1)}),    (11a)
u^{(k)} = prox_{αg}(r^{(k)}) = arg min_u { g(u) + (1/(2α)) ‖u − r^{(k)}‖^2 }.    (11b)
As the data fidelity h is formulated based on the definitive relation between the image
and the acquired data, such as h(u) = (1/2)·‖PFu − b‖^2 in CS-MRI as shown in
section “Introduction”, it is often kept unmodified in (11a). Moreover, the step size
α can be set to αk which is not manually chosen but learned during the training
process. On the other hand, the proximal term in (11b) is due to the regularization g
and performs as an image “denoiser” that modifies inputs r (k) to obtain an improved
image u(k) . Instead of choosing regularization g manually and solving (11b) in each
iteration, we can directly parametrize its proximity operator proxαg as a learnable
denoiser parametrized as convolutional neural network (CNN) (Goodfellow et al.
2016). Moreover, we can use the residual network (ResNet) structure proposed in
He et al. (2016) for the CNN, which proves to be more effective for reducing training
error in imaging applications.
Fig. 1 Architecture of the proximal point network (11a) and (12). The kth phase updates r^{(k)} and
u^{(k)}. The dependencies of each variable on other variables are shown as incoming arrows, and the
network parameters used for update are labeled next to the corresponding arrows
Namely, we replace the proximity operator prox_{αg} in
(11b) by a denoising network (Zhang et al. 2017):
u^{(k)} = r^{(k)} + φ_k(r^{(k)}),    (12)
where φ_k is a standard multilayer CNN that maps r^{(k)} to the residual between u^{(k)}
and r (k) . The architecture of the proximal point network given by (11a) and (12)
is illustrated in Fig. 1, where each arrow indicates a mapping from its input to the
output with the required network parameters labeled next to it.
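The following PyTorch sketch shows one phase of such a network, i.e., the gradient step (11a)
followed by the residual denoiser (12); the callable grad_h and the small CNN used for φ_k are
placeholders for illustration, not the architecture of Zhang et al. (2017).

```python
# Minimal sketch of one phase of the proximal point network: (11a) followed by (12).
import torch
import torch.nn as nn

class ProximalPointPhase(nn.Module):
    def __init__(self, channels: int = 1, features: int = 64):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.1))     # learnable step size alpha_k
        self.phi = nn.Sequential(                        # residual "denoiser" phi_k
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, u: torch.Tensor, grad_h) -> torch.Tensor:
        r = u - self.alpha * grad_h(u)        # gradient step on the data fidelity, (11a)
        return r + self.phi(r)                # residual denoising step, (12)
```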
Let Θ denote the collection of learnable parameters in φ_k (e.g., the convolutional
kernels and the biases) and algorithm parameters (e.g., α_k > 0) for all k = 1, . . . , K;
then the output after K cycles (phases) of (11a) and (12) is a function of Θ for
any given imaging data b. Denote this output by u^{(K)}(b; Θ), which is the output
of any given image data b passing through this network with parameter Θ; we can
form the loss function of Θ by regression as
L(Θ; b, u^*) = (1/2) ‖u^{(K)}(b; Θ) − u^*‖^2,    (13)
where u∗ is the ground truth image corresponding to the (possibly noisy and
incomplete) imaging data b, both given in the training data. By feeding in a large
amount of instances of the form (b, u^*), we can solve for the minimizer Θ^* of the sum
of L as in (13) over all of these instances. Then the deep reconstruction network
with K phases, each consisting of (11a) and (12), is a feedforward neural network
with parameters Θ^* for fast image reconstruction given any new incoming data b.
The proximal point network can be applied to a variety of imaging applications,
including image denoising, image deblurring, and image super-resolution by replac-
ing the proximal operator by a denoiser network in regularization subproblem of
half-quadratic splitting algorithm (Zhang et al. 2017). In Zhang et al. (2017), φk
is designed to contain 7 dilated convolutions with 64 feature maps in each middle
layer, where ReLU activation function is used after the first convolution, and both
batch normalization (BN) and ReLU are used in every convolution thereafter. The
training data is composed of 256 × 4000 image patches of size 35 × 35 cropped
from the BSD400 (Martin et al. 2001), 400 images from ImageNet validation set
(Deng et al. 2009), and 4,744 Waterloo Exploration images (Ma et al. 2016). They
evaluate their results on BSD68 (Roth and Black 2009), Set5, and Set14 (Timofte
et al. 2014), respectively. In Zhang and Ghanem (2018), IRCNN is compared with
several other methods on Set11 (Kulkarni et al. 2016) with various sampling ratios,
and the results will be presented later in this section.
The work developed in Cheng et al. (2019), Chun et al. (2019), Meinhardt
et al. (2017), Rick Chang et al. (2017), Wang et al. (2016), and Zhang et al.
(2017) can all be considered as variations of the method described above. For
instance, a CNN denoiser has been placed in the proximal gradient descent algorithm
in Meinhardt et al. (2017), in the subproblem of half-quadratic splitting in Zhang et al.
(2017), in the subproblem of ADMM in Meinhardt et al. (2017) and Rick Chang et al.
(2017), and in subproblems of the primal-dual algorithm in Cheng et al. (2019), Meinhardt
et al. (2017), and Wang et al. (2016).
ISTA-Net
ISTA-Net (Zhang and Ghanem 2018) is a deep neural network architecture for
image reconstruction inspired by ISTA as given in (7). Recall that ISTA was originally
derived to solve the l1 minimization problem (2), i.e., (1) with g(u) = μ‖u‖1 and
h(u) = (1/2)‖Au − b‖², as we showed in section “Learned Algorithm for Specified
Optimization Problem”. For image reconstruction, the sole l1 norm is not a suitable
regularization since almost all natural images are not sparse themselves. Instead,
they are often sparse in certain transform domains. Let Ψ ∈ Rn×n be a sparsifying
operator (e.g., a wavelet transform) that transforms u into a sparse vector Ψu. Then,
we can modify the lasso (2) and obtain a similar form as

min_u  μ‖Ψu‖1 + (1/2)‖Au − b‖².    (14)
Although (14) does not exactly match the ISTA problem (2) due to the presence of Ψ, this
can be easily resolved by using an orthogonal sparsifying operator Ψ and setting
x = Ψu as the unknown for (2); for example, we can set Ψ to an orthogonal 2D
wavelet transform. In this case, we just need to solve for x from the exact form of (2)
with g(x) = μ‖x‖1 and h̃(x) := h(Ψ⊤x) as the data fidelity, and recover u = Ψ⊤x
using the output x of ISTA. Integrating this change of variables into the scheme
(11), we obtain a slightly modified version of ISTA as follows:

r(k) = u(k−1) − α∇h(u(k−1)),    (15a)
u(k) = Ψ⊤ Sθ( Ψ r(k) ),    (15b)

where θ = αμ combines the two parameters, Sθ denotes the soft-thresholding (shrinkage) operator,
and (15b) involves shrinkage due to
the choice of g(x) = μ‖x‖1. The gradient ∇h in (15a) is due to the data fidelity h
in (14). Therefore, we do not need to “learn” this part in the reconstruction. On the
other hand, the use of the sparsifying transform Ψ and ℓ1 regularization is rather
heuristic. If there is sufficient amount of training data, it is likely that we can learn
a better representation of this regularization using a deep learning technique.
Bearing this idea, ISTA-Net is proposed to replace the transforms Ψ and Ψ⊤ in
(15) by multilayer convolutional neural networks (CNNs), while keeping the prox_{αg},
i.e., the shrinkage due to the ℓ1 norm, as it seems robust in suppressing noise. To
this end, ISTA-Net follows the scheme of ISTA (15) and constructs a deep neural
network with a prescribed number K of phases as in section “Proximal Point Network”.
Unlike LISTA and its variations in section “Learned Algorithm for Specified
Optimization Problem”, the kth phase of ISTA-Net is to mimic the two steps in the
kth iteration of ISTA in (15). Given the output u(k−1) of the previous phase, the
update of r (k) follows (15a) directly since h is known to accurately describe the data
formation. Therefore, only the parameter α in (15a), which behaves as the step size
in ISTA, is set to αk and is to be learned during the training process in ISTA-Net.
After r(k) is updated, it is passed to (15b) with Ψ and Ψ⊤ replaced by two multilayer
CNNs H(k) and H̃(k), respectively, and the shrinkage parameter θ is replaced by θk,
which is to be learned as well. Namely, u(k) is updated by

u(k) = H̃(k)( Sθk( H(k)(r(k)) ) ).    (16)

In ISTA-Net (Zhang and Ghanem 2018), H(k) and H̃(k) are set to simple two-layer
CNNs of the form

H(k) r(k) = w2(k) ∗ ReLU( w1(k) ∗ r(k) ),    H̃(k) s = w̃2(k) ∗ ReLU( w̃1(k) ∗ s ),    (17)

so that the kth phase of ISTA-Net reads

r(k) = u(k−1) − αk ∇h(u(k−1)),    (18a)
u(k) = H̃(k)( Sθk( H(k)(r(k)) ) ),    (18b)

where ∗ denotes convolution and we have omitted excessive parentheses for notational simplicity, i.e., H(k) r(k)
stands for H (k) (r (k) ), etc. The K phases are concatenated in order, where the kth
phase accepts the output u(k−1) of the previous phase, updates r (k) using (18a) with
αk , and finally outputs u(k) using (18b). Hence, the parameters to be learned are αk ,
θk , and w1(k) , w2(k) in H (k) and w̃1(k) and w̃2(k) in H̃ (k) for k = 1, 2, . . . , K. In the first
phase, the input is the initial guess u(0), which can be set to A⊤b. The output of the
last phase, u(K), is used in the loss function that measures its squared discrepancy
Fig. 2 Architecture of ISTA-Net (18). The kth phase updates r (k) and u(k) . The dependencies of
each variable on other variables are shown as incoming arrows, and the network parameters used
for update are labeled next to the corresponding arrows
Ldis(Θ; b, u∗) = (1/2) ‖u(K)(b; Θ) − u∗‖²,    (19)

where (b, u∗) is a training pair as in the proximal point network in section “Proximal
Point Network”, and Θ := {αk, θk, w1(k), w2(k), w̃1(k), w̃2(k) | k = 1, . . . , K}. The
structure of the ISTA-Net can be visualized in Fig. 2. For more details of the network
structure and its relation to the back-propagation procedure, we refer to Wang et al.
(2019).
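The following sketch implements a single ISTA-Net-style phase combining (18a) and (18b); the two-layer CNNs, filter counts, and initial values of αk and θk are illustrative assumptions, not the exact architecture of Zhang and Ghanem (2018).

```python
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    """Shrinkage operator S_theta applied componentwise."""
    return torch.sign(x) * torch.relu(torch.abs(x) - theta)

class ISTANetPhase(nn.Module):
    """One phase: r = u - alpha * grad h(u), then u = Htilde( S_theta( H(r) ) )."""
    def __init__(self, channels=1, features=32):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.1))
        self.theta = nn.Parameter(torch.tensor(0.01))
        self.H = nn.Sequential(nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(features, features, 3, padding=1))
        self.Htilde = nn.Sequential(nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(features, channels, 3, padding=1))
    def forward(self, u, grad_h):
        r = u - self.alpha * grad_h(u)                                   # (18a)
        return self.Htilde(soft_threshold(self.H(r), self.theta))        # (18b)

phase = ISTANetPhase()
u = torch.randn(2, 1, 32, 32)
b = u + 0.1 * torch.randn_like(u)
grad_h = lambda x: x - b             # gradient of (1/2)||x - b||^2 (A = I for illustration)
u_next = phase(u, grad_h)
```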
In addition, since H(k) and H̃(k) in (17) are replacing Ψ and Ψ⊤, respectively,
they are expected to satisfy H̃(k) ∘ H(k) = I, the identity mapping. To make this
constraint approximately satisfied, the mismatch between H̃(k)(H(k)(u∗)) and u∗
can be integrated into the following loss function, even though this is a much weaker
requirement than H̃(k) ∘ H(k) = I:

Lid(Θ; u∗) = (1/2) Σ_{k=1}^{K} ‖H̃(k)(H(k)(u∗)) − u∗‖².    (20)
The loss function for a particular training pair (b, u∗) is thus the sum of the losses
in (19) and (20) with a balancing parameter γ > 0,

L(Θ; b, u∗) = Ldis(Θ; b, u∗) + γ Lid(Θ; u∗),    (21)

and the total loss function during training is the sum of L(Θ; b, u∗) in (21) over all
training pairs of the form (b, u∗) in the training dataset.
The optimal parameter Θ∗ can be obtained by minimizing the loss function (21),
which can be accomplished using the stochastic gradient descent (SGD) method.
The key in the implementation of SGD is the computation of the gradient of (21)
with respect to each network parameter, i.e., αk, θk, w1(k), w2(k), w̃1(k), w̃2(k) for k =
1, . . . , K. More specifically, we first need to compute the gradient of L defined in
(21) with respect to the main variables u(k) and r(k). Then we compute the gradients
of u(k) with respect to its parameters, i.e., θk, w1(k), w2(k), w̃1(k), w̃2(k), and the gradient
of r(k) with respect to αk. Finally, the gradients of L with respect to these network
Fig. 3 Qualitative reconstruction results of ISTA-Net+ (Zhang and Ghanem 2018) applied to the
Butterfly image in Set11 (Kulkarni et al. 2016) with various sampling ratios. The numbers in the
captions of (b)–(d) are the corresponding sampling ratios, and PSNR values are shown in parentheses.
Results are generated by the code available at https://github.com/jianzhangcs/ISTA-Net. (a) True
(b) 10% (25.91) (c) 25% (33.52) (d) 50% (40.18)
ADMM-Net
ADMM-Net (Sun et al. 2016) is one of the earliest attempts to unroll a known
optimization algorithm into a deep neural network. ADMM-Net originates from
the alternating direction method of multipliers, or ADMM for short, which
is a numerical algorithm particularly effective for convex optimization problems
with linear equality constraints. Combined with the variable splitting technique,
ADMM has been very popular and successful in solving a variety of nonsmooth
and/or constrained problems.
In its standard form, ADMM can solve constrained convex problems where the
primal variable (i.e., the variable to be solved in the optimization problem) consists
of two blocks related by a linear equality constraint. In addition, there is a dual
variable, i.e., the Lagrange multiplier, associated with the equality constraint. In
each iteration, ADMM updates the two blocks of primal variables in order, one
at a time with the other one fixed, and then updates the dual variable using the updated
primal variables. Due to this multiple-variable structure, ADMM yields more complex
iterations than ISTA.
We first recall the variable splitting and the original ADMM for the image recon-
struction problem, which is formulated as in ISTA, cf. (14):

min_u  g(Ψu) + (1/2)‖Au − b‖²,    (22)

but with the more specific data fidelity h(u) = (1/2)‖Au − b‖². Here, we write
the regularization in (22) as a composite function g(Ψu), where g is simple (i.e., the
proximity operator proxg has a closed form or is easy to compute) and Ψ is a linear
operator. A typical example is the total variation regularization we mentioned in
section “Introduction”: g(Ψu) := μ Σ_{i=1}^{n} ‖Di u‖2 with weight parameter μ > 0.
That is, Ψ is the discrete gradient operator (finite forward differences) D, and g is
a slight variation of the l1 norm which takes the sum of the l2 norms of the gradients at all
pixels. For ADMM to work efficiently, there are also requirements on the matrices Ψ
and A, which we will specify later. To apply ADMM, we first use variable splitting
by introducing an auxiliary variable w such that w = Du and rewrite (22) as the
following equivalent problem:

min_{w,u}  g(w) + (1/2)‖Au − b‖²,   subject to   w = Du.    (23)

The corresponding augmented Lagrangian is

L(u, w; λ) = g(w) + (1/2)‖Au − b‖² + ⟨λ, w − Du⟩ + (ρ/2)‖w − Du‖²,    (24)

with Lagrange multiplier λ and penalty parameter ρ > 0. ADMM is then applied to solve (23) with the
augmented Lagrangian (24). In each iteration of ADMM, the primal variables w
and u are updated in order, and then the dual variable λ is updated; in the case of
CS-MRI with A = P F mentioned in section “Introduction”, the resulting three subproblems
are denoted by (25a)–(25c), where θ = μ/ρ appears in the shrinkage step for w. Given an initial
guess (w(0), u(0), λ(0)), ADMM repeats the cycle
of the three steps (25) for iterations k = 1, 2, . . . , until a stopping criterion is
satisfied. As we can see, for ADMM to work efficiently, the inverse of D⊤D +
ρA⊤A appearing in the u-subproblem (25b) must be easy to compute. In certain imaging applications, this is
L(Θ; b, u∗) = (1/2) ‖u(K)(b; Θ) − u∗‖².    (27)

The total loss function is the sum of the loss in (27) above over all training pairs
(b, u∗) in the given training dataset. Then, the total loss function is minimized
using the (stochastic) gradient descent method, and the minimizer Θ∗ is the learned
parameter of ADMM-Net.
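As a point of reference for the unrolled network, the following sketch runs the classical ADMM cycle for (23) with D the discrete gradient and A = I; the inner gradient solve of the u-subproblem and all parameter values are simplifying assumptions rather than the ADMM-Net construction.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient D u (zero at the last row/column)."""
    gx = np.zeros_like(u); gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy = np.zeros_like(u); gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div(w):
    """Discrete divergence, the negative adjoint of grad."""
    wx, wy = w
    dx = wx.copy(); dx[1:-1, :] = wx[1:-1, :] - wx[:-2, :]; dx[-1, :] = -wx[-2, :]
    dy = wy.copy(); dy[:, 1:-1] = wy[:, 1:-1] - wy[:, :-2]; dy[:, -1] = -wy[:, -2]
    return dx + dy

def shrink(v, t):
    """Isotropic shrinkage: prox of t*||.||_2 applied to each pixel's gradient vector."""
    norm = np.maximum(np.sqrt((v ** 2).sum(axis=0, keepdims=True)), 1e-12)
    return v * np.maximum(1.0 - t / norm, 0.0)

def admm_tv(b, mu=0.1, rho=1.0, n_iter=50):
    """ADMM for min_u mu*TV(u) + 0.5||u - b||^2 via the splitting w = D u, cf. (23)-(24)."""
    u = b.copy()
    w = grad(u)
    lam = np.zeros_like(w)
    for _ in range(n_iter):
        w = shrink(grad(u) - lam / rho, mu / rho)                 # w-subproblem (shrinkage)
        for _ in range(10):                                       # u-subproblem by gradient steps
            res = (u - b) - rho * div(grad(u) - w - lam / rho)
            u = u - 0.1 * res
        lam = lam + rho * (w - grad(u))                           # dual update
    return u

b = np.random.randn(32, 32)
u_rec = admm_tv(b)
```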
Fig. 4 Architecture of ADMM-Net (26). The kth phase updates w (k) , u(k) , and λ(k) . The
dependencies of each variable on other variables are shown as incoming arrows, and the network
parameters used for update are labeled next to the corresponding arrows
Fig. 5 Brain MR image reconstruction by ADMM-Net (Sun et al. 2016) with sampling ratio 20%.
Left: ground truth. Middle: image reconstructed by zero filling. Right: reconstructed image by
ADMM-Net. Results are generated by the code available at https://github.com/yangyan92/Deep-
ADMM-Net
Variational Network
As we have seen above, the proximal point network, ISTA-Net, and ADMM-Net all
aim to solve a variational model of the form

min_u  h(u) + g(Du),    (28)

where g, D, and even h can be learned from the training data adaptively. If we
apply the well-known gradient descent method in numerical optimization to (28),
we obtain

u(k) = u(k−1) − αk ( ∇h(u(k−1)) + D⊤∇g(Du(k−1)) ),    (29)
where αk is the step size in iteration k. Note that above we adopted a slight abuse
of notation for ∇g, since in image reconstruction g often represents the ℓ1 norm or
the like, which is not differentiable. Hence, it is more rigorous to interpret ∇g as a
subgradient of g, in which case the updating rule (29) is subgradient descent. Nevertheless,
this term will be replaced by a parameterized function to be learned in training, and
thus its differentiability is not an important issue in the following derivation of the
variational reconstruction network.
The variational network (Hammernik et al. 2018) was inspired by this concise
updating rule (29). In Hammernik et al. (2018), the variational network consists of a fixed
number K of phases, and each phase mimics one iteration of (29). The kth phase of
the variational network is built as

u(k) = u(k−1) − λk ∇h(u(k−1)) − (H(k))⊤ φk( H(k) u(k−1) ).    (30)

Here λk, H(k), and φk are all to be learned from data. The step size αk is omitted
since it is absorbed by the learnable terms. In particular, H(k) is a convolution that
replaces the manually chosen linear operator D (e.g., the gradient in traditional image
reconstruction) in (29), and φk is a parameterized function that replaces ∇g.
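A schematic implementation of one such phase is given below; the single convolution, the tanh-based stand-in for φk, and the identity forward operator are assumptions made only for illustration and do not correspond to the trained network of Hammernik et al. (2018).

```python
import torch
import torch.nn as nn

class VNPhase(nn.Module):
    """One variational-network phase: u <- u - lambda_k * grad h(u) - H_k^T phi_k(H_k u), cf. (30)."""
    def __init__(self, channels=1, filters=8):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(1.0))                         # learnable fidelity weight
        self.conv = nn.Conv2d(channels, filters, 5, padding=2, bias=False) # H^(k)
        self.gamma = nn.Parameter(torch.zeros(filters, 1, 1))              # coefficients of phi_k
    def phi(self, z):
        # tiny pointwise stand-in for the learned potential derivative phi_k
        return self.gamma * torch.tanh(z)
    def forward(self, u, grad_h):
        # H^T phi(H u): the transposed convolution reuses the same kernel as self.conv
        reg = nn.functional.conv_transpose2d(self.phi(self.conv(u)),
                                             self.conv.weight, padding=2)
        return u - self.lam * grad_h(u) - reg

phase = VNPhase()
b = torch.randn(1, 1, 64, 64)
grad_h = lambda x: x - b       # data term (1/2)||x - b||^2 with A = I for illustration
u = phase(b.clone(), grad_h)
```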
In Hammernik et al. (2018), φk in (30) is represented as a linear combination of
Gaussian functions. First of all, φk is applied to H(k)u(k−1) ∈ Rn componentwise,
and hence it is sufficient to describe the componentwise operation of φk
using a univariate function. To this end, we first determine a set of Nc + 1 control
points {pl : l = 0, . . . , Nc } uniformly spaced on a prescribed interval [−I, I ] such
that −I = p0 < p1 < · · · < pNc = I and pl − pl−1 = 2I /Nc for l = 1, . . . , Nc .
For each point pl , the Gaussian function with a prescribed standard deviation σ is
given by
Bl(x) = e^{−(x−pl)²/(2σ²)}.    (31)

The function φk is then represented as the linear combination φk(x) = Σ_{l=0}^{Nc} γl(k) Bl(x),
where the coefficients γ(k) = (γ0(k), . . . , γNc(k)) are learned from data. One can also
design other basis functions instead of (31), or even parametrize φk as
a generic neural network. As for H(k), it is a convolution operation applied to u(k−1),
Fig. 6 Architecture of the variational network (30). The kth phase updates u(k) . The dependencies
of each variable on other variables are shown as incoming arrows, and the network parameters used
for update are labeled next to the corresponding arrows
and hence it suffices to determine the convolution kernel. This is a very simplified
case of convolution layers of CNNs, and we omit the details here.
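The following sketch evaluates such an RBF parameterization of φk componentwise; the coefficient values, the interval [−I, I], and σ are arbitrary illustrative choices.

```python
import numpy as np

def rbf_activation(x, weights, I=1.0, sigma=0.1):
    """Evaluate phi(x) = sum_l weights[l] * exp(-(x - p_l)^2 / (2 sigma^2)) componentwise,
    with N_c + 1 control points p_l uniformly spaced on [-I, I], cf. (31)."""
    p = np.linspace(-I, I, len(weights))              # control points p_0, ..., p_Nc
    diffs = x[..., None] - p                          # broadcast against the control points
    basis = np.exp(-diffs ** 2 / (2 * sigma ** 2))    # Gaussian bumps B_l(x)
    return basis @ weights

Nc = 31
gamma = 0.01 * np.random.randn(Nc + 1)                # learnable coefficients gamma^(k)
z = np.random.randn(16, 16)                           # a filter response H^(k) u
phi_z = rbf_activation(z, gamma)
```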
Now we can see that the variational network consists of K phases, where each
phase operates as (30). In particular, the first phase accepts as input u(0), which can be set to
A⊤b. The last (Kth) phase outputs u(K), which is used in the loss function to compare
with the reference image u∗:

L(Θ; b, u∗) = (1/2) ‖u(K)(b; Θ) − u∗‖²,    (33)

where the network parameter Θ := {λk, γ(k), H(k) | k = 1, . . . , K}. The total
loss function is then the sum of (33) over all training pairs of the form (b, u∗). The
architecture of variational network is presented in Fig. 6. More details about the
derivation of the back-propagation and its relation to the network structure in Fig. 6
are provided in Wang et al. (2019). Similar to the proximal point network and ISTA-
Net introduced above, the variational network can be applied to problems where the
data fidelity term h is differentiable with Lipschitz continuous gradient.
In Hammernik et al. (2018), the variational network considered above is applied
to parallel imaging MR image reconstruction. In their experiment, H (k) is imple-
mented as 48 real/imaginary filter pairs and Nc is prescribed to be 31. The network
is trained on a dataset containing 20 image slices from 10 patients and tested
on reconstructing whole image volumes for 10 clinical patients that do not overlap
with the training set. A qualitative illustration of a scan reconstructed by the
variational network is given in Fig. 7.
Primal-Dual Network
The primal-dual network is derived from the saddle-point reformulation of the variational model,

min_u max_{z,y}  ⟨Au, z⟩ − h∗(z) + ⟨u, y⟩ − g∗(y),    (34)

where h∗(z) and g∗(y) are the conjugates (Fenchel duals) of h(Au) and g(u),
respectively. Due to Moreau’s decomposition theorem,

prox_{τf∗}(b) = b − τ prox_{τ⁻¹f}(b/τ)    (35)

for any b ∈ Rn, τ > 0, and convex function f, one can obtain the following iterative
scheme by applying the primal-dual gradient algorithm to (34):

z(k+1) = arg min_z  −⟨Au(k), z⟩ + h∗(z) + (1/(2γ))‖z − z(k)‖²
        = prox_{γh∗}( z(k) + γAu(k) ) = z(k) + γAu(k) − γ prox_{γ⁻¹h}( (1/γ)z(k) + Au(k) ),    (36a)

y(k+1) = arg min_y  −⟨u(k), y⟩ + g∗(y) + (1/(2γ))‖y − y(k)‖²
        = prox_{γg∗}( y(k) + γu(k) ) = y(k) + γu(k) − γ prox_{γ⁻¹g}( (1/γ)y(k) + u(k) ),    (36b)

u(k+1) = arg min_u  ⟨Au, z(k+1)⟩ + ⟨u, y(k+1)⟩ + (1/(2τ))‖u − u(k)‖²
        = u(k) − τA⊤z(k+1) − τy(k+1),    (36c)

ū(k+1) = u(k+1) + θ( u(k+1) − u(k) ).    (36d)
z(k+1) = ( z(k) + σ(Au(k) − b) ) / (1 + σ),    (37a)
u(k+1) = prox_{τg}( u(k) − τA∗z(k+1) ),    (37b)
where σ, τ, and θ are algorithm parameters. (ii) The Chambolle-Pock network (CP-
Net) learns a generalized Chambolle-Pock algorithm with the data fidelity term
(1/2)‖Au − b‖² relaxed to h(Au). Then the updating scheme of z(k+1) becomes
z(k+1) = prox_{σh∗}( z(k) + σAu(k) ), and CP-Net learns both prox_{τg} and prox_{σh∗} with
CNN denoisers. (iii) By breaking up the linear combination parts in the above iterates for
z(k+1), u(k+1), and ū(k+1) in CP-Net, the primal-dual net (PD-Net) further increases the
network flexibility by freely learning those combinations in addition to the learnable
proximal operators. In Cheng et al. (2019), the primal or dual proximal operators are
substituted by learned CNN denoisers with 3 convolutional layers and 32 channels in
each hidden layer. All these networks are trained and tested on 1400 and 200 images,
respectively, of size 256 × 256, with the corresponding k-space data undersampled by a Poisson disk
sampling mask. The qualitative reconstruction results of these three variations of the
network on MR images are shown in Fig. 8, which are obtained from Cheng et al.
(2019).
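To make the structure of these updates concrete, the sketch below runs a generalized Chambolle-Pock loop in which the dual proximal step has the closed form (37a) and the primal proximal step is replaced by a soft-thresholding placeholder; in CP-Net and PD-Net both mappings would instead be learned CNNs, so this is not the architecture of Cheng et al. (2019).

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def chambolle_pock(b, A, At, sigma=0.5, tau=0.5, theta=1.0, n_iter=50):
    """Generalized primal-dual loop: dual prox as in (37a), primal prox as in (37b);
    the primal prox is a simple soft-thresholding stand-in for a learned denoiser."""
    u = At(b)
    u_bar = u.copy()
    z = np.zeros_like(b)
    for _ in range(n_iter):
        z = (z + sigma * (A(u_bar) - b)) / (1.0 + sigma)   # prox of sigma*h^* for h = 0.5||.-b||^2
        u_new = soft(u - tau * At(z), tau * 0.1)           # placeholder "denoiser" for prox_{tau g}
        u_bar = u_new + theta * (u_new - u)                # over-relaxation step, cf. (36d)
        u = u_new
    return u

A = lambda x: x        # toy forward operator (identity)
At = lambda y: y
b = np.random.randn(64, 64)
u_rec = chambolle_pock(b, A, At)
```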
min_Θ  (1/N) Σ_{j=1}^{N} L( u(bj; Θ), u∗j ) + R(Θ)    (38a)

subject to  u(bj; Θ) ∈ arg min_u  h(u, bj) + g(u; Θ),   j = 1, . . . , N,    (38b)
where h is the data fidelity term that ensures the reconstructed image u is faithful
to the given data b, and g is the regularization that may incorporate proper prior
information about u. The regularization g(·; Θ) (and possibly h as well) is realized as a
DNN with parameter Θ to be learned. The loss function L(u, u∗) measures the
difference between a reconstruction u and the corresponding ground truth image u∗
from the training data. The optimal parameter Θ of g (and h) is then obtained by
solving the upper-level optimization (38a).
If the actual minimizer u(b; Θ) is replaced by the direct output of an LOA-
based DNN (such as ISTA-Net in the previous subsection) which mimics
an iterative optimization scheme for solving the lower-level minimization in the
constraint of (38), then (38) reduces to the unrolling methods introduced in the
previous subsections. However, the unrolled networks do not have any convergence
guarantee, and the learned components do not represent g in (38) and can be difficult
to interpret.
To obtain convergence guarantees with interpretable network structures, Chen
et al. (2020) proposed a novel learnable descent algorithm (LDA). Consider the case
where the data fidelity term is h(u) := (1/2)‖Au − b‖² (or any smooth but possibly
nonconvex function) and g(u) is a nonsmooth nonconvex regularization function
designed as g(u) = ‖r(u)‖_{2,1} = Σ_{i=1}^{m} ‖ri(u)‖. Here r = (r1, . . . , rm) is
a smooth but nonconvex mapping realized by a deep neural network whose param-
eters are learned from training data, and ri(u) ∈ Rd stands for a d-dimensional
feature vector for i = 1, . . . , m. To overcome the nondifferentiability of
g(u), a smooth approximation of g obtained by applying Nesterov’s smoothing technique
(Nesterov 2005) is employed:

gε(u) = Σ_{i∈I0} (1/(2ε)) ‖ri(u)‖² + Σ_{i∈I1} ( ‖ri(u)‖ − ε/2 ),

where the index set I0 and its complement I1 at u for the given r and ε are defined
by I0 = {i ∈ [m] : ‖ri(u)‖ ≤ ε}, I1 = [m] \ I0. Denote fε(u) = h(u) + gε(u) (we
omit Θ for notational simplicity). Then LDA iterates
where in each iteration uk+1 = wk+1 if fεk(wk+1) ≤ fεk(vk+1) and uk+1 = vk+1 otherwise;
and εk+1 = λεk if ‖∇fεk(uk+1)‖ < σεk and εk+1 = εk otherwise, where λ ∈ (0, 1)
is a prescribed hyperparameter. It is shown that εk monotonically decreases to
0, so that fεk approximates the original nonsmooth nonconvex function f, and
any accumulation point of a particular subsequence of {uk} is a Clarke stationary
point (the analogue of a critical point of a differentiable function) of the nonsmooth
nonconvex function f (Chen et al. 2020).
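The smoothing gε and the ε-reduction rule can be sketched as follows; the gradient-based feature map stands in for the learned network r of Chen et al. (2020), and the constants σ and λ are arbitrary.

```python
import numpy as np

def features(u):
    """Placeholder feature map r(u): per-pixel gradient vectors (d = 2)."""
    gx = np.roll(u, -1, axis=0) - u
    gy = np.roll(u, -1, axis=1) - u
    return np.stack([gx, gy], axis=-1).reshape(-1, 2)     # shape (m, d)

def g_eps(u, eps):
    """Nesterov smoothing of ||r(u)||_{2,1}: quadratic on small features, linear otherwise."""
    norms = np.linalg.norm(features(u), axis=1)
    small = norms <= eps                                  # index set I_0
    return (norms[small] ** 2 / (2 * eps)).sum() + (norms[~small] - eps / 2).sum()

def update_eps(eps, grad_norm, sigma=1.0, lam=0.9):
    """Reduce eps only when the smoothed gradient is already small (LDA rule)."""
    return lam * eps if grad_norm < sigma * eps else eps

u = np.random.randn(32, 32)
print(g_eps(u, eps=0.1))
print(update_eps(0.1, grad_norm=0.05))
```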
Since LDA follows the algorithm exactly, the convergence of the LDA network
can be guaranteed. Moreover, the practical performance of LDA is very promising
in a wide range of image reconstruction applications. For example, Table 1 shows
the PSNR of the reconstructions obtained by LDA (with r parameterized by a
simple generic 4-layer CNN and K = 15 total phases) on the dataset Set11
(Kulkarni et al. 2016) with a prefixed sampling matrix. Compared to the classical
TV-based reconstruction method and several unrolling methods, LDA achieves the
best reconstruction quality with the highest PSNR. In addition, LDA uses much fewer
parameters than the other networks, since the learned feature map r is shared by all its phases. In Fig. 9, the
qualitative reconstruction result of LDA is shown and compared with several state-of-the-art methods.
Table 1 Average PSNR (dB) of reconstructions obtained by several methods on the Set11 dataset
with various CS ratios, and the number of learnable network parameters (#Param), where the PSNR
data is quoted from Zhang and Ghanem (2018) and Chen et al. (2020)
Method 10% 25% 50% #Param
TVAL3 Li et al. (2013) 22.99 27.92 33.55 NA
IRCNN Zhang et al. (2017) 24.02 30.07 36.23 185,472
ISTA-Net Zhang and Ghanem (2018) 25.80 31.53 37.43 171,090
ISTA-Net+ Zhang and Ghanem (2018) 26.64 32.57 38.07 336,978
LDA Chen et al. (2020) 27.42 32.92 38.50 27,967
Fig. 9 Reconstruction of the parrot image in Set11 (Kulkarni et al. 2016) with CS ratio 10% obtained
by CS-Net (Shi et al. 2017), SCS-Net (Shi et al. 2019), and LDA (Chen et al. 2020). Images in the
bottom row zoom in on the corresponding ones in the top row. PSNR values are shown in parentheses.
(a) Reference (b) CS-Net (28.00) (c) SCS-Net (28.10) (d) LDA (29.54)
Concluding Remarks
We reviewed several typical deep neural networks inspired by the variational method
and associated numerical optimization algorithms for the inverse problem of image
reconstruction. These neural networks have architectures that mimic the well-known
efficient optimization algorithms, such that each phase of a network corresponds to
one iteration in the original numerical scheme. The algorithm parameters and other
manually selected terms, such as the regularization, in the variational model and
optimization algorithm are replaced by learnable components in the deep recon-
struction network. The network output is thus a function of these parameters and
learnable components. Given the ground truth or high-quality image data, we can
form the loss function which measures the discrepancy between the network output
and the ground truth and apply back-propagation and stochastic gradient descent
method to optimize the parameters such that the loss function is minimized during
the training procedure. After training, these networks with optimal parameters serve
as fast feedforward networks that can reconstruct high-quality images on the fly.
Fig. 10 The norm of the gradient at every pixel in TV based image reconstruction (top row)
and the norm of the feature map r at every pixel learned in LDA (bottom row), where important
details, such as the antennae of the butterfly, the lip of Lena, and the bill of the parrot, are faithfully
recovered by LDA. (Images are obtained from Chen et al. 2020)
partial derivatives to indicate spatial dependencies and compute the gradients here.
First of all, we have
∂L/∂r(k) = (∂L/∂u(k)) (∂u(k)/∂r(k)),    (40)

since u(k) is a function of r(k), as shown in Fig. 2. The gradient ∂u(k)/∂r(k) in
(40) is straightforward to compute due to the relation between r(k) and u(k) in (18b)
and the chain rule:

∂u(k)/∂r(k) = ∇H̃(k)(sk) · S′θk(hk) · ∇H(k)(r(k)),    (41)

where hk := H(k)(r(k)) and sk := Sθk(hk), as defined in (42).
Substituting (41) into (40), we see that ∂L/∂r (k) can be obtained once we have
∂L/∂u(k) . The gradient ∂L/∂u(k) can also be computed by the chain rule:
∂L/∂u(k) = (∂L/∂r(k+1)) (∂r(k+1)/∂u(k)),    (43)

∂r(k+1)/∂u(k) = I − αk+1 ∇²h(u(k)).    (44)
Hence, we can get ∂L/∂u(k) once ∂L/∂r (k+1) is computed. Therefore, we can
compute the gradients of L with respect to u(k) and r (k) for all k in the order from
left to right using (40), (41), (43), and (44), starting from ∂L/∂u(K) = u(K) − u∗,
as follows:

∂L/∂u(K) → ∂L/∂r(K) → · · · → ∂L/∂r(k+1) → ∂L/∂u(k) → ∂L/∂r(k) → · · · → ∂L/∂u(0).    (45)

Next, we need the gradients of r(k) and u(k) with respect to the network parameters.
From (18a), the gradient of r(k) with respect to the step size αk is

∂r(k)/∂αk = −∇h(u(k−1)).    (46)
The gradient of u(k) with respect to wj(k) in the jth layer of the CNN H(k) defined
in (17) can be obtained by applying the chain rule to (18b):

∂u(k)/∂wj(k) = ∇H̃(k)(sk) · S′θk(hk) · ∂hk/∂wj(k)    (47)

for j = 1, 2, where hk is the output of H(k) given the input r(k) and sk is the output
of Sθk given the input hk, as defined in (42). The partial derivative ∂hk/∂wj(k) is standard
as in the back-propagation of a CNN, the details of which we omit here. Similarly, the
gradient of u(k) with respect to w̃j(k) in the jth layer of the CNN H̃(k) defined in (17)
can be obtained, since u(k) and sk are the output and input of H̃(k), respectively. The
gradient of u(k) with respect to θk is slightly different:

∂u(k)/∂θk = ∇H̃(k)(sk) · ∂Sθk(hk)/∂θk.    (48)

In this case, we will need to treat Sθk(hk) ∈ Rn as a function of θk for given hk, i.e.,
S·(hk) : θk → Sθk(hk), defined componentwise by

[Sθk(hk)]i =  −θk + [hk]i   if 0 < θk < [hk]i,
              θk + [hk]i    if 0 < θk < −[hk]i,
              0             otherwise,    (49)

whose derivative with respect to θk is therefore −1, 1, or 0 on the respective branches.
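A small numerical sketch of Sθ and the partial derivatives entering the chain rule is given below; the convention of assigning derivative zero on the dead zone and at the kinks is an implementation choice.

```python
import numpy as np

def soft_threshold(h, theta):
    """S_theta(h) applied componentwise."""
    return np.sign(h) * np.maximum(np.abs(h) - theta, 0.0)

def d_soft_d_h(h, theta):
    """Derivative of S_theta(h) with respect to its input h (zero inside the dead zone)."""
    return (np.abs(h) > theta).astype(float)

def d_soft_d_theta(h, theta):
    """Derivative with respect to theta, following (49):
    -1 where h > theta, +1 where h < -theta, 0 otherwise."""
    out = np.zeros_like(h)
    out[h > theta] = -1.0
    out[h < -theta] = 1.0
    return out

h = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
theta = 1.0
print(soft_threshold(h, theta), d_soft_d_h(h, theta), d_soft_d_theta(h, theta))
```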
With all the partial derivatives obtained above, we can apply the chain rule to
compute the gradient of L with respect to each of the network parameters. For
example,
∂L/∂αk = (∂L/∂r(k)) (∂r(k)/∂αk),    (51)

where ∂L/∂r(k) is obtained by (40) and (41) following the back-propagation process
and ∂r(k)/∂αk is obtained by (46). The partial derivatives with respect to the other
parameters can be computed similarly, e.g.,

∂L/∂θk = (∂L/∂u(k)) (∂u(k)/∂θk),   ∂L/∂wj(k) = (∂L/∂u(k)) (∂u(k)/∂wj(k)),   ∂L/∂w̃j(k) = (∂L/∂u(k)) (∂u(k)/∂w̃j(k)),

where ∂L/∂u(k) is obtained by (43) and (44), and the partial derivatives of u(k) with
respect to θk, wj(k), and w̃j(k) are obtained as explained above.
With these gradients of L with respect to the network parameters, we can employ
a stochastic gradient descent (SGD) method to find the optimal parameter Θ∗
that minimizes (21) over the entire training dataset. With the optimal Θ∗, ISTA-
Net works as a feedforward mapping, which takes imaging data b and outputs a
reconstructed image u(K). This feedforward mapping can be computed very fast,
since all operations in (18) are explicit given Θ∗.
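In modern deep learning frameworks these chain-rule computations are carried out by automatic differentiation; the toy sketch below (a single simplified phase without the CNNs, with all quantities illustrative) shows how the gradients with respect to αk and θk are obtained by back-propagation.

```python
import torch

def soft(x, theta):
    return torch.sign(x) * torch.relu(torch.abs(x) - theta)

b = torch.randn(16)
u_prev = torch.randn(16)
u_star = torch.randn(16)
alpha = torch.tensor(0.1, requires_grad=True)
theta = torch.tensor(0.05, requires_grad=True)

r = u_prev - alpha * (u_prev - b)        # (18a) with h(u) = 0.5||u - b||^2 and A = I
u = soft(r, theta)                       # simplified (18b) without the CNNs
loss = 0.5 * ((u - u_star) ** 2).sum()
loss.backward()                          # back-propagation through the phase
print(alpha.grad, theta.grad)
```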
References
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Aubert, G., Vese, L.: A variational method in image recovery. SIAM J. Numer. Anal. 34(5), 1948–
1979 (1997)
Bennett Landman, S.W.E.: 2013 diencephalon free challenge (2013). https://doi.org/10.7303/
syn3270353
Borgerding, M., Schniter, P., Rangan, S.: Amp-inspired deep networks for sparse linear inverse
problems. IEEE Trans. Signal Process. 65(16), 4293–4308 (2017)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011)
Chen, X., Liu, J., Wang, Z., Yin, W.: Theoretical linear convergence of unfolded ista and its
practical weights and thresholds. In: Advances in Neural Information Processing Systems, pp.
9061–9071 (2018)
Chen, Y., Liu, H., Ye, X., Zhang, Q.: Learnable descent algorithm for nonsmooth nonconvex image
reconstruction. arXiv preprint arXiv:2007.11245 (2020)
Cheng, J., Wang, H., Ying, L., Liang, D.: Model learning: Primal dual networks for fast mr imaging.
ArXiv abs/1908.02426 (2019)
Chun, I.Y., Huang, Z., Lim, H., Fessler, J.A.: Momentum-net: Fast and convergent iterative neural
network for inverse problems. arXiv preprint arXiv:1907.11818 (2019)
Dal Maso, G., Morel, J.M., Solimini, S.: A variational method in image segmentation: Existence
and approximation results. Acta Math. 168(1), 89–151 (1992)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical
image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.
248–255. IEEE (2009)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge
(2016)
Gregor, K., LeCun, Y.: Learning fast approximations of sparse coding. In: J. Fürnkranz,
T. Joachims (eds.) Proceedings of the 27th International Conference on Machine Learning (ICML
2010), pp. 399–406, Haifa (2010)
Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F.: Learning
a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6),
3055–3071 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Heide, F., Steinberger, M., Tsai, Y.T., Rouf, M., Pająk, D., Reddy, D., Gallo, O., Liu, J., Heidrich,
W., Egiazarian, K., et al.: Flexisp: A flexible camera image processing framework. ACM Trans.
Graph. 33(6), 231 (2014)
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V.,
Nguyen, P., Kingsbury, B., et al.: Deep neural networks for acoustic modeling in speech
recognition. IEEE Signal Process. Mag. 29, 82–97 (2012)
Koepfler, G., Lopez, C., Morel, J.M.: A multiscale algorithm for image segmentation by variational
method. SIAM J. Numer. Anal. 31(1), 282–299 (1994)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural
networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., Ashok, A.: Reconnet: Non-iterative reconstruction
of images from compressively sensed measurements. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 449–458 (2016)
Li, C., Yin, W., Jiang, H., Zhang, Y.: An efficient augmented lagrangian method with applications
to total variation minimization. Comput. Optim. Appl. 56(3), 507–530 (2013)
Liu, J., Chen, X., Wang, Z., Yin, W.: Alista: Analytic weights are as good as learned weights in
lista. In: ICLR (2019)
Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database:
New challenges for image quality assessment models. IEEE Trans. Image Process. 26(2), 1004–
1016 (2016)
Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and
its application to evaluating segmentation algorithms and measuring ecological statistics. In:
Proceedings of 8th International Conference Computer Vision, vol. 2, pp. 416–423 (2001)
Meinhardt, T., Moller, M., Hazirbas, C., Cremers, D.: Learning proximal operators: Using
denoising networks for regularizing inverse imaging problems. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 1781–1790 (2017)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152
(2005)
Rick Chang, J., Li, C.L., Poczos, B., Vijaya Kumar, B., Sankaranarayanan, A.C.: One network to
solve them all–solving linear inverse problems using deep projection models. In: Proceedings
of the IEEE International Conference on Computer Vision, pp. 5888–5897 (2017)
Roth, S., Black, M.J.: Fields of experts. Int. J. Comput. Vis. 82(2), 205 (2009)
Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language
understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Springer, New York (2009)
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D.: A deep cascade of convolu-
tional neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging 37(2),
491–503 (2018)
Shi, W., Jiang, F., Liu, S., Zhao, D.: Scalable convolutional neural network for image compressed
sensing. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Shi, W., Jiang, F., Zhang, S., Zhao, D.: Deep networks for compressed image sensing. In: 2017
IEEE International Conference on Multimedia and Expo (ICME), pp. 877–882. IEEE (2017)
Socher, R., Bengio, Y., Manning, C.D.: Deep learning for nlp (without magic). In: Tutorial
Abstracts of ACL 2012, pp. 5–5. Association for Computational Linguistics (2012)
Sprechmann, P., Bronstein, A.M., Sapiro, G.: Learning efficient sparse and low rank models. IEEE
Trans. Pattern Anal. Mach. Intell. 37(9), 1821–1833 (2015)
Sun, J., Li, H., Xu, Z., et al.: Deep admm-net for compressive sensing mri. In: Advances in Neural
Information Processing Systems, pp. 10–18 (2016)
Timofte, R., De Smet, V., Van Gool, L.: A+: Adjusted anchored neighborhood regression for fast
super-resolution. In: Asian Conference on Computer Vision, pp. 111–126. Springer (2014)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin,
I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–
6008 (2017)
Wang, G., Zhang, Y., Ye, X., Mou, X.: Machine Learning for Tomographic Imaging (2019). IOP
Publishing. https://doi.org/10.1088/2053-2563/ab3cc4
Wang, S., Fidler, S., Urtasun, R.: Proximal deep structured models. In: Advances in Neural
Information Processing Systems, pp. 865–873 (2016)
Xie, X., Wu, J., Zhong, Z., Liu, G., Lin, Z.: Differentiable linearized admm. arXiv preprint
arXiv:1905.06179 (2019)
Xin, B., Wang, Y., Gao, W., Wipf, D., Wang, B.: Maximal sparsity with deep networks? In:
Advances in Neural Information Processing Systems, pp. 4340–4348 (2016)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European
Conference on Computer Vision, pp. 818–833. Springer (2014)
Zhang, J., Ghanem, B.: Ista-net: Interpretable optimization-inspired deep network for image
compressive sensing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1828–1837 (2018)
Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3929–
3938 (2017)
Bilevel Optimization Methods in Imaging
24
Juan Carlos De los Reyes and David Villacís
Contents
Introduction  910
Variational Inverse Problems Setting  912
Image Reconstruction as an Inverse Problem  912
Regularizers  912
Restoration Models  915
Optimality and Duality  915
Solution Methods  916
Bilevel Optimization in Imaging  917
Total Variation Gaussian Denoising  919
Solution Algorithms  924
Infinite-Dimensional Case  924
Existence and Other Properties  926
Stationarity Conditions  927
Dualization  929
Nonlocal Problems  930
Neural Network Optimization  932
Deep Neural Networks as a Further Regularizer  933
Deep Unrolling Within Optimization  933
Numerical Experiments  934
Conclusions  938
References  939
Abstract
Optimization techniques have been widely used for image restoration tasks,
as many imaging problems may be formulated as minimization ones with the
Introduction
Several classical image processing tasks such as denoising, inpainting, and deblur-
ring, among others, may be treated as minimization problems in suitable function
spaces and using properly chosen energy functionals, typically nonsmooth ones.
As a consequence, the historical connection between optimization and imaging
has been very fruitful, and several analytical and algorithmic developments have
originated from this close relationship. We refer to Chambolle and Pock (2016)
and the references therein for a thorough review on these links and current
developments.
More recently, new optimization ideas entered the scene hand in hand with
modern data-driven approaches. Although machine learning techniques have years
of tradition on solving inverse and imaging problems, its use in combination with
structural properties of the mathematical models has proven to be of relevance,
leading to state-of-the-art developments and applications (see, e.g., Calatroni et al.
2017; Arridge et al. 2019; Holler et al. 2018; Hintermüller and Papafitsoros 2019;
Sherry et al. 2020).
A learning approach that combines practical and theoretical advantages is bilevel
optimization. Within this setting, the imaging problems are considered as lower-
level constraints, while on the upper-level a loss function, based on a training
set, is used for estimating the different parameters involved in the models. The
resulting mathematical problems pose different challenges that need to be addressed
using sophisticated tools from variational and nonsmooth analysis (Outrata 2000;
Mordukhovich 2018; Schirotzek 2007).
A prototypical problem in this direction is the parameter learning associated with
image restoration models. An initial contribution in this respect was the paper by
Tappen and coauthors Tappen (2007), where the parameters of a Markov random
fields model were learned by means of variational optimization. Thereafter, Haber
and coauthors Haber et al. (2008) considered a general learning approach for inverse
problems and, although no mathematical theory was developed, made a case for the
Variational Inverse Problems Setting

Image Reconstruction as an Inverse Problem

In a general setting, the image acquisition process can be modeled as
f = A(u) + n, (1)
where u is the original image, f is the observed degraded image, n is the noise
contained in the observed image, and A is a possibly nonlinear forward operator
that models the acquisition process. In most imaging problems, the operator A is
rank deficient, leading to an ill-posed inverse problem. Therefore, nonuniqueness of
solutions or instability of the direct inversion of such operator motivates the use of
different solution techniques.
A classical way to solve such inverse problems is to make use of a variational
“energy” formulation. Using this methodology, we can state the solution of (1) as
the solution of the following optimization problem:
where û is the reconstructed image, H a bounded linear operator, F the data fidelity,
and R a regularization term. The parameters λ and α affect the contribution of
the fidelity and regularization terms to the final solution, respectively. The choice
of these two terms has a crucial impact on the solution. Indeed, the data fidelity
term models the type of noise present in the image, while the regularization term
promotes certain features which are known a priori about the image.
Regularizers
A seminal idea proposed by Tikhonov and Arsenin (1977) for the solution of inverse
problems is to use the following type of regularization term:
R(∇u, α) = α ∫_Ω ‖∇u‖₂² dx,    (3)
Here, ΩI stands for the interaction domain of a bounded region Ω consisting of all
points outside of the domain that interact with points inside of it. The function w(t)
controls the intensity threshold at which the nonlocal filter acts and is the target of
a learning scheme. For a comparison between total variation and nonlocal means,
see Fig. 1.
Fig. 1 Comparison of regularizers in variational image denoising. (a) Noisy (b) Total Variation
(c) Nonlocal Means
Restoration Models
Three well-known image restoration tasks are denoising, deblurring, and inpainting.
The goal of denoising is to recover a noise-free image u from a particular noise
contaminated one f . This perturbation is usually modeled based on the statistical
estimates or approximated by a proper noise model coming from the physics behind
the acquisition of f. For normally distributed (Gaussian) noise, the data term corresponds to a
squared Euclidean norm (Rudin et al. 1992):

F(u, λ) := λ ∫_Ω |u − f|² dx.    (8)
In the case of a Poisson noise distribution present in the damaged image, the
data fidelity term was studied in Sawatzky et al. (2009) and Le et al. (2007) and has
the form F(u, λ) := λ ∫_Ω (u − f log u) dx. In Nikolova (2004), the author studied
impulse noise contaminated images and proposed the nonsmooth data fidelity term
F(u, λ) := λ ∫_Ω |u − f| dx. Other convex and non-convex data fidelity models,
as well as several combinations, have been investigated as well.
In the case of deblurring, the task consists in recovering a sharp image from its
blurry observation. This blur usually arises as optical blur from the deviation of
the object from the focused imaging plane, as mechanical blur from the rapid motion
of either the target object or the imaging device, or as medium-induced blur due
to the optical turbulence of the propagation medium. Given a blur operator A, the image
deblurring problem reads

F(u, λ) := λ ∫_Ω |A(u) − f|² dx.    (9)
the use of convex analysis tools for characterizing the solution of the restoration
models at hand.
By restating problem (2) for fixed parameters λ ∈ Pλ+ and α ∈ Pα+, we obtain

min_{u∈X}  Fλ(u) + Rα(Hu),    (11)

where X, Y are two Banach spaces and Pλ+, Pα+ are suitable positive sets in
the parameter spaces. Assuming that Rα : Y → R is a proper, convex, lower
semicontinuous, and possibly nonsmooth function; Fλ : X → R a smooth, proper,
convex, and lower semicontinuous function; and H : X → Y a bounded linear
operator, the optimality condition for this primal problem reads

0 ∈ ∇Fλ(û) + H∗ ∂Rα(H û),

where ∂(·) denotes the standard convex analysis subdifferential. Introducing the dual
multiplier q ∈ Y, the dual problem of (11) is given by
Solution Methods
Since the nonsmoothness of the function Rα prevents the direct use of standard
differentiable techniques, there are several numerical strategies for finding solutions
to (2). A first idea consists in solving this type of problems by making use of
subgradient-based methods for dealing with the primal problem directly. Although
this appears to be the most natural approach, this option has the drawback of the
classical slow convergence rate of subgradient methods (Beck 2017 Chapter 8).
By exploiting the differentiability of Fλ and the fact that in general the regularizer
Rα is a simple convex lower semicontinuous function, forward-backward splitting
schemes were developed, where in each iteration a gradient descent step on F and
a proximal step on Rα are performed. The resulting algorithm behaves robustly
and gets faster as the smoothness properties of Fλ improve. Moreover, accelerated
versions of this scheme (like the FISTA algorithm) became quite popular in the last
years.
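A bare-bones forward-backward loop of this kind, for an ℓ1-regularized instance of the general restoration model with an identity operator, may look as follows; all parameter values are placeholders.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal step on the regularizer R = ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(f, A, At, lam=1.0, alpha=0.1, tau=0.5, n_iter=100):
    """Gradient step on the fidelity F = (lam/2)||Au - f||^2, proximal step on alpha*R."""
    u = At(f)
    for _ in range(n_iter):
        u = prox_l1(u - tau * lam * At(A(u) - f), tau * alpha)
    return u

A = lambda x: x        # toy forward operator (identity)
At = lambda y: y
f = np.random.randn(64)
u_hat = forward_backward(f, A, At)
```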
Alternatively, the saddle point formulation (15) may be numerically exploited.
A popular strategy considers an alternate update, where first a descent step for the
primal variable u is performed and thereafter an ascent step in the dual variable
p is carried out. This procedure, called ADMM, can further be sped up by
considering a relaxation step (see, e.g., Chambolle and Pock 2011). These primal-
dual update steps are well-suited for parallel computation, making these methods
practical for high-resolution image denoising (Villacís 2017). Related popular
primal-dual methods are the well-known Douglas-Rachford and the Chambolle-
Pock algorithms. An extension to nonlinear operators H can be found in Valkonen
(2014).
Another frequent numerical alternative consists in regularizing the non-
differentiable term by means of a sufficiently smooth function. As a consequence,
fast second-order methods, i.e., methods where both gradient and Hessian
information is used to define a descent direction, may be devised for the solution of
the regularized problems. Indeed, Newton and semismooth Newton methods, along
with globalization strategies, have been used to solve image restoration models (see,
e.g., Hintermüller and Stadler 2006; De los Reyes and Schönlieb 2013).
Bilevel Optimization in Imaging

Given a training set of P pairs (u_k^train, f_k) of ground truth images and corresponding corrupted data, the bilevel learning problem reads

min_{(λ,α)}  Σ_{k=1}^{P} J(u_k, u_k^train)    (16)

subject to  u_k ∈ arg min_u  E(u, λ, α, f_k),   k = 1, . . . , P,    (17)
where the upper-level problem handles the optimal parameter loss function J , while
the lower-level problem corresponds to the restoration model of interest.
A general family of lower-level problems that allow us to learn the noise model,
as described in De los Reyes and Schönlieb (2013) and Calatroni et al. (2013), as
well as the weights for a family of regularizers (De los Reyes et al. 2017; Kunisch
and Pock 2013) is given by the energy
arg min_{u∈R^n}  E(u, λ, α, f) := Σ_{j=1}^{M} Σ_{i=1}^{r_j} λ_{j,i} φ_j(u; f)_i + Σ_{l=1}^{N} Σ_{i=1}^{s_l} α_{l,i} ‖(B_l u)_i‖.    (18)

If scalar parameters are considered instead, the energy takes the form

arg min_{u∈R^n}  E(u, λ, α, f) := Σ_{j=1}^{M} λ_j Σ_{i=1}^{r_j} φ_j(u; f)_i + Σ_{l=1}^{N} α_l Σ_{i=1}^{s_l} ‖(B_l u)_i‖,    (19)

while, more generally, the parameters may enter through linear operators P_j and Q_l (allowing, e.g., for spatially dependent weights):

arg min_{u∈R^n}  E(u, λ, α, f) := Σ_{j=1}^{M} Σ_{i=1}^{r_j} P_j(λ_j)_i φ_j(u; f)_i + Σ_{l=1}^{N} Σ_{i=1}^{s_l} Q_l(α_l)_i ‖(B_l u)_i‖.    (20)
Most classical image denoising variational models (TV-l2 , TV-l1 , TGV-l2 , ICTV-l2 ,
etc.) as well as TV deblurring and inpainting are instances of the latter.
An essential component of the bilevel problem (16) and (17) is the loss function J,
which models the quality of the reconstruction when
compared to the original image provided in the dataset. One classic approach is to
compute the difference between a ground truth image u^train and its reconstruction u
using the mean squared error (MSE) criterion J(u, u^train) = MSE(u, u^train) := (1/2)‖u −
u^train‖₂², which is closely related to the peak signal-to-noise ratio quality measure
PSNR(u, u^train) := 10 log₁₀( 255² / MSE(u, u^train) ). Even though this measure is
widely used in the imaging community due to its low computational complexity,
it depends strongly on the image intensity scaling. Furthermore, PSNR does not
necessarily coincide with the human visual response to image quality.
A more reliable quality measure is the structural similarity index
(SSIM) (Wang et al. 2004), which can be cast as

J(u, u^train) = SSIM(u, u^train) = l(u, u^train) c(u, u^train) s(u, u^train),

where

l(u, u^train) = (2 μ_u μ_{u^train} + C1) / (μ_u² + μ_{u^train}² + C1),
c(u, u^train) = (2 σ_u σ_{u^train} + C2) / (σ_u² + σ_{u^train}² + C2),
s(u, u^train) = (2 σ_{u u^train} + C3) / (σ_u σ_{u^train} + C3),
and μu and σu correspond to the mean luminance and the standard deviation of the
image u, respectively. The use of this quality measure in the bilevel optimization
context is, however, restrictive due to its nonsmoothness and non-convexity.
An alternative loss function aimed at prioritizing jump preservation was proposed
in De los Reyes et al. (2017), where the authors make use of a Huber regularization
of a total variation cost:

J(u, u^train) := Σ_{j=1}^{m} ‖( K(u − u^train) )_j‖_γ,

with ‖·‖_γ denoting the Huber-regularized norm. This loss function is differentiable and convex,
and it has proven advantageous for evaluating the quality of the reconstructed image.
To simplify the exposition of the methodology, let us restrict the analysis to the
bilevel problem (16) in the specific case of total variation denoising and a single
image dataset (utrain , f ). By considering a scale-dependent parameter λ ∈ Rn+ , our
bilevel problem then reads
min_{λ∈R^n_+}  J(u(λ), u^train)    (21)

s.t.  u(λ) = arg min_{u∈R^n}  Σ_{i=1}^{n} (λ_i/2) |u_i − f_i|² + Σ_{i=1}^{s} ‖(Ku)_i‖.    (22)
Since the lower-level problem is convex, it can be equivalently characterized by its necessary and sufficient optimality condition, a variational inequality of the second kind, so that the bilevel problem becomes

min_{λ∈R^n_+}  J(u(λ), u^train)    (23)

s.t.  ⟨λ ∘ (u − f), v − u⟩ + Σ_{i=1}^{s} ‖(Kv)_i‖ − Σ_{i=1}^{s} ‖(Ku)_i‖ ≥ 0,   ∀v ∈ R^n,    (24)
where ◦ stands for the Hadamard product between vectors. This is an optimization
problem constrained by a variational inequality of the second kind, along with non-
negativity constraints for the parameter λ.
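As a purely illustrative numerical sketch (not one of the algorithms analyzed here), the lower-level problem (22) can be solved approximately with a smoothed TV term and plain gradient descent, after which the upper-level cost (21) is evaluated for a given pixelwise λ:

```python
import numpy as np

def grad_op(u):
    gx = np.roll(u, -1, axis=0) - u; gx[-1, :] = 0
    gy = np.roll(u, -1, axis=1) - u; gy[:, -1] = 0
    return gx, gy

def lower_level(f, lam, eps=1e-3, step=0.2, n_iter=300):
    """Approximate solution of (22), with the TV term smoothed as sum_i sqrt(|(Ku)_i|^2 + eps^2)."""
    u = f.copy()
    for _ in range(n_iter):
        gx, gy = grad_op(u)
        norm = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
        px, py = gx / norm, gy / norm
        # discrete divergence of the normalized gradient field (gradient of the smoothed TV)
        div = (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))
        u = u - step * (lam * (u - f) - div)
    return u

def upper_level_loss(lam, f, u_train):
    u = lower_level(f, lam)
    return 0.5 * np.sum((u - u_train) ** 2)       # cost (21)

u_train = np.zeros((32, 32))
f = u_train + 0.1 * np.random.randn(32, 32)
lam = np.full((32, 32), 5.0)                      # scale-dependent parameter lambda
print(upper_level_loss(lam, f, u_train))
```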
Moreover, using duality techniques, the variational inequality of the second
kind in problem (23) can be equivalently written in primal-dual form, yielding the
following reformulation of problem (21):
minimize_{(λ,u,q) ∈ R^n × R^n × R^{s×2}}  J(u, u^train)
subject to  λ ∘ (u − f) + K⊤q = 0,
            ⟨q_j, (Ku)_j⟩ = ‖(Ku)_j‖,   ∀j = 1, . . . , s,    (25)
            ‖q_j‖ ≤ 1,   ∀j = 1, . . . , s,
            λ_j ≥ 0,   ∀j = 1, . . . , n.
used to derive stationarity conditions, at the price of being possibly weaker than the
ones originally expected.
In that sense, a first idea consists in carrying out a nonsmooth analysis of the
solution operator associated to the lower-level problem. Indeed, it can be shown
(De los Reyes and Meyer 2016; Hintermüller and Wu 2015) that the solution
mapping S : Rn+ → Rn , λ → u, for the lower-level problem is Bouligand
differentiable, i.e., directionally differentiable and locally Lipschitz continuous.
Using the chain rule for B-differentiable functions, the composite loss function is
Bouligand differentiable as well (Dontchev and Rockafellar 2009). This implies that
the problem (28) can be written in reduced form as
min
λ∈Rn+ J (u(λ), utrain ) (28)
In this framework, the subdifferential of the total variation term is described by the set-valued map

Q(u) := { K⊤q : q ∈ R^{s×2},  q_j = (Ku)_j / ‖(Ku)_j‖ if (Ku)_j ≠ 0,  ‖q_j‖ ≤ 1 if (Ku)_j = 0 },

and the limiting (Mordukhovich) normal cone to its graph is given by

N^M_{Gph Q}(u, K⊤q) = { (K⊤w, v) :
    ‖(Ku)_j‖ w_j = (Kv)_j − ⟨(Kv)_j, q_j⟩ q_j,   if (Ku)_j ≠ 0;
    (Kv)_j = 0,   if (Ku)_j = 0 and ‖q_j‖ < 1;
    (Kv)_j = 0  ∨  [ (Kv)_j = c q_j (c ∈ R), ⟨w_j, q_j⟩ = 0 ]  ∨  [ (Kv)_j = c q_j (c ≥ 0), ⟨w_j, q_j⟩ ≥ 0 ],   if (Ku)_j = 0 and ‖q_j‖ = 1 },

which leads to a stationarity condition of the form

( 0    −diag(u∗ − f) ) ( K⊤w )
( I    −diag(λ∗)     ) ( v   )   ∈   {0} × N^M_{R^n_+}.    (30)
In particular, an M-stationary point (λ∗, u∗, q∗) is characterized by the existence of multipliers (p, ϕ, ϑ) such that

λ ∘ (u∗ − f) + K⊤q∗ = 0,    (31a)
⟨q∗_j, (Ku∗)_j⟩ = ‖(Ku∗)_j‖,   ∀j = 1, . . . , s,    (31b)
‖q∗_j‖ ≤ 1,   ∀j = 1, . . . , s,    (31c)
λ ∘ p + K⊤ϕ = ∇_u J(u∗),    (31d)
(u∗ − f) ∘ p + ϑ = 0,    (31e)
‖(Ku∗)_j‖ ϕ_j = (Kp)_j − ⟨(Kp)_j, q∗_j⟩ q∗_j,   if (Ku∗)_j ≠ 0,    (31f)
0 ≤ λ ⊥ ϑ ≥ 0.    (31i)
The difference between the M-stationarity and strong stationarity systems concerns the
information about the multipliers on the so-called biactive set B = { j ∈ {1, . . . , s} :
(Ku)_j = 0, ‖q_j‖ = 1 }. The biactive characterization of those multipliers in (31h)
is actually weaker than in a strong stationarity system.
Finally, it can be proven that if strict complementarity holds, i.e., if the biactive
set is empty, all strong, B-, M-, and C-stationarity conditions are equivalent (see,
e.g., De los Reyes 2015; De los Reyes and Meyer 2016).
Solution Algorithms
When dealing with the numerical optimization of the bilevel problem, the solution
of a regularized version of (28) appears to be the most frequent approach. In this
line, the nonsmoothness is regularized by means of a differentiable function, and
nonlinear optimization methods are then applied. In De los Reyes and Schönlieb
(2013), for instance, the authors implement a BFGS algorithm with Armijo back-
tracking to solve a regularized bilevel problem for image denoising. Alternatively,
the authors in Hintermüller and Wu (2015) propose a projected gradient method to
find stationary points in the case of blind deconvolution.
For dealing with the nonsmooth bilevel problem, we point out the works (Outrata
and Zowe 1995) and (Christof et al. 2020). In the first one, subgradients of the
reduced cost function are computed by means of a generalized adjoint equation,
while, in the second one, a trust-region method exploiting the nonsmooth Bouligand
subdifferential properties of the solution operator is proposed. Both algorithms are
precisely devised for optimization problems with variational inequality constraints,
and convergence toward a C-stationary point is verified in the second one.
Infinite-Dimensional Case
Considering images as functions instead of vectors has proven to be superior for different imaging tasks, and, in order
to consider spatially dependent parameters, the function space framework appears
indeed to be the natural choice in this context.
Considering as image domain the open bounded convex set Ω ⊂ R², and
assuming that the noisy image f lies in the Hilbert space Y = L²(Ω), the bilevel problem,
for a single training pair, consists in searching for parameters λ = (λ1, . . . , λM) and
α = (α1, . . . , αN) in abstract nonnegative parameter sets Pλ+ and Pα+ that solve

min_{(λ,α) ∈ Pλ+ × Pα+}  J(u_{λ,α})   subject to   u_{λ,α} ∈ arg min_{u∈BV(Ω)}  E(u; λ, α),    (P)

with

E(u; λ, α) := Σ_{i=1}^{M} ∫_Ω λ_i(x) φ_i(x, [Au](x)) dx + Σ_{j=1}^{N} ∫_Ω α_j(x) d|B_j u|(x).

For the numerical treatment, regularized problems (P_{γ,μ}) are considered, in which the lower-level energy is replaced by

E^{γ,μ}(u; λ, α) := μ H(u) + Σ_{i=1}^{M} ∫_Ω λ_i(x) φ_i(x, [Au](x)) dx + Σ_{j=1}^{N} ∫_Ω α_j(x) d|B_j u|_γ(x).
The measure |ν|γ corresponds to the Huber regularization of the total variation
measure |ν|.
The first questions to be answered concerning the bilevel problem (P) are related
to the existence of optimal parameters as well as the structure of the optimizers. At
least partially, some answers to these inquiries have been given in De los Reyes et al.
(2016) (see also the review paper Calatroni et al. 2017). We briefly summarize next
the main results obtained in those references.
Considering the particular, but frequent, setup with quadratic loss functional and
fidelity term
J(u) = (1/2) ‖Au − u^train‖²_{L²(Ω)},   and   φ1(x, v) = (1/2) |f(x) − v|²,    (33)

and with M = 1 and Pλ+ = {1}, we may obtain conditions for positivity of the
parameters α = (α1, . . . , αN) ∈ Pα+ = [0, ∞]^N. In fact, suppose that f, f0 ∈
BV(Ω) ∩ L²(Ω) satisfy the condition

TV(f) > TV(f0);    (34)
then there exist μ̄, γ̄ > 0 such that any optimal solution α_{γ,μ} ∈ [0, ∞] to the
problem

min_{α∈[0,∞]}  (1/2) ‖u^train − u_α‖²_{L²(Ω)}

with

u_α ∈ arg min_{u∈BV(Ω)}  (1/2) ‖f − u‖²_{L²(Ω)} + α |Du|_γ(Ω) + (μ/2) ‖∇u‖²_{L²(Ω;R^n)}
satisfies αγ ,μ > 0, whenever μ ∈ [0, μ̄] and γ ∈ [γ̄ , ∞]. The choice γ = ∞ should
be understood as the standard unregularized total variation measure or norm.
For fixed values γ < ∞ and μ > 0, existence of an optimal parameter can
be proven by the direct method of the calculus of variations. What condition (34)
guarantees is existence of an optimal interior solution α > 0 to (P) without any
additional box constraints. Moreover, condition (34) also guarantees convergence of
optimal parameters of the numerically regularized H 1 problems (Pγ ,μ ) to a solution
of the original BV(Ω) problem (P).
A similar structural result may be obtained for second-order total generalized
variation (TGV²) Gaussian denoising, again assuming that the noisy data oscillates
more in terms of TGV² than the ground truth does. Specifically, if the data f, u^train ∈
L²(Ω) ∩ BV(Ω) satisfies for some α2 > 0 the condition

TGV²_{(α2,1)}(f) > TGV²_{(α2,1)}(u^train),    (35)
then there exist μ̄, γ̄ > 0 such that any optimal solution α_{γ,μ} = ((α_{γ,μ})1, (α_{γ,μ})2)
to the problem

min_{α∈[0,∞]²}  (1/2) ‖f0 − v_α‖²_{L²(Ω)}

with

(v_α, w_α) ∈ arg min_{v∈BV(Ω), w∈BD(Ω)}  (1/2) ‖f − v‖²_{L²(Ω)} + α1 |Dv − w|_γ(Ω) + α2 |Ew|_γ(Ω)
                                          + (μ/2) ‖(∇v, ∇w)‖²_{L²(Ω;R^n×R^{n×n})}

satisfies (α_{γ,μ})1, (α_{γ,μ})2 > 0, whenever μ ∈ [0, μ̄], γ ∈ [γ̄, ∞]. Observe that we
allow for infinite parameters α.
Additionally, a result on the approximation properties as γ → ∞ and μ → 0
is also obtained. In fact, for both previous settings, there exist γ̄ ∈ (0, ∞) and
μ̄ ∈ (0, ∞) such that the solution map (γ , μ) → αγ ,μ is outer semicontinuous
within [γ̄ , ∞] × [0, μ̄]. Roughly, the outer semicontinuity (Rockafellar and Wets
1998) of the solution map means that as the regularization vanishes, any optimal
parameters for the regularized models (Pγ ,μ ) tend to some optimal parameters of
the original model (P).
Stationarity Conditions
For the regularized problems, an optimality system may be derived, consisting of the lower-level (state) equation, an adjoint equation, and variational inequalities for the parameters:

μ ∫_Ω ⟨∇u, ∇v⟩ dx + Σ_{i=1}^{M} ∫_Ω λ_i φ_i′(Au) Av dx
    + Σ_{j=1}^{N} ∫_Ω α_j ⟨h_γ(B_j u), B_j v⟩ dx = 0,   ∀v ∈ V,    (36)

μ ∫_Ω ⟨∇p, ∇v⟩ dx + Σ_{i=1}^{M} ∫_Ω λ_i ⟨φ_i″(Au) Ap, Av⟩ dx
    + Σ_{j=1}^{N} ∫_Ω α_j ⟨h_γ′(B_j u) B_j p, B_j v⟩ dx = −∇_u J(u) v,   ∀v ∈ V,    (37)

∫_Ω φ_i′(Au) Ap (ζ − λ_i) dx ≥ 0,   ∀ζ ≥ 0,  i = 1, . . . , M,    (38)

∫_Ω ⟨h_γ(B_j u), B_j p⟩ (η − α_j) dx ≥ 0,   ∀η ≥ 0,  j = 1, . . . , N,    (39)
where V stands for the Sobolev space where the regularized image lives (typically a
subspace of H 1 (Ω; Rm )), p ∈ V stands for the adjoint state, and hγ is a regularized
version of the TV subdifferential, e.g.,
h_γ(z) :=  z/‖z‖                                             if γ‖z‖ − 1 ≥ 1/(2γ),
           (z/‖z‖) ( 1 − (γ/2)(1 − γ‖z‖ + 1/(2γ))² )          if γ‖z‖ − 1 ∈ (−1/(2γ), 1/(2γ)),    (40)
           γ z                                                if γ‖z‖ − 1 ≤ −1/(2γ).
The rigorous derivation of the optimality system has to be justified for each specific
combination of spaces, regularizers, noise models, and cost functionals.
With the help of the adjoint equation (37), gradient formulas for the reduced
cost functional J(λ, α) := J(u_{α,λ}, λ, α) can also be derived:

(∇_λ J)_i = ∫_Ω φ_i′(Au) Ap dx,      (∇_α J)_j = ∫_Ω ⟨h_γ(B_j u), B_j p⟩ dx.    (41)
Dualization
An alternative technique for studying the bilevel problem, via duality, was proposed
by Hintermüller and coauthors (2017), where the lower-level problem is replaced
by its pre-dual version. In the case of total variation and with a weight solely on the
regularizer, the bilevel problem becomes
β
min J (R(div p)) + α2H 1 (Ω) (D)
α≤α(x)≤α 2
μ γ 1
s.t. p ∈ arg min ∇p2L2 (Ω) + p2L2 (Ω) + div p + f 2L2 (Ω) ,
p∈H1 (Ω):|p(x)|∞ ≤α(x) 2
0
2 2
where μ, γ > 0 are regularization parameters and R stands for the localized residual
function. As a consequence, the necessary and sufficient optimality condition for the
lower-level problem becomes a variational inequality of the first kind, which may
be reformulated as a complementarity system as well. The abstract problem then
constitutes a mathematical program with equilibrium constraints in function space.
The treatment of this problem is, however, by no means easier than that of the primal
bilevel one. In fact, in order to carry out the analysis, the authors penalize the
pointwise box constraint by means of a Moreau-Yosida penalty term $P_\delta(p, \alpha)$, yielding
the problem
$$\min_{\underline{\alpha} \le \alpha(x) \le \overline{\alpha}}\ J(R(\operatorname{div} p)) + \frac{\beta}{2}\|\alpha\|_{H^1(\Omega)}^2$$
$$\text{s.t.}\quad p \in \arg\min_{p\in H_0^1(\Omega)}\ \frac{\mu}{2}\|\nabla p\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|p\|_{L^2(\Omega)}^2 + \frac{1}{2}\|\operatorname{div} p + f\|_{L^2(\Omega)}^2 + P_\delta(p, \alpha).$$
For each penalized problem, existence of Lagrange multipliers is then proven using
standard Karush-Kuhn-Tucker theory in function spaces. Although no limit analysis
is carried out in order to get an optimality system for problem (D), the authors
provide some useful density and stability results.
Nonlocal Problems
where
$$\gamma_w(x, y) := \exp\Bigl(-\int_{B_\rho(0)} w(\tau)\,\bigl(f(x+\tau) - f(y+\tau)\bigr)^2\,d\tau\Bigr)\,\chi_{\{y \in B(x)\}}$$
is the kernel defining the nonlocal energy norm. If w is a constant weight, the space is simply denoted
as V.
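For intuition, the following is a small, purely illustrative NumPy sketch of a discrete analogue of the kernel $\gamma_w$ for a one-dimensional signal. The discretization, the function name nonlocal_weights, and the truncation to a search window are assumptions made for this example and are not taken from d'Elia et al. (2019):

import numpy as np

def nonlocal_weights(f, w, search_radius):
    # Discrete analogue of gamma_w: compare patches f(x + tau) and f(y + tau),
    # weight the squared differences by w(tau), and keep only pairs (x, y)
    # within a search window (the characteristic function in the kernel).
    n = f.size
    r = (w.size - 1) // 2
    fp = np.pad(f, r, mode="edge")
    patches = np.stack([fp[i:i + 2 * r + 1] for i in range(n)])
    gamma = np.zeros((n, n))
    for x in range(n):
        lo, hi = max(0, x - search_radius), min(n, x + search_radius + 1)
        d2 = (patches[lo:hi] - patches[x]) ** 2
        gamma[x, lo:hi] = np.exp(-(d2 * w).sum(axis=1))
    return gamma

rng = np.random.default_rng(2)
f = np.concatenate([np.zeros(30), np.ones(30)]) + 0.05 * rng.standard_normal(60)
w = np.full(5, 1.0 / 5.0)                  # constant weight on the patch
G = nonlocal_weights(f, w, search_radius=10)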
Existence of an optimal solution for the bilevel problem in each of the settings
has been proven in d’Elia et al. (2019), under the inclusion of box constraints for the
parameters. For the case of a spatially dependent coefficient in front of the fidelity,
an extra Tikhonov regularization term has to be added to the loss functional to get
existence of an optimal solution.
In contrast to the variational regularizers reviewed before, for the nonlocal prob-
lem (43), Gâteaux differentiability of the solution operator can be demonstrated.
As a consequence, necessary optimality systems that characterize strong stationary
points can be established in each of the cases (see d’Elia et al. 2019 for further
details).
For the case when a spatially dependent weight $\lambda \in H^1(\Omega)$ is optimized, while
keeping the kernel fixed, a necessary optimality condition is given by the following
complementarity problem:
$$\int_{\Omega\cup\Omega_I}\int_{\Omega\cup\Omega_I} \bigl(u(x)-u(y)\bigr)\bigl(\psi(x)-\psi(y)\bigr)\,\gamma_w(x,y)\,dy\,dx + \int_\Omega \lambda\,(u-f)\,\psi\,dx = 0, \quad \forall \psi \in V_c, \qquad (44a)$$
$$\int_{\Omega\cup\Omega_I}\int_{\Omega\cup\Omega_I} \bigl(p(x)-p(y)\bigr)\bigl(\varphi(x)-\varphi(y)\bigr)\,\gamma_w(x,y)\,dy\,dx + \int_\Omega \lambda\,p\,\varphi\,dx = -\nabla_u J(u)\varphi, \quad \forall \varphi \in V_c, \qquad (44b)$$
where σΩ+ , −σΩ− , σΓ+ , σΓ− are Karush-Kuhn-Tucker multipliers associated to the
box constraints. As can be observed, in this case, the optimality system couples
local and nonlocal systems of equations with additional pointwise complementarity
conditions.
On the other hand, for the case when the weight within the kernel w ∈ L2 (Bρ (0))
is optimized, the following optimality system is satisfied:
$$\int_{\Omega\cup\Omega_I}\int_{\Omega\cup\Omega_I} \bigl(u(x)-u(y)\bigr)\bigl(\psi(x)-\psi(y)\bigr)\,\gamma_w(x,y)\,dy\,dx + \int_\Omega \lambda\,(u-f)\,\psi\,dx = 0, \quad \forall \psi \in V_c, \qquad (45a)$$
$$\int_{\Omega\cup\Omega_I}\int_{\Omega\cup\Omega_I} \bigl(p(x)-p(y)\bigr)\bigl(\varphi(x)-\varphi(y)\bigr)\,\gamma_w(x,y)\,dy\,dx + \int_\Omega \lambda\,p\,\varphi\,dx = -J'(u)\varphi, \quad \forall \varphi \in V_c, \qquad (45b)$$
$$\int_{\Omega\cup\Omega_I}\int_{\Omega\cup\Omega_I} \bigl(u(w)(x)-u(w)(y)\bigr)\bigl(p(x)-p(y)\bigr)\,\gamma_{(h-w)}(x,y)\,dy\,dx \ge 0, \quad \forall h \in U_{ad}, \qquad (45c)$$
with $U_{ad} := \{v \in L^2(B_\rho(0)) : 0 \le v(t) \le U,\ \forall\, t \in B_\rho(0)\}$ and the linearized
kernel
$$\gamma_h(x, y) = \gamma_w(x, y)\Bigl(-\int_{B_\rho(0)} h(\tau)\,\bigl(f(x+\tau) - f(y+\tau)\bigr)^2\,d\tau\Bigr). \qquad (45d)$$
In this case, even if “only” a scalar is determined, the computational cost becomes
high since in principle the kernel changes in each iteration of any solution algorithm
and, with it, the assembly of the nonlocal interaction matrix, which is in general
dense.
best properties of variational models and neural networks have been proposed. We
provide next a brief review of a few of them, with the sole purpose of highlighting
the importance of these connections.
Even though we have previously detailed bilevel learning strategies for variational
problems, recently also bilevel optimization approaches that make use of neural
networks have been proposed. In particular, Deep Bilevel Optimization Neural Net-
works (BOONet), introduced by Antil and coworkers (2020), develop a strategy for
finding optimal regularization parameters based on a bilevel optimization problem.
Here, an upper-level optimization problem is used to measure the reconstruction
error on a training dataset, while the lower-level problem measures the misfit of the
data reconstruction. This reconstruction is based on a generalized regularizer that
has a network-like structure, leading to insightful comparisons between regularizers and
activation functions used in neural networks.
Regarding the regularization term in (2), it has recently been further improved
by making use of pretrained CNNs. In Lunz et al. (2018), a data-driven regularizer
is built using modern generative adversarial network principles. A related direction
is the neural network Tikhonov (NETT) approach, where a pretrained network is
composed with a regularization functional (Li et al. 2020).
In Kobler et al. (2020), a different technique for learning regularizers is proposed,
called total deep variation. In this case, the regularizer is built using a multi-scale
convolutional neural network, whose training is based on a sampled optimal control
problem interpretation. This formulation allows the authors to provide a sensitivity
analysis of the learned coefficients with respect to the training dataset. It is worth
mentioning that this regularizer can be trained using a different dataset than the
application at hand, resembling the properties of transfer learning strategies.
Assuming we use an iterative scheme for solving (2) that is based on a proximal
operator
$$\operatorname{prox}_{\tau R}(\hat u) = \arg\min_{u\in\mathbb{R}^n}\ \frac{1}{2\tau}\|u - \hat u\|^2 + R(u), \qquad (46)$$
one possibility is to replace this operator by a network that is trained to
solve the variational imaging task of interest. Even though this training process is
computationally expensive, this procedure is often performed as an off-line batch
operation. Once trained, the network evaluation is less expensive when used in the
reconstruction process. Gregor and LeCun (2010) were able to incorporate these
ideas into an ISTA iterative scheme for solving a sparse coding problem (LISTA).
This procedure is based on “unrolling” the iterative scheme and replacing its explicit
updates with learned ones. Hershey et al. (2014) propose to unfold the iterations into a
layer-based structure similar to a neural network with application to learning optimal
parameters of Markov random fields and nonnegative matrix factorization.
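To illustrate the unrolling idea in the simplest setting, the following sketch contrasts classical ISTA for a sparse coding problem with an “unrolled” variant in which each iteration carries its own step size and threshold; in LISTA-type schemes these per-layer parameters (and the involved matrices) would be learned from data, whereas here they are simply set by hand. All names and the synthetic data are illustrative:

import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau*||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam, n_iter=200):
    # Classical ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x

def unrolled_ista(A, y, thetas):
    # "Unrolled" ISTA: a fixed number of layers, each with its own
    # (step size, threshold) pair that could be trained on data.
    x = np.zeros(A.shape[1])
    for step, thr in thetas:
        x = soft_threshold(x - step * (A.T @ (A @ x - y)), thr)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[rng.choice(60, size=5, replace=False)] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(30)
L = np.linalg.norm(A, 2) ** 2
x_ista = ista(A, y, lam=0.05)
x_unrolled = unrolled_ista(A, y, thetas=[(1.0 / L, 0.05 / L)] * 10)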
In the imaging context, these ideas have been considered in the unrolling of
iterative schemes for problems with the structure presented in (11), such as proximal
gradient (Adler and Öktem 2017), primal-dual hybrid gradient (Adler and Öktem
2018), or ADMM (Sun et al. 2016). This technique generates new tailor-designed
deep neural network architectures that make use of the structure within the problem
at hand. In the case of a Field of Experts (FoE) regularizer, this approach led to the
development of variational networks (Kobler et al. 2017), where the authors rely on
unrolling a proximal gradient descent step for a finite number of iterations and the
connections to residual neural networks (He et al. 2016) are highlighted.
Numerical Experiments
Table 1 SSIM quality measures obtained on the training and validation datasets
using the optimal parameters learned for different image denoising models
TRAINING
num noisy TV TGV NL SD-TV PD-TV4 PD-TV16 PD-TV32
1 0.5838 0.8583 0.8715 0.7889 0.8441 0.8405 0.8433 0.8341
2 0.5298 0.8397 0.8463 0.7729 0.8226 0.8107 0.8121 0.8194
3 0.4447 0.8412 0.8612 0.8433 0.8713 0.8651 0.8655 0.8639
4 0.5877 0.8159 0.8270 0.8026 0.8531 0.8505 0.8544 0.8625
5 0.4865 0.7896 0.8234 0.8110 0.8398 0.8498 0.8457 0.8607
6 0.4699 0.8285 0.8469 0.7909 0.8343 0.8275 0.8283 0.8281
7 0.4827 0.8413 0.8564 0.7909 0.8218 0.7727 0.7785 0.8017
8 0.4884 0.8095 0.8325 0.7751 0.8389 0.8370 0.8381 0.8391
9 0.6144 0.8353 0.8654 0.7934 0.8505 0.8484 0.8484 0.8495
10 0.5029 0.8087 0.8366 0.7945 0.8298 0.7992 0.8087 0.8313
mean 0.8268 0.8467 0.7963 0.8407 0.8298 0.8323 0.8391
VALIDATION
num noisy TV TGV NL SD-TV PD-TV4 PD-TV16 PD-TV32
1 0.6020 0.8232 0.8298 0.7847 0.7826 0.8301 0.8292 0.8219
2 0.5915 0.8557 0.8596 0.7094 0.6827 0.7572 0.7527 0.7399
3 0.5280 0.7480 0.7342 0.7707 0.7258 0.7844 0.7903 0.7850
4 0.5076 0.7816 0.7769 0.7221 0.7500 0.7961 0.7958 0.7884
5 0.4569 0.7944 0.7841 0.7728 0.7856 0.8254 0.8306 0.8284
6 0.5342 0.8215 0.8344 0.7258 0.7434 0.7847 0.7856 0.7783
7 0.4937 0.7865 0.7789 0.6591 0.7064 0.7628 0.7577 0.7375
8 0.5457 0.7453 0.7569 0.6903 0.7328 0.7780 0.7797 0.7708
9 0.4907 0.7567 0.7809 0.8092 0.7277 0.8036 0.7995 0.7855
10 0.5475 0.7937 0.8146 0.8359 0.8086 0.8586 0.8561 0.8452
mean 0.7907 0.7950 0.7480 0.7445 0.7981 0.7977 0.7881
Fig. 5 Average values of SSIM (red) and PSNR (blue) for the reconstruction of the validation
dataset using different parameter models
To prevent over-fitting and obtain better generalization properties, patch-dependent
regularization parameters with few degrees of freedom may be considered. To test
this statement and determine how many degrees of freedom serve that goal, we carry
out an additional experiment. Specifically,
the denoised results in the validation dataset for different dimensions are presented
in Fig. 5. Indeed, the restriction on the degrees of freedom for the regularization
parameter allows better generalization according to both the SSIM and the PSNR
quality measures.
Conclusions
References
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33(12), 124007 (2017)
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Antil, H., Di, Z.W., Khatri, R.: Bilevel optimization, deep learning and fractional Laplacian
regularization with applications in tomography. Inverse Probl. 36(6), 064001 (2020)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Bartels, S., Weber, N.: Parameter learning and fractional differential operators: application in image
regularization and decomposition. arXiv preprint arXiv:2001.03394 (2020)
Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics,
Philadelphia (2017)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3), 492–
526 (2010)
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: IEEE CVPR,
vol. 2, pp. 60–65 (2005)
Burger, H.C., Schuler, C.J., Harmeling, S.: Image denoising: can plain neural networks compete
with BM3D? In: 2012 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 2392–2399 (2012)
Calatroni, L., De los Reyes, J.C., Schönlieb, C.-B.: Dynamic sampling schemes for optimal noise
learning under multiple nonsmooth constraints. In: IFIP Conference on System Modeling and
Optimization, pp. 85–95 (2013)
Calatroni, L., Papafitsoros, K.: Analysis and automatic parameter selection of a variational model
for mixed Gaussian and salt-and-pepper noise removal. Inverse Probl. 35(11), 114001 (2019)
Calatroni, L., Cao, C., De los Reyes, J.C., Schönlieb, C.-B., Valkonen, T.: Bilevel Approaches for
Learning of Variational Imaging Models. Walter de Gruyter GmbH, pp. 252–290 (2017)
Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems.
Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. JMIV 40, 120–145 (2011)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25,
161–319 (2016)
Christof, C., De los Reyes, J.C., Meyer, C.: A nonsmooth trust-region method for locally Lipschitz
functions with application to optimization problems constrained by variational inequalities.
SIAM J. Optim. 30(3), 2163–2196 (2020)
D’Elia, M., De los Reyes, J.C., Miniguano, A.: Bilevel parameter optimization for nonlocal image
denoising models. arXiv preprint arXiv:1912.02347 (2019)
Dabov, K., et al.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE
Trans. Image Process. 16(8), 2080–2095 (2007)
Davoli, E., Fonseca, I., Liu, P.: Adaptive image processing: first order PDE constraint regularizers
and a bilevel training scheme. arXiv preprint arXiv:1902.01122 (2019)
Davoli, E., Liu, P.: One dimensional fractional order TGV: gamma-convergence and bilevel
training scheme. Commun. Math. Sci. 16(1), 213–237 (2018)
De los Reyes, J.C.: Optimal control of a class of variational inequalities of the second kind. SIAM
J. Control Optim. 49(4), 1629–1658 (2011)
De los Reyes, J.C.: Numerical PDE-Constrained Optimization. Springer, Cham (2015)
De los Reyes, J.C., Meyer, C.: Strong stationarity conditions for a class of optimization problems
governed by variational inequalities of the second kind. J. Optim. Theory Appl. 168(2), 375–409
(2016)
De los Reyes, J.C., Schönlieb, C.-B., Valkonen, T.: The structure of optimal parameters for image
restoration problems. J. Math. Anal. Appl. 434(1), 464–500 (2016)
De los Reyes, J.C., Schönlieb, C.-B., Valkonen, T.: Bilevel parameter learning for higher-order
total variation regularisation models. J. Math. Imaging Vision 57, 1–25 (2017)
De los Reyes, J.C., Schönlieb, C.-B.: Image denoising: learning the noise model via nonsmooth
PDE-constrained optimization. Inverse Probl. Imaging 7(4), 1183–1214 (2013)
Dempe, S.: Foundations of Bilevel Programming. Springer Science & Business Media, Boston
(2002)
Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-
resolution. In: European Conference on Computer Vision, pp. 184–199 (2014)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings, vol. 543. Springer,
New York (2009)
Du, Q., et al.: Analysis and approximation of nonlocal diffusion problems with volume constraints.
SIAM Rev. 54(4), 667–696 (2012)
Ehrhardt, M.J., Roberts, L.: Inexact Derivative-Free Optimization for Bilevel Learning. arXiv
preprint arXiv:2006.12674 (2020)
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. Society for Industrial and
Applied Mathematics, Philadelphia (1999)
Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Multiscale Model.
Simul. 7(3), 1005–1028 (2008)
Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. Multi-
scale Model. Simul. 6(2), 595–630 (2007)
Gregor, K., LeCun, Y.: Learning fast approximations of sparse coding. In: Proceedings of the
27th International Conference on International Conference on Machine Learning, pp. 399–406
(2010)
Gunzburger, M., Lehoucq, R.B.: A nonlocal vector calculus with application to nonlocal boundary
value problems. Multiscale Model. Simul. 8, 1581–1598 (2010)
Haber, E., Horesh, L., Tenorio, L.: Numerical methods for experimental design of large-scale linear
ill-posed inverse problems. Inverse Probl. 24(5), 055012 (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hershey, J.R., Le Roux, J., Weninger, F.: Deep unfolding: model-based inspiration of novel deep
architectures. arXiv preprint arXiv:1409.2574 (2014)
Hintermüller, M., Papafitsoros, K.: Generating Structured Nonsmooth Priors and Associated
Primal-Dual Methods, vol. 20, pp. 437–502. Elsevier (2019)
Hintermüller, M., Rautenberg, C.N.: Optimal selection of the regularization function in a weighted
total variation model. Part I: Modelling and theory. J. Math. Imaging Vis. 59(3), 498–514 (2017)
Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation–
based inf-convolution-type image restoration. SIAM J. Sci. Comput. 28(1), 1–23 (2006)
Hintermüller, M., Wu, T.: Bilevel optimization for calibrating point spread functions in blind
deconvolution. Inverse Probl. Imaging 9(4), 1139–1170 (2015)
Holler, G., Kunisch, K., Barnard, R.C.: A bilevel approach for parameter learning in inverse
problems. Inverse Probl. 34(11), 115012 (2018)
Knoll, F., Bredies, K., Pock, T., Stollberger, R.: Second order total generalized variation (TGV) for
MRI. Magn. Reson. Med. 65(2), 480–491 (2011)
Kobler, E., Effland, A., Kunisch, K., Pock, T.: Total Deep Variation for Linear Inverse Problems.
arXiv preprint arXiv:2001.05005 (2020)
Kobler, E., Klatzer, T., Hammernik, K., Pock, T.: Variational networks: connecting variational
methods and deep learning. In: German Conference on Pattern Recognition, pp. 281–293 (2017)
Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational
models. SIAM J. Imaging Sci. 6(2), 938–983 (2013)
Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by
Poisson noise. J. Math. Imaging Vis. 27(3), 257–263 (2007)
Li, H., et al.: NETT: solving inverse problems with deep neural networks. Inverse Probl. 36(6),
065005 (2020)
Lou, Y., et al.: Image recovery via nonlocal operators. J. Sci. Comput. 42(2), 185–197 (2010)
Lunz, S., Öktem, O., Schönlieb, C.-B.: Adversarial regularizers in inverse problems. arXiv preprint
arXiv:1805.11572 (2018)
Mordukhovich, B.S.: Variational Analysis and Applications. Springer, Cham (2018)
Nikolova, M.: A variational approach to remove outliers and impulse noise. J. Math. Imaging Vis.
20(1), 99–120 (2004)
Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media, New York
(2006)
Ochs, P., Ranftl, R., Brox, T., Pock, T.: Techniques for gradient-based bilevel optimization with
non-smooth lower level problems. J. Math. Imaging Vis. 56(2), 175–194 (2016)
Outrata, J.V.: A generalized mathematical program with equilibrium constraints. SIAM J. Control
Optim. 38(5), 1623–1638 (2000)
Outrata, J., Zowe, J.: A numerical approach to optimization problems with variational inequality
constraints. Math. Program. 68(1), 105–130 (1995)
Ring, W.: Structural properties of solutions to total variation regularization problems. ESAIM:
Math. Model. Numer. Anal. 34(4), 799–810 (2000)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer Science & Business
Media, Berlin (1998)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D: Nonlinear Phenom. 60(1–4), 259–268 (1992)
Sawatzky, A., Brune, C., Müller, J., Burger, M.: Total Variation Processing of Images with Poisson
Statistics, vol. 5702, pp. 533–540. Springer, Berlin/Heidelberg (2009)
Schirotzek, W.: Nonsmooth Analysis. Springer Science & Business Media, Berlin/Heidelberg
(2007)
Sherry, F., et al.: Learning the sampling pattern for MRI. IEEE Trans. Med. Imaging 39(12), 4310–
4321 (2020)
Sun, J., Li, H., Xu, Z.: Deep ADMM-Net for compressive sensing MRI. Adv. Neural Inf. Process.
Syst. 29 (2016)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Tappen, M.F.: Utilizing variational optimization to learn Markov random fields. In: 2007 IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Tikhonov, A.N., Arsenin, V.: Solutions of Ill-Posed Problems, vol. 14. Winston, Washington, DC
(1977)
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: ICCV, p. 2 (1998)
Valkonen, T.: A primal–dual hybrid gradient method for nonlinear operators with applications to
MRI. Inverse Probl. 30(5), 055012 (2014)
Van Chung, C., De los Reyes, J.C., Schönlieb, C.-B.: Learning optimal spatially-dependent
regularization parameters in total variation image denoising. Inverse Probl. 33(7), 074005
(2017)
Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based
reconstruction. In: 2013 IEEE Global Conference on Signal and Information Processing,
pp. 945–948 (2013)
Villacís, D.: First order methods for high resolution image denoising. Latin American Journal of
Computing Faculty of Systems Engineering Escuela Politécnica Nacional Quito-Ecuador 4(3),
37–42 (2017)
Wang, Y.-Q.: A multilayer neural network for image demosaicking. In: 2014 IEEE International
Conference on Image Processing (ICIP), pp. 1852–1856 (2014)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from
error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Xu, L., Ren, J.S.J., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In:
Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)
Yaroslavsky, L.P.: Digital picture processing: an introduction. Appl. Opt. 25, 3127 (1986)
25 Multi-parameter Approaches in Image Processing
Markus Grasmair and Valeriya Naumova
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944
PDE-Based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946
Dictionary-Based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950
Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953
Multiparameter Discrepancy Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953
Balancing Principle and Balanced Discrepancy Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 954
L-Hypersurface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
Generalized Lasso Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
Parameter Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
Numerical Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 963
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 965
Abstract
M. Grasmair ()
NTNU, Trondheim, Norway
e-mail: [email protected]
V. Naumova ()
Machine Intelligence Department, Simula Consulting and SimulaMet, Oslo, Norway
e-mail: [email protected]
Keywords
Introduction
Image restoration aiming, for instance, at the recovery of lost information from
noisy, blurred, and/or partially observed images plays an important role in many
practical applications such as anomaly detection in medical images and galaxy
analysis in astronomical images. With the massive production of digital images and
videos, the need for efficient image restoration methods becomes ever more pressing. No
matter how good cameras are, an improvement of the images is always desirable.
Moreover, many image restoration tasks such as image denoising are necessary in
many more applications than the ones mentioned above. Image denoising, being
the simplest possible inverse problem, provides a useful and by now well-accepted
framework in which different image processing ideas and techniques can be tested,
compared, and perfected. Therefore, the field of image processing in general has
received numerous contributions in the last decades from diverse scientific commu-
nities. Various statistical estimators, deep learning methods, adaptive filters, partial
differential equations, transform-domain methods, splines, differential geometry-
based methods, and regularization are only some of many areas and tools explored
in studying image processing tasks.
This chapter does not intend to provide an overview of the vast number of
methods in image processing but rather concentrates on variational multiparameter
approaches for image restoration. These approaches have provided notable advances
on different image restoration tasks in the last decades and continue to play an
important role in this and other fields.
Mathematically speaking, we model an image restoration problem as follows:
y = Au + δ, (1)
where u is a ground truth image affected by the action of the imaging operator A
and is measured in the presence of a random noise δ. For simplicity, we assume here
that the noise is additive, although most of the argumentation and methods below
still remain valid for more complicated scenarios. In the simplest case of denoising,
the operator A is the identity; other typical examples are convolution operators in
the case of deblurring and masking operators in inpainting tasks.
$$\hat u = \arg\min_u\ \Bigl\{\mathcal{D}(Au, y) + \sum_{i=1}^K \lambda_i\,\Psi_i(u)\Bigr\}.$$
In the specific case when we are interested in separating different components of the
image, such as cartoon and texture, we impose different penalization terms on the
different components. This results in the model:
PDE-Based Approaches
$$A^*A\hat u - \lambda\Delta\hat u = A^*y$$
$$A^*A\hat u - \lambda_1\Delta\hat u + \lambda_2\Delta^2\hat u = A^*y.$$
Such models have been attractive for a long time mainly because of their computa-
tional simplicity: they only require the solution of a linear system, which moreover
has in many cases a very simple structure. However, the usage of the squared $H^1$-
norm leads to very smooth, blurred results, a problem that may be made even worse
by the addition of higher-order terms.
In Rudin et al. (1992), it has been argued that the “correct” way for treating
image restoration problems is the usage of the total variation as the regularization
term. There, one uses the $L^1$-norm of the image gradient as penalization term, that
is, $\Psi(u) = TV(u) = \int_\Omega |\nabla u|\,dx$. In contrast to a quadratic penalization of the
gradient, this has the advantage of a much weaker penalization of large gradients,
allowing edges to remain in the restored image. While the total variation is well
suited for capturing large uniform regions in images, and also edges, it destroys the
other important feature of natural images: textured patterns. In order to be able to
reconstruct realistic images, it is therefore necessary to find a way for incorporating
textures into the regularization functionals.
One possibility, suggested by Meyer (2001) (see also Vese and Osher (2003),
which contains the first numerical implementation of the method), is to decompose
the image into a geometry part u1 , which can be treated by the total variation, and a
texture part u2 , for the treatment of which he introduced a norm that is dual to total
For a more precise definition of the involved spaces, see Meyer (2001). The intuition
behind the introduction of the G-norm is the idea that textures mainly consist
of rapidly oscillating, relatively uniform patterns. For such repeating structures,
however, their G-norm is inversely proportional to the frequency of the oscillations:
in the one-dimensional case, for instance, the G-norm of the function u2 (x) =
sin(kx) would be 1/k. More complex related decomposition models, where an
image is decomposed into more than two parts, have been suggested, for instance,
in Bertalmio et al. (2003) and Aujol et al. (2005). An example of the resulting
decomposition of an image into its cartoon part and texture part is given in Fig. 1.
Note that only the positive part of the texture is shown and that it has been rescaled
to fill the whole color range.
An alternative image decomposition approach can be derived from a model of
image formation, which originates from the fact that natural images are projections
of three-dimensional objects onto the two-dimensional image plane. Assuming
that the depicted objects are up to a certain degree “homogeneous,” this gives rise
to the model of images consisting of several distinct, smooth regions, bordered by
the different objects’ silhouettes, which coincide with discontinuities in the image
u. Based on this assumption, Mumford and Shah formulated their famous model
Fig. 1 Decomposition of an image into a cartoon and texture part according to Meyer's model (5).
Left: Original image. Middle: Resulting cartoon part. Right: Rescaled, positive texture part
where K denotes the (one-dimensional) edge set in the image u and len(K) its
length. Originally, this model has been only formulated in the context of denoising,
whereas its application to deblurring problems requires some additional constraints
to be included (e.g., see Fornasier et al. 2013). Moreover, in contrast to the other
models discussed in this chapter, it has the disadvantage of being highly non-convex.
In addition, its numerical minimization requires in general some form of either
approximation or parametrization of the edge set K. Different approaches to that
end have been suggested using, e.g., phase-field approaches (see Ambrosio and Tor-
torelli 1990), nonlocal approximations (see Braides and Dal Maso 1997), singular
perturbations (see Braides 1998), topological gradients (see Grasmair et al. 2013;
Beretta et al. 2014), finite difference approximations (see Chambolle 1995; Gobbino
1998), or convex relaxations (see Pock et al. 2010). Note that this list is by no
means exhaustive. In the numerical experiments in Section “Numerical Examples,”
we have used the phase-field approach due to Ambrosio and Tortorelli (1990); see
also Aubert and Kornprobst (2006, Chap. 4.2.4). Here, the edge set K is approxi-
mated by a phase-field v, which is a function on Ω that is approximately 0 in a thin
strip surrounding K and approximately 1 outside this strip. This yields the model:
$$\min_{u,v}\ \frac12\|Au - y\|^2 + \lambda_1\int_\Omega v^2\,|\nabla u|^2\,dx + \lambda_2\int_\Omega\Bigl(\varepsilon\,|\nabla v|^2 + \frac{(v-1)^2}{4\varepsilon}\Bigr)dx \qquad (7)$$
for some small parameter ε > 0, which roughly corresponds to the width of the
strip that approximates K. As ε tends to zero, the solutions $(u_\varepsilon, v_\varepsilon)$ of (7) converge
to solutions of (6) in the sense that the functions $u_\varepsilon$ converge to a solution û of (6),
whereas the phase fields $v_\varepsilon$ converge to an indicator function of Ω \ K. An example
for this approximation of the Mumford-Shah model is presented in Fig. 2.
One of the drawbacks of total variation regularization is the so-called stair-casing
effect that is often observed in the obtained results: in the reconstructed images,
Fig. 2 Application of the Mumford-Shah model to the parrots image. Left: Original image.
Middle: Resulting cartoon part. Right: Resulting edge indicator according to the Ambrosio-Tortorelli
approximation (Ambrosio and Tortorelli 1990)
small edges are often inserted, and smooth changes of the intensities are broken up
and replaced by step-like transitions. Convex approaches for improving this behavior
are often formulated in terms of infimal convolutions of several convex functionals
penalizing different smoothness properties of the component functions. The most
basic approach in that direction is the infimal convolution of a total variation term
and a quadratic penalization of the gradient, that is, the model:
$$\hat u = \arg\min_{u_1, u_2}\ \frac12\|A(u_1+u_2) - y\|^2 + \lambda_1\int_\Omega|\nabla u_1| + \frac{\lambda_2}{2}\int_\Omega|\nabla u_2|^2.$$
This model corresponds to a Huber-type smoothing of the total variation with
parameter ε = λ1/λ2. The quadratic term that becomes active at small
gradients limits the stair-casing and allows for smooth, slow-intensity transitions,
whereas the linear term penalizing large gradients allows for edges to remain in the
reconstructed image.
Other common methods combine derivatives of several orders. The first idea
in this direction can be found in Chambolle and Lions (1997), where the authors
propose the model:
$$\hat u = \arg\min_{u_1, u_2}\ \frac12\|A(u_1+u_2) - y\|^2 + \lambda_1\int_\Omega|\nabla u_1| + \lambda_2\int_\Omega|\nabla^2 u_2|. \qquad (8)$$
Fig. 3 Application of TGV regularization. Left: Resulting image û. Middle: Norm of D û. Right:
Norm of v
where
$$Ev = \frac{1}{2}\bigl(\nabla v + (\nabla v)^T\bigr)$$
denotes the symmetrized gradient.
Dictionary-Based Approaches
PDE-based approaches were among the first ones that have considered the sepa-
ration of an image into several distinct components. However, the actual solution
of the resulting models can be very demanding numerically, in particular for non-
quadratic models. In order to overcome this bottleneck, a complementary direction
of work, inspired by ideas and advances from signal processing, is to consider image
reconstruction and separation from the point of view of sparsity and compressed
sensing.
Sparsity has become an important prior for many image processing applications.
Since natural images typically are not sparse in their pixel domain, different
transforms such as wavelet transforms and different generalizations have been
proposed in the last decades with the goal of finding better and more efficient image
representations. In the sparse model, each datum (signal) can be approximated by
the linear combination of a small (sparse) number of elementary signals, called
atoms, from a prespecified basis or frame, called dictionary. The natural next
u = Φα.
with Φ † being the pseudo-inverse of the synthesis operator Φ. If one works with
bases instead of frames, the matrix Φ is square and invertible, Φ † = Φ −1 , and the
two approaches are equivalent. Moreover, in the case of orthonormal bases like the
Fourier basis, we have Φ −1 = Φ T .
We will first discuss two approaches that use a single analytic dictionary together
with a multiparameter approach: the first approach, which is also the probably best
known multiparameter approach, is the elastic net (Zou and Hastie 2005), which
takes the form:
$$\min_\alpha\ \frac12\|A(\Phi\alpha) - y\|^2 + \lambda_1\|\alpha\|_1 + \frac{\lambda_2}{2}\|\alpha\|_2^2.$$
It has been widely used in statistics for robust regression and within imaging for
various tasks like feature selection.
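As a small illustration, the elastic net can be minimized by a proximal gradient iteration, since the proximal map of the combined ℓ1/ℓ2 penalty has a closed form (soft-thresholding followed by a scaling). The following NumPy sketch, with illustrative names and synthetic data not taken from the references, shows the idea:

import numpy as np

def elastic_net_prox_gradient(B, y, lam1, lam2, n_iter=300):
    # Proximal gradient for  min_a 0.5*||B a - y||^2 + lam1*||a||_1 + 0.5*lam2*||a||_2^2,
    # where B plays the role of A composed with the dictionary Phi.
    tau = 1.0 / (np.linalg.norm(B, 2) ** 2)    # step size from the Lipschitz constant
    a = np.zeros(B.shape[1])
    for _ in range(n_iter):
        v = a - tau * (B.T @ (B @ a - y))                          # gradient step
        v = np.sign(v) * np.maximum(np.abs(v) - tau * lam1, 0.0)   # l1 part of the prox
        a = v / (1.0 + tau * lam2)                                 # l2 part of the prox
    return a

rng = np.random.default_rng(3)
B = rng.standard_normal((40, 80)) / np.sqrt(40)
a_true = np.zeros(80)
a_true[rng.choice(80, size=6, replace=False)] = 1.0
y = B @ a_true + 0.01 * rng.standard_normal(40)
a_hat = elastic_net_prox_gradient(B, y, lam1=0.02, lam2=0.1)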
The second approach in this category uses the multi-scale nature of the wavelets
and imposes different regularization parameters for different frequency bands of the
wavelet regularization operator. This idea was pursued in Lu et al. (2007) for the
recovery of the high-resolution images. The regularization operator for the ill-posed
problem is decomposed in a multiscale manner by using bi-orthogonal wavelets or
tight frames. Specifically, the authors propose a multi-resolution framework which
introduces different regularization parameters for different frequency bands of the
regularization operator resulting in:
$$\hat u = \arg\min_u\ \frac12\|Au - y\|^2 + \frac12\sum_{s=0}^p \lambda_s\,\|R_s u\|^2,$$
where $R_s$ and $\bar R_s$ are obtained from a wavelet or frame system with
$$\sum_{s=0}^p \bar R_s^T R_s = I.$$
Here $I$ is the identity matrix. This model has the explicit solution
$$\Bigl(A^TA + \sum_{s=0}^p \lambda_s\,\bar R_s^T R_s\Bigr)\hat u = A^T y.$$
Fig. 4 Decomposition of an image into piecewise smooth and texture parts according to MCA.
The addition of the texture part and the piecewise smooth part reproduces the original image. Left:
Original image. Middle: Piecewise smooth content part. Right: Texture part
The parameter σ should take into account both the noise level and the model
inaccuracies in representing sparsely u1 and u2 . Figure 4 illustrates the performance
of the MCA method for image separation with two analytic dictionaries: curvelets
for the cartoon part and the discrete cosine transform for the texture part.
The benefit of such a separation is obvious, as there is an agreement that images
are in fact a mixture of cartoon and texture parts. By treating each of the parts
separately using a proper dictionary, the image is processed much better and still
efficiently by using analytic-based dictionaries. Moreover, MCA can be run either
on the complete image or on small and overlapping patches. The immediate benefits
of the latter mode are the locality of the processing, allowing for efficient parallel
implementation, and the ability to incorporate learned dictionaries into the MCA.
Parameter Selection
$$\hat u = \arg\min_u\ \|Au - y\|^2 + \sum_{i=1}^K \lambda_i\,\Psi_i(u) + \beta\|u\|^2, \qquad (12)$$
where $\{\lambda_i\}_{i=1}^K > 0$ and $\beta > 0$ are the regularization parameters and $\Psi_i(u) = \|R_i u\|^2$.
The multiparameter discrepancy principle selects the parameters such that
$$\bigl\|Au(\{\lambda_i\}_{i=1}^K, \beta) - y\bigr\| = c\,\delta,$$
where δ is the (assumed) noise level and c ≥ 1 is some a priori specified parameter.
Typically, c is chosen slightly larger than 1, e.g., c = 1.2, in order to obtain a stable
solution in case of a slight underestimation of the noise level.
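For intuition, the following sketch applies the discrepancy principle in the simplest single-parameter Tikhonov setting, where the residual is monotone in λ and the equation $\|Au(\lambda) - y\| = c\delta$ can be solved by bisection; the multiparameter realization via model functions discussed next is more involved. All names and the synthetic data are illustrative:

import numpy as np

def tikhonov(A, y, lam):
    # Tikhonov solution u(lam) = (A^T A + lam*I)^{-1} A^T y.
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def discrepancy_lambda(A, y, delta, c=1.2, lam_lo=1e-8, lam_hi=1e4, n_iter=60):
    # Bisection (on a log scale) for ||A u(lam) - y|| = c*delta; the residual
    # is monotonically increasing in lam, so the search is well defined.
    target = c * delta
    for _ in range(n_iter):
        lam = np.sqrt(lam_lo * lam_hi)
        res = np.linalg.norm(A @ tikhonov(A, y, lam) - y)
        lam_lo, lam_hi = (lam, lam_hi) if res < target else (lam_lo, lam)
    return lam

rng = np.random.default_rng(4)
A = rng.standard_normal((100, 50))
u_true = rng.standard_normal(50)
noise = rng.standard_normal(100)
delta = 0.5
y = A @ u_true + delta * noise / np.linalg.norm(noise)   # noise of norm delta
lam_star = discrepancy_lambda(A, y, delta)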
The authors also propose a numerical realization of this principle based on the
model function approximation, which approximates the discrepancy term locally by
means of some simple model function m({λi }K i=1 , β) and allows to find subsequent
parameters based on some simple equations. The scheme results in a nonunique
parameter selection rule, which limits its applicability in practice. To overcome this
issue, the follow-up work Lu et al. (2010) introduced a quasi-optimality criterion to
facilitate a unique choice.
The nonuniqueness of the discrepancy principle was also addressed in Ito et al.
(2014), where the authors consider augmented Tikhonov regularization and revisit
the balancing principle for two parameter regularization. As a result, the balanced
discrepancy principle was suggested, which incorporates the constraints into the
augmented approach and allows to partially resolve the nonuniqueness issue.
As a first step, we consider the following balancing principle, where we choose
the parameters λi in such a way that the system
$$\begin{cases}
\hat u = \arg\min_u\ \|Au - y\|^2 + \lambda_1\Psi_1(u) + \lambda_2\Psi_2(u),\\[1ex]
\lambda_i = \dfrac{1}{\gamma}\,\dfrac{\|y - A\hat u\|}{\Psi_i(\hat u)}, \quad i = 1, 2,
\end{cases}$$
is satisfied. That is, we are interested in finding parameters λi which balance the
data fidelity with the respective penalty term. Here γ > 0 is a weighting parameter.
The balanced discrepancy principle combines this idea with the discrepancy
principle. That is, we choose the weighting parameter γ in such a way that the
residual satisfies Au − y = cδ. For two-parameter regularization, this leads to the
system:
$$\begin{cases}
\|Au - y\| = c\,\delta, \quad c \ge 1,\\[0.5ex]
\lambda_1\,\Psi_1(u) = \lambda_2\,\Psi_2(u).
\end{cases} \qquad (13)$$
L-Hypersurface
In Belge et al. (1998, 2002), a parameter selection rule for functional (12) with
β = 0 has been proposed, which is based on the generalization of the L-curve
method (Hansen 1992) to the multiparameter setting. Similar to the one-dimensional
case, one plots on an appropriate scale the residual norm $z(\lambda) = \|y - A\hat u(\lambda)\|^2$
against the values of the penalty terms $u_j(\lambda) = \Psi_j(\hat u(\lambda))$, $j = 1, \dots, K$.
A point on the L-hypersurface around which the surface is maximally warped corre-
sponds to a point where the regularization and data-fitting errors are approximately
balanced. The surface warpedness can be measured by calculating the Gaussian
curvature. However, since evaluation of the Gaussian curvature for a large number
of regularization parameters can be a computationally expensive task, which also
might yield multiple extrema, the authors propose a surrogate minimum distance
function (MDF) to approximate the curvature. However, the accuracy of the L-
hypersurface approximation with MDF sometimes depends on the MDF origin. The
authors provide some heuristic rule for the origin selection, which seems to work in
specific cases. However, a robust means for selecting the origin is needed to promote
practical usability of the method.
In Grasmair et al. (2018) a fully adaptive approach for parameter selection was
proposed for a multi-penalty functional of the form:
$$\min_{u,v}\ \frac12\|A(u+v) - y\|^2 + \lambda_1\|u\|_1 + \lambda_2\|v\|_2^2,$$
Fig. 5 Part of the parameter space detailing the different solutions. Each of the different tiles
corresponds to a different support or sign pattern of the solution of interest
Parameter Learning
the mean error of the lower-level problem on the given training set. In many cases,
the solution of the lower-level problem presupposes the PDE-based optimization
and can be very computationally demanding.
In many real-life applications, access to the ground truth image cannot be
granted or might indeed be impossible, such as in X-ray tomography. Therefore,
recent efforts have been dedicated to the development of unsupervised parameter
learning rules for different regularization methods (de Vito et al. 2018). The
attractiveness of the suggested approach lies in the fact that one requires only a
training set of noisy samples $\{y_i\}_{i=1}^N$ for learning the optimal parameter for a given
class of images or data in general. The idea behind the method is that the ground
truth images follow intrinsically a lower dimensional geometry (i.e., they belong to a
lower dimensional manifold), which can be approximated by using a training set of
noisy samples. Once the proxy ũ is calculated, one can use it to guide the selection
of the parameter by minimizing, for instance, the discrepancy $\|\hat u - \tilde u\|^2$. The first step
of finding a suitable proxy ũ is completely independent of the regularization method
and explores only the structure of the solution, whereas the second step of selecting
the optimal parameter is dependent on the regularization method. The authors also
showed that a learned parameter can be used on new images with similar structure
without any retraining.
Numerical Solution
With the exception of the Mumford-Shah model, all approaches discussed above
require the minimization of a convex functional composed of three or more subparts.
However, many of the models include some non-smooth, sparsity-promoting terms,
which make the application of non-smooth optimization algorithms necessary.
In the recent years, convex analysis-based methods have been the method of
choice in many imaging applications, particularly methods based on the augmented
Lagrangian or alternating direction method of multipliers (ADMM) and various
splitting methods. A large overview of different algorithms can be found in
Bauschke and Combettes (2011), Combettes and Pesquet (2011), and Komodakis
and Pesquet (2015). A specific mention here is deserved by the Chambolle-Pock
algorithm (Chambolle and Pock 2011), which has been demonstrated to be very
efficient in several total variation-based applications. We refer the reader not familiar
with convex analysis to Komodakis and Pesquet (2015), which contains a succinct
introduction into the main concepts and results necessary for the implementation of
the different algorithms.
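As a minimal, self-contained illustration of the Chambolle-Pock iteration (applied here to a single-penalty one-dimensional TV model rather than to the multi-penalty functionals of this chapter), the following NumPy sketch uses the standard primal-dual updates with hand-picked step sizes; the operator, parameters, and names are illustrative assumptions:

import numpy as np

def tv_denoise_1d_cp(y, lam, n_iter=500, tau=0.25, sigma=0.25):
    # Chambolle-Pock for  min_x 0.5*||x - y||^2 + lam*||D x||_1,
    # with D the forward-difference operator; tau*sigma*||D||^2 <= 1.
    n = y.size
    D = lambda x: np.diff(x)                                            # K
    Dt = lambda p: np.concatenate(([-p[0]], -np.diff(p), [p[-1]]))      # K^T
    x = y.copy()
    x_bar = x.copy()
    p = np.zeros(n - 1)
    for _ in range(n_iter):
        p = np.clip(p + sigma * D(x_bar), -lam, lam)       # prox of the dual term
        x_new = (x - tau * Dt(p) + tau * y) / (1.0 + tau)  # prox of the data term
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

rng = np.random.default_rng(5)
signal = np.repeat([0.0, 1.0, 0.3], 50) + 0.1 * rng.standard_normal(150)
denoised = tv_denoise_1d_cp(signal, lam=0.5)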
There are at least two notable differences between single-penalty and multi-
penalty methods when it comes to their practical implementation: first, by their
very nature, they require the minimization of functionals consisting of three or
more separate terms. However, many of the more efficient primal-dual methods are
primarily formulated only for a sum of two functionals, that is, a loss term and
a single regularization term. In the situation of pure decomposition-based models
like (8) or (11), which take the form:
$$\min_{u_1,\dots,u_K}\ \frac12\|A(u_1 + \dots + u_K) - y\|^2 + \lambda_1\Psi_1(u_1) + \dots + \lambda_K\Psi_K(u_K),$$
a natural splitting is given by
$$F_1(u_1, \dots, u_K) = \frac12\|A(u_1 + \dots + u_K) - y\|^2, \qquad F_2(u_1, \dots, u_K) = \lambda_1\Psi_1(u_1) + \dots + \lambda_K\Psi_K(u_K). \qquad (14)$$
In this case, the prox-operator (see Komodakis and Pesquet 2015) for $F_2^*$, which
is the central ingredient in all the aforementioned algorithms, decouples into the
individual prox-operators of the conjugate regularization functionals $\Psi_1^*, \dots, \Psi_K^*$. As long
as the latter ones can be efficiently evaluated, an efficient implementation of these
algorithms is possible.
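The decoupling can be made explicit in a few lines: for a separable $F_2$ the proximal map splits into the proximal maps of the individual penalties (and, via Moreau's identity, the same then holds for $F_2^*$). The following sketch, with two illustrative penalties and hypothetical names, only shows the mechanism:

import numpy as np

def prox_F2(us, lambdas, proxes, tau):
    # For F2(u_1, ..., u_K) = sum_k lambda_k * Psi_k(u_k), the proximal map
    # acts componentwise through the proximal maps of the individual Psi_k.
    return [prox(u, tau * lam) for u, lam, prox in zip(us, lambdas, proxes)]

# Two example penalties: Psi_1 = ||.||_1 and Psi_2 = 0.5*||.||_2^2.
prox_l1 = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
prox_l2sq = lambda u, t: u / (1.0 + t)

u1 = np.array([0.3, -1.5, 0.05])
u2 = np.array([2.0, -0.4, 1.0])
out = prox_F2([u1, u2], lambdas=[0.1, 1.0], proxes=[prox_l1, prox_l2sq], tau=0.5)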
In situations where this split is not possible, the direct application of many
well-known splitting methods can be numerically more challenging. However, there
exist generalizations to the sum of three of more convex functionals. For instance,
examples of how ADMM and Douglas-Rachford splitting can be adapted to this
more general setting can be found in Combettes and Pesquet (2011, Chap 10.7).
In addition, there exists a growing number of algorithms specifically aimed at the
minimization of a sum of three convex functionals. One notable example here is due
to Condat (2013) and Vũ (2013). We refer again to Komodakis and Pesquet (2015),
where a large number of similar algorithms are collected.
The second difference to single-parameter settings is the numerical realization
of the parameter choice: heuristic rules for single-parameter regularization like
balancing principle or the L-curve require the minimization of the regularization
functional for a number of different regularization parameters in order to find the
optimal choice. However, the situation becomes notably more complicated for
multi-penalty methods, as one has to find optimal parameters within an at least
two-dimensional set. Methods like L-hypersurfaces therefore require many more
solutions to yield reasonable results than in the single-penalty case. In order to
speed up computations, it is therefore necessary to implement good stopping criteria
for the optimization algorithms that not only take into account the convergence of
the algorithm but also the question whether the current parameter setting may be
feasible or not; in the latter case, an early termination of the optimization algorithm
can lead to a significant gain in efficiency.
Numerical Examples
that provide state-of-the-art results are based on learned dictionaries rather than
predefined ones (Starck et al. 2015). Learning a dictionary is a problem by itself,
requiring a proper tuning of many parameters that influence the performance of
the algorithm. The bi-level approaches presuppose the existence of a training set
for finding a parameter that minimizes the error between the ground truth and the
reconstructed image. This type of setting falls within machine learning framework
and is outside the scope of the current chapter.
We compare the performance of the balanced discrepancy principle, the
L-hypersurface method, and the discrepancy principle without additional balancing.
We chose to omit a comparison with the generalized lasso and with machine learning
approaches, as the former is a method that is applicable only in very specialized
settings and the latter require a large amount of training data of sufficiently good
quality.
As a test example, we have used the “baboon” image, as it contains sharp edges
and a high contrast between different image regions as well as parts characterized
by a marked texture. For the denoising, we have added to each of the three color
channels pixel-wise i.i.d. Gaussian noise with a standard deviation σ = 50, the true
image taking values in the range [0, 255]. See Fig. 6 for the true and the noisy image.
We consider first the $H^1$-Laplacian model (4) applied to denoising, that is, the
model:
$$\hat u = \arg\min_u\ \frac12\|u - y\|^2 + \frac{\lambda_1}{2}\int_\Omega|\nabla u|^2 + \frac{\lambda_2}{2}\int_\Omega(\Delta u)^2, \qquad (15)$$
for some given noisy image y. Here all terms are applied separately, but with
the same regularization parameters λ1 and λ2 , to the three color channels of the
image. As discussed above, this is a quadratic optimization problem with the Euler-
Lagrange equation (optimality condition):
Fig. 6 Test image used for the numerical examples. Left: Original, noise-free image. Right: Noisy
image
$$u - \lambda_1\Delta u + \lambda_2\Delta^2 u = y.$$
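This is a linear equation with constant coefficients. As a purely illustrative sketch (the chapter does not specify the discretization or the boundary conditions), it can be solved for a single grayscale channel in a few lines if one assumes periodic boundary conditions, so that the discrete Laplacian is diagonalized by the FFT:

import numpy as np

def h1_laplace_denoise(y, lam1, lam2):
    # Solve (I - lam1*Laplacian + lam2*Laplacian^2) u = y in Fourier space,
    # using the symbol of the periodic 5-point discrete Laplacian.
    n1, n2 = y.shape
    k1 = np.fft.fftfreq(n1).reshape(-1, 1)
    k2 = np.fft.fftfreq(n2).reshape(1, -1)
    lap = 2.0 * (np.cos(2.0 * np.pi * k1) - 1.0) + 2.0 * (np.cos(2.0 * np.pi * k2) - 1.0)
    denom = 1.0 - lam1 * lap + lam2 * lap ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(y) / denom))

rng = np.random.default_rng(6)
clean = np.zeros((128, 128))
clean[32:96, 32:96] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
denoised = h1_laplace_denoise(noisy, lam1=2.0, lam2=1.0)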
Fig. 7 Analysis of the parameter settings for the H 1 -Laplace denoising model (15) applied
to the noisy baboon image. Left: Resulting L-hypersurface. Middle: Admissible (blue) versus
inadmissible (gray) parameter settings according to the discrepancy principle. Right: The gray
curve depicts the parameters that satisfy the discrepancy principle with equality, the blue curve
the parameters that satisfy the balancing principle. The parameter setting chosen according to the
balanced discrepancy principle is the intersection of the two curves
Fig. 8 Results of the H 1 -Laplace denoising model for different parameter choices. First row:
Resulting denoised image. Second row: Error, that is, difference between reconstruction and true,
noise-free image. Left: Optimal reconstruction according to MSE, obtained by full grid search;
PSNR = 15.18. Middle: Optimal reconstruction subject to discrepancy principle; PSNR = 20.59.
Right: Result with balanced discrepancy principle; PSNR = 20.68
noise level was available. The latter yields a unique parameter pair, which has been
used to obtain the right hand images in Fig. 8. In addition, we have performed a
full grid search in order to find the parameter pair that minimizes the mean square
error (MSE) as well as the pair minimizing the MSE subject to the constraint that
the discrepancy principle is satisfied with equality. The resulting images as well
as the PSNR for the different results are shown in Fig. 8. Note that the latter uses
the knowledge of the actual noise-free image, which of course is not available in
practice. Moreover, it is necessary to mention that both the MSE and the PSNR
are somewhat dubious quality measures for images, as they ignore all structural
information that is present in the images and only consider point-wise discrepancies.
Next, we perform a similar numerical study for the Ambrosio-Tortorelli approx-
imation (7) of the Mumford-Shah model (6), that is, the model:
$$\min_{u,v}\ \frac12\|u - y\|^2 + \lambda_1\int_\Omega v^2\,|\nabla u|^2\,dx + \lambda_2\int_\Omega\Bigl(\varepsilon\,|\nabla v|^2 + \frac{(v-1)^2}{4\varepsilon}\Bigr)dx.$$
where both PDEs are solved with homogeneous Neumann boundary conditions.
The parameter ε was chosen to be 1 pixel-width; this results in an edge-indicator
function that is highly localized around the detected edges.
The results for this approximation of the Mumford-Shah model are shown in
Figs. 9 and 10. Again, we have compared the result for the balanced discrepancy
principle with the optimal results according to MSE obtained by a full grid search.
As can be expected from the Mumford-Shah model, which completely disregards
texture, the results are more cartoon-like than even with the $H^1$-Laplace model,
leading to a slightly lower PSNR. At the same time, the result includes distinct
edges, which have been blurred in the other model.
We can also observe for the Mumford-Shah model that the balancing of the two
regularization terms is crucial even in the presence of the discrepancy principle. This
can be seen clearly in Fig. 11, where we have compared the results according to the
balanced discrepancy principle with a result that satisfies the discrepancy principle,
but where the second regularization parameter has been chosen too small. One can
clearly see that this results in a general under-smoothing of the image.
As final example, we consider the Chambolle-Lions model (8) applied to the
noisy parrots image, that is, the model:
Fig. 9 Analysis of the parameter settings for the Mumford-Shah denoising model applied
to the noisy baboon image. Left: Resulting L-hypersurface. Middle: Admissible (blue) versus
inadmissible (gray) parameter settings according to the discrepancy principle. Right: The gray
curve depicts the parameters that satisfy the discrepancy principle with equality, the blue curve
the parameters that satisfy the balancing principle. The parameter setting chosen according to the
balanced discrepancy principle is the intersection of the two curves
Fig. 10 Results of the Mumford-Shah denoising model for different parameter choices. First row:
Resulting denoised image. Second row: Error, that is, difference between reconstruction and true,
noise-free image. Left: Optimal reconstruction according to MSE, obtained by full grid search;
PSNR = 15.28. Middle: Optimal reconstruction subject to discrepancy principle; PSNR = 19.93.
Right: Result with balanced discrepancy principle; PSNR = 20.29
$$(\hat u_1, \hat u_2) = \arg\min_{u_1, u_2}\ \frac12\|u_1 + u_2 - y\|^2 + \lambda_1\int_\Omega|\nabla u_1| + \lambda_2\int_\Omega|\nabla^2 u_2|. \qquad (16)$$
In this case, the result is a decomposition of the restored image û into a part û1 mostly
containing the cartoon-like components of û and a part û2 mostly containing the
texture-like components. Moreover, we have a convex but non-smooth optimization
Fig. 11 Application of Mumford-Shah denoising to the noisy baboon image Fig. 6. Upper row:
Denoised image and edge indicator using the balanced discrepancy principle. Lower row: Denoised
image and edge indicator satisfying the discrepancy principle, but not the additional balancing
principle
problem, which can be solved by any of the methods described in Section “Numeri-
cal Solution”. Specifically, we have used the Chambolle-Pock algorithm (Chambolle
and Pock 2011) using the splitting (14). We note here that the solution of (16) is
not unique, as neither regularization term penalizes constant functions. In order to
obtain a unique solution, we have therefore added the restriction $\int_\Omega \hat u_1\,dx = 0$. The
results for this model are shown in Figs. 12 and 13.
Conclusion
Fig. 12 Analysis of the parameter settings for the Chambolle-Lions model applied to the noisy
parrots image. Left: Resulting L-hypersurface. Middle: Admissible (blue) versus inadmissible
(gray) parameter settings according to the discrepancy principle. Right: The blue curve depicts the
parameters that satisfy the discrepancy principle with equality, the gray curve the parameters that
satisfy the balancing principle. The parameter setting chosen according to the balanced discrepancy
principle is the intersection of the two curves
Fig. 13 Application of the Chambolle-Lions model to the denoising of the parrots image. Upper
row, left: Noisy data. Upper row, right: Total result û1 + û2 using the balanced discrepancy
principle. Lower row, left: Cartoon part û1 of the solution. Lower row, right: Texture part û2 of
the solution
There are several interesting open questions related to both numerical and
theoretical aspects of multiparameter regularization. Specifically, further systematic
studies of parameter learning from noisy data (unsupervised learning) not only
could be beneficial for the specific methods but also could provide new insights
into efficient construction of unsupervised deep learning algorithms.
References
Aharon, M., Elad, M., Bruckstein, A.M.: On the uniqueness of overcomplete dictionaries, and a
practical way to retrieve them. J. Linear Algebra Appl. 416, 48–67 (2006)
Ambrosio, L., Tortorelli, V.M.: Approximation of functionals depending on jumps by elliptic
functionals via Γ -convergence. Commun. Pure Appl. Math. 43(8), 999–1036 (1990)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing. Applied Mathematical
Sciences, vol. 147, 2nd edn. Springer, New York. Partial differential equations and the calculus
of variations, With a foreword by Olivier Faugeras (2006)
Aujol, J.-F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded
variation component and an oscillating component. J. Math. Imaging Vis. 22(1), 71–88 (2005)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert
Spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New
York (2011)
Belge, M., Kilmer, M.E., Miller, E.L.: Simultaneous multiple regularization parameter selection
by means of the l-hypersurface with applications to linear inverse problems posed in the wavelet
transform domain. In: Bayesian Inference for Inverse Problems, vol. 3459, pp. 328–336.
International Society for Optics and Photonics (1998)
Belge, M., Kilmer, M.E., Miller, E.L.: Efficient determination of multiple regularization
parameters in a generalized l-curve framework. Inverse Prob. 18(4), 1161 (2002)
Beretta, E., Grasmair, M., Muszkieta, M., Scherzer, O.: A variational algorithm for the detection
of line segments. Inverse Prob. Imaging 8(2), 389–408 (2014)
Bertalmio, M., Vese, L., Sapiro, G., Osher, S.: Simultaneous structure and texture image inpainting.
IEEE Trans. Image Process. 12(8), 882–889 (2003)
Bobin, J., Starck, J.-L., Fadili, J.M., Moudden, Y., Donoho, D.L.: Morphological component
analysis: an adaptive thresholding strategy. IEEE Trans. Image Process. 16(11), 2675–2681
(2007)
Braides, A.: Approximation of Free-Discontinuity Problems. Lecture Notes in Mathematics,
vol. 1694. Springer, Berlin (1998)
Braides, A., Dal Maso, G.: Non-local approximation of the Mumford-Shah functional. Calc. Var.
Partial Differ. Equ. 5, 293–322 (1997)
Bredies, K., Holler, M.: Regularization of linear inverse problems with total generalized variation.
J. Inverse Ill-Posed Probl. 22(6), 871–913 (2014)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3),
492–526 (2010)
Chambolle, A.: Image segmentation by variational methods: Mumford and Shah functional and
the discrete approximations. SIAM J. Appl. Math. 55(3), 827–863 (1995)
Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems.
Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point
Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its
Applications, vol. 49, pp. 185–212. Springer, New York (2011)
Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian,
proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460–479 (2013)
Daubechies, I., Teschke, G.: Variational image restoration by means of wavelets: Simultaneous
decomposition, deblurring, and denoising. Appl. Comput. Harmon. Anal. 19(1), 1–16 (2005)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
De los Reyes, J.C., Schönlieb, C.-B.: Image denoising: learning the noise model via nonsmooth
PDE-constrained optimization. Inverse Probl. Imaging 7(4), 1183–1214 (2013)
de Vito, E., Kereta, Z., Naumova, V.: Unsupervised parameter selection for denoising with the
elastic net (2018)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2),
407–499 (2004)
Field, D.J., Olshausen, B.A.: Emergence of simple-cell receptive field properties by learning a
sparse code for natural images. Nature 381, 607–609 (1996)
Fornasier, M., March, R., Solombrino, F.: Existence of minimizers of the Mumford-Shah
functional with singular operators and unbounded data. Ann. Mat. Pura Appl. 192(3), 361–391
(2013)
Gobbino, M.: Finite difference approximation of the Mumford-Shah functional. Commun. Pure
Appl. Math. 51(2), 197–228 (1998)
Grasmair, M., Muszkieta, M., Scherzer, O.: An approach to the minimization of the Mumford-
Shah functional using Γ -convergence and topological asymptotic expansion. Interfaces Free
Bound. 15(2), 141–166 (2013)
Grasmair, M., Klock, T., Naumova, V.: Adaptive multi-penalty regularization based on a
generalized lasso path. Appl. Comput. Harmon. Anal. 49(1), 30–55 (2018)
Gribonval, R., Schnass, K.: Dictionary identifiability – sparse matrix-factorisation via l1 -
minimisation. IEEE Trans. Inf. Theory 56(7), 3523–3539 (2010)
Hansen, P.C.: Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34(4),
561–580 (1992)
Ito, K., Jin, B., Takeuchi, T.: Multi-parameter Tikhonov regularization – an augmented approach.
Chin. Ann. Math. Ser. B 35(3), 383–398 (2014)
Komodakis, N., Pesquet, J.: Playing with duality: an overview of recent primal-dual approaches
for solving large-scale optimization problems. IEEE Sig. Process. Mag. 32(6), 31–54 (2015)
Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational
models. SIAM J. Imaging Sci. 6(2), 938–983 (2013)
Lu, S., Pereverzev, S.V.: Multi-parameter regularization and its numerical realization. Numer.
Math. 118(1), 1–31 (2011)
Lu, Y., Shen, L., Xu, Y.: Multi-parameter regularization methods for high-resolution image
reconstruction with displacement errors. IEEE Trans. Circuits Syst. I: Regul. Pap. 54(8),
1788–1799 (2007)
Lu, S., Pereverzev, S.V., Shao, Y., Tautenhahn, U.: Discrepancy curves for multi-parameter
regularization. J. Inverse Ill-Posed Prob. 18(6), 655–676 (2010)
Mairal, J., Bach, F., Ponce, J.: Task-driven dictionary learning. IEEE Trans. Pattern Anal. Mach.
Intell. 34(4), 791–804 (2012)
Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. University
Lecture Series, vol. 22. American Mathematical Society, Providence (2001). The fifteenth Dean
Jacqueline B. Lewis memorial lectures
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: Global solutions of variational models with
convex regularization. SIAM J. Imaging Sci. 3(4), 1122–1145 (2010)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms.
Physica D 60(1–4), 259–268 (1992)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2009)
Starck, J.-L., Elad, M., Donoho, D.: Redundant multiscale transforms and their application
for morphological component separation. In: Advances in Imaging and Electron Physics,
pp. 287–348. Elsevier, London (2004)
Starck, J.-L., Elad, M., Donoho, D.L.: Image decomposition via the combination of sparse
representations and a variational approach. IEEE Trans. Image Process. 14(10), 1570–1582
(2005)
Starck, J.-L., Murtagh, F., Fadili, J.: Sparse Image and Signal Processing: Wavelets and Related
Geometric Multiscale Analysis, 2nd edn. Cambridge University Press, New York (2015)
Vese, L., Osher, S.: Modeling textures with total variation minimization and oscillating patterns
in image processing. J. Sci. Comput. 19(1–3), 553–572 (2003). Special issue in honor of the
sixtieth birthday of Stanley Osher
Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv.
Comput. Math. 38(3), 667–681 (2013)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B
Stat. Methodol. 67(2), 301–320 (2005)
Generative Adversarial Networks
for Robust Cryo-EM Image Denoising 26
Hanlin Gu, Yin Xian, Ilona Christy Unarta, and Yuan Yao
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
Robust Denoising in Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
Challenges of Cryo-EM Image Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973
Background: Data Representation and Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975
Robust Denoising Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
Huber Contamination Noise Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
Robust Denoising Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977
Robust Recovery via β-GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
Stabilized Robust Denoising by Joint Autoencoder and β-GAN . . . . . . . . . . . . . . . . . . . . . 981
Application: Robust Denoising of Cryo-EM Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981
Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981
Evaluation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
This research made use of the computing resources of the X-GPU cluster supported by the Hong
Kong Research Grant Council Collaborative Research Fund: C6021-19EF. The research of Hanlin
Gu and Yuan Yao is supported in part by HKRGC 16308321, ITF UIM/390, the Hong Kong
Research Grant Council NSFC/RGC Joint Research Scheme N_HKUST635/20, as well as awards
from Tencent AI Lab and Si Family Foundation. We would like to thank Dr. Xuhui Huang for
helpful discussions.
Abstract
Keywords
Introduction
Deep learning techniques have rapidly entered the field of image processing.
One of the most popular methods is the denoising autoencoder (DA) introduced
by Vincent et al. (2008), which uses reference data to learn a compressed representation
(encoding) of the dataset. One extension of the DA was presented in Xie et al.
(2012), which exploited sparsity regularization together with the reconstruction loss to
avoid over-fitting. Other developments, such as Zhang et al. (2017), made use of
residual network architectures to improve the quality of denoised images. In
addition, Agostinelli et al. (2013) combined several sparse denoising autoencoders
to enhance robustness under different noise levels.
The generative adversarial network (GAN) has recently gained popularity and
provides a promising new approach to image denoising. The GAN was proposed
by Goodfellow et al. (2014) and is composed of two parts: the generator
(G, which generates new samples) and the discriminator (D, which determines whether
samples are real or generated, i.e., fake). The original GAN (Goodfellow et al. 2014)
aimed to minimize the Jensen-Shannon (JS) divergence between the distributions of the
generated samples and the true samples, and is hence called JS-GAN. Various GANs were
then studied; in particular, Arjovsky et al. (2017) proposed the Wasserstein GAN
(WGAN), which replaces the JS divergence with the Wasserstein distance, and Gulrajani
et al. (2017) further improved the WGAN with a gradient penalty that stabilizes
model training. For the image denoising problem, GANs can better describe the
distribution of the original data by exploiting information shared across samples.
Consequently, GANs have been widely applied to image denoising (Tran
et al. 2020; Tripathi et al. 2018; Yang et al. 2018; Chen et al. 2018; Dong et al.
2020).
Recently, Gao et al. (2019, 2020) showed that a general family of GANs
(β-GANs, including JS-GAN and TV-GAN) enjoys robust reconstruction when
the datasets contain outliers under Huber contamination models (Huber 1992).
In this case, the observed samples are drawn from a complex distribution that is
a mixture of a contamination distribution and the real data distribution. A particular
example is provided by cryo-electron microscopy (cryo-EM) imaging, where the
original noisy images are likely contaminated with outliers such as broken or
non-particle images. The main challenges of cryo-EM image denoising are summarized in the
subsequent section.
Cryo-electron microscopy (cryo-EM) has become one of the most popular
techniques for resolving atomic structures. In the past, cryo-EM was limited to large
complexes or low-resolution models. Recently, the development of new detectors
How to achieve robust denoising against such contamination thus becomes a critical
problem. Therefore, it is a great challenge to develop robust denoising methods for
cryo-EM images in order to reconstruct heterogeneous biomolecular structures.
There is a plethora of denoising methods developed in applied mathematics
and machine learning that could be applied to cryo-EM image denoising. Most
of those used in cryo-EM are based on unsupervised learning and do not require any
reference image data. Wang and Yin (2013) proposed a filtering method
based on nonlocal means, which exploits the rotational symmetry of some
biological molecules. Wei and Yin (2010) designed an adaptive nonlocal
filter, which uses a wide range of pixels to estimate the denoised pixel
values. Xian et al. (2018) compared a transform-domain filtering method,
BM3D (Dabov et al. 2007), and a dictionary learning method, KSVD (Aharon et al.
2006), on the cryo-EM denoising problem. However, none of these works well in
low signal-to-noise ratio (SNR) situations. In addition, Covariance Wiener Filtering
(CWF) (Bhamre et al. 2016) was proposed for image denoising; although it has an
attractive denoising effect, CWF needs a large sample size to estimate the covariance
matrix correctly. Therefore, a robust denoising method
for cryo-EM images is needed.
Outline
– Both the autoencoder and GANs help each other in cryo-EM denoising in low
signal-to-noise ratio scenarios. On the one hand, the autoencoder helps stabilize
GANs during training, without which the training processes of GANs often
collapse due to high noise; on the other hand, GANs help the autoencoder in
denoising by sharing information across similar samples via distribution learning.
For example, WGAN combined with an autoencoder often achieves state-of-the-art
performance due to its ability to exploit information in similar samples for
denoising.
– To achieve robustness against partial contamination of samples, one needs
to choose both a robust reconstruction loss for the autoencoder (e.g., the ℓ1 loss) and
robust β-GANs (e.g., the (.5, .5)-GAN or (1, 1)-GAN,1 which are proved to be
robust against Huber contamination) that achieve competitive performance with
WGANs in contamination-free scenarios but do not deteriorate as much under
data contamination.
– Numerical experiments are conducted with both a heterogeneous conformational
dataset on the Thermus aquaticus RNA polymerase (RNAP) and a homogeneous
1 β-GAN has two parameters: α and β, written as (α, β)-GAN in this chapter.
Autoencoder
An autoencoder (Baldi 2012) is a type of neural network used to learn efficient codings
of unlabeled data. It learns a representation (encoding) for a set of data, typically
for dimensionality reduction, by training the network. An autoencoder has two main
parts: an encoder and a decoder. The encoder maps the input data x (∈ X) into a latent
representation z, while the decoder maps the latent representation back to the data
space:
z ∼ Enc(x) (1)
x̂ ∼ Dec(z). (2)
In regularized variants, a penalty Ω(h) on the mapping h of the encoding layer is added to
the reconstruction loss. The autoencoder is well suited to data denoising and dimensionality reduction.
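For concreteness, a minimal PyTorch sketch of such an encoder/decoder pair realizing Eqs. (1) and (2) is given below; the layer sizes and the fully connected design are illustrative assumptions, not the architecture used later in this chapter.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: encoder x -> z (Eq. 1), decoder z -> x_hat (Eq. 2)."""
    def __init__(self, d1=128, d2=128, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(d1 * d2, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim),              # latent representation z
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, d1 * d2), nn.Sigmoid(),   # reconstruction in [0, 1]
            nn.Unflatten(1, (1, d1, d2)),
        )

    def forward(self, x):
        z = self.encoder(x)       # Eq. (1)
        return self.decoder(z)    # Eq. (2)
```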
GAN
– Hard to achieve Nash equilibrium. The alternating updates of the generator and the
discriminator are not guaranteed to converge.
– Vanishing gradient. The gradient update is slow when the discriminator is well
trained.
– Mode collapse. The generator fails to generate sufficiently representative
samples.
JS-GAN
The JS-GAN proposed in Goodfellow et al. (2014) uses the Jensen-Shannon (JS) divergence
to measure the difference between data distributions. Its mathematical
expression is as follows:
$$\min_G \max_D \; \mathbb{E}_{x\sim P(X),\, z\sim P(Z)}\big[\log D(x) + \log\big(1 - D(G(z))\big)\big], \qquad (4)$$
where the discriminator D aims to separate the real data from the fake data generated by G.
Here P(X) is the input data distribution, z is the noise, and P(Z) is the noise distribution
used for data generation. Training a GAN is a minimax game that alternately updates the
generator and the discriminator, where the purpose of the generator is to fool the
discriminator in an adversarial process.
where x̃ is uniformly sampled along straight lines connecting pairs of generated
and real samples, and μ is a weighting parameter. In WGANgp, the sigmoid in the last layer
of the discriminator network is removed, so the output of D ranges over the whole real line ℝ,
while its gradient is pushed toward 1 to enforce the 1-Lipschitz condition.
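For reference, a sketch of the standard gradient-penalty objective of Gulrajani et al. (2017), written with the x̃ and μ notation used above (the chapter's displays (5) and (6) are assumed to take this form), is

$$\min_G \max_D \; \mathbb{E}_{x\sim P(X)}\big[D(x)\big] - \mathbb{E}_{z\sim P(Z)}\big[D(G(z))\big] - \mu\, \mathbb{E}_{\tilde{x}}\Big[\big(\|\nabla_{\tilde{x}} D(\tilde{x})\|_2 - 1\big)^2\Big].$$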
Let x ∈ Rd1 ×d2 be a clean image, often called reference image in the sequel.
The generative model of noisy image y ∈ Rd1 ×d2 under the linear, weak phase
approximation (Bhamre et al. 2016) could be described by
y = a ∗ x + ζ, (7)
where ∗ denotes the convolution operation, a is the point spread function of the
microscope convolving with the clean image, and ζ is an additive noise, usually
assumed to be Gaussian, that corrupts the image. In order to remove the noise
introduced by the microscope, a traditional denoising autoencoder can be exploited to learn,
from examples $(y_i, x_i)_{i=1,\dots,n}$, the inverse mapping $a^{-1}$ from the noisy image y to
the clean image x.
However, this model is not sufficient for real data. In experimental data,
contamination significantly degrades the denoising performance if the denoising
method is sensitive to sample outliers. Therefore, we introduce the
following Huber contamination model to extend the image formation model (see
Eq. (7)).
Consider that the pair of reference image and experimental image (x, y) is
subject to the following mixture distribution P :
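As a sketch of the standard Huber contamination form that the surrounding text describes (the chapter's own display is assumed to take this shape),

$$(x, y) \sim P = (1 - \epsilon)\, P_0 + \epsilon\, Q, \qquad 0 \le \epsilon < 1,$$

where $P_0$ denotes the distribution of clean/noisy image pairs following Eq. (7), Q is an arbitrary contamination distribution (e.g., broken or non-particle images), and ε is the contamination proportion.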
We exploit a neural network to approximate the robust inverse mapping $G_\theta : \mathbb{R}^{d_1\times d_2} \to \mathbb{R}^{d_1\times d_2}$, parameterized by θ ∈ Θ. The goal is to ensure that the
discrepancy between the reference image x and the reconstructed image $\hat{x} = G_\theta(y)$ is
small. Such a discrepancy is usually measured by a nonnegative loss function $\ell(x, \hat{x})$.
Therefore, the denoising problem minimizes the following empirical loss:

$$\arg\min_{\theta\in\Theta} L_S(\theta) := \frac{1}{n}\sum_{i=1}^{n} \ell\big(x_i, G_\theta(y_i)\big). \qquad (10)$$
Recently, Gao et al. (2019, 2020) came up with a more general family of GANs, the β-GAN. It
aims to solve the following minimax optimization problem to find $G_\theta$:
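A sketch of this minimax problem, written in the general form of Gao et al. (2019, 2020) and consistent with Eq. (15) below, is

$$\min_{\theta\in\Theta} \; \max_{D} \; \mathbb{E}_{x}\big[S\big(D(x), 1\big)\big] + \mathbb{E}_{y}\big[S\big(D(G_\theta(y)), 0\big)\big],$$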
where $S(t, 1) = -\int_t^1 c^{\alpha-1}(1-c)^{\beta}\,dc$, $S(t, 0) = -\int_0^t c^{\alpha}(1-c)^{\beta-1}\,dc$, and $\alpha, \beta \in [-1, 1]$. For simplicity, we denote this family with parameters α, β by (α, β)-GAN
in this chapter.
The family of (α, β)-GANs includes many popular members. For example, when
α = 0, β = 0, it becomes the JS-GAN (Goodfellow et al. 2014), which solves
the minimax problem (Eq. (4)) whose loss is the Jensen-Shannon divergence. When
α = 1, β = 1, the loss is a simple mean square loss; when α = −0.5, β = −0.5,
the loss is the boosting score.
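For the two settings highlighted above the integrals have simple closed forms, and the corresponding discriminator scores can be sketched as follows (hypothetical PyTorch helpers, not code from the chapter; the general (α, β) case would require evaluating incomplete Beta integrals).

```python
import torch

def score(t, label, alpha=0.0, beta=0.0):
    """Proper scoring rule S(t, label) of the (alpha, beta)-GAN family for the
    two closed-form cases mentioned in the text; t in (0, 1) is the discriminator
    output, label is 1 for real and 0 for fake samples."""
    if (alpha, beta) == (0.0, 0.0):   # JS-GAN: S(t,1)=log t, S(t,0)=log(1-t)
        return torch.log(t) if label == 1 else torch.log(1.0 - t)
    if (alpha, beta) == (1.0, 1.0):   # quadratic score: S(t,1)=-(1-t)^2/2, S(t,0)=-t^2/2
        return -0.5 * (1.0 - t) ** 2 if label == 1 else -0.5 * t ** 2
    raise NotImplementedError("general (alpha, beta) needs incomplete Beta integrals")

def discriminator_objective(d_real, d_fake, alpha=0.0, beta=0.0):
    # The discriminator maximizes the average score of real and fake samples.
    return (score(d_real, 1, alpha, beta) + score(d_fake, 0, alpha, beta)).mean()
```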
However, the Wasserstein GAN (WGAN) is not a member of this family. By
formally taking S(t, 1) = t and S(t, 0) = −t, we recover the WGAN objective
as in Eq. (5).
Definition 3 (Strong Contamination Model). $X_1, \dots, X_n \overset{iid}{\sim} P$ for some P
satisfying $TV(P, P_{ell}) < \epsilon$.
$$\mathcal{G}_{l+1}(B) = \Big\{\, g(x) = \mathrm{ReLU}\Big(\textstyle\sum_{h\ge 1} v_h\, g_h(x)\Big) \;:\; \textstyle\sum_{h\ge 1} |v_h| \le B,\ g_h \in \mathcal{G}_l(B) \Big\}. \qquad (13)$$
Note that the neighboring two layers are connected via ReLU activation functions.
Finally, the network structure is defined by

$$\mathcal{D}^L(\kappa, B) = \Big\{\, D(x) = \mathrm{sigmoid}\Big(\textstyle\sum_{j\ge 1} w_j\, g_j(x)\Big) \;:\; \textstyle\sum_{j\ge 1} |w_j| \le \kappa,\ g_j \in \mathcal{G}_L(B) \Big\}. \qquad (14)$$
Now consider the following β-GAN induced by a proper scoring rule S : [0, 1]×
{0, 1} → R with the discriminator class above:
$$(\hat{\theta}, \hat{\Sigma}) = \arg\min_{(\theta,\Sigma)} \; \max_{D \in \mathcal{D}^L(\kappa,B)} \; \frac{1}{n}\sum_{i=1}^{n} S\big(D(x_i), 1\big) + \mathbb{E}_{x\sim P_{ell}(\theta,\Sigma)}\, S\big(D(x), 0\big). \qquad (15)$$
The following theorem shows that such a β-GAN may give a statistically optimal
estimate of location and scatter of the general family of elliptical distributions under
strong contamination models.
Theorem 1 (Gao et al. 2020). Consider the (α, β)-GANs with |α − β| < 1. The
discriminator class $\mathcal{D} = \mathcal{D}^L(\kappa, B)$ is specified by Eq. (14). Assume $p/n + \epsilon^2 \le c$ for
some sufficiently small constant c > 0. Set $1 \le L = O(1)$, $1 \le B = O(1)$, and $\kappa = O\big(\sqrt{p/n} + \epsilon\big)$. Then for any $X_1, \dots, X_n \overset{iid}{\sim} P$, for some P satisfying $TV(P, P_{ell}) < \epsilon$
with small enough ε, we have

$$\|\hat{\theta} - \theta\|^2 \le C\Big(\frac{p}{n} \vee \epsilon^2\Big), \qquad \|\hat{\Sigma} - \Sigma\|_{op}^2 \le C\Big(\frac{p}{n} \vee \epsilon^2\Big), \qquad (16)$$

with probability at least $1 - e^{-C'(p + n\epsilon^2)}$ (for universal constants C and C′), uniformly
over all $\theta \in \mathbb{R}^p$ and all $\|\Sigma\|_{op} \le M$.
The theorem establishes that for all |α − β| < 1, the (α, β)-GAN family is robust
in the sense that one can learn a distribution $P_{ell}$ from contaminated distributions
P such that $TV(P, P_{ell}) < \epsilon$, which includes the Huber contamination model as a
special case. Therefore, an (α, β)-GAN with a suitable choice of network architecture
can robustly learn the generative model under arbitrary contamination Q when ε is
small (e.g., no more than 1/3).
In the current case, the denoising autoencoder network is modified to $G_\theta(y)$,
providing a universal approximation of the location (mean) of the inverse
generative model in Eq. (7), where the noise can be any member of the elliptical
family. Moreover, the discriminator is adapted to an image classification
problem in this setting. Equipped with this design, the proposed (α, β)-GAN may
enhance the robustness of the denoising autoencoder against unknown
contamination, e.g., the Huber contamination model for real contamination in the
image data. The experimental results indeed confirm the efficacy of this design.
In addition, the Wasserstein GAN (WGAN) is not a member of this β-GAN family.
Compared to JS-GAN, WGAN aims to minimize the Wasserstein distance between
the sample distribution and the generator distribution. Therefore, WGAN is not
robust in the sense of the contamination models above, as an arbitrarily small portion
of outliers can be arbitrarily far away from the main distribution $P_0$, making the
Wasserstein distance arbitrarily large.
Although the β-GAN can robustly recover model parameters from contaminated samples,
training GANs, as a zero-sum game involving a non-convex-concave minimax optimization
problem, is notoriously unstable, with typical cyclic dynamics and
possible mode collapse entrapped by local optima (Arjovsky et al. 2017). However,
in this section we show that introducing an autoencoder loss stabilizes
the training and avoids mode collapse. In particular, the autoencoder helps
stabilize the GAN during training, without which the training processes of GANs
often oscillate and sometimes collapse due to the presence of high noise.
Compared with the autoencoder, the β-GAN can further help denoising by exploiting
common information in similar samples during distribution learning. In a GAN,
the divergence or Wasserstein distance between the reference image set and the
denoised image set is minimized; similar images can therefore help boost
signals for each other.
For these reasons, a combined loss is proposed with both the β-GAN loss and the
autoencoder reconstruction loss:

$$L_{GAN}(x, \hat{x}) + \lambda\, \|x - \hat{x}\|_p^p, \qquad (17)$$
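As a concrete illustration, below is a minimal PyTorch-style sketch of a generator objective of the form of Eq. (17), assuming p = 1 (ℓ1 reconstruction) and a JS-GAN adversarial term in its non-saturating form; the helper and module names are illustrative and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def generator_loss(D, G, y_noisy, x_ref, lam=10.0):
    """Sketch of the combined loss in Eq. (17) for the generator update:
    L_GAN(x, x_hat) + lambda * ||x - x_hat||_1, with an assumed JS-GAN
    adversarial term (non-saturating form). Not the authors' code."""
    x_hat = G(y_noisy)                            # denoised image x_hat = G_theta(y)
    adv = -torch.log(D(x_hat) + 1e-8).mean()      # fool the discriminator
    rec = F.l1_loss(x_hat, x_ref)                 # l1 reconstruction (mean over pixels)
    return adv + lam * rec
```

With λ = 10 and p = 1, this corresponds to the setting used in the experiments reported below.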
Datasets
(Algorithm 1, final steps: generator update θ ← θ − η_g · g; return the denoised images x̂_i = G_θ(y_i).)
Fig. 2 Comparison between JS-GAN (black) and the joint JS-GAN–ℓ1-autoencoder (blue). (a) and
(b) show the change of the MSE on training and testing data. Joint training of the JS-GAN–ℓ1-autoencoder
is much more stable than pure JS-GAN training, which oscillates a lot
from DNA (transcription) in the cell. During the initiation of transcription, the
holoenzyme must bind to the DNA and then separate the double-stranded DNA into
single-stranded DNA (Browning and Busby 2004). The Taq holoenzyme has a crab-claw-like
structure, with two flexible domains, the clamp and β pincers. The clamp, especially,
Fig. 3 Five conformations in the RNAP heterogeneous dataset; from left to right, the closed
conformation to the open conformation at different clamp angles
has been suggested to play an important role in the initiation, as it has been captured
in various conformations by cryo-EM during initiation (Chen et al. 2020). Thus, we
focus on the movement of the clamp in this study. To generate the heterogeneous
dataset, we start with two crystal structures of Taq holoenzyme, which vary in their
clamp conformation, open (PDB ID: 1L9U (Murakami et al. 2002)) and closed
(PDB ID: 4XLN (Bae et al. 2015)) clamp. For the closed-clamp structure, we
remove the DNA and RNA in the crystal structure, leaving only the RNAP and σ A
for our dataset. The Taq holoenzyme has a molecular weight of about 370 kDa. We then
generate the clamp intermediate structures between the open and closed clamps using
multiple-basin coarse-grained (CG) molecular dynamics (MD) simulations (Okazaki
et al. 2006; Kenzaki et al. 2011). CG-MD simulations simplify the system such that
the atoms in each amino acid are represented by one particle. The structures from
CG-MD simulations are refined back to all-atom (atomic) structures using PD2
ca2main (Moore et al. 2013) and SCWRL4 (Krivov et al. 2009). Five structures
with equally spaced clamp opening angles are chosen for our heterogeneous dataset
(shown in Fig. 3). Then, we convert the atomic structures to 128 × 128 × 128 volumes
using the Xmipp package (Marabini et al. 1996) and generate the 2D projections with
an image size of 128 × 128 pixels. We further contaminate these clean images with
additive Gaussian noise at a given signal-to-noise ratio (SNR); here SNR = 0.05. The
SNR is defined as the ratio of the signal power to the noise power in real space.
For simplicity, we did not apply the contrast transfer function (CTF) to the datasets,
and all the images are centered. Figure 3 shows the five conformations.
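A small sketch of how additive Gaussian noise at a prescribed SNR (signal power over noise power in real space) might be generated; the helper is illustrative and not the authors' simulation code.

```python
import numpy as np

def add_gaussian_noise(clean, snr=0.05, rng=None):
    """Corrupt a clean 2D projection with additive Gaussian noise so that
    SNR = signal power / noise power in real space."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(clean ** 2)
    sigma = np.sqrt(signal_power / snr)          # noise standard deviation
    return clean + rng.normal(0.0, sigma, size=clean.shape)
```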
The training data size is 25,000 paired images (noisy and reference images). The test
data used to compute the MSE, PSNR, and SSIM consists of another 1500 paired images.
Fig. 4 The architectures of (a) the discriminator D and (b) the generator G, which borrow the residual
structure. The input image size (128 × 128) here is adapted to the RNAP dataset, while the input image
size for the EMPIAR-10028 dataset is 256 × 256
The experimental dataset EMPIAR-10028 (Wong et al. 2014) contains 105,247 noisy particles
with an image size of 360 × 360 pixels. In order to reduce the computational complexity,
we pick the central square of each image with a size of 256 × 256; since the surrounding area
of the image carries essentially no particle signal, little information is lost in this
preprocessing. The 256 × 256 images are then fed as input to the G_θ-network (Fig. 4).
Since the GAN-based method needs clean images as references, we prepare their clean
counterparts in the following way: we first use cryoSPARC 1.0 (Punjani et al. 2017) to build
a 3.2 Å resolution volume and then rotate the 3D volume by the Euler angles obtained by
cryoSPARC to get projected 2D images. The training data size we pick is 19,500, and the
test data size is 500.
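The center-crop preprocessing described above can be sketched as follows (a hypothetical helper, not the authors' code):

```python
import numpy as np

def center_crop(img, out_size=256):
    """Crop the central out_size x out_size square of a 2D particle image,
    e.g., 360 x 360 -> 256 x 256 for EMPIAR-10028."""
    h, w = img.shape
    top, left = (h - out_size) // 2, (w - out_size) // 2
    return img[top:top + out_size, left:left + out_size]
```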
Evaluation Method
We exploit the following three metrics to assess the quality of the denoising result:
the mean square error (MSE), the peak signal-to-noise ratio (PSNR), and the structural
similarity index measure (SSIM).
– (MSE) For images of size d1 × d2, the mean square error (MSE) between the
reference image x and the denoised image x̂ is defined as
$$\mathrm{MSE} := \frac{1}{d_1 d_2} \sum_{i=1}^{d_1} \sum_{j=1}^{d_2} \big(x(i,j) - \hat{x}(i,j)\big)^2.$$
The smaller the MSE, the better the denoising result.
– (PSNR) Similarly, the peak signal-to-noise ratio (PSNR) between the reference
image x and the denoised image x̂, whose pixel value range is [0, t] (t = 1 by default),
is defined by
$$\mathrm{PSNR} := 10 \log_{10} \frac{t^2}{\frac{1}{d_1 d_2} \sum_{i=1}^{d_1} \sum_{j=1}^{d_2} \big(x(i,j) - \hat{x}(i,j)\big)^2}.$$
The larger the PSNR, the better the denoising result.
– (SSIM) The third criterion, the structural similarity index measure (SSIM) between
the reference image x and the denoised image x̂, is defined in Wang et al. (2004):
$$\mathrm{SSIM} := \frac{(2\mu_x \mu_{\hat{x}} + c_1)\,(2\sigma_x \sigma_{\hat{x}} + c_2)\,(\sigma_{x\hat{x}} + c_3)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)\,(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)\,(\sigma_x \sigma_{\hat{x}} + c_3)}.$$
Although these metrics are widely used in image denoising, they might not be
the best metrics for cryo-EM images. In Appendix “Influence of the Regularization
Parameter: λ,” we show an example where the best-reconstructed images do not
necessarily achieve the best MSE/PSNR/SSIM values.
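A minimal sketch of how the three metrics might be computed with NumPy and scikit-image, assuming images normalized to [0, 1]; this is not the evaluation code used by the authors.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(x_ref, x_denoised, data_range=1.0):
    """Return (MSE, PSNR, SSIM) between a reference and a denoised image."""
    mse = float(np.mean((x_ref - x_denoised) ** 2))
    psnr = peak_signal_noise_ratio(x_ref, x_denoised, data_range=data_range)
    ssim = structural_similarity(x_ref, x_denoised, data_range=data_range)
    return mse, psnr, ssim
```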
In addition to these metrics, we consider 3D reconstruction based on the denoised
images. In particular, we use 3D reconstruction with RELION to validate the
denoising results. The procedure of our RELION reconstruction is as follows: first
create the 3D initial model, then perform 3D classification, followed by 3D auto-refinement.
Moreover, for heterogeneous conformations in the simulated data, we
further turn the denoising results into a clustering problem to measure the efficacy
of the denoising methods, whose details are discussed in Appendix “Clustering to
Solve the Conformational Heterogeneity.”
In the experiments of this chapter, the best results come from the ResNet architecture
(Su et al. 2018) shown in Fig. 4, which has been successfully applied to study
biological problems such as predicting protein-RNA binding. The generator in these
GANs uses the autoencoder network architecture, while the discriminator is a
binary-classification ResNet. In Appendix “Convolution Network” and “Test RNAP
Dataset with PGGAN Strategy,” we also discuss a convolutional network without
residual blocks and the PGGAN (Karras et al. 2018) strategy, together with their
experimental results.
We choose Adam (Kingma and Ba 2015) for optimization. The learning rate of
the discriminator is ηd = 0.001, and the learning rate of the generator is ηg = 0.01.
We choose a batch size of m = 20, kd = 1, and kg = 2 in Algorithm 1.
For the (α, β)-GAN, we report two choices, (1) α = 1, β = 1 and (2)
α = 0.5, β = 0.5, since they show the best results in our experiments, while the
others are collected in Appendix “Influence of the Parameter (α, β) in β-GAN.”
For WGAN, the gradient penalty with parameter μ = 10 is used to accelerate
convergence, and the algorithm is hence denoted WGANgp below. The
trade-off (regularization) parameter of the ℓ1 or ℓ2 reconstruction loss is set to λ =
10 throughout this section, while an ablation study varying λ is discussed in
Appendix “Influence of the Regularization Parameter: λ.”
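To make the training schedule concrete, the following sketch shows the alternating updates with the hyperparameters just stated (Adam, ηd = 0.001, ηg = 0.01, batch size m = 20, kd = 1, kg = 2); discriminator_loss and generator_loss are assumed helpers (e.g., the sketch after Eq. (17)), not code reproduced from Algorithm 1.

```python
import torch

def train(G, D, loader, generator_loss, discriminator_loss,
          eta_d=1e-3, eta_g=1e-2, k_d=1, k_g=2, epochs=100):
    """Alternating GAN training loosely following Algorithm 1 (a sketch)."""
    opt_d = torch.optim.Adam(D.parameters(), lr=eta_d)
    opt_g = torch.optim.Adam(G.parameters(), lr=eta_g)
    for _ in range(epochs):
        for y_noisy, x_ref in loader:             # paired (noisy, reference) images
            for _ in range(k_d):                  # k_d discriminator steps
                opt_d.zero_grad()
                x_fake = G(y_noisy).detach()      # no gradient into G during D's step
                discriminator_loss(D, x_fake, x_ref).backward()
                opt_d.step()
            for _ in range(k_g):                  # k_g generator steps
                opt_g.zero_grad()
                generator_loss(D, G, y_noisy, x_ref).backward()
                opt_g.step()
    return G
```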
Fig. 5 Results for the RNAP dataset. (a) Denoised images for the different denoising methods (from left
to right, top to bottom): clean, noisy, BM3D, KSVD, nonlocal means, CWF, ℓ1-autoencoder, ℓ2-autoencoder,
(1, 1)-GAN + ℓ1, (0, 0)-GAN + ℓ1, (.5, .5)-GAN + ℓ1, and WGANgp + ℓ1. (b) and (c)
are reconstructions from clean images and from (.5, .5)-GAN + ℓ1 denoised images. (d) is the FSC curve of (b)
and (c). (e), (f), and (g) are robustness tests of the various methods under ε ∈ {0.1, 0.2, 0.3}-proportion
contamination of three types: (e) type A, replacing the reference images with
random noise; (f) type B, replacing the noisy images with random noise; (g) type C, replacing
both with random noise. (h) and (j) are reconstructions of images denoised with (.5, .5)-GAN + ℓ1 and
the ℓ2-autoencoder under type A contamination, respectively, where the ℓ2-autoencoder totally fails but
(.5, .5)-GAN + ℓ1 remains robust. (i) shows the FSC curves of (h) and (j)
conformation of Fig. 3) to present, and the performances show that WGANgp and
the (α, β)-GANs can capture the "open" shape completely and produce clearer
images than the other methods.
Furthermore, in order to test the denoised results of the β-GAN, we reconstruct the
3D volume with RELION from 200,000 images at SNR 0.1, denoised by
(.5, .5)-GAN + ℓ1. The values of pixel size, amplitude contrast, spherical aberration,
and voltage are 1.6, 2.26, 0.1, and 300; for the other settings we retain the defaults
of the RELION software. Figure 5b and c shows the 3D volumes recovered
from clean images and from denoised images, respectively. The related FSC curves are shown
in Fig. 5d. Specifically, the blue curve, which represents the images denoised by
(.5, .5)-GAN + ℓ1, is close to the red curve representing the clean images. We use
the 0.143 cutoff criterion from the literature (the resolution at which the Fourier shell
correlation reaches 0.143, shown by the dashed lines in Fig. 5d) to choose the final resolution: 3.39 Å.
The structure recovered by (.5, .5)-GAN + ℓ1 and its FSC curve are as good as those of the
original structure, which illustrates that the denoised results of the β-GAN preserve
the details of the images and are helpful for 3D reconstruction.
In addition, Appendix “Clustering to Solve the Conformational Heterogeneity”
shows an example in which GAN with ℓ1-autoencoder helps cluster heterogeneous
conformations.
Fig. 6 Results for EMPIAR-10028. (a) Comparison on the EMPIAR-10028 dataset of different deep
learning methods (from left to right, top to bottom): clean image, noisy image, ℓ1-autoencoder,
ℓ2-autoencoder, (0, 0)-GAN + ℓ1, (1, 1)-GAN + ℓ1, (.5, .5)-GAN + ℓ1, WGANgp + ℓ1. (b) is
the MSE, PSNR, and SSIM for the different denoising methods. (c) and (d) are the 3D reconstruction
from images denoised by (.5, .5)-GAN + ℓ1 and the FSC curve, respectively. The resolution of the
reconstruction from (.5, .5)-GAN + ℓ1 denoised images is 3.20 Å, which is as good as the original
resolution
fails in the contamination case. The outcome of the reconstruction demonstrates that
(.5, .5)-GAN + ℓ1 is relatively robust, as its 3D result is consistent with the clean-image
reconstruction.
In summary, some (α, β)-GAN methods, such as the (.5, .5)-GAN and (1, 1)-GAN,
combined with the ℓ1-autoencoder are more resistant to sample contamination and are
therefore better suited to denoising cryo-EM data.
Figure 6a and b shows the denoising results of the different deep learning
methods on the experimental data: ℓ1- or ℓ2-autoencoders, JS-GAN ((0, 0)-GAN),
WGANgp, and (α, β)-GAN, where the ℓ1 loss is added to all of the GAN-based
structures. Although the autoencoder can capture the shape of the macromolecules, the
result is a little blurry in some parts. Moreover, WGANgp and (.5, .5)-GAN perform better
than the other deep learning methods according to MSE and PSNR, which is largely
consistent with the results on the RNAP dataset. The improvements of such GANs
over pure autoencoders lie in their ability to utilize structural information among
similar images to learn the data distribution better.
Finally, we perform reconstruction via RELION on 100,000 images denoised by
(.5, .5)-GAN + ℓ1. The parameters are the same as those used in Wong et al. (2014).
The reconstruction results are shown in Fig. 6c. The final resolution is 3.20 Å,
derived from the FSC curve in Fig. 6d using the same 0.143 cutoff (dashed line).
We note that the final resolution obtained by RELION after denoising is as good as
the original resolution of 3.20 Å reported in Wong et al. (2014).
Conclusion
In this chapter, we established a connection between the traditional image formation model
and the Huber contamination model for handling the complex contamination in
cryo-EM datasets. Joint training of an autoencoder and a GAN has been shown to
substantially improve the performance of cryo-EM image denoising. In this joint
training scheme, the reconstruction loss of the autoencoder helps the GAN avoid mode
collapse and stabilizes training, while the GAN further helps the autoencoder in denoising by
utilizing the high correlation among cryo-EM images, which are 2D projections of
one or a few 3D molecular conformations. To overcome the low signal-to-noise
ratio challenge in cryo-EM images, joint training of the ℓ1-autoencoder combined
with the (.5, .5)-GAN, the (1, 1)-GAN, or WGAN with gradient penalty is often among
the best performers in terms of MSE, PSNR, and SSIM when the data is
contamination-free. However, when a portion of the data is contaminated, especially
when the reference data is contaminated, WGAN with the ℓ1-autoencoder may suffer
a significant deterioration in reconstruction accuracy. Therefore, the robust ℓ1-autoencoder
combined with robust GANs ((.5, .5)-GAN and (1, 1)-GAN) is the
overall best choice for robust denoising with contaminated and high-noise datasets.
Part of the results in this chapter is based on a technical report (Gu et al. 2020).
Most deep learning-based techniques for image denoising need reference
data, which limits their application to cryo-EM denoising. For example, in our
experimental dataset EMPIAR-10028, the reference data is generated by
cryoSPARC, which itself becomes problematic for highly heterogeneous conformations;
the reference images we learn from may therefore follow a spurious distribution. How
to denoise without reference images thus becomes a significant problem, and it
remains open how to adapt to different experiments, including those without reference images.
To overcome this drawback, the idea of "image-blind denoising" was
proposed in the literature (Lehtinen et al. 2018; Krull et al. 2019), which views the
noisy image itself, or a blind-spot ("void") image, as the reference for denoising. Besides, Chen et al.
(2018) tried to extract the noise distribution from the noisy image and obtain denoised
images by removing this noise; Quan et al. (2020) augmented
the data by Bernoulli sampling and denoised images with dropout. Nevertheless, all of
these methods require the noise to be independent of the signal itself. Thus it is hard
to remove the noise in cryo-EM, because the noise from the ice and the microscope is
related to the particles themselves.
In addition, for reconstruction problems in cryo-EM, Zhong et al. (2020)
proposed an end-to-end, network-based 3D reconstruction approach from cryo-EM images,
in which a variational autoencoder (VAE) is used to approximate the forward
reconstruction model and recover the 3D structure directly by combining the angular
information and the image information learned from the data. This is one future
direction to pursue.
Appendix
In this part, we apply the β-GAN to the denoising problem, for which choosing
a good parameter pair (α, β) becomes an important issue. Therefore,
we investigate the impact of the parameter (α, β) on the denoising outcome.
We choose eight representative settings of (α, β); the results are shown in Table 2. The
differences among these settings are not large, and the best results appear at
α = 1, β = 1 and α = 0.5, β = 0.5.
In this part, we analyze whether the denoised results are good for solving
conformational heterogeneity in the simulated RNAP dataset. Specifically, for the heterogeneous
conformations in the simulated data, we choose the following two
typical conformations as our testing data: the open and closed conformations (the leftmost and rightmost
conformations in Fig. 3). Our goal is to distinguish these two
classes of conformations. However, different from Xian et al. (2018),
we do not have template images with which to compute a distance matrix, so we
resort to unsupervised learning, i.e., clustering. Our clustering method first uses
manifold learning, Isomap (Tenenbaum et al. 2000), to reduce the dimension of
the denoised images and then uses k-means (k = 2) to group the different
conformations.
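A sketch of this two-step pipeline with scikit-learn (Isomap for dimensionality reduction followed by k-means with k = 2); the embedding dimension and other parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans

def cluster_conformations(denoised_images, n_components=2, k=2):
    """Embed denoised images with Isomap and group them into k conformations."""
    X = np.asarray([img.ravel() for img in denoised_images])        # flatten images
    embedding = Isomap(n_components=n_components).fit_transform(X)  # manifold learning
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
    return embedding, labels
```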
Figure 7a displays the 2D visualizations of the two conformations and the clustering
effect for the different denoising methods; here the SNR of the noisy data is 0.05.
In correspondence with those visualizations, the accuracies of the competitive methods are:
(1, 1)-GAN + ℓ1, 54/60 (54 clustered correctly out of 60); WGANgp + ℓ1,
54/60; ℓ2-autoencoder, 44/60; BM3D, 34/60; and KSVD, 36/60. This result shows
that the clean images separate well; the (α, β)-GAN and WGANgp with ℓ1-autoencoder can
partially distinguish the open and closed structures, although several points are
misclassified; and the ℓ2-autoencoder and the traditional techniques perform poorly because it
is hard to detect the clamp shape.
Table 2 The result of β-GANs with ResNet architecture: MSE, PSNR, and SSIM of different (α, β) in β-GAN under various levels of Gaussian noise
corruption in RNAP dataset
MSE PSNR SSIM
Parameter/SNR 0.1 0.05 0.1 0.05 0.1 0.05
α = 1, β = 1 2.99e-3(3.51e-5) 4.01e-3(1.54e-4) 25.30(0.05) 24.07(0.16) 0.82(0.03) 0.79(0.03)
α = 0.5, β = 0.5 3.01e-3(2.81e-5) 3.98e-3(4.60e-5) 25.27(0.04) 24.07(0.05) 0.79(0.04) 0.80(0.03)
α = −0.5, β = −0.5 3.02e-3(1.69e-5) 4.15e-3(5.05e-5) 25.27(0.02) 23.91(0.05) 0.80(0.03) 0.80(0.03)
α = −1, β = −1 3.05e-3(3.54e-5) 4.12e-3(8.30e-5) 25.23(0.05) 23.93(0.08) 0.80(0.05) 0.77(0.04)
α = 1, β = −1 3.05e-3(4.30e-5) 4.10e-3(5.80e-5) 25.24(0.06) 23.96(0.06) 0.82(0.02) 0.76(0.03)
α = 0.5, β = −0.5 3.09e-3(6.79e-5) 4.05e-3(6.10e-5) 25.17(0.04) 24.01(0.06) 0.79(0.04) 0.77(0.05)
α = 0, β = 0 3.06e-3(5.76e-5) 4.02e-3(5.67e-4) 25.23(0.04) 24.00(0.06) 0.78(0.03) 0.78(0.03)
α = 0.1, β = −0.1 3.07e-3(5.62e-5) 4.05e-3(8.55e-5) 25.23(0.08) 23.98(0.04) 0.78(0.02) 0.79(0.03)
Fig. 7 2D visualization of the two-conformation images under manifold learning. Red and blue
points represent the open and closed conformations, respectively. (a) 2D visualization by ISOMAP of the
two-conformation images for different methods (from left and top to right and
bottom): clean image, BM3D, KSVD, ℓ2-autoencoder, (1, 1)-GAN + ℓ1, WGANgp + ℓ1. (b)
2D visualization of the two-conformation images for different manifold learning methods (from left to
right): spectral method, MDS, TSNE, and ISOMAP
Furthermore, the reason we use Isomap is that it performs best in our case;
comparisons of different manifold learning methods are shown in Fig. 7b. The
blue and red points separate the most in the ISOMAP plot.
Specifically, the accuracies of these four methods are 50/60 (spectral method),
46/50 (MDS), 46/50 (TSNE), and 54/60 (ISOMAP). This shows that Isomap
distinguishes the images of the two structures best compared to the other methods, i.e.,
the spectral method (Ng et al. 2002), MDS (Cox and Cox 2008), and TSNE (Maaten
and Hinton 2008).
Convolution Network
We present the results of a simple deep convolutional network (with the ResNet
blocks removed); its performance under all criteria is worse than that of the
residual architecture. Table 3 compares the MSE and PSNR performance of
the various methods on the RNAP dataset with SNR 0.1 and 0.05, and Fig. 8a displays
the denoised images of the different methods on the RNAP dataset with SNR 0.05.
Table 3 MSE and PSNR of different models under various levels of Gaussian noise corruption in
the RNAP dataset, where the architectures of the GANs and autoencoders are simple convolutional networks
MSE PSNR
Method/SNR 0.1 0.05 0.1 0.05
BM3D 3.5e-2(7.8e-3) 5.9e-2(9.9e-3) 14.535(0.1452) 12.134(0.1369)
KSVD 1.8e-2(6.6e-3) 3.5e-2(7.6e-3) 17.570(0.1578) 14.609(0.1414)
Nonlocal means 5.0e-2(5.5e-3) 5.8e-2(8.9e-3) 13.040(0.4935) 12.404(0.6498)
CWF 2.5e-2(2.0e-3) 9.3e-3(8.8e-4) 16.059(0.3253) 20.314(0.4129)
ℓ2-Autoencoder 4.0e-3(6.0e-4) 6.7e-3(9.0e-4) 24.202(0.6414) 21.739(0.7219)
(0, 0)-GAN + ℓ1 3.8e-3(6.0e-4) 5.6e-3(8.0e-4) 24.265(0.6537) 22.594(0.6314)
WGANgp + ℓ1 3.1e-3(5.0e-4) 5.0e-3(8.0e-4) 25.086(0.6458) 23.010(0.6977)
(1, −1)-GAN + ℓ1 3.4e-3(5.0e-4) 4.9e-3(9.0e-4) 24.748(0.7233) 23.116(0.7399)
(.5, −.5)-GAN + ℓ1 3.5e-3(5.0e-4) 5.6e-3(9.0e-4) 24.556(0.6272) 22.575(0.6441)
Fig. 8 (a) Denoised images from the convolutional network without ResNet structure for different
methods on the RNAP dataset with SNR 0.05 (from left to right, top to bottom): clean, noisy, BM3D,
ℓ2-autoencoder, KSVD, JS-GAN + ℓ1, WGANgp + ℓ1, (1, −1)-GAN + ℓ1, (.5, −.5)-GAN + ℓ1.
(b) Denoised and reference images for different regularization parameters λ (using (.5, .5)-GAN + λℓ1 as an
example), corresponding to Table 4. From left to right, top to bottom, the images are: clean image,
λ = 0.1, λ = 1, λ = 5, λ = 10, λ = 50, λ = 100, λ = 500, λ = 10,000
The PGGAN strategy accelerates and stabilizes model training. Since cryo-EM images have a large
image size that fits the PGGAN method well, here we use its structure
instead of the ResNet and convolutional structures above to denoise cryo-EM images. Our
experiments partially demonstrate two things: (1) the denoised images are sharper,
though the MSE becomes higher; (2) we do not need to add ℓ1
regularization to make model training stable, and the method can also detect image outliers
for both real and simulated data without regularization.
In detail, based on the PGGAN architecture and parameters, we test the following
two objective functions developed in the section “Robust Denoising Method”,
WGANgp and WGANgp + ℓ1, on the simulated RNAP dataset with SNR 0.05 as
an example. The denoised images are presented in Fig. 9; we note
that the model hardly collapses regardless of whether ℓ1 regularization is added. The MSE
with regularization is 8.09e-3 (1.46e-3), which is less than the 1.01e-2 (1.81e-3)
Fig. 9 Denoised and reference images from PGGAN (instead of the simple ResNet and convolutional
structures) on the RNAP dataset with SNR 0.05. The PGGAN strategy is tested with two objective
functions: WGANgp + ℓ1 and WGANgp. (a) and (b) are denoised and reference images using
PGGAN with WGANgp + ℓ1; (c) and (d) are denoised and reference images using PGGAN
with WGANgp, respectively. The images highlighted in red show the structural
differences between denoised and reference images, demonstrating that denoised images
differ from the reference images when using the PGGAN strategy
without adding regularization. Nevertheless, neither exceeds the results
based on the ResNet structure above, which shows that the PGGAN architecture is not
more powerful than the ResNet structure. An advantage of PGGAN, however, lies in its
training efficiency, so it is an interesting problem to improve PGGAN toward the
accuracy of the ResNet structure.
Another point that needs to be highlighted is that MSE may not be a good criterion,
because the images denoised by PGGAN are clearer in some details than those of the
preceding methods. This phenomenon is also observed in Appendix “Influence of the
Regularization Parameter: λ.” How to find a better criterion to evaluate the models
and how to combine the strengths of ResNet-GAN and PGGAN remain to be explored.
In this chapter, we add ℓ1 regularization to make the model stable, but how to choose
the weight λ of the ℓ1 regularization becomes a significant question. Here we take the (.5, .5)-GAN to
denoise the RNAP dataset with SNR 0.1. From the results for different λ in
Table 4, we find that as λ tends to infinity, the MSE results tend to those of the ℓ1-autoencoder,
which is reasonable, and that the MSE is smallest at λ = 10.
Moreover, an interesting phenomenon is that a much clearer result is
obtained at λ = 100 than at λ = 10, although the MSE is not the best (shown
in Fig. 8b).
References
Agostinelli, F., Anderson, M., Lee, H.: Adaptive multi-column deep neural networks with
application to robust image denoising. In: Advances in Neural Information Processing Systems,
pp. 1493–1501 (2013)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionar-
ies for sparse representation. IEEE Trans. Sig. Process. 54(11), 4311–4322 (2006)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceeding
of the International Conference on Machine Learning, pp. 214–223 (2017)
Bae, B., Feklistov, A., Lass-Napiorkowska, A., Landick, R., Darst, S.: Structure of a bacterial RNA
polymerase holoenzyme open promoter complex. Elife 4, e08504 (2015)
Bai, X.C., McMullan, G., Scheres, S.: How Cryo-EM is revolutionizing structural biology. Trends
Biochem. Sci. 40(1), 49–57 (2015)
Baldi, P.: Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of
ICML Workshop on Unsupervised and Transfer Learning, JMLR Workshop and Conference
Proceedings, pp. 37–49 (2012)
Bau, D., Zhu, J.Y., Wulff, J., Peebles, W., Strobelt, H., Zhou, B., Torralba, A.: Seeing what a
gan cannot generate. In: Proceedings of the IEEE/CVF International Conference on Computer
Vision, pp. 4502–4511 (2019)
Bhamre, T., Zhang, T., Singer, A.: Denoising and covariance estimation of single particle Cryo-EM
images. J. Struct. Biol. 195(1), 72–81 (2016)
Browning, D., Busby, S.: The regulation of bacterial transcription initiation. Nat. Rev. Microbiol.
2(1), 57–65 (2004)
Chen, J., Chen, J., Chao, H., Yang, M.: Image blind denoising with generative adversarial network
based noise modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 3155–3164 (2018)
Chen, J., Chiu, C., Gopalkrishnan, S., Chen, A., Olinares, P., Saecker, R., Winkelman, J., Maloney,
M., Chait, B., Ross, W. et al.: Stepwise promoter melting by bacterial RNA polymerase. Mol.
Cell 78, 275–288.e6 (2020)
Cox, M., Cox, T.: Multidimensional scaling. In: Handbook of Data Visualization. Springer, Berlin,
pp. 315–347 (2008)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-
domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Dai, Z., Yang, Z., Yang, F., Cohen, W.W., Salakhutdinov, R.: Good semi-supervised learning that
requires a bad gan. In: Proceedings of the 31st International Conference on Neural Information
Processing Systems, pp. 6513–6523 (2017)
Dong, Z., Liu, G., Ni, G., Jerwick, J., Duan, L., Zhou, C.: Optical coherence tomography image
denoising using a generative adversarial network with speckle modulation. J. Biophotonics
13(4), e201960135 (2020)
Frank, J.: Three-dimensional electron microscopy of macromolecular assemblies: visualization of
biological molecules in their native state. Oxford University Press, New York (2006)
Gao, C., Liu, J., Yao, Y., Zhu, W.: Robust estimation and generative adversarial nets. In: Interational
Conference on Learning Representation, New Orleans (2019)
Gao, C., Yao, Y., Zhu, W.: Generative adversarial nets for robust scatter estimation: a proper scoring
rule perspective. J. Mach. Learn. Res. 21, 160–161 (2020)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing
Systems, pp. 2672–2680 (2014)
Gu, H., Unarta, I.C., Huang, X., Yao, Y.: Robust autoencoder GAN for cryo-EM image denoising
(2020)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of
Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777
(2017)
Hua, Y., Li, R., Zhao, Z., Chen, X., Zhang, H.: Gan-powered deep distributional reinforcement
learning for resource management in network slicing. IEEE J. Sel. Areas Commun. 38(2),
334–349 (2019)
Huber, P.: Robust estimation of a location parameter. In: Breakthroughs in Statistics. Springer,
New York, pp. 492–518 (1992)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality,
stability, and variation. In: International Conference on Learning Representation, Vancouver
(2018)
Kenzaki, H., Koga, N., Hori, N., Kanada, R., Li, W., Okazaki, K., Yao, X.Q., Takada, S.: CafeMol:
a coarse-grained biomolecular simulator for simulating proteins at work. J. Chem. Theory
Comput. 7(6), 1979–1989 (2011)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on
Learning Representation, San Diego (2015)
Krivov, G., Shapovalov, M., Dunbrack R.L. Jr.: Improved prediction of protein side-chain
conformations with SCWRL4. Proteins: Struct. Funct. Bioinform. 77(4), 778–795 (2009)
Krull, A., Buchholz, T.O., Jug, F.: Noise2Void – learning denoising from single noisy images. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–
2137 (2019)
Kühlbrandt, W.: The resolution revolution. Science 343(6178), 1443–1444 (2014)
Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., Aila, T.: Noise2noise:
learning image restoration without clean data. In: Proceeding of the International Conference
on Machine Learning, pp. 2965–2974 (2018)
Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605
(2008)
Marabini, R., Masegosa, I., San Martın, M., Marco, S., Fernandez, J., De la Fraga, L., Vaquerizo,
C., Carazo, J.: Xmipp: an image processing package for electron microscopy. J. Struct. Biol.
116(1), 237–240 (1996)
Moore, B., Kelley, L., Barber, J., Murray, J., MacDonald, J.: High–quality protein backbone
reconstruction from alpha carbons using Gaussian mixture models. J. Comput. Chem. 34(22),
1881–1889 (2013)
Murakami, K., Masuda, S., Darst, S.: Structural basis of transcription initiation: RNA polymerase
holoenzyme at 4 Å resolution. Science 296(5571), 1280–1284 (2002)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in
Neural Information Processing Systems, pp. 849–856 (2002)
Okazaki, K., Koga, N., Takada, S., Onuchic, J., Wolynes, P.: Multiple-basin energy landscapes
for large-amplitude conformational motions of proteins: structure-based molecular dynamics
simulations. Proc. Natl. Acad. Sci. 103(32), 11844–11849 (2006)
Punjani, A., Rubinstein, J.L., Fleet, D.J., Brubaker, M.A.: CryoSPARC: algorithms for rapid
unsupervised Cryo-EM structure determination. Nat. Methods 14(3), 290 (2017)
Quan, Y., Chen, M., Pang, T., Ji, H.: Self2self with dropout: Learning self-supervised denoising
from single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pp. 1890–1898 (2020)
Sarmad, M., Lee, H.J., Kim, Y.M.: Rl-gan-net: a reinforcement learning agent controlled gan
network for real-time point cloud shape completion. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp. 5898–5907 (2019)
Scheres, S.: Processing of structurally heterogeneous Cryo-EM data in RELION. In: Methods in
Enzymology. Elsevier, Academic Press, vol. 579, pp. 125–157 (2016)
Shen, P.: The 2017 Nobel Prize in Chemistry: Cryo-EM comes of age. Anal. Bioanal. Chem.
410(8), 2053–2057 (2018)
Su, M., Zhang, H., Schawinski, K., Zhang, C., Cianfrocco, M.: Generative adversarial networks as
a tool to recover structural information from cryo-electron microscopy data. BioRxiv, p. 256792
(2018)
Tenenbaum, J., De Silva, V., Langford, J.: A global geometric framework for nonlinear dimension-
ality reduction. Science 290(5500), 2319–2323 (2000)
Tran, L., Nguyen, S.M., Arai, M.: GAN-based noise model for denoising real images. In:
Proceedings of the Asian Conference on Computer Vision (2020)
Tripathi, S., Lipton, Z.C., Nguyen, T.Q.: Correction by projection: denoising images with
generative adversarial networks (2018)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features
with denoising autoencoders. In: Proceeding of the International Conference on Machine
Learning, pp. 1096–1103 (2008)
Wang, J., Yin, C.C.: A Zernike-moment-based non-local denoising filter for cryo-EM images.
Sci. China Life Sci. 56(4), 384–390 (2013)
Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to
structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, F., Gong, H., Liu, G., Li, M., Yan, C., Xia, T., Li, X., Zeng, J.: DeepPicker: a deep learning
approach for fully automated particle picking in Cryo-EM. J. Struct. Biol. 195(3), 325–336
(2016)
Warren, B.E.: X-Ray Diffraction. Dover Publications, reprint edition (1990)
Wei, D.Y., Yin, C.C.: An optimized locally adaptive non-local means denoising filter for cryo-
electron microscopy data. J. Struct. Biol. 172(3), 211–218 (2010)
Wong, W., Bai, X.C., Brown, A., Fernandez, I., Hanssen, E., Condron, M., Tan, Y.H., Baum, J.,
Scheres, S.: Cryo-EM structure of the Plasmodium falciparum 80s ribosome bound to the anti-
protozoan drug emetine. Elife 3, e03080 (2014)
Wüthrich, K.: NMR with proteins and nucleic acids. Europhys. News 17(1), 11–13 (1986)
Xian, Y., Gu, H., Wang, W., Huang, X., Yao, Y., Wang, Y., Cai, J.F.: Data-driven tight frame for
cryo-em image denoising and conformational classification. In: Proceeding of the IEEE Global
Conference on Signal and Information Processing (GlobalSIP), pp. 544–548 (2018)
Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Advances
in Neural Information Processing Systems, pp. 341–349 (2012)
Yang, Q., Yan, P., Zhang, Y., Yu, H., Shi, Y., Mou, X., Kalra, M., Zhang, Y., Sun, L., Wang, G.:
Low-dose CT image denoising using a generative adversarial network with wasserstein distance
and perceptual loss. IEEE Trans. Med. Imaging 37(6), 1348–1357 (2018)
Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: Residual learning
of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)
Zhong, E., Bepler, T., Davis, J., Berger, B.: Reconstructing continuous distributions of 3D protein
structure from Cryo-EM images. In: International Conference on Learning Representation,
Addis Ababa (2020)
Variational Models and Their Combinations
with Deep Learning in Medical Image 27
Segmentation: A Survey
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002
Conventional Algorithms Based on Variational Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003
The Data Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1004
The Regularization Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006
Variational Models Meet Deep Learning in Medical Image Segmentation . . . . . . . . . . . . . . 1011
Variational Models Guided Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1011
Deep Learning-Driven Variational Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017
L. Gui · J. Ma
Department of Mathematics, Nanjing University of Science and Technology, Nanjing, China
e-mail: [email protected]; [email protected]
X. Yang ()
Department of Mathematics, Nanjing University, Nanjing, China
e-mail: [email protected]
Abstract
The combination of classical variational model-based methods with deep learning is a hot topic. In this survey, we briefly review segmentation methods based on variational models that make use of image information and regularity information. Subsequently, we clarify how the integration of variational methods into the deep learning framework leads to more precise segmentation results.
Keywords
Introduction
During the past 5 years, fully supervised deep learning methods have revolutionized medical image segmentation (Litjens et al. 2017), and many convolutional neural networks (CNNs) (Long et al. 2015; Shelhamer et al. 2017; Ronneberger et al. 2015; Isensee et al. 2021) have achieved unprecedented performance on tasks such as liver segmentation (Bilic et al. 2019; Kavur et al. 2021), cardiac segmentation (Bernard et al. 2018), and kidney segmentation (Heller et al. 2020). CNN-based segmentation methods directly build an end-to-end mapping between images and annotations by automatically learning object feature representations from a large number of training samples. The learned models can be directly applied to test images without any hyperparameter tuning. However, these methods lack
interpretability and rely on large training sets. In this paper, we mainly focus on fully supervised deep learning methods, although there are also weakly supervised methods (Cheplygina et al. 2019) for medical image segmentation.
Thanks to the complementary strengths of classical variational models and modern deep learning approaches, a natural trend is to combine the advantages of the two to design more accurate, data-efficient, and transparent segmentation methods.
This paper aims to present an overview of classical variational models and their extensions in the deep learning era, especially in medical image segmentation.
The remainder of this article is organized as follows. First, we introduce the
conventional variational models with typical data terms and regularization terms.
Then, we present the different combination mechanisms between variational models
and deep learning: variational model-guided deep learning and deep learning-driven
variational models. Finally, we draw a brief conclusion.
Conventional Algorithms Based on Variational Methods

In 1989, Mumford and Shah proposed a famous image segmentation model, named the Mumford-Shah (MS) model (Mumford and Shah 1989), which approximates the image I by a piecewise smooth function u with the following energy functional:

\[
E_{\mathrm{MS}}(C,u)=\underbrace{\int_{\Omega}|I-u|^{2}\,dx}_{E_{\mathrm{fidelity}}}
+\underbrace{\nu\int_{\Omega\setminus C}|\nabla u|^{2}\,dx+\gamma\,\mathcal{H}^{1}(C)}_{E_{\mathrm{regularization}}},
\tag{1}
\]

where C is a closed subset of the image domain Ω representing the boundary of the object, and H¹ is the one-dimensional Hausdorff measure.
The solution of the functional (1) consists of smooth regions Ri, represented by u, separated by sharp boundaries C. A reduced form of this problem restricts EMS to piecewise constant functions u, that is, u = ci on each Ri. This reduced case was proposed by Chan and Vese (2001); the energy functional of the Chan-Vese (CV) model is as follows:
\[
E_{\mathrm{CV}}(C,c_1,c_2)=\underbrace{\lambda_1\int_{\mathrm{inside}(C)}|I-c_1|^{2}\,dx+\lambda_2\int_{\mathrm{outside}(C)}|I-c_2|^{2}\,dx}_{E_{\mathrm{fidelity}}}
+\underbrace{\mu\,|C|}_{E_{\mathrm{regularization}}},
\tag{2}
\]
The third typical model is the snakes (active contour) model, and these kinds of models have been extensively studied since the original work of Kass et al. (1988). The main idea is to deform an initial contour so that it moves towards the boundary of the object to be detected. The classical snakes model associates the parametrized planar curve C(q) : [0, 1] → R² with an energy given by
\[
E_{\mathrm{snakes}}(C)=\underbrace{-\int_{0}^{1}|\nabla I(C(q))|^{2}\,dq}_{E_{\mathrm{fidelity}}}
+\underbrace{\int_{0}^{1}\bigl(\alpha\,|C'(q)|^{2}+\beta\,|C''(q)|^{2}\bigr)\,dq}_{E_{\mathrm{regularization}}}.
\tag{3}
\]
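To make the fidelity/regularization split concrete, the short sketch below evaluates the piecewise-constant Chan-Vese energy (2) for a given binary mask; the toy image, the mask, the weights λ1, λ2, μ, and the discrete perimeter used as a stand-in for |C| are all illustrative choices, not part of any cited implementation.

```python
import numpy as np

def chan_vese_energy(image, mask, lam1=1.0, lam2=1.0, mu=0.1):
    """Piecewise-constant Chan-Vese energy (2) of a binary segmentation mask.

    image : 2D float array (the observed image I)
    mask  : 2D boolean array (True = inside the contour C)
    """
    inside, outside = image[mask], image[~mask]
    c1 = inside.mean() if inside.size else 0.0    # mean intensity inside C
    c2 = outside.mean() if outside.size else 0.0  # mean intensity outside C
    fidelity = lam1 * ((inside - c1) ** 2).sum() + lam2 * ((outside - c2) ** 2).sum()
    # Discrete surrogate for the contour length |C|: number of horizontal
    # and vertical label changes in the mask.
    m = mask.astype(float)
    length = np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum()
    return fidelity + mu * length

# Toy usage: a noisy image containing a bright square and a rectangular mask guess.
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, (64, 64))
img[20:44, 20:44] += 0.6
guess = np.zeros((64, 64), dtype=bool)
guess[18:46, 18:46] = True
print("E_CV =", chan_vese_energy(img, guess))
```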
The above three most typical methods are based on regional information (MS and CV) and boundary information (snakes), respectively. Many researchers also classify variational model-based segmentation methods into two categories, region information-based and boundary information-based, mainly according to the type of information used in the data terms. However, in the actual
segmentation of medical images, there are inevitably disturbing factors such as
imaging noise, artifacts, and occlusions, which can easily mislead the segmentation
algorithm and lead to imprecise segmentation results. In this case, it has become an inevitable trend to impose proper features or constraints on the segmentation models. The energy term that achieves this is called the regularization term. The functionals of the above three classical methods consist of two types of energy, the fidelity term and the regularization term, as labeled in the energy functionals above. One is the term driven by image information, which guarantees the
correspondence between segmentation results and image data and is called the
fidelity term. The other guarantees specific properties of the contour or region. This
category is called the regularization term.
In image segmentation, the fidelity term is also called the data term for two main
reasons. First, the energy of this term usually originates from the image itself, such
as Efidelity in the snakes model, which utilizes the gradient information of the image,
and Efidelity in the CV model, which utilizes the mean values of the intensity of the
different regions of the image. In addition, segmentation models also usually make assumptions about the image, such as the MS model, in which a piecewise smooth function u is used to approximate the image. The fidelity term ∫_Ω |I − u|² dx ensures that the function u does not deviate too far from the actual image I. According to the type of image information utilized by the fidelity term, we classify data terms into two categories: boundary information-based and regional information-based.
The actual boundary of the object is usually considered to be where the pixel intensity changes most dramatically, so the boundary information can be obtained by applying edge detectors, which typically involve first- or second-order spatial differential operators.
One of the most popular segmentation models using edge information is the GAC model (Caselles et al. 1997), which uses the image gradient to construct a monotonically decreasing stopping function that controls the contour evolution. Since the object boundary usually corresponds to the locations of maximum image gradient, this method enables the contour to stop at the desired object boundary.
Since segmentation algorithms aim to find object boundaries, boundary detection and boundary-based segmentation are very intuitive ideas and give accurate segmentation results on good-quality images. However, since interferences such as noise and pseudo-boundaries are often present in medical images and segmentation targets often show weak or missing boundaries, boundary-dependent algorithms are often fragile in these cases. Therefore,
some researchers have also emphasized the importance of integrating regional
information for accurate segmentation (Haddon and Boyce 1990; Falah et al. 1994;
Chan et al. 1996; Muñoz et al. 2003).
The Regularization Term

In segmentation, the regularization term, also known as a constraint, keeps the model from overfitting or imposes some restrictions so that the segmentation curve or segmented region has specific desired properties. Based on their purpose, regularization terms can be divided into two categories: generic regularization terms, which are not related to the segmented objects, and specific regularization terms, which are related to the segmented objects and constrain and guide the segmentation model according to some characteristics of the objects.
Generic Regularization
The constraints imposed on the curve are usually independent of the specific
segmentation target, by which the smoothness or other characteristics of the curve
are guaranteed.
Fig. 1 The 1st row: liver segmentation results, from left to right: by CV model (Chan and Vese
2001) and method from Gui et al. (2017c); the 2nd row: zoomed regions of the segmentation results
The penalty on length is one of the most famous regularization terms in segmentation models, such as the MS model (1) and the CV model (2). Although the constraint on the length of the contour helps cope with problems such as a certain amount of noise in the image, it also introduces a bias towards shorter contours, which leads to isotropically smoothed segmentation curves and small or shortened objects.
The total variation regularization smooths only along the tangent direction of each level line:

\[
R_{\mathrm{TV}}(u)=\sup\Bigl\{\int_{\Omega}u\,\operatorname{div}\varphi\,dx \;:\; \varphi\in C_c^{1}(\Omega;\mathbb{R}^{2}),\ \|\varphi\|_{\infty}\le 1\Bigr\}.
\tag{4}
\]
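In the discrete setting, the dual definition (4) reduces to summing the gradient magnitudes of the image. The minimal sketch below, which is only an illustration and not part of any cited method, computes this isotropic total variation with forward differences.

```python
import numpy as np

def total_variation(u):
    """Isotropic discrete total variation of a 2D array u (forward differences)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences
    return np.sqrt(dx ** 2 + dy ** 2).sum()

# A piecewise-constant image has TV equal to the jump size times the length of
# its discontinuity set, which is exactly what the length penalty exploits.
u = np.zeros((32, 32))
u[:, 16:] = 1.0
print(total_variation(u))  # 32.0: one unit jump along a 32-pixel-long edge
```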
Another widely used choice is Euler's elastica regularization, which penalizes ∫_C (a + b κ²) ds, where κ is the curvature of the contour and a, b > 0 are two parameters. The most remarkable feature of elastica regularity is that it promotes convex contours. It may therefore be used for particular tasks of segmenting objects with a convex shape (Bae et al. 2017). In the snakes model (3), the regularization term consists of two components, the bending energy and the elastic energy. The bending energy is defined as the sum of the squared curvature of the curve, generating the bending force, whereas the elastic energy prevents stretching of the curve by introducing tension.
In addition to restrictions on the nature of the curve itself, regularization terms on the curve have also been proposed to guarantee the stability and speed of the evolution. For instance, Li et al. (2010) avoided re-initialization of the level set by imposing a restriction on the gradient of the higher-dimensional surface φ while ensuring evolutionary stability, making larger steps and faster evolution possible. Yu et al. (2019) imposed a restriction on a small neighborhood of the zero level set by adding a perturbation factor, thus breaking the pseudo-balance caused by heavy noise and reaching the global optimum.
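The gradient restriction of Li et al. (2010) is commonly quoted as a penalty that keeps |∇φ| close to 1 so that φ behaves like a signed distance function. The sketch below evaluates such a penalty in the frequently used form ½(|∇φ| − 1)², stated here as an assumption about the regularizer rather than a quotation of the original formula.

```python
import numpy as np

def distance_regularization(phi):
    """Assumed penalty encouraging |grad(phi)| ~ 1: sum of 0.5*(|grad phi| - 1)^2."""
    gy, gx = np.gradient(phi)
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    return 0.5 * ((grad_norm - 1.0) ** 2).sum()

# A true signed distance function yields an (almost) zero penalty,
# while a rescaled level set function is heavily penalized.
y, x = np.mgrid[0:64, 0:64]
sdf = np.sqrt((x - 32.0) ** 2 + (y - 32.0) ** 2) - 10.0  # signed distance to a circle
print(distance_regularization(sdf))        # near zero
print(distance_regularization(3.0 * sdf))  # large: |grad phi| = 3 everywhere
```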
Specific Regularization

Constraints derived from the morphological, geometric, and topological properties of the target are widely applied to promote segmentation efficiency.
The segmentation task in medical images is usually to segment out some organs,
tissues, or lesions. Fortunately, some organs and tissues have generally similar
morphological features. Although the images are subject to imaging errors and
individual differences, the shape prior is a robust semantic descriptor for specifying
targeted objects. In our categorization, shape prior can be modeled in two ways:
building statistical templates and representing by analytical expressions.
Some simple shapes, such as circles or ellipses, can be expressed analytically, and
by optimizing the parameters of these analytic expressions, the shape constraints of
this analytic representation can be adapted to different variations of the segmented
objects, including scale, rotation, and translation (Ray and Acton 2004).
For complex shapes that are difficult to express analytically, an alternative
approach is to use a prior shape representation in the form of templates. Template-
based shape priors are usually obtained by training on a set of similar shapes.
Some researchers have studied the distribution of points on significant positions
of the object, also called landmark points, to build a shape template for the
object (Cootes et al. 1995), and some researchers employed boundary points as
the shape templates (Grenander et al. 2012; Mardia et al. 1991). Subsequently,
this kind of parametric point distribution shape prior was also extended into a
hybrid segmentation model incorporating intensities (Grenander and Miller 1994)
or both gradient and region-homogeneity information (Chakraborty et al. 1994). In
the level-set-based approaches, the shape constraint is represented as the zero level set of a higher-dimensional surface. Any deviation from the prior shape can be penalized (Leventon et al. 2002); a simple way to calculate the dissimilarity between two level set functions is ∫_Ω (φ1 − φ2)² dx, where φ1 and φ2 are the shape constraint and the segmented contour, respectively. Usually, to fit the unknown segmentation target, parameters
of position, scale, orientation, and other information are also included in the shape
energy term (Chen et al. 2002; Pluempitiwiriyawej et al. 2005).
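A minimal sketch of the level-set shape dissimilarity ∫_Ω (φ1 − φ2)² dx discussed above; both level set functions are synthetic circles, and the pose parameters (position, scale, orientation) that the cited models additionally optimize are deliberately left out.

```python
import numpy as np

def shape_dissimilarity(phi1, phi2, pixel_area=1.0):
    """Squared L2 distance between two level set functions on the same grid."""
    return ((phi1 - phi2) ** 2).sum() * pixel_area

def circle_sdf(shape, center, radius):
    """Signed distance function of a circle (negative inside)."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    return np.sqrt((x - center[1]) ** 2 + (y - center[0]) ** 2) - radius

prior = circle_sdf((64, 64), (32, 32), 12)    # shape constraint phi_1
contour = circle_sdf((64, 64), (34, 30), 14)  # current segmentation phi_2
print(shape_dissimilarity(prior, contour))
```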
In addition to specific shapes, segmentation targets on medical images may have
other more general morphological properties that allow researchers to add them as
high-level information to the energy functional as effective constraints. For example,
many objects have convex characteristics. As mentioned above, the curvature-
based elastic energy term can maintain the convexity of the target. In addition,
the limitation of the region can also provide the convexity of the segmentation
target (Li et al. 2019; Yan et al. 2020; Luo et al. 2019). In medical images, the
left ventricle segmentation is a representative example of the need to preserve the
convexity of the object (Feng et al. 2016; Shi and Li 2021; Hajiaghayi et al. 2016).
Segmentation of the left ventricle (LV) is critical for the diagnosis of cardiovascular
disease. Accurate assessment of crucial clinical parameters such as ejection fraction,
myocardial mass, and beat volume depends on the segmentation of the LV, that is,
the precise segmentation of the endocardial border. According to the anatomy of
the left ventricle, the left ventricle includes the cardiac chambers, trabeculae, and
papillary muscles surrounded by the myocardium. Although there is good contrast
between myocardium and blood flow on MR images, there are still difficulties in
segmentation. This problem is mainly due to the presence of papillary muscles
and trabeculae (irregular walls) within the ventricles. They have the same intensity
distribution as the surrounding myocardial tissue. Therefore, they can easily mislead
the segmentation algorithm and prevent the walls from being clearly depicted,
causing critical difficulties in endocardial segmentation.
In addition to the above geometric features, many other regularization terms
proposed for segmented object characteristics can also facilitate segmentation.
For example, some segmentation objects tend to cluster together, which is described as compactness. This characteristic can be used as a constraint when segmenting organs such as the liver and prostate, as well as cysts and most hepatocellular carcinomas (Gui et al. 2017b). Considering that segmented objects in medical images may be deformed due to lesions, researchers have used low-order moments as regularizers to constrain the size/volume (Ayed et al. 2008) or location (Klodt and
Cremers 2011) of the objects. Figure 2 shows the different segmentation results
given by the two methods, one using the classical GAC method (Caselles et al.
1997) without any prior and the other using the intensity information of the image
and the isoperimetric shape prior (Gui et al. 2017b). The differences between the
two segmentation results can be observed by zooming in on the region.
Fig. 2 The 1st row: liver segmentation results, from left to right: by geodesic active contours
(GAC) (Caselles et al. 1997) and method from Gui et al. (2017b) ; the 2nd row: zoomed regions of
the segmentation results
Variational Models Meet Deep Learning in Medical Image Segmentation

Since 2015, deep learning has gradually dominated medical image segmentation methods. A typical segmentation network is composed of an encoder network followed by a decoder network. The encoder network extracts and aggregates features from the input images, and the decoder network projects the features onto the pixel space to obtain dense predictions. In this way, the deep learning network can directly generate pixel-wise segmentation results from input images. Thus, a natural question is whether one can combine the advantages of deep learning networks and variational models. In this section, we will summarize the progress in this direction.
\[
\phi_{t+1}=\phi_t+\eta\,\frac{\partial \phi_t}{\partial t},
\tag{7}
\]
where η is the step size (or the learning rate in deep learning). Then, sequence data {x_t} for the recurrent network input are generated based on the level set evolution:
1 The network module is a combination of several network layers, which is part of the network.
For example, the well-known U-Net consists of multiple Convolution-Batch Normalization-ReLU
modules.
different update rules and outputs. Specifically, the variational level set is updated by the gradient flow of the energy functional, and the output is still a level set function, while the deep learning level set is updated by network layers with learnable parameters, and the output is a Softmax probability map.
This network module can be directly connected to existing segmentation net-
works with convolutional layers and deconvolutional layers for medical image
segmentation. For example, Le et al. proposed deep recurrent level set network for
brain tumor segmentation (Le et al. 2018a), which achieved less computational time
during inference and improved the Dice Similarity Coefficient (DSC) by 1–2%.
In addition to unrolling the level set evolution as network modules, regularizers or
priors in classical variational models can also be incorporated into segmentation net-
works for end-to-end learning. The main challenge is to formulate the non-smooth
constraints as differentiable network modules. Typical segmentation CNNs (Ron-
neberger et al. 2015; Çiçek et al. 2016; Shelhamer et al. 2017) predict each pixel
independently and do not explicitly consider the dependency between pixels, which
could lead to isolated or scattered small segmentation errors, especially when only a few training samples are available. To embed spatial regularity in segmentation CNNs,
Jia et al. proposed total variation (TV) regularized segmentation CNNs (Jia et al.
2021) to add spatial regularization to the segmented networks, which can produce
smooth edges and eliminate isolated segmentation errors. This approach was further
applied to pancreas segmentation (Fan and Tai 2019) by unfolding the primal-dual
block of TV regularizer and embedding in 2D U-Net (Ronneberger et al. 2015).
This type of method has two main benefits. On the one hand, it can produce smooth
segmentation edges and eliminate isolated segmentation errors. On the other hand,
it is more efficient than the commonly used post-processing methods (Kamnitsas
et al. 2017). In order to explicitly add non-local priors to CNNs, Jia et al. (2020)
introduced graph total variation to the Softmax function by a primal-dual hybrid
gradient method, which can capture long-range information.
Some common shape priors were embedded in segmentation CNNs by reformu-
lating the Softmax layer. Liu et al. (2020b) proposed a Soft Threshold Dynamics
framework to integrate many spatial priors of the classical variational models into
segmentation CNNs, including spatial regularization, volume, and star-shape priors.
The key idea is to interpret the Softmax function s as the solution of a variational problem (9) in the network output o of the last layer, subject to the simplex constraint s_1 + · · · + s_N = 1 (N is the number of classes). In this way, many spatial priors can be imposed on the Softmax results by adding corresponding terms to the energy functional (9). Furthermore, a Soft Threshold Dynamics algorithm was designed to solve the regularized variational problems, which enables stable and fast convergence during forward and backward propagation. Similarly, the convex shape prior (Liu et al. 2020a) and volume-preserving regularization (Li et al. 2020a) were also imposed on segmentation CNNs.
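To make the Softmax interpretation concrete, the numeric check below assumes that the variational problem (9) takes the commonly used entropy-regularized form s = argmin over the probability simplex of −⟨o, s⟩ + Σ_i s_i ln s_i; this exact form is an assumption made here for illustration, not a quotation of the chapter's functional. Under that assumption, the minimizer coincides with the Softmax of the network output o.

```python
import numpy as np
from scipy.optimize import minimize

o = np.array([1.2, -0.3, 0.7])  # hypothetical last-layer network output for N = 3 classes

def objective(s):
    # -<o, s> + sum_i s_i * ln(s_i), with small clipping to avoid log(0)
    s = np.clip(s, 1e-12, None)
    return -(o @ s) + (s * np.log(s)).sum()

# Minimize over the probability simplex: 0 <= s_i <= 1 and sum_i s_i = 1.
res = minimize(objective, x0=np.full(3, 1 / 3),
               bounds=[(0, 1)] * 3,
               constraints=[{"type": "eq", "fun": lambda s: s.sum() - 1}])

softmax = np.exp(o) / np.exp(o).sum()
print(res.x)     # numerical minimizer
print(softmax)   # closed-form Softmax; the two agree up to solver tolerance
```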
The Mumford-Shah model-inspired loss function (Kim and Ye 2019). This loss function is based on the observation that the characteristic function in the Mumford-Shah model has a striking similarity to the Softmax function in segmentation CNNs. Thus, Kim and Ye proposed the following loss function by replacing the characteristic function with the Softmax function:
\[
L_{\mathrm{MS}}(\Theta;I)=\sum_{i=1}^{N}\int_{\Omega}|I(x)-c_i|^{2}\,S_i(I(x);\Theta)\,dx
+\lambda\sum_{i=1}^{N}\int_{\Omega}|\nabla S_i(I(x);\Theta)|\,dx,
\tag{10}
\]
where Θ denotes the trainable network parameters and
\[
c_i=\frac{\int_{\Omega} I(x)\,S_i(x;\Theta)\,dx}{\int_{\Omega} S_i(x;\Theta)\,dx}
\tag{11}
\]
is the average intensity value of the i-th class. This loss function enables semi-
supervised and unsupervised segmentation, which only requires limited labeled
data.
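The sketch below evaluates the loss (10)–(11) on a toy image; the class probability maps are synthetic placeholders standing in for the Softmax output S_i(I(x); Θ) of a segmentation network, and λ is chosen arbitrarily.

```python
import numpy as np

def mumford_shah_loss(image, probs, lam=0.1):
    """Mumford-Shah-inspired loss (10) for given class probability maps.

    image : (H, W) array, the input image I
    probs : (N, H, W) array of class probabilities (Softmax output), summing to 1
    """
    loss = 0.0
    for s_i in probs:
        c_i = (image * s_i).sum() / (s_i.sum() + 1e-8)   # class mean, Eq. (11)
        loss += ((image - c_i) ** 2 * s_i).sum()         # region fidelity term
        gy, gx = np.gradient(s_i)
        loss += lam * np.sqrt(gx ** 2 + gy ** 2).sum()   # TV of the probability map
    return loss

# Toy example: two classes with soft probabilities roughly matching a bright square.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
p_fg = np.clip(img + 0.1 * np.random.default_rng(1).normal(size=img.shape), 0.0, 1.0)
probs = np.stack([p_fg, 1.0 - p_fg])
print(mumford_shah_loss(img, probs))
```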
Chan-Vese model-inspired loss function. Kim et al. introduced the level set loss (Kim et al. 2019) by using the region term of the Chan-Vese model, which is defined by
\[
L_{\mathrm{LevelSet}}=\int_{\Omega}|I_{GT}-c_1|^{2}\,H(\phi_\Theta)\,dx
+\int_{\Omega}|I_{GT}-c_2|^{2}\,\bigl(1-H(\phi_\Theta)\bigr)\,dx,
\tag{12}
\]
where φ_Θ is the level set function predicted by the network with parameters Θ and H(φ_Θ) = ½(1 + tanh(φ_Θ)) is a smooth Heaviside approximation. c_1 and c_2 denote the average values of the interior and exterior of the contour, which are defined by
\[
c_1=\frac{\int_{\Omega} I_{GT}\,H(\phi_\Theta)\,dx}{\int_{\Omega} H(\phi_\Theta)\,dx}
\qquad\text{and}\qquad
c_2=\frac{\int_{\Omega} I_{GT}\,\bigl(1-H(\phi_\Theta)\bigr)\,dx}{\int_{\Omega}\bigl(1-H(\phi_\Theta)\bigr)\,dx},
\]
respectively.
Chen et al. proposed an active contour loss (Chen et al. 2019) that considers the areas inside and outside the object as well as the length of the boundary during learning. In particular, it introduces the total variation to approximate the boundary length and membership functions to compute the region areas.
Geodesic active contour-inspired loss (Ma et al. 2021b). To explicitly embed object global information in segmentation CNNs, Ma et al. proposed a level set regression network with the geodesic active contour loss function:
\[
L_{\mathrm{GAC}}=\int_{\Omega} g_I\,\delta(\phi_\Theta)\,|\nabla\phi_\Theta|\,dx,
\tag{15}
\]
where gI = 1/(1 + |∇I|) is the edge indicator function. Different from the level set loss
and active contour loss that only used the groundtruth information, the geodesic
active contour loss explicitly introduced the image gradient information, which can
guide the CNNs to capture detailed boundary information.
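A sketch of the geodesic active contour loss (15): the edge indicator g_I and a smooth Dirac delta (here the derivative of a tanh-based Heaviside, with a smoothing width eps chosen for illustration) are computed explicitly, and the predicted level set φ_Θ is a synthetic placeholder rather than a network output.

```python
import numpy as np

def gac_loss(image, phi, eps=1.0):
    """Geodesic-active-contour-inspired loss (15) for a level set function phi."""
    gy, gx = np.gradient(image)
    g = 1.0 / (1.0 + np.sqrt(gx ** 2 + gy ** 2))      # edge indicator g_I
    # Smooth Dirac delta: derivative of H(phi) = 0.5 * (1 + tanh(phi / eps)).
    delta = 0.5 / eps * (1.0 - np.tanh(phi / eps) ** 2)
    py, px = np.gradient(phi)
    return (g * delta * np.sqrt(px ** 2 + py ** 2)).sum()

# The loss is smaller when the zero level set of phi sits on strong image edges.
y, x = np.mgrid[0:64, 0:64]
image = (np.sqrt((x - 32) ** 2 + (y - 32) ** 2) < 15).astype(float)
phi_on_edge = 15.0 - np.sqrt((x - 32) ** 2 + (y - 32) ** 2)
phi_off_edge = 25.0 - np.sqrt((x - 32) ** 2 + (y - 32) ** 2)
print(gac_loss(image, phi_on_edge), gac_loss(image, phi_off_edge))
```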
Figure 3 presents the visualized segmentation results of different methods on left
atrial MRI and pancreas CT images (Fig. 3-a). Commonly used Dice loss (Milletari
et al. 2016) (Fig. 3-b) may have obvious segmentation errors because it does not
have any global constraint. Level set loss (Kim et al. 2019) (Fig. 3-c) and active
contour loss (Chen et al. 2019) (Fig. 3-d) generate similar results that are better than
the Dice loss. However, there are still some isolated outliers in the segmentation
results. In contrast, the learning GAC (Ma et al. 2021b) (Fig. 3-e) significantly
reduces the isolated segmentation masses, and the boundaries are closer to the
Fig. 3 Qualitative comparisons between the commonly used Dice loss (Milletari et al. 2016), the Chan-Vese model-inspired level set loss (Kim et al. 2019), the active contour loss (Chen et al. 2019), and the geodesic active contour-inspired learning GAC method (Ma et al. 2021b) on left atrial MRI (rows 1–2) and pancreas CT (rows 3–4). The green and red contours denote ground truth and segmentation results, respectively. (a) Image. (b) Dice loss. (c) Level set loss. (d) Active contour loss. (e) Learning GAC
ground truth. This is because the learning GAC explicitly considers the image boundary information and the geodesic geometry constraint, which can guide the network outputs to achieve a lower-energy state of the geodesic active contour model and thus lead to more accurate results in boundary regions. In addition, it should be noted that the above variational model-inspired loss functions should be added to the Dice loss in a supervised learning framework.
Deep Learning-Driven Variational Models

In this category, the variational model remains the backbone of the segmentation pipeline, while a deep network estimates its hyperparameters or initialization. A representative approach (Hoogi et al. 2017) adaptively weights the region terms of a localized Chan-Vese formulation:

\[
\min_{\phi,c_1,c_2}\int_{\Omega}\delta(\phi)\,|\nabla\phi|\,dx
+\lambda_1\int_{\Omega}\frac{(I-c_1)^{2}}{A_1}\,H(\phi)\,dx
+\lambda_2\int_{\Omega}\frac{(I-c_2)^{2}}{A_2}\,\bigl(1-H(\phi)\bigr)\,dx,
\tag{16}
\]

where A1 = ∫_Ω H(φ) dx and A2 = ∫_Ω (1 − H(φ)) dx are the areas of the local interior and exterior regions surrounding the contour. To adaptively estimate the
region term weights λ1 and λ2 separately for each case during contour evolution,
a CNN was employed to predict the location of the zero level set contour relative
to the segmentation target (e.g., lesions), and the output was a probability for each
of three classes: inside the lesion and far from its boundaries (p1), close to the
boundaries of the lesion (p2), or outside the lesion and far away from its boundaries
(p3). The weight parameters were set as follows:
\[
\lambda_1=\exp\Bigl(\frac{1+p_2+p_3}{1+p_1+p_2}\Bigr),\qquad
\lambda_2=\exp\Bigl(\frac{1+p_1+p_2}{1+p_2+p_3}\Bigr).
\tag{17}
\]
If p1 > p3, then λ2 > λ1 and the contour will expand. Conversely, if p3 > p1, then λ1 > λ2 and the contour tends to shrink. In this way, the contour can be adaptively expanded or shrunk towards the object boundary without any manual tuning.
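A small numeric check of the weighting rule (17) and the expand/shrink behavior described above; the class probabilities (p1, p2, p3) are illustrative values rather than outputs of the cited CNN.

```python
import numpy as np

def adaptive_weights(p1, p2, p3):
    """Region-term weights of Eq. (17) from the predicted class probabilities."""
    lam1 = np.exp((1 + p2 + p3) / (1 + p1 + p2))
    lam2 = np.exp((1 + p1 + p2) / (1 + p2 + p3))
    return lam1, lam2

# Contour deep inside the lesion (p1 dominates): lam2 > lam1, so the contour expands.
print(adaptive_weights(0.8, 0.1, 0.1))
# Contour outside the lesion (p3 dominates): lam1 > lam2, so the contour shrinks.
print(adaptive_weights(0.1, 0.1, 0.8))
# Contour near the boundary (p2 dominates): lam1 ~= lam2, and the evolution nearly stops.
print(adaptive_weights(0.1, 0.8, 0.1))
```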
Instead of predicting the contour location, Hatamizadeh et al. (2019) used an
encoder-decoder network to predict the segmentation probability map Sθ . The
weights were set as follows:
\[
\lambda_1=\exp\Bigl(\frac{2-S_\theta}{1+S_\theta}\Bigr),\qquad
\lambda_2=\exp\Bigl(\frac{1+S_\theta}{2-S_\theta}\Bigr).
\tag{18}
\]
Experiments on various lesion segmentation tasks (e.g., brain lesion, liver lesion,
lung lesion) and image modalities (CT and MR) show that the proposed method can
produce more accurate and detailed boundaries compared with only using CNNs.
This minimization problem can be solved by the split Bregman algorithm (Goldstein
et al. 2010). In the forward propagation, the DenseU-Net generated initial contours
and pixel-wise hyperparameter maps of Eq. (19). Then, the contours, maps, and
input images were transmitted to the active contour model that was solved by the
split Bregman algorithm (Goldstein et al. 2010). The whole network was trained by comparing the final output to the ground truth with a cross-entropy loss function.
Conclusion
In this paper, we have introduced the typical variational models and their com-
binations with modern deep learning methods, which have many applications in
medical image segmentation. We have witnessed several different strategies to fuse
the merits of variational models and deep learning methods. However, there is still a lack of a public segmentation benchmark to evaluate and compare these methods on a common and fair platform. We hope this survey can reach broad audiences with diverse backgrounds and inspire more cross-disciplinary research between variational models and deep learning.
References
Ali, H., Rada, L., Badshah, N.: Image segmentation for intensity inhomogeneity in presence of
high noise. IEEE Trans. Image Process. 27(8), 3729–3738 (2018)
Ayed, I.B., Li, S., Islam, A., Garvin, G., Chhem, R.: Area prior constrained level set evolution
for medical image segmentation. In: Medical Imaging 2008: Image Processing, vol. 6914,
p. 691402. International Society for Optics and Photonics (2008)
Bae, E., Tai, X.C., Wei, Z.: Augmented lagrangian method for an Euler’s elastica based segmenta-
tion model that promotes convex contours (2017)
Balafar, M.: Gaussian mixture model based segmentation methods for brain MRI images. Artif.
Intell. Rev. 41(3), 429–439 (2014)
Beichel, R., Bischof, H., Leberl, F., Sonka, M.: Robust active appearance models and their
application to medical image analysis. IEEE Trans. Med. Imaging 24(9), 1151–1169 (2005)
Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., Cetin, I., Lekadir,
K., Camara, O., Ballester, M.A.G., et al.: Deep learning techniques for automatic mri cardiac
multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging
37(11), 2514–2525 (2018)
Bilic, P., Christ, P.F., Vorontsov, E., Chlebus, G., Chen, H., Dou, Q., Fu, C.W., Han, X.,
Heng, P.A., Hesser, J., et al.: The liver tumor segmentation benchmark (lits). arXiv preprint
arXiv:1901.04056 (2019)
Boonnuk, T., Srisuk, S., Sripramong, T.: Texture segmentation using active contour model with
edge flow vector. Int. J. Inf. Electron. Eng. 5(2), 107 (2015)
Boykov, Y., Funka-Lea, G.: Graph cuts and efficient ND image segmentation. Int. J. Comput. Vis.
70(2), 109–131 (2006)
Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE
Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001)
Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22(1), 61–79
(1997)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense
volumetric segmentation from sparse annotation. In: International Conference on Medical
Image Computing and Computer-Assisted Intervention, pp. 424–432 (2016)
Chakraborty, A., Staib, L.H., Duncan, J.S.: An integrated approach to boundary finding in medical
images. In: Proceedings of IEEE Workshop on Biomedical Image Analysis, pp. 13–22. IEEE
(1994)
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277
(2001)
Chan, F., Lam, F., Poon, P., Zhu, H., Chan, K.: Object boundary location by region and contour
deformation. IEE Proc.-Vis. Image Sig. Process. 143(6), 353–360 (1996)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Chen, Y., Tagare, H.D., Thiruvenkadam, S., Huang, F., Wilson, D., Gopinath, K.S., Briggs, R.W.,
Geiser, E.A.: Using prior shapes in geometric active contours in a variational framework. Int. J.
Comput. Vis. 50(3), 315–328 (2002)
Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning
active contour models for medical image segmentation. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 11632–11640 (2019)
Cheplygina, V., de Bruijne, M., Pluim, J.P.: Not-so-supervised: a survey of semi-supervised, multi-
instance, and transfer learning in medical image analysis. Med. Image Anal. 54, 280–296
(2019)
Cho, K., van Merrienboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.:
Learning phrase representations using rnn encoder-decoder for statistical machine translation.
In: Empirical Methods in Natural Language Processing (EMNLP) (2014)
Cootes, T.F., Hill, A., Taylor, C.J., Haslam, J.: Use of active shape models for locating structures
in medical images. Image Vis. Comput. 12(6), 355–365 (1994)
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models-their training and
application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
Cootes, T., Baldock, E., Graham, J.: An introduction to active shape models. Image Process. Anal.
328, 223–248 (2000)
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal.
Mach. Intell. 23(6), 681–685 (2001)
Cremers, D., Rousson, M., Deriche, R.: A review of statistical approaches to level set segmentation:
integrating color, texture, motion and shape. Int. J. Comput. Vis. 72(2), 195–215 (2007)
Esedoglu, S., March, R.: Segmentation with depth but without detecting junctions. J. Math.
Imaging Vis. 18(1), 7–15 (2003)
Falah, R.K., Bolon, P., Cocquerez, J.P.: A region-region and region-edge cooperative approach
of image segmentation. In: Proceedings of 1st International Conference on Image Processing,
vol. 3, pp. 470–474. IEEE (1994)
Fan, J., Tai, X.C.: Regularized UNet for automated pancreas segmentation. In: Proceedings of the
Third International Symposium on Image Computing and Digital Medicine, pp. 113–117 (2019)
Feng, C., Zhang, S., Zhao, D., Li, C.: Simultaneous extraction of endocardial and epicardial
contours of the left ventricle by distance regularized level sets. Med. Phys. 43(6Part1), 2741–
2755 (2016)
Goldstein, T., Bresson, X., Osher, S.: Geometric applications of the split bregman method:
segmentation and surface reconstruction. J. Sci. Comput. 45(1), 272–293 (2010)
Grenander, U., Miller, M.I.: Representations of knowledge in complex systems. J. R. Stat. Soc.:
Ser. B (Methodological) 56(4), 549–581 (1994)
Grenander, U., Chow, Y.-S., Keenan, D.M.: Hands: A pattern theoretic study of biological shapes,
vol. 2. Springer Science & Business Media, New York (2012)
Gui, L., Yang, X.: Automatic renal lesion segmentation in ultrasound images based on saliency
features, improved lbp, and an edge indicator under level set framework. Med. Phys. 45(1),
223–235 (2018)
Gui, L., He, J., Qiu, Y., Yang, X.: Integrating compact constraint and distance regularization with
level set for hepatocellular carcinoma (HCC) segmentation on computed tomography (CT)
images. Sens. Imaging 18(1), 4 (2017a)
Gui, L., Li, C., Yang, X.P.: Medical image segmentation based on level set and isoperimetric
constraint. Phys. Med. 42, 162–173 (2017b)
Gui, L., Yang, X., Cremers, A.B., Chen, Y.: Dempster-shafer evidence theory-based CV model for
renal lesion segmentation of medical ultrasound images. J. Med. Imaging Health Inform. 7(3),
595–606 (2017c)
Gur, S., Wolf, L., Golgher, L., Blinder, P.: Unsupervised microvascular image segmentation using
an active contours mimicking neural network. In: Proceedings of the IEEE International
Conference on Computer Vision, pp. 10722–10731 (2019)
Haddon, J.F., Boyce, J.F.: Image segmentation by unifying region and boundary information. IEEE
Trans. Pattern Anal. Mach. Intell. 12(10), 929–948 (1990)
Hajiaghayi, M., Groves, E.M., Jafarkhani, H., Kheradvar, A.: A 3-D active contour method for
automated segmentation of the left ventricle from magnetic resonance images. IEEE Trans.
Biomed. Eng. 64(1), 134–144 (2016)
Hatamizadeh, A., Hoogi, A., Sengupta, D., Lu, W., Wilcox, B., Rubin, D., Terzopoulos, D.:
Deep active lesion segmentation. In: International Workshop on Machine Learning in Medical
Imaging, pp. 98–105 (2019)
Hatamizadeh, A., Sengupta, D., Terzopoulos, D.: End-to-end trainable deep active contour models
for automated image segmentation: delineating buildings in aerial imagery. In: European
Conference on Computer Vision, pp. 730–746 (2020)
Heller, N., Isensee, F., Maier-Hein, K.H., Hou, X., Xie, C., Li, F., Nan, Y., Mu, G., Lin, Z., Han,
M., Yao, G., Gao, Y., Zhang, Y., Wang, Y., Hou, F., Yang, J., Xiong, G., Tian, J., Zhong, C., Ma,
J., Rickman, J., Dean, J., Stai, B., Tejpaul, R., Oestreich, M., Blake, P., Kaluzniak, H., Raza,
S., Rosenberg, J., Moore, K., Walczak, E., Rengel, Z., Edgerton, Z., Vasdev, R., Peterson, M.,
McSweeney, S., Peterson, S., Kalapara, A., Sathianathen, N., Papanikolopoulos, N., Weight, C.:
The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging:
results of the kits19 challenge. Med. Image Anal. 67, 101821 (2020)
Hoogi, A., Subramaniam, A., Veerapaneni, R., Rubin, D.L.: Adaptive estimation of active contour
parameters using convolutional neural networks and texture analysis. IEEE Trans. Med.
Imaging 36(3), 781–791 (2017)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional net-
works. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 4700–4708 (2017)
Iglesias, J.E., Sabuncu, M.R.: Multi-atlas segmentation of biomedical images: a survey. Med.
Image Anal. 24(1), 205–219 (2015)
Isensee, F., Jäeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring
method for deep learning-based biomedical image segmentation. Nat. Methods 18(2), 203–211
(2021)
Jia, F., Tai, X.C., Liu, J.: Nonlocal regularized cnn for image segmentation. Inverse Probl. Imaging
14(5), 891 (2020)
Jia, F., Liu, J., Tai, X.C.: A regularized convolutional neural network for semantic image
segmentation. Anal. Appl. 19(01), 147–165 (2021)
Ji, Z., Xia, Y., Sun, Q., Chen, Q., Xia, D., Feng, D.D.: Fuzzy local Gaussian mixture model for
brain MR image segmentation. IEEE Trans. Inf. Technol. Biomed. 16(3), 339–347 (2012)
Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D.,
Glocker, B.: Efficient multi-scale 3D CNN with fully connected crf for accurate brain lesion
segmentation. Med. Image Anal. 36, 61–78 (2017)
Kanizsa, G.: Contours without gradients or cognitive contours? Giornale Italiano di Psicologia
(1974)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4),
321–331 (1988)
Kavur, A.E., Gezer, N.S., Barış, M., Aslan, S., Conze, P.H., Groza, V., Pham, D.D., Chatterjee,
S., Ernst, P., Özkan, S., Baydar, B., Lachinov, D., Han, S., Pauli, J., Isensee, F., Perkonigg, M.,
Sathish, R., Rajan, R., Sheet, D., Dovletov, G., Speck, O., Nürnberger, A., Maier-Hein, K.H.,
Bozdaḡı Akar, G., Ünal, G., Dicle, O., Selver, M.A.: Chaos challenge – combined (CT-MR)
healthy abdominal organ segmentation. Med. Image Anal. 69, 101950 (2021)
Kim, B., Ye, J.C.: Mumford–Shah loss functional for image segmentation with deep learning. IEEE
Trans. Image Process. 29, 1856–1866 (2019)
Kim, Y., Kim, S., Kim, T., Kim, C.: CNN-based semantic segmentation using level set loss. In:
2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1752–1760
(2019)
Klodt, M., Cremers, D.: A convex framework for image segmentation with moment constraints.
In: 2011 International Conference on Computer Vision, pp. 2236–2243. IEEE (2011)
Lankton, S., Tannenbaum, A.: Localizing region-based active contours. IEEE Trans. Image
Process. 17(11), 2029–2039 (2008)
Le, T.H.N., Gummadi, R., Savvides, M.: Deep recurrent level set for segmenting brain tumors. In:
International Conference on Medical Image Computing and Computer-Assisted Intervention,
pp. 646–653 (2018a)
Le, T.H.N., Quach, K.G., Luu, K., Duong, C.N., Savvides, M.: Reformulating level sets as deep
recurrent neural network approach to semantic segmentation. IEEE Trans. Image Process. 27(5),
2393–2407 (2018b)
Leventon, M.E., Grimson, W.E.L., Faugeras, O.: Statistical shape influence in geodesic active
contours. In: 5th IEEE EMBS International Summer School on Biomedical Imaging, 2002,
p. 8. IEEE (2002)
Li, C., Kao, C.Y., Gore, J.C., Ding, Z.: Minimization of region-scalable fitting energy for image
segmentation. IEEE Trans. Image Process. 17(10), 1940–1949 (2008)
Li, C., Xu, C., Anderson, A.W., Gore, J.C.: MRI tissue classification and bias field estimation
based on coherent local intensity clustering: a unified energy minimization framework. In:
International Conference on Information Processing in Medical Imaging, pp. 288–299. Springer
(2009)
Li, C., Xu, C., Gui, C., Fox, M.D.: Distance regularized level set evolution and its application to
image segmentation. IEEE Trans. Image Process. 19(12), 3243–3254 (2010)
Li, C., Huang, R., Ding, Z., Gatenby, J.C., Metaxas, D.N., Gore, J.C.: A level set method for image
segmentation in the presence of intensity inhomogeneities with application to MRI. IEEE Trans.
Image Process. 20(7), 2007–2016 (2011)
Li, L., Luo, S., Tai, X.C., Yang, J.: Convex hull algorithms based on some variational models.
arXiv preprint arXiv:1908.03323 (2019)
Li, H., Liu, J., Cui, L., Huang, H., Tai, X.C.: Volume preserving image segmentation with entropy
regularized optimal transport and its applications in deep learning. J. Vis. Commun. Image Rep.
71, 102845 (2020a)
Li, X., Yang, X., Zeng, T.: A three-stage variational image segmentation framework incor-
porating intensity inhomogeneity information. SIAM J. Imaging Sci. 13(3), 1692–1715
(2020b)
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak,
J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis.
Med. Image Anal. 42, 60–88 (2017)
Liu, J., Tai, X.C., Luo, S.: Convex shape prior for deep neural convolution network based eye
fundus images segmentation. arXiv preprint arXiv:2005.07476 (2020a)
Liu, J., Wang, X., Tai, X.C.: Deep convolutional neural networks with spatial regularization,
volume and star-shape prior for image segmentation. arXiv preprint arXiv:2002.03989
(2020b)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–
3440 (2015)
Lu, J., Wang, G., Pan, Z.: Nonlocal active contour model for texture segmentation. Multimedia
Tools Appl. 76(8), 10991–11001 (2017)
Luo, S., Tai, X.C., Huo, L., Wang, Y., Glowinski, R.: Convex shape prior for multi-object
segmentation using a single level set function. In: Proceedings of the IEEE/CVF International
Conference on Computer Vision, pp. 613–621 (2019)
Ma, J., Chen, J., Ng, M., Huang, R., Li, Y., Li, C., Yang, X., Martel, A.: Loss odyssey in medical
image segmentation. Med. Image Anal. 71, 102035 (2021a)
Ma, J., He, J., Yang, X.: Learning geodesic active contours for embedding object global
information in segmentation CNNs. IEEE Trans. Med. Imaging 40(1), 93–104 (2021b)
Mardia, K., Kent, J., Walder, A.: Statistical shape models in image analysis. In: Proceedings of the
23rd Symposium on the Interface, Seattle, pp. 550–557 (1991)
Marquez-Neila, P., Baumela, L., Alvarez, L.: A morphological approach to curvature-based
evolution of curves and surfaces. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 2–17 (2013)
Martinez-Uso, A., Pla, F., Sotoca, J.M.: A semi-supervised Gaussian mixture model for image
segmentation. In: 2010 20th International Conference on Pattern Recognition, pp. 2941–2944.
IEEE (2010)
Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric
medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV),
pp. 565–571 (2016)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Muñoz, X., Freixenet, J., Cufı, X., Martı, J.: Strategies for image segmentation combining region
and boundary information. Pattern Recogn. Lett. 24(1–3), 375–392 (2003)
Niu, S., Chen, Q., De Sisternes, L., Ji, Z., Zhou, Z., Rubin, D.L.: Robust noise region-based active
contour model via local similarity factor for image segmentation. Pattern Recogn. 61, 104–119
(2017)
Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on
Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
Pluempitiwiriyawej, C., Moura, J.M., Wu, Y.J.L., Ho, C.: Stacs: new active contour scheme for
cardiac MR image segmentation. IEEE Trans. Med. Imaging 24(5), 593–603 (2005)
Pons, S.V., Rodríguez, J.L.G., Pérez, O.L.V.: Active contour algorithm for texture segmentation
using a texture feature set. In: 2008 19th International Conference on Pattern Recognition,
pp. 1–4. IEEE (2008)
Ray, N., Acton, S.T.: Motion gradient vector flow: an external force for tracking rolling leukocytes
with shape and size constrained active contours. IEEE Trans. Med. Imaging 23(12), 1466–1478
(2004)
Reska, D., Boldak, C., Kretowski, M.: A texture-based energy for active contour image segmen-
tation. In: Image Processing & Communications Challenges, vol. 6, pp. 187–194. Springer
(2015)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 234–241 (2015)
Schoenemann, T., Cremers, D.: Introducing curvature into globally optimal image segmentation:
minimum ratio cycles on product graphs. In: 2007 IEEE 11th International Conference on
Computer Vision, pp. 1–6. IEEE (2007)
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE
Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)
Shi, X., Li, C.: Convexity preserving level set for left ventricle segmentation. Magn. Reson.
Imaging 78, 109–118 (2021)
Tai, X.C., Hahn, J., Chung, G.J.: A fast algorithm for Euler’s elastica model using augmented
lagrangian method. SIAM J. Imaging Sci. 4(1), 313–344 (2011)
Tuceryan, M., Jain, A.K.: Texture analysis. In: Chen, CH, Pau, LF, Wang, PSP (eds) The Handbook
of Pattern Recognition and Computer Vision, 2nd Edn., pp. 207–248. World Scientific (1998)
Wu, Q., Gan, Y., Lin, B., Zhang, Q., Chang, H.: An active contour model based on fused texture
features for image segmentation. Neurocomputing 151, 1133–1141 (2015)
Yan, S., Tai, X.C., Liu, J., Huang, H.Y.: Convexity shape prior for level set-based image
segmentation method. IEEE Trans. Image Process. 29, 7141–7152 (2020)
Yezzi Jr, A., Tsai, A., Willsky, A.: A fully global approach to image segmentation via coupled
curve evolution equations. J. Vis. Commun. Image Rep. 13(1–2), 195–216 (2002)
Yu, H., He, F., Pan, Y.: A novel segmentation model for medical images with intensity inhomo-
geneity based on adaptive perturbation. Multimedia Tools Appl. 78(9), 11779–11798 (2019)
Zhang, M., Dong, B., Li, Q.: Deep active contour network for medical image segmentation. In:
International Conference on Medical Image Computing and Computer-Assisted Intervention,
pp. 321–331 (2020)
Zhou, S., Wang, J., Zhang, M., Cai, Q., Gong, Y.: Correntropy-based level set method for medical
image segmentation and bias correction. Neurocomputing 234, 216–229 (2017)
Bidirectional Texture Function Modeling
28
Michal Haindl
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025
Visual Texture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027
Bidirectional Texture Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027
BTF Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1029
Compound Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030
Principal Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031
Principal Single Model Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1032
Non-parametric Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1032
Non-parametric Markov Random Field with Iterative Synthesis . . . . . . . . . . . . . . . . . . . . 1033
Non-parametric Markov Random Field with Fast Iterative Synthesis . . . . . . . . . . . . . . . . 1035
Potts Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037
Potts-Voronoi Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1038
Bernoulli Distribution Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1040
Gaussian Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1041
Local Markov and Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042
3D Causal Simultaneous Autoregressive Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042
3D Moving Average Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1046
Spatial 3D Gaussian Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049
Texture Synthesis and Enlargement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1050
Texture Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053
Texture Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053
Illumination Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053
(Un)supervised Image Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1054
Multispectral/Multi-channel Image Restoration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1056
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058
M. Haindl ()
Institute of Information Theory and Automation, Czech Academy of Sciences, Prague, Czechia
e-mail: [email protected]
Abstract
Keywords
Introduction
computing time and memory constraints. Furthermore, for example, a car interior
usually has about 20 different materials to synthesize.
Principal component analysis (PCA)-based BTF approximation (Müller et al. 2003; Sattler et al. 2003; Ruiters et al. 2013) allows lossy BTF compression but not enlargement. Furthermore, projecting the measured data onto a linear space constructed by statistical analysis such as PCA results in low-quality data compression. Other compression methods are based on K-clustered tensor approximation (Tsai and Shih 2012) or the polynomial wavelet tree (Baril et al. 2008).
BTF data can be approximated using separate texel models, i.e., spatially varying
bidirectional reflectance distribution function (SVBRDF) models that combine
texture mapping and BRDF models but sacrifice some spatial dependency infor-
mation. A linear combination of multivariate spherical radial basis functions is
used to model BTF as a set of texelwise BRDFs (SVBRDF) in Tsai et al. (2011).
Another SVBRDF method (Wu et al. 2011) uses a parametric mixture model with
a basis analytical BRDF function for texel modeling. Several SVBRDF models
use multilayer perceptron neural networks (Aittala et al. 2016; Deschaintre et al.
2018; Rainer et al. 2020). A deep convolutional neural network VGG-19 is used
in Aittala et al. (2016), while the convolutional neural network recovers SVBRDF
from estimated normal, diffuse albedo, specular albedo, and specular roughness
from a single image lit by a handheld flash in Deschaintre et al. (2018). A learned
SVBRDF decoder in a multilayer perceptron neural model approximates BRDF
values in Rainer et al. (2020). The SVBRDF methods only approximate the full BTF appearance, are computationally expensive due to the nonlinear optimization, allow only a moderate compression ratio, require several manually tuned parameters, and do not allow BTF space enlargement.
Mathematical multidimensional data models are useful for describing many multidimensional data types, provided that we can assume some data homogeneity, i.e., that some data characteristics are translation invariant. While 1D models such as time series (Anderson 1971; Broemeling 1985) are relatively well researched and have a rich application history in control theory, econometrics, medicine, meteorology, and many other data mining or machine learning applications, multidimensional models are much less known (e.g., more than three-dimensional MRFs), and their applications are still limited. The reason lies not only in unsolved theoretical difficulties but mainly in their vast computing power demands, which prevented their more extensive use until recently.
Visual data models need nonstandard multidimensional models (three-dimensional for static color textures, four-dimensional for videos, or even seven-dimensional for static BTFs). However, if such an nD data space can be factorized, then these data can also be approximated using a set of lower-dimensional probabilistic models. Although full visual nD models allow unrestricted spatial-spectral-temporal-angular correlation modeling, their main drawback is the large number of parameters to be estimated, which requires a correspondingly large learning set. In some models (e.g., Markov models), all these parameters must be estimated simultaneously.
We introduced several efficient, fast multiresolution Markov random field (MRF)-based models which exploit BTF space factorization (Haindl and Havlíček 1998, 2000, 2010, 2016, 2017b, 2018a,b; Haindl et al. 2012, 2015b). Our methods avoid the time-consuming Markov chain Monte Carlo (MCMC) simulation so typical for Markov model applications, with the single exception of the Potts MRF. Our models avoid some problems of alternative options (see Haindl 1991 for details), they are easy to analyze as well as to synthesize, and, last but not least, they are still flexible enough to correctly imitate a broad set of natural and artificial textures or other spatial data.
We can categorize the model’s applications into synthesis and analysis. Analyt-
ical applications include static or dynamic data un-/semi-/supervised recognition,
scene understanding, data space analysis, motion detection, and numerous others.
Typical synthesis applications are missing data reconstruction, restoration, image
compression, and static or dynamic texture synthesis.
Visual Texture
The visual texture notion is closely tied to the human semantic meaning of surface
material appearance, and texture analysis is an essential and frequently published
area of image processing. However, there is still no mathematically rigorous
definition of the texture that would be accepted throughout the computer vision
community.
We understand a textured image or the visual texture (Haindl and Filip 2013)
to be a realization of a random field, and our effort is to find its parameterizations
in such a way that the real texture representing the specific material appearance
measurements will be visually indiscernible from the corresponding random field’s
realization, whatever the observation conditions might be. Some work distinguishes
between texture and color. We regard such separation between spatial structure
and spectral information to be artificial and principally wrong because there is no
bijective mapping between gray scale and multispectral textures. Thus, our random
field model is always multispectral.
than the traditional three-dimensional static color texture representation. BTF can
model complex lighting effects such as self-shadows, masking, foreshortening,
interreflections, and multiple subsurface light scattering due to material surface
microgeometry.
The seven-dimensional bidirectional texture function (BTF) reflectance model (Fig. 1) is the best recent visual texture representation that can still be simultaneously measured and modeled using state-of-the-art measurement devices and computers as well as the most advanced mathematical models of visual data. Thus, it is the most important representation for high-end and physically correct surface material appearance modeling. Nevertheless, BTF requires the most advanced modeling as well as high-end hardware support. The BTF reflectance model is a seven-dimensional function of the planar texel position, the spectral band, and the azimuthal and elevation illumination and viewing angles (φi, θi, φv, θv).
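For intuition, a measured BTF can be viewed as a large table indexed by discretized illumination and viewing directions and by the texel position within the measured image; the sketch below uses made-up resolutions to show such an indexing scheme and the raw storage it implies (all numbers and array contents are illustrative, not real measurements).

```python
import numpy as np

# Hypothetical discretization: 81 illumination x 81 viewing directions,
# each observed as an 800 x 800 RGB texture image (illustrative numbers only).
n_illum, n_view, h, w, c = 81, 81, 800, 800, 3
full_size_gb = n_illum * n_view * h * w * c * 4 / 1e9   # float32 storage
print(f"uncompressed BTF: ~{full_size_gb:.0f} GB")       # why compression and modeling matter

# A tiny toy BTF tensor with random values, indexed the same way.
rng = np.random.default_rng(0)
toy_btf = rng.random((5, 5, 16, 16, 3), dtype=np.float32)

def btf_sample(btf, illum_idx, view_idx, row, col):
    """RGB reflectance of one texel under one illumination/viewing direction pair."""
    return btf[illum_idx, view_idx, row, col]

print(btf_sample(toy_btf, illum_idx=2, view_idx=4, row=8, col=8))
```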
BTF Measurement
Accurate and reliable BTF acquisition is not a trivial task; only a few BTF measure-
ment systems currently exist (for details see Haindl and Filip 2013; Schwartz et al.
2014; Dana et al. 1997; Koudelka et al. 2003; Sattler et al. 2003; Han and Perlin
2003; Müller et al. 2004; Wang and Dana 2006; Ngan and Durand 2006; Debevec
et al. 2000; Marschner et al. 2005; Holroyd et al. 2010; Ren et al. 2011; Aittala
et al. 2013, 2015). However, their number increases every year in response to the
growing demand for photorealistic virtual representations of real-world materials.
These systems are (similar to bidirectional reflectance distribution function (BRDF)
measurement systems) based on the light source, video/still camera, and material
sample. The main difference between individual BTF measurement systems lies in the type of measurement setup allowing four degrees of freedom for the camera/light, the type of measurement sensor (CCD, video, and others), and the type of light source.
In some systems, the camera is moving, and the light is fixed (Dana et al. 1997;
Sattler et al. 2003; Neubeck et al. 2005), while in others, e.g., Koudelka et al. (2003),
it is just the opposite. There are also systems where both camera and light source
remain fixed (Han and Perlin 2003; Müller et al. 2004).
The UTIA gonioreflectometer setup (Fig. 2) consists of independently controlled arms with a camera and a light. Its parameters, such as an angular precision of 0.03 degrees, a spatial resolution of 1000 DPI, and selective spatial measurement, place this setup among the state-of-the-art BTF measurement devices.
differ only in their contextual support sets ^iI_r and the corresponding parameter sets ^iθ (the set of all i-th local random field parameters). The same type of sub-model is assumed only for simplicity; this assumption can be dropped without any problems if needed. The BTF-CMRF model has the posterior probability
P(X, Y | Ỹ) = P(Y | X, Ỹ) P(X | Ỹ)  (2)
where ΩX , ΩY are the corresponding configuration spaces for both random fields
(X, Y ). To avoid an iterative MCMC MAP solution for parameter estimation, we
proposed the following two-step approximation X̆, Y̆ (Haindl and Havlíček 2010):
The principal part (X) of the BTF compound Markov models (BTF-CMRF) is assumed to be independent of the illumination and observation angles, i.e., it is identical for all possible combinations of the azimuthal and elevation illumination/viewing angles φi, θi, φv, θv. This assumption does not compromise the resulting BTF space quality because it influences only the material texture macrostructure, which is independent of these angles for static BTF textures.
The principal random field X̆ is estimated using simple K-means clustering of Ỹ in the RGB color space into a predefined number of K classes, where the cluster indices are the estimates X̆r ∀r ∈ I. For simplicity, we use the RGB color space, but any other color space can be used as well. The number of classes K can be estimated using the Kullback-Leibler divergence while keeping a sufficient amount of data to estimate all local Markovian models reliably. If the BTF texture contains subparts with distinct texture but similar colors, a more sophisticated texture segmenter (e.g., Haindl and Mikeš 2007; Haindl et al. 2009a,b, 2015a) can be used.
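A minimal sketch of this estimation step, assuming a NumPy image array and scikit-learn’s K-means (the function name and the value of K are illustrative):

import numpy as np
from sklearn.cluster import KMeans

def estimate_principal_field(Y, K=8):
    # K-means clustering of the RGB pixels of Y (H x W x 3); the cluster
    # indices serve as the principal-field estimate at every location r.
    H, W, _ = Y.shape
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(Y.reshape(-1, 3))
    return labels.reshape(H, W)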
The simplest principal model is a constant field that contains only one model, BTF-CMRF_{c...}: P(X | Ỹ) = const., i.e., P(Xr | Ỹ) = P(Xs | Ỹ) ∀r, s. Then there is no need to use the MAP approximation (3), (4); the compound Markov model simplifies into a single random field BTF-MRF model, which can be any of the following local MRF models.
If we do not assume any specific parametric model of the principal control field, but rather seamlessly and directly enlarge its realization from measured data (Fig. 3), we get several non-parametric principal control field approaches. The non-parametric principal field BTF-CMRF_{NProl...} (NProl... – a non-parametric roller-based principal field with any local random fields denoted as ...; see Figs. 3, 4, 16) can be modeled using the roller method (Haindl and Havlíček 2010) for optimal X̆ compression and speedy enlargement to any required field size. The principle of the roller method (Haindl and Hatka 2005a,b) is overlapping tiling followed by a minimum error boundary cut, as sketched below. One or several optimal double toroidal data patches are seamlessly and randomly repeated during the synthesis step. This fully automatic method starts with minimal tile size detection, which is limited by the size of the principal field, the number of toroidal tiles we are looking for, and the sample’s spatial frequency content.
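The minimum error boundary cut used by such overlapping tiling can be sketched as a simple dynamic program over the overlap error map (the function name, the vertical-overlap assumption, and the NumPy formulation are illustrative, not the roller implementation):

import numpy as np

def min_error_boundary_cut(err):
    # err: H x W map of squared differences between the two overlapping strips.
    # Returns, for every row, the column through which the minimum-error seam passes.
    H, W = err.shape
    cost = err.astype(float).copy()
    back = np.zeros((H, W), dtype=int)
    for i in range(1, H):
        for j in range(W):
            lo, hi = max(j - 1, 0), min(j + 2, W)
            k = lo + int(np.argmin(cost[i - 1, lo:hi]))
            back[i, j] = k
            cost[i, j] = err[i, j] + cost[i - 1, k]
    seam = np.empty(H, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(H - 2, -1, -1):
        seam[i] = back[i + 1, seam[i + 1]]
    return seam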
Fig. 3 Measured brick principal field (upper left), its optimal double toroidal patch (bottom left),
and enlarged synthetic principal field (right, K = 8)
Fig. 4 Synthetic BTF-CMRF_{NProl3DCAR} enlarged color bark (right) estimated from its natural measurements (left)
Fig. 5 The granite (Fig. 6) principal field synthesis. The target texture principal field, initializa-
tion, and selected iteration steps rightwards
Fig. 6 The granite measurement and its synthetic enlargement (BTF-CMRF_{NPi3AR})
1. Pixels r and s are randomly selected with the following properties: The pixel r
from the class ωi is on the border between region ↓ ωiA (a region A which can
be decreased) and region ↑ ωjB (a region B which can be increased). The pixel s
from the class ωj is on the border between region ↓ ωjC (a region C which can be
decreased) and region ↑ ωiD (a region D which can be increased). These regions
have to be distinct, i.e., A ∩ D = ∅ and B ∩ C = ∅. If such pixels r, s exist, go to step 5. If not, repeat this step once more.
The non-parametric principal field (Haindl and Havlíček 2018a) BTF-CMRF_{NPfi...} is estimated as in the previous section, and its synthesis is modified to be significantly faster at the cost of slightly compromised principal field variability. The compromise of the fast algorithm is its preference for convex regions instead of regions of general shape, but it benefits from faster convergence. The fast method needs, in the median, only one-fifth of the cycles required by the non-parametric principal field synthesis in section “Non-parametric Markov Random Field with Iterative Synthesis” to converge. Some textures (e.g., granite; Fig. 7) have sufficiently similar statistics of the synthesized regions to the principal target field already in the initialization step; hence, the principal field synthesis does not need any iterations at all. The lichen (Fig. 8) principal target field (512 × 512) requires 29 137 iterations, while the previous iterative method needs nearly 5 times more (140 146) iterations to converge.
Fig. 7 The granite principal field synthesis. The target texture principal field, initialization, and a similar 10⁴-th iteration step result
Fig. 8 The lichen measurement and its synthetic enlargement (BTF-CMRF_{NPfi3DCAR})
all classes have their correct required number of pixels but not their correct
region size histograms.
1.–7. Identical with the corresponding items in section “Iterative Principal Field
Synthesis”.
where Z is the appropriate normalizing constant and δ() is the Kronecker delta function. The rough-scale upper-level Potts model (a = 1) regions are further elaborated with the detailed fine-scale-level (a = 2) Potts model, which models the corresponding subregions in each upper-level region. The parameter β^(a) for both level models is estimated using an iterative estimator which starts from the upper β limit (β_max) and adjusts (decreases or increases) its value until the Potts model regions have parameters (average inscribed squared region size and/or the region’s perimeter) similar to those of the target texture switching field. This iterative estimator gives results that resemble the target texture more closely than the alternative maximum pseudo-likelihood method (Levada et al. 2008). The corresponding Potts models are synthesized (Fig. 9 – middle) using the fast Swendsen-Wang sampling method (Swendsen and Wang 1987).
Fig. 9 The rusty plate texture measurement, its principal synthetic field, and the final synthetic CMRF_{P3AR} model texture
The principal field (X) of the compound BTF-CMRF_{PV...} model (Haindl et al. 2015b)
is a mosaic represented as a Voronoi diagram (Aurenhammer 1991), and the
distribution of the particular colors (texture classes) of the mosaic is modeled as
a Potts random field which is built on top of the adjacency graph (G) of the mosaic.
Figure 10 illustrates this model applied to the floor mosaic, while Fig. 11 shows this
model applied to a glass mosaic synthesis in St. Vitus Cathedral in Prague Castle.
The algorithm requires input in the form of a segmented mosaic with distinguishable
regions of the same texture type.
After that follows the identification of the mosaic field centers and the estimation
of the parameters of the 2D discrete point process, which samples the control
points of the newly synthesized Voronoi mosaic. This sampling is done using a 2D
histogram, which has proven to be sufficient for a good-quality estimate. The only
other parameter is the number of points to be sampled, which grows linearly
with the required area of the synthetic image in the case of texture enlargement
applications.
With the control points for the Voronoi mosaic cells having been sampled, we compute the Voronoi diagram and optionally mark the delimiting edges between adjacent cells. The assignment of a regional texture model to each mosaic cell (the principal MRF P(X | Ỹ)) is then modeled by the flexible K-state Potts random field (Potts and Domb 1952; Wu 1982).
Let us denote by G = (V, E) the adjacency graph of the mosaic areas with the 1st-order neighborhood, where V, E are the vertex and edge sets. Vertexes correspond to the particular areas in the mosaic, and there is an edge between two vertexes if their corresponding areas are directly next to each other.
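With SciPy, both the Voronoi diagram of the sampled control points and the cell adjacency graph can be obtained directly (a sketch; the function name is illustrative):

import numpy as np
from scipy.spatial import Voronoi

def mosaic_adjacency(points):
    # Vertices are the mosaic cell indices; an edge joins two cells whose
    # Voronoi regions share a ridge (i.e., are directly next to each other).
    vor = Voronoi(np.asarray(points))
    V = list(range(len(points)))
    E = {tuple(sorted(pair)) for pair in vor.ridge_points.tolist()}
    return V, E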
The resulting thematic principal map X̆ is represented by the Potts model for a
general graph
Fig. 10 The floor mosaic measurement and its synthesis (BTF-CMRF_{PV3DCAR})
Fig. 11 An example of a St. Vitus Cathedral in Prague Castle stained glass window with two original panels (yellow arrows) replaced with synthetic images (BTF-CMRF_{PV3DCAR})
\[
p(\breve{X} \mid \beta) = \frac{1}{Z} \exp\Big\{ -\beta \sum_{u \in V,\, v \in N_u} \delta(X_u, X_v) \Big\} \qquad (7)
\]
where Z is the appropriate normalizing constant and δ() is the Kronecker delta
function. The parameter β is estimated from the K-means clustered input mosaic
using the maximum pseudo-likelihood method described by Levada et al. (2008).
The local density of the Potts field can be expressed as
\[
p(X_u \mid X_{N_u}, \beta) = \frac{\exp\big\{\beta \sum_{v \in N_u} \delta(X_u, X_v)\big\}}{\sum_{k=1}^{K} \exp\big\{\beta \sum_{v \in N_u} \delta(k, X_v)\big\}}.
\]
Calculating the logarithm, differentiating, and setting the result equal to 0, we get the maximum pseudo-likelihood equation (10) for the β estimate:
\[
\Psi(\beta) = -\sum_{u \in V} \frac{\sum_{k=1}^{K} \big(\sum_{v \in N_u} \delta(k, X_v)\big) \exp\big\{\beta \sum_{v \in N_u} \delta(k, X_v)\big\}}{\sum_{k=1}^{K} \exp\big\{\beta \sum_{v \in N_u} \delta(k, X_v)\big\}}
+ \sum_{u \in V} \sum_{v \in N_u} \delta(X_u, X_v) = 0. \qquad (10)
\]
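A numerical sketch of this maximum pseudo-likelihood estimate on a 4-connected integer label image (the toroidal border handling, the bracketing interval, and the function name are illustrative assumptions):

import numpy as np
from scipy.optimize import brentq

def estimate_potts_beta(labels, K, beta_max=5.0):
    # labels: integer (H, W) image with values in {0, ..., K-1}
    H, W = labels.shape
    counts = np.zeros((H, W, K))
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        shifted = np.roll(labels, (dy, dx), axis=(0, 1))   # toroidal 4-neighbours
        for k in range(K):
            counts[..., k] += (shifted == k)
    # number of neighbours sharing each pixel's own label, summed over the lattice
    same = np.take_along_axis(counts, labels[..., None], axis=2)[..., 0].sum()

    def psi(beta):
        w = np.exp(beta * counts)
        expected = (counts * w).sum(axis=-1) / w.sum(axis=-1)
        return same - expected.sum()

    # root of the pseudo-likelihood equation; widen beta_max if no sign change occurs
    return brentq(psi, 1e-6, beta_max)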
The corresponding Potts models are synthesized using the fast Swendsen-Wang sampling method (Swendsen and Wang 1987), although for smaller fields, which the mosaics undoubtedly are, other MCMC sampling methods such as the Gibbs sampler (Geman and Geman 1984) can be used. Alternatively, the Metropolis algorithm (Metropolis et al. 1953) also works sufficiently fast.
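For small fields, the Gibbs-sampler alternative mentioned above can be sketched in a few lines (the toroidal borders and the function name are illustrative assumptions):

import numpy as np

def potts_gibbs_sweep(labels, K, beta, rng=None):
    # One Gibbs-sampling sweep over a 4-connected Potts field; labels is an
    # integer (H, W) image that is updated in place and returned.
    rng = np.random.default_rng() if rng is None else rng
    H, W = labels.shape
    for y in range(H):
        for x in range(W):
            n = np.zeros(K)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n[labels[(y + dy) % H, (x + dx) % W]] += 1   # neighbour colour counts
            p = np.exp(beta * n)
            labels[y, x] = rng.choice(K, p=p / p.sum())
    return labels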
where M is the set of all mixture components, m is a mixture component index, {r} is a set of indices from I_r, and the principal field BTF-CMRF_{BM...} is further decomposed into separate binary bit planes of binary variables ξ ∈ B, B = {0, 1}, which are separately modeled and can be learned from a much smaller training texture than a multi-level discrete mixture model (see examples in Fig. 14). We suppose that a bit factor of a principal field can be fully characterized by a marginal probability distribution of binary levels on pixels within the scope of a window centered around the location r and specified by the index set I_r ⊂ I, i.e., X{r} ∈ B^η and P(X{r}) is the corresponding marginal distribution of P(X | Ỹ). The component distributions P(· | m) are factorizable multivariable Bernoulli distributions:
\[
P(X_{\{r\}} \mid m) = \prod_{s \in I_r} \dot{\theta}_{m,s}^{\,X_s} \big(1 - \dot{\theta}_{m,s}\big)^{1 - X_s}, \qquad X_s \in X_{\{r\}}. \qquad (12)
\]
The mixture model parameters (11), (12) include component weights p(m) and the
univariate discrete distributions of binary levels. They are defined by one parameter
θ̇m,s as a vector of probabilities:
\[
p^{(t+1)}(m) = \frac{1}{|S|} \sum_{X_{\{r\}} \in S} q^{(t)}(m \mid X_{\{r\}}), \qquad (15)
\]
and
\[
p_s^{(t+1)}(\xi \mid m) = \frac{1}{|S|\, p^{(t+1)}(m)} \sum_{X_{\{r\}} \in S} \delta(\xi, X_s)\, q^{(t)}(m \mid X_{\{r\}}), \qquad \xi \in B. \qquad (16)
\]
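A compact EM sketch implementing the updates (15)–(16) for one bit plane (the random initialization and the function name are illustrative):

import numpy as np

def bernoulli_mixture_em(S, M, iters=50, rng=None):
    # S: |S| x eta binary matrix of training window vectors X_{r}; M: component count.
    rng = np.random.default_rng() if rng is None else rng
    N, eta = S.shape
    p = np.full(M, 1.0 / M)                       # component weights p(m)
    theta = rng.uniform(0.25, 0.75, (M, eta))     # Bernoulli parameters theta_{m,s}
    for _ in range(iters):
        # E-step: posterior q(m | X_{r}) from the factorized Bernoulli likelihoods
        logq = S @ np.log(theta).T + (1 - S) @ np.log(1 - theta).T + np.log(p)
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
        # M-step: Eqs. (15) and (16)
        p = q.mean(axis=0)
        theta = (q.T @ S) / (N * p)[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)    # keep the logarithms finite
    return p, theta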
The discrete principal field can alternatively be modeled (Haindl and Havlíček 2017b) by a continuous RF BTF-CMRF_{GM...} if we map single indices into
continuous random variables with uniformly separated mean values and small
variance. The synthesis results are subsequently inversely mapped back into a
corresponding synthetic discrete principal field. We assume the joint probability
distribution P (X{r} ), X{r} ∈ K η in the form of a normal mixture, and the mixture
components are defined as products of univariate Gaussian densities
i. e., the components are multivariate Gaussian densities with diagonal covariance
matrices. The maximum-likelihood estimates of the parameters p(m), μms , σms can
be computed by the expectation-maximization (EM) algorithm (Dempster et al.
1977; Grim and Haindl 2003). Again, we use a data set S obtained by pixel-wise shifting of the observation window within the original texture image, S = {X{r}^{(1)}, . . . , X{r}^{(K)}}, X{r}^{(k)} ⊂ X. The corresponding log-likelihood function is maximized by the EM algorithm (m ∈ M, n ∈ N, X{r} ∈ S), and the iterations are (14), (15), and
\[
\mu_{m,n}^{(t+1)} = \frac{1}{\sum_{X_{\{r\}} \in S} q^{(t)}(m \mid X_{\{r\}})} \sum_{X_{\{r\}} \in S} X_n\, q^{(t)}(m \mid X_{\{r\}}), \qquad (18)
\]
\[
\big(\sigma_{m,n}^{(t+1)}\big)^2 = -\big(\mu_{m,n}^{(t+1)}\big)^2 + \frac{\sum_{X_{\{r\}} \in S} X_n^2\, q^{(t)}(m \mid X_{\{r\}})}{\sum_{X_{\{r\}} \in S} q^{(t)}(m \mid X_{\{r\}})}. \qquad (19)
\]
While the principal models control the overall large-scale, low-frequency textural structure, the local models synthesize the detailed, regional, fine-granularity spatial-spectral BTF information. Once we have synthesized the principal random field of the required size, using one of the previously described models, we use it to synthesize the local random part (3) of the BTF compound random model Y. This local model is a mosaic of K random field sub-models. These sub-models are assumed to be of the same type, but they differ in parameters and contextual support sets. This assumption is for simplicity only and is not restrictive because every sub-model is estimated and synthesized independently; thus, the Y mosaic can easily be composed of different types of random field models.
Local i-th texture region (not necessarily contiguous) models are view and illumination dependent; thus, they ideally need to be represented by models which can be analytically estimated as well as easily non-iteratively synthesized (BTF-CMRF_{NProl3DCAR} (Haindl and Havlíček 2010), BTF-CMRF_{2P3DCAR} (Haindl et al. 2012), BTF-CMRF_{PV3DCAR} (Haindl et al. 2015b), BTF-CMRF_{c3DGM} (Haindl and Havlíček 2016), BTF-CMRF_{BM3DCAR} (Haindl and Havlíček 2017b), BTF-CMRF_{GM3DCAR}, BTF-CMRF_{NProl3DMA} (Haindl and Havlíček 2017a), BTF-CMRF_{NPi3DCAR} (Haindl and Havlíček 2018b), BTF-CMRF_{NPfi3DCAR} (Haindl and Havlíček 2018a)).
where A_s are parameter matrices (21) and the zero-mean white Gaussian noise vector e_r is uncorrelated with the data indexed from I_r^c, but its components can be mutually correlated with a constant covariance matrix Σ.
\[
A_{s_1, s_2} =
\begin{pmatrix}
a_{1,1}^{s_1,s_2} & \cdots & a_{1,d}^{s_1,s_2} \\
\vdots & \ddots & \vdots \\
a_{d,1}^{s_1,s_2} & \cdots & a_{d,d}^{s_1,s_2}
\end{pmatrix} \qquad (21)
\]
which are d × d parameter matrices. The model can be expressed in the matrix form
Yr = γ Zr + er , (22)
where
\[
Z_r = [\tilde{Y}_{r-s}^T : \forall s \in I_r^c], \qquad (23)
\]
\[
\gamma = [A_1, \ldots, A_\eta]. \qquad (24)
\]
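To make the matrix form concrete, here is a least-squares sketch of estimating γ from a causal neighbourhood (the function name, the offset convention, and the border handling are illustrative; LS is only one of the Bayes/ML/LS options discussed later in this section):

import numpy as np

def estimate_car_gamma(Y, offsets):
    # Y: (H, W, d) multispectral image; offsets: causal neighbourhood I_r^c given as
    # (dy, dx) pairs with dy >= 0 (and dx > 0 when dy == 0).
    H, W, d = Y.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    Z_rows, targets = [], []
    for y in range(pad, H):
        for x in range(pad, W - pad):
            Z_rows.append(np.concatenate([Y[y - dy, x - dx] for dy, dx in offsets]))
            targets.append(Y[y, x])
    Z = np.asarray(Z_rows)                      # data vectors Z_r, shape (N, d*eta)
    T = np.asarray(targets)                     # pixels Y_r, shape (N, d)
    G, *_ = np.linalg.lstsq(Z, T, rcond=None)   # solves Z @ G ~= T
    return G.T                                  # gamma of shape (d, d*eta): Y_r ~= gamma @ Z_r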
i.e., max_j {p(M_j | Y^{(r−1)})}. The simultaneous conditional density can be evaluated analytically from
\[
p(Y^{(r-1)} \mid M_j) = \int p(Y^{(r-1)} \mid \gamma, \Sigma^{-1})\, p(\gamma, \Sigma^{-1} \mid M_j)\, d\gamma\, d\Sigma^{-1}, \qquad (25)
\]
and, for the implemented uniform priors, we get a decision rule (Haindl and Šimberová 1992):
The most probable AR model given past data Y (r−1) , the normal-Wishart
parameter prior and the uniform model prior is the model Mi (Haindl 1983) for
which
\[
i = \arg\max_j \{D_j\},
\]
\[
D_j = -\frac{d}{2} \ln |V_{z(r-1)}| - \frac{\beta(r) - d\eta + d + 1}{2} \ln |\lambda_{(r-1)}| + \frac{d^2\eta}{2} \ln \pi
+ \sum_{i=1}^{d} \left[ \ln \Gamma\!\left(\frac{\beta(r) - d\eta + d + 2 - i}{2}\right) - \ln \Gamma\!\left(\frac{\beta(0) - d\eta + d + 2 - i}{2}\right) \right] \qquad (26)
\]
where V_{z(r−1)} = Ṽ_{z(r−1)} + V_{z(0)} with Ṽ_{z(r−1)} defined in (31), V_{z(0)} is an appropriate part of V_0 (31), β(r) is defined in (27), (28), and λ_{(r−1)} is given by (29).
The statistic (26) uses the notation introduced in (27), (28), (29), (30), and (31); in particular,
\[
\lambda_{(r)} = V_{y(r)} - V_{zy(r)}^T V_{z(r)}^{-1} V_{zy(r)}. \qquad (29)
\]
The marginal densities p(γ | Y^{(r−1)}) and p(Σ^{−1} | Y^{(r−1)}) can be evaluated from (32) and (33), respectively:
\[
p(\gamma \mid Y^{(r-1)}) = \int p(\gamma, \Sigma^{-1} \mid Y^{(r-1)})\, d\Sigma^{-1}, \qquad (32)
\]
\[
p(\Sigma^{-1} \mid Y^{(r-1)}) = \int p(\gamma, \Sigma^{-1} \mid Y^{(r-1)})\, d\gamma. \qquad (33)
\]
The marginal density p(Σ^{−1} | Y^{(r−1)}) is the Wishart distribution density (Haindl 1983)
\[
p(\Sigma^{-1} \mid Y^{(r-1)}) = \frac{\pi^{\frac{d(1-d)}{4}}\, |\Sigma^{-1}|^{\frac{\beta(r)-d\eta}{2}}\, |\lambda_{(r-1)}|^{\frac{\beta(r)-d\eta+d+1}{2}}}{2^{\frac{d(\beta(r)-d\eta+d+1)}{2}} \prod_{i=1}^{d} \Gamma\!\left(\frac{\beta(r)-d\eta+2+d-i}{2}\right)}\,
\exp\Big\{-\frac{1}{2}\,\mathrm{tr}\{\Sigma^{-1}\lambda_{(r-1)}\}\Big\} \qquad (34)
\]
with
\[
\frac{2(\beta(r) - d\eta + 1)}{\lambda_{(r-1)} \lambda_{(r-1)}^T}. \qquad (36)
\]
The marginal density p(γ | Y^{(r−1)}) is the matrix t distribution density (Haindl 1983)
\[
p(\gamma \mid Y^{(r-1)}) = \frac{\prod_{i=1}^{d} \Gamma\!\left(\frac{\beta(r)+d+2-i}{2}\right)}{\prod_{i=1}^{d} \Gamma\!\left(\frac{\beta(r)-d\eta+d+2-i}{2}\right)}\,
\pi^{-\frac{d^2\eta}{2}}\, |\lambda_{(r-1)}|^{-\frac{d\eta}{2}}\, |V_{z(r-1)}|^{\frac{d}{2}}
\left| I + \lambda_{(r-1)}^{-1} (\gamma - \hat{\gamma}_{r-1})\, V_{z(r-1)}\, (\gamma - \hat{\gamma}_{r-1})^T \right|^{-\frac{\beta(r)+d+1}{2}} \qquad (37)
\]
Similar statistics can easily be derived (Haindl 1983) for the alternative Jeffreys non-informative parameter prior. Like the other model statistics, the predictive density can also be derived analytically. The one-step-ahead predictive posterior density for the normal-Wishart parameter prior has the form of a d-dimensional Student's probability density (40) (Haindl 1983):
\[
p(Y_r \mid Y^{(r-1)}) = \frac{\Gamma\!\left(\frac{\beta(r)-d\eta+d+2}{2}\right)}{\Gamma\!\left(\frac{\beta(r)-d\eta+2}{2}\right)\, \pi^{\frac{d}{2}}\, \big(1 + Z_r^T V_{z(r-1)}^{-1} Z_r\big)^{\frac{d}{2}}\, |\lambda_{(r-1)}|^{\frac{1}{2}}}
\left( 1 + \frac{(Y_r - \hat{\gamma}_{r-1} Z_r)^T \lambda_{(r-1)}^{-1} (Y_r - \hat{\gamma}_{r-1} Z_r)}{1 + Z_r^T V_{z(r-1)}^{-1} Z_r} \right)^{-\frac{\beta(r)-d\eta+d+2}{2}} \qquad (40)
\]
with β(r) − dη + 2 degrees of freedom; if β(r) > dη, then the conditional mean value is
\[
E\{Y_r \mid Y^{(r-1)}\} = \hat{\gamma}_{r-1} Z_r \qquad (41)
\]
and
\[
E\big\{ (Y_r - \hat{\gamma}_{r-1} Z_r)(Y_r - \hat{\gamma}_{r-1} Z_r)^T \mid Y^{(r-1)} \big\} = \frac{1 + Z_r^T V_{z(r-1)}^{-1} Z_r}{\beta(r) - d\eta}\, \lambda_{(r-1)}. \qquad (42)
\]
The 3DCAR model can be made adaptive if we modify its recursive statistics using an exponential forgetting factor, i.e., a constant ϕ ≈ 0.99. This forgetting factor, smaller than 1, is used to weigh down the influence of older data. The numerical stability of 3DCAR can be guaranteed if all its recursive statistics use square-root factor updating based on either the Cholesky or the LDL^T decomposition (Haindl 2000).
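A minimal sketch of such an exponentially forgetting recursive statistic (the stacked-vector layout is an illustrative convention, not the chapter's exact bookkeeping):

import numpy as np

def forgetting_update(V, Y_r, Z_r, phi=0.99):
    # Exponentially weighted update of a running outer-product statistic;
    # older data are down-weighted by the constant forgetting factor phi < 1.
    x = np.concatenate([Y_r, Z_r])
    return phi * V + np.outer(x, x)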
The 3DCAR model (and analogously the 2DCAR model) has the advantage of analytical solutions (Bayes, ML, or LS estimates) for the I_r, γ̂, σ̂², Ŷ_r statistics. It allows straightforward, fast synthesis, adaptivity, and the construction of efficient recursive application algorithms.
Single multispectral texture factors Y are modeled using the extended version
(3D MA) of the moving average model (Li et al. 1992; Haindl and Havlíček 2017a).
A stochastic multispectral texture can be considered to be a sample from a 3D
random field defined on an infinite 2D lattice. The model assumes that each factor
is the output of an underlying system, which completely characterizes it in response
to a 3D uncorrelated random input. This system can be represented by the impulse
response of a linear 3D filter. The intensity values of the most significant pixels,
together with their neighbors, are collected and averaged. The resultant 3D kernel is
used as an estimate of the impulse response of the underlying system. A synthetic
mono-spectral factor can be generated by convolving an uncorrelated 3D random
field with this estimate. Suppose a stochastic multispectral texture denoted by Y is
the response of an underlying linear system that completely characterizes the texture
in response to a 3D uncorrelated random input Er ; then, Yr is determined by the
difference equation
\[
Y_r = \sum_{s \in I_r} B_s E_{r-s} \qquad (43)
\]
\[
E\{E_r E_s\} = 0 \quad \forall\, r_1 \neq s_1 \lor r_2 \neq s_2, \qquad
E\{E_{r_1, r_2, r_3} E_{r_1, r_2, \bar{r}_3}\} = 0 \quad \forall\, r_3 \neq \bar{r}_3.
\]
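A sketch of the synthesis step for one mono-spectral factor, assuming the impulse-response kernel has already been estimated (the function name and the SciPy convolution are illustrative):

import numpy as np
from scipy.signal import fftconvolve

def synthesize_ma_factor(kernel, height, width, rng=None):
    # Generate a mono-spectral MA factor by convolving an uncorrelated Gaussian
    # random field with the estimated impulse response (cf. Eq. (43)).
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((height, width))
    return fftconvolve(noise, kernel, mode="same")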
The index set I_r depends on the modeled visual data and can have any shape other than this rectangular one. Y{r} denotes the corresponding matrix containing all d × 1 vectors Y_s in some fixed order such that s ∈ I_r, Y{r} = [Y_s ∀ s ∈ I_r], Y{r} ⊂ Y, η = cardinality{I_r}, and P(Y{r}) is the corresponding marginal distribution of P(Y).
If we assume the joint probability distribution P(Y{r}) in the form of a normal mixture (Haindl and Havlíček 2016)
\[
P(Y_{\{r\}}) = \sum_{m \in M} p(m)\, P(Y_{\{r\}} \mid \mu_m, \Sigma_m)
             = \sum_{m \in M} p(m) \prod_{s \in I_r} p_s(Y_s \mid \mu_{m,s}, \Sigma_{m,s}), \qquad Y_{\{r\}} \subset Y, \qquad (45)
\]
i. e., the components are multivariate Gaussian densities with covariance matrices
(53).
The underlying structural model of conditional independence is estimated from
a data set S obtained by the step-wise shifting of the contextual window Ir within
the original textural image, i. e., for each location r one realization of Y{r} .
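The construction of the training set S by step-wise shifting of the contextual window can be sketched as follows (the window shape and vectorization order are illustrative):

import numpy as np

def training_set(Y, win):
    # All win x win x d windows of the image Y, one flattened vector Y_{r} per location r.
    H, W, d = Y.shape
    return np.asarray([Y[y:y + win, x:x + win].reshape(-1)
                       for y in range(H - win + 1)
                       for x in range(W - win + 1)])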
Parameter Estimation
The unknown parameters of the approximating mixture can be estimated using
the iterative EM algorithm (Dempster et al. 1977). In order to estimate the
unknown distributions ps (· | m) and the component weights p(m) we maximize
the likelihood function (49) corresponding to the training set (48):
\[
L = \frac{1}{|S|} \sum_{Y_{\{r\}} \in S} \log \left[ \sum_{m \in M} P(Y_{\{r\}} \mid \mu_m, \Sigma_m)\, p(m) \right]. \qquad (49)
\]
E:
\[
q^{(t)}(m \mid Y_{\{r\}}) = \frac{p^{(t)}(m)\, P(Y_{\{r\}} \mid \mu_m^{(t)}, \Sigma_m^{(t)})}{\sum_{j \in M} p^{(t)}(j)\, P(Y_{\{r\}} \mid \mu_j^{(t)}, \Sigma_j^{(t)})},
\]
M:
\[
p^{(t+1)}(m) = \frac{1}{|S|} \sum_{Y_{\{r\}} \in S} q^{(t)}(m \mid Y_{\{r\}}), \qquad (51)
\]
\[
\mu_{m,s}^{(t+1)} = \frac{1}{\sum_{Y_{\{r\}} \in S} q^{(t)}(m \mid Y_{\{r\}})} \sum_{Y_{\{r\}} \in S} Y_s\, q^{(t)}(m \mid Y_{\{r\}}), \qquad (52)
\]
\[
\Sigma_{m,s}^{(t+1)} = \frac{\sum_{Y_{\{r\}} \in S,\, Y_s \in Y_{\{r\}}} q^{(t)}(m \mid Y_{\{r\}})\, Y_s Y_s^T}{\sum_{Y_{\{r\}} \in S} q^{(t)}(m \mid Y_{\{r\}})}
- \frac{p^{(t+1)}(m)\, |S|\, \mu_{m,s}^{(t+1)} \big(\mu_{m,s}^{(t+1)}\big)^T}{\sum_{Y_{\{r\}} \in S} q^{(t)}(m \mid Y_{\{r\}})}. \qquad (53)
\]
The iteration process stops when the criterion increments become sufficiently small. The EM algorithm iteration scheme has the monotonic property L^{(t+1)} ≥ L^{(t)}, t = 0, 1, 2, ..., which implies the convergence of the sequence {L^{(t)}}_{t=0}^{∞} to a stationary point of the EM algorithm (a local maximum or a saddle point of L). Figure 13 illustrates the usefulness of the BTF-CMRF_{3DGM} model for textile material modeling, while Fig. 18 shows this model applied to scratch restoration.
Applications
Numerous modeling applications can exploit the BTF models. The synthesis is
beneficial not only for physically correct appearance modeling of surface materials
under realistic and variable observation conditions (Figs. 15 and 17, upper row)
Fig. 14 Measured original cloth and corduroy materials and their synthesis using the
CRF BM−3CAR model
but also for texture editing (Fig. 16), texture compression, or texture inpainting
and restoration (Fig. 18). State-of-the-art unsupervised, semi-supervised, or supervised visual scene classification and understanding under variable observation conditions is the primary application of BTF analysis.
Texture synthesis methods may be divided primarily into intelligent sampling and
model-based methods (Fig. 14). They differ in whether they need to store (sampling) or not (modeling) the actual texture measurements for new texture synthesis. Thus, even some methods which view texture as a stochastic process (Heeger and Bergen 1995; Efros and Leung 1999) still require storing an input exemplar. Sampling approaches
De Bonet (1997), Efros and Leung (1999), Efros and Freeman (2001), Heeger and
Bergen (1995), Xu et al. (2000), Dong and Chantler (2002), and Zelinka and Garland
(2002) rely on sophisticated sampling from real texture measurements, while the
model-based techniques (Kashyap 1981; Haindl 1991; Haindl and Havlíček 1998,
2000; Bennett and Khotanzad 1998, 1999; Gimelfarb 1999; Paget and Longstaff
1998; Zhu et al. 2000) describe texture data using multidimensional mathematical
models, and their synthesis is based on the estimated model parameters only. The
mathematical model-based synthesis has an advantage in the possibility of seamless
texture enlargement to any size (e.g., Fig. 6). The enlargement of a restricted texture
measurement is always required in any application but cannot be achieved with
sampling approaches without visible seams or repetitions.
The ultimate aim of BTF modeling is to create a visual impression of the same material without a pixel-wise correspondence to the original measurements. Figure 15 shows the finding condition model of the beautiful Gothic style relief (around 1370) of Christ in Gethsemane (Prague) on the right and the condition restored to a possible original appearance on the left.
The cornerstone of our BTF compression and modeling methods is the replacement of a vast number of original BTF measurements by their efficient parametric estimates derived from an underlying set of spatial probabilistic models; this allows a huge BTF compression ratio unattainable by any alternative sampling-based BTF synthesis method. Simultaneously, these models can be used to reconstruct missing parts of the BTF measurement space or for controlled BTF space editing (Haindl and Havlíček 2009, 2012; Haindl et al. 2015b) by changing some of the model's parameters.
Textures without significant low frequencies, such as the corduroy in Fig. 14 or the fabric in Fig. 13, can be modeled using simple local models only, either Markovian models or mixtures such as 3DCAR, 3DMA, 3DBM, 3DGM, etc. Textures with substantial low frequencies (Figs. 4, 9, and the cloth in Fig. 14) benefit from a compound version of the BTF model. Non-BTF textures can approximate low frequencies using a multiscale version of these models, e.g., a pyramidal model (Haindl and Filip 2013).
Fig. 15 3D model of the beautiful Gothic style relief of Christ in Gethsemane, Prague (finding condition model right, restored condition to a possible original appearance left) mapped with the BTF synthetic plaener using the CMRF_{3CAR} model
The 3DCAR model is synthesized directly from its predictor (41) and a Gaussian noise generator (22), (39). The advantage of a mixture model is its simple synthesis based on the marginals:
\[
p_{n \mid \rho}(Y_n \mid Y_{\{\rho\}}) = \sum_{m=1}^{\dot{M}} W_m(Y_{\{\rho\}})\, p_n(Y_n \mid m), \qquad (54)
\]
where W_m(Y_{ρ}) are the a posteriori component weights corresponding to the given sub-matrix Y_{ρ} ⊂ Y_{r}:
\[
W_m(Y_{\{\rho\}}) = \frac{p(m)\, P_\rho(Y_{\{\rho\}} \mid m)}{\sum_{j=1}^{\dot{M}} p(j)\, P_\rho(Y_{\{\rho\}} \mid j)}, \qquad (55)
\]
There are several alternatives for the 3DGM model synthesis (Haindl et al. 2011)
(Fig. 13). The unknown multivariate vector-levels Yn can be synthesized by random
sampling from the conditional density (54), or the mixture RF can be approximated
using the GM mixture prediction.
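A sketch of the mixture-based prediction (54)–(55) for a diagonal (univariate per position) Gaussian mixture: the a posteriori weights are computed from the already known part Y_{ρ} of the window, and the unknown value is predicted by the weighted component means (all names and the diagonal simplification are illustrative):

import numpy as np
from scipy.stats import norm

def gm_predict(Y_rho, known, new, p, mu, sigma):
    # p: (M,) component weights; mu, sigma: (M, eta) per-position means / std deviations;
    # known: indices of the observed window positions, new: index of the unknown one.
    logw = np.log(p) + norm.logpdf(Y_rho, mu[:, known], sigma[:, known]).sum(axis=1)
    w = np.exp(logw - logw.max())
    w /= w.sum()                          # a posteriori component weights W_m(Y_rho)
    return float((w * mu[:, new]).sum())  # conditional mean of the unknown value Y_n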
Texture Compression
Texture Editing
Illumination Invariants
Textures are essential clues to specify objects present in a visual scene. However, the
appearance of natural textures is highly illumination and view angle-dependent. As a
It is possible to show (Vacha and Haindl 2007) that, assuming (57), the following 3DCAR model-derived features are illumination invariant (the first two are sketched in code below):
1. trace: tr A_m, m = 1, . . . , ηK,
2. eigenvalues: ν_{m,j} of A_m, m = 1, . . . , ηK, j = 1, . . . , C,
3. 1 + X_r^T V_x^{−1} X_r,
4. Σ_r (Y_r − γ̂ X_r)^T λ^{−1} (Y_r − γ̂ X_r),
5. Σ_r (Y_r − μ)^T λ^{−1} (Y_r − μ),
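The first two invariants can be computed directly from the estimated parameter matrices; a minimal sketch:

import numpy as np

def car_invariants(A_matrices):
    # Illumination-invariant 3DCAR features: the trace and the eigenvalues of each A_m.
    traces = np.array([np.trace(A) for A in A_matrices])
    eigenvalues = [np.linalg.eigvals(A) for A in A_matrices]
    return traces, eigenvalues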
Fig. 17 BTF wood mosaic and the MW3-AR8i model-based (Haindl et al. 2015a) unsupervised
segmentation results
learning set size and the vertical viewing and illumination angle, which is a very
inadequate representation of the enormous material appearance variability.
Although plenty of different methods have already been published (Zhang 1997), the image recognition problem is still far from being solved. This situation is, among other reasons, due to the missing reliable performance comparison between different techniques. Only limited results have been published (Martin et al. 2001; Sharma and Singh 2001; Ojala et al. 2002; Haindl and Mikeš 2008) on suitable quantitative measures that allow us to evaluate and compare the quality of segmentation algorithms.
Spatial interaction models and especially Markov random field-based models are
increasingly popular for texture representation (Kashyap 1986; Reed and du Buf
1993; Haindl 1991), etc. Several researchers have dealt with the difficult problem of unsupervised segmentation using these models; see, for example, Panjwani and
Healey (1995), Manjunath and Chellapa (1991), Andrey and Tarroux (1998), Haindl
(1999), and Matuszak and Schreiber (2009).
Our unsupervised segmenters (Haindl and Mikeš 2004, 2005, 2006; Haindl
et al. 2015a) assume the multispectral or multi-channel textures to be locally
represented by the parameters (Θr ) of the multidimensional random field models
possibly recursively evaluated for each pixel and several scales. The segmentation
part of the algorithm is then based on the underlying Gaussian mixture model (p(Θr) = Σ_{i=1}^{K} p_i p(Θr | νi, Σi)) representing the Markovian parametric space and starts with an over-segmented initial estimation, which is adaptively modified
until the optimal number of homogeneous mammogram segments is reached. The
corresponding mixture model equations (p(Θr ), p(Θr | νi , Σi )) are solved using a
modified EM algorithm (Haindl and Mikeš 2007).
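A simplified sketch of this idea (Gaussian mixture clustering of locally estimated Markovian parameters Θ_r; the adaptive selection of the number of segments and the modified EM of the cited segmenters are not reproduced):

import numpy as np
from sklearn.mixture import GaussianMixture

def segment_by_parameter_space(features, n_segments):
    # features: (H, W, p) array of per-pixel, locally estimated model parameters Theta_r.
    H, W, p = features.shape
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full")
    labels = gmm.fit_predict(features.reshape(-1, p))
    return labels.reshape(H, W)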
Ŷ = E{Ÿ } (59)
Fig. 18 Cobra skin scratch restoration using the spatial 3D Gaussian mixture model
and assuming the noisy image Ÿ can be represented by a 3DCAR model, then
the restoration model as well as the local estimation of the point-spread function
leads to a fast analytical solution (Haindl 2002). A similar restoration approach can
also be derived for multi-channel (Haindl and Šimberová 2002) or multitemporal (Haindl and Šimberová 2005) image restoration problems, typically caused by
random fluctuations originating mostly in the Earth’s atmosphere during ground-
based telescope observations.
A difficult restoration problem is to restore missing parts of an image or a
spatially correlated data field. For example, every movie deteriorates with usage and
time irrespective of any care it gets. Movies (on both optical and magnetic materials)
suffer from blotches, dirt, sparkles, noise, scratches (Fig. 18), missing or heavily
corrupted frames, mold, flickering, jittering, image vibrations, and other problems.
For each kind of defect, usually a different kind of restoration algorithm is needed.
The scratch notion means every coherent region with missing data (simultaneously
in all spectral bands) in a color movie frame (Haindl and Filip 2002), static image,
range map, radio-spectrograph (Haindl and Šimberová 1996), radar observation,
color textures (Haindl and Havlíček 2015), etc. These missing data restoration
methods (inpainting) exploit correlations in the spatial/spectral/temporal data space
and benefit from the discussed Markovian or mixture (Fig. 18) random field models.
Conclusion
There is no single universal BTF model applicable for physically correct modeling
of visual properties of all possible BTF textures. Every presented model is better
suited for some subspace of possible BTF textures, either natural or artificial. Their
selection depends primarily on their spectral and spatial frequency content as well
as on the available learning data. We present exceptional adaptive 3D Markovian and mixture models, solved either analytically or iteratively, that can be quickly synthesized.
The presented compound Markovian models are rare exceptions in the Markovian model family that allow deriving extraordinarily efficient and fast data processing algorithms. All their statistics can be evaluated recursively, and they either do not need any Monte Carlo sampling, typical for other Markovian models, or can use a fast form of such sampling (the Potts random field). The 3DCAR models have an advantage over the non-causal (3DAR) models in their analytical treatment. It is possible to find the analytical solution for the model parameters, the optimal model support, the model predictor, etc. Similarly, the 3DCAR model synthesis is straightforward, and this model can be generated directly from the model equation.
The mixture models are capable of reducing additive noise and restoring missing textural parts simultaneously. They produce high-quality results, especially for regular or near-regular color textures. Their typical drawback, the extensive learning data set requirement, is lessened by the ample available BTF measurement space using a transfer learning approach.
The BTF-CMRF models offer a large data compression ratio (only tens of parameters per BTF), easy simulation, and fast, seamless synthesis of any required texture size. The methods have no restriction on the number of spectral channels; thus, they can easily be applied to hyperspectral BTFs. The methods can easily be generalized for color or BTF texture editing by estimating some local models from different target materials, or for image restoration and inpainting.
The Markovian models can be used for image enhancement, e.g., fully automatic mammogram enhancement, the development of multispectral and multiresolution qualitative texture measures, or image and video segmentation. Some of these models also provide robust textural features for texture classification when the learning and classified textures differ in scale. Classifiers based on Markovian features can exploit illumination or geometric invariance properties and often outperform the state-of-the-art alternative methods on tested public databases (e.g., eye, bark, needles, textures).
Acknowledgments The Czech Science Foundation Project GAČR 19-12340S supported this
research.
References
Acton, S., Bovik, A.: Piecewise and local image models for regularized image restoration using
cross-validation. IEEE Trans. Image Process. 8(5), 652–665 (1999)
Aittala, M., Weyrich, T., Lehtinen, J.: Practical SVBRDF capture in the frequency domain. ACM
Trans. Graph. (Proc. SIGGRAPH) 32(4), 110:1– 110:13 (2013)
Aittala, M., Weyrich, T., Lehtinen, J.: Two-shot SVBRDF capture for stationary materials. ACM
Trans. Graph. 34(4), 110:1–110:13 (2015). https://doi.org/10.1145/2766967
Aittala, M., Aila, T., Lehtinen, J.: Reflectance modeling by neural texture synthesis. ACM Trans.
Graph. 35(4), 65:1–65:13 (2016). https://doi.org/10.1145/2897824.2925917
Anderson, T.W.: The Statistical Analysis of Time Series. Wiley, New York (1971)
Andrews, H.C., Hunt, B.: Digital Image Restoration. Prentice-Hall, Englewood Cliffs (1977)
Andrey, P., Tarroux, P.: Unsupervised segmentation of markov random field modeled textured
images using selectionist relaxation. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 252–262
(1998)
Asmussen, J.C.: Modal analysis based on the random decrement technique: application to civil
engineering structures. PhD thesis, University of Aalborg (1997)
Aurenhammer, F.: Voronoi diagrams-a survey of a fundamental geometric data structure. ACM
Comput. Surv. (CSUR) 23(3), 345–405 (1991)
Baril, J., Boubekeur, T., Gioia, P., Schlick, C.: Polynomial wavelet trees for bidirectional texture
functions. In: SIGGRAPH’08: ACM SIGGRAPH 2008 talks, p. 1. ACM, New York (2008).
https://doi.org/10.1145/1401032.1401072
Bennett, J., Khotanzad, A.: Multispectral random field models for synthesis and analysis of color
images. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 327–332 (1998)
Bennett, J., Khotanzad, A.: Maximum likelihood estimation methods for multispectral random
field image models. IEEE Trans. Pattern Anal. Mach. Intell. 21(6), 537–543 (1999)
Broemeling, L.D.: Bayesian Analysis of Linear Models. Marcel Dekker, New York (1985)
Burgeth, B., Pizarro, L., Didas, S., Weickert, J.: Coherence-enhancing diffusion filtering for matrix
fields. In: Locally Adaptive Filtering in Signal and Image Processing. Springer, Berlin (2009)
Cole Jr, H.A.: On-line failure detection and damping measurement of aerospace structures by
random decrement signatures. Technical Report TMX-62.041, NASA (1973)
Dana, K.J., Nayar, S.K., van Ginneken, B., Koenderink, J.J.: Reflectance and texture of real-world
surfaces. In: CVPR, pp. 151–157. IEEE Computer Society (1997)
De Bonet, J.: Multiresolution sampling procedure for analysis and synthesis of textured images.
In: ACM SIGGRAPH 97, pp. 361–368. ACM Press (1997)
Debevec, P., Hawkins, T., Tchou, C., Duiker, H.P., Sarokin, W., Sagar, M.: Acquiring the
reflectance field of a human face. In: Proceedings of ACM SIGGRAPH 2000, Computer
Graphics Proceedings, Annual Conference Series, pp. 145–156 (2000)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the em
algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977)
Deschaintre, V., Aittala, M., Durand, F., Drettakis, G., Bousseau, A.: Single-image svbrdf capture
with a rendering-aware deep network. ACM Trans. Graph. 37(4), 1–15 (2018). https://doi.org/
10.1145/3197517.3201378
Dong, J., Chantler, M.: Capture and synthesis of 3D surface texture. In: Texture 2002, vol. 1,
pp. 41–45. Heriot-Watt University (2002)
Dong, J., Wang, R., Dong, X.: Texture synthesis based on multiple seed-blocks and support vector
machines. In: 2010 3rd International Congress on Image and Signal Processing (CISP), vol. 6,
pp. 2861–2864 (2010). https://doi.org/10.1109/CISP.2010.5646815
Efros, A.A., Freeman, W.T.: Image quilting for texture synthesis and transfer. In: Fiume, E. (ed.)
ACM SIGGRAPH 2001, pp. 341–346. ACM Press (2001). citeseer.nj.nec.com/efros01image.
html
Efros, A.A., Leung, T.K.: Texture synthesis by non-parametric sampling. In: Proceedings of
International Conference on Computer Vision (2), Corfu, pp. 1033–1038 (1999). citeseer.nj.
nec.com/efros99texture.html
Felsberg, M.: Adaptive filtering using channel representations. In: Locally Adaptive Filtering in
Signal and Image Processing. Springer, Berlin (2009)
Filip, J., Haindl, M.: Bidirectional texture function modeling: a state of the art survey. IEEE Trans.
Pattern Anal. Mach. Intell. 31(11), 1921–1940 (2009). https://doi.org/10.1109/TPAMI.2008.
246
Geman, S., Geman, D.: Stochastic relaxation, gibbs distributions and bayesian restoration of
images. IEEE Trans. Pattern Anal. Mach. Intel. 6(11), 721–741 (1984)
Gimelfarb, G.: Image Textures and Gibbs Random Fields. Kluwer Academic Publishers, Dordrecht
(1999)
Google: Tensorflow. Technical report, Google AI (2019). http://www.tensorflow.org/
Grim, J., Haindl, M.: Texture modelling by discrete distribution mixtures. Comput. Stat. Data Anal.
41(3–4), 603–615 (2003)
Haindl, M.: Identification of the stochastic differential equation of the type arma. PhD thesis, ÚTIA
Czechoslovak Academy of Sciences, Prague (1983)
Haindl, M.: Texture synthesis. CWI Q. 4(4), 305–331 (1991)
Haindl, M.: Texture segmentation using recursive Markov random field parameter estimation.
In: Bjarne, K.E., Peter, J. (eds.) Proceedings of the 11th Scandinavian Conference on
Image Analysis, Pattern Recognition Society of Denmark, Lyngby, pp. 771–776 (1999).
http://citeseer.ist.psu.edu/305262.html; http://www.ee.surrey.ac.uk/Research/VSSP/3DVision/
virtuous/Publications/Haindl-SCIA99.ps.gz
Haindl, M.: Recursive square-root filters. In: Sanfeliu, A., Villanueva, J., Vanrell, M., Alquezar,
R., Jain, A., Kittler, J. (eds.) Proceedings of the 15th IAPR International Conference on Pattern
Recognition, vol. II, pp. 1018–1021. IEEE Press, Los Alamitos (2000). https://doi.org/10.1109/
ICPR.2000.906246
Haindl, M.: Recursive model-based colour image restoration. Lect. Notes Comput. Sci. (2396),
617–626 (2002)
Haindl, M., Filip, J.: Fast restoration of colour movie scratches. In: Kasturi, R., Laurendeau,
D., Suen, C. (eds.) Proceedings of the 16th International Conference on Pattern Recognition,
vol. III, pp. 269–272. IEEE Computer Society, Los Alamitos (2002). https://doi.org/10.1109/
ICPR.2002.1047846
Haindl, M., Filip, J.: Extreme compression and modeling of bidirectional texture function. IEEE
Trans. Pattern Anal. Mach. Intell. 29(10), 1859–1865 (2007). https://doi.org/10.1109/TPAMI.
2007.1139
Haindl, M., Filip, J.: Visual Texture. Advances in Computer Vision and Pattern Recognition.
Springer, London (2013). https://doi.org/10.1007/978-1-4471-4902-6
Haindl, M., Hatka, M.: BTF Roller. In: Chantler, M., Drbohlav, O. (eds.) Texture 2005. Proceedings
of the 4th International Workshop on Texture Analysis, pp. 89–94. IEEE, Los Alamitos (2005a)
Haindl, M., Hatka, M.: A roller – fast sampling-based texture synthesis algorithm. In: Skala,
V. (ed.) Proceedings of the 13th International Conference in Central Europe on Computer
Graphics, Visualization and Computer Vision, pp. 93–96. UNION Agency – Science Press,
Plzen (2005b)
Haindl, M., Havlíček, V.: Multiresolution colour texture synthesis. In: Dobrovodský, K. (ed.)
Proceedings of the 7th International Workshop on Robotics in Alpe-Adria-Danube Region,
pp. 297–302. ASCO Art, Bratislava (1998)
Haindl, M., Havlíček, V.: A multiresolution causal colour texture model. Lect. Notes Comput. Sci.
(1876), 114–122 (2000)
Haindl, M., Havlíček, V.: Texture editing using frequency swap strategy. In: Jiang, X., Petkov,
N. (eds.) Computer Analysis of Images and Patterns. Lecture Notes in Computer Science,
vol. 5702, pp. 1146–1153. Springer (2009). https://doi.org/10.1007/978-3-642-03767-2_139
Haindl, M., Havlíček, V.: A compound MRF texture model. In: Proceedings of the 20th Interna-
tional Conference on Pattern Recognition, ICPR 2010, pp. 1792–1795. IEEE Computer Society
CPS, Los Alamitos (2010). https://doi.org/10.1109/ICPR.2010.442
Haindl, M., Havlíček, V.: A plausible texture enlargement and editing compound marko-
vian model. In: Salerno, E., Cetin, A., Salvetti, O. (eds.) Computational Intelligence for
Multimedia Understanding. Lecture Notes in Computer Science, vol. 7252, pp. 138–148.
Springer, Berlin/Heidelberg (2012). https://doi.org/10.1007/978-3-642-32436-9_12, http://
www.springerlink.com/content/047124j43073m202/
Haindl, M., Havlíček, V.: Color Texture Restoration, pp. 13–18. IEEE, Piscataway (2015). https://
doi.org/10.1109/ICCIS.2015.7274540
Haindl, M., Havlíček, V.: Three-dimensional gaussian mixture texture model. In: The 23rd International Conference on Pattern Recognition (ICPR), pp. 2026–2031. IEEE (2016). http://www.icpr2016.org/site/
Haindl, M., Havlíček, M.: A compound moving average bidirectional texture function model. In:
Zgrzynowa, A., Choros, K., Sieminski, A. (eds.) Multimedia and Network Information Systems,
Advances in Intelligent Systems and Computing, vol. 506, pp. 89–98. Springer International
Publishing (2017a). https://doi.org/10.1007/978-3-319-43982-2_8
Haindl, M., Havlíček, V.: Two compound random field texture models. In: Beltrán-Castañón, C.,
Nyström, I., Famili, F. (eds.) 2016 the 21st IberoAmerican Congress on Pattern Recognition
(CIARP 2016). Lecture Notes in Computer Science, vol. 10125, pp. 44–51. Springer Interna-
tional Publishing AG, Cham (2017b). https://doi.org/10.1007/978-3-319-52277-7_6
Haindl, M., Havlíček, V.: BTF compound texture model with fast iterative non-parametric control
field synthesis. In: di Baja, G.S., Gallo, L., Yetongnon, K., Dipanda, A., Castrillon-Santana, M.,
Chbeir, R. (eds.) Proceedings of the 14th International Conference on Signal-Image Technology
& Internet-Based Systems (SITIS 2018), pp. 98–105. IEEE Computer Society CPS, Los
Alamitos (2018a). https://doi.org/10.1109/SITIS.2018.00025
Haindl, M., Havlíček, V.: BTF compound texture model with non-parametric control field. In:
The 24th International Conference on Pattern Recognition (ICPR 2018), pp. 1151–1156. IEEE
(2018b). http://www.icpr2018.org/
Haindl, M., Mikeš, S.: Model-based texture segmentation. Lect. Notes Comput. Sci. (3212), 306–
313 (2004)
Haindl, M., Mikeš, S.: Colour texture segmentation using modelling approach. Lect. Notes
Comput. Sci. (3687), 484–491 (2005)
Haindl, M., Mikeš, S.: Unsupervised texture segmentation using multispectral modelling approach.
In: Tang, Y., Wang, S., Yeung, D., Yan, H., Lorette, G. (eds.) Proceedings of the 18th
International Conference on Pattern Recognition, ICPR 2006, vol. II, pp. 203–206. IEEE
Computer Society, Los Alamitos (2006). https://doi.org/10.1109/ICPR.2006.1148
Haindl, M., Mikeš, S.: Unsupervised texture segmentation using multiple segmenters strategy. In:
Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. Lecture Notes in Computer Science, vol. 4472,
pp. 210–219. Springer (2007). https://doi.org/10.1007/978-3-540-72523-7_22
Haindl, M., Mikeš, S.: Texture segmentation benchmark. In: Lovell, B., Laurendeau, D., Duin, R.
(eds.) Proceedings of the 19th International Conference on Pattern Recognition, ICPR 2008,
pp. 1–4. IEEE Computer Society, Los Alamitos (2008). https://doi.org/10.1109/ICPR.2008.
4761118
Haindl, M., Šimberová, S.: A multispectral image line reconstruction method. In: Theory &
Applications of Image Analysis. Series in Machine Perception and Artificial Intelligence,
pp. 306–315. World Scientific, Singapore (1992). https://doi.org/10.1142/9789812797896_
0028
Haindl, M., Šimberová, S.: A high – resolution radiospectrograph image reconstruction method.
Astron. Astrophys. 115(1), 189–193 (1996)
Haindl, M., Šimberová, S.: Model-based restoration of short-exposure solar images. In: Abraham,
A., Ruiz-del Solar, J., Koppen, M. (eds.) Soft Computing Systems Design, Management and
Applications, pp. 697–706. IOS Press, Amsterdam (2002)
Haindl, M., Šimberová, S.: Restoration of multitemporal short-exposure astronomical images.
Lect. Notes Comput. Sci. (3540), 1037–1046 (2005)
Haindl, M., Mikeš, S., Pudil, P.: Unsupervised hierarchical weighted multi-segmenter. In: Benedik-
tsson, J., Kittler, J., Roli, F. (eds.) Lecture Notes in Computer Science. MCS 2009, vol. 5519,
pp. 272–282. Springer (2009a). https://doi.org/10.1007/978-3-642-02326-2_28
Haindl, M., Mikeš, S., Vácha, P.: Illumination invariant unsupervised segmenter. In: Bayoumi, M.
(ed.) IEEE 16th International Conference on Image Processing – ICIP 2009, pp. 4025–4028.
IEEE (2009b). https://doi.org/10.1109/ICIP.2009.5413753
Haindl, M., Havlíček, V., Grim, J.: Probabilistic mixture-based image modelling. Kybernetika
46(3), 482–500 (2011). http://www.kybernetika.cz/content/2011/3/482/paper.pdf
Haindl, M., Remeš, V., Havlíček, V.: Potts compound markovian texture model. In: Proceedings
of the 21st International Conference on Pattern Recognition, ICPR 2012, pp. 29–32. IEEE
Computer Society CPS, Los Alamitos (2012)
Haindl, M., Mikeš, S., Kudo, M.: Unsupervised surface reflectance field multi-segmenter. In:
Azzopardi, G., Petkov, N. (eds.) Computer Analysis of Images and Patterns. Lecture Notes in
Computer Science, vol. 9256, pp. 261–273. Springer International Publishing (2015a). https://
doi.org/10.1007/978-3-319-23192-1_22
Haindl, M., Remeš, V., Havlíček, V.: BTF Potts Compound Texture Model, vol. 9398, pp. 939807–
1–939807–11. SPIE, Bellingham (2015b). https://doi.org/10.1117/12.2077481
Han, J.Y., Perlin, K.: Measuring bidirectional texture reflectance with a kaleidoscope. ACM Trans.
Graph. 22(3), 741–748 (2003)
Heeger, D., Bergen, J.: Pyramid based texture analysis/synthesis. In: ACM SIGGRAPH 95,
pp. 229–238. ACM Press (1995)
Holroyd, M., Lawrence, J., Zickler, T.: A coaxial optical scanner for synchronous acquisition of 3D
geometry and surface reflectance. ACM Trans. Graph. (Proc. SIGGRAPH 2010) (2010). http://
www.cs.virginia.edu/~mjh7v/Holroyd10.php
Kashyap, R.: Analysis and synthesis of image patterns by spatial interaction models. In: Kanal, L.,
Rosenfeld, A. (eds.) Progress in Pattern Recognition 1. Elsevier, North-Holland (1981)
Kashyap, R.: Image models. In: Young, T.Y., Fu, K.S. (eds.) Handbook of Pattern Recognition and
Image Processing. Academic, New York (1986)
Koudelka, M.L., Magda, S., Belhumeur, P.N., Kriegman, D.J.: Acquisition, compression, and
synthesis of bidirectional texture functions. In: Texture 2003: Third International Workshop
on Texture Analysis and Synthesis, Nice, pp. 59–64 (2003)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Master’s thesis, University
of Toronto (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
Kwatra, V., Schodl, A., Essa, I., Turk, G., Bobick, A.: Graphcut textures: image and video synthesis
using graph cuts. ACM Trans. Graph. 22(3), 277–286 (2003)
Levada, A., Mascarenhas, N., Tannus, A.: Pseudolikelihood equations for potts mrf model
parameter estimation on higher order neighborhood systems. Geosci. Remote Sens. Lett. IEEE
5(3), 522–526 (2008). https://doi.org/10.1109/LGRS.2008.920909
Li, X., Cadzow, J., Wilkes, D., Peters II, R., Bodruzzaman, M.: An efficient two dimensional mov-
ing average model for texture analysis and synthesis. In: Proceedings IEEE Southeastcon’92,
vol. 1, pp. 392–395. IEEE (1992)
Liang, L., Liu, C., Xu, Y.Q., Guo, B., Shum, H.Y.: Real-time texture synthesis by patch-based
sampling. ACM Trans. Graph. (TOG) 20(3), 127–150 (2001)
Liu, F., Picard, R.: Periodicity, directionality, and randomness: wold features for image modeling
and retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 18(7), 722–733 (1996). https://doi.org/10.
1109/34.506794
Loubes, J., Rochet, P.: Regularization with approximated L2 maximum entropy method. In:
Locally Adaptive Filtering in Signal and Image Processing. Springer, Berlin (2009)
Manjunath, B., Chellapa, R.: Unsupervised texture segmentation using Markov random field
models. IEEE Trans. Pattern Anal. Mach. Intell. 13, 478–482 (1991)
Marschner, S.R., Westin, S.H., Arbree, A., Moon, J.T.: Measuring and modeling the appearance of
finished wood. ACM Trans. Graph. 24(3), 727–734 (2005)
Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and
its application to evaluating segmentation algorithms and measuring ecological statistics. In:
Proceedings of 8th International Conference on Computer Vision, vol. 2, pp. 416–423 (2001).
http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
Matuszak, M., Schreiber, T.: Locally specified polygonal Markov fields for image segmentation.
In: Locally Adaptive Filtering in Signal and Image Processing. Springer, Berlin (2009)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state
calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
Mikeš, S., Haindl, M.: View dependent surface material recognition. In: Bebis, G., Boyle, R.,
Parvin, B., Koračin, D., Ushizima, D., Chai, S., Sueda, S., Lin, X., Lu, A., Thalmann, D., Wang,
C., Xu, P. (eds.) 14th International Symposium on Visual Computing (ISVC 2019). Lecture
Notes in Computer Science, vol. 11844, pp. 156–167. Springer Nature Switzerland AG (2019).
https://doi.org/10.1007/978-3-030-33720-9_12, https://www.isvc.net/
Müller, G., Meseth, J., Klein, R.: Compression and real-time rendering of measured BTFs using
local PCA. In: Vision, Modeling and Visualisation 2003, pp. 271–280 (2003)
Müller, G., Meseth, J., Sattler, M., Sarlette, R., Klein, R.: Acquisition, synthesis and rendering
of bidirectional texture functions. In: Eurographics 2004, STAR – State of The Art Report,
Eurographics Association, pp. 69–94 (2004)
Neubeck, A., Zalesny, A., Gool, L.: 3D texture reconstruction from extensive BTF data. In:
Chantler, M., Drbohlav, O. (eds.) Texture 2005. Heriot-Watt University, Edinburgh (2005)
Ngan, A., Durand, F.: Statistical acquisition of texture appearance. In: Eurographics Symposium
on Rendering, Eurographics (2006)
Ojala, T., Maenpaa, T., Pietikainen, M., Viertola, J., Kyllonen, J., Huovinen, S.: Outex: new
framework for empirical evaluation of texture analysis algorithms. In: International Conference
on Pattern Recognition, pp. I:701–706 (2002)
Paget, R., Longstaff, I.D.: Texture synthesis via a noncausal nonparametric multiscale markov
random field. IEEE Trans. Image Process. 7(8), 925–932 (1998)
Panjwani, D., Healey, G.: Markov random field models for unsupervised segmentation of textured
color images. IEEE Trans. Pattern Anal. Mach. Intell. 17(10), 939–954 (1995)
Pattanayak, S.: Pro Deep Learning with TensorFlow. Apress (2017). https://doi.org/10.1007/978-
1-4842-3096-1
Polzehl, J., Tabelow, K.: Structural adaptive smoothing: principles and applications in imaging. In:
Locally Adaptive Filtering in Signal and Image Processing. Springer, Berlin (2009)
Portilla, J., Simoncelli, E.: A parametric texture model based on joint statistics of complex wavelet
coefficients. Int. J. Comput. Vis. 40(1), 49–71 (2000)
Potts, R., Domb, C.: Some generalized order-disorder transformations. In: Proceedings of the
Cambridge Philosophical Society, vol. 48, pp. 106–109 (1952)
Praun, E., Finkelstein, A., Hoppe, H.: Lapped textures. In: ACM SIGGRAPH 2000, pp. 465–470
(2000)
Rainer, G., Ghosh, A., Jakob, W., Weyrich, T.: Unified neural encoding of BTFs. In: Computer
Graphics Forum, vol. 39, pp. 167–178. Wiley Online Library (2020)
Reed, T.R., du Buf, J.M.H.: A review of recent texture segmentation and feature extraction
techniques. CVGIP–Image Underst. 57(3), 359–372 (1993)
Ren, P., Wang, J., Snyder, J., Tong, X., Guo, B.: Pocket reflectometry. ACM Trans. Graph. (Proc.
SIGGRAPH) 30(4) (2011). https://doi.org/10.1145/2010324.1964940
Ruiters, R., Schwartz, C., Klein, R.: Example-based interpolation and synthesis of bidirectional
texture functions. In: Computer Graphics Forum, vol. 32, pp. 361–370. Wiley Online Library
(2013)
Sattler, M., Sarlette, R., Klein, R.: Efficient and realistic visualization of cloth. In: Eurographics
Symposium on Rendering (2003)
Schwartz, C., Sarlette, R., Weinmann, M., Rump, M., Klein, R.: Design and implementation of
practical bidirectional texture function measurement devices focusing on the developments at
the university of bonn. Sensors 14(5), 7753–7819 (2014). https://doi.org/10.3390/s140507753.
http://www.mdpi.com/1424-8220/14/5/7753
Sharma, M., Singh, S.: Minerva scene analysis benchmark. In: Seventh Australian and New
Zealand Intelligent Information Systems Conference, pp. 231–235. IEEE (2001)
Soler, C., Cani, M., Angelidis, A.: Hierarchical pattern mapping. ACM Trans. Graph. 21(3), 673–
680 (2002)
Swendsen, R.H., Wang, J.S.: Nonuniversal critical dynamics in Monte Carlo simulations. Phys.
Rev. Lett. 58(2), 86–88 (1987). https://doi.org/10.1103/PhysRevLett.58.86
Tong, X., Zhang, J., Liu, L., Wang, X., Guo, B., Shum, H.Y.: Synthesis of bidirectional texture
functions on arbitrary surfaces. ACM Trans. Graph. (TOG) 21(3), 665–672 (2002)
Tsai, Y.T., Shih, Z.C.: K-clustered tensor approximation: a sparse multilinear model for real-
time rendering. ACM Trans. Graph. 31(3), 19:1–19:17 (2012). https://doi.org/10.1145/2167076.
2167077
Tsai, Y.T., Fang, K.L., Lin, W.C., Shih, Z.C.: Modeling bidirectional texture functions with
multivariate spherical radial basis functions. Pattern Anal. Mach. Intell. IEEE Trans. 33(7),
1356 –1369 (2011). https://doi.org/10.1109/TPAMI.2010.211
Vacha, P., Haindl, M.: Image retrieval measures based on illumination invariant textural mrf
features. In: CIVR’07: Proceedings of the 6th ACM International Conference on Image and
Video Retrieval, pp. 448–454. ACM Press, New York (2007). https://doi.org/10.1145/1282280.
1282346
Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. Int.
J. Comput. Vis. 62(1–2), 61–81 (2005)
Wang, J., Dana, K.: Relief texture from specularities. IEEE Trans. Pattern Anal. Mach. Intell. 28(3),
446–457 (2006)
Wei, L., Levoy, M.: Texture synthesis using tree-structure vector quantization. In: ACM SIG-
GRAPH 2000, pp. 479–488. ACM Press/Addison Wesley/Longman (2000). citeseer.nj.nec.
com/wei01texture.html
Wei, L., Levoy, M.: Texture synthesis over arbitrary manifold surfaces. In: SIGGRAPH 2001,
pp. 355–360. ACM (2001)
Wu, F.: The Potts model. Rev. Modern Phys. 54(1), 235–268 (1982)
Wu, H., Dorsey, J., Rushmeier, H.: A sparse parametric mixture model for BTF compression,
editing and rendering. Comput. Graph. Forum 30(2), 465–473 (2011)
Xu, Y., Guo, B., Shum, H.: Chaos mosaic: fast and memory efficient texture synthesis. Technical
Report MSR-TR-2000-32, Redmont (2000)
Zelinka, S., Garland, M.: Towards real-time texture synthesis with the jump map. In: 13th European Workshop on Rendering, pp. 99–104 (2002)
Zelinka, S., Garland, M.: Interactive texture synthesis on surfaces using jump maps. In: Chris-
tensen, P., Cohen-Or, D. (eds.) 14th European Workshop on Rendering, Eurographics (2003)
Zhang, Y.J.: Evaluation and comparison of different segmentation algorithms. Pattern Recogn.
Lett. 18, 963–974 (1997)
Zhang, J.D., Zhou, K., Velho ea, L.: Synthesis of progressively-variant textures on arbitrary
surfaces. ACM Trans. Graph. 22(3), 295–302 (2003)
Zhu, S., Liu, X., Wu, Y.: Exploring texture ensembles by efficient Markov Chain Monte Carlo –
toward a “trichromacy” theory of texture. IEEE Trans. Pattern Anal. Mach. Intell. 22(6), 554–
569 (2000)
Regularization of Inverse Problems by
Neural Networks 29
Markus Haltmeier and Linh Nguyen
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066
Ill-Posedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067
Data-Driven Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1068
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1069
Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1069
Right Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1070
Regularization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1072
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
Regularizing Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075
Null-Space Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078
Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1079
The NETT Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080
Learned Regularization Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080
Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1082
Related Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1088
Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1090
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1091
Abstract
M. Haltmeier ()
Department of Mathematics, University of Innsbruck, Innsbruck, Austria
e-mail: [email protected]
L. Nguyen
Department of Mathematics, University of Idaho, Moscow, ID, USA
e-mail: [email protected]
Keywords
Introduction
Ill-Posedness
The inherent character of inverse problems is their ill-posedness. This means that
even in the case of exact data, the solution of (1) is either not unique, not existent,
or does not stably depend on the given data. More formally, for an inverse problem,
at least one of the following three unfavorable properties holds:
(I1) The solution of (1) is not unique.
(I2) A solution of (1) does not exist for all data.
(I3) The solution of (1) does not depend continuously on the data.
These conditions imply that the forward operator does not have a continuous inverse
that could be used to directly solve (1). Instead, regularization methods have to
be applied, which result in stable methods for solving the inverse problem.
Regularization methods approach the ill-posedness in two steps. First, to address
the non-uniqueness and non-existence issues (I1), (I2), one restricts the pre-image and
image space of the forward operator to sets M ⊆ X and ran(A) ⊆ Y, such that
the restricted forward operator A_res : M → ran(A) becomes bijective. For any exact
data, the equation A(x) = y then has a unique solution in M, which is given by the
inverse of the restricted forward operator applied to y. Second, in order to address
the instability issue (I3), one considers a family of continuous operators
B_α : Y → X for α > 0 that converge to A_res^{-1} in a suitable sense; see
section “Preliminaries” for precise definitions.
Note that the choice of the set M is crucial, as it represents the class of desired
reconstructions and acts as a selection criterion for picking a particular solution of the
given inverse problem. The main challenge is that this class is typically unknown, or
at least cannot be described properly. For example, in CT for medical imaging,
the set of desired solutions is the set of all functions corresponding to the
spatially varying attenuation inside patients, a function class that is clearly challenging, if
not impossible, to describe in simple mathematical terms.
Variational regularization and variants (Scherzer et al. 2009) have been the most
successful class of regularization methods for solving inverse problems. Here, M is
defined as the set of solutions having a small value of a certain regularization functional,
which can be interpreted as a measure for the deviation from the desired solutions. Various
regularization functionals have been analyzed for inverse problems, including
Hilbert space norms (Engl et al. 1996), total variation (Acar and Vogel 1994), and
sparse ℓq-penalties (Daubechies et al. 2004; Grasmair et al. 2008). Such handcrafted
regularization functionals have limited complexity and are unlikely to accurately
model complex signal classes arising in applications such as medical imaging. On
the other hand, their regularization effects are well understood, efficient numerical
algorithms have been developed for their realization, they work reasonably well in
practice, and they have been rigorously analyzed mathematically.
Data-Driven Reconstruction
Recently, data-driven methods based on neural networks and deep learning have been
demonstrated to significantly outperform existing variational and iterative reconstruction
algorithms for solving inverse problems. The essential idea is to use neural networks
to define a class (R_θ)_{θ∈Θ} of reconstruction networks R_θ : Y → X and to select the
parameter vector θ ∈ Θ of the network in a data-driven manner. The selection is
based on a set of training data (x_1, y_1), (x_2, y_2), . . . , (x_N, y_N), where x_i ∈ M are
desired reconstructions and y_i = A(x_i) + ξ_i ∈ Y are corresponding data. Even if the
set M of desired reconstructions is unknown, the available samples x_1, . . . , x_N can
be used to select the particular reconstruction method. A typical selection strategy
is to minimize a penalized least-squares functional having the form
$$\theta^{\ast} \in \arg\min_{\theta}\;\Big\{ \frac{1}{N}\sum_{i=1}^{N} \big\| x_i - R_\theta(y_i)\big\|^2 + P(\theta)\Big\}. \qquad (2)$$
The selected reconstruction network should accurately fit the training data and
at the same time have a regularization effect on the underlying inverse problem.
Needless to say, understanding and analyzing where exactly this regularization effect
comes from increases the reliability of any algorithm and allows its further
improvement. In conclusion, any data-driven reconstruction method has to include
either explicitly or implicitly a form of regularization. In this chapter, we will
analyze the regularization properties of various deep learning methods for solving
inverse problems.
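As a minimal illustration of the selection strategy (2), the following sketch trains a small fully connected reconstruction network on simulated pairs (x_i, y_i) for a toy linear forward operator; all concrete names, sizes, and the realization of the penalty P(θ) as a weight decay are illustrative assumptions, not taken from the chapter.

```python
# Sketch of (2): penalized empirical risk minimization over training pairs (x_i, y_i).
import torch

n, m, N = 64, 32, 200                       # signal size, data size, number of training pairs
A = torch.randn(m, n) / m ** 0.5            # toy ill-posed forward operator

x_train = torch.randn(N, n)                              # stand-ins for desired reconstructions x_i
y_train = x_train @ A.T + 0.01 * torch.randn(N, m)       # data y_i = A x_i + noise

reco_net = torch.nn.Sequential(             # a small reconstruction network R_theta : Y -> X
    torch.nn.Linear(m, 128), torch.nn.ReLU(), torch.nn.Linear(128, n)
)

# The penalty P(theta) is realized here as a squared-norm penalty via weight decay.
opt = torch.optim.Adam(reco_net.parameters(), lr=1e-3, weight_decay=1e-4)
for epoch in range(200):
    opt.zero_grad()
    loss = ((reco_net(y_train) - x_train) ** 2).mean()   # (1/N) sum_i ||x_i - R_theta(y_i)||^2
    loss.backward()
    opt.step()
```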
Outline
Preliminaries
Right Inverses
Clearly, a right inverse always exists because for any y ∈ ran(A), there exists an
element By := x, such that Ax = y. However, in general, no continuous right inverse
exists. More precisely, we have the following result (compare Nashed 1987).
(a) A has a linear right inverse B such that B ◦ A is bounded if and only if ker(A) is
complemented.
(b) A linear right inverse as in (a) is continuous if and only if ran(A) is closed.
Proof. (a) First, suppose that A has a linear right inverse B : ran(A) → X such
that B ◦ A is bounded. For any x ∈ X, we have (B ◦ A)²(x) = B ◦ (A ◦ B)(A(x)) =
(B ◦ A)(x). Hence, B ◦ A is a bounded linear projection. This implies the topological
decomposition X = ran(B ◦ A) ⊕ ker(B ◦ A) with closed subspaces ran(B ◦ A)
and ker(B ◦ A). It holds ker(B ◦ A) ⊇ ker(A) = ker(A ◦ B ◦ A) ⊇ ker(B ◦ A),
which shows that ker(A) = ker(B ◦ A) is complemented. Conversely, let ker(A) be
complemented, and write X = X₁ ⊕ ker(A). Then A_res : X₁ → ran(A) is bijective
and therefore has a linear inverse A_res^{-1}, which defines a desired right inverse for A.
(b) For any continuous right inverse, ran(A) is closed according to Proposition 1.
Conversely, let B : ran(A) → X be a linear right inverse such that B ◦ A is bounded
and ran(A) is closed. In particular, ker(A) is complemented, and we can write
X = X₁ ⊕ ker(A). The restricted mapping A_res : X₁ → ran(A) is bijective and therefore
has a bounded inverse according to the bounded inverse theorem. This implies that
B = (B ◦ A) ◦ A_res^{-1} is bounded, too.
Example 1 (Bounded linear operator without linear right inverse). Consider the set
c₀(N) of all sequences converging to zero as a subspace of the space ℓ^∞(N) of all
bounded sequences x : N → R with the supremum norm ‖x‖_∞ := sup_{n∈N} |x(n)|.
Note that c₀(N) ⊆ ℓ^∞(N) is a classic example of a closed subspace that is not
complemented in a Banach space, as first shown in Phillips (1940). Now consider
the quotient space Y = ℓ^∞(N)/c₀(N), where elements in ℓ^∞(N) are identified
if their difference is contained in c₀(N). Then the quotient map A : ℓ^∞(N) →
Y : x ↦ [x] is clearly linear, bounded, and onto with ker(A) = c₀(N). It is clear
that a right inverse of A exists, which can be constructed by simply choosing any
representative of [x]. However, because c₀(N) is not complemented, the kernel of A
is not complemented, and according to Proposition 2, no linear right inverse B of A
exists such that B ◦ A is bounded.
(a) A has a unique linear right inverse A† : ran(A) → X with A† ◦ A = Id − P_ker(A).
(b) For all y ∈ ran(A): A†(y) = arg min {‖x‖ | A(x) = y}.
(c) A† is continuous if and only if ran(A) is closed.
(d) If ran(A) is non-closed, then any right inverse is discontinuous.
In the case that X and Y are both Hilbert spaces, there is a unique extension
A† : ran(A) ⊕ ran(A)^⊥ → X such that A†(y₁ ⊕ y₂) = A†(y₁) for all y₁ ⊕ y₂ ∈
ran(A) ⊕ ran(A)^⊥. The operator A† is referred to as the Moore–Penrose inverse of
A. For more background on generalized inverses in Hilbert and Banach spaces,
see Nashed (1987).
Regularization Methods
is a regularization method for (1) on the signal class M^* with respect to the norm
distance d_Y. One calls α₀ an a-priori parameter choice over the set M^*.
Proof. For any ε > 0, choose α(ε) such that ‖B_{α(ε)}(y) − x‖ ≤ ε/2 for all x ∈ M^*.
Moreover, choose τ(ε) such that for all z ∈ Y with ‖y − z‖ ≤ τ(ε), we have
‖B_{α(ε)}(y) − B_{α(ε)}(z)‖ ≤ ε/2. Without loss of generality, we can assume that τ(ε) is
strictly increasing and continuous with τ(0+) = 0. We define α₀ := α ◦ τ^{-1}. Then,
for every δ > 0 and ‖y − y^δ‖ ≤ δ,
$$\|B_{\alpha_0(\delta)}(y^\delta) - x\| \le \|B_{\alpha_0(\delta)}(y) - x\| + \|B_{\alpha_0(\delta)}(y) - B_{\alpha_0(\delta)}(y^\delta)\| = \|B_{\alpha\circ\tau^{-1}(\delta)}(y) - x\| + \|B_{\alpha\circ\tau^{-1}(\delta)}(y) - B_{\alpha\circ\tau^{-1}(\delta)}(y^\delta)\|$$
Deep Learning
In this subsection, we give a brief review of neural networks and deep learning.
Deep learning can be characterized as the field where deep neural networks are used
to solve various learning problems (LeCun et al. 2015; Goodfellow et al. 2016).
Several such methods recently appeared as a new paradigm for solving inverse
problems. In deep learning literature, neural networks are often formulated in a finite
dimensional setting. To allow a unified treatment, we consider here a general setting,
including the finite dimensional as well as the infinite-dimensional setting.
$$\mathcal{R}_N\colon \Theta \to [0,\infty]\colon \theta \mapsto \frac{1}{N}\sum_{i=1}^{N} L\big(\Phi_\theta(y_i), x_i\big) + P(\theta)\,. \qquad (5)$$
Here L : X × X → R is the so-called loss function, which is a measure for the error
made by the function Φ_θ on the training examples, and P is a penalty that prevents
overfitting of the network and also stabilizes the training process.
Both the numerical minimization of the functional (5) and the investigation of the
properties of θ^* as N → ∞ are of interest in their own right (Glorot and Bengio 2010; Chen
et al. 2018) but are not the subject of our analysis. Instead, most of the theory in this chapter is
developed under the assumption of a suitably trained prediction function.
Example 3 (CNNs using sparsity and weight sharing). In order to reduce the
number of free parameters of a linear mapping between images, say of sizes
q = n × n and p = n × n, CNNs implement sparsity and weight sharing via
convolution operators. In fact, a convolution operation K : R^{n×n} → R^{n×n} with
kernel size k × k is represented by k² numbers, which enormously reduces the
number n⁴ of parameters required to represent a general linear mapping on R^{n×n}.
To enrich the expressive power of the neural network, actual CNN architectures use
multiple-input multiple-output convolutions K : R^{n×n×c} → R^{n×n×d}, which use
one convolution kernel for each pair of input and output channels in {1, . . . , c} × {1, . . . , d}.
This increases the number of learnable parameters to cdk², but overall the number
of parameters remains much smaller than for a full dense layer between large images.
Moreover, the use of multiple-input multiple-output convolutions in combination with
typical nonlinearities introduces a flexible and complex structure, which has been
demonstrated to give state-of-the-art results in various imaging tasks.
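The parameter counts from Example 3 can be made concrete with a two-line computation; the numbers below (image size, channel counts, kernel size) are illustrative.

```python
n, c, d, k = 256, 16, 32, 3        # image side length, input/output channels, kernel size

dense_params = (n * n) * (n * n)   # general linear map between n x n images: n^4 weights
conv_params = c * d * k * k        # multi-channel convolution: one k x k kernel per channel pair

print(f"dense layer: {dense_params:,} parameters")   # 4,294,967,296
print(f"conv layer : {conv_params:,} parameters")    # 4,608
```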
Regularizing Networks
The results in this section generalize the methods and some of the results of Schwab
et al. (2019) from the Hilbert case to the Banach space case.
Null-Space Networks
$$A \circ \big(\mathrm{Id}_X + P_{\ker(A)}\circ N_\theta\big)(x) = A(x) + \big(A\circ P_{\ker(A)}\circ N_\theta\big)(x) = y\,. \qquad (7)$$
Remark 1 (Computation of the projection layer). One of the main ingredients in the
null-space network is the computation of the projection layer P_ker(A). In some cases,
it can be computed explicitly. For example, if A = S_I ◦ F is the subsampled Fourier
transform, then P_ker(A) = F^* ◦ (Id − S_I^* ◦ S_I) ◦ F. For a general forward operator between
Hilbert spaces, the projection P_ker(A) z can be implemented via standard methods
for solving linear equations. For example, using the starting value z and solving the
equation A(x) = 0 with the CG (conjugate gradient) method for the normal equations
or with Landweber's method gives a sequence that converges to the projection P_ker(A) z =
arg min {‖x − z‖ | A(x) = 0}.
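For the Fourier example in Remark 1, the projection onto ker(A) can be sketched with a masked FFT, assuming A = S_I ◦ F keeps only the Fourier coefficients indexed by a set I; the random sampling mask, the image size, and the use of complex-valued images are illustrative simplifications.

```python
# Sketch of P_ker(A) for a subsampled Fourier operator A = S_I o F: frequencies in the
# sampled index set I span ker(A)^perp, so projecting onto ker(A) zeroes them out.
import numpy as np

n = 128
rng = np.random.default_rng(0)
sampled = rng.random((n, n)) < 0.3          # index set I of measured frequencies

def project_onto_kernel(z):
    """Orthogonal projection of z onto ker(A) = {x : (Fx) vanishes on I}."""
    Z = np.fft.fft2(z, norm="ortho")
    Z[sampled] = 0.0                        # remove the components seen by A
    return np.fft.ifft2(Z, norm="ortho")

z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
p = project_onto_kernel(z)
assert np.allclose(project_onto_kernel(p), p)   # a projection is idempotent
```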
An example comparing a standard residual network Id_X + N_θ and a null-space
network Id_X + P_ker(A) ◦ N_θ, both with two weight layers, is shown in Fig. 1.
(i) Φ_θ ◦ B is continuous
(ii) B is continuous
(iii) ran(A) is closed.
$$\mathcal{R}_N\colon \Theta \to [0,\infty]\colon \theta \mapsto \frac{1}{N}\sum_{n=1}^{N} \big\| x_n - \Phi_\theta(z_n)\big\|^2 + \beta\,\|\theta\|_2^2\,. \qquad (9)$$
Note that for our analysis, it is not required that (9) is exactly minimized. Any null-space
network Φ_θ for which $\sum_{n=1}^{N} \|x_n - \Phi_\theta(z_n)\|^2$ is small yields a right inverse Φ_θ ◦ B
that does a better job in estimating x_n from data A x_n than the original right inverse B.
Fig. 2 Linear regularization (R_δ)_{δ>0} combined with a null-space network Φ_θ = Id + P_ker(A) ◦ N_θ. We start with a linear regularization R_δ y, and the null-space network Φ_θ = Id + P_ker(A) ◦ N_θ adds missing parts along the null space ker(A)
Convergence Analysis
$$\big\|x - \Phi_\theta\circ R_\delta(y^\delta)\big\| = \big\|(\mathrm{Id} + P_{\ker(A)}\circ N_\theta)\big(B\circ A(x)\big) - (\mathrm{Id} + P_{\ker(A)}\circ N_\theta)\big(R_\delta(y^\delta)\big)\big\| \le L\,\big\|(B\circ A)(x) - R_\delta(y^\delta)\big\|\,.$$
Here, we have used the identity x = (Id + P_ker(A) ◦ N_θ)((B ◦ A)(x)) for x ∈ ran(Φ_θ).
Consequently
In particular, (Φ_θ ◦ R_δ)_{δ>0} is a regularization method for (1) on Φ_θ(U^*) with respect
to the similarity measure D.
Extensions
Let us recall that convex variational regularization of the inverse problem (1)
consists in minimizing the generalized Tikhonov functional D(A(x), yδ ) + αR(x),
where R is a convex functional and D a similarity measure (see section “Regular-
ization Methods”). The regularization term R is traditionally a semi-norm defined
on a dense subspace of X. In this section, we will extend this setup by using deep
learning techniques with learned regularization functionals.
inverse problem for two reasons. The first reason is that it aligns with the inverse
problem (and better serves any solution approach). Secondly, in medical imaging
applications, the training signals are often not the ground truth signals. They are
normally obtained with a reconstruction method from high-quality data. Therefore,
while training the regularizer, one should also keep in mind the reconstruction
mechanism of the training data. A possible approach is to first train a baseline
neural network to learn a model-independent representation. Then an additional block
is added on top to train for a model-dependent representation. This has been shown in
Obmann et al. (2020a) to be a very efficient strategy.
Let us mention that approach (T1) has a richer literature than (T2) but less
(convergence) analysis. In this section, we focus more on (T2), for which we establish
the convergence analysis and convergence rates in section “Convergence Analysis”.
This is an extension of our works Haltmeier et al. (2019) and Obmann et al. (2020a).
In section “Related Methods”, we review a few existing methods that are most
relevant to our discussion, including some works following approach (T1). We also propose
INDIE, which can be regarded as an operator inversion-free variant of the MODL
technique (Aggarwal et al. 2018) and can make better use of parallel computation.
Convergence Analysis
Analysis for regularization with neural networks has been studied in Li et al. (2020)
and Haltmeier et al. (2019). In this section, we further investigate the issue. To
this end, we consider the approach (T2), where the neural networks are trained
independently of the optimization problem (11). That is, θ = θ ∗ is already fixed
a priori. For the sake of simplicity, we will drop θ from the notation of Rθ and Dθ .
We focus on how the problem depends on the regularization parameter α and noise
level δ in the data. Such analysis in standard situations is well studied; see, e.g.,
Scherzer et al. (2009). However, we need to extend the analysis to more general
cases to accommodate the fact that R comes from a neural network and is likely
non-convex.
Let us make several assumptions on the regularizer and fidelity term.
Condition 3.
For (A1), the coercivity condition (c) is the most restrictive. However, it can be
accommodated. One such regularizer is proposed in our recent work Haltmeier et al.
(2019) as follows:
$$R(x) = \phi\big(E(x)\big) + \frac{\beta}{2}\,\big\|x - (D\circ E)(x)\big\|_2^2\,. \qquad (12)$$
Here, D ◦ E : X → X is an encoder–decoder network. The regularizer R is designed to enforce
that a reasonable solution x satisfies x ≈ (D ◦ E)(x) and that φ(E(x)) is small. The term
φ(E(x)) implements learned prior knowledge, which is normally a sparsity measure
in a non-linear basis. The second term ‖x − (D ◦ E)(x)‖₂² forces x to be close to the data
manifold M. Their combination also guarantees the coercivity of the regularization
functional R. Another choice for R was suggested in Li et al. (2020).
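A minimal sketch of evaluating a regularizer of the form (12): untrained stand-in networks play the roles of the encoder E and decoder D, and an ℓ¹ norm stands in for the functional φ; none of these concrete choices are prescribed by the chapter.

```python
import torch

n, latent = 64, 16
E = torch.nn.Sequential(torch.nn.Linear(n, latent), torch.nn.ReLU())   # encoder stand-in
D = torch.nn.Linear(latent, n)                                         # decoder stand-in
beta = 1.0

def learned_regularizer(x):
    code = E(x)
    sparsity = torch.sum(torch.abs(code))                    # phi(E(x)): here an l1 measure
    closeness = 0.5 * beta * torch.sum((x - D(code)) ** 2)   # (beta/2) ||x - (D o E)(x)||_2^2
    return sparsity + closeness

x = torch.randn(n)
print(float(learned_regularizer(x)))
```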
For (A2), C is a conic set in Y. For any y ∈ C, we define dom(D(y, ·)) =
{y' | D(y, y') < ∞}. The data consistency conditions in (A2) are flexible enough
to be satisfied by a few interesting cases. The first example is D(y, y') = ‖y − y'‖²,
which is probably the most popular data consistency measure. Another case
is the Kullback–Leibler divergence, which reads as follows. Let Y = R^n, and let A :
X → Y be a bounded linear positive operator.¹ Consider the nonnegative cone C =
{(y_1, . . . , y_n) | ∀i : y_i ≥ 0}. We define D : C × C → [0, ∞] by
$$D(y, y') = \sum_{i=1}^{n}\Big( y_i\,\log\frac{y_i}{y'_i} + y'_i - y_i\Big).$$
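A direct transcription of this data consistency measure, using the usual conventions 0 · log 0 = 0 and D = +∞ whenever y_i > 0 but y'_i = 0; the function name and example values are illustrative.

```python
import numpy as np

def kl_data_fit(y, y_prime):
    """D(y, y') = sum_i ( y_i * log(y_i / y'_i) + y'_i - y_i ) on the nonnegative cone."""
    total = 0.0
    for yi, ypi in zip(y, y_prime):
        if yi > 0 and ypi == 0:
            return np.inf                      # outside the domain dom(D(y, .))
        if yi > 0:
            total += yi * np.log(yi / ypi)     # convention: 0 * log(0 / t) = 0
        total += ypi - yi
    return total

print(kl_data_fit([1.0, 2.0, 0.0], [1.5, 2.0, 0.5]))
```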
(a) Existence: For all y ∈ C and α > 0, there exists a minimizer of Ty,α in D.
1A is positive if: y ≥ 0 ⇒ Ay ≥ 0.
(b) Stability: If yk → y, D(y, yk ) < ∞ and xk ∈ arg min Tα;yk , then weak
accumulation points of (xk )k∈N exist and are minimizers of Tα;y .
(c) Convergence: Let y ∈ ran(A) ∩ C and (yk )k∈N satisfy D(y, yk ) ≤ δk for some
sequence (δk )k∈N ∈ (0, ∞)N with δk → 0. Suppose xk ∈ arg minx Tyk ,α(δk ) (x),
and let the parameter choice α : (0, ∞) → (0, ∞) satisfy
$$\lim_{\delta\to 0}\alpha(\delta) = \lim_{\delta\to 0}\frac{\delta}{\alpha(\delta)} = 0\,. \qquad (14)$$
Proof. (a) First, we observe that c := inf_x T_{y,α}(x) ≤ T_{y,α}(0) < ∞. Let (x_k)_k be
a sequence such that T_{y,α}(x_k) → c. There exists M > 0 such that T_{y,α}(x_k) ≤ M,
which implies αR(x_k) ≤ M. Since R is coercive, we obtain that (x_k)_k is bounded.
By passing to a subsequence, x_{k_i} ⇀ x^* ∈ D. Due to the lower semi-continuity of
T_{y,α}, we have x^* ∈ arg min T_{y,α}.
(b) Since x_k ∈ arg min T_{y_k,α}, it holds T_{y_k,α}(x_k) ≤ T_{y_k,α}(0) = D(0, y_k) + αR(0).
Thanks to the continuity of D(0, ·) on C, (D(0, y_k))_k is a bounded sequence.
Therefore, αR(x_k) ≤ T_{y_k,α}(x_k) ≤ M for a constant M independent of k. Since
R is coercive, (x_k)_k is bounded and hence has a weakly convergent subsequence
x_{k_i} ⇀ x^†.
Let us now prove that x^† is a minimizer of T_{y,α}. Since T_{y,α}(x) is lower semi-continuous in x and y,
$$T_{y,\alpha}(x^\dagger) \le \liminf_{i\to\infty} T_{y_{k_i},\alpha}(x_{k_i}). \qquad (15)$$
On the other hand, let x ∈ D be such that T_{y,α}(x) < ∞. We obtain D(A(x), y) < ∞
and R(x) < ∞. Condition (A2)(d) and D(y, y_k) < ∞ give D(A(x), y_k) < ∞.
That is, y_k ∈ dom(D(A(x), ·)). The continuity of D(A(x), ·) on its domain implies
D(A(x), y_k) → D(A(x), y). Since x_k is a minimizer of T_{y_k,α}, we have T_{y_k,α}(x_k) ≤
T_{y_k,α}(x). Taking the limit, we obtain lim sup_k T_{y_k,α}(x_k) ≤ T_{y,α}(x). From (15),
T_{y,α}(x^†) ≤ T_{y,α}(x) for any x ∈ D. We conclude that x^† is a minimizer of T_{y,α}.
(c) We prove the properties item by item.
The function F is called totally non-linear at x if νF (x, t) > 0 for all t ∈ (0, ∞).
The following result, due to Li et al. (2020), connects convergence in the absolute
Bregman distance with convergence in norm.
We now focus on the convergence rate. To this end, we make the following
assumptions:
The most restrictive condition in the above list is that A has finite-dimensional
range. However, this assumption holds true in practical applications such as sparse
data tomography, which is the main focus of deep learning techniques for inverse
problems. For a result in infinite-dimensional spaces, see Li et al. (2020).
We start our analysis with the following result.
The proof follows Obmann et al. (2020a). We present it here for the sake of
completeness.
Proof. Let us first prove that, for some constant γ ∈ (0, ∞), it holds
$$\langle R'(x^\dagger),\, x^\dagger - x\rangle \le \gamma\, \big\|A(x^\dagger) - A(x)\big\| \quad \text{for all } x \in \mathcal{D}.$$
Indeed, let P be the orthogonal projection onto ker(A) and define x0 = (x† −
P(x† )) + P(x). Then, we have A(x0 ) = A(x† ) and x − x0 ∈ ker(A)⊥ . Since the
restricted operator A|ker(A)⊥ : ker(A)⊥ → Y is injective and has finite-dimensional
range, it is bounded from below by a constant γ0 . Therefore,
$$\langle R'(x^\dagger), x^\dagger - x\rangle = \langle R'(x^\dagger), x^\dagger - x_0\rangle + \langle R'(x^\dagger), x_0 - x\rangle \le \langle R'(x^\dagger), x_0 - x\rangle \le \big\|R'(x^\dagger)\big\|\,\big\|x_0 - x\big\|.$$
$$\cdots + C\alpha\,\big\|A(x_\alpha^\delta) - A(x^\dagger)\big\| \le \delta^2 + C\alpha\delta - D\big(A(x_\alpha^\delta), y^\delta\big) + C\alpha\,\big\|A(x_\alpha^\delta) - y^\delta\big\| \le \delta^2 + C\alpha\delta - D\big(A(x_\alpha^\delta), y^\delta\big) + C\alpha\,\sqrt{D\big(A(x_\alpha^\delta), y^\delta\big)}\,.$$
Related Methods
$$T_{y,\alpha}(x) = \sum_{c=1}^{N_c} T_c(x) := \sum_{c=1}^{N_c}\sum_{j}\sum_{i} \phi_i^c\big((\bar K_i^c x)_j\big) + \alpha \sum_{c=1}^{N_c}\sum_{j}\sum_{i} \psi_i^c\big(\big(K_i^c(A(x) - y)\big)_j\big),$$
where $\bar K_i^c$ and $K_i^c$ are learnable convolution operators and $\phi_i^c$, $\psi_i^c$ are learnable
functionals. An alternating gradient descent method for minimizing $T_{y,\alpha}$ provides the
update formula
Network cascades: Deep network cascades (Kofler et al. 2018; Schlemper et al.
2017) alternate between the application of post-processing networks and so-called
data consistency layers. The data consistency condition proposed in Kofler et al.
(2018) for sparse data problems A = S ◦ AF , where S is a sampling operator and AF
a full data forward operator (such as the fully sampled Radon transform), takes the
form
$$x_{n+1} = B_F\Big( \arg\min_{z}\; \|z - A_F(N_{\theta(n)}(x_n))\|_2^2 + \alpha\,\|y - S(z)\|_2^2 \Big), \qquad (23)$$
Concatenating these steps together, one arrives at a deep neural network. Similar to
network cascades, each block (24) consists of a trainable layer $z_n = A^* y + \alpha N_\theta(x_n)$
and a non-trainable data consistency layer $x_{n+1} = (A^* A + \lambda\,\mathrm{Id})^{-1}(z_n)$.
Here the constant C > 0 is an upper bound for the operator norm of A. Elementary
manipulations show the identity
$$L_n(x) = (\alpha + C)\,\|x\|^2 - 2\,\big\langle A^*(y - A(x_n)) + \alpha N_\theta(x_n) + C x_n,\; x\big\rangle + \mathrm{const},$$
whose minimizer is given by
$$x_{n+1} = \frac{1}{\alpha + C}\,\Big( A^*(y - A(x_n)) + \alpha N_\theta(x_n) + C x_n\Big). \qquad (25)$$
This results in a deep neural network similar to the MODL iteration. However, each
block in (25) is clearly simpler than the blocks in (24). In fact, as opposed to MODL,
our proposed learned iterative scheme does not require costly matrix inversion. We
name the resulting iteration INDIE (for inversion-free deep iterative) cascades. We
consider the numerical comparison of MODL and INDIE as well as the theoretical
analysis of both architectures to be interesting lines of future research.
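A sketch of the INDIE update (25) on a toy problem, with a random matrix as forward operator and an untrained map standing in for the learned network N_θ; the choice C = ‖A‖² below is a conservative stand-in that keeps this toy iteration contractive and, like all other concrete values, is an illustrative assumption.

```python
# Sketch of the inversion-free update (25):
#   x_{n+1} = (1/(alpha + C)) * ( A^*(y - A x_n) + alpha * N_theta(x_n) + C * x_n ).
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(m)

alpha = 0.1
C = np.linalg.norm(A, 2) ** 2                       # illustrative bound related to ||A||
W = rng.standard_normal((n, n)) / n                 # stand-in weights for N_theta
N_theta = lambda x: np.tanh(W @ x)

x = np.zeros(n)
for _ in range(100):
    x = (A.T @ (y - A @ x) + alpha * N_theta(x) + C * x) / (alpha + C)
print(np.linalg.norm(A @ x - y))                    # data misfit after the iterations
```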
Learned synthesis regularization: Let us finish this section by pointing out that
regularization by neural networks is not restricted to the form (11). For example, one
can consider the synthesis version, which reads (Obmann et al. 2020b)
$$x_{\mathrm{syn}} = D_\theta\Big(\arg\min_{\xi}\; \|A\circ D_\theta(\xi) - y\|^2 + \alpha\sum_{\lambda\in\Lambda}\omega_\lambda\,|\xi_\lambda|^p\Big), \qquad (26)$$
Conclusion and Outlook
Inverse problems are central to solving a wide range of important practical problems
within and outside of imaging and computer vision. Inverse problems are char-
acterized by the ambiguity and instability of their solution. Therefore, stabilizing
solution methods based on regularization techniques is necessary to solve them in
a reasonable way. In recent years, neural networks and deep learning have emerged
as the rising stars for the solution of inverse problems. In this chapter, we have
developed the mathematical foundations for solving inverse problems with deep
learning. In addition, we have shown stability and convergence for selected neural
networks to solve inverse problems. The investigated methods, which combine
the strengths of both worlds, are regularizing null-space networks and the NETT
(Network-Tikhonov) approach for inverse problems.
References
Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Probl. 10(6), 1217–1229 (1994)
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33(12), 124007 (2017)
Aggarwal, H.K., Mani, M.P., Jacob, M.: MoDL: model-based deep learning architecture for inverse
problems. IEEE Trans. Med. Imaging 38(2), 394–405 (2018)
Agostinelli, F., Hoffman, M., Sadowski, P., Baldi, P.: Learning activation functions to improve
deep neural networks. arXiv:1412.6830 (2014)
Aljadaany, R., Pal, D.K., Savvides, M.: Douglas-Rachford networks: learning both the image prior
and data fidelity terms for blind image deconvolution. In: Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, pp. 10235–10244 (2019)
Arridge, S., Maass, P., Öktem, O., Schönlieb C.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives.
IEEE Trans. Pattern Anal. 35(8), 1798–1828 (2013)
Boink, Y.E., Haltmeier, M., Holman, S., Schwab, J.: Data-consistent neural networks for solving
nonlinear inverse problems. arXiv:2003.11253 (2020), to appear in Inverse Probl. Imaging
Bora, A., Jalal, A., Price, E., Dimakis, A.G.: Compressed sensing using generative models. In:
Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 537–546
(2017)
Brosch, T., Tam, R., et al.: Manifold learning of brain MRIs by deep learning. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 633–640.
Springer (2013)
Bubba, T.A., Kutyniok, G., Lassas, M., Maerz, M., Samek, W., Siltanen, S., Srinivasan, V.:
Learning the invisible: a hybrid deep learning-shearlet framework for limited angle computed
tomography. Inverse Probl. 35(6), 064002 (2019)
Chen, D., Davies, M.E.: Deep decomposition learning for inverse imaging problems. In European
Conference on Computer Vision, pp. 510–526. Springer, Cham (2020)
Chen, T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations.
In: Advances in Neural Information Processing Systems, pp. 6571–6583 (2018)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Dittmer, S., Maass, P.: A projectional ansatz to reconstruction. arXiv:1907.04675 (2019)
Dittmer, S., Kluth, T., Maass, P., Baguer, D.O.: Regularization by architecture: a deep prior
approach for inverse problems. J. Math. Imaging Vis. 62, 456–470 (2020)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, vol. 375. Kluwer
Academic Publishers Group, Dordrecht (1996)
Georg, M., Souvenir, R., Hope, A., Pless, R.: Manifold learning for 4D CT reconstruction of the
lung. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
pp. 1–8. IEEE (2008)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks.
In: Proceedings of 13th International Conference on Artificial Intelligence and Statistics,
pp. 249–256 (2010)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, London (2016)
Grasmair, M., Haltmeier, M., Scherzer, O.: Sparse regularization with ℓq penalty term. Inverse
Probl. 24(5), 055020 (2008)
Han, Y., Yoo, J.J., Ye, J.C.: Deep residual learning for compressed sensing CT reconstruction via
persistent homology analysis (2016). http://arxiv.org/abs/1611.06391
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778
(2016)
Huang, Y., Preuhs, A., Manhart, M., Lauritsch, G., Maier, A.: Data consistent CT reconstruction
from insufficient data with learned prior images. arXiv:2005.10034 (2020)
Ivanov, V.K., Vasin, V.V., Tanana, V.P.: Theory of Linear Ill-Posed Problems and Its Applications.
Inverse and Ill-Posed Problems Series, 2nd edn. VSP, Utrecht, (2002). Translated and revised
from the 1978 Russian original
Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse
problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
Kobler, E., Klatzer, T., Hammernik, K., Pock, T.: Variational networks: connecting variational
methods and deep learning. In: German Conference on Pattern Recognition, pp. 281–293.
Springer (2017)
Kofler, A., Haltmeier, M., Kolbitsch, C., Kachelrieß, M., Dewey, M.: A U-Nets cascade for sparse
view computed tomography. In: Proceedings of 1st Workshop on Machine Learning for Medical
Image Reconstruction, pp. 91–99. Springer (2018)
Kofler, A., Haltmeier, M., Schaeffter, T., Kachelrieß, M., Dewey, M., Wald, C., Kolbitsch, C.:
Neural networks-based regularization of large-scale inverse problems in medical imaging. Phys.
Med. Biol. 65, 135003 (2020)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Li, H., Schwab, J., Antholzer, S., Haltmeier, M.: NETT: solving inverse problems with deep neural
networks. Inverse Probl. 36, 065005 (2020)
Lindenstrauss, J., Tzafriri, L.: On the complemented subspaces problem. Israel J. Math. 9(2),
263–269 (1971)
Lunz, S., Öktem, O., Schönlieb, C.: Adversarial regularizers in inverse problems. In: Advances in
Neural Information Processing Systems, vol. 31, pp. 8507–8516 (2018)
Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S.S., Zaharchuk, G., Xing, L., Pauly, J.M.: Deep
generative adversarial neural networks for compressive sensing MRI. IEEE Trans. Med. Imag.
38(1), 167–179 (2018)
Nashed, M.Z.: Inner, outer, and generalized inverses in Banach and Hilbert spaces. Numer. Func.
Anal. Opt. 9(3–4), 261–325 (1987)
Obmann, D., Nguyen, L., Schwab, J., Haltmeier, M.: Sparse aNETT for solving inverse problems
with deep learning. In 2020 IEEE 17th International Symposium on Biomedical Imaging
Workshops (ISBI Workshops) (pp. 1–4). IEEE (2020a)
Obmann, D., Schwab, J., Haltmeier, M.: Deep synthesis network for regularizing inverse problems.
Inverse Problems, 37(1), 015005 (2020b)
Obmann, D., Nguyen, L., Schwab, J., Haltmeier, M.: Augmented NETT regularization of inverse
problems. J. Phys. Commun. 5(10), 105002 (2021)
Phillips, R.S.: On linear transformations. Trans. Am. Math. Soc. 48(3), 516–541 (1940)
Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. arXiv:1710.05941
(2017)
Resmerita, E., Anderssen, R.S.: Joint additive Kullback–Leibler residual minimization and
regularization for linear inverse problems. Math. Methods Appl. Sci. 30(13), 1527–1544 (2007)
Roth, S., Black, M.J.: Fields of experts: a framework for learning image priors. In: Proceedings of
the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2,
pp. 860–867. IEEE (2005)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2009)
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A., Rueckert, D.: A deep cascade of convolutional
neural networks for MR image reconstruction. In: Proceedings of Information Processing in
Medical Imaging, pp. 647–658. Springer (2017)
Schwab, J., Antholzer, S., Haltmeier, M.: Deep null-space learning for inverse problems:
convergence analysis and rates. Inverse Probl. 35(2), 025008 (2019)
Schwab, J., Antholzer, S., Haltmeier, M.: Big in Japan: regularizing networks for solving inverse
problems. J. Math. Imaging Vis. 62, 445–455 (2020)
Sulam, J., Aberdam, A., Beck, A., Elad, M.: On multi-layer basis pursuit, efficient algorithms
and convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 1968–1980
(2019)
Ulyanov, D., Vedaldi, A., Lempitsky, V.: Deep image prior. In: Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, pp. 9446–9454 (2018)
Van Veen, D., Jalal, A., Soltanolkotabi, M., Price, E., Vishwanath, S., Dimakis, A.G.: Compressed
sensing with deep image prior and learned regularization. arXiv:1806.06438 (2018)
Wachinger, C., Yigitsoy, M., Rijkhorst, E., Navab, N.: Manifold learning for image-based breathing
gating in ultrasound and MRI. Med. Image Anal. 16(4), 806–818 (2012)
Yang, Y., Sun, J., Li, H., Xu, Z.: Deep ADMM-net for compressive sensing MRI. In: Proceedings
of 30th International Conference on Neural Information Processing Systems, pp. 10–18 (2016)
Shearlets: From Theory to Deep Learning
30
Gitta Kutyniok
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096
The Applied Harmonic Analysis Viewpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096
Frame Theory Comes into Play . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1097
Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1098
From Wavelets to Shearlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1098
From Inverse Problems to Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099
Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100
Continuous Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100
Classical Continuous Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1101
Cone-Adapted Continuous Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1103
Resolution of the Wavefront Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1104
Discrete Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105
Cone-Adapted Discrete Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105
Frame Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106
Sparse Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109
Extensions of Shearlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1111
Higher Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1111
α-Molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1113
Universal Shearlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1114
Digital Shearlet Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116
Digital 2D Shearlet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116
Extensions of the Digital 2D Shearlet Transform and ShearLab3D . . . . . . . . . . . . . . . . . . 1118
Applications of Shearlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1119
Sparse Regularization Using Shearlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1120
Shearlets Meet Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1124
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1129
G. Kutyniok ()
Ludwig-Maximilians-Universität München, Mathematisches Institut, München, Germany
e-mail: [email protected]
Abstract
Keywords
Introduction
This can be regarded as an encoding step, often aiming to reveal important features
of the data f such as singularities by analyzing the associated coefficient sequence.
On the other hand, $(\psi_\lambda)_{\lambda\in\Lambda}$ can also serve as a means to expand the data by
representing it as
$$f = \sum_{\lambda\in\Lambda} c(f)_\lambda\, \psi_\lambda \quad \text{for all } f\in\mathcal{C}. \qquad (2)$$
Since efficient expansions are typically desirable, one usually aims for the coefficient
sequence $(c(f)_\lambda)_{\lambda\in\Lambda}$ to be sparse in the sense of rapid decay to allow efficient
encoding of the data f.
In case that $(\psi_\lambda)_{\lambda\in\Lambda}$ forms an orthonormal basis, it is well known that
$c(f)_\lambda = \langle f, \psi_\lambda\rangle$ for all $\lambda\in\Lambda$. However, it might not be possible to design an orthonormal
basis with the desirable properties, or redundancy is required for other reasons such as
robustness. This then leads to the notion of a frame, in which case (1)
and (2) cannot be linked that easily but require methods from frame theory.
The area of frame theory focuses on redundant representation systems in the sense
of nonunique expansions, thereby going beyond the concept of orthonormal bases.
It provides a general framework for redundant systems $(\psi_\lambda)_{\lambda\in\Lambda}$ while allowing one to
control their stability.
A system $(\psi_\lambda)_{\lambda\in\Lambda}$ is called a frame for $\mathcal{H}$ if there exist constants $0 < A \le B < \infty$ such that
$$A\,\|f\|^2 \le \sum_{\lambda\in\Lambda} \big|\langle f, \psi_\lambda\rangle\big|^2 \le B\,\|f\|^2 \quad\text{for all } f\in\mathcal{H}.$$
which is self-adjoint with spectrum σ(S) ⊂ [A, B]. The sequence $(\tilde\psi_\lambda)_{\lambda\in\Lambda} := (S^{-1}\psi_\lambda)_{\lambda\in\Lambda}$
is then referred to as the canonical dual frame. It allows reconstruction
of some f ∈ H from the decomposition (1) and the construction of an explicit
coefficient sequence in the expansion (2) by considering
$$f = \sum_{\lambda\in\Lambda} \langle f, \psi_\lambda\rangle\, \tilde\psi_\lambda \quad\text{and}\quad f = \sum_{\lambda\in\Lambda} \langle f, \tilde\psi_\lambda\rangle\, \psi_\lambda \quad \text{for all } f\in\mathcal{H},$$
respectively. The coefficient sequence $(\langle f, \tilde\psi_\lambda\rangle)_{\lambda\in\Lambda}$ can even be shown to be the
smallest in ℓ² norm among all possible ones.
For further information on frame theory, we refer to Casazza et al. (2012)
and Christensen (2003).
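In finite dimensions, these frame-theoretic objects can be written down directly; the following sketch builds a random redundant system, forms the frame operator S, and verifies the reconstruction formula with the canonical dual frame (all sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
d, num = 4, 7                              # a redundant system: 7 vectors in R^4
Psi = rng.standard_normal((num, d))        # rows are the frame vectors psi_lambda

S = Psi.T @ Psi                            # frame operator S f = sum <f, psi> psi
Psi_dual = Psi @ np.linalg.inv(S)          # rows are the canonical dual frame S^{-1} psi_lambda

f = rng.standard_normal(d)
coeffs = Psi_dual @ f                      # frame coefficients <f, psi~_lambda>
f_rec = Psi.T @ coeffs                     # expansion sum_lambda <f, psi~_lambda> psi_lambda
assert np.allclose(f, f_rec)               # perfect reconstruction from the dual coefficients
```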
Wavelets
One first highlight in applied harmonic analysis was the development of the system
of wavelets, based on translation (x → x −m) and dilation (x → 2j x) leading to the
representation of functions in L2 (Rd ) at different locations and different resolution
levels.
$$\big\{\psi^\ell_{j,m} = 2^{\frac{dj}{2}}\,\psi^\ell(2^j\,\cdot\; - m)\; :\; j\in\mathbb{Z},\; m\in\mathbb{Z}^d,\; \ell = 1,\ldots,L\big\}. \qquad (3)$$
This problem has led to the development of various novel anisotropic repre-
sentation systems within the area of applied harmonic analysis. Some of the key
contributions are steerable pyramid by Simoncelli et al. (1992), directional filter
banks by Bamberger and Smith (1992), 2D directional wavelets by Antoine et al.
(1993), curvelets by Candès and Donoho (2004), contourlets by Do and Vetterli
(2005), bandelets by Le Pennec and Mallat (2005), and shearlets (Guo et al. 2006;
Labate et al. 2005). Shearlet systems indeed satisfy all desiderata one commonly
requires from an anisotropic system as stated before.
The main application areas of shearlets are inverse problems, foremost in imaging.
A common approach to solve an ill-posed inverse problem Tf = g for a linear,
bounded operator T : H → H is by Tikhonov regularization. A generalization of
this conceptual approach to sparse regularization was suggested in Daubechies et al.
(2004). Given a representation system $(\psi_\lambda)_{\lambda\in\Lambda}$, an approximation of the solution
can be computed by minimizing the functional
$$\|Tf - g\|^2 + \beta\cdot\big\|(\langle f, \psi_\lambda\rangle)_{\lambda\in\Lambda}\big\|_1, \qquad (4)$$
with β being the regularization parameter. This approach exploits the fact that,
when carefully designing the system $(\psi_\lambda)_{\lambda\in\Lambda}$, the solution of Tf = g exhibits
a sparse coefficient sequence $(\langle f, \psi_\lambda\rangle)_{\lambda\in\Lambda}$. Exemplary general inverse problems are
inpainting (Genzel and Kutyniok 2014; King et al. 2014), morphological component
analysis (Donoho and Kutyniok 2013; Kutyniok and Lim 2012) and segmentation
(Häuser and Steidl 2013) or inverse problems from medical diagnosis such as
magnetic resonance imaging (Kutyniok and Lim 2018).
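A sketch of sparse regularization in the spirit of (4), solved by iterative soft-thresholding; a random matrix stands in for the operator T and a random orthonormal basis stands in for the shearlet system, so the functional is minimized only in this simplified, illustrative setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 32
T = rng.standard_normal((m, n)) / np.sqrt(m)          # ill-posed forward operator
W = np.linalg.qr(rng.standard_normal((n, n)))[0]      # orthonormal transform (shearlet stand-in)
f_true = W.T @ (rng.standard_normal(n) * (rng.random(n) < 0.1))   # sparse in the transform
g = T @ f_true + 0.01 * rng.standard_normal(m)

beta = 0.01
step = 1.0 / np.linalg.norm(T, 2) ** 2                # step size for the smooth part
soft = lambda c, t: np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

f = np.zeros(n)
for _ in range(500):
    f = f - step * (T.T @ (T @ f - g))                # gradient step on (1/2)||Tf - g||^2
    f = W.T @ soft(W @ f, step * beta)                # prox of beta*||Wf||_1 (W orthonormal)
print(np.linalg.norm(f - f_true))
```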
Recently, deep learning has swept the area of imaging science with deep
neural network-based approaches often outperforming the to-date state-of-the-art
algorithms. The last years though have shown that in fact hybrid methods, i.e.,
combinations of model-based and data-driven approaches, typically lead to the best
results by taking the best out of both worlds. Since the shearlet representation is
particularly well suited to analyze anisotropic features, several hybrid approaches
were suggested which combine the shearlet transform with deep neural networks
such as for limited-angle computed tomography (Bubba et al. 2019) as well as for
wavefront set and semantic edge detection (Andrade-Loarca et al. 2020, 2019).
In the following, we will provide an introduction to and a survey about the theory
and applications of shearlets. For additional information, we refer to Kutyniok and
Labate (2012).
Outline
We start by introducing the main notation and the definition of continuous shearlets.
Shearlet systems are composed of three operators, namely, scaling, shearing, and
translation, applied to a generating function, related to different resolution levels,
orientations, and positions, respectively. The term “continuous” indicates that
continuous parameter sets are considered. Notice that also the continuous shearlet
system and associated transform can be generalized in a canonical way to L2 (Rn )
for n ≥ 3 with the results from sections “Classical Continuous Shearlet Systems”
We will first present the classical version of continuous shearlet systems. For this,
let the parabolic scaling matrix Aa , a ∈ R∗ := R \ {0} and the shearing matrix Ss ,
s ∈ R, be given by
$$A_a = \begin{pmatrix} a & 0 \\ 0 & |a|^{1/2}\end{pmatrix} \quad\text{and}\quad S_s = \begin{pmatrix} 1 & s \\ 0 & 1\end{pmatrix}, \qquad (5)$$
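Written out numerically, the matrices in (5) and their action on a point look as follows; the sample values a = 4 and s = 0.5 are purely illustrative.

```python
import numpy as np

def A_parab(a):
    """Parabolic scaling matrix A_a from (5)."""
    return np.array([[a, 0.0], [0.0, np.sqrt(abs(a))]])

def S_shear(s):
    """Shearing matrix S_s from (5)."""
    return np.array([[1.0, s], [0.0, 1.0]])

x = np.array([1.0, 1.0])
print(S_shear(0.5) @ A_parab(4.0) @ x)   # shear applied after anisotropic (parabolic) scaling
```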
This is a locally compact group with left Haar measure dμ (a, s, t) = da/|a|3 dsdt
(Dahlke et al. 2009). The map from S into the group of unitary operators on
L2 (R2 ), U(L2 (R2 )), given by (a, s, t) → ψa,s,t can now be regarded as a unitary
representation of the shearlet group. This allows to analyze square-integrability of
this mapping, i.e., irreducibility and the existence of a nontrivial admissible function
ψ ∈ L2 (R2 ) which, for all f ∈ L2 (R2 ), satisfies the admissibility condition
$$\int_{\mathbb{S}} \big|\langle f, \psi_{a,s,t}\rangle\big|^2\, d\mu(a,s,t) < \infty.$$
This leads to the following result, which heavily relies on group theoretic
arguments:
is an isometry.
Let us now consider some examples of shearlets. The first and most extensively
studied shearlet is the so-called classical shearlet, which is a band-limited function
introduced in Labate et al. (2005). For an illustration of the support of the associated
Fourier transform, we refer to Fig. 2a.
Fig. 2 (a) Partitioning of Fourier domain by supports of several elements of the classical shearlet
system, with the support of the Fourier transform of the classical shearlet itself being highlighted.
(b) The partition of Fourier domain into four conic regions C1 – C4 and a centered rectangle
R = {(ξ1 , ξ2 ) : |ξ1 |, |ξ2 | ≤ 1} as the low-frequency regime
$$\sum_{k=-1}^{1}\big|\hat\psi_2(\xi + k)\big|^2 = 1 \quad \text{for a.e. } \xi \in [-1,1],$$
with $\hat\psi_2 \in C^\infty(\mathbb{R})$ and $\operatorname{supp}\hat\psi_2 \subseteq [-1, 1]$.
The associated transform can be defined in a similar manner as before in the pure
group-theoretic approach.
where (a, s, t) ∈ R∗ × R × R2 .
holds for some Ck > 0. The wavefront set W F (f ) is then defined as the
complement of the set of all regular directed points.
WF(f )c = D.
Discrete shearlet systems are derived by sampling the parameter set of continuous
shearlet systems. Thus, similar to continuous shearlet systems, both a “classical”
and a cone-adapted variant are available. Due to the fact that the first variant in the
discrete setting not only is incapable of detecting the horizontal direction precisely –
only asymptotically – but also faces numerical instabilities due to the occurrence of
arbitrarily small support sets, we will focus in the sequel only on the cone-adapted
variant.
The discretization of the parameter sets of parabolic scaling and shearing as defined
in (5) is typically performed by choosing A2j and Sk with j, k ∈ Z. Coorbit theory
(cf. section “Classical Continuous Shearlet Systems”) then yields the discretization
(for c ∈ (R₊)² to add flexibility)
$$\mathrm{SH}_{\phi,\psi,\tilde\psi} f(j,k,m,\iota) :=
\begin{cases}
\langle f, \psi_{j,k,m}\rangle\colon & \iota = -1,\ j\ge 0,\ |k|\le 2^{j/2},\ m\in M_c\,\mathbb{Z}^2,\\
\langle f, \phi_m\rangle\colon & \iota = 0,\ m\in c_1\mathbb{Z}^2,\\
\langle f, \tilde\psi_{j,k,m}\rangle\colon & \iota = 1,\ j\ge 0,\ |k|\le 2^{j/2},\ m\in \tilde M_c\,\mathbb{Z}^2.
\end{cases}$$
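The scale–shear index set of the cone-adapted discrete system can be enumerated directly; the sketch below lists the pairs (j, k) with |k| ≤ 2^{j/2} (rounded up to an integer) and builds the corresponding matrices A_{2^j} and S_k, while translations over the lattice M_c Z² are omitted.

```python
import numpy as np

def shearlet_index_set(num_scales):
    """All (scale, shear) pairs (j, k) with j >= 0 and |k| <= ceil(2^(j/2))."""
    pairs = []
    for j in range(num_scales):
        k_max = int(np.ceil(2 ** (j / 2)))
        pairs.extend((j, k) for k in range(-k_max, k_max + 1))
    return pairs

for j, k in shearlet_index_set(3):
    A2j = np.diag([2.0 ** j, 2.0 ** (j / 2)])     # parabolic scaling A_{2^j}
    Sk = np.array([[1.0, k], [0.0, 1.0]])         # shearing S_k
    # Sk @ A2j generates psi_{j,k,m} = 2^{3j/4} psi(S_k A_{2^j} . - m)

print(shearlet_index_set(3))
```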
Frame Properties
Fig. 3 (a) Partitioning of Fourier domain by the cone-adapted discrete shearlet system with
classical shearlets as generating functions. (b) One shearlet in spatial domain
Band-Limited Shearlets
Classical shearlets as introduced in Example 1 are the most well-known type
of band-limited shearlets. With slight modifications, the associated cone-adapted
discrete shearlet system forms a Parseval frame for L2 (R2 ).
Further, let ψ ∈ L²(R²) be a classical shearlet, let $\hat{\tilde\psi}(\xi_1, \xi_2) = \hat\psi(\xi_2, \xi_1)$, and let
φ ∈ L²(R²) be chosen so that, for a.e. ξ ∈ R²,
$$|\hat\phi(\xi)|^2 + \sum_{j\ge 0}\sum_{|k|\le 2^{j/2}} \big|\hat\psi\big(S_{-k}^{T} A_{2^{-j}}\xi\big)\big|^2 \chi_{\mathcal{C}}(\xi) + \sum_{j\ge 0}\sum_{|k|\le 2^{j/2}} \big|\hat{\tilde\psi}\big(\tilde S_{-k} \tilde A_{2^{-j}}\xi\big)\big|^2 \chi_{\tilde{\mathcal{C}}}(\xi) = 1.$$
Then the modified cone-adapted discrete shearlet system $\Phi(\phi) \cup P_{\mathcal{C}}\Psi(\psi) \cup P_{\tilde{\mathcal{C}}}\tilde\Psi(\tilde\psi)$
is a Parseval frame for L²(R²).
$$\operatorname*{ess\,inf}_{\xi\in\Omega_0} |\hat\phi(\xi)| > 0 \quad\text{and}\quad \operatorname*{ess\,inf}_{\xi\in\Omega_1} |\hat\psi(\xi)| > 0. \qquad (6)$$
$$\operatorname*{ess\,inf}_{\xi\in\mathbb{R}^2}\; |\hat\phi(\xi)|^2 + \sum_{j\ge 0}\sum_{|k|\le 2^{j/2}} \big|\hat\psi\big(S_k^T A_{2^{-j}}\xi\big)\big|^2 + \big|\hat{\tilde\psi}\big(\tilde S_k^T \tilde A_{2^{-j}}\xi\big)\big|^2 > 0.$$
The following result then proves that, provided the Fourier transforms of the scaling
function and shearlets decay fast enough with sufficient vanishing moments and
satisfy (6), we obtain a shearlet frame SH(φ, ψ, ψ̃; c).
$$|\hat\phi(\xi_1,\xi_2)| \le C_1 \cdot \min\{1, |\xi_1|^{-\gamma}\}\cdot \min\{1, |\xi_2|^{-\gamma}\} \quad\text{and}$$
$$|\hat\psi(\xi_1,\xi_2)| \le C_2 \cdot \min\{1, |\xi_1|^{\alpha}\}\cdot\min\{1, |\xi_1|^{-\gamma}\}\cdot\min\{1, |\xi_2|^{-\gamma}\}, \qquad (7)$$
for some positive constants C1 , C2 < ∞ and α > γ > 3. Define ψ̃(x1 , x2 ) =
ψ(x2 , x1 ) and assume that φ, ψ satisfy (6). Then, there exists some positive constant
c∗ such that SH(φ, ψ, ψ̃, c) forms a frame for L2 (R2 ) for any c = (c1 , c2 ) with
max{c1 , c2 } ≤ c∗ .
Sparse Approximation
Recalling the goal to derive suitable decompositions (1) and efficient representations
(2) of data, we will now show that within a certain model setting, shearlets can be
proven to serve for both tasks in an optimal way.
For this, we first focus on the approximation properties of shearlets and introduce
the related basic notions of approximation theory. Given a class of functions and
a representation system, one main goal of approximation theory is to analyze the
suitability of this system for uniformly approximating functions from this class.
This leads to the notion of best N-term approximation.
$$\sup_{f\in\mathcal{C}} \|f - f_N\|_{L^2} = O(N^{-\gamma}) \quad \text{as } N\to\infty$$
Definition 10. For fixed ν > 0, the class E2ν of cartoonlike functions is the set of
functions f : R2 → C of the form
f = f0 + f1 χB ,
Fig. 4 Example of a
cartoonlike function
where $B \subset [0,1]^2$ with ∂B being a closed C²-curve with curvature bounded by ν,
as well as $f_i \in C^2(\mathbb{R}^2)$ with $\operatorname{supp} f_i \subset [0,1]^2$ and $\|f_i\|_{C^2} \le 1$ for each i = 0, 1.
The optimal (sparse) approximation rate for cartoonlike functions was proven by
Donoho as well and can be stated in the situation of frames as follows. We wish to
emphasize that the original result is proven for more general function systems.
Theorem 5 (Donoho 2001). Let (ψλ )λ∈ be a frame for L2 (R2 ). Then the optimal
asymptotic approximation error of f ∈ E2ν is given by
$$\|f - f_N\|_{L^2} \le C\cdot N^{-1} \quad \text{as } N\to\infty,$$
This benchmark result allows to make the phrase “optimal sparse approximations
of cartoonlike functions” mathematically precise, namely, being justified in case a
representation system does satisfy this rate. Indeed, it can be proven that, under
weak assumptions on the generating functions, cone-adapted discrete shearlet
systems associated with compactly supported shearlets provide this optimal rate up
to a log-factor, which is typically assumed to be negligible.
Theorem 6 (Kutyniok and Lim 2011). Let c > 0, and let φ, ψ, ψ̃ ∈ L2 (R2 )
be compactly supported. Suppose that, in addition, for all ξ = (ξ1 , ξ2 ) ∈ R2 , the
shearlet ψ satisfies
where α > 5, γ ≥ 4, h ∈ L¹(R), and C₁ is a constant, and suppose that the shearlet
ψ̃ satisfies (i) and (ii) with the roles of ξ1 and ξ2 reversed. Further, suppose that
SH(φ, ψ, ψ̃; c) forms a frame for L2 (R2 ). Then, for any ν > 0, the shearlet frame
SH(φ, ψ, ψ̃; c) provides (almost) optimal sparse approximations of functions
f ∈ E2ν in the sense that there exists some C > 0 such that
$$\|f - f_N\|_{L^2} \le C\cdot N^{-1}\cdot(\log N)^{\frac{3}{2}} \quad \text{as } N\to\infty,$$
A similar result can also be derived in the setting of band-limited shearlets (Guo
and Labate 2007).
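The role of the coefficient decay in such approximation rates can be illustrated numerically: for a Parseval frame, the best N-term approximation error is bounded by the tail of the sorted squared coefficients. The synthetic coefficient sequence below, with decay n^{-3/2}, is only a stand-in for shearlet coefficients of a cartoonlike image and indeed yields an error decaying roughly like N^{-1}.

```python
import numpy as np

coeffs = np.arange(1, 10_001, dtype=float) ** -1.5     # idealized sorted coefficient magnitudes
for N in (10, 100, 1000):
    tail = np.sqrt(np.sum(coeffs[N:] ** 2))            # bound on ||f - f_N||_{L^2} (Parseval frame)
    print(N, tail)                                     # decays roughly like N^{-1}
```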
Extensions of Shearlets
Higher Dimensions
We will first describe the extension to the three-dimensional situation, i.e., to derive
a frame for L2 (R3 ). In this situation, the four cones will be replaced by six pyramids
again leading to a uniform way to treat the different directions. Accordingly, we
define paraboloidal scaling matrices A2j , Ã2j and Ă2j , j ∈ Z by
A2j =diag(2j , 2j/2 , 2j/2 ), Ã2j =diag(2j/2 , 2j , 2j/2 ), and Ă2j =diag(2j/2 , 2j/2 , 2j )
where
α-Molecules
P := R+ × T × R2 ,
Definition 12. Let α ∈ [0, 1], and let $(\Lambda, \Phi_\Lambda)$ be a parametrization. Then $(m_\lambda)_{\lambda\in\Lambda}$
is a system of α-molecules of order $(L, M, N_1, N_2) \in (\mathbb{Z}_+\cup\{\infty\})^2\times\mathbb{Z}_+^2$ if, for all λ ∈ Λ,
$$m_\lambda(x) = s_\lambda^{(1+\alpha)/2}\, a^{(\lambda)}\big( A_{\alpha, s_\lambda} R_{\theta_\lambda}(x - x_\lambda)\big), \qquad \Phi_\Lambda(\lambda) = (s_\lambda, \theta_\lambda, x_\lambda),$$
where the generators $a^{(\lambda)}$ satisfy, for all $|\beta| \le L$,
$$\big|\partial^\beta \hat a^{(\lambda)}(\xi)\big| \lesssim \min\Big\{1,\, s_\lambda^{-1} + |\xi_1| + s_\lambda^{-(1-\alpha)}|\xi_2|\Big\}^{M}\, \big(1 + |\xi|^2\big)^{-\frac{N_1}{2}}\, \big(1 + \xi_2^2\big)^{-\frac{N_2}{2}}.$$
We next state the key result enabling the transfer of sparse approximation results
from one system to all other systems within this framework for the same α. It
provides an estimate for the decay of the entries of the cross-Gramian matrix away
from the main diagonal, which requires an appropriate notion of distance. For this,
let $(\Lambda, \Phi_\Lambda)$ and $(\tilde\Lambda, \Phi_{\tilde\Lambda})$ be parametrizations. For $\lambda\in\Lambda$ and $\mu\in\tilde\Lambda$, we then
define the index distance by
$$\omega(\lambda,\mu) := \omega\big(\Phi_\Lambda(\lambda), \Phi_{\tilde\Lambda}(\mu)\big) := 2^{|s_\lambda - s_\mu|}\,\big(1 + 2^{\min(s_\lambda, s_\mu)}\, d(\lambda,\mu)\big),$$
where
$$d(\lambda,\mu) := |\theta_\lambda - \theta_\mu|^2 + |x_\lambda - x_\mu|^2 + \big|\big\langle (\cos(\theta_\lambda), \sin(\theta_\lambda)),\; x_\lambda - x_\mu\big\rangle\big|.$$
Theorem 7 (Grohs and Kutyniok 2014; Grohs et al. 2016a). Let α ∈ [0, 1], N > 0,
and let $(m_\lambda)_{\lambda\in\Lambda}$, $(p_\mu)_{\mu\in\tilde\Lambda}$ be systems of α-molecules of order $(L, M, N_1, N_2)$
with
$$L \ge 2N, \qquad M > 3N - \frac{3-\alpha}{2}, \qquad N_1 \ge N + \frac{1+\alpha}{2}, \qquad N_2 \ge 2N.$$
Then, for all $\lambda\in\Lambda$ and $\mu\in\tilde\Lambda$,
$$\big|\langle m_\lambda, p_\mu\rangle\big| \lesssim \omega(\lambda,\mu)^{-N}.$$
Universal Shearlets
Based on these, universal shearlet systems are defined as follows (Genzel and
Kutyniok 2014):
In the special situation when all α_j and c^j coincide, i.e., α_j = α₀ and (c_1^j, c_2^j) =
(c_1, c_2) for all scales j, and α₀ = 1, the system reduces to cone-adapted discrete
shearlet systems in the sense that SH(φ, ψ, ψ̃; α, c) = SH(φ, ψ, ψ̃; c). If in this
situation α0 = 2, then the universal shearlet systems reduce to isotropic wavelet
systems. Finally, for α0 → 0, the system of ridgelets is approached.
Since the implementation of ShearLab3D in www.ShearLab.org relies on
universal shearlets, we also state the associated transform explicitly.
Definition 14. Retain the notions from Definition 13, and let SH (φ, ψ, ψ̃; α, c)
be a universal shearlet system. Then the associated universal shearlet transform of
f ∈ L2 (R2 ) is given by
$$\mathrm{SH}_{\phi,\psi,\tilde\psi} f(j,k,m,\iota) :=
\begin{cases}
\langle f, \psi_{j,k,m}\rangle\colon & \iota = -1,\ j\ge 0,\ |k|\le 2^{\frac{j(2-\alpha_j)}{2}},\ m\in\mathbb{Z}^2,\\
\langle f, \phi_m\rangle\colon & \iota = 0,\ m\in\mathbb{Z}^2,\\
\langle f, \tilde\psi_{j,k,m}\rangle\colon & \iota = 1,\ j\ge 0,\ |k|\le 2^{\frac{j(2-\alpha_j)}{2}},\ m\in\mathbb{Z}^2.
\end{cases}$$
On the theoretical side, this approach has so far been only analyzed for band-
limited generators concerning their frame properties. More precisely, in Genzel
and Kutyniok (2014), it has been shown that there exists a large class of scaling
sequences α = (αj )j such that, using classical shearlets with small modifications,
the system SH (φ, ψ, ψ̃; α, c) forms a Parseval frame for L2 (R2 ).
One main advantage of shearlets is the fact that they admit a faithful digitalization
and hence a consistent implementation, mainly due to the fact that directional
sensitivity is incorporated by a shearing operator (instead of, for instance, a rotation
operator, which would change the digital grid). The first digital version was
introduced in Easley et al. (2008) as the nonsubsampled shearlet transform in 2D
and 3D, which digitalized the cone-adapted discrete shearlet transform based on
band-limited shearlets. The first faithful digital shearlet transform using compactly
supported shearlets was suggested in Lim (2010). It utilizes separable shearlets
to achieve low complexity. This approach was later improved in Lim (2013) by
an implementation called nonseparable shearlet transform. It uses the fact that
nonseparable compactly supported shearlet generators can much better approximate
classical band-limited shearlets, which in turn can be designed to form Parseval
frames.
In the sequel, we will describe the concept of digital shearlet systems and associated
transforms as developed in Lim (2013). In fact, these are also the basis for the
software package ShearLab3D provided on the webpage www.ShearLab.org
(see also Kutyniok et al. (2016)), which extends this concept to both universal
shearlets and the 3D situation.
The digital shearlet systems we will introduce are a faithful digitalization of
cone-adapted discrete shearlet systems SH(φ, ψ, ψ̃; c) = Φ(φ; c₁) ∪ Ψ(ψ; c) ∪
Ψ̃(ψ̃; c) as in Definition 7. Since the component Φ(φ; c₁) is just the scaling part
coinciding with a wavelet scaling part, we refer for its digitalization to the common
wavelet literature (Daubechies 1992; Mallat 1998). Furthermore, we restrict to
discussing Ψ(ψ; c), since Ψ̃(ψ̃; c) can be digitalized similarly except for switching
the order of variables.
We first define a separable shearlet ψ sep ∈ L2 (R2 ), which will be the basis
for defining a nonseparable variant. For this, let ψ 1 and φ 1 ∈ L2 (R) be a
compactly supported 1D wavelet and an associated (orthonormal) scaling function,
respectively, satisfying the two scale relations
$$\phi^1(x_1) = \sum_{n_1\in\mathbb{Z}} h(n_1)\,\sqrt{2}\,\phi^1(2x_1 - n_1)
\qquad\text{and}\qquad
\psi^1(x_1) = \sum_{n_1\in\mathbb{Z}} g(n_1)\,\sqrt{2}\,\phi^1(2x_1 - n_1),$$
with some appropriately chosen filters g and h in the sense that both ψ 1 and φ 1 are
sufficiently smooth and ψ 1 has sufficient vanishing moments. For later use, we also
define
$$H_j(\xi_1) := \prod_{k=0}^{j-1} H(2^k\xi_1) \qquad\text{and}\qquad G_j(\xi_1) := G(2^{j-1}\xi_1)\,H_{j-1}(\xi_1),$$
where the trigonometric polynomial P is a 2D fan filter (cf. Do and Vetterli 2005).
With a suitable choice for P , we indeed have
where
$$\psi_{j,k,m}(x) := 2^{\frac{3}{4}j}\,\psi\big(S_k A_{2^j} x - M_c\, m\big) = \psi_{j,0,m}\big(S_{k2^{-j/2}}\, x\big) \qquad (10)$$
Definition 15. Let $f_J \in \ell^2(\mathbb{Z}^2)$ be the scaling coefficients given in (9), and retain
the notions from this subsection. Then the digital shearlet transform associated with
Ψ(ψ; c) is defined by
$$\mathrm{DST}^{2D}_{\psi} f(j,k,m) := \big(f_J * \psi^d_{j,k}\big)\big(2^J A^{-1}_{2^j} M_c\, m\big) \quad \text{for } j = 0, \ldots, J-1,$$
where
$$\psi^d_{j,k} := S^d_{k2^{-j/2}}(p_j * g_j),$$
with the shearing operator defined by (12) and the sampling matrix $M_c$ chosen so
that $2^J A^{-1}_{2^j} M_c\, m \in \mathbb{Z}^2$.
Considering the full shearlet system and not only Ψ(ψ; c) then leads in a canonical manner to the digital shearlet transform DST^{2D}_{φ,ψ,ψ̃} f(j, k, m, ι) with ι playing a similar role as in Definition 7.
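To make the structure of Definition 15 more concrete, the following is a minimal NumPy sketch, not the ShearLab implementation: it forms separable wavelet-times-lowpass filters (standard db2 filter values), applies a crude integer row shift as a stand-in for the resampling-based digital shearing operator, and convolves the result with the scaling coefficients f_J. The shear ranges, subsampling, and filter sizes are illustrative assumptions rather than the precise construction of Lim (2013).

```python
import numpy as np
from scipy.signal import fftconvolve

def digital_shear(filt, slope):
    """Crude integer shearing of a 2D filter: row i is shifted by round(slope * (i - center));
    a toy stand-in for the resampling-based digital shearing operator."""
    sheared = np.zeros_like(filt)
    center = filt.shape[0] // 2
    for i in range(filt.shape[0]):
        sheared[i] = np.roll(filt[i], int(round(slope * (i - center))))
    return sheared

def toy_digital_shearlet_transform(fJ, g, h, J=3):
    """Toy analogue of DST^2D: convolve the scaling coefficients fJ with sheared
    separable (wavelet x lowpass) filters for each scale j and shear k, then subsample."""
    coeffs = {}
    for j in range(J):
        psi_sep = np.outer(g, h)                                  # separable generator
        for k in range(-2 ** (j // 2), 2 ** (j // 2) + 1):
            psi_d = digital_shear(psi_sep, k * 2.0 ** (-j / 2))   # ~ S^d_{k 2^{-j/2}}
            c = fftconvolve(fJ, psi_d, mode="same")
            coeffs[(j, k)] = c[:: 2 ** j, :: 2 ** j]              # coarse subsampling
    return coeffs

# usage with db2 scaling/wavelet filters on a random test image
h = np.array([0.4830, 0.8365, 0.2241, -0.1294])   # lowpass (scaling) filter
g = np.array([-0.1294, -0.2241, 0.8365, -0.4830]) # highpass (wavelet) filter
coeffs = toy_digital_shearlet_transform(np.random.rand(128, 128), g, h)
```

The actual nonseparable transform additionally involves the fan filter P and is designed to be a faithful digitalization, which this toy version does not attempt.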
f(x) = Σ_{m∈Z³} f_{J,m} 2^{3J/2} (φ¹ ⊗ φ¹ ⊗ φ¹)(2^J x − m)   (13)
as follows:
Definition 16. Let f_J ∈ ℓ²(Z³) be the scaling coefficients given in (13), and retain the notions from this section. Then the digital shearlet transform associated with Ψ(ψ; α, c) is defined by

DST^{3D}_ψ f(j, k, m) := (f_J ∗ ψ^d_{j,k})(m̃),

where the sampling constants c_1^j and c_2^j are chosen so that

m̃ := (2^{J−j} c_1^j m_1, 2^{J−α_j j/2} c_2^j m_2, 2^{J−α_j j/2} c_2^j m_3) ∈ Z³,
and the discrete-time Fourier transforms of the 3D digital shearlet filters ψ^d_{j,k} are defined by

ψ̂^d_{j,k}(ξ) := G_{J−j}(ξ_1) · φ̂^d_{j,k_1}(ξ_1, ξ_2) · φ̂^d_{j,k_2}(ξ_1, ξ_3),

where

φ^d_{j,k_1,(n_1,n_2)} := S^d_{k_1 2^{−α_j j/2}} (h_{J−α_j j/2} ∗_{x_2} p_j)(n_1, n_2)

and

φ^d_{j,k_2,(n_1,n_3)} := S^d_{k_2 2^{−α_j j/2}} (h_{J−α_j j/2} ∗_{x_3} p_j)(n_1, n_3),

respectively.
Similar to the 2D situation, the definition of the full 3D digital shearlet transform DST^{3D}_{φ,ψ,ψ̃,ψ̆} f(j, k, m, ι) is then canonical.
Applications of Shearlets
‖Tf − g‖² + β · ‖f‖²,

For data governed by anisotropic features such as images, shearlet systems provide optimally sparse approximations. The generalization of Tikhonov regularization introduced in Daubechies et al. (2004) exploits such information by suggesting to minimize

‖Tf − g‖² + β · ‖(⟨f, ψ_λ⟩)_{λ∈Λ}‖_1,

with (ψ_λ)_{λ∈Λ} being a shearlet system, instead. We remark that the concept of sparse
regularization is closely related to, and in fact might also be seen as belonging to,
the area of compressed sensing (Davenport et al. 2012).
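As a concrete illustration of sparse regularization, the following is a minimal NumPy/SciPy sketch of ISTA applied to ‖Tf − g‖² + β‖Wf‖_1, where T is assumed to be a normalized convolution and W an orthonormal 2D DCT, used here only as a computationally convenient stand-in for a Parseval shearlet frame; the step size, β, and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.signal import fftconvolve

def soft(x, t):
    """Soft-thresholding, the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_sparse_reg(g, kernel, beta=0.05, step=0.5, iters=200):
    """ISTA for (1/2)||T f - g||^2 + beta ||W f||_1 with T a blur (assumed normalized)
    and W an orthonormal transform (2D DCT as a stand-in for a Parseval shearlet frame)."""
    T = lambda f: fftconvolve(f, kernel, mode="same")            # forward operator
    Tt = lambda r: fftconvolve(r, kernel[::-1, ::-1], mode="same")  # its adjoint
    f = np.zeros_like(g)
    for _ in range(iters):
        grad = Tt(T(f) - g)                                      # gradient of the data term
        c = dctn(f - step * grad, norm="ortho")                  # analysis step
        f = idctn(soft(c, step * beta), norm="ortho")            # shrink and synthesize
    return f

# example: f_hat = ista_sparse_reg(g, kernel) with g a blurred, noisy image
# and kernel a nonnegative blur kernel summing to one
```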
We now discuss two different special situations in which this conceptual
approach can be applied.
Image Separation
Images are typically a composition of morphologically distinct components. The
problem of image separation, which is a highly ill-posed inverse problem, aims
to decompose the image into those components. To be mathematically precise,
assuming just two components, the problem can be modeled as follows: Let f1 , f2 ∈
L2 (R2 ) and g = f1 + f2 ; we aim to recover f1 and f2 from g. One possible setting
is the separation of curve-like and point-like objects, which, for example, appears
in neurobiological imaging in the form of spines (point-like objects) and dendrites
(curve-like objects) or astronomical imaging in the form of stars (point-like objects)
and filaments (curve-like objects). For further examples, we refer to Starck et al.
(2010).
This problem can only be solved by assuming prior information on the compo-
nents. The approach of sparse regularization assumes that each component f_1 and f_2 can be sparsified by a representation system (ψ_λ¹)_{λ∈Λ} and (ψ_λ²)_{λ∈Λ}, respectively. This leads to a minimization problem (14) enforcing sparsity of both components with respect to these systems. For the asymptotic analysis, the components are modeled as

f_1 := Σ_{i=1}^{P} |x − x_i|^{−3/2}   and   f_2 := ∫ δ_{τ(t)} dt,
Fig. 6 Separation of spines and dendrites in neurobiological imaging (Kutyniok and Lim 2012)
using ShearLab3D to solve (14). (a) Original image. (b) Extracted dendrites (curve-like objects).
(c) Extracted spines (point-like objects)
g = Σ_j F_j ∗ (F_j ∗ g)   for all g ∈ L²(R²).   (15)
Theorem 8 (Donoho and Kutyniok 2009, 2013). Retaining the notation from this subsection and letting f̂_{1,j}, f̂_{2,j} denote the solution of (14) for the separation problem g_j = f_{1,j} + f_{2,j}, we have

(‖f_{1,j} − f̂_{1,j}‖_{L²} + ‖f_{2,j} − f̂_{2,j}‖_{L²}) / (‖f_{1,j}‖_{L²} + ‖f_{2,j}‖_{L²}) → 0,   j → ∞.
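Since the minimization problem (14) itself is not reproduced above, the following toy NumPy sketch only illustrates the underlying idea of geometric separation: alternating soft-thresholding with two sparsifying systems, here the pixel (Dirac) basis for point-like structure and a 2D DCT as a crude stand-in for the curve-adapted (shearlet) system; the threshold β and iteration count are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def separate(g, beta=0.1, iters=100):
    """Toy morphological-component separation of g into f1 + f2, alternating exact
    minimization of (1/2)||g - f1 - f2||^2 + beta ||f1||_1 + beta ||DCT f2||_1:
    the Dirac basis sparsifies point-like parts, the DCT the remaining structure."""
    f1 = np.zeros_like(g)
    f2 = np.zeros_like(g)
    for _ in range(iters):
        # update the point-like part: soft-thresholding in the pixel basis
        f1 = soft(g - f2, beta)
        # update the second part: soft-thresholding in the DCT domain
        f2 = idctn(soft(dctn(g - f1, norm="ortho"), beta), norm="ortho")
    return f1, f2
```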
Image Inpainting
Image inpainting aims to recover missing or deteriorated parts of an image. It is
thus a special case of a data recovery problem; and the approach we discuss can
be generalized to this setting as well. The problem can be formulated as follows:
Again aiming for an asymptotic analysis, let (Fj )j be a sequence of filters (cf. (15)),
which leads to a scale-dependent decomposition, and consider the filtered image
f_j := f ∗ F_j as well as the filtered observed image g_j := (f · 1_{R²\M_{h_j}}) ∗ F_j, where we also make the width of the mask dependent on the scale j. The following result
shows that at all sufficiently fine scales, nearly perfect inpainting is achieved in case
the shearlets are asymptotically larger than the width of the mask.
Fig. 7 Numerical experiments using ShearLab3D to solve (16). (a) Original image. (b) Masked
image. (c) Inpainted image
Theorem 9 (King et al. 2014). Retaining the notation from this subsection and letting f̂_j denote the solution of (16) for the inpainting problem g_j = (f · 1_{R²\M_{h_j}}) ∗ F_j, if h_j = o(2^{−j/2}) as j → ∞, we have

‖f̂_j − f_j‖_{L²} / ‖f_j‖_{L²} → 0,   j → ∞.
A similar result holds for wavelet inpainting, then with the sufficient condition
that hj = o(2−j ) as j → ∞ according to the smaller width of a wavelet element.
An extension to inpainting using universal shearlet systems can be found in Genzel
and Kutyniok (2014). For similar results in the general Hilbert space setting, we
refer to Donoho and Kutyniok (2013) and Genzel and Kutyniok (2014).
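A minimal sketch of sparsity-based inpainting in the spirit of the above discussion (not the algorithm analyzed in (16)): iterative thresholding in an orthonormal transform domain, with a 2D DCT standing in for a shearlet system, while re-imposing the observed pixels in every iteration. The mask handling and threshold schedule are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def inpaint(g, mask, iters=200, t0=0.5):
    """Toy sparsity-based inpainting: threshold in an orthonormal transform domain
    (2D DCT as a stand-in for shearlets), keep the observed pixels fixed, and
    decrease the threshold linearly over the iterations."""
    f = g * mask
    for i in range(iters):
        t = t0 * (1 - i / iters)                      # decreasing threshold
        f = idctn(soft(dctn(f, norm="ortho"), t), norm="ortho")
        f = mask * g + (1 - mask) * f                 # re-impose known pixels
    return f

# usage: f_hat = inpaint(g, mask) with mask a 0/1 array marking observed pixels
```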
Deep learning approaches have recently swept the area of inverse problems,
predominantly from imaging, the main reason being that no physical model for
images exists, consequently making data-driven methods very effective. A standard
feed-forward deep neural network consists of affine-linear maps W_ℓ : R^{N_{ℓ−1}} → R^{N_ℓ}, ℓ = 1, . . . , L, i.e., W_ℓ(x) = A_ℓ x + b_ℓ, where A_ℓ ∈ R^{N_ℓ × N_{ℓ−1}} and b_ℓ ∈ R^{N_ℓ}, as well as a (nonlinear) univariate function σ : R → R called activation function, and realizes the map NN_θ : R^d → R^{N_L},

NN_θ(x) = W_L(σ(W_{L−1}(· · · σ(W_1(x))))),

with σ being applied componentwise and θ denoting all parameters of the neural network, i.e., the weight matrices A_ℓ and biases b_ℓ. In applications, the activation function is typically chosen as the ReLU (Rectified Linear Unit) given by σ(x) := max{0, x}. Corresponding to the depiction as a graph, L is referred to as the number of layers. Given samples (x_i, f(x_i))_{i=1}^m of a function f : R^d → R^{N_L}, learning amounts to fitting the parameters θ of NN_θ to these samples. One approach, which was suggested in Jin et al. (2017), first recovers an approximation
of f from g by standard model-based approaches followed by a convolutional neural
network, which acts as a denoiser. More sophisticated types of approaches aim to
insert deep neural networks in iterative reconstruction schemes, for instance, by
replacing certain steps such as a denoising step by a neural network, which was
pioneered in Gregor and LeCun (2010), or replacing some of the proximal operators
by networks (see, e.g., Meinhardt et al. (2017) and Adler and Öktem (2018)). For
an overview of deep learning approaches to inverse problems, we refer to Adler and
Öktem (2017) and McCann et al. (2017).
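For concreteness, here is a minimal NumPy sketch of the feed-forward map NN_θ described above with ReLU activation; the layer widths and random weights are placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def feed_forward(x, weights, biases):
    """Realize NN_theta(x) = W_L sigma(W_{L-1} sigma(... sigma(W_1 x))) for given
    weight matrices A_l and biases b_l, with componentwise ReLU activation."""
    for A, b in zip(weights[:-1], biases[:-1]):
        x = relu(A @ x + b)
    A_L, b_L = weights[-1], biases[-1]
    return A_L @ x + b_L         # no activation after the last affine layer

# tiny example: d = 4 inputs, one hidden layer of width 8, N_L = 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [np.zeros(8), np.zeros(2)]
y = feed_forward(rng.standard_normal(4), weights, biases)
```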
In contrast to the previously discussed approaches, we will now present two
exemplary algorithms which combine the model-based realm represented by shear-
lets with the data-driven realm of deep neural networks following the philosophy
of using model-based methods as far as they are reliable and data-driven methods
where it is necessary. This conceptual type of approach not only prevents deep neural networks from affecting the entire data set during inversion, which presumably causes instabilities (Gottschling et al. 2020), but also allows for a better interpretation of the results.
Rf(φ, s) = ∫_{L(φ,s)} f(x) dS(x),

where L(φ, s) = {x ∈ R² : x_1 cos(φ) + x_2 sin(φ) = s}, φ ∈ [−π/2, π/2), and
s ∈ R (Natterer 2001). The inverse problem of reconstructing f from its Radon
transform g := Rf becomes even more challenging when only partial data
is available. One instance of this class of problems is limited-angle computed tomography, where Rf(·, s) is only sampled on an angular range [−φ, φ] ⊂ [−π/2, π/2). Examples
include breast tomosynthesis, dental CT, and electron tomography. Due to the large
missing part in the measured data – in contrast to, for instance, low-dose CT –
model-based approaches only provide crude reconstructions, since no model-based
priors exist which model a human body sufficiently accurately.
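Assuming scikit-image is available, the following short sketch illustrates the limited-angle setting: the Radon transform is sampled only on a restricted angular wedge, and filtered back-projection then produces the crude model-based reconstructions mentioned above. The phantom and the chosen angular range are toy choices; angles are in degrees.

```python
import numpy as np
from skimage.transform import radon, iradon

# toy image and a limited-angle sinogram: sample Rf only for angles in [-60, 60] degrees
phantom = np.zeros((128, 128))
phantom[40:90, 50:80] = 1.0
angles = np.linspace(-60, 60, 121)                  # the remaining wedge is missing
sinogram = radon(phantom, theta=angles, circle=False)

# crude model-based reconstruction via filtered back-projection on the partial data
fbp = iradon(sinogram, theta=angles, circle=False)
```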
Depending on the missing angle, it is known which information about the
wavefront set of the original image is contained in the measured data, hence in this
sense what is “visible” (Quinto 1993). This makes it possible to view the problem of limited-angle computed tomography as an inpainting problem for the wavefront set. Due to the sensitivity of shearlets to the wavefront set (Theorem 2), it is natural to exploit this system in this problem setting.
The approach “Learning the Invisible” (Bubba et al. 2019) pursues this strategy by first reconstructing the image using sparse regularization with shearlets as sparsifying system, followed by learning, in a surgically precise manner, only the invisible data corresponding to the missing part of the wavefront set with a deep learning approach. The algorithm can be outlined as follows:
f* := arg min_{f ≥ 0} ‖Rf − g‖²_2 + ‖SH_{φ,ψ,ψ̃} f‖_{1,w},

min_θ (1/m) Σ_{i=1}^{m} ‖NN_θ(SH_{φ,ψ,ψ̃} f_i*) − SH_{φ,ψ,ψ̃} f_i^{gt}|_{I_inv}‖²_{w,2},

and compute

NN_θ : SH_{φ,ψ,ψ̃} f*|_{I_vis} −→ F ≈ SH_{φ,ψ,ψ̃} f^{gt}|_{I_inv}.
Figure 8 shows numerical results, which demonstrate superiority not only over the model-based approach but even over the pure deep learning approach from Gu and Ye (2017).
Fig. 8 Numerical experiments from Bubba et al. (2019) using data from Mayo-60◦ with a missing
wedge of 60◦ , where RE stands for relative error and HaarPSI is the Haar wavelet-based perceptual
similarity index for image quality assessment (Reisenhofer et al. 2018). (a) Original image. (b) f ∗
(RE: 0.19, HaarPSI: 0.43). (c) Result from Gu and Ye (2017) (RE: 0.22, HaarPSI: 0.40). (d) fLtI
(RE: 0.09, HaarPSI: 0.76)
The wavefront set of an image can be related to the wavefront set of its transformed version, such as its Radon transform, by (microlocal) canonical relations. Being able to detect the wavefront set of the Radon transform, say, then makes it possible to compute an approximation of the wavefront set of the original image via a (microlocal) canonical relation and to use it as a prior for reconstruction (Andrade-Loarca et al. 2020).
Cone-adapted continuous shearlet systems are able to resolve wavefront sets
(Theorem 2). But algorithms following this model such as Yi et al. (2009)
and Reisenhofer et al. (2015) often suffer from the fact that real-world scenarios are
highly complex and the theoretical analysis only provides an asymptotic estimate.
In the sequel, we will discuss an approach coined DeNSE (Deep Network
Shearlet Edge Extractor) (Andrade-Loarca et al. 2019), which again follows the
philosophy to use a model-based approach as far as it is reliable and a deep learning-based method where it is necessary. The method proceeds in two steps:
Fig. 9 Numerical experiments from Andrade-Loarca et al. (2019), where the color coding
indicates the detected direction. (a) Original image. (b) Result from Yi et al. (2009). (c) Result
from Reisenhofer et al. (2015). (d) Result using DeNSE. Copyright ©2019 Society for Industrial
and Applied Mathematics. Reprinted with permission. All rights reserved
• Step 1: Reveal Directionality in the Shearlet Domain. For a given test image f ∈ R^{M×M}, compute the digital shearlet transform of f with 49 shearlet generators, i.e.,

(DST^{2D}_{φ,ψ,ψ̃} f(j, k, m, ι))_{j,k, m∈[1,M]², ι∈{−1,0,1}}.
• Step 2: Shearlet Transform. For every location m* = (m*_1, m*_2) ∈ [11, M − 10]², apply a neural network classifier consisting of four convolutional layers plus one fully connected layer (a sketch of such a classifier is given below) to the associated patch

(DST^{2D}_{φ,ψ,ψ̃} f(j, k, m, ι))_{j,k, m∈[m*_1−10, m*_1+10]×[m*_2−10, m*_2+10], ι∈{−1,0,1}}.

If the network predicts the presence of an edge with direction ϑ, then (m*, ϑ) is detected as an element of the wavefront set of f.
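The following is a sketch of a DeNSE-style patch classifier in PyTorch, under the assumptions that the shearlet coefficients of a 21×21 patch are stacked into 49 channels and that edges are classified into a finite set of orientations plus a "no edge" class; the layer widths follow the four-convolutional-plus-one-fully-connected structure of Step 2, but all sizes are illustrative rather than those of Andrade-Loarca et al. (2019).

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """Patch classifier: four convolutional layers plus one fully connected layer
    mapping a 21x21 patch of shearlet coefficients (assumed stacked into 49 channels)
    to edge-orientation classes plus a 'no edge' class (widths are illustrative)."""
    def __init__(self, n_channels=49, n_orientations=180):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # spatial size 21 -> 10 -> 5 after the two poolings
        self.classifier = nn.Linear(64 * 5 * 5, n_orientations + 1)

    def forward(self, patch):
        h = self.features(patch)
        return self.classifier(h.flatten(1))

# a batch of 8 patches with 49 coefficient channels
logits = EdgeClassifier()(torch.randn(8, 49, 21, 21))
```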
Conclusion
The area of applied harmonic analysis provides representation systems for data
processing, aiming for both decomposition and expansion of data/functions. Shear-
let systems are specifically designed for the setting of multivariate functions and
exist as continuous, discrete, and digital systems. While the continuous version
allows a precise resolution of wavefront sets, the discrete version provides optimally sparse approximations of cartoon-like functions as a model class of functions governed by anisotropic features, and the digital version yields faithful implementations. Shearlet systems can be extended to higher dimensions as well as to more general universal shearlets and α-molecules. Shearlet systems are typically used for
sparse regularization of inverse problems such as feature extraction and inpainting,
for which both theoretical and numerical results are available. Recent applications
combine the shearlet transform with deep neural networks in a smart way targeting
problems such as limited-angle computed tomography and wavefront set detection.
Acknowledgments G.K. would like to thank Hector Andrade-Loarca for producing several of the
figures.
References
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33, 124007 (2017)
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE T. Med. Imaging 37, 1322–1332
(2018)
Andrade-Loarca, H., Kutyniok, G., Öktem, O.: Shearlets as feature extractor for semantic edge
detection: the model-based and data-driven realm. Proc. R. Soc. A. 476(2243), 20190841
(2020). https://royalsocietypublishing.org/toc/rspa/2020/476/2243
Andrade-Loarca, H., Kutyniok, G., Öktem, O., Petersen, P.: Extraction of digital wavefront sets
using applied harmonic analysis and deep neural networks. SIAM J. Imaging Sci. 12, 1936–
1966 (2019)
Antoine, J.P., Carrette, P., Murenzi, R., Piette, B.: Image analysis with two-dimensional continuous
wavelet transform. Sig. Process. 31, 241–272 (1993)
Bamberger, R.H., Smith, M.J.T.: A filter bank for the directional decomposition of images: theory
and design. IEEE Trans. Sig. Process. 40, 882–893 (1992)
Bodmann, B.G., Labate, D., Pahari, B.R.: Smooth projections and the construction of smooth
Parseval frames of shearlets. Adv. Comput. Math. 45, 3241–3264 (2019)
Bubba, T.A., Kutyniok, G., Lassas, M., März, M., Samek, W., Siltanen, S., Srinivasan, V.:
Learning the invisible: a hybrid deep learning-shearlet framework for limited angle computed
tomography. Inverse Probl. 35, 064002 (2019). https://iopscience.iop.org/article/10.1088/1361-
6420/ab10ca
Candès, E.J., Donoho, D.L.: New tight frames of curvelets and optimal representations of objects
with piecewise C 2 singularities. Commun. Pure Appl. Math. 56, 216–266 (2004)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8,
679–698 (1986)
Casazza, P.G., Kutyniok, G., Philipp, F.: Introduction to finite frame theory. In: Finite Frames:
Theory and Applications, pp. 1–53. Birkhäuser, Boston (2012)
Christensen, O.: An Introduction to Frames and Riesz Bases. Birkhäuser, Boston (2003)
Cohen, A.: Numerical Analysis of Wavelet Methods. Studies in Mathematics and Its Applications,
vol. 32. JAI Press, Greenwich (2003)
Dahlke, S., Kutyniok, G., Maass, P., Sagiv, C., Stark, H.-G., Teschke, G.: The uncertainty principle
associated with the continuous shearlet transform. Int. J. Wavelets Multiresolut. Inf. Process. 6,
157–181 (2008)
Dahlke, S., Kutyniok, G., Steidl, G., Teschke, G.: Shearlet coorbit spaces and associated banach
frames. Appl. Comput. Harmon. Anal. 27, 195–214 (2009)
Dahlke, S., Steidl, G., Teschke, G.: The continuous shearlet transform in arbitrary space dimen-
sions. J. Fourier Anal. Appl. 16, 340–354 (2010)
Dahlke, S., Steidl, G., Teschke, G.: Shearlet coorbit spaces: compactly supported analyzing
shearlets, traces and embeddings. J. Fourier Anal. Appl. 17, 1232–1255 (2011)
Dahlke, S., Häuser, S., Steidl, G., Teschke, G.: Shearlet coorbit spaces: traces and embeddings in
higher dimensions. Monatsh. Math. 169, 15–32 (2013)
Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004)
Davenport, M., Duarte, M., Eldar, Y., Kutyniok, G.: Introduction to compressed sensing. In:
Compressed Sensing: Theory and Applications, pp. 1–64. Cambridge University Press (2012)
Do, M.N., Vetterli, M.: The contourlet transform: an efficient directional multiresolution image
representation. IEEE Trans. Image Process. 14, 2091–2106 (2005)
Donoho, D.L.: Sparse components of images and optimal atomic decomposition. Constr. Approx.
17, 353–382 (2001)
Donoho, D.L., Kutyniok, G.: Geometric separation using a wavelet-shearlet dictionary.
SampTA’09, Marseille. Proceedings (2009)
Donoho, D.L., Kutyniok, G.: Microlocal analysis of the geometric separation problem. Commun.
Pure Appl. Math. 66, 1–47 (2013)
Easley, G., Labate, D.: Image processing using shearlets. In: Shearlets: Multiscale Analysis for
Multivariate Data, pp. 283–325. Birkhäuser, Boston (2012)
Easley, G., Labate, D., Lim, W.-Q.: Sparse directional image representation using the discrete
shearlet transform. Appl. Comput. Harmon. Anal. 25, 25–46 (2008)
Genzel, M., Kutyniok, G.: Asymptotic analysis of inpainting via universal shearlet systems. SIAM
J. Imaging Sci. 7, 2301–2339 (2014)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. Adaptive Computation and Machine
Learning. MIT Press, Cambridge (2017)
Gottschling, N., Antun, V., Adcock, B., Hansen, A.C.: The troublesome kernel: why deep learning
for inverse problems is typically unstable. preprint, arXiv:2001.01258 (2020)
Gregor, K., LeCun, Y.: Learning fast approximations of sparse coding. In: International Conference
on Machine Learning (ICML), pp. 399–406 (2010)
Grohs, P.: Continuous Shearlet frames and Resolution of the Wavefront Set. Monatsh. Math. 164,
393–426 (2011a)
Grohs, P.: Continuous shearlet tight frames. J. Fourier Anal. Appl. 17, 506–518 (2011b)
Grohs, P.: Bandlimited shearlet frames with nice duals. J. Comput. Appl. Math. 142, 139–151
(2013)
Grohs, P., Kutyniok, G.: Parabolic molecules. Found. Comput. Math. 14, 299–337 (2014)
Grohs, P., Keiper, S., Kutyniok, G., Schäfer, M.: α-molecules. Appl. Comput. Harmon. Anal. 42,
297–336 (2016a)
Grohs, P., Keiper, S., Kutyniok, G., Schäfer, M.: Cartoon approximation with α-curvelets. J.
Fourier Anal. Appl. 22, 1235–1293 (2016b)
Gu, J., Ye, J.C.: Multi-scale wavelet domain residual learning for limited-angle CT reconstruction.
In: Procs Fully3D, pp. 443–447 (2017)
Guo, K., Labate, D.: Optimally sparse multidimensional representation using shearlets. SIAM J
Math. Anal. 39, 298–318 (2007)
Guo, K., Labate, D.: The construction of smooth parseval frames of shearlets. Math. Model. Nat.
Phenom. 8, 82–105 (2013)
Guo, K., Kutyniok, G., Labate, D.: Sparse multidimensional representations using anisotropic
dilation and shear operators. In: Wavelets and Splines, Athens, 2005, pp. 189–201. Nashboro
Press, Nashville (2006)
Guo, K., Labate, D., Lim, W.-Q.: Edge analysis and identification using the continuous shearlet
transform. Appl. Comput. Harmon. Anal. 27, 24–46 (2009)
Häuser, S., Steidl, G.: Convex multiclass segmentation with shearlet regularization. Int. J. Comput.
Math. 90, 62–81 (2013)
Hörmander, L.: The analysis of linear partial differential operators. I. Distribution theory and
Fourier analysis. Springer, Berlin (2003)
Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse
problems in imaging. IEEE Trans. Image Proc. 26, 4509–4522 (2017)
King, E.J., Kutyniok, G., Zhuang, X.: Analysis of inpainting via clustered sparsity and microlocal
analysis. J. Math. Imaging Vis. 48, 205–234 (2014)
Kittipoom, P., Kutyniok, G., Lim, W.-Q.: Irregular shearlet frames: geometry and approximation
properties. J. Fourier Anal. Appl. 17, 604–639 (2011)
Kittipoom, P., Kutyniok, G., Lim, W.-Q.: Construction of compactly supported shearlet frames.
Constr. Approx. 35, 21–72 (2012)
Kutyniok, G.: Clustered sparsity and separation of cartoon and texture. SIAM J. Imaging Sci. 6,
848–874 (2013)
Kutyniok, G.: Geometric separation by single-pass alternating thresholding. Appl. Comput.
Harmon. Anal. 36, 23–50 (2014)
Kutyniok, G., Labate, D.: Resolution of the wavefront set using continuous shearlets. Trans. Am.
Math. Soc. 361, 2719–2754 (2009)
Kutyniok, G., Labate, D.: Introduction to shearlets. In: Shearlets: Multiscale Analysis for Multi-
variate Data, pp. 1–3. Birkhäuser, Boston (2012)
Kutyniok, G., Lim, W.-Q.: Compactly supported shearlets are optimally sparse. J. Approx. Theory
163, 1564–1589 (2011)
Kutyniok, G., Lim, W.-Q.: Image separation using wavelets and shearlets. In: Curves and Surfaces (Avignon, 2010). Lecture Notes in Computer Science, vol. 6920, pp. 416–430. Springer (2012)
Kutyniok, G., Lim, W.-Q.: Optimal compressive imaging of Fourier data. SIAM J. Imaging Sci.
11, 507–546 (2018)
Kutyniok, G., Petersen, P.: Classification of edges using compactly supported shearlets. Appl.
Comput. Harmon. Anal. 42, 245–293 (2017)
Kutyniok, G., Lemvig, J., Lim, W.-Q.: Optimally sparse approximations of 3D functions by
compactly supported shearlet frames. SIAM J. Math. Anal. 44, 2962–3017 (2012)
Kutyniok, G., Lim, W.-Q., Reisenhofer, R.: ShearLab 3D: faithful digital shearlet transforms based
on compactly supported shearlets. ACM Trans. Math. Softw. 42, 5 (2016)
Labate, D., Lim, W.-Q., Kutyniok, G., Weiss, G.: Sparse multidimensional representation using
shearlets. In: Wavelets XI, Proceedings of SPIE, Bellingham, vol. 5914, pp. 254–262 (2005)
Labate, D., Mantovani, L., Negi, P.S.: Shearlet smoothness spaces. J. Fourier Anal. Appl. 19, 577–
611 (2013)
Le Pennec, E.L., Mallat, S.: Sparse geometric image representations with bandelets. IEEE Trans.
Image Process. 14, 423–438 (2005)
Lessig, C., Petersen, P., Schäfer, M.: Bendlets: a second-order shearlet transform with bent
elements. Appl. Comput. Harmon. Anal. 46, 384–399 (2019)
Lim, W.-Q.: The discrete shearlet transform: a new directional transform and compactly supported
shearlet frames. IEEE Trans. Image Proc. 19, 1166–1180 (2010)
Lim, W.-Q.: Nonseparable shearlet transform. IEEE Trans. Image Proc. 22, 2056–2065 (2013)
Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, San Diego (1998)
McCann, M.T., Jin, K.H., Unser, M.: Convolutional neural networks for inverse problems in
imaging: a review. IEEE Signal Proc. Mag. 34, 85–95 (2017)
Meinhardt, T., Möller, M., Hazirbas, C., Cremers, D.: Learning proximal operators: using
denoising networks for regularizing inverse imaging problems. In: International Conference
on Computer Vision (ICCV) (2017)
Natterer, F.: The Mathematics of Computerized Tomography. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia (2001)
Quinto, E.T.: Singularities of the X-ray transform and limited data tomography in R2 and R3 .
SIAM J. Math. Anal. 24, 1215–1225 (1993)
Reisenhofer, R., Kiefer, J., King, E.J.: Shearlet-based detection of flame fronts. Exp. Fluids 57, 11
(2015)
Reisenhofer, R., Bosse, S., Kutyniok, G., Wiegand, T.: A Haar wavelet-based perceptual similarity
index for image quality assessment. Sig. Process. Image 61, 33–43 (2018)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image
segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI).
LNCS, vol. 9351, pp. 234–241. Springer (2015)
Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J.: Shiftable multiscale transforms.
IEEE Trans. Inform. Theory 38, 587–607 (1992)
Starck, J.-L., Murtagh, F., Fadili, J.: Sparse Image and Signal Processing: Wavelets, Curvelets,
Morphological Diversity. Cambridge University Press, Cambridge (2010)
Yi, S., Labate, D., Easley, G.R., Krim, H.: A shearlet approach to edge analysis and detection.
IEEE Trans. Image Process. 18, 929–941 (2009)
Learned Regularizers for Inverse Problems
31
Sebastian Lunz
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1134
Shallow Learned Regularizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
Bilevel Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
Dictionary Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1137
Deep Regularizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1138
Regularization Properties of Learned Regularizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1138
Adversarial Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1141
Total Deep Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145
Summary and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1149
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1149
Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1151
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1152
Abstract
In the past years, there has been a surge of interest in methods to solve
inverse problems that are based on neural networks and deep learning. A variety
of approaches have been proposed, showing improvements in reconstruction
quality over existing methods. Among those, a class of algorithms builds
on the well-established variational framework, training a neural network as
a regularization functional. Those approaches come with the advantage of a
theoretical understanding and a stability theory that is built on existing results
for variational regularization. We discuss various approaches for learning a
S. Lunz ()
Department of Applied Mathematics and Theoretical Physics, University of Cambridge,
Cambridge, UK
e-mail: [email protected]
Keywords
Introduction
y = Ax + ε,   (1)
arg max_x log p(x|y) = arg min_x [− log p(y|x) − log p(x)].   (3)

The expression log p(y|x) is captured by the data term D(Ax, y), whereas the regularization functional can be viewed as an approximation to the negative log prior. This viewpoint motivates investigating priors beyond their ability to stabilize reconstruction, explaining the success of widely used handcrafted priors such as total variation (TV) that capture the distinct properties of the distribution of images, such as sharp edges, more closely than Tikhonov-type regularization.
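To fix ideas, here is a minimal NumPy sketch of variational reconstruction with a handcrafted prior: gradient descent on ‖Ax − y‖²/2 + λ·TV_ε(x), using a smoothed total variation so that plain gradient descent applies. The forward operator is passed as a pair of callables, and the step size, λ, and smoothing parameter ε are illustrative.

```python
import numpy as np

def grad(u):
    """Forward-difference image gradient (periodic boundary)."""
    gx = np.roll(u, -1, axis=0) - u
    gy = np.roll(u, -1, axis=1) - u
    return gx, gy

def tv_grad(u, eps=1e-3):
    """Gradient of the smoothed TV functional: -div(grad u / |grad u|_eps)."""
    gx, gy = grad(u)
    n = np.sqrt(gx**2 + gy**2 + eps**2)
    px, py = gx / n, gy / n
    div = (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))
    return -div

def variational_reconstruction(y, A, At, lam=0.1, step=0.2, iters=300):
    """Gradient descent on ||A x - y||^2 / 2 + lam * TV_eps(x) for a generic linear
    forward operator given as callables A and At (its adjoint)."""
    x = At(y)
    for _ in range(iters):
        x = x - step * (At(A(x) - y) + lam * tv_grad(x))
    return x

# usage for plain denoising (A = identity), y a noisy NumPy image:
# x_hat = variational_reconstruction(y, A=lambda v: v, At=lambda v: v)
```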
While TV has enjoyed great success in the past decades, its representation of
the behavior of images remains limited, assuming them to be piecewise constant.
As this is not true for many images, TV-based regularization is known to intro-
duce staircasing artefacts into reconstructions. To overcome these drawbacks, the
research community has shifted its focus to learning priors from data directly, with the goal of obtaining a more realistic and detailed image representation. More precisely, one aims at utilizing a training set {(x^i, y^i)} of ground truth images x^i and corresponding measurements y^i.
For those methods, the trained network ΨΘ (·, A) can be applied directly to new
measurements at inference. On the other hand, approaches based on learning a reg-
ularization functional RΘ typically separate between the training procedure of RΘ
and the reconstruction step, using a variational functional of the form (2) or a similar
functional for reconstruction. While those methods in general perform slightly
worse than methods based on a direct parametrization that are trained end-to-end
(Adler and Öktem 2018), they often allow for stability and convergence guarantees
and enable a statistical interpretation of the learned functional. In this survey, we
will in particular discuss Network Tikhonov (NETT) in section “Regularization
Properties of Learned Regularizers”, adversarial regularizers in section “Adversarial
Regularization”, and total deep variation in section “Total Deep Variation”.
Some hybrid approaches invoke a variational problem (or an early stopped
version of it) but aim at parametrizing the gradient of the regularization functional
instead of the functional directly (Kobler et al. 2017; Romano et al. 2017). While
these methods have shown very good reconstruction results, we will omit them
in our summary, focusing instead on methods that parametrize a regularization
functional directly. In particular, approaches like regularization by denoising (RED)
cannot always guarantee that the learned gradient is in fact the gradient of some
functional.
Finally, deep image priors (Ulyanov et al. 2018) use the network architecture
itself, without prior training, as a regularization term. These methods however are
crucially reliant on early stopping, and we will not discuss them in detail here, but
instead, refer to Ulyanov et al. (2018) for details.
Outline In this summary, we first give a brief overview of classical approaches to learning regularization functionals that do not make use of deep neural networks
in section “Shallow Learned Regularizers”. We then discuss three approaches for
using neural networks as regularization functionals in detail in section “Deep
Regularizers”: Network Tikhonov in section “Regularization Properties of Learned
Bilevel Learning
The generic framework of (5) has been used in various contexts to learn a
regularization functional RΘ . A prominent example is learning TV-type regularizers
that consist of one or multiple regularization functionals based on the ℓ¹ norm of
the gradient or smoothed versions thereof (Kunisch and Pock 2013; Calatroni et al.
2012). More complex regularization functionals, such as the field of experts (FoE)
model (Roth and Black 2005), have also been trained using bilevel learning (Chen
et al. 2013). In this setting, a linear combination of filters is learned from data.
Deriving sharp optimality conditions for bilevel learning generally requires the
lower-level problem in (5) to be sufficiently regular. Under sufficient smoothness
assumptions on the inner problem, optimality conditions can be established, and
the problem (5) can be solved utilizing suitable techniques from PDE-constrained
optimization.
In general, solving (5) is hard, with the problem being non-convex in Θ even
in simple scenarios such as the operator A = Id being the identity (Arridge
et al. 2019), making it challenging to scale bilevel techniques to highly parametric
regularization functionals such as those given by neural networks. However, the
concept of empirical risk minimization, i.e., of using a term of the form
Θ̂ ∈ arg min_Θ Σ_i ℓ(x_Θ^i, x^i)
is widely used to train neural networks, and we will see an approach that utilizes
a term of this form to train a deep regularization functional in the chapter on total
deep variation (Kobler et al. 2020).
Dictionary Learning
Dictionary learning is based on the concept that the model parameter has a sparse
representation in some dictionary D. Approaches for dictionary learning (Aharon
et al. 2006; Dabov et al. 2007; Xu et al. 2012) can be classified by the strategy taken
to learn the dictionary D, which can be defined a priori in an analytical form, can be
learned before reconstruction from data, or can be generated at reconstruction time,
where the latter is mostly used in patch-based approaches.
A common approach in this context is sparse dictionary learning, aiming at
learning a dictionary D from a collection of samples x_i ∈ X by minimizing the functional

arg min_{D,ξ} Σ_i d_X(x_i, Dξ_i) + μ‖ξ_i‖_1,   (6)
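A minimal NumPy sketch of sparse dictionary learning in the spirit of (6), assuming a squared Euclidean data-fidelity term: the method alternates between ISTA updates of the sparse codes ξ_i for fixed D and a least-squares update of D with renormalized columns. This is a simplified stand-in for methods such as K-SVD (Aharon et al. 2006); all parameters are illustrative.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dictionary_learning(X, n_atoms=64, mu=0.1, outer=20, inner=50):
    """Alternating minimization for sum_i ||x_i - D xi_i||^2 + mu ||xi_i||_1:
    ISTA on the codes for fixed D, then a least-squares dictionary update.
    X holds the training samples x_i as columns."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    Z = np.zeros((n_atoms, n))
    for _ in range(outer):
        L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the code update
        for _ in range(inner):                     # sparse coding step (ISTA)
            Z = soft(Z - (D.T @ (D @ Z - X)) / L, mu / L)
        D = X @ np.linalg.pinv(Z)                  # dictionary update (least squares)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, Z
```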
Deep Regularizers
In the network Tikhonov (NETT) paper (Li et al. 2020), the authors propose one
of the earliest approaches to learning a regularization functional from data using
tools from deep learning. The authors put a strong emphasis on deducing stability
results for the resulting algorithm that resemble the classical theory of variational
regularization (Engl et al. 1996).
The authors study the inverse problem associated with (1) in the general setting
of (X, ‖·‖) and (Y, ‖·‖) being reflexive Banach spaces with domain D. We denote by δ the noise level such that the noise satisfies ‖ε‖ ≤ δ. The authors restrict their study to regularization functionals of the form
Assumption 1.
– Network regularizer R:
• the regularizer is defined by (8);
• The linear part of the affine layers in ΨΘ is bounded;
• The activation functions σ are weakly continuous;
• The functional φ is weakly lower semi-continuous.
– Data consistency term D:
• For some τ > 1 we have ∀y0 , y1 , y2 ∈ Y : D(y0 , y1 ) ≤ τ D(y0 , y2 ) +
τ D(y2 , y1 );
• ∀y0 , y1 ∈ Y : D(y0 , y1 ) = 0 ⇐⇒ y0 = y1 ;
• ∀(yk )k∈N ∈ Y N : yk → y ⇒ D(yk , y) → 0;
• The functional (x, y) → D(A(x), y) is sequentially lower semi-continuous.
– Coercivity condition:
• R_Θ(·) is coercive, that is, R_Θ(x) → ∞ as ‖x‖ → ∞.
lim_{δ→0} λ(δ) = lim_{δ→0} δ/λ(δ) = 0   (10)
Training scheme and results While the main emphasis of the paper is on an
extensive stability and convergence theory, the authors also propose an algorithm
for training a regularization functional. In particular, they choose a parametrization
of the form
R_Θ(x) = Σ_i |Ψ_Θ,i(x)|^q,
where Ψ_Θ,i(x) denotes the i-th component of Ψ_Θ(x). In order to train Ψ_Θ, the authors propose an encoder-decoder-based architecture that invokes a decoder network Φ in addition to the encoder Ψ_Θ. The joint architecture is trained to detect the characteristic artefacts in unregularized reconstructions, as shown in Fig. 1. The heuristic motivation behind this is that the resulting network is able to decompose a given reconstruction into the parts that belong to the underlying image and those that are reconstruction artefacts only. By penalizing the ℓ^q norm of the noise part only, typical noise patterns are suppressed during reconstruction without introducing
artefacts in the underlying image. Note the similarity to adversarial training as
discussed in the next section on adversarial regularizers (Lunz et al. 2018).
The authors employ subgradient descent for solving the minimization problem
(9) and show results for photoacoustic tomography (PAT), as seen in Fig. 2.
Note that the authors and further researchers have published a variety of
extension papers based on the NETT theory discussed here. These papers include
discussions on improved training schemes and architectures as well as on further
fields of applications (Obmann et al. 2020a,b). The NETT paper (Li et al. 2020) can
be viewed as the theoretical foundation and first result in this direction.
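A minimal PyTorch sketch of NETT-style reconstruction under simplifying assumptions: a small, smooth convolutional encoder stands in for the pre-trained Ψ_Θ, the regularizer is Σ_i |Ψ_Θ,i(f)|^q with q = 2, and plain gradient descent replaces the subgradient scheme of the paper; the network architecture, λ, and step sizes are illustrative.

```python
import torch
import torch.nn as nn

# hypothetical small encoder Psi_Theta; in NETT this would be trained beforehand
# to respond strongly to reconstruction artefacts (widths are illustrative)
encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.Softplus(),
                        nn.Conv2d(16, 16, 3, padding=1), nn.Softplus())

def nett_regularizer(f, q=2):
    """R_Theta(f) = sum_i |Psi_{Theta,i}(f)|^q for the (fixed) encoder network."""
    return encoder(f).abs().pow(q).sum()

def nett_reconstruct(y, A, At, lam=0.01, step=1e-3, iters=100):
    """Gradient descent on ||A f - y||^2 + lam * R_Theta(f); A and At are callables
    implementing the forward operator and its adjoint on torch tensors."""
    f = At(y).clone().requires_grad_(True)
    opt = torch.optim.SGD([f], lr=step)
    for _ in range(iters):
        opt.zero_grad()
        loss = (A(f) - y).pow(2).sum() + lam * nett_regularizer(f)
        loss.backward()
        opt.step()
    return f.detach()

# usage for denoising (A = identity), y of shape (1, 1, H, W):
# f_hat = nett_reconstruct(y, lambda v: v, lambda v: v)
```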
Adversarial Regularization
The paper “adversarial regularizers” (Lunz et al. 2018) introduces a regime for
learning regularization functionals, training the functional to reduce the distribu-
tional distance between reconstructions and true images. While there are similarities
between the training regimes in this paper and in the previously discussed NETT
(Li et al. 2020) approach, the authors of the adversarial regularizer paper focus their
Fig. 2 Results for photoacoustic tomography reconstruction using the NETT approach on the
Shepp-Logan phantom. (Taken from Li et al. 2020)
The authors argue that a good regularization functional RΘ is able to tell apart
the distributions Pr and Pn . The authors use this as a motivation to choose the
loss functional for training a neural network ΨΘ that directly parametrizes the
regularization functional RΘ = ΨΘ as
E_{X∼P_r}[Ψ_Θ(X)] − E_{X∼P_n}[Ψ_Θ(X)] + μ · E[(‖∇_x Ψ_Θ(X)‖ − 1)²_+],   (11)
where the last term in the loss functional serves to enforce the trained network ΨΘ
to be Lipschitz continuous with constant one.
Written using the empirical distributions instead, the training loss (11) reads as
Σ_i Ψ_Θ(x_i) − Σ_i Ψ_Θ(A†y_i) + μ Σ_i (‖∇_x Ψ_Θ(ξ_i)‖ − 1)²_+,
where the points ξ_i are chosen randomly on the straight line between x_i and A†y_i.
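A minimal PyTorch sketch of one evaluation of this training objective in its empirical form: the critic Ψ_Θ is a small convolutional network (widths illustrative), and the gradient penalty is evaluated at random convex combinations of ground-truth and unregularized-reconstruction batches, as described above.

```python
import torch
import torch.nn as nn

# small critic network Psi_Theta (architecture illustrative)
critic = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.LeakyReLU(),
                       nn.Conv2d(16, 16, 3, padding=1), nn.LeakyReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

def critic_loss(x_real, x_noisy, mu=10.0):
    """Empirical version of (11): push Psi_Theta down on ground-truth images, up on
    unregularized reconstructions, plus a one-Lipschitz gradient penalty evaluated
    on random convex combinations of the two batches."""
    eps = torch.rand(x_real.size(0), 1, 1, 1)
    xi = (eps * x_real + (1 - eps) * x_noisy).requires_grad_(True)
    grad = torch.autograd.grad(critic(xi).sum(), xi, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(dim=1) - 1).clamp(min=0) ** 2).mean()
    return critic(x_real).mean() - critic(x_noisy).mean() + mu * penalty

# one training step on a paired batch (x_i, A^dagger y_i):
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
# loss = critic_loss(x_real_batch, x_noisy_batch); loss.backward(); opt.step()
```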
The authors make this choice of penalty term for its connection to the Wasserstein
distance between the distributions Pr and Pn that allows them to deduce the follow-
ing theorem on the gradient flow over a perfectly trained regularization functional.
Here, perfectly trained refers to the functional being 1-Lipschitz and perfectly
minimizing the Wasserstein distance in the Kantorovich duality formulation

Wass(P_r, P_n) = sup_{f ∈ 1-Lip} [E_{X∼P_n} f(X) − E_{X∼P_r} f(X)].   (12)
d/dη Wass(P_r, P_η)|_{η=0} = −E_{X∼P_n} ‖∇_x Ψ_Θ(X)‖².
d/dη [Ψ_Θ(g_η(X))]|_{η=0} = −1   (14)
The authors are hence able to show that the regularization functional trained via
(11) can in fact optimally reduce the Wasserstein distance between reconstructions
and ground truth images, at least at the initial step of the gradient descent scheme.
The authors extend their analysis by deducing an explicit form of the regularization
functional in the specific scenario of the true distribution being concentrated along
a manifold M ⊂ X.
Assumption 2. Denote by P_M the data manifold projection, where D denotes the set of points for which such a projection exists. We assume P_n(D) = 1. This can be guaranteed under weak assumptions on M and P_n. We make the assumption that the measures P_r and P_n satisfy (P_M)_# P_n = P_r, i.e., for every measurable set A ⊂ X, we have P_n(P_M^{−1}(A)) = P_r(A).
Theorem 3 (Data Manifold Distance (Thm 2 Lunz et al. 2018)). Under Assump-
tion 2, a maximizer to the functional
The authors motivate the theorem as a consistency result, demonstrating that the
approach yields reasonable regularization functionals in the particular setting of the
theorem.
The paper also contains a stability result with a similar flavor to Theorem 1 of the NETT paper. The analysis is however less exhaustive and requires stronger assumptions on the operator A, making it less readily applicable to all inverse problems than Theorem 1. On a technical level, the key difference is that the NETT paper develops assumptions that ensure that the learned regularization functional is coercive.
Computational Results The authors show results for the discussed algorithm for
denoising and computed tomography reconstruction. They show improved results
compared to classical approaches such as total variation (Engl et al. 1996), but do not
match results obtained with end-to-end trained algorithms such as the post-processing approach for computed tomography (Jin et al. 2017) or a DnCNN (Zhang et al. 2017) for denoising. Figure 3 shows results for denoising, whereas Fig. 4 contains results for computed tomography reconstruction.
Total Deep Variation

The recent paper Kobler et al. (2020) follows the paradigm of end-to-end learning
in order to obtain a regularization functional, using a distance functional between
reconstruction and ground truth as training objective. In general, unrolling methods
Fig. 3 Denoising results for the adversarial regularizer on the Berkeley Segmentation dataset
(BSDS500). (Taken from Lunz et al. 2018). (a) Ground Truth. (b) Noisy Image. (c) TV. (d)
Denoising N.N. (e) Adversarial Reg.
Fig. 4 Reconstruction from simulated CT measurements on the LIDC dataset using adversarial
regularizers. (Taken from Lunz et al. 2018). (a) Ground Truth. (b) FBP. (c) TV. (d) Post-Processing.
(e) Adversarial Reg.
such as Adler and Öktem (2017), Meinhardt et al. (2017), and Kobler et al. (2017)
recover an image xT from measurements y by applying
where the iteration is typically initialized with a pseudo-inverse x0 and stopped after
a fixed predefined number of steps N. The parameters Θ are trained by minimizing
a loss functional
Σ_i ℓ(x_N^i, x_T^i)   (20)
over the parameters Θ for a collection of samples {x_T^i, y^i} and a notion of distance ℓ that is typically chosen to be the ℓ² distance. Various approaches differ in their choice of parametrization of Ψ_Θ, ranging from architectures that do not further restrict the mapping properties of Ψ_Θ to those that explicitly separate out gradient terms obtained from the data term and the image prior, leading to the form
While these methods have been shown to yield high-quality reconstructions, they cannot
readily be understood using the viewpoint of variational regularization, as the
regularization or image prior is implicitly contained in the mapping properties of
the network ΨΘ . Even if parameterized as in (21), the network parametrizes the
gradients of an implicit regularization functional rather than the functional directly.
An additional challenge in bridging the gap between unrolling-based methods and variational methods lies in the fixed choice of the number of iterations N, which is typically small
and prohibits viewing xN as the result of a minimization of a variational problem.
The authors of Kobler et al. (2020) address these problems by introducing
two novel contributions: firstly, instead of parametrizing the gradient of the reg-
ularization functional, the functional itself is parametrized directly. While this
makes training slightly more challenging, requiring double backpropagation for
minimization, it yields a true regularization functional that can be interpreted as
Architecture For an image x ∈ RnC , where n denotes the number of pixels and C
the number of channels, the authors parametrize a regularization functional R of the
form
R_Θ(x) = Σ_{i=1}^{n} r(x, Θ)_i,   r(x, Θ) = ω^T N(Kx) ∈ R^n.   (22)
Fig. 5 Schematic of the multi-scale network N used in the total deep variation regularizer, built from macro-blocks M_i combined by downsampling, upsampling, concatenation, and addition operations
x_{s+1}^i = x_s^i − (T/S) A^T(A x_{s+1}^i − y^i) − (T/S) ∇R_Θ(x_s^i),   (24)
The training objective is simultaneously minimized for the time horizon T and
the parameters Θ that determine the form of the regularization functional. The
stochastic ADAM optimizer is used for minimization. The zero-mean constraint on the regularization functional is enforced by projection after every minimization step.
Note that differentiating (23) with respect to (T , Θ) involves derivatives of the
regularization functional RΘ (x) with respect to both Θ and x. These terms are
handled in a numerically efficient way using the double backpropagation algorithm.
The algorithm can be applied in this context as the architecture and activation func-
tions used have been chosen to be C 2 . The application of double backpropagation
separates this work from earlier attempts at learning a regularization functional with
a loss functional of the form (20).
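A minimal PyTorch sketch of the unrolled scheme under simplifying assumptions: a small smooth network stands in for the total deep variation regularizer, the gradient flow is discretized explicitly (a simplification of (24)), and ∇R_Θ is obtained by automatic differentiation with create_graph=True, so that backpropagating the training loss through it amounts to double backpropagation; all sizes and step counts are illustrative.

```python
import torch
import torch.nn as nn

# hypothetical smooth regularizer network (stand-in for the multi-scale TDV network)
reg_net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.Softplus(),
                        nn.Conv2d(8, 1, 3, padding=1))

def R(x):
    return reg_net(x).sum()

def unrolled_flow(y, A, At, T, S=10):
    """Explicit discretization of the gradient flow over the variational energy:
    S steps of size T/S, with grad R obtained by autograd so that the whole scheme
    stays differentiable in T and Theta (enabling double backpropagation)."""
    x = At(y).detach().requires_grad_(True)
    for _ in range(S):
        grad_R = torch.autograd.grad(R(x), x, create_graph=True)[0]
        x = x - (T / S) * (At(A(x) - y) + grad_R)
    return x

# training: minimize ||x_S - x_gt||^2 jointly over Theta and the time horizon T
T = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam(list(reg_net.parameters()) + [T], lr=1e-4)
# x_S = unrolled_flow(y_batch, A, At, T); loss = ((x_S - x_gt) ** 2).mean()
# loss.backward(); opt.step()   # backpropagating through grad_R = double backprop
```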
The authors also derive various theorems to characterize the solutions of (23).
Theorem 5 (Existence of a solution (Thm 2.1 Kobler et al. 2020)). The time
continuous version of (23) alongside its corresponding state equation (24) admits a
solution in the sense that the infimum is attained.
Fig. 6 TDV results for denoising with various choices for the time horizon. (Taken from Kobler
et al. 2020)
functional RΘ , the norm of the gradient of the regularization functional, and various
others. Details can be found in Theorem 3.2 in the paper.
Results The authors show results for their TDV approach on a variety of inverse
problems. For denoising, the approach is able to outperform approaches like BM3D
(Dabov et al. 2007) as well as some end-to-end trained approaches like DnCNN
(Zhang et al. 2017), but slightly underperforms compared to FOCNet (Jia et al.
2019). The latter has roughly one hundred times more parameter than the TDV
approach. Results for denoising are shown in Fig. 6 for various choices of time
discretization S and time horizon T . As expected, choosing the time horizon lower
than the learned optimal parameter leads to under-regularization, while choosing it
higher leads to over-regularization.
This chapter also discusses applications of the approach to medical imaging,
demonstrating that a prior trained for computed tomography reconstruction of
abdominal CT images can be readily applied for MRI reconstruction of knee images
– a task that differs both in the imaging modality and in the characteristics of the
images occurring. This shows that TDV generalizes well between different tasks
and image characteristics. Results for MRI reconstruction can be seen in Fig. 7.
Conclusion
Fig. 7 TDV results for MRI reconstruction. (Taken from Kobler et al. 2020)
Stability Results The NETT paper contains an extensive stability analysis that
is applicable to a wide variety of inverse problems. On a technical level, the
theory does not make any assumptions on the data term being coercive and
ensures coercivity by discussing sufficient conditions for the learned regularization
31 Learned Regularizers for Inverse Problems 1151
functional to be coercive. This in particular allows the application of the theory to ill-
posed inverse problems. The adversarial regularizer paper on the other hand makes
strong assumptions on the properties of the forward operator, which can be violated
in the context of ill-posed inverse problems. Most of the theoretical analysis in the
paper focuses on discussing the effects of the learned regularization functional on
the distribution of reconstructions instead of focusing on an instance-level stability
theory. For the total deep variation approach, the authors include a discussion in
terms of optimal control theory as well as stability with respect to changes in the
training dataset, but do not derive stability results that are equivalent to the classical
stability theory for inverse problems.
Training Data Both the NETT and the TDV approach rely on paired training data
consisting of measurements and their corresponding ground truth images. While the
first one can be extended to an unpaired setting when changing the training scheme
(Obmann et al. 2020b), TDV is fundamentally dependent on paired data. Looking
at marginals of distributions only, the adversarial regularizer approach can naturally
handle unpaired training data.
Outlook
References
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33(12), 124007 (2017)
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete
dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Calatroni, L., Cao, C., De Los Reyes, J.C., Schönlieb, C.-B., Valkonen, T.: Bilevel approaches for
learning of variational imaging models. RADON Book Series 18 (2012)
Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs
for image restoration. In: German Conference on Pattern Recognition, pp. 271–281. Springer
(2013)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-
domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, vol. 375. Springer
Science & Business Media, Dordrecht (1996)
Jia, X., Liu, S., Feng, X., Zhang, L.: Focnet: a fractional optimal control network for image denois-
ing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 6054–6063 (2019)
Jin, K.H., McCann, M., Froustey, E., Unser, M.: Deep convolutional neural network for inverse
problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
Kobler, E., Klatzer, T., Hammernik, K., Pock, T.: Variational networks: connecting variational
methods and deep learning. In: German Conference on Pattern Recognition, pp. 281–293.
Springer (2017)
Kobler, E., Effland, A., Kunisch, K., Pock, T.: Total deep variation for linear inverse problems
(2020)
Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational
models. SIAM J. Imaging Sci. 6(2), 938–983 (2013)
Li, H., Schwab, J., Antholzer, S., Haltmeier, M.: NETT: solving inverse problems with deep neural
networks. Inverse Probl. 36, 065005 (2020)
Lunz, S., Öktem, O., Schönlieb, C.-B.: Adversarial regularizers in inverse problems. In: Bengio,
S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in
Neural Information Processing Systems 31, pp. 8507–8516. Curran Associates, Inc., Red Hook
(2018)
Meinhardt, T., Moller, M., Hazirbas, C., Cremers, D.: Learning proximal operators: using
denoising networks for regularizing inverse imaging problems. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 1781–1790 (2017)
Obmann, D., Schwab, J., Haltmeier, M.: Deep synthesis regularization of inverse problems. arXiv
preprint arXiv:2002.00155 (2020a)
Obmann, D., Nguyen, L., Schwab, J., Haltmeier, M.: Sparse aNETT for solving inverse problems
with deep learning. arXiv preprint arXiv:2004.09565 (2020b)
Romano, Y., Elad, M., Milanfar, P.: The little engine that could: regularization by denoising (red).
SIAM J. Imaging Sci. 10(4), 1804–1844 (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 234–241. Springer (2015)
Roth, S., Black, M.J.: Fields of experts: a framework for learning image priors. In: 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2,
pp. 860–867. IEEE (2005)
Ulyanov, D., Vedaldi, A., Lempitsky, V.: Deep image prior. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 9446–9454 (2018)
Xu, Q., Yu, H., Mou, X., Zhang, L., Hsieh, J., Wang, G.: Low-dose x-ray CT reconstruction via
dictionary learning. IEEE Trans. Med. Imaging 31(9), 1682–1697 (2012)
Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: residual learning
of deep cnn for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)
Filter Design for Image Decomposition
and Applications to Forensics 32
Robin Richter, Duy H. Thai, Carsten Gottschlich,
and Stephan F. Huckemann
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1156
Applications and Challenges for Automated Image Decomposition . . . . . . . . . . . . . . . . . . . 1157
Diffusion Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1160
Fourier and Wavelet Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1160
Variational Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1162
Non-linear Spectral Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1164
Texture Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1165
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1166
Adaptive Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1167
Adapting the Data-Fidelity-Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1168
Connection to the G-Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1168
Other Choices of M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
Connections with Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
Solving via the ADMM/AL-Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1170
Interpretation via a Feasibility Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1171
A General Learning Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1175
Filter Design Using Factor Families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1177
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1178
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1179
Abstract
Keywords
Introduction
Image decomposition is one of the first and crucial steps in image analysis,
be it decomposition into signal and noise, foreground and background, or more
refined, such as decompositions into cartoon, texture, and noise. Often, under
theoretical technical assumptions, precise objective functions are employed to this
end; however, for specific applications, they are not a priori available. The latter
is the case, for instance, when at crime scenes, latent fingerprints or shoeprints
are to be compared to print scans taken from suspects at hand who are released
immediately after. For expert comparison taking place afterwards, the quality of the
scanned prints is decisive. This quality, however, can only be defined indirectly,
for example, by requiring that improved quality corresponds to improved (lowered) error rates. In this application scenario, other image processing steps surface as well,
namely, image enhancement, for example, of latent prints, and image compression
to significant features, for example, in large databases.
In this chapter we give, guided by examples from forensics, a brief overview
of image decomposition methods from the past to the present with emphasis on a
unified viewpoint for some current challenges.
In acoustic signal processing, digital filter design was first inspired by analog electric filtering circuits, and this has also inspired filter design for images. Images, however, have fundamentally different features from acoustic signals. While for
the former Fourier decomposition was highly effective, image analysis required
different types of analysis, for example, Haar wavelet frames for sharp edge
modeling (Daubechies 1992; Mallat 2008). Other popular approaches are given
by diffusion equations (Perona and Malik 1990; Weickert 1998) or minimization
problems (Mumford and Shah 1989; Scherzer et al. 2009). This has led to the
development of entirely new mathematical frameworks, often connected with one
another (Steidl et al. 2004; Burger et al. 2016).
Very often, images contain an object of interest (or several) within a region of
interest (ROI), for example, the area covered by a latent fingerprint or shoeprint
in forensics (cf. Fig. 1), a tumor within an organ in medical imaging, faces observed
Fig. 1 Latent fingerprint images from the NIST special database 27 (left) (cf. Garris and McCabe
(2000) with boundary (drawn in yellow) of the estimated region of interest by the DG3PD of Thai
and Gottschlich (2016a)) and from Wiesner et al. (2020b) two overlapping shoeprints with similar
shoe pattern elements (right). A natural question is: Are those from the same shoe?
Fig. 2 Fingerprint ridge lines of very good quality following an orientation field, ending or forking
at minutiae (left, from Turroni et al. 2011). Shoeprint (detail from Fig. 1) with sole pattern and
pattern damages called accidentals (right). Here the black dots with circular white halo due to sand
grains and the dark black clusters due to dirt need to be discriminated from true wear effects, for
instance, on the left side of the brand’s logo
For shoeprint analysis, due to the larger challenges given by the huge diversity
of shoe element patterns and accidental structures, automated comparison is still in
its very beginnings, e.g., Wiesner et al. (2020a,b).
Diffusion Methods
Solving the heat equation with initial conditions given by the image at hand, and
following it over time, is one of the oldest smoothing methods. Over time, first
smaller structures are smoothed, and then also bigger structures disappear, until,
after infinite time, no information remains. This calls for smart choices of stopping
times, and, in order to preserve specific structures for a longer time, alterations
of the diffusion differential equation. For instance, Perona and Malik (1990), and
subsequently Alvarez et al. (1992), impede diffusion along image gradients by
anisotropic nonlinear diffusion, thus steering diffusion along rather constant image
intensity regions.
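To make the idea concrete, the following is a minimal numpy sketch of a Perona-Malik-type diffusion; the conductivity function, step size, and periodic boundary handling (via np.roll) are illustrative choices and not the specific discretizations used in the cited works.

```python
import numpy as np

def perona_malik(u, n_iter=50, kappa=0.1, dt=0.2):
    """Minimal sketch of Perona-Malik diffusion (one common explicit scheme).

    kappa controls edge sensitivity, dt the time step (dt <= 0.25 for stability).
    Boundaries are treated periodically via np.roll for brevity.
    """
    u = u.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # conductivity: small across strong gradients
    for _ in range(n_iter):
        # one-sided differences to the four neighbors
        dN = np.roll(u, 1, axis=0) - u
        dS = np.roll(u, -1, axis=0) - u
        dE = np.roll(u, -1, axis=1) - u
        dW = np.roll(u, 1, axis=1) - u
        u += dt * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return u
```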
In fingerprint images, as detailed above, the estimation of orientation fields is of
high importance. Due to the small inter-ridge distances in fingerprints, however, image
gradients in low-quality fingerprint images are heavily influenced by noise and cannot
be relied on. To this end, Perona (1998) applied orientation diffusion
to estimate a smooth orientation field. Such separately estimated orientation fields
(for alternative methods, see, e.g., Bazen and Gerez 2002) have been used by Gottschlich
and Schönlieb (2012) for fingerprint enhancement (cf. Fig. 3 for this and related
methods).
Fourier and Wavelet Methods
In the context of image processing, Fourier, wavelet, curvelet, and similar trans-
formations map an image from an image domain into a spectral, wavelet, etc.
domain, apply some form of thresholding, and map the result under the inverse
transformation back to the image domain, giving a filtered image. Such methods
may serve all ends of noise removal, cartoon and texture identification, image
Fig. 3 A low-quality fingerprint (a) from Turroni et al. (2011) and the corresponding orientation
field (d), where orientations in degrees are encoded as gray values between 0 and 179, with
0 denoting the direction of the x-axis and angles increasing clockwise. Compared enhancement
methods are (b) gradient-based coherence-enhancing diffusion filtering according to Weickert
(1999), (c) STFT analysis by Chikkerur et al. (2007), (e) curved Gabor filters by Gottschlich
(2012), and (f) oriented diffusion filtering by Gottschlich and Schönlieb (2012)
Fig. 4 Factorized directional bandpass (FDB) method from Thai et al. (2016): Soft-thresholding
the result of 16 directional filters in the Fourier domain (first factor), binarizing the reconstruction
(second factor) in the image domain, and morphological operations lead to identification of the
ROI used in Fig. 5
to discriminate between real and spoof fingers. Spoof fingerprints are artificial
fingerprints created from materials such as gelatin or latex; cf. Maltoni et al. (2009). Factorized
directional bandpass (FDB) filters have been built by Thai et al. (2016) using
the directional Hilbert transform of a Butterworth bandpass (DHBB) filter and
soft-thresholding; cf. Fig. 4. Curiously, thresholding can be viewed as testing with
statistical significance for the presence of non-zero filter response coefficients;
cf. (Donoho and Johnstone 1994; Frick et al. 2012). The FDB filters have been
optimized for texture extraction from fingerprint images with the purpose of
segmentation; see Fig. 5.
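To convey the flavor of such a pipeline, here is a small numpy sketch that applies wedge-shaped directional bandpass masks in the Fourier domain followed by soft-thresholding; the masks, radii, threshold, and binarization are toy stand-ins for illustration only and not the DHBB/FDB filters of Thai et al. (2016).

```python
import numpy as np

def directional_bandpass(img, n_dir=16, r_lo=0.05, r_hi=0.25, thresh=0.1):
    """Toy directional bandpass filtering with soft-thresholding of the responses."""
    n, m = img.shape
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(m)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    angle = np.arctan2(fy, fx) % np.pi            # orientations in [0, pi)
    F = np.fft.fft2(img)
    responses = []
    for k in range(n_dir):
        lo, hi = k * np.pi / n_dir, (k + 1) * np.pi / n_dir
        mask = (radius >= r_lo) & (radius < r_hi) & (angle >= lo) & (angle < hi)
        resp = np.real(np.fft.ifft2(F * mask))    # directional filter response
        # soft-thresholding of the response (first factor)
        responses.append(np.sign(resp) * np.maximum(np.abs(resp) - thresh, 0.0))
    texture = sum(responses)
    return texture, (np.abs(texture) > 0)         # crude binarization (second factor)
```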
Variational Problems
Variational problems have played an important role in imaging over the last decades;
see, for example, Scherzer et al. (2009) and Aubert and Kornprobst (2006). As
for anisotropic diffusion the emphasis lies on computing image approximations
Fig. 5 Four examples of estimated fingerprint segmentation by FDB from Thai et al. (2016)
that keep sharp edges (discontinuities) while removing uninformative noise and/or
texture. Possibly the most influential model is the Rudin-Osher-Fatemi (ROF)
model of Rudin et al. (1992). In its unconstrained form it is given below as
Problem 1 (cf. Chambolle and Lions 1997, who, however, use Neumann boundary conditions,
as discussed below). It can be seen as a convex recast of the more intricate
Mumford-Shah (MS) model (Mumford and Shah 1989), which involves a non-convex
term (the Hausdorff measure of the discontinuity set); for this reason the MS model is difficult
to minimize numerically. Instead, the ROF model includes a total variation term as
regularizer that, when discretized, is given by an $\ell_1$-norm of the discrete gradient.
Note that instead of the Neumann boundary conditions that are often enforced in
the continuous domain (e.g., in the classical Rudin et al. 1992), we, being in the
discrete domain, prefer periodic boundary conditions. Then, the discrete gradient
comes in handy as a circular convolution operator denoted by CD ; cf. Definition 1
in section “Adaptive Balancing”.
Solving (1) via steepest descent has led Andreu et al. (2001) to consider a
corresponding partial differential equation (PDE) with weak solutions coined as
total variation (TV) flow.
Meanwhile, to alleviate the systematic loss of contrast of the classical ROF
model, Osher et al. (2005) propose Bregman iterations beginning with the
ROF solution, passing near a putative noise-free version and eventually converging
in an inverse scale-space flow to the original noisy image.
An extension using higher order derivatives has led to the total generalized
variation (TGV) model in Bredies et al. (2010) with more detail in Papafitsoros and
Bredies (2015). For a detailed overview of total variation in imaging, see Caselles
et al. (2015) and the chapter in this book. In the context of relating different imaging
techniques to one another, Steidl et al. (2004), among others, link the balancing
parameter μ of (1) to the stopping time in the anisotropic diffusion model discussed
in section “Diffusion Methods”. In the following we also consider the general
regularization problem given for an input image F ∈ Rn×m by
Texture Information
for U, V ∈ Rn×m and C(V ) being the curvelet decomposition (Candès et al. 2006)
of V , which is to be minimized under the constraints
$$\|C(\varepsilon)\|_\infty \le \delta, \qquad F = U + V + \varepsilon,$$
where $C$ is the same curvelet transform of Candès et al. (2006) and $\mu_1, \mu_2, \delta \in \mathbb{R}$.
Due to orientation sensitivity of the curvelet transform, G3PD is well suited to
capture the fringe pattern of a fingerprint in the texture component; see Fig. 6.
In automated practice, when applied to images not containing other small-scale
information with frequencies similar to those of the fingerprint pattern, parameters can
be tuned well to specific sensors, such that ROIs are reliably extracted. On crime
Fig. 6 Decomposition by G3PD of a fingerprint image F from Thai and Gottschlich (2016b) into
three parts: cartoon (U ), texture (V ), noise (ε)
scene images, however, ideal parameter choices often vary substantially over images
with different backgrounds, calling for more flexibility of the model and for specific
learning methods.
Machine Learning
We have thus far reported how for specific tasks at hand (e.g., segmentation,
enhancement) specific tools have been designed, often using elaborate parameter
tuning. In fact such ideal parameters often vary over varying use cases (e.g., G3PD
requires different parameter choices when large regions contain small-scale patterns
not related to the fingerprint at hand, as is often the case in real crime scene
images). This calls for designing more flexible models and learning methods to
incorporate heterogeneous use cases. Notably, when abundant data are available,
nearly any machine learning method off the shelf usually works well. The less
data are available, however, the more a priori structure must be built into learning
methods. This is, for instance, the case in academic forensic research. For example,
supervised learning models involving second order minutiae structure have resulted
in a highly discriminatory test for separating real fingerprints from synthetic images,
where training, validation, and test sets together comprised only 110 fingers (with
8 impressions per finger) per class; cf. Gottschlich and Huckemann (2014).
The very small size of data sets in fingerprint recognition and forensic appli-
cations stands in stark contrast to databases for image classification and visual
object recognition like ImageNet which contains more than 14 million images
(http://image-net.org/about-stats). Very large data sets enable fully automatic end-
to-end learning by neural networks (Bengio 2009), whereas a very small number of
training examples poses a huge additional machine learning challenge for biometric
and forensic research. For the task of fingerprint quality estimation using image
decomposition, Richter et al. (2019) proposed a new robust biometric quality vali-
dation scheme (RBQVS) based on repeated random subsampling cross-validation
to deal with the problematic lack of a sufficient number of training and test images.
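The cross-validation scheme itself is generic; the following numpy sketch shows repeated random subsampling cross-validation with user-supplied fit and score callables (hypothetical interfaces), not the specific RBQVS protocol of Richter et al. (2019).

```python
import numpy as np

def repeated_subsampling_cv(X, y, fit, score, n_repeats=50, test_frac=0.3, seed=0):
    """Repeated random subsampling cross-validation (generic sketch).

    X, y: numpy arrays of samples and labels.
    fit(X_train, y_train) -> model, score(model, X_test, y_test) -> float.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    n_test = int(test_frac * n)
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)                  # fresh random split each repeat
        test, train = perm[:n_test], perm[n_test:]
        model = fit(X[train], y[train])
        scores.append(score(model, X[test], y[test]))
    return np.mean(scores), np.std(scores)
```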
For fingerprint alteration detection, even fewer examples are available for training
and testing, and Gottschlich et al. (2015) also resort to cross-validation in order
to compare different approaches. Biometric and forensic applications could profit
immensely from research on, e.g., deep learning (LeCun et al. 2015), evolutionary
algorithms (Kennedy and Shi 2001), Bayesian learning (Neal 2012), support vector
machines (Schoelkopf and Smola 2002), or random forests (Breiman 2001) if only
larger data sets were available.
Regarding the aforementioned imaging approaches (cf. sections “Diffusion
Methods”, “Fourier and Wavelet Methods”, “Variational Problems”), a multitude
of machine learning extensions has been proposed in the literature. For
example, anisotropic diffusion has been learned by De los Reyes and Schönlieb
(2013), and Chen and Pock (2017) have learned reaction diffusion models, while
Grossmann et al. (2020) learn TV transform filters. Arridge et al. (2019) give a
survey on solving ill-posed inverse problems based on deep learning, with domain-
specific knowledge contained in physical–analytical models.
Adaptive Balancing
where $B * U$ is the usual circular convolution of matrices (e.g., Mallat 2008) with
components given by
$$(B * U)[r, s] = \sum_{k=0}^{n-1}\sum_{\ell=0}^{m-1} B[k, \ell]\, U[r - k, s - \ell].$$
Then the (forward) discrete gradient (with periodic boundary conditions) is given
by the matrix-family convolution
$$C_D : \mathbb{R}^{n\times m} \to \bigl(\mathbb{R}^{n\times m}\bigr)^2, \qquad U \mapsto \bigl(C_{D_1}(U), C_{D_2}(U)\bigr)^T,$$
where
$$D_1[k, \ell] := \begin{cases} -1 & \text{if } [k, \ell] = [0, 0],\\ 1 & \text{if } [k, \ell] = [1, 0],\\ 0 & \text{else}, \end{cases} \qquad D_2[k, \ell] := \begin{cases} -1 & \text{if } [k, \ell] = [0, 0],\\ 1 & \text{if } [k, \ell] = [0, 1],\\ 0 & \text{else}. \end{cases}$$
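As a quick numerical illustration (a sketch only), circular convolution with such small filter kernels can be computed via the FFT; the function below follows the convolution convention stated above.

```python
import numpy as np

def circ_conv(B, U):
    """Circular convolution (B * U)[r, s] = sum_{k, l} B[k, l] U[r-k, s-l] via the FFT."""
    n, m = U.shape
    B_pad = np.zeros((n, m))
    B_pad[:B.shape[0], :B.shape[1]] = B            # embed the small filter kernel
    return np.real(np.fft.ifft2(np.fft.fft2(B_pad) * np.fft.fft2(U)))

# discrete gradient with periodic boundary conditions, C_D(U) = (C_D1(U), C_D2(U))
D1 = np.array([[-1.0], [1.0]])                     # difference in the first coordinate
D2 = np.array([[-1.0, 1.0]])                       # difference in the second coordinate
U = np.random.default_rng(0).normal(size=(8, 8))
grad_U = (circ_conv(D1, U), circ_conv(D2, U))
```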
with the discrete gradient $C_D$ and the new balancing filter $M \in \mathbb{R}^{n\times m}$ featuring
$\hat{M} \in \mathbb{R}^{n\times m}_+$ (where $\hat{M}$ denotes the discrete Fourier transform of $M$). Notably, Aujol and Gilboa
(2006) also allow operators more general than the circular convolution operator $C_M$
above. Of course, the ROF model is a special case of (4), obtained by choosing $C_M$ as
multiplication with the balancing parameter $\mu$. Let us first ponder a connection
of (4) with the G-norm given in (3) and then some of the literature considering (4).
$$f - u = \operatorname{div}(g) = \Delta P.$$
Plugging the above into the model of Vese and Osher (2003), one obtains (cf.
Equation (2.1) in Osher et al. 2003) for $\lambda \in \mathbb{R}_+$ the objective function
$$\int |\nabla u| + \lambda \int \bigl|\nabla(\Delta^{-1})(f - u)\bigr|^2.$$
Other Choices of M
$$L_\sigma(\xi) := \frac{1}{1 + (2\pi\sigma\xi)^4}.$$
The advantage of the functional $J_{B,M}$ given in (5) lies in its convexity, in the
smoothness of its data-fidelity term, and in the fact that the norm appearing in its
regularizer is well understood. In the following we focus on the alternating directions
method of multipliers (ADMM) in the context of augmented Lagrangian (AL) approaches.
While the convergence of ADMM/AL to the exact solution is often slower compared
to other methods, its convergence to a neighborhood, even when given
bad starting values, is rather satisfactory. The method of multipliers has been
introduced by Powell (1969) and Hestenes (1969). For a general result on the
setup and convergence of ADMM/AL algorithms in the context of minimization
via the augmented Lagrangian, see Theorem 8 of Eckstein and Bertsekas (1992) and
the references therein.
There have been various other algorithms proposed for minimizing functionals
such as JB,M from (5). The original ROF model, a special case of (4), was
solved by Rudin et al. (1992) via a rather slow gradient descent algorithm. Popular
later approaches include projection algorithms (Chambolle 2004; Aujol and Gilboa
2006), the use of Bregman distances (Goldstein and Osher 2009), graph-cut methods
(Darbon and Sigelle 2006a,b), and forward-backward splitting (Chambolle and
Pock 2011). For an in-depth overview, we refer to Chambolle and Pock (2016)
and Goldstein et al. (2014).
The functional $J_{B,M}$ of (5) contains the non-linear regularization term
$\|C_B(U)\|_{1,\kappa}$, which cannot be minimized simply by differentiation with respect
to $U$. For this reason a new additional variable $W$ is introduced, taking the place of
$C_B(U)$. This yields the constrained problem
$$\text{minimize } J_{B,M}(U, W) := \|W\|_{1,\kappa} + \frac{\mu}{2}\bigl\|C_M(F - U)\bigr\|^2 \quad\text{such that } C_B(U) = W, \text{ over } U \in \mathbb{R}^{n\times m} \text{ and } W \in \bigl(\mathbb{R}^{n\times m}\bigr)^P, \qquad (6)$$
with the corresponding augmented Lagrangian
$$J_{AL}(U, W, \lambda) := \|W\|_{1,\kappa} + \frac{\mu}{2}\bigl\|C_M(F - U)\bigr\|^2 + \frac{\beta}{2}\bigl\|W - C_B(U)\bigr\|^2 + \bigl\langle \lambda,\, W - C_B(U)\bigr\rangle. \qquad (7)$$
$$W^{(\tau)} = \underset{W \in (\mathbb{R}^{n\times m})^P}{\arg\min}\ J_{AL}\bigl(U^{(\tau-1)}, W; \lambda^{(\tau)}\bigr),$$
$$U^{(\tau)} = \underset{U \in \mathbb{R}^{n\times m}}{\arg\min}\ J_{AL}\bigl(U, W^{(\tau)}; \lambda^{(\tau)}\bigr),$$
$$\lambda^{(\tau+1)} = \lambda^{(\tau)} + \beta\bigl(W^{(\tau)} - C_B(U^{(\tau)})\bigr).$$
end for
Notably, a saddle-point of (7) does not depend on the choice of β. To solve for the
saddle-point, Algorithm 1 alternates between minimizing JAL for W and U (one
iteration of an ADMM algorithm), while updating in each iteration the Lagrangian
multiplier λ via a gradient step.
We now show that Algorithm 1, which converges to the saddle-point of JAL , solves
a special case of a broader feasibility problem (Problem 3). Before that, we state a
(seemingly) different feasibility problem directly derived from the above updating
rules.
To prepare for the proof of equivalence of the above feasibility problem and the
one stated further below (Problem 3), let us first compute the minimizers of $J_{AL}$
with respect to $W$ and $U$, using standard variational calculus (see, e.g., Boyd and
Vandenberghe 2004; Bauschke and Combettes 2011).
The minimizer of
$$J_1(W) := \|W\|_{1,\kappa} + \frac{\beta}{2}\bigl\|W - C_B(U)\bigr\|^2 + \langle \lambda, W\rangle$$
is given by
$$W^\dagger = S_\kappa\Bigl(C_B(U) - \frac{1}{\beta}\lambda;\ \frac{1}{\beta}\Bigr), \qquad (9)$$
and the minimizer of
$$J_2(U) := \frac{\mu}{2}\bigl\|C_M(F - U)\bigr\|^2 + \frac{\beta}{2}\bigl\|W - C_B(U)\bigr\|^2 - \bigl\langle \lambda, C_B(U)\bigr\rangle$$
is given by
$$U^\dagger = \mu\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_M^* C_M(F) + \beta\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_B^*\Bigl(W + \frac{1}{\beta}\lambda\Bigr). \qquad (10)$$
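The exact definition of the thresholding operator $S_\kappa$ is not reproduced above; a minimal numpy sketch under the common convention (component-wise soft-thresholding for $\kappa = 1$, channel-wise group shrinkage for $\kappa = 2$) reads as follows.

```python
import numpy as np

def soft_threshold(x, t):
    """Component-wise soft-thresholding (kappa = 1): shrinks x toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def soft_threshold_isotropic(W, t, eps=1e-12):
    """Group shrinkage over the filter channels (a common choice for kappa = 2).

    W has shape (P, n, m); the magnitude over the P channels is shrunk by t.
    """
    mag = np.sqrt((W ** 2).sum(axis=0))
    return W * (np.maximum(mag - t, 0.0) / (mag + eps))
```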
Abbreviating the two operators in (10), we introduce $A \in \mathbb{R}^{n\times m}$ and $\tilde{B} \in (\mathbb{R}^{n\times m})^P$ such
that
$$C_A := \mu\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_M^* C_M \qquad (11)$$
and
$$C_{\tilde B}^* := \beta\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_B^*.$$
This gives the above anticipated feasibility problem. As before, for any matrix $A$, $\hat{A}$
denotes its discrete Fourier transform.
1. $\hat{Y} \in \mathbb{R}^{n\times m}_+$,
2. $C_Y C_B^* C_B$ has all eigenvalues in $[0, 1)$.
Moreover, define $\tilde{B} \in (\mathbb{R}^{n\times m})^P$ via
$$C_A = E - C_{\tilde B}^*\, C_B,$$
Theorem 1. Let $F \in \mathbb{R}^{n\times m}$, $B \in (\mathbb{R}^{n\times m})^P$ and $\kappa \in \{1, 2\}$. For given $M \in \mathbb{R}^{n\times m}$ such
that $\hat{M} \in \mathbb{R}^{n\times m}_+$, let $\mu \in \mathbb{R}_+$ and let $(U^\dagger, W^\dagger, \lambda^\dagger) \in (\mathbb{R}^{n\times m})^{1+2P}$ be a solution of
Problem 2. Then, with $Y$ defined accordingly,
we have that $\hat{Y} \in \mathbb{R}^{n\times m}_+$, that $C_Y C_B^* C_B$ has all eigenvalues in $[0, 1)$, and that
$(U^\dagger, W^\dagger, \lambda^\dagger)$ is a solution of Problem 3.
Vice versa, let $Y \in \mathbb{R}^{n\times m}$ such that $\hat{Y} \in \mathbb{R}^{n\times m}_+$ and $C_Y C_B^* C_B$ has all eigenvalues
in $[0, 1)$, let $\nu \in \mathbb{R}_+$, and let $(U^\dagger, W^\dagger, \lambda^\dagger) \in (\mathbb{R}^{n\times m})^{1+2P}$ be a solution of Problem 3.
Then, defining $\mu = 1$ and $C_M$ as the unique positive semi-definite square root of
$$\nu C_Y^{-1}\bigl(E - C_Y C_B^* C_B\bigr) = \nu C_Y^{-1} - \nu C_B^* C_B$$
(existing due to the eigenvalues of $C_Y C_B^* C_B$ being strictly less than $1$), we have $\hat{M} \in \mathbb{R}^{n\times m}_+$ and $(U^\dagger, W^\dagger, \lambda^\dagger)$ is a solution of Problem 2.
Recall that the solution does not depend on the choice of $\beta \in \mathbb{R}_+$ for $J_{AL}$. Hence,
w.l.o.g., we can set $\beta = \mu$. Moreover, setting $\nu = \mu$ we have at once by (9) that
the definitions of $\mathcal{G}_1^\kappa$ of Problem 2 and of Problem 3 coincide. Since this set is the
same for both problems, we are left to show that $(U^\dagger, W^\dagger, \lambda^\dagger) \in \mathcal{G}_2$.
Defining $Y \in \mathbb{R}^{n\times m}$ as in the assertion, we have at once that $\hat{Y} \in \mathbb{R}^{n\times m}_+$.
Moreover, since matrix convolution operators are diagonalized by the discrete
Fourier transform, the eigenvalues of $C_Y C_B^* C_B$ are given by
$$\beta\Bigl(\mu \hat{M}[k, \ell]^2 + \beta\sum_{p=1}^{P}\bigl|\hat{B}_p[k, \ell]\bigr|^2\Bigr)^{-1}\sum_{p=1}^{P}\bigl|\hat{B}_p[k, \ell]\bigr|^2 \in [0, 1),$$
because $\hat{M} \in \mathbb{R}^{n\times m}_+$.
Last, by (10) we have that
$$U^\dagger = \mu\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_M^* C_M(F) + \beta\bigl(\mu C_M^* C_M + \beta C_B^* C_B\bigr)^{-1} C_B^*\Bigl(W^\dagger + \frac{1}{\beta}\lambda^\dagger\Bigr)$$
$$= \frac{\mu}{\beta}\, C_Y C_M^* C_M(F) + C_Y C_B^*\Bigl(W^\dagger + \frac{1}{\beta}\lambda^\dagger\Bigr) = C_A(F) + C_{\tilde B}^*\Bigl(W^\dagger + \frac{1}{\nu}\lambda^\dagger\Bigr).$$
Hence, $(U^\dagger, W^\dagger, \lambda^\dagger) \in \mathcal{G}_2$, yielding that $(U^\dagger, W^\dagger, \lambda^\dagger)$ is a solution of Problem 3.
Vice versa, let now $F, Y \in \mathbb{R}^{n\times m}$, $B \in (\mathbb{R}^{n\times m})^P$, $\nu \in \mathbb{R}_+$, and $\kappa \in \{1, 2\}$, with
$\hat{Y} \in \mathbb{R}^{n\times m}_+$, and suppose that $C_Y C_B^* C_B$ has all eigenvalues in $[0, 1)$. Further, let
$(U^\dagger, W^\dagger, \lambda^\dagger) \in (\mathbb{R}^{n\times m})^{1+2P}$ be a solution of Problem 3.
The filter $Y \in \mathbb{R}^{n\times m}$ introduced in Problem 3 had to satisfy two properties. In order
to generalize beyond these, we formalize them as relations between $B$ and $\tilde{B}$ and
already add a relaxed version, which comes first.
Definition 2. Let $(B, \tilde{B}) \in (\mathbb{R}^{n\times m})^{2P}$.
$$0 \le \sum_{p=1}^{P} \hat{\tilde{B}}_p[k, \ell]\, \hat{B}_p[k, \ell] < 1 \qquad \text{for all } 0 \le k \le n-1 \text{ and } 0 \le \ell \le m-1.$$
Algorithm 2
Input: $F \in \mathbb{R}^{n\times m}$.
Input Filters: $(A, B, \tilde{B}) \in (\mathbb{R}^{n\times m})^{1+2P}$.
Customizable Parameters: $\beta \in \mathbb{R}_+$, $\kappa \in \{1, 2\}$.
Initialization: $U^{(0)} = F \in \mathbb{R}^{n\times m}$, $\lambda^{(1)} = 0 \in (\mathbb{R}^{n\times m})^P$.
for $\tau = 1, 2, \ldots$ do
$$W^{(\tau)} = S_\kappa\Bigl(C_B\bigl(U^{(\tau-1)}\bigr) - \frac{1}{\beta}\lambda^{(\tau)};\ \frac{1}{\beta}\Bigr),$$
$$U^{(\tau)} = C_A(F) + C_{\tilde B}^*\Bigl(W^{(\tau)} + \frac{1}{\beta}\lambda^{(\tau)}\Bigr),$$
$$\lambda^{(\tau+1)} = \lambda^{(\tau)} + \beta\bigl(W^{(\tau)} - C_B(U^{(\tau)})\bigr).$$
end for
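For concreteness, here is a small numpy sketch of the above iteration for the ROF-type special case in which $B = (D_1, D_2)$ is the discrete gradient, $C_M$ is the identity, and $\kappa = 1$; the filters $A$ and $\tilde{B}$ are built from (11), and all parameter values are illustrative rather than tuned choices.

```python
import numpy as np

def fft_filter(h, shape):
    """FFT of a small filter kernel zero-padded to the image shape (circular convolution)."""
    H = np.zeros(shape)
    H[:h.shape[0], :h.shape[1]] = h
    return np.fft.fft2(H)

def soft(x, t):
    """Component-wise soft-thresholding S_1(x; t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def decompose(F, mu=20.0, beta=20.0, iters=100):
    """Sketch of Algorithm 2 for B = (D1, D2), C_M = identity, kappa = 1."""
    n, m = F.shape
    D1 = np.array([[-1.0], [1.0]])
    D2 = np.array([[-1.0, 1.0]])
    B_hat = [fft_filter(D1, F.shape), fft_filter(D2, F.shape)]
    M_hat2 = np.ones((n, m))                               # |M_hat|^2 for C_M = identity
    denom = mu * M_hat2 + beta * sum(np.abs(Bh) ** 2 for Bh in B_hat)
    A_hat = mu * M_hat2 / denom                            # Fourier multiplier of C_A, cf. (11)
    F_hat = np.fft.fft2(F)
    U = F.copy()
    lam = [np.zeros_like(F) for _ in B_hat]
    for _ in range(iters):
        U_hat = np.fft.fft2(U)
        # W-update: soft-threshold the filtered image, cf. (9)
        W = [soft(np.real(np.fft.ifft2(Bh * U_hat)) - l / beta, 1.0 / beta)
             for Bh, l in zip(B_hat, lam)]
        # U-update: closed-form FFT solve, cf. (10) and (11)
        rhs_hat = A_hat * F_hat
        for Bh, Wp, l in zip(B_hat, W, lam):
            rhs_hat += (beta / denom) * np.conj(Bh) * np.fft.fft2(Wp + l / beta)
        U = np.real(np.fft.ifft2(rhs_hat))
        # multiplier update
        U_hat = np.fft.fft2(U)
        lam = [l + beta * (Wp - np.real(np.fft.ifft2(Bh * U_hat)))
               for l, Wp, Bh in zip(lam, W, B_hat)]
    return U
```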
As $C_D$ is a discrete gradient, the operator $C_D^* C_D$ is a discrete Laplace operator. The
filter $A$ can now be recast via the Laplacian B-spline $\varphi$ defined in Van De Ville et al.
(2005), given in the frequency domain by
$$\varphi(x, y) := \left(\frac{4\bigl(\sin^2\frac{x}{2} + \sin^2\frac{y}{2}\bigr) - \frac{8}{3}\sin^2\frac{x}{2}\,\sin^2\frac{y}{2}}{x^2 + y^2}\right)^{\frac{\gamma}{2}}. \qquad (14)$$
In Van De Ville et al. (2005) the function $\varphi$ served as a scaling function to construct
bi-orthogonal wavelets. With a similar construction, $(B, \tilde{B})$ can be obtained from a
bi-orthogonal, directional wavelet frame construction (for the exact construction,
see Richter et al. 2020, Appendix C of Richter 2019, and also Mallat 2008; Unser
and Ville 2010). Note that if one were to use orthogonal wavelet frames, one would
be in the strongly factoring regime, which is exactly the case for the Gabor wavelet
frames proposed by Aujol and Gilboa (2006). The heuristic derivation of the new
filter $A$, elaborated above, draws on a similar connection as in Cai et al. (2012),
where the discrete gradient $C_D$ is recast as a Haar wavelet frame, the first-order
cardinal B-spline (e.g., Chui 1992).
Fig. 7 Applying Algorithm 2 with filter families based on (14) in Problem 4 to the shoeprint detail
(from Fig. 2), featuring sharp edges, little blurring, and minimal loss of contrast (left). From this
cartoon picture, shoeprint elements are detected by a classical edge detection (Canny 1986) filter
(right). For instance, the wear effect (called accidental, cf. section “Applications and Challenges
for Automated Image Decomposition”) on the left of the brand’s logo is no longer part of the
corresponding element’s edge
Conclusion
Additionally, since the use case is often defined only indirectly (e.g., improved
quality is reflected by improved matching rates, as in Richter et al. 2019), this calls
for the development of
In this chapter we have given a short survey on current research with emphasis on a
recent development that seems promising in view of the above-stated goals (1)–(4).
Acknowledgments The authors thank the anonymous referee for the valuable comments, and the
first and last authors gratefully acknowledge funding by the DFG within the RTG 2088.
References
NIST fingerprint quality (NFIQ) (2015) https://www.nist.gov/services-resources/software/nist-
biometric-image-software-nbis. Accessed: 2017-12-04
Alvarez, L., Lions, P.-L., Morel, J.-M.: Image selective smoothing and edge detection by nonlinear
diffusion. II. SIAM J. Numer. Anal. 29(3), 845–866 (1992)
Andreu, F., Ballester, C., Caselles, V., Mazón, J.M.: Minimizing total variation flow. Differ. Integral
Equ. 14(3), 321–360 (2001)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numerica 28, 1–174 (2019)
Aubert, G., Kornprobst, P.: Mathematical problems in image processing, volume 147 of Applied
Mathematical Sciences. Springer, New York, 2nd edn. Partial differential equations and the
calculus of variations, With a foreword by Olivier Faugeras (2006)
Aujol, J.-F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded
variation component and an oscillating component. J. Math. Imaging Vision 22(1), 71–88
(2005)
Aujol, J.-F., Chambolle, A.: Dual norms and image decomposition models. Int. J. Comput. Vis.
63(1), 85–104 (2005)
Aujol, J.-F., Gilboa, G.: Constrained and SNR-based solutions for TV-Hilbert space image
denoising. J. Math. Imaging Vision 26(1–2), 217–237 (2006)
Aujol, J.-F., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition—modeling,
algorithms, and parameter selection. Int. J. Comput. Vis. 67(1), 111–136 (2006)
Bartůněk, J., Nilsson, M., Sällberg, B., Claesson, I.: Adaptive fingerprint image enhancement with
emphasis on preprocessing of data. IEEE Trans. Image Process. 22(2), 644–656 (2013)
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces.
CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New York. With
a foreword by Hédy Attouch (2011)
Bazen, A., Gerez, S.: Systematic methods for the computation of the directional fields and singular
points of fingerprints. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 905–919 (2002)
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science
and Applied Mathematics. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New
York/London (1982)
Bigun, J.: Vision with Direction. Springer, Berlin/Germany (2006)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Bredies, K., Dong, Y., Hintermüller, M.: Spatially dependent regularization parameter selection in
total generalized variation models for image restoration. Int. J. Comput. Math. 90(1):109–123
(2013)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imag. Sci. 3(3), 492–526
(2010)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Buades, A., Le, T., Morel, J.-M., Vese, L.: Fast cartoon + texture image filters. IEEE Trans. Image
Process. 19(8), 1978–1986 (2010)
Burger, M., Gilboa, G., Moeller, M., Eckardt, L., Cremers, D.: Spectral decompositions using one-
homogeneous functionals. SIAM J. Imag. Sci. 9(3), 1374–1408 (2016)
Cai, J.-F., Dong, B., Osher, S., Shen, Z.: Image restoration: total variation, wavelet frames, and
beyond. J. Am. Math. Soc. 25(4), 1033–1089 (2012)
Calatroni, L., Cao, C., De Los Reyes, J.C., Schönlieb, C.-B., Valkonen, T.: Bilevel approaches for
learning of variational imaging models. Variational Meth Imaging Geometric Control 18(252),
2 (2017)
Candès, E., Demanet, L., Donoho, D., Ying, L.: Fast discrete curvelet transforms. Multiscale
Model. Simul. 5(3), 861–899 (2006)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell.
8(6), 679–698 (1986)
Caselles, V., Chambolle, A., Novaga, M.: Total variation in imaging. In Handbook of Mathematical
Methods in Imaging. Vol. 1, 2, 3. Springer, New York (2015), pp. 1455–1499
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging
Vision 20(1–2), 89–97 (2004)
Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems.
Numerische Mathematik 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numerica
25, 161–319 (2016)
Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and
effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1256–1272 (2017)
Chikkerur, S., Cartwright, A., Govindaraju, V.: Fingerprint image enhancement using STFT
analysis. Pattern Recogn. 40(1), 198–211 (2007)
Chui, C.K.: An introduction to Wavelets, Volume 1 of Wavelet Analysis and its Applications.
Academic Press, Inc., Boston (1992)
Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation. I. Fast and exact
optimization. J. Math. Imaging Vision 26(3), 261–276 (2006a)
Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation. II. Levelable
functions, convex priors and non-convex cases. J. Math. Imaging Vision 26(3), 277–291 (2006b)
Daubechies, I.: Ten lectures on wavelets, volume 61 of CBMS-NSF Regional Conference Series
in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia
(1992)
De los Reyes, J.C., Schönlieb, C.-B.: Image denoising: learning the noise model via nonsmooth
PDE-constrained optimization. Inverse Probl. Imaging 7(4), 1183–1214 (2013)
Donoho, D.L., Johnstone, J.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3),
425–455 (1994)
Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point
algorithm for maximal monotone operators. Math. Program. 55(3, Ser. A), 293–318 (1992)
Frick, K., Marnitz, P., Munk, A., et al.: Statistical multiresolution Dantzig estimation in imaging:
fundamental concepts and algorithmic framework. Electron. J. Stat. 6, 231–268 (2012)
Garnett, J.B., Le, T.M., Meyer, Y., Vese, L.A.: Image decompositions using bounded variation
and generalized homogeneous Besov spaces. Appl. Comput. Harmon. Anal. 23(1), 25–56
(2007)
Garris, M.D., McCabe, R.M.: Nist special database 27: Fingerprint minutiae from latent and match-
ing tenprint images. Technical Report 6534, National Institute of Standards and Technology,
Gaithersburg (2000)
Gilboa, G.: A total variation spectral framework for scale and texture analysis. SIAM J. Imag. Sci.
7(4), 1937–1961 (2014)
Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization
methods. SIAM J. Imag. Sci. 7(3), 1588–1623 (2014)
Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imag.
Sci. 2(2), 323–343 (2009)
Gottschlich, C.: Curved-region-based ridge frequency estimation and curved Gabor filters for
fingerprint image enhancement. IEEE Trans. Image Process. 21(4), 2220–2227 (2012)
Gottschlich, C., Huckemann, S.: Separating the real from the synthetic: Minutiae histograms as
fingerprints of fingerprints. IET Biom. 3(4), 291–301 (2014)
Gottschlich, C., Mikaelyan, A., Olsen, M., Bigun, J., Busch, C.: Improving fingerprint alteration
detection. In: Proceedings of 9th International Symposium on Image and Signal Processing and
Analysis (ISPA 2015), pp. 83–86, Zagreb (2015)
Gottschlich, C., Schönlieb, C.-B.: Oriented diffusion filtering for enhancing low-quality fingerprint
images. IET Biom. 1(2), 105–113 (2012)
Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: Wavelet-Markov local descriptor for
detecting fake fingerprints. Electron. Lett. 50(6), 439–441 (2014)
Grossmann, T.G., Korolev, Y., Gilboa, G., Schönlieb, C.-B.: Deeply learned spectral total variation
decomposition. arXiv preprint arXiv:2006.10004 (2020)
Hait, E., Gilboa, G.: Spectral total-variation local scale signatures for image manipulation and
fusion. IEEE Trans. Image Process. 28(2), 880–895 (2018)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)
Hopper, T., Brislawn, C., Bradley, J.: WSQ gray-scale fingerprint image compression specification.
Technical report, Federal Bureau of Investigation (1993)
Horesh, D., Gilboa, G.: Separation surfaces in the spectral TV domain for texture decomposition.
IEEE Trans. Image Process. 25(9), 4260–4270 (2016)
Kennedy, J.R.E., Shi, Y.: Swarm Intelligence. Academic, San Diego (2001)
Le, T.M., Vese, L.A.: Image decomposition using total variation and div(BMO). Multiscale Model.
Simul. 4(2), 390–423 (2005)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Ma, J., Plonka, G.: The curvelet transform. IEEE Signal Process. Mag. 27(2), 118–133 (2010)
Mallat, S.: A Wavelet Tour of Signal Processing. Academic, San Diego (2008)
Maltoni, D., Maio, D., Jain, A.K., Prabhakar, S.: Handbook of Fingerprint Recognition. Springer,
London (2009)
Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. American
Mathematical Society, Boston (2001)
Moeller, M., Diebold, J., Gilboa, G., Cremers, D.: Learning nonlinear spectral filters for color
image reconstruction. In: Proceedings of the IEEE International Conference on Computer
Vision, pp. 289–297 (2015)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Neal, R.M.: Bayesian Learning for Neural Networks, Vol. 118. Springer Science & Business Media
(2012)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total
variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005)
Osher, S., Solé, A., Vese, L.: Image decomposition and restoration using total variation minimiza-
tion and the H −1 norm. Multiscale Model. Simul. 1(3), 349–370 (2003)
Papafitsoros, K., Bredies, K.: A study of the one dimensional total generalised variation regulari-
sation problem. Inverse Prob. Imaging 9(2), 511 (2015)
Perona, P.: Orientation diffusions. IEEE Trans. Image Process. 7(3), 457–467 (1998)
Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Optimization
(Symposium, University of Keele, Keele, 1968), pp. 283–298. Academic, London (1969)
Richter, R.: Cartoon-Residual Image Decompositions with Application in Fingerprint Recognition.
Ph.D. thesis, Georg-August-University of Goettingen (2019)
Richter, R., Gottschlich, C., Mentch, L., Thai, D., Huckemann, S.: Smudge noise for quality
estimation of fingerprints and its validation. IEEE Trans. Inf. Forensics Secur. 14(8), 1963–
1974 (2019)
Richter, R., Thai, D.H., Huckemann, S.: Generalized intersection algorithms with fixpoints for
image decomposition learning. arXiv preprint arXiv:2010.08661 (2020)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60(1–4), 259–268 (1992)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Springer (2009)
Schmidt, M.F., Benning, M., Schönlieb, C.-B.: Inverse scale space decomposition. Inverse Prob.
34(4), 1–34 (2018)
Schoelkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Shen, J.: Piecewise H −1 + H 0 + H 1 images and the Mumford-Shah-Sobolev model for segmented
image decomposition. AMRX Appl. Math. Res. Express (4), 143–167 (2005)
Steidl, G., Weickert, J., Brox, T., Mrázek, P., Welk, M.: On the equivalence of soft wavelet
shrinkage, total variation diffusion, total variation regularization, and SIDEs. SIAM J. Numer.
Anal. 42(2), 686–713 (2004)
Strong, D., Chan, T.: Edge-preserving and scale-dependent properties of total variation regulariza-
tion. Inverse Prob. 19(6), S165–S187 (2003). Special section on imaging
Thai, D., Gottschlich, C.: Directional global three-part image decomposition. EURASIP J. Image
Video Process. 2016(12), 1–20 (2016a)
Thai, D., Gottschlich, C.: Global variational method for fingerprint segmentation by three-part
decomposition. IET Biom. 5(2), 120–130 (2016b)
Thai, D., Huckemann, S., Gottschlich, C.: Filter design and performance evaluation for fingerprint
image segmentation. PLoS ONE 11(5), e0154160 (2016)
Turroni, F., Maltoni, D., Cappelli, R., Maio, D.: Improving fingerprint orientation extraction. IEEE
Trans. Inf. Forensics Secur. 6(3), 1002–1013 (2011)
Unser, M., Ville, D.V.D.: Wavelet steerability and the higher-order Riesz transform. IEEE Trans.
Image Process. 19(3), 636–652 (2010)
Van De Ville, D., Blu, T., Unser, M.: Isotropic polyharmonic B-splines: scaling functions and
wavelets. IEEE Trans. Image Process. 14(11), 1798–1813 (2005)
Vese, L., Osher, S.: Modeling textures with total variation minimization and oscillatory patterns in
image processing. J. Sci. Comput. 19(1–3), 553–572 (2003)
Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vis. 31(2/3), 111–127 (1999)
Wiesner, S., Kaplan-Damary, N., Eltzner, B., Huckemann, S.F.: Shoe prints: The path from
practice to science. In: Banks, D., Kafadar, K., Kaye, D. (eds.) Handbook of Forensic Statistics,
pp. 391–410. Springer (2020a)
Wiesner, S., Shor, Y., Tsach, T., Kaplan-Damary, N., Yekutieli, Y.: Dataset of digitized racs and
their rarity score analysis for strengthening shoeprint evidence. J. Forensic Sci. 65(3), 762–774
(2020b)
Wu, C., Tai, X.-C.: Augmented Lagrangian method, dual methods, and split Bregman iteration for
ROF, vectorial TV, and high order models. SIAM J. Imag. Sci. 3(3), 300–339 (2010)
Yang, Y., Sun, J., Li, H., Xu, Z.: Deep ADMM-net for compressive sensing MRI. In: 30th
Conference on Neural Information Processing Systems (NIPS 2016), pp. 10–18 (2016)
Yao, Z., Le Bars, J.-M., Charrier, C., Rosenberger, C.: A literature review of fingerprint quality
assessment and its evaluation. IET J. Biom. 5(3), 243–251 (2016)
Zeune, L., van Dalum, G., Terstappen, L.W., van Gils, S.A., Brune, C.: Multiscale segmentation
via Bregman distances and nonlinear spectral analysis. SIAM J. Imaging Sci. 10(1), 111–146
(2017)
Deep Learning Methods for Limited Data
Problems in X-Ray Tomography 33
Johannes Schwab
Contents
Introduction 1184
Background 1185
Tomographic Image Reconstruction 1186
Deep Learning 1187
Case Examples in X-Ray CT 1190
Limited Angle Computed Tomography 1192
Reduction of Metal Artefacts 1196
Low-Dose Computed Tomography 1197
Further Methods 1198
Conclusion 1199
References 1200
Abstract
J. Schwab
Department of Mathematics, University of Innsbruck, Innsbruck, Austria
e-mail: [email protected]
Keywords
Introduction
Most modern medical imaging methods rely on the solution of an inverse problem,
meaning that for given measured data $g \in Y$ and a physics-based forward model
$R : X \to Y$, the task is to estimate the cause $f \in X$ of the observed measurements
under the model $R$. In an ideal setting, this amounts to solving the following task:
Fig. 1 Different sources of imperfect data in tomographic imaging. LEFT, incomplete data (e.g.,
limited angle CT); MIDDLE, corrupted data due to high-intensity region (e.g., a metal artifact);
RIGHT, noisy data (e.g., low-dose CT)
• Only incomplete data is available, meaning that only parts of the complete
measurement data are given (Fig. 1).
• Partially corrupted data is measured. Here parts of the measurements are
affected by physical effects not modelled by R (Fig. 1).
• Presence of strong noise in the data. Physical measurements are inevitably
affected by statistical uncertainty; therefore, the measured data cannot be fully
described by the model R (Fig. 1).
All of these scenarios typically lead to ill-posed inverse problems, where the
reconstruction is either non-unique, the reconstruction process unstable, or the data
not in the range of the operator R. These issues can be analyzed by mathematical
regularization theory (Engl et al. 1996). Incomplete data and partially corrupted data
can lead to severe artifacts in the reconstruction. The noise in the data propagates
to the reconstructed image and can be severely amplified in the reconstruction
process if the inverse of the forward operator is discontinuous. In all of these
problems, exact direct reconstruction methods are either unavailable or lead to
strong degradation of the reconstructed images. Iterative methods are extremely
flexible and show good performance in all three cases, but come with very high
computational cost. Deep Learning offers an alternative approach that can achieve
good performance while being computationally efficient (Wang 2016).
Background
Fig. 2 Basic deep learning approach to improve analytic image reconstruction. First an analytic
inversion method (derived for ideal data) is applied. In a second step, a deep learning algorithm is
used to improve the initial reconstruction
$$R^+ : Y \supset \operatorname{range}(R) \oplus \operatorname{range}(R)^\perp \to X, \qquad y \mapsto \arg\min\bigl\{\|x\|_X \ \big|\ x \in X \wedge R^*Rx = R^*y\bigr\},$$
the sequence $f_k$ converges to the minimum norm least squares solution $R^+(g)$ of
the inverse problem (1) (Engl et al. 1996).
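The specific iteration is not reproduced above; as one standard example of such an iterative scheme, a Landweber-type iteration can be sketched as follows (the callable interface for the operator and its adjoint is a hypothetical convention for this sketch).

```python
import numpy as np

def landweber(R, Rt, g, n_iter=200, step=1.0, f0=None):
    """Generic Landweber-type iteration f_{k+1} = f_k + step * R^T (g - R f_k).

    R and Rt are callables for the forward operator and its adjoint.
    The step size must satisfy step < 2 / ||R||^2 for convergence.
    """
    f = np.zeros_like(Rt(g)) if f0 is None else f0.copy()
    for _ in range(n_iter):
        f = f + step * Rt(g - R(f))
    return f
```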
One advantage of indirect methods is that they are very flexible and one can
easily add a penalty term P : X → [0, ∞] to obtain solutions that have specific
characteristics (prior knowledge) by finding
Deep Learning
• Supervised learning: Here the input to the task and the corresponding solution
are known for the training set. Therefore, the training set consists of a subset of
A × B where the training pairs are coupled by the problem to be solved.
• Unsupervised learning: No paired data set is available, and the training set only
consists of the inputs; the solutions or even the concrete task is unknown. The
training set consists of a subset of A assumed to have some particular property
which is to be discovered.
$$\Phi^* = \underset{\Phi : A \to B}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} D\bigl(\Phi(a_i), b_i\bigr),$$
where $D$ is some similarity measure in the space $B$. However, a model $\Phi^*$ also has
to predict meaningful solutions for data different from the data used for the fitting;
a model that is unable to make such predictions is more or less useless. To achieve this,
the class of admissible operators is restricted to a subset $\mathcal{C}$ of all mappings $\Phi : A \to B$.
In practice, additional strategies are adopted in the optimization procedure in
order to restrict the class of possible solution operators.
The ultimate goal for the application is to implement a computer program which
finds a good approximation of the operator that is able to solve some specific task,
for example, image analysis and image reconstruction tasks in medical imaging
(Wang 2016). This model optimization is also termed learning of the model. For this
purpose, the user has to feed the computer with experience, called training data, for
example, images or measurement data.
We now introduce a popular approach of setting up such a task-solving machin-
ery for supervised learning problems. The approach consists in parametrizing the
function, which maps a given input to the solution of the problem. A particular class
of such functions is called artificial neural networks (Werbos 1974).
After discretization of the spaces A := RL and B := RM , a feed-forward artificial
neural network is given by
$$\Phi_{\mathbf{W}} : \mathbb{R}^L \to \mathbb{R}^M, \qquad a \mapsto \Phi_{\mathbf{W}}(a) := \sigma_K \circ W_K \circ \sigma_{K-1} \circ W_{K-1} \circ \ldots \circ \sigma_1 \circ W_1(a), \qquad (3)$$
where $W_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i+1}}$ and $\sigma_i : \mathbb{R}^{n_{i+1}} \to \mathbb{R}^{n_{i+1}}$ for $i \in \{1, \ldots, K\}$ are affine linear
operators and point-wise nonlinear mappings, respectively. Further, the subscript $\mathbf{W}$ indicates the
dependence of the function $\Phi_{\mathbf{W}}$ on the operators $W_i$. We consider the real vector
spaces $\mathbb{R}^{n_1} = \mathbb{R}^L$ and $\mathbb{R}^{n_{K+1}} = \mathbb{R}^M$ as input and output spaces. Networks of the form
(3) are called feed-forward networks as they have a sequential, forward directed
or sigmoid functions.
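As a small illustration (a sketch only, with randomly chosen weights), the alternation of affine maps and point-wise nonlinearities in (3) can be written in a few lines of numpy; the ReLU is used here as one common choice of activation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def feed_forward(a, weights, biases, sigma=relu):
    """Evaluate a feed-forward network sigma_K(W_K(...sigma_1(W_1 a + c_1)...) + c_K), cf. (3)."""
    x = a
    for W, c in zip(weights, biases):
        x = sigma(W @ x + c)        # affine layer followed by point-wise nonlinearity
    return x

# toy network R^4 -> R^3 -> R^2 with random weights and biases
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
biases = [rng.normal(size=3), rng.normal(size=2)]
out = feed_forward(rng.normal(size=4), weights, biases)
```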
Given a set of training data $(a_i, b_i)_{i=1}^N$, the goal now is to find good linear
operators such that the neural network fits the training data and is able to generalize
the learned expertise. If we denote the vector of weights by $\mathbf{W} := (W_1, \ldots, W_K)$,
the corresponding minimization problem can be formulated by
$$\text{Find } \mathbf{W} \text{ minimizing } \mathcal{L}(\mathbf{W}) := \frac{1}{N}\sum_{i=1}^{N} D\bigl(\Phi_{\mathbf{W}}(a_i), b_i\bigr), \qquad (4)$$
$$\mathbf{W} \mapsto \mathbf{W} - \eta\,\nabla\mathcal{L}(\mathbf{W}),$$
where η is a parameter determining the step size, also called learning rate in machine
learning. In practice a much cheaper alternative is deployed which only takes into
account the gradient of the cost function L corresponding to a subset of the training
data. Typically these subsets of training instances are randomly selected, resulting
in stochastic gradient descent methods. The partial derivatives of the gradient are
computed by the backpropagation algorithm (Hecht-Nielsen 1992; Higham and
Higham 2019).
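The following toy numpy snippet illustrates the idea of mini-batch stochastic gradient descent on a simple least-squares model (not a neural network, and with an illustrative learning rate); only the gradient over a randomly selected subset of the training data is used in each step.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy training set: inputs a_i in R^5, targets b_i = <w_true, a_i> + noise
A = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
b = A @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(5)
eta, batch = 0.1, 32                               # learning rate and mini-batch size
for step in range(2000):
    idx = rng.integers(0, len(A), size=batch)      # random mini-batch of training instances
    grad = 2.0 / batch * A[idx].T @ (A[idx] @ w - b[idx])   # gradient of the batch loss
    w -= eta * grad                                # W <- W - eta * grad L
```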
Optimization of (4) is challenging, since the cost function is non-convex.
Various techniques for improving this optimization process as well as the
• Evaluating the model with a data set not contained in the training set during
training to estimate the generalization capability of the network; this set is
typically called validation set.
• Including other operations (layers) in the network architecture; some examples
are pooling layers, which reduce the dimension by taking the maximum or
average over a small region of an intermediate output. Further possibilities
to improve generalization properties and optimization are dropout layers and
batch normalization layers and also residual connections and other skip
connections that add or concatenate outputs obtained earlier in the network to
inputs in later stages. Detailed explanation of these building blocks can be found
in Goodfellow et al. (2016) and Lundervold and Lundervold (2019).
• More sophisticated variants of gradient descent algorithms including momentum
or Nesterov updates; a summary and explanation of popular optimization
algorithms are given in Ruder (2016).
• Including a penalty term P for the weights in the cost function and minimizing
$$\mathcal{L}(\mathbf{W}) = \frac{1}{N}\sum_{i=1}^{N} D\bigl(\Phi_{\mathbf{W}}(a_i), b_i\bigr) + \mathcal{P}(\mathbf{W}).$$
The choice of the particular network and optimizer is very important for obtain-
ing the best possible results and depends on the specific task to be solved. Likewise,
the choice of the loss function D plays an important role to obtain a valuable model.
Depending on the specific task, a huge amount of different loss functions have been
proposed, 2 , 1 , structural similarity (SSIM) and Wasserstein distance being the
most popular, when working with images. In the following, however, we concentrate
on illustrating the conceivable application of neural networks rather than on the
concrete network design and optimization strategy.
Case Examples in X-Ray CT
To illustrate deep learning methods for tomographic image reconstruction, in the following,
we consider the parallel beam geometry for X-ray computed tomography. In
this imaging method, the particular mathematical model $R$ is the Radon transform,
which evaluates the integrals over all lines across the radiative absorption coefficient
of the tissue. In this case, the sought-after function $f$ is the spatially dependent
absorption coefficient, and the measured data follows the physical model
$$g(\theta, s) = Rf(\theta, s) = \int_{L(\theta, s)} f(x)\, d\sigma(x). \qquad (5)$$
Here $S^{d-1}$ denotes the $(d-1)$-dimensional unit sphere, which specifies the
direction of the line, $s \in \mathbb{R}$ determines the distance of the line to the origin, and
$\sigma$ denotes the surface measure on $L(\theta, s)$. The data consists of all line integrals
along lines $L(\theta, s) := s\theta^\perp + \mathbb{R}\theta$, where $(\theta, s) \in S^{d-1} \times \mathbb{R}$ (Fig. 3). This operator $R$
is called Radon transform, and in theory exact inversion of the transform is possible.
An extensive overview of the mathematical formulation of X-ray tomography and
solution methods can be found, among others, in Natterer (2001), Deans (2007),
and Scherzer et al. (2009).
In the following, we will shortly describe common reconstruction methods for
X-ray computed tomography.
Analytic Reconstruction
For the Radon transform (5) for $d = 2$, such an exact reconstruction formula
(Natterer 2001) is given by
$$f(x) = \frac{1}{4\pi^2}\int_{S^1}\int_{\mathbb{R}} \frac{\partial_s Rf(\theta, s)}{\theta \cdot x - s}\, ds\, d\theta, \qquad (6)$$
where ∂s denotes the partial derivative in the s component and · the standard inner
product in R2 . Using the Hilbert transform
Fig. 3 Illustration of parallel beam CT. The sources and detectors rotate around the object. The
vector θ ∈ S1 determines the angle of the parallel lines and the scalar s ∈ R the distance of the
line to the origin
$$H_s g(\theta, s) = \frac{1}{\pi}\int_{\mathbb{R}} \frac{g(\theta, t)}{s - t}\, dt,$$
the inversion formula can be rewritten as
$$f(x) = \frac{1}{4\pi}\bigl(R^* H_s \partial_s R(f)\bigr)(x). \qquad (7)$$
Here the improper integral arising in the Hilbert transform is understood in the sense
of a Cauchy principal value. The operation $H_s\partial_s$ is called filtering, whereas the adjoint $R^*$ of the Radon
transform is also called the backprojection operator. Such inversion formulas of filtered
backprojection type are available for different variants of the Radon transform as
well, occurring in various tomographic imaging problems.
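As a rough illustration (a sketch only, assuming scikit-image is available and not the specific discretization discussed in the text), a parallel-beam sinogram and its filtered backprojection reconstruction can be computed as follows.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

# Parallel-beam sinogram of a phantom and its filtered backprojection,
# as a generic stand-in for a concrete discretization of (5)-(7).
image = shepp_logan_phantom()
theta = np.linspace(0.0, 180.0, 180, endpoint=False)   # measurement angles in degrees
sinogram = radon(image, theta=theta)                    # discrete Radon transform, cf. (5)
reconstruction = iradon(sinogram, theta=theta)          # filtered backprojection, cf. (7)
```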
Analytic reconstruction in practice then consists of implementing a discretized
version of the inversion formula. The inversion formula (7) can, for example, be
implemented by:
Limited Angle Computed Tomography
For some applications, it is favorable to only measure line integrals along a limited
range of angles, to reduce scanning time or to be able to reduce the scanning
area to a smaller region. In some applications, it is even impossible to measure
all line integrals around the object under investigation due to physical constraints.
Therefore, the data is limited to certain areas, which makes high-quality reconstructions
with simple inversion formulas an unsolvable task. Recently, however, deep
learning algorithms have advanced to the point where good reconstructions are possible
despite the limited data. In the case of limited angle computed
tomography, we consider subsets of the form $\Omega := \Theta \times \mathbb{R} \subset S^1 \times \mathbb{R}$, where $\Theta$ is
a connected subset of $S^1$ denoting the set of directions in which measurements are
available. The set $\Theta$ corresponds to a range of measurement angles not covering
the full range of $180°$ required for exact reconstruction. Thus, for example, the set $\Theta$
covering only $90°$ is given by $\Theta := \{(\sin(\alpha), \cos(\alpha)) \mid \alpha \in [0, \pi/2]\} \subset S^1$. The
restriction of the data can be formulated by $\chi_\Omega\, g$, where
$$\chi_\Omega(\theta, s) = \begin{cases} 1 & \theta \in \Theta,\\ 0 & \theta \notin \Theta. \end{cases}$$
$$\Phi^* := \underset{\Phi \in \mathcal{C}}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} \bigl\|\Phi(\chi_\Omega\, g_i) - g_i\bigr\|_Y, \qquad (9)$$
or using a direct reconstruction method for the full operator R. Many algorithms
have been proposed to extend the data to the full range of 180◦ , as, for example,
sinogram restoration based on Helgason-Ludwig consistency conditions (Huang
et al. 2017) and other consistency conditions. A popular approach is to approximate
the extension operator by a fully convolutional network or a generative adversarial
network. In Anirudh et al. (2018), the authors propose a 1D convolutional network
to generate a latent code from the partial sinogram. Subsequently this latent code
is fed to a 2D convolutional generator network, which is optimized together with
a discriminator network, rating the generated image. Applying the full-view Radon
transform to the generated image yields the projection data for all angles, and the
missing values in the original sinogram are replaced by the new data. Typically one
wants the extension operator $\Phi^*$ to be consistent with the given data, meaning that
it does not change the available measured values of the data in $\Omega$.
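Such a data-consistency constraint is simple to enforce after the fact; the following numpy sketch (with a dummy estimate in place of a trained network output) keeps the measured values on the available angles and uses the estimated values only where no data were measured.

```python
import numpy as np

def fill_sinogram(g_measured, mask, g_estimated):
    """Enforce data consistency: keep measured values where mask is True, use the estimate elsewhere."""
    return np.where(mask, g_measured, g_estimated)

# example: angles x detector bins, with only the first 90 of 180 angles measured
g = np.random.default_rng(0).normal(size=(180, 128))
mask = np.zeros_like(g, dtype=bool)
mask[:90, :] = True                                   # indicator of a 90-degree angular range
g_filled = fill_sinogram(g * mask, mask, g_estimated=np.zeros_like(g))
```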
or by some direct reconstruction algorithm (e.g., (6)). The refinement operator can
then be calculated by
$$\Phi^* := \underset{\Phi \in \mathcal{C}}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} D_X\bigl(\Phi(f_i^*), f_i\bigr), \qquad (10)$$
for some distance measure $D_X$ on $X$. Here $\mathcal{C}$ again is a class of operators which can
be defined by neural networks after discretization. One shortcoming of these post-
processing networks is that they depend heavily on the set of training data and are
vulnerable to adversarial examples or changes in the noise characteristics (Huang
et al. 2018b). Including knowledge of the operator $R$ within the deep learning can
potentially remedy these problems and is discussed in the following.
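A minimal sketch of such a post-processing network, assuming PyTorch and using a generic residual convolutional architecture with dummy data (not the specific networks of the cited works), could look as follows.

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Toy post-processing network acting on an initial single-channel reconstruction."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)      # learn a residual correction of the input

# training loop sketch for pairs (f_i^*, f_i) of initial reconstructions and ground truths
model = ResidualDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()              # plays the role of the distance D_X in (10)
f_star = torch.randn(4, 1, 64, 64)  # dummy batch of initial reconstructions
f_true = torch.randn(4, 1, 64, 64)  # dummy ground truth images
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(f_star), f_true)
    loss.backward()
    opt.step()
```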
$$f^+ \in \underset{f \ge 0}{\arg\min}\ \|S(f)\|_{1,w} + \frac{1}{2}\bigl\|\chi_\Omega\, g - \chi_\Omega\, R(f)\bigr\|_Y^2.$$
Here $S$ denotes the analysis operator of the shearlet frame and $\|\cdot\|_{1,w}$ a weighted
$\ell_1$ norm. In a second step, a neural network $\Phi^*$ is trained to estimate the invisible
coefficients from the visible ones, leading to the reconstruction
$$f^* = S^* S(f^+) + r.$$
Learned Backprojection
Although some works for fully learned reconstructions $\Phi : Y \to X$ for tomographic
imaging exist (Zhu et al. 2018; Boink and Brune 2019), they are strongly
limited by the size of the images and the data, and for a known forward operator,
a fully learned scheme seems inadequate. Nevertheless, it is possible to improve
direct inversion methods by deep neural networks. In Würfl et al. (2018), the authors
propose a reconstruction framework based on a filtered backprojection algorithm
for limited angle tomography. Their framework consists of a weighting layer $\mathcal{W}$,
which performs a pixel-wise independent weighting of the projection data, a 1D
convolutional layer with a single convolution mimicking the filtering operation in
(7), and a backprojection step. The reconstruction is obtained by
$$f^* = R^*\bigl(\mathcal{W}(g)\bigr).$$
A similar approach of using the backprojection algorithm as basis for the network
for photoacoustic tomographic imaging was studied in Schwab et al. (2018, 2019c),
where compensation weights were learned in order to improve reconstruction for
limited data problems.
Reduction of Metal Artefacts
Writing $\chi_m$ for the indicator function of the metal region and $\chi_{cm}$ for that of its complement, the image can be decomposed as
$$f = \chi_{cm} f + \chi_m f.$$
Due to the linearity of the Radon transform, this leads to a decomposition of the data
$$Rf = R(\chi_{cm} f) + f_m\, R(\chi_m),$$
where $f_m$ denotes the radiative absorption coefficient of the metal. Most methods
for artifact reduction in the presence of metal now aim at finding the set
$$M := \bigl\{(\theta, s) \in S^1 \times \mathbb{R} \ \big|\ R(\chi_m)(\theta, s) = 0\bigr\},$$
the complement of which consists of the data responsible for the artifacts in the reconstruction. The
set $M$ contains the reliable information of the measured data coming from the non-
metal region; therefore, knowledge of this set would give the opportunity to remove
the corrupted data outside of it and apply a data extension operator. One possible
approach to identify this set consists of three steps:
If the set $M$ is found, similar to (9), a training set can be generated by computing
the Radon transform of the training examples $f_i$ and the corresponding corrupted
data by setting $g_i(\theta, s) = 0$ for $(\theta, s) \notin M_i$. Here the sets $M_i$ denote the complement of
the region of corrupted data for the $i$th training instance. The extension operator $\Phi^* : Y \to Y$
should then satisfy
$$\Phi^* := \underset{\Phi \in \mathcal{C}}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} D_Y\bigl(\Phi(\chi_{M_i}\, g_i),\, g_i\bigr),$$
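A simple way to generate such training pairs in a simulation (a sketch only, assuming scikit-image and using a synthetic metal mask with a numerical tolerance in place of the exact zero condition) is the following.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon

# Training-pair generation sketch for sinogram completion in the metal case:
# M = {(theta, s) : R(chi_m)(theta, s) = 0} collects the rays missing the metal.
f = shepp_logan_phantom()
chi_m = np.zeros_like(f)
chi_m[210:220, 190:200] = 1.0                          # hypothetical metal region
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
g = radon(f, theta=theta)                              # clean target sinogram
M = radon(chi_m, theta=theta) < 1e-6                   # reliable rays (miss the metal)
g_masked = np.where(M, g, 0.0)                         # network input chi_M * g
```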
Low-Dose Computed Tomography
Since X-ray radiation creates a potential risk for the patient, it is desired to lower
the radiation dose. There are two main strategies to achieve a reduction of the X-ray
radiation for computed tomography, namely, limiting the X-ray flux by reducing the
operating current and minimizing the number of measurements.
Lowering the radiation dose results in noisy data and consequently in a noisy
reconstructed image with a low signal-to-noise ratio. This can potentially make
medical diagnosis more difficult, and therefore a large number of algorithms has been
proposed for improving image reconstruction for low-dose computed tomography.
Especially with the availability of large datasets, such as the Low-Dose Parallel
Beam (LoDoPaB)-CT data set (Leuschner et al. 2021), there has been a large body
of work aimed at improving the reconstruction of low-dose data (Kang et al. 2017).
Generally these methods can be categorized into (Chen et al. 2017):
Algorithms of the first and third category have the advantage that they can
be efficiently combined with classical reconstruction methods, whereas iterative
reconstruction algorithms tend to suffer from numerical complexity. Deep learning
methods have proven to be particularly suitable for tackling the reconstruction
problem, as they are able to achieve image quality either favorable or comparable to
that of commercial iterative reconstructions, while at the same time being computationally
more efficient (Shan et al. 2019).
Deep learning offers the possibility to account for filtering in the sinogram
domain and for image processing after some initial reconstruction. Given full-dose
measurements $g \in Y$, the low-dose measurements can be defined by $\sigma(g)$, where
$\sigma : Y \to Y$ denotes the transformation mapping from full-dose to low-dose data. A
neural network $\Phi^*$ can then be trained to approximate the inverse of $\sigma$ (Ghani and
Karl 2018). Denoting the training data by $(\sigma(g_i), g_i)_{i=1}^N \subset Y \times Y$ and a suitable
operator class $\mathcal{C} \subset \{\Phi : Y \to Y\}$, we want to find
$$\Phi^* = \underset{\Phi \in \mathcal{C}}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} D_Y\bigl(\Phi(\sigma(g_i)),\, g_i\bigr).$$
The reconstruction can then be carried out by a classical method after the application
of $\Phi^*$ to the sinogram.
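One simple way to obtain such a transformation $\sigma$ in a simulation (a sketch only, assuming a monochromatic photon-count model with Beer-Lambert attenuation and moderately scaled line integrals) is to apply Poisson noise to the expected photon counts.

```python
import numpy as np

def simulate_low_dose(g, I0=1e4, seed=0):
    """Toy model of sigma: map full-dose line integrals g to a noisy low-dose version.

    g is assumed to contain attenuation line integrals of moderate size (e.g., in [0, 5]);
    I0 is the (unattenuated) photon count per detector bin.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(I0 * np.exp(-g))              # detected photon counts
    counts = np.maximum(counts, 1)                     # avoid log(0) in the back-transform
    return -np.log(counts / I0)                        # noisy line integrals sigma(g)
```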
In contrast to this approach, if we denote the reconstruction from low-dose
measurements with a classical method by $f^\sigma$, then the learning task in the image
domain can be formulated by
$$\Phi^* = \underset{\Phi \in \mathcal{C}}{\arg\min}\ \frac{1}{N}\sum_{i=1}^{N} D_X\bigl(\Phi(f_i^\sigma),\, f_i\bigr),$$
Further Methods
(Wu et al. 2017; Chen et al. 2020; Guazzo 2020). The structure of these learned
iterative schemes is as follows. For an existing iterative procedure, a number of
iterations is chosen, and in all iterations up to this number, the update process is
augmented by a neural network. For given data $g$ and an initial guess $f_0$, the final
reconstruction $f^*$ in its simplest form is given by
$$f^* = \Phi_{\mathcal{W}_N} \circ G_N(g) \circ \ldots \circ \Phi_{\mathcal{W}_1} \circ G_1(g)\,(f_0),$$
where $\Phi_{\mathcal{W}_i}$ denote the augmentation networks and $G_i(g)$ iterative updates of the
reconstruction. These updates depend on the current iteration but also on the data $g$,
which results in a final reconstruction $f^*$ that is consistent with the given data. This
class of algorithms can also be utilized for any inverse imaging problem where the
forward operator can be modeled, including the three example problems discussed
above.
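Structurally, such an unrolled scheme is just a composition of data-dependent updates and learned corrections; the following Python sketch shows the pattern with placeholder functions (the trivial identity operator and clipping "networks" stand in for an actual forward model and trained CNNs).

```python
import numpy as np

def unrolled_reconstruction(g, f0, grad_steps, networks):
    """Structural sketch of a learned iterative scheme: classical update, then learned correction."""
    f = f0
    for G, Lam in zip(grad_steps, networks):
        f = Lam(G(f, g))          # data-dependent update G_i(g), then augmentation network
    return f

# toy instantiation: R = identity, gradient-descent updates, and "networks"
# that simply clip to the nonnegative range (placeholders for trained CNNs)
R = lambda f: f
Rt = lambda y: y
grad_steps = [lambda f, g: f - 0.5 * Rt(R(f) - g)] * 5
networks = [lambda f: np.maximum(f, 0.0)] * 5
f_hat = unrolled_reconstruction(np.ones(10), np.zeros(10), grad_steps, networks)
```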
Other approaches that will continue to play a very important role in the future
are unsupervised methods that do not rely on a paired dataset. Recently a variety of
methods have been published in this field of research as well (Kwon and Ye 2021;
Lee et al. 2020; Kuanar et al. 2019), just to name a few. The great advantage of these
methods is that no paired training data need to be collected, which is very difficult
or even impossible in many experimental applications.
Completeness in such a rapidly developing field of research is impossible;
nevertheless, a more complete and detailed survey of deep learning methods for
inverse imaging is given in the review article by Arridge et al. (2019).
Conclusion
Deep learning methods show excellent results for tomographic image reconstruction
and represent a promising framework for obtaining good image quality for differ-
ent measurement cases that create incomplete, corrupted, or noisy data. Various
deep learning-based methods have meanwhile been designed in order to optimize
tomographic image reconstruction. Among them are learned iterative schemes,
network cascades, learned regularizers, and two-step approaches; we presented two-
stage strategies of deploying data-driven methods to improve image reconstruction
in frequently occurring imperfect data situations in X-ray CT. Most of these
approaches can be similarly adapted to other tomographic imaging modalities
as well. Nevertheless, it is important to consciously harness the power of deep
learning to ensure robustness and guarantee meaningful images for diagnosis. In
my opinion, knowledge of the physics (the modelling operator R) and consistency
constraints such as data consistency can help overcome these issues and should be
incorporated in the design of deep learning approaches in tomographic imaging.
Furthermore, careful and extensive validation and evaluation of these methods
including experts’ opinions from radiologists and medical doctors are necessary to
exploit the indisputable power of deep learning for medical imaging.
References
Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems.
Inverse Probl. 10(6), 1217 (1994)
Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks.
Inverse Probl. 33(12), 124007 (2017)
Adler, J., Öktem O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–
1332 (2018)
Anirudh, R., Kim, H., Thiagarajan, J.J., Mohan, K.A., Champley, K., Bremer, T.: Lose the views:
limited angle CT reconstruction via implicit sinogram completion. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 6343–6352 (2018)
Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven
models. Acta Numer. 28, 1–174 (2019)
Bayaraa, T., Hyun, C.M., Jang, T.J., Lee, S.M., Seo, J.K.: A two-stage approach for beam
hardening artifact reduction in low-dose dental CBCT. IEEE Access 8, 225981–225994 (2020)
Beard, P.: Biomedical photoacoustic imaging. Interface Focus 1(4), 602–631 (2011)
Boink, Y.E., Brune, C.: Learned SVD: solving inverse problems via hybrid autoencoding. arXiv
preprint arXiv:1912.10840 (2019)
Boink, Y.E., Manohar, S., Brune, C.: A partially-learned algorithm for joint photo-acoustic
reconstruction and segmentation. IEEE Trans. Med. Imaging 39(1), 129–139 (2019)
Boink, Y.E., Haltmeier, M., Holman, S., Schwab, J.: Data-consistent neural networks for solving
nonlinear inverse problems. arXiv preprint arXiv:2003.11253 (2020)
Bubba, T.A., Kutyniok, G., Lassas, M., Maerz, M., Samek, W., Siltanen, S., Srinivasan, V.:
Learning the invisible: a hybrid deep learning-shearlet framework for limited angle computed
tomography. Inverse Probl. 35(6), 064002 (2019)
Chen, H., Zhang, Y., Kalra, M.K., Lin, F., Chen, Y., Liao, P., Zhou, J., Wang, G.: Low-dose CT with
a residual encoder-decoder convolutional neural network. IEEE Trans. Med. Imaging 36(12),
2524–2535 (2017)
Chen, G., Hong, X., Ding, Q., Zhang, Y., Chen, H., Fu, S., Zhao, Y., Zhang, X., Ji, H., Wang,
G. et al.: Airnet: fused analytical and iterative reconstruction with deep neural network
regularization for sparse-data CT. Med. Phys. 47(7), 2916–2930 (2020)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Commun. Pure Appl. Math. J. Issued Courant Inst. Math.
Sci. 57(11), 1413–1457 (2004)
Deans, S.R.: The Radon Transform and Some of Its Applications. Dover Publications, Mineola, New York (2007)
Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and
Image Processing. Springer Science & Business Media, New York (2010)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, vol. 375. Springer
Science & Business Media, Dordrecht (1996)
Frikel, J., Quinto, E.T.: Characterization and reduction of artifacts in limited angle tomography.
Inverse Probl. 29(12), 125007 (2013)
Ghani, M.U., Karl, W.C.: CNN based sinogram denoising for low-dose CT. In: Mathematics in Imaging, pp. MM2D–5. Optical Society of America, Orlando, Florida (2018)
Ghani, M.U., Karl, W.C.: Fast enhanced CT metal artifact reduction using data domain deep learning. IEEE Trans. Comput. Imaging 6, 181–193 (2019)
Gjesteby, L., Shan, H., Yang, Q., Xi, Y., Claus, B., Jin, Y., De Man, B., Wang, G.: Deep neural
network for CT metal artifact reduction with a perceptual loss function. In: Proceedings of the
Fifth International Conference on Image Formation in X-Ray Computed Tomography, vol. 1
(2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA (2016)
Gu, J., Ye, J.C.: Multi-scale wavelet domain residual learning for limited-angle CT reconstruction.
arXiv preprint arXiv:1703.01382 (2017)
Guazzo, A.: Deep learning for PET imaging: from denoising to learned primal-dual reconstruction
(2020)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. Springer Science & Business Media (2009)
Hauptmann, A., Adler, J., Arridge, S.R., Öktem, O.: Multi-scale learned iterative reconstruction. IEEE Trans. Comput. Imaging 6, 843–856 (2020)
Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Neural Networks for
Perception, pp. 65–93. Elsevier (1992)
Higham, C.F., Higham, D.J.: Deep learning: an introduction for applied mathematicians. SIAM
Rev. 61(4), 860–891 (2019)
Hounsfield, G.N.: Computerized transverse axial scanning (tomography): part 1. description of
system. Br. J. Radiol. 46(552), 1016–1022 (1973)
Huang, Y., Huang, X., Taubmann, O., Xia, Y., Haase, V., Hornegger, J., Lauritsch, G., Maier,
A.: Restoration of missing data in limited angle tomography based on Helgason–Ludwig
consistency conditions. Biomed. Phys. Eng. Express 3(3), 035015 (2017)
Huang, X., Wang, J., Tang, F., Zhong, T., Zhang, Y.: Metal artifact reduction on cervical CT images
by deep residual learning. Biomed. Eng. Online 17(1), 175 (2018a)
Huang, Y., Würfl, T., Breininger, K., Liu, L., Lauritsch, G., Maier, A.: Some investigations
on robustness of deep learning in limited angle tomography. In: International Conference
on Medical Image Computing and Computer-Assisted Intervention, pp. 145–153. Springer
(2018b)
Kang, E., Min, J., Ye, J.C.: A deep convolutional neural network using directional wavelets for
low-dose x-ray CT reconstruction. Med. Phys. 44(10), e360–e375 (2017)
Kuanar, S., Athitsos, V., Mahapatra, D., Rao, K.R., Akhtar, Z., Dasgupta, D.: Low dose abdominal
CT image reconstruction: an unsupervised learning based approach. In: 2019 IEEE International
Conference on Image Processing (ICIP), pp. 1351–1355. IEEE (2019)
Kwon, T., Ye, J.C.: Cycle-free cyclegan using invertible generator for unsupervised low-dose CT
denoising. arXiv preprint arXiv:2104.08538 (2021)
Landweber, L.: An iteration formula for Fredholm integral equations of the first kind. Am. J. Math.
73(3), 615–624 (1951)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Lee, J., Gu, J., Ye, J.C.: Unsupervised CT metal artifact learning using attention-guided beta-
cyclegan. arXiv preprint arXiv:2007.03480 (2020)
Leuschner, J., Schmidt, M., Baguer, D.O., Maass, P.: LoDoPab-CT, a benchmark dataset for low-
dose computed tomography reconstruction. Sci. Data 8(1), 1–12 (2021)
Li, H., Schwab, J., Antholzer, S., Haltmeier, M.: Nett: solving inverse problems with deep neural
networks. Inverse Probl. 36(6), 065005 (2020)
Lundervold, A.S., Lundervold, A.: An overview of deep learning in medical imaging focusing on
MRI. Zeitschrift für Medizinische Physik 29(2), 102–127 (2019)
Lunz, S., Öktem, O., Schönlieb, C.-B.: Adversarial regularizers in inverse problems. arXiv preprint
arXiv:1805.11572 (2018)
Mukherjee, S., Dittmer, S., Shumaylov, Z., Lunz, S., Öktem, O., Schönlieb, C.-B.: Learned convex
regularizers for inverse problems. arXiv preprint arXiv:2008.02839 (2020)
Natterer, F.: The Mathematics of Computerized Tomography. SIAM, Philadelphia (2001)
Obmann, D., Nguyen, L., Schwab, J., Haltmeier, M.: Sparse anett for solving inverse problems with
deep learning. In: 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops
(ISBI Workshops), pp. 1–4. IEEE (2020)
Park, H.S., Chung, Y.E., Lee, S.M., Kim, H.P., Seo, J.K.: Sinogram-consistency learning in CT for
metal artifact reduction. arXiv preprint arXiv:1708.00607, 1 (2017)
Purcell, E.M., Torrey, H.C., Pound, R.V.: Resonance absorption by nuclear magnetic moments in
a solid. Phys. Rev. 69(1–2), 37 (1946)
Quinto, E.T.: Tomographic reconstructions from incomplete data-numerical inversion of the
exterior radon transform. Inverse Probl. 4(3), 867 (1988)
Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Springer, New York (2009)
Schwab, J., Antholzer, S., Nuster, R., Haltmeier, M.: Real-time photoacoustic projection imaging
using deep learning. arXiv preprint arXiv:1801.06693 (2018)
Schwab, J., Antholzer, S., Haltmeier, M.: Big in Japan: regularizing networks for solving inverse
problems. J. Math. Imaging Vis. 62, 445–455 (2019a)
Schwab, J., Antholzer, S., Haltmeier, M.: Deep null space learning for inverse problems:
convergence analysis and rates. Inverse Probl. 35(2), 025008 (2019b)
Schwab, J., Antholzer, S., Haltmeier, M.: Learned backprojection for sparse and limited view
photoacoustic tomography. In: Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878,
p. 1087837. International Society for Optics and Photonics, SPIE BiOS, San Francisco,
California (2019c)
Shan, H., Padole, A., Homayounieh, F., Kruger, U., Khera, R.D., Nitiwarangkul, C., Kalra, M.K.,
Wang, G.: Competitive performance of a modularized deep neural network compared to
commercial algorithms for low-dose CT image reconstruction. Nat. Mach. Intell. 1(6), 269–
276 (2019)
Wang, G.: A perspective on deep imaging. IEEE Access 4, 8914–8924 (2016)
Werbos, P.: Beyond regression: new tools for prediction and analysis in the behavioral sciences.
Ph. D. dissertation, Harvard University (1974)
Wu, D., Kim, K., El Fakhri, G., Li, Q.: Iterative low-dose CT reconstruction with priors trained by
artificial neural network. IEEE Trans. Med. Imaging 36(12), 2479–2486 (2017)
Wu, D., Kim, K., Kalra, M.K., De Man, B., Li, Q.: Learned primal-dual reconstruction for dual
energy computed tomography with reduced dose. In: 15th International Meeting on Fully
Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 11072,
p. 1107206. International Society for Optics and Photonics (2019)
Würfl, T., Hoffmann, M., Christlein, V., Breininger, K., Huang, Y., Unberath, M., Maier, A.K.:
Deep learning computed tomography: learning projection-domain weights from image domain
in limited angle problems. IEEE Trans. Med. Imaging 37(6), 1454–1463 (2018)
Zhang, Y., Yu, H.: Convolutional neural network based metal artifact reduction in x-ray computed
tomography. IEEE Trans. Med. Imaging 37(6), 1370–1381 (2018)
Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-
transform manifold learning. Nature 555(7697), 487–492 (2018)
34 MRI Bias Field Estimation and Tissue Segmentation Using Multiplicative Intrinsic Component Optimization and Its Extensions
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1204
Multiplicative Intrinsic Component Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1207
Decomposition of MR Images into Multiplicative Intrinsic Components . . . . . . . . . . . . 1207
Mathematical Description of Multiplicative Intrinsic Components . . . . . . . . . . . . . . . . . 1208
Energy Formulation for MICO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
Optimization of Energy Function and Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1211
Numerical Stability Using Matrix Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213
Execution of MICO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1215
Some Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1217
Introduction of Spatial Regularization in MICO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1217
The Proposed TV-Based MICO Model and Its Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1217
Spatiotemporal Regularization for 4D Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1222
Modified MICO Formulation with Weighting Coefficients for Different Tissues . . . . . . . 1224
Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1224
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1232
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1232
S. Wali
School of Information and Communication Engineering, University of Electronic Science and
Technology of China, Chengdu, China
Department of Mathematics, Namal University, Mianwali, Pakistan
e-mail: [email protected]
C. Li () · L. Zhang
School of Information and Communication Engineering, University of Electronic Science and
Technology of China, Chengdu, China
e-mail: [email protected]; [email protected]
Abstract
Keywords
Introduction
The formulation of MICO for bias field estimation and tissue segmentation, based on the decomposition of an MR image into two multiplicative components, is presented in this section. The proposed energy minimization technique leads to the MICO algorithm for joint bias field estimation and tissue segmentation. We follow Li et al. (2014) for most of the mathematical formulation and notation. An observed MR image I is modeled as
$$I(x) = b(x)\, J(x) + n(x), \qquad (1)$$
where J (x) is the clean image, b(x) is the bias field that accounts for the observed
image’s intensity inhomogeneity, and n(x) is zero-mean additive noise. The widely
accepted assumptions in the literature for both J and b are given in Wells et al.
(1996), Leemput et al. (1999), and Pham and Prince (1999). The bias field b is
supposed to vary smoothly. The true image J describes a physical characteristic of
the tissues being imaged, which should ideally take a specific value for voxels of the
same tissue type. As a result, for all points x in the i-th tissue, we assume that J(x) is approximately a constant $c_i$.
In this chapter, we consider Eq. (1) as decomposing the MR image I into two multiplicative components b and J, together with additive zero-mean noise n. From this viewpoint, we formulate bias field estimation and tissue segmentation as a variational problem that seeks an accurate decomposition of a given MR image I into the two multiplicative components b and J. It is important to mention here that the bias field b and the true image J are intrinsic components of the observed MR image I. In this chapter, we consider an observed image I as a function $I : \Omega \to \mathbb{R}$ on a continuous domain $\Omega$.
In computer vision, a given observed image I can be decomposed into a reflectance image R and an illumination image S, written in multiplicative form as I = RS. These multiplicative components of an observed image are analogous to those in Eq. (1). The terminology intrinsic images was introduced by Barrow and Tenenbaum (1978) to describe these two multiplicative components. In computer vision,
estimating intrinsic images from an observed scene image has been a significant
challenge. Several methods for estimating the intrinsic images from a scene image
based on different assumptions on the two intrinsic images have been presented
(Tappen et al. 2005; Weiss 2001; Kimmel et al. 2003).
The bias field b and the real image J are considered as multiplicative intrinsic
components of an observed MR image in this study. From an observed MR image,
we present a unique approach for estimating these two components. We should point
out that the method proposed in this chapter differs from those used in computer
vision to estimate reflectance and illumination images. In fact, due to a lack of
knowledge about the unknown intrinsic images R and S, estimation of intrinsic
images is an ill-posed problem.
If no prior knowledge of the multiplicative components b and J of the observed
MR image I is used, estimation of these components is an underdetermined or ill-
posed problem. To solve it, we have to exploit some prior knowledge of the bias field b and the true image J. In this study, we use the piecewise constant property of the true image J and the smoothly varying property of the bias field b. In the development of our proposed technique, the decomposition of the MR image I into two multiplicative intrinsic components b and J with their respective spatial properties is fully exploited.
The membership function $u_i$ is a binary membership function in the ideal case when each voxel contains just one kind of tissue, with $u_i(x) = 1$ for $x \in \Omega_i$ and $u_i(x) = 0$ for $x \notin \Omega_i$. Because of the partial volume effect, one voxel may include more than one type of tissue, especially at the interface between adjacent tissues. In this scenario, the N tissues are represented by fuzzy membership functions $u_i(x)$ with values ranging from 0 to 1 and satisfying $\sum_{i=1}^{N} u_i = 1$. The value of the fuzzy membership function $u_i(x)$ can be construed as the proportion of the i-th tissue within the voxel x. A column vector-valued function $u = (u_1, \cdots, u_N)^T$, where T is the transpose operator, can be used to express such membership functions $u_1, \cdots, u_N$. The space of all such vector-valued functions is denoted by U.
The true image J can thus be approximated as
$$J(x) = \sum_{i=1}^{N} c_i u_i(x). \qquad (4)$$
The function in Eq. (4) is a piecewise constant function when the membership functions $u_i$ are binary functions, with $J(x) = c_i$ for $x \in \Omega_i = \{x : u_i(x) = 1\}$. If $u_1, \cdots, u_N$ are binary membership functions, the segmentation is called a hard segmentation, and the corresponding regions $\Omega_1, \cdots, \Omega_N$ form a partition of the image domain, with $\cup_{i=1}^{N} \Omega_i = \Omega$ and $\Omega_i \cap \Omega_j = \emptyset$ for $i \neq j$. On the other hand, if the functions $u_1, \ldots, u_N$ are fuzzy membership functions with values between 0 and 1, they represent a soft segmentation result.
We propose an energy minimization approach for simultaneous bias field estimation and tissue segmentation based on the image model in Eq. (1). The membership function $u = (u_1, \cdots, u_N)$ gives the outcome of tissue segmentation. The estimated bias field b is used to compute the bias field corrected image, which is given by the quotient I/b.
Based on the image model Eq. (1) and the intrinsic features of the bias field and the
true image as mentioned in section “Decomposition of MR Images into Multiplica-
tive Intrinsic Components”, we present an energy minimization formulation for bias
field estimation and tissue segmentation. In light of the image model (1), we address
the problem of determining the multiplicative intrinsic components b and J of an
observed MR image I to minimize the following energy.
$$F(b, J) = \int_\Omega \big| I(x) - b(x) J(x) \big|^2\, dx. \qquad (5)$$
The bias field b is represented as a linear combination of smooth basis functions $g_1, \cdots, g_M$, i.e., $b(x) = w^T G(x)$ with $G(x) = (g_1(x), \cdots, g_M(x))^T$ and coefficient vector $w = (w_1, \cdots, w_M)^T$. Since $\sum_{i=1}^{N} c_i u_i(x) = c_i$ for $x \in \Omega_i$ in the binary case, substituting these representations into (5) yields
$$F(u, c, w) = \int_\Omega \Big| I(x) - w^T G(x) \sum_{i=1}^{N} c_i u_i(x) \Big|^2 dx = \sum_{i=1}^{N} \int_{\Omega_i} \big| I(x) - w^T G(x)\, c_i \big|^2 dx = \sum_{i=1}^{N} \int_\Omega \big| I(x) - w^T G(x)\, c_i \big|^2 u_i(x)\, dx. \qquad (7)$$
More generally, with a fuzzifier exponent $q \ge 1$ on the membership functions, the energy reads
$$F_q(u, c, w) = \sum_{i=1}^{N} \int_\Omega \big| I(x) - w^T G(x)\, c_i \big|^2 u_i^q(x)\, dx. \qquad (9)$$
The optimal membership functions that minimize the energy F (u, c, w) for the
scenario q > 1 are fuzzy membership functions with values between 0 and 1.
By minimizing the energy F (u, c, w) in Eq. (8) or Fq (u, c, w) in Eq. (9), our
technique accomplishes image segmentation and bias field estimation, subject to the
constraints u ∈ U. The fact that the energy Fq (u, c, w) is convex in each variable,
u, c, or w, is a desired characteristic (Li et al. 2009). This characteristic guarantees
that the energy Fq (u, c, w) has a unique minimum point for each of its variables.
We use an alternating minimization scheme in which $F_q(u, c, w)$ is minimized with respect to each of its variables with the other two fixed. The alternating minimization of $F_q(u, c, w)$ with respect to each
of its variables is described below.
Optimization of c
The energy $F_q(u, c, w)$ is optimized with respect to the variable c for fixed w and $u = (u_1, \cdots, u_N)^T$. It is straightforward to show that $F_q(u, c, w)$ is minimized by $c = \hat{c} = (\hat{c}_1, \cdots, \hat{c}_N)^T$ given by the following:
$$\hat{c}_i = \frac{\int_\Omega I(x)\, b(x)\, u_i^q(x)\, dx}{\int_\Omega b^2(x)\, u_i^q(x)\, dx}, \qquad i = 1, \cdots, N. \qquad (10)$$
Optimization of w
The energy $F_q(u, c, w)$ is then minimized with respect to w for fixed u and c. Define the vector
$$v = \int_\Omega G(x)\, I(x) \Big( \sum_{i=1}^{N} c_i u_i^q(x) \Big) dx \qquad (11)$$
and the matrix
$$A = \int_\Omega G(x)\, G^T(x) \Big( \sum_{i=1}^{N} c_i^2 u_i^q(x) \Big) dx. \qquad (12)$$
The equation $\frac{\partial F_q}{\partial w} = 0$ can then be written as the linear equation
$$A w = v. \qquad (13)$$
Given the solution $\hat{w} = A^{-1} v$ of this equation, we compute the estimated bias field as $\hat{b}(x) = \hat{w}^T G(x)$. The non-singularity of the matrix A is demonstrated in section “Numerical Stability Using Matrix Analysis”. As a result, the linear equation $\frac{\partial F_q}{\partial w} = -2v + 2Aw = 0$ has a unique solution $\hat{w} = A^{-1} v$. The vector $\hat{w}$ can be represented explicitly by using Eq. (12) as follows:
$$\hat{w} = \Big( \int_\Omega G(x)\, G^T(x) \Big( \sum_{i=1}^{N} c_i^2 u_i^q(x) \Big) dx \Big)^{-1} \int_\Omega G(x)\, I(x) \Big( \sum_{i=1}^{N} c_i u_i^q(x) \Big) dx. \qquad (14)$$
The estimated bias field is obtained using the optimum vector ŵ provided
by Eq. (14).
We will verify the non-singularity of the matrix A, as well as the numerical stability
of the foregoing calculation for solving the linear system (13) in section “Numerical
Stability Using Matrix Analysis”. These are two critical concerns in the implemen-
tation of our proposed technique.
Optimization of u
We begin with the scenario where q > 1 and minimize the energy $F_q(u, c, w)$ for fixed c and w, subject to the constraint that $u \in U$. It can be shown that $F_q(u, c, w)$ is minimized at $u = \hat{u} = (\hat{u}_1, \cdots, \hat{u}_N)^T$, given by the following:
$$\hat{u}_i(x) = \frac{\big( \delta_i(x) \big)^{\frac{1}{1-q}}}{\sum_{j=1}^{N} \big( \delta_j(x) \big)^{\frac{1}{1-q}}}, \qquad i = 1, \cdots, N, \qquad (16)$$
where $\delta_i(x) = \big| I(x) - w^T G(x)\, c_i \big|^2$, $i = 1, \cdots, N$.
Numerical Stability Using Matrix Analysis
The bias field estimate computation comprises calculating the vector v in (11), the
matrix A in (12), and the inverse matrix A−1 in (14). The matrix A is an M × M
matrix, where M is the number of basis functions. We use M = 20 basis functions
in this chapter; hence A is a 20 × 20 matrix. The non-singularity
of the matrix A assures that the inverse matrix A−1 exists and that the Eq. (13) has
a unique solution. We will also demonstrate that the numerical calculation of the
inverse matrix A−1 is stable.
The numerical stability of solving the linear system (13) is governed by the condition number
$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)},$$
where $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the minimal and maximal eigenvalues of the matrix A, respectively. For very large values of the condition number $\kappa(A)$, minor variations in the matrix A or the vector v, which are most likely caused by image noise and accumulated intermediate rounding errors, can cause very large variations of the solution $\hat{w}$ of Eq. (13). As a result, it is vital to guarantee that the condition number $\kappa(A)$ is not huge, as shown below, to ensure the robustness of the bias field computation.
The matrix analysis that follows is predicated on the orthogonality of the basis functions, that is,
$$\int_\Omega g_m(x)\, g_k(x)\, dx = \delta_{mk}. \qquad (19)$$
For instance, if $\max_i \{c_i\} = 250$ and $\min_i \{c_i\} = 50$, by the inequality (20) we have $\kappa(A) \le \frac{250^2}{50^2} = 25$. We observed that the condition numbers of the matrix A are at
this level in the implementations of our approach to actual MRI data, which is small
enough to assure the numerical stability of the inversion operation.
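The bound $\kappa(A) \le \max_i c_i^2 / \min_i c_i^2$ can also be checked numerically; the following is a small NumPy sketch with a synthetic orthonormal basis, random memberships, and q = 1, all of which are illustrative assumptions rather than the actual MRI setup.

import numpy as np

rng = np.random.default_rng(0)
P, M, N = 4096, 20, 3                      # pixels, basis functions, tissues

# Orthonormal basis columns (discrete analogue of Eq. (19)): B.T @ B = I
B, _ = np.linalg.qr(rng.standard_normal((P, M)))

c = np.array([50.0, 150.0, 250.0])         # tissue constants
u = rng.dirichlet(np.ones(N), size=P)      # memberships, rows sum to 1 (q = 1)

s2 = u @ c**2                              # sum_i c_i^2 u_i(x) at each pixel
A = B.T @ (B * s2[:, None])                # A = sum_x s2(x) G(x) G(x)^T

print(np.linalg.cond(A), (c.max() / c.min())**2)   # kappa(A) stays below 25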
Execution of MICO
During the iteration procedure described above, each of the three variables is
updated with the other two variables computed in the previous iteration. In Step-1 of
the preceding iteration process, we only need to initialize two of the three variables,
such as u and c. In Step-5, the convergence criterion is $|c^{(n)} - c^{(n-1)}| < \varepsilon$, where $c^{(n)}$ is the vector c updated in Step-3 at the n-th iteration, and ε is set to 0.001.
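For concreteness, the following is a minimal NumPy sketch of one possible implementation of the alternating updates in Eqs. (10), (14), and (16); the initialization, the fuzzifier q = 2, and the stopping rule are illustrative assumptions, not the exact settings used in the experiments of this chapter.

import numpy as np

def mico(I, B, N=3, q=2.0, n_iter=10, eps=1e-3):
    """Sketch of the MICO alternating minimization.

    I : (P,) flattened image intensities.
    B : (P, M) smooth basis functions evaluated at each pixel (G(x) per row).
    Returns memberships u (P, N), tissue constants c (N,), bias field b (P,).
    """
    rng = np.random.default_rng(0)
    u = rng.dirichlet(np.ones(N), size=I.size)      # random init, rows sum to 1
    c = np.linspace(I.min(), I.max(), N)            # rough init of tissue constants
    b = np.ones_like(I)
    for _ in range(n_iter):
        uq = u ** q
        # c-update, Eq. (10)
        c_old = c.copy()
        c = (uq * (I * b)[:, None]).sum(0) / ((uq * (b ** 2)[:, None]).sum(0) + 1e-12)
        # w-update, Eqs. (12)-(14); b(x) = w^T G(x)
        s1 = uq @ c
        s2 = uq @ c ** 2
        A = B.T @ (B * s2[:, None])
        v = B.T @ (I * s1)
        w = np.linalg.solve(A, v)
        b = B @ w
        # u-update, Eq. (16)
        delta = (I[:, None] - b[:, None] * c[None, :]) ** 2 + 1e-12
        d = delta ** (1.0 / (1.0 - q))
        u = d / d.sum(1, keepdims=True)
        if np.abs(c - c_old).max() < eps:           # convergence criterion on c
            break
    return u, c, b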
We used a synthetic image in Fig. 1a to show the robustness of our proposed
technique to initialization, using three alternative initializations of the member-
ship functions u1 , · · · , uN and the constants c1 , · · · , cN . The initial membership
function $u = (u_1, \cdots, u_N)$ and the vector $c = (c_1, \cdots, c_N)$ can be visualized as an image defined by $J_{u,c}(x) = \sum_{i=1}^{N} c_i u_i(x)$. The images $J_{u,c}$ for the three different initializations of u and c are shown in Fig. 1b, c, and d and exhibit a
wide range of patterns. The first initialization illustrated in Fig. 1b is achieved by
randomly generating the membership functions u1 (x), · · · , uN (x) and the constants
c1 , · · · , cN . The bias field converges to the same function for these three alternative
initializations of u and c up to a scalar multiple. The three estimated bias fields
are the same, up to a minor difference, when the bias fields are normalized (e.g., by dividing the bias field b by its maximum value $\max_x b(x)$), as shown in
Fig. 1e. Meanwhile, the membership function u converges to the same vector-valued
function, with just a minor variation, providing the identical segmentation result as
shown in Fig. 1f. The corrected bias field image is provided in Fig. 1g.
We display the energy $F(u, c, w)$ of the variables u, c, and w computed at each iteration for the first 20 iterations in Fig. 1h. The energy F(u, c, w)
rapidly drops to the same value from three distinct initial values corresponding to
three separate initializations. Figure 1h also presents the fast convergence of the
iteration in MICO, as we can see that the energy is rapidly decreased and converges
to the minimal value in less than 10 iterations. As a result, in our MICO applications,
we often just perform 10 iterations.
Fig. 1 Robustness of our proposed method to different initializations. (a) Original image, (b)–(d)
three possible initializations of the membership functions are visualized, (e) estimated bias field,
(f) segmentation result, (g) bias field correction result, (h) curves illustrating the energy F used in
the iteration process for three different initializations (b), (c), and (d)
Some Extensions
The original MICO formulation described above can be easily extended by includ-
ing a regularization term on the membership functions. Regularization of the mem-
bership functions can be accomplished using the MICO formulation by combining
the total variations (TV) of the membership functions in the following energy:
$$\mathcal{F}(u, c, w) = \lambda F(u, c, w) + \sum_{i=1}^{N} TV(u_i), \qquad (21)$$
where F is the energy defined in (8), λ > 0 is the weight of F, and TV is the total variation of u defined by the following:
$$TV(u) = \int_\Omega |\nabla u(x)|\, dx. \qquad (22)$$
This energy should be minimized subject to the constraints that $0 \le u_i(x) \le 1$ and $\sum_{i=1}^{N} u_i(x) = 1$ for every point x. The variational formulation in (21) is referred to as the TVMICO formulation. The definition of the energy (21) is simple; however, dealing with the aforementioned point-wise constraint is not straightforward in the context of energy minimization.
Many scholars have developed numerous numerical approaches (Goldstein and
Osher 2009) to address variational problems in the context of image segmentation
using a TV regularization term T V (u) for a membership function u subject to the
constraint 0 ≤ u(x) ≤ 1 in recent years. These approaches can only segment
images into two complementary regions, denoted by the membership functions u
and 1−u. In general, three or more membership functions u1 , · · · , uN are employed
to represent N > 2 regions for segmentation. Li et al. developed a numerical
strategy to address the energy minimization problem with TV regularization on the
membership functions in Li et al. (2010); they used the operator splitting method
proposed by Lions and Mercier (1979). The numerical technique provided in Li
et al. (2010) can be used to minimize the energy F with respect to the membership
functions u1 , · · · , uN in Eq. (21). The energy minimizations with respect to the
variables c and w, which are independent of the TV regularization term of the
membership functions, remain the same as described in section “Optimization of
Energy Function and Algorithm”.
$$\min_{u, c, w}\ \lambda \sum_{i=1}^{N} \int_\Omega \big| I(x) - c_i\, w^T G(x) \big|^2 u_i^q(x)\, dx + \sum_{i=1}^{N} \int_\Omega \big| \nabla u_i(x) \big|\, dx, \qquad (23)$$
where λ is a positive parameter balancing the data term against the TV term; for a binary membership function, the second term in Eq. (23) equals the length of the boundary $\partial\Omega_i$ of the i-th region. We will discuss both cases, q = 1 and q > 1. When q = 1, $u_i$ can only take the values 0 and 1, and the corresponding space of vector-valued functions of bounded variation can be defined as follows:
$$U_0 = \Big\{ u = (u_1, \cdots, u_N)^T : u_i \in BV(\Omega),\ u_i(x) \in \{0, 1\},\ i = 1, \cdots, N,\ \text{and}\ \sum_{i=1}^{N} u_i(x) = 1\ \text{for all}\ x \in \Omega \Big\}. \qquad (24)$$
At each point x, exactly one of the functions takes the value 1, while all the others take the value 0. As a result, the set $U_0$ is not continuous, which causes challenges and instability in numerical implementations. However, we may relax the binary indicator functions defined in Eq. (24) to fuzzy membership functions $u_i$ that satisfy the nonnegativity and sum-to-one constraints, i.e., $(u_1, \ldots, u_N)$ belongs to the set U described in Eq. (3). It is evident that $u_i(x) \in [0, 1]$ and that $(u_1(x), \ldots, u_N(x))$ lies in a simplex at any x. As a result, $u_i(x)$ may be thought of as the probability that pixel x belongs to the i-th class.
The proposed model in Eq. (23) is convex with respect to each of u, c, and w individually, but not jointly. The TV regularization can be combined with L2 (He et al. 2012) or L1 (Li et al. 2016) fidelity terms. We can also use nonlinear and nonconvex regularizations, such as total generalized variation (Wali et al. 2019a) and Euler's elastica (Liu et al. 2019; Wali et al. 2019b), for further extensions; however, these models require additional constraint relaxations and efficient algorithms such as ADMM. In this section, we focus only on the L1 fidelity term, and we call our proposed method total variation-based multiplicative intrinsic component optimization (TVMICO).
Introducing the auxiliary variables $p_i = \nabla u_i$ and $v_i = u_i$, the TVMICO problem can be written as
$$\min_{p, v, u, c, w}\ \lambda \sum_{i=1}^{N} \int_\Omega \big| I(x) - c_i\, w^T G(x) \big|^2 v_i(x)\, dx + \sum_{i=1}^{N} \int_\Omega \big| p_i(x) \big|\, dx + l_U(v), \qquad (25)$$
where
$$l_U(v) = \begin{cases} 0, & v \in U; \\ \infty, & \text{otherwise.} \end{cases}$$
The unconstrained augmented Lagrangian functional for Eq. (25) can be formulated
as follows:
$$\begin{aligned} L(p, v, u, c, w; \mu_p, \mu_v) &= \lambda \sum_{i=1}^{N} \int_\Omega \big| I(x) - c_i\, w^T G(x) \big|^2 v_i(x)\, dx + \sum_{i=1}^{N} \int_\Omega \big| p_i(x) \big|\, dx + l_U(v) \\ &\quad + \sum_{i=1}^{N} \Big( \langle \mu_{p_i}, \nabla u_i - p_i \rangle + \frac{\gamma}{2} \int_\Omega \big| \nabla u_i(x) - p_i(x) \big|^2 dx \Big) \\ &\quad + \sum_{i=1}^{N} \Big( \langle \mu_{v_i}, u_i - v_i \rangle + \frac{\gamma}{2} \int_\Omega \big| u_i(x) - v_i(x) \big|^2 dx \Big), \end{aligned} \qquad (26)$$
where $\mu_p = (\mu_{p_1}, \cdots, \mu_{p_N})$ and $\mu_v = (\mu_{v_1}, \cdots, \mu_{v_N})$ are Lagrange multipliers and γ is a positive constant. Here $\langle \mu_{p_i}, \nabla u_i - p_i \rangle = \int_\Omega \mu_{p_i}^T(x)\big( \nabla u_i(x) - p_i(x) \big)\, dx$ and $\langle \mu_{v_i}, u_i - v_i \rangle = \int_\Omega \mu_{v_i}(x)\big( u_i(x) - v_i(x) \big)\, dx$. ADMM updates the Lagrange multipliers after solving for the primal variables in a Gauss–Seidel manner. The ADMM for solving Eq. (26) is described in the following Algorithm 1.
Algorithm 1 (ADMM for Eq. (26)). In each iteration, the primal variables p, v, u, c, and w are updated sequentially by solving the subproblems described below; for instance, the w-update reads
$$w^{k+1} = \arg\min_{w} L\big( p^{k+1}, v^{k+1}, u^{k+1}, c^{k+1}, w;\, \mu_p^{k}, \mu_v^{k} \big). \qquad (31)$$
The Lagrange multipliers are then updated as
$$\mu_{p_i}^{k+1} = \mu_{p_i}^{k} + \gamma \big( \nabla u_i^{k+1} - p_i^{k+1} \big), \qquad \mu_{v_i}^{k+1} = \mu_{v_i}^{k} + \gamma \big( u_i^{k+1} - v_i^{k+1} \big).$$
p-Subproblem
Collecting the terms of (26) associated with the primal variable p, with all other variables fixed, we have
$$p^{k+1} = \arg\min_{p} \sum_{i=1}^{N} \Big( \int_\Omega \big| p_i(x) \big|\, dx - \int_\Omega \big( \mu_{p_i}^{k}(x) \big)^T p_i(x)\, dx + \frac{\gamma}{2} \int_\Omega \big| \nabla u_i^{k}(x) - p_i(x) \big|^2 dx \Big). \qquad (32)$$
Equation (32) is equivalent to the following:
$$p^{k+1} = \arg\min_{p} \sum_{i=1}^{N} \Big( \int_\Omega \big| p_i(x) \big|\, dx + \frac{\gamma}{2} \int_\Omega \big| p_i(x) - X_i^{k}(x) \big|^2 dx \Big), \qquad (33)$$
where $X_i^{k}(x) = \nabla u_i^{k}(x) + \frac{\mu_{p_i}^{k}(x)}{\gamma}$. Equation (33) has a closed-form solution given by the shrinkage operator; we can compute $p^{k+1}$ as follows:
$$p_i^{k+1} = \mathcal{S}\Big( X_i^{k}, \frac{1}{\gamma} \Big). \qquad (34)$$
Here $\mathcal{S}$ denotes the (vectorial) shrinkage operator, which is defined as follows:
$$\mathcal{S}(X, \gamma) = \frac{X}{|X|} \max\big( |X| - \gamma,\, 0 \big).$$
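A minimal NumPy sketch of this vectorial shrinkage, applied pixel-wise to a two-component gradient field, is given below; the array shapes and the small stabilizing constant are assumptions for illustration.

import numpy as np

def shrink(X, thresh, eps=1e-12):
    """Vectorial shrinkage S(X, thresh) applied pixel-wise.

    X is assumed to have shape (..., 2), the last axis holding the two
    components of the gradient field at each pixel.
    """
    norm = np.sqrt((X ** 2).sum(-1, keepdims=True))         # |X| per pixel
    return X / (norm + eps) * np.maximum(norm - thresh, 0)  # X/|X| * max(|X|-t, 0)

# Example: p_i^{k+1} = S(grad_u + mu_p / gamma, 1 / gamma)
gamma = 10.0
grad_u = np.random.randn(64, 64, 2)
mu_p = np.zeros_like(grad_u)
p_new = shrink(grad_u + mu_p / gamma, 1.0 / gamma)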
v-Subproblem
The subproblem for v is as follows:
$$v^{k+1} = \arg\min_{v} \sum_{i=1}^{N} \Big( \lambda \int_\Omega \big| I(x) - c_i^{k} (w^{k})^T G(x) \big|^2 v_i(x)\, dx - \int_\Omega \mu_{v_i}^{k}(x)\, v_i(x)\, dx + \frac{\gamma}{2} \int_\Omega \big| u_i^{k}(x) - v_i(x) \big|^2 dx \Big) + l_U(v). \qquad (35)$$
Equation (35) is equivalent to the following:
$$v^{k+1} = \arg\min_{v}\ \frac{\gamma}{2} \sum_{i=1}^{N} \int_\Omega \big| v_i(x) - Y_i^{k}(x) \big|^2 dx + l_U(v), \qquad (36)$$
where $Y_i^{k}(x) = u_i^{k}(x) + \frac{\mu_{v_i}^{k}(x)}{\gamma} - \frac{\lambda \big| I(x) - c_i^{k} (w^{k})^T G(x) \big|^2}{\gamma}$. Because U is a convex simplex at any x in the domain Ω, the solution is given by the following:
$$v^{k+1} = \Pi_U\big( (Y_i^{k})_{i=1}^{N} \big), \qquad (37)$$
where $\Pi_U$ denotes the projection onto the simplex U; for more details, please see Chen and Ye (2011).
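For reference, the following is a minimal NumPy sketch of the sort-based Euclidean projection onto the probability simplex (in the spirit of Chen and Ye 2011), applied row-wise with one row per pixel; the shapes and test data are illustrative.

import numpy as np

def project_simplex(Y):
    """Euclidean projection of each row of Y onto {v : v >= 0, sum(v) = 1}."""
    n = Y.shape[-1]
    u = -np.sort(-Y, axis=-1)                       # sort each row in descending order
    css = np.cumsum(u, axis=-1)
    k = np.arange(1, n + 1)
    cond = u * k > (css - 1.0)                      # holds exactly for the first rho entries
    rho = cond.sum(-1)
    theta = (np.take_along_axis(css, rho[..., None] - 1, axis=-1) - 1.0) / rho[..., None]
    return np.maximum(Y - theta, 0.0)

# Example: project candidate memberships Y^k (one row per pixel, N columns)
Yk = np.random.randn(4096, 3)
v_new = project_simplex(Yk)
assert np.allclose(v_new.sum(-1), 1.0) and (v_new >= 0).all()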
u-Subproblem
The subproblem for u is as follows:
$$u^{k+1} = \arg\min_{u} \sum_{i=1}^{N} \Big( \int_\Omega \big( \nabla u_i(x) \big)^T \mu_{p_i}^{k}(x) + \frac{\gamma}{2} \big| \nabla u_i(x) - p_i^{k+1}(x) \big|^2 dx + \int_\Omega u_i(x)\, \mu_{v_i}^{k}(x) + \frac{\gamma}{2} \big| u_i(x) - v_i^{k+1}(x) \big|^2 dx \Big). \qquad (38)$$
The closed-form solution of $u^{k+1}$ can be obtained from the corresponding optimality (Euler–Lagrange) equation. We follow Wang et al. (2008), where a diagonalization technique is used to obtain a fast solution for $u^{k+1}$.
c-Subproblem
The subproblem for c is as follows:
$$c^{k+1} = \arg\min_{c}\ \lambda \sum_{i=1}^{N} \int_\Omega \big| I(x) - c_i\, w^T G(x) \big|^2 v_i(x)\, dx. \qquad (40)$$
To find $c^{k+1}$, we use a solution similar to that of the basic MICO described in section “Multiplicative Intrinsic Component Optimization”, with the regularization parameter λ:
$$c_i^{k+1} = \frac{\int_\Omega \lambda\, I(x)\, b(x)\, u_i^{k+1}(x)\, dx}{\int_\Omega \lambda\, b^2(x)\, u_i^{k+1}(x)\, dx}, \qquad i = 1, \cdots, N. \qquad (41)$$
The estimated bias field $b^{k+1}$ is then computed from the optimal vector $w^{k+1}$ given by Eq. (42).
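Putting the pieces together, the following skeleton sketches one possible organization of the TVMICO ADMM loop; the helpers `update_u`, `update_c`, and `update_w` are hypothetical placeholders for the u-, c-, and w-subproblem solvers described above, `shrink` and `project_simplex` refer to the sketches given earlier in this section, and all parameter values are illustrative.

import numpy as np

def tvmico_admm(I, B, update_u, update_c, update_w, N=3, gamma=10.0,
                lam=0.01, n_iter=50):
    """Skeleton of the ADMM iteration for the TVMICO model (Eq. (26)).

    `update_u`, `update_c`, `update_w` are placeholder callables for the
    u-, c-, and w-subproblem solvers described in the text; `B` holds the
    smooth basis functions (one G(x) per row).
    """
    P = I.size
    u = np.full((P, N), 1.0 / N)
    v, c, w = u.copy(), np.linspace(I.min(), I.max(), N), np.zeros(B.shape[1])
    grad_u = np.zeros((P, N, 2))                       # placeholder gradient field
    p = np.zeros_like(grad_u)
    mu_p, mu_v = np.zeros_like(p), np.zeros_like(u)
    for _ in range(n_iter):
        p = shrink(grad_u + mu_p / gamma, 1.0 / gamma)             # Eq. (34)
        Y = u + mu_v / gamma - lam * (I[:, None] - np.outer(B @ w, c)) ** 2 / gamma
        v = project_simplex(Y)                                     # Eq. (37)
        u, grad_u = update_u(p, v, mu_p, mu_v, gamma)              # Eq. (38)
        c = update_c(I, B @ w, v, lam)                             # Eqs. (40)-(41)
        w = update_w(I, B, c, v, lam)                              # Eq. (42)
        mu_p += gamma * (grad_u - p)                               # multiplier updates
        mu_v += gamma * (u - v)
    return u, c, B @ w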
Spatiotemporal Regularization for 4D Segmentation
For 4D (spatiotemporal) MR data, the observed image sequence is modeled as $I(x, t) = b(x, t)\, J(x, t) + n(x, t)$, where $J(\cdot, t)$ is the true image, $b(\cdot, t)$ is the bias field, and $n(\cdot, t)$ is additive noise at time t. We assume there are N types of tissues in the image domain $\Omega$. The true image J(x, t) can be approximated by $J(x, t) = \sum_{i=1}^{N} c_i(t)\, u_i(x, t)$, where N is the number of tissues in $\Omega$, $u_i(\cdot, t)$ is the membership function of the i-th tissue, and the constant $c_i(t)$ is the value of the true image J(x, t) in the i-th tissue. For convenience, we represent the constants $c_1(t), \cdots, c_N(t)$ with a column vector
c(t) = (c1 (t), · · · , cN (t))T . The membership functions u1 (x, t), · · · , uN (x, t) are
also represented by a vector-valued function u(x, t) = (u1 (x, t), · · · , uN (x, t))T .
The bias field b(·, t) at each time point t is estimated by a linear combination of
a set of smooth basis functions g1 (x), · · · , gM (x). Using the vector representation
in section “Mathematical Description of Multiplicative Intrinsic Components”, the
bias field $b(\cdot, t)$ at the time point t can be expressed as $b(x, t) = w(t)^T G(x)$, with $w(t) = (w_1(t), \cdots, w_M(t))^T$, where $w_1(t), \cdots, w_M(t)$ are the time-dependent coefficients of the basis functions $g_j(x)$, $j = 1, \cdots, M$.
The spatiotemporal regularization of the membership functions ui (x, t) can be
naturally taken into account in the following variational formulation with a data
term (image-based term) and a spatiotemporal regularization term as follows:
$$\mathcal{G}(u, c, w) = \lambda \int_{[0, L]} F\big( u(\cdot, t), c(t), w(t) \big)\, dt + \sum_{i=1}^{N} TV(u_i), \qquad (46)$$
where λ > 0 is a constant and $F(u(\cdot, t), c(t), w(t))$ is the data term defined in (8) for the image $I(\cdot, t)$ at the time point t, namely,
$$F\big( u(\cdot, t), c(t), w(t) \big) = \sum_{i=1}^{N} \int_\Omega \big| I(x, t) - w(t)^T G(x)\, c_i(t) \big|^2 u_i^q(x, t)\, dx,$$
where the gradient operator ∇ is with respect to the spatial and temporal variables
x and t. We call the above variational formulation a 4D MICO formulation.
The minimization of the energy $\mathcal{G}$ is subject to the constraints on the membership functions. Therefore, we solve the following constrained energy minimization problem:
$$\min_{u, c, w} \mathcal{G}(u, c, w) \quad \text{subject to} \quad u(\cdot, t) \in U \ \text{for every } t \in [0, L]. \qquad (48)$$
The minimization of the energy G with respect to c(t) and w(t) is independent
of the spatiotemporal regularization term in (46). The optimal vectors c(t) and
w(t) can be computed for each time point t independently from the image I (·, t)
as in the energy minimization for the basic MICO formulation described in
section “Optimization of Energy Function and Algorithm”. The numerical technique
in Li et al. (2010) for variational formulations with TV regularization can be used to
minimize G with respect to the 4D membership function u subject to the constraint
in Eq. (48). In our future research work focusing on 4D segmentation based on
the fundamental MICO formulation, we will provide a detailed explanation of the
numerical approach for addressing the constrained energy minimization problem in
Eq. (48) and its modified variants.
Modified MICO Formulation with Weighting Coefficients for Different Tissues
The data term of the standard MICO formulation can be modified by introducing a weighting coefficient for each tissue:
$$F(u, c, w) = \sum_{i=1}^{N} \lambda_i \int_\Omega \big| I(x) - w^T G(x)\, c_i \big|^2 u_i^q(x)\, dx, \qquad (49)$$
i=1
where $\lambda_i$ is the coefficient for the i-th tissue. The parameters $\lambda_1, \cdots, \lambda_N$ give users the option of improving the outcomes of the standard MICO formulation. For instance, if the i-th tissue is over-segmented by the standard MICO formulation in section “Multiplicative Intrinsic Component Optimization”, the above modified formulation in Eq. (49) can be used instead with a large $\lambda_i > 1$.
Results and Discussions
Our approach has been thoroughly validated on both synthetic and real MRI data, including 1.5T and 3T MRI data. In this part, we first provide experimental results for various synthetic and real MR images, including those with significant intensity inhomogeneities. We also give quantitative evaluation results and comparisons with other well-known methodologies.
In our MICO applications for 1.5T and 3T MR images, we employ 20 poly-
nomials of the first three orders as the basis functions g1 , · · · , gM with M = 20.
For images obtained from 1.5T and 3T MRI scanners, our technique with these 20
basis functions works effectively. The intensity inhomogeneities in high-field (e.g.,
7T) MRI scanners exhibit more complex profiles than those in 1.5T and 3T MR images.
More basis functions are required in this circumstance so that a wider variety of
bias fields may be well represented by linear combinations. Given an appropriately large number of basis functions, any function can be well approximated by a linear combination of them.
Fig. 3 The left column displays the results for images with extreme intensity inhomogeneity.
Columns 2, 3, and 4 show the estimated bias fields, segmentation results, and bias field corrected
images, respectively
The simulated bias fields take values in the interval [1 − α, 1 + α], where α > 0 indicates the degree of intensity inhomogeneity. We created five image sets with α = 0.1, 0.2, 0.3, 0.4, and 0.5. We constructed six alternative bias fields with values in [1 − α, 1 + α] for each α and multiplied them with the original image obtained from BrainWeb to obtain six images with varying intensity inhomogeneities. The images were then subjected to six different degrees of noise. Thus, the five sets of images comprise 30 images with varying degrees of intensity inhomogeneities and noise levels. We first show the segmentation results of the 4 tested methods for 2 of the 30 images in Fig. 4: one with the lowest degree of intensity inhomogeneity (generated with α = 0.1) and the other with the highest degree of intensity inhomogeneity (generated with α = 0.5).
By visual comparison, the segmentation results of the four approaches for an image
with a low degree of intensity inhomogeneity seem similar, as shown in the upper
row of Fig. 4. Our technique has a distinct benefit for images with a high degree of
intensity inhomogeneity, as seen in the lower row of Fig. 4.
A more objective and precise comparison of the segmentation accuracy of the four segmentation techniques can be made by evaluating the segmentation results using the Jaccard similarity (JS) index (Shattuck et al. 2001):
$$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}, \qquad (50)$$
Fig. 4 Comparison of our method with SPM, FSL, and FANTASM on synthetic images with
different degrees of intensity inhomogeneities. The input images are displayed in the first column
from the left, containing one with a low degree of intensity inhomogeneity (in the top row) and one
with a high degree of intensity inhomogeneity (in the lower row). The segmentation results of our
technique, SPM, FSL, and FANTASM are displayed in the second, third, fourth, and fifth columns,
respectively
where $S_1$ is the region segmented by an algorithm and $S_2$ is the corresponding region in the ground truth. We have the ground truth of the segmentation of the WM, GM, and
CSF for synthetic data from the BrainWeb, which can be directly utilized as S2 in
Eq. (50) to compute the JS index. The greater the JS value, the more similar the
algorithm segmentation is to the reference segmentation.
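Computing the JS index from binary masks is straightforward; a minimal NumPy sketch (with synthetic masks as placeholders for actual segmentations) is given below.

import numpy as np

def jaccard(S1, S2):
    """Jaccard similarity (Eq. (50)) between two binary masks of equal shape."""
    S1, S2 = S1.astype(bool), S2.astype(bool)
    inter = np.logical_and(S1, S2).sum()
    union = np.logical_or(S1, S2).sum()
    return inter / union if union > 0 else 1.0

# Example: JS between an algorithm's WM mask and a ground truth WM mask
seg = np.random.rand(64, 64) > 0.5
gt = np.random.rand(64, 64) > 0.5
print(jaccard(seg, gt))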
The comparison of JS values of the 4 approaches on the 30 synthetic images
with varying degrees of intensity inhomogeneities and different amounts of noise is
shown in Fig. 5. The box plot of the JS values for the GM and WM generated by
our approach (MICO and TVMICO), SPM, FSL, and FANTASM is shown in Fig. 5.
In terms of segmentation accuracy and robustness, the box plot of the JS values in
Fig. 5 clearly shows that MICO and TVMICO perform better than SPM, FSL, and
FANTASM.
We see that the box in the box plot for the basic MICO is comparatively
shorter, and there are no outliers in the JS values throughout all 30 test images.
This demonstrates the basic MICO’s intended robustness. The TVMICO is slightly
more accurate than the regular MICO; however, there are outliers in the TVMICO's
JS values. The performance of TVMICO is determined by the parameter λ in
Eq. (21), which must be modified in some circumstances. We set λ = 0.01 for
all 30 test images in this experiment and observed that the results are generally
favorable, except for one scenario, which results in outliers in the box plot in
Fig. 5. In comparison, the basic MICO is more robust and has more steady
performance than TVMICO, while the latter is somewhat more accurate in most
circumstances. In reality, the difference in segmentation accuracy between MICO
and TVMICO is not substantial for images with reasonable noise levels. When
robustness is a priority and the image noise level is low, we recommend using
1228 S. Wali et al.
Fig. 5 Quantitative 1
evaluation of TVMICO (with
λ = 0.01), MICO, SPM, FSL, 0.9
and FANTASM segmentation
0.8
outcomes for 30 images using
Jaccard similarity index with 0.7
JS for GM
ground truth
0.6
0.5
0.4
0.3
0.2
0.1
TVMICO MICO SPM FSL Fantasm
1
0.9
0.8
0.7
JS for WM
0.6
0.5
0.4
0.3
0.2
0.1
TVMICO MICO SPM FSL Fantasm
the basic MICO. Otherwise, TVMICO has better segmentation and bias-corrected
results; see Figs. 6, 7 and 8. When the noise level is high, the results obtained by
our proposed TVMICO outperform the basic MICO. Figures 6, 7 and 8 show the
progress in segmentation and bias field correction in zoomed images. We added
various intensity inhomogeneities and noise to the images generated from the
atrophy simulator to assess the performance of our technique in the presence of
intensity inhomogeneities and noise. In this experimental result, we set λ = 0.008
in the TVMICO formulation in Eq. (23). We observed that the performance of
the TVMICO formulation is affected by the parameter λ as well as certain extra
parameters in the numerical method for energy minimization with respect to the
membership functions. More information on the implementation and validations of
the 4D MICO formulation in Eq. (46) and its modified variants will be published in
a subsequent publication as an extension of this study. In the case of fully automatic
Fig. 6 On BrainWeb data, we obtained results for tissue segmentation and bias correction using
our proposed MICO and TVMICO. Figure (a) shows the original image, (b) and (e) show the
segmentation results, (c) and (f) show bias fields, and (d) and (g) provide bias field corrected
images
Fig. 7 On BrainWeb data, we obtained results for tissue segmentation and bias correction using
our proposed MICO and TVMICO. Figure (a) shows the original image, (b) and (e) show the
segmentation results, (c) and (f) show bias fields, and (d) and (g) provide bias field corrected
images
segmentation of huge data sets, robustness and stability of performance are critical.
The basic MICO is preferred to TVMICO because of its robustness and stability.
MICO’s estimated bias field b̂ can be used to compute the bias field corrected
image I /b̂. We examined the performance of MICO’s bias field correction and
compared it to two well-known bias field correction methods, namely, the N3
approach described in Sled et al. (1998) and the entropy minimization method
proposed in Likar et al. (2001).
Fig. 8 On BrainWeb data, we obtained results for tissue segmentation and bias correction using
our proposed MICO and TVMICO. Figure (a) shows the original image, (b) and (e) show the
segmentation results, (c) and (f) show bias fields, and (d) and (g) provide bias field corrected
images
The performance of bias field correction is evaluated quantitatively using the coefficient of variation (CV) and the coefficient of joint variation (CJV). The CV of a tissue T is defined as
$$CV(T) = \frac{\sigma(T)}{\mu(T)},$$
where σ(T) and μ(T) denote the standard deviation and mean of the intensities in the tissue T, respectively. The CJV is defined as follows:
$$CJV = \frac{\sigma(WM) + \sigma(GM)}{|\mu(WM) - \mu(GM)|}.$$
The CV and CJV of the bias field corrected images are used to evaluate the performance of bias field correction, with lower CV and CJV values indicating better bias field correction outcomes.
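A minimal NumPy sketch of these two metrics is given below; the image and tissue masks are synthetic placeholders rather than actual MRI data.

import numpy as np

def cv(intensities):
    """Coefficient of variation of the intensities within one tissue."""
    return intensities.std() / intensities.mean()

def cjv(wm, gm):
    """Coefficient of joint variation between WM and GM intensities."""
    return (wm.std() + gm.std()) / abs(wm.mean() - gm.mean())

# Example: evaluate a bias-corrected image given WM/GM masks
img = np.random.rand(64, 64) + 1.0
wm_mask = np.random.rand(64, 64) > 0.5
gm_mask = ~wm_mask
print(cv(img[wm_mask]), cv(img[gm_mask]), cjv(img[wm_mask], img[gm_mask]))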
We applied our approach, as well as the N3 and entropy minimization algorithms included in the MIPAV software, to 15 images from 3 Tesla MRI scanners. The MIPAV software is freely available at http://mipav.cit.nih.gov/. The CV and
CJV values of the 3 tested techniques for the 15 images are displayed in Fig. 9,
demonstrating that our method outperforms the N3 and entropy minimization
methods.
It is worth noting that in the conventional definition of CV and CJV in the bias field correction literature (Vovk et al. 2007), the GM and WM regions are taken from the ground truth. Here, we approximated the ground truth GM/WM by the intersection of the segmented GM/WM obtained by applying the K-means algorithm to the bias-
Fig. 9 CV and CJV values of MICO, N3, and entropy minimization on 15 images from 3T MR scanners: (a) CV for GM, (b) CV for WM, (c) CJV
corrected images by the three compared bias field correction methods: our method
and the well-known N3 method (Sled et al. 1998) and the entropy minimization
method (Likar et al. 2001).
Conclusion
References
Ahmed, M., Yamany, S., Mohamed, N., Farag, A., Moriarty, T.: A modified fuzzy c-means
algorithm for bias field estimation and segmentation of MRI data. IEEE Trans. Med. Imaging
21(3), 193–199 (2002)
Axel, L., Costantini, J., Listerud, J.: Intensity correction in surface-coil MR imaging. Am. J. Radiol.
148(2), 418–420 (1987)
Barrow, H., Tenenbaum, J.: Recovering intrinsic scene characteristics from images. In: Hanson,
A., Riseman, E. (eds.) Computer Vision Systems, pp. 3–26. Academic, Orlando (1978)
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
Chen, K.: Introduction to Variational Image-Processing Models and Applications (2013)
Chen, Y., Ye, X.: Projection onto a simplex, arXiv preprint arXiv:1101.6081 (2011)
Condon, B.R., Patterson, J., Wyper, D.: Image nonuniformity in magnetic resonance imaging: its
magnitude and methods for its correction. Br. J. Radiol. 60(1), 83–87 (1987)
Dawant, B., Zijdenbos, A., Margolin, R.: Correction of intensity variations in MR images for
computer-aided tissues classification. IEEE Trans. Med. Imaging 12(4), 770–781 (1993)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via
finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. 9(R2), 41–76 (1975)
Pham, D., Prince, J.: Adaptive fuzzy segmentation of magnetic resonance images. IEEE Trans.
Med. Imaging 18(9), 737–752 (1999)
Powell, M.J.D.: Approximation Theory and Methods. Cambridge University Press, Cambridge
(1981)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D: Nonlinear Phenom. 60(1–4), 259–268 (1992)
Salvado, O., Hillenbrand, C., Wilson, D.: Correction of intensity inhomogeneity in MR images of
vascular disease. In: EMBS’05, pp. 4302–4305. IEEE, Shanghai (2005)
Shattuck, D.W., Sandor-Leahy, S.R., Schaper, K.A., Rottenberg, D.A., Leahy, R.M.: Magnetic
resonance image tissue classification using a partial volume model. Neuroimage 13, 856–876
(2001)
Simmons, A., Tofts, P.S., Barker, G.J., Arridge, S.R.: Sources of intensity nonuniformity in spin
echo images at 1.5t. Magn. Reson. Med. 32(1), 121–128 (1991)
Sled, J., Zijdenbos, A., Evans, A.: A nonparametric method for automatic correction of intensity
nonuniformity in MRI data. IEEE Trans. Med. Imaging 17(1), 87–97 (1998)
Stockman, G., Shapiro, L.G.: Computer Vision. Prentice Hall, Upper Saddle River (2001)
Styner, M., Brechbuhler, C., Szekely, G., Gerig, G.: Parametric estimate of intensity inhomo-
geneities applied to MRI. IEEE Trans. Med. Imaging 19(3), 153–165 (2000)
Tappen, M., Freeman, W., Adelson, E.: Recovering intrinsic images from a single image. IEEE
Trans. Pattern Anal. Mach. Intell. 27(9), 1459–1472 (2005)
Tincher, M., Meyer, C.R., Gupta, R., Williams, D.M.: Polynomial modeling and reduction of RF
body coil spatial inhomogeneity in MRI. IEEE Trans. Med. Imaging 12(2), 361–365 (1993)
Tu, X., Gao, J., Zhu, C., Cheng, J.-Z., Ma, Z., Dai, X., Xie, M.: MR image segmentation and bias
field estimation based on coherent local intensity clustering with total variation regularization. Med. Biol. Eng. Comput. 54(12), 1807–1818 (2016)
Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C.: N4itk:
improved n3 bias correction. IEEE Trans. Med. Imaging 29(6), 1310–1320 (2010)
Vovk, U., Pernus, F., Likar, B.: A review of methods for correction of intensity inhomogeneity in
MRI. IEEE Trans. Med. Imaging 26(3), 405–421 (2007)
Wali, S., Zhang, H., Chang, H., Wu, C.: A new adaptive boosting total generalized variation (TGV)
technique for image denoising and inpainting. J. Vis. Commun. Image Represent. 59, 39–51
(2019a)
Wali, S., Shakoor, A., Basit, A., Xie, L., Huang, C., Li, C.: An efficient method for Euler’s elastica
based image deconvolution. IEEE Access 7, 61226–61239 (2019b)
Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation
image reconstruction. SIAM J. Imaging Sci. 1(3), 248–272 (2008)
Weiss, Y.: Deriving intrinsic images from image sequences. In: Proceedings of 8th International
Conference on Computer Vision (ICCV), vol. II, pp. 68–75 (2001)
Wells, W., Grimson, E., Kikinis, R., Jolesz, F.: Adaptive segmentation of MRI data. IEEE Trans.
Med. Imaging 15(4), 429–442 (1996)
Wicks, D.A.G., Barker, G.J., Tofts, P.S.: Correction of intensity nonuniformity in MR images of
any orientation. Magn. Reson. Imag. 11(2), 183–196 (1993)
Zheng, X., Lei, Q., Yao, R., Gong, Y., Yin, Q.: Image segmentation based on adaptive k-means
algorithm. EURASIP J. Image Video Process. 2018(1), 1–10 (2018)
35 Data-Informed Regularization for Inverse and Imaging Problems
Jonathan Wittmer and Tan Bui-Thanh
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1236
A Data-Informed Regularization (DI) Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1238
Data-Informed Regularization Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1238
A Statistical Data-Informed (DI) Inverse Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1245
Properties of the DI Regularization Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1250
Applications to Imaging Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256
Image Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256
Image Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1265
X-Ray Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1265
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1271
J. Wittmer ()
Department of Aerospace Engineering and Engineering Mechanics, UT Austin, Austin, TX, USA
e-mail: [email protected]
T. Bui-Thanh ()
Department of Aerospace Engineering and Engineering Mechanics, The Oden Institute for
Computational Engineering and Sciences, UT Austin, Austin, TX, USA
e-mail: [email protected]
Abstract
This chapter presents a new regularization method for inverse and imaging prob-
lems, called data-informed (DI) regularization, that implicitly avoids regularizing
the data-informed directions. Our approach is inspired by and has a rigorous
root in disintegration theory. We shall, however, present an elementary and
constructive path using the classical truncated SVD and Tikhonov regularization
methods. Deterministic and statistical properties of the DI approach are rigor-
ously discussed, and numerical results for image deblurring, image denoising,
and X-ray tomography are presented to verify our findings.
Keywords
Introduction
In this section we review the key ideas behind regularization by truncation using the
singular value decomposition (SVD). This provides the basic insights into the data-
informed regularization technique. A statistical interpretation of the data-informed
inverse framework will be discussed in section “A Statistical Data-Informed (DI)
Inverse Framework”. To begin, let us consider a linear inverse problem to determine
x ∈ Rp given
y = Ax + e, (1)
where $A \in \mathbb{R}^{d \times p}$, $e \sim \mathcal{N}(0, \lambda^2 I)$, $I \in \mathbb{R}^{d \times d}$, and $y \in \mathbb{R}^{d}$. In the following, the identity matrix I may have different sizes at different places, and the actual size should be clear from the context. The simplest approach to attempt to solve this inverse problem is perhaps the least-squares approach:
$$\min_{x} \frac{1}{2} \| Ax - y \|^2, \qquad (2)$$
where $\|\cdot\|$ denotes the standard Euclidean norm. Figure 1a plots the exact synthetic
solution (black curve) against the least-squares solution (red curve) for a deconvo-
lution problem with d = p = 101 and λ = 0.05. As can be seen, the least-squares
solution blows up (or is unstable) due to the ill-conditioning of $A^T A$, which is not
surprising since the inverse problem is (typically) ill-posed.
Fig. 1 Deconvolution using (a) the least-squares approach and (b) a Tikhonov regularization with regularization parameter α = 1 and $x_0 = 0$
A classical remedy is the Tikhonov regularization approach
$$\min_{x} \frac{1}{2} \| Ax - y \|^2 + \frac{\alpha}{2} \| x - x_0 \|^2,$$
where α > 0 is the regularization parameter and $x_0$ a reference parameter; the corresponding solution is shown in Fig. 1b. To study regularization by truncation, consider the rank-n singular value decomposition $A = U_n \Sigma_n V_n^T$, where $U_n := [U_1, \ldots, U_n]$ (the first n columns of U corresponding to the n nonzero singular values; this rank-n decomposition is often known as the reduced SVD), $\Sigma_n := \operatorname{diag}[\sigma_1, \ldots, \sigma_n]$ (with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n$), $n \le \min\{d, p\}$, and $V_n := [V_1, \ldots, V_n]$ (the first n columns of V corresponding to the n nonzero singular values).
U n forms an orthonormal basis for the column space of A, and V n forms an
orthonormal basis for the row space of A. The solution of (2) with this rank-n
truncation using the pseudo-inverse A† reads
$$x_{SVD}^{n} = A^{\dagger} y = V_n \Sigma_n^{-1} U_n^T y = \sum_{i=1}^{n} \frac{U_i^T y}{\sigma_i} V_i,$$
while the truncated SVD (TSVD) solution retains only the first $r \le n$ terms:
$$x_{TSVD}^{r} := \sum_{i=1}^{r} \frac{U_i^T y}{\sigma_i} V_i, \qquad (3)$$
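A minimal NumPy sketch of the TSVD solution (3) is given below; the small ill-conditioned test problem is an illustrative assumption, not the chapter's deconvolution example.

import numpy as np

def tsvd_solution(A, y, r):
    """Truncated SVD solution (Eq. (3)): keep only the r most informed modes."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :r].T @ y) / s[:r]          # U_i^T y / sigma_i, i = 1..r
    return Vt[:r].T @ coeffs                   # sum_i coeffs_i * V_i

# Example on a small ill-conditioned system
rng = np.random.default_rng(0)
A = rng.standard_normal((101, 101)) @ np.diag(0.9 ** np.arange(101))
x_true = np.sin(np.linspace(0, np.pi, 101))
y = A @ x_true + 0.05 * rng.standard_normal(101)
x_r = tsvd_solution(A, y, r=10)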
Figure 2 applies the TSVD approach to the deconvolution problem and compares
the results with the Tikhonov regularization. As can be seen, TSVD solutions are
stable and do not seem to over-regularize the solution. However, as r increases,
TSVD solutions tend to be more oscillatory (more unstable). How can this behavior
of TSVD be explained?
The answer lies in the fact that the $j$th column of $A$ is the observational vector when the parameter $x$ is the $j$th canonical basis vector in $\mathbb{R}^p$. Thus the range space (column space) of $A$ can be understood as the observable subspace in $\mathbb{R}^d$. Within this observable subspace, we say that the subspace spanned by $U_j$, i.e., $\mathrm{span}\{U_j\}$, is more observable than the subspace spanned by $U_i$, i.e., $\mathrm{span}\{U_i\}$, when $j < i$. Equivalently, $\mathrm{span}\{U_i\}$ is less observable than $\mathrm{span}\{U_j\}$. With this (relative) definition, $\mathrm{span}\{U_1\}$ is most observable, while $\mathrm{span}\{U_n\}$ is least observable. Clearly $j < i$ implies $1/\sigma_j \leq 1/\sigma_i$. Consequently, in the TSVD solution (3), less observable directions are amplified by the larger factors $1/\sigma_i$, which is why retaining more of them (increasing $r$) makes the solution more oscillatory. In fact, the TSVD solution can be obtained by placing an infinite amount of regularization on the data-uninformed directions, i.e., by solving
$$\min_x \frac{1}{2}\|Ax - y\|^2 + \frac{1}{2}\|L(x - x_0)\|^2, \qquad (4)$$
where
$$L^T L := \infty\left(I - V_r V_r^T\right) = \infty\, V_r^\perp (V_r^\perp)^T = [V_r, V_r^\perp]\begin{bmatrix} 0 & 0 \\ 0 & \infty I \end{bmatrix}[V_r, V_r^\perp]^T,$$
and $I - V_r V_r^T$ is the orthogonal projection onto the data-uninformed subspace spanned by $\{V_j\}_{j=r+1}^{p}$. Here, multiplication by infinity is understood in the usual limit sense, e.g., $\infty I := \lim_{\alpha\to\infty}\alpha I$. Thus, regularization—an infinite amount in this case—is only added in data-uninformed directions. The solution of (4) is formally given by
$$x^{\mathrm{Inf}} = \left(A^T A + \infty\left(I - V_r V_r^T\right)\right)^{-1}\left(A^T y + L^T L x_0\right)$$
$$= \left\{[V_r, V_r^\perp]\left(\begin{bmatrix}\Sigma_r^2 & 0\\ 0 & D^2\end{bmatrix} + \begin{bmatrix}0 & 0\\ 0 & \infty I\end{bmatrix}\right)[V_r, V_r^\perp]^T\right\}^{-1} A^T y$$
$$= V_r \Sigma_r^{-2} V_r^T A^T y = V_r \Sigma_r^{-1} U_r^T y =: x_r^{TSVD},$$
where $V_r^\perp$ is the orthogonal complement of $V_r$ in $\mathbb{R}^p$, $\Sigma_r := \mathrm{diag}[\sigma_1, \ldots, \sigma_r]$, and $D := \mathrm{diag}[\sigma_{r+1}, \ldots, \sigma_p]$. The second equality clearly shows that the
regularization scheme adds infinity to all singular values that correspond to data-
uninformed modes. The last equality proves that infinite regularization on data-
uninformed parameter subspace is the same as the TSVD approach.
The beauty of the TSVD approach is that it avoids putting any regularization on data-informed parameter directions, and hence avoids polluting inverse solutions in these directions, while annihilating data-uninformed directions. However, it is often the case that there is no clear-cut separation between data-informed and data-uninformed directions (i.e., it is not the case that $\sigma_k = 0$ for $k \geq r + 1$); instead, the singular values of $A$ decay gradually (sometimes exponentially). In that case, completely removing the less data-informed directions may not be ideal, as they may still contain valuable parameter information encoded in the data. Instead, we may want to impose finite regularization in the data-uninformed directions, i.e.,
$$\min_x \frac{1}{2}\|Ax - y\|^2 + \frac{1}{2}\|L(x - x_0)\|^2, \qquad (5)$$
where
$$L^T L := \alpha\left(I - V_r V_r^T\right) = \alpha\, V_r^\perp (V_r^\perp)^T = [V_r, V_r^\perp]\begin{bmatrix}0 & 0\\ 0 & \alpha I\end{bmatrix}[V_r, V_r^\perp]^T.$$
Let us call this approach the data-informed (DI) regularization method. The
inverse solution in this case reads
$$x^{DI} = \left(A^T A + \alpha\left(I - V_r V_r^T\right)\right)^{-1}\left(A^T y + L^T L x_0\right)$$
$$= \left\{[V_r, V_r^\perp]\left(\begin{bmatrix}\Sigma_r^2 & 0\\ 0 & D^2\end{bmatrix} + \begin{bmatrix}0 & 0\\ 0 & \alpha I\end{bmatrix}\right)[V_r, V_r^\perp]^T\right\}^{-1}\left(A^T y + L^T L x_0\right)$$
$$= \left\{[V_r, V_r^\perp]\left(\begin{bmatrix}\Sigma_r^2 & 0\\ 0 & D^2\end{bmatrix} + \alpha\begin{bmatrix}I & 0\\ 0 & I\end{bmatrix} - \begin{bmatrix}\alpha I & 0\\ 0 & 0\end{bmatrix}\right)[V_r, V_r^\perp]^T\right\}^{-1}\left(A^T y + L^T L x_0\right).$$
The last equality suggests that the DI approach can be considered as first applying the same (finite) regularization to all parameter directions (note that $\alpha$ need not be the same for all directions) and then removing the regularization in the data-informed directions.
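To make the closed form above concrete, the following NumPy sketch computes the DI and Tikhonov solutions directly from the SVD of A for a small problem. It is an illustrative sketch, not the authors' implementation; function names are our own, and the simple setting of an identity prior covariance is assumed.

import numpy as np

def di_solution(A, y, r, alpha, x0=None):
    # DI regularization: Tikhonov everywhere, then remove regularization
    # from the first r (data-informed) right singular directions.
    p = A.shape[1]
    x0 = np.zeros(p) if x0 is None else x0
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    Vr = Vt[:r, :].T
    # L^T L = alpha * (I - Vr Vr^T), cf. the definition below (5).
    LtL = alpha * (np.eye(p) - Vr @ Vr.T)
    return np.linalg.solve(A.T @ A + LtL, A.T @ y + LtL @ x0)

def tikhonov_solution(A, y, alpha, x0=None):
    p = A.shape[1]
    x0 = np.zeros(p) if x0 is None else x0
    return np.linalg.solve(A.T @ A + alpha * np.eye(p), A.T @ y + alpha * x0)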
A few observations are in order: (1) when $r = 0$, DI becomes the standard Tikhonov regularization; (2) when $\alpha \to \infty$, DI approaches the truncated SVD; and (3) when $\alpha \ll \sigma_i$ for $i \leq r$ (i.e., regularization in the data-informed modes is negligible), the Tikhonov solution
$$x^{Tikhonov} = \left(A^T A + \alpha I\right)^{-1}\left(A^T y + L^T L x_0\right) = \left\{[V_r, V_r^\perp]\left(\begin{bmatrix}\Sigma_r^2 & 0\\ 0 & D^2\end{bmatrix} + \alpha\begin{bmatrix}I & 0\\ 0 & I\end{bmatrix}\right)[V_r, V_r^\perp]^T\right\}^{-1}\left(A^T y + L^T L x_0\right)$$
is close to the DI solution $x^{DI}$.
Fig. 3 Deconvolution with noise level λ = 5% using DI and Tikhonov regularization for various
values of regularization parameter and r
As shown in Figs. 3 and 4, the DI solution behaves the same as Tikhonov in the under-regularization regime ($\alpha < 0.01$), as expected, and it outperforms Tikhonov for $\alpha > 0.01$ because the retained data-informed modes, which determine the quality of the deconvolution solution, are left untouched. For $r = 20$, the retained modes now also include high-frequency modes, and hence the DI approach is not as accurate as Tikhonov for $\alpha < 1$. For all cases with a significant number of modes retained, i.e., $r > 5$, the DI solution quality
is insensitive over a large range of the regularization parameter. Note that methods for choosing the regularization parameter $\alpha$ in practice include the L-curve (Hansen and O’Leary 1993; Hansen 1992), Morozov’s discrepancy principle (Morozov 1966), and generalized cross-validation (Golub et al. 1979). These methods are inherently computationally costly, and this cost can be mitigated using the DI approach, as it is robust with respect to the regularization parameter.
We have used a rank-r SVD approximation to derive and gain insights into
the DI approach. For large-scale problems, this low rank-decomposition could
be prohibitively expensive. To lead to an alternative computational approach (see
Algorithm 2) and more importantly to provide a probabilistic view point of the DI
approach, let us take $r = n$ until the end of section “A Statistical Data-Informed (DI) Inverse Framework”. In this case, since $V_n V_n^T$ is the orthogonal projection onto
Fig. 4 Deconvolution with noise level $\lambda = 5\%$ using DI and Tikhonov regularizations for $\alpha = 10^{-4}, 10^{2}$ and $r = \{1, 5, 10, 20\}$
the row space of $A$, i.e., $V_n V_n^T = A^T\left(A A^T\right)^\dagger A$, we can rewrite the inverse (optimization) problem (5) as
$$\min_x J := \frac{1}{2}\|Ax - y\|^2 + \frac{1}{2}\|L(x - x_0)\|^2, \qquad (6)$$
where
$$L^T L := \alpha\left(I - A^T\left(A A^T\right)^\dagger A\right).$$
In this form, the DI regularization approach (6) not only avoids using V n explicitly
but also brings us to a statistical data-informed inverse framework in the next
section.
A Statistical Data-Informed (DI) Inverse Framework
Note that the objective $J$ in (6) satisfies
$$\exp(-J) = \frac{\exp\left(-\frac{1}{2}\|Ax - y\|^2\right)\times\exp\left(-\frac{\alpha}{2}\|x - x_0\|^2\right)}{\exp\left(-\frac{\alpha}{2}\left(Ax - Ax_0\right)^T\left(A A^T\right)^\dagger\left(Ax - Ax_0\right)\right)}.$$
From a Bayesian inverse perspective (Kaipio and Somersalo 2005; Tarantola 2005;
Franklin 1970; Lehtinen et al. 1989; Lasanen 2002; Stuart 2010; Piiroinen 2005),
the numerator is the product of the likelihood
$$\pi_{like}(y|x) \propto \exp\left(-\frac{1}{2}\|Ax - y\|^2\right)$$
from the observational model (1) with the noise $e \sim \mathcal{N}(0, I)$ and the Gaussian prior
$$\pi_{prior}(x) \propto \exp\left(-\frac{\alpha}{2}\|x - x_0\|^2\right). \qquad (7)$$
Note that it is necessary to use the pseudo-inverse for the inverse of the covariance $C$, i.e., $C^{-1} := \alpha\left(A A^T\right)^\dagger$, since $A$ may not have full row rank and thus the push-forward distribution can be a degenerate Gaussian.
Dividing the Gaussian prior in the numerator by the denominator yields the modified prior
$$\exp\left(-\frac{\alpha}{2}(x - x_0)^T\left(I - A^T\left(A A^T\right)^\dagger A\right)(x - x_0)\right)$$
in such a way that the new prior leaves the data-informed directions, i.e., the row
space of A, untouched, and hence only regularizes data-uninformed directions. The
data-informed approach accomplishes this by the push-forward of the prior via the
parameter-to-observable map A.
where $A_\#\,\pi_{prior}(x)$ denotes the push-forward of $\pi_{prior}(x)$ via the parameter-to-observable map $A$.
We have constructively derived the DI approach by modifying the truncated SVD
method and Gaussian prior with scaled-identity covariance matrix. In practice, the
prior can be more informative about the correlations among components of x and
in that case the covariance matrix is no longer an identity matrix. Let us denote by
$\pi_{prior}(x) = \mathcal{N}(x_0, \Gamma/\alpha)$ the Gaussian prior with covariance matrix $\Gamma/\alpha$. Let us also consider a more general data distribution where, for a given parameter $x$, the data is distributed by the Gaussian $\mathcal{N}(Ax, \Lambda)$. In order to use most of the above
results, let us whiten both the parameter and the observations. In particular, $\Lambda^{-1/2} y$ is the whitened observation (inducing $\Lambda^{-1/2} A$ as the new parameter-to-observable forward map), and $\Gamma^{-1/2} x$ is the prior-whitened parameter. (Here, the square roots for $\Gamma$ and $\Lambda$ are understood in the broader sense, including: (1) if $\Gamma$ and $\Lambda$ are diagonal matrices, the square roots are simply diagonal matrices with the square roots of the diagonal elements; (2) if $\Gamma$ and $\Lambda$ are not diagonal matrices, these square roots are understood in the spectral decomposition sense. For example, let $\Gamma = V\Sigma V^T$ be the spectral decomposition of $\Gamma$; then $\Gamma^{1/2} := V\Sigma^{1/2}V^T$. Note that this is meaningful as we assume the corresponding Gaussian distribution is non-degenerate and hence $\Sigma$ is a diagonal matrix with positive diagonal elements; and (3) if a Cholesky-type decomposition is available, i.e., $\Gamma = LL^T$ ($L$ is not necessarily a Cholesky factor), then $\Gamma^{1/2} = L$, and we simply add the “transpose” operator at appropriate places for one of the square roots.) The push-forward of the prior via $\Lambda^{-1/2}A$ now reads (note that using the modified forward map $\Lambda^{-1/2}A$, though making the presentation clearer and constructive, is not necessary, as using the original map $A$ yields the same result)
$$\left(\Lambda^{-1/2} A\right)_\#\, \pi_{prior}(x) = \mathcal{N}\left(\Lambda^{-1/2} A x_0,\ \frac{1}{\alpha}\Lambda^{-1/2} A \Gamma A^T \Lambda^{-1/2}\right). \qquad (9)$$
The DI posterior (8), with whitened parameter, whitened observations, and the induced parameter-to-observable map, after writing the push-forward measure in terms of the whitened parameter, reads
$$\pi_{DI}(x|y) \propto \frac{\exp\left(-\frac{1}{2}\left\|\Lambda^{-1/2}Ax - \Lambda^{-1/2}y\right\|^2\right)\times\exp\left(-\frac{\alpha}{2}\left\|\Gamma^{-1/2}x - \Gamma^{-1/2}x_0\right\|^2\right)}{\exp\left(-\frac{\alpha}{2}\left(\Gamma^{-1/2}x - \Gamma^{-1/2}x_0\right)^T\Gamma^{1/2}A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A\Gamma^{1/2}\left(\Gamma^{-1/2}x - \Gamma^{-1/2}x_0\right)\right)}, \qquad (10)$$
or equivalently
$$-\log\pi_{DI}(x|y) \propto \frac{1}{2}\left\|\Lambda^{-1/2}Ax - \Lambda^{-1/2}y\right\|^2 + \frac{1}{2}\left\|L\left(\Gamma^{-1/2}x - \Gamma^{-1/2}x_0\right)\right\|^2, \qquad (11)$$
where
$$L^T L = \alpha\left(I - \Gamma^{1/2}A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A\Gamma^{1/2}\right) = \alpha\left(I - V_n V_n^T\right) = \alpha\, V_n^\perp (V_n^\perp)^T = [V_n, V_n^\perp]\begin{bmatrix}0 & 0\\ 0 & \alpha I\end{bmatrix}[V_n, V_n^\perp]^T, \qquad (12)$$
where $V_n$ contains the first $n$ right singular vectors of the SVD
$$\Lambda^{-1/2}A\Gamma^{1/2} := U\Sigma V^T. \qquad (13)$$
As can be seen, the push-forward measure seeks to find the first $n$ columns of $V$ associated with the $n$ nonzero singular values. The DI method then avoids regularizing these “data-informed directions” $V_n$. In other words, in the whitened parameter, the regularization induced by the prior is the identity, and the DI approach regularizes only the data-uninformed directions.
Consider the transformed parameter and data
$$z := T x, \quad\text{where } T := \Psi V^T \Gamma^{-1/2}, \qquad (14)$$
$$w := S y, \quad\text{where } S := \Phi^T U^T \Lambda^{-1/2}, \qquad (15)$$
where $z$ denotes the first $n$ coordinates of $x$ in $V$, after whitening via $\Gamma^{-1/2}$ and then being scaled by $\Psi$. Similarly, $w$ denotes the first $n$ coordinates of $y$ in $U$, after whitening via $\Lambda^{-1/2}$ and then being scaled by $\Phi$. The map $T$ pushes forward the prior in $x$ to the prior in $z$ as
$$\pi_{prior}(z) \sim \exp\left(-\frac{1}{2}\sum_{i=1}^{n}\sigma_i^{-1}(z_i - \bar{z}_i)^2\right), \qquad (16)$$
where $\bar{z} = T x_0$. Similarly, given $x$ (and hence $z$), the induced likelihood in terms of $w$ is given by
$$\pi_{like}(w|z) \sim \exp\left(-\frac{1}{2}\sum_{i=1}^{n}\sigma_i^{-1}(w_i - \sigma_i z_i)^2\right). \qquad (17)$$
As can be seen from (16) and (17), the maps T and S transform the original
parameter x and original data y to new parameter z and new data w. Two
observations are in order: (1) though in general the original parameter and data
dimensions are different, the new parameter and data have the same dimension; and
(2) the new data w and new parameter z, up to the difference in the mean, have
the same distribution. In particular, both $z$ and $w$ are $\mathbb{R}^n$-vectors of independent Gaussian components with diagonal covariance matrix $\Theta \in \mathbb{R}^{n\times n}$, $\Theta_{ii} = \sigma_i$. Both $z_i$ and $w_i$, up to the difference in the mean, follow the same Gaussian distribution with variance $\sigma_i$. Since $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n > 0$, the independent random variable
zi (and hence wi ) is ranked from the one with most variance to the one with least
variance.
Let us call the ith column of U , namely U i , the ith important direction in the
data space and the ith column of V , namely V i , the ith important direction in
the parameter space. Let us also rank the degree of importance of U i and V i by
the magnitude of σi . It follows that the transformations T and S map the original
parameter x and data y into new parameter z and data w in which the corresponding
parameter zi and data wi are equally important. This is similar to the concept of
balanced transformation in control theory (see, e.g., Gugercin and Antoulas 2004;
Antoulas 2005 and the references therein). The new parameter z is thus equally
data-informed and prior-informed. In particular, $z_i$ is less data-informed and less prior-informed, to an equal degree, relative to $z_j$ for $j < i$.
The DI method thus regularizes only the (equally) data-uninformed and
prior-uninformed parameters/directions.
Deterministic Properties
It is easy to see that the optimality condition of the optimization problem $\max_x \log\pi_{DI}(x|y)$ is given by
$$H x^{DI} = b, \qquad (18)$$
where
$$H := A^T\Lambda^{-1}A + \alpha\left(\Gamma^{-1} - A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A\right),$$
$$b := A^T\Lambda^{-1}y + \alpha\left(\Gamma^{-1} - A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A\right)x_0.$$
In order to solve the optimality condition (18) in practice, we can use the rank-$r$ approximation
$$\Lambda^{-1/2}A\Gamma^{1/2} = U_n\Sigma_n V_n^T \approx U_r\Sigma_r V_r^T \qquad (19)$$
for the push-forward matrix $A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A$, where again $n$ is the largest index for which $\sigma_n > 0$. Thus rank-$r$ approximations (only for the regularization/prior) of $H$ and $b$ are given by
$$H_r := A^T\Lambda^{-1}A + \alpha\left(\Gamma^{-1} - \Gamma^{-1/2}V_r V_r^T\Gamma^{-1/2}\right),$$
$$b_r := \Gamma^{-1/2}V_n\Sigma_n U_n^T\Lambda^{-1/2}y + \alpha\left(\Gamma^{-1} - \Gamma^{-1/2}V_r V_r^T\Gamma^{-1/2}\right)x_0.$$
The resulting linear system can then be solved with $H_r$ using the conjugate gradient (CG) method, which requires only matrix-vector products. In the numerical results section, we present a nested optimization method (see Algorithm 2) that avoids the low-rank approximation altogether. The analysis of such a method is, however, more technical and thus left for future work. The rank-$r$ approximation to the solution of the optimality condition (18) is defined as
$$H_r\, x_r^{DI} = b_r, \qquad (20)$$
for which the corresponding DI inverse formulation is given in (24) (by replacing rε
with r), which reduces to (5) when Λ = I and Γ = I . We can rewrite H r in terms
of n singular vectors corresponding to the n nonzero singular values as
$$H_r = \alpha\,\Gamma^{-1/2}\left(I + V_n D_n V_n^T\right)\Gamma^{-1/2},$$
where $D_n$ is an $n\times n$ diagonal matrix with $D_n(i,i) = \left(\sigma_i^2 - \alpha\right)/\alpha$ for $i \leq r$ and $D_n(i,i) = \sigma_i^2/\alpha$ for $r < i \leq n$.
$$A x_n^{DI} = \Lambda^{1/2} U_n U_n^T \Lambda^{-1/2} y, \qquad (22)$$
$$H_r^{-1} = \frac{1}{\alpha}\Gamma^{1/2}\left(I - V_n d_{DI}^{n,r} V_n^T\right)\Gamma^{1/2}, \qquad (23)$$
where $d_{DI}^{n,r}$ is an $n\times n$ diagonal matrix with $d_{DI}^{n,r}(i,i) = \left(\sigma_i^2 - \alpha\right)/\sigma_i^2$ for $i \leq r$ and $d_{DI}^{n,r}(i,i) = \sigma_i^2/\left(\sigma_i^2 + \alpha\right)$ for $r < i \leq n$. The computation of the product $H_r^{-1} b_r$ to arrive at the assertion is a straightforward algebraic manipulation and is hence omitted.
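As an illustration, the closed form (23) lets one apply $H_r^{-1}$ without ever forming $H_r$. The following NumPy sketch does this for the simple case of a diagonal $\Gamma$; it is our own hedged illustration (names and setup are assumptions), not the chapter's Algorithm 1 or 2.

import numpy as np

def apply_Hr_inverse(v, Vn, s, r, alpha, Gamma_diag):
    # Apply H_r^{-1} = (1/alpha) * G^{1/2} (I - Vn d Vn^T) G^{1/2} to a vector v,
    # with d as in (23); Gamma_diag holds the diagonal of Gamma, s the singular
    # values of Lambda^{-1/2} A Gamma^{1/2}, and Vn its right singular vectors.
    n = len(s)
    d = np.where(np.arange(n) < r, (s**2 - alpha) / s**2, s**2 / (s**2 + alpha))
    Ghalf = np.sqrt(Gamma_diag)
    w = Ghalf * v
    w = w - Vn @ (d * (Vn.T @ w))
    return (Ghalf * w) / alpha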
The result (22) shows that the image of the DI solution $x^{DI}$ through the parameter-to-observable map is exactly the data if $U_n U_n^T = I$ or if $\Lambda^{-1/2}y$ resides in the column space of $U_n$. This happens, for example, when $A$ has full row rank
and the number of data is not more than the dimension of the parameter, i.e., d ≤ p.
In this case, retaining all modes corresponding to nonzero singular values in the DI
solution makes the data misfit vanish, that is, the DI solution in this case would
match the noise, which is undesirable. As discussed in section “Data-Informed
Regularization Derivation”, r should be smaller than n for the solution to be
meaningful. Let us define
$$r_\varepsilon := \max\{i : 1 \leq i \leq n \text{ and } \sigma_i \geq \varepsilon\},$$
for some $\varepsilon > 0$ (which, as discussed before, can be chosen using Morozov's discrepancy
discrepancy principle), and the “reconstruction operator” (Colton and Kress 1998;
Kirsch 2011)
$$R_\varepsilon := H_{r_\varepsilon}^{-1} A^T \Lambda^{-1/2}.$$
Theorem 1. For any ε > 0 and α > 0, consider the inverse problem
$$\min_x J = \frac{1}{2}\left\|\Lambda^{-1/2}Ax - \Lambda^{-1/2}y\right\|^2 + \frac{1}{2}\left\|L\Gamma^{-1/2}(x - x_0)\right\|^2, \qquad (24)$$
(i) The inverse problem with rank-rε DI approach, i.e., the optimization problem
(24), is well-posed in the Hadamard sense.
(ii) Suppose that the nullspace of $A$ is trivial, i.e., $\mathcal{N}(A) = \{0\}$; then the DI technique is a regularization strategy (Colton and Kress 1998; Kirsch 2011)
in the following sense:
$$\lim_{\varepsilon\to 0} R_\varepsilon \Lambda^{-1/2} A x = x.$$
$$\beta(\varepsilon, \alpha) := \frac{1}{\min_{r_\varepsilon < i \leq n}\left\{\sigma_{r_\varepsilon},\ \sigma_i + \alpha/\sigma_i\right\}},$$
which shows that the DI solution is stable, which in turn proves (i). To see assertion (ii),
we use the definition of $R_\varepsilon$ and the SVD of $\Lambda^{-1/2}A\Gamma^{1/2}$ to arrive at
$$R_\varepsilon \Lambda^{-1/2}A = \frac{1}{\alpha}\Gamma^{1/2}\left(I - V_n d^{n,r_\varepsilon} V_n^T\right)V_n\Sigma_n^2 V_n^T\Gamma^{-1/2} = \Gamma^{1/2}V_n\begin{bmatrix}I & 0\\ 0 & \mathrm{diag}\left(\frac{\sigma_i^2}{\sigma_i^2+\alpha}\right)_{r_\varepsilon<i\leq n}\end{bmatrix}V_n^T\Gamma^{-1/2},$$
which implies
$$\lim_{\varepsilon\to 0} R_\varepsilon \Lambda^{-1/2} A x = \Gamma^{1/2}V_n I V_n^T\Gamma^{-1/2}x = x,$$
where we have used the fact that $r_\varepsilon \to n$ as $\varepsilon \to 0$ and that $V_n V_n^T = I$ since $\mathcal{N}(A) = \{0\}$.
For assertion (iii), it is sufficient to show that
$$\sup_y\left\{\left\|R_\varepsilon \Lambda^{-1/2} y - x\right\| : \left\|\Lambda^{-1/2}Ax - y\right\| \leq \varepsilon\right\} \to 0 \quad\text{as } \varepsilon \to 0,$$
where we have used the result from (ii), the definition of $R_\varepsilon$, and the orthonormality of $V$ and $U$. Using the assumption $\alpha = \mathcal{O}(\varepsilon)$ concludes the proof.
Remark 2. Note that most of the above arguments remain valid in the infinite-dimensional setting, i.e., $p = \infty$, assuming that $\Gamma$ is trace class. Indeed, $\Lambda^{-1/2}A\Gamma^{1/2}$ is then a compact operator, and we can invoke the infinite-dimensional singular value decomposition (Colton and Kress 1983) for $\Lambda^{-1/2}A\Gamma^{1/2}$. Note that all the matrices above are then understood as the corresponding operators.
Statistical Properties
Now we discuss some probabilistic aspects of the DI prior and the DI posterior.
Since the regularization parameter α plays no role in the following discussion, we
absorb it into Γ . We define the DI prior as
$$\pi_{DI\text{-}prior}(x) \sim \exp\left(-\frac{1}{2}\left\|L\Gamma^{-1/2}(x - x_0)\right\|^2\right). \qquad (25)$$
In particular, the (pseudo-)inverse of the DI prior covariance is
$$C_n^\dagger := \Gamma^{-1/2}\left(I - \Gamma^{1/2}A^T\Lambda^{-1/2}\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger\Lambda^{-1/2}A\Gamma^{1/2}\right)\Gamma^{-1/2}$$
$$= \Gamma^{-1/2}\left(I - \Gamma^{1/2}A^T\left(A\Gamma A^T\right)^\dagger A\Gamma^{1/2}\right)\Gamma^{-1/2} = \Gamma^{-1/2}V_n^\perp (V_n^\perp)^T\Gamma^{-1/2},$$
where we have used the fact that $\Lambda$ is invertible in the second equality. Thus, $\Lambda$ actually contributes to neither the DI prior nor its rank-$r$ version
$$C_r^\dagger := \Gamma^{-1/2}V_r^\perp (V_r^\perp)^T\Gamma^{-1/2}.$$
(i) $z$ and $z^\perp$ are distributed by the push-forward density of the prior through $T$ and $T^\perp$, respectively. In particular, $z \sim \mathcal{N}(T x_0, I)$ and $z^\perp \sim \mathcal{N}(T^\perp x_0, I)$.
(ii) The DI prior density is the density of z⊥ and hence is well-defined.
(iii) The DI prior density is the conditional density of x given z.
Proof. Assertion (i) is straightforward. To see the second assertion, we note that the density of $z^\perp$, ignoring the normalization constant, can be written as
$$\exp\left(-\frac{1}{2}\left\|z^\perp - T^\perp x_0\right\|^2\right) = \exp\left(-\frac{1}{2}(x - x_0)^T (T^\perp)^T T^\perp (x - x_0)\right) = \exp\left(-\frac{1}{2}(x - x_0)^T\Gamma^{-1/2}V_r^\perp (V_r^\perp)^T\Gamma^{-1/2}(x - x_0)\right),$$
which is exactly the DI prior (25). In other words, we have shown that the DI prior is a well-defined density on $z^\perp$. To see assertion (iii), we observe that
$$\pi_{prior}(x) = \pi_{prior}\left(V_r z + V_r^\perp z^\perp\right),$$
and hence
$$\pi_{prior}(x|z) = \frac{\pi_{prior}(x)}{\pi(z)} = \exp\left(-\frac{1}{2}(x - x_0)^T\Gamma^{-1/2}V_r^\perp (V_r^\perp)^T\Gamma^{-1/2}(x - x_0)\right),$$
which is again the DI prior (25), concluding the proof.
Remark 3. Note that the above decomposition of x into z and z⊥ , through the maps
T and T ⊥ , is still valid for infinite dimensional settings. However, z⊥ would be
distributed by an infinite dimensional Gaussian measure with identity covariance
operator, which is not a valid Gaussian measure. A more general understanding of
the DI prior is through disintegration. Indeed, under mild conditions on the map
$T$ and its push-forward measure of the prior measure, the DI prior $\pi_{prior}(x|z)$ is
nothing more than a disintegration of the prior measure via the map T , and this
view is also valid for infinite dimensional settings.
To quantify the uncertainty in the DI inverse solution (21), we can use the
covariance matrix of the DI posterior (10). For linear inverse problems with
Gaussian prior and Gaussian noise—the problems considered in this chapter—the
covariance matrix is exactly the inverse of the Hessian. For the rank-$r$ DI approach, the DI posterior covariance matrix $C_{DI}^{post}$ is given by (23), i.e.,
$$C_{DI}^{post} = \frac{1}{\alpha}\Gamma - \frac{1}{\alpha}\Gamma^{1/2}V_n d_{DI}^{n,r} V_n^T\Gamma^{1/2}. \qquad (27)$$
It is easy to see that the covariance matrix corresponding to the Tikhonov regularization is given by
$$C_{Tik}^{post} = \frac{1}{\alpha}\Gamma - \frac{1}{\alpha}\Gamma^{1/2}V_n d_{Tik}^{n,r} V_n^T\Gamma^{1/2}, \qquad (28)$$
where both $d_{DI}^{n,r}$ and $d_{Tik}^{n,r}$ are diagonal matrices given in Table 1. Note that we have used $\alpha$ as the magnitude of the regularization to study the robustness and accuracy of all methods. If not needed, $\alpha$ can be straightforwardly absorbed into $\Gamma$, and hence into $\sigma_i^2$; in that case $\alpha$ is simply replaced by 1 everywhere it appears (including in Table 1). As can be seen, $d_{Tik}^{n,r}(i,i)$ is always non-negative for all $i$, while $d_{DI}^{n,r}(i,i)$ is negative when $\sigma_i^2 < \alpha$ for $i \leq r$. That is, while the Tikhonov posterior uncertainty $C_{Tik}^{post}$ (the Bayesian posterior with a standard Gaussian prior) is always smaller than the prior uncertainty $\Gamma$, no matter how informative the data is, the DI posterior uncertainty could be higher than the prior counterpart
if the data supports this. In other words, standard (or typical) Gaussian priors do
not allow the data to increase the uncertainty and hence are prone to producing
overconfident results (see section “Applications to Imaging Problems”). The DI
prior, on the other hand, takes the parameter-to-observable map (the proxy to the
data) into account, and thus along parameter directions that are more data-informed,
i.e., $\sigma_i^2 \geq \alpha$, the posterior uncertainty is reduced relative to the prior uncertainty. Along parameter directions that are less data-informed, i.e., $\sigma_i^2 < \alpha$, the posterior
uncertainty increases relative to the prior uncertainty.
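As a concrete illustration of this comparison, the following NumPy sketch computes the diagonal (pixel-wise marginal variances) of the DI and Tikhonov posterior covariances (27) and (28) from the singular values and right singular vectors of $\Lambda^{-1/2}A\Gamma^{1/2}$. It is our own illustrative sketch, assuming a diagonal $\Gamma$ and assuming (consistently with the surrounding discussion) that the Tikhonov entries are $d_{Tik}(i,i) = \sigma_i^2/(\sigma_i^2 + \alpha)$; it is not the chapter's code or Table 1.

import numpy as np

def posterior_pixel_variances(Vn, s, r, alpha, Gamma_diag, method="DI"):
    # Diagonal of C^post = (1/alpha) Gamma - (1/alpha) G^{1/2} Vn d Vn^T G^{1/2},
    # with d = d_DI^{n,r} (cf. (23), (27)) or the assumed Tikhonov filter (cf. (28)).
    n = len(s)
    if method == "DI":
        d = np.where(np.arange(n) < r,
                     (s**2 - alpha) / s**2,    # data-informed modes (may be negative)
                     s**2 / (s**2 + alpha))    # data-uninformed modes
    else:  # Tikhonov (assumption: d_Tik = s^2 / (s^2 + alpha) for all modes)
        d = s**2 / (s**2 + alpha)
    correction = (Vn**2) @ d                   # diag(Vn diag(d) Vn^T)
    return (Gamma_diag - Gamma_diag * correction) / alpha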
Image Deblurring
One typical inverse problem in imaging is image deblurring. Given some blurry
image, we want to recover the true, sharp image. To understand the deblurring
process, we must first understand how an image becomes blurred in the first place.
The blurring process can be modeled as
$$A X_{true} = B, \qquad (29)$$
where A is the blurring (convolution) operator acting on the true image Xtrue ∈
Rm1 ×m2 resulting in the blurred image B ∈ Rm1 ×m2 . By stacking (or vectorizing)
the columns of Xtrue , we can write (29) as a linear algebraic equation. Let us denote
by xtrue the vectorized true image and by y the vectorized blurred image, i.e.,
$$A x_{true} = y, \qquad (30)$$
with $\Lambda = \lambda^2 I$ and $\Gamma = I$. In this setting, the regularized solutions discussed above can all be written in the spectral filtering form
$$x_{filt} = \sum_i \phi_i \frac{U_i^T y}{\sigma_i} V_i, \qquad (31)$$
where $\phi_i$ is usually called the filter factor, as it has the effect of filtering (damping) when $\phi_i$ is close to 0. It can be shown that the filter factor for the rank-$r$ TSVD is given by
$$\phi_i = \begin{cases}1, & i \leq r,\\ 0, & \text{otherwise},\end{cases}$$
while for Tikhonov regularization
$$\phi_i = \frac{\sigma_i^2}{\sigma_i^2 + \alpha}.$$
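A minimal NumPy sketch of spectral filtering with these filter factors is given below. The DI filter (no damping on the first r modes, Tikhonov-like damping elsewhere) is what the DI closed form reduces to when Γ = I and x₀ = 0; the function names and this packaging are our own illustrative choices, not the chapter's code.

import numpy as np

def filter_factors(s, r, alpha, method):
    # Filter factors phi_i for the spectral filtering form (31).
    if method == "TSVD":
        return (np.arange(len(s)) < r).astype(float)
    if method == "Tikhonov":
        return s**2 / (s**2 + alpha)
    if method == "DI":  # no damping on the first r (data-informed) modes
        return np.where(np.arange(len(s)) < r, 1.0, s**2 / (s**2 + alpha))
    raise ValueError(method)

def filtered_solution(A, y, phi):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ (phi * (U.T @ y) / s)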
Remark 4. It should be emphasized that the DI method also shares the same spectral decomposition form in this case because $\Gamma = I$ and $x_0 = 0$. When $\Gamma \neq I$, the singular vectors of $\Lambda^{-1/2}A$ do not necessarily diagonalize both $A$ and $\Gamma$ simultaneously. In other words, the filtered form (31) is not valid for the DI approach unless $U$ and $V$ are the singular vectors of $\Lambda^{-1/2}A\Gamma^{1/2}$ and $x_0 = 0$. When $x_0 \neq 0$, there is an additional contribution from $x_0$ that does not fit the filter-factor form.
We can see here again that (1) when $r \to 0$, DI approaches Tikhonov; (2) when $\alpha \ll \sigma_i$ for $i \leq r$, Tikhonov is close to DI; and (3) when $\alpha \to \infty$, DI
converges to TSVD. This can be clearly seen in Fig. 5a for a deblurring problem
in which we plot the relative error between the deblurred images and the original
ones for m1 = m2 = 128, λ = 0.01, r = 400, and a wide range of α.
For the under-regularization regime, i.e., α < 1, which should be avoided, the
regularization is not sufficient to suppress the oscillations due to the high-frequency
modes for both Tikhonov and DI methods, resulting in inaccurate reconstructions.
For reasonable-to-over-regularization regimes, i.e., $\alpha > 1$, DI is the best of the three methods, as it combines the advantages of both Tikhonov and TSVD.
That is: (1) DI behaves similar to Tikhonov for reasonable (but small) regularization
and outperforms Tikhonov in reasonable-to-over-regularization regimes; and (2)
compared to TSVD, DI is more accurate for reasonable regularization parameters as
it maintains the benefits of keeping useful information from all parameter directions
while avoiding potential errors caused by over-regularization. Consequently, the DI
Fig. 5 Deblurring results for $m_1 = m_2 = 128$, $\lambda = 0.01$, $r = 400$. (a) Relative error between deblurred images and the truth for a range of regularization parameters $\alpha \in [1, 10^4]$. (b) The DI deblurred image with $\alpha = 100$. (c) The DI deblurred image with $\alpha = 1000$. (d) The DI deblurred image with $\alpha = 5000$
error is the smallest of the three methods discussed for all α > 103 , and DI is robust
with respect to the regularization parameter.
In Fig. 5b are the deblurred images for α = 100 corresponding to the smallest
deblurring error for both DI and Tikhonov. As can be seen, the Tikhonov result is
similar to the DI one, while the truncated SVD result is blurry as it removes (putting
infinite regularization on) useful information in directions V i for i > r. Figure 5c, d
show the deblurred images for α = 1000 and α = 5000, respectively, corresponding
to cases where DI outperforms both Tikhonov and TSVD (see Fig. 5a). Indeed, the
DI deblurred image has higher quality.
Fig. 6 Deblurring results for $m_1 = m_2 = 128$, $\lambda = 0.05$, $r = 400$. (a) Relative error of DI and Tikhonov solutions with respect to the true solution for noise levels of 1% and 5% and $\alpha \in [1, 10^4]$. (b) The DI, Tikhonov, and TSVD deblurred images with $\alpha = 1000$
In order to see if the DI method is sensitive to noise, we now consider the case
with λ = 5% noise. Deblurring accuracy for this case (purple) is shown in Fig. 6a
together with the accuracy for the case of 1% noise (yellow). As can be seen, the
solution quality of the DI method does not degrade significantly due to the presence
of noise. Comparing this to the difference seen in the Tikhonov method (red and blue curves) as the noise level increases, we can see that the solution quality of the Tikhonov method degrades rapidly in the presence of noise. It can also be seen that
Tikhonov regularization becomes more sensitive to the choice of α as the noise
increases. Since the DI method regularizes only the data-uninformed directions,
which also contain much of the noise, increasing the noise level has little effect
on the solution quality.
For the rest of this section, we consider the more challenging cases with λ = 5%
noise. To make the problem even more challenging, we consider images with
missing pixels to simulate more interesting cases when images are damaged or
incomplete. Figure 7 shows the deblurring results using DI, TSVD, and Tikhonov
(Tik) regularizations for damaged images with m1 = m2 = 128, r = 400. The
first column contains four scenarios with 10% random data, 25% random data, 50%
random data, and 100% data, all with noise. Note that we plot the damaged images
by filling the missing data with 0. The second column contains the corresponding
TSVD deblurring results. The last four columns contain the results from DI and
Tikhonov with α = 10 and 20. As can be observed, all methods are able to deblur
and at the same time recover the true image quite well even with only 10% data. Both
DI and Tikhonov yield clearer images compared to TSVD. The Tikhonov results
are “darker,” especially with α = 20, indicating over-regularization, while the DI
images are insensitive to regularization parameter as the data-informed modes are
Fig. 7 Deblurring results using DI, TSVD, and Tikhonov (Tik) regularizations for damaged
images with m1 = m2 = 128, λ = 0.05, r = 400. The first column consists of four scenarios
with 10%, 25%, 50%, and 100% data. The second column is the corresponding TSVD deblurring
results. The last four columns are the results from DI and Tikhonov with α = 10 and 20
left untouched. Indeed, Fig. 8 clearly demonstrates these expected results for larger
regularization parameters (α = 50 and α = 100).
Recall that the goal of sections “A Statistical Data-Informed (DI) Inverse
Framework” and “Statistical Properties” is to gain insights into statistical properties
of the DI prior. For linear parameter-to-observable maps—which are the cases for
this chapter—with Gaussian observational noise, the posterior is also a Gaussian.
As a result, the result at the end of section “Statistical Properties” also allows
us to use the posterior covariances (27) and (28) to estimate the uncertainty in
the corresponding inverse solutions. Since the posterior for either Tikhonov or DI
prior is Gaussian, its diagonal contains the marginal pixel-wise variances, which
can be used as a measure of uncertainty for each pixel. Clearly this does not take into account the correlation among pixels, but it is a straightforward way to get a glimpse of the uncertainty in high-dimensional ($128^2$-dimensional) spaces. We now study the
uncertainty estimation in the solution of deblurring problems.
To begin, it is important to distinguish the following two cases:
Fig. 8 Deblurring results using DI, TSVD, and Tikhonov (Tik) regularizations for damaged
images with m1 = m2 = 128, λ = 0.05, r = 400. The first column consists of four scenarios
with 10%, 25%, 50%, and 100% data. The second column is the corresponding TSVD deblurring
results. The last four columns are the results from DI and Tikhonov with α = 50 and 100
• Case I: performing rank-$r$ DI regularization while computing the posterior covariance with the full rank-$n$ decomposition. The posterior covariance (27) thus involves the second and third columns in Table 1, and a rank-$n$ SVD (13) is needed.
• Case II: performing rank-r low-rank approximation of the posterior covariance
in addition to rank-r DI regularization. This amounts to using only the second
column of Table 1 for the DI posterior covariance in (27). This case is typically
more practical for large-scale problems as only a rank-r SVD (19) is needed.
In Fig. 9a are the minimum pixel-wise variances for four scenarios with 10% random
data, 25% random data, 50% random data, and 100% random data for Case II. As
can be seen, the uncertainty corresponding to the case of missing data is lower than
the uncertainty for full data case! We expect the opposite, that is, more available
(supposedly) informative data is expected to lead to lower uncertainty in the inverse
solution. The observation is twofold: first, care needs to be taken for Case II results
as rank-r approximation may not provide accurate uncertainty; second, for 10% data
case, when r > 500 the uncertainty is larger compared to the full data case. This
suggests that r needs to be sufficiently large for an accurate uncertainty estimation,
and this will be confirmed in the discussion below for Case I in which we use the
full rank (rank-$n$) decomposition (13). The criterion for estimating such a value of $r$ is a subject for future research. (At the time of writing this chapter, we have not yet found such a criterion.)
Fig. 9 Rank-r DI posterior pixel-wise uncertainty using rank-n SVD decomposition (Case I with
both the second and third columns of Table 1) and using rank-r SVD decomposition (Case II with
only the second column of Table 1)
We next discuss the results for Case I. Again, this requires a rank-n SVD (13),
where n is the rank of A, to compute (27) using Table 1. Figure 9b shows that
the minimum uncertainty for any missing data case is higher than the full data
case regardless of any value of r in rank-r DI regularization. As also expected,
the uncertainty scales inversely with the amount of available data, i.e., the more
informative data we have, the smaller the uncertainty in the inverse solution. Note
that the result and the conclusion for the largest pixel-wise variances are similar and
hence omitted here.
We now compare the DI and Tikhonov posterior uncertainty estimations. Since
Case I, though more expensive, provides more accurate uncertainty estimation, it
is used for computing DI posterior pixel-wise variances. To be fair, we also use
the full decomposition for Tikhonov regularization. In other words, the following
comparison is based on (27) and (28) and Table 1. As discussed above in Figs. 6a
and 7, α = 10 corresponds to a case in the region where DI and Tikhonov give
nearly the same reconstructions (in fact Tikhonov slightly over-regularizes), so let
us start with this case first. Figure 10 shows that the DI posterior has higher pixel-
wise variance than the Tikhonov posterior. This is consistent with the result and the
discussion of Table 1 and Fig. 7, that is, the Tikhonov posterior is not only over-
regularizing but also overconfident. For both methods, regions of higher uncertainty
are visually discernible where data is missing. In the case of 100% data, the result is
the same, namely, Tikhonov uncertainty estimation subjectively is less than the DI
uncertainty estimation. In this case, the uncertainty estimate is not very interesting:
both DI and Tikhonov have approximately uniform uncertainty everywhere as we
have data everywhere. We next consider the case with $\alpha = 1000$, where Tikhonov significantly over-regularizes (see Fig. 6b). Figure 11 shows that, while Tikhonov is again overconfident in this over-regularized regime, the DI posterior reports noticeably higher pixel-wise uncertainty.
Fig. 10 Visualization of pixel-wise variance estimates for the deblurring problem with λ = 0.05,
r = 400, and α = 10. In the left column are the noisy images with 10% data and 100% data. In the
second column are the Tikhonov uncertainty estimates for 10% data (top) and 100% data (bottom).
Likewise, the third column contains the DI uncertainty estimates for 10% data (top) and 100% data
(bottom)
Fig. 11 Visualization of pixel-wise variances for the deblurring problem with λ = 0.05, r = 400,
and α = 1000. In the left column are the noisy images with 10% data and 100% data. In the second
column are the Tikhonov uncertainty estimates for 10% data (top) and 100% data (bottom). The
third column contains the DI uncertainty estimates for 10% data (top) and 100% data (bottom)
Image Denoising
We can extend the idea of data-informed (DI) regularization to the image denoising
problem. Since noise typically resides in the high-frequency portion of the image,
denoising can be performed by applying spectral filtering techniques directly to the
noisy image. These noisy high-frequency modes are also the less informative modes
in the DI setting. Taking the SVD of the noisy image $X_{noisy}$, we have
$$X_{noisy} = U\Sigma V^T = \sum_i \sigma_i U_i V_i^T,$$
and the filtered (denoised) image is
$$X_{filt} = U\Sigma_{filt}V^T = \sum_i \phi_i\sigma_i U_i V_i^T,$$
where $\Sigma_{filt}$ is the diagonal matrix with $(\Sigma_{filt})_{ii} = \phi_i\sigma_i$. The filter factors $\phi_i$ are
the same as those defined for the deblurring case. For a numerical demonstration,
we pick a noisy image (Hansen et al. 2006) with 20% noise (see the top-left sub-
figure of Fig. 12a). Shown in Fig. 12a are denoised results using DI with r = 20
and $\alpha = 100$, TSVD with $r = 20$, and Tikhonov with $\alpha = 100$. Though the difference in the results is not clearly visible, DI has a smaller error compared with the other two methods. This can be verified in Fig. 12b, where the relative error between the denoised image and the true one is presented for a wide range of the “regularization parameter” $\alpha \in [10^{-2}, 10^4]$. Clearly, we would not choose $\alpha < 1$, as these values correspond to under-regularization. For $\alpha > 1$, DI is the best of the three methods, as it combines the advantages of both Tikhonov and TSVD. Indeed, the DI error is the smallest for all $\alpha > 1$, and DI is robust with respect to the regularization parameter.
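For illustration, the SVD-based denoising just described can be sketched in a few lines of Python. The DI-style filter (first r singular values kept, the rest damped) and the function name are our own illustrative choices under the stated setting, not the chapter's implementation.

import numpy as np

def denoise_image(X_noisy, r, alpha):
    # DI-style filtering of the noisy image's own singular values, cf. X_filt above:
    # keep the first r modes untouched, damp the rest by s^2 / (s^2 + alpha).
    U, s, Vt = np.linalg.svd(X_noisy, full_matrices=False)
    phi = np.where(np.arange(len(s)) < r, 1.0, s**2 / (s**2 + alpha))
    return (U * (phi * s)) @ Vt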
X-Ray Tomography
In the previous two examples, we have been able to implement spectral filtering
methods directly by introducing filter factors which effectively modified the singular
values to minimize the impact of noise on the inversion process. (Recall that the
DI method also shares the same spectral decomposition form in this case because
Fig. 12 Denoising with DI, Tikhonov, and TSVD methods. (a) The relative error between the
denoised image and the true one for a wide range of “regularization parameter”. The DI error is
smallest for all α > 1 (corresponding reasonable to over-regularization regimes). (b) Denoised
results using DI with r = 20 and α = 100, TSVD with r = 20, and Tikhonov with α = 100
$\Gamma = I$ and $x_0 = 0$.) Each method relied on computing a full factorization of $\Lambda^{-1/2}A$
and then applying filters. While this is an effective and straightforward method
to solve small-to-moderate inverse problems that helps provide insight into each
approach, it can be cumbersome or even computationally infeasible to compute full
factorizations for large-scale problems. It is not uncommon that inverse problems
arising in imaging applications can lead to very large matrix operators. Indeed, we
have seen even in the toy image deblurring problem in section “Image Deblurring”
that a matrix of size 16384 × 16384 is already significantly large, and we have employed more
sophisticated methods to compute the factorization of the convolution operator.
For many problems, however, such efficient factorizations may not exist, or it is
computationally prohibitive to compute a full factorization.
One way to overcome the challenge of factorizing large matrices is to solve
the optimality condition (20) iteratively. Since H is symmetric positive definite,
we choose the conjugate gradient (CG) method (see, e.g., Shewchuk 1994 and the
references therein) which requires only matrix-vector products, which in turn avoids
forming any matrices (including A or H ) completely. We consider two variants: (a)
using CG to solve for (20), that is, we still require rank-r approximation of the DI
regularization, and (b) using CG to solve for (18), that is, a rank-r approximation of
the DI regularization is not required. In this case we use a least-squares optimization method to compute the action of the pseudo-inverse $\left(\Lambda^{-1/2}A\Gamma A^T\Lambda^{-1/2}\right)^\dagger$ on a vector at each CG iteration.
The detailed computational procedure for the (a)-variant is given in Algorithm 1. Note that the viability of this method for large-scale problems relies on being able to compute the rank-$r$ decomposition (19) efficiently.
Fig. 13 DI reconstructions for various values of the regularization parameter α and the rank r.
Each row contains the results for each regularization parameter with different values of r. The
corresponding values for α and r can be found in the rows and columns of Table 2 along with the
relative error between the reconstructed image and the true phantom
for which Fig. 14 shows that both variants give similar reconstruction quality, and
the reconstruction from both variants is shown in Fig. 15. As can be seen, the result
from the b)-variant looks much clearer, which is expected in this case, as r = 200
is not sufficient to capture all the data-informed modes for the a)-variant. By using
Table 2 Comparison of the relative errors of the DI solution estimate for various regularization
parameters α and various values for r. The noise level here is λ = 1%
α        Relative error (%)
         r = 0 (Tik)   r = 10   r = 50   r = 100   r = 200   r = 400
1        33.52         33.52    33.52    33.52     33.52     33.52
10       31.73         31.73    31.73    31.73     31.73     31.73
100      24.44         24.45    24.45    24.45     24.45     24.45
1000     29.81         29.80    29.72    29.66     29.51     29.09
10^4     58.76         58.52    56.93    55.92     54.03     50.43
10^5     81.77         77.10    70.33    67.78     63.84     57.84
10^6     96.09         81.29    72.44    69.50     81.80     299.73
Fig. 14 A comparison between variant (b) (red curve) and variant (a) with r = 200 (blue curve).
Here, we compute the relative error of the reconstruction and the truth image for various values of
regularization parameter α
the pseudoinverse formulation, we can still get excellent results while avoiding the
computation of a large factorization.
Conclusions
Fig. 15 X-Ray tomography reconstruction with 1% noise and α = 100: (a) the result from the
a)-variant with r = 200 and (b) the result from the b)-variant
using the conjugate gradient method. For each CG iteration, compute the product of $F^T(FF^T)^\dagger F\Gamma^{-1/2}$ with any vector $x$ using the matrix-free Algorithm 3.

Algorithm 3 Compute the product of $F^T(FF^T)^\dagger F\Gamma^{-1/2}$ with any vector using optimization
Input: functions to compute $Fx$ and $F^Tx$, current estimate of $x$, prior covariance matrix $\Gamma$
1: Compute $b = F\Gamma^{-1/2}x$.
2: Using the conjugate gradient method, solve the linear equation $FF^Tz = b$.
3: Return $F^Tz$.
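A minimal matrix-free Python sketch of Algorithm 3 is given below. The operators F and F^T are passed as callables and a diagonal Γ and real-valued quantities are assumed (our assumptions, made only to keep the snippet self-contained); a small hand-written CG is used instead of a library solver.

import numpy as np

def cg(matvec, b, tol=1e-8, maxiter=500):
    # Plain conjugate gradient for the symmetric (semi)definite system matvec(z) = b.
    z = np.zeros_like(b)
    r = b - matvec(z)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        z += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return z

def algorithm3(F, Ft, x, Gamma_diag):
    # Apply F^T (F F^T)^+ F Gamma^{-1/2} to x without forming any matrices.
    b = F(x / np.sqrt(Gamma_diag))     # step 1: b = F Gamma^{-1/2} x
    z = cg(lambda v: F(Ft(v)), b)      # step 2: solve (F F^T) z = b by CG
    return Ft(z)                       # step 3: return F^T z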
In this chapter we have presented the data-informed (DI) regularization approach for inverse and imaging problems. The DI approach does not pollute the data-informed modes and regularizes only the less data-informed ones. As a direct consequence, the DI approach is at least as good as the Tikhonov method for any value of the regularization parameter, and it is
more accurate than the TSVD (for reasonable regularization parameters). Due to the blending of these two classical methods, DI is expected to be robust with respect to the regularization parameter, and this is verified numerically. We have shown that DI is a
regularization strategy. The DI approach has an interesting statistical interpretation,
that is, it transforms both the data distribution (i.e., the likelihood) and prior
distribution (induced by Tikhonov regularization) to the same Gaussian distribution
whose covariance matrix is diagonal, and the diagonal elements are exactly the
singular values of a composition of the prior covariance matrix, the forward map,
and the noise covariance matrix. In other words, DI finds the modes that are most
equally data-informed and prior-informed and leaves these modes untouched so that
the inverse solution receives the best possible (balanced) information from both
prior and the data. Furthermore, the DI approach takes the data uncertainty into
account and hence can avoid overconfident uncertainty estimation. To demonstrate
and to support our deterministic and statistical findings, we have presented various
results for popular computer vision and imaging problems including deblurring,
denoising, and X-ray tomography.
References
Antoulas, A.C.: Approximation of Large-Scale Systems. SIAM, Philadelphia (2005)
Babacan, S.D., Mancera, L., Molina, R., Katsaggelos, A.K.: Non-convex priors in Bayesian compressed sensing. In: 2009 17th European Signal Processing Conference, pp. 110–114 (2009)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image
denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419–2434 (2009)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic
or linear programs. SIAM J. Optim. 23, 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical
learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3,
1–122 (2010)
Chartrand, R., Wohlberg, B.: A nonconvex admm algorithm for group sparsity with sparse groups.
In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6009–
6013 (2013)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: 2008 IEEE
International Conference on Acoustics, Speech and Signal Processing, pp. 3869–3872 (2008)
Colton, D., Kress, R.: Integral Equation Methods in Scattering Theory. Wiley (1983)
Colton, D., Kress, R.: Inverse Acoustic and Electromagnetic Scattering, 2nd edn. Applied
Mathematical Sciences, Vol. 93. Springer, Berlin/Heidelberg/New-York/Tokyo (1998)
Franklin, J.N.: Well-posed stochastic extensions of ill–posed linear problems. J. Math. Anal. Appl.
31, 682–716 (1970)
Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imag. Sci. 2, 323–343 (2009)
Golub, G., Heath, M., Wahba, G.: Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215–223 (1979)
Gugercin, S., Antoulas, A.C.: A survey of model reduction by balanced truncation and some new
results. Int. J. Control. 77, 748–766 (2004)
Hansen, P.C.: Truncated singular value decomposition solutions to discrete ill-posed problems with
ill-determined numerical rank. SIAM J. Sci. Stat. Comput. 11, 503–518 (1990)
Hansen, P.C.: Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34, 561–580 (1992)
Hansen, P.C., Nagy, J.G., O’Leary, D.P.: Deblurring Images: Matrices, Spectra, and Filtering.
SIAM, Philadelphia (2006)
Hansen, P.C., O’Leary, D.P.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14, 1487–1503 (1993)
Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems, vol. 160 of Applied
Mathematical Sciences. Springer, New York (2005)
Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems, 2nd edn. Applied
Mathematical Sciences, Vol. 120. Springer, New-York (2011)
Lasanen, S.: Discretizations of generalized random variables with applications to inverse prob-
lems, Ph.D. thesis, University of Oulu (2002)
Lehtinen, M.S., Päivärinta, L., Somersalo, E.: Linear inverse problems for generalized random
variables. Inverse Prob. 5, 599–612 (1989)
Morozov, V.A.: On the solution of functional equations by the method of regularization. Soviet
Math. Dokl. (1966)
Mueller, J.L., Siltanen, S.: Linear and Nonlinear Inverse Problems with Practical Applications.
SIAM, Philadelphia (2012)
Nikolova, M.: Weakly constrained minimization: Application to the estimation of images and
signals involving constant regions. J. Math. Imaging Vision 21, 155–175 (2004)
Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex
regularized least-squares. Multiscale Model. Simul. 4, 960–991 (2005) (electronic)
Piiroinen, P.: Statistical measurements, experiments, and applications, Ph.D. thesis, Department of
Mathematics and Statistics, University of Helsinki (2005)
Ramirez-Giraldo, J., Trzasko, J., Leng, S., Yu, L., Manduca, A., McCollough, C.H.: Nonconvex prior image constrained compressed sensing (NCPICCS): Theory and simulations on perfusion CT. Med. Phys. 38, 2157–2167 (2011)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D 60, 259–268 (1992)
Shewchuk, J.R.: An introduction to the conjugate gradient method without the agonizing pain. Carnegie Mellon University (1994). https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
Stuart, A.M.: Inverse problems: A Bayesian perspective. Acta Numerica 19, 451–559 (2010).
https://doi.org/10.1017/S0962492910000061
Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM,
Philadelphia (2005)
Tikhonov, A.N., Arsenin, V.A.: Solution of Ill-posed Problems. Winston & Sons, Washington, DC
(1977)
Randomized Kaczmarz Method for Single
Particle X-Ray Image Phase Retrieval 36
Yin Xian, Haiguang Liu, Xuecheng Tai, and Yang Wang
Contents
Introduction 1274
The Phase Retrieval Problem 1274
Challenges of X-Ray Data Processing 1275
Phase Retrieval with Noisy or Incomplete Measurements 1276
Outline 1276
Background: Phase Retrieval and Stochastic Optimization 1277
Phase Retrieval 1277
Stochastic Optimization and the Kaczmarz Method 1278
Variance-Reduced Randomized Kaczmarz (VR-RK) Method 1279
Application: Robust Phase Retrieval of the Single-Particle X-Ray Images 1281
Synthetic Single-Particle Data Recovery Experiment 1281
Recovery Efficiency Under Constraints 1282
Results of the PR772 Dataset 1283
Conclusion 1284
Appendix 1284
References 1286
Y. Xian ()
TCL Research Hong Kong, Hong Kong, SAR, China
e-mail: [email protected]
H. Liu
Microsoft Research Asia, Beijing, China
e-mail: [email protected]
X. Tai
Hong Kong Center for Cerebro-cardiovascular Health Engineering (COCHE), Shatin, Hong
Kong, China
e-mail: [email protected]
Y. Wang
Hong Kong University of Science and Technology, Hong Kong, SAR, China
e-mail: [email protected]
Abstract
Keywords
Introduction
initialization. The follow-up works include the truncated Wirtinger flow (Chen and
Candès 2017), truncated amplitude flow (Wang et al. 2017), and reshaped Wirtinger
flow (Zhang and Liang 2016). These methods have lower computational complexity and have theoretical convergence guarantees.
The randomized Kaczmarz algorithm is introduced to solve the phase retrieval
problem by Wei (2015). The randomized Kaczmarz method can be viewed as a
special case of the stochastic gradient descent (SGD) (Needell et al. 2014). For
the phase retrieval problem, the method is essentially SGD for the amplitude flow
objective. It was shown numerically that the method outperforms the Wirtinger flow
and the ER method (Wei 2015). The convergence rate of the randomized Kaczmarz
method for the linear system is studied in the paper of Strohmer and Vershynin
(2009). The theoretical justification of using randomized Kaczmarz method for
phase retrieval has been presented in the paper of Tan and Vershynin (2019).
In this chapter, we investigate phase retrieval algorithms for XFEL data. The baseline requirement for a good phase retrieval algorithm is its robustness against noise and incomplete information (Shi et al. 2019).
The number of photons detected by the optical sensor follows a Poisson distribution. For phase retrieval problems contaminated by Poisson noise, or with incomplete magnitude information, prior information is crucial for processing the data. Work on imposing prior information in image processing can be found in the literature (Le et al. 2007; Zhang et al. 2012; Hunt et al. 2018).
In order to better reconstruct the data, one can consider a variational model by introducing a total variation (TV) regularization, which is widely used in the image processing community. TV regularization enables the recovery of signals from incomplete or limited measurements. The alternating direction method of multipliers (ADMM) (Glowinski and Le Tallec 1989; Wu and Tai 2010) and the split Bregman method (Goldstein and Osher 2009) are usually applied to solve the TV-regularized problem. They have been applied to the phase retrieval problem (Chang et al. 2016, 2018; Bostan et al. 2014; Li et al. 2016).
Besides TV regularization, Tikhonov regularization is another important smoothing technique in variational image denoising. It is often applied for noise removal. The phase retrieval problem with a Tikhonov regularization has been solved by the Gauss-Newton method (Seifert et al. 2006; Sixou et al. 2013; Langemann and Tasche 2008; Ramos et al. 2019). Considering sparsity constraints, the fixed point iterative approach (Fornasier and Rauhut 2008; Tropp 2006; Ma et al. 2018) has been applied to the problem with nonlinear joint sparsity regularization.
Outline
Phase Retrieval
$$\min_x \sum_{k=1}^{m}\left(y_k - |\langle a_k, x\rangle|^2\right)^2, \qquad (1)$$
where $y$ is the measurement, $x$ is the signal that needs to be recovered, and $a_k$ is the measurement operating vector. In the setting of forward X-ray scattering imaging at the far field, $a_k$ is a Fourier vector, and $y$ is a diffraction pattern of the target. The difficulty in phase retrieval is the limitation of optical sensors, which measure only the intensity.
The loss function of Eq. (1) is expressed as the squared difference between measured intensities and modelled intensities. It is a system of quadratic equations, and therefore a non-convex problem.
To solve Eq. (1), the alternate projection methods are often used, such as
HIO, ER, and RAAR methods as mentioned previously. These algorithms can be
expressed in the form of fixed-point equation. They can be implemented jointly to
better avoid local minima.
When the loss function is expressed as the squared loss of amplitudes, the
formulation can be written as:
$$\min_x \sum_{k=1}^{m}\left(\sqrt{y_k} - |\langle a_k, x\rangle|\right)^2. \qquad (2)$$
To solve Eq. (2), it is possible to apply the amplitude flow algorithm (Wang et al.
2017), which is essentially a gradient descent algorithm that can converge under
good initialization.
For the finite-sum optimization problem
$$\min_x \frac{1}{m}\sum_{k=1}^{m} f_k(x), \qquad (3)$$
the gradient descent updating rule is $x_{k+1} = x_k - \frac{t_k}{m}\sum_{k=1}^{m}\nabla f_k(x_k)$, where $t_k$ is the step size at each iteration and $m$ is the number of samples, or the number of measurements in the phase retrieval setting. Gradient descent is expensive, as it requires the evaluation of $m$ derivatives at each iteration. To reduce the computational cost, stochastic gradient descent (SGD) uses a single randomly chosen gradient per iteration; building on this, the stochastic variance reduced gradient (SVRG) method uses the update
$$x_{k+1} = x_k - \eta\left(\nabla f_{i_k}(x_k) - \nabla f_{i_k}(\bar{x}) + \frac{1}{m}\sum_{i=1}^{m}\nabla f_i(\bar{x})\right), \qquad (5)$$
where $\eta$ is the step size and $\bar{x}$ is a snapshot value in each epoch (Johnson and Zhang 2013).
The Kaczmarz method is a well-known iterative method for solving a system of
linear equations Ax = b, where A ∈ Rm×n , x ∈ Rn , and b ∈ Rm . The classical
Kaczmarz method sweeps through the rows in A in a cyclic manner and projects
the current estimate onto a hyperplane associated with the row of A to get the
new estimate. The randomized Kaczmarz method randomly chooses the row for
projection in each iteration:
bik − aik , xk
xk+1 = xk + aik (6)
||aik ||22
where aik is the row of A. The randomized Kaczmarz can be viewed as a reweighted
SGD with importance sampling for the least squares problem (Needell et al. 2014):
1 T
m
1
F (x) = ||Ax − b||22 = (ai x − bi )2 . (7)
2 2
i=1
For phase retrieval, the corresponding amplitude-based objective is
$$\min_x \sum_{k=1}^{m}\left(b_k - |\langle a_k, x\rangle|\right)^2. \qquad (8)$$
The update scheme of randomized Kaczmarz for the phase retrieval objective of Eq. (8), according to the paper of Tan and Vershynin (2019), is
$$x_{k+1} = x_k + \frac{\operatorname{sign}\left(\langle a_{i_k}, x_k\rangle\right)b_{i_k} - \langle a_{i_k}, x_k\rangle}{\|a_{i_k}\|_2^2}\, a_{i_k}, \qquad (9)$$
where $i_k$ is drawn independently and identically distributed (i.i.d.) from the index set $\{1, 2, \cdots, m\}$ with the probability
$$g_k = \frac{\|a_{i_k}\|^2}{\|A\|_F^2}. \qquad (10)$$
The VR-RK method is inspired by the randomized Kaczmarz method and the SVRG method. It was originally proposed to solve linear systems of equations (Jiao et al. 2017). Let $f_i(x) = \frac{1}{2}\left(|a_i^T x| - b_i\right)^2$, and let
$$h_i(x) = \frac{f_i(x)}{g_i} = \frac{1}{2}\frac{\|A\|_F^2}{\|a_i\|^2}\left(|a_i^T x| - b_i\right)^2; \qquad (11)$$
then
$$\nabla h_i(x) = \frac{\|A\|_F^2}{\|a_i\|^2}\left(a_i^T x - \operatorname{sign}(a_i^T x)\, b_i\right)a_i. \qquad (12)$$
(In Algorithm 1, at the start of each epoch one sets $\bar{x} = x_k$ and $\bar{\mu} = \mu(\bar{x})$, where $\bar{\mu} = \frac{1}{m}\sum_{i=1}^{m}\nabla h_i(\bar{x})$.)
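A compact Python sketch of a VR-RK-style iteration built from (11)-(12) with an SVRG-type correction is given below. This is our own illustrative reading of the method under a real-valued formulation; the epoch length, step size, and random initialization are assumptions, and the snippet is not the chapter's Algorithm 1.

import numpy as np

def vr_rk_phase_retrieval(A, b, n_epochs=50, epoch_len=None, eta=1.0, seed=0):
    # Variance-reduced randomized Kaczmarz sketch for min_x sum_k (b_k - |a_k^T x|)^2.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    epoch_len = epoch_len or m
    row_norms2 = np.sum(A**2, axis=1)
    probs = row_norms2 / row_norms2.sum()
    frob2 = row_norms2.sum()                       # ||A||_F^2

    def grad_h(x, i):                              # cf. (12)
        r = A[i] @ x
        return (frob2 / row_norms2[i]) * (r - np.sign(r) * b[i]) * A[i]

    x = rng.standard_normal(n)                     # assumption: random initialization
    for _ in range(n_epochs):
        x_snap = x.copy()
        resid = A @ x_snap - np.sign(A @ x_snap) * b
        mu_bar = A.T @ (resid * (frob2 / row_norms2)) / m   # full-gradient snapshot
        for _ in range(epoch_len):
            i = rng.choice(m, p=probs)
            x -= eta * (grad_h(x, i) - grad_h(x_snap, i) + mu_bar)
    return x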
When an L2 (Tikhonov-type) regularization is added, the objective becomes:
$$\min_x \frac{1}{2}\sum_{k=1}^{m}\left(b_k - |\langle a_k, x\rangle|\right)^2 + \gamma\|x\|^2. \qquad (13)$$
Applying the randomized Kaczmarz method, according to Hefny et al. (2017), the updating process becomes:
$$x_{k+1} = x_k - \left(\nabla c_{i_k}(x_k) - \nabla c_{i_k}(\bar{x}) + \frac{1}{m}\sum_{i=1}^{m}\nabla c_i(\bar{x})\right). \qquad (16)$$
For the consideration of sparsity, an L1 constraint can be imposed instead of the L2 constraint; the objective function then becomes:
$$\min_x \frac{1}{2}\sum_{k=1}^{m}\left(b_k - |\langle a_k, x\rangle|\right)^2 + \lambda\|x\|_1. \qquad (17)$$
This leads to subproblems of the form
$$\min_x C\|x - d\|_2^2 + \lambda\|x\|_1, \qquad (18)$$
where $C$ is a constant with $C \geq \rho_{max}(A^H A)$, $\rho_{max}$ denotes the largest eigenvalue of a matrix, and $d$ is the constant vector defined as
$$d := x_k - \frac{1}{C}A^H\left(Ax_k - b\odot e^{j\angle(Ax_k)}\right). \qquad (19)$$
Above, $\odot$ denotes the element-wise Hadamard product of two vectors, and $\angle$ is the phase angle. The closed-form solution for $x$ is:
$$x^* = e^{j\angle(d)}\odot\max\left(|d| - \frac{\lambda}{2C}\mathbf{1},\, 0\right).$$
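The closed-form solution above is a complex soft-thresholding step. A small Python sketch of one update of this kind, i.e., step (19) followed by the thresholding, is shown below; it is illustrative only, and C, lambda, and the operator A are assumed to be given.

import numpy as np

def l1_prox_update(A, b, x_k, C, lam):
    # One proximal step: the update (19) followed by complex soft-thresholding.
    Ax = A @ x_k
    phase = np.exp(1j * np.angle(Ax))
    d = x_k - (A.conj().T @ (Ax - b * phase)) / C
    return np.exp(1j * np.angle(d)) * np.maximum(np.abs(d) - lam / (2 * C), 0.0)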
The first experiment is to test the reconstruction efficiency of the virus data, as
shown in Fig. 1. The image size of Fig. 1a is 755 × 755 pixels, and the pixel values
are normalized to [0,1]. The diffraction pattern (Fig. 1b) is created by taking the
Fourier transform of Fig. 1a. In this experiment, X-ray scattering signals are mainly
observed at low resolutions, corresponding to low frequencies in Fourier space. A
gap is placed in the center of the diffraction pattern to allow the incident beam to
pass through, to avoid damaging or saturating detector sensors. The gap results in an
information loss in the low-frequency regime, as shown in Fig. 1c. The low-frequency information corresponds to the overall shape of the object; without it, reconstruction becomes challenging.
We reconstruct the sample virus image from the diffraction pattern with detector
gap in Fig. 1c. The VR-RK, randomized Kaczmarz, HIO, and RAAR methods are
tested in the MATLAB platform. In order to reconstruct the data, a reference signal
is used as a priori for preprocessing as described in the paper of Barmherzig et al.
(2019), and the numerical iteration is then performed. Comparison of convergence
rates and the relative square errors is shown in Fig. 2 and Table 1. The relative
square error is defined by: ||x − x̂||2 /||x||2 , where x is the ground truth image and
x̂ is the reconstructed image. The experiment shows that the VR-RK algorithm has
a faster convergence rate and a better reconstruction accuracy compared with the
randomized Kaczmarz algorithm and the iterative projection algorithms.
Fig. 1 Virus sample particle and its diffraction patterns (Li 2016). (a) Virus particle 2D projection
imaging in real space. (b) Simulated X-ray data. (c) The simulated data with a gap; the gap size is 409 pixels
Fig. 2 Comparison of
convergence rate
To further illustrate the convergence rate, we compare the VR-RK algorithm and the randomized Kaczmarz algorithm under L1 and L2 constraints when reconstructing the virus sample data. The change of the cost function per iteration is shown in Fig. 3. From the figure, the loss function decays faster for VR-RK than for the randomized Kaczmarz method.
Considering that single-particle X-ray imaging data are influenced by Poisson noise, we examine the reconstruction accuracy at various noise levels, with the noise level ε ranging from 0.005 to 0.1 and the noisy measurement modeled as y = |Ax|²(1 + ε). Table 2 shows the relative square error of reconstruction using different phase retrieval algorithms at these noise levels. From Table 2, we can see that the VR-RK method outperforms the other algorithms under noise.
Fig. 3 Comparison of convergence rate. (a) Under L1 constraint. (b) Under L2 constraint
We test the VR-RK algorithm on the PR772 particle dataset (Reddy et al. 2017).
The image size is 256 × 256 pixels, and the pixel values are scaled to the range of
[0, 255]. Illustrations of the diffraction patterns of the single-particle data are shown in Fig. 4a and e.
For this dataset, the shrinkwrap method is applied to obtain a tight object
support (Shi et al. 2019; Marchesini et al. 2003), and the square root of the
diffraction intensities is used as a reference for the missing pixels during numerical
iteration. A recovery example is shown in Fig. 4, and more recovery examples are
presented in the supplementary materials.
We use the VR-RK algorithm, together with the RAAR and HIO methods, to recover the data and then classify the scattering patterns into single-particle and non-single-particle categories. There are 497 labeled samples in the validation set (Shi et al. 2019); among them, 208 are single-particle samples and 289 are non-single-particle samples. We use ISOMAP for data compression and clustering, KNN for classification, and fourfold cross-validation. The VR-RK gives the best result. The AUCs of the binary classification results are listed in Table 3.
From the results, we can see that the VR-RK method can help recover the data and improve the classification rate.
Fig. 4 PR772 single-particle scattering pattern phase retrieval. (a) and (e) are two single-particle
diffraction patterns; (b) and (f) are the recovered diffraction patterns of (a) and (e), respectively; (c)
and (g) show the comparison of the original and the recovered diffraction patterns, where the left half is the original and the right half is the recovered; (d) and (h) are the real-space images reconstructed using the VR-RK algorithm from (a) and (e)
Conclusion
Appendix
For the PR772 dataset, further examples of phase retrieval recovery are shown here. Figure 5a and b show examples of 25 diffraction pattern sample reconstructions. Figure 5c shows the corresponding real-space recovered images.
Figures 6 and 7 show reconstructions of 100 diffraction pattern samples. Figure 8 shows the corresponding real-space recovered images.
Fig. 5 Phase retrieval of the PR772 dataset. (a) Original data diffraction pattern illustrations. (b)
Recovered image diffraction pattern illustrations. (c) Recovered real-space data illustrations
References
Bahmani, S., Romberg, J.: Phase retrieval meets statistical learning theory: a flexible convex
relaxation. In: Artificial Intelligence and Statistics, pp. 252–260. PMLR (2017)
Barmherzig, D., Sun, J., Li, P., Lane, T.J., Candès, E.: Holographic phase retrieval and reference
design. Inverse Probl. 35(9), 094001 (2019)
Bauschke, H., Combettes, P., Luke, R.: Hybrid projection–reflection method for phase retrieval.
JOSA A 20(6), 1025–1034 (2003)
Bernstein, F., Koetzle, T., Williams, G., Meyer Jr, E., Brice, M., Rodgers, J., Kennard, O.,
Shimanouchi, T., Tasumi, M.: The protein data bank: a computer-based archival file for
macromolecular structures. J. Mol. Biol. 112(3), 535–542 (1977)
Bostan, E., Froustey, E., Rappaz, B., Shaffer, E., Sage, D., Unser, M.: Phase retrieval by using
transport-of-intensity equation and differential interference contrast microscopy. In: 2014 IEEE
International Conference on Image Processing (ICIP), pp. 3939–3943. IEEE (2014)
Candès, E., Strohmer, T., Voroninski, V.: Phaselift: exact and stable signal recovery from
magnitude measurements via convex programming. Commun. Pure Appl. Math. 66(8), 1241–
1274 (2013)
Candès, E., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms.
IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)
Chang, H., Lou, Y., Ng, M., Zeng, T.: Phase retrieval from incomplete magnitude information via
total variation regularization. SIAM J. Sci. Comput. 38(6), A3672–A3695 (2016)
Chang, H., Lou, Y., Duan, Y., Marchesini, S.: Total variation–based phase retrieval for poisson
noise removal. SIAM J. Imag. Sci. 11(1), 24–55 (2018)
Chen, Y., Candès, E.: Solving random quadratic systems of equations is nearly as easy as solving
linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017)
Fienup, J., Wackerman, C.: Phase-retrieval stagnation problems and solutions. JOSA A 3(11),
1897–1907 (1986)
Fornasier, M., Rauhut, H.: Recovery algorithms for vector-valued data with joint sparsity
constraints. SIAM J. Numer. Anal. 46(2), 577–613 (2008)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear
Mechanics. SIAM, Philadelphia (1989)
Goldstein, T., Osher, S.: The split bregman method for l1-regularized problems. SIAM J. Imag.
Sci. 2(2), 323–343 (2009)
Goldstein, T., Studer, C.: Convex phase retrieval without lifting via phasemax. In: International
Conference on Machine Learning, pp. 1273–1281. PMLR (2017)
Gu, H., Xian, Y., Unarta, I., Yao, Y.: Generative adversarial networks for robust Cryo-EM image
denoising. arXiv preprint arXiv:2008.07307 (2020)
Hefny, A., Needell, D., Ramdas, A.: Rows versus Columns: Randomized Kaczmarz or Gauss–
Seidel for Ridge Regression. SIAM J. Sci. Comput. 39(5), S528–S542 (2017)
Hunt, X., Reynaud-Bouret, P., Rivoirard, V., Sansonnet, L., Willett, R.: A data-dependent weighted
LASSO under poisson noise. IEEE Trans. Inf. Theory 65(3), 1589–1613 (2018)
Jiao, Y., Jin, B., Lu, X.: Preasymptotic convergence of randomized Kaczmarz method. Inverse
Probl. 33(12), 125012 (2017)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance
reduction. Adv. Neural Inf. Process. Syst. 26, 315–323 (2013)
Langemann, D., Tasche, M.: Phase reconstruction by a multilevel iteratively regularized gauss–
newton method. Inverse Probl. 24(3), 035006 (2008)
Le, T., Chartrand, R., Asaki, T.: A variational approach to reconstructing images corrupted by
poisson noise. J. Math. Imag. Vis. 27(3), 257–263 (2007)
Li, F., Abascal, J., Desco, M., Soleimani, M.: Total variation regularization with split Bregman-
based method in magnetic induction tomography using experimental data. IEEE Sens. J. 17(4),
976–985 (2016)
Li, P.: EE368 project: phase processing with a priori. http://github.com/leeneil/adm (2016)
Liu, H., Spence, J.: XFEL data analysis for structural biology. Quant. Biol. 4(3), 159–176 (2016)
Luke, R.: Relaxed averaged alternating reflections for diffraction imaging. Inverse Probl. 21(1),
37 (2004)
Ma, C., Wang, K., Chi, Y., Chen, Y.: Implicit regularization in nonconvex statistical estimation:
Gradient descent converges linearly for phase retrieval and matrix completion. In: International
Conference on Machine Learning, pp. 3345–3354. PMLR (2018)
Marchesini, S.: Invited article: a unified evaluation of iterative projection algorithms for phase
retrieval. Rev. Sci. Instrum. 78(1), 011301 (2007)
Marchesini, S., He, H., Chapman, H., Hau-Riege, S., Noy, A., Howells, M., Weierstall, U., Spence,
J.: X-ray image reconstruction from a diffraction pattern alone. Phys. Rev. B 68(14), 140101
(2003)
Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the
randomized Kaczmarz algorithm. Adv. Neural Inf. Process. Syst. 27, 1017–1025 (2014)
Qiu, T., Palomar, D.: Undersampled sparse phase retrieval via majorization–minimization. IEEE
Trans. Sig. Process. 65(22), 5957–5969 (2017)
Ramos, T., Grønager, B., Andersen, M., Andreasen, J.: Direct three-dimensional tomographic
reconstruction and phase retrieval of far-field coherent diffraction patterns. Phys. Rev. A 99(2),
023801 (2019)
Reddy, H., Yoon, C., Aquila, A., Awel, S., Ayyer, K., Barty, A., Berntsen, P., Bielecki, J., Bobkov,
S., Bucher, M.: Coherent soft X-ray diffraction imaging of coliphage PR772 at the Linac
coherent light source. Sci. Data 4(1), 1–9 (2017)
Scheres, S.: RELION: implementation of a bayesian approach to cryo-em structure determination.
J. Struct. Biol. 180(3), 519–530 (2012)
Seifert, B., Stolz, H., Donatelli, M., Langemann, D., Tasche, M.: Multilevel Gauss–Newton
methods for phase retrieval problems. J. Phys. A: Math. General 39(16), 4191 (2006)
Shi, Y., Yin, K., Tai, X., DeMirci, H., Hosseinizadeh, A., Hogue, B., Li, H., Ourmazd, A.,
Schwander, P., Vartanyants, I.: Evaluation of the performance of classification algorithms for
XFEL single-particle imaging data. IUCrJ 6(2), 331–340 (2019)
Sixou, B., Davidoiu, V., Langer, M., Peyrin, F.: Absorption and phase retrieval with Tikhonov and
joint sparsity regularizations. Inverse Probl. Imag. 7(1), 267 (2013)
Sorzano, C., Marabini, R., Velázquez-Muriel, J., Bilbao-Castro, J., Scheres, S., Carazo, J., Pascual-
Montano, A.: XMIPP: a new generation of an open-source image processing package for
electron microscopy. J. Struct. Biol. 148(2), 194–204 (2004)
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J.
Fourier Anal. Appl. 15(2), 262–278 (2009)
Tan, Y., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees. Inf. Infer.
J. IMA 8(1), 97–123 (2019)
Tropp, J.: Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Sig.
Process. 86(3), 589–602 (2006)
Wang, G., Giannakis, G., Eldar, Y.: Solving systems of random quadratic equations via truncated
amplitude flow. IEEE Trans. Inf. Theory 64(2), 773–794 (2017)
Wang, H., Wang, J.: How cryo-electron microscopy and X-ray crystallography complement each
other. Protein Sci. 26(1), 32–39 (2017)
Wei, K.: Solving systems of phaseless equations via kaczmarz methods: A proof of concept study.
Inverse Probl. 31(12), 125008 (2015)
Wu, C., Tai, X.: Augmented lagrangian method, dual methods, and split bregman iteration for
ROF, vectorial TV, and high order models. SIAM J. Imag. Sci. 3(3), 300–339 (2010)
Xian, Y., Gu, H., Wang, W., Huang, X., Yao, Y., Wang, Y., Cai, J.: Data-driven tight frame for Cryo-
EM image denoising and conformational classification. In: 2018 IEEE Global Conference on
Signal and Information Processing (GlobalSIP), pp. 544–548. IEEE (2018)
Zhang, H., Liang, Y.: Reshaped wirtinger flow for solving quadratic system of equations. Adv.
Neural Inf. Process. Syst. 29, 2622–2630 (2016)
Zhang, X., Lu, Y., Chan, T.: A novel sparsity reconstruction method from poisson data for 3d
bioluminescence tomography. J. Sci. Comput. 50(3), 519–535 (2012)
A Survey on Deep Learning-Based
Diffeomorphic Mapping 37
Huilin Yang, Junyan Lyu, Roger Tam, and Xiaoying Tang
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
Diffeomorphic Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
Problem Statement and Framework Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1293
Deep Learning-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1295
Related Deep Network Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296
Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1298
Fully Convolutional Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1300
U-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1300
Autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
Huilin Yang and Junyan Lyu contributed equally with all other contributors.
H. Yang
Department of Electronic and Electrical Engineering, Southern University of Science and
Technology, Shenzhen, Guangdong, China
School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
e-mail: [email protected]
J. Lyu
Department of Electronic and Electrical Engineering, Southern University of Science and
Technology, Shenzhen, Guangdong, China
Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
e-mail: [email protected]
R. Tam
School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
e-mail: [email protected]
X. Tang ()
Department of Electronic and Electrical Engineering, Southern University of Science and
Technology, Shenzhen, Guangdong, China
e-mail: [email protected]
Abstract
Keywords
Introduction
Diffeomorphic Mapping
and shapes in all subsequent contexts. After transformation, it is supposed that the
deformed moving object φ1 · m should be very close to the fixed object f . vt is
the velocity of the transformation with respect to time t. We refer to the method as a dynamic velocity field method when the velocity of the mapping varies across time and as a stationary velocity field method when the velocity of the mapping stays constant during the transformation. When t = 0, the registration field is the identity, such that φ0 · m = m,
and the optimal registration field φ1 is obtained at t = 1. γ is a weight ranging from
0 to 1, serving as a trade-off coefficient between the regularization term and the
overall discrepancy term. Increasing γ imposes more weight on the registration field
enforcing a smoother transformation, whereas decreasing γ puts more attention on
the discrepancy term making the deformed moving object closer to the fixed object.
It should be noted that, in traditional methods, we obtain only one optimal registration field, at time t = 1, after optimizing the objective function for a single pair of moving and fixed objects. In the top panel of Fig. 3, we present the flowchart of the typical optimization scheme of traditional methods. A variety of methods fall into this category, including LDDMM (Beg et al. 2005; Vaillant et al. 2007; Glaunes et al. 2008) and SVF (Modat et al. 2012). During the past decade, they have been extensively and successfully applied to various biomedical applications (Tang et al. 2019; Jiang et al. 2018; Yang et al. 2015, 2017a; Bossa et al. 2010). Nevertheless, since these methods all rely on traditional optimization schemes and biomedical data are usually large, especially 3D data such as MRI and CT, such methods often take up to several hours to process one pair of objects of interest. In order
Fig. 3 An overview of traditional and deep learning-based registration methods. The top panel
shows the flowchart of traditional methods, and the bottom panel shows the flowcharts of two
types of deep learning-based methods (unsupervised ones and supervised ones)
to address this problem, recent research has started to focus on learning the registration field through deep learning. Thanks to the powerful representational capacity and GPU computational efficiency of deep neural networks, the registration time has been greatly reduced, and a trained network is capable of predicting not only one but multiple registration fields. Once training is finished, even a pair of large-size objects can be processed within several seconds. According to the learning style, deep learning-based diffeomorphic mapping can be categorized into two major classes, namely, unsupervised methods and supervised methods.
Deep learning-based methods can be divided into unsupervised ones and supervised
ones according to whether they require labels from traditional methods. A brief
summary of key information for deep learning-based methods is illustrated in Fig. 4.
Details will be described in the following subsections.
Unsupervised Methods
Unsupervised methods train a deep neural network without any registration-field information obtained from traditional methods. They usually directly minimize the discrepancy between the deformed
moving object and the fixed object together with regularization on the registration
field. In the upper part of the bottom panel in Fig. 3, we show the flowchart of
unsupervised methods, which takes as inputs a fixed object and a moving object and
then feeds them into a deep neural network to predict the corresponding registration
field. The subsequent spatial transform module takes the predicted registration field
and the moving object as inputs to perform diffeomorphic deformation yielding
the deformed moving object (also called moved object). Some methods only need
to input fixed objects, and the moving object (in dashed box in Fig. 3) can be
estimated during training. The whole training phase makes use of only two kinds of information, namely, the fixed objects and, for some methods, the corresponding moving objects. Regularization is imposed on the registration field through the loss
term Lsmooth (φ1 ), and measurement of similarity is conducted through minimizing
the discrepancy loss Ldis (m · φ1 , f ). Unsupervised methods mimic the traditional
optimization scheme except that they aim at predicting diffeomorphisms between
a set of template-and-target pairs instead of between only one pair of template and
target within one single training course.
Supervised Methods
Compared with unsupervised methods, supervised methods usually take three kinds
of information as inputs including the fixed objects, the moving objects, and
parameterizations of the corresponding registration fields such as the velocity or
momentum acquired from performing traditional methods. This means that we first
need to conduct traditional diffeomorphic registrations on all pairs of moving-and-fixed objects to obtain the corresponding registration fields, denoted φ1^sup, as ground truth. We then use them together with all moving-and-fixed pairs of objects
as training materials. The lower part of the bottom panel in Fig. 3 shows the training flow of supervised methods. The loss function Ldis(φ1, φ1^sup) minimizes the discrepancy between the predicted registration field φ1 and the pre-obtained “ground truth” registration field φ1^sup. Supervised methods assume the registration
fields obtained through the traditional optimization scheme are optimal and try to
make the learning-based predictions as close to them as possible.
The remainder of this chapter is organized as follows. Related deep network
introduction and summary will be described in section “Related Deep Network
Introduction”. Unsupervised methods will be reviewed in section “Unsupervised
Methods”, followed by a survey of supervised methods in section “Supervised
Methods”. We will also cover current achievements and related applications in sec-
tion “Discussion and Future Direction”. After reviewing existing works, potential
emerging topics and future directions will be elaborated. Finally, the conclusions of this survey will be presented in section “Conclusions”.
Table 1 (continued)
Han et al. (2020) | CNN | SVF | NLCC | Differential operator on v | MRI | Brain with brain tumors
Shen et al. (2019a) | U-Net | SVF | NLCC | Differential operator on v | MRI | Knee | Symmetric loss
Louis et al. (2019) | RNN | Dynamic | SSD | Gaussian smoothing layer on v | MRI | Brain | Force small variance on latent space
Niethammer et al. (2019) | CNN | SVF | NLCC | OMT on multi-Gaussian kernel weights | MRI | Brain | OMT on local deformation
Krebs et al. (2021) | CVAE, TCN | SVF | SSD | Gaussian smoothing layer on v | MRI | Cardiac | Regularity on both spatial and temporal domains
Mok and Chung (2020b) | CNN | SVF | NLCC | ∇² on v | MRI | Brain | Coarse-to-fine; Laplacian pyramid framework
Shen et al. (2019b) | CNN | RDMM | Multi-kernel NLCC | ∇² on v; OMT regularity | CT | Lung; knee | Multi-kernel NLCC similarity metric
Hoffmann et al. (2020) | U-Net | SVF | Soft Dice | ∇ on d | MRI | Brain | Train purely with synthetic data
Amor et al. (2021) | ResNet | LDDMM | CD; EMD | LDDMM regularity on v | Mesh | Cortex; heart; liver; femur; hand | Train on one pair of data
methods and Table 2 for supervised methods. In this section, we will introduce in
detail several main types of deep neural network (DNN) architectures that have been
adopted in existing diffeomorphic mapping approaches.
kernels operating on local input regions to extract features. A pooling layer performs
linear or nonlinear downsampling on feature maps to reduce spatial resolution and
summarize local information. An activation layer can be the sigmoid function, the
hyperbolic tangent function (Tanh), or the rectified linear unit (ReLU) (Sibi et al.
2013), introducing nonlinearity to CNNs. A fully connected layer is the same as a multi-layer perceptron, providing the final classification or regression predictions. By
stacking these layers hierarchically, CNNs gain large receptive fields and thus can
exploit and capture scale-invariant and translation-invariant features. Several novel
CNN architectures including VGGNet (Simonyan and Zisserman 2014), Inception
Fig. 5 A typical hierarchy of convolutional neural networks. (Taken from Krizhevsky et al. 2012)
(Szegedy et al. 2017), ResNet (He et al. 2016), and DenseNet (Iandola et al. 2014)
have been proposed since 2012, after AlexNet achieved a major breakthrough on ImageNet classification, reducing the error rate by half (Krizhevsky et al. 2012).
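To make these layer types concrete, here is a minimal PyTorch sketch that stacks convolution, ReLU, pooling, and a fully connected head; the toy architecture and sizes are illustrative and do not correspond to any of the cited networks.

```python
import torch
import torch.nn as nn

# A toy CNN: conv -> ReLU -> pool blocks followed by a fully connected head.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learnable local kernels
    nn.ReLU(),                                    # nonlinearity
    nn.MaxPool2d(2),                              # downsample / summarize locally
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 10),                  # classification / regression head
)

scores = cnn(torch.randn(4, 1, 64, 64))           # batch of 4 single-channel 64x64 images
print(scores.shape)                               # torch.Size([4, 10])
```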
U-Net
Fig. 6 Illustrations of U-Net (left) and V-Net (right). (Taken from Ronneberger et al. 2015 and
Milletari et al. 2016)
Autoencoders
Autoencoders were first proposed for dimensionality reduction (Wang et al. 2014).
An autoencoder learns an approximation of the identity mapping through an
encoder-decoder structure: an encoder extracts the latent space representation of the
input, and a decoder reconstructs the input with the extracted vector. The encoder
and decoder can be multi-layer perceptrons, CNNs, or any feed-forward neural
networks. Supervised by the identity loss (mean absolute error or mean square error)
between the input and the output, the latent vector is able to provide a compact
expression of the input in a lower-dimensional space. Autoencoders have been
applied to principal component analysis (Kramer 1991), image denoising (Gondara
2016), and anomaly detection (Zhou and Paffenroth 2017).
Variational autoencoder (VAE) (Kingma and Welling 2013) is one of the
most important variants of autoencoders. Unlike a traditional autoencoder, a VAE
assumes that the latent space fits a certain probability distribution, such as a
Gaussian distribution, and estimates the parameters of this probability distribution
from the input data. Therefore, the approximated distribution of the latent space of a VAE matches the input space more closely than that of a traditional autoencoder.
In addition to minimizing the identity loss between the input and the output,
a regularization term of Kullback-Leibler (KL) divergence between the desired
distribution and the predicted distribution is used to train a VAE.
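The VAE training objective described above can be sketched as follows: a minimal NumPy version assuming a diagonal Gaussian posterior and a standard normal prior, with illustrative variable names and averaging convention.

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Identity (reconstruction) loss plus KL(q(z|x) || N(0, I)) for a
    diagonal Gaussian posterior with mean `mu` and log-variance `logvar`."""
    recon = np.mean((x - x_recon) ** 2)                        # mean squared error
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))   # closed-form KL term
    return recon + beta * kl
```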
CNNs and the other aforementioned networks are unable to handle input sequences
of various lengths and thus cannot model the temporal correlations within
sequences. Recurrent neural networks (RNNs) (Karpathy et al. 2015) are proposed
to solve this problem and have been widely used to process text, video, and time
series. At each time step, an RNN collects the previous hidden state vector and
the current input vector to update the current hidden state and produces output by
sending the current hidden state vector to a feed-forward network.
However, RNNs suffer from vanishing or exploding gradients as the sequences
grow longer (Karpathy et al. 2015), resulting in poor performance on capturing
long-term dependencies. Long short-term memory (LSTM) (Karpathy et al. 2015) is
explicitly designed to address such long-term dependency issue. LSTM introduces
three gates to protect and control the cell state and the hidden state: the forget gate
is used to determine how much information in the previous cell state should be kept;
the input gate is used to collect useful information from the current input and the
previous hidden state and add them to the filtered previous cell state so as to update
the current cell state; and the output gate is used to output a filtered informative
vector, namely, the current hidden state, from the updated cell state. All these three
types of gates take the previous hidden state as well as the current input as inputs
for calculations of their corresponding filter coefficients (Fig. 7).
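The three gates can be written compactly as below; this is a standard single-step NumPy sketch, with illustrative weight containers rather than any particular library's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.  W, U, b are dicts with keys 'f', 'i', 'o', 'g'
    holding input weights, recurrent weights, and biases."""
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate: keep part of c_prev
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate: admit new information
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate: expose filtered state
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell update
    c_t = f * c_prev + i * g                               # updated cell state
    h_t = o * np.tanh(c_t)                                 # updated hidden state
    return h_t, c_t
```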
Fig. 7 Architectures of a RNN (left) model and a LSTM (right) model. (Taken from Karpathy
et al. 2015)
Unsupervised Methods
Loss Function
The typical loss function of unsupervised methods can be written as

$$L = L_{sim}(m \cdot \phi_1, f) + \gamma\, L_{reg}(\phi_1),$$

where Lsim is the similarity term measuring the difference between the deformed moving objects m · φ1 and the fixed objects f, and Lreg is the regularization term imposing certain constraints on the registration fields φ1 to make them diffeomorphic. In the process of minimizing the loss function, the set of deformed moving objects moves increasingly closer to the set of fixed objects, and the corresponding registration fields become smoother. γ is a trade-off factor between the similarity term and the regularization term. Too large a γ will result in over-regularized registration fields that cause highly inaccurate registrations, whereas too small a γ will lead to overly flexible registration fields that might be irregular and no longer diffeomorphic. In practice, γ is usually chosen empirically.
Similarity Metrics
For different data types, there are different metrics to quantify the similarity between
the moved objects m·φ1 and the fixed objects f. For image data, mean squared error (MSE) and normalized local cross-correlation (NLCC) are commonly adopted. MSE is defined as

$$\mathrm{MSE}(m \cdot \phi_1, f) = \frac{1}{|\Omega|}\sum_{p \in \Omega}\big([m \cdot \phi_1](p) - f(p)\big)^2. \qquad (4)$$
In MSE, p indexes image pixels or voxels and Ω represents the whole image. Since the loss function is minimized to train the deep network, a small MSE is desired to yield a good alignment result. The similarity loss term with MSE can be directly written as Lsim(m·φ1, f) = MSE(m·φ1, f). Unlike MSE, which measures a global difference, NLCC quantifies local cross-correlation and is commonly termed CC; it is computed over the whole image. CC can be written as

$$\mathrm{CC}(m \cdot \phi_1, f) = \sum_{p \in \Omega} \frac{\Big(\sum_{p_i}\big([m \cdot \phi_1](p_i) - [m \cdot \phi_1](\bar{p})\big)\big(f(p_i) - f(\bar{p})\big)\Big)^2}{\Big(\sum_{p_i}\big(f(p_i) - f(\bar{p})\big)^2\Big)\Big(\sum_{p_i}\big([m \cdot \phi_1](p_i) - [m \cdot \phi_1](\bar{p})\big)^2\Big)}, \qquad (5)$$

where f(p̄) and [m·φ1](p̄) denote images of local mean intensities, e.g., f(p̄) = (1/n^d) Σ_{p_i} f(p_i), where p_i iterates over an n² area (2D) or an n³ volume (3D) around p. A higher CC indicates a better alignment, yielding the loss function Lsim(m·φ1, f) = −CC(m·φ1, f).
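A simple NumPy sketch of this local cross-correlation is shown below; the use of scipy's uniform_filter for the local means, the window size n, and the stabilizing eps are implementation choices of this sketch, not a specific published code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_cc(moved, fixed, n=9, eps=1e-5):
    """Local (windowed) cross-correlation, Eq. (5), for 2D or 3D images.
    Local means are computed with an n^d uniform window around each voxel."""
    mu_m = uniform_filter(moved, size=n)
    mu_f = uniform_filter(fixed, size=n)
    cross = uniform_filter(moved * fixed, size=n) - mu_m * mu_f
    var_m = uniform_filter(moved * moved, size=n) - mu_m**2
    var_f = uniform_filter(fixed * fixed, size=n) - mu_f**2
    cc = cross**2 / (var_m * var_f + eps)
    return cc.sum()          # higher is better; use -local_cc(...) as the training loss
```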
For shape data such as landmarks, curves, or meshes, l2 norm or norm of
differences on manifold-based vector-valued metrics (Vaillant et al. 2007; Glaunes
et al. 2008) is usually taken as the similarity term.
A typical choice penalizes spatial derivatives of the displacement (or velocity) field u, for example

$$L_{reg}(\phi) = \sum_{p \in \Omega} \|\nabla^2 u(p)\|^2, \qquad (7)$$
where ∇ is the gradient operator. Other than formulating the regularization term as
part of the loss function, employing a smoothing filter like Gaussian convolution
right behind the layer for predicting displacement or velocity is also a useful
way to smooth the registration field. Applying a Gaussian convolution layer is
equivalent to imposing a diffusion-like regularization prior on the predicted velocity
or displacement (Krebs et al. 2019).
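As a concrete example of such a regularizer, the following NumPy sketch penalizes first-order finite-difference gradients of a 2D displacement field; applying the operator twice would give the second-order penalty of Eq. (7). The array layout is an assumption of this sketch.

```python
import numpy as np

def smoothness_loss(u):
    """First-order smoothness penalty on a displacement field u of shape
    (H, W, 2): sum of squared finite-difference gradients over all pixels.
    This is a diffusion-like regularizer; the bending-energy variant of
    Eq. (7) would apply the difference operator twice."""
    dy = u[1:, :, :] - u[:-1, :, :]      # finite difference along rows
    dx = u[:, 1:, :] - u[:, :-1, :]      # finite difference along columns
    return (dy**2).sum() + (dx**2).sum()
```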
CNN-Based Methods
measures the average number of voxels whose Jacobian determinant of the deformation field is less than 0, serving as a local orientation-consistency regularizer. Lreg measures the l2 norm of the gradients of vXY and vYX across all voxels, serving as a global smoothness regularizer. Lmag measures the average discrepancy between the l2 norm of vXY and that of vYX, explicitly encouraging the magnitudes of the two predicted symmetric velocity fields to be (approximately) the same. Both Lmean and Lmag enforce the mapping and the corresponding inverse mapping to be symmetric.
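To make this folding measure concrete, the following NumPy sketch counts, for a 2D deformation field, the pixels whose finite-difference Jacobian determinant is non-positive; the field layout and the simple forward differences are assumptions of this sketch.

```python
import numpy as np

def count_nonpositive_jacobian(phi):
    """Count pixels where the Jacobian determinant of a 2D deformation
    phi (shape (H, W, 2), mapping pixel coordinates) is <= 0,
    i.e. where the mapping locally folds."""
    dphi_dy = phi[1:, :-1, :] - phi[:-1, :-1, :]   # d(phi)/d(row)
    dphi_dx = phi[:-1, 1:, :] - phi[:-1, :-1, :]   # d(phi)/d(col)
    det = dphi_dy[..., 0] * dphi_dx[..., 1] - dphi_dy[..., 1] * dphi_dx[..., 0]
    return int(np.sum(det <= 0))
```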
Comparisons of SYMNet with ANTs SyN (Avants et al. 2008), VoxelMorph (Balakrishnan et al. 2019), and VoxelMorph-diff (Dalca et al. 2018) are conducted via atlas-based registration using 425 T1-weighted brain MRI scans from OASIS (Fotenos
et al. 2005). Different from the original experimental settings in VoxelMorph
and VoxelMorph-diff, all learning-based methods involved in the comparisons
are trained by pairwise registrations of all image pairs in the training set. The
average Dice scores for SYMNet, VoxelMorph, VoxelMorph-diff, and ANTs SyN
are, respectively, 0.738, 0.707, 0.693, and 0.680 (0.567 for affine only), and the
corresponding numbers of voxels whose Jacobian determinants are less than or
equal to 0 are, respectively, 0.471, 0.588, 346.712, and 0.047. The running time
is 0.414s for SYMNet, 0.695 s for VoxelMorph, and 0.517 s for VoxelMorph-diff on
a NVIDIA GTX 1080Ti GPU and 1039 s for ANTs SyN on an Intel Core i7-7700
CPU. SYMNet achieves the best performance on the evaluated dataset. Ablation
studies successfully validate the effectiveness of the local orientation-consistency
loss proposed by SYMNet.
VAE-Based Methods
Different from CNN-based methods which usually directly estimate the parameter-
izations (displacement, velocity, or momentum) of the registration field, VAE-based
methods estimate a latent space that encodes the deformation space through an
encoder and predict the velocity field through a decoder. Subsequent layers deform
the moving image to reconstruct the input image (the fixed image) with the predicted
velocity field, yielding the deformed moving image. Furthermore, the template,
namely, the moving image, can be simultaneously estimated together with the latent
space in the training phase. Two representative works will be described in detail,
and more related works will be briefly covered in section “More Related Works”.
Supervised Methods
Loss Function
For supervised learning-based diffeomorphic mapping, the loss function also con-
sists of a similarity term and a regularization term. However, the similarity term is
completely different from that in an unsupervised method. The typical loss function
can be written as

$$L = L_{sim}(u^{sup}, u) + \gamma\, L_{reg}(u),$$

where Lsim is the similarity term that measures the difference between the parameterization u of the predicted deformation and u^sup obtained from conducting one-to-one registrations using traditional methods, and Lreg is the regularization term that imposes a certain constraint on the parameterization of the deformation. When
minimizing the loss function, the set of the estimated parameters of the deformation
is increasingly closer to the set of the ground truth usup , and the registration field is
progressively smoother. γ is a trade-off factor between the similarity term and the
regularization term, which behaves similarly to its unsupervised counterpart.
Similarity Metrics
So far, supervised learning-based diffeomorphic mappings are focused on image
data and usually employ sum of squared difference (SSD), also called MSE, as the
similarity metric:
$$\mathrm{SSD}(u^{sup}, u) = \frac{1}{|\Omega|}\sum_{p \in \Omega}\|u^{sup}(p) - u(p)\|^2, \qquad (9)$$
where p indexes image pixels or voxels and Ω represents the whole image.
u and usup , respectively, represent the predicted parameterization and the one
obtained from traditional methods. Since the loss function is minimized to train
the framework, a small SSD is desired to yield a good alignment. The similarity
loss term with SSD can be directly written as Lsim (usup , u) = SSD(usup , u).
CNN-Based Methods
A fast predictive image registration method focusing only on atlas-based registration is proposed in 2016 (Yang et al. 2016). A later version (Yang et al. 2017b) extends this work to multi-modal image registration. Quicksilver (Yang et al. 2017c) is an enhanced version of the two previous works. It is a patch-
based learning framework that mimics LDDMM by (approximately) predicting
LDDMM’s momentum through neural networks instead of employing traditional
LDDMM. The predicted momentum is constrained by a LDDMM regularity term
so as to ensure smooth mapping. Concretely, two patches of size 15 × 15 × 15
of the same location, respectively, taken from the moving image and the fixed
image are fed into the framework to learn feature maps, which encode spatial and
contextual information of the inputs. The feature maps are subsequently passed
through three independent decoding branches with identical network structure to
predict the corresponding momentum along the three axes. SSD is employed as the
similarity metric to train the network. An extra shooting procedure (Vialard et al.
2012) not included in the network is adopted to perform registration with the
Fig. 8 An example of velocity field in spatial domain and Fourier domain. (Taken from Wang and
Zhang 2020b)
predicted momentum. It is worth noting that, since the input patches are extracted from whole MRI scans, a large sliding-window stride of 14 along the three axes is preferable considering the computational cost.
Besides, a probabilistic framework is presented to evaluate the registration
uncertainty. It assumes the prior on the weights of each layer of the network is a diagonal matrix, each entry of which is drawn from a Bernoulli distribution (a form of dropout). A correction network is additionally proposed to further boost
the registration accuracy. Specifically, the momentum predicted in the previous
procedure is regarded as an initial prediction and is used to apply backward warp
to the fixed patch. The moving patch and the backwardly warped fixed patch are
subsequently fed into the correction network to estimate the residual momentum
between the initial prediction and the true one (obtained from performing traditional
LDDMM) with a residual connection. Results from LDDMM of traditional scheme
implemented in PyCA (Singh et al. 2013) with GPU are employed to obtain
the supervised labels. There are three types of evaluations, including atlas-to-image
registration on 150 MRI scans from the OASIS longitudinal dataset (Fotenos et al.
2005), image-to-image registration on 373 MRI scans from the OASIS longitudinal
dataset (Fotenos et al. 2005) for training and 2168 MRI scans from 4 datasets
(LPBA40, IBSR18, MGH10, CUMC12) (Klein et al. 2009) for testing, and multi-
modal registration (T1-weighted to T2-weighted) on 375 MRI scans from the IBIS
3D Autism Brain image dataset (Hazlett et al. 2017). Three metrics including
target overlap (Yang et al. 2017c), the number of voxels whose log-Jacobian determinant of the registration field is less than or equal to 0, and deformation
errors (mm) are adopted for evaluation purposes. Comparisons are conducted with
several related methods with respect to the three metrics.
registration, with SSD as the similarity metric. Unlike Quicksilver which employs
independent branches for predicting momentum of each axis, SVF-Net instead
estimates the velocity using a 4D map, the last dimension of which represents the velocity components along the x, y, and z axes. No explicit regularization is presented in the
loss function of SVF-Net. The true velocity labels are obtained by using an iterative
log-approximation scheme with the scaling and squaring approach (Arsigny et al.
2006). It starts with the displacement field defined on the whole image grid and
parameterizes a transformation that maps a set of selected landmarks from the
moving image to the corresponding fixed image.
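The scaling-and-squaring integration mentioned here can be sketched as follows for a 2D stationary velocity field; the array layout, pixel-unit velocities, linear interpolation via scipy, and the number of squaring steps are assumptions of this sketch rather than the SVF-Net implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def scaling_and_squaring(v, n_steps=6):
    """Integrate a stationary 2D velocity field v (shape (2, H, W), in pixels)
    into a displacement field u such that phi = id + u approximates exp(v),
    via scaling and squaring: start from v / 2^n and compose n times."""
    H, W = v.shape[1:]
    grid = np.mgrid[0:H, 0:W].astype(float)          # identity coordinates
    u = v / (2 ** n_steps)                           # scaling step
    for _ in range(n_steps):                         # squaring: u <- u o (id + u) + u
        coords = grid + u
        u = np.stack([map_coordinates(u[c], coords, order=1, mode='nearest')
                      for c in range(2)]) + u
    return u
```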
Inter-patient registration is conducted on 187 segmented 3D MRI cardiac scans
acquired from multiple clinical centers. Small translations in x and y axes are
performed as data augmentation for the training data. Results with respect to four
evaluation metrics (Dice, HD, NLCC, relative variance of Log-Jacobian) for SVF-
Net and LCC Log-Demons (Lorenzi et al. 2013) on four ROIs are shown. When
running on an NVIDIA TitanX GPU, SVF-Net takes less than 0.03 s to register one pair of images.
two learning-based methods: Quicksilver (Yang et al. 2017c) and VoxelMorph (Bal-
akrishnan et al. 2019). The resulting Dice scores are 0.780 for DeepFLASH, 0.774 for VoxelMorph, 0.762 for Quicksilver, 0.788 for FLASH, 0.770 for ANTs SyN, and 0.760 for VM-LDDMM. Considering the
training time, DeepFLASH takes 14.1 h, VoxelMorph takes 29.7 h, and Quicksilver
takes 31.4 h under the same conditions. However, both DeepFLASH and Quicksil-
ver need extra time for acquiring the registration labels through conducting con-
ventional methods before the training procedure. The registration time on NVIDIA
GTX 1080Ti GPUs is, respectively, 0.273 s for DeepFLASH, 0.571 s for Voxel-
Morph, 0.760 s for Quicksilver, 53.4 s for FLASH, and 262 s for VM-LDDMM.
Besides, Krebs et al. (2017) explore training a reinforcement learning model with a
large number of synthetically deformed image pairs and a small number of real inter-
subject pairs through agent-based action learning. Pathan and Hong (2018) combine
LSTM and CNN to learn a predictive regression model based on LDDMM for longi-
tudinal images with missing data. Ding et al. propose a framework similar to Quicksilver, called FPSGR (Ding et al. 2019; Kwitt and Niethammer 2017), to approximate a
simplified geodesic regression model so as to capture longitudinal brain changes.
To be specific, FPSGR predicts initial momenta supervised by the geodesic distance
between images. The geodesic regression can be solved by approximately perform-
ing pairwise image registrations between the first image and all subsequent images
of the longitudinal data. FPSGR-derived correlations with clinical indicators are also
analyzed. A work on arXiv (Wang and Zhang 2020a) first estimates the regularity
parameters of the image registrations for given image pairs using a CNN. Afterward,
a new two-stream CNN-based network is trained to estimate the mapping from
image pairs to their corresponding regularity parameters, under the supervision of
the estimated regularity parameters from the previous step. Table 2 lists the related
information of all reviewed supervised learning-based works.
Challenges
grid are considered and no strong constraint of the original shape is involved in the
network structures. Thus, how to design a more suitable deep registration framework
for shape data remains a challenging topic to explore.
Future Directions
Conclusions
Acknowledgments This study was supported by the National Natural Science Foundation
of China (62071210); the Shenzhen Basic Research Program (JCYJ20200925153847004,
JCYJ20190809120205578); and the High-Level University Fund (G02236002). The authors
would like to thank Yuanyuan Wei from the University of British Columbia for his help on this
chapter.
References
Amor, B.B., Arguillère, S., Shao, L.: Resnet-LDDMM: advancing the LDDMM framework using
deep residual networks (2021). arXiv preprint arXiv:2102.07951
Arganda-Carreras, I., Turaga, S.C., Berger, D.R., Cireşan, D., Giusti, A., Gambardella, L.M.,
Schmidhuber, J., Laptev, D., Dwivedi, S., Buhmann, J.M., et al.: Crowdsourcing the
creation of image segmentation algorithms for connectomics. Front. Neuroanat. 9, 142
(2015)
Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Fast and simple calculus on tensors in the
log-Euclidean framework. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, pp. 115–122. Springer (2005)
Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A log-Euclidean framework for statistics on
diffeomorphisms. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 924–931. Springer (2006)
Ashburner, J.: A fast diffeomorphic image registration algorithm. Neuroimage 38(1), 95–113
(2007)
Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration
with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain.
Med. Image Anal. 12(1), 26–41 (2008)
Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning
model for deformable medical image registration. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 9252–9260 (2018)
Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: Voxelmorph: a learning
framework for deformable medical image registration. IEEE Trans. Med. Imaging 38(8), 1788–
1800 (2019)
Beg, M.F., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via
geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61(2), 139–157 (2005)
Bône, A., Louis, M., Colliot, O., Durrleman, S., Initiative, A.D.N., et al.: Learning low-dimensional
representations of shape data sets with diffeomorphic autoencoders. In: International Confer-
ence on Information Processing in Medical Imaging, pp. 195–207. Springer (2019)
Bossa, M., Zacur, E., Olmos, S., Initiative, A.D.N., et al.: Tensor-based morphometry with
stationary velocity field diffeomorphic registration: application to ADNI. Neuroimage 51(3),
956–969 (2010)
Cao, Y., Miller, M.I., Mori, S., Winslow, R.L., Younes, L.: Diffeomorphic matching of diffusion
tensor images. In: 2006 Conference on Computer Vision and Pattern Recognition Workshop
(CVPRW’06), pp. 67–67. IEEE (2006)
Charon, N., Trouvé, A.: The varifold representation of nonoriented shapes for diffeomorphic
registration. SIAM J. Imaging Sci. 6(4), 2547–2580 (2013)
Cheng, J., Dalca, A.V., Fischl, B., Zöllei, L., Initiative, A.D.N., et al.: Cortical surface registration
using unsupervised learning. NeuroImage 221, 117161 (2020)
Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast proba-
bilistic diffeomorphic registration. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention, pp. 729–738. Springer (2018)
Dalca, A.V., Rakic, M., Guttag, J., Sabuncu, M.R.: Learning conditional deformable templates with
convolutional networks (2019a). arXiv preprint arXiv:1908.02738
Dalca, A.V., Yu, E., Golland, P., Fischl, B., Sabuncu, M.R., Iglesias, J.E.: Unsupervised deep
learning for Bayesian brain MRI segmentation. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 356–365. Springer (2019b)
Debavelaere, V., Durrleman, S., Allassonnière, S., Initiative, A.D.N.: Learning the clustering of
longitudinal shape data sets into a mixture of independent or branching trajectories. Int. J.
Comput. Vis. 128, 2794–2809 (2020)
Detlefsen, N.S., Freifeld, O., Hauberg, S.: Deep diffeomorphic transformer networks. In: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4403–4412
(2018)
Di Martino, A., Yan, C.G., Li, Q., Denio, E., Castellanos, F.X., Alaerts, K., Anderson, J.S., Assaf,
M., Bookheimer, S.Y., Dapretto, M., et al.: The autism brain imaging data exchange: towards a
large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19(6), 659–
667 (2014)
Ding, Z., Fleishman, G., Yang, X., Thompson, P., Kwitt, R., Niethammer, M., Initiative, A.D.N.,
et al.: Fast predictive simple geodesic regression. Med. Image Anal. 56, 193–209 (2019)
Durrleman, S.: Statistical models of currents for measuring the variability of anatomical curves,
surfaces and their evolution. PhD thesis, Université Nice Sophia Antipolis (2010)
Evan, M.Y., Dalca, A.V., Sabuncu, M.R.: Learning conditional deformable shape templates for
brain anatomy. In: International Workshop on Machine Learning in Medical Imaging, pp. 353–
362. Springer (2020)
Fischl, B.: Freesurfer. Neuroimage 62(2), 774–781 (2012)
Fotenos, A.F., Snyder, A., Girton, L., Morris, J., Buckner, R.: Normative estimates of cross-
sectional and longitudinal brain volume decline in aging and ad. Neurology 64(6), 1032–1039
(2005)
Freifeld, O., Hauberg, S., Batmanghelich, K., Fisher, J.W.: Transformations based on continuous
piecewise-affine velocity fields. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2496–2509
(2017)
Glaunes, J., Qiu, A., Miller, M.I., Younes, L.: Large deformation diffeomorphic metric curve
mapping. Int. J. Comput. Vis. 80(3), 317–336 (2008)
Gondara, L.: Medical image denoising using convolutional denoising autoencoders. In: 2016 IEEE
16th International Conference on Data Mining Workshops (ICDMW), pp. 241–246. IEEE
(2016)
Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., Poupon, C., Hartmann, A., Ayache, N.,
Durrleman, S.: A Bayesian framework for joint morphometry of surface and curve meshes in
multi-object complexes. Med. Image Anal. 35, 458–474 (2017)
Han, X., Shen, Z., Xu, Z., Bakas, S., Akbari, H., Bilello, M., Davatzikos, C., Niethammer,
M.: A deep network for joint registration and reconstruction of images with pathologies. In:
International Workshop on Machine Learning in Medical Imaging, pp. 342–352. Springer
(2020)
Hazlett, H.C., Gu, H., Munsell, B.C., Kim, S.H., Styner, M., Wolff, J.J., Elison, J.T., Swanson,
M.R., Zhu, H., Botteron, K.N., et al.: Early brain development in infants at high risk for autism
spectrum disorder. Nature 542(7641), 348–351 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hinkle, J., Womble, D., Yoon, H.J.: Diffeomorphic autoencoders for LDDMM atlas building
(2018)
Hoffmann, M., Billot, B., Eugenio Iglesias, J., Fischl, B., Dalca, A.V.: Learning image registration
without images (2020). arXiv e-prints arXiv–2004
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet:
implementing efficient convnet descriptor pyramids (2014). arXiv preprint arXiv:1404.1869
Jack, C.R. Jr, Bernstein, M.A., Fox, N.C., Thompson, P., Alexander, G., Harvey, D., Borowski, B.,
Britson, P.J., Whitwell, J.L., Ward, C., et al.: The Alzheimer’s disease neuroimaging initiative
(ADNI): MRI methods. J. Magn. Reson. Imaging: Off. J. Int. Soc. Magn. Res. Med. 27(4),
685–691 (2008)
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks
(2015). arXiv preprint arXiv:1506.02025
Jiang, Z., Yang, H., Tang, X.: Deformation-based statistical shape analysis of the corpus callosum
in mild cognitive impairment and Alzheimer’s disease. Curr. Alzheimer Res. 15(12), 1151–1160
(2018)
Joshi, S.C., Miller, M.I.: Landmark matching via large deformation diffeomorphisms. IEEE Trans.
Image Process. 9(8), 1357–1370 (2000)
Karpathy, A., Johnson, J., Fei-Fei, L.: Visualizing and understanding recurrent networks (2015).
arXiv preprint arXiv:1506.02078
Kaul, C., Manandhar, S., Pears, N.: Focusnet: An attention-based fully convolutional network
for medical image segmentation. In: 2019 IEEE 16th International Symposium on Biomedical
Imaging (ISBI 2019), pp. 455–458. IEEE (2019)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013). arXiv preprint
arXiv:1312.6114
Klein, A., Andersson, J., Ardekani, B.A., Ashburner, J., Avants, B., Chiang, M.C., Christensen,
G.E., Collins, D.L., Gee, J., Hellier, P., et al.: Evaluation of 14 nonlinear deformation algorithms
applied to human brain mri registration. Neuroimage 46(3), 786–802 (2009)
Kramer, M.A.: Nonlinear principal component analysis using autoassociative neural networks.
AIChE J. 37(2), 233–243 (1991)
Krebs, J., Mansi, T., Delingette, H., Zhang, L., Ghesu, F.C., Miao, S., Maier, A.K., Ayache, N.,
Liao, R., Kamen, A.: Robust non-rigid registration through agent-based action learning. In:
International Conference on Medical Image Computing and Computer-Assisted Intervention,
pp. 344–352. Springer (2017)
Krebs, J., Delingette, H., Mailhé, B., Ayache, N., Mansi, T.: Learning a probabilistic model for
diffeomorphic registration. IEEE Trans. Med. Imaging 38(9), 2165–2176 (2019)
Krebs, J., Delingette, H., Ayache, N., Mansi, T.: Learning a generative motion model from image
sequences based on a latent motion matrix. IEEE Trans. Med. Imaging 40(5), 1405–1416 (2021)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural
networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
Kwitt, R., Niethammer, M.: Fast predictive simple geodesic regression. In: Third International
Workshop DLMIA, p. 267 (2017)
Lateef, F., Ruichek, Y.: Survey on semantic segmentation using deep learning techniques.
Neurocomputing 338, 321–348 (2019)
Lea, C., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks: a unified approach to
action segmentation. In: European Conference on Computer Vision, pp. 47–54. Springer (2016)
Li, D., Yang, Y., Song, Y.Z., Hospedales, T.: Learning to generalize: meta-learning for domain
generalization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32
(2018)
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Pietikäinen, M.: Deep learning for
generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2020)
Lorenzi, M., Ayache, N., Frisoni, G.B., Pennec, X., ADNI, et al.: LCC-demons: a robust and
accurate symmetric diffeomorphic registration algorithm. NeuroImage 81, 470–483 (2013)
Louis, M., Charlier, B., Durrleman, S.: Geodesic discriminant analysis for manifold-valued
data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, pp. 332–340 (2018)
Louis, M., Couronné, R., Koval, I., Charlier, B., Durrleman, S.: Riemannian geometry learning
for disease progression modelling. In: International Conference on Information Processing in
Medical Imaging, pp. 542–553. Springer (2019)
Lyu, J., Cheng, P., Tang, X.: Fundus image based retinal vessel segmentation utilizing a fast and
accurate fully convolutional network. In: International Workshop on Ophthalmic Medical Image
Analysis, pp. 112–120. Springer (2019)
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric
medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV),
pp. 565–571. IEEE (2016)
Modat, M., Daga, P., Cardoso, M.J., Ourselin, S., Ridgway, G.R., Ashburner, J.: Parametric non-
rigid registration using a stationary velocity field. In: 2012 IEEE Workshop on Mathematical
Methods in Biomedical Image Analysis, pp. 145–150. IEEE (2012)
Mok, T.C., Chung, A.: Fast symmetric diffeomorphic image registration with convolutional neural
networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 4644–4653 (2020a)
Mok, T.C., Chung, A.C.: Large deformation diffeomorphic image registration with laplacian
pyramid networks. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 211–221. Springer (2020b)
Niethammer, M., Kwitt, R., Vialard, F.X.: Metric learning for image registration. In: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8463–8472
(2019)
Olut, S., Shen, Z., Xu, Z., Gerber, S., Niethammer, M.: Adversarial data augmentation via
deformation statistics. In: European Conference on Computer Vision, pp. 643–659. Springer
(2020)
Pathan, S., Hong, Y.: Predictive image regression for longitudinal studies with missing data (2018).
arXiv preprint arXiv:1808.07553
Rohé, M.M., Datar, M., Heimann, T., Sermesant, M., Pennec, X.: SVF-Net: learning deformable
image registration using shape matching. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 266–274. Springer (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 234–241. Springer (2015)
Roy, A.G., Navab, N., Wachinger, C.: Concurrent spatial and channel ‘squeeze & excitation’ in
fully convolutional networks. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, pp. 421–429. Springer (2018)
Shattuck, D.W., Mirza, M., Adisetiyo, V., Hojatkashani, C., Salamon, G., Narr, K.L., Poldrack,
R.A., Bilder, R.M., Toga, A.W.: Construction of a 3d Probabilistic Atlas of Human Cortical
Structures. Neuroimage 39(3), 1064–1080 (2008)
Shen, Z., Han, X., Xu, Z., Niethammer, M.: Networks for joint affine and non-parametric image
registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 4224–4233 (2019a)
Shen, Z., Vialard, F.X., Niethammer, M.: Region-specific diffeomorphic metric mapping (2019b).
arXiv preprint arXiv:1906.00139
Sibi, P., Jones, S.A., Siddarth, P.: Analysis of different activation functions using back propagation
neural networks. J. Theor. Appl. Inf. Technol. 47(3), 1264–1268 (2013)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition
(2014). arXiv preprint arXiv:1409.1556
Singh, N., Hinkle, J., Joshi, S., Fletcher, P.T.: A vector momenta formulation of diffeomorphisms
for improved geodesic regression and atlas construction. In: 2013 IEEE 10th International
Symposium on Biomedical Imaging, pp. 1219–1222. IEEE (2013)
Sun, S., Shi, H., Wu, Y.: A survey of multi-source domain adaptation. Inf. Fusion 24, 84–92 (2015)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact
of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 31 (2017)
Tang, X., Ross, C.A., Johnson, H., Paulsen, J.S., Younes, L., Albin, R.L., Ratnanather, J.T., Miller,
M.I.: Regional subcortical shape analysis in premanifest Huntington’s disease. Hum. Brain
Map. 40(5), 1419–1433 (2019)
Tian, L., Puett, C., Liu, P., Shen, Z., Aylward, S.R., Lee, Y.Z., Niethammer, M.: Fluid registration
between lung CT and stationary chest tomosynthesis images. In: International Conference on
Medical Image Computing and Computer-Assisted Intervention, pp. 307–317. Springer (2020)
Ulman, V., Maška, M., Magnusson, K.E., Ronneberger, O., Haubold, C., Harder, N., Matula,
P., Matula, P., Svoboda, D., Radojevic, M., et al.: An objective comparison of cell-tracking
algorithms. Nat. Methods 14(12), 1141–1152 (2017)
Vaillant, M., Glaunes, J.: Surface matching via currents. In: Biennial International Conference on
Information Processing in Medical Imaging, pp. 381–392. Springer (2005)
Vaillant, M., Qiu, A., Glaunès, J., Miller, M.I.: Diffeomorphic metric surface mapping in subregion
of the superior temporal gyrus. NeuroImage 34(3), 1149–1159 (2007)
Vialard, F.X., Risser, L., Rueckert, D., Cotter, C.J.: Diffeomorphic 3D image registration via
geodesic shooting using an efficient adjoint calculation. Int. J. Comput. Vis. 97(2), 229–241
(2012)
Wang, J., Zhang, M.: Deep learning for regularization prediction in diffeomorphic image registra-
tion (2020a). arXiv preprint arXiv:2011.14229
Wang, J., Zhang, M.: Deepflash: an efficient network for learning-based medical image registration.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 4444–4452 (2020b)
Wang, W., Huang, Y., Wang, Y., Wang, L.: Generalized autoencoder: a neural network framework
for dimensionality reduction. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pp. 490–497 (2014)
Wilson, G., Cook, D.J.: A survey of unsupervised deep domain adaptation. ACM Trans. Intell.
Syst. Technol. (TIST) 11(5), 1–46 (2020)
Yang, X., Li, Y., Reutens, D., Jiang, T.: Diffeomorphic metric landmark mapping using stationary
velocity field parameterization. Int. J. Comput. Vis. 115(2), 69–86 (2015)
Yang, X., Kwitt, R., Niethammer, M.: Fast predictive image registration. In: Deep Learning and
Data Labeling for Medical Applications, pp. 48–57. Springer, Cham (2016)
Yang, H., Wang, J., Tang, H., Ba, Q., Yang, G., Tang, X.: Analysis of mitochondrial shape
dynamics using large deformation diffeomorphic metric curve matching. In: 2017 39th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC),
pp. 4062–4065. IEEE (2017a)
Yang, X., Kwitt, R., Styner, M., Niethammer, M.: Fast predictive multimodal image registration. In:
2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 858–862.
IEEE (2017b)
Yang, X., Kwitt, R., Styner, M., Niethammer, M.: Quicksilver: fast predictive image registration–a
deep learning approach. NeuroImage 158, 378–396 (2017c)
Yi, X., Walia, E., Babyn, P.: Generative adversarial network in medical imaging: a review. Med.
Image Anal. 58, 101552 (2019)
Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535.
IEEE (2010)
Zhang, M., Fletcher, P.T.: Bayesian principal geodesic analysis in diffeomorphic image regis-
tration. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 121–128. Springer (2014)
Zhang, M., Fletcher, P.T.: Fast diffeomorphic image registration via fourier-approximated lie
algebras. Int. J. Comput. Vis. 127(1), 61–73 (2019)
Zhou, C., Paffenroth, R.C.: Anomaly detection with robust deep autoencoders. In: Proceedings of
the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
pp. 665–674 (2017)
Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: a survey (2021). arXiv
preprint arXiv:210302503
Part III
Shape Spaces and Geometric Flows
Stochastic Shape Analysis
38
Alexis Arnaudon, Darryl Holm, and Stefan Sommer
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1326
Key Concepts from Shape Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1328
Large Deformation Diffeomorphic Metric Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1328
Metric and Variational Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1329
Hamiltonian Systems and Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1330
Hamiltonian Systems and Landmark Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1330
Noise from a Statistical Physics Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1332
Non-dissipative Stochastic Shape Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1333
Riemannian Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1334
Lagrangian Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1335
Eulerian Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
Stochastic Euler-Poincaré Reduction and Its Infinite Dimensional Extension . . . . . . . . . . . 1337
Reduction by Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1338
Stochastic EPDiff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1339
Other Stochastic Shape Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1340
Stochastic Models in Shape Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1340
Random Orbits with Time-Continuous Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1340
Noise Inference from Evolution of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1341
Likelihood-Based Inference and Bridge Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1342
Likelihood Maximisation and Automatic Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 1343
A. Arnaudon
Department of Mathematics, Imperial College, London, UK
Blue Brain Project, École polytechnique fédéral de Lausanne (EPFL), Geneva, Switzerland
e-mail: [email protected]; [email protected]
D. Holm
Department of Mathematics, Imperial College, London, UK
e-mail: [email protected]
S. Sommer ()
Department of Computer Science (DIKU), University of Copenhagen, Copenhagen E, Denmark
e-mail: [email protected]
Abstract
Keywords
Introduction
Shape analysis is a vast topic that can be approached from many angles including
geometry, analysis, statistics and numerical analysis. Shape modelling and analysis
similarly finds application in a range of domains including biology, medical image
analysis, computer vision, computer graphics and engineering.
The mathematical study of shapes often involves geometric methods due to the
inherent nonlinearity of shape spaces. Examples include the setting of inner metrics
(Bauer et al. 2014), where a Riemannian structure is defined directly on the shape
space, or the pattern theory pioneered by Grenander (1994) and advanced further by,
e.g. Miller, Christensen, Trouvé and Younes (Christensen et al. 1996; Grenander and
Miller 1998; Younes 1998; Trouvé 1998) where sets of transformations of the shape
domain are equipped with geometric structure. Specifically, in the latter approach, a
right-invariant Riemannian metric is defined on a subgroup of the diffeomorphism
group, whose action on shapes descends to a Riemannian metric on the shape space
itself. Due to the transformation of the entire domain in which the shape resides,
this class of metrics is denoted outer metrics.
In both settings, Riemannian geometric structure is defined on the shape space.
Then, the optimal trajectory between two shapes, a matching of the shapes, is a
geodesic, and the dual of the Riemannian metric defines a Hamiltonian for which the
geodesic satisfies Hamilton’s equations. In the outer metric case, the Hamiltonian
is defined on both the shape space and the transformation group, and due to the
invariance of the metric, the concepts of momentum maps and symmetry reduction
of the Hamiltonian flow have important roles (Holm et al. 2004; Bruveris et al.
2009).
The foregoing models treat shape transformation as deterministic smooth evolu-
tions in the shape space. However, both from applied and theoretical perspectives, it
has been of recent interest to generalise these models to admit stochastic transforma-
tions of shapes. For example, evolutionary biology incorporates stochasticity in the
models of species change, organs may evolve stochastically during the development
of a disease, and stochastic processes define probability distributions which can be
used for statistical analysis.
Several stochastic shape models exist, each with different properties, reflecting the variety of shape problems that require randomness. Here, we will focus on four models that are applicable to at least the
simplest shape representation with landmarks. In Fig. 1, we illustrate these four
models by solving an initial value problem forward from a configuration of 21
landmarks:
(a) Riemannian Brownian motion: this noise corresponds to pure Brownian motion
but on the shape manifold. It does not have any initial momenta and has little
spatial correlation.
(b) Langevin dynamics: landmarks are interpreted as interacting particles in a heat bath; the model has initial momenta and noise on the momentum, yielding more regular trajectories. The dissipation term slows down the landmark trajectories.
Fig. 1 Examples of stochastically evolving landmark configurations from an H shape (in blue
dots) up to time 1 (dark dots) with models surveyed in this chapter: (a) Riemannian Brownian
motion (Sommer et al. 2017); (b) Lagrangian noise (Trouve and Vialard 2012); (c) Langevin
dynamics (Marsland and Shardlow 2017); (d) Eulerian noise (Arnaudon et al. 2019a). See the
text for more details on these noise models
(c) Lagrangian noise: landmarks have their own intrinsic additive noise and
initial momenta, as in the Langevin dynamics, but without dissipation; thus,
trajectories move further in space for the same initial momenta. The spatial
correlation of the trajectories results only from the interactions of landmarks.
(d) Eulerian noise: the image has noise fields encoding spatially correlated noise
(a grid of 4 by 4 Gaussian kernels uniformly covering the landmark trajectory
space), on which landmarks move and interact. This model has initial momenta and noisy trajectories, but with full control over the spatial correlation of the noise
via the noise fields.
Chapter content. The chapter starts with a short outline of key concepts from
non-stochastic shape analysis in section “Key Concepts from Shape Analysis”
focusing on the outer metric viewpoint. We then review basic Hamiltonian dynamics with and without noise in section “Hamiltonian Systems and Noise”, with a statistical physics perspective; then, in section “Non-dissipative Stochastic Shape Models”, we describe three of the four main stochastic models for landmark dynamics, illustrated
in Fig. 1. The last model, with Eulerian noise, can be extended to other types
of shape spaces in a geometrical way, which is described in section “Stochastic
Euler-Poincaré Reduction and Its Infinite Dimensional Extension”. A few other
approaches to stochastic shape analysis are outlined in section “Other Stochastic
Shape Models”, and some applications in statistics of shapes are described in
section “Stochastic Models in Shape Statistics”. We end with a discussion section
in “Conclusion and Outlook”.
Key Concepts from Shape Analysis
This section briefly outlines the non-stochastic shape space theory, as a basis
on which stochastic extensions will be constructed in the rest of the chapter.
We refer the reader to texts such as Younes (2010) and Marsland and Sommer
(2020) for further details or Holm (2011) for the geometric mechanics foundations
for the theory. The constructions we describe below fall into the category of
outer metrics, specifically the large deformation diffeomorphic metric mapping
(LDDMM, Christensen et al. 1996; Grenander and Miller 1998; Younes 1998;
Trouvé 1998) framework.
The starting point of LDDMM is the action of the diffeomorphism group Diff(Ω)
of a domain Ω ⊆ Rd on shapes being landmarks, curves, surfaces, images or tensor
fields. We here define the shape space and action in two of those cases, landmarks
and curves:
One aim of shape analysis is to define a good notion of distance between shapes.
The LDDMM approach starts with the problem of shape matching through the
optimisation problem
$$\min_v E(v), \qquad E(v) = \int_0^1 \|v(t)\|_V^2\, dt + \frac{1}{2\lambda^2}\, S(\phi(1).q_0,\, q_1), \tag{1}$$
which makes V a reproducing kernel Hilbert space (see, e.g. Younes 2010).
The variational formulation has a number of important consequences:
$$H(\phi, m) = \tfrac{1}{2}\, \|Km\|_V^2, \tag{4}$$
$$L(\phi, v) = \tfrac{1}{2}\, \|v \circ \phi\|_\phi^2 = \tfrac{1}{2}\, \|v\|_V^2, \tag{5}$$
In all three cases, extremal flows for the variational principle are determined
uniquely by their initial conditions. On the diffeomorphism side, this is the velocity
field at time t = 0, i.e. v(0). On the shape side, this is the starting configuration q(0)
and the momentum (covector) p(0) (velocity vector in Lagrangian dynamics). See
illustration in Fig. 2.
Fig. 2 (a) Human corpus callosum shape represented by landmarks (black curve and points) and
geodesic flow (blue curves) specified by an initial vector field (vectors). (b) Geodesic matching
between two corpus callosum shapes (black and red). Compare these deterministic evolutions to
the stochastic trajectories shown later in the chapter
$$\frac{d}{dt} q = \frac{\partial}{\partial p} H(q, p) = \frac{p}{m}, \qquad \frac{d}{dt} p = -\frac{\partial}{\partial q} H(q, p) = -\frac{\partial U(q)}{\partial q}. \tag{6}$$
In the LDDMM setting, the Hamiltonians define the kinetic energy for a Riemannian
metric. They are in quadratic form, and the potential energy is absent. The dynamics
of this class of Hamiltonians describes geodesic motion, since no force derived
from a potential is present. This is the case for landmark dynamics, where the
Hamiltonian is given by
$$H_0(q, p) = \frac{1}{2} \sum_{i,j=1}^n p_i^T K(q_i, q_j)\, p_j, \tag{7}$$
$$\frac{d}{dt} q_i = \frac{\partial}{\partial p_i} H_0(q, p) = \sum_{j=1}^n K(q_i, q_j)\, p_j, \qquad \frac{d}{dt} p_i = -\frac{\partial}{\partial q_i} H_0(q, p) = -\sum_{j=1}^n p_i^T\, \partial_{q_i} K(q_i, q_j)\, p_j. \tag{8}$$
$$K(q_i, q_j) = K(q_i - q_j) = \exp\left( -\frac{\|q_i - q_j\|^2}{2\sigma^2} \right), \qquad \forall\, q_i, q_j \in \Omega, \tag{9}$$
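As a concrete illustration of (8) and (9), the following minimal sketch integrates the deterministic landmark equations with an explicit Euler step; the kernel width, step size, number of steps and initial momenta are illustrative choices and not values taken from the chapter.

```python
import numpy as np

def gaussian_kernel(qi, qj, sigma=0.5):
    # Scalar Gaussian kernel K(q_i, q_j) from (9)
    return np.exp(-np.sum((qi - qj) ** 2) / (2 * sigma ** 2))

def kernel_grad(qi, qj, sigma=0.5):
    # Gradient of K(q_i, q_j) with respect to q_i
    return -(qi - qj) / sigma ** 2 * gaussian_kernel(qi, qj, sigma)

def hamiltonian_rhs(q, p, sigma=0.5):
    # Right-hand side of the landmark Hamiltonian equations (8)
    n = q.shape[0]
    dq = np.zeros_like(q)
    dp = np.zeros_like(p)
    for i in range(n):
        for j in range(n):
            dq[i] += gaussian_kernel(q[i], q[j], sigma) * p[j]
            dp[i] -= np.dot(p[i], p[j]) * kernel_grad(q[i], q[j], sigma)
    return dq, dp

def integrate(q0, p0, T=1.0, steps=100, sigma=0.5):
    # Simple explicit Euler integration of the geodesic flow
    q, p = q0.copy(), p0.copy()
    dt = T / steps
    for _ in range(steps):
        dq, dp = hamiltonian_rhs(q, p, sigma)
        q, p = q + dt * dq, p + dt * dp
    return q, p

# Example: three landmarks on a line, pushed upwards with decreasing momenta
q0 = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
p0 = np.array([[0.0, 1.0], [0.0, 0.5], [0.0, 0.1]])
q1, p1 = integrate(q0, p0)
```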
where β = 1/T is the inverse temperature. To better understand what this system
represents, one may consider the following stochastic differential equation:
$$\frac{d}{dt} q = \frac{\partial}{\partial p} H_0(q, p), \qquad dp = -\frac{\partial}{\partial q} H_0(q, p)\, dt - \sigma\, dW - \theta\, \frac{\partial}{\partial p} H_0(q, p)\, dt, \tag{11}$$
$$P_\infty(q, p) = \frac{1}{Z}\, e^{-\beta H_0(q, p)}, \tag{12}$$
Let us rewrite the SDE (11) in matrix form by using the function $H_1(p, q) = \sigma \cdot p$, as
$$\theta\, \dot{q} = \sigma\, \dot{W} - \nabla_q U(q). \tag{14}$$
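The Langevin system (11) can be simulated with a standard Euler–Maruyama discretisation of the momentum equation; the sketch below does this for the landmark Hamiltonian (7), with illustrative values for the noise amplitude and the dissipation θ. Setting θ = 0 and using one noise per landmark recovers the Lagrangian model (18) discussed later in the chapter.

```python
import numpy as np

def gaussian_kernel(qi, qj, sigma=0.5):
    return np.exp(-np.sum((qi - qj) ** 2) / (2 * sigma ** 2))

def kernel_grad(qi, qj, sigma=0.5):
    return -(qi - qj) / sigma ** 2 * gaussian_kernel(qi, qj, sigma)

def langevin_step(q, p, dt, noise=0.1, theta=0.5, rng=None, sigma=0.5):
    # One Euler-Maruyama step of the Langevin dynamics (11):
    # dq/dt = dH0/dp,  dp = -dH0/dq dt - noise dW - theta dH0/dp dt
    rng = np.random.default_rng() if rng is None else rng
    n = q.shape[0]
    dHdp = np.zeros_like(p)   # velocity of each landmark
    dHdq = np.zeros_like(q)   # force term from the kernel interaction
    for i in range(n):
        for j in range(n):
            dHdp[i] += gaussian_kernel(q[i], q[j], sigma) * p[j]
            dHdq[i] += np.dot(p[i], p[j]) * kernel_grad(q[i], q[j], sigma)
    dW = rng.normal(scale=np.sqrt(dt), size=p.shape)
    q_new = q + dt * dHdp
    p_new = p + dt * (-dHdq - theta * dHdp) - noise * dW
    return q_new, p_new

rng = np.random.default_rng(0)
q = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
p = np.array([[0.0, 1.0], [0.0, 0.5], [0.0, 0.1]])
for _ in range(100):
    q, p = langevin_step(q, p, dt=0.01, rng=rng)
```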
Riemannian Brownian Motion
We have so far discussed the noise from a Hamiltonian perspective with the
Hamiltonian being the kinetic energy coming from a Riemannian metric on the
landmark space. We now use the Riemannian metric on S directly to define
infinitesimal stochastic perturbations which are identically distributed and have
isotropic variance, thereby generating the Riemannian Brownian motion. In infinite-
dimensional models, noise with equal variance in all dimensions has infinite
magnitude. Hence, Brownian motion in its direct form is defined only on finite
dimensional manifolds. In this case, the resulting process is well defined up to a
possible explosion time.
Let the shape space S be a finite-dimensional Riemannian manifold with dimension m and let g denote the Riemannian metric. The Laplace-Beltrami operator $\Delta$ on S is given by $\Delta f = \nabla \cdot \nabla f$, where the divergence applied to a vector field $X = a^i \frac{\partial}{\partial q^i}$ is defined as $\nabla \cdot X = \frac{1}{\sqrt{g}} \frac{\partial(\sqrt{g}\, a^i)}{\partial q^i}$ and $\nabla f$ is the Riemannian gradient of the function $f : S \to \mathbb{R}$. Riemannian Brownian motion is a diffusion process on S
with generator Δ/2. There are various characterisations of the Brownian motion
and various ways to construct the process; see, e.g. Emery (1989) and Hsu (2002). If
Q(t) is a Brownian motion, its density at time t > 0 with respect to the Riemannian
volume form satisfies the heat equation:
$$\partial_t p(t, q) = \frac{1}{2}\, \Delta p(t, q), \qquad q \in S. \tag{15}$$
$$dQ^i(t) = -\frac{1}{2}\, g(Q(t))^{kl}\, \Gamma^i(Q(t))_{kl}\, dt + \Big( \sqrt{g(Q(t))^{-1}}\, dW(t) \Big)^i, \tag{16}$$
$$dQ^i(t) = -\frac{1}{2}\, K(Q^k_t, Q^l_t)\, \Gamma^i(Q(t))_{kl}\, dt + \Big( \sqrt{K(Q_t, Q_t)}\, dW(t) \Big)^i. \tag{17}$$
Figure 4 shows a sample path from the landmark Riemannian Brownian motion.
This process has been used in Staneva and Younes (2017) for analysis of stochastic
landmark trajectories with continuous observations and in Sommer et al. (2017) with
a Brownian bridge simulation to perform statistical estimation on landmark spaces
with discrete time observations.
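A direct, if brute-force, way to simulate (17) is to evaluate the landmark cometric K(q), obtain the metric g = K⁻¹ and its Christoffel symbols by finite differences, and apply Euler–Maruyama. The sketch below does this for a small configuration; the kernel width, step size and positions are illustrative, and the finite-difference Christoffel computation is only meant to make the drift term of (17) concrete.

```python
import numpy as np

def cometric(x, n, d, sigma=0.5):
    # Landmark cometric K(q): block matrix of Gaussian kernel values, eq. (9)
    q = x.reshape(n, d)
    K = np.zeros((n * d, n * d))
    for i in range(n):
        for j in range(n):
            k = np.exp(-np.sum((q[i] - q[j]) ** 2) / (2 * sigma ** 2))
            K[i * d:(i + 1) * d, j * d:(j + 1) * d] = k * np.eye(d)
    return K

def christoffel(x, n, d, eps=1e-5):
    # Christoffel symbols of the metric g = K^{-1}, by central finite differences
    dim = n * d
    ginv = cometric(x, n, d)
    dg = np.zeros((dim, dim, dim))  # dg[k, m, l] = d g_{ml} / d x^k
    for k in range(dim):
        e = np.zeros(dim)
        e[k] = eps
        gp = np.linalg.inv(cometric(x + e, n, d))
        gm = np.linalg.inv(cometric(x - e, n, d))
        dg[k] = (gp - gm) / (2 * eps)
    # Gamma[i, k, l] = 0.5 * ginv[i, m] (dg[k, m, l] + dg[l, m, k] - dg[m, k, l])
    gamma = 0.5 * np.einsum('im,kml->ikl', ginv,
                            dg + dg.transpose(2, 1, 0) - dg.transpose(1, 0, 2))
    return gamma, ginv

def brownian_step(x, n, d, dt, rng):
    # One Euler-Maruyama step of the Riemannian Brownian motion (17)
    gamma, ginv = christoffel(x, n, d)
    drift = -0.5 * np.einsum('kl,ikl->i', ginv, gamma)
    noise = np.linalg.cholesky(ginv) @ rng.normal(size=n * d)
    return x + drift * dt + np.sqrt(dt) * noise

rng = np.random.default_rng(0)
n, d = 3, 2
x = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]).ravel()
for _ in range(50):
    x = brownian_step(x, n, d, dt=0.001, rng=rng)
```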
Lagrangian Noise
Upon using this formulation without dissipation (θ = 0) for n landmarks, with the Hamiltonian H0 in (7) and the generalisation of H1 to n ‘stochastic Hamiltonians’ $H_i = \sigma_i \cdot q_i$, we arrive at the model of Trouve and Vialard (2012) and Vialard (2013),
written explicitly as
$$dq_i = \sum_{j=1}^n K(q_i, q_j)\, p_j\, dt, \qquad dp_i = -\sum_{j=1}^n p_i^T\, \partial_{q_i} K(q_i, q_j)\, p_j\, dt - \sigma_i\, dW_i, \tag{18}$$
where we considered a different Wiener process Wi for each Hi . See Fig. 5 for an
illustration of this noise. This noise has a Lagrangian flavour to it, because each
noise Wi is associated with a single landmark (qi , pi ). This Lagrangian formulation
of stochastic landmark dynamics has ‘smooth’ trajectories in space, as the noise
only appears in the momentum equation. Thus, the paths have the regularity of
the integral of the Wiener processes Wi . In addition, landmarks can cross each
other under the influence of the noise, which violates one of the properties of the
deterministic equations. See Holm and Tyranowski (2016) for a numerical study of
these landmark crossings. Finally, it is interesting to note that in the limit of infinitely
many particles, that is, for infinite-dimensional shapes, this noise persists in the form of cylindrical Brownian motion. See Vialard (2013) for more detail.
Eulerian Noise
where the functions σl are noise fields on Ω, and the number of Wiener processes may differ from the number of landmarks. With (19), the corresponding
stochastic Hamiltonian equation is more complex as it is multiplicative in both
equations. It explicitly reads
$$dq_i = \sum_{j=1}^n K(q_i, q_j)\, p_j\, dt + \sum_l \sigma_l(q_i) \circ dW(t)^l, \qquad dp_i = -\sum_{j=1}^n p_i^T\, \partial_{q_i} K(q_i, q_j)\, p_j\, dt - \sum_l p_i \cdot \frac{\partial}{\partial q_i} \sigma_l(q_i) \circ dW(t)^l, \tag{20}$$
In particular, the spatial correlation of the noise can be fully characterised by the set of functions σl, independently of the number of landmarks. As we will see later, this property is crucial for any inverse problem of learning the noise or estimating the uncertainty of a matching problem from data. Notice that if the noise fields σl are taken to be constant, this model simplifies to landmarks with exactly the same constant additive noise in the position equation, corresponding to a global random displacement of the domain; it does not reduce to the Lagrangian model. The noise fields and
a sample from the stochastic evolution are visualised in Fig. 6.
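For the Eulerian model (20), the Stratonovich integrals can be approximated with a Heun (predictor–corrector) scheme. The sketch below uses a small grid of Gaussian noise fields σ_l of the form σ_l(x) = λ_l exp(−|x − c_l|²/(2τ²)); the centres, amplitudes, widths and step size are all illustrative assumptions, not values from the chapter.

```python
import numpy as np

def kernel(a, b, sig=0.5):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sig ** 2))

def kernel_grad(a, b, sig=0.5):
    return -(a - b) / sig ** 2 * kernel(a, b, sig)

def drift(q, p, sig=0.5):
    # Deterministic part of (20): the landmark Hamiltonian vector field (8)
    n = len(q)
    dq, dp = np.zeros_like(q), np.zeros_like(p)
    for i in range(n):
        for j in range(n):
            dq[i] += kernel(q[i], q[j], sig) * p[j]
            dp[i] -= np.dot(p[i], p[j]) * kernel_grad(q[i], q[j], sig)
    return dq, dp

def noise_fields(q, p, centers, lams, tau=0.4):
    # Diffusion part of (20) for sigma_l(x) = lam_l * exp(-|x - c_l|^2 / (2 tau^2))
    L = len(centers)
    dq = np.zeros((L,) + q.shape)
    dp = np.zeros((L,) + p.shape)
    for l in range(L):
        for i in range(len(q)):
            phi = np.exp(-np.sum((q[i] - centers[l]) ** 2) / (2 * tau ** 2))
            grad_phi = -(q[i] - centers[l]) / tau ** 2 * phi
            dq[l, i] = lams[l] * phi
            dp[l, i] = -np.dot(p[i], lams[l]) * grad_phi
    return dq, dp

def heun_step(q, p, dt, centers, lams, rng):
    # One Heun (Stratonovich) step of the Eulerian-noise dynamics (20)
    dW = rng.normal(scale=np.sqrt(dt), size=len(centers))
    def increment(q_, p_):
        aq, ap = drift(q_, p_)
        sq, sp = noise_fields(q_, p_, centers, lams)
        return aq * dt + np.einsum('l,lij->ij', dW, sq), ap * dt + np.einsum('l,lij->ij', dW, sp)
    dq1, dp1 = increment(q, p)
    dq2, dp2 = increment(q + dq1, p + dp1)                 # predictor
    return q + 0.5 * (dq1 + dq2), p + 0.5 * (dp1 + dp2)    # corrector

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 4)
centers = np.array([[x, y] for x in grid for y in grid])   # 4 x 4 grid of noise fields
lams = 0.05 * np.ones((len(centers), 2))                   # constant field directions
q = np.array([[0.1, 0.1], [0.5, 0.2], [0.9, 0.1]])
p = np.array([[0.0, 0.8], [0.0, 0.4], [0.0, 0.2]])
for _ in range(100):
    q, p = heun_step(q, p, dt=0.01, centers=centers, lams=lams, rng=rng)
```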
These two models are therefore different in nature, and their specific properties make them suited to different applications. For example, the Eulerian model results in noisy trajectories of landmarks, which requires more care in the numerical integration with Stratonovich noise and some hypotheses about the form of the noise fields σl. On the contrary, the Lagrangian noise is simpler to integrate and, in its simplest form, may only require a single constant to parametrise the noise amplitude of all landmarks.
Stochastic Euler-Poincaré Reduction and Its Infinite Dimensional Extension
As mentioned in the previous section, only the Eulerian model for stochastic
landmark dynamics can be generalised to other shape spaces. This is because
this form of the stochastic Hamiltonian is compatible with reduction by sym-
metry, which provides the tools to compute the geodesic equations for general
shape spaces, and in particular landmarks, as in (8). Here, we outline Euler-
Poincaré reduction for diffeomorphisms (Holm and Marsden 2003) leading to the
EPDiff equations which provide the basis for the stochastic Euler-Poincaré theory
used in section “Stochastic Euler-Poincaré Reduction and Its Infinite Dimensional
Extension”. The main point is that because the Riemannian metric defined above
is invariant under right translations, the geodesic equations can be solved on
V ⊂ X(Ω) and then subsequently reconstructed to give a path of diffeomorphisms φ(t), by using the flow equation (2). This is an example of reduction by symmetry.
Reduction by Symmetry
From Eq. (1), without the dissimilarity term, the matching problem corresponds to
the variational principle:
$$\delta E(\phi(t)) = \delta\, \frac{1}{2} \int_0^1 \ell(v(t))\, dt = 0, \tag{21}$$
using the Lagrangian $\ell$ and with variations δφ(t) of the curve φ(t) that vanish at the endpoints $\phi_0$ and $\phi_1$.
Regarding Diff(Ω) as a group, we let Rφ be the right translation Rφ (ψ) = ψ ◦ φ.
The derivative with respect to ψ is the pushforward (Rφ )∗ . Particularly, (Rφ −1 )∗
is a mapping from the tangent space Tφ Diff(Ω) to TId Diff(Ω). The latter may
be identified with the Lie algebra of smooth vector fields, X(Ω). Let w(t) :=
(Rφ(t)−1 )∗ δφ(t) be the right translations of the variation δφ(t) for each t. The
variation δv(t) of v(t) is related to w(t) by the equality
$$\delta v(t) - \frac{d}{dt} w(t) = [v, w] = -\mathrm{ad}_v\, w, \tag{22}$$
in terms of the Lie algebra adjoint map ad, where the bracket [v, w] is the Jacobi–Lie
bracket between vector fields.
Let $\delta \ell / \delta v$ denote the variational derivative of the Lagrangian with respect to $v$, and let $\big\langle \frac{\delta \ell}{\delta v}, \delta v \big\rangle := \delta \ell(v)$ denote the pairing of this with a variation of $v$. Since $\delta E(\phi)$ vanishes for all such variations from (21), one obtains
$$\frac{d}{dt} \frac{\delta \ell}{\delta v} + \mathrm{ad}^*_v\, \frac{\delta \ell}{\delta v} = 0. \tag{23}$$
These are called the Euler-Poincaré equations when derived for general Lie groups.
They are called the EPDiff equations in the present case, when the group is Diff(Ω).
Here, the Lagrangian only contains a kinetic energy and reads
$$\ell(v) = \int_\Omega v \cdot Lv\, dx, \tag{24}$$
where L = K −1 is the operator associated with the kernel K discussed earlier. For
vector fields, the commutator is given by $[u, v] = (u \cdot \nabla_x)\, v - (v \cdot \nabla_x)\, u$. Thus, in terms of the momentum variable m = Lv, the EPDiff equation reads:
$$\dot{m} = -\mathrm{ad}^*_{\delta h / \delta m}\, m, \tag{26}$$
$$h(m) = \int_\Omega m \cdot Km\, dx. \tag{27}$$
Stochastic EPDiff
where σl are, as before, fields on the image domain Ω. The corresponding stochastic
EPDiff is then the stochastic differential equation:
$$dm = \mathrm{ad}^*_{\delta h / \delta m}\, m\, dt + \sum_l \mathrm{ad}^*_{\delta h_l / \delta m}\, m \circ dW_l. \tag{29}$$
$$dm = \big( (u \cdot \nabla_x)\, m + m (\nabla_x u)^T + (\nabla_x \cdot u)\, m \big)\, dt + \sum_l \big( (\sigma_l \cdot \nabla_x)\, m + m (\nabla_x \sigma_l)^T + (\nabla_x \cdot \sigma_l)\, m \big) \circ dW_l. \tag{30}$$
$$m(x, t) = \sum_i p_i(t)\, \delta(x - q_i(t)). \tag{31}$$
Once substituted in (30), this singular representation yields the stochastic landmark
equations (20). We refer the interested reader to Arnaudon et al. (2018b) and Kühnel
et al. (2018) for more detailed treatments of the stochastic EPDiff equation in the
context of shape analysis.
Other Stochastic Shape Models
Stochastic extensions of landmark and image dynamics are also considered in the context of Brownian flows in the sense of Kunita (1997). Here, infinite
dimensional stochastic noise is added to the flow equation (2) resulting in the SDE:
$$d\phi(t)(x) = v(\phi(t)(x), t)\, dt + \sum_{i=1}^\infty f_i(\phi(t)(x), t)\, dW^i(t). \tag{32}$$
Stochastic Models in Shape Statistics
The stochastic shape models described in the previous section appear in applications
when estimating the noise structure along a shape trajectory observed at multiple
time points, along ensembles of observed shape trajectories, and for modelling
probability distributions on the nonlinear shape space. Here, we describe examples
of such applications.
From the Eulerian stochastic model, the noise fields σl are to be determined to
obtain relevant stochastic dynamics. One possibility is to infer them from some
distributions of shapes, assumed to be samples from a single choice of the noise
fields. To solve this inverse problem, one first simplifies the search space by
parametrising a tractable number of noise fields, with, for example, Gaussian
kernels, and tries to infer these parameters.
In Arnaudon et al. (2019a), the evolution of the first moments of the probability
distributions of landmarks on position and momenta was first computed, and the
mean and variance of the moments associated with positions were used to match the
observed distributions of shapes. This algorithm requires one to derive and solve a
set of coupled ordinary differential equations approximating the Fokker-Planck equation for the
probability distributions and implement a shooting algorithm on the initial momenta
for the mean, variance and the noise parameters. As demonstrated in Arnaudon et al.
(2019a), this method gives accurate results when landmarks do not interact much on
noisy regions of the image or, equivalently, when the moment approximation is most
accurate.
The same method has been applied to the EPDiff equation (30), after the
application of an appropriate spatial discretisation in low-frequency Fourier modes
(Kühnel et al. 2018), following Zhang and Fletcher (2015). This algorithm works
because of the control one has on the spatial correlation of the noise. Indeed,
by choosing noise fields as Fourier modes, discarding the highest frequency
components of the dynamics to reduce the dimensionality of the problem only
restricts the number of noise fields that can be inferred.
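As a toy illustration of the moment-matching idea (not the full Fokker–Planck-based algorithm of Arnaudon et al. 2019a), one can estimate a single global noise amplitude by matching the empirical variance of simulated landmark endpoints to that of observed endpoints, for instance with a simple grid search. Everything below (the simulator, the observed data, the candidate amplitudes) is a placeholder to be supplied by the user.

```python
import numpy as np

def endpoint_variance(simulate, amplitude, n_samples=200, seed=0):
    # Monte Carlo estimate of the total variance of the landmark positions at time 1
    rng = np.random.default_rng(seed)
    ends = np.array([simulate(amplitude, rng) for _ in range(n_samples)])
    return ends.var(axis=0).sum()

def fit_noise_amplitude(simulate, observed_endpoints, candidates):
    # Pick the amplitude whose simulated endpoint variance best matches the data
    target = observed_endpoints.var(axis=0).sum()
    errors = [abs(endpoint_variance(simulate, a) - target) for a in candidates]
    return candidates[int(np.argmin(errors))]

# `simulate(amplitude, rng)` is assumed to return the landmark configuration at
# time 1 for one realisation of the chosen stochastic model (e.g. the Eulerian
# model (20) with all noise fields scaled by `amplitude`).
```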
$$L(\theta;\, q_1, \ldots, q_N) = \prod_{i=1}^N p_T(q_i). \tag{33}$$
where W (t) is a Euclidean Brownian motion, e.g. (17). We are now interested in
conditioning Q(t) on hitting a point v at time T , thinking of v as a data point that
could be a sample from Q(T ). The conditioned process Q∗ = Q|Q(T ) = v has
the SDE
$$dQ^*(t) = b(Q^*(t), t)\, dt + \sigma(Q^*(t), t)\, \sigma(Q^*(t), t)^T\, \nabla \log p_{T-t}(v; Q^*(t))\, dt + \sigma(Q^*(t), t)\, dW(t), \tag{35}$$
where pT −t (v; Q∗ (t)) is the transition density of the process started at Q∗ (t),
running for time T − t and evaluated at v. However, this SDE can generally
not be used for numerical simulations since the gradient of the log-transition
density ∇ log pT −t (v; Q∗ (t)) in the added drift term is generally not available. To
circumvent this, Delyon and Hu (2006) introduced the idea of guided proposals
approximating the bridge SDE (35) by
$$dY(t) = b(Y(t), t)\, dt - \frac{Y(t) - v}{T - t}\, dt + \sigma(Y(t), t)\, dW(t). \tag{36}$$
$$E_{Q | Q(T) = v}\big[ f(Q(t)) \big] = \frac{E_Y\big[ f(Y(t))\, \varphi(Y(t)) \big]}{E_Y\big[ \varphi(Y(t)) \big]}, \tag{37}$$
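The guided proposal (36) is straightforward to simulate with Euler–Maruyama, since the additional drift only involves the observation v and the remaining time. The sketch below does so for generic user-supplied drift and diffusion functions b and σ; it simulates the proposal only and does not compute the correction factor ϕ that appears in the identity (37). The example drift and diffusion at the bottom are placeholders.

```python
import numpy as np

def guided_proposal(b, sigma, y0, v, T=1.0, steps=100, rng=None):
    """Euler-Maruyama simulation of the guided bridge proposal (36).

    b(y, t) and sigma(y, t) are the drift vector and diffusion matrix of the
    original process; v is the point the bridge is guided towards at time T.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / steps
    y = np.array(y0, dtype=float)
    path = [y.copy()]
    for k in range(steps - 1):          # stop one step before T to avoid division by zero
        t = k * dt
        dW = rng.normal(scale=np.sqrt(dt), size=y.shape)
        y = y + b(y, t) * dt - (y - v) / (T - t) * dt + sigma(y, t) @ dW
        path.append(y.copy())
    path.append(np.array(v, dtype=float))  # the guided proposal reaches v at time T
    return np.array(path)

# Example with a simple 2D process (placeholder drift and diffusion):
b = lambda y, t: -y                     # mean-reverting drift
sigma = lambda y, t: 0.3 * np.eye(2)    # constant isotropic diffusion
path = guided_proposal(b, sigma, y0=[0.0, 0.0], v=[1.0, 1.0])
```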
Bridge sampling allows one to approximate the likelihood (33) using (38). It remains
to maximise L(θ ; q1 , . . . , qN ) with respect to the parameters θ . In Sommer et al.
(2017), a simple stochastic gradient optimisation scheme is used for maximising
the likelihood on the landmark manifold with the Riemannian Brownian motion.
The parameters are here the starting point q(0) of the process and parameters of the
kernel matrix determining the metric of the manifold. Such gradient-based schemes
need approximations of the gradient $\nabla_{\theta_l} L(\theta_l;\, q_1, \ldots, q_N)$. In the deterministic setting, the gradient with respect to the initial conditions of the energy (1) can be computed using the adjoint equations of the Hamiltonian system. It is often not feasible in practice to derive similar systems for the gradient of (33) or (38).
Instead, modern automatic differentiation can be used to compute gradients of the
entire numerical simulation scheme used for the stochastic integration of (q, p)
and ϕ.
This idea of using automatic differentiation of stochastic geometric systems was
first pursued in Arnaudon et al. (2017) using the framework Theano Geometry (Küh-
nel et al. 2019) (http://bitbucket.org/stefansommer/theanogeometry). Recently, it
has been extended to general stochastic Hamiltonian systems including the ones
discussed in this chapter using the automatic differentiation features of Julia
(Arnaudon et al. 2020). The use of automatic differentiation for shape and general
geometric computations has furthermore been treated in Kuhnel and Sommer (2017)
and Kühnel et al. (2019) using the Theano framework, in the Geomstats library
(http://geomstats.ai), in KeOps (https://www.kernel-operations.io/) and Deformet-
rica (http://www.deformetrica.org/).
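The following sketch illustrates the principle with PyTorch rather than the Theano or Julia frameworks cited above: the whole Euler–Maruyama integration of a simple landmark SDE with additive momentum noise is written with differentiable operations, so the gradient of an endpoint discrepancy with respect to the initial momenta and the noise amplitude is obtained by automatic differentiation. Model, loss and parameter values are illustrative only.

```python
import torch

def kernel(q, sigma=0.5):
    # Pairwise Gaussian kernel matrix of the landmark configuration q (n x d)
    d2 = ((q[:, None, :] - q[None, :, :]) ** 2).sum(-1)
    return torch.exp(-d2 / (2 * sigma ** 2))

def simulate(q0, p0, noise, steps=50, dt=0.02, seed=0):
    # Euler-Maruyama integration written entirely with differentiable torch ops
    torch.manual_seed(seed)
    q, p = q0, p0
    for _ in range(steps):
        K = kernel(q)
        dq = K @ p
        # dH0/dq obtained by autodiff of H0 = 0.5 * sum_ij K_ij <p_i, p_j>
        H = 0.5 * (p @ p.T * K).sum()
        dp = -torch.autograd.grad(H, q, create_graph=True)[0]
        dW = torch.randn_like(p) * dt ** 0.5
        q = q + dq * dt
        p = p + dp * dt + noise * dW
    return q

q0 = torch.tensor([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]], requires_grad=True)
p0 = torch.tensor([[0.0, 1.0], [0.0, 0.5], [0.0, 0.1]], requires_grad=True)
noise = torch.tensor(0.1, requires_grad=True)
target = torch.tensor([[0.0, 1.0], [0.5, 0.6], [1.0, 0.2]])

loss = ((simulate(q0, p0, noise) - target) ** 2).sum()
grad_p0, grad_noise = torch.autograd.grad(loss, [p0, noise])
```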
The paper additionally links the momentum map representation of images (Bruveris
et al. 2009) with the stochastic EPDiff model. The stochastic EPDiff models have
been used for medical imaging and computational anatomy in Arnaudon et al.
(2017), for example, for modelling variations in the human corpus callosum.
The metamorphosis framework (Trouve and Vialard 2012) combines variations
in shapes arising from deformations with variations in the data itself, e.g. pixel
intensity variation in images. The stochastic EPDiff models have been extended
to include metamorphosis in Holm (2017) and Arnaudon et al. (2019b).
In Holm (2020), stochasticity in the Lagrangian and Eulerian reference frames
is coupled via a momentum map. This results in a multi-scale flow with two
interpenetrating degrees of freedom coupled by two different forms of stochasticity.
This interpenetration approach allows perturbations which are not simply attached
to the flow. Instead, they can propagate relative to the flow. The model has been
used as a framework for investigating wave-current interaction in the dynamics of
ocean-atmosphere coupling, and we expect it to also be relevant in the context of
stochastic shape analysis.
Finally, one needs to mention rough path theory, or rough flow theory, a powerful
method of dealing with the dynamics of highly oscillatory nonlinear systems. Rough
flow theory transcends Itô and Stratonovich stochastic calculus, by providing an
almost sure pathwise definition of the solution of a stochastic partial differential
equation. The landmark trajectories in the stochastic framework are described by
stochastic integrals which do not have a pathwise interpretation. The rough flow
treatment restores this property. Moreover, a rough flow solution is well posed
in the sense of convergence to a sequence of smooth flows in the p-variation
metric, as described, e.g. in Friz and Victoir (2010). In contrast, solutions in
Itô and Stratonovich stochastic calculus converge only weakly; namely, they
converge in the sense of the L2 norm. Thus, rough flow theory transcends Itô and
Stratonovich stochastic calculus on semimartingale flows by admitting partial dif-
ferential equations driven by non-semimartingale flows, such as Gaussian processes
and Markov processes defined on Banach spaces which are neither differentiable
nor of bounded variation (Friz and Victoir 2010). Moreover, rough flows comprise
a natural basis for functions on data streams that can be used for machine learning
(Lyons 2014).
By using the theory of controlled rough paths (Gubinelli 2004), one may
derive a class of rough EPDiff equations for shape analysis as critical points of
a rough action functional. The rough variational approach to EPDiff considerably
enhances the stochastic variational approach. For example, the rough flow driven
variational approach admits non-Markovian perturbations. Memory effects can
also be introduced into this approach through a judicious choice of the driving
rough flows. In particular, one may choose these models to characterise landmark
trajectories in shape analysis as time-dependent geometric rough paths (GRP) on
the manifold of diffeomorphic maps. For a parallel derivation of Euler–Poincaré
equations on GRP for applications in fluid dynamics, see Crisan et al. (2020).
References
Arnaudon, A., Holm, D.D., Pai, A., Sommer, S.: A Stochastic large deformation model for
computational anatomy. In: Information Processing in Medical Imaging. Lecture Notes in
Computer Science, pp. 571–582. Springer (2017). https://doi.org/10.1007/978-3-319-59050-
9_45
Arnaudon, A., De Castro, A.L., Holm, D.D.: Noise and dissipation on coadjoint orbits. J. Nonlinear
Sci. 28(1), 91–145 (2018a)
Arnaudon, A., Holm, D., Sommer, S.: String methods for stochastic image and shape matching. J.
Math. Imaging Vis. 60(6), 953–967 (2018b). https://doi.org/10.1007/s10851-018-0823-z
Arnaudon, A., Holm, D.D., Sommer, S.: A geometric framework for stochastic shape analysis.
Found. Comput. Math. 19(3), 653–701 (2019a). https://doi.org/10.1007/s10208-018-9394-z
Arnaudon, A., Holm, D.D., Sommer, S.: Stochastic metamorphosis with template uncertainties.
Math. Shapes Appl. 37, 75 (2019b)
Arnaudon, A., van der Meulen, F., Schauer, M., Sommer, S.: Diffusion Bridges for Stochas-
tic Hamiltonian Systems with Applications to Shape Analysis. arXiv:2002.00885 [physics]
(2020)
Bauer, M., Bruveris, M., Michor, P.W.: Overview of the geometries of shape spaces and
diffeomorphism groups. J. Math. Imaging Vis. 50(1–2), 60–97 (2014). https://doi.org/10.1007/
s10851-013-0490-z
Bruveris, M., Gay-Balmaz, F., Holm, D.D., Ratiu, T.S.: The Momentum Map Representation of
Images. arXiv:0912.2990 (2009)
Budhiraja, A., Dupuis, P., Maroulas, V.: Large deviations for stochastic flows of diffeomorphisms.
Bernoulli 16(1), 234–257 (2010). https://doi.org/10.3150/09-BEJ203
Christensen, G., Rabbitt, R., Miller, M.: Deformable templates using large deformation kinematics.
Image Process. IEEE Trans. 5(10), 1435–1447 (1996)
Crisan, D., Holm, D.D., Leahy, J.M., Nilssen, T.: A Variational Principle for Fluid Dynamics on
Geometric Rough Paths. arXiv preprint arXiv:2005.09348 (2020)
Delyon, B., Hu, Y.: Simulation of conditioned diffusion and application to parameter estimation.
Stoch. Process. Appl. 116(11), 1660–1675 (2006). https://doi.org/10.1016/j.spa.2006.04.004
Weinan, E., Ren, W., Vanden-Eijnden, E.: Finite temperature string method for the study of rare
events. J. Phys. Chem. B 109(14), 6688–6693 (2005). https://doi.org/10.1021/jp0455430
Emery, M.: Stochastic Calculus in Manifolds. Universitext. Springer, Berlin/Heidelberg (1989)
Friz, P.K., Victoir, N.B.: Multidimensional Stochastic Processes as Rough Paths: Theory and
Applications, vol. 120. Cambridge University Press, Cambridge/New York (2010)
Grenander, U.: General Pattern Theory: A Mathematical Study of Regular Structures. Oxford
University Press, Oxford, UK (1994)
Grenander, U., Miller, M.I.: Computational anatomy: an emerging discipline. Q. Appl. Math.
LVI(4), 617–694 (1998)
Gubinelli, M.: Controlling rough paths. J. Funct. Anal. 216(1), 86–140 (2004)
Holm, D.D.: Geometric Mechanics – Part II: Rotating, Translating and Rolling, 2nd edn. Imperial
College Press, London/Hackensack (2011)
Holm, D.D.: Variational principles for stochastic fluid dynamics. Proc. Math. Phys. Eng. Sci./R.
Soc. 471(2176) (2015). https://doi.org/10.1098/rspa.2014.0963
Holm, D.D.: Stochastic Metamorphosis in Imaging Science. arXiv:1705.10149 [math-ph] (2017)
Holm, D.D.: Variational Formulation of Stochastic Wave-Current Interaction (SWCI).
arXiv:2002.04291 [math-ph, physics:physics] (2020)
Holm, D.D., Marsden, J.E.: Momentum Maps and Measure-Valued Solutions (Peakons, Filaments
and Sheets) for the EPDiff Equation. arXiv:nlin/0312048 (2003)
Holm, D.D., Tyranowski, T.M.: Variational principles for stochastic soliton dynamics. Proc. R.
Soc. A 472(2187), 20150827 (2016). https://doi.org/10.1098/rspa.2015.0827
Holm, D.D., Ratnanather, J.T., Trouvé, A., Younes, L.: Soliton dynamics in computa-
tional anatomy. NeuroImage 23, S170–S178 (2004). nlin/0411014. https://doi.org/10.1016/j.
neuroimage.2004.07.017
Hsu, E.P.: Stochastic Analysis on Manifolds. American Mathematical Society, Boston, MA (2002)
Kuhnel, L., Sommer, S.: Computational anatomy in theano. In: Mathematical Foundations of
Computational Anatomy (MFCA) (2017)
Kühnel, L., Arnaudon, A., Fletcher, T., Sommer, S.: Stochastic Image Deformation in Frequency
Domain and Parameter Estimation Using Moment Evolutions. arXiv:1812.05537 [cs, math, stat]
(2018)
Kühnel, L., Sommer, S., Arnaudon, A.: Differential geometry and stochastic dynamics with deep
learning numerics. Appl. Math. Comput. 356, 411–437 (2019). https://doi.org/10.1016/j.amc.
2019.03.044
Kunita, H.: Stochastic Flows and Stochastic Differential Equations. Cambridge University Press,
Cambridge (1997)
Lyons, T.: Rough paths, signatures and the modelling of functions on streams. arXiv preprint
arXiv:1405.4537 (2014)
Intrinsic Riemannian Metrics on Spaces of Curves: Theory and Computation
39
Martin Bauer, Nicolas Charon, Eric Klassen, and Alice Le Brigant
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1350
Matching of Geometric Curves Based on Reparametrization-Invariant
Riemannian Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351
General Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351
The SRV Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1359
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1371
The Geodesic Boundary Value Problem on Parametrized Curves . . . . . . . . . . . . . . . . . . . 1371
Normalization by Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1372
Minimization over the Reparametrization Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1372
Open-Source Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1380
M. Bauer · E. Klassen
Department of Mathematics, Florida State University, Tallahassee, FL, USA
e-mail: [email protected]; [email protected]
N. Charon ()
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
e-mail: [email protected]
A. Le Brigant
Department of Applied Mathematics, University Paris 1, Paris, France
e-mail: [email protected]
Abstract
This chapter reviews some past and recent developments in shape comparison and analysis of curves based on the computation of intrinsic Riemannian metrics on the space of curves modulo shape-preserving transformations. We summarize
the general construction and theoretical properties of quotient elastic metrics for
Euclidean as well as non-Euclidean curves before considering the special case of
the square root velocity metric for which the expression of the resulting distance
simplifies through a particular transformation. We then examine the different
numerical approaches that have been proposed to estimate such distances in
practice and in particular to quotient out curve reparametrization in the resulting
minimization problems.
Keywords
Introduction
Many applications that involve quantitative comparison and statistics over sets
of geometric objects like curves often rely on a certain notion of metric on the
corresponding shape space. Some of them, such as medical imaging or computer
vision, are concerned with the outline of an object, represented by a closed curve,
while others, such as trajectory analysis or speech recognition, consider open curves
drawing the evolution of a given time process in a certain space, say a manifold. In
both cases, it is often interesting when studying these curves to factor out certain
transformations (e.g., rotations, translations, reparametrizations), so as to study the
shape of the considered object, or to deal with the considered time process regardless
of speed or pace.
Beyond computing distances between shapes, a desirable goal in these appli-
cations is to perform statistical analysis on a set of shapes, e.g., to compute
the mean and perform classification or principal component analysis. For this
purpose, considering shapes as elements of a shape manifold that we equip with a
Riemannian structure provides a convenient framework. In this infinite-dimensional
shape manifold, points represent shapes, and the distance between two shapes is
given by the length of the shortest path linking them – the geodesic. This approach
allows us to do more than simply compute distances: it enables us to define the
notion of an optimal deformation between two shapes, and to locally linearize
the shape manifold using its tangent space. For instance, given a set of shapes,
one can perform methods of standard statistical analysis in the flat representation
space given by the tangent space at the barycenter.
The idea of a shape space as a Riemannian manifold was first developed
by Kendall (1984), who defines shapes as “what is left” of a curve after the effects
of translation and rotation and changes of scale are filtered out. Mathematically,
this means defining the shape space as a quotient space, where the choice of which
transformations to quotient out depends on the application. The shapes considered
by Kendall are represented by labeled points in Euclidean space, and the shape
spaces are finite-dimensional. More recent works deal with continuous curves with
values in a Euclidean space or a nonlinear manifold (Fig. 1), and thus with infinite-
dimensional shape spaces.
There exist two main complementary approaches to define the shape space and its
metric. One possibility is to deform shapes by diffeomorphisms of the entire ambient
space. In this setting, metrics are defined on the space of spatial deformations, and
are called extrinsic (or outer) metrics as developed in the works of Grenander
(1993), Trouvé (1998), and Beg et al. (2005) among other references. Another
approach consists in defining metrics directly on the space of curves itself, which are
thus called intrinsic (or inner) metrics. This chapter focuses on the second approach,
and studies inner metrics with certain invariance properties. We are specifically
interested in the invariance to shape-preserving transformations, in particular to the
action of temporal deformations, also called reparametrizations, which we represent
by diffeomorphisms of the parameter space ([0, 1] for open curves, S 1 for closed
curves). In the following sections, we will introduce a class of invariant Sobolev
metrics we call elastic on the space of immersed curves which in turn descend
to metrics on the space of shapes. These were initially studied in Michor and
Mumford (2005, 2007) and Mennucci et al. (2008) and in subsequent works. We
will then discuss in detail the particular case of the so-called “square root velocity”
(SRV) metric (Srivastava et al. 2011), a first-order invariant metric which allows for
particularly simple computations not only for curves in Euclidean spaces but also
curves with values in homogeneous spaces or even Riemannian manifolds. Finally,
we review different methods to factor out the action of the reparametrization group,
which, because of its infinite dimensionality, presents an important challenge in the
computation of distances and geodesics in this framework.
General Framework
Let D be either the interval I = [0, 1] or the circle $S^1$ and $(M, \langle \cdot, \cdot \rangle)$ a finite-dimensional Riemannian manifold with TM denoting its tangent bundle. In the following we introduce the central object of interest in this book chapter, the infinite-dimensional manifold of open (respectively, closed) curves,
$$\mathrm{Imm}(D, M) = \big\{ c \in C^\infty(D, M) : c'(u) \neq 0 \text{ for all } u \in D \big\},$$
which is a smooth Fréchet manifold with tangent space at c the set of $C^\infty$ vector fields along c, i.e.,
$$T_c \mathrm{Imm}(D, M) = \big\{ h \in C^\infty(D, TM) : \pi \circ h = c \big\}, \tag{2}$$
The main difficulties for understanding this result stem from the manifold
structure of the ambient space M. For the convenience of the reader, we note that
for M = Rd , the situation simplifies significantly: in that case Imm(D, Rd ) is an
open subset of the infinite-dimensional vector space C ∞ (D, Rd ), and thus tangent
vectors to Imm(D, Rd ) can be identified with smooth functions with values in Rd
as well. See Fig. 2 for a schematic explanation of the involved objects.
In most applications in shape analysis, one is not interested in the parametrized
curve itself, but only in its features after quotienting out the action of shape-
preserving transformations. Therefore, we introduce the reparametrization group of
the domain D:
$$\mathrm{Diff}^+(D) = \big\{ \gamma \in C^\infty(D, D) : \gamma \text{ is an orientation-preserving diffeomorphism} \big\}. \tag{3}$$
Similarly to the space of immersions, this space carries the structure of an infinite-
dimensional manifold. In fact it has even more structure, namely, it is an infinite-
dimensional Lie group (Hamilton 1982, Section 4). This group acts on the space of
immersed curves by composition from the right, and this action merely changes the
parametrization of the curve but not its actual shape. See Fig. 2 for an example of
different parametrizations of the same geometric curve.
Similarly, we can consider the left action of the group Isom(M) of isometries
of M on Imm(D, M). Note that the isometry group is always a finite-dimensional
group; e.g., for M = Rd , the group Isom(M) is generated by the set of translations
and linear isometries (In some applications one is also interested in modding out
the action of the scaling group, which requires a slight modification of the family of
elastic metrics. We will not discuss these details here, but refer the interested reader
to the literature, e.g., Bruveris and Møller-Andersen 2017.). Thus, the action of the
Fig. 1 Examples of geodesics on spaces of unparametrized curves w.r.t. elastic metrics (target
curve in red). Some intermediate curves c(t, ·) are shown in dashed line and the trajectory of a few
specific points in blue. Left figure: second-order Sobolev metric, estimated with the approach of
Bauer et al. (2019a), cf. section “Relaxation of the Exact Matching Problem”. Middle figure: SRV
metric for curves with values on homogeneous spaces as implemented in Su et al. (2018), where
the optimal reparametrization is estimated using dynamic programming; cf. sections “Curves in
Lie Groups” and “Dynamic Programming Approach”. Right figure: SRV metric for manifold-
valued curves in the hyperbolic plane, as implemented in Le Brigant (2019) with successive
horizontalizations; cf. section “Curves in Riemannian Manifolds”, method 1 and section “Iterative
“Horizontalization” Method”
Fig. 2 Left panel: Tangent vector field to a curve c(u) on the two-dimensional sphere M = S 2
(left) and its tangential and normal parts (right). Right panel: Two different parametrizations of the
same geometric curve
infinite-dimensional group Diff+ (D) is the most difficult to deal with, both from a
theoretical and an algorithmic viewpoint. This allows us now to introduce the shape
space of curves (To be mathematically exact, one should limit oneself to the slightly
smaller set of free immersions in this definition, as the quotient space has some mild
singularities without this restriction. We will, however, ignore this subtlety for the
purpose of this book chapter.)
$$S(D, M) := \mathrm{Imm}(D, M) \,/\, \big( \mathrm{Diff}^+(D) \times \mathrm{Isom}(M) \big). \tag{4}$$
Lemma 2 (Cervera et al. 1991 and Binz and Fischer 1981). The shape space
S(D, M) is a smooth Frechet manifold, and the projection p : Imm(D, M) →
S(D, M) is a smooth submersion.
This means specifically that the mapping p is Frechet-differentiable and that for
any c ∈ Imm(D, M), dp(c) is onto from Tc Imm(D, M) to T[c] S(D, M). The so-
called vertical space at c associated with the submersion is defined as Verc = {h ∈
Tc Imm(D, M) | dp(c) · h = 0}.
We aim to introduce Riemannian metrics on the shape space S(D, M) by
defining metrics on the space of parametrized curves that satisfy certain invariance
properties. In the literature these metrics are also referred to as elastic metrics, as
they account for both bending and stretching of the curve.
A Riemannian metric on Imm(D, M) is a smooth family of inner products
Gc (., .) on each tangent space Tc Imm(D, M), and we call such a metric G
reparametrization-invariant if it satisfies the relation
$$G_{c \circ \gamma}(h \circ \gamma, k \circ \gamma) = G_c(h, k) \quad \text{for all } \gamma \in \mathrm{Diff}^+(D),\ c \in \mathrm{Imm}(D, M),\ h, k \in T_c \mathrm{Imm}(D, M). \tag{5}$$
In the following we will introduce the class of Sobolev-type metrics. For the
convenience of the reader, we will first discuss the special case of a first-order metric
and M = Rd . We will then generalize this to the more complicated situation of
curves with values in general manifolds and more general metrics. For a curve c ∈
Imm(D, Rd ) and tangent vectors h, k ∈ C ∞ (D, Rd ), we let
$$G_c(h, k) = \int_D \Big( \langle h, k \rangle + \Big\langle \frac{h'}{|c'|}, \frac{k'}{|c'|} \Big\rangle \Big)\, |c'|\, du = \int_D \big( \langle h, k \rangle + \langle D_s h, D_s k \rangle \big)\, ds, \tag{6}$$
where the desired invariance follows directly by the substitution rule for integration. Here $D_s = \frac{\partial_u}{|c'|}$ and $ds = |c'|\, du$ denote differentiation and integration with respect to arclength. These definitions naturally generalize to curves with values in abstract manifolds by replacing the partial derivative $\partial_u$ in $D_s$ by the covariant derivative with respect to the curve velocity, $\nabla_{c'(u)}$. We will denote the induced differential operator as $\nabla_s = \frac{\nabla_{c'}}{|c'|}$.
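For planar polygonal curves, the metric (6) can be approximated by replacing derivatives with finite differences and the integral with a Riemann sum on the parameter grid. The sketch below is one such discretisation, assuming a uniform parameter spacing; it is meant only to make the formula concrete, not to reproduce any particular implementation from the literature.

```python
import numpy as np

def first_order_metric(c, h, k):
    """Discrete approximation of the first-order metric G_c(h, k) in (6).

    c, h, k are arrays of shape (N, d): a polygonal curve and two tangent
    vectors (vector fields along c) sampled on a uniform parameter grid.
    """
    du = 1.0 / (len(c) - 1)
    cp = np.diff(c, axis=0) / du          # c'(u) by forward differences
    hp = np.diff(h, axis=0) / du
    kp = np.diff(k, axis=0) / du
    speed = np.linalg.norm(cp, axis=1)    # |c'(u)|
    hm, km = 0.5 * (h[:-1] + h[1:]), 0.5 * (k[:-1] + k[1:])  # midpoint values
    integrand = (hm * km).sum(axis=1) + (hp * kp).sum(axis=1) / speed ** 2
    return np.sum(integrand * speed * du)  # integral against |c'| du

# Example: a discretised circle with two smooth deformation fields
u = np.linspace(0, 2 * np.pi, 100)
c = np.stack([np.cos(u), np.sin(u)], axis=1)
h = np.stack([np.sin(u), np.zeros_like(u)], axis=1)
k = np.stack([np.zeros_like(u), np.cos(u)], axis=1)
value = first_order_metric(c, h, k)
```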
Using this notation, a reparametrization-invariant Sobolev metric of order n on
the space of manifold valued curves can be defined via
$$G_c(h, k) = \sum_{i=0}^n \int_D \langle \nabla_s^i h, \nabla_s^i k \rangle_c\, ds. \tag{7}$$
More generally we can consider metrics that are defined by an abstract, positive,
pseudo-differential operator Lc , which satisfies the equivariance property Lc (h) ◦
γ = Lc◦γ (h ◦ γ ) for all reparametrizations γ , immersions c, and tangent vectors h.
The corresponding metric can then be written via
$$G_c(h, k) = \int_D \langle L_c(h), L_c(k) \rangle_c\, ds. \tag{8}$$
where a, b > 0 are constants and ⊥ and ⊤ denote the projections onto the normal (respectively, tangential) part of the tangent vector. Here normal and tangential are
calculated with respect to the foot-point curve c, as illustrated in Fig. 2.
As a next step, we will show that the invariance of the metric G will allow
us to define an induced metric on the shape space of unparametrized curves.
Before we are able to formulate this result, we review some basic facts on Rie-
mannian submersions. Therefore let (M, g1 ) and (N, g2 ) be two (possibly infinite
We want to emphasize here that this theorem is nontrivial in our setting: in finite
dimensions, the invariance of the Riemannian metric would always imply the
existence of a Riemannian metric on the quotient space, such that the projection is a
Riemannian submersion. In our infinite-dimensional situation, the proof is slightly
more delicate, as one has to show the existence of the horizontal bundle by hand.
This can be done by adapting a variant of Moser’s trick to the present setting.
For the reparametrization-invariant metrics studied in this chapter, the horizontality
condition requires one essentially to solve a differential equation of order 2n with n
being the order of the metric. In the case where one is only interested in factoring
out the reparametrization group, these two subspaces are given by
$$\mathrm{Ver}_c = \big\{ h = a.c' \in T_c \mathrm{Imm}(D, M) : a \in C^\infty(D, \mathbb{R}) \big\}, \tag{10}$$
$$\mathrm{Hor}_c = \big\{ k \in T_c \mathrm{Imm}(D, M) : G_c(k, a c') = 0 \text{ for all } a \in C^\infty(D, \mathbb{R}) \big\}; \tag{11}$$
see, e.g., Michor and Mumford (2007) and Bauer et al. (2011). If one wants to factor
out in addition the group of isometries of M, one has to change the definition of the
vertical and thus horizontal bundle accordingly. The exact formulas will depend on
the manifold M.
The above theorem allows us to develop algorithms on the quotient space
S(D, M) while performing most of the operations on the space of parametrized
curves. In the following, we discuss how to express the geodesic distance resulting
from the above Riemannian metric, which will serve as our similarity measure on
the space of shapes. We will first do this for parametrized curves and then in a
second step describe the induced distance on the space of geometric curves. For
parametrized curves c0 , c1 ∈ Imm(D, M), we have
$$\mathrm{dist}(c_0, c_1) = \inf \int_0^1 \sqrt{ G_c(\partial_t c, \partial_t c) }\, dt, \tag{12}$$
where the infimum has to be calculated over all paths c : [0, 1] → Imm(D, M) such
that c(0) = c0 and c(1) = c1 . In the following we will usually view paths of curves
as functions of two variables c(t, u) where t ∈ [0, 1] is the time variable along the
path and u ∈ D the curve parameter.
The induced geodesic distance on the quotient shape space S(D, M) can now be
calculated via
Note that this can be formulated as a joint optimization problem over the path of
curves c, the reparametrization function γ , and the isometry g ∈ Isom(M).
In finite dimensions, geodesic distance always gives rise to a true distance
function, i.e., it is symmetric, is positive, and satisfies the triangle inequality. On
the contrary, this can fail quite spectacularly in this infinite-dimensional situation,
as the geodesic distance can vanish identically on the space. This phenomenon was first found by Eliashberg and Polterovich for the $W^{-1,p}$-metric on the
symplectomorphism group (Eliashberg and Polterovich 1993). In the context of
reparametrization-invariant metrics on space of immersions, this surprising result
has been proven by Michor and Mumford (2005). In the following theorem, we
summarize results on the geodesic distance for the class of Sobolev metrics. See
Michor and Mumford (2007), Bauer et al. (2012, 2020b), and Jerrard and Maor
(2019) and the references therein for further information on this topic.
This result suggests that metrics of order at least one are potentially well-suited
for applications in shape analysis. For such applications, one is usually interested in
computing numerically the geodesic distance as well as the corresponding optimal
path between two given curves. In Riemannian geometry, these optimal paths
are called minimizing geodesics, and they are locally described by the so-called
geodesic equation, which is simply the first-order optimality condition for the
length functional as defined in (12). In our context these equations become rather
difficult; they are nonlinear PDEs of order 2n (where n is the order of the metric).
Nevertheless there exist powerful results on existence of solutions.
To this end, we consider the space of Sobolev immersions
$$\mathrm{Imm}^s(D, M) := \big\{ c \in H^s(D, M) : |c'| \neq 0 \big\}, \tag{14}$$
which is a smooth Banach manifold. Here $H^s(D, M)$ denotes the Sobolev space of order s; see, e.g., Bauer et al. (2020c) for the exact definition in a similar notation. Note that the condition $|c'| \neq 0$ is well defined as all functions in $H^s(D, \mathbb{R}^d)$ are $C^1$ for $s > \frac{3}{2}$. We are now able to state the main result on geodesic and
metric completeness, which is of relevance to our applications. In order to keep the
presentation as concise as possible, we will formulate this result for closed curves
and will only comment on the open curve case below.
Theorem 3 (Bruveris et al. 2014; Bruveris 2015 and Bauer et al. 2020c). Let
dist be the geodesic distance of the Sobolev metric G, as defined in (7), of order
n ≥ 2 on the space Imm(S 1 , M) of smooth regular curves. The following statements
hold:
1. The metric G and its corresponding geodesic distance function extend smoothly
to the space of Sobolev immersions Imms (S 1 , M) for all s ≥ n.
2. The space Immn (S 1 , M) equipped with the geodesic distance function dist (of
the Sobolev metric of order n) is a complete metric space.
3. For any two curves in the same connected component of Immn (S 1 , M), there
exists a minimizing geodesic connecting them.
For open curves it has been shown that the constant coefficient metric as defined in
(7) is in fact not metrically complete (Bauer et al. 2019a). The reason for this is that
one can always shrink down a straight line (open geodesic in the manifold M resp.)
to a point using finite energy. One can, however, regain the analogue of the above
completeness result for open curves by considering a length-weighted version of the
Riemannian metric; see Bauer et al. (2020c).
As a direct consequence of the completeness results, we obtain the existence of
optimal reparametrizations, i.e., the well-posedness of the matching problem on the
space of unparametrized curves. To state our main result on existence of optimal
reparametrizations, we introduce the quotient space of Sobolev immersion modulo
Sobolev diffeomorphisms:
We have not determined whether this space carries the structure of a manifold.
Nevertheless, we can consider the induced geodesic distance on this space and
obtain the following completeness result, which we will formulate again for closed
curves only.
Theorem 4 (Bruveris 2015). Let n ≥ 2 and let dist be the geodesic distance
of the Sobolev metric of order n on Immn (S 1 , M). Then Sn (S 1 , M) equipped
with the quotient distance distS is a complete metric space. Furthermore, given
two unparametrized curves [c0 ], [c1 ] ∈ S n (S 1 , M), there exists an optimal
reparametrization γ and isometry g, i.e., the infimum in the definition of the quotient distance distS is attained.
In the article Bruveris (2015), this result is formulated for the action of the infinite-
dimensional group Diff+ (S 1 ) only and for M = Rd only. The proof can however be
easily adapted to incorporate the action of the compact group Isom(M), and, using
the results of Bauer et al. (2020c), it directly translates to the case of manifold-
valued curves. Similarly to Theorem 3, these results continue to hold for open curves
after changing the Riemannian metric to a length-weighted version.
For further results on general Sobolev metrics on spaces of curves, we refer to
the vast literature on the topic, including Sundaramoorthi et al. (2007), Bauer et al.
(2014b, 2020a), Klassen et al. (2004), Michor and Mumford (2007), Younes (1998),
and Tumpach and Preston (2017). An example of a geodesic between two planar
closed curves for a second-order Sobolev metric is shown in Fig. 1 (left), which
was computed with the approach described later in section “Relaxation of the Exact
Matching Problem”. In the following section, we will study one particular metric
of order one that will lead to explicit formulas for geodesics and geodesic distance
on open, parametrized curves. This will in turn allow us to recover the results on
existence of geodesics and optimal reparametrizations. These optimal objects will
however fail to have the regularity properties that the optimizers in this section were
guaranteed to have.
Curves in R^d
The reparametrization-invariant Riemannian metrics discussed above are designed
to induce Riemannian metrics on the space of shapes. In general, calculating
geodesics and distances with respect to these metrics requires numerical optimiza-
tion, and is often computation-intensive. However, for the case of open curves in
Rd , one of these metrics provides geodesics and distances that are especially easy
to compute. This method is known as the “square root velocity” (SRV) framework.
The main tool in this framework is the map Q : Imm(D, R^d) → C^∞(D, R^d),
often referred to in the literature as the SRV transform or function, defined by

Q(c)(u) = \frac{c'(u)}{\sqrt{|c'(u)|}}. \qquad (17)
The importance of this map becomes evident in the following theorem by Srivastava
et al. (2011), which connects it to the Ga,b -metric (9) for a particular choice of
constants a and b.
In the following we will describe the SRV framework in the case of open curves,
and we will only comment briefly on applications of the SRV transform to closed
curves at the end of the section.
Open Curves The reason for treating the case of open curves separately is the
fact that the mapping Q becomes a bijection, which will allow us to completely
transform all calculations to the image of Q – a vector space. While we could
perform all of these operations in the smooth category, it turns out to be beneficial
to consider this method on a much larger space, which will then turn out to be
the metric completion of the space of smooth immersions with respect to the SRV
metric.
Henceforth, for I = [0, 1], let AC(I, Rd ) denote the set of absolutely continuous
functions I → Rd . Since the considered metric will be invariant under translation,
we standardize all curves to begin at the origin; therefore, let AC0 (I, Rd ) denote
the set of all c ∈ AC(I, Rd ) such that c(0) = 0. We can extend the mapping Q as
defined in (17) to a mapping on this larger space via Q : AC0 (I, Rd ) → L2 (I, Rd )
as follows:
Q(c)(u) = \begin{cases} \dfrac{c'(u)}{\sqrt{|c'(u)|}} & \text{if } c'(u) \neq 0; \\ 0 & \text{if } c'(u) = 0. \end{cases} \qquad (18)

One verifies that the inverse of this map is given by

c(u) = Q^{-1}(q)(u) = \int_0^u |q(y)|\, q(y)\, dy, \qquad (19)
and, thus, that Q is a bijection. Diff+ (I ) acts on AC0 (I, Rd ) from the right by
composition; hence, there is a unique right action of Diff+ (I ) on L2 (I, Rd ) that
makes Q equivariant. The explicit formula for this action is
(q \ast \gamma)(u) = \sqrt{\gamma'(u)}\, q(\gamma(u)), \qquad (20)
Theorem 6 (Lahiri et al. 2015 and Bruveris 2016). The space of absolutely
continuous curves equipped with the SRV metric is a geodesically and metrically
complete space. Furthermore, given any curves c_0, c_1 ∈ AC_0(I, R^d), the unique
minimizing geodesic connecting them is given by

c(t, \cdot) = Q^{-1}\big( (1 - t)\, Q(c_0) + t\, Q(c_1) \big), \qquad t \in [0, 1], \qquad (21)

and thus the geodesic distance between c_0 and c_1 can be calculated via

\mathrm{dist}(c_0, c_1) = \left( \int_0^1 |Q(c_0)(u) - Q(c_1)(u)|^2\, du \right)^{1/2}. \qquad (22)
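To make these formulas concrete, here is a minimal NumPy sketch that discretizes the SRV transform (18), its inverse (19), the geodesic (21), and the distance (22) for uniformly sampled open curves in R^d starting at the origin; the function names and the forward-difference discretization are illustrative choices and are not taken from any of the software packages cited later.

```python
import numpy as np

def srv(c):
    """Discrete SRV transform of a curve sampled at N+1 points in R^d.
    c: array of shape (N+1, d); returns one value per parameter interval."""
    N = c.shape[0] - 1
    dc = np.diff(c, axis=0) * N                        # forward differences ~ c'(u)
    speed = np.linalg.norm(dc, axis=1, keepdims=True)
    return dc / np.sqrt(np.maximum(speed, 1e-12))      # guard against |c'| = 0

def srv_inverse(q):
    """Discrete version of (19); the reconstructed curve starts at the origin."""
    N, d = q.shape
    increments = np.linalg.norm(q, axis=1, keepdims=True) * q / N
    return np.vstack([np.zeros((1, d)), np.cumsum(increments, axis=0)])

def srv_distance(c0, c1):
    """Geodesic distance (22): the L2 norm between the SRV transforms."""
    N = c0.shape[0] - 1
    return np.sqrt(np.sum((srv(c0) - srv(c1)) ** 2) / N)

def srv_geodesic(c0, c1, t):
    """Curve at time t on the geodesic (21): linear interpolation in SRV space."""
    return srv_inverse((1 - t) * srv(c0) + t * srv(c1))

# Example: distance between a quarter circle and a straight segment
u = np.linspace(0, 1, 101)[:, None]
quarter_circle = np.hstack([np.sin(0.5 * np.pi * u), 1 - np.cos(0.5 * np.pi * u)])
segment = np.hstack([u, np.zeros_like(u)])
print(srv_distance(quarter_circle, segment))
```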
opposite directions (and with parameters within a certain distance of one another).
However, for a < b, the conditions become much more restrictive, as one needs
to exclude the situation in which there is an open interval in the parameter domain
of one curve where the angle between the tangents and the tangent at a point of
nearby parameter in the other curve exceeds aπ/b. In particular, for the SRV metric,
this basically constrains angles between tangent vectors of the two curves to be
smaller than π/2, which is an impractical assumption in typical applications. As we
discuss next, it turns out that by allowing a pair of “generalized” reparametrization
functions instead of a single diffeomorphism, one can recover an existence result for
fairly general classes of curves.
In the following we aim to describe this construction, which will require us to
consider the closure of the Diff+ (I ) orbits on AC0 (I, Rd ). Hence, we define an
equivalence relation on AC0 (I, Rd ) by c1 ∼ c2 if and only if the Diff+ (I ) orbits
of Q(c1 ) and Q(c2 ) have the same closure in L2 (I, Rd ). We then define the shape
space of open curves in R^d as the quotient S(I, R^d) := AC_0(I, R^d)/∼,
and for c ∈ AC_0(I, R^d), we let [c] denote the equivalence class of c under ∼.
In order to better understand these equivalence classes, we need an expanded
version of Diff+ (I ). To be precise, define Diff+ (I ) to be the set of all absolutely
continuous functions γ : I → I such that γ(0) = 0, γ(1) = 1, and γ'(u) ≥ 0
almost everywhere. Note that Diff+(I) is only a monoid, not a group, since the only
elements of Diff+(I) that have inverses are those γ such that γ'(u) ≠ 0 almost
everywhere. We then have the following description of a general equivalence class
of AC0 (I, Rd ) under the relation ∼.
Lemma 3 (Lahiri et al. 2015). Let c ∈ AC_0(I, R^d), and assume that c'(u) ≠ 0
almost everywhere. Then the equivalence class of c under ∼ is equal to
{c ◦ γ : γ ∈ Diff+ (I )}.
Note that if c'(u) = 0 on a set of nonzero measure, then we cannot directly use
Lemma 3 to characterize [c]; however, we can reparametrize c by arclength to obtain
another element c̃ in the same equivalence class as c, and then use Lemma 3 to
characterize [c] = [c̃].
We can now define a distance function on the shape space as follows: if [c1 ] and
[c2 ] are elements of S(I, Rd ), then we let
Note that it seems at first that we need to consider reparametrizations of both c0 and
c1 , because Diff+ (I ) is not a group but only a monoid. However, it can be shown that
the infimum will be the same if we only consider reparametrizations of one of the
curves. See Lahiri et al. (2015) and Bruveris (2016). The optimal reparametrization
problem for curves in AC0 (I, Rd ) can now be formulated as follows: suppose c0 and
c1 are elements of AC0 (I, Rd ), and that both have nonvanishing derivatives almost
everywhere. Do there exist γ0 and γ1 in Diff+ (I ) such that
The following theorem gives the known results about this problem.
Theorem 7 (Lahiri et al. 2015 and Bruveris 2016). Let c0 and c1 be elements
of AC0 (I, Rd ) with both having nonvanishing derivatives almost everywhere. We
have:
In practice, Step 3 is often omitted to save computation, because the path pro-
duced by Step 2 is generally very close to a geodesic. In order to find optimal
reparametrizations for a pair of closed curves, it is not enough to consider the
methods developed for open curves, because of the freedom to choose any point
on a closed curve to be its starting and ending point (i.e., the point c(0) = c(1)). To
remedy this, the algorithms discussed for open curves need to be implemented along
a densely spaced set of points on one of the curves in order to choose the matching
that leads to the shortest geodesic between the curves. For details, see Srivastava
and Klassen (2016).
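The seed-point search just described can be sketched as follows, in the same NumPy setting as before; the helper _srv_dist recomputes a discrete SRV distance, and a faithful implementation would additionally re-optimize over reparametrizations for each candidate starting point, as done in Srivastava and Klassen (2016).

```python
import numpy as np

def _srv_dist(c0, c1):
    """L2 distance between discrete SRV transforms (cf. (22))."""
    def q(c):
        dc = np.diff(c, axis=0) * (c.shape[0] - 1)
        sp = np.linalg.norm(dc, axis=1, keepdims=True)
        return dc / np.sqrt(np.maximum(sp, 1e-12))
    N = c0.shape[0] - 1
    return np.sqrt(np.sum((q(c0) - q(c1)) ** 2) / N)

def closed_curve_distance(c0, c1):
    """Seed search for closed curves: cycle the starting point of c1 and keep the
    smallest open-curve SRV distance to c0. c0, c1: arrays of N points with the
    duplicate end point removed."""
    best = np.inf
    for shift in range(c1.shape[0]):
        c1s = np.roll(c1, -shift, axis=0)
        c1s = np.vstack([c1s, c1s[:1]])            # close the shifted curve again
        c0s = np.vstack([c0, c0[:1]])
        best = min(best, _srv_dist(c0s, c1s))
    return best
```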
Q : AC(I, G) \to G \times L^2(I, \mathfrak{g}), \qquad Q(c) = (c(0), q), \qquad (23)
where
q(u) = \begin{cases} \dfrac{dL_{c(u)^{-1}}\, c'(u)}{\sqrt{|c'(u)|}} & \text{if } c'(u) \neq 0 \\ 0 & \text{if } c'(u) = 0 \end{cases} \qquad (24)
Note that L_{c(u)^{-1}} denotes left translation on G by c(u)^{-1}, which is used to transport
the whole curve to the same tangent space, the Lie algebra \mathfrak{g}. Note also that the second part of this
transformation is simply the generalization of the SRV transform for curves in a
Euclidean space to curves with values in a Lie group and the first factor is added to
keep track of the starting point. In Su et al. (2018), it is shown that the map Q is a
bijection.
We put a product metric on G × L2 (I, g) coming from the left-invariant metric
on G and the L2 -metric on L2 (I, g). Then the smooth structure and Riemannian
metric on G×L2 (I, g) are pulled back to AC(I, G) leading to the following explicit
formula for the corresponding geodesic distance:
\mathrm{dist}(c_0, c_1)^2 = \mathrm{dist}_G(c_0(0), c_1(0))^2 + \int_0^1 \| q_1(u) - q_0(u) \|^2\, du, \qquad (25)
with distG being the geodesic distance on the finite-dimensional group G and qi (u)
being the q-map, as defined in equation (24), of the curve ci . Note that the smooth
structure and metric are invariant under the action of Diff+ (I ) and also under the
left action of G. For the relation of the corresponding Riemannian metric to the class
of elastic metrics as defined in equation (7), we refer to the articles Su et al. (2018)
and Celledoni et al. (2016b).
For the last equality, we used that c(u)^T = c(u)^{-1}. Moreover, the geodesic distance
on G is given explicitly by \mathrm{dist}_G(c_0, c_1) = \| \log(c_0^T c_1) \|_F, where \log denotes the
standard matrix logarithm and \| \cdot \|_F the Frobenius norm. This leads to the following
specific expression of the SRV distance (25) for parametrized curves in SO(n, R):
\mathrm{dist}(c_0, c_1)^2 = \| \log\big(c_0(0)^T c_1(0)\big) \|_F^2 + \int_0^1 \| q_1(u) - q_0(u) \|_F^2\, du. \qquad (28)
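As an illustration of (28), the following sketch evaluates the SRV distance for two discretized curves in SO(n), given as arrays of rotation matrices on a uniform parameter grid; the forward-difference approximation of c', the use of the Frobenius norm for |c'(u)|, and the simple Riemann sum are our own simplifications.

```python
import numpy as np
from scipy.linalg import logm

def q_map(c, du):
    """Discrete version of (24) for a curve of rotation matrices,
    c.shape == (N+1, n, n): q[k] = c[k]^T c'(u_k) / sqrt(||c'(u_k)||_F)."""
    dc = np.diff(c, axis=0) / du                       # forward differences
    q = np.empty_like(dc)
    for k in range(dc.shape[0]):
        speed = np.linalg.norm(dc[k], 'fro')
        q[k] = 0.0 if speed < 1e-12 else c[k].T @ dc[k] / np.sqrt(speed)
    return q

def srv_distance_son(c0, c1):
    """SRV distance (28) between two curves in SO(n)."""
    N = c0.shape[0] - 1
    du = 1.0 / N
    start_term = np.linalg.norm(logm(c0[0].T @ c1[0]), 'fro') ** 2
    q0, q1 = q_map(c0, du), q_map(c1, du)
    integral = np.sum(np.linalg.norm(q1 - q0, axis=(1, 2)) ** 2) * du
    return np.sqrt(start_term + integral)
```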
This surjection is not a bijection, because a curve c in AC(I, M) does not have a
unique horizontal lift to G. Rather, it has a unique horizontal lift starting at each
point of π −1 (c(0)). To fix this, we define a right action of K on G × L2 (I, k⊥ ) by
This bijection, which is equivariant with respect to the left action of G, is the key
tool that we use to define a Riemannian metric on AC(I, M). To see this, note first
that we can endow G × L2 (I, k⊥ ) with the natural product metric in the same way
that we did in the case of Lie groups. Then, note that this metric is invariant under
the right action of K, so it induces a metric on the quotient space (G × L2 (I, k⊥ ))/K
and, hence, on AC(I, M). Furthermore, this Riemannian metric is invariant under
the left action of G.
Geodesics Geodesics in L2 (I, k⊥ ) are simply straight lines. Let us assume that
we can compute geodesics in G, as well. Then geodesics in G × L2 (I, k⊥ ) are
products of geodesics in these two spaces. To compute geodesics and geodesic
distance in AC(I, M), we need to compute geodesics in (G × L2 (I, k⊥ ))/K. This is
accomplished as follows. Suppose we are given two elements of (G×L2 (I, k⊥ ))/K,
[(c1 , q1 )] and [(c2 , q2 )]. In order to calculate a geodesic between them, we must find
y ∈ K that minimizes d((c1 , q1 ), (c2 y, y −1 q2 y)). Note that this is a minimization
problem over the compact Lie group K. In fact, the gradient of this function on K
can be explicitly calculated (see Lemma 5 of Su et al. (2018) for the computation),
reducing the computation of geodesics to an optimization problem on a compact
Lie group with an explicit gradient. This technique yields efficiently computable
formulas for geodesics and geodesic distances; see Celledoni et al. (2016a) and Su
et al. (2018). See Fig. 4 for an example of geodesics between curves on the sphere.
Furthermore, analogues of the optimal reparametrization results, cf. Theorem 7,
have been proven; see Su et al. (2018).
Finally, we note that under the framework just described, the Lie group G acts
on AC(I, M) by isometries. Hence, for some applications, one may wish to mod
out by this action (in addition to the reparametrization group) when defining the
shape space of open curves in M. We observe that the current framework extends
very naturally to performing the additional optimization implied by this quotient
operation. We refer the reader to Su et al. (2017, 2018) for more details.
Method 1 In the case of curves with values in a general manifold, the elastic
G1,1/2 -metric is no longer a flat metric. However it can still be obtained as a pullback
by the SRV transform of a natural metric on the tangent bundle T Imm(I, M),
namely, a pointwise version of the Sasaki metric on T M. Recall that the Sasaki
metric is a natural choice of metric on the tangent bundle T M that depends on the
horizontal and vertical projections of each tangent vector. Intuitively, the horizontal
projection of a tangent vector of T(p,w) T M for any (p, w) ∈ T M corresponds to
the way it moves the base point p, and its vertical projection, to the way it linearly
moves w. More precisely, define just as in the Euclidean case the SRV transform to
be Q : Imm(I, M) → T Imm(I, M):
Q(c)(u) = c'(u) / \sqrt{|c'(u)|}.
Consider the following metric on the tangent bundle T Imm(I, M): for any
pair (c, h) ∈ T Imm(I, M), and any infinitesimal deformations ξ1 , ξ2 ∈
T(c,h) T Imm(I, M) of the pair (c, h), define
\hat{G}_{(c,h)}(\xi_1, \xi_2) = \langle \xi_1(0)^{hor}, \xi_2(0)^{hor} \rangle + \int_I \langle \xi_1(u)^{ver}, \xi_2(u)^{ver} \rangle\, du, \qquad (29)
where ξ1 (u)hor ∈ T M and ξ1 (u)ver ∈ T M are the horizontal and vertical projections
of the tangent vector ξ1 (u) ∈ T(c(u),h(u)) T M for all u ∈ I . Then, the elastic G1,1/2 -
metric is the pullback of Ĝ with respect to the SRV transform Q, i.e.:
G^{1,1/2}_c(h, k) = \hat{G}_{Q(c)}\big( T_c Q(h), T_c Q(k) \big) = \langle h(0), k(0) \rangle + \int_I \langle \nabla_{h(u)} Q(c), \nabla_{k(u)} Q(c) \rangle\, du, \qquad (30)
for any curve c ∈ Imm(I, M) and h, k ∈ Tc Imm(I, M), where ∇h(u) Q(c) denotes
the covariant derivative in M of the vector field Q(c) in the direction of the vector
field h. Notice that here we add a position term to the integral definition (9) of the
G^{1,1/2}-metric in order to take into account translations. Accordingly, the energy of
a path of curves [0, 1] ∋ t ↦ c(t), whose SRV transform we write q(t, ·) = Q(c(t)),
for the G^{1,1/2}-metric is given by

E(c) = \int_0^1 \Big( |\partial_t c(t, 0)|^2 + \int_I |\nabla_t q(t, u)|^2\, du \Big)\, dt. \qquad (31)
\nabla_t^2 q(t, u) + |q(t, u)| \big( r(t, u) + r(t, u)^T \big) = 0, \qquad \forall (t, u) \in [0, 1] \times I,

where the vector field r depends on the curvature tensor R of the base manifold
M and on the parallel transport \partial_t c(t, v)^{v,u} of the vector field \partial_t c(t, ·) along c(t, ·)
from c(t, v) to c(t, u):

r(t, u) = \int_u^1 R(q, \nabla_t q)\, \partial_t c(t, v)^{v,u}\, dv.
In the flat case M = R^d, the curvature term r in the geodesic equation vanishes,
and we obtain \nabla_t \partial_t c(t, 0) = \partial_t^2 c(t, 0) = 0 and \nabla_t^2 q(t, u) = \partial_t^2 q(t, u) = 0 for
metric between two curves in Rd links their starting points with a straight line
and linearly interpolates between their SRV representations. In the general case,
the initial value problem for geodesics can be solved by finite differences, and the
boundary value problem by geodesic shooting. In the case where the base manifold
M has constant sectional curvature, e.g., the sphere or the hyperbolic plane, a
comprehensive discrete framework was proposed in Le Brigant (2019) that correctly
approximates the continuous setting and makes numerical computations easier.
\hat{G}_{(x,v)}\big( (w_1, \eta_1), (w_2, \eta_2) \big) = \langle w_1, w_2 \rangle + \int_I \langle \eta_1(u), \eta_2(u) \rangle\, du.
It is easily shown that the pullback of this metric to Imm(I, M) is invariant under
reparametrizations and under the group of isometries of M, and therefore yields an
alternative to the elastic metric (30). The energy of a path of curves [0, 1] ∋ t ↦ c(t)
for this metric is given by an expression similar to (31):

E(c) = \int_0^1 \Big( |\partial_t c(t, 0)|^2 + \int_I |\nabla_t \tilde{q}(t, u)|^2\, du \Big)\, dt. \qquad (32)
One finds that the conditions for such a path to be a geodesic are simpler than
those of the exact elastic metric framework stated in Proposition 1.
In the context of finding the geodesic c between two curves c_1 and c_2, the first
equation describes the behavior of the baseline curve t ↦ c(t, 0) linking the starting
points c_1(0) and c_2(0), and the second equation expresses the fact that \tilde{q} = \tilde{Q}(c)
is covariantly linear, i.e., \tilde{q}(t, u) can be obtained as a linear interpolation between the
TSRV representations \tilde{q}_1(u) and \tilde{q}_2(u) of c_1 and c_2, parallel transported along the
baseline curve to c(t, 0). The difficulty of implementing this method depends on
the particular manifold M. For curves in the sphere S 2 , the baseline curve linking
the starting points is a circular arc, thus yielding simplifications with respect to the
general geodesic shooting problem (Zhang et al. 2018a). The case of curves in the
space of positive definite symmetric matrices is studied in Zhang et al. (2018b).
where q^{∥,p}(u) is obtained by parallel translating the vector c'(u)/\sqrt{|c'(u)|} along
the shortest geodesic in M from c(u) to p. One then defines the distance between
two curves c_0 and c_1 to be the L^2 distance between q_0^{∥,p} = Q^{∥,p}(c_0) and
q_1^{∥,p} = Q^{∥,p}(c_1), i.e.:

d(c_0, c_1) = \Big( \int_I \big| q_0^{∥,p}(u) - q_1^{∥,p}(u) \big|^2\, du \Big)^{1/2}.
Implementation
In this section we will discuss the computation of the geodesic distance. We will
first briefly address the case of parametrized curves. In the second part, we will
then describe the main difficulty in this context which is the minimization over
reparametrizations in the group Diff+ (D). In particular we will describe several
different approaches that have been developed to tackle this highly nontrivial task.
For open curves with values in Euclidean space, Lie groups or homogeneous spaces
and the SRV metric, there exist analytic solution formulas for these operations, and
thus these computations become trivial. For most of the other situations discussed
in this chapter, the absence of such formulas requires one to solve these problems
using numerical optimization. Therefore, one first has to choose a discretization for
all of the involved objects, i.e., one has to discretize the path of curves c(t, u) for
t ∈ [0, 1] and u ∈ D. A standard approach for this task consists of choosing B-
splines in both time and space, i.e.:

c(t, u) \approx \sum_{i=0}^{N_t} \sum_{j=0}^{N_u} c_{i,j}\, B_i(t)\, C_j(u),

where B_i and C_j are the chosen B-spline basis functions and where c_{i,j} for
i = 0 . . . Nt and j = 0 . . . Nu are the coefficients. Note that this includes as
a special case the discretization of regular curves as piecewise linear functions.
This procedure then reduces the calculation of the geodesic distance (12) to an
unconstrained minimization problem of the discretized length functional, where
the control points ci,j for i = 1 . . . Nt − 1 and j = 0 . . . Nu of the B-splines are
the free variables. Here the control points of the boundary curves c_{0,j} and c_{N_t,j} are
chosen as fixed parameters and are not changed in the optimization procedure. After
this discretization step, one can use standard methods of numerical optimization,
such as the L-BFGS method, to approximate the solution of the finite-dimensional
unconstrained minimization problem. For further information, in the notation of this
chapter, we refer the reader to the article Bauer et al. (2017). See also Bauer et al.
(2019a), Nardi et al. (2016), and Michor and Mumford (2006).
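To illustrate the overall numerical pipeline without reproducing the B-spline machinery of Bauer et al. (2017), the following sketch discretizes the path of curves as piecewise linear in time and space, forms the discrete energy of a simple first-order Sobolev metric G_c(h, h) = \int (|h|^2 + |D_s h|^2)\, ds, and minimizes over the interior control points with SciPy's L-BFGS. The choice of metric, the finite-difference gradients computed inside the optimizer (slow, but they keep the sketch short), and all names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def path_energy(x, c0, c1, Nt, Nu, d):
    """Discrete path energy for G_c(h, h) = int (|h|^2 + |D_s h|^2) ds,
    with ds = |c'| du and D_s h = h'/|c'|, on a piecewise-linear path
    whose end curves c0 and c1 are held fixed."""
    c = np.concatenate([c0[None], x.reshape(Nt - 1, Nu + 1, d), c1[None]])
    dt, du = 1.0 / Nt, 1.0 / Nu
    h = np.diff(c, axis=0) / dt                      # time derivative per step
    cm = 0.5 * (c[:-1] + c[1:])                      # midpoint curves in time
    cu = np.diff(cm, axis=1) / du                    # u-derivative of the curve
    hu = np.diff(h, axis=1) / du                     # u-derivative of h
    speed = np.linalg.norm(cu, axis=2) + 1e-8        # |c'| on each u-interval
    hm = 0.5 * (h[:, :-1] + h[:, 1:])                # h at u-interval midpoints
    integrand = np.sum(hm**2, axis=2) * speed + np.sum(hu**2, axis=2) / speed
    return np.sum(integrand) * du * dt

def discrete_geodesic(c0, c1, Nt=10):
    """Approximate a geodesic by L-BFGS on the discretized energy,
    starting from the linear interpolation between c0 and c1."""
    Nu, d = c0.shape[0] - 1, c0.shape[1]
    init = np.array([(1 - t) * c0 + t * c1
                     for t in np.linspace(0, 1, Nt + 1)[1:-1]])
    res = minimize(path_energy, init.ravel(),
                   args=(c0, c1, Nt, Nu, d), method="L-BFGS-B")
    return np.concatenate([c0[None],
                           res.x.reshape(Nt - 1, Nu + 1, d), c1[None]])
```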
Normalization by Isometries
The shape space S(D, M) in (4) involves quotienting out isometric transformations
of M; in other words one has to technically minimize in (13) the elastic distance
over g ∈ Isom(M). This is a finite-dimensional group which, for most manifolds M
encountered in practice, usually has a simple parametric representation.
One commonly used approach, although not rigorously equivalent to the
optimization in (13), is to pre-align the two shapes with respect to isometries of M
prior to estimating the elastic distance. When M = R^d, this amounts to finding the
optimal rotation and translation that best align them, which is classically addressed
by Procrustes analysis, cf., for example, Dryden and Mardia (2016).
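For completeness, here is a standard Kabsch/Procrustes alignment in NumPy, assuming the two curves are sampled with corresponding points; it illustrates the pre-alignment step only and is not code from any of the cited references.

```python
import numpy as np

def procrustes_align(c0, c1):
    """Rigidly align curve c1 (shape (N, d)) to c0 by the rotation and
    translation minimizing the sum of squared distances between
    corresponding points (Kabsch/Procrustes, without scaling)."""
    m0, m1 = c0.mean(axis=0), c1.mean(axis=0)
    H = (c1 - m1).T @ (c0 - m0)                      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(H.shape[0])
    S[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))   # enforce det(R) = +1
    R = Vt.T @ S @ U.T
    return (c1 - m1) @ R.T + m0                      # rotated and translated copy
```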
Alternatively, one can parametrize the group Isom(M) and perform the min-
imization over g within the estimation of the distance itself, i.e., jointly with
reparametrizations. For planar curves, this simply amounts to optimizing over a
two-dimensional translation vector and the angle of rotation, which is the approach
used, in particular, in Bauer et al. (2017, 2019a). Note that for general Rd , a
similar strategy is also possible by representing rotations as the exponential of
antisymmetric matrices. In the case of manifold-valued curves however, normalizing
with respect to isometries of M may not always be relevant or can be harder
to deal with in practice. This typically depends on the availability of convenient
representations of the isometry group Isom(M); we refer the reader to Su et al.
(2018) where some simple examples are considered.
piecewise linear (i.e., polygonal) curves, one may in turn choose to look for an
optimal reparametrization in Diff+(D) that is also piecewise linear. For curves
in a Euclidean space and the SRV metric, this is in part supported by the recent
work of Lahiri et al. (2015), where the authors show that such optimal piecewise linear
reparametrizations exist. In general, as piecewise linear functions are a dense set in
the space of absolutely continuous functions, it is reasonable in practice to restrict
the search to reparametrizations of this form.
More specifically, assume that the two curves c0 and c1 are both piecewise linear.
For simplicity, let’s also assume that D = [0, 1] and that both curves are sampled
uniformly on D, namely, that c0 and c1 are linear on each of the subintervals Di =
[ti , ti+1 ] for all i = 0, . . . , N − 1 where ti = i/N. One may then approximate
positive diffeomorphisms in Diff+ (D) by piecewise linear homeomorphisms of D
with nodes in the set {0, t1 , t2 , . . . , tN }. Writing J = {t0 , t1 , t2 , . . . , tN }, we can
equivalently consider all the polygonal paths defined on the grid J × J joining
(0, 0) to (1, 1) and which are the graph of an increasing piecewise linear function
with nodes in J . This set Γ is now finite albeit containing a very large number of
possible paths.
Nevertheless, an efficient way to determine an optimal discrete reparametrization
is through dynamic programming. This is well-suited to situations where the energy
to minimize can be written as an additive function over the different segments of
the discrete path, which is made possible by the SRV transform in the case of
elastic G1,1/2 -metrics (or more generally for the Ga,b -metric using the transforms
of Younes et al. 2008, Needham and Kurtek 2020, and Bauer et al. 2014a). We want
to note here that this method is not well-suited to cases in which one does not have
access to an explicitly computable distance function, such as for the higher-order
elastic metrics.
Indeed, if γ ∈ Γ is piecewise linear on the K consecutive segments of vertices
(ti0 , tj0 ) = (0, 0), (ti1 , tj1 ), . . . , (tiK , tjK ) = (1, 1) with ti0 < ti1 < . . . < tiK and
tj0 < tj1 < . . . < tjK , then the discrete energy to be minimized is expressed as
E(\gamma) = \| Q(c_0) - Q(c_1 \circ \gamma) \|_{L^2}^2 = \sum_{m=0}^{K-1} E\big( \gamma_{i_m, j_m}^{\, i_{m+1}, j_{m+1}} \big),

where E(\gamma_{i_m, j_m}^{\, i_{m+1}, j_{m+1}}) is the energy of the linear path from vertex (t_{i_m}, t_{j_m}) to
(t_{i_{m+1}}, t_{j_{m+1}}) and is given by

E\big( \gamma_{i_m, j_m}^{\, i_{m+1}, j_{m+1}} \big) = \frac{1}{N} \sum_{k=i_m}^{i_{m+1}-1} \Big| Q(c_0)(t_k) - \sqrt{\tfrac{t_{j_{m+1}} - t_{j_m}}{t_{i_{m+1}} - t_{i_m}}}\; Q(c_1)\big( \gamma(t_k) \big) \Big|^2.
Now the generic dynamic programming method first computes the minimal energy
among all paths in Γ going from (0, 0) to any given vertex (t_i, t_j), which we write
E^{(i,j)}, through the following iterative procedure on i:
1. Set E^{(0,0)} = 0.
2. For a given i ∈ {1, . . . , N} and all j ∈ {1, . . . , N}, compute E^{(i,j)} and P^{(i,j)} as

E^{(i,j)} = \min_{(k,l) \in N_{ij}} \Big( E^{(k,l)} + E^{(i,j)}_{(k,l)} \Big), \qquad P^{(i,j)} = \mathop{\mathrm{argmin}}_{(k,l) \in N_{ij}} \Big( E^{(k,l)} + E^{(i,j)}_{(k,l)} \Big), \qquad (34)
where E^{(i,j)}_{(k,l)} denotes in short the energy of the linear path from vertex (t_k, t_l) to
vertex (t_i, t_j) and N_{ij} is a set of admissible vertex indices connecting to (i, j).
At the end of this process, one obtains the minimal energy E^{(N,N)}. A corresponding
optimal path γ ∈ Γ can be simply recovered by backtracking from the final vertex
(1, 1) to (0, 0), the indices of the vertices in γ being specifically (i_q, j_q) = (N, N),
(i_{q-1}, j_{q-1}) = P^{(i_q, j_q)}, . . . , (i_1, j_1) = P^{(i_2, j_2)} and (i_0, j_0) = P^{(i_1, j_1)} = (0, 0).
The choice of search neighborhood Nij in the above procedure has a critical
impact on the resulting complexity. To find the true minimum over all possible paths
in Γ, one should technically take in (34) N_{ij} = {(k, l) : 0 ≤ k ≤ i − 1, 0 ≤ l ≤
j − 1} for any 1 ≤ i, j ≤ N − 1. This would, however, result in a high numerical cost
of the order O(N^4). It can be significantly reduced by restricting N_{ij} to a smaller
set of admissible neighboring vertices. For instance, the authors of Mio et al. (2007)
propose to limit the search to a small square of size 3 × 3 with upper right vertex
(i − 1, j − 1). While this constrains the possible minimal and maximal slope of
the estimated γ , it is generally sufficient in most cases and reduces the numerical
complexity to O(N^2), making the whole approach efficient in practice. Note that
alternative dynamic programming algorithms have been investigated more recently,
in particular in the work of Bernal et al. (2016), which makes use of adaptive strip
neighborhoods to further reduce the complexity to O(N).
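The procedure above can be sketched in a few lines of NumPy; the per-segment cost uses the SRV-based expression above with a nearest-sample evaluation of Q(c_1) \circ \gamma, and the neighborhood is the small square of Mio et al. (2007). Function names and tolerances are our own.

```python
import numpy as np

def dp_reparametrization(q0, q1, width=3):
    """Dynamic programming search for a piecewise-linear reparametrization,
    following the recursion (34). q0, q1: SRV transforms sampled at t_k = k/N,
    arrays of shape (N, d)."""
    N = q0.shape[0]
    E = np.full((N + 1, N + 1), np.inf)
    P = np.zeros((N + 1, N + 1, 2), dtype=int)
    E[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            for k in range(max(0, i - width), i):
                for l in range(max(0, j - width), j):
                    if E[k, l] == np.inf:
                        continue
                    slope = (j - l) / (i - k)
                    ks = np.arange(k, i)
                    gamma = (l + slope * (ks - k)) / N           # gamma(t_k) on segment
                    idx = np.clip(np.floor(gamma * N).astype(int), 0, N - 1)
                    seg = np.sum((q0[ks] - np.sqrt(slope) * q1[idx]) ** 2) / N
                    if E[k, l] + seg < E[i, j]:
                        E[i, j] = E[k, l] + seg
                        P[i, j] = (k, l)
    # backtrack the optimal polygonal path from (N, N) to (0, 0)
    path = [(N, N)]
    while path[-1] != (0, 0):
        path.append(tuple(P[path[-1]]))
    return E[N, N], path[::-1]
```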
\gamma(u) = \sum_k \gamma_k\, D_k(u).
Here Bi , Cj and Dk are the chosen basis functions for the discretization of the path
of curves and the reparametrization function, respectively. One difficulty in this
context is that the composition of the (discretized) target c(1, u) and the (discretized)
reparametrization function γ (u) typically leaves the chosen discretization space.
Thus one has to consider the corresponding projection operator that projects this
reparametrized curve back to the discretization space. This procedure can lead to
numerical phenomena such as loss of features in the target curve. For more details
we refer to the presentation in Bauer et al. (2017).
\mathrm{Ver}_c = \ker dp(c) = \big\{\, m v := m\, c' / |c'| \;:\; m \in C^\infty([0, 1], \mathbb{R}),\ m(0) = m(1) = 0 \,\big\}.
Paths of curves with horizontal velocity vectors are called horizontal, and horizontal
geodesics for G project onto geodesics of the shape space for the Riemannian metric
induced by the Riemannian submersion p : Imm([0, 1], M) → S([0, 1], M); see,
e.g., Michor (2008, Section 26.12). A natural way to solve the boundary value
problem in the shape space is by fixing the parametrization c0 of one of the curves
and computing the horizontal geodesic linking c0 to the closest reparametrization
c1 ◦ γ of the second curve c1 , by iterative “horizontalizations” of geodesics. The
idea is to decompose any path of curves t → c(t) ∈ Imm(D, M) as
where in the second expression we have used the fact that ∂t chor ◦ γ is horizontal
by definition of chor , and ∂u chor is vertical as we can see from the first expression.
From this, we immediately see that if the metric G is reparametrization invariant,
taking the horizontal part of a path decreases its length:
L_G(c_{hor}) \le L_G(c).
Therefore, by taking the horizontal part of the geodesic linking two curves c0 and
c1 , we obtain a shorter, horizontal path linking c0 to the fiber of c1 , which gives a
closer (in terms of G) representative c̃1 = c1 ◦ γ (1) of the target curve. However
it is no longer a geodesic path. By computing the geodesic between c0 and this
new representative c̃1 , we are guaranteed to reduce once more the distance to the
fiber. The optimal matching algorithm simply iterates these two steps, and converges
to a horizontal geodesic. At each step, the horizontal part of the geodesic can be
computed using the following result.
\partial_t \gamma(t, u) = \frac{m(t, u)}{|\partial_u c(t, u)|}\, \partial_u \gamma(t, u), \qquad (36)

with initial condition γ(0) = Id, and where m(t, u) := |\partial_t c^{ver}(t, u)|.
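Given discrete samples of m(t, u) and |\partial_u c(t, u)|, equation (36) can be integrated, for instance, by an explicit upwind scheme as sketched below; the time step must satisfy the usual CFL condition dt ≤ du / max(m/|\partial_u c|), and the final monotonicity projection is only a crude safeguard, not part of the cited algorithms.

```python
import numpy as np

def integrate_reparametrization(m, speed, dt):
    """Explicit upwind integration of (36): d/dt gamma = (m / |d_u c|) d_u gamma,
    with gamma(0, u) = u. m, speed: arrays of shape (Nt, Nu+1) sampling m(t, u)
    and |d_u c(t, u)| on a uniform grid; returns gamma at the final time."""
    Nt, n = m.shape
    du = 1.0 / (n - 1)
    gamma = np.linspace(0.0, 1.0, n)                  # gamma(0) = identity
    for t in range(Nt):
        a = m[t] / np.maximum(speed[t], 1e-12)
        dgamma = np.zeros(n)
        dgamma[:-1] = (gamma[1:] - gamma[:-1]) / du   # forward (upwind) difference
        gamma = gamma + dt * a * dgamma
        gamma[0], gamma[-1] = 0.0, 1.0                # endpoints stay fixed
        gamma = np.maximum.accumulate(gamma)          # keep gamma nondecreasing
    return gamma
```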
Fig. 4 Numerical comparison of the distance and geodesics between 3D curves lying on the unit
sphere modulo reparametrizations: first and third pictures for the SRVF metric in the Euclidean
space R^3 (computed with the relaxed algorithm of Bauer et al. 2019b), second and fourth pictures
for the SRVF distance on S^2 (estimated with the method of Su et al. 2018). Observe that the
geodesics calculated in Euclidean space do not stay on the sphere and thus result in a lower SRVF
distance
This method can be applied as long as the horizontal part of a tangent vector (or
equivalently, the norm of the vertical component m) can be computed. For the class
of Ga,b -elastic metrics, and for the SRV metric in particular, m can be found by
solving an ODE; see Le Brigant (2019). An example of geodesic between curves in
the hyperbolic plane estimated with this approach is shown in Fig. 1 (right).
\inf \int_0^1 G_c(\partial_t c, \partial_t c)\, dt + \lambda\, \tilde{d}\big( c(1), g \circ c_1 \big)^2 \qquad (37)
over all paths c : [0, 1] → Imm(D, Rd ) such that c(0) = c0 . Note that minimization
over γ ∈ Diff+ (D) is no longer needed here, and a minimizing path c of (37) is by
construction a geodesic between c0 and c(1) ≈ c1 in the quotient space S(D, Rd ).
In the above, λ > 0 denotes a fixed weighting coefficient between the two terms
which controls the accuracy of the matching to the target c1 . Other strategies such
as augmented Lagrangian methods can also be used to adapt the choice of this
parameter in order to reach a prescribed matching accuracy, cf. Bauer et al. (2019a).
Remark 3. In the specific case of the SRV metric of section “Curves in R^d”, the
variational problem (37) can be simplified even further to a minimization problem
over the end curve c(1) ∈ Imm(D, R^d) instead of a full path of curves. Indeed,
using the properties of the SRV transform, it is easy to see that the problem can be
equivalently rewritten as
and leads, after discretization, to a simple minimization problem over the vertices
of the deformed curve. This formulation is for instance implemented in Bauer
et al. (2019b). Note that this principle also applies to other simplifying transforms
associated with different choices of elastic parameters as proposed and implemented
in Sukurdeep et al. (2019).
Open-Source Implementations
Several of the methods and algorithms described above are available in open-source
software packages. Here is a (non-exhaustive) list of some of these:
Conclusion
In this chapter, we reviewed the current state of the art of curve comparison through
intrinsic quotient Riemannian metrics for Euclidean as well as non-Euclidean
curves. We discussed the theoretical framework, in particular the questions of non-
degeneracy of Sobolev metrics and geodesic completeness of the corresponding
infinite-dimensional manifolds before analyzing more specifically the case of the
SRV metric for which the variational expression of the distance considerably
simplifies. We also discussed several numerical approaches that have been proposed
for the computation of such metrics in the different settings and for which several
open-source implementations are available.
There are many directions in which this framework can be extended. One is the
construction and computation of corresponding intrinsic metrics between surfaces
modulo reparametrizations. Due to the significantly more complex structure of surfaces
compared to curves, this is a subject of ongoing and active investigation from both the
mathematical and numerical sides; we refer interested readers, e.g., to Jermyn et al.
(2017), Kurtek et al. (2011), Su et al. (2020), Tumpach et al. (2015), and Kilian et al.
(2007).
Going back to curves, as noted in Remark 1, there have been several extensions
and variations of the SRV framework which introduced simplifying transforms
for other first-order metrics than the specific one considered in section “The SRV
Framework”. We finally mention the recent work of Younes (2018), which explored
the possibility of combining intrinsic Sobolev metrics with extrinsic diffeomorphism-
based metrics within a hybrid framework.
References
Bauer, M., Harms, P., Michor, P.W.: Sobolev metrics on shape space of surfaces. J. Geom. Mech.
3(4), 389–438 (2011)
Bauer, M., Bruveris, M., Harms, P., Michor, P.W.: Vanishing geodesic distance for the Riemannian
metric with geodesic equation the KdV-equation. Ann. Glob. Anal. Geom. 41(4), 461–472
(2012)
Bauer, M., Bruveris, M., Marsland, S., Michor, P.W.: Constructing reparameterization invariant
metrics on spaces of plane curves. Differ. Geom. Appl. 34, 139–165 (2014a)
Bauer, M., Bruveris, M., Michor, P.W.: Overview of the geometries of shape spaces and
diffeomorphism groups. J. Math. Imag. Vis. 50(1–2), 60–97 (2014b)
Bauer, M., Bruveris, M., Harms, P., Møller-Andersen, J.: A numerical framework for Sobolev
metrics on the space of curves. SIAM J. Imag. Sci. 10(1), 47–73 (2017)
Bauer, M., Bruveris, M., Charon, N., Møller-Andersen, J.: A relaxed approach for curve matching
with elastic metrics. ESAIM: Control Optim. Calc. Var. 25, 72 (2019a)
Bauer, M., Charon, N., Harms, P.: Inexact elastic shape matching in the square root normal field
framework. In: Geometric Science of Information, pp. 13–20. Springer, Cham (2019b)
Bauer, M., Harms, P., Michor, P.W.: Fractional Sobolev metrics on spaces of immersions. Calc. Var.
Partial Differ. Equ. 59(2), 1–27 (2020a)
Bauer, M., Harms, P., Preston, S.C.: Vanishing distance phenomena and the geometric approach to
SQG. Arch. Ration. Mech. Anal. 235(3), 1445–1466 (2020b)
Bauer, M., Maor, C., Michor, P.W.: Sobolev metrics on spaces of manifold valued curves. arXiv
preprint arXiv:2007.13315 (2020c)
Beg, M.F., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via
geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61, 139–157 (2005)
Bernal, J., Dogan, G., Hagwood, C.R.: Fast dynamic programming for elastic registration of curves.
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),
pp. 1066–1073 (2016)
Binz, E., Fischer, H.R.: The manifold of embeddings of a closed manifold. In: Differential
Geometric Methods in Mathematical Physics, pp. 310–325. Springer, Berlin/Heidelberg/New
York (1981)
Bruveris, M.: Completeness properties of Sobolev metrics on the space of curves. J. Geom. Mech.
7(2), 125–150 (2015)
Bruveris, M.: Optimal reparametrizations in the square root velocity framework. SIAM J. Math.
Anal. 48(6), 4335–4354 (2016)
Bruveris, M., Møller-Andersen, J.: Completeness of length-weighted Sobolev metrics on the space
of curves (2017). arXiv:1705.07976
Bruveris, M., Michor, P.W., Mumford, D.: Geodesic completeness for Sobolev metrics on the space
of immersed plane curves. In: Forum of Mathematics, Sigma, vol. 2. Cambridge University
Press, Cambridge (2014)
Celledoni, E., Eidnes, S., Schmeding, A.: Shape analysis on homogeneous spaces: a generalised
SRVT framework. In: The Abel Symposium, pp. 187–220. Springer (2016a)
Celledoni, E., Eslitzbichler, M., Schmeding, A.: Shape analysis on Lie groups with applications in
computer animation. J. Geom. Mech. 8(3), 273–304 (2016b)
Cervera, V., Mascaro, F., Michor, P.W.: The action of the diffeomorphism group on the space of
immersions. Differ. Geom. Appl. 1(4), 391–401 (1991)
Charon, N., Trouvé, A.: The varifold representation of non-oriented shapes for diffeomorphic
registration. SIAM J. Imag. Sci. 6(4), 2547–2580 (2013)
Charon, N., Charlier, B., Glaunès, J., Gori, P., Roussillon, P.: Fidelity metrics between curves and
surfaces: currents, varifolds, and normal cycles. In: Riemannian Geometric Statistics in Medical
Image Analysis, pp. 441–477. Academic Press, San Diego (2020)
Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis, with Applications in R, 2nd edn. Wiley,
Chichester (2016)
Durrleman, S., Fillard, P., Pennec, X., Trouvé, A., Ayache, N.: Registration, atlas estimation and
variability analysis of white matter fiber bundles modeled as currents. NeuroImage 55(3), 1073–
1090 (2010)
Eliashberg, Y., Polterovich, L.: Bi-invariant metrics on the group of Hamiltonian diffeomorphisms.
Int. J. Math. 4(5), 727–738 (1993)
Glaunès, J., Qiu, A., Miller, M., Younes, L.: Large deformation diffeomorphic metric curve
mapping. Int. J. Comput. Vis. 80(3), 317–336 (2008)
Grenander, U.: General Pattern Theory: A Mathematical Study of Regular Structures. Clarendon
Press Oxford, Oxford/Clarendon/New York (1993)
Hamilton, R.S.: The inverse function theorem of Nash and Moser. Am. Math. Soc. 7(1), 65–122
(1982)
Huang, W., Gallivan, K.A., Srivastava, A., Absil, P.-A.: Riemannian optimization for registration
of curves in elastic shape analysis. J. Math. Imag. Vis. 54(3), 320–343 (2016)
Huang, W., Gallivan, K.A., Srivastava, A., Absil, P.-A., et al.: Riemannian optimization for elastic
shape analysis. In: Mathematical Theory of Networks and Systems. Springer (2014)
Jermyn, I.H., Kurtek, S., Laga, H., Srivastava, A.: Elastic shape analysis of three-dimensional
objects. Synth. Lect. Comput. Vis. 12(1), 1–185 (2017)
Jerrard, R.L., Maor, C.: Vanishing geodesic distance for right-invariant Sobolev metrics on
diffeomorphism groups. Ann. Glob. Anal. Geom. 55(4), 631–656 (2019)
Kaltenmark, I., Charlier, B., Charon, N.: A general framework for curve and surface comparison
and registration with oriented varifolds. In: Computer Vision and Pattern Recognition (CVPR)
(2017)
Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond.
Math. Soc. 16(2), 81–121 (1984)
Kilian, M., Mitra, N.J., Pottmann, H.: Geometric modeling in shape space. In: ACM Transactions
on Graphics (TOG), vol. 26, p. 64. ACM (2007)
Klassen, E., Srivastava, A., Mio, M., Joshi, S.H.: Analysis of planar shapes using geodesic paths
on shape spaces. IEEE Trans. Pattern Anal. Mach. Intell. 26(3), 372–383 (2004)
Kurtek, S., Klassen, E., Ding, Z., Jacobson, S.W., Jacobson, J.L., Avison, M.J., Srivastava, A.:
Parameterization-invariant shape comparisons of anatomical surfaces. IEEE Trans. Med. Imag.
30(3), 849–858 (2011)
Lahiri, S., Robinson, D., Klassen, E.: Precise matching of PL curves in RN in the square root
velocity framework. Geom. Imag. Comput. 2(3), 133–186 (2015)
Le Brigant, A.: Computing distances and geodesics between manifold-valued curves in the SRV
framework. J. Geom. Mech. 9(2), 131–156 (2017)
Le Brigant, A.: A discrete framework to find the optimal matching between manifold-valued
curves. J. Math. Imag. Vis. 61(1), 40–70 (2019)
Mennucci, A.C., Yezzi, A., Sundaramoorthi, G.: Properties of Sobolev-type metrics in the space of
curves. Interfaces Free Bound. 10(4), 423–445 (2008)
Michor, P.W.: Manifolds of Differentiable Mappings, vol. 3. Birkhauser and Springer (1980)
Michor, P.W.: Topics in Differential Geometry, vol. 93. American Mathematical Society, Provi-
dence (2008)
Michor, P.W., Mumford, D.: Vanishing geodesic distance on spaces of submanifolds and diffeo-
morphisms. Doc. Math. 10, 217–245 (2005)
Michor, P.W., Mumford, D.: Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc.
8, 1–48 (2006)
Michor, P.W., Mumford, D.: An overview of the Riemannian metrics on spaces of curves using the
Hamiltonian approach. Appl. Comput. Harmon. Anal. 23(1), 74–113 (2007)
Mio, W., Srivastava, A., Joshi, S.: On shape of plane elastic curves. Int. J. Comput. Vis. 73(3),
307–324 (2007)
Nardi, G., Peyré, G., Vialard, F.-X.: Geodesics on shape spaces with bounded variation and Sobolev
metrics. SIAM J. Imag. Sci. 9(1), 238–274 (2016)
Needham, T., Kurtek, S.: Simplifying transforms for general elastic metrics on the space of plane
curves. SIAM J. Imag. Sci. 13(1), 445–473 (2020)
Roussillon, P., Glaunès, J.: Kernel metrics on normal cycles and application to curve matching.
SIAM J. Imag. Sci. 9(4), 1991–2038 (2016)
Srivastava, A., Klassen, E.: Functional and Shape Data Analysis. Springer Series in Statistics.
Springer, New York (2016)
Srivastava, A., Klassen, E., Joshi, S.H., Jermyn, I.H.: Shape analysis of elastic curves in Euclidean
spaces. IEEE T. Pattern Anal. 33(7), 1415–1428 (2011)
Su, J., Kurtek, S., Klassen, E., Srivastava, A.: Statistical analysis of trajectories on Riemannian
manifolds: bird migration, hurricane tracking and video surveillance. Ann. Appl. Stat. 8(1),
530–552 (2014)
Su, Z., Klassen, E., Bauer, M.: The square root velocity framework for curves in a homogeneous
space. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition
Workshops, pp. 680–689 (2017)
Su, Z., Klassen, E., Bauer, M.: Comparing curves in homogeneous spaces. Differ. Geom. Appl. 60,
9–32 (2018)
Su, Z., Bauer, M., Preston, S.C., Laga, H., Klassen, E.: Shape analysis of surfaces using general
elastic metrics. J. Math. Imag. Vis. 62, 1087–1106 (2020)
Sukurdeep, Y., Bauer, M., Charon, N.: An inexact matching approach for the comparison of plane
curves with general elastic metrics. In: 2019 53rd Asilomar Conference on Signals, Systems,
and Computers, pp. 512–516. IEEE (2019)
Sundaramoorthi, G., Yezzi, A., Mennucci, A.C.: Sobolev active contours. Int. J. Comput. Vis.
73(3), 345–366 (2007)
Trouvé, A.: Diffeomorphisms groups and pattern matching in image analysis. Int. J. Comput. Vis.
28(3), 213–221 (1998)
Trouvé, A., Younes, L.: Diffeomorphic matching problems in one dimension: Designing and
minimizing matching functionals. In: European Conference on Computer Vision, pp. 573–587.
Springer (2000a)
Trouvé, A., Younes, L.: On a class of diffeomorphic matching problems in one dimension. SIAM
J. Control Optim. 39(4), 1112–1135 (2000b)
Tumpach, A.B., Drira, H., Daoudi, M., Srivastava, A.: Gauge invariant framework for shape
analysis of surfaces. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 46–59 (2015)
Tumpach, A.B., Preston, S.C.: Quotient elastic metrics on the manifold of arc-length parameterized
plane curves. J. Geom. Mech. 9(2), 227–256 (2017)
Younes, L.: Computable elastic distances between shapes. SIAM J. Appl. Math. 58(2), 565–586
(1998)
Younes, L.: Hybrid Riemannian metrics for diffeomorphic shape registration. Ann. Math. Sci.
Appl. 3(1), 189–210 (2018)
Younes, L.: Shapes and Diffeomorphisms. Springer (2019)
Younes, L., Michor, P.W., Shah, J., Mumford, D.: A metric on shape space with explicit geodesics.
Atti Accad. Naz. Lincei Cl. Sci. Fis. Mat. Natur. 19(1), 25–57 (2008)
Zhang, Z., Su, J., Klassen, E., Le, H., Srivastava, A.: Video-based action recognition using rate-
invariant analysis of covariance trajectories. arXiv preprint arXiv:1503.06699 (2015)
Zhang, Z., Klassen, E., Srivastava, A.: Phase-amplitude separation and modeling of spherical
trajectories. J. Comput. Graph. Stat. 27(1), 85–97 (2018a)
Zhang, Z., Su, J., Klassen, E., Le, H., Srivastava, A.: Rate-invariant analysis of covariance
trajectories. J. Math. Imag. Vis. 60(8), 1306–1323 (2018b)
40 An Overview of SaT Segmentation Methodology and Its Applications in Image Processing
Contents
Introduction 1386
SaT Methodology 1389
SaT-Based Methods and Applications 1392
  T-ROF Method 1392
  Two-Stage Method for Poisson or Gamma Noise 1393
  SLaT Method for Color Images 1396
  Two-Stage Method for Hyperspectral Images 1398
  Tight-Frame-Based Method for Images with Vascular Structures 1400
  Wavelet-Based Segmentation Method for Spherical Images 1401
  Three-Stage Method for Images with Intensity Inhomogeneity 1403
Conclusions 1405
References 1406
Abstract
X. Cai
School of Electronics and Computer Science, University of Southampton, Southampton, UK
e-mail: [email protected]
R. Chan
Department of Mathematics, College of Science, City University of Hong Kong, Kowloon Tong,
Hong Kong, China
e-mail: [email protected]
T. Zeng
Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China
e-mail: [email protected]
Keywords
Introduction
Assuming that Ω = \bigcup_{i=0}^{K-1} Ω_i with pairwise disjoint sets Ω_i and constant functions
u(x) ≡ m_i on Ω_i, i = 0, . . . , K − 1, model (2) can be rewritten as

E_{PCMS}(\Omega, m) = \frac{1}{2} \sum_{i=0}^{K-1} \mathrm{Per}(\Omega_i; \Omega) + \lambda \sum_{i=0}^{K-1} \int_{\Omega_i} (m_i - f)^2\, dx, \qquad (3)

where \Omega := \{\Omega_i\}_{i=0}^{K-1}, m := \{m_i\}_{i=0}^{K-1}, and \mathrm{Per}(\Omega_i; \Omega) denotes the perimeter of \Omega_i
in Ω. If the number of phases is two, i.e., K = 2, the PCMS model is the model of
the active contours without edges (Chan-Vese model) (Chan and Vese 2001),
E_{CV}(\Omega_1, m_0, m_1) = \mathrm{Per}(\Omega_1; \Omega) + \lambda \Big( \int_{\Omega_1} (m_1 - f)^2\, dx + \int_{\Omega \setminus \Omega_1} (m_0 - f)^2\, dx \Big). \qquad (4)
In Chan and Vese (2001), the authors proposed a method to solve (4); however, it can easily get
stuck in local minima. To overcome this drawback, a convex relaxation approach
was proposed in Chan et al. (2006a). More precisely, it was shown that a global
minimizer of ECV (·, m0 , m1 ) for fixed m0 , m1 can be found by solving
\bar{u} = \arg\min_{u \in BV(\Omega)} \Big\{ TV(u) + \lambda \int_\Omega \big( (m_0 - f)^2 - (m_1 - f)^2 \big)\, u\, dx \Big\}, \qquad (5)
and setting Ω1 := {x ∈ Ω : ū(x) > ρ} for any choice of ρ ∈ [0, 1); see also
Bellettini et al. (1991) and Bresson et al. (2007). Note that the first term of (5)
is known as the total variation (T V ) and the space BV is the space of functions
of bounded variation; see Section 2 for the definition. In other words, (5) is a
tight relaxation of the Chan-Vese model with fixed m0 and m1 . For the convex
formulation of the full model (4), see Brown et al. (2012).
There are many other approaches for two-phase image segmentation based on
the Chan-Vese model and its convex version; see, e.g., Zhang et al. (2008), Bresson
et al. (2007), Dong et al. (2010), and Bauer et al. (2017). In particular, a hybrid level
set method was proposed in Zhang et al. (2008), which replaces the first term of
(4) by a boundary feature map and the data fidelity terms in (4) by the difference
between the given image f and a fixed threshold chosen by a user or a specialist.
The method of Zhang et al. (2008) was used in medical image segmentation. However,
since it requires the user to choose a proper threshold each time, it is
not automatic and thus its applications are restricted. In Bresson et al. (2007), the
T V term of (5) was replaced by a weighted T V term which helps the new model
to capture much more important geometric properties. In Dong et al. (2010), the
T V term of (5) was replaced by a wavelet frame decomposition operator which,
similar to the model in Bresson et al. (2007), can also capture important geometric
properties. Nevertheless, for its solution u, no conclusions similar to the ones in
Chan et al. (2006a) can be drawn; that is, there is no theory guaranteeing that its
segmentation result Ω_1 = {x : u(x) > ρ} for ρ ∈ [0, 1) is a minimizer of some
objective functional. In Bauer et al. (2017), the Chan-Vese model was extended
for 3D biopore segmentation in tomographic images.
In Vese and Chan (2002), Chan and Vese proposed a multiphase segmentation
model based on the PCMS model using level sets. However, this method can also
get stuck easily in local minima. Convex (non-tight) relaxation approaches for the
PCMS model were proposed, which basically focus on solving
\min_{m_i,\, u_i \in [0,1]} \Big\{ \sum_{i=0}^{K-1} \int_\Omega |\nabla u_i|\, dx + \lambda \sum_{i=0}^{K-1} \int_\Omega (m_i - f)^2 u_i\, dx \Big\}, \quad \text{s.t. } \sum_{i=0}^{K-1} u_i = 1. \qquad (6)
For more detail along this line, refer, e.g., to Bar et al. (2011), Cai (2015), Cai et al.
(2015), Lellmann and Schnörr (2011), Li et al. (2010), Pock et al. (2009a), Yuan
et al. (2010b), Zach et al. (2008) and the references therein.
In 1992, Rudin, Osher, and Fatemi (Rudin et al. 1992) proposed the variational
model
\min_{u \in BV(\Omega)} \Big\{ TV(u) + \frac{\mu}{2} \int_\Omega (u - f)^2\, dx \Big\}, \quad \mu > 0, \qquad (7)
which has been studied extensively in the literature; see, e.g., Chambolle (2005),
Chambolle et al. (2010), Chan et al. (2006b) and references therein.
A subtle connection between image segmentation and image restoration was
raised in Cai et al. (2013b). In detail, a two-stage image segmentation method – the
SaT method – is proposed, which finds the solution of a convex variant of the
Mumford-Shah model in the first stage, followed by a thresholding step in the
second one. The convex minimization functional in the first stage (the smoothing
stage) is the ROF functional (7) plus an additional smoothing term \int_\Omega |\nabla u|^2\, dx. In
Cai et al. (2019), a linkage between the PCMS and ROF models was shown, which
gives rise to a new image segmentation paradigm: manipulating image segmentation
through image restoration plus thresholding. This is also the essence of the SaT
segmentation methodology.
The remainder of this article is organized as follows. Firstly, the SaT segmen-
tation and its advantages are introduced. After that, more SaT-based methods and
applications are presented and demonstrated, followed by a brief conclusion.
SaT Methodology
The main procedures of the SaT segmentation methodology are first smoothing
and then thresholding. The smoothing step is executed by solving a pertinent
convex objective functional (note that most segmentation models in the literature are
nonconvex and therefore much harder to handle than convex models), and
the thresholding step is completed by simply thresholding the result of the smoothing
step using proper thresholds; an instance is given below.
The smoothing process in Cai et al. (2013b) is to solve the convex minimization
problem (cf. the non-smooth Mumford-Shah functional (1)):
\inf_{g \in W^{1,2}(\Omega)} \Big\{ \frac{\mu}{2} \int_\Omega (f - Ag)^2\, dx + \frac{\lambda}{2} \int_\Omega |\nabla g|^2\, dx + \int_\Omega |\nabla g|\, dx \Big\}, \qquad (8)
where λ and μ are positive parameters and A is the blurring operator if the observed
image is blurred by A or the identity operator if there is no blurring. The minimizer
of (8) is a smoothed approximation of f . The first term in (8) is the data-fitting term,
the second term ensures smoothness of the minimizer, and the third term ensures
regularity of the level sets of the minimizer. We emphasize that model (8) can be
minimized quickly by using currently available efficient algorithms such as the split-
Bregman algorithm (Goldstein and Osher 2009) or the Chambolle-Pock method
(Chambolle and Pock 2011). After we have obtained g in (8), assume we are given
the thresholds
linear operator from L2 (Ω) to itself and Ker(A) is the kernel of A. Then (8) has a
unique minimizer g ∈ W 1,2 (Ω).
Figures 1, 2, and 3 illustrate the SaT framework using the two-phase segmenta-
tion strategy in Cai et al. (2013b).
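To make the two-stage pipeline concrete, here is a minimal Python sketch of a SaT-style segmentation. It is only a stand-in for the method of Cai et al. (2013b): the smoothing stage is replaced by an off-the-shelf ROF-type denoiser (corresponding to λ = 0 and A = identity in (8)), and the thresholds are obtained by K-means clustering of the smoothed intensities, one common choice; all parameter values are arbitrary.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle
from sklearn.cluster import KMeans

def sat_segment(f, n_phases=2, weight=0.1):
    """Two-stage 'smoothing and thresholding' sketch.

    Stage 1: ROF-type smoothing of the (possibly noisy) image f.
    Stage 2: cluster the smoothed intensities into n_phases groups with
    K-means and threshold at midpoints between the sorted cluster centers."""
    g = denoise_tv_chambolle(f, weight=weight)        # smoothing stage
    km = KMeans(n_clusters=n_phases, n_init=10).fit(g.reshape(-1, 1))
    centers = np.sort(km.cluster_centers_.ravel())
    thresholds = 0.5 * (centers[:-1] + centers[1:])   # rho_1 < ... < rho_{K-1}
    labels = np.digitize(g, thresholds)               # phase index per pixel
    return g, labels

# usage: g, labels = sat_segment(noisy_image, n_phases=4)
```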
Fig. 1 Segmentations with Gaussian noise and blur. (a) Given binary image; (b) degraded image
with motion blur (for the motion blur, the motion is vertical and the filter size is 15) and Gaussian
noise (with mean 10−3 and variance 2 × 10−3 ); (c) Chan-Vese method (Chan and Vese 2001); and
(d) SaT segmentation with K-means thresholding (Cai et al. 2013b)
Fig. 2 SaT segmentation framework illustration using a two-phase segmentation example. (a)
Given image (size 384×480); (b) obtained smoothed image (i.e., a solution of the convex model in
Cai et al. 2013b); (c) segmentation result (boundary highlighted in yellow color) after thresholding
(b) using threshold 0.2. Particularly, (b) and (c) correspond to the first and second steps in the SaT
segmentation framework, respectively
The good performance of the SaT approach is solidly backed up by theory. If we set the
parameter λ in (8) to zero, one can show (see Cai and Steidl 2013 and Cai et al.
2019) that the SaT method is equivalent to the famous Chan-Vese segmentation
method (Chan and Vese 2001), which is a simplified Mumford-Shah model.
Furthermore, numerical experiments show that a properly selected λ can usually
increase segmentation accuracies.
The SaT method is very efficient and flexible. It performs excellently for
degraded images (e.g., noisy and blurry images and images with information loss). It
also has the following advantages. Firstly, the smoothing model (8) is strictly
convex. This guarantees a unique solution of (8), which can be solved efficiently
by many optimization methods. Secondly, the thresholding step is independent of
the smoothing step. Therefore, the SaT approach is capable of segmentations with
arbitrary phases, and one can easily try different thresholds without recalculating
(8). On the contrary, for other segmentation methods, the number of phases K has
to be determined before the calculation, and it is usually computationally expensive
to regenerate a different segmentation if K changes. Thirdly, the SaT approach is
very flexible. One can easily modify the smoothing step to better segment images
with specific properties.
The SaT segmentation methodology has been used for images corrupted by
Poisson and Gamma noises (Chan et al. 2014), degraded color images (Cai et al.
2017), images with intensity inhomogeneity (Chan et al. 2019), hyperspectral
images (Chan et al. 2020), vascular structures (Cai et al. 2011, 2013a), spherical
images (Cai et al. 2020), etc.
T-ROF Method
In Cai and Steidl (2013) and Cai et al. (2019), the thresholded-ROF (T-ROF) method
was proposed. It highlights a relationship between the PCMS model (3) and the
ROF model (7), proving that thresholding the minimizer of the ROF model leads to
a partial minimizer of the PCMS model when K = 2 (Chan-Vese model (4)), which
remains true under specific assumptions when K > 2.
Theorem 2 (Relation between ROF and PCMS models for K = 2). Let K = 2
and u^* ∈ BV(Ω) solve the ROF model (7). For given 0 < m_0 < m_1 ≤ 1, let
\tilde{\Sigma} := \{ x ∈ Ω : u^*(x) > \tfrac{m_1 + m_0}{2} \} fulfill 0 < |\tilde{\Sigma}| < |Ω|. Then \tilde{\Sigma} is a minimizer of
the PCMS model (4) for λ := \tfrac{\mu}{2(m_1 - m_0)} and fixed m_0, m_1. In particular, (\tilde{\Sigma}, m_0, m_1)
is a partial minimizer of (4) if m_0 = \mathrm{mean}_f(Ω \setminus \tilde{\Sigma}) and m_1 = \mathrm{mean}_f(\tilde{\Sigma}).
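As a concrete two-phase illustration of Theorem 2, the sketch below computes an approximate ROF minimizer with an off-the-shelf solver and thresholds it at (m_0 + m_1)/2, updating m_0 and m_1 as region means of f; it is only a stand-in for the T-ROF algorithm of Cai et al. (2019), and the correspondence between the solver's weight parameter and μ is not made precise here.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def trof_two_phase(f, weight=0.2, iters=5):
    """Threshold an ROF-type minimizer at (m0 + m1)/2 (cf. Theorem 2),
    alternating with updates of m0 and m1 as region means of f."""
    u = denoise_tv_chambolle(f, weight=weight)      # approximate ROF solution
    m0, m1 = f.min(), f.max()                       # crude initialization
    for _ in range(iters):
        sigma = u > 0.5 * (m0 + m1)                 # thresholded region
        if sigma.all() or (~sigma).all():
            break                                   # degenerate split
        m0, m1 = f[~sigma].mean(), f[sigma].mean()
    return sigma, (m0, m1)
```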
This linkage between the PCMS model and the ROF model validates the effec-
tiveness of the proposed SaT method in Cai et al. (2013b) for image segmentation.
Due to the significance of the PCMS model and ROF model, respectively, in image
segmentation and image restoration, this linkage bridges to some extent these
two research areas and might serve as a motivation to improve and design better
Fig. 3 Four-phase segmentation. (a) Clean 256 × 256 image; (b) given noisy image (Gaussian
noise with zero mean and variance 0.03); (c)–(e) results of methods Li et al. (2010), Sandberg
et al. (2010) and Yuan et al. (2010b), respectively; (f) obtained smoothed image (i.e., a solution
of the convex model in Cai et al. (2013b)); (g) segmentation result after thresholding (f) using
thresholds ρ1 = 0.1652, ρ2 = 0.4978, ρ3 = 0.8319; (h)–(k) boundary of each phase of the result
in (g)
The Poisson noise and the multiplicative Gamma noise are firstly recalled below.
For the Poisson noise, for each pixel x ∈ Ω, we assume that the intensity f (x)
is a random variable following the Poisson distribution with mean g(x), i.e., its
probability mass function is
p_{f(x)}(n; g(x)) = \frac{(g(x))^n\, e^{-g(x)}}{n!},
Fig. 4 Retina image segmentation which contains extremely thin vessels (size 584 × 565). (a)
Clean image; (b) noisy image; (c)–(h) results of methods (Li et al. 2010; Pock et al. 2009a; Yuan
et al. 2010b; He et al. 2012; Cai et al. 2013b) and the T-ROF method (Cai et al. 2019), respectively
where n is the intensity of f at the pixel x. In this case, we say that f is corrupted by
Poisson noise. For the Gamma noise, suppose that for each pixel x ∈ Ω the random
variable η(x) follows the Gamma distribution, i.e., its probability density function is
p_{\eta(x)}(y; \theta, K) = \frac{1}{\theta^K\, \Gamma(K)}\, y^{K-1} e^{-y/\theta} \quad \text{for } y \ge 0, \qquad (9)
where Γ is the usual Gamma-function and θ and K denote the scale and shape
parameters in the Gamma distribution, respectively. Notice that the mean of η(x) is
where β is a parameter. If the noise follows the Poisson distribution, then maximiz-
ing p(g|f ) corresponds to minimizing the functional
\int_\Omega (g - f \log g)\, dx + \beta \int_\Omega |\nabla g|\, dx \qquad (10)
(see Le et al. 2007). If the noise is multiplicative following the Gamma distribution,
then maximizing p(g|f) corresponds to minimizing the functional

\int_\Omega \Big( \frac{f}{g} + \log g \Big)\, dx + \beta \int_\Omega |\nabla g|\, dx \qquad (11)
(see Aubert and Aujol 2008). However, it is observed in the numerical examples in
Aubert and Aujol (2008) and Shi and Osher (2008) that for the denoising model (11)
the noise survives much longer at low image values if we increase the regularization
parameter. Therefore, in Shi and Osher (2008), the authors suggested to take w =
log g and change the objective functional (11) to
$$\int_\Omega \big(f e^{-w} + w\big)\,dx + \beta \int_\Omega |\nabla w|\,dx. \tag{12}$$
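As a quick sanity check of the reparametrization w = log g, the snippet below verifies numerically that the data term of (11) evaluated at g agrees pointwise with the data term of (12) evaluated at w; only the regularizer changes, from |∇g| to |∇w|. The toy arrays are purely illustrative.

```python
import numpy as np

# Pointwise check: f/g + log(g) equals f*exp(-w) + w when w = log(g).
rng = np.random.default_rng(0)
f = rng.gamma(shape=5.0, scale=0.2, size=(64, 64))   # toy positive "observed" image
g = rng.uniform(0.5, 2.0, size=(64, 64))             # toy positive candidate image
w = np.log(g)

fid_gamma = f / g + np.log(g)        # integrand of the fidelity in (11)
fid_logged = f * np.exp(-w) + w      # integrand of the fidelity in (12) at w = log g
assert np.allclose(fid_gamma, fid_logged)
```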
In Chan et al. (2014), a two-stage method for segmenting blurry images in the
presence of Poisson or multiplicative Gamma noise is proposed. It was inspired by
the SaT segmentation method in Cai et al. (2013b) and the Gamma noise denoising
method in Steidl and Teuber (2010). Specifically, the data fidelity term of the model
(8) at the first stage of the SaT segmentation method in Cai et al. (2013b) was
replaced by the one which is suitable for Gamma noise, i.e.,
$$\inf_{g \in W^{1,2}(\Omega)} \Big\{ \mu \int_\Omega (Ag - f\log Ag)\,dx + \frac{\lambda}{2}\int_\Omega |\nabla g|^2\,dx + \int_\Omega |\nabla g|\,dx \Big\}. \tag{13}$$
Fig. 5 Segmentations of a fractal image corrupted with Gamma noise and blur. (a) Degraded
image; (b)–(d) results of methods (Yuan et al. 2010a; Dong et al. 2011), and SaT with user-
provided thresholds (Chan et al. 2014), respectively. For clarity, only the top-left corner of
the segmentations is shown. We see that the SaT method produces the best result, with the
segmentation line (the yellow line) very close to the real boundary
Then at the second stage the solution g is thresholded to reveal different segmenta-
tion features.
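The thresholding stage itself is a simple pointwise operation. The sketch below, with a hypothetical helper `threshold_phases`, shows one way to map a smoothed image into K phases from given thresholds; the numbers in the usage comment are those reported in Fig. 3.

```python
import numpy as np

def threshold_phases(g, thresholds):
    """Second-stage sketch: map the smoothed image g into K = len(thresholds) + 1
    phases by comparing each pixel against the sorted thresholds
    (np.digitize convention: the label says which interval g falls into)."""
    return np.digitize(g, bins=np.sort(np.asarray(thresholds)))

# For the four-phase example of Fig. 3 one would call, e.g.,
# labels = threshold_phases(g, [0.1652, 0.4978, 0.8319])
```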
The following Theorems 3 and 4 assure that model (13) has a unique minimizer with an identity or a blurring operator A.
Figure 5 gives an example which shows the great performance of the SaT-based
method (Chan et al. 2014) for images with multiplicative Gamma noise.
Color image segmentation has been studied extensively (Chan et al. 2000; Cremers et al. 2007; Jung et al. 2007; Kay et al. 2009; Martin et al. 2001; Pock et al. 2009a; Storath and Weinmann 2014), among others. It is often mentioned
that the RGB color space is not well adapted to segmentation because for real-world
images the R, G, and B channels can be highly correlated. In Rotaru et al. (2008),
RGB images are transformed into HSI (hue, saturation, and intensity) color space
in order to perform segmentation. In Benninghoff and Garcke (2014), a general
segmentation approach was developed for gray-value images and further extended
to color images in the RGB, the HSV (hue, saturation, and value), and the CB
(chromaticity-brightness) color spaces. However, a study on this point in Paschos
(2001) has shown that the Lab (perceived lightness, red-green, and yellow-blue)
color space defined by the CIE (Commission Internationale de l’Eclairage) is better
adapted for color image segmentation than the RGB and the HSI color spaces.
In Cardelino et al. (2013), RGB input images were first converted to Lab space.
In Wang et al. (2015), color features were described using the Lab color space and
texture using histograms in RGB space.
A careful examination of the methods that transform a given RGB image to
another color space (HSI, CB, Lab, etc.) before performing the segmentation task
has shown that these algorithms are always applied only to noise-free RGB images
(though these images unavoidably contain quantization and compression noise). For
instance, this is the case of Benninghoff and Garcke (2014), Cardelino et al. (2013),
Rotaru et al. (2008) and Wang et al. (2015), among others. One of the main reasons
is that if the input RGB image is degraded, the degradation would be hard to control
after a transformation to another color space (Paschos 2001).
A color image is usually represented by a vector valued function f =
(f1 , f2 , f3 ) : Ω → R3 , where the components f1 , f2 , and f3 generally
represent red, green, and blue channels, respectively. The difficulty for color
image segmentation partly comes from the strong interchannel correlation. A novel
extension of the SaT approach is the smoothing, lifting, and thresholding (SLaT)
method introduced in Cai et al. (2017), which is able to work on vector-valued
(color) images possibly corrupted with noise, blur, and missing data. One first
solves (8) for the three components f1 , f2 , and f3 to obtain three smooth functions
g1 , g2 , and g3 . Then one transforms (g1 , g2 , g3 ) to another color space (ḡ1 , ḡ2 , ḡ3 )
which can reduce interchannel correlation. This is the lifting process, and the
Lab color space is usually a good choice. In the thresholding step, one performs
K-means to threshold the lifted image with 6 channels (g1 , g2 , g3 , ḡ1 , ḡ2 , ḡ3 ) to get
the phases.
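As a rough illustration of this three-step pipeline, the sketch below uses a TV denoiser in place of model (8), scikit-image's `rgb2lab` for the lifting, and scikit-learn's `KMeans` for the thresholding step; the helper name, the channel normalization, and the parameter values are assumptions for illustration only.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.restoration import denoise_tv_chambolle   # surrogate for the smoothing model (8)
from sklearn.cluster import KMeans

def slat_segment(f_rgb, n_phases=4, weight=0.1):
    """Minimal SLaT sketch for an RGB image with values in [0, 1]:
    (i) smooth each channel (TV denoiser standing in for model (8)),
    (ii) lift by appending the Lab transform of the smoothed image,
    (iii) threshold the resulting 6-channel image with K-means."""
    g = np.stack([denoise_tv_chambolle(f_rgb[..., c], weight=weight)
                  for c in range(3)], axis=-1)                         # smoothing
    g_bar = rgb2lab(np.clip(g, 0.0, 1.0))                              # lifting to Lab
    feats = np.concatenate([g, g_bar], axis=-1).reshape(-1, 6)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)  # simple normalization
    labels = KMeans(n_clusters=n_phases, n_init=10).fit_predict(feats) # thresholding
    return labels.reshape(f_rgb.shape[:2])
```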
In Cai et al. (2017), model (8) was also extended to tackle information loss and both Gaussian and Poisson noise. In particular, the existence and uniqueness of the minimizer of this extended model were also proved.
This SLaT method is easy to implement and gives promising results; see Fig. 6, with images chosen from the Berkeley Segmentation Dataset and Benchmark (https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/). Moreover, the SLaT method is able to segment color images corrupted by noise, blur, or loss of pixel information. More experimental results in Cai et al. (2017) on
Fig. 6 Color image segmentation for degraded images. First row: degraded color images (the first
three images are degraded by various noise and blur, and the last two images are degraded by 60%
information loss and noise). Second row: Pock et al. (2009a). Third row: SLaT method (Cai et al.
2017)
RGB images coupled with the Lab secondary color space demonstrate that the method gives much better segmentation results for degraded images than several state-of-the-art segmentation models, in terms of both quality and CPU time.
Remotely sensed hyperspectral images are images taken from drones, airplanes,
or satellites that record a wide range of the electromagnetic spectrum, typically more
than 100 spectral bands from visible to near-infrared wavelengths. Since different
materials reflect different spectral signatures, one can identify the materials at each
pixel of the image by examining its spectral signatures. Hyperspectral images are
used in many applications, including agriculture (Patel et al. 2001; Datt et al. 2003),
disaster relief (Eismann et al. 2009), food safety (Gowen et al. 2007), military
(Manolakis and Shaw 2002; Stein et al. 2002), and mineralogy (Hörig et al. 2001).
One of the most important problems in hyperspectral data exploitation is
hyperspectral image classification. It has been an active research topic in the past
decades (Fauvel et al. 2013). The pixels in the hyperspectral image are often
labeled manually by experts based on careful review of the spectral signatures and
investigation of the scene. Given these ground-truth labels of some pixels (also
called “training pixels”), the objective of hyperspectral image classification is to
assign labels to part or all of the remaining pixels (the “testing pixels”) based on
their spectral signatures and their locations.
In Chan et al. (2020), a two-stage method was proposed based on the SaT method
(Cai et al. 2013b) for hyperspectral image classification. Pixel-wise classifiers, such
as the classical support vector machine (SVM), consider spectral information only.
As spatial information is not utilized, the classification results are not optimal,
and the classified image may appear noisy. Many existing methods, such as
morphological profiles, superpixel segmentation, and composite kernels, exploit the
spatial information. The approach of Chan et al. (2020) proceeds in two stages: in the first stage, SVMs are used to estimate the class probability for each pixel; in
the second stage, the SaT model is applied to each probability map to denoise and
segment the image into different classes. The proposed method effectively utilizes
both spectral and spatial information of the datasets and is fast as only convex
minimization is needed in addition to the SVMs.
We emphasize that the convex model used in Chan et al. (2020) is the model (8) at the first stage of the SaT segmentation method in Cai et al. (2013b), with a constraint on the training pixels, i.e.,
$$\inf_{g_k} \Big\{ \frac{\mu}{2}\int_\Omega (f_k - Ag_k)^2\,dx + \frac{\lambda}{2}\int_\Omega |\nabla g_k|^2\,dx + \int_\Omega |\nabla g_k|\,dx \Big\},$$
where $f_k$ represents the probability map of the $k$th class obtained from stage one using the SVM method, $g_k$ is the improved probability map of the $k$th class, and $\Omega_{\text{train}}$ is the set of training pixels on which the constraint is imposed. After obtaining $g_k$, $k = 1, \ldots, K$, each pixel $x$ is assigned to the class $k$ attaining the maximum value among $g_k(x)$, $k = 1, \ldots, K$. Note that this second stage follows the SaT strategy.
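A minimal sketch of this second stage is given below; it stands a TV denoiser in for the convex model above, omits the constraint on the training pixels, and uses an illustrative function name.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle   # surrogate for the convex model above

def stage_two_labels(prob_maps, weight=0.1):
    """Sketch of the second stage: prob_maps has shape (K, H, W) and holds the
    SVM class-probability maps f_k from stage one.  Each map is smoothed (a TV
    denoiser standing in for the convex model, without the training-pixel
    constraint), and every pixel is assigned to the class whose improved map
    g_k attains the maximum value."""
    g = np.stack([denoise_tv_chambolle(p, weight=weight) for p in prob_maps])
    return np.argmax(g, axis=0)
```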
Figure 7 gives an example which shows the great performance of the two-stage
method (Chan et al. 2020) for hyperspectral image classification. For more detail,
please refer to Chan et al. (2020).
Fig. 7 Hyperspectral image classification of the Indian Pines dataset. (a) Ground truth, (b) training
set (10% of total pixels), and (c) classification with SaT (Chan et al. 2020) (98.83% overall
accuracy)
$$f^{(i+\frac{1}{2})} = U\big(f^{(i)}\big), \tag{15}$$
$$f^{(i+1)} = A^T\, T_\lambda\big(A f^{(i+\frac{1}{2})}\big), \quad i = 1, 2, \ldots. \tag{16}$$
$$t_{\lambda_k}(v_k) \equiv \begin{cases} \operatorname{sgn}(v_k)\,(|v_k| - \lambda_k), & \text{if } |v_k| > \lambda_k,\\ 0, & \text{if } |v_k| \le \lambda_k. \end{cases} \tag{17}$$
Let $P^{(i+1)}$ be the diagonal matrix where the diagonal entry is 1 if the corresponding index is in $\Lambda^{(i+1)}$ and 0 otherwise. Then
$$f^{(i+1)} \equiv \big(I - P^{(i+1)}\big) f^{(i+\frac{1}{2})} + P^{(i+1)} A^T\, T_\lambda\big(A f^{(i+\frac{1}{2})}\big). \tag{18}$$
By reordering the entries of the vector f (i+1) into columns, we obtain the image
f (i+1) . We remark that the effect of (18) is to denoise and smooth the image on
Λ(i+1) .
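The componentwise rule (17), applied inside T_λ in (16) and (18), is the standard soft-thresholding of frame coefficients; a one-line sketch:

```python
import numpy as np

def soft_threshold(v, lam):
    """Componentwise soft-thresholding rule of (17): coefficients with magnitude
    at most lam are set to zero, the rest are shrunk towards zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```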
Figures 8 and 9 give examples which show the great performance of the tight-
frame-based method (Cai et al. 2013a) for images with tube-like structures. For
more detail, please refer to Cai et al. (2013a).
Fig. 8 Segmentation of the kidney volume dataset. (a) Given CTA image; (b) CURVES segmen-
tation (Lorigo et al. 2001); (c) ADA segmentation (Franchini et al. 2009); (d) tight-frame-based
method (Cai et al. 2013a)
Fig. 9 Segmentation of the brain volume dataset. (a) Given MRA image; (b) CURVES segmen-
tation (Lorigo et al. 2001); (c) ADA segmentation (Franchini et al. 2010); (d) tight-frame-based
method (Cai et al. 2013a)
Spherical images arise in many fields, for example, geophysics (Simons et al. 2011) and neuroscience (Rathi et al. 2011), where images are naturally defined on the sphere. Clearly, images defined on the sphere differ from Euclidean images in 2D and 3D in terms of symmetries, coordinate systems, and metrics (see, e.g., Li and Hai 2010).
Wavelets have become a powerful analysis tool for spherical images, due to
their ability to simultaneously extract both spectral and spatial information. A
variety of wavelet frameworks have been constructed on the sphere in recent years,
e.g., Baldi et al. (2009), McEwen et al. (2018), and have led to many insightful
scientific studies in the fields mentioned above (see McEwen et al. 2007b, Schmitt
et al. 2012, Audet 2014, Simons et al. 2011, Rathi et al. 2011). Different types
of wavelets on the sphere have been designed to probe different structures in
spherical images, for example, isotropic or directional and geometrical features,
such as linear or curvilinear structures, to mention a few. Axisymmetric wavelets
(Baldi et al. 2009; Leistedt et al. 2013) are useful for probing spherical images
with isotropic structure, directional wavelets (McEwen et al. 2018) for probing
directional structure, ridgelets (Michailovich and Rathi 2010; Starck et al. 2006)
for analyzing antipodal signals on the sphere, and curvelets (Starck et al. 2006; Chan et al. 2017) for studying highly anisotropic image content such as curve-like features (we refer to Candès and Donoho (2005) for the general definition of curvelets).
Fig. 10 Results of light probe image – the Uffizi Gallery. First row: noisy image shown on
the sphere (a) and in 2D using a Mollweide projection (b) and the zoomed-in red rectangle
area of the noisy (c) and original images (d), respectively; second to fourth rows from left to
right: results of methods K-means (e), WSSA-A (f), WSSA-D (g) with N = 6 (even N ), and
WSSA-H (h), respectively. Note that methods WSSA-A, WSSA-D, and WSSA-H are the wavelet-
based segmentation method (Cai et al. 2020), respectively, equipped with axisymmetric wavelets,
directional wavelets, and hybrid wavelets defined on the sphere
motivate the curve evolution. These models can handle the intensity inhomogeneity
to some extent.
In Li et al. (2020), a new three-stage segmentation framework was proposed
based on the SaT method and the intensity inhomogeneity information of an image.
The first stage in this framework is to perform a dimension lifting method. An
intensity inhomogeneity image is added as an additional channel, which results in
a vector-valued image. In the second stage, a SaT model is applied to each channel
of the vector-valued image to obtain a smooth approximation. The semi-proximal
alternating direction method of multipliers (sPADMM) (Han et al. 2018) is used to
Fig. 11 Segmentation results on single-channel images. In the first and third rows, the first column shows images from Alpert's dataset (size 300 × 225) and the second column the corresponding intensity inhomogeneity images. In the second and fourth rows, from the first column to the last, segmentation results of the methods in Cai et al. (2017), Li et al. (2010, 2020), Zhi and Shen (2018), Wang et al. (2009), and the ground truth
solve this model, and it is proved that the sPADMM for solving this convex model
has Q-linear convergence rate. In the last stage, a thresholding method is applied to
the smoothed vector-valued image to get the final segmentation.
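A very rough sketch of the three-stage idea follows. The inhomogeneity channel is crudely approximated by a heavy Gaussian blur (not the estimator used in Li et al. 2020), and TV denoising and K-means stand in for the sPADMM-solved model and the thresholding step; all names and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.restoration import denoise_tv_chambolle
from sklearn.cluster import KMeans

def three_stage_segment(f, n_phases=2, sigma=15.0, weight=0.1):
    """Rough sketch of the three-stage idea for a single-channel image f:
    (1) lift by appending an intensity-inhomogeneity channel (approximated
        here by a heavy Gaussian blur),
    (2) smooth each channel (TV denoiser in place of the sPADMM-solved model),
    (3) threshold the smoothed vector-valued image, here with K-means."""
    bias = gaussian_filter(f, sigma=sigma)                                   # stage 1
    channels = [denoise_tv_chambolle(c, weight=weight) for c in (f, bias)]   # stage 2
    feats = np.stack(channels, axis=-1).reshape(-1, 2)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    return KMeans(n_clusters=n_phases, n_init=10).fit_predict(feats).reshape(f.shape)  # stage 3
```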
Figure 11 shows the great performance of the three-stage method (Li et al. 2020)
incorporating intensity inhomogeneity information, and Fig. 12 demonstrates that
Li et al. (2020) provides the most accurate segmentation results in comparison with
five state-of-the-art methods including a deep learning approach (U-net method)
(Ronneberger et al. 2015). For more detail, please refer to Li et al. (2020).
Conclusions
Fig. 12 Column (a), the original images from the 100 test dataset; column (b), segmentation
results of the U-net method (Ronneberger et al. 2015); column (c), segmentation results of the
method in Li et al. (2020); and column (d), segmentation results of the method in Cai et al. (2017)
various segmentation tasks. The SaT approach connects the segmentation problem to the image restoration problem. Recent research shows that the SaT method can also be applied to classification problems. We hope that, with this article, the SaT method can reach audiences from broader areas and inspire more cross-disciplinary research.
References
Ambrosio, L., Tortorelli, V.: Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Commun. Pure Appl. Math. 43, 999–1036 (1990)
Aubert, G., Aujol, J.: A variational approach to removing multiplicative noise. SIAM J. Appl.
Math. 68, 925–946 (2008)
Audet, P.: Toward mapping the effective elastic thickness of planetary lithospheres from a
spherical wavelet analysis of gravity and topography. Phys. Earth Planet Inter. 226, 48–82
(2014)
Baldi, P., Kerkyacharian, G., Marinucci, D., Picard, D.: Asymptotics for spherical needlets. Ann.
Stat. 37(3), 1150–1171 (2009)
Bar, L., Chan, T., Chung, G., Jung, M., Kiryati, N., Mohieddine, R., Sochen, N., Vese, L.: Mumford
and Shah model and its applications to image segmentation and image restoration. In: Handbook
of Mathematical Imaging, pp. 1095–1157. Springer, New York (2011)
Bauer, B., Cai, X., Peth, S., Schladitz, K., Steidl, G.: Variational-based segmentation of biopores
in tomographic images. Comput. Geosci. 98, 1–8 (2017)
Bellettini, G., Paolini, M., Verdi, C.: Convex approximations of functionals with curvature. Math.
Appl. 2(4), 297–306 (1991)
Benninghoff, H., Garcke, H.: Efficient image segmentation and restoration using parametric curve
evolution with junctions and topology changes. SIAM J. Imag. Sci. 7(3), 1451–1483 (2014)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge, MA (1987)
Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J., Osher, S.: Fast global minimization of the
active contour/snake model. J. Math. Imag. Vis. 28(2), 151–167 (2007)
Brown, E., Chan, T., Bresson, X.: Completely convex formulation of the Chan–Vese image
segmentation model. Int. J. Comput. Vis. 98, 103–121 (2012)
Brox, T., Rousson, M., Deriche, R., Weickert, J.: Colour, texture, and motion in level set based
segmentation and tracking. Image Vis. Comput. 28, 376–390 (2010)
Cai, X.: Variational image segmentation model coupled with image restoration achievements.
Pattern Recogn. 48(6), 2029–2042 (2015)
Cai, X., Steidl, G.: Multiclass segmentation by iterated ROF thresholding. In: Energy Min-
imization Methods in Computer Vision and Pattern Recognition, pp. 237–250. Springer,
Berlin/Heidelberg (2013)
Cai, J., Chan, R., Shen, Z.: A framelet-based image inpainting algorithm. Appl. Comput. Harmon.
Anal. 24, 131–149 (2008)
Cai, X., Chan, R., Morigi, S., Sgallari, F.: Framelet-based algorithm for segmentation of tubular
structures. In: SSVM. LNCS6667. Springer (2011)
Cai, X., Chan, R., Morigi, S., Sgallari, F.: Vessel segmentation in medical imaging using a tight-
frame based algorithm. SIAM J. Imag. Sci. 6(1), 464–486 (2013a)
Cai, X., Chan, R., Zeng, T.: A two-stage image segmentation method using a convex variant of the
Mumford–Shah model and thresholding. SIAM J. Imag. Sci. 6(1), 368–390 (2013b)
Cai, X., Fitschen, J., Nikolova, M., Steidl, G., Storath, M.: Disparity and optical flow partitioning
using extended potts priors. Inf. Inference J. IMA 4, 43–62 (2015)
Cai, X., Chan, R., Nikolova, M., Zeng, T.: A three-stage approach for segmenting degraded color
images: smoothing, lifting and thresholding (SLaT). J. Sci. Comput. 72(3), 1313–1332 (2017).
https://doi.org/10.1007/s10915-017-0402-2
Cai, X., Chan, R.H., Schönlieb, C.B., Steidl, G., Zeng, T.: Linkage between piecewise constant
Mumford–Shah model and Rudin–Osher–Fatemi model and its virtue in image segmentation.
SIAM J. Sci. Comput. 41(6), B1310–B1340 (2019)
Cai, X., Wallis, C.G.R., Chan, J.Y.H., McEwen, J.D.: Wavelet-based segmentation on the sphere.
Pattern Recogn. 100 (2020). https://doi.org/10.1016/j.patcog.2019.107081
Candès, E., Donoho, D.: Continuous curvelet transform: II. Discretization and frames. Appl.
Comput. Harmon. Anal. 19(2), 198–222 (2005)
Cardelino, J., Caselles, V., Bertalmio, M., Randall, G.: A contrario selection of optimal partitions
for image segmentation. SIAM J. Imag. Sci. 6(3), 1274–1317 (2013)
Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A.,
Vemuri, B.C., Yuille, A.L. (eds.) Energy Minimization Methods in Computer Vision and Pattern
Recognition – EMMCVPR 2005. Lecture Notes in Computer Science, vol. 3757, pp. 136–152.
Springer, Berlin (2005)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)
Chambolle, A., Caselles, V., Novaga, M., Cremers, D., Pock, T.: An introduction to total variation
for image analysis. Theor. Found. Numer. Methods Sparse Recover. Radon Ser. Comput. Appl.
Math. 9, 263–340 (2010)
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277
(2001)
Chan, T.F., Sandberg, B.Y., Vese, L.A.: Active contours without edges for vector-valued images.
J. Vis. Commun. Image Represent. 11(2), 130–141 (2000)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006a)
Chan, T.F., Esedoglu, S., Park, F., Yip, A.: Total variation image restoration: overview and recent
developments. In: Handbook of Mathematical Models in Computer Vision, pp. 17–31. Springer,
New York (2006b)
Chan, R., Yang, H., Zeng, T.: A two-stage image segmentation method for blurry images with
Poisson or multiplicative Gamma noise. SIAM J. Imag. Sci. 7(1), 98–127 (2014)
Chan, J., Leistedt, B., Kitching, T., McEwen, J.D.: Second-generation curvelets on the sphere.
IEEE Trans. Sig. Proc. 65(1), 5–14 (2017)
Chan, R., Yang, H., Zeng, T.: Total Variation and Tight Frame Image Segmentation with Intensity
Inhomogeneity (2019). arXiv e-prints arXiv:1904.01760
Chan, R., Kan, K.K., Nikolova, M., Plemmons, R.J.: A two-stage method for spectral-spatial
classification of hyperspectral images. J. Math. Imag. Vis. 62, 790–807 (2020)
Chapman, B., Parker, D., Stapelton, J., Parker, D.: Intracranial vessel segmentation from time-of-
flight mra using pre-processing of the mip z-buffer: accuracy of the ZBS algorithm. Med. Image
Anal. 8(2), 113–126 (2004)
Chen, J., Amini, A.: Quantifying 3d vascular structures in mra images using hybrid pde and
geometric deformable models. IEEE Trans. Med. Imag. 23(10), 1251–1262 (2004)
Cremers, D., Rousson, M., Deriche, R.: A review of statistical approaches to level set segmentation:
integrating color, texture, motion and shape. Int. J. Comput. Vis. 72(2), 195–215 (2007)
Datt, B., McVicar, T., Van Niel, T., Jupp, D., Pearlman, J.: Preprocessing eo-1 hyperion hyperspec-
tral data to support the application of agricultural indexes. IEEE Trans. Geosci. Remote Sens.
41(6), 1246–1259 (2003)
Dong, B., Chien, A., Shen, Z.: Frame based segmentation for medical images. Commun. Math.
Sci. 32, 1724–1739 (2010)
Dong, B., Chien, A., Shen, Z.: Frame based segmentation for medical images. Commun. Math.
Sci. 9(2), 551–559 (2011)
Durand, S., Fadili, J., Nikolova, M.: Multiplicative noise removal using l1 fidelity on frame
coefficients. J. Math. Imag. Vis. 38, 201–226 (2010)
Eismann, M., Stocker, A., Nasrabadi, N.: Automated hyperspectral cueing for civilian search and
rescue. Proc. IEEE 97(6), 1031–1055 (2009)
Fauvel, M., Tarabalka, Y., Benediktsson, J., Chanussot, J., Tilton, J.: Advances in spectral-spatial
classification of hyperspectral images. Proc. IEEE 101(3), 652–675 (2013)
Franchini, E., Morigi, S., Sgallari, F.: Composed segmentation of tubular structures by an
anisotropic pde model. In: Tai, X.-C., et al. (eds.) SSVM 2009. LNCS5567, pp. 75–86
(2009)
Franchini, E., Morigi, S., Sgallari, F.: Segmentation of 3D tubular structures by a PDE-based
anisotropic diffusion model. In: Dæhlen, M., et al. (eds.) MMCS 2008. LNCS5862, pp. 224–241
(2010)
Ge, Q., Liang, X., Wang, L., Zhang, Z., Wei, Z.: A hybrid active contour model with structured
feature for image segmentation. Sig. Process 108, 147–158 (2015)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of
images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Goldstein, T., Osher, S.: The split Bregman method for l1-regularized problems. SIAM J. Imag.
Sci. 2(2), 323–343 (2009)
Gooya, A., Liao, H., et al.: A variational method for geometric regularization of vascular
segmentation in medical images. IEEE Trans. Image Process. 17(8), 1295–1312 (2008)
Gowen, A., O’Donnell, C., Cullen, P., Downey, G., Frias, J.: Hyperspectral imaging-an emerging
process analytical tool for food quality and safety control. Trends Food Sci. Technol. 18(12),
590–598 (2007)
Han, D., Sun, D., Zhang, L.: Linear rate convergence of the alternating direction method of
multipliers for convex composite programming. Math. Oper. Res. 43(2), 622–637 (2018)
He, Y., Hussaini, M.Y., Ma, J., Shafei, B., Steidl, G.: A new fuzzy c-means method with total
variation regularization for image segmentation of images with noisy and incomplete data.
Pattern Recogn. 45, 3463–3471 (2012)
Hörig, B., Kühn, F., Oschütz, F., Lehmann, F.: Hymap hyperspectral remote sensing to detect
hydrocarbons. Int. J. Remote Sens. 22(8), 1413–1422 (2001)
Jung, Y.M., Kang, S.H., Shen, J.: Multiphase image segmentation via Modica-Mortola phase
transition. SIAM J. Appl. Math. 67(5), 1213–1232 (2007)
Kay, D., Tomasi, A., et al.: Color image segmentation by the vector-valued Allen–Cahn phase-field
model: a multigrid solution. IEEE Trans. Image Process. 18(10), 2330–2339 (2009)
Kim, W., Kim, C.: Active contours driven by the salient edge energy model. IEEE Trans. Image
Process. 22, 1667–1673 (2013)
Kirbas, C., Quek, F.: A review of vessel extraction techniques and algorithms. CV Comput. Surv.
36, 81–121 (2004)
Krissian, K., Malandain, G., Ayache, N., Vaillant, R., Trousset, Y.: Model-based detection of
tubular structures in 3d images. CVIU 80, 130–171 (2000)
Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by
Poisson noise. J. Math. Imag. Vis. 27, 257–263 (2007)
Leistedt, B., McEwen, J., Vandergheynst, P., Wiaux, Y.: S2let: a code to perform fast wavelet
analysis on the sphere. Astron. Astrophys. 558(A128), 1–9 (2013)
Lellmann, J., Schnörr, C.: Continuous multiclass labeling approaches and algorithms. SIAM J.
Imag. Sci. 44(4), 1049–1096 (2011)
Li, S., Hai, Y.: A full-view spherical image format. In: ICPR, pp. 2337–2340 (2010)
Li, C., Kao, C., Gore, J., Ding, Z.: Minimization of region-scalable fitting energy for image
segmentation. IEEE Trans. Image Process. 17, 1940–1949 (2008)
Li, F., Ng, M., Zeng, T., Shen, C.: A multiphase image segmentation method based on fuzzy region
competition. SIAM J. Imag. Sci. 3(2), 277–299 (2010)
Li, X., Yang, X., Zeng, T.: A three-stage variational image segmentation framework incorporating
intensity inhomogeneity information. SIAM J. Imag. Sci. 13(3), 1692–1715 (2020)
Lorigo, L., Faugeras, O., Grimson, E., et al.: Curves: curve evolution for vessel segmentation.
Med. Image Anal. 5, 195–206 (2001)
Manolakis, D., Shaw, G.: Detection algorithms for hyperspectral imaging applications. IEEE Sig.
Process. Mag. 19(1), 29–43 (2002)
Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and
its application to evaluating segmentation algorithms and measuring ecological statistics. In:
ICCV, vol. 2, pp. 416–423 (2001)
McEwen, J., Hobson, M., Mortlock, D., Lasenby, A.: Fast directional continuous spherical wavelet
transform algorithms. IEEE Trans. Sig. Process. 55(2), 520–529 (2007a)
McEwen, J., Vielva, P., Wiaux, Y., et al.: Cosmological applications of a wavelet analysis on the
sphere. J. Fourier Anal. Appl. 13(4), 495–510 (2007b)
McEwen, J., Durastanti, C., Wiaux, Y.: Localisation of directional scale-discretised wavelets on
the sphere. Appl. Comput. Harm Anal. 44(1), 59–88 (2018)
Michailovich, O., Rathi, Y.: On approximation of orientation distributions by means of spherical
ridgelets. IEEE Trans. Sig. Proc. 19(2), 461–477 (2010)
Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. XLII, 577–685 (1989)
Paschos, G.: Perceptually uniform color spaces for color texture analysis: an empirical evaluation.
IEEE Trans. Image Process. 10(6), 932–937 (2001)
Patel, N., Patnaik, C., Dutta, S., Shekh, A., Dave, A.: Study of crop growth parameters using
airborne imaging spectrometer data. Int. J. Remote Sens. 22(12), 2401–2411 (2001)
Pock, T., Chambolle, A., Cremers, D., Bischof, H.: A convex relaxation approach for computing
minimal partitions. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009.
CVPR 2009. IEEE, pp. 810–817 (2009a)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the piecewise
smooth Mumford–Shah functional. In: ICCV (2009b)
Rathi, Y., Michailovich, O., Setsompop, K., et al.: Sparse multi-shell diffusion imaging. MICCAI,
Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. 14(2), 58–65 (2011)
Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image
segmentation. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 234–241. Springer (2015)
Rotaru, C., Graf, T., Zhang, J.: Color image segmentation in HSI space for automotive applications.
J. Real-Time Image Process. 3(4), 311–322 (2008)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60, 259–268 (1992)
Sandberg, B., Kang, S., Chan, T.: Unsupervised multiphase segmentation: a phase balancing
model. IEEE Trans. Image Process. 19, 119–130 (2010)
Schmitt, J., Starck, J., Casandjian, J., Fadili, J., Grenier, I.: Multichannel Poisson denoising and
deconvolution on the sphere: application to the Fermi Gamma-ray Space Telescope. Astron.
Astrophys. 546(A114) (2012). https://www.aanda.org/articles/aa/full_html/2012/10/aa18234-
11/aa18234-11.html
Shi, J., Osher, S.: A nonlinear inverse scale space method for a convex multiplicative noise model.
SIAM J. Imag. Sci. 1, 294–321 (2008)
Simons, F., Loris, I., Nolet, G., et al.: Solving or resolving global tomographic models with
spherical wavelets, and the scale and sparsity of seismic heterogeneity. Geophys. J. Int. 187,
969–988 (2011)
Starck, J., Moudden, Y., Abrial, P., Nguyen, M.: Wavelets, ridgelets and curvelets on the sphere.
Astron. Astrophys. 446(3), 1191–1204 (2006)
Steidl, G., Teuber, T.: Removing multiplicative noise by Douglas-Rachford splitting methods. J.
Math. Imag. Vis. 36(2), 168–184 (2010)
Stein, D., Beaven, S., Hoff, L., Winter, E., Schaum, A., Stocker, A.: Anomaly detection from
hyperspectral imagery. IEEE Sig. Process. Mag. 19(1), 58–69 (2002)
Storath, M., Weinmann, A.: Fast partitioning of vector-valued images. SIAM J. Imag. Sci. 7(3),
1826–1852 (2014)
Sum, K., Cheung, P.: Vessel extraction under non-uniform illumination: a level set approach. IEEE
Trans. Biomed. Eng. 55(1), 358–360 (2008)
Vese, L., Chan, T.: A multiphase level set framework for image segmentation using the mumford
and shah model. Int. J. Comput. Vis. 50(3), 271–293 (2002)
Wallis, C., Wiaux, Y., McEwen, J.: Sparse image reconstruction on the sphere: analysis and
synthesis. IEEE Trans. Image Process. 26(11), 5176–5187 (2017)
Wang, L., Li, C., Sun, Q., Xia, D., Kao, C.Y.: Active contours driven by local and global intensity
fitting energy with application to brain MR image segmentation. Comput. Med. Imag. Graph.
33(7), 520–531 (2009)
Wang, X., Huang, D., Xu, H.: An efficient local Chan-Vese model for image segmentation. Pattern
Recogn. 43, 603–618 (2010)
Wang, X., Tang, Y., Masnou, S., Chen, L.: A global/local affinity graph for image segmentation.
IEEE Trans. Image Process. 24(4), 1399–1411 (2015)
Yan, P., Kassim, A.: MRA image segmentation with capillary geodesic active contours. Med.
Image Anal. 10, 317–329 (2006)
Yuan, J., Bae, E., Tai, X.C.: A study on continuous max-flow and min-cut approaches. In: 2010
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2217–
2224. IEEE (2010a)
Yuan, J., Bae, E., Tai, X.C., Boykov, Y.: A continuous max-flow approach to Potts model. In:
European Conference on Computer Vision, pp. 379–392 (2010b)
Zach, C., Gallup, D., Frahm, J.-M., Niethammer, M.: Fast global labeling for real-time stereo using
multiple plane sweeps. In: Vision, Modeling, and Visualization Workshop (2008)
Zhang, Y., Matuszewski, B., Shark, L., Moore, C.: Medical image segmentation using new
hybrid level-set method. In: 2008 Fifth International Conference BioMedical Visualization:
Information Visualization in Medical and Biomedical Informatics, pp. 71–76 (2008)
Zhi, X., Shen, H.: Saliency driven region-edge-based top down level set evolution reveals the
asynchronous focus in image segmentation. Pattern Recogn. 80, 241–255 (2018)
Zonoobi, D., Kassim, A., Shen, W.: Vasculature segmentation in mra images using gradient
compensated geodesic active contours. J. Sig. Process. Syst. 54, 171–181 (2009)
41 Recent Development of Medical Shape Analysis via Computational Quasi-conformal Geometry
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1414
The Quasi-conformal Teichmüller Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1414
Conformal Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1414
Quasi-conformal Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1415
Teichmüller Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1417
Medical Image Segmentation and Registration by Quasi-conformal Theory . . . . . . . . . . . . 1418
Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1419
Image Registration and Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1422
Other Imaging Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1423
Surface Analysis for Medical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1424
3D Surface Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1424
High-Dimensional Shape Deformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1427
Disease Diagnosis and Classification by Quasi-conformal Geometry . . . . . . . . . . . . . . . . . . 1430
Classification of the Alzheimer’s Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1430
Other Classification Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1432
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1435
Abstract
In recent years, computational geometry has come into application in medical analysis, opening up considerable room for incorporating mathematics into the field. For instance, medical imaging, geometric modeling of medical surfaces, and machine learning for disease classification are crucial topics nowadays that rely heavily on image processing and geometric analysis. There are many streams in applying the study of geometry; among them, the application of the quasi-conformal Teichmüller theory has proven very successful in recent years. This article surveys some of the most recent models that have made solid contributions to medical science in different aspects.
Keywords
Introduction
Conformal Mappings
Fig. 1 Illustration of a conformal mapping that maps an infinitesimal disk into another infinitesimal disk
Here the smooth function λ is called the conformal factor of the conformal
mapping f .
From the definition, we can see that a conformal mapping preserves the surface
metric on M up to the conformal factor as a multiplying factor. Infinitesimally, a
conformal mapping f maps a disk into another disk, as illustrated by Fig. 1.
Equivalently, conformal mapping can be defined as a diffeomorphism f : M →
N satisfying the Cauchy-Riemann equation:
$$\frac{\partial f}{\partial \bar z} = 0, \tag{2}$$
where $\frac{\partial}{\partial \bar z} = \frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}$. While the former one provides a more straightforward
understanding on the local geometry preserving property of conformal mappings,
the latter one is the more convenient definition for us to generalize the notion of
conformal mappings into quasi-conformal mappings.
Quasi-conformal Mappings
Quasi-conformal mappings generalize conformal mappings: a quasi-conformal mapping $f : M \to N$ satisfies the Beltrami equation
$$\frac{\partial f}{\partial \bar z} = \mu \cdot \frac{\partial f}{\partial z}, \tag{3}$$
where $\frac{\partial}{\partial z} = \frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}$. Here, $\mu : M \to \mathbb{C}$ is a Lebesgue measurable complex-valued function and satisfies
$$\|\mu\|_\infty < 1. \tag{4}$$
The function μ defined in equation (3) is called the Beltrami coefficient associ-
ated to f . An immediate observation is that μ = 0 if and only if f is conformal.
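For intuition, the Beltrami coefficient of a planar map sampled on a regular grid can be approximated by finite differences, as in the small sketch below. This is not the mesh-based computation used in the works cited later in this chapter; the function name and grid sizes are illustrative.

```python
import numpy as np

def beltrami_coefficient(fmap, hx=1.0, hy=1.0):
    """Finite-difference sketch of the Beltrami coefficient in (3) for a planar
    map sampled on a grid: fmap[i, j] = f(x_j + 1j*y_i), complex-valued.
    mu = f_zbar / f_z with f_z = (f_x - 1j*f_y)/2, f_zbar = (f_x + 1j*f_y)/2
    (any constant factor in the Wirtinger derivatives cancels in the ratio)."""
    fy, fx = np.gradient(fmap, hy, hx)     # derivatives along rows (y) and columns (x)
    fz = 0.5 * (fx - 1j * fy)
    fzbar = 0.5 * (fx + 1j * fy)
    return fzbar / fz

# Example: f(z) = z + 0.2*conj(z) has constant Beltrami coefficient mu = 0.2.
x, y = np.meshgrid(np.linspace(0.0, 1.0, 64), np.linspace(0.0, 1.0, 64))
z = x + 1j * y
mu = beltrami_coefficient(z + 0.2 * np.conj(z),
                          hx=x[0, 1] - x[0, 0], hy=y[1, 0] - y[0, 0])
# For this affine map the finite differences are exact, so mu is 0.2 everywhere.
```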
Fig. 2 Illustration of a quasi-conformal mapping that maps an infinitesimal disk into an infinites-
imal ellipse
Note that in Equation (5), the term f (x) and the term fz (x) are just the translation
term and the dilation term, respectively, which are both conformal. Therefore, the
non-conformality of f originates entirely from the term D(z) = z + μ(x)z̄. Hence, analyzing the conformality of f can be reduced to analyzing the Beltrami coefficient μ. Indeed, regarding the infinitesimal behavior of a quasi-conformal mapping f, the angle of maximum magnification is arg(μ(x))/2 with magnifying factor 1 + |μ(x)|, while the angle of maximum contraction is (arg(μ(x)) − π)/2 with contracting factor 1 − |μ(x)|. Therefore, there is a very close relationship
between a quasi-conformal mapping f and its associated Beltrami coefficient μ.
One important relationship between f and μ is that the diffeomorphic property
of f can be totally replaced by a norm constraint on μ, as described by the following
theorem:
From the Beltrami Equation (3) and the measurable Riemann mapping theorem,
there is a one-to-one correspondence between f and μ under suitable normalization.
In other words, most constraints on a mapping f can be regarded as constraints on
the space of the corresponding Beltrami coefficient.
Concerning the composition of quasi-conformal maps, if $f$ and $g$ are two quasi-conformal mappings associated with the Beltrami coefficients $\mu_f$ and $\mu_g$, respectively, then the Beltrami coefficient of the composition mapping $g \circ f$ is given by
$$\mu_{g\circ f} = \frac{\mu_f + (\mu_g \circ f)\,\tau}{1 + \overline{\mu_f}\,(\mu_g \circ f)\,\tau}, \tag{7}$$
where $\tau = \overline{f_z}/f_z$.
Teichmüller Mappings
Teichmüller maps are closely related to a class of maps called extremal quasi-
conformal maps, defined by:
$$d_T\big((f_i, S_i), (f_j, S_j)\big) = \inf_{\varphi} \frac{1}{2}\log K(\varphi), \tag{10}$$
where $\varphi : S_i \to S_j$ varies over all quasi-conformal maps with $\{p_k^i\}_{k=1}^{n}$ corresponding to $\{p_k^j\}_{k=1}^{n}$, which is homotopic to $f_j^{-1} \circ f_i$, and $K$ is the maximal quasi-conformal dilation.
medical analysis. These correspond to the study of image segmentation and image registration in computational mathematics. In particular, the QC theory helps in building a convenient interface for the image processing models while generating consistent and accurate results.
Image Segmentation
$$B_g(D) = \mu \tag{11}$$
such that
$$f^{\mu}(\hat{D}_g) = D, \qquad f^{\mu} = \mathrm{Id} \ \text{in } \mathbb{C} \setminus \Omega, \tag{12}$$
where μ is the Beltrami coefficient of the deformation mapping f μ . The idea of the
Beltrami representation is to make use of the one-to-one correspondence between a
mapping and its Beltrami coefficient to implement the diffeomorphic deformation
constraint on the space of Beltrami coefficients. The model restricts the deformation
to be diffeomorphic without imposing further constraint on the deformation. As
such, the template object adapts to the target subject with high flexibility while
eluding any topological occlusion. In implementing the idea, Chan et al. proposed a
variational model:
$$E(\mu) = \int |\mu|^2 + \eta \int \big(I \circ f^{\mu} - J\big)^2 + \lambda \int |\nabla \mu|^2 + \sigma \int \big(|u|^2 + |\nabla u|^2\big), \tag{13}$$
Fig. 3 An example of the segmentation by the proposed model extracted from Chan et al.
(2018). Left: input image; middle: initial template object superimposed on the input image; right:
segmentation result
For the existence of a solution to the model and the minimization process of the variational model (13), readers are referred to Chan et al. (2018).
Figure 3 demonstrates an example using the QC model to segment a noisy brain
image. While the center part of the brain (the hippocampus) shows significantly
different color which will usually be taken as occlusions by other segmentation
models, the fuzzy noise on the image also adds extra difficulty in segmenting
the brain as a whole. Nevertheless, the QC model still captures the brain with an
accurate boundary. This demonstrates the effectiveness of applying the QC theory
to prescribe the topology of the target region.
The application of the QC theory in medical image segmentation is not limited to topology preservation. By discretizing the QC theory and enforcing the notion of dihedral angles on the meshed image domain, convexity can also be prescribed on particular portions of the segmented region. The segmentation process can therefore be much more adaptive to the given target subject, without relying too heavily on a given shape prior.
In Siu et al. (2020), Chan et al. advanced their topology preserving segmentation
model to a convexity preserving segmentation model. They employed the notion of
dihedral angle and implemented the QC segmentation model in a discrete setting. In
particular, the dihedral angle plays the role of determining and constraining the convexity of the triangulated image domain. Moreover, convexity can be constrained on just sectors of the template object; that is, the model enables the constraint of partial convexity on the segmented region. Given a portion Γ ⊂ D on which the user wants to prescribe convexity, their QC segmentation model with partial convexity
prior reads
$$\min_{\mu_V, \nu_V} E(\mu_V, \nu_V) = \sum_{v\in V} |\nu_V|^2 + \eta \sum_{v\in V} \big(I \circ f_V^{\mu_V} - J\big)^2 + \lambda \sum_{v\in V} |\nabla \nu_V|^2 + \sigma \sum_{v\in V} \big(|u_V|^2 + |\nabla u_V|^2\big) + \delta \sum_{v\in V} |\nu_V - \mu_V|^2 \tag{14}$$
Fig. 4 Two examples of the segmentation by the proposed model extracted from Siu et al.
(2020). Left: input image; middle: initial template object superimposed on the input image; right:
segmentation result
$$\text{subject to } \sum_{e\in E_v} \theta_f(e) \ge (|F_v| - 1)\,\pi \ \text{ for all } v \in \Gamma, \ \text{where } \Gamma \subset \partial D. \tag{15}$$
subject to
$$f \text{ is diffeomorphic}, \tag{17}$$
$$f(p_i) = q_i, \quad i = 1, 2, \ldots, m. \tag{18}$$
In other words, the desired mapping is a diffeomorphism that deforms the moving
images M (Fig. 5a, e) to adapt to the static images S (Fig. 5b, f), while matching the
landmark points pi ’s to qi ’s, respectively.
To obtain such a deformation mapping f , Lam et al. apply the quasi-conformal
theory and formulate the variational model:
$$(\bar{\mu}, f) = \arg\min_{\nu, g}\ \int |\nabla \nu|^2 + \alpha \int |\nu|^p + \frac{1}{2}\Big( \int (S_T - M \circ g)^2 + \int (S - M_T \circ g)^2 \Big) \tag{19}$$
subject to
Fig. 5 Examples for medical image registration and fusion extracted from Lam and Lui (2015)
(a) Image M (b) Image S (c) Result (d) Merged image (e) Image M (f) Image S (g) Result (h)
Merged image
The model measures the similarity between the deformed moving image and
the static image by the mutual-transformed intensity difference and searches for
the transformation mapping maximizing the mutual information with the least
conformality distortion.
The QC model has the merit to control the conformality distortion of the
transformation, which in turn controls the smoothness of the mapping without
losing bijectivity and diffeomorphicity. Figure 5 demonstrates two examples of the
image registration and fusion by the QC model.
In (2018), Zhang and Chen proposed another quasi-conformal image registration model. They introduced a novel, unbiased, and robust regularizer reformulated from the Beltrami coefficient framework to ensure a diffeomorphic transformation. With a suitable approximation of the exact Hessian matrix, which is necessary to derive a convergent iterative method, their model not only obtains a diffeomorphic registration even when the deformation is large but also possesses high accuracy compared with other existing models.
Surface processing and analysis tools are crucial bridges between medical data and
other applications. For example, given a segmentation of anatomical structures, the
corresponding surface can be simulated and meshed for further clinical analysis.
The simulated surfaces may be used in, for example, database generation, disease
diagnosis and case study, etc. In this section, we will introduce how the quasi-
conformal geometry may participate in this field.
3D Surface Registration
then compute the spherical mesh by inverse stereographic projection. The landmark
aligned spherical map is therefore obtained by an extra stereographic projection, solving a Laplace equation, and composing with a quasi-conformal mapping within the process. Figure 6 demonstrates an example of the registration result between two
Fig. 6 An example demonstrating the registration result of the FLASH algorithm. The sulcal
landmarks are highlighted. (a) and (b) show the source cortical surface and the target cortical
surface, respectively. (c) shows the conformal registration without any landmark constraints. One
can observe that the landmark curves are not matched. (d) shows the registration obtained using
FLASH.
cortical surfaces using the FLASH algorithm. The FLASH algorithm usually takes
several seconds to finish the whole registration process, which is hundreds of times faster than other currently used models.
It should be emphasized that the quasi-conformal geometry plays a crucial role
in the FLASH algorithm. After computing the landmark-aligned mapping φ in (23),
the Beltrami differential $\mu_\phi$ of $\phi$ is computed, and $\mu_\phi$ is then smoothed into a Beltrami coefficient $\mu$ by the variational model
$$\mu_{\text{smooth}} = \arg\min_{\mu}\ \big(|\nabla \mu|^2 + |\mu - \mu_\phi|^2 + A(T)\,|\mu|^2\big), \tag{24}$$
where A(T ) is the area of the triangular face T on the plane. This step ensures the
mapping f corresponding to μ is smooth, a property that φ does not necessarily possess. The variational model above is solved by the linear Beltrami solver (LBS).
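A grid-based analogue of the smoothing step (24) can be written as one sparse linear solve of its Euler-Lagrange equation. The sketch below uses a 5-point Laplacian and a constant area weight in place of the per-face areas A(T) and the linear Beltrami solver, so it is an illustration of the idea rather than the FLASH implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def laplacian_1d(n):
    """1D second-difference matrix with Neumann-type ends."""
    main = -2.0 * np.ones(n)
    main[0] = main[-1] = -1.0
    off = np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1])

def smooth_beltrami(mu_phi, area=1.0):
    """Grid-based analogue of (24): the Euler-Lagrange equation of
    |grad mu|^2 + |mu - mu_phi|^2 + A|mu|^2 is (-Laplacian + 1 + A) mu = mu_phi,
    solved here on a rectangular grid with a 5-point Laplacian."""
    n_rows, n_cols = mu_phi.shape
    lap = sp.kron(sp.eye(n_rows), laplacian_1d(n_cols)) + \
          sp.kron(laplacian_1d(n_rows), sp.eye(n_cols))
    system = (-lap + (1.0 + area) * sp.eye(n_rows * n_cols)).tocsc()
    # mu is complex; the system matrix is real, so solve real/imaginary parts separately.
    re = spsolve(system, mu_phi.real.ravel())
    im = spsolve(system, mu_phi.imag.ravel())
    return (re + 1j * im).reshape(mu_phi.shape)
```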
The framework, in particular, can be applied to hippocampal surface registration.
According to medical research, the hippocampus would undergo abnormal deforma-
tion in the prodromal stage of the Alzheimer’s disease. However, the hippocampus
shows no obvious landmark on the surface. It is a challenging task to establish correspondences between hippocampal surfaces and to analyze the deformation.
Motivated by the situation, in (2020), Chan et al. propose a registration model,
ACC-REG, for hippocampal surfaces. Given two hippocampal surfaces, ACC-REG
automatically generates two landmark curves using the eigen-graph on the surfaces.
A histogram matching mapping is applied onto the two eigen-graphs to calibrate
the propagation of the landmark curves along the surfaces. Afterwards, ACC-REG
employs the FLASH algorithm to register the two hippocampal surfaces.
Figure 7 demonstrates an example of the calibrated eigen-graph on a hippocam-
pal surface. Figure 8 shows two experiments of the ACC-REG model. The results
have demonstrated the effectiveness of the ACC-REG model to obtain an accurate
registration between hippocampal surfaces.
In (2011), Zeng and Gu proposed a quasi-conformal model for surface registra-
tion by solving Beltrami equations using curvature flow. The proposed model can attain the global minimum, which is unique up to a three-dimensional transformation group.
Fig. 7 An example of the eigen-graph and landmark curves on a hippocampal surface extracted
from Chan et al. (2020). (Left) Eigen-graph with function values going from blue (0) to red (1);
(right) landmark curves (green and black) on the surface
Fig. 8 Two examples of the surface registration of ACC-REG. (Left) Original surfaces; (middle)
artificially deformed surfaces; (right) registered surfaces by the ACC-REG model
In (2019), Ma et al. applied the optimal mass transport mapping (OMT-Map) and
Teichmüller mapping (T-Map) to solve for a unique bijective surface registration
with landmark constraints in case of large deformations. Their model is advanta-
geous in enforcing the robustness by avoiding large area distortion and producing
diffeomorphisms with all landmarks matched consistently. Medical applications of
the model are believed to be promising.
In Gu et al. (2004) and Wang et al. (2007), the authors analyzed a family of quasi-conformal maps, including harmonic maps, conformal maps, and least-squares conformal maps, with regard to 3D shape matching and hence proposed a novel and computationally efficient shape matching framework using least-squares conformal maps. Their model achieves high accuracy and efficiency in 3D shape matching and, if applied to the medical field, is believed to be another powerful tool.
$$K_f(x) := \begin{cases} \dfrac{\|Df(x)\|_F^2}{\det(Df(x))^{2/n}}, & \text{if } \det Df(x) > 0,\\ +\infty, & \text{otherwise.} \end{cases} \tag{25}$$
$$f(p_i) = q_i, \quad i = 1, 2, \ldots, m. \tag{27}$$
Fig. 9 Two examples of the 3D lung data registration extracted from Lee et al. (2016). The red and
blue dots on the left and middle images indicate the landmark points. The vectors at each vertex
on the right image indicate the alignment of the landmark points by the QC model
$$\mathrm{Aid}_f(x) = \frac{L_f(x) - 1}{L_f(x) + 1},$$
where $L_f$ is defined by
$$L_f(x) := \limsup_{\substack{r \to 0 \\ u, v \in S_x^M(r),\ u \neq v}} \frac{|f(u) - f(x)|}{|f(v) - f(x)|}.$$
Fig. 10 Two examples of the color plot of the anisotropic indicator on 3D volumetric data
extracted from Lee et al. (2016). (Left) Brain data; (right) lung data
The Alzheimer’s disease is a no-cure disease. One of the crucial tasks in dealing
with this disease is to detect it in the early stage. It is evident that the hippocampus
would show abnormal deformation in the early stage of the disease. In (2016), Chan
et al. proposed an Alzheimer’s disease (AD) classification model which analyzes
the hippocampal surfaces by considering their local geometric distortions. The key
is to combine local shape deformities including the conformality distortion, the
Gaussian curvature distortion, and the mean curvature distortion of the deformation
of a subject’s hippocampal surface along the longitudinal direction (i.e., different
time frame). More specifically, Chan et al. proposed a shape index:
$$E^{i}_{\text{shape}}(v_i^j) = \gamma\,\big|\mu(f_i)(v_i^j)\big| + \alpha\,\big|H_0(v_i^j) - H_1(f_i(v_i^j))\big| + \beta\,\big|K_0(v_i^j) - K_1(f_i(v_i^j))\big| \tag{28}$$
The shape index is a complete descriptor of the local deformation of the
hippocampal surface mesh and is taken to be the vertex-wise feature to classify
the disease. In Chan et al. (2016), the authors are given a database consisting
of 99 normal control subjects and 41 AD subjects. After registering each pair of
the surfaces, the shape index is computed for each surface. All the shape indexes
are stacked to form a feature matrix, and a modified t-test is applied to extract
features with high discriminating power. The trimmed feature matrix is then used
to build a L2 -norm-based binary classification model. The model is found to be
effective in classifying the Alzheimer's disease and results in an 87.9% accuracy in a leave-one-out validation test on the given database. Here it is emphasized that the conformality distortion term $|\mu(f_i)(v_i^j)|$ plays a crucial role in analyzing the
infinitesimal distortion of the mapping. Without this term, the classification rate
drops significantly from 87.9% to 77.1%.
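A compact sketch of this feature-selection-plus-classification pipeline is given below; a standard two-sample t-test and a logistic regression stand in for the modified t-test and the L2-norm-based classifier of the paper, and all names and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression

def select_and_classify(features, labels, p_threshold=0.05):
    """Sketch of the pipeline described above: rows of `features` are per-subject
    stacks of vertex-wise shape indexes, `labels` are 0 (normal) / 1 (AD).
    Vertices are kept when a two-sample t-test separates the two groups, and a
    logistic regression is fit on the retained features."""
    group0, group1 = features[labels == 0], features[labels == 1]
    _, p_values = ttest_ind(group0, group1, axis=0)
    keep = p_values < p_threshold                      # discriminative vertices
    clf = LogisticRegression(max_iter=1000).fit(features[:, keep], labels)
    return clf, keep
```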
Fig. 11 An example of the color plot of shape index on a hippocampal surface extracted from
Chan et al. (2020). (Left) Front view; (right) back view
The QC model does not only predict the disease status of the given subject.
Since the shape index is a local indicator, one can visualize the location of the
abnormal deformation by a color plot of the p-value of the shape index. Figure 11
demonstrates an example of the color plot. The QC model helps medical doctors to easily locate the regions of abnormality, which contributes to further medical analysis.
Later, in (2020), Chan et al. further proposed an AD diagnosis model combining the quasi-conformal geometry with the spherical harmonics (SPHARM) theory. By applying the SHREC algorithm derived from the SPHARM theory, a template mean surface can be simulated from the normal control subjects. The deformation of the hippocampus can therefore be regarded as a deformation from the template surface to the subject surface. This removes the need for the longitudinal data required by the previous model and allows instant diagnosis of the disease. The SPHARM
registration also provides a set of global features, the SPHARM coefficients, on
the hippocampal surface. They are combined with the quasi-conformal-based shape
index, and the volume distortion from the template surface to the subject surface, to
formulate a geometric feature vector for each surface:
Fig. 12 Three examples of the hippocampal surfaces extracted from Chan et al. (2020). (Left and
middle): normal control subjects; (right): AD subjects
In Chan et al. (2020), the authors are given two sets of data. The first set consists of 110 normal control subjects and 110 AD subjects. In the experiments, 140 training data are randomly chosen to build the classification model using a 10-fold cross-validation scheme, and the remaining 80 data are taken as the testing data. The process is repeated 1,000 times, and the model records over 85% accuracy on average.
According to medical research, in the prodromal stage of AD, there is a medical
situation called the amnestic mild cognitive impairment (aMCI). While some aMCI
patients may stay stable in the current state, some patients may further progress
into AD. It is a challenging task to classify the two groups of patients. In Chan et al.
(2020), the authors are also concerned with the prediction of the disease progression
by the QC-SPHARM model. A database consisting of 40 aMCI patients is given,
in which 20 of them remain stable in aMCI for dozens of years after scanning
for the hippocampal surface and the remaining 20 of them progressed into AD
soon after the scan. The authors run an experiment in which 30 data are randomly picked to build the classification model and the remaining 10 data are used to test the accuracy of the classifier. The process is repeated 1,000 times, and the results show
that the QC-SPHARM model achieves over 81% accuracy in predicting the further
development of the disease status. Figure 12 demonstrates three examples of the
registered hippocampal surfaces in the database for reference.
$$E_{\text{shape}}(f_i)(v^k) = \alpha\,\big|H_i(v^k) - H(f_i(v^k))\big| + \beta\,\big|K_i(v^k) - K(f_i(v^k))\big| + \gamma\, d_i, \tag{30}$$
which involves the Gaussian curvature distortion term, the mean curvature distortion
term, and the Teichmüller distance term. A t-test-based scheme is applied followed
by the SVM to build the classification machine.
It is noteworthy that in Choi et al.'s work, the spherical marching scheme (SMS) was also proposed to optimize the parameters α, β, and γ for higher classification accuracy. The spherical marching scheme makes use of the fact that the norm of the shape index has no contribution to the classification process. Therefore, the space of the parameters (α, β, γ) can be restricted to the unit sphere, and the optimal parameters can be searched exhaustively by regular gridding of the unit sphere in spherical coordinates.
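A minimal sketch of this gridding step is shown below; the grid resolutions and the scoring function are assumptions for illustration.

```python
import numpy as np

def spherical_grid(n_theta=18, n_phi=36):
    """Sketch of the search behind the spherical marching idea: since only the
    direction of (alpha, beta, gamma) matters, candidate weight vectors are
    generated by regular gridding of the unit sphere in spherical coordinates."""
    for theta in np.linspace(0.0, np.pi, n_theta):
        for phi in np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False):
            yield (np.sin(theta) * np.cos(phi),
                   np.sin(theta) * np.sin(phi),
                   np.cos(theta))

# Each candidate (alpha, beta, gamma) would then be scored by classification
# accuracy, e.g. best = max(spherical_grid(), key=cross_val_accuracy), where
# cross_val_accuracy is a user-supplied (hypothetical) scoring function.
```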
Conclusion
Fig. 13 The pipeline of the quasi-conformal based skull dating model extracted from Choi et al. (2020)
data. The conformality distortion can also be used to measure the abnormality of
the deformation mapping. The QC theory is also helpful in disease classification.
By incorporating the conformality distortion and the Teichmüller distance with other
common measurements, a trustworthy disease classification model can be built with
high accuracy and stability.
References
Chan, H.-L., Lui, L.-M.: Detection of n-dimensional shape deformities using n-dimensional quasi-
conformal maps. Geometry Imaging Comput. 1(4), 395–415 (2014)
Chan, H.-L., Li, H., Lui, L.-M.: Quasi-conformal statistical shape analysis of hippocampal surfaces
for Alzheimer’s disease analysis. Neurocomputing 175, 177–187 (2016)
Chan, H.-L., Yan, S., Lui, L.-M., Tai, X.C.: Topology-preserving image segmentation by Beltrami
representation of shapes. J. Math. Imaging Vision 60(3), 401–421 (2018)
Chan, H.-L., Yam, T.-C., Lui, L.-M.: Automatic characteristic-calibrated registration (ACC-REG):
Hippocampal surface registration using Eigen-graphs. Pattern Recogn. 103, 107142 (2020)
Chan, H.-L., Luo, Y., Shi, L., Lui, L.-M.: QC-SPHARM: Quasi-conformal Spherical Harmonics
Based Geometric Distortions on Hippocampal Surfaces for Early Detection of the Alzheimer’s
Disease. Computerized Medical Imaging and Graphics (Submitted 2020)
Choi, P.-T., Lam, K.-C., Lui, L.-M.: FLASH: Fast landmark aligned spherical harmonic parame-
terization for genus-0 closed brain surfaces. SIAM J. Imag. Sci. 8(1), 67–94 (2015)
Choi, G.P.T., Chan, H.-L., Yong, R., Ranjitkar, S., Brook, A., Townsend, G., Chen, K., Lui, L.-M.:
Tooth morphometry using quasi-conformal theory. Pattern Recogn. 99, 107064 (2020)
Dong, H.Y., Cheng, Q., Song, G.Y., Tang, W.X., Wang, J., Cui, T.J.: Realization of broadband
acoustic metamaterial lens with quasi-conformal mapping. Appl. Phys. Express 10(8), 087202
(2017)
Gu, X., Wang, Y., Chan, T.F., Thompson, P.M., Yau, S.-T.: Genus zero surface conformal mapping
and its application to brain surface mapping. IEEE Trans. Med. Imaging 23(8), 949–958 (2004)
Heisterkamp, D.R., Peng, J., Dai, H.K.: Adaptive quasiconformal kernel metric for image retrieval.
In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition. CVPR 2001, vol. 2. IEEE (2001)
Jones, G.W., Mahadevan, L.: Planar morphometry, shear and optimal quasi-conformal mappings.
Proc. R. Soc. A Math. Phys. Eng. Sci. 469(2153), 20120653 (2013)
Lam, K.-C., Lui, L.-M.: Quasi-conformal hybrid multi-modality image registration and its appli-
cation to medical image fusion. In: International Symposium on Visual Computing. Springer,
Cham (2015)
Lam, K.-C., Lui, L.-M.: Landmark and intensity based registration with large deformations via
quasi-conformal maps. SIAM J. Imag. Sci. 7(4), 2364–2392 (2014)
Lee, Y.-T., Lam, K.-C., Lui, L.-M.: Landmark-matching transformation with large deformation via
n-dimensional quasi-conformal maps. J. Sci. Comput. 67(3), 926–954 (2016)
Lui, L.M., Wong, T.W., Zeng, W., Gu, X., Thompson, P.M., Chan, T.F., Yau, S.T.: Optimization of
surface registrations using Beltrami holomorphic flow. J. Sci. Comput. 50(3), 557–585 (2012)
Lui, L.M., Lam, K.C., Yau, S.T., Gu, X.F.: Teichmuller mapping (T-Map) and its applications to
landmark matching registrations. SIAM J. Imag. Sci. 7(1), 391–426 (2014)
Ma, M., Yu, X., Lei, N., Si, H., Gu, X.: Optimal mass transport based brain morphometry for
patients with congenital hand deformities. Vis. Comput. 35(9), 1311–1325 (2019)
Meng, T.W., Choi, P.T., Lui, L.M.: TEMPO: Feature-Endowed Teichmuller extremal mappings of
point clouds. SIAM J. Imag. Sci. 9(4), 1922–1962 (2016)
Naitsat, A., Saucan, E., Zeevi, Y.Y.: Volumetric quasi-conformal mappings. In: Proceedings of the
10th International Conference on Computer Graphics Theory and Applications (2015)
Saucan, E., Appleboim, E., Barak-Shimron, E., Lev, R., Zeevi, Y.Y.: Local versus global in quasi-
conformal mapping for medical imaging. J. Math. Imaging Vision 32(3), 293–311 (2008)
Siu, C.-Y., Chan, H.-L., Lui, L.-M.: Image segmentation with partial convexity prior using discrete
conformality structure. SIAM J. Imag. Sci. (Accepted 2020)
Taimouri, V., Hua, J.: Deformation similarity measurement in quasi-conformal shape space. Graph.
Model. 76(2), 57–69 (2014)
Wang, Y., Lui, L.M., Gu, X., Hayashi, K.M., Chan, T.F., Thompson, P.M., Yau, S.-T.: Brain surface
conformal parameterization using Riemann surface structure. IEEE Trans. Med. Imaging 26(6),
853–865 (2007)
Zeng, W., Gu, X.D.: Registration for 3D surfaces with large deformations using quasi-conformal
curvature flow. In: CVPR 2011. IEEE (2011)
Zhang, D., Chen, K.: A novel diffeomorphic model for image registration and its algorithm. J.
Math. Imaging Vision 60(8), 1261–1283 (2018)
Zhang, M., An, D., Young, G.S., Gu, X., Xu, X.: A quasi-conformal mapping-based data
augmentation technique for brain tumor segmentation. In: Medical Imaging 2020: Image
Processing, vol. 11313, p. 113132P. International Society for Optics and Photonics (2020)
42 A Survey of Topology and Geometry-Constrained Segmentation Methods in Weakly Supervised Settings
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1438
Geometrical Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1441
Characterization of Geometrical Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1442
Model 1: A Simple Variational Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1443
Model 2: A Moving Band Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1443
Model 3: A Dual Level Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1444
Model 4: The Use of Moment Constraint for Segmentation . . . . . . . . . . . . . . . . . . . . . . . . 1445
Model 5: Convex Segmentation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1446
Model 6: Convex Models Based on Geodesic Distances . . . . . . . . . . . . . . . . . . . . . . . . . . 1446
Other Possible Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1447
Topological Prior Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1449
Topology Prescription . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1451
Regularization Enforcement on the Evolving Front . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1459
Joint Segmentation and Registration Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1463
Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1463
Overview of Existing Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1466
A Mixed Segmentation/Registration Model Based on a Nonlocal
Characterization of Weighted Total Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1468
Other Related Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1473
Optimal Flow Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1473
K. Chen ()
Department of Mathematical Sciences, Centre for Mathematical Imaging Techniques, University
of Liverpool, Liverpool, UK
e-mail: [email protected]
N. Debroux
Pascal Institute, University of Clermont Auvergne, Clermont-Ferrand, France
e-mail: [email protected]
C. Le Guyader
INSA Rouen Normandie, Laboratory of Mathematics, Normandie University, Rouen, France
e-mail: [email protected]
Abstract
Keywords
Introduction
Not all image shapes and patterns extracted provide useful information; the
usefulness is relative to the application. For instance, an autonomous car mainly cares
about stationary and moving objects on its path, not objects (such as buildings or
trees) that are far away, while in medical imaging, a specialist on liver diseases
is not primarily interested in seeing patterns in the lungs or abdomen. Therefore, in this
practical sense, only models that have topology and geometry constraints built in
to extract patterns of interest are really valuable, while other generic segmentation
models capable of identifying all objects are not helpful.
Although simple to state, this task is nevertheless challenging and ill-posed as
emphasized by Zhu et al. in the comprehensive segmentation survey (Zhu et al.
2016):
(i) First, owing to the polysemy of the word object and because interpretation is
intrinsically subjective: different human beings may have different views of
what an object is. The definition of an object encompasses several acceptations
according to human perception: it can be something material (a thing), a
periodic pattern, an overall structure (e.g., a forest, the sea), or even a sub-part
of a given object (e.g., a tumor in a brain MRI image).
(ii) Second, due to the difficulty of computerizing/reproducing the human vision
system, which is capable of synthesizing (interpolating) the observed data into a contin-
uous whole: humans tend to merge elements sharing similarities, to
complete missing data, to favor continuous contours, etc., whereas most images
in computers are represented by low-level characteristics reflecting mainly local
properties and thus failing to capture the global (continuous) nature of the
observed object.
These two elements together make the evaluation of segmentation techniques still
an open question.
An exhaustive classification of segmentation methods into three main categories
is provided in Zhu et al. (2016) and ordered according to the level of supervision
or user involvement, combined with a description/analysis of each methodology as
follows:
(i) Global models: unsupervised methods rooted in a continuous variational framework
(… 2001; Vese and Le Guyader 2015; Vese and Chan 2002; Wang et al. 2009), in which
the image is seen as a continuous surface, thus avoiding the grid bias artifacts
inherent to discrete methods and producing visually more pleasing results; these
comprise edge-based models and region-based ones.
(ii) Local models: semi/weakly supervised methods which incorporate a small
amount of high-level information and as such, are usually interactive and
require human expertise and intervention to better match human perception.
This class of methods is partitioned into two subclasses: (a) interactive
methods that rely on a small amount of prior information provided by the user
(e.g., labels of a few pixels as initial constraints) and that encompass three
groups of methodology, contour tracking approaches (Osher and Sethian 1988;
Mortensen et al. 1992; Liu and Yu 2012; He et al. 2013; McGuinness et al.
2010; Werlberger et al. 2009; Le Guyader and Gout 2008; Le Guyader and Vese
2008), label propagation approaches (Boykov and Jolly 2001; Grady 2006;
Price et al. 2010; Bai and Sapiro 2007), and local optimization approaches
(Hosni et al. 2013; Criminisi et al. 2008), and (b) image cosegmentation
(Rother et al. 2006), well suited for large-scale image datasets, which
consists in identifying common objects in a set of images.
(iii) Learning models: fully supervised methods (refer to Zhu et al. 2016, Section 4
and Garcia-Garcia et al. 2018 for an overview): they consist in training a
segmentation algorithm thanks to fully annotated data—all pixels are labeled
as either boundary or nonboundary—and then segmenting an unknown image.
They reach high performance, but the labeling is very expensive. However,
more and more datasets are now available (see Zhu et al. (2016) for a list of
them), driven by the explosion of machine-learning-based algorithms and increasing
computing power in the past few years.
In line with this classification, this chapter focuses on the second class
of weakly supervised methods and, more specifically, on interactive approaches
(although the joint segmentation/registration models described below might be
viewed as special instances of cosegmentation). The study entails three focal
areas, which can be envisioned as three distinct types of a priori knowledge
included in the segmentation process and which structure the rest of the chapter:
geometrical constraints, topological prior knowledge, and joint segmentation and
registration models.
Geometrical Constraints
We now present the first class of methods building models based on a given set of
geometrical constraints.
Fig. 2 Geometrical constraints and segmentation. Note the initial u is arbitrary. (a) Original image
with M (b) Segmented (local) object (c) Initialization: Left: φ (non-convex case) and Right: u
(convex case)
and Roberts et al. (2019), was defined as a Euclidean distance in Gout et al. (2005)
and Le Guyader and Gout (2008):
$$d(x, y) = \prod_{i=1}^{m}\left[1 - \exp\left(-\frac{(x - x_i)^2}{2\sigma^2} - \frac{(y - y_i)^2}{2\sigma^2}\right)\right].$$
An evolution equation is then derived (z denoting the considered image) to drive the
initial level set curve to the desired zero level line of φ defining the intended object.
Here g is an edge detector function that helps the contour evolve by ensuring that
the front stops propagating when localized on meaningful contours. Extending this model to a
variational framework, several models were then considered.
variational framework, several models were then considered.
Initialization. In the models shown below, the level set φ is initialized automati-
cally (i.e., no further user intervention is required) using the polygon formed by the
marker points as displayed in Fig. 2c for 3 markers, depending on the convexity of
the underlying model; we may place a small circle near the markers if the number
of marker points is less than 3. However, when the underlying model is convex,
initialization can be made with an arbitrary contour.
The first, and yet simple, extension was made in Badshah and Chen (2010) by
extending the Chan-Vese model (Chan and Vese 2001) to incorporate geometric
constraints, so that noisy images can be better segmented than in Gout et al. (2005),
as follows:
$$\min_{\phi, c_1, c_2}\; \int_\Omega d\, g(|\nabla z|)\, |\nabla H(\phi)|\, d\Omega + \int_\Omega \Big[\lambda_1 H(\phi)\,|z - c_1|^2 + \lambda_2 (1 - H(\phi))\,|z - c_2|^2\Big]\, d\Omega.$$
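A hedged numerical sketch of this selective energy (not the authors' implementation) is given below; it uses a smoothed Heaviside function and finite differences on a toy image, and all parameter values are illustrative.

```python
# Discrete evaluation of the distance-weighted selective Chan-Vese-type energy.
import numpy as np

def heaviside(phi, eps=1.0):
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def selective_energy(phi, z, d, lam1=1.0, lam2=1.0, beta=1.0):
    gy, gx = np.gradient(z)
    g = 1.0 / (1.0 + beta * (gx ** 2 + gy ** 2))      # edge detector g(|grad z|)
    H = heaviside(phi)
    Hy, Hx = np.gradient(H)
    c1 = (z * H).sum() / max(H.sum(), 1e-8)           # mean intensity inside
    c2 = (z * (1 - H)).sum() / max((1 - H).sum(), 1e-8)
    length_term = (d * g * np.hypot(Hx, Hy)).sum()    # weighted length of {phi = 0}
    fidelity = (lam1 * H * (z - c1) ** 2 + lam2 * (1 - H) * (z - c2) ** 2).sum()
    return length_term + fidelity

z = np.zeros((32, 32)); z[8:20, 8:20] = 1.0            # toy image with one square
phi = -np.ones((32, 32)); phi[10:18, 10:18] = 1.0      # positive inside the contour
d = np.ones_like(z)                                    # trivial distance weight
print(selective_energy(phi, z, d))
```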
A weakness of Badshah and Chen (2010) lies in the fact that the obtained
global solution often contains neighboring objects, which can only be avoided by
terminating the iterations early. In some cases, the model reaches the same global
segmentation result as the Chan-Vese solution.
One way to overcome this problem is to know when to stop the algorithm, or
when the model reaches the object boundary (which may not correspond to a local
minimizer of the Badshah and Chen 2010 model). The idea in Zhang et al. (2014) is to
evolve the initial curve from a polygon in a band fashion so that we would not step
over the object boundary. The proposed model takes the form of
$$\inf_{\phi, c_1, c_2}\; \int_\Omega \Big[\lambda_1 H(\phi)\,|z - c_1|^2\, b(\phi, \gamma_{in}, \gamma_{out}) + \lambda_2 (1 - H(\phi))\,|z - c_2|^2\, b(\phi, \gamma_{in}, \gamma_{out})\Big]\, d\Omega + \int_\Omega d\, g(|\nabla z|)\, |\nabla H(\phi)|\, d\Omega,$$
where $b(\phi, \gamma_{in}, \gamma_{out}) = H(\phi - \gamma_{in})\, H(\gamma_{out} - \phi)$ defines a narrow band of varying
and adaptive widths $\gamma_{in}, \gamma_{out}$. The idea of not stepping over the desired boundary where
φ = 0 is achieved by an adaptive searching algorithm based on checking local
intensity variations (Zhang et al. 2014). Of course, the use of a varying band domain
in the formulation leads to a highly non-convex model, making it hard to develop a
theory.
To overcome the issue raised in Badshah and Chen (2010), the alternative idea
of Rada and Chen (2012) is to compute the Chan-Vese solution by a level set
function $\phi_G$ and also to compute the desired solution (of one object only) via an
embedded level set function $\phi_L$, resulting in the dual level set formulation
$$\begin{aligned}
\min_{\phi_G, \phi_L, c_1, c_2}\ & \mu_L \int_\Omega d\, g(|\nabla z|)\, |\nabla H(\phi_L)|\, H(\phi_G + \gamma)\, d\Omega + \mu_G \int_\Omega g(|\nabla z|)\, |\nabla H(\phi_G)|\, d\Omega \\
& + \int_\Omega \Big[\lambda_{1G}\, H(\phi_G)\,|z - c_1|^2 + \lambda_{2G}\,(1 - H(\phi_G))\,|z - c_2|^2\Big]\, d\Omega \\
& + \int_\Omega \Big[\lambda_{1L}\, H(\phi_L)\,|z - c_1|^2 + \lambda_{2L}\,(1 - H(\phi_L))\, H(\phi_G)\,|z - c_1|^2\Big]\, d\Omega \\
& + \lambda_{3L} \int_\Omega |z - c_2|^2\, (1 - H(\phi_L))\,(1 - H(\phi_G))\, d\Omega,
\end{aligned}$$
where the first component balanced by μL essentially selects the desirable solution
φL from the global solution φG , while the last term weighted by λ3L helps separate
the “true” background from the foreground intensity. The latter is because both
objects included in φG and the desirable object included in φL often have the same
intensity c1 , i.e., objects included in φG but not desirable to our selection are quite
different from the true background intensity c2 . Parameter γ is an integer included
to increase the search domain (by a small band) in the computation of the weighted
length of the zero level set of $\phi_L$. Its introduction increases the model robustness until final convergence.
The above model was found to be extremely robust for selective segmentation, but its
implementation doubles the amount of work normally needed because of the use
of two level set functions instead of the usual one.
To stay with one level set function and yet overcome the drawback of capturing
redundant objects beyond the set M, a useful idea is to impose so-called moment
constraints. The early work of Ayed et al. (2008) on the 0th-order moment (or area
constraint) uses
$$\min\; \int_{\partial R} g(z)\, ds + \frac{\mu}{2 A_p}\left(\int_{R} d\Omega - A_p\right)^2 + \int_{R^C} g(z)\, d\Omega,$$
where u > 0 in the domain R and H(−u) indicates the outside domain (note the typo in
the definition of the level set function u in Ayed et al. (2008), which wrongly stated u <
0 in R). The above reformulation was surveyed in Nosrati and Hamarneh (2016),
which unfortunately replaced the latter term $\int_\Omega g(z)\, H(-u)\, d\Omega$ by $\int_{in} g(z)\, d\Omega =
\int_\Omega g(z)\, H(u)\, d\Omega$, which is a major typo.
High-order moments were considered in Klodt and Cremers (2011). Denoting by
u the indicator function, i.e., 1 inside the object and 0 outside, the proposal in Klodt and
Cremers (2011) for the area constraint is
$$\min_u\; E(u) = \int_\Omega f\, u\, d\Omega + \int_\Omega g\, |Du|\, d\Omega + \lambda \left(\int_\Omega u\, d\Omega - A_p\right)^2,$$
where the last term, imposing the area $A_p$ of the targeted object, is a simple version
of the more general constraint
$$a_1 \le \int_\Omega u\, d\Omega \le a_2.$$
Here f is taken as the log-likelihood ratio for observing z(x, y) at a point (x, y),
given that (x, y) is part of the background or of the object. Higher-order moments
refer to tensors of higher order, e.g., the centroid (first order) and the covariance (second
order). Note that the set defined by the above inequality is convex, a property which could
potentially be exploited.
The model by Rada and Chen (2013) builds the area constraint into the selection:
$$\begin{aligned}
\min_{\phi, c_1, c_2}\ & \int_\Omega d\, g(|\nabla z|)\, |\nabla H(\phi)|\, d\Omega + \int_\Omega \Big[\lambda_1 H(\phi)\,|z - c_1|^2 + \lambda_2 (1 - H(\phi))\,|z - c_2|^2\Big]\, d\Omega \\
& + \nu \left(\int_\Omega H(\phi)\, d\Omega - A_1\right)^2 + \nu \left(\int_\Omega (1 - H(\phi))\, d\Omega - A_2\right)^2,
\end{aligned}$$
where $A_1$ is the area of the polygon formed by the markers in the set M (assuming
$m \ge 3$), while $A_2 = \mathrm{mes}(\Omega) - A_1$ is the area outside this polygon.
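For instance, $A_1$ can be computed from the marker coordinates with the shoelace formula, as in the small sketch below (the marker coordinates and grid size are hypothetical).

```python
# Area A1 of the polygon formed by the marker set M (shoelace formula) and
# A2 as the remaining area of the image domain.
import numpy as np

def polygon_area(markers):
    x, y = np.asarray(markers, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

markers = [(20, 30), (40, 25), (32, 45)]    # hypothetical marker coordinates
H, W = 64, 64
A1 = polygon_area(markers)
A2 = H * W - A1                              # mes(Omega) - A1 on a 64 x 64 grid
print(A1, A2)
```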
The work of Chan et al. (2006) proposes a convex relaxation idea for the Chan-Vese
model that is nowadays widely used, where the relaxation consists in replacing {0, 1}
by [0, 1] after substituting H(φ) by u. This idea is employed by Spencer and Chen
(2015), who propose the convex selective segmentation model
$$\min_{\substack{u,\, c_1,\, c_2\\ 0 \le u \le 1}}\; \int_\Omega d\, g(|\nabla z|)\, |\nabla u|\, d\Omega + \int_\Omega \Big[\lambda_1 u\,|z - c_1|^2 + \lambda_2 (1 - u)\,|z - c_2|^2\Big]\, d\Omega + \theta \int_\Omega P_d\, u\, d\Omega.$$
A further idea to exploit the marker set M more fully is to design a new distance function
d(x, y) that takes into account several factors: the set M itself as before, large edges
of the image z, the previously used Euclidean distance $P_d$, and possible anti-markers
(defining a set AM of points that are definitely not in the intended object). A
geodesic distance D encompassing all these constraints, and satisfying an Eikonal-type
equation, is then defined in Roberts et al. (2019) to replace the previous distance d,
leading to the model
$$\min_{u, c_1, c_2}\; \int_\Omega g(|\nabla z|)\, |\nabla u|\, d\Omega + \int_\Omega \Big[\lambda_1 u\,|z - c_1|^2 + \lambda_2 (1 - u)\,|z - c_2|^2\Big]\, d\Omega + \mu \int_\Omega D\, u\, d\Omega + \alpha \int_\Omega \nu(u)\, d\Omega.$$
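The sketch below illustrates the idea with a simple stand-in: an edge-aware geodesic distance from the markers computed with Dijkstra's algorithm on the pixel grid rather than with an Eikonal solver, the local cost and its parameters being illustrative assumptions.

```python
# Edge-aware geodesic distance D from the marker set M: the local cost grows
# with |grad z|, so D stays small inside the object containing the markers.
import heapq
import numpy as np

def geodesic_distance(z, markers, beta=100.0, eps=1e-3):
    """markers = list of (row, col) pixel indices."""
    gy, gx = np.gradient(z.astype(float))
    cost = eps + beta * (gx ** 2 + gy ** 2)          # local metric
    D = np.full(z.shape, np.inf)
    heap = []
    for (i, j) in markers:
        D[i, j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    while heap:
        dist, i, j = heapq.heappop(heap)
        if dist > D[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < z.shape[0] and 0 <= nj < z.shape[1]:
                nd = dist + 0.5 * (cost[i, j] + cost[ni, nj])
                if nd < D[ni, nj]:
                    D[ni, nj] = nd
                    heapq.heappush(heap, (nd, ni, nj))
    return D

z = np.zeros((64, 64)); z[16:48, 16:48] = 1.0        # toy image with one object
D = geodesic_distance(z, markers=[(32, 32)])
print(D[30, 30], D[5, 5])                             # small inside, larger outside
```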
One may also define the labeling function
$$L(x, y) = \begin{cases} +1 & \text{if } (x, y) \in M,\\ -1 & \text{if } (x, y) \in AM,\\ 0 & \text{elsewhere,} \end{cases}$$
which may be used to replace or enhance our distance function. The suggestion in
Cremers et al. (2007) is to add a regularization term to influence the optimization
of the level set function (with the object where φ > 0),
$$E_{user}(\phi) = -\int_\Omega L(x, y)\, \mathrm{sign}(\phi(x, y))\, d\Omega,$$
where
$$M(z) = \sum_{i=1}^{m} \delta\big(z - (x_i, y_i)\big) \quad \text{with } (x_i, y_i) \in M,$$
and N(x, y) denotes an infinitesimal neighborhood of (x, y). Here, L(x, y) = 0 for
pixels in the set M and 1 for pixels in the set AM, whereas L(x, y) = H(φ(x, y)) at
the other pixels away from the markers. The new H that can reflect the feedback of a user
is found by
$$E_{user}(\phi) = \int_\Omega\!\int_\Omega \big(L(x', y') - H(\phi(x, y))\big)^2\, K(x, x', y, y')\, d\Omega\, d\Omega,$$
(v) Another interesting class of research directions falls within the scope of
boundary convexity constraint. This helps segmentation in case of very noisy images
or missing data. See the recent works of Liu et al. (2020), Luo et al. (2019), and Siu
et al. (2020).
Finally it should be noted that the above discussed models can be extended in a
simple manner to deal with 3D images; in fact many were implemented and tested
in 3D. As also remarked above, it remains to develop models based on frameworks
beyond piecewise constant intensities that can process texture images effectively.
This will be a future direction.
(a) prescribed topology enforcement in the sense that the segmented target should
be homeomorphic to the original shape supplied by the user—two objects being
homeomorphic provided they can be deformed into each other by a continuous,
invertible mapping—or should exhibit a prescribed number of connected com-
ponents/holes. Topology enforcement in segmentation is particularly important
when a user’s requirement is not in visual agreement with the data, i.e., in the
case where, without including those topological constraints, most segmentation
models would fail: Fig. 3 would show two objects, while a single cell would be
outlined in Fig. 4.
Fig. 4 Segmentation steps of two blood cells close to each other when topological constraints are
applied
Figures 3 and 4 taken from Le Guyader and Vese (2008) provide two
illustrations of prescribed topology enforcement. In Fig. 3, we aim at seg-
menting the two disks while maintaining the same topology throughout the
process, meaning that we expect to get a simply connected shape. Figure 4
illustrates the case where the initial condition is made of two disjoint closed
curves. We expect both curves to evolve without merging. Although the blood cells
visually look glued together, individual cell segmentation is required. Here, the final
segmentation shows that the two cells are disconnected, in compliance with the
user's requirement.
(b) regularity enforcement on the edge set of the segmentation, thus influencing
the topology of the segmenting curves/shapes, with an emphasis on variational
models, due to their ability to include multiple criteria. This kind of approach
does not fall exactly within the scope of topological prior-based methods—since
it does not intend to prescribe the topology of the targeted object—but influences
nevertheless the regularity and so in some way the topology of the final shape by
removing undesirable small patterns. In this regard, Fig. 5 taken from Alvarez
et al. (2018) (courtesy of Luis Alvarez, Universidad de Las Palmas de Gran
Canaria, Spain) illustrates how a geometrical partial differential equation can
be used for level set regularization, i.e., to remove small-scale features and
spurious oscillations. By choosing adequately a forcing term appearing in the
partial differential equation (PDE) that dictates the front evolution, one can keep
more or less detail in the final segmenting curve.
Topology Prescription
Digital Topology
It is generally agreed that the implicit framework of the level set setting displays
several advantages over parametric methods when tracking a propagating front:
the evolving contour is embedded in a higher-dimensional level set function,
thus avoiding parameterization issues, and the model is intrinsic, i.e., invariant to
a reparameterization of the curve and able to handle topological changes (merging
and splitting). Note nevertheless that some works aim to reconcile parametric
implementations with the handling of topological changes, as in Precioso et al. (2002),
for instance, where the authors address the issue of moving object segmentation
sphere surface. (iii) For each regularly sampled spherical point, one identifies the
intersecting triangle of the sphere mesh (by finding the closest triangle of the
tessellated sphere surface, a specific strategy being developed to favor some of
the triangles related to topological defects). Then using barycentric coordinates in
this triangle, the coordinates of the fiber vertex lying on the original mesh surface are
determined, yielding three functions defined on the sphere: each regularly sampled
spherical point is associated with the three coordinates giving its location on the
original cortical surface. (iv) Every function is expanded in the spherical harmonic
basis where the coefficients are defined as the L2 -inner product of the function
and the basis functions. Using the computed coefficients, two surfaces are then
reconstructed: the first one is a high-frequency surface employing all coefficients,
while the second one is a smooth surface reconstructed from filtered coefficients
using a low-pass filter. Vertices from the low-pass filtered reconstruction are patched
into the high-frequency one in regions that previously contained defects (and that
are likely to display spikes after the spherical harmonic-based reconstruction).
Finally, a post-processing step is applied to correct self-intersections.
Despite the fact that deep learning-based methods are beyond the scope of this
contribution, we would like to end this part by highlighting some methods that
intertwine the soundness of these approaches (which yield remarkable results when-
ever sufficient labeled data can be collected) with variational models, and especially
active contour-based models, capable of encoding high-level shape features such
as topology. In Thierbach et al. (2018), the authors propose to combine
convolutional neural networks with topology-preserving geometric deformable models
(Bogovic et al. 2013) in the context of neural cell bodies segmentation from light
sheet microscopy, while limiting manual annotations. The training step is achieved
with simple cell centroid annotations and the final segmentation provides accurate
results complying with the topological requirements (no cell splitting/merging).
$$R(\Phi) = -\int_{\partial D} \log \Phi\big(x + d\,\nabla\Phi(x)\big)\, ds - \int_{\partial D} \log\big(-\Phi(x - l\,\nabla\Phi(x))\big)\, ds,$$
with $D = \{x \mid \Phi(x) > 0\}$ and d > 0, l > 0 given parameters. Parameters d
and l influence the distance between distinct connected components and
the size of the connected components themselves. Although devoted to a different
application, the work Rochery et al. (2006) uses a similar idea to the one developed
in Sundaramoorthi and Yezzi (2005) to prevent pieces of the same curve from
colliding, merging, or breaking. The goal is to track thin long objects that evolve,
with applications to the automatic extraction of road networks in remote sensing
images. The authors propose interesting nonlocal regularizations on the curve C,
parameterized by p ∈ [0, 1], phrased as
$$E(C) = -\int_0^1\!\!\int_0^1 t(p)\cdot t(p')\,\Psi\big(\|C(p) - C(p')\|\big)\, dp\, dp',$$
with t(p) denoting the tangent vector to the curve at the point C(p) and
$\|C(p) - C(p')\|$ being the Euclidean distance between the curve points C(p) and
C(p'). The function $\Psi$ is chosen to be $\Psi(l) = \sinh^{-1}(1/l) + l - \sqrt{1 + l^2}$, thus
decreasing on [0, +∞[. Other nonlocal forms are considered as well, and geometric
motions of thin long objects are obtained. The implicit representation by level sets
is used for the implementation. In Rochery et al. (2005), the authors carry on their
ideas but, this time, in a phase field approach.
Still in the prospect of a local treatment of topology preservation, Cecil (2003,
Section 4) is dedicated to the tracking of interfaces with fixed topology. The model
relies on a coupled system of PDEs involving the level set function ϕ embedding
the propagating front and the arclength function conjugate to ϕ (in the sense that
the two form an orthogonal coordinate system on the zero level set of ϕ), and on an
accurate estimate of the Jacobian J of the interface function and
its conjugate (J tends to 0 at merge points, while it tends to ∞ at pinch points).
Motivated by geometric considerations, Le Guyader and Vese (2008) complement
the classical geodesic active contour model with a nonlocal component interpreted
as a repelling force. More precisely, the level set function Φ being assumed to be
a signed distance function to the evolving contour C and l > 0 denoting a tuning
parameter, the following functional is incorporated into the classical geodesic active
contour model phrased in the level set framework:
$$E(\Phi) = -\int_\Omega\!\int_\Omega G\big(\|x - y\|^2\big)\, \langle \nabla\Phi(x), \nabla\Phi(y)\rangle\, H(\Phi(x) + l)\, H(l - \Phi(x))\, H(\Phi(y) + l)\, H(l - \Phi(y))\, dx\, dy,$$
the potential function G measuring the closeness of the two points x and y, and
H denoting the 1D Heaviside function. Again, the goal is to penalize the spatial
proximity of curve points belonging to a narrow band around the zero level set and
subsequently the curvature of the level lines. This idea is then revisited in Schaeffer
and Duggan (2014) in the context of region-based active contours.
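A discretized sketch of this repelling term, under the reconstruction above, is given below: the double integral is approximated by a double sum over the pixels of a narrow band around the zero level set, with a Gaussian potential G and illustrative parameter values.

```python
# Nonlocal self-repelling energy approximated on a pixel grid.
import numpy as np

def repelling_energy(phi, l=2.0, d=3.0):
    gy, gx = np.gradient(phi)
    band = np.argwhere(np.abs(phi) < l)               # pixels in the narrow band
    E = 0.0
    for a in band:
        for b in band:
            sq = float((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
            G = np.exp(-sq / d ** 2)                  # potential G(||x - y||^2)
            E -= G * (gx[tuple(a)] * gx[tuple(b)] + gy[tuple(a)] * gy[tuple(b)])
    return E

yy, xx = np.mgrid[0:24, 0:24]
phi = np.hypot(xx - 12, yy - 12) - 6.0               # signed distance to a circle
print(repelling_energy(phi))
```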
An energy involving a fixed-width band around the evolving curve as in
Le Guyader and Vese (2008) is also introduced in Mille (2009) to achieve a
proper trade-off between local features of gradient-like terms and global region
characteristics, and to weaken the strong assumption of uniformity of intensity over
regions classical region-based models rely on.
More recently, still in an effort to ensure orientation preservation and to address
the issue of stability with respect to noise—for instance, when the shapes exhibit
multiple disjoint objects that should be viewed as a whole— a new framework based
on quasiconformal mappings has been introduced in Chan et al. (2018). Given an
image containing an object to be segmented together with the desired prescribed
topology (that can be viewed as a shape prior), a simple template image is deformed
so that it matches the boundary of the target object. In this regard, the model may
be seen as a joint segmentation/registration one. The deformation undergone by the
moving shape is dictated by the Beltrami equation and relies on the fine properties
of quasiconformal mappings. Quasiconformal mappings can be defined as follows
(Lehto and Virtanen 1973) (we restrict ourselves to quasiconformal mappings that
are homeomorphisms between plane domains):
$$\frac{\partial f}{\partial \bar z}(z) = \mu(z)\,\frac{\partial f}{\partial z}(z) \quad \text{for almost every } z \in \Omega, \qquad \text{with } \|\mu\|_\infty \le \frac{K - 1}{K + 1} < 1.$$
With $\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right)$ and $\frac{\partial f}{\partial \bar z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)$, and setting $f(z) = f(x + iy) = u(x, y) + iv(x, y)$ (so that, in the context of registration, the sought deformation is $\varphi = (u, v)^T$), the Jacobian is defined by
$$J_f(z) = \left|\frac{\partial f}{\partial z}(z)\right|^2 - \left|\frac{\partial f}{\partial \bar z}(z)\right|^2 = \frac{\partial u}{\partial x}(x, y)\,\frac{\partial v}{\partial y}(x, y) - \frac{\partial u}{\partial y}(x, y)\,\frac{\partial v}{\partial x}(x, y) = \det \nabla\varphi(x, y).$$
Consequently, if $f$, a $W^{1,2}_{loc}$-homeomorphism, is $K$-quasiconformal, then
$$J_f(z) = \left|\frac{\partial f}{\partial z}(z)\right|^2 - \left|\frac{\partial f}{\partial \bar z}(z)\right|^2 = \left|\frac{\partial f}{\partial z}(z)\right|^2\left(1 - |\mu(z)|^2\right),$$
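The following hedged sketch estimates the Beltrami coefficient of a discrete planar map by finite differences and checks the Jacobian identity above on a toy deformation (the map and grid are illustrative choices, not taken from the cited works).

```python
# Beltrami coefficient mu = f_zbar / f_z of a discrete map f = u + i v, and a
# numerical check of J_f = |f_z|^2 (1 - |mu|^2).
import numpy as np

def beltrami_coefficient(u, v):
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    fz = 0.5 * ((ux + vy) + 1j * (vx - uy))      # df/dz
    fzbar = 0.5 * ((ux - vy) + 1j * (vx + uy))   # df/dzbar
    return fzbar / fz, fz, fzbar

yy, xx = np.mgrid[0:32, 0:32] / 31.0
u, v = xx + 0.1 * xx * yy, yy                    # a mild quasiconformal distortion
mu, fz, fzbar = beltrami_coefficient(u, v)
jac = np.abs(fz) ** 2 - np.abs(fzbar) ** 2       # equals det(grad(u, v))
print(np.max(np.abs(mu)) < 1.0)                                      # True
print(np.allclose(jac, np.abs(fz) ** 2 * (1.0 - np.abs(mu) ** 2)))   # True
```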
Fig. 6 On the left, prescribed topology superimposed on the input image. On the right, obtained
segmentation
functional, the last one guarantees some smoothness of the deformation mapping
$f^\mu$, whereas the constraint on $\|\mu\|_\infty$ ensures admissibility of the Beltrami represen-
tation of the object in D. The connection to the Chan-Vese data fidelity term can
be made explicit by a simple change of variable in the second component.
Also, the regularization of the Beltrami coefficient now substitutes for the classical
shape perimeter minimization. The high accuracy and stability of the proposed model
on different segmentation tasks are then exemplified, as demonstrated in Fig. 6, in
which we try to preserve the 0-genus property of the shape (note nevertheless that
the algorithm can deal with shapes of higher genus; for instance, going back to
the example of Fig. 3, it would suffice to build a proper topological prior image J
made of two separate shapes). Classical segmentation models would fail to segment
the desired shape as a whole because of the visible occluded regions: the bear ears
would be disconnected from the body.
with $F(\nabla u, \nabla^2 u) = \mathrm{div}\left(\dfrac{\nabla u}{|\nabla u|}\right)|\nabla u|$, the quantity $\mathrm{div}\left(\frac{\nabla u}{|\nabla u|}\right)$ being the mean
curvature (thus ensuring smoothness of the evolving curve), while k denotes a
forcing term, k being assumed bounded Hölder continuous (so with rather weak
required regularity—only Hölder continuity in comparison to the classical Lipschitz
regularity). This function k that may depend on the image contents is the parameter
that can be adjusted by the user according to his needs. Note also that this class of
PDEs falls within a broader framework studied in Giga et al. (1991), for instance.
The geometric feature of these PDEs is conveyed by the fact that $F : \mathbb{R}^n\setminus\{0\} \times S^n \to \mathbb{R}$
satisfies $F(\lambda p, \lambda X + \sigma\, p \otimes p) = \lambda\, F(p, X)$, $\forall \lambda > 0$, $\forall \sigma \in \mathbb{R}$, $\forall p \in \mathbb{R}^n$,
⊗ denoting the tensor product in $\mathbb{R}^n$ and $S^n$ denoting the space of $n \times n$ real
symmetric matrices. It expresses that the zero level set of function u only depends
on the zero level set of the initial condition and not on the initial condition itself, and
the composition of any solution with a nondecreasing function remains a solution of
the equation. Such PDEs fall within the framework of the viscosity solution theory
(Crandall et al. 1992). Theoretical issues/qualitative properties of the hypersurface
evolution are investigated like comparison principle or existence/uniqueness of the
solution, with fewer requirements on function k (Hölder continuity only). Special
care is taken to the qualitative properties of the hypersurface evolution. This analysis
is fully meaningful once we have studied the shape of radial solutions, making it
possible to derive some qualitative properties (asymptotic behavior) of the (unique)
solution of the problem associated with an unspecified (but smooth enough) initial
condition.
Considering as initial hypersurface $\Gamma_0$ of $\mathbb{R}^n$ the boundary of a bounded open
set $U_0$, and denoting by $u_0 : \mathbb{R}^n \to \mathbb{R}$ any bounded uniformly continuous function
(Lipschitz continuous if k is merely Hölder continuous) that satisfies
$$\begin{cases} u_0(x) < 0 & \text{if } x \in U_0,\\ u_0(x) > 0 & \text{if } x \in \mathbb{R}^n \setminus \overline{U_0},\\ u_0(x) = 0 & \text{if } x \in \Gamma_0 = \partial U_0, \end{cases}$$
one defines $U_t = \{x \in \mathbb{R}^n : u(t, x) < 0\}$ and $\Gamma_t = \partial U_t$,
with $u := u(t, x)$ the unique viscosity solution of (1) for the initial datum $u_0(x)$.
The authors first prove that $U_t$ and $\Gamma_t$ are independent of the choice of $u_0$ and
only depend on $U_0$. Then they provide a comparison principle saying that if $U_0$
and $\hat U_0$ are bounded open sets satisfying the inclusion relation $U_0 \subseteq \hat U_0$, then for
$t \ge 0$ this inclusion relation still holds, i.e., $U_t = \{x \in \mathbb{R}^n : u(t, x) < 0\} \subseteq \hat U_t =
\{x \in \mathbb{R}^n : \hat u(t, x) < 0\}$, $u$ and $\hat u$ being the unique solutions of the PDE associated,
respectively, with the initial conditions $u_0$ and $\hat u_0$. This latter result enables one to infer
the asymptotic behavior of an evolving shape given an initial condition. In order
to get a clear view of the asymptotic states of the solution of (1) according to the
choice of the forcing term k, the authors focus on the radial solution shape analysis.
For that purpose, let $U_0 = B_{r_0}(x_0)$ be the ball centered at $x_0$ with radius $r_0$, and let us
choose as initial datum $u_0$ the signed distance function defined by
$$u_0(x) = d_{B_{r_0}(x_0)}(x) = \begin{cases} |x - x_0| - r_0 & \text{if } |x - x_0| < r_0 + S,\\ S & \text{otherwise,} \end{cases}$$
for some S > 0. Standard computations lead to the following equation in radial
coordinates:
$$\frac{\partial u}{\partial t}(t, r) = \left(\mathrm{sgn}\big(u_r(t, r)\big)\,\frac{n - 1}{r} + k(r)\right)\big|u_r(t, r)\big|, \qquad (2)$$
with sgn being the sign function. In this case, $U_t$ is given by the ball $B_{r(t)}(x_0)$, where
r(t) satisfies $u(t, r(t)) = 0$, $\forall t > 0$. By differentiating this relation with respect to t and
substituting $\frac{\partial u}{\partial t}$ with the right-hand side of (2), the problem amounts to solving an
ordinary differential equation in r for which an explicit expression of the solution
can be provided. The shape of r(t) is then investigated for several choices of the
forcing term k as well as its asymptotic behavior, which suggests that by choosing
the forcing term k properly, some stabilization properties of the propagating front
(the radius may stabilize in a finite time) can be expected as well as some particular
responses such as shrinkage of the shape or on the contrary expansion of the radius
with time. These observations combined with the monotonicity principle mentioned
above (preservation of the inclusion property) motivate the application of this model
as level set regularization (refer to Fig. 5 for an application).
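A minimal numerical sketch of this radial reduction is given below: differentiating u(t, r(t)) = 0 and using (2) (with u_r > 0 near the front) yields r'(t) = −((n − 1)/r + k(r)). The forcing terms used are hypothetical choices, merely illustrating collapse versus stabilization of the radius.

```python
# Forward-Euler integration of the radial front evolution r' = -((n-1)/r + k(r)).
import numpy as np

def evolve_radius(k, r0=2.0, n=2, dt=1e-3, T=5.0):
    r = r0
    for _ in range(int(T / dt)):
        r += dt * (-(n - 1) / r - k(r))
        if r <= 1e-3:
            return 0.0                         # the ball has collapsed
    return r

print(evolve_radius(lambda r: 0.0))            # k = 0: pure curvature shrinkage
print(evolve_radius(lambda r: 2.0 * r - 3.0))  # this k stabilizes the radius near 1
```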
High-order schemes have been proposed to solve (3), most of them being based
on nonoscillatory local interpolation techniques, for which, nevertheless, general
convergence theorems are lacking. These limitations motivate the introduction of
a new class of high-order schemes for time-dependent Hamilton-Jacobi equations
grounded on filtered schemes. The design of these filtered schemes relies on a
simple coupling of a monotone scheme and a high-order scheme: by properly
connecting the two, one inherits both the convergence to the weak viscosity
solution guaranteed by the monotone scheme (known, however, to be at most
first-order accurate) and the higher accuracy of high-order schemes, which are in
general unstable on their own, thus guaranteeing global convergence. This
is the main focal point of Falcone et al. (2020), after proposing a way to compute a
modified (in the sense, extended) velocity c in (3) ensuring regularity of the front
evolution. As the front represents the boundary of an evolving shape, and since
segmentation aims to extract object shapes from a given image, the front should stop
moving in the vicinity of the desired object boundaries. The question of designing
a suitable image-related speed function naturally emerges. From the modeling, this
speed only has meaning on the zero level set, while the evolution equation is posed
for the level set function over the entire domain. In order for the evolution equation
to have a consistent meaning for all the level sets, an extension of the speed off the
zero level set is required:
- the image-related speed function must be devised so that level sets moving under this
speed function cannot collide. A natural way of designing it is to let the speed at a point P
lying on a level set {v = c} be the value of the speed at a point Q such that Q is the closest
point to P lying on the level set {v = 0}. Q is uniquely determined whenever the normal
direction in P is well-defined.
The novelty of Falcone et al. (2020) relies then on the method of construction of
this extended velocity: it is based on the central premise that if the layout of the
level sets is known at initial stage and if all the points in the normal direction to
the zero level set evolve according to the same law, it sounds reasonable to expect
that all such points will keep their relative distance unchanged as time flows. This
observation leads to the following definition of the extended velocity c̃:
$$\tilde c(x, y, v, v_x, v_y) = c\left(x - d(v)\,\frac{v_x}{|\nabla v|},\; y - d(v)\,\frac{v_y}{|\nabla v|}\right),$$
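This construction can be mimicked discretely as in the sketch below, which samples the speed c at the estimated foot point x − d(v)∇v/|∇v| of each pixel; it assumes v is close to a signed distance function (so that d(v) = v) and uses nearest-neighbour sampling for simplicity.

```python
# Velocity extension by copying c from the (estimated) closest zero-level-set point.
import numpy as np

def extend_velocity(c, v):
    vy, vx = np.gradient(v)
    norm = np.maximum(np.hypot(vx, vy), 1e-8)
    yy, xx = np.mgrid[0:v.shape[0], 0:v.shape[1]]
    # foot points x - d(v) * grad v / |grad v|, clipped to the image domain
    fx = np.clip(xx - v * vx / norm, 0, v.shape[1] - 1)
    fy = np.clip(yy - v * vy / norm, 0, v.shape[0] - 1)
    # nearest-neighbour sampling of c at the foot points (bilinear would be smoother)
    return c[np.rint(fy).astype(int), np.rint(fx).astype(int)]

yy, xx = np.mgrid[0:64, 0:64]
v = np.hypot(xx - 32, yy - 32) - 10.0        # signed distance to a circle
c = np.cos(xx / 8.0)                          # toy speed defined everywhere
c_ext = extend_velocity(c, v)
print(c_ext.shape)
```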
Joint Segmentation and Registration Models
Motivations
Registration classically relies on three main ingredients:
(i) a deformation model describing the setting in which the objects to be matched
are interpreted and viewed and allowing to favor certain properties of the
deformation: physical models, purely geometric models, models including a
priori knowledge, etc.
(ii) a cost function, which generally comprises two terms: a first one quantifying the
misalignment between the deformed template and the reference, and a second
one regularizing the deformation, the regularization prescribing the nature of the
deformation
(iii) an optimization method
The deformation model, which is thus the first ingredient, actually motivates the
way the deformation ϕ is built in order to apply to a specific task:
(i) by analogy with physical models: for instance, elastic models (Broit 1981) in
which the shapes to be matched are considered as the observations of the same
body before and after being subject to constraints, fluid models (Christensen
et al. 1996) in which the shapes to be matched are viewed as fluids evolving
in accordance with Navier-Stokes equations, diffusion models (Fischer and
Modersitzki 2002), curvature models (Fischer and Modersitzki 2003), flows of
diffeomorphisms (Beg et al. 2005), and nonlinear models (Burger et al. 2013,
Derfoul and Le Guyader 2014, Droske and Rumpf 2004, Le Guyader and Vese
2011, Rumpf and Wirth 2009, Rabbitt et al. 1995, Pennec et al. 2005) to allow
for large deformations.
(ii) by interpolation or approximation-driven models: it means that the deformation
is described in a parameterizable set. The displacements are considered to
be known on a restricted set and are then extrapolated or approximated on
the whole domain. The family of interpolation strategies includes radial basis
functions (Zagorchev and Goshtasby 2006), elastic body splines (Davis et al.
1997), free-form deformations (Sederberg and Parry 1986), basis functions
from signal processing (Ashburner and Friston 1999), and piecewise affine
models. These models are rich enough to describe the transformations, while
having low degrees of freedom.
(iii) by including a priori knowledge (through statistical conditioning of the image
matching or through biomechanical/biophysical models, for instance, a tumor growth
model or a biomechanical model of breast tissue (Clatz et al. 2005)) or a shape
prior, in order to penalize configurations that diverge too much from it.
Additional constraints can be applied in order for the deformation to exhibit suitable
properties such as topology or orientation preservation (one-to-one property of
the deformation) (Karaçali and Davatzikos 2004, Christensen et al. 1996, Musse
et al. 2001, Noblet et al. 2005), symmetry, inverse consistency (which means that
interchanging the template and the reference should not impact on the produced
result) (Yanovsky et al. 2007), volume preservation (Haber and Modersitzki 2004),
lower and upper bounds on the Jacobian determinant (Haber and Modersitzki 2007),
etc.
The second component of an image registration method is the objective function
or the matching criterion, that is, how the available data are exploited to drive the
registration process. Ideally, it should be devised in order to comply with the nature
of the observations to be registered and should put the emphasis on salient features.
There exist numerous types of matching criteria, which can be regrouped into three
categories.
Finally, the last component is the optimization method, consisting of the following
types:
(i) continuous methods in which the variables are assumed to take real values and
the objective function to be differentiable: gradient descent (Beg et al. 2005),
conjugate gradient (Miller et al. 2002), Newton-type methods, Levenberg-
Marquardt, and stochastic gradient descent methods (Wells et al. 1996)
(ii) discrete methods (contrary to continuous methods, they perform a global search
and exhibit better convergence rates than the continuous methods): graph-based
(Tang et al. 2007), belief propagation, and linear programming methods
(iii) miscellaneous methods: greedy approaches and evolutionary algorithms
For images including several objects, registration cannot just track the changes of a
particular one. Yet, in some applications, we are only interested in tracing one
of these objects, resulting in a sequential treatment of the two tasks: segmentation
is achieved first and then registration, meaning that segmentation and registration
are processed sequentially, one task after another, without correlating them, which
in practice may propagate errors from step to step. Still, as structure/salient com-
ponent/shape/geometrical feature matching and intensity distribution comparison
rule registration, combining the segmentation and registration tasks into a single
framework sounds relevant. Beyond the fact it may reduce propagation of uncer-
tainty, jointly performing these tasks yields positive mutual influence and benefit on
the obtained results as exemplified in Fig. 7. Accurate segmented structures allow
to drive the registration process correctly, providing then a reliable deformation
between the encoded structures, not only based on intensity distribution comparison
(local criterion) but also on geometrical and topological features (nonlocal feature)
and edge transfer—thus diminishing the influence of noise. Besides, registration
can be viewed as the incorporation of prior information to guide the segmentation
process, in particular for the questions of topology preservation (the unknown
deformation is substituted for the classical evolving contour and the related Jacobian
determinant is subject to positivity constraints) and geometric priors (since the
registration allows to overcome the issue of weak boundaries). An overview of prior
related works dedicated to joint segmentation and registration is produced in the
next section.
with
$$c_1(\varphi) = \frac{\int_\Omega R\, H(\Phi_0(\varphi))\, dx}{\int_\Omega H(\Phi_0(\varphi))\, dx}, \qquad c_2(\varphi) = \frac{\int_\Omega R\,\big(1 - H(\Phi_0(\varphi))\big)\, dx}{\int_\Omega \big(1 - H(\Phi_0(\varphi))\big)\, dx},$$
Saint Venant-Kirchhoff ones (Ciarlet 1985). This outlook rules the design of the
regularization on ϕ which is thus based on the stored energy function of a Saint
Venant-Kirchhoff material. We recall that the right Cauchy-Green strain tensor
(viewed as a quantifier of the square of the local change in distances due to the deformation)
is defined by $C = \nabla\varphi^T \nabla\varphi = F^T F$. The Green-Saint Venant strain tensor is defined
by $E = \frac{1}{2}(C - I)$. Associated with a given deformation ϕ, it is a measure of the
deviation between ϕ and a rigid deformation. We also need the following notations:
$A : B = \mathrm{tr}(A^T B)$, the matrix inner product, and $\|A\| = \sqrt{A : A}$, the related matrix
norm (Frobenius norm). The stored energy function of a Saint Venant-Kirchhoff
material is defined by $W_{SVK}(F) = W(E) = \frac{\lambda}{2}(\mathrm{tr}\, E)^2 + \mu\, \mathrm{tr}\, E^2$, λ and μ being
the Lamé coefficients. To ensure that the distribution of the deformation Jacobian
determinants does not exhibit contractions or expansions that are too large and to
avoid singularities as much as possible, we complement the stored energy function
$W_{SVK}$ by the term $\mu(\det F - 1)^2$, which controls that the Jacobian determinant remains
close to 1. The weighting of the determinant component by the parameter μ allows
one to recover a convexity property for the function introduced later. (Note that
the stored energy function $W_{SVK}$ alone lacks a term penalizing the determinant:
it does not preclude deformations with negative Jacobian. The expression of its
quasiconvex envelope is more complex since it explicitly involves the singular values
of F. Also, when these are all lower than 1, the quasiconvex envelope equals
0, which shows bad behavior under compression.) Therefore, the regularization
can be written, after intermediate computations, as
$$W(F) = \beta\big(\|F\|^2 - \alpha\big)^2 - \frac{\mu}{2}(\det F)^2 + \mu(\det F - 1)^2 + \frac{\mu(\lambda + \mu)}{2(\lambda + 2\mu)}, \quad \text{where } \alpha = 2\,\frac{\lambda + \mu}{\lambda + 2\mu} \text{ and } \beta = \frac{\lambda + 2\mu}{8}.$$
Although
meaningful, the function W has a drawback: it is not quasiconvex (see
Dacorogna 2008, Chapter 9 for a complete review of this notion), which raises an
issue of a theoretical nature since we cannot obtain the weak lower semicontinuity
property. The idea is thus to replace W by its quasiconvex envelope defined by
$$QW(\xi) = \begin{cases} W(\xi) & \text{if } \|\xi\|^2 \ge 2\,\dfrac{\lambda + \mu}{\lambda + 2\mu},\\[6pt] \Psi(\det \xi) & \text{if } \|\xi\|^2 < 2\,\dfrac{\lambda + \mu}{\lambda + 2\mu}, \end{cases}$$
with Ψ the convex mapping such that $\Psi : t \mapsto -\dfrac{\mu}{2}\, t^2 + \mu\,(t - 1)^2 + \dfrac{\mu(\lambda + \mu)}{2(\lambda + 2\mu)}$ (see Ozeré et al. 2015 for the derivation),
for which the minimal argument is t = 2. The regularizer is then complemented by
a dissimilarity measure inspired by the unified model of image segmentation and
denoising introduced by Bresson et al. (2007), designed to overcome the limitation
of local minima and to allow for global minimization.
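The sketch below simply evaluates W and its quasiconvex envelope QW, as reconstructed above, for a 2 × 2 deformation gradient; the Lamé coefficients and the test matrix are hypothetical values chosen for illustration.

```python
# Evaluation of the Saint Venant-Kirchhoff-based regularizer W and its
# quasiconvex envelope QW for a 2x2 deformation gradient F.
import numpy as np

lam, mu = 10.0, 10.0
alpha = 2.0 * (lam + mu) / (lam + 2.0 * mu)
beta = (lam + 2.0 * mu) / 8.0
const = mu * (lam + mu) / (2.0 * (lam + 2.0 * mu))

def W(F):
    detF = np.linalg.det(F)
    return (beta * (np.sum(F * F) - alpha) ** 2 - 0.5 * mu * detF ** 2
            + mu * (detF - 1.0) ** 2 + const)

def psi(t):                      # convex branch, minimal at t = 2
    return -0.5 * mu * t ** 2 + mu * (t - 1.0) ** 2 + const

def QW(F):
    if np.sum(F * F) >= alpha:   # ||F||^2 above the threshold: keep W
        return W(F)
    return psi(np.linalg.det(F))

F = np.eye(2) + 0.05 * np.random.default_rng(1).normal(size=(2, 2))
print(W(F), QW(F))
```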
To that purpose, let $g : \mathbb{R}^+ \to \mathbb{R}^+$ be an edge detector function satisfying
g(0) = 1, g strictly decreasing, and $\lim_{r \to +\infty} g(r) = 0$. From now on, we set
g := g(|∇R|), and for theoretical purposes, we assume that there exists c > 0 such that
0 < c ≤ g ≤ 1 and that g is Lipschitz continuous. We then use the generalization
of the notion of functions of bounded variation to the setting of BV-spaces associated
with a Muckenhoupt weight function, as depicted in Baldi (2001). We follow Baldi's
arguments and notations to define the weighted BV-space related to the weight g.
For a general weight w, some hypotheses are required (fulfilled here by g).
More precisely, $\Omega_0$ being a neighborhood of $\bar\Omega$, the positive weight $w \in L^1_{loc}(\Omega_0)$
is assumed to belong to the global Muckenhoupt class $A_1^* = A_1^*(\Omega)$ of weight
functions, i.e., w satisfies the condition
$$C\, w(x) \ge \frac{1}{|B(x, r)|} \int_{B(x, r)} w(y)\, dy \quad \text{a.e.} \qquad (4)$$
Definition 1 (Baldi 2001, Definition 2). Let w be a weight function in the class
$A_1^*$. We denote by $BV(\Omega, w)$ the set of functions $u \in L^1(\Omega, w)$ (the set of functions
that are integrable with respect to the measure $w(x)\, dx$) such that
$$\sup\left\{\int_\Omega u\, \mathrm{div}(\varphi)\, dx \;:\; |\varphi| \le w \text{ everywhere},\ \varphi \in \mathrm{Lip}_0(\Omega, \mathbb{R}^2)\right\} < \infty, \qquad (5)$$
with $\mathrm{Lip}_0(\Omega, \mathbb{R}^2)$ the space of Lipschitz continuous functions with compact support.
We denote by $\mathrm{var}_w\, u$ the quantity (5).
Remark 1. In Baldi (2001), Baldi defines the BV-space taking as test functions
elements of $\mathrm{Lip}_0(\Omega, \mathbb{R}^2)$. Classically in the literature, the test functions are chosen
in $C^1_c(\Omega, \mathbb{R}^2)$. It can be proved that these two definitions coincide thanks to
mollifications and density results. A result explaining (5) can be found in Baldi (2001, Remark 10).
Equipped with this material (and due to the properties of the function g: it is obviously
$L^1$, continuous, and it suffices to take $C = \frac{1}{c}$ to satisfy (4) pointwise), we propose
introducing as dissimilarity measure the following functional:
$$W_{fid}(\varphi) = \mathrm{var}_g\, T \circ \varphi + \frac{\nu}{2}\int_\Omega \big(T \circ \varphi(x) - R(x)\big)^2\, dx + a \int_\Omega \Big[(c_1 - R(x))^2 - (c_2 - R(x))^2\Big]\, T \circ \varphi(x)\, dx, \qquad (6)$$
with
$$c_1 = \frac{\int_\Omega R(x)\, H_\varepsilon\big(T \circ \varphi(x) - \rho\big)\, dx}{\int_\Omega H_\varepsilon\big(T \circ \varphi(x) - \rho\big)\, dx} \quad \text{and} \quad c_2 = \frac{\int_\Omega R(x)\,\big(1 - H_\varepsilon(T \circ \varphi(x) - \rho)\big)\, dx}{\int_\Omega \big(1 - H_\varepsilon(T \circ \varphi(x) - \rho)\big)\, dx}$$
(we dropped the dependency on ϕ to lighten the expressions), $H_\varepsilon$ denoting a regu-
larization of the Heaviside function and ρ ∈ [0, 1] being a fixed parameter allowing
to partition T ◦ ϕ into two phases and yielding a binary version of the reference. The parameter ρ
can be estimated by analyzing the reference histogram to discriminate two relevant
regions or phases, for instance, through histogram shape-based methods, clustering-
based methods, entropy-based methods, object attribute-based methods, spatial
methods, or local methods (Sezgin and Sankur 2004). This proposed functional
emphasizes the link between the geodesic active contour model (Caselles et al.
1997) and the piecewise constant Mumford-Shah model (Mumford and Shah 1989):
if $\tilde T$ is the characteristic function of the set $\Omega_C$, a bounded subset of Ω with
regular boundary C, then $\mathrm{var}_g\, \tilde T$ is a new definition of the length of C with a metric
depending on the reference content (so minimizing this quantity is equivalent to
locating the curve on the boundary of the shape contained in the reference), while
$\int_\Omega \big[(c_1 - R(x))^2 - (c_2 - R(x))^2\big]\, \tilde T(x)\, dx$ approximates R in the $L^2$ sense by two
regions $\Omega_C$ and $\Omega \setminus \Omega_C$ with the two values $c_1$ and $c_2$. Indeed, $\mathrm{var}_g\, \tilde T = \int_{\Omega \cap C} g\, d\mathcal{H}^1$,
and if $c_1$ and $c_2$ are fixed (which is in practice the case in the alternating
algorithm), minimizing $\int_\Omega \big[(c_1 - R(x))^2 - (c_2 - R(x))^2\big]\, \mathbf{1}_{\Omega_C}\, dx$ is equivalent to minimizing
$\int_\Omega (c_1 - R(x))^2\, \mathbf{1}_{\Omega_C}\, dx + \int_\Omega (c_2 - R(x))^2\, \mathbf{1}_{\Omega \setminus \Omega_C}\, dx$.
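A discrete illustration of this interpretation: for a binary indicator, an anisotropic finite-difference weighted total variation reduces to a sum of g-weighted boundary edges, i.e., a g-weighted perimeter. The image and weight below are toy choices, not taken from the cited works.

```python
# Weighted total variation var_g of a binary indicator on a pixel grid.
import numpy as np

def weighted_tv(u, g):
    dux = np.abs(np.diff(u, axis=1))
    duy = np.abs(np.diff(u, axis=0))
    gx = 0.5 * (g[:, 1:] + g[:, :-1])     # weight on vertical edges
    gy = 0.5 * (g[1:, :] + g[:-1, :])     # weight on horizontal edges
    return (gx * dux).sum() + (gy * duy).sum()

yy, xx = np.mgrid[0:64, 0:64]
u = (np.hypot(xx - 32, yy - 32) < 12).astype(float)      # indicator of a disk
g = 1.0 / (1.0 + 0.1 * np.hypot(xx - 32.0, yy - 32.0))   # toy edge weight
print(weighted_tv(u, g))   # approximates the g-weighted perimeter of the disk
```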
In the end, the global minimization problem, denoted by (QP) (which stands for
quasiconvex problem), is stated as:
$$\inf_{\varphi \in \mathcal{W} = \mathrm{Id} + W_0^{1,4}(\Omega, \mathbb{R}^2)} \bar I(\varphi) = W_{fid}(\varphi) + \int_\Omega QW(\nabla\varphi)\, dx. \qquad \text{(QP)}$$
(QP) and the infimum of (P), and derivation of a related nonlocal problem, motivated
by the strength and robustness of nonlocal methods exemplified in many image
processing tasks such as image denoising, color image deblurring in the presence
of Gaussian or impulse noise, color image inpainting, color image super-resolution,
or color filter array demosaicing (Jung et al. 2011). The next part is dedicated to
the derivation of a nonlocal counterpart of problem (QP). In practice, in terms of
quantitative and qualitative accuracy, this nonlocal model gives better results than
those obtained with the local one, with higher Dice coefficients, in particular when
the shapes to be matched exhibit fine details or complex topologies.
The statement of the nonlocal problem relies on the following nonlocal approx-
imation of the weighted total variation (or nonlocal weighted BV semi-norm) by a
sequence of integral operators involving a differential quotient and a radial mollifier
sequence. It is inspired by prior works by Dávila and Ponce dedicated to the design
of nonlocal counterparts of Sobolev and BV semi-norms. Let (ρn )n∈N be a sequence
of radial mollifiers satisfying: $\forall n \in \mathbb{N}$, $\forall x \in \mathbb{R}^2$, $\rho_n(x) = \rho_n(|x|)$; $\forall n \in \mathbb{N}$, $\rho_n \ge 0$;
$\forall n \in \mathbb{N}$, $\int_{\mathbb{R}^2} \rho_n(x)\, dx = 1$; and $\forall \delta > 0$, $\lim_{n \to +\infty} \int_\delta^{+\infty} \rho_n(r)\, r\, dr = 0$. Then the
following theorem holds:
Theorem 2. Let $\Omega \subset \mathbb{R}^2$ be an open bounded set with Lipschitz boundary and let
$f \in BV(\Omega, g) \subset BV(\Omega)$, since $0 < c \le g \le 1$ everywhere. Consider $(\rho_n)_{n \in \mathbb{N}}$ defined
previously. Then
$$\lim_{n \to +\infty} \int_\Omega g(x)\left[\int_\Omega \frac{|f(x) - f(y)|}{|x - y|}\, \rho_n(x - y)\, dy\right] dx = \left[\frac{1}{|S^1|}\int_0^{2\pi} \left|e \cdot \binom{\cos\theta}{\sin\theta}\right| d\theta\right] \mathrm{var}_g f = K_{1,2}\, \mathrm{var}_g f,$$
with e being any unit vector of $\mathbb{R}^2$ and $S^1$ being the unit sphere of $\mathbb{R}^2$.
$$\begin{aligned}
\inf_{\varphi \in \mathrm{Id} + W_0^{1,4}(\Omega, \mathbb{R}^2)} E_n(\varphi) =\ & \frac{1}{K_{1,2}} \int_\Omega g(x)\left[\int_\Omega \frac{|T \circ \varphi(y) - T \circ \varphi(x)|}{|x - y|}\, \rho_n(x - y)\, dy\right] dx \\
& + a \int_\Omega \Big[(c_1 - R)^2 - (c_2 - R)^2\Big]\, T \circ \varphi\, dx \\
& + \frac{\nu}{2}\, \|T \circ \varphi - R\|^2_{L^2(\Omega)} + \int_\Omega QW(\nabla\varphi)\, dx. \qquad \text{(NLP)}
\end{aligned}$$
Fig. 7 Mapping of cardiac MRI images (ED-ES) (size: 150 × 150). (a) R (b) T (c) Binary
Reference (rescaled) (d) T ◦ ϕ (e) Deformation grid (f) T̃ (g) R − T̃ (h) Segmented Reference
Other Related Models
The topics related to segmentation and registration are vast; the area is active and
fast growing. We only briefly mention a few directions.
Optimal Flow Frameworks
When a sequence of images $z_1, z_2, \ldots$ is given, e.g., from functional MRI or from
frames of a video, one hopes to register the objects segmented in $z_1$ to their evolved
counterparts in subsequent images. This may be realized by optimal flow
registration methods or by joint segmentation and registration models (Debroux and
Le Guyader 2018; Brox and Malik 2010; Cohen 1993).
Shape Priors
Given a shape prior ψ0 intended for an image z, there exist several models in the
literature trying to segment an object ψ in z that is some transformed version of ψ0 .
This seemingly simple and useful task is highly non-trivial to realize, unless ψ is a
parametric (e.g., affine) transform of ψ0 ; see Cremers et al. (2002) and Gu (2017)
and many references therein. One fundamental challenge is that registration models
such as (QP) are highly capable of transforming one shape into another (Debroux et al.
2017), and hence, if not constrained, registration would attempt to find a match of
objects by essentially ignoring the given shape prior.
Deep learning models have been extremely popular for solving models in segmen-
tation, registration, or joint segmentation and registration (Estienne et al. 2019; Xu
and Niethammer 2019). In fact, supervised learning for image segmentation and
unsupervised learning for image registration are widely used for various imaging
applications. Current and emerging works show novelty in terms of new network
designs for segmentation as well as new energy (loss) functions.
Multi-modal Problems
Conclusion
References
Alberti, G., Bouchitté, G., Dal Maso, G.: The calibration method for the Mumford-Shah functional
and free-discontinuity problems. Calc. Var. Partial Differ. Equ. 16(3), 299–333 (2003)
Alexandrov, O., Santosa, F.: A topology-preserving level set method for shape optimization. J.
Comput. Phys. 204(1), 121–130 (2005)
Alvarez, L., Cuenca, C., Díaz, J.I., González, E.: Level set regularization using geometric flows.
SIAM J. Imag. Sci. 11(2), 1493–1523 (2018)
Ambrosio, L., Dal Maso, G.: A general chain rule for distributional derivatives. Proc. Am. Math.
Soc. 108(3), 691–702 (1990)
An, J.H., Chen, Y., Huang, F., Wilson, D., Geiser, E.: A variational PDE based level set method for a
simultaneous segmentation and non-rigid registration. In: Duncan, J.S., Gerig, G. (eds.) Medical
Image Computing and Computer-Assisted Intervention – MICCAI 2005: 8th International
Conference, Palm Springs, 26–29 Oct 2005, Proceedings, Part I, pp. 286–293. Springer,
Berlin/Heidelberg (2005)
Ashburner, J., Friston, K.J.: Nonlinear spatial normalization using basis functions. Hum. Brain
Mapp. 7(4), 254–266 (1999)
Astala, K., Iwaniec, T., Martin, G.: Elliptic Partial Differential Equations and Quasiconformal
Mappings in the Plane. Princeton University Press (2009)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential
Equations and the Calculus of Variations. Applied Mathematical Sciences. Springer (2001)
Ayed, I.B., Li, S., Islam, A., Garvin, G., Chhem, R.: Area prior constrained level set evolution for
medical image segmentation. In: Reinhardt, J.M., Pluim, J.P.W. (eds.) Medical Imaging 2008:
Image Processing, vol. 6914, pp. 27–32. SPIE (2008)
Badshah, N., Chen, K.: Image selective segmentation under geometrical constraints using an active
contour approach. Commun. Comput. Phys. 7, 759–778 (2010)
Bai, X., Sapiro, G.: A geodesic framework for fast interactive image and video segmentation and
matting. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007)
Baldi, A.: Weighted BV functions. Houston J. Math. 27(3), 683–705 (2001)
Barrett, W., Mortensen, E.N.: Interactive live-wire boundary extraction. Med. Image Anal. 1(4),
331–341 (1997)
1476 K. Chen et al.
Beg, M., Miller, M., Trouvé, A., Younes, L.: Computing large deformation metric mappings via
geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61(2), 139–157 (2005)
Ben-Zadok, N., Riklin-Raviv, T., Kiryati, N.: Interactive level set segmentation for image-guided
therapy. In: IEEE International Symposium on Biomedical Imaging (ISBI), pp. 1079–1082
(2009)
Bertrand, G.: Simple points, topological numbers and geodesic neighborhoods in cubic grids.
Pattern Recogn. Lett. 15(10), 1003–1011 (1994)
Bertrand, G.: A Boolean characterization of three-dimensional simple points. Pattern Recogn. Lett.
17(2), 115–124 (1996)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1989)
Bogovic, J.A., Prince, J.L., Bazin, P.L.: A multiple object geometric deformable model for image
segmentation. Comput. Vis. Image Underst. 117(2), 145–157 (2013)
Boink, Y.: Combined modelling of optimal transport and segmentation revealing vascular proper-
ties (2016)
Boutry, N., Géraud, T., Najman, L.: A tutorial on Well-Composedness. J. Math. Imaging Vision
60(3), 443–478 (2018)
Boykov, Y., Jolly, M.-P.: Interactive graph cuts for optimal boundary & region segmentation of
objects in N-D images. In: Proceedings Eighth IEEE International Conference on Computer
Vision (ICCV 2001), vol. 1, pp. 105–112 (2001)
Bresson, X., Esedoḡlu, S., Vandergheynst, P., Thiran, J.P., Osher, S.: Fast global minimization of
the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
Broit, C.: Optimal registration of Deformed Images. Ph.D. thesis, Computer and Information
Science, University of Pennsylvania (1981)
Brox, T., Malik, J.: Large displacement optical flow: descriptor matching in variational motion
estimation. IEEE Trans. Pattern Anal. Mach. Intell. 33(3), 500–513 (2010)
Burger, M., Modersitzki, J., Ruthotto, L.: A hyperelastic regularization energy for image registra-
tion. SIAM J. Sci. Comput. 35(1), B132–B148 (2013)
Cai, X., Chan, R., Zeng, T.: A two-stage image segmentation method using a convex variant of the
mumford–shah model and thresholding. SIAM J. Imag. Sci. 6(1), 368–390 (2013)
Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22(1), 61–79
(1997)
Cecil, T.: Numerical methods for partial differential equations involving discontinuities. Ph.D.
thesis, Department of Mathematics, University of California, Los Angeles (2003)
Chambolle, A., Cremers, D., Pock, T.: A convex approach to minimal partitions. SIAM J. Imag.
Sci. 5(4), 1113–1158 (2012)
Chan, H.L., Yan, S., Lui, L.M., Tai, X.C.: Topology-preserving image segmentation by Beltrami
representation of shapes. J. Math. Imaging Vis. 60(3), 401–421 (2018)
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image
segmentation and denoising models. J. SIAM Appl. Math. 66(5), 1632–1648 (2006)
Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277
(2001)
Chen, C., Freedman, D.: Topology noise removal for curve and surface evolution. In: Menze, B.,
Langs, G., Tu, Z., Criminisi, A. (eds.) Medical Computer Vision. Recognition Techniques and
Applications in Medical Imaging, pp. 31–42. Springer, Berlin/Heidelberg (2011)
Chen, C., Freedman, D., Lampert, C.H.: Enforcing topological constraints in random field image
segmentation. In: CVPR 2011, pp. 2089–2096 (2011)
Chen, D., Zhang, J., Cohen, L.D.: Minimal paths for tubular structure segmentation with coherence
penalty and adaptive anisotropy. IEEE Trans. Image Process. 28(3), 1271–1284 (2019)
Chen, K., Lui, L.M., Modersitzki, J.: Image and surface registration. In: Elsevier Handbook of
Numerical Analysis. Processing, Analyzing and Learning of Images, Shapes, and Forms: Part
2, chap. 15, pp. 579–611. North Holland (2019)
Christensen, G., Rabbitt, R., Miller, M.: Deformable templates using large deformation kinematics.
IEEE Trans. Image Process. 5(10), 1435–1447 (1996)
Chuang, K.S., Tzeng, H.L., Chen, S., Wu, J., Chen, T.J.: Fuzzy C-means clustering with spatial
information for image segmentation. Comput. Med. Imaging Graph. 30(1), 9–15 (2006)
Ciarlet, P.: Elasticité Tridimensionnelle. Masson (1985)
Clatz, O., Sermesant, M., Bondiau, P.Y., Delingette, H., Warfield, S.K., Malandain, G., Ayache, N.:
Realistic simulation of the 3-D growth of brain tumors in MR images coupling diffusion with
biomechanical deformation. IEEE Trans. Med. Imaging 24(10), 1334–1346 (2005)
Cohen, I.: Nonlinear variational method for optical flow computation. In: Proceedings of the 8th
Scandinavian Conference on Image Analysis (SCIA), pp. 523–530. Springer (1993)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans.
Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Crandall, M.G., Ishii, H., Lions, P.-L.: User’s guide to viscosity solutions of second order partial
differential equations. Bull. Am. Math. Soc. 27(1), 1–67 (1992)
Cremers, D., Fluck, O., Rousson, M., Aharon, S.: A probabilistic level set formulation for
interactive organ segmentation. In: Medical Imaging 2007: Image Processing, vol. 6512,
pp. 304–312. SPIE (2007)
Cremers, D., Tischhäuser, F., Weickert, J., Schnörr, C.: Diffusion snakes: Introducing statistical
shape knowledge into the Mumford-Shah functional. Int. J. Comput. Vis. 50(3), 295–313
(2002)
Criminisi, A., Sharp, T., Blake, A.: GeoS: Geodesic image segmentation. In: Forsyth, D., Torr, P.,
Zisserman, A. (eds.) Computer Vision – ECCV 2008, pp. 99–112. Springer, Berlin/Heidelberg
(2008)
Dacorogna, B.: Direct methods in the calculus of variations, 2nd edn. Springer (2008)
Davis, M.H., Khotanzad, A., Flamig, D.P., Harms, S.E.: A physics-based coordinate transformation
for 3-D image matching. IEEE Trans. Med. Imaging 16(3), 317–328 (1997)
Debroux, N., Le Guyader, C.: A joint segmentation/registration model based on a nonlocal
characterization of weighted total variation and nonlocal shape descriptors. SIAM J. Imag. Sci.
11(2), 957–990 (2018)
Debroux, N., Ozeré, S., Le Guyader, C.: A non-local topology-preserving segmentation guided
registration model. J. Math. Imag. Vision 59, 1–24 (2017)
Derfoul, R., Le Guyader, C.: A relaxed problem of registration based on the Saint Venant-
Kirchhoff material stored energy for the mapping of mouse brain gene expression data to a
neuroanatomical mouse atlas. SIAM J. Imag. Sci. 7(4), 2175–2195 (2014)
Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM J.
Appl. Math. 64(2), 668–687 (2004)
Droske, M., Rumpf, M.: Multiscale joint segmentation and registration of image morphology. IEEE
Trans. Pattern Anal. Mach. Intell. 29(12), 2181–2194 (2007)
Estienne, T., Vakalopoulou, M., Christodoulidis, S., Battistela, E., Lerousseau, M., Carre, A.,
Klausner, G., Sun, R., Robert, C., Mougiakakou, S., Paragios, N., Deutsch, E.: U-ReSNet:
Ultimate coupling of registration and segmentation with deep nets. In: Shen, D., Liu, T., Peters,
T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A. (eds.) Medical Image Computing
and Computer Assisted Intervention – MICCAI 2019, pp. 310–319. Springer International
Publishing (2019)
Falcone, M., Paolucci, G., Tozza, S.: A high-order scheme for image segmentation via a modified
level-set method. SIAM J. Imag. Sci. 13(1):497–534 (2020)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput.
Vis. 59(2), 167–181 (2004)
Fischer, B., Modersitzki, J.: Fast diffusion registration. AMS Contemp. Math. Inverse Prob. Image
Anal. Med. Imag. 313, 117–129 (2002)
Fischer, B., Modersitzki, J.: Curvature based image registration. J. Math. Imaging Vis. 18(1), 81–85
(2003)
Fischl, B., Liu, A., Dale, A.M.: Automated manifold surgery: constructing geometrically accurate
and topologically correct models of the human cerebral cortex. IEEE Trans. Med. Imaging
20(1), 70–80 (2001)
Fuzhen, H., Xuhong, Y.: Image segmentation under occlusion using selective shape priors.
In: Campilho, A., Kamel, M. (eds.) Image Analysis and Recognition, pp. 89–95. Springer,
Berlin/Heidelberg (2010)
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., Martinez-Gonzalez, P.,
Garcia-Rodriguez, J.: A survey on deep learning techniques for image and video semantic
segmentation. Appl. Soft Comput. 70, 41–65 (2018)
Geiping, J.A.: Comparison of topology-preserving segmentation methods and application to
mitotic cell tracking. Westfälische Wilhelms-Universität Münster (2014)
Giga, Y., Goto, S., Ishii, H., Sato, M.H.: Comparison principle and convexity preserving properties
for singular degenerate parabolic equations on unbounded domains. Indiana Univ. Math. J.
40(2), 443–470 (1991)
Gooya, A., Pohl, K., Bilello, M., Cirillo, L., Biros, G., Melhem, E., Davatzikos, C.: GLISTR:
Glioma image segmentation and registration. IEEE Trans. Med. Imaging 31(10), 1941–1954
(2012)
Gorthi, S., Duay, V., Bresson, X., Cuadra, M.B., Castro, F.J.S., Pollo, C., Allal, A.S., Thiran,
J.P.: Active deformation fields: Dense deformation field estimation for atlas-based segmentation
using the active contour framework. Med. Image Anal. 15(6), 787–800 (2011)
Gout, C., Le Guyader, C., Vese, L.A.: Segmentation under geometrical conditions with geodesic
active contour and interpolation using level set methods. Numer. Algorithms 39(1), 155–173
(2005)
Grady, L.: Random Walks for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(11),
1768–1783 (2006)
Gu, Y., Rice, M., Xiong, W., Li, L.: A new approach for image segmentation with shape priors
based on the Potts model. In: Proceedings of APSIPA Annual Summit and Conference 2017,
pp. 12–15. IEEE (2017)
Haber, E., Modersitzki, J.: Numerical methods for volume preserving image registration. Inverse
Probl. 20(5), 1621–1638 (2004)
Haber, E., Modersitzki, J.: Image registration method with guaranteed displacement regularity. Int.
J. Comput. Vision 71(3), 361–372 (2007)
Han, X., Xu, C., Braga-Neto, U., Prince, J.L.: Topology correction in brain cortex segmentation
using a multiscale, graph-based algorithm. IEEE Trans. Med. Imaging 21(2), 109–121 (2002)
Han, X., Xu, C., Prince, J.L.: A topology preserving level set method for geometric deformable
models. IEEE Trans. Pattern Anal. Mach. Intell. 25(6), 755–768 (2003)
He, J., Kim, C.S., Kuo, C.C.J.: Interactive segmentation techniques: Algorithms and performance
evaluation. Springer Publishing Company, Incorporated (2013)
Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual
correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2013)
Ibrahim, M., Chen, K., Rada, L.: An improved model for joint segmentation and registration based
on linear curvature smoother. J. Algoritm. Comput. Technol. 10(4), 314–324 (2016)
Jung, M., Bresson, X., Chan, T.F., Vese, L.A.: Nonlocal Mumford-Shah regularizers for color
image restoration. IEEE Trans. Image Process. 20(6), 1583–1598 (2011)
Karaçali, B., Davatzikos, C.: Estimating topology preserving and smooth displacement fields. IEEE
Trans. Med. Imag. 23(7), 868–880 (2004)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. Int. J. Comput. Vis. 1(4),
321–331 (1988)
Kihara, Y., Soloviev, M., Chen, T.: In the shadows, shape priors shine: Using occlusion to improve
multi-region segmentation. pp. 392–401. IEEE (2016)
Klodt, M., Cremers, D.: A convex framework for image segmentation with moment con-
straints. In: 13th IEEE International Conference on Computer Vision (ICCV), pp. 2236–2243
(2011)
Kong, T., Rosenfeld, A.: Digital topology: Introduction and survey. Comput. Vision Graph. Image
Process. 48(3), 357–393 (1989)
Latecki, L.J.: 3D well-composed pictures. Graph. Models Image Process. 59(3), 164–172 (1997)
Le Guyader, C., Gout, C.: Geodesic active contour under geometrical conditions: Theory and 3D
applications. Numer. Algoritm. 48(1), 105–133 (2008)
Le Guyader, C., Vese, L.A.: Self-repelling snakes for topology-preserving segmentation models.
IEEE Trans. Image Process. 17(5), 767–779 (2008)
Le Guyader, C., Vese, L.A.: A combined segmentation and registration framework with a nonlinear
elasticity smoother. Comput. Vis. Image Underst. 115(12), 1689–1709 (2011)
Lee, Y.T., Lam, K.C., Lui, L.M.: Landmark-matching transformation with large deformation via
n-dimensional quasi-conformal maps. J. Sci. Comput. 67(3), 926–954 (2016)
Lehto, O., Virtanen, K.: Quasiconformal Mappings in the Plane. Springer (1973)
Li, C., Kao, C., Gore, J.C., Ding, Z.: Implicit Active Contours Driven by Local Binary Fitting
Energy. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7 (2007)
Liu, C., Ng, M.K.P., Zeng, T.: Weighted variational model for selective image segmentation with
application to medical images. Pattern Recogn. 76, 367–379 (2018)
Liu, J., Tai, X.C., Luo, S.: Convex shape prior for deep neural convolution network based eye
fundus images segmentation (2020). https://arxiv.org/abs/2005.07476
Liu, Y., Yu, Y.: Interactive image segmentation based on level sets of probabilities. IEEE Trans.
Vis. Comput. Graph. 18(2), 202–213 (2012)
Lord, N., Ho, J., Vemuri, B., Eisenschenk, S.: Simultaneous registration and parcellation of
bilateral hippocampal surface pairs for local asymmetry quantification. IEEE Trans. Med.
Imaging 26(4), 471–478 (2007)
Luo, S., Tai, X.C., Huo, L., Wang, Y., Glowinski, R.: Convex shape prior for multi-object
segmentation using a single level set function. In: 2019 IEEE/CVF International Conference
on Computer Vision (ICCV), pp. 613–621 (2019)
Malladi, R., Sethian, J.A., Vemuri, B.C.: Shape modeling with front propagation: a level set
approach. IEEE Trans. Pattern Anal. Mach. Intell. 17(2), 158–175 (1995)
McGuinness, K., O’Connor, N.E.: A comparative evaluation of interactive segmentation algo-
rithms. Pattern Recogn. 43(2), 434–444 (2010). Interactive Imaging and Vision
Mille, J.: Narrow band region-based active contours and surfaces for 2D and 3D segmentation.
Comput. Vis. Image Underst. 113(9), 946–965 (2009)
Miller, M., Trouvé, A., Younes, L.: On the Metrics and Euler-Lagrange Equations of Computa-
tional Anatomy. Annu. Rev. B. Eng. 4(1), 375–405 (2002)
Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press (2004)
Mortensen, E., Morse, B., Barrett, W., Udupa, J.: Adaptive boundary detection using ‘live-wire’
two-dimensional dynamic programming. In: Proceedings Computers in Cardiology, pp. 635–
638 (1992)
Mory, B., Ardon, R.: Fuzzy region competition: A convex two-phase segmentation framework. In:
Sgallari, F., Murli, A., Paragios, N. (eds.) Scale Space and Variational Methods in Computer
Vision, pp. 214–226. Springer, Berlin/Heidelberg (2007)
Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated
variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Musse, O., Heitz, F., Armspach, J.P.: Topology preserving deformable image matching using
constrained hierarchical parametric models. IEEE Trans. Image Process. 10(7), 1081–1093
(2001)
Nakahara, M.: Geometry, Topology and Physics. Taylor & Francis (2003)
Noblet, V., Heinrich, C., Heitz, F., Armspach, J.P.: 3-D deformable image registration: a topology
preservation scheme based on hierarchical deformation models and interval analysis optimiza-
tion. IEEE Trans. Image Process. 14(5), 553–566 (2005)
Nosrati, M.S., Hamarneh, G.: Incorporating prior knowledge in medical image segmentation: a
survey. arXiv e-prints (2016). https://arxiv.org/abs/1607.01092
Ohlander, R., Price, K., Reddy, D.R.: Picture segmentation using a recursive region splitting
method. Comput. Graphics Image Process. 8(3), 313 – 333 (1978)
Oliveira, F.P.M., Tavares, J.M.R.S.: Medical image registration: a review. Comput. Meth.
Biomech. Biomed. Eng. 17(2), 73–93 (2014)
Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on
Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
Ozeré, S., Gout, C., Le Guyader, C.: Joint segmentation/registration model by shape alignment
via weighted total variation minimization and nonlinear elasticity. SIAM J. Imag. Sci. 8(3),
1981–2020 (2015)
Pennec, X., Stefanescu, R., Arsigny, V., Fillard, P., Ayache, N.: Riemannian elasticity: A statistical
regularization framework for non-linear registration. In: Duncan, J.S., Gerig, G. (eds.) Medical
Image Computing and Computer-Assisted Intervention – MICCAI 2005: 8th International
Conference, Palm Springs, 26–29 Oct 2005, Proceedings, Part II, pp. 943–950. Springer,
Berlin/Heidelberg (2005)
Precioso, F., Barlaud, M.: B-spline active contour with handling of topology changes for fast video
segmentation. EURASIP J. Adv. Signal Process. 2002(6), 555–560 (2002)
Price, B.L., Morse, B., Cohen, S.: Geodesic graph cut for interactive image segmentation. In: 2010
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3161–
3168 (2010)
Rabbitt, R., Weiss, J., Christensen, G., Miller, M.: Mapping of Hyperelastic Deformable Templates
Using the Finite Element Method. In: Proceedings SPIE, vol. 2573, pp. 252–265. SPIE (1995)
Rada, L., Chen, K.: A new variational model with dual level set functions for selective segmenta-
tion. Commun. Comput. Phys. 12(1), 261–283 (2012)
Rada, L., Chen, K.: Improved selective segmentation model using one level-set. J. Alg. Comput.
Technol. 7(4), 509–540 (2013)
Rao, S.R., Mobahi, H., Yang, A.Y., Sastry, S.S., Ma, Y.: Natural image segmentation with adaptive
texture and boundary encoding. In: Zha, H., Taniguchi, R.I., Maybank, S. (eds.) Computer
Vision – ACCV 2009, pp. 135–146. Springer Berlin/Heidelberg (2010)
Roberts, M., Chen, K., Irion, K.: A convex geodesic selective model for image segmentation. J.
Math. Imaging Vision 61(5), 482–503 (2019)
Rochery, M., Jermyn, I., Zerubia, J.: Phase field models and higher-order active contours. In: Tenth
IEEE International Conference on Computer Vision (ICCV’05) Volume 1, vol. 2, pp. 970–976
(2005)
Rochery, M., Jermyn, I.H., Zerubia, J.: Higher order active contours. Int. J. Comput. Vis. 69(1),
27–42 (2006)
Rother, C., Minka, T., Blake, A., Kolmogorov, V.: Cosegmentation of image pairs by histogram
matching – incorporating a global constraint into MRFs. In: 2006 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 1, pp. 993–1000
(2006)
Rumpf, M., Wirth, B.: A nonlinear elastic shape averaging approach. SIAM J. Imag. Sci. 2(3),
800–833 (2009)
Schaeffer, H., Duggan, N., Le Guyader, C., Vese, L.: Topology preserving active contours.
Commun. Math. Sci. 12(7), 1329–1342 (2014)
Sederberg, T., Parry, S.: Free-form deformation of solid geometric models. SIGGRAPH Comput.
Graph. 20(4), 151–160 (1986)
Ségonne, F.: Active contours under topology control—genus preserving level sets. Int. J. Comput.
Vis. 79(2), 107–117 (2008)
Ségonne, F., Pacheco, J., Fischl, B.: Geometrically accurate topology-correction of cor-
tical surfaces using nonseparating loops. IEEE Trans. Med. Imaging 26(4), 518–529
(2007)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance
evaluation. J. Electron. Imaging 13(1), 146–168 (2004)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
22(8), 888–905 (2000)
Siu, C.Y., Chan, H.L., Lui, L.M.: Image segmentation with partial convexity prior using discrete
conformality structures. SIAM J. Image Sci. 13(4), 2105–2139 (2020)
Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: A survey. IEEE
Trans. Med. Imaging 32(7), 1153–1190 (2013)
Spencer, J., Chen, K.: A convex and selective variational model for image segmentation. Commun.
Math. Sci. 13(6), 1453–1472 (2015)
Storath, M., Weinmann, A.: Fast partitioning of vector-valued images. SIAM J. Imag. Sci. 7(3),
1826–1852 (2014)
Sundaramoorthi, G., Yezzi, A.: More-than-topology-preserving flows for active contours and
polygons. In: Tenth IEEE International Conference on Computer Vision (ICCV’05) Vol. 1,
Vol. 2, pp. 1276–1283 (2005)
Taghanaki, S.A., Abhishek, K., Cohen, J.P., Cohen-Adad, J., Hamarneh, G.: Deep semantic
segmentation of natural and medical images: a review. arXiv e-prints (2019). https://arxiv.org/
abs/1910.07655
Tang, T., Chung, A.: Non-rigid image registration using graph-cuts. In: Medical Image Computing
and Computer-Assisted Intervention: MICCAI International Conference on Medical Image
Computing and Computer-Assisted Intervention, vol. 10, pp. 916–924 (2007)
Theljani, A., Chen, K.: An augmented Lagrangian method for solving a new variational model
based on gradients similarity measures and high order regularization for multimodality registra-
tion. Inverse Prob. Imaging 13(2), 309–335 (2019)
Thierbach, K., Bazin, P.L., Gavriilidis, F., Kirilina, E., Jäger, C., Morawski, M., Geyer, S.,
Weiskopf, N., Scherf, N.: Deep Learning meets Topology-preserving Active Contours: towards
scalable quantitative histology of cortical cytoarchitecture. bioRxiv (2018)
Thiruvenkadam, S.R., Chan, T.F., Hong, B.-W.: Segmentation under occlusions using selective
shape prior. SIAM J. Imaging Sci. 1(1), 115–142 (2008)
Tustison, N.J., Avants, B.B., Siqueira, M., Gee, J.C.: Topological well-composedness and glam-
orous glue: A digital gluing algorithm for topologically constrained front propagation. IEEE
Trans. Image Process. 20(6), 1756–1761 (2011)
Unal, G., Slabaugh, G.: Coupled PDEs for non-rigid registration and segmentation. In: IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005,
vol. 1, pp. 168–175 (2005)
Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image Registration via level-set motion: Applications to
atlas-based segmentation. Med. Image Anal. 7(1), 1–20 (2003)
Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the
Mumford and Shah Model. Int. J. Comput. Vis. 50(3), 271–293 (2002)
Vese, L.A., Le Guyader, C.: Variational Methods in Image Processing. Mathematical and Compu-
tational Imaging Sciences Series. Chapman & Hall/CRC, Taylor & Francis (2015)
Waggoner, J., Zhou, Y., Simmons, J., Graef, M.D., Wang, S.: Topology-preserving multi-label
image segmentation. In: 2015 IEEE Winter Conference on Applications of Computer Vision,
pp. 1084–1091 (2015)
Wang, L., Li, C., Sun, Q., Xia, D., Kao, C.Y.: Active contours driven by local and global intensity
fitting energy with application to brain mr image segmentation. Comput. Med. Imaging Graph.
33(7), 520–531 (2009)
Wells, W.M., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by
maximization of mutual information. Med. Image Anal. 1(1), 35–51 (1996)
Werlberger, M., Pock, T., Unger, M., Bischof, H.: A variational model for interactive shape prior
segmentation and real-time tracking. In: X.C. Tai, K. Mørken, M. Lysaker, K.A. Lie (eds.) Scale
Space and Variational Methods in Computer Vision, pp. 200–211. Springer, Berlin/Heidelberg
(2009)
Wirth, B.: On the Γ -limit of joint image segmentation and registration functionals based on phase
fields. Interfaces Free Bound. 18(4), 441–477 (2016)
Wu, G., Wang, L., Gilmore, J., Lin, W., Shen, D.: Joint segmentation and registration for infant
brain images. In: Menze, B., Langs, G., Montillo, A., Kelm, M., Müller, H., Zhang, S., Cai,
T.W., Metaxas, D. (eds.) Medical Computer Vision: Algorithms for Big Data: International
Workshop, MCV 2014, Held in Conjunction with MICCAI 2014, Cambridge, 18 Sept 2014,
Revised Selected Papers, pp. 13–21. Springer International Publishing (2014)
Xu, Z., Niethammer, M.: DeepAtlas: Joint semi-supervised learning of image registration and
segmentation. arXiv e-prints (2019). https://arxiv.org/abs/1904.08465
Yanovsky, I., Thompson, P.M., Osher, S., Leow, A.D.: Topology preserving log-unbiased nonlinear
image registration: Theory and implementation. In: Proceedings of IEEE Conference on
Computer Vision Pattern Recognition, pp. 1–8 (2007)
Yatziv, L., Bartesaghi, A., Sapiro, G.: O(N) implementation of the fast marching algorithm. J.
Comput. Phys. 212(2), 393–399 (2006)
Yezzi, A., Zollei, L., Kapur, T.: A variational framework for joint segmentation and registration.
In: Mathematical Methods in Biomedical Image Analysis, pp. 44–51. IEEE-MMBIA (2001)
Yotter, R.A., Dahnke, R., Thompson, P.M., Gaser, C.: Topological correction of brain surface
meshes using spherical harmonics. Hum. Brain Mapp. 32(7), 1109–1124 (2011)
Yu, W., Lee, H.K., Hariharan, S., Bu, W., Ahmed, S.: Evolving generalized Voronoï diagrams
for accurate cellular image segmentation. Cytometry. Part A J. Int. Soc. Anal. Cytol. 77A(4),
379–386 (2010)
Zagorchev, L., Goshtasby, A.: A comparative study of transformation functions for nonrigid image
registration. IEEE Trans. Image Process. 15(3), 529–538 (2006)
Zhang, D., Chen, K.: A novel diffeomorphic model for image registration and its algorithm. J.
Math. Imaging Vision 60(8), 1261–1283 (2018)
Zhang, J., Chen, K., Yu, B., Gould, D.: A local information based variational model for selective
image segmentation. Inverse Prob. Imaging 8(1), 293–320 (2014)
Zhou, T., Ruan, S., Canu, S.: A review: Deep learning for medical image segmentation using multi-
modality fusion. Array 3–4, 100004 (2019)
Zhu, H., Meng, F., Cai, J., Lu, S.: Beyond pixels: A comprehensive survey from bottom-up to
semantic image segmentation and cosegmentation. J. Vis. Commun. Image Represent. 34, 12–
27 (2016)
43 Recent Developments of Surface Parameterization Methods Using Quasi-conformal Geometry
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1484
Previous Works on Surface Parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1486
Mesh Parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1486
Point Cloud Parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487
Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487
Conformal Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1488
Quasi-conformal Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1488
Linear Beltrami Solver (LBS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1491
Beltrami Holomorphic Flow (BHF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1492
Teichmüller Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1493
Mesh Parameterization Using Quasi-conformal Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . 1494
Genus-0 Closed Triangle Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1495
Simply Connected Open Triangle Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1501
Multiply Connected Open Triangle Meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1507
Point Cloud Parameterization Using Conformal and Quasi-conformal Geometry . . . . . . . . 1509
Genus-0 Point Clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1511
Point Clouds with Disk Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1512
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1516
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1518
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1518
G. P. T. Choi
Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
e-mail: [email protected]
L. M. Lui ()
Department of Mathematics, The Chinese University of Hong Kong, Hong Kong, China
e-mail: [email protected]
Abstract
Keywords
Introduction
Fig. 1 Parameterization of surfaces with different topology. The top left panel shows a spherical
conformal parameterization of a genus-0 closed surface. (Image adapted from Choi et al. 2015).
The top right panel shows a free-boundary conformal parameterization of a simply connected open
surface. (Image adapted from Choi et al. 2020a). The bottom left panel shows a disk conformal
parameterization of a simply connected open surface. (Image adapted from Choi and Lui 2015).
The bottom right panel shows a poly-annulus conformal parameterization of a multiply connected
open surface. (Image adapted from Choi et al. 2021)
Fig. 2 Examples of mesh and point cloud parameterizations. Left: A simply connected open
triangle mesh and the disk conformal parameterization. (Image adapted from Choi and Lui 2015).
Right: A genus-0 point cloud and the spherical conformal parameterization. (Image adapted
from Choi et al. 2016)
Previous Works on Surface Parameterization
Mesh Parameterization
Over the past several decades, numerous mesh parameterization methods have been
developed. Readers are referred to Floater and Hormann (2005), Sheffer et al.
(2006), and Hormann et al. (2007) for detailed surveys on the subject. Below, we
highlight some recent works on mesh parameterization.
In recent years, conformal parameterization methods have been extensively
studied (see Gu and Yau 2008; Gu et al. 2020 for a comprehensive discussion).
Among all conformal parameterization methods, one common approach is to make
use of harmonic energy minimization (Gu et al. 2004; Lai et al. 2014). Another
common approach is to utilize surface Ricci flow (Jin et al. 2008; Yang et al.
2009; Zhang et al. 2014) (see Zhang et al. 2015 for a survey). Other notable
methods for computing conformal parameterizations include the slit map (Yin
et al. 2008), Koebe’s iteration (Zeng et al. 2009), metric scaling (Ben-Chen et al.
2008), boundary first flattening (Sawhney and Crane 2017), and conformal energy
minimization (Yueh et al. 2017).
Area-preserving mesh parameterization methods have also been widely studied
in recent years. Recent works include the Lie advection method (Zou et al. 2011),
the optimal mass transportation (OMT) method (Zhao et al. 2013; Su et al. 2016;
Nadeem et al. 2016; Pumarola et al. 2019; Giri et al. 2021; Lei and Gu 2021; Choi
et al. 2022), stretch energy minimization (Yueh et al. 2019), and density-equalizing
maps (Choi and Rycroft 2018; Choi et al. 2020b).
Besides, there are many other energy minimization approaches for computing
mesh parameterizations in computer graphics. Typically, these approaches define
some distortion measures and attempt to minimize them to produce the desired
effects. Recent works include the advanced MIPS method (Fu et al. 2015),
symmetric Dirichlet energy (Smith and Schaefer 2015), scalable locally injective
mappings (SLIM) (Rabinovich et al. 2017), isometry-aware preconditioning (Claici
et al. 2017), progressive parameterization (Liu et al. 2018), and efficient bijective
parameterizations (Su et al. 2020).
Point Cloud Parameterization
With the advancement of 3D data acquisition techniques, the use of point clouds
has become increasingly popular in recent decades. For this reason, there is also an
increasing interest in the development of point cloud parameterization methods for
the shape analysis and processing of point clouds.
In 2004, Zwicker and Gotsman proposed a spherical parameterization method
for genus-0 point clouds. In 2006, Tewari et al. proposed a doubly periodic global
parameterization method for genus-1 point clouds. In 2010, Zhang et al. developed
an as-rigid-as-possible meshless parameterization method for point clouds with disk
topology. In 2013, Meng et al. proposed a self-organizing radial basis function
(RBF) neural network method for point cloud parameterization.
For the conformal parameterization of point clouds, one important component is
the approximation of the Laplacian operator on point clouds. In recent years, several
point cloud Laplacian approximation methods have been proposed, including the
moving least squares (MLS) method (Belkin et al. 2009; Liang et al. 2012; Liang
and Zhao 2013), the local mesh method (Lai et al. 2013; Choi et al. 2022), and the
non-manifold Laplacian method (Sharp and Crane 2020).
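As a deliberately simple illustration of the kind of operator these methods approximate, the following sketch builds a symmetrized k-nearest-neighbor graph Laplacian with Gaussian weights for a point cloud. It is a much cruder approximation than the MLS, local mesh, or non-manifold Laplacian methods cited above, and the function name and parameter choices here are our own.

```python
import numpy as np
from scipy.spatial import cKDTree
import scipy.sparse as sp

def knn_graph_laplacian(points, k=10, sigma=None):
    """Crude point cloud Laplacian L = D - W on a symmetrized k-NN graph.

    points : (n, 3) array of point positions
    k      : number of neighbors
    sigma  : Gaussian kernel width; if None, use the mean k-th neighbor distance
    """
    n = len(points)
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)   # the first neighbor is the point itself
    dists, idx = dists[:, 1:], idx[:, 1:]
    if sigma is None:
        sigma = dists[:, -1].mean()
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    weights = np.exp(-(dists.ravel() ** 2) / (2.0 * sigma ** 2))
    W = sp.coo_matrix((weights, (rows, cols)), shape=(n, n))
    W = 0.5 * (W + W.T)                        # symmetrize the k-NN graph
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return (D - W).tocsr()
```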
Mathematical Background
Conformal Maps
A map f(z) = f(x + iy) = u(x, y) + iv(x, y) defined on a planar domain is conformal if it is holomorphic with nonvanishing derivative, i.e., if it satisfies the Cauchy–Riemann equations:
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.  (1)
Equivalently, in terms of the Wirtinger derivative \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left( \frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y} \right), the condition reads:
\frac{\partial f}{\partial \bar{z}} = 0.  (3)
Conformal maps preserve angles and hence the local geometry. Intuitively, under a
conformal map, infinitesimal circles are mapped to infinitesimal circles (see Fig. 3).
Quasi-conformal Maps
Fig. 3 An illustration of conformal and quasi-conformal maps. (Image adapted from Lui et al.
2014). Left: A surface with a circle packing texture. Middle: A conformal map of the surface onto
the unit disk. Note that the small circles are mapped to small circles. Right: A quasi-conformal
map of the surface onto the unit disk. Note that the small circles are mapped to small ellipses
Quasi-conformal maps are a generalization of conformal maps: an orientation-preserving homeomorphism f is quasi-conformal if it satisfies the Beltrami equation:
\frac{\partial f}{\partial \bar{z}} = \mu_f(z)\, \frac{\partial f}{\partial z}  (4)
for some complex-valued function \mu_f with \|\mu_f\|_\infty < 1. \mu_f is called the Beltrami
coefficient of the map f. Considering the first-order approximation of f around a
point p with respect to its local parameter, we have the following:
f(z) \approx f(p) + f_z(p)(z - p) + f_{\bar{z}}(p)\,\overline{(z - p)}  (5)
= f(p) + f_z(p)\left( (z - p) + \mu_f(p)\,\overline{(z - p)} \right),  (6)
and hence:
|f_z(p)|\,\big(1 - |\mu_f(p)|\big)\,|z - p| \le |f(z) - f(p)| \le |f_z(p)|\,\big(1 + |\mu_f(p)|\big)\,|z - p|.  (7)
This shows that an infinitesimal circle is mapped to an infinitesimal ellipse with
bounded eccentricity under a quasi-conformal map (see Figs. 3 and 4), where
the maximal magnification factor is |f_z(p)|\,\big(1 + |\mu_f(p)|\big), the maximal shrinkage
factor is |f_z(p)|\,\big(1 - |\mu_f(p)|\big), and the maximal dilatation of f is as follows:
K(f) = \frac{1 + \|\mu_f\|_{\infty}}{1 - \|\mu_f\|_{\infty}}.  (8)
Also, note that the last equality in Equation (7) holds if and only if \arg(z - p) = \arg(\mu_f(p))/2 \ (\mathrm{mod}\ \pi).
Fig. 4 An illustration of quasi-conformal maps. (Image adapted from Choi et al. 2020c)
This shows that the orientation change of the major axis of the ellipse is
arg(μf (p))/2. From the above, it can be observed that the Beltrami coefficient
μ encodes useful information of the quasi-conformality of the mapping f .
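For a piecewise linear map between planar triangulations, the Beltrami coefficient can be evaluated triangle by triangle directly from Equation (4). The following sketch (an illustrative helper of our own, not code from the cited works) computes f_z, f_{\bar z}, \mu_f, and the dilatation K of the affine map between a source and a target triangle.

```python
import numpy as np

def beltrami_coefficient(src_tri, dst_tri):
    """Beltrami coefficient of the affine map sending one planar triangle to another.

    src_tri, dst_tri : (3, 2) arrays of triangle vertex coordinates.
    Returns (mu, K): the complex Beltrami coefficient and the maximal dilatation.
    """
    P = np.column_stack([src_tri[1] - src_tri[0], src_tri[2] - src_tri[0]])
    Q = np.column_stack([dst_tri[1] - dst_tri[0], dst_tri[2] - dst_tri[0]])
    J = Q @ np.linalg.inv(P)                     # Jacobian [[u_x, u_y], [v_x, v_y]]
    ux, uy, vx, vy = J[0, 0], J[0, 1], J[1, 0], J[1, 1]
    fz    = 0.5 * ((ux + vy) + 1j * (vx - uy))   # f_z
    fzbar = 0.5 * ((ux - vy) + 1j * (vx + uy))   # f_zbar
    mu = fzbar / fz
    K = (1 + abs(mu)) / (1 - abs(mu))            # maximal dilatation, cf. Eq. (8)
    return mu, K

# Example: f(z) = z + 0.3*conj(z), i.e. (x, y) -> (1.3x, 0.7y), has mu = 0.3
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[0.0, 0.0], [1.3, 0.0], [0.0, 0.7]])
print(beltrami_coefficient(src, dst))
```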
The bijectivity of the map f is also related to the Beltrami coefficient of it. More
specifically, if f (z) = f (x + iy) = u(x, y) + iv(x, y), where u, v are two real-
valued functions, the Jacobian of f is given by the following:
J_f = u_x v_y - u_y v_x
= \frac{1}{4}\Big[ (u_x + v_y)^2 + (u_y - v_x)^2 - (u_x - v_y)^2 - (u_y + v_x)^2 \Big]
= \left| \tfrac{1}{2}(f_x - i f_y) \right|^2 - \left| \tfrac{1}{2}(f_x + i f_y) \right|^2
= |f_z|^2 - |f_{\bar{z}}|^2
= |f_z|^2 \big( 1 - |\mu_f|^2 \big),  (11)
Hence, if \|\mu_f\|_{\infty} < 1, then J_f > 0 everywhere and f is locally bijective. Moreover, given two quasi-conformal maps f and g, the Beltrami coefficient of the composition map g \circ f is given by the following composition formula:
\mu_{g \circ f} = \frac{\mu_f + \frac{\overline{f_z}}{f_z}\,(\mu_g \circ f)}{1 + \frac{\overline{f_z}}{f_z}\,\overline{\mu_f}\,(\mu_g \circ f)}.  (12)
In particular, if \mu_{f^{-1}} = \mu_g, then:
\mu_f + \frac{\overline{f_z}}{f_z}\,(\mu_g \circ f) = \mu_f + \frac{\overline{f_z}}{f_z}\,(\mu_{f^{-1}} \circ f) = \mu_f + \frac{\overline{f_z}}{f_z}\left( -\frac{f_z}{\overline{f_z}}\,\mu_f \right) = \mu_f - \mu_f = 0,  (13)
so that the composition g \circ f is conformal.
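The composition formula (12) can be checked numerically on linear maps of the form f(z) = \alpha z + \beta\bar{z}, for which f_z = \alpha, f_{\bar z} = \beta, and \mu_f = \beta/\alpha. The snippet below is such an illustrative check (the chosen coefficients are arbitrary):

```python
import numpy as np

# two linear quasi-conformal maps f and g
a_f, b_f = 2.0 + 1.0j, 0.3 - 0.2j        # f(z) = a_f*z + b_f*conj(z), mu_f = b_f/a_f
a_g, b_g = 1.5 - 0.5j, 0.1 + 0.4j        # g(w) = a_g*w + b_g*conj(w), mu_g = b_g/a_g

# direct computation: g(f(z)) = (a_g*a_f + b_g*conj(b_f))*z + (a_g*b_f + b_g*conj(a_f))*conj(z)
mu_direct = (a_g * b_f + b_g * np.conj(a_f)) / (a_g * a_f + b_g * np.conj(b_f))

# composition formula (12) with f_z = a_f; mu_g is constant, so mu_g o f = mu_g
mu_f, mu_g = b_f / a_f, b_g / a_g
tau = np.conj(a_f) / a_f
mu_formula = (mu_f + tau * mu_g) / (1 + tau * np.conj(mu_f) * mu_g)

assert np.isclose(mu_direct, mu_formula)
print(mu_direct, mu_formula)
```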
For maps between Riemann surfaces, the Beltrami coefficient is defined with respect to local coordinates; the associated Beltrami differential is independent of the choice of local coordinates z_{\alpha}, z_{\beta}, in the sense that:
\mu_{\alpha}\, \frac{d\overline{z_{\alpha}}}{dz_{\alpha}} = \mu_{\beta}\, \frac{d\overline{z_{\beta}}}{dz_{\beta}}.  (14)
Linear Beltrami Solver (LBS)
Suppose f(z) = f(x + iy) = u(x, y) + iv(x, y), where u, v are real-valued functions. From Equation (4), the Beltrami coefficient of f can be expressed in terms of the partial derivatives of u and v:
\mu_f = \frac{(u_x - v_y) + i\,(v_x + u_y)}{(u_x + v_y) + i\,(v_x - u_y)}.  (15)
Writing \mu_f = \rho + i\tau, Equation (15) can be rearranged as:
-v_y = \alpha_1 u_x + \alpha_2 u_y; \qquad v_x = \alpha_2 u_x + \alpha_3 u_y,  (16)
where:
\alpha_1 = \frac{(\rho - 1)^2 + \tau^2}{1 - \rho^2 - \tau^2}, \qquad \alpha_2 = -\frac{2\tau}{1 - \rho^2 - \tau^2}, \qquad \alpha_3 = \frac{1 + 2\rho + \rho^2 + \tau^2}{1 - \rho^2 - \tau^2}.  (17)
Similarly:
u_y = \alpha_1 v_x + \alpha_2 v_y; \qquad -u_x = \alpha_2 v_x + \alpha_3 v_y.  (18)
Now, since \nabla \cdot \begin{pmatrix} -v_y \\ v_x \end{pmatrix} = 0 and \nabla \cdot \begin{pmatrix} u_y \\ -u_x \end{pmatrix} = 0, we have the following:
\nabla \cdot \left( A \begin{pmatrix} u_x \\ u_y \end{pmatrix} \right) = 0 \quad \text{and} \quad \nabla \cdot \left( A \begin{pmatrix} v_x \\ v_y \end{pmatrix} \right) = 0,  (19)
where A = \begin{pmatrix} \alpha_1 & \alpha_2 \\ \alpha_2 & \alpha_3 \end{pmatrix}.
In the discrete case, one can discretize the elliptic PDEs (19) as sparse positive
definite linear systems. Therefore, for any given μ and some prescribed boundary
conditions, one can efficiently obtain a quasi-conformal map f with the associated
Beltrami coefficient being μ. See Lui et al. (2013) for more details of the
computational procedure of the LBS method.
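To illustrate how (19) leads to such a sparse system, the following sketch assembles a piecewise-linear finite element discretization of \nabla \cdot (A \nabla u) = 0 on a planar triangulation, with A built per triangle from Equation (17), and imposes Dirichlet conditions on prescribed vertices. This is a minimal sketch of the idea only; the function name and the boundary handling are our own and should not be taken as the implementation of Lui et al. (2013).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def linear_beltrami_solver(vertices, faces, mu, fixed_idx, fixed_pos):
    """Solve div(A grad u) = 0 and div(A grad v) = 0 (Eq. (19)) with Dirichlet data.

    vertices  : (n, 2) source vertex coordinates of a planar triangulation
    faces     : (m, 3) triangle vertex indices, counterclockwise
    mu        : (m,) complex Beltrami coefficient per triangle (|mu| < 1)
    fixed_idx : indices of constrained vertices
    fixed_pos : (len(fixed_idx), 2) prescribed positions of those vertices
    Returns an (n, 2) array holding the quasi-conformal map f = (u, v).
    """
    n = len(vertices)
    rho, tau = mu.real, mu.imag
    denom = 1.0 - rho**2 - tau**2
    a1 = ((rho - 1.0)**2 + tau**2) / denom                 # Eq. (17)
    a2 = -2.0 * tau / denom
    a3 = (1.0 + 2.0 * rho + rho**2 + tau**2) / denom

    rows, cols, vals = [], [], []
    for t, (i, j, k) in enumerate(faces):
        pi, pj, pk = vertices[i], vertices[j], vertices[k]
        area = 0.5 * ((pj[0] - pi[0]) * (pk[1] - pi[1]) - (pj[1] - pi[1]) * (pk[0] - pi[0]))
        # constant gradients of the three linear hat functions on this triangle
        e = np.array([pk - pj, pi - pk, pj - pi])          # edge opposite each vertex
        grads = np.column_stack([-e[:, 1], e[:, 0]]) / (2.0 * area)
        A = np.array([[a1[t], a2[t]], [a2[t], a3[t]]])
        local = area * grads @ A @ grads.T                 # 3x3 local stiffness matrix
        for a, va in enumerate((i, j, k)):
            for b, vb in enumerate((i, j, k)):
                rows.append(va); cols.append(vb); vals.append(local[a, b])
    L = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr().tolil()

    # impose the Dirichlet conditions by overwriting the constrained rows
    rhs = np.zeros((n, 2))
    for idx, pos in zip(fixed_idx, fixed_pos):
        L.rows[idx], L.data[idx] = [idx], [1.0]
        rhs[idx] = pos
    solve = spla.factorized(sp.csc_matrix(L))
    return np.column_stack([solve(rhs[:, 0]), solve(rhs[:, 1])])
```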
Beltrami Holomorphic Flow (BHF)
In Lui et al. (2010, 2012), another method, called the Beltrami holomorphic flow (BHF), was developed for reconstructing quasi-conformal maps from given Beltrami coefficients. The BHF method is based on the following theorem (Gardiner and Lakic 2000):
Let f^{\mu} denote the unique quasi-conformal homeomorphism of \mathbb{C} with Beltrami coefficient \mu that fixes the points 0, 1, and \infty. Suppose \mu(t) = \mu + t\nu + t\varepsilon(t), with \mu in the unit ball of C^{\infty}(\mathbb{C}) and \nu, \varepsilon(t) \in L^{\infty}(\mathbb{C}) such that \|\varepsilon(t)\|_{\infty} \to 0 as t \to 0. Then, for all w \in \mathbb{C}, we have the following:
f^{\mu(t)}(w) = f^{\mu}(w) + t\, V(f^{\mu}, \nu)(w) + o(|t|)
locally uniformly on \mathbb{C} as t \to 0, where:
V(f^{\mu}, \nu)(w) = -\frac{f^{\mu}(w)\,\big(f^{\mu}(w) - 1\big)}{\pi} \int_{\mathbb{C}} \frac{\nu(z)\,\big((f^{\mu})_z(z)\big)^2}{f^{\mu}(z)\,\big(f^{\mu}(z) - 1\big)\,\big(f^{\mu}(z) - f^{\mu}(w)\big)}\, dz.  (22)
In other words, given a Beltrami coefficient and the target positions of three
points, one can obtain a unique quasi-conformal map. In practice, to reconstruct the
quasi-conformal map, one can start with the identity map and iteratively flow the
map to f^{\mu} using BHF. See Lui et al. (2012) for more details of the computational
procedure of the BHF method.
Teichmüller Maps
A quasi-conformal map f : S_1 \to S_2 is said to be a Teichmüller map associated with the quadratic differential q = \varphi\, dz^2, where \varphi is an integrable holomorphic function on S_1 with \varphi \ne 0, if its Beltrami coefficient is of the form:
\mu_f = k\, \frac{\overline{\varphi}}{|\varphi|}  (23)
for some constant 0 \le k < 1; in other words, a Teichmüller map has uniform conformal distortion |\mu_f| = k over the whole surface. A quasi-conformal map f is called an extremal map if it minimizes the maximal dilatation K(f) among all quasi-conformal maps satisfying the same boundary and landmark constraints. Under suitable conditions, such an extremal map is itself a Teichmüller map associated with some integrable holomorphic function \phi, i.e., its Beltrami coefficient satisfies:
\mu_f = k\, \frac{\overline{\phi}}{|\phi|}.  (24)
Teichmüller maps and extremal maps are thus closely connected; see Lui et al. (2014) for the precise statements and proofs.
Mesh Parameterization Using Quasi-conformal Geometry
In recent years, quasi-conformal theory has been widely used in surface mapping,
registration, and visualization. For instance, Zeng et al. (2012) developed a method
for computing quasi-conformal mappings between Riemann surfaces using Yamabe
flow and an auxiliary metric which incorporates quasi-conformality induced from
the Beltrami differential. Specifically, quasi-conformal mappings are equivalent
to conformal mappings under the auxiliary metric and hence can be effectively
computed. Lipman et al. (2012) computed quasi-conformal plane deformations by
introducing a formula for 4-point planar warping. Weber et al. (2012) developed a
method for computing piecewise linear approximations of extremal quasi-conformal
maps. Lipman (2012) and Chien et al. (2016) developed methods for computing
bounded distortion mappings. Wong and Zhao (2014, 2015) developed methods for
computing surface mappings using discrete Beltrami flow. Zeng and Gu (2011) pro-
posed a surface registration method using quasi-conformal curvature flow. Lui and
Wen (2014) proposed a method for high-genus surface registration by computing
a quasi-conformal map between the conformal embedding of the surfaces on the
hyperbolic disk. Quasi-conformal theory has also been used in the development of
rectilinear maps (Yang and Zeng 2020) and retinotopic maps (Tu et al. 2020; Ta
et al. 2021). In this section, we review the latest mesh parameterization methods
developed based on quasi-conformal geometry.
By the uniformization theorem, every simply connected Riemann surface is
conformally equivalent to either the unit disk, the complex plane, or the Riemann
sphere. Also, every multiply connected open surface is conformally equivalent to a
circle domain with circular holes. Therefore, as mentioned earlier in section “Intro-
duction”, various methods have been proposed for parameterizing surface meshes
with different topology onto different parameter domains. Table 1 summarizes the
recent mesh parameterization methods based on quasi-conformal theory. Below, we
first introduce the parameterization methods for genus-0 closed triangle meshes and
then discuss the methods for simply connected and multiply connected open triangle
meshes.
Genus-0 Closed Triangle Meshes
Conformal Parameterization
In 2015, Choi et al. proposed a fast algorithm for the spherical conformal parame-
terization of genus-0 closed triangle meshes (see Fig. 5). More specifically, given a
genus-0 closed triangle mesh M, the algorithm first follows the idea in Haker et al.
(2000) and punctures one triangle T = [vi , vj , vk ] from M. The punctured surface
M \ T is then a simply connected open surface and hence can be mapped onto the
plane by solving the Laplace equation:
\Delta g = 0,  (27)
where g : M \ T → C flattens the punctured mesh onto a planar triangular
domain with the three mapped boundary vertices g(vi ), g(vj ), and g(vk ) forming a
Fig. 5 An illustration of the fast spherical conformal parameterization method. (Image adapted
from Choi et al. 2015)
boundary triangle with the same angle structure as T . One can then map the planar
triangular domain onto the unit sphere using the inverse stereographic projection
−1
ϕN : C → S2 , where the stereographic projection ϕN : S2 → C is given by the
following:
X Y
ϕN (X, Y, Z) = +i (28)
1−Z 1−Z
and the inverse stereographic projection \varphi_N^{-1} : \mathbb{C} \to \mathbb{S}^2 is given by the following:
\varphi_N^{-1}(x + iy) = \left( \frac{2x}{1 + x^2 + y^2},\ \frac{2y}{1 + x^2 + y^2},\ \frac{x^2 + y^2 - 1}{1 + x^2 + y^2} \right).  (29)
The composition map \varphi_N^{-1} \circ g is then a parameterization mapping from M onto
the unit sphere S2 . However, the conformal distortion near the punctured triangle T ,
which corresponds to the north pole region of the unit sphere, is severe in the discrete
case. To correct the conformal distortion there, the algorithm in Choi et al. (2015)
maps the sphere to the extended complex plane using the south pole stereographic
projection ϕS : S2 → C with the following:
\varphi_S(X, Y, Z) = \frac{X}{1 + Z} + i\, \frac{Y}{1 + Z},  (30)
such that the south pole region of the unit sphere is mapped to the outermost part
of the planar domain and the north pole region of the unit sphere is mapped to
the innermost part of the planar domain. The algorithm then computes a quasi-
conformal map h : \mathbb{C} \to \mathbb{C} with the Beltrami coefficient \mu_h = \mu_{(\varphi_S \circ \varphi_N^{-1} \circ g)^{-1}} and
with the outermost part of the domain fixed using the LBS method (Lam and Lui
2014). The composition map h \circ \varphi_S \circ \varphi_N^{-1} \circ g is then conformal by the composition
formula in Equation (12). Finally, the map \varphi_S^{-1} \circ h \circ \varphi_S \circ \varphi_N^{-1} \circ g gives a conformal
parameterization of M onto the unit sphere. Moreover, the use of the Beltrami
coefficients also helps ensure that the mapping is bijective (see Fig. 6).
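For reference, the stereographic projections in Equations (28)–(30) are straightforward to implement; the sketch below uses our own function names and includes a simple round-trip check.

```python
import numpy as np

def stereographic_north(P):
    """phi_N of Eq. (28): project sphere points (X, Y, Z) from the north pole to C."""
    X, Y, Z = P[:, 0], P[:, 1], P[:, 2]
    return (X + 1j * Y) / (1.0 - Z)

def stereographic_north_inv(z):
    """phi_N^{-1} of Eq. (29): map complex points back onto the unit sphere."""
    x, y = z.real, z.imag
    d = 1.0 + x**2 + y**2
    return np.column_stack([2*x/d, 2*y/d, (x**2 + y**2 - 1.0)/d])

def stereographic_south(P):
    """phi_S of Eq. (30): projection from the south pole."""
    X, Y, Z = P[:, 0], P[:, 1], P[:, 2]
    return (X + 1j * Y) / (1.0 + Z)

# round-trip check on random sphere points (away from the poles with probability 1)
P = np.random.randn(100, 3)
P /= np.linalg.norm(P, axis=1, keepdims=True)
assert np.allclose(stereographic_north_inv(stereographic_north(P)), P)
```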
Another spherical conformal parameterization method that utilizes quasi-
conformal theory is the parallelizable global conformal parameterization (PGCP)
method (Choi et al. 2020a) (see Fig. 7 for an example). The PGCP method achieves
the conformal parameterization in a divide-and-conquer manner by considering
Fig. 6 The spherical conformal parameterization method in Choi et al. (2015) is capable of
mapping a complicated dinosaur mesh (left) onto the unit sphere bijectively (bottom right), while
the traditional method (Gu et al. 2004) (top right) produces overlaps. (Image adapted from Choi
2016)
Fig. 7 The spherical conformal parameterization of a genus-0 duck surface mesh obtained using
the parallelizable global conformal parameterization (PGCP) method. (Image adapted from Choi
et al. 2020a). The colors indicate the correspondence between the subdomains in the original mesh
and in the parameterization result
a partition of the input triangle mesh into several submeshes. Because of the use of
mesh partition, the PGCP method is capable of handling not only genus-0 closed
surfaces but also simply connected open surfaces. The method will be explained in
detail later in section “Simply Connected Open Triangle Meshes”.
Quasi-conformal Parameterization
In 2015, Choi et al. developed the fast landmark-aligned spherical harmonic
parameterization (FLASH) method for genus-0 closed triangle meshes (see Fig. 8
for an illustration). More specifically, given two genus-0 closed triangle meshes S_1
and S_2 with two sets of corresponding landmarks \{p_j\}_{j=1}^{n} and \{q_j\}_{j=1}^{n} on S_1 and
S_2, respectively, denote the spherical conformal parameterization of S_2 obtained by
the abovementioned method in Choi et al. (2015) by \phi_2 : S_2 \to \mathbb{S}^2. The FLASH
method aims to find a spherical parameterization f : S_1 \to \mathbb{S}^2 such that f(p_j)
matches \phi_2(q_j) as accurately as possible for all j = 1, 2, \ldots, n, and the conformal
distortion of f is also as small as possible. To achieve this, the method first computes
the spherical conformal parameterization \phi_1 : S_1 \to \mathbb{S}^2. It then solves for a quasi-
conformal map \psi : \mathbb{S}^2 \to \mathbb{S}^2 that minimizes the following combined energy:
E_{\mathrm{combined}}(\psi) = \int_{\mathbb{S}^2} |\nabla \psi|^2 + \lambda \sum_{j=1}^{n} |\psi(\phi_1(p_j)) - \phi_2(q_j)|^2,  (31)
where λ ≥ 0 is a weighting factor for balancing the conformality and the landmark
mismatch. In particular, a large λ yields a quasi-conformal map with a smaller
landmark mismatch but a larger conformal distortion, while a small λ yields a
smaller conformal distortion but the landmark mismatch will be larger. The optimal \psi can then be obtained by solving the corresponding Euler–Lagrange equation of (31).
The fast spherical quasi-conformal (FSQC) parameterization method (Choi et al. 2016) further allows a target quasi-conformal dilatation K(T) to be prescribed on each triangle T of a genus-0 closed mesh; by Equation (8), the magnitude of the corresponding Beltrami coefficient is:
|\mu(T)| = \frac{K(T) - 1}{K(T) + 1}  (34)
for every triangle T . By applying the LBS method (Lui et al. 2013) to reconstruct
a quasi-conformal map on the plane associated with the Beltrami coefficient
μ followed by the inverse stereographic projection, the desired spherical quasi-
conformal parameterization is obtained. Figure 10 shows an example of spherical
quasi-conformal parameterization obtained by the FSQC method.
Fig. 10 An example of the fast spherical quasi-conformal parameterization (FSQC) method for
genus-0 closed triangle meshes. (Image adapted from Choi et al. 2016). Left: The input genus-0
closed surface with a circle packing texture and the spherical quasi-conformal parameterization
obtained by FSQC. Right: The prescribed quasi-conformal dilatation and the final dilatation of the
resulting parameterization. Note that the circles on the input surface are mapped to two classes of
ellipses with different eccentricity as shown in the parameterization result, which correspond to
K = 1.5 and K = 3 in the target dilatation histogram, respectively
Simply Connected Open Triangle Meshes
Conformal Parameterization
In 2015, Choi and Lui proposed a fast disk conformal parameterization method
for simply connected open triangle meshes (see Fig. 11). The method involves two
major steps, namely, the “north pole” step and the “south pole” step. Analogous
to the spherical conformal parameterization method in Choi et al. (2015), the
method handles the conformal distortion at different parts of the parameter domain
separately. More specifically, after getting an initial disk harmonic map by solving
the Laplace equation:
\Delta f = 0  (35)
subject to a circular boundary constraint, the method considers the following “north
pole” step. It first maps the unit disk to the upper half plane using the Cayley
transform:
W(z) = i\, \frac{1 + z}{1 - z},  (36)
and composes the map with another quasi-conformal map to reduce the conformal
distortion using the idea of quasi-conformal composition in Equation (12) with the
boundary triangle fixed. Then, it maps the upper half plane back to the unit disk
using the inverse Cayley transform:
W^{-1}(z) = \frac{z - i}{z + i}.  (37)
Fig. 11 An illustration of the fast disk conformal parameterization method for simply connected
open triangle meshes. (Image adapted from Choi and Lui 2015). (a) “North pole” iteration. (b)
“South pole” iteration
The above step helps reduce the conformal distortion at the innermost region of the
disk, while the distortion at the region around z = 1 may still be large. Therefore,
in the subsequent “south pole” step, the method uses a reflection mapping z \mapsto 1/\bar{z}
to reflect the disk along the unit circle, so that the outermost region of the new
shape corresponds to the innermost region of the disk, which is with low conformal
distortion due to the previous “north pole” step. One can then fix the outermost
region and apply the idea of quasi-conformal composition again to compute a
quasi-conformal map so that the conformal distortion at the region around z = 1
is reduced. By repeating the above procedure, one can eventually obtain a disk
conformal parameterization.
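The elementary maps used in the “north pole” and “south pole” steps are one-liners; a small sketch (with our own naming, and assuming the reflection z \mapsto 1/\bar{z} as described above) is given below.

```python
import numpy as np

def cayley(z):
    """W of Eq. (36): unit disk -> upper half plane."""
    return 1j * (1.0 + z) / (1.0 - z)

def cayley_inv(z):
    """W^{-1} of Eq. (37): upper half plane -> unit disk."""
    return (z - 1j) / (z + 1j)

def reflect_unit_circle(z):
    """Reflection of the disk across the unit circle, z -> 1/conj(z)."""
    return 1.0 / np.conj(z)

z = 0.3 + 0.2j                       # a point in the unit disk
assert np.isclose(cayley_inv(cayley(z)), z)
```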
In 2018, Choi and Lui proposed a linear formulation for disk conformal parame-
terization of simply connected open triangle meshes. The idea is to use a technique
called double covering to turn any given simply connected open triangle mesh into a
genus-0 mesh and then apply the fast spherical conformal parameterization method
in Choi et al. (2015). More specifically, given a simply connected open triangle
mesh M = (V, E, F), the method constructs a new mesh \widetilde{M} by duplicating
M and reversing the orientation of every triangle in it. In other words, for each
triangle [v_i, v_j, v_k] in M, the corresponding triangle in \widetilde{M} is given by [\tilde{v}_i, \tilde{v}_k, \tilde{v}_j],
where \tilde{v}_i, \tilde{v}_j, \tilde{v}_k are copies of the vertices v_i, v_j, v_k. One can then glue M and \widetilde{M} along their boundaries to obtain a genus-0 closed triangle mesh, to which the fast spherical conformal parameterization method in Choi et al. (2015) can be applied. The part of the sphere corresponding to M is then projected onto the plane, and the boundary is enforced to lie on the unit circle via the normalization:
v \mapsto \frac{v}{|v|}  (38)
for all boundary vertices. Finally, to correct the conformal distortion caused by
the projection, the method composes the parameterization map with another quasi-
conformal map based on the composition formula in Equation (12), thereby yielding
a disk conformal parameterization (see Fig. 12 for an example).
Note that the abovementioned methods compute the conformal parameterization
of the input mesh globally. In case the density of the input mesh is very high or the
mesh geometry is complicated, the computation of the global parameterization may
be expensive and challenging. To resolve this issue, Choi et al. (2020a) proposed
the parallelizable global conformal parameterization (PGCP) method (see Fig. 13
for an illustration). Specifically, the PGCP method considers
partitioning the input mesh into different subdomains. For each subdomain, the
discrete natural conformal parameterization (DNCP) method in Desbrun et al.
(2002) is used for finding an initial free-boundary conformal flattening map. As
the local parameterizations of different subdomains may not be consistent along
their boundaries, the PGCP method looks for a series of conformal maps to deform
Fig. 12 The disk conformal parameterization of a simply connected open surface obtained using
the linear disk map method. (Image adapted from Choi and Lui 2018)
Fig. 14 An illustration of the partial welding procedure. (Image adapted from Choi et al. 2020a)
the boundaries to enforce the consistency between them. This is achieved using a
variant of conformal welding called partial welding.
More specifically, given a diffeomorphism f from a closed curve (e.g., the unit
circle) to itself, conformal welding aims to find two Jordan domains D, \Omega \subset \mathbb{C}
and two conformal maps \phi : D \to \Omega and \phi^* : D^* \to \Omega^*, where D^* and \Omega^*
are the exteriors of D and \Omega, respectively, such that \phi = \phi^* \circ f on the closed
curve. In other words, the two surfaces are stitched together seamlessly. By the
sewing theorem (Lehto 1973), if f is a quasisymmetric function from the real axis
to itself, then the upper and lower half-planes can be mapped conformally onto
disjoint Jordan domains D and \Omega by two maps \phi and \phi^*, with \phi(x) = \phi^*(f(x)) for
all x ∈ R. Partial welding is a variant of conformal welding in the sense that it
does not assume the full correspondence between two boundary curves but only the
correspondence between a portion of the two curves. As illustrated in Fig. 14, to
enforce the consistency between two arcs of the boundaries of two Jordan regions
A and B on the complex plane, one can apply a series of analytic functions to
map A to the upper half plane and B to the lower half plane such that the two
corresponding arcs are mapped to the same interval I on the real axis. Then, one can
find a conformal map that matches the corresponding points on the two arcs, thereby
enforcing the consistency between them. After transforming all the boundaries
of the flattened subdomains using this idea of partial welding, one can solve the
Laplace equation subject to the welded boundary constraints for each subdomain.
The final result is then a global free-boundary conformal parameterization of the
input mesh. It is noteworthy that both the initial and final parameterizations of the
subdomains are independent of those of the other subdomains, and hence one can
exploit parallelization in the computational procedure. Some additional steps can
be further incorporated for producing disk conformal parameterizations. It is also
possible to further reduce the area distortion of the conformal parameterizations by
finding an optimal Möbius transformation.
For some applications, it is more desirable to compute conformal parameteriza-
tions of the given surfaces onto a standardized planar domain different from a disk
or a rectangle. For instance, 3D carotid artery surfaces are usually visualized with
the aid of a nonconvex L-shaped parameter domain. In 2017, Choi et al. developed a
conformal parameterization method for flattening carotid artery surface meshes. The
method starts by computing an arclength scaling map onto a nonconvex L-shaped
planar domain for the initialization. Next, it computes the Beltrami coefficient of
the inverse of the arclength scaling map and then constructs a quasi-conformal
map from the L-shaped domain onto itself with the same Beltrami coefficient using
the LBS method (Lui et al. 2013), thereby yielding a conformal flattening map by
the composition formula in Equation (12). However, since the L-shaped domain is
nonconvex, the overall mapping is not guaranteed to be bijective especially near the
nonconvex corner of the domain. To enforce the bijectivity, the method considers
smoothing and chopping the Beltrami coefficient iteratively. More specifically, the
smoothing step is done by solving the following energy minimization problem:
\tilde{\mu} = \operatorname{argmin}_{\mu} \int \left( |\nabla \mu|^2 + |\mu - \nu|^2 + |\mu|^2 \right),  (39)
Quasi-conformal Parameterization
The LBS method (Lui et al. 2013) and the BHF method (Lui et al. 2012) can be
naturally applied for computing quasi-conformal parameterizations of any given
simply connected open triangle mesh. Specifically, after parameterizing the given
mesh onto a planar domain using the abovementioned conformal parameterization
methods, one can compute a quasi-conformal map with a prescribed Beltrami
Fig. 15 The conformal parameterization of a carotid artery surface onto a standardized L-shaped
planar domain. (Image adapted from Choi et al. 2017). Here, the color represents the vessel-wall-
plus-plaque thickness (VWT) measurement for the carotid model
coefficient subject to some boundary constraints using either LBS or BHF. Similarly,
the QC iteration method (Lui et al. 2014) can be used for computing landmark-
matching Teichmüller parameterization of simply connected open triangle meshes.
It is noteworthy that these approaches can only produce fixed-boundary quasi-
conformal parameterizations.
More recently, Qiu et al. (2019) proposed a method for computing free-boundary
quasi-conformal parameterization of simply connected open triangle meshes. Let
f (z) = f (x + iy) = u(x, y) + iv(x, y) and μ = ρ + iτ . The least squares quasi-
conformal energy is defined as follows:
E_{\mathrm{LSQC}}(u, v; \mu) = \frac{1}{2} \int \left\| P\, \nabla u + J\, P\, \nabla v \right\|^2 \, dx\, dy,  (40)
where:
P = \frac{1}{1 - |\mu|^2} \begin{pmatrix} 1 - \rho & -\tau \\ -\tau & 1 + \rho \end{pmatrix}  (41)
and:
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.  (42)
where:
A = \begin{pmatrix} \dfrac{(\rho - 1)^2 + \tau^2}{1 - \rho^2 - \tau^2} & -\dfrac{2\tau}{1 - \rho^2 - \tau^2} \\[1ex] -\dfrac{2\tau}{1 - \rho^2 - \tau^2} & \dfrac{1 + 2\rho + \rho^2 + \tau^2}{1 - \rho^2 - \tau^2} \end{pmatrix}.  (44)
Multiply Connected Open Triangle Meshes
Conformal Parameterization
In Choi et al. (2021), Choi developed a method for the annulus conformal
parameterization of multiply connected open triangle meshes with one hole and
a method for the poly-annulus conformal parameterization of multiply connected
open triangle meshes with k > 1 holes.
An illustration of the annulus conformal map (ACM) method is shown in Fig. 16.
Given any multiply connected open triangle mesh, the ACM method starts by
finding a path from a vertex at the inner boundary to a vertex at the outer boundary
and slicing the mesh along the path. As the sliced mesh is simply connected, one can
map it onto a rectangle using the rectangular conformal parameterization method
in Meng et al. (2016) with a periodic boundary constraint at the top and bottom
boundaries (the method will be explained in detail later in section “Point Cloud
Parameterization Using Conformal and Quasi-conformal Geometry”). Now, denote
the rectangular domain as [0, L] × [0, 1]. One can apply the following exponential
map \eta to map the rectangular domain to an annulus with inner radius e^{-2\pi L} and
outer radius 1:
\eta(z) = e^{2\pi (z - L)}.  (45)
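The exponential map (45) is a one-line computation; for a rectangular parameterization stored as complex coordinates, a possible sketch (with our own naming) is:

```python
import numpy as np

def rect_to_annulus(z, L):
    """Eq. (45): map the rectangle [0, L] x [0, 1] (as complex numbers z = x + iy)
    to the annulus with inner radius exp(-2*pi*L) and outer radius 1."""
    return np.exp(2.0 * np.pi * (z - L))

z = np.array([0.0 + 0.0j, 0.5 + 0.25j, 1.0 + 1.0j])   # sample points with L = 1
print(np.abs(rect_to_annulus(z, L=1.0)))              # radii between e^{-2 pi} and 1
```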
Fig. 16 An illustration of the annulus conformal map (ACM) method. (Image adapted from Choi
et al. 2021)
1508 G. P. T. Choi and L. M. Lui
Fig. 17 An illustration of the poly-annulus conformal map (PACM) method. (Image adapted
from Choi et al. 2021)
Quasi-conformal Parameterization
Given any multiply connected open surface and any target Beltrami coefficient,
it is natural to ask whether one can compute a quasi-conformal parameterization
of the surface onto a canonical circle domain with the Beltrami coefficient of the
resulting mapping matching the input Beltrami coefficient. One major challenge in
this problem is that the radii and centers of the inner circles on the circle domain
depend on the input multiply connected surface and hence cannot be set arbitrarily.
As the LBS method (Lui et al. 2013) and the BHF method (Lui et al. 2012) require
fixed (Dirichlet) boundary conditions, they cannot be used for computing the quasi-
conformal parameterization with the desired Beltrami coefficient directly. To solve
this problem, Ho and Lui (2016) proposed a variational approach called QCMC
for computing the quasi-conformal parameterization of multiply connected open
surfaces. More specifically, given any multiply connected open triangle mesh M
with ∂M = γ0 − γ1 − γ2 − · · · − γk , i.e., γ0 is the outer boundary and γ1 , . . . , γk
are the inner boundaries, and any Beltrami coefficient μ, the QCMC method treats
the radii r and centers c of the inner circles on the circle domain as variables and
minimizes the following energy to solve for an optimal quasi-conformal map f :
$$E(f, r, c) = \int_M \| f_{\bar{z}} - \mu f_z \|^2, \qquad (46)$$

subject to the constraints $f(\gamma_0) = \partial \mathbb{D}$, $f(\gamma_i) = \partial B_{r_i}(c_i)$ for $i = 1, \ldots, k$ and
$\|\mu(f)\|_\infty = \| f_{\bar{z}} / f_z \|_\infty < 1$. Here, $B_{r_i}(c_i)$ denotes the circle centered at a point
$c_i$ with radius $r_i > 0$. In other words, the QCMC method simultaneously
searches for the optimal conformal module (r, c) for the boundary constraints and
the optimal quasi-conformal map f that satisfies the boundary constraints and is
associated with the prescribed Beltrami coefficient. Figure 18 shows an example of
the quasi-conformal parameterization obtained by the QCMC method.
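For intuition about the quantity minimized in (46), the norm of the Beltrami coefficient is easy to evaluate for a piecewise-linear map between planar triangulations; the following sketch (our own helper, restricted to the planar-to-planar case for illustration) computes the per-triangle Beltrami coefficient $\mu = f_{\bar z}/f_z$, which is also the quantity color-coded in Fig. 18.

```python
import numpy as np

def triangle_beltrami(src, dst):
    """Per-triangle Beltrami coefficient of the piecewise-linear map taking the
    planar triangle src (3x2 array) to dst (3x2 array), using
    f_z = ((u_x + v_y) + i(v_x - u_y))/2, f_zbar = ((u_x - v_y) + i(v_x + u_y))/2."""
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])   # source edge matrix (2x2)
    D = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])   # target edge matrix (2x2)
    Jac = D @ np.linalg.inv(S)          # Jacobian [[u_x, u_y], [v_x, v_y]]
    (ux, uy), (vx, vy) = Jac
    fz    = 0.5 * ((ux + vy) + 1j * (vx - uy))
    fzbar = 0.5 * ((ux - vy) + 1j * (vx + uy))
    return fzbar / fz

# A pure shear is quasi-conformal but not conformal, so |mu| > 0:
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])  # shear in x
print(abs(triangle_beltrami(src, dst)))
```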
It is also possible to compute the Teichmüller parameterizations of multiply
connected open triangle meshes. In 2014, Ng et al. developed a method for
computing the extremal Teichmüller map between two multiply connected domains.
The method iteratively updates the Beltrami coefficient of the mapping using BHF
until the norm of the Beltrami coefficient becomes uniform (see Fig. 19 for an
example). By combining the conformal parameterization methods for multiply
connected open surfaces and the proposed extremal Teichmüller mapping method,
the Teichmüller parameterization of any multiply connected open triangle mesh can
be obtained.
Point Cloud Parameterization Using Conformal and Quasi-conformal Geometry

In recent years, several methods have been proposed for computing the conformal
and quasi-conformal parameterization of point clouds. Many of these methods are
motivated by prior mesh parameterization approaches, with some key modifications
and extensions for handling point clouds. Table 2 gives an overview of the recent
works. Below, we introduce the works for the parameterization of genus-0 point
clouds and then the works for point clouds with disk topology.
Fig. 18 panels (left to right): Boy mesh; Quasi-conformal parameterization; Histogram of the input BC; Histogram of the output BC.
Fig. 18 An example of the QCMC method for the quasi-conformal parameterization of multiply connected open surfaces. (Image adapted from Ho and Lui
2016). Left: The input multiply connected open triangle mesh and the output quasi-conformal parameterization color-coded by the norm of the Beltrami
coefficient of the output map. Right: The histograms of the norm of the prescribed Beltrami coefficient and that of the output map
Fig. 19 An example of the extremal Teichmüller map between two multiply connected domains.
(Image adapted from Ng et al. 2014). Left: A multiply connected domain with a circle packing
texture. Middle: The extremal Teichmüller map onto another multiply connected domain. Note
that the small circles are mapped to small ellipses with uniform eccentricity. Right: The histogram
of the norm of the Beltrami coefficient of the resulting map
Table 2 A summary of recent conformal and quasi-conformal parameterization methods for point clouds

Method                                 | Surface type       | Target domain | Criterion
Spherical map (Choi et al. 2016)       | Topological sphere | Sphere        | Conformal
TEMPO (Meng et al. 2016)               | Topological disk   | Rectangle     | Conformal/Teichmüller
PCQC (Meng and Lui 2018)               | Topological disk   | Rectangle     | Quasi-conformal
Free-boundary map (Choi et al. 2022)   | Topological disk   | Free-boundary | Conformal
For the parameterization of genus-0 point clouds, Choi et al. developed a spherical
conformal parameterization method in Choi et al. (2016). Analogous to the spherical
conformal mapping algorithm for triangle meshes in Choi et al. (2015), the point
cloud spherical conformal parameterization method considers a “north pole” step
and a “south pole” step. More specifically, the method starts by approximating the
Laplacian operator on point clouds using the moving least squares (MLS) method
with a Gaussian-type weight function. Using the point cloud Laplacian, one can
compute a harmonic flattening map of a genus-0 point cloud and then map it to the
sphere using the inverse stereographic projection in Equation (29). This forms the
“north pole” step in the proposed method (Choi et al. 2016). As for the “south pole”
step, instead of solving for a quasi-conformal map as described in Choi et al. (2015),
here the method applies the south pole stereographic projection in Equation (30)
and then solves another Laplace equation followed by the inverse south pole
stereographic projection. It was shown in Choi et al. (2016) that by performing the
“north pole” step and the “south pole” step iteratively, one can eventually obtain a
spherical conformal parameterization of the point cloud. In other words, using the
north-south reiteration scheme, one can achieve conformality without computing
quasi-conformal maps as in the abovementioned mesh parameterization methods.
Figure 20 shows an example of the spherical conformal parameterization obtained
by Choi et al. (2016). More recently, a variation of the method has been proposed
in Jarvis et al. (2021) for the spherical parameterization of sparse genus-0 point
clouds.
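The two stereographic projections alternated in such a "north pole"/"south pole" scheme are the standard ones; Equations (29) and (30) of the chapter are not reproduced here, but the following sketch (our own implementation of the textbook formulas) shows the projections and their inverses that the iteration switches between.

```python
import numpy as np

def stereo_north(p):
    """North-pole stereographic projection of points p (N x 3 on the unit sphere,
    away from (0, 0, 1)) to the plane z = 0."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    return np.column_stack([x / (1.0 - z), y / (1.0 - z)])

def stereo_north_inv(q):
    """Inverse north-pole stereographic projection of planar points q (N x 2)."""
    u, v = q[:, 0], q[:, 1]
    s = u**2 + v**2
    return np.column_stack([2*u, 2*v, s - 1.0]) / (s + 1.0)[:, None]

def stereo_south(p):
    """South-pole stereographic projection (used in the 'south pole' step)."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    return np.column_stack([x / (1.0 + z), y / (1.0 + z)])

def stereo_south_inv(q):
    u, v = q[:, 0], q[:, 1]
    s = u**2 + v**2
    return np.column_stack([2*u, 2*v, 1.0 - s]) / (s + 1.0)[:, None]

# Round-trip check on random points of the unit sphere
p = np.random.randn(5, 3)
p /= np.linalg.norm(p, axis=1, keepdims=True)
print(np.allclose(stereo_north_inv(stereo_north(p)), p))
print(np.allclose(stereo_south_inv(stereo_south(p)), p))
```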
In 2016, Meng et al. proposed a framework called TEMPO for computing Teich-
müller extremal mappings of point clouds with disk topology. In particular, they
developed methods for computing the rectangular conformal parameterizations and
landmark-matching Teichmüller parameterizations of disk-type point clouds (see
Fig. 21 for an illustration).
For the rectangular conformal parameterization, the method starts by computing
a harmonic map $\phi_0: P \to \mathbb{D}$ of the input disk-type point cloud $P$ onto the unit disk
by solving the Laplace equation:

$$\Delta \phi_0 = 0. \qquad (47)$$
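A rough sketch of this harmonic-map step is given below, assuming a sparse Laplacian matrix L (e.g. a point cloud Laplacian as discussed in this section) and an ordered list of boundary indices are available; the arc-length boundary condition and all names are our own illustrative choices, not the actual TEMPO implementation.

```python
import numpy as np
import scipy.sparse.linalg as spla

def harmonic_map_to_disk(L, boundary, points):
    """Solve L phi = 0 (Eq. (47)) with Dirichlet data that places the ordered
    boundary indices on the unit circle by arc length.
    L: (n x n) sparse Laplacian, boundary: ordered index array,
    points: (n x d) coordinates. Returns an (n x 2) planar embedding."""
    n = L.shape[0]
    B = points[boundary]
    seg = np.linalg.norm(np.diff(np.vstack([B, B[:1]]), axis=0), axis=1)
    theta = 2.0 * np.pi * np.concatenate([[0.0], np.cumsum(seg[:-1])]) / seg.sum()
    bc = np.column_stack([np.cos(theta), np.sin(theta)])
    interior = np.setdiff1d(np.arange(n), boundary)
    L = L.tocsr()
    A = L[interior][:, interior].tocsc()
    rhs = -(L[interior][:, boundary] @ bc)
    lu = spla.splu(A)
    phi = np.zeros((n, 2))
    phi[boundary] = bc
    phi[interior] = np.column_stack([lu.solve(rhs[:, 0]), lu.solve(rhs[:, 1])])
    return phi
```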
By the composition formula (12), the composition map φ2 ◦ φ0 with the optimal
h gives a rectangular conformal parameterization of the input point cloud. After
getting the rectangular conformal parameterization, the landmark-matching Teich-
müller parameterization can be obtained by extending the QC iteration method (Lui
et al. 2014) for point clouds. Using the TEMPO framework, it is possible to compute
landmark-matching registrations of point cloud surfaces. Figure 22 shows an
example of registering two facial point clouds with prescribed landmark constraints.
One important component in the above framework is the approximation of
the Beltrami coefficient μ on point clouds. In 2018, Meng and Lui presented a
rigorous treatment of the approximation of quasi-conformal maps and the relevant
concepts on point clouds. In particular, they proposed a geometric quantity called the
point cloud Beltrami coefficient (PCBC) and proved that it can effectively capture
the local geometric distortion of a point cloud mapping. Using the PCBC, they
developed the point cloud quasi-conformal (PCQC) parameterization method for
the parameterization of point clouds with any prescribed PCBC (see Fig. 23 for an
example).
More recently, Liu et al. developed a free-boundary conformal parameterization
method for disk-type point clouds (Choi et al. 2022) by extending the mesh-based
DNCP algorithm in Desbrun et al. (2002). The method approximates the Laplacian
operator on disk-type point clouds using a modified local mesh method with some
special treatments at the point cloud boundary. More specifically, let P be the
given point cloud with n vertices. For each vertex vi , the method considers its
k-nearest neighbors and computes the local Delaunay triangulation to obtain a one-
ring neighborhood Ri . The angles in Ri are then used for constructing an n × n
matrix $L^{pc}_{k,i}$:

$$L^{pc}_{k,i}(i,j) = L^{pc}_{k,i}(j,i) = -\tfrac{1}{2}\big(\cot \alpha_{ij} + \cot \beta_{ij}\big) \ \text{ if } v_j \in R_i, \qquad
L^{pc}_{k,i}(i,i) = \tfrac{1}{2}\sum_{j:\, v_j \in R_i} \big(\cot \alpha_{ij} + \cot \beta_{ij}\big), \qquad (49)$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the angles opposite to the edge $[v_i, v_j]$ in the local
triangulation, and all other entries of $L^{pc}_{k,i}$ are set to be 0. Noticing that the above
approximation may be inaccurate at the boundary vertices in case the point cloud
boundary shape is nonconvex, the method further checks if every boundary angle
$\theta$ in the local triangulation for boundary vertices satisfies the angle criterion
$c_1 < \theta < c_2$, where $(c_1, c_2)$ is a prescribed angle range. It then removes all triangles that
violate this angle criterion and obtains the matrices $L^{pc}_{k,i}$ for the boundary vertices.

Fig. 22 An illustration of the TEMPO framework. (Image adapted from Meng et al. 2016). Left
column: The source human facial point cloud and the rectangular conformal parameterization.
Middle column: The target human facial point cloud and the rectangular conformal parameteriza-
tion. Right column: The registration result and the corresponding landmark-matching Teichmüller
mapping of the rectangular domain

The Laplacian operator $L^{pc}_k$ for the entire point cloud can then be approximated by
$L^{pc}_k = \frac{1}{3} \sum_{i=1}^{n} L^{pc}_{k,i}$. Finally, the point cloud parameterization $f = (f_x, f_y)$ can be
obtained by solving the following linear system:

$$\left( \begin{pmatrix} L^{pc}_k & 0 \\ 0 & L^{pc}_k \end{pmatrix} - \begin{pmatrix} 0 & M_1 \\ M_2 & 0 \end{pmatrix} \right) \begin{pmatrix} f_x \\ f_y \end{pmatrix} = 0, \qquad (50)$$
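A rough Python sketch of this kind of point-cloud Laplacian construction is given below, assuming the point cloud samples a surface patch reasonably well; the PCA-based tangent-plane projection and all function names are our own simplifications, and the boundary angle criterion and the matrices $M_1, M_2$ of Eq. (50) are omitted.

```python
import numpy as np
from scipy.spatial import cKDTree, Delaunay
from scipy.sparse import lil_matrix

def pointcloud_laplacian(P, k=12):
    """Approximate Laplacian of a point cloud P (n x 3) by averaging local
    cotangent matrices built from Delaunay triangulations of the k nearest
    neighbours projected onto a local PCA tangent plane (cf. Eq. (49))."""
    n = len(P)
    tree = cKDTree(P)
    L = lil_matrix((n, n))
    for i in range(n):
        _, idx = tree.query(P[i], k=k)          # neighbourhood of v_i (includes i)
        local = P[idx] - P[idx].mean(axis=0)
        _, _, Vt = np.linalg.svd(local, full_matrices=False)
        uv = local @ Vt[:2].T                   # project onto local tangent plane
        tri = Delaunay(uv)
        pos_i = int(np.where(idx == i)[0][0])
        for t in tri.simplices:
            if pos_i not in t:
                continue                         # keep only the one-ring R_i of v_i
            a, b, c = idx[t]
            for (p, q, r) in [(a, b, c), (b, c, a), (c, a, b)]:
                u1, u2 = P[q] - P[p], P[r] - P[p]
                cot = np.dot(u1, u2) / (np.linalg.norm(np.cross(u1, u2)) + 1e-12)
                # the angle at p is opposite the edge [q, r]
                L[q, r] -= 0.5 * cot
                L[r, q] -= 0.5 * cot
                L[q, q] += 0.5 * cot
                L[r, r] += 0.5 * cot
    return L.tocsr() / 3.0                       # the 1/3 averaging of the local matrices
```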
Applications
Fig. 25 Registering brain cortical surfaces using the FLASH method. (Image adapted from Choi
et al. 2015). (a) The source brain with sulcal landmarks. (b) The target brain with sulcal landmarks.
(c) The registration obtained using conformal parameterization without landmark constraints. (d)
The registration obtained using landmark-constrained optimized conformal parameterization. It
can be observed that the landmark-constrained parameterization gives a more accurate registration
result
conformal parameterization method developed in Choi et al. (2015) has been applied
to optical mapping for cardiac electrophysiology (Christoph et al. 2017) and cardiac
radiofrequency catheter ablation (Zhou et al. 2016). In 2018, Choi and Mahadevan
utilized Teichmüller mappings for insect wing morphometry (see Fig. 26). In Choi
et al. (2020c,d), Choi et al. utilized conformal parameterizations and Teichmüller
mappings for analyzing human and other mammalian tooth shape (see Fig. 27).
The mapping methods have also been applied to different engineering problems.
For instance, the spherical conformal parameterization method in Choi et al. (2015)
has been applied to collaborative robotics (Popov and Klimchik 2019). The disk
conformal parameterization method in Choi and Lui (2015) has been applied to
structural optimization (Kussmaul et al. 2019) and robot navigation (Notomista and
Saveriano 2021). The rectangular parameterization method in Meng et al. (2016)
has been applied to T-spline surface reconstruction (Wang 2021) and nanotech-
nology (Guralnik 2021). In 2017, Choi et al. developed a method for subdivision
connectivity surface remeshing via Teichmüller mappings. In 2018, Yung et al.
developed an efficient image registration method using coarse triangulations and
landmark-matching quasi-conformal mappings. In 2019 and 2021, Choi et al. utilized
conformal and quasi-conformal mapping methods (Meng et al. 2016; Choi and Lui
2018) in developing constrained optimization frameworks for kirigami metamaterial
design. In 2021, Shaqfa et al. extended the disk conformal parameterization
method (Choi and Lui 2015) for spherical cap parameterization and utilized it for
analyzing stone microstructures. Recently, Jarvis et al. (2021) developed a method
for reconstructing 3D asteroid and comet shapes from sparse feature point sets via
spherical parameterizations based on the method in Choi et al. (2016).
Conclusion
Acknowledgments This work was supported in part by the National Science Foundation under
Grant No. DMS-2002103 (to Gary P. T. Choi) and HKRGC GRF under project ID 2130549 (to
Lok Ming Lui).
References
Ahlfors, L.V.: Lectures on Quasiconformal Mappings, vol. 38. American Mathematical Society,
Providence (2006)
Belkin, M., Sun, J., Wang, Y.: Constructing Laplace operator from point clouds in Rd . In: Proceed-
ings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1031–1040
(2009)
Ben-Chen, M., Gotsman, C., Bunin, G.: Conformal flattening by curvature prescription and metric
scaling. Comput. Graph. Forum 27(2), 449–458 (2008)
Chan, H.L., Li, H., Lui, L.M.: Quasi-conformal statistical shape analysis of hippocampal surfaces
for Alzheimer’s disease analysis. Neurocomputing 175, 177–187 (2016)
Chan, H.L., Yam, T.C., Lui, L.M.: Automatic characteristic-calibrated registration (ACC-REG):
hippocampal surface registration using eigen-graphs. Pattern Recogn. 103, 107142 (2020)
Chien, E., Levi, Z., Weber, O.: Bounded distortion parametrization in the space of metrics. ACM
Trans. Graph. 35(6), 1–16 (2016)
Choi, P.T.: Surface conformal/quasi-conformal parameterization with applications. In: CUHK
Electronic Theses and Dissertations Collection, The Chinese University of Hong Kong (2016)
Choi, P.T., Lui, L.M.: Fast disk conformal parameterization of simply-connected open surfaces. J.
Sci. Comput. 65(3), 1065–1090 (2015)
Choi, G.P.-T., Lui, L.M.: A linear formulation for disk conformal parameterization of simply-
connected open surfaces. Adv. Comput. Math. 44(1), 87–114 (2018)
Choi, G.P.T., Mahadevan, L.: Planar morphometrics using Teichmüller maps. Proc. R. Soc. A
474(2217), 20170905 (2018)
Choi, G.P.T., Rycroft, C.H.: Density-equalizing maps for simply connected open surfaces. SIAM
J. Imaging Sci. 11(2), 1134–1178 (2018)
Choi, G.P.T., Rycroft, C.H.: Volumetric density-equalizing reference map with applications. J. Sci.
Comput. 86(3), 41 (2021)
Choi, P.T., Lam, K.C., Lui, L.M.: FLASH: fast landmark aligned spherical harmonic parameteri-
zation for genus-0 closed brain surfaces. SIAM J. Imaging Sci. 8(1), 67–94 (2015)
Choi, G.P.-T., Ho, K.T., Lui, L.M.: Spherical conformal parameterization of genus-0 point clouds
for meshing. SIAM J. Imaging Sci. 9(4), 1582–1618 (2016)
Choi, G.P.-T., Man, M.H.-Y., Lui, L.M.: Fast spherical quasiconformal parameterization of genus-
0 closed surfaces with application to adaptive remeshing. Geom. Imaging Comput. 3(1–2), 1–29
(2016)
Choi, G.P.T., Chen, Y., Lui, L.M., Chiu, B.: Conformal mapping of carotid vessel wall and plaque
thickness measured from 3D ultrasound images. Med. Biol. Eng. Comput. 55(12), 2183–2195
(2017)
Choi, C.P., Gu, X., Lui, L.M.: Subdivision connectivity remeshing via Teichmüller extremal map.
Inverse Probl. Imaging 11(5), 825–855 (2017)
Choi, G.P.T., Dudte, L.H., Mahadevan, L.: Programming shape using kirigami tessellations. Nat.
Mater. 18(9), 999–1004 (2019)
Choi, G.P.T., Leung-Liu, Y., Gu, X., Lui, L.M.: Parallelizable global conformal parameterization of
simply-connected surfaces via partial welding. SIAM J. Imaging Sci. 13(3), 1049–1083 (2020a)
Choi, G.P.T., Chiu, B., Rycroft, C.H.: Area-preserving mapping of 3D carotid ultrasound images
using density-equalizing reference map. IEEE Trans. Biomed. Eng. 67(9), 1507–1517 (2020b)
Choi, G.P.T., Qiu, D., Lui, L.M.: Shape analysis via inconsistent surface registration. Proc. R. Soc.
A 476(2242), 20200147 (2020c)
Choi, G.P.T., Chan, H.L., Yong, R., Ranjitkar, S., Brook, A., Townsend, G., Chen, K., Lui, L.M.:
Tooth morphometry using quasi-conformal theory. Pattern Recogn. 99, 107064 (2020d)
Choi, G.P.T.: Efficient conformal parameterization of multiply-connected surfaces using quasi-
conformal theory. J. Sci. Comput. 87(3), 70 (2021)
Choi, G.P.T., Giri, A., Kumar, L.: Adaptive area-preserving parameterization of open and closed
anatomical surfaces. Comput. Biol. Med., 148, 105715 (2022)
Choi, G.P.T., Dudte, L.H., Mahadevan, L.: Compact reconfigurable kirigami. Phys. Rev. Res. 3(4),
043030 (2021)
Choi, G.P.T., Liu, Y., Lui, L.M.: Free-boundary conformal parameterization of point clouds. J. Sci.
Comput. 90(1), 14 (2022)
Christoph, J., Schröder-Schetelig, J., Luther, S.: Electromechanical optical mapping. Prog. Bio-
phys. Mol. Biol. 130, 150–169 (2017)
Claici, S., Bessmeltsev, M., Schaefer, S., Solomon, J.: Isometry-aware preconditioning for mesh
parameterization. Comput. Graph. Forum 36(5), 37–47 (2017)
Desbrun, M., Meyer, M., Alliez, P.: Intrinsic parameterizations of surface meshes. Comput. Graph.
Forum 21(3), 209–218 (2002)
Floater, M.S., Hormann, K.: Surface parameterization: a tutorial and survey. In: Advances in
Multiresolution for Geometric Modelling, pp. 157–186. Springer, Berlin/New York (2005)
Fu, X.-M., Liu, Y., Guo, B.: Computing locally injective mappings by advanced MIPS. ACM Trans.
Graph. 34(4), 1–12 (2015)
Gardiner, F.P., Lakic, N.: Quasiconformal Teichmüller Theory, vol. 76. American Mathematical
Society, Providence (2000)
Giri, A., Choi, G.P.T., Kumar, L.: Open and closed anatomical surface description via hemispheri-
cal area-preserving map. Sig. Process. 180, 107867 (2021)
Gu, X.D., Yau, S.-T.: Computational Conformal Geometry, vol. 1. International Press, Somerville
(2008)
Gu, X., Wang, Y., Chan, T.F., Thompson, P.M., Yau, S.-T.: Genus zero surface conformal mapping
and its application to brain surface mapping. IEEE Trans. Med. Imaging 23(8), 949–958 (2004)
Gu, X., Luo, F., Yau, S.T.: Computational conformal geometry behind modern technologies. Not.
Am. Math. Soc. 67(10), 1509–1525 (2020)
Guralnik, B., Hansen, O., Henrichsen, H.H., Caridad, J.M., Wei, W., Hansen, M.F., Nielsen, P.F.,
Petersen, D.H.: Effective electrical resistivity in a square array of oriented square inclusions.
Nanotechnology 32(18), 185706 (2021)
Haker, S., Angenent, S., Tannenbaum, A., Kikinis, R., Sapiro, G., Halle, M.: Conformal surface
parameterization for texture mapping. IEEE Trans. Vis. Comput. Graph. 6(2), 181–189 (2000)
Ho, K.T., Lui, L.M.: QCMC: quasi-conformal parameterizations for multiply-connected domains.
Adv. Comput. Math. 42(2), 279–312 (2016)
Hormann, K., Lévy, B., Sheffer, A.: Mesh parameterization: theory and practice. In: ACM
SIGGRAPH 2007 Courses (2007)
Jarvis, B., Choi, G.P.T., Hockman, B., Morrell, B., Bandopadhyay, S., Lubey, D., Villa, J.,
Bhaskaran, S., Bayard, D., Nesnas, I.A.: 3D shape reconstruction of small bodies from sparse
features. IEEE Robot. Autom. Lett. 6(4), 7089–7096 (2021)
Jin, M., Kim, J., Luo, F., Gu, X.: Discrete surface Ricci flow. IEEE Trans. Vis. Comput. Graph.
14(5), 1030–1043 (2008)
Wang, J., Leach, R., Chen, R., Xu, J., Jiang, X.J.: Distortion-free intelligent sampling of sparse
surfaces via locally refined T-spline metamodelling. Int. J. Precis. Eng. Manuf. – Green Technol.
8(5), 1471–1486 (2021)
Kussmaul, R., Jónasson, J.G., Zogg, M., Ermanni, P.: A novel computational framework for
structural optimization with patched laminates. Struct. Multidiscipl. Optim. 60(5), 2073–2091
(2019)
Lai, R., Liang, J., Zhao, H.-K.: A local mesh method for solving PDEs on point clouds. Inverse
Probl. Imaging 7(3), 737–755 (2013)
Lai, R., Wen, Z., Yin, W., Gu, X., Lui, L.M.: Folding-free global conformal mapping for genus-0
surfaces by harmonic energy minimization. J. Sci. Comput. 58(3), 705–725 (2014)
Lam, K.C., Lui, L.M.: Landmark-and intensity-based registration with large deformations via
quasi-conformal maps. SIAM J. Imaging Sci. 7(4), 2364–2392 (2014)
Lam, K.C., Gu, X., Lui, L.M.: Landmark constrained genus-one surface Teichmüller map applied
to surface registration in medical imaging. Med. Image Anal. 25(1), 45–55 (2015)
Lee, Y.T., Lam, K.C., Lui, L.M.: Landmark-matching transformation with large deformation via
n-dimensional quasi-conformal maps. J. Sci. Comput. 67(3), 926–954 (2016)
Lehto, O.: Quasiconformal Mappings in the Plane, vol. 126. Springer, Berlin/Heidelberg (1973)
Lei, N., Gu, X.: FFT-OT: a fast algorithm for optimal transportation. In: Proceedings of the
IEEE/CVF International Conference on Computer Vision, pp. 6280–6289 (2021)
Lévy, B., Petitjean, S., Ray, N., Maillot, J.: Least squares conformal maps for automatic texture
atlas generation. ACM Trans. Graph. 21(3), 362–371 (2002)
Liang, J., Zhao, H.: Solving partial differential equations on point clouds. SIAM J. Sci. Comput.
35(3), A1461–A1486 (2013)
Liang, J., Lai, R., Wong, T.W., Zhao, H.: Geometric understanding of point clouds using Laplace-
Beltrami operator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 214–221 (2012)
Li, S., Zeng, W., Zhou, D., Gu, X., Gao, J.: Compact conformal map for greedy routing in wireless
mobile sensor networks. IEEE Trans. Mobile Comput. 15(7), 1632–1646 (2015)
Lipman, Y.: Bounded distortion mapping spaces for triangular meshes. ACM Trans. Graph. 31(4),
1–13 (2012)
Lipman, Y., Kim, V.G., Funkhouser, T.A.: Simple formulas for quasiconformal plane deformations.
ACM Trans. Graph. 31(5), 1–13 (2012)
Liu, L., Ye, C., Ni, R., Fu, X.-M.: Progressive parameterizations. ACM Trans. Graph. 37(4), 1–12
(2018)
Lui, L.M., Wen, C.: Geometric registration of high-genus surfaces. SIAM J. Imaging Sci. 7(1),
337–365 (2014)
Lui, L.M., Wong, T.W., Thompson, P., Chan, T., Gu, X., Yau, S.-T.: Shape-based diffeomorphic
registration on hippocampal surfaces using Beltrami holomorphic flow. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 323–330.
Springer, (2010)
Lui, L.M., Wong, T.W., Zeng, W., Gu, X., Thompson, P.M., Chan, T.F., Yau, S.-T.: Optimization of
surface registrations using Beltrami holomorphic flow. J. Sci. Comput. 50(3), 557–585 (2012)
Lui, L.M., Lam, K.C., Wong, T.W., Gu, X.: Texture map and video compression using Beltrami
representation. SIAM J. Imaging Sci. 6(4), 1880–1902 (2013)
Lui, L.M., Lam, K.C., Yau, S.-T., Gu, X.: Teichmüller mapping (T-map) and its applications to
landmark matching registration. SIAM J. Imaging Sci. 7(1), 391–426 (2014)
Lui, L.M., Gu, X., Yau, S.-T.: Convergence of an iterative algorithm for Teichmüller maps via
harmonic energy optimization. Math. Comput. 84(296), 2823–2842 (2015)
Meng, T., Lui, L.M.: PCBC: quasiconformality of point cloud mappings. J. Sci. Comput. 77(1),
597–633 (2018)
Meng, Q., Li, B., Holstein, H., Liu, Y.: Parameterization of point-cloud freeform surfaces using
adaptive sequential learning RBF networks. Pattern Recogn. 46(8), 2361–2375 (2013)
Meng, T.W., Choi, G.P.-T., Lui, L.M.: TEMPO: feature-endowed Teichmüller extremal mappings
of point clouds. SIAM J. Imaging Sci. 9(4), 1922–1962 (2016)
Nadeem, S., Su, Z., Zeng, W., Kaufman, A., Gu, X.: Spherical parameterization balancing angle
and area distortions. IEEE Trans. Vis. Comput. Graph. 23(6), 1663–1676 (2016)
Ng, T.C., Gu, X., Lui, L.M.: Computing extremal Teichmüller map of multiply-connected domains
via Beltrami holomorphic flow. J. Sci. Comput. 60(2), 249–275 (2014)
Notomista, G., Saveriano, M.: Safety of dynamical systems with multiple non-convex unsafe sets
using control barrier functions. IEEE Control Syst. Lett. 6, 1136–1141 (2021)
Popov, D., Klimchik, A.: Real-time external contact force estimation and localization for collab-
orative robot. In: 2019 IEEE International Conference on Mechatronics, vol. 1, pp. 646–651.
IEEE (2019)
Pumarola, A., Sanchez-Riera, J., Choi, G.P.T., Sanfeliu, A., Moreno-Noguer, F.: 3DPeople: mod-
eling the geometry of dressed humans. In: Proceedings of the IEEE International Conference
on Computer Vision, pp. 2242–2251 (2019)
Qiu, D., Lam, K.-C., Lui, L.-M.: Computing quasi-conformal folds. SIAM J. Imaging Sci. 12(3),
1392–1424 (2019)
Rabinovich, M., Poranne, R., Panozzo, D., Sorkine-Hornung, O.: Scalable locally injective
mappings. ACM Trans. Graph. 36(4), 1 (2017)
Sawhney, R., Crane, K.: Boundary first flattening. ACM Trans. Graph. 37(1), 1–14 (2017)
Shaqfa, M., Choi, G.P.T., Beyer, K.: Spherical cap harmonic analysis (SCHA) for characterising
the morphology of rough surface patches. Powder Technol. 393, 837–856 (2021)
Sharp, N., Crane, K.: A Laplacian for nonmanifold triangle meshes. Comput. Graph. Forum 39(5),
69–80 (2020)
Sheffer, A., Praun, E., Rose, K.: Mesh parameterization methods and their applications. Found.
Trends® Comput. Graph. Vis. 2(2), 105–171 (2006)
Smith, J., Schaefer, S.: Bijective parameterization with free boundaries. ACM Trans. Graph. 34(4),
1–9 (2015)
Su, K., Cui, L., Qian, K., Lei, N., Zhang, J., Zhang, M., Gu, X.D.: Area-preserving mesh
parameterization for poly-annulus surfaces based on optimal mass transportation. Comput.
Aided Geom. Des. 46, 76–91 (2016)
Su, J.-P., Ye, C., Liu, L., Fu, X.-M.: Efficient bijective parameterizations. ACM Trans. Graph.
39(4), 111–1 (2020)
Ta, D., Tu, Y., Lu, Z.-L., Wang, Y.: Quantitative characterization of the human retinotopic map
based on quasiconformal mapping. Med. Image Anal. 75, 102230 (2021)
Tewari, G., Gotsman, C., Gortler, S.J.: Meshing genus-1 point clouds using discrete one-forms.
Comput. Graph. 30(6), 917–926 (2006)
Tu, Y., Ta, D., Gu, X.D., Lu, Z.-L., Wang, Y.: Diffeomorphic registration for retinotopic mapping
via quasiconformal mapping. In: 2020 IEEE 17th International Symposium on Biomedical
Imaging, pp. 687–691. IEEE (2020)
Vogiatzis, P., Ma, M., Chen, S., Gu, X.D.: Computational design and additive manufacturing
of periodic conformal metasurfaces by synthesizing topology optimization with conformal
mapping. Comput. Methods Appl. Mech. Eng. 328, 477–497 (2018)
Weber, O., Myles, A., Zorin, D.: Computing extremal quasiconformal maps. Comput. Graph.
Forum 31(5), 1679–1689 (2012)
Wen, C., Wang, D., Shi, L., Chu, W.C.W., Cheng, J.C.Y., Lui, L.M.: Landmark constrained
registration of high-genus surfaces applied to vestibular system morphometry. Comput. Med.
Imaging Graph. 44, 1–12 (2015)
Wong, T.W., Zhao, H.-K.: Computation of quasi-conformal surface maps using discrete Beltrami
flow. SIAM J. Imaging Sci. 7(4), 2675–2699 (2014)
Wong, T.W., Zhao, H.-K.: Computing surface uniformization using discrete Beltrami flow. SIAM
J. Sci. Comput. 37(3), A1342–A1364 (2015)
Yang, Y.-J., Zeng, W.: Quasiconformal rectilinear map. Graph. Models 107, 101057 (2020)
Yang, Y.-L., Guo, R., Luo, F., Hu, S.-M., Gu, X.: Generalized discrete Ricci flow. Comput. Graph.
Forum 28(7), 2005–2014 (2009)
Yin, X., Dai, J., Yau, S.-T., Gu, X.: Slit map: conformal parameterization for multiply connected
surfaces. In: International Conference on Geometric Modeling and Processing, pp. 410–422.
Springer (2008)
Yueh, M.-H., Lin, W.-W., Wu, C.-T., Yau, S.-T.: An efficient energy minimization for conformal
parameterizations. J. Sci. Comput. 73(1), 203–227 (2017)
Yueh, M.-H., Lin, W.-W., Wu, C.-T., Yau, S.-T.: A novel stretch energy minimization algorithm for
equiareal parameterizations. J. Sci. Comput. 78(3), 1353–1386 (2019)
Yueh, M.-H., Li, T., Lin, W.-W., Yau, S.-T.: A novel algorithm for volume-preserving parameteri-
zations of 3-manifolds. SIAM J. Imaging Sci. 12(2), 1071–1098 (2019)
Yueh, M.-H., Huang, H.-H., Li, T., Lin, W.-W., Yau, S.-T.: Optimized surface parameterizations
with applications to chinese virtual broadcasting. Electron. Trans. Numer. Anal. 53, 383–405
(2020)
Yung, C.P., Choi, G.P.T., Chen, K., Lui, L.M.: Efficient feature-based image registration by
mapping sparsified surfaces. J. Vis. Commun. Image Represent. 55, 561–571 (2018)
Zeng, W., Gu, X.D.: Registration for 3D surfaces with large deformations using quasi-conformal
curvature flow. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern
Recognition, pp. 2457–2464. IEEE (2011)
Zeng, W., Yang, Y.-J.: Colon flattening by landmark-driven optimal quasiconformal mapping. In:
International Conference on Medical Image Computing and Computer-Assisted Intervention,
pp. 244–251. Springer (2014)
Zeng, W., Yin, X., Zhang, M., Luo, F., Gu, X.: Generalized Koebe’s method for conformal mapping
multiply connected domains. In: 2009 SIAM/ACM Joint Conference on Geometric and Physical
Modeling, pp. 89–100 (2009)
Zeng, W., Marino, J., Gurijala, K.C., Gu, X., Kaufman, A.: Supine and prone colon registration
using quasi-conformal mapping. IEEE Trans. Vis. Comput. Graph. 16(6), 1348–1357 (2010)
Zeng, W., Lui, L.M., Luo, F., Chan, T.F.-C., Yau, S.-T., Gu, D.X.: Computing quasiconformal maps
using an auxiliary metric and discrete curvature flow. Numer. Math. 121(4), 671–703 (2012)
Zhang, L., Liu, L., Gotsman, C., Huang, H.: Mesh reconstruction by meshless denoising and
parameterization. Comput. Graph. 34(3), 198–208 (2010)
Zhang, M., Guo, R., Zeng, W., Luo, F., Yau, S.-T., Gu, X.: The unified discrete surface Ricci flow.
Graph. Models 76(5), 321–339 (2014)
Zhang, M., Zeng, W., Guo, R., Luo, F., Gu, X.D.: Survey on discrete surface Ricci flow. J. Comput.
Sci. Technol. 30(3), 598–613 (2015)
Zhang, D., Choi, G.P.T., Zhang, J., Lui, L.M.: A unifying framework for n-dimensional quasi-
conformal mappings. SIAM J. Imaging Sci. 15(2), 960–988 (2022)
Zhao, X., Su, Z., Gu, X.D., Kaufman, A., Sun, J., Gao, J., Luo, F.: Area-preservation mapping
using optimal mass transport. IEEE Trans. Vis. Comput. Graph. 19(12), 2838–2847 (2013)
Zhao, J., Qi, X., Wen, C., Lei, N., Gu, X.: Automatic and robust skull registration based on discrete
uniformization. In: Proceedings of the IEEE International Conference on Computer Vision,
pp. 431–440 (2019)
Zhou, X.-Y., Ernst, S., Lee, S.-L.: Path planning for robot-enhanced cardiac radiofrequency
catheter ablation. In: 2016 IEEE International Conference on Robotics and Automation,
pp. 4172–4177. IEEE (2016)
Zou, G., Hu, J., Gu, X., Hua, J.: Authalic parameterization of general surfaces using Lie advection.
IEEE Trans. Vis. Comput. Graph. 17(12), 2005–2014 (2011)
Zwicker, M., Gotsman, C.: Meshing point clouds using spherical parameterization. In: Proceedings
of the Eurographics Symposium on Point-Based Graphics, pp. 173–180 (2004)
44 Recent Geometric Flows in Multi-orientation Image Processing via a Cartan Connection
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1527
Scores on Lie Groups G = R^d ⋊ T and the Motivation for Left-Invariant
Processing and a Left-Invariant Connection on T (G) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1530
Motivation: Choosing a Cartan Connection for Geometric (PDE-Based)
Image Processing via Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1533
Structure and Contributions of the Article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1534
A Parameterized Class of Cartan Connections and Their Duals . . . . . . . . . . . . . . . . . . . . . . 1536
Expressing the Lie-Cartan Connection (and Its Dual) in Left-Invariant Coordinates . . . . 1541
(Partial) Lie-Cartan Connections for (Sub)-Riemannian Geometry . . . . . . . . . . . . . . . . . . 1543
The Special Case of Interest ν = 1 and Hamiltonian Flows for the
Riemannian Geodesic Problem on G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1544
The Homogeneous Space Md of Positions and Orientations . . . . . . . . . . . . . . . . . . . . . . . . . 1549
The Metric Models on Md : Shortest Curves and Spheres . . . . . . . . . . . . . . . . . . . . . . . . . . . 1550
Straight Curve Fits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1554
Exponential Curve Fits of the Second Order Are Found by SVD of the
Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1557
Overview of Image Analysis Applications for G = SE(d) . . . . . . . . . . . . . . . . . . . . . . . . . . 1559
Shortest Curve Application: Tracking of Blood Vessels . . . . . . . . . . . . . . . . . . . . . . . . . . 1560
Straight Curve Application: Biomarkers for Diabetes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1563
Abstract
Keywords
Introduction
Fig. 1 Top: Lifted paths γ (t) = (x(t), y(t), θ(t)) in R2 × S 1 (left) where the tangent γ̇ (t) is
restricted to the span of (cos θ(t), sin θ(t), 0) and (0, 0, 1), of which the green plane on the right is
an example. Bottom: Lifted image data depicted by an orange volume rendering. The meaning of
shortest path between points in an image is determined by a combination of a cost computed from
the lifted data, the restriction above and a curvature penalization. The path optimization problem
is formulated on the position-orientation domain such as in the image on the right. The cost for
moving through the orange parts is lower than elsewhere
wavelets, including cake wavelets (where inversion requires integration over angles
only) (Duits et al. 2007; Bekkers 2017), or nonlinearly via orientation channel
representations (Forssen 2004; Felsberg et al. 2006). In the differential geometry
article, we constrain ourselves to invertible orientation scores (Duits and Franken
2010a) constructed by cake wavelets following standard settings as explained in
Bekkers (2017).
In multi-orientation processing on orientation scores (Janssen et al. 2018; Duits
et al. 2007, 2019; Zhang et al. 2016) (or on other orientation lifts Felsberg 2012; Citti
and Sarti 2006; Duits and Franken 2011; Citti et al. 2016; Momayyez-Siahkal and
Siddiqi 2009), differential geometry plays a fundamental role in PDE- and ODE-
based techniques for pattern recognition, cortical modelling and image analysis.
Image processing applications are then provided with fundamental differential
geometrical tools such as Cartan connections (Piuze et al. 2015; Duits et al. 2016)
that ‘literally connect’ all tangent spaces in the tangent bundle T (Md ) above the
space Md of positions and orientations. Such a connection underlies flows Duits
and Franken (2011), segmentations (Zhang et al. 2016), detection (Bekkers et al.
2015) and tracking (Duits et al. 2018) on Md . In all of these PDE-based processing
techniques on Md , one has the major benefit (over related algorithms acting directly
in the image domain Rd ) that the processing generically deals with complex
structures (such as crossings, bifurcations, etc.). In this article, we will highlight
some applications in the experimental section, to illustrate how our preferred Cartan
connection enters image analysis applications.
Here the key idea is that elongated structures that are involved in crossings are
manifestly disentangled in orientation lifts of image data; see Fig. 2. This allows for
crossing-preserving enhancements and tracking via such orientation lifts as shown
in Fig. 3.
Furthermore, in the space of positions and orientations it is possible to check
for alignment of local orientations in the image data. Filtering well-aligned local
features in multi-orientation distributions (e.g. orientation scores) of image data is
sometimes called ‘contextual image processing’ (Prčkovska et al. 2015; Bekkers
2017; Franken 2008). It relates to cortical models for line perception in human
vision (Petitot 2003; Bosking et al. 1997; Citti and Sarti 2006) and is highly
beneficial for data enhancement and denoising in image analysis applications; see,
for example, (Duits et al. 2019; Chambolle and Pock 2018; Citti et al. 2016; Duits
and Franken 2011; Momayyez-Siahkal and Siddiqi 2009; Franken and Duits 2009;
Portegies et al. 2015), prior to geometric tracking (Meesters et al. 2017; Portegies
et al. 2015; Chen and Cohen 2018; Duits et al. 2018) in the homogeneous space of
positions and orientations.
The homogeneous space of positions and orientations is formally defined as a Lie group quotient:

$$\mathbb{M}_d := \mathbb{R}^d \rtimes S^{d-1} := SE(d)\,/\,(\{0\} \times SO(d-1)). \qquad (1)$$
Fig. 2 Current tracking algorithms on images often fail (left); therefore, we first extend the image
domain to the space of positions and orientations (where no such crossings occur) and then apply
geodesic tracking (right), enhancement and learning to automatically deal with complex structures
Fig. 3 Top: instead of direct processing of an image, we process via an invertible orientation score,
obtained by convolving the image with a set of rotated wavelets (Duits et al. 2007; Bekkers 2017;
Janssen et al. 2014). Second row: vessel tracking in a 2D image via orientation scores (Bekkers
2017; Bekkers et al. 2015; Duits et al. 2018). Third row: crossing-preserving diffusion via the
orientation score of a 3D image (Janssen et al. 2014; Duits et al. 2016). For automation, one
can integrate geometric deep learning via PDE-based G-CNNs (Smets et al. 2020) and G-CNNs
(Bekkers et al. 2018; Cohen and Welling 2016). Here we will not elaborate on such machine
learning techniques but rather focus on the underlying PDEs and Cartan connection
In the experimental section of this work, we mainly focus on the case d = 2, but we also highlight
related works and applications where the case d = 3 is tackled.
In case $d = 2$, the subgroup $H$ is trivial, and the Lie group $SE(2)$ of rotations and translations in the
plane is isomorphic to the three-dimensional homogeneous space $\mathbb{M}_2$ of positions and
orientations. In case $d = 3$, the subgroup $H \equiv SO(2)$, and therefore the homogeneous space $\mathbb{M}_3$
of positions and orientations is five dimensional. In that case, a multi-orientation
distribution $U : \mathbb{M}_3 \to \mathbb{C}$ can be visualized by a field of angular profiles on a grid:

$$\Big\{ \mathbf{x} + \tfrac{|U(\mathbf{x}, \mathbf{n})|}{2\, \|U\|_{L^\infty(\mathbb{M}_3)}}\; \mathbf{n} \;\Big|\; \mathbf{n} \in S^2,\ \mathbf{x} \in \mathbb{Z}^3 \Big\}$$

with colour-coded orientations.
In this article, we shall not be concerned with applications of the other Lie group
cases mentioned in the remark above, but in order to keep generality of our
theoretical results, we will initially study Cartan connections on Lie groups G in
general, so that our results also apply to the general Lie group setting.
Furthermore, we deliberately avoid technical issues (Duits et al. 2013, 2019;
Smets et al. 2019, 2020) that come along with taking Lie group quotients like in (1).
The differential geometrical results in this article are easier to grasp if one just
considers the whole Lie group G. For integration of the appropriate symmetries that
come along with taking Lie group quotients, with in particular the one of primary
interest (1), see Duits et al. (2013, 2019), Smets et al. (2019, 2020).
In the general Lie group setting, we consider Lie groups $G = \mathbb{R}^d \rtimes T$ that are
the semi-direct product of $\mathbb{R}^d$ with another Lie group $T$ (reflecting the feature
of interest, e.g. orientations, velocities, frequencies, scales, etc.). Then one uses
a unitary representation g → Ug of such a Lie group onto the space of images
modelled by L2 (Rd ) to construct the ‘score’ (or ‘lifted image’) by probing image f
by a family of group coherent wavelets constructed from a wavelet ψ ∈ L2 (Rd ) ∩
L1 (Rd )
Clearly, not every (square integrable) function on the Lie group is the orientation
score of an image. It turns out that such a transform $W_\psi : L_2(\mathbb{R}^d) \to \mathbb{C}^G_K$
is a unitary map onto its range, which is the unique reproducing kernel Hilbert
space $\mathbb{C}^G_K$ consisting of functions on the Lie group $G$ with reproducing kernel
$K(g, h) = (\mathcal{U}_g \psi, \mathcal{U}_h \psi)_{L_2(\mathbb{R}^d)}$. For details, see Duits (2005), Ali et al. (1999) and
Fuehr (2005).
Remark 3. In our special case of interest where the score is an ‘orientation score’,
we set Lie group $G = SE(d) = \mathbb{R}^d \rtimes SO(d)$, for $d \in \{2, 3\}$, with group product

$$g_1 g_2 = (\mathbf{b}_1, R_1)(\mathbf{b}_2, R_2) = (\mathbf{b}_1 + R_1 \mathbf{b}_2,\; R_1 R_2), \qquad (2)$$

and with the left-regular representation $\mathcal{U}$ acting on images via

$$(\mathcal{U}_g f)(\mathbf{x}) = f\big(R^{-1}(\mathbf{x} - \mathbf{b})\big), \qquad (3)$$

for all $g = (\mathbf{b}, R) \in SE(d)$, $\mathbf{x} \in \mathbb{R}^d$. In this case, the family of group coherent
wavelets are rotated and translated versions of ψ. For d = 3, one must assume that
ψ is rotationally symmetric around the reference axis in order to ensure that the
orientation score Wψ f is well defined on M3 . For details, see Janssen et al. (2018).
Remark 4. If U is reducible, which is the case for the representation given by (3),
one can apply a decomposition into irreducible subspaces (Duits and Franken 2010a,
App.A). Then one either must restrict the space of images (e.g. to the space of
ball-limited images Fuehr 2005, ch.5.2, Duits 2005, ch.4.5) or one must rely on
distributional wavelet transforms (Bekkers et al. 2014, App.B). In both cases, one
must take care that all coherently transformed wavelets Ug ψ together ‘cover all the
frequencies in the Fourier domain’; see Duits (2005), Fuehr (2005) and Duits and
Bekkers (2020).
Remark 5. In this article, we will not address the issue of choosing a proper wavelet
ψ. For the setting of G = SE(3) or more precisely for M3 = G/H , we prefer to use
so-called cake wavelets to construct invertible orientation scores (Duits 2005). For
quick practical explanations on 2D cake wavelets, see Bekkers et al. (2014); for the
same on 3D cake wavelets, see Duits et al. (2016). All experiments in this chapter
use cake wavelets ψ with standard parameter settings (Martin and Duits 2017). For
detailed educational background on invertible orientation scores, proper wavelets,
and cake wavelets, see Duits and Bekkers (2020).
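To make the lifting construction concrete, here is a minimal sketch of a 2D orientation score obtained by convolving an image with rotated copies of a single oriented wavelet; we use a simple Gabor-type wavelet purely as a stand-in for the cake wavelets preferred in this chapter, and all sampling choices and names are our own.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor(size=21, sigma=3.0, freq=0.25, theta=0.0):
    """A simple oriented (Gabor-type) wavelet whose carrier oscillates across the
    line direction theta; a stand-in for a cake wavelet."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    Xn = -X * np.sin(theta) + Y * np.cos(theta)     # coordinate across the line
    env = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))
    return env * np.exp(2j * np.pi * freq * Xn)

def orientation_score(f, n_orientations=16):
    """Lift a 2D image f to U(x, y, theta_k) by convolving with rotated wavelets
    (cf. Fig. 3); the orientations sample [0, pi)."""
    thetas = np.linspace(0.0, np.pi, n_orientations, endpoint=False)
    U = np.stack([fftconvolve(f, gabor(theta=t), mode='same') for t in thetas],
                 axis=-1)
    return U, thetas

# A synthetic diagonal line should respond most strongly near the matching channel:
x = np.linspace(-1, 1, 64)
X, Y = np.meshgrid(x, x)
f = np.exp(-((X - Y)**2) / 0.005)                   # a line at 45 degrees
U, thetas = orientation_score(f)
print(np.degrees(thetas[np.argmax(np.abs(U[32, 32, :]))]))
```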
and an operator $\Phi$ acting on scores is right invariant if $\Phi[\mathcal{R}_g V] = \mathcal{R}_g[\Phi V]$, for all $g \in G$, and for all
$V \in L_2(G)$.
satisfies
Ug ◦ ϒ = ϒ ◦ Ug for all g ∈ G
which shows that score processing must be left invariant. Moreover, we have
which shows that right invariance is a highly undesirable property for score
processing.
See Fig. 5 to get a visual impression of what the above theorem means for the group of
roto-translations in the plane G = SE(2) ≡ M2 ; recall Remark 3.
Proof. This Lemma essentially gathers earlier results of the first author Duits (2005,
Thm. 21) and Duits et al. (2013, Thm. 1) where the proof can be found.
44 Recent Geometric Flows in Multi-orientation Image Processing via. . . 1533
Fig. 4 A schematic view of image processing via scores. According to Lemma 1, the operator acting on the score must be
left invariant and not right invariant. The same applies to the other Lie group cases mentioned in
Remark 2, where the score is not an 'orientation score' but, for example, a 'frequency score' (Duits
et al. 2013)
Fig. 5 A roto-translation of the image corresponds to a shift twist of the orientation score, both
defined via group representations of G = SE(2) on the image and the orientation score. Shift
twist of images and orientation scores are denoted, respectively, by the left-regular representations
Ug (3) and Lg (5). In this illustration of Wψ ◦ Ug = Lg ◦ Wψ , we have set g = (0, θ), with θ
increasing from left to right
Geometric image processing via scores on Lie Groups requires a choice of underly-
ing Cartan connection on T (G). For geometric image processing, we literally need
to ‘connect’ tangent spaces Tg (G) at different base points g ∈ G in the domain of
a score. Such a connection gives rise to (coordinate free) covariant derivatives that
we need in PDE-based image processing via scores on Lie groups. Next, we will
illustrate this on two geometric (PDE-based) image processing techniques:
overview of the possibilities and impact of the differential geometric theory on many
medical image analysis applications.
$$\big(d\mathcal{R}(A)\big) V(g) = \lim_{t \downarrow 0} \frac{\big(\mathcal{R}_{e^{tA}} - I\big) V(g)}{t}, \quad \text{for } V \in \mathcal{D}(d\mathcal{R}(A)),$$

and the domain $\mathcal{D}(d\mathcal{R}(A))$ of this unbounded operator $d\mathcal{R}(A)$ is the subset of $L_2(G)$
for which the above limit exists in the $L_2$-sense.
Let Lg : G → G denote the left multiplication given by Lg h = gh. Let us
choose a basis {A1 , . . . , An } in Te (G), and let us define the corresponding vector
fields
Ai |g = (Lg )∗ Ai , for i = 1, . . . , n.
Let us define the corresponding dual basis ('left-invariant co-frame') in $T_g^*(G)$ by

$$\langle \omega^i\big|_g,\; \mathcal{A}_j\big|_g \rangle = \delta^i_j, \qquad (7)$$

with $\delta^i_j$ denoting the usual Kronecker delta. Then one has $\mathcal{A}_i = d\mathcal{R}(A_i)$, and the
structure constants $c^k_{ij}$ of the Lie algebra relate via

$$[A_i, A_j] = \sum_{k=1}^{n} c^k_{ij}\, A_k \;\;\Leftrightarrow\;\; [\mathcal{A}_i, \mathcal{A}_j] = \mathcal{A}_i \circ \mathcal{A}_j - \mathcal{A}_j \circ \mathcal{A}_i = \sum_{k=1}^{n} c^k_{ij}\, \mathcal{A}_k. \qquad (8)$$
A left-invariant metric tensor field on $G$ is given by

$$\mathcal{G}_g = \sum_{i,j=1}^{n} g_{ij}\; \omega^i\big|_g \otimes \omega^j\big|_g,$$

and we restrict ourselves to the diagonal case

$$g_{ij} = \xi_i\, \delta_{ij}, \qquad (9)$$

with $\xi_i > 0$ for $i = 1, \ldots, n$ and the Kronecker delta $\delta_{ij}$. As a result, for all $g \in G$, the
mapping $(L_{g^{-1}})_* : T_g(G) \to T_e(G)$ is unitary. The mapping is known as the Maurer-
Cartan form and 'connects' tangent spaces in a left-invariant way. See Fig. 6, where
the Maurer-Cartan form is illustrated for the group SE(2) of roto-translations in the
plane with group product (2). The associated 'Cartan minus connection' $\nabla^-$ (Kobayashi and
Nomizu 1963) is given by

$$\nabla^- := \sum_{i,k=1}^{n} \omega^i \otimes \big(\mathcal{A}_i \circ \omega^k(\cdot)\big)\, \mathcal{A}_k,$$
Fig. 6 The Maurer-Cartan form (in red) ‘connects’ tangent space Tg (G) to Te (G) in a left-
invariant way. It underlies the Lie-Cartan connection with ν = 0 as can be seen in Lemma 2.
Right we depict the Lie group case $SE(2) = \mathbb{R}^2 \rtimes S^1$ and left we show spatial projections $\mathbf{x}(t)$ of
the curves γ (t) = (x(t), θ(t)) ∈ SE(2)
$$\nabla^-_X Y := \sum_{i,k=1}^{n} \omega^i(X)\; \mathcal{A}_i\big(\omega^k(Y)\big)\; \mathcal{A}_k.$$

More precisely, for two arbitrary vector fields $X = \sum_{i=1}^{n} x^i \mathcal{A}_i$ and $Y = \sum_{k=1}^{n} y^k \mathcal{A}_k$, one has

$$\nabla^-_X Y = \sum_{k=1}^{n} \left( \sum_{i=1}^{n} x^i\, \mathcal{A}_i y^k \right) \mathcal{A}_k.$$
This has big limitations and is not always the right choice for a connection on
a Lie group G. Therefore, we consider a more general class of connections on
the Lie group G, the so-called Lie-Cartan connections, as we define next. Then
in particular we consider a 1-parameter class of Cartan connections. We will call
these connections ‘Lie-Cartan connections’ as they are directly induced by the Lie
bracket.
Definition 2. Per Cogliati and Mastrolia (2018, section 5.2), Cartan (1926), a
Cartan (or canonical) connection on a Lie group is a vector bundle connection with
the following additional properties:
1. Left invariance:
2. For any a ∈ Te (G), the exponential curve and auto-parallel curve coincide:
We now look at a specific set of Cartan connections that relate to the Lie bracket.
$$\nabla^{[\nu]} := \sum_{i,k=1}^{n} \omega^i \otimes \big(\mathcal{A}_i \circ \omega^k(\cdot)\big)\, \mathcal{A}_k \;+\; \sum_{i,j,k=1}^{n} \omega^i \otimes \omega^j\; \nu\, c^k_{ij}\; \mathcal{A}_k \qquad (13)$$
Remark 6. A left-invariant vector field $X$ can be written as $X = \sum_{i=1}^{n} x^i \mathcal{A}_i$ with
constant coefficients $x^i \in \mathbb{R}$. As a result, we have that for left-invariant vector fields
$X, Y$ the first term vanishes in (13), and we have that $\nabla^{[\nu]}_X Y = \nu\, [X, Y]$.
Remark 7. The Christoffel symbols $\Gamma^k_{ij}$ (10) relative to the left-invariant moving
frame of reference equal $\Gamma^k_{ij} = \nu\, c^k_{ij}$ and vanish iff $\nu = 0$, and indeed one has

$$T_{\nabla^{[\nu]}}(X, Y) := \nabla^{[\nu]}_X Y - \nabla^{[\nu]}_Y X - [X, Y] = (2\nu - 1)\, [X, Y],$$

for left-invariant vector fields $X, Y$, but we prefer to index the Lie-Cartan con-
nections $\nabla^{[\nu]}$ with the parameter $\nu$ arising in the commutator rather than with the
parameter $2\nu - 1$ in the torsion of the connection:
$$\widetilde{\mathrm{Ad}}(q) = (L_g)_* \circ \mathrm{Ad}(q) \circ (L_{g^{-1}})_*, \qquad (16)$$

with $\mathrm{Ad}(g) = (L_g \circ R_{g^{-1}})_* : T_e(G) \to T_e(G)$, where $(\cdot)_*$ denotes the push-
forward, so that $(\mathrm{Ad})_* = \mathrm{ad}$ with $\mathrm{ad}(X_e)(Y_e) = [X_e, Y_e]$, and the transferred
adjoint representation given by $\widetilde{\mathrm{Ad}}(g) = (L_g \circ R_{g^{-1}})_* : T_g(G) \to T_g(G)$ that
satisfies
Remark 9 (from left-invariant vector fields to general vector fields). The formulas
above in Lemma 3 only hold for left-invariant vector fields. For example, the general
formula for the torsion is
$$T_{\nabla^{[\nu]}} = (2\nu - 1) \sum_{i,j,k=1}^{n} \omega^i \otimes \omega^j\; c^k_{ij}\; \mathcal{A}_k, \qquad (21)$$

so only for left-invariant vector fields do we have $T_{\nabla^{[\nu]}}(X, Y) = (2\nu - 1)[X, Y]$.
It is not a coincidence that vanishing torsion for arbitrary non-commuting vector
fields gives $\nu = \frac{1}{2}$, whereas the same conclusion can be drawn from left-invariant
non-commuting vector fields. In general, the torsion $T_\nabla$ and curvature $R_\nabla$ of a
connection $\nabla$, and the covariant derivative $\nabla \mathcal{G}$ of the metric tensor fields, are tensor
fields. Therefore one has, for example,

$$T_{\nabla^{[\nu]}}(f_1 X_1 + f_2 X_2,\; g_1 Y_1 + g_2 Y_2) =
f_1 g_1\, T_{\nabla^{[\nu]}}(X_1, Y_1) + f_2 g_1\, T_{\nabla^{[\nu]}}(X_2, Y_1) + f_1 g_2\, T_{\nabla^{[\nu]}}(X_1, Y_2) + f_2 g_2\, T_{\nabla^{[\nu]}}(X_2, Y_2) \qquad (22)$$

for all $f_i, g_i \in C^\infty(G)$ and all vector fields $X_i, Y_i$ on $G$, $i = 1, 2$.
1. Torsion-free iff $\nu = \frac{1}{2}$,
2. Curvature-free iff $\nu \in \{0, 1\}$,
3. Metric compatible w.r.t. left-invariant metric $\mathcal{G}$ if $\nu = 0$.

The above properties explain why the choices $\nu \in \{0, \frac{1}{2}, 1\}$ are the most common
choices for Cartan connections. Our application (recall Fig. 1) will require torsion
and metric incompatibility of connections on G. Metric incompatibility allows us
to distinguish between ‘straight curves’ (auto-parallel curves with parallel velocity)
and ‘shortest curves’ (distance minimizing geodesics with parallel momentum), as
we will see in Theorem 1.
Now that we defined the Lie-Cartan connections and that we addressed their
fundamental geometric properties, we express them explicitly in left-invariant
coordinates.
The covariant derivative of a vector field $Y = \sum_{k=1}^{n} y^k \mathcal{A}_k$ along a smooth vector field
$X = \sum_{i=1}^{n} x^i \mathcal{A}_i$ is given by (for details, see Remark 10)

$$\nabla^{[\nu]}_X Y = \sum_{k=1}^{n} \Big( \dot y^k + \sum_{i,j=1}^{n} \nu\, c^k_{ij}\, x^i y^j \Big)\, \mathcal{A}_k, \qquad (23)$$

where we use the common short notation (Jost 2011, (3.1.6)) $\dot y^k(t) = \frac{d}{dt}\, y^k(\gamma(t))$,
which equals $\dot y^k(t) = (X y^k)(\gamma(t))$, and $x^i = \dot\gamma^i(t)$, where $x^i\big|_{\gamma(t)} := \dot\gamma^i(t) = \langle \omega^i\big|_{\gamma(t)}, \dot\gamma(t) \rangle$,
along all flowlines $\gamma$ of the smooth vector field $X$. (A 'flowline' is a
smooth curve $\gamma$ satisfying $\dot\gamma(t) = X_{\gamma(t)}$.) With slight abuse of notation, we write

$$\nabla^{[\nu]}_{\dot\gamma} Y = \sum_{k=1}^{n} \Big( \dot y^k + \sum_{i,j=1}^{n} \nu\, c^k_{ij}\, \dot\gamma^i y^j \Big)\, \mathcal{A}_k. \qquad (24)$$
Likewise, the covariant derivative of a co-vector field $\lambda = \sum_{i=1}^{n} \lambda_i\, \omega^i \in T^*(G)$ (i.e. the dual connection) is given by

$$\nabla^{[\nu],*}_X \lambda = \sum_{i=1}^{n} \Big( \dot\lambda_i + \nu \sum_{j,k=1}^{n} c^k_{ij}\, \lambda_k\, \dot\gamma^j \Big)\, \omega^i. \qquad (25)$$

Note that $\langle \nabla^{[\nu],*}_X \lambda, Y\rangle = X \langle \lambda, Y\rangle - \langle \lambda, \nabla^{[\nu]}_X Y\rangle$, and
from this formula we see how (25) follows from (24). The fact that both formulas
involve a plus sign for the summation reflects that the Christoffel symbols (Jost
2011) of the connection and dual connection (in the left-invariant frame) are each
other's inverse:

$$0 = \nu\,\big(c^k_{ji} + c^k_{ij}\big) = \langle \nabla^{[\nu],*}_{\mathcal{A}_i} \omega^k, \mathcal{A}_j \rangle + \langle \omega^k, \nabla^{[\nu]}_{\mathcal{A}_i} \mathcal{A}_j \rangle.$$
Remark 10. Next, we explain how (23) follows from the corresponding (previously
addressed) coordinate-free formulation (13):

$$\nabla^{[\nu]}_X(Y) := \nabla^{[\nu]}(X, Y) = \sum_{i,k=1}^{n} \big(\omega^i \otimes \mathcal{A}_i \circ \omega^k(\cdot)\big)(X, Y)\, \mathcal{A}_k + \sum_{i,j,k=1}^{n} \omega^i(X)\, \omega^j(Y)\, \nu c^k_{ij}\, \mathcal{A}_k
= \sum_{i,k=1}^{n} x^i\, (\mathcal{A}_i y^k)\, \mathcal{A}_k + \sum_{i,j,k=1}^{n} \nu c^k_{ij}\, x^i y^j\, \mathcal{A}_k,$$

with $X\big|_\gamma = \sum_{i=1}^{n} x^i \mathcal{A}_i\big|_{\gamma(\cdot)} = \dot\gamma = \sum_{i=1}^{n} \dot\gamma^i \mathcal{A}_i\big|_{\gamma(\cdot)}$ and $Y = \sum_{k=1}^{n} y^k \mathcal{A}_k$, and
$\dot y^k(t) = \frac{d}{dt}\, y^k(\gamma(t)) = \sum_{i=1}^{n} x^i\, (\mathcal{A}_i y^k)(\gamma(t)) = X(y^k)(\gamma(t))$ via the chain rule.
Consider a left-invariant metric tensor field

$$\mathcal{G} = \sum_{i,j=1}^{n} g_{ij}\; \omega^i \otimes \omega^j, \qquad (26)$$

where the $g_{ij}$ are constant relative to the left-invariant co-frame $\omega^i$ given by (7), such that the matrix
$[g_{ij}] \in \mathbb{R}^{n \times n}$ is symmetric positive definite. Recall that we restricted ourselves to the
diagonal case (9). The linear map associated with the metric tensor field $\mathcal{G}$ is written as

$$\widetilde{\mathcal{G}}(X) = \mathcal{G}(X, \cdot). \qquad (27)$$
In many applications (robotics (Chirikjian and Kyatkin 2001; Saccon et al. 2012), image analysis (Bekkers et al. 2015), cortical vision (Citti and Sarti 2015; Petitot
2003)), it is useful to rely on sub-Riemannian geometry (Agrachev and Sachkov
2004), where certain directions in the tangent bundle are forbidden as they come with
infinite cost. This means that tangents of connecting curves are prescribed to be in
a sub-bundle $\Delta$ (also known as a 'distribution') of the tangent bundle $T(G)$.
Typically, for a controllable system, $\Delta$ and its commutators should fill the full
tangent space, in view of Hörmander's theorem (Hormander 1968). Here we will
constrain ourselves to the case that the Lie algebra is two-bracket generating:

$$\Delta + [\Delta, \Delta] = T(G). \qquad (28)$$
Remark 11. For instance, let us consider the car in Fig. 1 that needs to move in the
Lie group SE(2). The car can proceed forward (by giving gas) and change its
orientation (by turning the wheel), but it cannot move sideways. Optimal paths for the
car boil down to sub-Riemannian geodesic problems in which the partial Cartan
connection $\nabla^{[1]}$ will play a major role, as we will show in the next subsection.
Now let us assume that we label the Lie algebra in such a way that

$$\Delta = \mathrm{span}\{\mathcal{A}_i\}_{i \in I}, \qquad (29)$$

for some index set $I \subset \{1, \ldots, n\}$, and recall that we assumed (28) to hold.
This allows us to consider partial Cartan connections on $G$ that will play a major
role in sub-Riemannian problems on sub-Riemannian manifolds $(G, \Delta, \mathcal{G}_0)$ with

$$\mathcal{G}_0 = \sum_{i,j \in I} g_{ij}\; \omega^i \otimes \omega^j, \qquad (30)$$
as we will see later. Again we restrict ourselves to the diagonal case gij = ξi δij .
$$[\mathcal{A}_i, \mathcal{A}_j] = \sum_{k=1}^{n} c^k_{ij}\; \mathcal{A}_k.$$
Consider the distribution given by (29). Then the partial Lie-Cartan connection with
parameter $\nu \in \mathbb{R}$ (defined only on vector fields which map into the distribution)
equals

$$\nabla^{[\nu]} := \sum_{i,k \in I} \omega^i \otimes \big(\mathcal{A}_i \circ \omega^k(\cdot)\big)\, \mathcal{A}_k + \sum_{i,j,k \in I} \omega^i \otimes \omega^j\; \nu c^k_{ij}\; \mathcal{A}_k. \qquad (31)$$

In left-invariant coordinates, one has

$$\nabla^{[\nu]}_{\dot\gamma} Y = \sum_{k \in I} \Big( \dot y^k + \nu \sum_{i,j \in I} c^k_{ij}\, \dot\gamma^i y^j \Big)\, \mathcal{A}_k, \qquad
\nabla^{[\nu],*}_X \lambda = \sum_{i=1}^{n} \Big( \dot\lambda_i + \nu \sum_{k=1}^{n} \sum_{j \in I} c^k_{ij}\, \lambda_k\, \dot\gamma^j \Big)\, \omega^i, \qquad (32)$$

where we highlighted the difference with the full Lie-Cartan connection (the restriction of the
summation indices to $I$), compared to Definition 3, (24), (25). Again $X, Y$ are vector fields and $\lambda$ a dual vector
field and $\gamma$ is an integral curve of $X$, and $\dot y^k(t) := \frac{d}{dt}\, y^k(\gamma(t))$, $\dot\lambda_k(t) := \frac{d}{dt}\, \lambda_k(\gamma(t))$.
$$d_{\mathcal{G}_0}(g_0, g_1) := \min_{\substack{\gamma \in \mathrm{Lip}([0,1], G), \\ \gamma(0) = g_0,\ \gamma(1) = g_1, \\ \forall_{t \in [0,1]}:\ \dot\gamma(t) \in \Delta|_{\gamma(t)}}} \int_0^1 C(\gamma(t))\, \sqrt{\mathcal{G}_0\big|_{\gamma(t)}\big(\dot\gamma(t), \dot\gamma(t)\big)}\; dt, \qquad (34)$$
for all g0 , g1 ∈ G. The next theorem motivates the choice ν = 1 for the Lie-Cartan
connection ∇ [1] , that is, underlying the Hamiltonian flow associated with (33).
Recall from geometric control theory (Agrachev and Sachkov 2004; Agrachev et al.
2020) that the Pontryagin maximum principle describes the Hamiltonian flow. It
allows us to simultaneously analyse all lifted geodesics (γ (·), λ(·)) in the co-tangent
bundle T ∗ (G), where λ(·) denotes the momentum along the geodesic. This is
important, as a single (analytic) description of a geodesic typically does not say
that much. It is rather the continuum of geodesics and how their lifted versions
in T ∗ (G) are organized that help us understanding their behaviour. This is well
known for classical problems like the ‘mathematical pendulum’, but it is also crucial
in understanding the cut locus (Sachkov 2011) of the ‘sub-Riemannian geodesic’
problem or the ‘elastica problem’ (Sachkov 2008; Bryant and Griffiths 1986;
Mumford 1994) in SE(2) as shown by Sachkov. For cortical contour perception
models (Petitot 2017; Citti and Sarti 2006), this is equally important.
However, the underlying deep role of Cartan connections is often not mentioned,
despite its use in deriving simple solutions to cusp-free sub-Riemannian geodesics
in SE(2) solving association field models (Duits et al. 2016) and for new solutions
(Duits et al. 2016) of sub-Riemannian geodesics in SE(3). The power of such Cartan
connections is also stressed in the Lagrangian geometric viewpoint on optimal
curves by Bryant et al. (2003), Bryant and Griffiths (1986).
In this work, we take as point of departure the geometric Hamiltonian viewpoint
on (sub-)Riemannian geometry (Agrachev et al. 2020; Agrachev and Sachkov
2004) and include a key element coming from Bryant’s Lagrangian viewpoint on
contact manifolds (and his analysis of ‘elastica’ Bryant and Griffiths 1986): That
is (partial) Cartan connections that carry torsion. They will allow us to distinguish
between ‘shortest’ and ‘straight’ curves in Lie groups. For multi-orientation image
processing, this is very useful and intuitive as we show in Theorem 1, Fig. 7 and
section “Overview of Image Analysis Applications for G = SE(d)”.
Theorem 1. In a Riemannian manifold (G, T (G), G), with the tangent bundle
T (G) and metric tensor field G defined in (26), the induced metric dG defined in (33)
and the Lie-Cartan connection ∇ [ν] for ν = 1 defined in (13), we have the following
relations for ‘straight’ curves:
Fig. 7 (a) Geodesically equidistant surfaces $S_t = \{g \in SE(2)\,|\, d(0, g) = t\}$ and geodesic (in
green) for the sub-Riemannian case: $\zeta = 0$ and $C = 1$. (b) Geodesically equidistant surfaces $S_t$
and geodesic for the isotropic Riemannian case: $\zeta = 1$ and $C = 1$. Now the geodesics are straight
lines. (c) A set of horizontal exponential curves for which $\dot\gamma(\tau) = c^1 \mathcal{A}_1|_{\gamma(\tau)} + c^3 \mathcal{A}_3|_{\gamma(\tau)} \in \Delta$,
with constant tangent vector components $c^1$ and $c^3$. Such curves are auto-parallel ('straight curves'
in torqued and curved geometry modelled by Lie-Cartan connection $\nabla^{[1]}$)
$$\gamma \text{ is a } \nabla^{[1]}\text{-straight curve} \;\Leftrightarrow\; \gamma \text{ is a horizontal exponential curve} \;\Leftrightarrow\; \nabla^{[1]}_{\dot\gamma} \dot\gamma = 0 \;\Leftrightarrow\; \gamma \text{ has } \nabla^{[1]}\text{-auto-parallel velocity}, \qquad (37)$$

and the following for 'shortest' curves (minimizers in (33)); recall also (27):
in which $P_*$ is the projection $P_*\big(\sum_{i=1}^{n} \lambda_i \omega^i\big) = \sum_{i \in I} \lambda_i \omega^i$. For the reverse implications in (36)
and (38), and for a minimizing curve between $g_1 = \gamma(0)$ and $g_2 = \gamma(t)$, one must
have $0 \le t \le t_{cut} = \min\{t_{conj}(\lambda(0)),\, t_{Max,1}(\lambda(0))\}$; cf. Agrachev and Sachkov
(2004) and Sachkov (2011) for details.
They are found by steepest descent:

$$\gamma(t) = \gamma(0) + \int_0^t \mathrm{grad}_{\mathcal{G}}\, W(\gamma(s))\; ds, \qquad (39)$$

on distance maps $W(g) = d_{\mathcal{G}}(g, e)$ that are viscosity solutions of the eikonal PDE:

$$\begin{cases} \big\| \mathrm{grad}_{\mathcal{G}}\, W(g) \big\|_{\mathcal{G}} = \sqrt{\mathcal{G}_g\big(\mathrm{grad}_{\mathcal{G}} W(g),\, \mathrm{grad}_{\mathcal{G}} W(g)\big)} = 1, \\ W(e) = 0, \end{cases} \qquad (40)$$

with the (metric-intrinsic) gradient $\mathrm{grad}_{\mathcal{G}}\, W(g) = \widetilde{\mathcal{G}}^{-1} dW(g)$, as this only gives globally
minimizing curves, even in the SR setting $\mathcal{G} \to \mathcal{G}_0$.
Proof. First, we address the 'shortest curves' part of the theorem. The items (36)
and (38) follow by the Pontryagin maximum principle (PMP; Agrachev and Sachkov
2004) and Theorem 2 in Appendix A. Theorem 2 proves the fundamental
relation between the (partial) Lie-Cartan connection ∇^[1] and the Hamiltonian flow
in the (sub-)Riemannian setting. Here we stress that PMP provides only local
optimality of geodesics.
The geodesics are found by the exponential map that integrates the Hamiltonian
flow: (λ(0), t) ↦ (γ(t), λ(t)) = e^{t h⃗}(λ(0)).
Optimality of t → γ (t) requires t to be less than the cut time. Such a cut time is
the minimum of the conjugate time tconj (λ(0)) ∈ R ∪ {∞} where local optimality
is lost and the first Maxwell time tMax,1 , where two equidistant geodesics meet for
the first time and where global optimality is lost. Now t ≤ tcut (λ(0)) is guaranteed
by steepest descent (39) on the distance maps W which are obtained as viscosity
solutions (Crandall and Lions 1983; Evans 2010) to the eikonal PDE. This is well
known for the Riemannian case (Mantegazza and Mennucci 2002; Crandall and
Lions 1983), but it also applies to the sub-Riemannian case (Monti and Cassano 2001; Bekkers et al. 2015) and holds even in more general Finsler geometrical settings (Duits et al. 2018). (For an intuitive illustration: inside the viscosity solutions of the PDEs, non-optimal wavefronts are cut, at the first Maxwell set, in the sub-Riemannian setting (42); see Bekkers et al. 2015, Fig. 3.)
Secondly, regarding the 'straight curves' (35), one has, by (24) and anti-symmetry of the structure constants (8), that

∇^[1]_γ̇ γ̇ = 0  ⇔  ∀_{k∈{1,...,n}}:  γ̈^k − ∑_{i,j=1}^{n} c^k_{ij} γ̇^i γ̇^j = γ̈^k = 0
           ⇔  ∀_{k∈{1,...,n}}:  ⟨ω^k|_γ, γ̇⟩ =: γ̇^k = c^k = constant            (41)
           ⇔  γ(t) = γ(0) e^{t ∑_{k=1}^{n} c^k A_k}.
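The closed-form nature of these exponential curves makes them easy to evaluate numerically. The following Python sketch is an illustration, not code from the cited works; the ordering of the components c = (c¹, c², c³) as (forward, sideways, angular) is our assumption. For horizontal curves (c² = 0) the spatial projection is a circular arc.

```python
import numpy as np

def exp_curve_se2(g0, c, t):
    """Exponential curve in SE(2) with constant left-invariant components
    c = (c1, c2, c3) (cf. (41)), starting at g0 = (x0, y0, theta0).
    't' may be a scalar or an array of time samples."""
    x0, y0, th0 = g0
    c1, c2, c3 = c
    t = np.asarray(t, dtype=float)
    th = th0 + c3 * t
    if np.isclose(c3, 0.0):
        # no angular motion: the spatial projection is a straight line
        x = x0 + t * (c1 * np.cos(th0) - c2 * np.sin(th0))
        y = y0 + t * (c1 * np.sin(th0) + c2 * np.cos(th0))
    else:
        # constant angular motion: the spatial projection is a circular arc
        x = x0 + (c1 * (np.sin(th) - np.sin(th0)) + c2 * (np.cos(th) - np.cos(th0))) / c3
        y = y0 + (-c1 * (np.cos(th) - np.cos(th0)) + c2 * (np.sin(th) - np.sin(th0))) / c3
    return np.stack([x, y, th], axis=-1)

# Example: a horizontal exponential curve (c2 = 0), as in Fig. 7c
# pts = exp_curve_se2((0.0, 0.0, 0.0), (1.0, 0.0, 0.5), np.linspace(0.0, 4.0, 100))
```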
For 2D image processing via orientation scores (Duits and Franken 2010a,b;
Bekkers 2017; Duits et al. 2007), we must consider the special case:
G = SE(2),
A1 = cos θ ∂x + sin θ ∂y,   A2 = − sin θ ∂x + cos θ ∂y,   A3 = ∂θ,
I = {1, 3}  ⇒  Δ = span{A1, A3},
G0 = ξ² ω¹ ⊗ ω¹ + ω³ ⊗ ω³
   = ξ² (cos θ dx + sin θ dy) ⊗ (cos θ dx + sin θ dy) + dθ ⊗ dθ,
G = G0 + ξ² ζ⁻² (− sin θ dx + cos θ dy) ⊗ (− sin θ dx + cos θ dy),   (42)
with curve stiffness parameter ξ > 0 and anisotropy parameter 0 < ζ ≪ 1.
The basic idea here is that one considers path optimization for a Reeds-Shepp car
moving in the orientation score (ξ > 0 puts a relative cost on moving forward compared to
turning the steering wheel of the car); see Fig. 1. In Fig. 1 the green plane indicates Δ_g ⊂
Tg(G) for some g = (x, y, θ) ∈ SE(2). This 2D subspace is the subspace to which
local velocities are constrained in the sub-Riemannian setting, i.e. γ̇(0) ∈ Δ_g for all
smooth 'horizontal' curves γ in the sub-Riemannian manifold with γ(0) = g.
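For intuition, the following minimal Python sketch (an illustration, not code from the cited works) evaluates the left-invariant (sub-)Riemannian norm of a tangent vector according to (42); encoding the sub-Riemannian case by zeta=None is an ad-hoc convention of this sketch.

```python
import numpy as np

def se2_norm(g, gdot, xi=1.0, zeta=None):
    """(Sub-)Riemannian norm of the tangent vector gdot = (xdot, ydot, thetadot)
    at g = (x, y, theta), following (42). zeta=None: sub-Riemannian case
    (sideways motion forbidden); 0 < zeta << 1: highly anisotropic Riemannian
    approximation."""
    x, y, th = g
    xdot, ydot, thdot = gdot
    # components of gdot in the left-invariant co-frame omega^1, omega^2, omega^3
    u1 = np.cos(th) * xdot + np.sin(th) * ydot     # forward motion
    u2 = -np.sin(th) * xdot + np.cos(th) * ydot    # sideways motion
    u3 = thdot                                     # angular motion
    if zeta is None:
        if not np.isclose(u2, 0.0):
            return np.inf      # not horizontal: infinite cost in the SR model
        return np.sqrt(xi**2 * u1**2 + u3**2)
    return np.sqrt(xi**2 * u1**2 + (xi**2 / zeta**2) * u2**2 + u3**2)
```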
The geometric control problem (34) is then concerned with finding the shortest
path for the car in the orientation score. See Fig. 7 for an intuitive illustration of
Theorem 1 in the SE(2) setting (42). In order to generalize this special case from
d = 2 to d = 3, we must distinguish between the homogeneous space Rd ⋊ S^{d−1}
of positions and orientations, on which the rigid body motion group SE(d) acts, and
the Lie group itself. This will be the topic of the next section.
for d ∈ {2, 3}, with roto-translation group G = SE(d) = Rd ⋊ SO(d) and with
subgroup H = {0} × StabSO(d)(a). Here StabSO(d)(a) = {R ∈ SO(d) | Ra = a}
denotes the subgroup of SO(d) that stabilizes an a priori reference axis a ∈ S^{d−1}.
In case d = 2, H consists only of the unity element and R² ⋊ S¹ ≡ SE(2).
Therefore, let us explain the remaining case d = 3, where we set a = (0, 0, 1)^T.
Then the subgroup H can be parameterized accordingly, where we recall that Ra,α
denotes a (counterclockwise) rotation around the reference axis a. The reason behind
this construction is that the group SE(3) acts transitively on R³ ⋊ S² via
(x′, n′) ↦ g · (x′, n′).
Recall that, by the definition of the left cosets, one has g1 ∼ g2 ⇔ g1⁻¹ g2 ∈ H. The
latter equivalence simply means that for g1 = (x1, R1) and g2 = (x2, R2) one has
x1 = x2 and R1 a = R2 a, so that both group elements correspond to the same point
(x, n) ∈ M3. The equivalence classes consist of all g = (x, Rn) ∈ SE(3) that map the
reference point (0, a) onto (x, n) ∈ R³ ⋊ S², i.e. g · (0, a) = (x, n), where Rn is any
rotation that maps a ∈ S² onto n ∈ S².
The shortest curves (distance minimizers) are computed by steepest descent on the
distance maps; recall Theorem 1 and Fig. 1. For a visualization of a steepest descent
(according to Theorem 1) in the lifted image data defined on Md , see Fig. 8.
In the non-data-driven, uniform-cost case (i.e. C = 1 in (33)), they can often be
computed analytically, and also the cut time tcut(λ(0)) can be computed analytically
(Sachkov 2011) for (G = SE(2), Δ = span{A1, A3}, G0).
For the higher dimensional case
(G = SE(3), Δ = span{A3, A4, A5}, G0 = ξ² ω³ ⊗ ω³ + ω⁴ ⊗ ω⁴ + ω⁵ ⊗ ω⁵),
the curves can be computed analytically (Duits et al. 2016), and the cut locus mainly
numerically (Duits et al. 2016, 2018). For an explicit definition of the left-invariant
vector fields on SE(3), see Appendix B. The corresponding distance on M3 is then
given by
dM3(p1, p2) = min_{h1, h2 ∈ H} dSE(3)((x1, Rn1) h1, (x2, Rn2) h2),   (45)

where Rni are any rotations mapping the a priori reference axis a onto ni.
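As a small illustration of the construction behind (45) (a sketch, not code from the cited works), one particular rotation Rn mapping a = (0, 0, 1)ᵀ onto a given n ∈ S² can be built with Rodrigues' formula; any other valid choice differs by an element of H, which is exactly what the minimization over h1, h2 in (45) removes.

```python
import numpy as np

def rotation_onto(n, a=np.array([0.0, 0.0, 1.0])):
    """One particular rotation R_n in SO(3) mapping the reference axis a onto
    the unit vector n (Rodrigues' formula)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    v = np.cross(a, n)
    c = float(np.dot(a, n))
    if np.isclose(c, -1.0):                 # n = -a: rotate by pi about e_x
        return np.diag([1.0, -1.0, -1.0])
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)
```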
Numerical implementations to compute distance-minimizing curves w.r.t. dMd can
rely on accurate but relatively slow PDE iterations (Bekkers et al. 2015), or, more
efficiently, on anisotropic fast-marching algorithms (Mirebeau 2018) that are
sufficiently accurate (Sanguinetti et al. 2015). For state-of-the-art fast-marching
approaches, we refer to work of Jean-Marie Mirebeau (2018) and several variants
including a semi-Lagrangian fast-marching approach (where acuteness of stencils
guarantees a single pass algorithm with convergence results) (Mirebeau 2014). See
Fig. 8 Geodesic front propagation directly in the image domain leaks at crossings (left). To
overcome this complication, we lift the data to Md = Rd ⋊ S^{d−1} (here d = 2). This gives a
mobility/cost C in the lifted space (Duits et al. 2018; Bekkers et al. 2015), which determines the
distance in (34), and we apply geodesic front propagation (FP) in Md via the eikonal equation (40),
as depicted by the growing opaque spheres in green. We depict FP in symmetric (sub-)Riemannian
models and in asymmetric improvements (Duits et al. 2018). In purple, we indicate the steepest-
descent (SD) backtracking (39)
also Duits et al. (2018). For a more recent Hamiltonian fast-marching approach, see
Mirebeau and Portegies (2019). The Hamiltonian approach directly relates to the
PDE approach in Bekkers et al. (2015), which also discretizes the eikonal equations;
the main difference is that at each step one updates only the relevant voxels (instead
of a full volume) in a single-pass algorithm, which leads to a tremendous speedup
(Sanguinetti et al. 2015).
In practice, it makes little difference whether one relies on highly anisotropic
Riemannian geodesic models (with an anisotropy of, say, about 10) as an approximation of
sub-Riemannian geodesic models (with infinite anisotropy). See Duits et al. (2018,
Thm.2) and Sanguinetti et al. (2015) for theoretical and practical underpinning of
this statement.
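As an illustration of how the steepest-descent backtracking (39) can be carried out on such a highly anisotropic Riemannian approximation, the following Python sketch assumes a distance map W has already been computed on a regular (x, y, θ)-grid; the interpolation, step size, stopping tolerance and parameter values are ad-hoc choices, not those of the cited works.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def backtrack_geodesic(W, xs, ys, thetas, g1, xi=1.0, zeta=0.1,
                       step=0.2, tol=0.1, max_iter=20000):
    """Toy sketch of the steepest-descent backtracking (39) on a distance map
    W(x, y, theta) given on a regular grid, using the highly anisotropic
    Riemannian metric diag(xi^2, xi^2/zeta^2, 1) in the left-invariant frame
    as an approximation of the sub-Riemannian metric (zeta << 1)."""
    dWdx, dWdy, dWdth = np.gradient(W, xs, ys, thetas)
    interp = lambda F: RegularGridInterpolator(
        (xs, ys, thetas), F, bounds_error=False, fill_value=None)
    iW, iWx, iWy, iWth = map(interp, (W, dWdx, dWdy, dWdth))
    g = np.array(g1, dtype=float)
    path = [g.copy()]
    for _ in range(max_iter):
        if float(iW(g)) < tol:            # close enough to the origin e
            break
        th = g[2]
        # left-invariant derivatives A1 W, A2 W, A3 W at g
        A1W = np.cos(th) * float(iWx(g)) + np.sin(th) * float(iWy(g))
        A2W = -np.sin(th) * float(iWx(g)) + np.cos(th) * float(iWy(g))
        A3W = float(iWth(g))
        # metric-intrinsic gradient components: g^{ij} A_j W
        u1, u2, u3 = A1W / xi**2, (zeta**2 / xi**2) * A2W, A3W
        # express the descent direction in the fixed coordinate frame
        v = np.array([u1 * np.cos(th) - u2 * np.sin(th),
                      u1 * np.sin(th) + u2 * np.cos(th),
                      u3])
        g = g - step * v / (np.linalg.norm(v) + 1e-12)   # move towards e
        path.append(g.copy())
    return np.array(path)
```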
For the homogeneous space of positions and orientations, both the highly
anisotropic Riemannian and the sub-Riemannian models have major benefits over
isotropic Riemannian models in vessel tracking applications (Bekkers et al. 2017,
2018); see section “Overview of Image Analysis Applications for G = SE(d)”.
Furthermore, there exist several extensions of the highly anisotropic Riemannian
or sub-Riemannian models (Duits et al. 2018). The most relevant extended
models are:
Fig. 9 An example of a smooth sub-Riemannian geodesic γ = (x(·), y(·), θ(·)) (in purple), whose spatial projection (in black) shows a cusp (red point). A cusp point is a point (x, y, θ) on γ such that the velocity ẋ (black arrow) of the projected curve x(·) = (x(·), y(·)) switches sign at (x, y)
• The asymmetric variant that excludes backward motion (and thereby cusps; recall Fig. 9), with sub-Riemannian Finsler function

|F0+(p, ṗ)|² := ⎧ ξ² |ẋ · n|² + ‖ṅ‖²   if ẋ ∝ n and ẋ · n ≥ 0,
                ⎩ +∞                    otherwise,                 (46)

with p = (x, n) and ṗ = (ẋ, ṅ). Including a highly anisotropic Riemannian approximation and the mobility/cost C (recall (33) and (34)) into the Finsler function, one obtains altogether

|Fζ+(p, ṗ)|² := (C(p))² ( ξ² ( |ẋ · n|² + ζ⁻² ‖ẋ ∧ n‖² + (ζ⁻² − 1) ((ẋ · n)₋)² ) + ‖ṅ‖² ),   (47)

with 0 < ζ ≪ 1. For details and illustrations, see Duits et al. (2018); a small evaluation sketch of F0+ is given after this list.
• The projective line bundle variant (where anti-podal points are identified) that
partly resolves the cusp problem (Bekkers et al. 2017, ch.4) and that better relates
to cortical sub-Riemannian models (Petitot 2017). It can be shown that it boils
down to taking the minimum distance over the four cases that arise by flipping
(i.e. ni → −ni ) or not flipping the two boundary conditions. For details, see
Bekkers et al. (2017).
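As announced above, a small Python sketch (with ad-hoc tolerances, not code from the cited works) of the asymmetric Finsler function F0+ of (46): it returns a finite value only when the spatial velocity is a nonnegative multiple of the current orientation.

```python
import numpy as np

def finsler_F0_plus(p, pdot, xi=1.0):
    """Evaluate F0+ of (46) at p = (x, n) for a velocity pdot = (xdot, ndot):
    finite only if xdot is a nonnegative multiple of n (no sideways or
    backward motion)."""
    x, n = map(np.asarray, p)
    xdot, ndot = map(np.asarray, pdot)
    forward = float(np.dot(xdot, n))
    sideways = xdot - forward * n
    if np.linalg.norm(sideways) > 1e-12 or forward < 0:
        return np.inf
    return np.sqrt(xi**2 * forward**2 + float(np.dot(ndot, ndot)))
```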
In Fig. 10, we depict growing spheres for several models. It can be observed that in the
sub-Riemannian setting such spheres reveal folds, which form the closure of the first
Maxwell set, where two geodesics of equal length meet. This is easily understood,
as geodesic back-propagation via steepest descent (with the same speed) can be done
along two directions, orthogonal to each of the wavefronts that meet at the folds on
the spheres.
Fig. 10 The development of spheres centred around e = (0, 0, 0) with increasing radius R. A: The normal SR spheres on M2 given by {p ∈ M | dF0(p, e) = R}, where the folds reflect the first Maxwell sets (Bekkers et al. 2015; Sachkov 2011). B: The SR spheres with identification of antipodal points, with additional folds (first Maxwell sets) due to π-symmetry. C: The asymmetric Finsler norm spheres given by {p ∈ M | dF0+(p, e) = R}, visualized from two perspectives, with extra folds (first Maxwell sets) at the back (−μ, 0, 0). The black dots indicate points with two folds. For details, see Duits et al. (2018)
In Fig. 11, we depict the cut locus, where geodesic fronts lose their optimality, for the projective line bundle case.
γ_{g,c}(t) := g exp_G( t (L_{g⁻¹})∗ c )   (48)

Such an exponential curve (recall Fig. 7) is determined by c = ∑_{i=1}^{n} c^i A_i|_g ∈ T_g(G) and satisfies

⎧ (d/dt) γ_{g,c}(t) = ∑_{i=1}^{n} c^i A_i|_{γ_{g,c}(t)},   t ∈ R,
⎩ γ̇_{g,c}(0) = c ∈ T_g(G).
The tangent to the locally best-fitting exponential curve will be the first vector of our
locally adaptive frame (henceforth referred to as 'gauge frame'). The mathematical
details of the fitting procedure, i.e. how to compute the best-fitting exponential curve
and the local optimization problem that defines such a fit, follow in section “Exponential
Curve Fits of the Second Order Are Found by SVD of the Hessian”. For now, to get a
geometrical intuition, see Fig. 12. Inclusion of such a gauge frame has several benefits.
Fig. 11 Top two rows: Evolution of the first Maxwell set as the radius R of the SR spheres (with uniform cost C = 1) increases. The first Maxwell sets are visible via folds on the spheres, as steepest descent (39) has more than one equal-length option. Bottom: Equal-length SR minimizers (shortest curves) in the projective line bundle case ending at the points indicated in the top two rows. The multiplicity of the Maxwell points is indicated by ν, and the characteristic radii R, R̃, where the multiplicity changes from 2 to 3 and from 3 to 4, can be computed analytically; see Bekkers et al. (2017)
Fig. 12 Illustrating gauge frame fitting at a fixed point g ∈ SE(2). Top: left invariant frame where
Ad = n · ∇R2 , as indicated by the red line. Bottom: we choose a frame with Bd = c given by (51)
or (55) that takes into account the local curvature. In green, we see the corresponding exponential
curve fit γg,c to the data
H f = ∇∗ df   (49)

for all f ∈ C²(G, R). It will turn out in section “Exponential Curve Fits of the Second Order Are Found by SVD of the Hessian” that the theory of best exponential curve fits of second order will boil down to an SVD of ∇∗ df, where one can either choose ∇ = ∇^[0] or ∇ = ∇^[1], as the corresponding linear maps associated to the Hessian are each other's adjoints. Indeed, a brief computation in the frame {A_i}_{i=1}^{n} of left-invariant vector fields gives us

(∇^{[0],∗}_{A_i} df)(A_j) = A_i A_j f,
(∇^{[1],∗}_{A_i} df)(A_j) = A_i A_j f − ∑_{k=1}^{n} c^k_{ij} A_k f = A_i A_j f − (A_i A_j − A_j A_i) f = A_j A_i f.   (50)
Exponential Curve Fits of the Second Order Are Found by SVD of the
Hessian
In this section, we will show that exponential curve fits (like the white line in Fig. 12)
are computed by singular value decomposition of the Hessian. This technique is well
known on the Lie group G = R2 and widely used in image processing to compute
locally adaptive frames (or ‘gauge frames’) (Haar Romenij 2003), but generalizing
this to a Lie group like G = SE(2) requires the Lie-Cartan connections (for ν = 0
or ν = 1 as we will see).
We start by defining the main gauge vector (by means of the Lie-Cartan connection) as

B_d|_g := argmin_{c ∈ T_g(G), ‖c‖ = 1} ‖ ∇^[0]_c grad f(g) ‖.   (51)
Above, the vectors in the purple parts belong to T_g(G), whereas the vectors in the green part belong to T_{γ_{g,c}(t)}(G). Next, we write (51) as an SVD problem that involves the Hessian of f at g:

B_d|_g = argmin_{c ∈ T_g(G), ‖c‖ = 1} ‖ ∇^[0]_c df(g) ‖_∗ = argmin_{c ∈ T_g(G), ‖c‖ = 1} ‖ (H^[0] f(g))(c, ·) ‖_∗.   (52)
Now identify the Hessian H^[0] f(g) in the natural way with the associated linear map
A_g : T_g(G) → T_g∗(G) via (53), so that we arrive at an SVD of A.
Remark 12. As the SVD of A∗ coincides with the SVD of A, we may as well
replace the choice ν = 0 in (52) and (51) with our special choice ν = 1. Recall
also (19).
Remark 13. The matrix representation of (53) relative to the basis of left-invariant
vector fields gets a bit involved, since the adjoint A∗ depends on the choice of
left-invariant metric. In our special case of interest (42), we set ζ = 1 and obtain the
representation (54) in terms of Mξ = diag{ξ⁻¹, ξ⁻¹, 1} and the matrix H whose element
H_j^i, with row index i and column index j, equals H_j^i = A_j A_i f(g).
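A minimal Python sketch of the computation described in Remark 13 follows; since the matrix representation (54) is not reproduced above, the way the resulting singular vector is mapped back to the left-invariant frame (the final multiplication by Mξ) is our assumption.

```python
import numpy as np

def main_gauge_vector(H, xi=1.0):
    """Candidate main gauge vector B_d|_g from the 3x3 matrix H with entries
    H[i, j] = A_j A_i f(g) of left-invariant second derivatives at a fixed g
    in SE(2) (case zeta = 1 of Remark 13). We take the right-singular vector
    with smallest singular value of M_xi H M_xi, where M_xi = diag(1/xi, 1/xi, 1)
    makes spatial and angular axes dimensionally comparable."""
    M = np.diag([1.0 / xi, 1.0 / xi, 1.0])
    A = M @ H @ M
    _, s, Vt = np.linalg.svd(A)       # singular values sorted descendingly
    c_tilde = Vt[-1]                  # direction least affected by the Hessian
    c = M @ c_tilde                   # map back to the left-invariant frame (assumption)
    return c / np.linalg.norm(c)
```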
In practice, one includes an external regularization into (51) and defines

B_d|_g := argmin_{c ∈ T_g(G), ‖c‖ = 1} ∫_G K_ρ(h⁻¹ g) ‖ (L_{g h⁻¹})∗ ∇^[0]_{c(h)} grad f(h) ‖ dμ(h),   (55)

where μ is the left-invariant Haar measure on G = SE(3) = R³ ⋊ SO(3), with c(h) = (L_{h g⁻¹})∗ c ∈ T_h(G), and where K_ρ is an external regularization kernel with two regularization parameters ρ = (ρ_S, ρ_A) ∈ (R⁺)². Typically, it is the direct product of an isotropic spatial Gaussian on R³ with spatial scale ρ_S > 0 and a heat kernel on SO(3) with angular scale ρ_A; for details and motivation, see Duits et al. (2016, ch:2.7).
Such a regularization stabilizes the best exponential curve fits, so that they align
better with neighbouring exponential curve fits. Again, the regularized problem is
solved with an SVD, now of the regularized operator Aρ.
et al. 2014) is computed via an SVD of the Hessian induced by the Lie-Cartan
connection with ν = 1; recall section “Straight Curve Fits”.
More important, however, is that the proposed toolset enables the design of
a completely new range of algorithms that enable analyses which are simply not
possible when holding on to data representations on Rd. These include globally
optimal path optimization with an intrinsic (curvature penalizing) smoothness
constraint via sub-Riemannian geometry (Bekkers et al. 2015; Duits et al. 2018;
Chen 2016; Mirebeau and Portegies 2019; Franceschiello et al. 2019), which is
the topic of Sect. “Shortest Curve Application: Tracking of Blood Vessels”; the
direct computation of curvature and torsion of blood vessels for biomarker research
without having to explicitly track/model the vessel trajectories (Bekkers et al. 2015),
which is the topic of section “Straight Curve Application: Biomarkers for Diabetes”;
and crossing-preserving, curvature-adaptive denoising schemes (Franken and Duits
2009; Duits et al. 2019; Smets et al. 2019), which is the topic of section “Straight
Curve Application: PDEs on M2 for Denoising”.
What all of these applications have in common is that they either rely on ‘straight
curves’ which are auto-parallel w.r.t. the Lie-Cartan connections or on ‘shortest
curves’ which have parallel momentum (for ν = 1) according to our main theorem,
Theorem 1.
The differential-geometrical toolset described in this chapter can directly be
translated to numerical schemes by working with discrete grids and finite difference
stencils (Creusen et al. 2011) or via basis expansion methods such as spherical
harmonics (Janssen et al. 2017; Reisert and Kiselev 2011; Skibbe and Reisert
2017) and B-splines (Bekkers et al. 2018) that allow for the computation of exact
derivatives or via a mix of numerical and analytical schemes (Bekkers 2017; Zhang
et al. 2016). Examples of the latter include the use of analytic approximations of sub-
Riemannian distances (Sachkov 2011; Bekkers et al. 2015) in a clustering algorithm
(Bekkers et al. 2017) or analytic solution approximations to left-invariant diffusion
equations (Portegies et al. 2015) for smoothing or uncertainty analysis (Meesters
et al. 2017). The interested reader is referred to Franken and Duits (2009), Janssen
et al. (2017), Duits et al. (2016), Creusen et al. (2011), and Bekkers (2017) for
algorithmic implementation details of the left-invariant derivatives for processing of
orientation scores.
Fig. 13 Results of globally optimal data-adaptive geodesics computed in different metric tensor
settings. Left column: Conventionally such shortest paths are computed based on 2D isotropic
metrics. Such models suffer from short cuts (geodesics snap to other, typically parallel, dominant
vessels) and often fail at crossings. Middle column: Shortest path computations using an isotropic
metric in a lifted position-orientation space M2 ≡ SE(2) reduce problems with crossings due to a
disentanglement of local orientations, but the issue of short cuts remains as unnatural curves with
high curvature points are still allowed. Right column: Both problems are solved by working with
a sub-Riemannian metric on SE(2) by which only natural curves are allowed in the lifted space
(cf. Fig. 1). The right two columns show the 2D projections of geodesics in SE(2). For further
experiments on large datasets, see Bekkers et al. (2015, 2017)
lifted space Md using a sub-Riemannian geometry (cf. Fig. 1), we are able to overcome
such limitations of classical vessel tracking on Rd.
In our approach for computing globally optimal sub-Riemannian distance mini-
mizers between two points g0 , g1 ∈ SE(2), we consider the metric dG0 of Eq. (34),
which is defined using the SR metric tensor G0 given in (30) and which is based
on a cost function C : G → R+ that is derived from the orientation score.
The cost function encourages curves to move over vessel regions (low cost) and
penalizes moving over background regions (high cost). Such a cost can, for example,
be derived from the orientation score via a vesselness measure (Hannink et al.
2014), a line-fidelity measure based on left-invariant derivatives (Bekkers et al.
2015) or gauge derivatives in SE(2) (Duits et al. 2016; Zhang et al. 2016). The
actual computation of the shortest paths then consists of (1) solving the SR eikonal
equation in order to obtain a distance map from g0 to any other point in SE(2) and
(2) performing gradient descent on the distance map from g1 back to g0 to obtain the
geodesic (cf. Theorem 1). The numerical computation of step 1 can, for example,
be done in an iterative upwind scheme with left-invariant finite-difference stencils
(Bekkers et al. 2015) or via very efficient fast-marching schemes (Mirebeau and
Table 1 Comparison of successful vessel extractions (Bekkers et al. 2017) via Riemannian
geodesics using 2D isotropic metric tensors in the image domain, Riemannian geodesics in
the lifted domain SE(2) of orientation scores with spatially isotropic metric tensors and sub-
Riemannian geodesics in SE(2)
Metric Nr of successful vessel extractions
Riemannian R2 71.7% (132/184)
Riemannian SE(2) - Eq. (33) 82.6% (152/184)
Sub-Riemannian SE(2) - Eq. (34) 92.4% (170/184)
Portegies 2019; Mirebeau 2014) in which the sub-Riemannian metric tensor field is
approximated with a highly anisotropic Riemannian metric tensor field (Sanguinetti
et al. 2015).
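To sketch the cost construction that precedes these two steps, i.e. turning a crossing-preserving vesselness or line-fidelity measure on SE(2) into a cost C that is low on vessels and high on background, one may proceed as follows; the functional form and parameter values are illustrative choices, not the exact ones from the cited works.

```python
import numpy as np

def cost_from_vesselness(V, lam=100.0, p=3.0):
    """Turn a vesselness/line-fidelity measure V on SE(2) into a mobility/cost
    C for the geodesic problem (low on vessels, high on background)."""
    Vn = (V - V.min()) / (V.max() - V.min() + 1e-12)   # normalize to [0, 1]
    return 1.0 / (1.0 + lam * Vn**p)
```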
Exemplary results are given in Fig. 13, and a quantitative evaluation of the benefit
of sub-Riemannian versus Riemannian metrics is given in Table 1. The principle
that in a sub-Riemannian framework we only consider natural smooth paths, as illus-
trated in Fig. 1, leads to very clear improvements for vessel tracking. The method for
computing such curvature-penalized data-adaptive SR geodesics generalizes well to
other Lie groups G and has found several high-impact applications in medical image
analysis. See, e.g. Bekkers et al. (2015) and Sanguinetti et al. (2015) for 2D vessel
tracking via SR geodesics in G = SE(2). See also Mashtakov et al. (2017) for
vessel tracking in retinal images defined on the two-sphere S 2 = SO(3)/SO(2) via
SR geodesics in SO(3).
Fig. 14 Tracking of coronary arteries in 3D X-ray: (a) test dataset with two boundary points x0, x1 ∈ R³; (b) geodesic tracking result (yellow) that is far from the ground truth (red) when applying standard geodesic tracking (Caselles et al. 1997) on R³ without lifting to M3; (c) geodesic tracking result (blue) when applying geodesic tracking in M3 using the isotropic Riemannian model (i.e. using the Finsler function with F(ẋ, ṅ)² = ξ²‖ẋ‖² + ‖ṅ‖² and ξ = 0.1); (d) geodesic tracking result when applying geodesic tracking in M3 using the sub-Riemannian model (i.e. using the asymmetric Finsler function F0+ given by (46), again with ξ = 0.1). The spherical parts of the boundary conditions p1 = (x1, n1) and p2 = (x2, n2) in (45) are automatically optimized by checking for the 'first passing front', i.e. one adjusts the source set in the eikonal PDE system (40) in Theorem 1 from the singleton {e} to the set S = {(x0, n) | n ∈ S²} and selects the minimal n1 = argmin_{n∈S²} W(x1, n) prior to backtracking (39)
These 'straight' curves have constant velocity components w.r.t. the left-invariant
frame {A_i}_{i=1}^{n}, and their projections to Rd are circles/spirals whose curvature κ can
directly be computed; e.g. in SE(2), one has κ = c³ sign(c¹) / √(|c¹|² + |c²|²).
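A one-line implementation of this curvature formula (with the component ordering (c¹, c², c³) = (forward, sideways, angular) as an assumption) reads:

```python
import numpy as np

def projected_curvature(c):
    """Curvature of the spatial projection of an exponential curve in SE(2)
    with constant left-invariant components c = (c1, c2, c3)."""
    c1, c2, c3 = c
    return c3 * np.sign(c1) / np.sqrt(c1**2 + c2**2)
```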
Akin to the vessel enhancement techniques via orientation scores of Zhang et al.
(2016) and Hannink et al. (2014), a confidence measure for the presence of a line
structure can be extracted from the left-invariant Hessian Franken et al. (2007),
Franken and Duits (2009) and Bekkers et al. (2015). Together, the confidence and
Fig. 15 Top row: Via exponential curve fits in orientation scores (cf. section “Straight Curve Fits”), we are able to locally analyse line structures and compute their corresponding curvature values, as well as assigning confidence scores at each position and orientation. In the rightmost figure, curvature is colour coded and confidence is encoded with opacity. Bottom row: confidence and curvature projected to the 2D plane and visualized as an overlay on top of the original input image. From these, summarizing statistics such as the mean and standard deviation of the absolute curvature can be computed, which can be used as biomarkers for diabetes and hypertension
curvature measure can be used to obtain summarizing statistics for the amount
of tortuosity of blood vessels in medical images, as is illustrated in Fig. 15.
Such tortuosity measures are significantly associated with severity of diabetes and
hypertension on large-scale clinical datasets with retinal images (Bekkers et al.
2015; Bekkers 2017; Zhang et al. 2017; Zhu et al. 2016, 2020). For quantification
of blood vessel tortuosity in 3D medical image data, see Janssen et al. (2017).
Two key ideas have greatly improved techniques for image enhancement and
denoising: the lifting of image data to multi-orientation distributions (e.g. orientation
scores, Duits 2005) and the application of nonlinear PDEs such as total
variation flow (TVF) and mean curvature flow (MCF). These two ideas were
combined by Chambolle and Pock (2018) (for TVF) and by Citti and Sarti (2006)
(for MCF) for 2D images.
In our recent works Duits et al. (2019) and Smets et al. (2019), these approaches
were extended to enhance and denoise images of arbitrary dimension. The TV
flows and MC flows on Md showed the best results when using locally adaptive
frames of a specific type, namely those computed via the best exponential curve fit
procedure (i.e. the 'straight curve' fit in the torqued and curved space SE(d); recall
Theorem 1 and Figs. 1 and 7) explained in section “Straight Curve Fits”. Then the
standard procedure mentioned in section “A Single Exponential Curve Fit Gives Rise
to a Gauge Frame” to compute the induced locally adaptive frame ('gauge frame')
{B1, . . . , Bn} is applied. The principal direction Bd, tangent to the exponential curve,
is computed as the singular vector with smallest singular value in the SVD of the
Hessian induced by the Lie-Cartan connection with ν = 1; recall Definition 6.
For an illustration, recall Fig. 12 where d = 2 and n = 3.
In this section, we constrain ourselves to d = 2, and we shall summarize the MCF
and TVF PDEs on SE(2) (for crossing-preserving flows via invertible orientation
scores, recall Fig. 1) and highlight a denoising result, where we compare to a popular
denoising method called ‘Block Matching 3D’ (BM3D) (Lebrun 2012; Dabov et al.
2007).
The PDE system for MCF and TVF on M2 = SE(2) via the gauge frame
{B1, . . . , B3} is best expressed in this frame and is given by

⎧ (∂W/∂t)(g, t) = ‖∇W(g, t)‖^a ∑_{i=1}^{3} B_i ( B_i W(·, t) / ‖∇W(·, t)‖ )(g),   g ∈ SE(2), t ≥ 0,
⎩ W(g, 0) = U(g),   g ∈ SE(2),                                                     (56)

with parameter a ∈ {0, 1}, where we have a total variation flow (TVF) if a = 0
and a mean curvature flow (MCF) if a = 1. We denote the operator that maps the
orientation score U(·) to its denoised version W(·, t) by Φ_t:

f ↦ W_ψ f ↦ Φ_t(W_ψ f) ↦ f_t(·) := ∫_{−π}^{π} Φ_t(W_ψ f)(·, θ) dθ.   (57)
• Denoising via orientation scores is beneficial over direct image denoising. For
PDE-based image processing, this was already done in Franken and Duits (2009),
and by others in Citti et al. (2016), Citti and Sarti (2006), Baspinar (2018),
Boscain et al. (2018), and Bertalmío et al. (2019), performing left-invariant
PDE-based image processing via 'orientation liftings' (expanding the image
domain to Md). However, our experiments, in which we use (data-driven) TVF and
MCF (56) on M2, now show that we considerably improve quantitative results
compared to the purely left-invariant flows
Fig. 16 Comparing Gauge TVF with coherence enhancement and BM3D against correlated noise.
Top row, from left to right: (1) original image, (2) original image polluted with correlated Gaussian
noise, (3) denoising result using the BM3D method, (4) denoising result using the TVF method via
invertible orientation scores given by (57) relying on PDE (56) with a = 0. Bottom row, the same as
the top row but now applied on a different image containing collagen fibres. The standard deviation
for BM3D and evolution time for TVF were adjusted to reach optimal L2 error; see Smets et al.
(2019) for details
⎧ (∂W/∂t)(p, t) = ‖∇W(p, t)‖^a ∑_{i=1}^{3} A_i ( A_i W(·, t) / ‖∇W(·, t)‖ )(p),   p ∈ M2, t ≥ 0,
⎩ W(p, 0) = U(p),   p ∈ M2,                                                        (58)
as done also in Citti et al. (2016) and Chambolle and Pock (2011). Gauge TVF
performs better than normal left-invariant TVF, and Gauge MCF performs better than
normal left-invariant MCF via invertible orientation scores (57); a minimal
discretization sketch of the left-invariant flow (58) is given below.
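For concreteness, a toy explicit discretization of the purely left-invariant flow (58) with a = 0 (TVF) could look as follows. Unit spatial grid spacing, ξ = 1, an ε-regularized norm, and simple central/periodic differences are all simplifying assumptions; this is not the discretization used in the cited works, and the explicit time step must be kept small relative to ε for stability.

```python
import numpy as np

def _left_invariant_derivs(W, cos_t, sin_t, dtheta):
    """Central-difference approximations of A1 W, A2 W, A3 W on SE(2),
    with periodic differences in the angular direction."""
    dWdx, dWdy = np.gradient(W, axis=(0, 1))
    dWdth = (np.roll(W, -1, axis=2) - np.roll(W, 1, axis=2)) / (2 * dtheta)
    A1W = cos_t * dWdx + sin_t * dWdy
    A2W = -sin_t * dWdx + cos_t * dWdy
    A3W = dWdth
    return A1W, A2W, A3W

def tv_flow_se2(U, n_steps=100, dt=0.001, eps=0.1):
    """Toy explicit scheme for the left-invariant TV flow (58) with a = 0 on a
    discrete orientation score U of shape (Nx, Ny, Ntheta)."""
    W = U.astype(float).copy()
    n_theta = U.shape[2]
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    cos_t, sin_t = np.cos(thetas)[None, None, :], np.sin(thetas)[None, None, :]
    dtheta = 2 * np.pi / n_theta
    for _ in range(n_steps):
        A1W, A2W, A3W = _left_invariant_derivs(W, cos_t, sin_t, dtheta)
        norm = np.sqrt(A1W**2 + A2W**2 + A3W**2 + eps**2)
        F = [A1W / norm, A2W / norm, A3W / norm]
        # divergence-type term: sum_i A_i (A_i W / ||grad W||)
        div = 0.0
        for i, Fi in enumerate(F):
            derivs = _left_invariant_derivs(Fi, cos_t, sin_t, dtheta)
            div = div + derivs[i]
        W = W + dt * div
    return W
```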
These observations are also supported by many more experiments with both
quantitative and qualitative comparisons for the cases d = 2 and d = 3; see Smets
et al. (2019). Regarding related works and experiments via crossing-preserving
diffusions via invertible orientation scores, we refer to Franken and Duits (2009)
(d = 2) and Janssen et al. (2018) (d = 3).
In Smets et al. (2019), we have compared the (crossing-preserving) TVF and
MCF PDE flows via invertible orientation scores to (crossing-preserving) nonlinear
diffusions via invertible orientation scores. In general, better results are obtained
by the MCF and TVF approach than with nonlinear diffusion (Perona and Malik
1990) or coherence-enhancing diffusion (Weickert 1999). However, edge-enhancing
diffusion techniques (Fabbrini et al. 2013) via invertible orientation scores might
challenge this conclusion; they are left for future work.
Fig. 17 Qualitative comparison of denoising an FODF obtained by constrained spherical deconvolution (CSD) (Tournier et al. 2007; Descoteaux et al. 2009) from a standard DW-MRI dataset (with b = 1000 s/mm² and 54 gradient directions). For the CSD, we used up to eighth-order spherical harmonics, and the FODF is then spherically sampled on a tessellation of the icosahedron with 162 orientations. The image is taken from our previous journal article (Smets et al. 2019). For details on this qualitative DW-MRI experiment and related quantitative DW-MRI denoising experiments, see the works by St.-Onge et al. (2019) and Smets et al. (2019)
where surgeons should not damage the optic radiation bundle as this can lead to
a reduction of visual sight. Left-invariant PDE evolutions (such as diffusions) on
M3 discussed in Duits and Franken (2011), Portegies (2018), Reisert and Kiselev
(2011), Momayyez-Siahkal and Siddiqi (2009), and Duits et al. (2013) are very
beneficial for identifying such bundles, as shown by Meesters et al. (2017) and more
generally by Prckovska et al. (2015).
As we can see in Fig. 17, the FODF obtained from raw DW-MRI data via CSD, an effective
and widely used method, produces many spurious peaks in the spatial field
of angular distributions that are not well aligned with or supported by neighbouring peaks,
and one needs 'contextual processing' (Prčkovska et al. 2015; Momayyez-Siahkal
and Siddiqi 2009; Reisert and Kiselev 2011) to identify large bundles (Portegies
et al. 2015; Meesters et al. 2017) in a stable way. Here we observe that crossing-
preserving MCF and TVF on M3 better preserve crossings and bundle boundaries
than diffusion methods do. For detailed evaluations, see Smets et al. (2019) and
St Onge et al. (2019).
Conclusion
Fig. 18 3D X-ray image of renal arteries; three viewpoints on the same scene. Input image
in yellow frame; output of coherence-enhancing diffusion via 3D orientation scores (CEDOS,
Eq. (59)) for a fixed stopping time. For details and comparisons to other methods, such as
coherence-enhancing diffusion (Weickert 1999) acting directly in the image domain, see Janssen et al. (2018)
The family of all geodesics γ(t), augmented to v(t) = (γ(t), λ(t)) with their momentum representation λ(t) = ∑_{i=1}^{n} λ_i(t) ω^i|_{γ(t)} along the geodesic, are flow lines of a so-called Hamiltonian flow on the co-tangent bundle T∗(G). Controlling the Hamiltonian flow means controlling the complete family of all geodesics (minimal distance curves) together. Next, we explain the concept of Hamiltonian flows and derive the canonical Hamiltonian equations associated to the left-invariant Riemannian and sub-Riemannian problems of interest.
To a Hamiltonian function h one associates a Hamiltonian vector field h⃗ (or 'Hamiltonian lift') in the co-tangent bundle. It is determined via the fundamental symplectic form, which is given by

σ = ∑_{i=1}^{n} ω^i ∧ d̄λ_i,
∀_{V = (ġ, λ̇) ∈ T_g(G) × T(T_g∗(G))}:   σ( h⃗(g, λ), V ) = ⟨dh(g, λ), V⟩.   (60)

In particular, the Hamiltonian is conserved along the flow lines:

(d/dt) h(v(t)) = σ( h⃗(v(t)), h⃗(v(t)) ) = 0,   with v(t) = (γ(t), λ(t)).
Furthermore, the lifting of a Hamiltonian h to its Hamiltonian lift h⃗ is a Lie algebra isomorphism (Agrachev and Sachkov 2004): the lift of a Poisson bracket is the Lie bracket of the lifts,

( {h1, h2} )⃗ = [ h⃗1, h⃗2 ],   (61)
where {·, ·} denotes Poisson brackets and [·, ·] denotes the usual Lie bracket of vector fields. In the left-invariant (co-)frames, Poisson brackets are expressed as

{g, f} = ∑_{i=1}^{n} ( (∂g/∂λ_i)(A_i f) − (∂f/∂λ_i)(A_i g) ),   (62)

but this may also be expressed in canonical coordinates (Agrachev and Sachkov 2004, eq. 11.21).
Generalizing the above example, the next theorem provides the Hamiltonian flows for the left-invariant Riemannian and sub-Riemannian problems on G. The Hamiltonian for the Riemannian problem equals

h = ½ ∑_{i=1}^{n} λ^i λ_i = ½ ∑_{i,j=1}^{n} g^{ij} λ_i λ_j,   (63)

and the corresponding Hamiltonian flow (generated by the Hamiltonian vector field h⃗) can be written as (recall the definition of the linear map G̃ in (27))
v̇ = h⃗(v)  ⇔  ⎧ G̃⁻¹ λ = γ̇                  (horizontal part)
              ⎩ ∇^{[1],∗}_{γ̇} λ = 0          (vertical part)

           ⇔  ⎧ γ̇^i = u^i = λ^i := ∑_{j=1}^{n} g^{ij} λ_j                    (horizontal part)
              ⎩ λ̇_i = {h, λ_i} = − ∑_{j,k=1}^{n} c^k_{ij} λ_k u^j            (vertical part)       (64)

with velocity controls u^i := γ̇^i = ⟨ω^i|_{γ(·)}, γ̇⟩ and v(t) = (γ(t), λ(t)) a curve in the co-tangent bundle T∗(G), where the geodesic γ(t) ∈ G and the momentum along the geodesic λ(t) ∈ T∗_{γ(t)}(G), and with {·, ·} denoting Poisson brackets; recall (62).
The Hamiltonian on the sub-Riemannian manifold (G, Δ = span{A_j}_{j∈I}, G0) equals

h = ½ ∑_{i∈I} λ^i λ_i = ½ ∑_{i,j∈I} g^{ij} λ_j λ_i.   (65)
Proof. The results (64) and (66) follow from a standard application of the Pontryagin
maximum principle (PMP; Agrachev and Sachkov 2004) to the Riemannian and
sub-Riemannian geodesic problems, respectively. First of all, we note that the
Hamiltonian in the Riemannian case (63) is computed by applying the Fenchel
transform to the integrand of the action functional (i.e. the squared Lagrangian):
h(g, λ) = sup_{γ̇ ∈ T_g(G)} { ⟨λ, γ̇⟩ − L²(g, γ̇) }   with λ = ∑_{i=1}^{n} λ_i ω^i|_g ∈ T_g∗(G);

hence, we get the Hamiltonian h : T∗(G) → R⁺ given by

h = max_{(v¹,...,vⁿ)} ( ∑_{i=1}^{n} λ_i v^i − ½ ∑_{i,j=1}^{n} v^i v^j g_{ij} ) = ½ ∑_{i,j=1}^{n} λ_i g^{ij} λ_j = ½ ∑_{i=1}^{n} λ_i λ^i,   (67)

with λ^i = ∑_{j=1}^{n} g^{ij} λ_j. The Hamiltonian in the SR case (65) comes with the constraint γ̇ ∈ Δ (i.e. γ̇^i = 0 if i ∉ I), and then, with a similar type of reasoning as above (but with v^i = 0 if i ∉ I), we get h = ½ ∑_{i∈I} λ^i λ_i with λ^i = ∑_{j∈I} g^{ij} λ_j, and we find the 'extremal controls' (Agrachev and Sachkov 2004): v^i_max = u^i = λ^i.
Note that (64) and (66) are of the form a ⇔ b ⇔ c. We first comment on a ⇔ c
and then show b ⇔ c.
a ⇔ c follows by direct computation, as we show next. We have the following relation in Poisson brackets:

[A_i, A_j] = ∑_{k=1}^{n} c^k_{ij} A_k   ⇔   {λ_i, λ_j} = A_i λ_j − A_j λ_i = ∑_{k=1}^{n} c^k_{ij} λ_k,

as the 'conjugate momentum mapping' gives rise to a Lie algebra morphism; see Agrachev and Sachkov (2004, p.164). Therefore (via (62), (67)), we find (with Liouville's theorem and c^k_{ij} = −c^k_{ji}):
which hold for i = 1, . . . , n in the Riemannian case and for i ∈ I in the sub-
Riemannian case. In the above expression, one must set J = {1, . . . , n} in the
Riemannian case and J = I in the sub-Riemannian case.
b ⇔ c follows by (68), once the expression (25) for the Lie-Cartan connection (with
ν = 1) and the expression (32) for the partial Lie-Cartan connection (again with ν = 1),
respectively, are written out in left-invariant coordinates.
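To illustrate how the canonical equations (64) can be integrated in practice, the following Python sketch (an illustration under stated assumptions, not code from the cited works) integrates the Hamiltonian flow for the left-invariant metric diag(ξ², ξ²/ζ², 1) on SE(2) with uniform cost; small ζ approximates the sub-Riemannian problem.

```python
import numpy as np
from scipy.integrate import solve_ivp

def se2_hamiltonian_flow(lam0, T, xi=1.0, zeta=0.1, g0=(0.0, 0.0, 0.0)):
    """Integrate the canonical equations (64) on SE(2) with uniform cost C = 1
    and metric diag(xi^2, xi^2/zeta^2, 1) in the left-invariant frame.
    State = (x, y, theta, l1, l2, l3)."""
    def rhs(t, s):
        x, y, th, l1, l2, l3 = s
        # extremal controls u^i = g^{ij} lambda_j
        u1, u2, u3 = l1 / xi**2, (zeta**2 / xi**2) * l2, l3
        # horizontal part: gamma_dot = sum_i u^i A_i|gamma
        xd = u1 * np.cos(th) - u2 * np.sin(th)
        yd = u1 * np.sin(th) + u2 * np.cos(th)
        thd = u3
        # vertical part: lambda_dot_i = -sum c^k_{ij} lambda_k u^j, using
        # [A3, A1] = A2, [A3, A2] = -A1, [A1, A2] = 0 on SE(2)
        l1d = l2 * u3
        l2d = -l1 * u3
        l3d = l1 * u2 - l2 * u1
        return [xd, yd, thd, l1d, l2d, l3d]

    s0 = [*g0, *lam0]
    sol = solve_ivp(rhs, (0.0, T), s0, max_step=0.01, dense_output=True)
    return sol   # sol.y[:3] is the geodesic, sol.y[3:] the momentum

# Example: a (nearly) sub-Riemannian geodesic starting at the origin e
# sol = se2_hamiltonian_flow(lam0=(1.0, 0.0, 0.5), T=3.0)
```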
We need two charts to cover SO(3). When using ZYZ-Euler-angle coordinates (α, β, γ) for SE(3) = R³ ⋊ SO(3) in the first chart, the left-invariant vector fields are given by

A1|g = (cos α cos β cos γ − sin α sin γ) ∂x + (sin α cos γ + cos α cos β sin γ) ∂y − cos α sin β ∂z,
A2|g = (− sin α cos β cos γ − cos α sin γ) ∂x + (cos α cos γ − sin α cos β sin γ) ∂y + sin α sin β ∂z,
A3|g = sin β cos γ ∂x + sin β sin γ ∂y + cos β ∂z,
A4|g = cos α cot β ∂α + sin α ∂β − (cos α / sin β) ∂γ,
A5|g = − sin α cot β ∂α + cos α ∂β + (sin α / sin β) ∂γ,
A6|g = ∂α.   (70)
The above formulas do not hold for β = π or β = 0: we need a second chart (Duits and Franken 2011):

g = (x, y, z, R_{e_x,γ̃} R_{e_y,β̃} R_{e_z,α}),   with β̃ ∈ [−π, π), α ∈ [0, 2π), γ̃ ∈ (−π/2, π/2).   (71)
Then the left-invariant vector field formulas are (for |β̃| ≠ π/2) given by

A1|g = cos α cos β̃ ∂x + (cos γ̃ sin α + cos α sin β̃ sin γ̃) ∂y + (sin α sin γ̃ − cos α sin β̃ cos γ̃) ∂z,
A2|g = − sin α cos β̃ ∂x + (cos α cos γ̃ − sin α sin β̃ sin γ̃) ∂y + (sin α sin β̃ cos γ̃ + cos α sin γ̃) ∂z,
A3|g = sin β̃ ∂x − cos β̃ sin γ̃ ∂y + cos β̃ cos γ̃ ∂z,
A4|g = − cos α tan β̃ ∂α + sin α ∂β̃ + (cos α / cos β̃) ∂γ̃,
A5|g = sin α tan β̃ ∂α + cos α ∂β̃ − (sin α / cos β̃) ∂γ̃,
A6|g = ∂α.   (72)
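The formulas (70) of the first chart can be transcribed directly into code; the sketch below (an illustration, not code from the cited works) returns the coefficient matrix of A1, ..., A6 with respect to (∂x, ∂y, ∂z, ∂α, ∂β, ∂γ), and is only valid away from β ∈ {0, π}.

```python
import numpy as np

def left_invariant_frame_se3(alpha, beta, gamma):
    """Coefficient matrix of the left-invariant vector fields (70) on SE(3) in
    the first ZYZ-Euler-angle chart: row i gives the components of A_{i+1}
    w.r.t. (d/dx, d/dy, d/dz, d/dalpha, d/dbeta, d/dgamma)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    A = np.zeros((6, 6))
    A[0, :3] = [ca * cb * cg - sa * sg,  sa * cg + ca * cb * sg, -ca * sb]
    A[1, :3] = [-sa * cb * cg - ca * sg, ca * cg - sa * cb * sg,  sa * sb]
    A[2, :3] = [sb * cg, sb * sg, cb]
    A[3, 3:] = [ca * cb / sb,  sa, -ca / sb]
    A[4, 3:] = [-sa * cb / sb, ca,  sa / sb]
    A[5, 3] = 1.0
    return A
```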
Proof of Lemma 2

∇^[0]_γ̇ Y(g) = ∑_{i,k=1}^{n} x^i A_i|_g(y^k) A_k|_g = ∑_{k=1}^{n} X|_g(y^k) A_k|_g
            = lim_{t→0} ∑_{k=1}^{n} ( ( y^k(γ(t)) − y^k(g) ) / t ) A_k|_g
            = lim_{t→0} ( ∑_{k=1}^{n} y^k(γ(t)) (L_{gγ(t)⁻¹})∗ A_k|_{γ(t)} − Y(g) ) / t
            = lim_{t→0} ( (L_{gγ(t)⁻¹})∗ Y(γ(t)) − Y(g) ) / t.
Now let X, Y be left-invariant. Note that ∇^[0]_X Y = 0, because (L_{g(γ(t))⁻¹})∗ Y(γ(t)) = Y(g) in (14) regardless of γ. Then the alternative formula (15) for the general Lie-Cartan connection ∇^[ν] follows, as the structure constants c^k_{ij} ∈ R satisfy ∑_k c^k_{ij} A_k = [A_i, A_j] and the Lie bracket is bilinear for left-invariant vector fields, and we find ∇^[ν]_X Y = ∇^[0]_X Y + ν[X, Y] = ν[X, Y].
For our reformulation in (15), we used (17): (Ad)∗(X_g)(Y_g) = [X_g, Y_g], which we show next. By the derivation in Jost (2011, Lemma 5.4.2), one has (Ad)∗(X_e)(Y_e) = [X_e, Y_e]. Now the Cartan-Maurer form is a Lie algebra isomorphism, and we get (17).
Proof of Lemma 3

T_{∇^[ν]}(X, Y) = ∇^[ν]_X Y − ∇^[ν]_Y X − [X, Y] = ν[X, Y] − ν[Y, X] − [X, Y] = (2ν − 1)[X, Y].

R_{∇^[ν]}(X, Y) Z = ∇^[ν]_X ∇^[ν]_Y Z − ∇^[ν]_Y ∇^[ν]_X Z − ∇^[ν]_{[X,Y]} Z
                 = ν² ( [X, [Y, Z]] − [Y, [X, Z]] ) − ν [[X, Y], Z] = ν(ν − 1) [[X, Y], Z].
References
Agrachev, A.A., Sachkov, Y.L.: Control Theory from the Geometrical Viewpoint, Vol 87. Springer
(2004)
Agrachev, A., Barilari, D., Boscain, U.: A Comprehensive Introduction to Sub-Riemannian
Geometry. CUP Cambridge Studies in Advanced Mathematics (2020)
Ali, S., Antoine, J., Gazeau, J.: Coherent States, Wavelets and Their Generalizations. Springer,
New York/Berlin/Heidelberg (1999)
Barbieri, D., Citti, G., Cocci, G., Sarti, A.: A cortical-inspired geometry for contour perception and
motion integration. J. Math. Imaging Vision 49(3), 511–529 (2014)
Baspinar, E.: Minimal surfaces in Sub-Riemannian structures and functional geometry of the visual
cortex. Ph.D. thesis, University of Bologna (2018)
Bekkers, E.: Retinal Image Analysis using Sub-Riemannian Geometry in SE(2). Ph.D. thesis,
Eindhoven University of Technology (2017) cum laude (≤ 5% best at TU/e). https://pure.tue.
nl/ws/files/52750592/20170123_Bekkers.pdf
Bekkers, E., Duits, R., Berendschot, T., Haar Romeny, B.: A multi-orientation analysis approach
to retinal vessel tracking. JMIV 49(3), 583–610 (2014)
Bekkers, E., Zhang, J., Duits, R., ter Haar Romeny, B.: Curvature based biomarkers for diabetic
retinopathy via exponential curve fits in se(2). In: Chen, X.E.A. (ed.) Proceedings of the
Ophthalmic Medical Image Analysis International Workshop, Oct 113–120 (2015)
Bekkers, E., Duits, R., Mashtakov, A., Sanguinetti, G.: A PDE approach to data-driven sub-
Riemannian geodesics in SE(2). SIAM J. Imag. Sci. 8(4), 2740–2770 (2015)
Bekkers, E., Duits, R., Mashtakov, A., Sachkov, Y.: Vessel tracking via sub-Riemannian geodesics
on R2 P 1 . LNCS Proc. Geom. Sci. Inf. GSI 2017 10589, 1611–3349 (2017)
Bekkers, E.J., Chen, D., Portegies, J.M.: Nilpotent approximations of sub-Riemannian distances
for fast perceptual grouping of blood vessels in 2D and 3D. arXiv:1707.02811 [math], July
(2017) arXiv: 1707.02811
Bekkers, E., Lafarge, M., Veta, M., Eppenhof, K., Pluim, J., Duits, R.: Roto-translation covariant
convolutional networks for medical image analysis. In: Frangi, F., et al. (ed.) Medical Image
Computing and Computer Assisted Intervention – MICCAI 2018, pp. 440–448. Springer
International Publishing, Cham (2018)
Bekkers, E., Loog, M., ter Haar Romeny, B., Duits, R.: Template matching via densities on the
roto-translation group. IEEE Trans. Pattern Anal. Mach. Intell. 40, 452–466 (2018)
Bertalmío, M., Calatroni, L., Franceschi, V., Franceschiello, B., Prandi, D.: A cortical-inspired
model for orientation-dependent contrast perception: A link with wilson-cowan equations.
In: Lellmann, J., Burger, M., Modersitzki, J., (eds.) Scale Space and Variational Methods in
Computer Vision, pp. 472–484. Springer International Publishing, Cham (2019)
Boscain, U., Chertovskih, R., Gauthier, J.-P., Prandi, D., Remizov, A.: Cortical-inspired image
reconstruction via sub-Riemannian geometry and hypoelliptic diffusion. arXiv:1801.03800
(2018)
Bosking, W.H., Zhang, Y., Schofield, B., Fitzpatrick, D.: Orientation selectivity and the arrange-
ment of horizontal connections in tree shrew striate cortex. J. Neurosci. 17, 2112–2127 (1997)
Bryant, R., Griffiths, P.: Reduction for constrained variational problems and ∫ (1/2) κ² ds. Am. J.
Math. 108(3), 525–570 (1986)
Bryant, R., Griffiths, P., Grossman, D.: Exterior Differential Systems and Euler-Lagrange Partial
Differential Equations. Chicago Lectures in Mathematics, Chicago and London (2003)
Cartan, É.: Sur une classe remarquable d’espaces de riemann. Bulletin de la Société Mathématique
de France 54, 214–264 (1926)
Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22(1), 61–79
(1997)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: Total roto-translation variation. Arxiv, 1–47, July (2018)
Chen, D.: New minimal paths models for tubular structure extraction and image segmentation.
Ph.D. thesis, Université Paris Dauphine, PSL Research University (2016)
Chen, D., Cohen, L.: Fast asymmetric fronts propagation for image segmentation. J. Math. Imaging
Vision 60, 766–783 (2018)
Chirikjian, G.S., Kyatkin, A.B.: Engineering Applications of Noncommutative Harmonic Analy-
sis: With Emphasis on Rotation and Motion Groups. CRC Press, Boca Raton (2001)
Citti, G., Sarti, A.: A cortical based model of perceptional completion in the roto-translation space.
J. Math. Imaging Vision 24(3), 307–326 (2006)
Citti, G., Sarti, A.: Models of the Visual Cortex in Lie Groups, pp. 1–55. Springer, Basel (2015)
Citti, G., Franceschiello, B., Sanguinetti, G., Sarti, A.: Sub-Riemannian mean curvature flow for
image processing. SIIMS 9(1), 212–237 (2016)
Cogliati, A., Mastrolia, P.: Cartan, Schouten and the search for connection. Hist. Math. 45(1), 39–
74 (2018)
Cohen, T., Welling, M.: Group equivariant convolutional networks. In: Proceedings of the 33rd
International Conference on Machine Learning, Vol. 48, pp. 1–12 (2016)
Crandall, M., Lions, P.-L.: Viscosity solutions of hamilton-jacobi equations. Trans. A.M.S. 277(1),
1–42 (1983)
Creusen, E., Duits, R., Dela Haije, T.: Numerical schemes for linear and non-linear enhancement
of dw-mri. In: International Conference on Scale Space and Variational Methods in Computer
Vision, pp. 14–25. Springer, Berlin/Heidelberg (2011)
Creusen, E., Duits, R., Vilanova, A., Florack, L.: Numerical schemes for linear and non-
linear enhancement of DW-MRI. Numer. Math. Theory Meth. Appl. 6(1), 138–168
(2013)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3d transform-domain
collaborative filtering. IEEE Trans. Image Processing 16(8), 2080–2095 (2007)
Descoteaux, M., Deriche, R., Knosche, T.R., Anwander, A.: Deterministic and probabilistic
tractography based on complex fibre orientation distributions. IEEE Trans. Med. Imaging 28(2),
269–286 (2009)
Dieudonné, J.: Treatise on Analysis, V. AP, New York (1977)
Duits, R.: Perceptual organization in image analysis. Ph.D. thesis, Eindhoven University of
Technology, Department of Biomedical Engineering (2005)
Duits, R., Bekkers, E.: Lecture notes of the course Differential Geometry for Image Processing.
Part II: Invertible Orientation Scores. tech. rep., TU/e Dep. of Mathematics and Computer
Science (2020). www.win.tue.nl/~rduits/partIIversion1.pdf
Duits, R., Franken, E.M.: Left invariant parabolic evolution equations on SE(2) and contour
enhancement via invertible orientation scores, part I: Linear left-invariant diffusion equations
on SE(2). Q. Appl. Math. 68, 255–292 (2010a)
Duits, R., Franken, E.M.: Left invariant parabolic evolution equations on SE(2) and contour
enhancement via invertible orientation scores, part II: Nonlinear left-invariant diffusion equa-
tions on invertible orientation scores. Q. Appl. Math. 68, 293–331 (2010b)
Duits, R., Franken, E.M.: Left-invariant diffusions on the space of positions and orientations and
their application to crossing-preserving smoothing of HARDI images. Int. J. Comput. Vis. 92,
231–264 (2011)
Duits, R., Felsberg, M., Granlund, G., ter Haar Romeny, B.M.: Image analysis and reconstruction
using a wavelet transform constructed from a reducible representation of the Euclidean motion
group. Int. J. Comput. Vis. 79(1), 79–102 (2007)
Duits, R., Fuehr, H., Janssen, B., Florack, L., van Assen, H.: Evolution equations on gabor
transforms and their applications. ACHA 35(3), 483–526 (2013)
Duits, R., Creusen, E., Ghosh, A., Dela Haije, T.: Morphological and linear scale spaces for fiber
enhancement in DW-MRI. J. Math. Imaging Vision 46, 326–368 (2013)
Duits, R., Janssen, M.H., Hannink, J., Sanguinetti, G.R.: Locally adaptive frames in the roto-
translation group and their applications in medical imaging. J. Math. Imaging Vis. 56(3),
367–402 (2016)
Duits, R., Ghosh, A., Dela Haije, T., Mashtakov, A.: On sub-Riemannian geodesics in SE(3) whose
spatial projections do not have cusps. J. Dyn. Control. Syst. 22(4), 771–805 (2016)
Duits, R., Meesters, S.P.L., Mirebeau, J.-M., Portegies, J.M.: Optimal paths for variants of the 2D
and 3D Reeds-Shepp car with applications in image analysis. JMIV 60, 816–848 (2018)
Duits, R., St-Onge, E., Portegies, J., Smets, B.: Total variation and mean curvature PDEs on the
space of positions and orientations. In: Lellmann, J., Modersitzki, J., Burger, M. (eds.) Scale
Space and Variational Methods in Computer Vision – 7th International Conference, SSVM
2019, Proceedings, Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 211–223. Springer, 6 (2019)
Duits, R., Bekkers, E.J., Mashtakov, A.: Fourier transform on the homogeneous space of 3d
positions and orientations for exact solutions to linear PDEs. Entropy: Special Issue: Joseph
Fourier 250th Birthday: Modern Fourier Analysis and Fourier Heat Equation in Information
Sciences for the XXIst century, Vol. 21, no. 1, pp. 1–38 (2019)
Evans, L.C.: Partial Differential Equations. American Mathematical Society, Providence (2010)
Fabbrini, L., Greco, M., Messina, M., Pinelli, G.: Improved edge enhancing diffusion filter for
speckle-corrupted images. IEEE Geosci. Remote Sens. Lett. 11(1), 99–103 (2013)
Felsberg, M.: Adaptive Filtering Using Channel Representations, pp. 31–48. Springer, London
(2012)
Felsberg, M., Forssen, P.-E., Scharr, H.: Channel smoothing: Efficient robust smoothing of low-
level signal features. IEEE Trans. Pattern Anal. Mach. Intell. 28, 209–222 (2006)
Forssen, P.-E.: Low and Medium Level Vision using Channel Representations. Ph.D. thesis,
Linkoping University, Sweden (2004) Dissertation No. 858, ISBN 91-7373-876-X
Franceschiello, B., Mashtakov, A., Citti, G., Sarti, A.: Geometrical optical illusion via sub-
riemannian geodesics in the roto-translation group. Differ. Geom. Appl. 65, 55–77
(2019)
Frangi, A., et al.: Multiscale vessel enhancement filtering. In: Proceedings of Medical Image Com-
puting and Computer-Assisted Intervention: Lecture Notes in Computer Science, Vol. 1496,
pp. 130–137 (1998)
Franken, E.M.: Enhancement of crossing elongated structures in images. Ph.D. thesis, Department
of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, October (2008)
cum laude and selected for promotion prize (≤ 2% best at TU/e)
Franken, E.M., Duits, R.: Crossing preserving coherence-enhancing diffusion on invertible orien-
tation scores. Int. J. Comput. Vis. 85(3), 253–278 (2009)
Franken, E.M., Duits, R., ter Haar Romeny, B.M.: Curvature estimation for enhancement of
crossing curves. In: Niessen, W., Westin, C.F., Nielsen, M. (eds.) Digital Proceedings of the 8th
IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis
(MMBIA), held in conjuction with the IEEE International Conference on Computer Vision
(Rio de Janeiro, Brazil) , pp. 1–8, Omnipress, Oct (2007) Awarded the MMBIA 2007 best
paper award
Fuehr, H.: Abstract Harmonic Analysis of Continuous Wavelet Transforms. Springer, Heidel-
berg/New York (2005)
Grossmann, A., Morlet, J., Paul, T.: Integral transforms associated to square integrable representa-
tions. J. Math. Phys. 26, 2473–2479 (1985)
ter Haar Romeny, B.M.: Front-End Vision and Multi-Scale Image Analysis: Multi-Scale Computer
Vision Theory and Applications, Written in Mathematica. Computational Imaging and Vision.
Kluwer Academic Publishers (2003)
Hannink, J., Duits, R., Bekkers, E.: Crossing-preserving multi-scale vesselness. In: G. et al. (eds.)
MICCAI vol. 8674, pp. 603–610 (2014)
Hörmander, L.: Hypoelliptic second order differential equations. Acta Math. 119, 147–171 (1968)
Janssen, M., Duits, R., Breeuwer, M.: Invertible orientation scores of 3D images. SSVM-LNCS
9087, 563–575 (2014)
Janssen, M., Dela Haije, T., Martin, F., Bekkers, E., Duits, R.: The hessian of axially symmetric
functions on se(3) and application in 3D image analysis. LNCS (2017) Submitted to SSVM
(2017)
Janssen, M.H.J., Janssen, A.J.E.M., Bekkers, E.J., Bescós, J.O., Duits, R.: Design and processing
of invertible orientation scores of 3D images. J. Math. Imaging Vision 60(9), 1427–1458 (2018)
Jost, J.: Riemannian Geometry and Geometric Analysis. Springer (2011)
Kobayashi, S., Nomizu, K.: Foundations of Differential Geometry, vol. 1. New York (1963)
Kolar, I., Slovak, J., Michor, P.: Natural operations in differential geometry. Springer (1999)
corrected version of original version in (1993)
Lebrun, M.: An analysis and implementation of the BM3D image denoising method. Image
Processing On Line 2, 175–213 (2012)
Lee, J.M., Chow, B., Chu, S.-C., Glickenstein, D., Guenther, C., Isenberg, J., Ivey, T., Knopf, D.,
Lu, P., Luo, F., et al.: Manifolds and differential geometry. Topology 643, 658 (2009)
Mantegazza, C., Mennucci, A.: Hamilton-jacobi equations and distance functions on Riemannian
manifolds. App. Math. Optim. 47(1), 1–25 (2002)
Martin, F., Duits, R.: Lie analysis homepage. http://www.lieanalysis.nl/ (2017)
Mashtakov, A., Duits, R., Sachkov, Y., Bekkers, E., Beschastnyi, I.: Tracking of lines in spherical
images via sub-Riemannian geodesics in SO(3). JMIV 58(2), 239–364 (2017)
Meesters, S., Ossenblok, P., Wagner, L., Schijns, O., Boon, P., Florack, L., Vilanova, Duits, R.:
Stability metrics for optic radiation tractography: Towards damage prediction after resective
surgery. J. Neurosci. Methods (2017). https://doi.org/10.1016/j.jneumeth.2017.05.029
Mirebeau, J.: Anisotropic fast-marching on cartesian grids using lattice basis reduction. SIAM J.
Numer. Anal. 52(4), 1573–1599 (2014)
Mirebeau, J.-M.: Fast marching methods for curvature penalized shortest paths. J. Math. Imaging
Vis. Special Issue: Orientation Analysis and Differential Geometry in Image Processing
60(6), 784–815 (2018)
Mirebeau, J., Portegies, J.: Hamiltonian fast marching: A numerical solver for anisotropic and
non-holonomic eikonal PDEs. IPOL 9, 47–93 (2019)
Momayyez-Siahkal, P., Siddiqi, K.: 3D stochastic completion fields for fiber tractography. In:
Proceedings of IEEE Computer Society Conference on Computer Vision Pattern Recognition,
pp. 178–185, June (2009)
Monti, R., Cassano, F.: Surface measures in Carnot-Carathéodory spaces. Calc. Var. 13, 339–376
(2001)
Mumford, D.: Elastica and computer vision. Algebraic Geometry and Its Applications. Springer,
pp. 491–506 (1994)
Pechaud, M., Descoteaux, M., Keriven, R.: Brain Connectivity Using Geodesics in HARDI,
pp. 482–489. Springer, Berlin/Heidelberg (2009)
Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans.
Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
Petitot, J.: The neurogeometry of pinwheels as a sub-Riemannian contact structure. J. Physiol.
Paris 97, 265–309 (2003)
Petitot, J.: Elements of Neurogeometry. Lecture Notes in Morphogenesis. Springer (2017)
Piuze, E., Sporring, J., Siddiqi, K.: Maurer-cartan forms for fields on surfaces: Application to heart
fiber geometry. IEEE Trans. Pattern Anal. Mach. Intell. 37(12), 2492–2504 (2015)
Portegies, J.: PDEs on the Lie Group SE(3) and their Applications in Diffusion-Weighted MRI.
Ph.D. thesis, Department of Mathematics and Computer Science, TU/e, February (2018)
Portegies, J.M., Fick, R.H.J., Sanguinetti, G.R., Meesters, S.P.L., Girard, G., Duits, R.: Improving
fiber alignment in HARDI by combining contextual PDE flow with constrained spherical
deconvolution. PLoS ONE 10(10) (2015). https://doi.org/10.1371/journal.pone.0138122
Portegies, J., Sanguinetti, G., Meesters, S., Duits, R.: New approximation of a scale space
kernel on SE(3) and applications in neuroimaging. In: SSVM 2015, LNCS 9087, pp. 40–52
(2015)
Portegies, J., Meesters, S., Ossenblo, P., Fuster, A., Florack, L., Duits, R.: Brain connectivity
measures via direct sub-finslerian front propagation on the 5D sphere bundle of positions and
directions, ch. 24, p. 14. Springer (2019)
Prčkovska, V., Andorrà, M., Villoslada, P., Martinez-Heras, E., Duits, R., Fortin, D., Rodrigues, P.,
Descoteaux, M.: Contextual diffusion image post-processing aids clinical applications. In: Hotz,
I., Schultz, T. (eds.) Visualization and Processing of Higher Order Descriptors for Multi-Valued
Data, Cham, pp. 353–377. Springer International Publishing (2015)
Reisert, M., Kiselev, V.G.: Fiber continuity: An anisotropic prior for ODF estimation. IEEE Trans.
Med. Imaging 30(6), 1274–1283 (2011)
Saccon, A., Aguiar, A.P., Hausler, A.J., Hauser, J., Pascoal, A.M.: Constrained motion planning
for multiple vehicles on se(3). In: 2012 IEEE 51st IEEE Conference on Decision and Control
(CDC), pp. 5637–5642, Dec (2012)
Sachkov, Y.: Maxwell strata in the Euler elastic problem. J. Dyn. Control. Syst. 14(2), 169–234
(2008)
Sachkov, Y.: Cut locus and optimal synthesis in the sub-Riemannian problem on the group of
motions of a plane. ESAIM: Control Optim. Calc. Var. 17, 293–321 (2011)
Sanguinetti, G., Bekkers, E., Duits, R., Janssen, M.H.J., Mashtakov, A., Mirebeau, J.-M.: Sub-
Riemannian Fast Marching in SE(2). Springer (2015)
Sharma, U., Duits, R.: Left-invariant evolutions of wavelet transforms on the similitude group.
ACHA 39, 110–137 (2015)
Siffre, L.: Rigid-Motion Scattering for Image Classification. Ph.D. thesis, Ecole Polytechnique,
Paris (2014)
Skibbe, H., Reisert, M.: Spherical tensor algebra: A toolkit for 3D image processing. JMIV 58,
349–381 (2017)
Smets, B.: Geometric image denoising and machine learning (cum laude). Master’s thesis,
Industrial and Applied Mathematics, CASA-TU/e, June (2019) Supervisor R.Duits. www.win.
tue.nl/~rduits/reportBartSmets.pdf
Smets, B., Duits, R., St-Onge, E., Portegies, J.: Total variation and mean curvature PDEs on the
homogeneous space of positions and orientations. Submitted to JMIV special issue (2019)
Smets, B., Portegies, J., Bekkers, E., Duits, R.: Pde-based group equivariant convolutional neural
networks. Technical report, Department of Mathematics and Computer Science TU/e, Jan
(2020)
St Onge, E., Meesters, S., Bekkers, E., Descoteaux, M., Duits, R.: Hardi denoising with mean-
curvature enhancement pde on SE(3). In: J. et al. (eds.) ISMRM Proceedings, Montreal, pp. 1–3
(2019). http://archive.ismrm.org/2019/3409.html
ter Elst, A.F.M., Robinson, D.W.: Weighted subcoercive operators on Lie groups. J. Funct.
Anal. 157, 88–163 (1998)
Tournier, J.D., Calamante, F., Connelly, A.: Robust determination of the fibre orientation dis-
tribution in diffusion mri: non-negativity constrained super-resolved spherical deconvolution.
Neuroimage 35(4), 1459–1472 (2007)
Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vis. 31(2/3), 111–127 (1999)
Zhang, J., Duits, R., ter Haar Romeny, B., Sanguinetti, G.: Numerical approaches for linear left-
invariant diffusions on SE(2), their comparisons to exact solutions, and their applications in
retinal imaging. Numer. Math. Theory Methods Appl. 9, 1–50 (2016)
Zhang, J., Dashtbozorg, B., Bekkers, E., Pluim, J., Duits, R., ter Haar Romeny, B.: Robust retinal
vessel segmentation via locally adaptive derivative frames in orientation scores. IEEE-TMI
35(12), 2631–2644 (2016)
Zhang, J., Dashtbozorg, B., Huang, F., Berendschot, T.T., ter Haar Romeny, B.M.: Analysis
of retinal vascular biomarkers for early detection of diabetes. In: European Congress on
Computational Methods in Applied Sciences and Engineering, pp. 811–817. Springer (2017)
Zhu, S., et al.: Retinal vascular tortuosity in hospitalized patients with type 2 diabetes and diabetic
retinopathy in China. J. Biomed. Sci. Eng. 9(10), 143 (2016)
Zhu, S., Liu, H., Du, R., Annick, D.S., Chen, S., Qian, W.: Tortuosity of retinal main and branching
arterioles, venules in patients with type 2 diabetes and diabetic retinopathy in china. IEEE
Access 8, 6201–6208 (2020)
45 PDE-Constrained Shape Optimization: Toward Product Shape Spaces and Stochastic Models
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1586
Optimization Over Product Shape Manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1588
Optimization on Shape Spaces with Steklov–Poincaré Metric . . . . . . . . . . . . . . . . . . . . . . 1590
Optimization of Multiple Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1598
Stochastic Multi-shape Optimization and the Stochastic Gradient Method . . . . . . . . . . . . . . 1605
Numerical Investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1611
Deterministic Model Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1612
Stochastic Model Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1614
Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1617
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1624
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1626
This work has been partly supported by the state of Hamburg within the Landesforschungs-
förderung under project “Simulation-Based Design Optimization of Dynamic Systems Under
Uncertainties” (SENSUS) with project number LFF-GK11, and by the German Academic
Exchange Service (DAAD) within the program “Research Grants-Doctoral Programmes in
Germany, 2017/18.”
C. Geiersbach
Weierstrass Institute, Berlin, Germany
e-mail: [email protected]
E. Loayza-Romero
Institute for Analysis and Numerics, University of Münster, Münster, Germany
e-mail: [email protected]
K. Welker
Faculty of Mechanical Engineering and Civil Engineering,
Helmut-Schmidt-University/University of the Federal Armed Forces Hamburg, Hamburg,
Germany
e-mail: [email protected]
Abstract
Shape optimization models with one or more shapes are considered in this
chapter. Of particular interest for applications are problems in which a so-called
shape functional is constrained by a partial differential equation (PDE) describing
the underlying physics. A connection can be made between a classical view
of shape optimization and the differential geometric structure of shape spaces.
To handle problems where a shape functional depends on multiple shapes, a
theoretical framework is presented, whereby the optimization variable can be
represented as a vector of shapes belonging to a product shape space. The
multi-shape gradient and multi-shape derivative are defined, which allows for
a rigorous justification of a steepest descent method with Armijo backtracking.
As long as the shapes as subsets of a hold-all domain do not intersect, solving a
single deformation equation is enough to provide descent directions with respect
to each shape. Additionally, a framework for handling uncertainties arising
from inputs or parameters in the PDE is presented. To handle potentially high-
dimensional stochastic spaces, a stochastic gradient method is proposed. A model
problem is constructed, demonstrating how uncertainty can be introduced into the
problem and the objective can be transformed by use of the expectation. Finally,
numerical experiments in the deterministic and stochastic case are devised, which
demonstrate the effectiveness of the presented algorithms.
Keywords
Introduction
used to solve such shape optimization problems using the gradient descent method.
In the past, major effort in shape calculus has been devoted toward expressions
for shape derivatives in the so-called Hadamard form, which are integrals over
the surface (cf. Delfour and Zolésio 2001; Sokolowski and Zolésio 1992). During
the calculation of these expressions, volume shape derivative terms arise as an
intermediate result. In general, additional regularity assumptions are necessary in
order to transform the volume forms into surface forms. Besides saving analytical
effort, this makes volume expressions preferable to Hadamard forms. In this
chapter, the Steklov–Poincaré metric is considered, which allows the use of the volume
formulations (cf. Schulz et al. 2016). The reader is referred to Hardesty et al.
(2020) and Hiptmair et al. (2015) for a comparison of the volume and boundary
formulations with respect to their order of convergence in a finite element setting.
In applications, often more than one shape needs to be considered, e.g., in electrical
impedance tomography, where the material distribution of electrical properties
such as electric conductivity and permittivity inside the body is examined (Cheney
et al. 1999; Kwon et al. 2002; Laurain and Sturm 2016), and in the optimization
of biological cell composites in the human skin (Siebenborn and Vogel 2021;
Siebenborn and Welker 2017). If a shape is seen as a point on an abstract manifold, it
is natural to view a collection of shapes as a vector of points. Using this perspective,
a shape optimization problem can be formulated over multiple shapes. This novel,
multi-shape optimization problem is developed in this chapter.
A second area of focus in this chapter is in the development of stochastic models
for multi-shape optimization problems. There is an increasing effort to incorporate
uncertainty into shape optimization models (see, for instance Dambrine et al. 2015,
2019, Hiptmair et al. 2018, Liu et al. 2017, and Martínez-Frutos et al. 2016).
Many relevant problems contain additional constraints in the form of a PDE, which
describe the physical laws that the shape should obey. Often, material coefficients
and external inputs might not be known exactly but rather be randomly distributed
according to a probability distribution obtained empirically. In this case, one might
still wish to optimize over a set of these possibilities to obtain a more robust shape.
When the number of possible scenarios in the probability space is small, then the
optimization problem can be solved over the entire set of scenarios. This approach
is not relevant for most applications, as it becomes intractable if the random field
has more than a few scenarios. For problems with PDEs containing uncertain inputs
or parameters, either the stochastic space is discretized or sampling methods are
used. If the stochastic space is discretized, one typically relies on a finite-dimension
assumption, where a truncated expansion is used as an approximation of the infinite-
dimensional random field. Numerical methods include the stochastic Galerkin method
(Babuska et al. 2004) and sparse-tensor discretization (Schwab and Gittelson 2011).
Sample-based approaches involve taking random or carefully chosen realizations
of the input parameters; this includes Monte Carlo or quasi-Monte Carlo methods
and stochastic collocation (Babuška et al. 2007). In the stochastic approximation
approach, dating back to the paper by Robbins and Monro (1951), one uses a
stochastic gradient in place of a gradient to iteratively minimize the expected value
over a random function. Recently, stochastic approximation was proposed to solve
problems formulated over a shape space that contains uncertainties (Geiersbach
et al. 2021). A novel stochastic gradient method was formulated over infinite-
dimensional shape spaces and convergence of the method was proven. The work
was informed by its demonstrated success in the context of PDE-constrained
optimization under uncertainty (Geiersbach and Pflug 2019; Haber et al. 2012;
Martin et al. 2018; Geiersbach and Wollner 2020; Geiersbach and Scarinci 2021).
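To make the stochastic approximation idea concrete, the following minimal Python sketch applies the Robbins–Monro iteration to a toy finite-dimensional problem, minimizing E[J(u, ω)] for J(u, ω) = ½‖u − (a + ε(ω))‖² with Gaussian noise ε. The objective, the noise model, and the step sizes are assumptions made purely for illustration; they stand in for, but are not, the shape-space setting of this chapter.

import numpy as np

# Toy Robbins-Monro iteration: minimize E[J(u, omega)] for
# J(u, omega) = 0.5 * ||u - (a + eps(omega))||^2 with Gaussian noise eps.
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0])          # minimizer of the expected objective (assumed data)
sigma = 0.5                        # assumed noise level

def stochastic_gradient(u):
    # gradient of J(u, omega) for one freshly drawn sample omega
    eps = sigma * rng.standard_normal(a.shape)
    return u - (a + eps)

u = np.zeros(2)
for k in range(2000):
    t_k = 1.0 / (k + 1)            # Robbins-Monro step sizes
    u = u - t_k * stochastic_gradient(u)

print(u)                           # approaches a = (1, -2) despite noisy gradients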
The chapter is structured as follows. Section “Optimization Over Product
Shape Manifolds” is concerned with deterministic shape optimization. First, in
section “Optimization on Shape Spaces with Steklov–Poincaré Metric”, it is
summarized how the theory of deterministic PDE-constrained shape optimization
problems can be connected with the differential geometric structure of the space of
smooth shapes. The novel contribution of this chapter is in section “Optimization of
Multiple Shapes”, which concentrates on optimization models in which more than
one shape is to be optimized. A framework is introduced to justify a mesh deformation
method using a Steklov–Poincaré metric defined on a product manifold. This novel
framework is further developed in section “Stochastic Multi-shape Optimization
and the Stochastic Gradient Method” in the context of shape optimization under
uncertainty. The stochastic gradient method is revisited in the context of problems
depending on multiple shapes. Numerical experiments demonstrating the effective-
ness of the deterministic and stochastic methods are shown in section “Numerical
Investigations”. Finally, closing remarks are shared in section “Conclusion”.
(Figure: a vector of shapes u1, . . . , u5 in the hold-all domain, each with its outer unit normal field n1, . . . , n5)
$$c \sim \tilde{c} \;\Longleftrightarrow\; \frac{d}{dt}\,\phi_\alpha(c(t))\Big|_{t=0} = \frac{d}{dt}\,\phi_\alpha(\tilde{c}(t))\Big|_{t=0} \quad \forall\,\alpha \text{ with } u \in U_\alpha,$$
where $\{(U_\alpha, \phi_\alpha)\}_\alpha$ denotes an atlas of $\mathcal{U}_i$.
$$j \colon \mathcal{U}^N \to \mathbb{R}, \quad u \mapsto j(u).$$
$$\min_{u \in \mathcal{U}^N} j(u) \tag{1}$$
$$\min_{(u,y)\in\, \mathcal{U}^N \times \mathcal{Y}} \hat{\jmath}(u,y) \quad \text{s.t.} \quad e(u,y) = 0. \tag{2}$$
When e in (2) represents a PDE, the shape optimization problem is called PDE-
constrained. Formally, if the PDE has a (unique) solution given any choice of u, then
the control-to-state operator S : UN → Y, u → y is well-defined. With j (u) :=
jˆ(u, Su) one obtains an unconstrained optimization problem of the form (1). This
observation justifies the following work with (1), although later in the application
section, a problem of the form (2) is presented.
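As a brief worked illustration of this reduction, assume a hypothetical linear state equation with an invertible operator A(u) and right-hand side f (these objects are not defined in this chapter and serve only as an example):
$$e(u, y) := A(u)\, y - f = 0 \;\Longrightarrow\; S u = A(u)^{-1} f, \qquad j(u) := \hat{\jmath}\big(u, A(u)^{-1} f\big),$$
so that the constrained problem (2) collapses to an unconstrained problem of the form (1), namely $\min_{u \in \mathcal{U}^N} j(u)$.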
Section “Optimization on Shape Spaces with Steklov–Poincaré Metric” concen-
trates on N = 1 and summarizes how the theory of deterministic PDE-constrained
shape optimization problems can be connected to the differential geometric structure
of shape spaces. Here, in view of obtaining efficient gradient-based algorithms,
one focuses on the Steklov–Poincaré metric considered in Schulz et al. (2016).
Afterward, section “Optimization of Multiple Shapes” concentrates on N > 1,
which leads to product shape manifolds. It will be shown that it is possible to define
a product metric and use this to justify the main result of this chapter, Theorem 1. It
is rigorously argued that vector fields induced by the shape derivative give descent
directions with respect to each individual element of the shape space as well as the
corresponding element of the product shape space.
In this subsection, the connection between the space of smooth shapes and shape optimization is analyzed. Please note the
following: one shape is both an element of a manifold and a subset of Rd . In
classical shape calculus, a shape is considered to be a subset of Rd , only. However,
this subsection explains that equipping a shape with additional structure provides
theoretical advantages, enabling the use of concepts from differential geometry like
the pushforward, exponential maps, etc.
If for all directions W ∈ C^k_0(D, R^d) the Eulerian derivative (3) exists and the mapping
$$C^k_0(D,\mathbb{R}^d) \to \mathbb{R}, \quad W \mapsto dj(u)[W]$$
is linear and continuous, the expression dj(u)[W] is called the shape derivative of
j at u in direction W ∈ C^k_0(D, R^d). In this case, j is called shape differentiable of
class C^k at u.
The proof of existence of shape derivatives can be done via different approaches
like the Lagrangian (Sturm 2013), min-max (Delfour and Zolésio 2001), chain
rule (Sokolowski and Zolésio 1992), and rearrangement (Ito et al. 2008) methods,
among others. If the objective functional is given by a volume integral, under the
assumptions of the Hadamard structure theorem (cf. Sokolowski and Zolésio 1992,
Theorem 2.27), the shape derivative can be expressed as an integral over the domain,
the so-called volume or weak formulation, and also as an integral over the boundary,
the so-called surface or strong formulation. Recent advances in PDE-constrained
optimization on shape manifolds are based on the surface formulation, also called
Hadamard form, as well as intrinsic shape metrics. Major effort in shape calculus
has been devoted toward such surface expressions (cf. Delfour and Zolésio 2001;
Sokolowski and Zolésio 1992), which are often very tedious to derive. When one
derives a shape derivative of an objective functional, which is given by an integral
over the domain, one first gets the volume formulation. This volume form can be
converted into its surface form by applying the integration by parts formula. In
order to apply this formula, one needs a higher regularity of the state and adjoint
of the underlying PDE. Recently, it has been shown that the weak formulation has
numerical advantages (see, for instance, Berggren 2010, Gangl et al. 2015, Hiptmair
and Paganini 2015, and Paganini 2015). In Hardesty et al. (2020) and Laurain and
Sturm (2013), practical advantages of volume shape formulations have also been
demonstrated.
$$(j_*)_u \colon T_u\mathcal{U} \to \mathbb{R}, \quad c \mapsto \frac{d}{dt}\, j(c(t))\Big|_{t=0} = (j \circ c)'(0).$$
In this setting, where tangent spaces are defined as equivalence classes of curves, the
pushforward of $f\colon M \to N$ at a point $p \in M$ is generally given by a map between
the tangent spaces, i.e., $(f_*)_p\colon T_pM \to T_{f(p)}N$ with $(f_*)_p(c) := \frac{d}{dt} f(c(t))\big|_{t=0} = (f\circ c)'(0)$.
With the help of the pushforward, it is possible to define the Riemannian shape
gradient.
for k = 0, 1, . . . do
  [1] Compute the Riemannian shape gradient v^k ∈ T_{u^k}U with respect to G by solving
      G_{u^k}(v^k, w) = (j_*)_{u^k} w  for all w ∈ T_{u^k}U.
  [2] Compute the Armijo step-size t^k.
  [3] Set u^{k+1} := exp_{u^k}(−t^k v^k).    (5)
end for
illustrates this situation. With (5) the (k + 1)-th shape iterate uk+1 is calculated,
where expuk : Tuk U → U, z → expuk (z) denotes the exponential map; this defines
a local diffeomorphism between the tangent space Tuk U and the manifold U by
following the locally uniquely defined geodesic starting in the k-th shape iterate
uk ∈ U in the direction −v k ∈ Tuk U. In Algorithm 1, an Armijo backtracking
line search technique is used to calculate the step-size t k in each iteration.
Here, the norm introduced by the metric under consideration is needed, $\|\cdot\|_G := \sqrt{G(\cdot,\cdot)}$.
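The structure of Algorithm 1 can be illustrated with a minimal Python sketch of Riemannian steepest descent with Armijo backtracking on a toy manifold. The unit sphere, the linear objective, and the projection retraction used below are illustrative assumptions that stand in for the shape manifold, the shape functional, and the exponential map; the Armijo constants are likewise placeholders.

import numpy as np

# Toy stand-in for Algorithm 1 on the unit sphere S^2, minimizing j(u) = <c, u>.
c = np.array([1.0, 2.0, -2.0])

def j(u):
    return c @ u

def riemannian_gradient(u):
    # Euclidean gradient c projected onto the tangent space T_u S^2
    return c - (c @ u) * u

def retract(u, z):
    # metric-projection retraction, a cheap surrogate for the exponential map
    v = u + z
    return v / np.linalg.norm(v)

u = np.array([1.0, 0.0, 0.0])
alpha, beta = 1e-4, 0.5                    # Armijo parameters (assumed values)
for k in range(100):
    v = riemannian_gradient(u)             # descent direction is -v
    t = 1.0
    while j(retract(u, -t * v)) > j(u) - alpha * t * (v @ v):
        t *= beta                          # backtracking line search
    u = retract(u, -t * v)

print(u, j(u))                             # converges to -c/||c||, the minimizer on the sphere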
Here, Emb(S^{d−1}, R^d) denotes the set of all embeddings from the unit sphere S^{d−1}
into Rd , and Diff(S d−1 ) is the set of all diffeomorphisms from S d−1 into itself. In
Kriegl and Michor (1997), it is verified that the shape space Be is a smooth manifold.
The tangent space is isomorphic to the set of all smooth normal vector fields along
c, i.e.,
$$T_u B_e(S^{d-1},\mathbb{R}^d) \cong \big\{ h : h = \alpha\, n,\ \alpha \in C^\infty(S^{d-1}) \big\},$$
where n denotes the outer unit normal field to the shape u. Next, the connection of
shape derivatives with the geometric structure of Be is addressed. This combination
results in efficient optimization techniques on Be .
with tr : H01 (D, Rd ) → H 1/2 (u, Rd ) denoting the trace operator on Sobolev spaces
for vector-valued functions and V ∈ H01 (D, Rd ) solving the Neumann problem
$$a(V, W) = \int_u v\, \big(\mathrm{tr}(W)\cdot n\big)\, ds \quad \forall\, W \in H^1_0(D,\mathbb{R}^d),$$
where a : H01 (D, Rd )×H01 (D, Rd ) → R is a symmetric and coercive bilinear form.
Note that a Steklov–Poincaré metric depends on the choice of the bilinear form.
Thus, different bilinear forms lead to various Steklov–Poincaré metrics. To define a
metric on Be , the Steklov–Poincaré metric is restricted to the mapping g S : Tu Be ×
Tu Be → R.
Next, the connection between Be equipped with the Steklov–Poincaré metric
g S and shape calculus is stated. As already mentioned, the shape derivative can
be expressed in a weak and strong form under the assumptions of the Hadamard
structure theorem. The Hadamard structure theorem actually states the existence of
a scalar distribution r on the boundary of a domain. However, in the following, it
is always assumed that r is an integrable function. In general, if r ∈ L1 (u), then
r is obtained in the form of the trace on u of an element of W 1,1 (D). This means
that it follows from Hadamard structure theorem that the shape derivative can be
expressed more conveniently as
d surf j (u)[W ] := r(s) W (s) · n(s) ds. (7)
u
In view of the connection between the shape space Be with respect to the Steklov–
Poincaré metric g S and shape calculus, r ∈ C∞ (u) is assumed. In contrast, if the
shape functional is a pure volume integral, the weak form is given by
$$d^{\mathrm{vol}} j(u)[W] := \int_D R_W(x)\, dx, \tag{8}$$
which is equivalent to
$$\int_u w(s) \cdot \big[(S^{\mathrm{pr}})^{-1} v\big](s)\, ds = \int_u r(s)\, w(s)\, ds \quad \forall\, w \in C^\infty(u). \tag{9}$$
From (9), one gets that a vector V ∈ H01 (D, Rd ) ∩ C∞ (D, Rd ) can be viewed as
an extension of a Riemannian shape gradient to the hold-all domain D because of
the identities
One option for a(·, ·) is the bilinear form associated with linear elasticity, i.e.,
$$a^{\mathrm{elas}}(V, W) := \int_D \big(\lambda\, \mathrm{tr}(\epsilon(V))\,\mathrm{id} + 2\mu\, \epsilon(V)\big) : \epsilon(W)\, dx,$$
where $\epsilon(V) := \tfrac{1}{2}(\nabla V + \nabla V^\top)$ denotes the strain tensor and $\lambda, \mu$ are the Lamé parameters.
Remark 2. Note that it is not ensured that V ∈ H01 (D, Rd ) solving the PDE (in
weak form)
$$R_{u^k} \colon T_{u^k}\mathcal{U} \to \mathcal{U}, \quad v \mapsto R_{u^k}(v) := u^k + v \tag{12}$$
(cf. Schulz and Welker 2018). The retraction is only a local approximation; for large
vector fields, the image of this function may no longer belong to Be . This retraction
is closely related to the perturbation of the identity, which is defined for vector fields
on the domain D. Given the shape u^k in the k-th iteration of Algorithm 1,
the perturbation of the identity acting on the domain D in the direction V k , where
V k solves (11) for u = uk , gives
$$D(u^{k+1}) = \{x \in D \mid x = x^k - t^k V^k\}. \tag{13}$$
As vector fields induced from solving (11) have less regularity than is required on
the manifold, it is worth mentioning that the shape uk+1 resulting from this update
could leave the manifold Be . To summarize, either large or less smooth vector fields
can contribute to the iterate uk+1 leaving the manifold. One indication that the
iterate has left the manifold would be that the curve uk+1 develops corners. Another
possibility is that the curve u^{k+1} self-intersects. One way to avoid this behavior is by
preventing the underlying mesh from breaking (meaning that elements from the finite element
discretization overlap). One can avoid broken meshes as long as the step-size is not
chosen too large.
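A minimal Python sketch of this safeguard is given below, under the assumption of a small hand-made triangular mesh and an arbitrary deformation field V (in practice V solves the deformation equation): the perturbation of the identity (13) is applied and the step-size is halved until no element of the mesh is inverted.

import numpy as np

# Perturbation of the identity with a naive mesh-integrity check: accept the
# step only if every triangle keeps a positive signed area.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
triangles = np.array([[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]])
V = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.4, 0.3]])  # assumed field

def signed_areas(pts, tris):
    a, b, c = pts[tris[:, 0]], pts[tris[:, 1]], pts[tris[:, 2]]
    ab, ac = b - a, c - a
    return 0.5 * (ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0])

t = 1.0
while True:
    new_nodes = nodes - t * V              # x^{k+1} = x^k - t^k V^k(x^k)
    if np.all(signed_areas(new_nodes, triangles) > 0.0):
        break                              # mesh not broken: accept the step
    t *= 0.5                               # otherwise reduce the step-size

nodes = new_nodes
print(t, signed_areas(nodes, triangles))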
the concepts of the pushforward, Riemannian shape gradient, and shape derivative
need to be generalized. In view of applications in shape optimization, the metric
GN on the product manifold is related later to the Steklov–Poincaré metric. As a
main contribution, the computation of vector fields extended to the hold-all domain
is discussed.
Analogously to Abraham et al. (2012, 3.3.12 Proposition), one can identify the
tangent bundle T UN with the product space T U1 × · · · × T UN . In particular, there
is an identification of the tangent space of the product manifold UN in the point u;
more precisely,
$$T_u\,\mathcal{U}^N \cong T_{u_1}\mathcal{U}_1 \times \cdots \times T_{u_N}\mathcal{U}_N.$$
$$(\pi_{i*})_u \colon T_u\,\mathcal{U}^N \to T_{\pi_i(u)}\mathcal{U}_i, \quad c \mapsto \frac{d}{dt}\, \pi_i(c(t))\Big|_{t=0} = (\pi_i \circ c)'(0).$$
where Tπ∗i (u) Ui and Tu∗ UN are the dual spaces of Tπi (u) Ui and Tu UN , respectively.
Thanks to these definitions, the product metric GN to the product shape space UN
can be defined:
$$G^N = \sum_{i=1}^{N} \pi_i^*\, G^i.$$
$$G^N_u(v, w) = \sum_{i=1}^{N} G^i_{\pi_i(u)}\big(\pi_{i*} v,\, \pi_{i*} w\big) \quad \forall\, v, w \in T_u\,\mathcal{U}^N. \tag{14}$$
Arguments identical to the ones in the proof of O'Neill (1983, Chapter 3, Lemma 5)
make (U^N, G^N) a Riemannian product manifold.
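As a brief illustration, in the simplest case N = 2 the product metric (14) reduces to the sum of the two component metrics:
$$G^N_u(v, w) = G^1_{u_1}(v_1, w_1) + G^2_{u_2}(v_2, w_2), \qquad v = (v_1, v_2),\; w = (w_1, w_2) \in T_{u_1}\mathcal{U}_1 \times T_{u_2}\mathcal{U}_2.$$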
In order to define a shape gradient of a functional j : UN → R using the
definition of the product metric in (14), Definition 2 needs to be first generalized
to the product shape space.
$$(j_*)_u \colon T_u\,\mathcal{U}^N \to \mathbb{R}, \quad c \mapsto \frac{d}{dt}\, j(c(t))\Big|_{t=0} = (j \circ c)'(0).$$
$$G^N_u(v, w) = (j_*)_u\, w \quad \forall\, w \in T_u\,\mathcal{U}^N.$$
$$\exp^N_{u^k} \colon T_{u^k}\mathcal{U}^N \to \mathcal{U}^N, \quad z = (z_1, \dots, z_N) \mapsto \big(\exp_{u_1^k} z_1, \dots, \exp_{u_N^k} z_N\big) \tag{15}$$
is needed to update the shape vector $u^k = (u_1^k, \dots, u_N^k)$ in each iteration k, where
$\exp_{u_i^k} \colon T_{u_i^k}\mathcal{U}_i \to \mathcal{U}_i,\ z \mapsto \exp_{u_i^k}(z)$ for all $i = 1, \dots, N$. An Armijo backtracking
line search strategy is used to calculate the step-size $t^k$ in each iteration. Here, the
norm introduced by $G^N$ is given by $\|\cdot\|_{G^N} := \sqrt{G^N(\cdot,\cdot)}$.
So far in this subsection, each shape ui has been considered as an element of the
Riemannian shape manifold (Ui , Gi ), for all i = 1, . . . , N , in order to define the
multi-shape gradient with respect to the Riemannian metric GN . In classical shape
calculus, each shape ui is only a subset of Rd . If one focuses on this perspective, then
it is possible to generalize the classical shape derivative to a partial shape derivative
and, thus, to a multi-shape derivative. With these generalized objects, a connection
between shape calculus and the differential geometric structure of the product shape
manifold UN can be made.
Let D be partitioned into N non-overlapping Lipschitz domains Ω_1, . . . , Ω_N such
that u_k ⊂ Ω_k. This construction will be referred to as an admissible partition (see
Fig. 3 for an example in R^2). The indicator function 1_{Ω_i} : D → {0, 1} is defined by
1_{Ω_i}(x) = 1 if x ∈ Ω_i, and 1_{Ω_i}(x) = 0 otherwise.
for k = 0, 1, . . . do
  [1] Compute the Riemannian multi-shape gradient v^k ∈ T_{u^k}U^N with respect to G^N by solving
      G^N_{u^k}(v^k, w) = (j_*)_{u^k} w  for all w ∈ T_{u^k}U^N.
  [2] Compute the Armijo step-size t^k.
  [3] Set
      u^{k+1} := exp^N_{u^k}(−t^k v^k).    (17)
end for
$$d_{u_i} j(u)[W|_{\Omega_i}] := \lim_{t \to 0^+} \frac{j\big(u_1, \dots, u_{i-1},\, F_t^{W|_{\Omega_i}}(u_i),\, u_{i+1}, \dots, u_N\big) - j(u)}{t}. \tag{18}$$
If for all directions W ∈ C^k_0(D, R^d) the i-th partial Eulerian derivative (18) exists
and the mapping
$$C^k_0(D,\mathbb{R}^d) \to \mathbb{R}, \quad W \mapsto d_{u_i} j(u)[W|_{\Omega_i}]$$
is linear and continuous, the expression $d_{u_i} j(u)[W|_{\Omega_i}]$ is called the i-th partial
shape derivative of j at u in direction W ∈ C^k_0(D, R^d). If the i-th partial shape
derivatives of j at u in the direction W ∈ C^k_0(D, R^d) exist for all i = 1, . . . , N, then
$$dj(u)[W] := \sum_{i=1}^{N} d_{u_i} j(u)[W|_{\Omega_i}] \tag{19}$$
is called the multi-shape derivative of j at u in direction W ∈ C^k_0(D, R^d).
Remark 4. For a single shape, by the Hadamard Structure Theorem, the shape
derivative takes either the forms (7) or (8). Using the definition above, the
1602 C. Geiersbach et al.
Hadamard Structure Theorem for multiple shapes can also be applied. The surface
representation for ri ∈ L1 (ui ) is
$$d^{\mathrm{surf}} j(u)[W] := \sum_{i=1}^{N} d^{\mathrm{surf}}_{u_i} j(u)[W|_{\Omega_i}] = \sum_{i=1}^{N} \int_{u_i} r_i(s)\, W|_{\Omega_i}(s) \cdot n(s)\, ds. \tag{20}$$
The expressions (20) and (22) suggest that the multi-shape derivative is in
fact independent of the partition, provided it is an admissible one, i.e., with
nonintersecting shapes and u_i ⊂ Ω_i for nonintersecting subdomains Ω_i. This can
$$V_i := \{V \in H^1(\Omega_i, \mathbb{R}^d) : V = 0 \text{ on } \partial D \cap \partial\Omega_i\}, \qquad V_i^0 = H^1_0(\Omega_i, \mathbb{R}^d).$$
One has (cf. Quarteroni and Valli 1999, Subchapter 1.2) $\Lambda_i = H^{1/2}(\Gamma_i, \mathbb{R}^d)$ if
$\Gamma_i \cap \partial D = \emptyset$. In case $\Gamma_i \cap \partial D \neq \emptyset$, the space $\Lambda_i$ is strictly included in $H^{1/2}(\Gamma_i, \mathbb{R}^d)$
and is endowed with a norm which is larger than the norm of $H^{1/2}(\Gamma_i, \mathbb{R}^d)$. The
trace space over $\Gamma := \cup_{i=1}^{N} \Gamma_i$ is given by
$$\Lambda := \big\{\eta \in H^{1/2}(\Gamma, \mathbb{R}^d) : \eta = V|_\Gamma \text{ for a suitable } V \in H^1_0(D, \mathbb{R}^d)\big\}.$$
The following main theorem justifies solving (23) to obtain a vector field that
gives descent directions with respect to each shape.
Proof. This proof follows the arguments from Quarteroni and Valli (1999, Sec. 1.2),
generalizing to the case N > 2. First, it is shown that (24) yields the system (25).
Let V be a solution to (24). Then setting V_i = V|_{Ω_i} for i = 1, . . . , N, one trivially
obtains (25b) in the sense of the corresponding traces. Moreover, using W_i = W|_{Ω_i}
for an arbitrary W ∈ H^1_0(D, R^d), one has a_i(V_i, W_i) = d_{u_i} j(u)[W_i] for all W_i ∈ V_i
and in particular for all W_i ∈ V_i^0, showing (25a). Moreover, the function
$$E\eta := \begin{cases} E_1 \eta_1 & \text{in } \Omega_1, \\ \quad\vdots & \\ E_N \eta_N & \text{in } \Omega_N \end{cases} \tag{26}$$
$$a(V, W) = \sum_{i=1}^{N} \Big( a_i\big(V_i,\, W|_{\Omega_i} - E_i\eta_i\big) + a_i\big(V_i,\, E_i\eta_i\big) \Big)
= \sum_{i=1}^{N} \Big( d_{u_i} j(u)\big[W|_{\Omega_i} - E_i\eta_i\big] + d_{u_i} j(u)\big[E_i\eta_i\big] \Big)
= dj(u)[W],$$
Remark 6. The second and third conditions of (25) are continuity conditions along Γ
for the solution V and the normal flux (normal stress) relating the V_i for all i =
1, . . . , N. The extension operator E_i can be chosen arbitrarily; one example is the
extension-by-zero operator (cf. Hiptmair et al. 2015).
exponential map. If one chooses the retraction (12) instead of the exponential maps
exp_{u_i^k} in (15) for all i = 1, . . . , N in Algorithm 2, one again obtains the relation to the
perturbation of the identity. In this setting, Theorem 1 justifies the update
$$D(u^{k+1}) = \{x \in D \mid x = x^k - t^k V^k\} \tag{27}$$
Remark 7. Notice that the variational problem given in (23) reflects exactly the
approach presented, e.g., in Geiersbach et al. (2021), Siebenborn and Vogel (2021),
and Siebenborn and Welker (2017) to generate descent directions for problems con-
taining multiple shapes. Hence the above theory supports the numerical approach
already used in those papers.
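The following Python sketch illustrates the "all-at-once" character of this approach for two polygonal shapes: a single vector field defined on the hold-all domain updates every component of the shape vector. The shapes, the analytic field V, and the step size are assumptions made for illustration; in the works cited above, V is obtained by solving the deformation equation (23).

import numpy as np

# One field V on the hold-all domain D = [0,1]^2 deforms all shapes at once.
def V(x):
    # made-up smooth field, stand-in for the solution of the deformation equation
    return 0.05 * np.stack([np.sin(np.pi * x[:, 0]), np.cos(np.pi * x[:, 1])], axis=1)

theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
u1 = 0.15 * np.stack([np.cos(theta), np.sin(theta)], axis=1) + [0.3, 0.5]   # shape u_1
u2 = 0.10 * np.stack([np.cos(theta), np.sin(theta)], axis=1) + [0.7, 0.5]   # shape u_2

t = 0.5                                     # step size from a line search (assumed)
shapes = [u1, u2]
shapes = [u - t * V(u) for u in shapes]     # the same field moves both shapes

print(shapes[0][:2], shapes[1][:2])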
Given the framework for understanding shape optimization problems over product
shape spaces, it is now possible to incorporate uncertainty. In this section, the
focus is on the case where the uncertainty can be characterized by a known
probability space, for instance, through prior sampling. The probability space is a
triple (Ω, F, P), where Ω is the sample space containing all possible “realizations,”
F ⊂ 2^Ω is the σ-algebra of events, and P : F → [0, 1] is a probability measure. Note
that in certain applications, there may be different sources of uncertainty that are
independent of each other. In this case, one could work with the product probability
Notice that the function j representing the transformed function J only depends on
u, the vector of shapes. Therefore minimizers of (28) do not depend on ω, i.e., they
are deterministic.
More interesting problems involve uncertainty in the equality constraint. The
equality can be parametrized by the operator e : UN × Y × → W, with Banach
spaces Y and W. A property is said to hold almost surely (a.s.) provided that the
set in where the property does not hold is a null set. Of interest are constraints of
the form
e(u, y, ω) = 0 a.s.
is obtained. If the equality constraint in (29) is uniquely solvable for any choice of
u ∈ UN and almost every ω ∈ , then the operator S(ω) : UN → Y, u → y(ω) is
well-defined for almost every ω. As before, with J (u, ω) := Jˆ(u, S(ω)u, ω), (29)
is formally equivalent to the problem (28). This unconstrained view will be helpful
in formulating the stochastic gradient method. However, the reader is reminded that
the stochastic gradient implicitly depends on the operator S(·).
If the stochastic dimension is relatively small, the expectation can be approx-
imated using quadrature and Algorithm 2 can be applied. This type of sample
average approximation approach is not an algorithm, and it becomes intractable
as the stochastic dimension grows. For larger stochastic dimensions, the stochastic
gradient method is widely used in stochastic optimization. It is a classical method
developed by Robbins and Monro (1951). As a sample-based approach, the
stochastic gradient method does not suffer from the curse of dimensionality the
way the discretizations mentioned in the introduction do. In Geiersbach et al.
(2021), the stochastic gradient method was applied to the novel setting of shape
spaces, where an example with multiple shapes was also presented. However, a
theoretical background over product manifolds was not considered there. To apply
the method to the setting containing multiple shapes, several concepts developed in
section “Optimization of Multiple Shapes” need to be generalized. To this end, it
will sometimes be helpful to use the shorthand Jω (·) := J (·, ω).
$$((J_\omega)_*)_u \colon T_u\,\mathcal{U}^N \to \mathbb{R}, \quad c \mapsto \frac{d}{dt}\, J_\omega(c(t))\Big|_{t=0} = (J_\omega \circ c)'(0).$$
$$G^N_u(v, w) = ((J_\omega)_*)_u\, w \quad \forall\, w \in T_u\,\mathcal{U}^N.$$
$$d_{u_i} J(u, \omega)[W|_{\Omega_i}] := \lim_{t \to 0^+} \frac{J\big(u_1, \dots, u_{i-1},\, F_t^{W|_{\Omega_i}}(u_i),\, u_{i+1}, \dots, u_N,\, \omega\big) - J(u, \omega)}{t} \tag{30}$$
If for all directions W ∈ C^k_0(D, R^d) the i-th partial Eulerian derivative (30) exists
and the mapping
$$C^k_0(D,\mathbb{R}^d) \to \mathbb{R}, \quad W \mapsto d_{u_i} J(u, \omega)[W|_{\Omega_i}]$$
is linear and continuous, the expression $d_{u_i} J(u, \omega)[W|_{\Omega_i}]$ is called the i-th partial
shape derivative of J(·, ω) at u in direction W ∈ C^k_0(D, R^d). If the i-th partial shape
derivatives of J at u for a fixed realization ω ∈ Ω in the direction W ∈ C^k_0(D, R^d)
exist for all i = 1, . . . , N, then
$$dJ(u, \omega)[W] := \sum_{i=1}^{N} d_{u_i} J(u, \omega)[W|_{\Omega_i}] \tag{31}$$
is called the multi-shape derivative of J(·, ω) at u in direction W ∈ C^k_0(D, R^d).
$$\frac{J\big(u_1, \dots, u_{i-1},\, F_t^{W|_{\Omega_i}}(u_i),\, u_{i+1}, \dots, u_N,\, \omega\big) - J(u, \omega)}{t} \leq C(\omega).$$
Equipped with these tools, it is now possible to formulate the stochastic gradient
method for objectives formulated on a product shape space in Algorithm 3. Instead
of a backtracking procedure as in Algorithm 2 to determine the step-size, the
algorithm uses the classical “Robbins–Monro” step-size from the original work
$$t^k \geq 0, \qquad \sum_{k=0}^{\infty} t^k = \infty, \qquad \sum_{k=0}^{\infty} (t^k)^2 < \infty. \tag{32}$$
Under additional assumptions on the manifold and function J (cf. Geiersbach et al.
2021), this rule guarantees step-sizes that are large enough to converge to stationary
points while asymptotically dampening oscillations in the iterates. In contrast to the
backtracking procedure, the step-size sequence is in practice chosen exogenously,
and its scaling is either informed by a priori estimates or tuned offline.
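A common concrete choice satisfying (32) is t^k = c/(k + k_0) with constants c > 0 and k_0 ≥ 1. The short Python sketch below generates such a sequence; the constants are placeholders that would be tuned offline in practice.

def robbins_monro_steps(c=1.0, k0=1, n=400):
    # t^k = c / (k + k0): the sum of the steps diverges, the sum of their squares converges
    return [c / (k + k0) for k in range(n)]

steps = robbins_monro_steps()
print(steps[0], steps[199], steps[-1])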
for k = 0, 1, . . . do
  [1] Randomly sample ω^k, independent of ω^1, . . . , ω^{k−1}.
  [2] Compute the stochastic Riemannian multi-shape gradient v^k = v^k(ω^k) with respect to G^N by solving
      G^N_{u^k}(v^k, w) = ((J_{ω^k})_*)_{u^k} w  for all w ∈ T_{u^k}U^N.
  [3] Set
      u^{k+1} := exp^N_{u^k}(−t^k v^k).
end for
One might wonder whether an Armijo backtracking procedure could also be used for the stochastic setting; however, in Geiersbach
(2020), it was demonstrated how the Armijo backtracking rule when combined
with stochastic gradients fails in minimizing a function over the real line. Of
course, there are modifications possible. In the most basic version of the method,
ωk comprises a single sample randomly drawn from the probability space. One
might think that the problem could be remedied by simply taking multiple samples
ωk = (ωk,1 , . . . , ωk,mk ) at each iteration k and computing the empirical average
$$\nabla J(u^k, \omega^k) = \frac{1}{m_k} \sum_{i=1}^{m_k} \nabla J(u^k, \omega^{k,i}). \tag{33}$$
the literature to emphasize the numerical bias induced by the iteration. The analysis
in Martin et al. (2019), which gives efficient choices for the sample size mk , step-
size tk , and discretization error tolerance, works because the original problem is
strongly convex, problem parameters are well-known, and the meshes involved are
not deformed as part of the outer optimization loop. For more challenging problems,
these choices no longer apply and future analysis would be needed.
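The following Python sketch shows the batched estimator (33) on the same kind of toy quadratic objective used earlier. The objective, the noise model, and the schedule m_k = k + 1 are illustrative assumptions; in the chapter's setting, each gradient sample would instead require a state and an adjoint PDE solve.

import numpy as np

# Mini-batch stochastic gradient: average m_k independent samples per iteration.
rng = np.random.default_rng(1)
a, sigma = np.array([1.0, -2.0]), 0.5       # assumed toy data

def grad_sample(u):
    # one realization of the stochastic gradient of the toy objective
    return u - (a + sigma * rng.standard_normal(2))

u = np.zeros(2)
for k in range(200):
    m_k = k + 1                                                  # growing sample size
    g = np.mean([grad_sample(u) for _ in range(m_k)], axis=0)    # estimator (33)
    u = u - (1.0 / (k + 1)) * g

print(u)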
Again for optimal control problems with PDEs, but for a larger class of problems,
including nonsmooth and convex problems, the authors Geiersbach and Wollner
(2020) propose a different approximation scheme without needing to take additional
samples (meaning mk ≡ 1 is permissible). The proposed method uses averaging of
the iterate uk instead of the stochastic gradient. The descent is smoothed indirectly
without having to take additional samples at each iteration. This was shown to work
efficiently in combination with a mesh refinement rule, carefully coupled with the
step-size rule t^k. Extending these results to the context of shape optimization would
also be challenging, not only due to the analysis of numerical error and lack
of convexity; here, uk represents a shape, not an element from a Banach space, and
its “average” would need to be made precise.
A final connection to the shape space (BeN , gS ) is now desirable in view
of the following numerical experiments. Using the theoretical justification from
Theorem 1, it is possible to compute a deformation vector V = V (ω) ∈ H01 (D, Rd )
in the point u = (u1 , . . . , uN ) ∈ BeN by solving the variational problem
Numerical Investigations
with
$$j^{\mathrm{obj}}(u) := \frac{1}{2}\int_D \big(y(x) - \bar y(x)\big)^2\, dx = \frac{1}{2}\sum_{i=0}^{N} \int_{D_i} \big(y_i(x) - \bar y_i(x)\big)^2\, dx, \tag{35}$$
$$j^{\mathrm{reg}}(u) := \sum_{i=1}^{N} \nu_i \int_{u_i} 1\, dS \tag{36}$$
where n0 represents the outward normal vector on D0 . The equations (38)–(39) are
complemented by the transmission conditions
$$\kappa_i(x)\,\frac{\partial y_i}{\partial n_i}(x) + \kappa_0(x)\,\frac{\partial y_0}{\partial n_0}(x) = 0, \qquad y_i(x) - y_0(x) = 0 \quad \text{on } u_i,\ i = 1, \dots, N. \tag{40}$$
Note that the system (38), (39), and (40) can be compactly represented in the weak
formulation: find $y \in H^1_{\mathrm{av}}(D) := \{v \in H^1(D) \mid \int_D v\, dx = 0\}$ such that
$$\int_D \kappa(x)\,\nabla y(x) \cdot \nabla v(x)\, dx = \int_{\partial D} g(x)\, v(x)\, ds \quad \forall\, v \in H^1_{\mathrm{av}}(D).$$
Remark 9. In general, the distribution ȳ and the diffusion coefficient κ do not need
to have as high a regularity as assumed above to formulate the PDE-constrained
problem (37), (38), (39), (40). The regularity above is only needed for shape
differentiability of the objective functional (see Ito et al. 2008, Section 3.2).
The shape derivative of (37), (38), (39), (40) can be derived using standard
calculation techniques like the ones mentioned in section “Optimization on Shape
Spaces with Steklov–Poincaré Metric”, combined with the partial shape
derivative definition and Remark 4. Its volume formulation is given by
$$dj(u)[W] = \int_D -\kappa(x)\,\nabla y(x) \cdot \big(\nabla W(x) + \nabla W(x)^\top\big)\nabla p(x) - \big(y(x) - \bar y(x)\big)\,\nabla \bar y(x) \cdot W(x)\;\cdots$$
$$\kappa_i(x)\,\frac{\partial p_i}{\partial n_i}(x) + \kappa_0(x)\,\frac{\partial p_0}{\partial n_0}(x) = 0, \qquad p_i(x) - p_0(x) = 0 \quad \text{on } u_i,\ i = 1, \dots, N. \tag{44}$$
The sum of integrals over ui in (41) is the shape derivative of the perimeter
regularization, which is computed with the help of the partial shape derivative
definition as follows:
$$dj^{\mathrm{reg}}(u)[W] = \frac{d^+}{dt}\bigg|_{t=0} \sum_{i=1}^{N} \nu_i \int_{F_t^{W|_{\Omega_i}}(u_i)} 1\, dS,$$
where the last equality holds, thanks to Novruzi and Pierre (2002, Proposition 5.1).
This gives the i-th partial shape derivative $d_{u_i} j^{\mathrm{reg}}(u)[W|_{\Omega_i}]$ and thus the shape
derivative of the regularization term in (41).
Now, every object needed for the application of Algorithm 2 is given. In sec-
tion “Numerical Experiments”, this algorithm is applied to solve the deterministic
model problem.
For the stochastic model, the domain D is partitioned as described for the deter-
ministic model above. For a function f : D × → R, the function fi denotes
the restriction f |Di : Di × → R. The slightly abusive notation ∇fi (x, ω) =
∇x fi (x, ω) means ω is fixed and the gradient is to be understood with respect to
the variable x only. Additionally, the notation for the directional derivative means
∂fi
∂ni (x, ω) = limt→0 t (fi (x + t ni (x), ω) − fi (x, ω)). A parametrized objective
1
where
$$J^{\mathrm{obj}}(u, \omega) := \frac{1}{2}\int_D \big(y(x, \omega) - \bar y(x)\big)^2\, dx = \frac{1}{2}\sum_{i=0}^{N} \int_{D_i} \big(y_i(x, \omega) - \bar y_i(x)\big)^2\, dx \tag{45}$$
and J reg is defined as in (36). For simplicity, the source term g and the target term
ȳ are deterministic with the same regularity as in the previous section. Suppose
however that the source of uncertainty comes from the coefficients, i.e., κi =
κi (x, ω) are random fields with regularity κi ∈ L2 (, C 1 (Di )). This leads to a
modification of the deterministic problem
$$\min_{u \in B_e^N}\; j(u) := \mathbb{E}\big[J(u, \omega)\big] \tag{46}$$
$$\kappa_i(x, \omega)\,\frac{\partial y_i}{\partial n_i}(x, \omega) + \kappa_0(x, \omega)\,\frac{\partial y_0}{\partial n_0}(x, \omega) = 0 \quad \text{on } u_i \times \Omega,\ i = 1, \dots, N,$$
$$y_i(x, \omega) - y_0(x, \omega) = 0 \quad \text{on } u_i \times \Omega,\ i = 1, \dots, N. \tag{49}$$
Using standard techniques for calculating the shape derivative (see Geiersbach
et al. 2021, Appendix B), the shape derivative in volume formulation for a fixed ω
is given by
$$dJ(u, \omega)[W] = \int_D -\kappa(x, \omega)\,\nabla y(x, \omega) \cdot \big(\nabla W(x) + \nabla W(x)^\top\big)\nabla p(x, \omega)\;\cdots$$
where y = y(x, ω) satisfies the state equation (47), (48), and (49) and p = p(x, ω)
satisfies the adjoint equation
$$\kappa_i(x, \omega)\,\frac{\partial p_i}{\partial n_i}(x, \omega) + \kappa_0(x, \omega)\,\frac{\partial p_0}{\partial n_0}(x, \omega) = 0 \quad \text{on } u_i \times \Omega,\ i = 1, \dots, N,$$
$$p_i(x, \omega) - p_0(x, \omega) = 0 \quad \text{on } u_i \times \Omega,\ i = 1, \dots, N. \tag{52}$$
The construction of the coefficients κ for the purpose of simulations requires
some discussion. Karhunen–Loève expansions are frequently used to simulate
random perturbations of a coefficient within a material and are also used in the
experiments in section “Numerical Experiments”. Given a domain D̃, a (truncated)
Karhunen–Loève expansion of a random field a : D̃ × Ω → R takes the form
$$a(x, \omega) = \bar a(x) + \sum_{k=1}^{m} \sqrt{\gamma_k}\, \phi_k(x)\, \xi_k(\omega),$$
$$\tilde\kappa_i(x, \omega) = \bar\kappa_i(x) + \sum_{k=1}^{m_i} \sqrt{\gamma_{i,k}}\, \phi_{i,k}(x)\, \xi_{i,k}(\omega), \tag{53}$$
where $\bar\kappa_i : \tilde D \to \mathbb{R}$, $\xi_i(\omega) = (\xi_{i,1}(\omega), \dots, \xi_{i,m_i}(\omega)) \in \mathbb{R}^{m_i}$ is a random vector, and
$\gamma_{i,k}$ and $\phi_{i,k}$ denote the eigenvalues and eigenfunctions that depend on the domain
D̃. Finally, κi = κ̃i |Di . The coefficient κ over the domain D is then stitched together
by definition of
$$\kappa(x, \omega) = \kappa_0(x, \omega) + \sum_{i=1}^{N} \kappa_i(x, \omega)\, \mathbb{1}_{D_i}(x).$$
Numerical Experiments
Additionally, following the ideas from Schulz and Siebenborn (2016), at each
iteration k, an additional PDE is solved to choose values for the Lamé parameters
in the deformation equation. The parameter λ is set to zero, and μ is chosen from
the interval [μmin , μmax ] such that it is decreasing smoothly from ui , i = 1, . . . , N,
to the outer boundary ∂D. One possible way to model this behavior is to solve the
Poisson equation
$$\Delta\mu = 0 \quad \text{in } D_i,\ i = 0, \dots, N,$$
$$\mu = \mu_{\max} \quad \text{on } u_i,\ i = 1, \dots, N,$$
$$\mu = \mu_{\min} \quad \text{on } \partial D.$$
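A one-dimensional finite-difference analogue of this construction is sketched below in Python: a discrete Laplace equation is solved between a "shape" boundary, where μ = μ_max, and the outer boundary, where μ = μ_min, so that μ interpolates smoothly (here linearly) in between. The grid and the values of μ_min, μ_max are assumptions for illustration; the chapter solves the corresponding problem on D in two dimensions.

import numpy as np

# 1D Laplace problem mu'' = 0 with Dirichlet values mu_max (shape side) and
# mu_min (outer boundary), discretized by central differences.
mu_min, mu_max, n = 5.0, 25.0, 51         # assumed values and grid size
A = np.zeros((n, n))
b = np.zeros(n)
A[0, 0] = A[-1, -1] = 1.0
b[0], b[-1] = mu_max, mu_min              # boundary conditions
for i in range(1, n - 1):
    A[i, i - 1], A[i, i], A[i, i + 1] = 1.0, -2.0, 1.0   # discrete Laplacian rows

mu = np.linalg.solve(A, b)
print(mu[:3], mu[-3:])                    # mu decreases monotonically from mu_max to mu_min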
Fig. 5 The target shapes are displayed by the dotted lines. The outer domain D0k is displayed in
teal, the domain D1k is displayed in light green, and the subdomain D2k is shown in purple. The
figures show the progression of the initial configuration D 0 to the final subdomain configuration
D 400 . (a) Initial configuration D 0 . (b) D 50 . (c) D 200 . (d) D 400
Algorithm 3. An example with two shapes is used again, i.e., N = 2, and the same
target shape vector u∗ as in section “Deterministic Case: Behavior of Algorithm 2”
is considered. The same values for g and ν1 = ν2 are used.
To generate samples according to the discussion at the end of section “Stochastic
Model Problem”, for simplicity D̃ = D is used, allowing for the explicit
representations of the eigenfunctions and eigenvalues in (53). From Lord et al. 2014,
Example 9.37, the eigenfunctions and eigenvalues on D are given by the formula
$$\tilde\phi_{jk}(x) := 2\cos(j\pi x_2)\cos(k\pi x_1), \qquad \tilde\gamma_{jk} := \frac{1}{4}\exp\big(-\pi(j^2 + k^2)\,l^2\big), \qquad j, k \geq 1,$$
Fig. 7 Vector fields V k are displayed that result from solving the deformation equation (23) at
iteration k. (a) Vector field V 0 . (b) Vector field V 3
where terms are then reordered so that the eigenvalues appear in descending order
(i.e., φ_1 = φ̃_{11} and γ_1 = γ̃_{11}). The correlation length l = 0.5 and the number of
summands M = 20 are fixed. For simplicity of presentation, each subdomain has
the same eigenfunctions and eigenvalues, and only the means and random vectors
are modified. More precisely, (53) has the representation
Fig. 8 Objective function and norm of the shape gradient as a function of iteration number (log/log
scale). (a) Objective function decay. (b) Deformation vector field
$$\tilde\kappa_i(x, \omega) = \bar\kappa_i(x) + \sum_{k=1}^{20} \sqrt{\gamma_k}\, \phi_k(x)\, \xi_{i,k}(\omega), \tag{54}$$
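A Python sketch of sampling the truncated field (54) on a grid of the unit square is given below, using the cosine eigenpairs above reordered by descending eigenvalue. The constant mean κ̄_i, the distribution of the random vector ξ_i, and the grid resolution are assumptions made only for this illustration.

import numpy as np

# Sample one realization of the truncated Karhunen-Loeve field (54) on a grid.
rng = np.random.default_rng(2)
l, M = 0.5, 20
x1, x2 = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))

pairs = []
for j in range(1, 8):
    for k in range(1, 8):
        gamma = 0.25 * np.exp(-np.pi * (j**2 + k**2) * l**2)     # eigenvalue
        phi = 2.0 * np.cos(j * np.pi * x2) * np.cos(k * np.pi * x1)  # eigenfunction on the grid
        pairs.append((gamma, phi))
pairs.sort(key=lambda p: -p[0])            # descending eigenvalues
pairs = pairs[:M]                          # keep the 20 largest terms

kappa_bar = 10.0                           # assumed constant mean
xi = rng.uniform(-1.0, 1.0, size=M)        # assumed distribution of the random vector
kappa = kappa_bar + sum(np.sqrt(g) * xi[m] * phi for m, (g, phi) in enumerate(pairs))
print(kappa.shape, kappa.min(), kappa.max())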
Fig. 9 Two examples of the random field κ, with the left, right, and bottom scales corresponding to
the outer domain D_0^k, the ellipse D_1^k, and the tube D_2^k, respectively. (a) Example realization of
κ at iteration k = 100. (b) Example realization of κ at iteration k = 300
produces excellent results as shown in Fig. 10. Even in the presence of noise, the
progression of the subdomains resembles that shown in Fig. 5.
Figure 11 provides a stochastic counterpart to Fig. 8, in which one sees the
progression of the parametrized functional J (uk , ωk ) as well as the vector field
V k = V k (ωk ), where ωk represents the abstract realization from the probability
space in iteration k, which is manifested by the specific realizations of the random
vectors (ξi,1 (ωk ), . . . , ξi,20 (ωk )), i = 0, 1, 2, used in the random fields. In contrast
to the Armijo line search rule, the Robbins–Monro step-size rule does not guarantee
descent in every iteration. Moreover, the information displayed in the plots can
only provide estimates for the true objective j(u^k) = E[J(u^k, ·)] and the average
E[‖V^k(·)‖_{H^1(D,R^2)}]. Although small oscillations in the shapes were observed in
the course of the algorithm, the oscillations in the plots come more from the
stochastic error occurring due to J(u^k, ω^k) ≈ E[J(u^k, ·)] and ‖V^k(ω^k)‖_{H^1(D,R^2)} ≈
E[‖V^k(ω)‖_{H^1(D,R^2)}]. The log/log scale misleadingly exaggerates these oscillations
for higher iteration numbers, and the Robbins–Monro step-size rule tended to
dampen oscillations in the shapes for higher iterations. However, even with the
oscillations, descent is seen on average in both the parametrized objective and in
the H^1-norm of the randomly generated deformation vector fields.
Fig. 10 The target shapes are displayed by the dotted lines. The figures show the progression
from the initial configuration of domains D 0 to the final configuration of domains D 400 . (a) Initial
configuration D 0 . (b) D 50 . (c) D 200 . (d) D 400
For the generation of the target data ȳ in the tracking-type objective functional,
the target shape u∗ is chosen to be the boundary of an ellipse as illustrated by the
dotted lines in Fig. 12. The target distribution ȳ is computed on the target domain
D^* = D_0^* ∪ D_1^* ∪ u^* by solving the state equation (38), (39), and (40) using the
constant values κ̄0 = 1000 over the outer domain D0∗ and κ̄1 = 7.5 defined over the
ellipse D1∗ . The target data can be seen in Fig. 13. As in the previous experiments,
algorithms are run for 400 iterations. The results of the simulation are shown in
Fig. 12, where the target shape u∗ is represented by dotted lines. The same initial
configuration, shown in Fig. 12a, is used for three separate runs of the algorithm.
In the first run, the stochastic model with the parameters described in the previous
paragraph is used, and the stochastic gradient method (Algorithm 3) is applied with
the step-size rule t^k = 0.026 for k = 0, . . . , 200, and t^k = 0.026/(k − 200) for k > 200.
Fig. 11 Objective function and norm of the shape gradient as a function of iteration number
(log/log scale). (a) Objective function decay. (b) Deformation vector field
Conclusion
This chapter gives an overview of how the theory of (PDE-constrained) shape opti-
mization can be connected with the differential geometric structure of shape space
and how this theory can be adapted to handle harder problems containing multiple
shapes and uncertainties. The framework presented is focused on shape spaces as
Riemannian manifolds, in particular, on the space of smooth shapes and the Steklov–
Poincaré metric. The Steklov–Poincaré metric allows for the usage of the shape
derivative in its volume expression in optimization methods. A novel framework
developed in this chapter is the product shape space, which allows for shape optimization
over a vector of shapes. As part of this framework, new concepts including the
Fig. 12 The figures show the initial configuration in (a) and the configuration computed using
the stochastic model and stochastic gradient approach in (b). Using the lower bound choices
produces an incorrect identification in (c); with the upper bound choices, the target shape is
likewise incorrectly identified. (a) Initial configuration D 0 . (b) D 400 for stochastic model. (c) D 400
produced using the constants κi,min , i = 0, 1. (d) D 400 using the constants κi,max , i = 0, 1
partial and multi-shape derivatives are presented. The steepest descent method
with Armijo backtracking on product shape spaces is formulated to solve a shape
optimization problem over a vector of shapes.
The second area of focus in this chapter is concerned with shape optimization
problems subject to uncertainty. The problem is posed as a minimization of the
expectation of a random objective functional depending on uncertain parameters.
Using the product shape space framework, it is straightforward to consider stochastic
shape optimization problems depending on shape vectors. Corresponding defini-
tions for the stochastic partial and multi-shape gradient are presented. These are
needed to present the stochastic gradient method on product shape spaces. It is
discussed how the stochastic shape derivative in its volume expression can be used
algorithmically.
The final part of the chapter is dedicated to carefully designed numerical
simulations showing the performance of the algorithms. Compatible deterministic
and stochastic problems are presented. A novel technique for producing stochastic
samples of the Karhunen–Loève type is presented. The stochastic model is shown
in experiments to be robust if a model for the uncertainties is present.
The new framework provides a rigorous justification for computing descent
vectors “all-at-once” on a hold-all domain. Moreover, new concepts like the partial
shape derivatives and multi-shape derivatives provide tools that could be used
in other applications. There are some open questions; for one, it is not clear
how descent directions in general prevent shapes from intersecting as part of the
optimization procedure. If shapes were to intersect, mesh deformation methods like the kind used here would
result in broken meshes. While the algorithms presented do not rely on remeshing,
it is notable that meshes lose their integrity if initial shapes are chosen too far away
from the target. These challenges will be addressed in other works.
References
Abraham, R., Marsden, J., Ratiu, T.: Manifolds, tensor analysis, and applications, vol. 75. Springer
Science & Business Media, New York, USA (2012)
Absil, P., Mahony, R., Sepulchre, R.: Optimization algorithms on matrix manifolds. Princeton
University Press, Princeton, USA (2008)
Alnæs, M.S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring,
J., Rognes, M.E., Wells, G.N.: The FEniCS project version 1.5. Arch. Numer. Softw. 3(100)
(2015). https://doi.org/10.11588/ans.2015.100.20553
Babuska, I., Tempone, R., Zouraris, G.: Galerkin finite element approximations of stochastic
elliptic partial differential equations. SIAM J. Numer. Anal. 42(2), 800–825 (2004)
Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential
equations with random input data. SIAM J. Numer. Anal. 45(3), 1005–1034 (2007)
Bänsch, E., Morin, P., Nochetto, R.H.: A finite element method for surface diffusion: the parametric
case. J. Comp. Phys. 203(1), 321–343 (2005). https://doi.org/10.1016/j.jcp.2004.08.022
Bauer, M., Harms, P., Michor, P.: Sobolev metrics on shape space of surfaces. J. Geom. Mech. 3(4),
389–438 (2011)
Bauer, M., Harms, P., Michor, P.: Sobolev metrics on shape space II: weighted Sobolev metrics and
almost local metrics. J. Geom. Mech. 4(4), 365–383 (2012)
Berggren, M.: A unified discrete-continuous sensitivity analysis method for shape optimization. In:
Fitzgibbon, W., et al. (eds.) Applied and numerical partial differential equations. Computational
methods in applied sciences, vol. 15, pp. 25–39. Springer (2010)
Cheney, M., Isaacson, D., Newell, J.: Electrical impedance tomography. SIAM Rev. 41(1), 85–101
(1999)
Dambrine, M., Dapogny, C., Harbrecht, H.: Shape optimization for quadratic functionals and states
with random right-hand sides. SIAM J. Control Optim. 53, 3081–3103 (2015)
Dambrine, M., Harbrecht, H., Puig, B.: Incorporating knowledge on the measurement noise in
electrical impedance tomography. ESAIM: Control Optim. Calc. Var. 25, 84 (2019)
Delfour, M., Zolésio, J.P.: Shapes and geometries: Metrics, analysis, differential calculus,
and optimization. Advanced design control, vol. 22, 2nd edn. SIAM, Philadelphia, USA
(2001)
Doǧan, G., Morin, P., Nochetto, R.H., Verani, M.: Discrete gradient flows for shape optimization
and applications. Comput. Meth. Appl. Mech. Eng. 196(37–40), 3898–3914 (2007). https://doi.
org/10.1016/j.cma.2006.10.046
Droske, M., Rumpf, M.: Multi scale joint segmentation and registration of image morphology.
IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2181–2194 (2007)
Etling, T., Herzog, R., Loayza, E., Wachsmuth, G.: First and second order shape optimization
based on restricted mesh deformations. SIAM J. Scient. Comput. 42(2), A1200–A1225 (2020).
https://doi.org/10.1137/19m1241465
Evans, L.: Partial differential equations. graduate studies in mathematics, vol. 19. American
Mathematical Society, Providence, USA (1998)
Feppon, F., Allaire, G., Bordeu, F., Cortial, J., Dapogny, C.: Shape optimization of a coupled
thermal fluid-structure problem in a level set mesh evolution framework. SeMA J. Boletin de la
Sociedad Espanñola de Matemática Aplicada. 76(3), 413–458 (2019). https://doi.org/10.1007/
s40324-018-00185-4
Fuchs, M., Jüttler, B., Scherzer, O., Yang, H.: Shape metrics based on elastic deformations. J. Math.
Imaging Vis. 35(1), 86–102 (2009)
Gangl, P., Laurain, A., Meftahi, H., Sturm, K.: Shape optimization of an electric motor subject to
nonlinear magnetostatics. SIAM J. Sci. Comput. 37(6), B1002–B1025 (2015)
Geiersbach, C.: Stochastic approximation for PDE-constrained optimization under uncertainty.
Ph.D. thesis, University of Vienna (2020)
Geiersbach, C., Pflug, G.C.: Projected stochastic gradients for convex constrained problems in
Hilbert spaces. SIAM J. Optim. 29(3), 2079–2099 (2019)
Geiersbach, C., Scarinci, T.: Stochastic proximal gradient methods for nonconvex problems in
Hilbert spaces. Comput. Optim. Appl. 3(78), 705–740 (2021). https://doi.org/10.1007/s10589-
020-00259-y
Geiersbach, C., Wollner, W.: A stochastic gradient method with mesh refinement for pde-
constrained optimization under uncertainty. SIAM J. Sci. Comput. 42(5), A2750–A2772 (2020)
Geiersbach, C., Loayza-Romero, E., Welker, K.: Stochastic approximation for optimization in
shape spaces. SIAM J. Optim. 31(1), 348–376 (2021)
Haber, E., Chung, M., Herrmann, F.: An effective method for parameter estimation with PDE
constraints with multiple right-hand sides. SIAM J. Optim. 22(3), 739–757 (2012)
Hardesty, S., Kouri, D., Lindsay, P., Ridzal, D., Stevens, B., Viertel, R.: Shape optimization for
control and isolation of structural vibrations in aerospace and defense applications. techreport,
Office of Scientific and Technical Information (OSTI) (2020). https://doi.org/10.2172/1669731
Haubner, J., Siebenborn, M., Ulbrich, M.: A continuous perspective on shape optimization via
domain transformations. SIAM J. Scient. Comput. 43(3), A1997–A2018 (2020). https://doi.
org/10.1137/20m1332050
Herzog, R., Loayza-Romero, E.: A manifold of planar triangular meshes with complete riemannian
metric (2020). ArXiv:2012.05624
Hiptmair, R., Paganini, A.: Shape optimization by pursuing diffeomorphisms. Comput. Methods
Appl. Math. 15(3), 291–305 (2015)
Hiptmair, R., Jerez-Hanckes, C., Mao, S.: Extension by zero in discrete trace spaces: inverse
estimates. Math. Comput. 84(296), 2589–2615 (2015)
Hiptmair, R., Paganini, A., Sargheini, S.: Comparison of approximate shape gradients. BIT. Num.
Math. 55(2), 459–485 (2015). https://doi.org/10.1007/s10543-014-0515-z
Hiptmair, R., Scarabosio, L., Schillings, C., Schwab, C.: Large deformation shape uncertainty
quantification in acoustic scattering. Adv. Comput. Math. 44(5), 1475–1518 (2018)
Ito, K., Kunisch, K.: Lagrange Multiplier Approach to Variational Problems and Applications.
Advanced Design Control, vol. 15. SIAM, Philadelphia, USA (2008)
Ito, K., Kunisch, K., Peichl, G.: Variational approach to shape derivatives. ESAIM Control Optim.
Calc. Var. 14(3), 517–539 (2008)
Kendall, D.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond.
Math. Soc. 16(2), 81–121 (1984)
Kriegl, A., Michor, P.: The convenient setting of global analysis. In Mathematical surveys and
monographs, vol. 53. American Mathematical Society, Providence, USA (1997). https://books.
google.de/books?id=l-XxBwAAQBAJ
Kwon, O., Woo, E.J., Yoon, J., Seo, J.: Magnetic resonance electrical impedance tomography
(MREIT): simulation study of J -substitution algorithm. IEEE Trans. Biomed. Eng. 49(2),
160–167 (2002)
Laurain, A., Sturm, K.: Domain expression of the shape derivative and application to electrical
impedance tomography. Technical Report No. 1863, Weierstraß-Institut für angewandte Analy-
sis und Stochastik, Berlin (2013)
Laurain, A., Sturm, K.: Distributed shape derivative via averaged adjoint method and applications.
ESAIM: Math. Model. Numer. Anal. 50(4), 1241–1267 (2016)
Ling, H., Jacobs, D.: Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach.
Intell. 29(2), 286–299 (2007)
Liu, D., Litvinenko, A., Schillings, C., Schulz, V.: Quantification of airfoil geometry-induced
aerodynamic uncertainties—comparison of Approaches (2017)
Lord, G., Powell, C., Shardlow, T.: An introduction to computational stochastic PDEs. Cambridge
University Press, Cambridge, UK (2014)
Luft, D., Schulz, V.: Pre-shape calculus and its application to mesh quality optimization. Control.
Cybern. 50(3), 263–301 (2021a). https://doi.org/10.2478/candc-2021--0019. ArXiv:2012.09124
Luft, D., Schulz, V.: Simultaneous shape and mesh quality optimization using pre-shape cal-
culus. Control. Cybern. 50(4), 473–520 (2021b) https://doi.org/10.2478/candc-2021--0028.
ArXiv:2103.15109
Martin, M., Krumscheid, S., Nobile, F.: Analysis of stochastic gradient methods for PDE-
constrained optimal control problems with uncertain parameters. Tech. rep., École Polytech-
nique MATHICSE Institute of Mathematics (2018)
Martin, M., Nobile, F., Tsilifis, P.: A multilevel stochastic gradient method for pde-constrained
optimal control problems with uncertain parameters. arXiv preprint arXiv:1912.11900 (2019)
Martínez-Frutos, J., Herrero-Pérez, D., Kessler, M., Periago, F.: Robust shape optimization of
continuous structures via the level set method. Comput. Methods Appl. Mech. Eng. 305,
271–291 (2016)
Michor, P., Mumford, D.: Vanishing geodesic distance on spaces of submanifolds and diffeomor-
phisms. Doc. Math. 10, 217–245 (2005)
Michor, P., Mumford, D.: Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc.
(JEMS) 8(1), 1–48 (2006)
Michor, P., Mumford, D.: An overview of the Riemannian metrics on spaces of
curves using the Hamiltonian approach. Appl. Comput. Harmon. Anal. 23(1), 74–113
(2007)
Morin, P., Nochetto, R.H., Pauletti, M.S., Verani, M.: Adaptive finite element method for shape
optimization. ESAIM Control Optim. Calc. Var. 18(4), 1122–1149 (2012). https://doi.org/10.
1051/cocv/2011192
Novruzi, A., Pierre, M.: Structure of shape derivatives. J. Evol. Equ. 2(3), 365–382 (2002)
O’Neill, B.: Semi-Riemannian geometry with applications to relativity. Academic Press, London,
UK (1983)
Onyshkevych, S., Siebenborn, M.: Mesh quality preserving shape optimization using nonlinear
extension operators. J Optim. Theory. Appl. 189(1), 291–316 (2021). https://doi.org/10.1007/
s10957-021-01837-8
Paganini, A.: Approximative shape gradients for interface problems. In: Pratelli, A., Leugering, G.
(eds.) New trends in shape optimization. International series of numerical mathematics, vol. 166,
pp. 217–227. Springer (2015)
Quarteroni, A., Valli, A.: Domain decomposition methods for partial differential equations. Oxford
University Press, Oxford, UK (1999)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407
(1951)
Schulz, V.: A Riemannian view on shape optimization. Found. Comput. Math. 14(3), 483–501
(2014)
Schulz, V., Siebenborn, M.: Computational comparison of surface metrics for PDE constrained
shape optimization. Comput. Methods Appl. Math. 16(3), 485–496 (2016)
Schulz, V., Welker, K.: On optimization transfer operators in shape spaces. In: Shape optimization,
homogenization and optimal Control, pp. 259–275. Springer (2018)
Schulz, V., Siebenborn, M., Welker, K.: Structured inverse modeling in parabolic diffusion
problems. SIAM J. Control Optim. 53(6), 3319–3338 (2015)
Schulz, V., Siebenborn, M., Welker, K.: Efficient PDE constrained shape optimization based on
Steklov-Poincaré type metrics. SIAM J. Optim. 26(4), 2800–2819 (2016)
Schwab, C., Gittelson, C.: Sparse tensor discretizations of high-dimensional parametric and
stochastic pdes. Acta Numer. 20, 291–467 (2011)
Shapiro, A., Wardi, Y.: Convergence analysis of gradient descent stochastic algorithms. J. Optim.
Theory Appl. 91(2), 439–454 (1996)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on stochastic programming: modeling and
theory. SIAM, Philadelphia, USA (2009)
Siebenborn, M., Vogel, A.: A shape optimization algorithm for cellular composites. PINT
Computing and Visualization in Science (2021). ArXiv:1904.03860
Siebenborn, M., Welker, K.: Algorithmic aspects of multigrid methods for optimization in shape
spaces. SIAM J. Sci. Comput. 39(6), B1156–B1177 (2017)
Sokolowski, J., Zolésio, J.: Introduction to shape optimization. In: Computational mathematics,
vol. 16. Springer (1992)
Sturm, K.: Lagrange method in shape optimization for non-linear partial differential equations:
a material derivative free approach. Technical Report No. 1817, Weierstraß-Institut für ange-
wandte Analysis und Stochastik, Berlin (2013)
Sturm, K.: Shape optimization with nonsmooth cost functions: from theory to numerics. SIAM J.
Control Optim. 54(6), 3319–3346 (2016). https://doi.org/10.1137/16M1069882
1630 C. Geiersbach et al.
Wardi, Y.: Stochastic algorithms with armijo stepsizes for minimization of functions. J. Optim.
Theory Appl. 64(2), 399–417 (1990)
Welker, K.: Efficient PDE constrained shape optimization in shape spaces. Ph.D. thesis, Universität
Trier (2016)
Welker, K.: Suitable spaces for shape optimization. Appl. Math. Optim. (2021). https://doi.org/10.
1007/s00245-021-09788-2
Wirth, B., Rumpf, M.: A nonlinear elastic shape averaging approach. SIAM J. Imag. Sci. 2(3),
800–833 (2009)
Wirth, B., Bar, L., Rumpf, M., Sapiro, G.: A continuum mechanical approach to geodesics in shape
space. Int. J. Comput. Vis. 93(3), 293–318 (2011)
Zolésio, J.P.: Control of moving domains, shape stabilization and variational tube formulations.
Int. Ser. Numer. Math. 155, 329–382 (2007)
Iterative Methods for Computing
Eigenvectors of Nonlinear Operators 46
Guy Gilboa
Contents
Introduction and Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1632
One-Homogeneous Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1632
Eigenvectors of Nonlinear Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1634
Nossek-Gilboa (NG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1636
NG Flow Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1637
NG Iteration Algorithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1639
Aujol-Gilboa-Papadakis (AGP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1639
AGP Flow Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1640
AGP Iteration Algorithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1641
Feld-Aujol-Gilboa-Papadakis (FAGP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1641
Cohen-Gilboa (CG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1644
Bungert-Hait-Papadakis-Gilboa (BHPG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1647
Evaluation and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1649
Global and Local Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1649
Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1651
Conclusion, Discussion and Open Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1652
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1656
Abstract
In this chapter we examine several iterative methods for solving nonlinear eigenvalue problems. These arise in variational image processing, graph partitioning and classification, nonlinear physics, and more. The canonical eigenproblem we solve is $T(u) = \lambda u$, where $T : \mathbb{R}^n \to \mathbb{R}^n$ is some bounded nonlinear operator. Other variations of eigenvalue problems are also discussed. We present a progression of five algorithms, coauthored in recent years by the author and colleagues.
G. Gilboa ()
Technion – IIT, Haifa, Israel
e-mail: [email protected]
Keywords
Introduction and Preliminaries

In this section, we outline some basic notation and properties that will be used throughout this chapter. The main type of functionals discussed here are one-homogeneous functionals, frequently used as regularizers in image processing and learning.
One-Homogeneous Functionals
An absolutely one-homogeneous functional $J$ admits
$$J(c\,u) = |c|\, J(u), \qquad \forall\, c \in \mathbb{R}. \qquad (1)$$
In finite dimensions, such a functional can be, for instance, of the general form:
$$J(u) = \left( \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, |u_i - u_j|^q \right)^{1/q}, \qquad (2)$$
where $q \ge 1$ and $w_{ij} \ge 0$ are weights.
From the equivalence of norms, we have that if $u$ is of zero mean, there exists a constant $\kappa > 0$ for which $\kappa\,\|u\|_2 \le J(u)$. Moreover, any subgradient has zero mean,
$$\langle p, 1\rangle = 0, \qquad \forall\, p \in \partial J(u).$$
We use the $\ell_2$ and $\ell_1$ norms of $u$, defined as $\|u\|_2 = \sqrt{\langle u, u\rangle}$ and $\|u\|_1 = \langle u, \mathrm{sign}(u)\rangle$.
Eigenvectors of Nonlinear Operators

In the linear setting, an eigenvalue problem takes the form
$$L u = \lambda u,$$
where $L$ is a linear operator. A classical example is the eigenvalue problem of the (negative) Laplacian,
$$-\Delta u = \lambda u,$$
where Δ denotes the Laplacian. For appropriate boundary conditions, sines and
cosines are solutions to this problem, which are the basis elements of the Fourier
transform. For one-homogeneous regularizing functionals, such as total variation,
one obtains different (sharp) eigenfunctions, which can serve for representing
signals based on nonlinear spectral transforms, as shown in Gilboa (2013, 2014,
2018), Burger et al. (2016), and Bungert et al. (2019a). We will not elaborate on this direction, as it is beyond the scope of this chapter.
For absolutely one-homogeneous functionals, the eigenvalues are nonnegative, since $J(u) = \langle \lambda u, u\rangle = \lambda\,\|u\|_2^2$ and $\lambda = \frac{J(u)}{\|u\|_2^2} \ge 0$. An interesting insight on the eigenvalue $\lambda$, shown in Aujol et al. (2018), can be gained by the following proposition. We define $K = \partial J(0)$ to be the set of possible subgradients for any $u$; indeed, if $p \in \partial J(u)$, then $p \in \partial J(0)$. We first note that an eigenfunction that admits $\lambda u \in \partial J(u)$ has zero mean, from the property above. Next, we have the following result.
Proposition. For any nonconstant eigenfunction $u$, we have, for all $\mu \ge \lambda$,
$$\lambda u = \mathrm{Proj}_{K}(\mu u),$$
where $\mathrm{Proj}_K$ denotes the orthogonal projection onto $K$.
In the variational setting, the basic nonlinear eigenproblem is
$$\lambda u \in \partial J(u). \qquad (7)$$
Eigenfunctions in the form of (7) have analytic solutions, when used as initial conditions in gradient flows. Let a gradient flow be defined by
$$u_t = -p, \quad p \in \partial J(u(t)), \qquad u(t=0) = f, \qquad (8)$$
where $u_t$ is the first-time derivative of $u(t;x)$. As shown in Burger et al. (2016), when the flow is initialized with an eigenfunction $f$ (i.e., $\lambda f \in \partial J(f)$), the following solution is obtained:
$$u(t;x) = (1 - \lambda t)^{+} f(x), \qquad (9)$$
where (q)+ = q for q > 0 and 0 otherwise. This means that the shape f (x) is
spatially preserved and changes only by contrast reduction throughout time. An
analytic solution (see Benning and Burger 2013, and Burger et al. 2016) can be
shown for the proximal problem as well, that is, a minimization with the squared $\ell_2$ norm:
$$\min_u\ J(u) + \frac{\alpha}{2}\|f - u\|_2^2. \qquad (10)$$
In this case, when f is an eigenfunction and α ∈ R+ (R+ = {x ∈ R | x ≥ 0}) is
fixed, the problem has the following solution:
$$u(x) = \left(1 - \frac{\lambda}{\alpha}\right)^{+} f(x). \qquad (11)$$
In this case also, u(x) preserves the spatial shape of f(x) (as long as α > λ). This was already observed by Meyer (2001) for the case of a disk, with J the TV functional. Earlier research on nonlinear eigenfunctions induced by TV that are indicator functions of sets has been referred to as the study of calibrable sets. First aspects of this line
of research can be found in the work of Bellettini et al. (2002). They introduced
a family of convex bounded sets C with finite perimeter in R2 that preserve their
boundary throughout the TV flow (gradient flow (8) where J is TV). It is shown
that the indicator function $1_C$ of a set $C$, with perimeter $P(C)$ and area $|C|$, whose boundary curvature $\kappa$ admits
$$\operatorname*{ess\,sup}_{p\in\partial C} \kappa(p) \le \frac{P(C)}{|C|}, \qquad (12)$$
is an eigenfunction in the sense of (7), with eigenvalue
$$\lambda_C = \frac{P(C)}{|C|}. \qquad (13)$$
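As a concrete illustration of (12) and (13) (a standard example, added here for clarity and matching the disk case of Meyer mentioned above): for a disk $C$ of radius $r$ in $\mathbb{R}^2$, we have $P(C) = 2\pi r$, $|C| = \pi r^2$, and the boundary curvature is $\kappa \equiv 1/r$, so
$$\operatorname*{ess\,sup}_{p\in\partial C} \kappa(p) = \frac{1}{r} \ \le\ \frac{2\pi r}{\pi r^2} = \frac{2}{r} = \frac{P(C)}{|C|},$$
hence (12) holds and $1_C$ is an eigenfunction with eigenvalue $\lambda_C = 2/r$; larger disks correspond to smaller eigenvalues.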
We will also address here ways of how to solve such problems. In the variational context, $T$ and $Q$ in the eigenproblem $T(u) = \lambda Q(u)$ of (14) are subgradient elements of two different convex functionals, $J$ and $H$; thus (14) is rewritten as
$$p = \lambda\, q, \qquad p \in \partial J(u),\ q \in \partial H(u). \qquad (15)$$
This type of problem appears in the relaxation of the Cheeger cut problem, where
J is TV and H is the $\ell_1$ norm; see Hein and Bühler (2010), Szlam and Bresson (2010),
and Feld et al. (2019). There are several additional algorithms which attempt to
compute nonlinear eigenfunctions in some specific settings. In Bozorgnia (2016,
2019), algorithms for computing the smallest eigenvalue and eigenfunction of the
p-Laplacian are proposed, along with convergence proofs. As part of analyzing
variational networks, Effland et al. (2020) analyze the learned regularizers by
computing their eigenfunctions. This is performed by minimizing a generalized
Rayleigh quotient using accelerated gradient descent. In the process of nonlinear
spectral decomposition based on gradient descent (Gilboa 2014; Burger et al. 2016),
near extinction time only a single eigenfunction “survives.” This idea is formalized
in Bungert et al. (2019b) where eigenfunctions are computed by taking the limit
at extinction time of a gradient flow. Gautier et al. (2019, 2020) have used power
iterations to solve several nonlinear eigenpair problems. Existence and uniqueness
results were obtained based on Perron-Frobenius theory.
We will now present in detail five algorithms, coauthored by the author and
colleagues, to solve various types of nonlinear eigenvalue problems. Some of the
iterative algorithms can be understood as a discretization in time of a continuous
nonlinear flow.
Nossek-Gilboa (NG)
This simple algorithm, presented first in Nossek and Gilboa (2018), was the first of
a series of algorithms, which stem from nonlinear flows. These flows reach a steady
state only at eigenfunctions. Different initial conditions yield different steady states.
The goal of the (NG) algorithm is to provide a solution to the nonlinear eigenvalue problem (7), where J is an absolutely one-homogeneous functional, admitting (1). We assume a constant unit vector is in its null-space (see the zero-mean property above). The proposed
nonlinear flow is
$$u_t = \frac{u}{\|u\|_2} - \frac{p}{\|p\|_2}, \qquad p \in \partial J(u), \qquad (16)$$
$$u^{k+1} = u^k + \Delta t \left( \frac{u^{k+1}}{\|u^k\|_2} - \frac{p^{k+1}}{\|p^k\|_2} \right), \qquad p^{k+1} \in \partial J(u^{k+1}). \qquad (17)$$
NG Flow Properties
There are several desired properties of this flow. Although it does not emerge as a
gradient flow of a certain energy functional, the solution becomes smoother with
time (in terms of the regularizing functional J). On the other hand, the $\ell_2$ norm of the solution is increasing. The main properties are summarized in the following theorem. In this case the proof is presented, as it is relatively simple to follow (it is
based on Nossek and Gilboa 2018 and Aujol et al. 2018). This allows us to get the
intuition of how such flows behave. In subsequent parts, proofs are omitted, and we
refer the reader to the relevant papers for details, to avoid a lengthy presentation.
Theorem 1. Assume that there exists a solution u in W 1,2 ((0, T ); X), T > 0, of
the flow (16). Then the following properties hold:
$$\frac{d}{dt}\, \frac{1}{2}\|u(t)\|_2^2 \ge 0, \qquad (19)$$
$$\frac{d}{dt}\, J(u(t)) \le 0 \quad \text{for almost every } t. \qquad (20)$$
Proof. Recalling that $\langle p, u\rangle \le \|p\|_2 \|u\|_2$, this flow ensures that
$$\frac{d}{dt}\,\frac{1}{2}\|u(t)\|_2^2 = \langle u, u_t\rangle = \left\langle u,\ \frac{u}{\|u\|_2} - \frac{p}{\|p\|_2}\right\rangle = \|u\|_2 - \frac{\langle u, p\rangle}{\|p\|_2} \ge 0.$$
To differentiate $J(u(t))$ in time, we use the chain rule for convex functionals (Brezis 1973):
$$\frac{d}{dt} F(v(t)) = \langle z, v_t\rangle, \qquad \forall\, z \in \partial F(v(t)),\ \text{a.e. in } (0,T).$$
Taking $F = J$, $v = u$, and $z = p \in \partial J(u)$, and using $\langle p, u\rangle = J(u) \le \|p\|_2\|u\|_2$, we get
$$\frac{d}{dt} J(u(t)) = \langle p, u_t\rangle = \frac{\langle p, u\rangle}{\|u\|_2} - \|p\|_2 \le 0.$$
This inequality holds for almost every $t$, and since $t \mapsto J(u(t))$ is an absolutely continuous function, we deduce that it is a nonincreasing function.
At a steady state, $u_t = 0$, i.e., $u/\|u\|_2 = p/\|p\|_2$, hence
$$p = \frac{\|p\|_2}{\|u\|_2}\, u \in \partial J(u) \quad \Rightarrow \quad p = \frac{J(u)}{\|u\|_2^2}\, u,$$
and $u$ is an eigenfunction of $J$ with eigenvalue $\lambda = \frac{J(u)}{\|u\|_2^2}$.
Theorem 2. The solution uk of the discrete flow (17) of Algorithm 1 has the
following properties:
(i) $\langle u^k, 1\rangle = 0$.
(ii) $\|p^{k+1}\|_2 \le \|p^k\|_2$.
(iii) $\|u^{k+1}\|_2 \ge \|u^k\|_2$.
(iv) $\dfrac{J(u^{k+1})}{\|u^{k+1}\|_2} \le \dfrac{J(u^k)}{\|u^k\|_2}$.
(v) A necessary and sufficient condition for a steady state, $u^{k+1} = u^k$, is that $u^k$ is an eigenfunction, admitting (7).
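To make the scheme concrete, the following is a minimal numerical sketch of a fully explicit variant of the flow (16), in which the current subgradient $p^k$ is used instead of the implicit $p^{k+1}$ of (17); it is not the semi-implicit Algorithm 1, and the choice of $J$ (anisotropic 1-D TV) and the helper names are illustrative assumptions only.

```python
import numpy as np

def subgrad_tv1d(u):
    # A valid subgradient of J(u) = sum_i |u_{i+1} - u_i| (anisotropic 1-D TV):
    # p = D^T sign(D u), with D the forward-difference operator (D 1 = 0,
    # so <p, 1> = 0, consistent with the zero-mean property).
    d = np.sign(np.diff(u))
    p = np.zeros_like(u, dtype=float)
    p[:-1] -= d
    p[1:] += d
    return p

def ng_flow_explicit(u0, subgrad_J=subgrad_tv1d, dt=0.05, n_iter=5000):
    """Explicit discretization of u_t = u/||u||_2 - p/||p||_2, p in dJ(u)."""
    u = u0 - u0.mean()                      # zero-mean initialization
    for _ in range(n_iter):
        p = subgrad_J(u)
        norm_p = np.linalg.norm(p)
        if norm_p == 0:                     # u lies in the null space of J
            break
        u = u + dt * (u / np.linalg.norm(u) - p / norm_p)
    return u
```

At (approximate) convergence, the eigenvalue can be estimated as $\lambda = J(u)/\|u\|_2^2$, and the angle criterion (55) discussed later can serve as a stopping rule.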
Aujol-Gilboa-Papadakis (AGP)
In Aujol et al. (2018), the authors proposed a generalized flow for solving (7), which
is more stable than (NG) and can be better analyzed theoretically. The general flow,
for α ∈ [0; 1], is
$$u_t = \left( \frac{J(u)}{\|u\|_2^2} \right)^{\alpha} u - \left( \frac{J(u)}{\|p\|_2^2} \right)^{1-\alpha} p, \qquad p \in \partial J(u), \qquad (21)$$
with u(0) = u0 ∈ X, u0 , 1 = 0. Notice that for α = 1/2, we retrieve the (NG)
flow, (16), up to a normalization with J 1/2 (u). For the case α = 1, the flow becomes
$$u_t = \frac{J(u)}{\|u\|_2^2}\, u - p, \qquad p \in \partial J(u). \qquad (22)$$
In this case, there is no term with ||p||2 in the denominator, and the analysis
simplifies. Uniqueness of the flow and convergence of the iterative algorithm are
established.
For the case α = 1, we get that the $\ell_2$ norm is fixed in time. This allows us
to have a unit norm throughout the evolution. In the discrete iterations, however,
an additional normalization step is required to maintain this property. Given any
input f , to obtain a valid initial condition u0 , we first subtract the mean and then
normalize by the $\ell_2$ norm. The associated iterative algorithm (for α = 1) for solving (7)
is detailed in Algorithm 2.
$$u^{k+1/2} = u^k + \Delta t \left( \frac{J(u^k)\, u^{k+1/2}}{\|u^k\|_2^2} - p^{k+1/2} \right), \qquad p^{k+1/2} \in \partial J(u^{k+1/2}),$$
$$u^{k+1} = \frac{u^{k+1/2}}{\|u^{k+1/2}\|_2}. \qquad (23)$$
For any time step $\Delta t$ in an admissible range (small enough, as specified in Aujol et al. 2018), the half step $u^{k+1/2}$ is the unique minimizer of a strictly convex problem.
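For concreteness, here is one way the half step in (23) can be computed (a direct rearrangement, added here as a sketch and not quoted from the original): the first line of (23) is equivalent to
$$\left(1 - \frac{\Delta t\, J(u^k)}{\|u^k\|_2^2}\right) u^{k+1/2} + \Delta t\, p^{k+1/2} = u^k, \qquad p^{k+1/2} \in \partial J(u^{k+1/2}),$$
which is the optimality condition of the strictly convex problem
$$u^{k+1/2} = \operatorname*{arg\,min}_{v}\ \frac{1}{2}\left(1 - a\right)\|v\|_2^2 - \langle u^k, v\rangle + \Delta t\, J(v), \qquad a := \frac{\Delta t\, J(u^k)}{\|u^k\|_2^2},$$
provided $\Delta t < \|u^k\|_2^2 / J(u^k)$ (so that $a < 1$); equivalently, $u^{k+1/2} = \mathrm{prox}_{\frac{\Delta t}{1-a} J}\!\left(u^k/(1-a)\right)$, a single proximal (denoising-type) problem per iteration.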
Theorem 3. For u0 of zero mean and ∀α ∈ [0; 1], if u is in W 1,2 ((0, T ); X), then
the trajectory u(t) of the flow (21) satisfies the following properties:
(i) $\langle u(t), 1\rangle = 0$.
(ii) $\frac{d}{dt} J(u(t)) \le 0$ for almost every $t$. Moreover, $t \mapsto J(u(t))$ is nonincreasing. If $\alpha = 0$, we have for almost every $t$ that $\frac{d}{dt} J(u(t)) = 0$, and $t \mapsto J(u(t))$ is constant.
(iii) $\frac{d}{dt}\|u(t)\|_2 \ge 0$, and $\frac{d}{dt}\|u(t)\|_2 = 0$ for $\alpha = 1$.
(iv) If the flow converges to $u^*$, we have $p^* = J^{2\alpha-1}(u^*)\,\dfrac{\|p^*\|_2^{2(1-\alpha)}}{\|u^*\|_2^{2\alpha}}\, u^* \in \partial J(u^*)$, so that $u^*$ is an eigenfunction.
Uniqueness. For the case α = 1, one can establish uniqueness of the flow (22),
under mild conditions.
Theorem 4. Let $u$ and $v$ be two solutions of (22) in $W^{1,2}((0,T);X)$ with respective initial conditions $u_0$ and $v_0$, such that $J(u_0) < +\infty$ and $J(v_0) < +\infty$, with $\|u_0\|_2 = \|v_0\|_2 = 1$. Then we have:
$$\frac{d}{dt}\, \frac{1}{2}\|u - v\|_2^2 \le \frac{J(u) + J(v)}{2}\, \|u - v\|_2^2. \qquad (25)$$
By the fact that $J(u)$ is decreasing and using Grönwall's lemma, we obtain
$$\|u - v\|_2^2 \le \|u_0 - v_0\|_2^2\, \exp\!\big( (J(u_0) + J(v_0))(t - t_0) \big). \qquad (26)$$
Theorem 5. Let $u^0 \in X$, and let the sequence $u^k$ be defined by (23). Then the sequences $J(u^k)$ and $\|p^k\|_2$ are nonincreasing, $\|u^k\|_2 = \|u^0\|_2$ for all $k$, and $u^{k+1} - u^k \to 0$.
Feld-Aujol-Gilboa-Papadakis (FAGP)
In Feld et al. (2019), the aim is to solve the problem (15) for the case when J and H
are both absolutely one-homogeneous functionals. Let us consider the generalized
nonlinear Rayleigh quotient:
$$R(u) := \frac{J(u)}{H(u)}. \qquad (27)$$
In analogy to the linear case, eigenfunctions in the sense of (15) are critical points of (27). In segmentation, classification, and clustering, we often seek eigenfunctions with the smallest (strictly positive) eigenvalue. Thus, excluding the null-
space of J and H , we seek to minimize the Rayleigh quotient (27). A classical way
to reach a local minimizer of R(u) is by using a gradient-descent flow:
ut = −∇R(u).
Taking the variational derivative of R(u), with q ∈ ∂H (u), p ∈ ∂J (u), the gradient
descent flow is
$$u_t = \frac{J(u)\, q - H(u)\, p}{H^2(u)}, \qquad (28)$$
which can be rewritten as
$$u_t = \frac{R(u)\, q - p}{H(u)}.$$
This flow is hard to analyze theoretically, mainly due to the division by H (u).
Therefore, Feld et al. (2019) proposed the following flow to minimize R(u):
$$u_t = R(u)\, q - p. \qquad (29)$$
An additional flow considered in Feld et al. (2019) is based on minimizing the logarithm of the Rayleigh quotient,
$$u_t = -\nabla\big( \log R(u)\big) = \frac{R(u)\, q - p}{J(u)}. \qquad (30)$$
This is motivated by a widely used practice of using the log of a function involving
multiplicative expressions. It is commonly employed in statistics and machine
learning algorithms, such as maximum likelihood estimation and policy learning.
The flow is essentially a time rescaling of (29) by 1/J (u). We note that it is not in
the form required by Brezis' Lemma 1 and is therefore harder to analyze. We will not focus on this flow here. It is worth mentioning, however, that in the context of the Cheeger cut problem, it was found to be numerically very stable and highly resilient to
the choice of the discrete time step. Thus a large time step can be chosen, which
speeds up numerical convergence (see details in Feld et al. 2019).
The algorithm is based on the following semi-explicit scheme of the flow:
$$\begin{aligned} (u^{k+1/2} - u^k)/\Delta t &= R(u^k)\, q^k - p^{k+1/2}, \qquad q^k \in \partial H(u^k),\ p^{k+1/2} \in \partial J(u^{k+1/2}),\\ u^{k+1} &= u^{k+1/2}/\|u^{k+1/2}\|_2. \end{aligned} \qquad (31)$$
The half step is equivalently obtained as the minimizer
$$u^{k+1/2} = \operatorname*{arg\,min}_{u\in X}\ F(u) := \frac{1}{2\Delta t}\|u - u^k\|_2^2 - R(u^k)\langle q^k, u\rangle + J(u), \qquad (32)$$
where $u^{k+1/2}$ being a minimizer of $F$ implies that there exists $p^{k+1/2} \in \partial J(u^{k+1/2})$ such that
$$\frac{1}{\Delta t}\left(u^{k+1/2} - u^k\right) - R(u^k)\, q^k + p^{k+1/2} = 0.$$
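Completing the square in (32) shows that the half step is a single proximal step, $u^{k+1/2} = \mathrm{prox}_{\Delta t J}\big(u^k + \Delta t\, R(u^k)\, q^k\big)$. The following sketch assumes user-supplied callables `J`, `H`, `subgrad_H`, and `prox_J` (these names are illustrative, not from the original text):

```python
import numpy as np

def fagp_step(u_k, J, H, subgrad_H, prox_J, dt):
    """One step of the scheme (31)-(32).
    prox_J(v, t) should return argmin_u 0.5*||u - v||_2^2 + t*J(u)."""
    R = J(u_k) / H(u_k)                     # Rayleigh quotient (27)
    q = subgrad_H(u_k)                      # q^k in dH(u^k)
    u_half = prox_J(u_k + dt * R * q, dt)   # solves (32)
    return u_half / np.linalg.norm(u_half)  # renormalization in (31)
```

For J = TV and H the $\ell_1$ norm (the Cheeger cut relaxation), prox_J is a standard TV-denoising (ROF-type) problem.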
Further relations to calibrable sets and variants of Algorithm 3 for Cheeger cut
minimization on graphs are provided in detail in Feld et al. (2019).
Cohen-Gilboa (CG)
A classical example from nonlinear physics is the Korteweg–de Vries (KdV) equation (Zabusky and Kruskal 1965),
$$u_t + u\, u_x + \delta^2 u_{xxx} = 0, \qquad (33)$$
with δ a small real scalar. Reformulating this expression for a stationary wave yields
$$-u_{XX} = \lambda\left( -c\,u + \frac{u^2}{2} \right), \qquad (34)$$
In recent decades there has been growing research on nonlinear physical models, where more complex nonlinear eigenvalue problems emerge, such as the two-dimensional nonlinear Schrödinger equation:
$$u_{xx} + u_{yy} - V_0\left(\sin^2 x + \sin^2 y\right) u + \sigma |u|^2 u = -\mu u. \qquad (35)$$
In Cohen and Gilboa (2018), a method for solving such problems was proposed,
following the flows of Nossek and Gilboa (2018) and Aujol et al. (2018). The basic
formulation was to solve the (double) nonlinear eigenvalue problem
$$T(u) = \lambda\, Q(u), \qquad (36)$$
where T (u) ∈ ∂J (u), J (u) is a convex, proper, lsc regularizing functional and Q(u)
is a bounded nonlinear operator, with both T , Q ∈ L2 (Ω). The following flow is a
natural generalization of Nossek and Gilboa (2018):
$$u_t = M(u), \qquad (37)$$
where
$$M(u) = s\,\frac{Q(u)}{\|Q(u)\|_2} - \frac{T(u)}{\|T(u)\|_2}, \qquad (38)$$
and $s = \mathrm{sign}\langle Q(u), T(u)\rangle$ is a sign factor which, by Cauchy-Schwarz, guarantees that $J$ is nonincreasing along the flow. For instance, when $J$ is the Dirichlet energy, $T(u) = -\Delta u$, and (36) reads
$$-\Delta u = \lambda\, Q(u). \qquad (39)$$
To avoid converging to trivial or meaningless solutions, an auxiliary energy is defined,
$$E(u) = \frac{1}{2}\langle Q(u), 1\rangle^2, \qquad (40)$$
with
$$\partial_u E = \langle Q(u), 1\rangle\, \partial Q,$$
where $\partial Q$ denotes the variational derivative of $\langle Q(u), 1\rangle$. We would like $E(u) = 0$ at steady
state to ensure we obtain a meaningful solution. A variant of a gradient descent with
respect to E is defined by
$$u_t = C(u), \qquad (41)$$
where
$$C(u) = -\partial_u E + \frac{\langle \partial_u E,\, T(u)\rangle}{\|T(u)\|_2^2}\, T(u). \qquad (42)$$
It ensures one decreases E while not increasing J . We call this the complementary
flow. Let us compute the time derivatives of J and E:
$$\frac{d}{dt} J(u) = \langle T(u), u_t\rangle = \langle T(u), C(u)\rangle = \left\langle T(u),\ -\partial_u E + \frac{\langle \partial_u E, T(u)\rangle}{\|T(u)\|_2^2}\, T(u) \right\rangle = 0. \qquad (43)$$
For E we have
$$\frac{d}{dt} E(u) = \langle \partial_u E, u_t\rangle = \langle \partial_u E, C(u)\rangle = -\|\partial_u E\|_2^2 + \frac{\langle \partial_u E, T(u)\rangle^2}{\|T(u)\|_2^2} \le 0, \qquad (44)$$
where the last inequality follows from Cauchy-Schwarz. We can thus merge the main flow (37) and the complementary one (41), with some weight parameter $\alpha$, to obtain the final flow:
$$u_t = M(u) + \alpha\, C(u), \qquad (45)$$
where $\alpha \in \mathbb{R}^+$ and $M(u)$ and $C(u)$ are defined in (38) and (42), respectively. This
combined flow admits (d/dt)J (u) ≤ 0 and (d/dt)E(u) ≤ 0 (for α large enough).
Numerically, iterations which follow this flow are provided in Cohen and Gilboa
(2018), using the following adaptive time step for the main flow,
$$dt_M = \frac{2\,\langle \Delta u^k, M(u^k)\rangle}{\|\nabla M(u^k)\|_2^2}, \qquad (46)$$
and for the complementary flow,
$$dt_C = -\frac{E(u^{k+1/2})}{\langle \partial E(u^{k+1/2}),\, C(u^{k+1/2})\rangle}. \qquad (47)$$
The choice of $dt_C$ is such that, within a first-order Taylor approximation, it reaches $E(u) \approx 0$ in a single step. The numerical algorithm, a dissipating flow with respect
to the energy term J (ensured to be nonincreasing), is shown in Algorithm 4. Since
it is basically an explicit scheme with carefully chosen time steps, each iteration
requires a low computational effort.
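A minimal sketch of one such iteration is given below, assuming the operators are discretized as vectors and that `T`, `Q`, `E`, and `grad_E` are user-supplied callables (illustrative assumptions; the actual Algorithm 4 may differ in details such as the choice of the sign factor and of $dt_M$):

```python
import numpy as np

def cg_iteration(u, T, Q, E, grad_E, dt_M):
    """One step of the combined flow: a main-flow step (37)-(38) with step dt_M,
    followed by a complementary-flow step (41)-(42) with the adaptive step (47)."""
    Tu, Qu = T(u), Q(u)
    s = np.sign(np.dot(Qu, Tu))                       # sign factor in (38)
    M = s * Qu / np.linalg.norm(Qu) - Tu / np.linalg.norm(Tu)
    u_half = u + dt_M * M                             # main flow step
    Tu2, dE = T(u_half), grad_E(u_half)               # grad_E returns <Q(u),1>*dQ, cf. (40)
    C = -dE + (np.dot(dE, Tu2) / np.dot(Tu2, Tu2)) * Tu2   # complementary direction (42)
    dt_C = -E(u_half) / np.dot(dE, C)                 # adaptive step (47): one-step E ~ 0
    return u_half + dt_C * C
```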
Bungert-Hait-Papadakis-Gilboa (BHPG)
The last algorithm presented here is related to very general and complex nonlinear
operators, which often cannot be expressed analytically. In Hait-Fraenkel and
Gilboa (2019) and Bungert et al. (2020), the operators considered were nonlinear
denoisers, which can be based on classical algorithms or on deep neural networks.
The setting is as follows. Let T : H → H be a generic (nonlinear) operator on a
real Hilbert space H with norm || · ||. In the case of a neural network, one typically
has H = Rn , equipped with the Euclidean norm. We aim at solving the nonlinear
eigenproblem (6):
$$T(u) = \lambda\, u,$$
with $\lambda \in \mathbb{R}$. Let us first recall the classical power method for a linear operator $L$: one initializes $u^0$ (e.g., randomly) and iterates
$$u^{k+1} \leftarrow \frac{L u^k}{\|L u^k\|_2}, \qquad k \leftarrow k+1. \qquad (48)$$
Under mild conditions, it is known to converge to the eigenvector with the largest eigenvalue, although convergence may be slow. A straightforward analogue of this process for the nonlinear case, given an operator $T(u)$, is to initialize similarly and to iterate until convergence:
$$u^{k+1} \leftarrow \frac{T(u^k)}{\|T(u^k)\|_2}, \qquad k \leftarrow k+1. \qquad (49)$$
One can analyze this process more easily in a restricted nonlinear case, where J is
an absolutely one-homogeneous functional, based on a proximal operator of J :
$$\mathrm{prox}_J^{\alpha}(u) := \operatorname*{arg\,min}_{v \in \mathcal{H}}\ \frac{1}{2}\|v - u\|^2 + \alpha J(v), \qquad (50)$$
which for J = T V coincides with the ROF denoising model (Rudin et al. 1992).
In Bungert et al. (2020), it was shown that the process is well defined for a range
of parameters α, that the energy is decreasing, J (uk+1 ) ≤ J (uk ), along with a full
proof of convergence to a nonlinear eigenvector, in the sense of (6).
For more complex nonlinear operators, however, certain modifications are
required. A critical issue is the range of the operator. Unlike linear or homogeneous
operators, general nonlinear operators are often expected to perform well only in a certain
range. This is certainly true in neural networks, where the range is dictated implicitly
by the range of the images in the training set. Thus normalization by the norm, as in
(49), can drastically change the range of uk and cause unexpected behavior of the
operator. Furthermore, the mean value of uk is a significant factor. For denoisers,
we often expect that a denoising operation does not change the mean value of the
input image, that is,
$$\overline{T(u)} = \bar{u}, \qquad (52)$$
where $\bar{u}$ denotes the mean value of $u$.
It can be shown that for any vector $u \neq 0$ with nonnegative entries and a denoiser
T admitting (52), if u is an eigenvector, then λ = 1. Another issue is the invariance
to a constant shift in illumination. We expect the behavior of T to be invariant to a
small global shift in image values. That is, T (u + c) = T (u) + c, for any c ∈ R,
such that (u + c) ∈ H.
We thus relax the basic eigenproblem (6) as follows:
$$T(u) - \overline{T(u)} = \lambda\,(u - \bar{u}), \qquad (53)$$
where $\lambda \in \mathbb{R}$ and $\bar{u} = \langle 1, u\rangle/|\Omega|$ is the mean value of $u$ over the image domain $\Omega$. Note
that now (relaxed) eigenvectors, admitting (53), can have any eigenvalue, keeping
the assumptions on T stated above. In addition, if u is an eigenvector, so is u + c,
as expected for operators with invariance to global value shifts. A suitable Rayleigh
quotient, associated with the relaxed eigenvalue problem (53), is
$$R^{\dagger}(u) = \frac{\langle u - \bar{u},\ T(u) - \overline{T(u)}\rangle}{\|u - \bar{u}\|_2^2}, \qquad (54)$$
which still has the property that λ = R † (u) whenever u fulfills (53). The modified
nonlinear power method is detailed in Algorithm 5, aiming at computing a relaxed
eigenvector (53) by explicitly handling the mean value and keeping the norm of the
initial condition. We found this adaptation to perform well on denoising networks.
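A plausible sketch of such a modified power iteration is given below (an illustrative reconstruction, not necessarily identical to Algorithm 5): the output of T is recentered, rescaled to the norm of the zero-mean part of the initial condition, and shifted back to the original mean, so that the iterates remain in the operating range of the denoiser.

```python
import numpy as np

def modified_power_method(T, u0, n_iter=100):
    """Modified nonlinear power iteration for the relaxed eigenproblem (53)."""
    m0 = u0.mean()
    r0 = np.linalg.norm(u0 - m0)                # norm of the zero-mean part
    u = u0.copy()
    for _ in range(n_iter):
        v = T(u)
        w = v - v.mean()                         # handle the mean explicitly
        u = m0 + r0 * w / np.linalg.norm(w)      # keep mean and norm fixed
    d = u - u.mean()
    lam = np.dot(d, T(u) - T(u).mean()) / np.dot(d, d)   # Rayleigh quotient (54)
    return u, lam
```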
Evaluation and Examples

We present here several results of the algorithms presented earlier. First we discuss
how the numerical solutions can be evaluated. Then we show several numerical
examples related to image processing, learning, and physics.
Global and Local Measures

Since there are often no ground-truth or analytic solutions for nonlinear eigenvalue problems, we need to find alternative ways to determine whether the algorithm has converged to an eigenfunction. Often exact convergence is very slow; thus knowing that one has approximately reached an eigenfunction numerically may also speed up the algorithm and serve as a good stopping criterion for the iterative process.
One general measure, valid for any operator T, is the angle between u and T(u) (see Nossek and Gilboa 2018). For eigenvectors, the vectors u and T(u) are collinear. Thus their respective angle is either 0 (for positive eigenvalues) or π (for negative eigenvalues).
Since both u and T (u) are real, eigenvalues are also real. Thus, the angle is a simple
scalar measure that quantifies how close u and T (u) are to collinearity. We define
the angle θ between u and T (u) by
$$\cos(\theta) = \frac{\langle u, T(u)\rangle}{\|u\|\,\|T(u)\|}. \qquad (55)$$
Fig. 2 θ (degrees) as a function of iterations, for the (NG) flow with J = TGV, and for the (CG) flow, nonlinear Schrödinger equation. (Taken from Nossek and Gilboa 2018 and Cohen and Gilboa 2018)
See Fig. 1 for an illustration of θ . In most cases discussed here, we have positive
eigenvalues; thus we aim to reach an angle close to 0. In Fig. 2 we show two
examples of the behavior of θ over time for the (NG) and (CG) algorithms. Note that θ may not be monotonic and may increase in some time range. The angle θ is a good global measure. In the iterative algorithms, it can be used as a stopping criterion: instead of requiring $\|u^{k+1} - u^k\|_2 < \varepsilon$, one can require reaching a small enough angle, $\theta < \theta_{\mathrm{thres}}$. In our studies we often regard a function with $\theta < \pi/360$ (half a degree) as a numerical eigenfunction.
One may also like to have a local measure. Usually there is no precise pointwise
convergence of $(T(u))(x) = \lambda u(x)$, $\forall x$. A good way to see how close the function is, spatially, to an eigenfunction is by examining the ratio
$$\Lambda(x) = \frac{(T(u))(x)}{u(x)}, \qquad \forall\, u(x) \neq 0.$$
At full convergence we should have Λ(x) ≡ λ. The deviation map from a constant
function reveals the areas where the numerical approximation is less accurate. To
avoid dividing by values close to 0, one may compute this map only for $|u(x)| > \delta$,
where δ is a small constant. In Fig. 3 we show two examples of this ratio: one where a function close to (but not precisely) an eigenfunction is obtained, and one with full convergence.
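The two measures above are straightforward to compute; the following sketch (array-based, assuming u and T(u) are given as NumPy arrays) implements the angle of (55) and the local ratio map:

```python
import numpy as np

def angle_measure(u, Tu):
    """Angle (in degrees) between u and T(u), cf. (55); near 0 indicates an eigenfunction."""
    c = np.dot(u.ravel(), Tu.ravel()) / (np.linalg.norm(u) * np.linalg.norm(Tu))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def local_ratio(u, Tu, delta=1e-3):
    """Local measure Lambda(x) = (T(u))(x) / u(x), computed only where |u| > delta."""
    Lam = np.full_like(u, np.nan, dtype=float)
    mask = np.abs(u) > delta
    Lam[mask] = Tu[mask] / u[mask]
    return Lam
```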
Fig. 3 Local measure Λ(x) = T (u)/u. At convergence T (u) = λu; thus for any u = 0, we can
examine the ratio Λ(x), which should be a constant function of value λ, ∀x. Top row, algorithm
did not fully converge yet, u is close to an eigenfunction for isotropic TV, and the ratio (right)
exposes areas where there is deviation from a constant. Bottom row, a converged eigenfunction for
anisotropic TV. The ratio image is constant, up to numerical precision. (Taken from Aujol et al.
2018)
Numerical Examples
Fig. 4 Two examples of the (NG) flow. Top row J = TV, bottom row J = TGV of order 2
(Bredies et al. 2010). (Taken from Nossek and Gilboa 2018)
Fig. 5 Nonlinear power method evolution (BHPG) for a denoising neural-network FFDNet
(Zhang et al. 2018). Converged eigenfunction (λ = 1), right, is a highly stable structure for the
network. (Taken from Bungert et al. 2020)
In Figs. 8 and 9, we show the resilience of eigenfunctions against noise, especially when denoised by the matching regularizer J or operator T. In Fig. 8 an eigenfunction of TV was denoised using three classical algorithms. Spectral TV (Gilboa 2014), which is based on the TV regularizer, is the most suitable for denoising such functions. In Fig. 9 we see a similar trend for the EPLL denoiser. Here the most stable and the most unstable eigenfunctions (depending on their eigenvalues) are shown, together with natural images, whose denoising results lie in between. This gives insight into the priors of the denoiser with respect to the expected spatial structures. Adversarial examples can also be obtained.
Conclusion, Discussion and Open Problems

In this chapter several methods for solving nonlinear eigenvalue problems are presented. Such problems appear in wide and diverse fields of signal and image processing, classification and learning, and nonlinear physics. It is shown how some fundamental concepts of linear eigenvalue problems carry over to the nonlinear case. Specifically, the generalized Rayleigh quotient is a key notion, of which eigenfunctions serve as critical points. A common theme of the presented algorithms is the use of
Fig. 6 EF induced by TGV (left, (NG) flow) and EF of the 2D nonlinear Schrödinger equation (35) (right, (CG) flow). (Taken from Nossek and Gilboa 2018 and Cohen and Gilboa 2018)
Fig. 7 Results of the flow for TV defined on graphs based on point cloud distances. The processes
converge to natural clustering of the data. (Taken from Aujol et al. 2018)
Fig. 8 An eigenfunction obtained by (NG) algorithm for TV. These structures are highly stable in
denoising and most suitable for the regularizer (here TV). Here it is shown that for additive white
Gaussian noise, spectral TV (Gilboa 2014) recovers well the signal, compared to well-designed
classical denoisers BM3D (Dabov et al. 2007) and EPLL (Zoran and Weiss 2011). (Taken from
Nossek and Gilboa 2018)
an (often long) iterative process to compute a single eigenfunction. The process can
sometimes be understood as a discrete realization of a continuous nonlinear PDE.
These nonlinear flows may emerge as gradient descent of a certain energy. However,
this energy is always non-convex and has many local minima (each of them is an
eigenfunction). Naturally, this implies that the selection of the initial condition is
critical to the computation. This is actually true for all iterative processes presented
here, even if they are not directly based on a non-convex energy. We would like to
highlight several challenges this emerging field is still facing.
We list below the main intriguing issues and open problems:
1. Initial condition. What are the effects of the initial condition to the computation
process? Can a link be formulated between the initial condition and the obtained
eigenfunction? Is it related to a decomposition of the initial condition into
eigenfunctions, in an analogous manner to the linear case? Are there special
Fig. 9 Nonlinear power method for the EPLL denoiser. PSNR gain: eigenfunctions vs. natural images, $\mathrm{var}_{\mathrm{noise}} = \tfrac{1}{5}\,\mathrm{var}_{\mathrm{img}}$. (Taken from Hait-Fraenkel and Gilboa 2019)
characteristics to the flow when random noise serves as initial condition? Is noise
a good choice and in what sense?
2. Mapping the eigenfunction landscape of a nonlinear operator. Can one
characterize analytically eigenfunctions for a broad family of operators? This
was successfully performed for TV (mainly in 2D). For more complex operators
and complicated domains or graphs, this is still an open problem. For a given
operator, how to design numerically algorithms which span well its eigenfunc-
tions? We have shown that eigenfunctions of large and small eigenvalues can be
computed; however reaching middle-range eigenvalues is highly nontrivial with-
out prohibitively large computational efforts (passing through all eigenvalues in
ascending/descending order).
3. Spectral decomposition. Can a general theory be developed related to the
decomposition of a signal into nonlinear eigenfunctions? For the case of one-
homogeneous functionals, it was shown how gradient-descent flows can be used
for decomposition (see Gilboa 2014, Burger et al. 2016, and Bungert et al.
2019a). A similar phenomenon was observed for the p-Laplacian case in Cohen
and Gilboa (2020). Can this be extended to gradient descent of general convex
functionals? Can these flows be used to generate multiple eigenfunctions in a
much more efficient manner?
4. Convergence rates. Until now the algorithms presented here did not deal
with convergence rates. They are inherently quite slow; sometimes hundreds
or even thousands of iterations are needed in order to numerically converge.
A first analysis of the convergence rate of nonlinear power methods for one-
homogeneous functionals is in Bungert et al. (2020). This area surely requires
additional focus.
5. Correspondence to the linear case. It was shown that the extended definition
of the Rayleigh quotient generalizes very well in the nonlinear setting. Are there
additional properties related to eigenvalue analysis that can be generalized? For
instance, for the power method, we know in the linear case that the method
converges to the eigenfunction with the largest eigenvalue (which is part of
the initial condition). We see a similar trend in the nonlinear case, where large
eigenvalues are reached. Can this be formalized?
6. Neural networks as operators. Last but not least, can neural networks benefit
from this research field? We have shown in Bungert et al. (2020) that one can treat
an entire neural network (intended for denoising) as a single complex nonlinear
operator and find some of its eigenfunctions. They represent highly stable and
unstable modes (depending on the eigenvalue). Can additional insights be gained
by analyzing eigenfunctions of deep neural networks? How can eigenfunctions
be defined for classification networks (where the input and output dimensions are
very different)? One direction is to extend singular value decomposition to the nonlinear setting, following the earlier work of Benning and Burger (2013). One can also analyze eigenfunctions between layers in the net, the effect of gradient descent (or its stochastic version) on eigenfunctions, and more. For variational networks, the authors of Effland et al. (2020) and Kobler et al. (2020) have shown that interesting insights into the learned regularizers can be gained.
Acknowledgments This work was supported by the European Union’s Horizon 2020 research
and innovation program under the Marie Skłodowska-Curie grant agreement No. 777826, by the
Israel Science Foundation (Grant No. 534/19) and by the Ollendorff Minerva Center.
References
Aujol, J.F., Gilboa, G., Papadakis, N.: Theoretical analysis of flows estimating eigenfunctions of
one-homogeneous functionals. SIAM J. Imaging Sci. 11(2), 1416–1440 (2018)
Bellettini, G., Caselles, V., Novaga, M.: The total variation flow in Rn . J. Differ. Equ. 184(2),
475–525 (2002)
Benning, M., Burger, M.: Ground states and singular vectors of convex variational regularization
methods. Methods Appl. Anal. 20(4), 295–334 (2013)
Bozorgnia, F.: Convergence of inverse power method for first eigenvalue of p-Laplace operator.
Numer. Funct. Anal. Optim. 37(11), 1378–1384 (2016)
Bozorgnia, F.: Approximation of the second eigenvalue of the p-Laplace operator in symmetric
domains. arXiv preprint arXiv:1907.13390 (2019)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3), 492–
526 (2010)
Brezis, H.: Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de
Hilbert. North-Holland (1973)
Bungert, L., Burger, M., Chambolle, A., Novaga, M.: Nonlinear spectral decompositions by
gradient flows of one-homogeneous functionals. Anal. PDE (2019a). To appear
Bungert, L., Burger, M., Tenbrinck, D.: Computing nonlinear eigenfunctions via gradient flow
extinction. In: International Conference on Scale Space and Variational Methods in Computer
Vision, pp. 291–302 . Springer (2019b)
Bungert, L., Hait-Fraenkel, E., Papadakis, N., Gilboa, G.: Nonlinear power method for computing
eigenvectors of proximal operators and neural networks. arXiv preprint arXiv:2003.04595
(2020)
Burger, M., Gilboa, G., Moeller, M., Eckardt, L., Cremers, D.: Spectral decompositions using one-
homogeneous functionals. SIAM J. Imaging Sci. 9(3), 1374–1408 (2016)
Cohen, I., Gilboa, G.: Energy dissipating flows for solving nonlinear eigenpair problems. J.
Comput. Phys. 375, 1138–1158 (2018)
Cohen, I., Gilboa, G.: Introducing the p-Laplacian spectra. Signal Process. 167, 107281 (2020)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain
collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Effland, A., Kobler, E., Kunisch, K., et al.: Variational networks: an optimal control approach to
early stopping variational methods for image restoration. J Math Imaging Vis. 62, 396–416
(2020)
Feld, T., Aujol, J.F., Gilboa, G., Papadakis, N.: Rayleigh quotient minimization for absolutely one-
homogeneous functionals. Inverse Probl. 35(6), 064003 (2019)
Gautier, A., Tudisco, F., Hein, M.: The Perron–Frobenius theorem for multihomogeneous map-
pings. SIAM J. Matrix Anal. Appl. 40(3), 1179–1205 (2019)
Gautier, A., Hein, M., Tudisco, F.: Computing the norm of nonnegative matrices and the log-
Sobolev constant of Markov chains. arXiv preprint arXiv:2002.02447 (2020)
Gilboa, G.: A spectral approach to total variation. In: International Conference on Scale Space and
Variational Methods in Computer Vision, pp. 36–47. Springer (2013)
Gilboa, G.: A total variation spectral framework for scale and texture analysis. SIAM J. Imaging
Sci. 7(4), 1937–1961 (2014)
Gilboa, G.: Nonlinear Eigenproblems in Image Processing and Computer Vision. Springer, Cham
(2018)
Hait-Fraenkel, E., Gilboa, G.: Numeric solutions of eigenvalue problems for generic nonlinear
operators. arXiv preprint arXiv:1909.12775 (2019)
Hein, M., Bühler, T.: An inverse power method for nonlinear eigenproblems with applications in
1-spectral clustering and sparse PCA. In: Advances in Neural Information Processing Systems,
pp. 847–855 (2010)
Kobler, E., Effland, A., Kunisch, K., Pock, T.: Total deep variation: a stable regularizer for inverse
problems. arXiv preprint arXiv:2006.08789 (2020)
Meyer, Y.: Oscillating patterns in image processing and in some nonlinear evolution equations.
The 15th Dean Jacquelines B. Lewis Memorial Lectures. American Mathematical Society,
Providence (2001)
Nossek, R.Z., Gilboa, G.: Flows generating nonlinear eigenfunctions. J. Sci. Comput. 75(2), 859–
888 (2018)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica
D 60, 259–268 (1992)
Szlam, A., Bresson, X.: Total variation and Cheeger cuts. In: International Conference on Machine
Learning (ICML’10), pp. 1039–1046 (2010)
Vassilis, A., Jean-François, A., Dossal, C.: The differential inclusion modeling FISTA algorithm and
optimality of convergence rate in the case b ≤ 3. SIAM J. Optim. 28(1), 551–574 (2018)
Zabusky, N.J., Kruskal, M.D.: Interaction of “solitons” in a collisionless plasma and the recurrence
of initial states. Phys. Rev. Lett. 15(6), 240 (1965)
Zhang, K., Zuo, W., Zhang, L.: FFDNet: toward a fast and flexible solution for CNN-based image
denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)
Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration.
In: International Conference on Computer Vision, pp. 479–486. IEEE (2011)
Optimal Transport for Generative Models
47
Xianfeng Gu, Na Lei, and Shing-Tung Yau
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1661
Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1662
Optimal Transport Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1662
Generative Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1663
Optimal Transport Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1665
Monge’s Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1665
Kantorovich’s Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1667
Brenier’s Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1668
McCann’s Displacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1669
Benamou-Brenier Dynamic Fluid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1670
Otto’s Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1671
Regularity of Optimal Transport Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1673
Computational Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1676
Semi-discrete Optimal Transport Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1676
Damping Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1680
Monte-Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1684
Manifold Distribution Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1686
Manifold Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1688
ReLu Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1690
AutoEncoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1692
Generative Adversarial Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1694
X. Gu ()
Stony Brook, Stony Brook University, Stony Brook, NY, USA
e-mail: [email protected]
N. Lei
Dalian University of Technology, Dalian, China
e-mail: [email protected]
S.-T. Yau
Harvard University, Cambridge, MA, USA
e-mail: [email protected]
Abstract
Optimal transport plays a fundamental role in deep learning. Natural data sets
have intrinsic patterns, which can be summarized as the manifold distribution
principle: a natural class of data can be treated as a probability distribution on
a low-dimensional manifold, embedded in a high-dimensional ambient space.
A deep learning system mainly accomplishes two tasks: manifold learning and
probability distribution learning.
Given a manifold X, all the probability measures on X form an infinite
dimensional manifold, the so-called Wasserstein space. Optimal transport assigns
a Riemannian metric on the Wasserstein space, the so-called Wasserstein metric,
and defines Otto’s calculus, such that variational optimization can be carried out
in the Wasserstein space P(X). A deep learning system learns the distribution by
optimizing some functionals in the Wasserstein space P(X); therefore optimal
transport lays down the theoretic foundation for deep learning.
This work introduces the theory of optimal transport and the profound relation
between Brenier’s theorem and Alexandrov’s theorem in differential geometry
via the Monge-Ampère equation. We give a variational proof of Alexandrov's theorem and convert the proof into a computational algorithm for solving optimal transport maps. The algorithm is based on computational geometry and can be generalized to the general manifold setting.
Optimal transport theory and algorithms have been extensively applied in
the models of generative adversarial networks (GANs). In a GAN model, the
generator computes the optimal transport map (OT map), while the discriminator
computes the Wasserstein distance between the generated data distribution and
the real data distribution. The optimal transport theory shows the competition
between the generator and the discriminator is completely unnecessary and
should be replaced by collaboration. Furthermore, the regularity theory of
optimal transport map explains the intrinsic reason for mode collapsing.
A novel generative model is introduced, which uses an autoencoder (AE)
for manifold learning and OT map for probability distribution transformation.
This AE-OT model improves the theoretical rigor and transparency, as well as
the computational stability and efficiency; in particular, it eliminates the mode
collapsing.
Keywords
Introduction
Deep learning is the mainstream technique for many machine learning tasks,
including image recognition, machine translation, speech recognition, and so on.
Despite its great success, the theoretical understanding of how it works remains
primitive. Many fundamental problems need to be solved, and many profound
questions need to be answered.
In this chapter, we focus on a geometric view of optimal transport (OT) to
understand deep learning models, such as generative adversarial networks (GANs).
Especially, we aim at answering the following basic questions:
Question 1. What does a deep learning system really learn? The system learns
the probability distributions on manifolds. Each natural class of data set can be
treated as a point cloud in the high-dimensional ambient space, and the point
cloud approximates a special probability measure defined on a low-dimensional
manifold. The system learns two things: one is the manifold structure and the
other is the distribution on the manifold. The manifold structure is represented
by the encoding and decoding maps, which map between the manifold and the
latent space. In generative models, such as GANs, the probability distributions are
represented by the transport mappings from a predefined white noise (such as a
Gaussian distribution, which can be easily generated from a uniform distribution) to
the data distribution, either in the latent space or on the data manifold.
Question 2. How does a deep learning system really learn? All the probability distributions on a manifold X form an infinite-dimensional space P(X), the so-called Wasserstein space. A deep learning system performs optimization in the space P(X). For example, the principle of maximum entropy searches for a distribution in P(X) by optimizing the entropy functional with some constraints obtained from observations. The optimal transport theory defines a Riemannian metric on the probability distribution space P(X), and Otto's calculus, such that the Wasserstein distance between measures can be computed explicitly and variational optimizations can be carried out with these theoretic tools. For example, the discriminator in the WGAN model computes the Wasserstein distance between the real data distribution and the generated data distribution, and the training process follows the Wasserstein gradient flow on P(X).
Question 3. How well does a deep learning system really learn? Current deep
learning system designs have fundamental flaws; most generative models suffer
from mode collapsing. Namely, they keep forgetting some knowledge already
learned at the intermediate stage, or they generate unrealistic samples. This can be
explained by the regularity theory of optimal transport maps; basically the transport
maps are discontinuous, whereas the deep neural networks can only represent
continuous maps; therefore either the map misses some connected components of
the support of the data distribution or covers all the components but also the gaps among
them.
From the above short answers, we can see the importance of the theories of manifolds and optimal transport for deep learning. In the following, we briefly review the most related works in section "Related Works," introduce the theory of optimal transport in section "Optimal Transport Theory," and explain the computational algorithms for optimal transport in detail in section "Computational Algorithm." After this preparation, we explain the manifold distribution principle in deep learning and manifold learning by autoencoders in sections "Manifold Distribution Principle" and "Manifold Learning," respectively. We then use the optimal transport view to analyze the GAN model, explain the reason for mode collapse, and describe a novel design to eliminate mode collapse in section "Generative Adversarial Networks." Finally, we conclude the work in section "Conclusion".
Related Works
The literature of optimal transport and generative models is huge. Here, we only
review the most directly related works.
and processing; recent works generalized them to deal with 3D surfaces by using
computational geometric approaches. By incorporating conformal mapping methods, optimal transportation maps are applied to obtain area-preserving maps in Su et al. (2016). The methods in Yu et al. (2018) can simultaneously balance the area and the angle distortion. Su et al. generalized the algorithm to three-dimensional cases and presented a volume-preserving map in Su et al. (2016), and then in Su et al. (2017) they further gave a controllable volumetric algorithm based on OT maps.
While most of the research works deal with optimal transport problems with
Euclidean metric, Wang (2004) and Cui et al. (2019) focused on solving the optimal
transportation problems in the spherical domain. The method has also been applied
for area-preserving brain mapping in Su et al. (2013), which maps the cortical
surface onto the unit sphere conformally and then onto the extended complex plane
by the stereographic projection. The method has been improved in Nadeem et al.
(2017) by using the conformal welding method.
Recent research works also introduce optimal transportation theory in the
optical design field. Reflector design problems were summarized as a group of
Monge-Ampère equations in Wang (1996, 2004) and Guan et al. (1998). The cor-
respondence between Monge-Ampère equations and reflector design problems was
listed as one of the open problems in Yau (1998) and can further be related to optimal
transportation theory. Similar researches in lens design situation were introduced in
Gutiérrez, Qingbo and Huang (2009). Numerical methods and simulation results of
these optical design problems were proposed in Meyron et al. (2018).
Generative Models
pre-trained VGG19 network (Simonyan and Zisserman 2014), which makes it less suitable in settings where the data sets are not among those used to train the VGG.
Optimal Transport Based Generative Model In Lei et al. (2019), a geometric interpretation of generative adversarial networks (GANs) was first given. By using
the optimal transport view of GAN model, they showed that the discriminator
computes the Wasserstein distance via the Kantorovich potential and the gener-
ator calculates the transport map. For a large class of transportation costs, the
Kantorovich potential can give the optimal transportation map by a close-form
formula. This shows the adversarial competition can be replaced by collaboration to
improve the efficiency and simplicity. In Lei et al. (2020) the authors pointed out that
GANs mainly accomplish two tasks: manifold learning and probability distribution
transformation. The latter can be carried out using the classical OT method. Then in
An et al. (2020), a new generative model based on extended semi-discrete optimal
transport was proposed, which avoids representing discontinuous maps by DNNs
and therefore effectively prevents mode collapse and mode mixture (Fig. 23).
Numerical Method In this work, we show that the cause of mode collapse in deep learning is indeed the discontinuity of the optimal transport map in general. The situation is very similar to that encountered when classical numerical methods are used to compute the OT map. For instance, the Brenier potential in OT satisfies the Hamilton–Jacobi equation and can be continuous, whereas its velocity (corresponding to the OT map), which satisfies the conservation law, is generally discontinuous. For example, the Benamou-Brenier method (Benamou and Brenier 1999) and the Haker-Tannenbaum-Angenent method (Angenent et al. 2003) compute the optimal transport maps based on fluid dynamics.
Optimal Transport Theory

In this section, we introduce basic concepts and theorems of classic optimal transport theory, focusing on Brenier's approach, and their generalization to the discrete setting. Details can be found in Villani's book (Villani 2008).
Monge’s Problem
Suppose $X$ and $Y$ are domains in $\mathbb{R}^d$, with probability measures $d\mu = f(x)\,dx$ on $X$ and $d\nu = g(y)\,dy$ on $Y$ of equal total mass,
$$\int_X f(x)\,dx = \int_Y g(y)\,dy. \qquad (1)$$
A map $T: X \to Y$ is measure preserving, denoted $T_\#\mu = \nu$, if $\mu(T^{-1}(B)) = \nu(B)$ for every measurable set $B \subset Y$. (2) For a differentiable $T$, this is equivalent to the Jacobian equation
$$\det DT(x) = \frac{f(x)}{g \circ T(x)}. \qquad (3)$$
The Monge’s problem of optimal transport arises from finding the measure-
preserving map that minimizes the total transport cost.
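As a simple illustration of the problem (not part of the original text): in one dimension, for a convex cost, the optimal measure-preserving map is the monotone rearrangement $T = G^{-1}\circ F$, where $F$ and $G$ are the cumulative distribution functions of $\mu$ and $\nu$. A minimal numerical sketch:

```python
import numpy as np

def monge_map_1d(x, f, y, g):
    """Monotone rearrangement T = G^{-1} o F between two sampled 1-D densities
    f (on grid x) and g (on grid y); T(x_i) is returned for each grid point x_i."""
    F = np.cumsum(f); F /= F[-1]       # CDF of mu
    G = np.cumsum(g); G /= G[-1]       # CDF of nu
    return np.interp(F, G, y)          # invert G at the values F(x_i)
```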
Kantorovich’s Approach
Depending on the cost function and the measures, the optimal transport map
between $(X,\mu)$ and $(Y,\nu)$ may not exist. For example, suppose $\mu$ is atomic, $\mu = \delta(x - x_0)$, and $\nu = \sum_{i=1}^{k} \nu_i\, \delta(y - y_i)$ with $\sum_{i=1}^{k} \nu_i = 1$, $k > 1$; then
the mass concentrated on x0 has to be split and sent to different yi ’s. Kantorovich
relaxed transport maps to transport plans or transport schemes. A transport plan is
represented by a joint probability measure ρ : X ×Y → R≥0 , such that the marginal
probability of ρ equals to μ and ν, respectively. Formally, let the projection maps
be $\pi_x(x,y) = x$, $\pi_y(x,y) = y$; then the joint measure class is defined as
$$\Pi(\mu,\nu) := \left\{ \rho : X\times Y \to \mathbb{R}_{\ge 0} \ \middle|\ (\pi_x)_\#\rho = \mu,\ (\pi_y)_\#\rho = \nu \right\}.$$
Kantorovich's problem (KP) is to find the plan that minimizes the total transport cost,
$$(KP)\qquad W_c(\mu,\nu) := \min_{\rho \in \Pi(\mu,\nu)} \int_{X\times Y} c(x,y)\, d\rho(x,y). \qquad (8)$$
Kantorovich’s problem can be solved using the linear programming method. Due
to the duality of linear programming, the (KP) Eq. 8 can be reformulated as the
following duality problem (DP):
$$(DP)\qquad \max_{\varphi,\psi} \int_X \varphi(x)\, d\mu + \int_Y \psi(y)\, d\nu, \qquad \text{s.t. } \varphi(x) + \psi(y) \le c(x,y). \qquad (9)$$
The maximum value of Eq. 9 gives the Wasserstein distance. Most existing Wasser-
stein GAN models are based on the duality formulation under the L1 cost function.
The c-transform of $\varphi$ is defined as $\varphi^c(y) := \inf_{x}\left( c(x,y) - \varphi(x)\right)$; assuming $c(x,y)$ and $\varphi$ are $C^1$, the necessary condition at the minimizing $x$ is $\nabla_x c(x,y) = \nabla\varphi(x)$. With the c-transform, the dual problem becomes
$$(DP)\qquad W_c(\mu,\nu) = \max_{\varphi} \int_X \varphi(x)\, d\mu + \int_Y \varphi^c(y)\, d\nu, \qquad (12)$$
where the maximizer $\varphi$ is called the Kantorovich potential.
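For discrete measures, (KP) is a finite linear program and can be solved with any LP solver. A minimal sketch using SciPy (the function name and setup are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def discrete_kantorovich(mu, nu, C):
    """Solve the discrete Kantorovich problem (8): min <C, P> over plans P >= 0
    with row sums mu and column sums nu. Returns the optimal plan and cost."""
    m, n = C.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j P_ij = mu_i
    for j in range(n):
        A_eq[m + j, j::n] = 1.0              # sum_i P_ij = nu_j
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(m, n), res.fun      # optimal plan and W_c(mu, nu)
```

The dual variables of this LP are discrete versions of the Kantorovich potentials ϕ and ψ in (9).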
Brenier’s Approach
The existence, uniqueness, and the intrinsic structure of the optimal transport map
were proven by Brenier (1991).
Theorem (Brenier). Suppose $X$ and $Y$ are subsets of the Euclidean space $\mathbb{R}^d$, the transport cost is the quadratic Euclidean distance $c(x,y) = \frac{1}{2}|x-y|^2$, $\mu$ is absolutely continuous, and $\mu$ and $\nu$ have finite second-order moments. Then there exists a convex function $u : X \to \mathbb{R}$, the so-called Brenier potential, whose gradient map $\nabla u$ gives the solution to Monge's problem,
$$(\nabla u)_\#\, \mu = \nu. \qquad (17)$$
The Brenier potential is unique up to a constant; hence the optimal mass transport map is unique.
Therefore, finding the optimal transport map is reduced to solving the Monge-Ampère equation
$$\det D^2 u(x) = \frac{f(x)}{g(\nabla u(x))}. \qquad (15)$$
Problem 4 (Brenier). Suppose X and Y are subsets of the Euclidean space Rd and
the transport cost is the quadratic Euclidean distance. Furthermore μ is absolutely
continuous with respect to Lebesgue measure and μ and ν have finite second-order
moments. Find a convex function $u : X \to \mathbb{R}$ that satisfies the Monge-Ampère equation (15).
We can show the following relation holds for the quadratic Euclidean cost:
$$\frac{1}{2}|y|^2 - \varphi^c(y) = \left( \frac{1}{2}|x|^2 - \varphi(x) \right)^{*}, \qquad (19)$$
where $*$ denotes the Legendre transform.
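For completeness, a one-line derivation of (19) (a standard computation added here): with $c(x,y) = \frac{1}{2}|x-y|^2$,
$$\varphi^c(y) = \inf_x\left[ \tfrac{1}{2}|x-y|^2 - \varphi(x)\right] = \tfrac{1}{2}|y|^2 - \sup_x\left[ \langle x, y\rangle - \left(\tfrac{1}{2}|x|^2 - \varphi(x)\right)\right],$$
so $\frac{1}{2}|y|^2 - \varphi^c(y)$ is exactly the Legendre transform of $\frac{1}{2}|x|^2 - \varphi(x)$; the latter can be taken as the Brenier potential $u$, and the former as its Legendre dual $u^*$.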
McCann’s Displacement
We consider all the probability measures μ defined on X with finite second order
moment; μ is absolutely continuous with respect to Lebesgue measure:
$$P(X) := \left\{ \mu \ :\ \int_X |x|^2\, d\mu(x) < \infty,\ \mu \ \text{a.c.} \right\}. \qquad (20)$$
Then according to Brenier’s theorem, for any pair μ, ν ∈ P(X), there exists a
unique optimal transport map T : X → X, T# μ = ν; furthermore T = ∇u for
some Brenier potential u, which satisfies the Monge-Ampère equation (15). The transportation cost gives the Wasserstein distance between μ and ν, as in Eq. 6.
3. $\bigcup_{0 \le t \le 1} \mathrm{Supp}(\rho_t)$ is bounded.
4. By mass conservation law, the pair (ρ, v) satisfies the continuity equation:
$$\frac{\partial \rho_t}{\partial t} + \nabla\cdot(\rho_t v_t) = 0. \qquad (22)$$
Benamou-Brenier prove that the kinetic energy of the solution to Eq. 23 equals the square of the Wasserstein distance in Eq. 6; namely, the Benamou-Brenier problem is equivalent to the Brenier problem. Furthermore, the geodesic is given by the solution to the Benamou-Brenier problem:
$$\min\left\{ \frac{1}{2}\int_0^1\!\!\int_X |v(x,t)|^2\, d\rho(x,t)\, dt \ :\ (\rho_t, v_t) \ \text{admissible, connecting } \mu \text{ and } \nu \right\}.$$
Otto’s Calculus
$$\rho\, |v|^2 \le \rho \left| v + \varepsilon\, \frac{w}{\rho} \right|^2,$$
therefore we have
$$\langle v, w\rangle = 0.$$
$$W_2^2(\mu,\nu) = \min_{(\rho_t,\, u)} \int_0^1\!\!\int_X |\nabla u|^2\, d\rho_t\, dt, \qquad \rho_0 = \mu,\ \rho_1 = \nu,\ -\nabla\cdot(\rho_t \nabla u) = \frac{\partial \rho_t}{\partial t}.$$
Given two geodesics $\rho_1(t), \rho_2(t) \subset P(X)$ with $\rho_1(0) = \rho_2(0) = \rho$, their tangent vectors at $\rho \in P(X)$ are
$$\frac{\partial \rho_1}{\partial t} = -\nabla\cdot(\rho_1 \nabla\varphi_1), \qquad \frac{\partial \rho_2}{\partial t} = -\nabla\cdot(\rho_2 \nabla\varphi_2),$$
and the Wasserstein (Otto) metric at $\rho$ is given by
$$\left\langle \frac{\partial\rho_1}{\partial t},\, \frac{\partial\rho_2}{\partial t}\right\rangle_{\rho} := \int_X \langle \nabla\varphi_1, \nabla\varphi_2\rangle\, \rho\, dx.$$
For the entropy functional $\mathrm{Ent}(\rho) = \int_X \rho\log\rho\, dx$, differentiating along a curve $\rho(t)$ with velocity field $v$, we obtain
$$\frac{d}{dt}\mathrm{Ent}(\rho(t)) = \int_X \langle \nabla\log\rho,\, v\rangle\, \rho\, dx.$$
This shows that the Wasserstein gradient of the entropy equals ∇ log ρ. We plug it into
the continuity equation and obtain
$$ \frac{\partial \rho_t}{\partial t} + \nabla\cdot\Big( -\rho_t\, \frac{\nabla\rho_t}{\rho_t} \Big) = \frac{\partial \rho_t}{\partial t} - \Delta\rho_t = 0. $$
This shows that the Wasserstein gradient flow of the entropy is equivalent to the
classical heat flow.
Let Ω and Λ be two bounded smooth open sets in R^d, and let dμ = f dx and dν = g dy be two probability measures on R^d such that f|_{R^d∖Ω} = 0 and g|_{R^d∖Λ} = 0. Assume that f and g are bounded away from zero and infinity on Ω and Λ, respectively.
Assume further that the set $\Lambda_{x_0} := D_x c(x_0, \Lambda)$ is convex.
It is obvious that ∂u(x) is a closed convex set. Geometrically, if p ∈ ∂u(x), then the hyper-plane
$$ l_{x,p}(z) := u(x) + \langle p,\ z - x \rangle $$
touches u from below at x, namely l_{x,p} ≤ u in Ω and l_{x,p}(x) = u(x); l_{x,p} is a supporting plane to u at x.
The Brenier potential u is differentiable at x if its subgradient ∂u(x) is a
singleton. We classify the points according to the dimensions of their subgradients
and define the sets
$$ \Sigma_k(u) := \big\{ x \in \mathbb{R}^d \ \big|\ \dim(\partial u(x)) = k \big\}, \qquad k = 0, 1, 2, \ldots, d. $$
It is obvious that Σ_0(u) is the set of regular points and that Σ_k(u), k > 0, are the sets of singular points. We also define the set of reachable subgradients at x as
$$ \nabla^{*}u(x) := \Big\{ \lim_{k\to\infty} \nabla u(x_k)\ \Big|\ x_k \in \Sigma_0,\ x_k \to x \Big\}. $$
It is well known that the subgradient equals the convex hull of the reachable subgradients,
$$ \partial u(x) = \mathrm{Convex\ Hull}\big( \nabla^{*}u(x) \big). $$
The subgradient of x_0, ∂u(x_0), is the entire inner hole of Λ, and ∂u(x_1) is the shaded triangle. For each point on γ_k(t), ∂u(γ_k(t)) is a line segment outside Λ. x_1 is the bifurcation point of γ_1, γ_2, and γ_3. The Brenier potential is not differentiable on Σ_1 and Σ_2, and the optimal transportation map ∇u is discontinuous there.
Figure 2 shows the singularity structure of an optimal transport map between the uniform distribution inside a solid ball and that of the solid Stanford bunny. Since the target domain is non-convex, the boundary surface has a complicated folding structure, which is the singularity set of the map.
Computational Algorithm
Brenier's theorem can be directly generalized to the discrete situation. The source measure μ is absolutely continuous with respect to the Lebesgue measure and defined on a convex compact domain Ω; the target measure ν is a summation of Dirac measures,
$$ \nu = \sum_{i=1}^{n} \nu_i\, \delta(y - y_i), \qquad (24) $$
where Y = {y_1, y_2, ⋯, y_n} are training samples. The source and the target measures have equal total mass, ∑_{i=1}^{n} ν_i = μ(Ω). Each sample y_i corresponds to a supporting plane of the Brenier potential, denoted as
$$ \pi_{h,i}(x) := \langle x, y_i \rangle + h_i, \qquad (25) $$
where the height hi is an unknown variable. We represent all the height variables as
h = (h1 , h2 , · · · , hn ).
An envelope of a family of hyper-planes in the Euclidean space is a hyper-surface
that is tangent to each member of the family at some point, and these points of
tangency together form the whole envelope. As shown in Fig. 3, the Brenier potential
u_h : Ω → R is a piecewise linear convex function determined by h, which is the
upper envelope of all its supporting planes,
$$ u_h(x) = \max_{i=1}^{n}\{\pi_{h,i}(x)\} = \max_{i=1}^{n}\big\{ \langle x, y_i \rangle + h_i \big\}. \qquad (26) $$
The graph of the Brenier potential is a convex polytope. Each supporting plane π_{h,i} corresponds to a facet of the polytope. The projection of the polytope induces a cell decomposition of Ω, where each supporting plane π_{h,i}(x) projects onto a cell W_i(h),
$$ \Omega = \bigcup_{i=1}^{n} W_i(h) \cap \Omega, \qquad W_i(h) := \big\{ p \in \mathbb{R}^d \ \big|\ \nabla u_h(p) = y_i \big\}. \qquad (27) $$
Fig. 3 PL Brenier potential (left) and its Legendre dual (right)
The gradient map ∇u_h : Ω → Y maps each cell W_i(h) to a single point y_i,
$$ \nabla u_h(x) = y_i, \qquad \forall x \in W_i(h) \cap \Omega. $$
Given the target measure ν in Eq. 24, there exists a discrete Brenier potential of the form Eq. 26 whose projected μ-volume w_i(h) of each facet equals the given target measure ν_i, i.e., w_i(h) = ν_i for all i. This was proved by Alexandrov in convex geometry. The vector h is the unique minimizer of the following convex energy
$$ E(h) = \int_{0}^{h} \sum_{i=1}^{n} w_i(\eta)\, d\eta_i \ -\ \sum_{i=1}^{n} h_i \nu_i, \qquad (30) $$
Definition 10 (Power distance). Given a point y_i ∈ R^d with a power weight ψ_i, the power distance is given by
$$ \mathrm{pow}(x, y_i) = |x - y_i|^2 - \psi_i. \qquad (35) $$
Definition 11 (Power diagram). Given weighted points (y_1, ψ_1), …, (y_k, ψ_k), the power diagram is the cell decomposition of R^d
The weighted Delaunay triangulation, denoted as T(ψ), is the Poincaré dual to the power diagram: if W_i(ψ) ∩ W_j(ψ) ≠ ∅, then there is an edge connecting y_i and y_j in the weighted Delaunay triangulation. Note that pow(x, y_i) ≤ pow(x, y_j) is equivalent to
$$ \langle x, y_i \rangle + \tfrac{1}{2}\big(\psi_i - |y_i|^2\big) \ \ge\ \langle x, y_j \rangle + \tfrac{1}{2}\big(\psi_j - |y_j|^2\big). \qquad (38) $$
Hence, setting h_i = (ψ_i − |y_i|²)/2, the power cells coincide with the projection cells
$$ W_i(\psi) = \big\{ x \in \mathbb{R}^d \ \big|\ \langle x, y_i \rangle + h_i \ \ge\ \langle x, y_j \rangle + h_j,\ \forall j \big\}. \qquad (39) $$
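For concreteness, this equivalence between (38) and (39) can be checked numerically: assigning a point to its nearest site in power distance coincides with assigning it to the maximizing affine function with heights h_i = (ψ_i − |y_i|²)/2. The following NumPy snippet is a minimal sketch of this check; the random sites, weights, and test points are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 8, 1000                           # dimension, number of sites, test points
y = rng.normal(size=(n, d))                    # sites y_i
psi = rng.normal(size=n)                       # power weights psi_i
x = rng.uniform(-2.0, 2.0, size=(m, d))        # test points

# Power-diagram assignment: nearest site in the power distance (35)
pow_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) - psi[None, :]
cell_pow = pow_dist.argmin(axis=1)

# Upper-envelope assignment (39) with heights h_i = (psi_i - |y_i|^2)/2, cf. (38)
h = 0.5 * (psi - (y ** 2).sum(-1))
cell_env = (x @ y.T + h[None, :]).argmax(axis=1)

print((cell_pow == cell_env).mean())           # prints 1.0 (up to exact floating-point ties)
```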
Next, we need to determine the step length λ. We initialize λ as one and compute the convex hull of the points
$$ \{(y_1,\ h_1^k + \lambda d_1),\ (y_2,\ h_2^k + \lambda d_2),\ \cdots,\ (y_n,\ h_n^k + \lambda d_n)\}. $$
If the convex hull misses any point, then h^k + λd is outside the admissible space, and the corresponding Brenier potential is not strictly convex. In this case we halve the step length, λ ← λ/2, and repeat the trial. Repeating this procedure, we find the minimal l such that h^k + 2^{-l} d lies in the admissible space.
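A minimal sketch of this damped line search is given below, using scipy.spatial.ConvexHull for the hull computation; the function name step_length and the maximal number of halvings are illustrative choices, and the vertex test implements the "convex hull misses a point" criterion described above.

```python
import numpy as np
from scipy.spatial import ConvexHull

def step_length(y, h, d, max_halvings=32):
    """Damped step size for the update h^k -> h^k + lambda * d.

    y : (n, dim) target samples, h : (n,) current heights, d : (n,) search direction.
    Halve lambda until every lifted point (y_i, h_i + lambda*d_i) is a vertex of the
    convex hull, i.e. every supporting plane still appears in the Brenier potential.
    """
    n = y.shape[0]
    lam = 1.0
    for _ in range(max_halvings):
        lifted = np.hstack([y, (h + lam * d)[:, None]])
        hull = ConvexHull(lifted)
        if len(set(hull.vertices)) == n:       # no point is missed by the hull
            return lam
        lam *= 0.5                             # outside the admissible space: halve the step
    return lam
```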
As shown in Fig. 5, given a genus zero surface S with a single boundary, it has an induced Euclidean metric g, which induces the surface area element dA_g. After normalization, the total surface area is π. The Riemann mapping ϕ : (S, g) → (D, du² + dv²) maps the surface onto the unit disk and pushes the area element forward to the disk, denoted as ϕ_# dA_g. Since the Riemann mapping is conformal, the surface area element can be written as
$$ \varphi_{\#}\, dA_g = e^{2\lambda(u,v)}\, du\, dv, $$
where e^{2λ(u,v)} is the area distortion function and can be treated as the target density function.
On the disk, the Lebesgue measure, or equivalently the Euclidean metric
du2 + dv 2 , induces the Euclidean area element dudv. We compute the optimal
transportation T : (D, dudv) → (D, ϕ# dAg ) using the geometric variational
method. The optimal transport mapping result is shown between the two planar
images. The composition of the Riemann mapping ϕ with the inverse of the optimal transport map, T^{-1} ∘ ϕ : (S, dA_g) → (D, du dv), gives an area-preserving mapping.
Fig. 5 The optimal transport map for a male face. (a) conformal parameterization (b) area-
preserving parameterization (c) conformal mapping (d) optimal transport map (e) Brenier potential
(f) Legendre dual
Fig. 6 Angle distortion and area distortion histograms of the male surface in Fig. 5. (a) angle distortion of conformal mapping (b) area distortion of conformal mapping (c) angle distortion of optimal transport map (d) area distortion of optimal transport map
Fig. 7 Singularity set of the Brenier potential function is the discontinuity set of the optimal
transportation map
Figure 8 shows the computation process of the Buddha surface model. The
conformal mapping is computed first, and then the optimal transport map is obtained
by finding the Brenier potential. The intermediate maps are shown in the figure.
Monte-Carlo Method
In practice, our goal is to compute the discrete Brenier potential in Eq. (26) by optimizing the convex energy in Eq. (30). For low-dimensional cases, we can directly use Newton's method by computing the gradient Eq. (33) and the Hessian matrix Eq. (34). For deep learning applications, direct computation of the Hessian matrix is infeasible; instead we can use the gradient descent method or a quasi-Newton method with super-linear convergence. The key to the gradient is to estimate the μ-volume w_i(h). This can be done with the Monte-Carlo method: we draw random samples from the distribution μ and count the fraction of samples falling in W_i(h); this ratio converges to the μ-volume. The method is fully parallel and can be implemented on the GPU. Furthermore, we can use a hierarchical method to further improve the efficiency: first we partition the target samples into clusters and compute the optimal transportation map to the mass centers of the clusters; second, for each cluster, we compute the OT map from the corresponding cell to the original target samples within the cluster.
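For concreteness, the Monte-Carlo gradient step just described may be sketched as follows in NumPy. The gradient of the energy (30) has components w_i(h) − ν_i, and the cell of a sample is found by the argmax in (26); the uniform source on [0, 1]^d, the learning rate, and the sample count are illustrative assumptions.

```python
import numpy as np

def mc_semi_discrete_ot(y, nu, n_samples=100_000, lr=0.1, iters=500, seed=0):
    """Monte-Carlo gradient descent for the heights h of the discrete Brenier potential.

    y  : (n, d) target samples y_i
    nu : (n,)   target weights nu_i (summing to the total mass of mu)
    The source mu is taken to be uniform on [0, 1]^d for this illustration.
    """
    rng = np.random.default_rng(seed)
    n, d = y.shape
    h = np.zeros(n)
    for _ in range(iters):
        x = rng.uniform(0.0, 1.0, size=(n_samples, d))     # samples from mu
        cell = (x @ y.T + h[None, :]).argmax(axis=1)        # cell W_i(h), cf. (26)
        w = np.bincount(cell, minlength=n) / n_samples      # estimated mu-volumes w_i(h)
        h -= lr * (w - nu)                                  # gradient of the energy (30)
        h -= h.mean()                                       # fix the additive constant
    return h

# Usage: ten random target samples with equal weights
rng = np.random.default_rng(1)
y = rng.uniform(0.0, 1.0, size=(10, 2))
h = mc_semi_discrete_ot(y, np.full(10, 0.1))
# The OT map sends a sample x to y_i with i = argmax_i <x, y_i> + h_i.
```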
In order to avoid mode collapse, we need to find the singularity sets in Ω. As shown in Fig. 7, the target Dirac measure has two clusters; the source is the uniform distribution on the unit planar disk. The graph of the Brenier potential function is a convex polyhedron with a ridge in the middle. The projection of the ridge onto the disk is the singularity set Σ_1(u); the optimal mapping is discontinuous on Σ_1.
Fig. 8 Buddha surface, the last two rows show the intermediate computational results during the
optimization. (a) Buddha surface front side (b) Buddha surface back side (c) Brenier potential (d)
Legendre dual
In general cases, if two cells W_i(h) and W_j(h) are adjacent, then we compute the angle between the normals of the corresponding supporting planes via
$$ \theta_{ij} := \frac{\langle y_i,\ y_j \rangle}{|y_i|\cdot|y_j|}; $$
if θ_ij is greater than a threshold, then the common facet W_i(h) ∩ W_j(h) is in the discontinuity singular set.
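A small sketch of this check is shown below; the adjacency list of cell pairs is assumed to be given (for instance, from the weighted Delaunay triangulation), and θ_ij is interpreted here as the angle whose cosine is the normalized inner product displayed above, which is an assumption made for the illustration.

```python
import numpy as np

def discontinuity_facets(y, adjacency, angle_threshold):
    """Flag adjacent cells across which the optimal transport map is discontinuous.

    y          : (n, d) target samples y_i (normal directions of the supporting planes)
    adjacency  : iterable of index pairs (i, j) of adjacent cells W_i(h), W_j(h)
    Returns the pairs whose angle theta_ij exceeds angle_threshold (in radians).
    """
    flagged = []
    for i, j in adjacency:
        cos_ij = y[i] @ y[j] / (np.linalg.norm(y[i]) * np.linalg.norm(y[j]))
        if np.arccos(np.clip(cos_ij, -1.0, 1.0)) > angle_threshold:
            flagged.append((i, j))
    return flagged
```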
We believe the great success of deep learning can be partially explained by the well
accepted manifold distribution principle.
Definition 12 (Manifold). Suppose M is a topological space covered by a set of open sets, M ⊂ ⋃_α U_α. For each open set U_α, there is a homeomorphism ϕ_α : U_α → R^n; the pair (U_α, ϕ_α) forms a chart. The union of the charts forms an atlas A = {(U_α, ϕ_α)}. If U_α ∩ U_β ≠ ∅, then the chart transition map is given by ϕ_αβ : ϕ_α(U_α ∩ U_β) → ϕ_β(U_α ∩ U_β),
$$ \varphi_{\alpha\beta} := \varphi_\beta \circ \varphi_\alpha^{-1}. $$
Fig. 9 The MNIST data set is a two-dimensional surface in the image space. (a) LeCun's MNIST handwritten digit samples on the manifold (b) Hinton's t-SNE embedding in the latent space
The MNIST data set is treated as the data manifold Σ; the space of all possible images is the image space R^784; the plane is the latent space Z; the mapping from the data manifold to the latent space, ϕ : Σ → Z, is called the encoding map; and the inverse mapping ϕ^{-1} : Z → Σ is called the decoding map. Each handwritten digit image p ∈ Σ is a training sample on the data manifold; its image under the encoding map, ϕ(p), is called the latent code of p. The data set can be treated as a probability distribution μ defined on the data manifold Σ, which is called the data distribution (Fig. 10).
Main Tasks In general, deep learning systems have two major tasks:
We use the manifold view to explain how denoising is accomplished by a deep learning system. Traditional methods Fourier transform the noisy image, filter out the high-frequency components, and inverse Fourier transform back to the denoised image. Deep learning methods use the clean images to train the neural network, obtain a representation of the manifold, and then project the noisy image onto the manifold; the projected image point is the denoised image. As shown in Fig. 11 and the left frame of Fig. 12, we use a deep learning system to learn the data manifold of clean human facial images. A facial image with noise is p̃, which is not on but close to the manifold. We project p̃ onto Σ using the Riemannian metric of the image space R^n; the closest point on Σ to p̃ is p, and then p is the denoised image.
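A minimal PyTorch sketch of this projection step is given below; the decoder is assumed to be already trained, and the optimizer, step count, and learning rate are illustrative choices.

```python
import torch

def project_to_manifold(decoder, p_noisy, latent_dim, steps=500, lr=1e-2):
    """Project a noisy image onto the learned manifold through the decoder.

    decoder : trained decoding map psi_theta : Z -> X (assumed given)
    p_noisy : noisy image tensor in the ambient space
    Returns psi_theta(z*) where z* minimizes |psi_theta(z) - p_noisy|^2.
    """
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((decoder(z) - p_noisy) ** 2).sum()   # squared ambient distance
        loss.backward()
        opt.step()
    return decoder(z).detach()                        # the denoised image p
```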
The traditional method is independent of the content of the image, whereas the ML method heavily depends on it. The prior knowledge is encoded by the manifold; if the wrong manifold is chosen, then the denoising result is nonsensical. As shown in the right frame of Fig. 12, if we use the cat face manifold to denoise a human face image, the result looks like a cat face.
Manifold Learning
Learning the data manifold structure is equivalent to learning the encoding and decoding maps. The encoding map ϕ : Σ → Z maps the data manifold to the latent space. It pushes μ forward to the latent distribution, denoted as ϕ_# μ. Given the data manifold Σ and the latent space Z, there are infinitely many encoding maps. In practice, it is crucial to choose an appropriate map that preserves the data distribution. We use a low-dimensional example to illustrate the concepts, as shown in Fig. 13. The Buddha surface represents the data manifold Σ; μ is the
Fig. 12 Human facial image denoising by projection to the data manifold. (a) projection to a
human facial photo manifold (b) projection to a cat face image manifold
Fig. 13 Different encoding mappings from the manifold to the planar disk
uniform distribution on Σ. Each row shows one encoding map. In the top row, if we uniformly sample the unit disk in the latent space and pull the samples back to the surface by the decoding map, then the pullback samples on Σ are highly nonuniform. In contrast, in the bottom row, the uniform latent samples are pulled back to uniform samples on the surface. This shows that the encoding map in the bottom row preserves the data distribution μ in the latent space.
In practice, many methods have been proposed to compute the encod-
ing/decoding maps, such as VAE (variational autoencoder) (Kingma and Welling
2013; Jain et al. 2017), WAE (Wasserstein autoencoder) (Gelly et al. 2018),
adversarial autoencoder (Makhzani et al. 2015), and so on.
In deep learning, deep neural networks are used to approximate mappings between Euclidean spaces. One of the most commonly used activation functions is the ReLU function, σ(x) = max{x, 0}. When x is positive, we say the neuron is activated. One neuron represents a function σ(∑_{i=1}^{k} λ_i x_i − b), where the λ_i's are the weights and b is the bias. Many neurons are connected to form a network. A ReLU deep neural network (DNN) represents a piecewise linear map.
Definition 13 (ReLU DNN). For any number of hidden layers k ∈ N and input and output dimensions w_0, w_{k+1} ∈ N, an R^{w_0} → R^{w_{k+1}} ReLU DNN is given by specifying a sequence of k natural numbers w_1, w_2, …, w_k representing the widths of the hidden layers, a set of k affine transformations T_i : R^{w_{i−1}} → R^{w_i} for i = 1, …, k, and a linear transformation T_{k+1} : R^{w_k} → R^{w_{k+1}} corresponding to the weights of the hidden layers:
$$ \varphi_\theta = T_{k+1} \circ \sigma_k \circ T_k \circ \cdots \circ T_2 \circ \sigma_1 \circ T_1, \qquad (40) $$
where ∘ denotes mapping composition, θ represents all the weight and bias parameters, and σ_i represents the mapping σ_i : R^{w_{i−1}} → R^{w_i}, σ_i = (σ_i^1, σ_i^2, ⋯, σ_i^{w_i}),
$$ \sigma_i^{j} = \sigma\Big( \sum_{k=1}^{w_{i-1}} \lambda_i^{jk} x_k - b_i^{j} \Big). $$
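A minimal PyTorch sketch of the composition (40), with affine layers T_i followed by coordinatewise ReLU activations, reads as follows; the widths are illustrative.

```python
import torch.nn as nn

def relu_dnn(widths):
    """Build phi_theta = T_{k+1} o sigma_k o T_k o ... o sigma_1 o T_1 as in (40).

    widths = [w_0, w_1, ..., w_k, w_{k+1}] lists input, hidden and output widths.
    """
    layers = []
    for w_in, w_out in zip(widths[:-2], widths[1:-1]):
        layers += [nn.Linear(w_in, w_out), nn.ReLU()]   # affine map T_i, activation sigma_i
    layers.append(nn.Linear(widths[-2], widths[-1]))    # final map T_{k+1}
    return nn.Sequential(*layers)

phi_theta = relu_dnn([784, 256, 64, 2])   # e.g. a map from R^784 to a 2D latent space
```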
Fixing the parameters θ, the map ϕ_θ induces cell decompositions of the input space and of the output space,
$$ \mathcal{D}(\varphi_\theta):\quad X = \bigcup_{\alpha} U_\alpha, $$
Furthermore, ϕ_θ maps the cell decomposition D(ϕ_θ) of the ambient space to a cell decomposition of the latent space. The restriction of ϕ_θ to each cell is a linear map. The number of cells in D(ϕ_θ) describes the capacity of the network, namely the learning capability of the network.
We can explicitly estimate an upper bound on the network capacity N(N). Let C(d, n) denote the maximum number of parts one can obtain when cutting the d-dimensional space R^d with n hyper-planes; by induction, one can show that
$$ C(d, n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{d}. \qquad (41) $$
$$ N(\mathcal{N}) \ \le\ \prod_{i=1}^{k+1} C(w_{i-1}, w_i). \qquad (42) $$
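The bound can be evaluated directly; the short Python snippet below computes C(d, n) from (41) and the product bound (42) for an illustrative list of widths.

```python
from math import comb

def C(d, n):
    """Maximum number of cells obtained by cutting R^d with n hyper-planes, Eq. (41)."""
    return sum(comb(n, i) for i in range(d + 1))

def capacity_upper_bound(widths):
    """Upper bound (42) on the number of linear cells of a ReLU DNN with the given widths."""
    bound = 1
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        bound *= C(w_in, w_out)
    return bound

print(capacity_upper_bound([2, 8, 8, 1]))   # bound for a small illustrative network
```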
AutoEncoder
One of the most popular models for learning the encoding and decoding maps is the AutoEncoder, shown in Fig. 14. The AutoEncoder model consists of two symmetric deep neural networks: the first network represents the encoder, and the second network represents the decoder. The numbers of nodes in the input and the output layers equal the dimension of the ambient space. Between the encoder and the decoder there is a bottleneck layer; the number of nodes in the bottleneck layer equals the dimension of the latent space.
We denote the ambient space by X, the latent space by Z, the encoding map by ϕ_θ : X → Z, and the decoding map by ψ_θ : Z → X. We sample the data manifold Σ ⊂ X to get training samples {x_1, x_2, ⋯, x_n} ⊂ Σ and apply the L²-norm as the loss function L_θ. The training process is the optimization
$$ \min_\theta\, L_\theta(x_1, \ldots, x_n) \;=\; \min_\theta\, \sum_{i=1}^{n} \big| x_i - \psi_\theta \circ \varphi_\theta(x_i) \big|^2. \qquad (43) $$
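A minimal PyTorch training sketch for the optimization (43) is given below; the architectures, the random placeholder data, and the optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

ambient_dim, latent_dim = 784, 8
encoder = nn.Sequential(nn.Linear(ambient_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, ambient_dim))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x_train = torch.rand(1024, ambient_dim)        # placeholder training samples x_i

for epoch in range(10):
    for x in x_train.split(64):                # mini-batches
        opt.zero_grad()
        recon = decoder(encoder(x))
        loss = ((x - recon) ** 2).sum(dim=1).mean()   # L2 reconstruction loss of (43)
        loss.backward()
        opt.step()
```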
Fig. 15 Manifold embedding computed by an autoencoder. (a) Input manifold M ⊂ X (b) latent representation D = ϕ_θ(M) ⊂ Z (c) reconstructed manifold M̃ = ψ_θ(D) ⊂ X
Fig. 16 The cell decompositions induced by the autoencoder. (a) cell decomposition D(ϕ_θ) (b) latent space cell decomposition (c) cell decomposition D(ψ_θ ∘ ϕ_θ)
where |·| represents the cardinality of the set. For any point p ∈ Σ, the local feature size of p is the distance from p to the medial axis of Σ. Suppose the samples on Σ are X = {x_1, x_2, …, x_n} such that, for any point q ∈ Σ, the intersection of the geodesic disk c(q, δ) with X is non-empty, and the geodesic distance between any pair of samples is greater than ε; then X is called a (δ, ε) sampling. Given such a sampling, we can compute the geodesic Delaunay triangulation of X; this induces a polyhedral surface Σ̃. By geometric approximation theory, if Σ is C² smooth, we can determine the parameters δ, ε from the injectivity radius, the principal curvatures, and the local feature size such that Σ̃ approximates the original surface with arbitrary precision in terms of Hausdorff distance, Riemannian metric, Laplace-Beltrami operator, curvature measures, and so on.
Assume the network capacity of the autoencoder is big enough, the (δ, ε) samples are the training set, and the optimization reduces the loss function to 0; then the restriction of ψ_θ ∘ ϕ_θ equals the identity, and the autoencoder recovers Σ̃. By construction, the decoded surface approximates the original surface with user-desired accuracy. This argument can be generalized to higher dimensional
manifolds. In reality, the data manifold is unknown, and it is hard to determine its injectivity radius, curvatures, and local feature size; moreover, the optimization of deep networks often gets stuck at local optima. There remain many wide open challenges in learning the manifold structure.
Generative adversarial networks (GANs) are among the most popular generative models in deep learning. They have many merits: they can automatically generate samples, the requirements on the data samples are reduced, and they can model arbitrary data distributions without a closed-form expression. As shown in Fig. 17, a GAN model includes two deep neural networks, the generator and the discriminator. The generator converts white noise (a user-prescribed distribution in the latent space) into generated samples; the discriminator takes both the real data samples and the fake generated samples and decides whether the current sample is authentic or fake.
The generator and the discriminator compete with each other: the generator improves the quality of the generated samples to confuse the discriminator, and the discriminator improves its discriminating capability to detect the fake samples. Eventually, the system reaches a Nash equilibrium, where the discriminator cannot differentiate the generated samples from the real ones; the generated samples can then be used in real applications, such as training other recognition systems and so on.
The Wasserstein GAN applies the optimal transport method as shown in Fig. 18. The generator G computes the optimal transport map g_θ : Z → Σ, which transforms
Fig. 18 The framework of a GAN model; Z is the latent space, ζ the white noise, X the image space, Σ the data manifold, G the generator, D the discriminator
the white noise ζ in the latent space Z to the generated distribution μ_θ = (g_θ)_# ζ. The discriminator D computes the Kantorovich potential ϕ_ξ and then computes the Wasserstein distance between μ_θ and the real data distribution ν,
$$ W_c(\mu_\theta, \nu) = \max_{\varphi_\xi} \int_X \varphi_\xi(x)\, d\mu_\theta(x) + \int_Y \varphi_\xi^{c}(y)\, d\nu(y), $$
where X and Y should be the data manifold Σ; in practice, they are replaced by the image space X in Arjovsky et al. (2017). The whole training process of the WGAN model is a min-max optimization:
$$ \min_\theta \max_\xi \ \int_X \varphi_\xi(x)\, d\mu_\theta(x) + \int_Y \varphi_\xi^{c}(y)\, d\nu(y). $$
One can choose the L¹ cost, c(x, y) = |x − y|; then ϕ^c = −ϕ provided ϕ is 1-Lipschitz, and the WGAN model optimizes
$$ \min_\theta \max_\xi \ \int_Z \varphi_\xi \circ g_\theta(z)\, d\zeta(z) - \int_Y \varphi_\xi(y)\, d\nu(y). $$
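A minimal PyTorch sketch of this alternating min-max optimization with the L¹ cost is given below. The 1-Lipschitz constraint on ϕ_ξ is enforced here by weight clipping, as in Arjovsky et al. (2017); the architectures, the placeholder data, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784
g = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))   # generator g_theta
phi = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1))          # potential phi_xi

opt_g = torch.optim.RMSprop(g.parameters(), lr=5e-5)
opt_phi = torch.optim.RMSprop(phi.parameters(), lr=5e-5)
real = torch.rand(1024, data_dim)                     # placeholder samples from nu

for step in range(1000):
    for _ in range(5):                                # inner maximization over xi
        z = torch.randn(64, latent_dim)               # white noise zeta
        y = real[torch.randint(0, len(real), (64,))]
        loss_phi = phi(y).mean() - phi(g(z).detach()).mean()   # maximize phi(g(z)) - phi(y)
        opt_phi.zero_grad()
        loss_phi.backward()
        opt_phi.step()
        for p in phi.parameters():                    # keep phi (approximately) 1-Lipschitz
            p.data.clamp_(-0.01, 0.01)
    z = torch.randn(64, latent_dim)                   # outer minimization over theta
    loss_g = phi(g(z)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```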
For the quadratic cost, the Kantorovich potential directly determines the Brenier potential and the optimal transport map, namely
$$ u_\xi = \frac{1}{2}|x|^2 - \varphi_\xi(x), \qquad T_\xi = \nabla u_\xi. $$
In general, deep neural networks have a huge number of parameters, so that their capacities are big enough to memorize all the training samples. The following question is therefore naturally raised:
Question 4. Memorization vs. Learning: Does a deep learning system really learn something, or does it just memorize all the training samples?
We can tell that the system indeed memorizes all the training samples {y_i}, but it also learns the probability of each sample, represented by {h_i}, which are obtained by nonlinear optimization.
Hence deep learning systems both memorize the training samples and learn
the probability measure.
Mode Collapsing
Fig. 19 Comparison between conventional models VAE and WGAN with our model AE-OT
using MNIST data set. (a) VAE (b) WGAN (c) Our model, AE-OT
Fig. 20 Mode collapsing in WGAN-GP and WGAN-div model on CelebA data set. (a) WGAN-
GP (b) WGAN-div
If the target measure ν has multiple modes, namely, its support has multiple connected components, then a continuous map may cover one connected component and miss the other modes; this induces mode collapse. Alternatively, the continuous map may cover all the modes but also the gaps among them, and then the samples generated in the gap areas mix samples from different modes; this induces mode mixture.
Fig. 22 Comparison between conventional models with AE-OT. (a) original (b) GAN (c) pacgan
(d) Our model, AE-OT
As shown in Fig. 22, each orange spot represents a mode in frame (a); the GAN model (Goodfellow et al. 2014) misses some modes and also covers the gaps among the modes in frame (b); the PacGAN model (Lin et al. 2018) in frame (c) covers all the modes but also covers the gaps among them. Hence both the GAN model and the PacGAN model suffer from mode collapse and mode mixture.
In order to verify our hypothesis that, in real applications, the transport map is discontinuous on the singularity sets, we design and perform an experiment using the human facial image data set CelebA. As shown in Fig. 23, we use an autoencoder to encode the data manifold Σ into the latent space, ϕ : Σ → Z; ϕ pushes the data distribution μ forward to the latent code distribution ϕ_# μ. Then, in the latent space, we compute an optimal transport map from a uniform distribution on the unit ball to the latent code distribution ϕ_# μ; we draw line segments in the unit ball, which are mapped to curves on the data manifold, and each curve is an interpolation in the facial image set. As shown in Fig. 24, each row is an interpolation curve on the human facial image manifold.
As shown in Fig. 23, there are singularity sets in the unit ball, and a blue line segment intersects the singularity sets at p; then T(p) is outside the latent code set ϕ(Σ), and the decoded image ϕ^{-1}(T(p)) is outside the data manifold Σ. In this way, we can detect the boundary of the data manifold Σ. An image on the human facial
image manifold corresponds to a human face that is physically “allowable,” satisfying all the anatomical and biological laws, but has zero probability of appearing in reality. As shown in Fig. 25, we start from a boy image with brown eyes and end at a girl image with blue eyes. In the middle of the interpolation, we generate a facial image with one blue eye and one brown eye. This type of human face exists in the real world, but the probability of encountering such a person is almost zero in practice. All the training facial images have either brown eyes or blue eyes; the generated facial image with different eye colors is on the boundary of the data manifold. This demonstrates the existence of the singularity set and that the transport map T is discontinuous on it.
AE-OT Model
In order to eliminate mode collapse, improve stability, and make the whole model more interpretable, we propose a novel generative model, the AE-OT model. As shown in Fig. 26, the model consists of two parts: AE and OT. The AE network is an autoencoder, which focuses on manifold learning and computes the encoding map f_θ : Σ → Z and the decoding map g_ξ : Z → Σ; the OT module is in charge of the probability distribution transformation and finds the optimal transport map using our geometric variational approach. The OT module can be implemented either using a deep neural network optimized by training or directly using the geometric method, such as the Monte-Carlo OT algorithm on the GPU.
The mode collapse in conventional generative models is mainly caused by the step of computing the transport map: the transport map is discontinuous, but DNNs can only represent continuous maps. The AE-OT model conquers this fundamental difficulty in the following way. Observe Fig. 27: in the latent space, the latent code distribution has multiple clusters; the support rectangle of the white noise is partitioned into 10 cells as well, and each cell is mapped to a cluster with the same color. Therefore, the optimal transport map between the noise and the latent codes is discontinuous across the cell boundaries. Instead of computing the OT map itself, the AE-OT model computes the Brenier potential (lower-left corner), which is continuous (though not globally differentiable) and representable by neural networks. Since the OT map covers all the clusters of the latent code distribution and skips all the gaps among the clusters, no mode collapse or mode mixture can happen.
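A high-level sketch of the resulting generation pipeline is given below, assuming a trained encoder/decoder pair and reusing the Monte-Carlo routine sketched earlier (here referred to as the hypothetical helper mc_semi_discrete_ot). Note that the bare semi-discrete map sends each noise sample to one of the training latent codes; the full AE-OT model extends this map piecewise linearly away from the singular set so that genuinely new codes are produced.

```python
import numpy as np
import torch

def ae_ot_generate(encoder, decoder, data, n_new=16, seed=0):
    """Sketch of AE-OT sample generation.

    encoder, decoder    : trained autoencoder networks (assumed given)
    data                : training samples in the ambient space (torch tensor)
    mc_semi_discrete_ot : hypothetical helper from the earlier Monte-Carlo OT sketch.
    """
    with torch.no_grad():
        codes = encoder(data).cpu().numpy()             # latent codes y_i
    n, latent_dim = codes.shape
    nu = np.full(n, 1.0 / n)                            # empirical latent distribution
    h = mc_semi_discrete_ot(codes, nu)                  # OT module: Brenier heights

    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, size=(n_new, latent_dim))    # white noise in the latent space
    idx = (z @ codes.T + h[None, :]).argmax(axis=1)         # semi-discrete OT map T(z)
    new_codes = torch.from_numpy(codes[idx]).float()        # extend off the singular set in
                                                            # the full model to avoid memorization
    with torch.no_grad():
        return decoder(new_codes)                           # decoded generated samples
```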
Furthermore, the AE-OT model has the following merits. Solving the Monge-Ampère equation is reduced to a convex optimization, which has a unique solution by Brenier's Theorem 2, so the optimization will not be trapped in a local optimum. The Hessian matrix of the energy has an explicit formulation, so Newton's method can be applied with second-order convergence, or a quasi-Newton method can be used with super-linear convergence, whereas the conventional gradient descent method has linear convergence. The approximation accuracy can be fully controlled by the density of the sampling in the Monte-Carlo method; the algorithm can be refined to be hierarchical and self-adaptive to further improve the efficiency; and the parallel algorithm can be implemented on the GPU. By comparing Figs. 20 and 28,
Fig. 28 Comparison between CRGAN (Mescheder et al. 2018) and our model. (a) CRGAN –
mode collapsing (b) Our model
we can see that the AE-OT model greatly reduces mode collapse and mode mixture. Figure 29 shows the facial images generated by training our model on the CelebA-HQ data set.
Conclusion
References
Alexandrov, A.D.: Convex polyhedra. Translated from the 1950 Russian edition by N.S. Dairbekov, S.S. Kutateladze, A.B. Sossinsky. Springer Monographs in Mathematics (2005)
An, D., Guo, Y., Lei, N., Luo, Z., Yau, S.-T., Gu, X.: Ae-ot: A new generative model based on
extended semi-discrete optimal transport. In: International Conference on Learning Represen-
tations (2020)
Angenent, S., Haker, S., Tannenbaum, A.: Minimizing flows for the monge-kantorovich problem. SIAM J. Math. Anal. 35(1), 61–97 (2003)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: ICML, pp.
214–223 (2017)
Benamou, J.-D., Brenier, Y.: A numerical method for the optimal time-continuous mass transport problem and related problems. In: Caffarelli, L.A., Milman, M. (eds.) Monge Ampère Equation: Applications to Geometry and Optimization (Deerfield Beach, FL), volume 226 of Contemporary Mathematics, pp. 1–11, Providence (1999) American Mathematical Society
Bonnotte, N.: From knothe’s rearrangement to Brenier’s optimal transport map. SIAM J. Math.
Anal. 45(1), 64–87 (2013)
Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun.
Pure Appl. Math. 44(4), 375–417 (1991)
Caffarelli, L.A.: Some regularity properties of solutions of monge–ampère equation. Commun.
Pure Appl. Math. 44(8–9), 965–969 (1991)
Cui, L., Qi, X., Wen, C., Lei, N., Li, X., Zhang, M., Gu, X.: Spherical optimal transportation.
Comput. Aided Des. 115, 181–193 (2019)
47 Optimal Transport for Generative Models 1705
Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Burges, C.J.C.,
Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Informa-
tion Processing Systems 26, pp. 2292–2300. Curran Associates, Inc. (2013)
Dai, B., Wipf, D.: Diagnosing and enhancing VAE models. In: International Conference on
Learning Representations (2019)
De Goes, F., Breeden, K., Ostromoukhov, V., Desbrun, M.: Blue noise through optimal transport.
ACM Trans. Graph. 31(6), 171 (2012)
De Goes, F., Cohen-Steiner, D., Alliez, P., Desbrun, M.: An optimal transport approach to robust
reconstruction and simplification of 2D shapes. In: Computer Graphics Forum, vol. 30, pp.
1593–1602. Wiley Online Library (2011)
Dominitz, A., Tannenbaum, A.: Texture mapping via optimal mass transport. IEEE Trans. Vis.
Comput. Graph. 16(3), 419–433 (2010)
Donahue, J., Simonyan, K.: Large scale adversarial representation learning. arXiv preprint arXiv:1907.02544 (2019)
Figalli, A.: Regularity properties of optimal maps between nonconvex domains in the plane.
Communications in Partial Differential Equations, 35(3), 465–479 (2010)
Goodfellow, I.: Nips 2016 tutorial: Generative adversarial networks. arXiv preprint
arXiv:1701.00160 (2016)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014)
Gu, D.X., Luo, F., Sun, J., Yau, S.-T.: Variational principles for minkowski type problems, discrete
optimal transport, and discrete monge–ampère equations. Asian J. Math. 20, 383–398 (2016)
Guan, P., Wang, X.-J., et al.: On a monge-ampere equation arising in geometric optics. J. Diff.
Geom. 48(48), 205–223 (1998)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: NIPS, pp. 5769–5779 (2017)
Gutiérrez, C.E., Huang, Q.: The refractor problem in reshaping light beams. Arch. Ration. Mech.
Anal. 193(2), 423–443 (2009)
Gelly, S., Schoelkopf, B., Tolstikhin, I., Bousquet, O.: Wasserstein auto-encoders. In: ICLR (2018)
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial
networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
Jain, U., Zhang, Z., Schwing, A.G.: Creativity: Generating diverse questions using variational
autoencoders. In: CVPR, pp. 5415–5424 (2017)
Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial feature learning. In: International Conference on Learning Representations (2017)
Kantorovich, L.V.: On a problem of monge. J. Math. Sci. 133(4), 1383–1383 (2006)
Kantorovich, L.V.: On a problem of monge. Uspekhi Mat. Nauk. 3, 225–226 (1948)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114
(2013)
Lindbo Larsen, A.B., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels
using a learned similarity metric (2016)
Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani,
A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative
adversarial network (2017)
Lei, N., An, D., Guo, Y., Su, K., Liu, S., Luo, Z., Yau, S.-T., Gu, X.: A geometric understanding of
deep learning. Engineering 6(3), 361–374 (2020)
Lei, N., Su, K., Cui, L., Yau, S.-T., Gu, X.D.: A geometric view of optimal transportation and
generative model. Comput. Aided Geom. Des. 68, 1–21 (2019)
Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial
networks. In: Advances in Neural Information Processing Systems, pp. 1505–1514 (2018)
Liu, H., Gu, X., Samaras, D.: Wasserstein gan with quadratic transport cost. In: ICCV (2019)
Ma, X.N., Trudinger, N.S., Wang, X.J.: Regularity of potential functions of the optimal transporta-
tion problem. Arch. Ration. Mech. Anal. 177(2), 151–183 (2005)
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. arXiv
preprint arXiv:1511.05644 (2015)
Mérigot, Q.: A multiscale approach to optimal transport. In: Computer Graphics Forum, vol. 30,
pp. 1583–1592. Wiley Online Library (2011)
Mescheder, L.M., Nowozin, S., Geiger, A.: Which training methods for gans do actually conver-
gence? In: International Conference on Machine Learning (ICML) (2018)
Meyron, J., Mérigot, Q., Thibert, B.: Light in power: a general and parameter-free algorithm for
caustic design. In: SIGGRAPH Asia 2018 Technical Papers, p. 224. ACM (2018)
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial
networks. In: ICLR (2018)
Nadeem, S., Su, Z., Zeng, W., Kaufman, A.E., Gu, X.: Spherical parameterization balancing angle
and area distortions. IEEE Trans. Vis. Comput. Graph. 23(6), 1663–1676 (2017)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional
generative adversarial networks. In: ICLR (2016)
Razavi, A., Oord, A., Vinyals, O.: Generating diverse high-fidelity images with vq-vae-2. In: ICLR
2019 Workshop DeepGenStruct (2019)
Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference
in deep generative models. arXiv preprint arXiv:1401.4082 (2014)
Salakhutdinov, R., Burda, Y., Grosse, R.: Importance weighted autoencoders. In: ICML (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition
(2014)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (2019)
Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.:
Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.
ACM Trans. Graph. 34, 1–11 (2015a)
Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.:
Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.
ACM Trans. Graph. 34(4), 66 (2015b)
Solomon, J., Rustamov, R., Guibas, L., Butscher, A.: Earth mover’s distances on discrete surfaces.
ACM Trans. Graph. 33(4), 67 (2014)
Su, K., Chen, W., Lei, N., Cui, L., Jiang, J., Gu, X.D.: Measure controllable volumetric mesh
parameterization. Comput. Aided Des. 78(C), 188–198 (2016)
Su, K., Chen, W., Lei, N., Zhang, J., Qian, K., Gu, X.: Volume preserving mesh parameterization
based on optimal mass transportation. Comput. Aided Des. 82:42–56 (2017)
Su, K., Cui, L., Qian, K., Lei, N., Zhang, J., Zhang, M., Gu, X.D.: Area-preserving mesh
parameterization for poly-annulus surfaces based on optimal mass transportation. Comput.
Aided Geom. Des. 46(C):76–91 (2016)
Su, Z., Zeng, W., Shi, R., Wang, Y., Sun, J., Gu, X.: Area preserving brain mapping. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2235–2242 (2013)
Thoma, J., Acharya, D., Van Gool, L., Wu, J., Huang, Z.: Wasserstein divergence for gans. In:
ECCV (2018)
ur Rehman, T., Haber, E., Pryor, G., Melonakos, J., Tannenbaum, A.: 3D nonrigid registration via
optimal mass transport on the GPU. Med. Image Anal. 13(6), 931–940 (2009)
van den Oord, A., Vinyals, O., Kavukcuoglu, K.: Neural discrete representation learning. In: NeurIPS (2017)
Villani, C.: Optimal transport: Old and new, vol. 338. Springer Science & Business Media (2008)
Wang, X.-J.: On the design of a reflector antenna. Inverse Prob. 12(3), 351 (1996)
Wang, X.-J.: On the design of a reflector antenna II. Calc. Var. Partial Differ. Equ. 20(3), 329–341
(2004)
Xiao, C., Zhong, P., Zheng, C.: Bourgan: Generative networks with metric embeddings. In:
NeurIPS (2018)
Yau, S.-T.: SS Chern: A great geometer of the twentieth century. International Press (1998)
Yu, X., Lei, N., Zheng, X., Gu, X.: Surface parameterization based on polar factorization. J.
Comput. Appl. Math. 329(C), 24–36 (2018)
Image Reconstruction in Dynamic Inverse
Problems with Temporal Models 48
Andreas Hauptmann, Ozan Öktem, and Carola Schönlieb
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1708
Outline of Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1710
Spatiotemporal Inverse Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1711
Reconstruction Without Explicit Temporal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1712
Reconstruction Using a Motion Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1714
Reconstruction Using a Deformable Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1715
Motion Models Based on Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1718
Physical Motion Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1718
Deformable Templates Given by Diffeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1722
Flow of Diffeomorphisms and Intensities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1723
Deformable Templates by Metamorphosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1724
Spatiotemporal Reconstruction with LDDMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1725
Data-Driven Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1727
Data-Driven Reconstruction Without Temporal Modelling . . . . . . . . . . . . . . . . . . . . . . . . 1729
A. Hauptmann
Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
Department of Computer Science, University College London, London, UK
e-mail: [email protected]
O. Öktem ()
Department of Information Technology, Division of Scientific Computing, Uppsala University,
Uppsala, Sweden
Department of Mathematics, KTH – Royal Institute of Technology, Stockholm, Sweden
e-mail: [email protected]
C.-B. Schönlieb
Department of Applied Mathematics and Theoretical Physics, University of Cambridge,
Cambridge, UK
e-mail: [email protected]
Abstract
Keywords
Introduction
Dynamic inverse problems in imaging refer to the case when the object being
imaged undergoes a temporal evolution during the data acquisition. The resulting
data in such an inverse problem is a time (or quasi-time) series and due to limited
sampling speed typically highly undersampled. Failing to account for the dynamic
nature of the imaged object will lead to severe degradation in image quality, and
hence there is a strong need for advanced modeling of the involved dynamics by
incorporating temporal models in the reconstruction task.
The need for dynamic imaging arises, for instance, in various tomographic
imaging studies in medicine, such as imaging moving organs (respiratory and
cardiac motion) with computed tomography (CT) (Kwong et al. 2015), positron
emission tomography (PET), or magnetic resonance imaging (MRI) (Lustig et al.
2006), and in functional imaging studies by means of dynamic PET (Rahmim et al.
2019) or functional MRI (Glover 2011). In functional imaging studies, the dynamic
information is crucial for the diagnostic value to assess functionality of organs
or tracking an injected tracer. Spatiotemporal imaging also arises in life sciences
(Mokso et al. 2014) where it is crucial to understand dynamics and interactions
of organisms. Lastly, applications in material sciences (De Schryver et al. 2018;
Ruhlandt et al. 2017) and process monitoring (Chen et al. 2018) rely on the
capabilities of dynamic image reconstruction.
Mathematically, solving dynamic inverse problems in imaging or spatiotemporal
image reconstruction aims to recover a time-dependent image from a measured time
series. Since the measured time series is typically highly undersampled in each
time instance, the reconstruction task is ill-posed, and additional prior knowledge is
needed to recover a meaningful spatiotemporal image. One such prior assumption
can be made on the type of dynamics in the studied object, which can regularize the
reconstruction task by penalizing unrealistic motion.
There are various approaches in the literature for solving dynamic inverse
problems. In this paper, we focus on variational models for this task which
occupy a relatively large space in this context in the literature. Here, we identify
two subgroups: those variational approaches which incorporate prior temporal
information in the regularizer without a physical motion model but as a smoothness
prior, e.g., as in Niemi et al. (2015) for slowly evolving images, and those variational
approaches which incorporate prior temporal information in the model by motion
constraints characterized either by an evolutionary PDE for the reconstruction or by
a registration approach with a time-dependent deformation operator that is applied
to a template.
The former, variational methods with a temporal smoothness prior, are applicable
to a wide range of dynamic inverse problems as outlined in Schmitt and Louis (2002)
and Schmitt et al. (2002). Indeed, the absence of an explicit motion constraint makes
these methods more generally applicable. Some imaging-related applications are
Feng et al. (2014), Lustig et al. (2006), and Steeden et al. (2018) for spatiotemporal
compressed sensing in dynamic MRI. Here, the temporal regularity is enforced by
a sparsifying transform (or total variation). Further examples are μCT imaging of
dynamic processes (Bubba et al. 2017; Niemi et al. 2015) and process monitoring
with electrical resistance tomography (Chen et al. 2018).
The latter, variational methods featuring explicit motion models, can be divided
in two categories. The first ones model the motion as an evolutionary PDE (Burger
et al. 2017, 2018; Dirks 2015; Frerking 2016) using optical flow (Horn and Schunck
1981) or a continuity equation (Burger et al. 2018; Lang et al. 2019a), either
as a constraint or in the form of a penalty term in the variational reconstruction
model. Some prominent applications of this approach are in dynamic photoacoustic
tomography (Lucka et al. 2018) and 3D computed tomography (Djurabekova et al.
2019), just to name a few. The second one parametrizes the dynamics in the form
of a time-dependent diffeomorphic deformation operator (Younes 2019). Examples
for such deformation models are LDDMM (Beg et al. 2005; Miller et al. 2006;
Trouvé and Younes 2015) and metamorphosis (Younes 2019, Chapter 13). Dynamic
image reconstruction is then modeled as an indirect registration task, as in Gris et al.
(2020) with metamorphosis or Chen et al. (2019) and Lang et al. (2019b) using
LDDMM. See also Yang et al. (2013) and Chen and Öktem (2018) for surveys on
this topic.
Recently, deep neural network approaches have also entered the picture as a
mean to approximate the solution to the computationally demanding variational
approaches discussed above. Examples for these are Schlemper et al. (2017),
Hauptmann et al. (2019), and Kofler et al. (2019) for dynamic image reconstruction
without incorporating physical motion models and Qin et al. (2018), Liu et al.
(2019), and Pouchol et al. (2019) for learned indirect registration approaches.
Outline of Survey
The survey focuses on variational methods for recovering a tomographic image that
undergoes temporal evolution.
Section “Spatiotemporal Inverse Problems” is an overview of various approaches
for reconstruction in such a setting. It starts with a mathematical formalization
of a spatiotemporal inverse problem that is given as the task of solving an (time
dependent) operator equation. This is followed by specifying various variational
approaches for reconstruction that differ according to how the temporal model is
specified. Section “Reconstruction Without Explicit Temporal Models” outlines a
setup of a variational approach for reconstruction in a setting when one lacks an
explicit temporal model resulting in (4). Such an approach is however not further
explored in this survey; instead, focus is on a setting where there is an explicit
temporal model and here the survey considers two variants.
In the first (section “Reconstruction Using a Motion Model”), the temporal model
is given as the solution to an operator equation with a time-dependent parameter
as in (7). The resulting variational model for reconstruction can be expressed as
in (13). Section “Motion Models Based on Partial Differential Equations” further
develops this formulation by considering partial differential equation (PDE)-based
formulations.
In the second (section “Reconstruction Using a Deformable Template”), the
temporal model is given by applying a parametrized deformation operator to a
template in which the parameter is time dependent. This results in a temporal
model of the form (15) that can be incorporated into a variational approach for
reconstruction as in (17). This is followed by an outline of two approaches when data
is time discretized. Section “Deformable Templates Given by Diffeomorphisms”
builds on these approaches by considering explicit diffeomorphic deformation
operators given by solving a flow equation.
As already stated, section “Motion Models Based on Partial Differential Equa-
tions” outlines how PDE-based motion models can be used for spatiotemporal
reconstruction through (13). Likewise, section “Deformable Templates Given by
Diffeomorphisms” outlines approaches based on (17) in which the deformation
operator is given by solving an ordinary differential equation (ODE).
Section “Data-Driven Approaches” reviews data-driven approaches that have
been developed for improving upon the computational feasibility of the varia-
tional models in sections “Deformable Templates Given by Diffeomorphisms”
and “Motion Models Based on Partial Differential Equations”. In particular, sec-
tion “Data-Driven Reconstruction Without Temporal Modelling” outlines data-
driven methods that can be viewed as building on section “Reconstruction Without
Explicit Temporal Models”. Similarly, one can see section “Learning Motion
Models” as a data-driven extension of sections “Reconstruction Using a Motion
Model” and “Motion Models Based on Partial Differential Equations” and sec-
tion “Learning Deformation Operators” as a data-driven extension of the methods
Remark 1. The formulation in (1) also covers cases when the noise in data depends on the signal strength, like Poisson noise. Simply assume e(t, ·) in (1) is a sample of the random variable e(t, ·) := g(t, ·) − A(t, f(t, ·)), where g(t, ·) is the Y-valued random variable generating data.
Special cases of (1) arise depending on how the time dependency enters into
the problem. In particular, the following three components can depend on time
independently of each other:
(a) Forward operator: The forward model may depend intrinsically on time.
(b) Data acquisition geometry: The way the forward operator is sampled has a
specific time dependency.
(c) Image: The image to be recovered depends on time.
Next, an important special case is when data in (1) is observed at discrete time
instances 0 ≤ t0 < . . . < tn ≤ T ; see also Schmitt and Louis (2002). Then, (1)
reduces to the task of recovering images fj ∈ X from data gj ∈ Y where
$$ g_j := g(t_j, \cdot) \in Y, \quad f_j := f(t_j, \cdot) \in X, \quad e_j := e(t_j, \cdot) \in Y, \quad A_j := A(t_j, \cdot) : X \to Y. \qquad (3) $$
On the other hand, if the image has edges that need to be preserved, then BV(Ω)-
regularity is more natural and a total variation (TV)-regularizer is a better choice
(Rudin et al. 1992). This regularizer is for f ∈ W 1,1 (Ω) expressible as
$$ S(f) := \int_\Omega \big| \nabla f(x) \big|\, dx. \qquad (6) $$
Other choices may include higher order terms to the total variation functional, like
in total generalized variation; see Benning and Burger (2018) and Scherzer et al.
(2009) for a survey.
The choice of temporal regularizer is much less explored. This functional
accounts for a priori temporal regularity. Similarly to (5) one can here think of a
smoothness prior (Niemi et al. 2015) for slowly evolving images
$$ T(\partial_t f) := \int_\Omega \big| \partial_t f(x) \big|^2\, dx, \qquad (7) $$
or a total variation type of penalty (Feng et al. 2014) for changes that are small or
occur stepwise (image changes stepwise). The regularizer (7) acts pointwise in time,
and full temporal dependency is obtained by integrating over time in (4).
Methods for solving (1) based on (4) can be used when there is no explicit
temporal model that connects images and data across time. Hence, such methods
are applicable to a wide range of dynamic inverse problems as outlined in
Schmitt and Louis (2002) and Schmitt et al. (2002). More specific imaging-related
applications are Feng et al. (2014), Lustig et al. (2006), and Steeden et al. (2018) for
spatiotemporal compressed sensing in dynamic MRI. Here, the temporal regularity
is enforced by a sparsifying transform (or total variation). Further examples are μCT
imaging of dynamic processes (Bubba et al. 2017; Niemi et al. 2015) and process
monitoring with electrical resistance tomography (Chen et al. 2018).
Remark 2. When data is time discretized, one also has the option to reconstruct the images at each time step independently. An example of this is to recover the image at t_j by using a variational regularization method, i.e., as f_j where
$$ f_j := \arg\min_{f \in X}\, \Big\{ L\big( A_j(f),\, g_j \big) + S_{\gamma_j}(f) \Big\} \quad \text{for } j = 1, \ldots, n. \qquad (8) $$
Our emphasis will henceforth be on methods for solving (1) that utilize more
explicit temporal models.
The idea here is to assume that a solution t → f (t, ·) ∈ X to (1) has a time evolution
that can be modeled by a motion model. Restating this assumption mathematically,
we assume there is an operator Ψ : [0, T ] × X → X (motion model) such that
$$ \Psi\big(t, f(t,\cdot)\big) = 0 \ \text{on } \Omega \quad \text{whenever } t \mapsto f(t,\cdot) \text{ solves (1)}. \qquad (9) $$
Hence, (1) can be rephrased as the task of recovering the image trajectory t →
f (t, ·) ∈ X along with its motion model Ψ : [0, T ] × X → X from time series data
t → g(t, ·) ∈ Y where
$$ g(t, \cdot) = A\big(t, f(t,\cdot)\big) + e(t, \cdot) \ \text{on } M, \quad \text{s.t. } \Psi\big(t, f(t,\cdot)\big) = 0 \ \text{on } \Omega, \quad \text{for } t \in [0, T]. \qquad (10) $$
Assume now that the motion model is parametrized, i.e., Ψ_{θ_t}(f(t, ·)) = 0 on Ω for some t ↦ θ_t. Then, (1) can be rephrased as the task of recovering t ↦ f(t, ·) ∈ X along with the motion parameter t ↦ θ_t ∈ Θ from time series data t ↦ g(t, ·) ∈ Y where
$$ g(t, \cdot) = A\big(t, f(t,\cdot)\big) + e(t, \cdot) \ \text{on } M, \quad \text{s.t. } \Psi_{\theta_t}\big(f(t,\cdot)\big) = 0 \ \text{on } \Omega, \quad \text{for } t \in [0, T]. \qquad (12) $$
$$ \arg\min_{\substack{t \mapsto f(t,\cdot) \in X \\ t \mapsto \theta_t \in \Theta}} \int_0^T \Big[ L\big( A(t, f(t,\cdot)),\, g(t,\cdot) \big) + T_\tau(t, \theta_t) + S_\gamma\big(f(t,\cdot)\big) \Big] dt \quad \text{s.t. } \Psi_{\theta_t}\big(f(t,\cdot)\big) = 0 \ \text{for } t \in [0, T]. \qquad (13) $$
Just as for (4), one here needs to choose S γ : X → R (spatial regularizer) and
Tτ (t, ·) : X → R (temporal regularizer), whereas L : Y × Y → R is derived from
a statistical model for the noise in data.
In practice, the hard constrained formulation might be too restrictive, and we
rather aim to solve a penalized version, where the motion constraint is incorporated
as a regularizer; see section “Motion Models Based on Partial Differential Equa-
tions” for further details. Next, for data that is time discretized, the formulation in
(13) reduces to a series of reconstruction and registration problems that are solved
simultaneously. Practically, the optimization is usually performed in an alternating
way, where first a dynamic reconstruction f (t, ·) for t ∈ [0, T ] is obtained, followed
by an update of the motion parameters t → θt . This alternating minimization
procedure is then iterated until a convergence criterion is fulfilled (Burger et al.
2018). Interpreted in a Bayesian setting, this approach compares to smoothing
(Burger et al. 2017).
The idea here is that when solving (1), the temporal model for t → f (t, ·) ∈ X
is given by deforming a fixed (time-independent) template f0 ∈ X using a time-
dependent parametrization of a deformation operator.
Deformation Operators
To formalize the underlying assumption in reconstruction with a deformable
template, we assume there is a fixed family {W θ }θ∈Θ of mappings (deformation
operators)
Wθ : X → X for θ ∈ Θ. (14)
Assume now that f(t, ·) = W_{θ_t}(f_0) for some t ↦ θ_t ∈ Θ and f_0 ∈ X. Then, (1) can be rephrased as the inverse problem of recovering f_0 ∈ X and t ↦ θ_t ∈ Θ from time series data g(t, ·) ∈ Y where
$$ g(t, \cdot) = A\big(t, \mathcal{W}_{\theta_t}(f_0)\big) + e(t, \cdot) \ \text{on } M \quad \text{for } t \in [0, T]. \qquad (16) $$
Remark 3. Comparing assumption (15) with (9), we see that they are equivalent if Ψ(t, W_{θ_t}(f_0)) = 0 holds on Ω for t ∈ [0, T].
for suitable choices of regularizers S_γ : X → R and T_τ(t, ·) : Θ → R.
$$ g_j = A_j\big( \mathcal{W}_{\theta_j}(f_0) \big) + e_j \quad \text{for } j = 1, \ldots, n. \qquad (18) $$
$$ (f_0, \theta_1, \ldots, \theta_n) \in \arg\min_{\substack{f_0 \in X \\ \theta_1, \ldots, \theta_n \in \Theta}} \ \sum_{j=1}^{n} \Big[ L\big( A_j(\mathcal{W}_{\theta_j}(f_0)),\, g_j \big) + T_\tau(\theta_j) + S_\gamma\big( \mathcal{W}_{\theta_j}(f_0) \big) \Big]. \qquad (19) $$
$$ \Psi_\nu\big(f(t,\cdot)\big) := \frac{\partial f}{\partial t}(t,\cdot) + \nabla\cdot\big( \nu(t,\cdot)\, f(t,\cdot) \big) = 0 \quad \text{on } \Omega \subset \mathbb{R}^d. \qquad (24) $$
$$ 0 = \frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_{i=1}^{d} \frac{\partial f}{\partial x_i}\,\frac{dx_i}{dt} = \partial_t f + \nabla f \cdot \nu. \qquad (25) $$
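As a small illustration, the discretized residual of this constraint can be evaluated directly; the NumPy sketch below uses central differences via numpy.gradient and assumes a (t, y, x) array layout with the velocity components stored as (v_x, v_y), conventions chosen only for this example.

```python
import numpy as np

def optical_flow_residual(f, v):
    """Discrete residual of the brightness-constancy constraint (25).

    f : array of shape (T, H, W), the image sequence f(t, x)
    v : array of shape (T, H, W, 2), the velocity field nu(t, x), components (v_x, v_y)
    Returns d_t f + grad f . nu, computed with central differences.
    """
    dt_f = np.gradient(f, axis=0)     # temporal derivative
    dy_f = np.gradient(f, axis=1)     # spatial derivative in y
    dx_f = np.gradient(f, axis=2)     # spatial derivative in x
    return dt_f + dx_f * v[..., 0] + dy_f * v[..., 1]
```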
This equation is also called the optical flow constraint, and it is a popular approach
to model motion between consecutive images (Horn and Schunck 1981). In the
following, we will base the motion-constrained reconstruction as formulated in (13)
on the continuity equation (24), assuming either mass conservation or the stronger
assumption of brightness constancy in the form of the optical flow model. For both
$$ \arg\min_{\substack{t \mapsto f(t,\cdot) \in X \\ t \mapsto \nu(t,\cdot) \in V}} \int_0^T \Big[ \frac{1}{p}\big\| A(t, f(t,\cdot)) - g(t,\cdot) \big\|_p^p + \alpha \big\| f(t,\cdot) \big\|_{BV}^q + \beta \big\| \nu(t,\cdot) \big\|_{BV}^r \Big] dt, \quad \text{s.t. } \Psi_\nu\big(f(t,\cdot)\big) = 0 \ \text{on } \Omega \subset \mathbb{R}^d. \qquad (26) $$
Here we use for both image sequence and vector field the respective total variation
as a regularizer, given by the semi-norm in the space of bounded variation.
Consequently, given fixed domain Ω ⊂ Rd , the spaces under consideration here are
X = BV(Ω, R) for the reconstructions and V = BV(Ω, Rd ) for the corresponding
vector field. Other models can be considered such as L2 -regularizer for the mass
conservation or other convex regularizer (see Burger et al. 2018; Dirks 2015 for
details). We furthermore assume the forward operator A(t, ·) : X → Y to be a
bounded linear operator to some Hilbert space Y . In particular, it can be time-
dependent (Burger et al. 2017; Frerking 2016).
The motion constraint in (24) is used to describe how image sequence and
vector fields are connected. From the perspective of tomographic reconstructions,
the motion constraint acts as an additional temporal regularizer along the motion
field ν. Instead of imposing the motion constraint exactly as in (26), we can also
relax it and add as a least-squares term to the functional itself, cf. Burger et al.
(2018).
In order to establish existence of minimizers of (26), we need to ensure appro-
priate weak-star compactness of sublevel sets and lower semicontinuity. We will
restrict the following results here now to dimension d = 2. For the minimization,
we consider the space
$$ \mathcal{D} := \Big\{ (f, \nu) \in L^{\min\{p,q\}}\big([0, T]; X\big) \times L^{r}\big([0, T]; V\big) \ \Big| $$
of p. We can now state an existence result for the joint model (26) that is proven in
Burger et al. (2018).
$$ \mathcal{J}(f, \nu) := \int_0^T \Big[ \frac{1}{p}\big\| A(t, f(t,\cdot)) - g(t,\cdot) \big\|_p^p + \alpha \big| f(t,\cdot) \big|_{BV}^q + \beta \big| \nu(t,\cdot) \big|_{BV}^r \Big] dt. $$
Furthermore, let A be such that it does not eliminate constants, i.e., A(t, 1) ≠ 0 for all t ∈ [0, 1]. Then, there exists a minimizer of J(f, ν) in the constraint set S := {(f, ν) ∈ D | Ψ_ν(f) = 0}, where D is given as in (27).
The proof for p = 2 follows from Dirks (2015) and Burger et al. (2018), and the
case for p = 1 follows similar arguments as outlined in Frerking (2016). Existence
for the unconstrained case is proved by incorporating the constraint as a penalty term
in the functional J as shown in Burger et al. (2018). We note here that the choice
q, r > 1 has to be made in the analysis in order to avoid dealing with measures in
time. In the computational use cases considered below, it is however reasonable to
set q = r = 1.
1 p
T
arg min A t, f (t, ·) − g(t, ·)
t→f (t,·)∈X 0 p p
t→ν(t,·)∈V
+ α|f (t, ·)|BV + γ Ψν f (t, ·) + β|ν(t, ·)|BV dt, (28)
1
1
T
f k+1 = arg min A(t, f ) − g p + α|f |BV + γ Ψ k (f ) dt
p ν 1
t→f (t,·)∈X 0 p
(29)
T
β
ν k+1 = arg min Ψν (f k+1 ) + |ν|BV dt. (30)
t→ν(t,·)∈V 0 1 γ
Most importantly, both subproblems are now linear and convex, but we note that
the solution of the alternating scheme might correspond to local minima of the
joint model. In practice, one would initialize f 0 = 0 and ν = 0, and then
the first minimization problem for f 1 corresponds to a classic total variation
regularized solution for each image time instance separately followed by a motion
estimation. Reconstructions from Burger et al. (2017) using this alternating scheme
for experimental μCT data are shown in Fig. 1 and an illustration of the influence of
Lp -norms in the data fidelity in Fig. 2.
One can use any optimization algorithm that supports non-differentiable terms
for computing solutions to each of the subproblems (29) and (30). In dimension
d = 2, one could simply use a primal-dual hybrid gradient scheme (Chambolle and
Pock 2011) as outlined in Burger et al. (2017) (see also Aviles-Rivero et al. 2018);
Fig. 1 Reconstructions from Burger et al. (2017) of experimental X-ray data using the approach
in (28) with an optical flow constraint. Top row shows the ground-truth spatiotemporal image, and
bottom row shows data and reconstruction for three sampling schemes
1722 A. Hauptmann et al.
Fig. 2 Reconstruction results for the random sampling with both p = 1, 2 for the fidelity term
in (28) for time points 17 and 25. The left images show that L1 -norm clearly favors sparse
reconstructions with a resulting sparse motion field. In contrast, the L2 -norm shown in the right
favors smoother reconstructions and motion fields
here, both applications use the optical flow constraint (25). In higher dimensions
where the computational burden of the forward operator becomes more prevalent,
it is advised to consider other schemes with fewer operator evaluations, and we
refer to Lucka et al. (2018) for an application to dynamic 3D photoacoustic
tomography as well as Djurabekova et al. (2019) for dynamic 3D computed
tomography.
To conclude this section, we mention that in other applications, it might be more
suitable to require mass conversation using the continuity equation instead (see, for
instance, Lang et al. 2019a).
The reconstruction methods described here aim to solve (16) using deformable
templates (section “Reconstruction Using a Deformable Template”).
Images are elements in the Hilbert space X := L2 (Ω, R) for some fixed bounded
domain Ω ⊂ Rd . The deformation operator is given by acting with diffeomorphisms
on images. Hence, let Diff(Ω) denote the group of diffeomorphisms (with compo-
sition as group law), and (φ, f0 ) → φ.f0 denotes the (group) action of Diff(Ω) on
X. In imaging, there are now two natural options:
Geometric group action: This group action simply moves image intensities with-
out changing their gray scale values, which correspond to shape deformation:
Mass-preserving group action: Image intensities are allowed to change, but one
preserves the total mass:
as
ν
φs,t := φ(t, ·) ◦ φ(s, ·)−1 for s, t ∈ [0, T ] and φ(t, ·) solving (33). (34)
ν
GV := φ : Rd → Rd : φ = φ0,T for some ν ∈ L1 ([0, T ], V ) . (35)
p,∞ p
Diff0 (Ω) := φ ∈ Diffp,∞ (Ω) : φ − Id ∈ C0 (Ω, Rd ) .
p p,∞
Next, if V is embedded in C0 (Ω, Rd ), then GV is a subgroup of Diff0 (Ω).
ν,ζ
One can show that (36) has a unique solution t → (φ0,t
ν ,I
t ) ∈ GV × X (Trouvé
and Younes 2005; Charon et al. 2018), so the above construction can be used for
deforming images.
The aim here is to solve (16) with time discretized data. Following Gris et al. (2020),
the idea is to adopt the independent trajectory approach outlined in section “Time
Discretized Data”, so the inverse problem can be reformulated as a sequence of
indirect registration problems (18). Hence, the task reduces to recovering and
matching a template f0 independently to data gj in the sense of joint reconstruction
and registration (indirect registration). One could here consider various approaches
for indirect registration (see Yang et al. 2013; Chen and Öktem 2018 for surveys),
and Gris et al. (2020) uses metamorphosis for this step.
The above considerations lead to the following variational formulation:
n
(θ1 , . . . , θn ) ∈ arg min L Aj W θj (f0 ) , gi + λ ν 2
2 +τ ζ 2
2 .
θ1 ,...,θn ∈V ×X i=1
(37)
The template f0 ∈ X and data g1 , . . . , gn ∈ Y are related to each other as in
(2), and the deformation operator W θj : X → X, which is parametrized by θj :=
(ν(tj , ·), ζ (tj , ·)) ∈ V × X, is given by the metamorphosis framework as
ν ν,ζ ν ν,ζ
Wθj (f0 ) := φ0,t .I
i ti
where (φ0,t , It ) ∈ GV × X solves (36). (38)
The aim here is to solve (16) with time continuous data by a variational formulation
of the type (17). Following Chen et al. (2019), W θt : X → X in (17) (deformation
operator) is given by the LDDMM framework, so it is parametrized by θt :=
ν(t, ·) ∈ V for some ν ∈ L2 ([0, T ], V ) as
ν ν
W θt (f0 ) := φ0,t .f0 for f0 ∈ X and φ0,t ∈ GV as in (34). (39)
Fig. 3 (continued)
48 Image Reconstruction in Dynamic Inverse Problems 1727
T t 2
min L A t f (t, ·) , g(t, ·) + τ θs ds dt + S γ (f0 )
f0 ∈X V
0 0
t→θt ∈V
s.t. ∂t f (t, ·) + ∇f (t, ·), θt Rn = 0.
f (0, ·) = f0 .
In a similar manner, if the group action is the mass-preserving as in (32), then (40)
becomes
T t
2
min L A t f (t, ·) , g(t, ·) + τ θ2 ds dt + S γ (f0 )
f0 ∈X V
0 0
t→θt ∈V
s.t. ∂t f (t, ·) + ∇ · f (t, ·) θt = 0.
f (0, ·) = f0
Data-Driven Approaches
Fig. 3 Spatiotemporal reconstruction using metamorphosis. Top row shows the target image we
seek to recover at 5 (out of 20) selected time points in [0, 1]. Second row shows corresponding
gated tomographic data. Third row shows the reconstruction of the target at these time points
obtained from (37). Fourth and fifth rows show the corresponding shape and photometric
trajectories. Bottom row shows reconstructions assuming a stationary target
1728 A. Hauptmann et al.
Fig. 4 Spatiotemporal reconstruction using LDDMM from gated tomographic data of a heart
phantom obtained by solving (40). The heart phantom is a 120×120 pixel image with gray values in
[0, 1] that is taken from Grenander and Miller (2007). Data is gated 2D parallel beam tomography
where the i:th gate has 20 evenly distributed directions in [(i − 1)π/5, π + (i − 1)π/5]. Data (not
shown) also has additive Gaussian white noise corresponding to a noise level of about 14.9dB.
Bottom row compares outcome at an enlarged region of interest (ROI). The ground truth (bottom
leftmost image) is compared against LDDMM reconstruction (second image from left) and TV
reconstruction (third image from left). The latter is computed assuming a stationary spatiotemporal
target, and corresponding full image is also shown (bottom rightmost). It is clear that the cardiac
wall is better resolved using a spatiotemporal reconstruction method. This is essential in CT
imaging in coronary artery disease
fast to apply. Next, its large model capacity also allows for capturing complicated
temporal evolution that is otherwise difficult to account for in handcrafted models.
Embedding a deep learning model into a spatiotemporal reconstruction method is
however far from straightforward.
Section “Data-Driven Reconstruction Without Temporal Modelling” outlines
how to do this in the context of the reconstruction method in section “Recon-
struction Without Explicit Temporal Models”. The situation is more complicated
for reconstruction methods that use explicit temporal models. These methods
rely on joint optimization of the image and the temporal model, so the latter
needs to be parametrized. Embedding a deep learning-based temporal model is
therefore only feasible if the said parametrization is preserved and most existing
48 Image Reconstruction in Dynamic Inverse Problems 1729
deep learning approaches for temporal modelling of images do not fulfil this
requirement. Section “Learning Deformation Operators” surveys selected deep
learning models for deformations that can be embedded into reconstruction methods
that use a deformable template (section “Reconstruction Using a Deformable
Template”). Finally, section “Learning Motion Models” considers embedding deep
learning-based models into reconstruction methods that use motion models (sec-
tion “Reconstruction Using a Motion Model”).
N
T
ϑ ∈ arg min L(ϑ) where L(ϑ) := X R ϑ t, gi (t, ·) , fi (t, ·) dt.
ϑ∈X i=1 0
(41)
Here, X : X × X → R quantifies goodness-of-fit of images, and t → gi (t, ·) ∈ Y
and t → fi (t, ·) ∈ X for i = 1, . . . , N represent noisy data and corresponding truth
of spatiotemporal image, i.e.,
t → (fi (t, ·), gi (t, ·) ∈ X × Y satisfy (1) for i = 1, . . . , N. (42)
Note here that ϑ ∈ X is the deep neural network parameter that is set during training.
It is not the same as the deformation parameter θ ∈ Θ, which parametrizes the
48 Image Reconstruction in Dynamic Inverse Problems 1731
by computing ϑ ∈ X as
N
ϑ ∈ arg min L(ϑ) where L(ϑ) := X W Λϑ (f i ,I i ) (f0i ), I i . (44)
0
ϑ∈X i=1
N
ϑ ∈ arg min L(ϑ) where L(ϑ) := Θ Λϑ (f0i , I i ), θ i (46)
ϑ∈X i=1
ν ν
W θ (f0 ) := φ0,1 .f0 with φ0,1 ∈ GV as in (34), (47)
and the group action is typically geometric (31) or mass-preserving (32). It is known
that the vector field θ ∈ Θ that registers a template to a target can be computed by
geodesic shooting (see Miller et al. 2006 and Younes 2019, Section 10.6.4). The
registration problem, which is to find θ , thus reduces to finding the initial momenta.
Quicksilver (Yang et al. 2017) trains a deep neural network in the unsupervised
1732 A. Hauptmann et al.
setting (as in (44)) to learn these initial momenta. The network architecture for
Λϑ : X × X → Θ is of convolutional neural network (CNN) type with an encoder
and a decoder. The encoder acts as a feature extraction for both template and target
images. The extracted features are then concatenated and fed into the decoder, which
consists of three independent convolutional networks that predict the momenta for
the three dimensions. To recover from prediction errors, correction networks with
the same architecture are used for predicting the prediction error. Training such a
deep neural network model with entire images is challenging, so Quicksilver only
uses patches of images as input. In this way, relatively few images and ground-truth
momenta result in a large amount of training data. A drawback is that the patches
are extracted from the target, and template and deformation are on the same spatial
grid locations, so the deformed patch in the target is assumed to lie (predominantly)
in the same location as the one in the template image. This assumes the deformation
is relatively small.
Another similar approach is VoxelMorph (Balakrishnan et al. 2019) where
training is performed in an unsupervised manner (as in (44)) with only pairs of
template and morphed image. The output is the displacement field θ ∈ Θ necessary
to register a template against a target, e.g., using an LDDMM-based deformation
operator. VoxelMorph uses CNN architecture similar to U-net for Λϑ : X × X →
Θ that consists of encoder and decoder sections with skip connections. The
unsupervised loss (44) can be complemented by an auxiliary loss that leverages
anatomical segmentations at training time. The trained network can also provide
the registered image, i.e., it offers a deep learning-based registration operator. A
further development of VoxelMorph is FAIM (Kuang and Schmah 2018) that has
fewer trainable parameters (i.e., dimension of ϑ in FIAM is smaller than the one in
VoxelMorph). Authors also claim that FAIM achieves higher registration accuracy
than VoxelMorph, e.g., it produces deformations with many fewer “foldings,” i.e.,
regions of non-invertibility where the surface folds over itself.
One may also learn the spatially adaptive regularizer that is used for defining the
deformation operator (Niethammer et al. 2019). See also Mussabayeva et al. (2019)
for a closely related approach where one learns the regularizer in the LDDMM
framework, which is the Riemannian metric for the group GV in (35).
The above approaches all avoid learning the entire deformation; instead, they
learn a deformation that belongs to a specific class of deformation models. This
makes it possible to embed the learned deformation model in a variational model
for image reconstruction.
The methods mentioned here deals with using deep learning in reconstruction with
a motion model (section “Reconstruction Using a Motion Model”). Many of the
motion models are however sufficient for capturing the desired motion, so the main
motivation with introducing deep learning is to speed up these methods.
48 Image Reconstruction in Dynamic Inverse Problems 1733
In particular, the above means we still aim to solve the penalized variational
formulation (28) with an explicit temporal model, such as the continuity equation
(24). The network then essentially learns to produce the motion field ν(t, ·) from
the time series f (t, ·). Such a network can then be utilized to estimate the motion
field, instead of solving the corresponding subproblem (30) in the alternating
minimization. For instance, one could use neural networks that are designed to
compute the optical flow (Dosovitskiy et al. 2015; Ilg et al. 2017).
Another possibility is to account for the explicit structure of the PDE by using
networks that aim to find a PDE representation for given data (Long et al. 2019).
Alternatively, one may build network architectures based on the discretization of
the underlying equations as motivated in Arridge and Hauptmann (2020). Finally,
similar to the work of joint motion estimation and reconstruction, one can learn a
motion map that is used in a learned reconstructions scheme (Qin et al. 2018).
Note
1 The temporal model is defined by considering a time-dependent deformation parameter. The
deep neural network representing the deformation operator also has parameters, but these are not
the same as the deformation parameter. In particular, the network parameters are set during training.
In contrast, the deformation parameter varies with time.
References
Arguillere, S., Trélat, E., Trouvé, A., Younes, L.: Shape deformation analysis from the optimal
control viewpoint. Journal de Mathématiques Pures et Appliqués 104(1), 139–178 (2015)
Arridge, S., Hauptmann, A.: Networks for nonlinear diffusion problems in imaging. J. Math. Imag.
Vis. 62(3), 471–487 (2020). https://doi.org/10.1007/s10851-019-00901-3
Aviles-Rivero, A.I., Williams, G., Graves, M.J., Schönlieb, C.B.: Compressed sensing plus motion
(CS+M): a new perspective for improving undersampled mr image reconstruction. ArXiv
preprint 1810.10828 (2018)
Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning
framework for deformable medical image registration. IEEE Trans. Med. Imag. 38(8), 1788–
1800 (2019)
Beg, F.M., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via
geodesic flow of diffeomorphisms. Int. J. Comput. Vis. 61(2), 139—157 (2005)
Benning, M., Burger, M.: Modern regularization methods for inverse problems. Acta Numer. 27,
1–111 (2018)
Bertero, M., Lantéri, H., Zanni, L.: Iterative image reconstruction: a point of view. In: Censor,
Y., Jiang, M., Louis, A.K. (eds.) Interdisciplinary Workshop on Mathematical Methods in
Biomedical Imaging and Intensity-Modulated Radiation (IMRT), Pisa, pp. 37–63 (2008)
Bubba, T.A., März, M., Purisha, Z., Lassas, M., Siltanen, S.: Shearlet-based regularization in sparse
dynamic tomography. In: Wavelets and Sparsity XVII, vol. 10394, p. 103940Y. International
Society for Optics and Photonics, Bellinghams (2017)
Burger, M., Dirks, H., Frerking, L., Hauptmann, A., Helin, T., Siltanen, S.: A variational
reconstruction method for undersampled dynamic x-ray tomography based on physical motion
models. Inverse Probl. 33(12), 124008 (2017)
Burger, M., Dirks, H., Schönlieb, C.B.: A variational model for joint motion estimation and image
reconstruction. SIAM J. Imag. Sci. 11(1), 94–128 (2018)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications
to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)
Charon, N., Charlier, B., Trouvé, A.: Metamorphoses of functional shapes in Sobolev spaces.
Found. Comput. Math. 18(6), 1535–1596 (2018). https://doi.org/10.1007/s10208-018-9374-3
Chen, C., Öktem, O.: Indirect image registration with large diffeomorphic deformations. SIAM J.
Imag. Sci. 11(1), 575–617 (2018)
Chen, B., Abascal, J., Soleimani, M.: Extended joint sparsity reconstruction for spatial and
temporal ERT imaging. Sensors 18(11), 4014 (2018)
Chen, C., Gris, B., Öktem, O.: A new variational model for joint image reconstruction and motion
estimation in spatiotemporal imaging. SIAM J. Imag. Sci. 12(4), 1686–1719 (2019)
De Schryver, T., Dierick, M., Heyndrickx, M., Van Stappen, J., Boone, M.A., Van Hoorebeke, L.,
Boone, M.N.: Motion compensated micro-CT reconstruction for in-situ analysis of dynamic
processes. Sci. Rep. 8, 7655 (10pp) (2018)
48 Image Reconstruction in Dynamic Inverse Problems 1735
Dirks, H.: Variational methods for joint motion estimation and image reconstruction. Phd thesis,
Institute for Computational and Applied Mathematics, University of Münster (2015)
Djurabekova, N., Goldberg, A., Hauptmann, A., Hawkes, D., Long, G., Lucka, F., Betcke,
M.: Application of proximal alternating linearized minimization (PALM) and inertial PALM
to dynamic 3D CT. In: 15th International Meeting on Fully Three-Dimensional Image
Reconstruction in Radiology and Nuclear Medicine, vol. 11072, p. 1107208. International
Society for Optics and Photonics, Bellingham (2019)
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P.,
Cremers, D., Brox, T.: Flownet: learning optical flow with convolutional networks. In: IEEE
International Conference on Computer Vision, pp. 2758–2766 (2015)
Feng, L., Grimm, R., Block, K.T., Chandarana, H., Kim, S., Xu, J., Axel, L., Sodickson, D.K.,
Otazo, R.: Golden-angle radial sparse parallel MRI: combination of compressed sensing,
parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric
MRI. Magn. Reson. Med. 72(3), 707–717 (2014)
Frerking, L.: Variational methods for direct and indirect tracking in dynamic imaging. Phd thesis,
Institute for Computational and Applied Mathematics, University of Münsternster (2016)
Fu, Y., Lei, Y., Wang, T., Curran, W.J., Liu, T., Yang, X.: Deep learning in medical image
registration: a review. ArXiv preprint 1912.12318 (2019)
Glover, G.H.: Overview of functional magnetic resonance imaging. Neurosurg. Clin. 22(2), 133–
139 (2011)
Grasmair, M.: Generalized Bregman distances and convergence rates for non-convex regularization
methods. Inverse Probl. 26(11), 115014 (2010)
Grenander, U., Miller, M.: Pattern Theory. From Representation to Inference. Oxford University
Press, Oxford (2007)
Gris, B., Chen, C., Öktem, O.: Image reconstruction through metamorphosis. Inverse Probl. 36(2),
025001 (27pp) (2020)
Hakkarainen, J., Purisha, Z., Solonen, A., Siltanen, S.: Undersampled dynamic x-ray tomography
with dimension reduction kalman filter. IEEE Trans. Comput. Imag. 5(3), 492–501 (2019).
https://doi.org/10.1109/TCI.2019.2896527
Haskins G., Kruger, U., Yan, P.: Deep learning in medical image registration: a survey. Mach. Vis.
Appl. 31(8) (2020)
Hauptmann, A., Arridge, S., Lucka, F., Muthurangu, V., Steeden, S.A.: Real-time cardiovascular
mr with spatio-temporal artifact suppression using deep learning–proof of concept in congenital
heart disease. Magn. Reson. Med. 81(2), 1143–1156 (2019)
Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artif. Intell. 17(1–3), 185–203 (1981)
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: Flownet 2.0: evolution of
optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2462–2470 (2017)
Kofler, A., Dewey, M., Schaeffter, T., Wald, C., Kolbitsch, C.: Spatio-temporal deep learning-based
undersampling artefact reduction for 2D radial cine MRI with limited training data. IEEE Trans.
Med. Imag. 39(3), 703–717 (2019). https://doi.org/10.1109/TMI.2019.2930318
Kuang, D., Schmah, T.: FAIM – a ConvNet method for unsupervised 3D medical image
registration. ArXiv preprint 1811.09243 (2018)
Kushnarev, S., Qiu, A., Younes, L. (eds.): Mathematics of Shapes and Applications. World
Scientific, Singapore (2020)
Kwong, Y., Mel, A.O., Wheeler, G., Troupis, J.M.: Four-dimensional computed tomography
(4DCT): a review of the current status and applications. J. Med. Imag. Radiat. Oncol. 59(5),
545–554 (2015)
Lang, L.F., Dutta, N., Scarpa, E., Sanson, B., Schönlieb, C.B., Étienne, J.: Joint motion estimation
and source identification using convective regularisation with an application to the analysis of
laser nanoablations. bioRxiv 686261 (2019a)
Lang, L.F., Neumayer, S., Öktem, O., Schönlieb, C.B.: Template-based image reconstruction from
sparse tomographic data. Appl. Math. Optim. (2019b). https://doi.org/10.1007/s00245-019-
09573-2
1736 A. Hauptmann et al.
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak,
J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image
analysis. Med. Image Anal. 42, 60–88 (2017)
Liu, J., Aviles-Rivero, A.I., Ji, H., Schönlieb, C.B.: Rethinking medical image reconstruction via
shape prior, going deeper and faster: deep joint indirect registration and reconstruction. To
appear in Medical Image Analysis, preprint on arxiv 1912.07648 (2019)
Long, Z., Lu, Y., Dong, B.: Pde-net 2.0: learning pdes from data with a numeric-symbolic hybrid
deep network. J. Comput. Phys. 399, 108925 (2019)
Lucka, F., Huynh, N., Betcke, M., Zhang, E., Beard, P., Cox, B., Arridge, S.: Enhancing
compressed sensing 4D photoacoustic tomography by simultaneous motion estimation. SIAM
J. Imag. Sci. 11(4), 2224–2253 (2018)
Lustig, M., Santos, J.M., Donoho, D.L., Pauly, J.M.: kt SPARSE: high frame rate dynamic MRI
exploiting spatio-temporal sparsity. In: 13th Annual Meeting of ISMRM, Seattle, vol. 2420
(2006)
Miller, M.I., Trouvé, A., Younes, L.: Geodesic shooting for computational anatomy. J. Math. Imag.
Vis. 24(2), 209—228 (2006)
Mokso, R., Schwyn, D.A., Walker, S.M., Doube, M., Wicklein, M., Müller, T., Stampanoni, M.,
Taylor, G.K., Krapp, H.G.: Four-dimensional in vivo x-ray microscopy with projection-guided
gating. Sci. Rep. 5, 8727 (6pp) (2014)
Mussabayeva, A., Pisov, M., Kurmukov, A., Kroshnin, A., Denisova, Y., Shen, L., Cong, S., Wang,
L., Gutman, B.: Diffeomorphic metric learning and template optimization for registration-based
predictive models. In: Zhu, D., Yan, J., Huang, H., Shen, L., Thompson, P.M., Westin, C.F.,
Pennec, X., Joshi, S., Nielsen, M., Fletcher, T., Durrleman, S., Sommer, S. (eds.) Multimodal
Brain Image Analysis and Mathematical Foundations of Computational Anatomy (MBIA
2019/MFCA 2019). Lecture Notes in Computer Science, vol. 11846, pp. 151–161. Springer
Nature Switzerland, Cham (2019)
Niemi, E., Lassas, M., Kallonen, A., Harhanen, L., Hämäläinen, K., Siltanen, S.: Dynamic multi-
source x-ray tomography using a spacetime level set method. J. Comput. Phys. 291, 218–237
(2015)
Niethammer, M., Kwitt, R., Vialard, F.X.: Metric learning for image registration. In: Computer
Vision and Pattern Recognition (CVPR 2019) (2019)
Pennec, X., Sommer, S., Fletcher, T. (eds.): Riemannian Geometric Statistics in Medical Image
Analysis. Academic Press, Cambridge (2020)
Pouchol, C., Verdier, O., Öktem, O.: Spatiotemporal PET reconstruction using ML-EM with
learned diffeomorphic deformation. In: Knoll, F., Maier, A., Rueckert, D., Ye, J.C. (eds.)
Machine Learning for Medical Image Reconstruction. Second International Workshop, MLMIR
2019, Held in Conjunction with MICCAI 2019. Lecture Notes in Computer Science, vol. 11905,
pp. 151–162. Springer (2019). Selected for oral presentation
Qin, C., Bai, W., Schlemper, J., Petersen, S.E., Piechnik, S.K., Neubauer, S., Rueckert, D.:
Joint learning of motion estimation and segmentation for cardiac mr image sequences. In:
International Conference on Medical Image Computing and Computer-Assisted Intervention,
pp. 472–480. Springer (2018)
Rahmim, A., Lodge, M.A., Karakatsanis, N.A., Panin, V.Y., Zhou, Y., McMillan, A., Cho, S.,
Zaidi, H., Casey, M.E., Wahl, R.L.: Dynamic whole-body PET imaging: principles, potentials
and applications. Eur. J. Nucl. Med. Mol. Imag. 46, 501–518 (2019)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys.
D: Nonlinear Phenom. 60(1–4), 259–268 (1992)
Ruhlandt, A., Töpperwien, M., Krenkel, M., Mokso, R., Salditt, T.: Four dimensional material
movies: high speed phase-contrast tomography by backprojection along dynamically curved
paths. Sci. Rep. 7, 6487 (9pp) (2017)
Salman, H., Yadollahpour, P., Fletcher, T., Batmanghelich, K.: Deep diffeomorphic normalizing
flows. ArXiv preprint 1810.03256 (2018)
Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in
Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2009)
48 Image Reconstruction in Dynamic Inverse Problems 1737
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D.: A deep cascade of convolu-
tional neural networks for dynamic mr image reconstruction. IEEE Trans. Med. Imag. 37(2),
491–503 (2017)
Schmitt, U., Louis, A.K.: Efficient algorithms for the regularization of dynamic inverse problems:
I. Theory. Inverse Probl. 18(3), 645 (2002)
Schmitt, U., Louis, A.K., Wolters, C., Vauhkonen, M.: Efficient algorithms for the regularization
of dynamic inverse problems: II. Applications. Inverse Probl. 18(3), 659 (2002)
Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Ann. Rev. Biomed. Eng.
19, 221–248 (2017)
Steeden, J.A., Kowalik, G.T., Tann, O., Hughes, M., Mortensen, K.H., Muthurangu, V.: Real-
time assessment of right and left ventricular volumes and function in children using high
spatiotemporal resolution spiral bssfp with compressed sensing. J. Cardiovasc. Magn. Reson.
20(1), 79 (2018)
Trouvé, A., Younes, L.: Metamorphoses through Lie group action. Found. Comput. Math. 5(2),
173–198 (2005)
Trouvé, A., Younes, L.: Shape spaces. In: Otmar, S. (ed.) Handbook of Mathematical Methods in
Imaging, pp. 1759–1817. Springer, New York (2015)
Yang, G., Hipwell, J.H., Hawkes, D.J., Arridge, S.R.: Numerical methods for coupled reconstruc-
tion and registration in digital breast tomosynthesis. Ann. Br. Mach. Vis. Assoc. 2013(9), 1–38
(2013)
Yang, X., Kwitt, R., Styner, M., Niethammer, M.: Quicksilver: fast predictive image registration–a
deep learning approach. NeuroImage 158, 378–396 (2017)
Younes, L.: Shapes and Diffeomorphisms. Applied Mathematical Sciences, vol. 171, 2nd edn.
Springer, Heidelberg (2019)
Computational Conformal Geometric
Methods for Vision 49
Na Lei, Feng Luo, Shing-Tung Yau, and Xianfeng Gu
Contents
Fundamental Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1742
Riemann Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1742
Conformal Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1743
Uniformization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1746
Quasi-conformal Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1746
Holomorphic Quadratic Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1749
Teichmüller Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1750
Teichmüller Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1751
Ricci Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1752
Computational Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1754
Concepts in Discrete Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1755
A Discrete Conformal Geometry of Polyhedral Surfaces Derived from
Vertex Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1758
A Discrete Conformal Geometry of Polyhedral Surfaces Derived from
Circle Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1764
Harmonic Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1767
Hodge Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1770
Direct Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1772
N. Lei
Dalian University of Technology, Dalian, China
e-mail: [email protected]
F. Luo
Rutgers University, Piscataway, NJ, USA
e-mail: [email protected]
S.-T. Yau
Harvard University, Cambridge, MA, USA
e-mail: [email protected]
X. Gu ()
Stony Brook University, Stony Brook, NY, USA
e-mail: [email protected]
Abstract
Keywords
Conformal geometry has deep roots in pure mathematics fields, such as Riemann
surfaces, complex analysis, differential geometry, algebraic topology, partial differ-
ential equations, and others. Historically, conformal geometry has been broadly used
in many engineering applications (Bobenko et al. 2015), such as electromagnetics,
vibrating membranes, acoustics, elasticity, heat transfer, and fluid flow. Most of
these applications depend on conformal mappings between planar domains.
Recently, with the rapid development of 3D scanning and medical imaging
technologies, 3D geometric data has become ubiquitous. Figure 1 shows a human
facial surface acquired using a scanning system based on structured light. The
system can capture dynamic geometric data with very high spacial resolution and
scanning speed. It is challenging to process the huge amount of this geometric
data with high accuracy and efficiency. The challenge can be tackled using various
geometric theories. Compared to topology or Riemannian geometry, conformal
49 Computational Conformal Geometric Methods for Vision 1741
Fig. 1 3D human facial surface data scanned using structured light technology
geometry better fits this purpose because conformal structure has much richer
information than topological structure and conformal mappings are much more
flexible than isometries.
With the increase of computational power and further advances in mathematical
theories, computational conformal geometry emerges as an interdisciplinary field,
bridging mathematics and computer science. Computational conformal geometric
theories and algorithms have been generalized from planar domains to surfaces with
arbitrary topologies and have been applied to many engineering and medical fields.
This paper is not intended to be an overview of the field and will mainly focus on
our contributions to the field. Many important works have not been touched upon
and many references are missing. More details can be found, for instance, in Gu and
Yau (2007, 2020).
Essentially, conformal geometry focuses more on surface conformal structures
and conformal mappings, which are limited. In practice, most mappings are not
conformal. Fortunately, quasi-conformal geometry studies much more broad range
of mappings (quasi-conformal mappings), which model most homeomorphisms in
reality. From computational point of view, quasi-conformal mappings are converted
to conformal ones under special metric transformations and therefore can be
achieved using the same techniques in conformal geometry. In the following, we
introduce the concepts, theorems, and computational methods in both conformal
geometry and quasi-conformal geometry.
1742 N. Lei et al.
Fundamental Concepts
Riemann Surfaces
Uj Ui
ϕj ϕi
ϕi ◦ϕ−1
j
the notions of harmonic functions, and more generally, harmonic and holomorphic
differentials, are well defined on a Riemann surface.
Almost every surface we encounter is a Riemann surface. For instance, every
open set in the plane is a Riemann surface. In fact, complex analysis that we learn
in undergraduate and graduate courses is the Riemann surface theory on open sets
in the plane C. Furthermore, every oriented smooth surface S with a Riemannian
metric g is naturally a Riemann surface – the complex structure on S is induced by
the Riemannian metric g, and the notion of angle defined by the complex structure
coincides. This was first observed by C. F. Gauss for the case of real analytic
Riemannian metrics. He showed that at each point p ∈ S, one can find a coordinate
chart (U, ϕ) such that ϕ : (U, g) → (R2 , dx 2 + dy 2 ) is an angle-preserving smooth
embedding. These coordinate charts (U, ϕ) are called the isothermal coordinates. In
particular, all smooth oriented surfaces in 3-space are naturally Riemann surfaces.
Another class of Riemann surfaces comes from algebraic geometry. Namely, an
algebraic curve in C2 , i.e., a surface defined by a polynomial equation p(z, w) = 0,
is naturally a Riemann surface where coordinate charts are derived from the implicit
function theorem.
Conformal Maps
The natural correspondences between Riemann surfaces are those bijections that
preserve angles. We call them conformal maps. From complex analysis, we know
that holomorphic maps are angle preserving (away from singularities). Thus,
conformal maps can be considered as generalizations of injective holomorphic
maps. A prominent example of a conformal map is the stereographic map from
the unit sphere to the plane.
Conformal maps can be characterized as those smooth maps which preserve
infinitesimal circles. In Fig. 3, two diffeomorphisms map a female facial surface
to the planar unit disk. The top row shows a conformal mapping, which maps the
infinitesimal circles on the face to the infinitesimal circles on the disk. In contrast,
the bottom illustrates a general diffeomorphism which maps infinitesimally ellipses
to circles and vice versa. If the eccentricities of the ellipses (the ratio between the
major axis and the minor axis) are uniformly bounded, then the mapping is called a
quasi-conformal map.
Equivalently, a conformal map preserves local shapes; namely, locally it is a
scaling transformation followed by a rotation, where the scaling factor varies from
point to point. This is illustrated in Fig. 4. The head surface of the Michelangelo’s
David sculpture is conformally flattened onto a planar rectangle. The complicated
curved surface becomes a planar sheet under this conformal map. From the shading,
one can see that the complicated local geometric shapes, such as the eyes, ears, and
curly hair, are well recognizable on the plane. We can identify the major geometric
features from their planar images.
In engineering applications, the distortions of mappings are classified into two
categories, angle distortion and area distortion. It is always desirable to find
1744 N. Lei et al.
Fig. 3 Top row: conformal mapping transforms infinitesimal circles to infinitesimal circles;
Bottom row: general diffeomorphism maps infinitesimal ellipses to infinitesimal circles
to the planar disk that preserves both angle and area. But it is possible to pursue
either a mapping without angle distortion or a mapping without area distortion or a
mapping with a good balance between angle and area distortions.
Uniformization
The famous Riemann mapping theorem classifies simply connected planar domains
up to conformal diffeomorphism. Can one classify all connected Riemann surfaces
up to conformal diffeomorphisms? This classification is achieved by the remarkable
uniformization theorem of Poincaré and Koebe proved in 1907. It states that every
simply connected Riemann surface is conformally diffeomorphic to the 2-sphere
S2 , the plane E2 , or the open unit disc H2 , as shown in Fig. 6. Using covering
space theory, the uniformization theorem implies that every connected oriented
surface with a Riemannian metric (S, g) is conformally diffeomorphic to one of
three canonical models of surfaces: (i) the unit sphere S2 ; (ii) a flat torus E2 / , or
E2 , or E2 − {0}; or (iii) a hyperbolic surface H2 / where is a discrete torsion-free
subgroup of isometries of the hyperbolic plane H2 . Equivalently, the uniformization
theorem states that for any connected Riemannian surface (S, g) there exists a
real-valued function, λ : S → R, such that the conformal Riemannian metric
eλ g is a complete Riemannian metric of constant Gaussian curvature 1, 0, or −1.
The three curvatures correspond to the three cases (i), (ii), and (iii) above. The
uniformization theorem also holds for compact surfaces with boundaries. As shown
in Fig. 7, Riemannian metric surfaces with boundaries can be conformally mapped
to the canonical surfaces with constant curvatures with a finite number of geodesic
disks removed. We remark that there is still a famous open problem on conformal
classification of planar domains. In 1910, P. Koebe conjectured that every connected
open set in the plane is conformally diffeomorphic to a new domain whose boundary
components are either round circles or points.
The uniformization theorem plays a fundamental role for applications in engi-
neering and medical imaging. It sorts all kinds of shapes in the real physical
world to only three canonical types. If one can develop an algorithm that can
handle the canonical type surfaces, then the algorithm can process all shapes via
uniformization. This greatly simplifies the algorithmic design task for engineers.
Quasi-conformal Maps
∂ϕ(z) ∂ϕ(z)
= μ(z) , (1)
∂ z̄ ∂z
where ∂z = 1/2(∂x − i∂y ) and ∂z̄ = 1/2(∂x + i∂y ). The dilatation of ϕ is defined as
49 Computational Conformal Geometric Methods for Vision 1747
1 + |μϕ |
Kϕ = . (2)
1 − |μϕ |
1 + μϕ ∞
K(ϕ) := . (3)
1 − μϕ ∞
Equation 1 is called the Beltrami equation. An important theorem says that given
the Beltrami coefficient μ, one can solve the Beltrami equation Eq. 1 in ϕ. More
precisely, the measurable Riemann mapping theorem says that given a measurable
complex function μ : D → C, such that μ∞ < 1, then there exists a quasi-
conformal homeomorphism ϕ : D → D satisfying the Beltrami equation Eq. 1.
Furthermore, two such solutions differ by a Möbius transformation,
z − z0
z → eiθ , θ ∈ [0, 2π ), |z0 | < 1. (4)
1 − z̄0 z
∂wj ∂wj
d z̄i = μ(zi ) dzi .
∂ z̄i ∂zi
Note that the definition shows μ(zi )d z̄i /dzi is invariant under the coordinate tran-
sitions and thus is globally defined. The K-quasi-conformal map and its associated
Beltrami differential can be generalized to the Riemann surface cases directly. For
instance, any C 1 -smooth diffeomorphism between two compact Riemann surfaces
is a quasi-conformal map.
The complex linear space of all the holomorphic one-forms on a closed Riemann
surface of genus g is g dimensional. The space is isomorphic to the cohomology
group of the surface H 1 (S, R). For a quadratic differential ω = φ(z)dz2 on a
Riemann surface S, its L1 norm or its area is
ωL1 = |ω| = |φ(z)| |dz|2 .
S S
S = C ∪ {∞} − {a1 , a2 , · · · , an },
then every integrable holomorphic quadratic differential has the form ϕ(z)dz2 ,
where
n
ρk
ϕ(z) = ,
z − ak
i=1
such that
1750 N. Lei et al.
n
n
n
ρk = 0, ρk ak = 0, ρk ak2 = 0.
k=1 i=1 i=1
Teichmüller Map
K(f ) ≤ K(h),
ϕ̄
μf = k
|ϕ|
for some 0 ≤ k < 1 and quadratic differential ϕL1 < ∞. The maximal dilatation
of μf is μf ∞ = k which is equal to the dilatation |μf (z)| at each point z. This
means the infinitesimal ellipses have the same eccentricity everywhere except at
zeros of ϕ. Furthermore, it is known that if the Beltrami differential of a quasi-
ϕ̄
conformal map f : S1 → S2 is of the form k |ϕ| for some nonzero quadratic
differential ϕ, then f is an extremal quasi-conformal map.
On the target surface S2 , there is a corresponding holomorphic quadratic
differential η, such that the Teichmüller map f maps the horizontal trajectories of
ϕ to the horizontal trajectories of η, the vertical trajectories of ϕ to the vertical
trajectories of η, and the zeros of ϕ to the zeros of η. Furthermore, suppose x + iy is
the natural coordinates of ϕ, u+iv the natural coordinates of η, then the Teichmüller
map f has the local representation: x + iy → u + iv,
u 1+k 0 x
= ,
v 0 1−k y
Teichmüller Space
Metric surfaces with same topology can be further classified by conformal equiv-
alence. If there is a conformal map between the two surfaces, then the surfaces
are conformal equivalent. All the conformal equivalence classes form a finite
dimensional manifold, the so-called Teichmüller space, which also admits a natural
Riemannian metric, the Weil-Petersson metric. Therefore, we can use the Teich-
müller space as the model of shape space and measure the distances among shapes.
Let S be an orientable smooth surface; the Teichmüller space T (S) of S is
the space of Riemann surface structures on S up to isotopy. More precisely, two
conformal structures X and Y on S are said to be Teichmüller equivalent, if there is
a diffeomorphism f , such that f is isotopic to the identity of S and f : (S, X) →
(S, Y ) is conformal. The T (S) is the space of equivalence classes of conformal
structures on S modulo this relation. Suppose S is a punctured Riemann surface of
genus g > 1 surface with n punctures, then T (S) is of 6g − 6 + 3n dimension.
1752 N. Lei et al.
1
dT (S) ([X], [Y ]) := log K(f ).
2
Ricci Flow
∂ ∂
vk = ξ1k + ξ2k , k = 1, 2,
∂x1 ∂x2
then
g11 g12 ξ12
v1 , v2 g = [ ξ11 ξ21 ] .
g21 g22 ξ22
Conformal Map Suppose g1 and g2 are two Riemannian metrics on S, we say they
are conformal equivalent, if there is a function u : S → R, such that
∗ ∂u/∂x ∂v/∂x h11 h12 ∂u/∂x ∂u/∂y
f h= .
∂u/∂y ∂v/∂y h21 h22 ∂v/∂x ∂v/∂y
g = e2u(x,y) (dx 2 + dy 2 ).
1 1 ∂2 ∂2
K(x, y) = −g u(x, y) = − u(x, y) = − 2
+ 2 u(x, y).
e2u(x,y) e2u(x,y) ∂x ∂y
Surface Ricci Flow Most geometric problems in engineering and medical appli-
cations can be reduced to find an appropriate Riemannian metric with required
curvature. Surface Ricci flow is a powerful tool for this purpose. Intuitively, surface
Ricci flow deforms the Riemannian metric proportional to the current curvature,
such that the curvature evolves according to a diffusion-reaction process. If the
diffusion component dominates, the curvature will converge to a constant. This gives
us the uniformization metric. Hamilton’s surface Ricci flow is defined as follows:
1754 N. Lei et al.
∂g(x, t)
= −2K(x, t)g(x, t), (7)
∂t
∂K(x, t)
= g(t) K(x, t) + K 2 (x, t). (8)
∂t
where A(0) is the total surface area at time 0 and χ (S) is the Euler characteristic
number of S. Surface Ricci flow deforms the Riemannian metric conformally; hence
the conformal factor equation can be written down as
∂u(x, t) 2π χ (S)
= − K(x, t). (10)
∂t A(0)
Computational Methods
Fig. 11 Geometric approximation using Riemann mapping and normal cycle. (a) Original surface.
(b) Conformal mapping. (c) 2k samples. (d) 8k samples
vk vk vk
θk θk θk
θi θj θi θj θi θj
vi vj vi vj
vi 2
vj 2
E2 H S
Discrete Curvature Let V (T) be the set of all vertices in the triangulation T. At
each vertex v ∈ V (T), the discrete curvature Kd (v) of d is
jk
2π − j k θi vi ∈
∂
K(vi ) = jk (15)
π− j k θi vi ∈ ∂
jk
where θi is the corner angle at vi in the triangle face [vi , vj , vj ], as shown in
Fig. 13. The discrete curvature satisfies the Gauss-Bonnet theorem,
K(v) + kA(S) = 2π χ (S), (16)
v∈T
The table below summarizes common smooth notions and their discrete counter-
parts.
a v b
K (v)
r(v1 ) r(v2 )
v1 v2
l(v1 , v2 )
includes these two as special cases, was proposed by Glickenstein (2011). Both of
these definitions were motivated by the seminal work of R. Hamilton on Ricci flows
for smooth Riemannian manifolds.
Vertex Scaling Given two PL metrics on a triangulated surface (, T) whose edge
ˆ we say and ˆ are related by a vertex scaling (Luo
length functions are and ,
2004; Roček and Williams 1981), written as ˆ = u ∗ , if there exists a function
u : V (T) → R such that for each edge e with end points v1 , v2 ,
Here dg is the Riemannian distance associated with the Riemannian metric g, i.e.,
dg (x, y) is the infimum of the lengths of all paths joining x to y. The above estimate
holds the key for showing that discrete conformal maps defined using (17) converge
to the smooth case.
Variational Principle The definition of vertex scaling in Eq. (17) carries a natural
variational principle relating a PL metric to its discrete curvature (Luo 2004).
49 Computational Conformal Geometric Methods for Vision 1759
Fix a Euclidean triangle [vi , vj , vk ], with edge lengths li , lj , lk and corner angles
θi , θj , θk . Let be the new triangle whose edge lengths are euj +uk li , and then the
Jacobian matrix is symmetric and negative semi-definite:
⎡ ∂θi ∂θi ∂θi
⎤ ⎡ ⎤
⎢ ∂ui ∂uj ∂uk
⎥ ⎢ − cot θk − cot θj cot θk cot θj
⎥
⎢ ∂θj ∂θj ∂θj ⎥=⎣ cot θk − cot θk − cot θi cot θi ⎦.
⎣ ∂ui ∂uj ∂uk ⎦
∂θk ∂θk ∂θk cot θj cot θi − cot θj − cot θi
∂ui ∂uj ∂uk
(18)
In particular, the locally concave function
u
F (ui , uj , uk ) = θi dui + θj duj + θk duk
0
∇F (ui , uj , uk ) = (θi , θj , θk )T .
Note that discrete curvature is built from the inner angles θi ’s. The above formula
relates a PL metric u ∗ and its discrete curvature. The explicit form of the function
F was found in the work of Bobenko-Pinkall-Springborn (Bobenko et al. 2015).
They showed that F can be extended to a concave function on R3 and is related
to the three-dimensional hyperbolic volume of ideal tetrahedra and is expressed in
terms of the Lobachevsky function (i.e., dilogarithm).
Discrete Yamabe Flow A basic goal in geometry is to find the relationship between
the metric and its curvature. In the discrete setting, it translates into the following
questions.
Obviously the function K̂ must satisfy the Gauss-Bonnet condition in Eq. 16. If such
a function u exists, then any other function that differs from u by a constant is also
a solution of the problem.
These questions, together with Hamilton’s Ricci flow, led to the introduction of
the discrete Yamabe flow (Luo 2004):
du(t)
(v) = K̂(v) − Ku∗ (v). (19)
dt
1760 N. Lei et al.
The variational principle associated with (17) shows that the flow is the gradient
flow of the locally concave discrete energy
u
E(u) = (K̂(v) − Ku∗ (v))du(v), (20)
0 v∈V (T)
∂ 2 E(u) ∂ 2 E(u)
= −wij , = wij , (22)
∂ui ∂uj ∂u2i k=j
where wij is the cotangent edge weight: suppose two corner angles against edge
[vi , vj ] are θk and θl ,
1
wij := (cot θk + cot θl ). (23)
2
It is proved in Bobenko et al. (2015) that the solution to the equation Ku∗ = K̂
is unique in u up to the addition of a constant function. However, the existence of u,
even if one assumes the Gauss-Bonnet condition on K̂, is in general false, and the
discrete Yamabe flow develops singularities in finite time.
Dynamic Yamabe Flow The drawback of (17) is that it depends on the choices of
the triangulation T. Recall that a marked surface is a pair (, V ) where V is a finite
set in S. A PL metric on (, V ) is a PL metric on S such that its conical singularities
are contained in V . By a triangulation T of (, V ), we mean a triangulation of
such that V (T) = V .
Suppose d1 and d2 are two PL metrics on a marked surface (,V) and T and
T are two triangulations of (, V ). Let k and k be the associated edge length
functions of dk for T and T , k = 1, 2. As shown in the following diagram, where
k : (T, dk ) → (T , dk ) are isometries.
49 Computational Conformal Geometric Methods for Vision 1761
An affirmative answer would imply that the vertex scaling operator in Eq. (17) is
independent of the choice of triangulations. Unfortunately, the answer is negative in
general. However, the condition 2 = w ∗ 1 does hold for some w if we assume all
triangulations T and T are Delaunay in dk for k = 1, 2. This is proved in Gu et al.
(2018b). Recall that a Delaunay triangulation of a polyhedral surface is a geometric
triangulation such that the sum of two angles facing each edge is at most π . Given a
PL metric d on a marked surface (S, V ), there is always a Delaunay triangulation of
(, V , d) whose vertex set is V . Generically, Delaunay triangulation on (, V , d)
is unique. However, non-uniqueness occurs when the sum of the two angles facing
an edge e is π . In this case, consider the quadrilateral Q formed by the two triangles
adjacent to e and replace the diagonal e in Q by the other diagonal. The resulting
triangulation is still Delaunay and the operation is called an edge flip. Note that edge
flip does not change the underlying PL metric, but only the combinatorics.
A triangulation-independent definition of discrete conformal equivalence of
PL metrics on a marked surface (, V ) was introduced in Gu et al. (2018b)
by modifying vertex scaling in Eq. (17) and adding the Delaunay condition on
triangulations.
1. T1 = T2 and their associated edge length functions d1 and d2 differ by a vertex
scaling Eq. (17) on T1 .
2. d1 = d2 and T1 differ from T2 by an edge flip.
It turns out this is the correct notion to solve both the existence and uniqueness
questions.
η
vj
a
pk b
pl
vk
b vl
vj vk vi vl a
E2 vi
2
H
Fig. 14 Conversion from Euclidean geometry to hyperbolic geometry
We remark that the above theorem for the case of the torus was first proved by F.
Fillastre in a different context.
aa
Cr([vi , vj ]) = .
bb
as shown in Fig. 14 left frame, where we use the upper half plane model for
the hyperbolic plane. By isometrically gluing the ideal triangles with the shear
coordinates, we obtain a complete finite area hyperbolic metric dh on the punctured
surface − V . We denote this conversion as dh = η(d).
49 Computational Conformal Geometric Methods for Vision 1763
∼
{S, V , d1 } {S, V , d2 }
η1 η2
=
{S, V , dh,1 } {S, V , dh,2 }
It can be shown that the metric dh is independent of the choice of the Delaunay
triangulations and two PL metrics d1 and d2 are discrete conformal if and only
if their associated hyperbolic metrics are isometric. Namely, the above diagram
commutes. For more details, see Gu et al. (2018a,b).
yk lk
sinh = eui sinh euj (25)
2 2
This was introduced in Bobenko et al. (2015). The Yamabe energy is defined
similarly
1764 N. Lei et al.
u
E(u) = (K̄ − K(v))du(v). (26)
v∈
The Hessian matrix of the Yamabe energy can be derived from one face case,
⎡ ⎤ ⎡ ⎤⎡ ⎤⎡ S1 S1
⎤⎡ ⎤
dθ1 S1 0 0 −1 cos θ3 cos θ2 ⎢ 0 C1 +1 C1 +1 ⎥ ⎢ du1 ⎥
⎢ ⎥ −1 ⎢ ⎥⎢ ⎥
⎣ 0 S2 0 ⎦ ⎣ cos θ3 −1 cos θ1 ⎦ ⎢ ⎥ ⎣ du2 ⎦
S2
⎣ dθ2 ⎦ = ⎣ C2 +1 0 C2S+1 2
⎦
A
dθ3 0 0 S3 cos θ2 cos θ1 −1 S3
C +1
S3
C3 +1 0 du3
3
(28)
where Sk = sinh yk and Ck = cosh yk .
The edge flip operation used in the above discrete conformal equivalence relation
has created computational complications. There is a more robust, triangulation-
dependent discrete curvature flow which one can use to find PL metrics with
the targeted curvatures. The basic idea comes from W. Thurston’s work on circle
packing and Hamilton’s work on Ricci flow. Unlike the previous conformal
equivalence which is derived from discretizing the conformal Riemannian metric
eu g, this new discretization focuses on the infinitesimal circle-preserving property of
the conformal maps. The associated finite dimensional variational principle was first
established by Colin de de Verdière (1991) in the tangential case and in the general
case in Chow and Luo (2003). Based on the variational principle, the work Chow
and Luo (2003) introduced discrete Ricci flow (30) on surfaces and established its
basic properties. Algorithmic details can be found in Luo et al. (2007) and Zeng
and Gu (2013).
Here are some mathematical details. Given a triangulated surface (S, T) and
an assignment of edge weight : E(T) → [0, π ) (measuring the intersection
angles of circles), a circle packing metric is a function, called radius assignment,
r : V (T) → R>0 such that the associated length function
l(v1 v2 ) = r(v1 )2 + r(v2 )2 + 2r(v1 )r(v2 ) cos((v1 v2 )) (29)
produces a PL metric on (S, T), i.e., satisfies the triangular inequality l(ei )+l(ej ) >
l(ek ) for every triple of edges {ei , ej , ek } belonging to a triangle in T. Thurston
proved that if (E(T)) ⊂ [0, π/2] (see Fig. 13b), then the triangle inequality
always holds for all choices of r ∈ RV (T) (see Thurston 1997). The discrete
49 Computational Conformal Geometric Methods for Vision 1765
dr(t)
(v) = −2(Kr (v) − K̂(v))r(t)(v) (30)
dt
is the gradient flow of a concave function (namely, W). From this fact, many of the
basic properties, including long time existence, of discrete Ricci flow follow. The
flow is robust and algorithmically effective if (E(T)) ⊂ [0, π/2].
Discrete Ricci flow does not work well if one (e) lies outside of the interval
[0, π/2]. This is one of the drawbacks of the flow for real-world applications. Many
polyhedral surfaces produced by digital media cannot be expressed as circle packing
metrics such that (E(T)) ⊂ [0, /2]. Modifications of the triangular meshes are
needed to achieve the condition (E(T)) ⊂ [0, /2].
The convergence of circle packing metrics on bounded simply connected
domains to the Riemann mapping was first conjectured by Thurston in 1985 and
proved in a celebrated paper by Rodin-Sullivan in (1987). However, convergence
questions for nonplanar surfaces remain open.
Below are some examples of discrete Ricci flows. Figure 19 shows one example
of computing the extremal length of a topological quadrilateral using Ricci flow.
Basically, we set the target curvature to be zero for all interior and boundary vertices,
except the four corners, and set the target curvatures for the corners to be π/2, then
we run Ricci flow to get the target metric, and isometrically embed the surface using
the target metric to obtain the planar rectangle.
Figure 15 shows a generalization of circle packing by replacing circles by squares
to compute the extremal length of a combinatorial quadrilateral. The left frame
shows a three-connected graph, with four corner nodes. The right frame shows
1766
Fig. 15 Square tiling of a three-connected graph; each node is replaced by a square with the same label and color. Two nodes are connected in the graph, if
and only if their corresponding squares are tangent
N. Lei et al.
49 Computational Conformal Geometric Methods for Vision 1767
the extremal length, where each node is replaced by a square with the same label
and color. Two nodes are connected in the graph if and only if their corresponding
squares are tangent. In theory, squares can be replaced by more general convex
shapes.
Figure 16 shows an example for computing the hyperbolic metric on a high genus
surface. As shown in the left frame, the input surface is triangulated, and each face is
a hyperbolic triangle instead of a Euclidean triangle. The theoretic formulation and
the algorithmic details are very similar. After obtaining the uniformization metric,
we isometrically embed a finite portion of the universal covering space of the surface
onto the Poincaré model of H2 . Each color represents a fundamental polygon, and
the boundaries of the fundamental polygons are hyperbolic geodesics.
Harmonic Maps
Another useful algorithm is based on surface harmonic maps for a genus zero closed
surfaces (Gu et al. 2004). Figure 17 shows the computational method for genus zero
closed surface: harmonic mapping.
Intuitively, the harmonic energy measures the elastic deformation energy induced
by a mapping between surfaces. It depends on the Riemannian metric of the target
surface and the conformal structure of the source surface. Given a C 1 mapping
between two surfaces f : (S, g) → (T , h), with isothermal parameters,
Fig. 17 A spherical harmonic mapping from the Stanford bunny surface onto the unit sphere
e2λ(u,v)
e(f, g, h) := (|∇u|2 + |∇v|2 )
e2μ(x,y)
az + b
z → , ad − bc = 1, a, b, c, d ∈ C.
cz + d
E(f ) = wij |f (vi ) − f (vj )|2 , (31)
[vi ,vj ]∈
where wij is the cotangent edge weight: suppose two corner angles against edge
[vi , vj ] are θk and θl , then
1
wij := (cot θk + cot θl ). (32)
2
In practice, we first construct the Gauss map from the Stanford bunny surface to the unit sphere and then use the nonlinear heat diffusion method to reduce the harmonic energy. At the k-th step, we compute the Laplacian of f_k, denoted Δf_k, and project Δf_k onto the tangent space of the sphere; the normal component of Δf_k is discarded, and the update uses only the tangential component,

  f_{k+1} = f_k - \tau \, (\Delta f_k)^{T},

where τ is the step length. In order to remove the Möbius ambiguity, we add the constraint that the mass center of the image of f_{k+1} is at the origin,

  c_{k+1} := \frac{1}{|V|} \sum_{v_i \in V} f_{k+1}(v_i),

and recenter and renormalize f_{k+1} accordingly, so that c_{k+1} is moved to the origin and the image stays on the unit sphere. We repeat this procedure until the norm of the tangential component (Δf_k)^T is less than a user-prescribed threshold. The details can be found in Algorithm 2. More algorithmic details can be found in Gu et al. (2004) and Gotsman et al. (2003).
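A compact Python sketch of this nonlinear heat diffusion is given below. It only illustrates the update described above (cotangent Laplacian, tangential projection, recentering, renormalization) with a fixed step size; it is not the authors' Algorithm 2, and the mesh layout (vertex array V, face array F) and the use of normalized positions as the initial map are assumptions.

import numpy as np

def cotangent_weights(V, F):
    # w_ij = (cot theta_k + cot theta_l) / 2, accumulated over the two adjacent triangles
    W = {}
    for (i, j, k) in F:
        for (a, b, c) in ((i, j, k), (j, k, i), (k, i, j)):
            u, v = V[a] - V[c], V[b] - V[c]          # angle at vertex c, opposite edge (a, b)
            cot = np.dot(u, v) / (np.linalg.norm(np.cross(u, v)) + 1e-12)
            W[(a, b)] = W.get((a, b), 0.0) + 0.5 * cot
            W[(b, a)] = W.get((b, a), 0.0) + 0.5 * cot
    return W

def spherical_harmonic_map(V, F, tau=0.1, iters=2000):
    f = V / np.linalg.norm(V, axis=1, keepdims=True)   # crude initial map onto the sphere
    W = cotangent_weights(V, F)
    for _ in range(iters):
        lap = np.zeros_like(f)
        for (i, j), w in W.items():
            lap[i] += w * (f[i] - f[j])                # discrete Laplacian of the map
        tang = lap - np.sum(lap * f, axis=1, keepdims=True) * f   # drop the normal component
        f = f - tau * tang                             # heat-diffusion step
        f -= f.mean(axis=0)                            # recenter: remove Mobius ambiguity
        f /= np.linalg.norm(f, axis=1, keepdims=True)  # project back onto the unit sphere
    return f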
The harmonic map between hyperbolic surfaces was introduced in Shi et al.
(2016), for high genus surface registration. The harmonic map between a surface
and a graph with distance was introduced in Lei et al. (2017a,b), and this is applied
for computing holomorphic quadratic differentials for the purpose of computational
mechanics.
Hodge Decomposition
Another algorithm is based on the Hodge decomposition theorem (Gu and Yau 2003). Hodge decomposition says that any differential form ω on a closed Riemannian manifold can be uniquely written as the sum of three parts, ω = dα + δβ + γ, where γ is harmonic, i.e., Δγ = 0 with Δ = dδ + δd. Intuitively, this can be interpreted as follows: any vector field on a surface can be decomposed into three components, a curl-free part, a divergence-free part, and a harmonic part. A vector field is harmonic if and only if it has zero curl and zero divergence, as shown in Fig. 18.
Homology Group We compute a basis of the homology group of the polyhedral surface (Σ, T), denoted by H_1(Σ, Z). We construct the Poincaré dual mesh Σ̄ of the surface and compute a spanning tree T̄ of the vertices of Σ̄; the cut graph of Σ is then given by

  \Gamma := \{ e \in E(\Sigma, T) : \bar e \notin \bar T \},

where ē denotes the edge dual to e. Next, we compute a spanning tree T of the cut graph Γ and write

  \Gamma \setminus T = \{ e_1, e_2, \dots, e_{2g} \}.

The union of e_k and T contains a unique loop γ_k, and {γ_1, γ_2, . . . , γ_{2g}} is a set of basis of H_1(Σ, Z).
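This dual-spanning-tree construction can be sketched in a few lines of Python on top of networkx. The sketch below is only an illustration of the idea, not the book's algorithm; it assumes a closed, manifold triangle mesh given by its face list, so that every edge is shared by exactly two faces.

import networkx as nx

def homology_basis(faces):
    # dual graph: one node per face, edges between faces sharing a primal edge
    dual = nx.Graph()
    edge_to_faces = {}
    for fid, (i, j, k) in enumerate(faces):
        for a, b in ((i, j), (j, k), (k, i)):
            edge_to_faces.setdefault((min(a, b), max(a, b)), []).append(fid)
    for e, fs in edge_to_faces.items():
        if len(fs) == 2:                                   # closed surface assumption
            dual.add_edge(fs[0], fs[1], primal=e)
    dual_tree = nx.minimum_spanning_tree(dual)
    in_dual_tree = {d["primal"] for _, _, d in dual_tree.edges(data=True)}
    # cut graph: primal edges whose dual edge is NOT in the dual spanning tree
    cut = nx.Graph(list(e for e in edge_to_faces if e not in in_dual_tree))
    cut_tree = nx.minimum_spanning_tree(cut)
    loops = []
    for a, b in cut.edges():
        if cut_tree.has_edge(a, b):
            continue
        path = nx.shortest_path(cut_tree, a, b)            # unique tree path a -> b
        loops.append(path + [a])                           # close the loop with the leftover edge
    return loops                                           # 2g loops for a genus g surface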
Fig. 18 Two conjugate harmonic one-forms, shown in the left and middle frames, constitute a holomorphic one-form, shown in the right frame. The loops in the left (middle) frame show the vertical (horizontal) trajectories of the holomorphic form
Then ω_k equals zero on the boundary edges; hence ω_k is well defined on the original closed surface Σ. By this construction, the closed one-forms {ω_1, ω_2, . . . , ω_{2g}} form a set of basis of the first cohomology group of the polyhedral surface, H^1(Σ, R).
The operator that maps each harmonic one-form ω_k to its conjugate one-form ∗ω_k is called the Hodge star. Because {ω_1, ω_2, . . . , ω_{2g}} is a set of basis of H^1(Σ, R), ∗ω_k can be represented as a linear combination of them. We can construct linear equations to find the linear combination coefficients,

  \int_\Sigma {}^*\omega_k \wedge \omega_i = \sum_{j=1}^{2g} \lambda_{kj} \int_\Sigma \omega_j \wedge \omega_i .

The left-hand side can be evaluated using a vector field representation. For example, we isometrically embed one face Δ on the (x, y)-plane and write ω_k = α_k dx + β_k dy, so that ∗ω_k = α_k dy − β_k dx. If ω_i = α_i dx + β_i dy, then

  \int_\Delta {}^*\omega_k \wedge \omega_i = -(\alpha_i \alpha_k + \beta_i \beta_k) \, A(\Delta),

where A(Δ) is the area of the triangle. We form the holomorphic one-form ϕ_k = ω_k + \sqrt{-1}\,{}^*ω_k; then {ϕ_1, ϕ_2, · · · , ϕ_{2g}} is a set of basis of the holomorphic one-form group of (Σ, T). The algorithmic pipeline is summarized in Algorithm 3.
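To make the triangle-wise assembly concrete, the following Python sketch sets up and solves the linear system for the Hodge star coefficients λ_{kj}. The data layout (per-triangle coefficients α, β of each basis one-form in local isothermal coordinates and the triangle areas) and the nondegeneracy of the pairing matrix are assumptions; how α, β are computed from the mesh is not shown here.

import numpy as np

def hodge_star_coefficients(alpha, beta, area):
    # alpha, beta: arrays of shape (2g, n_faces) with omega_k = alpha[k, t] dx + beta[k, t] dy on face t
    # area: array of shape (n_faces,) with the triangle areas A(Delta)
    # M[j, i] = int_Sigma omega_j ^ omega_i, summed face by face
    M = np.einsum('jt,it,t->ji', alpha, beta, area) - np.einsum('jt,it,t->ji', beta, alpha, area)
    # B[k, i] = int_Sigma *omega_k ^ omega_i = sum_t -(alpha_i alpha_k + beta_i beta_k) A(Delta)
    B = -(np.einsum('kt,it,t->ki', alpha, alpha, area) + np.einsum('kt,it,t->ki', beta, beta, area))
    # solve sum_j lambda[k, j] M[j, i] = B[k, i] for each k, i.e. M^T lambda_k = B_k
    return np.linalg.solve(M.T, B.T).T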
As shown in Fig. 18, the left frame shows a harmonic one-form ω, the middle
frame shows the conjugate harmonic one-form ∗ω, and the right frame shows the holomorphic one-form ω + \sqrt{-1}\,{}^*ω. Using this method, we can construct the basis of
the group of holomorphic one-forms of the Riemann surface. By linear combination,
we can construct any holomorphic one-form. The algorithmic details can be found
in Gu and Yau (2003) and Jin et al. (2004).
Direct Applications
Conformal geometry can be applied for computer vision and medical imaging
directly. In the following, we introduce some of the most direct applications. More
applications can be found in Gu and Yau (2007, 2020), Gu et al. (2012), and Zeng
and Gu (2013).
Shape Space
All the surfaces in the real world form a shape space. The shape space can be classified using different transformation groups; the equivalence classes form the quotient shape spaces. The transformation groups form a hierarchical chain of subgroups, and the corresponding quotient spaces form a sequence of subspaces. The homeomorphism group classifies the shape space by topology; each topological equivalence class can be further classified by the conformal transformation group, and all the conformal equivalence classes form the Teichmüller space; each conformal equivalence class can be further classified by the isometric transformation group; each isometry class can be further classified by the rigid motion group (translation and rotation). Two surfaces differ by a rigid motion if and only if they have the same Riemannian metric, mean curvature, and boundary position.
This work focuses on conformal classification, namely, discriminating shapes
in the Teichmüller space. In the following, we introduce efficient algorithms to
compute the Teichmüller coordinates for metric surfaces with different topologies.
In practice, homeomorphic surfaces can be differentiated by their Teichmüller
coordinates.
conformally equivalent if and only if their conformal planar annuli are similar, namely, they share the same Teichmüller coordinate. For a topological annulus, the ratio between the two radii of its canonical annulus gives the Teichmüller coordinate of the surface. For a topological poly-annulus with n inner holes, the dimension of the Teichmüller space is 3n − 3.
We can construct a unique holomorphic one-form ϕ, such that the imaginary
part of the integration of ϕ along γ0 is 2π , −2π along γ1 , and 0 along all other
boundary components.
Then we fix a base point q, and the conformal mapping

  f(p) = \exp\Big( \int_q^p \varphi \Big)

maps the surface onto a canonical annulus with concentric circular slits, as shown in Fig. 21. The algorithm was introduced in Yin et al. (2008).
Fig. 24 Conformal periodic mapping from a genus one closed surface to the plane

Since the circle domain has n boundary components, we need 3n − 3 parameters to describe the circle domain, and the Teichmüller space is (3n − 3)-dimensional.
Figure 23 demonstrates Koebe’s iteration algorithm (Zeng et al. 2009) that
conformally maps a poly-annulus onto a circle domain, namely, the complement
of the union of a finite number of disks. First, we fill the holes of the mouth and
the right eye and then conformally map the topological annulus onto a canonical
annulus; second, we fill the center circular hole of the left eye, open the hole of
the mouth, and map the topological annulus onto a canonical annulus; third, we fill
the center circular hole of the mouth, open the hole of the right eye, and map the
topological annulus onto a canonical annulus. We repeat this procedure, sequentially
opening one hole and filling all the other holes, and then map the topological annulus
to the canonical annulus. The boundary components become rounder and rounder,
and the mapping images converge to a circle domain exponentially fast.
Genus One Closed Surface The conformal structure of a genus one closed surface is described by a lattice

  \Gamma = \{ a + b z : a, b \in \mathbb{Z} \},

where z ∈ C is a constant. The flat torus is defined as the quotient space C/Γ, and the conformal mapping is between the input surface Σ and the flat torus, f : Σ → C/Γ. The Teichmüller coordinate of the genus one closed surface is given by the z parameter for the
flat torus. Therefore, the Teichmüller space of genus one closed surfaces is two-dimensional.
There are two ways to compute the conformal mapping for a torus. One way
is based on discrete surface Yamabe flow. We set the target curvature to be zero
everywhere and compute the flat metric using the flow. We slice the surface
open along a set of homology group basis {γ_1, γ_2} to obtain a topological disk Σ̄, and isometrically flatten Σ̄ on the plane to obtain a fundamental domain f(Σ̄). By gluing the translated copies of the fundamental domain, we can tessellate the whole plane and construct the flat torus C/Γ.
The second method is based on holomorphic one-form algorithm. First, we
compute a holomorphic one-form ϕ; then we choose a base point q ∈ Σ̄ and define the mapping by integration,

  f(p) = \int_q^p \varphi, \qquad \forall p \in \bar\Sigma .

This gives a fundamental domain f(Σ̄) on the plane.
High Genus Closed Surface The conformal invariants of a high genus closed
surface can be computed using hyperbolic uniformization metric. As shown in
Fig. 25, given a genus g > 0 closed surface , we can choose a set of canonical
basis of the fundamental group π_1(Σ, q), {a_1, b_1, a_2, b_2, · · · , a_g, b_g}, such that all of them go through the base point q ∈ Σ and satisfy the intersection conditions

  a_i \cdot b_i = 1, \quad a_i \cdot b_j = 0 \ (i \neq j), \quad a_i \cdot a_j = 0, \quad b_i \cdot b_j = 0,

where a_i · b_j represents the algebraic intersection number between the two loops a_i and b_j. We slice the surface along the canonical basis to form a fundamental domain Σ̄, whose boundary is given by
  \partial \bar\Sigma = a_1 b_1 a_1^{-1} b_1^{-1} a_2 b_2 a_2^{-1} b_2^{-1} \cdots a_g b_g a_g^{-1} b_g^{-1},

as illustrated in Fig. 25 for a genus two surface.
Similar to the torus case, by using the hyperbolic uniformization metric, we can
isometrically embed ¯ onto the hyperbolic plane to get a hyperbolic fundamental
polygon. By transforming the fundamental polygon by hyperbolic rigid motions,
we can generate a tessellation of the whole hyperbolic plane. All such hyperbolic rigid motions form the so-called Fuchsian group of the surface. The generators of the Fuchsian group give the Teichmüller coordinate of the surface Σ.
As shown in Fig. 26, we use the Poincaré disk model to represent the hyperbolic plane H²,

  \mathbb{H}^2 := \{ |z| < 1 \}, \qquad ds^2 = \frac{dz \, d\bar z}{(1 - z \bar z)^2} .

The hyperbolic rigid motions are the Möbius transformations

  z \mapsto e^{i\theta} \frac{z - z_0}{1 - \bar z_0 z}, \qquad |z_0| < 1, \ \theta \in [0, 2\pi] .
Surface Registration
Given a Beltrami coefficient μ, by solving the Beltrami equation

  \frac{\partial \tilde f}{\partial \bar z} = \mu \, \frac{\partial \tilde f}{\partial z},

we can obtain the map f̃. One way to solve the Beltrami equation is to construct an auxiliary metric g on the source domain: the map f̃ : (D, dz dz̄) → (D, dw dw̄), which is quasi-conformal with respect to the original metrics (here z and w are the complex parameters of the domain and the range), becomes a conformal map under the auxiliary metric, f̃ : (D, g_μ) → (D, dw dw̄). Since we know how to compute conformal maps, we can find f̃. So the key step is to construct the auxiliary metric. Fortunately, the auxiliary metric can be easily constructed as

  g_\mu = |dz + \mu \, d\bar z|^2 .

By using the auxiliary metric method, we can solve the Beltrami equation and obtain the quasi-conformal map.
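A minimal Python sketch of the auxiliary metric on a triangle mesh is given below. It assumes that each triangle comes with local planar coordinates z of its three corners (e.g., from a prior conformal flattening) and a piecewise constant Beltrami coefficient; computing the subsequent conformal map from the new edge lengths (for instance by discrete Ricci flow) is not shown.

import numpy as np

def auxiliary_edge_lengths(corners, mu):
    # corners: complex array (n_faces, 3) of local coordinates z of each triangle's vertices
    # mu:      complex array (n_faces,) with one Beltrami coefficient per triangle, |mu| < 1
    # returns edge lengths measured in g_mu = |dz + mu dzbar|^2: for an edge vector e
    # the new length is |e + mu * conj(e)|
    new_lengths = np.empty(corners.shape, dtype=float)
    for t in range(corners.shape[0]):
        z = corners[t]
        for k in range(3):
            e = z[(k + 1) % 3] - z[k]                       # edge vector in local coordinates
            new_lengths[t, k] = abs(e + mu[t] * np.conj(e))  # length under the auxiliary metric
    return new_lengths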
Fig. 28 Facial surface matching by a Teichmüller map; all ellipses have the same eccentricity
disk. As shown in the top row of Fig. 29, the algorithm automatically registers each
finger tip to the corresponding one without any mismatching. This demonstrates the
accuracy of the registration.
Fig. 29 Surface registration based on conformal and optimal transport maps. (a) Armadillo 1.
(b) Armadillo 2. (c) Area-preserving map 1. (d) Area-preserving map 2. (e) Conformal map 1.
(f) Conformal map 2
Given a sequence of facial surfaces {(Σ_k, g_k)}_{k=1}^n, we locate the feature points {(p_1^k, p_2^k, · · · , p_m^k)}_{k=1}^n for each frame and then compute Riemann mappings ϕ_k : Σ_k → D. Our goal is to find a sequence of homeomorphisms f_k : ϕ_k(Σ_k) → ϕ_{k+1}(Σ_{k+1}) with the landmark constraints f_k(ϕ_k(p_i^k)) = ϕ_{k+1}(p_i^{k+1}), k = 1, . . . , n − 1, i = 1, . . . , m. Each map f_k is a quasi-conformal map with Beltrami coefficient μ_k, ‖μ_k‖_∞ < 1. The Beltrami coefficients can be obtained by optimizing the following energy:

  \int_D |\nabla \mu_k|^2 \, dA + \int_D |\mu_k - \mu_{k-1}|^2 \, dA + \int_D |H(p) - H \circ f^{\mu_k}(p)|^2 + |c(p) - c \circ f^{\mu_k}(p)|^2 \, dp,
where H (·) and c(·) represent the mean curvature and the texture color of the
surface. Figure 30 demonstrates an expression tracking result; the blue quadrilateral
mesh is attached to the first facial surface and moves along with it. The trajectories
of the vertices of the blue mesh represent the expression (Yu et al. 2017). The facial
expression tracking technique plays an important role in the movie industry. More
algorithmic details and applications of quasi-conformal mappings can be found
in Lui et al. (2010, 2012), Ng et al. (2014), and Wong and Zhao (2014).
Medical Imaging
Conformal geometry has been applied to many fields in medical imaging. For example, in the field of brain imaging, it is crucial to register different brain cortex surfaces reconstructed from MRI or CT images; because brain surfaces are highly convoluted, conformal flattening onto canonical domains greatly simplifies this task. Another prominent application is virtual colonoscopy: the colon wall carries many haustral folds, and polyps hidden in these folds are hard to locate and recognize. By using conformal geometric methods, one can flatten the whole colon surface onto a planar rectangle, as shown in the right frame. Then all the haustral folds are expanded, all polyps are exposed, and abnormalities can be found efficiently on the planar image.
Furthermore, in practice, the colon surface will be scanned twice with supine and
prone positions. Because the colon surface is highly elastic, there will be large
deformations between the two scans. Conformal colon flattening can be applied to
find a good registration between the supine and prone colon surfaces (Zeng et al.
2010; Zeng and Gu 2013). Today, the conformal colon flattening technique has
already been widely used in clinical practice. More applications and algorithmic
details for virtual colonoscopy can be found in Saad Nadeem et al. (2017) and Ma
et al. (2019).
Conclusion
References
Bobenko, A.I., Pinkall, U., Springborn, B.A.: Discrete conformal maps and ideal hyperbolic
polyhedra. Geom. Topol. 19(4), 2155–2215 (2015)
Chen, W., Zhang, M., Lei, N., Gu, D.X.: Dynamic unified surface ricci flow. Geom. Imag. Comput.
3(1), 31–56 (2016)
Chow, B., Luo, F.: Combinatorial Ricci flows on surfaces. J. Differ. Geom. 63(1), 97–129 (2003)
de Verdière, Y.C.: Un principe variationnel pour les empilements de cercles. Invent. Math. 104(3),
655–669 (1991)
Glickenstein, D.: Discrete conformal variations and scalar curvature on piecewise flat two- and
three-dimensional manifolds. J. Differ. Geom. 87(2), 201–237 (2011)
Gotsman, C., Gu, X., Sheffer, A.: Fundamentals of spherical parameterization for 3d meshes. ACM
Trans. Graph. (TOG) 22(3), 358–363 (2003)
Gu, X., Guo, R., Luo, F., Sun, J., Wu, T.: A discrete uniformization theorem for polyhedral surfaces
(II). J. Differ. Geom. (JDG) 109(3), 431–466 (2018a)
Gu, X., Luo, F., Sun, J., Wu, T.: A discrete uniformization theorem for polyhedral surfaces (I). J.
Differ. Geom. (JDG) 109(2), 223–256 (2018b)
Gu, X., Luo, F., Wu, T.: Convergence of discrete conformal geometry and computation of
uniformization maps. Asian J. Math. (AJM) 23(1), 21–34 (2019)
Gu, X., Wang, Y., Chan, T.F., Thompson, P.M., Yau, S.-T.: Genus zero surface conformal mapping
and its application to brain surface mapping. IEEE Trans. Med. Imag. (TMI) 23(8), 949–958
(2004)
Gu, X., Yau, S.-T.: Global conformal surface parameterization. In: Proceedings of the
2003 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 127–137.
Eurographics Association (2003)
Gu, X., Yau, S.-T.: Computational Conformal Geometry. Advanced Lectures in Mathematics,
vol. 3. International Press and Higher Education Press. Boston (2007)
Gu, X., Yau, S.-T.: Computational Conformal Geometry – Theory. International Press and Higher
Education Press. Boston (2020)
Gu, X.D., Zeng, W., Luo, F., Yau, S.-T.: Numerical computation of surface conformal mappings.
Comput. Methods Funct. Theory 11(2), 747–787 (2012)
Jin, M., Kim, J., Luo, F., Gu, X.: Discrete surface ricci flow. IEEE Trans. Vis. Comput. Graph.
(TVCG) 14(5), 1030–1043 (2008)
Jin, M., Wang, Y., Gu, X., Yau, S.-T., et al.: Optimal global conformal surface parameterization
for visualization. Commun. Inf. Syst. 4(2), 117–134 (2004)
Jin, M., Zeng, W., Ding, N., Gu, X.: Computing fenchel-nielsen coordinates in teichmuller shape
space. Commun. Inf. Syst. 9(2), 213–234 (2009a)
Jin, M., Zeng, W., Luo, F., Gu, X.: Computing Teichmüller shape space. IEEE Trans. Vis. Comput.
Graph. 15(3), 504–517 (2009b)
Lei, N., Zheng, X., Jiang, J., Lin, Y.-Y., Gu, D.X.: Quadrilateral and hexahedral mesh generation
based on surface foliation theory. Comput. Methods Appl. Mech. Eng. 316, 758–781 (2017a)
Lei, N., Zheng, X., Luo, Z., Gu, D.X.: Quadrilateral and hexahedral mesh generation based on
surface foliation theory II. Comput. Methods Appl. Mech. Eng. 321, 406–426 (2017b)
Lui, L.M., Gu, X., Yau, S.-T.: Convergence of an iterative algorithm for teichmüller maps via
harmonic energy optimization. Math. Comput. 84(296), 2823–2842 (2015)
Lui, L.M., Wong, T.W., Zeng, W., Gu, X., Thompson, P.M., Chan, T.F., Yau, S.T.: Detection of
shape deformities using yamabe flow and beltrami coefficients. Inverse Probl. Imag. 4(2), 311–
333 (2010)
Lui, L.M., Wong, T.W., Zeng, W., Gu, X., Thompson, P.M., Chan, T.F., Yau, S.-T.: Optimization of
surface registrations using beltrami holomorphic flow. J. Sci. Comput. 50(3), 557–585 (2012)
Luo, F.: Combinatorial Yamabe flow on surfaces. Commun. Contemp. Math. 6(5), 765–780 (2004)
Luo, F., Gu, X., Dai, J.: Variational Principles for Discrete Surfaces. Advanced Lectures in
Mathematics, vol. 4. International Press and Higher Education Press. Boston (2007)
Ma, M., Marino, J., Nadeem, S., Gu, X.: Supine to prone colon registration and visualization based
on optimal mass transport. Graph. Models 104, 101031 (2019)
Ng, T.C., Gu, X., Lui, L.M.: Computing extremal teichmüller map of multiply-connected domains
via beltrami holomorphic flow. J. Sci. Comput. 60(2), 249–275 (2014)
Peng, H., Wang, X., Duan, Y., Frey, S.H., Gu, X.: Brain morphometry on congenital hand
deformities based on teichmüller space theory. Comput.-Aided Des. 58, 84–91 (2015)
Rodin, B., Sullivan, D.: The convergence of circle packings to the Riemann mapping. J. Differ.
Geom. 26(2), 349–360 (1987)
Roček, M., Williams, R.M.: Quantum Regge calculus. Phys. Lett. B 104(1), 31–37 (1981)
Saad Nadeem, J.M., Gu, X., Kaufman, A.: Corresponding supine and prone colon visualization
using eigenfunction analysis and fold modeling. IEEE Trans. Vis. Comput. Graph. 23(1), 751–
760 (2017)
Shi, R., Zeng, W., Su, Z., Jiang, J., Damasio, H., Lu, Z., Wang, Y., Yau, S.-T., Gu, X.: Hyperbolic
harmonic mapping for surface registration. IEEE Trans. Pattern Anal. Mach. Intell. (2016)
Su, Z., Wang, Y., Shi, R., Zeng, W., Sun, J., Luo, F., Gu, X.: Optimal mass transport for
shape matching and comparison. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2246–2259
(2015)
Thurston, W.P.: Three-Dimensional Geometry and Topology. Vol.1. Princeton Mathematical
Series, vol. 35. Princeton University Press, Princeton (1997)
Wang, Y., Gu, X., Hayashi, K.M., Chan, T.F., Thompson, P.M., Yau, S.-T.: Brain surface conformal
parameterization. In: Proceedings of the Eighth IASTED International Conference, Computer
Graphics and Imaging. Honolulu, Hawaii, pp. 76–81 (2005)
Wang, Y., Lui, L.M., Gu, X., Hayashi, K.M., Chan, T.F., Toga, A.W., Thompson, P.M., Yau, S.-T.:
Brain surface conformal parameterization using riemann surface structure. IEEE Trans. Med.
Imag. 26(6), 853–865 (2007)
Wong, T.W., Zhao, H.-K.: Computation of quasi-conformal surface maps using discrete beltrami
flow. SIAM J. Imag. Sci. 7(4), 2675–2699 (2014)
Yin, X., Dai, J., Yau, S.-T., Gu, X.: Slit map: linear conformal parameterization for multiply
connected domains. Comput.-Aided Geom. Des. (CAGD) 4975, 410–422 (2008)
Yu, X., Lei, N., Wang, Y., Gu, X.: Intrinsic 3D dynamic surface tracking based on dynamic ricci
flow and teichmuller map. In: Proceedings of the IEEE International Conference on Computer
Vision, pp. 5390–5398 (2017)
Zeng, W., Gu, X.: Ricci Flow for Shape Analysis and Surface Registration – Theories, Algorithms
and Applications. Springer Briefs in Mathematics. Springer (2013)
Zeng, W., Marino, J., Gurijala, K.C., Gu, X., Kaufman, A.: Supine and prone colon registration
using quasi-conformal mapping. IEEE Trans. Vis. Comput. Graph. 16(6), 1348–1357 (2010)
Zeng, W., Samaras, D., Gu, X.: Ricci flow for 3D shape analysis. IEEE Trans. Pattern Anal. Mach.
Intell. (TPAMI) 32(4), 662–677 (2010)
Zeng, W., Yin, X., Zhang, M., Luo, F., Gu, X.: Generalized Koebe’s method for conformal mapping
multiply connected domains. In: 2009 SIAM/ACM Joint Conference on Geometric and Physical
Modeling, pp. 89–100. ACM (2009)
Zhang, M., Guo, R., Zeng, W., Luo, F., Yau, S.-T., Gu, X.: The unified discrete surface ricci flow.
Graph. Models 76(5), 321–339 (2014)
From Optimal Transport to Discrepancy
50
Sebastian Neumayer and Gabriele Steidl
Contents
Introduction 1792
Preliminaries 1794
Discrepancies 1797
Optimal Transport and Wasserstein Distances 1804
Regularized Optimal Transport 1805
Sinkhorn Divergence 1815
Numerical Approach and Examples 1818
Conclusions 1823
Basic Theorems 1824
References 1824
Abstract
by numerical examples and show the behavior of the distances when used for the
approximation of measures by point measures in a process called dithering.
Keywords
Introduction
Fig. 1 Approximation of a measure on S2 by an empirical measure (Gräf et al. 2013) (left) and a
measure supported on a curve (Ehler et al. 2019) (right) using discrepancies as objective function
to minimize
with Wasserstein distances; see, e.g., Chauffert et al. (2017), Goes et al. (2012),
and Lebrat et al. (2019).
Recently, regularized versions of OT for an efficient numerical treatment, known
as Sinkhorn divergences (Cuturi 2013), were used as replacement of OT in data
science. Note that such regularization ideas are also investigated in the earlier works
(Rüschendorf 1995; Sinkhorn 1964; Wilson 1969; Yule 1912). For appropriately
related transport cost functions and discrepancy kernels, the Sinkhorn divergences
interpolate between the OT distance if the parameter goes to zero and the
discrepancy if it goes to infinity (Feydy et al. 2019). In this chapter, the convergence
behavior is examined for general measures on compact sets. Since cost functions
applied in practice are mainly Lipschitz, we restrict our attention to such costs. This
simplifies some proofs, since the theorem of Arzelà–Ascoli can be utilized. To make
the paper self-contained, we provide most of the proofs although some of them are
not novel and the corresponding papers are cited in the context. For estimating
approximation rates when approximating measures by those of certain subsets
(see, e.g., Chevallier (2018), Ehler et al. (2019), Genevay et al. (2019), and Novak
and Wozniakowski (2010)), the dual form of the discrepancy, respectively, of the
(regularized) Wasserstein distance, plays an important role. Therefore, we are
interested in the properties of the optimal dual potentials for varying regularization
parameters. In Proposition 5, we prove that the optimal dual potentials converge
uniformly to certain functions as ε → ∞. Then, in Corollary 2, we see that
the normalized difference of these limiting functions coincides with the optimal
potential in the dual form of the discrepancy if the cost function and the kernel are
appropriately related. This behavior is underlined by a numerical example.
This chapter is organized as follows: section “Preliminaries” recalls basic results
on measures, the Kullback-Leibler (KL) divergence, and from convex analysis.
In section “Discrepancies”, we introduce discrepancies, in particular their dual
formulation. Since these rely on positive definite kernels, we have a closer look at
positive definite and conditionally positive definite kernels. Optimal transport and in
particular Wasserstein distances are considered in section “Optimal Transport and
Wasserstein Distances”. In section “Regularized Optimal Transport”, we investigate
the limiting processes for the KL-regularized OT distances, when the regularization
parameter goes to zero or infinity. Some results in Proposition 2 are novel in
this generality; Proposition 5 seems to be new as well. Remark 3 highlights why
the KL divergence should be preferred as regularizer instead of the (neg)-entropy
when dealing with non-discrete measures. KL-regularized OT does not fulfill
OTε (μ, μ) = 0, which motivates the definition of the Sinkhorn divergence Sε in
section “Sinkhorn Divergence”. Further, we prove Γ-convergence to the discrepancy
as ε → ∞ if the cost function of the Sinkhorn divergence is adapted to the kernel
defining the discrepancy. Section “Numerical Approach and Examples” underlines
the results on the limiting process by numerical examples. Further, we provide an
example on the dithering of the standard Gaussian when Sinkhorn divergences with
respect to different regularization parameters ε are involved. Finally, conclusions
and directions of future research are given in section “Conclusions”.
Preliminaries
Measures Let X be a compact Polish space (separable, complete metric space) with
metric distX . By B(X), we denote the Borel σ -algebra on X and by M(X) the linear
space of all finite signed Borel measures on X, i.e., all μ : B(X) → R satisfying
μ(X) < ∞ and, for any sequence {B_k}_{k∈N} ⊂ B(X) of pairwise disjoint sets, the relation \mu\big( \bigcup_{k=1}^{\infty} B_k \big) = \sum_{k=1}^{\infty} \mu(B_k). In the following, the subset of nonnegative
measures is denoted by M+ (X). The support of a measure μ is defined as the
closed set
  \mathrm{supp}(\mu) := \{ x \in X : B \subset X \text{ open}, \ x \in B \Rightarrow \mu(B) > 0 \}.

The total variation measure of μ ∈ M(X) is given by

  |\mu|(B) := \sup\Big\{ \sum_{k=1}^{\infty} |\mu(B_k)| : \bigcup_{k=1}^{\infty} B_k = B, \ B_k \text{ pairwise disjoint} \Big\}.
With the norm μ M = |μ|(X), the space M(X) becomes a Banach space.
By C(X), we denote the Banach space of continuous real-valued functions on
X equipped with the norm ϕ C(X) := maxx∈X |ϕ(x)|. The space M(X) can be
identified via Riesz’ representation theorem with the dual space of C(X), and the
weak-∗ topology on M(X) gives rise to the weak convergence of measures. More
precisely, a sequence {μ_k}_{k∈N} ⊂ M(X) converges weakly to μ, and we write μ_k ⇀ μ, if

  \lim_{k \to \infty} \int_X \varphi \, d\mu_k = \int_X \varphi \, d\mu \quad \text{for all } \varphi \in C(X).
For a nonnegative, finite measure μ and p ∈ [1, ∞), let Lp (X, μ) be the Banach
space (of equivalence classes) of complex-valued functions with norm
  \|f\|_{L^p(X,\mu)} = \Big( \int_X |f|^p \, d\mu \Big)^{1/p} < \infty.
Convex analysis The following can be found, e.g., in Bredies and Lorenz (2011).
Let V be a real Banach space with dual V ∗ , i.e., the space of real-valued continuous
linear functionals on V . We use the notation v, x = v(x), v ∈ V ∗ , x ∈ V . For
F : V → (−∞, +∞], the domain of F is given by domF := {x ∈ V : F (x) ∈ R}.
If domF = ∅, then F is called proper. The subdifferential of F : V → (−∞, +∞]
at a point x0 ∈ domF is defined as
  \partial F(x_0) := \{ v \in V^* : F(x) \geq F(x_0) + \langle v, x - x_0 \rangle \ \text{for all } x \in V \},
see Ekeland and Témam (1999, Thm. 4.1, p. 61), where we consider
  \sup_{x \in V} \big( -F(-x) - G(Ax) \big) = - \inf_{x \in V} \big( F(-x) + G(Ax) \big)
as primal problem with respect to the notation in Ekeland and Témam (1999). If the
optimal (primal) solution x̂ exists, it is related to any optimal (dual) solution ŵ by
  D_f(\mu, \nu) = \int_X f \circ \sigma_\mu \, d\nu + f_\infty \, \mu^{\perp}(X); \quad (4)
see Liero et al. (2018, Rem. 2.10). Hence, D_f(·, ν) is the Fenchel conjugate of H : C(X) → R given by H(ϕ) := \int_X f^* \circ \varphi \, d\nu. If f^* is differentiable, we directly deduce from (1) that
In case that the above Radon-Nikodym derivative does not exist, (4) implies
KL(μ, ν) = +∞. For μ, ν ∈ P(X), the last two summands in (6) cancel each
other. Hence, we have for discrete measures μ = \sum_{j=1}^n \mu_j \delta_{x_j} and ν = \sum_{j=1}^n \nu_j \delta_{x_j} with μ_j, ν_j ≥ 0 and \sum_{j=1}^n \mu_j = \sum_{j=1}^n \nu_j = 1 that

  \mathrm{KL}(\mu, \nu) = \sum_{j=1}^{n} \log\Big( \frac{\mu_j}{\nu_j} \Big) \mu_j .
Further, the KL divergence is strictly convex with respect to the first variable. Due to the Fenchel conjugate pairing, it holds

  \varphi \in \partial_\mu \mathrm{KL}(\mu, \nu) \;\Leftrightarrow\; \mu = e^{\varphi} \nu \;\Leftrightarrow\; \varphi = \log\Big( \frac{d\mu}{d\nu} \Big). \quad (8)
Finally, note that the KL divergence and the total variation norm ‖·‖_M are related by the Pinsker inequality \|\mu - \nu\|_{M}^2 \leq 2\, \mathrm{KL}(\mu, \nu).
Discrepancies
A symmetric function K ∈ C(X × X) is called positive definite if, for any n ∈ N and any points x_1, . . . , x_n ∈ X, the relation

  \sum_{i,j=1}^{n} a_i a_j K(x_i, x_j) \geq 0

is satisfied for all (a_j)_{j=1}^n ∈ R^n, and strictly positive definite if strict inequality holds for all (a_j)_{j=1}^n ≠ 0. Assuming that K ∈ C(X × X) is symmetric, positive definite,
we know by Mercer’s theorem (Cucker and Smale 2002; Mercer 1909; Steinwart
and Scovel 2011) that there exists an orthonormal basis {φk : k ∈ N} of L2 (X, σX )
and nonnegative coefficients {α_k}_{k∈N} ∈ \ell_1 such that K has the Fourier expansion

  K(x, y) = \sum_{k=0}^{\infty} \alpha_k \varphi_k(x) \varphi_k(y) \quad (9)
with absolute and uniform convergence of the right-hand side. If αk > 0 for some
k ∈ N0 , the corresponding function φk is continuous. Every function f ∈ L2 (X, σX )
has a Fourier expansion
  f = \sum_{k=0}^{\infty} \hat f_k \varphi_k, \qquad \hat f_k := \int_X f \varphi_k \, d\sigma_X .

Moreover, for k ∈ N_0 with α_k > 0, the Fourier coefficients of μ ∈ P(X) are well-defined by

  \hat\mu_k := \int_X \varphi_k \, d\mu .
The kernel K gives rise to a reproducing kernel Hilbert space (RKHS). More
precisely, the function space

  \mathcal H_K(X) := \Big\{ f \in L^2(X, \sigma_X) : \sum_{k=0}^{\infty} \alpha_k^{-1} |\hat f_k|^2 < \infty \Big\}

equipped with the inner product and norm

  \langle f, g \rangle_{\mathcal H_K(X)} = \sum_{k=0}^{\infty} \alpha_k^{-1} \hat f_k \hat g_k, \qquad \|f\|_{\mathcal H_K(X)} = \langle f, f \rangle_{\mathcal H_K(X)}^{1/2} \quad (10)

is an RKHS. The function

  x \mapsto \|K(x, \cdot)\|_{\mathcal H_K(X)} = \Big\| \sum_{k=0}^{\infty} \alpha_k \varphi_k(x) \varphi_k(\cdot) \Big\|_{\mathcal H_K(X)} = \Big( \sum_{k=0}^{\infty} \alpha_k |\varphi_k(x)|^2 \Big)^{1/2}

is also continuous so that we have \int_X \|K(x, \cdot)\|_{\mathcal H_K(X)} \, d\mu(x) < \infty. By the definition
of Bochner integrals (see Hytönen et al. (2016, Prop. 1.3.1)), we have for any μ ∈
P(X) that
we obtain by Schwarz' inequality that the optimal dual potential (up to the sign) is given by the normalized difference of the kernel mean embeddings of μ and ν, and the squared discrepancy can be written in terms of the Fourier coefficients as

  \mathcal{D}_K^2(\mu, \nu) = \sum_{k=0}^{\infty} \alpha_k \big| \hat\mu_k - \hat\nu_k \big|^2 . \quad (16)
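To make (16) concrete, the following small numpy example evaluates the squared discrepancy between two empirical measures on the torus [0, 1) for the exponential orthonormal basis. The truncation of the sum and the particular coefficient decay α_k = (1 + k²)^{−s} are assumptions made only for this illustration, not choices from this chapter.

import numpy as np

def torus_discrepancy_sq(x, y, n_freq=64, s=1.5):
    # squared discrepancy (16) for phi_k(t) = exp(2*pi*i*k*t) and alpha_k = (1 + k^2)^(-s)
    k = np.arange(-n_freq, n_freq + 1)
    alpha = (1.0 + k.astype(float) ** 2) ** (-s)
    mu_hat = np.exp(-2j * np.pi * np.outer(k, x)).mean(axis=1)   # Fourier coefficients of mu
    nu_hat = np.exp(-2j * np.pi * np.outer(k, y)).mean(axis=1)   # Fourier coefficients of nu
    return float(np.sum(alpha * np.abs(mu_hat - nu_hat) ** 2))

# uniform grid points versus i.i.d. uniform samples
rng = np.random.default_rng(0)
print(torus_discrepancy_sq(np.linspace(0, 1, 100, endpoint=False), rng.random(100)))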
In dithering, a gray-value image is considered as a density function w : X → [0, 1], where pure black is the largest value of w and white the smallest one. Then, looking for a discrete measure μ = \frac{1}{M} \sum_{j=1}^{M} \delta_{p_j} that approximates ν by minimizing the squared discrepancy is equivalent to solving the minimization problem

  \operatorname*{arg\,min}_{p_1, \dots, p_M} \ \underbrace{\frac{1}{2M} \sum_{i,j=1}^{M} K(p_i, p_j)}_{\text{repulsion}} \; - \; \underbrace{\sum_{i=1}^{M} \int_X w(x) K(x, p_i)\, dx}_{\text{attraction}} .

The two terms have an intuitive interpretation; a small numerical sketch is given after the following list.
• the first term is minimal if the points are far away from each other, implying a
repulsion;
• the second (negative) term becomes maximal if for large w(x), there are many
points positioned in this area; so it can be considered as an attraction steered
by w.
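As an illustration only (not the chapter's algorithm), the following Python sketch evaluates this repulsion-attraction objective for a one-dimensional toy image, with the image integral replaced by a Riemann sum over the pixel grid; the kernel choice K(x, y) = −|x − y| (conditionally positive definite of order 1, see below) and the synthetic image w are assumptions.

import numpy as np

def dither_objective(p, grid, w, K):
    # repulsion between the points plus attraction towards large values of w;
    # the image integral is replaced by a Riemann sum over the pixel grid
    M = p.shape[0]
    repulsion = 0.5 / M * K(p, p).sum()
    attraction = (w[None, :] * K(p, grid)).sum() / grid.size
    return repulsion - attraction

K = lambda X, Y: -np.abs(X[:, None] - Y[None, :])   # toy kernel K(x, y) = -|x - y|
grid = np.linspace(0.0, 1.0, 200)
w = np.exp(-50.0 * (grid - 0.3) ** 2)               # synthetic gray values, dark around 0.3
p = np.random.default_rng(1).random(20)             # current point positions
print(dither_objective(p, grid, w, K))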
In the following, we consider radial kernels

  K(x, y) := h\big( \mathrm{dist}_X(x, y) \big),

for example with

  h(r) = (1 - r)_+^p, \qquad p \geq \lfloor d/2 \rfloor + 1, \quad (17)

where ⌊a⌋ denotes the largest integer less than or equal to a ∈ R and a_+ := \max(a, 0). In connection with Wasserstein distances, we are interested in the (negative) powers of distances K(x, y) = -\|x - y\|^p, p > 0, related to the functions h(r) = -r^p. Unfortunately, all these functions are not positive definite! By (17), we know that K̃(x, y) = 1 − |x − y| is positive definite in one dimension d = 1. A more general result for the Euclidean distance is given in the following proposition:
  \tilde K(x, y) := C - \|x - y\|
Proof. In Gräf (2013, Cor. 2.15), it was shown that K̃ is positive definite. The rest
follows in a straightforward way from (15) and (14) regarding that μ and ν are
probability measures.
A symmetric kernel K ∈ C(X × X), X ⊂ R^d, is called conditionally positive definite of order m if

  \sum_{i,j=1}^{n} a_i a_j K(x_i, x_j) \geq 0 \quad (18)

holds for all n ∈ N, all points x_1, . . . , x_n ∈ X, and all (a_j)_{j=1}^n ∈ R^n satisfying

  \sum_{i=1}^{n} a_i P(x_i) = 0 \quad \text{for all } P \in \Pi_{m-1}(\mathbb R^d).
The radial kernels related to the following functions are strictly conditionally positive definite of order m on R^d:

  h(r) = (-1)^{\lceil s \rceil} (c^2 + r^2)^{s}, \quad s > 0, \ s \notin \mathbb N, \quad m = \lceil s \rceil,
  h(r) = (-1)^{\lceil s/2 \rceil} r^{s}, \quad s > 0, \ s \notin 2\mathbb N, \quad m = \lceil s/2 \rceil,
  h(r) = (-1)^{k+1} r^{2k} \log r, \quad k \in \mathbb N, \quad m = k + 1,

where ⌈a⌉ denotes the smallest integer larger than or equal to a ∈ R. The first group of functions is called multiquadrics and the last group is known as thin plate splines. In connection with Wasserstein distances, the second group of functions is of interest.
By the following lemma, it is easy to turn conditionally positive definite func-
tions into positive definite ones. However, only for conditionally positive definite
functions of order m = 1 does the discrepancy remain the same.

Lemma 1. Let Ξ := {u_k : k = 1, . . . , N} with N := \binom{d+m-1}{m-1} be a set of points such that the condition P(u_k) = 0 for all k = 1, . . . , N, P ∈ \Pi_{m-1}(\mathbb R^d), is fulfilled only by the zero polynomial. Denote by {P_k : k = 1, . . . , N} the set of Lagrangian basis polynomials with respect to Ξ, i.e., P_k(u_j) = δ_{jk}. Let K ∈ C(X × X) be a symmetric conditionally positive definite kernel of order m.
(i) Then

  \tilde K(x, y) := K(x, y) - \sum_{j=1}^{N} P_j(x) K(u_j, y) - \sum_{k=1}^{N} P_k(y) K(x, u_k) + \sum_{j,k=1}^{N} P_j(x) P_k(y) K(u_j, u_k)
and
where
Proof.
  \mathcal D^2_{\tilde K}(\mu, \nu)
  = \mathcal D^2_{K}(\mu, \nu) - \sum_{j=1}^{N} p_j (c_{\mu,j} + c_{\nu,j}) - \sum_{k=1}^{N} p_k (c_{\mu,k} + c_{\nu,k}) + 2 \sum_{j,k=1}^{N} p_j p_k K(u_j, u_k)
  \quad + \sum_{j=1}^{N} p_j (c_{\mu,j} + c_{\nu,j}) + \sum_{k=1}^{N} p_k (c_{\mu,k} + c_{\nu,k}) - 2 \sum_{j,k=1}^{N} p_j p_k K(u_j, u_k)
  = \mathcal D^2_{K}(\mu, \nu).
(iii) Let m = 1. Then we have for the optimal dual potential in (14) related to DK̃
that
Optimal Transport and Wasserstein Distances

For c ∈ C(X × X), the optimal transport problem reads

  \mathrm{OT}(\mu, \nu) := \min_{\pi \in \Pi(\mu, \nu)} \int_{X^2} c \, d\pi, \quad (21)

where Π(μ, ν) denotes the set of joint probability measures π on X² with marginals μ and ν. In our setting, the OT functional π ↦ \int_{X^2} c \, d\pi is weakly continuous, (21) has a solution, and every such minimizer π̂ is called an optimal transport plan. In
general, we cannot expect the optimal transport plan to be unique. However, if X is
a compact subset of a separable Hilbert space, c(x, y) = \|x - y\|_X^p, p ∈ (1, ∞),
and either μ or ν is regular (see Ambrosio et al. (2005, Def. 6.2.2) for the technical
definition), then (21) has a unique solution. Instead of giving the exact definition, we
want to remark that for X = Rd the regular measures are precisely the ones which
have a density with respect to the Lebesgue measure.
The c-transform ϕ c ∈ C(X) of ϕ ∈ C(X) is defined as
  \varphi^c(y) = \min_{x \in X} \big( c(x, y) - \varphi(x) \big).
x∈X
Note that ϕ c has the same Lipschitz constant as c. A function ϕ c ∈ C(X) is called
c-concave if it is the c-transform of some function ϕ ∈ C(X).
The dual formulation of the OT problem (21) reads

  \mathrm{OT}(\mu, \nu) = \sup \Big\{ \int_X \varphi \, d\mu + \int_X \psi \, d\nu : (\varphi, \psi) \in C(X)^2, \ \varphi(x) + \psi(y) \leq c(x, y) \Big\}. \quad (22)
Maximizing pairs are essentially of the form (ϕ, ψ) = (ϕ̂, ϕ̂ c ) for some c-concave
function ϕ̂ and fulfill ϕ̂(x) + ϕ̂ c (y) = c(x, y) in supp(π̂ ), where π̂ is any optimal
transport plan. The function ϕ̂ is called (Kantorovich) potential for the couple
(μ, ν). If (ϕ̂, ψ̂) is an optimal pair, clearly also (ϕ̂ − C, ψ̂ + C) with C ∈ R
is optimal, and manipulations outside of supp(μ) and supp(ν) do not change the
functional value. But even if we exclude such manipulations, the optimal dual
potentials are in general not unique as Example 1 shows.
  \hat\varphi_1(x) = \begin{cases} 0.1 - x & \text{for } x \in [0, 0.1], \\ x - 0.9 & \text{for } x \in [0.9, 1], \\ 0 & \text{else}, \end{cases}
  \qquad
  \hat\varphi_2(x) = \begin{cases} 0.2 - x & \text{for } x \in [0, 0.2], \\ x - 0.9 & \text{for } x \in [0.9, 1], \\ 0 & \text{else}. \end{cases}
Remark 2. Note that the space C(X)2 in the dual problem could also be replaced
with C(supp(μ)) × C(supp(ν)). Using the Tietze extension theorem, any feasible
point of the restricted problem can be extended to a feasible point of the original
problem, and hence the problems coincide. If the problem is restricted, all other
concepts have to be adapted accordingly.
For p ∈ [1, ∞), the Wasserstein-p distance is defined as

  W_p(\mu, \nu) := \Big( \min_{\pi \in \Pi(\mu,\nu)} \int_{X^2} \mathrm{dist}(x, y)^p \, d\pi(x, y) \Big)^{1/p} .
It is a metric on P(X), which metrizes the weak topology. Indeed, due to compact-
ness of X, we have that μ_k ⇀ μ if and only if \lim_{k\to\infty} W_p(\mu_k, \mu) = 0.
For 1 ≤ p ≤ q < ∞, it holds Wp ≤ Wq . The distance W1 is also called
Kantorovich-Rubinstein distance or Earth’s mover distance. Here, it holds ϕ c = −ϕ
and the dual problem reads

  W_1(\mu, \nu) = \max_{\mathrm{Lip}(\varphi) \leq 1} \int_X \varphi \, d(\mu - \nu),
where the maximum is taken over all Lipschitz continuous functions with Lipschitz
constant bounded by 1. This looks similar to the discrepancy (13), but the space of
test functions is larger for W1 .
The distance W1 is related to Wp by
  W_1(\mu, \nu) \leq W_p(\mu, \nu) \leq C \, W_1(\mu, \nu)^{1/p} .
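On the real line, W_1 reduces to the L¹ distance between the cumulative distribution functions, so it can be computed in a few lines. The following numpy snippet is a self-contained illustration for two discrete measures supported on a common sorted grid; it is not part of this chapter's algorithms.

import numpy as np

def w1_on_grid(x, mu, nu):
    # W_1 between probability vectors mu, nu on the sorted grid x: integral of |F_mu - F_nu|
    cdf_diff = np.cumsum(mu - nu)[:-1]          # CDF difference on each grid interval
    return float(np.sum(np.abs(cdf_diff) * np.diff(x)))

x = np.linspace(0.0, 1.0, 101)
mu = np.full_like(x, 1.0 / x.size)              # uniform weights
nu = np.exp(-30.0 * (x - 0.7) ** 2); nu /= nu.sum()
print(w1_on_grid(x, mu, nu))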
Regularized Optimal Transport

For a regularization parameter ε > 0, the KL-regularized OT problem reads

  \mathrm{OT}_\varepsilon(\mu, \nu) := \min_{\pi \in \Pi(\mu, \nu)} \Big\{ \int_{X^2} c \, d\pi + \varepsilon \, \mathrm{KL}(\pi, \mu \otimes \nu) \Big\}. \quad (23)
Compared to the original OT problem, we will see in the numerical part that OTε
can be efficiently solved numerically; see also Cuturi and Peyré (2019). Moreover,
OTε has the following properties:
Lemma 2.
(i) There is a unique minimizer π̂ε ∈ P(X2 ) of (23) with finite value.
(ii) The function OTε is weakly continuous and Fréchet differentiable.
(iii) For any μ, ν ∈ P(X) and ε_1, ε_2 ∈ [0, ∞] with ε_1 ≤ ε_2, it holds \mathrm{OT}_{\varepsilon_1}(\mu, \nu) \leq \mathrm{OT}_{\varepsilon_2}(\mu, \nu).
Proof.
(i) First, note that μ ⊗ ν is a feasible point and hence the infimum is finite.
Existence of minimizers follows as the functional is weakly lsc and Π(μ, ν) ⊂
P(X2 ) is weakly compact. Uniqueness follows since KL(·, μ ⊗ ν) is strictly
convex.
(ii) The proof uses the dual formulation in Proposition 3; see Feydy et al. (2019,
Prop. 2).
(iii) Let π̂ε2 be the minimizer for OTε2 (μ, ν). Then, it holds
Note that in special cases, e.g., for absolutely continuous measures (see Carlier
et al. (2017) and Léonard (2012)), it is possible to show convergence of the optimal
solutions π̂ε to an optimal solution of OT(μ, ν) as ε → 0. However, we are not
aware of a fully general result. An extension of entropy regularization to unbalanced
OT is discussed in Chizat et al. (2018).
Originally, entropic regularization was proposed in Cuturi (2013) for discrete
probability measures with the negative entropy E (see also Peyré (2015)),
  \widetilde{\mathrm{OT}}_\varepsilon(\mu, \nu) := \min_{\pi \in \Pi(\mu, \nu)} \Big\{ \int_{X^2} c \, d\pi + \varepsilon E(\pi) \Big\},

where, for discrete π = (p_{ij})_{i,j=1}^n,

  E(\pi) := \sum_{i,j=1}^{n} \log(p_{ij}) \, p_{ij} = \mathrm{KL}(\pi, \lambda \otimes \lambda),
where λ denotes the counting measure. For π ∈ (μ, ν), it is easy to check that
  E(\pi) = \mathrm{KL}(\pi, \mu \otimes \nu) + \sum_{i,j=1}^{n} \log(\mu_i \nu_j) \, \mu_i \nu_j,
i.e., the minimizers are independent of the chosen regularization. For non-discrete
measures, special care is necessary as the following remark shows:
  \pi \ll \mu \otimes \nu \iff \pi \ll \lambda \otimes \lambda,
where the right implication follows directly and the left one can be seen as follows:
If π ≪ λ ⊗ λ with density σ_π ∈ L^1(X × X), then

  0 = \int_{\{ z \in X : \sigma_\mu(z) = 0 \}} \int_X \sigma_\pi(x, y) \, dy \, dx.
Hence,

  \pi = \sigma_\pi \, (\lambda \otimes \lambda) = \frac{\sigma_\pi(x, y)}{\sigma_\mu(x) \sigma_\nu(y)} \, (\mu \otimes \nu),
where the quotient is defined as zero if σμ or σν vanish. Hence, the left implication
also holds true.
If KL(μ ⊗ ν, λ ⊗ λ) < ∞, we conclude for any π ≪ λ ⊗ λ with π ∈ Π(μ, ν) that the following expressions are well-defined:

  \mathrm{KL}(\pi, \lambda \otimes \lambda) - \mathrm{KL}(\mu \otimes \nu, \lambda \otimes \lambda)
  = \int_{X^2} \log(\sigma_\pi) \, d\pi - \int_{X^2} \log\Big( \frac{d(\mu \otimes \nu)}{d(\lambda \otimes \lambda)} \Big) \, d(\mu \otimes \nu)
  = \mathrm{KL}(\pi, \mu \otimes \nu) + \int_{X^2} \log\big( \sigma_\mu(x) \sigma_\nu(y) \big) \, d\pi(x, y) - \int_{X^2} \log\big( \sigma_\mu(x) \sigma_\nu(y) \big) \, d\mu(x) \, d\nu(y)
  = \mathrm{KL}(\pi, \mu \otimes \nu).
Proposition 2.
Proof.
and consequently lim supε→∞ OTε (μ, ν) ≤ OT∞ (μ, ν). In particular, the
optimal transport plan π̂ε satisfies lim supε→∞ εKL(π̂ε , μ ⊗ ν) ≤ OT∞ (μ, ν).
Since KL is weakly lsc, we conclude that the sequence of minimizers π̂ε
satisfies π̂_ε ⇀ μ ⊗ ν as ε → ∞. Hence, we obtain the desired result from
(ii) This part is more involved and follows from Proposition 5 (ii).
Similar as OT in (22), its regularized version OTε can be written in dual form;
see Chizat et al. (2018) and Clason et al. (2019):

  \mathrm{OT}_\varepsilon(\mu, \nu) = \sup_{(\varphi, \psi) \in C(X)^2} \int_X \varphi \, d\mu + \int_X \psi \, d\nu - \varepsilon \int_{X^2} \Big( \exp\Big( \frac{\varphi(x) + \psi(y) - c(x, y)}{\varepsilon} \Big) - 1 \Big) \, d(\mu \otimes \nu). \quad (24)
If optimal dual solutions ϕ̂_ε and ψ̂_ε exist, they are related to the optimal transport plan π̂_ε by

  \frac{d\hat\pi_\varepsilon}{d(\mu \otimes \nu)}(x, y) = \exp\Big( \frac{\hat\varphi_\varepsilon(x) + \hat\psi_\varepsilon(y) - c(x, y)}{\varepsilon} \Big). \quad (25)
In order to derive (24) from (2), set

  F(\varphi, \psi) = \int_X \varphi \, d\mu + \int_X \psi \, d\nu, \qquad
  G(\varphi) = \varepsilon \int_{X^2} \Big( \exp\Big( \frac{\varphi - c}{\varepsilon} \Big) - 1 \Big) \, d(\mu \otimes \nu), \qquad
  A(\varphi, \psi)(x, y) = \varphi(x) + \psi(y).

Then, (24) has the form of the left-hand side in (2). Incorporating (7), we get

  G^*(\pi) = \int_{X^2} c \, d\pi + \varepsilon \, \mathrm{KL}(\pi, \mu \otimes \nu).
  = \iota_{\Pi(\mu,\nu)}(\pi).

  \hat\varphi_\varepsilon(x) + \hat\psi_\varepsilon(y) = c(x, y) + \log\Big( \frac{d\hat\pi_\varepsilon}{d(\mu \otimes \nu)}(x, y) \Big),
Remark 4. Using the Tietze extension theorem, we could also replace the space
C(X)2 by C(supp(μ)) × C(supp(ν)).
Note that the last term in (24) is a smoothed version of the associated constraint
ϕ(x) + ψ(y) ≤ c(x, y) appearing in (22). Clearly, the values of ϕ and ψ are only
relevant on supp(μ) and supp(ν), respectively. Further, for any ϕ, ψ ∈ C(X) and
C ∈ R, the potentials ϕ + C, ψ − C realize the same value in (24).
For fixed ϕ or ψ, the corresponding maximizing potentials in (24) are given by

  T_{\mu,\varepsilon}(\varphi)(x) := -\varepsilon \log \int_X \exp\Big( \frac{\varphi(y) - c(x, y)}{\varepsilon} \Big) \, d\mu(y), \qquad
  T_{\nu,\varepsilon}(\psi)(x) := -\varepsilon \log \int_X \exp\Big( \frac{\psi(y) - c(x, y)}{\varepsilon} \Big) \, d\nu(y). \quad (26)

Therefore, any pair of optimal potentials ϕ̂_ε and ψ̂_ε must satisfy ψ̂_ε = T_{μ,ε}(ϕ̂_ε) and ϕ̂_ε = T_{ν,ε}(ψ̂_ε).
For every ϕ ∈ C(X) and C ∈ R, it holds Tμ,ε (ϕ + C) = Tμ,ε (ϕ) + C. Hence, Tμ,ε
can be interpreted as an operator on the quotient space C(X)/R, where f1 , f2 ∈
C(X) are equivalent if they differ by a real constant. This space can be equipped
with the oscillation norm
  \|f\|_{\circ,\infty} := \tfrac{1}{2} \big( \max f - \min f \big),
Lemma 3.
(i) For any measure μ ∈ P (X), ε > 0, and ϕ ∈ C(X), the function Tμ,ε (ϕ) ∈ C(X)
has the same Lipschitz constant as c and satisfies
  T_{\mu,\varepsilon}(\varphi)(x) \in \Big[ \min_{y \in \mathrm{supp}(\mu)} c(x, y) - \varphi(y), \ \max_{y \in \mathrm{supp}(\mu)} c(x, y) - \varphi(y) \Big]. \quad (27)
(ii) For fixed μ ∈ P(X), the operator Tμ,ε : C(supp(μ)) → C(X) is 1-Lipschitz.
Additionally, the operator Tμ,ε : C(supp(μ))/R → C(X)/R is κ-Lipschitz with
κ < 1.
Proof.
(i) Since c has Lipschitz constant L, it holds c(x_2, y) ≤ c(x_1, y) + L|x_1 − x_2|, so that

  \int_X \exp\Big( \frac{\varphi(y) - c(x_2, y)}{\varepsilon} \Big) \, d\mu(y)
  \leq \exp\Big( \frac{L}{\varepsilon} |x_1 - x_2| \Big) \int_X \exp\Big( \frac{\varphi(y) - c(x_1, y)}{\varepsilon} \Big) \, d\mu(y).

Hence, we obtain

  T_{\mu,\varepsilon}(\varphi)(x_1) - T_{\mu,\varepsilon}(\varphi)(x_2) \leq \varepsilon \log \exp\Big( \frac{L}{\varepsilon} |x_1 - x_2| \Big) = L |x_1 - x_2|.
  T_{\mu,\varepsilon}(\varphi_1)(x) - T_{\mu,\varepsilon}(\varphi_2)(x)
  = \int_0^1 \frac{d}{dt} T_{\mu,\varepsilon}\big( \varphi_1 + t (\varphi_2 - \varphi_1) \big)(x) \, dt \quad (28)
  = \int_0^1 \int_X \big( \varphi_1(z) - \varphi_2(z) \big) \rho_{t,x}(z) \, d\mu(z) \, dt

with

  \rho_{t,x} := \frac{ \exp\big( ( t \varphi_2 + (1 - t) \varphi_1 - c(x, \cdot) ) / \varepsilon \big) }{ \int_X \exp\big( ( t \varphi_2(z) + (1 - t) \varphi_1(z) - c(x, z) ) / \varepsilon \big) \, d\mu(z) } .
In order to show the second claim, we choose representatives ϕ_1 and ϕ_2 such that ‖ϕ_1 − ϕ_2‖_∞ = ‖ϕ_1 − ϕ_2‖_{◦,∞}. Given x, y ∈ X, we conclude using (28) that

  \tfrac{1}{2} \big| T_{\mu,\varepsilon}(\varphi_1)(x) - T_{\mu,\varepsilon}(\varphi_2)(x) - T_{\mu,\varepsilon}(\varphi_1)(y) + T_{\mu,\varepsilon}(\varphi_2)(y) \big|
  = \tfrac{1}{2} \Big| \int_0^1 \int_X \big( \varphi_1(z) - \varphi_2(z) \big) \big( \rho_{t,x}(z) - \rho_{t,y}(z) \big) \, d\mu(z) \, dt \Big|
  \leq \tfrac{1}{2} \| \varphi_1 - \varphi_2 \|_{\circ,\infty} \int_0^1 \| \rho_{t,x} - \rho_{t,y} \|_{L^1(\mu)} \, dt. \quad (29)
and similarly for z ∈ X with pt,y (z) ≥ pt,x (z). Hence, we obtain
Proposition 4. The optimal potentials ϕ̂ε , ψ̂ε ∈ C(X) exist and are unique on
supp(μ) and supp(ν), respectively (up to the additive constant).
which are Lipschitz continuous with the same constant as c by Lemma 3 (i) and
therefore uniformly equi-continuous. Next, we can choose some x0 ∈ supp(μ)
and w.l.o.g. assume ψ̃n (x0 ) = 0. Due to the uniform Lipschitz continuity, the
potentials ψ̃n are uniformly bounded, and by (27), the same holds true for ϕ̃n .
Now, the theorem of Arzelà–Ascoli implies that both sequences contain convergent
subsequences. Since the functional in (24) is continuous, we can readily infer the
existence of optimal potentials ϕ̂ε , ψ̂ε ∈ C(X). Due to the uniqueness of π̂ε , (25)
implies that ϕ̂ε |supp(μ) and ψ̂ε |supp(ν) are uniquely determined up to an additive
constant.
Combining the optimality condition (26) and (24), we directly obtain relation (30) for any pair of optimal solutions. By fixing the additive constant such that

  \int_X \varphi \, d\mu = \tfrac{1}{2} \, \mathrm{OT}_\infty(\mu, \nu), \quad (31)
the restricted optimal potentials ϕ̂ε |supp(μ) and ψ̂ε |supp(ν) are unique. The next
proposition investigates the limits of the potentials as ε → 0 and ε → ∞.
Proposition 5.
(i) If (31) is satisfied, the restricted potentials ϕ̂_ε|_{supp(μ)} and ψ̂_ε|_{supp(ν)} converge uniformly for ε → ∞ to

  \hat\varphi_\infty(x) := \int_X c(x, y) \, d\nu(y) - \tfrac{1}{2} \mathrm{OT}_\infty(\mu, \nu)
  \quad \text{and} \quad
  \hat\psi_\infty(y) := \int_X c(x, y) \, d\mu(x) - \tfrac{1}{2} \mathrm{OT}_\infty(\mu, \nu),

respectively.
(ii) For ε → 0, every accumulation point of (ϕ̂ε |supp(μ) , ψ̂ε |supp(ν) ) can be
extended to an optimal dual pair for OT(μ, ν) satisfying (31). In particular,
limε→0 OTε (μ, ν) = OT(μ, ν).
Proof.
(i) Since X is bounded, the Lipschitz continuity of the potentials together with (31)
implies that all ϕ̂ε are uniformly bounded on supp(μ). Then, we conclude for
y ∈ supp(ν), using l'Hôpital's rule, dominated convergence, and (31), that

  \lim_{\varepsilon \to \infty} \hat\psi_\varepsilon(y)
  = \lim_{\varepsilon \to \infty} - \frac{ \int_X \big( \hat\varphi_\varepsilon(x) - c(x, y) \big) \exp\big( ( \hat\varphi_\varepsilon(x) - c(x, y) ) / \varepsilon \big) \, d\mu(x) }{ \int_X \exp\big( ( \hat\varphi_\varepsilon(x) - c(x, y) ) / \varepsilon \big) \, d\mu(x) }
  = \lim_{\varepsilon \to \infty} \int_X c(x, y) \exp\big( ( \hat\varphi_\varepsilon(x) - c(x, y) ) / \varepsilon \big) - \hat\varphi_\varepsilon(x) \exp\big( ( \hat\varphi_\varepsilon(x) - c(x, y) ) / \varepsilon \big) \, d\mu(x)
  = \int_X c(x, y) \, d\mu(x) - \lim_{\varepsilon \to \infty} \int_X \hat\varphi_\varepsilon(x) \Big( \exp\big( ( \hat\varphi_\varepsilon(x) - c(x, y) ) / \varepsilon \big) - 1 \Big) + \hat\varphi_\varepsilon(x) \, d\mu(x)
  = \int_X c(x, y) \, d\mu(x) - \tfrac{1}{2} \mathrm{OT}_\infty(\mu, \nu).
Again, a similar reasoning, incorporating (27), can be applied for ϕ̂ε . Finally,
note that pointwise convergence of uniformly Lipschitz continuous functions on
compact sets implies uniform convergence.
(ii) By continuity of the integral, we can directly infer that (31) is satisfied for any
accumulation point. Note that for any fixed ϕ ∈ C(X), x ∈ X, and ε → 0, it holds

  T_{\mu,\varepsilon}(\varphi)(x) \to \min_{y \in \mathrm{supp}(\mu)} \big( c(x, y) - \varphi(y) \big),
see Feydy et al. (2019, Prop. 9), which by uniform Lipschitz continuity of
Tμ,ε (ϕ) directly implies the convergence in C(X). Let {(ϕ̂εj , ψ̂εj )}j be a
subsequence converging to (ϕ̂0 , ψ̂0 ) ∈ C(supp(μ)) × C(supp(ν)). Then, we
have
and we conclude
Similarly, we get
  \lim_{j \to \infty} \mathrm{OT}_{\varepsilon_j}(\mu, \nu) = \int_X \hat\varphi_0 \, d\mu + \int_X \hat\psi_0 \, d\nu \leq \mathrm{OT}(\mu, \nu) \leq \lim_{j \to \infty} \mathrm{OT}_{\varepsilon_j}(\mu, \nu).
Hence, the extended potentials are optimal for (22). Since the subsequence choice
was arbitrary, this also shows Proposition 2 (ii).
Sinkhorn Divergence
The regularized functional OT_ε is biased, i.e., in general min_ν OT_ε(ν, μ) ≠ OT_ε(μ, μ). Hence, the usage as a distance measure is meaningless, which motivates the introduction of the Sinkhorn divergence

  S_\varepsilon(\mu, \nu) := \mathrm{OT}_\varepsilon(\mu, \nu) - \tfrac{1}{2} \mathrm{OT}_\varepsilon(\mu, \mu) - \tfrac{1}{2} \mathrm{OT}_\varepsilon(\nu, \nu).
Indeed, it was shown that Sε is nonnegative and biconvex and metrizes the
convergence in law under mild assumptions (Feydy et al. 2019). Clearly, we have
S0 = OT. By (14) and Proposition 5, we obtain the following corollary:
  \hat\varphi_K = \frac{ \hat\varphi_\infty - \hat\psi_\infty }{ \| \hat\varphi_\infty - \hat\psi_\infty \|_{\mathcal H_K(X)} } .
Note that (12) already implies that for the chosen c, it holds ϕ̂∞ , ψ̂∞ ∈ HK (X).
By Corollary 1, we have for c(x, y) := −K(x, y) that S_∞(μ, ν) = \tfrac12 \mathcal D_K^2(μ, ν) if K ∈ C(X × X) is symmetric, positive definite. For the cost c(x, y) = ‖x − y‖^p of the
classical p-Wasserstein distance, we have already seen in section “Discrepancies”
that K(x, y) = −c(x, y) is not positive definite. However, at least for p = 1 the
kernel is conditionally positive definite of order 1 and can be tuned by Proposition 1
to a positive definite kernel by adding a constant, which changes the value of
neither the discrepancy nor the optimal dual potential. More generally, we have the
following corollary:
  \hat\varphi_\infty(x) = \int_X -K(x, y) \, d\nu(y) + \frac{1}{2} \int_{X^2} K \, d(\mu \otimes \nu) + K(x, \xi) + \frac{1}{2} \big( c_\nu - c_\mu - K(\xi, \xi) \big),

  \hat\psi_\infty(y) = \int_X -K(x, y) \, d\mu(x) + \frac{1}{2} \int_{X^2} K \, d(\mu \otimes \nu) + K(\xi, y) + \frac{1}{2} \big( c_\mu - c_\nu - K(\xi, \xi) \big),
The importance of Γ-convergence lies in the fact that every cluster point of minimizers of {F_n}_{n∈N} is a minimizer of F.
Proposition 6. It holds that S_ε(·, ν) Γ-converges to S_∞(·, ν) as ε → ∞ and to OT(·, ν) as ε → 0.
Proof. In both cases, the lim sup-inequality follows from Proposition 2 by choosing
for some fixed μ ∈ P(X) the constant sequence μn = μ, n ∈ N.
Concerning the lim inf-inequality, we first treat the case ε → ∞. Let μ_n ⇀ μ and ε_n → ∞. Since OT_ε(μ, ν) is increasing with ε, it holds for every fixed m ∈ N that

  \liminf_{n \to \infty} S_{\varepsilon_n}(\mu_n, \nu)
  = \liminf_{n \to \infty} \Big( \mathrm{OT}_{\varepsilon_n}(\mu_n, \nu) - \tfrac12 \mathrm{OT}_{\varepsilon_n}(\mu_n, \mu_n) - \tfrac12 \mathrm{OT}_{\varepsilon_n}(\nu, \nu) \Big)
  \geq \liminf_{n \to \infty} \Big( \mathrm{OT}_{m}(\mu_n, \nu) - \tfrac12 \mathrm{OT}_{\infty}(\mu_n, \mu_n) - \tfrac12 \mathrm{OT}_{\infty}(\nu, \nu) \Big).
Numerical Approach and Examples

In this section, we discuss the Sinkhorn algorithm for computing OT_ε based on
the (pre)-dual form (24) and show some numerical examples. As pointed out in
Remark 4, we can restrict the potentials and the update operator (26) to supp(μ)
and supp(ν), respectively. In particular, this restriction results in a discrete problem
if both input measures are atomic. For a fixed starting iterate ψ^{(0)}, the Sinkhorn algorithm iterates are defined as

  \varphi^{(i+1)} = T_{\nu,\varepsilon}\big( \psi^{(i)} \big), \qquad \psi^{(i+1)} = T_{\mu,\varepsilon}\big( \varphi^{(i+1)} \big).
Equivalently, we could rewrite the scheme with just one potential and the following
update ψ (i+1) = Tμ,ε ◦ Tν,ε (ψ (i) ). According to Lemma 3, the operator Tμ,ε ◦ Tν,ε
is contractive, and hence the Banach fixed point theorem implies that the algorithm
converges linearly. Note that it suffices to enforce the additional constraint (31) after
the Sinkhorn scheme by adding an appropriately chosen constant. Then, the value
of OTε (μ, ν) can be computed from the optimal potentials using (30). Here, we
do not want to go into more detail on implementation issues, since this is not the
main scope of this chapter. The numerical examples merely serve as an illustration
of the theoretical results. All computations in this section are performed using
GEOMLOSS, a publicly available PyTorch implementation for regularized optimal
transport. Implementation details can be found in Feydy et al. (2019) and in the
corresponding GitHub repository.
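For orientation, a self-contained log-domain version of the alternating update (26) for two atomic measures can be written in a few lines of Python. This is only a sketch, not the GEOMLOSS implementation used for the figures: the cost |x − y| on the real line, the fixed iteration count instead of a convergence test, and the omission of the normalization (31) are assumptions.

import numpy as np
from scipy.special import logsumexp

def sinkhorn_potentials(x, y, mu, nu, eps, iters=500):
    C = np.abs(x[:, None] - y[None, :])            # cost matrix c(x_i, y_j) = |x_i - y_j|
    log_mu, log_nu = np.log(mu), np.log(nu)
    phi, psi = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(iters):
        # psi = T_{mu,eps}(phi): psi_j = -eps * log sum_i mu_i exp((phi_i - c_ij)/eps)
        psi = -eps * logsumexp((phi[:, None] - C) / eps + log_mu[:, None], axis=0)
        # phi = T_{nu,eps}(psi): phi_i = -eps * log sum_j nu_j exp((psi_j - c_ij)/eps)
        phi = -eps * logsumexp((psi[None, :] - C) / eps + log_nu[None, :], axis=1)
    # at the fixed point the exponential term in (24) integrates to one, so the dual value is
    value = float(phi @ mu + psi @ nu)
    return phi, psi, value

rng = np.random.default_rng(0)
x, y = rng.random(30), rng.random(40)
print(sinkhorn_potentials(x, y, np.full(30, 1 / 30), np.full(40, 1 / 40), eps=0.1)[2])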
We observe that the values converge as shown in Proposition 2 and that the
change mainly happens in the interval [10−2 , 101 ]. Additionally, the numerical
results indicate Sε1 (μ, ν) ≤ Sε2 (μ, ν) for ε1 > ε2 , which is the opposite behavior
as for OTε where the energies increase; see Lemma 2 (iii). So far we are not aware
of any theoretical result in this direction for Sε (μ, ν).
Fig. 2 Energy values between S0 and S∞ for two given measures on [0, 1] and cost function
c(x, y) = |x − y|. Every blue dot corresponds to the position and the weight of a Dirac measure.
(a) Measure μ. (b) Measure ν. (c) Values Sε (μ, ν) for increasing ε
Next, we investigate the behavior of the corresponding optimal potentials ϕ̂ε and
ψ̂ε in (24). The convergence of the potentials as shown in Proposition 5 (iii) is
numerically verified in Fig. 3. Further, the corresponding potentials ϕ̂ε are depicted
in Fig. 4, and the differences ϕ̂ε −ψ̂ε are depicted in Fig. 5. According to Corollary 1,
this difference is related to the optimal potential ϕ̂K in the dual formulation of
the related discrepancy. The shape of the potentials ranges from something almost
linear for small ε to something more quadratic for large ε. Again, we observe that
the changes mainly happen for ε in the interval [10−2 , 101 ] and that numerical
instabilities start to occur for ε > 103 . For small values of ε, we actually observe
numerical convergence and that the relation ψ̂ε ≈ −ϕ̂ε holds true; see Fig. 3c.
This fits the theoretical findings for W1 (μ, ν) in section “Optimal Transport and
Wasserstein Distances”.
Fig. 3 Numerical verification of Prop. 5 and of ψ̂ε ≈ −ϕ̂ε for small ε. (a) supsupp(μ) |ϕ̂ε − ϕ̂∞ |
for increasing values of ε. (b) supsupp(ν) |ψ̂ε − ψ̂∞ | for increasing values of ε. (c) ϕ̂1e−4 + ψ̂1e−4
For solving this problem, we can equivalently minimize over the positions of the
equally weighted Dirac spikes in ν. Hence, we need the gradient of Sε with respect
to these positions. If ε = ∞, this gradient is given by an analytic expression.
Otherwise, we can apply automatic differentiation tools to the Sinkhorn algorithm
in order to compute a numerical gradient; see Feydy et al. (2019) for more details.
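The gradient-based approach can be imitated with a short PyTorch sketch that differentiates through a plain log-domain Sinkhorn loop and moves the point positions with a quasi-Newton optimizer. This is a toy one-dimensional stand-in for the GEOMLOSS-based experiments (which use a debiased, multiscale implementation); the cost |x − y|, the parameter values, and the crude fixed-point stopping are assumptions.

import torch

def ot_eps(x, y, mu, nu, eps, iters=100):
    # log-domain Sinkhorn for OT_eps between atomic 1D measures (autograd-friendly)
    C = (x[:, None] - y[None, :]).abs()
    log_mu, log_nu = mu.log(), nu.log()
    phi = torch.zeros_like(mu)
    for _ in range(iters):
        psi = -eps * torch.logsumexp((phi[:, None] - C) / eps + log_mu[:, None], dim=0)
        phi = -eps * torch.logsumexp((psi[None, :] - C) / eps + log_nu[None, :], dim=1)
    # assuming the loop has (approximately) reached its fixed point, the dual value reduces to
    return phi @ mu + psi @ nu

def sinkhorn_divergence(x, y, mu, nu, eps):
    return ot_eps(x, y, mu, nu, eps) - 0.5 * ot_eps(x, x, mu, mu, eps) - 0.5 * ot_eps(y, y, nu, nu, eps)

torch.manual_seed(0)
target_x = torch.randn(500) * 0.3 + 0.5              # samples of the fixed measure mu
target_w = torch.full((500,), 1.0 / 500)
points = torch.rand(30, requires_grad=True)           # positions of the equally weighted point measure
weights = torch.full((30,), 1.0 / 30)
opt = torch.optim.LBFGS([points], max_iter=100)

def closure():
    opt.zero_grad()
    loss = sinkhorn_divergence(target_x, points, target_w, weights, eps=0.1)
    loss.backward()
    return loss

opt.step(closure)
print(sinkhorn_divergence(target_x, points.detach(), target_w, weights, eps=0.1).item())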
Here, it is important to ensure high enough numerical precision and to perform
enough Sinkhorn iterations. In any case, the gradient serves as input for the
L-BFGS-B (quasi-Newton) method in which the Hessian is approximated in a
memory-efficient way (Byrd et al. 1995). The numerical results are depicted in
Fig. 6, where all examples are iterated to high numerical precision. Numerically,
Fig. 4 Optimal potentials ϕ̂ε in OTε (μ, ν) for increasing values of ε. (a) ϕ̂0.02 . (b) ϕ̂0.08 . (c) ϕ̂0.32 .
(d) ϕ̂1.28 . (e) ϕ̂81.92 . (f) ϕ̂∞
Fig. 5 Difference ϕ̂ε − ψ̂ε of optimal potentials in OTε (μ, ν) for increasing ε, where the
normalized function ϕ̂∞ − ψ̂∞ coincides with the optimal dual potential ϕ̂K in the discrepancy by
Corollary 2. (a) ϕ̂0.02 −ψ̂0.02 . (b) ϕ̂0.08 −ψ̂0.08 . (c) ϕ̂0.32 −ψ̂0.32 . (d) ϕ̂1.28 −ψ̂1.28 . (e) ϕ̂81.92 −ψ̂81.92 .
(f) ϕ̂∞ − ψ̂∞
  \sum_{k=0}^{N} \alpha_k \big| \hat\mu_k - \hat\nu_k \big|^2, \qquad N := 128,
as target functional; see Gräf et al. (2013). The value of S∞ for the Fourier
method is slightly larger than the result using optimization of S∞ directly. Since
the computational cost increases as ε gets smaller, we suggest to choose ε ≈ 1 or to directly stick with discrepancies. This also avoids that the approximation rates suffer from the so-called curse of dimensionality.

Fig. 6 Optimal approximations ν̂ and corresponding energies S_ε(μ, ν̂) for increasing ε. (a) Fixed measure μ. (b) S_{0.03}(μ, ν̂) = 1.303e−3. (c) S_{0.15}(μ, ν̂) = 1.071e−4. (d) S_{1.25}(μ, ν̂) = 1.491e−5. (e) S_∞(μ, ν̂) = 1.118e−5. (f) Fourier formulation (Ehler et al. 2019), S_∞(μ, ν̂) = 1.156e−5
Finally, note that we sampled μ with a lot more points than we used for the
dithering. If not enough points are used, we would observe clustering of the dithered
measure around the positions of μ. One possibility to avoid such a behavior for
Sε could be to use the semi-discrete approach described in Genevay et al. (2016),
avoiding any sampling of the measure μ. In the Fourier-based approach, this issue
was less pronounced.
Conclusions
Proposition 5 (ii), we were not able to show convergence of the whole sequence of
optimal potentials {(ϕ̂ε , ψ̂ε )}ε without further assumptions so far.
Basic Theorems
References
Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of
Probability Measures. Birkhäuser, Basel (2005)
Berman, R.J.: The Sinkhorn algorithm, parabolic optimal transport and geometric Monge–Ampère
equations. Numer. Math. 145(4), 771–836 (2020)
Braides, A.: -Convergence for Beginners. Oxford University Press, Oxford (2002)
Bredies, K., Lorenz, D.: Mathematische Bildverarbeitung. Vieweg+Teuber, Wiesbaden (2011)
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained
optimization. SIAM J. Sci. Comput. 16(5), 1190–1208 (1995)
Carlier, G., Duval, V., Peyré, G., Schmitzer, B.: Convergence of entropic schemes for optimal
transport and gradient flows. SIAM J. Math. Anal. 49(2), 1385–1418 (2017)
Chauffert, N., Ciuciu, P., Kahn, J., Weiss, P.: A projection method on measures sets. Constr.
Approx. 45(1), 83–111 (2017)
Chevallier, J.: Uniform decomposition of probability measures: quantization, clustering and rate of
convergence. J. Appl. Probab. 55(4), 1037–1045 (2018)
Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.-X.: Scaling algorithms for unbalanced optimal
transport problems. Math. Comput. 87(314), 2563–2609 (2018)
Clason, C., Lorenz, D., Mahler, H., Wirth, B.: Entropic regularization of continuous optimal
transport problems. arXiv:1906.01333 (2019)
Cominetti, R., San Martín, J.: Asymptotic analysis of the exponential penalty trajectory in linear
programming. Math. Program. 67(1–3), 169–187 (1994)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39(1),
1–49 (2002)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in
Neural Information Processing Systems, vol. 26, pp. 2292–2300 (2013)
Cuturi, M., Peyré, G.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6),
355–607 (2019)
Delsarte, P., Goethals, J.M., Seidel, J.J.: Spherical codes and designs. Geom. Dedicata 6, 363–388
(1977)
Di Marino, S., Gerolin, A.: An optimal transport approach for the Schrödinger bridge problem and
convergence of Sinkhorn algorithm. arXiv:1911.06850 (2019)
Dziugaite, G.K., Roy, D.M., Ghahramani, Z.: Training generative neural networks via maximum
mean discrepancy optimization. In: Proceedings of the 31st Conference on Uncertainty in
Artificial Intelligence, pp. 258–267 (2015)
Ehler, M., Gräf, M., Neumayer, S., Steidl, G.: Curve based approximation of measures on
manifolds by discrepancy minimization. arXiv:1910.06124 (2019)
Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. SIAM, Philadelphia (1999)
Fernández, V.A., Gamero, M.J., García, J.M.: A test for the two-sample problem based on empirical
characteristic functions. Comput. Stat. Data Anal. 52(7), 3730–3748 (2008)
Feydy, J., Séjourné, T., Vialard, F.-X., Amari, S., Trouvé, A., Peyré, G.: Interpolating between
optimal transport and MMD using Sinkhorn divergences. In: Proceedings of Machine Learning
Research, vol. 89, pp. 2681–2690. PMLR (2019)
Genevay, A., Cuturi, M., Peyré, G., Bach, F.: Stochastic optimization for large-scale optimal
transport. In: Advances in Neural Information Processing Systems, vol. 29, pp. 3440–3448
(2016)
Genevay, A., Chizat, L., Bach, F., Cuturi, M., Peyré, G.: Sample complexity of Sinkhorn
divergences. In: Proceedings of Machine Learning Research, vol. 89, pp. 1574–1583. PMLR
(2019)
Gnewuch, M.: Weighted geometric discrepancies and numerical integration on reproducing kernel
Hilbert spaces. J. Complex. 28(1), 2–17 (2012)
Goes, F.D., Breeden, K., Ostromoukhov, V., Desbrun, M.: Blue noise through optimal transport.
ACM Trans. Graph. 31(6), 171–182 (2012)
Gräf, M.: Efficient Algorithms for the Computation of Optimal Quadrature Points on Riemannian
Manifolds. PhD thesis, TU Chemnitz (2013)
Gräf, M., Potts, M., Steidl, G.: Quadrature errors, discrepancies and their relations to halftoning on
the torus and the sphere. SIAM J. Sci. Comput. 34(5), 2760–2791 (2013)
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.: A kernel method for the two-
sample-problem. In: Advances in Neural Information Processing Systems, vol. 19, pp. 513–520
(2007)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J.
Mach. Learn. Res. 13(1), 723–773 (2012)
Hytönen, T., van Neerven, J., Veraar, M., Weis, L.: Analysis in Banach Spaces-Vol. I: Martingales
and Littlewood-Paley Theory. A Series of Modern Surveys in Mathematics, vol. 63. Springer,
Cham (2016)
Kuipers, L., Niederreiter, H.: Uniform Distribution of Sequences. Wiley, New York (1974)
Lebrat, L., de Gournay, F., Kahn, J., Weiss, P.: Optimal transport approximation of 2-dimensional
measures. SIAM J. Imaging Sci. 12(2), 762–787 (2019)
1826 S. Neumayer and G. Steidl
Léonard, C.: From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal.
262(4), 1879–1920 (2012)
Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problems and a new Hellinger–
Kantorovich distance between positive measures. Invent. Math. 211(3), 969–1117 (2018)
Lorenz, D., Manns, P., Meyer, C.: Quadratically regularized optimal transport. J. Math. Anal. Appl.
494, 124432 (2021)
Matousek, J.: Geometric Discrepancy. Algorithms and Combinatorics, vol. 18. Springer, Berlin
(2010)
Mercer, J.: Functions of positive and negative type and their connection with the theory of integral
equations. Philos. Trans. R. Soc. Lond. Ser. A 209(441–458), 415–446 (1909)
Micchelli, C.A.: Interpolation of scattered data: distance matrices and conditionally positive
definite functions. Constr. Approx. 2(1), 11–22 (1986)
Navrotskaya, I., Rabier, P.J.: L log L and finite entropy. Adv. Nonlinear Anal. 2(4), 379–387 (2013)
Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems. Volume II. EMS Tracts in
Mathematics, vol. 12. EMS Publishing House, Zürich (2010)
Peyré, G.: Entropic Wasserstein gradient flows. SIAM J. Imaging Sci. 8(4), 2323–2351 (2015)
Rüschendorf, L.: Convergence of the iterative proportional fitting procedure. Ann. Stat. 23(4),
1160–1174 (1995)
Santambrogio, F.: Optimal Transport for Applied Mathematicians. Progress in Nonlinear Differen-
tial Equations and Their Applications, vol. 87. Birkhäuser, Basel (2015)
Schmaltz, C., Gwosdek, P., Bruhn, A., Weickert, J.: Electrostatic halftoning. Comput. Graph. For.
29(8), 2313–2327 (2010)
Schoenberg, I.J.: Metric spaces and completely monotone functions. Ann. Math. 39(4), 811–841
(1938)
Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices.
Ann. Math. Statist. 35(2), 876–879 (1964)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008)
Steinwart, I., Scovel, C.: Mercer’s theorem on general domains: on the interaction between
measures, kernels, and RKHSs. Constr. Approx. 35(3), 363–417 (2011)
Teuber, T., Steidl, G., Gwosdek, P., Schmaltz, C., Weickert, J.: Dithering by differences of convex
functions. SIAM J. Imaging Sci. 4(1), 79–108 (2011)
Vialard, F.-X.: An elementary introduction to entropic regularization and proximal methods for
numerical optimal transport. Lecture (2019)
Wendland, H.: Scattered Data Approximation. Cambridge Monographs on Applied and Computa-
tional Mathematics, vol. 17. Cambridge University Press, Cambridge (2004)
Wilson, A.G.: The use of entropy maximising models in the theory of trip distribution, mode split
and route split. J. Transp. Econ. Policy 3(1), 108–126 (1969)
Yule, G.U.: On the methods of measuring association between two attributes. J. R. Stat. Soc. 75(6),
579–652 (1912)
Compensated Convex-Based Transforms
for Image Processing and Shape 51
Interrogation
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1828
Related Areas: Semiconvex Envelope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1832
Related Areas: Proximity Hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1833
Related Areas: Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1834
Related Areas: Quadratic Envelopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1835
Outline of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1836
Notation and Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1836
Compensated Convexity-Based Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1843
Smoothing Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1843
Stable Ridge/Edge Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1844
Stable Multiscale Intersection Transform of Smooth Manifolds . . . . . . . . . . . . . . . . . . . . 1853
Stable Multiscale Medial Axis Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1856
Approximation Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1859
Numerical Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1862
Convex-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1862
Moreau Envelope-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1865
Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1867
Prototype Example: Upper Transform of a Singleton Set of R2 . . . . . . . . . . . . . . . . . . . . 1867
A. Orlando
CONICET, Departamento de Bioingeniería, Universidad Nacional de Tucumán, Tucumán,
Argentina
e-mail: [email protected]
E. Crooks
Department of Mathematics, Swansea University, Swansea, UK
e-mail: [email protected]
K. Zhang ()
School of Mathematical Sciences, University of Nottingham, Nottingham, UK
e-mail: [email protected]
Abstract
This paper reviews some recent applications of the theory of the compensated
convex transforms or of the proximity hull as developed by the authors to
image processing and shape interrogation with special attention given to the
Hausdorff stability and multiscale properties. This paper contains also numerical
experiments that demonstrate the performance of our methods compared to the
state-of-art ones.
Keywords
Introduction
The compensated convex transforms were introduced in Zhang (2008a,b) for the
purpose of tight approximation of functions defined in Rn , and their definitions
were originally motivated by the translation method (Tartar 1985) in the study of the
quasiconvex envelope in the vectorial calculus of variations (see Dacorogna (2008)
and references therein) and in the variational approach of material microstructure
(Ball and James 1987). Thanks to their smoothness and tight approximation prop-
erty, these transforms provide geometric convexity-based techniques for general
functions that yield novel methods for identifying singularities in functions (Zhang
et al. 2015a,b,c, 2016b) and new tools for function and image interpolation and
approximation (Zhang et al. 2016a, 2018). In this paper, we present some of the
applications that have been tackled by this theory up to date. These range from the
detection of features in images or data (Zhang et al. 2015b,c, 2016b) to multiscale
medial axis extraction (Zhang et al. 2015a), to surface reconstruction from level sets,
to approximation of scattered data and noise removal from images, and to image
inpainting (Zhang et al. 2016a, 2018).
Suppose f : Rn → R satisfies the following growth condition
where |x| is the Euclidean norm of x ∈ Rn and co[g] the convex envelope
(Hiriart-Urruty and Lemaréchal 2001; Rockafellar 1970) of a function g : Rn → R
bounded below. Similarly, given f : Rn → R satisfying the growth condition
It is not difficult to verify that if f meets both (1) and (3), for instance, if f is
bounded, there holds
thus, the lower and upper compensated convex transforms are λ-parametrized
families of transforms that approximate f from below and above, respectively.
Furthermore, they have smoothing effects and are tight approximations of f in the
sense that if f is C 1,1 in a neighborhood of x0 , there is a finite Λ > 0, such that
f (x0 ) = Cλl (f )(x0 ) (respectively, f (x0 ) = Cλu (f )(x0 ) whenever λ ≥ Λ. This
approximation property, which we refer to as tight approximation, is pivotal in the
developments of the theory, because it allows the transforms to be used for detecting
singularities of functions by exploiting the fact that it is only when a point x is close
to a singularity point of f we might find that the values of Cλl (f )(x) and Cλu (f )(x)
might be different from that of f (x) (Zhang et al. 2015b). Figure 1 visualizes
the smoothing and tight approximation of the mixed transform Cλu (Cλl (f )) of the
squared-distance function f to a four-point set. Given the type of singularity of f ,
we apply the lower transform to f which smoothes the “concave”-like singularity
followed by the upper transform that smoothes the “convex”-like singularity of
Cλl (f ) which are unaltered with respect to the original function f . This can be
appreciated by the graph of the pointwise error e(x) = |f (x) − Cλu (Cλl (f ))(x)| for
x ∈ which is zero everywhere but in a neighborhood of the singularities of f .
The transforms additionally satisfy the locality property that the values of Cλl (f ),
Cλ (f ) at x ∈ Rn depend only on the values of f in a neighborhood of x and are
u
translation invariant in the sense that Cλl (f ), Cλu (f ) are unchanged if the “weight”
| · |2 in the formula (2) and (4) is replaced by | · −x0 |2 for any shift x0 ∈ Rn .
1830 A. Orlando et al.
Fig. 1 Graph of (a) a squared-distance function f to a four-point set, (b) its mixed transform
Cλu (Cλl (f )) and (c) the pointwise error e = |f − Cλu (Cλl (f ))|
These last two properties make the explicit calculation of transforms tractable for
specific prototype functions f , which facilitate the creation of dedicated extractors
for a variety of different types of singularity using customized combinations of the
transforms.
These new geometric approaches enjoy key advantages over previous image and
data processing techniques (Chan and Shen 2005; Schönlieb 2015). The curvature
parameter λ provides scales for features that allow users to select which size of
feature they wish to detect, and the techniques are blind and global, in the sense that
images/data are treated as a global object with no a priori knowledge required of,
e.g., feature location. Figure 2 displays the λ−scale dependence in the case of the
medial axis where λ is associated with the scale of the different branches, whereas
Fig. 3 shows the multiscale feature for given λ associated with the height of the
different branches of the multiscale medial axis map.
Many of the methods can also be shown to be stable under perturbation and
different sampling techniques. Most significantly, Hausdorff stability results can be
rigorously proven for many of the methods. For example, the Hausdorff-Lipschitz
continuity estimate (Zhang et al. 2015b)
√
|Cλu (χE )(x) − Cλu (χF )(x)| ≤ 2 λdistH (E, F ), x ∈ Rn ,
shows that the upper transform Cλu is Hausdorff stable against sampling of geometric
shapes defined by their characteristic functions. Such stability is particularly
important for the extraction of information when “point clouds” represent sampled
domains. If a geometric shape is densely sampled, then from a human vision
point of view, one can typically still identify geometric features of the sample
and sketch its boundary. From the mathematical/computer science perspective,
however, feature identification from sampled domains is challenging, and usually
methods are justified only by either ad hoc arguments or numerical experiments.
Figure 4 displays an instance of this property where we show the edges of the
51 Compensated Convex-Based Transforms for Image Processing and. . . 1831
Fig. 2 Support of the multiscale medial axis map (suplevel set with level t =
10−8 maxx∈R2 Mλ (·; K)) with the “spurious” branches generated by pixelation of the boundary
for (a) λ = 1 and for (b) λ = 8
Fig. 3 Selection of branches via the suplevel set of the multiscale medial axis map for λ =
1 using different values of the threshold t, (a) t = 10−3 maxx∈R2 Mλ (·; K) and (b) t = 2 ·
10−2 maxx∈R2 Mλ (·; K)
continuous nonnegative function f (x, y) = dist2 ((x, y), ∂), with (x, y) ∈
= ([−1.5, 1.5] × [−1.5, 1.5]) \ ([−1.5, 0.5] × [−1.5, −0.5]) and of its sparse
sampling f · χA where A ⊂ is a sparse set (see Fig. 4a, b, respectively). Due
to the Hausdorff stability of the stable ridge transform, we are able to recover an
approximation of the ridges from the sampled image (compare Fig. 4c, d).
Via fast and robust numerical implementations of the transforms (Zhang et al.
2021), this theory also gives rise to a highly effective computational toolbox for
1832 A. Orlando et al.
Fig. 4 (a) Image of f (x, y); (b) sampled image of f (x, y) by random salt and pepper noise; (c)
stable ridges of f (x, y); (d) stable ridges from sampled image
applications. The efficiency of the numerical computations benefits greatly from the
locality property, which holds despite the global nature of the convex envelope itself.
Before we describe the applications of this theory, we provide next alternative
characterizations of the compensated convex transforms.
Given the definitions (2) and (4), lower and upper compensated convex transforms
can be considered as parameterized semiconvex and semiconcave envelopes, respec-
tively, for a given function. The notions of semiconvex and semiconcave functions
go back at least to Reshetnyak (1956) and have since been studied by many authors
in different contexts (see, e.g., Alberti et al. 1992; Cannarsa and Sinestrari 2004;
Lasry and Lions 1986). Let ⊆ Rn be an open set; we recall that a function
f : → R ∪ {+∞} is semiconvex if there is a constant C ≥ 0 such that
f (x) = g(x)−C|x|2 with g a convex function. More general weight functions, such
as |x|σ (|x|), for example, are also used in the literature for defining more general
semiconvex functions (Alberti et al. 1992). Since general DC functions (difference
of convex functions) (Hartman 1959) and semiconvex/semiconcave functions are
locally Lipschitz functions in their essential domains (Cannarsa and Sinestrari
2004, Theorem 2.1.7), Rademacher’s theorem implies that they are differentiable
almost everywhere. Fine properties for the singular sets of convex/concave and
semiconvex/semiconcave functions have been studied extensively (Alberti et al.
1992; Cannarsa and Sinestrari 2004) showing that the singular set of a semi-
convex/semiconcave function is rectifiable. By applying results and tools of the
theory of compensated convex transforms, it is possible therefore to study how
such functions can be effectively approximated by smooth functions; whether all
singular points are of the same type, that is, whether for semiconcave (semiconvex)
functions, all singular points are geometric ‘ridge’ (‘valley’) points; how singular
sets can be effectively extracted beyond the definition of differentiability; and how
the information concerning “strengths” of different singular points can be effectively
measured. These are all questions relevant to applications in image processing and
51 Compensated Convex-Based Transforms for Image Processing and. . . 1833
computer-aided geometric design. An instance of this study, for example, has been
carried out in Zhang et al. (2015a, 2016b) to study the singular set of the Euclidean
squared-distance function dist2 (·, c ) to the complement of a bounded open domain
⊂ Rn (called the medial axis (Blum 1967) of the domain ) and of the weighted
squared-distance function.
where the Moreau lower and upper envelopes (Moreau 1965) are defined, in our
notation, respectively, by
with f satisfying the growth condition (1) and (3), respectively. Moreau envelopes
play important roles in optimization, nonlinear analysis, optimal control, and
Hamilton-Jacobi equations, both theoretically and computationally (Crandall et al.
1992; Cannarsa and Sinestrari 2004; Hiriart-Urruty and Lemaréchal 2001; Rock-
afellar and Wets 1998). The mixed Moreau envelopes M τ (Mλ (f )) and Mτ (M λ (f ))
coincide with the Lasry-Lions double envelopes (fλ )τ and (f λ )τ defined in Lasry
and Lions (1986) by (16) and (17), respectively, in the case of λ = τ and are also
referred to in Rockafellar and Wets (1998) as proximal hull and upper proximal hull,
respectively. They have been extensively studied and used as approximation and
smoothing methods of not necessarily convex functions (Attouch and Aze 1993;
Cannarsa and Sinestrari 2004; Hare 2009; Parikh and Boyd 2013). In particular,
in the partial differential equation literature, the focus of the study of the mixed
Moreau envelopes M τ (Mλ (f )) and Mτ (M λ (f )) for the case τ > λ is known, under
suitable growth conditions, as the Lasry-Lions regularizations of f of parameter λ
and τ . In this case, the mixed Moreau envelopes are both C 1,1 functions (Attouch
and Aze 1993; Cannarsa and Sinestrari 2004; Lasry and Lions 1986). However,
crucially they are not “tight approximations” of f , in contrast with our lower
and upper transforms Cλl (f )(x) and Cλu (f )(x) (Zhang 2008a). Generalized inf
and sup convolutions have also been considered, for instance, in Cannarsa and
Sinestrari (2004) and Rockafellar and Wets (1998). However, due to the way these
regularization operators are defined, proof of mathematical and geometrical results
to describe how such approximations work has usually been challenging, making
their analysis and applications very difficult. As a result, the study of the proximal
1834 A. Orlando et al.
hull using the characterization in terms of the compensated convex transform would
make them much more accessible and feasible for real-world applications.
Moreau lower and upper envelopes have also been employed in mathematical
morphology in the 1990s (Jackway 1992; van den Boomgaard 1992), to define
gray scale erosion and dilation morphological operators, whereas the critical mixed
Moreau envelopes M λ (Mλ (f )) and Mλ (M λ (f )) are gray scale opening and closing
morphological operators (Serra 1982; Soille 2004). In convex analysis, the infimal
convolution of f with g is denoted as f g and is defined as (Rockafellar 1970;
Rockafellar and Wets 1998)
(see (9) and (10)). Such characterizations will allow us to derive various new
geometric and stability properties for opening and closing morphological operators.
Furthermore, when we apply compensated convex transforms to extract singularities
from characteristic functions of compact geometric sets, our operations can be
viewed as the application of morphological operations devised for “gray scale
images” to “binary images.” As a result, it might look not efficient to apply more
involved operations for processing binary images, when in the current literature
Serra (1982) and Soille (2004) there are “binary” set theoretic morphological
operations that have been specifically designed for the tasks under examination.
Nevertheless, an advantage of adopting our approach is that the compensated
convex transforms of characteristic functions are (Lipschitz) continuous; therefore,
applying a combination of transforms will produce a landscape of various levels
(heights) that can be designed to highlight a specific type of singularity. We can then
extract multiscale singularities by taking thresholds at different levels. In fact, the
graphs of functions obtained by combinations of compensated convex transforms
contain much more geometric information than binary operations that produce
simply a yes or no answer. Also, for “thin” geometric structures, such as curves and
surfaces, it is difficult to design “binary” morphological operations to be Hausdorff
stable.
From definition (2), it also follows that Cλl (f )(x) is the envelope of all the quadratic
functions with fixed quadratic term λ|x|2 that are less than or equal to f , that is,
Cλl (f )(x) = sup −λ|x|2 + (x) : −λ|y|2 + (y)
≤ f (y) for all y ∈ Rn and affine , (9)
whereas from (4) it follows that Cλu (f )(x) is the envelope of all the quadratic
functions with fixed quadratic term λ|x|2 that are greater than or equal to f , that
is,
Cλu (f )(x) = inf λ|x|2 + (x) : f (y) ≤ λ|y|2
+ (y) for all y ∈ Rn and affine . (10)
This characterization was first given in Zhang et al. (2015b, Eq. (1.4)) and can be
derived by noting that since the convex envelope of a function g can be characterized
as the pointwise supremum of the family Aff(Rn ) of all the affine functions which
are majorized by g, we have then
1836 A. Orlando et al.
= sup (x) − λ|x|2 : (y) − λ|y|2 ≤ f (y) for any y ∈ Rn ,
∈Aff(Rn )
(11)
which is (9). As stated before, (11) can be in turn related directly to the Moreau’s
mixed envelope. The characterization (9) has been recently also reproposed by
Carlsson (2019) for the study of low-rank approximation and compressed sensing.
It is instructive to compare this characterization with (75) below about the
Moreau envelope as lower envelope of parabolas with given curvature λ.
The plan of this paper is as follows: After this general introduction, we will intro-
duce relevant notation and recall basic results in convex analysis and compensated
convex transforms in the next section. In section “Compensated Convexity-Based
Transforms”, we introduce the different compensated convex-based transforms that
we have been developing. Their definition can be either motivated by a mere
application of key properties of the basic transforms, namely, the lower and upper
transform, or by an ad hoc designed combinations of the basic transforms so to
create a singularity at the location of the feature of interest. Section “Numerical
Algorithms” introduces some of the numerical schemes that can be used for the
numerical realization of the compensated convex-based transforms, namely, of
the basic transform given by the lower compensated convex transform. We will
therefore describe the convex-based and Moreau-based algorithms, which can be
both used according to whether we refer to the definition (2) or the characterization
(5) of the lower compensated convex transform. Section “Numerical Examples”
contains some representative applications of the transformations introduced in this
paper. More specifically, we will consider an application to shape interrogation by
considering the problem of identifying the location of intersections of manifolds
represented by point clouds and applications of our approximation compensated
convex transform to the reconstruction of surfaces using level lines and isolated
points, image inpainting, and salt & pepper noise removal.
smallest (with respect to inclusion) convex set that contains the set K and χK its
characteristic function, that is, χK (x) = 1 if x ∈ K and χK (x) = 0 if x ∈ K c .
The Euclidean distance transform of a non-empty set K ⊂ Rn is the function
that, at any point x ∈ Rn , associates the Euclidean distance of x to K, which is
defined as inf{|x − y|, y ∈ K} and is denoted as dist(x, K). Let δ > 0, the open
δ-neighborhood K δ of K is then defined by K δ = {x ∈ Rn , dist(x, K) < δ} and
is an open set. For x ∈ Rn and r > 0, B(x; r) indicates the open ball with center
x and radius r, whereas S(x; r) denotes the sphere with center x and radius r, that
is, S(x; r) = ∂B(x; r) is the boundary of B(x; r). The suplevel set of a function
f : ⊆ Rn → R of level α is the set
Sα f = {x ∈ : f (x) ≥ α} , (12)
whereas the level set of f with level α is also defined by (12) with the inequality
sign replaced by the equality sign. Finally, we use the notation Df to denote the
derivative of f .
Next, we list some basic properties of compensated convex transforms. Without
loss of generality, these properties are stated mainly for the lower compensated
convex transform given that it is then not difficult to derive the corresponding results
for the upper compensated convex transform using (4). Only in the case f is the
characteristic function of a set K, i.e., f = χK , we will refer explicitly to Cλu (χK )
given that Cλl (χK )(x) = 0 for any x ∈ Rn if K is, e.g., a finite set. For details and
proofs, we refer to Zhang (2008a) and Zhang et al. (2015b) and references therein,
whereas for the relevant notions of convex analysis, we refer to Hiriart-Urruty and
Lemaréchal (2001) and Rockafellar (1970).
n+1
n+1
n+1
co[f ](x0 ) = inf λi f (xi ) : λi = 1, λi xi = x0 ,
xi ∈R
n
i=1,...,n+1 i=1 i=1 i=1
λi ≥ 0 i = 1, . . . , n + 1 , (13)
that is, the convex envelope of f at a point x0 ∈ Rn depends on the values of f on its
whole domain of definition, namely, Rn in this case. We will however introduce also
a local version of this concept which will be used to formulate the locality property
of the compensated convex transform and is fundamental for our applications.
n+1
n+1
n+1
coB(x0 ; r) [f ](x0 ) = inf λi f (xi ) : λi = 1, λi xi = x0 ,
xi ∈B(x0 ; r)
i=1,...,n+1 i=1 i=1 i=1
λi ≥ 0 i = 1, . . . , n + 1 . (14)
Unlike the global definition, the infimum in (14) is taken only over convex
combinations in B(x0 ; r) rather than in Rn .
As part of the convex analysis reminder, we also recall the definition of the
Legendre-Fenchel transform.
and
(f λ )τ (x) = infn sup f (u) − λ|u − y|2 + τ |y − x|2
y∈R u∈Rn
= Mτ (M λ (f ))(x) . (17)
51 Compensated Convex-Based Transforms for Image Processing and. . . 1839
Both (fλ )τ and (f λ )τ approach f from below and above, respectively, as the
parameters λ and τ go to +∞. If λ = τ , then (fλ )λ = M λ (Mλ (f )) is called
proximal hull of f , whereas (f λ )λ = Mλ (M λ (f )) is referred to as the upper
proximal hull of f . It is not difficult to verify that whenever τ > λ > 0, the
following relation holds between the compensated convex transforms, the Moreau
envelopes, and the Lasry-Lions regularizations of f (Zhang 2008a),
without loss of generality, in the following we can assume that the functions are
lower semicontinuous.
The monotonicity and approximation properties of Cλl (f ) with respect to λ are
described by the following results:
Proposition 2. Given f : Rn → R that satisfies (1), then for all A1 < λ < τ < ∞,
we have
The approximation of f from below by Cλl (f ) given by (22) can be better specified,
given that Cλl (f ) realizes a “tight” approximation of the function f in the following
sense (see Zhang 2008a, Theorem 2.3(iv)).
Proposition 3. Let f ∈ C 1,1 (B(x0 ; r)), x0 ∈ Rn , r > 0. Then for sufficiently large
λ > 0, we have that f (x0 ) = Cλl (f )(x0 ). If the gradient of f is Lipschitz in Rn with
Lipschitz constant L, then Cλl (f )(x) = f (x) for all x ∈ Rn whenever λ ≥ L.
Related to this property is the density property of the lower compensated transform
established in Zhang et al. (2015b) that can be viewed as a tight approximation for
general bounded functions.
Cλl (f ) ≤ f ≤ f ,
as a result of Theorem 1, the set of points at which the lower compensated convex
transform equals the original function satisfies a density property, that is, the closed
Rλ,M -neighborhoods of Tl (f, λ) covers Rn . For any point x0 ∈Rn , the point x0
is contained in the local convex hull co Tl (f, λ) ∩ B̄(x0 ; Rλ,M ) . Furthermore, if
f is bounded and continuous, Tl (f, λ) is exactly the set of points at which f is
λ-semiconvex (Cannarsa and Sinestrari 2004), i.e., points x0 where
with an affine function satisfying (x0 ) = 0 and condition (1) holds for f .
A fundamental property for the applications is the locality of the compensated
convex transforms. For a lower semicontinuous function that is in addition bounded
on any bounded set, the locality property was established for this general case
in Zhang (2008a). We next report its version for a bounded function which is
relevant for the applications to image processing and shape interrogation (Zhang
et al. 2015b).
n+1
n+1
n+1
Cλl (f )(x0 ) = inf λi (f (xi )+λ|xi − x0 |2 ), λi ≥0, λi =1, λi xi =x0 ,
i=1 i=1 i=1
|xi − x0 | ≤ Rλ,M , (23)
Since the convex envelope is affine invariant, it is not difficult to realize that there
holds
Despite the definition of Cλl (f ) involves the convex envelope of f + λ| · |2 , the value
of the lower transform for a bounded function at a point depends on the values of
the function in its Rλ,M -neighborhood. Therefore, when λ is large, the neighborhood
will be very small. If f is globally Lipschitz, this result is a special case of Lemma
3.5.7 at p. 72 of Cannarsa and Sinestrari (2004).
The following property shows that the mapping f → Cλl (f ) is nondecreasing,
that is, we have
Next, we recall the definition of Hausdorff distance from Ambrosio and Tilli
(2004).
Suppose E, F ⊂ Rn are two non-empty closed sets. It is, then, easy to see that
(i) if E ⊂ F ,
For a given non-empty closed set K, by definition of the function Dλ2 (x, K), we
have
The following result establishes the relationship between the upper transform of
χK (x) and Dλ2 (x, K) and it was established in Zhang et al. (2015b).
The Hausdorff-Lipschitz continuity of Cλu (χK )(x) and Cλu (Dλ2 (·, K))(x) were
also established in Zhang et al. (2015b).
Consequently,
√
|Cλu (χE )(x) − Cλu (χF )(x)| ≤ 2 λdistH (E, F ). (33)
The lower compensated convex transform (2) and the upper compensated convex
transform (4) represent building blocks for defining novel transformations to smooth
functions, to identify singularities in functions, and to interpolate and approximate
data. For the creation of these transformations, we follow mainly two approaches.
One approach makes a direct use of the basic transforms to single out singularities
of the function or to smooth and/or approximate the function. By contrast, the
other approach realizes a suitably designed combination of the basic transforms that
creates the singularity at the location of the feature of interest.
Smoothing Transform
for some C1 , C2 > 0, then given λ, τ > C1 , we can define two (quadratic) mixed
compensated convex transform as follows:
u,l l,u
Cτ,λ (f )(x) := Cτu (Cλl (f ))(x) and Cλ,τ (f )(x) := Cλl (Cτu (f ))(x), x ∈ Rn .
(35)
From (4), we have that for every λ, τ > C1
u,l l,u
Cτ,λ (f )(x) = −Cτ,λ (−f )(x). (36)
l,u u,l
Hence, properties of Cτ,λ (f ) follow from those for Cτ,λ (f ), and we can thus state
u,l
appropriate results only for Cτ,λ (f ). In this case, then, whenever τ, λ > C1 we have
u,l u,l
that Cτ,λ (f ) ∈ C 1,1 (Rn ). As a result, if f is bounded, then Cτ,λ (f ) ∈ C 1,1 (Rn ) and
l,u
Cτ,λ (f ) ∈ C 1,1 (Rn ) for all λ > 0 and τ > 0. This is important in applications of
the mixed transforms to image processing, because there the function representing
the image takes a value from a fixed range at each pixel point and so is always
bounded. The regularizing effect of the mixed transform is visualized in Fig. 5
l,u
where we display Cλ,τ (f ) of the no-differentiable function f (x, y) = |x| − |y|,
(x, y) ∈ [−1, 1] × [−1, 1] and of f (x, y) + n(x, y) with n(x, y) a bivariate
l,u
normal distribution with mean value equal to 0.05. The level lines of Cλ,τ (f ) and
l,u
Cλ,τ (f + n) displayed in Fig. 5b and d, respectively, are smooth curves.
Finally, as a consequence of the approximation result (22) and likewise result for
Cτu (f ) (see Proposition 2), it is then not trivial to establish a similar approximation
result also for the mixed transforms and verify that there are τj , λj → ∞ as j → ∞
such that on every compact subset of Rn , there holds
The ridge, valley, and edge transforms introduced in Zhang et al. (2015b) are
basic operations for extracting geometric singularities. The key property is the tight
approximation of the compensated convex transforms (see Proposition 3) and the
approximation to f from below by Cλl (f ) and above by Cλu (f ), respectively.
Basic Transforms
Let f : Rn → R satisfy the growth condition (34). The ridge Rλ (f ), the valley
Vλ (f ), and the edge transforms Eλ (f ) of scale λ > C1 are defined, respectively, by
51 Compensated Convex-Based Transforms for Image Processing and. . . 1845
l,u
Fig. 5 (a) Input function f (x, y) = |x| − |y|; (b) graph of Cλ,τ (f ) for λ = 5 and τ = 5; (c) input
function f (x, y) + n(x, y) with n(x, y) a bivariate normal distribution with mean value equal to
l,u
0.05; (d) graph of Cλ,τ (f + n) for λ = 5 and τ = 5
Rλ (f ) = f − Cλl (f ); Vλ (f ) = f − Cλu (f );
(38)
Eλ (f ) = Rλ (f ) − Vλ (f ) = Cλu (f ) − Cλl (f ) .
Fig. 7 (a) Input test image from Smith and Brady(1997); (b) suplevel set of the ridge transform
with λ = 0.1 and for the level equal to 0.004 · max Rλ (f ) ; (c) Canny edges
each other. In the applications, we usually consider −Vλ (f ) to make the resulting
function nonnegative. Figure 6 displays the suplevel set of Rλ (f ) and −Vλ (f ) of
the same level for a gray scale image f compared to the Canny edge filter, whereas
Fig. 7 demonstrates on the test image used in Smith and Brady (1997) the ability of
Rλ (f ) to detect edges between different gray levels.
The transforms Rλ (f ) and Vλ (f ) satisfy the following properties:
Rλ (f + ) = Rλ (f ) and Vλ (f + ) = Vλ (f ) (39)
51 Compensated Convex-Based Transforms for Image Processing and. . . 1847
Fig. 8 (a) A binary image χ of a Chinese character; (b) image 255χ + with = 70(i − j ) for
1 ≤ i ≤ 546 , 1 ≤ j ≤ 571, i.e., the scaled characteristic function of the character plus an affine
function; (c) edges extracted by Canny edge detector; (d) edges extracted by the edge transform
Eλ (f ) with λ = 0.1 after thresholding
for all α > 0. Consequently, the edge transform Eλ (f ) is also scale covariant.
(iii) The transforms Rλ (f ), Vλ (f ), and Eλ (f ) are all stable under curvature
perturbations in the sense that for any g ∈ C 1,1 (Rn ) satisfying |Dg(x) −
Dg(y)| ≤ |x − y|, if λ > then
The numerical experiments depicted in Fig. 8 illustrate the affine invariance of the
edge transform expressed by (39), whereas Fig. 9 shows implications of the stability
of the edge transform under curvature perturbations according to (41).
To get an insight on the geometric structure of the edge transform, it is
informative to consider the case where f is the characteristic function of a set. Let
⊂ Rn be a non-empty open regular set such that ¯ = Rn and ⊂ ∂, then for
λ > 0, we have that (Zhang et al. 2015b)
⎧ √ √
⎪
⎪
⎨= 0 x ∈ (1/ ∪ \ (c )1/
λ )c λ
⎪
√ √
Eλ (χ∪ )(x) ¯ ∪ (c )1/ (42)
⎪
⎪ ∈ (0, 1) x ∈ 1/ λ \ λ \ c
⎪
⎩= 1 x ∈ ∂ .
Fig. 9 (a) A scaled binary image of a Chinese character perturbed by a smooth image; (b)
edges extracted by Canny edge detector; (c) edges extracted by the edge transform Eλ (f ) after
thresholding
that is, λ controls the width of the neighborhood of χ∂ . As λ → ∞, the support of
Eλ (χ ) shrinks to the support of χ∂ .
Figure 10 illustrates the behavior of Eλ (χ ) by displaying the support of Eλ (χ )
for different values of λ.
Since the original function f is directly involved in the definitions of the ridge,
valley, and edge transforms, the transforms (38) are not Hausdorff stable if we
consider a dense sampling of the original function. It is possible nevertheless to
establish stable versions of ridge and valley transforms in the case that f is the
characteristic function χE of a non-empty compact set E ⊂ Rn . For this result,
it is fundamental the observation on the Hausdorff stability of the upper transform
of the characteristic function χE of non-empty compact subsets of Rn (see Zhang
et al. 2015b, Theorem 5.5) which motivates the definition of stable ridge transform
of E as
For the ridge defined by (44), we have that if E, F are non-empty compact subsets
of Rn , for λ > 0 and τ > 0, then there holds
√
|SRλ,τ (χE )(x) − SRλ,τ (χF )(x)| ≤ 4 λdistH (E, F ) (for x ∈ Rn ) . (45)
Fig. 10 Scale effect associated with λ on the support of the edge transform of the (a) image
f = 255 · χ of a Chinese character for different values of λ: (b) λ = 1; (c) λ = 10; (d) λ = 100
Fig. 11 (a) Domain E given by the image of an elephant displayed here as 1 − χE ; (b) boundary
extraction using the stable ridge transform, SRλ,τ (χE )), for λ = 0.1 and τ = λ/8; (c) domain F
obtained by randomly sampling E; (d) boundary extraction of the data sample after thresholding
the stable ridge transform, SRλ,τ (χF )), computed for λ = 0.1 and τ = λ/8
Hence, if λ ≤ τ , we would get SVλ,τ (χE )(x) = 0, and SEλ,τ (χE )(x) would simply
be equal to SRλ,τ (χE )(x).
point, whereas if it meets only the second condition, it is called interior δ−regular
point. Figure 12 displays the different types of points of ∂.
The stable ridge transform allows the characterization of such points given that
if x ∈ ∂ is a δ−regular point of with δ > 0 sufficiently small, in Zhang et al.
(2015b) it is shown that there holds
√ √
( λ + τ − τ )2
SRλ,τ (χ¯ )(x0 ) ≤ . (46)
λ
where
√ √
( λ + τ − τ )2
μ1 (λ, τ ) := (48)
λ
is called the standard height for codimension-1 regular boundary points. The
analysis of the behavior of SRλ,τ (χKa ) in the case of the prototype exterior corner
defined by the set Ka = {(x, y) ∈ R2 : |y| ≤ ax, a, x ≥ 0}, with angle θ satisfying
a = tan(θ/2), shows that the value of SRλ,τ (χKa ) at the corner tip (0, 0) of Ka is
given by
51 Compensated Convex-Based Transforms for Image Processing and. . . 1851
Fig. 13 Graph of SRλ,τ (χKa ) for different pairs of opening angle θ (a) π/2&π/2; (b)
5π/12&7π/12; (c) π/12&11π/12
⎧
⎪
⎪ λ λ+τ
⎪
⎨ if a ≤
2
SRλ,τ (χKa )(0, 0) := μ2 (a, λ, τ ) = λ + (1 +√a 2 )τ τ
⎪ + 2 ( λ + τ − √ τ )2
⎪
⎪ 1 a
⎩ otherwise .
a2 λ
(49)
One can then verify that for a > 0, and for any λ, τ > 0,
This result means that when the angle θ approaches π , the singularity at (0, 0)
disappears. Figure 13 illustrates the behavior of SRλ,τ (χKa ) for different values
of the opening angle θ and for τ = σ λ with σ = 1/8, for which the value of a that
separates the two conditions in (49) corresponds to θ = 2π/3.
Based on this prototype example (Zhang et al. 2015b, Example 6.11), one can
therefore conclude that Rτ (Cλu (χ¯ )) can actually detect exterior corners, whereas
it might happen that at some δ-singular points of ∂, Rτ (Cλu (χ¯ )) takes on
values lower than at the regular points of ∂. As a result, a different Hausdorff
stable method will be therefore needed to detect interior corners and boundary
intersections of domains.
1852 A. Orlando et al.
Fig. 14 Prototype of internal corner in an L−shape domain. (a) Graph of Dλ2 (·, K) for λ =
0.0001; (b) graph of Vλd (·, K)
Interior Corners
Since a prototype interior corner is defined as the complement of an exterior
corner, one could think of detecting interior corners of by looking at the stable
ridge transform of the complement of in Rn . But this would not provide useful
information for geometric objects subject to finite sampling. On the other hand,
traditional methods, such as the Harris and the Susan (Smith and Brady 1997) corner
detector, as well as other local mask-based corner detection methods, would also
not apply directly to such a situation. In this case, therefore we adopt an indirect
approach. This consists of constructing an ad hoc geometric design-based function
that is robust under sampling and is such that its singularities can be identified with
the geometric singularities we want to extract: (i) interior corners of a domain
and (ii) intersections of smooth manifolds. By applying one of the transforms
introduced in section “Basic Transforms” according to the type of singularity, we
can detect the singularity of interest. Given a non-empty closed set K ⊂ R2 with
K = Rn , an instance of function whose singularities capture the type of geometric
feature of K which we are interested of is the distance-based function (26) for λ > 0,
which we rewrite next for ease of reference
√ 2
Dλ2 (x, K) = max{0, 1 − λdist(x, K)} , x ∈ Rn . (50)
Figure 14a displays the graph of Dλ2 (x, K) for a prototype of interior corner in an
L−shape domain and shows that such singularity is of the valley type. By applying
then to Dλ2 (·, K) the valley transform (38) with the same parameter λ as used to
compute Dλ2 (·, K) itself, we obtain
whose graph is displayed in Fig. 14b. We observe therefore that this transform
allows the definition of the set of interior corner points and intersection points of
51 Compensated Convex-Based Transforms for Image Processing and. . . 1853
√
scale 1/ λ as the support of Vλd (·, K), that is
Rather than devising an ad hoc function that embeds the geometric features as its
singularities, one can suitably modify the landscape of the characteristic function
of the object and generate singularities which are localized in a neighborhood of
the geometric feature of interest. This is, for instance, the rationale behind the
transformation introduced in Zhang et al. (2015c). The objective is to obtain a
Hausdorff stable multiscale method that is robust with respect to sampling, so that
it can be applied to geometric objects represented by point clouds, and that is able
to describe possible hierarchy of features as defined in terms of some characteristic
geometric property. If we denote by K ⊂ Rn the union of finitely many smooth
compact manifolds Mk , for k = 1, . . . , m, in this section we are interested to extract
two types of geometric singularities:
Fig. 15 Graph of Dλ2 (·, K), λ = 0.0001, for the three prototypes of interior angle: (a) acute
angle, (b) rectangular angle, and (c) obtuse angle. Graph of Vλd (·, K), λ = 0.0001, for the three
prototypes of interior angle: (d) acute angle, (e) rectangular angle, and (f) obtuse angle. Suplevel
set of Vλd (·, K), λ = 0.0001 and level equal to 0.8 max {Vλd (x, K)} for different values of the
x∈R2
opening angle of the interior corner prototype: (g) acute angle, maxx∈R2 {Vλd (x, K)} = 0.4137;
(h) rectangular angle, maxx∈R2 {Vλd (x, K)} = 0.3323; (i) obtuse angle, maxx∈R2 {Vλd (x, K)} =
0.1053
their stability properties, under dense sampling of the set M, are not known. Let
K ⊂ Rn be a non-empty compact set. By using compensated convex transforms,
we introduced the intersection extraction transform of scale λ > 0 (Zhang et al.
2015c) by
u
Iλ (x; K) = C4λ (χK )(x) − 2 Cλu (χK )(x) − Cλl (Cλu (χK ))(x) , x ∈ Rn .
(53)
51 Compensated Convex-Based Transforms for Image Processing and. . . 1855
Fig. 16 Graph of: (a) The upper transform Cλu (χKα=1 )(x) of the characteristic function of two
crossing lines with right angle; (b) The mixed transform Cλl (Cλu (χKα=1 ))(x); (c) The intersection
filter Iλ (·; Kα=1 ) together with the graph of the characteristic function of Kα=1 displayed as
reference
By recalling the definition of the stable ridge transform (45) of scale λ and τ for the
characteristic function χK , Iλ (x; K) can be expressed in terms of SRλ,τ (χK )(x) for
τ = λ as
u
Iλ (x; K) = C4λ (χK )(x) − 2SRλ,λ (χK )(x), x ∈ Rn . (54)
As instance of how Iλ (·; K) is used to remove or filter regular points, Fig. 16 illus-
trates the graphs of Cλu (χKα=1 )(x), Cλl (Cλu (χKα=1 ))(x) and of the filter Iλ (·; Kα=1 )
in the case of the intersection of two lines perpendicular to each other. This example
can be generalized to “regular directions” and “regular points” on manifolds K and
verify that Iλ (x, K) = 0 at these points. Let K ⊂ Rn be a non-empty compact set
and e a δ-regular direction of x ∈ K, then Iλ (y; K) = 0 for y ∈ [x −δe, x +δe] :=
{x + tδe, −1 ≤ t ≤ 1} when λ ≥ 1/δ 2 . In particular, we have that at the point x,
transforms are Hausdorff stable, it follows that Iλ (·; K) is also Hausdorff stable,
that is, for E, F non-empty compact subsets of Rn and λ > 0, then there holds
√
|Iλ (x; E) − Iλ (x; F )| ≤ 12 λ distH (E, F ) , x ∈ Rn . (56)
whereas for a bounded open set ⊂ Rn with boundary ∂, the quadratic multiscale
medial axis map of with scale λ > 0 is defined by
and the limit map M∞ (x; K) presents well separated values, in the sense that they
are zero outside the medial axis and remain strictly positive on it. To gain an insight
of the geometric structure of Mλ (x; K), for x ∈ MK , Zhang et al. (2015a) makes
use of the separation angle θx introduced in Lieutier (2004). Let K(x) denote the
set of points of ∂K that realize the distance of x to K and by [y1 − x, y2 − x] the
angle between the two nonzero vectors y1 − x and y2 − x for y1 , y2 ∈ K(x), then
By means of this geometric parameter, it was shown in Zhang et al. (2015a) that for
every λ > 0 and x ∈ MK that
This result along with the examination of prototype examples ensures that the
multiscale medial axis map of scale λ keeps a constant height along the part of the
medial axis generated by a two-point subset, with the value of the height depending
on the distance between the two generating points. Such values can, therefore,
be used to define a hierarchy between different parts of the medial axis, and one
can thus select the relevant parts through simple thresholding, that is, by taking
suplevel sets of the multiscale medial axis map, justifying the word “multiscale” in
its definition. For each branch of the medial axis, the multiscale medial axis map
automatically defines a scale associated with it. In other words, a given branch has
a strength which depends on some geometric features of the part of the set that
generates that branch.
An inherent drawback of the medial axis MK is in fact its sensitivity to boundary
details, in the sense that small perturbations of the object (with respect to the
Hausdorff distance) can produce huge variations of the corresponding medial axis.
This does not occur in the case of the quadratic multiscale medial axis map, given
that Zhang et al. (2015a) shifts the focus from the support of Mλ (·; K) to the whole
map. Let K, L ⊂ Rn denote non-empty compact sets and μ := distH (K, L), it was
shown in Zhang et al. (2015a) that for x ∈ Rn , we have
Mλ (x; K) − Mλ (x; L) ≤ μ(1 + λ) (dist(x, K) + μ)2 + 2dist(x, K) + 2μ + 1 .
(61)
While the medial axis of K is not a stable structure with respect to the Hausdorff
distance, its medial axis map Mλ (x; K) is by contrast a stable structure. This result
complies with (61) which shows that as λ becomes large, the bound in (61) becomes
large.
With the aim of giving insights into the implications of the Hausdorff stability
of Mλ (x; ∂), we display in Fig. 17 the graph of the multiscale medial axis map
of a nonconvex domain and of an -sample K of its boundary. An inspection of
1858 A. Orlando et al.
Fig. 17 Multiscale medial axis map of a nonconvex domain and of an -sample K of its
boundary. (a) Nonconvex domain (−) and an -sample K (×) of ∂; (b) graph of Mλ (·; ∂)
for λ = 2.5; (c) support of Mλ (·; ∂); (d) graph of Mλ (·; K ); (e) support of Mλ (·; ); (f)
suplevel set of Mλ (x; K ) for a threshold equal to 0.15 max {Mλ (x; K )}
x∈R2
the graph of Mλ (x; ∂) and Mλ (x; K ), displayed in Fig. 17b and d, respectively,
reveals that both functions take comparable values along the main branches of M .
Also, Mλ (x; K ) takes small values along the secondary branches, generated by
the sampling of the boundary of . These values can therefore be filtered out by a
simple thresholding so that a stable approximation of the medial axis of can be
computed. This can be appreciated by looking at Fig. 17f, which displays a suplevel
set of Mλ (x; K ) that appears to be a reasonable approximation of the support of
Mλ (x; ∂) shown in Fig. 17c whereas Fig. 17e depicts the support of Mλ (·; K ).
A relevant implication of (61) concerns with the continuous approximation of the
medial axis of a shape starting from subsets of the Voronoi diagram of a sample of
the shape boundary which is pertinent for shape reconstruction from point clouds.
Let us consider an -sample K of ∂, that is, a discrete set of points such that
distH (∂, K ) ≤ . Since the medial axis of K is the Voronoi diagram of K , if
we denote by V the set of all the vertices of the Voronoi diagram Vor(K ) of K
and denote by P the subset of V formed by the “poles” of Vor(K ) introduced
in Amenta and Bern (1999) (i.e., those vertices of Vor(K ) that converge to the
medial axis of as the sample density approaches infinity), then, for λ > 0, it was
established in Zhang et al. (2015a) that
lim Mλ (x ; K ) = 0 for x ∈ V \ P .
→0+
51 Compensated Convex-Based Transforms for Image Processing and. . . 1859
Vλ, K = x ∈ Rn : λdist(x, MK ) ≤ dist(x, K) , (62)
and a sharp estimate for the Lipschitz constant of Ddist2 (·, K) was also obtained.
This result can be viewed as a weak Lusin-type theorem for the squared-distance
function which extends regularity results of the squared-distance function to any
closed non-empty subset of Rn .
Approximation Transform
{(x, fK (x)), x ∈ K} its graph, the setting for the application of the compensated
convex transforms to obtain an approximation transform is the following. Given
M > 0, we define first two functions extending fK to Rn \ K, namely,
1860 A. Orlando et al.
⎧
⎨ fK (x), x ∈ K,
fK−M (x) = f (x)χK (x) − MχRn \K =
⎩ −M, x ∈ Rn \ K ;
⎧ (64)
⎨ fK (x), x ∈ K,
fKM (x) = f (x)χK (x) + MχRn \K =
⎩ M, x ∈ Rn \ K ,
1 l M
λ (fK )(x) =
AM Cλ (fK )(x) + Cλu (fK−M )(x) , x ∈ Rn , (65)
2
which we refer to as the average compensated convex approximation transform of
fK of scale λ and level M (Zhang et al. 2016a).
In the case that K ⊂ Rn is a compact set and f : Rn → R is bounded
and uniformly continuous, error estimates are available for M → ∞ and for
x ∈ co[K]. If for x ∈ co[K] \ K we denote by rc (x) the convex density radius
as the smallest radius of a closed ball B̄(x; rc (x)) such that x is in the convex hull
of K ∩ B̄(x; rc (x)), then for λ > 0 and all x ∈ co[K] there holds
a 2b
|A∞
λ (fK )(x) − f (x)| ≤ ω rc (x) + + , (66)
λ λ
a 2b
λ (fK )(x) − f (x)|
|AM ≤ ω rc (x) + + , (67)
λ λ
51 Compensated Convex-Based Transforms for Image Processing and. . . 1861
where, as for (66), the constants a ≥ 0 and b ≥ 0 are such that ω(t) ≤ at + b
for t ≥ 0 with ω = ω(t) the least concave majorant of the modulus of continuity ωf
of f .
Both the estimates (66) and (67) can be improved for Lipschitz functions and for
C 1,1 functions.
Another natural and practical question in data approximation and interpolation is
the stability of a given method. For approximations and interpolations of sampled
functions, we would like to know, for two sample sets which are “close” to each
other under the Hausdorff distance (Ambrosio and Tilli 2004), for instance, whether
the corresponding approximations are also close to each other. It is easy to see
that differentiation- and integration-based approximation methods are not Hausdorff
stable because continuous functions can be sampled over a finite dense set. One
of the advantages of the compensated convex approximation is that for a bounded
uniformly continuous function f , and for fixed M > 0 and λ > 0, the mapping
K → AM λ (fK ) is continuous with respect to the Hausdorff distance for compact
sets K, and the continuity is uniform with respect to x ∈ Rn . This means that if
another sampled subset E ⊂ Rn (finite or compact) is close to K, then the output
AMλ (fE )(x) is close to Aλ (fK )(x) uniformly with respect to x ∈ R . As far as we
M n
1 u l M
τ,λ (fK )(x) =
(SA)M (C (C (f ))(x) + Cτl (Cλu (fK−M ))(x) , x ∈ Rn . (68)
2 τ λ K
Since the mixed compensated convex transforms are C 1,1 functions (Zhang 2008a,
Theorem 2.1(iv) and Theorem 4.1(ii)), the mixed average approximation (SA)M τ,λ
is a smooth version of our average approximation. Also, for a bounded function
f : Rn → R, satisfying |f (x)| ≤ M, x ∈ Rn for some constant M > 0, we have
the following estimates (Zhang et al. 2015b, Theorem 3.13):
16Mλ 16Mλ
0≤Cτu (Cλl (f ))(x) − Cλl (f )(x)≤ , 0≤Cλu (f )(x) − Cτl (Cλu (f ))(x)≤
τ τ
for all x ∈ Rn , λ > 0, and τ > 0 and hence we can easily show that for any closed
set K ⊂ Rn ,
16Mλ
τ,λ (fK )(x) − Aλ (fK )(x)| ≤
|(SA)M M
, x ∈ Rn .
τ
This implies that for given λ > 0 and M > 0, the mixed approximation (SA)M
τ,λ (fK )
converges to the basic average approximation Aλ (fK ) uniformly in R as τ → ∞,
M n
Fig. 18 Different
approaches for computing the
lower compensated convex
transform Cλl (f )
Numerical Algorithms
Convex-Based Algorithms
Algorithms to compute convex hull such as the ones given in Barber et al. (1996)
are more suitable for discrete set of points, and their complexity is related to the
cardinality of the set. An adaptation of these methods to our case, with the set to
convexify given by the epigraph of f + λ| · |2 , does not appear to be very effective,
especially for functions defined in subsets of Rn for n ≥ 2, compared to the methods
that (directly) compute the convex envelope of a function (Vese 1999; Oberman
2008; Contento et al. 2015).
PDE-Based Algorithm
Of particular interest for applications to image processing, where functions involved
are defined on grid of pixels, is the characterization of the convex envelope
as the viscosity solution of a nonlinear obstacle problem (Oberman 2008). An
approximated solution is then obtained by using centered finite differences along
directions defined by an associated stencil to approximate the first eigenvalue of
the Hessian matrix at the grid point. A generalization of the scheme introduced in
Oberman (2008) in terms of the number of convex combinations of the function
values at the grid points of the stencil is briefly summarized in Algorithm 1 and
described below. Given a uniform grid of points xk ∈ Rn , equally spaced with grid
size h, let us denote by Sxk the d−point stencil of Rn with center at xk . The stencil
Sxk is defined as Sxk = {xk + hr, |r|∞ ≤ 1, r ∈ Zn } where |r|∞ is the ∞ -norm of
51 Compensated Convex-Based Transforms for Image Processing and. . . 1863
r ∈ Zn and d = #(S) is the cardinality of the finite set S. At each grid point xk , we
compute an approximation of the convex envelope of f at xk by an iterative scheme
where each iteration step m is given by
(co f )m (xk ) = min f (xk ), λi (co f )m−1 (xi ) : λi = 1, λi ≥ 0, xi ∈ Sxk
with the minimum taken between f (xk ) and only some convex combinations of
(co f )m−1 at the stencil grid points xi of Sxk . It is then not difficult to show that the
scheme is monotone, thus convergent. However, there is no estimate of the rate of
convergence which, in actual applications, appears to be quite slow. Furthermore,
results are biased by the type of underlying stencil.
Biconjugate Algorithm
Based on the characterization of the convex envelope of f in terms of the
biconjugate (f ∗ )∗ of f (Hiriart-Urruty and Lemaréchal 2001; Rockafellar 1970),
where f ∗ is the Legendre-Fenchel transform of f , we can approximate the convex
envelope by computing twice the discrete Legendre-Fenchel transform. We can thus
improve speed efficiency with respect to a brute force algorithm, which computes
(f ∗ )∗ with complexity O(N 2 ) with N the number of grid points, if we have an
efficient scheme to compute the discrete Legendre-Fenchel transform of a function.
For functions f : X → R defined on Cartesian sets of the type X = ni=1 Xi
with Xi intervals of R, i = 1, . . . , n, the Legendre-Fenchel transform of f can
be reduced to the iterate evaluation of the Legendre-Fenchel transform of functions
dependent only on one variable as follows:
As a result, one can improve the complexity of the computation of f ∗ if one has an
efficient scheme to compute the Legendre-Fenchel transform of functions of only
one variable. For instance, the algorithm described in Lucet (1997) and Helluy and
Mathis (2011), which exploits an idea of Brenier (1989) and improves the imple-
mentation of Corrias (1996), computes the discrete Legendre-Fenchel transform in
linear time, that is, with complexity O(N). If gh denote the grid values of a function
of one variable, the key idea of Brenier (1989) and Corrias (1996) is to compute
(gh )∗ as approximation of g ∗ using the following result:
∗
(gh )∗ (ξ ) = co[Πfh ] (ξ ) , ξ ∈R (70)
where Πgh denotes the continuous piecewise affine interpolation of the grid values
gh . Therefore, applying an algorithm with linear complexity, for instance, the
beneath-beyond algorithm (Preparata and Shamos 1985), to compute the convex
envelope co[Πgh ], followed by the use of analytical expressions for the Legendre-
Fenchel transform of a convex piecewise affine function yields an efficient method
to compute (gh )∗ (Lucet 1997). For functions defined in a bounded domain, in
Lucet (1997), it was recommended to increment the size of the domain for a better
precision of the computation of the Legendre-Fenchel transform. The work Helluy
and Mathis (2011) avoids this by elaborating the exact expression of the Legendre-
Fenchel transform of a convex piecewise affine function defined in a bounded
domain X which is equal to infinity in R \ X, or it has therein an affine variation. In
this manner, they can avoid boundary effects. For ease of reference, we report next
the analytical expression of g ∗ in the case where g : R → R is convex piecewise
affine. Without loss of generality, let x1 < . . . < xN be a grid of points of R,
c1 < . . . < cN , and assume g : R → R to be defined as follows:
$$
g : x \in \mathbb{R} \mapsto
\begin{cases}
+\infty & \text{if } x < x_1,\\[2pt]
g_i + c_i\,(x - x_i) & \text{if } x_i \le x \le x_{i+1},\ i = 1,\ldots,N-1,\\[2pt]
g_N + c_N\,(x - x_N) & \text{if } x \ge x_N.
\end{cases}
\tag{71}
$$
Once we know g ∗ , using the decomposition (69), we can compute f ∗ and thus the
biconjugate f ∗∗ .
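As an illustration, the sketch below computes the discrete Legendre-Fenchel transform of a function of one variable in linear time, first extracting the lower convex hull of the points (x_i, g_h(x_i)) by a single left-to-right pass (the beneath-beyond idea for sorted abscissae) and then sweeping the dual slopes along the hull vertices. It is only a schematic rendering of the approach of Brenier (1989), Corrias (1996), and Lucet (1997): the strictly increasing primal grid, the increasingly sorted dual slopes, and the function names are assumptions of this sketch.

```python
import numpy as np

def lower_convex_hull(x, g):
    """Indices of the vertices of the lower convex hull of the points (x[i], g[i]);
    since x is strictly increasing, a single left-to-right pass suffices (O(N))."""
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # pop the last vertex if it lies on or above the chord joining its neighbours
            if (x[k] - x[j]) * (g[i] - g[j]) - (x[i] - x[j]) * (g[k] - g[j]) <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def discrete_lf_transform(x, g, s):
    """Discrete Legendre-Fenchel transform (g_h)^*(s_m) = max_i ( s_m*x[i] - g[i] ) for
    slopes s sorted in increasing order, computed in O(N + M) by sweeping the hull."""
    x, g, s = map(np.asarray, (x, g, s))
    idx = lower_convex_hull(x, g)
    xs, gs = x[idx], g[idx]
    out = np.empty(len(s))
    j = 0
    for m, sm in enumerate(s):
        # as the slope grows, the maximizing hull vertex can only move to the right
        while j + 1 < len(xs) and sm * xs[j + 1] - gs[j + 1] >= sm * xs[j] - gs[j]:
            j += 1
        out[m] = sm * xs[j] - gs[j]
    return out
```

Applying this one-dimensional transform coordinate by coordinate, in the spirit of the decomposition (69) (with the appropriate sign bookkeeping), and then repeating the procedure on the result yields an approximation of the biconjugate f** and hence of the convex envelope.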
$$
M_\lambda(f)(x_1,x_2) \;=\; \inf_{\xi_1\in X}\Big\{ \lambda|x_1-\xi_1|^2 + \inf_{\xi_2\in Y}\big\{ \lambda|x_2-\xi_2|^2 + f(\xi_1,\xi_2)\big\}\Big\}, \tag{74}
$$
that is, the computation of the Moreau envelope is reduced to one-dimensional problems, each amounting to the lower envelope of a family of parabolas of given curvature λ. The computation of such an envelope is realized by Felzenszwalb and Huttenlocher (2012) in two steps. In the first step, the envelope is built by adding the parabolas one at a time, which is done in linear time, each new parabola being compared with the parabolas that currently realize the envelope, which is done in (amortized) constant time; in the second step, the value of the envelope at the given point x ∈ R is computed. The key points of the scheme result from two observations. The first one is that any two parabolas of the family, say p_q and p_r associated with the grid points q and r, intersect at a single point, whose abscissa x_s has a simple closed-form expression, whereas the second one concerns the ordering of the parabolas: if q < r, then p_q(x) ≤ p_r(x) for x < x_s and p_q(x) ≥ p_r(x) for x > x_s. This scheme allows
the evaluation of M_λ(f)(x) for any x ∈ R^n even if f is defined only on a bounded open set Ω, without any consideration on how to extend f to R^n \ Ω.
This removes, in particular, any concern about the primary domain, where the Moreau envelope is defined, and the dual domain, which is the one where the Legendre-Fenchel transform is defined.
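The following minimal sketch reproduces the corresponding one-dimensional pass, i.e., the lower envelope of the parabolas of curvature λ rooted at the grid points, in the style of Felzenszwalb and Huttenlocher (2012); the uniform grid spacing h and the function name are assumptions of this illustration rather than part of the original algorithm.

```python
import numpy as np

def moreau_envelope_1d(f, lam, h=1.0):
    """Lower Moreau envelope M_lam(f)(x_i) = min_j ( lam*(x_i - x_j)**2 + f(x_j) ) on a
    uniform grid x_i = i*h, computed as the lower envelope of the parabolas rooted at
    the grid points (two-pass sweep, O(N))."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    v = np.zeros(n, dtype=int)        # indices of the parabolas forming the envelope
    z = np.empty(n + 1)               # abscissae separating their regions of dominance
    z[0], z[1] = -np.inf, np.inf
    k = 0
    for q in range(1, n):
        while True:
            # abscissa x_s where the parabola rooted at q meets the one rooted at v[k]
            s = ((f[q] + lam * (q * h) ** 2) - (f[v[k]] + lam * (v[k] * h) ** 2)) \
                / (2.0 * lam * h * (q - v[k]))
            if s > z[k]:
                break
            k -= 1                    # the new parabola hides the previous one entirely
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, np.inf
    out = np.empty(n)
    k = 0
    for i in range(n):                # second pass: evaluate the envelope at each grid point
        while z[k + 1] < i * h:
            k += 1
        out[i] = lam * (i * h - v[k] * h) ** 2 + f[v[k]]
    return out
```

In two dimensions, the same pass is applied first along one coordinate and then along the other, consistently with the decomposition (74).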
Numerical Examples
Given the singleton set K = {0} ⊂ R², the analytical expression of C^u_λ(χ_K) established in Zhang et al. (2015c, Example 1.2) is given by
$$
C^u_\lambda(\chi_K)(x) =
\begin{cases}
0, & \text{if } |x| > 1/\sqrt{\lambda},\\[2pt]
\lambda\big(1/\sqrt{\lambda} - |x|\big)^2, & \text{if } |x| \le 1/\sqrt{\lambda}.
\end{cases}
\tag{77}
$$
We then compute C^u_λ(χ_K) by applying the convex-based algorithms, i.e., Algorithm 1 (Oberman 2008) and the biconjugate-based scheme (BS hereafter) (Lucet 1997; Helluy and Mathis 2011), and the Moreau-based algorithms, i.e., Algorithm 2 and the parabola envelope scheme (PES hereafter) (Felzenszwalb and Huttenlocher 2012). To compare the accuracy of the schemes, we consider (i) the Hausdorff distance between the supports of the exact and the computed upper transform,
$$
e_H = \operatorname{dist}_H\!\Big( B\big(0;\,1/\sqrt{\lambda}\big),\ \operatorname{sprt}\, C^{u,h}_\lambda(\chi_K) \Big)
$$
with C^{u,h}_λ(χ_K) the computed upper compensated transform; (ii) the relative L^∞ error norm e_{L^∞} between the exact and the computed transform; and (iii) the execution time t_c in seconds on a PC with an Intel® Core™ i7-4510U processor and 8 GB of RAM.
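For concreteness, the following sketch evaluates the exact transform (77) on a meshgrid together with the two error measures used below; the helper names are hypothetical, and the normalization of e_{L∞} by the maximum of the exact transform is an assumption of this sketch, since the precise definition is not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def exact_upper_transform(X, Y, lam):
    """Exact C^u_lambda(chi_K) for K = {0}, from (77), on the meshgrid arrays X, Y."""
    r = np.sqrt(X ** 2 + Y ** 2)
    return np.where(r > 1.0 / np.sqrt(lam), 0.0, lam * (1.0 / np.sqrt(lam) - r) ** 2)

def error_measures(X, Y, u_exact, u_num, tol=1e-10):
    """Relative L-infinity error and symmetric Hausdorff distance between the supports."""
    e_linf = np.max(np.abs(u_num - u_exact)) / np.max(np.abs(u_exact))
    supp_exact = np.column_stack([X[u_exact > tol], Y[u_exact > tol]])
    supp_num = np.column_stack([X[u_num > tol], Y[u_num > tol]])
    e_h = max(directed_hausdorff(supp_exact, supp_num)[0],
              directed_hausdorff(supp_num, supp_exact)[0])
    return e_linf, e_h
```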
Figure 19 displays the support of C^u_λ(χ_K), given by B(0; 1/√λ), and of C^{u,h}_λ(χ_K) computed by the numerical schemes mentioned above. Algorithm 2 and the parabola envelope scheme yield the same results; thus, Fig. 19 displays the support as computed by only one of the two schemes. In this case, we observe that the support
Fig. 19 Supports of the exact and computed upper compensated transform of the characteristic function of a singleton set of R² by the different numerical schemes. (a) Exact support given by B(0; 1/√λ) for λ = 0.01; (b) support of C^{u,h}_λ(χ_K) computed by Algorithm 1 (Oberman 2008); (c) support of C^{u,h}_λ(χ_K) computed by the biconjugate-based scheme (Lucet 1997; Helluy and Mathis 2011) for h_d = 0.001; (d) support of C^{u,h}_λ(χ_K) computed by Algorithm 2 (Zhang et al. 2021), which coincides with the one computed using the parabola envelope scheme (Felzenszwalb and Huttenlocher 2012)
coincides with the exact one. This does not happen for the support computed by the
other two schemes. The application of Algorithm 1 evidences the bias of the scheme towards the underlying stencil, whereas with the biconjugate-based scheme we note a small error spread all over the domain, whose extent depends on the dual mesh grid size h_d. Table 1 reports the values of t_c, e_{L∞} and e_H for the different schemes. For the biconjugate-based scheme, we report different results according to the parameter h_d that controls the uniform discretization of the dual mesh. The value h_d = 1 means that we are considering the same grid size as the grid of the input function χ_K, whereas lower values of h_d mean that we are computing on a finer dual mesh compared to the primal one. The results given in Table 1 show that, in terms of the values of C^u_λ(χ_K), the biconjugate-based scheme produces the best results (compare the values of e_{L∞}), but this comes at the cost of reducing h_d, that is, of increasing the number of dual grid nodes and consequently the computational time. The effect of the choice of the dual grid on the accuracy of the computation of the convex envelope via the conjugate has also been recognized and addressed in Contento et al. (2015). However, as already pointed out in the analysis of Fig. 19, the support of C^{u,h}_λ(χ_K) computed by the biconjugate scheme is the one that yields the worst value of e_H.
Table 1 Comparison between the different numerical schemes for the computation of C^u_λ(χ_K) for λ = 0.01. The symbol h_d refers to the dual mesh size of the scheme that computes the convex envelope via the biconjugate

                                                  t_c        e_{L∞}    e_H
Convex-based      Algorithm 1                     1.9791     0.0390    1.7321
schemes           Biconjugate     h_d = 1         0.1575    48         9.4999
                  scheme          h_d = 0.1       0.2157     0.2400    9
                                  h_d = 0.01      0.5935     0.0142    7.6158
                                  h_d = 0.001    16.6603     0.0032    7.5498
Moreau-based      Algorithm 2                     0.1246     0.0249    0
schemes           PE scheme                       0.2553     0.0249    0
Fig. 20 (a) Medial axis of the road network; (b) location of the intersection points; (c) map of
the road network and location of the intersection points shown in (b)
Figure 21a shows a Plücker surface together with the location of its singular lines and the parts of the surface with higher curvature.
Figure 21b depicts the intersections between manifolds of different dimensions,
namely, in the figure, we have the Whitney umbrella of the implicit equation
x 2 = y 2 z, a cylinder, and a helix, with the location of their mutual intersections
and also of where the Whitney surface intersects itself; finally, Fig. 21c displays the
intersection between a cylinder, planes, and a helix.
The intersection of the line with the plane for the geometry shown in Fig. 21 is weaker than the geometric singularities of the surfaces. In this sense, the values of the local maxima of Iλ(·; K) determine a scale between the different types
Fig. 21 (a) Plücker surface with identification of its singular lines and surface parts of higher curvature; (b) intersections of the Whitney surface of equation x^2 = y^2 z with a helix and a cylinder; (c) intersections of planes with a cylinder and a helix
Fig. 22 (a) A sampled sphere and a cylinder that are "almost" tangentially intersecting, with indication of the intersection marker; (b) intersection markers for the intersection among loosely sampled piecewise affine surfaces of equation ||10x − 75| − |10y − 75| + |10z − 75| − 45| = 0, the circle of equation (10x − 75)² + (10z − 75)² ≤ 45² on the plane of equation y = 75, and the line of equations x = 75, z = 75
of intersections present in the manifold K and represent the multiscale nature of the
filter Iλ (·; K).
Finally, the numerical experiments displayed in Fig. 22 refer to critical conditions
that are not directly covered by the theoretical results we have obtained. Figure 22a
shows the result of the application of Iλ (·; K) to a sphere and a cylinder that are
“almost” tangentially intersecting each other, whereas Fig. 22b illustrates the results
of the application of the filter to detect the intersection between loosely sampled
piecewise affine functions, a plane and a line.
Fig. 23 Reconstruction of real-world digital elevation maps. (a) Ground truth model from USGS-SRTM1 data relative to the area with geographical coordinates [N 40° 23′ 25″, N 40° 27′ 37″] × [E 14° 47′ 25″, E 14° 51′ 37″]; (b) sample set K1 formed only by level lines at a regular height interval of 58.35 m. The set K1 contains 14% of the ground truth points; (c) sample set K2 formed by taking randomly 30% of the points belonging to the level lines of the set K1 and scattered points corresponding to a 5% density. The sample set K2 contains 7% of the ground truth points
Approximation Transform
Fig. 24 Reconstruction of real-world digital elevation maps. (a) Graph of A^M_λ(f_K) for sample set K1. Relative L²-errors: ε_Ω = 0.0118, ε_K = 0. Parameters: λ = 2·10³, M = 1·10⁶. Total number of iterations: 3818; (b) graph of A^M_λ(f_K) for sample set K2. Relative L²-errors: ε_Ω = 0.0109, ε_K = 0.
The first sample set, denoted by K1, is formed by the level lines at a regular height interval of 58.35 m and contains 14% of the ground truth real digital data. The second sample set, denoted by K2, has been formed by taking randomly 30% of the points belonging to the level lines of the set K1 and scattered points corresponding to a 5% density, so that the
sample set K2 amounts to about 7% of the ground truth points. The two sample sets
K1 and K2 are shown in Fig. 23b and c, respectively.
The graphs of the A^M_λ(f_K) interpolant and of the AMLE interpolant for the two sample sets, along with the respective isolines at equally spaced heights of 58.35 m, are displayed in Figs. 24 and 25, respectively, whereas Table 2 contains the values of the relative L²-errors ε_Ω on Ω and ε_K on the sample set K between such interpolants and the ground truth model, given by, respectively,
$$
\varepsilon_\Omega = \frac{\|f - A^M_\lambda(f_K)\|_{L^2(\Omega)}}{\|f\|_{L^2(\Omega)}}
\qquad\text{and}\qquad
\varepsilon_K = \frac{\|f_K - A^M_\lambda(f_K)\|_{L^2(K)}}{\|f_K\|_{L^2(K)}},
\tag{78}
$$
where f is the ground truth model and A^M_λ(f_K) is the average approximation of the sample f_K of f over K. We observe that while A^M_λ(f_K) yields an exact interpolation of f_K over Ω, this is not the case for the AMLE approximation.
Though both reconstructions are visually comparable to the ground truth model, a closer inspection of the pictures shows that, in the reconstruction from the synthetic data, the AMLE interpolant does not reconstruct the mountain peaks correctly, which appear smoothed, and it introduces artificial ridges along the slopes of the mountains.
Fig. 25 Reconstruction of real-world digital elevation maps. (a) Graph of the AMLE interpolant from set K1. Relative L²-errors: ε_Ω = 0.0410, ε_K = 0.0110. Total number of iterations: 11542; (b) graph of the AMLE interpolant from set K2. Relative L²-errors: ε_Ω = 0.02863, ε_K = 0.0109. Total number of iterations: 12457; (c) isolines of the AMLE interpolant from sample set K1 at regular heights of 58.35 m; (d) isolines of the AMLE interpolant from sample set K2 at regular heights of 58.35 m
Table 2 Relative L²-errors for the DEM reconstruction from the two sample sets using the A^M_λ(f_K) and the AMLE interpolant. The value ε_K = 0 for A^M_λ(f_K) indicates that A^M_λ(f_K) interpolates the sample data exactly
Salt & pepper noise is caused, for instance, by faulty memory locations in hardware, so that information is lost at the faulty pixels and the corrupted pixels are set alternately to the minimum or to the maximum value of the range of the image values. When the noise density is low, roughly below 40%, the median filter (Astola et al. 1997) or its improved version, the adaptive median filter (Hwang and Haddad 1995), is quite effective at restoring the image. However, these filters lose their denoising power at higher noise densities, since details and features of the original image are smeared out. In those cases, other techniques must
be applied; one possibility is the two-stage TV-based method proposed in Chan et al. (2005), which consists of first applying an adaptive median filter to identify the pixels that are likely to contain noise, thus constructing a starting guess that is used in the second stage for the minimization of a suitable TV-type functional. The quality of the restored images is assessed by the peak signal-to-noise ratio
$$
\mathrm{PSNR} = 10\,\log_{10}\frac{255^2}{\dfrac{1}{mn}\displaystyle\sum_{i,j}\big|f_{i,j} - r_{i,j}\big|^2}
\tag{79}
$$
where f_{i,j} and r_{i,j} denote the pixel values of the original and restored image, respectively, and m, n denote the dimensions of the image f. In our numerical experiments, we have considered the following cases. The first one assumes the set K
to be given by the noise-free interior pixels of the corrupted image together with
Fig. 26 (a) Original image; (b) original image corrupted by salt & pepper noise with a density of 70%. PSNR = 6.426 dB; (c) restored image A^M_λ(f_K) by the Moreau-based scheme (Algorithm 2) with the set K padded by two pixels. PSNR = 26.020 dB. λ = 20, M = 1E13. Total number of iterations: 21; (d) restored image A^M_λ(f_K) by the convex-based scheme (Algorithm 1) with the set K padded by two pixels. PSNR = 26.642 dB. λ = 20, M = 1E13. Total number of iterations: 1865; (e) restored image by the adaptive median filter (Hwang and Haddad 1995) used as starting guess for the two-stage TV-based method described in Cai et al. (2007) and Chan et al. (2005). Window size w = 33 pixels. PSNR = 22.519 dB; (f) restored image by the two-stage TV-based method described in Cai et al. (2007) and Chan et al. (2005) with the set K padded by two pixels. PSNR = 26.475 dB. Total number of iterations: 3853
the boundary pixels of the original image. In the second case, K is just the set of
the noise-free pixels of the corrupted image, without any special consideration on
the image boundary pixels. In analyzing this second case, to reduce the boundary
effects produced by the application of Algorithms 1 and 2, we have applied our
method to an enlarged image and then restricted the resulting restored image to the
original domain. The enlarged image has been obtained by padding a fixed number
of pixels before the first image element and after the last image element along each
dimension, making mirror reflections with respect to the boundary. The values used
for padding are all from the corrupted image. In our examples, we have considered
two versions of enlarged images, obtained by padding the corrupted image with
two pixels and ten pixels, respectively. Tables 3, 4, and 5 compare the PSNR values of the images restored by our method and by the TV-based method, applied to the corrupted image with noise-free boundary and to the two versions of the enlarged images, whose boundary values are given by the padded noisy image data. We observe no significant variation in the denoising results between the different ways of treating the image boundary; this is also reflected by the close PSNR values of the resulting restored images. For 70% salt &
pepper noise, Fig. 26c and d display the restored images A^M_λ(f_K) obtained by Algorithms 2 and 1, respectively, with the set K padded by two pixels,
whereas Fig. 26e and f show the restored image by the adaptive median filter and the
TV-based method (Cai et al. 2007; Chan et al. 2005) using the same set K. Although the visual quality of the images restored from 70% noise corruption is comparable between our method and the TV-based method, the PSNR obtained with our method and Algorithm 2 is higher than that of the TV-based method in all of the experiments reported in Tables 3, 4, and 5. An additional advantage of our method is its speed. Moreover, our method does not require an initialization, in contrast with the two-stage TV-based method, whose initialization is given, for instance, by the image restored with an adaptive median filter.
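The experimental setup just described can be sketched as follows; the 8-bit range used to detect the corrupted pixels, the mirror-reflection padding mode, and the function names are assumptions of this illustration and not the exact code used for the experiments reported above.

```python
import numpy as np

def noise_free_set_and_padding(f_noisy, pad=2):
    """Mask of the presumed noise-free pixels (values strictly between 0 and 255) and the
    image enlarged by mirror reflection of the corrupted data across the boundary."""
    K = (f_noisy > 0) & (f_noisy < 255)
    f_pad = np.pad(f_noisy, pad, mode="symmetric")
    K_pad = np.pad(K, pad, mode="symmetric")
    return K_pad, f_pad

def psnr(f, r):
    """Peak signal-to-noise ratio (79) between the original f and the restored image r."""
    mse = np.mean((np.asarray(f, dtype=float) - np.asarray(r, dtype=float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```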
Inpainting
Inpainting is the problem where we are given an image that is damaged in some
parts and we want to reconstruct the values in the damaged part on the basis of
the known values of the image. This topic has attracted a lot of interest, especially as an application of TV-related models (Chan and Shen 2005; Schönlieb 2015). The main motivation is that functions of bounded variation provide the appropriate functional setting, given that such functions are allowed to have jump discontinuities
Fig. 27 Restoration of a 90% corrupted image (PSNR = 5.372 dB) with the set K padded by two pixels. (a) Restored image A^M_λ(f_K) by the Moreau-based scheme (Algorithm 2). PSNR = 22.654 dB. λ = 10, M = 1e13. Total number of iterations: 32; (b) restored image A^M_λ(f_K) by the convex-based scheme (Algorithm 1). PSNR = 23.078 dB. λ = 10, M = 1e13. Total number of iterations: 10445; (c) restored image by the two-stage TV-based method described in Cai et al. (2007) and Chan et al. (2005). PSNR = 22.459 dB. Total number of iterations: 2679. Restoration of a 99% corrupted image (PSNR = 4.938 dB), with the set K padded by ten pixels. (d) Restored image A^M_λ(f_K) by the Moreau-based scheme (Algorithm 2). PSNR = 18.026 dB. λ = 2, M = 1e13. Total number of iterations: 78; (e) restored image A^M_λ(f_K) by the convex-based scheme (Algorithm 1). PSNR = 18.342 dB. λ = 2, M = 1e13. Total number of iterations: 54823; (f) restored image by the two-stage TV-based method described in Cai et al. (2007) and Chan et al. (2005). PSNR = 17.330 dB. Total number of iterations: 13125
(Ambrosio et al. 2000). These authors usually argue that continuous functions
cannot be used to model digital image-related functions as functions representing
images may have jumps (Chan and Shen 2005), which are associated with the image
features. However, from the human vision perspective, it is hard to distinguish
between a jump discontinuity, where values change abruptly, and a continuous
function with sharp changes within a very small transition layer. By the application
of our compensated convex-based average transforms, we are adopting the latter
point of view. A comprehensive study of this theory applied to image inpainting
can be found in Zhang et al. (2016a, 2018) where we also establish error estimates
for our inpainting method and compare with the error analysis for image inpainting
discussed in Chan and Kang (2006). We note that for the relaxed Dirichlet problem
of the minimal graph (Chan and Kang 2006) or of the TV model used in Chan and
Fig. 28 Inpainting of a text overprinted on an image. (a) Input image; (b) restored image A^M_λ(f_K) using Algorithm 2. PSNR = 39.122 dB. Parameters: λ = 18 and M = 1·10⁵. Total number of iterations: 19; (c) restored image by the AMLE method described in Schönlieb (2015) and Parisotto and Schönlieb (2016). PSNR = 36.406 dB. Total number of iterations: 5247; (d) restored image by the split Bregman inpainting method described in Getreuer (2012). PSNR = 39.0712 dB. Total number of iterations: 19
Kang (2006), as the boundary value of the solution does not have to agree with
the original boundary value, extra jumps can be introduced along the boundary. By
comparison, since our average approximation is continuous, it will not introduce
such a jump discontinuity at the boundary.
To assess the performance of our reconstruction compared to state-of-the-art inpainting methods, we consider a synthetic example where we are given an image f and we overprint some text on it. The problem is then to remove the text overprinted on the image displayed in Fig. 28a and to see how close we can get to the original
Fig. 29 Comparison of a detail of the original image with the corresponding detail of the restored images according to the compensated convexity method and the TV-based method. Lips detail of the original image (a) without and (b) with overprinted text. Lips detail of the (c) restored image A^M_λ(f_K) using Algorithm 2; (d) AMLE-based restored image; (e) TV-based restored image