Duality Based Algorithms For Total Variation Regularized Image Restoration
Abstract Image restoration models based on total variation (TV) have become popular since their introduction by Rudin, Osher, and Fatemi (ROF) in 1992. The dual formulation of this model has a quadratic objective with separable constraints, making projections onto the feasible set easy to compute. This paper proposes application of gradient projection (GP) algorithms to the dual formulation. We test variants of GP with different step length selection and line search strategies, including techniques based on the Barzilai-Borwein method. Global convergence can in some cases be proved by appealing to existing theory. We also propose a sequential quadratic programming (SQP) approach that takes account of the curvature of the boundary of the dual feasible set. Computational experiments show that the proposed approaches perform well in a wide range of applications and that some are significantly faster than previously proposed methods, particularly when only modest accuracy in the solution is required.
Keywords Image Denoising · Constrained Optimization · Gradient Projection
Computational and Applied Mathematics Report 08-33, UCLA, October, 2008 (Revised).
This work was supported by National Science Foundation grants DMS-0610079, DMS-0427689, CCF-0430504, CTS-0456694, and CNS-0540147, and Office of Naval Research grant N00014-06-1-0345. The work was performed while T. Chan was on leave at the National Science Foundation as Assistant Director of Mathematics and Physical Sciences.
M. Zhu
Mathematics Department, UCLA, Box 951555, Los Angeles, CA 90095-1555, USA
E-mail: [email protected]
S. J. Wright
Department of Computer Sciences, University of Wisconsin, 1210 W. Dayton Street, Madison, WI 53705, USA
E-mail: [email protected]
T. F. Chan
Mathematics Department, UCLA, Box 951555, Los Angeles, CA 90095-1555, USA
E-mail: [email protected]
1 Introduction
1.1 Background
The above problem yields the same solution as (1) for a suitable choice of the
Lagrange multiplier λ (see [6]).
Recently, many researchers have proposed algorithms that make use of the
dual formulation of the ROF model; see, for example, [8], [4], and [5]. To derive
this form, we first note that the TV semi-norm has the following equivalent forms:
\[
\int_\Omega |\nabla u| \;=\; \max_{w \in C_0^1(\Omega),\ |w| \le 1} \int_\Omega \nabla u \cdot w \;=\; \max_{|w| \le 1} \int_\Omega -u \, \nabla \cdot w, \tag{3}
\]
where u and w are the primal and dual variables, respectively. The min-max
theorem (see e.g., [12, Chapter VI, Proposition 2.4]) allows us to interchange
the min and max, to obtain
\[
\max_{w \in C_0^1(\Omega),\ |w| \le 1} \; \min_u \; \int_\Omega -u \, \nabla \cdot w + \frac{\lambda}{2} \|u - f\|_2^2 .
\]
The inner minimization over u is attained at
\[
u = f + \frac{1}{\lambda} \nabla \cdot w, \tag{4}
\]
leading to the following dual formulation:
"
2 #
λ 2
1
max D(w) := kf k2 −
∇ · w + f
, (5)
1
w∈C0 (Ω), |w|≤1 2
λ 2
or, equivalently,
\[
\min_{w \in C_0^1(\Omega),\ |w| \le 1} \; \frac{1}{2} \|\nabla \cdot w + \lambda f\|_2^2 . \tag{6}
\]
For a primal-dual feasible pair (u, w), the duality gap G(u, w) is defined to
be the difference between the primal and the dual objectives, G(u, w) := P(u) − D(w),
where P denotes the objective in (2) and D the objective in (5).
The duality gap bounds the distance to optimality of the primal and dual
objectives. Specifically, if u and w are feasible for the primal (2) and dual (5)
problems, respectively, we have
Before describing the numerical algorithms, let us fix our main notational
conventions.
Often in this paper we need to concatenate vectors and matrices, in both
column-wise and row-wise fashion. We follow the MATLAB convention of using
“,” for adjoining vectors and matrices in a row, and “;” for adjoining them in
a column. Thus, for any vectors x, y and z, the following are synonymous:
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = (x^T, y^T, z^T)^T = (x; y; z).
\]
For simplicity, we assume that the domain Ω is the unit square [0, 1]×[0, 1],
and define a discretization via a regular n × n grid of pixels, indexed as (i, j),
for i = 1, 2, . . . , n, j = 1, 2, . . . , n. The index (i, j) represents the point (i/(n +
1), j/(n + 1)) ∈ Ω. We represent images as two-dimensional matrices
of dimension n × n, where ui,j represents the value of the function u at the
point indexed by (i, j). (Adaptation to less regular domains is not difficult
in principle.) To define the discrete total variation, we introduce a discrete
gradient operator, whose two components at each pixel (i, j) are defined as
follows:
\[
(\nabla u)^1_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < n \\ 0 & \text{if } i = n, \end{cases} \tag{9a}
\]
\[
(\nabla u)^2_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < n \\ 0 & \text{if } j = n. \end{cases} \tag{9b}
\]
(and similarly for objects in R^{n×n×2}), we have from the definition of the discrete
divergence operator that, for any u ∈ R^{n×n} and w ∈ R^{n×n×2}, ⟨∇u, w⟩ =
⟨u, −∇ · w⟩. It is easy to check that the divergence operator can be defined
explicitly as follows:
explicitly as follows:
\[
(\nabla \cdot w)_{i,j} \;=\;
\begin{cases} w^1_{i,j} - w^1_{i-1,j} & \text{if } 1 < i < n \\ w^1_{i,j} & \text{if } i = 1 \\ -w^1_{i-1,j} & \text{if } i = n \end{cases}
\;+\;
\begin{cases} w^2_{i,j} - w^2_{i,j-1} & \text{if } 1 < j < n \\ w^2_{i,j} & \text{if } j = 1 \\ -w^2_{i,j-1} & \text{if } j = n. \end{cases}
\]
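To make the discretization concrete, here is a minimal MATLAB sketch of the discrete gradient (9a)-(9b) and the divergence operator above; the function names grad2d and div2d and the storage of w as an n × n × 2 array are our own conventions, not notation from the paper.

function w = grad2d(u)
% Discrete gradient (9a)-(9b): forward differences, zero at the last row/column.
  n = size(u,1);
  w = zeros(n,n,2);
  w(1:n-1,:,1) = u(2:n,:) - u(1:n-1,:);   % (grad u)^1: difference in i, zero when i = n
  w(:,1:n-1,2) = u(:,2:n) - u(:,1:n-1);   % (grad u)^2: difference in j, zero when j = n
end

function d = div2d(w)
% Discrete divergence, the negative adjoint of grad2d: <grad2d(u), w> = <u, -div2d(w)>.
  n = size(w,1);
  d = zeros(n,n);
  d(1,:)     = w(1,:,1);
  d(2:n-1,:) = w(2:n-1,:,1) - w(1:n-2,:,1);
  d(n,:)     = -w(n-1,:,1);
  d(:,1)     = d(:,1)     + w(:,1,2);
  d(:,2:n-1) = d(:,2:n-1) + w(:,2:n-1,2) - w(:,1:n-2,2);
  d(:,n)     = d(:,n)     - w(:,n-1,2);
end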
\[
v_{(j-1)n+i} = u_{i,j}, \qquad 1 \le i, j \le n.
\]
Using this notation, the discrete version of the primal ROF model (2) can be
written as follows:
\[
\min_v \; \sum_{l=1}^{N} \|A_l^T v\|_2 + \frac{\lambda}{2} \|v - g\|_2^2 . \tag{11}
\]
The complete vector x ∈ R2N of unknowns for the discretized dual problem
is then obtained by concatenating these subvectors: x = (x1 ; x2 ; . . . ; xN ). We
also form the matrix A by concatenating the matrices Al , l = 1, 2, . . . , N
defined in (10), that is, A = (A1 , . . . , AN ) ∈ RN ×2N . In this notation, the
divergence ∇ · w is simply −Ax, so the discretization of the dual ROF model
(6) is
\[
\min_{x \in X} \; \frac{1}{2} \|Ax - \lambda g\|_2^2, \qquad X := \{ x = (x_1; \dots; x_N) \in \mathbb{R}^{2N} : \|x_l\|_2 \le 1, \ l = 1, \dots, N \}. \tag{12}
\]
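For readers who want the matrix A of (12) explicitly, the sparse construction below is one possible MATLAB sketch: its transpose applies the discrete gradient (9a)-(9b) under the vectorization v_{(j−1)n+i} = u_{i,j}. Note that it stacks the two gradient components blockwise rather than interleaving them per pixel as in x = (x_1; . . . ; x_N), so a fixed permutation relates this ordering to the one used in the text; the grid size n is an arbitrary example value.

n = 128;                                         % example grid size (not from the paper)
e = ones(n,1);
D = spdiags([-e e], [0 1], n, n);                % 1-D forward difference
D(n,:) = 0;                                      % zero last row, as in (9a)-(9b)
G = [kron(speye(n), D); kron(D, speye(n))];      % discrete gradient, size 2N-by-N (N = n^2)
A = G';                                          % A*x equals minus the discrete divergence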
Here we make several remarks on the discretized problems (11), (12) and prove
a general convergence result. It is easy to verify that both problems can be
obtained from the function ` : RN × X → R defined as follows:
\[
\ell(v, x) := x^T A^T v + \frac{\lambda}{2} \|v - g\|_2^2 . \tag{13}
\]
The primal problem (11) is simply
It is easy to verify that the conditions (H1), (H2), (H3), and (H4) of [15,
pp. 333-334] are satisfied by this setting. Thus, it follows from [15, Chap-
ter VII, Theorem 4.3.1] that ` has a nonempty compact convex set of saddle
points (v̄, x̄) ∈ RN × X. Moreover, from [15, Chapter VII, Theorem 4.2.5], the
point (v̄, x̄) ∈ RN × X is a saddle point if and only if v̄ solves (11) and x̄ solves
(12).
Note that by strict convexity of the objective in (11), the solution v̄ of
(11) is in fact uniquely defined. For any saddle point (v̄, x̄), we have that
`(v̄, x̄) ≤ `(v, x̄) for all v ∈ RN , that is, v̄ is a minimizer of `(·, x̄). Thus, from
optimality conditions for `(·, x̄), the following relationship is satisfied for the
unique solution v̄ of (11) and for any solution x̄ of (12):
\[
\bar v = g - \frac{1}{\lambda} A \bar x .
\]
By uniqueness of v̄, it follows that Ax̄ is constant for all solutions x̄ of (12).
The following general convergence result will be useful in our analysis of
algorithms in Section 2.
Proposition 1 Let {xk } be any sequence with xk ∈ X for all k = 1, 2, . . .
such that all accumulation points of {xk } are stationary points of (12). Then
the sequence {v k } defined by
\[
v^k = g - \frac{1}{\lambda} A x^k \tag{15}
\]
converges to the unique solution v̄ of (11).
Proof Note first that all stationary points of (12) are in fact (global) solutions
of (12), by convexity.
Suppose for contradiction that v k ↛ v̄. Then we can choose ε > 0 and a
subsequence S such that kv k − v̄k2 ≥ ε for all k ∈ S. Since all xk belong to
the bounded set X, the sequence {xk } is bounded, so {v k } is bounded also.
In particular, the subsequence {v k }k∈S must have an accumulation point v̂,
We briefly review here some of the many algorithms that have been proposed
for solving the primal formulation (2) of the ROF model, the dual formulation
(6), or both formulations simultaneously. We refer the interested readers to [9]
for a more comprehensive survey.
In their original paper [17], ROF proposed a time-marching scheme that
solves the associated Euler-Lagrange equation of (2) by seeking the steady-
state solution of a parabolic PDE. The method is (asymptotically) slow due
to the CFL stability constraint (see [16]), which puts a tight bound on the
time step when the solution develops flat regions (where |∇u| ≈ 0). Hence, this
scheme is useful in practice only when low-accuracy solutions suffice. Even for
an accuracy sufficient to yield a visually satisfactory result, the cost is often
too great.
In [19], Vogel and Oman proposed to solve the same Euler-Lagrange equa-
tion of (2) via fixed-point iteration. Their main idea is to fix the diffusion
coefficient 1/|∇u| in the Euler-Lagrange equation to its value at a previous step,
thus obtaining the solution to the nonlinear equation by solving a sequence of
linear systems. They prove global convergence and show that their method is
asymptotically much faster than the explicit time-marching scheme.
Chan, Golub, and Mulet [8] (CGM) use Newton’s method to solve a smoothed
version of the primal-dual system for the ROF model, in which the gradient
norm |∇u| is replaced by a smoothed approximation |∇u|_β = (|∇u|² + β)^{1/2},
for some smoothing parameter β > 0. Since this approach is based on Newton's
method, it converges quadratically, but to a solution of the smoothed
approximate model rather than to the solution of (2). Smaller values of β
yield approximate solutions that are closer to the true solution, but more iter-
ations are required before the onset of asymptotic quadratic convergence. The
cost per iteration is similar to that of the fixed-point iteration scheme.
Hintermüller and Stadler [14] (HS) discuss an infeasible-interior-point method
for a modification of (2) in which a term µ∫_Ω |∇u|² dx is added, for some
small but positive µ. By perturbing the dual of their problem with a regular-
ization term, then applying a semismooth Newton method to the primal-dual
Most existing numerical algorithms to solve ROF models (2) or (6) can be
loosely divided into two categories: those that need to solve a linear system
of equations at each iteration (implicit) and those that require only a matrix-
vector multiplication in the discrete setting (explicit). Generally speaking, the
implicit methods (e.g. CGM, HS, and SOCP) have fast asymptotic conver-
gence rates and can provide highly accurate benchmark solutions. However,
explicit methods are preferred in many situations for their simplicity and their
ability to reach moderately accurate, visually satisfactory results with relatively
little computational effort. Their low memory requirements make them even
more attractive for large-scale problems. To illustrate the high memory require-
ments of implicit schemes, we note that an image of size 512 × 512 is close to
the limit of what the SOCP solver MOSEK can handle on a workstation with
2GB of memory.
In the remainder of this paper, we report on the development, implementa-
tion, and testing of some simple but fast explicit algorithms. These algorithms
are based on the dual formulation (6), so they do not require any numerical
smoothing parameters that would prevent them from converging to the true
optimizer. Our proposed approaches are for the most part gradient projection
algorithms applied to (6), in which the search path from each iterate is ob-
tained by projecting negative-gradient (steepest descent) directions onto the
feasible set. Various enhancements involving different step-length rules and
From now on, we will focus on the solution of problem (12), which we restate
here:
\[
\min_{x \in X} \; F(x) := \frac{1}{2} \|Ax - \lambda g\|_2^2, \tag{20}
\]
where the compact set X ⊂ R2N is defined in (12). In this section, we discuss
GP techniques for solving this problem. Our approaches move from iterate
xk to the next iterate xk+1 using the scheme (18)-(19). The projection PX onto
the set X, a Cartesian product of unit Euclidean balls, can be computed
straightforwardly as follows:
\[
P_X(x)_l = \frac{x_l}{\max\{\|x_l\|, 1\}}, \qquad l = 1, 2, \dots, N. \tag{21}
\]
This operation projects each 2 × 1 subvector of x separately onto the unit
ball in R2 . It is worth pointing out here that this structure of the dual con-
straints, which makes the gradient projection approach practical, also enables
Chambolle to develop an analytical formula for the Lagrange multipliers in
[5].
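As an illustration, the following MATLAB sketch of (21) assumes the dual variable is stored as an N × 2 array whose l-th row is x_l; the function name project_dual is ours.

function x = project_dual(x)
% Projection (21): scale each 2-vector x_l back onto the unit Euclidean ball.
  nrm = sqrt(sum(x.^2, 2));     % N-by-1 vector of norms ||x_l||
  x = x ./ max(nrm, 1);         % leaves x_l unchanged when ||x_l|| <= 1
end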
Our approaches below differ in their rules for choosing the step-length
parameters αk and γk in (18) and (19).
Framework GP-NoLS
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.
Step 1. Choose step length αk ∈ [αmin , αmax ].
Step 2. Set xk+1 = xk (αk ).
Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1
and go to Step 1.
Framework GP-ProjArc
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax ,
and choose ρ ∈ (0, 1) and µ ∈ (0, 12 ). Choose x0 and set k ← 0.
Step 1. Choose initial step length ᾱk ∈ [αmin , αmax ].
Step 2. Backtracking Line Search. Choose reference value Frk , and set m to
the smallest nonnegative integer such that
F (xk (ρm ᾱk )) ≤ Frk − µ∇F (xk )T (xk − xk (ρm ᾱk )),
where xk (α) is defined as in (19);
Set αk = ρm ᾱk and xk+1 = xk (αk ).
Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1
and go to Step 1.
Framework GP-LimMin
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.
Step 1. Choose step length αk ∈ [αmin , αmax ]. Compute xk (αk ) and set δ k :=
(xk (αk ) − xk ).
Step 2. Limited Minimizing Line Search. Set xk+1 = xk + γk δ k , with γk =
mid(0, γk,opt , 1) and
\[
\gamma_{k,\mathrm{opt}} = \arg\min_{\gamma} F(x^k + \gamma \delta^k) = \frac{-(\delta^k)^T \nabla F(x^k)}{\|A \delta^k\|_2^2}, \tag{23}
\]
so the Lipschitz constant for ∇F is kAT Ak2 , which is bounded by 8 (see [5,
p. 92]). It follows immediately from [2, Proposition 2.3.2] that every accumu-
lation point of {xk } is stationary for (20) provided that 0 < α < .25. The
result now follows immediately from Proposition 1.
which yields
\[
\alpha_{k,1} = \frac{\|\Delta x^{k-1}\|_2^2}{\langle \Delta x^{k-1}, \Delta g^{k-1} \rangle} = \frac{\|\Delta x^{k-1}\|_2^2}{\|A \Delta x^{k-1}\|_2^2}. \tag{24}
\]
An alternative formula is obtained similarly, by doing a least-squares fit to α
rather than α^{-1}, to obtain
\[
\alpha_{k,2} = \arg\min_{\alpha \in \mathbb{R}} \|\Delta x^{k-1} - \alpha \Delta g^{k-1}\|_2^2 = \frac{\langle \Delta x^{k-1}, \Delta g^{k-1} \rangle}{\|\Delta g^{k-1}\|_2^2} = \frac{\|A \Delta x^{k-1}\|_2^2}{\|A^T A \Delta x^{k-1}\|_2^2}. \tag{25}
\]
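To fix ideas, the sketch below combines framework GP-NoLS with the BB step length (24), working directly with w ∈ R^{n×n×2} (recall Ax = −∇·w) and reusing the grad2d, div2d, and project_dual sketches given earlier. The function name, the fixed iteration count, and the safeguarded first step are our own choices rather than the exact settings used in the experiments.

function w = gpbb_dual_rof(g, lambda, maxit)
% Gradient projection with BB steps for the dual problem (12), in image form.
  n = size(g,1);
  w = zeros(n,n,2);                               % starting point x^0 = 0
  amin = 1e-5;  amax = 1e5;
  alpha = 0.2;                                    % first step; fixed-step GP needs alpha < 0.25
  grd = grad2d(-div2d(w) - lambda*g);             % gradient A'(Ax - lambda*g)
  for k = 1:maxit
    wold = w;  gold = grd;
    w = w - alpha*grd;                            % steepest-descent step
    w = reshape(project_dual(reshape(w, [], 2)), n, n, 2);   % projection (21)
    grd = grad2d(-div2d(w) - lambda*g);
    dw = w - wold;  dg = grd - gold;
    alpha = sum(dw(:).^2) / max(sum(dw(:).*dg(:)), eps);     % BB formula (24)
    alpha = min(max(alpha, amin), amax);          % safeguard in [amin, amax]
  end
end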
\[
\bar\alpha_k = \frac{(g^k)^T \nabla F(x^k)}{\|A g^k\|_2^2} = \frac{\|g^k\|_2^2}{\|A g^k\|_2^2}.
\]
In practice, we find that using (1/2)ᾱk as the initial value gives better performance,
and backtracking is not necessary in any of our numerical experiments.
\[
\alpha_{ml+i} = \alpha^{\mathrm{BB}}_{ml+1} \qquad \text{for } l = 0, 1, 2, \dots \text{ and } i = 1, 2, \dots, m-1,
\]
where α^{BB}_{ml+1} is obtained from (24) with k = ml + 1, restricted to the interval
[αmin, αmax].
where γk,opt is obtained from the limited minimization rule (23). We refer
interested readers to [18] for the rationale of the criterion. In any case, the
chosen αk is adjusted to ensure that it lies in the interval [αmin , αmax ].
where the scalars zl are Lagrange multipliers for the constraints kxl k22 ≤ 1,
l = 1, 2, . . . , N , and the operator ⊥ indicates that at least one of its two
operands must be zero. At iteration k, we compute an estimate of the active
set Ak ⊂ {1, 2, . . . , N }, which are those indices for which we believe that
kxl k22 = 1 at the solution. In our implementation, we choose this set as follows:
The SQP step is a Newton-like step for the following system of nonlinear
equations, from the current estimates x^k and z_l^k, l = 1, 2, . . . , N:
Using z̃_l^{k+1} to denote the values of z_l at the next iterate, and d̃^k to denote
the step in x^k, a "second-order" step can be obtained from (27) by solving the
following system for d̃^k and z̃_l^{k+1}, l = 1, 2, . . . , N:
\[
A_l^T A \tilde d^k + 2 \tilde z_l^{k+1} \tilde d_l^k = -A_l^T [A x^k - \lambda g] - 2 x_l^k \tilde z_l^{k+1}, \qquad l = 1, 2, \dots, N, \tag{28a}
\]
\[
2 (x_l^k)^T \tilde d_l^k = 0, \qquad l \in A_k, \tag{28b}
\]
\[
\tilde z_l^{k+1} = 0, \qquad l \notin A_k. \tag{28c}
\]
\[
\alpha_k^{-1} d_l^k + 2 z_l^{k+1} d_l^k = -A_l^T [A x^k - \lambda g] - 2 x_l^k z_l^{k+1}, \qquad l = 1, 2, \dots, N, \tag{29a}
\]
\[
2 (x_l^k)^T d_l^k = 0, \qquad l \in A_k, \tag{29b}
\]
\[
z_l^{k+1} = 0, \qquad l \notin A_k. \tag{29c}
\]
Considering indices l ∈ A_k, we take the inner product of (29a) with x_l^k and
use (29b) and (26) to obtain an explicit formula for z_l^{k+1}. We then obtain the
steps d_l^k for these indices by substituting this expression into (29a):
\[
d_l^k = -\left( \alpha_k^{-1} + 2 z_l^{k+1} \right)^{-1} \left( A_l^T (A x^k - \lambda g) + 2 x_l^k z_l^{k+1} \right), \qquad l \in A_k.
\]
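A componentwise MATLAB sketch of this step is given below, for a given active-set estimate. The multiplier formula used on the active set is our own derivation from (29a)-(29b) (we do not reproduce the formula obtained via (26)), and the N × 2 storage of x and ∇F as well as the function name are assumptions.

function [d, z] = sqp_step(x, gradF, alpha, active)
% Solve (29a)-(29c) with the Hessian blocks replaced by alpha^{-1} I.
% x, gradF: N-by-2 arrays (rows x_l and A_l'(Ax - lambda*g)); active: logical N-by-1.
  N = size(x,1);
  z = zeros(N,1);
  d = -alpha * gradF;                            % inactive indices: z_l = 0 by (29c)
  xa = x(active,:);  ga = gradF(active,:);
  za = -sum(xa.*ga, 2) ./ (2*sum(xa.^2, 2));     % from (29a)-(29b): z_l = -x_l'g_l / (2*||x_l||^2)
  da = -(ga + 2*xa.*za) ./ (1/alpha + 2*za);     % back-substitute into (29a)
  z(active) = za;  d(active,:) = da;
end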
4 Termination
Since |∇u∗ | = ∇u∗ · w∗ and |∇u| ≥ ∇u · w for any feasible w (since |w| ≤ 1),
we have
\[
\begin{aligned}
\lambda \|u - u^*\|_2^2 &= \int_\Omega (u - u^*)(\nabla \cdot w - \nabla \cdot w^*) \\
&= \int_\Omega u \, \nabla \cdot w - \int_\Omega u \, \nabla \cdot w^* - \int_\Omega u^* \, \nabla \cdot w + \int_\Omega u^* \, \nabla \cdot w^* \\
&= \int_\Omega -\nabla u \cdot w + \int_\Omega \nabla u \cdot w^* + \int_\Omega \nabla u^* \cdot w - \int_\Omega |\nabla u^*| \\
&\le \int_\Omega -\nabla u \cdot w + \int_\Omega |\nabla u| + \int_\Omega |\nabla u^*| (|w| - 1) \\
&\le \int_\Omega (|\nabla u| - \nabla u \cdot w) \\
&= G(u, w).
\end{aligned} \tag{32}
\]
Using this bound, we obtain the following bound when (u, w) satisfies (31):
\[
\|u - u^*\|_2 \le \sqrt{G(u, w)/\lambda} \le \sqrt{(|P(u)| + |D(w)|)\,\mathrm{tol}/\lambda}.
\]
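For the discretized problem, this test is cheap to evaluate from the current dual iterate. The MATLAB sketch below assumes the relative-gap criterion (31) has the form G(u, w) ≤ tol · (|P(u)| + |D(w)|), recovers u from w via (4)/(15), and reuses the grad2d and div2d sketches from above; the function name gap_test is our own.

function stop = gap_test(w, g, lambda, tol)
% Relative duality-gap stopping test for the discretized ROF model.
  u  = g + div2d(w)/lambda;                    % primal iterate induced by w, cf. (4) and (15)
  gu = grad2d(u);
  tv = sum(sum(sqrt(sum(gu.^2, 3))));          % discrete TV semi-norm of u
  P  = tv + (lambda/2)*sum((u(:) - g(:)).^2);  % primal objective (11)
  D  = (lambda/2)*(sum(g(:).^2) - sum(u(:).^2));   % dual objective, discrete analogue of (5)
  G  = P - D;                                  % duality gap
  stop = (G <= tol*(abs(P) + abs(D)));
end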
5 Computational Experiments
Fig. 1 The original clean images for our test problems (left) alongside noisy images generated by adding Gaussian noise with variance 0.01 using the MATLAB function imnoise. Top: 128 × 128 "shape"; middle: 256 × 256 "cameraman"; bottom: 512 × 512 "Barbara".
small and αmax sufficiently large.) In Algorithm GPLS, we used ρ = 0.5 and
µ = 10−4 . In Algorithm GPABB, we set γl = 0.1 and γu = 5. In GPBB(safe),
we set µ = 10−4 and M = 5 in the formula (22).
We also tried variants of the GPBB methods in which the initial choice
of αk was scaled by a factor of 0.5 at every iteration. We found that this
variant often enhanced performance. This fact is not too surprising, as we
can see from Section 3 that the curvature of the boundary of the constraint set X
suggests that it is appropriate to add positive diagonal elements to the Hessian
approximation, which corresponds to decreasing the value of αk .
In the CGM implementation, we used a direct solver for the linear system
at each iteration, as the conjugate gradient iterative solver (which is an option
in the CGM code) was slower on these examples. The smoothing parameter β
is updated dynamically from iteration to iteration based on the duality gap. In
particular, we take β0 = 100 and let βk = βk−1 (Gk /Gk−1)², where Gk and
Gk−1 are the duality gaps for the past two iterations. This simple strategy for
updating β, which is borrowed from interior-point methods, outperforms the
classical CGM approach, producing faster decrease in the duality gap.
All methods are coded in MATLAB and executed on a Dell Precision T5400
workstation with a 2.66 GHz Intel quad-core processor and 4 GB of main memory.
It is likely the performance can be improved by recoding the algorithms in C
or C++, but we believe that improvements would be fairly uniform across all
the algorithms.
Tables 1, 2, and 3 report the number of iterations and average CPU times over
ten runs, where each run adds a different random noise vector to the true
image. We used the starting point x0 = 0 in each algorithm and
the relative duality gap stopping criterion (31). We vary the threshold tol
from 10^-2 to 10^-6, where smaller values of tol yield more accurate solutions
to the optimization formulation.
Figure 2 shows the denoised images obtained at different values of tol.
Note that there is little visual difference between the results obtained with
the two tolerance values 10^-2 and 10^-4. Smaller values of tol do not produce
further visual differences. By varying λ slightly we can obtain better visual
results for these images, but the visual quality of the computed solution still
does not improve markedly as tol is reduced below 10^-2.
The tables show that on all problems, the proposed gradient projection
algorithms are competitive with Chambolle's method, and that some variants
are significantly faster, especially when moderate accuracy is required for
the solutions. Three variants stood out as good performers: the GPBB-NM
and GPABB variants, along with the GPBB-M(3) variant in which the initial
choice of αk was scaled by 0.5 at each iteration. For all tests with tol = 10−2 ,
tol = 10−3 , and tol = 10−4 , the winner was one of the gradient-projection
Barzilai-Borwein strategies.
For these low-to-moderate accuracy requirements, CGM is generally slower
than the gradient-based methods, particularly on the larger problems. The
picture changes somewhat, however, when high accuracy (tol = 10^-6) is required.
Fig. 2 The denoised images for different termination tolerances. Left column: tol = 10^-2; right column: tol = 10^-4.
Table 1 Number of iterations and CPU times (in seconds) for problem 1. ∗ = initial αk
scaled by 0.5 at each iteration.
                 tol = 10^-2     tol = 10^-3     tol = 10^-4      tol = 10^-6
Algorithm        Iter    Time    Iter    Time    Iter    Time     Iter     Time
Chambolle          18    0.06     164    0.50    1091    3.34    22975    74.26
GPCL               36    0.11     134    0.38     762    2.22    15410    46.34
GPLS               22    0.14     166    1.02     891    5.63    17248   113.84
GPBB-M             12    0.05     148    0.61     904    3.75    16952    72.82
GPBB-M(3)          13    0.05      69    0.28     332    1.32     4065    16.48
GPBB-M(3)∗         11    0.05      47    0.18     188    0.75     2344     9.50
GPBB-NM            10    0.04      49    0.18     229    0.85     3865    14.66
GPABB              13    0.07      54    0.26     236    1.13     2250    10.90
GPBB(safe)         10    0.05      50    0.21     209    0.98     3447    17.32
SQPBB-M            11    0.07      48    0.32     170    1.20     3438    25.05
CGM                 5    1.13       9    2.04      12    2.73       18     4.14
Table 2 Number of iterations and CPU times (in seconds) for problem 2. ∗ = initial αk
scaled by 0.5 at each iteration.
                 tol = 10^-2     tol = 10^-3     tol = 10^-4      tol = 10^-6
Algorithm        Iter    Time    Iter    Time    Iter    Time     Iter     Time
Chambolle          26    0.36     165    2.21     813   11.03    14154   193.08
GPCL               32    0.41     116    1.49     535    7.03     9990   132.77
GPLS               24    0.55     134    3.84     583   17.48    11070   341.37
GPBB-M             20    0.38     124    2.35     576   10.93    10644   203.83
GPBB-M(3)          20    0.36      98    1.78     333    6.09     3287    60.44
GPBB-M(3)∗         17    0.31      47    0.86     167    3.05     1698    31.17
GPBB-NM            16    0.27      53    0.91     183    3.15     2527    43.80
GPABB              16    0.36      47    1.02     158    3.46     1634    35.85
GPBB(safe)         16    0.29      52    0.96     170    3.45     2294    51.87
SQPBB-M            14    0.37      48    1.36     169    5.16     2537    79.83
CGM                 5    5.67       9   10.37      13   15.10       19    22.28
[Figure 3: three plots of relative duality gap (log scale) versus CPU time, one per test problem, with curves for BB-NM, Chambolle, and CGM.]
Fig. 3 Duality gap vs. CPU time for the GPBB-NM, Chambolle, and CGM codes, for Problems 1, 2, and 3, respectively.
Table 3 Number of iterations and CPU times (in seconds) for problem 3. ∗ = initial αk
scaled by 0.5 at each iteration.
                 tol = 10^-2     tol = 10^-3     tol = 10^-4      tol = 10^-6
Algorithm        Iter    Time    Iter    Time    Iter    Time     Iter     Time
Chambolle          27    2.35     131   11.43     544   47.16     8664   751.59
GPCL               24    2.02      80    6.83     337   28.76     5907   505.86
GPLS               40    5.86      89   14.72     324   57.87     5780  1058.59
GPBB-M             20    2.44      88   10.82     352   43.33     5644   695.89
GPBB-M(3)          20    2.38      59    7.09     197   23.73     2598   313.59
GPBB-M(3)∗         17    2.03      41    4.94     131   15.77     1191   143.66
GPBB-NM            15    1.67      40    4.58     117   13.20     1513   171.21
GPABB              14    1.99      38    5.41     117   16.76     1135   162.54
GPBB(safe)         15    1.74      39    4.70     112   14.61     1425   204.50
SQPBB-M            14    2.35      35    6.24      93   17.19     1650   312.16
CGM                 6   42.38      10   67.65      13   87.66       19   126.68
Acknowledgements We thank the referees for several suggestions that improved the pa-
per.
References
1. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA Journal of
Numerical Analysis 8, 141–148 (1988)
2. Bertsekas, D.P.: Nonlinear Programming, second edn. Athena Scientific (1999)
3. Birgin, E.G., Martı́nez, J.M., Raydan, M.: Nonmonotone spectral projected gradient
methods on convex sets. SIAM Journal on Optimization 10(4), 1196–1211 (2000)
4. Carter, J.L.: Dual method for total variation-based image restoration. Report 02-13,
UCLA CAM (2002)
5. Chambolle, A.: An algorithm for total variation minimization and applications. Journal
of Mathematical Imaging and Vision 20, 89–97 (2004)
6. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related
problems. Numerische Mathematik 76, 167–188 (1997)
7. Chan, T.F., Esedoglu, S., Park, F., Yip, A.: Total variation image restoration: Overview
and recent developments. In: N. Paragios, Y. Chen, O. Faugeras (eds.) Handbook of
Mathematical Models in Computer Vision. Springer (2005)
8. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation
based image restoration. SIAM Journal on Scientific Computing 20, 1964–1977 (1999)
9. Chan, T.F., Zhu, M.: Fast algorithms for total variation-based image processing. In:
Proceedings of the 4th ICCM. Hangzhou, China (2007)
10. Dai, Y.H., Fletcher, R.: Projected Barzilai-Borwein methods for large-scale box-
constrained quadratic programming. Numerische Mathematik 100, 21–47 (2005)
11. Dai, Y.H., Hager, W.W., Schittkowski, K., Zhang, H.: The cyclic Barzilai-Borwein
method for unconstrained optimization. IMA Journal of Numerical Analysis 26, 604–
627 (2006)
12. Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. SIAM Classics in
Applied Mathematics. SIAM (1999)
13. Goldfarb, D., Yin, W.: Second-order cone programming methods for total variation-
based image restoration. SIAM Journal on Scientific Computing 27, 622–645 (2005)
14. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for TV-based inf-
convolution-type image restoration. SIAM Journal on Scientific Computing 28, 1–23
(2006)
15. Hiriart-Urruty, J., Lemaréchal, C.: Convex Analysis and Minimization Algorithms,
vol. I. Springer-Verlag, Berlin (1993)
16. Osher, S., Marquina, A.: Explicit algorithms for a new time dependent model based on
level set motion for nonlinear deblurring and noise removal. SIAM Journal on Scientific
Computing 22, 387–405 (2000)
17. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algo-
rithms. Physica D 60, 259–268 (1992)
18. Serafini, T., Zanghirati, G., Zanni, L.: Gradient projection methods for large quadratic
programs and applications in training support vector machines. Optimization Methods
and Software 20(2–3), 353–378 (2004)
19. Vogel, C.R., Oman, M.E.: Iterative methods for total variation denoising. SIAM Journal
on Scientific Computing 17, 227–238 (1996)
20. Wang, Y., Ma, S.: Projected Barzilai-Borwein methods for large-scale nonnegative image
restoration. Inverse Problems in Science and Engineering 15(6), 559–583 (2007)