0% found this document useful (0 votes)

24 views20 pages

Beck 2009

Uploaded by

consultas.artinchik

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views20 pages

Beck 2009

Uploaded by

consultas.artinchik

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

SIAM J.

IMAGING SCIENCES
c 2009 Society for Industrial and Applied Mathematics
Vol. 2, No. 1, pp. 183–202

A Fast Iterative Shrinkage-Thresholding Algorithm

for Linear Inverse Problems∗
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Amir Beck† and Marc Teboulle‡

Abstract. We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse
problems arising in signal/image processing. This class of methods, which can be viewed as an ex-
tension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for
solving large-scale problems even with dense matrix data. However, such methods are also known to
converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm
(FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence
which is proven to be significantly better, both theoretically and practically. Initial promising nu-
merical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is
shown to be faster than ISTA by several orders of magnitude.

Key words. iterative shrinkage-thresholding algorithm, deconvolution, linear inverse problem, least squares and
l1 regularization problems, optimal gradient method, global rate of convergence, two-step iterative
algorithms, image deblurring

AMS subject classifications. 90C25, 90C06, 65F22

DOI. 10.1137/080716542

1. Introduction. Linear inverse problems arise in a wide range of applications such as

astrophysics, signal and image processing, statistical inference, and optics, to name just a
few. The interdisciplinary nature of inverse problems is evident through a vast literature
which includes a large body of mathematical and algorithmic developments; see, for instance,
the monograph [13] and the references therein.
A basic linear inverse problem leads us to study a discrete linear system of the form
(1.1) Ax = b + w,
where A ∈ Rm×n and b ∈ Rm are known, w is an unknown noise (or perturbation) vector,
and x is the “true” and unknown signal/image to be estimated. In image blurring problems,
for example, b ∈ Rm represents the blurred image, and x ∈ Rn is the unknown true image,
whose size is assumed to be the same as that of b (that is, m = n). Both b and x are
formed by stacking the columns of their corresponding two-dimensional images. In these
applications, the matrix A describes the blur operator, which in the case of spatially invariant
blurs represents a two-dimensional convolution operator. The problem of estimating x from
the observed blurred and noisy image b is called an image deblurring problem.
∗
Received by the editors February 25, 2008; accepted for publication (in revised form) October 23, 2008; published
electronically March 4, 2009. This research was partially supported by the Israel Science Foundation, ISF grant 489-
06.
http://www.siam.org/journals/siims/2-1/71654.html
†
Department of Industrial Engineering and Management, Technion–Israel Institute of Technology, Haifa 32000,
Israel ([email protected].).
‡
School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel ([email protected].).
183

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

184 AMIR BECK AND MARC TEBOULLE

1.1. Background. A classical approach to problem (1.1) is the least squares (LS) approach
[4] in which the estimator is chosen to minimize the data error:

(LS): x̂LS = argmin Ax − b2 .

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

When m = n (as is the case in some image processing applications) and A is nonsingular, the
LS estimate is just the näive solution A−1 b. In many applications, such as image deblurring,
it is often the case that A is ill-conditioned [22], and in these cases the LS solution usually has
a huge norm and is thus meaningless. To overcome this diﬃculty, regularization methods are
required to stabilize the solution. The basic idea of regularization is to replace the original ill-
conditioned problem with a “nearby” well-conditioned problem whose solution approximates
the required solution. One of the popular regularization techniques is Tikhonov regularization
[33] in which a quadratic penalty is added to the objective function:

(1.2) (T): x̂TIK = argmin{Ax − b2 + λLx2 }.

The second term in the above minimization problem is a regularization term that controls the
norm (or seminorm) of the solution. The regularization parameter λ > 0 provides a tradeoff
between fidelity to the measurements and noise sensitivity. Common choices for L are the
identity or a matrix approximating the first or second order derivative operator [19, 21, 17].
Another regularization method that has attracted a revived interest and considerable
amount of attention in the signal processing literature is l1 regularization in which one seeks
to find the solution of

(1.3) min{F (x) ≡ Ax − b2 + λx1 },

where x1 stands for the sum of the absolute values of the components of x; see, e.g.,
[15, 32, 10, 16]. More references on earlier works promoting the use of l1 regularization,
as well as its relevance to other research areas, can be found in the recent work [16]. In
image deblurring applications, and in particular in wavelet-based restoration methods, A is
often chosen as A = RW, where R is the blurring matrix and W contains a wavelet basis
(i.e., multiplying by W corresponds to performing inverse wavelet transform). The vector x
contains the coeﬃcients of the unknown image. The underlying philosophy in dealing with
the l1 norm regularization criterion is that most images have a sparse representation in the
wavelet domain. The presence of the l1 term is used to induce sparsity in the optimal solution
of (1.3); see, e.g., [11, 8]. Another important advantage of the l1 -based regularization (1.3)
over the l2 -based regularization (1.2) is that as opposed to the latter, l1 regularization is less
sensitive to outliers, which in image processing applications correspond to sharp edges.
The convex optimization problem (1.3) can be cast as a second order cone programming
problem and thus could be solved via interior point methods [1]. However, in most appli-
cations, e.g., in image deblurring, the problem is not only large scale (can reach millions of
decision variables) but also involves dense matrix data, which often precludes the use and
potential advantage of sophisticated interior point methods. This motivated the search of
simpler gradient-based algorithms for solving (1.3), where the dominant computational eﬀort

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 185

is a relatively cheap matrix-vector multiplication involving A and AT ; see, for instance, the
recent study [16], where problem (1.3) is reformulated as a box-constrained quadratic prob-
lem and solved by a gradient projection algorithm. One of the most popular methods for
solving problem (1.3) is in the class of iterative shrinkage-thresholding algorithms (ISTA),
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where each iteration involves matrix-vector multiplication involving A and AT followed by a

shrinkage/soft-threshold step;1 see, e.g., [7, 15, 10, 34, 18, 35]. Speciﬁcally, the general step
of ISTA is

(1.4) xk+1 = T λt xk − 2tAT (Axk − b) ,

where t is an appropriate stepsize and T α : Rn → Rn is the shrinkage operator deﬁned by

(1.5) T α (x)i = (|xi | − α)+ sgn (xi ).

In the optimization literature, this algorithm can be traced back to the proximal forward-
backward iterative scheme introduced in [6] and [30] within the general framework of splitting
methods; see [14, Chapter 12] and the references therein for a very good introduction to this
approach, including convergence results. Another interesting recent contribution including
very general convergence results for the sequence xk produced by proximal forward-backward
algorithms under various conditions and settings relevant to linear inverse problems can be
found in [9].
1.2. Contribution. The convergence analysis of ISTA has been well studied in the lit-
erature under various contexts and frameworks, including various modiﬁcations; see, e.g.,
[15, 10, 9] and the references therein, with a focus on establishing conditions under which the
sequence {xk } converges to a solution of (1.3). The advantage of ISTA is in its simplicity.
However, ISTA has also been recognized as a slow method. The very recent manuscript [5]
provides further rigorous grounds to that claim by proving that under some assumptions on
the operator A the sequence {xk } produced by ISTA shares an asymptotic rate of convergence
that can be very slow and arbitrarily bad (for details, see in particular Theorem 3 and the
conclusion in [5, section 6]).
In this paper, we focus on the nonasymptotic global rate of convergence and eﬃciency
of methods like ISTA measured through function values. Our development and analysis will
consider the more general nonsmooth convex optimization model

(1.6) min{F (x) ≡ f (x) + g(x)},

where f, g are convex functions, with g possibly nonsmooth (see section 2.2 for a precise
description). Basically, the general step of ISTA is of the form

xk+1 = T λt (G(xk )),

where G(·) stands for a gradient step of the ﬁt-to-data LS term in (1.3) and ISTA is an
extension of the classical gradient method (see section 2 for details). Therefore, ISTA belongs
1
Other names in the signal processing literature include, for example, threshold Landweber method, iterative
denoising, and deconvolution algorithms.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

186 AMIR BECK AND MARC TEBOULLE

to the class of first order methods, that is, optimization methods that are based on function
values and gradient evaluations. It is well known that for large-scale problems first order
methods are often the only practical option, but as alluded to above it has been observed that
the sequence {xk } converges quite slowly to a solution. In fact, as a first result we further
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

conﬁrm this property by proving that ISTA behaves like

F (xk ) − F (x∗ ) O(1/k),

namely, shares a sublinear global rate of convergence.

The important question then is whether we can devise a faster method than the iterative
shrinkage-thresholding scheme described above, in the sense that the computational effort of
the new method will keep the simplicity of ISTA, while its global rate of convergence will be
significantly better, both theoretically and practically. This is the main contribution of this
paper which answers this question affirmatively. To achieve this goal, we consider a method
which is similar to ISTA and of the form

xk+1 = T λt (G(yk )),

where the new point yk will be smartly chosen and easy to compute; see section 4. This idea
builds on an algorithm which is not so well known and which was introduced and developed by
Nesterov in 1983 [27] for minimizing a smooth convex function, and proven to be an “optimal”
ﬁrst order (gradient) method in the sense of complexity analysis [26].
Here, the problem under consideration is convex but nonsmooth, due to the l1 term.
Despite the presence of a nonsmooth regularizer in the objective function, we prove that
we can construct a faster algorithm than ISTA, called FISTA, that keeps its simplicity but
shares the improved rate O(1/k2 ) of the optimal gradient method devised earlier in [27] for
minimizing smooth convex problems. Our theoretical analysis is general and can handle an
objective function with any convex nonsmooth regularizers (beyond l1 ) and any smooth convex
function (instead of the LS term), and constraints can also be handled.
1.3. Some recent algorithms accelerating ISTA. Very recently other researchers have
been working on alternative algorithms that could speed up the performance of ISTA. Like
FISTA proposed in this paper, these methods also rely on computing the next iterate based
not only on the previous one, but on two or more previously computed iterates. One such line
of research was very recently considered in [3], where the authors proposed an interesting two-
step ISTA (TWIST) which, under some assumptions on the problem’s data and appropriately
chosen parameters deﬁning the algorithm, is proven to converge to a minimizer of an objective
function of the form

(1.7) Ax − b2 + ϕ(x),

where ϕ is a convex nonsmooth regularizer. The eﬀectiveness of TWIST as a faster method

than ISTA was demonstrated experimentally on various linear inverse problems [3].
Another line of analysis toward an acceleration of ISTA for the same class of problems
(1.7) was recently considered in [12] by using sequential subspace optimization techniques
and relying on generating the next iterate by minimizing a function over an aﬃne subspace

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 187

spanned by two or more previous iterates and the current gradient. The speedup gained
by this approach has been shown through numerical experiments for denoising application
problems. For both of these recent methods [3, 12], global nonasymptotic rate of convergence
has not been established.
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

After this paper was submitted for publication we recently became aware2 of a very recent
unpublished manuscript by Nesterov [28], who has independently investigated a multistep
version of an accelerated gradient-like method that also solves the general problem model (1.6)
and, like FISTA, is proven to converge in function values as O(1/k2 ), where k is the iteration
counter. While both algorithms theoretically achieve the same global rate of convergence, the
two schemes are remarkably different both conceptually and computationally. In particular,
the main differences between FISTA and the new method proposed in [28] are that (a) on
the building blocks of the algorithms, the latter uses an accumulated history of the past
iterates to build recursively a sequence of estimate functions ψk (·) that approximates F (·),
while FISTA uses just the usual projection-like step, evaluated at an auxiliary point very
specially constructed in terms of the two previous iterates and an explicit dynamically updated
stepsize; (b) the new Nesterov’s method requires two projection-like operations per iteration,
as opposed to one single projection-like operation needed in FISTA. As a consequence of the
key differences between the building blocks and iterations of FISTA versus the new method of
[28], the theoretical analysis and proof techniques developed here to establish the global rate
convergence rate result are completely different from that given in [28].
1.4. Outline of the paper. In section 2, we recall some basic results pertinent to gradient-
based methods and provide the building blocks necessary to the analysis of ISTA and, more
importantly, of FISTA. Section 3 proves the aforementioned slow rate of convergence for ISTA,
and in section 4 we present the details of the new algorithm FISTA and prove the promised
faster rate of convergence. In section 5 we present some preliminary numerical results for
image deblurring problems, which demonstrate that FISTA can be even faster than the proven
theoretical rate and can outperform ISTA by several orders of magnitude, thus showing the
potential promise of FISTA. To gain further insights into the potential of FISTA we have also
compared it with the recent algorithm TWIST of [3]. These preliminary numerical results
show evidence that FISTA can also be faster than TWIST by several orders of magnitude.
Notation. The inner product of two vectors x, y ∈ Rn is denoted by x, y = xT y. For a
matrix A, the maximum eigenvalue is denoted by λmax (A). For a vector x, x denotes the
Euclidean norm of x. The spectral norm of a matrix A is denoted by A.
2. The building blocks of the analysis. In this section we first recall some basic facts on
gradient-based methods. We then formulate our problem and establish in Lemma 2.3 a result
which will play a central role in the global convergence rate of analysis of the algorithms under
study.
2.1. Gradient methods and ISTA. Consider the unconstrained minimization problem of
a continuously differentiable function f : Rn → R:

(U) min{f (x) : x ∈ Rn }.

2
We are also grateful to a referee for pointing out to us Nesterov’s manuscript [28].

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

188 AMIR BECK AND MARC TEBOULLE

One of the simplest methods for solving (U) is the gradient algorithm which generates a
sequence {xk } via

(2.1) x0 ∈ Rn , xk = xk−1 − tk ∇f (xk−1 ),

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

where tk > 0 is a suitable stepsize. It is very well known (see, e.g., [31, 2]) that the gradient
iteration (2.1) can be viewed as a proximal regularization [24] of the linearized function f at
xk−1 , and written equivalently as

1
xk = argmin f (xk−1 ) + x − xk−1 , ∇f (xk−1 ) + x − xk−1 2 .
x 2tk
Adopting this same basic gradient idea to the nonsmooth l1 regularized problem

(2.2) min{f (x) + λx1 : x ∈ Rn }

leads to the iterative scheme

1 2
xk = argmin f (xk−1 ) + x − xk−1 , ∇f (xk−1 ) + x − xk−1 + λx1 .
x 2tk
After ignoring constant terms, this can be rewritten as

1 2
(2.3) xk = argmin x − (xk−1 − tk ∇f (xk−1 )) + λx1 ,
x 2tk

which is a special case of the scheme introduced in [30, model (BF), p. 384] for solving (2.2).
Since the l1 norm is separable, the computation of xk reduces to solving a one-dimensional
minimization problem for each of its components, which by simple calculus produces

xk = T λtk (xk−1 − tk ∇f (xk−1 )),

where T α : Rn → Rn is the shrinkage operator given in (1.5).

Thus, with f (x) := Ax − b2 , the popular ISTA is recovered as a natural extension of a
gradient-based method. As already mentioned in the introduction, for solving the l1 problem
(1.3), ISTA has been developed and analyzed independently through various techniques by
many researchers. A typical condition ensuring convergence of xk to a minimizer x∗ of (1.3)
is to require that tk ∈ (0, 1/AT A). For example, this follows as a special case of a more
general result which can be found in [14, Theorem 12.4.6] (see also Chapter 12 of [14] and its
references for further details, modiﬁcations, and extensions).
2.2. The general model. As recalled above, the basic idea of the iterative shrinkage
algorithm is to build at each iteration a regularization of the linearized diﬀerentiable function
part in the objective. For the purpose of our analysis, we consider the following general
formulation which naturally extends the problem formulation (1.3):

(2.4) (P) min{F (x) ≡ f (x) + g(x) : x ∈ Rn }.

The following assumptions are made throughout the paper:

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 189

• g : Rn → R is a continuous convex function which is possibly nonsmooth.

• f : Rn → R is a smooth convex function of the type C1,1 , i.e., continuously diﬀeren-
tiable with Lipschitz continuous gradient L(f ):
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

∇f (x) − ∇f (y) ≤ L(f )x − y for every x, y ∈ Rn ,

where · denotes the standard Euclidean norm and L(f ) > 0 is the Lipschitz constant
of ∇f .
• Problem (P) is solvable, i.e., X∗ := argmin F = ∅, and for x∗ ∈ X∗ we set F∗ := F (x∗ ).
Example 2.1. When g(x) ≡ 0, (P) is the general unconstrained smooth convex minimiza-
tion problem.
Example 2.2. The l1 regularization problem (1.3) is obviously a special instance of problem
(P) by substituting f (x) = Ax − b2 , g(x) = x1 . The (smallest) Lipschitz constant of the
gradient ∇f is L(f ) = 2λmax (AT A).
2.3. The basic approximation model. In accordance with the basic results recalled in
section 2.1, we adopt the following approximation model. For any L > 0, consider the following
quadratic approximation of F (x) := f (x) + g(x) at a given point y:
L
(2.5) QL (x, y) := f (y) + x − y, ∇f (y) + x − y2 + g(x),
2
which admits a unique minimizer

(2.6) pL (y) := argmin{QL (x, y) : x ∈ Rn }.

Simple algebra shows that (ignoring constant terms in y)

2
L 1

pL (y) = argmin g(x) + x − y − ∇f (y) .
x 2 L

Clearly, the basic step of ISTA for problem (P) thus reduces to

xk = pL (xk−1 ),

where L plays the role of a stepsize. Even though in our analysis we consider a general
nonsmooth convex regularizer g(x) in place of the l1 norm, we will still refer to this more
general method as ISTA.
2.4. The two pillars. Before proceeding with the analysis of ISTA we establish a key
result (see Lemma 2.3 below) that will be crucial for the analysis of not only ISTA but also
the new faster method introduced in section 4. For that purpose we first need to recall the
first pillar, which is the following well-known and fundamental property for a smooth function
in the class C 1,1 ; see, e.g., [29, 2].
Lemma 2.1. Let f : Rn → R be a continuously differentiable function with Lipschitz con-
tinuous gradient and Lipschitz constant L(f ). Then, for any L ≥ L(f ),
L
(2.7) f (x) ≤ f (y) + x − y, ∇f (y) + x − y2 for every x, y ∈ Rn .
2

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

190 AMIR BECK AND MARC TEBOULLE

We also need the following simple result which characterize the optimality of pL (·).
Lemma 2.2. For any y ∈ Rn , one has z = pL (y) if and only if there exists γ(y) ∈ ∂g(z),
the subdiﬀerential of g(·), such that
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

(2.8) ∇f (y) + L(z − y) + γ(y) = 0.

Proof. The proof is immediate from optimality conditions for the strongly convex problem
(2.6).
We are now ready to state and prove the promised key result.
Lemma 2.3. Let y ∈ Rn and L > 0 be such that
(2.9) F (pL (y)) ≤ Q(pL (y), y).
Then for any x ∈ Rn ,
L
F (x) − F (pL (y)) ≥ pL (y) − y2 + Ly − x, pL (y) − y.
2
Proof. From (2.9), we have
(2.10) F (x) − F (pL (y)) ≥ F (x) − Q(pL (y), y).
Now, since f, g are convex we have
f (x) ≥ f (y) + x − y, ∇f (y),
g(x) ≥ g(pL (y)) + x − pL (y), γ(y),
where γ(y) is defined in the premise of Lemma 2.2. Summing the above inequalities yields
(2.11) F (x) ≥ f (y) + x − y, ∇f (y) + g(pL (y)) + x − pL (y), γ(y).
On the other hand, by the definition of pL (y) one has
L
(2.12) Q(pL (y), y) = f (y) + pL (y) − y, ∇f (y) + pL (y) − y2 + g(pL (y)).
2
Therefore, using (2.11) and (2.12) in (2.10) it follows that
L
F (x) − F (pL (y)) ≥ − pL (y) − y2 + x − pL (y), ∇f (y) + γ(y)
2
L
= − pL (y) − y2 + Lx − pL (y), y − pL (y),
2
L
= pL (y) − y2 + Ly − x, pL (y) − y,
2
where in the first equality above we used (2.8).
Note that from Lemma 2.1, it follows that if L ≥ L(f ), then the condition (2.9) is always
satisfied for pL (y).
Remark 2.1. As a final remark in this section, we point out that all the above results and
the forthcoming results in this paper also hold in any real Hilbert space setting. Moreover,
all the results can be adapted for problem (P) with convex constraints. In that case, if C is a
nonempty closed convex subset of Rn , the computation of pL (·) might require intensive com-
putation, unless C is very simple (e.g., the nonnegative orthant). For simplicity of exposition,
all the results are developed in the unconstrained and finite-dimensional setting.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 191

3. The global convergence rate of ISTA. The convergence analysis of ISTA has been
well studied for the l1 regularization problem (1.3) and the more general problem (P), with
a focus on conditions ensuring convergence of the sequence {xk } to a minimizer. In this
section we focus on the nonasymptotic global rate of convergence/eﬃciency of such methods,
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

measured by function values.

We begin by stating the basic iteration of ISTA for solving problem (P) deﬁned in (2.4).
ISTA with constant stepsize
Input: L := L(f ) - A Lipschitz constant of ∇f .
Step 0. Take x0 ∈ Rn .
Step k. (k ≥ 1) Compute

(3.1) xk = pL (xk−1 ).

If f (x) = Ax − b2 and g(x) = λx1 (λ > 0), then algorithm (3.1) reduces to the
1
basic iterative shrinkage method (1.4) with t = L(f ) . Clearly, such a general algorithm will
be useful when pL (·) can be computed analytically or via a low cost scheme. This situation
occurs particularly when g(·) is separable, since in that case the computation of pL reduces
to solving a one-dimensional minimization problem, e.g., with g(·) being the pth power of the
lp norm of x, with p ≥ 1. For such computation and other separable regularizers, see, for
instance, the general formulas derived in [25, 7, 9].
A possible drawback of this basic scheme is that the Lipschitz constant L(f ) is not always
known or computable. For instance, the Lipschitz constant in the l1 regularization problem
(1.3) depends on the maximum eigenvalue of AT A (see Example 2.2). For large-scale prob-
lems, this quantity is not always easily computable. We therefore also analyze ISTA with a
backtracking stepsize rule.
ISTA with backtracking
Step 0. Take L0 > 0, some η > 1, and x0 ∈ Rn .
Step k. (k ≥ 1) Find the smallest nonnegative integers ik such
that with L̄ = η ik Lk−1

(3.2) F (pL̄ (xk−1 )) ≤ QL̄ (pL̄ (xk−1 ), xk−1 ).

Set Lk = η ik Lk−1 and compute

(3.3) xk = pLk (xk−1 ).

Remark 3.1. Note that the sequence of function values {F (xk )} produced by ISTA is
nonincreasing. Indeed, for every k ≥ 1,

F (xk ) ≤ QLk (xk , xk−1 ) ≤ QLk (xk−1 , xk−1 ) = F (xk−1 ),

where Lk is either chosen by the backtracking rule or Lk ≡ L(f ) is a given Lipschitz constant
of ∇f .

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

192 AMIR BECK AND MARC TEBOULLE

Remark 3.2. Since inequality (3.2) is satisﬁed for L̄ ≥ L(f ), where L(f ) is the Lipschitz
constant of ∇f , it follows that for ISTA with backtracking one has Lk ≤ ηL(f ) for every
k ≥ 1. Overall,
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

(3.4) βL(f ) ≤ Lk ≤ αL(f ),

L0
where α = β = 1 for the constant stepsize setting and α = η, β = L(f ) for the backtracking
case.
Recall that ISTA reduces to the gradient method when g(x) ≡ 0. For the gradient method
it is known that the sequence of function values F (xk ) converges to the optimal function value
F∗ at a rate of convergence that is no worse than O(1/k), which is also called a “sublinear”
rate of convergence. That is, F (xk ) − F∗ ≤ C/k for some positive constant C; see, e.g., [23].
We show below that ISTA shares the same rate of convergence.
Theorem 3.1. Let {xk } be the sequence generated by either (3.1) or (3.3). Then for any
k≥1
αL(f )x0 − x∗ 2
(3.5) F (xk ) − F (x∗ ) ≤ ∀x∗ ∈ X∗ ,
2k
where α = 1 for the constant stepsize setting and α = η for the backtracking stepsize setting.
Proof. Invoking Lemma 2.3 with x = x∗ , y = xn , and L = Ln+1 , we obtain
2
(F (x∗ ) − F (xn+1 )) ≥ xn+1 − xn 2 + 2xn − x∗ , xn+1 − xn
Ln+1
= x∗ − xn+1 2 − x∗ − xn 2 ,

which combined with (3.4) and the fact that F (x∗ ) − F (xn+1 ) ≤ 0 yields
2
(3.6) (F (x∗ ) − F (xn+1 )) ≥ x∗ − xn+1 2 − x∗ − xn 2 .
αL(f )
Summing this inequality over n = 0, . . . , k − 1 gives
k−1
2
(3.7) kF (x∗ ) − F (xn+1 ) ≥ x∗ − xk 2 − x∗ − x0 2 .
αL(f ) n=0

Invoking Lemma 2.3 one more time with x = y = xn and L = Ln+1 yields
2
(F (xn ) − F (xn+1 )) ≥ xn − xn+1 2 .
Ln+1
Since Ln+1 ≥ βL(f ) (see (3.4) and F (xn ) − F (xn+1 ) ≥ 0), it follows that
2
(F (xn ) − F (xn+1 )) ≥ xn − xn+1 2 .
βL(f )
Multiplying the last inequality by n and summing over n = 0, . . . , k − 1, we obtain
k−1 k−1
2
(nF (xn ) − (n + 1)F (xn+1 ) + F (xn+1 )) ≥ nxn − xn+1 2 ,
βL(f ) n=0 n=0

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 193

which simpliﬁes to
k−1 k−1
2
(3.8) −kF (xk ) + F (xn+1 ) ≥ nxn − xn+1 2 .
βL(f ) n=0 n=0
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Adding (3.7) and (3.8) times β/α, we get

k−1
2k β
(F (x∗ ) − F (xk )) ≥ x∗ − xk 2 + nxn − xn+1 2 − x∗ − x0 2 ,
αL(f ) α
n=0

and hence it follows that

αL(f )x − x0 2
F (xk ) − F (x∗ ) ≤ .
2k
The above result can be interpreted as follows. The number of iterations of ISTA required
to obtain an ε-optimal solution, that is, an x̃ such that F (x̃) − F∗ ≤ ε, is at most C/ε,
∗ 2
where C = αL(f )x20 −x .
In the next section we will show that we can devise a different method, which is as simple
as ISTA, but with a significantly improved complexity result.
4. FISTA: A fast iterative shrinkage-thresholding algorithm. In the previous section
we showed that ISTA has a worst-case complexity result of O(1/k). In this section we will
introduce a new ISTA with an improved complexity result of O(1/k2 ).
We recall that when g(x) ≡ 0, the general model (2.4) consists of minimizing a smooth
convex function and ISTA reduced to the gradient method. In this smooth setting it was
proven in [27] that there exists a gradient method with an O(1/k2 ) complexity result which is
an “optimal” first order method for smooth problems, in the sense of Nemirovsky and Yudin
[26]. The remarkable fact is that the method developed in [27] does not require more than
one gradient evaluation at each iteration (namely, same as the gradient method) but just an
additional point that is smartly chosen and easy to compute.
In this section we extend the method of [27] to the general model (2.4) and we establish
the improved complexity result. Our analysis also provides a simple proof for the special
smooth case (i.e., with g(x) ≡ 0) as well.
We begin by presenting the algorithm with a constant stepsize.
FISTA with constant stepsize
Input: L = L(f ) - A Lipschitz constant of ∇f .
Step 0. Take y1 = x0 ∈ Rn , t1 = 1.
Step k. (k ≥ 1) Compute

(4.1) xk = pL (yk ),

1 + 1 + 4t2k
(4.2) tk+1 = ,
2
tk − 1
(4.3) yk+1 = xk + (xk − xk−1 ).
tk+1

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

194 AMIR BECK AND MARC TEBOULLE

The main difference between the above algorithm and ISTA is that the iterative shrinkage
operator pL (·) is not employed on the previous point xk−1 , but rather at the point yk which
uses a very specific linear combination of the previous two points {xk−1 , xk−2 }. Obviously
the main computational effort in both ISTA and FISTA remains the same, namely, in the
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

operator pL . The requested additional computation for FISTA in the steps (4.2) and (4.3) is
clearly marginal. The speciﬁc formula for (4.2) emerges from the recursive relation that will
be established below in Lemma 4.1.
For the same reasons already explained in section 3, we will also analyze FISTA with a
backtracking stepsize rule, which we now state explicitly.
FISTA with backtracking
Step 0. Take L0 > 0, some η > 1, and x0 ∈ Rn . Set y1 = x0 , t1 = 1.
Step k. (k ≥ 1) Find the smallest nonnegative integers ik such that
with L̄ = η ik Lk−1

F (pL̄ (yk )) ≤ QL̄ (pL̄ (yk ), yk ).

Set Lk = η ik Lk−1 and compute

xk = pLk (yk ),

1 + 1 + 4t2k
tk+1 = ,
2
tk − 1
yk+1 = xk + (xk − xk−1 ).
tk+1

Note that the upper and lower bounds on Lk given in Remark 3.2 still hold true for FISTA,
namely,
βL(f ) ≤ Lk ≤ αL(f ).

The next result provides the key recursive relation for the sequence {F (xk ) − F (x∗ )} that
will imply the better complexity rate O(1/k2 ). As we shall see, Lemma 2.3 of section 2 plays
a central role in the proofs.
Lemma 4.1. The sequences {xk , yk } generated via FISTA with either a constant or back-
tracking stepsize rule satisfy for every k ≥ 1

2 2 2 2
tk vk − t vk+1 ≥ uk+1 2 − uk 2 ,
Lk Lk+1 k+1

where vk := F (xk ) − F (x∗ ), uk := tk xk − (tk − 1)xk−1 − x∗ .

Proof. First we apply Lemma 2.3 at the points (x := xk , y := yk+1 ) with L = Lk+1 , and
likewise at the points (x := x∗ , y := yk+1 ), to get

2L−1 2
k+1 (vk − vk+1 ) ≥ xk+1 − yk+1 + 2xk+1 − yk+1 , yk+1 − xk ,
−2L−1 2 ∗
k+1 vk+1 ≥ xk+1 − yk+1 + 2xk+1 − yk+1 , yk+1 − x ,

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 195

where we used the fact that xk+1 = pLk+1 (yk+1 ). To get a relation between vk and vk+1 , we
multiply the ﬁrst inequality above by (tk+1 − 1) and add it to the second inequality:
2
((tk+1 −1)vk −tk+1 vk+1 ) ≥ tk+1 xk+1 −yk+1 2 +2xk+1 −yk+1 , tk+1 yk+1 −(tk+1 −1)xk −x∗ .
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Lk+1

Multiplying the last inequality by tk+1 and using the relation t2k = t2k+1 − tk+1 which holds
thanks to (4.2), we obtain
2
(t2k vk −t2k+1 vk+1 ) ≥ tk+1 (xk+1 −yk+1 )2 +2tk+1 xk+1 −yk+1 , tk+1 yk+1 −(tk+1 −1)xk −x∗ .
Lk+1
Applying the usual Pythagoras relation

b − a2 + 2b − a, a − c = b − c2 − a − c2

to the right-hand side of the last inequality with

a := tk+1 yk+1 , b := tk+1 xk+1 , c := (tk+1 − 1)xk + x∗ ,

we thus get
2
(t2k vk − t2k+1 vk+1 ) ≥ tk+1 xk+1 − (tk+1 − 1)xk − x∗ 2 − tk+1 yk+1 − (tk+1 − 1)xk − x∗ 2 .
Lk+1
Therefore, with yk+1 (cf. (4.3)) and uk deﬁned by

tk+1 yk+1 = tk+1 xk + (tk − 1)(xk − xk−1 ) and uk = tk xk − (tk − 1)xk−1 − x∗ ,

it follows that
2
(t2 vk − t2k+1 vk+1 ) ≥ uk+1 2 − uk 2 ,
Lk+1 k
which combined with the inequality Lk+1 ≥ Lk yields
2 2 2 2
t vk − t vk+1 ≥ uk+1 2 − uk 2 .
Lk k Lk+1 k+1
We also need the following trivial facts.
Lemma 4.2. Let {ak , bk } be positive sequences of reals satisfying

ak − ak+1 ≥ bk+1 − bk ∀k ≥ 1, with a1 + b1 ≤ c, c > 0.

Then, ak ≤ c for every k ≥ 1.

Lemma 4.3. The positive sequence {tk } generated in FISTA via (4.2) with t1 = 1 satisﬁes
tk ≥ (k + 1)/2 for all k ≥ 1.
We are now ready to prove the promised improved complexity result for FISTA.
Theorem 4.4. Let {xk }, {yk } be generated by FISTA. Then for any k ≥ 1

2αL(f )x0 − x∗ 2
(4.4) F (xk ) − F (x∗ ) ≤ ∀x∗ ∈ X∗ ,
(k + 1)2

196 AMIR BECK AND MARC TEBOULLE

where α = 1 for the constant stepsize setting and α = η for the backtracking stepsize setting.
Proof. Let us deﬁne the quantities
2 2
ak := t vk , bk := uk 2 , c := y1 − x∗ 2 = x0 − x∗ 2 ,
Lk k
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

and recall (cf. Lemma 4.1) that vk := F (xk ) − F (x∗ ). Then, by Lemma 4.1 we have for every
k≥1
ak − ak+1 ≥ bk+1 − bk ,
and hence assuming that a1 + b1 ≤ c holds true, invoking Lemma 4.2, we obtain that
2 2
t vk ≤ x0 − x∗ 2 ,
Lk k
which combined with tk ≥ (k + 1)/2 (by Lemma 4.3) yields
2Lk x0 − x∗ 2
vk ≤ .
(k + 1)2
Utilizing the upper bound on Lk given in (3.4), the desired result (4.4) follows. Thus, all
that remains is to prove the validity of the relation a1 + b1 ≤ c. Since t1 = 1, and using the
deﬁnition of uk given in Lemma 4.1, we have here
2 2
a1 = t1 v1 = v1 , b1 = u1 2 = x1 − x∗ .
L1 L1
Applying Lemma 2.3 to the points x := x∗ , y := y1 with L = L1 , we get
L1
(4.5) F (x∗ ) − F (p(y1 )) ≥ p(y1 ) − y1 2 + L1 y1 − x∗ , p(y1 ) − y1 .
2
Thus,

F (x∗ ) − F (x1 ) = F (x∗ ) − F (p(y1 ))

(4.5) L
1
≥ p(y1 ) − y1 2 + L1 y1 − x∗ , p(y1 ) − y1
2
L1
= x1 − y1 2 + L1 y1 − x∗ , x1 − y1
2
L1
= {x1 − x∗ 2 − y1 − x∗ 2 }.
2
Consequently,
2
v1 ≤ y1 − x∗ 2 − x1 − x∗ 2 ;
L1
that is, a1 + b1 ≤ c holds true.
The number of iterations of FISTA required to obtain an ε-optimal solution, that is,
√
an x̃ such that F (x̃) − F∗ ≤ ε, is at most C/ ε − 1, where C = 2αL(f )x0 − x∗ 2 ,
and which clearly improves ISTA. In the next section we demonstrate the practical value of
this theoretical global convergence rate estimate derived for FISTA on the l1 wavelet-based
regularization problem (1.3).

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 197

original blurred and noisy

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Figure 1. Deblurring of the cameraman.

5. Numerical examples. In this section we illustrate the performance of FISTA compared

to the basic ISTA and to the recent TWIST algorithm of [3]. Since our simulations consider
extremely ill-conditioned problems (the smallest eigenvalue of AT A is nearly zero, and the
maximum eigenvalue is 1), the TWIST method is not guaranteed to converge, and we thus
use the monotone version of TWIST termed MTWIST. The parameters for the MTWIST
method were chosen as suggested in [3, section 6] for extremely ill-conditioned problems. All
methods were used with a constant stepsize rule and applied to the l1 regularization problem
(1.3), that is, f (x) = Ax − b2 and g(x) = λx1 .
In all the simulations we have performed, we observed that FISTA signiﬁcantly outper-
formed ISTA with respect to the number of iterations required to achieve a given accuracy.
Similar conclusions can be made when compared with MTWIST. Below, we describe repre-
sentative examples and results from these simulations.

5.1. Example 1: The cameraman test image. All pixels of the original images described
in the examples were first scaled into the range between 0 and 1. In the first example we look
at the 256 × 256 cameraman test image. The image went through a Gaussian blur of size 9 × 9
and standard deviation 4 (applied by the MATLAB functions imfilter and fspecial) followed
by an additive zero-mean white Gaussian noise with standard deviation 10−3 . The original
and observed images are given in Figure 1.
For these experiments we assume reflexive (Neumann) boundary conditions [22]. We
then tested ISTA, FISTA, and MTWIST for solving problem (1.3), where b represents the
(vectorized) observed image, and A = RW, where R is the matrix representing the blur
operator and W is the inverse of a three stage Haar wavelet transform. The regularization
parameter was chosen to be λ = 2e-5, and the initial image was the blurred image. The
Lipschitz constant was computable in this example (and those in what follows) since the
eigenvalues of the matrix AT A can be easily calculated using the two-dimensional cosine
transform [22]. Iterations 100 and 200 are described in Figure 2. The function value at
iteration k is denoted by Fk . The images produced by FISTA are of a better quality than
those created by ISTA and MTWIST. It is also clear that MTWIST gives better results than

198 AMIR BECK AND MARC TEBOULLE

ISTA: F100 = 5.44e-1 ISTA: F200 = 3.60e-1

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

MTWIST: F100 = 3.09e-1 MTWIST: F200 = 2.61e-1

FISTA: F100 = 2.40e-1 FISTA: F200 = 2.28e-1

Figure 2. Iterations of ISTA, MTWIST, and FISTA methods for deblurring of the cameraman.

ISTA. The function value of FISTA was consistently lower than the function values of ISTA
and MTWIST. We also computed the function values produced after 1000 iterations for ISTA,
MTWIST, and FISTA which were, respectively, 2.45e-1, 2.31e-1, and 2.23e-1. Note that the
function value of ISTA after 1000 iterations, is still worse (that is, larger) than the function
value of FISTA after 100 iterations, and the function value of MTWIST after 1000 iterations
is worse than the function value of FISTA after 200 iterations.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 199

original blurred and noisy

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Figure 3. Deblurring of the simple test image.

5.2. Example 2: A simple test image. In this example we will further show the beneﬁt
of FISTA. The 256 × 256 simple test image was extracted from the function blur from the
regularization toolbox [20]. The image then undergoes the same blurring and noise-adding
procedure described in the previous example. The original and observed images are given in
Figure 3.
The algorithms were tested with regularization parameter λ=1e-4 and with the same
wavelet transform. The results of iterations 100 and 200 are described in Figure 4. Clearly,
FISTA provides clearer images and improved function values. Moreover, the function value
0.321 obtained at iteration number 100 of FISTA is better than the function values of both
ISTA and MTWIST methods at iteration number 200 (0.427 and 0.341, respectively). More-
over, MTWIST needed 416 iterations to reach the value that FISTA obtained after 100 it-
erations (0.321) and required 1102 iterations to reach the value 0.309 produced by FISTA
after 200 iterations. In addition we ran the algorithm for tens of thousands of iterations and
noticed that ISTA seems to get stuck at a function value of 0.323 and MTWIST gets stuck at
a function value of 0.318.
From the previous example it seems that practically FISTA is able to reach accuracies
that are beyond the capabilities of ISTA and MTWIST. To test this hypothesis we also
considered an example in which the optimal solution is known. For that sake we considered
a 64 × 64 version of the previous test image which undergoes the same blur operator as the
previous example. No noise was added, and we solved the LS problem, that is, λ = 0. The
optimal solution of this problem is zero. The function values of the three methods for 10000
iterations are described in Figure 5. The results produced by FISTA are better than those
produced by ISTA and MTWIST by several orders of magnitude and clearly demonstrate
the eﬀective performance of FISTA. One can see that after 10000 iterations FISTA reaches
an accuracy of approximately 10−7 , while ISTA and MTWIST reach accuracies of 10−3 and
10−4 , respectively. Finally, we observe that the values obtained by ISTA and MTWIST at
iteration 10000 were already obtained by FISTA at iterations 275 and 468, respectively.
These preliminary computational results indicate that FISTA is a simple and promising
iterative scheme, which can be even faster than the proven predicted theoretical rate. Its
potential for analyzing and designing faster algorithms in other application areas and with

200 AMIR BECK AND MARC TEBOULLE

ISTA: F100 = 5.67e-1 ISTA: F200 = 4.27e-1

Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

MTWIST: F100 = 3.83e-1 MTWIST: F200 = 3.41e-1

FISTA: F100 = 3.21e-1 FISTA: F200 = 3.09e-1

Figure 4. Outputs of ISTA, MTWIST and FISTA for the simple test image.

other types of regularizers, as well as a more thorough computational study, are topics of
future research.

Acknowledgment. The authors are indebted to the two anonymous referees for their use-
ful suggestions and for having drawn the authors’ attention to additional relevant references.

A FAST ITERATIVE SHRINKAGE-THRESHOLDING ALGORITHM 201

2
10
ISTA
MTWIST
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

FISTA
0
10

−2
10

−4
10

−6
10

−8
10
0 2000 4000 6000 8000 10000

Figure 5. Comparison of function value errors F (xk ) − F (x∗ ) of ISTA, MTWIST, and FISTA.

REFERENCES

[1] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and
Engineering Applications, MPS/SIAM Ser. Optim., SIAM, Philadelphia, 2001.
[2] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, MA, 1999.
[3] J. Bioucas-Dias and M. Figueiredo, A new TwIST: Two-step iterative shrinkage/thresholding algo-
rithms for image restoration, IEEE Trans. Image Process., 16 (2007), pp. 2992–3004.
[4] A. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
[5] K. Bredies and D. Lorenz, Iterative Soft-Thresholding Converges Linearly, Technical report, 2008;
available online at http://arxiv.org/abs/0709.1598v3.
[6] R. J. Bruck, On the weak convergence of an ergodic iteration for the solution of variational inequalities
for monotone operators in Hilbert space, J. Math. Anal. Appl., 61 (1977), pp. 159–164.
[7] A. Chambolle, R. A. DeVore, N. Y. Lee, and B. J. Lucier, Nonlinear wavelet image processing:
Variational problems, compression, and noise removal through wavelet shrinkage, IEEE Trans. Image
Process., 7 (1998), pp. 319–335.
[8] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J.
Sci. Comput., 20 (1998), pp. 33–61.
[9] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale
Model. Simul., 4 (2005), pp. 1168–1200.
[10] I. Daubechies, M. Defrise, and C. D. Mol, An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint, Comm. Pure Appl. Math., 57 (2004), pp. 1413–1457.
[11] D. L. Donoho and I. M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, J. Amer.
Statist. Assoc., 90 (1995), pp. 1200–1224.
[12] M. Elad, B. Matalon, and M. Zibulevsky, Subspace optimization methods for linear least squares
with non-quadratic regularization, Appl. Comput. Harmon. Anal., 23 (2007), pp. 346–367.
[13] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Math. Appl. 375,
Kluwer Academic Publishers Group, Dordrecht, The Netherlands, 1996.

202 AMIR BECK AND MARC TEBOULLE

[14] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Prob-
lems, Vol. II, Springer Ser. Oper. Res., Springer-Verlag, New York, 2003.
[15] M. A. T. Figueiredo and R. D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE
Trans. Image Process., 12 (2003), pp. 906–916.
Downloaded 08/20/17 to 128.119.168.112. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

[16] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, Gradient projection for sparse reconstruction:
Application to compressed sensing and other inverse problems, IEEE J. Sel. Top. Signal Process., 1
(2007), pp. 586–597.
[17] G. H. Golub, P. C. Hansen, and D. P. O’Leary, Tikhonov regularization and total least squares,
SIAM J. Matrix Anal. Appl., 21 (1999), pp. 185–194.
[18] E. Hale, W. Yin, and Y. Zhang, A Fixed-Point Continuation Method for l1 -Regularized Minimization
with Applications to Compressed Sensing, CAAM Technical report TR07-07, Rice University, Houston,
TX, 2007.
[19] P. C. Hansen and D. P. O’Leary, The use of the L-curve in the regularization of discrete ill-posed
problems, SIAM J. Sci. Comput., 14 (1993), pp. 1487–1503.
[20] P. C. Hansen, Regularization tools: A MATLAB package for analysis and solution of discrete ill-posed
problems, Numer. Algorithms, 6 (1994), pp. 1–35.
[21] P. C. Hansen, Rank-Deﬁcient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion,
SIAM, Philadelphia, 1997.
[22] P. C. Hansen, J. G. Nagy, and D. P. O’Leary, Deblurring Images: Matrices, Spectra, and Filtering,
Fundam. Algorithms 3, SIAM, Philadelphia, 2006.
[23] E. S. Levitin and B. T. Polyak, Constrained minimization methods, Comput. Math. Math. Phys., 6
(1966), pp. 787–823.
[24] B. Martinet, Régularisation d’inéquations variationnelles par approximations successives, Rev.
Française Informat. Recherche Opérationnelle, 4 (1970), pp. 154–158.
[25] P. Moulin and J. Liu, Analysis of multiresolution image denoising schemes using generalized Gaussian
and complexity priors, IEEE Trans. Inform. Theory, 45 (1999), pp. 909–919.
[26] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Eﬃciency in Optimization,
Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, New York, 1983.
[27] Y. E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k2 ),
Dokl. Akad. Nauk SSSR, 269 (1983), pp. 543–547 (in Russian).
[28] Y. E. Nesterov, Gradient Methods for Minimizing Composite Objective Function, CORE report, 2007;
available at http://www.ecore.be/DPs/dp 1191313936.pdf.
[29] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables,
Classics Appl. Math. 30, SIAM, Philadelphia, 2000.
[30] G. B. Passty, Ergodic convergence to a zero of the sum of monotone operators in Hilbert space, J. Math.
Anal. Appl., 72 (1979), pp. 383–390.
[31] B. T. Polyak, Introduction to Optimization, Translations Series in Mathematics and Engineering, Op-
timization Software, Publications Division, New York, 1987.
[32] J. L. Starck, D. L. Donoho, and E. J. Candès, Astronomical image representation by the curevelet
transform, Astron. Astrophys., 398 (2003), pp. 785–800.
[33] A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-Posed Problems, V. H. Winston, Washington, DC,
1977.
[34] C. Vonesch and M. Unser, Fast iterative thresholding algorithm for wavelet-regularized deconvolution,
in Proceedings of the SPIE Optics and Photonics 2007 Conference on Mathematical Methods: Wavelet
XII, Vol. 6701, San Diego, CA, 2007, pp. 1–5.
[35] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, Sparse reconstruction by separable approxima-
tion, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2008), 2008, pp. 3373–3376.

Discrete Inverse Problem - Insight and Algorithms
No ratings yet
Discrete Inverse Problem - Insight and Algorithms
209 pages
EUROCOD 5 - Design of Timber Structures - General Rules
100% (1)
EUROCOD 5 - Design of Timber Structures - General Rules
72 pages
ANSYS Presentation
100% (1)
ANSYS Presentation
48 pages
Welding Machine Pre Start Checklist
No ratings yet
Welding Machine Pre Start Checklist
2 pages
Aristotle On Matter
No ratings yet
Aristotle On Matter
24 pages
General Science Reviewer For LET 2022
No ratings yet
General Science Reviewer For LET 2022
117 pages
Fast Image Deconvolution Using Hyper-Laplacian Priors: Dilip Krishnan, Rob Fergus
No ratings yet
Fast Image Deconvolution Using Hyper-Laplacian Priors: Dilip Krishnan, Rob Fergus
9 pages
MAT-52506 Inverse Problems: Samuli Siltanen February 20, 2009
No ratings yet
MAT-52506 Inverse Problems: Samuli Siltanen February 20, 2009
58 pages
Deblurring Images, Matrices, Spectra, and Filtering (Fundamentals of Algorithms)
No ratings yet
Deblurring Images, Matrices, Spectra, and Filtering (Fundamentals of Algorithms)
145 pages
Cs606 Final Term Quizez and MCQZ Solved With Refer
No ratings yet
Cs606 Final Term Quizez and MCQZ Solved With Refer
18 pages
Heat of Combustion Lab 2
No ratings yet
Heat of Combustion Lab 2
14 pages
Computational Inverse Problems
100% (1)
Computational Inverse Problems
67 pages
Optimisation of Intake Trashracks: S. Bjarnason T.S. Leifsson G. Pétursson H. Jóhannesson
No ratings yet
Optimisation of Intake Trashracks: S. Bjarnason T.S. Leifsson G. Pétursson H. Jóhannesson
7 pages
Image Enhancement by Regularization Methods
No ratings yet
Image Enhancement by Regularization Methods
44 pages
Inverse Problems in Image Processing: Effrosyni Kokiopoulou Martin Ple Singer
No ratings yet
Inverse Problems in Image Processing: Effrosyni Kokiopoulou Martin Ple Singer
39 pages
Add Math Project Work 1 2010
100% (1)
Add Math Project Work 1 2010
17 pages
A Novel Study On Image and Video Denoising
No ratings yet
A Novel Study On Image and Video Denoising
33 pages
Image Deblurring and Super-Resolution by Adaptive Sparse Domain Selection and Adaptive Regularization
No ratings yet
Image Deblurring and Super-Resolution by Adaptive Sparse Domain Selection and Adaptive Regularization
35 pages
Uniform and Non-Uniform Single Image Deblurring Based On Sparse Representation and Adaptive Dictionary Learning
No ratings yet
Uniform and Non-Uniform Single Image Deblurring Based On Sparse Representation and Adaptive Dictionary Learning
14 pages
Adaptive Image Reconstruction Using Information Measures
No ratings yet
Adaptive Image Reconstruction Using Information Measures
18 pages
Matrix Structures For Image Applications: Some Examples and Open Problems
No ratings yet
Matrix Structures For Image Applications: Some Examples and Open Problems
8 pages
Incomplete Oblique Projections Method For Solving Regularized Least-Squares Problems in Image Reconstruction
No ratings yet
Incomplete Oblique Projections Method For Solving Regularized Least-Squares Problems in Image Reconstruction
22 pages
Vit Assignment
No ratings yet
Vit Assignment
2 pages
2014 Elena
No ratings yet
2014 Elena
20 pages
Crude Oil Emulsions A State-Of-The-Art Review
100% (3)
Crude Oil Emulsions A State-Of-The-Art Review
11 pages
Understanding Scuffing and Micropitting of Gears: R W Snidle, H P Evans, M P Alanou, M J A Holmes
No ratings yet
Understanding Scuffing and Micropitting of Gears: R W Snidle, H P Evans, M P Alanou, M J A Holmes
18 pages
BM3D Frames and Variational Image Deblurring
No ratings yet
BM3D Frames and Variational Image Deblurring
13 pages
STACK
No ratings yet
STACK
39 pages
Iterative Methods of Richardson-Lucy-type For Image Deblurring
No ratings yet
Iterative Methods of Richardson-Lucy-type For Image Deblurring
15 pages
ICME Deblurring PDF
No ratings yet
ICME Deblurring PDF
6 pages
Fast Image Recovery Using Variable Splitting and Constrained Optimization
No ratings yet
Fast Image Recovery Using Variable Splitting and Constrained Optimization
11 pages
Sparse Stochastic Processes and Discretization of Linear Inverse Problems
No ratings yet
Sparse Stochastic Processes and Discretization of Linear Inverse Problems
12 pages
Nonincendive Circuit Parameters: Planning and Installation Guide For Tricon v9-v10 Systems
No ratings yet
Nonincendive Circuit Parameters: Planning and Installation Guide For Tricon v9-v10 Systems
26 pages
Albert Einstein
No ratings yet
Albert Einstein
19 pages
IR Tools - A MATLAB Package of Iterative Regularization Methods and Large-Scale Test Problems
No ratings yet
IR Tools - A MATLAB Package of Iterative Regularization Methods and Large-Scale Test Problems
36 pages
Ipdo 050
No ratings yet
Ipdo 050
7 pages
Blind Deconvolution Using A Normalized Sparsity Measure
No ratings yet
Blind Deconvolution Using A Normalized Sparsity Measure
8 pages
Two-Step Algorithms For Linear Inverse Problems With Non-Quadratic Regularization
No ratings yet
Two-Step Algorithms For Linear Inverse Problems With Non-Quadratic Regularization
4 pages
A Sparse Representation and Dictionary Learning Ba
No ratings yet
A Sparse Representation and Dictionary Learning Ba
27 pages
Iterative Methods For Image Deblurring: A Matlab Object-Oriented Approach
No ratings yet
Iterative Methods For Image Deblurring: A Matlab Object-Oriented Approach
21 pages
J. Vis. Commun. Image R.: Wenfei Cao, Jian Sun, Zongben Xu
No ratings yet
J. Vis. Commun. Image R.: Wenfei Cao, Jian Sun, Zongben Xu
11 pages
M.S.Ramaiah Institute of Technology Department of Management Studies
No ratings yet
M.S.Ramaiah Institute of Technology Department of Management Studies
5 pages
Classroom Activity - Externally Applied Loads
No ratings yet
Classroom Activity - Externally Applied Loads
1 page
Automated Face Mask Detection: A Project by Nishant Goel Under The Guidance of Dr. Anil Kumar
No ratings yet
Automated Face Mask Detection: A Project by Nishant Goel Under The Guidance of Dr. Anil Kumar
21 pages
College of Engineering Science and Technology Department of Computing Science & Information Systems
No ratings yet
College of Engineering Science and Technology Department of Computing Science & Information Systems
3 pages
Deconvolution: A Wavelet Frame Approach: Received: Date / Revised Version: Date
No ratings yet
Deconvolution: A Wavelet Frame Approach: Received: Date / Revised Version: Date
36 pages
LTspice Tutorial Part 4 - Intermediate Circuits
No ratings yet
LTspice Tutorial Part 4 - Intermediate Circuits
23 pages
Automatic High Beam Controller For Vehicles
No ratings yet
Automatic High Beam Controller For Vehicles
6 pages
Bstract
No ratings yet
Bstract
12 pages
Rudin Osher Fatemi
No ratings yet
Rudin Osher Fatemi
10 pages
Deblurring Images Via Partial Differential Equations: Sirisha L. Kala Mississippi State University
No ratings yet
Deblurring Images Via Partial Differential Equations: Sirisha L. Kala Mississippi State University
8 pages
Automatic Localized Deblurring in Digital Images: Cherin Joseph, Pradip Harindran Vallathol, Raj Kumar Gupta
No ratings yet
Automatic Localized Deblurring in Digital Images: Cherin Joseph, Pradip Harindran Vallathol, Raj Kumar Gupta
5 pages
Inverse Problems in Image Processing - Blind Image Restoration
No ratings yet
Inverse Problems in Image Processing - Blind Image Restoration
95 pages
50 - Introduction To Inverse Problems in Imaging
No ratings yet
50 - Introduction To Inverse Problems in Imaging
4 pages
PMA 133 Book - Verbal Intelligence Test Questions (Solved) - 1
No ratings yet
PMA 133 Book - Verbal Intelligence Test Questions (Solved) - 1
4 pages
1 s2.0 S0377042713004202 Main
No ratings yet
1 s2.0 S0377042713004202 Main
16 pages
1 s2.0 S1051200421003249 Main
No ratings yet
1 s2.0 S1051200421003249 Main
13 pages
Lecture08 Restoration Deblur
No ratings yet
Lecture08 Restoration Deblur
27 pages
CE Topic 2 & 3
No ratings yet
CE Topic 2 & 3
2 pages
Physics Statistical Mechanics N Solid State Physics
No ratings yet
Physics Statistical Mechanics N Solid State Physics
4 pages
Admm Diptv
No ratings yet
Admm Diptv
8 pages
FISTA
No ratings yet
FISTA
20 pages
Regularisation in Image Reconstruction
No ratings yet
Regularisation in Image Reconstruction
4 pages
Jimaging 09 00249 v2
No ratings yet
Jimaging 09 00249 v2
14 pages
Pourya 2024
No ratings yet
Pourya 2024
16 pages
PM157 bm1
No ratings yet
PM157 bm1
19 pages
Diffusion Survey
No ratings yet
Diffusion Survey
38 pages
Image Restoration With Mixed or Unknown Noises
No ratings yet
Image Restoration With Mixed or Unknown Noises
31 pages
Indigo: An Inn-Guided Probabilistic Diffusion Algorithm For Inverse Problems
No ratings yet
Indigo: An Inn-Guided Probabilistic Diffusion Algorithm For Inverse Problems
6 pages
Nonlinear Total Variation Based Noise Removal Algo
No ratings yet
Nonlinear Total Variation Based Noise Removal Algo
11 pages
Math
No ratings yet
Math
14 pages
A Piecewise Local Regularized Richardson-Lucy Algorithm For Remote Sensing Image Deconvolution
No ratings yet
A Piecewise Local Regularized Richardson-Lucy Algorithm For Remote Sensing Image Deconvolution
8 pages
Ergonomically Designed Turmeric - FINALE
No ratings yet
Ergonomically Designed Turmeric - FINALE
24 pages
Inverse Problems QA
No ratings yet
Inverse Problems QA
3 pages
IAS111 Study Guide 2024
No ratings yet
IAS111 Study Guide 2024
15 pages
AM For ImageReconstruction
No ratings yet
AM For ImageReconstruction
24 pages
Kolehmainen
No ratings yet
Kolehmainen
104 pages
Es Model
No ratings yet
Es Model
16 pages
A Proximal-Gradient Homotopy Method For The Sparse Least-Squares Problem
No ratings yet
A Proximal-Gradient Homotopy Method For The Sparse Least-Squares Problem
37 pages
Moment Gradient Factor For Steel I-Beams
No ratings yet
Moment Gradient Factor For Steel I-Beams
20 pages
Ammonia STD 10
No ratings yet
Ammonia STD 10
2 pages
Diffusion Reconstruction From Very Noisy Tomograph
No ratings yet
Diffusion Reconstruction From Very Noisy Tomograph
11 pages
Revised Notes Chapter 1
No ratings yet
Revised Notes Chapter 1
16 pages
S M S T C Lecture Notes Lecture9
No ratings yet
S M S T C Lecture Notes Lecture9
15 pages
Inversion by Direct Iteration An Alternative To Denoising Diffusion 2303.11435v5
No ratings yet
Inversion by Direct Iteration An Alternative To Denoising Diffusion 2303.11435v5
35 pages
Chemistry-Neet Chemical Kinetics (Easy) Solution
No ratings yet
Chemistry-Neet Chemical Kinetics (Easy) Solution
8 pages
Mathematical Methods in Image Processing
No ratings yet
Mathematical Methods in Image Processing
31 pages
A Survey On Diffusion Models For Inverse Problems: 1.1 Problem Setting
No ratings yet
A Survey On Diffusion Models For Inverse Problems: 1.1 Problem Setting
38 pages
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
From Everand
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
Fouad Sabry
No ratings yet
Active Contour: Advancing Computer Vision with Active Contour Techniques
From Everand
Active Contour: Advancing Computer Vision with Active Contour Techniques
Fouad Sabry
No ratings yet

Beck 2009

Uploaded by

Beck 2009

Uploaded by

SIAM J.

A Fast Iterative Shrinkage-Thresholding Algorithm

Amir Beck† and Marc Teboulle‡

AMS subject classifications. 90C25, 90C06, 65F22

1. Introduction. Linear inverse problems arise in a wide range of applications such as

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

(LS): x̂LS = argmin Ax − b2 .

(1.2) (T): x̂TIK = argmin{Ax − b2 + λLx2 }.

(1.3) min{F (x) ≡ Ax − b2 + λx1 },

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

where each iteration involves matrix-vector multiplication involving A and AT followed by a

where t is an appropriate stepsize and T α : Rn → Rn is the shrinkage operator deﬁned by

(1.5) T α (x)i = (|xi | − α)+ sgn (xi ).

(1.6) min{F (x) ≡ f (x) + g(x)},

xk+1 = T λt (G(xk )),

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

conﬁrm this property by proving that ISTA behaves like

F (xk ) − F (x∗ )  O(1/k),

namely, shares a sublinear global rate of convergence.

xk+1 = T λt (G(yk )),

(1.7) Ax − b2 + ϕ(x),

where ϕ is a convex nonsmooth regularizer. The eﬀectiveness of TWIST as a faster method

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

(U) min{f (x) : x ∈ Rn }.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

(2.1) x0 ∈ Rn , xk = xk−1 − tk ∇f (xk−1 ),

(2.2) min{f (x) + λx1 : x ∈ Rn }

leads to the iterative scheme

xk = T λtk (xk−1 − tk ∇f (xk−1 )),

where T α : Rn → Rn is the shrinkage operator given in (1.5).

(2.4) (P) min{F (x) ≡ f (x) + g(x) : x ∈ Rn }.

The following assumptions are made throughout the paper:

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

• g : Rn → R is a continuous convex function which is possibly nonsmooth.

∇f (x) − ∇f (y) ≤ L(f )x − y for every x, y ∈ Rn ,

(2.6) pL (y) := argmin{QL (x, y) : x ∈ Rn }.

Simple algebra shows that (ignoring constant terms in y)

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

(2.8) ∇f (y) + L(z − y) + γ(y) = 0.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

measured by function values.

(3.2) F (pL̄ (xk−1 )) ≤ QL̄ (pL̄ (xk−1 ), xk−1 ).

Set Lk = η ik Lk−1 and compute

(3.3) xk = pLk (xk−1 ).

F (xk ) ≤ QLk (xk , xk−1 ) ≤ QLk (xk−1 , xk−1 ) = F (xk−1 ),

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

(3.4) βL(f ) ≤ Lk ≤ αL(f ),

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

Adding (3.7) and (3.8) times β/α, we get

and hence it follows that

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

F (pL̄ (yk )) ≤ QL̄ (pL̄ (yk ), yk ).

Set Lk = η ik Lk−1 and compute

where vk := F (xk ) − F (x∗ ), uk := tk xk − (tk − 1)xk−1 − x∗ .

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

b − a2 + 2b − a, a − c = b − c2 − a − c2

to the right-hand side of the last inequality with

a := tk+1 yk+1 , b := tk+1 xk+1 , c := (tk+1 − 1)xk + x∗ ,

tk+1 yk+1 = tk+1 xk + (tk − 1)(xk − xk−1 ) and uk = tk xk − (tk − 1)xk−1 − x∗ ,

ak − ak+1 ≥ bk+1 − bk ∀k ≥ 1, with a1 + b1 ≤ c, c > 0.

Then, ak ≤ c for every k ≥ 1.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

F (x∗ ) − F (x1 ) = F (x∗ ) − F (p(y1 ))

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

original blurred and noisy

Figure 1. Deblurring of the cameraman.

5. Numerical examples. In this section we illustrate the performance of FISTA compared

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

ISTA: F100 = 5.44e-1 ISTA: F200 = 3.60e-1

MTWIST: F100 = 3.09e-1 MTWIST: F200 = 2.61e-1

FISTA: F100 = 2.40e-1 FISTA: F200 = 2.28e-1

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

original blurred and noisy

Figure 3. Deblurring of the simple test image.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

ISTA: F100 = 5.67e-1 ISTA: F200 = 4.27e-1

F (xk ) − F (x∗ ) O(1/k),

b − a2 + 2b − a, a − c = b − c2 − a − c2