Numerical optimization for inverse problems — 10 Lectures on Inverse Problems and Imaging
Contents
6.1. Smooth optimization
6.2. Convex optimization
6.3. Exercises
6.4. Assignments
In this chapter we treat numerical algorithms for solving optimization problems over $\mathbb{R}^n$. Throughout, we assume that the objective $J(u) = D(u) + R(u)$ satisfies the conditions for a unique minimizer to exist. We distinguish between two important classes of problems: smooth problems and convex problems.
For a comprehensive treatment of this topic (and many more), we recommend the seminal book Numerical Optimization by Stephen Wright
and Jorge Nocedal [Nocedal and Wright, 2006].
6.1. Smooth optimization

Given a smooth functional $J : \mathbb{R}^n \rightarrow \mathbb{R}$, a point $u_* \in \mathbb{R}^n$ is a local minimizer only if it satisfies the first- and second-order necessary optimality conditions

$$J'(u_*) = 0, \quad J''(u_*) \succeq 0.$$
A basic approach to finding such a point is the fixed-point (gradient-descent) iteration

$$u_{k+1} = (I - \lambda J')(u_k) = u_k - \lambda J'(u_k),$$

where $\lambda > 0$ is the step size. The following theorem states that this iteration yields a stationary point of $J$, regardless of the initial iterate, provided that we pick $\lambda$ small enough.
$$\min_{k\in\{0,1,\ldots,n-1\}} \|J'(u_k)\|_2^2 \le \frac{J(u_0) - J_*}{Cn},$$

for a constant $C > 0$ that depends on the step size $\lambda$ and the Lipschitz constant of $J'$.
Proof
Stronger statements about the rate of convergence can be made by making additional assumptions on J (such as (strong) convexity), but
this is left as an exercise.
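As an illustration, a minimal sketch of this fixed-point iteration with a fixed step size could look as follows; J and Jp are user-supplied functions returning the value and the gradient of the objective, and the stopping tolerance is an arbitrary choice:

import numpy as np

def gradient_descent(J, Jp, u0, lmbda, maxiter=100, tol=1e-6):
    """Minimal fixed-step gradient descent: u_{k+1} = u_k - lmbda*J'(u_k)."""
    u = np.asarray(u0, dtype=float)
    for k in range(maxiter):
        g = Jp(u)
        if np.linalg.norm(g) <= tol:   # stop when the gradient is small
            break
        u = u - lmbda*g
    return u

# example usage on the quadratic J(u) = 0.5*||u||^2
u_min = gradient_descent(lambda u: 0.5*u.dot(u), lambda u: u, np.ones(5), lmbda=0.5)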
More generally, we can consider iterations of the form

$$u_{k+1} = u_k + \lambda_k d_k,$$

where $d_k$ is a descent direction satisfying $\langle d_k, J'(u_k)\rangle < 0$. Obviously, $d_k = -J'(u_k)$ is a descent direction, but other choices may be beneficial in practice. In particular, we can choose $d_k = -BJ'(u_k)$ for any positive-definite matrix $B$ to obtain a descent direction. How to choose such a matrix will be discussed in the next section.
In order to ensure sufficient progress of the iterations, we can choose a step length that guarantees sufficient descent:

$$J(u_k + \lambda d_k) \le J(u_k) + c_1 \lambda \langle J'(u_k), d_k\rangle, \qquad (6.2)$$

with $c_1 \in (0, 1)$ a small constant (typically $c_1 = 10^{-4}$). Existence of a $\lambda$ satisfying this condition is guaranteed by the regularity of $J$. We can find a suitable $\lambda$ by backtracking:
def backtracking(J,Jp,u,d,lmbda,rho=0.5,c1=1e-4):
    """
    Backtracking line search to find a step size satisfying
    J(u + lmbda*d) <= J(u) + lmbda*c1*Jp(u)^T d

    Input:
        J      - Function object returning the value of J at a given input vector
        Jp     - Function object returning the gradient of J at a given input vector
        u      - current iterate as array of length n
        d      - descent direction as array of length n
        lmbda  - initial step size
        rho,c1 - backtracking parameters, default (0.5,1e-4)

    Output:
        lmbda  - step size satisfying the sufficient decrease condition
    """
    while J(u + lmbda*d) > J(u) + c1*lmbda*Jp(u).dot(d):
        lmbda *= rho
    return lmbda
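For example, a single gradient-descent step with backtracking, with J, Jp and a current iterate u as in the docstring above, reads:

# one gradient-descent step with a backtracked step size
d = -Jp(u)
lmbda = backtracking(J, Jp, u, d, lmbda=1.0)
u = u + lmbda*d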
A possible disadvantage of the backtracking line search introduced earlier is that it may end up choosing very small step sizes. To obtain a step size that yields a new iterate at which the slope of $J$ is not too large, we introduce the following condition

$$|\langle J'(u_k + \lambda d_k), d_k\rangle| \le c_2 |\langle J'(u_k), d_k\rangle|, \qquad (6.3)$$

where $c_2$ is a constant satisfying $0 < c_1 < c_2 < 1$. The conditions (6.2) and (6.3) are referred to as the strong Wolfe conditions. Existence of a step size satisfying these conditions is again guaranteed by the regularity of $J$ (cf. [Nocedal and Wright, 2006], lemma 3.1). Finding such a $\lambda$ is a little more involved than the backtracking procedure outlined above (cf. [Nocedal and Wright, 2006], algorithm 3.5). Luckily, the SciPy library provides an implementation of this algorithm (cf. scipy.optimize.line_search).
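A typical call, with J, Jp, a current iterate u and a descent direction d as before, could look like this; note that the routine may return None when it fails to find a suitable step:

from scipy.optimize import line_search

# find a step size satisfying the strong Wolfe conditions along direction d
alpha, *_ = line_search(J, Jp, u, d, c1=1e-4, c2=0.9)
if alpha is None:                                # fall back to simple backtracking
    alpha = backtracking(J, Jp, u, d, lmbda=1.0)
u = u + alpha*d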
Using the inverse of the Hessian $J''(u_k)$ for the matrix $B$ introduced above leads to Newton's method

$$u_{k+1} = u_k - J''(u_k)^{-1}J'(u_k). \qquad (6.4)$$

We can interpret this method as finding the new iterate $u_{k+1}$ as the (unique) minimizer of the quadratic approximation of $J$ around $u_k$:

$$J(u) \approx J(u_k) + \langle J'(u_k), u - u_k\rangle + \tfrac{1}{2}\langle u - u_k, J''(u_k)(u - u_k)\rangle.$$
Let $J$ be a smooth functional and $u_*$ be a (local) minimizer. For any $u_0$ sufficiently close to $u_*$, the iteration (6.4) converges quadratically to $u_*$, i.e.,

$$\|u_{k+1} - u_*\|_2 \le M\|u_k - u_*\|_2^2,$$

for some constant $M > 0$.
Proof
In practice, the Hessian may not be invertible everywhere and we may not have an initial iterate sufficiently close to a minimizer to ensure
convergence. Practical applications therefore include a line search and a safeguard against non-invertible Hessians.
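A sketch of such a safeguarded Newton method is given below, reusing the backtracking routine from above and regularising the Hessian when the Newton system cannot be solved; the regularisation parameter gamma is an arbitrary choice:

import numpy as np

def newton(J, Jp, Jpp, u0, maxiter=20, tol=1e-8, gamma=1e-6):
    """Newton's method with backtracking line search and a simple Hessian safeguard."""
    u = np.asarray(u0, dtype=float)
    for k in range(maxiter):
        g = Jp(u)
        if np.linalg.norm(g) <= tol:
            break
        H = Jpp(u)
        try:
            d = -np.linalg.solve(H, g)                          # Newton direction
        except np.linalg.LinAlgError:
            d = -np.linalg.solve(H + gamma*np.eye(len(u)), g)   # regularise a singular Hessian
        if g.dot(d) >= 0:                                       # not a descent direction; fall back to gradient
            d = -g
        lmbda = backtracking(J, Jp, u, d, lmbda=1.0)
        u = u + lmbda*d
    return u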
In some applications, it may be difficult to compute and invert the Hessian. This problem is addressed by so-called quasi-Newton methods, which approximate the Hessian. The basis for such approximations is the secant relation

$$H_k(u_{k+1} - u_k) = J'(u_{k+1}) - J'(u_k),$$

which is satisfied by the true Hessian $J''$ at a point $\eta_k = u_k + t(u_{k+1} - u_k)$ for some $t \in (0,1)$. Obviously, we cannot hope to solve for $H_k \in \mathbb{R}^{n\times n}$ from just these $n$ equations. We can, however, impose some structural assumptions on the Hessian. Assuming a simple diagonal structure $H_k = h_k I$ yields

$$h_k = \frac{\langle J'(u_{k+1}) - J'(u_k), u_{k+1} - u_k\rangle}{\|u_{k+1} - u_k\|_2^2}.$$

In fact, even gradient descent can be interpreted in this manner by approximating $J''(u_k) \approx L \cdot I$.
An often-used approximation is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) approximation, which keeps track of the steps $s_k = u_{k+1} - u_k$ and gradients $y_k = J'(u_{k+1}) - J'(u_k)$ to recursively construct an approximation of the inverse of the Hessian as

$$B_{k+1} = (I - \rho_k s_k y_k^T)B_k(I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T,$$

with $\rho_k = \langle s_k, y_k\rangle^{-1}$ and $B_0$ chosen appropriately (e.g., $B_0 = L^{-1}\cdot I$). It can be shown that this approximation is sufficiently accurate to yield superlinear convergence when using a Wolfe line search.
There are many practical aspects to implementing such methods. For example, what do we do when the approximated Hessian becomes (almost) singular? Discussing these issues is beyond the scope of these lecture notes and we refer to [Nocedal and Wright, 2006], chapter 6, for more details. The SciPy library provides an implementation of various optimization methods (cf. scipy.optimize.minimize).
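For example, a quasi-Newton solve with SciPy only takes a few lines; J, Jp and an initial guess u0 are assumed to be defined:

from scipy.optimize import minimize

# BFGS with SciPy's built-in strong-Wolfe line search
result = minimize(J, u0, jac=Jp, method='BFGS', options={'gtol': 1e-6})
u_min = result.x

# limited-memory variant (L-BFGS), useful when n is large
result = minimize(J, u0, jac=Jp, method='L-BFGS-B')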
6.2. Convex optimization

To deal with convex functionals that are not smooth, we first generalize the notion of a derivative.

Definition: subgradient

A vector $g \in \mathbb{R}^n$ is called a subgradient of a convex functional $J$ at $u$ if

$$J(v) \ge J(u) + \langle g, v - u\rangle \quad \text{for all } v \in \mathbb{R}^n.$$

This definition is reminiscent of the Taylor expansion and we can indeed easily check that it holds for convex smooth functionals for $g = J'(u)$. For non-smooth functionals there may be multiple vectors $g$ satisfying the inequality. We call the set of all such vectors the subdifferential, which we will denote as $\partial J(u)$. We will generally denote an arbitrary element of $\partial J(u)$ by $J'(u)$.
Let

$$J_1(u) = |u|, \quad J_2(u) = \delta_{[0,1]}(u), \quad J_3(u) = \max\{u, 0\}.$$

All these functions are convex and exhibit a discontinuity in the derivative at $u = 0$. The subdifferentials at $u = 0$ are given by

$$\partial J_1(0) = [-1, 1], \quad \partial J_2(0) = (-\infty, 0], \quad \partial J_3(0) = [0, 1].$$
Let $J_i : \mathbb{R}^n \rightarrow \mathbb{R}_\infty$ be proper convex functionals and let $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$. We then have the following useful rules:

Summation: let $J = J_1 + J_2$, then $J_1'(u) + J_2'(u) \in \partial J(u)$ for $u$ in the interior of $\mathrm{dom}(J)$.

Affine transformation: let $J(u) = J_1(Au + b)$, then $A^T J_1'(Au + b) \in \partial J(u)$ for $u$, $Au + b$ in the interior of $\mathrm{dom}(J)$, $\mathrm{dom}(J_1)$, respectively.
An overview of other useful relations can be found in e.g., [Beck, 2017] section 3.8.
With this we can now formulate the optimality condition for convex optimization: a point $u_*$ is a minimizer of a proper convex functional $J$ iff

$$0 \in \partial J(u_*).$$
As an example, consider the functional

$$J(u) = \sum_{i=1}^n |u - f_i|.$$
Introducing $J_i(u) = |u - f_i|$, we have

$$J_i'(u) = \begin{cases} -1 & u < f_i \\ [-1, 1] & u = f_i \\ 1 & u > f_i \end{cases},$$

and hence, assuming the $f_i$ are sorted in increasing order,

$$J'(u) = \begin{cases} -n & u < f_1 \\ 2i - n & u \in (f_i, f_{i+1}) \\ 2i - 1 - n + [-1, 1] & u = f_i \\ n & u > f_n \end{cases}.$$

To find a $u$ for which $0 \in J'(u)$ we need to consider the middle two cases. If $n$ is even, we can find an $i$ such that $2i = n$ and get that for all $u \in [f_{n/2}, f_{n/2+1}]$ we have $0 \in J'(u)$. When $n$ is odd, we have optimality only for $u = f_{(n+1)/2}$.
Fig. 6.2 Subgradient of $J$ for $f = (1, 2, 3, 4)$ and $f = (1, 2, 3, 4, 5)$.
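The conclusion that the minimizer is the median of the data is easily checked numerically; a small example with hypothetical data f:

import numpy as np

f = np.array([1., 2., 3., 4., 5.])
J = lambda u: np.sum(np.abs(u - f))

# evaluate J on a fine grid and compare the minimizer with the median
ugrid = np.linspace(0, 6, 6001)
u_opt = ugrid[np.argmin([J(u) for u in ugrid])]
print(u_opt, np.median(f))   # both should be (close to) 3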
The subgradient method generalizes gradient descent to non-smooth convex functionals by simply using an arbitrary subgradient:

$$u_{k+1} = u_k - \lambda_k J'(u_k), \quad J'(u_k) \in \partial J(u_k). \qquad (6.5)$$

Let $J : \mathbb{R}^n \rightarrow \mathbb{R}$ be a convex, $L$-Lipschitz-continuous function. The iteration (6.5) produces iterates for which

$$\min_{k\in\{0,1,\ldots,n-1\}} J(u_k) - J(u_*) \le \frac{\|u_0 - u_*\|_2^2 + L^2\sum_{k=0}^{n-1}\lambda_k^2}{2\sum_{k=0}^{n-1}\lambda_k}.$$
In particular, the right-hand side tends to zero as $n \rightarrow \infty$ when the step sizes satisfy

$$\sum_{k=0}^{\infty} \lambda_k = \infty, \quad \sum_{k=0}^{\infty} \lambda_k^2 < \infty.$$
Proof
If we choose $\lambda_k = \lambda$, we get

$$\min_{k\in\{0,1,\ldots,n-1\}} J(u_k) - J(u_*) \le \frac{\|u_0 - u_*\|_2^2 + L^2\lambda^2 n}{2\lambda n}.$$
In particular, we can guarantee that $\min_{k\in\{0,1,\ldots,n-1\}} J(u_k) - J(u_*) \le \epsilon$ by picking step size $\lambda = \epsilon/L^2$ and performing $n = (\|u_0 - u_*\|_2 L/\epsilon)^2$ iterations. However, for smooth convex functions one can derive a stronger result: gradient descent requires only $O(1/\epsilon)$ iterations (use exercise 6.3.2, the Lipschitz property, and the subgradient inequality). For smooth, strongly convex functionals we can strengthen the result even further and show that we only need $O(\log 1/\epsilon)$ iterations (see exercise 6.3.1). The proofs are left as an exercise.
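A minimal sketch of the subgradient method (6.5) is given below; Jsub is a user-supplied function returning an element of $\partial J(u)$, and since the objective values need not decrease monotonically, the best iterate encountered is returned:

import numpy as np

def subgradient_method(J, Jsub, u0, stepsizes):
    """Subgradient method u_{k+1} = u_k - lmbda_k*g_k with g_k in the subdifferential of J at u_k."""
    u = np.asarray(u0, dtype=float)
    u_best, J_best = u.copy(), J(u)
    for lmbda in stepsizes:
        u = u - lmbda*Jsub(u)
        if J(u) < J_best:                 # keep track of the best iterate so far
            u_best, J_best = u.copy(), J(u)
    return u_best

# example: the median problem from above, with stepsizes lmbda_k = 1/(k+1)
f = np.array([1., 2., 3., 4.])
u_best = subgradient_method(lambda u: np.sum(np.abs(u - f)),
                            lambda u: np.sum(np.sign(u - f)),
                            np.array([0.]), [1/(k + 1) for k in range(200)])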
We now consider composite functionals $J(u) = D(u) + R(u)$, where $D$ is smooth and $R$ is convex. We are then looking for a point $u_*$ for which

$$D'(u_*) \in -\partial R(u_*). \qquad (6.6)$$

This suggests the fixed-point iteration

$$u_{k+1} = (I + \lambda\partial R)^{-1}(I - \lambda D')(u_k),$$

where $u = (I + \lambda\partial R)^{-1}(v)$ yields a point $u$ for which $\lambda^{-1}(v - u) \in \partial R(u)$. We can easily show that a fixed point of this iteration indeed solves the differential inclusion problem (6.6). Assuming a fixed point $u_*$, we have

$$u_* = (I + \lambda\partial R)^{-1}(I - \lambda D')(u_*),$$

and hence

$$\lambda^{-1}(u_* - \lambda D'(u_*) - u_*) = -D'(u_*) \in \partial R(u_*),$$

which is exactly (6.6).
The operator $(I + \lambda\partial R)^{-1}$ is called the proximal operator of $\lambda R$; its action on an input $v$ is implicitly defined as solving

$$\min_u \tfrac{1}{2}\|u - v\|_2^2 + \lambda R(u),$$

and we denote it by $\mathrm{prox}_{\lambda R}(v)$.
With this, the proximal gradient method for solving (6.6) is then denoted as

$$u_{k+1} = \mathrm{prox}_{\lambda R}\left(u_k - \lambda D'(u_k)\right). \qquad (6.7)$$

Let $J = D + R$ be a functional with $D$ smooth and $R$ convex. Denote the Lipschitz constant of $D'$ by $L_D$. The iterates produced by (6.7) with a fixed step size $\lambda = 1/L_D$ converge to a fixed point, $u_*$, of (6.7). Moreover,
$$J(u_k) - J_* \le \frac{L_D\|u_* - u_0\|_2^2}{2k}.$$
Proof
When compared to the subgradient method, we may expect better performance from the proximal gradient method when $D$ is strongly convex and $R$ is convex. Even if $J$ is smooth, the proximal gradient method may be favorable, as the convergence constants depend on the Lipschitz constant of $D'$ only, not on that of $J'$. All this comes at the cost of solving a minimization problem at each iteration, so these methods are usually only applied when a closed-form expression for the proximal operator exists.
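A minimal sketch of iteration (6.7) is given below; Dp returns the gradient of the smooth part and prox(v, lmbda) evaluates the proximal operator of lmbda*R:

import numpy as np

def proximal_gradient(Dp, prox, u0, lmbda, maxiter=200):
    """Proximal gradient method: u_{k+1} = prox_{lmbda*R}(u_k - lmbda*D'(u_k))."""
    u = np.asarray(u0, dtype=float)
    for k in range(maxiter):
        u = prox(u - lmbda*Dp(u), lmbda)
    return u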
Example: one-norm
The proximal operator of the one-norm solves

$$\min_u \tfrac{1}{2}\|u - v\|_2^2 + \lambda\|u\|_1.$$

The optimality conditions read (componentwise)

$$u_i - v_i \in \begin{cases} \{-\lambda\} & u_i > 0 \\ [-\lambda, \lambda] & u_i = 0 \\ \{\lambda\} & u_i < 0 \end{cases},$$

from which we find the soft-thresholding operation

$$u_i = \begin{cases} v_i - \lambda & v_i > \lambda \\ 0 & |v_i| \le \lambda \\ v_i + \lambda & v_i < -\lambda \end{cases}.$$
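In code, this soft-thresholding operation is compactly expressed as follows; it can be passed as the prox argument to the proximal gradient sketch above:

import numpy as np

def soft_threshold(v, lmbda):
    """Proximal operator of lmbda*||.||_1: componentwise shrinkage towards zero."""
    return np.sign(v)*np.maximum(np.abs(v) - lmbda, 0)

# example: the components of v shrink to 1.5, 0.0 and 0.2
v = np.array([2.0, -0.3, 0.7])
print(soft_threshold(v, 0.5))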
Similarly, the proximal operator of the indicator function $\delta_{[a,b]^n}$ solves

$$\min_{u\in[a,b]^n} \tfrac{1}{2}\|u - v\|_2^2,$$

whose solution is the componentwise projection

$$u_i = \begin{cases} a & v_i < a \\ v_i & v_i \in [a, b] \\ b & v_i > b \end{cases}.$$
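In code, this projection is simply a componentwise clipping, e.g.:

import numpy as np

def proj_box(v, a, b):
    # projection onto [a, b]^n, i.e. the proximal operator of the indicator of [a, b]^n
    return np.clip(v, a, b)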
We now consider problems of the form

$$\min_{u\in\mathbb{R}^n} D(u) + R(Au), \qquad (6.8)$$

with $D$ smooth and convex, $R(\cdot)$ convex, and $A \in \mathbb{R}^{m\times n}$ a linear map. The basic idea is to introduce an auxiliary variable $v$ and re-formulate the variational problem as

$$\min_{u\in\mathbb{R}^n,\, v\in\mathbb{R}^m} D(u) + R(v) \quad \text{s.t.} \quad Au = v. \qquad (6.9)$$
The equivalence between (6.8) and (6.9) is established in the following theorem
To solve such constrained optimization problems we employ the method of Lagrange multipliers, which defines the Lagrangian

$$\Lambda(u, v, \nu) = D(u) + R(v) + \langle \nu, Au - v\rangle,$$

where $\nu \in \mathbb{R}^m$ are called the Lagrange multipliers. The solution to (6.8) is a saddle point of $\Lambda$ and can thus be obtained by solving

$$\min_{u,v}\max_{\nu} \Lambda(u, v, \nu).$$
Let (u ∗ , v ∗ ) be a solution to (6.8), then there exists a ν ∗ ∈ R m such that (u ∗ , v ∗ , ν ∗ ) is a saddle point of Λ and vice versa.
Proof
Rather than solving this saddle-point problem directly, we can also consider the corresponding dual problem

$$\max_{\nu}\min_{u,v} \Lambda(u, v, \nu). \qquad (6.10)$$
For convex problems, the primal and dual problems are equivalent, giving us freedom when designing algorithms.
The primal (6.9) and dual (6.10) problems are equivalent in the sense that

$$\min_{u,v}\max_{\nu}\Lambda(u,v,\nu) = \max_{\nu}\min_{u,v}\Lambda(u,v,\nu).$$
Proof
Example: TV-denoising
Consider the TV-denoising problem

$$\min_{u\in\mathbb{R}^n} \tfrac{1}{2}\|u - f^\delta\|_2^2 + \lambda\|Du\|_1,$$

with $D \in \mathbb{R}^{m\times n}$ a discretisation of the first derivative. We can express the corresponding dual problem as

$$\max_{\nu}\min_{u,v} \tfrac{1}{2}\|u - f^\delta\|_2^2 + \lambda\|v\|_1 + \langle \nu, Du - v\rangle.$$
The first term is minimised by setting $u = f^\delta - D^*\nu$. The second term is a bit trickier. First, we note that $\lambda\|v\|_1 - \langle \nu, v\rangle$ is not bounded from below when $\|\nu\|_\infty > \lambda$. Furthermore, for $\|\nu\|_\infty \le \lambda$ it attains a minimum for $v = 0$.
This leads to the dual problem

$$\max_{\|\nu\|_\infty \le \lambda} \langle \nu, Df^\delta\rangle - \tfrac{1}{2}\|D^*\nu\|_2^2,$$

which is a constrained quadratic program. Since the first part is smooth and the proximal operator for the constraint $\|\nu\|_\infty \le \lambda$ is easy, we can employ a proximal gradient method to solve the dual problem. Having solved it, we can retrieve the primal variable via the relation $u = f^\delta - D^*\nu$.
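A minimal sketch of this dual approach is given below; up to an additive constant, the dual problem is equivalent to minimising $\tfrac{1}{2}\|D^T\nu - f^\delta\|_2^2$ over the box $\|\nu\|_\infty \le \lambda$, so a projected gradient iteration with clipping as the proximal step can be used (D, f_delta and lmbda are assumed to be given):

import numpy as np

def tv_denoise_dual(f_delta, D, lmbda, maxiter=500):
    """TV-denoising via projected gradient on the dual problem (a sketch)."""
    nu = np.zeros(D.shape[0])
    t = 1.0/np.linalg.norm(D @ D.T, 2)            # step size 1/L with L = ||D D^T||_2
    for k in range(maxiter):
        grad = D @ (D.T @ nu - f_delta)           # gradient of 0.5*||D^T nu - f_delta||^2
        nu = np.clip(nu - t*grad, -lmbda, lmbda)  # project onto the box ||nu||_inf <= lmbda
    return f_delta - D.T @ nu                     # recover the primal variable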
The strategy illustrated in the previous example is an instance of a more general approach to solving problems of the form (6.8). Writing out the dual problem (6.10) gives

$$\max_{\nu}\left(\min_u D(u) + \langle A^T\nu, u\rangle\right) - \left(\max_v \langle \nu, v\rangle - R(v)\right).$$

In this expression we recognise the convex conjugates of $D$ and $R$. With this, we re-write the problem as

$$\max_{\nu} -D^*(-A^T\nu) - R^*(\nu).$$

Thus, we have moved the linear map to the other side. We can now apply the proximal gradient method, provided that:

the convex conjugate $D^*$ is smooth and we can evaluate its gradient;

the proximal operator of $R^*$ can be evaluated efficiently.

For many simple functions, we do have such closed-form expressions of their convex conjugates. Moreover, to compute the proximal operator of $R^*$, we can use Moreau's identity: $\mathrm{prox}_R(u) + \mathrm{prox}_{R^*}(u) = u$.
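For example, with $R = \|\cdot\|_1$ the conjugate $R^*$ is the indicator of the unit $\infty$-norm ball, so Moreau's identity states that soft-thresholding and clipping sum to the identity; a quick check using the soft_threshold function from the one-norm example:

import numpy as np

v = np.array([-2.0, -0.4, 0.1, 3.0])
print(soft_threshold(v, 1.0) + np.clip(v, -1.0, 1.0))   # recovers v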
It may not always be feasible to formulate the dual problem explicitly as in the previous example. In such cases we would rather solve (6.10) directly. A popular way of doing this is the Alternating Direction Method of Multipliers (ADMM).
Example: TV-denoising
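As an illustration, a minimal sketch of how ADMM could be applied to the TV-denoising problem $\min_u \tfrac{1}{2}\|u - f^\delta\|_2^2 + \lambda\|Du\|_1$ is given below (scaled form; the penalty parameter rho and the number of iterations are arbitrary choices):

import numpy as np

def tv_denoise_admm(f_delta, D, lmbda, rho=1.0, maxiter=200):
    """A sketch of ADMM for min_u 0.5*||u - f_delta||^2 + lmbda*||D u||_1 (scaled form)."""
    n, m = D.shape[1], D.shape[0]
    u = f_delta.copy()
    v = np.zeros(m)
    w = np.zeros(m)
    A = np.eye(n) + rho*(D.T @ D)                                 # matrix of the u-update
    for k in range(maxiter):
        u = np.linalg.solve(A, f_delta + rho*D.T @ (v - w))       # u-update: quadratic subproblem
        z = D @ u + w
        v = np.sign(z)*np.maximum(np.abs(z) - lmbda/rho, 0)       # v-update: soft-thresholding
        w = w + D @ u - v                                         # dual (multiplier) update
    return u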
We cannot do justice to the breadth and depth of the topics of smooth and convex optimization in one chapter. Rather, we hope that this chapter serves as a starting point for further study in one of these areas for some, and provides useful recipes for others.
6.3. Exercises
Let $J$ be a twice-differentiable functional whose Hessian satisfies

$$\mu I \preceq J''(u) \preceq LI,$$

for all $u$, with constants $0 < \mu \le L$, and consider the fixed-point iteration $u^{(k+1)} = u^{(k)} - \alpha J'(u^{(k)})$.

Show that the fixed-point iteration converges linearly, i.e., $\|u^{(k+1)} - u_*\| \le \rho\|u^{(k)} - u_*\|$ with $\rho < 1$, for $0 < \alpha < 2/L$.
Answer
Answer
Let $J$ be a convex functional with $L$-Lipschitz-continuous gradient. Show that gradient descent with step size $\lambda = 1/L$ produces iterates for which

$$J(u_k) - J(u_*) \le \frac{L\|u_0 - u_*\|_2^2}{2k}.$$
Answer
6.3.3. Rosenbrock
We are going to test various optimization methods on the Rosenbrock function
$$f(x, y) = (a - x)^2 + b(y - x^2)^2,$$

for given parameters $a$ and $b$.
Write a function to compute the Rosenbrock function, its gradient and the Hessian for given input $(x, y)$ (a possible starting point is sketched below). Visualize the function on $[-3, 3]^2$ and indicate the neighborhood around the minimum where $f$ is convex.
Implement the method from exercise 1 and test convergence from various initial points. Does the method always converge? How small do you need to pick $\alpha$? How fast is the convergence?

Implement a line search strategy to ensure that $\alpha_k$ satisfies the Wolfe conditions. Does $\alpha$ vary a lot?
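As a starting point for the first task, the function, gradient and Hessian could be implemented along the following lines (taking the usual parameter choice a = 1, b = 100 as an assumption):

import numpy as np

def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function, its gradient, and its Hessian at (x, y)."""
    f = (a - x)**2 + b*(y - x**2)**2
    g = np.array([-2*(a - x) - 4*b*x*(y - x**2),
                   2*b*(y - x**2)])
    H = np.array([[2 - 4*b*(y - 3*x**2), -4*b*x],
                  [-4*b*x,                2*b  ]])
    return f, g, H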
Answer
6.3.4. Subdifferentials
Compute the subdifferentials of the following functionals $J : \mathbb{R}^n \rightarrow \mathbb{R}_+$:
Answer
$\min_u \|u - f^\delta\|_1 + \lambda\|u\|_2^2.$

$\min_u \tfrac{1}{2}\|u - f^\delta\|_2^2 + \lambda\|u\|_p$, with $p \in \mathbb{N}_{>0}$.

$\min_{u\in[-1,1]^n} \tfrac{1}{2}\|u - f^\delta\|_2^2.$
Answer
6.3.6. TV-denoising
In this exercise we consider a one-dimensional TV-denoising problem
$$\min_{u\in\mathbb{R}^n} \tfrac{1}{2}\|u - f^\delta\|_2^2 + \lambda\|Du\|_1,$$

with $D$ the finite-difference differentiation matrix constructed in the code below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300

# parameters (grid values assumed; the notes define these in an earlier cell)
n = 100
h = 1/n
x = np.linspace(h/2, 1-h/2, n)
sigma = 1e-1

# make data
u = np.heaviside(x - 0.2, 0)
f_delta = u + sigma*np.random.randn(n)

# FD differentiation matrix
D = (np.diag(np.ones(n-1),1) - np.diag(np.ones(n),0))/h

# plot
plt.plot(x,u,x,f_delta)
plt.xlabel(r'$x$')
plt.show()
Answer
6.4. Assignments
$$\min_u \tfrac{1}{2}\|Ku - f^\delta\|_2^2 + \alpha\|Lu\|_1,$$
where K is a given forward operator (matrix) and L is a discretization of the second derivative operator.
1. Design and implement a method for solving this variational problem; you can be creative here – multiple answers are possible
2. Compare your method with the basic subgradient-descent method implemented below
3. (bonus) Find a suitable value for α using the discrepancy principle
# import libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300

# forward operator
def getK(n):
    h = 1/n
    x = np.linspace(h/2,1-h/2,n)
    xx,yy = np.meshgrid(x,x)
    K = h/(1 + (xx - yy)**2)**(3/2)
    return K,x

# problem setup (values assumed; the notes define these in an earlier cell)
n = 100
delta = 1e-2
K, x = getK(n)
u = np.heaviside(x - 0.5, 0)   # assumed ground-truth signal
f = K @ u                      # clean data

# noisy data
noise = np.random.randn(n)
f_delta = f + delta*noise

# plot
plt.plot(x,u,x,f,x,f_delta)
plt.xlabel(r'$x$')
plt.show()
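A minimal subgradient-descent baseline for this problem might look as follows; K, f_delta and alpha are as above, and getL is an assumed second-derivative discretisation:

import numpy as np

def getL(n):
    # an assumed finite-difference discretisation of the second derivative
    h = 1/n
    return (np.diag(np.ones(n-1),1) - 2*np.diag(np.ones(n)) + np.diag(np.ones(n-1),-1))/h**2

def subgradient_descent(K, L, f_delta, alpha, stepsize=1e-3, maxiter=1000):
    # basic subgradient descent for 0.5*||K u - f_delta||^2 + alpha*||L u||_1
    u = np.zeros(K.shape[1])
    for k in range(maxiter):
        g = K.T @ (K @ u - f_delta) + alpha*L.T @ np.sign(L @ u)   # a subgradient of the objective
        u = u - stepsize*g
    return u

# example usage with the setup above (alpha and stepsize are arbitrary choices)
u_sub = subgradient_descent(K, getL(n), f_delta, alpha=1e-3)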