
An introduction to nonsmooth convex optimization: numerical algorithms

Masoud Ahookhosh

Faculty of Mathematics, University of Vienna


Vienna, Austria

Convex Optimization I

January 29, 2014


Table of contents

1 Introduction
Definitions
Applications of nonsmooth convex optimization
Basic properties of subdifferential

2 Numerical algorithms for nonsmooth optimization
Nonsmooth black-box optimization
Proximal gradient algorithm
Smoothing algorithms
Optimal complexity algorithms

3 Conclusions

4 References


Definition of problems
Definition 1 (Structural convex optimization).
Consider the following convex optimization problem:

minimize   f(x)
subject to x ∈ C                                        (1)

f(x) is a convex function;
C is a closed convex subset of a vector space V;

Properties:
f (x) can be smooth or nonsmooth;
Solving nonsmooth convex optimization problems is much harder
than solving differentiable ones;
For some nonsmooth nonconvex cases, even finding a descent
direction is not possible;
The problem may involve linear operators.

Applications
Applications of convex optimization:
Approximation and fitting;
Norm approximation;
Least-norm problems;
Regularized approximation;
Robust approximation;
Function fitting and interpolation;
Statistical estimation;
Parametric and nonparametric distribution estimation;
Optimal detector design and hypothesis testing;
Chebyshev and Chernoff bounds;
Experiment design;
Global optimization;
Find bounds on the optimal value;
Find approximate solutions;
Convex relaxation;

Geometric problems;
Projection on and distance between sets;
Centering and classification;
Placement and location;
Smallest enclosing ellipsoid;
Image and signal processing;
Optimizing the number of image models using convex relaxation;
Image fusion for medical imaging;
Image reconstruction;
Sparse signal processing;
Design and control of complex systems;
Machine learning;
Financial and mechanical engineering;
Computational biology;


Definition: subgradient and subdifferential

Definition 2 (Subgradient and subdifferential).


A vector g ∈ Rⁿ is a subgradient of f : Rⁿ → R at x ∈ dom f if

f(z) ≥ f(x) + gᵀ(z − x),                                (2)

for all z ∈ dom f.


The set of all subgradients of f at x is called the subdifferential of f
at x and denoted by ∂f (x).

Definition 3 (Subdifferentiable functions).


A function f is called subdifferentiable at x if there exists at least
one subgradient of f at x.
A function f is called subdifferentiable if it is subdifferentiable at all
x ∈ domf .

Subgradient and subdifferential

Examples:
If f is convex and differentiable, then the following first-order
condition holds:

f(z) ≥ f(x) + ∇f(x)ᵀ(z − x),                            (3)

for all z ∈ dom f. This implies ∂f(x) = {∇f(x)};


Absolute value. Consider f(x) = |x|. Then

∂f(x) = {1} if x > 0;   [−1, 1] if x = 0;   {−1} if x < 0.

Thus, g = sign(x) is a subgradient of f at x.


Basic properties

Basic properties of subdifferential are as follows:


The subdifferential ∂f (x) is a closed convex set, even for a
nonconvex function f .
If f is convex and x ∈ int domf , then ∂f (x) is nonempty and
bounded.
∂(αf (x)) = α∂f (x), for α ≥ 0.
∂(f1(x) + · · · + fn(x)) = ∂f1(x) + · · · + ∂fn(x).
If h(x) = f(Ax + b), then ∂h(x) = Aᵀ∂f(Ax + b).
If h(x) = max{f1(x), . . . , fn(x)}, then
∂h(x) = conv ∪ {∂fi(x) | fi(x) = h(x), i = 1, . . . , n}.
If h(x) = sup_{β∈B} fβ(x), then
∂h(x) = conv ∪ {∂fβ(x) | fβ(x) = h(x), β ∈ B}.


How to calculate subgradients


Example: consider f(x) = ‖x‖₁ = Σ_{i=1}^n |xi|. It is clear that

f(x) = max{ sᵀx | si ∈ {−1, 1} }.

Each sᵀx is differentiable with gradient ∇(sᵀx) = s. Thus, for an
active s, i.e. one with sᵀx = ‖x‖₁, we should have

si = 1 if xi > 0;   si ∈ {−1, 1} if xi = 0;   si = −1 if xi < 0.    (4)

This clearly implies

∂f(x) = conv ∪ { g | g of the form (4), gᵀx = ‖x‖₁ }
      = { g | ‖g‖∞ ≤ 1, gᵀx = ‖x‖₁ }.

Thus, g = sign(x) is a subgradient of f at x.
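As a numerical sanity check (a minimal sketch added here, not part of the original slides), the following Python snippet verifies the subgradient inequality (2) for f(x) = ‖x‖₁ with g = sign(x) at random points:

import numpy as np

def f(x):
    # f(x) = ||x||_1
    return np.sum(np.abs(x))

def subgrad(x):
    # g = sign(x) is one element of the subdifferential of ||.||_1
    return np.sign(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
g = subgrad(x)
# Check f(z) >= f(x) + g^T (z - x) for a batch of random z
for _ in range(1000):
    z = rng.standard_normal(50)
    assert f(z) >= f(x) + g @ (z - x) - 1e-9
print("subgradient inequality (2) holds at 1000 random points")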



Optimality condition:

First-order condition: A point x∗ is a minimizer of a convex function
f if and only if f is subdifferentiable at x∗ and

0 ∈ ∂f (x∗ ), (5)

i.e., g = 0 is a subgradient of f at x∗ .
The condition (5) reduces to ∇f (x∗ ) = 0 if f is differentiable at x∗ .
Analytical complexity: the number of oracle calls required to solve a
problem to accuracy ε, i.e., the number of calls needed so that

f(xk) − f(x∗) ≤ ε;                                      (6)

Arithmetical complexity: the total number of arithmetic operations
required to solve a problem to accuracy ε.

Numerical algorithms

The algorithms for solving nonsmooth convex optimization problems are
commonly divided into the following classes:

Nonsmooth black-box optimization;
Proximal mapping techniques;
Smoothing methods;

We will not consider derivative-free and heuristic algorithms for
solving nonsmooth convex optimization problems here.


Nonsmooth black-box optimization: subgradient algorithms


The subgradient scheme for unconstrained problems is

xk+1 = xk − αk gk,

where gk is a subgradient of f at xk and αk is a step size determined
by one of the following rules:
Constant step size: αk = α;
Constant step length: αk = γ/‖gk‖₂;
Square summable but not summable:
αk ≥ 0,  Σ_{k=1}^∞ αk² < ∞,  Σ_{k=1}^∞ αk = ∞;
Nonsummable diminishing step size:
αk ≥ 0,  lim_{k→∞} αk = 0,  Σ_{k=1}^∞ αk = ∞;
Nonsummable diminishing step length: αk = γk/‖gk‖₂ such that
γk ≥ 0,  lim_{k→∞} γk = 0,  Σ_{k=1}^∞ γk = ∞.
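A minimal Python sketch of this scheme (the objective, data, and step-size constant below are illustrative assumptions, not from the slides):

import numpy as np

def subgradient_method(f, subgrad, x0, alpha0=1.0, iters=500):
    # Basic subgradient method with the nonsummable diminishing step
    # size alpha_k = alpha0/sqrt(k+1). It is not a descent method,
    # so we keep track of the best point seen so far.
    x, x_best, f_best = x0.copy(), x0.copy(), f(x0)
    for k in range(iters):
        x = x - alpha0 / np.sqrt(k + 1.0) * subgrad(x)
        fx = f(x)
        if fx < f_best:
            x_best, f_best = x.copy(), fx
    return x_best, f_best

# Assumed example data for f(x) = 0.5*||Ax - b||^2 + lam*||x||_1, cf. (8)
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((200, 500)), rng.standard_normal(200), 0.1
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.sum(np.abs(x))
subgrad = lambda x: A.T @ (A @ x - b) + lam * np.sign(x)

x_best, f_best = subgradient_method(f, subgrad, np.zeros(500))
print(f_best)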


The subgradient algorithm: properties

Main properties:
The subgradient method is simple to implement and applies
directly to nondifferentiable f;
The step sizes are not chosen via line search, as in the ordinary
gradient method;
The step sizes are determined before running the algorithm and do
not depend on any data computed during the algorithm;
Unlike the ordinary gradient method, the subgradient method is not
a descent method;
The function value is nonmonotone, meaning that it can even increase;
The subgradient algorithm is very slow for solving practical problems.


Bound on the function value error:

If the Euclidean distance to the optimal set is bounded, ‖x0 − x∗‖₂ ≤ R,
and ‖gk‖₂ ≤ G, then we have

f_k^best − f∗ ≤ (R² + G² Σ_{i=1}^k αi²) / (2 Σ_{i=1}^k αi) =: RHS.    (7)

Constant step size: k → ∞ ⇒ RHS → G²α/2;
Constant step length: k → ∞ ⇒ RHS → Gγ/2;
Square summable but not summable: k → ∞ ⇒ RHS → 0;
Nonsummable diminishing step size: k → ∞ ⇒ RHS → 0;
Nonsummable diminishing step length: k → ∞ ⇒ RHS → 0.

Example: we now consider the LASSO problem

minimize_{x∈Rⁿ} ½‖Ax − b‖₂² + λ‖x‖₁,                    (8)

where A and b are randomly generated.
Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 1: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 2: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₂²

Figure 3: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 4: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 5: The nonmonotone behaviour of the original subgradient
algorithms, stopped after 20 seconds of running time (sparse, m = 2000
and n = 5000).

Projected subgradient algorithm


Consider the following constrained problem

minimize   f(x)
subject to x ∈ C,                                       (9)

where C is a simple convex set. Then the projected subgradient scheme
is given by

xk+1 = P(xk − αk gk),                                   (10)

where

P(y) = argmin_{x∈C} ½‖x − y‖₂².                         (11)

Examples of simple sets C whose projection is cheaply computable
(see the sketch below):
Nonnegative orthant;
Affine set;
Box or unit ball;
Unit simplex;
An ellipsoid;
Second-order cone;
Positive semidefinite cone;
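For a few of these sets the projection is available in closed form; a minimal Python sketch (the set parameters lo, hi and radius are illustrative assumptions):

import numpy as np

def proj_nonnegative(y):
    # Projection onto the nonnegative orthant {x : x >= 0}
    return np.maximum(y, 0.0)

def proj_box(y, lo, hi):
    # Projection onto the box {x : lo <= x <= hi} (componentwise clipping)
    return np.clip(y, lo, hi)

def proj_ball(y, radius=1.0):
    # Projection onto the Euclidean ball {x : ||x||_2 <= radius}
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else (radius / nrm) * y

def proj_affine(y, A, b):
    # Projection onto {x : Ax = b}, assuming A has full row rank:
    # P(y) = y - A^T (A A^T)^{-1} (A y - b)
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)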

Projected subgradient algorithm


Example: let us consider

minimize   ‖x‖₁
subject to Ax = b,                                      (12)

where x ∈ Rⁿ, b ∈ Rᵐ and A ∈ Rᵐˣⁿ. Considering the set
C = {x | Ax = b}, we have

P(y) = y − Aᵀ(AAᵀ)⁻¹(Ay − b).                           (13)

The projected subgradient algorithm can be summarized as

xk+1 = xk − αk (I − Aᵀ(AAᵀ)⁻¹A) gk.                      (14)

By setting gk = sign(xk), we obtain

xk+1 = xk − αk (I − Aᵀ(AAᵀ)⁻¹A) sign(xk).                (15)
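A minimal Python sketch of iteration (15) (the random problem data and the diminishing step-size constant are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 200
A = rng.standard_normal((m, n))
b = A @ np.where(rng.random(n) < 0.05, rng.standard_normal(n), 0.0)

# Start from a feasible point: the least-norm solution of Ax = b
x = A.T @ np.linalg.solve(A @ A.T, b)
AAt_inv = np.linalg.inv(A @ A.T)

for k in range(2000):
    g = np.sign(x)                                  # subgradient of ||x||_1
    step = 0.01 / np.sqrt(k + 1)                    # diminishing step size
    x = x - step * (g - A.T @ (AAt_inv @ (A @ g)))  # iteration (15)

print(np.linalg.norm(A @ x - b), np.sum(np.abs(x)))  # feasibility, objective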



Proximal gradient algorithm


Consider a composite function of the form

h(x) = f(x) + g(x).                                     (16)

Characteristics of the considered convex optimization:
Appearing in many applications in science and technology: signal
and image processing, machine learning, statistics, inverse problems,
geophysics and so on.
In convex optimization, every local optimum is a global optimizer.
Most of the problems are a combination of smooth and nonsmooth
functions:
h(x) = f(Ax) + g(Bx),
where f(Ax) and g(Bx) are respectively smooth and nonsmooth
functions.
Function and subgradient evaluations can be costly: affine
transformations are the most costly part of the computation.
They involve high-dimensional data.

Proximal gradient algorithm


The algorithm involves two steps, namely a forward and a backward step:

Algorithm 1: PGA (proximal gradient algorithm)
Input: α0 ∈ (0, 1]; y0; ε > 0;
begin
    while stopping criteria do not hold do
        yk+1 = xk − αk ∇f(xk);
        xk+1 = argmin_{x∈Rⁿ} 1/(2αk) ‖x − yk+1‖₂² + g(x);
    end
end

The first step is called the forward step because it moves toward the
minimizer of the smooth part, and the second step is called the
backward step because it is reminiscent of the feasibility (projection)
step of the projected gradient method.
It is clear that the projected gradient method is a special case of
PGA (take g as the indicator function of C).
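For g(x) = λ‖x‖₁ the backward step is the soft-thresholding operator, which gives the ISTA variant of PGA; a minimal sketch with fixed step size 1/L, L = ‖A‖₂² (the problem data are assumed for illustration):

import numpy as np

def soft_threshold(y, t):
    # prox of t*||.||_1: componentwise shrinkage toward zero
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def proximal_gradient(A, b, lam, iters=300):
    # Proximal gradient (ISTA) for 0.5*||Ax - b||^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of grad f
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        y = x - alpha * (A.T @ (A @ x - b))   # forward (gradient) step
        x = soft_threshold(y, alpha * lam)    # backward (proximal) step
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
x = proximal_gradient(A, b, lam=0.1)
print(0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.sum(np.abs(x)))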

Smoothing algorithms

The smoothing algorithms involve the following steps:


Reformulate the problem in the appropriate form for smoothing
processes;
Make the problem smooth;
Solve the problem with smooth convex solvers.

Nesterov’s smoothing algorithm:


Reformulate the problem in the form of the minimax problem
(saddle point representation);
Add a strongly convex prox function to the reformulated problem to
make it smooth;
Solve the problem with optimal first-order algorithms.
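As an illustration of this recipe (an assumed example, not taken from the slides): smoothing |t| = max{ut : |u| ≤ 1} by subtracting the strongly convex prox term (μ/2)u² inside the max yields the Huber function, whose gradient is Lipschitz with constant 1/μ:

import numpy as np

def smoothed_abs(t, mu):
    # Nesterov-type smoothing of |t| = max{u*t : |u| <= 1}:
    # f_mu(t) = max_{|u|<=1} { u*t - (mu/2)*u^2 }
    #         = t^2/(2*mu) if |t| <= mu, |t| - mu/2 otherwise (Huber)
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= mu,
                    t ** 2 / (2 * mu),
                    np.abs(t) - mu / 2)

# The smooth approximation satisfies f_mu(t) <= |t| <= f_mu(t) + mu/2.
print(smoothed_abs([-2.0, -0.05, 0.0, 0.05, 2.0], mu=0.1))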


Optimal complexity for first-order methods


Nemirovski and Yudin (1983) proved the following complexity bounds for
smooth and nonsmooth problems:

Theorem 4 (Complexity analysis).
Suppose that f is a convex function. Then complexity bounds for
smooth and nonsmooth problems are:

(Nonsmooth complexity bound) If the points generated by the
algorithm stay in a bounded region of the interior of C, or f is
Lipschitz continuous in C, then the total number of iterations
needed is O(1/ε²). Thus the asymptotic worst-case complexity is
O(1/ε²).

(Smooth complexity bound) If f has a Lipschitz continuous gradient,
the total number of iterations needed for the algorithm is O(1/√ε).


Optimal first-order algorithms

Some popular optimal first-order algorithms:


Nonsummable diminishing subgradient algorithm;
Nesterov’s 1983 smooth algorithm;
Nesterov and Nemiroski’s 1988 smooth algorithm;
Nesterov’s constant step algorithm;
Nesterov’s 2005 smooth algorithm;
Nesterov’s composite algorithm;
Nesterov’s universal gradient algorithm;
Fast iterative shrinkage-thresholding algorithm
Tseng’s 2008 single projection algorithm;
Lan’s 2013 bundle-level algorithm;
Neumaier’s 2014 fast subgradient algorithm;


Algorithm 2: NES83 (Nesterov's 1983 algorithm)

Input: select z such that z ≠ y0 and gy0 ≠ gz; y0; ε > 0;
begin
    a0 ← 0; x−1 ← y0;
    α−1 ← ‖y0 − z‖ / ‖gy0 − gz‖;
    while stopping criteria do not hold do
        α̂k ← αk−1; x̂k ← yk − α̂k gyk;
        while f(x̂k) < f(yk) − ½ α̂k ‖gyk‖² do
            α̂k ← ρ α̂k; x̂k ← yk − α̂k gyk;
        end
        xk+1 ← x̂k; αk ← α̂k;
        ak+1 ← (1 + √(4ak² + 1)) / 2;
        yk+1 ← xk + (ak − 1)(xk − xk−1)/ak+1;
    end
end


Algorithm 3: FISTA (fast iterative shrinkage-thresholding algorithm)

Input: y0; a0; L; ε > 0;
begin
    while stopping criteria do not hold do
        αk ← 1/L;
        zk ← yk − αk gyk;
        xk ← argmin_x (L/2)‖x − zk‖₂² + g(x);
        ak+1 ← (1 + √(4ak² + 1)) / 2;
        yk+1 ← xk + (ak − 1)(xk − xk−1)/ak+1;
    end
end

By this adaptation, FISTA attains the optimal complexity of smooth
first-order algorithms.
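A minimal FISTA sketch for ½‖Ax − b‖₂² + λ‖x‖₁, where the backward step is again soft-thresholding (the problem data are assumed for illustration):

import numpy as np

def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fista(A, b, lam, iters=300):
    # FISTA for 0.5*||Ax - b||^2 + lam*||x||_1 with fixed step 1/L
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    x_prev = y = np.zeros(A.shape[1])
    a = 1.0
    for _ in range(iters):
        z = y - (A.T @ (A @ y - b)) / L        # gradient step at y_k
        x = soft_threshold(z, lam / L)         # proximal (backward) step
        a_next = (1.0 + np.sqrt(4.0 * a * a + 1.0)) / 2.0
        y = x + (a - 1.0) / a_next * (x - x_prev)   # momentum extrapolation
        x_prev, a = x, a_next
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
x = fista(A, b, lam=0.1)
print(0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.sum(np.abs(x)))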

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 6: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 7: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₂²

Figure 8: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 9: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Conclusions
Summarizing our discussion:

Nonsmooth problems appear in applications even more often than
smooth optimization problems;
Solving nonsmooth optimization problems is much harder than
solving common smooth optimization problems;
The most efficient algorithms for solving them are first-order
methods;
There is no natural stopping criterion in the corresponding
algorithms;
The algorithms are divided into three classes:
Nonsmooth black-box algorithms;
Proximal mapping algorithms;
Smoothing algorithms;
The analytical complexity of the algorithms is the most important
part of the theoretical results;
Optimal complexity algorithms are efficient enough to solve
practical problems.

References

[1] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding
algorithm for linear inverse problems, SIAM Journal on Imaging
Sciences, 2 (2009), 183–202.
[2] Boyd, S., Xiao, L., Mutapcic, A.: Subgradient methods, lecture
notes (2003).
[3] Nemirovski, A.S., Yudin, D.: Problem Complexity and Method
Efficiency in Optimization. Wiley-Interscience Series in Discrete
Mathematics. Wiley, XV (1983).
[4] Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A
Basic Course. Kluwer, Massachusetts (2004).
[5] Nesterov, Y.: A method of solving a convex programming problem
with convergence rate O(1/k²), Doklady AN SSSR (in Russian), 269
(1983), 543–547. English translation: Soviet Math. Dokl. 27 (1983),
372–376.

Thank you for your consideration


Optimal subgradient methods for large-scale convex optimization

Masoud Ahookhosh

Faculty of Mathematics, University of Vienna


Vienna, Austria

Convex Optimization I

January 30, 2014


Table of contents

1 Introduction
Definition of the problem
State-of-the-art solvers

2 Novel optimal algorithms
Optimal SubGradient Algorithm (OSGA)
Algorithmic structure: OSGA

3 Numerical experiments
Numerical experiments: linear inverse problem
Comparison with state-of-the-art software

4 Conclusions


Definition of problems
Definition 1 (Structural convex optimization).
Consider the following convex optimization problem:

minimize   f(x)
subject to x ∈ C                                        (1)

f(x) is a convex function;
C is a closed convex subset of a vector space V;

Properties:
f(x) can be smooth or nonsmooth;
Solving nonsmooth convex optimization problems is much harder
than solving differentiable ones;
For some nonsmooth nonconvex cases, even finding a descent
direction is not possible;
The problem may involve linear operators.

Which kind of algorithms can deal with these problems?


Appropriate algorithms for this class of problems: First-order methods
Gradient and Subgradient projection algorithms;
Conjugate gradient algorithms;
Optimal gradient and subgradient algorithms;
Proximal mapping and Soft-thresholding algorithms;
Optimal complexity for COP (Nemirovski and Yudin 1983):
Smooth problems → O(1/√ε);
Nonsmooth problems → O(1/ε²).
Some examples:
N83: Nesterov's single-projection (1983);
N07: Nesterov's dual-projection (2007);
FISTA: Beck and Teboulle's optimal proximal algorithm (2009);
N07: Nesterov's universal gradient (2013);
OSGA & ASGA: Ahookhosh and Neumaier's affine subgradient
algorithms (2013).

Optimal SubGradient Algorithm (OSGA): Motivation


The primary aim:

0 ≤ f(xb) − f(x∗) ≤ Bound → 0                           (2)

To do so, we consider:
First-order oracle: a black-box unit that computes f(x) and a
(sub)gradient g(x) for the numerical method at each point x:

O(x) = (f(x), g(x)).                                    (3)

Linear relaxation: f(z) ≥ γ + ⟨h, z⟩;
Prox function: Q is continuously differentiable,
Q0 = inf_{z∈C} Q(z) > 0, and

Q(z) ≥ Q(x) + ⟨gQ(x), z − x⟩ + ½‖z − x‖²,  ∀x, z ∈ C.    (4)
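A minimal sketch of such an oracle for the regularized least-squares objective f(x) = ½‖Ax − b‖₂² + λ‖x‖₁ (the objective and data are illustrative assumptions, not from the slides):

import numpy as np

def make_oracle(A, b, lam):
    # First-order oracle O(x) = (f(x), g(x)) for
    # f(x) = 0.5*||Ax - b||^2 + lam*||x||_1, where g(x) is a subgradient
    def oracle(x):
        r = A @ x - b
        fx = 0.5 * r @ r + lam * np.sum(np.abs(x))
        gx = A.T @ r + lam * np.sign(x)   # one valid subgradient
        return fx, gx
    return oracle

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
oracle = make_oracle(A, b, lam=0.1)
fx, gx = oracle(np.zeros(300))
print(fx, np.linalg.norm(gx))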

Auxiliary subproblem:

E(γ, h) = inf_{z∈C} (γ + ⟨h, z⟩) / Q(z),                 (5)

where z = U(γ, h) ∈ C, and E(γ, h) and U(γ, h) are computable.

Error bound: from the definition of E(γ, h), the linear relaxation
and some manipulations, it can be concluded that

0 ≤ f(xb) − f(x∗) ≤ ηQ(x∗).                              (6)

How to use this in an algorithm:
If Q(x∗) is computable, then the error bound ηQ(x∗) is applicable.
Otherwise, we search for a decreasing sequence {ηk} satisfying

0 ≤ f(xb) − f(x∗) ≤ ηk Q(x∗),                            (7)

stopping once ηk ≤ ε for some constant ε > 0 (cf. the while loop of
Algorithm 2 below).


Algorithmic structure
Algorithm 2: Optimal SubGradient Algorithm (OSGA)
Input: λ, αmax ∈ (0, 1), 0 < κ0 ≤ κ, µ ≥ 0, ε > 0 and ftarget;
begin
    choose xb; stop if f(xb) ≤ ftarget;
    h = g(xb); γ = f(xb) − ⟨h, xb⟩;
    γb = γ − f(xb); u = U(γb, h); η = E(γb, h) − µ; α = αmax;
    while η > ε do
        x = xb + α(u − xb); g = g(x); h̄ = h + α(g − h);
        γ̄ = γ + α(f(x) − ⟨g, x⟩ − γ); x′b = argmin{f(xb), f(x)};
        γ′b = γ̄ − f(x′b); u′ = U(γ′b, h̄); x′ = xb + α(u′ − xb);
        choose x̄b = argmin{f(x′b), f(x′)};
        γ̄b = γ̄ − f(x̄b); ū = U(γ̄b, h̄); η = E(γ̄b, h̄) − µ;
        xb = x̄b; stop if f(xb) ≤ ftarget;
        update α, h, γ, η, u;
    end
end

Theoretical Analysis

Theorem 2 (Complexity analysis).


Suppose that f is a convex function. Then complexity bounds for
smooth and nonsmooth problems are:

(Nonsmooth complexity bound) If the points generated by Algorithm 2
stay in a bounded region of the interior of C, or f is Lipschitz
continuous in C, then the total number of iterations needed is
O(1/ε²). Thus the asymptotic worst-case complexity is O(1/ε²).

(Smooth complexity bound) If f has a Lipschitz continuous gradient,
the total number of iterations needed for the algorithm is O(1/√ε).

⇒ OSGA IS AN OPTIMAL METHOD


Prox function and subproblem solving

Quadratic norm:
‖z‖ := √⟨Bz, z⟩

Dual norm:
‖h‖∗ := ‖B⁻¹h‖ = √⟨h, B⁻¹h⟩

Prox function:
Q(z) := Q0 + ½‖z − z0‖²

Subproblem solution:

U(γ, h) = z0 − E(γ, h)⁻¹ B⁻¹h

E(γ, h) = (−β + √(β² + 2Q0‖h‖∗²)) / (2Q0)
        = ‖h‖∗² / (β + √(β² + 2Q0‖h‖∗²)).


Numerical experiments: linear inverse problem

Definition 3 (Linear inverse problem).


We consider linear (inverse) problems of the form

Ax = b + δ,                                             (8)

where A ∈ Rᵐˣⁿ is a matrix or a linear operator, x ∈ Rⁿ and b, δ ∈ Rᵐ.

Examples:
Signal and image processing
Machine learning and statistics
Compressed sensing
Geophysics
···

Approximate solution
Definition 4 (Least squares problem).

minimize ½‖Ax − b‖₂²                                    (9)

The problem involves high-dimensional data;
The problem is usually ill-conditioned and singular.

Alternative problems: Tikhonov regularization:

minimize ½‖Ax − b‖₂² + λ‖x‖₂².                          (10)
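Problem (10) is smooth: setting its gradient AᵀAx − Aᵀb + 2λx to zero gives the closed-form solution x = (AᵀA + 2λI)⁻¹Aᵀb. A minimal sketch (data assumed for illustration):

import numpy as np

def tikhonov(A, b, lam):
    # Solve min 0.5*||Ax - b||^2 + lam*||x||^2 via its normal equations
    # (A^T A + 2*lam*I) x = A^T b
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 80)), rng.standard_normal(100)
x = tikhonov(A, b, lam=0.5)
# The residual of the optimality condition should be ~0
print(np.linalg.norm(A.T @ (A @ x - b) + 2 * 0.5 * x))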
General case:

minimize ½‖Ax − b‖₂² + λ g(x),                          (11)

where g(x) is a regularization term such as g(x) = ‖x‖ₚ for p ≥ 1 or
0 ≤ p < 1, or g(x) = ‖x‖ITV or ‖x‖ATV.

Isotropic and anisotropic total variation


Two standard choices of discrete TV-based regularizers, namely the
isotropic total variation and the anisotropic total variation, are
popular in signal and image processing. For X ∈ Rᵐˣⁿ they are
respectively defined by

‖X‖ITV = Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √((X_{i+1,j} − X_{i,j})² + (X_{i,j+1} − X_{i,j})²)
       + Σ_{i=1}^{m−1} |X_{i+1,n} − X_{i,n}| + Σ_{j=1}^{n−1} |X_{m,j+1} − X_{m,j}|,      (12)

and

‖X‖ATV = Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} ( |X_{i+1,j} − X_{i,j}| + |X_{i,j+1} − X_{i,j}| )
       + Σ_{i=1}^{m−1} |X_{i+1,n} − X_{i,n}| + Σ_{j=1}^{n−1} |X_{m,j+1} − X_{m,j}|.      (13)
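A minimal numpy sketch of both regularizers for an m×n image, transcribing (12) and (13) directly (added for illustration):

import numpy as np

def tv_iso(X):
    # Isotropic total variation, cf. (12)
    dv = X[1:, :] - X[:-1, :]        # vertical differences, shape (m-1, n)
    dh = X[:, 1:] - X[:, :-1]        # horizontal differences, shape (m, n-1)
    interior = np.sqrt(dv[:, :-1] ** 2 + dh[:-1, :] ** 2).sum()
    border = np.abs(dv[:, -1]).sum() + np.abs(dh[-1, :]).sum()
    return interior + border

def tv_aniso(X):
    # Anisotropic total variation, cf. (13): the sum of absolute values
    # of all vertical and horizontal differences
    dv = np.abs(X[1:, :] - X[:-1, :]).sum()
    dh = np.abs(X[:, 1:] - X[:, :-1]).sum()
    return dv + dh

X = np.arange(12.0).reshape(3, 4)
print(tv_iso(X), tv_aniso(X))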
Denoising of the noisy image

(a) Original image  (b) Noisy image

Denoising by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA


Denoising by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing IST, TwIST, FISTA and OSGA.

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


Inpainting images with missing data

(a) Original image  (b) Noisy image

Inpainting by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA
Inpainting by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing IST, TwIST, FISTA and OSGA.

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


Deblurring of the blurred/noisy image

(a) Original image  (b) Noisy image

Deblurring by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA


Deblurring by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing TwIST, SpaRSA, FISTA and OSGA (relative error of steps,
relative error of function values and SNR improvement versus
iterations/time).

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


A comparison among first-order methods for sparse signal recovery by
solving min_x ½‖Ax − b‖₂² + λ‖x‖₁: function values and MSE versus
iterations for NSDSG, NES83, NESCS, NES05, PGA, FISTA, NESCO, NESUN
and OSGA.

(a) step vs. iter  (b) Func vs. time

Recovered sparse signals (original signal: n = 10000, number of
nonzeros = 300): original signal, noisy signal, direct solution using
the pseudo-inverse, and reconstructions by NES83 (MSE = 0.000729386),
NESCS (MSE = 0.000947418), NES05 (MSE = 0.000870573),
FISTA (MSE = 0.000849244) and OSGA (MSE = 0.000796183).

Conclusions and references

Summarizing our discussion:

OSGA is an optimal algorithm for both smooth and nonsmooth convex
optimization problems;
OSGA is a feasible-point method and avoids using Lipschitz
information;
The low memory requirement of OSGA makes it appropriate for
solving high-dimensional problems;
OSGA is efficient and robust in applications and practice, and
superior to some state-of-the-art solvers.


References

[1] A. Neumaier, OSGA: fast subgradient algorithm with optimal
complexity, Manuscript, University of Vienna, 2014.
[2] M. Ahookhosh, A. Neumaier, Optimal subgradient methods with
application in large-scale linear inverse problems, Manuscript,
University of Vienna, 2014.
[3] M. Ahookhosh, A. Neumaier, Optimal subgradient-based methods
for convex constrained optimization I: theoretical results, Manuscript,
University of Vienna, 2014.
[4] M. Ahookhosh, A. Neumaier, Optimal subgradient-based methods
for convex constrained optimization II: numerical results, Manuscript,
University of Vienna, 2014.


Thank you for your consideration
