
An introduction to nonsmooth convex optimization: numerical algorithms

Masoud Ahookhosh

Faculty of Mathematics, University of Vienna


Vienna, Austria

Convex Optimization I

January 29, 2014


Table of contents

1 Introduction
Definitions
Applications of nonsmooth convex optimization
Basic properties of subdifferential

2 Numerical algorithms for nonsmooth optimization
Nonsmooth black-box optimization
Proximal gradient algorithm
Smoothing algorithms
Optimal complexity algorithms

3 Conclusions

4 References


Definition of problems
Definition 1 (Structural convex optimization).
Consider the following convex optimization problem:

minimize   f(x)
subject to x ∈ C                                        (1)

f(x) is a convex function;
C is a closed convex subset of a vector space V;

Properties:
f (x) can be smooth or nonsmooth;
Solving nonsmooth convex optimization problems is much harder
than solving differentiable ones;
For some nonsmooth nonconvex cases, even finding a descent
direction is not possible;
The problem may involve linear operators.

Applications
Applications of convex optimization:
Approximation and fitting;
Norm approximation;
Least-norm problems;
Regularized approximation;
Robust approximation;
Function fitting and interpolation;
Statistical estimation;
Parametric and nonparametric distribution estimation;
Optimal detector design and hypothesis testing;
Chebyshev and Chernoff bounds;
Experiment design;
Global optimization;
Find bounds on the optimal value;
Find approximate solutions;
Convex relaxation;

Geometric problems;
Projection on and distance between sets;
Centering and classification;
Placement and location;
Smallest enclosing ellipsoid;
Image and signal processing;
Optimizing the number of image models using convex relaxation;
Image fusion for medical imaging;
Image reconstruction;
Sparse signal processing;
Design and control of complex systems;
Machine learning;
Financial and mechanical engineering;
Computational biology;


Definition: subgradient and subdifferential

Definition 2 (Subgradient and subdifferential).


A vector g ∈ Rⁿ is a subgradient of f : Rⁿ → R at x ∈ dom f if

f(z) ≥ f(x) + gᵀ(z − x),                                (2)

for all z ∈ dom f.


The set of all subgradients of f at x is called the subdifferential of f
at x and denoted by ∂f (x).

Definition 3 (Subdifferentiable functions).


A function f is called subdifferentiable at x if there exists at least
one subgradient of f at x.
A function f is called subdifferentiable if it is subdifferentiable at all
x ∈ domf .

Subgradient and subdifferential

Examples:
If f is convex and differentiable, then the following first-order
condition holds:

f(z) ≥ f(x) + ∇f(x)ᵀ(z − x),                            (3)

for all z ∈ dom f. This implies ∂f(x) = {∇f(x)};


Absolute value. Consider f(x) = |x|. Then

∂f(x) = {1} if x > 0;   [−1, 1] if x = 0;   {−1} if x < 0.

Thus, g = sign(x) is a subgradient of f at x.


Basic properties

Basic properties of subdifferential are as follows:


The subdifferential ∂f (x) is a closed convex set, even for a
nonconvex function f .
If f is convex and x ∈ int domf , then ∂f (x) is nonempty and
bounded.
∂(αf (x)) = α∂f (x), for α ≥ 0.
∂(f1(x) + · · · + fn(x)) = ∂f1(x) + · · · + ∂fn(x).
If h(x) = f(Ax + b), then ∂h(x) = Aᵀ∂f(Ax + b).
If h(x) = max{f1(x), . . . , fn(x)}, then
∂h(x) = conv ∪ {∂fi(x) | fi(x) = h(x), i = 1, . . . , n}.
If h(x) = sup_{β∈B} fβ(x), then
∂h(x) = conv ∪ {∂fβ(x) | fβ(x) = h(x), β ∈ B}.


How to calculate subgradients


Example: consider f(x) = ‖x‖₁ = Σ_{i=1}^n |xi|. It is clear that

f(x) = max{ sᵀx | si ∈ {−1, 1} }.

Each sᵀx is differentiable with gradient ∇(sᵀx) = s. Thus, for an
active s, i.e. one with sᵀx = ‖x‖₁, we should have

si = 1 if xi > 0;   si ∈ {−1, 1} if xi = 0;   si = −1 if xi < 0.    (4)

This clearly implies

∂f(x) = conv ∪ { g | g of the form (4), gᵀx = ‖x‖₁ }
      = { g | ‖g‖∞ ≤ 1, gᵀx = ‖x‖₁ }.

Thus, g = sign(x) is a subgradient of f at x.
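As a numerical sanity check (a minimal sketch added here, not part of the original slides), the following Python snippet verifies the subgradient inequality (2) for f(x) = ‖x‖₁ with g = sign(x) at random points:

import numpy as np

def f(x):
    # f(x) = ||x||_1
    return np.sum(np.abs(x))

def subgrad(x):
    # g = sign(x) is one element of the subdifferential of ||.||_1
    return np.sign(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
g = subgrad(x)
# Check f(z) >= f(x) + g^T (z - x) for a batch of random z
for _ in range(1000):
    z = rng.standard_normal(50)
    assert f(z) >= f(x) + g @ (z - x) - 1e-9
print("subgradient inequality (2) holds at 1000 random points")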



Optimality condition:

First-order condition: A point x∗ is a minimizer of a convex function
f if and only if f is subdifferentiable at x∗ and

0 ∈ ∂f (x∗ ), (5)

i.e., g = 0 is a subgradient of f at x∗ .
The condition (5) reduces to ∇f (x∗ ) = 0 if f is differentiable at x∗ .
Analytical complexity: the number of oracle calls required to solve a
problem to accuracy ε, i.e., the number of calls needed so that

f(xk) − f(x∗) ≤ ε;                                      (6)

Arithmetical complexity: the total number of arithmetic operations
required to solve a problem to accuracy ε.

Numerical algorithms

The algorithms for solving nonsmooth convex optimization problems are
commonly divided into the following classes:

Nonsmooth black-box optimization;
Proximal mapping techniques;
Smoothing methods;

We will not consider derivative-free and heuristic algorithms for
solving nonsmooth convex optimization problems here.


Nonsmooth black-box optimization: subgradient algorithms


The subgradient scheme for unconstrained problems is

xk+1 = xk − αk gk,

where gk is a subgradient of f at xk and αk is a step size determined
by one of the following rules:
Constant step size: αk = α;
Constant step length: αk = γ/‖gk‖₂;
Square summable but not summable:
αk ≥ 0,  Σ_{k=1}^∞ αk² < ∞,  Σ_{k=1}^∞ αk = ∞;
Nonsummable diminishing step size:
αk ≥ 0,  lim_{k→∞} αk = 0,  Σ_{k=1}^∞ αk = ∞;
Nonsummable diminishing step length: αk = γk/‖gk‖₂ such that
γk ≥ 0,  lim_{k→∞} γk = 0,  Σ_{k=1}^∞ γk = ∞.
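A minimal Python sketch of this scheme (the objective, data, and step-size constant below are illustrative assumptions, not from the slides):

import numpy as np

def subgradient_method(f, subgrad, x0, alpha0=1.0, iters=500):
    # Basic subgradient method with the nonsummable diminishing step
    # size alpha_k = alpha0/sqrt(k+1). It is not a descent method,
    # so we keep track of the best point seen so far.
    x, x_best, f_best = x0.copy(), x0.copy(), f(x0)
    for k in range(iters):
        x = x - alpha0 / np.sqrt(k + 1.0) * subgrad(x)
        fx = f(x)
        if fx < f_best:
            x_best, f_best = x.copy(), fx
    return x_best, f_best

# Assumed example data for f(x) = 0.5*||Ax - b||^2 + lam*||x||_1, cf. (8)
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((200, 500)), rng.standard_normal(200), 0.1
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.sum(np.abs(x))
subgrad = lambda x: A.T @ (A @ x - b) + lam * np.sign(x)

x_best, f_best = subgradient_method(f, subgrad, np.zeros(500))
print(f_best)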


The subgradient algorithm: properties

Main properties:
The subgradient method is simple to implement and applies
directly to nondifferentiable f;
The step sizes are not chosen via line search, as in the ordinary
gradient method;
The step sizes are determined before running the algorithm and do
not depend on any data computed during the algorithm;
Unlike the ordinary gradient method, the subgradient method is not
a descent method;
The function value is nonmonotone, meaning that it can even increase;
The subgradient algorithm is very slow for solving practical problems.


Bound on the function value error:

If the Euclidean distance to the optimal set is bounded, ‖x0 − x∗‖₂ ≤ R,
and ‖gk‖₂ ≤ G, then we have

f_k^best − f∗ ≤ (R² + G² Σ_{i=1}^k αi²) / (2 Σ_{i=1}^k αi) =: RHS.    (7)

Constant step size: k → ∞ ⇒ RHS → G²α/2;
Constant step length: k → ∞ ⇒ RHS → Gγ/2;
Square summable but not summable: k → ∞ ⇒ RHS → 0;
Nonsummable diminishing step size: k → ∞ ⇒ RHS → 0;
Nonsummable diminishing step length: k → ∞ ⇒ RHS → 0.

Example: we now consider the LASSO problem

minimize_{x∈Rⁿ} ½‖Ax − b‖₂² + λ‖x‖₁,                    (8)

where A and b are randomly generated.
Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 1: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 2: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₂²

Figure 3: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 4: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 5: The nonmonotone behaviour of the original subgradient
algorithms, stopped after 20 seconds of running time (sparse, m = 2000
and n = 5000).

Projected subgradient algorithm


Consider the following constrained problem

minimize   f(x)
subject to x ∈ C,                                       (9)

where C is a simple convex set. Then the projected subgradient scheme
is given by

xk+1 = P(xk − αk gk),                                   (10)

where

P(y) = argmin_{x∈C} ½‖x − y‖₂².                         (11)

Examples of simple sets C whose projection is cheaply computable
(see the sketch below):
Nonnegative orthant;
Affine set;
Box or unit ball;
Unit simplex;
An ellipsoid;
Second-order cone;
Positive semidefinite cone;
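For a few of these sets the projection is available in closed form; a minimal Python sketch (the set parameters lo, hi and radius are illustrative assumptions):

import numpy as np

def proj_nonnegative(y):
    # Projection onto the nonnegative orthant {x : x >= 0}
    return np.maximum(y, 0.0)

def proj_box(y, lo, hi):
    # Projection onto the box {x : lo <= x <= hi} (componentwise clipping)
    return np.clip(y, lo, hi)

def proj_ball(y, radius=1.0):
    # Projection onto the Euclidean ball {x : ||x||_2 <= radius}
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else (radius / nrm) * y

def proj_affine(y, A, b):
    # Projection onto {x : Ax = b}, assuming A has full row rank:
    # P(y) = y - A^T (A A^T)^{-1} (A y - b)
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)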

Projected subgradient algorithm


Example: let us consider

minimize   ‖x‖₁
subject to Ax = b,                                      (12)

where x ∈ Rⁿ, b ∈ Rᵐ and A ∈ Rᵐˣⁿ. Considering the set
C = {x | Ax = b}, we have

P(y) = y − Aᵀ(AAᵀ)⁻¹(Ay − b).                           (13)

The projected subgradient algorithm can be summarized as

xk+1 = xk − αk (I − Aᵀ(AAᵀ)⁻¹A) gk.                      (14)

By setting gk = sign(xk), we obtain

xk+1 = xk − αk (I − Aᵀ(AAᵀ)⁻¹A) sign(xk).                (15)
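A minimal Python sketch of iteration (15) (the random problem data and the diminishing step-size constant are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 200
A = rng.standard_normal((m, n))
b = A @ np.where(rng.random(n) < 0.05, rng.standard_normal(n), 0.0)

# Start from a feasible point: the least-norm solution of Ax = b
x = A.T @ np.linalg.solve(A @ A.T, b)
AAt_inv = np.linalg.inv(A @ A.T)

for k in range(2000):
    g = np.sign(x)                                  # subgradient of ||x||_1
    step = 0.01 / np.sqrt(k + 1)                    # diminishing step size
    x = x - step * (g - A.T @ (AAt_inv @ (A @ g)))  # iteration (15)

print(np.linalg.norm(A @ x - b), np.sum(np.abs(x)))  # feasibility, objective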



Proximal gradient algorithm


Consider a composite function of the form

h(x) = f(x) + g(x).                                     (16)

Characteristics of the considered convex optimization:
Appearing in many applications in science and technology: signal
and image processing, machine learning, statistics, inverse problems,
geophysics and so on.
In convex optimization, every local optimum is a global optimizer.
Most of the problems are a combination of smooth and nonsmooth
functions:
h(x) = f(Ax) + g(Bx),
where f(Ax) and g(Bx) are respectively smooth and nonsmooth
functions.
Function and subgradient evaluations can be costly: affine
transformations are the most costly part of the computation.
They involve high-dimensional data.

Proximal gradient algorithm


The algorithm involves two steps, namely a forward and a backward step:

Algorithm 1: PGA (proximal gradient algorithm)
Input: α0 ∈ (0, 1]; y0; ε > 0;
begin
    while stopping criteria do not hold do
        yk+1 = xk − αk ∇f(xk);
        xk+1 = argmin_{x∈Rⁿ} 1/(2αk) ‖x − yk+1‖₂² + g(x);
    end
end

The first step is called the forward step because it moves toward the
minimizer of the smooth part, and the second step is called the
backward step because it is reminiscent of the feasibility (projection)
step of the projected gradient method.
It is clear that the projected gradient method is a special case of
PGA (take g as the indicator function of C).
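For g(x) = λ‖x‖₁ the backward step is the soft-thresholding operator, which gives the ISTA variant of PGA; a minimal sketch with fixed step size 1/L, L = ‖A‖₂² (the problem data are assumed for illustration):

import numpy as np

def soft_threshold(y, t):
    # prox of t*||.||_1: componentwise shrinkage toward zero
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def proximal_gradient(A, b, lam, iters=300):
    # Proximal gradient (ISTA) for 0.5*||Ax - b||^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of grad f
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        y = x - alpha * (A.T @ (A @ x - b))   # forward (gradient) step
        x = soft_threshold(y, alpha * lam)    # backward (proximal) step
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
x = proximal_gradient(A, b, lam=0.1)
print(0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.sum(np.abs(x)))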

Smoothing algorithms

The smoothing algorithms involve the following steps:


Reformulate the problem in the appropriate form for smoothing
processes;
Make the problem smooth;
Solve the problem with smooth convex solvers.

Nesterov’s smoothing algorithm:


Reformulate the problem in the form of the minimax problem
(saddle point representation);
Add a strongly convex prox function to the reformulated problem to
make it smooth;
Solve the problem with optimal first-order algorithms.
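As an illustration of this recipe (an assumed example, not taken from the slides): smoothing |t| = max{ut : |u| ≤ 1} by subtracting the strongly convex prox term (μ/2)u² inside the max yields the Huber function, whose gradient is Lipschitz with constant 1/μ:

import numpy as np

def smoothed_abs(t, mu):
    # Nesterov-type smoothing of |t| = max{u*t : |u| <= 1}:
    # f_mu(t) = max_{|u|<=1} { u*t - (mu/2)*u^2 }
    #         = t^2/(2*mu) if |t| <= mu, |t| - mu/2 otherwise (Huber)
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= mu,
                    t ** 2 / (2 * mu),
                    np.abs(t) - mu / 2)

# The smooth approximation satisfies f_mu(t) <= |t| <= f_mu(t) + mu/2.
print(smoothed_abs([-2.0, -0.05, 0.0, 0.05, 2.0], mu=0.1))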


Optimal complexity for first-order methods


Nemirovski and Yudin (1983) proved the following complexity bounds for
smooth and nonsmooth problems:

Theorem 4 (Complexity analysis).
Suppose that f is a convex function. Then complexity bounds for
smooth and nonsmooth problems are:

(Nonsmooth complexity bound) If the points generated by the
algorithm stay in a bounded region of the interior of C, or f is
Lipschitz continuous in C, then the total number of iterations
needed is O(1/ε²). Thus the asymptotic worst-case complexity is
O(1/ε²).

(Smooth complexity bound) If f has a Lipschitz continuous gradient,
the total number of iterations needed for the algorithm is O(1/√ε).


Optimal first-order algorithms

Some popular optimal first-order algorithms:


Nonsummable diminishing subgradient algorithm;
Nesterov’s 1983 smooth algorithm;
Nesterov and Nemiroski’s 1988 smooth algorithm;
Nesterov’s constant step algorithm;
Nesterov’s 2005 smooth algorithm;
Nesterov’s composite algorithm;
Nesterov’s universal gradient algorithm;
Fast iterative shrinkage-thresholding algorithm
Tseng’s 2008 single projection algorithm;
Lan’s 2013 bundle-level algorithm;
Neumaier’s 2014 fast subgradient algorithm;


Algorithm 2: NES83 (Nesterov's 1983 algorithm)

Input: select z such that z ≠ y0 and gy0 ≠ gz; y0; ε > 0;
begin
    a0 ← 0; x−1 ← y0;
    α−1 ← ‖y0 − z‖ / ‖gy0 − gz‖;
    while stopping criteria do not hold do
        α̂k ← αk−1; x̂k ← yk − α̂k gyk;
        while f(x̂k) < f(yk) − ½ α̂k ‖gyk‖² do
            α̂k ← ρ α̂k; x̂k ← yk − α̂k gyk;
        end
        xk+1 ← x̂k; αk ← α̂k;
        ak+1 ← (1 + √(4ak² + 1)) / 2;
        yk+1 ← xk + (ak − 1)(xk − xk−1)/ak+1;
    end
end


Algorithm 3: FISTA (fast iterative shrinkage-thresholding algorithm)

Input: y0; a0; L; ε > 0;
begin
    while stopping criteria do not hold do
        αk ← 1/L;
        zk ← yk − αk gyk;
        xk ← argmin_x (L/2)‖x − zk‖₂² + g(x);
        ak+1 ← (1 + √(4ak² + 1)) / 2;
        yk+1 ← xk + (ak − 1)(xk − xk−1)/ak+1;
    end
end

By this adaptation, FISTA attains the optimal complexity of smooth
first-order algorithms.
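A minimal FISTA sketch for ½‖Ax − b‖₂² + λ‖x‖₁, where the backward step is again soft-thresholding (the problem data are assumed for illustration):

import numpy as np

def soft_threshold(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fista(A, b, lam, iters=300):
    # FISTA for 0.5*||Ax - b||^2 + lam*||x||_1 with fixed step 1/L
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    x_prev = y = np.zeros(A.shape[1])
    a = 1.0
    for _ in range(iters):
        z = y - (A.T @ (A @ y - b)) / L        # gradient step at y_k
        x = soft_threshold(z, lam / L)         # proximal (backward) step
        a_next = (1.0 + np.sqrt(4.0 * a * a + 1.0)) / 2.0
        y = x + (a - 1.0) / a_next * (x - x_prev)   # momentum extrapolation
        x_prev, a = x, a_next
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
x = fista(A, b, lam=0.1)
print(0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.sum(np.abs(x)))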

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₁

Figure 6: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 7: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Numerical experiment: f(x) = ‖Ax − b‖₂² + λ‖x‖₂²

Figure 8: A comparison among the subgradient algorithms, stopped after
60 seconds of running time (dense, m = 2000 and n = 5000).

Figure 9: A comparison among the subgradient algorithms, stopped after
20 seconds of running time (sparse, m = 2000 and n = 5000).

Conclusions
Summarizing our discussion:

Nonsmooth problems appear in applications even more often than
smooth optimization problems;
Solving nonsmooth optimization problems is much harder than
solving common smooth optimization problems;
The most efficient algorithms for solving them are first-order
methods;
There is no natural stopping criterion in the corresponding
algorithms;
The algorithms are divided into three classes:
Nonsmooth black-box algorithms;
Proximal mapping algorithms;
Smoothing algorithms;
The analytical complexity of the algorithms is the most important
part of the theoretical results;
Optimal complexity algorithms are efficient enough to solve
practical problems.

References

[1] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding
algorithm for linear inverse problems, SIAM Journal on Imaging
Sciences, 2 (2009), 183–202.
[2] Boyd, S., Xiao, L., Mutapcic, A.: Subgradient methods, lecture
notes (2003).
[3] Nemirovski, A.S., Yudin, D.: Problem Complexity and Method
Efficiency in Optimization. Wiley-Interscience Series in Discrete
Mathematics. Wiley, XV (1983).
[4] Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A
Basic Course. Kluwer, Massachusetts (2004).
[5] Nesterov, Y.: A method of solving a convex programming problem
with convergence rate O(1/k²), Doklady AN SSSR (in Russian), 269
(1983), 543–547. English translation: Soviet Math. Dokl. 27 (1983),
372–376.

Thank you for your consideration


Optimal subgradient methods for large-scale convex optimization

Masoud Ahookhosh

Faculty of Mathematics, University of Vienna


Vienna, Austria

Convex Optimization I

January 30, 2014


Table of contents

1 Introduction
Definition of the problem
State-of-the-art solvers

2 Novel optimal algorithms
Optimal SubGradient Algorithm (OSGA)
Algorithmic structure: OSGA

3 Numerical experiments
Numerical experiments: linear inverse problem
Comparison with state-of-the-art software

4 Conclusions


Definition of problems
Definition 1 (Structural convex optimization).
Consider the following convex optimization problem:

minimize   f(x)
subject to x ∈ C                                        (1)

f(x) is a convex function;
C is a closed convex subset of a vector space V;

Properties:
f(x) can be smooth or nonsmooth;
Solving nonsmooth convex optimization problems is much harder
than solving differentiable ones;
For some nonsmooth nonconvex cases, even finding a descent
direction is not possible;
The problem may involve linear operators.

Which kind of algorithms can deal with these problems?


Appropriate algorithms for this class of problems: First-order methods
Gradient and Subgradient projection algorithms;
Conjugate gradient algorithms;
Optimal gradient and subgradient algorithms;
Proximal mapping and Soft-thresholding algorithms;
Optimal complexity for COP (Nemirovski and Yudin 1983):
Smooth problems → O(1/√ε);
Nonsmooth problems → O(1/ε²).
Some examples:
N83: Nesterov's single-projection (1983);
N07: Nesterov's dual-projection (2007);
FISTA: Beck and Teboulle's optimal proximal algorithm (2009);
N07: Nesterov's universal gradient (2013);
OSGA & ASGA: Ahookhosh and Neumaier's affine subgradient
algorithms (2013).

Optimal SubGradient Algorithm (OSGA): Motivation


The primary aim:

0 ≤ f(xb) − f(x∗) ≤ Bound → 0                           (2)

To do so, we consider:
First-order oracle: a black-box unit that computes f(x) and a
(sub)gradient g(x) for the numerical method at each point x:

O(x) = (f(x), g(x)).                                    (3)

Linear relaxation: f(z) ≥ γ + ⟨h, z⟩;
Prox function: Q is continuously differentiable,
Q0 = inf_{z∈C} Q(z) > 0, and

Q(z) ≥ Q(x) + ⟨gQ(x), z − x⟩ + ½‖z − x‖²,  ∀x, z ∈ C.    (4)
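A minimal sketch of such an oracle for the regularized least-squares objective f(x) = ½‖Ax − b‖₂² + λ‖x‖₁ (the objective and data are illustrative assumptions, not from the slides):

import numpy as np

def make_oracle(A, b, lam):
    # First-order oracle O(x) = (f(x), g(x)) for
    # f(x) = 0.5*||Ax - b||^2 + lam*||x||_1, where g(x) is a subgradient
    def oracle(x):
        r = A @ x - b
        fx = 0.5 * r @ r + lam * np.sum(np.abs(x))
        gx = A.T @ r + lam * np.sign(x)   # one valid subgradient
        return fx, gx
    return oracle

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 300)), rng.standard_normal(100)
oracle = make_oracle(A, b, lam=0.1)
fx, gx = oracle(np.zeros(300))
print(fx, np.linalg.norm(gx))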

Auxiliary subproblem:

E(γ, h) = inf_{z∈C} (γ + ⟨h, z⟩) / Q(z),                 (5)

where z = U(γ, h) ∈ C, and E(γ, h) and U(γ, h) are computable.

Error bound: from the definition of E(γ, h), the linear relaxation
and some manipulations, it can be concluded that

0 ≤ f(xb) − f(x∗) ≤ ηQ(x∗).                              (6)

How to use this in an algorithm:
If Q(x∗) is computable, then the error bound ηQ(x∗) is applicable.
Otherwise, we search for a decreasing sequence {ηk} satisfying

0 ≤ f(xb) − f(x∗) ≤ ηk Q(x∗),                            (7)

stopping once ηk ≤ ε for some constant ε > 0 (cf. the while loop of
Algorithm 2 below).


Algorithmic structure
Algorithm 2: Optimal SubGradient Algorithm (OSGA)
Input: λ, αmax ∈ (0, 1), 0 < κ0 ≤ κ, µ ≥ 0, ε > 0 and ftarget;
begin
    choose xb; stop if f(xb) ≤ ftarget;
    h = g(xb); γ = f(xb) − ⟨h, xb⟩;
    γb = γ − f(xb); u = U(γb, h); η = E(γb, h) − µ; α = αmax;
    while η > ε do
        x = xb + α(u − xb); g = g(x); h̄ = h + α(g − h);
        γ̄ = γ + α(f(x) − ⟨g, x⟩ − γ); x′b = argmin{f(xb), f(x)};
        γ′b = γ̄ − f(x′b); u′ = U(γ′b, h̄); x′ = xb + α(u′ − xb);
        choose x̄b = argmin{f(x′b), f(x′)};
        γ̄b = γ̄ − f(x̄b); ū = U(γ̄b, h̄); η = E(γ̄b, h̄) − µ;
        xb = x̄b; stop if f(xb) ≤ ftarget;
        update α, h, γ, η, u;
    end
end

Theoretical Analysis

Theorem 2 (Complexity analysis).


Suppose that f is a convex function. Then complexity bounds for
smooth and nonsmooth problems are:

(Nonsmooth complexity bound) If the points generated by Algorithm 2
stay in a bounded region of the interior of C, or f is Lipschitz
continuous in C, then the total number of iterations needed is
O(1/ε²). Thus the asymptotic worst-case complexity is O(1/ε²).

(Smooth complexity bound) If f has a Lipschitz continuous gradient,
the total number of iterations needed for the algorithm is O(1/√ε).

⇒ OSGA IS AN OPTIMAL METHOD


Prox function and subproblem solving

Quadratic norm:
‖z‖ := √⟨Bz, z⟩

Dual norm:
‖h‖∗ := ‖B⁻¹h‖ = √⟨h, B⁻¹h⟩

Prox function:
Q(z) := Q0 + ½‖z − z0‖²

Subproblem solution:

U(γ, h) = z0 − E(γ, h)⁻¹ B⁻¹h

E(γ, h) = (−β + √(β² + 2Q0‖h‖∗²)) / (2Q0)
        = ‖h‖∗² / (β + √(β² + 2Q0‖h‖∗²)).


Numerical experiments: linear inverse problem

Definition 3 (Linear inverse problem).


We consider linear (inverse) problems of the form

Ax = b + δ,                                             (8)

where A ∈ Rᵐˣⁿ is a matrix or a linear operator, x ∈ Rⁿ and b, δ ∈ Rᵐ.

Examples:
Signal and image processing
Machine learning and statistics
Compressed sensing
Geophysics
···

Approximate solution
Definition 4 (Least squares problem).

minimize ½‖Ax − b‖₂²                                    (9)

The problem involves high-dimensional data;
The problem is usually ill-conditioned and singular.

Alternative problems: Tikhonov regularization:

minimize ½‖Ax − b‖₂² + λ‖x‖₂².                          (10)
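Problem (10) is smooth: setting its gradient AᵀAx − Aᵀb + 2λx to zero gives the closed-form solution x = (AᵀA + 2λI)⁻¹Aᵀb. A minimal sketch (data assumed for illustration):

import numpy as np

def tikhonov(A, b, lam):
    # Solve min 0.5*||Ax - b||^2 + lam*||x||^2 via its normal equations
    # (A^T A + 2*lam*I) x = A^T b
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 80)), rng.standard_normal(100)
x = tikhonov(A, b, lam=0.5)
# The residual of the optimality condition should be ~0
print(np.linalg.norm(A.T @ (A @ x - b) + 2 * 0.5 * x))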
General case:

minimize ½‖Ax − b‖₂² + λ g(x),                          (11)

where g(x) is a regularization term such as g(x) = ‖x‖ₚ for p ≥ 1 or
0 ≤ p < 1, or g(x) = ‖x‖ITV or ‖x‖ATV.

Isotropic and anisotropic total variation


Two standard choices of discrete TV-based regularizers, namely the
isotropic total variation and the anisotropic total variation, are
popular in signal and image processing. For X ∈ Rᵐˣⁿ they are
respectively defined by

‖X‖ITV = Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √((X_{i+1,j} − X_{i,j})² + (X_{i,j+1} − X_{i,j})²)
       + Σ_{i=1}^{m−1} |X_{i+1,n} − X_{i,n}| + Σ_{j=1}^{n−1} |X_{m,j+1} − X_{m,j}|,      (12)

and

‖X‖ATV = Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} ( |X_{i+1,j} − X_{i,j}| + |X_{i,j+1} − X_{i,j}| )
       + Σ_{i=1}^{m−1} |X_{i+1,n} − X_{i,n}| + Σ_{j=1}^{n−1} |X_{m,j+1} − X_{m,j}|.      (13)
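A minimal numpy sketch of both regularizers for an m×n image, transcribing (12) and (13) directly (added for illustration):

import numpy as np

def tv_iso(X):
    # Isotropic total variation, cf. (12)
    dv = X[1:, :] - X[:-1, :]        # vertical differences, shape (m-1, n)
    dh = X[:, 1:] - X[:, :-1]        # horizontal differences, shape (m, n-1)
    interior = np.sqrt(dv[:, :-1] ** 2 + dh[:-1, :] ** 2).sum()
    border = np.abs(dv[:, -1]).sum() + np.abs(dh[-1, :]).sum()
    return interior + border

def tv_aniso(X):
    # Anisotropic total variation, cf. (13): the sum of absolute values
    # of all vertical and horizontal differences
    dv = np.abs(X[1:, :] - X[:-1, :]).sum()
    dh = np.abs(X[:, 1:] - X[:, :-1]).sum()
    return dv + dh

X = np.arange(12.0).reshape(3, 4)
print(tv_iso(X), tv_aniso(X))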
Denoising of the noisy image

(a) Original image  (b) Noisy image

Denoising by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA


Denoising by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing IST, TwIST, FISTA and OSGA.

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


Inpainting images with missing data

(a) Original image  (b) Noisy image

Inpainting by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA
Inpainting by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing IST, TwIST, FISTA and OSGA.

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


Deblurring of the blurred/noisy image

(a) Original image  (b) Noisy image

Deblurring by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV

(c) OSGA  (d) IST  (e) TwIST  (f) FISTA


Deblurring by solving min_x ½‖Ax − b‖₂² + λ‖x‖ITV: convergence plots
comparing TwIST, SpaRSA, FISTA and OSGA (relative error of steps,
relative error of function values and SNR improvement versus
iterations/time).

(a) step vs. iter  (b) Func vs. time  (c) Func vs. iter  (d) ISNR vs. iter


A comparison among first-order methods for sparse signal recovery by
solving min_x ½‖Ax − b‖₂² + λ‖x‖₁: function values and MSE versus
iterations for NSDSG, NES83, NESCS, NES05, PGA, FISTA, NESCO, NESUN
and OSGA.

(a) step vs. iter  (b) Func vs. time

Recovered sparse signals (original signal: n = 10000, number of
nonzeros = 300): original signal, noisy signal, direct solution using
the pseudo-inverse, and reconstructions by NES83 (MSE = 0.000729386),
NESCS (MSE = 0.000947418), NES05 (MSE = 0.000870573),
FISTA (MSE = 0.000849244) and OSGA (MSE = 0.000796183).

Conclusions and references

Summarizing our discussion:

OSGA is an optimal algorithm for both smooth and nonsmooth convex
optimization problems;
OSGA is a feasible-point method and avoids using Lipschitz
information;
The low memory requirement of OSGA makes it appropriate for
solving high-dimensional problems;
OSGA is efficient and robust in applications and practice, and
superior to some state-of-the-art solvers.


References

[1] A. Neumaier, OSGA: fast subgradient algorithm with optimal
complexity, Manuscript, University of Vienna, 2014.
[2] M. Ahookhosh, A. Neumaier, Optimal subgradient methods with
application in large-scale linear inverse problems, Manuscript,
University of Vienna, 2014.
[3] M. Ahookhosh, A. Neumaier, Optimal subgradient-based methods
for convex constrained optimization I: theoretical results, Manuscript,
University of Vienna, 2014.
[4] M. Ahookhosh, A. Neumaier, Optimal subgradient-based methods
for convex constrained optimization II: numerical results, Manuscript,
University of Vienna, 2014.


Thank you for your consideration
