An Introduction To Nonsmooth Convex Optimization: Numerical Algorithms
Masoud Ahookhosh
Convex Optimization I
Table of contents
1 Introduction
Definitions
Applications of nonsmooth convex optimization
Basic properties of subdifferential
2 Numerical algorithms for nonsmooth optimization
3 Conclusions
Definition of problems
Definition 1 (Structural convex optimization).
Consider the following convex optimization problem:
    minimize   f(x)
    subject to x ∈ C        (1)
Properties:
f(x) can be smooth or nonsmooth;
Solving nonsmooth convex optimization problems is much harder than solving differentiable ones;
For some nonsmooth nonconvex cases, even finding a descent direction is not possible;
The problem often involves linear operators (see the example below).
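A typical nonsmooth instance of (1) is the ℓ1-regularized least-squares (lasso) problem, a standard illustrative example that combines a smooth data-fitting term with a nonsmooth regularizer through a linear operator A:

    minimize_x  (1/2)‖Ax − b‖₂² + λ‖x‖₁,   λ > 0.

The first term is differentiable, while ‖x‖₁ is convex but nondifferentiable wherever a component of x vanishes.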
Applications
Applications of convex optimization:
Approximation and fitting;
Norm approximation;
Least-norm problems;
Regularized approximation;
Robust approximation;
Function fitting and interpolation;
Statistical estimation;
Parametric and nonparametric distribution estimation;
Optimal detector design and hypothesis testing;
Chebyshev and Chernoff bounds;
Experiment design;
Global optimization;
Find bounds on the optimal value;
Find approximate solutions;
Convex relaxation;
Geometric problems;
Projection on and distance between sets;
Centering and classification;
Placement and location;
Smallest enclosing ellipsoid;
Image and signal processing;
Optimizing the number of image models using convex relaxation;
Image fusion for medical imaging;
Image reconstruction;
Sparse signal processing;
Design and control of complex systems;
Machine learning;
Financial and mechanical engineering;
Computational biology;
Examples:
If f is convex and differentiable, then the following first-order condition holds:
    f(y) ≥ f(x) + ⟨∇f(x), y − x⟩  for all y,
so the gradient ∇f(x) is a subgradient of f at x, i.e., ∇f(x) ∈ ∂f(x).
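As a quick numerical illustration (a minimal sketch; the function and all names are illustrative), one can check the subgradient inequality f(y) ≥ f(x) + ⟨g, y − x⟩ for f(x) = ‖x‖₁ with the subgradient g = sign(x):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.abs(x).sum()   # f(x) = ||x||_1: convex, nonsmooth at 0
    x = rng.standard_normal(5)
    g = np.sign(x)                  # a subgradient of ||.||_1 at x

    # The subgradient inequality f(y) >= f(x) + <g, y - x> holds for every y.
    for _ in range(1000):
        y = rng.standard_normal(5)
        assert f(y) >= f(x) + g @ (y - x) - 1e-12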
Basic properties
Optimality condition:
    0 ∈ ∂f(x*),        (5)
i.e., g = 0 is a subgradient of f at x*.
For example, f(x) = |x| attains its minimum at x* = 0 because 0 ∈ ∂f(0) = [−1, 1].
The condition (5) reduces to ∇f(x*) = 0 if f is differentiable at x*.
Analytical complexity: the number of oracle calls required to solve a problem to accuracy ε, i.e., the number of calls after which the method can guarantee
    f(x_k) − f(x*) ≤ ε.
For Lipschitz-continuous convex f, the subgradient method needs O(1/ε²) oracle calls, which is optimal for this problem class.
Numerical algorithms
Subgradient method: starting from x_0, iterate
    x_{k+1} = x_k − α_k g_k,
where g_k ∈ ∂f(x_k) and α_k > 0 is a predetermined step size (a minimal implementation sketch follows the list of properties below).
Main properties:
The subgradient method is simple to implement and applies directly to nondifferentiable f;
The step sizes are not chosen via line search, as in the ordinary gradient method;
The step sizes are determined before running the algorithm and do not depend on any data computed during the algorithm;
Unlike the ordinary gradient method, the subgradient method is not a descent method;
The function value is nonmonotonic, meaning that it can even increase;
The subgradient method is very slow on practical problems.
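A minimal sketch of the method with the classical nonsummable diminishing steps α_k = 1/√(k+1) (the problem instance and all names are illustrative); since the method is not a descent method, the best iterate found so far is tracked:

    import numpy as np

    def subgradient_method(f, subgrad, x0, steps=5000):
        """Minimize a convex f given an oracle returning one subgradient."""
        x, x_best, f_best = x0, x0, f(x0)
        for k in range(steps):
            x = x - subgrad(x) / np.sqrt(k + 1)   # predetermined step sizes
            if f(x) < f_best:                     # keep the best point seen
                x_best, f_best = x, f(x)
        return x_best, f_best

    # Example: f(x) = ||x - c||_1, whose minimizer is x* = c.
    c = np.array([1.0, -2.0, 3.0])
    x_best, _ = subgradient_method(lambda x: np.abs(x - c).sum(),
                                   lambda x: np.sign(x - c),
                                   np.zeros(3))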
Example (basis pursuit): recover a sparse solution of a linear system by solving
    minimize   ‖x‖₁
    subject to Ax = b.        (12)
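Problem (12) can be attacked directly with a projected subgradient iteration, x_{k+1} = P(x_k − α_k sign(x_k)), where P is the Euclidean projection onto the affine set {x : Ax = b}. A minimal sketch, assuming A has full row rank (all names are illustrative):

    import numpy as np

    def basis_pursuit_subgradient(A, b, steps=5000):
        """Projected subgradient sketch for: minimize ||x||_1 s.t. Ax = b."""
        AAt_inv = np.linalg.inv(A @ A.T)          # A assumed full row rank
        project = lambda y: y - A.T @ (AAt_inv @ (A @ y - b))
        x = A.T @ (AAt_inv @ b)                   # minimum-norm feasible point
        for k in range(steps):
            # sign(x) is a subgradient of ||.||_1; project to stay feasible.
            x = project(x - np.sign(x) / np.sqrt(k + 1))
        return x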
Smoothing algorithms
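The idea is to replace the nonsmooth objective by a smooth approximation with a computable gradient and then apply a (fast) gradient scheme. A standard device is the Huber smoothing of the absolute value; the sketch below (the example objective and all names are illustrative) minimizes a smoothed ℓ1-regularized least-squares objective by gradient descent:

    import numpy as np

    def huber_grad(t, mu):
        # Gradient of the Huber approximation of |t|:
        # t^2 / (2 mu) for |t| <= mu, and |t| - mu/2 outside.
        return np.clip(t / mu, -1.0, 1.0)

    def smoothed_l1_descent(A, b, lam, mu=1e-2, steps=500):
        """Gradient descent on f_mu(x) = 0.5 ||Ax - b||^2 + lam * sum_i huber(x_i)."""
        L = np.linalg.norm(A, 2) ** 2 + lam / mu   # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(steps):
            x -= (A.T @ (A @ x - b) + lam * huber_grad(x, mu)) / L
        return x

Smaller µ gives a tighter approximation but a larger Lipschitz constant L, hence smaller steps; balancing this accuracy/speed trade-off is how Nesterov-type smoothing methods reach O(1/ε) complexity.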
Conclusions
Summarizing our discussion: nonsmooth convex problems arise widely in applications; optimality is characterized by 0 ∈ ∂f(x*); the subgradient method is simple but slow and nonmonotone; smoothing techniques replace the nonsmooth objective with a smooth approximation amenable to fast gradient schemes.
Novel Optimal Algorithms and Numerical Experiments
Masoud Ahookhosh
Convex Optimization I
Table of contents
1 Introduction
Definition of the problem
State-of-the-art solvers
2 Novel optimal algorithms
3 Numerical experiments
Numerical experiments: linear inverse problem
Comparison with state-of-the-art software
Definition of problems
We again consider the structural convex optimization problem (1), minimize f(x) subject to x ∈ C, and aim to solve it with an optimal first-order scheme.
To do so, we consider:
First-order oracle: a black-box unit that, at each point x requested by the numerical method, returns f(x) and a subgradient g(x) ∈ ∂f(x).
Auxiliary subproblem:
    E(γ, h) = sup_{z ∈ C} −(γ + ⟨h, z⟩)/Q(z),        (5)
with maximizer U(γ, h).
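For instance (a minimal sketch; the objective and all names are illustrative), a first-order oracle for an ℓ1-regularized least-squares objective can be written as:

    import numpy as np

    def oracle(A, b, lam, x):
        """Return f(x) and one subgradient for f(x) = 0.5 ||Ax - b||^2 + lam ||x||_1."""
        r = A @ x - b
        fx = 0.5 * r @ r + lam * np.abs(x).sum()
        gx = A.T @ r + lam * np.sign(x)     # an element of the subdifferential
        return fx, gx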
Algorithmic structure
Algorithm 2: Optimal SubGradient Algorithm (OSGA)
Input: λ, α_max ∈ (0, 1); 0 < κ′ ≤ κ; µ ≥ 0; ε > 0; and f_target.
Begin
    Choose x_b; stop if f(x_b) ≤ f_target;
    h = g(x_b); γ = f(x_b) − ⟨h, x_b⟩;
    γ_b = γ − f(x_b); u = U(γ_b, h); η = E(γ_b, h) − µ; α = α_max;
    While η > ε
        x = x_b + α(u − x_b); g = g(x); h̄ = h + α(g − h);
        γ̄ = γ + α(f(x) − ⟨g, x⟩ − γ); x′_b = argmin_{z ∈ {x_b, x}} f(z);
        γ′_b = γ̄ − f(x′_b); u′ = U(γ′_b, h̄); x′ = x_b + α(u′ − x_b);
        Choose x̄_b = argmin_{z ∈ {x′_b, x′}} f(z);
        γ̄_b = γ̄ − f(x̄_b); u′ = U(γ̄_b, h̄); η = E(γ̄_b, h̄) − µ;
        x_b = x̄_b; stop if f(x_b) ≤ f_target;
        Update α, h, γ, η, u;
    End
End
Theoretical Analysis
Quadratic norm:
    ‖z‖ := √⟨Bz, z⟩
Dual norm:
    ‖h‖* := ‖B⁻¹h‖ = √⟨h, B⁻¹h⟩
Prox function:
    Q(z) := Q₀ + ½‖z − z₀‖²
Subproblem solution:
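For this prox function the subproblem (5) admits a closed-form solution. A minimal sketch, assuming C = ℝⁿ and B = I (all names are illustrative):

    import numpy as np

    def osga_subproblem(gamma, h, z0, Q0):
        """Closed-form E(gamma, h) and U(gamma, h) for Q(z) = Q0 + 0.5 ||z - z0||^2
        on C = R^n with B = I. Setting the gradient of -(gamma + <h, z>)/Q(z)
        to zero reduces E to the root of a scalar quadratic, and U = z0 - h / E."""
        beta = -gamma - h @ z0
        E = (beta + np.sqrt(beta ** 2 + 2.0 * Q0 * (h @ h))) / (2.0 * Q0)
        U = z0 - h / E                 # requires E > 0, i.e. h != 0 or beta > 0
        return E, U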
Linear inverse problem:
    Ax = b + δ,        (8)
where δ models the measurement noise.
Examples (a toy instance is sketched after this list):
Signal and image processing
Machine learning and statistics
Compressed sensing
Geophysics
···
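As a toy instance of (8) (a minimal sketch; dimensions and noise level are illustrative), one can generate a sparse ground truth and noisy Gaussian measurements:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, s = 100, 400, 10                    # measurements, dimension, sparsity
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    delta = 1e-3 * rng.standard_normal(m)     # measurement noise
    b = A @ x_true + delta                    # noisy data, as in (8)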
Approximate solution
Definition 4 (Least-squares problem).
    minimize (1/2)‖Ax − b‖₂²        (9)
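Continuing the toy instance generated above, a minimal baseline for (9) is the minimum-norm least-squares solution via the pseudoinverse (used as the "direct solution" in the experiments below):

    import numpy as np

    # Minimum-norm least-squares solution of (9); for underdetermined A this
    # coincides with the pseudoinverse solution x = pinv(A) @ b.
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)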
[Figure: Denoising by solving min_x (1/2)‖Ax − b‖₂² + λ‖x‖_ITV; convergence comparison of IST, TwIST, FISTA, and OSGA.]
[Figure: Inpainting by solving min_x (1/2)‖Ax − b‖₂² + λ‖x‖_ITV; convergence comparison of IST, TwIST, FISTA, and OSGA.]
[Figure: Deblurring by solving min_x (1/2)‖Ax − b‖₂² + λ‖x‖_ITV; relative error of function values versus iterations and versus time, and SNR improvement versus iterations, comparing TwIST, SpaRSA, FISTA, and OSGA.]
[Figure: Comparison with state-of-the-art solvers; function values and MSE versus iterations for NSDSG, NES83, NESCS, NES05, PGA, FISTA, NESCO, NESUN, and OSGA.]
[Figure: Sparse signal recovery; the noisy signal, the direct solution using the pseudoinverse, and reconstructions by NES83 (MSE = 0.000729386), NESCS (MSE = 0.000947418), NES05 (MSE = 0.000870573), FISTA (MSE = 0.000849244), and OSGA (MSE = 0.000796183).]