0% found this document useful (0 votes)
57 views

Seq Slides

Sequential convex programming (SCP) is a local optimization method for nonconvex problems that solves a sequence of convex subproblems. At each step, SCP forms convex approximations of the objective and constraints around the current solution. It then solves the approximated convex problem to update the solution. While SCP is only guaranteed to find a local optimum, it leverages efficient convex solvers and often finds solutions with good objective values. The approximations are formed using techniques like Taylor approximations, fitting convex functions to sampled points, or penalizing constraint violations.

Uploaded by

myturtle game01
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views

Seq Slides

Sequential convex programming (SCP) is a local optimization method for nonconvex problems that solves a sequence of convex subproblems. At each step, SCP forms convex approximations of the objective and constraints around the current solution. It then solves the approximated convex problem to update the solution. While SCP is only guaranteed to find a local optimum, it leverages efficient convex solvers and often finds solutions with good objective values. The approximations are formed using techniques like Taylor approximations, fitting convex functions to sampled points, or penalizing constraint violations.

Uploaded by

myturtle game01
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

Sequential Convex Programming

• sequential convex programming


• alternating convex optimization
• convex-concave procedure

EE364b, Stanford University


Methods for nonconvex optimization problems

• convex optimization methods are (roughly) always global, always fast

• for general nonconvex problems, we have to give up one


– local optimization methods are fast, but need not find global
solution (and even when they do, cannot certify it)
– global optimization methods find global solution (and certify it),
but are not always fast (indeed, are often slow)

• this lecture: local optimization methods that are based on solving a


sequence of convex problems

EE364b, Stanford University 1


Sequential convex programming (SCP)

• a local optimization method for nonconvex problems that leverages


convex optimization
– convex portions of a problem are handled ‘exactly’ and efficiently

• SCP is a heuristic
– it can fail to find optimal (or even feasible) point
– results can (and often do) depend on starting point
(can run algorithm from many initial points and take best result)

• SCP often works well, i.e., finds a feasible point with good, if not
optimal, objective value

EE364b, Stanford University 2


Problem
we consider nonconvex problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, j = 1, . . . , p

with variable x ∈ Rn

• f0 and fi (possibly) nonconvex


• hi (possibly) non-affine

EE364b, Stanford University 3


Basic idea of SCP

• maintain estimate of solution x(k), and convex trust region T (k) ⊂ Rn

• form convex approximation fˆi of fi over trust region T (k)

• form affine approximation ĥi of hi over trust region T (k)

• x(k+1) is optimal point for approximate convex problem

minimize fˆ0(x)
subject to fˆi(x) ≤ 0, i = 1, . . . , m
ĥi(x) = 0, i = 1, . . . , p
x ∈ T (k)

EE364b, Stanford University 4


Trust region

• typical trust region is box around current point:

(k)
T (k) = {x | |xi − xi | ≤ ρi, i = 1, . . . , n}

• if xi appears only in convex inequalities and affine equalities, can take


ρi = ∞

EE364b, Stanford University 5


Affine and convex approximations via Taylor expansions

• (affine) first order Taylor expansion:

fˆ(x) = f (x(k)) + ∇f (x(k))T (x − x(k))

• (convex part of) second order Taylor expansion:

fˆ(x) = f (x(k)) + ∇f (x(k))T (x − x(k)) + (1/2)(x − x(k))T P (x − x(k))

2 (k)

P = ∇ f (x ) +
, PSD part of Hessian

• give local approximations, which don’t depend on trust region radii ρi

EE364b, Stanford University 6


Quadratic trust regions
• full second order Taylor expansion:

fˆ(x) = f (x(k))+∇f (x(k))T (x−x(k))+(1/2)(x−x(k))∇2f (x(k))(x−x(k)),

• trust region is compact ellipse around current point: for some P ≻ 0

T (k) = {x | (x − x(k))T P (x − x(k)) ≤ ρ}

• Update is any x(k+1) for which there is λ ≥ 0 s.t.

∇2f (x(k)) + λP  0, λ(kx(k+1)k2 − 1) = 0,


(∇2f (x(k)) + λP )x(k) = −∇f (x(k))

EE364b, Stanford University 7


Particle method

• particle method:
– choose points z1, . . . , zK ∈ T (k)
(e.g., all vertices, some vertices, grid, random, . . . )
– evaluate yi = f (zi)
– fit data (zi, yi) with convex (affine) function
(using convex optimization)

• advantages:
– handles nondifferentiable functions, or functions for which evaluating
derivatives is difficult
– gives regional models, which depend on current point and trust
region radii ρi

EE364b, Stanford University 8


Fitting affine or quadratic functions to data
fit convex quadratic function to data (zi, yi)

PK (k) T (k) T (k)


2
minimize i=1 (zi − x ) P (zi − x ) + q (zi − x ) + r − yi
subject to P  0

with variables P ∈ Sn, q ∈ Rn, r ∈ R

• can use other objectives, add other convex constraints


• no need to solve exactly
• this problem is solved for each nonconvex constraint, each SCP step

EE364b, Stanford University 9


Quasi-linearization
• a cheap and simple method for affine approximation

• write h(x) as A(x)x + b(x) (many ways to do this)

• use ĥ(x) = A(x(k))x + b(x(k))

• example:
T
h(x) = (1/2)xT P x + q T x + r = ((1/2)P x + q) x + r

• ĥql(x) = ((1/2)P x(k) + q)T x + r

• ĥtay (x) = (P x(k) + q)T (x − x(k)) + h(x(k))

EE364b, Stanford University 10


Example

• nonconvex QP

minimize f (x) = (1/2)xT P x + q T x


subject to kxk∞ ≤ 1

with P symmetric but not PSD

• use approximation

f (x(k)) + (P x(k) + q)T (x − x(k)) + (1/2)(x − x(k))T P+(x − x(k))

EE364b, Stanford University 11


• example with x ∈ R20
• SCP with ρ = 0.2, started from 10 different points
−10

−20

−30
f (x(k))
−40

−50

−60

−70
5 10 15 20 25 30

• runs typically converge to points between −60 and −50


• dashed line shows lower bound on optimal value ≈ −66.5

EE364b, Stanford University 12


Lower bound via Lagrange dual
• write constraints as x2i ≤ 1 and form Lagrangian
n
X
L(x, λ) = (1/2)xT P x + q T x + λi(x2i − 1)
i=1

= (1/2)xT (P + 2 diag(λ)) x + q T x − 1T λ

−1
• g(λ) = −(1/2)q T (P + 2 diag(λ)) q − 1T λ; need P + 2 diag(λ) ≻ 0

• solve dual problem to get best lower bound:


−1
maximize −(1/2)q T (P + 2 diag(λ)) q − 1T λ
subject to λ  0, P + 2 diag(λ) ≻ 0

EE364b, Stanford University 13


Some (related) issues

• approximate convex problem can be infeasible

• how do we evaluate progress when x(k) isn’t feasible?


need to take into account
– objective f0(x(k))
– inequality constraint violations fi(x(k))+
– equality constraint violations |hi(x(k))|

• controlling the trust region size


– ρ too large: approximations are poor, leading to bad choice of x(k+1)
– ρ too small: approximations are good, but progress is slow

EE364b, Stanford University 14


Exact penalty formulation
• instead of original problem, we solve unconstrained problem
Pm Pp
minimize φ(x) = f0(x) + λ ( i=1 fi (x)+ + i=1 |hi (x)|)

where λ > 0
• for λ large enough, minimizer of φ is solution of original problem
• for SCP, use convex approximation
p
m
!
φ̂(x) = fˆ0(x) + λ fˆi(x)+ +
X X
|ĥi(x)|
i=1 i=1

• approximate problem always feasible

EE364b, Stanford University 15


Trust region update

• judge algorithm progress by decrease in φ, using solution x̃ of


approximate problem
• decrease with approximate objective: δ̂ = φ(x(k)) − φ̂(x̃)
(called predicted decrease)
• decrease with exact objective: δ = φ(x(k)) − φ(x̃)
• if δ ≥ αδ̂, ρ(k+1) = β succρ(k), x(k+1) = x̃
(α ∈ (0, 1), β succ ≥ 1; typical values α = 0.1, β succ = 1.1)
• if δ < αδ̂, ρ(k+1) = β failρ(k), x(k+1) = x(k)
(β fail ∈ (0, 1); typical value β fail = 0.5)
• interpretation: if actual decrease is more (less) than fraction α of
predicted decrease then increase (decrease) trust region size

EE364b, Stanford University 16


Nonlinear optimal control

l 2 , m2

τ2 θ2

τ1
l 1 , m1

θ1

• 2-link system, controlled by torques τ1 and τ2 (no gravity)

EE364b, Stanford University 17


• dynamics given by M (θ)θ̈ + W (θ, θ̇)θ̇ = τ , with
2
 
(m1 + m2)l1 m2l1l2(s1s2 + c1c2)
M (θ) =
m2l1l2(s1s2 + c1c2) m2l22
 
0 m2l1l2(s1c2 − c1s2)θ̇2
W (θ, θ̇) =
m2l1l2(s1c2 − c1s2)θ̇1 0

si = sin θi, ci = cos θi

• nonlinear optimal control problem:


RT
minimize J = 0 kτ (t)k22 dt
subject to θ(0) = θinit, θ̇(0) = 0, θ(T ) = θfinal, θ̇(T ) = 0
kτ (t)k∞ ≤ τmax, 0 ≤ t ≤ T

EE364b, Stanford University 18


Discretization
• discretize with time interval h = T /N
PN
• J ≈ h i=1 kτik22, with τi = τ (ih)
• approximate derivatives as

θi+1 − θi−1 θi+1 − 2θi + θi−1


θ̇(ih) ≈ , θ̈(ih) ≈
2h h2

• approximate dynamics as set of nonlinear equality constraints:

θi+1 − 2θi + θi−1 θi+1 − θi−1 θi+1 − θi−1


 
M (θi) 2
+ W θ i , = τi
h 2h 2h

• θ0 = θ1 = θinit; θN = θN +1 = θfinal

EE364b, Stanford University 19


• discretized nonlinear optimal control problem:
PN
minimize h i=1 kτik22
subject to θ0 = θ1 = θinit, θN = θN +1 = θfinal
kτik∞ ≤ τmax, i = 1, . .. , N 
θi+1 −2θi +θi−1 θi+1 −θi−1 θi+1 −θi−1
M (θi) h2
+ W θi , 2h 2h = τi

• replace equality constraints with quasilinearized versions


(k) (k)
!
(k) θi+1 − 2θi + θi−1 (k) θi+1 − θi−1 θi+1 − θi−1
M (θi ) 2
+ W θ i , = τi
h 2h 2h

• trust region: only on θi


• initialize with θi = ((i − 1)/(N − 1))(θfinal − θinit), i = 1, . . . , N

EE364b, Stanford University 20


Numerical example

• m1 = 1, m2 = 5, l1 = 1, l2 = 1

• N = 40, T = 10

• θinit = (0, −2.9), θfinal = (3, 2.9)

• τmax = 1.1

• α = 0.1, β succ = 1.1, β fail = 0.5, ρ(1) = 90◦

• λ=2

EE364b, Stanford University 21


SCP progress

70

60

50
φ(x(k))

40

30

20

10
5 10 15 20 25 30 35 40
k

EE364b, Stanford University 22


Convergence of J and torque residuals

2
14 10

sum of torque residuals


13.5 1
10
13
0
10
12.5
J (k)

12 −1
10

11.5
−2
10
11

−3
10.5 10
5 10 15 20 25 30 35 40 5 10 15 20 25 30 35 40
k k

EE364b, Stanford University 23


Predicted and actual decreases in φ

2
140 10

120
δ̂ (dotted), δ (solid)

1
10
100

80 0

ρ(k) (◦)
10
60
−1
40 10

20 −2
10
0
−3
−20 10
5 10 15 20 25 30 35 40 5 10 15 20 25 30 35 40
k k

EE364b, Stanford University 24


Trajectory plan

1.5 3.5

3
1
2.5

θ1
τ1

0.5
1.5

1
0
0.5

−0.5 0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10

t t
2 5

θ2
τ2

0
0

−1 −5
0 2 4 6 8 10 0 2 4 6 8 10
t t

EE364b, Stanford University 25


Convex composite

• general form: for h : Rm → R convex, c : Rn → Rm smooth,

f (x) = h(c(x))

• exact penalty formulation of

minimize f (x) subject to c(x) = 0

• approximate f locally by convex approximation: near x,

f (y) ≈ fˆx(y) = h(c(x) + ∇c(x)T (y − x))

EE364b, Stanford University 26


Convex composite (prox-linear) algorithm
given function f = h ◦ c and convex domain C,
line search parameters α ∈ (0, .5), β ∈ (0, 1), stopping tolerance ǫ > 0
k := 0
repeat
Use model fˆ = fx(k)
Set x̂(k+1) = argminx∈C {fˆ(x)} and direction ∆(k+1) = x̂(k+1) − x(k)
Set δ (k) = fˆ(x(k) + ∆(k)) − f (x(k))
Set t = 1
while f (x(k) + t∆(k)) ≥ f (x(k)) + αtδ (k)
t=β·t
If k∆(k+1)k2/t ≤ ǫ, quit
k := k + 1

EE364b, Stanford University 27


Nonlinear measurements (phase retrieval)
• phase retrieval problem: for ai ∈ Cn, x⋆ ∈ Cn, observe
bi = |a∗i x⋆|2

• goal is to find x, natural objectives are of form


f (x) = |Ax|2 − b

• “robust” phase retrieval problem


Xm
f (x) = |a∗i x|2 − bi
i=1
or quadratic objective
m
1X ∗ 2 2
f (x) = |ai x| − bi
2 i=1

EE364b, Stanford University 28


Numerical example

• m = 200, n = 50, over reals R (sign retrieval)

• Generate 10 independent examples, A ∈ Rm×n, b = |Ax⋆|2,

Aij ∼ N (0, 1), x⋆ ∼ N (0, I)

• Two sets of experiments: initialize at

x(0) ∼ N (0, I) or x(0) ∼ N (x⋆, I)

• Use h(z) = kzk1 or h(z) = kzk22, c(x) = (Ax)2 − b.

EE364b, Stanford University 29


Numerical example (absolute loss, random initialization)
103

102

101
f (x(k)) − f (x⋆)

100

10−1

10−2

10−3

10−4

10−5
20 40 60 80 100
k

EE364b, Stanford University 30


Numerical example (absolute loss, good initialization)
103

102

101
f (x(k)) − f (x⋆)

100

10−1

10−2

10−3

10−4

10−5
1 2 3 4 5 6 7
k

EE364b, Stanford University 31


Numerical example (squared loss, random init)
101

100
f (x(k)) − f (x⋆)

10−1

10−2

10−3

10−4

10−5

10−6
1 2 3 4 5 6 7 8 9 10
k

EE364b, Stanford University 32


Numerical example (squared loss, good init)
1
10

0
10

−1
f (x(k)) − f (x⋆)

10

−2
10

−3
10

−4
10

−5
10

−6
10

−7
10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
k

EE364b, Stanford University 33


Extensions and convergence of basic prox-linear method

• regularization or “trust” region: update


 
1
x(k+1) (k) (k) T (k)
= argmin h(c(x ) + ∇c(x ) (x − x )) + kx − x(k)k22
x∈C 2αk

• with line search or αk small enough, lower bound on


inf x f (x) = inf x h(c(x)) > −∞, guaranteed to converge to stationary
point

• When h(z) = kzk22, often called ’Gauss–Newton’ method, some variants


called ’Levenberg–Marquardt’

EE364b, Stanford University 34


‘Difference of convex’ programming

• express problem as

minimize f0(x) − g0(x)


subject to fi(x) − gi(x) ≤ 0, i = 1, . . . , m

where fi and gi are convex

• fi − gi are called ‘difference of convex’ functions

• problem is sometimes called ‘difference of convex programming’

EE364b, Stanford University 35


Convex-concave procedure

• obvious convexification at x(k): replace f (x) − g(x) with

fˆ(x) = f (x) − g(x(k)) − ∇g(x(k))T (x − x(k))

• since fˆ(x) ≥ f (x) for all x, no trust region is needed


– true objective at x̃ is better than convexified objective
– true feasible set contains feasible set for convexified problem

• SCP sometimes called ‘convex-concave procedure’

EE364b, Stanford University 36


Example (BV §7.1)
• given samples y1, . . . , yN ∈ Rn from N (0, Σtrue)
• negative log-likelihood function is

N
X
f (Σ) = log det Σ + Tr(Σ−1Y ), Y = (1/N ) yiyiT
i=1

(dropping a constant and positive scale factor)


• ML estimate of Σ, with prior knowledge Σij ≥ 0:

minimize f (Σ) = log det Σ + Tr(Σ−1Y )


subject to Σij ≥ 0, i, j = 1, . . . , n

with variable Σ (constraint Σ ≻ 0 is implicit)

EE364b, Stanford University 37


• first term in f is concave; second term is convex
• linearize first term in objective to get
 
fˆ(Σ) = log det Σ (k)
+ Tr (Σ (k) −1
) (Σ − Σ (k)
) + Tr(Σ−1Y )

EE364b, Stanford University 38


Numerical example

convergence of problem instance with n = 10, N = 15

−5

−10
f (Σ)

−15

−20

−25

−30
1 2 3 4 5 6 7

EE364b, Stanford University 39


Alternating convex optimization

• given nonconvex problem with variable (x1, . . . , xn) ∈ Rn


S
• I1, . . . , Ik ⊂ {1, . . . , n} are index subsets with j Ij = {1, . . . , n}

• suppose problem is convex in subset of variables xi, i ∈ Ij ,


when xi, i 6∈ Ij are fixed

• alternating convex optimization method: cycle through j, in each step


optimizing over variables xi, i ∈ Ij

• special case: bi-convex problem


– x = (u, v); problem is convex in u (v) with v (u) fixed
– alternate optimizing over u and v

EE364b, Stanford University 40


Nonnegative matrix factorization

• NMF problem:
minimize kA − XY kF
subject to Xij , Yij ≥ 0
variables X ∈ Rm×k , Y ∈ Rk×n, data A ∈ Rm×n

• difficult problem, except for a few special cases (e.g., k = 1)

• alternating convex optimation: solve QPs to optimize over X, then Y ,


then X . . .

EE364b, Stanford University 41


Example

• convergence for example with m = n = 50, k = 5


(five starting points)
30

25
kA − XY kF

20

15

10

0
0 5 10 15 20 25 30
k

EE364b, Stanford University 42

You might also like