BV Cvxslides
Mathematical optimization
Convex optimization
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
gi (x) = 0, i = 1, . . . , p
I generally, no
I but you can try to solve it approximately, and it often doesn’t matter
Mathematical optimization
Convex optimization
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Ax = b
I variable x ∈ Rn
I equality constraints are linear
I f0 , . . . , fm are convex: for 𝜃 ∈ [0, 1], fi (𝜃x + (1 − 𝜃)y) ≤ 𝜃fi (x) + (1 − 𝜃)fi (y)
I classical view:
– linear (zero curvature) is easy
– nonlinear (nonzero curvature) is hard
I gets close to the basic idea: say what you want, not how to get it
I variable is x
I A, b given
I x ⪰ 0 means x1 ≥ 0, . . . , xn ≥ 0

x = cp.Variable(n)
obj = cp.norm2(A @ x - b)**2
constr = [x >= 0]
prob = cp.Problem(cp.Minimize(obj), constr)
prob.solve()
I algorithms
– 1947: simplex algorithm for linear programming (Dantzig)
– 1960s: early interior-point methods (Fiacco & McCormick, Dikin, …)
– 1970s: ellipsoid method and other subgradient methods
– 1980s & 90s: interior-point methods (Karmarkar, Nesterov & Nemirovski)
– since 2000s: many methods for large-scale convex optimization
I applications
– before 1990: mostly in operations research, a few in engineering
– since 1990: many applications in engineering (control, signal processing, communications,
circuit design, …)
– since 2000s: machine learning and statistics, finance
[figure: points x = 𝜃x1 + (1 − 𝜃)x2 on the line through x1 and x2 , marked 𝜃 = 1.2, 1, 0.6, 0, −0.2]
affine set: contains the line through any two distinct points in the set
line segment between x1 and x2 : all points of form x = 𝜃x1 + (1 − 𝜃)x2 , with 0 ≤ 𝜃 ≤ 1
convex set: contains line segment between any two points in the set
convex combination of x1 , . . . , xk : any point of the form
x = 𝜃 1 x1 + 𝜃 2 x2 + · · · + 𝜃 k xk
with 𝜃 1 + · · · + 𝜃 k = 1, 𝜃 i ≥ 0
conic (nonnegative) combination of x1 and x2 : any point of the form
x = 𝜃 1 x1 + 𝜃 2 x2
with 𝜃 1 ≥ 0, 𝜃 2 ≥ 0
convex cone: set that contains all conic combinations of points in the set
hyperplane: set of the form {x | aT x = b}, with a ≠ 0
halfspace: set of the form {x | aT x ≤ b}, with a ≠ 0
[figure: hyperplane with normal vector a through x0 , dividing the halfspaces aT x ≥ b and aT x ≤ b]
(Euclidean) ball with center xc and radius r: {x | ‖x − xc ‖2 ≤ r}
the set {(x, t) | ‖x‖2 ≤ t} ⊂ Rn+1 is called the second-order cone
[figure: boundary of the second-order cone in R3 ]
Convex Optimization Boyd and Vandenberghe 2.8
Polyhedra
polyhedron: solution set of finitely many linear inequalities and equalities
{x | Ax ⪯ b, Cx = d}
[figure: polyhedron P as intersection of halfspaces with outward normals a1 , . . . , a5 ]

Positive semidefinite cone
example: [ x y ; y z ] ∈ S2+ ⟺ x ≥ 0, z ≥ 0, xz ≥ y2
[figure: boundary of S2+ , plotted over (x, y)]
Outline
Generalized inequalities
3. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, …)
by operations that preserve convexity
– intersection
– affine mapping
– perspective mapping
– linear-fractional mapping
I picture for m = 2: [figure]
I images and inverse images of convex sets under perspective are convex
I linear-fractional function f : Rn → Rm :
f (x) = (Ax + b)/(cT x + d), dom f = {x | cT x + d > 0}
I images and inverse images of convex sets under linear-fractional functions are convex
example: f (x) = (1/(x1 + x2 + 1)) x
[figure: convex set C and its image f (C)]
Generalized inequalities
examples
I nonnegative orthant K = Rn+ = {x ∈ Rn | xi ≥ 0, i = 1, . . . , n}
I positive semidefinite cone K = Sn+
I nonnegative polynomials on [0, 1]: K = {x ∈ Rn | x1 + x2 t + · · · + xn t n−1 ≥ 0 for t ∈ [0, 1]}
x ⪯K y ⟺ y − x ∈ K, x ≺K y ⟺ y − x ∈ int K
I examples
– componentwise inequality (K = Rn+ ): x ⪯Rn+ y ⟺ xi ≤ yi , i = 1, . . . , n
– matrix inequality (K = Sn+ ): X ⪯Sn+ Y ⟺ Y − X positive semidefinite
these two types are so common that we drop the subscript in ⪯K
x ⪯K y, u ⪯K v =⇒ x + u ⪯K y + v
Generalized inequalities
I if C and D are nonempty disjoint (i.e., C ∩ D = ∅) convex sets, there exist a ≠ 0, b s.t.
aT x ≤ b for x ∈ C, aT x ≥ b for x ∈ D
[figure: separating hyperplane {x | aT x = b} between C and D; supporting hyperplane to C at a boundary point x0 ]
Convex functions
Quasiconvexity
[figure: graph of a convex function; the chord between (x, f (x)) and (y, f (y)) lies above the graph]
I f is concave if −f is convex
I f is strictly convex if dom f is convex and for x, y ∈ dom f , x ≠ y, 0 < 𝜃 < 1, f (𝜃x + (1 − 𝜃)y) < 𝜃f (x) + (1 − 𝜃)f (y)
concave functions:
I affine: ax + b on R, for any a, b ∈ R
I powers: x 𝛼 on R++ , for 0 ≤ 𝛼 ≤ 1
I logarithm: log x on R++
I entropy: −x log x on R++
I negative part: min{0, x}
convex functions:
I affine functions: f (x) = aT x + b
I any norm, e.g., the ℓp norms
– kxk p = (|x1 | p + · · · + |xn | p ) 1/p for p ≥ 1
– kxk ∞ = max{|x1 |, . . . , |xn |}
I sum of squares: kxk 22 = x12 + · · · + xn2
I max function: max(x) = max{x1 , x2 , . . . , xn }
I softmax or log-sum-exp function: log(exp x1 + · · · + exp xn )
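Each of these convexity claims can be spot-checked numerically against the defining inequality; a quick numpy sketch (sample sizes and the helper name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(x):
    # softmax / log-sum-exp: log(exp x_1 + ... + exp x_n)
    return np.logaddexp.reduce(x)

for f in (logsumexp, np.max, lambda z: np.linalg.norm(z, 2)):
    for _ in range(200):
        x, y = rng.normal(size=4), rng.normal(size=4)
        t = rng.uniform()
        # defining inequality: f(tx + (1-t)y) <= t f(x) + (1-t) f(y)
        assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
```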
extended-value extension: f̃ (x) = f (x) for x ∈ dom f , f̃ (x) = ∞ for x ∉ dom f
first-order condition: differentiable f is convex if and only if dom f is convex and
f (y) ≥ f (x) + ∇f (x) T (y − x) for all x, y ∈ dom f
[figure: the graph of f lies above its tangent at (x, f (x))]
∇2 f (x)ij = 𝜕 2 f (x)/𝜕xi 𝜕xj , i, j = 1, . . . , n
I quadratic-over-linear: f (x, y) = x2 /y is convex for y > 0
∇2 f (x, y) = (2/y3 ) [ y ; −x ] [ y ; −x ] T ⪰ 0
[figure: graph of f (x, y) = x2 /y]
More examples
I log-sum-exp: f (x) = log Σk=1..n exp xk is convex
∇2 f (x) = (1/(1T z)) diag(z) − (1/(1T z) 2 ) zzT (zk = exp xk )
to show ∇2 f (x) ⪰ 0, note that for any v,
vT ∇2 f (x)v = [ (Σk zk vk2 )(Σk zk ) − (Σk vk zk ) 2 ] / (1T z) 2 ≥ 0
(by the Cauchy–Schwarz inequality)
I geometric mean: f (x) = (Πk=1..n xk ) 1/n on Rn++ is concave (similar proof as above)
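The PSD claim for the log-sum-exp Hessian can be spot-checked numerically; a small sketch with numpy (the helper name and sample sizes are illustrative):

```python
import numpy as np

def lse_hessian(x):
    # Hessian of log-sum-exp: (1/1'z) diag(z) - (1/(1'z)^2) zz', with z_k = exp x_k
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

rng = np.random.default_rng(0)
for _ in range(100):
    H = lse_hessian(rng.normal(size=5))
    # PSD: smallest eigenvalue nonnegative up to roundoff
    assert np.linalg.eigvalsh(H).min() >= -1e-12
```

Note that the Hessian is only positive semidefinite: the all-ones vector is always in its nullspace.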
epigraph: epi f = {(x, t) ∈ Rn+1 | x ∈ dom f , f (x) ≤ t}; f is convex if and only if epi f is a convex set
f (E z) ≤ E f (z)
prob(z = x) = 𝜃, prob(z = y) = 1 − 𝜃
I suppose X ∼ N (𝜇, 𝜎 2 )
I with f (u) = exp u, Y = f (X) is log-normal
I we have E f (X) = exp(𝜇 + 𝜎 2 /2)
I Jensen’s inequality gives f (E X) = exp 𝜇 ≤ E f (X) = exp(𝜇 + 𝜎 2 /2)
[figure: densities of X and f (X), with f (E X) and E f (X) marked]
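The log-normal instance of Jensen's inequality above is easy to confirm by Monte Carlo; a quick sketch (the values of 𝜇, 𝜎, and the sample size are arbitrary choices):

```python
import numpy as np

mu, sigma = 0.5, 0.8
rng = np.random.default_rng(0)
X = rng.normal(mu, sigma, size=1_000_000)

f_of_mean = np.exp(mu)        # f(E X), with f = exp
mean_of_f = np.exp(X).mean()  # Monte Carlo estimate of E f(X)

assert f_of_mean <= mean_of_f                              # Jensen's inequality
assert abs(mean_of_f - np.exp(mu + sigma**2 / 2)) < 0.02   # closed form E f(X)
```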
3. show that f is obtained from simple convex functions by operations that preserve convexity
– nonnegative weighted sum
– composition with affine function
– pointwise maximum and supremum
– composition
– minimization
– perspective
examples
I log barrier for linear inequalities:
f (x) = − Σi=1..m log(bi − aiT x), dom f = {x | aiT x < bi , i = 1, . . . , m}
examples
I piecewise-linear function: f (x) = maxi=1,...,m (aTi x + bi )
I sum of r largest components of x ∈ Rn : f (x) = x[1] + x[2] + · · · + x[r] , where x[i] is the ith largest component
examples
I distance to farthest point in a set C: f (x) = supy∈C kx − yk
I maximum eigenvalue of symmetric matrix: for X ∈ Sn , 𝜆max (X) = sup kyk 2 =1 yT Xy is convex
I support function of a set C: SC (x) = supy∈C yT x is convex
I the function g(x) = inf y∈C f (x, y) is called the partial minimization of f (w.r.t. y)
I if f (x, y) is convex in (x, y) and C is a convex set, then partial minimization g is convex
examples
I f (x, y) = xT Ax + 2xT By + yT Cy with
[ A B ; BT C ] ⪰ 0, C ≻ 0
minimizing over y gives g(x) = inf y f (x, y) = xT (A − BC −1 BT )x, which is convex
examples
I f (x) = exp g(x) is convex if g is convex
I f (x) = 1/g(x) is convex if g is concave and positive
I you will use this composition rule constantly throughout this course
I you need to commit this rule to memory
the function
f (x, y) = (x − y) 2 /(1 − max(x, y)), x < 1, y < 1
is convex
constructive analysis:
I (leaves) x, y, and 1 are affine
I max(x, y) is convex; x − y is affine
I 1 − max(x, y) is concave
I function u2 /v is convex, monotone decreasing in v for v > 0
I f is composition of u2 /v with u = x − y, v = 1 − max(x, y), hence convex
in disciplined convex programming (DCP) users construct convex and concave functions as
expressions using constructive convex analysis
(x − y) 2 /(1 − max(x, y)), x < 1, y < 1
import cvxpy as cp
x = cp.Variable()
y = cp.Variable()
expr = cp.quad_over_lin(x - y, 1 - cp.maximum(x, y))
expr.curvature # Convex
expr.sign # Positive
expr.is_dcp() # True
I consider convex function f (x) = √(1 + x2 )
I expression f1 = cp.sqrt(1+cp.square(x)) is not DCP
I expression f2 = cp.norm2([1,x]) is DCP
I CVXPY will not recognize f1 as convex, even though it represents a convex function
I perspective of f : g(x, t) = t f (x/t), dom g = {(x, t) | x/t ∈ dom f , t > 0}; g is convex if f is convex
examples
I f (x) = xT x is convex; so g(x, t) = xT x/t is convex for t > 0
I f (x) = − log x is convex; so relative entropy g(x, t) = t log t − t log x is convex on R2++
conjugate of f : f ∗ (y) = supx∈dom f (yT x − f (x))
[figure: f ∗ (y) is the maximum gap between the linear function xy and f (x), marked at (0, −f ∗ (y))]
example: negative logarithm f (x) = − log x
f ∗ (y) = supx>0 (xy + log x) = −1 − log(−y) if y < 0, ∞ otherwise
example: strictly convex quadratic f (x) = (1/2)xT Qx with Q ≻ 0
f ∗ (y) = supx (yT x − (1/2)xT Qx) = (1/2) yT Q−1 y
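The quadratic conjugate can be verified numerically in one dimension by taking the supremum over a fine grid; a sketch with arbitrary values of q and y:

```python
import numpy as np

# f(x) = (1/2) q x^2 in 1-D; the conjugate should be f*(y) = y^2 / (2q)
q, y = 2.0, 1.5
xs = np.linspace(-10.0, 10.0, 200001)
sup = np.max(y * xs - 0.5 * q * xs**2)   # sup_x (yx - f(x)) on the grid
assert abs(sup - y**2 / (2 * q)) < 1e-6
```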
S 𝛼 = {x ∈ dom f | f (x) ≤ 𝛼}
[figure: quasiconvex function on R; its sublevel sets are intervals, with endpoints marked a, b, c]
I f is quasiconcave if −f is quasiconvex
I f is quasilinear if it is quasiconvex and quasiconcave
I √|x| is quasiconvex on R
I ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
I log x is quasilinear on R++
I linear-fractional function
f (x) = (aT x + b)/(cT x + d), dom f = {x | cT x + d > 0}
is quasilinear
Optimization problems
Transforming problems
Geometric programming
Quasiconvex optimization
Multicriterion optimization
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
I x ∈ Rn is the optimization variable
I f0 : Rn → R is the objective or cost function
I fi : Rn → R, i = 1, . . . , m, are the inequality constraint functions
I hi : Rn → R are the equality constraint functions
[figure: objective f0 with global optimum x★ (optimal value p★) and a locally optimal point xlo ]
examples with n = 1, m = p = 0
I f0 (x) = 1/x, dom f0 = R++ : p★ = 0, no optimal point
I f0 (x) = − log x, dom f0 = R++ : p★ = −∞
I f0 (x) = x log x, dom f0 = R++ : p★ = −1/e, x = 1/e is optimal
I f0 (x) = x3 − 3x: p★ = −∞, x = 1 is locally optimal
[figure: graphs of the four example objectives]
example:
minimize f0 (x) = − Σi=1..k log(bi − aiT x)
is an unconstrained problem with implicit constraints aiT x < bi
find x
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
minimize 0
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
aTi x = bi , i = 1, . . . , p
I objective and inequality constraints f0 , f1 , …, fm are convex
I equality constraints are affine, often written as Ax = b
I feasible and optimal sets of a convex optimization problem are convex
proof (by contradiction):
I suppose x is locally optimal, but there exists a feasible y with f0 (y) < f0 (x)
I x locally optimal means there is an R > 0 such that z feasible, ‖z − x‖2 ≤ R =⇒ f0 (z) ≥ f0 (x)
I consider z = 𝜃y + (1 − 𝜃)x with 𝜃 = R/(2‖y − x‖2 ); then z is feasible, ‖z − x‖2 = R/2 < R, and by convexity f0 (z) ≤ 𝜃f0 (y) + (1 − 𝜃)f0 (x) < f0 (x), contradicting local optimality of x
[figure: at an optimal x, −∇f0 (x) defines a supporting hyperplane to the feasible set X]
example: for minimize f0 (x) subject to x ⪰ 0, the optimality condition is
x ⪰ 0, ∇f0 (x)i ≥ 0 if xi = 0, ∇f0 (x)i = 0 if xi > 0
minimize cT x + d
subject to Gx h
Ax = b
I convex problem with affine objective and constraint functions
I feasible set is a polyhedron
[figure: polyhedral feasible set P ; x★ is the point of P farthest in the direction −c]
example: the LP
minimize cT x
subject to Ax ⪰ b, x ⪰ 0
is equivalent to the inequality-form LP
minimize cT x
subject to [ −A ; −I ] x ⪯ [ −b ; 0 ]
piecewise-linear minimization: minimize maxi=1,...,m (aiT x + bi ) is equivalent to the LP
minimize t
subject to aiT x + bi ≤ t, i = 1, . . . , m
with variables x ∈ Rn , t ∈ R
maximize r
subject to aiT xc + r‖ai ‖2 ≤ bi , i = 1, . . . , m
minimize (1/2)xT Px + qT x + r
subject to Gx h
Ax = b
[figure: −∇f0 (x★) defines a supporting hyperplane to P at x★]
minimize f T x
subject to ‖Ai x + bi ‖2 ≤ ciT x + di , i = 1, . . . , m
Fx = g
(Ai ∈ Rni ×n , F ∈ Rp×n )
I inequalities are called second-order cone (SOC) constraints: (Ai x + bi , ciT x + di ) ∈ second-order cone in Rni +1
minimize cT x
subject to aTi x ≤ bi , i = 1, . . . , m,
minimize cT x
subject to aTi x ≤ bi for all ai ∈ Ei , i = 1, . . . , m,
minimize cT x
subject to prob(aTi x ≤ bi ) ≥ 𝜂, i = 1, . . . , m
minimize cT x
subject to āiT x + ‖PiT x‖2 ≤ bi , i = 1, . . . , m
I assume ai ∼ N ( āi , Σi )
I aiT x ∼ N ( āiT x, xT Σi x), so
prob(aiT x ≤ bi ) = Φ( (bi − āiT x) / ‖Σi1/2 x‖2 )
where Φ(u) = (1/√(2𝜋)) ∫−∞..u e−t²/2 dt is the N (0, 1) CDF
I prob(aiT x ≤ bi ) ≥ 𝜂 can be expressed as āiT x + Φ−1 (𝜂)‖Σi1/2 x‖2 ≤ bi
I for 𝜂 ≥ 1/2, robust LP is equivalent to the SOCP
minimize cT x
subject to āiT x + Φ−1 (𝜂)‖Σi1/2 x‖2 ≤ bi , i = 1, . . . , m
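The chance-constraint reformulation can be checked by Monte Carlo; a sketch with an arbitrary small instance (the data below and the hard-coded quantile Φ⁻¹(0.95) ≈ 1.6449 are illustrative, not from the slides):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
eta = 0.95
a_bar = np.array([1.0, 2.0, -1.0])
L = np.array([[0.5, 0.0, 0.0], [0.1, 0.3, 0.0], [0.0, 0.2, 0.4]])
Sigma = L @ L.T
x = np.array([0.3, -0.2, 0.5])

Phi = lambda u: 0.5 * (1 + math.erf(u / math.sqrt(2)))  # N(0,1) CDF
q = 1.6449  # Phi^{-1}(0.95), hard-coded quantile
assert abs(Phi(q) - eta) < 1e-4

# choose b so the SOC constraint a_bar'x + q*||Sigma^{1/2}x||_2 <= b is tight
b = a_bar @ x + q * np.sqrt(x @ Sigma @ x)

# Monte Carlo: a ~ N(a_bar, Sigma); prob(a'x <= b) should be ~ eta
a = a_bar + rng.normal(size=(1_000_000, 3)) @ L.T
p = np.mean(a @ x <= b)
assert abs(p - eta) < 0.005
```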
minimize cT x
subject to Fx + g ⪯K 0
Ax = b
I constraint Fx + g ⪯K 0 involves a generalized inequality with respect to a proper cone K
I linear programming is a conic form problem with K = Rm+
minimize cT x
subject to x1 F1 + x2 F2 + · · · + xn Fn + G ⪯ 0
Ax = b
with Fi , G ∈ Sk
I inequality constraint is called linear matrix inequality (LMI)
I includes problems with multiple LMI constraints, since F̂ (x) ⪯ 0, F̃ (x) ⪯ 0 is equivalent to the single LMI diag(F̂ (x), F̃ (x)) ⪯ 0
matrix norm minimization:
minimize ‖A(x)‖2 = (𝜆max (A(x) T A(x))) 1/2
where A(x) = A0 + x1 A1 + · · · + xn An (with given Ai ∈ Rp×q )
equivalent SDP
minimize t
subject to [ tI A(x) ; A(x) T tI ] ⪰ 0
I variables x ∈ Rn , t ∈ R
I constraint follows from
‖A‖2 ≤ t ⟺ AT A ⪯ t2 I, t ≥ 0 ⟺ [ tI A ; AT tI ] ⪰ 0
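The LMI characterization of the spectral norm is easy to verify numerically; a sketch using random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def lmi_psd(A, t):
    # is [tI A; A' tI] positive semidefinite?
    p, q = A.shape
    M = np.block([[t * np.eye(p), A], [A.T, t * np.eye(q)]])
    return np.linalg.eigvalsh(M).min() >= -1e-9

for _ in range(100):
    A = rng.normal(size=(3, 4))
    s = np.linalg.norm(A, 2)           # spectral norm sigma_max(A)
    assert lmi_psd(A, s * 1.001)       # t slightly above ||A||_2: PSD
    assert not lmi_psd(A, s * 0.999)   # t slightly below: not PSD
```

The eigenvalues of the block matrix are t ± 𝜎i(A), which is why the boundary sits exactly at t = ‖A‖2.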
SOCP: minimize f T x
subject to ‖Ai x + bi ‖2 ≤ ciT x + di , i = 1, . . . , m
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
I non-convex problem
minimize x1 /x2 + x3 /x1
subject to x2 /x3 + x1 ≤ 1
with implicit constraint x ≻ 0
I with the change of variables yi = log xi , it can be written as
minimize exp(y1 − y2 ) + exp(y3 − y1 )
subject to exp(y2 − y3 ) + exp y1 ≤ 1
which is convex
suppose
I 𝜙0 is monotone increasing
I 𝜓i (u) ≤ 0 if and only if u ≤ 0, i = 1, . . . , m
I 𝜑i (u) = 0 if and only if u = 0, i = 1, . . . , p
maximize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
I examples:
– 𝜙0 (u) = −u transforms maximizing a concave function to minimizing a convex function
– 𝜙0 (u) = 1/u transforms maximizing a concave positive function to minimizing a convex
function
Eliminating equality constraints
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Ax = b
is equivalent to
minimize (over z) f0 (Fz + x0 )
subject to fi (Fz + x0 ) ≤ 0, i = 1, . . . , m
where F and x0 are such that Ax = b ⇐⇒ x = Fz + x0 for some z
minimize f0 (A0 x + b0 )
subject to fi (Ai x + bi ) ≤ 0, i = 1, . . . , m
is equivalent to
minimize (over x, yi ) f0 (y0 )
subject to fi (yi ) ≤ 0, i = 1, . . . , m
yi = Ai x + bi , i = 0, 1, . . . , m
minimize f0 (x)
subject to aTi x ≤ bi , i = 1, . . . , m
is equivalent to
minimize (over x, s) f0 (x)
subject to aTi x + si = bi , i = 1, . . . , m
si ≥ 0, i = 1, . . . m
minimize (over x, t) t
subject to f0 (x) − t ≤ 0
fi (x) ≤ 0, i = 1, . . . , m
Ax = b
minimize f0 (x1 , x2 )
subject to fi (x1 ) ≤ 0, i = 1, . . . , m
is equivalent to
minimize f̃0 (x1 )
subject to fi (x1 ) ≤ 0, i = 1, . . . , m
where f̃0 (x1 ) = inf x2 f0 (x1 , x2 )
Ĉ = {x | fi (x) ≤ 0, i = 1, . . . , m, fm (x) ≤ 0, Ax = b}
I convex problem
minimize ĥ(x)
subject to fi (x) ≤ 0, i = 1, . . . , m, Ax = b
is a convex relaxation of the original problem
I optimal value of relaxation is lower bound on optimal value of original problem
minimize cT (x, z)
subject to F (x, z) ⪯ g, A(x, z) = b, z ∈ {0, 1}q
with variables x ∈ Rn , z ∈ Rq
I zi are called Boolean variables
I this problem is in general hard to solve
I specify objective as
– minimize {scalar convex expression}, or
– maximize {scalar concave expression}
I specify constraints as
– {convex expression} <= {concave expression} or
– {concave expression} >= {convex expression} or
– {affine expression} == {affine expression}
P1 ⇐⇒ P2 ⇐⇒ ··· ⇐⇒ PN−1 ⇐⇒ PN
posynomial function:
f (x) = Σk=1..K ck x1a1k x2a2k · · · xnank , dom f = Rn++ (with ck > 0)
minimize f0 (x)
subject to fi (x) ≤ 1, i = 1, . . . , m
hi (x) = 1, i = 1, . . . , p
log-log transformation: with yi = log xi ,
log f (ey1 , . . . , eyn ) = log( Σk=1..K exp(akT y + bk ) ) (bk = log ck )
which is convex in y
example:
log ‖DMD−1 ‖F2 = log( Σi,j=1..n exp( 2(yi − yj + log |Mij |) ) )
with D = diag(ey1 , . . . , eyn )
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Ax = b
with f0 : Rn → R quasiconvex, f1 , …, fm convex
can have locally optimal points that are not (globally) optimal
[figure: quasiconvex f0 with a locally optimal point (x, f0 (x)) that is not globally optimal]
I linear-fractional program, transformed to the LP (with variables y, z)
minimize cT y + dz
subject to Gy ⪯ hz, Ay = bz
eT y + fz = 1, z ≥ 0
I recover x★ = y★/z★
example:
I f0 (x) = p(x)/q(x), with p convex and nonnegative, q concave and positive
I take 𝜙t (x) = p(x) − tq(x): for t ≥ 0,
– 𝜙t convex in x
– p(x)/q(x) ≤ t if and only if 𝜙t (x) ≤ 0
I feasible x dominates another feasible x̃ if f0 (x) ⪯ f0 ( x̃) and for at least one i, Fi (x) < Fi ( x̃)
I i.e., x meets or beats x̃ on every objective, and beats it on at least one
[figure: achievable objective set O ; f0 (x★) is the minimum element of O (x★ is optimal); f0 (xpo ) is a minimal element of O (xpo is Pareto optimal)]
example: regularized least-squares, trade-off of F1 (x) = ‖Ax − b‖22 versus F2 (x) = ‖x‖22
[figure: optimal trade-off curve of F2 versus F1 ]
example: risk-return trade-off in portfolio optimization
[figure: expected return versus standard deviation of return, with the corresponding allocations x]
[figure: scalarization: weights 𝜆1 , 𝜆2 pick out Pareto optimal values f0 (x1 ), f0 (x2 ); f0 (x3 ) is Pareto optimal but is not found by scalarization]
example: scalarized regularized least-squares, minimize ‖Ax − b‖22 + 𝛾‖x‖22 ; varying 𝛾 > 0 sweeps out the optimal trade-off curve
[figure: trade-off curve of ‖x‖22 versus ‖Ax − b‖22 , with the point 𝛾 = 1 marked]
KKT conditions
Sensitivity analysis
Problem reformulations
Theorems of alternatives
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
hi (x) = 0, i = 1, . . . , p
minimize xT x
subject to Ax = b
I Lagrangian is L(x, 𝜈) = xT x + 𝜈 T (Ax − b)
I minimize over x by setting the gradient to zero:
∇x L(x, 𝜈) = 2x + AT 𝜈 = 0 =⇒ x = −(1/2)AT 𝜈
I plug in to get the dual function
g(𝜈) = L(−(1/2)AT 𝜈, 𝜈) = −(1/4) 𝜈 T AAT 𝜈 − bT 𝜈
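The derivation above can be confirmed numerically: for this problem the dual maximum equals the primal optimum. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 6)), rng.normal(size=3)

# primal: minimize x'x s.t. Ax = b; optimum is the least-norm solution
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# dual function g(nu) = -(1/4) nu'AA'nu - b'nu, maximized at nu* = -2(AA')^{-1}b
g = lambda nu: -0.25 * nu @ (A @ A.T) @ nu - b @ nu
nu_star = -2 * np.linalg.solve(A @ A.T, b)

assert abs(g(nu_star) - p_star) < 1e-9          # strong duality: d* = p*
assert g(rng.normal(size=3)) <= p_star + 1e-9   # weak duality: g(nu) <= p*
```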
minimize cT x
subject to Ax = b, x ⪰ 0
I Lagrangian is
L(x, 𝜆, 𝜈) = cT x − 𝜆T x + 𝜈 T (Ax − b) = −bT 𝜈 + (c + AT 𝜈 − 𝜆) T x
I L is affine in x, so
g(𝜆, 𝜈) = inf x L(x, 𝜆, 𝜈) = −bT 𝜈 if AT 𝜈 − 𝜆 + c = 0, −∞ otherwise
minimize kxk
subject to Ax = b
I dual function is
g(𝜈) = inf x (‖x‖ − 𝜈 T Ax + bT 𝜈) = bT 𝜈 if ‖AT 𝜈‖∗ ≤ 1, −∞ otherwise
minimize xT Wx
subject to xi2 = 1, i = 1, . . . , n
minimize f0 (x)
subject to Ax ⪯ b, Cx = d
I dual function
g(𝜆, 𝜈) = inf x∈dom f0 ( f0 (x) + (AT 𝜆 + C T 𝜈) T x − bT 𝜆 − d T 𝜈 )
= −f0∗ (−AT 𝜆 − C T 𝜈) − bT 𝜆 − d T 𝜈
I finds best lower bound on p★, obtained from Lagrange dual function
I a convex optimization problem, even if original primal problem is not
I dual optimal value denoted d★
I 𝜆, 𝜈 are dual feasible if 𝜆 0, (𝜆, 𝜈) ∈ dom g
I often simplified by making implicit constraint (𝜆, 𝜈) ∈ dom g explicit
maximize −bT 𝜈
subject to AT 𝜈 + c ⪰ 0
weak duality: d★ ≤ p★
I always holds (for convex and nonconvex problems)
I can be used to find nontrivial lower bounds for difficult problems, e.g., solving the SDP
maximize −1T 𝜈
subject to W + diag(𝜈) ⪰ 0
gives a lower bound for the two-way partitioning problem on page 5.7
strong duality: d★ = p★
I does not hold in general
I (usually) holds for convex problems
I conditions that guarantee strong duality in convex problems are called constraint
qualifications
Slater's constraint qualification: strong duality holds for a convex problem
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Ax = b
if it is strictly feasible, i.e., there exists an x ∈ int D with fi (x) < 0, i = 1, . . . , m, and Ax = b
I also guarantees that the dual optimum is attained (if p★ > −∞)
I can be sharpened: e.g.,
– can replace int D with relint D (interior relative to affine hull)
– affine inequalities do not need to hold with strict inequality
I there are many other types of constraint qualifications
primal problem
minimize cT x
subject to Ax ⪯ b
dual problem
maximize −bT 𝜆
subject to AT 𝜆 + c = 0, 𝜆 ⪰ 0
primal problem (P ≻ 0)
minimize xT Px
subject to Ax ⪯ b
dual function
g(𝜆) = inf x ( xT Px + 𝜆T (Ax − b) ) = −(1/4) 𝜆T AP−1 AT 𝜆 − bT 𝜆
dual problem
maximize −(1/4)𝜆T AP−1 AT 𝜆 − bT 𝜆
subject to 𝜆 ⪰ 0
[figure: geometric interpretation with G = {(f1 (x), f0 (x)) | x ∈ dom f0 }; the supporting line 𝜆u + t = g(𝜆) shows g(𝜆) ≤ p★, weak duality d★ ≤ p★, and a possible duality gap]
complementary slackness: if strong duality holds and x★, (𝜆★, 𝜈★) are primal and dual optimal, then
f0 (x★) = g(𝜆★, 𝜈★) ≤ f0 (x★) + Σi 𝜆i★ fi (x★) + Σi 𝜈i★ hi (x★) ≤ f0 (x★)
so the inequalities hold with equality, and 𝜆i★ fi (x★) = 0 for i = 1, . . . , m
if strong duality holds and x, 𝜆, 𝜈 are optimal, they satisfy the KKT conditions
if x̃, 𝜆̃, 𝜈̃ satisfy KKT for a convex problem, then they are optimal:
I from complementary slackness: f0 ( x̃) = L( x̃, 𝜆̃, 𝜈̃)
I from 4th condition (and convexity): g( 𝜆̃, 𝜈̃) = L( x̃, 𝜆̃, 𝜈̃)
hence, f0 ( x̃) = g( 𝜆̃, 𝜈̃)
I assume strong duality holds for unperturbed problem, with 𝜆★, 𝜈★ dual optimal
I apply weak duality to perturbed problem: p★(u, v) ≥ g(𝜆★, 𝜈★) − uT 𝜆★ − vT 𝜈★ = p★(0, 0) − uT 𝜆★ − vT 𝜈★
– if 𝜆i★ small: p★ does not decrease much if we loosen constraint i (ui > 0)
– if 𝜈i★ large and positive: p★ increases greatly if we take vi < 0
– if 𝜈i★ large and negative: p★ increases greatly if we take vi > 0
– if 𝜈i★ small and positive: p★ does not decrease much if we take vi > 0
– if 𝜈i★ small and negative: p★ does not decrease much if we take vi < 0
common reformulations
I introduce new variables and equality constraints
I make explicit constraints implicit or vice-versa
I transform objective or constraint functions, e.g., replace f0 (x) by 𝜙(f0 (x)) with 𝜙 convex,
increasing
minimize f0 (y)
subject to Ax + b − y = 0
I minimize kAx − bk
I reformulate as minimize kyk subject to y = Ax − b
I recall the conjugate of a general norm f (x) = ‖x‖:
f ∗ (z) = 0 if ‖z‖∗ ≤ 1, ∞ otherwise
maximize bT 𝜈
subject to AT 𝜈 = 0, ‖𝜈‖∗ ≤ 1
I a theorem of alternatives states that two inequality systems are (weak or strong)
alternatives
I can be considered the extension of duality to feasibility problems
fi (x) ≤ 0, i = 1, . . . , m, hi (x) = 0, i = 1, . . . , p
minimize 0
subject to fi (x) ≤ 0, i = 1, . . . , m,
hi (x) = 0, i = 1, . . . , p
𝜆 ⪰ 0, g(𝜆, 𝜈) > 0
𝜆 ⪰ 0, g(𝜆, 𝜈) ≥ 1
I consider system
Ax = b, x ⪰ 0
I dual function is g(𝜆, 𝜈) = −bT 𝜈 if AT 𝜈 = 𝜆, −∞ otherwise
I so the system is infeasible if and only if there exists a 𝜈 with
AT 𝜈 ⪰ 0, bT 𝜈 ≤ −1
I Farkas’ lemma:
Ax ⪯ 0, cT x < 0 and AT y + c = 0, y ⪰ 0
are strong alternatives
minimize cT x
subject to Ax ⪯ 0
[example: with prices p there is an arbitrage x with pT x = −0.2, 1T x = 0, and payoff Vx ⪰ 0; with prices p̃ there is no arbitrage, certified by a risk-neutral probability y]
6. Approximation and fitting
Outline
Regularized approximation
Robust approximation
I Euclidean approximation (‖ · ‖2 )
– solution x★ = A† b
common penalty functions 𝜙(u):
I square: 𝜙(u) = u2
I deadzone-linear: 𝜙(u) = max{0, |u| − a}
[figure: graphs of the square and deadzone-linear penalties]
Huber penalty (with parameter M):
𝜙hub (u) = u2 if |u| ≤ M, M (2|u| − M) if |u| > M
[figure: graph of 𝜙hub for M = 1]
[figure: affine fit f (t) to data with outliers, comparing least-squares and Huber penalties]
I least-norm problem:
minimize ‖x‖
subject to Ax = b
with A ∈ Rm×n , m ≤ n, ‖ · ‖ is any norm
I a bi-objective problem: minimize (w.r.t. R2+ ) (‖Ax − b‖, ‖x‖)
I estimation: linear measurement model y = Ax + v, with prior knowledge that kxk is small
I optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only
valid for small x
I robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors
in A than good approximation with large x
[figure: inputs u(t) and outputs y(t) for three points on the regularization trade-off curve]
Signal reconstruction
I bi-objective problem: minimize (w.r.t. R2+ ) (‖ x̂ − xcor ‖2 , 𝜙( x̂))
– x ∈ Rn is unknown signal
– xcor = x + v is (known) corrupted version of x, with additive noise v
– variable x̂ (reconstructed signal) is estimate of x
– 𝜙 : Rn → R is regularization function or smoothing objective
I examples:
– quadratic smoothing, 𝜙quad ( x̂) = Σi=1..n−1 ( x̂i+1 − x̂i ) 2
– total variation smoothing, 𝜙tv ( x̂) = Σi=1..n−1 | x̂i+1 − x̂i |
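For the quadratic smoothing objective, the scalarized trade-off problem has a closed-form solution via a linear system; a minimal numpy sketch on a synthetic signal (the signal, noise level, and weight delta are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = np.sin(np.linspace(0, 4 * np.pi, n))     # original signal
xcor = x + 0.3 * rng.normal(size=n)          # corrupted signal

# scalarized quadratic smoothing: minimize ||xhat - xcor||^2 + delta*||D xhat||^2,
# D the first-difference matrix; optimality condition is the linear system below
D = np.diff(np.eye(n), axis=0)
delta = 10.0
xhat = np.linalg.solve(np.eye(n) + delta * D.T @ D, xcor)

phi_quad = lambda z: np.sum(np.diff(z) ** 2)
assert phi_quad(xhat) < phi_quad(xcor)                       # smoother than xcor
assert np.linalg.norm(xhat - x) < np.linalg.norm(xcor - x)   # and closer to x
```

In practice one would exploit the tridiagonal structure of I + 𝛿DᵀD rather than a dense solve.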
[figure: quadratic smoothing example: original signal x, corrupted signal xcor , and three reconstructions x̂ on the trade-off curve of ‖ x̂ − xcor ‖2 versus 𝜙quad ( x̂)]
[figure: quadratic smoothing of a signal with sharp transitions: original signal x, noisy signal xcor , and three solutions x̂ on the trade-off curve of ‖ x̂ − xcor ‖2 versus 𝜙quad ( x̂); the transitions are smoothed out]
[figure: total variation reconstruction of the same signal: three solutions on the trade-off curve of ‖ x̂ − xcor ‖2 versus 𝜙tv ( x̂); total variation smoothing preserves the sharp transitions]
I two approaches:
– stochastic: assume A is random, minimize E kAx − bk
– worst-case: set A of possible values of A, minimize supA∈ A kAx − bk
example: A(u) = A0 + uA1
I xnom minimizes ‖A0 x − b‖22
I xstoch minimizes E ‖A(u)x − b‖22 with u uniform on [−1, 1]
I xwc minimizes sup−1≤u≤1 ‖A(u)x − b‖22
[figure: r(u) = ‖A(u)x − b‖2 versus u for the three solutions]
I A = Ā + U, U random, E U = 0, E U T U = P
I stochastic least-squares problem: minimize E ‖( Ā + U)x − b‖22
I explicit expression for objective:
E ‖( Ā + U)x − b‖22 = ‖ Āx − b‖22 + E ‖Ux‖22 = ‖ Āx − b‖22 + xT Px
I hence equivalent to the regularized problem: minimize ‖ Āx − b‖22 + ‖P1/2 x‖22
[figure: histograms of the residual r(u) for the least-squares, Tikhonov-regularized, and robust solutions xls , xtik , xrls ]
Hypothesis testing
Experiment design
maximize l(x) = Σi=1..m log p(yi − aiT x)
(y is observed value)
p = prob(y = 1) = exp(aT u + b) / (1 + exp(aT u + b))
I a, b are parameters; u ∈ Rn are (observable) explanatory variables
I estimation problem: estimate a, b from m observations (ui , yi )
I log-likelihood function (for y1 = · · · = yk = 1, yk+1 = · · · = ym = 0):
l(a, b) = log( Πi=1..k exp(aT ui + b)/(1 + exp(aT ui + b)) · Πi=k+1..m 1/(1 + exp(aT ui + b)) )
= Σi=1..k (aT ui + b) − Σi=1..m log(1 + exp(aT ui + b))
concave in a, b
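The concavity of the logistic log-likelihood can be spot-checked along random segments; a sketch on synthetic data (the data-generating parameters 0.8 and 4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
u = rng.uniform(0, 10, size=m)
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-(0.8 * u - 4)))).astype(float)

def loglik(a, b):
    # l(a,b) = sum_i [ y_i (a u_i + b) - log(1 + exp(a u_i + b)) ]
    z = a * u + b
    return np.sum(y * z - np.logaddexp(0, z))

# concavity along random segments: midpoint value >= average of endpoint values
for _ in range(100):
    p1, p2 = rng.normal(size=2), rng.normal(size=2)
    pm = (p1 + p2) / 2
    assert loglik(*pm) >= 0.5 * (loglik(*p1) + loglik(*p2)) - 1e-9
```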
Example
[figure: data (ui , yi ) and the fitted probability prob(y = 1) as a function of u]
log-likelihood (zero-mean Gaussian, N samples yk ):
l(Σ) = (1/2) Σk=1..N ( −n log(2𝜋) − log det Σ − ykT Σ −1 yk )
= (N/2) ( −n log(2𝜋) − log det Σ − tr(Σ −1 Y) )
with Y = (1/N) Σk=1..N yk ykT , the empirical covariance
I l is not concave in Σ (the log det Σ term has the wrong sign)
I with no constraints or regularization, MLE is empirical covariance Σml = Y
I change variables to S = Σ −1
I recover original parameter via Σ = S −1
I S is the natural parameter in an exponential family description of a Gaussian
I in terms of S, log-likelihood is
l(S) = (N/2) ( −n log(2𝜋) + log det S − tr(SY) )
which is concave
I (a similar trick can be used to handle nonzero mean)
regularized maximum likelihood estimate: minimize (over S)
− log det S + tr(SY) + 𝜆 Σi≠j |Sij |
Strue = [ 1 0 0.5 0 ; 0 1 0 0.1 ; 0.5 0 1 0.3 ; 0 0.1 0.3 1 ]
I empirical and sparse estimate values of Σ −1 (with 𝜆 = 0.2)
randomized detector
I a nonnegative matrix T ∈ R2×n , with 1T T = 1T
I if we observe X = k, we choose hypothesis 1 with probability t1k , hypothesis 2 with
probability t2k
I if all elements of T are 0 or 1, it is called a deterministic detector
I detection probability matrix
D = [ Tp Tq ] = [ 1 − Pfp Pfn ; Pfp 1 − Pfn ]
(Pfp is the false-positive probability, Pfn the false-negative probability)
I variable T ∈ R2×n
(t1k , t2k ) = (1, 0) if pk ≥ 𝜆qk , (0, 1) if pk < 𝜆qk
example with [ p q ] = [ 0.70 0.10 ; 0.20 0.10 ; 0.05 0.70 ; 0.05 0.10 ]
[figure: optimal trade-off curve of Pfn versus Pfp (ROC curve), with points 1–4 marked]
least-squares estimator:
x̂ = ( Σi=1..m ai aiT ) −1 Σi=1..m yi ai
D-optimal experiment design:
minimize (over 𝜆) log det( Σk=1..p 𝜆k vk vkT ) −1
subject to 𝜆 ⪰ 0, 1T 𝜆 = 1
dual problem
maximize log det W + n log n
subject to vkT Wvk ≤ 1, k = 1, . . . , p
interpretation: {x | xT Wx ≤ 1} is minimum volume ellipsoid centered at origin, that includes all
test vectors vk
𝜆k (1 − vkT Wvk ) = 0, k = 1, . . . , p
[figure: example with p = 20 test vectors; the optimal design puts weights 𝜆1 = 0.5, 𝜆2 = 0.5 on two vectors, which lie on the minimum volume ellipsoid]
Centering
Classification
I factor n can be improved to √n if C is symmetric
[figure: Chebyshev center xcheb and maximum volume ellipsoid center xmve of a polyhedron]
analytic center of the inequalities and equalities
fi (x) ≤ 0, i = 1, . . . , m, Fx = g
is defined as the solution of
minimize − Σi=1..m log(−fi (x))
subject to Fx = g
I objective is called the log-barrier for the inequalities
I (we’ll see later) analytic center more easily computed than MVE or Chebyshev center
I two sets of inequalities can describe the same set, but have different analytic centers
[figure: two descriptions of the same set, with different analytic centers xac ]
I we have
Einner ⊆ {x | aTi x ≤ bi , i = 1, . . . , m} ⊆ Eouter
where
aT xi + b > 0, i = 1, . . . , N, aT yi + b < 0, i = 1, . . . , M
I homogeneous in a, b, hence equivalent to
aT xi + b ≥ 1, i = 1, . . . , N, aT yi + b ≤ −1, i = 1, . . . , M
a set of linear inequalities in a, b, i.e., an LP feasibility problem
H1 = {z | aT z + b = 1}
H2 = {z | aT z + b = −1}
is dist(H1 , H2 ) = 2/kak 2
minimize (1/2)kak 22
subject to aT xi + b ≥ 1, i = 1, . . . , N (2)
aT yi + b ≤ −1, i = 1, . . . , M
a QP in a, b
minimize 1T u + 1T v
subject to aT xi + b ≥ 1 − ui , i = 1, . . . , N, aT yi + b ≤ −1 + vi , i = 1, . . . , M
u 0, v 0
I an LP in a, b, u, v
I at optimum, ui = max{0, 1 − aT xi − b}, vi = max{0, 1 + aT yi + b}
I equivalent to minimizing the sum of violations of the original inequalities
I interpretations
– points are locations of plants or warehouses; fij is transportation cost between facilities i and
j
– points are locations of cells in an integrated circuit; fij represents wirelength
[figure: optimal placements for three choices of the edge cost fij ]
I histograms of edge lengths ‖xi − xj ‖2 , (i, j) ∈ E
[figure: histograms of edge lengths for the three placements]
B. Numerical linear algebra background
Outline
Block elimination
I there are good implementations of BLAS and variants (e.g., for sparse matrices)
I CPU single-thread speeds typically 1–10 Gflop/s (10^9 flops/sec)
I CPU multi-threaded speeds typically 10–100 Gflop/s
I GPU speeds typically 100 Gflop/s–1 Tflop/s (10^12 flops/sec)
Block elimination
I A ∈ Rn×n is invertible, b ∈ Rn
I solution of Ax = b is x = A−1 b
I it’s super useful to recognize matrix structure that can be exploited in solving Ax = b
x1 := b1 /a11
x2 := (b2 − a21 x1 )/a22
x3 := (b3 − a31 x1 − a32 x2 )/a33
..
.
xn := (bn − an1 x1 − an2 x2 − · · · − an,n−1 xn−1 )/ann
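The forward substitution recursion above translates directly to code; a minimal sketch, checked against a general solver:

```python
import numpy as np

def forward_subst(L, b):
    # solve Lx = b for lower-triangular L with nonzero diagonal (~n^2 flops)
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

rng = np.random.default_rng(0)
L = np.tril(rng.normal(size=(6, 6))) + 6 * np.eye(6)
b = rng.normal(size=6)
assert np.allclose(forward_subst(L, b), np.linalg.solve(L, b))
```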
permutation matrix: aij = 1 if j = 𝜋i , 0 otherwise
– interpretation: Ax = (x𝜋1 , . . . , x𝜋n )
– satisfies A−1 = AT , hence cost of solving Ax = b is 0 flops
– example:
A = [ 0 1 0 ; 0 0 1 ; 1 0 0 ], A−1 = AT = [ 0 0 1 ; 1 0 0 ; 0 1 0 ]
A = A1 A2 · · · Ak
A1 x1 = b, A2 x2 = x1 , ... Ak x = xk−1
I we wish to solve
Ax1 = b1 , Ax2 = b2 , ... Axm = bm
I when factorization cost dominates solve cost, we can solve a modest number of equations
at the same cost as one (!!)
I cost is usually much less than (2/3)n3 ; exact value depends in a complicated way on n,
number of zeros in A, sparsity pattern
I same as
– permuting rows and columns of A to get à = PT AP
– then finding Cholesky factorization of Ã
I cost is usually much less than (1/3)n3 ; exact value depends in a complicated way on n,
number of zeros in A, sparsity pattern
I reverse order of entries (i.e., permute) to get lower arrow sparsity pattern; the Cholesky factor L of à has the same lower arrow pattern (nonzeros only on the diagonal and in the last row)
L is sparse with O(n) nonzeros; cost of solve is O(n)
A = PLDLT PT
Block elimination
I express Ax = b in blocks as
[ A11 A12 ; A21 A22 ] [ x1 ; x2 ] = [ b1 ; b2 ]
I eliminate x1 : x1 = A11−1 (b1 − A12 x2 )
I to compute x2 , solve
(A22 − A21 A11−1 A12 ) x2 = b2 − A21 A11−1 b1
I S = A22 − A21 A11−1 A12 is the Schur complement
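The elimination steps above can be sketched as code and checked against a direct solve of the full system (block sizes and data below are arbitrary):

```python
import numpy as np

def solve_block(A11, A12, A21, A22, b1, b2):
    # block elimination: form the Schur complement S = A22 - A21 A11^{-1} A12,
    # solve S x2 = b2 - A21 A11^{-1} b1, then x1 = A11^{-1}(b1 - A12 x2)
    A11_inv_b1 = np.linalg.solve(A11, b1)
    A11_inv_A12 = np.linalg.solve(A11, A12)
    S = A22 - A21 @ A11_inv_A12
    x2 = np.linalg.solve(S, b2 - A21 @ A11_inv_b1)
    x1 = np.linalg.solve(A11, b1 - A12 @ x2)
    return x1, x2

rng = np.random.default_rng(0)
A11, A12 = rng.normal(size=(4, 4)) + 4 * np.eye(4), rng.normal(size=(4, 2))
A21, A22 = rng.normal(size=(2, 4)), rng.normal(size=(2, 2)) + 4 * np.eye(2)
b1, b2 = rng.normal(size=4), rng.normal(size=2)

x1, x2 = solve_block(A11, A12, A21, A22, b1, b2)
x = np.linalg.solve(np.block([[A11, A12], [A21, A22]]), np.concatenate([b1, b2]))
assert np.allclose(np.concatenate([x1, x2]), x)
```

The payoff comes when A11 has structure (diagonal, banded, sparse) so the solves with A11 are cheap.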
I to solve (A + BC)x = b, express it as
[ A B ; C −I ] [ x ; y ] = [ b ; 0 ]
I eliminate x: solve (I + CA−1 B) y = CA−1 b for y, then solve Ax = b − By
I this proves the matrix inversion lemma: if A and A + BC are nonsingular,
(A + BC) −1 = A−1 − A−1 B(I + CA−1 B) −1 CA−1
Newton’s method
Self-concordant functions
Implementation
minimize f (x)
I we assume
– f convex, twice continuously differentiable (hence dom f open)
– optimal value p★ = inf x f (x) is attained at x★ (not necessarily unique)
∇f (x) = Px + q = 0
f (x) − p★ ≤ (1/(2m)) ‖∇f (x)‖22
I useful as stopping criterion (if you know m, which usually you do not)
[figure: backtracking line search on f (x + tΔx)]
example: gradient descent on a quadratic with condition number 𝛾
– very slow if 𝛾 ≫ 1 or 𝛾 ≪ 1
– called zig-zagging
[figure: iterates x (0) , x (1) , . . . for 𝛾 = 10]
[figure: gradient descent iterates and convergence of f (x (k) ) − p★ with backtracking and exact line search]
[figure: normalized steepest descent directions Δxnsd for two quadratic norms, shown with −∇f (x); iterates x (0) , x (1) , x (2) ]
I steepest descent with backtracking line search for two quadratic norms
I ellipses show {x | kx − x (k) k P = 1}
I interpretation of steepest descent with quadratic norm k · k P : gradient descent after
change of variables x̄ = P1/2 x
I shows choice of P has strong effect on speed of convergence
second-order approximation:
f̂ (x + v) = f (x) + ∇f (x) T v + (1/2) vT ∇2 f (x)v
I x + Δxnt minimizes the second-order approximation f̂
[figure: f and f̂ ; Newton step from (x, f (x)) to (x + Δxnt , f (x + Δxnt ))]
I Δxnt solves the linearized optimality condition
∇f (x + v) ≈ ∇ f̂ (x + v) = ∇f (x) + ∇2 f (x)v = 0
[figure: f ′ and its linearization at (x, f ′ (x))]
I Δxnt is steepest descent direction at x in local Hessian norm ‖u‖∇2 f (x) = (uT ∇2 f (x)u) 1/2
[figure: dashed ellipsoid {x + v | vT ∇2 f (x)v = 1}, with x + Δxnsd and x + Δxnt ]
number of iterations until f (x) − p★ ≤ 𝜖 is bounded by
(f (x (0) ) − p★)/𝛾 + log2 log2 (𝜖0 /𝜖)
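A compact sketch of Newton's method with backtracking line search, on an illustrative strongly convex test function (the function, the values of 𝛼 and 𝛽, and the problem sizes are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)

def f(x):
    # f(x) = log sum_k exp(a_k'x + b_k) + (1/2)||x||^2  (strongly convex)
    return np.logaddexp.reduce(A @ x + b) + 0.5 * x @ x

def grad_hess(x):
    w = A @ x + b
    z = np.exp(w - np.logaddexp.reduce(w))   # softmax weights, sum to 1
    g_lse = A.T @ z
    H = A.T @ (A * z[:, None]) - np.outer(g_lse, g_lse) + np.eye(len(x))
    return g_lse + x, H

x, alpha, beta = np.zeros(5), 0.1, 0.7
for _ in range(50):
    g, H = grad_hess(x)
    dx = -np.linalg.solve(H, g)              # Newton step
    lam2 = -g @ dx                           # Newton decrement squared
    if lam2 / 2 < 1e-12:
        break                                # stopping criterion
    t = 1.0                                  # backtracking line search
    while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
        t *= beta
    x = x + t * dx

assert lam2 / 2 < 1e-12
```

The regularizing ‖x‖²/2 term guarantees a nonsingular Hessian and a finite minimizer, so the loop reaches the stopping criterion in a handful of iterations.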
[figures: Newton's method examples in R2 , R100 (exact versus backtracking line search), and R10000 ; plots of f (x (k) ) − p★ show quadratic convergence]
definition
I convex f : R → R is self-concordant if |f ′′′ (x)| ≤ 2f ′′ (x) 3/2 for all x ∈ dom f
I f : Rn → R is self-concordant if g(t) = f (x + tv) is self-concordant for all x ∈ dom f , v ∈ Rn
examples on R
I linear and quadratic functions
I negative logarithm f (x) = − log x
I negative entropy plus negative logarithm: f (x) = x log x − log x
properties
I preserved under positive scaling 𝛼 ≥ 1, and sum
I preserved under composition with affine function
I if g is convex with dom g = R++ and |g′′′ (x)| ≤ 3g′′ (x)/x, then
f (x) = log(−g(x)) − log x
is self-concordant (on {x | x > 0, g(x) < 0})
examples: properties can be used to show that the following are s.c.
I f (x) = − Σi=1..m log(bi − aiT x) on {x | aiT x < bi , i = 1, . . . , m}
iteration bound for Newton's method on strictly convex self-concordant f :
(f (x (0) ) − p★)/𝛾 + log2 log2 (1/𝜖)
[figure: number of iterations versus f (x (0) ) − p★]
main effort in each iteration: evaluate derivatives and solve Newton system
HΔx = −g
minimize f (x)
subject to Ax = b
I we assume
– f convex, twice continuously differentiable
– A ∈ Rp×n with rank A = p
– p★ is finite and attained
optimality conditions: ∇f(x★) + Aᵀ𝜈★ = 0, Ax★ = b
I for an equality constrained convex quadratic problem, the optimality conditions are a set of linear equations, the KKT system:

    [ P  Aᵀ ] [ x★ ]   [ −q ]
    [ A  0  ] [ 𝜈★ ] = [  b ]

I the KKT matrix is nonsingular if and only if

    Ax = 0, x ≠ 0 =⇒ xᵀPx > 0
I example: for the constraint 1ᵀx = b, one can take

    x̂ = b eₙ,   F = [ I ; −1ᵀ ] ∈ R^{n×(n−1)}

(I is the (n−1)×(n−1) identity; every solution of 1ᵀx = b has the form Fz + x̂)
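A quick numpy check of this parametrization, with n and b chosen arbitrarily:

```python
import numpy as np

n, bval = 5, 3.0
# parametrize {x | 1^T x = bval} as {F z + xhat : z in R^(n-1)}
F = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])   # columns span null(1^T)
xhat = np.zeros(n)
xhat[-1] = bval                                        # 1^T xhat = bval
z = np.random.default_rng(1).standard_normal(n - 1)
x = F @ z + xhat
assert abs(np.sum(x) - bval) < 1e-12                   # 1^T x = bval
```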
I Newton step Δx_nt of f at feasible x is given by the solution v of

    [ ∇²f(x)  Aᵀ ] [ v ]   [ −∇f(x) ]
    [ A       0  ] [ w ] = [    0    ]
I 𝜆(x)²/2 estimates the suboptimality with respect to f̂:

    f(x) − inf { f̂(y) | Ay = b } = 𝜆(x)²/2

I directional derivative in the Newton direction:

    (d/dt) f(x + t Δx_nt) |_{t=0} = −𝜆(x)²
I in general, 𝜆(x) ≠ (∇f(x)ᵀ ∇²f(x)⁻¹ ∇f(x))^(1/2)
I iterates of Newton’s method with equality constraints, started at x (0) = Fz (0) + x̂, are
x (k) = Fz (k) + x̂
I r(x, 𝜈) = (∇f(x) + Aᵀ𝜈, Ax − b) is the primal-dual residual
I primal-dual Newton step is defined by

    [ ∇²f(x)  Aᵀ ] [ Δx_nt ]     [ ∇f(x) + Aᵀ𝜈 ]
    [ A       0  ] [ Δ𝜈_nt ] = − [ Ax − b       ]
given starting point x ∈ dom f , 𝜈, tolerance 𝜖 > 0, 𝛼 ∈ (0, 1/2), 𝛽 ∈ (0, 1).
repeat
1. Compute primal and dual Newton steps Δxnt , Δ𝜈nt .
2. Backtracking line search on krk 2 .
t := 1.
while kr(x + tΔxnt , 𝜈 + tΔ𝜈nt )k 2 > (1 − 𝛼t)kr(x, 𝜈)k 2 , t := 𝛽t.
3. Update. x := x + tΔxnt , 𝜈 := 𝜈 + tΔ𝜈nt .
until Ax = b and kr(x, 𝜈)k 2 ≤ 𝜖.
I Δy = (Δx_nt, Δ𝜈_nt) is a descent direction for ‖r‖₂:

    (d/dt) ‖r(y + t Δy)‖₂ |_{t=0} = −‖r(y)‖₂
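A minimal numpy sketch of the algorithm above, applied to the analytic centering problem minimize −Σᵢ log xᵢ subject to Ax = b. The data is random except for a row of ones in A, added so that the feasible set is bounded and the minimum is attained:

```python
import numpy as np

def infeasible_newton(A, b, x, nu, alpha=0.1, beta=0.5, tol=1e-10):
    """Infeasible start Newton method for
    minimize -sum(log x) s.t. A x = b (needs x > 0, but not A x = b)."""
    p, n = A.shape
    def r(x, nu):   # primal-dual residual; grad f = -1/x, hess f = diag(1/x^2)
        return np.concatenate([-1.0 / x + A.T @ nu, A @ x - b])
    for _ in range(100):
        res = r(x, nu)
        if np.linalg.norm(res) <= tol:
            break
        K = np.block([[np.diag(1.0 / x**2), A.T],
                      [A, np.zeros((p, p))]])
        d = np.linalg.solve(K, -res)           # KKT system for (dx, dnu)
        dx, dnu = d[:n], d[n:]
        t = 1.0     # backtracking on ||r||_2, keeping x + t dx > 0
        while (np.any(x + t * dx <= 0) or
               np.linalg.norm(r(x + t * dx, nu + t * dnu))
               > (1 - alpha * t) * np.linalg.norm(res)):
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
    return x, nu

rng = np.random.default_rng(0)
# row of ones keeps the feasible set bounded, so the minimum is attained
A = np.vstack([np.ones((1, 8)), rng.standard_normal((2, 8))])
b = A @ np.full(8, 2.0)                        # a positive feasible point exists
x, nu = infeasible_newton(A, b, np.ones(8), np.zeros(3))
assert np.linalg.norm(A @ x - b) < 1e-8        # primal feasible at convergence
assert np.allclose(1.0 / x, A.T @ nu)          # grad f(x) + A^T nu = 0
```

Note that once a full step t = 1 is accepted, Ax = b holds exactly and all later iterates stay feasible.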
I Newton steps require solving KKT systems of the form

    [ H  Aᵀ ] [ v ]     [ g ]
    [ A  0  ] [ w ] = − [ h ]
example: equality constrained analytic centering
I primal problem: minimize −Σᵢ₌₁ⁿ log xᵢ subject to Ax = b
I dual problem: maximize −bᵀ𝜈 + Σᵢ₌₁ⁿ log(Aᵀ𝜈)ᵢ + n
  – recover x★ as x★ᵢ = 1/(Aᵀ𝜈★)ᵢ
[figure: f(x^(k)) − p★ versus k]
Convex Optimization Boyd and Vandenberghe 10.17
Newton method applied to dual problem
I requires Aᵀ𝜈^(0) ≻ 0
[figure: p★ − g(𝜈^(k)) versus k]
Infeasible start Newton method
I requires x^(0) ≻ 0
[figure: residual norm versus k for the infeasible start Newton method]
Complexity per iteration of the three methods is identical
I for the feasible Newton method, use block elimination to solve the KKT system

    [ diag(x)⁻²  Aᵀ ] [ Δx ]   [ diag(x)⁻¹ 1 ]
    [ A          0  ] [ w  ] = [      0      ]

I for the infeasible start Newton method, the system is

    [ diag(x)⁻²  Aᵀ ] [ Δx ]   [ diag(x)⁻¹ 1 − Aᵀ𝜈 ]
    [ A          0  ] [ Δ𝜈 ] = [       b − Ax       ]
I network flow optimization problem: minimize Σᵢ₌₁ⁿ 𝜙ᵢ(xᵢ) subject to Ax = b
I KKT system is

    [ H  Aᵀ ] [ v ]     [ g ]
    [ A  0  ] [ w ] = − [ h ]

I H = diag(𝜙₁″(x₁), …, 𝜙ₙ″(xₙ)), positive diagonal
I solve via elimination:

    A H⁻¹ Aᵀ w = h − A H⁻¹ g,   v = −H⁻¹ (g + Aᵀ w)
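The elimination formulas can be verified against a direct solve of the full KKT system. A small numpy sketch with arbitrary data and a positive diagonal H:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
h_diag = rng.uniform(1.0, 2.0, n)       # positive diagonal of H
H = np.diag(h_diag)
A = rng.standard_normal((p, n))
g = rng.standard_normal(n)
h = rng.standard_normal(p)

# block elimination: A H^{-1} A^T w = h - A H^{-1} g,  v = -H^{-1}(g + A^T w)
S = A @ (A / h_diag).T                  # A H^{-1} A^T  (p x p Schur complement)
w = np.linalg.solve(S, h - A @ (g / h_diag))
v = -(g + A.T @ w) / h_diag

# check against a direct solve of [H A^T; A 0][v; w] = -[g; h]
K = np.block([[H, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, -np.concatenate([g, h]))
assert np.allclose(np.concatenate([v, w]), sol)
```

Since H is diagonal, H⁻¹ is applied by elementwise division, so the dominant cost is forming and solving the small p × p Schur complement.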
I eliminate ΔX from the first equation to get ΔX = X − Σⱼ₌₁ᵖ wⱼ X Aⱼ X
I form and solve this set of equations to get w, then get ΔX from the equation above
I form p(p + 1)/2 inner products tr((Lᵀ Aᵢ L)(Lᵀ Aⱼ L)) to get the coefficient matrix; cost (1/2)p²n² flops
Barrier method
Phase I methods
Complexity analysis
Generalized inequalities
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m
Ax = b
we assume
I fi convex, twice continuously differentiable
I A ∈ Rp×n with rank A = p
I p★ is finite and attained
I problem is strictly feasible: there exists x̃ with fᵢ(x̃) < 0, i = 1, …, m, and Ax̃ = b
I SDPs and SOCPs are better handled as problems with generalized inequalities (see later)
Logarithmic barrier and central path
[figure: logarithmic barrier approximations −(1/t) log(−u) of the indicator function, plotted versus u]
(for now, assume x★ (t) exists and is unique for each t > 0)
I central path is {x★ (t) | t > 0}
minimize cT x
subject to aTi x ≤ bi , i = 1, . . . , 6
[figure: central path for an LP in R², showing x★(10)]
I hyperplane cᵀx = cᵀx★(t) is tangent to the level curve of 𝜙 through x★(t)
where we define 𝜆★ᵢ(t) = 1/(−t fᵢ(x★(t))) and 𝜈★(t) = w/t
I this confirms the intuitive idea that f0(x★(t)) → p★ as t → ∞:

    p★ ≥ g(𝜆★(t), 𝜈★(t)) = L(x★(t), 𝜆★(t), 𝜈★(t)) = f0(x★(t)) − m/t
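For a tiny LP the m/t bound can be checked in closed form. For minimize x subject to −1 ≤ x ≤ 1 (so p★ = −1 and m = 2), setting the derivative of t·x + 𝜙(x) to zero gives a quadratic whose feasible root is x★(t) = (1 − √(1 + t²))/t:

```python
import math

# minimize x subject to -1 <= x <= 1:  p* = -1, m = 2 inequalities
# x*(t) minimizes t*x - log(1 - x) - log(1 + x); the derivative is
# t + 2x/(1 - x^2), and solving t*x^2 - 2x - t = 0 for the root in (-1, 1):
def x_star(t):
    return (1.0 - math.sqrt(1.0 + t * t)) / t

p_star = -1.0
for t in [1.0, 10.0, 100.0, 1000.0]:
    x = x_star(t)
    assert -1.0 < x < 1.0                    # strictly feasible
    assert (x - p_star) <= 2.0 / t + 1e-12   # f0(x*(t)) - p* <= m/t
```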
where Hi = {x | aTi x = bi }
[figure: objective force −c for t = 1 and −3c for t = 3, balanced by the constraint forces at x★(t)]
Barrier method
I terminates with f0 (x) − p★ ≤ 𝜖 (stopping criterion follows from f0 (x★ (t)) − p★ ≤ m/t)
I centering usually done using Newton’s method, starting at current x
I choice of 𝜇 involves a trade-off: large 𝜇 means fewer outer iterations, more inner
(Newton) iterations; typical values: 𝜇 = 10 or 20
I several heuristics for choice of t (0)
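A compact (and unoptimized) sketch of the whole method for a small inequality-form LP, with Newton centering as the inner loop; the box-constrained test problem and all parameter values are illustrative:

```python
import numpy as np

def centering(c, A, b, x, t, alpha=0.1, beta=0.5, tol=1e-8):
    """Newton's method for minimize t*c^T x + phi(x),
    with log barrier phi(x) = -sum(log(b - A x))."""
    def f(z):
        s = b - A @ z
        return np.inf if np.any(s <= 0) else t * (c @ z) - np.sum(np.log(s))
    for _ in range(100):
        d = 1.0 / (b - A @ x)                  # inverse slacks
        grad = t * c + A.T @ d
        hess = A.T @ (A * (d ** 2)[:, None])
        dx = -np.linalg.solve(hess, grad)
        lam2 = -grad @ dx                      # Newton decrement squared
        if lam2 / 2 <= tol:
            break
        step = 1.0                             # backtracking line search
        while f(x + step * dx) > f(x) - alpha * step * lam2:
            step *= beta
        x = x + step * dx
    return x

def barrier(c, A, b, x0, mu=20.0, t=1.0, eps=1e-5):
    x, m = x0, A.shape[0]
    while True:
        x = centering(c, A, b, x, t)           # outer iteration: re-center
        if m / t < eps:                        # duality gap bound m/t
            return x
        t *= mu

# illustrative LP: minimize c^T x over the box -1 <= x_i <= 1
c = np.array([1.0, -2.0, 0.5, -1.0])
n = c.size
A = np.vstack([np.eye(n), -np.eye(n)])
b = np.ones(2 * n)
x = barrier(c, A, b, np.zeros(n))
# a linear function over the box is minimized at the vertex x_i = -sign(c_i)
assert np.allclose(x, -np.sign(c), atol=1e-3)
```

Each outer iteration warm-starts Newton's method at the previous center, which is why only a handful of inner steps are needed per value of t.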
[figure: LP example — duality gap versus total Newton iterations for 𝜇 = 2, 50, 150, and total Newton iterations required versus 𝜇]
[figure: second example — duality gap versus Newton iterations for 𝜇 = 2, 50, 150]
Family of standard LPs
minimize cᵀx
subject to Ax = b, x ⪰ 0

with A ∈ R^{m×2m}
m = 10, . . . , 1000; for each m, solve 100 randomly generated instances
[figure: Newton iterations versus m, on a logarithmic scale]
number of iterations grows very slowly as m ranges over a 100 : 1 ratio
Phase I methods
goal of phase I: find x satisfying fᵢ(x) < 0, i = 1, …, m, and Ax = b
I (like the infeasible start Newton method, more sophisticated interior-point methods do not
require a feasible starting point)
I phase I method forms an optimization problem that
– is itself strictly feasible
– finds a strictly feasible point for original problem, if one exists
– certifies original problem as infeasible otherwise
I phase II uses barrier method starting from strictly feasible point found in phase I
minimize (over x, s) s
subject to fi (x) ≤ s, i = 1, . . . , m
Ax = b
minimize 1T s
subject to s ⪰ 0, fi (x) ≤ si , i = 1, . . . , m
Ax = b
I for infeasible problems, produces a solution that satisfies many (but not all) inequalities
[figure: histograms of the constraint margins bᵢ − aᵢᵀx_max and bᵢ − aᵢᵀx_sum]
[figure: number of Newton iterations versus the feasibility parameter 𝛾, for infeasible and feasible instances]
Complexity analysis
I number of outer (centering) iterations:

    ⌈ log(m/(𝜖 t^(0))) / log 𝜇 ⌉
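A one-line check of this count, with values chosen arbitrarily:

```python
import math

# smallest N with m / (t0 * mu^N) <= eps, i.e. ceil(log(m/(eps*t0)) / log(mu))
def outer_iterations(m, t0, mu, eps):
    return math.ceil(math.log(m / (eps * t0)) / math.log(mu))

assert outer_iterations(m=100, t0=1.0, mu=20.0, eps=1e-6) == 7
```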
I we will bound number of Newton steps per centering iteration using self-concordance
analysis
second condition
I holds for LP, QP, QCQP
I may require reformulating the problem, e.g.,

    minimize Σᵢ₌₁ⁿ xᵢ log xᵢ      →   minimize Σᵢ₌₁ⁿ xᵢ log xᵢ
    subject to Fx ⪯ g                  subject to Fx ⪯ g, x ⪰ 0
I needed for complexity analysis; barrier method works even when self-concordance
assumption does not apply
    = 𝜇t f0(x) − 𝜇t L(x⁺, 𝜆, 𝜈) − m − m log 𝜇
    ≤ 𝜇t f0(x) − 𝜇t g(𝜆, 𝜈) − m − m log 𝜇
    = m(𝜇 − 1 − log 𝜇)

using L(x⁺, 𝜆, 𝜈) ≥ g(𝜆, 𝜈) in the second line and f0(x) − g(𝜆, 𝜈) = m/t in the last line
[figure: N versus 𝜇 for typical values of 𝛾, c; m = 100, m/(t^(0)𝜖) = 10⁵]
I confirms trade-off in choice of 𝜇
I in practice, #iterations is in the tens; not very sensitive for 𝜇 ≥ 10
I for 𝜇 = 1 + 1/√m:

    N = O( √m log(m/(t^(0)𝜖)) )

I number of Newton iterations for a fixed gap reduction is O(√m)
I multiply by the cost of one Newton iteration (a polynomial function of the problem dimensions) to get a bound on the number of flops
I this choice of 𝜇 optimizes worst-case complexity; in practice we choose 𝜇 fixed and larger
Generalized inequalities
minimize f0 (x)
subject to fi (x) ⪯_{Ki} 0, i = 1, . . . , m
Ax = b
I we assume
– fi twice continuously differentiable
– A ∈ Rp×n with rank A = p
– p★ is finite and attained
– problem is strictly feasible; hence strong duality holds and dual optimum is attained
examples
I nonnegative orthant K = Rⁿ₊: 𝜓(y) = Σᵢ₌₁ⁿ log yᵢ, with degree 𝜃 = n
I second-order cone K = {y ∈ R^{n+1} | ‖(y₁, …, yₙ)‖₂ ≤ y_{n+1}}:

    𝜓(y) = log(y²_{n+1} − y₁² − · · · − yₙ²), with degree 𝜃 = 2
properties: for y ≻_K 0,

    ∇𝜓(y) ⪰_{K★} 0,   yᵀ ∇𝜓(y) = 𝜃

I nonnegative orthant Rⁿ₊: 𝜓(y) = Σᵢ₌₁ⁿ log yᵢ, so ∇𝜓(y) = (1/y₁, …, 1/yₙ)
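The identity yᵀ∇𝜓(y) = 𝜃 is easy to verify numerically for the second-order cone logarithm; the gradient below is computed by hand and the test point is arbitrary:

```python
import numpy as np

# psi(y) = log(y_{n+1}^2 - y_1^2 - ... - y_n^2) on the interior of the
# second-order cone, with degree theta = 2
def grad_psi(y):
    s = y[-1] ** 2 - np.sum(y[:-1] ** 2)
    g = -2.0 * y / s                          # d psi / d y_i = -2 y_i / s
    g[-1] = 2.0 * y[-1] / s                   # d psi / d y_{n+1} = 2 y_{n+1}/s
    return g

y = np.array([0.3, -0.5, 0.2, 2.0])          # interior point: ||y[:3]|| < y[3]
g = grad_psi(y)
assert abs(y @ g - 2.0) < 1e-12              # y^T grad psi(y) = theta = 2
assert np.linalg.norm(g[:-1]) < g[-1]        # grad in the (self-dual) cone
```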
I points on the central path yield dual feasible points:

    𝜆★ᵢ(t) = (1/t) ∇𝜓ᵢ(−fᵢ(x★(t))),   𝜈★(t) = w/t

I from the properties of 𝜓ᵢ: 𝜆★ᵢ(t) ≻_{Kᵢ★} 0, with duality gap

    f0(x★(t)) − g(𝜆★(t), 𝜈★(t)) = (1/t) Σᵢ₌₁ᵐ 𝜃ᵢ
I dual point on central path: Z★(t) = −(1/t) F(x★(t))⁻¹ is feasible for

    maximize tr(GZ)
    subject to tr(Fᵢ Z) + cᵢ = 0, i = 1, …, n
               Z ⪰ 0
I number of outer iterations: ⌈ log((Σᵢ 𝜃ᵢ)/(𝜖 t^(0))) / log 𝜇 ⌉
I complexity analysis via self-concordance applies to SDP, SOCP
[figure: duality gap versus Newton iterations for 𝜇 = 2, 50, 200, and total Newton iterations versus 𝜇]
[figure: duality gap versus Newton iterations for 𝜇 = 2, 50, 150]
30
25
20
15
101 102 103
n
Convex Optimization Boyd and Vandenberghe 11.41
Primal-dual interior-point methods
I update primal and dual variables at each iteration; no distinction between inner and outer iterations
I often exhibit superlinear asymptotic convergence
I search directions can be interpreted as Newton directions for modified KKT conditions
mathematical optimization
I problems in engineering design, data analysis and statistics, economics, management, …,
can often be expressed as mathematical optimization problems
I techniques exist to take into account multiple objectives or uncertainty in the data
tractability
I roughly speaking, tractability in optimization requires convexity
I algorithms for nonconvex optimization find local (suboptimal) solutions, or are very
expensive
I surprisingly many applications can be formulated as convex problems