
Convexity II: Optimization Basics

Lecturer: Ryan Tibshirani


Convex Optimization 10-725/36-725

See supplements for reviews of


• basic multivariate calculus
• basic linear algebra
Last time: convex sets and functions

"Convex calculus" makes it easy to check convexity. Tools:

• Definitions of convex sets and functions, classic examples
• Key properties (e.g., first- and second-order characterizations for functions)
• Operations that preserve convexity (e.g., affine composition)

E.g., is max{ log( 1 / (a^T x + b)^7 ), ||Ax + b||_1^5 } convex?

2
Outline

Today:
• Optimization terminology
• Properties and first-order optimality
• Equivalent transformations

3
Optimization terminology
Reminder: a convex optimization problem (or program) is

min_{x∈D}  f(x)
subject to gi(x) ≤ 0, i = 1, . . . m
           Ax = b

where f and gi, i = 1, . . . m are all convex, and the optimization
domain is D = dom(f) ∩ ∩_{i=1}^m dom(gi) (often we do not write D)
• f is called criterion or objective function
• gi is called inequality constraint function
• If x ∈ D, gi (x) ≤ 0, i = 1, . . . m, and Ax = b then x is called
a feasible point
• The minimum of f (x) over all feasible points x is called the
optimal value, written f*

4
• If x is feasible and f(x) = f*, then x is called optimal; also
called a solution, or a minimizer¹
• If x is feasible and f(x) ≤ f* + ε, then x is called ε-suboptimal
• If x is feasible and gi (x) = 0, then we say gi is active at x
• Convex minimization can be reposed as concave maximization

min_x  f(x)
subject to gi(x) ≤ 0, i = 1, . . . m
           Ax = b
⇐⇒
max_x  −f(x)
subject to gi(x) ≤ 0, i = 1, . . . m
           Ax = b

Both are called convex optimization problems

¹ Note: a convex optimization problem need not have solutions, i.e., need
not attain its minimum, but we will not be careful about this
5
Convex solution sets
Let Xopt be the set of all solutions of a convex problem, written
Xopt = argmin f (x)
subject to gi (x) ≤ 0, i = 1, . . . m
Ax = b
Key property: Xopt is a convex set

Proof: use definitions. If x, y are solutions, then for 0 ≤ t ≤ 1,


• tx + (1 − t)y ∈ D
• gi (tx + (1 − t)y) ≤ tgi (x) + (1 − t)gi (y) ≤ 0
• A(tx + (1 − t)y) = tAx + (1 − t)Ay = b
• f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y) = f*
Therefore tx + (1 − t)y is also a solution

Another key property: if f is strictly convex, then the solution is
unique, i.e., Xopt contains one element
6
Example: lasso
Given y ∈ Rn , X ∈ Rn×p , consider the lasso problem:

min_β  ||y − Xβ||_2^2
subject to ||β||_1 ≤ s

Is this convex? What is the criterion function? The inequality and
equality constraints? Feasible set? Is the solution unique, when:
• n ≥ p and X has full column rank?
• p > n (“high-dimensional” case)?

How do our answers change if we changed the criterion to the Huber loss:

Σ_{i=1}^n ρ(yi − xi^T β),   where ρ(z) = (1/2) z^2 if |z| ≤ δ, and δ|z| − (1/2) δ^2 otherwise?

7
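To make these questions concrete, here is a minimal cvxpy sketch of the constrained lasso and its Huber variant; the data, the bound s, and the threshold δ below are made up for illustration:

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, p, s = 50, 20, 1.0              # made-up sizes and l1 budget
X = np.random.randn(n, p)
y = np.random.randn(n)

beta = cp.Variable(p)
lasso = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta)),
                   [cp.norm1(beta) <= s])
lasso.solve()

# Huber variant: cvxpy's huber(z, M) is z^2 for |z| <= M and
# 2*M*|z| - M^2 otherwise, i.e., twice the rho above with delta = M
delta = 1.0
hub = cp.Problem(cp.Minimize(cp.sum(0.5 * cp.huber(y - X @ beta, delta))),
                 [cp.norm1(beta) <= s])
hub.solve()
```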
Example: support vector machines
Given y ∈ {−1, 1}n , X ∈ Rn×p with rows x1 , . . . xn , consider the
support vector machine or SVM problem:
min_{β,β0,ξ}  (1/2) ||β||_2^2 + C Σ_{i=1}^n ξi
subject to ξi ≥ 0, i = 1, . . . n
           yi(xi^T β + β0) ≥ 1 − ξi, i = 1, . . . n

Is this convex? What are the criterion, constraints, and feasible set? Is
the solution (β, β0, ξ) unique? What if we changed the criterion to

(1/2) ||β||_2^2 + (1/2) β0^2 + C Σ_{i=1}^n ξi^{1.01} ?

For the original criterion, what about the β component at the solution?

8
Local minima are global minima
For a convex problem, a feasible point x is called locally optimal if
there is some R > 0 such that

f(x) ≤ f(y) for all feasible y such that ||x − y||_2 ≤ R

Reminder: for convex optimization problems, local optima are global optima



Proof simply follows from definitions

[Figure: a convex function, where any local minimum is global, vs. a
nonconvex function with a local minimum that is not global]

9
Rewriting constraints
The optimization problem

min_x  f(x)
subject to gi(x) ≤ 0, i = 1, . . . m
           Ax = b

can be rewritten as

min_x  f(x) subject to x ∈ C

where C = {x : gi (x) ≤ 0, i = 1, . . . m, Ax = b}, the feasible set.


Hence the above formulation is completely general

With I_C the indicator function of C, we can write this in unconstrained form

min_x  f(x) + I_C(x)

10
First-order optimality condition
For a convex problem

min_x  f(x) subject to x ∈ C

and differentiable f , a feasible point x is optimal if and only if

∇f(x)^T (y − x) ≥ 0 for all y ∈ C

This is called the first-order condition for optimality

In words: all feasible directions from x are aligned with gradient ∇f(x)

Important special case: if C = R^n (unconstrained optimization), then
the optimality condition reduces to the familiar ∇f(x) = 0

11
Example: quadratic minimization
Consider minimizing the quadratic function
f(x) = (1/2) x^T Q x + b^T x + c

where Q ⪰ 0. The first-order condition says that the solution satisfies

∇f(x) = Qx + b = 0

Cases:
• if Q ≻ 0, then there is a unique solution x = −Q⁻¹ b
• if Q is singular and b ∉ col(Q), then there is no solution (i.e.,
min_x f(x) = −∞)
• if Q is singular and b ∈ col(Q), then there are infinitely many
solutions
x = −Q⁺ b + z,  z ∈ null(Q)
where Q⁺ is the pseudoinverse of Q
12
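A small numpy check of the three cases, with made-up matrices:

```python
import numpy as np

# Case 1: Q positive definite -> unique solution x = -Q^{-1} b
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x = -np.linalg.solve(Q, b)
print(np.allclose(Q @ x + b, 0))    # True: gradient vanishes

# Case 2: Q singular, b in col(Q) -> solutions -Q^+ b + null(Q)
Q = np.diag([1.0, 0.0])
b = np.array([2.0, 0.0])            # lies in col(Q)
x = -np.linalg.pinv(Q) @ b
print(np.allclose(Q @ x + b, 0))    # True; adding any z with Qz = 0 also works

# Case 3: Q singular, b not in col(Q), e.g. b = (2, 1): along x = (0, -t),
# f(x) = -t -> -infinity as t grows, so no minimizer exists
```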
Example: equality-constrained minimization
Consider the equality-constrained convex problem:

min_x  f(x) subject to Ax = b

with f differentiable. Let's prove the Lagrange multiplier optimality condition:

∇f(x) + A^T u = 0 for some u
According to first-order optimality, solution x satisfies Ax = b and

∇f(x)^T (y − x) ≥ 0 for all y such that Ay = b

This is equivalent to

∇f(x)^T v = 0 for all v ∈ null(A)

Result follows because null(A)⊥ = row(A)

13
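A numeric sanity check of this condition, taking f(x) = ||x||_2^2 / 2 (so ∇f(x) = x) and a made-up A and b:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# for f(x) = ||x||^2/2, the solution is the minimum-norm point with Ax = b
x = np.linalg.pinv(A) @ b
u = -np.linalg.solve(A @ A.T, b)      # candidate Lagrange multiplier

print(np.allclose(A @ x, b))          # feasibility
print(np.allclose(x + A.T @ u, 0))    # grad f(x) + A^T u = 0
```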
Example: projection onto a convex set
Consider projection onto convex set C:

min_x  ||a − x||_2^2 subject to x ∈ C

First-order optimality condition says that the solution x satisfies

∇f(x)^T (y − x) = (x − a)^T (y − x) ≥ 0 for all y ∈ C

Equivalently, this says that

a − x ∈ N_C(x)

where recall N_C(x) is the normal cone to C at x

14
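A quick empirical check, projecting onto the box C = [0, 1]^n (an example set chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
a = 2 * rng.standard_normal(n)
x = np.clip(a, 0.0, 1.0)        # Euclidean projection of a onto C

# first-order condition: (x - a)^T (y - x) >= 0 for every y in C
Y = rng.uniform(0.0, 1.0, size=(1000, n))   # random points of C
print(np.all((Y - x) @ (x - a) >= -1e-12))  # True, up to roundoff
```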
Partial optimization

Reminder: g(x) = min_{y∈C} f(x, y) is convex in x, provided that f


is convex in (x, y) and C is a convex set

Therefore we can always partially optimize a convex problem and retain convexity

E.g., if we decompose x = (x1, x2) ∈ R^{n1+n2}, then

min_{x1,x2}  f(x1, x2)
subject to g1(x1) ≤ 0, g2(x2) ≤ 0
⇐⇒
min_{x1}  f̃(x1)
subject to g1(x1) ≤ 0

where f̃(x1) = min{f(x1, x2) : g2(x2) ≤ 0}. The right problem is
convex if the left problem is

15
Example: hinge form of SVMs
Recall the SVM problem
min_{β,β0,ξ}  (1/2) ||β||_2^2 + C Σ_{i=1}^n ξi
subject to ξi ≥ 0, yi(xi^T β + β0) ≥ 1 − ξi, i = 1, . . . n

Rewrite the constraints as ξi ≥ max{0, 1 − yi(xi^T β + β0)}. Indeed
we can argue that we have equality at the solution

Therefore plugging in for optimal ξ gives the hinge form of SVMs:


min_{β,β0}  (1/2) ||β||_2^2 + C Σ_{i=1}^n [1 − yi(xi^T β + β0)]_+

where a_+ = max{0, a} is called the hinge function

16
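A cvxpy sketch of the hinge form on made-up data; cp.pos plays the role of the hinge function a_+:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, C = 60, 4, 1.0
X = rng.standard_normal((n, p))
y = np.where(rng.standard_normal(n) > 0, 1.0, -1.0)   # labels in {-1, +1}

beta = cp.Variable(p)
beta0 = cp.Variable()
margins = cp.multiply(y, X @ beta + beta0)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(beta)
                              + C * cp.sum(cp.pos(1 - margins))))
prob.solve()
print(beta.value, beta0.value)
```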
Transformations and change of variables
If h : R → R is a monotone increasing transformation, then

min_x  f(x) subject to x ∈ C
⇐⇒ min_x  h(f(x)) subject to x ∈ C

Similarly, inequality or equality constraints can be transformed and
yield equivalent optimization problems. We can use this to reveal the
"hidden convexity" of a problem

If φ : R^n → R^m is one-to-one, and its image covers the feasible set C,
then we can change variables in an optimization problem:

min_x  f(x) subject to x ∈ C
⇐⇒ min_y  f(φ(y)) subject to φ(y) ∈ C

17
Example: geometric programming
A monomial is a function f : R^n_{++} → R of the form

f(x) = γ x1^{a1} x2^{a2} ··· xn^{an}

for γ > 0, a1, . . . an ∈ R. A posynomial is a sum of monomials,

f(x) = Σ_{k=1}^p γk x1^{ak1} x2^{ak2} ··· xn^{akn}

A geometric program is of the form

min_x  f(x)
subject to gi(x) ≤ 1, i = 1, . . . m
           hj(x) = 1, j = 1, . . . r

where f, gi, i = 1, . . . m are posynomials and hj, j = 1, . . . r are
monomials. This is nonconvex
18
Let’s prove that a geometric program is equivalent to a convex one.
Given f(x) = γ x1^{a1} x2^{a2} ··· xn^{an}, let yi = log xi and rewrite this as

γ (e^{y1})^{a1} (e^{y2})^{a2} ··· (e^{yn})^{an} = e^{a^T y + b}

for b = log γ. Also, a posynomial can be written as Σ_{k=1}^p e^{ak^T y + bk}.
With this variable substitution, and after taking logs, a geometric
program is equivalent to

min_y  log( Σ_{k=1}^{p0} e^{a0k^T y + b0k} )
subject to log( Σ_{k=1}^{pi} e^{aik^T y + bik} ) ≤ 0, i = 1, . . . m
           cj^T y + dj = 0, j = 1, . . . r

This is convex, recalling the convexity of soft max functions

19
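A sketch of this convex form using cvxpy's log_sum_exp, with a made-up two-term posynomial objective, one posynomial inequality, and one monomial equality. (Recent versions of cvxpy can also solve geometric programs in their original variables via solve(gp=True).)

```python
import numpy as np
import cvxpy as cp

A0 = np.array([[1.0, 2.0], [0.5, -1.0]])   # rows a_{0k} of the objective
b0 = np.array([0.0, 1.0])
A1 = np.array([[1.0, 0.0], [0.0, 1.0]])    # one posynomial inequality
b1 = np.array([-2.0, -2.0])
c, d = np.array([1.0, 1.0]), -1.0          # one monomial equality

y = cp.Variable(2)
prob = cp.Problem(
    cp.Minimize(cp.log_sum_exp(A0 @ y + b0)),
    [cp.log_sum_exp(A1 @ y + b1) <= 0, c @ y + d == 0])
prob.solve()
print(np.exp(y.value))    # x = e^y recovers the original positive variables
```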
Many interesting problems are geometric programs, e.g., floor planning:

[Figure 8.18 of B & V: non-overlapping rectangular cells placed in a
rectangle with width W, height H, and lower left corner at (0, 0); the
ith cell is specified by its width wi, height hi, and the coordinates
of its lower left corner, (xi, yi)]

See Boyd et al. (2007), "A tutorial on geometric programming",
and also Chapter 8.8 of the B & V book

20
Example floor planning program:

min_{W,H,x,y,w,h}  W H
subject to 0 ≤ xi ≤ W, i = 1, . . . n
           0 ≤ yi ≤ H, i = 1, . . . n
           xi + wi ≤ xj, (i, j) ∈ L
           yi + hi ≤ yj, (i, j) ∈ B
           wi hi = Ci, i = 1, . . . n

Check: why is this a geometric program?

21
Eliminating equality constraints
Important special case of change of variables: eliminating equality
constraints. Given the problem

min f (x)
x
subject to gi (x) ≤ 0, i = 1, . . . m
Ax = b

we can always express any feasible point as x = My + x0, where
Ax0 = b and col(M) = null(A). Hence the above is equivalent to

min_y  f(My + x0)
subject to gi(My + x0) ≤ 0, i = 1, . . . m

Note: this is fully general but not always a good idea (practically)

22
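A sketch of this elimination with scipy's null_space; the A and b here are made up:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 0.0])

M = null_space(A)                            # col(M) = null(A)
x0 = np.linalg.lstsq(A, b, rcond=None)[0]    # a particular solution, Ax0 = b

# every x = M y + x0 satisfies Ax = b; spot-check with a random y
y = np.random.default_rng(2).standard_normal(M.shape[1])
print(np.allclose(A @ (M @ y + x0), b))      # True
```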
Introducing slack variables
Essentially the opposite of eliminating equality constraints: introducing
slack variables. Given the problem

min f (x)
x
subject to gi (x) ≤ 0, i = 1, . . . m
Ax = b

we can transform the inequality constraints via

min_{x,s}  f(x)
subject to si ≥ 0, i = 1, . . . m
           gi(x) + si = 0, i = 1, . . . m
           Ax = b

Note: this is no longer convex unless gi, i = 1, . . . m are affine


23
Relaxing nonaffine equality constraints
Given an optimization problem

min_x  f(x) subject to x ∈ C

we can always take an enlarged constraint set C̃ ⊇ C and consider

min_x  f(x) subject to x ∈ C̃

This is called a relaxation, and its optimal value is always smaller than
or equal to that of the original problem

Important special case: relaxing nonaffine equality constraints, i.e.,

hj (x) = 0, j = 1, . . . r

where hj , j = 1, . . . r are convex but nonaffine, are replaced with

hj (x) ≤ 0, j = 1, . . . r

24
Example: maximum utility problem
The maximum utility problem models investment/consumption:
max_{x,b}  Σ_{t=1}^T αt u(xt)
subject to bt+1 = bt + f(bt) − xt, t = 1, . . . T
           0 ≤ xt ≤ bt, t = 1, . . . T

Here bt is the budget and xt is the amount consumed at time t; f
is an investment return function and u a utility function, both concave
and increasing

Is this a convex problem? What if we replace the equality constraints
with inequalities:

bt+1 ≤ bt + f (bt ) − xt , t = 1, . . . T ?

25
Example: principal components analysis
Given X ∈ Rn×p , consider the low rank approximation problem:

min_R  ||X − R||_F^2 subject to rank(R) = k

Here ||A||_F^2 = Σ_{i=1}^n Σ_{j=1}^p A_{ij}^2, the entrywise squared ℓ2 norm,
and rank(A) denotes the rank of A

Also called the principal components analysis or PCA problem. Given
X = UDV^T, the singular value decomposition or SVD, the solution is

R = Uk Dk Vk^T

where Uk, Vk are the first k columns of U, V and Dk contains the first k
diagonal elements of D. I.e., R is the reconstruction of X from its first
k principal components

This problem is not convex. Why?


26
We can recast the PCA problem in a convex form. First rewrite as

min_{Z∈S^p}  ||X − XZ||_F^2 subject to rank(Z) = k, Z is a projection
⇐⇒ max_{Z∈S^p}  tr(SZ) subject to rank(Z) = k, Z is a projection

where S = X^T X. Hence the constraint set is the nonconvex set

C = {Z ∈ S^p : λi(Z) ∈ {0, 1}, i = 1, . . . p, tr(Z) = k}

where λi(Z), i = 1, . . . p are the eigenvalues of Z. The solution in this
formulation is

Z = Vk Vk^T

where Vk gives the first k columns of V

27
Now consider relaxing constraint set to Fk = conv(C), its convex
hull. Note

Fk = {Z ∈ S^p : λi(Z) ∈ [0, 1], i = 1, . . . p, tr(Z) = k}
   = {Z ∈ S^p : 0 ⪯ Z ⪯ I, tr(Z) = k}

Recall this is called the Fantope of order k

Hence, the linear maximization over the Fantope, namely

max_{Z∈Fk}  tr(SZ)

is convex. Remarkably, this is equivalent to the nonconvex PCA problem
(admits the same solution)!

(Famous result: Fan (1949), "On a theorem of Weyl concerning
eigenvalues of linear transformations")

28
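This equivalence can be spot-checked numerically: solve the Fantope problem as an SDP with cvxpy (assuming an SDP-capable solver such as SCS is installed) and compare against the top-k eigenvector projection. The data below is synthetic:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 40, 6, 2
X = rng.standard_normal((n, p))
S = X.T @ X

Z = cp.Variable((p, p), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(S @ Z)),
                  [Z >> 0, np.eye(p) - Z >> 0, cp.trace(Z) == k])
prob.solve()

evals, V = np.linalg.eigh(S)     # eigenvalues in ascending order
Vk = V[:, -k:]                   # top-k eigenvectors of S
print(np.allclose(Z.value, Vk @ Vk.T, atol=1e-3))   # True (generically)
```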
References and further reading

• S. Boyd and L. Vandenberghe (2004), "Convex optimization", Chapter 4
• O. Güler (2010), "Foundations of optimization", Chapter 4

29
