Math6015 Lecture 04
Lecture 04: Chapters 2-4
Jingwei Liang
Institute of Natural Sciences and School of Mathematical Sciences
Email: [email protected]
Office: Room 355, No. 6 Science Building
Previously
We have seen optimization problems of the form
Norm approximation
minimize ||Ax − b||.
Penalty function approximation problem
minimize ϕ(r1 ) + · · · + ϕ(rm ),
subject to r = Ax − b.
Least-norm problem
minimize ||x||,
subject to Ax = b.
MAP estimation problem
maximize log py|x (x, y) + log px (x).
Convexity • /65
Optimization problems
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m,
gi (x) = 0, i = 1, ..., p.
Find good (or best) actions
x represents some actions
trades in a portfolio
airplane control surface deflections
schedule or assignment
resource allocation
Constraints limit actions or impose conditions on outcome.
The smaller (or larger) the objective f0 (x), the better
total cost (or negative profit)
deviation from desired or target outcome
risk
fuel use
Find good models
x represents parameters of a model.
Constraints impose requirements on model parameters (e.g. nonnegativity).
Objective f0 (x) is sum of two terms
a prediction error (or loss) on some observed data
a (regularization) term that penalizes model complexity
Worst-case analysis
Variables are actions or parameters out of our control (and possibly under the control of an
adversary).
Constraints limit the possible values of the parameters.
Minimizing −f0 (x) finds worst possible parameter values
If the worst possible value of f0 (x) is tolerable, you’re OK.
It is good to know what the worst possible scenario can be.
Convex optimization problems
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m,
Ax = b.
Variable x ∈ Rn .
f0 : Rn → R is the objective function.
Equality constraints are linear.
fi : Rn → R, i = 0, 1, ..., m are convex.
When is an optimization problem hard to solve?
Classical view
Linear is easy.
Non-linear is hard.
But it is wrong!
The correct view
Convex is easy.
Nonconvex (negative curvature) is hard.
Solving convex optimization problems
Many different algorithms (that run on many platforms)
Interior-point methods for up to 10000s of variables.
First-order methods for larger problems
Do not require initial point, babysitting, or tuning...
Can develop and deploy quickly using modeling languages such as CVXPY.
Solvers are reliable, so can be embedded
Code generation yields real-time solvers that execute in milliseconds.
Brief history
Algorithms
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco & McCormick, Dikin, …)
1970s: ellipsoid method and other subgradient methods
1980s & 90s: interior-point methods (Karmarkar, Nesterov & Nemirovski)
Since 2000s: many methods for large-scale convex optimization
Applications
Before 1990: mostly in operations research, a few in engineering
Since 1990: many applications in engineering (control, signal processing, communications,
circuit design, …)
Since 2000s: image processing, machine learning and statistics, finance
Chapter 2 Convex sets
Affine and Convex Sets
Affine sets
Definition. Affine set
A set C ⊆ Rn is called affine if for any x1 , x2 ∈ C and θ ∈ R, there holds the inclusion
θx1 + (1−θ)x2 ∈ C.
Fact. Affine set and subspace
If C is an affine set and x0 ∈ C, then the set
\[ S \overset{\mathrm{def}}{=} C - x_0 = \{ x - x_0 \mid x \in C \} \]
is a subspace.
Example. Solutions of linear equations
The set of solutions of linear equations
\[ C \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid Ax = b \}, \]
where A ∈ Rm×n and b ∈ Rm , is an affine set.
The subspace associated with C is the nullspace of A: for any x ∈ C, C − x = {v | Av = 0}.
Affine hull For a set C ⊆ E, the affine hull of C, denoted by aff(C), is the intersection of all affine sets containing C, that is,
\[ \operatorname{aff}(C) \overset{\mathrm{def}}{=} \Big\{ \sum_{i=1}^{k} \theta_i x_i \;\Big|\; x_i \in C,\ \sum_{i=1}^{k} \theta_i = 1 \Big\}. \]
aff(C) is itself an affine set, and it is the smallest affine set containing C (w.r.t. inclusion).
Affine dimension and relative interior
Definition. Affine dimension
The affine dimension of a set C is the dimension of its affine hull.
Remark Not always consistent with other definitions of dimensions (e.g. unit circle).
Relative interior If the affine dimension of C is less than n, then aff(C) ≠ Rn . The relative interior of C is
\[ \operatorname{relint}(C) \overset{\mathrm{def}}{=} \{ x \in C \mid B(x, r) \cap \operatorname{aff}(C) \subseteq C \text{ for some } r > 0 \}. \]
Convex sets
Definition. Convex set
A subset S of E is convex if for any x, y ∈ S and θ ∈ [0, 1], there holds
θx + (1 − θ)y ∈ S.
θx + (1 − θ)y is called the convex combination of x and y.
Convex combination Given x1 , x2 , ..., xk and θ1 , θ2 , ..., θk ∈ R+ such that θ1 + θ2 + · · · + θk = 1, the point
θ1 x1 + θ2 x2 + · · · + θk xk
is called a convex combination of x1 , x2 , ..., xk .
Convex hull The convex hull of a set C, denoted by conv(C), is the set of all convex combinations of points in C:
\[ \operatorname{conv}(C) \overset{\mathrm{def}}{=} \Big\{ \sum_{i=1}^{k} \theta_i x_i \;\Big|\; x_i \in C,\ \theta_i \in \mathbb{R}_+ \text{ and } \sum_{i=1}^{k} \theta_i = 1 \Big\}. \]
Cones
Definition. Convex cone
Given a set C ⊂ Rn , we call C a cone (or nonnegative homogeneous) if for any x ∈ C and θ ≥ 0,
there holds
θx ∈ C.
If, moreover, C is convex, then it is called a convex cone.
Conic hull The conic hull of a set C is the set of all conic combinations of points in C:
\[ \Big\{ \sum_{i=1}^{k} \theta_i x_i \;\Big|\; x_i \in C,\ \theta_i \in \mathbb{R}_+ \Big\}. \]
Euclidean balls and ellipsoids
Ellipsoids Let P = P⊤ ≻ 0. The set
\[ \mathcal{E} \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid (x - x_c)^\top P^{-1} (x - x_c) \le 1 \} \]
is convex.
Norm balls and norm cones
Polyhedra
Polyhedron A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:
\[ P \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid a_i^\top x \le b_i,\ i = 1, \dots, m \text{ and } c_j^\top x = d_j,\ j = 1, \dots, p \}. \]
Affine sets (e.g., subspaces, hyperplanes, lines), rays, line segments, and halfspaces are all polyhedra.
Alternative notation
\[ P \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid Ax \preceq b \text{ and } Cx = d \}. \]
Simplexes Let v0 , v1 , ..., vk ∈ Rn be affinely independent (i.e. v1 − v0 , ..., vk − v0 are linearly independent). The simplex determined by them is given by
\[ C \overset{\mathrm{def}}{=} \operatorname{conv}\{ v_0, v_1, \dots, v_k \} = \{ \theta_0 v_0 + \theta_1 v_1 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf{1}^\top \theta = 1 \}. \]
A 1D simplex is a line segment, a 2D simplex is a triangle and a 3D simplex is a tetrahedron.
The unit simplex is the n-dimensional simplex determined by the zero vector and the unit vectors, i.e. 0, e1 , ..., en ∈ Rn , and can be expressed as
\[ \{ x \in \mathbb{R}^n \mid x \succeq 0 \text{ and } \mathbf{1}^\top x \le 1 \}. \]
The probability simplex is the (n−1)-dimensional simplex determined by the unit vectors e1 , ..., en ∈ Rn . It is the set
\[ \{ x \in \mathbb{R}^n \mid x \succeq 0 \text{ and } \mathbf{1}^\top x = 1 \}. \]
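The probability simplex is convex, so a convex combination of its points should stay inside it. The following is a minimal sketch of that check; the helper names (in_probability_simplex, convex_combination, random_simplex_point) and the random test data are illustrative assumptions, not part of the lecture.

```python
import random

def in_probability_simplex(x, tol=1e-9):
    """Return True if x >= 0 componentwise and the entries sum to 1 (up to tol)."""
    return all(xi >= -tol for xi in x) and abs(sum(x) - 1.0) <= tol

def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] for vectors of equal length."""
    n = len(points[0])
    return [sum(w * p[j] for w, p in zip(weights, points)) for j in range(n)]

def random_simplex_point(n, rng):
    """Draw a point of the probability simplex by normalizing positive numbers."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
points = [random_simplex_point(4, rng) for _ in range(5)]
weights = random_simplex_point(5, rng)  # theta >= 0 with 1^T theta = 1
mix = convex_combination(points, weights)
print(in_probability_simplex(mix))
```

The weights themselves are drawn from a probability simplex, which is exactly the condition θ ⪰ 0, 1⊤θ = 1 used in the definition of the convex hull.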
Convex hull description of polyhedra The convex hull of the finite set {v1 , v2 , ..., vk } is
\[ C \overset{\mathrm{def}}{=} \{ \theta_1 v_1 + \theta_2 v_2 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf{1}^\top \theta = 1 \}. \]
This set is a bounded polyhedron.
The positive semidefinite cone
2.3 Operations that Preserve Convexity
Showing a set is convex
Methods for establishing convexity of a set C
1. Use definition, recommended only for very simple sets.
2. Use convex functions (next lecture).
3. Show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls,...) by
operations that preserve convexity
Intersection
Affine mapping
Perspective mapping
Linear-fractional mapping
Intersection
Affine functions
Examples
Scaling
Translation
Projection
Sum/difference
Partial sum
Perspective functions
The perspective function The perspective function P : Rn+1 → Rn , with domain dom(P ) = Rn × R++ , is defined as
\[ P(z, t) = \frac{z}{t}. \]
Remark The perspective function scales or normalizes vectors so the last component is one, and then drops the last component.
If C ⊆ dom(P ) is convex, then its image
\[ P(C) \overset{\mathrm{def}}{=} \{ P(x) \mid x \in C \} \]
is convex.
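One way to see why the image of a convex set under P stays convex is that P maps line segments to line segments. The sketch below checks the identity P(θx + (1−θ)y) = μP(x) + (1−μ)P(y) with μ = θ x_t / (θ x_t + (1−θ) y_t) ∈ [0, 1] on two sample points; the function names and the sample vectors are illustrative assumptions.

```python
def perspective(v):
    """P(z, t) = z / t for v = (z_1, ..., z_n, t) with t > 0."""
    *z, t = v
    if t <= 0:
        raise ValueError("last component must be positive")
    return [zi / t for zi in z]

def segment_image_gap(x, y, steps=10):
    """Largest deviation between P(theta*x + (1-theta)*y) and the matching
    convex combination mu*P(x) + (1-mu)*P(y) over a grid of theta values."""
    px, py = perspective(x), perspective(y)
    worst = 0.0
    for k in range(steps + 1):
        theta = k / steps
        v = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
        # mu re-weights theta by the last components of x and y
        mu = theta * x[-1] / (theta * x[-1] + (1 - theta) * y[-1])
        lhs = perspective(v)
        rhs = [mu * p + (1 - mu) * q for p, q in zip(px, py)]
        worst = max(worst, max(abs(a - b) for a, b in zip(lhs, rhs)))
    return worst

x = [1.0, 2.0, 4.0]   # (z, t) with t = 4
y = [3.0, -1.0, 0.5]  # (z, t) with t = 0.5
gap = segment_image_gap(x, y)
print(gap)
```

Since μ runs over [0, 1] as θ does, every point of the segment between x and y lands on the segment between P(x) and P(y).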
Linear-fractional functions
A linear-fractional function is formed by composing the perspective function with an affine function. Suppose g : Rn → Rm+1 is affine, i.e.
\[ g(x) = \begin{bmatrix} A \\ c^\top \end{bmatrix} x + \begin{bmatrix} b \\ d \end{bmatrix}, \]
where A ∈ Rm×n , b ∈ Rm , c ∈ Rn and d ∈ R. The function f : Rn → Rm given by f = P ◦ g,
\[ f(x) = \frac{Ax + b}{c^\top x + d}, \qquad \operatorname{dom}(f) = \{ x \in \mathbb{R}^n \mid c^\top x + d > 0 \}, \]
is called a linear-fractional function.
Chapter 3 Convex functions
Preliminaries
Let
xj = x0 + tej , j = 1, 2, ..., n.
By definition of the derivative L of f at x0 , for j = 1, 2, ..., n
\[ \lim_{t \to 0} \frac{f(x_j) - (t L e_j + f(x_0))}{t} = 0 \quad\Longrightarrow\quad \lim_{t \to 0} \frac{f(x_0 + t e_j) - f(x_0)}{t} = L e_j . \]
Partial derivative
\[ \frac{\partial f}{\partial x_j}(x_0). \]
The derivative matrix
As f : Rn → Rm ,
\[ f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix} \quad\text{and}\quad \frac{\partial f}{\partial x_j}(x_0) = \begin{bmatrix} \frac{\partial f_1}{\partial x_j}(x_0) \\ \vdots \\ \frac{\partial f_m}{\partial x_j}(x_0) \end{bmatrix}. \]
Gradient and Hessian
Given a function f : Rn → R and an open set S ⊂ dom(f ). If for every point x ∈ S and i = 1, ..., n, the partial derivative
\[ \frac{\partial f(x)}{\partial x_i} = \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t} \]
exists and is continuous, we say f is continuously differentiable, and write f ∈ C1 .
If for every point x ∈ S, i = 1, ..., n and j = 1, ..., n, the 2nd order partial derivative
\[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \]
exists and is continuous, we say f is twice continuously differentiable, and write f ∈ C2 .
Remark While the Hessian matrix is square and symmetric, the Jacobian matrix in general is not!
Example. Quadratic function
Consider
\[ f(x) = \frac{1}{2} x^\top P x + q^\top x + r, \]
where P ∈ Sn , q ∈ Rn and r ∈ R.
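For symmetric P, the gradient and Hessian of this quadratic are ∇f(x) = Px + q and ∇²f(x) = P. The following sketch compares the closed-form gradient with central finite differences; the matrix, vectors, step size and function names are illustrative assumptions.

```python
def quadratic(P, q, r, x):
    """Evaluate f(x) = 1/2 x^T P x + q^T x + r for list-based P, q, x."""
    n = len(x)
    xPx = sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * xPx + sum(qi * xi for qi, xi in zip(q, x)) + r

def quadratic_gradient(P, q, x):
    """Closed form P x + q, valid when P is symmetric."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) + q[i] for i in range(n)]

def finite_difference_gradient(f, x, h=1e-6):
    """Central differences (f(x + h e_i) - f(x - h e_i)) / (2h)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

P = [[2.0, 1.0], [1.0, 3.0]]  # symmetric example matrix
q = [1.0, -1.0]
r = 0.5
x0 = [0.3, -0.7]

exact = quadratic_gradient(P, q, x0)
approx = finite_difference_gradient(lambda x: quadratic(P, q, r, x), x0)
max_gap = max(abs(a - b) for a, b in zip(exact, approx))
print(exact, max_gap)
```

Central differences are exact for quadratics up to rounding, so the gap only reflects floating-point error.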
Taylor expansion
Let S ⊂ Rn be an open set and f ∈ C1 , then the 1st-order Taylor expansion of f at x ∈ S reads
\[ f(y) = f(x) + \langle \nabla f(x) \mid y - x \rangle + o(\|y - x\|). \]
Let S ⊂ Rn be an open set and f ∈ C2 , then the 2nd-order Taylor expansion of f at x ∈ S reads
\[ f(y) = f(x) + \langle \nabla f(x) \mid y - x \rangle + \frac{1}{2} \langle y - x \mid \nabla^2 f(x)(y - x) \rangle + o(\|y - x\|^2). \]
Chain rule
Suppose f : Rn → Rm is differentiable at x ∈ int(dom f ) and g : Rm → Rp is differentiable at f (x) ∈ int(dom g). Define the composition h : Rn → Rp by
h(x) = g(f (x)).
Then h is differentiable at x, with derivative
∇h(x) = ∇g(f (x))∇f (x).
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial b} \cdots \frac{\partial v}{\partial x} \quad\text{and}\quad \frac{d}{dx} f\big( g_1(x), \dots, g_k(x) \big) = \sum_{i=1}^{k} \frac{\partial f}{\partial g_i(x)} \frac{\partial g_i(x)}{\partial x}. \]
3.2 Basic Properties and Examples
Convex functions
Definition. Convex function
Let S ⊂ Rn be a non-empty convex set, a function f : S → R is said to be convex if for any x, y ∈ S and any λ ∈ (0, 1), there holds
\[ f\big( \lambda x + (1 - \lambda) y \big) \le \lambda f(x) + (1 - \lambda) f(y). \]
It is said to be strictly convex if, for any x ≠ y,
\[ f\big( \lambda x + (1 - \lambda) y \big) < \lambda f(x) + (1 - \lambda) f(y). \]
[Figure: graph of f with the points (x, f (x)) and (y, f (y)) marked.]
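The defining inequality can be probed numerically. The sketch below samples random x, y and λ and reports whether any sample violates the inequality; it can only disprove convexity, never prove it. The function name, sampling ranges, trial count and tolerance are illustrative assumptions.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=2000, seed=0, tol=1e-9):
    """Return True if some sampled (x, y, lam) breaks
    f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) on [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        left = f(lam * x + (1 - lam) * y)
        right = lam * f(x) + (1 - lam) * f(y)
        if left > right + tol:  # tol absorbs floating-point rounding
            return True
    return False

square_violated = violates_convexity(lambda x: x * x, -5.0, 5.0)
sine_violated = violates_convexity(math.sin, 0.0, math.pi)
print(square_violated, sine_violated)
```

Here x² is convex on R, while sin is concave on [0, π], so only the second function should produce violations.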
Extended-value extension
Definition. Extended-value extension
If f is convex, we define its extended-value extension f̃ : Rn → R ∪ {∞} by
\[ \tilde{f}(x) = \begin{cases} f(x), & x \in \operatorname{dom}(f), \\ +\infty, & x \notin \operatorname{dom}(f). \end{cases} \]
\[ \operatorname{dom}(\tilde{f}) = \mathbb{R}^n \quad\text{and}\quad \operatorname{dom}(f) = \{ x \mid \tilde{f}(x) < +\infty \}. \]
Indicator function
\[ \iota_C(x) = \begin{cases} 0, & x \in C, \\ +\infty, & x \notin C. \end{cases} \]
First-order conditions
Theorem. First-order condition
Let S ⊂ Rn be a non-empty open convex set, and f a real-valued differentiable function from S to R. Then f is convex if and only if for any x, y ∈ S, there holds
f (y) ≥ f (x) + ⟨∇f (x) | y − x⟩.
[Figure: graph of f with the point (x, f (x)) marked.]
Necessity
Sufficiency
Corollary.
Let S ⊂ Rn be a non-empty open convex set, and f a convex real-valued function from S to R which is differentiable at x̄, then for any x ∈ S, there holds
f (x) ≥ f (x̄) + ⟨∇f (x̄) | x − x̄⟩.
Second-order condition
Theorem. Second-order condition
Let S ⊂ Rn be a non-empty open convex set, and f a real-valued twice-differentiable function from S to R. Then f is convex if and only if
∇2 f (x)
is positive semi-definite for any x ∈ S. Moreover, if
∇2 f (x)
is positive definite for any x ∈ S, then f is strictly convex. The converse fails: f (x) = x^4 is strictly convex although f ′′ (0) = 0.
Examples
Powers x^a , x > 0
Powers of absolute value |x|^a
Norms
Max function f (x) = max{x1 , x2 , ..., xn }
Quadratic-over-linear function f (x, y) = x^2 /y with dom(f ) = R × R++ .
Log-sum-exp f (x) = log(e^x1 + e^x2 + · · · + e^xn )
Geometric mean f (x) = (x1 x2 · · · xn )^(1/n) (concave, with dom(f ) = Rn++ )
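Log-sum-exp is a smooth approximation of the max function, with max(x) ≤ f(x) ≤ max(x) + log n. A minimal sketch of a numerically stable evaluation, together with a midpoint-convexity spot check, is given below; the function name and the sample vectors are illustrative assumptions.

```python
import math

def log_sum_exp(x):
    """Evaluate log(sum_i exp(x_i)), shifting by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

x = [1.0, 2.0, 3.0]
y = [-1.0, 0.5, 4.0]
mid = [(a + b) / 2 for a, b in zip(x, y)]

# Midpoint convexity: f((x + y)/2) <= (f(x) + f(y))/2
midpoint_ok = log_sum_exp(mid) <= (log_sum_exp(x) + log_sum_exp(y)) / 2 + 1e-12

# Sandwich between max(x) and max(x) + log(n)
sandwich_ok = max(x) <= log_sum_exp(x) <= max(x) + math.log(len(x))

big = log_sum_exp([1000.0, 1000.0])  # a naive exp(1000.0) would overflow
print(midpoint_ok, sandwich_ok, big)
```

The shift by max(x) does not change the value, because log Σ exp(x_i) = m + log Σ exp(x_i − m) for any constant m.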
Associated sets
The epigraph of f is
\[ \operatorname{epi}(f) = \{ (x, t) \mid x \in \operatorname{dom}(f),\ f(x) \le t \} \subseteq \mathbb{R}^{n+1}. \]
[Figure: epi(f ) above dom(f ), with the vector (∇f (x), −1)⊤ marked.]
Jensen’s inequality and extensions
Proposition. Jensen’s inequality
If f is convex, x1 , ..., xk ∈ dom(f ), and θ1 , ..., θk ≥ 0 with θ1 + · · · + θk = 1, then
f (θ1 x1 + · · · + θk xk ) ≤ θ1 f (x1 ) + · · · + θk f (xk ).
3.3 Operations that Preserve Convexity
Nonnegative weighted sums
Non-negative weighted sum of convex functions
f = w1 f1 + w2 f2 + · · · + wm fm ,
where fi , i = 1, ..., m are convex, and wi , i = 1, ..., m are non-negative.
Composition with an affine mapping
Suppose f : Rn → R, A ∈ Rn×m and b ∈ Rn . Define g : Rm → R by
\[ g(x) \overset{\mathrm{def}}{=} f(Ax + b), \]
with dom(g) = {x | Ax + b ∈ dom(f )}. If f is convex, then so is g.
Pointwise maximum and supremum
If f1 and f2 are convex functions, then their pointwise maximum f , defined by
\[ f(x) \overset{\mathrm{def}}{=} \max\{ f_1(x), f_2(x) \} \]
with dom(f ) = dom(f1 ) ∩ dom(f2 ), is also convex.
The pointwise maximum property extends to the pointwise supremum over an infinite set of convex functions.
If for each y ∈ A, the function f (x, y) is convex in x, then the function g, defined by
\[ g(x) \overset{\mathrm{def}}{=} \sup_{y \in A} f(x, y), \]
is convex in x.
Example. Support function
Let C ⊆ Rn be non-empty. The support function SC associated with the set C is defined as
\[ S_C(x) \overset{\mathrm{def}}{=} \sup_{y \in C} \langle x \mid y \rangle, \]
with dom(SC ) = {x | sup_{y∈C} ⟨x | y⟩ < ∞}.
Example. Distance to farthest point of a set
Let C ⊆ Rn . The distance (in any norm) to the farthest point of C is
\[ f(x) \overset{\mathrm{def}}{=} \sup_{y \in C} \|x - y\|. \]
Example. Maximum eigenvalue of a symmetric matrix
The function f (X) = λmax (X), with dom(f ) = Sm , is convex, since
\[ \lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^\top X y \]
is a pointwise supremum of linear functions of X.
Composition
Consider h : Rk → R and g : Rn → Rk , their composition f = h ◦ g : Rn → R is defined by
\[ f(x) \overset{\mathrm{def}}{=} h\big( g(x) \big), \qquad \operatorname{dom}(f) = \{ x \in \operatorname{dom}(g) \mid g(x) \in \operatorname{dom}(h) \}. \]
1D Chain rule
\[ f''(x) = h''\big( g(x) \big)\, g'(x)^2 + h'\big( g(x) \big)\, g''(x). \]
Minimization
If f is convex in (x, y) and C is a convex non-empty set, then the function
\[ g(x) = \inf_{y \in C} f(x, y) \]
is convex, provided g(x) > −∞ for all x.
Example. Distance to a set
The distance of a point x to a set S ⊆ Rn , in the norm || · ||, is defined as
\[ \operatorname{dist}(x, S) \overset{\mathrm{def}}{=} \inf_{y \in S} \|x - y\|. \]
3.4 The Conjugate Function
Conjugate function
Definition. Conjugate
Let f : Rn → ] − ∞, +∞], the conjugate of f is defined by
\[ f^*(y) \overset{\mathrm{def}}{=} \sup_{x \in \mathbb{R}^n} \big( \langle x \mid y \rangle - f(x) \big). \]
Biconjugate f ∗∗ = (f ∗ )∗ .
Also called Fenchel conjugate, Legendre transform, or Legendre-Fenchel transform.
[Figure: gra(f ), gra(⟨· | y⟩) and the value f ∗ (y).]
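The supremum in the definition can be approximated on a grid for functions on R. The sketch below does this for f(x) = x²/2, whose conjugate is y²/2, and for f(x) = eˣ, whose conjugate is y log y − y for y > 0 (both standard closed forms). The grid range, resolution and function names are illustrative assumptions, and the grid maximum is only a lower estimate of the true supremum.

```python
import math

def conjugate_on_grid(f, y, grid):
    """Approximate f*(y) = sup_x (x*y - f(x)) by a maximum over grid points."""
    return max(x * y - f(x) for x in grid)

# Uniform grid on [-10, 10] with spacing 0.001 (assumed wide and fine enough here)
grid = [i / 1000 for i in range(-10000, 10001)]

half_square = lambda x: 0.5 * x * x
conj_half_square = conjugate_on_grid(half_square, 1.5, grid)  # closed form: 1.5**2 / 2
conj_exp = conjugate_on_grid(math.exp, 2.0, grid)             # closed form: 2*log(2) - 2
print(conj_half_square, conj_exp)
```

For the quadratic, the maximizer x = y lies on the grid, so the estimate matches the closed form up to rounding; for the exponential, the maximizer x = log y generally falls between grid points and the estimate is slightly too small.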
Examples on R
Affine function f (x) = ax + b
Negative logarithm f (x) = − log x
Exponential f (x) = e^x
Negative entropy f (x) = x log x
Inverse f (x) = 1/x
Strictly convex quadratic function
Let Q ∈ Sn++ and consider
\[ f(x) = \frac{1}{2} x^\top Q x. \]
Support function
S⊥
Norm
Self-conjugacy
Example. Conjugate of norm
Let f (x) = ||x|| be a norm with dual norm || · ||∗ . Then
\[ f^*(y) = \begin{cases} 0, & \|y\|_* \le 1, \\ +\infty, & \text{otherwise.} \end{cases} \]
Properties
The Fenchel-Moreau theorem
Theorem. Fenchel-Moreau
Let f : Rn → ] − ∞, +∞] be proper. Then f is closed and convex if and only if f = f ∗∗ . In this case, f ∗ is proper as well.
Corollary.
Let f ∈ Γ0 (Rn ), then f ∗ ∈ Γ0 (Rn ) and f ∗∗ = f .
Calculus
Theorem. Subdifferential and conjugation
Let f ∈ Γ0 (Rn ), let x ∈ Rn and y ∈ Rn . Then the following are equivalent:
(i) (x, y) ∈ gra(∂f ).
(ii) f (x) + f ∗ (y) = ⟨x | y⟩.
(iii) (y, x) ∈ gra(∂f ∗ ).
Chapter 4 Convex Optimization Problems
Optimization Problems
Basic terminology
Expressing problems in standard form
Equivalent problems
Convex Optimization
Convex optimization problems in standard form
Local and global optima
An optimality criterion for differentiable f0
Optimization problems
Optimization variable
Objective function, cost function (loss function)
Equality constraints, equality constraint functions
Inequality constraints, inequality constraint functions
Unconstrained
Domain of the optimization problem
\[ D \overset{\mathrm{def}}{=} \bigcap_{i=0}^{m} \operatorname{dom}(f_i) \;\cap\; \bigcap_{i=1}^{p} \operatorname{dom}(h_i). \]
Optimal and local optimal points
Optimal value p⋆
\[ p^\star \overset{\mathrm{def}}{=} \inf \{ f_0(x) \mid f_i(x) \le 0,\ i = 1, \dots, m \text{ and } h_i(x) = 0,\ i = 1, \dots, p \}. \]
Sub-optimal...
Locally optimal point A feasible point x is locally optimal if there exists R > 0 such that
\[ f_0(x) = \inf \{ f_0(z) \mid \|x - z\| \le R,\ f_i(z) \le 0,\ i = 1, \dots, m \text{ and } h_i(z) = 0,\ i = 1, \dots, p \}. \]
Feasibility problems
Feasibility problem
find x
subject to fi (x) ≤ 0, i = 1, ..., m
hi (x) = 0, i = 1, ..., p.
Alternative formulation
minimize ιC (x) + ιD (x),
where
\[ C \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m \}, \qquad D \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid h_i(x) = 0,\ i = 1, \dots, p \}. \]
Sudoku puzzle
Standard size: 9 × 9 grid. Other sizes: 4 × 4, 16 × 16 and 25 × 25...
Each row contains all of the digits from 1 to 9.
Each column contains all of the digits from 1 to 9.
Each 3 × 3 block contains all of the digits from 1 to 9.
A partially completed grid is provided; at least 17 given digits are needed for the puzzle to have a unique solution.
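In the feasibility view, a completed grid is a feasible point exactly when it satisfies all row, column and block constraints. The following sketch checks those constraints for a 9 × 9 grid; the function name and the pattern used to generate a sample grid are illustrative assumptions, and the code only verifies a candidate rather than solving the puzzle.

```python
def is_feasible_sudoku(grid):
    """Return True if every row, column and 3x3 block of a 9x9 grid
    contains each digit 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    blocks = [
        set(grid[3 * br + i][3 * bc + j] for i in range(3) for j in range(3))
        for br in range(3)
        for bc in range(3)
    ]
    return all(group == digits for group in rows + cols + blocks)

# A completed grid generated from a shifting pattern (illustrative data only).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swapping two entries within a row keeps the row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_feasible_sudoku(solved), is_feasible_sudoku(broken))
```

Each of the 27 groups plays the role of one constraint set, and feasibility means lying in their intersection, mirroring the indicator-function formulation of a feasibility problem.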
Standard form
Standard form
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m
hi (x) = 0, i = 1, ..., p.
Equivalent problems
Equivalent problems Two problems are called equivalent if from a solution of one, a solution of the other is readily found, and vice versa.
Change of variable Suppose ϕ : Rn → Rn is one-to-one, with image covering the problem domain D, i.e. D ⊆ ϕ(dom(ϕ)).
Slack variables
Reading materials
Eliminating equality constraints
4.2 Convex Optimization
Convex optimization problems in standard form
Convex optimization problem
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m
ai⊤ x = bi , i = 1, ..., p.
Local and global optima
Theorem. Local and global optima
For a convex optimization problem, any locally optimal point is also globally optimal.
First-order optimality condition
Optimality condition Suppose f0 is differentiable, then a feasible point x ∈ X is optimal if and only if
⟨∇f0 (x) | y − x⟩ ≥ 0 ∀y ∈ X,
with X being the feasible set.
Unconstrained problems
Optimality condition Suppose f0 is differentiable, then x is optimal if and only if
∇f0 (x) = 0.
Problems with equality constraints only
Consider the problem
minimize f0 (x)
subject to Ax = b.
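The optimality condition for the equality-constrained problem can be derived from the first-order optimality condition. The following is a sketch of the standard argument, assuming f0 is convex and differentiable and A ∈ Rp×n ; the multiplier ν ∈ Rp and the notation N(A), R(A⊤) for nullspace and range are introduced here for illustration.

```latex
% x is optimal iff Ax = b and <grad f_0(x) | y - x> >= 0 for all y with Ay = b.
\begin{align*}
& Ax = b,\ Ay = b \iff y = x + v \ \text{ with } v \in \mathcal{N}(A), \\
& \langle \nabla f_0(x) \mid v \rangle \ge 0 \ \ \forall v \in \mathcal{N}(A)
  \iff \nabla f_0(x) \perp \mathcal{N}(A)
  \quad (\text{since } -v \in \mathcal{N}(A) \text{ as well}), \\
& \nabla f_0(x) \in \mathcal{N}(A)^{\perp} = \mathcal{R}(A^\top)
  \iff \exists\, \nu \in \mathbb{R}^p : \ \nabla f_0(x) + A^\top \nu = 0 .
\end{align*}
```

Together with feasibility Ax = b, the last line is the familiar Lagrange multiplier condition.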
Minimization over the nonnegative orthant
Consider the problem
minimize f0 (x)
subject to x ⪰ 0.
Complementarity
\[ x \succeq 0, \quad \nabla f_0(x) \succeq 0 \quad\text{and}\quad x_i \big( \nabla f_0(x) \big)_i = 0, \ i = 1, \dots, n. \]
Reading material
Section 4.2.4 Equivalent convex problems
4.3 Linear Optimization Problems
Linear optimization problems
Definition. Linear program (LP)
A general linear program has the form
minimize c⊤ x + d
subject to Gx ⪯ h
Ax = b,
where G ∈ Rm×n and A ∈ Rp×n .
Standard and inequality form linear programs
Standard form Only includes non-negative inequality constraint
minimize c⊤ x
subject to Ax = b
x ⪰ 0,
Converting LPs to standard form
Converting
minimize c⊤ x + d
subject to Gx ⪯ h
Ax = b,
to standard form.
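A sketch of the usual conversion, under the assumption that it follows the textbook route: introduce a slack vector s for the inequalities and split x into nonnegative parts, x = x⁺ − x⁻. The symbols s, x⁺ and x⁻ are introduced here for illustration.

```latex
\begin{align*}
\text{minimize}\quad   & c^\top x^+ - c^\top x^- + d \\
\text{subject to}\quad & G x^+ - G x^- + s = h, \\
                       & A x^+ - A x^- = b, \\
                       & x^+ \succeq 0,\quad x^- \succeq 0,\quad s \succeq 0 .
\end{align*}
```

This is a standard form LP in the stacked variable (x⁺, x⁻, s); the constant d does not affect the minimizers and can be dropped from the objective.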
Diet problem
A healthy diet contains
m different nutrients in quantities at least equal to b1 , · · · , bm .
Nonnegative quantities x1 , · · · , xn of n different foods.
One unit quantity of food j contains an amount ai,j of nutrient i, and has a cost of cj .
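Choosing the cheapest healthy diet from the data above can be written as an LP. The sketch below assumes the matrix A ∈ Rm×n collects the entries ai,j and the vectors b, c collect the bi and cj .

```latex
\begin{align*}
\text{minimize}\quad   & c^\top x \\
\text{subject to}\quad & A x \succeq b, \\
                       & x \succeq 0 .
\end{align*}
```

Row i of Ax ⪰ b says that the diet supplies at least bi of nutrient i, and c⊤x is the total cost.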
Chebyshev center of a polyhedron
Finding the largest Euclidean ball that lies in a polyhedron described by linear inequalities,
\[ P \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n \mid a_i^\top x \le b_i,\ i = 1, \dots, m \}. \]
Writing the ball as {xc + u | ||u||2 ≤ r}, this becomes an LP in the variables xc and r:
minimize − r
subject to ai⊤ xc + r||ai ||2 ≤ bi , i = 1, ..., m.
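For a fixed center xc , the constraints above give the largest admissible radius as the minimum over i of (bi − ai⊤ xc ) / ||ai ||2 . The sketch below evaluates this for the unit square and locates the best center by a coarse grid search; the grid search stands in for an LP solver purely for illustration, and the function name and data are assumptions.

```python
import math

def inscribed_radius(A, b, center):
    """Largest r with a_i^T center + r * ||a_i||_2 <= b_i for every row i."""
    radii = []
    for a_i, b_i in zip(A, b):
        norm = math.sqrt(sum(v * v for v in a_i))
        slack = b_i - sum(v * c for v, c in zip(a_i, center))
        radii.append(slack / norm)
    return min(radii)

# Unit square {0 <= x <= 1, 0 <= y <= 1} written as a_i^T x <= b_i.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]

# Coarse grid of candidate centers (illustration only, not an LP solver).
candidates = [(i / 20, j / 20) for i in range(21) for j in range(21)]
best_center = max(candidates, key=lambda c: inscribed_radius(A, b, c))
best_radius = inscribed_radius(A, b, best_center)
print(best_center, best_radius)
```

For the unit square the search recovers the midpoint of the square with radius one half, as expected by symmetry.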
Other examples:
Dynamic activity planning
Chebyshev inequalities
Piecewise-linear minimization
4.4 Quadratic Optimization Problems
Quadratic optimization problems
Definition. Quadratic program (QP)
A quadratic program has the form
\[ \begin{array}{ll} \text{minimize} & \frac{1}{2} x^\top P x + q^\top x + r \\ \text{subject to} & Gx \preceq h, \\ & Ax = b, \end{array} \]
where P ∈ Sn+ , G ∈ Rm×n and A ∈ Rp×n .
Least-squares and regression
Least-squares
\[ \|Ax - b\|_2^2 = x^\top A^\top A x - 2 b^\top A x + b^\top b. \]
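The expansion shows that least-squares is an unconstrained QP with P = 2A⊤A, q = −2A⊤b and r = b⊤b. A small numerical sanity check of the identity is sketched below; the helper names and the sample A, b, x are illustrative assumptions.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def least_squares_forms(A, b, x):
    """Return (||Ax - b||_2^2, x^T A^T A x - 2 b^T A x + b^T b)."""
    Ax = matvec(A, x)
    residual = [p - q for p, q in zip(Ax, b)]
    lhs = dot(residual, residual)
    rhs = dot(Ax, Ax) - 2 * dot(b, Ax) + dot(b, b)
    return lhs, rhs

A = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]
b = [1.0, 0.0, -2.0]
x = [0.5, -1.5]
lhs, rhs = least_squares_forms(A, b, x)
print(lhs, rhs)
```

Note that x⊤A⊤Ax is computed as (Ax)⊤(Ax), which avoids forming A⊤A explicitly.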
Distance between polyhedra
The (Euclidean) distance between the polyhedra P1 = {x | A1 x ⪰ b1 } and P2 = {x | A2 x ⪰ b2 } in Rn is defined as
\[ \operatorname{dist}(P_1, P_2) \overset{\mathrm{def}}{=} \inf_{x_1 \in P_1,\ x_2 \in P_2} \|x_1 - x_2\|_2 . \]
QP formulation
\[ \begin{array}{ll} \text{minimize} & \|x_1 - x_2\|_2^2 \\ \text{subject to} & A_1 x_1 \succeq b_1, \quad A_2 x_2 \succeq b_2 . \end{array} \]
Problem. Best pair problem
Let C, D ⊆ Rn be two non-empty closed convex sets
minimize ||x1 − x2 ||
subject to x1 ∈ C, x2 ∈ D.
Other examples:
Bounding variance
Linear program with random cost
Markowitz portfolio optimization
Second-order cone programming
Definition. Second-order cone programming (SOCP)
A second-order cone program has the form
minimize f ⊤ x
subject to ||Ai x + bi ||2 ≤ ci⊤ x + di , i = 1, ..., m
F x = g,
where Ai ∈ Rni ×n and F ∈ Rp×n .
Robust linear programming
Linear program with inequality constraint
minimize c⊤ x
subject to a⊤
i x ≤ bi , i = 1, ..., m.
65
Convexity • Chapter 4-Convex Optimization Problems • Quadratic Optimization Problems • Second-order cone programming /65
Robust linear programming
Linear program with inequality constraint
minimize c⊤ x
subject to a⊤
i x ≤ bi , i = 1, ..., m.
minimize c⊤ x
subject to ā⊤ ⊤
i x + ||Pi x|| ≤ bi , i = 1, ..., m.
65
Convexity • Chapter 4-Convex Optimization Problems • Quadratic Optimization Problems • Second-order cone programming /65
Robust linear programming
Linear program with inequality constraint
minimize c⊤ x
subject to a⊤
i x ≤ bi , i = 1, ..., m.
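The robust constraint rests on the identity sup over ||u||2 ≤ 1 of (ā + Pu)⊤x = ā⊤x + ||P⊤x||2 . The sketch below compares this closed form with a brute-force maximum over sampled unit vectors in two dimensions; the function names, the sample ā, P and x, and the number of samples are illustrative assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def worst_case_closed_form(a_bar, P, x):
    """a_bar^T x + ||P^T x||_2, the supremum over the ellipsoid."""
    Ptx = [sum(P[i][j] * x[i] for i in range(len(x))) for j in range(len(P[0]))]
    return dot(a_bar, x) + math.sqrt(dot(Ptx, Ptx))

def worst_case_sampled(a_bar, P, x, samples=3600):
    """Maximum of (a_bar + P u)^T x over sampled unit vectors u in R^2."""
    best = -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        u = [math.cos(angle), math.sin(angle)]
        a = [a_bar[i] + dot(P[i], u) for i in range(len(a_bar))]
        best = max(best, dot(a, x))
    return best

a_bar = [1.0, 2.0]
P = [[0.5, 0.1], [0.0, 0.3]]
x = [0.7, -0.4]
closed = worst_case_closed_form(a_bar, P, x)
sampled = worst_case_sampled(a_bar, P, x)
print(closed, sampled)
```

Sampling only the boundary ||u||2 = 1 suffices, because a linear function of u attains its maximum over the unit ball on the boundary; the sampled value approaches the closed form from below as the number of samples grows.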
Other examples:
Linear programming with random constraints
Minimal surface
Institute of Natural Sciences
Fin