Math6015 Lecture 04


Convex Optimization

Lecture 04: Chapter 2 - 4

Convex Sets, Convex Functions and Convex Optimization Problems

Jingwei Liang
Institute of Natural Sciences and School of Mathematical Sciences

Email: [email protected]
Office: Room 355, No. 6 Science Building
Previously
We have seen optimization problems of the form
Norm approximation
minimize ||Ax − b||.
Penalty function approximation problem
minimize $\phi(r_1) + \cdots + \phi(r_m)$,
subject to r = Ax − b.
Least-norm problem
minimize ||x||,
subject to Ax = b.
MAP estimation problem
maximize $\log p_{y|x}(x, y) + \log p_x(x)$.

2
Convexity • /65
Optimization problems

minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m,
gi (x) = 0, i = 1, ..., p.

x ∈ Rn is (vector) variable to be chosen.

f0 : Rn → R is the objective function.
fi : Rn → R, i = 1, ..., m are inequality constraint functions.
gi : Rn → R, i = 1, ..., p are equality constraint functions.
Variants: maximize the objective, multiple objectives, ...

Find good (or best) actions
x represents some actions
trades in a portfolio
airplane control surface deflections
schedule or assignment
resource allocation
Constraints limit actions or impose conditions on outcome.
The smaller (or larger) the objective f0 (x), the better:
total cost (or negative profit)
deviation from desired or target outcome
risk
fuel use

Find good models
x represents parameters of a model.
Constraints impose requirements on model parameters (e.g. nonnegativity).
The objective f0 (x) is the sum of two terms:
a prediction error (or loss) on some observed data
a (regularization) term that penalizes model complexity

Worst-case analysis
Variables are actions or parameters out of our control (and possibly under the control of an
adversary).
Constraints limit the possible values of the parameters.
Minimizing −f0 (x) finds worst possible parameter values
If the worst possible value of f0 (x) is tolerable, you’re OK.
It is good to know what the worst possible scenario can be.

Convex optimization problems

minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m,
Ax = b.

Variable x ∈ Rn .
f0 : Rn → R is the objective function.
Equality constraints are linear.
fi : Rn → R, i = 0, 1, ..., m are convex.

When is an optimization problem hard to solve?
Classical view
Linear is easy.
Non-linear is hard.
But this view is wrong!
The correct view
Convex is easy.
Nonconvex (negative curvature) is hard.

Solving convex optimization problems
Many different algorithms (that run on many platforms)
Interior-point methods for up to 10000s of variables.
First-order methods for larger problems
Do not require initial point, babysitting, or tuning...
Can develop and deploy quickly using modeling languages such as CVXPY.
Solvers are reliable, so they can be embedded in other systems.
Code generation yields real-time solvers that execute in milliseconds.

Brief history
Algorithms
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco & McCormick, Dikin, …)
1970s: ellipsoid method and other subgradient methods
1980s & 90s: interior-point methods (Karmarkar, Nesterov & Nemirovski)
Since 2000s: many methods for large-scale convex optimization
Applications
Before 1990: mostly in operations research, a few in engineering
Since 1990: many applications in engineering (control, signal processing, communications,
circuit design, …)
Since 2000s: image processing, machine learning and statistics, finance

Chapter 2 Convex sets

Affine and Convex Sets
Some Important Examples
Operations that Preserve Convexity

2.1 Affine and Convex Sets
Line segments
Definition. Lines and line segments
The line through two points x1, x2 ∈ E with x1 ≠ x2 is the set of points y of the form
y = θx1 + (1 − θ)x2 = x2 + θ(x1 − x2), θ ∈ R.

Restricting θ ∈ [0, 1] gives the line segment between x1 and x2.

Affine sets
Definition. Affine set
A set C ⊆ Rn is called affine if for any x1, x2 ∈ C and θ ∈ R, there holds the inclusion
θx1 + (1−θ)x2 ∈ C.

Affine combination Given x1, x2, ..., xk ∈ Rn, a point of the form
$$ y = \sum_{i=1}^k \theta_i x_i, \quad \text{where } \sum_{i=1}^k \theta_i = 1,\ \theta_i \in \mathbb{R}, $$
is called an affine combination of x1, x2, ..., xk.

Fact. Affine set and subspace
If C is an affine set and x0 ∈ C, then the set
$$ S \overset{\text{def}}{=} C - x_0 = \{ x - x_0 \mid x \in C \} $$
is a subspace.

Example. Solutions of linear equations
The set of solutions of linear equations
$$ C \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid Ax = b \}, $$
where A ∈ Rm×n and b ∈ Rm, is an affine set.

The associated subspace C − x0, for any x0 ∈ C, is the nullspace of A.
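The claim that C is affine can be checked numerically. The sketch below assumes NumPy; the matrix A, the vector b and the helper `affine_comb` are hypothetical choices for illustration. It builds several solutions of Ax = b by moving along the nullspace of A and confirms that affine combinations, including those with θ outside [0, 1], stay in C.

```python
import numpy as np

# Hypothetical data: 2 equations in 3 unknowns, A has full row rank.
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# A particular solution x0 (least-norm solution via the pseudoinverse).
x0 = np.linalg.pinv(A) @ b

# rank(A) = 2 < n = 3, so the last right-singular vector spans the nullspace of A.
_, _, Vt = np.linalg.svd(A)
v = Vt[-1]

# Two more points of C = {x | Ax = b}, obtained by moving along the nullspace.
x1 = x0 + 3.0 * v
x2 = x0 - 1.5 * v

def affine_comb(theta, p, q):
    """Return theta * p + (1 - theta) * q for any real theta (not only [0, 1])."""
    return theta * p + (1.0 - theta) * q

for theta in (-2.0, 0.3, 5.0):
    y = affine_comb(theta, x1, x2)
    print(theta, np.allclose(A @ y, b))    # True for every theta

# x1 - x0 lies in the subspace C - x0, i.e. in the nullspace of A.
print(np.allclose(A @ (x1 - x0), 0.0))     # True
```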

Affine hull For a set C ⊆ E, the affine hull of C, denoted by aff(C), is the intersection of all affine sets containing C:
$$ \mathrm{aff}(C) \overset{\text{def}}{=} \Big\{ \sum_{i=1}^k \theta_i x_i \;\Big|\; \sum_{i=1}^k \theta_i = 1,\ x_i \in C \Big\}. $$

aff(C) is by itself an affine set, and it is the smallest affine set containing C (w.r.t. inclusion).

Affine dimension and relative interior
Definition. Affine dimension
The affine dimension of a set C is the dimension of its affine hull.

Remark This is not always consistent with other definitions of dimension: the unit circle in R2 has affine dimension 2, although it is a one-dimensional curve.
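Relative interior If the affine dimension of C is less than n, then aff(C) ≠ Rn and the interior of C is empty. The relative interior of C is its interior within aff(C):
$$ \mathrm{relint}(C) \overset{\text{def}}{=} \{ x \in C \mid B(x, r) \cap \mathrm{aff}(C) \subseteq C \text{ for some } r > 0 \}. $$

Example Consider the set
$$ C \overset{\text{def}}{=} \{ x \in \mathbb{R}^3 \mid x_1 \in [-1, 1],\ x_2 \in [-1, 1],\ x_3 = 0 \}. $$
Its affine hull is the plane {x ∈ R3 | x3 = 0}, so int(C) = ∅, while relint(C) = {x ∈ R3 | x1 ∈ (−1, 1), x2 ∈ (−1, 1), x3 = 0}.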

Convex sets
Definition. Convex set
A subset S of E is convex if for any x, y ∈ S and θ ∈ [0, 1], there holds
θx + (1 − θ)y ∈ S.
The point θx + (1 − θ)y is called a convex combination of x and y.

Convex combination Given x1, x2, ..., xk and θ1, θ2, ..., θk ∈ R+ such that $\sum_i \theta_i = 1$, the point
θ1x1 + θ2x2 + · · · + θkxk
is called a convex combination of x1, x2, ..., xk.

Convex hull The convex hull of a set C, denoted by conv(C), is the set of all convex combinations of points in C:
$$ \mathrm{conv}(C) \overset{\text{def}}{=} \Big\{ \sum_{i=1}^k \theta_i x_i \;\Big|\; x_i \in C,\ \theta_i \in \mathbb{R}_+,\ \sum_{i=1}^k \theta_i = 1 \Big\}. $$
It is the smallest convex set that contains C.

The notion extends to infinite sums, integrals and means (expectations).

Cones
Definition. Convex cone
Given a set C ⊂ Rn , we call C a cone (or nonnegative homogeneous) if for any x ∈ C and θ ≥ 0,
there holds
θx ∈ C.
If, moreover, C is convex, then it is called a convex cone.

Conic combination (non-negative linear combination) Let {x1, x2, ..., xk} be a set of vectors; then
$$ \sum_{i=1}^k \theta_i x_i \quad \text{with } \theta_i \ge 0,\ i = 1, 2, ..., k, $$
is called a conic combination of {x1, x2, ..., xk}.

Conic hull The conic hull of a set C is the set of all conic combinations of points in C,
$$ \Big\{ \sum_{i=1}^k \theta_i x_i \;\Big|\; x_i \in C,\ \theta_i \in \mathbb{R}_+ \Big\}, $$
which is the smallest convex cone that contains C.

Compare: affine combinations (weights sum to one), convex combinations (nonnegative weights that sum to one) and conic combinations (nonnegative weights).
2.2 Some Important Examples
Hyperplanes and half spaces

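Hyperplane Given a ∈ Rn with a ≠ 0 and b ∈ R, the set
$$ H \overset{\text{def}}{=} \{ x \mid a^\top x = b \} $$
is convex.

Halfspace Given a ∈ Rn with a ≠ 0 and b ∈ R, the set
$$ H \overset{\text{def}}{=} \{ x \mid a^\top x \le b \} $$
is convex.

Ray Given any x0 ∈ Rn and d ∈ Rn, the set
$$ H \overset{\text{def}}{=} \{ x_0 + \theta d \mid \theta \ge 0 \} $$
is convex.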

Euclidean balls and ellipsoids

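Euclidean ball Let xc ∈ Rn and r > 0; then
$$ B(x_c, r) \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid \|x - x_c\|_2 \le r \} $$
is convex.

Ellipsoids Let xc ∈ Rn and P = P⊤ ≻ 0; then
$$ \mathcal{E} \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid (x - x_c)^\top P^{-1} (x - x_c) \le 1 \} $$
is convex.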

Norm balls and norm cones

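Norm balls Let xc ∈ Rn, r > 0 and p ≥ 1; then
$$ B_p(x_c, r) \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid \|x - x_c\|_p \le r \} $$
is convex.

Norm cone The norm cone associated with the norm || · || is the set
$$ C \overset{\text{def}}{=} \{ (x, t) \mid \|x\| \le t \} \subseteq \mathbb{R}^{n+1}. $$

Second-order cone The second-order cone is the norm cone for the Euclidean norm:
$$ C \overset{\text{def}}{=} \{ (x, t) \in \mathbb{R}^{n+1} \mid \|x\|_2 \le t \} = \{ (x, t) \in \mathbb{R}^{n+1} \mid x^\top x \le t^2,\ t \ge 0 \}. $$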

Polyhedra

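Polyhedron A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:
$$ \mathcal{P} \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid a_i^\top x \le b_i,\ i = 1, ..., m, \text{ and } c_j^\top x = d_j,\ j = 1, ..., p \}. $$

It is the intersection of a finite number of halfspaces and hyperplanes.

Affine sets (e.g., subspaces, hyperplanes, lines), rays, line segments, and halfspaces are all polyhedra.

Alternative notation
$$ \mathcal{P} \overset{\text{def}}{=} \{ x \in \mathbb{R}^n \mid Ax \preceq b \text{ and } Cx = d \}, $$
where ⪯ denotes componentwise inequality and the rows of A and C are $a_i^\top$ and $c_j^\top$.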
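A minimal membership test for the notation P = {x | Ax ⪯ b, Cx = d} might look as follows, assuming NumPy. The helper `in_polyhedron` and the box data are hypothetical, and the tolerance `tol` is an assumption that absorbs floating-point rounding.

```python
import numpy as np

def in_polyhedron(x, A, b, C=None, d=None, tol=1e-9):
    """Check x ∈ {x | Ax ⪯ b, Cx = d}; ⪯ is componentwise, tol absorbs rounding."""
    ok = bool(np.all(A @ x <= b + tol))
    if C is not None:
        ok = ok and bool(np.allclose(C @ x, d, atol=tol))
    return ok

# Hypothetical example: the box [0, 1]^2 written as four halfspaces.
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

p = np.array([0.2, 0.9])
q = np.array([1.0, 0.0])
print(in_polyhedron(p, A, b), in_polyhedron(q, A, b))   # True True
print(in_polyhedron(np.array([1.5, 0.5]), A, b))        # False

# Convexity: points on the segment between p and q stay inside.
print(all(in_polyhedron(t * p + (1 - t) * q, A, b)
          for t in np.linspace(0.0, 1.0, 11)))          # True

# Adding the equality x1 + x2 = 1 cuts the box down to a line segment.
C = np.array([[1.0, 1.0]])
d = np.array([1.0])
print(in_polyhedron(np.array([0.3, 0.7]), A, b, C, d))  # True
print(in_polyhedron(p, A, b, C, d))                     # False
```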


Simplexes Let v0, v1, ..., vk ∈ Rn be affinely independent (i.e. v1 − v0, ..., vk − v0 are linearly independent). The simplex determined by them is given by
$$ C \overset{\text{def}}{=} \mathrm{conv}\{v_0, v_1, ..., v_k\} = \{ \theta_0 v_0 + \theta_1 v_1 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf{1}^\top \theta = 1 \}. $$

A 1D simplex is a line segment, a 2D simplex is a triangle and a 3D simplex is a tetrahedron.

The unit simplex is the n-dimensional simplex determined by the zero vector and the unit vectors, i.e. 0, e1, ..., en ∈ Rn, and can be expressed as
$$ \{ x \in \mathbb{R}^n \mid x \succeq 0 \text{ and } \mathbf{1}^\top x \le 1 \}. $$

The probability simplex is the (n−1)-dimensional simplex determined by the unit vectors e1, ..., en ∈ Rn. It is the set
$$ \{ x \in \mathbb{R}^n \mid x \succeq 0 \text{ and } \mathbf{1}^\top x = 1 \}. $$
Convex hull description of polyhedra The convex hull of the finite set {v1, v2, ..., vk} is
$$ C \overset{\text{def}}{=} \{ \theta_1 v_1 + \theta_2 v_2 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf{1}^\top \theta = 1 \}. $$
This set is a bounded polyhedron.

The positive semidefinite cone

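The set of all n × n symmetric matrices is denoted by Sn:
$$ \mathbb{S}^n \overset{\text{def}}{=} \{ A \in \mathbb{R}^{n \times n} \mid A = A^\top \}. $$
Sn is a vector space with dimension n(n + 1)/2.

Set of symmetric positive semidefinite matrices
$$ \mathbb{S}^n_+ \overset{\text{def}}{=} \{ A \in \mathbb{S}^n \mid A \succeq 0 \}. $$
Sn+ is a convex cone.

Set of symmetric positive definite matrices
$$ \mathbb{S}^n_{++} \overset{\text{def}}{=} \{ A \in \mathbb{S}^n \mid A \succ 0 \}. $$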
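One way to test membership in Sn+ numerically is through the smallest eigenvalue. The sketch assumes NumPy (`numpy.linalg.eigvalsh`); `is_psd`, its tolerance and the random Gram matrices are hypothetical. It illustrates that convex combinations and nonnegative multiples of PSD matrices remain PSD.

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """X ∈ S^n_+ iff X is symmetric and its smallest eigenvalue is >= 0 (up to tol)."""
    X = np.asarray(X, dtype=float)
    if not np.allclose(X, X.T):
        return False
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

rng = np.random.default_rng(0)
B1 = rng.standard_normal((4, 2))
B2 = rng.standard_normal((4, 3))
X, Y = B1 @ B1.T, B2 @ B2.T      # Gram matrices B Bᵀ are PSD (here rank-deficient)

print(is_psd(X), is_psd(Y))                      # True True
theta = 0.3
print(is_psd(theta * X + (1 - theta) * Y))       # True: S^n_+ is convex
print(is_psd(2.5 * X))                           # True: S^n_+ is a cone
print(is_psd(np.array([[1.0, 2.0],
                       [2.0, 1.0]])))            # False: eigenvalues 3 and -1
```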

2.3 Operations that Preserve Convexity
Showing a set is convex
Methods for establishing convexity of a set C
1. Use the definition (recommended only for very simple sets).
2. Use convex functions (next lecture).
3. Show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls,...) by
operations that preserve convexity
Intersection
Affine mapping
Perspective mapping
Linear-fractional mapping

Intersection

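Intersection If S1 and S2 are convex, then
S1 ∩ S2
is convex.

This extends to the intersection of an infinite number of sets. The empty set ∅ is convex.

Example Consider the set
$$ S = \{ x \in \mathbb{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3 \}, \quad \text{where } p(t) = \sum_{k=1}^m x_k \cos(kt). $$
S is convex, since $S = \bigcap_{|t| \le \pi/3} S_t$, where each $S_t = \{ x \mid -1 \le (\cos t, \dots, \cos mt)^\top x \le 1 \}$ is a slab, i.e. the intersection of two halfspaces.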
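The set S involves infinitely many constraints, one per t. The sketch below checks a finite sample of them, assuming NumPy; `p`, `in_S_sampled` and the grid size are hypothetical. Since only finitely many slabs are checked, it tests membership in a superset of S, so a True result is evidence rather than proof.

```python
import numpy as np

def p(x, t):
    """p(t) = sum_{k=1}^m x_k cos(k t), vectorised over an array of t values."""
    k = np.arange(1, len(x) + 1)
    return np.cos(np.outer(t, k)) @ x

def in_S_sampled(x, num=2001):
    """Approximate test: |p(t)| <= 1 on a grid of |t| <= pi/3.
    Each grid point is one slab constraint, so this checks an outer approximation of S."""
    t = np.linspace(-np.pi / 3, np.pi / 3, num)
    return bool(np.all(np.abs(p(x, t)) <= 1.0 + 1e-12))

u = np.array([0.5, 0.0])    # p(t) = 0.5 cos t
w = np.array([0.0, 0.9])    # p(t) = 0.9 cos 2t
print(in_S_sampled(u), in_S_sampled(w))          # True True
print(in_S_sampled(0.5 * u + 0.5 * w))           # True: convex combination stays in S
print(in_S_sampled(np.array([1.0, 1.0])))        # False: p(0) = 2
```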
k=1

Affine functions

Affine functions A function f : Rn → Rm is called affine if it is a sum of a linear function and a constant,
f (x) = Ax + b
where A ∈ Rm×n and b ∈ Rm.

Suppose S ⊆ Rn is convex and f : Rn → Rm is affine; then the image of S under f,
$$ f(S) \overset{\text{def}}{=} \{ f(x) \mid x \in S \}, $$
is convex.

If S ⊆ Rn is convex and f : Rk → Rn is affine, the inverse image of S under f,
$$ f^{-1}(S) \overset{\text{def}}{=} \{ x \in \mathbb{R}^k \mid f(x) \in S \}, $$
is convex.
Examples
Scaling
Translation
Projection
Sum/difference
Partial sum

Perspective functions

The perspective function The perspective function P : Rn+1 → Rn, with domain dom(P) = Rn × R++, is defined as
$$ P(z, t) = \frac{z}{t}. $$

Remark The perspective function scales or normalizes vectors so the last component is one, and then drops the last component.
If C ⊆ dom(P) is convex, then its image
$$ P(C) \overset{\text{def}}{=} \{ P(x) \mid x \in C \} $$
is convex.

A linear-fractional function is formed by composing the perspective function with an affine function. Suppose g : Rn → Rm+1 is affine, i.e.
$$ g(x) = \begin{bmatrix} A \\ c^\top \end{bmatrix} x + \begin{pmatrix} b \\ d \end{pmatrix}, $$
where A ∈ Rm×n, b ∈ Rm, c ∈ Rn and d ∈ R. The function f : Rn → Rm given by f = P ∘ g,
$$ f(x) = \frac{Ax + b}{c^\top x + d}, \quad \mathrm{dom}(f) = \{ x \in \mathbb{R}^n \mid c^\top x + d > 0 \}, $$
is called a linear-fractional function.
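The mechanism behind convexity of such images is that line segments are mapped to line segments. The sketch below (assuming NumPy; the data A, b, c, d and the name `linear_fractional` are hypothetical) checks that f(θx + (1 − θ)y) = μf(x) + (1 − μ)f(y) with μ = θ(c⊤x + d) / (θ(c⊤x + d) + (1 − θ)(c⊤y + d)) ∈ [0, 1].

```python
import numpy as np

# Hypothetical data for f(x) = (Ax + b) / (cᵀx + d) with m = n = 2.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([0.5, -1.0])
c = np.array([1.0, 1.0])
d = 1.0

def linear_fractional(x):
    den = c @ x + d
    if den <= 0:
        raise ValueError("x is outside dom(f) = {x | cᵀx + d > 0}")
    return (A @ x + b) / den

x, y = np.array([0.0, 1.0]), np.array([2.0, 0.5])   # both in dom(f)
for theta in np.linspace(0.0, 1.0, 6):
    z = theta * x + (1 - theta) * y
    # Weight mu of f(x) in the image point; it lies in [0, 1].
    wx, wy = theta * (c @ x + d), (1 - theta) * (c @ y + d)
    mu = wx / (wx + wy)
    assert 0.0 <= mu <= 1.0
    assert np.allclose(linear_fractional(z),
                       mu * linear_fractional(x) + (1 - mu) * linear_fractional(y))
print("segment [x, y] is mapped onto segment [f(x), f(y)]")
```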

Chapter 3 Convex functions

Preliminaries
Basic Properties and Examples
Operations that Preserve Convexity
The Conjugate Function

3.1 Preliminaries
Differentiability
Given a function f : Rn → Rm and a point x0 ∈ Rn, we wish to find an affine function A(x) that approximates f near x0, such that
A(x0) = f(x0),
A(x) = L(x − x0) + f(x0) with L linear, and
A(x) approaches f(x) faster than x approaches x0, that is
$$ \lim_{x \to x_0,\ x \in \mathrm{dom}(f)} \frac{\|f(x) - A(x)\|}{\|x - x_0\|} = 0. $$

A function f : S → Rm, S ⊂ Rn, is said to be differentiable at x0 ∈ S if there exists an affine function that approximates f near x0; that is,
$$ \lim_{x \to x_0,\ x \in \mathrm{dom}(f)} \frac{\|f(x) - (L(x - x_0) + f(x_0))\|}{\|x - x_0\|} = 0. $$
The linear function L is determined uniquely by f and x0 and is called the derivative of f at x0.

Remark The function f is said to be differentiable on S if f is differentiable at every point of its domain S.
The derivative matrix
A linear function (transform) from Rn to Rm is represented by an m × n matrix. To find the matrix that represents the derivative L of a differentiable function f, we use the natural basis
{e1, e2, ..., en}
of Rn.

Let
xj = x0 + tej, j = 1, 2, ..., n.
By definition of the derivative, for j = 1, 2, ..., n,
$$ \lim_{t \to 0} \frac{\|f(x_j) - (tLe_j + f(x_0))\|}{|t|} = 0 \quad \Longrightarrow \quad \lim_{t \to 0} \frac{f(x_0 + te_j) - f(x_0)}{t} = Le_j. $$

The limit on the right is the partial derivative
$$ \frac{\partial f}{\partial x_j}(x_0), $$
so the j-th column of L is ∂f/∂xj(x0).

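As f : Rn → Rm,
$$ f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{pmatrix} \quad \text{and} \quad \frac{\partial f}{\partial x_j}(x_0) = \begin{pmatrix} \frac{\partial f_1}{\partial x_j}(x_0) \\ \vdots \\ \frac{\partial f_m}{\partial x_j}(x_0) \end{pmatrix}. $$

The derivative matrix L has the form
$$ L = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \cdots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x_0) & \cdots & \frac{\partial f_m}{\partial x_n}(x_0) \end{pmatrix}, $$
which is called the Jacobian matrix.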
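The column-by-column construction of L translates directly into a finite-difference approximation. This sketch assumes NumPy and uses the central difference (f(x0 + te_j) − f(x0 − te_j))/(2t) instead of the one-sided quotient above, a choice made here for accuracy; `jacobian_fd`, the step t and the test function f : R2 → R3 are hypothetical.

```python
import numpy as np

def jacobian_fd(f, x0, t=1e-6):
    """Approximate the m×n Jacobian of f at x0, one column per basis vector e_j."""
    x0 = np.asarray(x0, dtype=float)
    f0 = np.asarray(f(x0))
    J = np.empty((f0.size, x0.size))
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = 1.0
        # Column j ≈ ∂f/∂x_j(x0), via a central difference.
        J[:, j] = (np.asarray(f(x0 + t * e)) - np.asarray(f(x0 - t * e))) / (2 * t)
    return J

# Hypothetical example f : R^2 -> R^3 with a hand-computed Jacobian.
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian_exact(x):
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2 * x[1]]])

x0 = np.array([0.7, -1.2])
print(jacobian_fd(f, x0).shape)                                          # (3, 2)
print(np.allclose(jacobian_fd(f, x0), jacobian_exact(x0), atol=1e-6))    # True
```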

Gradient and Hessian
Let f : Rn → R and let S ⊂ dom(f) be an open set. If for every point x ∈ S and i = 1, ..., n, the partial derivative
$$ \frac{\partial f(x)}{\partial x_i} = \lim_{t \to 0} \frac{f(x + te_i) - f(x)}{t} $$
exists and is continuous, we say f is continuously differentiable, denoted f ∈ C1.

The gradient of f at x ∈ S ⊂ dom(f) is the n-dimensional vector
$$ \nabla f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix} \in \mathbb{R}^n. $$

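If for every point x ∈ S, i = 1, ..., n and j = 1, ..., n, the second-order partial derivative
$$ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} $$
exists and is continuous, we say f is twice continuously differentiable, denoted f ∈ C2.

The Hessian matrix of f at x ∈ S ⊂ dom(f) is the (n × n)-matrix
$$ \nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_n} \end{pmatrix} \in \mathbb{R}^{n \times n}. $$

Remark For f ∈ C2 the Hessian matrix is square and symmetric, whereas the Jacobian matrix in general is neither square nor symmetric!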
Example. Quadratic function
Consider
$$ f(x) = \frac{1}{2} x^\top P x + q^\top x + r, $$
where P ∈ Sn, q ∈ Rn and r ∈ R. Then ∇f(x) = Px + q and ∇2f(x) = P.
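The formulas ∇f(x) = Px + q and ∇2f(x) = P can be sanity-checked with central differences. A sketch assuming NumPy, with randomly generated, hypothetical P, q and r:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
P = (M + M.T) / 2          # hypothetical symmetric P ∈ S^n
q = rng.standard_normal(n)
r = 0.5

def f(x):
    return 0.5 * x @ P @ x + q @ x + r

def grad(x):
    return P @ x + q       # ∇f(x) = Px + q (uses P = Pᵀ)

x = rng.standard_normal(n)
t = 1e-5
E = np.eye(n)
# Central differences of f give the gradient; of grad, the Hessian (row by row).
g_fd = np.array([(f(x + t * E[i]) - f(x - t * E[i])) / (2 * t) for i in range(n)])
H_fd = np.array([(grad(x + t * E[i]) - grad(x - t * E[i])) / (2 * t) for i in range(n)])
print(np.allclose(g_fd, grad(x), atol=1e-6), np.allclose(H_fd, P, atol=1e-6))   # True True
```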

Taylor expansion
Let S ⊂ Rn be an open set and f ∈ C1; then the 1st-order Taylor expansion of f at x ∈ S reads
f(y) = f(x) + ⟨∇f(x) | y − x⟩ + o(||y − x||).

Let S ⊂ Rn be an open set and f ∈ C2; then the 2nd-order Taylor expansion of f at x ∈ S reads
$$ f(y) = f(x) + \langle \nabla f(x) \mid y - x \rangle + \frac{1}{2} \langle y - x \mid \nabla^2 f(x)(y - x) \rangle + o(\|y - x\|^2). $$

Chain rule
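Suppose f : Rn → Rm is differentiable at x ∈ int(dom f) and g : Rm → Rp is differentiable at f(x) ∈ int(dom g). Define the composition h : Rn → Rp by
h(x) = g(f(x)).
Then h is differentiable at x, with derivative
Dh(x) = Dg(f(x)) Df(x),
where D denotes the derivative (Jacobian) matrix. For p = 1 the gradient is ∇h(x) = Dh(x)⊤ = Df(x)⊤∇g(f(x)).

Example. Quadratic function
Let A ∈ Rm×n and b ∈ Rm; then
$$ \nabla \Big( \frac{1}{2} \|Ax - b\|^2 \Big) = A^\top (Ax - b). $$

Proposition. General chain rule
$$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial b} \cdots \frac{\partial v}{\partial x} \quad \text{and} \quad \frac{d}{dx} f\big(g_1(x), \dots, g_k(x)\big) = \sum_{i=1}^k \frac{\partial f}{\partial g_i(x)} \frac{\partial g_i(x)}{\partial x}. $$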
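The least-squares example follows from the chain rule with inner map x ↦ Ax − b (derivative A) and outer map u ↦ ½||u||2 (gradient u). A finite-difference check, assuming NumPy and hypothetical random data; `h` and `grad_h` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))      # hypothetical data
b = rng.standard_normal(5)

def h(x):
    """h = g ∘ f with f(x) = Ax - b and g(u) = ||u||^2 / 2."""
    r = A @ x - b
    return 0.5 * r @ r

def grad_h(x):
    # Chain rule: Df(x) = A and ∇g(u) = u, hence ∇h(x) = Aᵀ(Ax - b).
    return A.T @ (A @ x - b)

x = rng.standard_normal(3)
t = 1e-6
g_fd = np.array([(h(x + t * e) - h(x - t * e)) / (2 * t) for e in np.eye(3)])
print(np.allclose(g_fd, grad_h(x), atol=1e-5))   # True
```

Setting ∇h(x) = 0 recovers the normal equations A⊤Ax = A⊤b, which is why a least-squares solution makes the gradient vanish.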

3.2 Basic Properties and Examples
Convex functions
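Definition. Convex function
Let S ⊂ Rn be a non-empty convex set. A function f : S → R is said to be convex if for any x, y ∈ S and any λ ∈ (0, 1), there holds
$$ f\big(\lambda x + (1 - \lambda) y\big) \le \lambda f(x) + (1 - \lambda) f(y). $$
It is said to be strictly convex if, for any x ≠ y in S and any λ ∈ (0, 1),
$$ f\big(\lambda x + (1 - \lambda) y\big) < \lambda f(x) + (1 - \lambda) f(y). $$

[Figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph of f.]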
Concave function f is concave if −f is convex. Affine functions are both convex and concave.

A function is convex if and only if it is convex when restricted to any line that intersects its domain, i.e. for all x ∈ dom(f) and v ∈ Rn the function
g(t) = f(x + tv)
is convex on dom(g) = {t | x + tv ∈ dom(f)}.

Extended-value extension
Definition. Extended-value extension
If f is convex, we define its extended-value extension f̃ : Rn → R ∪ {+∞} by
$$ \tilde f(x) = \begin{cases} f(x) & x \in \mathrm{dom}(f), \\ +\infty & x \notin \mathrm{dom}(f). \end{cases} $$
Then dom(f̃) = Rn and dom(f) = {x | f̃(x) < +∞}.

Indicator function
$$ \iota_C(x) = \begin{cases} 0 & x \in C, \\ +\infty & x \notin C. \end{cases} $$

First-order conditions
Theorem. First-order condition
Let S ⊂ Rn be a non-empty open convex set, and f a real-valued differentiable function from S to R. Then f is convex if and only if for any x, y ∈ S, there holds
f(y) ≥ f(x) + ⟨∇f(x) | y − x⟩.

[Figure: the tangent y ↦ f(x) + ⟨∇f(x) | y − x⟩ at (x, f(x)) lies below the graph of f.]

Necessity

Sufficiency

Corollary.
Let S ⊂ Rn be a non-empty open convex set, and f : S → R a convex function which is differentiable at x̄ ∈ S. Then for any x ∈ S, there holds
f(x) ≥ f(x̄) + ⟨∇f(x̄) | x − x̄⟩.

Second-order condition
Theorem. Second-order condition
Let S ⊂ Rn be a non-empty open convex set, and f a real-valued twice-differentiable function from S to R. Then f is convex if and only if
∇2f(x)
is positive semi-definite for any x ∈ S. Moreover, if
∇2f(x)
is positive definite for any x ∈ S, then f is strictly convex; the converse fails, e.g. f(x) = x⁴ is strictly convex on R although f″(0) = 0.

Example Consider the quadratic function f : Rn → R, with dom(f) = Rn, given by
$$ f(x) = \frac{1}{2} x^\top P x + q^\top x + r, $$
with P ∈ Sn, q ∈ Rn and r ∈ R. Since ∇2f(x) = P, f is convex if and only if P ⪰ 0.
Examples
Some real functions on R
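Exponential $e^{ax}$ is convex on R, for any a ∈ R.

Powers $x^a$, x > 0, are convex for a ≥ 1 or a ≤ 0, and concave for 0 ≤ a ≤ 1.

Powers of absolute value $|x|^a$ are convex on R for a ≥ 1.

Logarithm log x, x > 0, is concave.

Negative entropy x log x, x > 0, is convex.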

Norms Every norm on Rn is convex, by the triangle inequality and absolute homogeneity.

Max function f(x) = max{x1, x2, ..., xn} is convex.

Quadratic-over-linear function f(x, y) = x²/y, with dom(f) = R × R++, is convex.

Log-sum-exp $f(x) = \log\big(e^{x_1} + e^{x_2} + \cdots + e^{x_n}\big)$ is convex.
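Evaluating log-sum-exp directly overflows for large entries. A common remedy, sketched here assuming NumPy, is to factor out m = max_i x_i, since log Σ e^{x_i} = m + log Σ e^{x_i − m}. The gradient is the softmax vector, and the loop spot-checks the first-order condition at random points; the function names are hypothetical.

```python
import numpy as np

def logsumexp(x):
    """log(e^{x_1} + ... + e^{x_n}), computed stably by factoring out m = max_i x_i."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.sum(np.exp(x - m)))

def grad_logsumexp(x):
    """Gradient of log-sum-exp: the softmax vector e^{x_i} / sum_j e^{x_j}."""
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())
    return z / z.sum()

# np.exp(1000.0) overflows to inf, but the shifted form returns 1000 + log 2.
print(logsumexp([1000.0, 1000.0]))

# Spot check of f(y) >= f(x) + <grad f(x) | y - x> at random points.
rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert logsumexp(y) >= logsumexp(x) + grad_logsumexp(x) @ (y - x) - 1e-12
```

The shift also shows the bounds max_i x_i ≤ f(x) ≤ max_i x_i + log n.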

Geometric mean $f(x) = \big(\prod_{i=1}^n x_i\big)^{1/n}$, with dom(f) = Rn++, is concave.

Log-determinant f(X) = log det X, dom(f) = Sn++, is concave.

Associated sets

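Definition. α-sublevel set
The α-sublevel set of a function f : Rn → R is defined as
$$ C_\alpha \overset{\text{def}}{=} \{ x \in \mathrm{dom}(f) \mid f(x) \le \alpha \}. $$

Definition. Graph and epigraph
The graph of a function f : Rn → R is
$$ \mathrm{gra}(f) = \{ (x, f(x)) \mid x \in \mathrm{dom}(f) \} \subseteq \mathbb{R}^{n+1}. $$
The epigraph of f is
$$ \mathrm{epi}(f) = \{ (x, t) \mid x \in \mathrm{dom}(f),\ f(x) \le t \} \subseteq \mathbb{R}^{n+1}. $$

[Figure: gra(f) and the region epi(f) above it, over dom(f).]

Remark A function f is convex if and only if epi(f) is convex. The sublevel sets of a convex function are convex, but the converse does not hold.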

Supporting hyperplane Let S be a nonempty subset of Rn, let x ∈ S, and suppose that u ∈ Rn \ {0}. If
$$ \sup_{s \in S} \langle s \mid u \rangle \le \langle x \mid u \rangle, $$
then {y ∈ Rn | ⟨y | u⟩ = ⟨x | u⟩} is a supporting hyperplane of S at x, and x is a support point of S with normal vector u.

[Figure: for differentiable convex f, the hyperplane t = f(x) + ⟨∇f(x) | y − x⟩ supports epi(f) at (x, f(x)), with normal vector (∇f(x), −1)⊤.]

Jensen’s inequality and extensions
Proposition. Jensen’s inequality
If f is convex, x1 , ..., xk ∈ dom(f ), and θ1 , ..., θk ≥ 0 with θ1 + · · · + θk = 1, then
f (θ1 x1 + · · · + θk xk ) ≤ θ1 f (x1 ) + · · · + θk f (xk ).
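A numerical spot check of Jensen's inequality, assuming NumPy and taking f = exp as a hypothetical convex function; the weights are random and normalized to sum to one. With equal weights over samples, the inequality reads f(mean) ≤ mean of f.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    """A hypothetical convex function on R."""
    return np.exp(x)

# Finite form: f(θ1 x1 + ... + θk xk) <= θ1 f(x1) + ... + θk f(xk).
for _ in range(1000):
    xs = rng.normal(size=5)
    theta = rng.random(5)
    theta /= theta.sum()              # θ_i >= 0 and θ_1 + ... + θ_k = 1
    assert f(theta @ xs) <= theta @ f(xs) + 1e-12

# With equal weights 1/N over samples this is f(mean) <= mean of f.
samples = rng.normal(size=10_000)
print(f(samples.mean()) <= f(samples).mean())   # True
```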

3.3 Operations that Preserve Convexity
Nonnegative weighted sums
A non-negative weighted sum of convex functions,
f = w1f1 + w2f2 + · · · + wmfm,
where fi, i = 1, ..., m, are convex and wi, i = 1, ..., m, are non-negative, is convex.

Composition with an affine mapping
Suppose f : Rn → R, A ∈ Rn×m and b ∈ Rn. Define g : Rm → R by
$$ g(x) \overset{\text{def}}{=} f(Ax + b), \quad \mathrm{dom}(g) = \{ x \mid Ax + b \in \mathrm{dom}(f) \}. $$
If f is convex, then so is g.

Pointwise maximum and supremum
If f1 and f2 are convex functions, then their pointwise maximum f, defined by
$$ f(x) \overset{\text{def}}{=} \max\{ f_1(x), f_2(x) \}, $$
with dom(f) = dom(f1) ∩ dom(f2), is also convex.

Example Piecewise-linear functions $f(x) = \max\{a_1^\top x + b_1, \dots, a_L^\top x + b_L\}$ are convex.

The pointwise maximum property extends to the pointwise supremum over an infinite set of convex
functions.
If for each y ∈ A the function f(x, y) is convex in x, then the function g, defined by
$$ g(x) \overset{\text{def}}{=} \sup_{y \in A} f(x, y), $$
is convex in x. The domain of g reads
$$ \mathrm{dom}(g) \overset{\text{def}}{=} \{ x \mid (x, y) \in \mathrm{dom}(f) \text{ for all } y \in A,\ \sup_{y \in A} f(x, y) < \infty \}. $$

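Example. Support function
Let C ⊆ Rn be non-empty. The support function SC associated with the set C is defined as
$$ S_C(x) \overset{\text{def}}{=} \sup_{y \in C} \langle x \mid y \rangle, \quad \mathrm{dom}(S_C) \overset{\text{def}}{=} \{ x \mid \sup_{y \in C} \langle x \mid y \rangle < \infty \}. $$
It is convex, as a pointwise supremum of the linear functions x ↦ ⟨x | y⟩.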

Example. Distance to farthest point of a set
Let C ⊆ Rn. The distance (in any norm) to the farthest point of C,
$$ f(x) \overset{\text{def}}{=} \sup_{y \in C} \|x - y\|, $$
is convex, as a pointwise supremum of the convex functions x ↦ ||x − y||.

Example. Maximum eigenvalue of a symmetric matrix
The function f(X) = λmax(X), with dom(f) = Sm, is convex, since
$$ f(X) = \sup_{\|y\|_2 = 1} y^\top X y $$
is a pointwise supremum of functions that are linear in X.
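A sketch that illustrates the supremum representation and the resulting convexity, assuming NumPy (`numpy.linalg.eigvalsh` returns eigenvalues in ascending order); the matrices are random, hypothetical symmetric matrices and `lam_max` is an illustrative name.

```python
import numpy as np

def lam_max(X):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(X)[-1]

def sym(M):
    return (M + M.T) / 2

rng = np.random.default_rng(5)
X = sym(rng.standard_normal((4, 4)))
Y = sym(rng.standard_normal((4, 4)))

# Each unit vector y gives a function X -> yᵀXy, linear in X, lying below λmax(X).
for _ in range(100):
    y = rng.standard_normal(4)
    y /= np.linalg.norm(y)
    assert y @ X @ y <= lam_max(X) + 1e-12

# The supremum is attained at a unit eigenvector for the largest eigenvalue.
_, V = np.linalg.eigh(X)
v = V[:, -1]
print(np.isclose(v @ X @ v, lam_max(X)))   # True

# Convexity along the segment between X and Y.
for theta in np.linspace(0.0, 1.0, 11):
    Z = theta * X + (1 - theta) * Y
    assert lam_max(Z) <= theta * lam_max(X) + (1 - theta) * lam_max(Y) + 1e-12
```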

Composition
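Consider h : Rk → R and g : Rn → Rk; their composition f = h ∘ g : Rn → R is defined by
$$ f(x) \overset{\text{def}}{=} h(g(x)), \quad \mathrm{dom}(f) = \{ x \in \mathrm{dom}(g) \mid g(x) \in \mathrm{dom}(h) \}. $$
1D chain rule For k = n = 1 and twice-differentiable h and g,
$$ f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x). $$
Hence, when dom(g) = dom(h) = R, f is convex if, e.g., h is convex and nondecreasing and g is convex.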

Minimization
If f is convex in (x, y) and C is a convex non-empty set, then the function
$$ g(x) = \inf_{y \in C} f(x, y) $$
is convex in x, provided that g(x) > −∞ for all x.

Example. Distance to a set
The distance of a point x to a set S ⊆ Rn, in the norm || · ||, is defined as
$$ \mathrm{dist}(x, S) \overset{\text{def}}{=} \inf_{y \in S} \|x - y\|. $$

The function dist(·, S) is convex if the set S is convex.
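For a box S = {x | lo ⪯ x ⪯ hi} and the Euclidean norm, the infimum is attained at the componentwise clip of x onto the box, which gives a cheap way to evaluate dist(x, S). The sketch assumes NumPy; the box bounds and `dist_box` are hypothetical, and the loop spot-checks convexity along random segments.

```python
import numpy as np

# Hypothetical convex set S: the box {x in R^2 | lo <= x <= hi} (componentwise).
lo = np.array([0.0, 0.0])
hi = np.array([1.0, 2.0])

def dist_box(x):
    """Euclidean distance from x to the box: the nearest point is the componentwise clip."""
    return float(np.linalg.norm(x - np.clip(x, lo, hi)))

print(dist_box(np.array([0.5, 1.0])))   # 0.0, the point lies in S
print(dist_box(np.array([4.0, 6.0])))   # 5.0, nearest point (1, 2), offset (3, 4)

# Spot check of convexity along random segments.
rng = np.random.default_rng(6)
for _ in range(1000):
    x, y = 5 * rng.standard_normal(2), 5 * rng.standard_normal(2)
    t = rng.random()
    assert dist_box(t * x + (1 - t) * y) <= t * dist_box(x) + (1 - t) * dist_box(y) + 1e-12
```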

3.4 The Conjugate Function
Conjugate function
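Definition. Conjugate
Let f : Rn → ]−∞, +∞]; the conjugate of f is defined by
$$ f^*(y) \overset{\text{def}}{=} \sup_{x \in \mathbb{R}^n} \big( \langle x \mid y \rangle - f(x) \big). $$

Biconjugate f∗∗ = (f∗)∗.
Also called the Fenchel conjugate, Legendre transform, or Legendre-Fenchel transform.

[Figure: f∗(y) is the maximum gap between the linear function ⟨· | y⟩ and f, comparing gra(⟨· | y⟩) with gra(f).]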

Examples on R
Affine function f(x) = ax + b: f∗(y) = −b for y = a, and f∗(y) = +∞ otherwise.

Negative logarithm f(x) = −log x, x > 0: f∗(y) = −log(−y) − 1 for y < 0, and f∗(y) = +∞ otherwise.

Exponential f(x) = $e^x$: f∗(y) = y log y − y for y > 0, f∗(0) = 0, and f∗(y) = +∞ for y < 0.

Negative entropy f(x) = x log x, x > 0: f∗(y) = $e^{y-1}$ for all y ∈ R.

Inverse f(x) = 1/x, x > 0: f∗(y) = −2√(−y) for y ≤ 0, and f∗(y) = +∞ otherwise.
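These closed forms can be cross-checked by approximating the supremum over a fine grid of x. The sketch assumes NumPy; `conj_grid`, the grid ranges and the test points y are hypothetical. A grid only gives a lower bound on the true supremum, which is close here because each maximizer lies well inside its grid. The last lines spot-check f(x) + f∗(y) ≥ xy for f = exp.

```python
import numpy as np

def conj_grid(f, y, xs):
    """Approximate f*(y) = sup_x (x*y - f(x)) by the maximum over the grid xs."""
    return float(np.max(xs * y - f(xs)))

xr = np.linspace(-20.0, 20.0, 400_001)   # grid on R
xp = np.linspace(1e-6, 20.0, 400_001)    # grid on x > 0

checks = [
    # (name, f, grid, y, closed-form value of f*(y))
    ("exp",     np.exp,                  xr,  2.0, 2.0 * np.log(2.0) - 2.0),
    ("-log x",  lambda x: -np.log(x),    xp, -0.5, -np.log(0.5) - 1.0),
    ("x log x", lambda x: x * np.log(x), xp,  1.5, np.exp(1.5 - 1.0)),
    ("1/x",     lambda x: 1.0 / x,       xp, -4.0, -2.0 * np.sqrt(4.0)),
]
for name, f, grid, y, exact in checks:
    print(name, np.isclose(conj_grid(f, y, grid), exact, atol=1e-4))   # True

# f(x) + f*(y) >= x*y for f = exp, at random x and y > 0.
rng = np.random.default_rng(7)
x_s, y_s = rng.uniform(-5.0, 5.0, 1000), rng.uniform(0.01, 10.0, 1000)
print(np.all(np.exp(x_s) + y_s * np.log(y_s) - y_s >= x_s * y_s - 1e-12))   # True
```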

Strictly convex quadratic function
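Let Q ∈ Sn++ and consider
$$ f(x) = \frac{1}{2} x^\top Q x. $$
The supremum in the definition of f∗ is attained at x = Q⁻¹y, which gives $f^*(y) = \frac{1}{2} y^\top Q^{-1} y$.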
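A quick numerical check of f∗(y) = ½ y⊤Q⁻¹y, assuming NumPy; Q is a randomly generated, hypothetical positive definite matrix. The maximizer solves Qx = y, so `numpy.linalg.solve` is used rather than forming Q⁻¹ explicitly.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)          # hypothetical Q in S^n_{++}

def f(x):
    return 0.5 * x @ Q @ x

def f_conj(y):
    """f*(y): the maximizer of <x | y> - f(x) solves Qx = y."""
    x_star = np.linalg.solve(Q, y)
    return y @ x_star - f(x_star)

y = rng.standard_normal(n)
print(np.isclose(f_conj(y), 0.5 * y @ np.linalg.solve(Q, y)))   # True

# No sampled x gives a larger value of <x | y> - f(x).
for _ in range(1000):
    x = 3 * rng.standard_normal(n)
    assert x @ y - f(x) <= f_conj(y) + 1e-12
```

With Q = I this reduces to the self-conjugacy of ½||x||².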

Support function

Support function Let S ⊆ Rn be a non-empty convex set. The support function of S is defined by
$$ \sigma_S(y) \overset{\text{def}}{=} \sup_{x \in S} \langle x \mid y \rangle = \iota_S^*(y). $$

Let S be a linear subspace of Rn; then
$$ \sigma_S(y) = \iota_{S^\perp}(y). $$

Norm

Example. ℓ2-norm square
Let $f(x) = \frac{1}{2}\|x\|^2$; then
$$ f^*(y) = \frac{1}{2}\|y\|^2. $$

Self-conjugacy: f∗ = f.

Definition. Dual norm
Let || · || be a norm defined on Rn. Its dual norm, denoted by || · ||∗, is defined as
$$ \|y\|_* = \sup\{ \langle y \mid x \rangle : \|x\| \le 1 \}. $$

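Example. Conjugate of norm
Let f(x) = ||x|| be a norm with dual norm || · ||∗. Then
$$ f^*(y) = \begin{cases} 0 & \|y\|_* \le 1, \\ +\infty & \text{otherwise}, \end{cases} $$
i.e. f∗ is the indicator function of the dual-norm unit ball.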
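For the ℓ1 norm both facts can be seen concretely: the dual norm is the ℓ∞ norm, and f∗ is the indicator of the ℓ∞ unit ball. The sketch assumes NumPy; `dual_l1` and `conj_l1_on_ray` are hypothetical helpers, and the second one only evaluates ⟨x | y⟩ − ||x||1 along one ray, which is enough to exhibit unboundedness when ||y||∞ > 1.

```python
import numpy as np

def dual_l1(y):
    """Dual of the ℓ1 norm: sup of <y | x> over the ℓ1 unit ball.
    The ball is the convex hull of the 2n points ±e_i, and a linear function
    attains its maximum over a polytope at a vertex, so checking ±e_i suffices."""
    n = len(y)
    vertices = np.vstack([np.eye(n), -np.eye(n)])
    return float(np.max(vertices @ y))

y = np.array([0.5, -3.0, 2.0])
print(dual_l1(y), float(np.max(np.abs(y))))   # 3.0 3.0: the dual of ℓ1 is ℓ∞

def conj_l1_on_ray(y, ts=(1.0, 10.0, 100.0)):
    """Evaluate <x | y> - ||x||_1 along x = t * sign(y_i) * e_i, i = argmax |y_i|."""
    i = int(np.argmax(np.abs(y)))
    vals = []
    for t in ts:
        x = np.zeros_like(y)
        x[i] = t * np.sign(y[i])
        vals.append(float(x @ y - np.abs(x).sum()))
    return vals

# ||y||_inf <= 1: <x | y> <= ||x||_1 ||y||_inf <= ||x||_1, so f*(y) = 0 (at x = 0).
print(conj_l1_on_ray(np.array([0.5, -0.9])))   # all values <= 0
# ||y||_inf > 1: the values grow without bound along the ray, so f*(y) = +inf.
print(conj_l1_on_ray(np.array([0.5, -3.0])))   # [2.0, 20.0, 200.0]
```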

Properties

Convexity of conjugate f∗ is always closed and convex, even when f is not convex.

Fenchel–Young inequality Let f : Rn → ]−∞, +∞] be proper. Then for all x, y ∈ Rn,

f(x) + f∗(y) ≥ ⟨x | y⟩.

Let f, g be functions from Rn to [−∞, +∞]. Then
g∗∗ ≤ g, and
f ≤ g ⟹ f∗ ≥ g∗ and f∗∗ ≤ g∗∗.

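Let f : Rn → ]−∞, +∞]. Then for all α > 0,
$$ (\alpha f)^* = \alpha f^*(\cdot/\alpha) \quad \text{and} \quad \big(\alpha f(\cdot/\alpha)\big)^* = \alpha f^*. $$
Let K : Rn → Rn be a linear bijection with adjoint K∗. Then
$$ (f \circ K)^* = f^* \circ (K^*)^{-1}. $$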

The Fenchel-Moreau theorem
Theorem. Fenchel-Moreau
Let f : Rn →] − ∞, +∞] be proper. Then f is closed and convex if and only if f = f ∗∗ . In this case,
f ∗ is proper as well.

Corollary.
Let f ∈ Γ0(Rn), the class of proper, closed, convex functions on Rn. Then f∗ ∈ Γ0(Rn) and f∗∗ = f.
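The Fenchel–Moreau theorem can be visualized with a discrete Legendre transform: approximate f∗ by maximizing over a finite grid, then apply the same transform again to obtain f∗∗. A sketch assuming NumPy; the grid, the two test functions and the helper name `conjugate_on_grid` are illustrative choices, and grid maxima only approximate the true suprema. For a function in Γ0 the double transform reproduces f at the grid points, while for the nonconvex double well min((x − 1)², (x + 1)²) it returns the convex envelope, which is 0 on [−1, 1] and hence strictly below f on (−1, 1).

```python
import numpy as np

def conjugate_on_grid(fvals, xs, ys):
    """Discrete Legendre transform: returns g(y_j) = max_i (y_j * x_i - fvals[i])."""
    return np.max(np.outer(ys, xs) - fvals[None, :], axis=1)

xs = np.linspace(-3.0, 3.0, 601)   # grid step 0.01
ys = xs.copy()                     # slopes at which the conjugate is sampled

# Convex case: f(x) = x^2 / 2 lies in Gamma_0, so f** should reproduce f.
f_convex = 0.5 * xs**2
f2_convex = conjugate_on_grid(conjugate_on_grid(f_convex, xs, ys), ys, xs)

# Nonconvex case: the double well; f** is its convex envelope.
f_well = np.minimum((xs - 1.0)**2, (xs + 1.0)**2)
f2_well = conjugate_on_grid(conjugate_on_grid(f_well, xs, ys), ys, xs)

i0 = np.argmin(np.abs(xs))                     # grid index of x = 0
print(np.max(np.abs(f2_convex - f_convex)))    # rounding error only
print(f_well[i0], f2_well[i0])                 # f(0) = 1, but f**(0) = 0
```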

Calculus
Theorem. Subdifferential and conjugation
Let f ∈ Γ0(Rn) and let x, y ∈ Rn. Then the following are equivalent:
(i) (x, y) ∈ gra(∂f), i.e., y ∈ ∂f(x);
(ii) f(x) + f∗(y) = ⟨x | y⟩, i.e., equality holds in the Fenchel–Young inequality;
(iii) (y, x) ∈ gra(∂f∗), i.e., x ∈ ∂f∗(y).

Consequently, when f ∈ Γ0(Rn), ∂f∗ = (∂f)−1.
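A one-dimensional instance of the theorem: f(x) = |x| is in Γ0(R), its conjugate is the indicator of [−1, 1] (the conjugate-of-norm example with n = 1), ∂f(x) = {sign x} for x ≠ 0 and [−1, 1] at x = 0, and ∂f∗(y) is the normal cone of [−1, 1] at y. The plain-Python sketch below, with illustrative helper names, checks on a small grid of dyadic values (so the float comparisons are exact) that the three conditions agree.

```python
import itertools

def in_subdiff_abs(x, y):
    """Condition (i): y in the subdifferential of f(x) = |x|."""
    if x > 0:
        return y == 1.0
    if x < 0:
        return y == -1.0
    return -1.0 <= y <= 1.0

def fenchel_young_tight(x, y):
    """Condition (ii): |x| + f*(y) == x*y, with f* the indicator of [-1, 1]."""
    f_star = 0.0 if abs(y) <= 1.0 else float("inf")
    return abs(x) + f_star == x * y

def in_subdiff_conj(y, x):
    """Condition (iii): x in the subdifferential of f* at y (normal cone of [-1, 1])."""
    if abs(y) > 1.0:
        return False              # y outside dom f*, so the subdifferential is empty
    if abs(y) < 1.0:
        return x == 0.0           # interior point: normal cone is {0}
    return x >= 0.0 if y == 1.0 else x <= 0.0

xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ys = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
agree = all(
    in_subdiff_abs(x, y) == fenchel_young_tight(x, y) == in_subdiff_conj(y, x)
    for x, y in itertools.product(xs, ys)
)
print(agree)  # True
```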

Chapter 4
Optimization Problems
    Basic terminology
    Expressing problems in standard form
    Equivalent problems
Convex Optimization
    Convex optimization problems in standard form
    Local and global optima
    An optimality criterion for differentiable f0
Linear Optimization Problems
    Examples
Quadratic Optimization Problems
    Examples
    Second-order cone programming
4.1 Optimization Problems
Optimization problems
Consider the problem
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m
hi (x) = 0, i = 1, ..., p.

x ∈ Rn: optimization variable.
f0: objective function, cost function (loss function).
fi(x) ≤ 0: inequality constraints; fi: inequality constraint functions.
hi(x) = 0: equality constraints; hi: equality constraint functions.
Unconstrained: m = p = 0.

Optimization problems
Domain of the optimization problem

$$\mathcal{D} \overset{\text{def}}{=} \bigcap_{i=0}^{m} \operatorname{dom}(f_i) \;\cap\; \bigcap_{i=1}^{p} \operatorname{dom}(h_i).$$

A point x ∈ D is feasible if it satisfies all the constraints; otherwise it is infeasible.

Feasible set (constraint set): the set of all feasible points.
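One way to encode a problem in the standard form above, together with the feasibility test. A sketch with a hypothetical `StandardFormProblem` class (not from any library); it assumes every fi and hi is defined on all of Rn, so that D = Rn, and it accepts constraint violations up to a tolerance `tol`, since exact equality is rarely meaningful in floating point.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

import numpy as np

Func = Callable[[np.ndarray], float]

@dataclass
class StandardFormProblem:
    """minimize f0(x) subject to f_i(x) <= 0 (i = 1..m) and h_i(x) = 0 (i = 1..p)."""
    f0: Func
    ineq: Sequence[Func] = field(default_factory=list)  # f_1, ..., f_m
    eq: Sequence[Func] = field(default_factory=list)    # h_1, ..., h_p

    def is_feasible(self, x, tol=1e-9):
        """x is feasible if every f_i(x) <= 0 and every h_i(x) = 0 (up to tol)."""
        return (all(fi(x) <= tol for fi in self.ineq)
                and all(abs(hi(x)) <= tol for hi in self.eq))

# Example data: minimize ||x||^2 subject to 1 - x_1 <= 0 and x_1 + x_2 - 2 = 0.
prob = StandardFormProblem(
    f0=lambda x: float(x @ x),
    ineq=[lambda x: 1.0 - x[0]],
    eq=[lambda x: x[0] + x[1] - 2.0],
)
print(prob.is_feasible(np.array([1.0, 1.0])))  # True
print(prob.is_feasible(np.array([0.0, 2.0])))  # False: violates x_1 >= 1
print(prob.is_feasible(np.array([2.0, 1.0])))  # False: violates x_1 + x_2 = 2
```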

Optimal and local optimal points
Optimal value p⋆
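$$p^\star \overset{\text{def}}{=} \inf\{f_0(x) \mid f_i(x) \le 0,\ i = 1, \dots, m,\ h_i(x) = 0,\ i = 1, \dots, p\}.$$

p⋆ can take the extended values ±∞: p⋆ = +∞ if the problem is infeasible (by the convention inf ∅ = +∞), and p⋆ = −∞ if the problem is unbounded below.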

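Optimal point: x⋆ is optimal if x⋆ is feasible and

f0(x⋆) = p⋆.
Set of optimal points
$$X_{\text{opt}} \overset{\text{def}}{=} \{x \mid f_0(x) = p^\star,\ f_i(x) \le 0,\ i = 1, \dots, m,\ h_i(x) = 0,\ i = 1, \dots, p\}.$$

Sub-optimal: for ε > 0, a feasible x with f0(x) ≤ p⋆ + ε is called ε-suboptimal.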
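To make p⋆ and Xopt concrete, here is a brute-force estimate on a one-dimensional grid. A sketch only, assuming NumPy: `estimate_opt` is a hypothetical helper, grid search is not a practical solver, and a finite grid cannot detect p⋆ = −∞. It does reproduce the convention inf ∅ = +∞ for infeasible problems and shows that Xopt may contain more than one point.

```python
import numpy as np

def estimate_opt(f0, ineq, xs, tol=1e-9):
    """Grid estimate of p* and X_opt for: minimize f0(x) s.t. f_i(x) <= 0 (1-D).

    Returns (p_star, list of optimal grid points). If no grid point is feasible,
    returns (+inf, []), matching the convention inf of the empty set = +inf.
    """
    feasible = [x for x in xs if all(fi(x) <= tol for fi in ineq)]
    if not feasible:
        return np.inf, []
    vals = np.array([f0(x) for x in feasible])
    p_star = float(vals.min())
    x_opt = [float(x) for x, v in zip(feasible, vals) if v <= p_star + tol]
    return p_star, x_opt

xs = np.linspace(-3.0, 3.0, 61)  # grid step 0.1

# minimize x^2 s.t. 1 - x <= 0: p* = 1, X_opt = {1}
print(estimate_opt(lambda x: x**2, [lambda x: 1.0 - x], xs))
# x + 1 <= 0 and 1 - x <= 0 cannot both hold: p* = +inf
print(estimate_opt(lambda x: x**2, [lambda x: x + 1.0, lambda x: 1.0 - x], xs))
# unconstrained (x^2 - 1)^2: p* = 0, attained at two points, X_opt = {-1, 1}
print(estimate_opt(lambda x: (x**2 - 1.0)**2, [], xs))
```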

Optimal and local optimal points
Locally optimal point: a feasible x is locally optimal if there exists R > 0 such that
$$f_0(x) = \inf\{f_0(z) \mid \|x - z\| \le R,\ f_i(z) \le 0,\ i = 1, \dots, m,\ h_i(z) = 0,\ i = 1, \dots, p\}.$$

Global and local optimal...

Activity of constraints Let x be feasible

Active: fi (x) = 0
Inactive: fi (x) < 0
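Determining which inequality constraints are active at a feasible point is a one-line test once a tolerance is fixed. A sketch assuming NumPy, with a hypothetical `active_set` helper and, as example data, the triangle {x ∈ R² | x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1} written in the form fi(x) ≤ 0.

```python
import numpy as np

def active_set(ineq, x, tol=1e-9):
    """1-based indices i with f_i(x) = 0 (up to tol); raises if x is infeasible."""
    vals = [fi(x) for fi in ineq]
    if any(v > tol for v in vals):
        raise ValueError("x is infeasible")
    return [i + 1 for i, v in enumerate(vals) if abs(v) <= tol]

# f_1(x) = -x_1, f_2(x) = -x_2, f_3(x) = x_1 + x_2 - 1
ineq = [lambda x: -x[0], lambda x: -x[1], lambda x: x[0] + x[1] - 1.0]
print(active_set(ineq, np.array([0.0, 0.5])))  # [1]: on the edge x_1 = 0
print(active_set(ineq, np.array([0.0, 1.0])))  # [1, 3]: a vertex
print(active_set(ineq, np.array([0.2, 0.3])))  # []: interior point
```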

Feasibility problems
Feasibility problem
find x
subject to fi (x) ≤ 0, i = 1, ..., m
hi (x) = 0, i = 1, ..., p.

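Alternative formulation
$$\text{minimize } \iota_C(x) + \iota_D(x),$$
where ι_S denotes the indicator function of a set S (0 on S, +∞ outside), and
$$C \overset{\text{def}}{=} \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m\}, \qquad D \overset{\text{def}}{=} \{x \in \mathbb{R}^n \mid h_i(x) = 0,\ i = 1, \dots, p\}.$$
The optimal value is 0 if the problem is feasible and +∞ otherwise.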
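The indicator formulation can be written down directly: the objective ιC + ιD is 0 at feasible points and +∞ elsewhere. A sketch assuming NumPy, with made-up constraints f1(x) = ||x||² − 1 and h1(x) = x1 − x2 and a hypothetical `indicator` helper; the tolerance is needed because floating-point equality constraints rarely hold exactly.

```python
import numpy as np

def indicator(pred):
    """Return iota_S for S = {x : pred(x)}: 0 on S, +inf outside."""
    return lambda x: 0.0 if pred(x) else np.inf

tol = 1e-9
iota_C = indicator(lambda x: x @ x - 1.0 <= tol)      # C: f_1(x) = ||x||^2 - 1 <= 0
iota_D = indicator(lambda x: abs(x[0] - x[1]) <= tol)  # D: h_1(x) = x_1 - x_2 = 0

def objective(x):
    return iota_C(x) + iota_D(x)

print(objective(np.array([0.5, 0.5])))  # 0.0: feasible
print(objective(np.array([1.0, 1.0])))  # inf: outside C
print(objective(np.array([0.5, 0.0])))  # inf: outside D
```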

Feasibility problems

Sudoku puzzle
Standard size: 9 × 9 grid. Other sizes: 4 × 4, 16 × 16, 25 × 25, ...
Each row contains all of the digits from 1 to 9.
Each column contains all of the digits from 1 to 9.
Each 3 × 3 block contains all of the digits from 1 to 9.
A partially completed grid is provided; at least 17 given numbers are needed for the puzzle to have a unique solution.
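Sudoku fits the feasibility template: there is nothing to minimize, only constraints to satisfy. A plain-Python sketch of the constraint check (the helper names are hypothetical); it verifies a completed grid against the row, column and block rules and against the given clues, but does not search for a solution.

```python
def is_solved_sudoku(grid):
    """True if every row, column and 3x3 block of a 9x9 grid contains 1..9."""
    digits = set(range(1, 10))
    rows = all(set(row) == digits for row in grid)
    cols = all({grid[r][c] for r in range(9)} == digits for c in range(9))
    blocks = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == digits
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return rows and cols and blocks

def respects_clues(grid, clues):
    """clues uses 0 for empty cells; every given number must be kept."""
    return all(clues[r][c] in (0, grid[r][c]) for r in range(9) for c in range(9))

# A valid completed grid built from a standard shifting pattern.
grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
clues = [[grid[r][c] if (r + c) % 4 == 0 else 0 for c in range(9)] for r in range(9)]
print(is_solved_sudoku(grid), respects_clues(grid, clues))  # True True

bad = [row[:] for row in grid]
bad[0][0], bad[0][1] = bad[0][1], bad[0][0]  # swap two cells in row 0
print(is_solved_sudoku(bad))  # False: columns 0 and 1 now repeat a digit
```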

Standard form
Standard form
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, ..., m
hi (x) = 0, i = 1, ..., p.

Example. Box constraints

Consider the problem
minimize f0 (x)
subject to ℓi ≤ xi ≤ ui , i = 1, ..., n
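Each two-sided bound ℓi ≤ xi ≤ ui splits into two inequalities of the form fi(x) ≤ 0, namely ℓi − xi ≤ 0 and xi − ui ≤ 0, giving 2n constraint functions in total. A sketch assuming NumPy with a hypothetical `box_to_standard_form` helper; note the `i=i` default argument, which binds the loop index when each lambda is created (a plain closure would see only the final value of i).

```python
import numpy as np

def box_to_standard_form(lower, upper):
    """Rewrite l_i <= x_i <= u_i as 2n constraints f(x) <= 0:
    f_i(x) = l_i - x_i and f_{n+i}(x) = x_i - u_i."""
    n = len(lower)
    ineq = []
    for i in range(n):
        ineq.append(lambda x, i=i: lower[i] - x[i])  # x_i >= l_i
    for i in range(n):
        ineq.append(lambda x, i=i: x[i] - upper[i])  # x_i <= u_i
    return ineq

lower = np.array([0.0, -1.0])
upper = np.array([1.0, 2.0])
ineq = box_to_standard_form(lower, upper)
x_in = np.array([0.5, 2.0])    # inside the box (x_2 on its upper bound)
x_out = np.array([1.5, 0.0])   # x_1 > u_1
print(len(ineq))                              # 4 = 2n constraints
print(all(fi(x_in) <= 0 for fi in ineq))      # True
print(all(fi(x_out) <= 0 for fi in ineq))     # False
```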

Equivalent problems