Chapter 2: Optimization for Data Science
Convex Functions
TANN Chantara
Department of Applied Mathematics and Statistics
Institute of Technology of Cambodia
October 9, 2022
TANN Chantara (ITC) Convex Functions October 9, 2022 1 / 31
Table of Contents
1 Definitions
2 Checking Convexity
3 Convexity Preserving Transformations
4 Schur Lemma
5 Generalized Inequalities
6 Summary
Definitions
Epigraph and Domain
Definition: The epigraph of f : Rn → (−∞, ∞] is the set
epi(f) = {(x, α) ∈ Rn+1 : f(x) ≤ α}
Definition: The domain of f : Rn → (−∞, ∞] is the set
dom(f) = {x ∈ Rn : f(x) < ∞}
Definition: A function f : Rn → (−∞, ∞] is called proper if dom(f) ≠ ∅
Convex Functions
Definition: A function f : Rn → (−∞, ∞] is called convex if its epigraph
is a convex set.
Proposition: f is convex if and only if its domain is a convex set and
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) (1)
for all x, y ∈ dom(f) and θ ∈ [0, 1]
• f is called strictly convex if the inequality in (1) is strict for all x ≠ y and θ ∈ (0, 1).
• f is called concave if −f is convex.
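Inequality (1) can be probed numerically. The following is a minimal sketch, assuming univariate functions on an interval; the helper name `violates_convexity` and the test functions are illustrative choices, not part of the slides. A sampled violation proves non-convexity, while the absence of violations is only evidence, not a proof.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return True if some sampled (x, y, theta) violates inequality (1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:  # tol absorbs floating-point rounding
            return True
    return False

# exp is convex on R, so no violation should be found.
print(violates_convexity(math.exp, -3.0, 3.0))
# sin is concave on [0, pi], so a violation should be found.
print(violates_convexity(math.sin, 0.0, math.pi))
```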
Sublevel Sets
Definition: The α-sublevel set of a function f : Rn → (−∞, ∞] is defined
as Cα = {x : f(x) ≤ α}
Proposition: If f is convex, then all of its sublevel sets are convex.
• The reverse implication does not hold.
• Exercise: Find a non-convex function whose sublevel sets are all convex.
Examples of Convex Functions
Univariate functions                                      Domain
• Exponential functions   $f(x) = e^{ax}$                                                   R
• Powers                  $f(x) = x^{\alpha}$ ($\alpha \ge 1$ or $\alpha \le 0$)            R++
• Negative logarithm      $f(x) = -\log x$                                                  R++
• Negative entropy        $f(x) = x \log x$                                                 R++
Multivariate functions                                    Domain
• Affine functions        $f(x) = a^\top x + b$                                             Rn
• p-Norms (p ≥ 1)         $f(x) = \|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$        Rn
• ∞-Norm                  $f(x) = \|x\|_\infty = \max_i |x_i|$                              Rn
• Indicator function of   $f(x) = 0$ if $x \in C$, $f(x) = \infty$ otherwise                C
  convex set C
Convention: R++ = {x ∈ R : x > 0}
Matrix functions                                          Domain
• Trace functions (linear functions)                                                        Rm×n
  $f(X) = \operatorname{tr}(A^\top X) = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij}$, $(A \in \mathbb{R}^{m \times n})$
• Maximum eigenvalue      $f(X) = \lambda_{\max}(X)$                                        Sn
• Spectral norm           $f(X) = \|X\|_2 = \sup_{v \ne 0} \|Xv\|_2 / \|v\|_2$              Rm×n
Checking Convexity
Checking Convexity Along Line
Proposition: A function f : Rn → (−∞, ∞] is convex if and only if
each univariate function g : R → (−∞, ∞] of the form
g(t) = f(x+ty), for x, y ∈ Rn
is convex in t.
Proof:
⇒: For any x, y ∈ Rn and θ ∈ (0, 1), consider t = θa + (1 − θ)b for
arbitrary a, b ∈ R
g(t) = g(θa + (1 − θ)b) = f(x + (θa + (1 − θ)b)y)
= f(θ(x + ay) + (1 − θ)(x + by))
≤ θf(x + ay) + (1 − θ)f(x + by) = θg(a) + (1 − θ)g(b)
⇐: For any x, y ∈ Rn , θ ∈ (0, 1) and t1 , t2 ∈ R
f(x + (θt1 + (1 − θ)t2 )y) = g(θt1 + (1 − θ)t2 )
≤ θg(t1 ) + (1 − θ)g(t2 ) = θf(x + t1 y) + (1 − θ)f(x + t2 y)
For t1 = 0, t2 = 1, x = x′ , y = y′ − x′ we get
f(θx′ + (1 − θ)y′ ) ≤ θf(x′ ) + (1 − θ)f(y′ )
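The proposition suggests a practical test: restrict f to lines and check the resulting univariate functions. Below is a minimal sketch of that idea. The functions `log_sum_exp` and `saddle`, the sampling ranges, and the helper names are assumptions made for illustration; random sampling can refute convexity but cannot prove it.

```python
import math
import random

def restrict_to_line(f, x, y):
    """Return g(t) = f(x + t*y) for fixed vectors x and y."""
    return lambda t: f([xi + t * yi for xi, yi in zip(x, y)])

def convex_along_random_lines(f, dim, lines=200, samples=50, tol=1e-9, seed=0):
    """Check the convexity inequality for g on randomly chosen lines."""
    rng = random.Random(seed)
    for _ in range(lines):
        x = [rng.uniform(-2, 2) for _ in range(dim)]
        y = [rng.uniform(-2, 2) for _ in range(dim)]
        g = restrict_to_line(f, x, y)
        for _ in range(samples):
            a, b, theta = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.random()
            lhs = g(theta * a + (1 - theta) * b)
            rhs = theta * g(a) + (1 - theta) * g(b)
            if lhs > rhs + tol:
                return False
    return True

def log_sum_exp(v):
    # A standard convex function, used here only as a test case.
    return math.log(sum(math.exp(vi) for vi in v))

def saddle(v):
    # v[0] * v[1] is not convex: it is concave along the direction (1, -1).
    return v[0] * v[1]

print(convex_along_random_lines(log_sum_exp, dim=3))
print(convex_along_random_lines(saddle, dim=2))
```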
1st Order Conditions
Definition: A function f : Rn → (−∞, ∞] is differentiable if its gradient
∇f = (∂f/∂x1 , ..., ∂f/∂xn ) exists at each point in dom(f) and if dom(f) is
open
Proposition: A differentiable function f : Rn → (−∞, ∞] is convex
if and only if dom(f) is convex and
f(y) ≥ f(x) + ∇f(x)⊤ (y − x) ∀x, y ∈ dom(f)
⇒ The 1st-order Taylor approximation underestimates f globally.
⇒ From local information about a convex function we can obtain global information.
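The first-order condition can be illustrated numerically by measuring the gap between f(y) and the linearization at x. This is a sketch under stated assumptions: the function f(v) = exp(v1) + v2² and its hand-computed gradient are illustrative choices not taken from the slides, and the helper names are hypothetical.

```python
import math
import random

def f(v):
    # Illustrative convex function (an assumption for this example).
    return math.exp(v[0]) + v[1] ** 2

def grad_f(v):
    # Gradient of f computed by hand.
    return [math.exp(v[0]), 2.0 * v[1]]

def first_order_gap(func, grad, x, y):
    """Return func(y) - [func(x) + grad(x)^T (y - x)]."""
    linear = func(x) + sum(g * (yi - xi) for g, xi, yi in zip(grad(x), x, y))
    return func(y) - linear

def min_gap(func, grad, dim=2, trials=5000, seed=0):
    """Smallest gap over random pairs; nonnegative (up to rounding) if convex."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-2, 2) for _ in range(dim)]
        y = [rng.uniform(-2, 2) for _ in range(dim)]
        worst = min(worst, first_order_gap(func, grad, x, y))
    return worst

# The tangent plane should never lie above f (up to rounding).
print(min_gap(f, grad_f) >= -1e-9)
```

For the concave function −f the same gap becomes negative, which is how the condition detects non-convexity.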
Univariate Functions
Proposition: A differentiable function f : R → R is convex if and only if
f(y) ≥ f(x) + f ′ (x)(y − x) ∀x, y ∈ R
Proof:
⇒: If x, y ∈ R, 0 < t ≤ 1, then
f(x + t(y − x)) ≤ (1 − t)f(x) + tf(y) (convexity)
f(y) − f(x) ≥ [f(x + t(y − x)) − f(x)]/t (subtract f(x), divide by t)
f(y) − f(x) ≥ f ′ (x)(y − x) (limit t → 0)
⇐: For any x, y ∈ R, 0 ≤ t ≤ 1, let z = tx + (1 − t)y
t(f(x) − f(z)) ≥ tf ′ (z)(x − z) (by assumption)
(1 − t)(f(y) − f(z)) ≥ (1 − t)f ′ (z)(y − z) (by assumption)
tf(x) + (1 − t)f(y) ≥ f(z) (sum of above, using t(x − z) + (1 − t)(y − z) = 0)
1st Order Conditions∗
Proposition: A differentiable function f : Rn → R is convex if and only if
f(y) ≥ f(x) + ∇f(x)⊤ (y − x) ∀x, y ∈ Rn
Proof:
⇒: g(t) = f(tx + (1 − t)y) is convex in t for any x, y ∈ Rn
g′ (t) = ∇f(tx + (1 − t)y)⊤ (x − y) (definition of g)
g(1) ≥ g(0) + g′ (0) (convexity of g)
f(x) ≥ f(y) + ∇f(y)⊤ (x − y) (substitution)
⇐: For x, y ∈ Rn and t, t̃ ∈ R, let z = tx + (1 − t)y and z̃ = t̃x + (1 − t̃)y
f(z) ≥ f(z̃) + ∇f(z̃)⊤ (z − z̃) (by assumption)
g(t) ≥ g(t̃) + g′ (t̃)(t − t̃) (definition of g, z, z̃)
By the 1st-order condition for univariate functions, g is convex. Thus f is
also convex.
2nd Order Conditions
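Definition: A function f : Rn → (−∞, ∞] is twice differentiable if its
Hessian
\[
\nabla^2 f(x) =
\begin{pmatrix}
\partial^2 f/\partial x_1 \partial x_1 & \cdots & \partial^2 f/\partial x_1 \partial x_n \\
\vdots & \ddots & \vdots \\
\partial^2 f/\partial x_n \partial x_1 & \cdots & \partial^2 f/\partial x_n \partial x_n
\end{pmatrix}
\]
exists at each point in dom(f), and dom(f) is open.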
Proposition: A twice differentiable function f : Rn → (−∞, ∞] is
convex if and only if dom(f) is convex and
∇2 f(x) ⪰ 0 ∀x ∈dom(f)
The condition ∇2 f(x) ⪰ 0 can be interpreted geometrically as the
requirement that f has upward curvature at x.
Univariate Functions∗
Proposition: A twice differentiable function f : R → R is convex if and only
if f ′′ (x) ≥ 0 ∀x ∈ R
Proof:
⇒: If x, y ∈ R, y > x, then
f(y) ≥ f(x) + f ′ (x)(y − x) (1st order conditions)
f(x) ≥ f(y) + f ′ (y)(x − y) (1st order conditions)
0 ≥ (f ′ (x) − f ′ (y))/(y − x) (sum of above × (y − x)^{−2})
f ′′ (x) ≥ 0 (limit y → x, since the quotient tends to −f ′′ (x))
⇐: For x, y ∈ R, we have
\[
f(y) = f(x) + \int_x^y f'(u)\,du
     = f(x) + \int_x^y \Big( f'(x) + \int_x^u f''(v)\,dv \Big)\,du
     \ge f(x) + \int_x^y f'(x)\,du = f(x) + f'(x)(y - x)
\]
Thus, f is convex as it satisfies the 1st-order condition.
2nd-Order Conditions∗
Proposition: A twice differentiable function f : Rn → R is convex if and
only if ∇2 f(x) ⪰ 0 ∀x ∈ Rn
Proof:
⇒: g(t) = f(x + ty) is convex in t for any x, y ∈ Rn
g′′ (t) = y⊤ ∇2 f(x + ty)y (definition of g)
g′′ (t) ≥ 0 (univariate case)
∇2 f(x) ⪰ 0 (as y is arbitrary)
⇐: Define g as above. For any t we have
∇2 f(x + ty) ⪰ 0 (by assumption)
g′′ (t) ≥ 0 (definition of g)
By the 2nd-order condition for univariate functions, g is convex. Thus, f is
also convex.
Examples
• Quadratic functions f(x) = x⊤ Px + q⊤ x + r are convex if P ⪰ 0, since ∇2 f(x) = 2P for symmetric P
• The least-squares objective f(x) = ||Ax − b||₂² is convex because
∇2 f(x) = 2A⊤ A ⪰ 0 for all A ∈ Rm×n
• Quadratic-over-linear functions of the type f(x, y) = x²/y are convex as
long as y > 0 because
\[
\nabla^2 f(x, y) = \frac{2}{y^3}
\begin{pmatrix} y \\ -x \end{pmatrix}
\begin{pmatrix} y & -x \end{pmatrix} \succeq 0 \quad \forall y > 0
\]
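As a sanity check of the second-order condition for the quadratic-over-linear example, the sketch below compares the closed form (2/y³)·[y, −x]⊤[y, −x] with a finite-difference Hessian and tests positive semidefiniteness. The step size, the tolerances, and all helper names are assumptions chosen for illustration; finite differences only approximate the Hessian.

```python
def f(x, y):
    return x * x / y

def hessian_closed_form(x, y):
    """(2 / y^3) * [y, -x]^T [y, -x], written out entry by entry."""
    c = 2.0 / y ** 3
    return [[c * y * y, -c * x * y], [-c * x * y, c * x * x]]

def hessian_numeric(x, y, h=1e-4):
    """Central finite differences (an approximation only)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]

def is_psd_2x2(H, tol=1e-9):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] >= -tol and H[1][1] >= -tol and det >= -tol

H = hessian_closed_form(1.0, 2.0)
N = hessian_numeric(1.0, 2.0)
print(H)
print(is_psd_2x2(H))
print(max(abs(H[i][j] - N[i][j]) for i in range(2) for j in range(2)) < 1e-5)
```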
Negative Log-Determinant
Proposition: The negative log-determinant function f(X) = − log det(X) is convex
on the set of positive definite matrices Sn++ .
Proof: Homework.
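Without giving away the proof, the claim can at least be spot-checked numerically for 2×2 matrices. This is a minimal sketch; the matrices A and B are arbitrary positive definite examples chosen for illustration, and the helper names are hypothetical.

```python
import math

def neg_log_det_2x2(X):
    """-log det(X) for a symmetric positive definite 2x2 matrix X."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    if X[0][0] <= 0 or det <= 0:
        raise ValueError("X must be positive definite")
    return -math.log(det)

def mix(theta, A, B):
    """Entrywise convex combination theta*A + (1 - theta)*B."""
    return [[theta * A[i][j] + (1 - theta) * B[i][j] for j in range(2)]
            for i in range(2)]

A = [[2.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.5], [0.5, 3.0]]
for theta in (0.25, 0.5, 0.75):
    lhs = neg_log_det_2x2(mix(theta, A, B))
    rhs = theta * neg_log_det_2x2(A) + (1 - theta) * neg_log_det_2x2(B)
    print(theta, lhs <= rhs)
```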
Convexity Preserving Transformations
Sometimes one can establish convexity of f by showing that f is obtained
from simple convex functions via transformations that preserve convexity:
• non-negative weighted sum
• composition with affine function
• pointwise maximum and supremum
• composition
• minimization
• perspective
Affine Transformations
Affine transformation of input: if f is convex, then g(x) = f(Ax + b) is
also convex
Non-negative affine transformation of output: If f1 , ..., fK are convex
functions and ρ1 , ..., ρK are non-negative numbers, then the conic
combination g(x) = ρ1 f1 (x) + · · · + ρK fK (x) is convex
Generalization to integrals: if f(x, y) is convex in x for each fixed
y ∈ Y and ρ(y) is a non-negative function of y, then
$g(x) = \int_Y \rho(y) f(x, y)\,dy$
is convex in x (provided that the integral exists)
Pointwise maximum and supremum
Maximum of convex functions: If f1 , ..., fK are convex, then the pointwise
maximum g(x) = max{f1 (x), ..., fK (x)} is also convex.
Recall: Intersections of convex sets are convex, and the epigraph of g is the intersection of the epigraphs of f1 , ..., fK .
Supremum of convex functions: If f(x, y) is convex in x for every fixed
y ∈ Y, then the pointwise supremum
$g(x) = \sup_{y \in Y} f(x, y)$
is also convex
Examples
1 Piecewise linear functions $f(x) = \max_{i=1,\dots,K} \{a_i^\top x + b_i\}$ are convex.
2 The sum of the r largest components of x ∈ Rn is convex as it can be
written as a maximum of linear functions:
$f(x) = \max\{x_{i_1} + x_{i_2} + \cdots + x_{i_r} : 1 \le i_1 < i_2 < \cdots < i_r \le n\}$
3 The support function $\sigma_C(x) = \sup_{y \in C} y^\top x$ of a (possibly nonconvex) set C is convex.
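The representation of the sum of the r largest components as a maximum of linear functions can be checked directly. In this sketch both function names are illustrative; the second one enumerates all index sets, which is exponential in general and only meant to mirror the formula.

```python
from itertools import combinations

def sum_r_largest(x, r):
    """Sum of the r largest components, computed by sorting."""
    return sum(sorted(x, reverse=True)[:r])

def sum_r_largest_as_max(x, r):
    """Maximum over all index sets i1 < ... < ir of x[i1] + ... + x[ir]."""
    return max(sum(x[i] for i in idx)
               for idx in combinations(range(len(x)), r))

x = [3.0, -1.0, 4.0, 1.5]
print(sum_r_largest(x, 2), sum_r_largest_as_max(x, 2))
```

Each index set defines a linear function of x, so the second form makes the convexity argument visible: a pointwise maximum of finitely many linear functions.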
4 Maximum eigenvalue f(X) = λmax (X) for X ∈ Sn
Write X = RDR⊤ , with R orthogonal and D = diag(λ1 , ..., λn )
\[
f(X) = \sup_{\|v\|_2 = 1} \left( v_1^2 \lambda_1 + \cdots + v_n^2 \lambda_n \right)
     = \sup_{\|v\|_2 = 1} v^\top D v = \sup_{\|v\|_2 = 1} v^\top X v
\]
5 Spectral norm $f(X) = \|X\|_2 = \sup_{v \ne 0} \|Xv\|_2 / \|v\|_2$ for X ∈ Rm×n
\[
f(X) = \sup_{\|v\|_2 = 1} \|Xv\|_2 = \sup_{\|v\|_2 = 1} \, \sup_{\|u\|_2 = 1} u^\top X v
\]
Recall: u⊤ Xv ≤ ||u||2 ||Xv||2 = ||Xv||2 for ||u||2 = 1
In both cases f(X) is the supremum of linear functions in X and thus
convex.
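For symmetric 2×2 matrices the variational formula for the maximum eigenvalue can be compared with the closed-form eigenvalue. This is a sketch under assumptions: the matrix [[a, b], [b, c]] is parametrized by three scalars, unit vectors are sampled on a grid of angles, and the function names are hypothetical.

```python
import math

def lambda_max_2x2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)

def sup_quadratic_form(a, b, c, steps=3600):
    """Approximate sup of v^T X v over unit vectors v = (cos phi, sin phi)."""
    best = -float("inf")
    for k in range(steps):
        phi = math.pi * k / steps  # v and -v give the same value
        v1, v2 = math.cos(phi), math.sin(phi)
        best = max(best, a * v1 * v1 + 2 * b * v1 * v2 + c * v2 * v2)
    return best

X = (2.0, -1.0, 0.5)
print(lambda_max_2x2(*X), sup_quadratic_form(*X))
```

The grid value never exceeds the true eigenvalue and approaches it as the grid is refined, which matches the supremum representation.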
Composition
Proposition: If g : Rn → R is convex and h : R → R is convex and
non-decreasing, then f : Rn → R defined as f(x) = h(g(x)) is convex.
Proof: For any x, y ∈ Rn and θ ∈ [0, 1]
f(θx + (1 − θ)y) = h(g(θx + (1 − θ)y)) (definition of f)
≤ h(θg(x) + (1 − θ)g(y)) (conv. of g, mono. of h)
≤ θh(g(x)) + (1 − θ)h(g(y)) (convexity of h)
= θf(x) + (1 − θ)f(y) (definition of f)
Thus f is convex.
Example: f(x) = exp(g(x)) is convex if g is convex.
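The monotonicity assumption on h matters. The sketch below contrasts h = exp (convex and non-decreasing) with h(u) = u² (convex but not monotone) for the same convex inner function g(x) = x² − 1; these particular functions are illustrative assumptions, not taken from the slides.

```python
import math

def g(x):
    # Convex inner function (illustrative choice).
    return x * x - 1.0

def midpoint_gap(f, x, y):
    """0.5*f(x) + 0.5*f(y) - f((x + y)/2); nonnegative if f is convex."""
    return 0.5 * f(x) + 0.5 * f(y) - f(0.5 * (x + y))

exp_of_g = lambda x: math.exp(g(x))   # h = exp is convex and non-decreasing
square_of_g = lambda x: g(x) ** 2     # h(u) = u^2 is convex but not monotone

print(midpoint_gap(exp_of_g, -1.0, 1.0))     # nonnegative
print(midpoint_gap(square_of_g, -1.0, 1.0))  # negative: (x^2 - 1)^2 is not convex
```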
Generalizations
Definition: A function f : Rn → [−∞, ∞) is concave if −f is convex.
Proposition: If g : Rn → R is concave and h : R → R is convex and
non-increasing, then f : Rn → R defined as f(x) = h(g(x)) is convex.
Proposition: If g : Rn → Rk is convex in each component, while
h : Rk → R is non-decreasing in each argument and convex, then
f : Rn → R defined via f(x) = h(g(x)) is convex.
Proposition: If g : Rn → Rk is concave in each component, while
h : Rk → R is non-increasing in each argument and convex, then
f : Rn → R defined via f(x) = h(g(x)) is convex.
Minimization
Proposition: If f(x, y) and g(x, y) are convex in (x, y) and C is a convex
set, then the optimal value function
$h(x) = \inf_{y \in C} \{ f(x, y) : g(x, y) \le 0 \}$
is convex
Proof: Assume that the inner problem is solvable for every
x ∈ dom(h). Choose x1 , x2 ∈ dom(h) and let y1 , y2 ∈ C be the
corresponding minimizers, i.e., h(xi ) = f(xi , yi ) and g(xi , yi ) ≤ 0 for i = 1, 2. Then, for any
θ ∈ [0, 1],
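\[
\begin{aligned}
h(\theta x_1 + (1-\theta)x_2)
  &= \inf_{y \in C} \{ f(\theta x_1 + (1-\theta)x_2, y) : g(\theta x_1 + (1-\theta)x_2, y) \le 0 \} \\
  &\le f(\theta x_1 + (1-\theta)x_2, \theta y_1 + (1-\theta)y_2) \\
  &\le \theta f(x_1, y_1) + (1-\theta) f(x_2, y_2) \\
  &= \theta h(x_1) + (1-\theta) h(x_2)
\end{aligned}
\]
The first inequality holds because θy1 + (1 − θ)y2 is feasible (C and g are convex), the second by convexity of f.
Thus, h is convex. If the problem is not solvable, one can use a similar
argument with ε-optimal solutions and let ε → 0.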
Schur Lemma
Lemma (Schur): Consider X ∈ Sn partitioned as
\[
X = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix},
\]
where C ≻ 0. Then
X ⪰ 0 ⇐⇒ A − BC⁻¹B⊤ ⪰ 0
The matrix A − BC⁻¹B⊤ is called the Schur complement of C
Proof: Consider the functions f(x, y) = x⊤ Ax + 2x⊤ By + y⊤ Cy and
h(x) = inf y f(x, y) = x⊤ (A − BC⁻¹B⊤ )x
⇒: X ⪰ 0 =⇒ f convex in (x, y) =⇒ h convex in x
=⇒ A − BC⁻¹B⊤ ⪰ 0
⇐: We have A − BC⁻¹B⊤ ⪰ 0. Assume X ⋡ 0
=⇒ ∃(x0 , y0 ) ≠ 0 with f(x0 , y0 ) < 0
=⇒ h(x0 ) = x0⊤ (A − BC⁻¹B⊤ )x0 ≤ f(x0 , y0 ) < 0, which contradicts the positive
semidefiniteness of the Schur complement. Hence, X ⪰ 0.
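In the smallest case, where A, B and C are scalars a, b and c with c > 0, the lemma and the partial minimization used in its proof can be checked numerically. This is only a sketch: the scalar parametrization, the sampling ranges, and the helper names are assumptions made for illustration.

```python
import math
import random

def min_eig_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)

def schur_complement(a, b, c):
    """Schur complement of c in [[a, b], [b, c]], assuming c > 0."""
    return a - b * b / c

def f(a, b, c, x, y):
    """Quadratic form f(x, y) = a x^2 + 2 b x y + c y^2 from the proof."""
    return a * x * x + 2 * b * x * y + c * y * y

def h(a, b, c, x):
    """inf over y of f(x, y), attained at y = -b x / c when c > 0."""
    return f(a, b, c, x, -b * x / c)

rng = random.Random(0)
agree = True
for _ in range(1000):
    a, b, c = rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(0.1, 3)
    s = schur_complement(a, b, c)
    if abs(s) > 1e-6:  # skip borderline cases affected by rounding
        agree = agree and ((min_eig_2x2(a, b, c) >= 0) == (s >= 0))
print(agree)
print(abs(h(2.0, 1.0, 4.0, 3.0) - schur_complement(2.0, 1.0, 4.0) * 9.0) < 1e-12)
```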
Distance function
The distance of x to a fixed convex set C,
$f(x) = \operatorname{dist}(x, C) = \inf_{y \in C} \|x - y\|_2$,
is convex in x, because ||x − y||2 is convex in (x, y) and the minimization rule applies.
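The convexity of C is essential here. The following sketch compares the distance to an interval (a convex set) with the distance to a two-point set (not convex) on the real line; both sets and the function names are illustrative assumptions.

```python
def dist_to_interval(x, lo=-1.0, hi=1.0):
    """Distance from x to the convex set [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def dist_to_two_points(x, pts=(-1.0, 1.0)):
    """Distance from x to the nonconvex set {-1, 1}."""
    return min(abs(x - p) for p in pts)

def midpoint_gap(f, x, y):
    """0.5*f(x) + 0.5*f(y) - f((x + y)/2); nonnegative if f is convex."""
    return 0.5 * f(x) + 0.5 * f(y) - f(0.5 * (x + y))

print(midpoint_gap(dist_to_interval, -3.0, 2.0))    # nonnegative
print(midpoint_gap(dist_to_two_points, -1.0, 1.0))  # negative: not convex
```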
Perspective function
Proposition: If f(x) is convex, then the perspective of f, defined as
g(x, t) = tf(x/t), dom(g) = {(x, t) : x/t ∈ dom(f), t > 0}
is convex in (x, t)
Proof: Choose (x1 , t1 ), (x2 , t2 ) ∈ dom(g) and θ ∈ [0, 1], then
\[
\begin{aligned}
g(\theta(x_1, t_1) + (1-\theta)(x_2, t_2))
  &= (\theta t_1 + (1-\theta)t_2)\, f\!\left( \frac{\theta x_1 + (1-\theta)x_2}{\theta t_1 + (1-\theta)t_2} \right) \\
  &= (\theta t_1 + (1-\theta)t_2)\, f\!\left( \frac{\theta t_1 (x_1/t_1) + (1-\theta) t_2 (x_2/t_2)}{\theta t_1 + (1-\theta)t_2} \right) \\
  &\le \theta t_1 f(x_1/t_1) + (1-\theta) t_2 f(x_2/t_2) \\
  &= \theta g(x_1, t_1) + (1-\theta) g(x_2, t_2)
\end{aligned}
\]
Thus g is convex in (x, t)
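A small sketch of the perspective construction for a univariate f. The choice f(u) = u², whose perspective is the quadratic-over-linear function x²/t, and the name `perspective` are illustrative assumptions.

```python
def perspective(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("t must be positive")
        return t * f(x / t)
    return g

g = perspective(lambda u: u * u)  # perspective of u^2 is x^2 / t
print(g(3.0, 2.0))

# Joint convexity check at one pair of points in (x, t).
(x1, t1), (x2, t2), theta = (1.0, 0.5), (-2.0, 3.0), 0.5
lhs = g(theta * x1 + (1 - theta) * x2, theta * t1 + (1 - theta) * t2)
rhs = theta * g(x1, t1) + (1 - theta) * g(x2, t2)
print(lhs <= rhs)
```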
Relative Entropy
Proposition: The relative entropy of two vectors p, q ∈ Rn++ defined as
$f(p, q) = \sum_{i=1}^n p_i \log(p_i / q_i)$
is convex.
Proof: The negative logarithm − log(x) is convex on R++ . We
therefore conclude that its perspective function
g(x, t) = −t log(x/t) = t log(t/x)
is convex on R2++ . The relative entropy f(p, q) = g(q1 , p1 ) + · · · + g(qn , pn ) is a sum of n
convex functions and as such is convex.
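The joint convexity in (p, q) can be spot-checked at a single convex combination. This is a minimal sketch; the vectors below are arbitrary positive examples chosen for illustration, and `relative_entropy` is a hypothetical helper name.

```python
import math

def relative_entropy(p, q):
    """Sum of p_i * log(p_i / q_i) for componentwise positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def mix(theta, u, v):
    """Convex combination theta*u + (1 - theta)*v of two vectors."""
    return [theta * ui + (1 - theta) * vi for ui, vi in zip(u, v)]

p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.7, 0.3], [0.1, 0.9]
theta = 0.5
lhs = relative_entropy(mix(theta, p1, p2), mix(theta, q1, q2))
rhs = theta * relative_entropy(p1, q1) + (1 - theta) * relative_entropy(p2, q2)
print(lhs <= rhs)
```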
Generalized Inequalities
Convexity w.r.t generalized inequalities
Definition: Let K ⊂ Rm be a proper convex cone. The function
f : Rn → Rm is called K-convex if
f(θx + (1 − θ)y) ⪯K θf(x) + (1 − θ)f(y) ∀x, y ∈ Rn , θ ∈ [0, 1]
Proposition: If K is a proper convex cone and f is a K-convex function,
then the set C = {x : f(x) ⪯K 0} is convex.
Proof: Consider x, y ∈ C and θ ∈ [0, 1]. Then
f(θx + (1 − θ)y) ⪯K θf(x) + (1 − θ)f(y) (f is K-convex)
⪯K 0 (x, y ∈ C, K convex)
Thus, θx + (1 − θ)y ∈ C, which implies that C is convex
Example: f : Sn → Sn , f(X) = X² , is Sn+ -convex.
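For f(X) = X², K-convexity with respect to the positive semidefinite cone means that θX² + (1 − θ)Y² − (θX + (1 − θ)Y)² is positive semidefinite; expanding the square shows that this gap equals θ(1 − θ)(X − Y)². The sketch below checks this for one pair of symmetric 2×2 matrices; the matrices and helper names are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mix(theta, A, B):
    """Entrywise convex combination theta*A + (1 - theta)*B."""
    return [[theta * A[i][j] + (1 - theta) * B[i][j] for j in range(2)]
            for i in range(2)]

def convexity_gap(theta, A, B):
    """theta*A^2 + (1 - theta)*B^2 - (theta*A + (1 - theta)*B)^2."""
    M = mix(theta, A, B)
    lhs = matmul(M, M)
    rhs = mix(theta, matmul(A, A), matmul(B, B))
    return [[rhs[i][j] - lhs[i][j] for j in range(2)] for i in range(2)]

def is_psd_2x2(H, tol=1e-9):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] >= -tol and H[1][1] >= -tol and det >= -tol

A = [[1.0, 2.0], [2.0, 0.0]]
B = [[0.0, -1.0], [-1.0, 3.0]]
G = convexity_gap(0.3, A, B)
print(G)
print(is_psd_2x2(G))
```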
Summary
• Definitions: epigraph, domain and sublevel sets; proper, convex and
concave functions.
• Checking convexity: using the basic definition; checking convexity
along lines; checking the 1st- or 2nd-order conditions (only for
differentiable functions).
• Convexity-preserving transformations: non-negative weighted sum
and integral; composition with affine function; parametric maximum;
composition; parametric minimum (check convexity condition!);
perspective.
• Schur’s lemma: a block matrix with a positive definite diagonal
block is psd if and only if this block’s Schur complement is psd.
• Generalized inequalities: constructing convex sets using K-convex
constraint functions and conic inequalities.