Convexity I: Sets and Functions: Ryan Tibshirani Convex Optimization 10-725

This document provides an overview of convex sets and functions. It begins by defining convex sets as sets such that the line segment between any two points in the set is also contained in the set. Examples of convex sets include balls, hyperplanes, halfspaces, polyhedrons, and simplices. Convex cones are defined as sets closed under positive scaling. Key properties of convex sets are that they have supporting and separating hyperplanes, and operations like intersection, translation, and affine transformations preserve convexity. The document then outlines how it will discuss convex functions.

Convexity I: Sets and Functions

Ryan Tibshirani
Convex Optimization 10-725

See supplements for reviews of


• basic real analysis
• basic multivariate calculus
• basic linear algebra
Last time: why convexity?
Why convexity? Simply put: because we can broadly understand
and solve convex optimization problems
Nonconvex problems are mostly treated on a case by case basis

Reminder: a convex optimization problem is of the form

$$
\begin{array}{ll}
\min_{x \in D} & f(x) \\
\text{subject to} & g_i(x) \le 0, \; i = 1, \ldots, m \\
& h_j(x) = 0, \; j = 1, \ldots, r
\end{array}
$$

where $f$ and $g_i$, $i = 1, \ldots, m$ are all convex, and $h_j$, $j = 1, \ldots, r$ are affine. Special property: any local minimizer is a global minimizer

Outline

Today:
• Convex sets
• Examples
• Key properties
• Operations preserving convexity
• Same, for convex functions

Convex sets
Convex set: $C \subseteq \mathbb{R}^n$ such that

$$x, y \in C \implies tx + (1 - t)y \in C \quad \text{for all } 0 \le t \le 1$$

In words, the line segment joining any two elements lies entirely in the set

[Figure 2.2: Some simple convex and nonconvex sets. Left: the hexagon, which includes its boundary (shown darker), is convex. Middle: the kidney shaped set is not convex, since the line segment between the two points in the set shown as dots is not contained in the set. Right: the square contains some boundary points but not others, and is not convex.]

Convex combination of $x_1, \ldots, x_k \in \mathbb{R}^n$: any linear combination

$$\theta_1 x_1 + \cdots + \theta_k x_k$$

with $\theta_i \ge 0$, $i = 1, \ldots, k$, and $\sum_{i=1}^k \theta_i = 1$. Convex hull of a set $C$, $\mathrm{conv}(C)$, is all convex combinations of elements. Always convex
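A minimal numerical sketch of the convex-combination definition, assuming NumPy is available. The helper name `random_convex_weights` and the choice of the unit $\ell_2$ ball as the convex set are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_convex_weights(k, rng):
    """Draw theta with theta_i >= 0 and sum(theta) = 1 (hypothetical helper)."""
    theta = rng.random(k)
    return theta / theta.sum()

# Sample k points inside the unit l2 ball in R^3 (a convex set).
k, n = 5, 3
points = rng.normal(size=(k, n))
# Points outside the ball are rescaled onto its boundary; the rest are kept.
points /= np.maximum(1.0, np.linalg.norm(points, axis=1, keepdims=True))

theta = random_convex_weights(k, rng)
combo = theta @ points  # theta_1 x_1 + ... + theta_k x_k

# The convex combination should again lie in the ball.
in_ball = np.linalg.norm(combo) <= 1.0 + 1e-12
```

This only spot-checks one random combination; it illustrates the definition rather than proving convexity.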
Examples of convex sets

• Trivial ones: empty set, point, line

• Norm ball: $\{x : \|x\| \le r\}$, for given norm $\|\cdot\|$, radius $r$

• Hyperplane: $\{x : a^T x = b\}$, for given $a, b$

• Halfspace: $\{x : a^T x \le b\}$

• Affine space: $\{x : Ax = b\}$, for given $A, b$

• Polyhedron: $\{x : Ax \le b\}$, where inequality $\le$ is interpreted componentwise. Note: the set $\{x : Ax \le b, Cx = d\}$ is also a polyhedron (why?)

[Figure 2.11: The polyhedron $P$ (shown shaded) is the intersection of five halfspaces, with outward normal vectors $a_1, \ldots, a_5$.]

• Simplex: special case of polyhedra, given by $\mathrm{conv}\{x_0, \ldots, x_k\}$, where these points are affinely independent. The canonical example is the probability simplex,

$$\mathrm{conv}\{e_1, \ldots, e_n\} = \{w : w \ge 0, \; 1^T w = 1\}$$

Cones
Cone: $C \subseteq \mathbb{R}^n$ such that

$$x \in C \implies tx \in C \quad \text{for all } t \ge 0$$

Convex cone: cone that is also convex, i.e.,

$$x_1, x_2 \in C \implies t_1 x_1 + t_2 x_2 \in C \quad \text{for all } t_1, t_2 \ge 0$$

[Figure 2.4: The pie slice shows all points of the form $\theta_1 x_1 + \theta_2 x_2$, where $\theta_1, \theta_2 \ge 0$. The apex of the slice (which corresponds to $\theta_1 = \theta_2 = 0$) is at 0; its edges (which correspond to $\theta_1 = 0$ or $\theta_2 = 0$) pass through the points $x_1$ and $x_2$.]

Conic combination of $x_1, \ldots, x_k \in \mathbb{R}^n$: any linear combination

$$\theta_1 x_1 + \cdots + \theta_k x_k$$

with $\theta_i \ge 0$, $i = 1, \ldots, k$. Conic hull collects all conic combinations

Examples of convex cones
• Norm cone: $\{(x, t) : \|x\| \le t\}$, for a norm $\|\cdot\|$. Under the $\ell_2$ norm $\|\cdot\|_2$, called second-order cone

• Normal cone: given any set $C$ and point $x \in C$, we can define

$$N_C(x) = \{g : g^T x \ge g^T y, \text{ for all } y \in C\}$$

This is always a convex cone, regardless of $C$

• Positive semidefinite cone: $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : X \succeq 0\}$, where $X \succeq 0$ means that $X$ is positive semidefinite (and $\mathbb{S}^n$ is the set of $n \times n$ symmetric matrices)
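The positive semidefinite cone can be probed numerically. The sketch below, assuming NumPy, checks that a conic combination of two random positive semidefinite matrices stays in the cone; the helper names `random_psd` and `is_psd` are hypothetical and the tolerance is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(n, rng):
    """Return a random n x n positive semidefinite matrix M M^T."""
    M = rng.normal(size=(n, n))
    return M @ M.T

def is_psd(X, tol=1e-9):
    """Check symmetry and nonnegativity of the eigenvalues, up to tol."""
    return bool(np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol)

X1, X2 = random_psd(4, rng), random_psd(4, rng)
t1, t2 = 0.7, 2.5  # any nonnegative weights

conic = t1 * X1 + t2 * X2
closed_under_conic = is_psd(conic)

# A negative weight can leave the cone, so nonnegativity of the weights matters.
leaves_cone = not is_psd(-1.0 * X1)
```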

Key properties of convex sets


• Separating hyperplane theorem: two disjoint convex sets have a separating hyperplane between them

[Figure 2.19: The hyperplane $\{x \mid a^T x = b\}$ separates the disjoint convex sets $C$ and $D$. The affine function $a^T x - b$ is nonpositive on $C$ and nonnegative on $D$.]

Formally: if $C, D$ are nonempty convex sets with $C \cap D = \emptyset$, then there exist $a \ne 0$ and $b$ such that

$$C \subseteq \{x : a^T x \le b\}$$
$$D \subseteq \{x : a^T x \ge b\}$$
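For two disjoint Euclidean balls a separating hyperplane can be written down explicitly. The following sketch, assuming NumPy, builds one and checks it on sampled points; the centers, radii, and the helper `sample_ball` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two disjoint Euclidean balls C = B(c1, r1) and D = B(c2, r2) (illustrative choice).
c1, r1 = np.array([0.0, 0.0]), 1.0
c2, r2 = np.array([4.0, 1.0]), 1.5
assert np.linalg.norm(c2 - c1) > r1 + r2  # the balls do not intersect

# Unit normal pointing from C towards D, and an offset halfway between
# the largest value of a^T x on C and the smallest value of a^T x on D.
a = (c2 - c1) / np.linalg.norm(c2 - c1)
b = 0.5 * ((a @ c1 + r1) + (a @ c2 - r2))

def sample_ball(center, radius, size, rng):
    """Sample points inside a Euclidean ball (not uniformly; enough for a check)."""
    d = rng.normal(size=(size, center.size))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return center + radius * rng.random((size, 1)) * d

xs_C = sample_ball(c1, r1, 1000, rng)
xs_D = sample_ball(c2, r2, 1000, rng)

C_side = bool(np.all(xs_C @ a <= b))  # C lies in {x : a^T x <= b}
D_side = bool(np.all(xs_D @ a >= b))  # D lies in {x : a^T x >= b}
```

For general convex sets no such closed form is available; the theorem only guarantees that some $a, b$ exist.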

• Supporting hyperplane theorem: a boundary point of a convex set has a supporting hyperplane passing through it

Formally: if $C$ is a nonempty convex set, and $x_0 \in \mathrm{bd}(C)$, then there exists $a \ne 0$ such that

$$C \subseteq \{x : a^T x \le a^T x_0\}$$

Both of the above theorems (separating and supporting hyperplane theorems) have partial converses; see Section 2.5 of BV

Operations preserving convexity
• Intersection: the intersection of convex sets is convex

• Scaling and translation: if C is convex, then

aC + b = {ax + b : x ∈ C}

is convex for any a, b

• Affine images and preimages: if $f(x) = Ax + b$ and $C$ is convex then

$$f(C) = \{f(x) : x \in C\}$$

is convex, and if $D$ is convex then

$$f^{-1}(D) = \{x : f(x) \in D\}$$

is convex

Example: linear matrix inequality solution set
Given $A_1, \ldots, A_k, B \in \mathbb{S}^n$, a linear matrix inequality is of the form

$$x_1 A_1 + x_2 A_2 + \cdots + x_k A_k \preceq B$$

for a variable $x \in \mathbb{R}^k$. Let's prove the set $C$ of points $x$ that satisfy the above inequality is convex

Approach 1: directly verify that $x, y \in C \Rightarrow tx + (1 - t)y \in C$. This follows by checking that, for any $v$,

$$v^T \Big( B - \sum_{i=1}^k (t x_i + (1 - t) y_i) A_i \Big) v \ge 0$$

Approach 2: let $f : \mathbb{R}^k \to \mathbb{S}^n$, $f(x) = B - \sum_{i=1}^k x_i A_i$. Note that $C = f^{-1}(\mathbb{S}^n_+)$, affine preimage of convex set
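A small numerical companion to the two approaches, assuming NumPy. The matrices $A_i$ are random, and $B$ is chosen as a multiple of the identity so that two sampled points are feasible; that construction and the helper names (`lhs`, `f`, `in_C`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 3

def random_symmetric(n, rng):
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2

A = [random_symmetric(n, rng) for _ in range(k)]

def lhs(x):
    """x_1 A_1 + ... + x_k A_k."""
    return sum(xi * Ai for xi, Ai in zip(x, A))

def f(x, B):
    """Affine map f(x) = B - sum_i x_i A_i from Approach 2."""
    return B - lhs(x)

def in_C(x, B, tol=1e-9):
    """x satisfies the LMI iff f(x) is positive semidefinite (up to tol)."""
    return bool(np.linalg.eigvalsh(f(x, B)).min() >= -tol)

# Pick two points, then choose B so that both satisfy the LMI.
x, y = rng.normal(size=k), rng.normal(size=k)
beta = max(np.linalg.eigvalsh(lhs(x)).max(), np.linalg.eigvalsh(lhs(y)).max())
B = beta * np.eye(n)

# Every sampled point on the segment between x and y should stay feasible.
segment_in_C = all(in_C(t * x + (1 - t) * y, B) for t in np.linspace(0, 1, 11))
```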

More operations preserving convexity
• Perspective images and preimages: the perspective function is $P : \mathbb{R}^n \times \mathbb{R}_{++} \to \mathbb{R}^n$ (where $\mathbb{R}_{++}$ denotes positive reals),

$$P(x, z) = x/z$$

for $z > 0$. If $C \subseteq \mathrm{dom}(P)$ is convex then so is $P(C)$, and if $D$ is convex then so is $P^{-1}(D)$

• Linear-fractional images and preimages: the perspective map composed with an affine function,

$$f(x) = \frac{Ax + b}{c^T x + d}$$

is called a linear-fractional function, defined on $c^T x + d > 0$. If $C \subseteq \mathrm{dom}(f)$ is convex then so is $f(C)$, and if $D$ is convex then so is $f^{-1}(D)$

Example: conditional probability set
Let $U, V$ be random variables over $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$. Let $C \subseteq \mathbb{R}^{nm}$ be a set of joint distributions for $U, V$, i.e., each $p \in C$ defines joint probabilities

$$p_{ij} = \mathbb{P}(U = i, V = j)$$

Let $D \subseteq \mathbb{R}^{nm}$ contain corresponding conditional distributions, i.e., each $q \in D$ defines

$$q_{ij} = \mathbb{P}(U = i \mid V = j)$$

Assume $C$ is convex. Let's prove that $D$ is convex. Write

$$D = \Big\{ q \in \mathbb{R}^{nm} : q_{ij} = \frac{p_{ij}}{\sum_{k=1}^n p_{kj}}, \text{ for some } p \in C \Big\} = f(C)$$

where $f$ is a linear-fractional function, hence $D$ is convex

Convex functions
Convex function: $f : \mathbb{R}^n \to \mathbb{R}$ such that $\mathrm{dom}(f) \subseteq \mathbb{R}^n$ convex, and

$$f(tx + (1 - t)y) \le tf(x) + (1 - t)f(y) \quad \text{for } 0 \le t \le 1$$

and all $x, y \in \mathrm{dom}(f)$

[Figure 3.1: Graph of a convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.]

In words, function lies below the line segment joining $f(x)$, $f(y)$

Concave function: opposite inequality above, so that

$$f \text{ concave} \iff -f \text{ convex}$$
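The defining inequality can be spot-checked on sampled points. The sketch below, assuming NumPy, uses a hypothetical helper `violates_convexity`; passing the check on samples is only evidence of convexity, while a single violation is a proof of nonconvexity.

```python
import numpy as np

def violates_convexity(f, x, y, ts=np.linspace(0.0, 1.0, 21), tol=1e-12):
    """Return True if f(t x + (1-t) y) > t f(x) + (1-t) f(y) for some sampled t."""
    return any(f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol for t in ts)

rng = np.random.default_rng(4)
pairs = rng.uniform(-3.0, 3.0, size=(200, 2))

# exp is convex on R, so no sampled pair should violate the inequality.
exp_ok = not any(violates_convexity(np.exp, x, y) for x, y in pairs)

# sin is neither convex nor concave on [-3, 3], so a violation should show up.
sin_violates = any(violates_convexity(np.sin, x, y) for x, y in pairs)

# log is concave on the positive reals, which is the same as -log being convex.
pos_pairs = rng.uniform(0.1, 5.0, size=(200, 2))
neg_log_ok = not any(
    violates_convexity(lambda u: -np.log(u), x, y) for x, y in pos_pairs
)
```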

Important modifiers:

• Strictly convex: $f(tx + (1 - t)y) < tf(x) + (1 - t)f(y)$ for $x \ne y$ and $0 < t < 1$. In words, $f$ is convex and has greater curvature than a linear function

• Strongly convex with parameter $m > 0$: $f - \frac{m}{2} \|x\|_2^2$ is convex. In words, $f$ is at least as convex as a quadratic function

Note: strongly convex ⇒ strictly convex ⇒ convex

(Analogously for concave functions)

Examples of convex functions

• Univariate functions:
  ▸ Exponential function: $e^{ax}$ is convex for any $a$ over $\mathbb{R}$
  ▸ Power function: $x^a$ is convex for $a \ge 1$ or $a \le 0$ over $\mathbb{R}_+$ (nonnegative reals)
  ▸ Power function: $x^a$ is concave for $0 \le a \le 1$ over $\mathbb{R}_+$
  ▸ Logarithmic function: $\log x$ is concave over $\mathbb{R}_{++}$

• Affine function: $a^T x + b$ is both convex and concave

• Quadratic function: $\frac{1}{2} x^T Q x + b^T x + c$ is convex provided that $Q \succeq 0$ (positive semidefinite)

• Least squares loss: $\|y - Ax\|_2^2$ is always convex (since $A^T A$ is always positive semidefinite)
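The least squares claim can be checked directly: the Hessian of $\|y - Ax\|_2^2$ is $2 A^T A$. A short sketch assuming NumPy, with random data standing in for a real design matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(20, 5))  # illustrative random data
y = rng.normal(size=20)

def loss(x):
    """Least squares loss ||y - A x||_2^2."""
    return np.sum((y - A @ x) ** 2)

# The Hessian of the loss is 2 A^T A, which is positive semidefinite.
hessian = 2 * A.T @ A
min_eig = np.linalg.eigvalsh(hessian).min()

# Spot-check the defining inequality at the midpoint of two random points.
u, v = rng.normal(size=5), rng.normal(size=5)
midpoint_ok = loss(0.5 * (u + v)) <= 0.5 * loss(u) + 0.5 * loss(v) + 1e-12
```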

• Norm: $\|x\|$ is convex for any norm; e.g., $\ell_p$ norms,

$$\|x\|_p = \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p} \text{ for } p \ge 1, \quad \|x\|_\infty = \max_{i=1,\ldots,n} |x_i|$$

and also operator (spectral) and trace (nuclear) norms,

$$\|X\|_{\mathrm{op}} = \sigma_1(X), \quad \|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$$

where $\sigma_1(X) \ge \ldots \ge \sigma_r(X) \ge 0$ are the singular values of the matrix $X$
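Both matrix norms can be computed from the singular values. The sketch below, assuming NumPy, compares that computation against `np.linalg.norm` with `ord=2` and `ord="nuc"`; the random matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(5, 3))

# Singular values, returned in nonincreasing order.
sigma = np.linalg.svd(X, compute_uv=False)

op_norm = sigma[0]     # operator (spectral) norm: largest singular value
tr_norm = sigma.sum()  # trace (nuclear) norm: sum of all singular values

# Cross-check against NumPy's built-in matrix norms.
matches_builtin = bool(
    np.isclose(op_norm, np.linalg.norm(X, 2))
    and np.isclose(tr_norm, np.linalg.norm(X, "nuc"))
)

# Triangle inequality, which together with homogeneity gives convexity of a norm.
Y = rng.normal(size=(5, 3))
triangle_ok = bool(
    np.linalg.norm(X + Y, "nuc") <= tr_norm + np.linalg.norm(Y, "nuc") + 1e-12
)
```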

• Indicator function: if $C$ is convex, then its indicator function

$$I_C(x) = \begin{cases} 0 & x \in C \\ \infty & x \notin C \end{cases}$$

is convex

• Support function: for any set $C$ (convex or not), its support function

$$I_C^*(x) = \max_{y \in C} x^T y$$

is convex

• Max function: $f(x) = \max\{x_1, \ldots, x_n\}$ is convex

Key properties of convex functions

• A function is convex if and only if its restriction to any line is convex

• Epigraph characterization: a function $f$ is convex if and only if its epigraph

$$\mathrm{epi}(f) = \{(x, t) \in \mathrm{dom}(f) \times \mathbb{R} : f(x) \le t\}$$

is a convex set

• Convex sublevel sets: if $f$ is convex, then its sublevel sets

$$\{x \in \mathrm{dom}(f) : f(x) \le t\}$$

are convex, for all $t \in \mathbb{R}$. The converse is not true (e.g., $f(x) = \sqrt{|x|}$ has convex sublevel sets but is not convex)

• First-order characterization: if $f$ is differentiable, then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex, and

$$f(y) \ge f(x) + \nabla f(x)^T (y - x)$$

for all $x, y \in \mathrm{dom}(f)$. Therefore for a differentiable convex function $\nabla f(x) = 0 \iff x$ minimizes $f$

• Second-order characterization: if $f$ is twice differentiable, then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex, and $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom}(f)$

• Jensen's inequality: if $f$ is convex, and $X$ is a random variable supported on $\mathrm{dom}(f)$, then $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$
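The first-order inequality and Jensen's inequality can both be spot-checked on a convex quadratic. This is a sketch assuming NumPy; the quadratic, its random coefficients, and the use of an empirical distribution in place of a general random variable are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
M = rng.normal(size=(n, n))
Q = M @ M.T  # positive semidefinite by construction
b = rng.normal(size=n)

def f(x):
    """Convex quadratic f(x) = 1/2 x^T Q x + b^T x."""
    return 0.5 * x @ Q @ x + b @ x

def grad_f(x):
    """Gradient of f, namely Q x + b."""
    return Q @ x + b

# First-order characterization: the tangent plane at x underestimates f everywhere.
first_order_ok = True
for _ in range(100):
    x, y = rng.normal(size=n), rng.normal(size=n)
    if f(y) < f(x) + grad_f(x) @ (y - x) - 1e-9:
        first_order_ok = False

# Jensen's inequality for the empirical distribution of some samples:
# f(mean of samples) <= mean of f over samples.
samples = rng.normal(size=(1000, n))
jensen_ok = bool(f(samples.mean(axis=0)) <= np.mean([f(s) for s in samples]) + 1e-9)
```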

Operations preserving convexity

• Nonnegative linear combination: $f_1, \ldots, f_m$ convex implies $a_1 f_1 + \cdots + a_m f_m$ convex for any $a_1, \ldots, a_m \ge 0$

• Pointwise maximization: if $f_s$ is convex for any $s \in S$, then $f(x) = \max_{s \in S} f_s(x)$ is convex. Note that the set $S$ here (number of functions $f_s$) can be infinite

• Partial minimization: if $g(x, y)$ is convex in $x, y$, and $C$ is convex, then $f(x) = \min_{y \in C} g(x, y)$ is convex

Example: distances to a set

Let $C$ be an arbitrary set, and consider the maximum distance to $C$ under an arbitrary norm $\|\cdot\|$:

$$f(x) = \max_{y \in C} \|x - y\|$$

Let's check convexity: $f_y(x) = \|x - y\|$ is convex in $x$ for any fixed $y$, so by pointwise maximization rule, $f$ is convex

Now let $C$ be convex, and consider the minimum distance to $C$:

$$f(x) = \min_{y \in C} \|x - y\|$$

Let's check convexity: $g(x, y) = \|x - y\|$ is convex in $x, y$ jointly, and $C$ is assumed convex, so apply partial minimization rule
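Both distance functions can be illustrated numerically. The sketch below, assuming NumPy and the $\ell_2$ norm, uses a finite point set for the maximum distance and the unit box $[0, 1]^2$ for the minimum distance; these sets and the helper `midpoint_convex` are assumptions for illustration. The last lines show why convexity of $C$ is needed in the second case.

```python
import numpy as np

rng = np.random.default_rng(8)

# Arbitrary (nonconvex) finite set C: maximum distance is a pointwise max of norms.
C_points = rng.normal(size=(10, 2))

def max_dist(x):
    return np.max(np.linalg.norm(C_points - x, axis=1))

# Convex set C = [0, 1]^2: the minimum l2 distance is attained at the projection,
# which for a box is coordinatewise clipping.
def min_dist_box(x):
    return np.linalg.norm(x - np.clip(x, 0.0, 1.0))

def midpoint_convex(f, trials=500, tol=1e-12):
    """Spot-check f((x + y)/2) <= (f(x) + f(y))/2 on random pairs."""
    for _ in range(trials):
        x, y = rng.normal(scale=3.0, size=2), rng.normal(scale=3.0, size=2)
        if f(0.5 * (x + y)) > 0.5 * (f(x) + f(y)) + tol:
            return False
    return True

max_dist_ok = midpoint_convex(max_dist)
min_dist_ok = midpoint_convex(min_dist_box)

# For a nonconvex set such as {-1, +1} in R, the minimum distance need not be convex:
# the midpoint 0 is at distance 1, while both endpoints are at distance 0.
two_points = np.array([-1.0, 1.0])
min_dist_two = lambda x: np.min(np.abs(two_points - x))
nonconvex_example = bool(
    min_dist_two(0.0) > 0.5 * (min_dist_two(-1.0) + min_dist_two(1.0))
)
```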

More operations preserving convexity

• Affine composition: if $f$ is convex, then $g(x) = f(Ax + b)$ is convex

• General composition: suppose $f = h \circ g$, where $g : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R} \to \mathbb{R}$, $f : \mathbb{R}^n \to \mathbb{R}$. Then:
  ▸ $f$ is convex if $h$ is convex and nondecreasing, $g$ is convex
  ▸ $f$ is convex if $h$ is convex and nonincreasing, $g$ is concave
  ▸ $f$ is concave if $h$ is concave and nondecreasing, $g$ concave
  ▸ $f$ is concave if $h$ is concave and nonincreasing, $g$ convex
How to remember these? Think of the chain rule when $n = 1$:

$$f''(x) = h''(g(x)) g'(x)^2 + h'(g(x)) g''(x)$$
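As a worked instance of the chain-rule mnemonic, take the illustrative choice $h(u) = e^u$ and $g(x) = x^2$ (an example picked here, not one from the slides). Then $h$ is convex and nondecreasing and $g$ is convex, so the first rule predicts that $f(x) = e^{x^2}$ is convex. Both terms of the chain rule are indeed nonnegative:

```latex
\begin{aligned}
f''(x) &= h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x) \\
       &= e^{x^2} (2x)^2 + e^{x^2} \cdot 2 \\
       &= e^{x^2} (4x^2 + 2) \;\ge\; 0
\end{aligned}
```

The first term is nonnegative because $h'' \ge 0$, and the second because $h' \ge 0$ and $g'' \ge 0$, which is exactly the sign pattern the rule asks for.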

• Vector composition: suppose that

$$f(x) = h\big(g(x)\big) = h\big(g_1(x), \ldots, g_k(x)\big)$$

where $g : \mathbb{R}^n \to \mathbb{R}^k$, $h : \mathbb{R}^k \to \mathbb{R}$, $f : \mathbb{R}^n \to \mathbb{R}$. Then:
  ▸ $f$ is convex if $h$ is convex and nondecreasing in each argument, $g$ is convex
  ▸ $f$ is convex if $h$ is convex and nonincreasing in each argument, $g$ is concave
  ▸ $f$ is concave if $h$ is concave and nondecreasing in each argument, $g$ is concave
  ▸ $f$ is concave if $h$ is concave and nonincreasing in each argument, $g$ is convex

Example: log-sum-exp function
Log-sum-exp function: $g(x) = \log\big(\sum_{i=1}^k e^{a_i^T x + b_i}\big)$, for fixed $a_i, b_i$, $i = 1, \ldots, k$. Often called "soft max", as it smoothly approximates $\max_{i=1,\ldots,k} (a_i^T x + b_i)$

How to show convexity? First, note it suffices to prove convexity of $f(x) = \log\big(\sum_{i=1}^n e^{x_i}\big)$ (affine composition rule)

Now use second-order characterization. Calculate

$$\nabla_i f(x) = \frac{e^{x_i}}{\sum_{\ell=1}^n e^{x_\ell}}$$

$$\nabla^2_{ij} f(x) = \frac{e^{x_i}}{\sum_{\ell=1}^n e^{x_\ell}} 1\{i = j\} - \frac{e^{x_i} e^{x_j}}{\big(\sum_{\ell=1}^n e^{x_\ell}\big)^2}$$

Write $\nabla^2 f(x) = \mathrm{diag}(z) - zz^T$, where $z_i = e^{x_i} / \big(\sum_{\ell=1}^n e^{x_\ell}\big)$. This matrix is diagonally dominant, hence positive semidefinite
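The Hessian formula $\mathrm{diag}(z) - zz^T$ can be checked numerically. The sketch below, assuming NumPy, evaluates it at a random point and tests positive semidefiniteness, diagonal dominance, and the "soft max" bounds; subtracting $\max(x)$ before exponentiating is a standard numerical-stability device added here, not something stated on the slide.

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum_i exp(x_i)), shifting by max(x)."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    """Gradient of log-sum-exp: z_i = exp(x_i) / sum_l exp(x_l)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hessian(x):
    """Hessian of log-sum-exp in the form diag(z) - z z^T."""
    z = softmax(x)
    return np.diag(z) - np.outer(z, z)

rng = np.random.default_rng(9)
x = rng.normal(scale=3.0, size=6)
H = hessian(x)

min_eig = np.linalg.eigvalsh(H).min()

# Weak diagonal dominance: H_ii >= sum_{j != i} |H_ij| for every row i.
off_diag = np.abs(H).sum(axis=1) - np.abs(np.diag(H))
diag_dominant = bool(np.all(np.diag(H) >= off_diag - 1e-12))

# Soft max property: max(x) <= log_sum_exp(x) <= max(x) + log(n).
bounds_ok = bool(np.max(x) <= log_sum_exp(x) <= np.max(x) + np.log(x.size))
```

The smallest eigenvalue is zero up to rounding, since the all-ones vector lies in the null space of $\mathrm{diag}(z) - zz^T$.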

References and further reading

• S. Boyd and L. Vandenberghe (2004), "Convex optimization", Chapters 2 and 3
• J.P. Hiriart-Urruty and C. Lemarechal (1993), "Fundamentals of convex analysis", Chapters A and B
• R. T. Rockafellar (1970), "Convex analysis", Chapters 1–10
