

10-725: Convex Optimization Fall 2013

Lecture 4: Convexity
Lecturer: Barnabás Póczos Scribes: Jessica Chemali, David Fouhey, Yuxiong Wang

Note: LaTeX template courtesy of UC Berkeley EECS dept.


Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.

The goals of this lecture are to:

1. give definitions that are important to convexity as well as examples of convex sets and basic properties;
2. define convex functions and their properties, as well as some examples.

4.1 Basic Definitions

We begin by formalizing a few mathematical objects that we will use throughout the lecture:

Definition 4.1 The line passing through $x_1, x_2 \in \mathbb{R}^n$ is the set $\{x \in \mathbb{R}^n \mid x = \theta x_1 + (1-\theta) x_2,\ \theta \in \mathbb{R}\}$.

Definition 4.2 The line segment between $x_1$ and $x_2$ is defined similarly, but with $\theta$ confined to $[0, 1]$: $\{x \in \mathbb{R}^n \mid x = \theta x_1 + (1-\theta) x_2,\ \theta \in [0, 1]\}$.

Figure 4.1: Line and line segment.

Definition 4.3 A Euclidean ball with radius $r$ centered at $x_0$, $B(x_0, r)$, is $\{x \in \mathbb{R}^n \mid \|x - x_0\|_2 \le r\}$.

Definition 4.4 An $L_p$ ball is the equivalent, but for the $L_p$ distance $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$: $\{x \in \mathbb{R}^n \mid \|x - x_0\|_p \le r\}$.

Definition 4.5 A half-space HS(a, b) is the set of points on one side of a hyperplane: $\{x \in \mathbb{R}^n \mid a^T x \le b\}$.


Figure 4.2: Halfspace.

Definition 4.6 A hyperplane H(a, b) is defined similarly, but with equality: $\{x \in \mathbb{R}^n \mid a^T x = b\}$.
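The membership conditions for balls, half-spaces, and hyperplanes translate directly into code. The following is a minimal sketch in plain Python; the function names (`p_norm`, `in_lp_ball`, `in_halfspace`, `on_hyperplane`) and the tolerance `tol` are illustrative choices, not part of the lecture.

```python
def p_norm(x, p):
    """Lp norm (sum_i |x_i|^p)^(1/p) of a sequence x, for p > 0."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def dot(a, x):
    """Inner product a^T x of two equal-length sequences."""
    return sum(ai * xi for ai, xi in zip(a, x))

def in_lp_ball(x, x0, r, p=2):
    """True if ||x - x0||_p <= r (p=2 gives the Euclidean ball)."""
    diff = [xi - ci for xi, ci in zip(x, x0)]
    return p_norm(diff, p) <= r

def in_halfspace(x, a, b):
    """True if a^T x <= b."""
    return dot(a, x) <= b

def on_hyperplane(x, a, b, tol=1e-12):
    """True if a^T x = b up to a small floating-point tolerance."""
    return abs(dot(a, x) - b) <= tol

point = (0.6, 0.6)
print(in_lp_ball(point, (0.0, 0.0), 1.0, p=2))  # inside the Euclidean unit ball
print(in_lp_ball(point, (0.0, 0.0), 1.0, p=1))  # outside the L1 unit ball
```

The same point can lie in one unit ball and not in another, which is why the choice of p matters when these sets are used as constraints.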

Figure 4.3: Hyperplane.

Definition 4.7 A cone is any set C such that if $x \in C$ and $\theta \ge 0$, then $\theta x \in C$.

We can immediately use these to define affine sets. A set C is affine if, for any two points in it, the line through them is also contained in C. Written more formally, for all $x_1, x_2 \in C$ and $\theta \in \mathbb{R}$, $\theta x_1 + (1-\theta) x_2 \in C$.
This definition also leads to the definition of the affine hull of a set C, which is the smallest affine set containing C. For example, the affine hull of a circle in $\mathbb{R}^2$ is $\mathbb{R}^2$, and the affine hull of a line in $\mathbb{R}^2$ is the line itself (a copy of $\mathbb{R}$). The affine hull can be written out explicitly as:
$$\mathrm{Aff}[C] = \left\{ \theta_1 x_1 + \cdots + \theta_k x_k \;\middle|\; x_i \in C,\ \sum_{i=1}^k \theta_i = 1 \right\}$$

Similarly, the conic hull of C is

$$\mathrm{Cone}[C] = \{x \mid x = \theta_1 x_1 + \cdots + \theta_k x_k,\ \theta_i \ge 0,\ x_i \in C\}$$

Figure 4.4: Cone.

Figure 4.5: Conic Hull.

Before we proceed with the remaining definitions, we also need to give a few topological definitions. Given
a set C,

Definition 4.8 We say that x is on the boundary of C, or $\partial C$, if for every $\epsilon > 0$ the ball of radius $\epsilon$ centered at x contains points both inside and outside the set, that is, $B(x, \epsilon) \cap C \ne \emptyset$ and $B(x, \epsilon) \cap C^c \ne \emptyset$.

Definition 4.9 We say that x is in the interior of C, int C, if $\exists\, \epsilon > 0$ s.t. $B(x, \epsilon) \subset C$.

Definition 4.10 The relative interior of C, rel int C, is defined like the interior, but with the ball restricted to the affine hull of C. Specifically, x is in rel int C if $\exists\, \epsilon > 0$ s.t. $B(x, \epsilon) \cap \mathrm{Aff}[C] \subset C$. This definition is more natural in some ways. Consider a line segment in a 2D space. No point of the segment is in its interior, since any epsilon ball contains points off the line; however, if we restrict our consideration to the affine hull of the line segment (the line containing it), every point other than the two endpoints is in the relative interior.

We can then use set operations to take these and generate more definitions and properties of C:

1. The closure of C, cl C, is $C \cup \partial C$.

2. The relative boundary of C, rel $\partial C$, is $\mathrm{cl}\, C \setminus \mathrm{rel\,int}\, C$.

3. C is closed if $\partial C \subset C$.

4. C is open if $\partial C \cap C = \emptyset$.

5. C is compact if it is closed and bounded in Rn .

4.2 Convex sets

4.2.1 Definitions

Definition 4.11 A set C ⊂ Rn is convex if for any two points in C, the line segment joining them is
contained in C. Formally, it is convex if and only if for all x1 , x2 ∈ C and θ ∈ [0, 1], θx1 + (1 − θ)x2 ∈ C.

Figure 4.6: Convex sets.

Definition 4.12 A convex set is strictly convex if for any two distinct points in the set, the line segment less the endpoints is contained in int C. Formally, if for all $x_1, x_2 \in C$ with $x_1 \ne x_2$ and $\theta \in (0, 1)$, $\theta x_1 + (1-\theta) x_2 \in \mathrm{int}\, C$.

Definition 4.13 The convex hull of a set C is the set of all convex combinations of the points in C:

$$\mathrm{conv}[C] = \left\{ \theta_1 x_1 + \cdots + \theta_k x_k \;\middle|\; x_i \in C,\ \theta_i \ge 0\ \forall i = 1, \dots, k,\ \sum_{i=1}^k \theta_i = 1,\ k \in \mathbb{Z}_+ \right\}$$

Here, we are constraining $\theta_i$ in both ways that we did before: $\theta_i \ge 0$ as in the conic hull, and $\sum_{i=1}^k \theta_i = 1$ as in the affine hull. Just as in the affine hull case, conv[C] is the smallest convex set that contains C. Note that conv[C] is convex and $C \subset \mathrm{conv}[C]$.
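For a finite set of points in the plane, the convex hull is a polygon whose vertices can be computed explicitly. The sketch below uses Andrew's monotone chain algorithm, which is a standard technique and not something covered in the lecture; the names `cross` and `convex_hull_2d` are illustrative.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_2d(points):
    """Vertices of conv[points] in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

square_with_extras = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull_2d(square_with_extras)
print(hull)
```

The interior point (1, 1) and the edge point (1, 0) are convex combinations of the corner points, so they do not appear among the hull vertices.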

Figure 4.7: Convex hull.



4.2.2 Examples

Example 4.14 Many of the objects defined earlier are convex sets: lines, line segments, hyperplanes and half-spaces. Additionally, the empty set $\emptyset$ and singleton sets $\{x\}$ are convex, as is the whole space $\mathbb{R}^d$. $L_p$ balls are convex for $p \ge 1$, but not for $0 < p < 1$.
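The failure of convexity for $0 < p < 1$ can be seen from a single pair of points. The sketch below is an illustration with points chosen here, not an example from the lecture: it takes two points on the boundary of the unit $L_p$ ball and checks whether their midpoint stays inside.

```python
def p_norm(x, p):
    """Lp "norm" (sum_i |x_i|^p)^(1/p); a true norm only for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def midpoint(x, y):
    """The convex combination of x and y with theta = 1/2."""
    return [(xi + yi) / 2.0 for xi, yi in zip(x, y)]

x, y = [1.0, 0.0], [0.0, 1.0]   # both lie on the unit Lp sphere for every p > 0
mid = midpoint(x, y)

for p in (0.5, 1.0, 2.0):
    inside = p_norm(mid, p) <= 1.0
    print(f"p = {p}: midpoint inside unit ball? {inside}")
```

For p = 0.5 the midpoint has "norm" greater than 1, so the unit ball contains x and y but not the segment between them; for p = 1 and p = 2 the midpoint stays inside.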

Example 4.15 A more complex example of a convex set is a polyhedron. This, at least in this course, is the solution set of a finite number of linear equalities and inequalities, i.e., $\{x \mid Ax \le b,\ Cx = d\}$. If the solution set is bounded, we call it a polytope. As we will see later, a polyhedron is easy to show to be convex, as it is an intersection of halfspaces (from $Ax \le b$) and hyperplanes (from $Cx = d$), each of which is convex.

Example 4.16 One can also construct points of a convex set via convex combinations. The definition of the convex hull gives this for a finite set of points. One can also define it for an infinite sum: given a convex set C, points $x_1, x_2, \ldots \in C$ and weights $\theta_1, \theta_2, \ldots \ge 0$ such that $\sum_{i=1}^\infty \theta_i = 1$, then $\sum_{i=1}^\infty \theta_i x_i \in C$ so long as the series converges. Similarly, one can use a probability density function p supported on C, and $\int_C p(x)\, x\, dx \in C$ if the integral exists.

Example 4.17 We can extend this to more complex domains, such as matrices. For instance, the set of $n \times n$ positive semi-definite (PSD) matrices forms a convex cone, which we denote $S^n_+$. Recall that a matrix A is PSD if and only if $A = A^T$ and $x^T A x \ge 0$ for any $x \in \mathbb{R}^n$; equivalently, a symmetric matrix is PSD if and only if all its eigenvalues are non-negative. This may also be written as $A \succeq 0$, which suggests a partial ordering defined by the PSD property, namely, $M \succeq N$ if and only if $M - N$ is PSD. We can prove convexity by linearity: given PSD matrices A and B and $\theta \in [0, 1]$, $x^T[\theta A + (1-\theta)B]x = \theta x^T A x + (1-\theta) x^T B x \ge 0$, as it is a convex combination of non-negative terms (since A and B are PSD).
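The convexity of the PSD cone can be checked numerically in the smallest non-trivial case. The sketch below assumes symmetric 2×2 matrices, for which being PSD is equivalent to having non-negative diagonal entries and a non-negative determinant; the matrices `A` and `B` and the helper names are chosen here for illustration.

```python
def is_psd_2x2(m, tol=1e-12):
    """PSD test for a symmetric 2x2 matrix [[a, b], [b, c]]:
    PSD iff a >= 0, c >= 0 and a*c - b*b >= 0 (up to tolerance)."""
    (a, b), (b2, c) = m
    if abs(b - b2) > tol:
        raise ValueError("matrix must be symmetric")
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def convex_combination(theta, A, B):
    """Entrywise theta * A + (1 - theta) * B for 2x2 matrices."""
    return [[theta * A[i][j] + (1 - theta) * B[i][j] for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 1.0]]     # determinant 1, PSD
B = [[1.0, -2.0], [-2.0, 4.0]]   # determinant 0, PSD of rank one

all_psd = all(is_psd_2x2(convex_combination(k / 10.0, A, B)) for k in range(11))
print(all_psd)
```

A finite grid of values of θ is of course not a proof; the linearity argument in Example 4.17 is what guarantees the result for every θ in [0, 1].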

4.2.3 Representations

We can represent a convex set in two equivalent ways.

Definition 4.18 The primal representation represents a convex set C as a convex hull: the set of all convex combinations of a collection of points. The number of points might be finite (as in the case of a polytope) or infinite (as in the case of a disk).

Definition 4.19 The dual representation represents a closed convex set C as the intersection of all the closed halfspaces containing it.

4.2.4 Convexity Preserving Operations

One easy way to show that a set is convex is to construct it from convex sets via convexity preserving
operations. Here are a few. Given convex sets C, D ⊂ Rn , b ∈ Rn , and A ∈ Rm×n , α ∈ R, the following sets
are convex:

1. Translation: C + b = {x + b : x ∈ C}

2. Scaling: αC = {αx : x ∈ C}

3. Intersection: C ∩ D; this can also be extended to an infinite number of sets.



Figure 4.8: Dual representation as intersection of halfspaces.

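4. Affine map: $AC + b = \{Ax + b \mid x \in C\} \subset \mathbb{R}^m$ (here $b \in \mathbb{R}^m$).

5. Set sum: $C + D = \{x + y \mid x \in C,\ y \in D\}$.

6. Direct sum (Cartesian product): $C \times D = \{(x, y) \mid x \in C,\ y \in D\}$; for $D \subset \mathbb{R}^m$ this is a subset of $\mathbb{R}^{n+m}$.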

A more involved operation is perspective projection: suppose $C \subset \mathbb{R}^n \times \mathbb{R}_{++}$ is convex; then $P(C)$ is also convex, where P is the perspective projection:
$$P(x) = P(x_1, \dots, x_n, t) = (x_1/t, x_2/t, \dots, x_n/t) \in \mathbb{R}^n$$
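To see why the perspective projection preserves convexity, it helps to check what it does to a line segment. The following worked computation is a sketch added for illustration; the symbols $\tilde{x}$, $\tilde{y}$, $s$, $t$ and $\mu$ are notation introduced here, not taken from the lecture.

```latex
\begin{align*}
x &= (\tilde{x}, s), \quad y = (\tilde{y}, t), \quad s, t > 0, \quad \theta \in [0, 1], \\
P(\theta x + (1-\theta) y)
  &= \frac{\theta \tilde{x} + (1-\theta)\tilde{y}}{\theta s + (1-\theta) t}
   = \mu \frac{\tilde{x}}{s} + (1-\mu) \frac{\tilde{y}}{t}
   = \mu P(x) + (1-\mu) P(y), \\
\mu &= \frac{\theta s}{\theta s + (1-\theta) t} \in [0, 1].
\end{align*}
```

As $\theta$ runs from 0 to 1, $\mu$ runs continuously from 0 to 1, so every point on the segment between $P(x)$ and $P(y)$ is the image of a point on the segment between x and y, which lies in C.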

This can be extended further to linear-fractional functions.


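Theorem: image under linear-fractional functions. Let $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $d \in \mathbb{R}$. Define f as:
$$f(x) = \frac{Ax + b}{c^T x + d}, \qquad \mathrm{dom}\, f = \{x \mid c^T x + d > 0\}$$
Then, for any convex set $C \subset \mathrm{dom}\, f$, the image $f(C)$ is also convex.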

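Example 4.20 Let u and v be discrete random variables taking values $u_1, \dots, u_m$ and $v_1, \dots, v_n$, and let $p_{ij} = \Pr(u = u_i, v = v_j)$, $i \in [1, m]$, $j \in [1, n]$, denote their joint probability distribution. Let C be a set of such joint distributions $p = (p_{ij})$. For a fixed j with $\sum_{k=1}^m p_{kj} > 0$ on C, let D be the set of the corresponding conditional distributions $(f_{1j}, \dots, f_{mj})$, where
$$f_{ij} = \Pr(u = u_i \mid v = v_j) = \frac{p_{ij}}{\sum_{k=1}^m p_{kj}}$$

By the linear-fractional theorem above, if C is convex, then D is also convex.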
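The map from a joint distribution to a conditional distribution can be written out in a few lines. This is a minimal sketch with a hypothetical 2×2 joint table `p`; the function name `conditional_given_v` is illustrative.

```python
def conditional_given_v(p):
    """Given p[i][j] = Pr(u = u_i, v = v_j), return f[i][j] = Pr(u = u_i | v = v_j).

    Each column j of the result is a linear-fractional function of p:
    the numerator p[i][j] and the denominator sum_k p[k][j] are both linear in p.
    """
    m, n = len(p), len(p[0])
    col_sums = [sum(p[i][j] for i in range(m)) for j in range(n)]
    if any(s <= 0 for s in col_sums):
        raise ValueError("every value of v must have positive probability")
    return [[p[i][j] / col_sums[j] for j in range(n)] for i in range(m)]

p = [[0.1, 0.2],
     [0.3, 0.4]]          # hypothetical joint distribution, entries sum to 1
f = conditional_given_v(p)
print(f)
```

The positivity check on the column sums plays the role of the domain condition $c^T x + d > 0$ in the theorem.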

4.2.5 Separating hyperplanes

Theorem: Let C and D be two convex sets such that $C \cap D = \emptyset$. Then $\exists\, a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$, s.t.
$\forall x \in C:\ a^T x \le b$,
$\forall x \in D:\ a^T x \ge b$.

Strict inequalities do not always hold, even when C and D do not intersect. This happens, for example, when both C and D contain points of the separating hyperplane.
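For two disjoint Euclidean balls a separating hyperplane can be written down in closed form. The sketch below is one possible construction chosen for illustration (the lecture does not give it): take a to be the direction between the centers and place b halfway between the largest value of $a^T x$ on the first ball and the smallest value on the second. The function name `separate_balls` is hypothetical.

```python
import math

def dot(a, x):
    """Inner product a^T x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def separate_balls(c1, r1, c2, r2):
    """Return (a, b) with a^T x <= b on B(c1, r1) and a^T x >= b on B(c2, r2).

    Assumes the closed Euclidean balls are disjoint: ||c1 - c2|| > r1 + r2.
    """
    a = [q - p for p, q in zip(c1, c2)]
    norm_a = math.sqrt(dot(a, a))
    if norm_a <= r1 + r2:
        raise ValueError("balls are not disjoint")
    max_on_first = dot(a, c1) + r1 * norm_a    # max of a^T x over B(c1, r1)
    min_on_second = dot(a, c2) - r2 * norm_a   # min of a^T x over B(c2, r2)
    return a, 0.5 * (max_on_first + min_on_second)

a, b = separate_balls([0.0, 0.0], 1.0, [4.0, 0.0], 1.0)
print(a, b)
```

Because the gap between the two extreme values is strictly positive here, this hyperplane in fact separates the two balls with strict inequalities.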

Figure 4.9: Separating hyperplane.

Definition 4.21 Strong Separation: Let C and D be two disjoint convex sets. We say that they are strongly separated if they can be shifted by a small amount and stay separated by a hyperplane H(a, b). Formally, $\exists\, \epsilon > 0$ s.t.

$$a^T x > b \quad \forall x \in C + B(0, \epsilon) \qquad \text{and} \qquad a^T y < b \quad \forall y \in D + B(0, \epsilon).$$

Definition 4.22 Proper Separation: C and D are properly separated, if they are separated by hyperplane
H(a, b) and it is not the case that C ⊂ H(a, b) and D ⊂ H(a, b).

Definition 4.23 Strict Separation: C and D are strictly separated by the hyperplane H(a, b) if $a^T x < b$ and $a^T y > b$ for all $x \in C$ and $y \in D$.

Theorem 4.24 Strong Separation Theorem: If C and D are non-empty convex sets in Rn , cl C ∩ cl D = ∅,
and one of them is bounded, then there is a hyperplane that separates them strongly.

Example 4.25 Why do we need boundedness? We can construct an example to show that without the boundedness of at least one of the sets, there may be no strong separation. Let C be the epigraph of $f(x) = 1/x$ for $x > 0$, and let D be the set of all points on or below the line y = 0 in the 2D Euclidean space. Both sets are closed, convex, and disjoint, and the line y = 0 separates them. However, because f(x) tends to 0 as x tends to infinity, the line y = 0 does not strongly separate the two sets.

Theorem 4.26 Another version of the theorem is that C and D are strongly separated iff the distance between the two sets, $\inf\{\|x - y\|_2 \mid x \in C,\ y \in D\}$, is greater than 0. In the example above, the distance between C and D is 0, since the gap between the two sets shrinks to 0 as x tends to infinity.
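The claim about the example can be checked with a short computation. The points $c_x$ and $d_x$ below are names introduced here for illustration.

```latex
\begin{align*}
c_x &= (x, 1/x) \in C, \qquad d_x = (x, 0) \in D, \qquad x > 0, \\
\|c_x - d_x\|_2 &= \frac{1}{x} \longrightarrow 0 \quad (x \to \infty)
\quad\Longrightarrow\quad
\inf\{\|c - d\|_2 \mid c \in C,\ d \in D\} = 0 .
\end{align*}
```

The infimum is not attained: the two sets never touch, yet no positive margin $\epsilon$ fits between them.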

Theorem 4.27 Supporting hyperplane theorem: For any point $x_0$ on the boundary of a convex set C, $\exists$ a hyperplane passing through $x_0$ such that C lies entirely on one side of it.
There is a very useful partial converse of this theorem: If a set C is closed, has a non-empty interior, and has a supporting hyperplane at every point of its boundary, then C is convex.

Figure 4.10: Supporting hyperplane.

4.2.6 How to prove that a set is convex

• Use the definition.

• Represent it as a convex hull.

• Represent it as an intersection of halfspaces.

• Use the supporting hyperplane converse theorem.

• Construct C from convex sets using convexity-preserving operations.

4.3 Convex Functions

4.3.1 Definitions

Definition 4.28 A function f : Rn → R is convex if domf is a convex set and if for all x, y ∈ domf , and
θ with 0 ≤ θ ≤ 1, we have
f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y).

Definition 4.29 A function f is strictly convex if whenever $x \ne y$ and $0 < \theta < 1$, strict inequality holds, that is, we have
$f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y)$.

Example 4.30 $x^4$ is strictly convex.

Definition 4.31 f is concave if −f is convex.

The geometric interpretation of convex functions is shown in Fig. 4.11. The chord (i.e., line segment) between any two points $(x, f(x))$ and $(y, f(y))$ on the graph lies on or above the graph of f.
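The defining inequality can be tested numerically on a grid of points. The sketch below searches for a violation of the inequality for a scalar function; finding none on a finite grid is only evidence of convexity, not a proof, while finding one does prove non-convexity. The function name `find_convexity_violation` and the grids are illustrative choices.

```python
import math

def find_convexity_violation(f, points, thetas, tol=1e-9):
    """Return (x, y, theta) with f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y),
    or None if no violation is found on the given grids."""
    for x in points:
        for y in points:
            for th in thetas:
                lhs = f(th * x + (1 - th) * y)
                rhs = th * f(x) + (1 - th) * f(y)
                if lhs > rhs + tol:
                    return (x, y, th)
    return None

grid = [k / 4.0 for k in range(-12, 13)]     # points in [-3, 3]
thetas = [k / 10.0 for k in range(11)]       # theta in [0, 1]

print(find_convexity_violation(lambda x: x ** 4, grid, thetas))  # no violation expected
print(find_convexity_violation(math.sin, grid, thetas))          # sin is not convex on [-3, 3]
```

The tolerance `tol` guards against reporting a violation that is only floating-point rounding error.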

Figure 4.11: Graph of a convex function

4.3.2 Strongly Convex Function

Definition 4.32 A differentiable function f is called m-strongly convex if m > 0 and

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|_2^2, \quad \forall x, y \in \mathrm{dom}\, f$$

An equivalent condition is

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|_2^2, \quad \forall x, y \in \mathrm{dom}\, f$$

Strong convexity does not require differentiability; the definition can be stated without gradients.

Definition 4.33 A function f is called m-strongly convex if m > 0 and for $0 \le t \le 1$

$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{1}{2} m\, t(1-t) \|x - y\|_2^2, \quad \forall x, y \in \mathrm{dom}\, f$$

If the function is twice continuously differentiable, the definition can also be stated in terms of the Hessian matrix.

Definition 4.34 f is called m-strongly convex if m > 0 and

$$\nabla^2 f(x) \succeq mI, \quad \forall x \in \mathrm{dom}\, f$$

A strongly convex function is also strictly convex, but not vice-versa.
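A worked example may help separate the two notions. The computation below, added here for illustration, checks the gradient-free definition for $f(x) = x^2$ on $\mathbb{R}$ and shows that it holds with equality for m = 2.

```latex
\begin{align*}
t x^2 + (1-t) y^2 - (t x + (1-t) y)^2
  &= t(1-t)\,(x^2 - 2xy + y^2) = t(1-t)\,(x - y)^2 \\
\Longrightarrow\quad
f(t x + (1-t) y) &= t f(x) + (1-t) f(y) - \tfrac{1}{2}\cdot 2 \cdot t(1-t)\,(x-y)^2 .
\end{align*}
```

By contrast, $f(x) = x^4$ has $f''(x) = 12x^2$, which equals 0 at x = 0, so no m > 0 satisfies the Hessian condition: $x^4$ is strictly convex but not strongly convex.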

Example 4.35 Convex functions

• $|x|^p$, $p \ge 1$, $\forall x \in \mathbb{R}$.

• Max function: f (x) = max(x1 , ..., xn ), ∀x ∈ Rn .

• Norms: Every norm on Rn is convex.



Figure 4.12: Epigraph of a function f , shown shaded. The lower boundary, shown darker, is the graph of f .

Example 4.36 Concave functions


• Geometric mean: $f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$ is concave, $\forall x \in \mathbb{R}^n_{++}$.

• Log determinant: $\log\det(X)$ is concave, $\forall X \in S^n_{++}$.

4.3.3 Extended-value Extension

It is often convenient to extend a convex function to all of Rn by defining its value to be ∞ outside its
domain.

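Definition 4.37 $\tilde{f} : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is the extended-value extension of f:

$$\tilde{f}(x) = \begin{cases} f(x) & x \in \mathrm{dom}\, f \\ \infty & x \notin \mathrm{dom}\, f \end{cases}$$

The extension $\tilde{f}$ is defined on all of $\mathbb{R}^n$ and takes values in $\mathbb{R} \cup \{\infty\}$. This does not change its convexity: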

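Theorem 4.38 f is convex
$\Leftrightarrow \tilde{f}$ is convex
$\Leftrightarrow \tilde{f}(\theta x + (1-\theta) y) \le \theta \tilde{f}(x) + (1-\theta) \tilde{f}(y)$ for all $x, y \in \mathbb{R}^n$ and $0 \le \theta \le 1$.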
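Floating-point infinity gives a direct way to experiment with the extended-value extension. The sketch below extends $-\log x$ (a convex function on $x > 0$, used here as an illustrative choice) to all of $\mathbb{R}$; the helper names `extend` and `satisfies_inequality` are hypothetical.

```python
import math

def extend(f, in_domain):
    """Extended-value extension: f on its domain, +infinity elsewhere."""
    def f_ext(x):
        return f(x) if in_domain(x) else math.inf
    return f_ext

def satisfies_inequality(f_ext, x, y, theta):
    """Check f_ext(theta*x + (1-theta)*y) <= theta*f_ext(x) + (1-theta)*f_ext(y).

    Restricted to 0 < theta < 1: in floating point 0 * inf is nan, whereas the
    convention in convex analysis treats that product as 0.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    lhs = f_ext(theta * x + (1 - theta) * y)
    rhs = theta * f_ext(x) + (1 - theta) * f_ext(y)
    return lhs <= rhs

neg_log = extend(lambda x: -math.log(x), lambda x: x > 0)

print(neg_log(-1.0))                                # outside the domain
print(satisfies_inequality(neg_log, 1.0, 4.0, 0.5)) # both points in the domain
print(satisfies_inequality(neg_log, -1.0, 2.0, 0.5))# one point outside the domain
```

When either endpoint lies outside the domain, the right-hand side is infinite and the inequality holds trivially, which is why the extension does not affect convexity.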

4.3.4 Epigraph

Definition 4.39 The epigraph of a function f : Rn → R is defined as


epif = {(x, t)|x ∈ domf, f (x) ≤ t}

The definition is illustrated in Fig. 4.12. The link between convex sets and convex functions is via the epigraph:

Theorem 4.40 f is convex ⇔epif is a convex set



Figure 4.13: 0th order property.

Figure 4.14: 1st order property: If f is convex and differentiable, then f (y) ≥ f (x) + ∇f (x)T (y − x).

4.3.5 Convex Function Properties

4.3.5.1 0th Order Characterization

A function is convex if and only if its restriction to any line that intersects its domain is convex. That is:
$f : \mathbb{R}^n \to \mathbb{R}$ is convex
$\Leftrightarrow g(t) = f(x + tv)$ is convex, with $\mathrm{dom}\, g = \{t \mid x + tv \in \mathrm{dom}\, f\}$, $\forall x \in \mathrm{dom}\, f$, $\forall v \in \mathbb{R}^n$.
This is useful because we only need to check the convexity of 1D functions, as Fig. 4.13 shows.
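The restriction to a line is easy to build programmatically. The sketch below restricts the Euclidean norm (convex, as every norm is) to one particular line and checks midpoint convexity of the resulting 1D function on a grid; the base point, direction, and helper names are illustrative assumptions, and a grid check is evidence rather than proof.

```python
import math

def restrict_to_line(f, x, v):
    """Return g(t) = f(x + t v) for a function f of a list of floats."""
    return lambda t: f([xi + t * vi for xi, vi in zip(x, v)])

def euclidean_norm(z):
    return math.sqrt(sum(zi * zi for zi in z))

def midpoint_convex_on_grid(g, ts, tol=1e-9):
    """Check g((s + t)/2) <= (g(s) + g(t))/2 for all pairs s, t on the grid."""
    return all(g((s + t) / 2.0) <= (g(s) + g(t)) / 2.0 + tol
               for s in ts for t in ts)

g = restrict_to_line(euclidean_norm, [1.0, -2.0], [0.5, 1.0])
ts = [k / 2.0 for k in range(-10, 11)]    # t in [-5, 5]

print(midpoint_convex_on_grid(g, ts))                    # restriction of a norm
print(midpoint_convex_on_grid(lambda t: -g(t), ts))      # its negation is not convex
```

In practice one would repeat the check for several base points `x` and directions `v`, since the characterization requires convexity along every line.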

4.3.5.2 1st Order Characterization

Let f be a differentiable function whose domain dom f is open and convex. Then we have

$$f \text{ is convex} \Leftrightarrow f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall x, y \in \mathrm{dom}\, f$$
This is illustrated in Fig. 4.14. The inequality states that for a convex function, the first-order Taylor approximation is a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.

The inequality also shows that if $\nabla f(x) = 0$, then $f(y) \ge f(x)$ for all $y \in \mathrm{dom}\, f$, so x is a global minimizer of the function.
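The first-order condition can be checked by computing the gap between the function and its tangent plane. The sketch below uses a convex quadratic in two variables chosen here for illustration, with its gradient written out by hand; `first_order_gap` is a hypothetical helper name.

```python
def first_order_gap(f, grad, x, y):
    """f(y) - [f(x) + grad(x)^T (y - x)]; nonnegative for convex differentiable f."""
    g = grad(x)
    return f(y) - f(x) - sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))

def f(z):
    """f(z) = z1^2 + z1*z2 + z2^2, a convex quadratic (illustrative choice)."""
    return z[0] ** 2 + z[0] * z[1] + z[1] ** 2

def grad_f(z):
    """Gradient of f, computed by hand."""
    return [2 * z[0] + z[1], z[0] + 2 * z[1]]

points = [[i / 2.0, j / 2.0] for i in range(-4, 5) for j in range(-4, 5)]
min_gap = min(first_order_gap(f, grad_f, x, y) for x in points for y in points)
print(min_gap)            # nonnegative up to rounding

# The gradient vanishes at the origin, so the origin is a global minimizer.
print(grad_f([0.0, 0.0]), min(f(y) for y in points))
```

For this quadratic the gap equals $d_1^2 + d_1 d_2 + d_2^2$ with $d = y - x$, which is zero only when y = x.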

4.3.5.3 2nd Order Characterization

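Let f be twice differentiable, with dom f open and convex. Then we have

$$f \text{ is convex} \Leftrightarrow \nabla^2 f(x) \succeq 0, \quad \forall x \in \mathrm{dom}\, f$$
If $\nabla^2 f(x) \succ 0$, $\forall x \in \mathrm{dom}\, f$, then f is strictly convex. The converse is not true.
For example, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^4$ is strictly convex but has zero second derivative at x = 0.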
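A quadratic function is the simplest case in which the second-order condition can be read off directly. The example below is added for illustration; Q, q and r are generic symbols introduced here.

```latex
\begin{align*}
f(x) &= \tfrac{1}{2} x^T Q x + q^T x + r, \qquad Q = Q^T \in \mathbb{R}^{n \times n}, \\
\nabla f(x) &= Q x + q, \qquad \nabla^2 f(x) = Q, \\
f \text{ convex} &\iff Q \succeq 0, \qquad Q \succ 0 \implies f \text{ strictly convex}.
\end{align*}
```

Because the Hessian is constant, checking convexity of a quadratic reduces to a single PSD test on Q.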

4.3.6 Jensen’s Inequality

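The basic inequality, $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$, is sometimes called Jensen's inequality. It extends to convex combinations of more than two points: if f is convex, $x_1, \dots, x_k \in \mathrm{dom}\, f$, $\theta_i \ge 0$, and $\sum_{i=1}^k \theta_i = 1$, then $f\left(\sum_{i=1}^k \theta_i x_i\right) \le \sum_{i=1}^k \theta_i f(x_i)$. More generally, if x is a random variable such that $x \in \mathrm{dom}\, f$ with probability one, and f is convex, then we have

$$f(\mathbf{E}\, x) \le \mathbf{E}\, f(x),$$

provided the expectations exist.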
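For a discrete random variable both sides of Jensen's inequality are finite sums and can be computed directly. The sketch below uses $f(x) = x^2$ and a three-point distribution chosen here for illustration; the helper names are hypothetical.

```python
def expectation(values, probs):
    """E[x] for a discrete random variable with the given values and probabilities."""
    return sum(p * v for v, p in zip(values, probs))

def jensen_sides(f, values, probs):
    """Return (f(E x), E f(x))."""
    return f(expectation(values, probs)), expectation([f(v) for v in values], probs)

values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]          # nonnegative and summing to 1

lhs, rhs = jensen_sides(lambda x: x * x, values, probs)
print(lhs, rhs, lhs <= rhs)
```

For $f(x) = x^2$ the difference between the two sides is exactly the variance of x, which gives a familiar interpretation of the gap.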

4.3.7 Convexity-preserving function operations

1. Nonnegative Weighted Sum:
If $f_1, f_2$ are convex and $\omega_1, \omega_2 \ge 0$ ⇒ $h(x) = \omega_1 f_1(x) + \omega_2 f_2(x)$ is convex.

2. Pointwise Max/Sup:
If f, g are convex ⇒ $m(x) = \max\{f(x), g(x)\}$ is convex.

3. Extension of Pointwise Max/Sup:
If f(x, y) is convex in x for each $y \in C$ ⇒ $g(x) = \sup_{y \in C} f(x, y)$ is convex in x, provided $g(x) > -\infty$ for some x.

4. Affine Map:
If $f : \mathbb{R}^n \to \mathbb{R}$ is convex ⇒ $g(x) = f(Ax + b)$ is convex, where $A \in \mathbb{R}^{n \times m}$, $b \in \mathbb{R}^n$.

5. Composition:
If f, g are convex, and g is non-decreasing ⇒ $h(x) = g(f(x))$ is convex.

6. Perspective Map:
If f(x) is convex ⇒ $g(x, t) = t f(x/t)$ is convex for $t > 0$.
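The first two operations in the list above can be combined to build new convex functions from simple pieces. The sketch below, with illustrative helper names, forms a pointwise maximum of affine functions (which gives $|x|$), adds a nonnegative weighted quadratic, and checks midpoint convexity on a grid as a sanity check rather than a proof.

```python
def max_of_affine(pieces):
    """m(x) = max_i (a_i * x + b_i) for scalar x; pieces is a list of (a_i, b_i)."""
    return lambda x: max(a * x + b for a, b in pieces)

def weighted_sum(w1, f1, w2, f2):
    """h(x) = w1*f1(x) + w2*f2(x); convex when f1, f2 are convex and w1, w2 >= 0."""
    if w1 < 0 or w2 < 0:
        raise ValueError("weights must be nonnegative")
    return lambda x: w1 * f1(x) + w2 * f2(x)

def midpoint_convex_on_grid(f, xs, tol=1e-9):
    """Check f((x + y)/2) <= (f(x) + f(y))/2 for all pairs on the grid."""
    return all(f((x + y) / 2.0) <= (f(x) + f(y)) / 2.0 + tol
               for x in xs for y in xs)

abs_value = max_of_affine([(-1.0, 0.0), (1.0, 0.0)])     # max(-x, x) = |x|
h = weighted_sum(2.0, abs_value, 0.5, lambda x: x * x)   # 2|x| + 0.5 x^2
xs = [k / 4.0 for k in range(-20, 21)]

print(midpoint_convex_on_grid(abs_value, xs), midpoint_convex_on_grid(h, xs))
```

Rejecting negative weights in `weighted_sum` mirrors the nonnegativity requirement in rule 1; with a negative weight the result need not be convex.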

4.3.8 How to prove a function is convex


• Use definition directly.
• Prove that the epigraph is convex via set methods.
• 0th, 1st, 2nd order convexity properties.

• Construct f from simpler convex functions using convexity-preserving operations.

4.4 Summary

4.4.1 Convex Sets


• Representation: convex hull, intersection of halfspaces.

• Supporting, separating hyperplanes.


• Operations that preserve convexity.

4.4.2 Convex Functions


• Epigraph.
• 0th order, 1st order, 2nd order conditions.

• Operations that preserve convexity.
