Lec 4
Lecture 4: Convexity
Lecturer: Barnabás Póczos Scribes: Jessica Chemali, David Fouhey, Yuxiong Wang
1. give definitions that are important to convexity as well as examples of convex sets and basic properties;
2. define convex functions and their properties, as well as some examples.
We begin by formalizing a few mathematical objects that we will use throughout the lecture:
Definition 4.1 A line passing through x1 and x2 ∈ Rn forms the set {x ∈ Rn |x = θx1 + (1 − θ)x2 , θ ∈ R}.
Definition 4.2 The line segment passing through x1 and x2 is defined similarly, but with θ confined to [0, 1],
or {x ∈ Rn |x = θx1 + (1 − θ)x2 , θ ∈ [0, 1]}.
Definition 4.3 A Euclidean ball with radius r centered at x0 , B(x0 , r), is: {x ∈ Rn | ||x − x0 ||2 ≤ r}
Definition 4.4 An Lp ball is the equivalent, but for the Lp distance induced by $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, or $\{x \in \mathbb{R}^n \mid \|x - x_0\|_p \le r\}$.
Definition 4.5 A half-space HS(a, b) is a set of points on one side of a hyperplane: {x ∈ Rn |aT x ≤ b} .
Definition 4.6 A hyperplane H(a, b) is defined similarly, but with equality, or {x ∈ Rn |aT x = b} .
We can immediately use these to give a definition of an affine set. Specifically, a set C is affine if for any
two points in it, the line through them is also contained in C. Written more formally, for x1 , x2 ∈ C and
θ ∈ R, θx1 + (1 − θ)x2 ∈ C. This definition also leads to the definition of the affine hull of a set C, which
is the smallest affine set containing C. For example, the affine hull of a circle in R2 is R2 and the affine hull
of a line in R2 is the line itself. The affine hull can be written out explicitly as:
$$\mathrm{Aff}[C] = \left\{ \theta_1 x_1 + \cdots + \theta_k x_k \;\middle|\; x_i \in C,\ \sum_{i=1}^{k} \theta_i = 1 \right\}$$
Before we proceed with the remaining definitions, we also need to give a few topological definitions. Given
a set C,
Definition 4.8 We say that x is on the boundary of C, or ∂C, if for every ε > 0, the ε-radius ball
centered at x contains points both inside and outside the set, or B(x, ε) ∩ C ≠ ∅ and B(x, ε) ∩ C c ≠ ∅.
Definition 4.10 We say that the relative interior of C, rel int C, is the same as the interior, but with the
ball restricted to the affine hull of C. Specifically, x is in rel int C if B(x, ε) ∩ Aff[C] ⊂ C for some ε > 0.
This definition is more natural in some ways. Consider a line segment in a 2D space. No points are in the
interior of the line segment, since any epsilon ball contains points off the line; however, if we restrict our
consideration to the affine hull of the line segment (the line containing it), all points other than the endpoints are.
We can then use set operations to take these and generate more definitions and properties of C:
4.2.1 Definitions
Definition 4.11 A set C ⊂ Rn is convex if for any two points in C, the line segment joining them is
contained in C. Formally, it is convex if and only if for all x1 , x2 ∈ C and θ ∈ [0, 1], θx1 + (1 − θ)x2 ∈ C.
Definition 4.12 A convex set is strictly convex if for any two distinct points in the set, the line
segment less the endpoints is contained in int C. Formally, if for all x1 , x2 ∈ C with x1 ≠ x2 and θ ∈ (0, 1),
θx1 + (1 − θ)x2 ∈ int C.
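Definition 4.13 The convex hull of a set C is the set of all convex combinations of the points in C:
$$\mathrm{conv}[C] = \left\{ \theta_1 x_1 + \cdots + \theta_k x_k \;\middle|\; x_i \in C,\ \theta_i \ge 0\ \forall i = 1, \ldots, k,\ \sum_{i=1}^{k} \theta_i = 1,\ k \in \mathbb{Z}_+ \right\}$$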
Here, we are constraining θi in both ways that we did before: θi ≥ 0 as in the conic hull, and $\sum_{i=1}^{k} \theta_i = 1$
as in the affine hull. Just as in the affine hull case, conv[C] is the smallest convex set that contains
C. Note that conv[C] is convex and C ⊂ conv[C].
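The definition of a convex combination can be illustrated with a short sketch. It assumes a small example set (the four corners of the unit square in R2) and uses hypothetical helper names `convex_combination` and `random_simplex_weights` that are not part of the lecture. Every sampled combination should land inside the square, which is the convex hull of its corners.

```python
import random

def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] for points in R^n."""
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]

def random_simplex_weights(k, rng):
    """Draw k non-negative weights that sum to 1 (theta_i >= 0, sum theta_i = 1)."""
    raw = [rng.uniform(0.1, 1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # assumed example set C

samples = [convex_combination(square, random_simplex_weights(4, rng))
           for _ in range(1000)]

# Every convex combination of the corners stays inside the unit square.
all_inside = all(-1e-12 <= c <= 1.0 + 1e-12 for s in samples for c in s)
```

The sampling scheme here does not cover the hull uniformly; it is only meant to show that the two constraints on θ keep every combination inside the hull.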
4.2.2 Examples
Example 4.14 Many of the objects defined earlier are convex sets: lines, line segments, hyperplanes and
half-spaces. Additionally, the empty set ∅ and singleton sets {x} are convex, as is the whole space Rd .
Lp balls are convex for p ≥ 1, but are not for 0 < p < 1.
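The claim about Lp balls can be checked on a single pair of points. The sketch below assumes the unit ball centered at the origin in R2 and uses illustrative helper names (`lp_norm`, `midpoint`). The points (1, 0) and (0, 1) lie in the unit Lp ball for every p > 0, but their midpoint leaves the ball when p < 1.

```python
def lp_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p); this is a norm only when p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def midpoint(x, y):
    """Convex combination of x and y with theta = 1/2."""
    return [(xi + yi) / 2.0 for xi, yi in zip(x, y)]

x1, x2 = [1.0, 0.0], [0.0, 1.0]   # both have Lp "norm" 1 for any p > 0
mid = midpoint(x1, x2)            # (0.5, 0.5)

norm_p2 = lp_norm(mid, 2)         # below 1: midpoint is inside the L2 ball
norm_p1 = lp_norm(mid, 1)         # equal to 1: midpoint is on the L1 boundary
norm_p_half = lp_norm(mid, 0.5)   # above 1: midpoint is outside the L0.5 "ball"
```

One violating pair is enough to show non-convexity for p = 0.5; the convex cases p = 1 and p = 2 are only spot-checked here, not proved.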
Example 4.15 A more complex example of a convex set is a polyhedron. This, at least in this course, is
the solution set of a finite number of linear equalities and inequalities (i.e., {x|Ax ≤ b, Cx = d}). If the
solution set is bounded, we call it a polytope. As we will see later, this is easy to show as convex, as it is
an intersection of halfspaces (from Ax ≤ b) and hyperplanes (from Cx = d), each of which is convex.
Example 4.16 One can also construct convex sets via the convex combination of points. The definition
of the convex hull gives it for a finite set of points. One can also define it for an infinite sum: given
x1 , x2 , . . . ∈ C and θ1 , θ2 , . . . ≥ 0 such that $\sum_{i=1}^{\infty} \theta_i = 1$, then $\sum_{i=1}^{\infty} \theta_i x_i \in C$ so long as the series converges.
Similarly, one can use a density function p, and $\int_C p(x)\, x \, dx \in C$ if the integral exists.
Example 4.17 We can extend this to more complex domains, such as matrices. For instance, the set of
n × n positive semi-definite (PSD) matrices forms a convex cone, which we denote $S^n_+$. Recall that a matrix
A is PSD if and only if $A = A^T$ and $x^T A x \ge 0$ for any $x \in \mathbb{R}^n$; equivalently, a symmetric matrix is PSD if
and only if all its eigenvalues are non-negative. This may also be written as $A \succeq 0$, which suggests a partial
ordering defined by the PSD property, namely, $M \succeq N$ if and only if M − N is PSD. We can prove convexity by
linearity: given PSD matrices A and B and θ ∈ [0, 1], $x^T[\theta A + (1-\theta)B]x = \theta x^T A x + (1-\theta) x^T B x \ge 0$,
as it is a convex sum of non-negative terms (since A and B are PSD).
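The convexity argument for PSD matrices can be spot-checked numerically. The sketch below is restricted to symmetric 2 × 2 matrices so that the smallest eigenvalue has a closed form; the matrices A and B and the helper names are assumptions chosen for illustration.

```python
import math

def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)

def mix(theta, A, B):
    """Entrywise theta * A + (1 - theta) * B."""
    return [[theta * A[i][j] + (1 - theta) * B[i][j] for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 1 and 3, so A is PSD
B = [[1.0, -1.0], [-1.0, 1.0]]   # eigenvalues 0 and 2, PSD but singular

thetas = [k / 10.0 for k in range(11)]
min_eigs = [min_eigenvalue_2x2(mix(t, A, B)) for t in thetas]

# Every convex combination keeps a non-negative smallest eigenvalue.
stays_psd = all(e >= -1e-12 for e in min_eigs)
```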
4.2.3 Representations
Definition 4.18 The primal representation represents a convex set C using its convex hull: a convex
combination of its points. The number of points needed might be finite (as in the case of a polytope) or
infinite (as in the case of a disk).
Definition 4.19 The dual representation represents a closed convex set C as the intersection of all the
closed halfspaces containing it.
One easy way to show that a set is convex is to construct it from convex sets via convexity preserving
operations. Here are a few. Given convex sets C, D ⊂ Rn , b ∈ Rn , and A ∈ Rm×n , α ∈ R, the following sets
are convex:
1. Translation: C + b = {x + b : x ∈ C}
2. Scaling: αC = {αx : x ∈ C}
A more involved operation is perspective projection: suppose C ⊂ Rn × R++ is convex, then P (C) is also
convex, where P is perspective projection, or:
P (x) = P (x1 , . . . , xn , t) = (x1 /t, x2 /t, . . . , xn /t) ∈ Rn
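A minimal sketch of the perspective projection follows, with a check on one assumed pair of points that the image of a line segment stays between the images of its endpoints. The function name `perspective` and the example points are illustrative.

```python
def perspective(point):
    """Map (x_1, ..., x_n, t) with t > 0 to (x_1 / t, ..., x_n / t)."""
    *x, t = point
    if t <= 0:
        raise ValueError("perspective projection requires t > 0")
    return [xi / t for xi in x]

# Two points in R x R_++ (n = 1), chosen for illustration.
u, v = [1.0, 1.0], [4.0, 2.0]

images = []
for k in range(11):
    theta = k / 10.0
    z = [theta * ui + (1 - theta) * vi for ui, vi in zip(u, v)]
    images.append(perspective(z)[0])

lo, hi = sorted([perspective(u)[0], perspective(v)[0]])
# The image of the segment [u, v] lies in the segment [P(u), P(v)].
segment_maps_into_segment = all(lo - 1e-12 <= y <= hi + 1e-12 for y in images)
```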
Example 4.20 Let C = {pij |u = ui ∈ [0, 1], v = vj ∈ [0, 1], i ∈ [1, m], j ∈ [1, n]}. The set C represents the
joint probability distribution of discrete variables u and v. Let
D = {fij |i ∈ [1, m], j ∈ [1, n]}, where
$$f_{ij} = \Pr(u = u_i \mid v = v_j) = \frac{p_{ij}}{\sum_{k=1}^{m} p_{kj}}$$
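The map from joint to conditional probabilities can be written out directly. The sketch below assumes a small 2 × 2 joint table with strictly positive column sums; the function name `conditional_given_v` and the numbers are illustrative only.

```python
def conditional_given_v(joint):
    """Given joint[i][j] = Pr(u = u_i, v = v_j), return f[i][j] = Pr(u = u_i | v = v_j).

    Assumes every column sum Pr(v = v_j) is strictly positive.
    """
    m, n = len(joint), len(joint[0])
    col_sums = [sum(joint[i][j] for i in range(m)) for j in range(n)]
    return [[joint[i][j] / col_sums[j] for j in range(n)] for i in range(m)]

joint = [[0.1, 0.2],
         [0.3, 0.4]]   # assumed joint distribution p_ij, entries sum to 1

f = conditional_given_v(joint)
# Each column of f is a probability distribution over the values of u.
column_totals = [sum(f[i][j] for i in range(2)) for j in range(2)]
```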
Theorem: Let C and D be two non-empty convex sets such that C ∩ D = ∅. Then ∃ a ∈ Rn with a ≠ 0, and b ∈ R, s.t.
∀x ∈ C aT x ≤ b,
∀x ∈ D aT x ≥ b.
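For a concrete case, a separating hyperplane between two disjoint Euclidean balls can be constructed in closed form. This is a sketch for that special case only, not a general algorithm; the function name and the example balls are assumptions.

```python
import math

def separating_hyperplane_for_balls(c1, r1, c2, r2):
    """Return (a, b) with a^T x <= b on ball 1 and a^T x >= b on ball 2.

    Assumes the balls are disjoint: ||c2 - c1||_2 > r1 + r2.
    """
    a = [q - p for p, q in zip(c1, c2)]
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    if norm_a <= r1 + r2:
        raise ValueError("balls intersect; this construction does not apply")
    a = [ai / norm_a for ai in a]  # unit normal pointing from ball 1 to ball 2
    upper_1 = sum(ai * ci for ai, ci in zip(a, c1)) + r1  # max of a^T x over ball 1
    lower_2 = sum(ai * ci for ai, ci in zip(a, c2)) - r2  # min of a^T x over ball 2
    b = (upper_1 + lower_2) / 2.0  # place the hyperplane midway in the gap
    return a, b

a, b = separating_hyperplane_for_balls([0.0, 0.0], 1.0, [3.0, 0.0], 1.0)
# The extreme points (1, 0) of ball 1 and (2, 0) of ball 2 fall on opposite sides.
side_1 = a[0] * 1.0 + a[1] * 0.0
side_2 = a[0] * 2.0 + a[1] * 0.0
```

Because the gap between these two balls is positive, both inequalities hold strictly here; the theorem itself only guarantees the non-strict versions.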
Strict inequalities do not always hold even when C and D do not intersect, for example, when both C and D
contain points of the separating hyperplane.
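Definition 4.21 Strong Separation: Let C and D be two disjoint convex sets. We say that they are strongly
separated if they can be shifted by a small amount and stay separated by a hyperplane H(a, b). Formally,
there exists ε > 0 such that C + εB and D + εB lie in opposite open half-spaces of H(a, b), where B is the
Euclidean unit ball.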
Definition 4.22 Proper Separation: C and D are properly separated, if they are separated by hyperplane
H(a, b) and it is not the case that C ⊂ H(a, b) and D ⊂ H(a, b).
Definition 4.23 Strict Separation: C and D are strictly separated by hyperplane H(a,b) if aT x < b and aT y >
b ∀x ∈ C and y ∈ D.
Theorem 4.24 Strong Separation Theorem: If C and D are non-empty convex sets in Rn , cl C ∩ cl D = ∅,
and one of them is bounded, then there is a hyperplane that separates them strongly.
Example 4.25 Why do we need boundedness? We can construct an example to show that without the
boundedness of at least one of the sets, strong separation can fail. Let C be the epigraph of
f (x) = 1/x for x > 0, and D be the set of all points below the line y = 0 in the 2D Euclidean space. Then
because f (x) tends to 0 as x → ∞, the line y = 0 does not strongly separate the two sets.
Theorem 4.26 Another version of the theorem is that C and D are strongly separated iff the distance
between the two sets, inf{||x − y||2 : x ∈ C, y ∈ D}, is greater than 0. In the example above the gap between C
and D shrinks to 0 as x tends to infinity, so the distance between the sets is 0.
Theorem 4.27 Supporting hyperplane theorem: For any point x0 on the boundary of a convex set C, ∃ a
hyperplane passing through x0 such that C lies entirely on one side of it.
There is a very useful partial converse for this theorem: If a set C is closed and has a non-empty interior,
and ∃ a supporting hyperplane at every one of its boundary points, then C is convex.
4.3.1 Definitions
Definition 4.28 A function f : Rn → R is convex if domf is a convex set and if for all x, y ∈ domf , and
θ with 0 ≤ θ ≤ 1, we have
f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y).
Definition 4.29 A function f is strictly convex if whenever x ≠ y, and 0 < θ < 1, strict inequality holds,
that is, we have
f (θx + (1 − θ)y) < θf (x) + (1 − θ)f (y).
The geometric interpretation of convex functions is shown in Fig. ??. The chord (i.e., line segment)
between any two points (x, f (x)) and (y, f (y)) on the graph lies on or above the graph of f .
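The defining inequality suggests a simple randomized test for functions of one variable. The sketch below searches for a triple (x, y, θ) that violates the inequality; the helper names, the sampling intervals, and the tolerance are assumptions. Finding a violation proves that a function is not convex, while finding none is only evidence, not a proof.

```python
import math
import random

def violates_convexity(f, x, y, theta, tol=1e-9):
    """True if f(theta*x + (1-theta)*y) exceeds theta*f(x) + (1-theta)*f(y) by more than tol."""
    z = theta * x + (1 - theta) * y
    return f(z) > theta * f(x) + (1 - theta) * f(y) + tol

def find_counterexample(f, lo, hi, trials=10000, seed=0):
    """Random search on [lo, hi] for a violating (x, y, theta); None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, theta = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if violates_convexity(f, x, y, theta):
            return x, y, theta
    return None

square_result = find_counterexample(lambda x: x * x, -5.0, 5.0)  # x^2 is convex
sine_result = find_counterexample(math.sin, 0.0, 3.0)            # sin is concave on [0, 3]
```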
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|_2^2, \quad \forall x, y \in \operatorname{dom} f$$
An equivalent condition is
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|_2^2, \quad \forall x, y \in \operatorname{dom} f$$
It is not necessary for a function to be differentiable. We could have the definition without gradient.
$$f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) - \frac{1}{2} m\, t(1 - t) \|x - y\|_2^2, \quad \forall x, y \in \operatorname{dom} f,\ t \in [0, 1]$$
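The gradient-free inequality can be checked numerically for a specific function. For the assumed example f (x) = x² on R, the inequality holds with equality when m = 2, so it should pass for m = 2 and fail for any larger m. The helper name `strong_convexity_gap` and the sampling range are illustrative.

```python
import random

def strong_convexity_gap(f, m, x, y, t):
    """Right side minus left side of the gradient-free inequality (non-negative when it holds)."""
    rhs = t * f(x) + (1 - t) * f(y) - 0.5 * m * t * (1 - t) * (x - y) ** 2
    return rhs - f(t * x + (1 - t) * y)

def f(x):
    return x * x

rng = random.Random(0)
triples = [(rng.uniform(-3, 3), rng.uniform(-3, 3), rng.random())
           for _ in range(1000)]

# For f(x) = x^2 the gap is exactly zero at m = 2, up to rounding.
holds_m2 = all(strong_convexity_gap(f, 2.0, x, y, t) >= -1e-9 for x, y, t in triples)
# With m = 3 the gap is -0.5 * t * (1 - t) * (x - y)^2, which is negative.
holds_m3 = all(strong_convexity_gap(f, 3.0, x, y, t) >= -1e-9 for x, y, t in triples)
```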
If the function is twice continuously differentiable, we could have the definition with Hessian matrix.
• |x|^p , p ≥ 1, ∀x ∈ R.
Figure 4.12: Epigraph of a function f , shown shaded. The lower boundary, shown darker, is the graph of f .
It is often convenient to extend a convex function to all of Rn by defining its value to be ∞ outside its
domain.
The extension f̃ is defined on all Rn , and takes values in R ∪ {∞}. This does not change its convexity.
4.3.4 Epigraph
The definition is illustrated in Fig. ??. The link between convex sets and convex functions is via the epigraph:
a function is convex if and only if its epigraph is a convex set.
Figure 4.14: 1st order property: If f is convex and differentiable, then f (y) ≥ f (x) + ∇f (x)T (y − x).
A function is convex if and only if its restriction to any line that intersects its domain is convex. That is:
f : Rn → R is convex
⇔ g(t) = f (x + tv) is convex, dom g = {t|x + tv ∈ dom f }, ∀x ∈ dom f, ∀v ∈ Rn .
This is useful, because we only need to check the convexity of 1D functions, as Fig. ?? shows.
The first-order inequality f (y) ≥ f (x) + ∇f (x)T (y − x) also shows that if ∇f (x) = 0, then f (y) ≥ f (x)
for all y ∈ dom f , and x is a global minimizer of the function.
The basic inequality, f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y), is sometimes called Jensen’s inequality. It
extends to convex combinations of more than two points, and further to expectations. If x is a random
variable such that x ∈ dom f with probability one, and f is convex, then we have
f (Ex) ≤ Ef (x),
provided the expectations exist.
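Jensen's inequality can be observed on an empirical distribution, where the expectation is a finite convex combination with equal weights. The sketch below assumes f = exp and standard normal samples; both choices are illustrative.

```python
import math
import random

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # assumed distribution of x

f = math.exp  # a convex function on R

mean_x = sum(samples) / len(samples)
f_of_mean = f(mean_x)                                    # f(E x) for the empirical distribution
mean_of_f = sum(f(s) for s in samples) / len(samples)    # E f(x) for the empirical distribution

# Jensen's inequality for the empirical distribution: f(E x) <= E f(x).
jensen_holds = f_of_mean <= mean_of_f
```

Since the empirical mean is itself a convex combination of the samples, the inequality holds for any sample set, not just on average.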
2. Pointwise Max/Sup:
If f, g are convex ⇒ m(x) = max{f (x), g(x)} is convex.
4. Affine Map:
If f : Rn → R is convex ⇒ g(x) = f (Ax + b) is convex, where A ∈ Rn×m , b ∈ Rn .
5. Composition:
If f, g are convex, and g is non-decreasing ⇒ h(x) = g(f (x)) is convex.
6. Perspective Map:
If f (x) is convex ⇒ g(x, t) = tf (x/t) is convex on its domain {(x, t) | x/t ∈ dom f, t > 0}.
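The perspective map can be spot-checked for joint convexity in (x, t). The sketch below assumes f (x) = x², whose perspective is g(x, t) = x²/t for t > 0, and samples random pairs of points; the helper name `perspective_of` and the sampling ranges are illustrative. A non-negative worst gap is evidence of convexity, not a proof.

```python
import random

def perspective_of(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("g is only defined for t > 0")
        return t * f(x / t)
    return g

g = perspective_of(lambda x: x * x)  # g(x, t) = x^2 / t

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(5000):
    x1, t1 = rng.uniform(-2, 2), rng.uniform(0.1, 2)
    x2, t2 = rng.uniform(-2, 2), rng.uniform(0.1, 2)
    th = rng.random()
    lhs = g(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * g(x1, t1) + (1 - th) * g(x2, t2)
    worst_gap = min(worst_gap, rhs - lhs)  # should never be meaningfully negative
```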
4.4 Summary