Review 3
Contents
1 Convex optimization
1.1 Convex functions and convex sets
1.2 Convex optimization problems
1.3 Examples of convex optimizations
1 Convex optimization
1.1 Convex functions and convex sets
For a smooth function f, we have the following equivalent definitions for f to be convex:
(a) f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all λ ∈ [0, 1]
(b) ∇f is monotone: (∇f(x) − ∇f(y), x − y) ≥ 0
(c) ∇²f(x) is nonnegative definite (positive semidefinite)
For smooth functions, we usually use (c) to check whether a function is convex or not (see the numerical sketch below). For the convexity of general functions (or of operations on convex functions), we use (a).
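For instance, criterion (c) can be spot-checked numerically. The following Python sketch (with an illustrative function chosen here, not one from the notes) samples random points and tests that the Hessian has no negative eigenvalue:

```python
import numpy as np

# Spot check of criterion (c) for f(x1, x2) = x1^2 + x1*x2 + x2^2:
# its Hessian [[2, 1], [1, 2]] must be nonnegative definite everywhere.
def hessian_f(x):
    # Constant Hessian for this quadratic example.
    return np.array([[2.0, 1.0], [1.0, 2.0]])

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.uniform(-10, 10, size=2)
    assert np.linalg.eigvalsh(hessian_f(x)).min() >= -1e-12
print("Hessian nonnegative definite at all samples: consistent with convexity")
```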
Similarly, we have the following conditions for strict convexity ((a) and (b) are equivalent, while (c) is sufficient but not necessary, e.g., f(x) = x⁴ at x = 0):
(a) f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y), with equality only when x = y or λ = 0, 1.
(b) ∇f is strictly monotone: (∇f(x) − ∇f(y), x − y) > 0 if y ≠ x.
(c) ∇²f(x) is positive definite.
Operations on functions preserving convexity
(i) If f and g are convex, then so are m(x) = max(f(x), g(x)) and h(x) = f(x) + g(x)
(ii) If f is convex and g is convex and non-decreasing, then h(x) = g(f(x)) is convex
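As a quick numerical illustration of (i) (a sketch with made-up convex f and g, using definition (a) at λ = 1/2; a finite sample is of course not a proof):

```python
import numpy as np

# f and g are convex, so m(x) = max(f(x), g(x)) should satisfy
# m((x + y)/2) <= (m(x) + m(y))/2 for all x, y.
f = lambda x: (x - 1.0) ** 2
g = lambda x: np.abs(x)
m = lambda x: np.maximum(f(x), g(x))

rng = np.random.default_rng(1)
x, y = rng.uniform(-5, 5, 1000), rng.uniform(-5, 5, 1000)
assert np.all(m((x + y) / 2) <= (m(x) + m(y)) / 2 + 1e-12)
print("midpoint convexity holds at all sampled pairs")
```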
Theorem 1.1 (Relation between convex functions and convex sets). (a) If f is convex, then the sublevel set {x | f(x) ≤ c} (which could be empty) is convex for any constant c. (b) If f is convex, then the epigraph of f, the set {(x, t) | f(x) ≤ t}, is convex.
This tells us directly that a large collection of sets are convex, like the balls {x | ‖x − x0‖₂ ≤ r}.
The relation between convex sets and convex functions helps us determine quickly whether an optimization problem like the ones below is convex or not.
Example 1.1 (Projection on convex sets). Let Ω be a convex set. The projection PΩ(x) of x on the convex set Ω is the minimizer of
min_{y∈Ω} ‖x − y‖₂.
Theorem 1.2 (Characterization of projection on convex sets). Let Ω be a closed convex set. Then PΩ(x) is the projection of x on Ω if and only if for any y ∈ Ω,
(y − PΩ(x), x − PΩ(x)) ≤ 0.
Proof. If PΩ(x) is the projection of x on Ω, then for y ∈ Ω, the point (1 − λ)PΩ(x) + λy ∈ Ω for λ ∈ (0, 1) (because Ω is convex). From the definition, PΩ(x) has the smallest distance to x, i.e.,
‖x − PΩ(x)‖₂² ≤ ‖x − (1 − λ)PΩ(x) − λy‖₂² = ‖x − PΩ(x)‖₂² − 2λ(y − PΩ(x), x − PΩ(x)) + λ²‖y − PΩ(x)‖₂².
Dividing by λ > 0 and letting λ → 0⁺ gives (y − PΩ(x), x − PΩ(x)) ≤ 0.
Figure 1: The projection PΩ(x) of a point x on the convex set Ω.
Conversely, suppose (y − PΩ(x), x − PΩ(x)) ≤ 0 for all y ∈ Ω. Then
‖y − x‖₂² = ‖y − PΩ(x)‖₂² − 2(y − PΩ(x), x − PΩ(x)) + ‖PΩ(x) − x‖₂² ≥ ‖y − PΩ(x)‖₂² + ‖PΩ(x) − x‖₂².
If y ≠ PΩ(x), then ‖y − PΩ(x)‖₂² > 0 and therefore ‖y − x‖₂² > ‖PΩ(x) − x‖₂². This also shows the uniqueness of the projection PΩ(x).
Example 1.2 (Projection on the unit ball Ω = {y | ‖y‖₂ ≤ 1}). If x ∈ Ω, i.e., ‖x‖₂ ≤ 1, then PΩ(x) = x. Otherwise, PΩ(x) is in the same direction as x and is located on the boundary of Ω, which gives PΩ(x) = x/‖x‖₂. Therefore
PΩ(x) = x if ‖x‖₂ ≤ 1, and PΩ(x) = x/‖x‖₂ otherwise.
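A small sketch implementing this projection and checking the characterization of Theorem 1.2 at random feasible points (the function name is ours):

```python
import numpy as np

def proj_unit_ball(x):
    # P(x) = x inside the ball, x / ||x||_2 outside.
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

rng = np.random.default_rng(2)
x = np.array([2.0, 1.5])          # a point outside the ball
p = proj_unit_ball(x)
for _ in range(1000):
    y = rng.uniform(-1.0, 1.0, size=2)
    if np.linalg.norm(y) <= 1.0:  # keep only y in the feasible set
        assert np.dot(y - p, x - p) <= 1e-12
print("(y - P(x), x - P(x)) <= 0 for all sampled y in the ball")
```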
Figure 2: The projections PΩ(x1), PΩ(x2), PΩ(x3) of points x1, x2, x3 outside the unit ball.
Example 1.4 (Projection on subspaces). If Ω is an (affine) subspace (like a hyperplane, not necessarily containing the origin), then first we have
(y − PΩ(x), x − PΩ(x)) ≤ 0.
On the other hand, 2PΩ(x) − y ∈ Ω (the special property when Ω is a subspace). Replacing y by 2PΩ(x) − y ∈ Ω in the previous inequality gives (y − PΩ(x), x − PΩ(x)) ≥ 0, and therefore
(y − PΩ(x), x − PΩ(x)) = 0,
i.e., x − PΩ(x) is orthogonal to Ω.
Figure 3: The projection of x on a subspace Ω, which has the special property that if y, PΩ(x) ∈ Ω, then so is 2PΩ(x) − y.
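Numerically, the orthogonality (y − PΩ(x), x − PΩ(x)) = 0 can be observed by projecting onto the column span of a random matrix V (a sketch; V and the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
V = rng.standard_normal((5, 2))             # Omega = span of the columns of V
x = rng.standard_normal(5)
mu, *_ = np.linalg.lstsq(V, x, rcond=None)  # least-squares coefficients
p = V @ mu                                  # P_Omega(x)
# x - P(x) should be orthogonal to every y in Omega, i.e. to the columns of V.
print(np.abs(V.T @ (x - p)).max())          # ~ 1e-15
```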
Example 1.5.
min f (x) = |x − x1 | + · · · + |x − xm |, x ∈ R,
where x1 < x2 < · · · < xm are m constants.
Here each |x − xi| is convex, and so is their sum. Therefore, this is a convex optimization problem.
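A quick numerical look at this example (data points chosen arbitrarily): the minimum of f is attained at a median of the points, which is the standard solution of this problem.

```python
import numpy as np

# f(x) = sum_i |x - x_i| evaluated on a fine grid.
xi = np.array([0.5, 1.0, 2.0, 4.0, 7.0])
grid = np.linspace(-1.0, 9.0, 100001)
f = np.abs(grid[:, None] - xi[None, :]).sum(axis=1)
print(grid[np.argmin(f)], np.median(xi))   # both ~ 2.0, the median
```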
Example 1.6.
min f(x) = √((x1 − 2)² + (x2 − 2)²)
subject to x1 + x2 = 2.
The objective function can be written as f(x) = ‖x − x0‖₂ where x0 = (2, 2), and is convex. The constraint is a linear equality, which is also convex. This is a convex optimization problem.
Example 1.7.
min f(x) = √((x1 − 2)² + (x2 − 2)²)
subject to x1 ≤ 1,
x2 ≤ x1.
The objective is the same as before and is convex. The constraints are two linear inequalities, and are convex. This is a convex optimization problem.
Example 1.8.
min f(x) = x1 + x2
subject to 2 − x1² − x2² ≥ 0.
The objective function f(x) = x1 + x2 is linear, and hence convex. We have to write the constraint as c1(x) = x1² + x2² − 2 ≤ 0. Since c1 is convex, this is a convex optimization problem.
Example 1.9.
min f(x) = x1 + x2
subject to 2 − x1² − x2² ≥ 0, x2 ≥ 0.
Compared to the previous problem, we have the additional constraint x2 ≥ 0. Since it is convex,
this new optimization problem is convex.
Example 1.10.
min f(x) = x1 + x2
subject to x1² + x2² = 1.
The constraint (a circle) is NOT convex, therefore this is not a convex optimization problem. In general, a convex optimization problem can only have linear (affine) equality constraints, not nonlinear equality constraints.
Example 1.11. The minimization problem
min f(x)
where f(x) = max(x, x²) is convex, because both x and x² are convex, and so is their maximum f.
We can write it in the following equivalent form:
min t
subject to x ≤ t,
x² ≤ t.
This problem has the objective function f(x, t) = t, which is convex. The two constraints are convex too, and this alternative form is also a convex optimization problem. The reason we prefer this form is that everything is differentiable, while the original f has kinks (and thus is non-differentiable) at x = 0 and x = 1.
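For instance, the smooth epigraph form can be handed directly to a generic constrained solver (a sketch using scipy's SLSQP; any solver accepting smooth inequality constraints would do):

```python
import numpy as np
from scipy.optimize import minimize

# Variables z = (x, t): minimize t subject to t - x >= 0 and t - x^2 >= 0.
res = minimize(
    fun=lambda z: z[1],
    x0=np.array([1.0, 2.0]),
    method="SLSQP",
    constraints=[
        {"type": "ineq", "fun": lambda z: z[1] - z[0]},
        {"type": "ineq", "fun": lambda z: z[1] - z[0] ** 2},
    ],
)
print(res.x)  # ~ (0, 0): min of max(x, x^2) is 0, attained at x = 0
```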
Example 1.12.
min (x1 − 1)² + (x2 − 1)²
subject to ‖x‖₁ = |x1| + |x2| ≤ 1.
The objective function is convex, because its Hessian matrix is positive definite. The constraint is convex because ‖x‖₁ is convex. Therefore this is a convex optimization problem.
Example 1.13.
min x1
subject to |x1 − 1| + x2 ≤ 4,
x1 − |x2 − 1| ≥ 0.
The objective function is convex. The first constraint c1 (x) = |x1 − 1| + x2 − 4 ≤ 0 is convex,
because c1 is the sum of two convex functions |x1 − 1| and x2 − 4. The second constraint can be
written as c2 (x) = |x2 − 1| − x1 ≤ 0, and c2 is also the sum of two convex functions |x2 − 1| and
−x1 . Therefore, this is a convex optimization problem.
If the objective function f is convex only on part of the domain, we have to check whether f is convex on the feasible region (even though it is not convex on the whole domain).
Example 1.14.
min f(x) = x1³ + x2²
subject to −1 ≤ x1 ≤ 0.
The Hessian matrix of the objective function is
∇²f(x) = diag(6x1, 2),
which has a negative eigenvalue 6x1 if −1 ≤ x1 < 0. Therefore f is not convex on −1 ≤ x1 < 0 and this is NOT a convex optimization problem.
Example 1.15.
min f(x) = x1³ + x2²
subject to 0 ≤ x1 ≤ 1.
The Hessian matrix of the objective function is
∇²f(x) = diag(6x1, 2),
which has non-negative eigenvalues on the whole feasible region. This is a convex optimization problem.
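The two cases can be compared numerically by evaluating the eigenvalues of ∇²f(x) = diag(6x1, 2) over each feasible interval (a simple sketch):

```python
import numpy as np

def min_eig(x1):
    # Smallest eigenvalue of diag(6*x1, 2).
    return np.linalg.eigvalsh(np.diag([6.0 * x1, 2.0])).min()

# Example 1.14: negative for x1 < 0, so f is not convex there.
print([round(min_eig(t), 2) for t in np.linspace(-1.0, 0.0, 5)])
# Example 1.15: nonnegative on the whole interval, so f is convex there.
print([round(min_eig(t), 2) for t in np.linspace(0.0, 1.0, 5)])
```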
Example 1.16.
min f(x) = x1
subject to (x1 − 1)² + x2² = 1,
(x1 + 1)² + x2² = 1.
In general, problems subject to nonlinear equality constraints cannot be convex, but this one is special in the sense that the feasible region has only one point, the origin. Therefore this is a convex optimization problem (though a trivial one).
Example 1.17.
min f(x) = x1² − 2x1 + x2² − x3² + 4x3
subject to x1 − x2 + 2x3 = 2.
The objective function f(x) is not convex (because of the −x3² term), but on the (convex) feasible region Ω it could be convex. One way to show this is to write x2 = x1 + 2x3 − 2 (this should be the choice with the smallest amount of calculation). Equivalently, we can show the convexity of the function
q(x1, x3) = f(x1, x1 + 2x3 − 2, x3) = 2x1² + 3x3² + 4x1x3 − 6x1 − 4x3 + 4.
Since the Hessian matrix of q,
∇²q(x) = \begin{pmatrix} 4 & 4 \\ 4 & 6 \end{pmatrix},
is positive definite, q (and thus f on the feasible region) is convex. Notice that we have the relation ∇²q(x) = Zᵀ∇²f(x)Z, where
Z = \begin{pmatrix} 1 & 0 \\ 1 & 2 \\ 0 & 1 \end{pmatrix},
whose columns are in the null space of the constraint, i.e., they satisfy x1 − x2 + 2x3 = 0.
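The reduced-Hessian identity ∇²q = Zᵀ∇²fZ can be checked directly (a sketch; ∇²f = diag(2, 2, −2) for this f):

```python
import numpy as np

H = np.diag([2.0, 2.0, -2.0])                   # Hessian of f
Z = np.array([[1.0, 0.0], [1.0, 2.0], [0.0, 1.0]])
A = np.array([[1.0, -1.0, 2.0]])                # constraint x1 - x2 + 2*x3 = 2
print(A @ Z)                                     # [[0, 0]]: columns of Z span null(A)
print(Z.T @ H @ Z)                               # [[4, 4], [4, 6]]
print(np.linalg.eigvalsh(Z.T @ H @ Z))           # both positive: q is convex
```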
Convex programming plays an important role in the field of optimization, because of properties like:
(a) If a local minimum exists, it is a global minimum (but may not be strict).
(b) For a strictly convex function, if the function has a minimum, then it is unique.
In contrast, the simplex method for linear programming has some well-known difficulties:
• The complexity in the worst case is exponential, making it difficult for large scale problems (with a lot of variables and constraints).
• The method can be degenerate and lead to cycling (but this can be fixed).
Figure 4: The comparison between simplex method (left) and interior point method (right).
The interior point method was proposed in the early 1980s to deal with these difficulties of the simplex method. Start from the primal problem in the standard form:
min cᵀx
subject to Ax = b, x ≥ 0.
The perturbed optimality conditions defining the central path can be written as
Aᵀλ + s = c, Ax = b, XSe = µe, x ≥ 0, s ≥ 0,
where
X = diag(x1, · · · , xn), S = diag(s1, · · · , sn), e = (1, · · · , 1)ᵀ.
Each iteration updates
(xᵏ⁺¹, λᵏ⁺¹, sᵏ⁺¹) = (xᵏ, λᵏ, sᵏ) + α(∆x, ∆λ, ∆s)
for some step length α > 0. The equations for (∆x, ∆λ, ∆s) can be derived from those for (xᵏ, λᵏ, sᵏ). From
Axᵏ = b, Axᵏ⁺¹ = b,
we have αA∆x = A(xᵏ⁺¹ − xᵏ) = 0. Since α ≠ 0, A∆x = 0. Similarly Aᵀ∆λ + ∆s = 0. Finally, from xᵢᵏsᵢᵏ = µₖ and xᵢᵏ⁺¹sᵢᵏ⁺¹ = µₖ₊₁, we get
sᵢᵏ∆xᵢ + xᵢᵏ∆sᵢ + α∆xᵢ∆sᵢ = (µₖ₊₁ − µₖ)/α.
Since both ∆xᵢ and ∆sᵢ are small, we can ignore the ∆xᵢ∆sᵢ term and get the last set of equations. Once (∆x, ∆λ, ∆s) is found, the step length α is chosen such that (xᵏ⁺¹, λᵏ⁺¹, sᵏ⁺¹) is still inside the feasible region.
Example 2.1. Consider the problem
min 2x1 + x2
subject to x1 + 2x2 = 4,
x1 ≥ 0, x2 ≥ 0.
Do one iteration of the interior point method starting from x⁰ = (2, 1)ᵀ.
The primal problem is already in the standard form with
A = (1 2), b = 4, c = (2, 1)ᵀ,
and the solution is given by ∆x = (−1.2, 0.6), ∆λ = −0.2, ∆s1 = 0.2, ∆s2 = 0.4. Since ∆x2 > 0, we only have to check x1⁰ + α∆x1 ≥ 0, i.e., α ≤ 5/3. For this simple case, we reach the global minimizer (0, 2)ᵀ with this α.
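The reported step satisfies the two linear relations derived above, A∆x = 0 and Aᵀ∆λ + ∆s = 0, which can be verified directly:

```python
import numpy as np

A = np.array([[1.0, 2.0]])
dx = np.array([-1.2, 0.6])
dlam = np.array([-0.2])
ds = np.array([0.2, 0.4])
print(A @ dx)             # [0.]: the step stays on Ax = b
print(A.T @ dlam + ds)    # [0., 0.]: dual feasibility is preserved
```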
For quadratic programming, we will focus on the projected gradient method and the related active-set method.
For linear equality constraints Ax = b, the search direction pᵏ at xᵏ, either the negative gradient (pᵏ = −∇f(xᵏ)) or the Newton direction pᵏ = −(∇²f(xᵏ))⁻¹∇f(xᵏ), may not lie in the null space of A (so that xᵏ + αpᵏ is not feasible for any α ≠ 0). We can project pᵏ onto the null space of A to get p̃ᵏ = pᵏ − Aᵀλ for some λ. This reduces to the least-squares problem
min_λ ‖pᵏ − Aᵀλ‖₂², i.e., AAᵀλ = Apᵏ.
If A has full row rank, then λ = (AAᵀ)⁻¹Apᵏ and p̃ᵏ = (I − Aᵀ(AAᵀ)⁻¹A)pᵏ. Here p̃ᵏ is called the projected or reduced gradient.
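A sketch of the reduced-gradient computation for a random full-row-rank A (the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))             # full row rank with probability 1
p = rng.standard_normal(5)
lam = np.linalg.solve(A @ A.T, A @ p)       # lambda = (A A^T)^{-1} A p
p_tilde = p - A.T @ lam                     # projected / reduced gradient
print(np.abs(A @ p_tilde).max())            # ~ 0: p_tilde lies in null(A)
```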
Example 2.2 (Active-set method). Solve the following problem using the active-set method:
min f(x) = (1/2)(x1 − 3)² + (x2 − 2)²,
subject to 2x1 − x2 ≥ 0, (c1)
−x1 − x2 ≥ −4, (c2)
x2 ≥ 0. (c3)
Solution: Start with c1 and c3 active, so that the feasible set of this subproblem is the single point x⁰ = (0, 0). The Lagrange multipliers of the subproblem min f(x) subject to c1 and c3 active are governed by
∇f(x⁰) = λ1∇c1(x⁰) + λ3∇c3(x⁰), i.e., (−3, −4)ᵀ = λ1(2, −1)ᵀ + λ3(0, 1)ᵀ,
or λ1 = −3/2, λ3 = −11/2. Since both Lagrange multipliers are negative, we can get rid of either one of them. For simplicity, we get rid of c1.
For the subproblem of min f(x) with only c3 active, we get the minimizer x¹ = (3, 0) and the Lagrange multiplier λ3 = −4 < 0. This implies that c3 should not be active either, and we have an unconstrained problem at x¹.
We can find the search direction using Newton's method:
p¹ = −(∇²f(x¹))⁻¹∇f(x¹) = (0, 2)ᵀ.
The step length α for x² = x¹ + αp¹ is determined from the feasibility condition for x². This gives α = 1/2, and c2 becomes active.
Finally, we solve the subproblem min f(x) with only c2 active and get x³ = (7/3, 5/3). The optimality condition indicates that this is a local (actually global) minimizer.
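The last step can be double-checked by eliminating the active constraint c2 (substituting x2 = 4 − x1) and minimizing the resulting one-dimensional function on a grid (a sketch):

```python
import numpy as np

f = lambda x1, x2: 0.5 * (x1 - 3.0) ** 2 + (x2 - 2.0) ** 2
x1 = np.linspace(0.0, 4.0, 400001)
vals = f(x1, 4.0 - x1)                     # c2 active: x1 + x2 = 4
i = np.argmin(vals)
print(x1[i], 4.0 - x1[i])                  # ~ (7/3, 5/3)
```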
Figure 5: The iterates x⁰, x¹, x², x³ of the active-set method in the feasible region.
3 Barrier and Penalty
For constrained problems, alternative ways of dealing with the constraints are to define the objective on the whole domain but prevent the solution from entering the infeasible region (barrier), or to put some penalty on the objective function for violating the constraints (penalty). We have to introduce a new parameter to control these constraints; in the appropriate limit of this parameter, we get the same minimizer as the original problem.
We have actually encountered a barrier in the duality theory, where we put an infinite value on the objective function if the variable is outside the feasible region and zero otherwise; this is called the indicator function. More precisely, for the minimization of f(x) over the feasible region Ω defined as
Ω = {x | ci(x) = 0, i ∈ E, ci(x) ≥ 0, i ∈ I},
the original constrained problem is equivalent to
min_x max_λ [f(x) − Σᵢ λᵢcᵢ(x)], with λᵢ ≥ 0 for i ∈ I.
However, we have to switch the order of min and max to get something useful.
Other barrier methods use logarithmic or inverse functions, replacing the constrained problem by the unconstrained minimization of a barrier function βµ(x). That is,
βµ(x) = f(x) − µ Σᵢ₌₁ᵐ ln cᵢ(x)
or
βµ(x) = f(x) + µ Σᵢ₌₁ᵐ 1/cᵢ(x).
Using the first equation, the second equation can be reduced to −2 + 2x2 − µ/x2 = 0. The roots (it is actually a quadratic equation in x2) are
x2 = (1 ± √(1 + 2µ))/2.
We have to choose the positive root x2 = (1 + √(1 + 2µ))/2, and correspondingly x1 = 3µ/2 + (√(1 + 2µ) − 1)/2. To find the minimizer, we have to take the limit µ → 0⁺, that is,
x* = lim_{µ→0⁺} (3µ/2 + (√(1 + 2µ) − 1)/2, (1 + √(1 + 2µ))/2)ᵀ = (0, 1)ᵀ.
This implies that µ/cᵢ(x) is an approximation of the Lagrange multiplier λᵢ. From the calculation above,
λ1(µ) = µ/c1(x) = 1, λ2(µ) = µ/x2 = 2µ/(√(1 + 2µ) + 1) = √(1 + 2µ) − 1.
Taking the limit µ → 0⁺, we get λ1* = 1 and λ2* = 0.
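The convergence of both the barrier path x(µ) and the multiplier estimate λ2(µ) can be seen numerically (a direct evaluation of the formulas above):

```python
import numpy as np

for mu in [1.0, 1e-1, 1e-2, 1e-3]:
    r = np.sqrt(1.0 + 2.0 * mu)
    x1 = 1.5 * mu + (r - 1.0) / 2.0
    x2 = (1.0 + r) / 2.0
    lam2 = r - 1.0
    print(f"mu={mu:g}: x=({x1:.4f}, {x2:.4f}), lambda2={lam2:.4f}")
# x(mu) -> (0, 1) and lambda2(mu) -> 0 as mu -> 0+
```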
For the barrier method, the minimizer cannot be outside the feasible region, while for the penalty method the minimizer can be infeasible, but with a penalty.
The most popular penalty method for an equality constrained problem like
min f(x) subject to cᵢ(x) = 0, i ∈ E,
is the quadratic penalty
Q(x; µ) = f(x) + (µ/2) Σᵢ∈E cᵢ(x)²,
and we are interested in the limit when µ goes to infinity (such that x(µ) satisfies cᵢ(x(µ)) → 0).
Example 3.3. Consider the problem
With inequality constraints included, the quadratic penalty becomes
Q(x; µ) := f(x) + (µ/2) Σᵢ∈E cᵢ(x)² + (µ/2) Σᵢ∈I ([cᵢ(x)]⁻)².
To avoid the ill-conditioning of the Hessian matrix as µ grows, nonsmooth (exact) penalty functions can be introduced instead, but then the minimizer is more difficult to find, because of the non-differentiability of the penalty function.
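A sketch of the quadratic penalty on a small made-up equality-constrained problem (not one of the examples in these notes): min x1 + x2 subject to x1² + x2² − 2 = 0, whose solution is (−1, −1). Warm-starting each solve at the previous x(µ) keeps the iterates in the right basin.

```python
import numpy as np
from scipy.optimize import minimize

c = lambda x: x[0] ** 2 + x[1] ** 2 - 2.0             # equality constraint
x0 = np.array([-0.5, -0.5])
for mu in [1.0, 10.0, 100.0, 1000.0]:
    Q = lambda x, mu=mu: x[0] + x[1] + 0.5 * mu * c(x) ** 2
    x0 = minimize(Q, x0, method="BFGS").x             # warm start at previous x(mu)
    print(mu, x0, c(x0))                              # x(mu) -> (-1, -1), c -> 0
```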
or x* = (1/m) Σᵢ₌₁ᵐ xᵢ, the mean or the center of mass of these points.
Example 4.2 (Minimal distance to a subspace (least squares or projection)). Find the minimal distance from a point x0 to the subspace spanned by the vectors v1, v2, · · · , vm.
Any point of this subspace is a linear combination µ1v1 + µ2v2 + · · · + µmvm of the vectors, for constants µ1, · · · , µm. Therefore, this problem can be formulated as
min_µ f(µ) = ‖x0 − Vµ‖₂²,
where
V = (v1 v2 · · · vm), µ = (µ1, µ2, · · · , µm)ᵀ.
It is still an unconstrained problem, and the minimizer is given by
0 = ∇f(µ*) = −2Vᵀ(x0 − Vµ*), i.e., VᵀVµ* = Vᵀx0.
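A least-squares sketch of this example with random data (V and x0 are arbitrary; it assumes V has full column rank so that VᵀV is invertible):

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.standard_normal((6, 3))                 # columns v1, v2, v3
x0 = rng.standard_normal(6)
mu = np.linalg.solve(V.T @ V, V.T @ x0)         # normal equations V^T V mu = V^T x0
print(np.linalg.norm(x0 - V @ mu))              # the minimal distance
print(np.abs(V.T @ (x0 - V @ mu)).max())        # ~ 0: residual orthogonal to span
```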
Example 4.3 (Law of reflection). Find C on the line ℓ such that |AC| + |BC| is minimal. The minimizer C gives the actual path of the light traveling from A to B (the actual statement is minimal traveling time, but since the speed of light is constant in this case, it is equivalent to minimal distance).