Review 3

The document reviews concepts for the final exam, including: - Convex optimization problems minimize a convex objective function over a convex constraint set. - Examples of convex problems include projection onto convex sets, minimizing distances to points on spheres or positive quadrants, and problems with convex objectives and constraints. - Non-convex problems can arise when the constraint set is non-convex, such as defining a circle.

Uploaded by

Bharath S

Review for the Final (last part)

October 30, 2015

Contents

1 Convex optimization
  1.1 Convex functions and convex sets
  1.2 Convex optimization problems
  1.3 Examples of convex optimizations

2 Nonlinear (convex) programming
  2.1 Interior point method for linear programming

3 Barrier and Penalty

4 Examples and Applications

1 Convex optimization
1.1 Convex functions and convex sets
For a smooth function f, we have the following equivalent characterizations of convexity:
(a) f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all x, y and λ ∈ [0, 1]
(b) ∇f is monotone: (∇f(x) − ∇f(y), x − y) ≥ 0
(c) ∇²f(x) is nonnegative (positive semi-) definite
For smooth functions, we usually use (c) to check whether a function is convex or not. For convexity of general functions (or operations on convex functions), we use (a).
Similarly, we have the following conditions for strict convexity:
(a) f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) whenever x ≠ y and λ ∈ (0, 1).
(b) ∇f is strictly monotone: (∇f(x) − ∇f(y), x − y) > 0 if x ≠ y.
(c) ∇²f(x) is positive definite. (This one is sufficient but not necessary: x⁴ is strictly convex although its second derivative vanishes at 0.)

Examples of convex functions:

• Powers xᵖ on (0, ∞) for p > 1 or p < 0
• Absolute value |x| and norms ‖x‖ₚ
• Exponential eˣ, negative of logarithm −ln x on (0, ∞)

Operations on functions preserving convexity

(i) If f and g are convex, then so are m(x) = max(f(x), g(x)) and h(x) = f(x) + g(x)

(ii) If f and g are convex and g is non-decreasing, then h(x) = g(f(x)) is convex

(iii) If f(x, y) is convex in x for each y ∈ C, then g(x) = sup_{y∈C} f(x, y) is convex
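These closure rules can be spot-checked numerically by sampling the chord inequality (a); a small Python sketch, where the sample functions x² and |x| (both in the list above) and the tolerance are illustrative choices:

```python
import random

# Spot-check f(lx+(1-l)y) <= l*f(x)+(1-l)*f(y) for the max and the sum
# of two convex functions, f(x) = x^2 and g(x) = |x|.
f = lambda x: x * x
g = lambda x: abs(x)
h = lambda x: max(f(x), g(x))   # rule (i): max of convex is convex
s = lambda x: f(x) + g(x)       # rule (i): sum of convex is convex

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lam = random.random()
    z = lam * x + (1 - lam) * y
    assert h(z) <= lam * h(x) + (1 - lam) * h(y) + 1e-12
    assert s(z) <= lam * s(x) + (1 - lam) * s(y) + 1e-12
print("chord inequality holds at all sampled points")
```

Such sampling can of course only refute convexity, never prove it; it is a quick sanity check.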

Theorem 1.1 (Relation between convex function and convex sets). (a) If f is convex, then the set
{x | f (x) ≤ c} (could be empty) for any constant c is convex. (b) If f is convex, then the set (called
the epigraph of f ) {(x, t) | f (x) ≤ t} (could be empty) is convex.

You can tell directly that a large collection of sets are convex, like the balls

{x | ‖x‖₁ ≤ 1}, {x | ‖x‖₂ ≤ 1}, {x | ‖x‖∞ ≤ 1}.

The relation between sets and functions helps us quickly determine whether an optimization problem below is convex or not.

1.2 Convex optimization problems


A convex optimization problem (or convex programming) is the minimization of a convex function over a convex subset, i.e.,
min f(x)
subject to x ∈ Ω.
Here f is a convex function and Ω is a convex set. If the objective is to maximize g, then we require g to be concave (i.e., −g is convex).

Example 1.1 (Projection on convex sets). Let Ω be a convex set; then the projection of x on the convex set Ω,

min_{y∈Ω} ‖x − y‖₂,

is a convex problem. In practical calculations, we often minimize ‖x − y‖₂² instead of ‖x − y‖₂ because of the simplicity of the gradient. This projection exists if Ω is closed, and is denoted PΩ(x). It is easy to see that PΩ(x) = x for any x ∈ Ω.

Theorem 1.2 (Characterization of the projection on a convex set). Let Ω be a closed convex set. Then PΩ(x) is the projection of x on Ω if and only if for any y ∈ Ω,

(y − PΩ(x), x − PΩ(x)) ≤ 0.

Proof. If PΩ(x) is the projection of x on Ω, then for y ∈ Ω, the point (1 − λ)PΩ(x) + λy ∈ Ω for λ ∈ (0, 1) (because Ω is convex). From the definition, PΩ(x) has the smallest distance to x, i.e.,

‖PΩ(x) − x‖₂² ≤ ‖(1 − λ)PΩ(x) + λy − x‖₂²
            = ‖PΩ(x) − x + λ(y − PΩ(x))‖₂²
            = ‖PΩ(x) − x‖₂² + 2λ(PΩ(x) − x, y − PΩ(x)) + λ²‖y − PΩ(x)‖₂².  (1)

Therefore, 2λ(PΩ(x) − x, y − PΩ(x)) + λ²‖y − PΩ(x)‖₂² ≥ 0 for any λ ∈ (0, 1), or

2(PΩ(x) − x, y − PΩ(x)) + λ‖y − PΩ(x)‖₂² ≥ 0.

Figure 1: The projection of x on the convex set Ω.

Letting λ → 0⁺, we have (PΩ(x) − x, y − PΩ(x)) ≥ 0, i.e., (y − PΩ(x), x − PΩ(x)) ≤ 0.


On the other hand, if (y − PΩ(x), x − PΩ(x)) ≤ 0, then

‖y − x‖₂² = ‖y − PΩ(x) + PΩ(x) − x‖₂²
         = ‖y − PΩ(x)‖₂² + 2(y − PΩ(x), PΩ(x) − x) + ‖PΩ(x) − x‖₂²
         ≥ ‖y − PΩ(x)‖₂² + ‖PΩ(x) − x‖₂².  (2)

If y ≠ PΩ(x), then ‖y − PΩ(x)‖₂² > 0 and therefore ‖y − x‖₂² > ‖PΩ(x) − x‖₂². This also shows the uniqueness of the projection PΩ(x).

Theorem 1.3 (Nonexpansion of the projection).

‖PΩ(x) − PΩ(y)‖ ≤ ‖x − y‖.

Example 1.2 (Projection on the unit ball S = {y | ‖y‖₂ ≤ 1}). If x ∈ S, i.e., ‖x‖₂ ≤ 1, then PΩ(x) = x. Otherwise, PΩ(x) is in the same direction as x, and is located on the boundary of S, which gives PΩ(x) = x/‖x‖₂. Therefore

PΩ(x) = { x,        if ‖x‖₂ ≤ 1,
        { x/‖x‖₂,   otherwise.

Example 1.3 (Projection on the positive quadrant Rⁿ₊ = {y = (y1, · · · , yn) | y1 ≥ 0, · · · , yn ≥ 0}). We can find it using the definition.

‖x − y‖₂² = (x1 − y1)² + · · · + (xn − yn)²

Since yᵢ ≥ 0, if xᵢ ≥ 0 we can choose yᵢ = xᵢ; otherwise yᵢ = 0. Therefore

PΩ(x) = (max(x1, 0), · · · , max(xn, 0)).
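The two projection formulas above (Examples 1.2 and 1.3), together with the nonexpansion property of Theorem 1.3, can be sketched in plain Python; the test vectors are made-up data:

```python
import math

def proj_ball(x):
    # Projection onto the unit ball {y : ||y||_2 <= 1} (Example 1.2)
    n = math.sqrt(sum(v * v for v in x))
    return list(x) if n <= 1 else [v / n for v in x]

def proj_orthant(x):
    # Projection onto the positive quadrant/orthant (Example 1.3)
    return [max(v, 0.0) for v in x]

assert proj_ball([3.0, 4.0]) == [0.6, 0.8]        # ||(3,4)||_2 = 5
assert proj_ball([0.3, 0.4]) == [0.3, 0.4]        # already inside
assert proj_orthant([2.0, -1.5]) == [2.0, 0.0]

# Nonexpansion (Theorem 1.3): ||P(x) - P(y)|| <= ||x - y||
def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

x, y = [3.0, 4.0], [-2.0, 0.5]
assert dist(proj_ball(x), proj_ball(y)) <= dist(x, y)
```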

Figure 2: The projection of x on the sphere and on the positive quadrant.

Example 1.4 (Projection on subspaces). If Ω is a subspace (like a hyperplane, not necessarily passing through the origin), then first we have

(y − PΩ(x), x − PΩ(x)) ≤ 0.

On the other hand, 2PΩ(x) − y ∈ Ω (a special property of subspaces). Replacing y by 2PΩ(x) − y ∈ Ω in the previous inequality,

0 ≥ ((2PΩ(x) − y) − PΩ(x), x − PΩ(x)) = −(y − PΩ(x), x − PΩ(x)).

Therefore (y − PΩ(x), x − PΩ(x)) = 0; that is, y − PΩ(x) and x − PΩ(x) are perpendicular.

Figure 3: The projection of x on a subspace Ω, which has the special property that if y, PΩ(x) ∈ Ω, then so is 2PΩ(x) − y.
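The perpendicularity property of Example 1.4 can be checked numerically for a concrete (affine) subspace; a NumPy sketch, assuming NumPy is available, where the plane and points are made-up data:

```python
import numpy as np

# Projection of x onto the affine subspace {p + V @ mu}: here the
# plane z = 1 through p, spanned by the columns of V.
p = np.array([0.0, 0.0, 1.0])
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([2.0, -3.0, 5.0])

# Least squares gives the coefficients of the projection
mu, *_ = np.linalg.lstsq(V, x - p, rcond=None)
Px = p + V @ mu

# x - Px is perpendicular to y - Px for every point y of the subspace
y = p + V @ np.array([7.0, -4.0])   # an arbitrary point of the plane
assert abs(np.dot(y - Px, x - Px)) < 1e-12
```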

1.3 Examples of convex optimizations


Since the feasible region Ω is usually given by a set of constraints ci(x) ≤ 0, one sufficient condition for Ω to be convex is that each ci is convex.

Example 1.5.
min f (x) = |x − x1 | + · · · + |x − xm |, x ∈ R,
where x1 < x2 < · · · < xm are m constants.
Here |x − xi | is convex, so is their sum. Therefore, this is a convex optimization problem.
Example 1.6.
min f(x) = √((x1 − 2)² + (x2 − 2)²)
subject to x1 + x2 = 2.
The objective function can be written as f(x) = ‖x − x0‖₂ where x0 = (2, 2), and is convex.
The constraint is a linear equality, which is also convex. This is a convex optimization problem.
Example 1.7.
min f(x) = √((x1 − 2)² + (x2 − 2)²)
subject to x1 ≤ 1,
x2 ≤ x1.
The objective is the same as before and is convex. The constraints are two linear inequalities, and are convex. This is a convex optimization problem.
Example 1.8.
min f(x) = x1 + x2
subject to 2 − x1² − x2² ≥ 0.
The objective function f(x) = x1 + x2 is linear, and hence convex. We have to write the constraint
as c1(x) = x1² + x2² − 2 ≤ 0. Since c1 is convex, this is a convex optimization problem.
Example 1.9.
min f(x) = x1 + x2
subject to 2 − x1² − x2² ≥ 0, x2 ≥ 0.
Compared to the previous problem, we have the additional constraint x2 ≥ 0. Since it is convex,
this new optimization problem is also convex.
Example 1.10.
min f(x) = x1 + x2
subject to x1² + x2² = 1.
The constraint (a circle) is NOT convex; therefore this is not a convex optimization problem. In
general, a convex optimization problem can only have linear equality constraints, not nonlinear
equality constraints.
Example 1.11. The minimization problem

min f(x)

where f(x) = max(x, x²) is convex, because both x and x² are convex, and so is their maximum f.
We can write it in the following equivalent form:

min t
subject to x ≤ t,
x² ≤ t.

This problem has the objective function f(x, t) = t, which is convex. The two constraints are convex
too, so this alternative form is also a convex optimization problem. The reason we prefer this form
is that everything is differentiable, whereas f has kinks (and is thus non-differentiable) at x = 0
and x = 1.
Example 1.12.
min (x1 − 1)² + (x2 − 1)²
subject to ‖x‖₁ = |x1| + |x2| ≤ 1.
The objective function is convex, because the Hessian matrix is positive definite. The constraint is
convex because ‖x‖₁ is convex. Therefore this is a convex optimization problem.
Example 1.13.
min x1
subject to |x1 − 1| + x2 ≤ 4,
x1 − |x2 − 1| ≥ 0.
The objective function is convex. The first constraint c1 (x) = |x1 − 1| + x2 − 4 ≤ 0 is convex,
because c1 is the sum of two convex functions |x1 − 1| and x2 − 4. The second constraint can be
written as c2 (x) = |x2 − 1| − x1 ≤ 0, and c2 is also the sum of two convex functions |x2 − 1| and
−x1 . Therefore, this is a convex optimization problem.
If the objective function f is convex only on part of the domain, we have to check whether
f is convex on the feasible region (even though it is not convex on the whole domain).
Example 1.14.
min f(x) = x1³ + x2²
subject to −1 ≤ x1 ≤ 0.
The Hessian matrix of the objective function is

∇²f(x) = ( 6x1  0
           0    2 ),

which has a negative eigenvalue 6x1 for −1 ≤ x1 < 0. Therefore f is not convex on the feasible
region, and this is NOT a convex optimization problem.
Example 1.15.
min f(x) = x1³ + x2²
subject to 0 ≤ x1 ≤ 0.
The Hessian matrix of the objective function is

∇²f(x) = ( 6x1  0
           0    2 ),

which has non-negative eigenvalues on the whole feasible region. This is a convex optimization problem.
Example 1.16.
min f(x) = x1
subject to (x1 − 1)² + x2² = 1,
(x1 + 1)² + x2² = 1.
In general, problems subject to nonlinear equality constraints cannot be convex, but this one
is special in the sense that the feasible region contains only one point, the origin. Therefore this is a
convex optimization problem (though trivial).

Example 1.17.
min f(x) = x1² − 2x1 + x2² − x3² + 4x3
subject to x1 − x2 + 2x3 = 2.
The objective function f(x) is not convex (because of the −x3² term), but on the (convex) feasible
region Ω it could be convex. One way to show this is to substitute x2 = x1 + 2x3 − 2 (this is the
choice with the smallest amount of calculation). Equivalently, we can show the convexity of the
function

q(x1, x3) = f(x1, x1 + 2x3 − 2, x3) = 2x1² + 3x3² + 4x1x3 − 6x1 − 4x3 + 4.

Since the Hessian matrix of q,

∇²q(x) = ( 4  4
           4  6 ),

is positive definite, q (and thus f restricted to Ω) is convex. Notice that we have the relation
∇²q(x) = Zᵗ∇²f(x)Z, where

Z = ( 1  0
      1  2
      0  1 )

whose columns are in the null space of the constraint, i.e., satisfy x1 − x2 + 2x3 = 0.
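The reduced-Hessian identity ∇²q = Zᵗ∇²f Z of Example 1.17 can be verified numerically; a small NumPy sketch, assuming NumPy is available:

```python
import numpy as np

# f(x) = x1^2 - 2x1 + x2^2 - x3^2 + 4x3 has the constant Hessian below;
# Z spans the null space of the constraint gradient A = (1, -1, 2).
H = np.diag([2.0, 2.0, -2.0])
Z = np.array([[1.0, 0.0],
              [1.0, 2.0],
              [0.0, 1.0]])
A = np.array([1.0, -1.0, 2.0])

assert np.allclose(A @ Z, 0)                 # Z really lies in the null space
Hq = Z.T @ H @ Z                             # reduced Hessian of q
assert np.allclose(Hq, [[4.0, 4.0], [4.0, 6.0]])
assert np.all(np.linalg.eigvalsh(Hq) > 0)    # positive definite, so q is convex
```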

Convex programming plays an important role in the field of optimization, because of properties
like:

(a) If a local minimum exists, it is a global minimum (but may not be strict)

(b) The set of all (global) minima is convex

(c) If a strictly convex function has a minimum, then it is unique.

2 Nonlinear (convex) programming


2.1 Interior point method for linear programming
The difference between the simplex method and the interior point method is illustrated in Figure 4 for the
problem
max x1 + x2
subject to x1 ≤ 3, x2 ≤ 3, x1 + x2 ≥ 1,
x1 − x2 ≤ 1, x2 − x1 ≤ 1.
Using the fact that the optimizer is always on the boundary, the simplex methods (there are
various variants) search for the optimizer from vertex to vertex, keeping the objective function
nondecreasing. However, the simplex methods suffer from a few problems:

• The complexity in the worst case is exponential, making them difficult for large-scale problems
(with many variables and constraints)

• The methods can be degenerate and lead to cycling (but this can be fixed).

Figure 4: The comparison between the simplex method (left) and the interior point method (right).

The interior point method was proposed in the early 1980s to deal with these difficulties of the simplex
method. Start from the primal problem in the standard form:

min cᵗx
subject to Ax = b, x ≥ 0.

The dual problem is

max bᵗλ
subject to Aᵗλ + s = c, s ≥ 0.
The interior point method solves for (x, λ, s) in the optimality (KKT) conditions iteratively. More
precisely, this method solves the system

           ( Aᵗλ + s − c )   ( 0 )
F(x, λ, s) = ( Ax − b    ) = ( 0 ),   x, s ≥ 0,
           ( XSe − µe    )   ( 0 )

where

X = diag(x1, · · · , xn),  S = diag(s1, · · · , sn),  e = (1, · · · , 1)ᵗ,

and then lets µ → 0.


In practice, we have to find the minimizer iteratively. At iteration k with (x^k, λ^k, s^k), we have
to find (∆x, ∆λ, ∆s) such that

(x^(k+1), λ^(k+1), s^(k+1)) = (x^k, λ^k, s^k) + α(∆x, ∆λ, ∆s)

for some step length α > 0. The equations for (∆x, ∆λ, ∆s) can be derived from those for
(x^k, λ^k, s^k). From

Ax^k = b,  Ax^(k+1) = b

we have αA∆x = A(x^(k+1) − x^k) = 0. Since α ≠ 0, A∆x = 0. Similarly Aᵗ∆λ + ∆s = 0. Finally,
from x_i^k s_i^k = µ^k and x_i^(k+1) s_i^(k+1) = µ^(k+1), we get

x_i^k ∆s_i + s_i^k ∆x_i + α∆s_i ∆x_i = (µ^(k+1) − µ^k)/α.

Since both ∆x_i and ∆s_i are small, we can ignore ∆s_i ∆x_i and get the last set of equations. Once
(∆x, ∆λ, ∆s) is found, the step length α is chosen such that (x^(k+1), λ^(k+1), s^(k+1)) is still inside the
feasible region.

Example 2.1. Consider the linear programming problem

min 2x1 + x2
subject to x1 + 2x2 = 4,
x1 ≥ 0, x2 ≥ 0.

Do one iteration of the interior point method starting from x0 = (2, 1)ᵗ.
The primal problem is already in the standard form with

A = (1 2),  b = 4,  c = (2, 1)ᵗ.

The dual problem is

max bᵗλ
subject to Aᵗλ + s = c, s ≥ 0.

We choose s = (2, 1)ᵗ and λ = 0. For fixed µ0 = 2, the equations for (∆x, ∆λ, ∆s),

( 0   Aᵗ  I  ) ( ∆x )   ( 0 )
( A   0   0  ) ( ∆λ ) = ( 0 ),
( S0  0   X0 ) ( ∆s )   ( µ0 e − X0 S0 e )

read explicitly

( 0 0 1 1 0 ) ( ∆x1 )   (  0 )
( 0 0 2 0 1 ) ( ∆x2 )   (  0 )
( 1 2 0 0 0 ) ( ∆λ  ) = (  0 ),
( 2 0 0 2 0 ) ( ∆s1 )   ( −2 )
( 0 1 0 0 1 ) ( ∆s2 )   (  1 )

and the solution is given by ∆x = (−1.2, 0.6), ∆λ = −0.2, ∆s1 = 0.2, ∆s2 = 0.4. Since α > 0 and
only ∆x1 is negative, we only have to check x01 + α∆x1 ≥ 0, i.e., α ≤ 5/3. For this simple case, we
reach the global minimizer with this α.
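The 5×5 Newton system of Example 2.1 can be solved directly; a NumPy sketch, assuming NumPy is available:

```python
import numpy as np

# One interior-point Newton step for Example 2.1:
# x0 = (2,1), lambda = 0, s = (2,1), mu0 = 2.
A = np.array([[1.0, 2.0]])
x = np.array([2.0, 1.0])
s = np.array([2.0, 1.0])
mu = 2.0

K = np.zeros((5, 5))
K[0:2, 2:3] = A.T                 # A^T dlam + ds = 0
K[0:2, 3:5] = np.eye(2)
K[2, 0:2] = A[0]                  # A dx = 0
K[3:5, 0:2] = np.diag(s)          # S dx + X ds = mu*e - X*S*e
K[3:5, 3:5] = np.diag(x)
rhs = np.concatenate([np.zeros(3), mu - x * s])

d = np.linalg.solve(K, rhs)
dx, dlam, ds = d[0:2], d[2], d[3:5]
assert np.allclose(dx, [-1.2, 0.6])
assert np.isclose(dlam, -0.2) and np.allclose(ds, [0.2, 0.4])
```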

For quadratic programming, we will focus on the projected gradient method and the related
active-set method.
For linear equality constraints Ax = b, the search direction p^k at x^k, either the negative gradient
p^k = −∇f(x^k) or the Newton direction p^k = −(∇²f(x^k))⁻¹∇f(x^k), may not lie in the null space
of A (i.e., x^k + αp^k may not be feasible for any α ≠ 0). We can project p^k onto the null space of A to get
p̃^k = p^k − Aᵗλ for some λ. This reduces to another problem:

min_λ ‖p^k − Aᵗλ‖₂.

If A has full row rank, then λ = (AAᵗ)⁻¹Ap^k and p̃^k = (I − Aᵗ(AAᵗ)⁻¹A)p^k. Here p̃^k is called the
projected or reduced gradient.
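A NumPy sketch of the reduced gradient (the constraint matrix and the direction p are made-up data):

```python
import numpy as np

# Reduced gradient for linear equality constraints with full row rank:
# p_tilde = (I - A^T (A A^T)^{-1} A) p lies in the null space of A.
A = np.array([[1.0, -1.0, 2.0]])
p = np.array([3.0, 1.0, -2.0])     # an arbitrary search direction

lam = np.linalg.solve(A @ A.T, A @ p)
p_tilde = p - A.T @ lam

assert np.allclose(A @ p_tilde, 0)             # feasible direction: A p_tilde = 0
# p_tilde is the closest point to p in the null space (residual orthogonal)
assert np.isclose(np.dot(p - p_tilde, p_tilde), 0)
```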

Example 2.2 (Active-set method). Solve the following problem using the active-set method:

min f(x) = (1/2)(x1 − 3)² + (x2 − 2)²,
subject to 2x1 − x2 ≥ 0,  (c1)
−x1 − x2 ≥ −4,  (c2)
x2 ≥ 0.  (c3)

Solution: Start with c1 and c3 active, so that the feasible region is the single point x0 = (0, 0).
The Lagrange multipliers of the subproblem min f(x) with c1 and c3 active are governed by

0 = ∇f(x0) − λ1∇c1(x0) − λ3∇c3(x0),

or λ1 = −3/2, λ3 = −11/2. Since both Lagrange multipliers are negative, we can drop either
constraint. For simplicity, we drop c1.
For the subproblem min f(x) with only c3 active, we get the minimizer x1 = (3, 0) and the
Lagrange multiplier λ3 = −4 < 0. This implies that c3 should not be active, and we have an unconstrained
problem at x1.
We can find the search direction using Newton's method:

p1 = −(∇²f(x1))⁻¹∇f(x1) = (0, 2)ᵗ.

The step length α for x2 = x1 + αp1 is determined from the feasibility condition for x2. This gives
α = 1/2, and c2 becomes active.
Finally, we solve the subproblem min f(x) with only c2 active and get x3 = (7/3, 5/3). The
optimality condition indicates that this is a local (actually global) minimizer.
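The final subproblem (only c2 active) can be cross-checked by solving its KKT system; a NumPy sketch, assuming NumPy is available and using the fact that f(x) = ½(x − xc)ᵗH(x − xc) with H = diag(1, 2) and unconstrained minimizer xc = (3, 2):

```python
import numpy as np

# Equality-constrained QP: minimize 0.5*(x-xc)^T H (x-xc) subject to a^T x = 4,
# which is Example 2.2 restricted to the active constraint c2: x1 + x2 = 4.
H = np.diag([1.0, 2.0])
a = np.array([1.0, 1.0])          # gradient of the active constraint
xc = np.array([3.0, 2.0])

# KKT system: [H -a; a^T 0] [x; lam] = [H xc; 4]
K = np.block([[H, -a.reshape(2, 1)],
              [a.reshape(1, 2), np.zeros((1, 1))]])
rhs = np.concatenate([H @ xc, [4.0]])
sol = np.linalg.solve(K, rhs)

assert np.allclose(sol[:2], [7/3, 5/3])   # the minimizer x3 of the example
```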

Figure 5: The active-set method.

3 Barrier and Penalty
For constrained problems, alternative ways of dealing with the constraints are to define the objective
on the whole domain but prevent the solution from entering the infeasible region (barrier), or to put
a penalty on the objective function. We have to introduce new parameters to control these constraints;
in the appropriate limit of these parameters, we get the same minimizer as the original problem.
We have actually encountered a barrier in the duality theory, where we put an infinite value on the
objective function if the variable is outside the feasible region and zero otherwise. This is called
the indicator function. More precisely, consider the minimization of f(x) over the feasible region Ω
defined as
Ω = {x | ci(x) = 0, i ∈ E, ci(x) ≥ 0, i ∈ I}.
The original constrained problem is equivalent to

min f (x) + χΩ (x),

where χΩ (x), called the indicator function, is


χΩ(x) = { 0,  if x ∈ Ω,
        { ∞,  otherwise.

We used the following equivalent fact in the duality theory:

χΩ(x) = max_{λi ≥ 0 (i ∈ I), λi free (i ∈ E)}  ( − Σ_{i ∈ I ∪ E} λi ci(x) ).

However, we have to switch the order of min and max to get something useful.
Other barrier methods use logarithmic or inverse functions for the problem

min f(x) subject to gi(x) ≥ 0, i = 1, 2, · · · , m.

That is,

βµ(x) = f(x) − µ Σ_{i=1}^{m} ln gi(x)

or

βµ(x) = f(x) + µ Σ_{i=1}^{m} 1/gi(x).

Example 3.1. Solve the following problem using the logarithmic barrier:

min f(x) = x1 − 2x2,
subject to 1 + x1 − x2² ≥ 0,  (c1)
x2 ≥ 0.  (c2)

The barrier function is

βµ(x) = x1 − 2x2 − µ log(1 + x1 − x2²) − µ log x2.

The minimizer is given by

0 = ∇βµ(x) = ( 1 − µ/(1 + x1 − x2²) ,
               −2 + 2µx2/(1 + x1 − x2²) − µ/x2 ).

Using the first equation, the second equation reduces to −2 + 2x2 − µ/x2 = 0. Its roots (it
is actually the quadratic equation 2x2² − 2x2 − µ = 0) are

x2 = (1 ± √(1 + 2µ)) / 2.

We have to choose the positive root x2 = (1 + √(1 + 2µ))/2, and correspondingly x1 = 3µ/2 +
(√(1 + 2µ) − 1)/2. To find the minimizer, we take the limit µ → 0⁺, that is,

x* = lim_{µ→0⁺} ( 3µ/2 + (√(1 + 2µ) − 1)/2 , (1 + √(1 + 2µ))/2 ) = (0, 1).

We can get a lot of information from this approach. Since

∇βµ(x) = ∇f(x) − (µ/c1(x)) ∇c1(x) − (µ/c2(x)) ∇c2(x),

the quantity µ/ci(x) is an approximation of the Lagrange multiplier. From the calculation
above,

λ1(µ) = µ/c1(x) = 1,   λ2(µ) = µ/c2(x) = µ/x2 = 2µ/(√(1 + 2µ) + 1) = √(1 + 2µ) − 1.

Taking the limit µ → 0⁺, we get λ1* = 1 and λ2* = 0.
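The central path of Example 3.1 can be followed in closed form and checked numerically; a small Python sketch:

```python
import math

# Closed-form path point x(mu) for Example 3.1.
def x_of_mu(mu):
    r = math.sqrt(1 + 2 * mu)
    x2 = (1 + r) / 2
    x1 = 1.5 * mu + (r - 1) / 2
    return x1, x2

for mu in [1.0, 0.1, 1e-4]:
    x1, x2 = x_of_mu(mu)
    # gradient of the barrier function vanishes on the path
    c1 = 1 + x1 - x2 * x2
    g1 = 1 - mu / c1
    g2 = -2 + 2 * mu * x2 / c1 - mu / x2
    assert abs(g1) < 1e-8 and abs(g2) < 1e-8

# the path converges to the constrained minimizer x* = (0, 1)
x1, x2 = x_of_mu(1e-10)
assert abs(x1) < 1e-5 and abs(x2 - 1) < 1e-5
```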

Example 3.2. Solve the following problem using the logarithmic barrier:

min f(x) = x1² + x2²,
subject to x1 − 1 ≥ 0,  (c1)
x2 + 1 ≥ 0.  (c2)

The logarithmic barrier function is

βµ(x) = x1² + x2² − µ log(x1 − 1) − µ log(x2 + 1).

The minimizer satisfies

0 = ∇βµ(x) = ( 2x1 − µ/(x1 − 1) ,
               2x2 − µ/(x2 + 1) ),

or

x1 = (1 + √(1 + 2µ))/2,   x2 = (−1 + √(1 + 2µ))/2.
The Lagrange multipliers can be estimated as

λ1(µ) = µ/(x1 − 1) = √(1 + 2µ) + 1,   λ2(µ) = µ/(x2 + 1) = √(1 + 2µ) − 1.

It is easy to check that x(µ) → x* = (1, 0) and λ(µ) → λ* = (2, 0).
But in general, the Hessian matrix of a barrier function is ill-conditioned. For this one we have

∇²βµ(x) = ( 2 + µ/(x1 − 1)²   0                 )   ( 2 + 4/µ   0 )
          ( 0                 2 + µ/(x2 + 1)²   ) ≈ ( 0         2 ).

Therefore the condition number is approximately (2 + 4/µ)/2 = O(µ⁻¹), and the numerical scheme
may become unstable.
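The growth of the condition number can be observed numerically along the path x(µ) of Example 3.2; a short Python sketch:

```python
import math

# Diagonal barrier Hessian of Example 3.2 evaluated at the path point x(mu):
# the large eigenvalue behaves like 2 + 4/mu, the other stays near 2.
for mu in [1e-2, 1e-4, 1e-6]:
    r = math.sqrt(1 + 2 * mu)
    x1 = (1 + r) / 2
    x2 = (-1 + r) / 2
    h1 = 2 + mu / (x1 - 1) ** 2
    h2 = 2 + mu / (x2 + 1) ** 2
    cond = h1 / h2
    assert cond > 1 / mu        # condition number blows up like O(1/mu)
```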

For the barrier method, the minimizer cannot be outside the feasible region, while for the
penalty method the minimizer can be infeasible, but with a penalty.
The most popular penalty method for equality constrained problems like

min_x f(x) subject to ci(x) = 0, i ∈ E,

is the quadratic penalty

Q(x; µ) = f(x) + (µ/2) Σ_{i∈E} ci²(x),

and we are interested in the limit as µ goes to infinity (so that x(µ) satisfies ci(x(µ)) → 0).
Example 3.3. Consider the problem

min f(x) = −x1x2,
subject to g(x) = x1 + 2x2 − 4 = 0,

using the quadratic penalty.
The problem with the quadratic penalty function is

Q(x; µ) = −x1x2 + (µ/2)(x1 + 2x2 − 4)²,

and the minimizer satisfies

−x2 + µ(x1 + 2x2 − 4) = 0,   −x1 + 2µ(x1 + 2x2 − 4) = 0.

For µ > 1/4, this yields the solution

x1(µ) = 8µ/(4µ − 1),   x2(µ) = 4µ/(4µ − 1).

The limit as µ goes to infinity gives x(µ) → x* = (2, 1). We can also estimate the Lagrange
multiplier as

λ(µ) = −µ g(x(µ)) = −4µ/(4µ − 1) → λ* = −1.

The Hessian matrix of Q is

∇²Q(x; µ) = ( µ        2µ − 1
              2µ − 1   4µ     ).

The condition number is close to 25µ/4, so Q is also ill-conditioned.
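The penalty path of Example 3.3 can be checked with the closed-form x(µ); a Python sketch:

```python
# Quadratic-penalty path of Example 3.3:
# x(mu) = (8mu/(4mu-1), 4mu/(4mu-1)) for mu > 1/4.
def x_of_mu(mu):
    d = 4 * mu - 1
    return 8 * mu / d, 4 * mu / d

for mu in [1.0, 10.0, 1e4]:
    x1, x2 = x_of_mu(mu)
    g = x1 + 2 * x2 - 4
    # stationarity of Q(x; mu) = -x1*x2 + (mu/2)*g^2
    assert abs(-x2 + mu * g) < 1e-8
    assert abs(-x1 + 2 * mu * g) < 1e-8

# as mu -> infinity: x(mu) -> (2, 1) and -mu*g(x(mu)) -> -1
x1, x2 = x_of_mu(1e8)
assert abs(x1 - 2) < 1e-6 and abs(x2 - 1) < 1e-6
assert abs(-1e8 * (x1 + 2 * x2 - 4) + 1) < 1e-6
```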
For general constrained problems

min_x f(x) subject to ci(x) = 0, i ∈ E, ci(x) ≥ 0, i ∈ I,

the quadratic penalty function is

Q(x; µ) := f(x) + (µ/2) Σ_{i∈E} ci²(x) + (µ/2) Σ_{i∈I} ([ci(x)]⁻)².

Here [·]⁻ means the negative part of the argument, i.e.,

[ci(x)]⁻ = { 0,      if ci(x) ≥ 0,
           { ci(x),  otherwise.

To avoid the ill-conditioning of the Hessian matrix, nonsmooth (exact) penalty functions like

Q(x; µ) := f(x) + µ Σ_{i∈E} |ci(x)| + µ Σ_{i∈I} |[ci(x)]⁻|

can be introduced. But the minimizer is more difficult to find, because of the non-differentiability
of Q.

Nature optimizes things the best way.

4 Examples and Applications


Example 4.1 (Minimal distance to discrete points (least squares)). Let xi, i = 1, 2, · · · , m, be m
points in Rⁿ. Find the point x minimizing the sum of the squared (Euclidean) distances to
these points. This can be formulated as the following unconstrained least squares problem:

min_x f(x) = Σ_{i=1}^{m} ‖x − xi‖₂²

The minimizer x* is given by

0 = ∇f(x*) = 2 Σ_{i=1}^{m} (x* − xi) = 2 ( m x* − Σ_{i=1}^{m} xi ),

or x* = (1/m) Σ_{i=1}^{m} xi, the mean or the center of mass of these points.
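A tiny Python sketch of this result (the sample points are made-up data):

```python
# The least-squares point of Example 4.1 is the centroid of the data.
pts = [(1.0, 2.0), (3.0, -2.0), (5.0, 6.0)]
m = len(pts)
centroid = tuple(sum(p[j] for p in pts) / m for j in range(2))
assert centroid == (3.0, 2.0)

def f(x):
    # sum of squared Euclidean distances to the points
    return sum((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2 for p in pts)

# the centroid beats a few nearby trial points
for trial in [(3.1, 2.0), (3.0, 1.9), (2.5, 2.5)]:
    assert f(centroid) < f(trial)
```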

Example 4.2 (Minimal distance to a subspace (least squares or projection)). Find the minimal
distance from a point x0 to the subspace spanned by the vectors v1, v2, · · · , vm.
Points of this subspace are linear combinations µ1v1 + µ2v2 + · · · + µmvm for constants
µ1, · · · , µm. Therefore, this problem can be formulated as

min_{µ1,··· ,µm} ‖x0 − µ1v1 − · · · − µmvm‖₂² = min_{µ∈Rᵐ} f(µ),   f(µ) = ‖x0 − Vµ‖₂²,

where

V = (v1 v2 · · · vm),   µ = (µ1, µ2, · · · , µm)ᵗ.

It is still an unconstrained problem, and the minimizer is given by

0 = ∇f(µ*) = −2Vᵗ(x0 − Vµ*),

or VᵗVµ* = Vᵗx0. If v1, v2, · · · , vm are linearly independent, then µ* = (VᵗV)⁻¹Vᵗx0. Otherwise,
the solution is not unique, and we can choose a linearly independent subset of the vectors
v1, · · · , vm.
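A NumPy sketch of the normal equations (assuming NumPy is available; V and x0 are made-up data):

```python
import numpy as np

# Solve V^T V mu* = V^T x0 and check the optimality condition
# V^T (x0 - V mu*) = 0, i.e. the residual is orthogonal to the subspace.
V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])        # linearly independent columns
x0 = np.array([1.0, 2.0, 3.0])

mu = np.linalg.solve(V.T @ V, V.T @ x0)
residual = x0 - V @ mu
assert np.allclose(V.T @ residual, 0)

# same answer as the library least-squares routine
mu_ls, *_ = np.linalg.lstsq(V, x0, rcond=None)
assert np.allclose(mu, mu_ls)
```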

Example 4.3 (Law of reflection). Find C on the line ℓ such that |AC| + |BC| is minimal. The
minimizer C gives the actual path of light traveling from A to B (the actual statement is
minimal traveling time, but since the speed of light is constant in this case, it is equivalent to
minimal distance).
