
EE364b                                                                Prof. S. Boyd

EE364b Homework 2
1. Subgradient optimality conditions for nondifferentiable inequality constrained optimization. Consider the problem
   minimize    $f_0(x)$
   subject to  $f_i(x) \leq 0, \quad i = 1, \ldots, m,$

with variable $x \in \mathbf{R}^n$. We do not assume that $f_0, \ldots, f_m$ are convex. Suppose that $\tilde x$ and $\tilde \lambda \succeq 0$ satisfy primal feasibility,

   $f_i(\tilde x) \leq 0, \quad i = 1, \ldots, m,$

dual feasibility,

   $0 \in \partial f_0(\tilde x) + \sum_{i=1}^m \tilde\lambda_i \, \partial f_i(\tilde x),$

and the complementarity condition

   $\tilde\lambda_i f_i(\tilde x) = 0, \quad i = 1, \ldots, m.$

Show that $\tilde x$ is optimal, using only a simple argument and the definition of subgradient.
Recall that we do not assume the functions $f_0, \ldots, f_m$ are convex.
Solution. Let $g$ be defined by $g(x) = f_0(x) + \sum_{i=1}^m \tilde\lambda_i f_i(x)$. Then $0 \in \partial g(\tilde x)$, since a sum of subgradients of the terms (weighted by the nonnegative $\tilde\lambda_i$) is a subgradient of the sum. By the definition of subgradient, this means that for any $y$,

   $g(y) \geq g(\tilde x) + 0^T (y - \tilde x).$

Thus, for any $y$,

   $f_0(y) - f_0(\tilde x) \geq -\sum_{i=1}^m \tilde\lambda_i \left( f_i(y) - f_i(\tilde x) \right).$

For each $i$, complementarity implies that either $\tilde\lambda_i = 0$ or $f_i(\tilde x) = 0$. Hence, for any feasible $y$ (for which $f_i(y) \leq 0$), each $\tilde\lambda_i (f_i(y) - f_i(\tilde x))$ term is nonpositive. Therefore, any feasible $y$ satisfies $f_0(y) \geq f_0(\tilde x)$, and $\tilde x$ is optimal.
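Spelled out as a single chain, using $\tilde\lambda_i f_i(\tilde x) = 0$, $\tilde\lambda_i \geq 0$, and $f_i(y) \leq 0$ for feasible $y$:

   $f_0(y) \;\geq\; f_0(\tilde x) - \sum_{i=1}^m \tilde\lambda_i \left( f_i(y) - f_i(\tilde x) \right) \;=\; f_0(\tilde x) - \sum_{i=1}^m \tilde\lambda_i f_i(y) \;\geq\; f_0(\tilde x).$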
2. Optimality conditions and coordinate-wise descent for $\ell_1$-regularized minimization. We consider the problem of minimizing

   $\phi(x) = f(x) + \lambda \|x\|_1,$

where $f : \mathbf{R}^n \to \mathbf{R}$ is convex and differentiable, and $\lambda \geq 0$. The number $\lambda$ is the regularization parameter, and is used to control the trade-off between small $f$ and small $\|x\|_1$. When $\ell_1$-regularization is used as a heuristic for finding a sparse $x$ for which $f(x)$ is small, $\lambda$ controls (roughly) the trade-off between $f(x)$ and the cardinality (number of nonzero elements) of $x$.

(a) Show that $x = 0$ is optimal for this problem (i.e., minimizes $\phi$) if and only if $\|\nabla f(0)\|_\infty \leq \lambda$. In particular, for $\lambda \geq \lambda_{\max} = \|\nabla f(0)\|_\infty$, $\ell_1$-regularization yields the sparsest possible $x$, the zero vector.
Remark. The value $\lambda_{\max}$ gives a good reference point for choosing a value of the penalty parameter $\lambda$ in $\ell_1$-regularized minimization. A common choice is to start with $\lambda = \lambda_{\max}/2$, and then adjust $\lambda$ to achieve the desired sparsity/fit trade-off.
Solution. A necessary and sufficient condition for optimality of $x = 0$ is that $0 \in \partial\phi(0)$. Now $\partial\phi(0) = \nabla f(0) + \lambda \, \partial\|0\|_1 = \nabla f(0) + \lambda [-1, 1]^n$. In other words, $x = 0$ is optimal if and only if $\nabla f(0) \in [-\lambda, \lambda]^n$. This is equivalent to $\|\nabla f(0)\|_\infty \leq \lambda$.
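For the least-squares case $f(x) = \|Ax + b\|_2^2$ (used in part (d) below), $\nabla f(0) = 2A^T b$, so $\lambda_{\max} = \|2A^T b\|_\infty$. A minimal Matlab/CVX sketch of this check, with arbitrary illustrative data (CVX assumed installed):

% Minimal sketch: at lambda = lambda_max, the solution of the regularized problem is x = 0.
randn('state', 0);
A = randn(40, 20); b = randn(40, 1);
lambda_max = norm(2*A'*b, inf);        % ||grad f(0)||_inf for f(x) = ||A*x + b||_2^2
cvx_begin quiet
variable x(20);
minimize(sum_square(A*x + b) + lambda_max*norm(x, 1));
cvx_end
norm(x, inf)                           % essentially zero (up to solver tolerance)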
(b) Coordinate-wise descent. In the coordinate-wise descent method for minimizing
a convex function g, we first minimize over x1 , keeping all other variables fixed;
then we minimize over x2 , keeping all other variables fixed, and so on. After
minimizing over xn , we go back to x1 and repeat the whole process, repeatedly
cycling over all n variables.
Show that coordinate-wise descent fails for the function
   $g(x) = |x_1 - x_2| + 0.1\,(x_1 + x_2).$

(In particular, verify that the algorithm terminates after one step at the point $(x_2^{(0)}, x_2^{(0)})$, while $\inf_x g(x) = -\infty$.) Thus, coordinate-wise descent need not work,
for general convex functions.
Solution. We first minimize over $x_1$, with $x_2$ fixed at $x_2^{(0)}$. The optimal choice is $x_1 = x_2^{(0)}$, since the derivative on the left is $-0.9$, and on the right it is $1.1$. We then arrive at the point $(x_2^{(0)}, x_2^{(0)})$. We now optimize over $x_2$. But the current value is already optimal, with the same left and right derivatives, so $x$ is unchanged. We're now at a fixed point of the coordinate-descent algorithm.
On the other hand, taking $x = (t, t)$ and letting $t \to -\infty$, we see that $g(x) = 0.2\,t \to -\infty$.
It's good to visualize coordinate-wise descent for this function, to see why $x$ gets stuck at the crease along $x_1 = x_2$. The graph looks like a folded piece of paper, with the crease along the line $x_1 = x_2$. The bottom of the crease has a small tilt, sloping downward in the direction $(-1, -1)$, so the function is unbounded below. Moving along either axis increases $g$, so coordinate-wise descent is stuck. But moving in the direction $(-1, -1)$, for example, decreases the function.
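A minimal Matlab sketch of this behavior, using fminbnd for the one-dimensional minimizations (the starting point and the search interval are arbitrary choices):

% Cyclic coordinate minimization on g(x) = |x1 - x2| + 0.1*(x1 + x2) stalls after one pass.
g = @(x1, x2) abs(x1 - x2) + 0.1*(x1 + x2);
x = [3; -1];                                       % arbitrary starting point
for pass = 1:3
    x(1) = fminbnd(@(t) g(t, x(2)), -100, 100);    % minimize over x1, with x2 fixed
    x(2) = fminbnd(@(t) g(x(1), t), -100, 100);    % minimize over x2, with x1 fixed
end
disp(x')                                           % stuck near (-1, -1), i.e., (x2(0), x2(0))
disp(g(x(1), x(2)))                                % compare: g(-t, -t) = -0.2*t, unbounded below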
(c) Now consider coordinate-wise descent for minimizing the specific function $\phi$ defined above. Assuming $f$ is strongly convex (say) it can be shown that the iterates converge to a fixed point $\tilde x$. Show that $\tilde x$ is optimal, i.e., minimizes $\phi$.
Thus, coordinate-wise descent works for $\ell_1$-regularized minimization of a differentiable function.
Solution. For each $i$, $\tilde x_i$ minimizes the function $\phi$, with all other variables kept fixed. It follows that

   $0 \in \partial_{x_i} \phi(\tilde x) = \frac{\partial f}{\partial x_i}(\tilde x) + \lambda I_i, \quad i = 1, \ldots, n,$

where $I_i$ is the subdifferential of $|\cdot|$ at $\tilde x_i$: $I_i = \{-1\}$ if $\tilde x_i < 0$, $I_i = \{+1\}$ if $\tilde x_i > 0$, and $I_i = [-1, 1]$ if $\tilde x_i = 0$.
But this is the same as saying $0 \in \nabla f(\tilde x) + \lambda \, \partial\|\tilde x\|_1$, which means that $\tilde x$ minimizes $\phi$.
The subtlety here lies in the general formula that relates the subdifferential of a function to its partial subdifferentials with respect to its components. For a separable function $h : \mathbf{R}^2 \to \mathbf{R}$, we have

   $\partial h(x) = \partial_{x_1} h(x) \times \partial_{x_2} h(x),$

but this is false in general.
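Written out, using the separability of the $\ell_1$ norm, the per-coordinate conditions above are exactly the full optimality condition:

   $\partial\left(\lambda\|x\|_1\right)\big|_{x = \tilde x} = \lambda I_1 \times \cdots \times \lambda I_n, \qquad \text{so} \qquad 0 \in \frac{\partial f}{\partial x_i}(\tilde x) + \lambda I_i \;\; (i = 1, \ldots, n) \;\Longleftrightarrow\; 0 \in \nabla f(\tilde x) + \lambda \, \partial\|\tilde x\|_1.$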
(d) Work out an explicit form for coordinate-wise descent for $\ell_1$-regularized least-squares, i.e., for minimizing the function

   $\|Ax + b\|_2^2 + \lambda \|x\|_1.$

You might find the deadzone function

   $\psi(u) = \begin{cases} u - 1 & u > 1 \\ 0 & |u| \leq 1 \\ u + 1 & u < -1 \end{cases}$

useful. Generate some data and try out the coordinate-wise descent method.
Check the result against the solution found using CVX, and produce a graph showing convergence of your coordinate-wise method.
Solution. At each step we choose an index $i$, and minimize $\|Ax + b\|_2^2 + \lambda\|x\|_1$ over $x_i$, while holding all other $x_j$, $j \neq i$, constant.
Selecting the optimal $x_i$ for this problem is equivalent to selecting the optimal $x_i$ in the problem

   minimize $\; a x_i^2 + c x_i + |x_i|,$

where $a = (A^T A)_{ii}/\lambda$ and $c = (2/\lambda)\left( \sum_{j \neq i} (A^T A)_{ij} x_j + (b^T A)_i \right)$. Using the theory discussed above, any minimizer $x_i$ will satisfy $0 \in 2 a x_i + c + \partial|x_i|$. Now we note that $a$ is positive, so the minimizer of the above problem will have opposite sign to $c$. From there we deduce that the (unique) minimizer $x_i$ will be

   $x_i = \begin{cases} 0 & c \in [-1, 1] \\ -(1/2a)\left(c - \mathrm{sign}(c)\right) & \text{otherwise,} \end{cases}$

where

   $\mathrm{sign}(u) = \begin{cases} -1 & u < 0 \\ 0 & u = 0 \\ 1 & u > 0. \end{cases}$

Finally, we make use of the deadzone function defined above and write

   $x_i := -\, \frac{\psi\!\left( (2/\lambda) \left( \sum_{j \neq i} (A^T A)_{ij} x_j + (b^T A)_i \right) \right)}{(2/\lambda)\,(A^T A)_{ii}}.$
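As a quick sanity check, this closed-form update for the scalar subproblem can be compared against a numerical one-dimensional minimization; the values of $a$ and $c$ below are arbitrary:

% Verify the scalar update x_i = -psi(c)/(2a) for the subproblem a*x^2 + c*x + |x|, a > 0.
a = 2.3; c = -1.7;                               % arbitrary illustrative values
subprob = @(x) a*x.^2 + c*x + abs(x);
x_closed = -sign(c)*max(abs(c) - 1, 0)/(2*a);    % -psi(c)/(2a), psi = deadzone function
x_numeric = fminbnd(subprob, -10, 10);           % numerical minimizer on an arbitrary interval
disp([x_closed, x_numeric])                      % the two should agree to solver tolerance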

Coordinate descent was implemented in Matlab for a random problem instance with $A \in \mathbf{R}^{400 \times 200}$. When solving to within 0.1% accuracy, the iterative method required only about a third of the time of CVX. Sample code appears below, followed by a graph showing the coordinate-wise descent method's function value converging to the CVX function value.
% Generate a random problem instance.
randn('state', 10239); m = 400; n = 200;
A = randn(m, n); ATA = A'*A;
b = randn(m, 1);
l = 0.1;
TOL = 0.001;
xcoord = zeros(n, 1);
% Solve in cvx as a benchmark.
cvx_begin
variable xcvx(n);
minimize(sum_square(A*xcvx + b) + l*norm(xcvx, 1));
cvx_end
% Solve using coordinate-wise descent.
while abs(cvx_optval - (sum_square(A*xcoord + b) + ...
        l*norm(xcoord, 1)))/cvx_optval > TOL
    for i = 1:n
        xcoord(i) = 0; ei = zeros(n,1); ei(i) = 1;
        c = 2/l*ei'*(ATA*xcoord + A'*b);                  % c from the derivation above
        xcoord(i) = -sign(c)*pos(abs(c) - 1)/(2*ATA(i,i)/l);
    end
end

[Figure: convergence of the coordinate-wise descent method's function value to the CVX optimal value.]
