EE364a Homework 6 Solutions
Boyd

6.9 Minimax rational function fitting. Show that the following problem is quasiconvex:

    minimize   max_{i=1,...,k} | p(t_i)/q(t_i) − y_i |

where

    p(t) = a_0 + a_1 t + a_2 t^2 + · · · + a_m t^m,    q(t) = 1 + b_1 t + · · · + b_n t^n,

and the domain of the objective function is defined as

    D = { (a, b) ∈ R^{m+1} × R^n | q(t) > 0 for α ≤ t ≤ β }.
In this problem we fit a rational function p(t)/q(t) to given data, while constraining
the denominator polynomial to be positive on the interval [α, β]. The optimization
variables are the numerator and denominator coefficients a_i, b_i. The interpolation
points t_i ∈ [α, β], and desired function values y_i, i = 1, . . . , k, are given.
Solution. Let's show the objective is quasiconvex. Its domain is convex: for each t ∈ [α, β], the condition q(t) > 0 is a (strict) linear inequality in b, and D is the intersection of these halfspaces. Since q(t_i) > 0 for i = 1, . . . , k, we have

    max_{i=1,...,k} | p(t_i)/q(t_i) − y_i | ≤ γ

if and only if

    | p(t_i) − y_i q(t_i) | ≤ γ q(t_i),    i = 1, . . . , k,

which defines a convex set in the variables a and b, since the lefthand side is convex and the righthand side is linear. We can further express these inequalities as a set of 2k linear inequalities,

    p(t_i) − y_i q(t_i) ≤ γ q(t_i),    −(p(t_i) − y_i q(t_i)) ≤ γ q(t_i),    i = 1, . . . , k.

Thus every sublevel set of the objective is convex, so the objective is quasiconvex. Moreover, for fixed γ, checking whether the optimal value is at most γ is a linear feasibility problem; this is what the bisection method in exercise 1 below exploits.
Solutions to additional exercises
1. Minimax rational fit to the exponential. (See exercise 6.9.) We consider the specific
problem instance with data
t_i = −3 + 6(i − 1)/(k − 1),    y_i = e^{t_i},    i = 1, . . . , k,
where k = 201. (In other words, the data are obtained by uniformly sampling the
exponential function over the interval [−3, 3].) Find a function of the form
    f(t) = (a_0 + a_1 t + a_2 t^2)/(1 + b_1 t + b_2 t^2)

that minimizes max_{i=1,...,k} |f(t_i) − y_i|. (We require that 1 + b_1 t_i + b_2 t_i^2 > 0 for i = 1, . . . , k.)
Find optimal values of a_0, a_1, a_2, b_1, b_2, and give the optimal objective value, computed to an accuracy of 0.001. Plot the data and the optimal rational function fit on the same plot. On a different plot, give the fitting error, i.e., f(t_i) − y_i.
Hint. You can use strcmp(cvx_status,'Solved'), after cvx_end, to check if a feasibility problem is feasible.
Solution. The objective function (and therefore also the problem) is not convex, but
it is quasiconvex. We have max_{i=1,...,k} |f(t_i) − y_i| ≤ γ if and only if

    | (a_0 + a_1 t_i + a_2 t_i^2)/(1 + b_1 t_i + b_2 t_i^2) − y_i | ≤ γ,    i = 1, . . . , k,

or, multiplying through by the positive denominators,

    | a_0 + a_1 t_i + a_2 t_i^2 − y_i (1 + b_1 t_i + b_2 t_i^2) | ≤ γ (1 + b_1 t_i + b_2 t_i^2),    i = 1, . . . , k,

which, for fixed γ, is a set of linear feasibility constraints in the variables a and b. We can therefore compute the optimal value to the required accuracy by bisection on γ, solving a feasibility problem at each step.
The following Matlab code solves the problem for the particular problem instance.
k=201;
t=(-3:6/(k-1):3)';
y=exp(t);
Tpowers=[ones(k,1) t t.^2];
% bisection on gamma; l and u bracket the optimal value
% (u = e^3 is a valid upper bound, since a = 0, b = 0 gives error max_i y_i = e^3)
l=0; u=exp(3);
while u-l >= 1e-3
    gamma=(l+u)/2;
    % feasibility problem: can maximum fitting error gamma be achieved?
    cvx_begin quiet
        variables a(3) b(2)
        abs(Tpowers*a-y.*(Tpowers*[1;b])) <= gamma*(Tpowers*[1;b]);
    cvx_end
    if strcmp(cvx_status,'Solved')
        u=gamma;
        a_opt=a;
        b_opt=b;
        objval_opt=gamma;
    else
        l=gamma;
    end
end
y_fit=Tpowers*a_opt./(Tpowers*[1;b_opt]);
figure(1);
plot(t,y,'b', t,y_fit,'r+');
xlabel('t');
ylabel('y');
figure(2);
plot(t, y_fit-y);
xlabel('t');
ylabel('err');
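As a quick check, the achieved objective can be evaluated directly from the computed fit; it should agree, to within the bisection tolerance of 0.001, with the optimal value reported below:

max(abs(y_fit-y))  % approximately 0.0233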
The optimal values are
a_0 = 1.0099,   a_1 = 0.6117,   a_2 = 0.1134,   b_1 = −0.4147,   b_2 = 0.0485,
and the optimal objective value is 0.0233. We also get the following plots.

[Figure 1: Chebyshev fit with rational function. The line represents the data and the crosses the fitted points.]

[Figure 2: Fitting error for Chebyshev fit of exponential with rational function.]
2. Maximum likelihood prediction of team ability. A set of n teams compete in a tourna-
ment. We model each team's ability by a number a_j ∈ [0, 1], j = 1, . . . , n. When teams j and k play each other, the probability that team j wins is equal to prob(a_j − a_k + v > 0), where v ∼ N(0, σ^2).
You are given the outcome of m past games. These are organized as
    (j^(i), k^(i), y^(i)),    i = 1, . . . , m,
meaning that game i was played between teams j^(i) and k^(i); y^(i) = 1 means that team j^(i) won, while y^(i) = −1 means that team k^(i) won. (We assume there are no ties.)
(a) Formulate the problem of finding the maximum likelihood estimate of team abil-
ities, â ∈ R^n, given the outcomes, as a convex optimization problem. You will find the game incidence matrix A ∈ R^{m×n}, defined as

    A_il =  y^(i)    if l = j^(i),
           −y^(i)    if l = k^(i),
            0        otherwise,

useful.
The prior constraints â_i ∈ [0, 1] should be included in the problem formulation. Also, we note that if a constant is added to all team abilities, there is no change in the probabilities of the game outcomes, so the abilities are determined by the outcomes only up to an additive constant.

(b) Find â for the given team data. The past game outcomes are stored in the matrix train, whose rows are the triples (j^(i), k^(i), y^(i)); the value of σ is also given.

(c) Use the maximum likelihood estimate â found in part (b) to predict the outcomes of the games in the test set, stored in the matrix test, using ŷ^(i) = sign(â_{j^(i)} − â_{k^(i)}). Give the fraction of correctly predicted outcomes, and compare it with the fraction obtained by simply predicting the same outcome as in train.
Solution.
(a) The likelihood of the outcomes y given the abilities a is

    p(y|a) = ∏_{i=1}^m Φ( (1/σ) y^(i) (a_{j^(i)} − a_{k^(i)}) ),

where Φ is the standard normal cumulative distribution function. In terms of the game incidence matrix A, the log-likelihood is

    l(a) = Σ_{i=1}^m log Φ( (Aa)_i / σ ),

which is concave in a, since Φ is log-concave and (Aa)_i/σ is linear in a. The ML estimate is therefore found by solving the convex problem

    maximize   l(a)
    subject to 0 ⪯ a ⪯ 1.
(b) The following code computes the ML estimate. (The incidence matrix A is built here from the columns of train, which, like those of test, are assumed to hold j^(i), k^(i), and y^(i); the data n, m, sigma, train, and test come from the problem data file.)

% build the game incidence matrix from the training data
A = sparse(1:m, train(:,1), train(:,3), m, n) ...
    + sparse(1:m, train(:,2), -train(:,3), m, n);
% Estimate abilities
cvx_begin
variable a_hat(n)
minimize(-sum(log_normcdf(A*a_hat/sigma)))
subject to
a_hat >= 0
a_hat <= 1
cvx_end
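(A note on the objective: CVX's log_normcdf implements a concave approximation of log Φ, which is what allows the concave log-likelihood to be expressed in a DCP-compliant way.)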
Using this code we get that â = (1.0, 0.0, 0.68, 0.37, 0.79, 0.58, 0.38, 0.09, 0.67, 0.58).
(c) The following code is used to predict the outcomes in the test set
% Estimate errors in test set
res = sign(a_hat(test(:,1))-a_hat(test(:,2)));
Pml = 1-length(find(res-test(:,3)))/m_test
Ply = 1-length(find(train(:,3)-test(:,3)))/m_test
The maximum likelihood estimate gives a correct prediction of 86.7% of the games
in test. On the other hand, 75.6% of the games in test have the same outcome
as the games in train.
3. Piecewise-linear fitting. In many applications some function in the model is not given
by a formula, but instead as tabulated data. The tabulated data could come from
empirical measurements, historical data, numerically evaluating some complex expres-
sion or solving some problem, for a set of values of the argument. For use in a convex
optimization model, we then have to fit these data with a convex function that is com-
patible with the solver or other system that we use. In this problem we explore a very
simple problem of this general type.
Suppose we are given the data (x_i, y_i), i = 1, . . . , m, with x_i, y_i ∈ R. We will assume that the x_i are sorted, i.e., x_1 < x_2 < · · · < x_m. Let a_0 < a_1 < a_2 < · · · < a_K be a set of fixed knot points, with a_0 ≤ x_1 and a_K ≥ x_m. Explain how to find the convex piecewise-linear function f, defined over [a_0, a_K], with knot points a_i, that minimizes the least-squares fitting criterion
    Σ_{i=1}^m (f(x_i) − y_i)^2.
You must explain what the variables are and how they parametrize f, and how you ensure convexity of f.
Hints. One method to solve this problem is based on the Lagrange basis f_0, . . . , f_K, which are the piecewise-linear functions that satisfy

    f_j(a_i) = δ_ij,    i, j = 0, . . . , K.

Any piecewise-linear function f with these knot points can then be expressed as f = Σ_{j=0}^K z_j f_j, where z_j = f(a_j). (In this form the function is easily incorporated into an optimization problem.)
Solution. Following the hint, we will use the Lagrange basis functions f_0, . . . , f_K. These can be expressed as

    f_0(x) = ( (a_1 − x)/(a_1 − a_0) )_+,

    f_i(x) = ( min{ (x − a_{i−1})/(a_i − a_{i−1}), (a_{i+1} − x)/(a_{i+1} − a_i) } )_+,    i = 1, . . . , K − 1,

and

    f_K(x) = ( (x − a_{K−1})/(a_K − a_{K−1}) )_+,

where (u)_+ = max{u, 0}. We parametrize f by its knot values: with z ∈ R^{K+1} as variable, f = Σ_{j=0}^K z_j f_j, so that f(a_j) = z_j. Defining F ∈ R^{m×(K+1)} by

    F_ij = f_j(x_i),    i = 1, . . . , m,    j = 0, . . . , K,

we have (f(x_1), . . . , f(x_m)) = Fz. Convexity of f is equivalent to its slopes being nondecreasing, which is a set of K − 1 linear inequalities in z. The problem is therefore the QP

    minimize   ‖Fz − y‖_2^2
    subject to (z_{i+1} − z_i)/(a_{i+1} − a_i) ≥ (z_i − z_{i−1})/(a_i − a_{i−1}),    i = 1, . . . , K − 1.
The following code solves this problem for the data in pwl_fit_data.
pwl_fit_data; % defines the data vectors x and y (m = 100 points)
cvx_quiet(true);
figure
plot(x,y,'k:','linewidth',2)
hold on
% Single line (K = 1): ordinary least-squares fit
p = [x ones(100,1)]\y;
alpha = p(1)
beta = p(2)
plot(x,alpha*x+beta,'b','linewidth',2)
mse = norm(alpha*x+beta-y)^2
for K = 2:4
% Generate Lagrange basis
a = (0:(1/K):1)';
F = max((a(2)-x)/(a(2)-a(1)),0);
for k = 2:K
a_1 = a(k-1);
a_2 = a(k);
a_3 = a(k+1);
f = max(0,min((x-a_1)/(a_2-a_1),(a_3-x)/(a_3-a_2)));
F = [F f];
end
f = max(0,(x-a(K))/(a(K+1)-a(K)));
F = [F f];
% Solve problem
cvx_begin
variable z(K+1)
minimize(norm(F*z-y))
subject to
(z(3:end)-z(2:end-1))./(a(3:end)-a(2:end-1)) >=...
(z(2:end-1)-z(1:end-2))./(a(2:end-1)-a(1:end-2))
cvx_end
% Plot solution
y2 = F*z;
mse = norm(y2-y)^2
if K==2
plot(x,y2,'r','linewidth',2)
elseif K==3
plot(x,y2,'g','linewidth',2)
else
plot(x,y2,'m','linewidth',2)
end
end
xlabel('x')
ylabel('y')
[Figure 3: Piecewise-linear approximations for K = 1, 2, 3, 4.]
This generates figure 3. We can see that the approximation improves as K increases.
The following table shows the result of this approximation.
K    α_1, . . . , α_K                  β_1, . . . , β_K                   J
1    1.91                              −0.87                              12.73
2    −0.27, 4.09                       −0.33, −2.51                       2.62
3    −1.80, 2.67, 4.25                 −0.10, −1.59, −2.65                0.60
4    −3.15, 2.11, 2.68, 4.90           0.03, −1.29, −1.57, −3.23          0.22
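(Here α_i and β_i are the slope and intercept of f on the ith segment [a_{i−1}, a_i], and J is the optimal value of the fitting criterion. In terms of the knot values z computed above, α_i = (z_i − z_{i−1})/(a_i − a_{i−1}) and β_i = z_{i−1} − α_i a_{i−1}.)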
There is another way to solve this problem. We are looking for a piecewise linear
function. If we have at least one internal knot (K ≥ 2), the function should satisfy the
two following constraints:
• convexity: α_1 ≤ α_2 ≤ · · · ≤ α_K
• continuity: α_i a_i + β_i = α_{i+1} a_i + β_{i+1}, i = 1, . . . , K − 1.
Therefore, the optimization problem is

    minimize   Σ_{i=1}^m (f(x_i) − y_i)^2
    subject to α_i ≤ α_{i+1},    i = 1, . . . , K − 1
               α_i a_i + β_i = α_{i+1} a_i + β_{i+1},    i = 1, . . . , K − 1.
Reformulating the problem by representing f(x_i) in matrix form, we get

    minimize   ‖ diag(x) F α + F β − y ‖_2
    subject to α_i ≤ α_{i+1},    i = 1, . . . , K − 1
               α_i a_i + β_i = α_{i+1} a_i + β_{i+1},    i = 1, . . . , K − 1,

where the variables are α ∈ R^K and β ∈ R^K, the problem data are x ∈ R^m and y ∈ R^m, and F ∈ R^{m×K} is the segment selector matrix

    F_ij = 1 if j = 1 and x_i = a_0;    F_ij = 1 if a_{j−1} < x_i ≤ a_j;    F_ij = 0 otherwise,

so that row i of F picks out the segment containing x_i, and (diag(x) F α + F β)_i = α_j x_i + β_j for that segment j.
% evaluate the fits on a plotting grid (grid choice here is illustrative)
xp = linspace(0,1,200)';
mp = length(xp);
yp = [];
for K = 1:4
a = [0:1/K:1]'; % a_0,...,a_K
% matrix for sum f(x_i): F(i,j) = 1 if x_i lies in segment (a_{j-1}, a_j]
F = sparse(1:m,max(1,ceil(x*K)),1,m,K);
% solve problem
cvx_begin
variables alpha(K) beta(K)
minimize( norm(diag(x)*F*alpha+F*beta-y) )
subject to
if (K>=2)
% continuity at the internal knots
alpha(1:K-1).*a(2:K)+beta(1:K-1) == alpha(2:K).*a(2:K)+beta(2:K)
% convexity: nondecreasing slopes
alpha(1:K-1) <= alpha(2:K)
end
cvx_end
% evaluate the fit on the plotting grid
fp = sparse(1:mp,max(1,ceil(xp*K)),1,mp,K);
yp = [yp diag(xp)*fp*alpha+fp*beta];
end
plot(x,y,'b.',xp,yp);
4. Robust least-squares with interval matrix. We consider the interval matrix

    A = { A ∈ R^{m×n} | |A_ij − Ā_ij| ≤ R_ij, i = 1, . . . , m, j = 1, . . . , n }.

The matrix Ā ∈ R^{m×n} is called the nominal value or center value, and R ∈ R^{m×n}, which is elementwise nonnegative, is called the radius.
The robust least-squares problem, with interval matrix, is

    minimize   max_{A ∈ A} ‖Ax − b‖_2,

with optimization variable x ∈ R^n. The problem data are A (i.e., Ā and R) and b ∈ R^m. The objective, as a function of x, is called the worst-case residual norm. The robust least-squares problem is evidently a convex optimization problem.
(a) Formulate the interval matrix robust least-squares problem as a standard opti-
mization problem, e.g., a QP, SOCP, or SDP. You can introduce new variables
if needed. Your reformulation should have a number of variables and constraints
that grows linearly with m and n, and not exponentially.
(b) Consider the specific problem instance with m = 4, n = 3,
    A = [ 60 ± 0.05    45 ± 0.05     −8 ± 0.05
          90 ± 0.05    30 ± 0.05    −30 ± 0.05
           0 ± 0.05    −8 ± 0.05     −4 ± 0.05
          30 ± 0.05    10 ± 0.05    −10 ± 0.05 ],      b = [ −6; −3; 18; −9 ].

(The first part of each entry in A gives Ā_ij; the second gives R_ij, which are all 0.05 here.) Find the solution x_ls of the nominal problem (i.e., minimize ‖Āx − b‖_2), and the robust least-squares solution x_rls. For each of these, find the nominal residual norm, and also the worst-case residual norm. Make sure the results make sense.
Solution:
(a) The problem is equivalent to

    minimize   ‖ |Āx − b| + R|x| ‖_2,

where |x| ∈ R^n is the vector with elements |x|_i = |x_i|, or equivalently the QP

    minimize   y^T y
    subject to Āx + Rz − b ⪯ y
               Āx − Rz − b ⪰ −y
               −z ⪯ x ⪯ z,

with variables x, z ∈ R^n and y ∈ R^m, where ⪯ denotes elementwise inequality. (Here z plays the role of |x|; at the optimum z = |x|, since R is elementwise nonnegative.)

To see the first equivalence, write A = Ā + ∆ with |∆_ij| ≤ R_ij, and let f(x) denote the squared worst-case residual norm, where r = Āx − b. Since

    ‖r + ∆x‖_2^2 = Σ_{i=1}^m ( r_i + Σ_{j=1}^n ∆_ij x_j )^2 = Σ_{i=1}^m | r_i + Σ_{j=1}^n ∆_ij x_j |^2,

it's easy to see that each term is maximized over ∆ by taking ∆_ij = R_ij sign(r_i) sign(x_j). Therefore

    f(x) = Σ_{i=1}^m ( |r_i| + Σ_{j=1}^n R_ij |x_j| )^2 = ‖ |Āx − b| + R|x| ‖_2^2.

Note that the objective function is convex, since the Euclidean norm is convex and nondecreasing on R_+^m, and |Āx − b| + R|x| is convex and elementwise nonnegative.
(b) The following script computes the least-squares and the robust solutions and also
computes, for each one, the nominal and the worst-case residual norms.
% input data
A_bar = [ 60 45 -8; ...
90 30 -30; ...
0 -8 -4; ...
30 10 -10];
d = .05;
R = d*ones(4,3);
b = [ -6; -3; 18; -9];
% least-squares solution
x_ls = A_bar\b;
% robust least-squares solution, via the QP formulation from part (a)
% (minimizing norm(y) gives the same solution as minimizing y'*y)
cvx_begin
variables x_rls(3) y(4) z(3)
minimize( norm(y) )
subject to
A_bar*x_rls + R*z - b <= y;
A_bar*x_rls - R*z - b >= -y;
-z <= x_rls;
x_rls <= z;
cvx_end
% nominal and worst-case residual norms for both solutions
nom_res_ls = norm(A_bar*x_ls - b);
nom_res_rls = norm(A_bar*x_rls - b);
wc_res_ls = norm(abs(A_bar*x_ls - b) + R*abs(x_ls));
wc_res_rls = norm(abs(A_bar*x_rls - b) + R*abs(x_rls));
% display
disp('Residual norms for the nominal problem when using LS solution: ');
disp(nom_res_ls);
disp('Residual norms for the nominal problem when using robust solution: ');
disp(nom_res_rls);
disp('Residual norms for the worst-case problem when using LS solution: ');
disp(wc_res_ls);
disp('Residual norms for the worst-case problem when using robust solution: ');
disp(wc_res_rls);
The robust least-squares solution can also be found more directly using the following script. (The epigraph variable t is introduced because CVX's composition rules do not allow norm to be applied to the convex expression abs(A_bar*x - b) + R*abs(x).)
cvx_begin
variables x(3) t(4)
minimize ( norm ( t ) )
abs(A_bar*x - b) + R*abs(x) <= t
cvx_end
This script returns the following results:
Residual norms for the nominal problem when using LS solution:
7.5895
Residual norms for the nominal problem when using robust solution:
17.7106
Residual norms for the worst-case problem when using robust solution:
17.7940
We also generated, for fun, the following histograms showing the distribution of the residual norms for the cases x = x_ls and x = x_rls. These were obtained by creating 1000 instances of A by sampling A_ij uniformly between Ā_ij − R_ij and Ā_ij + R_ij, and then evaluating the residual norm for each A and each of the two solutions.
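A sketch of the sampling loop (assuming x_ls and x_rls have been computed as above):

N = 1000;
res_ls = zeros(N,1); res_rls = zeros(N,1);
for i = 1:N
% sample A uniformly from the interval matrix
A = A_bar + R.*(2*rand(4,3)-1);
res_ls(i) = norm(A*x_ls - b);
res_rls(i) = norm(A*x_rls - b);
end
figure; hist(res_ls,30);  % residual norm distribution for x = x_ls
figure; hist(res_rls,30); % residual norm distribution for x = x_rls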
[Histogram: residual norm distribution for x = x_ls.]
[Histogram: residual norm distribution for x = x_rls.]
5. Total variation image interpolation. We are given the values of some of the pixels of an m × n image U, and are asked to guess the values of the remaining pixels. We do this by choosing the unknown pixel values U_ij to minimize a roughness measure, subject to the interpolation conditions, i.e., the requirement that U agree with the given values at the known pixels. One common roughness measure is the ℓ2 variation (squared),

    Σ_{i=2}^m Σ_{j=2}^n ( (U_ij − U_{i−1,j})^2 + (U_ij − U_{i,j−1})^2 ).

Solution. The following CVX code minimizes (the square root of) the ℓ2 variation, with the known pixel values, stored in Uorig and indexed by the set Known, held fixed:
cvx_begin
variable Ul2(m, n);
Ul2(Known) == Uorig(Known); % Fix known pixel values.
Ux = Ul2(2:end,2:end) - Ul2(2:end,1:end-1); % x (horiz) differences
Uy = Ul2(2:end,2:end) - Ul2(1:end-1,2:end); % y (vert) differences
minimize(norm([Ux(:); Uy(:)], 2)); % l2 roughness measure
cvx_end
Another common roughness measure is the total variation,

    Σ_{i=2}^m Σ_{j=2}^n ( |U_ij − U_{i−1,j}| + |U_ij − U_{i,j−1}| ).

The following code minimizes it subject to the same interpolation conditions:

cvx_begin
variable Utv(m, n);
Utv(Known) == Uorig(Known); % Fix known pixel values.
Ux = Utv(2:end,2:end) - Utv(2:end,1:end-1); % x (horiz) differences
Uy = Utv(2:end,2:end) - Utv(1:end-1,2:end); % y (vert) differences
minimize(norm([Ux(:); Uy(:)], 1)); % tv roughness measure
cvx_end
[Figure: Original image; Obscured image; and the ℓ2 and total variation reconstructions.]
6. Relaxed and discrete A-optimal experiment design. This problem concerns the A-optimal experiment design problem, described on page 387, with data generated as follows.

n = 5; % dimension
p = 20; % number of available types of measurements
m = 30; % total number of measurements to be carried out
randn('state', 0);
V = randn(n,p); % columns are v_i, the possible measurement vectors

Solve the relaxed A-optimal experiment design problem,

    minimize   (1/m) tr ( Σ_{i=1}^p λ_i v_i v_i^T )^{-1}
    subject to λ ⪰ 0,   1^T λ = 1,

with variable λ ∈ R^p. Find the optimal point λ⋆ and the associated optimal value of the relaxed problem. This optimal value is a lower bound on the optimal value of the discrete A-optimal experiment design problem,

    minimize   tr ( Σ_{i=1}^p m_i v_i v_i^T )^{-1}
    subject to m_1 + · · · + m_p = m,   m_i ∈ {0, . . . , m},   i = 1, . . . , p,
with variables m_1, . . . , m_p. To get a suboptimal point for this discrete problem, round the entries in m λ⋆ to obtain integers m̂_i. If needed, adjust these by hand or some other method to ensure that they sum to m, and compute the objective value obtained. This is, of course, an upper bound on the optimal value of the discrete problem. Give the gap between this upper bound and the lower bound obtained from the relaxed problem. Note that the two objective values can be interpreted as the mean-square estimation error E ‖x̂ − x‖_2^2.
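(To see why: in the setup of page 387, a measurement of type i gives a sample y = v_i^T x + w with w ∼ N(0, 1), independent across measurements, and x̂ is the resulting ML (least-squares) estimate of x. The estimation error then has covariance

    E (x̂ − x)(x̂ − x)^T = ( Σ_{i=1}^p m_i v_i v_i^T )^{-1},

so taking the trace gives E ‖x̂ − x‖_2^2 = tr ( Σ_{i=1}^p m_i v_i v_i^T )^{-1}, which is exactly the discrete objective; substituting m_i = m λ_i gives the relaxed objective.)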
Solution. The objective of the relaxed problem is convex, so it is a convex problem.
Expressing it in cvx requires a little work. We’d like to write the objective as
minimize( (1/m)*trace(inv(V*diag(lambda)*V')) )

but this won't work, because cvx doesn't know about matrix convex functions. Instead, we can express the objective as a sum of matrix fractional functions,

    minimize   (1/m) Σ_{k=1}^n e_k^T ( Σ_{i=1}^p λ_i v_i v_i^T )^{-1} e_k
    subject to 1^T λ = 1,   λ ⪰ 0,
where e_k ∈ R^n is the kth unit vector. (Note that e is defined in exercise 6.9 as the estimation error vector, so e_k could also mean the kth entry of the error vector. But here, clearly, e_k is the kth unit vector.)
We can express this in cvx using the function matrix_frac. The following code solves
the problem.
n = 5; % dimension
p = 20; % number of available types of measurements
m = 30; % total number of measurements to be carried out
randn('state', 0);
V=randn(n,p); % columns are v_i, the possible measurement vectors
cvx_begin
variable lambda(p)
obj = 0;
for k=1:n
ek = zeros(n,1);
ek(k)=1;
obj = obj + (1/m)*matrix_frac(ek,V*diag(lambda)*V’);
end
minimize( obj )
subject to
sum(lambda) == 1
lambda >= 0
cvx_end
lower_bound = cvx_optval
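A sketch of the rounding step and the resulting upper bound (variable names here are illustrative):

m_hat = round(m*lambda); % round the entries of m*lambda_star to integers
sum(m_hat) % check: equals m = 30 for this instance, so no adjustment is needed
upper_bound = trace(inv(V*diag(m_hat)*V'))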
For this problem instance, simple rounding yielded m̂_i that summed to m = 30, so no adjustment of the rounded values is needed. The lower bound is 0.2481; the upper bound is 0.2483. The gap is 0.00023, which is around 0.1%.
What this means is this: we have found a choice of 30 measurements, each one from the set of 20 possible measurements, that yields a mean-square estimation error E ‖x̂ − x‖_2^2 = 0.2483. We do not know whether this is the optimal choice of 30 measurements. But we do know that this choice is no more than 0.1% suboptimal; the optimal choice can achieve a mean-square error that is no smaller than 0.2481. Our experiment design is, if not optimal, very nearly optimal. (In fact, it is very likely to be optimal.)