EE263 Homework 5 Solutions
Lall
(a) Explain how to find a ∈ R and p ∈ R^N (which is 24-periodic) that minimize the
RMS value of y − ŷ.
(b) Carry out the procedure described in part (a) on the data set found in tempdata.m.
Give the value of the trend parameter a that you find. Plot the model ŷ and the
measured temperatures y on the same plot. (The matlab code to do this is in the
file tempdata.m, but commented out.)
(c) Temperature prediction. Use the model found in part (b) to predict the temperature
for the next 24-hour period (i.e., from t = 169 to t = 192). The file
tempdata.m also contains a 24-long vector ytom with tomorrow’s temperatures.
Plot tomorrow’s temperature and your prediction of it, based on the model found
in part (b), on the same plot. What is the RMS value of your prediction error for
tomorrow’s temperatures?
Solution.
(a) Since p is 24-periodic, we only need to specify its values for t = 1, . . . , 24. We can
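express the vector of model temperatures as
$$\hat y = Ax,$$
where
$$
x = \begin{bmatrix} a \\ p_1 \\ \vdots \\ p_{24} \end{bmatrix} \in \mathbf{R}^{25},
\qquad
A = \left[\begin{array}{c|c}
\begin{matrix} 1 \\ 2 \\ \vdots \\ 168 \end{matrix} &
\begin{matrix} I_{24\times 24} \\ \vdots \\ I_{24\times 24} \end{matrix}
\end{array}\right] \in \mathbf{R}^{168\times 25},
$$
and the righthand part of A consists of seven 24 × 24 identity matrices stacked on
top of each other. The matrix A is skinny and full rank (its first column is not
24-periodic, so it is not a combination of the other columns), so the least-squares
solution is given by
$$x = (A^T A)^{-1} A^T y.$$
The associated model is given by ŷ = Ax.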
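As a sanity check on the construction in part (a), the following is a minimal sketch on synthetic data. The trend a_true and the periodic component p_true are made-up values chosen for illustration (they are not the tempdata.m data), and kron is used only as a compact way to stack the seven identity blocks.

```matlab
% Synthetic check of the trend-plus-periodic model.
% All numerical values here are assumptions for illustration only.
N = 168;                                  % one week of hourly samples
t = (1:N)';
a_true = -0.01;                           % assumed trend parameter
p_true = 15 + 5*sin(2*pi*(1:24)'/24);     % assumed 24-periodic component
E = kron(ones(7,1), eye(24));             % seven stacked 24x24 identities
A = [t E];                                % 168 x 25 matrix from part (a)
y = A*[a_true; p_true];                   % noiseless synthetic measurements
x = A\y;                                  % least-squares solution
a_hat = x(1);                             % recovered trend
p_hat = x(2:25);                          % recovered periodic component
rank_A = rank(A);                         % full column rank means rank 25
fit_rms = norm(A*x - y)/sqrt(N);          % RMS fit error
```

With noiseless data the recovered a_hat and p_hat should match the assumed values up to rounding, which confirms that a and p are uniquely determined by the model.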
(b) The following matlab script solves part (b) and part (c).
% loading the data from the files given (the script below uses y, ytom and N)
tempdata
% matrix A of the system y=Ax
e24=eye(24);
A = [ e24; e24; e24; e24; e24; e24; e24];
A = [(1:N)' A];
% estimate coeffs
x = A\y ;
% the fit
yhat = (A*x)';
% plot the fit and the data on the same figure
plot([1:168],y,'--r',[1:168],yhat,'-.');
% now we solve part (c)
% prediction of tomorrow's temperature using the fit
ytomhat = ([(N+1:N+24)' e24]*x)';
% plot the fit and the data for next day
figure;
plot([1:24],ytom,'r',[1:24],ytomhat,'-.');
%RMS error
RMS = sqrt(norm(ytomhat-ytom)^2/24)
By running the above script we get a = −0.0121. The fit, ŷ, along with the data,
y, are presented in the following figure.
[Figure: measured temperatures y (data) and model ŷ (fit) plotted against t.]
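(c) In part (b) we estimated the values of a and p_i, i = 1, . . . , 24, collected together as
one vector x ∈ R^25. To predict y_tom, the 24-vector of tomorrow’s temperatures,
we use
$$\hat y_{\mathrm{tom}} = A_{\mathrm{tom}} x,$$
where x is the value found in part (b), and
$$
A_{\mathrm{tom}} = \left[\begin{array}{c|c}
\begin{matrix} 169 \\ 170 \\ \vdots \\ 192 \end{matrix} & I_{24\times 24}
\end{array}\right].
$$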
The prediction of the temperature along with the data for the next day are pre-
sented in the following figure.
[Figure: tomorrow’s measured temperatures (data) and the model’s prediction over the 24 hours.]
The RMS value of the difference between the prediction and the data for the next
day is 0.6522.
6.34 Auto-Bob. A set of 10 powerful lamps, each of whose powers we can choose over the
traditional scale [0, 10], is used to heat the surface of an object to a target temperature
T^des (in degrees C). We let p ∈ R^10 denote the lamp powers, and we let T ∈ R^100
denote the temperature of the surface at 100 locations on a 10 × 10 grid. The mapping
between p and T, T = F(p), is not quite affine, but reasonably close. The mapping
F : R^10 → R^100 is quite complicated, since every lamp power affects every surface
location temperature, and various linear and nonlinear heat transport mechanisms are
involved. In principle, we could derive a physics-based model of F, but this hasn’t
been done. But we do have the device itself, which means we can set the lamp powers
to any levels we like (with p_i ∈ [0, 10]) and measure the resulting surface temperature
vector T ∈ R^100. In other words, we can carry out experiments to evaluate the function
F.
We want to find p ∈ R^10, with 0 ≤ p_i ≤ 10, that (at least approximately) minimizes
the RMS temperature error,
$$e = \left( \frac{1}{100} \sum_{i=1}^{100} (T_i - T^{\mathrm{des}})^2 \right)^{1/2},$$
Next, we try turning on all lamps at different, uniform powers, p = α1, where α ∈
[0, 10]. Our script steps through the power levels α = 0, 1, . . . , 10. For α = 0, 1, 2 the
temperatures are generally too low; for α = 4, 5, . . . , 10 the temperatures are generally
too high. For α = 3 we get temperatures in the right ballpark, with an RMS error
e = 127. If you were to adjust α over a finer grid, you’d find that α = 2.6 gives the
least RMS error, around e = 107. (You don’t need to be this precise to successfully
solve the problem.)
Now comes the part where EE263 plays a role. We will start from some reasonable
power choice p0, such as p0 = 3 · 1 (all lamps at power 3, where 1 is the vector of
ones), and set T0 = F(p0). We will find an approximate model of F of the form
δT = Aδp,
where δT = T − T0, δp = p − p0, and A ∈ R^{100×10} is a matrix that we’ll need to find.
To determine the entries of A we slightly perturb the initial setting p0 along each
component and record the corresponding change in the temperature. In other words,
we take δp = γei , and record the resulting δT , which will be γ times the ith column
of A. A perturbation level γ = 0.5 would be fine here (and in fact, the method works
with a very wide range of choices of γ, including negative values which correspond to
decreasing the ith lamp power). When we do this, we do not get the exact same matrix
A for different choices of p0 or γ. But if p0 and γ are reasonable, we get quite similar
matrices A. (We didn’t ask you to experiment with this, but if you had more time, it
would have been a good thing for you to do.) Note that so far, we have done only a
few tens of experiments (including the tests where we turn on each lamp full power).
Some people estimated the matrix A using least-squares. This also works.
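The least-squares variant can be sketched as follows. Since surface_heating_sim is not reproduced here, the sketch uses a hypothetical stand-in F built from a made-up gain matrix B; the choice of 20 perturbations (each lamp moved up and down by γ) is also an assumption, picked so that the result is deterministic.

```matlab
% Hypothetical stand-in for the heating device (NOT surface_heating_sim).
B = 10 + 5*cos((1:100)'*(1:10)/7);        % made-up 100x10 gain matrix
F = @(p) 20 + B*p + 0.3*B*(p.^2);         % mildly nonlinear map R^10 -> R^100
p0 = 3*ones(10,1);                        % linearization point
T0 = F(p0);
gamma = 0.5;                              % perturbation level
dP = gamma*[eye(10), -eye(10)];           % 20 perturbations: +/- gamma*e_i
dT = zeros(100, size(dP,2));
for k = 1:size(dP,2)
    dT(:,k) = F(p0 + dP(:,k)) - T0;       % one experiment per column
end
A_ls = dT/dP;                             % least-squares solution of A*dP ~ dT
```

Here the matrix right division dT/dP returns the A that minimizes the Frobenius norm of A*dP − dT. With symmetric up and down perturbations this reduces to a central difference, which is why the quadratic term of the stand-in model cancels.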
Now we use our approximate model of F to find a good value of p. We want to find p
so that ‖F(p) − T^des‖ is minimized. Using our approximate model F(p) ≈ T0 + Aδp,
this becomes
‖F(p) − T^des‖ ≈ ‖T0 + Aδp − T^des‖.
So we choose δp to minimize this,
δp = A†(T^des − T0).
Then, we set p = p0 + δp. We would then check to be sure all the entries of p are in
the required range [0, 10]. If they are, we have our next guess. (If not, we would have
added regularization of the form ‖p − 5 · 1‖² to the minimization of ‖T0 + Aδp − T^des‖².
But for the given data, the new p is within the required range.)
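The whole linearize-then-solve step can be sketched end to end as below. The function F, the gain matrix B and the target Tdes = 500 are hypothetical stand-ins (the real device is only available through surface_heating_sim), so the numbers this produces are not the ones quoted in this solution.

```matlab
% Hypothetical stand-in for the device; all values are assumptions.
B = 10 + 5*cos((1:100)'*(1:10)/7);        % made-up 100x10 gain matrix
F = @(p) 20 + B*p + 0.3*B*(p.^2);         % mildly nonlinear map R^10 -> R^100
Tdes = 500;                               % assumed target temperature
rms_err = @(p) norm(F(p) - Tdes)/10;      % RMS error, sqrt(100) = 10
p0 = 3*ones(10,1);                        % starting point
T0 = F(p0);
gamma = 0.5;                              % perturbation level
A = zeros(100,10);
for i = 1:10
    p = p0; p(i) = p0(i) + gamma;         % perturb lamp i only
    A(:,i) = (F(p) - T0)/gamma;           % ith column of A
end
delp = A\(Tdes - T0);                     % minimizes ||T0 + A*delp - Tdes||
p1 = p0 + delp;                           % next guess
in_range = all(p1 >= 0 & p1 <= 10);       % check the power limits
e0 = rms_err(p0);                         % error before the update
e1 = rms_err(p1);                         % error after the update
```

The least-squares step can only reduce the error predicted by the linear model; whether the true error e1 improves on e0, and whether p1 stays in [0, 10], has to be checked by running the experiment, exactly as described above.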
This new value of p works really well. It achieves an RMS error e = 34.5, about 4 times
better than with constant powers, and we only needed to call the surface_heating_sim
function about 30 times. Our value of p is
p = (1.11, 1.14, 3.27, 2.42, 6.74, 2.86, 2.05, 1.03, 7.11, 2.85).
Of course you get slightly different values of p, and e, depending on what you choose
as your starting point p0 , and what value of γ you use, or if you used least-squares to
estimate A. They all achieved e in the range of 34-38 degrees or so. You can even carry
out the method above iteratively, by finding a new linearized model of F at the p value
you found (but presumably with a small value of γ since you don’t expect to change
your powers much this second time). This doesn’t reduce the value of e very much (if
at all).
The following code carries out the method described above.
% Now, try all lamps at the same power for different power levels.
for alpha=0:10
p = alpha*ones(10,1);
T = surface_heating_sim(p);
title (sprintf('Surface temperature, all lamps at power %i',alpha));
fprintf('RMS error: %g \n',norm(T-Tdes)/10);
end
% Populate matrix A.
A=[];
gamma = 0.5; % Power setting perturbation.
for i=1:10
p=p0; p(i) = p0(i)+gamma;
A = [A (surface_heating_sim(p)-T0)./gamma];
end
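% Least-squares step delp = pinv(A)*(Tdes-T0) from the derivation in the text.
% This line is reconstructed from that formula; p0 and T0 = surface_heating_sim(p0)
% are assumed to be set earlier in the script.
delp = A\(Tdes-T0);
p = p0+delp
e = norm(surface_heating_sim(p) - Tdes)/10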
[Figure: "Surface temperature" image over the 10 × 10 grid, with a color scale.]
7.3 Curve-smoothing. We are given a function F : [0, 1] → R (whose graph gives a curve in
R^2). Our goal is to find another function G : [0, 1] → R, which is a smoothed version
of F. We’ll judge the smoothed version G of F in two ways:
• Mean-square deviation from F, defined as
$$D = \int_0^1 (F(t) - G(t))^2 \, dt.$$
You can assume that n is chosen large enough to represent the functions well. Using
this representation we will use the following objectives, which approximate the ones
defined for the functions above:
gives a simple approximation of G′′(i/n). You will only work with this approximate
version of the problem, i.e., the vectors f and g and the objectives c and d.
(a) Explain how to find g that minimizes d+µc, where µ ≥ 0 is a parameter that gives
the relative weighting of sum-square curvature compared to sum-square deviation.
Does your method always work? If there are some assumptions you need to make
(say, on rank of some matrix, independence of some vectors, etc.), state them
clearly. Explain how to obtain the two extreme cases: µ = 0, which corresponds
to minimizing d without regard for c, and also the solution obtained as µ → ∞
(i.e., as we put more and more weight on minimizing curvature).
(b) Get the file curve_smoothing.m from the course web site. This file defines a
specific vector f that you will use. Find and plot the optimal trade-off curve
between d and c. Be sure to identify any critical points (such as, for example, any
intersection of the curve with an axis). Plot the optimal g for the two extreme
cases µ = 0 and µ → ∞, and for three values of µ in between (chosen to show the
trade-off nicely). On your plots of g, be sure to include also a plot of f , say with
dotted line type, for reference. Submit your matlab code.
Solution.
(a) Let’s start with the two extreme cases. When µ = 0, finding g to minimize d + µc
reduces to finding g to minimize d. Since d is a sum of squares, d ≥ 0. Choosing
g = f trivially achieves d = 0. This makes perfect sense: to minimize the
deviation measure, just take the smoothed version to be the same as the original
function. This yields zero deviation, naturally, but also, it yields no smoothing!
Next, consider the extreme case where µ → ∞. This means we want to make the
curvature as small as possible. Can we drive it to zero? The answer is yes, we
can: the curvature is zero if and only if g is an affine function, i.e., has the form
gi = ai + b for some constants a and b. There are lots of vectors g that have this
form; in fact, we have one for every pair of numbers a, b. All of these vectors g
make c zero. Which one do we choose? Well, even if µ is huge, we still have a
small contribution to d + µc from d, so among all g that make c = 0, we’d like the
one that minimizes d. Basically, we want to find the best affine approximation,
in the sum of squares sense, to f . We want to find a and b that minimize
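$$
\left\| f - A \begin{bmatrix} a \\ b \end{bmatrix} \right\|
\quad \text{where} \quad
A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ \vdots & \vdots \\ n & 1 \end{bmatrix}.
$$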
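For n ≥ 2, A is full rank (and skinny for n > 2), and a and b can be found using least-squares.
Specifically, $[a \ b]^T = (A^T A)^{-1} A^T f$. In the general case, minimizing d + µc is the
same as choosing g to minimize
$$
\left\| \frac{1}{\sqrt{n}} I g - \frac{1}{\sqrt{n}} f \right\|^2
+ \mu \left\| \underbrace{\frac{n^2}{\sqrt{n-2}}
\begin{bmatrix}
-1 & 2 & -1 & 0 & \cdots & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2 & -1
\end{bmatrix}}_{S} \, g \right\|^2 .
$$
This is an ordinary least-squares problem, minimize $\|\tilde A g - \tilde y\|^2$, with
$$
\tilde A = \begin{bmatrix} \frac{1}{\sqrt{n}} I \\ \sqrt{\mu}\, S \end{bmatrix},
\qquad
\tilde y = \begin{bmatrix} \frac{1}{\sqrt{n}} f \\ 0 \end{bmatrix},
$$
so $g = (\tilde A^T \tilde A)^{-1} \tilde A^T \tilde y$. The inverse of $\tilde A^T \tilde A$ always exists, because the
block $(1/\sqrt{n}) I$ makes $\tilde A$ full rank. The expression can
also be written as $g = (I/n + \mu S^T S)^{-1} f/n$.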
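The two ways of computing g can be compared numerically. The sketch below uses a made-up test vector f and an arbitrary weight mu (both are assumptions, not the data from curve_smoothing.m), solves the stacked least-squares problem, and compares it with the closed-form expression. It also checks that an affine vector has zero curvature.

```matlab
% Made-up test data; n, f and mu are assumptions for illustration.
n = 50;
f = sin(2*pi*(1:n)'/n) + 0.3*cos(7*(1:n)');
mu = 1e-5;
% Scaled second-difference matrix S, (n-2) x n.
S = toeplitz([-1; zeros(n-3,1)], [-1 2 -1 zeros(1,n-3)]);
S = S*n^2/sqrt(n-2);
I = eye(n);
% Stacked least-squares formulation.
A_tilde = [I/sqrt(n); sqrt(mu)*S];
y_tilde = [f/sqrt(n); zeros(n-2,1)];
g_stacked = A_tilde\y_tilde;
% Closed form g = (I/n + mu*S'*S)^{-1} f/n.
g_closed = (I/n + mu*(S'*S))\(f/n);
gap = norm(g_stacked - g_closed);         % should be at rounding level
% An affine vector g_i = a*i + b has zero curvature.
g_aff = 2*(1:n)' + 1;
c_aff = norm(S*g_aff)^2;
```

In practice the backslash form on the stacked system is usually preferred over forming S'*S explicitly, since it avoids squaring the condition number.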
(b) The following plots show the optimal trade-off curve and the optimal g corre-
sponding to representative µ values on the curve.
[Figure: Optimal tradeoff curve. Horizontal axis: sum-square deviation; vertical axis: sum-square curvature (×10^6). The curve meets the curvature axis at 1.9724e06 (µ = 0, zero deviation) and the deviation axis at 0.3347 (µ → ∞, zero curvature).]
[Figure: f and the optimal g for u = 0, u = 10e−7, u = 10e−5, u = 10e−4, and u = infinity, plotted against the index i.]
The following matlab code finds and plots the optimal trade-off curve between d
and c. It also finds and plots the optimal g for representative values of µ. As
expected, when µ = 0, g = f and no smoothing occurs. At the other extreme, as
µ goes to infinity, we get an affine approximation of f . Intermediate values of µ
correspond to approximations of f with different degrees of smoothness.
close all;
clear all;
curve_smoothing    % defines n and f
% scaled second-difference matrix S, (n-2) x n
S = toeplitz([-1; zeros(n-3,1)],[-1 2 -1 zeros(1,n-3)]);
S = S*n^2/(sqrt(n-2));
I = eye(n);
% extreme case u = 0: g = f, zero deviation
g_no_deviation = f;
error_curvature(1) = norm(S*g_no_deviation)^2;
error_deviation(1) = 0;
% sweep the weight u to trace the trade-off curve
u = logspace(-8,-3,30);
for i = 1:length(u)
A_tilde = [1/sqrt(n)*I; sqrt(u(i))*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g = A_tilde\y_tilde;
error_deviation(i+1) = norm(1/sqrt(n)*I*g-f/sqrt(n))^2;
error_curvature(i+1) = norm(S*g)^2;
end
% extreme case u -> infinity: best affine fit, zero curvature
a1 = 1:n;
a1 = a1';
a2 = ones(n,1);
A = [a1 a2];
affine_param = A\f;
g_no_curvature = A*affine_param;
error_deviation(length(u)+2) = 1/n*norm(g_no_curvature-f)^2;
error_curvature(length(u)+2) = 0;
figure(1);
plot(error_deviation, error_curvature);
xlabel('Sum-square deviation (y intercept = 0.3347)');
ylabel('Sum-square curvature (x intercept = 1.9724e06)');
title('Optimal tradeoff curve ');
print -depsc curve_extreme.eps;
% three intermediate values of u
u1 = 10e-7;
A_tilde = [1/sqrt(n)*I;sqrt(u1)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g1 = A_tilde\y_tilde;
u2 = 10e-5;
A_tilde = [1/sqrt(n)*I;sqrt(u2)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g2 = A_tilde\y_tilde;
u3 = 10e-4;
A_tilde = [1/sqrt(n)*I;sqrt(u3)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g3 = A_tilde\y_tilde;
figure(3);
plot(f,'*');
hold on;
plot(g_no_deviation);
plot(g1,'--');
plot(g2,'-.');
plot(g3,'-');
plot(g_no_curvature,':');
axis tight;
legend('f','u = 0','u = 10e-7', 'u = 10e-5', 'u = 10e-4','u = infinity','Location','best');
title('Curves illustrating the trade-off');
print -depsc curve_tradeoff.eps;
8.3 Minimum fuel and minimum peak input solutions. Suppose A ∈ R^{m×n} is fat and
full rank, so there are many x’s that satisfy Ax = y. In lecture we encountered the
least-norm solution given by $x_{\mathrm{ln}} = A^T (AA^T)^{-1} y$. This solution has the minimum
(Euclidean) norm among all solutions of Ax = y. In many applications we want to
minimize another norm of x (i.e., measure of size of x) subject to Ax = y. Two
common examples are the 1-norm and ∞-norm, which are defined as
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad \|x\|_\infty = \max_{i=1,\ldots,n} |x_i|.$$
The 1-norm, for example, is often a good measure of fuel use; the ∞-norm is the peak
of the vector or signal x. There is no simple formula for the least 1-norm or ∞-norm
solution of Ax = y, like there is for the least (Euclidean) norm solution. They can be
computed very easily, however. (That’s one of the topics of EE364.) The analysis is a
bit trickier as well, since we can’t just differentiate to verify that we have the minimizer.
For example, how would you know that a solution of Ax = y has minimum 1-norm?
In this problem you will explore this idea. First verify the following inequality, which
is like the Cauchy-Schwarz inequality (but even easier to prove): for any v, w ∈ R^p,
the following inequality holds: $w^T v \le \|v\|_\infty \|w\|_1$. From this inequality it follows that
whenever v ≠ 0,
$$\|w\|_1 \ge \frac{w^T v}{\|v\|_\infty}.$$
Now let z be any solution of Az = y, and let λ ∈ R^m be such that $A^T\lambda \ne 0$. Explain
why we must have
$$\|z\|_1 \ge \frac{\lambda^T y}{\|A^T \lambda\|_\infty}.$$
Thus, any solution of Az = y must have 1-norm at least as big as the righthand side
expression. Therefore if you can find $x_{\mathrm{mf}}$ ∈ R^n (mf stands for minimum fuel) and
λ ∈ R^m such that $A x_{\mathrm{mf}} = y$ and
$$\|x_{\mathrm{mf}}\|_1 = \frac{\lambda^T y}{\|A^T \lambda\|_\infty},$$
then $x_{\mathrm{mf}}$ is a minimum fuel solution. (Explain why.) Methods for computing $x_{\mathrm{mf}}$ and
the mysterious vector λ are described in EE364. In the rest of this problem, you’ll use
these ideas to verify a statement made during lecture. Now consider the problem from
the lecture notes of a unit mass acted on by forces x1 , . . . , x10 for one second each. The
mass starts at position p(0) = 0 with zero velocity and is required to satisfy p(10) = 1,
ṗ(10) = 0. There are, of course, many force vectors that satisfy these requirements. In
the lecture notes, you can see a plot of the least (Euclidean) norm force profile. In class
I stated that the minimum fuel solution is given by xmf = (1/9, 0, . . . , 0, −1/9), i.e., an
accelerating force at the beginning, 8 seconds of coasting, and a (braking) force at the
end to decelerate the mass to zero velocity at t = 10. Prove this. Hint: try λ = (1, −5).
Verify that the 1-norm of xmf is less than the 1-norm of xln , the (Euclidean) least-norm
solution. Feel free to use matlab. There are several convenient ways to find the 1-
and ∞-norm of a vector z, e.g., norm(z,1) and norm(z,inf) or sum(abs(z)) and
max(abs(z)). One last question, for fun: what do you think is the minimum peak
force vector xmp ? How would you verify that a vector xmp (mp for minimum peak) is
a minimum ∞-norm solution of Ax = y? This input, by the way, is very widely used
in practice. It is (basically) the input used in a disk drive to move the head from one
track to another, while respecting a maximum possible current in the disk drive motor
coil. Hints:
Solution.
First, we will prove the inequality $|w^T v| \le \|v\|_\infty \|w\|_1$ for any v, w ∈ R^p:
$$
\begin{aligned}
|w^T v| = \Big| \sum_i w_i v_i \Big| &\le \sum_i |w_i v_i| \\
&= \sum_i |w_i| |v_i| \\
&\le \Big( \max_i |v_i| \Big) \sum_i |w_i| = \|v\|_\infty \|w\|_1,
\end{aligned}
$$
or
$$\|w\|_1 \ge \frac{|w^T v|}{\|v\|_\infty}$$
if v ≠ 0. Under what conditions will the equality hold? Note that
the equality in the first line holds if $w_i v_i \ge 0$ for i = 1, . . . , p, and the one on the third
line holds when $|v_i| = \max_j |v_j| = v_m$ for every i with $w_i \ne 0$. For both equalities to hold, we
get $v_i = v_m \,\mathrm{sgn}(w_i)$ whenever $w_i \ne 0$. Now we will use the above inequality to derive a lower bound on
$\|z\|_1$, for all z satisfying Az = y. Take transposes of both sides to get $z^T A^T = y^T$,
then multiply both sides on the right by a nonzero (but otherwise arbitrary) vector λ.
This yields $z^T A^T \lambda = y^T \lambda$. Now let w = z, and $v = A^T\lambda$ in the above inequality to get
$|z^T (A^T \lambda)| \le \|z\|_1 \|A^T \lambda\|_\infty$, and therefore
$$\|z\|_1 \ge \frac{|\lambda^T y|}{\|A^T \lambda\|_\infty}.$$
Thus, any solution of Az = y must have 1-norm at least as large as the righthand side
expression for any λ we pick. Hence, if there exists a z that satisfies the equality for
some value of λ, then z has the smallest possible 1-norm, and is the minimum fuel
solution. (In fact, this particular value of λ maximizes the righthand side, giving
the largest possible lower bound on the value $\|z\|_1$.) Now we can prove that
$x_{\mathrm{mf}} = \begin{bmatrix} 1/9 & 0 & \cdots & 0 & -1/9 \end{bmatrix}^T$ is the minimum fuel solution by showing that it achieves
the lower bound for the λ given in the hint. Using matlab:
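>> A = [9.5:-1:0.5; ones(1,10)];  % assumed: maps the 10 forces to (p(10), pdot(10)), as in the lecture notes
>> lambda = [1; -5];
>> y=[1;0];
>> x_mf = 1/9*[1; zeros(8,1);-1];
>> norm(x_mf,1)
ans =
0.2222
>> abs(lambda'*y)/norm(A'*lambda,inf)
ans =
0.2222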
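The problem also asks for a comparison with the (Euclidean) least-norm solution. A minimal sketch of that check follows. The matrix A is our reconstruction from the problem description: a unit force applied during second i contributes 10.5 − i to the final position and 1 to the final velocity.

```matlab
% A maps the force sequence x to [p(10); pdot(10)] for a unit mass
% (reconstructed from the problem description).
A = [9.5:-1:0.5; ones(1,10)];
y = [1; 0];
x_mf = [1/9; zeros(8,1); -1/9];           % claimed minimum fuel solution
x_ln = A'*((A*A')\y);                     % least (Euclidean) norm solution
feas_mf = norm(A*x_mf - y);               % both must satisfy Ax = y
feas_ln = norm(A*x_ln - y);
fuel_mf = norm(x_mf, 1);                  % 0.2222
fuel_ln = norm(x_ln, 1);                  % 0.3030
energy_mf = norm(x_mf);                   % Euclidean norms, for contrast
energy_ln = norm(x_ln);
```

As expected, x_mf uses less fuel (smaller 1-norm), while x_ln has the smaller Euclidean norm.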
In order to verify that a given $x_{\mathrm{mp}}$ is the minimum peak solution, we can proceed
similarly, starting from the same inequality we proved above, and just switching all
the 1-norms and ∞-norms to get
$$\|z\|_\infty \ge \frac{|\lambda^T y|}{\|A^T \lambda\|_1}.$$
We then show that $x_{\mathrm{mp}}$ and a certain value of λ (that is found by maximizing the
righthand side expression over λ) in fact satisfy the equality. For this problem, the
minimum peak force $x_{\mathrm{mp}}$ turns out to be constant at 1/25 for 5 seconds, and then a
constant −1/25 for the remaining 5 seconds. This is called a bang-bang input because
it consists of a maximum acceleration for 5 seconds, followed by maximum deceleration
for 5 seconds. So don’t make fun of people from the Boolean or ‘full accelerator/full
brake’ driving school; they’re just minimizing the peak force on the vehicle.
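The minimum peak claim can be checked in the same way as the minimum fuel one. In the sketch below, the choice λ = (1, −5) is our assumption (it is the vector from the minimum fuel hint, which happens to certify this case too), and A is the same reconstructed matrix mapping forces to final position and velocity.

```matlab
% Verify that the bang-bang input attains the infinity-norm lower bound.
A = [9.5:-1:0.5; ones(1,10)];             % forces -> [p(10); pdot(10)]
y = [1; 0];
lambda = [1; -5];                         % assumed certificate vector
x_mp = [ones(5,1); -ones(5,1)]/25;        % +1/25 for 5 s, then -1/25 for 5 s
feas = norm(A*x_mp - y);                  % x_mp must satisfy Ax = y
peak = norm(x_mp, inf);                   % 0.04
bound = abs(lambda'*y)/norm(A'*lambda, 1);  % 0.04
```

Since peak equals bound, no solution of Ax = y can have a smaller ∞-norm than x_mp.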