
Homework 3 Solutions

EE 263 Stanford University Summer 2018

July 12, 2018

1. Least-squares residuals. Suppose A is skinny and full-rank. Let x_ls be the least-squares
approximate solution of Ax = y, and let y_ls = Ax_ls. Show that the residual vector r = y − y_ls
satisfies

‖r‖^2 = ‖y‖^2 − ‖y_ls‖^2.

Also, give a brief geometric interpretation of this equality (just a couple of sentences, and
maybe a conceptual drawing).

Solution. Let us first show that r ⊥ y_ls. Since y_ls = Ax_ls = AA†y = A(A^T A)^{-1} A^T y,

y_ls^T r = y_ls^T (y − y_ls) = y_ls^T y − y_ls^T y_ls
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} A^T A (A^T A)^{-1} A^T y
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} (A^T A)(A^T A)^{-1} A^T y
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} A^T y
         = 0.

Thus ‖y‖^2 = ‖y_ls + r‖^2 = (y_ls + r)^T (y_ls + r) = ‖y_ls‖^2 + 2 y_ls^T r + ‖r‖^2 = ‖y_ls‖^2 + ‖r‖^2. Therefore

‖r‖^2 = ‖y‖^2 − ‖y_ls‖^2.

Geometrically, y_ls = Ax_ls is the orthogonal projection of y onto R(A), and the residual r is the part of y orthogonal to R(A), so y_ls, r, and y form a right triangle with hypotenuse y. By Pythagoras' theorem, ‖y‖^2 = ‖y_ls‖^2 + ‖r‖^2.

[Figure: right triangle with hypotenuse of length ‖y‖ from the origin, one leg of length ‖y_ls‖ along R(A) ending at y_ls = Ax_ls, and the other leg of length ‖r‖ perpendicular to R(A).]
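A quick matlab check of this identity on random data (the sizes below are arbitrary; any skinny full-rank A works):

% Numerical check of ||r||^2 = ||y||^2 - ||y_ls||^2.
A = randn(10, 3);
y = randn(10, 1);
x_ls = A \ y;          % least-squares solution
y_ls = A * x_ls;
r = y - y_ls;
% The two quantities below should agree to machine precision.
lhs = norm(r)^2;
rhs = norm(y)^2 - norm(y_ls)^2;
disp([lhs, rhs]);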

2. Least-squares model fitting. In this problem you will use least-squares to fit several
different types of models to a given set of input/output data. The data consist of a scalar
input sequence u, and a scalar output sequence y, for t = 1, . . . , N . You will develop several
different models that relate the signals u and y.

• Memoryless models. In a memoryless model, the output at time t, i.e., y(t), depends
only on the input at time t, i.e., u(t). Another common term for such a model is static.

constant model: y(t) = c0


static linear: y(t) = c1 u(t)
static affine: y(t) = c0 + c1 u(t)
static quadratic: y(t) = c0 + c1 u(t) + c2 u(t)2

• Dynamic models. In a dynamic model, y(t) depends on u(s) for some s ≠ t. We consider
some simple time-series models (see problem 2 in the reader), which are linear dynamic
models.
moving average (MA): y(t) = a_0 u(t) + a_1 u(t − 1) + a_2 u(t − 2)
autoregressive (AR): y(t) = a_0 u(t) + b_1 y(t − 1) + b_2 y(t − 2)
autoregressive moving average (ARMA): y(t) = a_0 u(t) + a_1 u(t − 1) + b_1 y(t − 1)
Note that in the AR and ARMA models, y(t) depends indirectly on all previous inputs,
u(s) for s < t, due to the recursive dependence on y(t − 1). For this reason, the AR and
ARMA models are said to have infinite memory. The MA model, on the other hand, has
a finite memory: y(t) depends only on the current and two previous inputs. (Another
term for this MA model is 3-tap system, where taps refer to taps on a delay line.)
Each of these models is specified by its parameters, i.e., the scalars c_i, a_i, b_i. For each of these
models, find the least-squares fit to the given data. In other words, find parameter values that
minimize the sum-of-squares of the residuals. For example, for the ARMA model, pick a_0, a_1,
and b_1 that minimize

Σ_{t=2}^{N} ( y(t) − a_0 u(t) − a_1 u(t − 1) − b_1 y(t − 1) )^2.

(Note that we start the sum at t = 2, which ensures that u(t − 1) and y(t − 1) are defined.)
For each model, give the root-mean-square (RMS) residual, i.e., the square root of the mean
of the optimal residual squared. Plot the output ŷ predicted by your model, and plot the
residual (which is y − ŷ). The data for this problem are available from the class web page in
the file uy_data.json. This file contains the vectors u and y and the scalar N (the length of
the vectors). Now you can plot u, y, etc. Note: the dataset u, y is not generated by any of
the models above. It is generated by a nonlinear recursion, which has infinite memory.

Solution. For each of the given models, we get a linear relationship between the outputs
and the unknown parameters. For example, for the constant model we have

[ y(1) ]   [ 1 ]
[ y(2) ]   [ 1 ]
[  ⋮   ] = [ ⋮ ] c_0
[ y(N) ]   [ 1 ]
Or for the static quadratic model

[ y(1) ]   [ 1  u(1)  u(1)^2 ]
[ y(2) ]   [ 1  u(2)  u(2)^2 ] [ c_0 ]
[  ⋮   ] = [ ⋮    ⋮     ⋮    ] [ c_1 ]
[ y(N) ]   [ 1  u(N)  u(N)^2 ] [ c_2 ]
Similarly, for the autoregressive moving average model we get

[ y(2) ]   [ u(2)  u(1)    y(1)   ]
[ y(3) ]   [ u(3)  u(2)    y(2)   ] [ a_0 ]
[  ⋮   ] = [  ⋮     ⋮        ⋮    ] [ a_1 ]
[ y(N) ]   [ u(N)  u(N−1)  y(N−1) ] [ b_1 ]

(Note that for this model we start from y(2), since u(0) and y(0) are undefined.) All of
the above are of the form y = Ax, where y is the output sequence and x is the vector of
corresponding unknown coefficients. The goal is to find the coefficients that minimize the
sum-of-squares of the residuals. This is nothing but the least-squares solution of y = Ax,
given by x_ls = (A^T A)^{-1} A^T y. Then using x_ls, the model output can be computed as ŷ = A x_ls.
This can be done easily in matlab:
uy_data; % read u, y, N
A1 = ones(N,1); A2 = u; A3 = [ones(N,1), u]; A4 = [ones(N,1), u, u.^2];
x1 = A1\y; y1_hat = A1*x1; r1 = y - y1_hat; rms1 = sqrt(mean(r1.^2))
x2 = A2\y; y2_hat = A2*x2; r2 = y - y2_hat; rms2 = sqrt(mean(r2.^2))
x3 = A3\y; y3_hat = A3*x3; r3 = y - y3_hat; rms3 = sqrt(mean(r3.^2))
x4 = A4\y; y4_hat = A4*x4; r4 = y - y4_hat; rms4 = sqrt(mean(r4.^2))
A5 = [u(3:N), u(2:N-1), u(1:N-2)]; y5 = y(3:N);   % MA
A6 = [u(3:N), y(2:N-1), y(1:N-2)]; y6 = y(3:N);   % AR
A7 = [u(2:N), u(1:N-1), y(1:N-1)]; y7 = y(2:N);   % ARMA
x5 = A5\y5; y5_hat = A5*x5; r5 = y5 - y5_hat; rms5 = sqrt(mean(r5.^2))
x6 = A6\y6; y6_hat = A6*x6; r6 = y6 - y6_hat; rms6 = sqrt(mean(r6.^2))
x7 = A7\y7; y7_hat = A7*x7; r7 = y7 - y7_hat; rms7 = sqrt(mean(r7.^2))
figure(1);
subplot(211); plot(y1_hat,'b'); grid on; hold on; plot(r1,'--r'); hold off;
title('constant'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y2_hat,'b'); grid on; hold on; plot(r2,'--r'); hold off;
title('linear'); xlabel('n'); ylabel('y_{hat}');
figure(2);
subplot(211); plot(y3_hat,'b'); grid on; hold on; plot(r3,'--r'); hold off;
title('affine'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y4_hat,'b'); grid on; hold on; plot(r4,'--r'); hold off;
title('quadratic'); xlabel('n'); ylabel('y_{hat}');
figure(3);
subplot(211); plot(y5_hat,'b'); grid on; hold on; plot(r5,'--r'); hold off;
title('MA'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y6_hat,'b'); grid on; hold on; plot(r6,'--r'); hold off; % was r7: plot the AR residual
title('AR'); xlabel('n'); ylabel('y_{hat}');
figure(4);
subplot(211); plot(y7_hat,'b'); grid on; hold on; plot(r7,'--r'); hold off;
title('ARMA'); xlabel('n'); ylabel('y_{hat}');
figure(1); print uy_1
figure(2); print uy_2
figure(3); print uy_3
figure(4); print uy_4

And the following RMS values for the residuals are obtained: constant: 1.1181, linear: 0.5940,
affine: 0.5210, quadratic: 0.5179; MA: 0.2504, AR: 0.1783, ARMA: 0.1853. For the memoryless
models, the error decreases as the model becomes more complicated. However, the
models with memory perform significantly better. Among these, the error decreases with the
introduction of autoregressive terms. Among the memoryless models, the affine model would
be a good choice, since the more complicated quadratic model yields only slightly smaller
residuals. Overall, the autoregressive model seems to do a good job. Of course, to choose a
model, we should really validate on another batch of data. Note that in this problem we were
only concerned with fitting the model to the data, and not with 'validating' the model, i.e., how well
the model will work for inputs other than the ones used for fitting it. The following
plots show ŷ (solid) and the residuals y − ŷ (dashed):
[Plots, one per model (constant, linear, affine, quadratic, moving average, autoregressive, autoregressive moving average): ŷ (solid) and the residual (dashed) versus n, for n = 0 to 100.]

3. Identifying a system from input/output data. We consider the standard setup:

y = Ax + v,

where A ∈ R^{m×n}, x ∈ R^n is the input vector, y ∈ R^m is the output vector, and v ∈ R^m is the
noise or disturbance. We consider here the problem of estimating the matrix A, given some
input/output data. Specifically, we are given the following:

x^(1), . . . , x^(N) ∈ R^n,    y^(1), . . . , y^(N) ∈ R^m.

These represent N samples or observations of the input and output, respectively, possibly
corrupted by noise. In other words, we have

y^(k) = Ax^(k) + v^(k),   k = 1, . . . , N,

where the v^(k) are assumed to be small. The problem is to estimate the (coefficients of the) matrix
A, based on the given input/output data. You will use a least-squares criterion to form an
estimate Â of A. Specifically, you will choose as your estimate Â the matrix that minimizes
the quantity

J = Σ_{k=1}^{N} ‖Ax^(k) − y^(k)‖^2

over A.

a) Explain how to do this. If you need to make an assumption about the input/output
data to make your method work, state it clearly. You may want to use the matrices
X ∈ R^{n×N} and Y ∈ R^{m×N} given by

X = [ x^(1) · · · x^(N) ],   Y = [ y^(1) · · · y^(N) ]

in your solution.

b) On the course web site you will find some input/output data for an instance of this
problem in the file sysid_data.json. Loading this file will assign values to m, n,
and N, and create two matrices that contain the input and output data, respectively. The
n × N matrix variable X contains the input data x^(1), . . . , x^(N) (i.e., the first column of X
contains x^(1), etc.). Similarly, the m × N matrix Y contains the output data y^(1), . . . , y^(N).
You must give your final estimate Â, your source code, and also give an explanation of
what you did.

Solution.

a) We start by expressing the objective function J as

J = Σ_{k=1}^{N} ‖Ax^(k) − y^(k)‖^2
  = Σ_{k=1}^{N} Σ_{i=1}^{m} (Ax^(k) − y^(k))_i^2
  = Σ_{k=1}^{N} Σ_{i=1}^{m} (a_i^T x^(k) − y_i^(k))^2
  = Σ_{i=1}^{m} ( Σ_{k=1}^{N} (a_i^T x^(k) − y_i^(k))^2 ),

where a_i^T is the ith row of A. The last expression shows that J is a sum of expressions J_i
(shown in parentheses), each of which only depends on a_i. This means that to minimize
J, we can minimize each of these expressions separately. That makes sense: we can
estimate the rows of A separately. Now let's see how to minimize

J_i = Σ_{k=1}^{N} (a_i^T x^(k) − y_i^(k))^2,

which is the contribution to J from the ith row of A. First we write it as

      ‖ [ x^(1)T ]       [ y_i^(1) ] ‖^2
J_i = ‖ [   ⋮    ] a_i − [    ⋮    ] ‖
      ‖ [ x^(N)T ]       [ y_i^(N) ] ‖

Now that we have the problem in the standard least-squares format, we're pretty much
done. Using the matrix X ∈ R^{n×N} given by

X = [ x^(1) · · · x^(N) ],

we can express the estimate as

â_i = (XX^T)^{-1} X [ y_i^(1); . . . ; y_i^(N) ].

(For this to work we need XX^T to be invertible, i.e., X must have rank n; in particular
this requires N ≥ n.) Using the matrix Y ∈ R^{m×N} given by

Y = [ y^(1) · · · y^(N) ],

we can express the estimate of A as

Â^T = (XX^T)^{-1} X Y^T.

Transposing this gives the final answer:

Â = Y X^T (XX^T)^{-1}.

b) Once you have the neat formula found above, it's easy to get matlab to compute the
estimate. It's a little inefficient, but perfectly correct, to simply use

Ahat = Y*X'*inv(X*X');

This yields the estimate

    [ 2.03  5.02   5.01 ]
    [ 0.01  7      1.01 ]
    [ 7.04  0      6.94 ]
    [ 7     3.98   4    ]
Â = [ 9.01  1.04   7    ]
    [ 4.01  3.96   9.03 ]
    [ 4.99  6.97   8.03 ]
    [ 7.94  6.09   3.02 ]
    [ 0.01  8.97  −0.04 ]
    [ 1.06  8.02   7.03 ]

Once you've got Â, it's a good idea to check the optimal residual J, just to make sure it's
reasonable, by comparing it to

Σ_{k=1}^{N} ‖y^(k)‖^2.

Here we get J = (64.5)^2, which is around 4.08% of this quantity. There are several other ways to compute Â in
matlab. You can calculate the rows of Â one at a time, using

aihat = (X'\(Y(i,:)'))';   % the ith row of Ahat, for each i = 1, ..., m

In fact, the backslash operator in matlab solves multiple least-squares problems at once,
so you can use

AhatT = X' \ (Y');
Ahat = AhatT';

or

Ahat = (X'\(Y'))';

In any case, it’s not exactly a long matlab program . . .

4. Curve-smoothing. We are given a function F : [0, 1] → R (whose graph gives a curve in
R^2). Our goal is to find another function G : [0, 1] → R, which is a smoothed version of F.
We'll judge the smoothed version G of F in two ways:

• Mean-square deviation from F, defined as

D = ∫_0^1 (F(t) − G(t))^2 dt.

• Mean-square curvature, defined as

C = ∫_0^1 G''(t)^2 dt.

We want both D and C to be small, so we have a problem with two objectives. In general
there will be a trade-off between the two objectives. At one extreme, we can choose G = F ,
which makes D = 0; at the other extreme, we can choose G to be an affine function (i.e.,
to have G''(t) = 0 for all t ∈ [0, 1]), in which case C = 0. The problem is to identify the
optimal trade-off curve between C and D, and explain how to find smoothed functions G
on the optimal trade-off curve. To reduce the problem to a finite-dimensional one, we will
represent the functions F and G (approximately) by vectors f, g ∈ R^n, where

f_i = F(i/n),   g_i = G(i/n).

You can assume that n is chosen large enough to represent the functions well. Using this
representation we will use the following objectives, which approximate the ones defined for the
functions above:

• Mean-square deviation, defined as

d = (1/n) Σ_{i=1}^{n} (f_i − g_i)^2.

• Mean-square curvature, defined as

c = (1/(n−2)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2.

In our definition of c, note that

(g_{i+1} − 2g_i + g_{i−1}) / (1/n^2)

gives a simple approximation of G''(i/n). You will only work with this approximate version
of the problem, i.e., the vectors f and g and the objectives c and d.
a) Explain how to find g that minimizes d + µc, where µ ≥ 0 is a parameter that gives
the relative weighting of sum-square curvature compared to sum-square deviation. Does
your method always work? If there are some assumptions you need to make (say, on
rank of some matrix, independence of some vectors, etc.), state them clearly. Explain
how to obtain the two extreme cases: µ = 0, which corresponds to minimizing d without
regard for c, and also the solution obtained as µ → ∞ (i.e., as we put more and more
weight on minimizing curvature).

b) Get the file curve_smoothing.json from the course web site. This file defines a specific
vector f that you will use. Find and plot the optimal trade-off curve between d and c.
Be sure to identify any critical points (such as, for example, any intersection of the curve
with an axis). Plot the optimal g for the two extreme cases µ = 0 and µ → ∞, and for
three values of µ in between (chosen to show the trade-off nicely). On your plots of g,
be sure to include also a plot of f , say with dotted line type, for reference. Submit your
matlab code.

Solution.
a) Let’s start with the two extreme cases. When µ = 0, finding g to minimize d + µc
reduces to finding g to minimize d. Since d is a sum of squares, d ≥ 0. Choosing g = f
trivially achieves d = 0. This makes perfect sense: to minimize the deviation measure,
just take the smoothed version to be the same as the original function. This yields zero
deviation, naturally, but also, it yields no smoothing! Next, consider the extreme case
where µ → ∞. This means we want to make the curvature as small as possible. Can
we drive it to zero? The answer is yes, we can: the curvature is zero if and only if g is
an affine function, i.e., has the form gi = ai + b for some constants a and b. There are
lots of vectors g that have this form; in fact, we have one for every pair of numbers a, b.

All of these vectors g make c zero. Which one do we choose? Well, even if µ is huge, we
still have a small contribution to d + µc from d, so among all g that make c = 0, we'd
like the one that minimizes d. Basically, we want to find the best affine approximation,
in the sum of squares sense, to f. We want to find a and b that minimize

        [ a ]                  [ 1  1 ]
‖ f − A [ b ] ‖,   where   A = [ 2  1 ]
                               [ 3  1 ]
                               [ ⋮  ⋮ ]
                               [ n  1 ].

For n ≥ 2, A is skinny and full rank, and a and b can be found using least-squares.
Specifically, [a b]^T = (A^T A)^{-1} A^T f. In the general case, minimizing d + µc is the same
as choosing g to minimize

‖ (1/√n) Ig − (1/√n) f ‖^2 + µ ‖ S g ‖^2,

where S ∈ R^{(n−2)×n} is the scaled second-difference matrix

                   [ −1  2 −1  0  · · ·  0 ]
S = (n^2/√(n−2)) · [  0 −1  2 −1  · · ·  0 ]
                   [  ⋮          ⋱  ⋱   ⋮ ]
                   [  0  0  · · · −1  2 −1 ].

This is a multi-objective least-squares problem. The minimizing g is

g = (Ã^T Ã)^{-1} Ã^T ỹ,   where   Ã = [ (1/√n) I ]   and   ỹ = [ (1/√n) f ]
                                      [   √µ S   ]             [     0    ].

The matrix Ã^T Ã is always invertible, because the top block (1/√n)I of Ã is full rank. The
expression can also be written as g = ((1/n) I + µ S^T S)^{-1} (1/n) f.
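Equivalently, for a given µ, the normal-equations form can be solved directly; a minimal matlab sketch, assuming f and n are already loaded and using the scaled S defined above:

% Solve ((1/n) I + mu S'S) g = (1/n) f for one value of mu.
S = toeplitz([-1; zeros(n-3,1)], [-1 2 -1 zeros(1,n-3)]);  % second differences
S = S * n^2 / sqrt(n-2);                                   % scale factor from the text
mu = 1e-5;                                                 % example weight (arbitrary)
g = ((1/n)*eye(n) + mu*(S'*S)) \ (f/n);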

b) The following plots show the optimal trade-off curve and the optimal g corresponding
to representative µ values on the curve.

[Plot: optimal trade-off curve, sum-square curvature c versus sum-square deviation d; the curve meets the c-axis (d = 0) at c ≈ 1.9724 × 10^6 and the d-axis (c = 0) at d ≈ 0.3347.]

[Plot: curves illustrating the trade-off: f, together with the optimal g for µ = 0, µ = 10e−7, µ = 10e−5, µ = 10e−4, and µ → ∞.]
The following matlab code finds and plots the optimal trade-off curve between d and c.
It also finds and plots the optimal g for representative values of µ. As expected, when
µ = 0, g = f and no smoothing occurs. At the other extreme, as µ goes to infinity, we
get an affine approximation of f . Intermediate values of µ correspond to approximations
of f with different degrees of smoothness.

close all;
clear all;
curve_smoothing
S = toeplitz([-1; zeros(n-3,1)], [-1 2 -1 zeros(1,n-3)]);
S = S*n^2/(sqrt(n-2));
I = eye(n);
g_no_deviation = f;
error_curvature(1) = norm(S*g_no_deviation)^2;
error_deviation(1) = 0;
u = logspace(-8,-3,30);
for i = 1:length(u)
    A_tilde = [1/sqrt(n)*I; sqrt(u(i))*S];
    y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
    g = A_tilde\y_tilde;
    error_deviation(i+1) = norm(1/sqrt(n)*I*g - f/sqrt(n))^2;
    error_curvature(i+1) = norm(S*g)^2;
end
a1 = 1:n;
a1 = a1';
a2 = ones(n,1);
A = [a1 a2];
affine_param = inv(A'*A)*A'*f;
for i = 1:n
    g_no_curvature(i) = affine_param(1)*i + affine_param(2);
end
g_no_curvature = g_no_curvature';
error_deviation(length(u)+2) = 1/n*norm(g_no_curvature - f)^2;
error_curvature(length(u)+2) = 0;
figure(1);
plot(error_deviation, error_curvature);
xlabel('Sum-square deviation (y intercept = 0.3347)');
ylabel('Sum-square curvature (x intercept = 1.9724e06)');
title('Optimal tradeoff curve');
print curve_extreme.eps;
u1 = 10e-7;
A_tilde = [1/sqrt(n)*I; sqrt(u1)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g1 = A_tilde\y_tilde;
u2 = 10e-5;
A_tilde = [1/sqrt(n)*I; sqrt(u2)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g2 = A_tilde\y_tilde;
u3 = 10e-4;
A_tilde = [1/sqrt(n)*I; sqrt(u3)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g3 = A_tilde\y_tilde;
figure(3);
plot(f,'*');
hold;
plot(g_no_deviation);
plot(g1,'--');
plot(g2,'-.');
plot(g3,'-');
plot(g_no_curvature,':');
axis tight;
legend('f','u = 0','u = 10e-7','u = 10e-5','u = 10e-4','u = infinity',0);
title('Curves illustrating the trade-off');
print curve_tradeoff.eps;

Note: Several exams had a typo that defined

c = (1/(n−1)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2

instead of

c = (1/(n−2)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2.

The solutions above reflect the second definition. Full credit was given for answers consistent
with either definition. Some common errors:

• Several students tried to approximate f using low-degree polynomials. While fitting f
to a polynomial does smooth f, it does not necessarily minimize d + µc for some µ ≥ 0,
nor does it illustrate the trade-off between curvature and deviation.

• In explaining how to find the g that minimizes d + µc as µ → ∞, many people correctly
observed that if g ∈ null(S), then c = 0. For full credit, however, solutions had to show
how to choose the vector in null(S) that minimizes d.

• Many people chose to zoom in on a small section of the trade-off curve rather than plot
the whole range from µ = 0 to µ → ∞. Those solutions received full credit provided they
calculated the intersections with the axes (i.e., provided they found the minimum value
for d + µc when µ = 0 and when µ → ∞).

5. Hovercraft with limited range. We have a hovercraft moving in the plane with two
thrusters, each pointing through the center of mass, exerting forces in the x and y directions

with 100% efficiency. The hovercraft has mass 1. The discretized equations of motion for the
hovercraft are

           [ 1 1 0 0 ]        [ 1/2  0  ]
x(t + 1) = [ 0 1 0 0 ] x(t) + [  1   0  ] [ u_1(t) ]
           [ 0 0 1 1 ]        [  0  1/2 ] [ u_2(t) ]
           [ 0 0 0 1 ]        [  0   1  ]

where x_1 and x_2 are the position and velocity in the x-direction, and x_3, x_4 are the position
and velocity in the y-direction. Here

u(t) = [ u_1(t) ]
       [ u_2(t) ]

is the force acting on the hovercraft for time in the interval [t, t + 1). Let the position of the
vehicle at time t be q(t) ∈ R^2.

a) The hovercraft starts at the origin. We'd like to apply thrust to make it move through
points p_1, p_2, p_3 at times t_1, t_2, t_3, where

p_1 = [ 1; −1/2 ],   p_2 = [ 0; 1 ],   p_3 = [ −3/2; 0 ],
t_1 = 6,   t_2 = 40,   t_3 = 50.

We will run the hovercraft on the time interval [0, 70]. We'd like to apply a sequence
of inputs u(0), u(1), . . . , u(70) to make the hovercraft position pass through the above
sequence of points at the specified times.
We would like to find the sequence of inputs that drives the hovercraft through the
desired points which has the minimum cost, given by the sum of the squares of the
forces:

Σ_{t=0}^{70} ‖u(t)‖^2.

To do this, pick A_hov and y_des to set this problem up as an equivalent minimum-norm
problem, where we would like to find the minimum-norm u_seq which satisfies

A_hov u_seq = y_des,

where u_seq is the sequence of force inputs

u_seq = [ u(0); u(1); . . . ; u(70) ].

Plot the trajectory of the hovercraft using this input, and the way-points p_1, . . . , p_3. Also
plot the optimal u against time.

b) Now we would like to compute the trade-off curve between the accuracy with which the
mass passes through the waypoints and the norm of the force used. Let our two objective
functions be

J_1 = Σ_{i=1}^{3} ‖q(t_i) − p_i‖^2 = ‖A_hov u_seq − y_des‖^2

and

J_2 = Σ_{t=0}^{70} ‖u(t)‖^2.

By minimizing the weighted sum

J_1 + µJ_2

for a range of values of µ, plot the trade-off curve of J_1 against J_2 showing the achievable
performance. To generate suitable values of µ, you may find the logspace command
useful in Matlab; you'll need to pick appropriate maximum and minimum values. This
trade-off curve shows how we can trade off between how accurately the hovercraft
passes through the waypoints and how much input energy is used.

c) For each of the following values of µ,

{ 10^{p/2} | p = −2, 0, 2, . . . , 10 },

plot the trajectories all on the same plot, together with the waypoints.
d) Now suppose we are controlling the hovercraft by radio control, and the maximum range
possible between the transmitter and receiver is 2 (in whatever units we are using for
distance.) Notice that, if we use the minimum-norm input then the hovercraft passes
out of range, both when making its first turn and on the final stretch (between times 50
and 70).
We’d like to do something about this, but trading off the input norm as above doesn’t
do the right thing; if µ is large then the hovercraft stays within range, but misses the
waypoints entirely; if µ is small then it comes close to the waypoints, but goes out of
range. Notice that this is particularly a problem on the final stretch between times 50
and 70; explain why this is.
e) One remedy for this problem is to solve a constrained multiobjective least-squares
problem. We would like to impose the constraint that

A_hov u_seq = y_des,

that is, achieve zero waypoint error J_1 = 0. We can attempt to keep the hovercraft in
range by trading off the sum of the squares of the position,

J_3 = Σ_{t=0}^{70} ‖q(t)‖^2,

against input cost J_2 subject to this constraint. To do this, we'll solve

minimize J_3 + γJ_2
subject to A_hov u_seq = y_des

First, find the matrix W so that the cost function is given by

J_3 + γJ_2 = ‖W u_seq‖^2.

f) Now we have a problem of the form

minimize ‖W u‖^2
subject to Au = y_des

This is called a weighted minimum-norm solution; the only difference from the usual
minimum-norm solution to Au = y_des is the presence of the matrix W, and when W = I
the optimal u is just given by u_opt = A† y_des. Show that the solution for general W is

u_opt = Σ^{-1} A^T (AΣ^{-1} A^T)^{-1} y_des

where Σ = W^T W. (One way to do this is using Lagrange multipliers.) Use this to solve
the remaining parts of this problem.

g) For each of the following values of γ,

{ 10^{p/2} | p = 0, 2, 4, . . . , 20 },

Plot the trajectories all on the same plot, together with the waypoints. Explain what
you see.

h) By trying different values of γ, you should be able to find a trajectory which just keeps the
hovercraft within range. Plot the trajectory of the hovercraft; what is the corresponding
value of γ? Is this the smallest-norm input u that just keeps the hovercraft within range,
and drives the hovercraft through the waypoints? Explain why, or why not.

i) For a range of values of γ, plot the trade-off curve of J3 against J2 showing the achievable
performance.

Solution.

a) Setting

C = [ 1 0 0 0 ]
    [ 0 0 1 0 ]

gives the position of the hovercraft at time t as

y(t) = Σ_{τ=0}^{t−1} C A^{t−1−τ} B u(τ).

The parameters for the minimum-norm problem are therefore

        [ CA^{t_1−1}B  CA^{t_1−2}B  · · ·  CB  0  · · ·  0 ]           [ p_1 ]
A_hov = [ CA^{t_2−1}B  CA^{t_2−2}B  · · ·  CB  0  · · ·  0 ],  y_des = [ p_2 ]
        [ CA^{t_3−1}B  CA^{t_3−2}B  · · ·  CB  0  · · ·  0 ]           [ p_3 ]

where in block row i the block CB sits in the columns corresponding to u(t_i − 1), followed
by zero blocks for u(t_i), . . . , u(70).
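A sketch of how A_hov can be assembled and the minimum-norm input computed in matlab (the matrices follow the problem statement; the loop structure and indexing are my own):

A = [1 1 0 0; 0 1 0 0; 0 0 1 1; 0 0 0 1];
B = [1/2 0; 1 0; 0 1/2; 0 1];
C = [1 0 0 0; 0 0 1 0];
ts = [6 40 50];                           % waypoint times
Th = 70;                                  % horizon: inputs u(0), ..., u(70)
Ahov = zeros(6, 2*(Th+1));
for w = 1:3
    for tau = 0:ts(w)-1
        Ahov(2*w-1:2*w, 2*tau+1:2*tau+2) = C * A^(ts(w)-1-tau) * B;
    end
end
ydes = [1; -1/2; 0; 1; -3/2; 0];          % stacked waypoints p1, p2, p3
useq = Ahov' * ((Ahov*Ahov') \ ydes);     % minimum-norm input sequence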
Solving this minimum-norm problem gives the optimal trajectory shown below.

[Plot: optimal trajectory in the plane, passing through the three waypoints.]

The corresponding optimal input sequence is below.

[Plot: optimal input components versus t, for 0 ≤ t ≤ 70.]

b) The weighted sum objective is

J_1 + µJ_2 = ‖ [ A_hov ] u_seq − [ y_des ] ‖^2
             ‖ [ √µ I  ]         [   0   ] ‖

where, as before, u_seq stacks the inputs u(0), . . . , u(70), and so the optimal input sequence
is given by

u_seq = [ A_hov ]† [ y_des ]
        [ √µ I  ]  [   0   ].

Choosing values of µ between 1 and 10^7 using mus = logspace(0,7,50), the trade-off
curve is shown below.
[Plot: trade-off curve of J_1 (vertical axis, from 0 to about 4.5) against J_2 (horizontal axis, from 0 to about 0.03).]
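A minimal sketch of how this curve can be computed, assuming Ahov and ydes have been built as in part (a):

% Sweep mu and record (J2, J1) for the weighted-sum solutions.
mus = logspace(0, 7, 50);
nu = size(Ahov, 2);                       % number of stacked input variables
J1 = zeros(1, length(mus)); J2 = zeros(1, length(mus));
for i = 1:length(mus)
    useq = [Ahov; sqrt(mus(i))*eye(nu)] \ [ydes; zeros(nu,1)];
    J1(i) = norm(Ahov*useq - ydes)^2;
    J2(i) = norm(useq)^2;
end
plot(J2, J1); xlabel('J2'); ylabel('J1');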

c) All of the trajectories together are shown below.

[Plot: trajectories in the plane for each value of µ, together with the waypoints.]

We can see clearly that increasing µ reduces the accuracy with which the trajectory
passes through the waypoints.

d) On the final stretch the input is zero, and so is unaffected by increasing µ. We were
attempting to use the heuristic 'keeping u small keeps x small', but this fails: when u = 0
the hovercraft just keeps going in a straight line.

e) We would like to minimize J_3 + γJ_2 subject to the constraints that the hovercraft moves
through the waypoints. Denote the sequence of positions of the hovercraft by

y_seq = [ y(0); y(1); . . . ; y(T) ]

where T = 70. Then we have

y_seq = T u_seq

where T is the block Toeplitz matrix

    [    0                                 ]
    [   CB         0                       ]
T = [  CAB        CB       0               ]
    [    ⋮                   ⋱             ]
    [ CA^{T−1}B  CA^{T−2}B  · · ·  CB   0  ].

Now the cost function is

J_3 + γJ_2 = ‖T u_seq‖^2 + γ‖u_seq‖^2 = ‖W u_seq‖^2

where

W = [  T   ]
    [ √γ I ].
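A sketch of how T (called Tmat below) and W might be assembled in matlab, assuming A, B, C from the problem statement and a chosen trade-off weight gamma:

% Block-Toeplitz map from stacked inputs u(0..70) to stacked positions y(0..70).
Th = 70;                                   % time horizon
Tmat = zeros(2*(Th+1), 2*(Th+1));
for t = 1:Th                               % the block row for y(0) stays zero
    for tau = 0:t-1
        Tmat(2*t+1:2*t+2, 2*tau+1:2*tau+2) = C * A^(t-1-tau) * B;
    end
end
W = [Tmat; sqrt(gamma)*eye(2*(Th+1))];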

f) We'd like to solve

minimize ‖W u‖^2
subject to Au = y_des

One way to solve this is using Lagrange multipliers; if we augment the cost function by
the Lagrange multipliers multiplied by the constraints, we have

L(u, λ) = u^T Σ u + λ^T (Au − y_des),

and the optimality conditions are

∂L/∂u = 2 u_opt^T Σ + λ^T A = 0,
∂L/∂λ = (A u_opt − y_des)^T = 0.

The first condition gives

u_opt = −(1/2) Σ^{-1} A^T λ,

and substituting this into the second we have

−(1/2) A Σ^{-1} A^T λ = y_des,

hence

λ = −2 (AΣ^{-1} A^T)^{-1} y_des

and

u_opt = Σ^{-1} A^T (AΣ^{-1} A^T)^{-1} y_des,

as desired.
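A direct matlab transcription of this formula (a sketch, assuming Ahov, ydes, and the W built in part (e)):

% Weighted minimum-norm solution: uopt = Sigma^{-1} A' (A Sigma^{-1} A')^{-1} ydes.
Sigma = W'*W;
Z = Sigma \ Ahov';                 % Sigma^{-1} A'
uopt = Z * ((Ahov*Z) \ ydes);      % apply (A Sigma^{-1} A')^{-1} to ydes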
g) The trajectory for a range of γ values is shown below. (Actually these are clearer on
separate plots.)

[Plots: trajectories in the plane, with the waypoints, for γ = 1, 100, 1000, 3000, 10000,
1e+06, 1e+08, and 1e+10.]

We can see the trade-off clearly; decreasing γ causes the hovercraft to try very hard to
stay close to the origin. Also notice the asymmetry caused by the different times at
which the hovercraft must be at the waypoints.

h) A good choice of γ is about 1.7 × 10^4. Here the trajectory just remains within
range, as shown below.

[Plot: trajectory that just stays within the disk of radius 2 while passing through the waypoints.]

This is not the smallest-norm u that keeps the hovercraft within range and drives the
hovercraft through the waypoints, because we are minimizing the sum of the squares
of ‖q(t)‖, rather than constraining each ‖q(t)‖ independently. You can see this in the
plot, since in the final stretch the hovercraft is expending extra effort to stay well within
range, and this excessive input could be reduced.
In fact, one can compute the exact optimum, but this is not required and not covered in
this course; (an approximation of) it is below.

[Plot: the (approximately) exact optimal trajectory.]

i) The trade-off is below.

[Plot: trade-off curve of J_3 (vertical axis, from 0 to 100) against J_2 (horizontal axis, from 0 to 2).]

Notice that the vertical asymptote occurs when J_2 ≈ 0.03; this is the minimum norm of
u which drives the hovercraft through the desired trajectory, as seen in part (b).

6. You Must Construct Additional Pylons. You are the Hierarch of the Baelaam charged
with maintaining the power levels of energizing pylons which power various structures in your
base of operations. Consider m structures powered by n pylons. Each structure's energy level
y_j, for j = 1, . . . , m, is given by

y_j(p) = log( Σ_{i=1}^{n} exp( p_i / d_{j,i}^2 ) ),

where p_i is the power level of the ith pylon and d_{j,i} is the distance between the jth
structure and the ith pylon (we choose log-sum-exp as a smooth approximation of the max
function). While each structure has some given target energy level R_j, it can handle some
deviation (either over or under); however, deviation will damage the Nexus Crystals that
act as energy conduits for the structure. Your goal as Hierarch is to find a set of pylon power
levels p ∈ R^n that minimizes the total square deviation, J, from the required energy levels:

J(p) = Σ_{j=1}^{m} (R_j − y_j(p))^2.

Your chief engineer proposes that you could linearize the y_j(p) function to find an update
algorithm that starts with some initial pylon power level and changes the power each step by
a small amount to reduce the total energy deviation J.

a) Find an update expression for the approximate energy level y(p + δp) as a linear dynamical
system, where y ∈ R^m is the vector of structure energy levels. I.e., find A and B such
that

y(p + δp) ≈ Ay(p) + Bδp.

We want to relate the energy level at p + δp to the energy level at p and the change in
energy from a small change in power δp.
Hint: B is not necessarily constant.

b) Derive an expression for the one-step change in power levels that minimizes

J(p + δp) = Σ_{j=1}^{m} (R_j − y_j(p + δp))^2

as a function of y(p), A, B, δp. Use the result of this minimization problem (the optimal
δp) to determine an update expression for p[k + 1] = p[k] + αδp, where α is a given step
size and k is the current iteration. If your method involves an inverse, explain what
conditions must hold in order for the inverse to exist.

c) Given the following list of required energy levels and locations of each structure and
pylon, apply your algorithm for 200 iterations with α = 0.01 and an initial power level
of p[0] = [20, 40, 20].
Plot the pylon power levels and the structure energy levels for each iteration, as well
as the deviation metric J. There should be 3 plots total. The algorithm should converge
in roughly 150-200 iterations.
Also report the final cost and pylon power levels.

Stucture_energy_goal = [10, 20, 5, 10, 5]
Strcuture_location = [2 8; 4 5; 6 8; 2 2; 4 1]
Pylon_location = [2 5; 3 4; 5 4]
p0 = [20 40 20]

For locations, each row is an x,y location.
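For part (c) one first needs the m × n matrix of structure-pylon distances; a minimal matlab sketch using the data above (the variable names SL, PL, and D are my own):

% Distance matrix: D(j,i) = distance from structure j to pylon i.
SL = [2 8; 4 5; 6 8; 2 2; 4 1];   % Strcuture_location
PL = [2 5; 3 4; 5 4];             % Pylon_location
m = size(SL,1); n = size(PL,1);
D = zeros(m, n);
for j = 1:m
    for i = 1:n
        D(j,i) = norm(SL(j,:) - PL(i,:));
    end
end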

Solution.

a) (6 points)
First we linearize the energy function by finding its Jacobian. Clearly A = I, but B is a
bit more involved:

                                [ ∂y_1/∂p_1  · · ·  ∂y_1/∂p_n ]
y(p + δp) ≈ y(p) + Bδp = y(p) + [     ⋮        ⋱        ⋮     ] δp.
                                [ ∂y_m/∂p_1  · · ·  ∂y_m/∂p_n ]

Now we find the partial derivatives:

∂y_j/∂p_i = ( exp(p_i/d_{j,i}^2) / Σ_{i'=1}^{n} exp(p_{i'}/d_{j,i'}^2) ) · (1/d_{j,i}^2).

We see that B is just a function of p. (Partial credit was given to those who attempted
to find B.)
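A matlab sketch of the energy levels and the Jacobian B, assuming the distance matrix D from above; the helper name energy_jacobian is my own (saved as energy_jacobian.m):

function [y, B] = energy_jacobian(p, D)
    % E(j,i) = exp(p_i / d_{j,i}^2), for power column vector p
    E = exp((ones(size(D,1),1) * p') ./ D.^2);
    s = sum(E, 2);                                % row sums of E
    y = log(s);                                   % log-sum-exp energy levels
    B = (E ./ (s * ones(1, length(p)))) ./ D.^2;  % B(j,i) as derived above
end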

b) (7 points)
We can substitute our expression for y[k + 1] here:

J[k + 1] = ‖R − (y(p) + Bδp)‖^2 = ‖z − Hδp‖^2,

where z = R − y[k] is the last error vector and H = B is the Jacobian matrix of the
energy function. This is a least-squares problem in δp, so the minimizing one-step change
is δp = (H^T H)^{-1} H^T z. For the inverse to exist, H^T H must be invertible, i.e., H must
have full column rank (in particular, n ≤ m). Thus the update to the pylon power is
given by

p[k + 1] = p[k] + α (H^T H)^{-1} H^T (R − y[k]).
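Putting the pieces together, the iteration might look like this (a sketch, using the hypothetical energy_jacobian helper and the distance matrix D from above):

% Gauss-Newton-style power update, as derived in part (b).
R = [10; 20; 5; 10; 5];           % Stucture_energy_goal
p = [20; 40; 20];                 % p0
alpha = 0.01;
for k = 1:200
    [y, H] = energy_jacobian(p, D);
    dp = (H'*H) \ (H'*(R - y));   % optimal one-step change
    p = p + alpha*dp;             % damped update
end
J = norm(R - energy_jacobian(p, D))^2   % final cost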

c) (7 points)
Final pylon power levels after 200 iterations: [81.3, 35.2, 34.7]; final cost: 3.7.
For 50 iterations: final pylon power levels [52.7, 40.3, 35.7]; final cost: 24.1.
Code (in Julia) that solves this problem is in code2.jl.

[Figure 1: pylon power levels]

[Figure 2: structure energy levels]

[Figure 3: cost]
