
Homework 3 Solutions

EE 263 Stanford University Summer 2018

July 12, 2018

1. Least-squares residuals. Suppose A is skinny and full-rank. Let x_ls be the least-squares
approximate solution of Ax = y, and let y_ls = Ax_ls. Show that the residual vector r = y − y_ls
satisfies

‖r‖^2 = ‖y‖^2 − ‖y_ls‖^2.

Also, give a brief geometric interpretation of this equality (just a couple of sentences, and
maybe a conceptual drawing).

Solution. Let us first show that r ⊥ y_ls. Since y_ls = Ax_ls = AA†y = A(A^T A)^{-1} A^T y,

y_ls^T r = y_ls^T (y − y_ls) = y_ls^T y − y_ls^T y_ls
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} A^T A (A^T A)^{-1} A^T y
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} (A^T A)(A^T A)^{-1} A^T y
         = y^T A(A^T A)^{-1} A^T y − y^T A(A^T A)^{-1} A^T y
         = 0.

Thus ‖y‖^2 = ‖y_ls + r‖^2 = (y_ls + r)^T (y_ls + r) = ‖y_ls‖^2 + 2 y_ls^T r + ‖r‖^2 = ‖y_ls‖^2 + ‖r‖^2. Therefore

‖r‖^2 = ‖y‖^2 − ‖y_ls‖^2.

Geometrically, y_ls = Ax_ls is the orthogonal projection of y onto R(A), and the residual r is the part of y orthogonal to R(A), so y_ls, r, and y form a right triangle with hypotenuse y. By Pythagoras' theorem, ‖y‖^2 = ‖y_ls‖^2 + ‖r‖^2.

[Figure: right triangle with hypotenuse of length ‖y‖ from the origin, one leg of length ‖y_ls‖ along R(A) ending at y_ls = Ax_ls, and the other leg of length ‖r‖ perpendicular to R(A).]
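A quick matlab check of this identity on random data (the sizes below are arbitrary; any skinny full-rank A works):

% Numerical check of ||r||^2 = ||y||^2 - ||y_ls||^2.
A = randn(10, 3);
y = randn(10, 1);
x_ls = A \ y;          % least-squares solution
y_ls = A * x_ls;
r = y - y_ls;
% The two quantities below should agree to machine precision.
lhs = norm(r)^2;
rhs = norm(y)^2 - norm(y_ls)^2;
disp([lhs, rhs]);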

2. Least-squares model fitting. In this problem you will use least-squares to fit several
different types of models to a given set of input/output data. The data consist of a scalar
input sequence u, and a scalar output sequence y, for t = 1, . . . , N . You will develop several
different models that relate the signals u and y.

• Memoryless models. In a memoryless model, the output at time t, i.e., y(t), depends
only on the input at time t, i.e., u(t). Another common term for such a model is static.

constant model: y(t) = c0


static linear: y(t) = c1 u(t)
static affine: y(t) = c0 + c1 u(t)
static quadratic: y(t) = c0 + c1 u(t) + c2 u(t)2

• Dynamic models. In a dynamic model, y(t) depends on u(s) for some s ≠ t. We consider
some simple time-series models (see problem 2 in the reader), which are linear dynamic
models.
moving average (MA): y(t) = a_0 u(t) + a_1 u(t − 1) + a_2 u(t − 2)
autoregressive (AR): y(t) = a_0 u(t) + b_1 y(t − 1) + b_2 y(t − 2)
autoregressive moving average (ARMA): y(t) = a_0 u(t) + a_1 u(t − 1) + b_1 y(t − 1)
Note that in the AR and ARMA models, y(t) depends indirectly on all previous inputs,
u(s) for s < t, due to the recursive dependence on y(t − 1). For this reason, the AR and
ARMA models are said to have infinite memory. The MA model, on the other hand, has
a finite memory: y(t) depends only on the current and two previous inputs. (Another
term for this MA model is 3-tap system, where taps refer to taps on a delay line.)
Each of these models is specified by its parameters, i.e., the scalars c_i, a_i, b_i. For each of these
models, find the least-squares fit to the given data. In other words, find parameter values that
minimize the sum-of-squares of the residuals. For example, for the ARMA model, pick a_0, a_1,
and b_1 that minimize

Σ_{t=2}^{N} ( y(t) − a_0 u(t) − a_1 u(t − 1) − b_1 y(t − 1) )^2.

(Note that we start the sum at t = 2, which ensures that u(t − 1) and y(t − 1) are defined.)
For each model, give the root-mean-square (RMS) residual, i.e., the square root of the mean
of the optimal residual squared. Plot the output ŷ predicted by your model, and plot the
residual (which is y − ŷ). The data for this problem are available from the class web page in
the file uy_data.json. This file contains the vectors u and y and the scalar N (the length of
the vectors). Now you can plot u, y, etc. Note: the dataset u, y is not generated by any of
the models above. It is generated by a nonlinear recursion, which has infinite memory.

Solution. For each of the given models, we get a linear relationship between the outputs
and the unknown parameters. For example, for the constant model we have

[ y(1) ]   [ 1 ]
[ y(2) ]   [ 1 ]
[  ⋮   ] = [ ⋮ ] c_0
[ y(N) ]   [ 1 ]
Or for the static quadratic model

[ y(1) ]   [ 1  u(1)  u(1)^2 ]
[ y(2) ]   [ 1  u(2)  u(2)^2 ] [ c_0 ]
[  ⋮   ] = [ ⋮    ⋮     ⋮    ] [ c_1 ]
[ y(N) ]   [ 1  u(N)  u(N)^2 ] [ c_2 ]
Similarly, for the autoregressive moving average model we get

[ y(2) ]   [ u(2)  u(1)    y(1)   ]
[ y(3) ]   [ u(3)  u(2)    y(2)   ] [ a_0 ]
[  ⋮   ] = [  ⋮     ⋮        ⋮    ] [ a_1 ]
[ y(N) ]   [ u(N)  u(N−1)  y(N−1) ] [ b_1 ]

(Note that for this model we start from y(2), since u(0) and y(0) are undefined.) All of
the above are of the form y = Ax, where y is the output sequence and x is the vector of
corresponding unknown coefficients. The goal is to find the coefficients that minimize the
sum-of-squares of the residuals. This is nothing but the least-squares solution of y = Ax,
given by x_ls = (A^T A)^{-1} A^T y. Then using x_ls, the model output can be computed as ŷ = A x_ls.
This can be done easily in matlab:
uy_data; % read u, y, N
A1 = ones(N,1); A2 = u; A3 = [ones(N,1), u]; A4 = [ones(N,1), u, u.^2];
x1 = A1\y; y1_hat = A1*x1; r1 = y - y1_hat; rms1 = sqrt(mean(r1.^2))
x2 = A2\y; y2_hat = A2*x2; r2 = y - y2_hat; rms2 = sqrt(mean(r2.^2))
x3 = A3\y; y3_hat = A3*x3; r3 = y - y3_hat; rms3 = sqrt(mean(r3.^2))
x4 = A4\y; y4_hat = A4*x4; r4 = y - y4_hat; rms4 = sqrt(mean(r4.^2))
A5 = [u(3:N), u(2:N-1), u(1:N-2)]; y5 = y(3:N);   % MA
A6 = [u(3:N), y(2:N-1), y(1:N-2)]; y6 = y(3:N);   % AR
A7 = [u(2:N), u(1:N-1), y(1:N-1)]; y7 = y(2:N);   % ARMA
x5 = A5\y5; y5_hat = A5*x5; r5 = y5 - y5_hat; rms5 = sqrt(mean(r5.^2))
x6 = A6\y6; y6_hat = A6*x6; r6 = y6 - y6_hat; rms6 = sqrt(mean(r6.^2))
x7 = A7\y7; y7_hat = A7*x7; r7 = y7 - y7_hat; rms7 = sqrt(mean(r7.^2))
figure(1);
subplot(211); plot(y1_hat,'b'); grid on; hold on; plot(r1,'--r'); hold off;
title('constant'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y2_hat,'b'); grid on; hold on; plot(r2,'--r'); hold off;
title('linear'); xlabel('n'); ylabel('y_{hat}');
figure(2);
subplot(211); plot(y3_hat,'b'); grid on; hold on; plot(r3,'--r'); hold off;
title('affine'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y4_hat,'b'); grid on; hold on; plot(r4,'--r'); hold off;
title('quadratic'); xlabel('n'); ylabel('y_{hat}');
figure(3);
subplot(211); plot(y5_hat,'b'); grid on; hold on; plot(r5,'--r'); hold off;
title('MA'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y6_hat,'b'); grid on; hold on; plot(r6,'--r'); hold off; % was r7: plot the AR residual
title('AR'); xlabel('n'); ylabel('y_{hat}');
figure(4);
subplot(211); plot(y7_hat,'b'); grid on; hold on; plot(r7,'--r'); hold off;
title('ARMA'); xlabel('n'); ylabel('y_{hat}');
figure(1); print uy_1
figure(2); print uy_2
figure(3); print uy_3
figure(4); print uy_4

And the following RMS values for the residuals are obtained: constant: 1.1181, linear: 0.5940,
affine: 0.5210, quadratic: 0.5179; MA: 0.2504, AR: 0.1783, ARMA: 0.1853. For the memoryless
models, the error decreases as the model becomes more complicated. However, the
models with memory perform significantly better. Among these, the error decreases with the
introduction of autoregressive terms. Among the memoryless models, the affine model would
be a good choice, since the more complicated quadratic model yields only slightly smaller
residuals. Overall, the autoregressive model seems to do a good job. Of course, to choose a
model, we should really validate on another batch of data. Note that in this problem we were
only concerned with fitting the model to the data, and not with 'validating' the model, i.e., how well
the model will work for inputs other than the ones used for fitting it. The following
plots show ŷ (solid) and the residuals y − ŷ (dashed):
[Plots, one per model (constant, linear, affine, quadratic, moving average, autoregressive, autoregressive moving average): ŷ (solid) and the residual (dashed) versus n, for n = 0 to 100.]

3. Identifying a system from input/output data. We consider the standard setup:

y = Ax + v,

where A ∈ R^{m×n}, x ∈ R^n is the input vector, y ∈ R^m is the output vector, and v ∈ R^m is the
noise or disturbance. We consider here the problem of estimating the matrix A, given some
input/output data. Specifically, we are given the following:

x^(1), . . . , x^(N) ∈ R^n,    y^(1), . . . , y^(N) ∈ R^m.

These represent N samples or observations of the input and output, respectively, possibly
corrupted by noise. In other words, we have

y^(k) = Ax^(k) + v^(k),   k = 1, . . . , N,

where the v^(k) are assumed to be small. The problem is to estimate the (coefficients of the) matrix
A, based on the given input/output data. You will use a least-squares criterion to form an
estimate Â of A. Specifically, you will choose as your estimate Â the matrix that minimizes
the quantity

J = Σ_{k=1}^{N} ‖Ax^(k) − y^(k)‖^2

over A.

a) Explain how to do this. If you need to make an assumption about the input/output
data to make your method work, state it clearly. You may want to use the matrices
X ∈ R^{n×N} and Y ∈ R^{m×N} given by

X = [ x^(1) · · · x^(N) ],   Y = [ y^(1) · · · y^(N) ]

in your solution.

b) On the course web site you will find some input/output data for an instance of this
problem in the file sysid_data.json. Loading this file will assign values to m, n,
and N, and create two matrices that contain the input and output data, respectively. The
n × N matrix variable X contains the input data x^(1), . . . , x^(N) (i.e., the first column of X
contains x^(1), etc.). Similarly, the m × N matrix Y contains the output data y^(1), . . . , y^(N).
You must give your final estimate Â, your source code, and also give an explanation of
what you did.

Solution.

a) We start by expressing the objective function J as

J = Σ_{k=1}^{N} ‖Ax^(k) − y^(k)‖^2
  = Σ_{k=1}^{N} Σ_{i=1}^{m} (Ax^(k) − y^(k))_i^2
  = Σ_{k=1}^{N} Σ_{i=1}^{m} (a_i^T x^(k) − y_i^(k))^2
  = Σ_{i=1}^{m} ( Σ_{k=1}^{N} (a_i^T x^(k) − y_i^(k))^2 ),

where a_i^T is the ith row of A. The last expression shows that J is a sum of expressions J_i
(shown in parentheses), each of which only depends on a_i. This means that to minimize
J, we can minimize each of these expressions separately. That makes sense: we can
estimate the rows of A separately. Now let's see how to minimize

J_i = Σ_{k=1}^{N} (a_i^T x^(k) − y_i^(k))^2,

which is the contribution to J from the ith row of A. First we write it as

      ‖ [ x^(1)T ]       [ y_i^(1) ] ‖^2
J_i = ‖ [   ⋮    ] a_i − [    ⋮    ] ‖
      ‖ [ x^(N)T ]       [ y_i^(N) ] ‖

Now that we have the problem in the standard least-squares format, we're pretty much
done. Using the matrix X ∈ R^{n×N} given by

X = [ x^(1) · · · x^(N) ],

we can express the estimate as

â_i = (XX^T)^{-1} X [ y_i^(1); . . . ; y_i^(N) ].

(For this to work we need XX^T to be invertible, i.e., X must have rank n; in particular
this requires N ≥ n.) Using the matrix Y ∈ R^{m×N} given by

Y = [ y^(1) · · · y^(N) ],

we can express the estimate of A as

Â^T = (XX^T)^{-1} X Y^T.

Transposing this gives the final answer:

Â = Y X^T (XX^T)^{-1}.

b) Once you have the neat formula found above, it's easy to get matlab to compute the
estimate. It's a little inefficient, but perfectly correct, to simply use

Ahat = Y*X'*inv(X*X');

This yields the estimate

    [ 2.03  5.02   5.01 ]
    [ 0.01  7      1.01 ]
    [ 7.04  0      6.94 ]
    [ 7     3.98   4    ]
Â = [ 9.01  1.04   7    ]
    [ 4.01  3.96   9.03 ]
    [ 4.99  6.97   8.03 ]
    [ 7.94  6.09   3.02 ]
    [ 0.01  8.97  −0.04 ]
    [ 1.06  8.02   7.03 ]

Once you've got Â, it's a good idea to check the optimal residual J, just to make sure it's
reasonable, by comparing it to

Σ_{k=1}^{N} ‖y^(k)‖^2.

Here we get J = (64.5)^2, which is around 4.08% of this quantity. There are several other ways to compute Â in
matlab. You can calculate the rows of Â one at a time, using

aihat = (X'\(Y(i,:)'))';   % the ith row of Ahat, for each i = 1, ..., m

In fact, the backslash operator in matlab solves multiple least-squares problems at once,
so you can use

AhatT = X' \ (Y');
Ahat = AhatT';

or

Ahat = (X'\(Y'))';

In any case, it’s not exactly a long matlab program . . .

4. Curve-smoothing. We are given a function F : [0, 1] → R (whose graph gives a curve in
R^2). Our goal is to find another function G : [0, 1] → R, which is a smoothed version of F.
We'll judge the smoothed version G of F in two ways:

• Mean-square deviation from F, defined as

D = ∫_0^1 (F(t) − G(t))^2 dt.

• Mean-square curvature, defined as

C = ∫_0^1 G''(t)^2 dt.

We want both D and C to be small, so we have a problem with two objectives. In general
there will be a trade-off between the two objectives. At one extreme, we can choose G = F ,
which makes D = 0; at the other extreme, we can choose G to be an affine function (i.e.,
to have G''(t) = 0 for all t ∈ [0, 1]), in which case C = 0. The problem is to identify the
optimal trade-off curve between C and D, and explain how to find smoothed functions G
on the optimal trade-off curve. To reduce the problem to a finite-dimensional one, we will
represent the functions F and G (approximately) by vectors f, g ∈ R^n, where

f_i = F(i/n),   g_i = G(i/n).

You can assume that n is chosen large enough to represent the functions well. Using this
representation we will use the following objectives, which approximate the ones defined for the
functions above:

• Mean-square deviation, defined as

d = (1/n) Σ_{i=1}^{n} (f_i − g_i)^2.

• Mean-square curvature, defined as

c = (1/(n−2)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2.

In our definition of c, note that

(g_{i+1} − 2g_i + g_{i−1}) / (1/n^2)

gives a simple approximation of G''(i/n). You will only work with this approximate version
of the problem, i.e., the vectors f and g and the objectives c and d.
a) Explain how to find g that minimizes d + µc, where µ ≥ 0 is a parameter that gives
the relative weighting of sum-square curvature compared to sum-square deviation. Does
your method always work? If there are some assumptions you need to make (say, on
rank of some matrix, independence of some vectors, etc.), state them clearly. Explain
how to obtain the two extreme cases: µ = 0, which corresponds to minimizing d without
regard for c, and also the solution obtained as µ → ∞ (i.e., as we put more and more
weight on minimizing curvature).

b) Get the file curve_smoothing.json from the course web site. This file defines a specific
vector f that you will use. Find and plot the optimal trade-off curve between d and c.
Be sure to identify any critical points (such as, for example, any intersection of the curve
with an axis). Plot the optimal g for the two extreme cases µ = 0 and µ → ∞, and for
three values of µ in between (chosen to show the trade-off nicely). On your plots of g,
be sure to include also a plot of f , say with dotted line type, for reference. Submit your
matlab code.

Solution.
a) Let’s start with the two extreme cases. When µ = 0, finding g to minimize d + µc
reduces to finding g to minimize d. Since d is a sum of squares, d ≥ 0. Choosing g = f
trivially achieves d = 0. This makes perfect sense: to minimize the deviation measure,
just take the smoothed version to be the same as the original function. This yields zero
deviation, naturally, but also, it yields no smoothing! Next, consider the extreme case
where µ → ∞. This means we want to make the curvature as small as possible. Can
we drive it to zero? The answer is yes, we can: the curvature is zero if and only if g is
an affine function, i.e., has the form gi = ai + b for some constants a and b. There are
lots of vectors g that have this form; in fact, we have one for every pair of numbers a, b.

All of these vectors g make c zero. Which one do we choose? Well, even if µ is huge, we
still have a small contribution to d + µc from d, so among all g that make c = 0, we'd
like the one that minimizes d. Basically, we want to find the best affine approximation,
in the sum of squares sense, to f. We want to find a and b that minimize

        [ a ]                  [ 1  1 ]
‖ f − A [ b ] ‖,   where   A = [ 2  1 ]
                               [ 3  1 ]
                               [ ⋮  ⋮ ]
                               [ n  1 ].

For n ≥ 2, A is skinny and full rank, and a and b can be found using least-squares.
Specifically, [a b]^T = (A^T A)^{-1} A^T f. In the general case, minimizing d + µc is the same
as choosing g to minimize

‖ (1/√n) Ig − (1/√n) f ‖^2 + µ ‖ S g ‖^2,

where S ∈ R^{(n−2)×n} is the scaled second-difference matrix

                   [ −1  2 −1  0  · · ·  0 ]
S = (n^2/√(n−2)) · [  0 −1  2 −1  · · ·  0 ]
                   [  ⋮          ⋱  ⋱   ⋮ ]
                   [  0  0  · · · −1  2 −1 ].

This is a multi-objective least-squares problem. The minimizing g is

g = (Ã^T Ã)^{-1} Ã^T ỹ,   where   Ã = [ (1/√n) I ]   and   ỹ = [ (1/√n) f ]
                                      [   √µ S   ]             [     0    ].

The matrix Ã^T Ã is always invertible, because the top block (1/√n)I of Ã is full rank. The
expression can also be written as g = ((1/n) I + µ S^T S)^{-1} (1/n) f.
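Equivalently, for a given µ, the normal-equations form can be solved directly; a minimal matlab sketch, assuming f and n are already loaded and using the scaled S defined above:

% Solve ((1/n) I + mu S'S) g = (1/n) f for one value of mu.
S = toeplitz([-1; zeros(n-3,1)], [-1 2 -1 zeros(1,n-3)]);  % second differences
S = S * n^2 / sqrt(n-2);                                   % scale factor from the text
mu = 1e-5;                                                 % example weight (arbitrary)
g = ((1/n)*eye(n) + mu*(S'*S)) \ (f/n);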

b) The following plots show the optimal trade-off curve and the optimal g corresponding
to representative µ values on the curve.

[Plot: optimal trade-off curve, sum-square curvature c versus sum-square deviation d; the curve meets the c-axis (d = 0) at c ≈ 1.9724 × 10^6 and the d-axis (c = 0) at d ≈ 0.3347.]

[Plot: curves illustrating the trade-off: f, together with the optimal g for µ = 0, µ = 10e−7, µ = 10e−5, µ = 10e−4, and µ → ∞.]
The following matlab code finds and plots the optimal trade-off curve between d and c.
It also finds and plots the optimal g for representative values of µ. As expected, when
µ = 0, g = f and no smoothing occurs. At the other extreme, as µ goes to infinity, we
get an affine approximation of f . Intermediate values of µ correspond to approximations
of f with different degrees of smoothness.

close all;
clear all;
curve_smoothing
S = toeplitz([-1; zeros(n-3,1)], [-1 2 -1 zeros(1,n-3)]);
S = S*n^2/(sqrt(n-2));
I = eye(n);
g_no_deviation = f;
error_curvature(1) = norm(S*g_no_deviation)^2;
error_deviation(1) = 0;
u = logspace(-8,-3,30);
for i = 1:length(u)
    A_tilde = [1/sqrt(n)*I; sqrt(u(i))*S];
    y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
    g = A_tilde\y_tilde;
    error_deviation(i+1) = norm(1/sqrt(n)*I*g - f/sqrt(n))^2;
    error_curvature(i+1) = norm(S*g)^2;
end
a1 = 1:n;
a1 = a1';
a2 = ones(n,1);
A = [a1 a2];
affine_param = inv(A'*A)*A'*f;
for i = 1:n
    g_no_curvature(i) = affine_param(1)*i + affine_param(2);
end
g_no_curvature = g_no_curvature';
error_deviation(length(u)+2) = 1/n*norm(g_no_curvature - f)^2;
error_curvature(length(u)+2) = 0;
figure(1);
plot(error_deviation, error_curvature);
xlabel('Sum-square deviation (y intercept = 0.3347)');
ylabel('Sum-square curvature (x intercept = 1.9724e06)');
title('Optimal tradeoff curve');
print curve_extreme.eps;
u1 = 10e-7;
A_tilde = [1/sqrt(n)*I; sqrt(u1)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g1 = A_tilde\y_tilde;
u2 = 10e-5;
A_tilde = [1/sqrt(n)*I; sqrt(u2)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g2 = A_tilde\y_tilde;
u3 = 10e-4;
A_tilde = [1/sqrt(n)*I; sqrt(u3)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g3 = A_tilde\y_tilde;
figure(3);
plot(f,'*');
hold;
plot(g_no_deviation);
plot(g1,'--');
plot(g2,'-.');
plot(g3,'-');
plot(g_no_curvature,':');
axis tight;
legend('f','u = 0','u = 10e-7','u = 10e-5','u = 10e-4','u = infinity',0);
title('Curves illustrating the trade-off');
print curve_tradeoff.eps;

Note: Several exams had a typo that defined

c = (1/(n−1)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2

instead of

c = (1/(n−2)) Σ_{i=2}^{n−1} ( (g_{i+1} − 2g_i + g_{i−1}) / (1/n^2) )^2.

The solutions above reflect the second definition. Full credit was given for answers consistent
with either definition. Some common errors:

• Several students tried to approximate f using low-degree polynomials. While fitting f
to a polynomial does smooth f, it does not necessarily minimize d + µc for some µ ≥ 0,
nor does it illustrate the trade-off between curvature and deviation.

• In explaining how to find the g that minimizes d + µc as µ → ∞, many people correctly
observed that if g ∈ null(S), then c = 0. For full credit, however, solutions had to show
how to choose the vector in null(S) that minimizes d.

• Many people chose to zoom in on a small section of the trade-off curve rather than plot
the whole range from µ = 0 to µ → ∞. Those solutions received full credit provided they
calculated the intersections with the axes (i.e., provided they found the minimum value
for d + µc when µ = 0 and when µ → ∞).

5. Hovercraft with limited range. We have a hovercraft moving in the plane with two
thrusters, each pointing through the center of mass, exerting forces in the x and y directions

with 100% efficiency. The hovercraft has mass 1. The discretized equations of motion for the
hovercraft are

           [ 1 1 0 0 ]        [ 1/2  0  ]
x(t + 1) = [ 0 1 0 0 ] x(t) + [  1   0  ] [ u_1(t) ]
           [ 0 0 1 1 ]        [  0  1/2 ] [ u_2(t) ]
           [ 0 0 0 1 ]        [  0   1  ]

where x_1 and x_2 are the position and velocity in the x-direction, and x_3, x_4 are the position
and velocity in the y-direction. Here

u(t) = [ u_1(t) ]
       [ u_2(t) ]

is the force acting on the hovercraft for time in the interval [t, t + 1). Let the position of the
vehicle at time t be q(t) ∈ R^2.

a) The hovercraft starts at the origin. We'd like to apply thrust to make it move through
points p_1, p_2, p_3 at times t_1, t_2, t_3, where

p_1 = [ 1; −1/2 ],   p_2 = [ 0; 1 ],   p_3 = [ −3/2; 0 ],
t_1 = 6,   t_2 = 40,   t_3 = 50.

We will run the hovercraft on the time interval [0, 70]. We'd like to apply a sequence
of inputs u(0), u(1), . . . , u(70) to make the hovercraft position pass through the above
sequence of points at the specified times.
We would like to find the sequence of inputs that drives the hovercraft through the
desired points which has the minimum cost, given by the sum of the squares of the
forces:

Σ_{t=0}^{70} ‖u(t)‖^2.

To do this, pick A_hov and y_des to set this problem up as an equivalent minimum-norm
problem, where we would like to find the minimum-norm u_seq which satisfies

A_hov u_seq = y_des,

where u_seq is the sequence of force inputs

u_seq = [ u(0); u(1); . . . ; u(70) ].

Plot the trajectory of the hovercraft using this input, and the way-points p_1, . . . , p_3. Also
plot the optimal u against time.

b) Now we would like to compute the trade-off curve between the accuracy with which the
mass passes through the waypoints and the norm of the force used. Let our two objective
functions be

J_1 = Σ_{i=1}^{3} ‖q(t_i) − p_i‖^2 = ‖A_hov u_seq − y_des‖^2

and

J_2 = Σ_{t=0}^{70} ‖u(t)‖^2.

By minimizing the weighted sum

J_1 + µJ_2

for a range of values of µ, plot the trade-off curve of J_1 against J_2 showing the achievable
performance. To generate suitable values of µ, you may find the logspace command
useful in Matlab; you'll need to pick appropriate maximum and minimum values. This
trade-off curve shows how we can trade off between how accurately the hovercraft
passes through the waypoints and how much input energy is used.

c) For each of the following values of µ,

{ 10^{p/2} | p = −2, 0, 2, . . . , 10 },

plot the trajectories all on the same plot, together with the waypoints.
d) Now suppose we are controlling the hovercraft by radio control, and the maximum range
possible between the transmitter and receiver is 2 (in whatever units we are using for
distance.) Notice that, if we use the minimum-norm input then the hovercraft passes
out of range, both when making its first turn and on the final stretch (between times 50
and 70).
We’d like to do something about this, but trading off the input norm as above doesn’t
do the right thing; if µ is large then the hovercraft stays within range, but misses the
waypoints entirely; if µ is small then it comes close to the waypoints, but goes out of
range. Notice that this is particularly a problem on the final stretch between times 50
and 70; explain why this is.
e) One remedy for this problem is to solve a constrained multiobjective least-squares
problem. We would like to impose the constraint that

A_hov u_seq = y_des,

that is, achieve zero waypoint error J_1 = 0. We can attempt to keep the hovercraft in
range by trading off the sum of the squares of the position,

J_3 = Σ_{t=0}^{70} ‖q(t)‖^2,

against input cost J_2 subject to this constraint. To do this, we'll solve

minimize J_3 + γJ_2
subject to A_hov u_seq = y_des

First, find the matrix W so that the cost function is given by

J_3 + γJ_2 = ‖W u_seq‖^2.

f) Now we have a problem of the form

minimize ‖W u‖^2
subject to Au = y_des

This is called a weighted minimum-norm solution; the only difference from the usual
minimum-norm solution to Au = y_des is the presence of the matrix W, and when W = I
the optimal u is just given by u_opt = A† y_des. Show that the solution for general W is

u_opt = Σ^{-1} A^T (AΣ^{-1} A^T)^{-1} y_des

where Σ = W^T W. (One way to do this is using Lagrange multipliers.) Use this to solve
the remaining parts of this problem.

g) For each of the following values of γ,

{ 10^{p/2} | p = 0, 2, 4, . . . , 20 },

Plot the trajectories all on the same plot, together with the waypoints. Explain what
you see.

h) By trying different values of γ, you should be able to find a trajectory which just keeps the
hovercraft within range. Plot the trajectory of the hovercraft; what is the corresponding
value of γ? Is this the smallest-norm input u that just keeps the hovercraft within range,
and drives the hovercraft through the waypoints? Explain why, or why not.

i) For a range of values of γ, plot the trade-off curve of J3 against J2 showing the achievable
performance.

Solution.

a) Setting

C = [ 1 0 0 0 ]
    [ 0 0 1 0 ]

gives the position of the hovercraft at time t as

y(t) = Σ_{τ=0}^{t−1} C A^{t−1−τ} B u(τ).

The parameters for the minimum-norm problem are therefore

        [ CA^{t_1−1}B  CA^{t_1−2}B  · · ·  CB  0  · · ·  0 ]           [ p_1 ]
A_hov = [ CA^{t_2−1}B  CA^{t_2−2}B  · · ·  CB  0  · · ·  0 ],  y_des = [ p_2 ]
        [ CA^{t_3−1}B  CA^{t_3−2}B  · · ·  CB  0  · · ·  0 ]           [ p_3 ]

where in block row i the block CB sits in the columns corresponding to u(t_i − 1), followed
by zero blocks for u(t_i), . . . , u(70).
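A sketch of how A_hov can be assembled and the minimum-norm input computed in matlab (the matrices follow the problem statement; the loop structure and indexing are my own):

A = [1 1 0 0; 0 1 0 0; 0 0 1 1; 0 0 0 1];
B = [1/2 0; 1 0; 0 1/2; 0 1];
C = [1 0 0 0; 0 0 1 0];
ts = [6 40 50];                           % waypoint times
Th = 70;                                  % horizon: inputs u(0), ..., u(70)
Ahov = zeros(6, 2*(Th+1));
for w = 1:3
    for tau = 0:ts(w)-1
        Ahov(2*w-1:2*w, 2*tau+1:2*tau+2) = C * A^(ts(w)-1-tau) * B;
    end
end
ydes = [1; -1/2; 0; 1; -3/2; 0];          % stacked waypoints p1, p2, p3
useq = Ahov' * ((Ahov*Ahov') \ ydes);     % minimum-norm input sequence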
Solving this minimum-norm problem gives the optimal trajectory shown below.

[Plot: optimal trajectory in the plane, passing through the three waypoints.]

The corresponding optimal input sequence is below.

[Plot: optimal input components versus t, for 0 ≤ t ≤ 70.]

b) The weighted sum objective is

J_1 + µJ_2 = ‖ [ A_hov ] u_seq − [ y_des ] ‖^2
             ‖ [ √µ I  ]         [   0   ] ‖

where, as before, u_seq stacks the inputs u(0), . . . , u(70), and so the optimal input sequence
is given by

u_seq = [ A_hov ]† [ y_des ]
        [ √µ I  ]  [   0   ].

Choosing values of µ between 1 and 10^7 using mus = logspace(0,7,50), the trade-off
curve is shown below.
[Plot: trade-off curve of J_1 (vertical axis, from 0 to about 4.5) against J_2 (horizontal axis, from 0 to about 0.03).]
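A minimal sketch of how this curve can be computed, assuming Ahov and ydes have been built as in part (a):

% Sweep mu and record (J2, J1) for the weighted-sum solutions.
mus = logspace(0, 7, 50);
nu = size(Ahov, 2);                       % number of stacked input variables
J1 = zeros(1, length(mus)); J2 = zeros(1, length(mus));
for i = 1:length(mus)
    useq = [Ahov; sqrt(mus(i))*eye(nu)] \ [ydes; zeros(nu,1)];
    J1(i) = norm(Ahov*useq - ydes)^2;
    J2(i) = norm(useq)^2;
end
plot(J2, J1); xlabel('J2'); ylabel('J1');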

c) All of the trajectories together are shown below.

[Plot: trajectories in the plane for each value of µ, together with the waypoints.]

We can see clearly that increasing µ reduces the accuracy with which the trajectory
passes through the waypoints.

d) On the final stretch the input is zero, and so is unaffected by increasing µ. We were
attempting to use the heuristic 'keeping u small keeps x small', but this fails: when u = 0
the hovercraft just keeps going in a straight line.

e) We would like to minimize J_3 + γJ_2 subject to the constraints that the hovercraft moves
through the waypoints. Denote the sequence of positions of the hovercraft by

y_seq = [ y(0); y(1); . . . ; y(T) ]

where T = 70. Then we have

y_seq = T u_seq

where T is the block Toeplitz matrix

    [    0                                 ]
    [   CB         0                       ]
T = [  CAB        CB       0               ]
    [    ⋮                   ⋱             ]
    [ CA^{T−1}B  CA^{T−2}B  · · ·  CB   0  ].

Now the cost function is

J_3 + γJ_2 = ‖T u_seq‖^2 + γ‖u_seq‖^2 = ‖W u_seq‖^2

where

W = [  T   ]
    [ √γ I ].
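A sketch of how T (called Tmat below) and W might be assembled in matlab, assuming A, B, C from the problem statement and a chosen trade-off weight gamma:

% Block-Toeplitz map from stacked inputs u(0..70) to stacked positions y(0..70).
Th = 70;                                   % time horizon
Tmat = zeros(2*(Th+1), 2*(Th+1));
for t = 1:Th                               % the block row for y(0) stays zero
    for tau = 0:t-1
        Tmat(2*t+1:2*t+2, 2*tau+1:2*tau+2) = C * A^(t-1-tau) * B;
    end
end
W = [Tmat; sqrt(gamma)*eye(2*(Th+1))];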

f) We'd like to solve

minimize ‖W u‖^2
subject to Au = y_des

One way to solve this is using Lagrange multipliers; if we augment the cost function by
the Lagrange multipliers multiplied by the constraints, we have

L(u, λ) = u^T Σ u + λ^T (Au − y_des),

and the optimality conditions are

∂L/∂u = 2 u_opt^T Σ + λ^T A = 0,
∂L/∂λ = (A u_opt − y_des)^T = 0.

The first condition gives

u_opt = −(1/2) Σ^{-1} A^T λ,

and substituting this into the second we have

−(1/2) A Σ^{-1} A^T λ = y_des,

hence

λ = −2 (AΣ^{-1} A^T)^{-1} y_des

and

u_opt = Σ^{-1} A^T (AΣ^{-1} A^T)^{-1} y_des,

as desired.
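A direct matlab transcription of this formula (a sketch, assuming Ahov, ydes, and the W built in part (e)):

% Weighted minimum-norm solution: uopt = Sigma^{-1} A' (A Sigma^{-1} A')^{-1} ydes.
Sigma = W'*W;
Z = Sigma \ Ahov';                 % Sigma^{-1} A'
uopt = Z * ((Ahov*Z) \ ydes);      % apply (A Sigma^{-1} A')^{-1} to ydes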
g) The trajectory for a range of γ values is shown below. (Actually these are clearer on
separate plots.)

[Plots: trajectories in the plane, with the waypoints, for γ = 1, 100, 1000, 3000, 10000,
1e+06, 1e+08, and 1e+10.]

We can see the trade-off clearly; decreasing γ causes the hovercraft to try very hard to
stay close to the origin. Also notice the asymmetry caused by the different times at
which the hovercraft must be at the waypoints.

h) A good choice of γ is about 1.7 × 10^4. Here the trajectory just remains within
range, as shown below.

[Plot: trajectory that just stays within the disk of radius 2 while passing through the waypoints.]

This is not the smallest-norm u that keeps the hovercraft within range and drives the
hovercraft through the waypoints, because we are minimizing the sum of the squares
of ‖q(t)‖, rather than constraining each ‖q(t)‖ independently. You can see this in the
plot, since in the final stretch the hovercraft is expending extra effort to stay well within
range, and this excessive input could be reduced.
In fact, one can compute the exact optimum, but this is not required and not covered in
this course; (an approximation of) it is below.

[Plot: the (approximately) exact optimal trajectory.]

i) The trade-off is below.

[Plot: trade-off curve of J_3 (vertical axis, from 0 to 100) against J_2 (horizontal axis, from 0 to 2).]

Notice that the vertical asymptote occurs when J_2 ≈ 0.03; this is the minimum norm of
u which drives the hovercraft through the desired trajectory, as seen in part (b).

6. You Must Construct Additional Pylons. You are the Hierarch of the Baelaam charged
with maintaining the power levels of energizing pylons which power various structures in your
base of operations. Consider m structures powered by n pylons. Each structure's energy level
y_j, for j = 1, . . . , m, is given by

y_j(p) = log( Σ_{i=1}^{n} exp( p_i / d_{j,i}^2 ) ),

where p_i is the power level of the ith pylon and d_{j,i} is the distance between the jth
structure and the ith pylon (we choose log-sum-exp as a smooth approximation of the max
function). While each structure has some given target energy level R_j, it can handle some
deviation (either over or under); however, deviation will damage the Nexus Crystals that
act as energy conduits for the structure. Your goal as Hierarch is to find a set of pylon power
levels p ∈ R^n that minimizes the total square deviation, J, from the required energy levels:

J(p) = Σ_{j=1}^{m} (R_j − y_j(p))^2.

Your chief engineer proposes that you could linearize the y_j(p) function to find an update
algorithm that starts with some initial pylon power level and changes the power each step by
a small amount to reduce the total energy deviation J.

a) Find an update expression for the approximate energy level y(p + δp) as a linear dynamical
system, where y ∈ R^m is the vector of structure energy levels. I.e., find A and B such
that

y(p + δp) ≈ Ay(p) + Bδp.

We want to relate the energy level at p + δp to the energy level at p and the change in
energy from a small change in power δp.
Hint: B is not necessarily constant.

b) Derive an expression for the one-step change in power levels that minimizes

J(p + δp) = Σ_{j=1}^{m} (R_j − y_j(p + δp))^2

as a function of y(p), A, B, δp. Use the result of this minimization problem (the optimal
δp) to determine an update expression for p[k + 1] = p[k] + αδp, where α is a given step
size and k is the current iteration. If your method involves an inverse, explain what
conditions must hold in order for the inverse to exist.

c) Given the following list of required energy levels and locations of each structure and
pylon, apply your algorithm for 200 iterations with α = 0.01 and an initial power level
of p[0] = [20, 40, 20].
Plot the pylon power levels and the structure energy levels for each iteration, as well
as the deviation metric J. There should be 3 plots total. The algorithm should converge
in roughly 150-200 iterations.
Also report the final cost and pylon power levels.

Stucture_energy_goal = [10, 20, 5, 10, 5]
Strcuture_location = [2 8; 4 5; 6 8; 2 2; 4 1]
Pylon_location = [2 5; 3 4; 5 4]
p0 = [20 40 20]

For locations, each row is an x,y location.
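For part (c) one first needs the m × n matrix of structure-pylon distances; a minimal matlab sketch using the data above (the variable names SL, PL, and D are my own):

% Distance matrix: D(j,i) = distance from structure j to pylon i.
SL = [2 8; 4 5; 6 8; 2 2; 4 1];   % Strcuture_location
PL = [2 5; 3 4; 5 4];             % Pylon_location
m = size(SL,1); n = size(PL,1);
D = zeros(m, n);
for j = 1:m
    for i = 1:n
        D(j,i) = norm(SL(j,:) - PL(i,:));
    end
end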

Solution.

a) (6 points)
First we linearize the energy function by finding its Jacobian. Clearly A = I, but B is a
bit more involved:

                                [ ∂y_1/∂p_1  · · ·  ∂y_1/∂p_n ]
y(p + δp) ≈ y(p) + Bδp = y(p) + [     ⋮        ⋱        ⋮     ] δp.
                                [ ∂y_m/∂p_1  · · ·  ∂y_m/∂p_n ]

Now we find the partial derivatives:

∂y_j/∂p_i = ( exp(p_i/d_{j,i}^2) / Σ_{i'=1}^{n} exp(p_{i'}/d_{j,i'}^2) ) · (1/d_{j,i}^2).

We see that B is just a function of p. (Partial credit was given to those who attempted
to find B.)
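A matlab sketch of the energy levels and the Jacobian B, assuming the distance matrix D from above; the helper name energy_jacobian is my own (saved as energy_jacobian.m):

function [y, B] = energy_jacobian(p, D)
    % E(j,i) = exp(p_i / d_{j,i}^2), for power column vector p
    E = exp((ones(size(D,1),1) * p') ./ D.^2);
    s = sum(E, 2);                                % row sums of E
    y = log(s);                                   % log-sum-exp energy levels
    B = (E ./ (s * ones(1, length(p)))) ./ D.^2;  % B(j,i) as derived above
end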

b) (7 points)
We can substitute our expression for y[k + 1] here:

J[k + 1] = ‖R − (y(p) + Bδp)‖^2 = ‖z − Hδp‖^2,

where z = R − y[k] is the last error vector and H = B is the Jacobian matrix of the
energy function. This is a least-squares problem in δp, so the minimizing one-step change
is δp = (H^T H)^{-1} H^T z. For the inverse to exist, H^T H must be invertible, i.e., H must
have full column rank (in particular, n ≤ m). Thus the update to the pylon power is
given by

p[k + 1] = p[k] + α (H^T H)^{-1} H^T (R − y[k]).
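Putting the pieces together, the iteration might look like this (a sketch, using the hypothetical energy_jacobian helper and the distance matrix D from above):

% Gauss-Newton-style power update, as derived in part (b).
R = [10; 20; 5; 10; 5];           % Stucture_energy_goal
p = [20; 40; 20];                 % p0
alpha = 0.01;
for k = 1:200
    [y, H] = energy_jacobian(p, D);
    dp = (H'*H) \ (H'*(R - y));   % optimal one-step change
    p = p + alpha*dp;             % damped update
end
J = norm(R - energy_jacobian(p, D))^2   % final cost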

c) (7 points)
Final pylon power levels after 200 iterations: [81.3, 35.2, 34.7]; final cost: 3.7.
For 50 iterations: final pylon power levels [52.7, 40.3, 35.7]; final cost: 24.1.
Code (in Julia) that solves this problem is in code2.jl.

[Figure 1: pylon power levels]

[Figure 2: structure energy levels]

[Figure 3: cost]
