EE263 Homework 3 Solution
1. Least-squares residuals. Suppose $A$ is skinny and full-rank. Let $x_{\mathrm{ls}}$ be the least-squares
approximate solution of $Ax = y$, and let $y_{\mathrm{ls}} = Ax_{\mathrm{ls}}$. Show that the residual vector $r = y - y_{\mathrm{ls}}$
satisfies
$$\|r\|^2 = \|y\|^2 - \|y_{\mathrm{ls}}\|^2.$$
Also, give a brief geometric interpretation of this equality (just a couple of sentences, and
maybe a conceptual drawing).
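Solution. Let us first show that $r \perp y_{\mathrm{ls}}$. Since $y_{\mathrm{ls}} = Ax_{\mathrm{ls}} = AA^\dagger y = A(A^TA)^{-1}A^Ty$, we have
$$y_{\mathrm{ls}}^T r = y_{\mathrm{ls}}^T(y - y_{\mathrm{ls}}) = y^TA(A^TA)^{-1}A^Ty - y^TA(A^TA)^{-1}A^TA(A^TA)^{-1}A^Ty = 0.$$
Thus,
$$\|y\|^2 = \|y_{\mathrm{ls}} + r\|^2 = (y_{\mathrm{ls}} + r)^T(y_{\mathrm{ls}} + r) = \|y_{\mathrm{ls}}\|^2 + 2y_{\mathrm{ls}}^Tr + \|r\|^2 = \|y_{\mathrm{ls}}\|^2 + \|r\|^2.$$
Therefore $\|r\|^2 = \|y\|^2 - \|y_{\mathrm{ls}}\|^2$.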
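[Figure: the vector y, its projection $y_{\mathrm{ls}} = Ax_{\mathrm{ls}}$ onto $\mathcal{R}(A)$, and the residual r, forming a right triangle with hypotenuse $\|y\|$ and legs $\|y_{\mathrm{ls}}\|$ and $\|r\|$. By Pythagoras' theorem, $\|y\|^2 = \|y_{\mathrm{ls}}\|^2 + \|r\|^2$.]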
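The identity can be checked numerically on a small example. The sketch below is plain Python with no dependencies; the matrix A, the vector y, and the helper names `solve` and `dot` are made up for illustration and are not part of the homework data.

```python
# Numerical check of ||r||^2 = ||y||^2 - ||y_ls||^2 on a small made-up example.

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical skinny, full-rank A (3 x 2) and right-hand side y.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 4.0]

# Normal equations: (A^T A) x_ls = A^T y.
cols = list(zip(*A))
AtA = [[dot(ci, cj) for cj in cols] for ci in cols]
Aty = [dot(ci, y) for ci in cols]
x_ls = solve(AtA, Aty)

y_ls = [dot(row, x_ls) for row in A]
r = [yi - yh for yi, yh in zip(y, y_ls)]

print("y_ls . r            =", dot(y_ls, r))
print("||r||^2             =", dot(r, r))
print("||y||^2 - ||y_ls||^2 =", dot(y, y) - dot(y_ls, y_ls))
```

The first printed value should be zero up to rounding (orthogonality), and the last two should agree.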
2. Least-squares model fitting. In this problem you will use least-squares to fit several
different types of models to a given set of input/output data. The data consist of a scalar
input sequence u, and a scalar output sequence y, for t = 1, . . . , N . You will develop several
different models that relate the signals u and y.
• Memoryless models. In a memoryless model, the output at time t, i.e., y(t), depends
only on the input at time t, i.e., u(t). Another common term for such a model is static.
constant: $y(t) = c_0$
static linear: $y(t) = c_1 u(t)$
static affine: $y(t) = c_0 + c_1 u(t)$
static quadratic: $y(t) = c_0 + c_1 u(t) + c_2 u(t)^2$
• Dynamic models. In a dynamic model, y(t) depends on u(s) for some s ≠ t. We consider
some simple time-series models (see problem 2 in the reader), which are linear dynamic
models.
moving average (MA): $y(t) = a_0 u(t) + a_1 u(t-1) + a_2 u(t-2)$
autoregressive (AR): $y(t) = a_0 u(t) + b_1 y(t-1) + b_2 y(t-2)$
autoregressive moving average (ARMA): $y(t) = a_0 u(t) + a_1 u(t-1) + b_1 y(t-1)$
Note that in the AR and ARMA models, y(t) depends indirectly on all previous inputs,
u(s) for s < t, due to the recursive dependence on y(t − 1). For this reason, the AR and
ARMA models are said to have infinite memory. The MA model, on the other hand, has
a finite memory: y(t) depends only on the current and two previous inputs. (Another
term for this MA model is 3-tap system, where taps refer to taps on a delay line.)
Each of these models is specified by its parameters, i.e., the scalars $c_i$, $a_i$, $b_i$. For each of these
models, find the least-squares fit to the given data. In other words, find parameter values that
minimize the sum-of-squares of the residuals. For example, for the ARMA model, pick $a_0$, $a_1$,
and $b_1$ that minimize
$$\sum_{t=2}^{N} \bigl(y(t) - a_0 u(t) - a_1 u(t-1) - b_1 y(t-1)\bigr)^2.$$
(Note that we start the sum at t = 2, which ensures that u(t − 1) and y(t − 1) are defined.)
For each model, give the root-mean-square (RMS) residual, i.e., the square root of the mean
of the optimal residual squared. Plot the output ŷ predicted by your model, and plot the
residual (which is y − ŷ). The data for this problem are available from the class web page in
the file uy_data.json. This file contains the vectors u and y and the scalar N (the length of
the vectors). Now you can plot u, y, etc. Note: the dataset u, y is not generated by any of
the models above. It is generated by a nonlinear recursion, which has infinite memory.
Solution. For each of the given models, we get a linear relationship between the outputs
and the unknown parameters. For example, for the constant model we have
$$\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} c_0.$$
Or for the static quadratic model
$$\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix} = \begin{bmatrix} 1 & u(1) & u(1)^2 \\ 1 & u(2) & u(2)^2 \\ \vdots & \vdots & \vdots \\ 1 & u(N) & u(N)^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}.$$
Similarly, for the autoregressive moving average model we get
$$\begin{bmatrix} y(2) \\ y(3) \\ \vdots \\ y(N) \end{bmatrix} = \begin{bmatrix} u(2) & u(1) & y(1) \\ u(3) & u(2) & y(2) \\ \vdots & \vdots & \vdots \\ u(N) & u(N-1) & y(N-1) \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ b_1 \end{bmatrix}.$$
(Note that for this model we start from y(2), since u(0) and y(0) are undefined.) All of
the above are in the form $y = Ax$, where y is the output sequence and x is the vector of
corresponding unknown coefficients. The goal is to find the coefficients that minimize the
sum-of-squares of the residuals. This is nothing but the least-squares solution of $y = Ax$,
given by $x_{\mathrm{ls}} = (A^TA)^{-1}A^Ty$. Then using $x_{\mathrm{ls}}$, the model output can be computed as $\hat y = Ax_{\mathrm{ls}}$.
This can be done easily in matlab:
uy_data; % read u,y,N
A1=ones(N,1); A2=u; A3=[ones(N,1), u]; A4=[ones(N,1), u, u.^2];
x1=A1\y; y1_hat=A1*x1; r1=y-y1_hat; rms1=sqrt(mean(r1.^2))
x2=A2\y; y2_hat=A2*x2; r2=y-y2_hat; rms2=sqrt(mean(r2.^2))
x3=A3\y; y3_hat=A3*x3; r3=y-y3_hat; rms3=sqrt(mean(r3.^2))
x4=A4\y; y4_hat=A4*x4; r4=y-y4_hat; rms4=sqrt(mean(r4.^2))
A5=[u(3:N), u(2:N-1), u(1:N-2)]; y5=y(3:N);
A6=[u(3:N), y(2:N-1), y(1:N-2)]; y6=y(3:N);
A7=[u(2:N), u(1:N-1), y(1:N-1)]; y7=y(2:N);
x5=A5\y5; y5_hat=A5*x5; r5=y5-y5_hat; rms5=sqrt(mean(r5.^2))
x6=A6\y6; y6_hat=A6*x6; r6=y6-y6_hat; rms6=sqrt(mean(r6.^2))
x7=A7\y7; y7_hat=A7*x7; r7=y7-y7_hat; rms7=sqrt(mean(r7.^2))
% Plot each model's prediction (solid blue) and residual (dashed red).
figure(1);
subplot(211); plot(y1_hat,'b'); grid on; hold on; plot(r1,'--r'); hold off;
title('constant'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y2_hat,'b'); grid on; hold on; plot(r2,'--r'); hold off;
title('linear'); xlabel('n'); ylabel('y_{hat}');
figure(2);
subplot(211); plot(y3_hat,'b'); grid on; hold on; plot(r3,'--r'); hold off;
title('affine'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y4_hat,'b'); grid on; hold on; plot(r4,'--r'); hold off;
title('quadratic'); xlabel('n'); ylabel('y_{hat}');
figure(3);
subplot(211); plot(y5_hat,'b'); grid on; hold on; plot(r5,'--r'); hold off;
title('MA'); xlabel('n'); ylabel('y_{hat}');
% The AR panel plots the AR residual r6 (the original listing plotted r7 here).
subplot(212); plot(y6_hat,'b'); grid on; hold on; plot(r6,'--r'); hold off;
title('AR'); xlabel('n'); ylabel('y_{hat}');
figure(4);
subplot(211); plot(y7_hat,'b'); grid on; hold on; plot(r7,'--r'); hold off;
title('ARMA'); xlabel('n'); ylabel('y_{hat}');
% Save each figure to a file.
figure(1); print uy_1
figure(2); print uy_2
figure(3); print uy_3
figure(4); print uy_4
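The regressor construction for the ARMA model can also be sketched outside matlab. The plain-Python example below is an illustration only: it does not read uy_data, but generates synthetic data from an ARMA recursion with made-up parameters (0.5, -0.3, 0.8), a made-up start-up value for the first output, and a seeded random input, then checks that the least-squares fit recovers the parameters. The helpers `solve` and `lstsq` are hypothetical names.

```python
import math
import random

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def lstsq(A, y):
    """Least-squares solution of A x = y via the normal equations (A assumed full rank)."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(n)]
    return solve(AtA, Aty)

# Synthetic data from a known ARMA recursion (hypothetical parameters).
a0, a1, b1 = 0.5, -0.3, 0.8
rng = random.Random(0)
N = 100
u = [rng.uniform(-1.0, 1.0) for _ in range(N)]
y = [a0 * u[0]]                      # assumed start-up value for the first output
for t in range(1, N):
    y.append(a0 * u[t] + a1 * u[t - 1] + b1 * y[t - 1])

# Regressor rows [u(t), u(t-1), y(t-1)] for t = 2..N (0-based indices 1..N-1).
A = [[u[t], u[t - 1], y[t - 1]] for t in range(1, N)]
rhs = y[1:]

x_ls = lstsq(A, rhs)
y_hat = [sum(c * v for c, v in zip(x_ls, row)) for row in A]
rms = math.sqrt(sum((yt - yh) ** 2 for yt, yh in zip(rhs, y_hat)) / len(rhs))
print("estimated (a0, a1, b1):", x_ls)
print("RMS residual:", rms)
```

Because the synthetic data satisfy the ARMA recursion exactly, the RMS residual here is zero up to rounding; on the homework data it is not, since those data are not generated by any of the models.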
And the following RMS values for the residuals are obtained:
constant: 1.1181, linear: 0.5940, affine: 0.5210, quadratic: 0.5179,
MA: 0.2504, AR: 0.1783, ARMA: 0.1853.
For the memoryless models, the error decreases as the model becomes more complicated. However, the
models with memory perform significantly better. Among these the error decreases with the
introduction of autoregressive terms. Among the memoryless models, the affine model would
be a good choice, since the more complicated quadratic model yields only slightly smaller
residuals. Overall, the autoregressive model seems to do a good job. Of course, to choose a
model, we should really validate on another batch of data. Note that in this problem we were
only concerned with fitting the model to the data, and not with 'validating' the model, i.e., checking how well
it works for inputs other than the ones used for fitting. The following
plots show ŷ (solid) and the residuals y − ŷ (dashed):
[Figures: ŷ and the residual versus n (n from 0 to 100), one panel each for the constant, linear, affine, quadratic, moving average, autoregressive, and ARMA models.]
$$y = Ax + v,$$
where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$ is the input vector, $y \in \mathbb{R}^m$ is the output vector, and $v \in \mathbb{R}^m$ is the
noise or disturbance. We consider here the problem of estimating the matrix A, given some
input/output data. Specifically, we are given the following:
$$x^{(1)}, \ldots, x^{(N)} \in \mathbb{R}^n, \qquad y^{(1)}, \ldots, y^{(N)} \in \mathbb{R}^m.$$
These represent N samples or observations of the input and output, respectively, possibly
corrupted by noise. In other words, we have
$$y^{(k)} = Ax^{(k)} + v^{(k)}, \quad k = 1, \ldots, N,$$
where $v^{(k)}$ are assumed to be small. The problem is to estimate the (coefficients of the) matrix
A, based on the given input/output data. You will use a least-squares criterion to form an
estimate $\hat A$ of A. Specifically, you will choose as your estimate $\hat A$ the matrix that minimizes
the quantity
$$J = \sum_{k=1}^{N} \|Ax^{(k)} - y^{(k)}\|^2$$
over A.
a) Explain how to do this. If you need to make an assumption about the input/output
data to make your method work, state it clearly. You may want to use the matrices
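$X \in \mathbb{R}^{n \times N}$ and $Y \in \mathbb{R}^{m \times N}$ given by
$$X = \begin{bmatrix} x^{(1)} & \cdots & x^{(N)} \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(N)} \end{bmatrix}$$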
in your solution.
b) On the course web site you will find some input/output data for an instance of this
problem in the file sysid_data.json. Executing this Julia file will assign values to m, n,
and N, and create two matrices that contain the input and output data, respectively. The
$n \times N$ matrix variable X contains the input data $x^{(1)}, \ldots, x^{(N)}$ (i.e., the first column of X
contains $x^{(1)}$, etc.). Similarly, the $m \times N$ matrix Y contains the output data $y^{(1)}, \ldots, y^{(N)}$.
You must give your final estimate Â, your source code, and also give an explanation of
what you did.
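Solution.
a) We start by writing J in terms of the rows of A:
$$J = \sum_{k=1}^{N} \|Ax^{(k)} - y^{(k)}\|^2 = \sum_{k=1}^{N} \sum_{i=1}^{m} \bigl(a_i^Tx^{(k)} - y_i^{(k)}\bigr)^2 = \sum_{i=1}^{m} \left( \sum_{k=1}^{N} \bigl(a_i^Tx^{(k)} - y_i^{(k)}\bigr)^2 \right),$$
where $a_i^T$ is the ith row of A. The last expression shows that J is a sum of expressions $J_i$
(shown in parentheses), each of which only depends on $a_i$. This means that to minimize
J, we can minimize each of these expressions separately. That makes sense: we can
estimate the rows of A separately. Now let's see how to minimize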
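$$J_i = \sum_{k=1}^{N} \bigl(a_i^Tx^{(k)} - y_i^{(k)}\bigr)^2,$$
which is the contribution to J from the ith row of A. First we write it as
$$J_i = \left\| \begin{bmatrix} x^{(1)T} \\ \vdots \\ x^{(N)T} \end{bmatrix} a_i - \begin{bmatrix} y_i^{(1)} \\ \vdots \\ y_i^{(N)} \end{bmatrix} \right\|^2.$$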
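Now that we have the problem in the standard least-squares format, we're pretty much
done. Using the matrices $X \in \mathbb{R}^{n \times N}$ and $Y \in \mathbb{R}^{m \times N}$ given by
$$X = \begin{bmatrix} x^{(1)} & \cdots & x^{(N)} \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(N)} \end{bmatrix},$$
the stacked matrix above is $X^T$ and the stacked vector is the transpose of the ith row of Y, which we call $\tilde y_i$.
Assuming X is full rank with $N \ge n$ (so that $XX^T$ is invertible), the least-squares solution is $a_i = (XX^T)^{-1}X\tilde y_i$.
Collecting these for $i = 1, \ldots, m$ gives $\hat A^T = (XX^T)^{-1}XY^T$, i.e.,
$$\hat A = YX^T(XX^T)^{-1}.$$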
b) Once you have the neat formula found above, it's easy to get matlab to compute the
estimate. It's a little inefficient, but perfectly correct, to simply use
Ahat = Y*X'*inv(X*X');
Once you've got Â, it's a good idea to check the residual J (evaluated at Â), just to make sure it's reasonable,
by comparing it to
$$\sum_{k=1}^{N} \|y^{(k)}\|^2.$$
The ith row of Â can also be computed on its own with the backslash operator, which solves the least-squares problem for that row:
aihat = (X'\(Y(i,:)'))';
In fact, the backslash operator in matlab solves multiple least-squares problems at once,
so you can use
AhatT = X' \ (Y');
Ahat = AhatT';
or
Ahat = (X'\(Y'))';
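The row-by-row decoupling can be illustrated on synthetic data. The plain-Python sketch below does not use sysid_data; it builds noise-free data from a made-up matrix `A_true` (so v = 0 and the estimate should be exact up to rounding), estimates each row separately, and compares the residual J with the baseline sum of squared outputs. All names and values are assumptions for illustration.

```python
import random

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def lstsq(A, y):
    """Least-squares solution of A x = y via the normal equations (A assumed full rank)."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(n)]
    return solve(AtA, Aty)

rng = random.Random(1)
m, n, N = 2, 3, 20
A_true = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]        # hypothetical matrix to recover

# X is stored as n rows of length N; column k is the input sample x^(k).
X = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(n)]
# Noise-free outputs Y = A_true X.
Y = [[sum(A_true[i][j] * X[j][k] for j in range(n)) for k in range(N)]
     for i in range(m)]

# Row i of A_hat solves: minimize || X^T a_i - (row i of Y)^T ||^2.
Xt = [list(col) for col in zip(*X)]
A_hat = [lstsq(Xt, Y[i]) for i in range(m)]

# Residual J at A_hat, and the baseline sum of ||y^(k)||^2 it is compared against.
J = sum((sum(A_hat[i][j] * X[j][k] for j in range(n)) - Y[i][k]) ** 2
        for i in range(m) for k in range(N))
baseline = sum(Y[i][k] ** 2 for i in range(m) for k in range(N))
print("A_hat =", A_hat)
print("J =", J, " baseline =", baseline)
```

With noisy data J would no longer vanish, but it should still be small relative to the baseline if the linear model is a reasonable fit.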
We want both D and C to be small, so we have a problem with two objectives. In general
there will be a trade-off between the two objectives. At one extreme, we can choose G = F ,
which makes D = 0; at the other extreme, we can choose G to be an affine function (i.e.,
to have $G''(t) = 0$ for all $t \in [0, 1]$), in which case C = 0. The problem is to identify the
optimal trade-off curve between C and D, and explain how to find smoothed functions G
on the optimal trade-off curve. To reduce the problem to a finite-dimensional one, we will
represent the functions F and G (approximately) by vectors $f, g \in \mathbb{R}^n$, where
$$f_i = F(i/n), \qquad g_i = G(i/n).$$
You can assume that n is chosen large enough to represent the functions well. Using this
representation we will use the following objectives, which approximate the ones defined for the
functions above:
• Mean-square deviation, defined as
$$d = \frac{1}{n} \sum_{i=1}^{n} (f_i - g_i)^2.$$
b) Get the file curve_smoothing.json from the course web site. This file defines a specific
vector f that you will use. Find and plot the optimal trade-off curve between d and c.
Be sure to identify any critical points (such as, for example, any intersection of the curve
with an axis). Plot the optimal g for the two extreme cases µ = 0 and µ → ∞, and for
three values of µ in between (chosen to show the trade-off nicely). On your plots of g,
be sure to include also a plot of f , say with dotted line type, for reference. Submit your
matlab code.
Solution.
a) Let’s start with the two extreme cases. When µ = 0, finding g to minimize d + µc
reduces to finding g to minimize d. Since d is a sum of squares, d ≥ 0. Choosing g = f
trivially achieves d = 0. This makes perfect sense: to minimize the deviation measure,
just take the smoothed version to be the same as the original function. This yields zero
deviation, naturally, but also, it yields no smoothing! Next, consider the extreme case
where µ → ∞. This means we want to make the curvature as small as possible. Can
we drive it to zero? The answer is yes, we can: the curvature is zero if and only if g is
an affine function, i.e., has the form $g_i = ai + b$ for some constants a and b. There are
lots of vectors g that have this form; in fact, we have one for every pair of numbers a, b.
All of these vectors g make c zero. Which one do we choose? Well, even if µ is huge, we
still have a small contribution to d + µc from d, so among all g that make c = 0, we’d
like the one that minimizes d. Basically, we want to find the best affine approximation,
in the sum of squares sense, to f . We want to find a and b that minimize
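$$\left\| f - A \begin{bmatrix} a \\ b \end{bmatrix} \right\| \quad \text{where} \quad A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ \vdots & \vdots \\ n & 1 \end{bmatrix}.$$
For n ≥ 2, A is skinny and full rank, and a and b can be found using least-squares.
Specifically, $[a \ b]^T = (A^TA)^{-1}A^Tf$. In the general case, minimizing d + µc is the same
as choosing g to minimize
$$\left\| \frac{1}{\sqrt n} Ig - \frac{1}{\sqrt n} f \right\|^2 + \mu \|Sg\|^2, \qquad S = \frac{n^2}{\sqrt{n-2}} \begin{bmatrix} -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 2 & -1 \end{bmatrix},$$
where S is $(n-2) \times n$. This is an ordinary least-squares problem: minimize $\|\tilde Ag - \tilde y\|^2$ with
$$\tilde A = \begin{bmatrix} \frac{1}{\sqrt n} I \\ \sqrt{\mu}\, S \end{bmatrix}, \qquad \tilde y = \begin{bmatrix} \frac{1}{\sqrt n} f \\ 0 \end{bmatrix},$$
so that $g = (\tilde A^T \tilde A)^{-1} \tilde A^T \tilde y$.
The inverse of $\tilde A^T \tilde A$ always exists because $\tilde A$ contains a multiple of I and is therefore full rank. The expression can also be
written as $g = \left(\frac{1}{n} I + \mu S^TS\right)^{-1} \frac{1}{n} f$.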
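The closed form can be sketched in a few lines. The plain-Python example below builds S as defined above and solves $(\frac{1}{n} I + \mu S^TS) g = \frac{1}{n} f$ directly. The test vector f is made up (it is not the vector from curve_smoothing), and the function names are assumptions for illustration.

```python
import math

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def curvature_matrix(n):
    """(n-2) x n scaled second-difference matrix S (requires n >= 3)."""
    scale = n ** 2 / math.sqrt(n - 2)
    S = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        S[i][i], S[i][i + 1], S[i][i + 2] = -scale, 2.0 * scale, -scale
    return S

def smooth(f, mu):
    """Return g minimizing d + mu*c by solving ((1/n) I + mu S^T S) g = (1/n) f."""
    n = len(f)
    S = curvature_matrix(n)
    M = [[mu * sum(S[k][i] * S[k][j] for k in range(n - 2)) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        M[i][i] += 1.0 / n
    return solve(M, [fi / n for fi in f])

def curvature(g):
    """Mean-square curvature c = ||S g||^2."""
    S = curvature_matrix(len(g))
    return sum(sum(s * gi for s, gi in zip(row, g)) ** 2 for row in S)

def deviation(f, g):
    """Mean-square deviation d = (1/n) sum (f_i - g_i)^2."""
    return sum((fi - gi) ** 2 for fi, gi in zip(f, g)) / len(f)

f = [0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 0.5, 2.5]      # made-up wiggly test vector
for mu in (0.0, 1e-5, 1e-3):
    g = smooth(f, mu)
    print(mu, deviation(f, g), curvature(g))
```

As µ grows, the printed deviation should increase and the curvature should decrease, tracing out points on the trade-off curve for this toy f.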
b) The following plots show the optimal trade-off curve and the optimal g corresponding
to representative µ values on the curve.
[Figure: optimal trade-off curve, sum-square curvature (vertical axis, scale ×10^6) versus sum-square deviation (horizontal axis). The curve meets the curvature axis at 1.9724e06 and the deviation axis at 0.3347.]
[Figure: f together with the optimal g for u = 0, u = 10e−7, u = 10e−5, u = 10e−4, and u = infinity.]
The following matlab code finds and plots the optimal trade-off curve between d and c.
It also finds and plots the optimal g for representative values of µ. As expected, when
µ = 0, g = f and no smoothing occurs. At the other extreme, as µ goes to infinity, we
get an affine approximation of f . Intermediate values of µ correspond to approximations
of f with different degrees of smoothness.
close all;
clear all;
curve_smoothing
S = toeplitz([-1; zeros(n-3,1)],[-1 2 -1 zeros(1,n-3)]);
S = S*n^2/(sqrt(n-2));
I = eye(n);
g_no_deviation = f;
error_curvature(1) = norm(S*g_no_deviation)^2;
error_deviation(1) = 0;
u = logspace(-8,-3,30);
for i = 1:length(u)
A_tilde = [1/sqrt(n)*I; sqrt(u(i))*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g = A_tilde\y_tilde;
error_deviation(i+1) = norm(1/sqrt(n)*I*g-f/sqrt(n))^2;
error_curvature(i+1) = norm(S*g)^2;
end
% Best affine fit g_i = a*i + b (the limit mu -> infinity).
a1 = (1:n)';
a2 = ones(n,1);
A = [a1 a2];
affine_param = A\f;   % least-squares solution, equivalent to inv(A'*A)*A'*f
g_no_curvature = A*affine_param;
error_deviation(length(u)+2) = 1/n*norm(g_no_curvature-f)^2;
error_curvature(length(u)+2) = 0;
figure(1);
plot(error_deviation, error_curvature);
xlabel('Sum-square deviation (y intercept = 0.3347)');
ylabel('Sum-square curvature (x intercept = 1.9724e06)');
title('Optimal tradeoff curve ');
print curve_extreme.eps;
u1 = 10e-7;
A_tilde = [1/sqrt(n)*I;sqrt(u1)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g1 = A_tilde\y_tilde;
u2 = 10e-5;
A_tilde = [1/sqrt(n)*I;sqrt(u2)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g2 = A_tilde\y_tilde;
u3 = 10e-4;
A_tilde = [1/sqrt(n)*I;sqrt(u3)*S];
y_tilde = [1/sqrt(n)*f;zeros(n-2,1)];
g3 = A_tilde\y_tilde;
figure(3);
plot(f,'*');
hold on;
plot(g_no_deviation);
plot(g1,'--');
plot(g2,'-.');
plot(g3,'-');
plot(g_no_curvature,':');
axis tight;
legend('f','u = 0','u = 10e-7', 'u = 10e-5', 'u = 10e-4','u = infinity','Location','best');
title('Curves illustrating the trade-off');
print curve_tradeoff.eps;
instead of
$$c = \frac{1}{n-2} \sum_{i=2}^{n-1} \left( \frac{g_{i+1} - 2g_i + g_{i-1}}{1/n^2} \right)^2.$$
The solutions above reflect the second definition. Full credit was given for answers consistent
with either definition. Some common errors:
• Many people chose to zoom in on a small section of the trade-off curve rather than plot
the whole range from µ = 0 to µ → ∞. Those solutions received full credit provided they
calculated the intersections with the axes (i.e., provided they found the minimum value
of d + µc when µ = 0 and when µ → ∞).
5. Hovercraft with limited range. We have a hovercraft moving in the plane with two
thrusters, each pointing through the center of mass, exerting forces in the x and y directions
with 100% efficiency. The hovercraft has mass 1. The discretized equations of motion for the
hovercraft are
$$x(t+1) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} x(t) + \begin{bmatrix} \tfrac12 & 0 \\ 1 & 0 \\ 0 & \tfrac12 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix}$$
where $x_1$ and $x_2$ are the position and velocity in the x-direction, and $x_3$, $x_4$ are the position
and velocity in the y-direction. Here
$$u(t) = \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix}$$
is the force acting on the hovercraft for time in the interval [t, t + 1). Let the position of the
vehicle at time t be $q(t) \in \mathbb{R}^2$.
a) The hovercraft starts at the origin. We’d like to apply thrust to make it move through
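points $p_1, p_2, p_3$ at times $t_1, t_2, t_3$, where
$$p_1 = \begin{bmatrix} 1 \\ -\tfrac12 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad p_3 = \begin{bmatrix} -\tfrac32 \\ 0 \end{bmatrix},$$
$$t_1 = 6, \qquad t_2 = 40, \qquad t_3 = 50.$$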
We will run the hovercraft on the time interval [0, 70]. We’d like to apply a sequence
of inputs u(0), u(1), ..., u(70) to make the hovercraft position pass through the above
sequence of points at the specified times.
We would like to find the sequence of inputs that drives the hovercraft through the
desired points which has the minimum cost, given by the sum of the squares of the
forces:
$$\sum_{t=0}^{70} \|u(t)\|^2.$$
To do this, pick $A_{\mathrm{hov}}$ and $y_{\mathrm{des}}$ to set this problem up as an equivalent minimum-norm
problem, where we would like to find the minimum-norm $u_{\mathrm{seq}}$ which satisfies
$$A_{\mathrm{hov}} u_{\mathrm{seq}} = y_{\mathrm{des}}.$$
Plot the trajectory of the hovercraft using this input, and the waypoints $p_1, \ldots, p_3$. Also
plot the optimal u against time.
b) Now we would like to compute the trade-off curve between the accuracy with which the
mass passes through the waypoints and the norm of the force used. Let our two objective
functions be
$$J_1 = \sum_{i=1}^{3} \|q(t_i) - p_i\|^2 = \|A_{\mathrm{hov}} u_{\mathrm{seq}} - y_{\mathrm{des}}\|^2$$
and
$$J_2 = \sum_{t=0}^{70} \|u(t)\|^2.$$
By minimizing the weighted sum
$$J_1 + \mu J_2$$
for a range of values of µ, plot the trade-off curve of $J_1$ against $J_2$ showing the achievable
performance. To generate suitable values of µ, you may find the logspace command
useful in Matlab; you'll need to pick appropriate maximum and minimum values. This
trade-off curve shows how we can trade off between how accurately the hovercraft
passes through the waypoints and how much input energy is used.
c) For each of the following values of µ
$$\{\, 10^{p/2} \mid p = -2, 0, 2, \ldots, 10 \,\}$$
plot the trajectories all on the same plot, together with the waypoints.
d) Now suppose we are controlling the hovercraft by radio control, and the maximum range
possible between the transmitter and receiver is 2 (in whatever units we are using for
distance.) Notice that, if we use the minimum-norm input then the hovercraft passes
out of range, both when making its first turn and on the final stretch (between times 50
and 70).
We’d like to do something about this, but trading off the input norm as above doesn’t
do the right thing; if µ is large then the hovercraft stays within range, but misses the
waypoints entirely; if µ is small then it comes close to the waypoints, but goes out of
range. Notice that this is particularly a problem on the final stretch between times 50
and 70; explain why this is.
e) One remedy for this problem is to solve a constrained multiobjective least-squares problem.
We would like to impose the constraint that
$$A_{\mathrm{hov}} u_{\mathrm{seq}} = y_{\mathrm{des}},$$
that is, achieve zero waypoint error $J_1 = 0$. We can attempt to keep the hovercraft in
range by trading off the sum of the squares of the position
$$J_3 = \sum_{t=0}^{70} \|q(t)\|^2.$$
First, find the matrix W so that the cost function is given by
$$J_3 + \gamma J_2 = \|W u_{\mathrm{seq}}\|^2.$$
$$\begin{array}{ll} \text{minimize} & \|Wu\|^2 \\ \text{subject to} & Au = y_{\mathrm{des}} \end{array}$$
This is called a weighted minimum-norm solution; the only difference from the usual
minimum-norm solution to $Au = y_{\mathrm{des}}$ is the presence of the matrix W, and when W = I
the optimal u is just given by $u_{\mathrm{opt}} = A^\dagger y_{\mathrm{des}}$. Show that the solution for general W is
$$u_{\mathrm{opt}} = \Sigma^{-1}A^T(A\Sigma^{-1}A^T)^{-1} y_{\mathrm{des}},$$
where $\Sigma = W^TW$. (One way to do this is using Lagrange multipliers.) Use this to solve
the remaining parts of this problem.
Plot the trajectories all on the same plot, together with the waypoints. Explain what
you see.
h) By trying different values of γ, you should be able to find a trajectory which just keeps the
hovercraft within range. Plot the trajectory of the hovercraft; what is the corresponding
value of γ? Is this the smallest-norm input u that just keeps the hovercraft within range,
and drives the hovercraft through the waypoints? Explain why, or why not.
i) For a range of values of γ, plot the trade-off curve of J3 against J2 showing the achievable
performance.
Solution.
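a) Setting
$$C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
gives the position of the hovercraft at time t as
$$y(t) = \sum_{\tau=0}^{t-1} CA^{t-1-\tau}Bu(\tau),$$
where A and B are the matrices in the equations of motion. Writing this out for $t = t_1, t_2, t_3$ gives
$$A_{\mathrm{hov}} = \begin{bmatrix} CA^{t_1-1}B & CA^{t_1-2}B & \cdots & CB & 0 & 0 & \cdots & 0 \\ CA^{t_2-1}B & CA^{t_2-2}B & \cdots & & & & & 0 \\ CA^{t_3-1}B & CA^{t_3-2}B & \cdots & & & & & 0 \end{bmatrix}, \qquad y_{\mathrm{des}} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}.$$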
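Solving the minimum-norm problem $A_{\mathrm{hov}} u_{\mathrm{seq}} = y_{\mathrm{des}}$, where
$$u_{\mathrm{seq}} = \begin{bmatrix} u(0) \\ \vdots \\ u(69) \end{bmatrix},$$
gives the optimal trajectory and input shown below.
[Figure: optimal hovercraft trajectory in the plane (both axes from −2 to 2), with the waypoints.]
[Figure: optimal input u versus time, for t from 0 to 70.]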
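The construction in part (a) can be sketched in plain Python. Instead of forming the powers of A explicitly, the sketch below builds each column of the waypoint matrix by simulating the hovercraft with a unit input (which gives the same matrix by linearity), then computes the minimum-norm input as $A^T(AA^T)^{-1} y_{\mathrm{des}}$. The waypoint coordinates in `WAYPOINTS` are illustrative values, and the helper names are assumptions.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

T_END = 70                                            # inputs u(0), ..., u(69)
TIMES = [6, 40, 50]                                   # waypoint times
WAYPOINTS = [(1.0, -0.5), (0.0, 1.0), (-1.5, 0.0)]    # illustrative values

def positions(useq, times):
    """Simulate from rest at the origin; return the stacked positions at `times`."""
    px = vx = py = vy = 0.0
    out = {}
    for t in range(T_END + 1):
        if t in times:
            out[t] = (px, py)
        if t < T_END:
            u1, u2 = useq[2 * t], useq[2 * t + 1]
            px, vx = px + vx + 0.5 * u1, vx + u1
            py, vy = py + vy + 0.5 * u2, vy + u2
    return [c for t in times for c in out[t]]

# Column k of A_hov is the response to a unit input in entry k of u_seq.
n_u = 2 * T_END
cols = []
for k in range(n_u):
    e = [0.0] * n_u
    e[k] = 1.0
    cols.append(positions(e, TIMES))
A_hov = [[cols[k][i] for k in range(n_u)] for i in range(6)]
y_des = [c for p in WAYPOINTS for c in p]

# Minimum-norm solution u_seq = A^T (A A^T)^{-1} y_des.
AAt = [[sum(a * b for a, b in zip(ri, rj)) for rj in A_hov] for ri in A_hov]
w = solve(AAt, y_des)
u_seq = [sum(A_hov[i][k] * w[i] for i in range(6)) for k in range(n_u)]

print("positions reached:", positions(u_seq, TIMES))
print("input energy:", sum(v * v for v in u_seq))
```

In this sketch the computed input is zero from the last waypoint time onward, because inputs applied at or after that time cannot affect any waypoint.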
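b) The weighted sum can be written as a single least-squares objective,
$$J_1 + \mu J_2 = \left\| \begin{bmatrix} A_{\mathrm{way}} \\ \sqrt{\mu}\, I \end{bmatrix} u_{\mathrm{seq}} - \begin{bmatrix} y_{\mathrm{des}} \\ 0 \end{bmatrix} \right\|^2,$$
and so the optimal input sequence is given by
$$u_{\mathrm{seq}} = \begin{bmatrix} A_{\mathrm{way}} \\ \sqrt{\mu}\, I \end{bmatrix}^\dagger \begin{bmatrix} y_{\mathrm{des}} \\ 0 \end{bmatrix},$$
where $A_{\mathrm{way}}$ denotes the waypoint matrix $A_{\mathrm{hov}}$ from part (a).
Choosing values of µ between 1 and $10^7$ using mus=logspace(0,7,50), the trade-off
curve is shown below.
[Figure: trade-off curve of J1 (vertical axis) against J2 (horizontal axis).]
c) [Figure: hovercraft trajectories for the listed values of µ, with the waypoints (both axes from −2 to 2).]
We can see clearly that increasing µ reduces the accuracy with which the trajectory
passes through the waypoints.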
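For computation, the regularized solution of part (b) can be written in two equivalent closed forms. The block below is a sketch using the standard normal equations and the push-through identity; $A$ abbreviates the waypoint matrix and the formulas assume $\mu > 0$.

```latex
\[
u_{\mathrm{seq}}
  = (A^T A + \mu I)^{-1} A^T y_{\mathrm{des}}
  = A^T (A A^T + \mu I)^{-1} y_{\mathrm{des}},
  \qquad \mu > 0 .
\]
% The second form only requires solving a system with as many unknowns as
% there are waypoint coordinates (six here). As \mu -> 0, and with A full
% row rank, it tends to A^T (A A^T)^{-1} y_des, the minimum-norm solution
% of part (a).
```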
d) On the final stretch the input is zero, and so is unaffected by increasing µ. We were
attempting to use the heuristic 'keeping u small keeps x small' but this fails, because
when u = 0 the hovercraft just keeps going in a straight line.
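e) We would like to minimize $J_3 + \gamma J_2$ subject to the constraints that the hovercraft moves
through the waypoints. Denote the sequence of positions of the hovercraft by
$$y_{\mathrm{seq}} = \begin{bmatrix} y(0) \\ \vdots \\ y(70) \end{bmatrix}.$$
Each position is a linear function of the inputs, so $y_{\mathrm{seq}} = Tu_{\mathrm{seq}}$ for a matrix T whose blocks have the same form as those of $A_{\mathrm{hov}}$. Then
$$J_3 + \gamma J_2 = \|Tu_{\mathrm{seq}}\|^2 + \gamma \|u_{\mathrm{seq}}\|^2 = \|Wu_{\mathrm{seq}}\|^2, \quad \text{where} \quad W = \begin{bmatrix} T \\ \sqrt{\gamma}\, I \end{bmatrix}.$$
f) We want to solve
$$\begin{array}{ll} \text{minimize} & \|Wu\|^2 \\ \text{subject to} & Au = y_{\mathrm{des}} \end{array}$$
One way to solve this is using Lagrange multipliers; if we augment the cost function by
the Lagrange multipliers multiplied by the constraints, we have
$$L(u, \lambda) = u^T \Sigma u + \lambda^T (Au - y_{\mathrm{des}}),$$
with $\Sigma = W^TW$. Setting the derivatives with respect to u and λ to zero gives the optimality conditions
$$2\Sigma u_{\mathrm{opt}} + A^T\lambda = 0, \qquad Au_{\mathrm{opt}} = y_{\mathrm{des}}.$$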
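The first condition gives
$$u_{\mathrm{opt}} = -\tfrac{1}{2} \Sigma^{-1} A^T \lambda$$
and substituting this into the second we have
$$-\tfrac{1}{2} A \Sigma^{-1} A^T \lambda = y_{\mathrm{des}},$$
hence
$$\lambda = -2 (A \Sigma^{-1} A^T)^{-1} y_{\mathrm{des}}$$
and
$$u_{\mathrm{opt}} = \Sigma^{-1} A^T (A \Sigma^{-1} A^T)^{-1} y_{\mathrm{des}}$$
as desired.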
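The weighted minimum-norm formula can be sanity-checked on a tiny case. The plain-Python sketch below assumes a single constraint (A is 1 × 2) and a diagonal Σ, so the matrix inverses reduce to scalar divisions; all numbers are made up for illustration. It verifies that the formula gives a feasible point and that moving along the null space of A only increases the cost.

```python
# Weighted minimum-norm check for one constraint a^T u = y_des and diagonal Sigma.
a = [1.0, 2.0]          # hypothetical 1 x 2 matrix A
sigma = [1.0, 4.0]      # diagonal of Sigma = W^T W (hypothetical weights)
y_des = 3.0

def cost(u):
    """Weighted cost u^T Sigma u = ||W u||^2 for diagonal Sigma."""
    return sum(s * ui ** 2 for s, ui in zip(sigma, u))

# u_opt = Sigma^{-1} A^T (A Sigma^{-1} A^T)^{-1} y_des, written out for this case.
denom = sum(ai ** 2 / si for ai, si in zip(a, sigma))   # A Sigma^{-1} A^T, a scalar here
u_opt = [(ai / si) * y_des / denom for ai, si in zip(a, sigma)]

# Every feasible point is u_opt + t * null, where null spans the null space of A.
null = [a[1], -a[0]]
others = [[ui + t * ni for ui, ni in zip(u_opt, null)] for t in (-1.0, -0.1, 0.1, 1.0)]

print("u_opt =", u_opt, " cost =", cost(u_opt))
for o in others:
    print("feasible alternative", o, " cost =", cost(o))
```

For these numbers the formula gives u_opt = [1.5, 0.75] with cost 4.5, and each alternative has a strictly larger cost.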
g) The trajectory for a range of γ values is shown below. (Actually these are clearer on
separate plots.)
[Figure: eight trajectory plots (both axes from −2 to 2), one for each of γ = 1, 100, 1000, 3000, 10000, 1e+06, 1e+08, 1e+10.]
We can see the trade-off clearly; decreasing γ causes the hovercraft to try very hard to
stay close to the origin. Also notice the asymmetry caused by the different times at
which the hovercraft must be at the waypoints.
h) A good choice of γ is about $1.7 \times 10^4$. Here the trajectory just remains within
range, as shown below.
[Figure: hovercraft trajectory for γ ≈ 1.7 × 10^4 (both axes from −2 to 2).]
This is not the smallest-norm u that keeps the hovercraft within range and drives the
hovercraft through the waypoints, because we are minimizing the sum of the squares
of $\|q(t)\|$, rather than constraining each $\|q(t)\|$ independently. You can see this in the
plot, since in the final stretch the hovercraft is expending extra effort to stay well within
range, and this excessive input could be reduced.
In fact, one can compute the exact optimum, but this is not required and not covered in
this course; (an approximation of) it is below.
[Figure: approximation of the exact optimal trajectory (both axes from −2 to 2).]
i) [Figure: trade-off curve of J3 (vertical axis, 0 to 100) against J2 (horizontal axis, 0 to 2).]
Notice that the vertical asymptote occurs when $J_2 \approx 0.03$; this is the minimum norm of
u which drives the hovercraft through the desired trajectory, as seen in part (b).
6. You Must Construct Additional Pylons. You are the Hierarch of the Baelaam charged
with maintaining the power levels of energizing pylons which power various structures in your
base of operations. Consider m structures powered by n pylons. Each structure's energy level
$y_j$ for $j = 1, \ldots, m$ is given by
$$y_j(p) = \log\left( \sum_{i=1}^{n} \exp\left( \frac{p_i}{d_{j,i}^2} \right) \right)$$
where $p_i$ is the power level of the ith pylon and $d_{j,i}$ is the distance between the jth
structure and the ith pylon (we choose log-sum-exp as a smooth approximation of the max
function). While each structure has some given target energy level $R_j$, it can handle some
deviation (either over or under); however, that will cause damage to the Nexus Crystals that
act as energy conduits for the structure. Your goal as Hierarch is to find a set of pylon power
levels $p \in \mathbb{R}^n$ that minimizes the total square deviation, J, from the required energy levels:
$$J(p) = \sum_{j=1}^{m} (R_j - y_j(p))^2.$$
Your chief engineer proposes that you could linearize the yj (p) function to find an update
algorithm that starts with some initial pylon power level and changes the power each step by
a small amount to reduce the total energy deviation J.
a) Find an update expression for the approximate energy level y(p + δp) as a linear dynamical
system, where $y \in \mathbb{R}^m$ is the vector of structure energy levels. I.e., find A and B such
that
$$y(p + \delta p) \approx Ay(p) + B\delta p.$$
b) Derive an expression for the one-step change in power levels that minimizes
$$J(p + \delta p) = \sum_{j=1}^{m} (R_j - y_j(p + \delta p))^2$$
as a function of y(p), A, B, δp. Use the result of this minimization problem (the optimal
δp) to determine an update expression for p[k + 1] = p[k] + αδp, where α is a given step
size, and k is the current iteration. If your method involves an inverse, explain what
conditions must hold in order for the inverse to exist.
c) Given the following list of required energy levels and locations of each structure and
pylon, apply your algorithm for 200 iterations with α = 0.01 and an initial power level
of p[0] = [20, 40, 20].
Plot the pylon power levels and the structures' energy levels for each iteration, as well
as the power deviation metric J. There should be 3 plots total. It should converge
in roughly 150-200 iterations.
Also report the final cost and pylon power levels.
Stucture_energy_goal=[10, 20 , 5 , 10 , 5]
Strcuture_location=[2 8; 4 5; 6 8; 2 2; 4 1]
Pylon_location=[2 5; 3 4 ; 5 4]
p0=[20 40 20]
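Solution.
a) 6 points
First we linearize the energy function by finding the Jacobian of the energy function.
Clearly A = I, but B is a bit more involved:
$$y(p + \delta p) \approx y(p) + B\delta p = y(p) + \begin{bmatrix} \frac{\partial y_1}{\partial p_1} & \cdots & \frac{\partial y_1}{\partial p_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial p_1} & \cdots & \frac{\partial y_m}{\partial p_n} \end{bmatrix} \delta p.$$
We see that B is now just a function of p. Partial credit was given to those who
attempted to find B.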
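For reference, the entries of B follow from applying the chain rule to the log-sum-exp expression for $y_j(p)$. The block below is a worked sketch; the summation index $l$ is introduced here only for the inner sum.

```latex
\[
B_{j,i} = \frac{\partial y_j}{\partial p_i}
  = \frac{\exp\!\left(p_i / d_{j,i}^2\right)}
         {d_{j,i}^2 \, \sum_{l=1}^{n} \exp\!\left(p_l / d_{j,l}^2\right)},
  \qquad j = 1, \ldots, m, \quad i = 1, \ldots, n .
\]
% Row j of B is the softmax of the scaled powers p_i / d_{j,i}^2, with each
% entry divided by the corresponding squared distance d_{j,i}^2.
```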
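b) 7 points
We can substitute our expression for y[k + 1] here:
$$J(p + \delta p) = \|R - y[k] - H\delta p\|^2 = \|z - H\delta p\|^2,$$
where z = R − y[k] (the last error vector) and H is the Jacobian
matrix of the energy function (the matrix B of part (a)). This is a least-squares problem in δp, which yields
$\delta p = (H^TH)^{-1}H^Tz$ as the expression that minimizes the one-step cost. The inverse exists when the
columns of H are linearly independent (which requires m ≥ n). Thus the update to the pylon power is given by
$$p[k+1] = p[k] + \alpha (H^TH)^{-1}H^Tz.$$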
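The update can be sketched in plain Python as a damped Gauss-Newton iteration. The sketch assumes the distances $d_{j,i}$ are Euclidean distances between the listed locations and evaluates the log-sum-exp in a shifted form to avoid overflow; the function names are made up for illustration, and no particular final values are claimed.

```python
import math

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def energies_and_jacobian(p, d2):
    """Return y(p) and the Jacobian H; d2[j][i] is the squared distance d_{j,i}^2."""
    y, H = [], []
    for row in d2:
        a = [pi / dji for pi, dji in zip(p, row)]
        top = max(a)                                  # shift for a stable log-sum-exp
        e = [math.exp(ai - top) for ai in a]
        total = sum(e)
        y.append(top + math.log(total))
        H.append([ei / total / dji for ei, dji in zip(e, row)])
    return y, H

def gauss_newton(R, d2, p0, alpha, iters):
    """Iterate p <- p + alpha * (H^T H)^{-1} H^T z; return final p and the history of J."""
    p = list(p0)
    n = len(p)
    J_hist = []
    for _ in range(iters):
        y, H = energies_and_jacobian(p, d2)
        z = [Rj - yj for Rj, yj in zip(R, y)]
        J_hist.append(sum(zj ** 2 for zj in z))       # cost before this step
        HtH = [[sum(h[i] * h[j] for h in H) for j in range(n)] for i in range(n)]
        Htz = [sum(h[i] * zj for h, zj in zip(H, z)) for i in range(n)]
        dp = solve(HtH, Htz)
        p = [pi + alpha * di for pi, di in zip(p, dp)]
    return p, J_hist

# Data from part (c); distances are assumed to be Euclidean.
R = [10.0, 20.0, 5.0, 10.0, 5.0]
structures = [(2, 8), (4, 5), (6, 8), (2, 2), (4, 1)]
pylons = [(2, 5), (3, 4), (5, 4)]
d2 = [[float((sx - px) ** 2 + (sy - py) ** 2) for px, py in pylons]
      for sx, sy in structures]

p_final, J_hist = gauss_newton(R, d2, [20.0, 40.0, 20.0], 0.01, 200)
print("final pylon powers:", p_final)
print("first and last recorded cost:", J_hist[0], J_hist[-1])
```

With a single structure and a single pylon the energy function is exactly linear, so one full step (α = 1) lands on the target; this gives a simple way to check an implementation before running it on the data above.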
[Figures: pylon power levels (pylon.png) and structure energy levels (structure.png) versus iteration.]
[Figure 3: cost (cost.png).]