Chapter 2

Least Squares Estimation
2.1 Introduction
If the criterion used to measure the error $e = y - Ax$ in the case of an inconsistent system of
equations is the sum of squared magnitudes of the error components, i.e. $e'e$, or equivalently
the square root of this, which is the usual Euclidean norm or 2-norm $\|e\|_2$, then the problem
is called a least squares problem. Formally it can be written as
$$\min_x \; \|y - Ax\|_2 . \qquad (2.1)$$
The x that minimizes this criterion is called the least square error estimate, or more simply,
the least squares estimate. The choice of this criterion and the solution of the problem go
back to Legendre (1805) and Gauss (around the same time).
Suppose, for example, that we wish to fit a polynomial of degree $m-1$, namely $\alpha_0 + \alpha_1 t + \cdots + \alpha_{m-1} t^{m-1}$, to measurements $y_1, \ldots, y_N$ taken at times $t_1, \ldots, t_N$, with fitting errors $e_1, \ldots, e_N$. Stacking the measurement equations gives
$$\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}}_{y}
= \underbrace{\begin{bmatrix} 1 & t_1 & \cdots & t_1^{m-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_N & \cdots & t_N^{m-1} \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} \alpha_0 \\ \vdots \\ \alpha_{m-1} \end{bmatrix}}_{x}
+ \underbrace{\begin{bmatrix} e_1 \\ \vdots \\ e_N \end{bmatrix}}_{e}$$
The problem is to find $\alpha_0, \ldots, \alpha_{m-1}$ such that $e'e = \sum_{i=1}^{N} e_i^2$ is minimized.
This solves the least squares estimation problem that we have posed.
The above result, though rather abstractly developed, is immediately applicable to many
concrete cases of interest.
• Specializing to the case of $R^m$ or $C^m$, and choosing $x$ to minimize the usual Euclidean
norm,
$$\|e\|^2 = e'e = \sum_{i=1}^{m} |e_i|^2 ,$$
we have
$$\hat{x} = (A'A)^{-1} A'y .$$
Note that if the columns of $A$ form a mutually orthogonal set (i.e. an orthogonal basis
for $R(A)$), then $A'A$ is diagonal, and its inversion is trivial. (A short numerical
illustration of this formula and its weighted variant follows this list.)
• If instead we choose to minimize $e'Se$ for some positive definite Hermitian $S$ ($\neq I$), we
have a weighted least squares problem, with solution given by
$$\hat{x} = (A'SA)^{-1} A'Sy .$$
• The same development applies when we choose $x = (x_1, \ldots, x_n)$ to minimize the integrated
squared error $\int |y(t) - \sum_i x_i a_i(t)|^2 \, dt$
for specified functions $y(t)$ and $a_i(t)$. If, for instance, $y(t)$ is of finite extent (or finite
"support") $T$, and the $a_i(t)$ are sinusoids whose frequencies are integral multiples of
$2\pi/T$, then the formulas that we obtain for the $x_i$ are just the familiar Fourier series
expressions. A simplification in this example is that the vectors in $A$ are orthogonal, so
$\langle A, A \rangle$ is diagonal.
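As a concrete illustration of these formulas, here is a minimal MATLAB sketch (the matrices A, y, and S below are made up purely for illustration) that computes both the ordinary and the weighted least squares estimates:

    % Ordinary and weighted least squares estimates on made-up data.
    A = [1 0; 1 1; 1 2; 1 3];        % an arbitrary full-column-rank A
    y = [0.1; 1.1; 1.9; 3.2];        % an arbitrary measurement vector
    S = diag([1 1 4 4]);             % an arbitrary positive definite weight matrix

    xhat_ls  = (A'*A) \ (A'*y);      % xhat = (A'A)^{-1} A'y
    xhat_bs  = A \ y;                % MATLAB's backslash solves the same problem
    xhat_wls = (A'*S*A) \ (A'*S*y);  % weighted least squares estimate

In practice the backslash (QR-based) solution is preferred to forming $A'A$ explicitly, since the normal equations square the condition number of $A$.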
2.6 Recursive Least Squares (optional)
What if the data is coming in sequentially? Do we have to recompute everything each time
a new data point comes in, or can we write our new, updated estimate in terms of our old
estimate?
Consider the model
$$y_i = A_i x + e_i , \qquad i = 0, 1, \ldots , \qquad (2.2)$$
where $y_i \in C^{m \times 1}$, $A_i \in C^{m \times n}$, $x \in C^{n \times 1}$, and $e_i \in C^{m \times 1}$. The vector $e_k$ represents the
mismatch between the measurement $y_k$ and the model for it, $A_k x$, where $A_k$ is known and $x$
is the vector of parameters to be estimated. At each time $k$, we wish to find
$$\hat{x}_k = \arg\min_x \sum_{i=1}^{k} (y_i - A_i x)' S_i (y_i - A_i x) = \arg\min_x \sum_{i=1}^{k} e_i' S_i e_i , \qquad (2.3)$$
where $S_i \in C^{m \times m}$ is a positive definite Hermitian matrix of weights, so that we can vary the
importance of the $e_i$'s and components of the $e_i$'s in determining $\hat{x}_k$.
To compute $\hat{x}_{k+1}$, let
$$\bar{y}_{k+1} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{k+1} \end{bmatrix} , \qquad
\bar{A}_{k+1} = \begin{bmatrix} A_0 \\ A_1 \\ \vdots \\ A_{k+1} \end{bmatrix} , \qquad
\bar{e}_{k+1} = \begin{bmatrix} e_0 \\ e_1 \\ \vdots \\ e_{k+1} \end{bmatrix} ,$$
and
$$\bar{S}_{k+1} = \mathrm{diag}\,(S_0, S_1, \ldots, S_{k+1}) ,$$
where $S_i$ is the weighting matrix for $e_i$.
Our problem is then equivalent to
$$\min_x \; \bar{e}_{k+1}' \, \bar{S}_{k+1} \, \bar{e}_{k+1}$$
subject to: $\bar{y}_{k+1} = \bar{A}_{k+1} x + \bar{e}_{k+1}$.
The solution can thus be written as
$$\left( \bar{A}_{k+1}' \, \bar{S}_{k+1} \, \bar{A}_{k+1} \right) \hat{x}_{k+1} = \bar{A}_{k+1}' \, \bar{S}_{k+1} \, \bar{y}_{k+1} ,$$
or in summation form as
$$\left( \sum_{i=0}^{k+1} A_i' S_i A_i \right) \hat{x}_{k+1} = \sum_{i=0}^{k+1} A_i' S_i y_i .$$
Defining
$$Q_{k+1} = \sum_{i=0}^{k+1} A_i' S_i A_i ,$$
we can write a recursion for $Q_{k+1}$ as follows:
$$Q_{k+1} = Q_k + A_{k+1}' S_{k+1} A_{k+1} .$$
Rearranging the summation form equation for $\hat{x}_{k+1}$, we get
$$\hat{x}_{k+1} = Q_{k+1}^{-1} \left[ \left( \sum_{i=0}^{k} A_i' S_i A_i \right) \hat{x}_k + A_{k+1}' S_{k+1} y_{k+1} \right]
= Q_{k+1}^{-1} \left[ Q_k \hat{x}_k + A_{k+1}' S_{k+1} y_{k+1} \right] .$$
This clearly displays the new estimate as a weighted combination of the old estimate and the
new data, so we have the desired recursion. Another useful form of this result is obtained by
substituting from the recursion for $Q_{k+1}$ above to get
$$\hat{x}_{k+1} = \hat{x}_k - Q_{k+1}^{-1} \left( A_{k+1}' S_{k+1} A_{k+1} \hat{x}_k - A_{k+1}' S_{k+1} y_{k+1} \right) ,$$
which finally reduces to
$$\hat{x}_{k+1} = \hat{x}_k + \underbrace{Q_{k+1}^{-1} A_{k+1}' S_{k+1}}_{\text{Kalman filter gain}} \; \underbrace{\left( y_{k+1} - A_{k+1} \hat{x}_k \right)}_{\text{innovations}} .$$
The quantity $Q_{k+1}^{-1} A_{k+1}' S_{k+1}$ is called the Kalman gain, and $y_{k+1} - A_{k+1} \hat{x}_k$ is called the
innovations, since it is the difference between the new measurement and the prediction of it
based on the last estimate.
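A minimal MATLAB sketch of this recursion is given below, assuming scalar measurements ($m = 1$), unit weights $S_i = 1$, and made-up data generated from a known parameter vector purely for illustration; $Q$ is initialized with a small multiple of the identity so that it is invertible from the first step (a weak prior, not part of the recursion above):

    % Recursive least squares, scalar measurements with S_i = 1; illustrative sketch.
    n = 2;                                   % number of parameters
    x_true = [1; -0.5];                      % used only to generate the fake data
    Q = 1e-6 * eye(n);                       % small initial Q so Q is invertible
    xhat = zeros(n, 1);                      % initial estimate
    for k = 1:50
        Ak = randn(1, n);                    % k-th measurement row A_k
        yk = Ak * x_true + 0.1 * randn;      % noisy measurement y_k
        Q = Q + Ak' * Ak;                    % Q_{k+1} = Q_k + A' S A
        xhat = xhat + Q \ (Ak' * (yk - Ak * xhat));   % gain times innovations
    end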
Unfortunately, as one acquires more and more data, i.e. as k grows large, the Kalman gain
goes to zero. One data point cannot make much headway against the mass of previous data
which has 'hardened' the estimate. If we leave this estimator as is, without modification, the
estimator 'goes to sleep' after a while, and thus doesn't adapt well to parameter changes. The
homework investigates the concept of a 'fading memory' so that the estimator doesn't go to
sleep.
An Implementation Issue
Another concept which is important in the implementation of the RLS algorithm is the
computation of $Q_{k+1}^{-1}$. If the dimension of $Q_k$ is very large, computation of its inverse can be
computationally expensive, so one would like to have a recursion for $Q_{k+1}^{-1}$.
This recursion is easy to obtain. Applying the handy matrix inversion identity
$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( D A^{-1} B + C^{-1} \right)^{-1} D A^{-1}$$
to $Q_{k+1} = Q_k + A_{k+1}' S_{k+1} A_{k+1}$ gives a recursion that propagates $Q_{k+1}^{-1}$ directly from $Q_k^{-1}$,
without forming or inverting $Q_{k+1}$ itself.
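As a quick numerical sanity check of this identity in the RLS setting (all matrices below are made up for illustration), the lemma-based update of the inverse can be compared against a direct inversion:

    % Check the matrix-inversion-lemma update of Q^{-1}; illustrative sketch.
    n = 4;  m = 2;
    Qk  = randn(n);  Qk = Qk'*Qk + eye(n);    % an arbitrary positive definite Q_k
    Ak1 = randn(m, n);                        % A_{k+1}
    Sk1 = eye(m);                             % S_{k+1}

    P = inv(Qk);                              % P = Q_k^{-1}
    Pnew = P - P*Ak1' * ((Ak1*P*Ak1' + inv(Sk1)) \ (Ak1*P));   % lemma-based update
    disp(norm(Pnew - inv(Qk + Ak1'*Sk1*Ak1)))                  % should be near zero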
rho = ones(size(a))./sqrt(a);
(a) Suppose 16 exact measurements of $f(t)$ are available to you, taken at the times $t_i$ listed in the
array $T$ below:
Compare the quality of the two approximations by plotting $y(t_i)$, $p_{15}(t_i)$ and $p_2(t_i)$ for all $t_i$
in $T$. To see how well we are approximating the function on the whole interval, also plot $f(t)$,
$p_{15}(t)$ and $p_2(t)$ on the interval $[0, 2]$. (Pick a very fine grid for the interval, e.g. t=[0:1000]'/500.)
Report your observations and comments.
(b) Now suppose that your measurements are affected by some noise. Generate the measurements
using
$$y_i = f(t_i) + e(t_i) , \qquad i = 1, \ldots, 16 , \quad t_i \in T ,$$
where the vector of noise values can be generated in the following way:

randn('seed', 0);
e = randn(size(T));

Again determine the coefficients of the least square error polynomial approximation of the
measurements for
1. a polynomial of degree 15, $p_{15}(t)$;
2. a polynomial of degree 2, $p_2(t)$.
Compare the two approximations as in part (a). Report your observations and comments.
Explain any surprising results.
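One way to set up these fits, sketched here under the assumption that T is a column vector holding the 16 measurement times and y the corresponding (exact or noisy) measurements, is to form the polynomial data matrix of Section 2.1 and solve the least squares problem:

    % Least squares polynomial fit of degree d to data (T, y); illustrative sketch.
    d = 2;                          % polynomial degree (use 15 for p15)
    A = zeros(length(T), d + 1);
    for j = 0:d
        A(:, j + 1) = T.^j;         % columns 1, t, t^2, ..., t^d
    end
    alpha = A \ y;                  % least squares coefficients alpha_0, ..., alpha_d
    p = A * alpha;                  % fitted values p_d(t_i) at the measurement times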
(c) So far we have obtained polynomial approximations of $f(t)$, $t \in [0, 2]$, by approximating the
measurements at $t_i \in T$. We are now interested in minimizing the square error of the polynomial
$e_1^T S_1 e_1 + e_2^T S_2 e_2$ can be written entirely in terms of $\hat{x}_1$, $\hat{x}_2$, and the $n \times n$ matrices $Q_1 = C_1^T S_1 C_1$ and
$Q_2 = C_2^T S_2 C_2$. What is the significance of this result?
(a) Show (by reducing this to a problem that we already know how to solve; don't start from
scratch!) that the value $\hat{x}_k$ of $x$ that minimizes the criterion
$$\sum_{i=1}^{k} f^{k-i} e_i^2 , \qquad \text{some fixed } f , \; 0 < f \le 1 ,$$
is given by
$$\hat{x}_k = \left( \sum_{i=1}^{k} f^{k-i} c_i^T c_i \right)^{-1} \left( \sum_{i=1}^{k} f^{k-i} c_i^T y_i \right) .$$
The so-called fade or forgetting factor $f$ allows us to preferentially weight the more recent
measurements by picking $0 < f < 1$, so that old data is discounted at an exponential rate. We
then say that the data has been subjected to exponential fading or forgetting or weighting or
windowing or tapering or ... . This is usually desirable, in order to keep the filter adaptive to
changes that may occur in $x$. Otherwise the filter becomes progressively less attentive to new
data and falls asleep, with its gain approaching 0.
(b) Now show that
$$\hat{x}_k = \hat{x}_{k-1} + Q_k^{-1} c_k^T (y_k - c_k \hat{x}_{k-1}) ,$$
where
$$Q_k = f Q_{k-1} + c_k^T c_k , \qquad Q_0 = 0 .$$
The vector $g_k = Q_k^{-1} c_k^T$ is termed the gain of the estimator.
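A compact MATLAB sketch of this fading-memory recursion (scalar measurements; the regressor rows c_k and data y_k are made up purely for illustration, and Q is started from a small multiple of the identity rather than exactly 0 so that its inverse exists from the first step) is:

    % Exponentially weighted (fading-memory) RLS; illustrative sketch.
    f = 0.95;                                % forgetting factor, 0 < f < 1
    n = 2;                                   % number of parameters
    Q = 1e-6 * eye(n);                       % small initial Q so Q is invertible
    xhat = zeros(n, 1);
    for k = 1:200
        ck = randn(1, n);                    % regressor row c_k
        yk = ck * [1; -0.5] + 0.05 * randn;  % made-up measurement y_k
        Q = f * Q + ck' * ck;                % Q_k = f Q_{k-1} + c_k' c_k
        gk = Q \ ck';                        % gain g_k = Q_k^{-1} c_k'
        xhat = xhat + gk * (yk - ck * xhat); % innovations update
    end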
(c) If $x$ and $c_i$ are scalars, and $c_i$ is a constant $c$, determine $g_k$ as a function of $k$. What is the
steady-state gain $g_\infty$? Does $g_\infty$ increase or decrease as $f$ increases, and why do you expect
this?
Exercise 2.5 Suppose our model for some waveform $y(t)$ is $y(t) = \alpha \sin(\omega t)$, where $\alpha$ is a scalar,
and suppose we have measurements $y(t_1), \ldots, y(t_p)$. Because of modeling errors and the presence of
measurement noise, we will generally not find any choice of model parameters that allows us to
precisely account for all $p$ measurements.
minimizing values of these variables. Using the Gauss-Newton algorithm for this nonlinear least
squares problem, i.e. applying LLSE to the problem obtained by linearizing about the initial
estimates, determine explicitly the estimates $\alpha_1$ and $\omega_1$ obtained after one iteration of this
algorithm. Use the following notation to help you write out the solution in a condensed form:
$$a = \sum \sin^2(\omega_0 t_i) , \qquad b = \sum t_i^2 \cos^2(\omega_0 t_i) , \qquad c = \sum t_i [\sin(\omega_0 t_i)][\cos(\omega_0 t_i)]$$
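As a reminder of what one Gauss-Newton step involves (stated generically, as the usual first-order expansion, not as the worked answer to this part), the nonlinear model is replaced by its linearization about the current estimates $(\alpha_0, \omega_0)$,
$$y(t_i) \approx \alpha_0 \sin(\omega_0 t_i) + \sin(\omega_0 t_i)\,(\alpha - \alpha_0) + \alpha_0 t_i \cos(\omega_0 t_i)\,(\omega - \omega_0) ,$$
and linear least squares is then applied to estimate the increments $\alpha - \alpha_0$ and $\omega - \omega_0$; the quantities $a$, $b$, and $c$ above then appear (along with factors of $\alpha_0$) in the resulting normal equations.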
(d) What values do you get for $\alpha_1$ and $\omega_1$ with the data given in (b) above if the initial guesses
are $\alpha_0 = 3.2$ and $\omega_0 = 1.8$? Continue the iterative estimation a few more steps. Repeat the
procedure when the initial guesses are $\alpha_0 = 3.5$ and $\omega_0 = 2.5$, verifying that the algorithm does
not converge.
(e) Since only $\omega$ enters the model nonlinearly, we might think of a decomposed algorithm, in which $\alpha$
is estimated using linear least squares and $\omega$ is estimated via nonlinear least squares. Suppose,
for example, that our initial estimate of $\omega$ is $\omega_0 = 1.8$. Now obtain an estimate $\alpha_1$ of $\alpha$ using the
linear least squares method that you used in (b). Then obtain an (improved?) estimate $\omega_1$ of $\omega$,
using one iteration of a Gauss-Newton algorithm (similar to what is needed in (c), except that
now you are only trying to estimate $\omega$). Next obtain the estimate $\alpha_2$ via linear least squares,
and so on. Compare your results with what you obtain via this decomposed procedure when
your initial estimate is $\omega_0 = 2.5$ instead of 1.8.
(a) Set up the linear system of equations whose least square error solution would be $\hat{x}_{i|i}$. Similarly,
set up the linear system of equations whose least square error solution would be $\hat{x}_{i|i-1}$.
is known as the Kalman filter. A more elaborate version of the Kalman filter would include additive
noise driving the state-space model, and other embellishments, all in a stochastic context (rather than
the deterministic one given here).
Exercise 2.8 Let $\hat{x}$ denote the value of $x$ that minimizes $\|y - Ax\|_2$, where $A$ has full column rank.
Let $\bar{x}$ denote the value of $x$ that minimizes this same criterion, but now subject to the constraint that
$z = Dx$, where $D$ has full row rank. Show that
$$\bar{x} = \hat{x} + (A^T A)^{-1} D^T \left[ D (A^T A)^{-1} D^T \right]^{-1} (z - D\hat{x}) .$$
(Hint: One approach to solving this is to use our recursive least squares formulation, but modified for
the limiting case where one of the measurement sets, namely $z = Dx$ in this case, is known to
have no error. You may have to use some of the matrix identities from the previous chapter.)
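A quick numerical check of this formula (with A, y, D, and z made up purely for illustration, and the constrained solution computed independently from the Lagrange-multiplier/KKT system) can be carried out as follows:

    % Numerical check of the constrained least squares formula; illustrative sketch.
    A = randn(8, 3);  y = randn(8, 1);     % unconstrained least squares data
    D = randn(1, 3);  z = 0.7;             % constraint D x = z, D full row rank

    xhat = (A'*A) \ (A'*y);                % unconstrained least squares estimate
    M = inv(A'*A);
    xbar = xhat + M*D' * ((D*M*D') \ (z - D*xhat));   % formula from Exercise 2.8

    % Independent check: solve the equality-constrained problem via its KKT system.
    KKT = [A'*A, D'; D, zeros(size(D,1))];
    sol = KKT \ [A'*y; z];
    disp(norm(xbar - sol(1:3)))            % should be near zero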