
Exercise Problems in Empirical Modelling (Räkneövningar Empirisk modellering)

Bengt Carlsson
Systems and Control
Dept of Information Technology, Uppsala University
25th February 2009

Abstract
Exercise problems together with some complementary theory.

Contents

1 L1-Linear regression
  1.1 Problems
  1.2 Solutions

2 L2-Stochastic processes and discrete time systems
  2.1 Problems
  2.2 Solutions
  2.3 Properties of covariance functions
  2.4 Examples of covariance functions

3 L3 and L4-Parameter estimation
  3.1 Problems
  3.2 Solutions

4 L5-Some additional problems
  4.1 Complementary theory - Analysis of the least squares estimate
      4.1.1 Results from Linear Regression
      4.1.2 Results for the case when A3 does not hold
      4.1.3 Bias, variance and mean squared error
  4.2 Problems
  4.3 Solutions
1 L1-Linear regression
1.1 Problems
Problem 1. A linear trend model
a) Consider a linear regression model

$$\hat{y}(t) = a + bt$$

Calculate the least squares estimate for the following two cases:

1. The data are $y(1), y(2), \ldots, y(N)$. For this case, use $S_0 = \sum_{t=1}^{N} y(t)$ and $S_1 = \sum_{t=1}^{N} t\,y(t)$.

2. The data are $y(-N), y(-N+1), \ldots, y(N)$. For this case, use $S_0 = \sum_{t=-N}^{N} y(t)$ and $S_1 = \sum_{t=-N}^{N} t\,y(t)$.

Hints:

$$\sum_{t=1}^{N} t = \frac{N(N+1)}{2}$$

$$\sum_{t=1}^{N} t^2 = \frac{N(N+1)(2N+1)}{6}$$

b) Suppose that the parameter $a$ is estimated with

$$\hat{a} = \frac{S_0}{N} \quad \text{(case 1 above)}$$

$$\hat{a} = \frac{S_0}{2N+1} \quad \text{(case 2 above)}$$

The parameter $b$ is estimated by the least squares method using the model structure:

$$y(t) - \hat{a} = bt$$

Calculate $\hat{b}$ for the two cases and compare with the estimates obtained in (a).

c) Assume that the data $y(1), y(2), \ldots, y(N)$ are generated by

$$y(t) = a_o + b_o t + e(t)$$

where $e(t)$ is white noise with variance $\lambda$. Calculate the variance of the quantity $s(t) = \hat{a} + \hat{b}t$. What is the variance for $t = 1$ and $t = N$? For which $t$ is the variance minimal?

Hint: Let $\theta = (a\;\,b)^T$ and $\varphi(t) = (1\;\,t)^T$. Then

$$\operatorname{var} s(t) = \varphi(t)^T P \varphi(t)$$

where $P = \operatorname{var}\hat{\theta}$.
2) Some accuracy results for linear trend models

a) Assume that the data $y(1), y(2), \ldots, y(N)$ are generated by

$$y(t) = a_o + b_o t + e(t)$$

where $e(t)$ is white noise with variance $\lambda$. The parameters in a linear trend model

$$\hat{y}(t) = a + bt$$

are estimated with the least squares method. Calculate the variance of $\hat{b}$.

b) Assume that we difference the data and introduce a new signal

$$z(t) = y(t) - y(t-1), \quad t = 2, 3, \ldots, N$$

We then have that the data $z(t)$ obey

$$z(t) = b_o + w(t) \qquad (1)$$

where the new noise source is $w(t) = e(t) - e(t-1)$. The parameter $b_o$ may then be estimated from the following model

$$\hat{z}(t) = b$$

Calculate the variance of $\hat{b}$ and compare with the accuracy obtained in (2a).

Hint: Note that the noise $w(t)$ in (1) is not white. Hence the covariance expression for the least squares estimate under correlated noise, $\operatorname{cov}\hat{\theta} = (\Phi^T\Phi)^{-1}\Phi^T R\,\Phi(\Phi^T\Phi)^{-1}$ (see Section 4.3 in Linear Regression), needs to be used when calculating the variance.

3. The problem of collinearity.

Consider the following model

$$\hat{y}(t) = a u_1(t) + b u_2(t)$$

Here $u_1(t)$ and $u_2(t)$ are two measured input signals. Suppose that the data are generated by

$$y(t) = a_o u_1 + b_o u_2$$

where $u_1 = K$ and $u_2 = L$, that is, two constant signals with amplitudes $K$ and $L$ are used as input signals. Show that $\det \Phi^T\Phi = 0$, and hence the least squares method cannot be used.

Remark: The columns in $\Phi$ must be linearly independent for $(\Phi^T\Phi)^{-1}$ to exist.

1.2 Solutions
1) In general we have

$$\hat{\theta} = \left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\varphi(t)y(t) = [\Phi^T\Phi]^{-1}\Phi^T Y \qquad (2)$$

For the considered model $\theta = (a\;\,b)^T$ and $\varphi(t) = (1\;\,t)^T$.

Case (i). Data $y(1), \ldots, y(N)$.

$$\hat{\theta} = \begin{bmatrix} \sum_{t=1}^{N} 1 & \sum_{t=1}^{N} t \\ \sum_{t=1}^{N} t & \sum_{t=1}^{N} t^2 \end{bmatrix}^{-1}\begin{bmatrix} S_0 \\ S_1 \end{bmatrix} = \begin{bmatrix} N & \frac{N(N+1)}{2} \\ \frac{N(N+1)}{2} & \frac{N(N+1)(2N+1)}{6} \end{bmatrix}^{-1}\begin{bmatrix} S_0 \\ S_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{N(N-1)}\left[2(2N+1)S_0 - 6S_1\right] \\[4pt] \frac{6}{N(N-1)(N+1)}\left[2S_1 - (N+1)S_0\right] \end{bmatrix}$$

Case (ii). Data $y(-N), \ldots, y(N)$. This gives $2N+1$ data points. All sums will have the form $\sum_{t=-N}^{N}$.

$$\hat{\theta} = \begin{bmatrix} 2N+1 & 0 \\ 0 & \frac{N(N+1)(2N+1)}{3} \end{bmatrix}^{-1}\begin{bmatrix} S_0 \\ S_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2N+1}S_0 \\[4pt] \frac{3}{N(N+1)(2N+1)}S_1 \end{bmatrix} \qquad (3)$$

b)
The model $y(t) - \hat{a} = bt$ gives the least squares estimate

$$\hat{b} = \frac{\sum t\,(y(t) - \hat{a})}{\sum t^2} = \frac{S_1 - \hat{a}\sum t}{\sum t^2}$$

Case (i):

$$\hat{b} = \frac{3}{N(N+1)(2N+1)}\left[2S_1 - (N+1)S_0\right]$$

which is not equal to the solution in a) and is therefore wrong.

Case (ii):

$$\hat{b} = \frac{S_1 - \hat{a}\cdot 0}{\sum t^2} = \frac{3}{N(N+1)(2N+1)}\,S_1$$

which is equal to the solution in a) and is therefore correct.

c)

$$P = \lambda\left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1} = \lambda\begin{bmatrix} N & \frac{N(N+1)}{2} \\ \frac{N(N+1)}{2} & \frac{N(N+1)(2N+1)}{6} \end{bmatrix}^{-1} \qquad (4)$$

Straightforward calculations using the $P$ above and $\varphi = (1\;\,t)^T$ give

$$\operatorname{var} s(t) = \varphi(t)^T P \varphi(t) = \frac{12\lambda}{N(N+1)(N-1)}\left[t^2 - t(N+1) + \frac{(N+1)(2N+1)}{6}\right]$$

and

$$\operatorname{var} s(1) = \frac{12\lambda}{N(N+1)(N-1)}\left[1^2 - 1\cdot(N+1) + \frac{(N+1)(2N+1)}{6}\right] \approx \frac{4\lambda}{N} \quad \text{when } N \text{ is large}$$

$$\operatorname{var} s(N) = \frac{12\lambda}{N(N+1)(N-1)}\left[N^2 - N(N+1) + \frac{(N+1)(2N+1)}{6}\right] = \operatorname{var} s(1) \approx \frac{4\lambda}{N} \quad \text{when } N \text{ is large}$$

The $t$ giving minimal variance is obtained by setting

$$\frac{d}{dt}\operatorname{var} s(t) = \frac{12\lambda}{N(N+1)(N-1)}\left[2t - (N+1)\right] = 0$$

which gives $t = \frac{N+1}{2}$. Hence, the minimal variance is attained in the middle of the observation interval. Furthermore

$$\operatorname{var} s\left(\frac{N+1}{2}\right) = \frac{12\lambda}{N(N+1)(N-1)}\left[\frac{(N+1)^2}{4} - \frac{(N+1)^2}{2} + \frac{(N+1)(2N+1)}{6}\right] = \frac{\lambda}{N}$$

For large data sets, the standard deviation thus decreases from $2\sqrt{\lambda/N}$ at $s(1)$ and $s(N)$ to $\sqrt{\lambda/N}$ in the middle of the interval.
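The variance expression for $s(t)$ can be checked numerically. The following is a minimal Monte Carlo sketch (not part of the original text); the values of $N$, $a_o$, $b_o$ and the noise variance are arbitrary illustrative assumptions.

# Sketch: check var s((N+1)/2) = lambda/N for the linear trend model by simulation.
import numpy as np

rng = np.random.default_rng(0)
N, a_o, b_o, lam = 50, 1.0, 0.5, 2.0
t = np.arange(1, N + 1)
Phi = np.column_stack([np.ones(N), t])                    # regressor matrix, rows phi(t)^T = (1  t)

s_mid = []
for _ in range(20000):
    y = a_o + b_o * t + np.sqrt(lam) * rng.standard_normal(N)
    theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # least squares estimate (a_hat, b_hat)
    s_mid.append(theta_hat[0] + theta_hat[1] * (N + 1) / 2)

print("empirical var s((N+1)/2):", np.var(s_mid))         # close to lambda/N = 0.04
print("theoretical lambda/N    :", lam / N)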
2) For a linear regression we have that

$$\operatorname{cov}\hat{\theta} = \lambda[\Phi^T\Phi]^{-1} = \lambda\left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}$$

For a linear trend $\varphi = (1\;\,t)^T$. This gives

$$\operatorname{cov}\hat{\theta} = \lambda\begin{bmatrix} N & \frac{N(N+1)}{2} \\ \frac{N(N+1)}{2} & \frac{N(N+1)(2N+1)}{6} \end{bmatrix}^{-1}$$

The variance of $\hat{b}$ is then found (the matrix above needs to be inverted) to be $\lambda\,\frac{12}{N(N^2-1)}$.
b) When the data are differenced we have that the system is

$$z(t) = b_o + w(t)$$

where $w(t) = e(t) - e(t-1)$. We have

$$R_w(0) = E\{w^2(t)\} = E\{[e(t) - e(t-1)]^2\} = 2\lambda$$
$$R_w(1) = E\{w(t+1)w(t)\} = E\{(e(t+1) - e(t))(e(t) - e(t-1))\} = -\lambda$$
$$R_w(k) = 0, \quad k > 1$$

The noise is not white and in order to calculate the variance we need to calculate $R = E\{ww^T\}$ where $w = (w(1), w(2), \ldots, w(N-1))^T$. See Section 4.3 in Linear Regression. Note that when the data are differenced one data point is lost. We have

$$R = \lambda\begin{bmatrix} 2 & -1 & \cdots & 0 \\ -1 & 2 & \cdots & 0 \\ \vdots & & \ddots & -1 \\ 0 & \cdots & -1 & 2 \end{bmatrix}$$

For the model $\hat{z}(t) = b$ we have $\varphi(t) = 1$ and $\Phi = (1, \ldots, 1)^T$ (with $N-1$ rows).

We can now calculate the variance of $\hat{b}$ (the least squares estimate) from the covariance matrix (which becomes the variance since $\theta$ is a scalar):

$$\operatorname{cov}\hat{\theta} = \operatorname{var}\hat{b} = (\Phi^T\Phi)^{-1}\Phi^T R\,\Phi(\Phi^T\Phi)^{-1} = \frac{1}{N-1}\,2\lambda\,\frac{1}{N-1} = \frac{2\lambda}{(N-1)^2} \qquad (5)$$

It is easily seen that this expression is larger than the one in a).

3) We have $\varphi(t) = (u_1(t)\;\,u_2(t))^T$, with $u_1(t) = K$ and $u_2(t) = L$. We assume the number of data points to be $N$. This gives the $(N \times 2)$ matrix

$$\Phi = \begin{bmatrix} K & L \\ \vdots & \vdots \\ K & L \end{bmatrix}$$

and

$$\Phi^T\Phi = \begin{bmatrix} NK^2 & NKL \\ NKL & NL^2 \end{bmatrix}$$

We then see that $\det(\Phi^T\Phi) = 0$ and hence the LS method cannot be used. It is not possible to determine the parameters uniquely from this data set (which also should be intuitively clear).
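A small numerical illustration of the collinearity problem (not from the original text; $K$, $L$ and $N$ are arbitrary illustrative values):

# Sketch: with constant inputs u1 = K and u2 = L the matrix Phi^T Phi is singular.
import numpy as np

N, K, L = 100, 2.0, 3.0
Phi = np.column_stack([np.full(N, K), np.full(N, L)])   # (N x 2) regressor matrix
A = Phi.T @ Phi

print("Phi^T Phi =\n", A)
print("det(Phi^T Phi) =", np.linalg.det(A))             # zero: the columns are linearly dependent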

2 L2-Stochastic processes and discrete time systems
2.1 Problems
1. Static gain and stability.
Consider the discrete time system

$$y(t) = H(q)u(t), \quad \text{where } H(q) = \frac{0.1 + q^{-1}}{1 - 0.2q^{-1}}$$

Calculate poles, zeros and static gain of the system. Is the system stable?

2. Spectrum.
The following stochastic process is given:

$$y(t) - 0.2y(t-1) = e(t) - 0.1e(t-1)$$

where $e(t)$ is white noise with zero mean and variance $\lambda$.

a) Determine the spectrum of $y$, $\phi_y(\omega)$.

b) Determine the cross spectrum $\phi_{ye}(\omega)$.

3. Covariance function and spectrum.

Consider the following system

$$y(t) = H(q)u(t), \quad \text{where } H(q) = \frac{bq^{-1}}{1 + aq^{-1}}$$

The input signal has the following covariance function: $R_u(0) = 1$, $R_u(1) = R_u(-1) = 0.5$, $R_u(\tau) = 0$ for $|\tau| > 1$. Calculate the spectrum of the output signal.

4. Covariance functions
Calculate the covariance function for the following stochastic processes, where $e(t)$ is white noise with variance $\lambda$.

a)
$$y(t) = e(t) + c\,e(t-1)$$

b)
$$y(t) + ay(t-1) = e(t), \quad |a| < 1$$

2.2 Solutions
1)

$$H(q) = \frac{0.1 + q^{-1}}{1 - 0.2q^{-1}} = \frac{0.1q + 1}{q - 0.2}$$

$$H(z) = \frac{0.1z + 1}{z - 0.2}$$

We immediately see that the system has one zero in $z = -10$ and one pole in $z = 0.2$. Informally we could as well solve for the roots with the $q$-operator (but not with the $q^{-1}$-operator).
The static gain is obtained by setting $q = 1$ (or $z = 1$). We have $H(1) = \frac{0.1 + 1}{1 - 0.2} = 1.1/0.8$.
The system is stable since all poles are inside the unit circle.

2) We can write the system as

$$H(q) = \frac{1 - 0.1q^{-1}}{1 - 0.2q^{-1}} = \frac{q - 0.1}{q - 0.2}$$

a) First note that since $e(t)$ is white noise with variance $\lambda$, $\phi_e(\omega) = \lambda$. By using

$$\phi_y(\omega) = |H(e^{i\omega})|^2\,\phi_e(\omega)$$

we get

$$\phi_y(\omega) = \left|\frac{e^{i\omega} - 0.1}{e^{i\omega} - 0.2}\right|^2\lambda = \frac{(e^{i\omega} - 0.1)(e^{-i\omega} - 0.1)}{(e^{i\omega} - 0.2)(e^{-i\omega} - 0.2)}\,\lambda = \frac{1.01 - 0.2\cos(\omega)}{1.04 - 0.4\cos(\omega)}\,\lambda$$

b) The cross spectrum for a discrete time system $H(q)$ is given by

$$\phi_{ye}(\omega) = H(e^{i\omega})\,\phi_e(\omega)$$

which directly gives

$$\phi_{ye}(\omega) = \frac{e^{i\omega} - 0.1}{e^{i\omega} - 0.2}\,\lambda$$
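As a quick numerical sanity check (not part of the original solution), the closed-form spectrum above can be compared with a direct evaluation of $|H(e^{i\omega})|^2\lambda$:

# Sketch: compare the spectrum formula with |H(e^{iw})|^2 * lambda at a few frequencies.
import numpy as np

lam = 1.0
w = np.linspace(0, np.pi, 5)
H = (np.exp(1j * w) - 0.1) / (np.exp(1j * w) - 0.2)      # H(e^{iw}) for the given system
phi_direct = np.abs(H) ** 2 * lam
phi_formula = (1.01 - 0.2 * np.cos(w)) / (1.04 - 0.4 * np.cos(w)) * lam
print(np.allclose(phi_direct, phi_formula))               # True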
3) The definition of spectral density (spectrum) is

$$\phi_u(\omega) = \sum_{k=-\infty}^{\infty} R_u(k)\,e^{-i\omega k}$$

The given covariance function gives

$$\phi_u(\omega) = 0.5e^{-i\omega} + 1 + 0.5e^{i\omega} = 1 + \cos(\omega)$$

and the output spectral density is

$$\phi_y(\omega) = |H(e^{i\omega})|^2\phi_u(\omega) = H(e^{i\omega})H(e^{-i\omega})\phi_u(\omega) = \frac{b}{e^{i\omega} + a}\,\frac{b}{e^{-i\omega} + a}\,(1 + \cos(\omega)) = \frac{b^2(1 + \cos(\omega))}{1 + a^2 + 2a\cos(\omega)}$$
4 a) The MA(1) process is

$$y(t) = e(t) + c\,e(t-1)$$

and we directly get

$$R_y(0) = Ey^2(t) = E(e(t) + c\,e(t-1))(e(t) + c\,e(t-1)) = (1 + c^2)\lambda$$
$$R_y(1) = Ey(t+1)y(t) = E(e(t+1) + c\,e(t))(e(t) + c\,e(t-1)) = c\lambda$$
$$R_y(k) = Ey(t+k)y(t) = E(e(t+k) + c\,e(t+k-1))(e(t) + c\,e(t-1)) = 0, \quad \text{for } k > 1$$

b) The covariance function for the AR(1) process

$$y(t) + ay(t-1) = e(t), \quad |a| < 1$$

can be found (this also holds for a general AR(n) process) from the so-called Yule-Walker equations (not a course requirement). Basically the idea is to multiply the AR process with $y(t-k)$ and take expectations. We have to distinguish between $k = 0$ and $k > 0$. For $k = 0$ we get

$$y(t)(y(t) + ay(t-1)) = e(t)y(t)$$

Taking expectations gives

$$R_y(0) + aR_y(1) = \lambda$$

For $k > 0$ we get

$$y(t-k)(y(t) + ay(t-1)) = e(t)y(t-k)$$

Taking expectations gives

$$R_y(k) + aR_y(k-1) = 0$$

We hence end up with a set of linear equations. For $k = 0, 1$ we get

$$\begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}\begin{pmatrix} R_y(0) \\ R_y(1) \end{pmatrix} = \begin{pmatrix} \lambda \\ 0 \end{pmatrix} \qquad (6)$$

with solution

$$R_y(0) = \frac{\lambda}{1 - a^2}, \qquad R_y(1) = (-a)\frac{\lambda}{1 - a^2}$$

It is easy to see that

$$R_y(k) = (-a)^{|k|}\frac{\lambda}{1 - a^2}$$
2.3 Properties of covariance functions
Let $x(t)$ be a stationary stochastic process¹ with mean $Ex(t) = 0$. The covariance function of the process is defined by

$$R(\tau) = E[x(t+\tau)x(t)]$$

The following relations hold:

- $R(0) - R(\tau) \ge 0$.
  Proof: $E[x(t+\tau) - x(t)]^2 = E[x(t+\tau)]^2 + E[x(t)]^2 - 2E[x(t+\tau)x(t)] = R(0) + R(0) - 2R(\tau) = 2(R(0) - R(\tau))$. The expression $E[x(t+\tau) - x(t)]^2$ is always non-negative, so $R(0) - R(\tau)$ is also non-negative, from which the statement follows.

- $R(0) \ge |R(\tau)|$. Follows directly from the proof above.

- $R(-\tau) = R(\tau)$, i.e., the covariance function is symmetric around the origin.
  Proof: $R(-\tau) = E[x(t-\tau)x(t)] = E[x(t)x(t-\tau)] = \{\text{set } t = s + \tau\} = E[x(s+\tau)x(s)] = R(\tau)$.

- If $|R(\tau)| = R(0)$ for some $\tau \ne 0$ then $x(t)$ is periodic. Proof: omitted.

The cross covariance function describes the joint variation between two stochastic processes $x(t)$ and $y(t)$ and is defined by

$$R_{xy}(\tau) = E[x(t+\tau)y(t)]$$

We give the following relations without proof:

- $R_{xy}(\tau) = R_{yx}(-\tau)$

- $R_{xy}(\tau) \ne R_{xy}(-\tau)$ in general.

- $R_{xy}(\tau) = 0$ for all $\tau$ $\Rightarrow$ $x$ and $y$ are uncorrelated.

¹ A process is stationary if its properties (distributions) do not depend on absolute time.

2.4 Examples of covariance functions
It is not an exam requirement to be able to work out complicated covariance functions. For the exam you should be able to compute the covariance function of an MA process (of arbitrary order). More complicated covariance expressions are given as hints.

Below are a couple of examples of some more complicated covariance expressions. As usual, $e(t)$ denotes white noise with mean 0 and variance $\lambda$. We also use $R_y(k) = Ey(t+k)y(t)$.

The ARMA(1,1) process

$$y(t) + ay(t-1) = e(t) + c\,e(t-1)$$

$$R_y(0) = \frac{1 + c^2 - 2ac}{1 - a^2}\,\lambda$$

$$R_y(1) = \frac{(c-a)(1-ac)}{1 - a^2}\,\lambda$$

$$R_y(k) = (-a)^{k-1}\,\frac{(c-a)(1-ac)}{1 - a^2}\,\lambda, \quad k > 1$$

The ARMAX(1,1,1) process

$$y(t) + ay(t-1) = bu(t-1) + e(t) + c\,e(t-1)$$

The input signal is white, with mean 0 and variance $Eu^2 = \mu$.

$$R_y(0) = \frac{b^2\mu + (1 + c^2 - 2ac)\lambda}{1 - a^2}$$

$$R_y(1) = \frac{-ab^2\mu + (c-a)(1-ac)\lambda}{1 - a^2}$$

$$Ey(t)u(t) = 0$$

$$Ey(t)u(t-1) = b\mu$$

Note that $b = 0$ gives the ARMA(1,1) process and $c = 0$ gives an ARX(1,1) process.

3 L3 and L4-Parameter estimation
3.1 Problems
1) Criteria with constraints
Consider the following scalar non-linear minimization problem

$$\min_{\theta} V_N(\theta)$$

where

$$V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\left(y(t) - \hat{y}(t,\theta)\right)^2$$

The following constraint is also given:

$$0 \le \theta \le 1$$

Assume that solutions to $\frac{dV_N(\theta)}{d\theta} = 0$ have been found. Describe how the minimization problem should be solved in principle.

2) Calculating the least squares estimate for ARX models.

Consider the following ARX model:

$$y(t) + ay(t-1) = bu(t-1) + e(t) \qquad (7)$$

($e(t)$ is white noise with zero mean)

Assume that the available data are $y(1), u(1), y(2), u(2), \ldots, y(102), u(102)$ and that the following sums have been calculated:

$$\sum_{t=2}^{102} y^2(t-1) = 5.0, \quad \sum_{t=2}^{102} y(t-1)u(t-1) = 1.0, \quad \sum_{t=2}^{102} u^2(t-1) = 1.0,$$

$$\sum_{t=2}^{102} y(t)y(t-1) = 4.5, \quad \sum_{t=2}^{102} y(t)u(t-1) = 1.0$$

Which value of $\theta = (a\;\,b)^T$ minimizes the quadratic criterion

$$V_N(\theta) = \frac{1}{N}\sum_{t=2}^{N}\left(y(t) - \hat{y}(t,\theta)\right)^2$$

where $\hat{y}(t,\theta)$ is the predictor obtained from the ARX model (7)?

3) Data with non-zero mean.

Assume that the data from a system (normally the data are also noise corrupted, but that is not considered in this example) are given by

$$A(q)y(t) = B(q)u(t) + K \qquad (8)$$

where $A(q) = 1 + a_1 q^{-1} + \ldots + a_n q^{-n}$, $B(q) = b_1 q^{-1} + \ldots + b_n q^{-n}$, and $K$ is an unknown constant.

a) Show that by using the following transformation of the data

$$\Delta(q)u(t) = (1 - q^{-1})u(t) = u(t) - u(t-1)$$

$$\Delta(q)y(t) = (1 - q^{-1})y(t) = y(t) - y(t-1)$$

as new input and output signals, the standard LS (least squares) method can be used to find the parameters in $A(q)$ and $B(q)$.

b) Show that the constant $K$ easily can be included in the LS estimate for the model (8).

c) What is the standard procedure to deal with data with non-zero mean?

4) The problem with feedback.

Consider the following system:

$$y(t) + ay(t-1) = bu(t-1) + e(t) \qquad (9)$$

($e(t)$ is white noise with zero mean)

Assume that the system is controlled with a proportional controller

$$u(t) = -Ky(t)$$

Show that $P = \sum_{t=1}^{N}\varphi(t)\varphi^T(t)$ becomes singular!

5) Optimal input signal.

The following system is given:

$$y(t) = b_o u(t) + b_1 u(t-1) + e(t)$$

$e(t)$ is white noise with zero mean and variance $\lambda$.
The parameters are estimated with the least squares method. Consider the case when the number of data points $N$ goes to infinity (in practice, this means that we have many data points available).

a) Show that $\operatorname{var}(\hat{b}_o)$ and $\operatorname{var}(\hat{b}_1)$ only depend on the following values of the covariance function:

$$R_u(0) = Eu^2(t) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}u^2(k)$$

$$R_u(1) = Eu(t)u(t-1) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}u(k)u(k-1)$$

b) Assume that the energy of the input signal is constrained by

$$R_u(0) = Eu^2(t) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}u^2(k) \le 1$$

Determine $R_u(0)$ and $R_u(1)$ so that the variances of the parameter estimates are minimized.

6) The following system is given:

$$y(t) = b_1 u(t-1) + b_2 u(t-2) + e(t)$$

$e(t)$ is white noise with zero mean and variance $\lambda$.
Assume that the number of data points goes to infinity.

a) Assume that $u(t)$ is white noise² with variance $\mu$ and zero mean. Show that the least squares estimate converges to the true system parameters.

b) Assume that $u(t)$ is a unit step: $u(t) = 0$, $t \le 0$, and $u(t) = 1$, $t \ge 1$. Show that the matrix $\bar{R} = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)$ becomes singular.

7) The following system is given:

$$y(t) = b_1 u(t-1) + b_2 u(t-2) + e(t)$$

$e(t)$ is white noise with zero mean and variance $\lambda$.
The following predictor

$$\hat{y}(t) = bu(t-1)$$

is used to estimate the parameter $b$ with the least squares (LS) method.

Calculate the LS estimate of $b$ (expressed in $b_1$ and $b_2$) as the number of data points goes to infinity for the cases:

a) The input signal $u(t)$ is white noise.

b) The input signal is a sinusoid $u(t) = A\sin(\omega_1 t)$, which has the covariance function $R_u(\tau) = \frac{1}{2}A^2\cos(\omega_1\tau)$.

² In general, we will also assume that $e(t)$ and $u(t)$ are uncorrelated if not explicitly stated otherwise.

3.2 Solutions
1) Let $\theta_i$ be the parameters which give $\frac{dV_N(\theta)}{d\theta} = 0$. Check the values of $V_N(\theta_i)$, $V_N(0)$ and $V_N(1)$ and select the value of $\theta$ which minimizes $V_N$.

2) The predictor for the ARX model is $\hat{y}(t) = \varphi^T(t)\theta$ where $\varphi(t) = (-y(t-1)\;\,u(t-1))^T$ and $\theta = (a\;\,b)^T$. The least squares estimate is

$$\hat{\theta} = \left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\varphi(t)y(t)$$

$$= \begin{bmatrix} \sum_{t=2}^{102}y^2(t-1) & -\sum_{t=2}^{102}y(t-1)u(t-1) \\ -\sum_{t=2}^{102}y(t-1)u(t-1) & \sum_{t=2}^{102}u^2(t-1) \end{bmatrix}^{-1}\begin{bmatrix} -\sum_{t=2}^{102}y(t-1)y(t) \\ \sum_{t=2}^{102}u(t-1)y(t) \end{bmatrix}$$

$$= \begin{bmatrix} 5 & -1 \\ -1 & 1 \end{bmatrix}^{-1}\begin{bmatrix} -4.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.875 \\ 0.125 \end{bmatrix}$$
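A quick numerical check of the computation above (not part of the original solution), using the sums given in the problem statement:

# Sketch: solve the 2x2 normal equations formed from the given sums.
import numpy as np

A = np.array([[5.0, -1.0],
              [-1.0, 1.0]])        # sum phi(t) phi(t)^T with phi(t) = (-y(t-1), u(t-1))^T
b = np.array([-4.5, 1.0])          # sum phi(t) y(t)
theta_hat = np.linalg.solve(A, b)
print(theta_hat)                   # [-0.875  0.125], i.e. a_hat = -0.875, b_hat = 0.125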

3a) Multiplying the left and right hand sides of the system

$$A(q)y(t) = B(q)u(t) + K$$

with $\Delta(q)$ gives

$$\Delta(q)A(q)y(t) = \Delta(q)B(q)u(t) + \Delta(q)K$$

but since $K$ is a constant, $\Delta(q)K = (1 - q^{-1})K = K - K = 0$ and hence

$$A(q)(\Delta(q)y(t)) = B(q)(\Delta(q)u(t))$$

Thus if we use the differenced input and output signals we can use the standard LS estimate to estimate $A$ and $B$.

Remark: Any filter $L(q)$ with $L(1) = 0$ would remove the constant $K$. Note that $L(1) = 0$ means zero steady state gain.

b) Use the regression vector

$$\varphi(t) = (-y(t-1)\; -y(t-2)\; \ldots\; -y(t-n)\;\; u(t-1)\;\; u(t-2)\; \ldots\; u(t-n)\;\; 1)^T$$

and

$$\theta = (a_1\;\, a_2\;\, \ldots\;\, a_n\;\; b_1\;\, b_2\;\, \ldots\;\, b_n\;\; K)^T$$

Note that we can view the system as consisting of two input signals: $u(t)$ and 1.

c) Remove the mean from the data, that is, use the new signals:

$$\bar{y}(t) = y(t) - \frac{1}{N}\sum_{k=1}^{N}y(k)$$

$$\bar{u}(t) = u(t) - \frac{1}{N}\sum_{k=1}^{N}u(k)$$

4a) We directly get

$$P = \sum_{t=1}^{N}\varphi(t)\varphi^T(t) = \begin{bmatrix} \sum_{t=1}^{N}y^2(t-1) & -\sum_{t=1}^{N}y(t-1)u(t-1) \\ -\sum_{t=1}^{N}y(t-1)u(t-1) & \sum_{t=1}^{N}u^2(t-1) \end{bmatrix} = \begin{bmatrix} \sum_{t=1}^{N}y^2(t-1) & K\sum_{t=1}^{N}y^2(t-1) \\ K\sum_{t=1}^{N}y^2(t-1) & K^2\sum_{t=1}^{N}y^2(t-1) \end{bmatrix}$$

In the third equality we have used $u(t) = -Ky(t)$.

It is directly seen that the matrix is singular ($\det P = 0$), hence this experimental condition cannot be used to estimate the parameters uniquely. This can also be seen if we look at the predictor $\hat{y}(t) = -ay(t-1) + bu(t-1)$. With the controller $u(t) = -Ky(t)$ the predictor becomes $\hat{y}(t) = -ay(t-1) - bKy(t-1) = -(bK + a)y(t-1)$, and we see that the predictor does not depend uniquely on $a$ and $b$.

5) We have that (see Linear Regression)

$$\operatorname{cov}(\hat{\theta}) = \lambda\left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1} = \frac{\lambda}{N}\left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1} \to \frac{\lambda}{N}\left[E\{\varphi(t)\varphi^T(t)\}\right]^{-1} = \frac{\lambda}{N}(\bar{R})^{-1} \quad \text{as } N\to\infty$$

With $\varphi(t) = (u(t)\;\,u(t-1))^T$ we get

$$\bar{R} = \begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}$$

and as $N\to\infty$

$$\operatorname{cov}(\hat{\theta}) = \frac{\lambda}{N}\begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}^{-1}$$

Hence,

$$\operatorname{var}(\hat{b}_o) = \operatorname{var}(\hat{b}_1) = \frac{\lambda}{N}\,\frac{R_u(0)}{R_u^2(0) - R_u^2(1)}$$

b) It is seen directly (note that $R_u(0) \ge |R_u(\tau)|$) that the variances are minimized for $R_u(0) = 1$ and $R_u(1) = 0$. One example of a signal that fulfills this condition is white noise with unit variance.

6) The predictor is given by $\hat{y}(t) = \varphi^T(t)\theta$ where $\varphi(t) = (u(t-1)\;\,u(t-2))^T$ and $\theta = (b_1\;\,b_2)^T$.
The least squares estimate is (we normalize with $\frac{1}{N}$ since we then get a feasible expression as $N\to\infty$)

$$\hat{\theta} = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$$

As $N\to\infty$

$$\hat{\theta}_{\infty} = (\bar{R})^{-1}E\{\varphi(t)y(t)\}$$

where $\bar{R} = E\{\varphi(t)\varphi^T(t)\}$, cf. the previous problem.

$$\hat{\theta}_{\infty} = \begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}^{-1}\begin{bmatrix} R_{yu}(1) \\ R_{yu}(2) \end{bmatrix}$$

Since $u(t)$ is white noise, $R_u(1) = 0$, and

$$R_{yu}(1) = Ey(t)u(t-1) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]u(t-1)\} = b_1 R_u(0)$$

$$R_{yu}(2) = Ey(t)u(t-2) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]u(t-2)\} = b_2 R_u(0)$$

so the estimate converges to

$$\hat{\theta}_{\infty} = \begin{bmatrix} 1/R_u(0) & 0 \\ 0 & 1/R_u(0) \end{bmatrix}\begin{bmatrix} R_u(0)b_1 \\ R_u(0)b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

which is expected since we have a FIR model (which could be interpreted as a linear regression model) and the model structure is correct.

b) We first calculate

$$\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t) = \frac{1}{N}\begin{bmatrix} \sum_{t=1}^{N}u^2(t-1) & \sum_{t=1}^{N}u(t-1)u(t-2) \\ \sum_{t=1}^{N}u(t-1)u(t-2) & \sum_{t=1}^{N}u^2(t-2) \end{bmatrix} = \begin{bmatrix} \frac{N-1}{N} & \frac{N-2}{N} \\ \frac{N-2}{N} & \frac{N-2}{N} \end{bmatrix}$$

and we see that

$$\bar{R} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

which is singular. This means that a step gives too poor an excitation of the system; asymptotically the matrix (which should be inverted in the least squares method) becomes singular.
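The difference between the two experimental conditions can be illustrated numerically. The following sketch is not from the original text; the parameter values are arbitrary assumptions, and the condition number of $\bar{R}$ is used as a simple indicator of how close the matrix is to singular.

# Sketch: LS estimation of (b1, b2) with a white-noise input versus a unit-step input.
import numpy as np

rng = np.random.default_rng(3)
b1, b2, lam, N = 1.5, -0.8, 0.1, 10_000

def ls_fit(u):
    t = np.arange(2, N)                                   # samples where u(t-1), u(t-2) exist
    Phi = np.column_stack([u[t - 1], u[t - 2]])
    y = b1 * u[t - 1] + b2 * u[t - 2] + np.sqrt(lam) * rng.standard_normal(len(t))
    R_bar = Phi.T @ Phi / len(t)
    theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return np.linalg.cond(R_bar), theta_hat

u_white = rng.standard_normal(N)
u_step = np.zeros(N); u_step[1:] = 1.0                    # essentially a unit step

cond_w, theta_w = ls_fit(u_white)
cond_s, _ = ls_fit(u_step)
print("white noise input: theta_hat =", theta_w, " cond(R_bar) =", round(cond_w, 2))
print("step input       : cond(R_bar) =", round(cond_s, 1), "(grows with N; R_bar tends to a singular matrix)")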

7) In this case $\varphi(t) = (u(t-1))$ and $\theta = b$. Asymptotically in $N$ we have

$$\hat{b}_{\infty} = \left[E\{\varphi(t)\varphi^T(t)\}\right]^{-1}E\{\varphi(t)y(t)\}$$

For this simple predictor we have

$$E\{\varphi(t)\varphi^T(t)\} = Eu^2(t-1) = R_u(0)$$

$$E\{\varphi(t)y(t)\} = Ey(t)u(t-1) = R_{yu}(1)$$

and by using the system generating the data

$$R_{yu}(1) = Ey(t)u(t-1) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]u(t-1)\} = b_1 R_u(0) + b_2 R_u(1)$$

which gives

$$\hat{b}_{\infty} = b_1 + b_2\frac{R_u(1)}{R_u(0)}$$

a) If $u(t)$ is white noise, $R_u(1) = 0$ and we get $\hat{b}_{\infty} = b_1$.

b) For $u(t)$ being a sinusoid, $\hat{b}_{\infty} = b_1 + b_2\cos\omega_1$.
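A simulation sketch (not part of the original solution) of case b): with a sinusoidal input the underparameterized predictor converges to $b_1 + b_2\cos\omega_1$ rather than to $b_1$. The parameter values are arbitrary assumptions.

# Sketch: scalar LS estimate with a sinusoidal input converges to b1 + b2*cos(w1).
import numpy as np

rng = np.random.default_rng(4)
b1, b2, w1, lam, N = 1.0, 0.5, 0.8, 0.01, 200_000
u = np.sin(w1 * np.arange(N))                              # u(t) = A sin(w1 t) with A = 1

tt = np.arange(2, N)
y = b1 * u[tt - 1] + b2 * u[tt - 2] + np.sqrt(lam) * rng.standard_normal(len(tt))
b_hat = np.sum(y * u[tt - 1]) / np.sum(u[tt - 1] ** 2)     # scalar least squares estimate
print("b_hat           :", b_hat)
print("b1 + b2*cos(w1) :", b1 + b2 * np.cos(w1))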

4 L5-Some additional problems
4.1 Complementary theory - Analysis of the least squares estimate
4.1.1 Results from Linear Regression
The accuracy results are based on the following assumptions:

Assumption A1.
Assume that the data are generated by (the true system):

$$y(t) = \varphi^T(t)\theta_o + e(t), \quad t = 1, \ldots, N \qquad (10)$$

where $e(t)$ is a nonmeasurable disturbance term to be specified below. In matrix form, (10) reads

$$Y = \Phi\theta_o + e \qquad (11)$$

where $e = [e(1)\; \ldots\; e(N)]^T$.

Assumption A2.
It is assumed that $e(t)$ is a white noise process³ with variance $\lambda$.

³ A white noise process $e(t)$ is a sequence of random variables that are uncorrelated, have mean zero, and have a constant finite variance. Hence, $e(t)$ is a white noise process if $E\{e(t)\} = 0$, $E\{e^2(t)\} = \lambda$, and $E\{e(t)e(j)\} = 0$ for $t$ not equal to $j$.

Assumption A3.
It is finally assumed that $E\{\varphi(t)e(s)\} = 0$ for all $t$ and $s$. This means that the regression vector is not influenced (directly or indirectly) by the noise source $e(t)$.

In the material Linear Regression it was shown that if Assumptions A1-A3 hold, then:

1. The least squares estimate $\hat{\theta}$ is an unbiased estimate of $\theta_o$, that is $E\{\hat{\theta}\} = \theta_o$.

2. The uncertainty of the least squares estimate as expressed by the covariance matrix $P$ is given by

$$P = \operatorname{cov}\hat{\theta} = E\{(\hat{\theta} - E\hat{\theta})(\hat{\theta} - E\hat{\theta})^T\} = E\{(\hat{\theta} - \theta_o)(\hat{\theta} - \theta_o)^T\} = \lambda(\Phi^T\Phi)^{-1} = \lambda\left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}$$

4.1.2 Results for the case when A3 does not hold


For the case when A3 does not hold we have that

1. The least squares estimate $\hat{\theta}$ is consistent:

$$\hat{\theta} \to \theta_o \quad \text{as } N\to\infty \qquad (12)$$

where $N$ is the number of data points.

2. The covariance matrix $P$ is given by

$$P = \operatorname{cov}\hat{\theta} \to \frac{\lambda}{N}\left[E\{\varphi(t)\varphi^T(t)\}\right]^{-1} \quad \text{as } N\to\infty \qquad (13)$$

Remarks:

- Two typical examples when A3 does not hold are when the system is an AR-process or an ARX-process (we then have values of the output in the regressor vector).

- The results only hold asymptotically in $N$. In practice this means that we need many data points (some hundreds are typically enough) for the estimate to be reliable (and also to get a reliable estimate of the covariance matrix).

- If the noise is not white the estimate will in general not be consistent (in contrast to the case when A3 holds; see Linear Regression).

- Results for general linear models are presented on pages 297-299 in the text book.

4.1.3 Bias, variance and mean squared error


Let $\hat{\theta}$ be a scalar estimate of the true parameter $\theta_o$. The bias is defined as

$$\operatorname{bias}(\hat{\theta}) = E\{\hat{\theta}\} - \theta_o \qquad (14)$$

The variance is given by

$$\operatorname{var}(\hat{\theta}) = E\{(\hat{\theta} - E\{\hat{\theta}\})^2\} \qquad (15)$$

The Mean Squared Error (MSE) is given by

$$\operatorname{MSE}(\hat{\theta}) = E\{(\hat{\theta} - \theta_o)^2\} = \operatorname{var}(\hat{\theta}) + [\operatorname{bias}(\hat{\theta})]^2 \qquad (16)$$

In practice we want an estimate with as small an MSE as possible. In some cases this may mean that we accept a small bias if the variance of the estimate can be reduced.
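As a hypothetical numerical illustration (not from the original text): an estimator with bias 0.1 and variance 0.02 has MSE $= 0.02 + 0.1^2 = 0.03$, which is smaller than the MSE of an unbiased estimator with variance 0.05.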

4.2 Problems
1) Predictor using exponential smoothing

A simple predictor for a signal $y(t)$ is the so-called exponential smoothing, which is given by

$$\hat{y}(t) = \frac{1 - \alpha}{1 - \alpha q^{-1}}\,y(t-1)$$

a) Show that if $y(t) = m$ for all $t$, the predictor will in steady state (stationarity) be equal to $m$.

b) For which ARMA model is the predictor optimal?
Hint: Rewrite the predictor in the form $\hat{y}(t) = L(q)y(t)$ and compare with the predictor for an ARMA model $A(q)y(t) = C(q)e(t)$.

2) Cross correlation for LS estimates of ARX parameters.

Consider the standard least squares estimate of the parameters in an ARX model:

$$A(q)y(t) = B(q)u(t) + e(t)$$

where $A(q) = 1 + a_1 q^{-1} + \ldots + a_{na}q^{-na}$ and $B(q) = b_1 q^{-1} + \ldots + b_{nb}q^{-nb}$. The estimate of the cross correlation between residuals and inputs is given by

$$\hat{R}_{\varepsilon u}(\tau) = \frac{1}{N}\sum_{t=1}^{N}\varepsilon(t)u(t-\tau)$$

where $\varepsilon(t) = y(t) - \hat{y}(t) = y(t) - \varphi^T(t)\hat{\theta}$ is the prediction error. Show that the least squares estimate gives

$$\hat{R}_{\varepsilon u}(\tau) = 0, \quad \tau = 1, 2, \ldots, nb$$

Hint: Show that the least squares estimate gives $\sum_{t=1}^{N}\varphi(t)\varepsilon(t) = 0$.
3) The variance increases if more parameters than needed are estimated!

Assume that data from an AR(1) process (the system) are collected:

$$y(t) + a_o y(t-1) = e(t)$$

where $e(t)$ is white noise with zero mean and variance $\lambda$. The system is stable, which means that $|a_o| < 1$.

Consider the following two predictors

$$\text{M1:}\quad \hat{y}(t) = -ay(t-1)$$

$$\text{M2:}\quad \hat{y}(t) = -a_1 y(t-1) - a_2 y(t-2)$$

where the parameters for each predictor are estimated with the least squares method.

It can easily be shown that for M1: $\hat{a} \to a_o$, and for M2: $\hat{a}_1 \to a_o$ and $\hat{a}_2 \to 0$, as the number of data points $N\to\infty$.

Hence both predictors give consistent estimates. Show that the price to pay for estimating too many parameters is that $\operatorname{var}(\hat{a}_1) > \operatorname{var}(\hat{a})$ as $N\to\infty$.

Hint: For the AR(1) process we have that $R_y(k) = Ey(t+k)y(t) = (-a_o)^{|k|}R_y(0)$, $k = 1, 2, \ldots$, where $R_y(0) = \frac{\lambda}{1 - a_o^2}$.
4) Variance of parameters in an estimated ARX model.

Assume that data were collected from the following ARX process (the system)

$$y(t) + a_o y(t-1) = b_o u(t-1) + e(t)$$

where $e(t)$ is white noise with zero mean and variance $\lambda$. The system is stable, which means that $|a_o| < 1$. The input signal is uncorrelated with $e(t)$, and is white noise with zero mean and variance $\mu$.

The parameters in the following predictor

$$\hat{y}(t|\theta) = -ay(t-1) + bu(t-1)$$

are estimated with the least squares method.
Calculate the asymptotic (in the number of data points $N$) variance of the parameter estimates.

Hint:

$$R_y(0) = Ey^2(t) = \frac{b_o^2\mu + \lambda}{1 - a_o^2}$$

4.3 Solutions
1a) We rewrite the predictor in standard form

$$\hat{y}(t) = \frac{q^{-1}(1-\alpha)}{1 - \alpha q^{-1}}\,y(t) = L(q)y(t)$$

The static gain is given by $L(1)$ and since we have $L(1) = 1$ the steady state value of the predictor will be $\hat{y}(t) = m$.

b) For an ARMA process

$$A(q)y(t) = C(q)e(t)$$

the optimal predictor is

$$\hat{y}(t) = \left(1 - \frac{A(q)}{C(q)}\right)y(t) = \left(\frac{C(q) - A(q)}{C(q)}\right)y(t)$$

Hence, we need to find $A(q)$ and $C(q)$ so that

$$\frac{C(q) - A(q)}{C(q)} = \frac{q^{-1}(1-\alpha)}{1 - \alpha q^{-1}}$$

which gives $C(q) = 1 - \alpha q^{-1}$ and $A(q) = 1 - q^{-1}$.
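A small time-domain sketch of the predictor (not part of the original solution). Multiplying the predictor by $(1 - \alpha q^{-1})$ gives the recursive form $\hat{y}(t) = \alpha\hat{y}(t-1) + (1-\alpha)y(t-1)$; the values of $\alpha$ and $m$ below are arbitrary assumptions.

# Sketch: with a constant signal y(t) = m the exponential smoothing predictor settles at m.
import numpy as np

alpha, m, N = 0.8, 3.0, 100
y = np.full(N, m)
y_hat = np.zeros(N)
for t in range(1, N):
    y_hat[t] = alpha * y_hat[t - 1] + (1 - alpha) * y[t - 1]

print(y_hat[-1])     # approaches m = 3.0, illustrating the steady state gain L(1) = 1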
2) The least squares estimate is given by

$$\hat{\theta} = \left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\varphi(t)y(t)$$

which can be written as

$$\left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]\hat{\theta} = \sum_{t=1}^{N}\varphi(t)y(t) \qquad (17)$$

With $\varepsilon(t) = y(t) - \varphi^T(t)\hat{\theta}$ we get

$$\sum_{t=1}^{N}\varphi(t)\varepsilon(t) = \sum_{t=1}^{N}\varphi(t)\left[y(t) - \varphi^T(t)\hat{\theta}\right] = \sum_{t=1}^{N}\varphi(t)y(t) - \left[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]\hat{\theta} = \sum_{t=1}^{N}\varphi(t)y(t) - \sum_{t=1}^{N}\varphi(t)y(t) = 0$$

In the last equality (17) was used. We thus have

$$0 = \sum_{t=1}^{N}\varphi(t)\varepsilon(t) = \sum_{t=1}^{N}\begin{bmatrix} -y(t-1) \\ -y(t-2) \\ \vdots \\ -y(t-na) \\ u(t-1) \\ u(t-2) \\ \vdots \\ u(t-nb) \end{bmatrix}\varepsilon(t)$$

This means that all estimates

$$\hat{R}_{\varepsilon u}(\tau) = \frac{1}{N}\sum_{t=1}^{N}\varepsilon(t)u(t-\tau) = 0 \quad \text{for } \tau = 1, 2, \ldots, nb$$

Therefore, the values of the estimated cross correlation function $\hat{R}_{\varepsilon u}(\tau)$ for $\tau = 1, 2, \ldots, nb$ cannot be used to determine if an estimated ARX model is good or bad. They will always be zero. See also the text book on page 368!

3) In general we have for estimated AR-parameters that

$$P = \operatorname{cov}\hat{\theta} \to \frac{\lambda}{N}\left[E\{\varphi(t)\varphi^T(t)\}\right]^{-1} \quad \text{as } N\to\infty$$

For M1 we have $\varphi(t) = -y(t-1)$ and $E\{\varphi(t)\varphi^T(t)\} = Ey^2(t-1) = R_y(0) = \frac{\lambda}{1 - a_o^2}$ (see the Hint). Thus

$$\operatorname{var}(\hat{a}) \to \frac{\lambda}{N}\,\frac{1 - a_o^2}{\lambda} = \frac{1}{N}(1 - a_o^2) \quad \text{as } N\to\infty$$

For M2 we have $\varphi(t) = [-y(t-1)\;\, -y(t-2)]^T$ and therefore

$$E\{\varphi(t)\varphi^T(t)\} = \begin{pmatrix} R_y(0) & R_y(1) \\ R_y(1) & R_y(0) \end{pmatrix}$$

This gives

$$\left[E\{\varphi(t)\varphi^T(t)\}\right]^{-1} = \frac{1}{R_y(0)^2 - R_y(1)^2}\begin{pmatrix} R_y(0) & -R_y(1) \\ -R_y(1) & R_y(0) \end{pmatrix}$$

and

$$\operatorname{var}(\hat{a}_1) \to \frac{\lambda}{N}\,\frac{R_y(0)}{R_y(0)^2 - R_y(1)^2} = \frac{\lambda}{N}\,\frac{R_y(0)}{R_y(0)^2 - a_o^2 R_y(0)^2} = \frac{\lambda}{N}\,\frac{1}{R_y(0)(1 - a_o^2)} = \frac{1}{N} \quad \text{as } N\to\infty$$

We have thus shown that $\operatorname{var}(\hat{a}_1) > \operatorname{var}(\hat{a})$ as $N\to\infty$.
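A Monte Carlo sketch (not part of the original solution) illustrating the result: the normalized variance $N\operatorname{var}(\hat{a}_1)$ approaches 1 while $N\operatorname{var}(\hat{a})$ approaches $1 - a_o^2$. The parameter values are arbitrary assumptions.

# Sketch: compare the variances of the AR(1) estimate (M1) and the over-parameterized AR(2) estimate (M2).
import numpy as np

rng = np.random.default_rng(5)
a_o, lam, N, runs = -0.8, 1.0, 1_000, 2_000
a_m1, a1_m2 = [], []

for _ in range(runs):
    e = np.sqrt(lam) * rng.standard_normal(N)
    y = np.zeros(N)
    for k in range(1, N):
        y[k] = -a_o * y[k - 1] + e[k]                       # AR(1): y(t) + a_o*y(t-1) = e(t)
    t = np.arange(2, N)
    # M1: y_hat(t) = -a*y(t-1)
    a_m1.append(np.linalg.lstsq((-y[t - 1]).reshape(-1, 1), y[t], rcond=None)[0][0])
    # M2: y_hat(t) = -a1*y(t-1) - a2*y(t-2)
    Phi = np.column_stack([-y[t - 1], -y[t - 2]])
    a1_m2.append(np.linalg.lstsq(Phi, y[t], rcond=None)[0][0])

print("N*var(a_hat)  for M1:", N * np.var(a_m1), " (theory: 1 - a_o^2 =", 1 - a_o**2, ")")
print("N*var(a1_hat) for M2:", N * np.var(a1_m2), " (theory: 1)")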

4) The predictor gives $\varphi(t) = [-y(t-1)\;\,u(t-1)]^T$ and $\theta = [a\;\,b]^T$. This gives

$$E\{\varphi(t)\varphi^T(t)\} = \begin{pmatrix} R_y(0) & -R_{yu}(0) \\ -R_{yu}(0) & R_u(0) \end{pmatrix}$$

The cross covariance between $y$ and $u$ is

$$R_{yu}(0) = E\{(-a_o y(t-1) + b_o u(t-1) + e(t))(u(t))\} = 0$$

since $u(t)$ is white noise (uncorrelated with $e(t)$). The (asymptotic) covariance matrix is

$$P = \operatorname{cov}\hat{\theta} \to \frac{\lambda}{N}\begin{pmatrix} \frac{b_o^2\mu + \lambda}{1 - a_o^2} & 0 \\ 0 & \mu \end{pmatrix}^{-1} = \frac{\lambda}{N}\begin{pmatrix} \frac{1 - a_o^2}{b_o^2\mu + \lambda} & 0 \\ 0 & \frac{1}{\mu} \end{pmatrix} \quad \text{as } N\to\infty$$

From the diagonal elements we get the variances: $\operatorname{var}(\hat{a}) = \frac{\lambda}{N}\,\frac{1 - a_o^2}{b_o^2\mu + \lambda}$ and $\operatorname{var}(\hat{b}) = \frac{\lambda}{N\mu}$ (as $N\to\infty$).
