Exercises in Empirical Modelling (Räkneövningar Empirisk Modellering)
Bengt Carlsson
Systems and Control
Dept of Information Technology, Uppsala University
25th February 2009
Abstract
Exercise problems together with some complementary theory.
Contents

1 L1 - Linear regression
  1.1 Problems
  1.2 Solutions
2 L2 - Transfer functions, spectra and covariance functions
  2.1 Problems
  2.2 Solutions
  2.3 Properties of covariance functions
  2.4 Examples of covariance functions
3 L3 and L4 - Parameter estimation
  3.1 Problems
  3.2 Solutions
4 L5 - Some additional problems
  4.1 Complementary theory - Analysis of the least squares estimate
  4.2 Problems
  4.3 Solutions
1 L1 - Linear regression
1.1 Problems
Problem 1. A linear trend model
a) Consider a linear regression model

$$y(t) = a + bt$$

Calculate the least squares estimate for the following two cases:

1. The data are $y(1), y(2), \ldots, y(N)$. For this case, use $S_0 = \sum_{t=1}^{N} y(t)$ and $S_1 = \sum_{t=1}^{N} t\,y(t)$.

2. The data are $y(-N), y(-N+1), \ldots, y(N)$. For this case, use $S_0 = \sum_{t=-N}^{N} y(t)$ and $S_1 = \sum_{t=-N}^{N} t\,y(t)$.

Hints:

$$\sum_{t=1}^{N} t = \frac{N(N+1)}{2}, \qquad \sum_{t=1}^{N} t^2 = \frac{N(N+1)(2N+1)}{6}$$

b) A simpler approach is to first estimate $a$ as the sample mean of the data and then estimate $b$ with the least squares method from the model $y(t) - \hat a = bt$. Calculate this estimate for the two cases above and compare with the result in a).

c) Assume now that the data also contain additive white measurement noise with variance $\lambda$. Calculate the variance of the estimated trend value $\hat s(t) = \varphi^T(t)\hat\theta$, that is $\mathrm{var}\,\hat s(t) = \varphi^T(t)P\varphi(t)$, where $P = \mathrm{var}\,\hat\theta$. For which $t$ is the variance smallest?
2) Some accuracy results for linear trend models
a) Assume that the data $y(1), y(2), \ldots, y(N)$ are generated by

$$y(t) = a_o + b_o t + e(t)$$

where $e(t)$ is white noise with variance $\lambda$. The parameters in a linear trend model

$$y(t) = a + bt$$

are estimated with the least squares method. Calculate the variance of $\hat b$.

b) Assume that we difference the data and introduce the new signal

$$z(t) = y(t) - y(t-1) = b_o + w(t) \qquad (1)$$

where the new noise source is $w(t) = e(t) - e(t-1)$. The parameter $b_o$ may then be estimated from the following model:

$$z(t) = b$$

Calculate the variance of $\hat b$ and compare with the accuracy obtained in (2a).

Hint: Note that the noise $w(t)$ in (1) is not white. Hence the general expression for the covariance of the least squares estimate with correlated noise (see Linear Regression) needs to be used when calculating the variance.
1.2 Solutions
1) In general we have

$$\hat\theta = \Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1}\sum_{t=1}^{N}\varphi(t)y(t) = [\Phi^T\Phi]^{-1}\,\Phi^T Y \qquad (2)$$

Case (i). Data $y(1), \ldots, y(N)$. With $\varphi(t) = (1 \;\; t)^T$ and $\theta = (a \;\; b)^T$ this gives

$$\hat\theta = \begin{bmatrix} N & \sum_{t=1}^{N} t \\ \sum_{t=1}^{N} t & \sum_{t=1}^{N} t^2 \end{bmatrix}^{-1}\begin{bmatrix} S_0 \\ S_1 \end{bmatrix} = \begin{bmatrix} N & \frac{N(N+1)}{2} \\ \frac{N(N+1)}{2} & \frac{N(N+1)(2N+1)}{6} \end{bmatrix}^{-1}\begin{bmatrix} S_0 \\ S_1 \end{bmatrix}$$

$$= \begin{bmatrix} \frac{1}{N(N-1)}\,\big[2(2N+1)S_0 - 6S_1\big] \\[4pt] \frac{6}{N(N-1)(N+1)}\,\big[2S_1 - (N+1)S_0\big] \end{bmatrix}$$

Case (ii). Data $y(-N), \ldots, y(N)$. This gives $2N+1$ data points. All sums will have the form $\sum_{t=-N}^{N}$; in particular $\sum_{t=-N}^{N} t = 0$, so $\Phi^T\Phi$ becomes diagonal and

$$\hat a = \frac{S_0}{2N+1}, \qquad \hat b = \frac{3\,S_1}{N(N+1)(2N+1)}$$
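As a quick numerical illustration (not part of the original exercise), the following minimal sketch fits the trend model for case (i) and checks the closed-form expressions for $\hat a$ and $\hat b$ against a direct least squares solve. The true parameter values and the noise level are arbitrary assumptions.

```python
import numpy as np

# Minimal sketch: least squares fit of y(t) = a + b*t, case (i) with t = 1..N.
# The true values a = 1.0, b = 0.5 and the unit noise level are arbitrary assumptions.
rng = np.random.default_rng(0)
N = 100
t = np.arange(1, N + 1)
y = 1.0 + 0.5 * t + rng.normal(scale=1.0, size=N)

# Closed-form estimates using S0 and S1 (the formulas derived above)
S0, S1 = y.sum(), (t * y).sum()
a_hat = (2 * (2 * N + 1) * S0 - 6 * S1) / (N * (N - 1))
b_hat = 6 * (2 * S1 - (N + 1) * S0) / (N * (N - 1) * (N + 1))

# Direct least squares solve with the regressor matrix Phi = [1  t]
Phi = np.column_stack([np.ones(N), t])
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]

print(a_hat, b_hat)   # closed-form estimates
print(theta)          # should agree with the closed-form values
```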
b)
The model $y(t) - \hat a = bt$ gives the least squares estimate

$$\hat b = \frac{\sum t\,(y(t) - \hat a)}{\sum t^2} = \frac{S_1 - \hat a\sum t}{\sum t^2}$$

Case (i): with $\hat a = S_0/N$,

$$\hat b = \frac{3}{N(N+1)(2N+1)}\,\big[2S_1 - (N+1)S_0\big]$$

which is not equal to the solution in a) and is therefore wrong.

Case (ii): since $\sum_{t=-N}^{N} t = 0$,

$$\hat b = \frac{S_1 - 0}{\sum t^2} = \frac{3\,S_1}{N(N+1)(2N+1)}$$

which is equal to the solution in a) and is therefore correct.
c)

$$P = \lambda\Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1} = \lambda\begin{bmatrix} N & \sum_{t=1}^{N} t \\ \sum_{t=1}^{N} t & \sum_{t=1}^{N} t^2 \end{bmatrix}^{-1} = \lambda\begin{bmatrix} N & \frac{N(N+1)}{2} \\ \frac{N(N+1)}{2} & \frac{N(N+1)(2N+1)}{6} \end{bmatrix}^{-1} \qquad (4)$$

Straightforward calculations using the $P$ above and $\varphi(t) = (1 \;\; t)^T$ give

$$\mathrm{var}\,\hat s(t) = \varphi^T(t)\,P\,\varphi(t) = \frac{12\lambda}{N(N+1)(N-1)}\Big[t^2 - t(N+1) + \frac{(N+1)(2N+1)}{6}\Big]$$

and

$$\mathrm{var}\,\hat s(1) = \frac{12\lambda}{N(N+1)(N-1)}\Big[1^2 - 1\cdot(N+1) + \frac{(N+1)(2N+1)}{6}\Big] \approx \frac{4\lambda}{N} \quad \text{when } N \text{ is large}$$

$$\mathrm{var}\,\hat s(N) = \frac{12\lambda}{N(N+1)(N-1)}\Big[N^2 - N(N+1) + \frac{(N+1)(2N+1)}{6}\Big] \approx \frac{4\lambda}{N} = \mathrm{var}\,\hat s(1) \quad \text{when } N \text{ is large}$$

The $t$ giving minimal variance is obtained by setting

$$\frac{d}{dt}\,\mathrm{var}\,\hat s(t) = \frac{12\lambda}{N(N+1)(N-1)}\,\big[2t - (N+1)\big] = 0$$

which gives $t = \frac{N+1}{2}$. Hence, the minimal variance is obtained in the middle of the observation interval. Furthermore,

$$\mathrm{var}\,\hat s\Big(\frac{N+1}{2}\Big) = \frac{12\lambda}{N(N+1)(N-1)}\Big[\frac{(N+1)^2}{4} - \frac{(N+1)^2}{2} + \frac{(N+1)(2N+1)}{6}\Big] = \frac{\lambda}{N}$$

For large data sets, the standard deviation of the estimated trend thus decreases from $2\sqrt{\lambda/N}$ at $\hat s(1)$ and $\hat s(N)$ to $\sqrt{\lambda/N}$ in the middle of the interval.
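A small Monte Carlo check of these variance expressions can be run as follows (an added illustration, not part of the original solution; the trend parameters, $\lambda = 1$, $N = 50$ and the number of realisations are arbitrary choices).

```python
import numpy as np

# Monte Carlo check that var s_hat(t) follows
#   12*lam/(N*(N+1)*(N-1)) * [t^2 - t*(N+1) + (N+1)*(2N+1)/6]
# Arbitrary assumptions: a = 1.0, b = 0.5, lam = 1.0, N = 50, 20000 realisations.
rng = np.random.default_rng(1)
N, lam, n_mc = 50, 1.0, 20000
t = np.arange(1, N + 1)
Phi = np.column_stack([np.ones(N), t])

trends = np.empty((n_mc, N))
for i in range(n_mc):
    y = 1.0 + 0.5 * t + rng.normal(scale=np.sqrt(lam), size=N)
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    trends[i] = Phi @ theta                    # estimated trend s_hat(t)

emp_var = trends.var(axis=0)
theo_var = 12 * lam / (N * (N + 1) * (N - 1)) * (t**2 - t * (N + 1) + (N + 1) * (2 * N + 1) / 6)
print(emp_var[[0, N // 2, N - 1]])             # roughly 4*lam/N at the ends, lam/N in the middle
print(theo_var[[0, N // 2, N - 1]])
```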
2) For a linear regression with white noise we have that

$$\mathrm{cov}\,\hat\theta = \lambda\,[\Phi^T\Phi]^{-1} = \lambda\Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1}$$

a) The estimate of $b$ is the second element of $\hat\theta$, so its variance is the (2,2) element of $P$ in (4):

$$\mathrm{var}\,\hat b = \frac{12\lambda}{N(N+1)(N-1)} \approx \frac{12\lambda}{N^3} \quad \text{for large } N$$

b) After differencing the data we obtain the model (1),

$$z(t) = b_o + w(t)$$

where $w(t) = e(t) - e(t-1)$. We have

$$R_w(0) = E\{w^2(t)\} = E\{[e(t) - e(t-1)]^2\} = 2\lambda$$

$$R_w(1) = E\{w(t+1)w(t)\} = E\{(e(t+1) - e(t))(e(t) - e(t-1))\} = -\lambda$$

$$R_w(k) = 0, \qquad |k| > 1$$

The noise is not white, and in order to calculate the variance we need to calculate $R = E\{w w^T\}$ where $w = (w(1), w(2), \ldots, w(N-1))^T$; see Section 4.3 in Linear Regression. Note that when the data are differenced, one data point is lost. We have

$$R = \lambda\begin{bmatrix} 2 & -1 & \cdots & 0 \\ -1 & 2 & \cdots & 0 \\ \vdots & & \ddots & -1 \\ 0 & \cdots & -1 & 2 \end{bmatrix}$$

For the model $z(t) = b$ we have $\varphi(t) = 1$ and $\Phi = (1, \ldots, 1)^T$ (with $N-1$ rows). We can now calculate the variance of $\hat b$ (the least squares estimate) from the covariance matrix (which reduces to a scalar variance since $\theta$ is a scalar):

$$\mathrm{cov}\,\hat\theta = \mathrm{var}\,\hat b = (\Phi^T\Phi)^{-1}\,\Phi^T R\,\Phi\,(\Phi^T\Phi)^{-1} = \frac{1}{N-1}\cdot 2\lambda\cdot\frac{1}{N-1} = \frac{2\lambda}{(N-1)^2} \qquad (5)$$

It is easily seen that this expression is larger than the one in a) ($2\lambda/(N-1)^2 \approx 2\lambda/N^2$ compared with $12\lambda/N^3$).
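The difference in accuracy is easy to see numerically. The sketch below (an added illustration; the trend parameters, $\lambda = 1$ and $N = 100$ are arbitrary choices) estimates $b$ both ways over many noise realisations.

```python
import numpy as np

# Compare var(b_hat) for (a) an LS fit of a + b*t and (b) the mean of the differenced data.
# Arbitrary assumptions: a = 1.0, b = 0.5, lam = 1.0, N = 100, 20000 realisations.
rng = np.random.default_rng(2)
N, lam, n_mc = 100, 1.0, 20000
t = np.arange(1, N + 1)
Phi = np.column_stack([np.ones(N), t])

b_ls, b_diff = [], []
for _ in range(n_mc):
    y = 1.0 + 0.5 * t + rng.normal(scale=np.sqrt(lam), size=N)
    b_ls.append(np.linalg.lstsq(Phi, y, rcond=None)[0][1])
    b_diff.append(np.diff(y).mean())           # z(t) = y(t) - y(t-1); LS estimate of b is its mean

print(np.var(b_ls), 12 * lam / (N * (N + 1) * (N - 1)))   # about 12*lam/N^3
print(np.var(b_diff), 2 * lam / (N - 1) ** 2)             # about 2*lam/N^2, clearly larger
```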
3) We have $\varphi(t) = (u_1(t)\;\; u_2(t))^T$, with $u_1(t) = K$ and $u_2(t) = L$. We assume the number of data points to be $N$. This gives the $(N \times 2)$ matrix

$$\Phi = \begin{bmatrix} K & L \\ \vdots & \vdots \\ K & L \end{bmatrix}$$

and

$$\Phi^T\Phi = \begin{bmatrix} NK^2 & NKL \\ NKL & NL^2 \end{bmatrix}$$

We then see that $\det(\Phi^T\Phi) = 0$ and hence the LS method cannot be used. It is not possible to determine the parameters uniquely from this data set (which should also be intuitively clear).
2 L2 - Transfer functions, spectra and covariance functions

2.1 Problems

1. Transfer function.
Consider the system

$$y(t) - 0.2\,y(t-1) = 0.1\,u(t) + u(t-1)$$

Calculate the poles, zeros and static gain of the system. Is the system stable?
2. Spectrum.
The following stochastic process is given:

$$y(t) + a\,y(t-1) = b\,u(t-1)$$

The input signal has the following covariance function: $R_u(0) = 1$, $R_u(1) = R_u(-1) = 0.5$, $R_u(\tau) = 0$ for $|\tau| > 1$. Calculate the spectrum of the output signal.
4. Covariance functions
Calculate the covariance function for the following stochastic processes, where $e(t)$ is white noise with variance $\lambda$.

a)

$$y(t) = e(t) + c\,e(t-1)$$

b)

$$y(t) + a\,y(t-1) = e(t), \qquad |a| < 1$$
2.2 Solutions
1)

$$H(q) = \frac{0.1 + 1\cdot q^{-1}}{1 - 0.2\,q^{-1}} = \frac{0.1\,q + 1}{q - 0.2}$$

or, in the z-transform variable,

$$H(z) = \frac{0.1\,z + 1}{z - 0.2}$$

We immediately see that the system has one zero in $z = -10$ and one pole in $z = 0.2$. Informally, we could just as well solve for the roots with the $q$-operator (but not with the $q^{-1}$-operator).

The static gain is obtained by setting $q = 1$ (or $z = 1$). We have

$$H(1) = \frac{0.1 + 1}{1 - 0.2} = \frac{1.1}{0.8} = 1.375$$

The system is stable since all poles are inside the unit circle.
For the system

$$H(q) = \frac{1 - 0.1\,q^{-1}}{1 - 0.2\,q^{-1}} = \frac{q - 0.1}{q - 0.2}$$

a) First note that since $e(t)$ is white noise with variance $\lambda$, its spectrum is $\Phi_e(\omega) = \lambda$. The output spectrum is then obtained from

$$\Phi_y(\omega) = |H(e^{i\omega})|^2\,\Phi_e(\omega)$$

For a signal with covariance function $R_u(k)$ the spectrum is

$$\Phi_u(\omega) = \sum_{k=-\infty}^{\infty} R_u(k)\,e^{-i\omega k}$$
The given covariance function gives

$$\Phi_u(\omega) = 0.5\,e^{-i\omega} + 1 + 0.5\,e^{i\omega} = 1 + \cos(\omega)$$

and the output spectral density is

$$\Phi_y(\omega) = |H(e^{i\omega})|^2\,\Phi_u(\omega) = H(e^{i\omega})\,H(e^{-i\omega})\,\Phi_u(\omega) = \frac{b}{e^{i\omega} + a}\cdot\frac{b}{e^{-i\omega} + a}\,\big(1 + \cos\omega\big) = \frac{b^2\,(1 + \cos\omega)}{1 + a^2 + 2a\cos\omega}$$
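As a numerical cross-check (added here, not part of the original solution), the spectrum can be evaluated on a frequency grid for some arbitrary choice of $a$ and $b$, either directly from $|H(e^{i\omega})|^2\Phi_u(\omega)$ or from the closed-form expression.

```python
import numpy as np

# Evaluate Phi_y(w) = |H(e^{iw})|^2 * (1 + cos(w)) for y(t) + a*y(t-1) = b*u(t-1).
# The values a = -0.5 and b = 1.0 are arbitrary choices.
a, b = -0.5, 1.0
w = np.linspace(0, np.pi, 5)

H = b / (np.exp(1j * w) + a)                       # H(e^{iw})
phi_u = 1 + np.cos(w)                              # input spectrum
phi_y_direct = np.abs(H) ** 2 * phi_u
phi_y_closed = b**2 * (1 + np.cos(w)) / (1 + a**2 + 2 * a * np.cos(w))

print(np.allclose(phi_y_direct, phi_y_closed))     # True: the two expressions agree
```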
4 a) The MA(1) process is

$$y(t) = e(t) + c\,e(t-1)$$

and we directly get

$$R_y(0) = E\,y^2(t) = E\{(e(t) + c\,e(t-1))(e(t) + c\,e(t-1))\} = (1 + c^2)\,\lambda$$

$$R_y(1) = E\,y(t+1)y(t) = E\{(e(t+1) + c\,e(t))(e(t) + c\,e(t-1))\} = c\,\lambda$$

$$R_y(k) = E\,y(t+k)y(t) = E\{(e(t+k) + c\,e(t+k-1))(e(t) + c\,e(t-1))\} = 0, \quad \text{for } k > 1$$
b) The covariance function for the AR(1) process

$$y(t) + a\,y(t-1) = e(t), \qquad |a| < 1$$

can be obtained (this also holds for a general AR(n) process) from the so-called Yule-Walker equations (not a course requirement). Basically, the idea is to multiply the AR process by $y(t-k)$ and take expectations. We have to distinguish between $k = 0$ and $k > 0$. For $k = 0$ we get

$$y(t)\big(y(t) + a\,y(t-1)\big) = e(t)\,y(t)$$

Taking expectations gives

$$R_y(0) + a\,R_y(1) = \lambda$$

For $k > 0$ we get

$$y(t-k)\big(y(t) + a\,y(t-1)\big) = e(t)\,y(t-k)$$

Taking expectations gives

$$R_y(k) + a\,R_y(k-1) = 0$$

We hence end up with a set of linear equations. For $k = 0, 1$ we get

$$\begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}\begin{pmatrix} R_y(0) \\ R_y(1) \end{pmatrix} = \begin{pmatrix} \lambda \\ 0 \end{pmatrix} \qquad (6)$$

with solution

$$R_y(0) = \frac{\lambda}{1 - a^2}, \qquad R_y(1) = \frac{(-a)\,\lambda}{1 - a^2}$$
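A short simulation (an added sketch; $a = -0.8$ and $\lambda = 1$ are arbitrary choices) can be used to check these expressions against sample covariances of a long AR(1) realisation.

```python
import numpy as np

# Check R_y(0) = lam/(1-a^2) and R_y(1) = -a*lam/(1-a^2) for y(t) + a*y(t-1) = e(t).
# Arbitrary assumptions: a = -0.8, lam = 1.0, 200000 samples.
rng = np.random.default_rng(3)
a, lam, n = -0.8, 1.0, 200_000
e = rng.normal(scale=np.sqrt(lam), size=n)

y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + e[t]              # simulate the AR(1) process

Ry0_hat = np.mean(y * y)
Ry1_hat = np.mean(y[1:] * y[:-1])
print(Ry0_hat, lam / (1 - a**2))             # sample vs. theoretical R_y(0)
print(Ry1_hat, -a * lam / (1 - a**2))        # sample vs. theoretical R_y(1)
```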
2.3 Properties of covariance functions

Let $x(t)$ be a stationary¹ stochastic process with mean value $E\,x(t) = 0$. The covariance function of the process is defined by

$$R(\tau) = E\,x(t+\tau)x(t)$$

Some properties:

- $R(0) - R(\tau) \ge 0$.
  Proof: $E[x(t+\tau) - x(t)]^2 = E[x(t+\tau)]^2 + E[x(t)]^2 - 2E[x(t+\tau)x(t)] = R(0) + R(0) - 2R(\tau) = 2\big(R(0) - R(\tau)\big)$. The expression $E[x(t+\tau) - x(t)]^2$ is always non-negative, hence $R(0) - R(\tau)$ is also non-negative, from which the statement follows.

- $R_{xy}(\tau) = R_{yx}(-\tau)$.

- $R_{xy}(\tau) \ne R_{xy}(-\tau)$ in general.

¹ A process is stationary if its properties (distributions) do not depend on absolute time.
2.4 Examples of covariance functions

It is not an exam requirement to be able to compute complicated covariance functions. For the exam, you should be able to compute the covariance function of an MA process (of arbitrary order). More complicated covariance expressions will be given as hints.

The ARMA(1,1) process $y(t) + a\,y(t-1) = e(t) + c\,e(t-1)$:

$$R_y(0) = \frac{(1 + c^2 - 2ac)\,\lambda}{1 - a^2}$$

$$R_y(1) = \frac{(c - a)(1 - ac)\,\lambda}{1 - a^2}$$

$$R_y(k) = (-a)^{k-1}\,\frac{(c - a)(1 - ac)\,\lambda}{1 - a^2}, \qquad k > 1$$

The ARMAX(1,1,1) process $y(t) + a\,y(t-1) = b\,u(t-1) + e(t) + c\,e(t-1)$, where the input $u(t)$ is white noise with variance $\mu$, uncorrelated with $e(t)$:

$$R_y(0) = \frac{b^2\mu + (1 + c^2 - 2ac)\,\lambda}{1 - a^2}$$

$$R_y(1) = \frac{-a\,b^2\mu + (c - a)(1 - ac)\,\lambda}{1 - a^2}$$

$$E\,y(t)u(t) = 0$$

$$E\,y(t)u(t-1) = b\,\mu$$

Note that $b = 0$ gives the ARMA(1,1) process and $c = 0$ gives an ARX(1,1) process.
3 L3 and L4 - Parameter estimation
3.1 Problems
1) Criteria with constraints

Consider the following scalar non-linear minimization problem

$$\min_{\theta} V_N(\theta)$$

where

$$V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\big(y(t) - \hat y(t,\theta)\big)^2$$

The following constraint is also given:

$$0 \le \theta \le 1$$

Assume that solutions to $\frac{dV_N(\theta)}{d\theta} = 0$ have been found. Describe how the minimization problem should be solved in principle.
2) Least squares estimation of an ARX model

Consider the ARX model

$$y(t) + a\,y(t-1) = b\,u(t-1) + e(t) \qquad (7)$$

Assume that the available data are $y(1), u(1), y(2), u(2), \ldots, y(102), u(102)$ and that the following sums have been calculated:

$$\sum_{t=2}^{102} y^2(t-1) = 5.0, \qquad \sum_{t=2}^{102} y(t-1)u(t-1) = 1.0, \qquad \sum_{t=2}^{102} u^2(t-1) = 1.0,$$

$$\sum_{t=2}^{102} y(t)y(t-1) = 4.5, \qquad \sum_{t=2}^{102} y(t)u(t-1) = 1.0$$

Determine the least squares estimate of the parameters, where $\hat y(t,\theta)$ is the predictor obtained from the ARX model (7).
3) Consider the model

$$y(t) + a\,y(t-1) = b\,u(t-1) + K + e(t) \qquad (8)$$

where $K$ is an unknown constant.

a) Show that by using the following transformation of the data,

$$\bar y(t) = y(t) - y(t-1), \qquad \bar u(t) = u(t) - u(t-1),$$

the constant $K$ is removed from the model.

b) Show that the constant $K$ can easily be included in the LS estimate for the model (8).

c) What is the standard procedure for dealing with data that have non-zero mean?

4) Assume instead that the input is given by the feedback $u(t) = K\,y(t)$. Show that

$$P = \sum_{t=1}^{N}\varphi(t)\varphi^T(t)$$

becomes singular.
Consider now the FIR model

$$y(t) = b_0\,u(t) + b_1\,u(t-1) + e(t)$$

where $e(t)$ is white noise with variance $\lambda$, and assume that the parameters are estimated with the least squares method.

a) Show that $\mathrm{var}(\hat b_0)$ and $\mathrm{var}(\hat b_1)$ only depend on the following values of the covariance function:

$$R_u(0) = E\,u^2(t) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N} u^2(k)$$

$$R_u(1) = E\,u(t)u(t-1) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N} u(k)u(k-1)$$

b) Assume that the input power is constrained by

$$R_u(0) = E\,u^2(t) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N} u^2(k) \le 1$$

Determine $R_u(0)$ and $R_u(1)$ so that the variances of the parameter estimates are minimized.
Consider the FIR model $y(t) = b_1\,u(t-1) + b_2\,u(t-2) + e(t)$, estimated with the least squares method.

a) Assume that $u(t)$ is white noise² with variance $\mu$ and zero mean. Show that the least squares estimate converges to the true system parameters.

b) Assume that $u(t)$ is a unit step: $u(t) = 0$ for $t \le 0$ and $u(t) = 1$ for $t \ge 1$. Show that the matrix $\bar R = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)$ becomes singular.

b) The input signal is a sinusoid $u(t) = A\sin(\omega_1 t)$, which has the covariance function $R_u(\tau) = \frac{1}{2}A^2\cos(\omega_1\tau)$.

² In general, we will also assume that $e(t)$ and $u(t)$ are uncorrelated if not explicitly stated otherwise.
3.2 Solutions
1) Let $\theta_i$ denote the solutions to $\frac{dV_N(\theta)}{d\theta} = 0$. Check the values of $V_N(\theta_i)$, $V_N(0)$ and $V_N(1)$ and select the value of $\theta$ that minimizes $V_N$.
2) The predictor for the ARX model is $\hat y(t) = \varphi^T(t)\theta$ where $\varphi(t) = (-y(t-1)\;\; u(t-1))^T$ and $\theta = (a\;\; b)^T$. The least squares estimate is

$$\hat\theta = \Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1}\sum_{t=1}^{N}\varphi(t)y(t)$$

$$= \begin{bmatrix} \sum_{t=2}^{102} y^2(t-1) & -\sum_{t=2}^{102} y(t-1)u(t-1) \\ -\sum_{t=2}^{102} y(t-1)u(t-1) & \sum_{t=2}^{102} u^2(t-1) \end{bmatrix}^{-1}\begin{bmatrix} -\sum_{t=2}^{102} y(t-1)y(t) \\ \sum_{t=2}^{102} u(t-1)y(t) \end{bmatrix}$$

$$= \begin{bmatrix} 5 & -1 \\ -1 & 1 \end{bmatrix}^{-1}\begin{bmatrix} -4.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.875 \\ 0.125 \end{bmatrix}$$
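The 2-by-2 computation is easy to reproduce numerically (an added check, not part of the original solution):

```python
import numpy as np

# Reproduce the least squares estimate from the given sums.
A = np.array([[5.0, -1.0],      #  sum y^2(t-1),      -sum y(t-1)u(t-1)
              [-1.0, 1.0]])     # -sum y(t-1)u(t-1),   sum u^2(t-1)
f = np.array([-4.5, 1.0])       # -sum y(t)y(t-1),     sum u(t-1)y(t)

theta = np.linalg.solve(A, f)
print(theta)                    # [-0.875, 0.125], i.e. a_hat = -0.875, b_hat = 0.125
```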
3a) Multiplying the left- and right-hand sides of the system by $(1 - q^{-1})$, i.e. differencing the data, removes the constant $K$, since $(1 - q^{-1})K = 0$.

Remark: Any filter $L(q)$ with $L(1) = 0$ would remove the constant $K$. Note that $L(1) = 0$ means zero steady-state gain.

b) Use the regression vector $\varphi(t) = (-y(t-1)\;\; u(t-1)\;\; 1)^T$ with $\theta = (a\;\; b\;\; K)^T$, i.e. estimate the constant $K$ as an extra parameter.

c) Remove the mean from the data, that is, use the new signals:

$$\bar y(t) = y(t) - \frac{1}{N}\sum_{k=1}^{N} y(k), \qquad \bar u(t) = u(t) - \frac{1}{N}\sum_{k=1}^{N} u(k)$$
4a) With $u(t) = K\,y(t)$ we directly get

$$P = \sum_{t=1}^{N}\varphi(t)\varphi^T(t) = \begin{bmatrix} \sum_{t=1}^{N} y^2(t-1) & \sum_{t=1}^{N} y(t-1)u(t-1) \\ \sum_{t=1}^{N} y(t-1)u(t-1) & \sum_{t=1}^{N} u^2(t-1) \end{bmatrix} = \begin{bmatrix} \sum_{t=1}^{N} y^2(t-1) & K\sum_{t=1}^{N} y^2(t-1) \\ K\sum_{t=1}^{N} y^2(t-1) & K^2\sum_{t=1}^{N} y^2(t-1) \end{bmatrix}$$

The determinant of this matrix is zero, so it is singular and the least squares estimate cannot be computed.

For the FIR model $y(t) = b_0\,u(t) + b_1\,u(t-1) + e(t)$ we have $\varphi(t) = (u(t)\;\; u(t-1))^T$, and the covariance of the least squares estimate is

$$\mathrm{cov}\,\hat\theta = \lambda\Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1} \approx \frac{\lambda}{N}\,(\bar R)^{-1}$$

where

$$\bar R = E\{\varphi(t)\varphi^T(t)\} = \begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}$$

Hence, as $N \to \infty$,

$$\mathrm{cov}\,\hat\theta = \frac{\lambda}{N}\begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}^{-1}$$

and

$$\mathrm{var}(\hat b_0) = \mathrm{var}(\hat b_1) = \frac{\lambda\,R_u(0)}{N\,\big(R_u^2(0) - R_u^2(1)\big)}$$

b) It is seen directly (note that $R_u(0) \ge |R_u(\tau)|$) that the variances are minimized for $R_u(0) = 1$ and $R_u(1) = 0$. One example of a signal that fulfils this condition is white noise with unit variance.
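A Monte Carlo sketch (added here; the parameter values $b_0 = 1$, $b_1 = 0.5$, $\lambda = 1$ and the data length are arbitrary assumptions) verifying that a white-noise input with $R_u(0) = 1$ gives $\mathrm{var}(\hat b_0) = \mathrm{var}(\hat b_1) \approx \lambda/N$:

```python
import numpy as np

# For y(t) = b0*u(t) + b1*u(t-1) + e(t) with a white-noise input of unit variance,
# var(b0_hat) = var(b1_hat) ~ lam*Ru(0)/(N*(Ru(0)^2 - Ru(1)^2)) = lam/N.
# Arbitrary assumptions: b0 = 1.0, b1 = 0.5, lam = 1.0, N = 200, 2000 realisations.
rng = np.random.default_rng(4)
b0, b1, lam, N, n_mc = 1.0, 0.5, 1.0, 200, 2000

est = np.empty((n_mc, 2))
for i in range(n_mc):
    u = rng.normal(size=N + 1)                  # white noise: Ru(0) = 1, Ru(1) = 0
    e = rng.normal(scale=np.sqrt(lam), size=N)
    y = b0 * u[1:] + b1 * u[:-1] + e            # y(t) for t = 1..N
    Phi = np.column_stack([u[1:], u[:-1]])      # phi(t) = (u(t), u(t-1))
    est[i] = np.linalg.lstsq(Phi, y, rcond=None)[0]

print(est.var(axis=0))                          # both close to lam/N
print(lam / N)
```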
a) The least squares estimate can be written

$$\hat\theta = \Big[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1}\,\frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$$

As $N \to \infty$,

$$\hat\theta \to (\bar R)^{-1}\,E\{\varphi(t)y(t)\}$$

where $\bar R = E\{\varphi(t)\varphi^T(t)\}$, cf. the previous problem. With $\varphi(t) = (u(t-1)\;\; u(t-2))^T$,

$$(\bar R)^{-1}\,E\{\varphi(t)y(t)\} = \begin{bmatrix} R_u(0) & R_u(1) \\ R_u(1) & R_u(0) \end{bmatrix}^{-1}\begin{bmatrix} R_{yu}(1) \\ R_{yu}(2) \end{bmatrix}$$

Since $u(t)$ is white noise, $R_u(1) = 0$ and

$$R_{yu}(1) = E\,y(t)u(t-1) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]\,u(t-1)\} = b_1 R_u(0)$$

$$R_{yu}(2) = E\,y(t)u(t-2) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]\,u(t-2)\} = b_2 R_u(0)$$

so the estimate converges to

$$\hat\theta \to \begin{bmatrix} 1/R_u(0) & 0 \\ 0 & 1/R_u(0) \end{bmatrix}\begin{bmatrix} R_u(0)\,b_1 \\ R_u(0)\,b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

which is expected since we have a FIR model (which can be interpreted as a linear regression model) and the model structure is correct.
b) We first calculate

$$\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t) = \begin{bmatrix} \frac{1}{N}\sum_{t=1}^{N} u^2(t-1) & \frac{1}{N}\sum_{t=1}^{N} u(t-1)u(t-2) \\ \frac{1}{N}\sum_{t=1}^{N} u(t-1)u(t-2) & \frac{1}{N}\sum_{t=1}^{N} u^2(t-2) \end{bmatrix} = \begin{bmatrix} \frac{N-1}{N} & \frac{N-2}{N} \\ \frac{N-2}{N} & \frac{N-2}{N} \end{bmatrix}$$

As $N \to \infty$ this matrix tends to a matrix with all elements equal to one, which is singular; hence $\bar R$ is singular for a unit-step input.

If only the parameter $b_1$ is estimated, i.e. $\varphi(t) = u(t-1)$, the estimate converges to $R_{yu}(1)/R_u(0)$, where

$$R_{yu}(1) = E\,y(t)u(t-1) = E\{[b_1 u(t-1) + b_2 u(t-2) + e(t)]\,u(t-1)\} = b_1 R_u(0) + b_2 R_u(1)$$

which gives

$$\hat\theta_1 = \hat b_1 = b_1 + b_2\,\frac{R_u(1)}{R_u(0)}$$

a) If $u(t)$ is white noise, $R_u(1) = 0$ and we get $\hat b_1 = b_1$.
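The last expression can be illustrated numerically (an added sketch; the parameter values, the input amplitude and the frequency $\omega_1$ are arbitrary assumptions). With a sinusoidal input, $R_u(1)/R_u(0) = \cos(\omega_1)$, so the single-parameter estimate converges to $b_1 + b_2\cos(\omega_1)$ rather than to $b_1$.

```python
import numpy as np

# Undermodelling bias: fit y(t) = b1*u(t-1) only, while the true system is
# y(t) = b1*u(t-1) + b2*u(t-2) + e(t). With u(t) = A*sin(w1*t) the estimate
# converges to b1 + b2*cos(w1).
# Arbitrary assumptions: b1 = 1.0, b2 = 0.5, A = 1, w1 = pi/4, noise variance 0.1, N = 100000.
rng = np.random.default_rng(5)
b1, b2, A, w1, N = 1.0, 0.5, 1.0, np.pi / 4, 100_000

t = np.arange(N + 2)
u = A * np.sin(w1 * t)
e = rng.normal(scale=np.sqrt(0.1), size=N)
y = b1 * u[1:-1] + b2 * u[:-2] + e               # y(t) for t = 2..N+1

b1_hat = np.sum(u[1:-1] * y) / np.sum(u[1:-1] ** 2)
print(b1_hat, b1 + b2 * np.cos(w1))              # the two values nearly coincide
```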
4 L5 - Some additional problems
4.1 Complementary theory - Analysis of the least squares
estimate
4.1.1 Results from Linear Regression
The accuracy result is based on the following assumptions:
Assumption A1.
Assume that the data are generated by (the true system)

$$y(t) = \varphi^T(t)\theta_o + e(t)$$

Assumption A2.
The noise $e(t)$ is white noise with zero mean and variance $\lambda$.

Assumption A3.
It is finally assumed that $E\{\varphi(t)e(s)\} = 0$ for all $t$ and $s$. This means that the regression vector is not influenced (directly or indirectly) by the noise source $e(t)$.

In the material Linear Regression it was shown that if Assumptions A1-A3 hold, then:

1. The estimate is consistent,

$$\hat\theta = \Big[\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\Big]^{-1}\sum_{t=1}^{N}\varphi(t)y(t) \;\to\; \theta_o \quad \text{as } N \to \infty \qquad (12)$$

2. The covariance matrix $P$ is given by

$$P = \mathrm{cov}\,\hat\theta \;\to\; \frac{\lambda}{N}\,\big[E\{\varphi(t)\varphi^T(t)\}\big]^{-1} \quad \text{as } N \to \infty \qquad (13)$$
Remarks:

- Two typical examples when A3 does not hold are when the system is an AR process or an ARX process (we then have values of the output in the regressor vector).

- If the noise is not white, the estimate will in general not be consistent (in contrast to when A3 holds; see Linear Regression).

- Results for general linear models are presented on pages 297-299 in the textbook.
4.2 Problems
1) Predictor using exponential smoothing

A simple predictor for a signal $y(t)$ is the so-called exponential smoothing predictor, given by

$$\hat y(t) = \frac{1 - \alpha}{1 - \alpha q^{-1}}\,y(t-1)$$

a) Show that if $y(t) = m$ for all $t$, the predictor will in steady state (stationarity) be equal to $m$.

b) For which ARMA model is the predictor optimal?

Hint: Rewrite the predictor in the form $\hat y(t) = L(q)\,y(t)$ and compare with the predictor for an ARMA model $A(q)\,y(t) = C(q)\,e(t)$.
2) Consider the sample cross-covariance between the prediction errors and the input,

$$\hat R_{\varepsilon u}(\tau) = \frac{1}{N}\sum_{t=1}^{N}\varepsilon(t)\,u(t-\tau)$$

where $\varepsilon(t) = y(t) - \hat y(t) = y(t) - \varphi^T(t)\hat\theta$ is the prediction error. Show that the least squares estimate gives

$$\hat R_{\varepsilon u}(\tau) = 0, \qquad \tau = 1, 2, \ldots, n_b$$

Hint: Show that the least squares estimate gives $\sum_{t=1}^{N}\varphi(t)\varepsilon(t) = 0$.
3) The variance increases if more parameters than needed are estimated!

Assume that data from an AR(1) process (the system) are collected:

$$y(t) + a_o\,y(t-1) = e(t)$$

where $e(t)$ is white noise with zero mean and variance $\lambda$. The system is stable, which means that $|a_o| < 1$. Two predictors are considered:

M1: $\hat y(t) = -a\,y(t-1)$

M2: $\hat y(t) = -a_1\,y(t-1) - a_2\,y(t-2)$

where the parameters of each predictor are estimated with the least squares method. Hence both models give consistent estimates. Show that the price to pay for estimating too many parameters is that $\mathrm{var}(\hat a_1) > \mathrm{var}(\hat a)$ as $N \to \infty$.

Hint: For the AR(1) process we have that $R_y(k) = E\,y(t+k)y(t) = (-a_o)^k R_y(0)$, $k = 1, 2, \ldots$, where $R_y(0) = \frac{\lambda}{1 - a_o^2}$.
4) Variance of parameters in an estimated ARX model

Assume that data were collected from the following ARX process (the system):

$$y(t) + a_o\,y(t-1) = b_o\,u(t-1) + e(t)$$

where $e(t)$ is white noise with zero mean and variance $\lambda$. The system is stable, which means that $|a_o| < 1$. The input signal is uncorrelated with $e(t)$, and is white noise with zero mean and variance $\mu$. The parameters of an ARX model, with predictor

$$\hat y(t|\theta) = -a\,y(t-1) + b\,u(t-1),$$

are estimated with the least squares method.

Calculate the asymptotic (in the number of data points $N$) variance of the parameter estimates.

Hint:

$$R_y(0) = E\,y^2(t) = \frac{b_o^2\,\mu + \lambda}{1 - a_o^2}$$
4.3 Solutions
1a) We rewrite the predictor in standard form

$$\hat y(t) = \frac{q^{-1}(1 - \alpha)}{1 - \alpha q^{-1}}\,y(t) = L(q)\,y(t)$$

The static gain is given by $L(1)$, and since $L(1) = 1$ the steady-state value of the predictor will be $\hat y(t) = m$.
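A few lines of code make the steady-state property concrete (an added illustration; $\alpha = 0.9$ and the constant level $m = 3$ are arbitrary choices):

```python
import numpy as np

# Exponential smoothing predictor: y_hat(t) = alpha*y_hat(t-1) + (1 - alpha)*y(t-1).
# For a constant signal y(t) = m the prediction converges to m (static gain L(1) = 1).
# Arbitrary assumptions: alpha = 0.9, m = 3.0, 200 time steps.
alpha, m, n = 0.9, 3.0, 200
y = np.full(n, m)

y_hat = np.zeros(n)
for t in range(1, n):
    y_hat[t] = alpha * y_hat[t - 1] + (1 - alpha) * y[t - 1]

print(y_hat[-1])    # close to m = 3.0
```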
2) The least squares estimate satisfies the normal equations, so

$$\sum_{t=1}^{N}\varphi(t)\varepsilon(t) = \sum_{t=1}^{N}\varphi(t)y(t) - \sum_{t=1}^{N}\varphi(t)\varphi^T(t)\hat\theta = \sum_{t=1}^{N}\varphi(t)y(t) - \sum_{t=1}^{N}\varphi(t)y(t) = 0$$

Since the regression vector contains $u(t-1), \ldots, u(t-n_b)$, the corresponding rows of this identity state exactly that $\hat R_{\varepsilon u}(\tau) = 0$ for $\tau = 1, \ldots, n_b$.

3) For the model M2 we have $\varphi(t) = (-y(t-1)\;\; -y(t-2))^T$ and, using $R_y(1) = -a_o R_y(0)$,

$$\mathrm{var}(\hat a_1) \to \frac{\lambda\,R_y(0)}{N\big(R_y^2(0) - R_y^2(1)\big)} = \frac{\lambda\,R_y(0)}{N\,R_y^2(0)(1 - a_o^2)} = \frac{\lambda}{N\,R_y(0)(1 - a_o^2)} = \frac{1}{N} \quad \text{as } N \to \infty$$

since $R_y(0) = \lambda/(1 - a_o^2)$. For the model M1 we get

$$\mathrm{var}(\hat a) \to \frac{\lambda}{N\,R_y(0)} = \frac{1 - a_o^2}{N} \quad \text{as } N \to \infty$$

We have thus shown that $\mathrm{var}(\hat a_1) > \mathrm{var}(\hat a)$ as $N \to \infty$ (since $|a_o| < 1$).
4) The predictor gives $\varphi(t) = (-y(t-1)\;\; u(t-1))^T$ and $\theta = (a\;\; b)^T$. This gives

$$E\{\varphi(t)\varphi^T(t)\} = \begin{bmatrix} R_y(0) & -R_{yu}(0) \\ -R_{yu}(0) & R_u(0) \end{bmatrix}$$

The cross covariance between $y$ and $u$ is

$$R_{yu}(0) = E\{(-a_o\,y(t-1) + b_o\,u(t-1) + e(t))\,u(t)\} = 0$$

since $u(t)$ is white noise (and uncorrelated with $e(t)$). The (asymptotic) covariance matrix is therefore

$$P = \mathrm{cov}\,\hat\theta \;\to\; \frac{\lambda}{N}\begin{bmatrix} \frac{b_o^2\mu + \lambda}{1 - a_o^2} & 0 \\ 0 & \mu \end{bmatrix}^{-1} = \frac{\lambda}{N}\begin{bmatrix} \frac{1 - a_o^2}{b_o^2\mu + \lambda} & 0 \\ 0 & \frac{1}{\mu} \end{bmatrix} \quad \text{as } N \to \infty$$

From the diagonal elements we get the variances:

$$\mathrm{var}(\hat a) = \frac{\lambda\,(1 - a_o^2)}{N\,(b_o^2\mu + \lambda)} \qquad \text{and} \qquad \mathrm{var}(\hat b) = \frac{\lambda}{N\mu} \qquad (\text{as } N \to \infty)$$
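These asymptotic expressions can be checked with a short simulation (an added sketch; $a_o = -0.7$, $b_o = 1$, $\lambda = \mu = 1$ and the data length are arbitrary assumptions):

```python
import numpy as np

# Monte Carlo check of var(a_hat) ~ lam*(1 - ao^2)/(N*(bo^2*mu + lam)) and
# var(b_hat) ~ lam/(N*mu) for the ARX system y(t) + ao*y(t-1) = bo*u(t-1) + e(t).
# Arbitrary assumptions: ao = -0.7, bo = 1.0, lam = mu = 1.0, N = 500, 1000 realisations.
rng = np.random.default_rng(6)
ao, bo, lam, mu, N, n_mc = -0.7, 1.0, 1.0, 1.0, 500, 1000

est = np.empty((n_mc, 2))
for i in range(n_mc):
    u = rng.normal(scale=np.sqrt(mu), size=N)
    e = rng.normal(scale=np.sqrt(lam), size=N)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = -ao * y[t - 1] + bo * u[t - 1] + e[t]
    Phi = np.column_stack([-y[:-1], u[:-1]])     # phi(t) = (-y(t-1), u(t-1))
    est[i] = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]

print(est.var(axis=0))                           # sample variances of (a_hat, b_hat)
print(lam * (1 - ao**2) / (N * (bo**2 * mu + lam)), lam / (N * mu))
```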