
Parametric Identification
John Lataire

Estimators and their properties

Estimators

An estimator f of a parameter (vector) θ, based on a data set S:

  f : 𝒮 → Θ : S ↦ f(S) = θ̂

[Diagram: a data generating system, driven by a stochastic disturbance, produces the data set S ∈ 𝒮; the estimator f, defined by the chosen model structure, maps S to the estimate θ̂.]

Properties of estimators

• Bias of the estimator

  bias = 𝔼{θ̂} − θ

• If 𝔼{θ̂} = θ, then the estimator is unbiased.

• Variance of the estimator

  variance = σ²_θ̂ = 𝔼{ |θ̂ − 𝔼{θ̂}|² }

  covariance = C_θ = 𝔼{ (θ̂ − 𝔼{θ̂})(θ̂ − 𝔼{θ̂})ᴴ }

• The Mean Squared Error of the estimator is

  MSE = 𝔼{ |θ̂ − θ|² }

Properties of estimators
Consistency

• An estimator is (weakly) consistent if it converges (N → ∞) in a stochastic sense to the true parameter value:

  ∀ε > 0 : lim_{N→∞} Prob( |θ̂(N) − θ| < ε ) = 1,   i.e.  plim_{N→∞} θ̂(N) = θ

• Consistency expresses the contraction of the distribution of the estimate around the true value as the number of data samples increases to infinity.

Consistency, other definitions

• Strongly consistent:  Prob( lim_{N→∞} θ̂(N) = θ₀ ) = 1,   i.e.  lim_{N→∞} θ̂(N) = θ₀ w.p. 1 (or a.s.)

• Weakly consistent:  plim_{N→∞} θ̂(N) = θ₀

• Consistent in mean square:  lim_{N→∞} 𝔼{ (θ̂(N) − θ₀)² } = 0,   i.e.  l.i.m._{N→∞} θ̂(N) = θ₀

Consistency VS (asymptotically) unbiased

[Figure: probability density f_θ̂(θ̂ − θ₀) plotted against θ̂ − θ₀, for N = 2 (left) and N → ∞ (right); the distribution contracts around the true value as N grows.]

Law of large numbers

• The sample mean converges asymptotically (N → ∞) to its expected value.

• Let 𝔼{x(n)} = μ, ∀n, then  lim_{N→∞} (1/N) Σ_{n=1}^{N} x(n) = μ

• Depends on the distribution. For example, it is valid for i.i.d. random variables for which the expected value exists.
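A minimal numerical illustration (added here, not part of the original slides): the sample mean of i.i.d. draws approaches its expected value μ as N grows; μ and the noise level are arbitrary assumed values.

import numpy as np

rng = np.random.default_rng(0)
mu = 3.0                                       # assumed true expected value
for N in (10, 1_000, 100_000):
    x = rng.normal(loc=mu, scale=2.0, size=N)  # i.i.d. samples with E{x(n)} = mu
    print(N, x.mean())                         # sample mean approaches mu as N grows
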
Parametric LTI model classes

Denote the time-shift operator q, such that

  q u(t) = u(t + 1),   q⁻¹ u(t) = u(t − 1)

General model structure:

  y(t) = G(q) u(t) + H(q) e(t)

with e white noise and

  G(q) = Σ_{k=0}^{∞} g(k) q⁻ᵏ,   H(q) = 1 + Σ_{k=1}^{∞} h(k) q⁻ᵏ

In the parametric model classes below, G(q) and H(q) are parameterised as rational forms in q.

Models with rational transfer functions

Common cases:

  y(t) = B(q) u(t) + e(t)                              FIR
  y(t) = (B(q)/F(q)) u(t) + e(t)                       Output Error (OE)
  A(q) y(t) = B(q) u(t) + e(t)                         Auto-Regressive eXogenous (ARX)
  A(q) y(t) = C(q) e(t)                                ARMA
  A(q) y(t) = B(q) u(t) + C(q) e(t)                    ARMAX
  A(q) y(t) = B(q) u(t) + (C(q)/D(q)) e(t)             ARARMAX
  y(t) = (B(q)/F(q)) u(t) + (C(q)/D(q)) e(t)           Box-Jenkins (BJ)

Models with rational transfer functions

General form, with A, C, D, F monic (first coefficient = 1) and e white noise:

  A(q) y(t) = (B(q)/F(q)) u(t) + (C(q)/D(q)) e(t)

[Block diagram: the white noise e is filtered by C/(AD), the input u is filtered by B/(AF), and the two contributions sum to the output y.]

Linear Least Squares

Linear Least Squares

• Model assumptions:

  y = Hθ₀ + v

• Linear Least Squares estimator:

  θ̂_LS ≡ argmin_θ (1/2) ‖y − Hθ‖²₂ = (HᵀH)⁻¹Hᵀy

  ŷ = Hθ̂_LS = H(HᵀH)⁻¹Hᵀy
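A minimal sketch (added, not from the original slides) of the LLS estimator θ̂_LS = (HᵀH)⁻¹Hᵀy for the model y = Hθ₀ + v; the dimensions, true parameters and noise level are assumed for illustration.

import numpy as np

rng = np.random.default_rng(1)
N, n_theta = 200, 3
H = rng.normal(size=(N, n_theta))            # noiseless regression matrix
theta_0 = np.array([1.0, -0.5, 2.0])         # assumed true parameter vector
y = H @ theta_0 + 0.1 * rng.normal(size=N)   # measured output y = H theta_0 + v

# Solving via lstsq is numerically preferable to explicitly forming (H^T H)^{-1}
theta_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ theta_ls                         # modelled output H theta_LS
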
Derivation Linear Least Squares

  θ̂ ≡ argmin_θ (1/2) ‖y − Hθ‖²₂ = argmin_θ (1/2) eᵀe,   with e ≡ (y − Hθ)

  (1/2) d(eᵀe)/dθ = (1/2) (de/dθ)ᵀ e + (1/2) eᵀ (de/dθ) = (de/dθ)ᵀ e |_{θ = θ̂} = (−Hᵀ)(y − Hθ̂) ≡ 0

  ⇒ Hᵀy = HᵀHθ̂
  ⇒ θ̂ = (HᵀH)⁻¹Hᵀy

LLS properties, noiseless regression matrix

• For a noiseless regression matrix H (uncorrelated with v):

• Unbiased:

  𝔼{θ̂_LS} = 𝔼{ (HᵀH)⁻¹Hᵀ(Hθ₀ + v) } = θ₀ + (HᵀH)⁻¹Hᵀ 𝔼{v} = θ₀

• Covariance:

  Cov(θ̂_LS) = (HᵀH)⁻¹ Hᵀ C_v H (HᵀH)⁻¹

LLS: example (noiseless regression matrix)

• Polynomial approximation of the arctan function:

  y(k) = atan u(k) + n_y(k),   with u ∈ [−2, 4]

• Model structure: y = Σ_{r=0}^{n_θ−1} θ_r u^r

• θ̂_LS = (HᵀH)⁻¹(Hᵀy),   with H = [ ⋮  ⋮  ⋮ ; 1  u(k)  ⋯  u(k)^{n_θ−1} ; ⋮  ⋮  ⋮ ]
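A small sketch (added for illustration) of this polynomial LLS fit; the number of samples, polynomial order n_θ and noise level σ_y are assumed values.

import numpy as np

rng = np.random.default_rng(2)
N, n_theta, sigma_y = 500, 6, 0.1
u = rng.uniform(-2.0, 4.0, size=N)               # u in [-2, 4]
y = np.arctan(u) + sigma_y * rng.normal(size=N)  # y(k) = atan u(k) + n_y(k)

# Row k of the regression matrix is [1, u(k), ..., u(k)^(n_theta - 1)]
H = np.vander(u, n_theta, increasing=True)
theta_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
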
LLS: Example ctd.

[Figure: fitted polynomial and data, y against u, for noise levels σ_y = 0, 0.1 and 0.5 (top row), with the corresponding bias / RMS errors in dB as a function of u (bottom row).]

LLS properties, noisy regression matrix

Case 1: the regression matrix is only approximately known: H = H₀ + ΔH, and y = H₀θ + v

• θ̂_LS = (HᵀH)⁻¹(HᵀH₀)θ + (HᵀH)⁻¹(Hᵀv)

• For N → ∞, use the law of large numbers:

  • HᵀH → 𝔼{H₀ᵀH₀} + 𝔼{ΔHᵀΔH} = H₀ᵀH₀ + C_H
  • HᵀH₀ → H₀ᵀH₀
    (assuming that H₀ and ΔH are uncorrelated)
  • Hᵀv → H₀ᵀv + 𝔼{ΔHᵀv} = H₀ᵀv + C_Hv

• Thus θ̂_LS → (H₀ᵀH₀ + C_H)⁻¹ (H₀ᵀH₀ θ + C_Hv)

LLS properties, noisy regression matrix

Case 2: the regression matrix is known, but correlated with the noise:

  y = Hθ₀ + v,   with C_Hv = 𝔼{Hᵀv} ≠ 0

• This gives:

  θ̂ = (HᵀH)⁻¹Hᵀy = (HᵀH)⁻¹Hᵀ(Hθ₀ + v) = θ₀ + (HᵀH)⁻¹Hᵀv

• For N → ∞ (law of large numbers):

  θ̂_{N→∞} = θ₀ + 𝔼{HᵀH}⁻¹ 𝔼{Hᵀv} = θ₀ + 𝔼{HᵀH}⁻¹ C_Hv

• INCONSISTENT!

• Compensate: θ̂_BCLS = (HᵀH)⁻¹(Hᵀy − C_Hv)

• Difficult: C_Hv needs to be known

LLS regression of FIR model

• FIR model:

  y(t) = Σ_{n=0}^{R} h_n u(t − n) + v(t)

• Model structure: y = Hθ with

  H = [ u(0)    0        0        ⋯   0
        u(1)    u(0)     0        …   0
        ⋮       ⋮        ⋮        ⋱   ⋮
        u(t)    u(t−1)   u(t−2)   …   u(t−R)
        ⋮       ⋮        ⋮        ⋱   ⋮
        u(N)    u(N−1)   u(N−2)   …   u(N−R) ]

• Consistent estimate
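An added sketch of how this regression matrix can be built: row t of H contains [u(t), u(t−1), …, u(t−R)], with u(t) = 0 for t < 0; the helper name is hypothetical.

import numpy as np

def fir_regression_matrix(u, R):
    """Build the N x (R+1) regression matrix for an FIR model of order R."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    H = np.zeros((N, R + 1))
    for n in range(R + 1):
        H[n:, n] = u[:N - n]   # n-th column holds the input shifted by n samples
    return H

# The LLS impulse response estimate then follows as
# h_hat, *_ = np.linalg.lstsq(fir_regression_matrix(u, R), y, rcond=None)
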
Example FIR 1st order

• y(t) = u(t) + a u(t − 1) + v(t)

• OR: y(t) − u(t) = a u(t − 1) + v(t)

• â = (HᵀH)⁻¹(Hᵀ(y − u)),   with H = [ u(−1), u(0), u(1), ⋯, u(N − 1) ]ᵀ

• Thus:  â = [ (1/N) Σ_{t=1}^{N} (y(t) − u(t)) u(t − 1) ] / [ (1/N) Σ_{t=1}^{N} u(t − 1)² ]

LLS for ARMAX

• Model structure: y(t) = a u(t) + b y(t − 1)

• True system: y(t) = u(t) − a₀ y(t − 1) + e(t) + e(t − 1)

• y = ϕθ₀ + v,   with ϕ(t) = [u(t), y(t − 1)] (rows stacked over t)

  θ̂ = (ϕᵀϕ)⁻¹ϕᵀy = (ϕᵀϕ)⁻¹(ϕᵀϕθ₀ + ϕᵀv) = θ₀ + (ϕᵀϕ)⁻¹(ϕᵀv)

• Consistent? For N → ∞:  θ̂_{N→∞} = θ₀ + (𝔼{ϕᵀϕ})⁻¹ (𝔼{ϕᵀv})

  𝔼{ϕᵀv} = 𝔼{ [u(t); y(t − 1)] (e(t) + e(t − 1)) }
         = 𝔼{ [u(t); (a u(t − 1) + b y(t − 2) + e(t − 1) + e(t − 2))] (e(t) + e(t − 1)) }
         ≠ 0

LLS ARMAX example

• True system: y(t) = b₀ u(t) + a₁ y(t − 1) + e(t) + e(t − 1)

  with u(t) and e(t) white noises with variances σ_u² and σ_e²

• θ̂ = [ϕᵀϕ]⁻¹ϕᵀy,   with ϕ(t) = [u(t), y(t − 1)]

• Analysis for N → ∞:  θ̂_{N→∞} = 𝔼{ϕᵀϕ}⁻¹ 𝔼{ϕᵀy}

• Persistent excitation: is 𝔼{ϕᵀϕ} regular?

• Consistency: θ̂_{N→∞} = θ₀ ?

LLS ARMAX example ctd.

𝔼{ϕᵀϕ} = [ 𝔼{u²(t)}          𝔼{u(t)y(t − 1)}
           𝔼{y(t − 1)u(t)}    𝔼{y²(t − 1)}    ],   with 𝔼{u(t)y(t − 1)} = 0,  𝔼{u²(t)} = σ_u²

𝔼{y²(t − 1)} = b₀²σ_u² + a₁² 𝔼{y²(t − 2)} + 2σ_e² + 2a₁ 𝔼{y(t − 2)e(t − 2)}      (last factor = σ_e²)
               + 𝔼{ b₀u(t − 1)(e(t − 1) + e(t − 2)) + a₁ y(t − 2) e(t − 1) }     (= 0)

𝔼{y(t − 2)e(t − 2)} = 𝔼{b₀u(t − 2)e(t − 2)} + 𝔼{a₁y(t − 3)e(t − 2)} + 𝔼{e²(t − 2)} + 𝔼{e(t − 3)e(t − 2)}
                    = 0 + 0 + σ_e² + 0 = σ_e²

→ 𝔼{ϕᵀϕ} is diagonal (with non-zero elements), thus regular: persistency OK

Use stationarity: ∀τ ∈ ℤ : 𝔼{y²(t)} = 𝔼{y²(t − τ)}  →  𝔼{y²(t − 1)} = ( b₀²σ_u² + σ_e²(2 + 2a₁) ) / (1 − a₁²)

𝔼{ϕᵀy} = 𝔼{ϕᵀϕ}θ₀ + 𝔼{ [u(t); y(t − 1)] (e(t) + e(t − 1)) }
       = 𝔼{ϕᵀϕ}θ₀ + [ 0 ; 𝔼{ (b₀u(t − 1) + a₁y(t − 2) + e(t − 1) + e(t − 2)) (e(t) + e(t − 1)) } ]
       = 𝔼{ϕᵀϕ}θ₀ + [ 0 ; σ_e² ]   →  results in an asymptotic error → inconsistent!

LLS ARMAX example

• y(t) = u(t) − 0.7 y(t − 1) + e(t) + e(t − 1)

• σ_e = 0.6

• σ_u = 1

• θ_as. error = [0 ; 0.151]

[Figure: LLS estimates as a function of the number of samples N (10¹ … 10⁴), illustrating convergence to values offset by the asymptotic error above.]
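An added simulation sketch of this example (not from the slides): for large N the LLS estimate of the y(t − 1) coefficient settles about 0.151 away from the true value −0.7, consistent with the asymptotic error derived above.

import numpy as np

rng = np.random.default_rng(3)
N, sigma_u, sigma_e = 100_000, 1.0, 0.6
u = sigma_u * rng.normal(size=N)
e = sigma_e * rng.normal(size=N)

y = np.zeros(N)
for t in range(1, N):
    y[t] = u[t] - 0.7 * y[t - 1] + e[t] + e[t - 1]   # true ARMAX system

Phi = np.column_stack([u[1:], y[:-1]])               # phi(t) = [u(t), y(t-1)]
theta_ls, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta_ls)   # approx [1, -0.7 + 0.151]: the LLS estimate is inconsistent
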
More on LLS

• For N → ∞:  θ̂_{N→∞} = (𝔼{ϕᵀϕ})⁻¹ 𝔼{ϕᵀy}
  exists if 𝔼{ϕᵀϕ} is non-singular. Persistency of excitation?

• Consistent if ϕ is uncorrelated with the noise on y

• Also, the estimate is the solution of

  ϕᵀ(y − ϕθ̂) = 0

• Thus, find θ̂ such that y − ϕθ̂ is uncorrelated with ϕ

• Towards instrumental variables

Linear Least Squares: examples

• Function approximations (e.g. smoothing)

• FIR estimation

• Transfer function estimation (watch out: biased and, in many cases, not consistent)

• Difference equation estimation

Instrumental Variables

Instrumental variables

• Linear in the parameters model: y = ϕθ + v

• Problem when 𝔼{ϕᵀv} = C_ϕv ≠ 0

• Least squares estimate: ϕᵀ(y − ϕθ̂) = 0

• Replace ϕᵀ by ζᵀ, where ζ is
  • correlated with the regression matrix: 𝔼{ζᵀϕ} is regular
  • uncorrelated with the noise: 𝔼{ζᵀv} = 0

  θ̂_IV = (ζᵀϕ)⁻¹(ζᵀy)
  θ̂_IV,N→∞ = 𝔼{ζᵀϕ}⁻¹( 𝔼{ζᵀϕθ₀} + 𝔼{ζᵀv} ) = θ₀

• Choice of ζ: an estimate of the noiseless ϕ

• Iterative approach: start from LLS (a sketch follows below)
  → θ̂_LS : use it to simulate the estimated output
  → ζ₁ → θ̂_IV,1 → iterate → ζᵢ → θ̂_IV,i
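An added sketch of this iterative IV scheme for a regressor ϕ(t) = [u(t), y(t − 1)]: simulate the noiseless output with the current estimate, use it as the instrument, then re-estimate. The function name and iteration count are illustrative assumptions.

import numpy as np

def iv_first_order(u, y, n_iter=5):
    """Iterative IV estimate of theta = [b, a] in y(t) = b u(t) + a y(t-1) + v(t)."""
    Phi = np.column_stack([u[1:], y[:-1]])                 # phi(t) = [u(t), y(t-1)]
    theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]     # start from the LLS estimate
    for _ in range(n_iter):
        b, a = theta
        x = np.zeros_like(y, dtype=float)
        for t in range(1, len(y)):
            x[t] = b * u[t] + a * x[t - 1]                 # simulated (noiseless) output
        Z = np.column_stack([u[1:], x[:-1]])               # instrument zeta(t) = [u(t), x(t-1)]
        theta = np.linalg.solve(Z.T @ Phi, Z.T @ y[1:])    # theta_IV = (Z^T Phi)^{-1} Z^T y
    return theta
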
Example Instrumental variables

System:  y(t) = u(t) − y(t − 1)/2 + e(t) − e(t − 1)

Regression matrix:  ϕ(t) = [u(t)  y(t − 1)]

Instrument:  ζ(t) = [u(t)  x(t − 1)], with x(t) an estimate of y(t):

  x(t) = u(t) + â₁ x(t − 1),  where â₁ is optimised iteratively, e.g. starting from LLS.

Convergence to the true value?  𝔼{ζᵀv} = 𝔼{ζᵀ(e(t) − e(t − 1))} = 0: OK

Identifiable?  𝔼{ζᵀϕ(t)} should be regular.

  𝔼{ζᵀϕ(t)} = [ 𝔼{u²(t)}          𝔼{u(t)y(t − 1)}
                𝔼{x(t − 1)u(t)}    𝔼{x(t − 1)y(t − 1)} ]  =  [ σ_u²   0
                                                               0      (4/3)σ_u² ]   OK, regular!

  𝔼{x(t − 1)y(t − 1)} = 𝔼{ (u(t − 1) + â₁ x(t − 2)) (u(t − 1) − y(t − 2)/2 + e(t − 1) − e(t − 2)) }
                      = σ_u² − (â₁/2) 𝔼{x(t − 2)y(t − 2)}          (use stationarity)
                      = σ_u² / (1 + â₁/2)  →  σ_u² / (1 − 1/4) = (4/3)σ_u²   (using â₁ → a₁ = −1/2)

IV ARMAX example

• y(t) = 2u(t) + 0.8 y(t − 1) + e(t) − (1/2) e(t − 1)

• σ_e = 2

• σ_u = 1

[Figure: true parameter values, LLS estimates and IV estimates as a function of the number of samples N (10¹ … 10⁴).]

Maximum Likelihood

Maximum Likelihood definition

• Denote the data z (e.g. z(t) = (u(t), y(t)))

• Consider that z is a realisation of some stochastic data generating system, with
  • a probability density function parameterised by θ ∈ ℝ^{n_θ},
  • and true value θ₀:

    pdf(z) = f(z ∣ θ₀)

• When f(z ∣ θ) is considered as a function of θ, it is called
  the likelihood that θ has generated the data z.

• Maximum likelihood estimate:  θ̂_ML = arg max_θ f(z ∣ θ)

  → maximises the likelihood that z has been generated by the process with parameter θ

ML for linear in the parameters, white noise

• y(t) = K(t)θ + v(t),  with v ∼ 𝒩(0, I σ_v²) (white noise)

• f(v) = (2πσ_v²)^{−N/2} exp( −(1/2) Σ_t v²(t)/σ_v² )

• log f(y ∣ θ) = −(N/2) log 2πσ_v² − (1/2) Σ_t v²(t)/σ_v²,   with v(t) = y(t) − K(t)θ

• arg max_θ log f(y ∣ θ) = arg min_θ (1/2) Σ_t (y(t) − K(t)θ)² / σ_v²

Correlated noise, linear in the parameters

• y(t) = K(t)θ + v(t),  with v ∼ 𝒩(0, C_v) (correlated noise)

• f(v) = ( (2π)^N |C_v| )^{−1/2} exp( −(1/2) vᵀ C_v⁻¹ v )

• log f(y ∣ θ) = −(1/2) log( (2π)^N |C_v| ) − (1/2) vᵀ C_v⁻¹ v,   with v = y − Kθ

• arg max_θ log f(y ∣ θ) = arg min_θ (1/2) (y − Kθ)ᵀ C_v⁻¹ (y − Kθ)

• C_v diagonal → Weighted Least Squares

Weighted linear least squares

  V_WLS = (1/2) (y − Hθ)ᵀ W (y − Hθ)

  d V_WLS / dθ = −HᵀW(y − Hθ̂) = 0

  ⇒ HᵀWHθ̂ = HᵀWy
  ⇒ θ̂_WLS = (HᵀWH)⁻¹HᵀWy

  cov(θ̂_WLS) = (HᵀWH)⁻¹ (HᵀW C_v W H) (HᵀWH)⁻¹

  For W = λC_v⁻¹:  cov(θ̂_WLS) = (Hᵀ C_v⁻¹ H)⁻¹
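An added sketch of the WLS estimator θ̂_WLS = (HᵀWH)⁻¹HᵀWy and its covariance for the choice W = C_v⁻¹; the function name is illustrative.

import numpy as np

def wls(H, y, Cv):
    """Weighted LLS for y = H theta + v with noise covariance Cv, using W = Cv^{-1}."""
    W = np.linalg.inv(Cv)                        # weighting matrix
    A = H.T @ W @ H
    theta = np.linalg.solve(A, H.T @ W @ y)      # normal equations H^T W H theta = H^T W y
    cov_theta = np.linalg.inv(A)                 # = (H^T Cv^{-1} H)^{-1} for this choice of W
    return theta, cov_theta
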
Maximum Likelihood example

• v₀ = R i₀,   v_m(k) = v₀ + n_v(k),   i_m(k) = i₀ + n_i(k)

• Assume known noise variances σ_i², σ_v², uncorrelated

• Unknowns: v₀, i₀, R

• Log likelihood:

  V_logML(R, i, v) = −(1/2) Σ_k n_v²(k)/σ_v² − (1/2) Σ_k n_i²(k)/σ_i²
                   = −(1/2) Σ_k (v_m(k) − R i)²/σ_v² − (1/2) Σ_k (i_m(k) − i)²/σ_i²

Maximum Likelihood example ctd.

  ∂V_ML(R, i, z)/∂R = 0  ⇒  (i/σ_v²) Σ_{k=1}^{N} (v(k) − R i) = 0  ⇒  R = (1/(iN)) Σ_{k=1}^{N} v(k)

  ∂V_ML(R, i, z)/∂i = 0  ⇒  (R/σ_v²) Σ_{k=1}^{N} (v(k) − R i) + (1/σ_i²) Σ_{k=1}^{N} (i(k) − i) = 0

  Substituting R i = (1/N) Σ_{k'} v(k'), the first term becomes

  (R/σ_v²) Σ_{k=1}^{N} ( v(k) − (1/N) Σ_{k'=1}^{N} v(k') ) = 0,

  so that i = (1/N) Σ_{k=1}^{N} i(k), and therefore

  R̂ = ( (1/N) Σ_{k=1}^{N} v(k) ) / ( (1/N) Σ_{k=1}^{N} i(k) )
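An added numerical sketch of this example: with known noise variances the ML estimates reduce to sample means, R̂ = mean(v_m)/mean(i_m); all numerical values are assumed for illustration.

import numpy as np

rng = np.random.default_rng(4)
R_true, i0, N = 100.0, 0.01, 1000            # assumed resistance, current and record length
sigma_v, sigma_i = 0.05, 2e-4                # assumed (known) noise standard deviations
v_m = R_true * i0 + sigma_v * rng.normal(size=N)   # voltage measurements v0 + n_v(k)
i_m = i0 + sigma_i * rng.normal(size=N)            # current measurements i0 + n_i(k)

i_hat = i_m.mean()                           # ML estimate of the current
R_hat = v_m.mean() / i_hat                   # ML estimate of the resistance
print(R_hat)
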
Consistency and uncertainty

Consistency (based on cost function)

• Necessary conditions:
  1. arg min_θ lim_{N→∞} 𝔼{V_N(θ, z)} = θ₀
  2. l.i.m._{N→∞} V_N(θ, z) = V*(θ),  with V*(θ) = lim_{N→∞} 𝔼{V_N(θ, z)}

• Additional conditions:
  3. Uniform convergence w.r.t. θ in a closed and bounded neighbourhood of θ₀
  4. Continuous cost function and derivative w.r.t. θ

Asymptotic covariance

Cramér-Rao Lower Bound

• Likelihood function f(z ∣ θ), with θ₀ the true parameters.

• For unbiased estimators: lower bound on the covariance of θ̂(z):

  cov(θ̂(z)) ⩾ Fi⁻¹(θ₀)

• with Fi the 'Fisher information matrix':

  Fi(θ₀) = 𝔼{ (∂ log f(z ∣ θ₀)/∂θ₀) (∂ log f(z ∣ θ₀)/∂θ₀)ᵀ }
         = −𝔼{ ∂² log f(z ∣ θ₀)/∂θ₀² }

• Valid for finite data sets (for unbiased estimators)

• Valid asymptotically (N → ∞) for consistent and asymptotically unbiased estimators
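As an added worked example (not on the original slide), for the linear-in-the-parameters Gaussian model of the earlier slides, y = Kθ + v with v ∼ 𝒩(0, C_v), the Fisher information follows directly from the log-likelihood:

  ∂ log f(y ∣ θ)/∂θ = Kᵀ C_v⁻¹ (y − Kθ) = Kᵀ C_v⁻¹ v

  Fi(θ₀) = 𝔼{ Kᵀ C_v⁻¹ v vᵀ C_v⁻¹ K } = Kᵀ C_v⁻¹ K

so cov(θ̂) ⩾ (Kᵀ C_v⁻¹ K)⁻¹, which coincides with the WLS covariance for W = C_v⁻¹; this is the sense in which that estimator is efficient (see the discussion below).
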
CRLB: discussion

• Fi depends on the sensitivity of the likelihood function w.r.t. the parameters.
  If the likelihood does not change much when θ changes (in some direction), there is a higher uncertainty on θ (in that direction).

• Requires the true parameter values θ₀

• In practice, approximated at the estimated value θ̂

• Typically too conservative (larger bounds exist)

• If equality holds, cov(θ̂(z)) = Fi⁻¹(θ₀): the estimator is called EFFICIENT

• Valid for the specific class (stochastic properties) of data set
  • Other (better) experiments can be designed to further reduce the uncertainty
    → Optimal Experiment Design, BUT this requires knowledge of the true system

• Example: WLS with W = C_v⁻¹ is efficient

Likelihood linear in the parameters with correlated noise
