Parametric Identification
John Lataire
Estimators and their properties
Estimators

[Diagram: a data-generating system 𝒮 produces the data; an estimator $f$ maps the data to the parameter estimate $\hat\theta$, based on a (stochastic) model of the system and the disturbance structure.]
Properties of estimators

• Co-variance: $C_\theta = \mathbb{E}\left\{\left(\hat\theta - \mathbb{E}\{\hat\theta\}\right)\left(\hat\theta - \mathbb{E}\{\hat\theta\}\right)^T\right\}$
• Consistency: $\operatorname{plim}_{N\to\infty} \hat\theta(N) = \theta_0$
Consistency, other definitions

• Strongly consistent: $\mathrm{Prob}\left(\lim_{N\to\infty} \hat\theta(N) = \theta_0\right) = 1$
• Weakly consistent: $\operatorname{plim}_{N\to\infty} \hat\theta(N) = \theta_0$
[Figure: two plots of the probability density function $f_{\hat\theta}(\hat\theta - \theta_0)$ of the estimation error versus $\hat\theta - \theta_0$, for $N = 2$ and for a larger $N$; as $N$ grows the density concentrates around $\hat\theta - \theta_0 = 0$.]
Law of large numbers

$\frac{1}{N}\sum_{t=1}^{N} x(t) \;\xrightarrow{N\to\infty}\; \mathbb{E}\{x(t)\}$

Sample averages converge to expected values; this is used below to evaluate the asymptotic ($N \to \infty$) behaviour of estimators.
Parametric LTI model classes
Models with rational transfer functions

Common cases:

• ARX: $A\,y = B\,u + e$
• ARMAX: $A\,y = B\,u + C\,e$
• Output Error (OE): $y = \frac{B}{F}\,u + e$
• Box-Jenkins (BJ): $y = \frac{B}{F}\,u + \frac{C}{D}\,e$
Models with rational transfer functions

General structure, with $A, B, C, D, F$ polynomials in the delay operator (the cases above are special cases):

$y(t) = \frac{B}{AF}\,u(t) + \frac{C}{AD}\,e(t)$

[Diagram: the input $u$ filtered by $\frac{B}{AF}$ and the noise $e$ filtered by $\frac{C}{AD}$ are summed to form the output $y$.]
Linear Least Squares
Linear Least Squares

• Model assumptions: $y = H\theta_0 + v$
• Linear Least Squares estimator:

$\hat\theta_{LS} \equiv \arg\min_\theta \tfrac{1}{2}\lVert y - H\theta\rVert_2^2 = (H^T H)^{-1}H^T y$

• Model output: $\hat y = H\hat\theta_{LS} = H(H^T H)^{-1}H^T y$
Derivation Linear Least Squares

$\hat\theta \equiv \arg\min_\theta \tfrac{1}{2}\lVert y - H\theta\rVert_2^2 = \arg\min_\theta \tfrac{1}{2}e^T e, \qquad e \equiv y - H\theta$

$\frac{1}{2}\frac{d}{d\theta}\left(e^T e\right) = \frac{1}{2}\left(\frac{de^T}{d\theta}e + e^T\frac{de}{d\theta}\right) = \frac{de^T}{d\theta}e = -H^T\left(y - H\hat\theta\right) \equiv 0$

$\Rightarrow\; H^T y = H^T H\hat\theta \;\Rightarrow\; \hat\theta = (H^T H)^{-1}H^T y$
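A minimal numerical sketch of this closed-form solution (illustrative data and noise level, `numpy` only). The normal-equation solve is shown next to `np.linalg.lstsq`, which computes the same estimate but avoids forming $H^T H$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: y = H @ theta0 + v with a known 100x3 regression matrix.
N, n_theta = 100, 3
H = rng.standard_normal((N, n_theta))
theta0 = np.array([1.0, -0.5, 2.0])
y = H @ theta0 + 0.1 * rng.standard_normal(N)

# Normal equations: theta_hat = (H^T H)^{-1} H^T y
theta_hat = np.linalg.solve(H.T @ H, H.T @ y)

# Same estimate via a least-squares solver (numerically preferable)
theta_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)

print(theta_hat, theta_lstsq)  # both close to theta0
```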
LLS properties, noiseless regression matrix

• $\hat\theta_{LS} = \theta_0 + (H^T H)^{-1}H^T v$
• If $\mathbb{E}\{v\} = 0$: unbiased, $\mathbb{E}\{\hat\theta_{LS}\} = \theta_0$
• $\mathrm{cov}\left(\hat\theta_{LS}\right) = (H^T H)^{-1}H^T C_v H (H^T H)^{-1}$, which reduces to $\sigma_v^2 (H^T H)^{-1}$ for white noise $C_v = \sigma_v^2 I$
LLS: example (noiseless regression matrix)

Polynomial model: $\hat\theta_{LS} = (H^T H)^{-1}(H^T y)$ with

$H = \begin{bmatrix} \vdots & \vdots & & \vdots \\ 1 & u(k) & \cdots & u(k)^{\,n_\theta - 1} \\ \vdots & \vdots & & \vdots \end{bmatrix}$
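A sketch of this polynomial example (hypothetical coefficients and input range); `np.vander` builds exactly the regression matrix above, with columns $1, u(k), \dots, u(k)^{n_\theta - 1}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical polynomial y = 0.5 + u - 0.3 u^2 + 0.1 u^3 with noisy output
# but noiseless (exactly known) regressors u(k).
n_theta = 4
theta0 = np.array([0.5, 1.0, -0.3, 0.1])
u = rng.uniform(-2, 4, size=200)
H = np.vander(u, n_theta, increasing=True)   # rows [1, u(k), ..., u(k)^(n_theta-1)]
y = H @ theta0 + 0.1 * rng.standard_normal(u.size)

theta_ls = np.linalg.lstsq(H, y, rcond=None)[0]
print(theta_ls)  # close to theta0: with a noiseless H, LLS is unbiased
```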
LLS: example ctd.

[Figure: fitted polynomial versus $u$ (top row) and RMS error in dB versus $u$ (bottom row), for noise levels $\sigma_y = 0$, $0.1$ and $0.5$; the RMS error increases with the noise level.]
LLS properties, noisy regression matrix

Case 1: regression matrix is approximately known: $H = H_0 + \Delta H$, and $y = H_0\theta + v$

• $\hat\theta_{LS} = (H^T H)^{-1}(H^T H_0)\theta + (H^T H)^{-1}(H^T v)$
• Since $\mathbb{E}\{H^T H\} \neq \mathbb{E}\{H^T H_0\}$ in general, the estimate is biased (errors-in-variables problem).
LLS properties, noisy regression matrix

Case 2: regression matrix is known, but correlated with the noise:

$y = H\theta_0 + v \quad\text{with}\quad C_{Hv} = \mathbb{E}\{H^T v\} \neq 0$

• This gives:

$\hat\theta = (H^T H)^{-1}H^T y = (H^T H)^{-1}H^T(H\theta_0 + v) = \theta_0 + (H^T H)^{-1}H^T v$

• For $N \to \infty$ (law of large numbers):

$\hat\theta_{N\to\infty} = \theta_0 + \left(\mathbb{E}\{H^T H\}\right)^{-1}\mathbb{E}\{H^T v\} = \theta_0 + \left(\mathbb{E}\{H^T H\}\right)^{-1}C_{Hv}$

• INCONSISTENT!
• Compensate: $\hat\theta_{BCLS} = (H^T H)^{-1}\left(H^T y - C_{Hv}\right)$
• Difficult: $C_{Hv}$ needs to be known
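A small Monte-Carlo sketch of this compensation on a made-up scalar example, where the regressor $y(t-1)$ is correlated with the noise $e(t) + e(t-1)$ and $C_{Hv} = N\sigma_e^2$ happens to be known analytically:

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar example: y(t) = a1*y(t-1) + e(t) + e(t-1), regressor H(t) = y(t-1).
# E{y(t-1)*(e(t) + e(t-1))} = sigma_e^2 per sample, so C_Hv = N * sigma_e^2.
a1, sigma_e, N = 0.7, 0.5, 200_000
e = sigma_e * rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for t in range(1, N + 1):
    y[t] = a1 * y[t - 1] + e[t] + e[t - 1]

H, z = y[:-1], y[1:]                   # regressor y(t-1) and output y(t)
C_Hv = N * sigma_e**2                  # assumed known, which is the hard part

a_ls = (H @ z) / (H @ H)               # plain LLS: inconsistent
a_bcls = (H @ z - C_Hv) / (H @ H)      # bias-compensated LLS
print(a_ls, a_bcls)                    # a_ls biased away from 0.7, a_bcls close to it
```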
LLS regression of FIR model

• FIR model: $y(t) = \sum_{n=0}^{R} h_n u(t-n) + v(t)$
• Consistent estimate: the regressors are the known, noise-free input samples, so $C_{Hv} = \mathbb{E}\{H^T v\} = 0$ (see the first-order sketch below).
Example: FIR, 1st order
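A minimal sketch of such a first-order FIR estimate (illustrative impulse response and noise level):

```python
import numpy as np

rng = np.random.default_rng(3)

# First-order FIR example: y(t) = h0*u(t) + h1*u(t-1) + v(t).
h_true = np.array([1.0, 0.5])
N = 10_000
u = rng.standard_normal(N)
y = np.convolve(u, h_true)[:N] + 0.3 * rng.standard_normal(N)

# Regression matrix with rows [u(t), u(t-1)] (u = 0 for t < 0)
H = np.column_stack([u, np.concatenate([[0.0], u[:-1]])])

h_ls = np.linalg.lstsq(H, y, rcond=None)[0]
print(h_ls)  # consistent: approaches h_true as N grows, since E{H^T v} = 0
```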
LLS for ARMAX

With regressor $\phi(t) = [\,y(t-1)\;\; u(t)\,]$ and noise $v(t) = e(t) + e(t-1)$:

$\mathbb{E}\{\phi^T v\} = \mathbb{E}\left\{\begin{bmatrix} y(t-1) \\ u(t) \end{bmatrix}\left(e(t) + e(t-1)\right)\right\} \neq 0$
LLS ARMAX example

Consider $y(t) = b_0 u(t) + a_1 y(t-1) + e(t) + e(t-1)$, with $u$ and $e$ mutually independent white-noise sequences, $\phi(t) = [\,y(t-1)\;\; u(t)\,]$ and $\theta_0 = [\,a_1\;\; b_0\,]^T$.
LLS ARMAX example ctd.

$\mathbb{E}\{\phi^T \phi\} = \begin{bmatrix} \mathbb{E}\{y^2(t-1)\} & \mathbb{E}\{y(t-1)u(t)\} \\ \mathbb{E}\{u(t)y(t-1)\} & \mathbb{E}\{u^2(t)\} \end{bmatrix}, \quad\text{with}\quad \mathbb{E}\{u^2(t)\} = \sigma_u^2, \quad \mathbb{E}\{u(t)y(t-1)\} = 0$

$\mathbb{E}\{y^2(t-1)\} = b_0^2\sigma_u^2 + a_1^2\,\mathbb{E}\{y^2(t-2)\} + 2\sigma_e^2 + 2a_1\underbrace{\mathbb{E}\{y(t-2)e(t-2)\}}_{=\sigma_e^2} + \underbrace{\mathbb{E}\{b_0 u(t-1)(e(t-1) + e(t-2)) + a_1 y(t-2)e(t-1)\}}_{=0}$

$\mathbb{E}\{y(t-2)e(t-2)\} = \underbrace{\mathbb{E}\{b_0 u(t-2)e(t-2)\}}_{=0} + \underbrace{\mathbb{E}\{a_1 y(t-3)e(t-2)\}}_{=0} + \underbrace{\mathbb{E}\{e^2(t-2)\}}_{=\sigma_e^2} + \underbrace{\mathbb{E}\{e(t-3)e(t-2)\}}_{=0}$

Use stationarity: $\forall \tau \in \mathbb{Z}: \mathbb{E}\{y^2(t)\} = \mathbb{E}\{y^2(t-\tau)\} \;\rightarrow\; \mathbb{E}\{y^2(t-1)\} = \frac{b_0^2\sigma_u^2 + \sigma_e^2(2 + 2a_1)}{1 - a_1^2}$

$\mathbb{E}\{\phi^T y\} = \mathbb{E}\{\phi^T\phi\}\,\theta_0 + \mathbb{E}\left\{\begin{bmatrix} y(t-1) \\ u(t) \end{bmatrix}\left(e(t) + e(t-1)\right)\right\} = \mathbb{E}\{\phi^T\phi\}\,\theta_0 + \begin{bmatrix} \sigma_e^2 \\ 0 \end{bmatrix} \;\rightarrow\; \text{results in an asymptotic error} \;\rightarrow\; \text{inconsistent!}$
LLS ARMAX example

• $y(t) = u(t) - 0.7\,y(t-1) + e(t) + e(t-1)$
• $\sigma_e = 0.6$, $\sigma_u = 1$
• $\theta_{as.\,error} = \begin{bmatrix} 0.151 \\ 0 \end{bmatrix}$

[Figure: LLS parameter estimates versus record length $N$ ($10^1$ to $10^4$); the estimate of the $y(t-1)$ coefficient settles at the asymptotic error of $0.151$.]
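A simulation sketch reproducing this asymptotic error (same system and noise levels as on the slide):

```python
import numpy as np

rng = np.random.default_rng(4)

# y(t) = u(t) - 0.7*y(t-1) + e(t) + e(t-1), sigma_e = 0.6, sigma_u = 1.
# Expected LLS error on the y(t-1) coefficient: sigma_e^2 / E{y^2} ~= 0.151.
b0, a1, sigma_e, N = 1.0, -0.7, 0.6, 500_000
u = rng.standard_normal(N)
e = sigma_e * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = b0 * u[t] + a1 * y[t - 1] + e[t] + e[t - 1]

Phi = np.column_stack([y[:-1], u[1:]])                   # phi(t) = [y(t-1), u(t)]
theta_ls = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(theta_ls)  # ~ [-0.549, 1.0]: off by ~0.151 on a1, matching the slide
```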
More on LLS

• For $N \to \infty$: $\hat\theta_{N\to\infty} = \left(\mathbb{E}\{\phi^T\phi\}\right)^{-1}\mathbb{E}\{\phi^T y\}$
• This exists if $\mathbb{E}\{\phi^T\phi\}$ is non-singular. Persistency of excitation?
Linear Least Squares: examples

• FIR estimation
Instrumental Variables
Instrumental variables

• Linear-in-the-parameters model: $y = \phi\theta + v$
• Problem when $\mathbb{E}\{\phi^T v\} = C_{\phi v} \neq 0$
• Replace $\phi^T$ by $\zeta^T$, where $\zeta$ is
  • correlated with the regression matrix: $\mathbb{E}\{\zeta^T\phi\}$ is regular
  • uncorrelated with the noise: $\mathbb{E}\{\zeta^T v\} = 0$

$\hat\theta_{IV} = (\zeta^T\phi)^{-1}(\zeta^T y)$

$\hat\theta_{IV,\,N\to\infty} = \mathbb{E}\{\zeta^T\phi\}^{-1}\left(\mathbb{E}\{\zeta^T\phi\}\,\theta_0 + \mathbb{E}\{\zeta^T v\}\right) = \theta_0$
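The estimator itself is a one-liner; a generic sketch (helper name is mine):

```python
import numpy as np

def iv_estimate(Phi, Zeta, y):
    """Instrumental-variables estimate: theta = (Zeta^T Phi)^{-1} Zeta^T y.

    Phi  : (N, p) regression matrix (may be correlated with the noise)
    Zeta : (N, p) instruments (correlated with Phi, uncorrelated with the noise)
    """
    return np.linalg.solve(Zeta.T @ Phi, Zeta.T @ y)
```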
Example Instrumental variables

System: $y(t) = u(t) - \frac{y(t-1)}{2} + e(t) - e(t-1)$

Regression matrix: $\phi(t) = [\,u(t)\;\; y(t-1)\,]$

Instruments: $\zeta(t) = [\,u(t)\;\; x(t-1)\,]$ with $x(t) = u(t) + \hat a_1 x(t-1)$, where $\hat a_1$ is optimised iteratively, e.g. starting from LLS.

$\mathbb{E}\{\zeta^T\phi(t)\} = \begin{bmatrix} \mathbb{E}\{u^2(t)\} & \mathbb{E}\{u(t)y(t-1)\} \\ \mathbb{E}\{x(t-1)u(t)\} & \mathbb{E}\{x(t-1)y(t-1)\} \end{bmatrix} = \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & \frac{4}{3}\sigma_u^2 \end{bmatrix} \quad\text{OK, regular!}$

$\mathbb{E}\{x(t-1)y(t-1)\} = \mathbb{E}\left\{\left(u(t-1) + \hat a_1 x(t-2)\right)\left(u(t-1) - \frac{y(t-2)}{2} + e(t-1) - e(t-2)\right)\right\}$
$= \sigma_u^2 - \frac{\hat a_1}{2}\,\mathbb{E}\{x(t-2)y(t-2)\} \quad\text{(use stationarity)}$
$= \frac{\sigma_u^2}{1 + \frac{\hat a_1}{2}} \;\rightarrow\; \frac{\sigma_u^2}{1 - \frac{1}{4}} = \frac{4}{3}\sigma_u^2 \quad\text{(use } \hat a_1 \to a_1 = -\tfrac{1}{2}\text{)}$
IV ARMAX example

• $y(t) = 2u(t) + 0.8\,y(t-1) + e(t) - \frac{1}{2}e(t-1)$
• $\sigma_e = 2$, $\sigma_u = 1$

[Figure: LLS and IV parameter estimates versus record length $N$ ($10^1$ to $10^4$), together with the true values; the IV estimates converge to the true parameters while the LLS estimates remain biased.]
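A simulation sketch of this comparison, with the instrument built as on the previous slide ($x(t) = u(t) + \hat a_1 x(t-1)$, $\hat a_1$ taken here from an initial LLS fit):

```python
import numpy as np

rng = np.random.default_rng(5)

# y(t) = 2*u(t) + 0.8*y(t-1) + e(t) - 0.5*e(t-1), sigma_e = 2, sigma_u = 1.
b0, a1, sigma_e, N = 2.0, 0.8, 2.0, 200_000
u = rng.standard_normal(N)
e = sigma_e * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = b0 * u[t] + a1 * y[t - 1] + e[t] - 0.5 * e[t - 1]

Phi = np.column_stack([u[1:], y[:-1]])                    # phi(t) = [u(t), y(t-1)]
theta_ls = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]     # inconsistent

# Instrument: noise-free simulated output x(t) = u(t) + a1_hat * x(t-1),
# with a1_hat taken from the (biased) LLS estimate as a starting point.
a1_hat = theta_ls[1]
x = np.zeros(N)
for t in range(1, N):
    x[t] = u[t] + a1_hat * x[t - 1]

Zeta = np.column_stack([u[1:], x[:-1]])                   # zeta(t) = [u(t), x(t-1)]
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ y[1:])
print(theta_ls, theta_iv)  # theta_iv ~ [2.0, 0.8]; theta_ls stays biased
```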
Maximum Likelihood
Maximum Likelihood definition

• Denote the data by $z$ (e.g. $z(t) = (u(t), y(t))$)
• Maximum Likelihood estimator: $\hat\theta_{ML} = \arg\max_\theta f(z \mid \theta)$, with $f(z \mid \theta)$ the likelihood of the observed data given the parameters
ML for linear in the parameters, white noise

• $y(t) = K\theta + v(t)$, with $v \sim \mathcal{N}(0, I\sigma_v^2)$ (white noise)

$f(v) = \frac{1}{\left(\sqrt{2\pi\sigma_v^2}\right)^N}\, e^{-\frac{1}{2}\sum_t \frac{v^2(t)}{\sigma_v^2}}$

$\log f(y \mid \theta) = -\frac{N}{2}\log\left(2\pi\sigma_v^2\right) - \frac{1}{2}\sum_t \frac{v^2(t)}{\sigma_v^2} \quad\text{with } v(t) = y(t) - K(t)\theta$

$\arg\max_\theta \log f(y \mid \theta) = \arg\min_\theta \frac{1}{2}\sum_t \frac{\left(y(t) - K(t)\theta\right)^2}{\sigma_v^2}$

→ for Gaussian white noise, ML coincides with linear least squares.
Correlated noise, linear in the parameters

• $y(t) = K\theta + v(t)$, with $v \sim \mathcal{N}(0, C_v)$ (correlated noise)

$f(v) = \frac{1}{\sqrt{(2\pi)^N\,\lvert C_v\rvert}}\, e^{-\frac{1}{2}v^T C_v^{-1} v}$

$\log f(y \mid \theta) = -\frac{1}{2}\log\left((2\pi)^N\lvert C_v\rvert\right) - \frac{1}{2}v^T C_v^{-1} v \quad\text{with } v = y - K\theta$

$\arg\max_\theta \log f(y \mid \theta) = \arg\min_\theta \frac{1}{2}(y - K\theta)^T C_v^{-1}(y - K\theta)$

→ a weighted least squares criterion with weight $W = C_v^{-1}$.
Weighted linear least squares

$V_{WLS} = \frac{1}{2}(y - H\theta)^T W (y - H\theta)$

$\frac{d}{d\theta}V_{WLS} = -H^T W\left(y - H\hat\theta\right) = 0$
$\Rightarrow\; H^T W H\hat\theta = H^T W y$
$\Rightarrow\; \hat\theta_{WLS} = (H^T W H)^{-1}H^T W y$

$\mathrm{cov}\left(\hat\theta_{WLS}\right) = (H^T W H)^{-1}\left(H^T W C_v W H\right)(H^T W H)^{-1}$

For $W = \lambda C_v^{-1}$: $\mathrm{cov}\left(\hat\theta_{WLS}\right) = (H^T C_v^{-1} H)^{-1}$
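A WLS sketch with a made-up moving-average noise covariance, checking the optimal $W = C_v^{-1}$ case against its covariance formula:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative setup: y = H @ theta0 + v with correlated noise v = L @ w, w white,
# so Cv = L @ L.T (a first-order moving-average covariance here).
N, n_theta = 200, 2
H = rng.standard_normal((N, n_theta))
theta0 = np.array([1.0, -2.0])
L = np.eye(N) + 0.8 * np.eye(N, k=-1)
Cv = L @ L.T
y = H @ theta0 + L @ rng.standard_normal(N)

W = np.linalg.inv(Cv)                                  # optimal weighting W = Cv^{-1}
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
cov_wls = np.linalg.inv(H.T @ W @ H)                   # (H^T Cv^{-1} H)^{-1}
print(theta_wls, np.sqrt(np.diag(cov_wls)))            # estimate and its std. devs.
```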
Maximum Likelihood example

Measure the voltage $v_m(k) = R\,i + n_v(k)$ across, and the current $i_m(k) = i + n_i(k)$ through, a resistor $R$ carrying a constant current $i$; $n_v$ and $n_i$ are white Gaussian noise with variances $\sigma_v^2$ and $\sigma_i^2$.

• Log likelihood:

$V_{\log ML}(R, i, z) = -\frac{1}{2}\sum_k \frac{n_v^2(k)}{\sigma_v^2} - \frac{1}{2}\sum_k \frac{n_i^2(k)}{\sigma_i^2} = -\frac{1}{2}\sum_k \frac{\left(v_m(k) - R\,i\right)^2}{\sigma_v^2} - \frac{1}{2}\sum_k \frac{\left(i_m(k) - i\right)^2}{\sigma_i^2}$
Maximum Likelihood example ctd.

$\frac{\partial V_{ML}(R, i, z)}{\partial R} = 0 \;\Rightarrow\; \frac{i}{\sigma_v^2}\sum_{k=1}^N\left(v_m(k) - R\,i\right) = 0$

$\frac{\partial V_{ML}(R, i, z)}{\partial i} = 0 \;\Rightarrow\; \frac{R}{\sigma_v^2}\sum_{k=1}^N\left(v_m(k) - R\,i\right) + \frac{1}{\sigma_i^2}\sum_{k=1}^N\left(i_m(k) - i\right) = 0$

The first condition gives $\hat R\,\hat i = \frac{1}{N}\sum_{k=1}^N v_m(k)$; substituting it in the second makes its first term vanish, leaving $\hat i = \frac{1}{N}\sum_{k=1}^N i_m(k)$, so that

$\hat R = \frac{\frac{1}{N}\sum_{k=1}^N v_m(k)}{\frac{1}{N}\sum_{k=1}^N i_m(k)}$
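A sketch of the resulting estimator on made-up numbers (true $R$ and $i$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)

# Constant current i0 through resistance R0; N noisy voltage/current samples.
R0, i0, N = 1000.0, 0.01, 1000
sigma_v, sigma_i = 1.0, 0.001
v_m = R0 * i0 + sigma_v * rng.standard_normal(N)
i_m = i0 + sigma_i * rng.standard_normal(N)

# The ML estimate reduces to the ratio of the sample means:
R_hat = v_m.mean() / i_m.mean()
print(R_hat)  # close to R0
```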
Consistency and uncertainty
Consistency (based on cost function)

• Necessary conditions:
1. $\arg\min_\theta \lim_{N\to\infty} \mathbb{E}\{V_N(\theta, z)\} = \theta_0$
2. $\mathrm{l.i.m.}_{N\to\infty}\, V_N(\theta, z) = V_*(\theta)$ with $V_*(\theta) = \lim_{N\to\infty} \mathbb{E}\{V_N(\theta, z)\}$
• Additional conditions:
3. Uniform convergence w.r.t. $\theta$ in a closed and bounded neighbourhood of $\theta_0$
4. Continuous cost function and derivative w.r.t. $\theta$
Asymptotic covariance
Cramer-Rao Lower Bound

• Likelihood function $f(z \mid \theta)$; let $\theta_0$ be the true parameters.
• For unbiased estimators, a lower bound on the covariance of $\hat\theta(z)$:

$\mathrm{cov}\left(\hat\theta(z)\right) \geqslant F_i^{-1}(\theta_0)$

• with $F_i$ the 'Fisher information matrix':

$F_i(\theta_0) = \mathbb{E}\left\{\left(\frac{\partial \log f(z \mid \theta_0)}{\partial \theta_0}\right)\left(\frac{\partial \log f(z \mid \theta_0)}{\partial \theta_0}\right)^T\right\}$

• If equality holds, $\mathrm{cov}(\hat\theta(z)) = F_i^{-1}(\theta_0)$: the estimator is called EFFICIENT
• Valid for the specific class (stochastic properties) of the data set
• Other (better) experiments can be designed to further reduce the uncertainty
  → Optimal Experiment Design, BUT requires knowledge of the true system
Likelihood linear in the parameters with correlated noise
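For this case the bound can be written out in closed form. A short derivation sketch, assuming the model $y = K\theta + v$ with $v \sim \mathcal{N}(0, C_v)$ from the ML slides above:

```latex
% Score (gradient of the log-likelihood) for y = K*theta + v, v ~ N(0, Cv):
\frac{\partial \log f(y \mid \theta)}{\partial \theta}
    = K^T C_v^{-1} (y - K\theta)
    \;\stackrel{\theta = \theta_0}{=}\; K^T C_v^{-1} v
% Fisher information matrix, using E{v v^T} = C_v:
F_i(\theta_0)
    = \mathbb{E}\left\{ K^T C_v^{-1} v \, v^T C_v^{-1} K \right\}
    = K^T C_v^{-1} K
% Hence cov(theta_hat) >= (K^T Cv^{-1} K)^{-1}, which is exactly the covariance
% of the WLS estimator with W = Cv^{-1}: that estimator is efficient.
```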