Lecture 6: Estimation
• Prior readings
Outline and Motivations
The abstract statement of the problem that we want to solve is: given noisy measurements y that depend on an unknown vector x, compute an estimate x̂ of x.
The general approach developed in this course comprises three steps:
These assumptions are often acceptable and make life much simpler.
Prior (and complementary) readings
To prepare for the coming lectures, you absolutely need to read the following material (see the course web site):

Some explanations of this material will, however, be given during this and the subsequent lectures and exercise sessions.
Gaussian random variable (short reminder)
• Notion of a continuous random variable (crv): px(v) = ∂Fx(x)/∂x |x=v.
• We use the term probability density function (pdf) of a crv for px(·).
• Properties:
– Many practical applications: central limit theorem..., preservation of
“normality” by linear (affine) transformations...
– Characterization of pdf by the first 2 moments only...
Gaussian random processes
• xi ∼ (1/√(2πσi²)) exp(−(xi − x̄i)²/(2σi²)), where x̄i = E xi and σi² = E(xi − x̄i)²;
Gaussian random vectors
• denoted x ∼ N (x̄, Σ)
• x̄ ∈ Rn is the mean or expected value of x, i.e.,
x̄ = E x = ∫ v px(v) dv
• Σ ∈ Rn×n is the covariance matrix of x, i.e.,

Σ = E (x − x̄)(x − x̄)^T = E xx^T − x̄x̄^T = ∫ (v − x̄)(v − x̄)^T px(v) dv
[Figure: density of a scalar standard normal, px(v) = (1/√(2π)) e^{−v²/2}, plotted over −4 ≤ v ≤ 4.]
• mean and variance of the scalar random variable xi are

E xi = x̄i,  E(xi − x̄i)² = Σii

• E ‖x − x̄‖² = E Tr(x − x̄)(x − x̄)^T = Tr Σ = ∑_{i=1}^n Σii

(using Tr AB = Tr BA)
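As a quick numerical check of these moment identities, here is a minimal NumPy sketch (not part of the original slides; the particular x̄ and Σ are the example used a few slides below):

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

# draw many samples of x ~ N(xbar, Sigma)
X = rng.multivariate_normal(xbar, Sigma, size=100_000)

print(X.mean(axis=0))                            # ~ xbar
print(np.cov(X.T))                               # ~ Sigma
# E ||x - xbar||^2 = Tr(Sigma) = 2 + 1 = 3
print(np.mean(np.sum((X - xbar) ** 2, axis=1)))  # ~ 3.0
```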
Confidence ellipsoids

the α-confidence ellipsoid of x ∼ N(x̄, Σ) is Eα = { z | (z − x̄)^T Σ^{-1} (z − x̄) ≤ α }; for x ∈ R², E4.6 is (approximately) the 90% confidence ellipsoid
geometrically: x̄ gives the center of the ellipsoid, and its semiaxes are √(α λi) ui, where ui are the (orthonormal) eigenvectors of Σ with eigenvalues λi
example: x ∼ N(x̄, Σ) with x̄ = [2; 1], Σ = [2 1; 1 1]

• x1 has mean 2, std. dev. √2
• x2 has mean 1, std. dev. 1
• correlation coefficient between x1 and x2 is ρ = 1/√2
• E ‖x − x̄‖² = 3
[Figure: 100 samples of x, plotted with the 90% confidence ellipsoid E4.6; axes x1 (horizontal) and x2 (vertical).]
(here, 91 out of 100 fall in E4.6)
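A sketch reproducing this experiment numerically (assumes NumPy; the exact count varies with the random seed):

```python
import numpy as np

rng = np.random.default_rng(1)
xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(xbar, Sigma, size=100)   # 100 samples, as in the figure

# quadratic form (x - xbar)^T Sigma^{-1} (x - xbar) for every sample
d = np.einsum('ij,jk,ik->i', X - xbar, Sigma_inv, X - xbar)
print((d <= 4.6).sum(), "of 100 samples fall in E_4.6")  # about 90 on average
```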
Affine transformation

suppose x ∼ N(x̄, Σ) and

z = Ax + b,

where A ∈ Rm×n, b ∈ Rm
then z is Gaussian, with mean
E z = E(Ax + b) = A E x + b = Ax̄ + b
and covariance

Σz = E (z − E z)(z − E z)^T = A Σ A^T
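A numerical illustration of this rule (a sketch; the particular A, b, x̄, Σ below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
xbar = np.array([1.0, 0.0, -1.0])
Sigma = np.diag([1.0, 4.0, 0.25])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([1.0, 1.0])

X = rng.multivariate_normal(xbar, Sigma, size=200_000)
Z = X @ A.T + b                      # samples of z = A x + b

print(Z.mean(axis=0), A @ xbar + b)  # empirical vs. exact mean
print(np.cov(Z.T))                   # ~ A Sigma A^T
print(A @ Sigma @ A.T)
```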
examples: any subvector of x (take A to select rows of I) is Gaussian, and any linear combination c^T x of the components of x is Gaussian
suppose x ∼ N (x̄, Σ) and c ∈ Rn
then c^T x is a scalar Gaussian, with mean c^T x̄ and variance c^T Σ c; the unit-norm c minimizing this variance is the eigenvector un of Σ associated with the smallest eigenvalue λmin:

Σ un = λmin un,  ‖un‖ = 1

the standard deviation of un^T x is √λmin
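A small check of this eigenvalue characterization, using the example Σ from above (an illustrative sketch):

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

lam, U = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
u_min = U[:, 0]                  # unit eigenvector for lambda_min

# variance of u_min^T x is lambda_min, the smallest over all unit-norm c
print(u_min @ Sigma @ u_min, lam[0])   # equal
print(np.sqrt(lam[0]))                 # std. dev. of u_min^T x
```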
Degenerate Gaussian vectors
write Σ as
Σ = [Q+ Q0] [Σ+ 0; 0 0] [Q+ Q0]^T

where Q = [Q+ Q0] is orthogonal, Σ+ > 0
then Q^T x = [z^T w^T]^T, where z = Q+^T x ∼ N(Q+^T x̄, Σ+) is a nondegenerate Gaussian and w = Q0^T x = Q0^T x̄ is deterministic
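A sketch of this decomposition in NumPy (the rank-deficient Σ is constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal((3, 2))
Sigma = F @ F.T                    # rank 2, so x in R^3 is degenerate

lam, Q = np.linalg.eigh(Sigma)
Q0 = Q[:, lam <= 1e-10]            # eigenvectors with zero eigenvalue

xbar = np.array([1.0, 2.0, 3.0])
x = xbar + F @ rng.standard_normal(2)   # one sample of x ~ N(xbar, Sigma)

# w = Q0^T x is deterministic: it equals Q0^T xbar for every sample
print(Q0.T @ x)
print(Q0.T @ xbar)
```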
Linear measurements
y = Ax + v
• x ∈ Rn is what we want to estimate or measure
• y ∈ Rm is measurement
• A characterizes the sensors or measurements
• v is sensor noise
common assumptions:
• x ∼ N (x̄, Σx)
• v ∼ N (v̄, Σv )
• x and v are independent
thus

[x; v] ∼ N( [x̄; v̄], [Σx 0; 0 Σv] )

using

[x; y] = [I 0; A I] [x; v]

we can write

E [x; y] = [x̄; Ax̄ + v̄]

and

E ([x − x̄; y − ȳ] [x − x̄; y − ȳ]^T) = [I 0; A I] [Σx 0; 0 Σv] [I 0; A I]^T
= [Σx ΣxA^T; AΣx AΣxA^T + Σv]
covariance of measurement y is A Σx A^T + Σv
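These block formulas can be confirmed by simulation; a sketch (the dimensions and matrices are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 3
xbar = np.array([1.0, 1.0])
Sigma_x = np.diag([4.0, 0.25])
vbar = np.zeros(m)
Sigma_v = 0.1 * np.eye(m)
A = rng.standard_normal((m, n))

X = rng.multivariate_normal(xbar, Sigma_x, size=200_000)
V = rng.multivariate_normal(vbar, Sigma_v, size=200_000)
Y = X @ A.T + V                      # y = A x + v

C = np.cov(np.hstack([X, Y]).T)      # empirical joint covariance of (x, y)
print(C[:n, n:])                     # ~ Sigma_x A^T
print(C[n:, n:])                     # ~ A Sigma_x A^T + Sigma_v
```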
Minimum mean-square estimation

we seek an estimator x̂ = φ(y) of x, given the measurement y; a natural measure of quality is the mean-square error

E ‖φ(y) − x‖²

the minimum mean-square estimator (MMSE) φmmse minimizes this quantity; in general, φmmse(y) = E(x | y)
MMSE for Gaussian vectors

now suppose x and y are jointly Gaussian; then the conditional distribution of x given y is Gaussian, x | y ∼ N(w, Λ), where

Λ = Σx − Σxy Σy^{-1} Σxy^T,  w = x̄ + Σxy Σy^{-1} (y − ȳ)

hence the MMSE estimator is x̂ = φmmse(y) = E(x | y) = x̄ + Σxy Σy^{-1} (y − ȳ)
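These formulas translate directly into code; a minimal sketch (the function name and interface are my own):

```python
import numpy as np

def mmse_gauss(xbar, ybar, Sigma_x, Sigma_y, Sigma_xy, y):
    """Conditional mean and covariance of x given y, for jointly Gaussian (x, y)."""
    K = Sigma_xy @ np.linalg.inv(Sigma_y)   # gain Sigma_xy Sigma_y^{-1}
    w = xbar + K @ (y - ybar)               # MMSE estimate E(x | y)
    Lam = Sigma_x - K @ Sigma_xy.T          # error covariance
    return w, Lam
```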
φmmse is an affine function of y

note that

Λ = Σx − Σxy Σy^{-1} Σxy^T ≤ Σx

i.e., the covariance of the estimation error is never larger than the prior covariance of x
Best linear unbiased estimator
estimator
x̂ = φblu(y) = x̄ + Σxy Σy^{-1} (y − ȳ)
makes sense when x, y aren’t jointly Gaussian
this estimator
• is unbiased, i.e., E x̂ = E x
• often works well
• is widely used
• has minimum mean square error among all affine estimators
MMSE with linear measurements
consider the specific case y = Ax + v with

x ∼ N(x̄, Σx),  v ∼ N(v̄, Σv),

x, v independent

MMSE of x given y is the affine function

x̂ = x̄ + B(y − ȳ)

where B = Σx A^T (A Σx A^T + Σv)^{-1} and ȳ = A x̄ + v̄
• x̄ is our best prior guess of x, before the measurement
• y − ȳ is the discrepancy between what we actually measure (y) and its expected value (ȳ)
• estimator modifies the prior guess by B times this discrepancy
MMSE error with linear measurements
the covariance of the estimation error is

Σest = E (x̂ − x)(x̂ − x)^T = Σx − Σx A^T (A Σx A^T + Σv)^{-1} A Σx

to evaluate Σest, we only need to know A, Σx, and Σv; we do not need to know the measurement y (or the means x̄, v̄)
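Putting the estimator and its error covariance together (a sketch; the function name and interface are my own):

```python
import numpy as np

def mmse_linear(xbar, vbar, Sigma_x, Sigma_v, A, y):
    """MMSE estimate of x and error covariance for y = A x + v."""
    ybar = A @ xbar + vbar
    S = A @ Sigma_x @ A.T + Sigma_v          # covariance of y
    B = Sigma_x @ A.T @ np.linalg.inv(S)     # estimator gain
    x_hat = xbar + B @ (y - ybar)
    Sigma_est = Sigma_x - B @ A @ Sigma_x    # independent of the observed y
    return x_hat, Sigma_est
```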
Information matrix formulas

the estimation error covariance can also be written as

Σest = (Σx^{-1} + A^T Σv^{-1} A)^{-1}

• n × n inverse instead of m × m
• Σx^{-1} and Σv^{-1} are sometimes called information matrices
• the corresponding expression for the gain is B = Σest A^T Σv^{-1}
can interpret Σest^{-1} = Σx^{-1} + A^T Σv^{-1} A as: the information in the estimate equals the prior information about x plus the information added by the measurement
proof: we must show that

Σx A^T (A Σx A^T + Σv)^{-1} = (A^T Σv^{-1} A + Σx^{-1})^{-1} A^T Σv^{-1}

multiply on the left by (A^T Σv^{-1} A + Σx^{-1}) and on the right by (A Σx A^T + Σv), to get the equivalent identity

(A^T Σv^{-1} A + Σx^{-1}) Σx A^T = A^T Σv^{-1} (A Σx A^T + Σv)

which is true (both sides expand to A^T Σv^{-1} A Σx A^T + A^T)
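The identity is also easy to verify numerically on a random instance (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 5
A = rng.standard_normal((m, n))
Sigma_x = np.eye(n)
Sigma_v = 0.5 * np.eye(m)

S = A @ Sigma_x @ A.T + Sigma_v
Sigma_est = Sigma_x - Sigma_x @ A.T @ np.linalg.inv(S) @ A @ Sigma_x
Sigma_est_info = np.linalg.inv(np.linalg.inv(Sigma_x)
                               + A.T @ np.linalg.inv(Sigma_v) @ A)
print(np.allclose(Sigma_est, Sigma_est_info))   # True

B = Sigma_x @ A.T @ np.linalg.inv(S)
print(np.allclose(B, Sigma_est_info @ A.T @ np.linalg.inv(Sigma_v)))  # True
```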
Relation to regularized least-squares
suppose x̄ = 0, v̄ = 0, Σx = α²I, Σv = β²I

estimator is x̂ = By where

B = (A^T Σv^{-1} A + Σx^{-1})^{-1} A^T Σv^{-1} = (A^T A + (β/α)² I)^{-1} A^T

i.e., x̂ is the solution of the regularized least-squares problem: minimize ‖Az − y‖² + (β/α)² ‖z‖² over z
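A sketch checking this equivalence against a direct regularized least-squares solve (random data, made-up α and β):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
alpha, beta = 2.0, 0.5
mu = (beta / alpha) ** 2

x_hat = np.linalg.inv(A.T @ A + mu * np.eye(n)) @ A.T @ y

# minimize ||A z - y||^2 + mu ||z||^2 via the equivalent stacked least-squares
A_aug = np.vstack([A, np.sqrt(mu) * np.eye(n)])
y_aug = np.concatenate([y, np.zeros(n)])
x_ls, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
print(np.allclose(x_hat, x_ls))   # True
```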
Example
navigation using range measurements to distant beacons
y = Ax + v
• x ∈ R2 is location
• yi is range measurement to ith beacon
• vi is range measurement error, IID N (0, 1)
• ith row of A is unit vector in direction of ith beacon
prior distribution:
x ∼ N(x̄, Σx),  x̄ = [1; 1],  Σx = [2² 0; 0 0.5²]
90% confidence ellipsoid for the prior distribution, { x | (x − x̄)^T Σx^{-1} (x − x̄) ≤ 4.6 }:
[Figure: the prior 90% confidence ellipsoid, centered at x̄; axes x1, x2.]
Case 1: one measurement, with beacon at angle 30°
90% confidence ellipsoid for the estimate x̂ (and 90% confidence ellipsoid for x):
[Figure: 90% confidence ellipsoids for x̂ and for x; axes x1, x2.]
interpretation: the measurement reduces uncertainty along the direction of the beacon; uncertainty in the orthogonal direction is essentially unchanged
Case 2: 4 measurements, with beacon angles 80°, 85°, 90°, 95°
90% confidence ellipsoid for the estimate x̂ (and 90% confidence ellipsoid for x):
[Figure: 90% confidence ellipsoids for x̂ and for x; axes x1, x2.]
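The difference between the two cases can be reproduced with the information-matrix formula; a sketch (Σv = I since the vi are IID N(0, 1); the helper name is my own):

```python
import numpy as np

def beacon_matrix(angles_deg):
    """Rows are unit vectors pointing toward the beacons."""
    th = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.column_stack([np.cos(th), np.sin(th)])

Sigma_x = np.diag([2.0 ** 2, 0.5 ** 2])     # prior covariance
Sigma_x_inv = np.linalg.inv(Sigma_x)

for angles in ([30.0], [80.0, 85.0, 90.0, 95.0]):
    A = beacon_matrix(angles)
    # Sigma_est = (Sigma_x^{-1} + A^T Sigma_v^{-1} A)^{-1} with Sigma_v = I
    Sigma_est = np.linalg.inv(Sigma_x_inv + A.T @ A)
    print(angles, np.sqrt(np.diag(Sigma_est)))   # posterior std. devs. of x1, x2
```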