
Adapted from EE363: Linear Dynamical Systems, Winter 2005-2006, Stephen Boyd, Stanford University

Lecture 6
Estimation

• Outline and Motivations

• Prior readings

• Gaussian random vectors

• minimum mean-square estimation (MMSE)

• MMSE with linear measurements

• relation to least-squares, pseudo-inverse

6–1
Outline and Motivations
The abstract statement of the problem that we want to solve is:

Given a model of a system y = f(x) and some measurements of y corrupted by noise, determine a good estimate of x.

This problem covers a huge number of engineering applications, e.g.:

• System identification: determine the values of system parameters (masses, spring constants, resistances, volumes) from elementary measurements on the system (positions, speeds, currents, voltages).

• State estimation: determine the internal state of a system (position, speed, voltages, temperature) from external measurements (GPS signals, surface temperatures, terminal voltages and currents).

• Time series forecasting: given past measurements, determine likely future values.

Estimation 6–2
The general approach developed in this course comprises three steps:

• Model the quantities of interest as random variables x, y

• Determine the joint probability distribution p(x, y) from prior knowledge about the problem

• Use mathematics to construct an algorithm to compute p(x|y) and extract an estimate x̂(y) from it.

The main assumptions that we will make:

• Physical relationships among quantities of interest can be approximated by linear equations

• Prior uncertainties and measurement errors can be approximated by Gaussian distributions

These assumptions are often acceptable and make life much simpler.

Estimation 6–3
Prior (and complementary) readings

To prepare for the coming lectures, you absolutely need to read the following material (see the website):

• Section B.9 (and review of B.5, B.6, B.8) of 'Appendices communs...'

• The Humble Gaussian Distribution, David J.C. MacKay

Some explanations of this material will, however, be given during this and the subsequent lectures and exercise sessions.

Estimation 6–4
Gaussian random variable (short reminder)

• Notion of real-valued random variable (rvrv): P(x < v) = Fx(v).

• Notion of continuous rvrv (crv): px(v) = ∂Fx(x)/∂x evaluated at x = v.

• We use the term probability density function (pdf) of a crv for px(·).

• x is Gaussian (i.e. “normally distributed”), denoted by x ∼ N(x̄, σ²), if

  – px(v) = (1/√(2πσ²)) exp( −(v − x̄)²/(2σ²) ), where
  – x̄ = E x = ∫ v px(v) dv is the mean
  – σ² = E(x − x̄)² = ∫ (v − x̄)² px(v) dv is the variance

• Properties:
– Many practical applications: central limit theorem..., preservation of
“normality” by linear (affine) transformations...
– Characterization of pdf by the first 2 moments only...

Estimation 6–5
Gaussian random processes

By definition, a (countable) collection {x1, x2, . . .} of real-valued random variables is a Gaussian process if any linear combination of a (finite) subset of these variables has a normal distribution (or is a constant).
Implications (the first three are “trivial”):

• xi ∼ N(x̄i, σi²), i.e., pxi(v) = (1/√(2πσi²)) exp( −(v − x̄i)²/(2σi²) ), where x̄i = E xi and σi² = E(xi − x̄i)²;

• any (finite) affine combination a0 + a1 xi1 + . . . + an xin has a normal distribution (or is a constant);

• if {y1, y2, . . .} are (finite) affine combinations over a Gaussian process {x1, x2, . . .}, then {x1, x2, . . .} ∪ {y1, y2, . . .} is also a Gaussian process;

• a Gaussian process {x1, x2, . . .} is entirely characterized by the numbers x̄i = E xi and σij = E(xi − x̄i)(xj − x̄j).

Estimation 6–6
Gaussian random vectors

a random vector x ∈ Rn is Gaussian if it has density

  px(v) = (2π)^(−n/2) (det Σ)^(−1/2) exp( −(1/2)(v − x̄)T Σ−1 (v − x̄) ),

for some Σ = ΣT > 0, x̄ ∈ Rn

• denoted x ∼ N (x̄, Σ)
• x̄ ∈ Rn is the mean or expected value of x, i.e.,

  x̄ = E x = ∫ v px(v) dv

• Σ = ΣT > 0 is the covariance matrix of x, i.e.,

Σ = E(x − x̄)(x − x̄)T

Estimation 6–7
  Σ = E xxT − x̄x̄T = ∫ (v − x̄)(v − x̄)T px(v) dv

density for x ∼ N (0, 1):

  px(v) = (1/√(2π)) e^(−v²/2)

[figure: plot of the standard normal density px(v) for v ∈ [−4, 4]]

Estimation 6–8
• mean and variance of scalar random variable xi are

  E xi = x̄i,   E(xi − x̄i)² = Σii

  hence the standard deviation of xi is √Σii
• covariance between xi and xj is E(xi − x̄i)(xj − x̄j) = Σij

• correlation coefficient between xi and xj is ρij = Σij / √(Σii Σjj)
• mean (norm) square deviation of x from x̄ is

  E ‖x − x̄‖² = E Tr (x − x̄)(x − x̄)T = Tr Σ = Σ11 + · · · + Σnn

  (using Tr AB = Tr BA)

example: x ∼ N(0, I) means the xi are independent identically distributed (IID) N(0, 1) random variables

Estimation 6–9
Confidence ellipsoids

px(v) is constant for (v − x̄)T Σ−1 (v − x̄) = α, i.e., on the surface of the ellipsoid

  Eα = {v | (v − x̄)T Σ−1 (v − x̄) ≤ α}

thus x̄ and Σ determine shape of density

can interpret Eα as confidence ellipsoid for x:

the nonnegative random variable (x − x̄)T Σ−1 (x − x̄) has a χ²ₙ distribution, so Prob(x ∈ Eα) = Fχ²ₙ(α), where Fχ²ₙ is the CDF of the χ²ₙ distribution

some good approximations:

• En gives about 50% probability


• En+2√n gives about 90% probability

Estimation 6–10
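To make the two rules of thumb above concrete, here is a minimal numerical check (a Python/NumPy/SciPy sketch, not part of the original slides): it evaluates Fχ²ₙ at α = n and at α = n + 2√n for a few dimensions.

```python
# Sketch: evaluate Prob(x in E_alpha) = F_chi2_n(alpha) for alpha = n and
# alpha = n + 2*sqrt(n), checking the ~50% and ~90% approximations.
import numpy as np
from scipy.stats import chi2

for n in (2, 5, 10, 50):
    p_half = chi2.cdf(n, df=n)                  # Prob(x in E_n)
    p_90 = chi2.cdf(n + 2 * np.sqrt(n), df=n)   # Prob(x in E_{n + 2 sqrt(n)})
    print(f"n={n:3d}:  Prob(E_n) = {p_half:.3f},  Prob(E_n+2sqrt(n)) = {p_90:.3f}")
```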
geometrically:

• mean x̄ gives center of ellipsoid

• semiaxes are √(αλi) ui, where ui are (orthonormal) eigenvectors of Σ with eigenvalues λi

Estimation 6–11
   
example: x ∼ N(x̄, Σ) with x̄ = [2; 1], Σ = [2 1; 1 1]

• x1 has mean 2, std. dev. √2
• x2 has mean 1, std. dev. 1

• correlation coefficient between x1 and x2 is ρ = 1/√2
• E ‖x − x̄‖² = 3

90% confidence ellipsoid corresponds to α = 4.6:


[figure: 100 samples of x and the 90% confidence ellipsoid E4.6 in the (x1, x2) plane]
(here, 91 out of 100 fall in E4.6)

Estimation 6–12
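A quick Monte Carlo sketch of this example (Python/NumPy, not part of the slides): it draws samples from N(x̄, Σ) above and checks the standard deviations, the correlation coefficient, E ‖x − x̄‖², and the fraction of samples falling in E4.6.

```python
# Sketch: Monte Carlo check of the 2-D example x ~ N([2; 1], [2 1; 1 1]).
import numpy as np

rng = np.random.default_rng(0)
x_bar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

X = rng.multivariate_normal(x_bar, Sigma, size=100_000)
d = X - x_bar
print("std devs:          ", d.std(axis=0))              # ~ [sqrt(2), 1]
print("correlation rho:   ", np.corrcoef(X.T)[0, 1])     # ~ 1/sqrt(2)
print("E ||x - x_bar||^2: ", (d**2).sum(axis=1).mean())  # ~ Tr(Sigma) = 3
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d) # quadratic form per sample
print("fraction in E_4.6: ", (q <= 4.6).mean())          # ~ 0.90
```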
Affine transformation

suppose x ∼ N (x̄, Σx)


consider affine transformation of x:

z = Ax + b,

where A ∈ Rm×n, b ∈ Rm
then z is Gaussian, with mean

E z = E(Ax + b) = A E x + b = Ax̄ + b

and covariance

Σz = E(z − z̄)(z − z̄)T


= E A(x − x̄)(x − x̄)T AT
= AΣxAT

Estimation 6–13
examples:

• if w ∼ N(0, I) then x = Σ^(1/2) w + x̄ is N(x̄, Σ)


useful for simulating vectors with given mean and covariance

• conversely, if x ∼ N(x̄, Σ) then z = Σ^(−1/2)(x − x̄) is N(0, I)


(normalizes & decorrelates)

Estimation 6–14
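A small simulation sketch of these two facts (Python/NumPy; a Cholesky factor is used in place of the symmetric square root Σ^(1/2), which gives the same distribution):

```python
# Sketch: generate x ~ N(x_bar, Sigma) from standard normal w, then undo
# the transformation to normalize and decorrelate.
import numpy as np

rng = np.random.default_rng(1)
x_bar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

L = np.linalg.cholesky(Sigma)            # Sigma = L L^T; L plays the role of Sigma^(1/2)
w = rng.standard_normal((10_000, 2))
x = w @ L.T + x_bar                      # x ~ N(x_bar, Sigma)
print(np.cov(x.T))                       # ~ Sigma

z = (x - x_bar) @ np.linalg.inv(L).T     # z = L^{-1}(x - x_bar) ~ N(0, I)
print(np.cov(z.T))                       # ~ identity
```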
suppose x ∼ N (x̄, Σ) and c ∈ Rn

scalar cT x has mean cT x̄ and variance cT Σc

thus the (unit length) direction of minimum variability for x is u, where

  Σu = λmin u,   ‖u‖ = 1

standard deviation of uT x is √λmin

(similarly for maximum variability)

Estimation 6–15
Degenerate Gaussian vectors

it is convenient to allow Σ to be singular (but still Σ = ΣT ≥ 0)

(in this case density formula obviously does not hold)

meaning: in some directions x is not random at all

write Σ as

  Σ = [Q+ Q0] [Σ+ 0; 0 0] [Q+ Q0]T

where Q = [Q+ Q0] is orthogonal, Σ+ > 0

• columns of Q0 are orthonormal basis for N (Σ)


• columns of Q+ are orthonormal basis for range(Σ)

Estimation 6–16
then QT x = [z; w], where

• z = Q+T x ∼ N(Q+T x̄, Σ+) is (nondegenerate) Gaussian (hence, the density formula holds)

• w = Q0T x = Q0T x̄ is not random
  (Q0T x is called the deterministic component of x)

Estimation 6–17
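A small sketch of this decomposition (Python/NumPy; the covariance matrix and the rank tolerance below are illustrative choices, not from the lecture):

```python
# Sketch: split a singular covariance into range / nullspace parts via an
# eigendecomposition, recovering Q_plus, Q_zero and Sigma_plus as above.
import numpy as np

Sigma = np.array([[2.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])      # PSD with rank 2 (singular)

lam, Q = np.linalg.eigh(Sigma)           # Sigma = Q diag(lam) Q^T
pos = lam > 1e-10 * lam.max()            # tolerance-based rank decision
Q_plus, Q_zero = Q[:, pos], Q[:, ~pos]
Sigma_plus = np.diag(lam[pos])

# z = Q_plus^T x is nondegenerate N(Q_plus^T x_bar, Sigma_plus);
# w = Q_zero^T x is deterministic (no randomness in those directions).
print("dim of random part:", pos.sum(), " dim of deterministic part:", (~pos).sum())
```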
Linear measurements

linear measurements with noise:

y = Ax + v

• x ∈ Rn is what we want to measure or estimate

• y ∈ Rm is measurement

• A ∈ Rm×n characterizes sensors or measurements

• v is sensor noise

Estimation 6–18
common assumptions:

• x ∼ N (x̄, Σx)
• v ∼ N (v̄, Σv )
• x and v are independent

• N(x̄, Σx) is the prior distribution of x (describes initial uncertainty about x)
• v̄ is noise bias or offset (and is usually 0)
• Σv is noise covariance

Estimation 6–19
thus

  [x; v] ∼ N( [x̄; v̄], [Σx 0; 0 Σv] )

using

  [x; y] = [I 0; A I] [x; v]

we can write

  E [x; y] = [x̄; Ax̄ + v̄]

and

  E [x − x̄; y − ȳ][x − x̄; y − ȳ]T = [I 0; A I] [Σx 0; 0 Σv] [I 0; A I]T

                                   = [Σx  ΣxAT; AΣx  AΣxAT + Σv]

Estimation 6–20
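The block formula above translates directly into code. A minimal sketch (Python/NumPy) with illustrative matrices, not the lecture's values:

```python
# Sketch: joint mean and covariance of (x, y) for y = Ax + v, with x and v
# independent Gaussians.
import numpy as np

A = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma_x = np.diag([4.0, 0.25])
Sigma_v = 0.1 * np.eye(3)
x_bar = np.array([2.0, 1.0])
v_bar = np.zeros(3)

y_bar = A @ x_bar + v_bar
joint_mean = np.concatenate([x_bar, y_bar])
joint_cov = np.block([[Sigma_x,      Sigma_x @ A.T],
                      [A @ Sigma_x,  A @ Sigma_x @ A.T + Sigma_v]])
print(joint_mean)
print(joint_cov)
```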
covariance of measurement y is AΣxAT + Σv

• AΣxAT is ‘signal covariance’


• Σv is ‘noise covariance’

Estimation 6–21
Minimum mean-square estimation

suppose x ∈ Rn and y ∈ Rm are random vectors (not necessarily Gaussian)

we seek to estimate x given y

thus we seek a function φ : Rm → Rn such that x̂ = φ(y) is near x

one common measure of nearness: mean-square error,

E ‖φ(y) − x‖²

minimum mean-square estimator (MMSE) φmmse minimizes this quantity

general solution: φmmse(y) = E(x|y), i.e., the conditional expectation of x given y

Estimation 6–22
MMSE for Gaussian vectors

now suppose x ∈ Rn and y ∈ Rm are jointly Gaussian:

  [x; y] ∼ N( [x̄; ȳ], [Σx Σxy; ΣxyT Σy] )

(after a lot of algebra) the conditional density is

  px|y(v|y) = (2π)^(−n/2) (det Λ)^(−1/2) exp( −(1/2)(v − w)T Λ−1 (v − w) ),

where

  Λ = Σx − Σxy Σy−1 ΣxyT,   w = x̄ + Σxy Σy−1 (y − ȳ)

hence the MMSE estimator (i.e., conditional expectation) is

  x̂ = φmmse(y) = E(x|y) = x̄ + Σxy Σy−1 (y − ȳ)

Estimation 6–23
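A minimal sketch of the conditional-mean formula (Python/NumPy; the joint covariance blocks below are made-up illustrative values):

```python
# Sketch: MMSE estimate x_hat = x_bar + Sigma_xy Sigma_y^{-1} (y - y_bar)
# and conditional covariance Lambda for jointly Gaussian (x, y).
import numpy as np

x_bar = np.array([0.0, 0.0])
y_bar = np.array([1.0])
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_xy = np.array([[1.0], [0.3]])
Sigma_y = np.array([[1.5]])

y = np.array([2.0])                      # an observed measurement
K = Sigma_xy @ np.linalg.inv(Sigma_y)
x_hat = x_bar + K @ (y - y_bar)          # conditional mean E(x|y)
Lam = Sigma_x - K @ Sigma_xy.T           # conditional covariance Lambda
print("x_hat:", x_hat)
print("Lambda:\n", Lam)
```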
φmmse is an affine function

MMSE estimation error, x̂ − x, is a Gaussian random vector:

  x̂ − x ∼ N(0, Σx − Σxy Σy−1 ΣxyT)

note that

  Σx − Σxy Σy−1 ΣxyT ≤ Σx

i.e., the covariance of the estimation error is always less than the prior covariance of x

Estimation 6–24
Best linear unbiased estimator

estimator

  x̂ = φblu(y) = x̄ + Σxy Σy−1 (y − ȳ)
makes sense when x, y aren’t jointly Gaussian

this estimator

• is unbiased, i.e., E x̂ = E x
• often works well
• is widely used
• has minimum mean square error among all affine estimators

sometimes called best linear unbiased estimator

Estimation 6–25
MMSE with linear measurements
consider specific case

y = Ax + v, x ∼ N (x̄, Σx), v ∼ N (v̄, Σv ),

x, v independent
MMSE of x given y is affine function

x̂ = x̄ + B(y − ȳ)

where B = ΣxAT (AΣxAT + Σv )−1, ȳ = Ax̄ + v̄


interpretation:

• x̄ is our best prior guess of x (before measurement)

• y − ȳ is the discrepancy between what we actually measure (y) and the expected value of what we measure (ȳ)

Estimation 6–26
• estimator modifies prior guess by B times this discrepancy

• estimator blends prior information with measurement

• B gives gain from observed discrepancy to estimate

• B is small if noise term Σv in ‘denominator’ is large

Estimation 6–27
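A minimal sketch of this estimator (Python/NumPy; A, Σx, Σv, and the measurement y are illustrative, not from the lecture):

```python
# Sketch: gain B = Sigma_x A^T (A Sigma_x A^T + Sigma_v)^{-1} and estimate
# x_hat = x_bar + B (y - y_bar) for linear measurements y = Ax + v.
import numpy as np

A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
Sigma_x = np.diag([4.0, 0.25])
Sigma_v = 0.5 * np.eye(2)
x_bar = np.array([2.0, 1.0])
v_bar = np.zeros(2)

B = Sigma_x @ A.T @ np.linalg.inv(A @ Sigma_x @ A.T + Sigma_v)
y_bar = A @ x_bar + v_bar
y = np.array([3.1, 0.7])                 # hypothetical measurement
x_hat = x_bar + B @ (y - y_bar)
print("gain B:\n", B)
print("estimate x_hat:", x_hat)
```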
MMSE error with linear measurements

MMSE estimation error, x̃ = x̂ − x, is Gaussian with zero mean and covariance

  Σest = Σx − ΣxAT (AΣxAT + Σv)−1 AΣx

• Σest ≤ Σx, i.e., the measurement always decreases uncertainty about x

• the difference Σx − Σest gives the value of measurement y in estimating x

• e.g., (Σest,ii / Σx,ii)^(1/2) gives the fractional decrease in uncertainty of xi due to the measurement

note: the error covariance Σest can be determined before the measurement y is made!

Estimation 6–28
to evaluate Σest , only need to know

• A (which characterizes sensors)

• prior covariance of x (i.e., Σx)

• noise covariance (i.e., Σv )

you do not need to know the measurement y (or the means x̄, v̄)

useful for experiment design or sensor selection

Estimation 6–29
Information matrix formulas

we can write the estimator gain matrix as

  B = ΣxAT (AΣxAT + Σv)−1
    = (AT Σv−1 A + Σx−1)−1 AT Σv−1

• n × n inverse instead of m × m

• Σx−1, Σv−1 are sometimes called information matrices

corresponding formula for the estimator error covariance:

  Σest = Σx − ΣxAT (AΣxAT + Σv)−1 AΣx
       = (AT Σv−1 A + Σx−1)−1

Estimation 6–30
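A quick numerical check that the covariance-form and information-form expressions agree (Python/NumPy, with random but well-conditioned test matrices):

```python
# Sketch: verify B and Sigma_est computed both ways coincide.
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5
A = rng.standard_normal((m, n))
Sigma_x = np.eye(n) + 0.1 * np.ones((n, n))   # positive definite prior covariance
Sigma_v = 0.5 * np.eye(m)

Sx_inv, Sv_inv = np.linalg.inv(Sigma_x), np.linalg.inv(Sigma_v)
B1 = Sigma_x @ A.T @ np.linalg.inv(A @ Sigma_x @ A.T + Sigma_v)
B2 = np.linalg.inv(A.T @ Sv_inv @ A + Sx_inv) @ A.T @ Sv_inv
E1 = Sigma_x - B1 @ A @ Sigma_x               # covariance form of Sigma_est
E2 = np.linalg.inv(A.T @ Sv_inv @ A + Sx_inv) # information form of Sigma_est
print(np.allclose(B1, B2), np.allclose(E1, E2))   # True True
```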
can interpret Σest−1 = Σx−1 + AT Σv−1 A as:

  posterior information matrix (Σest−1)
    = prior information matrix (Σx−1)
    + information added by measurement (AT Σv−1 A)

Estimation 6–31
proof: multiply

  ΣxAT (AΣxAT + Σv)−1  =?  (AT Σv−1 A + Σx−1)−1 AT Σv−1

on the left by (AT Σv−1 A + Σx−1) and on the right by (AΣxAT + Σv) to get

  (AT Σv−1 A + Σx−1) ΣxAT  =?  AT Σv−1 (AΣxAT + Σv)

which is true, since both sides expand to AT Σv−1 AΣxAT + AT

Estimation 6–32
Relation to regularized least-squares

suppose x̄ = 0, v̄ = 0, Σx = α²I, Σv = β²I

estimator is x̂ = By where

  B = (AT Σv−1 A + Σx−1)−1 AT Σv−1
    = (AT A + (β/α)² I)−1 AT

. . . which corresponds to regularized least-squares

MMSE estimate x̂ minimizes

‖Az − y‖² + (β/α)² ‖z‖²

over z

Estimation 6–33
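A quick numerical check of this correspondence (Python/NumPy, random test data):

```python
# Sketch: with x_bar = 0, v_bar = 0, Sigma_x = alpha^2 I, Sigma_v = beta^2 I,
# the MMSE estimate equals the regularized least-squares (ridge) solution.
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
alpha, beta = 2.0, 0.5
mu = (beta / alpha) ** 2

x_ridge = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)  # argmin ||Az-y||^2 + mu ||z||^2
Sx, Sv = alpha**2 * np.eye(n), beta**2 * np.eye(m)
x_mmse = Sx @ A.T @ np.linalg.solve(A @ Sx @ A.T + Sv, y)     # covariance-form gain applied to y
print(np.allclose(x_ridge, x_mmse))                           # True
```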
Example
navigation using range measurements to distant beacons

y = Ax + v

• x ∈ R2 is location
• yi is range measurement to ith beacon
• vi is range measurement error, IID N (0, 1)
• ith row of A is unit vector in direction of ith beacon

prior distribution:

  x ∼ N(x̄, Σx),   x̄ = [1; 1],   Σx = [2² 0; 0 0.5²]

x1 has std. dev. 2; x2 has std. dev. 0.5

Estimation 6–34
90% confidence ellipsoid for the prior distribution,
{ x | (x − x̄)T Σx−1 (x − x̄) ≤ 4.6 }:

[figure: 90% prior confidence ellipsoid in the (x1, x2) plane]

Estimation 6–35
Case 1: one measurement, with beacon at angle 30◦

fewer measurements than variables, so combining prior information with the measurement is critical

resulting estimation error covariance:

  Σest = [1.046 −0.107; −0.107 0.246]

Estimation 6–36
90% confidence ellipsoid for estimate x̂ (and 90% confidence ellipsoid for x):

[figure: 90% posterior confidence ellipsoid for x̂ shown together with the prior ellipsoid, in the (x1, x2) plane]
interpretation: measurement

• yields essentially no reduction in uncertainty in x2

• reduces uncertainty in x1 by a factor of about two

Estimation 6–37
Case 2: 4 measurements, with beacon angles 80◦, 85◦, 90◦, 95◦

resulting estimation error covariance:

  Σest = [3.429 −0.074; −0.074 0.127]

Estimation 6–38
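A short sketch reproducing the error covariances of the two cases (Python/NumPy; rows of A are unit vectors toward the beacons, and the prior and noise match the slides):

```python
# Sketch: Sigma_est = Sigma_x - Sigma_x A^T (A Sigma_x A^T + Sigma_v)^{-1} A Sigma_x
# for the navigation example, given the beacon angles.
import numpy as np

def sigma_est(angles_deg, Sigma_x, sigma_v=1.0):
    th = np.deg2rad(np.asarray(angles_deg, dtype=float))
    A = np.column_stack([np.cos(th), np.sin(th)])   # unit vectors toward beacons
    Sigma_v = sigma_v**2 * np.eye(len(th))
    S = A @ Sigma_x @ A.T + Sigma_v
    return Sigma_x - Sigma_x @ A.T @ np.linalg.solve(S, A @ Sigma_x)

Sigma_x = np.diag([2.0**2, 0.5**2])
print(sigma_est([30.0], Sigma_x))                    # case 1: ~ [1.046 -0.107; -0.107 0.246]
print(sigma_est([80.0, 85.0, 90.0, 95.0], Sigma_x))  # case 2: ~ [3.429 -0.074; -0.074 0.127]
```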
90% confidence ellipsoid for estimate x̂ (and 90% confidence ellipsoid for x):

[figure: 90% posterior confidence ellipsoid for x̂ shown together with the prior ellipsoid, in the (x1, x2) plane]

interpretation: measurement yields

• little reduction in uncertainty in x1


• small reduction in uncertainty in x2

Estimation 6–39
