
EE363 Winter 2008-09

Lecture 7
Estimation

• Gaussian random vectors

• minimum mean-square estimation (MMSE)

• MMSE with linear measurements

• relation to least-squares, pseudo-inverse

7–1
Gaussian random vectors

random vector x ∈ Rn is Gaussian if it has density


 
px(v) = (2π)−n/2 (det Σ)−1/2 exp( −(1/2)(v − x̄)T Σ−1(v − x̄) ),

for some Σ = ΣT > 0, x̄ ∈ Rn

• denoted x ∼ N (x̄, Σ)
• x̄ ∈ Rn is the mean or expected value of x, i.e.,
x̄ = E x = ∫ v px(v) dv

• Σ = ΣT > 0 is the covariance matrix of x, i.e.,

Σ = E(x − x̄)(x − x̄)T = E xxT − x̄x̄T = ∫ (v − x̄)(v − x̄)T px(v) dv

Estimation 7–2

density for x ∼ N (0, 1):

px(v) = (1/√2π) e−v²/2

[figure: plot of the density px(v) for v ∈ [−4, 4]]

Estimation 7–3
• mean and variance of scalar random variable xi are

E xi = x̄i, E(xi − x̄i)² = Σii

hence standard deviation of xi is √Σii
• covariance between xi and xj is E(xi − x̄i)(xj − x̄j ) = Σij
• correlation coefficient between xi and xj is ρij = Σij / √(Σii Σjj)
• mean (norm) square deviation of x from x̄ is

E ‖x − x̄‖² = E Tr (x − x̄)(x − x̄)T = Tr Σ = Σ11 + · · · + Σnn

(using Tr AB = Tr BA)

example: x ∼ N (0, I) means xi are independent identically distributed (IID) N (0, 1) random variables

Estimation 7–4
Confidence ellipsoids

• px(v) is constant for (v − x̄)T Σ−1(v − x̄) = α, i.e., on the surface of the ellipsoid

Eα = {v | (v − x̄)T Σ−1(v − x̄) ≤ α}
– thus x̄ and Σ determine shape of density

• η-confidence set for random variable z is smallest volume set S with


Prob(z ∈ S) ≥ η
– in general case confidence set has form {v | pz (v) ≥ β}

• Eα are the η-confidence sets for Gaussian, called confidence ellipsoids


– α determines confidence level η

Estimation 7–5
Confidence levels

the nonnegative random variable (x − x̄)T Σ−1(x − x̄) has a χ²n distribution, so Prob(x ∈ Eα) = Fχ²n(α), where Fχ²n is the χ²n CDF

some good approximations:

• En gives about 50% probability


• En+2√n gives about 90% probability
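
These approximations are easy to check against the exact χ²n CDF; here is a minimal sketch using scipy (the dimensions n below are arbitrary choices for illustration):

```python
# Sketch: probability mass of the ellipsoids E_n and E_{n+2*sqrt(n)}
# for a few dimensions n, using the chi-squared CDF.
import numpy as np
from scipy.stats import chi2

for n in [2, 5, 10, 50]:
    p_n = chi2.cdf(n, df=n)                      # Prob(x in E_n), roughly 0.5
    p_90 = chi2.cdf(n + 2 * np.sqrt(n), df=n)    # Prob(x in E_{n+2*sqrt(n)}), roughly 0.9
    alpha_90 = chi2.ppf(0.9, df=n)               # exact alpha giving 90% confidence
    print(f"n={n:3d}: P(E_n)={p_n:.2f}, P(E_(n+2*sqrt(n)))={p_90:.2f}, "
          f"alpha for 90%: {alpha_90:.2f}")
```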

Estimation 7–6
geometrically:

• mean x̄ gives center of ellipsoid



• semiaxes are √(αλi) ui, where ui are (orthonormal) eigenvectors of Σ
with eigenvalues λi

Estimation 7–7
   
example: x ∼ N (x̄, Σ) with x̄ = (2, 1), Σ = [2 1; 1 1]

• x1 has mean 2, std. dev. √2
• x2 has mean 1, std. dev. 1
• correlation coefficient between x1 and x2 is ρ = 1/√2
• E ‖x − x̄‖² = 3

90% confidence ellipsoid corresponds to α = 4.6:


[figure: 90% confidence ellipsoid E4.6 and 100 samples of x, in the (x1, x2) plane]
(here, 91 out of 100 fall in E4.6)
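
This count is easy to reproduce by Monte Carlo; a small sketch (not from the original slides, seed chosen arbitrarily):

```python
# Sketch: draw 100 samples of x ~ N(xbar, Sigma) and count how many
# fall inside the 90% confidence ellipsoid E_{4.6}.
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

samples = rng.multivariate_normal(xbar, Sigma, size=100)
d = samples - xbar
# quadratic form (v - xbar)^T Sigma^{-1} (v - xbar) for each sample
q = np.einsum("ij,jk,ik->i", d, np.linalg.inv(Sigma), d)
print("samples inside E_4.6:", np.sum(q <= 4.6), "out of 100")
```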

Estimation 7–8
Affine transformation

suppose x ∼ N (x̄, Σx)


consider affine transformation of x:

z = Ax + b,

where A ∈ Rm×n, b ∈ Rm
then z is Gaussian, with mean

E z = E(Ax + b) = A E x + b = Ax̄ + b

and covariance

Σz = E(z − z̄)(z − z̄)T = E A(x − x̄)(x − x̄)T AT = AΣxAT

Estimation 7–9
examples:

• if w ∼ N (0, I) then x = Σ1/2w + x̄ is N (x̄, Σ)


useful for simulating vectors with given mean and covariance

• conversely, if x ∼ N (x̄, Σ) then z = Σ−1/2(x − x̄) is N (0, I)


(normalizes & decorrelates; called whitening or normalizing )

Estimation 7–10
suppose x ∼ N (x̄, Σ) and c ∈ Rn

scalar cT x has mean cT x̄ and variance cT Σc

thus (unit length) direction of minimum variability for x is u, where

Σu = λmin u, ‖u‖ = 1

standard deviation of uT x is √λmin

(similarly for maximum variability)
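
A sketch of finding these directions numerically from the eigendecomposition of Σ (reusing the 2 × 2 covariance from the earlier example):

```python
# Sketch: directions of minimum and maximum variability of x ~ N(xbar, Sigma),
# from the (orthonormal) eigenvectors of Sigma.
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

lam, U = np.linalg.eigh(Sigma)      # eigenvalues ascending, columns of U orthonormal
u_min, u_max = U[:, 0], U[:, -1]

print("min-variability direction:", u_min, " std dev:", np.sqrt(lam[0]))
print("max-variability direction:", u_max, " std dev:", np.sqrt(lam[-1]))
# variance of c^T x is c^T Sigma c; check it matches lambda_min along u_min
print("u_min^T Sigma u_min =", u_min @ Sigma @ u_min)
```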

Estimation 7–11
Degenerate Gaussian vectors

• it is convenient to allow Σ to be singular (but still Σ = ΣT ≥ 0)

– in this case density formula obviously does not hold


– meaning: in some directions x is not random at all
– random variable x is called a degenerate Gaussian

• write Σ as

Σ = [Q+ Q0] [Σ+ 0; 0 0] [Q+ Q0]T

where Q = [Q+ Q0] is orthogonal, Σ+ > 0

– columns of Q0 are orthonormal basis for N (Σ)


– columns of Q+ are orthonormal basis for range(Σ)

Estimation 7–12
• then QT x = [z; w], i.e., x = Q+ z + Q0 w
– z ∼ N (Q+T x̄, Σ+) is (nondegenerate) Gaussian (hence, density formula holds)
– w = Q0T x̄ is not random, called deterministic component of x

Estimation 7–13
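
A sketch of this decomposition in code, splitting a singular Σ into its random and deterministic parts (the rank-deficient Σ below is a made-up illustration):

```python
# Sketch: decompose a degenerate Gaussian with singular Sigma into a
# nondegenerate part (range of Sigma) and a deterministic part (nullspace).
import numpy as np

xbar = np.array([1.0, 2.0, 3.0])
# rank-2 covariance: no randomness along one direction
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
Sigma = B @ B.T

lam, Q = np.linalg.eigh(Sigma)          # ascending eigenvalues, orthonormal Q
tol = 1e-10
null = lam < tol
Q0, Qp = Q[:, null], Q[:, ~null]        # bases for N(Sigma) and range(Sigma)
Sigma_plus = np.diag(lam[~null])

print("z ~ N(Qp^T xbar, Sigma_plus):", Qp.T @ xbar, "\n", Sigma_plus)
print("deterministic component w = Q0^T xbar:", Q0.T @ xbar)
```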
Linear measurements

linear measurements with noise:

y = Ax + v

• x ∈ Rn is what we want to measure or estimate

• y ∈ Rm is measurement

• A ∈ Rm×n characterizes sensors or measurements

• v is sensor noise

Estimation 7–14
common assumptions:

• x ∼ N (x̄, Σx)
• v ∼ N (v̄, Σv )
• x and v are independent

• N (x̄, Σx) is the prior distribution of x (describes initial uncertainty


about x)
• v̄ is noise bias or offset (and is usually 0)
• Σv is noise covariance

Estimation 7–15
thus

[x; v] ∼ N ( [x̄; v̄], [Σx 0; 0 Σv] )

using

[x; y] = [I 0; A I] [x; v]

we can write

E [x; y] = [x̄; Ax̄ + v̄]

and

E [x − x̄; y − ȳ][x − x̄; y − ȳ]T = [I 0; A I] [Σx 0; 0 Σv] [I 0; A I]T = [Σx ΣxAT; AΣx AΣxAT + Σv]

Estimation 7–16
covariance of measurement y is AΣxAT + Σv

• AΣxAT is ‘signal covariance’


• Σv is ‘noise covariance’

Estimation 7–17
Minimum mean-square estimation

suppose x ∈ Rn and y ∈ Rm are random vectors (not necessarily Gaussian)

we seek to estimate x given y

thus we seek a function φ : Rm → Rn such that x̂ = φ(y) is near x

one common measure of nearness: mean-square error,

E ‖φ(y) − x‖²

minimum mean-square estimator (MMSE) φmmse minimizes this quantity

general solution: φmmse(y) = E(x|y), i.e., the conditional expectation of x given y

Estimation 7–18
MMSE for Gaussian vectors

now suppose x ∈ Rn and y ∈ Rm are jointly Gaussian:

[x; y] ∼ N ( [x̄; ȳ], [Σx Σxy; ΣxyT Σy] )

(after a lot of algebra) the conditional density is

px|y(v|y) = (2π)−n/2 (det Λ)−1/2 exp( −(1/2)(v − w)T Λ−1(v − w) ),

where

Λ = Σx − Σxy Σy−1 ΣxyT,    w = x̄ + Σxy Σy−1 (y − ȳ)

hence MMSE estimator (i.e., conditional expectation) is

x̂ = φmmse(y) = E(x|y) = x̄ + Σxy Σy−1 (y − ȳ)

Estimation 7–19
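
These formulas translate directly into a few lines of code; the sketch below uses made-up joint statistics (the particular x̄, ȳ, Σx, Σxy, Σy are illustrative, not from the slides):

```python
# Sketch: conditional mean and covariance of x given y, for jointly Gaussian (x, y).
import numpy as np

# made-up joint statistics (x in R^2, y in R^1)
xbar = np.array([1.0, 0.0]);  ybar = np.array([2.0])
Sigma_x  = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Sigma_xy = np.array([[0.8],
                     [0.3]])
Sigma_y  = np.array([[1.5]])

def mmse(y):
    """x_hat = xbar + Sigma_xy Sigma_y^{-1} (y - ybar)"""
    return xbar + Sigma_xy @ np.linalg.solve(Sigma_y, y - ybar)

# error covariance Lambda = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_xy^T
Lambda = Sigma_x - Sigma_xy @ np.linalg.solve(Sigma_y, Sigma_xy.T)

print("x_hat for y = 3:", mmse(np.array([3.0])))
print("error covariance:\n", Lambda)
```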
φmmse is an affine function

MMSE estimation error, x̂ − x, is a Gaussian random vector

x̂ − x ∼ N (0, Σx − Σxy Σy−1 ΣxyT)

note that
Σx − Σxy Σy−1 ΣxyT ≤ Σx
i.e., covariance of estimation error is always less than prior covariance of x

Estimation 7–20
Best linear unbiased estimator

estimator
x̂ = φblu(y) = x̄ + Σxy Σy−1 (y − ȳ)
makes sense when x, y aren’t jointly Gaussian

this estimator

• is unbiased, i.e., E x̂ = E x
• often works well
• is widely used
• has minimum mean square error among all affine estimators

sometimes called best linear unbiased estimator

Estimation 7–21
MMSE with linear measurements
consider specific case

y = Ax + v, x ∼ N (x̄, Σx), v ∼ N (v̄, Σv ),

x, v independent
MMSE of x given y is affine function

x̂ = x̄ + B(y − ȳ)

where B = ΣxAT (AΣxAT + Σv )−1, ȳ = Ax̄ + v̄


interpretation:

• x̄ is our best prior guess of x (before measurement)

• y − ȳ is the discrepancy between what we actually measure (y) and the expected value of what we measure (ȳ)

Estimation 7–22
• estimator modifies prior guess by B times this discrepancy

• estimator blends prior information with measurement

• B gives gain from observed discrepancy to estimate

• B is small if noise term Σv in ‘denominator’ is large

Estimation 7–23
MMSE error with linear measurements

MMSE estimation error, x̃ = x̂ − x, is Gaussian with zero mean and covariance

Σest = Σx − ΣxAT (AΣxAT + Σv)−1AΣx

• Σest ≤ Σx, i.e., measurement always decreases uncertainty about x


• difference Σx − Σest (or some other comparison) gives value of
measurement y in estimating x
– (Σest ii/Σx ii)1/2 gives fractional decrease in uncertainty of xi due to
measurement
– (Tr Σest/ Tr Σ)1/2 gives fractional decrease in uncertainty in x,
measured by mean-square error
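
The estimator and its error covariance are only a few matrix operations; a minimal sketch (the A, Σx, Σv below are arbitrary illustrations, not from the slides):

```python
# Sketch: MMSE estimate and error covariance for y = A x + v,
# x ~ N(xbar, Sigma_x), v ~ N(vbar, Sigma_v), x and v independent.
import numpy as np

def mmse_linear(y, A, xbar, Sigma_x, vbar, Sigma_v):
    ybar = A @ xbar + vbar
    S = A @ Sigma_x @ A.T + Sigma_v          # covariance of y
    B = Sigma_x @ A.T @ np.linalg.inv(S)     # estimator gain
    x_hat = xbar + B @ (y - ybar)
    Sigma_est = Sigma_x - B @ A @ Sigma_x    # error covariance
    return x_hat, Sigma_est

# small illustration
rng = np.random.default_rng(2)
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
xbar = np.zeros(2);  Sigma_x = np.eye(2)
vbar = np.zeros(2);  Sigma_v = 0.1 * np.eye(2)

x = rng.multivariate_normal(xbar, Sigma_x)
y = A @ x + rng.multivariate_normal(vbar, Sigma_v)
x_hat, Sigma_est = mmse_linear(y, A, xbar, Sigma_x, vbar, Sigma_v)
print("x    :", x)
print("x_hat:", x_hat)
print("Sigma_est:\n", Sigma_est)
```

Note that Sigma_est here depends only on A, Σx, and Σv, as discussed on the next slide.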

Estimation 7–24
Estimation error covariance

• error covariance Σest can be determined before measurement y is made!

• to evaluate Σest, only need to know


– A (which characterizes sensors)
– prior covariance of x (i.e., Σx)
– noise covariance (i.e., Σv )

• you do not need to know the measurement y (or the means x̄, v̄)

• useful for experiment design or sensor selection

Estimation 7–25
Information matrix formulas

we can write estimator gain matrix as

B = ΣxAT (AΣxAT + Σv)−1 = (AT Σv−1 A + Σx−1)−1 AT Σv−1

• n × n inverse instead of m × m

• Σx−1, Σv−1 sometimes called information matrices

corresponding formula for estimator error covariance:

Σest = Σx − ΣxAT (AΣxAT + Σv)−1AΣx = (AT Σv−1 A + Σx−1)−1

Estimation 7–26
can interpret Σest−1 = Σx−1 + AT Σv−1 A as:

posterior information matrix (Σest−1)
= prior information matrix (Σx−1)
+ information added by measurement (AT Σv−1 A)

Estimation 7–27
proof: multiply

ΣxAT (AΣxAT + Σv)−1 =? (AT Σv−1 A + Σx−1)−1 AT Σv−1

on left by (AT Σv−1 A + Σx−1) and on right by (AΣxAT + Σv) to get

(AT Σv−1 A + Σx−1) ΣxAT =? AT Σv−1 (AΣxAT + Σv)

which is true
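
The identity is also easy to confirm numerically on random problem data; a small sketch (random A, Σx, Σv chosen only for illustration):

```python
# Sketch: numerically verify that the covariance-form and information-form
# expressions for B and Sigma_est agree, on random problem data.
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 5
A = rng.standard_normal((m, n))
Px = rng.standard_normal((n, n)); Sigma_x = Px @ Px.T + np.eye(n)   # random SPD
Pv = rng.standard_normal((m, m)); Sigma_v = Pv @ Pv.T + np.eye(m)

# covariance form (m x m inverse)
S = A @ Sigma_x @ A.T + Sigma_v
B1 = Sigma_x @ A.T @ np.linalg.inv(S)
E1 = Sigma_x - Sigma_x @ A.T @ np.linalg.inv(S) @ A @ Sigma_x

# information form (n x n inverse)
info = A.T @ np.linalg.inv(Sigma_v) @ A + np.linalg.inv(Sigma_x)
B2 = np.linalg.inv(info) @ A.T @ np.linalg.inv(Sigma_v)
E2 = np.linalg.inv(info)

print("max |B1 - B2| =", np.abs(B1 - B2).max())
print("max |E1 - E2| =", np.abs(E1 - E2).max())
```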

Estimation 7–28
Relation to regularized least-squares

suppose x̄ = 0, v̄ = 0, Σx = α²I, Σv = β²I

estimator is x̂ = By where

B = (AT Σv−1 A + Σx−1)−1 AT Σv−1 = (AT A + (β/α)²I)−1 AT

. . . which corresponds to regularized least-squares

MMSE estimate x̂ minimizes

‖Az − y‖² + (β/α)² ‖z‖²

over z
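
A sketch checking this equivalence numerically (random A and y; α, β chosen arbitrarily):

```python
# Sketch: with xbar = 0, vbar = 0, Sigma_x = alpha^2 I, Sigma_v = beta^2 I,
# the MMSE estimate equals the regularized least-squares solution.
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
alpha, beta = 2.0, 0.5
mu = (beta / alpha) ** 2

# MMSE estimate in information form: (A^T A + mu I)^{-1} A^T y
x_mmse = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)

# regularized LS: minimize ||A z - y||^2 + mu ||z||^2 via stacked least-squares
A_stacked = np.vstack([A, np.sqrt(mu) * np.eye(n)])
y_stacked = np.concatenate([y, np.zeros(n)])
x_rls, *_ = np.linalg.lstsq(A_stacked, y_stacked, rcond=None)

print("max difference:", np.abs(x_mmse - x_rls).max())
```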

Estimation 7–29
Example
navigation using range measurements to distant beacons

y = Ax + v

• x ∈ R2 is location
• yi is range measurement to ith beacon
• vi is range measurement error, IID N (0, 1)
• ith row of A is unit vector in direction of ith beacon

prior distribution:

x ∼ N (x̄, Σx), x̄ = (1, 1), Σx = [2² 0; 0 0.5²]

x1 has std. dev. 2; x2 has std. dev. 0.5

Estimation 7–30
90% confidence ellipsoid for prior distribution
{ x | (x − x̄)T Σx−1(x − x̄) ≤ 4.6 }:

[figure: 90% prior confidence ellipsoid in the (x1, x2) plane]

Estimation 7–31
Case 1: one measurement, with beacon at angle 30◦

fewer measurements than variables, so combining prior information with measurement is critical

resulting estimation error covariance:


 
Σest = [1.046 −0.107; −0.107 0.246]

Estimation 7–32
90% confidence ellipsoid for x̂ (and 90% confidence ellipsoid for x):

[figure: posterior and prior 90% confidence ellipsoids in the (x1, x2) plane]
interpretation: measurement

• yields essentially no reduction in uncertainty in x2
• reduces uncertainty in x1 by about a factor of two

Estimation 7–33
Case 2: 4 measurements, with beacon angles 80◦, 85◦, 90◦, 95◦

resulting estimation error covariance:


 
Σest = [3.429 −0.074; −0.074 0.127]

Estimation 7–34
90% confidence ellipsoid for x̂ (and 90% confidence ellipsoid for x):

[figure: posterior and prior 90% confidence ellipsoids in the (x1, x2) plane]

interpretation: measurement yields

• little reduction in uncertainty in x1


• small reduction in uncertainty in x2
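
Both cases can be reproduced with a few lines of code; a sketch, assuming beacon angles are measured from the x1 axis so that each row of A is (cos θ, sin θ):

```python
# Sketch: reproduce Sigma_est for the two beacon configurations in the example.
import numpy as np

Sigma_x = np.diag([2.0**2, 0.5**2])   # prior covariance

def sigma_est(angles_deg):
    th = np.deg2rad(angles_deg)
    A = np.column_stack([np.cos(th), np.sin(th)])   # unit vector toward each beacon
    Sigma_v = np.eye(len(angles_deg))               # IID N(0, 1) range errors
    S = A @ Sigma_x @ A.T + Sigma_v
    return Sigma_x - Sigma_x @ A.T @ np.linalg.solve(S, A @ Sigma_x)

print("Case 1:\n", sigma_est([30.0]))                     # approx [1.046 -0.107; -0.107 0.246]
print("Case 2:\n", sigma_est([80.0, 85.0, 90.0, 95.0]))   # approx [3.429 -0.074; -0.074 0.127]
```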

Estimation 7–35
