Module C
Outline:
Basic Estimation Theory: ML, MAP
Conditional Expectation, and Mean Square Estimation
Orthogonality Principle and LMMSE Estimator
Estimation Theory
Motivating Example
Let the samples be {1, 2, 1.5, 1.75, 2, 1.3, 0.8, 0.3, 1}.
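A minimal NumPy sketch (not part of the original slide) computing basic summary statistics of these samples; the sample mean is a natural first candidate for an estimate of the underlying parameter.

import numpy as np

samples = np.array([1, 2, 1.5, 1.75, 2, 1.3, 0.8, 0.3, 1])
print(samples.mean())   # sample mean, approximately 1.294
print(samples.var())    # sample variance (1/N normalization)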
Maximum Likelihood Estimation (θ is a parameter)
θ̂_ML(x̂) := argmax_θ L(θ | X = x̂),
where the likelihood L(θ | X = x̂) is the density (or pmf) f_X(x̂; θ) of the observation evaluated at x̂, viewed as a function of the parameter θ.
Log Likelihood Estimation
L(θ | X1 = x̂1, X2 = x̂2, ..., XN = x̂N) = f_{X1,X2,...,XN}(x̂1, x̂2, ..., x̂N; θ)
= f_{X1}(x̂1; θ) × f_{X2}(x̂2; θ) × ... × f_{XN}(x̂N; θ)   (due to independence of observations)
= f_X(x̂1; θ) × f_X(x̂2; θ) × ... × f_X(x̂N; θ)   (each Xi has the identical distribution f_X)
= ∏_{i=1}^{N} f_X(x̂i; θ) = ∏_{i=1}^{N} L(θ | Xi = x̂i).
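Since the product form is awkward to maximize directly, one usually works with the log-likelihood (which the slide title refers to): because log is strictly increasing,

θ̂_ML(x̂1, ..., x̂N) = argmax_θ ∑_{i=1}^{N} log f_X(x̂i; θ).

As a rough numerical illustration (not from the slides), the sketch below evaluates this sum on a grid of θ values for the motivating samples, under the purely hypothetical model that the data are i.i.d. Exponential(θ) with density f_X(x; θ) = θ e^{−θx}.

import numpy as np

samples = np.array([1, 2, 1.5, 1.75, 2, 1.3, 0.8, 0.3, 1])

# Hypothetical model: i.i.d. Exponential(theta) with density theta * exp(-theta * x).
thetas = np.linspace(0.01, 5, 1000)
log_lik = np.array([np.sum(np.log(t) - t * samples) for t in thetas])

theta_ml = thetas[np.argmax(log_lik)]
print(theta_ml)   # grid maximizer, close to 1 / samples.mean() ≈ 0.77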
Example
We observe {x̂1, x̂2, ..., x̂N} with each x̂i ∈ {0, 1}.
Problem: find θ̂_ML(x̂1, x̂2, ..., x̂N).
The likelihood function L(θ | X1 = x̂1, X2 = x̂2, ..., XN = x̂N) = .
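One way to fill in the blank, assuming (as the {0, 1} values suggest) that the intended model is i.i.d. Bernoulli(θ) observations with P(Xi = 1) = θ:

L(θ | X1 = x̂1, ..., XN = x̂N) = ∏_{i=1}^{N} θ^{x̂i} (1 − θ)^{1 − x̂i} = θ^S (1 − θ)^{N − S},   where S = ∑_{i=1}^{N} x̂i.

Setting the derivative of the log-likelihood S log θ + (N − S) log(1 − θ) to zero gives θ̂_ML(x̂1, ..., x̂N) = S/N, i.e., the fraction of ones among the observations.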
Conditional distribution
p_{X1|X2}(x1 | X2 = x2) = P(X1 = x1 | X2 = x2) = P(X1 = x1, X2 = x2) / P(X2 = x2).
Conditional Distributions
Consider two discrete random variables X and Y. Let X take values from the set {x1, ..., xn} and let Y take values from the set {y1, ..., ym}.
Example
Example
Maximum A-Posteriori (MAP) Estimation
Once we observe X = x̂, we find the posterior distribution using Bayes' law as:
f_{θ|X}(θ | X = x̂) = f_{θ,X}(θ, x̂) / f_X(x̂)
= f_{X|θ}(x̂ | θ) f_θ(θ) / f_X(x̂)
= f_{X|θ}(x̂ | θ) f_θ(θ) / ∫_θ f_{X|θ}(x̂ | θ) f_θ(θ) dθ.

θ̂_MAP(x̂) = argmax_θ f_{θ|X}(θ | X = x̂) = argmax_θ f_{X|θ}(x̂ | θ) f_θ(θ).
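The second expression for θ̂_MAP holds because the denominator f_X(x̂) does not depend on θ, so it can be dropped from the argmax. As a purely illustrative sketch (the Bernoulli model and the Beta(2, 2) prior are hypothetical choices, not taken from the slides), the following compares ML and MAP in a case where both have closed forms: for i.i.d. Bernoulli(θ) data with a Beta(a, b) prior, the posterior is Beta(a + S, b + N − S) with S = ∑ x̂i.

import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])     # hypothetical 0/1 observations
N, S = len(x), x.sum()
a, b = 2.0, 2.0                             # Beta(2, 2) prior (hypothetical choice)

theta_ml = S / N                            # maximizes the likelihood alone
theta_map = (S + a - 1) / (N + a + b - 2)   # mode of the Beta(a + S, b + N - S) posterior

print(theta_ml, theta_map)                  # 0.75 vs 0.7: the prior pulls the estimate toward 1/2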
Example (Previous year End Semester Question)
Mean Square Estimation Theory
Which estimator is "best" is subjective; we need to fix a criterion. One popular criterion is the Mean Square Error (MSE) of an estimator g(X1, ..., Xk):
E[|g(X1, ..., Xk) − X|²].
Once we fix MSE as the criterion, the problem of finding the best estimator of X based on the measurements X1, ..., Xk can be formulated as minimizing E[|g(X1, ..., Xk) − X|²] over all functions g.
Any g that achieves this minimum is called a Minimum Mean Square Error (MMSE) estimator.
When solving for MMSE, we always assume that all the random variables
involved have finite mean and variance.
MMSE
Conditional Expectation
Example: Let X, Y be discrete random variables with X, Y ∈ {1, 2} and joint pmf:
P[X = 1, Y = 1] = 1/2,   P[X = 1, Y = 2] = 1/10
P[X = 2, Y = 1] = 1/10,   P[X = 2, Y = 2] = 3/10
Determine the marginal pmfs of X and Y.
Show that the conditional pmf of X given Y = 1 is
P[X = 1 | Y = 1] = 5/6   and   P[X = 2 | Y = 1] = 1/6.
Then, E[X|Y = 2] = .
We can view E[X|Y] as a function of Y:
g(Y) = E[X|Y] = E[X|Y = 1] with probability P[Y = 1],
                E[X|Y = 2] with probability P[Y = 2].
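A minimal NumPy sketch (not from the slides) carrying out these computations from the joint pmf table, including the conditional expectation left blank above:

import numpy as np

# Joint pmf P[X = i, Y = j] for X, Y in {1, 2}; rows index X, columns index Y.
p = np.array([[0.5, 0.1],
              [0.1, 0.3]])
x_vals = np.array([1, 2])

p_X = p.sum(axis=1)                       # marginal of X: [0.6, 0.4]
p_Y = p.sum(axis=0)                       # marginal of Y: [0.6, 0.4]
p_X_given_Y1 = p[:, 0] / p_Y[0]           # conditional pmf of X given Y = 1: [5/6, 1/6]
E_X_given_Y = (x_vals[:, None] * p / p_Y).sum(axis=0)   # [E[X|Y=1], E[X|Y=2]] = [7/6, 7/4]

print(p_X, p_Y, p_X_given_Y1, E_X_given_Y)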
Conditional Expectation
Similarly,
E[h(X) | Y = y] = ∫_x h(x) f_{X|Y}(x | Y = y) dx
E[l(X, Y) | Y = y] = ∫_x l(x, y) f_{X|Y}(x | Y = y) dx
Example
Properties of Conditional Expectation
Tower Property and Orthogonality
Tower Property:
E[E[X|Y ]] = E[X].
Proof:
E_Y[E[X|Y]] = ∫_y E[X | Y = y] f_Y(y) dy
= ∫_y ( ∫_x x f_{X|Y}(x | Y = y) dx ) f_Y(y) dy
= ∫_y ∫_x x f_{X|Y}(x | Y = y) f_Y(y) dx dy      (the product under the integral is f_{XY}(x, y))
= ∫_x x ( ∫_y f_{XY}(x, y) dy ) dx               (the inner integral is the marginal f_X(x))
= ∫_x x f_X(x) dx = E[X].
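A quick numerical check of the tower property (illustrative only), reusing the joint pmf from the earlier conditional-expectation example:

import numpy as np

p = np.array([[0.5, 0.1], [0.1, 0.3]])    # joint pmf of (X, Y), values in {1, 2}
x_vals = np.array([1, 2])
p_Y = p.sum(axis=0)
E_X_given_Y = (x_vals[:, None] * p / p_Y).sum(axis=0)   # [E[X|Y=1], E[X|Y=2]]

print(E_X_given_Y @ p_Y)                  # E[E[X|Y]] = 1.4
print(x_vals @ p.sum(axis=1))             # E[X]      = 1.4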
Minimum Mean Square Estimator (MMSE)
Proof:
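A standard argument (stated under the assumption that the claim being proved is that the MMSE estimator of X given the observations Y = (X1, ..., Xk) is the conditional expectation E[X|Y]): for any estimator g(Y),

E[(X − g(Y))²] = E[(X − E[X|Y])²] + 2 E[(X − E[X|Y])(E[X|Y] − g(Y))] + E[(E[X|Y] − g(Y))²].

Conditioning on Y and using the tower property, the cross term vanishes, because E[X − E[X|Y] | Y] = 0 while E[X|Y] − g(Y) is a function of Y. Hence

E[(X − g(Y))²] = E[(X − E[X|Y])²] + E[(E[X|Y] − g(Y))²] ≥ E[(X − E[X|Y])²],

with equality when g(Y) = E[X|Y], so X̂_MMSE(Y) = E[X|Y].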
L2(Ω, F, P) Space of Random Variables
L² is the set of random variables X on (Ω, F, P) with E[X²] < ∞, equipped with the inner product X · Y := E[XY].
L2-norm and L2 convergence
we have X_k → X in the L² sense as k → ∞ (i.e., E[|X_k − X|²] → 0) for some random variable X ∈ L².
Important Cases:
1. For random variables X1, ..., Xk ∈ L², the set H = {α1 X1 + ... + αk Xk | αi ∈ R} is a closed linear subspace.
2. For any random variables X1, ..., Xk ∈ L², the set H = {α0 + α1 X1 + ... + αk Xk | αi ∈ R} is a closed linear subspace.
Orthogonality Principle
Note:
Y⋆ is called the projection of X on the subspace H and is denoted by Π_H(X).
Two random variables X, Y are orthogonal, X ⊥ Y, if E[XY] = 0.
Relate the MMSE estimator to the above theorem.
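In its standard form, the orthogonality principle states: for a closed linear subspace H ⊆ L², an element Y⋆ ∈ H satisfies E[(X − Y⋆)²] = min_{Y ∈ H} E[(X − Y)²] if and only if X − Y⋆ ⊥ Z, i.e. E[(X − Y⋆)Z] = 0, for every Z ∈ H. One way to make the connection to estimation (a brief sketch, not reproducing the slide's own discussion): taking H to be the closed subspace of all finite-variance functions of the observations, the projection Π_H(X) is the MMSE estimator E[X | X1, ..., Xk]; taking H = {α0 + α1 X1 + ... + αk Xk} from the previous slide gives the LMMSE estimator developed next.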
Linear Minimum Mean Square Error (LMMSE) Estimation
Derivation of LMMSE Coefficients
LMMSE Coefficients for Multiple Observations
The goal is to find coefficients that minimize the mean square error
min_{a0, a1, ..., ak} E[(X − (a0 + ∑_{i=1}^{k} ai Yi))²].
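A minimal numerical sketch (illustrative only; the synthetic data, dimensions, and coefficients below are hypothetical choices) of solving this problem via the normal equations a = [Cov(Y)]^{-1} Cov(Y, X) and a0 = E[X] − a^T E[Y], with the moments estimated from samples:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Y = rng.normal(size=(n, 2))                                    # two observed variables
X = 1.0 + 2.0 * Y[:, 0] - 0.5 * Y[:, 1] + rng.normal(scale=0.3, size=n)

C_Y = np.cov(Y, rowvar=False)                                  # Cov(Y), 2 x 2
c_YX = np.array([np.cov(Y[:, i], X)[0, 1] for i in range(2)])  # Cov(Y_i, X)
a = np.linalg.solve(C_Y, c_YX)                                 # optimal a1, ..., ak
a0 = X.mean() - a @ Y.mean(axis=0)                             # optimal intercept a0

X_hat = a0 + Y @ a
print(a0, a)                        # close to 1.0 and [2.0, -0.5]
print(np.mean((X - X_hat) ** 2))    # mean square error, close to 0.3 ** 2 = 0.09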
Derivation of LMMSE Coefficients
Derivation of LMMSE Coefficients
When X is also a random vector (X1, X2, ..., Xn)^T, the LMMSE estimator is given componentwise by

X̂_LMSE(Y) = (X̂_1,LMSE(Y), X̂_2,LMSE(Y), ..., X̂_n,LMSE(Y))^T,   where
X̂_i,LMSE(Y) = E[Xi] + Cov(Xi, Y)^T [Cov(Y)]^{-1} (Y − E[Y]),   i = 1, 2, ..., n.
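Equivalently (a compact restatement of the componentwise formula above), stacking the components: X̂_LMSE(Y) = E[X] + Cov(X, Y) [Cov(Y)]^{-1} (Y − E[Y]), where Cov(X, Y) denotes the cross-covariance matrix whose (i, j) entry is Cov(Xi, Yj).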
Example (Previous year End-Sem Question)
MMSE and LMMSE Estimator Comparison
An estimator X̂(Y) is unbiased if E[X̂(Y)] = E[X].
Among MMSE and LMMSE estimators, which one has smaller estimation
error?
If X and Y are uncorrelated, what does the LMMSE estimator give us?
What about the MMSE estimator?
X̂_LMMSE(Y) = X̂_MMSE(Y) ⟺ E[X|Y] = E[X] + Cov(X, Y)^T [Cov(Y)]^{-1} (Y − E[Y]).
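A hypothetical illustration of the uncorrelated case (the model Y ~ N(0, 1), X = Y² is my own choice, not from the slides): here Cov(X, Y) = E[Y³] = 0, so the LMMSE estimator collapses to the constant E[X], while the MMSE estimator E[X|Y] = Y² recovers X exactly.

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1_000_000)
x = y ** 2                                # X and Y are uncorrelated but clearly dependent

x_lmmse = np.full_like(y, x.mean())       # Cov(X, Y) = 0, so the LMMSE estimate is just E[X]
x_mmse = y ** 2                           # E[X | Y] = Y^2, which equals X here

print(np.mean((x - x_lmmse) ** 2))        # approximately Var(X) = 2
print(np.mean((x - x_mmse) ** 2))         # exactly 0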