Problem Set 10 Solutions
Answer:
1. From Definition 4.18 in the lecture notes we know that the conditional density of a random variable
Y given another random variable X = x is
$$
f_{Y|X}(y \mid x) =
\begin{cases}
\dfrac{f_{X,Y}(x,y)}{f_X(x)}, & \text{if } f_X(x) > 0;\\[4pt]
0, & \text{otherwise.}
\end{cases}
$$
Since $f_X(x) = \frac{1+4x}{3} > 0$ for all $x \in (0,1)$, the joint density of $X, Y$ is
$$
f_{X,Y}(x,y) = f_{Y|X}(y \mid x)\, f_X(x) = \frac{2y+4x}{1+4x} \cdot \frac{1+4x}{3} = \frac{2y+4x}{3}.
$$
The joint cumulative distribution function is an integral over this density:
$$
F_{X,Y}(x,y) = P(X \le x,\, Y \le y) = \int_0^y \!\!\int_0^x f_{X,Y}(a,b)\, da\, db = \int_0^y \!\!\int_0^x \frac{2b+4a}{3}\, da\, db = \frac{x y^2 + 2x^2 y}{3}.
$$
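As a quick numerical sanity check of the joint density and the CDF just derived, one can integrate $f_{X,Y}$ over the unit square and over a rectangle $(0, x_0] \times (0, y_0]$ (a sketch assuming numpy/scipy are available; the point $(0.4, 0.7)$ is an arbitrary choice):

```python
import numpy as np
from scipy.integrate import dblquad

f_xy = lambda y, x: (2 * y + 4 * x) / 3            # joint density on (0,1)^2
F_xy = lambda x, y: (x * y**2 + 2 * x**2 * y) / 3   # closed-form CDF derived above

# The density should integrate to 1 over the unit square.
total, _ = dblquad(f_xy, 0, 1, 0, 1)
print(total)                                        # ~1.0

# The closed-form CDF should match direct integration of the density.
x0, y0 = 0.4, 0.7
num, _ = dblquad(f_xy, 0, x0, 0, y0)
print(num, F_xy(x0, y0))                            # both ~0.14
```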
2. The marginal density of $Y$ is obtained by integrating out the variable $x$:
$$
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx = \int_0^1 \frac{2y+4x}{3}\, dx = \frac{2y+2}{3}.
$$
The corresponding marginal distribution function of $Y$ is then
$$
F_Y(y) = P(Y \le y) = \int_0^y f_Y(b)\, db = \int_0^y \frac{2b+2}{3}\, db = \frac{y^2 + 2y}{3}.
$$
3. Finally, since $f_Y(y) = \frac{2y+2}{3} > 0$ for all $y \in (0,1)$, the conditional density of $X$ given $Y$ is
$$
f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{(2y+4x)/3}{(2y+2)/3} = \frac{y+2x}{y+1}.
$$
The corresponding conditional cumulative distribution function is
$$
F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) = \int_0^x f_{X|Y}(a \mid y)\, da = \int_0^x \frac{y+2a}{y+1}\, da = \frac{yx + x^2}{y+1}.
$$
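The same kind of check works for the conditional law of $X$ given $Y = y$: for a fixed $y$ the conditional density should integrate to 1, and its partial integral should match the closed-form conditional CDF (a sketch assuming scipy is available; $y = 0.3$ and $x = 0.6$ are arbitrary evaluation points):

```python
from scipy.integrate import quad

y0 = 0.3
f_x_given_y = lambda x: (y0 + 2 * x) / (y0 + 1)      # conditional density derived above
F_x_given_y = lambda x: (y0 * x + x**2) / (y0 + 1)   # conditional CDF derived above

print(quad(f_x_given_y, 0, 1)[0])                         # ~1.0
print(quad(f_x_given_y, 0, 0.6)[0], F_x_given_y(0.6))     # both ~0.415
```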
2 Conditional moments
For random variables $X, Y \sim N(0, \sigma^2)$ determine
1. $E(X \mid X^2)$
2. $E(X \mid XY)$
3. $E(X^2 + Y^2 \mid X + Y)$.
Answer:
1. Since the normal distribution is symmetric around zero, knowing $X^2 = x$ one can immediately tell that $X = \sqrt{x}$ or $X = -\sqrt{x}$, each with conditional probability $1/2$. Thus
$$
E(X \mid X^2 = x) = \tfrac{1}{2}\cdot\sqrt{x} + \tfrac{1}{2}\cdot(-\sqrt{x}) = 0.
$$
Another way to show this is to notice that $-X \sim N(0, \sigma^2)$, hence
$$
E(X \mid X^2) = E\big({-X} \mid (-X)^2\big) = E(-X \mid X^2) = -E(X \mid X^2) \;\Longrightarrow\; E(X \mid X^2) = 0.
$$
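A small Monte Carlo check of this fact: bin simulated draws by the value of $X^2$ and look at the within-bin average of $X$, which should be close to zero in every bin (a sketch assuming numpy; $\sigma$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

edges = np.quantile(x**2, np.linspace(0, 1, 21))    # 20 equal-probability bins of X^2
idx = np.digitize(x**2, edges[1:-1])
cond_means = [x[idx == k].mean() for k in range(20)]
print(np.round(cond_means, 3))                      # all entries close to 0
```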
2. One may note that $(-X, -Y)$ has the same distribution as $(X, Y)$, while $(-X)(-Y) = XY$. Thus $E(X \mid XY) = E(-X \mid XY) = -E(X \mid XY)$, which gives $E(X \mid XY) = 0$.
3. Here the sign-flip trick no longer helps, because $(-X)^2 + (-Y)^2 = X^2 + Y^2$. Instead, for independent $X$ and $Y$ write $X^2 + Y^2 = \tfrac{1}{2}\big[(X+Y)^2 + (X-Y)^2\big]$ and note that $X+Y$ and $X-Y$ are jointly normal and uncorrelated, hence independent. Therefore
$$
E(X^2 + Y^2 \mid X + Y) = \tfrac{1}{2}(X+Y)^2 + \tfrac{1}{2}E(X-Y)^2 = \tfrac{1}{2}(X+Y)^2 + \sigma^2.
$$
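Parts 2 and 3 can also be checked by simulation, assuming $X$ and $Y$ are independent $N(0, \sigma^2)$ (a numpy-only sketch; $\sigma$ and the slice point $s = 1$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 1.5, 1_000_000
x = rng.normal(0.0, sigma, n)
y = rng.normal(0.0, sigma, n)

# Part 2: within bins of XY, the average of X should hover around 0.
xy = x * y
edges = np.quantile(xy, np.linspace(0, 1, 11))
idx = np.digitize(xy, edges[1:-1])
print(np.round([x[idx == k].mean() for k in range(10)], 3))

# Part 3: on a thin slice X + Y ~ s, compare with s^2/2 + sigma^2.
s = 1.0
mask = np.abs((x + y) - s) < 0.02
print((x[mask]**2 + y[mask]**2).mean(), s**2 / 2 + sigma**2)   # both ~2.75
```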
3 Estimators
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$.
4. Is $\hat\sigma^2$ a biased estimator? What is the variance of $\hat\sigma^2$?
5. Show that $\hat\sigma^2$ has a smaller MSE than $S^2$. Explain why.
Answer:
1.
$$
E\bar X = E\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n}\, n\, E X_1 = \mu;
$$
2.
$$
\operatorname{Var}(\bar X) = \operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\operatorname{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\, n \operatorname{Var}(X_1) = \frac{\sigma^2}{n};
$$
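A tiny simulation confirms both moments of the sample mean (numpy only; $\mu$, $\sigma$ and $n$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 1.0, 3.0, 25, 200_000
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbar.mean(), mu)               # ~1.0
print(xbar.var(), sigma**2 / n)      # ~0.36
```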
3. Compute the MLE $\hat\sigma^2$:
$$
L(\mu, \sigma^2 \mid X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big\{-\frac{1}{2}\sum_{i=1}^n \frac{(x_i-\mu)^2}{\sigma^2}\Big\};
$$
$$
\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2}\sum_{i=1}^n \frac{(x_i-\mu)^2}{\sigma^2};
$$
FOC:
$$
\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0 \qquad (1)
$$
$$
\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2 = 0 \qquad (2)
$$
From (1), $\hat\mu = \bar X$; substituting into (2) gives
$$
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar X)^2 = \frac{n-1}{n}\, S^2.
$$
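The closed-form MLE can be cross-checked by maximizing the Gaussian log-likelihood numerically on simulated data (a sketch assuming numpy/scipy; the true values $\mu = 3$, $\sigma^2 = 4$ and $n = 500$ are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(3.0, 2.0, size=500)             # simulated sample, true mu = 3, sigma^2 = 4

def neg_loglik(params):
    mu, log_s2 = params
    s2 = np.exp(log_s2)                        # reparameterize to keep sigma^2 positive
    return 0.5 * len(x) * np.log(s2) + 0.5 * np.sum((x - mu)**2) / s2

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, x.mean())                        # numerical vs closed-form MLE of mu
print(s2_hat, ((x - x.mean())**2).mean())      # numerical vs closed-form MLE of sigma^2
```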
4. $E\hat\sigma^2 = E\big(\frac{n-1}{n} S^2\big) = \frac{n-1}{n}\sigma^2$ (clearly, it is biased).
Bias: $E(\hat\sigma^2 - \sigma^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2$ (note: as $n \to \infty$, the bias $\to 0$).
Using $\operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}$,
$$
\operatorname{Var}(\hat\sigma^2) = \operatorname{Var}\Big(\frac{n-1}{n} S^2\Big) = \Big(\frac{n-1}{n}\Big)^2 \operatorname{Var}(S^2) = \frac{2(n-1)\sigma^4}{n^2}.
$$
5. The MSE of $\hat\sigma^2$ is given as
$$
E(\hat\sigma^2 - \sigma^2)^2 = \operatorname{Var}(\hat\sigma^2) + \big(\operatorname{Bias}(\hat\sigma^2)\big)^2 = \frac{2(n-1)\sigma^4}{n^2} + \frac{1}{n^2}\sigma^4 = \frac{2n-1}{n^2}\sigma^4.
$$
Comparing $MSE(\hat\sigma^2)$ with $MSE(S^2) = \frac{2}{n-1}\sigma^4$, we find
$$
\frac{2n-1}{n^2}\sigma^4 < \frac{2}{n-1}\sigma^4,
$$
since $(2n-1)(n-1) = 2n^2 - 3n + 1 < 2n^2$, and therefore $MSE(\hat\sigma^2) < MSE(S^2)$.
This illustrates the trade-off between bias and variance: $\hat\sigma^2$ accepts a small bias in exchange for a smaller variance, and ends up with the smaller mean squared error.
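The bias, variance and MSE formulas above are easy to confirm by simulation (numpy-only sketch; $\sigma^2 = 4$ and $n = 20$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 0.0, 4.0, 20, 200_000
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

s2 = x.var(axis=1, ddof=1)          # S^2, the unbiased estimator
sig2_hat = x.var(axis=1, ddof=0)    # MLE, sigma-hat^2 = (n-1)/n * S^2

print(sig2_hat.mean() - sigma2, -sigma2 / n)                        # bias ~ -0.2
print(sig2_hat.var(), 2 * (n - 1) * sigma2**2 / n**2)               # variance ~ 1.52
print(((sig2_hat - sigma2)**2).mean(), ((s2 - sigma2)**2).mean())   # MSE: ~1.56 < ~1.68
```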
Answer:
1. The log-likelihood function is
$$
\ln L(\theta_1, \theta_2 \mid X, Y, Z) = \underbrace{c}_{\text{constant}} - \frac{1}{2}\sum_{i=1}^n \Big[(X_i - \theta_1)^2 + \frac{1}{2}(Y_i - \theta_2)^2 + \frac{1}{4}(Z_i - \theta_1 - \theta_2)^2\Big].
$$
The FOC:
$$
\frac{1}{n}\frac{\partial}{\partial \theta_1}\ln L = (\bar X - \theta_1) + \frac{1}{4}(\bar Z - \theta_1 - \theta_2) = 0 \;\Rightarrow\; 4\bar X + \bar Z = 5\theta_1 + \theta_2,
$$
$$
\frac{1}{n}\frac{\partial}{\partial \theta_2}\ln L = \frac{1}{2}(\bar Y - \theta_2) + \frac{1}{4}(\bar Z - \theta_1 - \theta_2) = 0 \;\Rightarrow\; 2\bar Y + \bar Z = \theta_1 + 3\theta_2.
$$
Solving for $\hat\theta_1$ and $\hat\theta_2$,
$$
\hat\theta_1 = \frac{6\bar X - \bar Y + \bar Z}{7}, \qquad \hat\theta_2 = \frac{-2\bar X + 5\bar Y + 2\bar Z}{7}.
$$
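A simulation sketch of these estimators (numpy only). The model is read off the log-likelihood above, so treat it as an assumption of this snippet: independent $X_i \sim N(\theta_1, 1)$, $Y_i \sim N(\theta_2, 2)$, $Z_i \sim N(\theta_1 + \theta_2, 4)$, with arbitrary true parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
theta1, theta2, n = 1.0, -0.5, 200_000
X = rng.normal(theta1, 1.0, n)                  # assumed Var = 1
Y = rng.normal(theta2, np.sqrt(2.0), n)         # assumed Var = 2
Z = rng.normal(theta1 + theta2, 2.0, n)         # assumed Var = 4

xb, yb, zb = X.mean(), Y.mean(), Z.mean()
theta1_hat = (6 * xb - yb + zb) / 7
theta2_hat = (-2 * xb + 5 * yb + 2 * zb) / 7
print(theta1_hat, theta2_hat)                   # close to (1.0, -0.5)
```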
2. Take $n = 1$. The vector of scores is
$$
\frac{\partial \ln f(\theta_1, \theta_2)}{\partial \theta}
= \frac{\partial}{\partial \theta}\Big[\underbrace{c}_{\text{constant}} - \tfrac{1}{2}\big((X-\theta_1)^2 + \tfrac{1}{2}(Y-\theta_2)^2 + \tfrac{1}{4}(Z-\theta_1-\theta_2)^2\big)\Big]
= \begin{pmatrix} (X - \theta_1) + \tfrac{1}{4}(Z - \theta_1 - \theta_2) \\[4pt] \tfrac{1}{2}(Y - \theta_2) + \tfrac{1}{4}(Z - \theta_1 - \theta_2) \end{pmatrix}.
$$
Since the score is linear in $(\theta_1, \theta_2)$, minus the Hessian is constant, so the information matrix and its inverse are
$$
I(\theta_1, \theta_2) = \begin{pmatrix} \tfrac{5}{4} & \tfrac{1}{4} \\[2pt] \tfrac{1}{4} & \tfrac{3}{4} \end{pmatrix},
\qquad
I^{-1} = \frac{1}{\tfrac{15}{16} - \tfrac{1}{16}} \begin{pmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\[2pt] -\tfrac{1}{4} & \tfrac{5}{4} \end{pmatrix}
= \frac{8}{7}\begin{pmatrix} \tfrac{3}{4} & -\tfrac{1}{4} \\[2pt] -\tfrac{1}{4} & \tfrac{5}{4} \end{pmatrix}
= \frac{1}{7}\begin{pmatrix} 6 & -2 \\ -2 & 10 \end{pmatrix}.
$$
Therefore
$$
\sqrt{n}\begin{pmatrix} \hat\theta_1 - \theta_1 \\ \hat\theta_2 - \theta_2 \end{pmatrix} \xrightarrow{d} N\big(0, I^{-1}\big) = N\Big(0, \frac{1}{7}\begin{pmatrix} 6 & -2 \\ -2 & 10 \end{pmatrix}\Big).
$$
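The asymptotic covariance can be checked by replicating the experiment many times, using the sampling distributions of the sample means (again under the model variances 1, 2 and 4 inferred from the log-likelihood, which is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
theta1, theta2, n, reps = 1.0, -0.5, 400, 100_000
Xbar = rng.normal(theta1, np.sqrt(1.0 / n), reps)             # X_bar ~ N(theta1, 1/n)
Ybar = rng.normal(theta2, np.sqrt(2.0 / n), reps)             # Y_bar ~ N(theta2, 2/n)
Zbar = rng.normal(theta1 + theta2, np.sqrt(4.0 / n), reps)    # Z_bar ~ N(theta1+theta2, 4/n)

t1 = (6 * Xbar - Ybar + Zbar) / 7
t2 = (-2 * Xbar + 5 * Ybar + 2 * Zbar) / 7
dev = np.sqrt(n) * np.column_stack([t1 - theta1, t2 - theta2])
print(np.cov(dev, rowvar=False))                   # ~[[0.857, -0.286], [-0.286, 1.429]]
print(np.array([[6.0, -2.0], [-2.0, 10.0]]) / 7)   # the claimed limit I^{-1}
```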
Answer:
1. Unbiased: $E\hat\theta = E\bar X = E\Big(\frac{\sum_{i=1}^n X_i}{n}\Big) = \frac{n\, EX}{n} = EX = \theta$, since the density $f(x;\theta)$ is symmetric about $\theta$.
2. We have $-\frac{\partial^2}{\partial \theta^2}\ln f(x;\theta) = 2 f(x;\theta)$ (obtained by direct computation). Thus
$$
I(\theta) = E\Big[-\frac{\partial^2}{\partial \theta^2}\ln f(X;\theta)\Big] = 2\, E f(X;\theta) = 2\int_{-\infty}^{\infty} f(x;\theta)\cdot f(x;\theta)\, dx = 2\int_{-\infty}^{\infty} f^2(x;\theta)\, dx
$$
$$
= 2\int_{-\infty}^{\infty} \frac{\big(e^{-x+\theta}\big)^2}{(1+e^{-x+\theta})^4}\, dx
= 2\int_{-\infty}^{\infty} \frac{-e^{-x+\theta}}{(1+e^{-x+\theta})^4}\, d e^{-x+\theta}
= 2\int_{\infty}^{0} \frac{-t\, dt}{(1+t)^4}
= 2\int_{0}^{\infty} \frac{t\, dt}{(1+t)^4}
$$
$$
= 2\int_{0}^{\infty} t\, d\Big(-\frac{(1+t)^{-3}}{3}\Big) = 2\int_0^\infty u\, dv = 2uv\Big|_0^{\infty} - 2\int_0^{\infty} v\, du
= -\frac{2t}{3(1+t)^3}\Big|_0^{\infty} - 2\int_0^{\infty} \Big(-\frac{(1+t)^{-3}}{3}\Big)\, dt
$$
$$
= 0 - 2\,\frac{(1+t)^{-2}}{6}\Big|_0^{\infty} = \frac{1}{3}.
$$
The lower bound on the variance of unbiased estimators of $\theta$ equals $\frac{1}{n} I^{-1}(\theta) = \frac{3}{n}$. The estimator $\hat\theta = \bar X$ is the best unbiased estimator (efficient) if its variance equals the Cramér-Rao lower bound. However, $\operatorname{Var}\hat\theta = \frac{\pi^2}{3n} > \frac{3}{n}$, so $\hat\theta$ does not attain the bound and hence is not efficient.
Remark: Note that the lower bound might not be achievable, i.e. the best unbiased (efficient) estimator might not exist. Also note that, strictly speaking, the above analysis does not allow us to conclude anything about the optimality of $\hat\theta$ within the class of all unbiased estimators.
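Both numbers in this comparison can be reproduced numerically: $I(\theta) = 2\int f^2(x;\theta)\, dx = 1/3$, and $\operatorname{Var}\bar X = \pi^2/(3n) > 3/n$ (a sketch assuming numpy/scipy; $\theta = 0.7$ and $n = 50$ are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import logistic

theta = 0.7
f = lambda x: logistic.pdf(x, loc=theta)        # e^{-(x-theta)} / (1 + e^{-(x-theta)})^2

# Fisher information: I(theta) = 2 * integral of f^2 (tails beyond +-60 are negligible).
info, _ = quad(lambda x: 2 * f(x)**2, theta - 60, theta + 60)
print(info)                                     # ~0.3333 = 1/3

# Variance of the sample mean vs the Cramer-Rao bound 3/n.
n = 50
print(logistic(loc=theta).var() / n, 3 / n)     # ~0.0658 > 0.06, the bound is not attained
```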