
Proba/Stats QEM: B. Estimation

Christophe Chorro ([email protected])

References: Bernard Lindgren, Statistical Theory, sections 8.1, 8.2, 7.7, 7.10, 2.8, 2.3, 8.7; Arthur Goldberger, A Course in Econometrics, sections 11.1-11.3, 12.1, 12.2.

November 2019

Introduction

We consider a random variable X (discrete or continuous) whose distribution depends on a parameter θ (scalar or vector-valued).
Given an N-sample (X1, ..., XN) from the distribution of X, we want to approximate θ.

Definition
[B.1] An estimator θ̂(X1, ..., XN) (denoted by θ̂) is a regular function of (X1, ..., XN).

The objective of this course is to find such a θ̂ with good properties.

Introduction

Definition
[B.2] An estimator θ̂ of θ is unbiased if

E[θ̂] − θ = 0,

where the quantity E[θ̂] − θ is called the bias of θ̂.

To compare estimators we will use the Mean Squared Error (MSE):

MSE(θ̂) = E[(θ̂ − θ)²] = Var[θ̂] + (E[θ̂] − θ)².

Rk: When an estimator is unbiased, MSE(θ̂) = Var[θ̂].
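To make the decomposition concrete, here is a minimal Monte Carlo sketch (not from the slides; the distribution, sample size and number of replications are arbitrary choices) comparing the MSE of two estimators of µ, the sample mean and the single observation X1, and checking that MSE ≈ Var + Bias² for each:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 2.0, 3.0, 25, 100_000   # assumed toy values

    # reps independent samples of size n, and both estimators computed on each sample
    samples = rng.normal(mu, sigma, size=(reps, n))
    estimators = {"sample mean": samples.mean(axis=1),   # X̄_n
                  "first obs":   samples[:, 0]}          # crude estimator: X_1 alone

    for name, est in estimators.items():
        bias = est.mean() - mu
        var = est.var()
        mse = np.mean((est - mu) ** 2)
        # MSE should equal Var + Bias^2 up to Monte Carlo error
        print(f"{name:11s} bias={bias:+.4f} var={var:.4f} mse={mse:.4f} var+bias^2={var + bias ** 2:.4f}")

Both estimators are unbiased here, so their MSEs reduce to their variances (about σ²/n = 0.36 versus σ² = 9), which is the sense in which the sample mean is the more precise of the two.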

Introduction

Definition
[B.3] a) If θ̂ is an unbiased estimator of θ, we say that θ̂ is efficient if its variance is minimal among all unbiased estimators of θ.
b) We say that θ̂ is the BUE (Best Unbiased Estimator) if it is unbiased and efficient.
c) We say that θ̂ is the BLUE (Best Linear Unbiased Estimator) if it is unbiased, linear in (X1, ..., XN) and has minimal variance among the linear unbiased estimators.

Ex: Sample mean

From a random sample X1, ..., Xn in L² we want to know µ = E[X1].

We use the sample mean X̄n and we have:

E[X̄n] = µ,
Var[X̄n] = Var[X]/n,

where Var[X̄n] is, in this case, the precision in terms of the mean squared error (MSE):

MSE = E[(X̄n − µ)²] = Var[X̄n] + (E[X̄n] − µ)²,

with the bias term E[X̄n] − µ equal to 0 here.

Ex: Ideal Sample variance

From a random sample X1, ..., Xn in L⁴ we want to know σ² = Var[X1]. Since

Var[X] = E[(X − E[X])²],

we may use

σ̃n² = (1/n) ∑_{i=1}^{n} (Xi − µ)²

and we have:

E[σ̃n²] = σ²,
Var[σ̃n²] = (µ₄ − σ⁴)/n, where µ₄ = E[(X − E[X])⁴].

BUT in general µ is unknown...

Ex: Corrected Sample variance
Another possible approximation is the sample variance

σn² = (1/n) ∑_{i=1}^{n} (Xi − X̄n)²,

but E[σn²] = ((n − 1)/n) σ².

Definition
[A.4] Corrected sample variance:

sn² = (n/(n − 1)) σn².

Theorem
[A.1]

Var[sn²] = 2σ⁴/(n − 1) + (µ₄ − 3σ⁴)/n.

Rk: Var[sn²] > Var[σ̃n²].
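A minimal simulation sketch (not from the slides; the Gaussian model and sample size are arbitrary choices) illustrating the bias of σn², the unbiasedness of σ̃n² and sn², and the remark Var[sn²] > Var[σ̃n²]:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma2, n, reps = 0.0, 4.0, 10, 200_000   # assumed toy values
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

    sigma_tilde2 = np.mean((x - mu) ** 2, axis=1)   # ideal sample variance (uses the true mean)
    sigma_n2 = x.var(axis=1, ddof=0)                # sample variance, biased by the factor (n-1)/n
    s_n2 = x.var(axis=1, ddof=1)                    # corrected sample variance

    for name, est in [("ideal", sigma_tilde2), ("biased", sigma_n2), ("corrected", s_n2)]:
        print(f"{name:9s} mean={est.mean():.3f} (target {sigma2}) var={est.var():.3f}")

For Gaussian data µ₄ = 3σ⁴, so Theorem [A.1] gives Var[sn²] = 2σ⁴/(n − 1) ≈ 3.56 here, against Var[σ̃n²] = 2σ⁴/n = 3.2, in line with the remark.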

Method of moments
Here, θ is related to moments of the distribution of X. We estimate θ using the corresponding sample moments.

Definition
[A.3] Let X1, ..., Xn be a random sample from the density f with X1 ∈ L^r. We define:

1) The r-th sample moment

Mr = (1/n) ∑_{i=1}^{n} Xi^r,

and we denote by X̄n the sample mean M1.

2) The r-th sample moment about X̄n

M'r = (1/n) ∑_{i=1}^{n} (Xi − X̄n)^r,

and we denote by σn² the sample variance M'2.

Ex: When X ∼ P(θ) (Poisson), θ may be estimated using X̄N or sN², as in the sketch below.
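A minimal numerical sketch of the Poisson example (not from the slides; the true θ and sample size are arbitrary): since E[X] = Var[X] = θ for X ∼ P(θ), matching either the first moment or the variance gives a method-of-moments estimator.

    import numpy as np

    rng = np.random.default_rng(2)
    theta, N = 3.5, 1_000                 # assumed true parameter and sample size
    x = rng.poisson(theta, size=N)

    theta_hat_mean = x.mean()             # matches the first moment: X̄_N
    theta_hat_var = x.var(ddof=1)         # matches the variance: s_N²
    print(theta_hat_mean, theta_hat_var)  # both estimates should be close to 3.5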


The likelihood problem

Suppose that the distribution of X belongs to a known family, except for the parameter of interest θ.

Ex: For θ > 0, P(X = k, θ) = e^{−θ} θ^k / k!  or  fX(x, θ) = θ e^{−θx} 1_{x≥0}.

We assume known:

• Parameter space: θ = (θ1, ..., θd) ∈ Θ ⊂ R^d.

• Density family for X: {f(x, θ) | θ ∈ Θ} (continuous case) or {P(X = k, θ) | θ ∈ Θ} (discrete case).

• The true value θ0 of the parameter is not known.

Rk: We have fX(x) = f(x, θ0) or P(X = x) = P(X = x, θ0).

The likelihood problem

Proposition
[B.1] a) The true parameter fulfills

θ0 = Argmax_{θ∈Θ} E[log(f(X, θ))].

b) The associated first order condition is

E[ ∂log(f(X, θ0))/∂θ ] = 0,

where z(X, θ) = ∂log(f(X, θ))/∂θ is called the score function.
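As a concrete check (not on the slide), take the exponential family from the previous slide, f(x, θ) = θe^{−θx}, with true parameter θ0. Then E[log(f(X, θ))] = log θ − θ E[X] = log θ − θ/θ0; its derivative in θ is 1/θ − 1/θ0, which vanishes exactly at θ = θ0, and the second derivative −1/θ² < 0 confirms a maximum. Equivalently, the score is z(X, θ) = 1/θ − X and E[z(X, θ0)] = 1/θ0 − 1/θ0 = 0, which is the first order condition of b).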

Maximum Likelihood estimation
Starting from a sample (X1, ..., XN) of X, we look for the sample counterpart of the likelihood problem, using

E[log(f(X, θ))] ≈ (1/N) ∑_{i=1}^{N} log(f(Xi, θ)).

Definition
[B.4] a) Likelihood function: LN(θ) = ∏_{i=1}^{N} f(Xi, θ).

b) Log-likelihood function: ℓN(θ) = log(LN(θ)).

c) Maximum Likelihood estimator:

θ̂MLE = Argmax_{θ∈Θ} (1/N) ∑_{i=1}^{N} log(f(Xi, θ)) = Argmax_{θ∈Θ} ℓN(θ).

Rk: We also write ℓN(x, θ) = ∑_{i=1}^{N} log(f(xi, θ)).
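A minimal numerical sketch (not from the slides; the true parameter, sample size and optimiser are arbitrary choices) for the exponential density f(x, θ) = θe^{−θx}: here the MLE has the closed form θ̂MLE = 1/X̄N, which the numerical maximisation of the log-likelihood should reproduce.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(3)
    theta0, N = 2.0, 500                           # assumed true parameter and sample size
    x = rng.exponential(scale=1 / theta0, size=N)  # exponential sample with rate theta0

    def neg_mean_log_likelihood(theta):
        # -(1/N) * sum_i log f(x_i, theta), with log f(x, theta) = log(theta) - theta * x
        return -(np.log(theta) - theta * x.mean())

    res = minimize_scalar(neg_mean_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
    print(res.x, 1 / x.mean())                     # numerical MLE vs closed form 1/X̄_N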
Maximum Likelihood estimation

To find θ̂MLE, we have to solve the first order condition

∑_{i=1}^{N} ∂log(f(Xi, θ))/∂θ = 0,

i.e. the sum of the scores z(Xi, θ) must vanish.

Ex: Gaussian distribution.
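The Gaussian example works out as follows (a standard computation, spelled out here for completeness). For f(x, (µ, σ²)) = (2πσ²)^{−1/2} e^{−(x−µ)²/(2σ²)}, the log-likelihood is

ℓN(µ, σ²) = −(N/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{N} (Xi − µ)².

Setting ∂ℓN/∂µ = 0 and ∂ℓN/∂σ² = 0 gives

µ̂MLE = X̄N,  σ̂²MLE = (1/N) ∑_{i=1}^{N} (Xi − X̄N)² = σN²,

i.e. the sample mean and the (uncorrected) sample variance of the earlier slides.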

Cramer-Rao Lower bound (scalar case)

Here we suppose Θ ⊂ R and we ask whether θ̂MLE is efficient.

Definition
[B.5] We define the Fisher information of the model by

I(θ0) = Var[ ∂ℓN(θ0)/∂θ ] = E[ (∂ℓN(θ0)/∂θ)² ],

where ∂ℓN(θ0)/∂θ is the score; the two expressions coincide because the score has zero expectation at θ0 (Proposition [B.1]).

Proposition
[B.2]

I(θ0) = −E[ ∂²ℓN(θ0)/∂θ² ].
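A concrete computation (not on the slide), for the Poisson family P(X = k, θ) = e^{−θ} θ^k / k!: log P(X = k, θ) = −θ + k log θ − log k!, so the score per observation is −1 + X/θ and its variance at θ0 is Var[X]/θ0² = 1/θ0. Summing over the N-sample, I(θ0) = N/θ0. The proposition gives the same answer: ∂²log P(X = k, θ)/∂θ² = −k/θ², and −E[−X/θ0²] = 1/θ0 per observation.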

Cramer-Rao Lower bound (scalar case)

Theorem
[B.1] (Cramer-Rao Lower Bound) If T(X1, ..., XN) is an unbiased estimator of θ0, then

MSE(T) = Var(T) ≥ I(θ0)⁻¹.

In particular, if Var(T) = I(θ0)⁻¹, then T is a BUE.

Theorem
[B.2] (Admitted) If a BUE exists, it is the MLE.
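Continuing the Poisson computation from the previous slide (not on the slide itself): X̄N is unbiased for θ0 with Var(X̄N) = θ0/N = I(θ0)⁻¹, so the sample mean attains the Cramer-Rao bound and is a BUE; it is also the MLE of θ0, in line with Theorem [B.2].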

Cramer-Rao Lower bound (Θ ⊂ R^d)

Theorem
[B.3] (Cramer-Rao Lower Bound) If T(X1, ..., XN) is an unbiased estimator of θ0 ∈ R^d, then

Var(T) ≥ I(θ0)⁻¹

in the sense of symmetric matrices, where Var(T) is the variance-covariance matrix of T, I(θ0)⁻¹ is the inverse of the Fisher information matrix, and

I(θ0) = −E[ ( ∂²ℓN(θ0)/∂θi∂θj )_{1≤i,j≤d} ],

i.e. minus the expectation of the Hessian matrix of ℓN.

Ex: Gaussian distribution
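A standard version of the Gaussian example (the slide only names it): for X ∼ N(µ, σ²) with θ = (µ, σ²), the Fisher information of the N-sample is the diagonal matrix

I(µ, σ²) = N · diag(1/σ², 1/(2σ⁴)),

so the Cramer-Rao bound for unbiased estimators of µ is σ²/N, attained by X̄N, while the bound for unbiased estimators of σ² is 2σ⁴/N; the corrected sample variance has variance 2σ⁴/(N − 1) (Theorem [A.1] with µ₄ = 3σ⁴), slightly above that bound.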

A look on Bayesian estimation
Reminder (Bayes formula): P(A | B) = P(B | A) P(A) / P(B).

We have an N-sample (X1, ..., XN) of X and the parameter of interest is θ ∈ Θ.

• The likelihood function remains LN(x, θ) = ∏_{i=1}^{N} f(xi, θ) and may be seen as the density of (X1, ..., XN) given θ.

• We suppose an a priori distribution for θ ∈ Θ and we call g the associated density function.

Using the analogy with the Bayes formula, we deduce an a posteriori distribution for θ:

h(θ | (X1, ..., XN)) = K · LN(θ) g(θ),

where K is the constant that makes ∫ h(θ | (X1, ..., XN)) dθ = 1.
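A standard conjugate illustration (not from the slides): for Bernoulli observations with f(x, θ) = θ^x (1 − θ)^{1−x} on Θ = (0, 1) and a Beta(a, b) prior g(θ) ∝ θ^{a−1}(1 − θ)^{b−1}, the product LN(θ) g(θ) is proportional to θ^{a+S−1}(1 − θ)^{b+N−S−1} with S = ∑ Xi, so the a posteriori distribution is again a Beta distribution, Beta(a + S, b + N − S), and K is the reciprocal of the corresponding Beta normalizing constant.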

A look on Bayesian estimation

Once the a posteriori distribution is obtained, we can derive point estimates of θ by minimizing or maximizing some criterion:

• The mean of the a posteriori distribution minimizes the expected quadratic loss (EQL) c ↦ ∫_{−∞}^{+∞} (c − θ)² h(θ | X) dθ.

• The median of the a posteriori distribution minimizes the expected absolute loss (EAL) c ↦ ∫_{−∞}^{+∞} |c − θ| h(θ | X) dθ.

• The mode of the a posteriori distribution maximizes h(θ | X).
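A minimal numerical sketch (using the assumed Beta-Bernoulli posterior of the previous example; the prior parameters and data below are arbitrary) computing the three point estimates:

    import numpy as np
    from scipy.stats import beta

    a, b = 2.0, 2.0                                 # assumed Beta(a, b) prior
    x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])    # assumed Bernoulli sample
    S, N = x.sum(), x.size

    post = beta(a + S, b + N - S)                   # a posteriori distribution: Beta(a+S, b+N-S)
    post_mean = post.mean()                         # minimizes the expected quadratic loss
    post_median = post.median()                     # minimizes the expected absolute loss
    post_mode = (a + S - 1) / (a + b + N - 2)       # maximizes h(θ | X) (valid since a+S, b+N-S > 1)
    print(post_mean, post_median, post_mode)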

