
ESTIMATION

BUDDHANANDA BANERJEE

Date: Last updated April 23, 2024.

Let x = (x_1, x_2, ..., x_n) be the observed/realized values of a set of i.i.d. random variables X = (X_1, X_2, ..., X_n), where X_i ∼ f_θ for some θ ∈ Θ. Here the family of distributions is denoted by

F = { f(x|θ) : θ ∈ Θ } or { F(x|θ) : θ ∈ Θ }.

Parametric Estimation: In a parametric inference problem it is assumed that the family of distributions is known but the particular value of the parameter is unknown. We estimate the value of the parameter θ as a function of the observations x. The ultimate goal is to approximate the p.d.f. f_θ (or the c.d.f. F_θ) through the estimation of θ itself. Parametric estimation has two aspects, namely:

 Point estimation
  (a) Definition of an estimator
  (b) Good properties of an estimator
  (c) Methods of estimation (MME and MLE)
 Interval estimation
  (a) Definition of a confidence interval
  (b) Construction of a confidence interval

Definition 1. Statistic: A statistic is a function of the random variables that is free of any unknown parameter. Being a (measurable) function of random variables, T(X) say, it is itself a random variable.

Definition 2. Estimator: If the statistic T(X) is used to estimate a parametric function g(θ), then T(X) is said to be an estimator of g(θ), and a realized value of it for X = x, i.e. T(x), is known as an estimate of g(θ). We often abuse notation by writing ĝ(θ) = T(x) and ĝ(θ) = T(X); which is meant is understood from the context.

1. Properties

Definition 3. Unbiased estimator: An estimator T(X) is said to be an unbiased estimator of a parametric function g(θ) if E(T(X) − g(θ)) = 0 ∀ θ ∈ Θ.

Remark 1. This does not require T(x) = g(θ) to hold; indeed, that event may have probability zero.

Definition 4. Asymptotically unbiased estimator: Writing T_n = T(X_1, X_2, ..., X_n), an estimator T_n is said to be asymptotically unbiased for g(θ) if

lim_{n→∞} B_{g(θ)}(T_n) = lim_{n→∞} E(T_n − g(θ)) = 0.
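For example (this is exactly what the simulation figures below illustrate), X̄ is unbiased for µ, while X̄² is only asymptotically unbiased for µ²: since E(X̄) = µ,

B_{µ²}(X̄²) = E(X̄²) − µ² = Var(X̄) = σ²/n → 0 as n → ∞.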
[Figure: histograms of the sample mean (m1, m2, m3) for sample sizes 25, 50, and 500, with their overlaid densities ("Unbiasedness", N = 40000 replications); every sampling distribution is centered at the true mean.]
[Figure: histograms of the squared sample mean (m1^2, m2^2, m3^2) for sample sizes 25, 50, and 500, with their densities ("Asymptotic Unbiasedness"); the bias shrinks toward 0 as the sample size grows.]
Definition 5. Consistent estimator: An estimator T_n is said to be a consistent estimator of g(θ) if T_n converges in probability to g(θ), i.e.

lim_{n→∞} P(|T_n − g(θ)| < ε) = 1 ∀ θ ∈ Θ, ε > 0.

[Figure: ten cumulative-mean paths over n = 1, ..., 750 converging to the true mean within a ±3/√n envelope.]

2. Accuracy Measures

Definition 6. Bias: The bias of an estimator T(X) while estimating a parametric function g(θ) is B_{g(θ)}(T(X)) = E(T(X) − g(θ)) ∀ θ ∈ Θ.

Definition 7. Mean squared error (MSE): The MSE of an estimator T(X) while estimating a parametric function g(θ) is

MSE_{g(θ)}(T(X)) = E[(T(X) − g(θ))²] ∀ θ ∈ Θ.

Remark 2. MSE_{g(θ)}(T(X)) = Var(T(X)) + B²_{g(θ)}(T(X)).

Remark 3. If MSE_{g(θ)}(T_n(X)) ↓ 0 as n ↑ ∞, then T_n(X) is a consistent estimator of g(θ).
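Remark 3 follows from Markov's inequality applied to (T_n − g(θ))²: for any ε > 0,

P(|T_n − g(θ)| ≥ ε) = P((T_n − g(θ))² ≥ ε²) ≤ E[(T_n − g(θ))²]/ε² = MSE_{g(θ)}(T_n(X))/ε² → 0 as n → ∞.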

Remark 4. Asymptotic unbiasedness and consistency are large-sample properties, and both are based on the L¹ norm; MSE, by contrast, is defined through the L² norm.

Exercise 1. Let (X_1, X_2, ..., X_n) be i.i.d. random variables with E(X) = µ and Var(X) = σ², and define T_n(X) = X̄ = (1/n) Σ_{i=1}^n X_i, S_1² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)², and S_2² = (1/n) Σ_{i=1}^n (X_i − X̄)². Show that

 T_n(X) is an unbiased estimator of µ;
 S_1² is an unbiased estimator of σ²;
 S_2² is an asymptotically unbiased estimator of σ².

[Figure: densities of the sample mean for sample sizes 25, 50, and 500 concentrating around the true mean ("Consistency").]

Remark 5. Let (X_1, X_2, ..., X_n) be i.i.d. N(µ, σ²); then MSE(S_2²) < MSE(S_1²). An unbiased estimator need not have minimum MSE.
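For the normal case the comparison in Remark 5 can be checked in closed form. Using E(S_2²) = ((n−1)/n)σ² and, for normal samples, Var(S_1²) = 2σ⁴/(n−1),

MSE(S_1²) = 2σ⁴/(n−1),  MSE(S_2²) = ((n−1)/n)² · 2σ⁴/(n−1) + σ⁴/n² = (2n−1)σ⁴/n²,

and since (2n−1)(n−1) = 2n² − 3n + 1 < 2n², we get MSE(S_2²) < MSE(S_1²) for every n ≥ 2.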
3. Method of Moments

Method of Moments Estimation (MME): Consider x = (x_1, x_2, ..., x_n) to be the observed/realized values of a set of i.i.d. random variables X = (X_1, X_2, ..., X_n), where X_i ∼ f_θ for some θ ∈ Θ. Then:

Step 1: Compute the theoretical moments from the p.d.f.
Step 2: Compute the empirical moments from the data.
Step 3: Construct k equations if there are k unknown parameters.
Step 4: Solve the equations for the parameters.
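For instance, for the N(µ, σ²) family the first two moment equations and their solutions are

µ = (1/n) Σ_{i=1}^n x_i ⇒ µ̂ = x̄,  µ² + σ² = (1/n) Σ_{i=1}^n x_i² ⇒ σ̂² = (1/n) Σ_{i=1}^n x_i² − x̄².

(The R code at the end uses sd(x), whose n − 1 divisor differs from this exact MME solution by a factor of (n−1)/n.)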
## True mean= 1.3 estimated mean= 1.246376
## True sigma= 2 estimated sigma= 2.021477

[Figure: histogram of x with the true ("Tru") and estimated ("Est") normal densities, and the corresponding true vs. estimated CDFs.]

Remark 6. We cannot use MME to estimate the parameters of C(µ, σ), because the moments do not exist for the Cauchy distribution.
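A quick way to see the trouble (a minimal sketch, not part of the original code; the location and scale values are illustrative): the running mean of Cauchy data never settles down, so the empirical moments do not estimate anything.

set.seed(3)
xc <- rcauchy(10000, location = 1, scale = 2)   # C(mu, sigma) sample
plot(cumsum(xc)/seq_along(xc), type = 'l',
     ylab = "running mean")                     # keeps jumping; no law of large numbers here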
4. Maximum Likelihood Estimation

Maximum Likelihood Estimator: Consider x = (x_1, x_2, ..., x_n) to be the observed/realized values of a set of i.i.d. random variables X = (X_1, X_2, ..., X_n), where X_i ∼ f_θ for some θ ∈ Θ. The joint p.d.f. of X = (X_1, X_2, ..., X_n) is a function of x when the parameter value is fixed, i.e.

f(x|θ) = Π_{i=1}^n f(x_i, θ),

whereas the likelihood is a function of the parameter for a given set of data X = x, i.e.

ℓ(θ|x) = Π_{i=1}^n f(x_i, θ).

Hence the maximum likelihood estimator of θ is

θ̂_mle = arg max_{θ ∈ Θ} ℓ(θ|x) = arg max_{θ ∈ Θ} log ℓ(θ|x).

[Figure: "MLE" — likelihood curves over θ ∈ [0, 10] for several data sets (legend values 5.45, 5.0, 6.5, 3.5, 1.5).]

Remark 7. Finding the maximum through differentiation is possible only if ℓ is a smoothly differentiable function w.r.t. θ; otherwise it has to be maximized by some other method. Differentiation is not the only way of finding maxima or minima.

Properties of MLE:
 The MLE need not be unique.
 The MLE need not be an unbiased estimator.
 The MLE is a consistent estimator (under standard regularity conditions).
 The MLE is asymptotically normally distributed, up to some location and scale, when regularity conditions such as the following are satisfied:
  (1) the range of the random variable is free of the parameter;
  (2) the likelihood is smoothly differentiable up to the third order and the corresponding expectations exist.
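For the binomial illustration below (N = 50 observations, each Binomial(n, p) with n = 10 known), the MLE has a closed form against which the grid search can be checked:

log ℓ(p|x) = Σ_{i=1}^N [ x_i log p + (n − x_i) log(1 − p) ] + const,

and setting the derivative to zero gives p̂ = x̄/n. Both grid searches (over the likelihood and over the log-likelihood) must return the same maximizer, as the output below confirms.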
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 4 6 6 6 7 7 8 8 8 9
## [2,] 5 6 6 7 7 7 8 8 8 9
## [3,] 5 6 6 7 7 7 8 8 9 9
## [4,] 5 6 6 7 7 8 8 8 9 10
## [5,] 5 6 6 7 7 8 8 8 9 10

[Figure: the likelihood l(p) (values of order 4e−38) and the log-likelihood L(p) (values between about −1400 and −600) plotted against p ∈ (0, 1); both are maximized at the same point.]

## MLE1= 0.72 MLE2= 0.72

Interval Estimation: Consider a pair of statistics (L(X), U(X)) such that, for a parameter θ,

P_θ(θ ∈ [L(X), U(X)]) = 1 − α.

Then [L(X), U(X)] is said to be a 100(1 − α)% confidence interval of θ.

Example 1. If X_1, X_2, ..., X_n are i.i.d. random variables with the N(µ, σ²) distribution and σ² is known, then a 100(1 − α)% CI of µ is

[ L(X) = X̄ − (σ/√n) z_{α/2},  U(X) = X̄ + (σ/√n) z_{α/2} ].

The true mean need not always lie inside the confidence interval.

[Figure: "Confidence interval" — 50 simulated confidence intervals for µ; the occasional interval misses the true mean.]
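A minimal R sketch of the interval in Example 1, with illustrative (assumed) values of mu, sigma, and n, together with a check of the empirical coverage:

##### CI with known sigma: a sketch #####
set.seed(1)
mu <- 0; sigma <- 1; n <- 30; alpha <- 0.05
z <- qnorm(1 - alpha/2)                         # z_{alpha/2}
x <- rnorm(n, mu, sigma)
c(L = mean(x) - z*sigma/sqrt(n),
  U = mean(x) + z*sigma/sqrt(n))
# empirical coverage over 10000 replications (close to 1 - alpha)
mean(replicate(10000, abs(mean(rnorm(n, mu, sigma)) - mu) <= z*sigma/sqrt(n)))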
Example 2. If X_1, X_2, ..., X_n are i.i.d. random variables with the N(µ, σ²) distribution and σ² is unknown, then a 100(1 − α)% CI of µ is

[ L(X) = X̄ − (σ̂_u/√n) t_{α/2, n−1},  U(X) = X̄ + (σ̂_u/√n) t_{α/2, n−1} ],

where σ̂_u² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² is an unbiased estimator of the unknown variance and t_{α/2, n−1} is the upper α/2 point of the t distribution with n − 1 degrees of freedom. Likewise, a 100(1 − α)% CI of σ² is

[ L(X) = Σ_{i=1}^n (X_i − X̄)² / χ²_{α/2, n−1},  U(X) = Σ_{i=1}^n (X_i − X̄)² / χ²_{1−α/2, n−1} ],

with χ²_{α/2, n−1} the upper α/2 point of the χ² distribution with n − 1 degrees of freedom.
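A minimal R sketch of both intervals in Example 2, assuming simulated data (the values of the mean, sd, and n are illustrative):

##### CI with unknown sigma: a sketch #####
set.seed(2)
x <- rnorm(30, mean = 1, sd = 2)
n <- length(x); alpha <- 0.05
t.test(x, conf.level = 1 - alpha)$conf.int   # t-based CI for mu
ss <- sum((x - mean(x))^2)
c(L = ss / qchisq(1 - alpha/2, n - 1),       # CI for sigma^2
  U = ss / qchisq(alpha/2, n - 1))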
R Code

################
##### Data #####
################
nn <- 1000000               # total number of simulated values
mu <- 0                     # true mean
sd <- 1                     # true standard deviation
x  <- rnorm(nn, mu, sd)     # one long N(0, 1) sample
n1 <- 25
n2 <- 50
n3 <- 500
x1 <- matrix(x, nrow = n1)  # 40000 samples of size 25, one per column
x2 <- matrix(x, nrow = n2)  # 20000 samples of size 50
x3 <- matrix(x, nrow = n3)  # 2000 samples of size 500

#####Estimation of MEAN#######
##### Unbiased estimator #####
m1<-apply(x1, MARGIN = 2, mean)
m2<-apply(x2, MARGIN = 2, mean)
m3<-apply(x3, MARGIN = 2, mean)
par(mfrow=c(1,1))
tt<-paste("sample size",n1)
hist(m1,probability = T, col="lightblue", main = tt, breaks = 50, xlim=c(-0.7,0.7))
abline(v=mu, col=2, lwd=2)
points(mean(m1),0, col="blue")
tt<-paste("sample size",n2)

hist(m2,probability = T, col="lightgreen", main = tt,breaks = 20, xlim=c(-0.7,0.7))
abline(v=mu, col=2, lwd=2)
points(mean(m2),0, col="blue")
tt<-paste("sample size",n3)

hist(m3,probability = T, col="lightgray", main = tt,breaks = 20, xlim=c(-0.7,0.7))


abline(v=mu, col=2, lwd=2)
points(mean(m3),0, col="blue")
plot(density(m1), col="blue", ylim=c(0,10), main='Unbiasedness')
lines(density(m2), col="green")
lines(density(m3), col="black")
abline(v=mu, col=2, lwd=2)

##### Estimation of square of MEAN#########


##### Asymptotically Unbiased estimator ###

par(mfrow=c(1,1))
tt<-paste("sample size",n1)
hist(m1^2,probability = T, col="lightblue", main = tt, breaks = 50, xlim=c(-0.1,0.4))
abline(v=mu, col=2, lwd=2)
points(mean(m1^2),0, col="blue")
tt<-paste("sample size",n2)
hist(m2^2,probability = T, col="lightgreen", main = tt,breaks = 20, xlim=c(-0.1,0.4))
abline(v=mu, col=2, lwd=2)
points(mean(m2^2),0, col="blue")
tt<-paste("sample size",n3)

hist(m3^2,probability = T, col="lightgray", main = tt,breaks = 20, xlim=c(-0.1,0.4))
abline(v=mu, col=2, lwd=2)
points(mean(m3^2),0, col="blue")

plot(density(m1^2), xlim=c(-0.05,0.2), main="Asymptotic Unbiasedness")

points(mean(m1^2),0, col="blue")
points(mean(m2^2),0, col="green")
points(mean(m3^2),0, col="black")
abline(v=mu, col=2, lwd=2)  # true value of mu^2 is 0 here

##### Estimation of MEAN#########


##### Consistency ###

n<-750
xsample<-sample(x,n,replace = F)
cummean<-cumsum(xsample)/(1:n)
plot(cummean,type='l',col="gray",ylim=c(-1,1) )
abline(h=mu,col=2,lwd=2,lty=2)
for (i in 2:10) {
  xsample <- sample(x, n, replace = F)
  cummean <- cumsum(xsample)/(1:n)
  lines(cummean, type='l', col="gray")
}
abline(h=mu,col=2,lwd=2,lty=2)
abline(v=25, col="blue")

abline(v=50, col="green")
abline(v=500, col="black")
lines(-3/sqrt(1:n))  # lower 3/sqrt(n) envelope
lines(3/sqrt(1:n))   # upper 3/sqrt(n) envelope

par(mfrow=c(1,1))
plot(density(m1), col="blue", ylim=c(0,10), main='Consistency')
lines(density(m2), col="green")
lines(density(m3), col="black")
abline(v=mu, col=2, lwd=2)

#########################
# Method of Moments
# Distribution: Normal
mu<-1.3 # mean
s<- 2 # sigma
n<- 200 # sample size
x<- rnorm(n,mean = mu,sd = s) # data
xmin<- min(x) # min of data
xmax<-max(x) # max data
l<- seq(xmin-0.5, xmax+0.5, length=100)
######### Estimation ##########
muh<-mean(x)
sh<-sd(x)
###############################
cat("True mean=", mu, "estimated mean=", muh,"\n")

cat("True sigma=", s, "estimated sigma=", sh,"\n")
###############################

plot(pnorm(q = l, mean = mu, sd = s)~l, type = 'l', col=1, lwd=2, ylab = "CDF", xlab = 'x')
lines(pnorm(q = l, mean = muh, sd = sh)~l, type = 'l', col=2, lwd=2)
#lines(ecdf(x), col=3, lty=2)
legend("bottomright", legend = c("True", "Estimated"), col = c(1,2), lwd = c(2,2))

hist(x, probability = T, xlab = 'x')
lines(dnorm(x = l, mean = mu, sd = s)~l, type = 'l', col=1, lwd=2)
lines(dnorm(x = l, mean = muh, sd = sh)~l, type = 'l', col=2, lwd=2)
legend("topright", legend = c("True", "Estimated"), col = c(1,2), lwd = c(2,2))

###########################
# MLE of binomial parameter
set.seed(12)
n <- 10                        # size of each binomial
x <- sort(rbinom(50, n, 0.7))  # given sample
x <- matrix(x, ncol = 10)
print(x)
print(x)

# MLE finding
p<-seq(0.01,0.99,by = 0.01)
l<-array(0,dim=c(length(p)))

for (i in 1:length(p)) {
  l[i] <- prod(dbinom(x, n, p[i]))  # likelihood: product of densities
}

plot(l~p, type='l', col=4, lwd=2)


mle1 <- p[which.max(l)]  # grid point maximizing the likelihood

abline(v=mle1)
L <- array(0, dim = c(length(p)))
for (i in 1:length(p)) {
  L[i] <- sum(log(dbinom(x, n, p[i])))  # log-likelihood: sum of log densities
}

plot(L~p,type='l', col=2, lwd=2)


mle2 <- p[which.max(L)]  # grid point maximizing the log-likelihood
abline(v=mle2)

cat("MLE1=",mle1,"MLE2=",mle2,"\n")

Department of Mathematics, IIT Kharagpur

URL: https://sites.google.com/site/buddhanandastat/
E-mail address: [email protected]