
Ministère de l’Enseignement Supérieur et de la Recherche Scientifique
Université de Carthage
Ecole Polytechnique de Tunisie

Statistics

Assignment 6

Developed by: Houcem Ben Salem

College year: 2019 – 2020

Rue Elkhawarezmi BP 743 La Marsa 2078
Tel: 71 774 611 / 71 774 699 | Fax: 71 748 843
Site Web: www.ept.rnu.tn

Exercise 1:
1- We know that $X_i \sim Po(\lambda)$, $i = 1, 2, \dots, n$, and because they are i.i.d. the joint probability mass function for a given set of realizations $x_1, x_2, \dots, x_n$ (i.e. the data) is:
$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} \frac{\lambda^{x_i}}{x_i!} e^{-\lambda}$$
The likelihood function can therefore be written as:
$$L(x_1, x_2, \dots, x_n \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i}}{x_i!} e^{-\lambda} = \frac{\lambda^{\sum x_i}}{\prod x_i!} e^{-n\lambda}$$
As seen in the course, the method of maximum likelihood consists in maximizing the likelihood function. Since our function is differentiable, the first-order condition for a maximum is that the first derivative with respect to $\lambda$ is zero.
To simplify the differentiation, we work with the logarithm of the likelihood, because it turns products into sums, and sums are easy to differentiate term by term.
The log-likelihood in our example is therefore:
$$\ln L = \sum x_i \ln(\lambda) - \ln(x_1!\, x_2! \cdots x_n!) - n\lambda$$
$$\frac{\partial \ln L}{\partial \lambda} = \frac{1}{\lambda} \sum x_i - n = 0$$
And we easily obtain the estimate $\hat{\lambda} = \frac{1}{n} \sum x_i = \bar{x}$.
To confirm that $\hat{\lambda}$ is a maximum of our function, the second derivative should be negative:
$$\frac{\partial^2 \ln L}{\partial \lambda^2} = -\frac{1}{\hat{\lambda}^2} \sum x_i = -\frac{n}{\hat{\lambda}} < 0$$
Now we can confirm that the arithmetic mean $\bar{x} = \hat{\lambda}$ is the maximum likelihood estimator for the parameter $\lambda$ of a Poisson distribution.
2- Thanks to the result of the first question, the log-likelihood function for $x_1=4$, $x_2=3$, $x_3=8$, $x_4=6$, $x_5=6$ follows from $\sum x_i = 27$:
$$\ln L = 27 \ln(\lambda) - \ln(4!\, 3!\, 8!\, 6!\, 6!) - 5\lambda$$
We can plot this function in R with the curve() command:
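A minimal sketch of this plot, using the five observations given above:

```r
# Log-likelihood of the observed Poisson sample (4, 3, 8, 6, 6) as a function of lambda
obs <- c(4, 3, 8, 6, 6)
loglik <- function(lambda) sum(obs) * log(lambda) - sum(lfactorial(obs)) - length(obs) * lambda

# curve() evaluates the expression over a grid of x values; here x plays the role of lambda
curve(loglik(x), from = 1, to = 12,
      xlab = expression(lambda), ylab = "log-likelihood")
abline(v = mean(obs), lty = 2)  # the maximum is at lambda_hat = mean(obs) = 5.4
```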

Exercise 2:
1- In this case we have $X_i \sim N(\mu, \sigma^2)$, $i = 1, 2, \dots, n$, i.i.d., and we follow the same approach as in the first exercise:
• The probability density function of a normal distribution is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
• Our likelihood function is: $L(x_1, x_2, \dots, x_n \mid \mu, \sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\!\left(-\sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}\right)$

As in the previous exercise, we differentiate the log-likelihood function and set the derivative equal to zero, because the log-function is much easier to work with:
• Our log-likelihood function is: $\ln L = -\frac{n}{2}\left(\ln(2\pi) + \ln(\sigma^2)\right) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}$
In this question $\sigma^2 = 1$.
• Differentiating the log-likelihood function gives us: $\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \frac{x_i-\mu}{\sigma^2} = \sum_{i=1}^{n} (x_i-\mu)$
⇒ $\frac{\partial \ln L}{\partial \mu} = 0$
⇒ $\sum_{i=1}^{n} (x_i-\mu) = 0$
⇒ $n\mu = \sum x_i$
⇒ $\hat{\mu} = \bar{x}$
2- It is easy to see that even if $\sigma^2 \neq 1$ and we have no information about its value, $\hat{\mu}$ is still equal to $\bar{x}$: the solution of $\frac{\partial \ln L}{\partial \mu} = \sum (x_i - \mu)/\sigma^2 = 0$ does not depend on $\sigma^2$, so we always have $\hat{\mu} = \bar{x}$ for any arbitrary $\sigma^2$.
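A small numerical check of this, using an assumed example sample: the normal log-likelihood is maximized numerically for two different values of $\sigma^2$, and both maxima coincide with the arithmetic mean.

```r
# Numerical check (hypothetical data): the MLE of mu does not depend on sigma^2
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)   # assumed example sample

# Normal log-likelihood in mu for a fixed sigma^2
loglik <- function(mu, sigma2) {
  -length(x) / 2 * (log(2 * pi) + log(sigma2)) - sum((x - mu)^2) / (2 * sigma2)
}

# Maximize numerically for two different variances: both maxima equal mean(x)
optimize(loglik, interval = c(-10, 10), sigma2 = 1, maximum = TRUE)$maximum
optimize(loglik, interval = c(-10, 10), sigma2 = 4, maximum = TRUE)$maximum
mean(x)
```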
Exercise 3:
1- For $T_n(X)$:
We have $E(T_n(X)) = E(n X_{\min}) = n\,E(X_{\min})$, and we know that $X_{\min} \sim Exp(n\lambda)$, so $E(X_{\min}) = \frac{1}{n\lambda}$.
So, $E(T_n(X)) = n \cdot \frac{1}{n\lambda} = \frac{1}{\lambda} = \mu$.
⇒ $T_n(X)$ is unbiased, and therefore also asymptotically unbiased.
For $V_n(X)$:
We have $E(V_n(X)) = E\!\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n}\,E\!\left(\sum X_i\right) = \frac{1}{n}\sum E(X_i) = \frac{1}{n} \cdot n \cdot \frac{1}{\lambda} = \frac{1}{\lambda} = \mu$
So, $E(V_n(X)) = \mu$.
⇒ $V_n(X)$ is unbiased, and therefore also asymptotically unbiased.
2- The MSE of each estimator can be obtained from its bias and variance, since $MSE = Bias^2 + Var$.
In our case both estimators are unbiased, so their bias is zero and $MSE = Var$.
$$Var(T_n(X)) = Var(n X_{\min}) = n^2\,Var(X_{\min}) = n^2 \cdot \frac{1}{n^2\lambda^2} = \frac{1}{\lambda^2} = \mu^2$$
$$Var(V_n(X)) = Var\!\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum Var(X_i) = \frac{1}{n^2} \cdot n \cdot \frac{1}{\lambda^2} = \frac{1}{n}\mu^2$$
⇒ $MSE(T_n(X)) = \mu^2$ and $MSE(V_n(X)) = \frac{1}{n}\mu^2$
To find which estimator is more efficient, all we have to do is compare their variances: the one with the lower variance is the more efficient. In our case:
$Var(V_n(X)) < Var(T_n(X))$ for any $n > 1$
⇒ $V_n(X)$ is more efficient than $T_n(X)$.
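A simulation sketch of this comparison, assuming for illustration $\lambda = 0.5$ and $n = 10$ (so $\mu = 2$):

```r
# Simulation check: X_i ~ Exp(lambda), mu = 1/lambda (lambda, n and B are assumed values)
set.seed(1)
lambda <- 0.5; n <- 10; B <- 10000

Tn <- replicate(B, n * min(rexp(n, rate = lambda)))  # T_n = n * X_min
Vn <- replicate(B, mean(rexp(n, rate = lambda)))     # V_n = arithmetic mean

c(mean(Tn), mean(Vn), 1 / lambda)  # both means close to mu = 2 (unbiased)
c(var(Tn), var(Vn))                # roughly mu^2 = 4 and mu^2 / n = 0.4
```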

Exercise 4:
1- The point estimate of $\mu$ is $\bar{x}$, which is:
$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum x_i = \frac{1}{24}(450 + \dots + 790) = 667.92$$
The variance $\sigma^2$ can be estimated unbiasedly using $s^2$:
$$\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{23}\left((450 - 667.92)^2 + \dots + (790 - 667.92)^2\right) = 18035$$
2- In this exercise we have a sample with $n = 24 < 30$ and no knowledge of the variance, so, as seen in the course, we need the t-distribution to construct the confidence interval.
The first quantity we need is $t_{23;0.975}$ ($\alpha = 0.05$, so $1-\alpha = 0.95$ and $1-\frac{\alpha}{2} = 0.975$), which can be computed in R with qt(0.975, 23) or found in the table; in our case it equals 2.07. Since we have no information about the variance, we work with $\hat{\sigma}^2 = 18035$, which gives $s = \sqrt{18035} \approx 134.3$, and $\bar{x} = 667.92$.
Now all we have to do is apply the formula, and we easily find the bounds of the interval:
$$I_l(X) = \bar{x} - t_{23;0.975}\,\frac{s}{\sqrt{n}} = 667.92 - 2.07 \cdot \frac{\sqrt{18035}}{\sqrt{24}} = 611.17$$
$$I_u(X) = \bar{x} + t_{23;0.975}\,\frac{s}{\sqrt{n}} = 667.92 + 2.07 \cdot \frac{\sqrt{18035}}{\sqrt{24}} = 724.66$$
⇒ And we can say that our confidence interval for $\mu$ is $[611.17; 724.66]$.
3- With R, we use the conf.int value returned by the t.test command to get a confidence interval:
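The 24 observations are not listed in the text, so the following sketch uses a hypothetical sample rescaled to have exactly the mean and variance quoted above (the t-interval depends only on these two quantities):

```r
# Hypothetical sample with mean 667.92 and s^2 = 18035 (the real data are not reproduced here)
set.seed(42)
x <- rnorm(24)
x <- (x - mean(x)) / sd(x) * sqrt(18035) + 667.92

t.test(x, conf.level = 0.95)$conf.int  # roughly 611.2 to 724.6, as in the hand calculation
```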

We found almost the same result.


Exercise 5:
1- In all three cases $n < 30$, which is why we again work with the t-distribution, following the same approach as in the previous exercise:
• For “Brose Baskets Bamberg”:
$t_{15;0.975} = 2.1314$, $\alpha = 0.05$, then:
$$I_l(Bbb) = \bar{x} - t_{15;0.975}\,\frac{s}{\sqrt{n}} = 199.06 - 2.1314 \cdot \frac{7.047}{\sqrt{16}} = 195.305$$
$$I_u(Bbb) = \bar{x} + t_{15;0.975}\,\frac{s}{\sqrt{n}} = 199.06 + 2.1314 \cdot \frac{7.047}{\sqrt{16}} = 202.815$$
⇒ And we can say that our confidence interval is $[195.305; 202.815]$
• For “Bayer Giants Leverkusen”:
$t_{13;0.975} = 2.1604$, $\alpha = 0.05$, then:
$$I_l(Bgl) = \bar{x} - t_{13;0.975}\,\frac{s}{\sqrt{n}} = 196 - 2.1604 \cdot \frac{9.782}{\sqrt{14}} = 190.352$$
$$I_u(Bgl) = \bar{x} + t_{13;0.975}\,\frac{s}{\sqrt{n}} = 196 + 2.1604 \cdot \frac{9.782}{\sqrt{14}} = 201.648$$
⇒ And we can say that our confidence interval is $[190.352; 201.648]$
• For “Werder Bremen”:
$t_{22;0.975} = 2.0739$, $\alpha = 0.05$, then:
$$I_l(Wb) = \bar{x} - t_{22;0.975}\,\frac{s}{\sqrt{n}} = 187.52 - 2.0739 \cdot \frac{5.239}{\sqrt{23}} = 185.255$$
$$I_u(Wb) = \bar{x} + t_{22;0.975}\,\frac{s}{\sqrt{n}} = 187.52 + 2.0739 \cdot \frac{5.239}{\sqrt{23}} = 189.786$$
⇒ And we can say that our confidence interval is $[185.255; 189.786]$
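A quick check of the three intervals in R, using only the summary statistics quoted above (means, standard deviations and sample sizes as given):

```r
# t-based confidence interval from summary statistics: mean, sd, sample size
t_ci <- function(xbar, s, n, level = 0.95) {
  tq <- qt(1 - (1 - level) / 2, df = n - 1)
  c(lower = xbar - tq * s / sqrt(n), upper = xbar + tq * s / sqrt(n))
}

t_ci(199.06, 7.047, 16)  # Brose Baskets Bamberg:   approx. [195.30, 202.82]
t_ci(196.00, 9.782, 14)  # Bayer Giants Leverkusen: approx. [190.35, 201.65]
t_ci(187.52, 5.239, 23)  # Werder Bremen:           approx. [185.25, 189.79]
```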

Interpretation:
The sample means already show (even before computing the confidence intervals) that the basketball players tend to be taller than the football players.
In addition, the two intervals of the basketball teams overlap, whereas the interval of the football team does not overlap with either of the basketball intervals. This means that basketball players are on average taller than football players, while the two basketball teams cannot be clearly separated from each other.

Exercise 6:
1- From the game played by the couple we can see that we have a Binomial model, which can be defined as:
$X_i = 1$ if the wife does not wash the dishes
$X_i = 0$ if the wife washes the dishes
The parameter $p$ can then be estimated by the arithmetic mean, because the arithmetic mean is an unbiased estimator of $p$, so we can write $\hat{p} = \frac{1}{n}\sum X_i$. In our case $n = 98$ and $\hat{p} = \frac{59}{98}$ (the wife notes that the coin showed heads 59 times, meaning the husband washed the dishes 59 times and the wife won 59 out of 98 times), so $\hat{p} \approx 0.602$.
2- We have $n\hat{p}(1-\hat{p}) = 98 \cdot 0.602 \cdot 0.398 = 23.48 > 5$, $n = 98$, and $n\hat{p} = 60 > 5$, so we can use an approximation based on the normal distribution to calculate the confidence interval:
$$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \sim N(0,1), \qquad z_{1-\alpha/2} = z_{0.975} = 1.96$$
$$I_l(X) = \hat{p} - z_{0.975}\,\sqrt{\hat{p}(1-\hat{p})/n} = 0.602 - 1.96 \cdot \sqrt{\frac{0.602 \cdot 0.398}{98}} = 0.505$$
$$I_u(X) = \hat{p} + z_{0.975}\,\sqrt{\hat{p}(1-\hat{p})/n} = 0.602 + 1.96 \cdot \sqrt{\frac{0.602 \cdot 0.398}{98}} = 0.699$$
⇒ Our confidence interval is: $[0.505, 0.699]$
It is important to note that our confidence interval does not contain $p = 0.5$, the probability we would expect from a head/tail game if it were fair, so we can say that the coin does not appear to be fair and that it is in favor of the wife.
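A minimal sketch of this calculation in R, together with the built-in proportion test for comparison (which uses a slightly different approximation):

```r
# Normal-approximation (Wald) confidence interval for p
n <- 98; heads <- 59
p_hat <- heads / n                    # 0.602
se <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
p_hat + c(-1, 1) * qnorm(0.975) * se  # approx. [0.505, 0.699]

# Built-in alternative (Wilson interval with continuity correction)
prop.test(heads, n, conf.level = 0.95)$conf.int
```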
3- If the coin were fair we would have $p = 0.5$, and to reach the required precision (a confidence interval with half-width at most 0.005) we would need:
$$n \geq \left[\frac{z_{1-\frac{\alpha}{2}}}{0.005}\right]^2 \hat{p}(1-\hat{p}) = \left[\frac{1.96}{0.005}\right]^2 \cdot 0.25 = 38\,416$$
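A one-line check of this sample-size calculation (using the rounded quantile 1.96, as in the text):

```r
# Required number of games for a half-width of 0.005 when p = 0.5
delta <- 0.005
ceiling((1.96 / delta)^2 * 0.5 * (1 - 0.5))  # 38416
```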

Even if we assume that they keep playing the game at every dinner for the rest of their lives, they would need more than 100 years (38 416 dinners is roughly 105 years of daily play) to reach the desired precision.
⇒ We can conclude that such a requirement is unrealistic and should be relaxed. In addition, $p = 0.5$ is implausible anyway, since the interval above suggests the coin is not fair. This is why we should choose our parameters and precision requirements carefully in order to obtain a useful result.

Exercise 7:
1- As we know, the width of the confidence interval around the point estimate is $\Delta = 2 z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$, and we are interested in the value of $n$ for which the confidence interval has a width of at most $\Delta = 0.2$ s. Rearranging the previous equation, we obtain:
$$n_0 \geq \left[\frac{2\, z_{1-\frac{\alpha}{2}}\, \sigma_0}{\Delta}\right]^2 = \left[\frac{2 \cdot 1.96 \cdot 0.233}{0.2}\right]^2 \approx 20.86$$
⇒ We need at least 21 athletes to obtain a confidence interval of width at most 0.2 s.
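A one-line check of this calculation in R:

```r
# Minimum number of athletes for a confidence interval width of at most 0.2 s
sigma0 <- 0.233; width <- 0.2
ceiling((2 * qnorm(0.975) * sigma0 / width)^2)  # 21
```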

2- All we have to do is apply the rule: since $n = 30 \geq 30$, we can use the normal distribution (see the course):
$$\left[\bar{x} - z_{0.975}\,\frac{\sigma_0}{\sqrt{n}};\ \bar{x} + z_{0.975}\,\frac{\sigma_0}{\sqrt{n}}\right] = \left[10.93 - 1.96 \cdot \frac{0.233}{\sqrt{30}};\ 10.93 + 1.96 \cdot \frac{0.233}{\sqrt{30}}\right] = [10.84;\ 11.01]$$

From question 1 we expect the width of the confidence interval to be smaller than 0.2 s whenever the sample size is at least 21, which is the case in question 2: the sample size is 30 and the width is about 0.17 s, which is consistent.
3- In this question we want to know whether our runner belongs to the best 10% of runners. All we have to do is find an interval that isolates this percentage. Since the confidence intervals we use are symmetric, the way to obtain 10% in one tail is to compute an 80% confidence interval; its lower confidence limit then corresponds to the time achieved by only 10% of the athletes. If our runner's record is below this limit, we can say that he belongs to the best 10%; otherwise he does not.
All we have to do now is compute our confidence interval with $\alpha = 0.2$.
We obtain:
$$\left[\bar{x} - z_{0.9}\,\frac{\sigma_0}{\sqrt{n}};\ \bar{x} + z_{0.9}\,\frac{\sigma_0}{\sqrt{n}}\right] = \left[10.93 - 1.28 \cdot \frac{0.233}{\sqrt{30}};\ 10.93 + 1.28 \cdot \frac{0.233}{\sqrt{30}}\right] = [10.87;\ 10.98]$$

⇒ The record of our runner is $10.86 < 10.87$, so he belongs to the best 10%.
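A quick check of the two normal-approximation intervals, using the summary values quoted above (small differences from the hand calculation come only from rounding the quantiles):

```r
# Normal-approximation confidence intervals from the quoted summary values
xbar <- 10.93; sigma0 <- 0.233; n <- 30

xbar + c(-1, 1) * qnorm(0.975) * sigma0 / sqrt(n)  # 95% CI: approx. [10.85, 11.01]
xbar + c(-1, 1) * qnorm(0.90)  * sigma0 / sqrt(n)  # 80% CI: approx. [10.88, 10.98]
```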
4- I chose the first 30 values from the data set and tested them with the t.test command; I obtained almost the same results:
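The data themselves are not listed in this document, so the following sketch only illustrates the call, using a simulated sample with roughly the quoted mean and spread as a placeholder:

```r
# Placeholder for the first 30 running times (the real data are not reproduced here)
set.seed(7)
times <- rnorm(30, mean = 10.93, sd = 0.233)

t.test(times, conf.level = 0.95)$conf.int  # with the real data, close to [10.84, 11.01]
```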
