Statistics: Assignment 6
et de la Recherche Scientifique
جامعـة قرطاج
Université de Carthage
Exercise 1:
1- We know that $X_i \sim Po(\lambda)$, $i = 1, 2, \ldots, n$. Because the $X_i$ are i.i.d., the joint probability mass function for a given set of realizations $x_1, x_2, \ldots, x_n$ (i.e. the data) is:
$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) \cdots P(X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} \frac{\lambda^{x_i}}{x_i!} e^{-\lambda}$
The likelihood function can therefore be written as:
$L(x_1, x_2, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i}}{x_i!} e^{-\lambda} = \frac{\lambda^{\sum x_i}}{\prod x_i!} e^{-n\lambda}$
As seen in the course, the method of maximum likelihood consists in maximizing the likelihood function. Because our function is differentiable, the first-order condition for a maximum is that the first derivative with respect to λ is zero.
To simplify the differentiation, we work with the logarithm of the likelihood: it turns products into sums, which can be differentiated term by term, and because the logarithm is strictly increasing, it reaches its maximum at the same value of λ.
The log-likelihood in our example is therefore:
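$\ln L = \sum x_i \ln\lambda - \ln(x_1!\, x_2! \cdots x_n!) - n\lambda$
$\frac{\partial \ln L}{\partial \lambda} = \frac{1}{\lambda} \sum x_i - n = 0$
And we easily obtain the estimate $\hat{\lambda} = \frac{1}{n} \sum x_i = \bar{x}$.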
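To confirm that $\hat{\lambda}$ is a maximum of our function, the second derivative evaluated at $\hat{\lambda}$ should be negative:
$\frac{\partial^2 \ln L}{\partial \lambda^2}\Big|_{\lambda = \hat{\lambda}} = -\frac{1}{\hat{\lambda}^2} \sum x_i = -\frac{n}{\hat{\lambda}} < 0$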
Now we can confirm that the arithmetic mean 𝑥̅ = λ̂ is the maximum likelihood estimator for the
parameter λ of a Poisson distribution.
2- Using the result of the first question with $x_1 = 4$, $x_2 = 3$, $x_3 = 8$, $x_4 = 6$, $x_5 = 6$, we have $n = 5$ and $\sum x_i = 27$, so the log-likelihood function is:
$\ln L = 27 \ln\lambda - \ln(4!\, 3!\, 8!\, 6!\, 6!) - 5\lambda$
We can plot this function in R with the command curve().
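The plot itself is not reproduced here. A minimal sketch of how it could be drawn is given below; the names `obs`, `loglik` and `lambda_hat`, the plotting range and the axis labels are illustrative choices, not taken from the original solution.

```r
# Observed data from question 2
obs <- c(4, 3, 8, 6, 6)

# Poisson log-likelihood as a function of lambda (vectorised over lambda).
# sum(lfactorial(obs)) equals ln(4! 3! 8! 6! 6!).
loglik <- function(lambda) {
  sum(obs) * log(lambda) - sum(lfactorial(obs)) - length(obs) * lambda
}

# Plot ln L over an arbitrary range of lambda values
curve(loglik, from = 1, to = 12, xlab = expression(lambda), ylab = "ln L")

# Mark the maximum likelihood estimate, the sample mean
lambda_hat <- mean(obs)
abline(v = lambda_hat, lty = 2)
```

The dashed vertical line should sit at the top of the curve, at the sample mean 27/5 = 5.4.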
Exercise 2:
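1- In this case we have $X_i \sim N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, i.i.d., and we follow the same approach as in the first exercise:
• The probability density function of a normal distribution is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
• Our likelihood function is: $L(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left(-\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}\right)$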
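As in the previous exercise, we differentiate the log-likelihood function and set the derivative equal to zero, because the log-likelihood is much easier to work with:
• Our log-likelihood function is: $\ln L = -\frac{n}{2}\left(\ln(2\pi) + \ln\sigma^2\right) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}$
In this question $\sigma^2 = 1$.
• Differentiating the log-likelihood function with respect to μ gives: $\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \frac{2(x_i - \mu)}{2 \cdot 1} = \sum_{i=1}^{n} (x_i - \mu)$
Setting $\frac{\partial \ln L}{\partial \mu} = 0$:
$\sum_{i=1}^{n} (x_i - \mu) = 0$
$n\mu = \sum x_i$
$\hat{\mu} = \bar{x}$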
2- Even if $\sigma^2 \neq 1$ and we have no information about its value, $\hat{\mu}$ is still equal to $\bar{x}$: in the first-order condition $\sum (x_i - \mu)/\sigma^2 = 0$, the factor $1/\sigma^2$ cancels, so the solution does not depend on $\sigma^2$. We conclude that $\hat{\mu} = \bar{x}$ for any $\sigma^2 > 0$.
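This independence from $\sigma^2$ can also be checked numerically. The sketch below is only an illustration: the sample `x` is simulated (hypothetical data, not from the assignment), and the grid of $\sigma^2$ values is arbitrary.

```r
# Hypothetical sample, simulated only for this illustration
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)

# Normal log-likelihood in mu for a fixed, known sigma^2
loglik_mu <- function(mu, sigma2) {
  n <- length(x)
  -n / 2 * (log(2 * pi) + log(sigma2)) - sum((x - mu)^2) / (2 * sigma2)
}

# Maximise numerically over mu for several arbitrary values of sigma^2
sigma2_grid <- c(0.5, 1, 4, 25)
mu_hat <- sapply(sigma2_grid, function(s2) {
  optimize(loglik_mu, interval = range(x), sigma2 = s2, maximum = TRUE)$maximum
})

# Each numerical maximiser should agree with the sample mean
print(cbind(sigma2 = sigma2_grid, mu_hat = mu_hat, xbar = mean(x)))
```

Up to the numerical tolerance of optimize(), every row should show the same value in the `mu_hat` and `xbar` columns.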
Exercise 3:
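1- For $T_n(X)$:
We have $E(T_n(X)) = E(nX_{\min}) = nE(X_{\min})$, and we know that $X_{\min} \sim Exp(n\lambda)$, so $E(X_{\min}) = \frac{1}{n\lambda}$.
So $E(T_n(X)) = n \cdot \frac{1}{n\lambda} = \frac{1}{\lambda} = \mu$.
Therefore $T_n(X)$ is unbiased, and hence also asymptotically unbiased.
For $V_n(X)$:
We have $E(V_n(X)) = E\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n} E\left(\sum X_i\right) = \frac{1}{n}\sum E(X_i) = \frac{1}{n} \cdot n \cdot \frac{1}{\lambda} = \frac{1}{\lambda} = \mu$
So $E(V_n(X)) = \mu$.
Therefore $V_n(X)$ is unbiased, and hence also asymptotically unbiased.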
2- The MSE of an estimator is the sum of its variance and its squared bias: $MSE = Var + Bias^2$.
In our case both estimators are unbiased, so their bias is zero and $MSE = Var$.
$Var(T_n(X)) = Var(nX_{\min}) = n^2\, Var(X_{\min}) = n^2 \cdot \frac{1}{n^2\lambda^2} = \frac{1}{\lambda^2} = \mu^2$
$Var(V_n(X)) = Var\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum Var(X_i) = \frac{1}{n^2} \cdot n \cdot \frac{1}{\lambda^2} = \frac{1}{n\lambda^2} = \frac{1}{n}\mu^2$
Hence $MSE(T_n(X)) = \mu^2$ and $MSE(V_n(X)) = \frac{1}{n}\mu^2$.
To find which estimator is more efficient, we compare their variances: the estimator with the lower variance is the more efficient one. In our case:
$Var(V_n(X)) < Var(T_n(X))$ for any $n > 1$
Therefore $V_n(X)$ is more efficient than $T_n(X)$.
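These results can be illustrated with a small simulation. This is only a sketch under assumed values: `lambda = 2`, `n = 10` and the number of replications `reps` are arbitrary choices, not given in the exercise.

```r
# Assumed values for illustration only
set.seed(123)
lambda <- 2       # rate of the exponential distribution
n <- 10           # sample size
reps <- 20000     # number of simulated samples
mu <- 1 / lambda  # true mean

# One simulated sample per row
samples <- matrix(rexp(reps * n, rate = lambda), nrow = reps)

# The two estimators, computed on every sample
T_n <- n * apply(samples, 1, min)  # n times the sample minimum
V_n <- rowMeans(samples)           # sample mean

# Empirical mean and variance next to the theoretical variance
results <- rbind(
  T_n = c(mean = mean(T_n), var = var(T_n), theory_var = mu^2),
  V_n = c(mean = mean(V_n), var = var(V_n), theory_var = mu^2 / n)
)
print(round(results, 4))
```

Both empirical means should be close to μ, while the empirical variance of `V_n` should be roughly n times smaller than that of `T_n`.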
Exercise 4:
1- The point estimate of μ is $\bar{x}$:
$\hat{\mu} = \bar{x} = \frac{1}{n}\sum x_i = \frac{1}{24}(450 + \cdots + 790) = 667.92$
The variance $\sigma^2$ can be estimated unbiasedly using $s^2$:
$\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{23}\left((450 - 667.92)^2 + \cdots + (790 - 667.92)^2\right) \approx 18035$
2- Here the sample is small ($n = 24 < 30$) and the variance is unknown, so, as seen in the course, we use the t-distribution to construct the confidence interval.
We first need the quantile $t_{23;0.975}$ ($\alpha = 0.05$, so $1 - \alpha = 0.95$ and $1 - \frac{\alpha}{2} = 0.975$), which can be computed in R with qt(0.975, 23) or found in the table; it is approximately 2.07. Because the variance is unknown, we use the estimate $\hat{\sigma}^2 = 18035$, which gives $s = \sqrt{\hat{\sigma}^2} \approx 134.29$, together with $\bar{x} = 667.92$.
Substituting these values gives the bounds of the interval:
$I_l(X) = \bar{x} - t_{23;0.975} \cdot \frac{s}{\sqrt{n}} = 667.92 - 2.07 \cdot \frac{\sqrt{18035}}{\sqrt{24}} \approx 611.17$
$I_u(X) = \bar{x} + t_{23;0.975} \cdot \frac{s}{\sqrt{n}} = 667.92 + 2.07 \cdot \frac{\sqrt{18035}}{\sqrt{24}} \approx 724.66$
So the 95% confidence interval for μ is [611.17; 724.66].
3- In R, the conf.int component of the result of the t.test command gives the confidence interval.
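The R output is not reproduced here, and the full data set is not listed in this solution. The sketch below therefore does two things under stated assumptions: it recomputes the interval from the summary statistics of question 1, and it shows the t.test() call on a simulated vector `demo`, which is hypothetical data used only to demonstrate the call.

```r
# Summary statistics from question 1
xbar <- 667.92
s2 <- 18035
n <- 24

# Manual 95% confidence interval based on the t-distribution
t_quant <- qt(0.975, df = n - 1)
ci_manual <- xbar + c(-1, 1) * t_quant * sqrt(s2 / n)
print(round(ci_manual, 2))

# Hypothetical data, simulated only to show the t.test() call
set.seed(42)
demo <- rnorm(24, mean = 670, sd = 130)
ci_ttest <- t.test(demo, conf.level = 0.95)$conf.int

# The same interval computed by hand for the simulated data
ci_check <- mean(demo) + c(-1, 1) * qt(0.975, df = 23) * sd(demo) / sqrt(24)
print(rbind(t.test = as.numeric(ci_ttest), manual = ci_check))
```

With the real data vector in place of `demo`, `t.test(x)$conf.int` should return an interval close to the one from question 2; small differences come from rounding $\bar{x}$, $s^2$ and the quantile.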
Interpretation:
The sample means already show, even before the confidence intervals are computed, that the basketball players are on average taller than the football players.
In addition, the confidence intervals of the two basketball teams overlap, whereas the interval of the football team overlaps with neither of them. This suggests that the mean height of basketball players is greater than that of football players, while the data show no clear difference in mean height between the two basketball teams.
Exercise 6:
1- Each coin toss between the couple is a trial with two outcomes, so the number of tosses won by the wife follows a Binomial distribution. We define:
Xi=1 if the wife doesn’t wash the dishes
Xi=0 if the wife washes the dishes
The parameter p can be estimated by the arithmetic mean, which is an unbiased estimator of p: $\hat{p} = \frac{1}{n}\sum X_i$. In our case $n = 98$ and the wife notes that the coin has shown heads 59 times, which means that the husband washed the dishes 59 times and the wife won 59 times out of 98. Hence $\hat{p} = \frac{59}{98} \approx 0.602$.
2- We have $n\hat{p}(1 - \hat{p}) = 98 \cdot 0.602 \cdot 0.398 = 23.48 > 5$ and $n\hat{p} = 59 > 5$ with $n = 98$, so we can use an approximation based on the normal distribution to calculate confidence intervals:
$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} \sim N(0,1)$ approximately, and we have $z_{1-\alpha/2} = z_{0.975} = 1.96$
$I_l(X) = \hat{p} - z_{0.975} \cdot \sqrt{\hat{p}(1 - \hat{p})/n} = 0.602 - 1.96 \cdot \sqrt{\frac{0.602 \cdot 0.398}{98}} = 0.505$
$I_u(X) = \hat{p} + z_{0.975} \cdot \sqrt{\hat{p}(1 - \hat{p})/n} = 0.602 + 1.96 \cdot \sqrt{\frac{0.602 \cdot 0.398}{98}} = 0.699$
Our confidence interval is: [0.505, 0.699]
Note that our confidence interval does not contain p = 0.5, the probability we would expect from a heads-or-tails game if the coin were fair. The data therefore suggest that the coin is not fair and that it favours the wife.
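The interval can be recomputed in R as a check. This is a minimal sketch; the variable names are illustrative, and it uses the exact quantile from qnorm() rather than the rounded value 1.96.

```r
# Data from the exercise: 59 heads in 98 tosses
n <- 98
heads <- 59
p_hat <- heads / n

# Normal-approximation 95% confidence interval for p
z <- qnorm(0.975)
se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-1, 1) * z * se
print(round(ci, 3))

# Does the interval contain the fair-coin value p = 0.5?
contains_half <- ci[1] <= 0.5 && 0.5 <= ci[2]
print(contains_half)
```

The rounded bounds should agree with [0.505, 0.699], and `contains_half` should be FALSE.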
3- If the coin were fair, we would have p = 0.5, so $\hat{p}(1 - \hat{p}) = 0.25$, and with $\Delta = 0.005$ we would need:
$n \geq \left[\frac{z_{1-\alpha/2}}{\Delta}\right]^2 \hat{p}(1 - \hat{p}) = \left[\frac{1.96}{0.005}\right]^2 \cdot 0.25 = 38416$
If we assume that they keep playing the game at every dinner for the rest of their lives, they would need more than 100 years (38416/365 ≈ 105) to reach the desired precision.
We conclude that this level of precision is unrealistic and should be relaxed. In addition, the interval from question 2 suggests that the coin is not fair, so the assumption p = 0.5 is itself questionable. This is why the parameter values and conditions must be chosen carefully to obtain a useful result.
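The sample size calculation can be sketched in R as follows. The assumption of one toss per day is an interpretation of "each dinner", and the exact quantile from qnorm() is used, so the result is marginally smaller than the value obtained with the rounded quantile 1.96.

```r
# Required sample size for a fair coin (p = 0.5) and Delta = 0.005
z <- qnorm(0.975)
delta <- 0.005
p <- 0.5
n_min <- ceiling((z / delta)^2 * p * (1 - p))
print(n_min)

# Assumption: one toss per day, i.e. 365 tosses per year
years <- n_min / 365
print(round(years, 1))
```

Either way the required number of tosses is around 38,400, which corresponds to a little over 105 years of daily tosses.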
Exercise 7:
1- The width of the confidence interval is $\Delta = 2 z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$. We want the value of n for which the confidence interval has a width of at most $\Delta = 0.2$ s, so we rearrange the previous equation to obtain:
$n_0 \geq \left[2\, \frac{z_{1-\alpha/2}\, \sigma_0}{\Delta}\right]^2 = \left[2 \cdot \frac{1.96 \cdot 0.233}{0.2}\right]^2 \approx 20.85$
We need at least 21 athletes to obtain a confidence interval of width at most 0.2 s.
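As a quick check, the same calculation can be done in R. This is a minimal sketch: it assumes the known standard deviation $\sigma_0 = 0.233$ used in question 2, and the variable names are illustrative.

```r
# Known standard deviation and maximum interval width (in seconds)
sigma0 <- 0.233
width_max <- 0.2

# Smallest n such that 2 * z * sigma0 / sqrt(n) <= width_max
z <- qnorm(0.975)
n0 <- (2 * z * sigma0 / width_max)^2
n_needed <- ceiling(n0)
print(c(n0 = n0, n_needed = n_needed))
```

Rounding `n0` up to the next integer gives the 21 athletes stated above.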
2- We apply the rule from the course: since $n = 30 \geq 30$, we can use the normal distribution:
$\left[\bar{x} - z_{0.975} \cdot \frac{\sigma_0}{\sqrt{n}};\ \bar{x} + z_{0.975} \cdot \frac{\sigma_0}{\sqrt{n}}\right] = \left[10.93 - 1.96 \cdot \frac{0.233}{\sqrt{30}};\ 10.93 + 1.96 \cdot \frac{0.233}{\sqrt{30}}\right] = [10.84; 11.01]$
From question 1, the width of the confidence interval is at most 0.2 s as soon as the sample size is at least 21. This is the case in question 2, where the sample size is 30 and the width is 0.17 s, so the two results are consistent.
3- In this question we want to know whether our runner is among the best 10% of runners. Confidence intervals are symmetric, so we calculate an 80% confidence interval ($\alpha = 0.2$): 10% of the probability lies below its lower limit, and we take this lower limit as the time achieved by only 10% of athletes. If the runner's time is below this limit, we say that he is among the best 10%; otherwise he is not.
We obtain:
$\left[\bar{x} - z_{0.9} \cdot \frac{\sigma_0}{\sqrt{n}};\ \bar{x} + z_{0.9} \cdot \frac{\sigma_0}{\sqrt{n}}\right] = \left[10.93 - 1.28 \cdot \frac{0.233}{\sqrt{30}};\ 10.93 + 1.28 \cdot \frac{0.233}{\sqrt{30}}\right] = [10.87; 10.98]$
The runner's time is 10.86 < 10.87, so he is among the best 10%.
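The two intervals of this exercise can be recomputed with a small helper. This is a sketch only: `z_ci` is a hypothetical helper name, and because it starts from the rounded mean 10.93, the bounds may differ from the values above in the last digit.

```r
# Values from the exercise
xbar <- 10.93     # sample mean (rounded)
sigma0 <- 0.233   # known standard deviation
n <- 30

# Hypothetical helper: two-sided confidence interval with known variance
z_ci <- function(level) {
  z <- qnorm(1 - (1 - level) / 2)
  xbar + c(-1, 1) * z * sigma0 / sqrt(n)
}

ci95 <- z_ci(0.95)
ci80 <- z_ci(0.80)
width95 <- diff(ci95)
print(round(rbind(ci95, ci80), 3))

# Compare the runner's time with the lower 80% limit
runner <- 10.86
in_best_10 <- runner < ci80[1]
print(c(width95 = width95, in_best_10 = in_best_10))
```

The width of the 95% interval should be about 0.17 s, below the 0.2 s target, and the runner's time should fall below the lower limit of the 80% interval.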
4- I took the first 30 values of the data and ran the command t.test on them; the results are almost the same.