Chapter 3 - Parametric Estimation Using Confidence Intervals


Module : Estimation techniques for engineers Grade : 3rd year

Chapter 3 : Parametric estimation using confidence intervals
I Introductory activity
• Suppose a musical society that performs vocal works seeks to estimate the mean µpop over its
40 singers from a sample of 5 observations drawn from this choir. If we estimate µpop by the sample
mean µéch, we cannot reasonably believe that µéch = µpop exactly; we will make a small estimation
error. Moreover, as the sample is random, the value of µéch obtained from another sample
would probably have been different, albeit just as relevant. Here are a few examples of samples; the
value of µéch fluctuates from one sample to the next:

            X1     X2     X3     X4     X5     µéch
sample 1    1.89   1.79   1.74   1.90   1.74   1.812
sample 2    1.74   1.95   1.76   1.75   1.71   1.782
sample 3    1.84   1.84   1.88   1.85   1.89   1.860
sample 4    1.84   1.84   1.75   1.83   1.75   1.802
sample 5    1.59   1.68   1.79   1.89   1.79   1.748

• Conclusion: To estimate µpop, you cannot simply give the value of µéch; you must accompany it
with a margin of error. The aim of this chapter is to understand how to determine these margins of
error, or in mathematical terms, how to construct a confidence interval.
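
The fluctuation of µéch can be checked directly on the five samples of the introductory activity. The sketch below simply recomputes each sample mean with the Python standard library; the names `samples` and `sample_means` are illustrative choices, not part of the course.

```python
from statistics import mean

# The five samples of size 5 from the introductory activity.
samples = {
    "sample 1": [1.89, 1.79, 1.74, 1.90, 1.74],
    "sample 2": [1.74, 1.95, 1.76, 1.75, 1.71],
    "sample 3": [1.84, 1.84, 1.88, 1.85, 1.89],
    "sample 4": [1.84, 1.84, 1.75, 1.83, 1.75],
    "sample 5": [1.59, 1.68, 1.79, 1.89, 1.79],
}

# Sample mean of each sample, rounded to 3 decimals as in the table.
sample_means = {name: round(mean(values), 3) for name, values in samples.items()}

for name, m in sample_means.items():
    print(f"{name}: sample mean = {m:.3f}")

# Spread between the largest and smallest sample mean.
spread = max(sample_means.values()) - min(sample_means.values())
print(f"spread of the sample means: {spread:.3f}")
```

The five means differ from one another, which is why a point estimate alone says little about how close it is to µpop.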

II Principle
• Estimating an unknown parameter θ ∈ R by a confidence interval consists in associating
with a sample a random interval, denoted Iθ, which, for a risk α ∈ [0, 1] (equivalently, a confidence
level 1 − α), contains the true value of θ with probability 1 − α.

• Mathematically, this confidence interval Iθ for the parameter θ is a random interval of the form
[aα, bα], where (aα, bα) ∈ R² \ {(0, 0)}, defined as follows:

$$P(\theta \in I_\theta) = P(a_\alpha \le \theta \le b_\alpha) = 1 - \alpha$$

• The risk α is the probability that the parameter θ does not belong to the interval Iθ , in other
words, it is the probability that we are mistaken in asserting that θ ∈ Iθ . It is therefore a probability
of error that must be fairly small. The usual values of α are 10%, 5%, 1%, . . .

• The confidence level 1 − α is the probability that the parameter θ belongs to the interval Iθ.

• When the parameter θ denotes the mean m or the proportion p, it seems logical to look
for a confidence interval Iθ for θ of the form $[\hat{\theta} - \varepsilon, \hat{\theta} + \varepsilon]$, where $\hat{\theta}$ is an unbiased point
estimator of θ. According to the above characterization, this amounts to determining the error
margin ε > 0 such that:

$$P(\hat{\theta} - \varepsilon < \theta < \hat{\theta} + \varepsilon) = 1 - \alpha$$
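
To make the meaning of the confidence level 1 − α concrete, here is a small simulation sketch. It assumes a normal population with known standard deviation and uses the symmetric interval with margin ε = z·σ/√n that is derived in Section III.1; the values of `MU`, `SIGMA`, `N` and `TRIALS` are arbitrary choices made for the illustration, not data from the course.

```python
import random
from math import sqrt
from statistics import NormalDist

# Illustrative assumptions (not from the course): true mean, known sigma, sample size.
MU, SIGMA, N, ALPHA = 10.0, 2.0, 25, 0.05
TRIALS = 4000

rng = random.Random(0)                     # fixed seed so the run is repeatable
z = NormalDist().inv_cdf(1 - ALPHA / 2)    # quantile of order 1 - alpha/2 of N(0, 1)
eps = z * SIGMA / sqrt(N)                  # margin of error when sigma is known

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Does the random interval [xbar - eps, xbar + eps] contain the true mean?
    if xbar - eps < MU < xbar + eps:
        hits += 1

coverage = hits / TRIALS
print(f"empirical coverage: {coverage:.3f} (target {1 - ALPHA:.2f})")
```

The proportion of simulated intervals that contain the true mean should be close to 1 − α, which is the sense in which α is a probability of error.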

III Confidence interval construction


For the rest of the course, we consider (X1, ..., Xn), n > 0, an i.i.d. sample of size n, and we propose
to construct, for a given risk α, a confidence interval Iθ for the case where the unknown θ is the mean
µpop = µ ∈ R of a population, then for the case where the unknown θ is the variance σ²pop = σ² > 0 of
a population, and finally for the case where the unknown θ is the proportion p ∈ ]0, 1[ of a qualitative
character relative to a population.

III.1 Confidence interval for the mean


• Case of small samples n < 30 :
- Let (X1, ..., Xn), n > 0, be an n-sample following the normal distribution N(µ, σ²), where µ is
the mean and σ² is the variance. We consider the classical point estimators (unbiased and convergent)
of µ and σ², respectively the empirical mean and the corrected empirical variance, given by:

$$\bar{X}_n = \frac{X_1 + \dots + X_n}{n} \quad \text{and} \quad S_n'^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 .$$

- The main idea behind the construction of the confidence interval Iµ for µ, with a fixed risk α, is
as follows.

Let Z ∼ N(0, 1). Then there exists a real number a such that P(Z ≥ a) = α/2. This a is called the
quantile of N(0, 1) of order 1 − α/2; it will henceforth be denoted $z_{\alpha/2}$, and we determine it from the
table of N(0, 1) (reverse reading of the table).

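Thus we have:

$$P(Z \ge z_{\alpha/2}) = \frac{\alpha}{2}$$

- Due to the symmetry of N(0, 1), we also have:

$$P(Z \le -z_{\alpha/2}) = \frac{\alpha}{2}$$

This is illustrated graphically by:

[Figure: density of N(0, 1) with an area of α/2 in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$]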
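This implies that:

$$\begin{aligned}
P(-z_{\alpha/2} < Z < z_{\alpha/2}) &= P(Z \ge -z_{\alpha/2}) - P(Z \ge z_{\alpha/2}) \\
&= \left(1 - \frac{\alpha}{2}\right) - \frac{\alpha}{2} \\
&= 1 - \alpha
\end{aligned}$$

So to construct a confidence interval for µ with a fixed confidence level of 1 − α, all we have to do is
use the result we’ve just established:

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$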
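
Instead of reading the table of N(0, 1) in reverse, the quantile of order 1 − α/2 can be computed numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_quantile` is an illustrative choice. It also checks that the central probability P(−z < Z < z) equals 1 − α for the usual risks.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0, 1)

def z_quantile(alpha):
    """Quantile of order 1 - alpha/2 of N(0, 1), i.e. the value a with P(Z >= a) = alpha/2."""
    return Z.inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = z_quantile(alpha)
    central = Z.cdf(z) - Z.cdf(-z)   # P(-z < Z < z)
    print(f"alpha = {alpha:.2f}: z = {z:.3f}, P(-z < Z < z) = {central:.3f}")
```

For α = 5% this gives the familiar value of about 1.96.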

Moreover, given that the sample has a normal distribution N(µ, σ²), then:

$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

and therefore the idea is to construct from $\bar{X}_n$ a random variable that follows N(0, 1). That is why
we must distinguish between two situations: σ² known and σ² unknown.

Case 1 : the variance σ 2 is known


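We obtain that:

$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \iff Z = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

From the above result, it can be seen that:

$$\begin{aligned}
& P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \\
& P\left(-z_{\alpha/2} < \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \\
& P\left(-z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \bar{X}_n - \mu < z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \\
& P\left(-\bar{X}_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X}_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \\
& P\left(\bar{X}_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X}_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\end{aligned}$$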

Theorem 1
A confidence interval at confidence level 1 − α for the parameter µ of the law N(µ, σ²) when
σ² is known is given by:

$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{X}_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$$

Exercise 1 The weight of a newborn is assumed to be a random variable with a standard deviation
equal to 0.5 kg. In January 2004, in the hospital of Charleville-Mézières, 25 children were observed,
with an average weight of $\bar{x}_n$ = 3.6 kg.

1. Determine a confidence interval at confidence level 95% for the mean m of a newborn’s weight.

2. How many children would need to be observed for the confidence interval to have length 0.1?
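
As a possible numerical check of Exercise 1, the sketch below applies the known-variance interval of Theorem 1. It assumes that the weights are normally distributed (the exercise only gives the standard deviation, and n = 25 is below 30) and that "length" means the full width 2·z·σ/√n of the interval. The function names `z_interval` and `sample_size_for_length` are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha):
    """Interval [xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)] for a known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    eps = z * sigma / sqrt(n)
    return xbar - eps, xbar + eps

def sample_size_for_length(sigma, length, alpha):
    """Smallest n such that the interval width 2*z*sigma/sqrt(n) is at most `length`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((2 * z * sigma / length) ** 2)

# Question 1: 95% interval with xbar = 3.6 kg, sigma = 0.5 kg, n = 25.
low, high = z_interval(xbar=3.6, sigma=0.5, n=25, alpha=0.05)
print(f"95% interval for m: [{low:.3f}, {high:.3f}] kg")

# Question 2: number of observations needed for a total length of 0.1 kg.
n_needed = sample_size_for_length(sigma=0.5, length=0.1, alpha=0.05)
print(f"observations needed for length 0.1: {n_needed}")
```

Halving the length of the interval requires roughly four times as many observations, because the margin shrinks like 1/√n.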

Case 2 : The variance σ 2 is unknown


- We propose to give a confidence interval of risk α for µ when σ² is unknown. A natural idea is to
replace σ² by its classical point estimator (unbiased and convergent):

$$S_n'^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

and then to define the following random variable:

$$T = \frac{\bar{X}_n - \mu}{\sqrt{S_n'^2 / n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sqrt{S_n'^2}} \sim t(n-1)$$

(be careful: T does not follow N(0, 1)), where t(n − 1) is Student’s law with n − 1 degrees of freedom.

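- Thanks to the symmetry of Student’s law, we obtain the following result:

$$P(-t_{\alpha/2,\,n-1} < T < t_{\alpha/2,\,n-1}) = 1 - \alpha$$

where $t_{\alpha/2,\,n-1}$ denotes the quantile of order 1 − α/2 of Student’s law with n − 1 degrees of freedom,
determined from the table of t(n − 1) (inverse reading of the table).

- Therefore:

$$\begin{aligned}
& P\left(-t_{\alpha/2,\,n-1} < \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sqrt{S_n'^2}} < t_{\alpha/2,\,n-1}\right) = 1 - \alpha \\
& P\left(\bar{X}_n - t_{\alpha/2,\,n-1} \sqrt{\frac{S_n'^2}{n}} < \mu < \bar{X}_n + t_{\alpha/2,\,n-1} \sqrt{\frac{S_n'^2}{n}}\right) = 1 - \alpha
\end{aligned}$$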

Theorem 2
A confidence interval at confidence level 1 − α for the parameter µ of the law N(µ, σ²) when
σ² is unknown is of the form:

$$IC(\mu) = \left[\bar{X}_n - t_{\alpha/2,\,n-1} \sqrt{\frac{S_n'^2}{n}},\; \bar{X}_n + t_{\alpha/2,\,n-1} \sqrt{\frac{S_n'^2}{n}}\right]$$

• Large sample size n ≥ 30:
When the sample size is n ≥ 30 and the sample is not normally distributed, the following asymptotic
confidence intervals are obtained:

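Theorem 3
When n ≥ 30 and σ² is known, the Central Limit Theorem (C.L.T.) permits us to construct a
confidence interval of risk α for the parameter µ as follows:

$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{X}_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$$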

Theorem 4
When n ≥ 30 and σ² is unknown, we have t(n − 1) ≈ N(0, 1), and consequently a confidence interval
of risk α for the parameter µ is of this form:

$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2} \sqrt{\frac{S_n'^2}{n}},\; \bar{X}_n + z_{\alpha/2} \sqrt{\frac{S_n'^2}{n}}\right]$$

Exercise 2 Tests on a sample of 10 measurements of a metal’s thermal conductivity yielded the following data:

41.60 41.48 42.34 41.95 41.86 42.18 41.71 42.26 41.81 42.04

Let X be the thermal conductivity of the metal. We assume that X follows a normal distribution with
unknown parameters.

1. Determine a confidence interval for the population mean µ at the 95% confidence level.

2. Assuming that σ² = 0.3, determine the sample size necessary to construct a confidence interval
for µ at the 95% confidence level with amplitude equal to 0.06.
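
A possible numerical sketch for question 1 of Exercise 2, using the Student interval of Theorem 2. The Python standard library has no Student quantile function, so the value 2.262 (quantile of order 0.975 with 9 degrees of freedom) is assumed to be read from a standard t table; the constant name `T_975_9DF` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

data = [41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.71, 42.26, 41.81, 42.04]
n = len(data)

xbar = mean(data)      # empirical mean
s2 = variance(data)    # corrected empirical variance (divides by n - 1)

# Assumption: Student quantile of order 0.975 with n - 1 = 9 df, taken from a t table.
T_975_9DF = 2.262

eps = T_975_9DF * sqrt(s2 / n)   # margin of error
low, high = xbar - eps, xbar + eps

print(f"mean = {xbar:.3f}, corrected variance = {s2:.5f}")
print(f"95% interval for mu: [{low:.3f}, {high:.3f}]")
```

Using the normal quantile 1.96 instead of 2.262 would give a narrower interval, which would understate the uncertainty for a sample of only 10 values.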

III.2 Confidence interval for a proportion


The problem known as the confidence interval for a proportion is in fact the problem of determining
a confidence interval for the parameter p ∈ ]0, 1[ of Bernoulli’s law, given a sample (X1, ..., Xn) ∼ B(p).
A proportion is simply the frequency of the value 1 in the sample. Recall that we have
already shown that a point estimator of p is $\hat{P}_n = \bar{X}_n$. For a sample that is not normal, however, the
law of the statistic $\hat{P}_n$ is not easy to find, so the previous construction of the confidence interval no
longer applies. Thanks to the Central Limit Theorem (C.L.T.), when n is sufficiently large, we
accept the following result without proof:

Theorem 4
If np > 5 and n(1 − p) > 5 (or n sufficiently large), then the confidence interval at confidence
level 1 − α for a proportion p is as follows:

$$IC(p) = \left[\hat{P}_n - z_{\alpha/2} \sqrt{\frac{\hat{P}_n (1 - \hat{P}_n)}{n}},\; \hat{P}_n + z_{\alpha/2} \sqrt{\frac{\hat{P}_n (1 - \hat{P}_n)}{n}}\right]$$

Exercise 3 Out of 500 people surveyed, 274 said they would vote for candidate A.
1. Give a confidence interval estimate, at the 0.95 confidence level, of the proportion p of people
favoring candidate A in the population.
2. For what confidence level would the lower bound be exactly equal to 50%?
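
A possible numerical sketch for Exercise 3, based on the asymptotic interval for a proportion. For question 2 it solves the lower-bound equation for the quantile and converts it back into a confidence level with the normal distribution function; the variable names (`p_hat`, `se`, `z_star`, `level`) are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, votes = 500, 274
p_hat = votes / n                      # point estimate of p
se = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error of p_hat

# Conditions of the theorem, checked with p replaced by its estimate.
assert n * p_hat > 5 and n * (1 - p_hat) > 5

# Question 1: interval at the 0.95 confidence level.
z = NormalDist().inv_cdf(0.975)
low, high = p_hat - z * se, p_hat + z * se
print(f"p_hat = {p_hat:.3f}, 95% interval: [{low:.4f}, {high:.4f}]")

# Question 2: the lower bound equals 0.5 when p_hat - z_star * se = 0.5.
z_star = (p_hat - 0.5) / se
level = 2 * NormalDist().cdf(z_star) - 1   # confidence level 1 - alpha for that quantile
print(f"lower bound is 0.5 at confidence level {level:.3f}")
```

At the 0.95 level the whole interval lies above 50%, and the lower bound only reaches 50% at a somewhat higher confidence level.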

III.3 Confidence interval for a variance


- The problem is as follows: the unknown variance σ² of the population must be bracketed. We are
therefore looking for two values aα and bα bracketing σ² which satisfy:

$$P(a_\alpha < \sigma^2 < b_\alpha) = 1 - \alpha$$

Case 1 : the mean µ is known


- Given a normal population of unknown variance σ², in this case the proposed point estimator for
σ² is:

$$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$$

- The random variable

$$Y = \frac{n}{\sigma^2}\, \hat{\sigma}_n^2 = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}$$

follows a χ²(n) law with n degrees of freedom, which is not a symmetrical law.
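- The main idea behind the construction of the confidence interval $I_{\sigma^2}$ for σ², with a fixed risk α,
is as follows:
We are looking for $\chi_{1-\alpha/2,\,n}$ which satisfies:

$$P\left(\chi^2(n) \le \chi_{1-\alpha/2,\,n}\right) = \frac{\alpha}{2}$$

and $\chi_{\alpha/2,\,n}$ which satisfies:

$$P\left(\chi^2(n) \ge \chi_{\alpha/2,\,n}\right) = \frac{\alpha}{2}$$

This implies that:

$$P\left(\chi_{1-\alpha/2,\,n} < \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} < \chi_{\alpha/2,\,n}\right) = 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha$$

then we can show that:

$$P\left(\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi_{\alpha/2,\,n}} < \sigma^2 < \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi_{1-\alpha/2,\,n}}\right) = 1 - \alpha$$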
And we admit the following result :

Theorem 5
A confidence interval at confidence level 1 − α for the parameter σ² of the law N(µ, σ²) when µ
is known is of the form:

$$IC(\sigma^2) = \left[\frac{n\,\hat{\sigma}_n^2}{\chi_{\alpha/2,\,n}},\; \frac{n\,\hat{\sigma}_n^2}{\chi_{1-\alpha/2,\,n}}\right]$$

Case 2 : the mean µ is unknown


We propose to give a confidence interval of confidence level 1 − α for σ² with µ unknown. In this
case, the proposed point estimator for σ² is:

$$S_n'^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

Therefore $Y = \frac{n-1}{\sigma^2}\, S_n'^2$ follows the χ²(n − 1) law with n − 1 degrees of freedom. Following the
same approach as in the previous situation, we arrive at the following result:

Theorem 6
A confidence interval of risk α for the parameter σ² of the law N(µ, σ²) when µ is unknown is
given by:

$$IC(\sigma^2) = \left[\frac{(n-1)\,S_n'^2}{\chi_{\alpha/2,\,n-1}},\; \frac{(n-1)\,S_n'^2}{\chi_{1-\alpha/2,\,n-1}}\right]$$

Exercise 4 We measured the total alcohol content (expressed in g/l) of a sample of 10 bottles of
sweet cider on the market. The following values x1, x2, x3, ..., x10 were obtained, such that:

$$\sum_{i=1}^{10} x_i = 62 \quad \text{and} \quad \sum_{i=1}^{10} x_i^2 = 388.4124$$

The quantity of alcohol in a bottle is modelled by a random variable X following a normal distribution
with expectation µ and variance σ², where the parameters µ and σ are unknown.

1. From the observed sample, propose point estimates of µ and σ².

2. Construct a confidence interval for the mean µ at a confidence level of 1 − α = 95%.

3. Determine a confidence interval at the 80% level for the variance σ².

4. (a) If n denotes the size of a large sample (n > 50), express as a function of n the amplitude of
the confidence interval for µ at the 95% confidence level.
(b) We wish to construct a confidence interval for µ at the 95% confidence level with an amplitude
of 0.2. What sample size is required when the sample variance is $s_n^2$ = 0.6 (g/l)²?
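
A possible numerical sketch for Exercise 4, combining the point estimators with the intervals of Theorem 2 and Theorem 6. The Student and chi-square quantiles for 9 degrees of freedom are assumed to be read from standard tables (the Python standard library does not provide them), and "amplitude" is taken to mean the full width of the interval; all constant names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 10, 62.0, 388.4124

# Question 1: point estimates of mu and sigma^2.
xbar = sum_x / n
s2 = (sum_x2 - n * xbar ** 2) / (n - 1)   # corrected empirical variance

# Assumed table values for 9 degrees of freedom:
T_975 = 2.262      # Student quantile of order 0.975
CHI_UP = 14.684    # P(chi2(9) >= 14.684) = 0.10
CHI_LOW = 4.168    # P(chi2(9) <= 4.168) = 0.10

# Question 2: 95% interval for mu (variance unknown).
eps = T_975 * sqrt(s2 / n)
mu_ci = (xbar - eps, xbar + eps)

# Question 3: 80% interval for sigma^2 (mean unknown).
var_ci = ((n - 1) * s2 / CHI_UP, (n - 1) * s2 / CHI_LOW)

# Question 4(b): large-sample amplitude 2*z*s/sqrt(n) set to 0.2 with s^2 = 0.6.
z = NormalDist().inv_cdf(0.975)
n_large = ceil((2 * z) ** 2 * 0.6 / 0.2 ** 2)

print(f"xbar = {xbar:.2f}, s2 = {s2:.4f}")
print(f"95% interval for mu: [{mu_ci[0]:.3f}, {mu_ci[1]:.3f}]")
print(f"80% interval for sigma^2: [{var_ci[0]:.4f}, {var_ci[1]:.4f}]")
print(f"sample size for amplitude 0.2: {n_large}")
```

The interval for σ² is not centred on the point estimate, which reflects the asymmetry of the χ² law.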
