Chapter 3 - Parametric Estimation Using Confidence Intervals
           X1     X2     X3     X4     X5     µéch
sample 1   1.89   1.79   1.74   1.90   1.74   1.812
sample 2   1.74   1.95   1.76   1.75   1.71   1.782
sample 3   1.84   1.84   1.88   1.85   1.89   1.86
sample 4   1.84   1.84   1.75   1.83   1.75   1.802
sample 5   1.59   1.68   1.79   1.89   1.79   1.748
• Conclusion: To estimate µpop, you cannot simply give the value of µéch; you must accompany it
with a margin of error. The aim of this chapter is to understand how to determine these margins of
error or, in mathematical terms, how to construct a confidence interval.
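To make this conclusion concrete, the short Python sketch below recomputes the mean of each of the five samples in the table above and measures how far apart the five estimates fall. The variable names are illustrative and not part of the course notes.

```python
from statistics import mean

# The five samples of size 5 from the table above.
samples = {
    "sample 1": [1.89, 1.79, 1.74, 1.90, 1.74],
    "sample 2": [1.74, 1.95, 1.76, 1.75, 1.71],
    "sample 3": [1.84, 1.84, 1.88, 1.85, 1.89],
    "sample 4": [1.84, 1.84, 1.75, 1.83, 1.75],
    "sample 5": [1.59, 1.68, 1.79, 1.89, 1.79],
}

# Each sample gives a different point estimate of the population mean.
sample_means = {name: mean(values) for name, values in samples.items()}
for name, m in sample_means.items():
    print(f"{name}: mean = {m:.3f}")

# Gap between the largest and the smallest estimate.
spread = max(sample_means.values()) - min(sample_means.values())
print(f"spread between the estimates: {spread:.3f}")
```

The estimates range from 1.748 to 1.86, so a single sample mean on its own says little about how close it is to µpop.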
II Principle
• The estimation of an unknown parameter θ ∈ R by confidence interval consists in associating
with a sample a random interval, denoted Iθ, which, for a risk α ∈ [0, 1] (equivalently, a confidence
level 1 − α), is likely to contain the true value of θ.
• Mathematically, this confidence interval Iθ for the parameter θ is a random interval of the form
$[a_\alpha, b_\alpha]$, where $(a_\alpha, b_\alpha) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, defined as follows:
$$P(\theta \in I_\theta) = P(a_\alpha \le \theta \le b_\alpha) = 1 - \alpha$$
• The risk α is the probability that the parameter θ does not belong to the interval Iθ , in other
words, it is the probability that we are mistaken in asserting that θ ∈ Iθ . It is therefore a probability
of error that must be fairly small. The usual values of α are 10%, 5%, 1%, . . .
• The confidence level 1 − α is the probability that the parameter θ belongs to the interval Iθ.
• When the parameter θ denotes the mean m or the proportion p, it seems logical to look
for a confidence interval Iθ for θ of the form $[\hat{\theta} - \varepsilon, \hat{\theta} + \varepsilon]$, where $\hat{\theta}$ is an unbiased point
estimator of θ. According to the above characterization, this amounts to determining the error
margin ε > 0 such that:
$$P(\hat{\theta} - \varepsilon < \theta < \hat{\theta} + \varepsilon) = 1 - \alpha$$
a population and finally for the case where the unknown θ is the proportion p ∈]0, 1[ of a qualitative
character relative to a population.
Let Z ∼ N(0, 1). Then there exists a real number a such that $P(Z \ge a) = \alpha/2$. This a is called the
quantile of N(0, 1) of order $1 - \alpha/2$; it will henceforth be denoted $z_{\alpha/2}$ and is determined from the
table of N(0, 1) (inverse reading of the table).
Thus we have:
$$P(Z \ge z_{\alpha/2}) = \frac{\alpha}{2}$$
- Due to the symmetry of N(0, 1), we also have:
$$P(Z \le -z_{\alpha/2}) = \frac{\alpha}{2}$$
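This is illustrated graphically by:
[Figure: density of N(0, 1), with an area of α/2 in each tail, to the left of $-z_{\alpha/2}$ and to the right of $z_{\alpha/2}$.]
This implies that:
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$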
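In place of the inverse table reading, the quantile can also be computed numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_quantile` is an assumption made for illustration and does not come from the course.

```python
from statistics import NormalDist

def z_quantile(alpha: float) -> float:
    """Return z_{alpha/2}, the quantile of order 1 - alpha/2 of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

standard_normal = NormalDist()
for alpha in (0.10, 0.05, 0.01):
    z = z_quantile(alpha)
    # Each tail has probability alpha/2, so the central area is 1 - alpha.
    central = standard_normal.cdf(z) - standard_normal.cdf(-z)
    print(f"alpha = {alpha:.2f}: z = {z:.3f}, P(-z < Z < z) = {central:.3f}")
```

For the usual risks of 10%, 5% and 1% this gives approximately 1.645, 1.960 and 2.576, which are the values read from the table.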
So to construct a confidence interval for µ with a fixed confidence level of 1 − α, all we have to do is
use the result we’ve just established :
Moreover, given that the sample has a normal distribution N(µ, σ²), then:
$$\bar{X}_n \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$
that is, $\bar{X}_n$ has standard deviation $\sigma/\sqrt{n}$, and therefore the idea is to construct from $\bar{X}_n$ a random
variable that follows N(0, 1). That is why we have to distinguish between two situations: σ² known
and σ² unknown.
Theorem 1
A confidence interval of confidence level 1 − α for the parameter µ of the law N(µ, σ²) when
σ² is known is given by:
$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
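The meaning of the level 1 − α in Theorem 1 can be checked by simulation: if many samples are drawn from the same normal population, about a fraction 1 − α of the resulting intervals should contain µ. The sketch below is only an illustration; the population parameters, the seed and the function name `known_sigma_interval` are assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def known_sigma_interval(sample, sigma, alpha):
    """Theorem 1 interval: mean -/+ z_{alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical population parameters, chosen only for this simulation.
mu, sigma, n, alpha = 1.8, 0.1, 5, 0.05
rng = random.Random(0)  # fixed seed so that the run is reproducible

trials = 10_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = known_sigma_interval(sample, sigma, alpha)
    hits += low <= mu <= high  # True counts as 1

coverage = hits / trials
print(f"empirical coverage: {coverage:.3f} (target {1 - alpha:.2f})")
```

The empirical coverage should land close to 0.95, while any individual interval either contains µ or does not.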
Exercise 1 The weight of a newborn is assumed to be a random variable with a standard deviation
equal to 0.5 kg. In January 2004, in the hospital of Charleville-Mézières, 25 children were observed,
with an average weight of
$\bar{x}_n = 3.6$ kg.
1. Determine a confidence interval of confidence level 95% for the mean m of a newborn's weight.
2. How many children would have to be observed for the confidence interval to have length 0.1?
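Question 2 relies on the link between the length of the interval of Theorem 1 and the sample size. As a sketch (the symbol $L$ for the length is introduced here for convenience and is not in the original notes), the length is twice the margin of error, which can be solved for n:

```latex
L = 2\, z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}
\quad\Longleftrightarrow\quad
n = \left( \frac{2\, z_{\alpha/2}\, \sigma}{L} \right)^{2}
```

Since n must be an integer, the value obtained is rounded up to the next whole number.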
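where $t_{\alpha/2,\,n-1}$ denotes the quantile of order $1 - \alpha/2$ of Student's law with n − 1 degrees of freedom,
and is determined from the table of t(n − 1) (inverse reading of the table).
- Therefore:
$$P\left(-t_{\alpha/2,\,n-1} < \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sqrt{S_n'^2}} < t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$P\left(\bar{X}_n - t_{\alpha/2,\,n-1}\sqrt{\frac{S_n'^2}{n}} < \mu < \bar{X}_n + t_{\alpha/2,\,n-1}\sqrt{\frac{S_n'^2}{n}}\right) = 1 - \alpha$$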
Theorem 2
A confidence interval of confidence level 1 − α for the parameter µ of the law N(µ, σ²) when
σ² is unknown is of the form:
$$IC(\mu) = \left[\bar{X}_n - t_{\alpha/2,\,n-1}\sqrt{\frac{S_n'^2}{n}},\ \bar{X}_n + t_{\alpha/2,\,n-1}\sqrt{\frac{S_n'^2}{n}}\right]$$
• Large sample size n ≥ 30:
When the sample is of size n ≥ 30 and is not normally distributed, the following asymptotic
confidence intervals are obtained:
Theorem 3
When n ≥ 30 and σ² is known, the central limit theorem (C.L.T.) permits us to construct a
confidence interval of risk α for the parameter µ as follows:
$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
Theorem 4
When n ≥ 30 and σ² is unknown, we have t(n) ≈ N(0, 1), and consequently a confidence interval
of risk α for the parameter µ is of the form:
$$IC(\mu) = \left[\bar{X}_n - z_{\alpha/2}\sqrt{\frac{S_n'^2}{n}},\ \bar{X}_n + z_{\alpha/2}\sqrt{\frac{S_n'^2}{n}}\right]$$
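The large-sample interval with σ² unknown can be sketched in a few lines of Python. The data below are simulated from an exponential distribution purely to have a non-normal sample of size n ≥ 30; the function name, the seed and the distribution are assumptions made for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def large_sample_interval(sample, alpha):
    """Asymptotic interval for the mean when sigma^2 is unknown (n >= 30)."""
    n = len(sample)
    if n < 30:
        raise ValueError("the asymptotic interval assumes n >= 30")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s2 = variance(sample)  # unbiased sample variance S'^2 (divides by n - 1)
    margin = z * sqrt(s2 / n)
    centre = mean(sample)
    return centre - margin, centre + margin

# Simulated, deliberately non-normal data: exponential with mean 2.
rng = random.Random(1)
sample = [rng.expovariate(0.5) for _ in range(200)]

low, high = large_sample_interval(sample, alpha=0.05)
print(f"95% asymptotic interval for the mean: [{low:.3f}, {high:.3f}]")
```

The function refuses samples with fewer than 30 observations, since the normal approximation is only justified for large n.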
Exercise 2 Tests on a sample of 10 measurements of a metal's thermal conductivity yielded the following data:
41.60 41.48 42.34 41.95 41.86 42.18 41.71 42.26 41.81 42.04
Let X be the thermal conductivity of the metal. We assume that X follows a Normal distribution with
unknown parameters.
2. Assuming that σ² = 0.3, determine the sample size needed to construct a confidence interval
for µ at the 95% confidence level with amplitude equal to 0.06.
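As an illustration of Theorem 2 on the ten measurements above, the sketch below computes the sample mean, the unbiased sample variance and the Student interval. In keeping with the table-reading approach of the course, the quantile is passed in by hand: 2.262 is the usual table value for α = 5% and 9 degrees of freedom. The function name `student_interval` is an assumption made for the example.

```python
from math import sqrt
from statistics import mean, variance

def student_interval(sample, t_quantile):
    """Theorem 2 interval; t_quantile is t_{alpha/2, n-1} read from the table."""
    n = len(sample)
    centre = mean(sample)
    s2 = variance(sample)  # S'^2, the unbiased sample variance
    margin = t_quantile * sqrt(s2 / n)
    return centre - margin, centre + margin

conductivity = [41.60, 41.48, 42.34, 41.95, 41.86,
                42.18, 41.71, 42.26, 41.81, 42.04]

# Table value for alpha = 5% and n - 1 = 9 degrees of freedom.
T_0025_9 = 2.262

low, high = student_interval(conductivity, T_0025_9)
print(f"mean = {mean(conductivity):.3f}, S'^2 = {variance(conductivity):.4f}")
print(f"95% interval for mu: [{low:.3f}, {high:.3f}]")
```

With these data the sample mean is 41.923 and the interval is roughly [41.72, 42.13].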
Theorem 4
If np > 5 and n(1 − p) > 5 (or n sufficiently large), then the confidence interval of confidence
level 1 − α for a proportion p is as follows:
$$IC(p) = \left[\hat{P}_n - z_{\alpha/2}\sqrt{\frac{\hat{P}_n(1-\hat{P}_n)}{n}},\ \hat{P}_n + z_{\alpha/2}\sqrt{\frac{\hat{P}_n(1-\hat{P}_n)}{n}}\right]$$
Exercise 3 Out of 500 people surveyed, 274 said they would vote for candidate A.
1. Give a confidence interval at the 0.95 confidence level for p, the proportion of people in the
population favoring candidate A.
2. For what confidence level would the lower bound of the interval be exactly equal to 50%?
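One possible way to check both questions numerically is sketched below, using only the Python standard library. The function name `proportion_interval` is an assumption; the validity conditions are checked with the observed proportion in place of the unknown p.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, alpha):
    """Asymptotic interval for a proportion, centred on p_hat = successes / n."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

n, favourable = 500, 274
p_hat = favourable / n

# Validity conditions of the theorem, checked with p_hat in place of p.
assert n * p_hat > 5 and n * (1 - p_hat) > 5

low, high = proportion_interval(favourable, n, alpha=0.05)
print(f"p_hat = {p_hat:.3f}, 95% interval: [{low:.4f}, {high:.4f}]")

# Confidence level whose lower bound is exactly 0.5:
# solve p_hat - z * sqrt(p_hat * (1 - p_hat) / n) = 0.5 for z,
# then convert z back into a level 1 - alpha = 2 * Phi(z) - 1.
z_needed = (p_hat - 0.5) / sqrt(p_hat * (1 - p_hat) / n)
level = 2 * NormalDist().cdf(z_needed) - 1
print(f"lower bound equals 0.5 at confidence level {level:.3f}")
```

The second part inverts the formula of the theorem: the lower bound is fixed, the quantile is solved for, and the quantile is then turned back into a confidence level.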
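$$P\left(\chi_{1-\alpha/2,\,n} < \sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} < \chi_{\alpha/2,\,n}\right) = 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha$$
then we can show that:
$$P\left(\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi_{\alpha/2,\,n}} < \sigma^2 < \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi_{1-\alpha/2,\,n}}\right) = 1 - \alpha$$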
Theorem 5
A confidence interval of confidence level 1 − α for the parameter σ² of the law N(µ, σ²) when µ
is known is of the form:
$$IC(\sigma^2) = \left[\frac{n\hat{\sigma}_n^2}{\chi_{\alpha/2,\,n}},\ \frac{n\hat{\sigma}_n^2}{\chi_{1-\alpha/2,\,n}}\right]$$
Therefore $Y = \frac{(n-1)}{\sigma^2} \times S_n'^2$ follows the law χ²(n − 1) with n − 1 degrees of freedom. Following the
same approach as in the previous situation, we arrive at the following result:
Theorem 6
A confidence interval of risk α for the parameter σ² of the law N(µ, σ²) when µ is unknown is
given by:
$$IC(\sigma^2) = \left[\frac{(n-1)S_n'^2}{\chi_{\alpha/2,\,n-1}},\ \frac{(n-1)S_n'^2}{\chi_{1-\alpha/2,\,n-1}}\right]$$
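As a sketch of how Theorem 6 is applied, the code below uses the two sums given in Exercise 4 just below (10 bottles of cider). The chi-square quantiles are passed in by hand as table values for α = 5% and 9 degrees of freedom (19.023 and 2.700); the function and variable names are assumptions made for the illustration.

```python
def variance_interval(s2, n, chi_upper, chi_lower):
    """Theorem 6 interval for sigma^2 when mu is unknown.

    chi_upper is chi_{alpha/2, n-1} and chi_lower is chi_{1-alpha/2, n-1},
    both read from the chi-square table with n - 1 degrees of freedom.
    """
    return (n - 1) * s2 / chi_upper, (n - 1) * s2 / chi_lower

# Sums given in Exercise 4 (10 bottles of sweet cider).
n = 10
sum_x = 62.0
sum_x2 = 388.4124

x_bar = sum_x / n
# Unbiased sample variance: S'^2 = (sum of x_i^2 - n * x_bar^2) / (n - 1).
s2 = (sum_x2 - n * x_bar ** 2) / (n - 1)

# Table values for alpha = 5% and 9 degrees of freedom.
CHI_UPPER = 19.023  # quantile of order 0.975
CHI_LOWER = 2.700   # quantile of order 0.025

low, high = variance_interval(s2, n, CHI_UPPER, CHI_LOWER)
print(f"x_bar = {x_bar:.2f}, S'^2 = {s2:.4f}")
print(f"95% interval for sigma^2: [{low:.3f}, {high:.3f}]")
```

The larger quantile goes in the denominator of the lower bound, which is why the interval is not symmetric around S'².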
Exercise 4 We measured the total alcohol content (expressed in g/l) of a sample of 10 bottles of
sweet cider on the market. The following values $x_1, x_2, x_3, \dots, x_{10}$ were obtained, such that:
$$\sum_{i=1}^{10} x_i = 62 \quad \text{and} \quad \sum_{i=1}^{10} x_i^2 = 388.4124$$
The quantity of alcohol in a bottle is modelled by a random variable X following a normal distribution
with expectation µ and variance σ², where the parameters µ and σ are unknown.
4. (a) If n denotes the size of a large sample (n > 50), express as a function of n the amplitude of
the confidence interval of µ at the 95% confidence level.
(b) We wish to construct a confidence interval of µ at the 95% confidence level having an
amplitude of 0.2. What is the sample size when the sample variance is $s_n'^2 = 0.6$ (g/l)²?