Chapter 2
INFERENTIAL STATISTICS
FAROUK MSELMI

Outline :
Sample statistics
Interval estimation
    Estimating the mean
    Estimating the variance
The chi-squared random variable
χ²ₙ ≡ X̃₁ + X̃₂ + . . . + X̃ₙ,
where X̃ᵢ = Xᵢ² and each Xᵢ is standard normal for i = 1, 2, . . . , n, starts looking like a normal density function when n gets large.
Theorem
(The Central Limit Theorem) (CLT) : Let X₁, X₂, . . . , Xₙ be independent, identically distributed random variables, each having mean µ, variance σ², and standard deviation σ. Then for large n, the random variable
S ≡ X₁ + X₂ + . . . + Xₙ,
which has mean nµ, variance nσ², and standard deviation √n σ, is approximately normal.
NOTE : Thus, (S − nµ)/(√n σ) is approximately standard normal.
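A minimal simulation sketch of the theorem (an illustration, assuming Python with numpy; the Uniform(0, 1) population, n = 30, the seed, and the trial count are arbitrary choices) :

import numpy as np

rng = np.random.default_rng(0)
n, trials = 30, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)            # mean and sd of Uniform(0, 1)

X = rng.uniform(0.0, 1.0, size=(trials, n))
S = X.sum(axis=1)                           # S = X1 + ... + Xn
Z = (S - n * mu) / (np.sqrt(n) * sigma)     # standardized sum

print(Z.mean(), Z.std())                    # close to 0 and 1
print(np.mean(Z <= 1.0))                    # close to Phi(1) = 0.8413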
Example
Recall that χ²ₙ ≡ X₁² + X₂² + . . . + Xₙ², where each Xᵢ is standard normal, and (using moments) we found that χ²ₙ has mean n and standard deviation √(2n). The CLT approximation gives
P(χ²ₙ ≤ 0) ≈ Φ((0 − n)/√(2n)) = Φ(−√(n/2)).
Since χ²ₙ is always positive, the exact value of P(χ²ₙ ≤ 0) is 0, so the table below illustrates the accuracy of the approximation : the error shrinks as n grows.

n     Φ(−√(n/2))
2     0.1587
8     0.0228
18    0.0013
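These entries can be checked directly with scipy's normal CDF (a quick sketch, assuming scipy is available) :

from math import sqrt
from scipy.stats import norm

for n in (2, 8, 18):
    print(n, round(norm.cdf(-sqrt(n / 2)), 4))
# 2  0.1587
# 8  0.0228
# 18 0.0013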
Similarly, for large n the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ),
which has mean µ, variance σ²/n, and standard deviation σ/√n, is approximately normal.
NOTE : Thus, (X̄ − µ)/(σ/√n) is approximately standard normal.
Example 1
Suppose X₁, X₂, . . . , Xₙ are independent, identically distributed uniform random variables, each having density function
f(x) = 1/2 for x ∈ [−1, 1], and f(x) = 0 otherwise.
Each Xᵢ then has mean 0 and variance 1/3, so for large n the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ)
is approximately normal with mean 0 and variance 1/(3n).
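A short simulation sketch of this claim (assuming numpy; n = 50, the seed, and the trial count are arbitrary) :

import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 100_000
xbar = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)

print(xbar.mean())                          # close to 0
print(xbar.std(), np.sqrt(1 / (3 * n)))     # both close to 0.0816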
Example 2
The Central Limit Theorem (CLT) also applies to discrete random variables. The binomial random variable, with probability mass function
P(X = k) = (n choose k) pᵏ (1 − p)ⁿ⁻ᵏ,    0 ≤ k ≤ n,
is already a sum (namely, of n independent Bernoulli random variables). Thus, for large n, its probability mass function already "looks normal."
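A quick sketch comparing the binomial pmf with the matching N(np, np(1 − p)) density at a few points (assuming scipy; n = 40 and p = 0.3 are arbitrary choices) :

from scipy.stats import binom, norm

n, p = 40, 0.3
mu, sd = n * p, (n * p * (1 - p)) ** 0.5    # matching normal parameters

for k in (8, 12, 16):
    print(k, binom.pmf(k, n, p), norm.pdf(k, mu, sd))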
Sample statistics
Definition
A random sample from a population consists of independent, identically distributed random
variables, X1 , X2 , . . . , Xn .
The values of the Xi are called the outcomes of the experiment.
A statistic is a function of X1 , X2 , . . . , Xn .
Thus, a statistic itself is a random variable.
Example
The most important statistics are :
The sample mean X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ).
The sample variance S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)² (to be discussed in detail).
The sample standard deviation S = √S².
The order statistic, in which the observations are arranged in increasing order.
The sample median, which is the middle value of the order statistic (if n is odd), or the average of the two middle values (if n is even).
The sample range : the difference between the largest and the smallest observation.
Example
For the 8 observations −0.737, 0.511, −0.083, 0.066, −0.562, −0.906, 0.358, 0.359 from the first row of the Table given earlier, we have :
Sample mean : X̄ = (1/8)(−0.737 + 0.511 − 0.083 + 0.066 − 0.562 − 0.906 + 0.358 + 0.359) = −0.124.
Sample variance : S² = 0.26.
Sample standard deviation : S = √0.26 = 0.51.
The order statistic : −0.906, −0.737, −0.562, −0.083, 0.066, 0.358, 0.359, 0.511.
The sample median : (−0.083 + 0.066)/2 = −0.0085.
The sample range : 0.511 − (−0.906) = 1.417.
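These values can be reproduced with numpy (a sketch; note that ddof=0 gives the 1/n sample variance used in these notes) :

import numpy as np

x = np.array([-0.737, 0.511, -0.083, 0.066, -0.562, -0.906, 0.358, 0.359])
print(x.mean())             # sample mean ~ -0.124
print(x.var(ddof=0))        # sample variance ~ 0.26 (divides by n)
print(x.std(ddof=0))        # sample standard deviation ~ 0.51
print(np.sort(x))           # order statistic
print(np.median(x))         # sample median = -0.0085
print(x.max() - x.min())    # sample range = 1.417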
Point estimation
A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂. What are the
desirable properties of a “good” decision function that would influence us to choose one estimator
rather than another ? Let Θ̂ be an estimator whose value θ̂ is a point estimate of some unknown
population parameter θ. Certainly, we would like the sampling distribution of Θ̂ to have a mean
equal to the parameter estimated. An estimator possessing this property is said to be unbiased.
Definitions
1 A statistic Θ̂ is said to be an unbiased estimator of the parameter θ if
µΘ̂ = E(Θ̂) = θ.
2 If we consider all possible unbiased estimators of some parameter θ, the one with the
smallest variance is called the most efficient estimator of θ.
Suppose the population mean and standard deviation are µ and σ. As before, the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ)
is itself a random variable, with expected value
µ_X̄ ≡ E[X̄] = E[(1/n)(X₁ + X₂ + . . . + Xₙ)] = µ,
variance
σ²_X̄ ≡ Var(X̄) = σ²/n,
and standard deviation
σ_X̄ = σ/√n.
NOTE : The sample mean approximates the population mean µ.
Similarly, the sample variance S² and the population variance σ² are quite different. Nevertheless, we will show that for large n their values are close ! Thus, for large n, we have the approximation
S² ≈ σ².
FACT 2 : From
σ² ≡ Var(X) ≡ E[(X − µ)²] = E[X²] − µ²,
we (obviously) have
E[X²] = σ² + µ².
FACT 3 : Recall that for independent, identically distributed Xₖ, where each Xₖ has mean µ and variance σ², we have
µ_X̄ ≡ E[X̄] = µ    and    σ²_X̄ ≡ E[(X̄ − µ)²] = σ²/n.
FACT 4 (useful for computing S² efficiently) :
S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)² = [(1/n) Σₖ₌₁ⁿ Xₖ²] − X̄².
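A quick numerical check of Fact 4 (a sketch, assuming numpy; the data are arbitrary normal draws) :

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)

lhs = np.mean((x - x.mean()) ** 2)          # (1/n) sum (xk - xbar)^2
rhs = np.mean(x ** 2) - x.mean() ** 2       # (1/n) sum xk^2 - xbar^2
print(np.isclose(lhs, rhs))                 # True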
Theorem
The sample variance
S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)²
has expected value
E[S²] = (1 − 1/n) σ².
Proof :
E[S²] = E[(1/n) Σₖ₌₁ⁿ (Xₖ − X̄)²] = (1/n) Σₖ₌₁ⁿ E[Xₖ²] − E[X̄²]    (using Fact 4)
= (1/n) Σₖ₌₁ⁿ (σ² + µ²) − E[X̄²]    (using Fact 2)
= σ² + µ² − (σ²/n + µ²)    (using Fact 3)
= (1 − 1/n) σ².
Remark
Thus, limₙ→∞ E[S²] = σ². The 1/n sample variance is biased, but the bias vanishes as n grows.
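A simulation sketch of the bias factor 1 − 1/n (assuming numpy; n = 5, σ = 2, and the trial count are arbitrary) :

import numpy as np

rng = np.random.default_rng(3)
n, trials, sigma = 5, 200_000, 2.0
x = rng.normal(0.0, sigma, size=(trials, n))

s2 = x.var(axis=1, ddof=0)                  # 1/n sample variance
print(s2.mean(), (1 - 1 / n) * sigma ** 2)  # both close to 3.2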
Theorem
The sample variance
Ŝ² ≡ (1/(n − 1)) Σₖ₌₁ⁿ (Xₖ − X̄)²
is an unbiased estimator of σ² ; indeed, Ŝ² = (n/(n − 1)) S², so E[Ŝ²] = (n/(n − 1))(1 − 1/n) σ² = σ².
How good is this approximation for normal random variables Xₖ ? To answer this, we need :
FACT 5 :
Σₖ₌₁ⁿ (Xₖ − µ)² − Σₖ₌₁ⁿ (Xₖ − X̄)² = n(X̄ − µ)².
Rewrite Fact 5 as :
Σₖ₌₁ⁿ ((Xₖ − µ)/σ)² − (1/σ²) Σₖ₌₁ⁿ (Xₖ − X̄)² = ((X̄ − µ)/(σ/√n))²,
and then as
Σₖ₌₁ⁿ Zₖ² − (n/σ²) S² = Z²,
where S² is the sample variance, and Z and the Zₖ are standard normal because the Xₖ are normal. Finally, we can write the above as
(n/σ²) S² = χ²ₙ − χ²₁.
Theorem
For samples from a normal distribution, (n/σ²) S² has the χ²ₙ₋₁ distribution !
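A simulation sketch of the theorem, comparing the empirical mean and variance of (n/σ²)S² with n − 1 and 2(n − 1) (assuming numpy; the parameters are arbitrary) :

import numpy as np

rng = np.random.default_rng(4)
n, trials, sigma = 10, 100_000, 1.5
x = rng.normal(0.0, sigma, size=(trials, n))

q = n * x.var(axis=1, ddof=0) / sigma ** 2  # n S^2 / sigma^2
print(q.mean(), q.var())                    # close to n-1 = 9 and 2(n-1) = 18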
Remark
If we use the alternate definition
Ŝ² ≡ (1/(n − 1)) Σₖ₌₁ⁿ (Xₖ − X̄)²,
then, since (n/σ²) S² = ((n − 1)/σ²) Ŝ², the statistic ((n − 1)/σ²) Ŝ² also has the χ²ₙ₋₁ distribution.
Maximum likelihood estimation
Definition
The maximum likelihood estimate θ̂ is the value of θ that maximizes the likelihood f(x₁, x₂, . . . , xₙ ; θ).
Example 1
For our normal distribution with mean 0, we have
f(x₁, x₂, . . . , xₙ ; σ) = (1/(√(2π) σ))ⁿ e^(−(1/(2σ²)) Σₖ₌₁ⁿ xₖ²).
To find the maximum (with respect to σ), we set
(d/dσ) f(x₁, x₂, . . . , xₙ ; σ) = 0,
or, equivalently (dropping the constant factor (2π)⁻ⁿ/², which does not affect the maximizer), we set
(d/dσ) log[(1/σⁿ) e^(−(1/(2σ²)) Σₖ₌₁ⁿ xₖ²)] = 0.
Taking the (natural) logarithm gives
(d/dσ) [−(1/(2σ²)) Σₖ₌₁ⁿ xₖ² − n log σ] = 0.
Example 1 : continued
We had
(d/dσ) [−(1/(2σ²)) Σₖ₌₁ⁿ xₖ² − n log σ] = 0.
Differentiating gives
(1/σ³) Σₖ₌₁ⁿ xₖ² − n/σ = 0,
from which
σ̂² = (1/n) Σₖ₌₁ⁿ xₖ².
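A sketch checking this closed form against a direct numerical maximization of the log-likelihood (assuming numpy/scipy; the simulated data and the search bounds are arbitrary) :

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.normal(0.0, 2.0, size=500)          # mean-zero normal data

def neg_log_likelihood(sigma):
    # negative of: -(1/(2 sigma^2)) sum xk^2 - n log sigma
    return np.sum(x ** 2) / (2 * sigma ** 2) + len(x) * np.log(sigma)

res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method="bounded")
print(res.x ** 2, np.mean(x ** 2))          # numerical vs closed-form sigma^2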
Example 2
Consider the special exponential density function
f(x ; λ) = λ² x e^(−λx) for x > 0, and f(x ; λ) = 0 for x ≤ 0.
Assuming independence, the likelihood of the sample is
f(x₁, x₂, . . . , xₙ ; λ) = λ²ⁿ x₁x₂ · · · xₙ e^(−λ(x₁+x₂+...+xₙ)),
and to find the maximum (with respect to λ) we set
(d/dλ) log[λ²ⁿ x₁x₂ · · · xₙ e^(−λ(x₁+x₂+...+xₙ))] = 0.
Taking the logarithm gives
(d/dλ) [2n log λ + Σₖ₌₁ⁿ log xₖ − λ Σₖ₌₁ⁿ xₖ] = 0.
Example 2 : continued
We had
(d/dλ) [2n log λ + Σₖ₌₁ⁿ log xₖ − λ Σₖ₌₁ⁿ xₖ] = 0.
Differentiating gives
2n/λ − Σₖ₌₁ⁿ xₖ = 0,
from which
λ̂ = 2n / Σₖ₌₁ⁿ xₖ.
Thus, we have derived the maximum likelihood estimate
λ̂ = 2n / Σₖ₌₁ⁿ Xₖ = 2/X̄.
NOTE : This result suggests that perhaps E[X] = 2/λ.
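Indeed, f(x ; λ) is a Gamma density with shape 2 and rate λ, for which E[X] = 2/λ. A simulation sketch (assuming numpy, which parameterizes the gamma by scale = 1/rate; λ = 3 is arbitrary) :

import numpy as np

rng = np.random.default_rng(6)
lam = 3.0
x = rng.gamma(shape=2.0, scale=1.0 / lam, size=100_000)

print(x.mean(), 2 / lam)                    # E[X] = 2/lambda, both ~ 0.667
print(2 / x.mean())                         # lambda-hat = 2/xbar, ~ 3.0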
NOTE :
Maximum likelihood estimates also work in the discrete case.
In such cases, we maximize the probability mass function.
Example 3
Find the maximum likelihood estimator of p in the Bernoulli trial
P(X = 1) = p,    P(X = 0) = 1 − p.
We can write
P(x ; p) ≡ P(X = x) = pˣ (1 − p)¹⁻ˣ    (x = 0, 1),
so, assuming independence, the joint probability mass function is
P(x₁, x₂, . . . , xₙ ; p) = p^(Σₖ₌₁ⁿ xₖ) · (1 − p)^(n − Σₖ₌₁ⁿ xₖ).
To find the maximum (with respect to p), we set
(d/dp) log[p^(Σₖ₌₁ⁿ xₖ) · (1 − p)^(n − Σₖ₌₁ⁿ xₖ)] = 0.
Example 3 : continued
Taking the logarithm gives
(d/dp) [Σₖ₌₁ⁿ xₖ log p + (n − Σₖ₌₁ⁿ xₖ) log(1 − p)] = 0.
Differentiating gives
(1/p) Σₖ₌₁ⁿ xₖ − n/(1 − p) + (1/(1 − p)) Σₖ₌₁ⁿ xₖ = 0.
Multiplying by 1 − p gives
((1 − p)/p) Σₖ₌₁ⁿ xₖ + Σₖ₌₁ⁿ xₖ = (1/p) Σₖ₌₁ⁿ xₖ = n,
from which
p̂ = (1/n) Σₖ₌₁ⁿ xₖ = x̄.
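A one-line check that p̂ = x̄ behaves as expected (a sketch, assuming numpy; p = 0.3 is arbitrary) :

import numpy as np

rng = np.random.default_rng(7)
p = 0.3
x = rng.binomial(1, p, size=10_000)         # Bernoulli(p) outcomes
print(x.mean())                             # p-hat = xbar, close to 0.3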
Interval estimation
Even the most efficient unbiased estimator is unlikely to estimate the population parameter exactly.
It is true that estimation accuracy increases with large samples, but there is still no reason we
should expect a point estimate from a given sample to be exactly equal to the population
parameter it is supposed to estimate. There are many situations in which it is preferable to
determine an interval within which we would expect to find the value of the parameter. Such an
interval is called an interval estimate.
An interval estimate of a population parameter θ is an interval of the form θ̂L < θ < θ̂U , where θ̂L
and θ̂U depend on the value of the statistic Θ̂ for a particular sample and also on the sampling
distribution of Θ̂.
Since different samples will generally yield different values of Θ̂ and, therefore, different values for
θ̂L and θ̂U , these endpoints of the interval are values of corresponding random variables Θ̂L and
Θ̂U . From the sampling distribution of Θ̂, we shall be able to determine Θ̂L and Θ̂U such that
P(Θ̂L < θ < Θ̂U ) is equal to any positive fractional value we care to specify. If, for instance, we
find Θ̂L and Θ̂U such that
P(Θ̂L < θ < Θ̂U ) = 1 − α,
for 0 < α < 1, then we have a probability of 1 − α of selecting a random sample that will produce
an interval containing θ. The interval θ̂L < θ < θ̂U , computed from the selected sample, is called a
100(1 − α)% confidence interval, the fraction 1 − α is called the confidence coefficient or the
degree of confidence, and the endpoints, θ̂L and θ̂U , are called the lower and upper confidence
limits.
Estimating the mean
According to the Central Limit Theorem, we can expect the sampling distribution of X̄ to be approximately normally distributed with mean µ_X̄ = µ and standard deviation σ_X̄ = σ/√n. Writing z_{α/2} for the z-value above which we find an area of α/2 under the normal curve, we can see that
P(−z_{α/2} < Z < z_{α/2}) = 1 − α,    where Z = (X̄ − µ)/(σ/√n).
Hence,
P(−z_{α/2} < (X̄ − µ)/(σ/√n) < z_{α/2}) = 1 − α.
Multiplying each term in the inequality by σ/√n, and then subtracting X̄ from each term and multiplying by −1 (reversing the sense of the inequalities), we obtain
P(X̄ − z_{α/2} σ/√n < µ < X̄ + z_{α/2} σ/√n) = 1 − α.
Example
The average zinc concentration recovered from a sample of measurements taken in 36 different
locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence
intervals for the mean zinc concentration in the river. Assume that the population standard
deviation is 0.3 gram per milliliter.
Solution
The point estimate of µ is x̄ = 2.6. The z-value leaving an area of 0.025 to the right, and therefore an area of 0.975 to the left, is z_{0.025} = 1.96. Hence, the 95% confidence interval is
2.6 − (1.96)(0.3/√36) < µ < 2.6 + (1.96)(0.3/√36),
which reduces to 2.50 < µ < 2.70. To find a 99% confidence interval, we find the z-value leaving an area of 0.005 to the right and 0.995 to the left. From Table A.3 again, z_{0.005} = 2.575, and the 99% confidence interval is
2.6 − (2.575)(0.3/√36) < µ < 2.6 + (2.575)(0.3/√36),
or simply
2.47 < µ < 2.73.
We now see that a longer interval is required to estimate µ with a higher degree of confidence.
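Both intervals can be reproduced with scipy's normal quantiles (a sketch, assuming scipy is available) :

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 2.6, 0.3, 36
for conf in (0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2)        # z_{alpha/2}
    half = z * sigma / sqrt(n)
    print(conf, round(xbar - half, 2), round(xbar + half, 2))
# 0.95 -> (2.5, 2.7)    0.99 -> (2.47, 2.73)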
Student t-distribution
Let Z be a standard normal random variable and V a chi-squared random variable with v degrees of freedom. If Z and V are independent, then the random variable
T = Z / √(V/v)
has the t-distribution with v degrees of freedom, whose density is
h(t) = Γ((v + 1)/2) / (Γ(v/2) √(πv)) · (1 + t²/v)^(−(v+1)/2),    for −∞ < t < ∞.
When σ is unknown, T = (X̄ − µ)/(S/√n) can be used to construct a confidence interval on µ. The procedure is the same as that with σ known, except that σ is replaced by the sample standard deviation S and the standard normal distribution is replaced by the t-distribution. By the symmetry of the t-distribution, we can assert that
P(−t_{α/2} < T < t_{α/2}) = 1 − α,
where t_{α/2} is the t-value with v = n − 1 degrees of freedom, above which we find an area of α/2 ; an equal area of α/2 falls to the left of −t_{α/2}. Substituting for T, we obtain
P(X̄ − t_{α/2} S/√n < µ < X̄ + t_{α/2} S/√n) = 1 − α.
Example
The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2, and
9.6 liters. Find a 95% confidence interval for the mean contents of all such containers, assuming
an approximately normal distribution.
Solution
The sample mean and standard deviation for the given data are x̄ = 10.0 and s = 0.283. We find t_{0.025} = 2.447 for v = 6 degrees of freedom. Hence, the 95% confidence interval for µ is
10.0 − (2.447)(0.283/√7) < µ < 10.0 + (2.447)(0.283/√7),
which reduces to 9.74 < µ < 10.26.
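The same interval via scipy's t quantile (a sketch) :

import numpy as np
from scipy.stats import t

x = np.array([9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6])
n = len(x)
s = x.std(ddof=1)                           # sample sd with n-1 divisor, ~0.283
half = t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
print(x.mean() - half, x.mean() + half)     # ~ (9.74, 10.26)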
Estimating the variance
If a sample of size n is drawn from a normal population with variance σ² and the sample variance s² is computed, we obtain a value of the statistic S². This computed sample variance is used as a point estimate of σ². Hence, the statistic S² is called an estimator of σ².
An interval estimate of σ² can be established by using the statistic
X² = (n − 1)S²/σ²,
where S² here is the sample variance with divisor n − 1 (written Ŝ² earlier). The statistic X² has a chi-squared distribution with n − 1 degrees of freedom when samples are chosen from a normal population. We may write
P(χ²_{1−α/2} < X² < χ²_{α/2}) = 1 − α,
where χ²_{1−α/2} and χ²_{α/2} are values of the chi-squared distribution with n − 1 degrees of freedom, leaving areas of 1 − α/2 and α/2, respectively, to the right.
Dividing each term in the inequality by (n − 1)S² and then inverting each term (thereby changing the sense of the inequalities), we obtain
P((n − 1)S²/χ²_{α/2} < σ² < (n − 1)S²/χ²_{1−α/2}) = 1 − α,
where χ²_{α/2} and χ²_{1−α/2} are χ²-values with v = n − 1 degrees of freedom, leaving areas of α/2 and 1 − α/2, respectively, to the right.
Example
The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain
company : 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence
interval for the variance of the weights of all such packages of grass seed distributed by this
company, assuming a normal population.
Solution
First, we find
s² = [n Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²] / [n(n − 1)] = [(10)(21,273.12) − (461.2)²] / [(10)(9)] = 0.286.
To obtain a 95% confidence interval, we choose α = 0.05. Then, we find χ²_{0.025} = 19.023 and χ²_{0.975} = 2.700. Therefore, the 95% confidence interval for σ² is
(9)(0.286)/19.023 < σ² < (9)(0.286)/2.700,
or simply
0.135 < σ² < 0.953.
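The interval can be reproduced with scipy's chi-squared quantiles (a sketch) :

import numpy as np
from scipy.stats import chi2

x = np.array([46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0])
n = len(x)
s2 = x.var(ddof=1)                              # ~ 0.286

lo = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)   # chi2_{0.025} = 19.023
hi = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)   # chi2_{0.975} = 2.700
print(lo, hi)                                   # ~ (0.135, 0.953)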
THANK YOU