Chapter 2
INFERENTIAL STATISTICS
FAROUK MSELMI

Outline :
Sample statistics
Interval estimation
    Estimating the mean
    Estimating the variance
The chi-squared random variable
χ²ₙ ≡ X̃₁ + X̃₂ + . . . + X̃ₙ,
where X̃ᵢ = Xᵢ² and each Xᵢ is standard normal for i = 1, 2, . . . , n, starts looking like a normal density function when n gets large.
Theorem
(The Central Limit Theorem) (CLT) : Let X₁, X₂, . . . , Xₙ be independent, identically distributed random variables, each having mean µ, variance σ², and standard deviation σ. Then for large n, the random variable
S ≡ X₁ + X₂ + . . . + Xₙ,
which has mean nµ, variance nσ², and standard deviation √n σ, is approximately normal.
NOTE : Thus, (S − nµ)/(√n σ) is approximately standard normal.
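A minimal simulation sketch of the theorem (an illustration, assuming Python with numpy; the Uniform(0, 1) population, n = 30, the seed, and the trial count are arbitrary choices) :

import numpy as np

rng = np.random.default_rng(0)
n, trials = 30, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)            # mean and sd of Uniform(0, 1)

X = rng.uniform(0.0, 1.0, size=(trials, n))
S = X.sum(axis=1)                           # S = X1 + ... + Xn
Z = (S - n * mu) / (np.sqrt(n) * sigma)     # standardized sum

print(Z.mean(), Z.std())                    # close to 0 and 1
print(np.mean(Z <= 1.0))                    # close to Phi(1) = 0.8413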
Example
Recall that χ²ₙ ≡ X₁² + X₂² + . . . + Xₙ², where each Xᵢ is standard normal, and (using moments) we found that χ²ₙ has mean n and standard deviation √(2n). The CLT approximation gives
P(χ²ₙ ≤ 0) ≈ Φ((0 − n)/√(2n)) = Φ(−√(n/2)).
Since χ²ₙ is always positive, the exact value of P(χ²ₙ ≤ 0) is 0, so the table below illustrates the accuracy of the approximation : the error shrinks as n grows.

n     Φ(−√(n/2))
2     0.1587
8     0.0228
18    0.0013
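These entries can be checked directly with scipy's normal CDF (a quick sketch, assuming scipy is available) :

from math import sqrt
from scipy.stats import norm

for n in (2, 8, 18):
    print(n, round(norm.cdf(-sqrt(n / 2)), 4))
# 2  0.1587
# 8  0.0228
# 18 0.0013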
Similarly, for large n the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ),
which has mean µ, variance σ²/n, and standard deviation σ/√n, is approximately normal.
NOTE : Thus, (X̄ − µ)/(σ/√n) is approximately standard normal.
Example 1
Suppose X₁, X₂, . . . , Xₙ are independent, identically distributed uniform random variables, each having density function
f(x) = 1/2 for x ∈ [−1, 1], and f(x) = 0 otherwise.
Each Xᵢ then has mean 0 and variance 1/3, so for large n the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ)
is approximately normal with mean 0 and variance 1/(3n).
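A short simulation sketch of this claim (assuming numpy; n = 50, the seed, and the trial count are arbitrary) :

import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 100_000
xbar = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)

print(xbar.mean())                          # close to 0
print(xbar.std(), np.sqrt(1 / (3 * n)))     # both close to 0.0816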
Example 2
The Central Limit Theorem (CLT) also applies to discrete random variables. The binomial random variable, with probability mass function
P(X = k) = (n choose k) pᵏ (1 − p)ⁿ⁻ᵏ,    0 ≤ k ≤ n,
is already a sum (namely, of n independent Bernoulli random variables). Thus, for large n, its probability mass function already "looks normal."
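A quick sketch comparing the binomial pmf with the matching N(np, np(1 − p)) density at a few points (assuming scipy; n = 40 and p = 0.3 are arbitrary choices) :

from scipy.stats import binom, norm

n, p = 40, 0.3
mu, sd = n * p, (n * p * (1 - p)) ** 0.5    # matching normal parameters

for k in (8, 12, 16):
    print(k, binom.pmf(k, n, p), norm.pdf(k, mu, sd))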
Sample statistics
Definition
A random sample from a population consists of independent, identically distributed random
variables, X1 , X2 , . . . , Xn .
The values of the Xi are called the outcomes of the experiment.
A statistic is a function of X1 , X2 , . . . , Xn .
Thus, a statistic itself is a random variable.
Example
The most important statistics are :
The sample mean X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ).
The sample variance S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)² (to be discussed in detail).
The sample standard deviation S = √S².
The order statistic, in which the observations are arranged in increasing order.
The sample median, which is the middle value of the order statistic (if n is odd), or the average of the two middle values (if n is even).
The sample range : the difference between the largest and the smallest observation.
Example
For the 8 observations −0.737, 0.511, −0.083, 0.066, −0.562, −0.906, 0.358, 0.359 from the first row of the Table given earlier, we have :
Sample mean : X̄ = (1/8)(−0.737 + 0.511 − 0.083 + 0.066 − 0.562 − 0.906 + 0.358 + 0.359) = −0.124.
Sample variance : S² = 0.26.
Sample standard deviation : S = √0.26 = 0.51.
The order statistic : −0.906, −0.737, −0.562, −0.083, 0.066, 0.358, 0.359, 0.511.
The sample median : (−0.083 + 0.066)/2 = −0.0085.
The sample range : 0.511 − (−0.906) = 1.417.
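These values can be reproduced with numpy (a sketch; note that ddof=0 gives the 1/n sample variance used in these notes) :

import numpy as np

x = np.array([-0.737, 0.511, -0.083, 0.066, -0.562, -0.906, 0.358, 0.359])
print(x.mean())             # sample mean ~ -0.124
print(x.var(ddof=0))        # sample variance ~ 0.26 (divides by n)
print(x.std(ddof=0))        # sample standard deviation ~ 0.51
print(np.sort(x))           # order statistic
print(np.median(x))         # sample median = -0.0085
print(x.max() - x.min())    # sample range = 1.417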
Point estimation
A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂. What are the
desirable properties of a “good” decision function that would influence us to choose one estimator
rather than another ? Let Θ̂ be an estimator whose value θ̂ is a point estimate of some unknown
population parameter θ. Certainly, we would like the sampling distribution of Θ̂ to have a mean
equal to the parameter estimated. An estimator possessing this property is said to be unbiased.
Definitions
1 A statistic Θ̂ is said to be an unbiased estimator of the parameter θ if
µΘ̂ = E(Θ̂) = θ.
2 If we consider all possible unbiased estimators of some parameter θ, the one with the
smallest variance is called the most efficient estimator of θ.
Suppose the population mean and standard deviation are µ and σ. As before, the sample mean
X̄ ≡ (1/n)(X₁ + X₂ + . . . + Xₙ)
is itself a random variable, with expected value
µ_X̄ ≡ E[X̄] = E[(1/n)(X₁ + X₂ + . . . + Xₙ)] = µ,
variance
σ²_X̄ ≡ Var(X̄) = σ²/n,
and standard deviation
σ_X̄ = σ/√n.
NOTE : The sample mean approximates the population mean µ.
Similarly, the sample variance S² and the population variance σ² are quite different. Nevertheless, we will show that for large n their values are close ! Thus, for large n, we have the approximation
S² ≈ σ².
FACT 2 : From
σ² ≡ Var(X) ≡ E[(X − µ)²] = E[X²] − µ²,
we (obviously) have
E[X²] = σ² + µ².
FACT 3 : Recall that for independent, identically distributed Xₖ, where each Xₖ has mean µ and variance σ², we have
µ_X̄ ≡ E[X̄] = µ    and    σ²_X̄ ≡ E[(X̄ − µ)²] = σ²/n.
FACT 4 (useful for computing S² efficiently) :
S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)² = [(1/n) Σₖ₌₁ⁿ Xₖ²] − X̄².
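A quick numerical check of Fact 4 (a sketch, assuming numpy; the data are arbitrary normal draws) :

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)

lhs = np.mean((x - x.mean()) ** 2)          # (1/n) sum (xk - xbar)^2
rhs = np.mean(x ** 2) - x.mean() ** 2       # (1/n) sum xk^2 - xbar^2
print(np.isclose(lhs, rhs))                 # True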
Theorem
The sample variance
S² ≡ (1/n) Σₖ₌₁ⁿ (Xₖ − X̄)²
has expected value
E[S²] = (1 − 1/n) σ².
Proof :
E[S²] = E[(1/n) Σₖ₌₁ⁿ (Xₖ − X̄)²] = (1/n) Σₖ₌₁ⁿ E[Xₖ²] − E[X̄²]    (using Fact 4)
= (1/n) Σₖ₌₁ⁿ (σ² + µ²) − E[X̄²]    (using Fact 2)
= σ² + µ² − (σ²/n + µ²)    (using Fact 3)
= (1 − 1/n) σ².
Remark
Thus, limₙ→∞ E[S²] = σ². The 1/n sample variance is biased, but the bias vanishes as n grows.
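A simulation sketch of the bias factor 1 − 1/n (assuming numpy; n = 5, σ = 2, and the trial count are arbitrary) :

import numpy as np

rng = np.random.default_rng(3)
n, trials, sigma = 5, 200_000, 2.0
x = rng.normal(0.0, sigma, size=(trials, n))

s2 = x.var(axis=1, ddof=0)                  # 1/n sample variance
print(s2.mean(), (1 - 1 / n) * sigma ** 2)  # both close to 3.2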
Theorem
The sample variance
Ŝ² ≡ (1/(n − 1)) Σₖ₌₁ⁿ (Xₖ − X̄)²
is an unbiased estimator of σ² ; indeed, Ŝ² = (n/(n − 1)) S², so E[Ŝ²] = (n/(n − 1))(1 − 1/n) σ² = σ².
How good is this approximation for normal random variables Xₖ ? To answer this, we need :
FACT 5 :
Σₖ₌₁ⁿ (Xₖ − µ)² − Σₖ₌₁ⁿ (Xₖ − X̄)² = n(X̄ − µ)².
Rewrite Fact 5 as :
Σₖ₌₁ⁿ ((Xₖ − µ)/σ)² − (1/σ²) Σₖ₌₁ⁿ (Xₖ − X̄)² = ((X̄ − µ)/(σ/√n))²,
and then as
Σₖ₌₁ⁿ Zₖ² − (n/σ²) S² = Z²,
where S² is the sample variance, and Z and the Zₖ are standard normal because the Xₖ are normal. Finally, we can write the above as
(n/σ²) S² = χ²ₙ − χ²₁.
Theorem
For samples from a normal distribution, (n/σ²) S² has the χ²ₙ₋₁ distribution !
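A simulation sketch of the theorem, comparing the empirical mean and variance of (n/σ²)S² with n − 1 and 2(n − 1) (assuming numpy; the parameters are arbitrary) :

import numpy as np

rng = np.random.default_rng(4)
n, trials, sigma = 10, 100_000, 1.5
x = rng.normal(0.0, sigma, size=(trials, n))

q = n * x.var(axis=1, ddof=0) / sigma ** 2  # n S^2 / sigma^2
print(q.mean(), q.var())                    # close to n-1 = 9 and 2(n-1) = 18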
Remark
If we use the alternate definition
Ŝ² ≡ (1/(n − 1)) Σₖ₌₁ⁿ (Xₖ − X̄)²,
then, since (n/σ²) S² = ((n − 1)/σ²) Ŝ², the statistic ((n − 1)/σ²) Ŝ² also has the χ²ₙ₋₁ distribution.
Maximum likelihood estimation
Definition
The maximum likelihood estimate θ̂ is the value of θ that maximizes the likelihood f(x₁, x₂, . . . , xₙ ; θ).
Example 1
For our normal distribution with mean 0, we have
f(x₁, x₂, . . . , xₙ ; σ) = (1/(√(2π) σ))ⁿ e^(−(1/(2σ²)) Σₖ₌₁ⁿ xₖ²).
To find the maximum (with respect to σ), we set
(d/dσ) f(x₁, x₂, . . . , xₙ ; σ) = 0,
or, equivalently (dropping the constant factor (2π)⁻ⁿ/², which does not affect the maximizer), we set
(d/dσ) log[(1/σⁿ) e^(−(1/(2σ²)) Σₖ₌₁ⁿ xₖ²)] = 0.
Taking the (natural) logarithm gives
(d/dσ) [−(1/(2σ²)) Σₖ₌₁ⁿ xₖ² − n log σ] = 0.
Example 1 : continued
We had
(d/dσ) [−(1/(2σ²)) Σₖ₌₁ⁿ xₖ² − n log σ] = 0.
Differentiating gives
(1/σ³) Σₖ₌₁ⁿ xₖ² − n/σ = 0,
from which
σ̂² = (1/n) Σₖ₌₁ⁿ xₖ².
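A sketch checking this closed form against a direct numerical maximization of the log-likelihood (assuming numpy/scipy; the simulated data and the search bounds are arbitrary) :

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.normal(0.0, 2.0, size=500)          # mean-zero normal data

def neg_log_likelihood(sigma):
    # negative of: -(1/(2 sigma^2)) sum xk^2 - n log sigma
    return np.sum(x ** 2) / (2 * sigma ** 2) + len(x) * np.log(sigma)

res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method="bounded")
print(res.x ** 2, np.mean(x ** 2))          # numerical vs closed-form sigma^2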
Example 2
Consider the special exponential density function
f(x ; λ) = λ² x e^(−λx) for x > 0, and f(x ; λ) = 0 for x ≤ 0.
Assuming independence, the likelihood of the sample is
f(x₁, x₂, . . . , xₙ ; λ) = λ²ⁿ x₁x₂ · · · xₙ e^(−λ(x₁+x₂+...+xₙ)),
and to find the maximum (with respect to λ) we set
(d/dλ) log[λ²ⁿ x₁x₂ · · · xₙ e^(−λ(x₁+x₂+...+xₙ))] = 0.
Taking the logarithm gives
(d/dλ) [2n log λ + Σₖ₌₁ⁿ log xₖ − λ Σₖ₌₁ⁿ xₖ] = 0.
Example 2 : continued
We had
(d/dλ) [2n log λ + Σₖ₌₁ⁿ log xₖ − λ Σₖ₌₁ⁿ xₖ] = 0.
Differentiating gives
2n/λ − Σₖ₌₁ⁿ xₖ = 0,
from which
λ̂ = 2n / Σₖ₌₁ⁿ xₖ.
Thus, we have derived the maximum likelihood estimate
λ̂ = 2n / Σₖ₌₁ⁿ Xₖ = 2/X̄.
NOTE : This result suggests that perhaps E[X] = 2/λ.
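Indeed, f(x ; λ) is a Gamma density with shape 2 and rate λ, for which E[X] = 2/λ. A simulation sketch (assuming numpy, which parameterizes the gamma by scale = 1/rate; λ = 3 is arbitrary) :

import numpy as np

rng = np.random.default_rng(6)
lam = 3.0
x = rng.gamma(shape=2.0, scale=1.0 / lam, size=100_000)

print(x.mean(), 2 / lam)                    # E[X] = 2/lambda, both ~ 0.667
print(2 / x.mean())                         # lambda-hat = 2/xbar, ~ 3.0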
NOTE :
Maximum likelihood estimates also work in the discrete case.
In such cases, we maximize the probability mass function.
Example 3
Find the maximum likelihood estimator of p in the Bernoulli trial
P(X = 1) = p,    P(X = 0) = 1 − p.
We can write
P(x ; p) ≡ P(X = x) = pˣ (1 − p)¹⁻ˣ    (x = 0, 1),
so, assuming independence, the joint probability mass function is
P(x₁, x₂, . . . , xₙ ; p) = p^(Σₖ₌₁ⁿ xₖ) · (1 − p)^(n − Σₖ₌₁ⁿ xₖ).
To find the maximum (with respect to p), we set
(d/dp) log[p^(Σₖ₌₁ⁿ xₖ) · (1 − p)^(n − Σₖ₌₁ⁿ xₖ)] = 0.
Example 3 : continued
Taking the logarithm gives
(d/dp) [Σₖ₌₁ⁿ xₖ log p + (n − Σₖ₌₁ⁿ xₖ) log(1 − p)] = 0.
Differentiating gives
(1/p) Σₖ₌₁ⁿ xₖ − n/(1 − p) + (1/(1 − p)) Σₖ₌₁ⁿ xₖ = 0.
Multiplying by 1 − p gives
((1 − p)/p) Σₖ₌₁ⁿ xₖ + Σₖ₌₁ⁿ xₖ = (1/p) Σₖ₌₁ⁿ xₖ = n,
from which
p̂ = (1/n) Σₖ₌₁ⁿ xₖ = x̄.
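A one-line check that p̂ = x̄ behaves as expected (a sketch, assuming numpy; p = 0.3 is arbitrary) :

import numpy as np

rng = np.random.default_rng(7)
p = 0.3
x = rng.binomial(1, p, size=10_000)         # Bernoulli(p) outcomes
print(x.mean())                             # p-hat = xbar, close to 0.3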
Interval estimation
Even the most efficient unbiased estimator is unlikely to estimate the population parameter exactly.
It is true that estimation accuracy increases with large samples, but there is still no reason we
should expect a point estimate from a given sample to be exactly equal to the population
parameter it is supposed to estimate. There are many situations in which it is preferable to
determine an interval within which we would expect to find the value of the parameter. Such an
interval is called an interval estimate.
An interval estimate of a population parameter θ is an interval of the form θ̂L < θ < θ̂U , where θ̂L
and θ̂U depend on the value of the statistic Θ̂ for a particular sample and also on the sampling
distribution of Θ̂.
Since different samples will generally yield different values of Θ̂ and, therefore, different values for
θ̂L and θ̂U , these endpoints of the interval are values of corresponding random variables Θ̂L and
Θ̂U . From the sampling distribution of Θ̂, we shall be able to determine Θ̂L and Θ̂U such that
P(Θ̂L < θ < Θ̂U ) is equal to any positive fractional value we care to specify. If, for instance, we
find Θ̂L and Θ̂U such that
P(Θ̂L < θ < Θ̂U ) = 1 − α,
for 0 < α < 1, then we have a probability of 1 − α of selecting a random sample that will produce
an interval containing θ. The interval θ̂L < θ < θ̂U , computed from the selected sample, is called a
100(1 − α)% confidence interval, the fraction 1 − α is called the confidence coefficient or the
degree of confidence, and the endpoints, θ̂L and θ̂U , are called the lower and upper confidence
limits.
Estimating the mean
According to the Central Limit Theorem, we can expect the sampling distribution of X̄ to be approximately normally distributed with mean µ_X̄ = µ and standard deviation σ_X̄ = σ/√n. Writing z_{α/2} for the z-value above which we find an area of α/2 under the normal curve, we can see that
P(−z_{α/2} < Z < z_{α/2}) = 1 − α,    where Z = (X̄ − µ)/(σ/√n).
Hence,
P(−z_{α/2} < (X̄ − µ)/(σ/√n) < z_{α/2}) = 1 − α.
Multiplying each term in the inequality by σ/√n, and then subtracting X̄ from each term and multiplying by −1 (reversing the sense of the inequalities), we obtain
P(X̄ − z_{α/2} σ/√n < µ < X̄ + z_{α/2} σ/√n) = 1 − α.
Example
The average zinc concentration recovered from a sample of measurements taken in 36 different
locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence
intervals for the mean zinc concentration in the river. Assume that the population standard
deviation is 0.3 gram per milliliter.
Solution
The point estimate of µ is x̄ = 2.6. The z-value leaving an area of 0.025 to the right, and therefore an area of 0.975 to the left, is z_{0.025} = 1.96. Hence, the 95% confidence interval is
2.6 − (1.96)(0.3/√36) < µ < 2.6 + (1.96)(0.3/√36),
which reduces to 2.50 < µ < 2.70. To find a 99% confidence interval, we find the z-value leaving an area of 0.005 to the right and 0.995 to the left. From Table A.3 again, z_{0.005} = 2.575, and the 99% confidence interval is
2.6 − (2.575)(0.3/√36) < µ < 2.6 + (2.575)(0.3/√36),
or simply
2.47 < µ < 2.73.
We now see that a longer interval is required to estimate µ with a higher degree of confidence.
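Both intervals can be reproduced with scipy's normal quantiles (a sketch, assuming scipy is available) :

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 2.6, 0.3, 36
for conf in (0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2)        # z_{alpha/2}
    half = z * sigma / sqrt(n)
    print(conf, round(xbar - half, 2), round(xbar + half, 2))
# 0.95 -> (2.5, 2.7)    0.99 -> (2.47, 2.73)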
Student t-distribution
Let Z be a standard normal random variable and V a chi-squared random variable with v degrees of freedom. If Z and V are independent, then the random variable
T = Z / √(V/v)
has the t-distribution with v degrees of freedom, whose density is
h(t) = Γ((v + 1)/2) / (Γ(v/2) √(πv)) · (1 + t²/v)^(−(v+1)/2),    for −∞ < t < ∞.
When σ is unknown, T = (X̄ − µ)/(S/√n) can be used to construct a confidence interval on µ. The procedure is the same as that with σ known, except that σ is replaced by the sample standard deviation S and the standard normal distribution is replaced by the t-distribution. By the symmetry of the t-distribution, we can assert that
P(−t_{α/2} < T < t_{α/2}) = 1 − α,
where t_{α/2} is the t-value with v = n − 1 degrees of freedom, above which we find an area of α/2 ; an equal area of α/2 falls to the left of −t_{α/2}. Substituting for T, we obtain
P(X̄ − t_{α/2} S/√n < µ < X̄ + t_{α/2} S/√n) = 1 − α.
Example
The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2, and
9.6 liters. Find a 95% confidence interval for the mean contents of all such containers, assuming
an approximately normal distribution.
Solution
The sample mean and standard deviation for the given data are x̄ = 10.0 and s = 0.283. We find t_{0.025} = 2.447 for v = 6 degrees of freedom. Hence, the 95% confidence interval for µ is
10.0 − (2.447)(0.283/√7) < µ < 10.0 + (2.447)(0.283/√7),
which reduces to 9.74 < µ < 10.26.
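The same interval via scipy's t quantile (a sketch) :

import numpy as np
from scipy.stats import t

x = np.array([9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6])
n = len(x)
s = x.std(ddof=1)                           # sample sd with n-1 divisor, ~0.283
half = t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
print(x.mean() - half, x.mean() + half)     # ~ (9.74, 10.26)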
Estimating the variance
If a sample of size n is drawn from a normal population with variance σ² and the sample variance s² is computed, we obtain a value of the statistic S². This computed sample variance is used as a point estimate of σ². Hence, the statistic S² is called an estimator of σ².
An interval estimate of σ² can be established by using the statistic
X² = (n − 1)S²/σ²,
where S² here is the sample variance with divisor n − 1 (written Ŝ² earlier). The statistic X² has a chi-squared distribution with n − 1 degrees of freedom when samples are chosen from a normal population. We may write
P(χ²_{1−α/2} < X² < χ²_{α/2}) = 1 − α,
where χ²_{1−α/2} and χ²_{α/2} are values of the chi-squared distribution with n − 1 degrees of freedom, leaving areas of 1 − α/2 and α/2, respectively, to the right.
Dividing each term in the inequality by (n − 1)S² and then inverting each term (thereby changing the sense of the inequalities), we obtain
P((n − 1)S²/χ²_{α/2} < σ² < (n − 1)S²/χ²_{1−α/2}) = 1 − α,
where χ²_{α/2} and χ²_{1−α/2} are χ²-values with v = n − 1 degrees of freedom, leaving areas of α/2 and 1 − α/2, respectively, to the right.
Example
The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain
company : 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence
interval for the variance of the weights of all such packages of grass seed distributed by this
company, assuming a normal population.
Solution
First, we find
s² = [n Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²] / [n(n − 1)] = [(10)(21,273.12) − (461.2)²] / [(10)(9)] = 0.286.
To obtain a 95% confidence interval, we choose α = 0.05. Then, we find χ²_{0.025} = 19.023 and χ²_{0.975} = 2.700. Therefore, the 95% confidence interval for σ² is
(9)(0.286)/19.023 < σ² < (9)(0.286)/2.700,
or simply
0.135 < σ² < 0.953.
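The interval can be reproduced with scipy's chi-squared quantiles (a sketch) :

import numpy as np
from scipy.stats import chi2

x = np.array([46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0])
n = len(x)
s2 = x.var(ddof=1)                              # ~ 0.286

lo = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)   # chi2_{0.025} = 19.023
hi = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)   # chi2_{0.975} = 2.700
print(lo, hi)                                   # ~ (0.135, 0.953)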
THANK YOU