Chapter 2
The sampling distribution of the mean shows how far sample means could be from a
known population mean. Similarly, the sampling distribution of the proportion shows
how far sample proportions could be from a known population proportion. In estimation,
our aim is to determine how far an unknown population mean could be from the mean of
a simple random sample selected from that population; or how far an unknown
population proportion could be from a sample proportion. Those are the concerns of
statistical inference, in which a statement about an unknown population parameter is
derived from information contained in a random sample selected from the population.
A statistical population represents the set of all possible values for a variable. In practice,
we do not study the entire population. Instead, we use data in a sample to shed light on
the wider population. The process of generalizing from the sample to the population is
statistical inference.
Estimation is the process of using statistics as estimates of parameters. It is any
procedure in which sample information is used to estimate or predict the numerical
value of some population measure (called a parameter).
An estimator is any sample statistic that is used to estimate a population
parameter, e.g. $\bar{x}$ for μ and p for P.
Types of Estimates:
There are two types of estimates that we can make about a population: a point
estimate and an interval estimate.
A) A point estimate is a single number, measured from a sample, that is used as an
estimate of the corresponding unknown population parameter.
The most important point estimates are:
• Sample mean ($\bar{x}$) for the population mean (μ);
Solution
a. The sample mean, 35,420, approximates the population mean, so the point estimate
of μ is 35,420.
b. The confidence interval runs from about 35,170 to 35,670, found by
$$\bar{X} \pm 1.96\frac{s}{\sqrt{n}} = 35420 \pm 1.96\left(\frac{2050}{\sqrt{256}}\right) = 35168.88 \text{ and } 35671.13$$
c. The end points of the confidence interval are called the confidence limits. In this
case they are rounded to 35,170 and 35,670: 35,170 is the lower limit and 35,670 is
the upper limit.
d. Interpretation
If we select 100 samples of size 256 from the population of all middle managers and
compute the sample means and confidence intervals, the population mean annual income
would be found in about 95 out of the 100 confidence intervals. About 5 out of the 100
confidence intervals would not contain the population mean annual income.
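The repeated-sampling interpretation in (d) can be checked by simulation. The Python sketch below is illustrative only: it assumes a normal population whose true mean and standard deviation are set to 35,420 and 2,050 (the example's sample figures, treated here as known population values), draws 1,000 samples of size 256, and reports the fraction of 95% intervals $\bar{X} \pm 1.96\sigma/\sqrt{n}$ that contain the true mean. The helper names `z_interval` and `coverage` are hypothetical.

```python
import random
import statistics


def z_interval(sample, sigma, z=1.96):
    """Return (lower, upper) = sample mean ± z * sigma / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin


def coverage(mu, sigma, n, reps, z=1.96, seed=1):
    """Fraction of `reps` simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, z)
        if lower <= mu <= upper:
            hits += 1
    return hits / reps


# Assumed population values, for illustration only.
rate = coverage(mu=35420, sigma=2050, n=256, reps=1000)
print(f"{rate:.3f} of the intervals contain the true mean")
```

The printed fraction should be close to 0.95; its exact value depends on the seed, which is the sense in which "about 95 out of 100" intervals capture μ.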
$$\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Where:
$\bar{X}$ = sample mean
$Z_{\alpha/2}$ = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 − confidence level)
μ = unknown population mean
From the formula above, we can see that an interval estimate is constructed by adding the
error term (the margin of error) to the point estimate and subtracting it from the point
estimate. That is, the point estimate is found at the center of the confidence interval.
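As a sketch of the formula above (Python, hypothetical helper name `z_interval_from_stats`), the interval is the point estimate plus and minus the margin of error $Z_{\alpha/2}\sigma/\sqrt{n}$. For a large sample the sample standard deviation is used in place of σ, as in the middle-manager example.

```python
import math


def z_interval_from_stats(xbar, sd, n, z):
    """Return (lower, upper) = xbar ± z * sd / sqrt(n).

    sd is sigma when it is known; for a large sample the sample
    standard deviation s is used in its place.
    """
    margin = z * sd / math.sqrt(n)  # the error term added and subtracted
    return xbar - margin, xbar + margin


# Middle-manager example: xbar = 35,420, s = 2,050, n = 256, Z = 1.96.
lower, upper = z_interval_from_stats(35420, 2050, 256, 1.96)
print(round(lower, 2), round(upper, 2))
```

The midpoint of the returned pair is always the point estimate, which is the "center of the confidence interval" property described above.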
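To find the interval estimate of the population mean μ, we follow these steps.
1. Compute the standard error of the mean, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. When σ is unknown, first compute the sample standard deviation
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}.$$
Then the standard error of the mean, $\sigma_{\bar{X}}$, is estimated by
$$S_{\bar{X}} = \frac{S}{\sqrt{n}},$$
and the interval estimate becomes
$$\mu = \bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$$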
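The two-step calculation (sample standard deviation with the n − 1 divisor, then division by √n) can be sketched in Python as below. The data values are made up for illustration, and `standard_error` is a hypothetical helper name; `statistics.stdev` from the standard library uses the same n − 1 divisor, which the final check relies on.

```python
import math
import statistics


def standard_error(data):
    """Estimate the standard error of the mean as S / sqrt(n)."""
    n = len(data)
    xbar = sum(data) / n
    # Sample standard deviation with the n - 1 divisor.
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return s / math.sqrt(n)


data = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical observations
se = standard_error(data)

# statistics.stdev also divides by n - 1, so both routes agree.
assert math.isclose(se, statistics.stdev(data) / math.sqrt(len(data)))
print(round(se, 4))  # prints 0.7071
```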
Illustration-1: Suppose that a car rental firm in Addis wants to estimate the average
number of miles traveled by each of its cars rented. A random sample of 110 cars rented
reveals that the sample mean travel distance per day is 85.5 miles, with a sample
standard deviation of 19.3 miles. Compute a 99% confidence interval to estimate μ .
Solution:
n = 110 rented cars, $\bar{X}$ = 85.5 miles, s = 19.3 miles, C = 0.99
Step 1. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{19.3}{\sqrt{110}} = 1.84$
Step 2. α = 1 − C = 1 − 0.99 = 0.01
α/2 = 0.01/2 = 0.005
Step 3. $Z_{\alpha/2} = Z_{0.005} = 2.58$
Step 4. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}} = 85.5 \pm 2.58(1.84) = 85.5 \pm 4.75$
$80.75 \le \mu \le 90.25$
We state with 99% confidence that the average distance traveled by rented cars lies
between 80.75 and 90.25 miles.
Illustration-2: A study is being conducted in a company that has 800 engineers. A
random sample of 50 of these engineers reveals that the average sample age is 34.3 years,
and the sample standard deviation is 8 years. Assuming normality, construct a 98%
confidence interval to estimate the average age of all engineers in this company.
Solution:
Given: n = 50 engineers, N = 800 engineers, $\bar{X}$ = 34.3 years, s = 8 years, C = 0.98
Step 1. $S_{\bar{X}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} = 1.10$
Step 2. α = 1 − C = 1 − 0.98 = 0.02
α/2 = 0.02/2 = 0.01
Step 3. $Z_{\alpha/2} = Z_{0.01} = 2.33$
Step 4. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 34.3 \pm 2.33(1.10) = 34.3 \pm 2.56$
$31.74 \le \mu \le 36.86$
We state with 98% confidence that the mean age of engineers lies between 31.74 and
36.86 years.
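The Step 1 calculation, including the finite population correction factor $\sqrt{(N-n)/(N-1)}$, can be sketched in Python as follows (hypothetical helper name `se_with_fpc`). Unlike the hand calculation, it carries full precision instead of rounding the standard error to 1.10 first.

```python
import math


def se_with_fpc(s, n, N):
    """Standard error s / sqrt(n) scaled by the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return s / math.sqrt(n) * fpc


se = se_with_fpc(s=8, n=50, N=800)
margin = 2.33 * se  # Z_{0.01} = 2.33 for 98% confidence
lower, upper = 34.3 - margin, 34.3 + margin
print(round(se, 3), round(lower, 2), round(upper, 2))
```

Without intermediate rounding the standard error is about 1.096 and the limits are about 31.75 and 36.85; rounding the standard error to 1.10 first gives the 31.74 and 36.86 above.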
2.3.2. Interval Estimation of the Population Proportion
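We know that a sample proportion (p) is an unbiased estimator of a population
proportion P, and if the sample size is large, the sampling distribution of p is approximately
normal with mean P and standard deviation $\sigma_p = \sqrt{PQ/n}$, where Q = 1 − P. Hence
$$Z = \frac{p - P}{\sigma_p} = \frac{p - P}{\sqrt{\frac{PQ}{n}}}$$
Because P is unknown, $\sigma_p$ is substituted by
$$S_p = \sqrt{\frac{pq}{n}},$$
where q = 1 − p, and the interval estimate of P is
$$P = p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = p \pm Z_{\alpha/2} S_p$$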
i. $S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.61 \times 0.39}{87}} = 0.0523$
ii. α = 1 − C = 1 − 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $P = p \pm Z_{\alpha/2} S_p = 0.39 \pm 1.96(0.0523) = 0.39 \pm 0.1025$
$0.2875 \le P \le 0.4925$
We state with 95% confidence that the proportion of companies which use telemarketing
to assist order processing lies between 0.2875 and 0.4925.
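The four steps above can be sketched in Python (hypothetical helper name `proportion_interval`), using the figures from this example: p = 0.39, n = 87 and Z = 1.96.

```python
import math


def proportion_interval(p, n, z):
    """Return (lower, upper) = p ± z * sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    s_p = math.sqrt(p * q / n)  # estimated standard error S_p
    margin = z * s_p
    return p - margin, p + margin


lower, upper = proportion_interval(p=0.39, n=87, z=1.96)
print(round(lower, 4), round(upper, 4))
```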
Illustration-2: Suppose 1600 of 2000 union members sampled said they plan to vote for
the proposal to merge with a national union. Union bylaws state that at least 75% of all
members must approve for the merger to be enacted. Using the 0.95 degree of
confidence, what is the interval estimate for the population proportion? Based on the
$$0.47 = p - Z_{\alpha/2} S_p$$
$$0.53 = p + Z_{\alpha/2} S_p$$
Adding the two equations:
1 = 2p
p = 0.50
Number of females (X) = n × p = 400 × 0.5 = 200
b) $P = p \pm Z_{\alpha/2} S_p$
$0.53 = 0.50 + Z_{\alpha/2} S_p$
$0.03 = Z_{\alpha/2}\sqrt{\frac{0.5 \times 0.5}{400}} = Z_{\alpha/2}(0.025)$
$Z_{\alpha/2} = 1.2$
P(0 ≤ Z ≤ 1.2) = 0.3849
C = 0.3849 × 2 = 0.7698 = 76.98%
$0.05 = Z_{\alpha/2}\sqrt{\frac{0.7 \times 0.3}{400}} = Z_{\alpha/2}(0.023)$
$Z_{\alpha/2} = 2.17$
P(0 ≤ Z ≤ 2.17) = 0.485
C = 0.485 × 2 = 0.97 = 97%
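The reverse calculation used here (from a given margin of error to the confidence level) can be sketched with the standard library's `statistics.NormalDist`, which replaces the normal-table lookup: C = 2Φ(z) − 1, where z is the margin divided by $S_p$. The helper name `confidence_for_margin` is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_for_margin(margin, p, n):
    """Confidence level C = 2 * Phi(z) - 1, where z = margin / S_p."""
    s_p = math.sqrt(p * (1 - p) / n)
    z = margin / s_p
    return 2 * NormalDist().cdf(z) - 1


c1 = confidence_for_margin(0.03, 0.5, 400)  # z = 1.2
c2 = confidence_for_margin(0.05, 0.7, 400)  # z is about 2.18 before rounding
print(f"{c1:.2%} {c2:.2%}")
```

Because it does not round $S_p$ or z, the second case gives z ≈ 2.18 and C ≈ 97.1% rather than exactly 97%; the first case agrees with the table-based value of about 77%.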
Let us define population 1 as all customers who shop at the inner-city store and
population 2 as all customers who shop at the suburban store.
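$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Therefore, the interval estimate of the difference between two population means, with σ1 and σ2
known, is
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$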
Illustration-1
Let us return to the Greystone example. Based on data from previous customer
demographic studies, the two population standard deviations are known with σ1 = 9 years
and σ2 = 10 years. The data collected from the two independent simple random samples
of Greystone customers provided the following results.
                 Inner-City Store          Suburban Store
Sample size      n1 = 36                   n2 = 49
Sample mean      $\bar{X}_1$ = 40 years    $\bar{X}_2$ = 35 years
Solution
Using the above expression, we find that the point estimate of the difference between the
mean ages of the two populations is $\bar{X}_1 - \bar{X}_2$ = 40 − 35 = 5 years. Thus, we estimate that
the customers at the inner-city store have a mean age five years greater than the mean age
of the suburban store customers.
Using 95% confidence and $z_{\alpha/2} = z_{0.025} = 1.96$, we have the interval estimate
$$5 \pm 1.96\sqrt{\frac{9^2}{36} + \frac{10^2}{49}} = 5 \pm 4.06$$
Thus, the margin of error is 4.06 years and the 95% confidence interval estimate of the
difference between the two population means is 5 − 4.06 = 0.94 years to 5 + 4.06 = 9.06
years.
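The interval for the difference between two means with known σ1 and σ2 can be sketched in Python as below (hypothetical helper name `diff_means_interval`), checked against the Greystone figures.

```python
import math


def diff_means_interval(x1, x2, sigma1, sigma2, n1, n2, z):
    """(x1 - x2) ± z * sqrt(sigma1**2 / n1 + sigma2**2 / n2), sigmas known."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    margin = z * se
    diff = x1 - x2  # point estimate of mu1 - mu2
    return diff - margin, diff + margin


lower, upper = diff_means_interval(40, 35, 9, 10, 36, 49, 1.96)
print(round(lower, 2), round(upper, 2))
```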
Illustration-2
A research team is interested in the difference between serum uric acid levels in patients
with and without Down's syndrome. In a large hospital for the treatment of the mentally
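retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{X}_1$ = 4.5
mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and
sex was found to have a mean value of $\bar{X}_2$ = 3.4 mg/100 ml. If it is reasonable to assume
that the two populations of values are normally distributed with variances equal to 1 and
1.5 respectively, find the 95 percent confidence interval for $\mu_1 - \mu_2$.
Given: n1 = 12, $\bar{X}_1$ = 4.5, $\sigma_1^2$ = 1
n2 = 15, $\bar{X}_2$ = 3.4, $\sigma_2^2$ = 1.5
The point estimate for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$:
$\bar{X}_1 - \bar{X}_2$ = 4.5 − 3.4 = 1.1
The standard error is
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$$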
Characteristics of t-distribution
1. The t-distribution is symmetric about its mean (0) and ranges from −∞ to ∞.
2. The t-distribution is bell-shaped (uni-modal) and has approximately the same
appearance as the standard normal distribution (Z- distribution).
3. The t-distribution depends on a parameter ν (Greek nu), called the degrees of
freedom of the distribution: ν = n − 1, where n is the sample size. The degrees of freedom,
ν, refer to the number of values we can choose freely.
4. The variance of the t-distribution is ν/(ν − 2) for ν > 2.
5. The variance of the t-distribution always exceeds 1 (wherever it is defined, i.e. for ν > 2).
6. As ν increases, the variance of the t-distribution approaches 1 and the shape
approaches that of the standard normal distribution.
7. Because the variance of the t-distribution exceeds 1.0 while the variance of the Z-
distribution equals 1, the t-distribution is slightly flatter in the middle than the Z-
distribution and has thicker tails.
8. The t-distribution is a family of distributions with a different density function
corresponding to each different value of the parameter ν. That is, there is a separate t-
distribution for each sample size. In proper statistical language, we would say,
“There is a different t-distribution for each of the possible degrees of freedom”.
9. The t formula for a sample when σ is unknown, the sample size is small, and the
population is normally distributed is:
$$t = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
This formula is essentially the same as the z formula, but the distribution table values are not.
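Properties 4 to 6 can be checked numerically with a short Python sketch (hypothetical helper `t_variance`) that evaluates ν/(ν − 2) for increasing degrees of freedom: the values stay above 1 and approach 1 as ν grows.

```python
def t_variance(nu):
    """Variance nu / (nu - 2) of the t-distribution; finite only for nu > 2."""
    if nu <= 2:
        raise ValueError("the variance is finite only for nu > 2")
    return nu / (nu - 2)


for nu in (3, 5, 10, 30, 100):
    print(nu, round(t_variance(nu), 4))
# 3 3.0
# 5 1.6667
# 10 1.25
# 30 1.0714
# 100 1.0204
```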
The confidence interval to estimate μ becomes:
$$\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}$$
Where: $\bar{X}$ = sample mean
α = 1 − C
ν = n − 1 (degrees of freedom)
s = sample standard deviation
n = sample size
μ = unknown population mean
Steps:
i. Calculate the degrees of freedom (ν = n − 1) and the sample standard error of the mean.
ii. Compute α/2.
Illustration One
A random sample of 27 items produces $\bar{x}$ = 128.4 and s = 20.6. What is the 98%
confidence interval for μ? Assume that x is normally distributed in the population.
What is the point estimate?
Solution:
The point estimate of the population mean is the sample mean; in this case, 128.4 is the
point estimate.
n = 27, $\bar{X}$ = 128.4, s = 20.6, C = 0.98, ν = n − 1 = 27 − 1 = 26
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{20.6}{\sqrt{27}} = 3.96$
ii. α = 1 − C = 1 − 0.98 = 0.02
α/2 = 0.02/2 = 0.01
iii. $t_{\alpha/2,\nu} = t_{0.01,26} = 2.479$
iv. $\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} = 128.4 \pm 2.479(3.96) = 128.4 \pm 9.82$
$118.58 \le \mu \le 138.22$
We state with 98% confidence that the population mean lies between 118.58 and 138.22.
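The t-based interval can be sketched in Python as below (hypothetical helper name `t_interval`). Python's standard library has no t quantile function, so the critical value is passed in from a t table; if SciPy is installed, `scipy.stats.t.ppf(0.99, 26)` returns approximately 2.479.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Return xbar ± t_crit * s / sqrt(n).

    t_crit is t_{alpha/2, nu} with nu = n - 1, read from a t table.
    """
    se = s / math.sqrt(n)
    margin = t_crit * se
    return xbar - margin, xbar + margin


# Illustration One: n = 27, 98% confidence, t_{0.01,26} = 2.479 (from the text).
lower, upper = t_interval(128.4, 20.6, 27, 2.479)
print(round(lower, 2), round(upper, 2))
```

Without rounding the standard error to 3.96, the limits come out at about 118.57 and 138.23, which matches the hand calculation to within rounding.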
Illustration Two
A sample of 20 cab fares in Mekelle city shows a sample mean of Birr 2.50 and a sample
standard deviation of Birr 0.50. Develop a 90% confidence interval estimate of the mean
cab fare in Mekelle city. Assume the population of cab fares has a normal distribution.
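n = 20, $\bar{X}$ = Birr 2.50, s = Birr 0.50, C = 0.90, ν = n − 1 = 20 − 1 = 19
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{0.5}{\sqrt{20}} = 0.112$
ii. α = 1 − C = 1 − 0.90 = 0.10
α/2 = 0.10/2 = 0.05
iii. $t_{\alpha/2,\nu} = t_{0.05,19} = 1.729$
iv. $\mu = 2.50 \pm 1.729(0.112) = 2.50 \pm 0.19$
$2.31 \le \mu \le 2.69$
We state with 90% confidence that the mean of cab fares in Mekelle city lies
between Birr 2.31 and Birr 2.69.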
2.6. Determining the Sample Size
The reason for taking a sample from a population is that it would be too costly to gather
data for the whole population. But collecting sample data also costs money, and the
larger the sample, the higher the cost. To hold cost down, we want to use as small a
sample as possible. On the other hand, we want a sample to be large enough to provide
“good” estimates of population parameters. The size of a sample must therefore be
determined scientifically, taking care not to select a sample that is too large or too
small. There are two common misconceptions about how many to sample:
a) A sample consisting of 5% (or a similar constant percentage) of the population is
adequate for all problems.
5% can be too large for one population (say, 10 million) and too small for another
(say, 200).
b) A sample, for example, must be selected from a heavily populated area.
To avoid such problems, the sample size should be determined mathematically:
$$n = \left(\frac{Z \cdot S}{E}\right)^2$$
where E = allowable error
Z = Z value for the degree of confidence selected
S = sample standard deviation
For this example, $n = \left(\frac{1.96 \times 3000}{200}\right)^2 = 864.36 \approx 865$
Example 1. A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular pub. The people who
plan the survey would like to be able to determine the average amount spent by all people
visiting the pub to within Birr 120, with 95% confidence. From past operations of the pub,
an estimate of the population standard deviation is σ = Birr 400. What is the minimum
required sample size?
Z = 1.96
E = 120
σ = 400
Required: n
$$n = \left(\frac{1.96 \times 400}{120}\right)^2 = 42.68 \approx 43$$
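The sample-size formula for a mean, with the result rounded up to a whole observation, can be sketched in Python as below (hypothetical helper name `sample_size_mean`). Small floating-point noise is trimmed before rounding up so that an exact whole number is not pushed to the next integer.

```python
import math


def sample_size_mean(z, sigma, e):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. (z * sigma / e) ** 2 rounded up."""
    raw = (z * sigma / e) ** 2
    # Trim floating-point noise, then round up to a whole observation.
    return math.ceil(round(raw, 9))


print(sample_size_mean(1.96, 400, 120))  # Example 1: 43
```

Rounding up rather than to the nearest integer is what keeps the margin of error within E; it is why 15.37 becomes 16 in Illustration-3 rather than 15.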
$$n_\mu = \left(\frac{Z_{\alpha/2}\sigma}{e}\right)^2$$
$$n_\mu = \left(\frac{1.96 \times 6.25}{1}\right)^2 = 150.06 \approx 151$$
Illustration-3: The National Travel and Tour Organization (NTO) would like to estimate
the mean amount of money spent by a tourist to be within Birr 100 with 95% confidence.
If the amount of money spent by tourist is considered to be normally distributed with a
standard deviation of Br 200, what sample size would be necessary for the NTO to meet
their objective in estimating this mean amount?
Solution:
e = Birr 100, σ = Birr 200, C = 0.95
$Z_{\alpha/2} = Z_{0.025} = 1.96$
$$n_\mu = \left(\frac{Z_{\alpha/2}\sigma}{e}\right)^2 = \left(\frac{1.96 \times 200}{100}\right)^2 = 15.37 \approx 16$$
$$n = 0.40(1 - 0.40)\left(\frac{1.96}{0.02}\right)^2 = 2304.96 \approx 2305$$
This sample size might be too large, too small, or exactly right, depending on the
accuracy of the estimate of p. Note: if there is no logical estimate of p, the sample size can be
estimated by letting p = 0.5.
Example 2. Suppose the president wants an estimate of the proportion of the population
that supports the current policy on unemployment. The president wants the estimate to be
within 0.04 of the true proportion. Assume a 95% level of confidence and the proportion
supporting the current policy to be 0.60.
a) How large a sample is required?
b) How large would the sample have to be if the estimate were not available?
Solution:
a) E = 0.04
Z = 1.96
p = 0.60
$$n = 0.6(1 - 0.6)\left(\frac{1.96}{0.04}\right)^2 = 576.24 \approx 577$$
b) E = 0.04
Z = 1.96
p = 0.50 (since there is no estimate)
$$n = 0.5(1 - 0.5)\left(\frac{1.96}{0.04}\right)^2 = 600.25 \approx 601$$
Illustration-3: Suppose that a production facility purchases particular component parts
in large lots from a supplier. The production manager wants to estimate the proportion of
defective parts received from this supplier. She believes that the proportion of defects is
Solution:
e = 0.02, p = 0.2, q = 0.8, C = 0.90
$Z_{\alpha/2} = Z_{0.05} = 1.64$
$$n_p = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq = \left(\frac{1.64}{0.02}\right)^2 (0.2)(0.8) = 1075.84 \approx 1076$$
Illustration-4: What is the largest sample size that would be needed in estimating a
population proportion to be within ± 0.02, with a confidence coefficient of 0.95?
Solution:
e = 0.02, C = 0.95
$Z_{\alpha/2} = Z_{0.025} = 1.96$
The largest sample size would be obtained when p = 0.5. So,
$$n_p = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq = \left(\frac{1.96}{0.02}\right)^2 (0.5)(0.5) = 2401$$
If p is unknown and there is no possibility of estimating it, use 0.5 as the value of p
because it will generate the greatest possible sample size as compared with other values.
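The proportion sample-size formula $n_p = (Z_{\alpha/2}/e)^2 pq$ can be sketched in Python as below (hypothetical helper name `sample_size_proportion`), with p defaulting to 0.5 when no estimate is available. As with the mean, the result is rounded up after trimming floating-point noise.

```python
import math


def sample_size_proportion(z, e, p=0.5):
    """Sample size (z / e) ** 2 * p * (1 - p), rounded up.

    With no prior estimate, p defaults to 0.5, where p * (1 - p) peaks.
    """
    raw = (z / e) ** 2 * p * (1 - p)
    return math.ceil(round(raw, 9))  # trim float noise, then round up


print(sample_size_proportion(1.96, 0.04, p=0.60))  # Example 2 a): 577
print(sample_size_proportion(1.96, 0.04))          # Example 2 b): 601
print(sample_size_proportion(1.64, 0.02, p=0.20))  # defective parts: 1076
print(sample_size_proportion(1.96, 0.02))          # largest n for e = 0.02: 2401
```

Example 2 b) comes out as 601 rather than 600 because 600.25 must be rounded up.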
1.6.3. Determining Sample Size When Estimating μ1 −μ2
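When taking two random samples and using the difference in sample means to estimate
the difference in population means, a researcher should have an idea of how large the
samples should be. The starting point is
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
This does not look promising, because the equation has nine variables, including two different
sample sizes. Assuming equal population variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) and equal
sample sizes ($n_1 = n_2 = n$), and writing e for the allowable error $(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)$,
the formula becomes
$$Z_{\alpha/2} = \frac{e}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}}} = \frac{e}{\sqrt{\frac{2\sigma^2}{n}}}$$
Solving for n produces the sample size:
$$n = \frac{2 Z_{\alpha/2}^2 \sigma^2}{e^2} = 2\left(\frac{Z_{\alpha/2}\sigma}{e}\right)^2$$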
The above formula suggests that the necessary sample sizes for comparing two sample
means are each twice as large as the sample size required for estimating a single
mean. It is clear that the larger the sample, the more it costs. Thus, sample size formulas
can be effective aids in ensuring that a research project’s goals are met and that the cost
of sampling is minimized.
$$n = 2\left(\frac{Z_{\alpha/2}\sigma}{e}\right)^2$$
$$n = 2\left(\frac{2.58 \times 100}{20}\right)^2 = 2(166.41) = 332.82 \approx 333$$
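A Python sketch of this formula (hypothetical helper `sample_size_two_means`), using the figures from the calculation above (Z = 2.58, σ = 100, e = 20). Like the derivation, it assumes equal population standard deviations and equal sample sizes.

```python
import math


def sample_size_two_means(z, sigma, e):
    """Per-sample n = 2 * (z * sigma / e) ** 2, rounded up.

    Assumes equal population standard deviations and equal sample sizes.
    """
    raw = 2 * (z * sigma / e) ** 2
    return math.ceil(round(raw, 9))  # trim float noise, then round up


n_each = sample_size_two_means(2.58, 100, 20)
print(n_each)  # 333 observations in each sample
```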