
Cha 2

Uploaded by

Senay Haftu

CHAPTER 2: STATISTICAL ESTIMATION

2.1. Basic Concepts

The sampling distribution of the mean shows how far sample means could be from a
known population mean. Similarly, the sampling distribution of the proportion shows
how far sample proportions could be from a known population proportion. In estimation,
our aim is to determine how far an unknown population mean could be from the mean of
a simple random sample selected from that population; or how far an unknown
population proportion could be from a sample proportion. Those are the concerns of
statistical inference, in which a statement about an unknown population parameter is
derived from information contained in a random sample selected from the population.

Statistical inference is the act of generalizing from a sample to a population with a
calculated degree of certainty. The two forms of statistical inference are estimation and
hypothesis testing.
• Estimation: using a sample statistic to estimate an unknown parameter value; and
• Hypothesis testing: testing a claim or belief about an unknown parameter value.

A statistical population represents the set of all possible values for a variable. In practice,
we do not study the entire population. Instead, we use data in a sample to shed light on
the wider population. The process of generalizing from the sample to the population is
statistical inference.
Estimation: the process of using statistics as estimates of parameters. It is any
procedure in which sample information is used to estimate (predict) the numerical
value of some population measure (called a parameter).
Estimator: any sample statistic that is used to estimate a population
parameter, e.g. x̄ for μ and p̄ for p.

Estimate: a specific numerical value of an estimator, e.g. x̄ = 9, 2 or 5.

x̄, p̄, s², s ............... Estimators
μ, p, σ², σ ............... Parameters being estimated
1, 0.5, 9, 3 .............. Estimates

Types of Estimates:
There are two types of estimates that we can make about a population: a point
estimate and an interval estimate.
A) A point estimate: - is a single number that is used to estimate an unknown population
parameter. It is a single value that is measured from a sample and used as an estimate
of the corresponding population parameter.
The most important point estimates (given that they are single values) are:
o Sample mean ( x̄ ) for population mean ( μ );

Stat. for Mgt. II Page 1


o Sample proportion ( p̄ ) for population proportion ( p );
o Sample variance ( s² ) for population variance ( σ² ); and
o Sample standard deviation ( s ) for population standard deviation ( σ ).
• Criteria of Good Estimators (good properties of estimators)
There are four criteria by which we can evaluate the quality of a statistic as an estimator.
These are: unbiasedness, efficiency, consistency and sufficiency.
1) Unbiasedness
This is a very important property that an estimator should possess. If we take all possible
samples of the same size from a population and calculate their means, the mean μ_x̄ of all
these sample means will be equal to the mean μ of the population. This means that the sample
mean x̄ is an unbiased estimator of the population mean μ. In general, when the expected value (or
mean) of a sample statistic is equal to the value of the corresponding population
parameter, the sample statistic is said to be an unbiased estimator.
Suppose instead that we take the smallest sample observation as an estimator of the population mean
μ. It can easily be shown that this estimator is biased. Since the smallest observation can never
exceed the sample mean and is usually below it, its expected value is less than μ. Symbolically, E(Xs) < μ,
where Xs stands for the smallest item and E stands for the expected value. Thus, this
estimator is biased downwards. The extent of the bias is the difference between the expected
value of the estimator and the value of the parameter. In this case, the bias is equal to E(Xs) −
μ. In contrast, the bias of the sample mean x̄ is zero.
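The bias argument above can be checked by brute force on a tiny population. The sketch below is only an illustration: the five-value population and the sample size of 2 are assumptions chosen so that every possible sample can be listed, and they do not come from the text.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population, chosen only so that all samples can be enumerated.
population = [2, 4, 6, 8, 10]
mu = mean(population)                      # population mean

# Every possible sample of size 2 (drawn without replacement).
samples = list(combinations(population, 2))

# Expected value of each estimator = average of its value over all samples.
expected_sample_mean = mean(mean(s) for s in samples)
expected_sample_min = mean(min(s) for s in samples)

bias_of_mean = expected_sample_mean - mu   # bias of x-bar as an estimator of mu
bias_of_min = expected_sample_min - mu     # bias of the smallest observation

print("bias of sample mean:", bias_of_mean)
print("bias of sample minimum:", bias_of_min)
```

For this population the sample mean averages out to μ exactly, while the smallest observation averages below μ, which is the downward bias described above.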
2) Consistency
Another important characteristic that an estimator should possess is consistency. A
consistent estimator converges toward the parameter being estimated as the sample size
increases.
3) Efficiency
Efficiency is measured in terms of size of the standard error of the statistic. Efficiency
refers to the variance of the estimator’s sampling distribution. Smaller variance means a
more efficient estimator. Among all unbiased estimators, we prefer the minimum
variance estimator, referred to as MVUE (minimum variance unbiased estimator). Since
an estimator is a random variable, it is necessarily characterized by a certain amount of
variability. This means that some estimates may be more variable than others. Just as bias
is related to the expected value of the estimator, so efficiency can be defined in terms of
the variance. For example, the variance of the sample mean is V(x̄) =
σ²/n. As the sample size n increases, V(x̄) becomes
smaller, so the estimator becomes more efficient. By this criterion, large
samples give better estimates than small ones.
The efficiency of one estimator in relation to another estimator can be judged by
comparing their sampling variances.
4) Sufficiency
The fourth property of a good estimator is that it should be sufficient. A sufficient
statistic utilizes all the information a sample contains about the parameter to be estimated.
The sample mean x̄, for example, is a sufficient estimator of the population mean μ. It implies that no other
estimator of μ, such as the sample median, can provide any additional information about
the parameter μ. Likewise, we can say that the sample proportion p̄ is a sufficient estimator
of the population proportion p.
B) An interval estimate: - is a range of values used to estimate a population parameter.
It describes the range of values within which a parameter might lie. Stated differently, an
interval estimate is a range of values within which the analyst can declare, with some
confidence, that the population parameter will fall.
2.2. Point Estimators of the Mean and Proportion
A point estimate of a parameter ϴ is a single number that can be regarded as a sensible
value for ϴ. A point estimate is obtained by selecting a suitable statistic and computing
its value from the given sample data. The selected statistic is called the point estimator of
ϴ.
A random sample of observations is taken from the population of interest and the
observed values are used to obtain a point estimate of the relevant parameter.
a. The sample mean, x̄, is the best estimator of the population mean μ.
Different samples from a population yield different point estimates of μ.
Illustration-1: Suppose, for example, that the parameter of interest is μ, the true average
lifetime of batteries of a certain type. A random sample of n = 3 batteries might yield
observed lifetimes (hours) x1 = 5.0, x2 = 6.4, x3 = 5.9.
The computed value of the sample mean lifetime is x̄ = 5.77. It is reasonable to regard
5.77 as a very plausible value of μ, our "best guess" for the value of μ based on the
available sample information.
Illustration-2: Suppose we have the sample 10, 20, 30, 40 and 50 selected randomly from a
population whose mean μ is unknown.
The sample mean,
$\bar{x} = \frac{\sum x_i}{n} = \frac{10+20+30+40+50}{5} = 30$,
is a point estimate of μ.
On the other hand, if we state that the mean, μ, lies within x̄ ± 10, the range of values
from 20 (30 − 10) to 40 (30 + 10) is an interval estimate.
b. The sample proportion p̄ is a good estimator of the population proportion, p.
- The population proportion p is equal to the number of elements in the population
belonging to the category of interest divided by the total number of elements in the
population:
$p = \frac{X}{N}$
Where: X is the number of successes in the population and
N is the population size.
- The sample proportion is
$\bar{p} = \frac{x}{n}$
where x is the number of elements in the sample found to belong to the category of interest and
n is the sample size. Or:
$\bar{p} = \frac{\text{number of successes in the sample}}{\text{number sampled}}$
Example: Of 2000 persons sampled, 1600 favored more strict environmental protection
measures. What is the estimated population proportion?
$\bar{p} = \frac{1600}{2000} = 0.80$
That is, 80% is an estimate of the proportion in the population that favors more strict measures.
2.3. Interval Estimators of the Mean and Proportion
2.3.1. Interval estimation for population means ( μ )
An interval estimate states the range within which a population parameter probably lies. The
interval within which a population parameter is expected to lie is usually referred to as
the confidence interval.
The confidence interval for the population mean is the interval that has a high probability
of containing the population mean, μ.
Two confidence intervals are used extensively.
1. 95% confidence interval and
2. 99% confidence interval
A 95% confidence interval means that about 95% of the similarly constructed intervals
will contain the parameter being estimated. If we use the 99% confidence interval we
expect about 99% of the intervals to contain the parameter being estimated.
Another interpretation of the 95% confidence interval is that 95% of the sample means
for a specified sample size will lie within 1.96 standard errors (standard deviations of the
sampling distribution) of the hypothesized population mean. For 99%, the sample means will
lie within 2.58 standard errors of the hypothesized population mean.
Where do the values 1.96 and 2.58 come from?
The middle 95% of the sample means lie equally on either side of the population mean. Logically,
0.95/2 = 0.4750, so 47.5% of the area is to the right of the mean and 47.5% of the area is to
the left of the mean.
The Z value for this probability is 1.96.
The Z to the right of the mean is +1.96 and the Z to the left is −1.96. By the same reasoning,
0.99/2 = 0.4950, for which the Z value is approximately 2.58.
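As a cross-check on the table lookup, the same critical values can be computed from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` is introduced here for illustration only.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return Z such that the middle `confidence` share of the standard
    normal distribution lies between -Z and +Z."""
    alpha = 1 - confidence
    # An area of alpha/2 is left in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(round(z95, 2), round(z99, 2))
```

The unrounded 99% value is slightly below 2.58; tables commonly round it to 2.58 (or 2.575), which is the value used in this chapter.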
Constructing Confidence Interval
a) Compute the standard error of the mean
The standard error of the mean is the standard deviation of the sample means:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Where: σ = population standard deviation
n = sample size
If the population standard deviation is not known, the standard deviation of the sample, s,
is used to approximate the population standard deviation:
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
This indicates that the error in estimating the population mean decreases as the sample
size increases.
b) The 95% and 99% confidence intervals are constructed as follows when n ≥ 30.
95% confidence interval: $\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$
99% confidence interval: $\bar{x} \pm 2.58 \frac{s}{\sqrt{n}}$
1.96 and 2.58 are the Z values corresponding to the middle 95% and 99% of the
observations respectively.
In general, a confidence interval for the mean is computed by $\bar{x} \pm Z \frac{s}{\sqrt{n}}$, where Z reflects the
selected level of confidence.
The confidence interval for population mean is affected by:
1. The population distribution, i.e., whether the population is normally
distributed or not
2. The standard deviation, i.e., whether σ is known or not.
3. The sample size, i.e., whether the sample size, n, is large or not.
Example: An experiment involves selecting a random sample of 256 middle managers
for studying the annual income. The sample mean is computed to be Br. 35,420 and the
sample standard deviation is Br. 2,050.
a. What is the estimated mean income of all middle managers (the population)?
b. What is the 95% confidence interval (rounded to the nearest 10)?
c. What are the 95% confidence limits?
d. Interpret the finding.

Solution
a. The sample mean is 35,420, so this is the point estimate of the population mean: μ is
estimated to be Br. 35,420.
b. The confidence interval is between 35,170 and 35,670, found by
$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}} = 35420 \pm 1.96 \left( \frac{2050}{\sqrt{256}} \right) = 35168.87 \text{ and } 35671.13$
c. The end points of the confidence interval are called the confidence limits. In this
case they are rounded to 35,170 and 35,670. 35,170 is the lower limit and 35,670 is
the upper limit.
d. Interpretation
If we select 100 samples of size 256 from the population of all middle managers and
compute the sample means and confidence intervals, the population mean annual income
would be found in about 95 out of the 100 confidence intervals. About 5 out of the 100
confidence intervals would not contain the population mean annual income.
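The "about 95 out of 100 intervals" interpretation can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the example: it treats 35,420 and 2,050 as if they were the true population mean and standard deviation, assumes a normal population, and uses a fixed random seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)           # fixed seed (assumption) for a repeatable run
MU, SIGMA = 35420, 2050          # hypothetical "true" population values
N, TRIALS, Z = 256, 1000, 1.96   # sample size, number of samples, 95% Z value

hits = 0
for _ in range(TRIALS):
    # Draw one random sample and build its 95% confidence interval.
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar, s = mean(sample), stdev(sample)
    margin = Z * s / sqrt(N)
    # Count the interval if it contains the population mean.
    if xbar - margin <= MU <= xbar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contain the population mean")
```

The share printed should be close to, but not exactly, 95%, because each run of the simulation is itself a random experiment.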

Confidence interval estimate of μ - Normal population, σ known
A confidence interval estimate for μ is an interval estimate together with a statement of
how confident we are that the interval estimate is correct.
When the population distribution is normal and at the same time σ is known, we can
estimate μ (regardless of the sample size) using the following formula.

$\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Where:
X̄ = sample mean
Z = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 – confidence level)
μ = unknown population mean
From the above formula we can learn that an interval estimate is constructed by adding and
subtracting the error term to and from the point estimate. That is, the point estimate is
found at the center of the confidence interval.
To find the interval estimate of the population mean, μ, we have the following steps.
1. Compute the standard error of the mean ($\sigma_{\bar{x}}$)
2. Compute α/2 from the confidence coefficient
3. Find the Z value for α/2 from the table
4. Construct the confidence interval
5. Interpret the results
Illustration-2: The vice president of operations for Ethio Telecom is in the process of
developing a strategic management plan. He believes that the ability to estimate the length of the
average phone call on the system is important. He takes a random sample of 60 calls from the
company records and finds that the mean sample length for a call is 4.26 minutes. Past history for
these types of calls has shown that the population standard deviation for call length is about 1.1
minutes. Assuming that the population is normally distributed and he wants to have a 95%
confidence, help him in estimating the population mean.
Solution:
Given: n = 60 calls, X̄ = 4.26 minutes, σ = 1.1 minutes, C = 0.95
Step 1: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{60}} = 0.142$
Step 2: α = 1 – C = 1 – 0.95 = 0.05
α/2 = 0.05/2 = 0.025
Step 3: $Z_{\alpha/2} = Z_{0.025} = 1.96$
Step 4: $\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
= 4.26 ± 1.96(0.142)
= 4.26 ± 0.28
3.98 ≤ μ ≤ 4.54
The vice-president of Ethio Telecom can be 95% confident that the average length of a
call for the population is between 3.98 and 4.54 minutes.
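The four computational steps can be sketched as a small function. The function name `z_interval` and its parameter names are illustrative choices, and the Z value is computed from the normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    se = sigma / sqrt(n)                     # Step 1: standard error of the mean
    alpha = 1 - confidence                   # Step 2: alpha (alpha/2 per tail)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Step 3: Z for alpha/2
    margin = z * se                          # Step 4: build the interval
    return xbar - margin, xbar + margin

# Values from the phone-call illustration above.
lower, upper = z_interval(xbar=4.26, sigma=1.1, n=60, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the limits agree with the hand calculation for the phone-call example.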
Confidence interval estimate of μ - Normal population, σ unknown, n large
If we know that the population is normal, and we know the population standard deviation
(σ), the confidence interval for μ should be constructed in the manner already shown,
i.e., $\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$. If the population standard deviation is unknown, it
has to be estimated from the sample; i.e., when σ is unknown, we use the sample standard
deviation
$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
Then the standard error of the mean, $\sigma_{\bar{X}}$, is estimated by
the sample standard error of the mean: $S_{\bar{X}} = \frac{S}{\sqrt{n}}$.
Therefore, the confidence interval to estimate μ when the population standard deviation is
unknown, the population is normal and n is large is

$\mu = \bar{X} \pm Z_{\alpha/2} \frac{S}{\sqrt{n}}$
Illustration-1: Suppose that a car rental firm in Addis wants to estimate the average
number of miles traveled per day by each of its rented cars. A random sample of 110 rented cars
reveals that the sample mean travel distance per day is 85.5 miles, with a sample
standard deviation of 19.3 miles. Compute a 99% confidence interval to estimate μ.
Solution:
n = 110 rented cars, X̄ = 85.5 miles, s = 19.3 miles, C = 0.99
Step 1. $S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{19.3}{\sqrt{110}} = 1.84$
Step 2. α = 1 – C = 1 – 0.99 = 0.01
α/2 = 0.01/2 = 0.005
Step 3. $Z_{\alpha/2} = Z_{0.005} = 2.58$
Step 4. $\mu = \bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$
= 85.5 ± 2.58(1.84)
= 85.5 ± 4.75
80.75 ≤ μ ≤ 90.25

We state with 99% confidence that the average distance traveled by rented cars lies
between 80.75 and 90.25 miles.
Illustration-2: A study is being conducted in a company that has 800 engineers. A
random sample of 50 of these engineers reveals that the average sample age is 34.3 years,
and the sample standard deviation is 8 years. Assuming normality, construct a 98%
confidence interval to estimate the average age of all engineers in this company.
Solution:
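Given: n = 50 engineers, N = 800 engineers, X̄ = 34.3 years,
s = 8 years, C = 0.98
Because the sample is drawn from a finite population of N = 800, the standard error includes
the finite population correction factor.
Step 1. $S_{\bar{X}} = \frac{S}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{8}{\sqrt{50}} \sqrt{\frac{800-50}{800-1}} = 1.10$
Step 2. α = 1 – C = 1 – 0.98 = 0.02
α/2 = 0.02/2 = 0.01
Step 3. $Z_{\alpha/2} = Z_{0.01} = 2.33$
Step 4. $\mu = \bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
= 34.3 ± 2.33(1.10)
= 34.3 ± 2.56
31.74 ≤ μ ≤ 36.86
We state with 98% confidence that the mean age of engineers lies between 31.74 and
36.86 years.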
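The effect of the finite population correction in this illustration can be sketched as follows. The function name and parameters are illustrative, and Z is computed rather than read from a table, so the limits can differ from the hand calculation in the second decimal place because the text rounds the standard error to 1.10 and Z to 2.33.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_finite_population(xbar, s, n, N, confidence):
    """Large-sample interval for the mean with the finite population
    correction sqrt((N - n) / (N - 1)) applied to the standard error."""
    se = (s / sqrt(n)) * sqrt((N - n) / (N - 1))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return se, xbar - margin, xbar + margin

# Values from the engineers illustration above.
se, lower, upper = mean_ci_finite_population(34.3, 8, 50, 800, 0.98)
print(round(se, 2), round(lower, 2), round(upper, 2))
```

Without the correction the standard error would be 8/√50, about 1.13, so sampling 50 of only 800 engineers narrows the interval slightly.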
2.3.2. Interval Estimation of the Population Proportion
We know that a sample proportion (p̄) is an unbiased estimator of a population
proportion P, and if the sample size is large then the sampling distribution of p̄ is
approximately normal with
$Z = \frac{\bar{p} - P}{\sigma_{\bar{p}}} = \frac{\bar{p} - P}{\sqrt{\frac{PQ}{n}}}$, where Q = 1 − P.
However, here P is unknown and we want to estimate P by p̄; hence Z becomes
$Z = \frac{\bar{p} - P}{\sqrt{\frac{\bar{p}\bar{q}}{n}}}$
That is, $\sigma_{\bar{p}}$ is substituted by $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$.
Solving for P results in $P = \bar{p} - Z \sqrt{\frac{\bar{p}\bar{q}}{n}}$, and since Z can assume both positive and
negative values, it becomes
$P = \bar{p} \pm Z \sqrt{\frac{\bar{p}\bar{q}}{n}}$
Since Z represents the confidence level, we write it as
$P = \bar{p} \pm Z_{\alpha/2} \sqrt{\frac{\bar{p}\bar{q}}{n}} = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$
Where: p̄ = sample proportion
q̄ = 1 – p̄
α = 1 – C
n = sample size
P = unknown population proportion


Illustration-1 : Recently, a study of 87 randomly selected companies with telemarketing
operation was completed. The study revealed that 39% of the sampled companies had
used telemarketing to assist them in order processing. Using this information estimate the
population proportion of telemarketing companies who use their telemarketing operation
to assist them in order processing taking a 95% confidence level.
Solution:
n = 87, p̄ = 0.39, q̄ = 0.61, C = 0.95
i. $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.39 \times 0.61}{87}} = 0.0523$
ii. α = 1 – C = 1 – 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $P = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$
= 0.39 ± 1.96(0.0523)
= 0.39 ± 0.1025
0.2875 ≤ P ≤ 0.4925
We state with 95% confidence that the proportion of companies which use telemarketing
to assist order processing lies between 0.2875 and 0.4925.
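The same calculation can be sketched in code. The function name `proportion_ci` is an illustrative choice, and the Z value is computed from the normal distribution instead of being taken from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_bar, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar / n)                        # S_p, estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z for alpha/2
    margin = z * se
    return se, p_bar - margin, p_bar + margin

# Values from the telemarketing illustration above.
se, lower, upper = proportion_ci(p_bar=0.39, n=87, confidence=0.95)
print(round(se, 4), round(lower, 4), round(upper, 4))
```

With only 87 companies the interval is wide, roughly ten percentage points on either side of the sample proportion.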
Illustration-2: Suppose 1600 of 2000 union members sampled said they plan to vote for
the proposal to merge with a national union. Union bylaws state that at least 75% of all
members must approve for the merger to be enacted. Using the 0.95 degree of
confidence, what is the interval estimate for the population proportion? Based on the
confidence interval, what conclusion can be drawn?
Solution:
$\bar{p} = \frac{1600}{2000} = 0.80$. The sample proportion is 80%.
The interval is computed as follows:
$\bar{p} \pm Z \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.80 \pm 1.96 \sqrt{\frac{0.80(1-0.80)}{2000}} = 0.80 \pm 1.96 \sqrt{0.00008}$
= 0.78247 and 0.81753, rounded to 0.782 and 0.818.
Based on the sample results, when all union members vote the proposal will probably
pass, because the required 0.75 lies below the whole interval from 0.782 to 0.818.

Illustration-3: A fast food restaurant took a random sample of 400 customers to
determine the proportion of customers who are female. A confidence interval of 0.47
to 0.53 was reported.
a. Find the number of females and the sample proportion
b. Find the level of confidence of this interval
Solution:
a) n = 400, 0.47 ≤ P ≤ 0.53, p̄ = ?, number of females = ?
The point estimate is the midpoint of the interval:
$\bar{p} = \frac{0.53 + 0.47}{2} = 0.50$
Or, adding the equations for the two limits:
$0.53 = \bar{p} + Z_{\alpha/2} S_{\bar{p}}$
$0.47 = \bar{p} - Z_{\alpha/2} S_{\bar{p}}$
1 = 2p̄
p̄ = 0.50
Number of females (X) = n × p̄ = 400 × 0.5 = 200

b) $P = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$
$0.53 = 0.50 + Z_{\alpha/2} S_{\bar{p}}$
$0.03 = Z_{\alpha/2} \sqrt{\frac{0.5 \times 0.5}{400}}$
$0.03 = Z_{\alpha/2} \times 0.025$
$Z_{\alpha/2} = 1.2$
P(0 ≤ Z ≤ 1.2) = 0.3849
C = 0.3849 × 2 = 0.7698
= 76.98%



Illustration-4: A random sample of 400 faculty members at AAU contained 120 people
who believed that the University should improve its library service. On the basis of this
sample information, an analyst calculated the confidence interval (.25, .35) for the
population proportion of faculty members favoring improvement. What is the level of
confidence of this interval?
Solution:
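n = 400, X = 120, p̄ = 120/400 = 0.30, interval estimate 0.25 ≤ P ≤ 0.35, C = ?
$P = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$
$0.25 = 0.30 - Z_{\alpha/2} S_{\bar{p}}$
$0.05 = Z_{\alpha/2} \sqrt{\frac{0.7 \times 0.3}{400}}$
$0.05 = Z_{\alpha/2} \times 0.023$
$Z_{\alpha/2} = 2.17$
P(0 ≤ Z ≤ 2.17) = 0.485
C = 0.485 × 2 = 0.97
= 97%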
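Working backwards from a reported interval to its confidence level, as in Illustrations 3 and 4, can be sketched as below. The function name `implied_confidence` is hypothetical, and the normal area is computed directly instead of being read from a table, so the result for Illustration-4 differs slightly from 97% because the text rounds the standard error to 0.023.

```python
from math import sqrt
from statistics import NormalDist

def implied_confidence(lower, upper, n):
    """Recover the sample proportion, Z value and confidence level
    implied by a reported interval for a proportion."""
    p_bar = (lower + upper) / 2            # point estimate is the midpoint
    se = sqrt(p_bar * (1 - p_bar) / n)     # estimated standard error S_p
    z = (upper - p_bar) / se               # margin of error divided by S_p
    confidence = 2 * NormalDist().cdf(z) - 1   # area between -z and +z
    return p_bar, z, confidence

p3, z3, c3 = implied_confidence(0.47, 0.53, 400)   # Illustration-3
p4, z4, c4 = implied_confidence(0.25, 0.35, 400)   # Illustration-4
print(round(c3, 4), round(c4, 4))
```

The expression 2·Φ(z) − 1 is equivalent to doubling the table area P(0 ≤ Z ≤ z) used in the solutions above.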

2.4. Interval Estimation of the Difference Between Two Independent Means


Letting µ1 denote the mean of population 1 and µ2 denote the mean of population 2, we
will focus on inferences about the difference between the means: µ1 - µ2. To make an
inference about this difference, we select a simple random sample of n1 units from
population 1 and a second simple random sample of n2 units from population 2. The two
samples, taken separately and independently, are referred to as independent simple
random samples. In this section, we assume that information is available such that the
two population standard deviations, σ1 and σ2, can be assumed known prior to collecting
the samples. We refer to this situation as the σ1 and σ2 known case. In the following
example we show how to compute a margin of error and develop an interval estimate of
the difference between the two population means when σ1 and σ2 are known.
Greystone Department Stores, Inc., operates two stores in Buffalo, New York: one is in
the inner city and the other is in a suburban shopping center. The regional manager
noticed that products that sell well in one store do not always sell well in the other. The
manager believes this situation may be attributable to differences in customer
demographics at the two locations. Customers may differ in age, education, income, and
so on. Suppose the manager asks us to investigate the difference between the mean ages
of the customers who shop at the two stores.

Let us define population 1 as all customers who shop at the inner-city store and
population 2 as all customers who shop at the suburban store.



µ1 - mean of population 1 (i.e., the mean age of all customers who shop at the
inner-city store)
µ2 - mean of population 2 (i.e., the mean age of all customers who shop at the
suburban store)
The difference between the two population means is µ1 - µ2.
To estimate µ1 - µ2, we will select a simple random sample of n1 customers from
population 1 and a simple random sample of n2 customers from population 2. We then
compute the two sample means.
X̄1 - Sample mean age for the simple random sample of n1 inner-city customers
X̄2 - Sample mean age for the simple random sample of n2 suburban customers
The point estimator of the difference between the two population means is the difference
between the two sample means (i.e. X̄1 − X̄2). As with other point estimators, the point
estimator X̄1 − X̄2 has a standard error that describes the variation in the sampling
distribution of the estimator. With two independent simple random samples, the standard
error of X̄1 − X̄2 is as follows:

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Therefore, the interval estimate of the difference between two population means when σ1 and σ2
are known is
$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Illustration-1
Let us return to the Greystone example. Based on data from previous customer
demographic studies, the two population standard deviations are known with σ1 = 9 years
and σ2 = 10 years. The data collected from the two independent simple random samples
of Greystone customers provided the following results.
                 Inner City Store     Suburban Store
Sample Size      n1 = 36              n2 = 49
Sample Mean      X̄1 = 40 years        X̄2 = 35 years
Solution
Using the above expression, we find that the point estimate of the difference between the
mean ages of the two populations is X̄1 − X̄2 = 40 − 35 = 5 years. Thus, we estimate that
the customers at the inner-city store have a mean age five years greater than the mean age
of the suburban store customers.
Using 95% confidence and zα/2 = z.025 = 1.96, we have the interval estimate:
$40 - 35 \pm 1.96 \sqrt{\frac{9^2}{36} + \frac{10^2}{49}}$
5 ± 4.06
Thus, the margin of error is 4.06 years and the 95% confidence interval estimate of the
difference between the two population means is 5 − 4.06 = 0.94 years to 5 + 4.06 = 9.06
years.
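The Greystone calculation can be sketched as a short function. The function and parameter names are illustrative; the formula is the σ1 and σ2 known interval given earlier in this section.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """Interval estimate of mu1 - mu2 when both population standard
    deviations are known and the samples are independent."""
    point = x1 - x2                                   # point estimate
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)        # standard error of x1 - x2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se                                   # margin of error
    return point, margin, point - margin, point + margin

# Greystone data: inner-city store (sample 1) and suburban store (sample 2).
point, margin, lower, upper = diff_means_ci(40, 35, 9, 10, 36, 49, 0.95)
print(point, round(margin, 2), round(lower, 2), round(upper, 2))
```

Because the whole interval lies above zero, the data suggest that inner-city customers are, on average, older than suburban customers.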

Illustration-2
A research team is interested in the difference between serum uric acid levels in patients
with and without Down's syndrome. In a large hospital for the treatment of patients with
intellectual disabilities, a sample of 12 individuals with Down's syndrome yielded a mean of x̄1 = 4.5
mg/100 ml. In a general hospital a sample of 15 normal individuals of the same age and
sex were found to have a mean value of x̄2 = 3.4 mg/100 ml. If it is reasonable to assume
that the two populations of values are normally distributed with variances equal to 1 and
1.5 respectively, find the 95 percent confidence interval for µ1 − µ2.
Given: n1 = 12, x̄1 = 4.5, σ1² = 1
n2 = 15, x̄2 = 3.4, σ2² = 1.5
• The point estimate for µ1 − µ2 is x̄1 − x̄2:
x̄1 − x̄2 = 4.5 − 3.4 = 1.1
• The standard error is
$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$
• The 95% confidence interval is
1.1 ± 1.96 (0.4282)
(0.26, 1.94)
Discussion: As this is a z-interval, we know that the correct value of z to use is 1.96. We
interpret this interval as follows: the point estimate of the difference between the two
population means is 1.1, and we are 95% confident that the true difference µ1 − µ2 lies
between 0.26 and 1.94.
2.5. Student’s t-distribution
If the sample size is small (n < 30), we can develop an interval estimate of a population
mean only if the population has a normal probability distribution. If the sample standard
deviation (s) is used as an estimator of the population standard deviation (σ) and if the
population has a normal distribution, interval estimation of the population mean can be
based upon a probability distribution known as the t-distribution.

Characteristics of t-distribution
1. The t-distribution is symmetric about its mean (0) and ranges from - ∞ to ∞.
2. The t-distribution is bell-shaped (uni-modal) and has approximately the same
appearance as the standard normal distribution (Z- distribution).
3. The t-distribution depends on a parameter ν (Greek nu), called the degrees of
freedom of the distribution: ν = n − 1, where n is the sample size. The degrees of freedom,
ν, refers to the number of values we can choose freely.
4. The variance of the t-distribution is ν/ (ν-2) for ν>2.
5. The variance of the t-distribution always exceeds 1.
6. As ν increases, the variance of the t-distribution approaches 1 and the shape
approaches that of the standard normal distribution.
7. Because the variance of the t-distribution exceeds 1.0 while the variance of the Z-
distribution equals 1, the t-distribution is slightly flatter in the middle than the Z-
distribution and has thicker tails.
8. The t-distribution is a family of distributions with a different density function
corresponding to each different value of the parameter ν. That is, there is a separate t-
distribution for each sample size. In proper statistical language, we would say,
“There is a different t-distribution for each of the possible degrees of freedom”.
9. The t formula for a sample when σ is unknown, the sample size is small, and the
population is normally distributed is:
$t = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
This formula is essentially
the same as the z-formula, but the distribution table values are not.
The confidence interval to estimate μ becomes:
$\mu = \bar{X} \pm t_{\alpha/2, \nu} \frac{s}{\sqrt{n}}$
Where: X̄ = sample mean
α = 1 – C
ν = n – 1 (degrees of freedom)
s = sample standard deviation
n = sample size
μ = unknown population mean
Steps:
i. Calculate the degrees of freedom (ν = n − 1) and the sample standard error of the mean.
ii. Compute α/2
iii. Look up $t_{\alpha/2, \nu}$
iv. Construct the confidence interval
v. Interpret the results

Illustration One
A random sample of 27 items produces x̄ = 128.4 and s = 20.6. What is the 98%
confidence interval for μ? Assume that x is normally distributed in the population.
What is the point estimate?
Solution:
The point estimate of the population mean is the sample mean; in this case 128.4 is the
point estimate.
n = 27, X̄ = 128.4, s = 20.6, C = 0.98, ν = n – 1 = 27 – 1 = 26
i. $S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{20.6}{\sqrt{27}} = 3.96$
ii. α = 1 – C = 1 – 0.98 = 0.02
α/2 = 0.02/2 = 0.01
iii. $t_{\alpha/2, \nu} = t_{0.01, 26} = 2.479$
iv. $\mu = \bar{X} \pm t_{\alpha/2, \nu} \frac{s}{\sqrt{n}}$
= 128.4 ± 2.479(3.96)
= 128.4 ± 9.82
118.58 ≤ μ ≤ 138.22

We state with 98% confidence that the population mean lies between 118.58 and 138.22.
Illustration Two
A sample of 20 cab fares in Mekelle city shows a sample mean of Br 2.50 and a sample
standard deviation of Br. 0.50. Develop a 90% confidence interval estimate of the mean
cab fares in Mekelle city. Assume the population of cab fares has a normal distribution.
n = 20, X̄ = Birr 2.50, s = Birr 0.50, C = 0.90, ν = n – 1 = 20 – 1 = 19
i. $S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{0.5}{\sqrt{20}} = 0.112$
ii. α = 1 – C = 1 – 0.90 = 0.10
α/2 = 0.10/2 = 0.05
iii. $t_{\alpha/2, \nu} = t_{0.05, 19} = 1.729$
iv. $\mu = \bar{X} \pm t_{\alpha/2, \nu} \frac{s}{\sqrt{n}}$
= 2.50 ± 1.729(0.112)
= 2.50 ± 0.194

2.31 ≤ μ ≤ 2.69

We state with 90% confidence that the mean of cab fares in Mekelle city lies
between Birr 2.31 and 2.69.
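The small-sample steps can be sketched as below. Python's standard library has no t-distribution, so this sketch assumes the critical value is looked up in a t table, as in step iii, and passed in as `t_crit`; the function name is illustrative.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval for the mean.
    t_crit is the table value t(alpha/2, n - 1), supplied by the caller."""
    df = n - 1                   # step i: degrees of freedom
    se = s / sqrt(n)             # step i: sample standard error of the mean
    margin = t_crit * se         # step iv: margin of error
    return df, xbar - margin, xbar + margin

# Cab-fare illustration: n = 20, 90% confidence, t(0.05, 19) = 1.729 from the table.
df, lower, upper = t_interval(xbar=2.50, s=0.50, n=20, t_crit=1.729)
print(df, round(lower, 2), round(upper, 2))
```

Using the Z value 1.645 in place of 1.729 would give a narrower interval; the larger t value reflects the extra uncertainty from estimating σ with a small sample.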
2.6. Determining the Sample Size
The reason for taking a sample from a population is that it would be too costly to gather
data for the whole population. But collecting sample data also costs money; and the
larger the sample, the higher the cost. To hold cost down, we want to use as small a
sample as possible. On the other hand, we want a sample to be large enough to provide
"good" approximations (estimates) of population parameters. The size of a sample must be
determined scientifically. Care must be taken not to select a sample that is too large or too
small. There are two common misconceptions about how many to sample:
a) A sample consisting of 5% (or a similar constant percentage) of the population is adequate for all
problems.
5% can be too much for a particular population, say 10 million, or too small
for another, say 200.
b) A sample must be selected from a heavily populated area.
To avoid such problems, the sample size should be determined mathematically.

2.6.1 Sample Size for the Mean


There are three factors that determine the size of the sample, none of which has any
direct relationship to the size of the population.
a. The degree of confidence selected
b. The maximum allowable error
c. The variation in the population
a. The degree of confidence: this is usually 95% or 99%, but it may be any level. It is
specified by the statistician. The higher the degree of confidence, the larger the
sample required. If we wanted to be completely sure that the true mean lies within an interval, we
would have to survey the entire population. Example: Suppose the parameter to be
estimated is the arithmetic mean, and the degree of confidence selected is 90%. Based
on a sample, it was estimated that the population mean is in the interval between 850
and 1050. Logically, if the degree of confidence were increased to 95% or 99% for the same
interval, the sample size would have to increase.



b. Maximum error allowed: this is the maximum error that will be tolerated at a specified
level of confidence. Suppose a statistician is interested in estimating the mean income
of residents of an area. There are indications that family incomes range from a
probable low of 19,000 to a high of about 39,000. On the assumption that these are
reasonable estimates, does it seem likely that the statistician would be satisfied with
this statement resulting from a sample of area residents: "The population mean is
between 23,000 and 35,000"? Probably not, because confidence limits that wide
indicate little or nothing about the population mean. Instead, the statistician states:
"Using the 0.95 confidence level, the total error in predicting the population mean
should not exceed 200". The maximum allowable error is denoted by E, where E = |x̄ −
μ|. This means that, based on a sample of size n, if the estimate of the population mean is
computed to be 35,000, then we will be assured that the population mean is in the interval
between 34,800 and 35,200, found by 35,000 − 200 and 35,000 + 200. For the 0.95
degree of confidence selected, the maximum error of ±200 in terms of Z is 1.96. To
determine the value of one standard error of the mean, $\sigma_{\bar{x}}$, simply divide the total error
of 200 by 1.96:
$\sigma_{\bar{x}} = \frac{200}{1.96} = 102.04$
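[Figure: standard normal curve with the Z axis marked at -1.96, -1, 0, +1 and +1.96. The error cannot exceed 200 on either side of the sample mean (102.04 + 97.96 = 200), so the population mean must be in the interval ±200 from the sample mean.]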

The size of the sample is computed by solving for n in the formula

$S_{\bar{x}} = \frac{S}{\sqrt{n}}$

Note that since we are using a sample standard deviation, $S_{\bar{x}}$ is substituted
for $\sigma_{\bar{x}}$ and S for $\sigma$. In words:

$\frac{\text{Total allowable error}}{Z} = \frac{\text{Sample standard deviation}}{\sqrt{\text{sample size}}}$

Let the total allowable error be represented by E. Then it follows that

$\frac{E}{Z} = \frac{S}{\sqrt{n}}$, that is, $\frac{200}{1.96} = \frac{S}{\sqrt{n}} = 102.04$

Since there are two unknowns (S and n) in one equation, we cannot solve for both.



c. Variation in the population. There are still two unknowns. To solve for the number
to be sampled, we need to estimate the variation in the population. The standard
deviation is a measure of variation, so the standard deviation of the population
must be estimated.
This can be done either:
a- By taking a small pilot survey and using the standard deviation of the pilot sample as
an estimate of the population standard deviation, or
b- By estimating the standard deviation based on knowledge of the population.
Suppose a pilot survey is conducted and the sample standard deviation is computed to be
3000. The number to be sampled can now be estimated:

$S_{\bar{x}} = \frac{S}{\sqrt{n}} = \frac{E}{Z}$

$\frac{200}{1.96} = \frac{3000}{\sqrt{n}}$, so $n = 864.36 \approx 865$

$S_{\bar{x}}$ is the standard error of the mean, the error we commit in estimating $\mu$.
From the above computation we can see that as the variation in the population increases,
the required sample size increases.
A more convenient computational formula for determining n is

$n = \left( \frac{Z \cdot S}{E} \right)^2$

where E = allowable error
Z = Z value for the degree of confidence selected
S = sample standard deviation

For this example, $n = \left( \frac{1.96 \times 3000}{200} \right)^2 = 864.36$, which is rounded up to 865.
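As a rough illustration of the formula above, the following Python sketch computes the required sample size for a mean and rounds it up to the next whole unit. The function name `sample_size_mean` and its argument names are illustrative choices, not part of the original text, and the small rounding guard is an assumption added so that floating-point noise does not push an exact result up by one.

```python
import math


def sample_size_mean(z, s, e):
    """Return the smallest whole sample size n with (z * s / e)**2 <= n.

    z: Z value for the chosen confidence level (e.g. 1.96 for 95%)
    s: estimated standard deviation (pilot sample or prior knowledge)
    e: maximum allowable error, in the same units as s
    """
    if e <= 0 or s < 0:
        raise ValueError("e must be positive and s must be non-negative")
    raw = (z * s / e) ** 2
    # Round to 9 decimals first so floating-point noise such as
    # 2401.0000000000005 is not rounded up to 2402.
    return math.ceil(round(raw, 9))


# Pilot-survey example from the text: Z = 1.96, S = 3000, E = 200
n_pilot = sample_size_mean(1.96, 3000, 200)
print(n_pilot)  # raw value is 864.36, so this prints 865
```

Rounding up rather than to the nearest integer is deliberate: a smaller n would give an error slightly larger than E at the chosen confidence level.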
Example 1. A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular pub. The people who
plan the survey would like to be able to determine the average amount spent by all people
visiting the pub to within Br. 120, with 95% confidence. From past operations of the pub,
an estimate of the population standard deviation is σ = Br. 400. What is the minimum
required sample size?
Z = 1.96
E = 120
σ = 400
Required: n

$n = \left( \frac{1.96 \times 400}{120} \right)^2 = 42.68 \approx 43$



Illustration-2: A gasoline service station shows a standard deviation of Birr 6.25 for the
charges made by its credit card customers. Assume that the station's management would
like to estimate the population mean gasoline bill for its credit card customers to
within ± Birr 1.00. For a 95% confidence level, how large a sample would be necessary?
Solution:

e = Birr 1.00, σ = Birr 6.25, C = 0.95

$Z_{\alpha/2} = Z_{0.025} = 1.96$

$n_\mu = \left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$

$n_\mu = \left( \frac{1.96 \times 6.25}{1} \right)^2 = 150.06 \approx 151$
Illustration-3: The National Travel and Tour Organization (NTO) would like to estimate
the mean amount of money spent by a tourist to within Birr 100 with 95% confidence.
If the amount of money spent by a tourist is considered to be normally distributed with a
standard deviation of Birr 200, what sample size would be necessary for the NTO to meet
its objective in estimating this mean amount?
Solution:
e = Birr 100, σ = Birr 200, C = 0.95

$Z_{\alpha/2} = Z_{0.025} = 1.96$

$n_\mu = \left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$

$n_\mu = \left( \frac{1.96 \times 200}{100} \right)^2 = 15.37 \approx 16$
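
The three factors discussed in this section can also be explored numerically. The sketch below is only an illustration under stated assumptions: it uses Python's standard-library `statistics.NormalDist` to obtain Z values (so they are slightly more precise than the rounded table value 1.96), and the helper names `z_for_confidence` and `sample_size_mean` are hypothetical. It reuses the pilot-survey figures S = 3000 and E = 200 from earlier in the section as a baseline.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided Z value, e.g. about 1.96 for confidence = 0.95."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def sample_size_mean(z, s, e):
    """n = (z * s / e)**2, rounded up to a whole observation."""
    return math.ceil(round((z * s / e) ** 2, 9))


# Higher confidence -> larger n (S = 3000, E = 200 held fixed).
by_confidence = {c: sample_size_mean(z_for_confidence(c), 3000, 200)
                 for c in (0.90, 0.95, 0.99)}

# More variation -> larger n (95% confidence, E = 200 held fixed).
by_sd = {s: sample_size_mean(z_for_confidence(0.95), s, 200)
         for s in (1500, 3000, 6000)}

# Smaller allowable error -> larger n (95% confidence, S = 3000 fixed).
by_error = {e: sample_size_mean(z_for_confidence(0.95), 3000, e)
            for e in (400, 200, 100)}

for label, table in (("confidence", by_confidence),
                     ("std dev", by_sd),
                     ("error", by_error)):
    print(label, table)
```

Because n depends on the square of Z·S/E, halving the allowable error or doubling the standard deviation roughly quadruples the required sample.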

2.6.2 Sample size for estimating population proportion, p


The procedure used to determine the sample size for the mean also applies when
proportions are involved.
Three things must be specified:
- The level of confidence
- How precise the estimate of the population proportion must be
- An approximation of the population proportion, p, either from past experience or from a
small pilot survey
The formula for determining the sample size n for a proportion is

$n = p(1 - p)\left( \frac{Z}{E} \right)^2$



where p = estimated proportion
Z = Z value for the selected confidence level
E = the maximum tolerable error
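
The proportion formula can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `sample_size_proportion` and the input values in the usage line are hypothetical, and the rounding guard is an added assumption to avoid floating-point artifacts before rounding up.

```python
import math


def sample_size_proportion(p, z, e):
    """Return n = p * (1 - p) * (z / e)**2, rounded up.

    p: estimated population proportion, between 0 and 1
    z: Z value for the chosen confidence level
    e: maximum tolerable error, as a proportion (0.02 means 2 points)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    raw = p * (1.0 - p) * (z / e) ** 2
    # Guard against floating-point noise before rounding up.
    return math.ceil(round(raw, 9))


# Hypothetical inputs: prior estimate p = 0.30, 95% confidence, error 0.05
n_example = sample_size_proportion(0.30, 1.96, 0.05)
print(n_example)  # raw value is about 322.69
```

Note that E must be expressed on the same scale as p: an error of 2 percentage points is entered as 0.02, not 2.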
Example 1. A member of parliament wants to determine her popularity in her region.
She indicates that the proportion of voters who will vote for her must be estimated to
within ±2 percent of the population proportion. Further, the 95% degree of confidence is
to be used. In past elections she received 40% of the popular vote in that area. She doubts
whether it has changed much. How many registered voters should be sampled?
Z = 1.96
p = 0.40
E = 0.02

$n = p(1 - p)\left( \frac{Z}{E} \right)^2 = 0.40(1 - 0.40)\left( \frac{1.96}{0.02} \right)^2 = 2304.96 \approx 2305$

This sample size might be too large, too small, or exactly correct, depending on the
accuracy of p. Note: if there is no logical estimate of p, the sample size can be
estimated by letting p = 0.5.
Example 2. Suppose the president wants an estimate of the proportion of the population
that supports the current policy on unemployment. The president wants the estimate to be
within 0.04 of the true proportion. Assume a 95% level of confidence and the proportion
supporting current policy to be 0.60.
a) How large a sample is required?
b) How large would the sample have to be if the estimate were not available?
Solution:
a) E = 0.04
Z = 1.96
p = 0.60

$n = 0.6(1 - 0.6)\left( \frac{1.96}{0.04} \right)^2 = 576.24 \approx 577$

b) E = 0.04
Z = 1.96
p = 0.50 (since there is no estimate)

$n = 0.5(1 - 0.5)\left( \frac{1.96}{0.04} \right)^2 = 600.25 \approx 601$
Illustration-3: Suppose that a production facility purchases a particular component part
in large lots from a supplier. The production manager wants to estimate the proportion of
defective parts received from this supplier. She believes that the proportion of defects is
no more than 0.2 and wants to be within 0.02 of the true proportion of defects with a
90% level of confidence. How large a sample should she take?

Solution:
e = 0.02, p = 0.2, q = 0.8, C = 0.90

$Z_{\alpha/2} = Z_{0.05} = 1.64$

$n_p = \left( \frac{Z_{\alpha/2}}{e} \right)^2 pq$

$n_p = \left( \frac{1.64}{0.02} \right)^2 (0.2)(0.8) = 1075.84 \approx 1076$

Illustration-4: What is the largest sample size that would be needed in estimating a
population proportion to within ± 0.02, with a confidence coefficient of 0.95?
Solution:
e = 0.02, C = 0.95

$Z_{\alpha/2} = Z_{0.025} = 1.96$

The largest sample size would be obtained when p = 0.5. So,

$n_p = \left( \frac{Z_{\alpha/2}}{e} \right)^2 pq$

$n_p = \left( \frac{1.96}{0.02} \right)^2 (0.5)(0.5) = 2401$

If p is unknown and there is no possibility of estimating it, use 0.5 as the value of p
because it will generate the greatest possible sample size as compared with other values.
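
The claim that p = 0.5 gives the greatest sample size can be checked with a short Python sketch. It is an illustration only: the function name `sample_size_proportion` is hypothetical, and the grid of p values (0.1 to 0.9) is an arbitrary choice made for the demonstration, using the same Z = 1.96 and e = 0.02 as Illustration-4.

```python
import math


def sample_size_proportion(p, z, e):
    """n = p * (1 - p) * (z / e)**2, rounded up to a whole unit."""
    return math.ceil(round(p * (1.0 - p) * (z / e) ** 2, 9))


z, e = 1.96, 0.02  # 95% confidence, error within 0.02

# Required n for p = 0.1, 0.2, ..., 0.9
sizes = {i / 10: sample_size_proportion(i / 10, z, e) for i in range(1, 10)}
for p, n in sizes.items():
    print(f"p = {p:.1f}  n = {n}")

# The product p * (1 - p) peaks at p = 0.5, so n does too.
worst_p = max(sizes, key=sizes.get)
print("largest n occurs at p =", worst_p)
```

The reason is that p(1 - p) is a downward-opening parabola in p, symmetric about 0.5, so any other assumed proportion yields a smaller required sample.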
2.6.3 Determining Sample Size When Estimating μ1 − μ2
When taking two random samples and using the difference in sample means to estimate
the difference in population means, a researcher should have an idea of how large the
sample sizes need to be. Solving for n from the formula

$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

does not look promising, because the equation has nine variables, including two different
values of n. However, making some assumptions can generate a workable sample size
formula.
1. The variances of the two populations are the same: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

2. The sample size for each sample is the same: $n_1 = n_2 = n$


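The difference between $\bar{x}_1 - \bar{x}_2$ and $\mu_1 - \mu_2$ is the error of estimation:
$e = (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)$.
Incorporating these assumptions into the z-formula yields

$Z = \frac{e}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}}} = \frac{e}{\sqrt{\frac{2\sigma^2}{n}}} = \frac{e}{\sqrt{2}\,\sqrt{\frac{\sigma^2}{n}}} = \frac{e}{\sqrt{2}\,\frac{\sigma}{\sqrt{n}}}$

Solving for n produces the sample size:

$n = \frac{2 Z_{\alpha/2}^2 \sigma^2}{e^2} = 2\left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$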

The above formula suggests that, when estimating the difference between two means, each
of the two samples must be twice as large as the sample required for estimating a single
mean with the same error and confidence level. It is clear that the larger the sample,
the more it costs. Thus sample size formulas can be effective aids in ensuring that a
research project's goals are met and that the cost of sampling is minimized.
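
The two-sample formula can be sketched in Python alongside the one-sample formula for comparison. This is a hedged illustration: both function names and the input values in the usage lines are hypothetical, and the code relies on the same two assumptions as the derivation (equal population variances and equal sample sizes).

```python
import math


def sample_size_one_mean(z, sigma, e):
    """n = (z * sigma / e)**2, rounded up."""
    return math.ceil(round((z * sigma / e) ** 2, 9))


def sample_size_two_means(z, sigma, e):
    """Per-group n = 2 * (z * sigma / e)**2, rounded up.

    Assumes equal population variances and equal sample sizes,
    as in the derivation above.
    """
    return math.ceil(round(2.0 * (z * sigma / e) ** 2, 9))


# Hypothetical inputs: sigma = 50, error within 10, 95% confidence
z, sigma, e = 1.96, 50, 10
n_one = sample_size_one_mean(z, sigma, e)
n_two = sample_size_two_means(z, sigma, e)
print(n_one)  # raw value 96.04
print(n_two)  # raw value 192.08, required in EACH of the two samples
```

The value returned by `sample_size_two_means` is the size of each sample, so the total number of observations across both groups is twice that figure.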

Illustration-1: A college admissions officer wants to estimate the difference in the
average GMAT scores of men and women. She plans to take a random sample of men
and women who have taken the GMAT at the same time. She wants to be within 10
points of the true difference in the mean scores of men and women and 95% confident of
her results. Past GMAT test results indicate that the standard deviation of GMAT test
scores is about 105 points. How large should the sample sizes be?
Solution:
e = 10 points, σ = 105 points, C = 0.95, n = ?

$Z_{\alpha/2} = Z_{0.025} = 1.96$

$n = 2\left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$

$n = 2\left( \frac{1.96 \times 105}{10} \right)^2 = 2(423.54) = 847.07 \approx 848$
Illustration-2: A researcher wants to estimate the difference between the average price of a 21-inch
black and white TV and the average price of a 21-inch color TV set. He believes that the
standard deviation of the price of a 21-inch TV set is about Birr 100. He wants to be 99%
confident of his results and within Birr 20 of the true difference. How large a sample
should he take for each type of television set?
Solution:
e = Birr 20, σ = Birr 100, C = 0.99, n = ?

$Z_{\alpha/2} = Z_{0.005} = 2.58$

$n = 2\left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$

$n = 2\left( \frac{2.58 \times 100}{20} \right)^2 = 2(166.41) = 332.82 \approx 333$

End of Chapter Two
