Buss. Stat CH-2
CHAPTER TWO
STATISTICAL ESTIMATION
2.1. Basic Concepts
Statistical inference is the process of making judgments about a population based on the properties of a sample. An important aspect of statistical inference is using estimates to approximate the value of an unknown population parameter. This chapter studies different kinds of estimators and lays the foundations for making statistical inferences about the population mean and proportion.
Figure: Statistical inference (Population, Sample, Numerical data, Analyzed data, Inference).
Definitions
Confidence Interval: An interval estimate with a specific level of confidence.
Confidence Level: The probability that the interval estimate will contain the parameter.
Consistent Estimator: An estimator which gets closer to the value of the parameter as
the sample size increases.
Degrees of Freedom: The number of data values which are allowed to vary once a
statistic has been determined.
We can make two types of estimates about a population: a point estimate and an
interval estimate.
A. Point estimate: A single number used to estimate an unknown population parameter. A point estimate is a value computed from sample data. The sample mean X̄ is a point estimator of the population mean μ, and the sample proportion p̄ is a point estimator of the population proportion P.
B. Interval estimate: A range of values used to estimate the population parameter. It places the unknown population parameter between two limits. Unlike a point estimate, it takes into account the error associated with the sampling procedure, and it expresses that error in two ways: by the width of its range and by the probability that the true population parameter lies within that range.
Example: The mean age of men attending a show is between 28 and 36 years, which can be written as 28 ≤ mean age ≤ 36.
2. Efficiency: Efficiency refers to the size of the standard error of the statistic (the standard deviation of its sampling distribution). When estimators are compared for the same sample size, the more efficient estimator is the one with the smaller standard error, which makes it the more reliable one.
The sample mean and the sample proportion are consistent estimators: as n gets larger, their standard errors become smaller, since

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad \text{and} \qquad \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}.$$
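To see efficiency and consistency numerically, the sketch below simulates repeated samples from a hypothetical normal population (mean 50, standard deviation 10; these values are assumptions, not taken from the text) and measures the spread of two estimators of the centre. The sample median is brought in only as a contrasting estimator for the efficiency comparison; the helper name `simulated_se` is likewise hypothetical.

```python
import math
import random
import statistics

# Hypothetical population: Normal with mean 50 and standard deviation 10.
MU, SIGMA, REPS = 50.0, 10.0, 2000

def simulated_se(estimator, n, seed=1):
    """Empirical standard error: spread of `estimator` over REPS samples of size n."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(REPS)
    ]
    return statistics.stdev(estimates)

results = {}
for n in (10, 40, 160):
    results[n] = {
        "mean": simulated_se(statistics.fmean, n),
        "median": simulated_se(statistics.median, n),
        "theory": SIGMA / math.sqrt(n),  # sigma / sqrt(n) for the sample mean
    }
    r = results[n]
    print(f"n={n:3d}  SE(mean)={r['mean']:.3f}  "
          f"SE(median)={r['median']:.3f}  sigma/sqrt(n)={r['theory']:.3f}")
```

The simulated standard error of the mean falls as n grows and tracks σ/√n (consistency), while at every n it stays below that of the median (the mean is the more efficient of the two for normal data).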
2.1. Interval Estimates and Confidence Intervals
The probability that we associate with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. A higher probability means greater confidence.
Interval Estimation establishes an interval consisting of a lower limit and an upper limit
in which the true value of the population parameter is expected to fall. This interval is
called “Confidence Interval” in the parlance of inferential statistics.
≤ 5% N, when n or N is very large; otherwise, use the multiplier $z_{\alpha/2}$, which provides an area of α/2 in the upper tail of the standard normal distribution.
A review of the normal distribution illustrates the probability associated with an interval estimate around the mean.
The z-score of the standard normal distribution is used to determine the interval endpoints that correspond to the probability, or degree of certainty, that one wishes to use for the interval estimator.
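An interval estimator for the mean is given by the following formula:

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Or

$$\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$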
For reasonably large samples, the central limit theorem implies the following:
1. 90% of the sample means selected from the population will lie within 1.64 standard errors (σ/√n) of the population mean μ.
2. 95% of the sample means selected from the population will lie within 1.96 standard errors of the population mean μ.
3. 99% of the sample means will lie within 2.58 standard errors of the population mean μ.
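The multipliers 1.64, 1.96 and 2.58 are rounded values of $z_{\alpha/2}$ from the standard normal table. As a sketch, Python's standard-library `statistics.NormalDist` (Python 3.8+) can compute them directly; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}, the z value that leaves alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    # Share of the standard normal curve lying between -z and +z
    coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"{level:.0%} confidence: z = {z:.3f}, coverage = {coverage:.3f}")
```

This prints z ≈ 1.645, 1.960 and 2.576, with coverage 0.900, 0.950 and 0.990, matching the rounded table values above.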
Example: Find an interval estimate of the population mean of a random variable, given a sample of size 49, a population standard deviation of 5, and a sample mean of 15. Use a 95% confidence level.
Solution:

$$\sigma_{\bar{X}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{49}} = 0.7143,$$

and the given $\bar{X} = 15$.
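Since the level of significance is α = 100% − 95% = 5%, or 0.05, we have α/2 = 0.025. From the standard normal table, $z_{0.025} = 1.96$, so the interval is

$$\bar{X} \pm z_{0.025}\,\sigma_{\bar{X}} = 15 \pm 1.96(0.7143) = 15 \pm 1.40,$$

or the endpoints 13.60 ≤ μ ≤ 16.40.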
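A minimal sketch of the z-interval for this example, assuming σ is known; `z_interval` is a hypothetical helper that computes X̄ ± z·σ/√n.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Confidence interval xbar +/- z * sigma / sqrt(n) for a known sigma."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    e = z * se                  # maximum error of the estimate
    return xbar - e, xbar + e

se = 5 / math.sqrt(49)
low, high = z_interval(15, 5, 49, 1.96)
print(f"SE = {se:.4f}, interval = ({low:.2f}, {high:.2f})")  # SE = 0.7143, interval = (13.60, 16.40)
```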
The maximum error of the estimate, E, with level of confidence 1 − α, is the error associated with estimating the population mean from the sample mean. It is given by the formula below:

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
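The confidence interval can then be written as $\bar{X} - E \le \mu \le \bar{X} + E$. When both the mean and the standard deviation are estimated from the sample (by the sample mean $\bar{X}$ and the sample standard deviation $s$), the confidence interval for large samples is computed as above, with $s$ in place of $\sigma$.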
Example: Find the 99% confidence interval estimate of the true population mean
income if a sample of 100 families gives a sample mean of $28,500. From previous
experience we know that the population standard deviation is $5,000.
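Solution: Using α = 1 − 0.99 = 0.01, we have α/2 = 0.01/2 = 0.005; thus $z_{0.005} \approx 2.58$. The standard error is $\sigma_{\bar{X}} = 5{,}000/\sqrt{100} = 500$, so the interval is 28,500 ± 2.58(500) = 28,500 ± 1,290, that is, $27,210 ≤ μ ≤ $29,790.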
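A small sketch (standard library only) of the 99% interval for the income example, computed once with the rounded table value 2.58 quoted earlier for 99% confidence and once with the exact $z_{0.005}$ from `statistics.NormalDist`. The two sets of limits differ by only about two dollars.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 28_500, 5_000, 100   # values from the example
se = sigma / math.sqrt(n)             # standard error: 5,000 / 10 = 500

z_values = {
    "table": 2.58,                                # rounded table value for 99%
    "exact": NormalDist().inv_cdf(1 - 0.01 / 2),  # z_{0.005}, about 2.5758
}

intervals = {}
for label, z in z_values.items():
    e = z * se                                    # maximum error of the estimate, E
    intervals[label] = (xbar - e, xbar + e)
    print(f"{label}: z = {z:.4f}, E = {e:,.1f}, "
          f"interval = ({xbar - e:,.0f}, {xbar + e:,.0f})")
```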
Figure. Normal distribution, t distribution for sample size n = 21, and t distribution for sample size n = 6.
computed, and they tell the researcher which specific curve to use when a distribution
consists of a family of curves.
For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary. But
once 4 values are selected, the fifth value must be a specific number to get a sum of 50,
since 50/5 = 10. Hence, the degrees of freedom are 5 - 1 = 4, and this value tells the
researcher which t curve to use.
Degrees of freedom: df = n − 1
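The degrees-of-freedom argument above can be checked in a few lines. The four "free" values below are arbitrary hypothetical numbers; once they are chosen and the mean is fixed at 10, the fifth value is forced.

```python
# The mean of 5 values is fixed at 10, so their sum must be 50.
n, required_mean = 5, 10
free_values = [8, 12, 7, 11]                        # any 4 values may be chosen freely
last_value = n * required_mean - sum(free_values)   # the 5th value is then determined
data = free_values + [last_value]
print(last_value, sum(data) / n, "df =", n - 1)     # 12 10.0 df = 4
```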
Example 1: An agricultural chemical retail firm wants to estimate the average number of gallons of weed killer sold per day in order to forecast and control inventory accurately. Twelve business days were monitored, and average daily sales of 10 gallons were recorded. The sample yielded a standard deviation of 2 gallons. Calculate the 95% confidence limits.
Solution:
n = 12, X̄ = 10, s = 2, CL = 0.95
α = 1 − CL = 1 − 0.95 = 0.05, df = n − 1 = 12 − 1 = 11
α/2 = 0.05/2 = 0.025, and $t_{\alpha/2} = t_{0.025}$ at df = 11 is 2.201
Confidence interval:

$$\bar{X} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 10 \pm 2.201\left(\frac{2}{\sqrt{12}}\right) = 10 \pm 2.201(0.57735) = 10 \pm 1.27 = (8.73,\ 11.27)$$
Example 2: A drug company is testing a new drug that is supposed to reduce blood pressure. Among the six people used as subjects, the average drop in blood pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval for the mean change in pressure?
Solution:
X̄ = 2.28, s = 0.95, n = 6, 1 − α = 0.95 ⇒ α = 0.05, α/2 = 0.025
⇒ $t_{\alpha/2} = 2.571$ with df = 5, from the t table.
⇒ The required interval is

$$\bar{X} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 2.28 \pm 2.571\left(\frac{0.95}{\sqrt{6}}\right) = 2.28 \pm 0.997 = (1.28,\ 3.28)$$
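Both t-interval examples can be checked with a short sketch. The critical values 2.201 (df = 11) and 2.571 (df = 5) are taken from the t table as in the text; if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same values, but the sketch avoids that dependency. `t_interval` is a hypothetical helper name. For Example 2 the arithmetic gives a margin of about 0.997, which is consistent with the limits (1.28, 3.28).

```python
import math

def t_interval(xbar, s, n, t_crit):
    """xbar +/- t * s / sqrt(n); t_crit is read from a t table with df = n - 1."""
    e = t_crit * s / math.sqrt(n)   # maximum error of the estimate
    return xbar - e, xbar + e, e

# Example 1: weed killer sales (n = 12, df = 11, t = 2.201)
lo1, hi1, e1 = t_interval(10, 2, 12, 2.201)
# Example 2: drop in blood pressure (n = 6, df = 5, t = 2.571)
lo2, hi2, e2 = t_interval(2.28, 0.95, 6, 2.571)

print(f"Example 1: E = {e1:.3f}, ({lo1:.2f}, {hi1:.2f})")
print(f"Example 2: E = {e2:.3f}, ({lo2:.2f}, {hi2:.2f})")
```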
Interval Estimation for a Population Proportion
Example: When a sample of 70 retail executives was surveyed regarding the poor
performance of the retail industry, 66% believed that decreased sales were due to
unseasonably warm temperatures, resulting in consumers’ delaying purchase of cold-
weather items.
a) Estimate the standard error of the proportion of retail executives who blame warm
weather for low sales.
b) Find the upper and lower confidence limits for this proportion, given a 95% confidence
level.
Solution: n = 70 and p̄ = 0.66.

a) $$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.66(0.34)}{70}} = 0.0566$$

b) $$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} = 0.66 \pm 1.96(0.0566) = 0.66 \pm 0.111 = [0.549,\ 0.771]$$
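A minimal sketch of the proportion interval for the retail-executives example, using the sample proportion in place of p in the standard error; `proportion_interval` is a hypothetical helper name.

```python
import math

def proportion_interval(p_hat, n, z):
    """Return (standard error, lower limit, upper limit) for p_hat +/- z * SE."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # sqrt(pq / n), with q = 1 - p
    e = z * se                                # maximum error of the estimate
    return se, p_hat - e, p_hat + e

se, low, high = proportion_interval(0.66, 70, 1.96)
print(f"SE = {se:.4f}, interval = ({low:.3f}, {high:.3f})")  # SE = 0.0566, interval = (0.549, 0.771)
```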
So far, we have used the symbol n for the sample size rather than a specific number. Now we need to know how large the sample should be. If it is too small, we may fail to achieve the objective of our analysis; if it is too large, we waste resources gathering the sample.
Sampling error is controlled by selecting a sample that is adequate in size. In general,
the more precision we want, the larger the sample we will need to take.
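The correct sample size depends on three factors: the desired level of confidence (which determines $z_{\alpha/2}$), the maximum error of the estimate E that can be tolerated, and the population standard deviation σ.
Starting from $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ and solving for n gives the sample size needed for a given maximum error E:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

(n is expected to be large, n ≥ 30; a fractional result is rounded up to the next whole number.)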
Example 1: The average price of gasoline is expected to be $1.45 per gallon, and the standard deviation for a specific National State is $0.10 per gallon. It is believed that the mean price per gallon has changed. How many gas stations should be sampled to estimate the new mean for the National State with a maximum error of estimate of $0.01 and a 90% level of confidence?
Solution: α = 0.10, so α/2 = 0.05; from the reference table, $z_{0.05} = 1.65$, σ = 0.10, and E = 0.01. So,

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.65 \times 0.10}{0.01}\right)^2 = 272.25,$$

which is rounded up to 273 gas stations.
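The sample-size formula for a mean can be wrapped in a small helper. `sample_size_mean` is a hypothetical name; it rounds the result up, since a fractional sample size must become the next whole unit.

```python
import math

def sample_size_mean(z, sigma, e):
    """Minimum n so that the maximum error of the estimate is at most e.

    Uses n = (z * sigma / e) ** 2 and rounds up to the next whole number.
    """
    n = (z * sigma / e) ** 2
    # Round to 6 decimals first so floating-point noise (e.g. 576.0000000001
    # instead of 576) cannot push an exact whole number up by one.
    return math.ceil(round(n, 6))

print(sample_size_mean(1.65, 0.10, 0.01))   # 273  (table z used in the example)
print(sample_size_mean(1.645, 0.10, 0.01))  # 271  (more precise z for 90%)
```

The second call shows how sensitive the answer is to the rounding of z: 1.645 instead of 1.65 gives 270.6, rounded up to 271.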
The procedures for determining sample sizes for estimating a population proportion are
similar to those for estimating a population mean. Then the sample size needed is
determined by the formula:
$$n = pq\left(\frac{z_{\alpha/2}}{E}\right)^2, \qquad q = 1 - p$$
Example: Suppose the Prime Minister of a country wants an estimate for the proportion
of the ‘Kebele’ administrators who support the country’s current economic policy. The
Prime Minister wants the estimate to be within ± 0.04 of the true proportion and a 95%
level of confidence. The secretary of the office of the Prime Minister estimated the
proportion supporting the current policy to be 0.60. What sample size is required?
Solution: 1 − α = 0.95, so α = 0.05 and α/2 = 0.025. The table area 0.5 − 0.025 = 0.4750 corresponds to $z_{\alpha/2} = 1.96$. Also E = 0.04, p = 0.60, and q = 0.40. So,

$$n = pq\left(\frac{z_{\alpha/2}}{E}\right)^2 = (0.60)(0.40)\left(\frac{1.96}{0.04}\right)^2 = 576.24,$$

which is rounded up to 577.
Therefore, a sample of at least 577 ‘Kebele’ administrators is required.
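The same rounding-up rule applies to the proportion formula. A minimal sketch with a hypothetical helper `sample_size_proportion`, reproducing the ‘Kebele’ example:

```python
import math

def sample_size_proportion(p, z, e):
    """Minimum n for estimating a proportion within +/- e: n = p*q*(z/e)**2, rounded up."""
    q = 1 - p
    n = p * q * (z / e) ** 2
    return math.ceil(round(n, 6))  # guard against float noise before rounding up

print(sample_size_proportion(0.60, 1.96, 0.04))  # 577
```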
Exercises
1. The operations manager of a certain Tele-center is in the process of developing an operations plan. For that purpose, he takes a random sample of 60 calls from the company records and finds that the mean call length in the sample is 4.26 minutes. Past history for these types of calls has shown that the population standard deviation for call length is about 1.1 minutes. Assuming that the population is normally distributed and that he wants 95% confidence, help him estimate the population mean.
2. A survey conducted by the CSA found that the sample mean age of men was 44 years and the sample mean age of women was 47 years. Altogether, 454 people from Oromia were included in the reader poll: 340 women and 114 men. Assume that the population standard deviation of age for both men and women is 8 years. Develop a 99% confidence interval estimate for the mean age of men in the population.
3. Suppose that a survey is being conducted in a company that has 800 workers. A
random sample of 50 of these workers reveals that the average sample age is 34.3
years, and the sample standard deviation is 8 years. Assuming normality,
construct a 98% confidence interval to estimate the average age of all workers in
this company.
4. A recent study showed that the modern working person experiences an average
of 2.1 hours per day of distractions (phone calls, e-mails, impromptu visits, etc.).
A random sample of 50 workers for a large corporation found that these workers
were distracted an average of 1.8 hours per day and the population standard
deviation was 20 minutes. Estimate the true mean population distraction time
with 90% confidence, and compare your answer to the results of the study.
5. A survey of 30 emergency room patients found that the average waiting time for treatment was 174.3 minutes. Assuming that the population standard deviation is 46.5 minutes, find the best point estimate of the population mean and the 99% confidence interval of the population mean.
6. Ten randomly selected people were asked how long they slept at night. The
mean time was 7.1 hours, and the standard deviation was 0.78 hour. Find the
95% confidence interval of the mean time. Assume the variable is normally
distributed.
7. If a random sample of 27 items produces a mean of 128.4 and a standard deviation of 20.6, what is the 98% confidence interval for μ? Assume that x is normally distributed in the population. What is the point estimate?
8. A meteorologist who sampled 13 thunderstorms found that the average speed at
which they traveled across a certain state was 15 miles per hour. The standard
deviation of the sample was 1.7 miles per hour. Find the 99% confidence interval
of the mean. If a meteorologist wanted to use the highest speed to predict the
times it would take storms to travel across the state in order to issue warnings,
what figure would she likely use? (Answer: 13.6 < μ < 16.4; 16.4 miles per hour)
9. A recent study of 28 employees of XYZ Company showed that the mean of the
distance they traveled to work was 14.3 miles. The standard deviation of the
sample mean was 2 miles. Find the 95% confidence interval of the true mean. If a
manager wanted to be sure that most of his employees would not be late, how
much time would he suggest they allow for the commute if the average speed
were 30 miles per hour?
10. For a group of 22 college football players, the mean heart rate after a morning
workout session was 86 beats per minute, and the standard deviation was 5. Find
the 90% confidence interval of the true mean for all college football players after
a workout session.
11. The national average for the number of students per teacher for all U.S. public
schools is 15.9. A random sample of 12 school districts from a moderately
populated area showed that the mean number of students per teacher was 19.2
with a variance of 4.41. Estimate the true mean number of students per teacher
with 95% confidence. How does your estimate compare with the national
average?
12. A gasoline service station shows a standard deviation of Birr 6.25 for the charges
made by the credit card customers. Assume that the station’s management
would like to estimate the population mean gasoline bill for its credit card
customers to be within an error of Birr 1.00. For a 95% confidence level, how
large a sample would be necessary?
13. A random sample of shoppers at a convenience store is selected to see how much
they spent on that visit. The standard deviation of the population is $6.43. How
large a sample must be selected if the researcher wants to be 99% confident of
finding whether the true mean differs from the sample mean by $1.50?
14. A random sample of 205 college students was asked if they believed that places
could be haunted, and 65 responded yes. Estimate the true proportion of college
students who believe in the possibility of haunted places with 99% confidence.
According to Time magazine, 37% of Americans believe that places can be
haunted.
15. A survey conducted by Sallie Mae and Gallup of 1404 respondents found that
323 students paid for their education with student loans. Find the 90% confidence interval of the true proportion of students who paid for their education with student loans.
16. The national average for the percentage of high school graduates taking the SAT
is 49%, but the state averages vary from a low of 4% to a high of 92%. A random
sample of 300 graduating high school seniors was polled across a particular
tristate area, and it was found that 195 had taken the SAT. Estimate the true
proportion of high school graduates in this region who take the SAT with 95%
confidence.
17. A researcher wishes to estimate, with 95% confidence, the proportion of people
who own a home computer. A previous study shows that 40% of those
interviewed had a computer at home. The researcher wishes to be accurate
within 2% of the true proportion. Find the minimum sample size necessary.
18. It is believed that 25% of U.S. homes have a direct satellite television receiver.
How large a sample is necessary to estimate the true proportion of homes that do, with 95% confidence and within 3 percentage points? How large a sample is
necessary if nothing is known about the proportion?
19. America’s young people are heavy Internet users; 87% of Americans ages 12 to
17 are Internet users (The Cincinnati Enquirer, February 7, 2006). MySpace was
voted the most popular website by 9% in a sample survey of Internet users in this
age group. Suppose 1400 youths participated in the survey. What is the margin
of error, and what is the interval estimate of the population proportion for which
MySpace is the most popular website? Use a 95% confidence level.