Chapter Two Stat II

Chapter Two discusses statistical estimation, focusing on the sampling distribution of means and proportions, and the process of estimating unknown population parameters using sample data. It outlines the basic concepts of estimation, properties of estimators (unbiasedness, efficiency, consistency, and sufficiency), and differentiates between point and interval estimates. The chapter also provides formulas and examples for calculating confidence intervals for population means and proportions, emphasizing the importance of sample size and standard deviation in these estimations.

Uploaded by

seadkelil45

CHAPTER TWO

STATISTICAL ESTIMATION

Introduction

The sampling distribution of the mean shows how far sample means could be from a known
population mean. Similarly, the sampling distribution of the proportion shows how far sample
proportions could be from a known population proportion. In estimation, our aim is to determine
how far an unknown population mean could be from the mean of a simple random sample
selected from that population; or how far an unknown population proportion could be from a
sample proportion. Those are the concerns of statistical inference, in which a statement about an
unknown population parameter is derived from information contained in a random sample
selected from the population.

2.1. Basic concepts:

Estimation is the process of using statistics as estimates of parameters. It is any procedure in which sample information is used to estimate or predict the numerical value of some population measure (called a parameter).
Estimator refers to any sample statistic that is used to estimate a population parameter. E.g. $\bar{x}$ for $\mu$, $\bar{p}$ for $P$.
Estimate is a specific numerical value of our estimator. E.g. $\bar{x} = 9$.

Estimators: $\bar{x}$, $\bar{p}$, $s^2$, $s$
Parameters being estimated: $\mu$, $P$, $\sigma^2$, $\sigma$
Estimates (example values): 1, 0.5, 9, 3

Four Important Properties of Estimators

A number of different estimators are possible for the same population parameter, but some
estimators are better than others. To understand how, we need to look at four important
properties of estimators: unbiasedness, efficiency, consistency, and sufficiency.

Unbiasedness: An estimator exhibits unbiasedness when the mean of its sampling distribution is equal to the population parameter: $E(\hat{\theta}) = \theta$.

In general, unbiasedness is a desirable property for an estimator. The sample mean is an unbiased estimator of the population mean. Similarly, the sample variance (computed with divisor n − 1) is an unbiased point estimator of the population variance, because the mean of the sampling distribution of the sample variance is equal to the population variance. The sample proportion is also an unbiased estimator of the population proportion. However, because the standard deviation is a nonlinear function of the variance, the sample standard deviation is not an unbiased estimator of the population standard deviation. The bias of a point estimator is: Bias $= E(\hat{\theta}) - \theta$. If there are a number of unbiased estimators to choose from, there are three other criteria that could be used to select an estimator.
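A quick simulation makes unbiasedness concrete. The Python sketch below is only an illustration under assumed settings (a normal population with σ = 2, samples of size n = 5, 10,000 repetitions and a fixed random seed); none of these values come from the text, and the function name `average_estimates` is hypothetical. It averages the sample variance (divisor n − 1) and the sample standard deviation over many samples.

```python
import random
import statistics

def average_estimates(sigma=2.0, n=5, reps=10_000, seed=1):
    """Average s^2 and s over many samples of size n drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    variances, std_devs = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        variances.append(statistics.variance(sample))  # divisor n - 1
        std_devs.append(statistics.stdev(sample))      # square root of s^2
    return statistics.fmean(variances), statistics.fmean(std_devs)

mean_var, mean_sd = average_estimates()
print(f"average s^2 = {mean_var:.3f}  (sigma^2 = 4)")
print(f"average s   = {mean_sd:.3f}  (sigma   = 2)")
```

The average of s² should settle close to σ² = 4, while the average of s stays visibly below σ = 2 (about 1.88 for n = 5), which is the bias of the sample standard deviation described above.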

Efficiency: Efficiency is another standard that can be used to evaluate estimators. Efficiency refers to the size of the standard error of the statistic. The most efficient estimator is the one with the smallest variance. Thus, if there are two unbiased estimators of θ, $\hat{\theta}_1$ and $\hat{\theta}_2$, so that $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, the first estimator is said to be more efficient than the second if $\mathrm{var}(\hat{\theta}_1) < \mathrm{var}(\hat{\theta}_2)$.
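For a normal population, both the sample mean and the sample median are unbiased estimators of μ, so efficiency is what separates them. The sketch below estimates the variance of each estimator's sampling distribution by simulation; the population N(50, 10²), the sample size n = 25, the 10,000 repetitions and the helper name `sampling_variances` are illustrative assumptions, not taken from the text.

```python
import random
import statistics

def sampling_variances(n=25, reps=10_000, seed=2):
    """Simulated variances of the sample mean and sample median for N(50, 10^2) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = sampling_variances()
print(f"var(sample mean)   = {var_mean:.2f}")   # theory: 10^2 / 25 = 4
print(f"var(sample median) = {var_median:.2f}")
```

The sample mean's variance should come out near σ²/n = 4, while the median's is noticeably larger (for large samples from a normal population it approaches about π/2 ≈ 1.57 times the mean's), so the sample mean is the more efficient of the two here.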

Consistency: A third property of estimators, consistency, is related to their behavior as the sample gets large. A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter.

An unbiased estimator is a consistent estimator if its variance approaches 0 as n increases. For example, the sample mean is an unbiased and a consistent estimator of the population mean. Although the sample standard deviation is not an unbiased estimator of the population standard deviation, it is a consistent estimator of the population standard deviation.
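Consistency of the sample mean can be watched directly: as n grows, a larger share of sample means lands close to μ. The sketch below assumes a standard normal population, a tolerance of 0.1 around μ, 300 simulated samples per size and the hypothetical helper name `share_near_mu`; these choices are illustrative only.

```python
import random
import statistics

def share_near_mu(n, mu=0.0, sigma=1.0, tol=0.1, reps=300, seed=3):
    """Fraction of simulated samples of size n whose mean lies within tol of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) < tol:
            hits += 1
    return hits / reps

results = {n: share_near_mu(n) for n in (10, 100, 1000)}
for n, share in results.items():
    print(f"n = {n:>4}: {share:.2f} of sample means within 0.1 of mu")
```

Since σ/√n shrinks from about 0.32 (n = 10) to about 0.03 (n = 1000), the printed share should climb from roughly a quarter to nearly 1.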

Sufficiency: The last property of a good estimator is sufficiency. A sufficient statistic is an estimator that utilizes all the information a sample contains about the parameter to be estimated. For example, the sample mean is a sufficient estimator of the population mean. This means that no other estimator of the population mean from the same sample data, such as the sample median, can add any further information about the parameter (population mean) that is being estimated.

Types of Estimates:
We can make two types of estimates about a population: a point estimate and an interval
estimate.

2.2. Point Estimator of the Mean and Proportion

A point estimate is a single number that is used to estimate an unknown population parameter. It is a single value that is measured from a sample and used as an estimate of the corresponding population parameter.
The most important point estimates (given that they are single values) are:
o Sample mean $\bar{x}$ for population mean $\mu$;
o Sample proportion $\bar{p}$ for population proportion $P$;
o Sample variance $s^2$ for population variance $\sigma^2$; and
o Sample standard deviation $s$ for population standard deviation $\sigma$.

An interval estimate is a range of values used to estimate a population parameter. It describes the range of values within which a parameter might lie. Stated differently, an interval estimate is a range of values within which the analyst can declare, with some confidence, that the population parameter will fall.

Example:
Suppose we have the sample 10, 20, 30, 40 and 50 selected randomly from a population whose mean $\mu$ is unknown.

The sample mean,
$$\bar{x} = \frac{\sum x_i}{n} = \frac{10 + 20 + 30 + 40 + 50}{5} = 30,$$
is a point estimate of $\mu$.

On the other hand, if we state that the mean, $\mu$, is between $\bar{x} \pm 10$, the range of values from 20 (30 − 10) to 40 (30 + 10) is an interval estimate.
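The same point estimates can be computed with Python's standard `statistics` module. This is a minimal sketch for the five-value sample above; note that the ±10 band is simply the one used in the text, not a margin derived from any confidence level.

```python
import statistics

sample = [10, 20, 30, 40, 50]

x_bar = statistics.mean(sample)      # point estimate of mu
s2 = statistics.variance(sample)     # point estimate of sigma^2 (divisor n - 1)
s = statistics.stdev(sample)         # point estimate of sigma
interval = (x_bar - 10, x_bar + 10)  # the +/- 10 band used in the text

print(x_bar, s2, round(s, 2), interval)
```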

2.3. Interval Estimators of the Mean and the Proportion

Point estimators of population parameters, while useful, do not convey as much information as
interval estimators. Point estimation produces a single value as an estimate of the unknown
population parameter. The estimate may or may not be close to the parameter value; in other
words, the estimate may be incorrect. An interval estimate, on the other hand, is a range of
values that conveys the fact that estimation is an uncertain process. The standard error of the
point estimator is used in creating a range of values; thus a measure of variability is incorporated
into interval estimation. Further, a measure of confidence in the interval estimator is provided;
consequently, interval estimates are also called confidence intervals. For these reasons, interval
estimators are considered more desirable than point estimators.

2.3.1. Interval estimation for population mean, μ

As a result of the Central Limit Theorem (discussed in Chapter I), the following Z formula for sample means can be used when sample sizes are large, regardless of the shape of the population distribution, or for smaller sizes if the population is normally distributed:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
Rearranging the formula:
$$\mu = \bar{X} - Z\frac{\sigma}{\sqrt{n}}$$
Because the sample mean can be greater than or less than the population mean, z can be positive or negative. Thus, the preceding expression takes the form:
$$\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
The value of the population mean, μ, lies somewhere within this range. Rewriting this expression yields the confidence interval for the population mean:
$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$

The confidence interval for the population mean is affected by:

1. The population distribution, i.e., whether the population is normally distributed or not.
2. The standard deviation, i.e., whether σ is known or not.
3. The sample size, i.e., whether the sample size, n, is large or not.

A. Confidence interval estimate of μ: normal population, σ known

A confidence interval estimate for  is an interval estimate together with a statement of how
confident we are that the interval estimate is correct.

When the population distribution is normal and at the same time σ is known, we can estimate μ (regardless of the sample size) using the following formula [1]:
$$\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Where:
$\bar{X}$ = sample mean
$Z_{\alpha/2}$ = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 − C)
μ = unknown population mean

From the above formula we can learn that an interval estimate is constructed by adding and
subtracting the error term to and from the point estimate. That is, the point estimate is found at
the center of the confidence interval.

To find the interval estimate of the population mean, μ, we have the following steps.
1. Compute the standard error of the mean, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
2. Compute α/2 from the confidence coefficient.
3. Find the Z value for α/2, $Z_{\alpha/2}$, from the table.
4. Construct the confidence interval.
5. Interpret the results.
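The five steps above can be sketched as a small Python function. The name `z_interval_known_sigma` is hypothetical, and instead of a printed table it uses the exact normal quantile from `statistics.NormalDist().inv_cdf` (1.95996 rather than 1.96 for 95%), so results may differ from hand calculations in the last decimal place. It is checked here against the ETC call-length example that follows.

```python
import math
from statistics import NormalDist

def z_interval_known_sigma(x_bar, sigma, n, confidence):
    # Step 1: standard error of the mean
    se = sigma / math.sqrt(n)
    # Step 2: alpha from the confidence coefficient (alpha/2 goes in each tail)
    alpha = 1 - confidence
    # Step 3: Z_{alpha/2}, the upper alpha/2 point of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: construct the interval around the point estimate
    margin = z * se
    return x_bar - margin, x_bar + margin

low, high = z_interval_known_sigma(4.26, 1.1, 60, 0.95)
print(f"{low:.2f} <= mu <= {high:.2f}")
```

Step 5 (interpretation) stays a human task: the printed bounds are read as "we are 95% confident that μ lies between them".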

[1] This formula also works for problems that involve a large sample size (n > 30) even when the population is not normally distributed. If $n/N > 0.05$, the finite population correction factor may be used.

Example:
1. The vice president of operations for Ethiopian Tele Communication Corporation (ETC) is in the process of developing a strategic management plan. He believes that the ability to estimate the length of the average phone call on the system is important. He takes a random sample of 60 calls from the company records and finds that the sample mean length of a call is 4.26 minutes. Past history for these types of calls has shown that the population standard deviation for call length is about 1.1 minutes. Assuming that the population is normally distributed and that he wants 95% confidence, help him estimate the population mean.

Solution:
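n = 60 calls, $\bar{X}$ = 4.26 minutes, σ = 1.1 minutes, C = 0.95
i. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{60}} = 0.142$
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 4.26 \pm 1.96(0.142) = 4.26 \pm 0.28$
$3.98 \le \mu \le 4.54$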
The vice-president of ETC can be 95% confident that the average length of a call for the
population is between 3.98 and 4.54 minutes.
2. A survey conducted by “Addis Zemen Gazetta” found that the sample mean age of men was 44 years and the sample mean age of women was 47 years. Altogether, 454 people from Addis were included in the reader poll: 340 women and 114 men. Assume that the population standard deviation of age for both men and women is 8 years.
a. Develop a 95% confidence interval estimate for the mean age of the population of men who read the gazette.
b. Develop a 95% confidence interval estimate for the mean age of the population of women who read the gazette.
c. Compare the widths of the two interval estimates from parts (a) and (b). Which one has better precision? Why?
Solution:
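a.
n = 114 men, $\bar{X}$ = 44 years, σ = 8 years, C = 0.95
i. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{114}} = 0.75$
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 44 \pm 1.96(0.75) = 44 \pm 1.47$
$42.53 \le \mu \le 45.47$
b.
n = 340 women, $\bar{X}$ = 47 years, σ = 8 years, C = 0.95
i. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{340}} = 0.434$
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 47 \pm 1.96(0.434) = 47 \pm 0.85$
$46.15 \le \mu \le 47.85$

c. Part (b) has better precision: its interval is narrower (width 2 × 0.85 = 1.70 years versus 2 × 1.47 = 2.94 years for part a) because its sample size is larger.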
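The width comparison in part (c) follows from the formula: the full width is 2·Z·σ/√n, so with the same σ and confidence level it shrinks in proportion to 1/√n. A minimal sketch (the helper name `interval_width` is hypothetical; it uses the exact normal quantile rather than the table value 1.96):

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Full width 2 * Z_{alpha/2} * sigma / sqrt(n) of a z interval for mu."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

width_men = interval_width(8, 114, 0.95)
width_women = interval_width(8, 340, 0.95)
print(f"men:   {width_men:.2f} years")
print(f"women: {width_women:.2f} years")
```

The ratio of the two widths equals √(340/114) ≈ 1.73, which is exactly the effect of the larger sample of women.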

3. Time magazine reports information on the time required for caffeine from products such as
coffee and soft drinks to leave the body after consumption. Assume that the 99% confidence
interval estimate of the population mean time for adults is 5.6 hrs to 6.4 hrs.
a. What is the point estimate of the mean time for caffeine to leave the body after
consumption?
b. If the population standard deviation is 2 hrs, how large a sample was used to provide the
interval estimate?
Solution:
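C = 0.99; confidence interval: $5.6 \le \mu \le 6.4$
a. Point estimate $= \frac{5.6 + 6.4}{2} = 6$ hours
Or, using the two limits of the interval:
$5.6 = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$6.4 = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Adding the two equations gives $12 = 2\bar{X}$, so $\bar{X} = 6$ hours.

b. C = 0.99, σ = 2 hours, confidence interval: $5.6 \le \mu \le 6.4$, n = ?

α = 1 − C = 1 − 0.99 = 0.01; α/2 = 0.005; $Z_{\alpha/2} = Z_{0.005} = 2.57$

$6.4 = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$6.4 = 6 + 2.57\frac{2}{\sqrt{n}}$
$0.4 = \frac{5.14}{\sqrt{n}}$; rearranging the expression,
$\sqrt{n} = \frac{5.14}{0.4} = 12.85$; squaring both sides,
n = 165.12 ≈ 165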

We state with 99% confidence that the mean time required for caffeine to leave the body after
consumption lies between 5.6 and 6.4 hrs.
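Both parts of the caffeine example can be back-solved from the interval limits: the centre is the point estimate, and the half-width equals Z·σ/√n. The sketch below (hypothetical helper name `back_solve`) passes in the table value Z = 2.57 used above; with the more precise 2.576 the computed n moves closer to 166. Because this recovers a sample size that was already used, the result is rounded to the nearest integer rather than up.

```python
def back_solve(lower, upper, sigma, z):
    """Recover x_bar and n from a z interval x_bar +/- z * sigma / sqrt(n)."""
    x_bar = (lower + upper) / 2      # centre of the interval = point estimate
    margin = (upper - lower) / 2     # half-width = z * sigma / sqrt(n)
    n = (z * sigma / margin) ** 2    # solve the half-width equation for n
    return x_bar, n

x_bar, n = back_solve(5.6, 6.4, sigma=2, z=2.57)
print(x_bar, round(n, 2), round(n))
```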

B. Confidence interval estimate of μ: normal population, σ unknown, n large

If we know that the population is normal and we know the population standard deviation, σ, the confidence interval for μ should be constructed in the manner already shown, i.e., $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. If the population standard deviation is unknown, it has to be estimated from the sample; i.e., when σ is unknown, we use the sample standard deviation:
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$$
Then the standard error of the mean, $\sigma_{\bar{X}}$, is estimated by the sample standard error of the mean: $S_{\bar{X}} = \frac{S}{\sqrt{n}}$.

Therefore, the confidence interval to estimate μ when the population standard deviation is unknown, the population is normal and n is large is [2]:
$$\mu = \bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$$

Example:
1. Suppose that a car rental firm in Addis wants to estimate the average number of miles traveled per day by each of its rented cars. A random sample of 110 rented cars reveals that the sample mean travel distance per day is 85.5 miles, with a sample standard deviation of 19.3 miles. Compute a 99% confidence interval to estimate μ.
Solution:
n = 110 rented cars, $\bar{X}$ = 85.5 miles, s = 19.3 miles, C = 0.99
i. $S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{19.3}{\sqrt{110}} = 1.84$
ii. α = 1 − C = 1 − 0.99 = 0.01; α/2 = 0.01/2 = 0.005
iii. $Z_{\alpha/2} = Z_{0.005} = 2.57$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 85.5 \pm 2.57(1.84) = 85.5 \pm 4.73$
$80.77 \le \mu \le 90.23$

We state with 99% confidence that the average distance traveled by rented cars lies between
80.77 and 90.23 miles.
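The only change from the σ-known case is that the sample standard deviation s replaces σ in the standard error. A minimal sketch follows; the function name is hypothetical, and Z is passed in as the table value 2.57 used in the worked example. If raw data were available, s could come from `statistics.stdev`, which uses the same n − 1 divisor as the formula for S above.

```python
import math

def z_interval_sample_sd(x_bar, s, n, z):
    """Large-sample CI for mu when sigma is unknown and estimated by s."""
    se = s / math.sqrt(n)        # sample standard error of the mean, S_xbar
    return x_bar - z * se, x_bar + z * se

low, high = z_interval_sample_sd(85.5, 19.3, 110, z=2.57)  # table value for 99%
print(f"{low:.2f} <= mu <= {high:.2f}")
```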
2. A study is being conducted in a company that has 800 engineers. A random sample of 50 of
these engineers reveals that the average sample age is 34.3 years, and the sample standard
deviation is 8 years. Assuming normality, construct a 98% confidence interval to estimate the
average age of all engineers in this company.
Solution:
n = 50 engineers, N = 800 engineers, $\bar{X}$ = 34.3 years, s = 8 years, C = 0.98
i. $S_{\bar{X}} = \frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} = 1.10$ [3]
ii. α = 1 − C = 1 − 0.98 = 0.02; α/2 = 0.02/2 = 0.01
iii. $Z_{\alpha/2} = Z_{0.01} = 2.33$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 34.3 \pm 2.33(1.10) = 34.3 \pm 2.56$
$31.74 \le \mu \le 36.86$
We state with 98% confidence that the mean age of engineers lies between 31.74 and 36.86 years.

[2] This formula also works for a large sample size even when the parent population is not normally distributed.
[3] Since the sample size is greater than 5% of the population size (50/800 = 6.25%), the finite population multiplier is used to calculate the sample standard error of the mean.
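The finite population multiplier can be folded into the standard-error calculation with the 5% rule as a switch. This is a sketch under the text's rule of thumb (n/N > 0.05); the helper name `standard_error_fpc` is hypothetical, and Z = 2.33 is the table value from the example.

```python
import math

def standard_error_fpc(s, n, N):
    """Sample standard error of the mean, with the finite population multiplier if n/N > 0.05."""
    se = s / math.sqrt(n)
    if n / N > 0.05:                           # sample is more than 5% of the population
        se *= math.sqrt((N - n) / (N - 1))     # finite population multiplier
    return se

se = standard_error_fpc(8, 50, 800)
low, high = 34.3 - 2.33 * se, 34.3 + 2.33 * se
print(f"SE = {se:.2f}; {low:.2f} <= mu <= {high:.2f}")
```

This prints bounds of 31.75 and 36.85 because SE is not rounded; rounding SE to 1.10 before multiplying, as the worked solution does, gives 31.74 and 36.86.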

C. Confidence interval for μ: σ unknown, n small, population normal

If the sample size is small (n < 30), we can develop an interval estimate of a population mean only if the population has a normal probability distribution. If the sample standard deviation s is used as an estimator of the population standard deviation σ and the population has a normal distribution, interval estimation of the population mean can be based upon a probability distribution known as the t-distribution.

Characteristics of t-distribution

1. The t-distribution is symmetric about its mean (0) and ranges from - ∞ to ∞.
2. The t-distribution is bell-shaped (unimodal) and has approximately the same appearance as
the standard normal distribution (Z- distribution).
3. The t-distribution depends on a parameter ν (Greek nu) [4], called the degrees of freedom of the distribution: ν = n − 1, where n is the sample size. The degrees of freedom, ν, refer to the number of values we can choose freely.
4. The variance of the t-distribution is ν/ (ν-2) for ν>2.
5. The variance of the t-distribution always exceeds 1.
6. As ν increases, the variance of the t-distribution approaches 1 and the shape approaches that
of the standard normal distribution.
7. Because the variance of the t-distribution exceeds 1.0 while the variance of the Z-distribution
equals 1, the t-distribution is slightly flatter in the middle than the Z-distribution and has
thicker tails.

[4] What are degrees of freedom? We can define them as the number of values we can choose freely. In general, the degrees of freedom for a t statistic are the degrees of freedom associated with the sum of squares used to obtain an estimate of the variance. The variance estimate depends not only on the sample size but also on how many parameters must be estimated with the sample:
Degrees of freedom = Number of observations − Number of parameters that must be estimated beforehand
Here we calculate the sample variance by using n observations and estimating one parameter (the mean). Thus, there are (n − 1) degrees of freedom.
8. The t-distribution is a family of distributions with a different density function corresponding
to each different value of the parameter ν. That is, there is a separate t-distribution for each
sample size. In proper statistical language, we would say, “There is a different t-distribution
for each of the possible degrees of freedom”.
9. The t formula for sample means when σ is unknown, the sample size is small, and the population is normally distributed is: $t = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$. This formula is essentially the same as the z formula, but the distribution table values are not.
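Characteristics 4 to 6 above can be checked numerically from the variance formula ν/(ν − 2). The helper name `t_variance` and the chosen ν values are illustrative.

```python
def t_variance(nu):
    """Variance of Student's t-distribution with nu degrees of freedom."""
    if nu <= 2:
        raise ValueError("the variance is finite only for nu > 2")
    return nu / (nu - 2)

nus = (3, 5, 10, 30, 100)
variances = [t_variance(nu) for nu in nus]
for nu, v in zip(nus, variances):
    print(f"nu = {nu:>3}: variance = {v:.4f}")
```

Every value exceeds 1 and the sequence falls toward 1 as ν grows, which is why the t-distribution has thicker tails than Z for small samples but becomes practically indistinguishable from it for large ones.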

The confidence interval to estimate μ becomes:
$$\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}$$
Where: $\bar{X}$ = sample mean
α = 1 − C
ν = n − 1 (degrees of freedom)
s = sample standard deviation
n = sample size
μ = unknown population mean
Steps:
i. Calculate the degrees of freedom (ν = n − 1) and the sample standard error of the mean.
ii. Compute α/2.
iii. Look up $t_{\alpha/2,\nu}$.
iv. Construct the confidence interval.
v. Interpret the results.
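A minimal sketch of the t-interval steps follows. Python's standard library has no t quantile function, so this version takes $t_{\alpha/2,\nu}$ as an argument read from a t table (if SciPy is available, `scipy.stats.t.ppf(1 - alpha/2, df)` returns the same quantile). The function name `t_interval` is hypothetical; it is checked against the two worked examples that follow.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """CI for mu with sigma unknown; t_crit = t_{alpha/2, n-1} taken from a t table."""
    se = s / math.sqrt(n)          # sample standard error of the mean
    margin = t_crit * se
    return x_bar - margin, x_bar + margin

# Example 1: n = 27, 98% confidence, nu = 26 -> t_{0.01,26} = 2.479
low, high = t_interval(128.4, 20.6, 27, t_crit=2.479)
# Example 2 (cab fares): n = 20, 90% confidence, nu = 19 -> t_{0.05,19} = 1.729
cab_low, cab_high = t_interval(2.50, 0.50, 20, t_crit=1.729)

print(f"{low:.2f} <= mu <= {high:.2f}")
print(f"{cab_low:.2f} <= mu <= {cab_high:.2f}")
```

Because nothing is rounded along the way, the first interval prints as 118.57 to 138.23, which can differ from a hand calculation in the second decimal place.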

Example:
1. A random sample of 27 items produces $\bar{x}$ = 128.4 and s = 20.6. What is the 98% confidence interval for μ? Assume that x is normally distributed for the population. What is the point estimate?
Solution:

The point estimate of the population mean is the sample mean; in this case, 128.4 is the point estimate.

n = 27, $\bar{X}$ = 128.4, s = 20.6, C = 0.98
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{20.6}{\sqrt{27}} = 3.96$; ν = n − 1 = 27 − 1 = 26
ii. α = 1 − C = 1 − 0.98 = 0.02; α/2 = 0.02/2 = 0.01
iii. $t_{\alpha/2,\nu} = t_{0.01,26} = 2.479$
iv. $\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} = 128.4 \pm 2.479(3.96) = 128.4 \pm 9.82$
$118.58 \le \mu \le 138.22$
We state with 98% confidence that the population mean lies between 118.58 and 138.22.

2. A sample of 20 cab fares in Bahir Dar city shows a sample mean of Br 2.50 and a sample
standard deviation of Br. 0.50. Develop a 90% confidence interval estimate of the mean cab
fares in Bahir Dar city. Assume the population of cab fares has a normal distribution.

Solution:
n = 20, $\bar{X}$ = Birr 2.50, s = Birr 0.50, C = 0.90
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{0.5}{\sqrt{20}} = 0.112$; ν = n − 1 = 20 − 1 = 19
ii. α = 1 − C = 1 − 0.90 = 0.10; α/2 = 0.10/2 = 0.05
iii. $t_{\alpha/2,\nu} = t_{0.05,19} = 1.729$
iv. $\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} = 2.50 \pm 1.729(0.112) = 2.50 \pm 0.194$
$2.31 \le \mu \le 2.69$
We state with 90% confidence that the mean of cab fares in Bahir Dar city lies between Birr 2.31
and 2.69.

3. Sales personnel for X Company are required to submit weekly reports listing customer contacts
made during the week. A sample of 61 weekly contact reports showed a mean of 22.4 customer
contacts per week for the sales personnel. The sample standard deviation was 5 contacts.

a. Develop a 95% confidence interval estimate for the mean number of weekly customer
contacts for the population of sales personnel.
b. Assume that the population of weekly contact data has a normal distribution. Use the t
distribution to develop a 95% confidence interval for the mean number of weekly
customer contacts.
c. Compare your answers for parts (a) and (b). What do you conclude from your results?

Solution:
a.
n = 61 weekly contact reports [5], $\bar{X}$ = 22.4 contacts, s = 5 contacts, C = 0.95
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{61}} = 0.64$
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}} = 22.4 \pm 1.96(0.64) = 22.4 \pm 1.25$
$21.15 \le \mu \le 23.65$

We state with 95% confidence that the mean number of weekly contacts lies between 21.15 and 23.65 contacts.

[5] Since the sample size is large, we use the Z-distribution to construct the confidence interval.

b.
n = 61 weekly contact reports, $\bar{X}$ = 22.4 contacts, s = 5 contacts, C = 0.95
i. $S_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{61}} = 0.64$; ν = n − 1 = 61 − 1 = 60
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $t_{\alpha/2,\nu} = t_{0.025,60} = 2.00$
iv. $\mu = \bar{X} \pm t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} = 22.4 \pm 2.00(0.64) = 22.4 \pm 1.28$
$21.12 \le \mu \le 23.68$

We state with 95% confidence that the mean number of weekly contacts lies between 21.12 and 23.68 contacts.

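c. The two intervals are almost identical (21.15 to 23.65 with z versus 21.12 to 23.68 with t). As the sample size increases, the t-distribution approaches the standard normal (z) distribution, so $t_{\alpha/2,\nu}$ approaches $Z_{\alpha/2}$ and the two methods give nearly equal results.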
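The convergence in part (c) is visible in the t table itself. The sketch below hard-codes a few standard two-tailed 95% critical values (upper 0.025 points) copied from a t table, rather than computing them, and compares them with Z = 1.96 and with the two margins of error from parts (a) and (b).

```python
# Two-tailed 95% critical values (upper 0.025 point) copied from standard tables.
Z_0025 = 1.960
T_0025 = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}  # keyed by nu

for nu, t in T_0025.items():
    print(f"nu = {nu:>3}: t = {t:.3f}  (t - z = {t - Z_0025:.3f})")

# Margins of error in the sales-contact example, where S_xbar = 0.64 and nu = 60
se = 0.64
z_margin = round(Z_0025 * se, 2)
t_margin = round(T_0025[60] * se, 2)
print(z_margin, t_margin)
```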

2.3.2. Interval Estimation of the Population Proportion

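We know that the sample proportion, $\bar{p}$, is an unbiased estimator of the population proportion P, and if the sample size is large, the sampling distribution of $\bar{p}$ is approximately normal with
$$Z = \frac{\bar{p} - P}{\sigma_{\bar{p}}} = \frac{\bar{p} - P}{\sqrt{\frac{PQ}{n}}}, \qquad Q = 1 - P$$
However, P is unknown here and we want to estimate it by $\bar{p}$; hence $\sigma_{\bar{p}}$ is replaced by $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and Z becomes
$$Z = \frac{\bar{p} - P}{\sqrt{\frac{\bar{p}\bar{q}}{n}}}$$
Solving for P results in $P = \bar{p} - Z\sqrt{\frac{\bar{p}\bar{q}}{n}}$, and since Z can assume both positive and negative values, it becomes $P = \bar{p} \pm Z\sqrt{\frac{\bar{p}\bar{q}}{n}}$.

Since Z represents the confidence level, we write it as
$$P = \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}} = \bar{p} \pm Z_{\alpha/2}S_{\bar{p}}$$
Where: $\bar{p}$ = sample proportion
$\bar{q} = 1 - \bar{p}$
α = 1 − C
n = sample size
P = unknown population proportion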
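The proportion interval has the same structure as the interval for μ, with $S_{\bar{p}} = \sqrt{\bar{p}\bar{q}/n}$ as the standard error. A minimal sketch, with the hypothetical name `proportion_interval` and the exact normal quantile from `statistics.NormalDist`, checked against the telemarketing example that follows:

```python
import math
from statistics import NormalDist

def proportion_interval(p_bar, n, confidence):
    """Large-sample CI for a population proportion P."""
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar / n)                       # S_pbar
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # Z_{alpha/2}
    return p_bar - z * se, p_bar + z * se

low, high = proportion_interval(0.39, 87, 0.95)
print(f"{low:.4f} <= P <= {high:.4f}")
```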
Example:
1. Recently, a study of 87 randomly selected companies with telemarketing operations was completed. The study revealed that 39% of the sampled companies had used telemarketing to assist them in order processing. Using this information, estimate the population proportion of telemarketing companies that use their telemarketing operation to assist them in order processing, taking a 95% confidence level.
Solution:
n = 87, $\bar{p}$ = 0.39, $\bar{q}$ = 0.61, C = 0.95

i. $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.39 \times 0.61}{87}} = 0.0523$
ii. α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $P = \bar{p} \pm Z_{\alpha/2}S_{\bar{p}} = 0.39 \pm 1.96(0.0523) = 0.39 \pm 0.1025$

$0.2875 \le P \le 0.4925$
We state with 95% confidence that the proportion of companies which use telemarketing to assist order processing lies between 0.2875 and 0.4925.

2. A fast food restaurant took a random sample of 400 customers to determine the proportion of
customers who are female. A confidence interval of .73 to .87 was reported.

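a. Find the number of females and the sample proportion.
b. Find the level of confidence of this interval.
Solution:
a.
n = 400; $0.73 \le P \le 0.87$; $\bar{p}$ = ?; number of females = ?
Point estimate $= \frac{0.73 + 0.87}{2} = 0.80$
Or, using the two limits of the interval:
$0.73 = \bar{p} - Z_{\alpha/2}S_{\bar{p}}$
$0.87 = \bar{p} + Z_{\alpha/2}S_{\bar{p}}$
Adding the two equations gives $1.60 = 2\bar{p}$, so $\bar{p} = 0.8$.
Number of females (X) = $n\bar{p}$ = 400 × 0.8 = 320
b.
$P = \bar{p} \pm Z_{\alpha/2}S_{\bar{p}}$
$0.87 = 0.8 + Z_{\alpha/2}S_{\bar{p}}$
$0.07 = Z_{\alpha/2}\sqrt{\frac{0.8 \times 0.2}{400}}$
$0.07 = Z_{\alpha/2} \times 0.02$
$Z_{\alpha/2} = 3.50$
$P(0 \le Z \le 3.5) = 0.49977$
C = 2 × 0.49977 = 0.99954 = 99.954%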
3. A random sample of 400 faculty members at AAU contained 120 people who believed that the
University should improve its library service. On the basis of this sample information, an analyst
calculated the confidence interval (.25, .35) for the population proportion of faculty members
favoring improvement. What is the level of confidence of this interval?
Solution:
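n = 400, X = 120, $\bar{p}$ = 120/400 = 0.30, interval estimate $0.25 \le P \le 0.35$, C = ?
$P = \bar{p} \pm Z_{\alpha/2}S_{\bar{p}}$
$0.25 = 0.30 - Z_{\alpha/2}S_{\bar{p}}$
$0.05 = Z_{\alpha/2}\sqrt{\frac{0.7 \times 0.3}{400}}$
$0.05 = Z_{\alpha/2} \times 0.023$
$Z_{\alpha/2} = 2.17$
$P(0 \le Z \le 2.17) = 0.485$
C = 2 × 0.485 = 0.97 = 97%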
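Both "find the confidence level" problems follow the same recipe: take the centre as $\bar{p}$, divide the half-width by $S_{\bar{p}}$ to get $Z_{\alpha/2}$, then convert Z into a central area of the normal curve. A sketch with the hypothetical helper name `confidence_from_interval`, using `statistics.NormalDist().cdf` in place of the normal table:

```python
import math
from statistics import NormalDist

def confidence_from_interval(lower, upper, n):
    """Recover C for a proportion interval p_bar +/- Z * sqrt(p_bar * q_bar / n)."""
    p_bar = (lower + upper) / 2                  # centre = sample proportion
    se = math.sqrt(p_bar * (1 - p_bar) / n)      # S_pbar
    z = (upper - lower) / 2 / se                 # Z_{alpha/2}
    return 2 * NormalDist().cdf(z) - 1           # C = P(-z <= Z <= z)

c_restaurant = confidence_from_interval(0.73, 0.87, 400)
c_library = confidence_from_interval(0.25, 0.35, 400)
print(round(c_restaurant, 5), round(c_library, 3))
```

The second result comes out near 0.971 rather than exactly 0.97 because the hand solution rounds $S_{\bar{p}}$ to 0.023 and Z to 2.17.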

2.4. Interval Estimation of the Difference between two independent Means
It is clear that the unbiased point estimate of the difference between the means of two populations, $(\mu_1 - \mu_2)$, is the difference between the two sample means, $(\bar{x}_1 - \bar{x}_2)$, where each sample is a random sample taken from the respective target population. The confidence interval is constructed by adding to and subtracting from this point estimate the standard error of the difference between means, multiplied by the Z value for the desired confidence level.

Interval Estimation of μ1 − μ2: populations normal, σ known

If the two parent populations are normal, then the sampling distribution of the difference between two means will be normally distributed regardless of the sample sizes. We can then estimate $\mu_1 - \mu_2$ (regardless of $n_1$ and $n_2$) using the following formula, given that $\sigma_1$ and $\sigma_2$ are known [6]:
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\,\sigma_{\bar{X}_1-\bar{X}_2}$$
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When $\sigma_1$ and $\sigma_2$ are not known, the standard error of the difference between two sample means, $\sigma_{\bar{X}_1-\bar{X}_2}$, is estimated by the sample standard error of the difference between two sample means,
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{S_{\bar{X}_1}^2 + S_{\bar{X}_2}^2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},$$
and the interval estimate takes the following form, given that the sample sizes are large:
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\,S_{\bar{X}_1-\bar{X}_2}$$
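The large-sample interval for $\mu_1 - \mu_2$ can be sketched in Python as follows. The function name `diff_means_interval` is hypothetical, and the exact normal quantile replaces the table value 1.96. It is checked against the two worked examples that follow; for the hen example the text gives variances of 4, so s = 2 is passed.

```python
import math
from statistics import NormalDist

def diff_means_interval(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample CI for mu1 - mu2 from two independent samples."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)            # S_{xbar1 - xbar2}
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    diff = x1 - x2                                      # point estimate
    return diff - z * se, diff + z * se

salary = diff_means_interval(20_600, 3_000, 100, 19_700, 2_500, 100, 0.95)
eggs = diff_means_interval(15.2, 2, 100, 14, 2, 100, 0.95)
print([round(v, 2) for v in salary])
print([round(v, 4) for v in eggs])
```

An interval whose lower bound is above zero, as in both examples, points to $\mu_1 > \mu_2$.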

Example:
1. In a sex discrimination case, an employee alleged that a large corporation paid men more than women for comparable work. Let population 1 represent all male employees performing certain jobs and population 2 represent all female employees performing comparable jobs at the corporation. Independent samples are taken of $n_1 = 100$ males and $n_2 = 100$ females; the sample means are $\bar{x}_1$ = Birr 20,600 and $\bar{x}_2$ = Birr 19,700, and the sample standard deviations are $s_1$ = Birr 3,000 and $s_2$ = Birr 2,500. Construct a 95% confidence interval for $\mu_1 - \mu_2$. What do you conclude from this?
Solution:
Male employees: $n_1$ = 100, $\bar{x}_1$ = Birr 20,600, $s_1$ = Birr 3,000
Female employees: $n_2$ = 100, $\bar{x}_2$ = Birr 19,700, $s_2$ = Birr 2,500
C = 0.95
Steps:
i. Calculate the (sample) standard error of the difference between two means:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{(3{,}000)^2}{100} + \frac{(2{,}500)^2}{100}} = \sqrt{152{,}500} = 390.51$$
ii. Compute α/2: α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval:
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}S_{\bar{X}_1-\bar{X}_2} = (20{,}600 - 19{,}700) \pm 1.96(390.51) = 900 \pm 765.40$$
$134.60 \le \mu_1 - \mu_2 \le 1{,}665.40$

We state with 95% confidence that the mean salary difference between the male and female workers lies between Birr 134.60 and Birr 1,665.40.

Because this interval contains only positive values, we can be quite confident that $\mu_1 - \mu_2 > 0$. Thus, it is reasonable to assume that the mean salary for males exceeds the mean salary for females.

[6] This formula also works for problems that involve large sample sizes ($n_1, n_2 \ge 30$) even though the parent populations may not be normally distributed.

2. A farmer wants to determine whether different types of feed can influence the mean number of eggs that hens lay per month. In a random sample of 100 hens that ate feed 1, the average number of eggs per month was $\bar{x}_1 = 15.2$ with variance 4. In a random sample of 100 hens that ate feed 2, the average number of eggs per month was $\bar{x}_2 = 14$ with variance 4. Construct a 95% confidence interval for $\mu_1 - \mu_2$. What do you conclude?
Solution:
Feed 1: $n_1$ = 100 hens, $\bar{x}_1$ = 15.2 eggs, $s_1^2$ = 4
Feed 2: $n_2$ = 100 hens, $\bar{x}_2$ = 14 eggs, $s_2^2$ = 4
C = 0.95
Steps:
i. Calculate the (sample) standard error of the difference between two means:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{4}{100} + \frac{4}{100}} = \sqrt{0.08} = 0.283$$
ii. Compute α/2: α = 1 − C = 1 − 0.95 = 0.05; α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval:
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}S_{\bar{X}_1-\bar{X}_2} = (15.2 - 14) \pm 1.96(0.283) = 1.2 \pm 0.5547$$
$0.6453 \le \mu_1 - \mu_2 \le 1.7547$

We state with 95% confidence that the difference between the mean numbers of eggs laid per month by hens fed the two types of feed lies between 0.6453 and 1.7547 eggs.

Since the interval contains only positive values, hens that ate feed type 1 appear to be more productive than hens that ate feed type 2.

2.5. Determination of Sample Size

The reason for taking a sample from a population is that it would be too costly to gather data for
the whole population. But collecting sample data also costs money; and the larger the sample,
the higher the cost. To hold cost down, we want to use as small a sample as possible. On the
other hand, we want a sample to be large enough to provide “good” approximation/estimates of
population parameters. Consequently, the question is “How large should the sample be?”

The answer depends on three factors:

1) How precise (narrow) do we want a confidence interval to be?
2) How confident do we want to be that the interval estimate is correct?
3) How variable is the population being sampled?

2.5.1. Sample size for estimating population mean, μ

The confidence interval for μ is $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

In the above expression, $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the error of estimation (e), that is, the difference between $\bar{x}$ and μ which results from the sampling process. So
$$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Squaring both sides results in $e^2 = Z_{\alpha/2}^2\frac{\sigma^2}{n}$. Solving for n results in
$$n = \frac{Z_{\alpha/2}^2\sigma^2}{e^2} = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2$$
Example:
1. A gasoline service station shows a standard deviation of Birr 6.25 for the charges made by its credit card customers. Assume that the station’s management would like to estimate the population mean gasoline bill for its credit card customers to within ± Birr 1.00. For a 95% confidence level, how large a sample would be necessary?
Solution:
e = Birr 1.00, σ = Birr 6.25, C = 0.95, $Z_{\alpha/2} = Z_{0.025} = 1.96$
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{1.96 \times 6.25}{1}\right)^2 = 150.06 \approx 151$$
(rounded up to the next integer [7])

2. The National Travel and Tour Organization (NTO) would like to estimate the mean amount of money spent by a tourist to within Birr 100 with 95% confidence. If the amount of money spent by tourists is considered to be normally distributed with a standard deviation of Br 200, what sample size would be necessary for the NTO to meet their objective in estimating this mean amount?
Solution:
e = Birr 100, σ = Birr 200, C = 0.95, $Z_{\alpha/2} = Z_{0.025} = 1.96$
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{1.96 \times 200}{100}\right)^2 = 15.37 \approx 16$$

If the population standard deviation, σ, is unknown, we have to make an educated guess or take a pilot sample and estimate it.
- A rough approximation is $\sigma \approx \frac{H - L}{4}$, i.e., σ ≈ ¼ of the range, where H and L are the highest and lowest values, because about 95.4% of a normal population falls within μ ± 2σ (a span of 4σ).
Relationship between the error term and sample size
Reducing the error term of an interval estimate to 1/a of its original amount, while holding the confidence level constant, requires a sample size of $a^2$ times the original sample size.
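Both rules of thumb above can be checked with a few lines of arithmetic. In this sketch the highest and lowest values (90 and 10), the error e = 5 and the helper names are hypothetical planning inputs, not figures from the text; the ratio is computed on the unrounded n so that it shows the exact a² factor.

```python
import math

def sigma_from_range(high, low):
    """Rough planning value for sigma: about 4 standard deviations span the range."""
    return (high - low) / 4

def required_n(z, sigma, e):
    """Unrounded n = (Z * sigma / e)^2; round up only when reporting."""
    return (z * sigma / e) ** 2

sigma_guess = sigma_from_range(high=90, low=10)     # hypothetical extreme values
n_original = required_n(1.96, sigma_guess, 5.0)     # hypothetical error e = 5
n_third = required_n(1.96, sigma_guess, 5.0 / 3)    # error cut to 1/3 (a = 3)

print(sigma_guess, math.ceil(n_original), math.ceil(n_third))
print(round(n_third / n_original, 6))               # a^2 = 9
```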

[7] If a procedure for determining sample size produces a non-integer value, always round up to the next larger integer.

2.5.2. Sample size for estimating population proportion, p

The confidence interval for P is $P = \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$. The expression $Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$ is called the error term (e). That is,
$$e = Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$$
Squaring both sides,
$$e^2 = Z_{\alpha/2}^2\frac{\bar{p}\bar{q}}{n}$$
and solving for n,
$$n_p = \frac{Z_{\alpha/2}^2\,\bar{p}\bar{q}}{e^2}$$
Since we are trying to determine n, we cannot yet have $\bar{p}$ and $\bar{q}$, which come from the sample. Instead, we use planning values p and q = 1 − p, so it becomes
$$n_p = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq$$
Example
1. Suppose that a production facility purchases a particular component part in large lots from a supplier. The production manager wants to estimate the proportion of defective parts received from this supplier. She believes that the proportion of defects is no more than 0.2 and wants to be within 0.02 of the true proportion of defects with a 90% level of confidence. How large a sample should she take?
Solution:
e = 0.02, p = 0.2, q = 0.8, C = 0.90, $Z_{\alpha/2} = Z_{0.05} = 1.64$

$$n_p = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq = \left(\frac{1.64}{0.02}\right)^2 (0.2)(0.8) = 1075.84 \approx 1076$$

2. What is the largest sample size that would be needed to estimate a population proportion to
within ±0.02 with a confidence coefficient of 0.95?
Solution:
e = 0.02, C = 0.95, $Z_{\alpha/2} = Z_{0.025} = 1.96$

The largest sample size would be obtained when p = 0.5. So,

$$n_p = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq = \left(\frac{1.96}{0.02}\right)^2 (0.5)(0.5) = 2401$$
If p is unknown and there is no way to estimate it, use 0.5 as the value of p. Because
pq = p(1 − p) reaches its maximum of 0.25 at p = 0.5, this choice generates the largest possible
sample size compared with any other value.
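The claim that p = 0.5 is the conservative choice can be checked by scanning p(1 − p) over a grid of values. This is an illustrative sketch; the grid step of 0.01 and the comparison value p = 0.3 are arbitrary assumptions, and `sample_size_proportion` is a hypothetical helper.

```python
import math

def sample_size_proportion(z: float, p: float, e: float) -> int:
    """n_p = (z / e) ** 2 * p * (1 - p), rounded up to the next integer."""
    n = (z / e) ** 2 * p * (1 - p)
    return math.ceil(round(n, 6))

# p * q = p * (1 - p) peaks at p = 0.5, where it equals 0.25.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: p * (1 - p))
print(best_p)                                   # 0.5
print(sample_size_proportion(1.96, 0.5, 0.02))  # 2401
print(sample_size_proportion(1.96, 0.3, 0.02))  # 2017, smaller than 2401
```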

2.4.3. Determining Sample Size When Estimating μ₁ − μ₂

When taking two random samples and using the difference in sample means to estimate the
difference in population means, a researcher should have an idea of how large the sample sizes
need to be. Solving for n from the formula

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

does not look promising, because the equation has nine variables, including two different values
of n. However, making some assumptions can generate a workable sample size formula.
1. The variances of the two populations are the same: σ₁² = σ₂² = σ².

2. The sample size for each sample is the same: n₁ = n₂ = n.

 
The difference between (x̄₁ − x̄₂) and (μ₁ − μ₂) is the error of estimation, that is, $e = (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)$.

Incorporating these assumptions into the z-formula yields

$$Z = \frac{e}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}}} = \frac{e}{\sqrt{\frac{2\sigma^2}{n}}}$$

Solving for n produces the sample size:

$$n = \frac{2\,Z_{\alpha/2}^2\,\sigma^2}{e^2} = 2\left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2$$
The above formula shows that the sample size required for each of the two samples is twice the
sample size required to estimate a single population mean with the same σ, e, and confidence
level. Since larger samples cost more, sample size formulas are effective aids in ensuring that a
research project's goals are met while the cost of sampling is minimized.
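A short sketch of the two-sample formula follows, under the same two assumptions (equal variances, equal sample sizes). The planning values z = 1.96, σ = 10 and e = 2 are hypothetical, as are the helper names. It confirms that the unrounded per-sample n is exactly twice the single-mean n, and that with the rounded-up n the error bound $Z_{\alpha/2}\sqrt{2\sigma^2/n}$ does not exceed e.

```python
import math

def n_one_mean(z: float, sigma: float, e: float) -> float:
    """Unrounded n for a single mean: (z * sigma / e) ** 2."""
    return (z * sigma / e) ** 2

def n_two_means(z: float, sigma: float, e: float) -> float:
    """Unrounded per-sample n for mu1 - mu2, assuming sigma1 = sigma2 = sigma
    and n1 = n2 = n: 2 * (z * sigma / e) ** 2."""
    return 2 * (z * sigma / e) ** 2

# Hypothetical planning values (not from the text).
z, sigma, e = 1.96, 10.0, 2.0
n = math.ceil(round(n_two_means(z, sigma, e), 6))            # 192.08 -> 193 per sample
ratio = n_two_means(z, sigma, e) / n_one_mean(z, sigma, e)   # 2.0
# With n observations in each sample, the error bound is at most e.
achieved_e = z * math.sqrt(2 * sigma ** 2 / n)
print(n, ratio, achieved_e <= e)  # 193 2.0 True
```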

Example:
1. A college admissions officer wants to estimate the difference in the average GMAT scores of
men and women. She plans to take a random sample of men and women who have taken the
GMAT at the same time. She wants to be within 10 points of the true difference in the mean
scores of men and women and 95% confident of her results. Past GMAT test results indicate that
the standard deviation of GMAT test scores is about 105 points. How large should the sample sizes be?
Solution:
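e = 10 points, σ = 105 points, C = 0.95, $Z_{\alpha/2} = Z_{0.025} = 1.96$, n = ?

$$n = 2\left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = 2\left(\frac{1.96 \times 105}{10}\right)^2 = 2(423.54) = 847.07 \approx 848$$

So she should sample 848 men and 848 women.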

2. A researcher wants to estimate the difference between the average price of a 21-inch black-and-white
TV set and the average price of a 21-inch color TV set. He believes that the standard
deviation of the price of a 21-inch TV set is about Birr 100. He wants to be 99% confident of his
results and within Birr 20 of the true difference. How large a sample should he take for each
type of television set?
Solution:
e = Birr 20, σ = Birr 100, C = 0.99, $Z_{\alpha/2} = Z_{0.005} = 2.57$, n = ?

$$n = 2\left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = 2\left(\frac{2.57 \times 100}{20}\right)^2 = 2(165.12) = 330.24 \approx 331$$
