Unit 4: Statistical Estimation and Small Sampling Theories
UNIT 4
OBJECTIVES
GENERAL OBJECTIVE
SPECIFIC OBJECTIVE
INPUT
Since the populations from which these values were obtained are large, these
values are only estimates of the true parameters and are derived from data
collected from samples.
The statistical procedure for estimating the population mean, variance and
standard deviation will be explained in this module.
An important question in estimation is that of sample size. How large should the
sample be in order to make an accurate estimate?
STATISTICAL ESTIMATION AND SMALL SAMPLING THEORIES C5606/4/ 3
A point estimate is a specific numerical value estimate of a parameter. The best
point estimate of the population mean $\mu$ is the sample mean $\bar{X}$.
Sample measures (i.e., statistics) are used to estimate population measures (i.e.,
parameters). The sample mean is the best estimate of the population mean
because the means of samples vary less than other statistics such as medians
and modes when many samples are selected from the same population.
A good estimator should have the following three properties:
1. The estimator should be an unbiased estimator. That is, the expected value
or the mean of the estimates obtained from samples of a given size is equal
to the parameter being estimated.
2. The estimator should be consistent. For a consistent estimator, as sample
size increases, the value of the estimator approaches the value of the
parameter estimated.
3. The estimator should be a relatively efficient estimator. That is, of all the
statistics that can be used to estimate a parameter, the relatively efficient
estimator has the smallest variance.
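The unbiasedness and relative-efficiency properties can be seen in a small simulation. The sketch below assumes a normal population with $\mu = 0$ and $\sigma = 1$, samples of size 25, and a fixed random seed. These settings are illustrative choices, not values from the module. Over many samples, the average of the sample means sits close to $\mu$, and the sample means vary less than the sample medians.

```python
import random
import statistics

def compare_estimators(n=25, trials=2000, seed=1):
    """Draw many samples of size n from a standard normal population (mu = 0)
    and summarise how the sample mean and the sample median behave."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "average_of_means": statistics.fmean(means),      # close to mu = 0 (unbiased)
        "variance_of_means": statistics.variance(means),   # close to 1/n = 0.04
        "variance_of_medians": statistics.variance(medians),
    }

result = compare_estimators()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this setting the variance of the medians comes out noticeably larger than the variance of the means. This is the sense in which the mean is the relatively efficient estimator.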
As stated in the previous module, the sample mean will, for the most part, be
somewhat different from the population mean due to sampling error. How good,
then, is a point estimate? Because the accuracy of a point estimate is
questionable, statisticians use another type of estimate called an interval
estimate.
An interval estimate of a parameter is an interval or a range of values used to
estimate the parameter. This estimate may or may not contain the value of the
parameter being estimated.
For example, an interval estimate for the average age of all the students might be
$26.9 < \mu < 27.7$, or $27.3 \pm 0.4$ years.
Either the interval contains the parameter or it does not. A degree of confidence
(usually expressed as a percentage) can be assigned before an interval estimate
is made. For instance, one may wish to be 95% confident that the interval
contains the true population mean. Another question then arises: why 95%? Why
not 99% or 99.5%?
If one desires to be more confident (99% or 99.5%), then the interval must be
larger. For example, a 99% confidence interval for the mean age of the Poly
students might be $26.7 < \mu < 27.9$, or $27.3 \pm 0.6$ years. Hence, a
trade-off occurs: to be more confident that the interval contains the true
population mean, one must make the interval wider.
The central limit theorem states that when the sample size is large,
approximately 95% of the sample means will fall within $\pm 1.96$ standard errors
of the population mean. That is,
$$\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$$
Now, if a specific sample mean is selected, say $\bar{X}$, there is a 95%
probability that it falls within the range $\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$.
Likewise, there is a 95% probability that the interval specified by
$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$ will contain $\mu$. Stated another way,
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
Hence, one can be 95% confident that the population mean is contained within
that interval when the values of the variable are normally distributed in the
population.
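The 95% claim can be checked by simulation. This sketch assumes a normal population with a hypothetical $\mu = 27.3$ years and $\sigma = 2$ years. These are illustrative values, not data from the module. It draws repeated samples of 50 and counts how often $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ contains $\mu$.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, z=1.96, trials=4000, seed=7):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu,
    for samples of size n drawn from a normal population."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 27.3 years, sigma = 2 years; samples of 50
print(coverage(27.3, 2.0, 50))
print(coverage(27.3, 2.0, 50, z=2.58))  # wider intervals, higher coverage
```

The first fraction should come out close to 0.95. With the same seed, the wider 2.58 intervals can only contain $\mu$ at least as often, which illustrates the trade-off between confidence and width.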
Since other confidence intervals are used in statistics, the symbol $z_{\alpha/2}$ is
used in the general formula for confidence intervals. The Greek letter $\alpha$
(alpha) represents the total area in both tails of the standard normal
distribution curve, and $\alpha/2$ represents the area in each one of the tails.
The relationship between $\alpha$ and the confidence level is that the stated
confidence level is the percentage equivalent of the decimal value of $1 - \alpha$,
and vice versa. When the 95% confidence interval is to be found, $\alpha = 0.05$,
since $1 - 0.05 = 0.95$. When $\alpha = 0.01$, $1 - \alpha = 1 - 0.01 = 0.99$, and
the 99% confidence interval is being calculated.
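Instead of reading the table, $z_{\alpha/2}$ can be computed from $\alpha$ with Python's standard-library `statistics.NormalDist`. In this minimal sketch, the exact values for 90%, 95% and 99% are about 1.645, 1.960 and 2.576. The table rounds these to 1.65, 1.96 and 2.58.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided confidence level such as 0.95."""
    alpha = 1 - confidence                        # total area in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)    # leaves alpha/2 in the right tail

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```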
Formula for the Confidence Interval of the Mean for a Specific $\alpha$
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
For a 95% confidence interval, $z_{\alpha/2} = 1.96$; and for a 99% confidence
interval, $z_{\alpha/2} = 2.58$.
The term $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the maximum error of estimate.
For a specific value, say $\alpha = 0.05$, 95% of the sample means will fall within
this error value on either side of the population mean.
The maximum error of estimate is the maximum likely difference between the
point estimate of a parameter and the actual value of the parameter.
Example 4.1
1. One of the Polytechnic directors wishes to estimate the average age of the
students currently enrolled. From last year's records, the standard deviation
is known to be 2 years. A sample of 50 students is selected, and their mean
age is 23.2 years. Find the 95% confidence interval of the population mean.
2. A well-known tonic drink is known to increase the pulse rate of its users. The
standard deviation of the pulse rate is known to be 5 beats per minute. A
sample of 30 users had an average pulse rate of 104 beats per minute. Find
the 99% confidence interval of the true mean.
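1. Since the 95% confidence interval is desired, $z_{\alpha/2} = 1.96$. Hence,
substituting in the formula
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
gives
$$23.2 - 1.96\frac{2}{\sqrt{50}} < \mu < 23.2 + 1.96\frac{2}{\sqrt{50}}$$
$$23.2 - 0.6 < \mu < 23.2 + 0.6$$
$$22.6 < \mu < 23.8$$
The director can say, with 95% confidence, that the average age of the
students is between 22.6 and 23.8 years, based on 50 students.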
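2. Since the 99% confidence interval is desired, $z_{\alpha/2} = 2.58$. Hence,
$$\bar{X} - 2.58\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}$$
$$104 - 2.58\frac{5}{\sqrt{30}} < \mu < 104 + 2.58\frac{5}{\sqrt{30}}$$
$$104 - 2.4 < \mu < 104 + 2.4$$
$$101.6 < \mu < 106.4$$
$$102 < \mu < 106, \text{ or } 104 \pm 2$$
One can be 99% confident that the mean pulse rate of all users is
between 102 and 106 beats per minute, based on a sample of 30 users.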
3.
STEP 1 It is known that the mean ($\bar{X}$) is 11.091 and the standard
deviation ($s$) is 14.405.
STEP 2 Find $\alpha/2$. Since the 90% confidence interval is to be used,
$\alpha = 1 - 0.90 = 0.10$, and $\alpha/2 = 0.10/2 = 0.05$.
STEP 3 Find $z_{\alpha/2}$. Subtract 0.05 from 0.5000 to get 0.4500. The
corresponding $z$ value from the table is 1.65.
STEP 4 Substitute in the formula
$$\bar{X} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
($s$ is used in place of $\sigma$ when $\sigma$ is unknown, since $n \ge 30$.)
$$11.091 - 1.65\frac{14.405}{\sqrt{30}} < \mu < 11.091 + 1.65\frac{14.405}{\sqrt{30}}$$
$$6.752 < \mu < 15.430$$
Hence, one can be 90% confident that the population mean of the assets
is between RM6.752 million and RM15.430 million, based on a sample of 30
koperasi.
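The three worked solutions above can be checked with a small helper. In this sketch, `z_interval` is a hypothetical name. The critical values are passed in exactly as read from the table (1.96, 2.58, 1.65), so the results match the hand calculations before rounding.

```python
import math

def z_interval(xbar, sd, n, z):
    """Return (lower, upper) for xbar +/- z * sd / sqrt(n).

    sd is sigma when it is known, or s when sigma is unknown and n >= 30.
    """
    error = z * sd / math.sqrt(n)   # maximum error of estimate
    return xbar - error, xbar + error

examples = {
    "1 (ages, 95%)": (23.2, 2, 50, 1.96),
    "2 (pulse, 99%)": (104, 5, 30, 2.58),
    "3 (assets, 90%)": (11.091, 14.405, 30, 1.65),
}
for label, args in examples.items():
    lo, hi = z_interval(*args)
    print(f"Solution {label}: {lo:.3f} < mu < {hi:.3f}")
```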
4. (a) For the population, $\sigma = 0.025$ mm; for the sample, $n = 16$ and
$\bar{X} = 0.314$ mm. Because an infinite number of measurements can be
obtained for the diameter of the wire, the population is infinite, and the
estimated confidence interval of the population mean is given by
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 0.314 \pm 1.28\frac{0.025}{\sqrt{16}} = 0.314 \pm 0.008 \text{ mm}$$
$$0.314 \pm 0.01 \text{ mm} = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \text{ i.e. } z_{\alpha/2} = \frac{0.01\sqrt{16}}{0.025} = 1.6$$
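Part 4 runs the maximum-error formula in both directions. It goes forward to get the error for $z_{\alpha/2} = 1.28$, and backward to find the $z_{\alpha/2}$ that gives an error of 0.01 mm. The sketch below uses hypothetical function names. It also converts $z_{\alpha/2}$ back into a two-sided confidence level with `statistics.NormalDist`. On that conversion, 1.28 corresponds to roughly 80% confidence and 1.6 to roughly 89%.

```python
import math
from statistics import NormalDist

def max_error(z, sigma, n):
    """Maximum error of estimate E = z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

def z_for_error(error, sigma, n):
    """Solve E = z * sigma / sqrt(n) for z."""
    return error * math.sqrt(n) / sigma

def two_sided_confidence(z):
    """Area under the standard normal curve between -z and +z."""
    return 2 * NormalDist().cdf(z) - 1

sigma, n = 0.025, 16   # wire-diameter data from part 4
print(round(max_error(1.28, sigma, n), 4), round(two_sided_confidence(1.28), 3))
z = z_for_error(0.01, sigma, n)
print(round(z, 2), round(two_sided_confidence(z), 3))
```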
ACTIVITY 4A
5. Find each:
a) $z_{\alpha/2}$ for the 99% confidence interval
b) Find the 99% confidence interval of the mean test scores of the
entire first semester students.
SOLUTION TO ACTIVITY 4A
2. The maximum error of estimate is the likely range of values to the right or
left of the statistic which may contain the parameter.
5. a) 2.58
b) 2.33
c) 1.96
d) 1.65
e) 1.88
6. a) $77 < \mu < 87$
b) $75 < \mu < 89$
c) The 99% confidence interval is larger because the confidence level is
larger.
INPUT
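The formula for sample size is derived from the maximum error of estimate
formula,
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and this formula is solved for $n$ as follows:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
If the result is not a whole number, it is rounded up to the next whole number.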
Example 4.2
You are asked to estimate the average age of the students in this Poly.
How large a sample is necessary if you want to be 99% confident that the
estimate is accurate to within one year? From a previous study, the standard
deviation of the ages is known to be 3 years.
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(2.58)(3)}{1}\right)^2 = 59.9$$
which is rounded up to 60. Hence, you need a sample size of at least 60
students in order to be 99% confident that the estimate is within 1 year of the
true mean age.
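The sample-size formula, including the rule of rounding up, can be sketched as a one-line function. The name `sample_size` is hypothetical. It reproduces Example 4.2, and the same call can be used to check the sample-size questions in Activity 4B.

```python
import math

def sample_size(z, sigma, error):
    """Minimum n so that the maximum error of estimate is at most `error`:
    n = (z * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / error) ** 2)

print(sample_size(2.58, 3, 1))   # Example 4.2: 99% confidence, sigma = 3, E = 1
```

`math.ceil` implements the "round up" rule. Rounding to the nearest whole number instead could give a sample that is one short of the required precision.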
ACTIVITY 4B
3. The mean weight of 84 soil samples is 61.2 grams and the standard
deviation is 7.9 grams. Find the 95% confidence interval for the true mean.
4. A poly director wishes to estimate the average number of hours his part-time
lecturers teach per week. The standard deviation from a previous study is
2.6 hours. How large a sample must be selected if he wants to be 99%
confident that the sample mean is within 1 hour of the true mean?
5. You are required to estimate the mean fresh weight of concrete cubes. How
large a sample must be selected if you are required to be 90% confident that
the true mean is within 600 grams of the sample mean? The standard
deviation of the fresh weights is known to be 800 grams.
6. A class lecturer would like to estimate the average number of sick days that
students use per year. It is assumed that the standard deviation is 2.5 days.
How large a sample must be selected if the lecturer wants to be 95%
confident of getting an interval that contains the true mean with a maximum
error of 1 day?
SOLUTION TO ACTIVITY 4B
1. (a) $11.9 < \mu < 13.3$ (b) It would be highly unlikely, since this is far larger
than 13.3.
4. 45
5. 5
6. 25
INPUT
1. It is bell-shaped.
2. It is symmetrical about the mean.
3. The mean, median, and mode are equal to 0 and are located at the
center of the distribution.
4. The curve never touches the x-axis.
Many statistical distributions use the concept of degrees of freedom, and the
formulas for finding the degrees of freedom vary for different statistical tests. The
degrees of freedom are the number of values that are free to vary after a sample
statistic has been computed, and they tell the researcher which specific curve to
use when a distribution consists of a family of curves.
For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary.
But once 4 values are selected, the 5th value must be a specific number to get a
sum of 50, since 50/5 = 10. Hence, the d.f. are 5 - 1 = 4, and this value tells the
researcher which t curve to use.
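The degrees-of-freedom argument can be made concrete. Once $n - 1$ values and the mean are fixed, the last value is determined. In this sketch the function name is hypothetical, and the four free values 8, 12, 9 and 7 are arbitrary choices for illustration.

```python
def forced_last_value(mean, free_values, n):
    """Given the sample mean and n - 1 freely chosen values, return the one
    value the last observation must take so that the mean is unchanged."""
    if len(free_values) != n - 1:
        raise ValueError("need exactly n - 1 free values")
    return n * mean - sum(free_values)   # required total minus what is already chosen

print(forced_last_value(10, [8, 12, 9, 7], 5))
```

Here the required total is 50 and the free values sum to 36, so the fifth value must be 14.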
The symbol d.f. will be used for degrees of freedom. The d.f. for a confidence
interval for the mean are found by subtracting 1 from the sample size, i.e.,
d.f. = n - 1.
Formula for a Specific Confidence Interval for the Mean When $\sigma$ Is Unknown and
$n < 30$
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
The degrees of freedom are $n - 1$.
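Decision chart: z or t?
Is $\sigma$ known?
Yes: use $z$ values no matter what the sample size is.
No: is $n \ge 30$?
  Yes: use $z$ values and $s$ in place of $\sigma$ in the formula.
  No: use $t$ values and $s$ in place of $\sigma$ in the formula.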
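The decision chart translates directly into a small function. In this sketch the name is hypothetical. It returns which critical-value table to use and which standard deviation goes into the formula. As with the t formula, it assumes the variable is approximately normally distributed when $n < 30$.

```python
def choose_distribution(sigma_known, n):
    """Follow the z-or-t decision chart.

    Returns (table, spread): table is "z" or "t"; spread says whether the
    population sigma or the sample s goes into the formula.
    """
    if sigma_known:
        return "z", "sigma"   # z values no matter what the sample size is
    if n >= 30:
        return "z", "s"       # large sample: s replaces sigma
    return "t", "s"           # small sample, sigma unknown: t with d.f. = n - 1

print(choose_distribution(False, 10))
```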
Example 4.3
1. Find the $t_{\alpha/2}$ value for a 95% confidence interval when the sample size is
22.
2. Ten randomly selected automobiles were stopped, and the tread depth of
the right front tire was measured. The mean was 0.32 mm, and the
standard deviation was 0.08 mm. Find the 95% confidence interval of the
mean depth. Assume that the variable is approximately normally
distributed.
1. d.f. = 22 - 1, or 21. Find 21 in the left column and 95% in the row labeled
“confidence intervals.” The intersection where the two meet gives the value
for $t_{\alpha/2}$, which is 2.080. See the figure below. Note: At the bottom of the
table, where d.f. = $\infty$, the $z_{\alpha/2}$ values can be found for specific
confidence intervals. The reason is that as the degrees of freedom increase, the t
distribution approaches the standard normal distribution.
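Python's standard library has no inverse for the t distribution, so the sketch below computes $t_{\alpha/2}$ numerically. It integrates the t density with Simpson's rule and uses bisection to find the point with area $\alpha/2$ in the right tail. It is an illustration of how the table values arise, not a replacement for a statistics library. If SciPy is available, `scipy.stats.t.ppf(0.975, 21)` gives the same critical value. With 21 d.f. this sketch reproduces 2.080. For very large d.f. the value approaches 1.96, as the note above says.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: the left half (0.5) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """t_{alpha/2}: the value leaving (1 - confidence) / 2 in the right tail."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains the root
        hi *= 2
    for _ in range(50):             # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 21), 3))
```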
2. Since $\sigma$ is unknown and $s$ must replace it, the t distribution (see table
‘F’) must be used for 95%. Hence, with 9 degrees of freedom, $t_{\alpha/2} = 2.262$.
The 95% confidence interval of the population mean is found by
substituting in the formula
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$0.32 - 2.262\frac{0.08}{\sqrt{10}} < \mu < 0.32 + 2.262\frac{0.08}{\sqrt{10}}$$
$$0.26 < \mu < 0.38$$
3.
STEP 1 Find the mean and standard deviation for the data.
Use the formulas or your calculator.
The mean $\bar{X} = 7041.4$
The standard deviation $s = 1610.3$
STEP 2 Find $t_{\alpha/2}$ from table ‘F’. Use the 99% confidence interval with
d.f. = 6. It is 3.707.
STEP 3 Substitute in the formula
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$7041.4 - 3.707\frac{1610.3}{\sqrt{7}} < \mu < 7041.4 + 3.707\frac{1610.3}{\sqrt{7}}$$
$$4785.2 < \mu < 9297.6$$
One can be 99% confident that the population mean of home fires started by
candles each year is between 4785.2 and 9297.6, based on a sample of home
fires occurring over a period of 7 years.
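Both t intervals in Example 4.3 follow the same pattern, so a small helper can check them. The name `t_interval` is hypothetical. The critical values 2.262 and 3.707 are the ones read from table ‘F’ in the solutions.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n), with d.f. = n - 1."""
    error = t_crit * s / math.sqrt(n)
    return xbar - error, xbar + error

# Example 4.3, solution 2: tyre tread depth, n = 10, 95%, t = 2.262 (d.f. = 9)
print(t_interval(0.32, 0.08, 10, 2.262))
# Example 4.3, solution 3: home fires, n = 7, 99%, t = 3.707 (d.f. = 6)
print(t_interval(7041.4, 1610.3, 7, 3.707))
```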
ACTIVITY 4C
For the following activities, assume that all variables are approximately
normally distributed.
5. The time taken for a chemical reaction to take place is measured 5 times
and is found to be: 0.28 hours, 0.30 hours, 0.27 hours, 0.33 hours and
0.31 hours. Determine the 95% and 99% confidence intervals for the
estimated true reaction time.
SOLUTION TO ACTIVITY 4C
1. (a) The 95% confidence limits are 2.455 cm and 2.485 cm.
(b) The 80% confidence limits are 2.463 cm and 2.477 cm.
2. It is likely that 80% of all the lamps will fail between 1171.2 and
1182.8 hours ($t_{\alpha/2} = 0.868$).
3. $4.343 < \mu < 4.417$
4. $4.324 < \mu < 4.436$
SELF ASSESSMENT 4
You are approaching success. Try all the questions in this self-
assessment section and check your answers on the next page. If you encounter
any problems, consult your instructor. Good luck.
1. When should the t distribution be used to find a confidence interval for the
mean?
6. The three confidence intervals used most often are the ____%, ______%,
and ___%.
10. The average hemoglobin for a sample of 20 lecturers was 16 grams per
100 milliliters, with a sample standard deviation of 2 grams. Find the 99%
confidence interval of the true mean.
12. A recent study of 28 city residents showed that the mean of the time they
had lived at their present address was 9.3 years. The standard deviation
of the sample was 2 years. Find the 90% confidence interval of the true
mean.
FEEDBACK TO SELF ASSESSMENT 4
3. b
4. b
6. 90; 95; 99
8. 95%
9. (a) $2.355 < \mu < 2.445$; $2.341 < \mu < 2.459$ (b) 86%