Chapter 4
Contents
Chapter 3 Estimation
UECM1224 Probability and Statistics II
Example 4.1
Suppose there are only five students in the class and the midterm scores are
70 78 80 80 95
Let X denote the score of a student. The probability distribution of the population is then
𝑥            70     78     80     95
𝑃(𝑋 = 𝑥)     0.2    0.2    0.4    0.2
Sampling distribution of $\bar{X}$
The probability distribution of $\bar{X}$ is called its sampling distribution. It lists the various values that $\bar{X}$
can assume and the probability of each value of $\bar{X}$.
Example 4.2
For the data in Example 4.1, list all possible samples of three scores that can be selected without
replacement. Calculate the sample mean $\bar{X}$ for each sample and find the sampling distribution of $\bar{X}$.
Solution
All possible samples and their means when the sample size is 3.
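The enumeration asked for in Example 4.2 can be sketched in a few lines of Python. This is only an illustration: the variable names are our own, and the two scores of 80 are assumed to belong to two different students, so they count as distinct when samples are formed.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [70, 78, 80, 80, 95]  # population from Example 4.1

# Every sample of size 3 drawn without replacement (the two 80s are distinct students).
samples = list(combinations(scores, 3))
means = [Fraction(sum(s), 3) for s in samples]

# Sampling distribution: each of the 10 samples is equally likely.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for sample, mean in zip(samples, means):
    print(sample, round(float(mean), 2))
for mean, prob in distribution.items():
    print(f"P(Xbar = {float(mean):.2f}) = {float(prob):.1f}")

# Mean of the sampling distribution, to compare with the population mean.
mean_of_xbar = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(scores), len(scores))
print(float(mean_of_xbar), float(population_mean))
```

Exact fractions are used so that equal sample means are grouped together without rounding problems.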
Sampling error is the difference between the value of the sample statistic and the value of the
corresponding population parameter.
Nonsampling error is the error that occurs in the selection, recording and tabulation of data.
Example 4.3
Reconsider the data in Example 4.1. Suppose we take a random sample of three scores from this
population and that this sample includes the scores 70, 80 and 95. Calculate the sampling error.
Now suppose that, when we select this sample, we mistakenly record the second score as 82
instead of 80. Calculate the nonsampling error.
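A minimal sketch of the arithmetic in Example 4.3, under the assumption that the nonsampling error is taken as the part of the total error that is not explained by the sampling error. The variable names are illustrative only.

```python
scores = [70, 78, 80, 80, 95]          # population from Example 4.1
mu = sum(scores) / len(scores)         # population mean

correct_sample = [70, 80, 95]
recorded_sample = [70, 82, 95]         # second score mistakenly recorded as 82

xbar_correct = sum(correct_sample) / 3
xbar_recorded = sum(recorded_sample) / 3

sampling_error = xbar_correct - mu                 # due to chance alone
total_error = xbar_recorded - mu                   # error actually observed
nonsampling_error = total_error - sampling_error   # due to the recording mistake

print(f"mu = {mu:.2f}")
print(f"sampling error = {sampling_error:.3f}")
print(f"nonsampling error = {nonsampling_error:.3f}")
```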
The mean of the sampling distribution of $\bar{X}$ is always equal to the mean of the population. Thus,
$\mu_{\bar{X}} = \mu$.
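For a sample of size n, if the sampling is done from a finite population (of size N), the standard deviation
of $\bar{X}$ is given by
$$
\sigma_{\bar{X}} =
\begin{cases}
\dfrac{\sigma}{\sqrt{n}} & \text{if } \dfrac{n}{N} \le 0.05 \text{ or sampling is done with replacement} \\[2ex]
\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} & \text{if } \dfrac{n}{N} > 0.05 \text{ and sampling is done without replacement}
\end{cases}
$$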
Remark
1. $\sqrt{\dfrac{N-n}{N-1}}$ is called the finite population correction factor, and $\sqrt{\dfrac{N-n}{N-1}} \approx 1$ when N is large and
$\dfrac{n}{N} \le 0.05$.
2. The value of $\sigma_{\bar{X}}$ decreases as n increases.
Example 4.4
The mean wage per hour for all 5000 employees working at a large company is RM13.50 and the
standard deviation is RM2.90. Let $\bar{X}$ be the mean wage per hour for a random sample of certain
employees selected from this company. Find the mean and standard deviation of $\bar{X}$ for a sample size of
(a) 30 (b) 75 (c) 200
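The rule for the standard deviation of the sample mean, including the finite population correction, might be coded as below and applied to Example 4.4. The function name `standard_error` and its parameters are hypothetical; sampling without replacement is assumed unless stated otherwise.

```python
import math

def standard_error(sigma, n, N=None, with_replacement=False):
    """Standard deviation of the sample mean.

    The finite population correction is applied only when n/N > 0.05
    and sampling is done without replacement.
    """
    se = sigma / math.sqrt(n)
    if N is not None and not with_replacement and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

mu, sigma, N = 13.50, 2.90, 5000   # values from Example 4.4
results = {n: standard_error(sigma, n, N) for n in (30, 75, 200)}
for n, se in results.items():
    print(f"n = {n}: mean of Xbar = {mu:.2f}, sd of Xbar = {se:.4f}")
```

In all three cases n/N is at most 0.04, so the correction factor is not used and the standard deviation simply shrinks as n grows.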
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean µ and standard
deviation σ. Then the sampling distribution of the sample mean, $\bar{X}$, is also normal,
with the following mean and standard deviation, irrespective of the sample size:
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
That is, if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}\right)$.
Example 4.5
In a recent STAT test, the mean score for all examinees was 1016. Assume that the distribution of STAT
scores of all examinees is normal with a mean of 1016 and a standard deviation of 153. Let $\bar{X}$ be the
mean STAT score of a random sample of certain examinees. Calculate the mean and standard deviation
of $\bar{X}$ and describe the shape of its sampling distribution when the sample size is
(a) 16 (b) 50 (c) 1000
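For a large sample from a population with mean $\mu$ and standard deviation $\sigma$, whatever the shape of the population distribution, the sampling distribution of $\bar{X}$ is approximately normal:
$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right) \quad \text{(approximately)}$$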
Remark:
1. The sample size is usually considered to be large if $n \ge 30$.
2. As the sample size increases, the sampling distribution of $\bar{X}$ behaves more like a normal distribution
and hence the approximation is better.
Example 4.6
The mean rent paid by all tenants in a large city is RM1250 with a standard deviation of RM225.
However, the population distribution of rents for all tenants in this city is skewed to the right. Calculate
the mean and standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when the
sample size is
(a) 30 (b) 100
Example 4.7
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a
mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the mean weight, $\bar{X}$,
of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.
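As a rough check on Example 4.7, the probability can be computed from the sampling distribution of the mean with Python's standard-library `statistics.NormalDist`. The variable names are our own, and the result may differ in the last digit from an answer read off a z table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20                       # values from Example 4.7
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))   # sampling distribution of the mean

# P(31.8 < Xbar < 31.9) as a difference of two cumulative probabilities
prob = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)
print(f"P(31.8 < Xbar < 31.9) = {prob:.4f}")
```

Because the population is normal, the sampling distribution is exactly normal here even though n = 20 is small.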
Example 4.8
The prices of the houses in Selangor have a skewed probability distribution with a mean of RM165,300
and a standard deviation of RM29,500. Find the probability that the mean price, $\bar{X}$, of a random sample of
400 houses in Selangor is
(i) within RM3,000 of the population mean,
(ii) less than the population mean by at least RM2,500.
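A sketch of Example 4.8 follows. Since the population is skewed, the normal distribution used here is only the central limit theorem approximation for n = 400; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 165_300, 29_500, 400     # values from Example 4.8
se = sigma / math.sqrt(n)               # 29,500 / 20 = 1,475
xbar_dist = NormalDist(mu, se)          # approximate, by the central limit theorem

# (i) P(mu - 3000 <= Xbar <= mu + 3000)
p_within = xbar_dist.cdf(mu + 3000) - xbar_dist.cdf(mu - 3000)

# (ii) P(Xbar <= mu - 2500)
p_below = xbar_dist.cdf(mu - 2500)

print(f"(i)  {p_within:.4f}")
print(f"(ii) {p_below:.4f}")
```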
Example 4.9
If a random sample of size 30 is taken from each of the following distributions, find, for each case, the
probability that the sample mean exceeds 5.
(a) 𝑋~𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (𝜆 = 4.5) (b) 𝑋~𝐵(𝑛 = 9, 𝑝 = 0.5)
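One way to approach Example 4.9 is to apply the central limit theorem with the known mean and variance of each population. The helper `prob_mean_exceeds` is a hypothetical name, and no continuity correction is applied, which is an assumption on our part.

```python
import math
from statistics import NormalDist

def prob_mean_exceeds(threshold, pop_mean, pop_var, n):
    """Normal (CLT) approximation to P(Xbar > threshold), no continuity correction."""
    xbar_dist = NormalDist(pop_mean, math.sqrt(pop_var / n))
    return 1 - xbar_dist.cdf(threshold)

sample_size = 30

# (a) Poisson(4.5): mean = variance = lambda
p_poisson = prob_mean_exceeds(5, 4.5, 4.5, sample_size)

# (b) Binomial(9, 0.5): mean = 9p, variance = 9p(1 - p)
p_binomial = prob_mean_exceeds(5, 9 * 0.5, 9 * 0.5 * 0.5, sample_size)

print(f"(a) {p_poisson:.4f}")
print(f"(b) {p_binomial:.4f}")
```

Both populations have mean 4.5, but the binomial one has the smaller variance, so its sample mean is less likely to exceed 5.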
Example 4.10
If a large number of samples of size 𝑛 are taken from 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2.5) and approximately 5% of the
sample means are less than 2.025, estimate 𝑛.
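For Example 4.10, the condition P(Xbar < 2.025) = 0.05 can be turned into an equation for n using the normal approximation: (2.025 - λ) / sqrt(λ / n) = z, where z is the 5th percentile of the standard normal distribution. The sketch below solves it; the names are illustrative.

```python
from statistics import NormalDist

lam = 2.5            # Poisson mean and variance, from Example 4.10
cutoff = 2.025       # about 5% of sample means fall below this value
z = NormalDist().inv_cdf(0.05)          # about -1.645

# Solve (cutoff - lam) / sqrt(lam / n) = z for n
n_exact = lam * (z / (cutoff - lam)) ** 2
n_estimate = round(n_exact)

print(f"n is approximately {n_exact:.2f}, so estimate n = {n_estimate}")
```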
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean µ and standard
deviation σ. Then
$$\frac{(n-1)s^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
has a $\chi^2$ distribution with n − 1 degrees of freedom. Also, $\bar{X}$ and $s^2$ are independent random
variables.
4.5.1 The Chi-Square ($\chi^2$) Distribution
(i) The chi-square ($\chi^2$) distribution is a continuous distribution with p.d.f.
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0.$$
(ii) The value of $\chi^2$ is never negative.
(iii) There is a family of $\chi^2$ distributions, each with a different shape depending on the number of
degrees of freedom (df = $\nu$). A member of the family is denoted by $\chi^2(\nu)$.
(iv) When the number of df is small, the distribution is positively skewed, but as the number of
df increases it becomes more symmetrical and approaches the normal distribution.
Example 4.11
Determine the $\chi^2$-value of the chi-square distribution for the following.
(a) 7 degrees of freedom and an area 0.1 in the right tail.
(b) 2 degrees of freedom and an area 0.99 in the left tail.
Example 4.12
Find the area in the right tail of the chi-square distribution for $\chi^2$-value = 2.558, $\nu$ = 10.
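Tail areas such as the one in Example 4.12 are normally read from a chi-square table. As a cross-check that needs only the standard library, the sketch below uses a closed form that holds for even degrees of freedom only: for df = 2k, the right-tail area is exp(-x/2) multiplied by the sum of (x/2)^j / j! for j = 0, ..., k - 1. The function name is hypothetical, and odd df are deliberately rejected.

```python
import math

def chi2_right_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

area = chi2_right_tail_even_df(2.558, 10)   # Example 4.12
print(f"right-tail area = {area:.4f}")
```

The same function with df = 2 gives exp(-x/2), so a left-tail area of 0.99 (right-tail area 0.01) corresponds to x = -2 ln(0.01), about 9.210.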
Therefore, $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ has a t distribution with n − 1 degrees of freedom.
Example 4.13
Find the t-value of the t-distribution for the following.
(i) Area in the right tail = 0.05 and $\nu$ = 5
(ii) Area in the left tail = 0.025 and $\nu$ = 20.
Example 4.14
Find the area in the appropriate tail of the t-distribution for the following:
(i) t = 2.467, $\nu$ = 28 (ii) t = −2.878, $\nu$ = 18
4.6 Estimation
The assignment of value(s) to a population parameter based on a value of the corresponding sample
statistic is called estimation.
The value(s) assigned to a population parameter based on the value of a sample statistic is called an
estimate. The sample statistic used to estimate a population parameter is called an estimator.
For example,
(i) the sample proportion, $\hat{p}$, is the best point estimate of the population proportion, $p$.
(ii) the sample mean, $\bar{X}$, is the best point estimate of the population mean, $\mu$.
(iii) the sample variance, $s^2$, is the best point estimate of the population variance, $\sigma^2$.
An interval estimate is an interval constructed around the point estimate, together with a statement that this
interval is likely to contain the true value of a population parameter.
Each interval is constructed with regard to a given confidence level and is called a confidence interval.
The confidence level associated with a confidence interval states how much confidence we have that this
interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$.
A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property of a
population.
The null hypothesis (denoted by $H_0$) is a statement about the value of a population parameter that is assumed
to be true until it is declared false.
The alternative hypothesis (denoted by $H_1$ or $H_a$) is the statement that the parameter has a value that
somehow differs from the null hypothesis.
Example 4.15
Give the relevant null hypothesis and alternative hypothesis.
(a) Knowing that the proportion of drivers who admit to running red lights is at least 0.5, test if the
proportion has changed.
(b) The mean height of professional basketball players is at most 7ft., test if the claim has changed.
(c) The standard deviation of IQ scores of actors is equal to 15, test if the standard deviation
(i) has changed,
(ii) is getting smaller,
(iii) is getting bigger.
Test statistic
The test statistic is a value computed from the sample data, and it is used in making the decision about
the rejection of the null hypothesis.
Test statistic for the mean (large sample):
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Rejection and nonrejection region
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject
the null hypothesis.
The nonrejection region (or acceptance region) is the set of all values of the test statistic that cause us not
to reject the null hypothesis.
The critical value is a value that separates the critical region and the nonrejection region.
The significance level (denoted by $\alpha$) is the probability that the test statistic will fall in the critical region
when the null hypothesis is actually true.
Tails of a test
The tails in a distribution are the extreme regions bounded by critical values. Some hypothesis tests are
two-tailed (the critical region is in the two extreme regions), some are left-tailed (the critical region is in
the extreme left region) and some are right-tailed tests (the critical region is in the extreme right region).
                                          Two-tailed test    Left-tailed test    Right-tailed test
Sign in the alternative hypothesis $H_1$    ≠                  <                   >
Rejection region                          In both tails      In the left tail    In the right tail
Example 4.16
Write the null and alternative hypotheses for each of the following cases. Determine whether
each is a case of a two-tailed, a left-tailed or a right-tailed test.
(a) According to the formal record, the mean family size was 3.19 in 1995. A researcher wants to
check whether or not this mean has changed since 1995.
(b) A company claims that the mean amount of soda in all soft-drink cans is 12 ounces. Suppose a
consumer agency wants to test whether the mean amount of soda per can is less than 12 ounces.
(c) A research report shows that the mean cholesterol of all adult males in KL was 175 in 1995. Test if
the mean cholesterol of all adult males in KL is now higher than 175.
[Figure: rejection regions for a two-tailed test (area $\alpha/2$ in each tail), a left-tailed test and a right-tailed test.]
A Type I error occurs when a true null hypothesis is rejected. The value of $\alpha$ represents the probability
of committing this type of error, that is
$\alpha = P(H_0 \text{ is rejected} \mid H_0 \text{ is true})$
The value of $\alpha$ is the significance level of the test.
A Type II error occurs when a false null hypothesis is not rejected. The value of $\beta$ represents the
probability of committing a Type II error, that is
$\beta = P(H_0 \text{ is not rejected} \mid H_0 \text{ is false})$
For cases I and II, the normal distribution is used to make inferences about $\mu$ ($\sigma$ known).
The $(1 - \alpha)100\%$ confidence interval for $\mu$ is
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the given confidence
level.
The test statistic is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
The value of z calculated for a sample mean $\bar{x}$ is also called the observed value of z.
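The maximum error of estimate for $\mu$, denoted by E, is the quantity that is subtracted from and added to
the value of $\bar{x}$ to obtain a confidence interval for $\mu$. Thus,
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad E = z_{\alpha/2} \frac{s}{\sqrt{n}} \quad \text{or} \quad E = t_{\alpha/2} \frac{s}{\sqrt{n}}$$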
Example 4.17
A research department took a sample of 36 textbooks and collected information on their prices. This
information produced a mean price of RM54.40. It is known that the prices of all textbooks follow a
normal distribution with a standard deviation of RM4.50.
(a) What is the point estimate of the mean price of all textbooks? What is the margin of error for this
estimate?
(b) Construct a 90% confidence interval for the mean price of all textbooks.
(c) Use a 0.05 significance level to test the common belief that the mean price of all textbooks is
RM53.
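The three parts of Example 4.17 can be sketched with the formulas for the confidence interval, the maximum error of estimate and the z test statistic when σ is known. The names below are illustrative, and the 95% level used for the margin of error in part (a) is an assumption, since the example does not state one.

```python
import math
from statistics import NormalDist

n, xbar, sigma = 36, 54.40, 4.50       # values from Example 4.17
se = sigma / math.sqrt(n)              # 4.50 / 6 = 0.75
std_normal = NormalDist()

def margin(confidence):
    """Maximum error of estimate E = z_{alpha/2} * sigma / sqrt(n)."""
    z = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    return z * se

# (a) point estimate and margin of error (a 95% level is assumed here)
point_estimate = xbar
margin_95 = margin(0.95)

# (b) 90% confidence interval
ci_90 = (xbar - margin(0.90), xbar + margin(0.90))

# (c) two-tailed z test of H0: mu = 53 against H1: mu != 53 at alpha = 0.05
z_obs = (xbar - 53) / se
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)
reject_h0 = abs(z_obs) > z_crit

print(f"(a) estimate = {point_estimate:.2f}, margin of error = {margin_95:.2f}")
print(f"(b) 90% CI = ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"(c) z = {z_obs:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```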
Note:
If the sample size is large, use the normal distribution as an approximation to the t-distribution.
Example 4.18
A doctor wanted to estimate the mean cholesterol level for all adult men living in a town. He took a
sample of 25 adult men from the town and found that the mean cholesterol level for this sample is 186
with a standard deviation of 12. Assume that the cholesterol levels for all adult men in the town are
normally distributed.
(a) Construct a 95% confidence interval for the population mean, $\mu$.
(b) Testing at the 1% significance level, would you conclude that the mean cholesterol level for all
adult men living in the town is more than 180?
Let $\mu$ be the mean cholesterol level for all adult men living in the town.
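A possible outline of the calculations in Example 4.18, using the t distribution because σ is unknown and the sample is small. Python's standard library has no t quantile function, so the two critical values below are assumed inputs read from a t table with 24 degrees of freedom; all names are illustrative.

```python
import math

n, xbar, s = 25, 186.0, 12.0     # values from Example 4.18
se = s / math.sqrt(n)            # 12 / 5 = 2.4

# Critical values from a t table with n - 1 = 24 degrees of freedom (assumed inputs).
t_025 = 2.064   # right-tail area 0.025, for the 95% interval
t_01 = 2.492    # right-tail area 0.01, for the right-tailed test

# (a) 95% confidence interval for mu
ci_95 = (xbar - t_025 * se, xbar + t_025 * se)

# (b) right-tailed test of H0: mu = 180 against H1: mu > 180 at the 1% level
t_obs = (xbar - 180) / se
reject_h0 = t_obs > t_01

print(f"(a) 95% CI = ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"(b) t = {t_obs:.3f}, critical value = {t_01}, reject H0: {reject_h0}")
```

The observed t only just exceeds the critical value, so the decision is sensitive to rounding in the table.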
Example 4.19
According to a recent survey, the workers employed in manufacturing industries earned an average of
RM546 per month. Assume that this mean is based on a random sample of 1000 workers selected from
the manufacturing industries and that the standard deviation of earnings for this sample is RM75. Find a
99% confidence interval for the mean earnings of all workers employed in manufacturing industries.
Solution
Let $\mu$ be the mean earnings of all workers employed in manufacturing industries.
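The sample size needed to estimate $\mu$ with a desired margin of error is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
where $z_{\alpha/2}$ = critical z score based on the desired confidence level
E = desired margin of error
$\sigma$ = population standard deviation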
When finding the sample size n, if the use of formula does not result in a whole number, always increase
the value of n to the next larger whole number.
Remark
When $\sigma$ is not known, we can estimate $\sigma$ using these methods:
1. $\sigma \approx \text{range}/4$
2. Estimate the value of $\sigma$ from an earlier result.
Example 4.20
We want to estimate the mean IQ score for the population of statistics professors. The
standard deviation of IQ scores for all statistics professors is 15. How many statistics professors must be
randomly selected for IQ tests if we want 99% confidence that the sample mean is within 2 IQ points of
the population mean?
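The sample size formula and the round-up rule can be sketched as follows for Example 4.20. The function `required_sample_size` is a hypothetical helper, and it uses the exact normal quantile rather than a rounded table value.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

n_required = required_sample_size(sigma=15, margin_of_error=2, confidence=0.99)
print(n_required)
```

With the exact quantile (about 2.5758) this gives 374. A table value rounded to 2.575 gives 372.97, which rounds up to 373, so the answer can differ by one depending on the table used.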
Example 4.21
The management of Priority Health Club claims that its members lose an average of 10 pounds or more
within the first month after joining the club. A consumer agency that wanted to check this claim took a
random sample of 36 members of this health club and found that they lost an average of 9.2 pounds
within the first month of membership with a standard deviation of 2.4 pounds.
(a) Find the rejection and nonrejection regions at the 1% significance level for the test statistic and for
the sample mean.
(b) What will the conclusion be in (a)?
(c) What are the Type I and Type II errors in this case?
(d) Compute the probability of committing Type I error in this case.
(e) What is the probability of making Type II error if the mean is changed to 9 pounds?
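Parts (a), (b) and (e) of Example 4.21 can be outlined as below. The sketch treats the test as left-tailed (H0: μ ≥ 10 against H1: μ < 10) and uses s in place of σ because the sample is large; both choices, and all variable names, are our assumptions.

```python
import math
from statistics import NormalDist

n, xbar, s = 36, 9.2, 2.4        # values from Example 4.21
mu_0, alpha = 10.0, 0.01         # H0: mu >= 10, H1: mu < 10 (left-tailed)
se = s / math.sqrt(n)            # 2.4 / 6 = 0.4
std_normal = NormalDist()

# (a) critical value for z and the matching cutoff for the sample mean
z_crit = std_normal.inv_cdf(alpha)          # about -2.33
xbar_crit = mu_0 + z_crit * se              # reject H0 when xbar < xbar_crit

# (b) observed test statistic and decision
z_obs = (xbar - mu_0) / se
reject_h0 = z_obs < z_crit

# (e) Type II error probability if the true mean is 9 pounds:
# H0 is not rejected whenever xbar >= xbar_crit.
beta = 1 - NormalDist(9.0, se).cdf(xbar_crit)

print(f"(a) reject H0 if z < {z_crit:.3f}, i.e. if xbar < {xbar_crit:.3f}")
print(f"(b) z = {z_obs:.2f}, reject H0: {reject_h0}")
print(f"(e) beta = {beta:.4f}")
```

For part (d), the probability of a Type I error is the significance level itself, α = 0.01.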
The p-value is the probability, computed assuming that 𝐻0 is true, of observing a value of the test statistic that
is at least as contradictory to 𝐻0 and supportive of 𝐻1 as the value actually computed from the sample
data.
Example 4.22
A sample of 106 body temperatures has a mean of 98.20°F. Assume that the sample is a simple
random sample and that the population standard deviation is known to be 0.62°F. Use a 0.05 significance
level to test the common belief that the mean body temperature of healthy adults is equal to 98.6°F.
Find the p − value of the test.
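The p-value in Example 4.22 might be computed as follows for a two-tailed z test with σ known. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

n, xbar, sigma = 106, 98.20, 0.62   # values from Example 4.22
mu_0, alpha = 98.6, 0.05            # H0: mu = 98.6, H1: mu != 98.6 (two-tailed)

z_obs = (xbar - mu_0) / (sigma / math.sqrt(n))

# Two-tailed p-value: probability of a z at least as extreme as z_obs in either direction
p_value = 2 * NormalDist().cdf(-abs(z_obs))
reject_h0 = p_value < alpha

print(f"z = {z_obs:.2f}, p-value = {p_value:.2e}, reject H0: {reject_h0}")
```

The observed z is far beyond the range of a typical z table, so the p-value is essentially zero and H0 is rejected at any usual significance level.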
Example 4.23
From the past record of a bank, with the old computer system, a teller at this bank could serve, on average,
22 customers per hour. Recently, a new system was installed, expecting that it would increase the service
rate. To check if the new computer system is more efficient than the old system, the management took a
random sample of 18 hours and found that during these hours the mean number of customers served by
tellers was 28 per hour with a standard deviation of 2.5. Testing at the 1% significance level, would you
conclude that the new computer system is more efficient than the old computer system? Assume that the
number of customers served per hour by a teller is approximately normally distributed.
Example 4.24
A psychologist claims that the mean age at which children start walking is 12.5 months. Carol wanted to
check if this claim is true. She took a random sample of 18 children and found that the mean age at which
these children started walking was 12.9 months with a standard deviation of 0.80 month. Using the 1%
significance level, can you conclude that the mean age at which all children start walking is different
from 12.5 months? Assume that the ages at which all children start walking have an approximately
normal distribution.
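A brief sketch of the two-tailed t test in Example 4.24. The critical value is an assumed input read from a t table (17 degrees of freedom, area 0.005 in each tail), since the standard library provides no t quantile function; the names are illustrative.

```python
import math

n, xbar, s = 18, 12.9, 0.80      # values from Example 4.24
mu_0 = 12.5                      # H0: mu = 12.5, H1: mu != 12.5 (two-tailed)

t_obs = (xbar - mu_0) / (s / math.sqrt(n))

# Critical value from a t table: 17 degrees of freedom, right-tail area 0.005 (assumed input).
t_crit = 2.898
reject_h0 = abs(t_obs) > t_crit

print(f"t = {t_obs:.3f}, critical values = ±{t_crit}, reject H0: {reject_h0}")
```

The same pattern applies to Example 4.23 with a right-tailed rejection region instead of a two-tailed one.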
Assuming that the population is (approximately) normal, the $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are obtained from the chi-square distribution with $\nu = n - 1$ degrees of freedom
for $\alpha/2$ and $1 - \alpha/2$ areas in the right tail, respectively.
Example 4.25
ABC is producing cookies. The machine that fills packages of these cookies is set up in such a way that
the average net weight of these packages is 32 ounces with a variance of 0.015 square ounces. From time
to time, the quality control inspector at the company selects a sample of a few such packages and calculates
the variance of the net weights of these packages. Assume that the net weights of cookies in all packages
are normally distributed.
(a) A recently taken random sample of 25 packages from the production line gave a sample variance
of 0.029 square ounce. Construct a 95% confidence interval for the population variance of the net
weights of these packages.
(b) The acceptable value of the population variance is 0.015 square ounce or less. A recently taken
random sample of 30 packages from the production line gave a sample variance of 0.025 square
ounce. Using $\alpha = 0.01$, can you conclude that the population variance is within the acceptable
limit?
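Part (a) of Example 4.25 can be checked without a chi-square table because n − 1 = 24 is even. The sketch below relies on a closed form for the right-tail area that is valid for even degrees of freedom only, and finds the critical values by bisection. Both helper functions are hypothetical, and part (b), with 29 degrees of freedom, is outside the scope of this shortcut.

```python
import math

def chi2_right_tail_even_df(x, df):
    """Right-tail area of a chi-square distribution; closed form valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def chi2_critical_even_df(right_tail_area, df, lo=0.0, hi=500.0):
    """Bisection for the value with the given right-tail area (even df only)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_right_tail_even_df(mid, df) > right_tail_area:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

n, s2, alpha = 25, 0.029, 0.05          # Example 4.25(a)
df = n - 1                              # 24, which is even
chi2_upper = chi2_critical_even_df(alpha / 2, df)        # about 39.36
chi2_lower = chi2_critical_even_df(1 - alpha / 2, df)    # about 12.40
ci = (df * s2 / chi2_upper, df * s2 / chi2_lower)

print(f"95% CI for the variance: {ci[0]:.4f} to {ci[1]:.4f}")
```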