UECM1224 Probability and Statistics II

Contents

Chapter 1 Joint Distributions

Chapter 2 Distributions of Functions of Random Variables

Chapter 3 Estimation

Chapter 4 Inferences on the Mean and Variance of a Distribution

Chapter 5 Inference on Proportions

Chapter 6 Comparing Two Means, Two Variances and Two Proportions

Chapter 7 The Analysis of Variance

Chapter 8 Chi-Square Tests

References

1. Hogg, R.V., & Tanis, E.A. (2010). Probability and Statistical Inference (8th ed.). Upper Saddle River, New Jersey: Prentice Hall.

2. Wackerly, D.D., Mendenhall III, W., & Scheaffer, R.L. (2008). Mathematical Statistics with Applications (7th ed.). California: Duxbury / Thomson Learning.

Lecturer: Dr. Wong Wai Kuan ([email protected])


Tutor: Dr. Pang Sook Theng ([email protected])


Chapter 4: Inferences on the Mean and Variance of a Distribution

4.1 Population and Sampling Distribution

The population distribution is the probability distribution of the population data.

Example 4.1
Suppose there are only five students in the class and the midterm scores are
70 78 80 80 95

Let X denote the score of a student. Since each of the five students is equally likely to be selected, the probability distribution of the population is

x           70     78     80     95
P(X = x)    1/5    1/5    2/5    1/5

The probability distribution of a sample statistic is called its sampling distribution.

Sampling distribution of $\bar{X}$
The probability distribution of $\bar{X}$ is called its sampling distribution. It lists the various values that $\bar{X}$ can assume and the probability of each value of $\bar{X}$.

Example 4.2
For the data in Example 4.1, list all possible samples of three scores that can be selected without replacement. Calculate the sample mean $\bar{X}$ for each sample and find the sampling distribution of $\bar{X}$.

Solution

Suppose we assign the labels A, B, C, D and E to the scores of the five students so that

A = 70, B = 78, C = 80, D = 80, E = 95

All possible samples and their means when the sample size is 3:

Sample   Scores in the sample   $\bar{x}$
ABC      70, 78, 80             76.00
ABD      70, 78, 80             76.00
ABE      70, 78, 95             81.00
ACD      70, 80, 80             76.67
ACE      70, 80, 95             81.67
ADE      70, 80, 95             81.67
BCD      78, 80, 80             79.33
BCE      78, 80, 95             84.33
BDE      78, 80, 95             84.33
CDE      80, 80, 95             85.00

Sampling distribution of $\bar{X}$ when the sample size is 3 (each of the 10 samples is equally likely)

$\bar{x}$                 76.00   76.67   79.33   81.00   81.67   84.33   85.00
$P(\bar{X} = \bar{x})$    0.2     0.1     0.1     0.1     0.2     0.2     0.1
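The enumeration in Example 4.2 can also be checked by machine. The following is a minimal sketch, not part of the original notes; the function name `sampling_distribution` and the variable names are illustrative assumptions. It lists every sample of size 3 drawn without replacement and tabulates the probability of each sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from Example 4.1, keyed by the labels used in the solution.
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}


def sampling_distribution(population, size):
    """Return {sample mean: probability} over all equally likely samples
    of the given size drawn without replacement."""
    samples = list(combinations(population.values(), size))
    counts = Counter(Fraction(sum(s), size) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}


dist = sampling_distribution(scores, 3)
for sample_mean, prob in dist.items():
    print(f"{float(sample_mean):6.2f}  {prob}")

# Mean of the sampling distribution, for comparison with the population mean.
mean_of_means = sum(m * p for m, p in dist.items())
```

Exact fractions are used so that the probabilities add up to exactly 1 and `mean_of_means` can be compared with the population mean without rounding error.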


4.1.1 Sampling and nonsampling errors

Sampling error is the difference between the value of the sample statistic and the value of the
corresponding population parameter.

In the case of the mean, sampling error = $\bar{X} - \mu$, assuming that the sample is random and no nonsampling error has been made.

Nonsampling error is the error that occurs in the selection, recording and tabulation of data.

Example 4.3
Reconsider the data in Example 4.1. Suppose we take a random sample of three scores from this population and that this sample includes the scores 70, 80 and 95. Calculate the sampling error.
Now suppose that, when we select the above-mentioned sample, we mistakenly record the second score as 82 instead of 80. Calculate the nonsampling error.
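One way to organise the arithmetic for Example 4.3 is sketched below. It assumes the usual decomposition in which the nonsampling error is the part of the total error that remains after the sampling error is removed; the variable names are illustrative.

```python
from statistics import mean

population = [70, 78, 80, 80, 95]
mu = mean(population)                      # population mean, 80.6

correct_sample = [70, 80, 95]
recorded_sample = [70, 82, 95]             # second score mis-recorded as 82

# Sampling error: sample mean minus population mean, with no recording mistake.
sampling_error = mean(correct_sample) - mu

# The mis-recorded sample carries both kinds of error.
total_error = mean(recorded_sample) - mu
nonsampling_error = total_error - sampling_error

print(round(sampling_error, 2), round(nonsampling_error, 2))
```

Under this decomposition the nonsampling error equals the difference between the two sample means, that is, the recording mistake of 2 points spread over the 3 observations.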

4.2 Mean and Standard Deviation of $\bar{X}$

The mean of the sampling distribution of $\bar{X}$ is always equal to the mean of the population. Thus,
$$\mu_{\bar{X}} = \mu.$$

For a sample of size n, if the sampling is done from a finite population (of size N), the standard deviation of $\bar{X}$ is given by
$$\sigma_{\bar{X}} = \begin{cases} \dfrac{\sigma}{\sqrt{n}} & \text{if } \dfrac{n}{N} \le 0.05 \text{ or sampling is done with replacement} \\[2ex] \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} & \text{if } \dfrac{n}{N} > 0.05 \text{ and sampling is done without replacement} \end{cases}$$

And if the sampling is done from an infinite population, we have
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

Remark
1. $\sqrt{\dfrac{N-n}{N-1}}$ is called the finite population correction factor, and $\sqrt{\dfrac{N-n}{N-1}} \approx 1$ when N is large and $\dfrac{n}{N} \le 0.05$.
2. The value of $\sigma_{\bar{X}}$ decreases as n increases.

Example 4.4
The mean wage per hour for all 5000 employees working at a large company is RM13.50 and the standard deviation is RM2.90. Let $\bar{X}$ be the mean wage per hour for a random sample of employees selected from this company. Find the mean and standard deviation of $\bar{X}$ for a sample size of
(a) 30 (b) 75 (c) 200
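The rule for the standard deviation of the sample mean in Section 4.2 can be written as a short function. This is a sketch under the 5% rule stated above; the function name `sd_of_sample_mean` and its parameters are illustrative, not from the notes.

```python
from math import sqrt


def sd_of_sample_mean(sigma, n, N=None, with_replacement=False):
    """Standard deviation of the sample mean.

    N=None is treated as an infinite population. The finite population
    correction is applied only when sampling is without replacement
    and n/N > 0.05.
    """
    se = sigma / sqrt(n)
    if N is not None and not with_replacement and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))      # finite population correction factor
    return se


# Example 4.4: N = 5000, sigma = 2.90; the mean of the sample mean is 13.50 throughout.
results = {n: sd_of_sample_mean(2.90, n, N=5000) for n in (30, 75, 200)}
for n, se in results.items():
    print(n, n / 5000, round(se, 4))
```

For all three sample sizes in Example 4.4 the ratio n/N is at most 0.05, so the correction factor is not applied.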

4.3 Shape of the sampling distribution of $\bar{X}$

The shape of the sampling distribution of $\bar{X}$ relates to the following two cases.
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.

4.3.1 Sampling from a normally distributed population

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then the sampling distribution of the sample mean, $\bar{X}$, is also normal, irrespective of the sample size, with mean and standard deviation
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
That means, if $X \sim N(\mu, \sigma^2)$, then
$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right).$$

Example 4.5
In a recent STAT test, the mean score for all examinees was 1016. Assume that the distribution of STAT
scores of all examinees is normal with a mean of 1016 and a standard deviation of 153. Let $\bar{X}$ be the
mean STAT score of a random sample of examinees. Calculate the mean and standard deviation
of $\bar{X}$ and describe the shape of its sampling distribution when the sample size is
(a) 16 (b) 50 (c) 1000

4.3.2 Sampling from a population that is not normally distributed

Central Limit Theorem


For a relatively large sample size, the sampling distribution of $\bar{X}$ is approximately normal, regardless of the distribution of the population under consideration. The mean and standard deviation of the sampling distribution of $\bar{X}$ are
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
That means, for any distribution of X, if n is large, then approximately
$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right).$$
Remark:
1. The sample size is usually considered to be large if $n \ge 30$.
2. As the sample size increases, the sampling distribution of $\bar{X}$ behaves more like a normal distribution and hence the approximation is better.
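The Central Limit Theorem can be illustrated by simulation. The sketch below is not from the notes: it assumes an exponential population with rate 1 (skewed to the right, with $\mu = 1$ and $\sigma = 1$), a fixed random seed, and 5000 simulated samples of size 30. It then compares the simulated mean and standard deviation of $\bar{X}$ with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1224)                          # fixed seed so the run is repeatable

n = 30                                     # sample size
reps = 5000                                # number of simulated samples
mu, sigma = 1.0, 1.0                       # exponential(1) population parameters

# One sample mean per simulated sample of size n.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

sim_mean = mean(sample_means)              # compare with mu
sim_sd = stdev(sample_means)               # compare with sigma / sqrt(n)
theory_sd = sigma / sqrt(n)

# Share of sample means within one standard deviation of mu;
# a normal distribution would give about 0.68.
share_within_1sd = sum(abs(m - mu) <= theory_sd for m in sample_means) / reps

print(round(sim_mean, 3), round(sim_sd, 3), round(theory_sd, 3), share_within_1sd)
```

Changing `n` to a small value such as 2 or 5 shows how the agreement with the normal distribution deteriorates for a skewed population.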

Example 4.6
The mean rent paid by all tenants in a large city is RM1250 with a standard deviation of RM225.
However, the population distribution of rents for all tenants in this city is skewed to the right. Calculate
the mean and standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when the
sample size is
(a) 30 (b) 100

Let X be the rent paid by a tenant

4.4 Application of the sampling distribution of $\bar{X}$

Example 4.7
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a
mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the mean weight, $\bar{X}$,
of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.

Let X be the weight of a package of cookies.

Example 4.8
The prices of the houses in Selangor have a skewed probability distribution with a mean of RM165,300
and a standard deviation of RM29,500. Find the probability that the mean price, $\bar{X}$, of a random sample of
400 houses in Selangor is
(i) within RM3,000 of the population mean,
(ii) less than the population mean by at least RM2,500.

Let X be the price of a house in Selangor.
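Examples 4.7 and 4.8 both reduce to normal probabilities for $\bar{X}$ with standard deviation $\sigma/\sqrt{n}$. The following sketch uses `statistics.NormalDist` from the Python standard library in place of the normal table; the helper name `sample_mean_dist` is an illustrative assumption, and answers read from a z table rounded to two decimals may differ in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu, sigma, n):
    """Normal (exact or CLT-approximate) sampling distribution of the sample mean."""
    return NormalDist(mu, sigma / sqrt(n))


# Example 4.7: normal population, n = 20.
cookies = sample_mean_dist(32, 0.3, 20)
p_47 = cookies.cdf(31.9) - cookies.cdf(31.8)

# Example 4.8: skewed population, but n = 400 is large, so the CLT is used.
houses = sample_mean_dist(165300, 29500, 400)
p_48_i = houses.cdf(165300 + 3000) - houses.cdf(165300 - 3000)   # within RM3,000
p_48_ii = houses.cdf(165300 - 2500)                              # at least RM2,500 below

print(round(p_47, 4), round(p_48_i, 4), round(p_48_ii, 4))
```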


Example 4.9
If a random sample of size 30 is taken from each of the following distributions, find, for each case, the
probability that the sample mean exceeds 5.
(a) 𝑋~𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (𝜆 = 4.5) (b) 𝑋~𝐵(𝑛 = 9, 𝑝 = 0.5)

Example 4.10
If a large number of samples of size 𝑛 are taken from 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2.5) and approximately 5% of the
sample means are less than 2.025, estimate 𝑛.
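For Examples 4.9 and 4.10 the Central Limit Theorem is applied with the population mean and variance of each distribution (for the Poisson, both equal $\lambda$; for the binomial, $np$ and $np(1-p)$). The sketch below assumes no continuity correction is applied, which is a modelling choice not stated in the notes; the function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                           # standard normal distribution


def p_mean_exceeds(c, mu, var, n):
    """CLT approximation to P(sample mean > c), without continuity correction."""
    return 1 - NormalDist(mu, sqrt(var / n)).cdf(c)


# Example 4.9, n = 30
p_poisson = p_mean_exceeds(5, mu=4.5, var=4.5, n=30)
p_binomial = p_mean_exceeds(5, mu=9 * 0.5, var=9 * 0.5 * 0.5, n=30)

# Example 4.10: solve (2.025 - lam) / sqrt(lam / n) = z_0.05 for n.
lam = 2.5
z_05 = Z.inv_cdf(0.05)                     # about -1.645
n_exact = (z_05 * sqrt(lam) / (2.025 - lam)) ** 2
n_est = round(n_exact)

print(round(p_poisson, 4), round(p_binomial, 4), round(n_exact, 2), n_est)
```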

4.5 Some related sampling distributions


Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then
$$Z_i = \frac{X_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, n,$$
are independent standard normal random variables, and
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2$$
has a $\chi^2$ distribution with n degrees of freedom.

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then
$$\frac{(n-1)s^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
has a $\chi^2$ distribution with $n - 1$ degrees of freedom. Also, $\bar{X}$ and $s^2$ are independent random variables.

4.5.1 The Chi-Square ($\chi^2$) Distribution
(i) The chi-square ($\chi^2$) distribution is a continuous distribution with p.d.f.
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0.$$
(ii) The value of $\chi^2$ is never negative.
(iii) There is a family of $\chi^2$ distributions, each with a different shape depending on the number of degrees of freedom ($df = \nu$), denoted by $\chi^2(\nu)$.
(iv) When the number of degrees of freedom is small, the distribution is positively skewed, but as the number of degrees of freedom increases it becomes more symmetrical and approaches the normal distribution.

Example 4.11
Determine the $\chi^2$-value of the chi-square distribution for the following.
(a) 7 degrees of freedom and an area of 0.1 in the right tail.
(b) 2 degrees of freedom and an area of 0.99 in the left tail.

Example 4.12
Find the area in the right tail of the chi-square distribution for $\chi^2 = 2.558$, $\nu = 10$.
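Tail areas such as the one in Example 4.12 are normally read from the chi-square table. As a cross-check, the sketch below integrates the p.d.f. $f(x) = x^{\nu/2-1} e^{-x/2} / \left(2^{\nu/2}\Gamma(\nu/2)\right)$ numerically with Simpson's rule. The function names and the number of integration steps are illustrative assumptions, and the routine is only intended for $\nu \ge 2$, where the p.d.f. is finite at 0.

```python
from math import exp, gamma


def chi2_pdf(x, nu):
    """p.d.f. of the chi-square distribution with nu degrees of freedom
    (x >= 0; x = 0 is only allowed for nu >= 2)."""
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))


def chi2_right_tail(x, nu, steps=2000):
    """Area to the right of x: 1 minus a Simpson's-rule integral of the
    p.d.f. over [0, x]. `steps` must be even."""
    h = x / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi2_pdf(i * h, nu)
    return 1 - total * h / 3


area = chi2_right_tail(2.558, 10)          # Example 4.12
print(round(area, 3))
```

The same function can be used to confirm a table value: for instance, `chi2_right_tail(12.017, 7)` should be close to 0.1, which is the kind of lookup asked for in Example 4.11(a).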

4.5.2 The Student’s t -distribution


Theorem
Let Z be a standard normal random variable, and let W be a $\chi^2$-distributed variable with $\nu$ degrees of freedom. Then, if Z and W are independent,
$$T = \frac{Z}{\sqrt{W/\nu}}$$
has a t distribution with $\nu$ degrees of freedom.

Therefore,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with $n - 1$ degrees of freedom.

Some properties of Student's t-distribution

1. There are infinitely many t distributions, each identified by one parameter $\nu$, called the degrees of freedom. This parameter is always a positive integer.
2. Student's t-distribution is a continuous distribution with p.d.f.
$$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$
3. It is a bell-shaped distribution, symmetric about 0, which is flatter than the standard normal distribution.
4. As the number of degrees of freedom increases, the variance of the random variable t decreases and the t distribution approaches the standard normal distribution.

Example 4.13
Find the t-value of the t-distribution for the following.
(i) Area in the right tail = 0.05 and v=5
(ii) Area in the left tail = 0.025 and v = 20.

Example 4.14
Find the area in the appropriate tail of the t-distribution for the following,
(i) t = 2.467, v = 28 (ii) t = – 2.878, v = 18
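Tail areas of the t-distribution are normally read from the t table. As a cross-check, the sketch below integrates the standard Student's t p.d.f. with $\nu$ degrees of freedom, $f(t) = \dfrac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + t^2/\nu\right)^{-(\nu+1)/2}$, using Simpson's rule and the symmetry of the distribution about 0. The function names and the step count are illustrative assumptions.

```python
from math import gamma, pi, sqrt


def t_pdf(t, nu):
    """p.d.f. of Student's t-distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_tail_area(t, nu, steps=2000):
    """Area beyond t in the tail on the same side as t (right tail if t > 0,
    left tail if t < 0): 0.5 minus the integral of the p.d.f. over [0, |t|]."""
    h = abs(t) / steps                     # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


area_i = t_tail_area(2.467, 28)            # Example 4.14 (i), right tail
area_ii = t_tail_area(-2.878, 18)          # Example 4.14 (ii), left tail
print(round(area_i, 3), round(area_ii, 3))
```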

4.6 Estimation

The assignment of value(s) to a population parameter based on a value of the corresponding sample
statistic is called estimation.

The value(s) assigned to a population parameter based on the value of a sample statistic is called an
estimate. The sample statistic used to estimate a population parameter is called an estimator.

The estimation procedure involves the following steps:


1. Select a sample.
2. Collect the required information from the members of the sample.
3. Calculate the value of the sample statistic.
4. Assign value(s) to the corresponding population parameter.
A point estimate is a single value (or point) used to approximate a population parameter.

For example,
(i) the sample proportion, $\hat{p}$, is the best point estimate of the population proportion, $p$.
(ii) the sample mean, $\bar{X}$, is the best point estimate of the population mean, $\mu$.
(iii) the sample variance, $s^2$, is the best point estimate of the population variance, $\sigma^2$.

An interval estimate is an interval that is constructed around the point estimate, together with a statement that this interval is likely to contain the true value of the population parameter.

Each interval is constructed with regard to a given confidence level and is called a confidence interval. The confidence level associated with a confidence interval states how much confidence we have that this interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$.

4.7 Basics of Hypothesis Testing

In statistics, a hypothesis is a claim or statement about a property of a population.

A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property of a
population.

4.7.1 Components of a Formal Hypothesis Test

Null hypothesis and alternative hypothesis

The null hypothesis (denoted by $H_0$) is a statement about the value of a population parameter that is assumed to be true until it is declared false.

The alternative hypothesis (denoted by $H_1$ or $H_a$) is the statement that the parameter has a value that somehow differs from the value stated in the null hypothesis.

Example 4.15
Give the relevant null hypothesis and alternative hypothesis.
(a) Knowing that the proportion of drivers who admit to running red lights is at least 0.5, test if the
proportion has changed.
(b) The mean height of professional basketball players is at most 7 ft. Test if the mean height has changed.
(c) The standard deviation of IQ scores of actors is equal to 15, test if the standard deviation
(i) has changed,
(ii) is getting smaller,
(iii) is getting bigger.

Test statistic

The test statistic is a value computed from the sample data, and it is used in making the decision about the rejection of the null hypothesis.

Test statistic for the mean (large sample):
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Rejection and nonrejection region

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject
the null hypothesis.

The nonrejection region (or acceptance region) is the set of all values of the test statistic that cause us not
to reject the null hypothesis.

The critical value is a value that separates the critical region and the nonrejection region.

The significance level (denoted by $\alpha$) is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true.

Tails of a test

The tails in a distribution are the extreme regions bounded by critical values. Some hypothesis tests are
two-tailed (the critical region is in the two extreme regions), some are left-tailed (the critical region is in
the extreme left region) and some are right-tailed tests (the critical region is in the extreme right region).

                                              Two-tailed test   Left-tailed test   Right-tailed test
Sign in the null hypothesis $H_0$             =                 = or ≥             = or ≤
Sign in the alternative hypothesis $H_1$      ≠                 <                  >
Rejection region                              In both tails     In the left tail   In the right tail

Example 4.16
Write the null and alternative hypotheses for each of the following cases. Determine whether each is a
case of a two-tailed, a left-tailed or a right-tailed test.
(a) According to the formal record, the mean family size was 3.19 in 1995. A researcher wants to
check whether or not this mean has changed since 1995.
(b) A company claims that the mean amount of soda in all soft-drink cans is 12 ounces. Suppose a
consumer agency wants to test whether the mean amount of soda per can is less than 12 ounces.
(c) A research report shows that the mean cholesterol of all adult males in KL was 175 in 1995. Test if
the mean cholesterol of all adult males in KL is now higher than 175.

[Figure: rejection regions, acceptance regions and critical values (*) for a two-tailed test (area $\alpha/2$ in each tail), a left-tailed test (area $\alpha$ in the left tail) and a right-tailed test (area $\alpha$ in the right tail).]

Type I and Type II errors

                                           True State of Nature
Decision                                   The null hypothesis is true   The null hypothesis is false
We decide to reject the null hypothesis    Type I error                  Correct decision
We fail to reject the null hypothesis      Correct decision              Type II error

A Type I error occurs when a true null hypothesis is rejected. The value of $\alpha$ represents the probability of committing this type of error, that is,
$$\alpha = P(H_0 \text{ is rejected} \mid H_0 \text{ is true}).$$
The value of $\alpha$ is the significance level of the test.

A Type II error occurs when a false null hypothesis is not rejected. The value of $\beta$ represents the probability of committing a Type II error, that is,
$$\beta = P(H_0 \text{ is not rejected} \mid H_0 \text{ is false}).$$

The value of $1 - \beta$ is called the power of the test. It represents the probability of not making a Type II error.

Steps to perform a test of hypothesis
1. State the null and alternative hypothesis.
2. Select the distribution to use (test statistic).
3. Determine the rejection and nonrejection regions.
4. Calculate the value of the test statistic.
5. Make a decision.

4.8 Inference on population mean $\mu$: $\sigma$ known

There are three possible cases with $\sigma$ known:
Case I
The population from which the sample is selected is normally distributed.
Case II
The sample size is large ($n \ge 30$), regardless of the population distribution.
Case III
The sample size is small ($n < 30$) and the population is not normally distributed.

For case III, a nonparametric method is used to make inference on $\mu$.

For cases I and II, the normal distribution is used to make inference on $\mu$ ($\sigma$ known).
The $(1 - \alpha)100\%$ confidence interval for $\mu$ is
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the given confidence level.
The test statistic is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
The value of z calculated for a sample mean $\bar{x}$ is also called the observed value of z.

The maximum error of estimate for $\mu$, denoted by E, is the quantity that is subtracted from and added to the value of $\bar{x}$ to obtain a confidence interval for $\mu$. Thus,
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad E = z_{\alpha/2} \frac{s}{\sqrt{n}} \quad \text{or} \quad E = t_{\alpha/2} \frac{s}{\sqrt{n}}$$

Example 4.17
A research department took a sample of 36 textbooks and collected information on their prices. This
information produced a mean price of RM54.40. It is known that the prices of all textbooks follow a
normal distribution with a standard deviation of RM4.50.
(a) What is the point estimate of the mean price of all textbooks? What is the margin of error for this
estimate?
(b) Construct a 90% confidence interval for the mean price of all textbooks.
(c) Use a 0.05 significance level to test the common belief that the mean price of all textbooks is
RM53.

Let $\mu$ be the mean price of all textbooks.
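The calculations for Example 4.17 follow the five steps listed in Section 4.7 and can be sketched as below. Critical values come from `statistics.NormalDist` instead of the normal table. Part (a) does not state a confidence level for the margin of error, so the 95% level used here is an assumption, as are all variable names.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                           # standard normal distribution
n, xbar, sigma = 36, 54.40, 4.50
se = sigma / sqrt(n)                       # standard deviation of the sample mean, 0.75

# (a) Point estimate and margin of error (95% level ASSUMED, not given in the question).
point_estimate = xbar
margin_95 = Z.inv_cdf(0.975) * se

# (b) 90% confidence interval for mu.
z_90 = Z.inv_cdf(1 - 0.10 / 2)             # about 1.645
ci_90 = (xbar - z_90 * se, xbar + z_90 * se)

# (c) Two-tailed test of H0: mu = 53 against H1: mu != 53 at alpha = 0.05.
mu_0, alpha = 53, 0.05
z_obs = (xbar - mu_0) / se                 # observed value of z
z_crit = Z.inv_cdf(1 - alpha / 2)          # about 1.96
reject_h0 = abs(z_obs) > z_crit

print(round(margin_95, 2), [round(v, 2) for v in ci_90], round(z_obs, 2), reject_h0)
```

Because the observed z lies inside the nonrejection region, the sample does not give enough evidence at the 5% level against the belief that the mean price is RM53.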



4.9 Inference on population mean $\mu$: $\sigma$ not known

There are three possible cases when $\sigma$ is not known:
Case I
The population from which the sample is selected is normally distributed.
Case II
The sample size is large ($n \ge 30$), regardless of the population distribution.
Case III
The sample size is small ($n < 30$) and the population is not normally distributed.

For case III, a nonparametric method is used to make inference on $\mu$.

For cases I and II, a t-distribution is used to make inference on $\mu$ ($\sigma$ unknown).
The $(1 - \alpha)100\%$ confidence interval for $\mu$ is
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
The value of $t_{\alpha/2}$ used here is read from the t-distribution table for $\nu = n - 1$ degrees of freedom and the given confidence level.
The test statistic is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \quad \text{with } \nu = n - 1$$
The value of t calculated for a sample mean $\bar{x}$ is also called the observed value of t.

Note:
If the sample size is very large, use the normal distribution as an approximation to the t-distribution.

Example 4.18
A doctor wanted to estimate the mean cholesterol level for all adult men living in a town. He took a
sample of 25 adult men from the town and found that the mean cholesterol level for this sample is 186
with a standard deviation of 12. Assume that the cholesterol levels for all adult men in the town are
normally distributed.
(a) Construct a 95% confidence interval for the population mean, $\mu$.
(b) Testing at the 1% significance level, would you conclude that the mean cholesterol level for all
adult men living in the town is more than 180?

Let $\mu$ be the mean cholesterol level for all adult men living in the town.
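A sketch of the arithmetic for Example 4.18 follows. The Python standard library has no t quantile function, so the two critical values below are entered by hand from a standard t table for $\nu = 24$; they are inputs assumed from the table, not values computed by the code.

```python
from math import sqrt

n, xbar, s = 25, 186, 12
se = s / sqrt(n)                           # estimated standard deviation of the sample mean, 2.4

# Critical values read from a t table with nu = n - 1 = 24 (assumed inputs).
T_025_24 = 2.064                           # area 0.025 in the right tail
T_01_24 = 2.492                            # area 0.01 in the right tail

# (a) 95% confidence interval for mu.
ci_95 = (xbar - T_025_24 * se, xbar + T_025_24 * se)

# (b) Right-tailed test of H0: mu = 180 against H1: mu > 180 at alpha = 0.01.
t_obs = (xbar - 180) / se
reject_h0 = t_obs > T_01_24

print([round(v, 2) for v in ci_95], round(t_obs, 3), reject_h0)
```

The observed t is only slightly larger than the critical value, so the decision to reject is sensitive to rounding in the sample statistics.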


Example 4.19
According to a recent survey, the workers employed in manufacturing industries earned an average of
RM546 per month. Assume that this mean is based on a random sample of 1000 workers selected from
the manufacturing industries and that the standard deviation of earnings for this sample is RM75. Find a
99% confidence interval for the mean earnings of all workers employed in manufacturing industries.

Solution
Let $\mu$ be the mean earnings of all workers employed in manufacturing industries.

Sample size for estimating mean $\mu$

$$n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2$$
where $z_{\alpha/2}$ = critical z score based on the desired confidence level
E = desired margin of error
$\sigma$ = population standard deviation
When finding the sample size n, if the formula does not result in a whole number, always increase the value of n to the next larger whole number.

Remark
When $\sigma$ is not known, we can estimate $\sigma$ using these methods:
1. $\sigma \approx \text{range}/4$
2. Estimate the value of $\sigma$ by using an earlier result.

Example 4.20
Suppose we want to estimate the mean IQ score for the population of statistics professors, given that the
standard deviation of IQ scores for all statistics professors is 15. How many statistics professors must be
randomly selected for IQ tests if we want 99% confidence that the sample mean is within 2 IQ points of
the population mean?

Let X be the IQ score of a statistics professor.
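The sample size formula can be sketched as a short function that always rounds up. The function name `sample_size_for_mean` and its parameter names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from the table.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, E, confidence):
    """Smallest whole n with z_{alpha/2} * sigma / sqrt(n) <= E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / E) ** 2)


# Example 4.20: sigma = 15, E = 2, 99% confidence.
n_required = sample_size_for_mean(sigma=15, E=2, confidence=0.99)
print(n_required)
```

Note that the result is sensitive to how the critical value is rounded. With the unrounded value $z_{0.005} \approx 2.5758$ the formula gives about 373.2, which rounds up to 374, whereas the table value 2.575 gives about 372.97, which rounds up to 373.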


Example 4.21
The management of Priority Health Club claims that its members lose an average of 10 pounds or more
within the first month after joining the club. A consumer agency that wanted to check this claim took a
random sample of 36 members of this health club and found that they lost an average of 9.2 pounds
within the first month of membership with a standard deviation of 2.4 pounds.
(a) Find the rejection and nonrejection regions at the 1% significance level for the test statistic and for
the sample mean.
(b) What will the conclusion be in (a)?
(c) What are the Type I and Type II errors in this case?
(d) Compute the probability of committing a Type I error in this case.
(e) What is the probability of making a Type II error if the true mean is 9 pounds?

Let $\mu$ be the mean weight lost by all the members.
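The numerical parts of Example 4.21 can be sketched as follows. The sketch assumes the hypotheses $H_0: \mu \ge 10$ against $H_1: \mu < 10$ and, since $n = 36$ is large, uses the large-sample z statistic with s in place of $\sigma$. These modelling choices and the variable names are assumptions rather than the notes' worked solution.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                           # standard normal distribution
n, xbar, s = 36, 9.2, 2.4
mu_0, alpha = 10, 0.01
se = s / sqrt(n)                           # 0.4

# (a) Left-tailed test: reject H0 when z < z_crit, i.e. when xbar < xbar_crit.
z_crit = Z.inv_cdf(alpha)                  # about -2.326
xbar_crit = mu_0 + z_crit * se             # about 9.07

# (b) Decision based on the observed test statistic.
z_obs = (xbar - mu_0) / se                 # about -2.0
reject_h0 = z_obs < z_crit

# (d) P(Type I error) is the significance level alpha.
# (e) P(Type II error) if the true mean is 9: H0 is not rejected when xbar >= xbar_crit.
beta = 1 - NormalDist(9, se).cdf(xbar_crit)
power = 1 - beta

print(round(xbar_crit, 3), round(z_obs, 2), reject_h0, round(beta, 3), round(power, 3))
```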


The p-value is the probability, computed assuming that $H_0$ is true, of observing a value of the test statistic that is at least as contradictory to $H_0$ and supportive of $H_1$ as the value actually computed from the sample data.

The p-value method in hypothesis testing

Right-tailed test: p-value = area to the right of the test statistic z
Left-tailed test: p-value = area to the left of the test statistic z
Two-tailed test: p-value = twice the area of the extreme region bounded by the test statistic z

Criteria for decision making

Reject $H_0$ if p-value $\le \alpha$ (the significance level)
Do not reject $H_0$ if p-value $> \alpha$ (the significance level)

Example 4.22
A sample of 106 body temperatures has a mean of 98.2°F. Assume that the sample is a simple
random sample and that the population standard deviation is known to be 0.62°F. Use a 0.05 significance
level to test the common belief that the mean body temperature of healthy adults is equal to 98.6°F.
Find the p-value of the test.

Let $\mu$ be the mean body temperature of all healthy adults.
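The three p-value rules above can be collected into one helper and applied to Example 4.22. This is a sketch, reading the temperatures as 98.2°F, 0.62°F and 98.6°F; the function name `z_p_value` and its `tail` argument are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                           # standard normal distribution


def z_p_value(z, tail):
    """p-value of a z test; tail is 'right', 'left' or 'two'."""
    if tail == "right":
        return 1 - Z.cdf(z)
    if tail == "left":
        return Z.cdf(z)
    if tail == "two":
        return 2 * Z.cdf(-abs(z))
    raise ValueError("tail must be 'right', 'left' or 'two'")


# Example 4.22: H0: mu = 98.6 against H1: mu != 98.6, sigma known.
n, xbar, sigma = 106, 98.2, 0.62
mu_0, alpha = 98.6, 0.05
z_obs = (xbar - mu_0) / (sigma / sqrt(n))
p_value = z_p_value(z_obs, "two")
reject_h0 = p_value <= alpha

print(round(z_obs, 2), p_value, reject_h0)
```

The observed z is far in the left tail, so the p-value is much smaller than any commonly used significance level.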


Example 4.23
From the past record of a bank, with the old computer system, a teller at this bank could serve, on average
22 customers per hour. Recently, a new system was installed, expecting that it would increase the service
rate. To check if the new computer system is more efficient than the old system, the management took a
random sample of 18 hours and found that during these hours the mean number of customers served by
tellers was 28 per hour with a standard deviation of 2.5. Testing at the 1% significance level, would you
conclude that the new computer system is more efficient than the old computer system? Assume that the
number of customers served per hour by a teller is approximately normally distributed.

Let $\mu$ be the mean number of customers served per hour by a teller.

Example 4.24
A psychologist claims that the mean age at which children start walking is 12.5 months. Carol wanted to
check if this claim is true. She took a random sample of 18 children and found that the mean age at which
these children started walking was 12.9 months with a standard deviation of 0.80 month. Using the 1%
significance level, can you conclude that the mean age at which all children start walking is different
from 12.5 months? Assume that the ages at which all children start walking have an approximately
normal distribution.

Let $\mu$ be the mean age at which all children start walking.
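Examples 4.23 and 4.24 both use the t statistic with $\nu = 18 - 1 = 17$, one as a right-tailed test and the other as a two-tailed test. In the sketch below the critical values are typed in from a standard t table (assumed inputs, not computed), and the helper name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(xbar, mu_0, s, n):
    """Observed value of t for a test about a population mean."""
    return (xbar - mu_0) / (s / sqrt(n))


# Critical values read from a t table with nu = 17 (assumed inputs).
T_01_17 = 2.567                            # area 0.01 in the right tail
T_005_17 = 2.898                           # area 0.005 in the right tail

# Example 4.23: H0: mu = 22 against H1: mu > 22, alpha = 0.01 (right-tailed).
t_423 = t_statistic(28, 22, 2.5, 18)
reject_423 = t_423 > T_01_17

# Example 4.24: H0: mu = 12.5 against H1: mu != 12.5, alpha = 0.01 (two-tailed).
t_424 = t_statistic(12.9, 12.5, 0.80, 18)
reject_424 = abs(t_424) > T_005_17

print(round(t_423, 2), reject_423, round(t_424, 2), reject_424)
```

For the two-tailed test the significance level is split between the tails, which is why the critical value for an area of 0.005 is used.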


4.10 Inferences about the population variance, $\sigma^2$ (single population)

Assuming that the population is (approximately) normal, the $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are obtained from the chi-square distribution with $\nu = n - 1$ degrees of freedom for areas of $\alpha/2$ and $1 - \alpha/2$ in the right tail, respectively.

When the population is (approximately) normally distributed, the test statistic for a test of hypothesis about $\sigma^2$ is
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}.$$
The value of $\sigma^2$ is substituted from $H_0$.

Example 4.25
ABC is producing cookies. The machine that fills packages of these cookies is set up in such a way that
the average net weight of these packages is 32 ounces with a variance of 0.015 square ounce. From time
to time, the quality control inspector at the company selects a sample of a few such packages and calculates
the variance of the net weights of these packages. Assume that the net weights of cookies in all packages
are normally distributed.
(a) A recently taken random sample of 25 packages from the production line gave a sample variance
of 0.029 square ounce. Construct a 95% confidence interval for the population variance of the net
weights of these packages.
(b) The acceptable value of the population variance is 0.015 square ounce or less. A recently taken
random sample of 30 packages from the production line gave a sample variance of 0.025 square
ounce. Using $\alpha = 0.01$, do you conclude that the population variance is within the acceptable
limit?

Let $\sigma^2$ be the variance of the net weights of these packages.
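A sketch of the calculations for Example 4.25 follows. The chi-square critical values are typed in from a standard chi-square table (assumed inputs, not computed by the code), part (b) is treated as a right-tailed test of $H_0: \sigma^2 \le 0.015$ against $H_1: \sigma^2 > 0.015$, and the function name `variance_ci` is illustrative.

```python
# Critical values read from a chi-square table (assumed inputs).
CHI2_025_24 = 39.364                       # nu = 24, area 0.025 in the right tail
CHI2_975_24 = 12.401                       # nu = 24, area 0.975 in the right tail
CHI2_01_29 = 49.588                        # nu = 29, area 0.01 in the right tail


def variance_ci(s2, n, chi2_small_area, chi2_large_area):
    """Confidence interval for sigma^2: (n-1)s^2 divided by each critical value.
    chi2_small_area is the value with right-tail area alpha/2 (lower limit),
    chi2_large_area the value with right-tail area 1 - alpha/2 (upper limit)."""
    return ((n - 1) * s2 / chi2_small_area, (n - 1) * s2 / chi2_large_area)


# (a) 95% confidence interval with n = 25, s^2 = 0.029.
ci_95 = variance_ci(0.029, 25, CHI2_025_24, CHI2_975_24)

# (b) Right-tailed test with n = 30, s^2 = 0.025, sigma_0^2 = 0.015, alpha = 0.01.
chi2_obs = (30 - 1) * 0.025 / 0.015
reject_h0 = chi2_obs > CHI2_01_29

print([round(v, 4) for v in ci_95], round(chi2_obs, 2), reject_h0)
```

The larger critical value gives the lower limit of the interval because it appears in the denominator.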
