
MTPDF9 Statistical Inference of Two Samples

The document outlines the procedures for hypothesis testing involving two samples, focusing on the difference in means of two normal distributions. It details the formulation of null and alternative hypotheses, the conditions for performing z-tests and t-tests, and provides examples to illustrate the application of these tests. Additionally, it distinguishes between independent and dependent samples and explains the necessary conditions for conducting tests on paired data.

COE0011 – Engineering Data Analysis

Inference on the Difference in Means of Two Normal Distributions
MPS Department | FEU Institute of Technology
OBJECTIVES

• Discuss the procedure for testing statistical hypotheses
• Formulate and test statistical hypotheses involving two samples
• Apply the hypothesis-testing formulas for two samples in various fields
Subtopic 1
Inference on the Difference in Means of Two Normal Distributions

• Inference on the Difference in Means of Two Normal Distributions, Variances Known
• Inference on the Difference in Means of Two Normal Distributions, Variances Unknown
In a two-sample hypothesis test, two parameters from two populations are compared.

• For a two-sample hypothesis test,

1. the null hypothesis H0 is a statistical hypothesis that usually states there is
no difference between the parameters of two populations. The null
hypothesis always contains the symbol ≤, =, or ≥.
2. the alternative hypothesis Ha is a statistical hypothesis that is true when H0
is false. The alternative hypothesis always contains the symbol >, ≠, or <.
To write a null and alternative hypothesis for a two-sample hypothesis test, translate the
claim made about the population parameters from a verbal statement to a mathematical
statement.

H0: μ1 = μ2        H0: μ1 ≤ μ2        H0: μ1 ≥ μ2
Ha: μ1 ≠ μ2        Ha: μ1 > μ2        Ha: μ1 < μ2

Regardless of which pair of hypotheses is used, μ1 = μ2 is assumed to be true when the test is carried out.


Three conditions are necessary to perform a z-test for the difference between two
population means μ1 and μ2.

1. The samples must be randomly selected.


2. The samples must be independent. Two samples are independent if the
sample selected from one population is not related to the sample selected
from the second population.
3. Each sample size must be at least 30, or, if not, each population must have a
normal distribution with a known standard deviation.
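If these requirements are met, the sampling distribution for \( \bar{x}_1 - \bar{x}_2 \) (the
difference of the sample means) is a normal distribution with mean

\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \]

and standard error

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

[Figure: sampling distribution for \( \bar{x}_1 - \bar{x}_2 \), a normal curve centered at \( \mu_1 - \mu_2 \) with standard error \( \sigma_{\bar{x}_1 - \bar{x}_2} \)]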
Two-Sample z-Test for the Difference Between Means
A two-sample z-test can be used to test the difference between two population
means μ1 and μ2 when a large sample (at least 30) is randomly selected from each
population and the samples are independent. The test statistic is \( \bar{x}_1 - \bar{x}_2 \) and the
standardized test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \quad \text{where} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

When the samples are large, you can use s1 and s2 in place of σ1 and σ2. If the
samples are not large, you can still use a two-sample z-test, provided the
populations are normally distributed and the population standard deviations are
known.
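
The standardized test statistic can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `two_sample_z`, the parameter `delta0` (the hypothesized value of μ1 – μ2) and the summary statistics in the usage lines are all assumptions made for the example.

```python
import math

def two_sample_z(xbar1, xbar2, sd1, sd2, n1, n2, delta0=0.0):
    """Return (standard_error, z) for the difference between two means.

    sd1 and sd2 are the population standard deviations, or the sample
    standard deviations when both samples are large. delta0 is the
    hypothesized value of mu1 - mu2 (0 under the usual null hypothesis).
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((xbar1 - xbar2) - delta0) / standard_error
    return standard_error, z

# Hypothetical summary statistics, chosen only to illustrate the call.
se, z = two_sample_z(xbar1=52.0, xbar2=50.0, sd1=6.0, sd2=8.0, n1=36, n2=64)
print(f"standard error = {se:.4f}, z = {z:.4f}")
```

Keeping the standard error as a separate return value mirrors the two-step hand calculation used in the worked examples.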
In Words                                                    In Symbols
1. State the claim mathematically. Identify the null and    State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.                       Identify α.
3. Sketch the sampling distribution.
4. Determine the critical value(s).                         Use z-table.
5. Determine the rejection region(s).
6. Find the standardized test statistic.                    \( z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \)
7. Make a decision to reject or fail to reject the          If z is in the rejection region, reject H0.
   null hypothesis.                                         Otherwise, fail to reject H0.
8. Interpret the decision in the context of the
   original claim.
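
Steps 4, 5 and 7 can be sketched in Python using the standard library's `statistics.NormalDist` in place of a printed z-table. The helper names `z_critical` and `in_rejection_region` and the `tail` argument are assumptions introduced for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical value(s) of the standard normal distribution.

    tail is "left", "right", or "two". Returns a (lower, upper) pair in
    which a bound that does not apply is None.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return None, std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha), None
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return -upper, upper
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z, alpha, tail):
    """True when the standardized test statistic z leads to rejecting H0."""
    lower, upper = z_critical(alpha, tail)
    return (lower is not None and z < lower) or (upper is not None and z > upper)

# A right-tailed test at alpha = 0.10 has a critical value near 1.28.
print(z_critical(0.10, "right"))
print(in_rejection_region(2.161, 0.10, "right"))
```

The decision rule is the same for every tail type: reject H0 only when z falls beyond a critical value on the side(s) named by Ha.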
A high school math teacher claims that students in her class will score higher on
the math portion of the ACT than students in a colleague’s math class. The mean
ACT math score for 49 students in her class is 22.1 and the standard deviation is
4.8. The mean ACT math score for 44 of the colleague’s students is 19.8 and the
standard deviation is 5.4. At α = 0.10, can the teacher’s claim be supported?

H0: μ1 ≤ μ2
Ha: μ1 > μ2 (Claim)

The test is right-tailed with α = 0.10, so the critical value is z0 = 1.28 and the
rejection region is z > 1.28.

The standard error is

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{4.8^2}{49} + \frac{5.4^2}{44}} \approx 1.0644. \]

The standardized test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(22.1 - 19.8) - 0}{1.0644} \approx 2.161. \]

Because z ≈ 2.161 lies in the rejection region, reject H0.

There is enough evidence at the 10% level to support the teacher’s claim that her
students score better on the ACT.
• The production manager of Risen Manufacturing would like to decide
which of the two plants should be given the responsibility of producing
the soft drink bottle cups. This decision is to be based on productivity
levels. A sample of 50 days at the Golden Star Plant produced a mean
of 104.6 thousand cups a day with s = 13.4 thousand. A sample of 60 days
at the Blue Moon Plant produced a mean of 98.7 thousand cups a day with
s = 15.2 thousand. Do these plants differ significantly in production
level? Use a 0.05 level of significance.

• Ans: The two plants differ significantly in production level.
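
The stated answer can be checked numerically. The sketch below assumes a two-tailed test (H0: μ1 = μ2, Ha: μ1 ≠ μ2, since the question asks whether the plants differ) and uses the sample standard deviations in place of σ1 and σ2 because both samples are large. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, xbar1, s1 = 50, 104.6, 13.4   # Golden Star Plant (thousand cups per day)
n2, xbar2, s2 = 60, 98.7, 15.2    # Blue Moon Plant (thousand cups per day)
alpha = 0.05

# Standard error of the difference of the sample means.
standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Standardized test statistic under H0: mu1 - mu2 = 0.
z = (xbar1 - xbar2) / standard_error

# Two-tailed critical value; the rejection region is |z| > z0.
z0 = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z0

print(f"SE = {standard_error:.4f}, z = {z:.3f}, z0 = {z0:.3f}, reject H0: {reject_h0}")
```

With these figures z is about 2.16, which exceeds the critical value of about 1.96, so H0 is rejected, in agreement with the stated answer.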


If samples of size less than 30 are taken from normally-distributed populations, a
t-test may be used to test the difference between the population means μ1 and
μ2.

Three conditions are necessary to use a t-test for small independent samples:
1. The samples must be randomly selected.
2. The samples must be independent. Two samples are independent if the sample
selected from one population is not related to the sample selected from the
second population.
3. Each population must have a normal distribution.
A two-sample t-test is used to test the difference between two population means μ1 and
μ2 when a sample is randomly selected from each population. Performing this test
requires each population to be normally distributed, and the samples should be
independent. The standardized test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}. \]

If the population variances are equal, then information from the two samples is combined
to calculate a pooled estimate of the standard deviation, \( \hat{\sigma} \):

\[ \hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]

The standard error for the sampling distribution of \( \bar{x}_1 - \bar{x}_2 \) is

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \text{(variances equal)} \]

and d.f. = n1 + n2 – 2.

If the population variances are not equal, then the standard error is

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \text{(variances not equal)} \]

and d.f. = smaller of n1 – 1 or n2 – 1.
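
The two cases can be sketched as one Python helper that returns the standard error together with the matching degrees of freedom. The function names, the `equal_variances` flag and the numbers in the usage lines are assumptions made for this illustration.

```python
import math

def t_standard_error(s1, s2, n1, n2, equal_variances):
    """Return (standard_error, degrees_of_freedom) for a two-sample t-test."""
    if equal_variances:
        # Pooled estimate of the common standard deviation.
        pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        return pooled_sd * math.sqrt(1 / n1 + 1 / n2), n1 + n2 - 2
    # Variances not equal: use the smaller of n1 - 1 and n2 - 1.
    return math.sqrt(s1**2 / n1 + s2**2 / n2), min(n1 - 1, n2 - 1)

def two_sample_t(xbar1, xbar2, s1, s2, n1, n2, equal_variances, delta0=0.0):
    """Return (t, degrees_of_freedom); delta0 is the hypothesized mu1 - mu2."""
    se, df = t_standard_error(s1, s2, n1, n2, equal_variances)
    return ((xbar1 - xbar2) - delta0) / se, df

# Hypothetical summary statistics for two small samples.
t_pooled, df_pooled = two_sample_t(20.0, 18.0, 3.0, 3.0, 10, 10, equal_variances=True)
t_unpooled, df_unpooled = two_sample_t(20.0, 18.0, 3.0, 3.0, 10, 10, equal_variances=False)
print(f"pooled: t = {t_pooled:.4f}, d.f. = {df_pooled}")
print(f"not pooled: t = {t_unpooled:.4f}, d.f. = {df_unpooled}")
```

When the two sample sizes and standard deviations coincide, as in this made-up case, both routes give the same t and differ only in the degrees of freedom, which changes the critical value read from the t-table.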


In Words                                                    In Symbols
1. State the claim mathematically. Identify the null and    State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.                       Identify α.
3. Identify the degrees of freedom and sketch the           d.f. = n1 + n2 – 2 or
   sampling distribution.                                   d.f. = smaller of n1 – 1 or n2 – 1.
4. Determine the critical value(s).                         Use t-table.
5. Determine the rejection region(s).
6. Find the standardized test statistic.                    \( t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \)
7. Make a decision to reject or fail to reject the null     If t is in the rejection region, reject H0.
   hypothesis.                                              Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original
   claim.
A random sample of 17 police officers in Brownsville has a mean annual income of
$35,800 and a standard deviation of $7,800. In Greensville, a random sample of 18
police officers has a mean annual income of $35,100 and a standard deviation of
$7,375. Test the claim at α = 0.01 that the mean annual incomes in the two cities
are not the same. Assume the population variances are equal.

H0: μ1 = μ2
Ha: μ1 ≠ μ2 (Claim)

d.f. = n1 + n2 – 2 = 17 + 18 – 2 = 33

The test is two-tailed with α = 0.01, so each tail has area 0.005. The critical values
are –t0 = –2.733 and t0 = 2.733, and the rejection regions are t < –2.733 and t > 2.733.

The standard error is

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

\[ = \sqrt{\frac{(17 - 1)7800^2 + (18 - 1)7375^2}{17 + 18 - 2}} \cdot \sqrt{\frac{1}{17} + \frac{1}{18}} \approx 7584.0355(0.3382) \approx 2564.92. \]

The standardized test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(35800 - 35100) - 0}{2564.92} \approx 0.273. \]

Because t ≈ 0.273 is not in either rejection region, fail to reject H0.

There is not enough evidence at the 1% level to support the claim that the mean
annual incomes differ.
Example

• The director of Universal Communications is planning to put more telephone
operators on the evening shift if his assistant’s report is valid: that telephone calls
are fewer in the afternoon than in the evening. To test the validity of the report, the
number of afternoon and evening calls was recorded. Results showed that in a
sample of 15 days the afternoon calls averaged 425 with s = 47, and in 17 days the
evening calls averaged 460 with s = 50. Should there be additional operators on the
evening shift? Use a 0.01 level of significance.

• Ans: There is no need to add telephone operators in the evening.
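
The stated answer can be reproduced with a short calculation. The sketch assumes a left-tailed test (H0: μ1 ≥ μ2, Ha: μ1 < μ2, with population 1 the afternoon calls) and equal population variances, which the problem statement does not specify. The critical value –2.457 is the t-table entry for a one-tailed α = 0.01 with 30 degrees of freedom.

```python
import math

n1, xbar1, s1 = 15, 425.0, 47.0   # afternoon calls
n2, xbar2, s2 = 17, 460.0, 50.0   # evening calls

# Assumption: equal population variances, so the pooled estimate is used.
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
df = n1 + n2 - 2

# Standardized test statistic under H0: mu1 - mu2 = 0.
t = (xbar1 - xbar2) / standard_error

# Left-tailed critical value for alpha = 0.01 and d.f. = 30 (from a t-table).
t0 = -2.457
reject_h0 = t < t0

print(f"d.f. = {df}, SE = {standard_error:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Here t is about –2.03, which is not below –2.457, so H0 is not rejected and the report that afternoon calls are fewer is not supported at the 1% level. Treating the variances as unequal (d.f. = 14, critical value –2.624) leads to the same decision.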


Two samples are independent if the sample selected from one population is not
related to the sample selected from the second population. Two samples are
dependent if each member of one sample corresponds to a member of the other
sample. Dependent samples are also called paired samples or matched samples.

[Figure: independent samples versus dependent samples]

Classify each pair of samples as independent or dependent.

Sample 1: The weight of 24 students in a first-grade class
Sample 2: The height of the same 24 students

These samples are dependent because the weight and height can be paired with
respect to each student.

Sample 1: The average price of 15 new trucks
Sample 2: The average price of 20 used sedans

These samples are independent because it is not possible to pair the new trucks
with the used sedans. The data represent prices for different vehicles.
To perform a two-sample hypothesis test with dependent samples, the difference
between each data pair is first found:

d = x1 – x2      Difference between entries for a data pair.

\( \bar{d} = \frac{\sum d}{n} \)      Mean of the differences between paired data entries in the dependent samples.

Three conditions are required to conduct the test.

1. The samples must be randomly selected.
2. The samples must be dependent (paired).
3. Both populations must be normally distributed.

[Figure: sampling distribution of \( \bar{d} \), centered at μd, with critical values –t0 and t0]
The following symbols are used for the t-test for μd.

Symbol          Description
n               The number of pairs of data
d               The difference between entries for a data pair, d = x1 – x2
μd              The hypothesized mean of the differences of paired data in the population
\( \bar{d} \)   The mean of the differences between the paired data entries in the dependent samples,
                \( \bar{d} = \frac{\sum d}{n} \)
sd              The standard deviation of the differences between the paired data entries in the
                dependent samples, \( s_d = \sqrt{\frac{n(\sum d^2) - (\sum d)^2}{n(n - 1)}} \)
A t-test can be used to test the difference of two population means when a sample is
randomly selected from each population. The requirements for performing the test are
that each population must be normal and each member of the first sample must be paired
with a member of the second sample.

The test statistic is

\[ \bar{d} = \frac{\sum d}{n} \]

and the standardized test statistic is

\[ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}. \]

The degrees of freedom are d.f. = n – 1.
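
A minimal Python sketch of the paired t-test follows, using the shortcut formula for the standard deviation of the differences. The function name `paired_t` and the measurements in the usage lines are hypothetical and serve only to show the calculation.

```python
import math

def paired_t(sample1, sample2, mu_d=0.0):
    """Return (d_bar, s_d, t, df) for paired samples, with d = x1 - x2."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    d = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(value * value for value in d)
    d_bar = sum_d / n
    # Shortcut formula: s_d = sqrt((n * sum(d^2) - (sum d)^2) / (n * (n - 1)))
    s_d = math.sqrt((n * sum_d2 - sum_d**2) / (n * (n - 1)))
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

# Hypothetical paired measurements (four pairs).
x1 = [12, 15, 11, 14]
x2 = [10, 14, 12, 11]
d_bar, s_d, t, df = paired_t(x1, x2)
print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t = {t:.4f}, d.f. = {df}")
```

The order of subtraction fixes the sign of every d and therefore the direction of the alternative hypothesis, so it should be stated before the test is run.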
In Words                                                    In Symbols
1. State the claim mathematically. Identify the null and    State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.                       Identify α.
3. Identify the degrees of freedom and sketch the           d.f. = n – 1
   sampling distribution.
4. Determine the critical value(s).                         Use t-table.
5. Determine the rejection region(s).
6. Calculate \( \bar{d} \) and \( s_d \). Use a table.      \( \bar{d} = \frac{\sum d}{n} \), \( s_d = \sqrt{\frac{n(\sum d^2) - (\sum d)^2}{n(n - 1)}} \)
7. Find the standardized test statistic.                    \( t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \)
8. Make a decision to reject or fail to reject the null     If t is in the rejection region, reject H0.
   hypothesis.                                              Otherwise, fail to reject H0.
9. Interpret the decision in the context of the original
   claim.
A reading center claims that students will perform better on a standardized reading test
after going through the reading course offered by their center. The table shows the
reading scores of 6 students before and after the course. At α = 0.05, is there enough
evidence to conclude that the students’ scores after the course are better than the scores
before the course?

H0: μd ≤ 0
Ha: μd > 0 (Claim)

d.f. = 6 – 1 = 5

The test is right-tailed with α = 0.05, so the critical value is t0 = 2.015 and the
rejection region is t > 2.015.

d = (score after) – (score before)

Student          1     2     3     4     5     6
Score (before)   85    96    70    76    81    78
Score (after)    88    85    89    86    92    89
d                3     –11   19    10    11    11     Σd = 43
d²               9     121   361   100   121   121    Σd² = 833

\[ \bar{d} = \frac{\sum d}{n} = \frac{43}{6} \approx 7.167 \]

\[ s_d = \sqrt{\frac{n(\sum d^2) - (\sum d)^2}{n(n - 1)}} = \sqrt{\frac{6(833) - 1849}{6(5)}} \approx \sqrt{104.967} \approx 10.245 \]

The standardized test statistic is

\[ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \approx \frac{7.167 - 0}{10.245 / \sqrt{6}} \approx 1.714. \]

Because t ≈ 1.714 is not in the rejection region, fail to reject H0.

There is not enough evidence at the 5% level to support the claim that the students’
scores after the course are better than the scores before the course.
Elementary Statistics by Bluman
Subtopic 2
Inference on the Variance and Proportions of Two Normal Distributions

• Inference on the Variance of Two Normal Distributions
• Inference on Two Population Proportions
The F Distribution
We wish to test the hypotheses:

\[ H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2 \]

• The development of a test procedure for these hypotheses requires a new probability
distribution, the F distribution.
[Figure: upper and lower percentage points of the F distribution]

[Figure: probability density functions of two F distributions]

The lower-tail percentage points \( f_{1-\alpha,u,v} \) can be found as follows:

\[ f_{1-\alpha,u,v} = \frac{1}{f_{\alpha,v,u}} \]

Hypothesis Tests on the Ratio of Two Variances
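
A minimal Python sketch of a test on the ratio of two variances follows. It assumes the usual two-sided hypotheses H0: σ1² = σ2² against Ha: σ1² ≠ σ2² and the statistic f0 = s1²/s2², which has u = n1 – 1 and v = n2 – 1 degrees of freedom when both populations are normal. The function names and the sample data are hypothetical. The upper critical value 15.44 is the F-table entry for α/2 = 0.025 with 3 and 3 degrees of freedom, and the lower critical value uses the reciprocal relation for lower-tail percentage points.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """Return (f0, u, v): ratio of sample variances and its degrees of freedom."""
    s1_squared = variance(sample1)   # sample variance, divisor n1 - 1
    s2_squared = variance(sample2)   # sample variance, divisor n2 - 1
    return s1_squared / s2_squared, len(sample1) - 1, len(sample2) - 1

def two_sided_f_decision(f0, f_upper_u_v, f_upper_v_u):
    """Reject H0 when f0 is above f_{a/2,u,v} or below 1 / f_{a/2,v,u}."""
    lower = 1.0 / f_upper_v_u
    return f0 > f_upper_u_v or f0 < lower

# Hypothetical measurements from two processes.
sample1 = [4.0, 6.0, 8.0, 10.0]
sample2 = [5.0, 6.0, 7.0, 8.0]
f0, u, v = f_ratio(sample1, sample2)

# Table value f_{0.025,3,3} = 15.44; u and v are equal here, so it serves both tails.
reject_h0 = two_sided_f_decision(f0, 15.44, 15.44)
print(f"f0 = {f0:.3f}, u = {u}, v = {v}, reject H0: {reject_h0}")
```

Passing the table values in as arguments keeps the sketch free of external libraries; with a statistics package available, the critical values would normally be computed rather than looked up.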
Example 1
Type II Error and Choice of Sample Size
Example 2
Confidence Interval on the Ratio of Two Variances
Example 3
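Large-Sample Test on the Difference in Population Proportions
We wish to test the hypotheses:

\[ H_0: p_1 = p_2 \qquad H_a: p_1 \neq p_2 \]

The following test statistic is distributed approximately as standard normal and is the
basis of the test:

\[ Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{P} = \frac{X_1 + X_2}{n_1 + n_2} \]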
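
A minimal Python sketch of this large-sample test follows. It assumes the standard pooled-proportion statistic, z0 = (p̂1 – p̂2) / sqrt(p̂(1 – p̂)(1/n1 + 1/n2)) with p̂ = (x1 + x2)/(n1 + n2), and a two-sided alternative. The function name `two_proportion_z` and the counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Return (z0, two_sided_p_value) for H0: p1 = p2.

    x1 and x2 are the numbers of successes in samples of size n1 and n2.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value

# Hypothetical counts: 60 successes out of 100 versus 50 out of 100.
z0, p_value = two_proportion_z(60, 100, 50, 100)
print(f"z0 = {z0:.4f}, two-sided P-value = {p_value:.4f}")
```

For a one-sided alternative only the P-value line changes: use the single tail area in the direction named by Ha.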
Example 4
• Elementary Statistics by Bluman
• http://slideshare.com
