Flipped Notes 8 Hypothesis Testing

This document discusses key concepts in hypothesis testing including: 1. Formulating the null hypothesis (H0) and alternative hypothesis (Ha). H0 represents the default assumption and contains an equals sign, while Ha contradicts H0. 2. Distinguishing between one-sided and two-sided hypotheses. A one-sided Ha tests if a parameter differs in one direction from H0, while a two-sided tests if it differs in either direction. 3. Defining the two types of errors in hypothesis testing - rejecting H0 when it is true or failing to reject H0 when it is false. Hypothesis tests aim to minimize errors.


Republic of the Philippines
Cagayan State University
Carig Campus

COLLEGE OF ENGINEERING

FLIPPED NOTES NUMBER 8

In partial fulfilment of the requirements for the course
ENGINEERING DATA ANALYSIS

By:
SUBONG, JOEMAR D.
BACANI, VALERIE ELAINE M.
DOCA, AL JOHNKENETH A.
TANNAGAN, NOREEN G.

January 06, 2020

CSU Vision: Transforming lives by Educating for the BEST.

CSU Mission: CSU is committed to transform the lives of people and communities through high quality instruction and innovative research, development, production and extension.

CSU – IGA: Competence, Social Responsibility, Unifying Presence

COE – IGA: Innovative Thinking, Synthesis, Personal Responsibility, Empathy, Research Skill, Entrepreneurial Skill
UNIT I
Hypothesis Testing

LEARNING OUTCOMES:

In this lesson we are going to look at the parts of hypothesis testing, which are essential to
thoroughly understanding the whole lesson on hypothesis testing. At the end of this unit, the
following targets are to be accomplished:

 To formulate hypotheses based on the given inferential question


 To differentiate null hypothesis and alternative hypothesis
 To differentiate one-sided and two-sided hypotheses
 To identify the two types of errors that are useful in deciding whether to accept or reject
the hypothesis
 To make decisions on testing the hypothesis based on the rejection regions and critical
values
Introduction

Consider the hypothetical situation:

From previous experience we know that the birth weights of babies in England are normally
distributed with a mean of 3000g and a standard deviation of 500g.

We think that maybe babies in Australia have a mean birth weight greater than 3000g and we
would like to test this hypothesis.

The statistical method that is used in making statistical decisions using experimental data, based
on an assumption that we make about a population parameter, is called hypothesis testing.
A hypothesis test is a formal way to make a decision based on statistical analysis. It refers to the
process of making inferences or educated guesses about a particular parameter. This can either
be done using statistics and sample data, or it can be done on the basis of an uncontrolled
observational study.

Formulating of Hypotheses
Establishing a new study or theory always begins with the formation of an assumption or claim,
known in the field of statistics as a hypothesis.

A hypothesis is used to explain a phenomenon or predict a relationship in communication research.


There are four evaluation criteria that a hypothesis must meet.

 It must state an expected relationship between variables. 


 It must be testable and falsifiable; researchers must be able to test whether a hypothesis is
true or false.
 It should be consistent with the existing body of knowledge
 It should be stated as simply and concisely as possible.
Formulating a hypothesis requires a specific, testable, and predictable statement driven by
theoretical guidance and/or prior evidence. A hypothesis can be formulated in various research
designs. In experimental settings, researchers compare two or more groups of research
participants to investigate the differences of the research outcomes.

Hypothesis testing is a set of logical and statistical guidelines used to make decisions about
population parameters from sample statistics. The intent of hypothesis testing is to formally examine two
opposing hypotheses, the null and alternative hypotheses. These two hypotheses are mutually
exclusive and exhaustive.

Definition of Null and Alternative Hypotheses

1. The null hypothesis, denoted H0, is the statement about the population parameter that is
assumed to be true unless there is convincing evidence to the contrary. The null hypothesis
attempts to show that no variation exists between variables or that a single variable is no
different than its mean. It is presumed to be true until statistical evidence nullifies it in favor of an
alternative hypothesis. Simply put, the null hypothesis is a type of conjecture used in statistics
that proposes that no statistical significance exists in a set of given observations.

For example, suppose a hypothesis test is set up so that the alternative hypothesis states that the
population parameter is not equal to the claimed value, say a mean cook time of 12 minutes. Then
the population mean cook time could be less than or greater than the stated value. If the null
hypothesis is accepted, that is, if the statistical test indicates that the population mean is
12 minutes, then the alternative hypothesis is rejected, and vice versa.

2. The alternative hypothesis, denoted Ha, is a statement about the population parameter that is
contradictory to the null hypothesis, and is accepted as true only if there is convincing
evidence in favor of it.

The alternative hypothesis can be either one-sided or two-sided.

Two-sided
Use a two-sided alternative hypothesis (also known as a nondirectional hypothesis) to determine
whether the population parameter is either greater than or less than the hypothesized value. A
two-sided test can detect when the population parameter differs in either direction, but has less
power than a one-sided test.

For example, a researcher has results for a sample of students who took a national exam at a high
school. The researcher wants to know if the scores at that school differ from the national average
of 850. A two-sided alternative hypothesis (also known as a nondirectional hypothesis) is
appropriate because the researcher is interested in determining whether the scores are either less
than or greater than the national average. (H0: μ = 850 vs. Ha: μ≠ 850)
One-sided
Use a one-sided alternative hypothesis (also known as a directional hypothesis) to determine
whether the population parameter differs from the hypothesized value in a specific direction.
You can specify the direction to be either greater than or less than the hypothesized value. A one-
sided test has greater power than a two-sided test, but it cannot detect whether the population
parameter differs in the opposite direction.

For example, a researcher has exam results for a sample of students who took a training course
for a national exam. The researcher wants to know if trained students score above the national
average of 850. A one-sided alternative hypothesis (also known as a directional hypothesis) can
be used because the researcher is specifically hypothesizing that scores for trained students are
greater than the national average. (H0: μ = 850 vs. Ha: μ > 850)

Hypothesis testing allows a mathematical model to validate or reject a null hypothesis within a
certain confidence level.

The end result of a hypotheses testing procedure is a choice of one of the following two possible
conclusions:

1. Reject H0 (and therefore accept Ha), or

2. Fail to reject H0 (and therefore fail to accept Ha).

The null hypothesis typically represents the status quo, or what has historically been true.
Consider, for example, a manufacturer that claims its emergency respirators deliver air for an
average of 75 minutes. We would believe the claim of the manufacturer unless there is reason
not to do so, so the null hypothesis is H0: μ = 75. The alternative hypothesis in this example is the
contradictory statement Ha: μ < 75. The null hypothesis will always be an assertion containing an
equals sign, but depending on the situation the alternative hypothesis can have any one of three
forms: with the symbol <, as in the example just discussed, with the symbol >, or with the
symbol ≠. The following two examples illustrate the latter two cases.

Example 1. The recipe for a bakery item is designed to result in a product that contains 8 grams
of fat per serving. The quality control department samples the product periodically to ensure that
the production process is working as designed. State the relevant null and alternative hypotheses.

Solution: The default option is to assume that the product contains the amount of fat it was
formulated to contain unless there is compelling evidence to the contrary. Thus the null
hypothesis is H0:μ=8.0 . Since to contain either more fat than desired or to contain less fat than
desired are both an indication of a faulty production process, the alternative hypothesis in this
situation is that the mean is different from 8.0, so Ha:μ≠8.0.

Example 2. A publisher of college textbooks claims that the average price of all hardbound
college textbooks is $127.50 . A student group believes that the actual mean is higher and
wishes to test their belief. State the relevant null and alternative hypotheses.
Solution: The default option is to assume that the publisher's claim is true unless there is
compelling evidence to the contrary. Thus the null hypothesis is H0: μ = 127.50. Since the
student group believes that the actual mean is higher, the alternative hypothesis in this situation
is that the mean is greater than 127.50, so Ha: μ > 127.50.

In order to make the null and alternative hypotheses easy to distinguish, in every example and
problem in this text we will always present one of the two competing claims about the value of a
parameter with an equality. The claim expressed with an equality is the null hypothesis. This is
the same as always stating the null hypothesis in the least favorable light. So in the introductory
example about the respirators, we stated the manufacturer’s claim as “the average is 75
minutes” instead of the perhaps more natural “the average is at least 75 minutes,” essentially
reducing the presentation of the null hypothesis to its worst case.

Types of error for hypothesis Testing


The goal of any hypothesis testing is to make a decision. In particular, we will decide whether to
reject the null hypothesis, H0, in favor of the alternative hypothesis, Ha. Although we would like
always to be able to make a correct decision, we must remember that the decision will be based
on sample information, and thus we are subject to make one of two types of error, as defined in
the accompanying boxes.

The format of the testing procedure in general terms is to take a sample and use the information
it contains to come to a decision about the two hypotheses. As stated before, our decision will
always be either:

1. Reject the null hypothesis H0 in favor of the alternative Ha presented, or

2. Do not reject the null hypothesis H0 in favor of the alternative Ha presented.

There are four possible outcomes of hypothesis testing procedure, as shown in the following
table:
                                    True State of Nature
                                    H0 is true            H0 is false
Our Decision   Do not reject H0     Correct decision      Type II error
               Reject H0            Type I error          Correct decision

Definition of Type I and II error


 A Type I error is the decision to reject H0 when it is in fact true.
 A Type II error is the decision not to reject H0 when it is in fact not true.

The null hypothesis can be either true or false; further, we will make a conclusion either to reject
or not to reject the null hypothesis. Thus, there are four possible situations that may arise in
testing a hypothesis.
Definition of Level of significance
The number α that is used to determine the rejection region is called the level of significance of
the test. It is the probability that the test procedure will result in a Type I error. A procedure for
formulating hypotheses and stating conclusions is given later in this unit.

The probability of making a Type II error is too complicated to discuss in a beginning text, so we
will say no more about it than this: for a fixed sample size, choosing alpha smaller in order to
reduce the chance of making a Type I error has the effect of increasing the chance of making a
Type II error. The only way to simultaneously reduce the chances of making either kind of error
is to increase the sample size.

Example 3: A metal lathe is checked periodically by quality control inspectors to determine if it


is producing machine bearings with a mean diameter of .5 inch. If the mean diameter of the
bearings is larger or smaller than .5 inch, then the process is out of control and needs to be
adjusted. Formulate the null and alternative hypotheses that could be used to test whether the
bearing production process is out of control. Specify what Type I and Type II errors would
represent, in terms of the problem.

Solution: A Type I error is the error of incorrectly rejecting the null hypothesis. In our example,
this would occur if we conclude that the process is out of control when in fact the process is in
control, i.e., if we conclude that the mean bearing diameter is different from .5 inch, when in fact
the mean is equal to .5 inch. The consequence of making such an error would be that
unnecessary time and effort would be expended to repair the metal lathe.

A Type II error, that of accepting the null hypothesis when it is false, would occur if we
conclude that the mean bearing diameter is equal to .5 inch when in fact the mean differs from .5
inch. The practical significance of making a Type II error is that the metal lathe would not be
repaired when in fact the process is out of control.

The probability of making a Type I error (α) can be controlled by the researcher (how to do this
will be explained in Section 4). α is often used as a measure of the reliability of the conclusion
and called the level of significance (or significance level) for a hypothesis test.

You may note that we have carefully avoided stating a decision in terms of "accept the null
hypothesis H0." Instead, if the sample does not provide enough evidence to support the
alternative hypothesis Ha, we prefer a decision "not to reject H0." This is because, if we were to
"accept H0," the reliability of the conclusion would be measured by β, the probability of a Type II
error. However, the value of β is not constant, but depends on the specific alternative value of the
parameter and is difficult to compute in most testing situations.

Formulating hypotheses and stating conclusions


1. State the hypothesis as the alternative hypothesis Ha.
2. The null hypothesis, H0, will be the opposite of Ha and will contain an equality sign.
3. If the sample evidence supports the alternative hypothesis, the null hypothesis will be
rejected and the probability of having made an incorrect decision (when in fact H0 is
true) is α, a quantity that can be manipulated to be as small as the researcher wishes.
4. If the sample does not provide sufficient evidence to support the alternative hypothesis,
then conclude that the null hypothesis cannot be rejected on the basis of your sample. In
this situation, you may wish to collect more information about the phenomenon under
study.

Logic of Hypothesis Testing


Although we will study hypothesis testing in situations other than for a single population mean
(for example, for a population proportion instead of a mean or in comparing the means of two
different populations), in this section the discussion will always be given in terms of a single
population mean μ .

The null hypothesis always has the form H0:μ=μ0 for a specific number μ0 (in the respirator
example μ0=75 , in the textbook example μ0=127.50 , and in the baked goods example
μ0=8.0). Since the null hypothesis is accepted unless there is strong evidence to the contrary, the
test procedure is based on the initial assumption that H0 is true. This point is so important that
we will repeat it in a display:

The test procedure is based on the initial assumption that H0 is true.

Our decision procedure therefore reduces simply to:

 if Ha has the form Ha: μ < μ0, then reject H0 if x̄ is far to the left of μ0;

 if Ha has the form Ha: μ > μ0, then reject H0 if x̄ is far to the right of μ0;
 if Ha has the form Ha: μ ≠ μ0, then reject H0 if x̄ is far away from μ0 in either
direction.

Think of the respirator example, for which the null hypothesis is H0:μ=75, the claim that the
average time air is delivered for all respirators is 75 minutes. If the sample mean is 75 or greater
then we certainly would not reject H0 (since there is no issue with an emergency respirator
delivering air even longer than claimed).

If the sample mean is slightly less than 75 then we would logically attribute the difference to
sampling error and also not reject H0 either.
Values of the sample mean that are smaller and smaller are less and less likely to come from a
population for which the population mean is 75. Thus if the sample mean is far less than 75, say
around 60 minutes or less, then we would certainly reject H0, because we know that it is highly
unlikely that the average of a sample would be so low if the population mean were 75. This is the
rare event criterion for rejection: what we actually observed (x̄ < 60) would be so rare an event if
μ=75 were true that we regard it as much more likely that the alternative hypothesis μ<75 holds.

In summary, to decide between H0 and Ha in this example we would select a “rejection region”
of values sufficiently far to the left of 75, based on the rare event criterion, and reject H0 if the
sample mean X lies in the rejection region, but not reject H0 if it does not.

The Rejection Region

Each different form of the alternative hypothesis Ha has its own kind of rejection region:
 if (as in the respirator example) Ha has the form Ha: μ < μ0, we reject H0 if x̄ is far to
the left of μ0, that is, to the left of some number C, so the rejection region has the form
of an interval (−∞, C];
 if (as in the textbook example) Ha has the form Ha: μ > μ0, we reject H0 if x̄ is far to
the right of μ0, that is, to the right of some number C, so the rejection region has the
form of an interval [C, ∞);
 if (as in the baked goods example) Ha has the form Ha: μ ≠ μ0, we reject H0 if x̄ is far
away from μ0 in either direction, that is, either to the left of some number C or to the
right of some other number C′, so the rejection region has the form of the union of two
intervals (−∞, C] ∪ [C′, ∞).

The key issue in our line of reasoning is the question of how to determine the number C or
numbers C and C' , called the critical value or critical values of the statistic, that determine the
rejection region.

Definition of Critical Values


The critical value or critical values of a test of hypotheses are the number or numbers that
determine the rejection region.

Suppose the rejection region is a single interval, so we need to select a single number C. Here is
the procedure for doing so. We select a small probability, denoted α, say 1%, which we take as
our definition of “rare event:” an event is “rare” if its probability of occurrence is less than α .
(In all the examples and problems in this text the value of α will be given already.) The
probability that X̄ takes a value in an interval is the area under its density curve and above that
interval, so as shown in the figure below (drawn under the assumption that H0 is true, so that the
curve centers at μ0) the critical value C is the value of X̄ that cuts off a tail area α in the
probability density curve of X̄. When the rejection region is in two pieces, that is, composed of
two intervals, the total area above both of them must be α, so the area above each one is α/2,
as also shown in the figure below.
The number α is the total area of a tail or a pair of tails.

Example 4: In the context of Example 1, suppose that it is known that the population is
normally distributed with standard deviation σ = 0.15 gram, and suppose that the test of
hypotheses H0: μ = 8.0 versus Ha: μ ≠ 8.0 will be performed with a sample of size 5.
Construct the rejection region for the test for the choice α = 0.10. Explain the decision procedure
and interpret it.

Solution:

If H0 is true then the sample mean X̄ is normally distributed with mean and standard deviation

μ_x̄ = μ = 8.0,

σ_x̄ = σ/√n = 0.15/√5 = 0.067.
Since Ha contains the ≠ symbol, the rejection region will be in two pieces, each one
corresponding to a tail of area α/2 = 0.10/2 = 0.05. Since z_{0.05} = 1.645, C and C′ are 1.645 standard
deviations of X̄ to the right and left of its mean 8.0:

C = 8.0 − (1.645)(0.067) = 7.89

C′ = 8.0 + (1.645)(0.067) = 8.11
The decision procedure is: take a sample of size 5 and compute the sample mean x̄. If x̄ is either
7.89 grams or less or 8.11 grams or more, then reject the hypothesis that the average amount of
fat in all servings of the product is 8.0 grams in favor of the alternative that it is different from
8.0 grams. Otherwise do not reject the hypothesis that the average amount is 8.0 grams.

The reasoning is that if the true average amount of fat per serving were 8.0 grams then there
would be less than a 10% chance that a sample of size 5 would produce a mean of either 7.89
grams or less or 8.11 grams or more. Hence if that happened it would be more likely that the
value 8.0 is incorrect (always assuming that the population standard deviation is 0.15 gram).
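The critical values in Example 4 can be reproduced with a few lines of Python. This is a minimal sketch (an addition to the notes, not part of the original), assuming scipy is available; the numbers are taken from the example.

```python
# Sketch: two-tailed rejection region from Example 4 (values from the example).
from scipy.stats import norm

mu0 = 8.0      # hypothesized mean, H0: mu = 8.0
sigma = 0.15   # known population standard deviation
n = 5          # sample size
alpha = 0.10   # significance level

se = sigma / n ** 0.5              # standard error: 0.15 / sqrt(5) = 0.067
z = norm.ppf(1 - alpha / 2)        # 1.645 cuts off a tail of area alpha/2 = 0.05
C_lower = mu0 - z * se             # 7.89
C_upper = mu0 + z * se             # 8.11
print(C_lower, C_upper)            # reject H0 if the sample mean falls outside [7.89, 8.11]
```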

Because the rejection regions are computed based on areas in tails of distributions, as shown in
the figure, hypothesis tests are classified according to the form of the alternative hypothesis in
the following way: if Ha has the form μ < μ0 the test is called left-tailed; if Ha has the form
μ > μ0 it is called right-tailed; and if Ha has the form μ ≠ μ0 it is called two-tailed.
Unit II
ONE SAMPLE STATISTICAL TESTS
LEARNING OUTCOMES:
In this lesson we are going to move on and look at inferential statistics to test hypotheses
concerned with comparing a single sample (instead of a single score) with some population
parameter. At the end of this unit, the following targets are to be accomplished:
1. to apply the Z-test to compare the mean of a large group to a single statistic
2. to apply the one-sample t-test to compare the mean of a small group to a single value
or statistic.

I. INTRODUCTION
In our last lesson we looked at the process for making inferences about research.
In this context we looked at the significance of a single score. We wanted to see if a score
differed significantly from a population value. To test statistical hypotheses involving a
single score we calculated the score's Z-score. We referred to this as the Z-score test. As a
reminder, the formula for the Z-score (or the Z-score test) was

Z = (X − μ)/σ
II. THE Z-TEST

 The z-test for the mean is a statistical test for a population mean. The z-test can be used
when the population is normal and σ is known, or for any population when the sample
size n is at least 30.
 The test statistic is the sample mean and the standardized test statistic is z.
 Using the formula:

z = (x̄ − μ)/σ_x̄ = (x̄ − μ)/(σ/√n)

where x̄ is the sample mean, μ is a specified value to be tested, σ is the population
standard deviation, σ_x̄ = σ/√n is the standard error, and n is the sample size. The
significance level of the z-value in the standard normal table must be considered.
 The sampling distribution of the mean is the distribution of many sample means taken
from a population. The mean of this distribution (the mean of all the means) equals the
population mean, so we can use μ instead of μ_x̄.
 σ_x̄ is the standard error of the mean. It is the standard deviation of many sample
means. Unfortunately for us, the standard error of the mean does not equal the
population standard deviation but instead is equal to the population standard deviation
(σ) divided by the square root of the sample size (n).
 The steps for using the z-test in hypothesis testing can be summarized into 7 steps:

1. State the claim mathematically and verbally. Identify the null and alternative
hypotheses. (State H0 and Ha.)
2. Specify the level of significance. (Identify α.)
3. Determine the standardized test statistic: z = (x̄ − μ)/(σ/√n).
4. Find the area that corresponds to z (the significance level of the z-value in the
standard normal table).
5. Find the P-value:
a. For a left-tailed test, P = (area in left tail).
b. For a right-tailed test, P = (area in right tail).
c. For a two-tailed test, P = 2(area in tail of test statistic).
6. Make a decision to reject or fail to reject the null hypothesis. (Reject H0 if the
P-value is less than or equal to α; otherwise, fail to reject H0.)
7. Interpret the decision in the context of the original claim.

Example:
1. A manufacturer claims that its rechargeable batteries are good for an average of more
than 1,000 charges. A random sample of 100 batteries has a mean life of 1002 charges
and a standard deviation of 14. Is there enough evidence to support this claim at α = 0.01?

STEP 1. (Claim)
Ha: μ > 1000 (Claim)    H0: μ ≤ 1000

STEP 2. The level of significance is α = 0.01.

STEP 3. The standardized test statistic is:

z = (x̄ − μ)/(σ/√n) = (1002 − 1000)/(14/√100) ≈ 1.43

STEP 4. Location of P-value
The area to the right of z = 1.43 is P = 0.0764.

STEP 5. P = 0.0764

STEP 6. The P-value is greater than α = 0.01, so fail to reject H0.

STEP 7. Interpret

At the 1% level of significance, there is not enough evidence to support the claim
that the rechargeable battery is good for an average of more than 1000 charges.
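As a quick cross-check of the seven steps above, here is a minimal Python sketch (an addition, not part of the original notes) that reproduces the battery example with scipy; the summary statistics are taken from the example.

```python
# Sketch: right-tailed z-test for the battery claim (values from the example).
from scipy.stats import norm

x_bar, mu0, sigma, n, alpha = 1002, 1000, 14, 100, 0.01

z = (x_bar - mu0) / (sigma / n ** 0.5)  # (1002 - 1000) / (14 / 10) = 1.43
p_value = 1 - norm.cdf(z)               # area to the right of z, about 0.0764
print(round(z, 2), round(p_value, 4))
if p_value <= alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")          # 0.0764 > 0.01, so fail to reject
```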

III. ONE SAMPLE T-TEST


 The t statistic is not distributed normally like the z statistic is, but is distributed as
the t-distribution, also referred to as Student's distribution. We will use this
distribution when we do the step-by-step process for testing statistical hypotheses. To
use the table for the t-distribution we need to know one other piece of information,
and that is the degrees of freedom for the one-sample t-test.
 The t-test for the mean is a statistical test for a population mean. The t-test can be
used when the population is normal or nearly normal, σ is unknown, and n < 30.
 The test statistic is the sample mean and the standardized test statistic is t.
 Using the formula:

t = (x̄ − μ)/(s/√n)

where x̄ is the sample mean, μ is a specified value to be tested, s is the sample
standard deviation, and n is the sample size. The critical values of the t-distribution
with ν degrees of freedom are needed.
 An additional value, the degrees of freedom, is computed using the formula
d.f. = n – 1.
 The steps for using the t-test in hypothesis testing can be summarized into 8 steps:
1. State the claim mathematically and verbally. Identify the null and alternative
hypotheses. (State H0 and Ha.)
2. Specify the level of significance. (Identify α.)
3. Identify the degrees of freedom and sketch the sampling distribution. (d.f. = n – 1)
4. Determine any critical values. (See the critical values of the t-distribution with ν
degrees of freedom.)
5. Determine any rejection region(s).
6. Find the standardized test statistic: t = (x̄ − μ)/(s/√n).
7. Make a decision to reject or fail to reject the null hypothesis. (If t is in the rejection
region, reject H0; otherwise, fail to reject H0.)
8. Interpret the decision in the context of the original claim.
 Note: An example will be presented after discussing how to get the critical values
for the t-test.

FINDING CRITICAL VALUES IN T-DISTRIBUTION


1. Identify the level of significance.
2. Identify the degrees of freedom d.f. = n – 1.
3. Find the critical value(s) using Table 5 in Appendix B in the row with n – 1 degrees of
freedom. If the hypothesis test is
a. left-tailed, use “One Tail,  ” column with a negative sign,
b. right-tailed, use “One Tail,  ” column with a positive sign,
c. Two-tailed, use “Two Tails,  ” column with a negative and a positive sign.
Examples: Finding critical values in t-distibution
1. Find the critical value t0 for a right-tailed test given  = 0.01 and n = 24.
a. The degrees of freedom are d.f. = n – 1 = 24 – 1 = 23.
b. To find the critical value, use critical values of t-distribution with ν degrees of
freedom with d.f. = 23 and 0.01 in the “One Tail,  “column. Because the test is a
right-tail test, the critical value is positive.
c. t0 = 2.500
2. Find the critical values t0 and t0 for a two-tailed test given  = 0.10 and n = 12.
a. The degrees of freedom are d.f. = n – 1 = 12 – 1 = 11.
b. To find the critical value, use critical values of t-distribution with ν degrees of
freedom with d.f. = 11 and 0.10 in the “Two Tail,  “column. Because the test is
a two-tail test, one critical value is negative and one is positive.
c. t0 =  1.796 and t0 = 1.796
Examples: Problems involving one sample t-test
1. A local telephone company claims that the average length of a phone call is 8 minutes. In
a random sample of 18 phone calls, the sample mean was 7.8 minutes and the standard
deviation was 0.5 minutes. Is there enough evidence to support this claim at α = 0.05?

STEP 1. (Claim)
H0: μ = 8 (Claim)    Ha: μ ≠ 8

STEP 2. The level of significance is α = 0.05.

STEP 3. The degrees of freedom are d.f. = 18 – 1 = 17.

STEP 4 AND STEP 5.
The test is a two-tailed test and, by using the steps presented above, the critical
values can be computed. The critical values are –t0 = –2.110 and t0 = 2.110.

STEP 6. Standardized test statistic

t = (x̄ − μ)/(s/√n) = (7.8 − 8)/(0.5/√18) ≈ –1.70

STEP 7. Decide whether to reject or fail to reject the null hypothesis.

The test statistic t ≈ –1.70 falls in the nonrejection region (between –2.110 and
2.110), so H0 is not rejected.
STEP 8. INTERPRET
At the 5% level of significance, there is not enough evidence to reject the claim
that the average length of a phone call is 8 minutes.
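The eight steps above can be mirrored in a short Python sketch (an addition to the notes, using scipy only for the critical value); the summary statistics are those of the phone-call example.

```python
# Sketch: two-tailed one-sample t-test for the phone-call claim.
from scipy.stats import t as t_dist

x_bar, mu0, s, n, alpha = 7.8, 8.0, 0.5, 18, 0.05

t_stat = (x_bar - mu0) / (s / n ** 0.5)       # (7.8 - 8) / (0.5 / sqrt(18)) = -1.70
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 1)  # 2.110 for d.f. = 17
print(round(t_stat, 2), round(t_crit, 3))
if abs(t_stat) > t_crit:
    print("Reject H0")
else:
    print("Fail to reject H0")                # |-1.70| < 2.110, H0 is not rejected
```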
Unit III
TWO PARAMETER TESTING

LEARNING OBJECTIVES
By the end of this chapter, the following targets should have been accomplished:
 to differentiate independent and dependents samples;
 to apply the independent t-test and dependent t-test appropriately to compare the means of
two samples
I. INTRODUCTION
In a two-sample hypothesis test, two parameters from two populations are
compared.
For a two-sample hypothesis test,
1. the null hypothesis H0 is a statistical hypothesis that usually states there is no
difference between the parameters of two populations. The null hypothesis always
contains the symbol ≤, =, or ≥.
2. the alternative hypothesis Ha is a statistical hypothesis that is true when H0 is false.
The alternative hypothesis always contains the symbol >, ≠, or <.
To write a null and alternative hypothesis for a two-sample hypothesis test, translate
the claim made about the population parameters from a verbal statement to a mathematical
statement.

H0: μ1 = μ2        H0: μ1 ≤ μ2        H0: μ1 ≥ μ2
Ha: μ1 ≠ μ2        Ha: μ1 > μ2        Ha: μ1 < μ2

Regardless of which hypotheses are used, μ1 = μ2 is always assumed to be true.
There are two words that have to be defined to understand this unit-the dependent and
independent samples. Two samples are independent if the sample selected from one
population is not related to the sample selected from the second population. Two samples
are dependent if each member of one sample corresponds to a member of the other
sample. Dependent samples are also called paired samples or matched samples.

Example:

Classify each pair of samples as independent or dependent.

1. Sample 1: The weight of 24 students in a first-grade class


Sample 2: The height of the same 24 students
These samples are dependent because the weight and height can be paired with respect to
each student.
2. Sample 1: The average price of 15 new trucks
Sample 2: The average price of 20 used sedans
These samples are independent because it is not possible to pair the new trucks with the
used sedans. The data represents prices for different vehicles.
II. INDEPENDENT T-TEST

The independent t-test, as we have already mentioned, is used when we wish to compare the
statistical significance of a possible difference between the means of two groups on some
variable when the two groups are independent of one another.

The formula for the independent t-test is

t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) · (1/n1 + 1/n2) ]
where
X̄1 is the mean for group 1,
X̄2 is the mean for group 2,
SS1 is the sum of squares for group 1,
SS2 is the sum of squares for group 2,
n1 is the number of subjects in group 1, and
n2 is the number of subjects in group 2.

The sum of squares is a new way of looking at variance. It gives us an indication of how spread
out the scores in a sample are. The t-value we are finding is the difference between the two means
divided by a quantity based on their sums of squares, taking the degrees of freedom into consideration.
SS1 = ΣX1² − (ΣX1)²/n1

and

SS2 = ΣX2² − (ΣX2)²/n2
We can see that each sum of squares is the sum of the squared scores in the sample minus the
sum of the scores quantity squared divided by the size of the sample (n).

So, to calculate the independent-t value we need to know:


1. The mean for sample or group 1
2. The mean for sample or group 2
3. The summation X and summation X squared for group 1
4. The summation X and summation X squared for group 2
5. The sample size for group 1 (n1)
6. The sample size for group 2 (n2)

We also need to know the degrees of freedom for the independent t-test which is:
df = n1 + n2 – 2

Let's do a sample problem using the independent t-test.

Example: Using the independent t-test


Research Problem: Job satisfaction as a function of work schedule was investigated in two
different factories. In the first factory the employees are on a fixed shift system while in the
second factory the workers have a rotating shift system. Under the fixed shift system, a worker
always works the same shift, while under the rotating shift system, a worker rotates through the
three shifts. Using the scores below determine if there is a significant difference in job
satisfaction between the two groups of workers.
Work Satisfaction Scores for Two Groups of Workers
Fixed Shift Rotating Shift
79 63
83 71
68 46
59 57
81 53
76 46
80 57
74 76
58 52
49 68
68 73

In this problem we see that we have two samples and the samples are independent of one
another. We can see that the inferential statistic we need to use here is the independent t-test.

We can calculate the quantities we need to solve this problem as follows:


Worksheet to calculate independent t-test value.
X1      X1²      X2      X2²
79 6241 63 3969
83 6889 71 5041
68 4624 46 2116
59 3481 57 3249
81 6561 53 2809
76 5776 46 2116
80 6400 57 3249
74 5476 76 5776
58 3364 52 2704
49 2401 68 4624
68 4624 73 5329
------ ------ ------ ------
775 55837 662 40982

We can use the totals from this worksheet and the number of subjects in each group to calculate
the sum of squares for group 1, the sum of squares for group 2, the mean for group 1, the mean
for group 2, and the value for the independent t.

SS1 = ΣX1² − (ΣX1)²/n1 = 55837 − (775)²/11 = 1234.73

SS2 = ΣX2² − (ΣX2)²/n2 = 40982 − (662)²/11 = 1141.64

X̄1 = 775/11 = 70.45        X̄2 = 662/11 = 60.18

t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) · (1/n1 + 1/n2) ]
  = (70.45 − 60.18) / √[ ((1234.73 + 1141.64)/(11 + 11 − 2)) · (1/11 + 1/11) ] = 2.209

We now have the information we need to complete the six step statistical inference process for
our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research
question.
H 0 : μ1 = μ 2
H 1 : μ1 ≠ μ2

Note: Our problem did not state which direction of significance we will be looking for;
therefore, we will be looking for a significant difference between the two means in either
direction.
2. Set the alpha level.
α = 0.05

Note: As usual we will set our alpha level at .05, we have 5 chances in 100 of making a
type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 2.209
df = n1 + n2 - 2 = 11 + 11 - 2 = 20

Note: We have calculated the t-value and will also need to know the degrees of freedom
when we go to look up the critical values of t.

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t is >= 2.086 or if t <= -2.086

Note: To write the decision rule we need to know the critical value for t, with an alpha
level of .05 and a two-tailed test. We can do this by looking at Appendix C (Distribution
of t) on page 318 of the textbook. Look for the column of the table under .05 for Level of
significance for two-tailed tests, read down the column until you are level with 20 in the
df column, and you will find the critical value of t which is 2.086. That means our result
is significant if the calculated t value is less than or equal to -2.086 or is greater than or
equal to 2.086.

5. Write a summary statement based on the decision.


Reject H0, p < .05, two-tailed

Note: Since our calculated value of t (2.209) is greater than or equal to 2.086, we reject
the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.


There is a significant difference in job satisfaction between the two groups of workers.
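For comparison, the whole hand computation above can be checked in Python; this sketch (an addition to the notes) uses scipy's ttest_ind, whose equal_var=True option applies the same pooled-variance formula as the worksheet.

```python
# Sketch: pooled independent t-test on the shift data (scores from the worksheet).
from scipy.stats import ttest_ind

fixed    = [79, 83, 68, 59, 81, 76, 80, 74, 58, 49, 68]
rotating = [63, 71, 46, 57, 53, 46, 57, 76, 52, 68, 73]

t_stat, p_value = ttest_ind(fixed, rotating, equal_var=True)
print(round(t_stat, 3))   # 2.209, matching the hand computation
print(p_value < 0.05)     # True: reject H0 at alpha = .05, two-tailed
```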

Additional problem using the independent t-test


Research Problem: A new test preparation company, called Bright Future (BF), wants to
convince high school students studying for the American College Testing (ACT) assessment test
that enrolling in their test preparation course would significantly improve the students' ACT
scores. BF selects 10 students at random and assigns five to the experimental group and five to
the control group. The experimental group students participate in the test preparation course
conducted by BF. At the conclusion of the course, both groups of students take the ACT test
form which was given to high school students the previous year. BF conducts a t-test for
independent samples to compare the scores of Group 1 (Experimental, E) to those of Group 2
(Control, C).
ACT Scores of Experimental and Control Groups
Experimental Group Control Group
23 17
18 19
26 21
32 14
21 19

1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: 1 = 2
H1: 1 > 2

2. Set the alpha level.  = 0.05

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 2.252
df = n1 + n2 - 2 = 5 + 5 - 2 = 8

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t is >= 1.860

5. Write a summary statement based on the decision.


Reject H0, p < .05, one-tailed

6. Write a statement of results in Standard English.


Students who participated in the test-taking course scored significantly higher on the
practice form of the ACT than did the control group of students.
III. DEPENDENT T-TEST
A t-test can be used to test the difference of two population means when a sample
is randomly selected from each population. The requirements for performing the test are
that each population must be normal and each member of the first sample must be paired
with a member of the second sample. The test statistic is

d̄ = Σd/n

and the standardized test statistic is

t = (d̄ − μd)/(sd/√n).

The degrees of freedom are d.f. = n – 1, where each difference is d = x1 – x2.
To perform a two-sample hypothesis test with dependent samples, the difference
between each data pair is first found:
d = x1 – x2 Difference between entries for a data pair

The test statistic is the mean of these differences:

d̄ = Σd/n (the mean of the differences between the paired data entries in the dependent samples).

Three conditions are required to conduct the test.

1. The samples must be randomly selected.

2. The samples must be dependent (paired).

3. Both populations must be normally distributed.

If these conditions are met, then the sampling distribution for d̄ is approximated
by a t-distribution with n – 1 degrees of freedom, where n is the number of data pairs.

The following steps can be used to test the difference between two means using the
dependent-samples t-test.

1. State the claim mathematically and verbally. Identify the null and alternative
hypotheses. (State H0 and Ha.)
2. Specify the level of significance. (Identify α.)
3. Identify the degrees of freedom and sketch the sampling distribution. (d.f. = n – 1)
4. Determine any critical values. (See the critical values of the t-distribution with ν
degrees of freedom.)
5. Determine any rejection region(s).
6. Calculate d̄ and sd, using a table: d̄ = Σd/n, sd = √[(nΣd² − (Σd)²)/(n(n − 1))].
7. Find the standardized test statistic: t = (d̄ − μd)/(sd/√n).
8. Make a decision to reject or fail to reject the null hypothesis. (If t is in the rejection
region, reject H0; otherwise, fail to reject H0.)
9. Interpret the decision in the context of the original claim.

Examples: Problems involving two-dependent sample t-test


1. A reading center claims that students will perform better on a standardized reading test
after going through the reading course offered by their center. The table shows the
reading scores of 6 students before and after the course. At α = 0.05, is there enough
evidence to conclude that the students’ scores after the course are better than the scores
before the course?

STEP 1. (Claim)
H0: μd ≤ 0    Ha: μd > 0 (Claim)

STEP 2. The level of significance is α = 0.05.
STEP 3. Identify the degrees of freedom

d.f. = 6 – 1 = 5

STEP 4 AND STEP 5. Determine any critical value and any rejection region(s).
t0 = 2.015
STEP 6. Calculate d̄ and sd.
d = (score after) – (score before)

Σd = 43
Σd² = 833

d̄ = Σd/n = 43/6 = 7.167

sd = √[(nΣd² − (Σd)²)/(n(n − 1))] = √[(6(833) − 1849)/(6(5))] = √104.967 = 10.245

STEP 7. Find the standardized test statistic.

t = (d̄ − μd)/(sd/√n) = (7.167 − 0)/(10.245/√6) ≈ 1.714
STEP 8. Make a decision to reject or fail to reject the null hypothesis.

The test statistic t ≈ 1.714 does not fall in the rejection region (t0 = 2.015), so fail
to reject H0.

STEP 9: Interpret

There is not enough evidence at the 5% level to support the claim that the students’
scores after the course are better than the scores before the course.
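The nine steps above can also be checked numerically. The sketch below (an addition to the notes) works from the summary values Σd = 43 and Σd² = 833, since the raw before/after scores are not reproduced here; scipy supplies the critical value.

```python
# Sketch: dependent (paired) t-test from the reading-score summary values.
from math import sqrt
from scipy.stats import t as t_dist

n, sum_d, sum_d2, alpha = 6, 43, 833, 0.05

d_bar = sum_d / n                                      # 7.167
s_d = sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))  # 10.245
t_stat = d_bar / (s_d / sqrt(n))                       # 1.714
t_crit = t_dist.ppf(1 - alpha, df=n - 1)               # 2.015, right-tailed
print(round(t_stat, 3), round(t_crit, 3))
print("Reject H0" if t_stat > t_crit else "Fail to reject H0")
```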
Unit IV
CORRELATION AND REGRESSION
LEARNING OBJECTIVES
In this chapter we will look at the most commonly used techniques for investigating the
relationship between two quantitative variables. These techniques are correlation and regression.
Specifically, at the end of this unit, the following targets are to be accomplished:
 Define correlation and regression
 Compute and interpret a correlation coefficient
 Compute and interpret coefficients in a linear regression analysis

I. INTRODUCTION
Our interest in this chapter is in situations in which we can associate to each
element of a population or sample two measurements x and y, particularly in the case that
it is of interest to use the value of x to predict the value of y. For example, the population
could be the air in automobile garages, x could be the electrical current produced by an
electrochemical reaction taking place in a carbon monoxide meter, and y the
concentration of carbon monoxide in the air. In this chapter we will learn statistical
methods for analyzing the relationship between variables x and y in this context.

II. CORRELATION

 A correlation is a relationship between two variables. The data can be represented by


the ordered pairs (x, y) where x is the independent (or explanatory) variable, and y is
the dependent (or response) variable.
 A scatter plot, like what figure 1 and figure 2 show, can be used to determine whether a
linear (straight line) correlation exists between two variables.

Figure 1 Figure 2

CORRELATION COEFFICIENT

The correlation coefficient is a measure of the strength and the direction of a linear
relationship between two variables. The symbol r represents the sample correlation coefficient.
n  xy   x   y 
The formula for r is : r  .
 x    y 
2 2 2 2
n x n y
The range of the correlation coefficient is −1 to 1. If x and y have a strong positive linear
correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to −1. If
there is no linear correlation or a weak linear correlation, r is close to 0. Examples (from the
scatter plots shown in the original figure): r = −0.91, strong negative correlation; r = 0.88, strong
positive correlation; r = 0.42, weak positive correlation; r = 0.07, nonlinear correlation.

The population parameter is denoted by the Greek letter ρ (rho) and the sample statistic is
denoted by the Roman letter r.
Here are some properties of r:
 r only measures the strength of a linear relationship. There are other kinds of
relationships besides linear.
 r is always between -1 and 1 inclusive. -1 means perfect negative linear correlation and
+1 means perfect positive linear correlation
 r has the same sign as the slope of the regression (best fit) line
 r does not change if the independent (x) and dependent (y) variables are interchanged
 r does not change if the scale on either variable is changed. You may multiply, divide,
add, or subtract a value to/from all the x-values or y-values without changing the value of
r.
 the standardized test statistic based on r has a Student's t distribution

CALCULATING A CORRELATION COEFFICIENT

1. Find the sum of the x-values: Σx.
2. Find the sum of the y-values: Σy.
3. Multiply each x-value by its corresponding y-value and find the sum: Σxy.
4. Square each x-value and find the sum: Σx².
5. Square each y-value and find the sum: Σy².
6. Use these five sums to calculate the correlation coefficient:
r = [nΣxy − (Σx)(Σy)] / ( √[nΣx² − (Σx)²] · √[nΣy² − (Σy)²] ).

HYPOTHESIS TESTING FOR CORRELATION

The t-Test for the Correlation Coefficient: A t-test can be used to test whether the
correlation between two variables is significant. The test statistic is r and the standardized test
statistic follows a t-distribution with n – 2 degrees of freedom. In this text, only two-tailed
hypothesis tests for ρ are considered. It can be computed using the formula presented below.

t = r/σr = r / √[ (1 − r²)/(n − 2) ]

where r is the correlation coefficient, n − 2 is the degrees of freedom, and 1 − r² is the
coefficient of non-determination.

 The claim we will be testing is "There is significant linear correlation"


 Hypothesis testing is always done under the assumption that the null hypothesis is true.
 The following steps can be done for the hypothesis testing of linear correlation.

1. State the null and alternative hypotheses. (State H0 and Ha.)
2. Specify the level of significance. (Identify α.)
3. Identify the degrees of freedom. (d.f. = n − 2)
4. Determine any critical value(s) and any rejection region(s). (See the critical values
of the t-distribution with ν degrees of freedom.)
5. Find the standardized test statistic: t = r/√[(1 − r²)/(n − 2)].
6. Make a decision to reject or fail to reject the null hypothesis. (If t is in the rejection
region, reject H0; otherwise, fail to reject H0.)
7. Interpret the decision in the context of the original claim.

EXAMPLE
1. The following data represent the number of hours 12 different students watched
television during the weekend and the scores of each student who took a test the
following Monday. The correlation coefficient is r ≈ −0.831. Is this correlation
coefficient significant at α = 0.01?

STEP 1. Claim
H0: ρ = 0 (no correlation)    Ha: ρ ≠ 0 (significant correlation)

STEP 2. The level of significance is α = 0.01.

STEP 3. The degrees of freedom are d.f. = 12 − 2 = 10.

STEP 4 AND STEP 5. The critical values are −t0 = −3.169 and t0 = 3.169.

STEP 6. The standardized test statistic is

t = r/√[(1 − r²)/(n − 2)] = −0.831/√[(1 − (−0.831)²)/(12 − 2)] ≈ −4.72.

STEP 7. Decide whether to reject or fail to reject the null hypothesis.

The test statistic falls in the rejection region, so H0 is rejected.

STEP 8. Interpret
At the 1% level of significance, there is enough evidence to conclude that
there is a significant linear correlation between the number of hours of TV
watched over the weekend and the test scores on Monday morning.
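As a cross-check of the example, the t statistic for r can be computed directly; this Python sketch (an addition to the notes) uses the summary values r = −0.831 and n = 12. With the raw data, scipy.stats.pearsonr would return the same r together with a two-tailed p-value.

```python
# Sketch: t-test for the significance of a correlation coefficient.
from math import sqrt
from scipy.stats import t as t_dist

r, n, alpha = -0.831, 12, 0.01

t_stat = r / sqrt((1 - r ** 2) / (n - 2))     # about -4.72
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)  # 3.169 for d.f. = 10
print(round(t_stat, 2), round(t_crit, 3))
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```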

CORRELATION AND CAUSATION

The fact that two variables are strongly correlated does not in itself imply a cause-and-
effect relationship between the variables. If there is a significant correlation between two
variables, you should consider the following possibilities:

 There is a direct cause and effect relationship


 There is a reverse cause and effect relationship
 The relationship may be caused by a third variable
 The relationship may be caused by complex interactions of several variables
 The relationship may be coincidental

III. REGRESSION
A regression line, also called a line of best fit, is the line for which the sum of the
squares of the residuals is a minimum. Since it "best fits" the data, it makes sense that the line
passes through the means.
The idea behind regression is that when there is significant linear correlation, you can use
a line to estimate the value of the dependent variable for certain values of the independent
variable.
The regression equation should only be used:
 When there is significant linear correlation. That is, when you reject the null hypothesis
that rho=0 in a correlation hypothesis test.
 The value of the independent variable being used in the estimation is close to the original
values. That is, you should not use a regression equation obtained using x's between 10
and 20 to estimate y when x is 200.
 The regression equation should not be used with different populations. That is, if x is the
height of a male, and y is the weight of a male, then you shouldn't use the regression
equation to estimate the weight of a female.
 The regression equation shouldn't be used to forecast values not from that time frame. If
data is from the 1960's, it probably isn't valid in the 1990's.

Assuming that you've decided that you can have a regression equation because there is
significant linear correlation between the two variables, the equation of the regression line for an
independent variable x and a dependent variable y is ŷ = mx + b, where ŷ is the predicted y-value
for a given x-value. The slope m and y-intercept b are given by

m is the slope of the regression line:

n ( ∑ xy ) − ( ∑ x )( ∑ y )
m= 2
n (∑ x 2) − (∑ x )

b is the y-intercept of the regression line:

b = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ]    or    b = ȳ − m x̄ = (Σy)/n − m·(Σx)/n

where ȳ is the mean of the y-values and x̄ is the mean of the x-values. The regression line
always passes through (x̄, ȳ).

EXAMPLE:

1. The following data represents the number of hours 12 different students watched
television during the weekend and the scores of each student who took a test the
following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.

Hours, x:      0    1    2    3    3    5    5    5    6    7    7    10
Test score, y: 96   85   82   74   95   68   76   84   58   65   75   50
xy:            0    85   164  222  285  340  380  420  348  455  525  500
x²:            0    1    4    9    9    25   25   25   36   49   49   100
y²:            9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500

Σx = 54    Σy = 908    Σxy = 3724    Σx² = 332    Σy² = 70836

SOLUTION:

m = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] = [12(3724) − (54)(908)] / [12(332) − (54)²] ≈ −4.067

b = ȳ − m x̄ = 908/12 − (−4.067)(54/12) ≈ 93.97

ŷ = –4.07x + 93.97
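The slope, intercept, and the prediction asked for in part (b) follow directly from the five sums; this short Python sketch (an addition to the notes) repeats the computation.

```python
# Sketch: least-squares slope and intercept from the five sums in the example.
n = 12
sum_x, sum_y, sum_xy, sum_x2 = 54, 908, 3724, 332

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # -4.067
b = sum_y / n - m * sum_x / n                                 # 93.97
print(round(m, 3), round(b, 2))
print(round(m * 9 + b, 1))  # part (b): predicted score for 9 hours of TV, about 57.4
```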

IV. COEFFICIENT OF DETERMINATION

Coefficient of Determination
The coefficient of determination is
 the percent of the variation that can be explained by the regression equation
 the explained variation divided by the total variation
 the square of r

What's all this variation stuff?


Every sample has some variation in it (unless all the values are identical, and that's unlikely to
happen). The total variation is made up of two parts, the part that can be explained by the
regression equation and the part that can't be explained by the regression equation.

Σ(y − ȳ)² = Σ(y′ − ȳ)² + Σ(y − y′)²
total = explained + unexplained
Well, the ratio of the explained variation to the total variation is a measure of how good the
regression line is. If the regression line passed through every point on the scatter plot exactly, it
would be able to explain all of the variation. The further the line is from the points, the less it is
able to explain.

Coefficient of Non-Determination
The coefficient of non-determination is ...
 The percent of variation which is unexplained by the regression equation
 The unexplained variation divided by the total variation
 1 - r2

Standard Error of the Estimate


The coefficient of non-determination was used in the t-test to see if there was significant linear
correlation. It was in the numerator of the standard error formula.

se = √[ (1 − r²)/(n − 2) ]

The standard error of the estimate is the square root of the coefficient of non-determination
divided by its degrees of freedom.

Confidence Interval for y'

E = z_{α/2} · √[ (1 − r²)/(n − 2) ]

y′ − E < y < y′ + E
The following only works when the sample size is large. Large in this instance is usually taken to
be more than 100. We're not going to cover this in class, but it is provided here for your
information. The maximum error of the estimate is given, and this maximum error of the
estimate is subtracted from and added to the estimated value of y.
Unit VII

CHI-SQUARE

LEARNING OUTCOMES:

In this lesson we are going to look at a statistical test that uses the chi-square distribution and is
applicable to both large and small samples depending on their context. At the end of this unit, the
following targets should have been accomplished.

 To describe the properties of the chi-square distribution

 To compare the variance of a sample to a given variance using the chi-square distribution
 To determine the goodness of fit or strength of relationship of categorical values through
the chi-square test.

I. INTRODUCTION
In this chapter we explore two types of hypothesis tests that require the chi-square distribution.
The chi-square statistic is commonly used for testing relationships between categorical
variables. The null hypothesis of the chi-square test is that no relationship exists between the
categorical variables in the population; they are independent.

II. Chi-Square Distribution

χ² = (n − 1)s²/σ² = (df · s²)/σ²

The chi-square (χ²) distribution is obtained from the values of the ratio of the sample variance
and population variance multiplied by the degrees of freedom, when the population
is normally distributed with population variance σ².

The chi-square distribution has only one parameter, df, the degrees of freedom.

The degrees of freedom depend on the application, as we will see later. Here are a few facts
about the chi-square distribution. If χ² ~ χ²_df, the following are true of χ².

 χ² is a continuous random variable

 χ² = Z1² + Z2² + … + Z_df²; χ² is the sum of df independent squared standard normal random
variables
 data values can't be negative: x ∈ [0, ∞)
 µ = df (the mean of the chi-square distribution is the df)
 σ = √(2·df)
 χ² is skewed right
 the mean (df) is just to the right of the peak of the density curve
 when df > 90, χ² is approximately normal

Properties of the Chi-Square


 Chi-square is non-negative. It is the ratio of two non-negative values, therefore it must be
non-negative itself.
 Chi-square is non-symmetric.
 There are many different chi-square distributions, one for each degree of freedom.
 The degrees of freedom when working with a single population variance is n-1.

Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is
different from the method for looking up right tail values.
 Area to the right - just use the area given.
 Area to the left - the table requires the area to the right, so subtract the given area from
one and look this area up in the table.
 Area in both tails - divide the area by two. Look up this area for the right critical value
and one minus this area for the left critical value.
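
As an illustration, here is a minimal sketch of these three look-ups, assuming Python with SciPy
is available (the df value of 9 is only an example):

from scipy.stats import chi2

df, alpha = 9, 0.05

# Area to the right: use the given area directly
right = chi2.isf(alpha, df)          # same as chi2.ppf(1 - alpha, df)

# Area to the left: subtract the given area from one, then look up
left = chi2.isf(1 - alpha, df)

# Area in both tails: divide the area by two
lo = chi2.isf(1 - alpha / 2, df)     # left critical value
hi = chi2.isf(alpha / 2, df)         # right critical value

print(right, left, lo, hi)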

Single population variance

The variable χ² = (df · s²) / σ² has a chi-square distribution if the population has a normal
distribution. The degrees of freedom are n − 1. We can use this to test the population variance
under certain conditions.

Condition for testing

 The population has a normal distribution


 The data is from a random sample
 The observations must be independent of each other
 The test statistic has a chi-square distribution with n − 1 degrees of freedom and is given
by: χ² = (df · s²) / σ²
Testing is done in the same manner as before. Remember, all hypothesis testing is done under the
assumption the null hypothesis is true.

Confidence Interval

If you solve the test statistic formula for the population variance, you get: σ² = (df · s²) / χ²
1. Find the two critical values (alpha/2 and 1 − alpha/2).
2. Compute the value for the population variance given above at each critical value.
3. Place the population variance between the two values calculated in step 2 (put the smaller
one first).

Note, the left-hand endpoint of the confidence interval comes when the right critical value is
used and the right-hand endpoint of the confidence interval comes when the left critical value is
used. This is because the critical values are in the denominator and so dividing by the larger
critical value (right tail) gives the smaller endpoint.
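
A hedged sketch of those three steps in Python, SciPy assumed (the sample size and variance
here are made up for illustration):

from scipy.stats import chi2

n, s2, alpha = 20, 4.5, 0.05   # hypothetical sample size and sample variance
df = n - 1

# Step 1: the two critical values
chi2_right = chi2.isf(alpha / 2, df)       # larger critical value
chi2_left  = chi2.isf(1 - alpha / 2, df)   # smaller critical value

# Steps 2-3: dividing by the larger critical value gives the smaller endpoint
lower = df * s2 / chi2_right
upper = df * s2 / chi2_left
print(lower, "< sigma^2 <", upper)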

Goodness of Fit Test

We use the goodness of fit test to test if a discrete categorical random variable matches a
predetermined “expected” distribution. The hypotheses in a goodness of fit test are

H o: the actual distribution fits the expected distribution

H a: the actual distribution does not fit the expected distribution

Requirement: In order for a chi-square goodness of fit test to be appropriate, the expected value
in each category must be at least 5. It may be possible to combine categories to meet this
requirement.

Definition of Goodness of Fit test

The chi-square goodness of fit test is a non-parametric test that is used to find out whether the
observed value of a given phenomenon differs significantly from the expected value. In the
chi-square goodness of fit test, the term goodness of fit is used to compare the observed sample
distribution with the expected probability distribution. The test determines how well a
theoretical distribution (such as normal, binomial, or Poisson) fits the empirical distribution. In
the chi-square goodness of fit test, sample data is divided into intervals. Then the number of
points that fall into each interval is compared with the expected number of points in each
interval.

Procedure for Chi-Square Goodness of Fit Test:

Set up the hypothesis for Chi-Square goodness of fit test:

 Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis assumes that
there is no significant difference between the observed and the expected value.
 Alternative hypothesis: In Chi-Square goodness of fit test, the alternative hypothesis
assumes that there is a significant difference between the observed and the expected
value.

Compute the value of the chi-square goodness of fit test using the following formula:

χ² = Σ (O − E)² / E

where χ² = chi-square goodness of fit statistic, O = observed value, and E = expected value.

Degree of freedom: In the chi-square goodness of fit test, the degrees of freedom depend on the
distribution of the sample. The following table shows the distributions and their associated
degrees of freedom:

Type of distribution     No. of constraints     Degrees of freedom
Binomial distribution    1                      n − 1
Poisson distribution     2                      n − 2
Normal distribution      3                      n − 3

Hypothesis testing: Hypothesis testing in the chi-square goodness of fit test is the same as in
other tests, like the t-test, ANOVA, etc. The calculated value of the chi-square goodness of fit
test is compared with the table value. If the calculated value is greater than the table value, we
will reject the null hypothesis and conclude that there is a significant difference between the
observed and the expected frequency. If the calculated value is less than the table value, we will
accept the null hypothesis and conclude that there is no significant difference between the
observed and expected values.

The idea is that if the observed frequency is really close to the claimed (expected) frequency,
then the square of the deviations will be small. The square of the deviation is divided by the
expected frequency to weight frequencies. A difference of 10 may be very significant if 12 was
the expected frequency, but a difference of 10 isn't very significant at all if the expected
frequency was 1200.

If the sum of these weighted squared deviations is small, the observed frequencies are close to
the expected frequencies and there would be no reason to reject the claim that it came from that
distribution. Only when the sum is large is there reason to question the distribution. Therefore, the
chi-square goodness-of-fit test is always a right tail test.

 Observed  Expected  2
 
2
Expected

The test statistic has a chi-square distribution when the following assumptions are met
 The data are obtained from a random sample
 The expected frequency of each category must be at least 5. This goes back to the
requirement that the data be normally distributed. You're simulating a multinomial
experiment (using a discrete distribution) with the goodness-of-fit test (and a continuous
distribution), and if each expected frequency is at least five then you can use the normal
distribution to approximate (much like the binomial). If an expected frequency is less
than 5, combine that category with an adjacent one so the requirement is met.

The following are properties of the goodness-of-fit test


 The data are the observed frequencies. This means that there is only one data value for
each category.
 The degree of freedom is one less than the number of categories, not one less than the
sample size.
 It is always a right tail test.
 It has a chi-square distribution.
 The value of the test statistic doesn't change if the order of the categories is switched.
 The test statistic is χ² = Σ (O − E)² / E, as given above.

Interpreting the Claim


There are four ways you might be given a claim.
1. The values occur with equal frequency. Other words for this are "uniform", "no
preference", or "no difference". To find the expected frequencies, total the observed
frequencies and divide by the number of categories. This quotient is the expected
frequency for each category.
2. Specific proportions or probabilities are given. To find the expected frequencies, multiply
the total of the observed frequencies by the probability for each category.
3. The expected frequencies are given to you. In this case, you don't have to do anything.
4. A specific distribution is claimed. For example, "The data is normally distributed". To
work a problem like this, you need to group the data and find the frequency for each
class. Then, find the probability of being within that class by converting the scores to z-
scores and looking up the probabilities. Finally, multiply the probabilities by the total
observed frequency. (It's not really as bad as it sounds).

One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies

We can use the chi-square statistic to test the distribution of measures over levels of a variable to
indicate if the distribution of measures is the same for all levels. This is the first use of the one-
variable chi-square test. This test is also referred to as the goodness-of-fit test.

Example 1: Fair Die: Suppose we wish to test if a die is weighted. We roll the die 120 times and
get the following “observed” results.
Roll Observed Expected
1 15
2 29
3 16
4 15
5 30
6 15
1. What is the expected distribution of the 120 die rolls? Complete the table.
2. Is the requirement for a chi-square goodness of fit test satisfied? Explain.
3. Write the null and alternative hypotheses for a goodness of fit test.
4. I can see that the rolls didn’t come out even. What’s the point of completing the test?
Solution:

Our goal is to see if the observed values are close enough to the expected values that the
differences could be due to random variation or, alternatively, if the differences are great enough
that we can conclude that the distribution is not as expected. Therefore, our sample statistic
(which is also the test statistic in this case) should provide a measure of how far, as a group, the
“observed” frequencies are from the “expected” frequencies. The test statistic for a goodness of
fit test is:

χ² = Σ (O − E)² / E

where O = observed frequency, E = expected frequency, and the sum is taken over all the
categories.
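
As a sketch (Python with SciPy assumed), the fair-die data from Example 1 can be tested
directly; scipy.stats.chisquare uses equal expected frequencies when none are given, which is
exactly the fair-die claim:

from scipy.stats import chisquare

observed = [15, 29, 16, 15, 30, 15]   # the 120 rolls from Example 1

# Expected frequency is 120/6 = 20 per face (chisquare assumes the uniform
# expected distribution when f_exp is omitted)
stat, p = chisquare(observed)
print(stat, p)   # statistic is about 13.6 on df = 6 - 1 = 5

# The 0.05 critical value on 5 df is about 11.07; since 13.6 > 11.07,
# we reject H0 that the die is fair.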

Example 2: The data for 100 students is recorded in the table below (the observed frequencies).
We have also indicated the expected frequency for each category. Since there are 100 measures
or observations and there are three categories (Macintosh, IBM, and Other) we would indicate
the expected frequency for each category to be 100/3 or 33.333. In the last column of the table
we have calculated the square of the observed frequency minus the expected frequency, divided
by the expected frequency. The sum of that column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer             Observed Frequency    Expected Frequency    (O − E)²/E
IBM                  47                    33.333                5.604
Macintosh            36                    33.333                0.213
Other                17                    33.333                8.003
Total (chi-square)                                               13.820

From the table we can see that:


(O  E )2
2   5.604  0.213  8.003  13.820
E
The df = C - 1 = 3 - 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and
with degrees of freedom of 2 obtained from Distribution of Chi Square table. Looking under the
column for .05 and the row for df = 2 we see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six step process for testing statistical
hypotheses for our research problem.

Solution:
Step 1: State the null hypothesis and the alternative hypothesis based on your research question.
H0: O = E
H1: O ≠ E

Note: Our null hypothesis, for the chi-square test, states that there are no differences between the
observed and the expected frequencies. The alternate hypothesis states that there are significant
differences between the observed and expected frequencies.

Step 2: Set the alpha level.


 = 0.5

Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I
error.

Step 3: Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for
the statistical test if necessary.
 2 =13.820
df = C - 1 = 2

Step 4: Write the decision rule for rejecting the null hypothesis.
Reject H0 if  >= 5.991.
2

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and
noting the tabled value for the column for the .05 level and the row for 2 df.

Step 5: Write a summary statement based on the decision.


Reject H0, p < .05

Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null
hypothesis and accept the alternative hypothesis.

Step 6: Write a statement of results in standard English.


There is a significant difference among the frequencies with which students purchased three
different brands of computers.

Example 3: Acme Toy Company prints baseball cards. The company claims that 30% of the
cards are rookies, 60% are veterans but not All-Stars, and 10% are veteran All-Stars.

Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this
consistent with Acme's claim? Use a 0.05 level of significance.

Solution:

Step 1: State the null hypothesis and the alternative hypothesis based on your research question.

 Null hypothesis: The proportion of rookies, veterans, and All-Stars is 30%, 60% and
10%, respectively.
 Alternative hypothesis: At least one of the proportions in the null hypothesis is false.

Step 2: Set the alpha level.


 = 0.5

Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I
error.

Step 3: Analyze sample data: Applying the chi-square goodness of fit test to sample data, we
compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic.
Based on the chi-square statistic and the degrees of freedom, we determine the P-value.

DF = k - 1 = 3 - 1 = 2
(Ei) = n * pi
(E1) = 100 * 0.30 = 30
(E2) = 100 * 0.60 = 60
(E3) = 100 * 0.10 = 10
Χ2 = Σ [ (Oi - Ei)2 / Ei ]
Χ2 = [ (50 - 30)2 / 30 ] + [ (45 - 60)2 / 60 ] + [ (5 - 10)2 / 10 ]
Χ2 = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58

where DF is the degrees of freedom, k is the number of levels of the categorical variable, n is the
number of observations in the sample, Ei is the expected frequency count for level i, Oi is the
observed frequency count for level i, and Χ2 is the chi-square test statistic.

The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more
extreme than 19.58.

Step 4: Interpret the result


Since the P-value (0.0001) is less than the significance level (0.05), we reject the null
hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple
random sampling, the variable under study was categorical, and each level of the categorical
variable had an expected frequency count of at least 5.

One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies


Let's look at the computer-brand problem we solved earlier, in a way that illustrates the other use
of the one-variable chi-square: with predetermined expected frequencies rather than with equal
frequencies. We could formulate our revised problem as follows:

Example 4: In a national study, students required to buy computers for college use bought IBM
computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of
the time. Of the 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought
IBM computers, and 17 bought some other brand of computer. We want to know whether this
frequency of computer-buying behavior is similar to or different from the national study data.

The data for 100 students is recorded in the table below (the observed frequencies). In this case
the expected frequencies are those from the national study. To get the expected frequency we
take the percentages from the national study times the total number of subjects in the current
study.
 Expected frequency for IBM = 100 X 50% = 50
 Expected frequency for Macintosh = 100 X 25% = 25
 Expected frequency for Other = 100 X 25% = 25

The expected frequencies are recorded in the third column of the table. As before we have
calculated the square of the observed frequency minus the expected frequency, divided by the
expected frequency, and recorded this result in the last column of the table. The sum of that
column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer             Observed Frequency    Expected Frequency    (O − E)²/E
IBM                  47                    50                    0.18
Macintosh            36                    25                    4.84
Other                17                    25                    2.56
Total (chi-square)                                               7.58

From the table we can see that:


χ² = 0.18 + 4.84 + 2.56 = 7.58
The df = C - 1 = 3 - 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and
with degrees of freedom of 2. We see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six-step process for testing statistical
hypotheses for our research problem.
Step 1: State the null hypothesis and the alternative hypothesis based on your research
question.
H0: O = E
H1: O ≠ E

Note: Our null hypothesis, for the chi-square test, states that there are no differences between the
observed and the expected frequencies. The alternate hypothesis states that there are significant
differences between the observed and expected frequencies.

Step 2: Set the alpha level.


 = 0.5

Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I
error.

Step 3: Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
χ² = 7.58
df = C - 1 = 2

Step 4: Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² ≥ 5.991.

Step 5: Write a summary statement based on the decision.


Reject H0, p < .05

Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null
hypothesis and accept the alternative hypothesis.

Step 6: Write a statement of results in Standard English.


There is a significant difference between the frequencies with which students purchased three
different brands of computers and the proportions suggested by the national study.

Test for Independence

The Chi-Square test of independence is used to determine if there is a significant relationship


between two nominal (categorical) variables. The frequency of each category for one nominal
variable is compared across the categories of the second nominal variable. The data can be
displayed in a contingency table where each row represents a category for one variable and each
column represents a category for the other variable. For example, say a researcher wants to
examine the relationship between gender (male vs. female) and empathy (high vs. low). The chi-
square test of independence can be used to examine this relationship. The null hypothesis for
this test is that there is no relationship between gender and empathy. The alternative hypothesis
is that there is a relationship between gender and empathy (e.g. there are more high-empathy
females than high-empathy males).

In the test for independence, the claim is that the row and column variables are independent of
each other. This is the null hypothesis.

The multiplication rule said that if two events were independent, then the probability of both
occurring was the product of the probabilities of each occurring. This is key to working the test
for independence. If you end up rejecting the null hypothesis, then the assumption must have
been wrong and the row and column variables are dependent. Remember, all hypothesis testing is
done under the assumption the null hypothesis is true.

The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the
test for independence is the same as the principle behind the goodness-of-fit test. The test for
independence is always a right tail test.

How to calculate the chi-square statistic by hand: First we have to calculate the expected value
of the two nominal variables. We can calculate the expected value of each cell by using this
formula:

Ei,j = (sum of row i × sum of column j) / N

where:

Ei,j = expected value for the cell in row i and column j

sum of row i = Σk Oi,k, the ith row total

sum of column j = Σk Ok,j, the jth column total

N = total number of observations

After calculating the expected value, we will apply the following formula to calculate the value
of the Chi-Square test of Independence:
χ² = Σi Σj (Oij − Eij)² / Eij

where:

χ² = chi-square test of independence statistic

Oij = observed value of the two nominal variables

Eij = expected value of the two nominal variables

Degree of freedom is calculated by using the following formula:

DF = (r-1)(c-1)

Where

DF = Degree of freedom

r = number of rows

c = number of columns

Hypothesis:

Null hypothesis: Assumes that there is no association between the two variables

Alternative hypothesis: Assumes that there is an association between the two variables

Hypothesis testing: Hypothesis testing for the chi-square test of independence proceeds the same
way as it does for other tests like ANOVA, where a test statistic is computed and compared to a
critical value. The
critical value for the chi-square statistic is determined by the level of significance (typically .05)
and the degrees of freedom. The degrees of freedom for the chi-square are calculated using the
following formula: df = (r-1)(c-1) where r is the number of rows and c is the number of columns.
If the observed chi-square test statistic is greater than the critical value, the null hypothesis can
be rejected.

Example 5: A public opinion poll surveyed a simple random sample of 1000 voters.
Respondents were classified by gender (male or female) and by voting preference (Republican,
Democrat, or Independent). Results are shown in the contingency table below.

                 Voting Preferences             Row Total
                 Rep      Dem      Ind
Male             200      150      50           400
Female           250      300      50           600
Column Total     450      450      100          1000

Is there a gender gap? Do the men's voting preferences differ significantly from the women's
preferences? Use a 0.05 level of significance.

Solution:

Step 1: State the Hypothesis. The first step is to state the null hypothesis and an alternative
hypothesis.

Ho: Gender and voting preferences are independent.

Ha: Gender and voting preferences are not independent.

Step 2: Formulate an analysis plan. For this analysis, the significance level is 0.05. Using
sample data, we will conduct a chi-square test for independence.

Step 3: Analyze sample data. Applying the chi-square test for independence to sample data, we
compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic.
Based on the chi-square statistic and the degrees of freedom, we determine the P-value.

DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2

Er,c = (nr * nc) / n

E1,1 = (400 * 450) / 1000 = 180000/1000 = 180

E1,2 = (400 * 450) / 1000 = 180000/1000 = 180

E1,3 = (400 * 100) / 1000 = 40000/1000 = 40

E2,1 = (600 * 450) / 1000 = 270000/1000 = 270

E2,2 = (600 * 450) / 1000 = 270000/1000 = 270

E2,3 = (600 * 100) / 1000 = 60000/1000 = 60

Χ2 = Σ [ (Or,c - Er,c)2 / Er,c ]


Χ2 = (200 - 180)2/180 + (150 - 180)2/180 + (50 - 40)2/40

+ (250 - 270)2/270 + (300 - 270)2/270 + (50 - 60)2/60

Χ2 = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60

Χ2 = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2

where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels
of the voting preference, nr is the number of observations from level r of gender, nc is the
number of observations from level c of voting preference, n is the number of observations in the
sample, Er,c is the expected frequency count when gender is level r and voting preference is
level c, and Or,c is the observed frequency count when gender is level r voting preference is
level c.

The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more
extreme than 16.2.

We use the Chi-Square Distribution Calculator to find P(Χ2 > 16.2) = 0.0003.

Step 4: Interpret results. Since the P-value (0.0003) is less than the significance level (0.05), we
reject the null hypothesis. Thus, we conclude that there is a relationship between gender and
voting preference.

Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple
random sampling, the variables under study were categorical, and the expected frequency count
was at least 5 in each cell of the contingency table.
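
The same computation can be sketched in Python (SciPy assumed); scipy.stats.chi2_contingency
takes the observed contingency table and returns the statistic, P-value, degrees of freedom, and
expected counts:

from scipy.stats import chi2_contingency

observed = [[200, 150, 50],    # male row
            [250, 300, 50]]    # female row

stat, p, dof, expected = chi2_contingency(observed)
print(stat, p, dof)   # about 16.2, 0.0003, and 2, matching the hand computation
print(expected)       # the same expected counts calculated above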

Chi-square is a useful non-parametric statistic to help evaluate statistical hypotheses involving
the frequencies with which observations fall in various categories (nominal data).

Definitions of Key Terms


Chi-square distribution
 A distribution obtained by multiplying the ratio of the sample variance to the population
variance by the degrees of freedom when random samples are selected from a normally
distributed population

Contingency Table
 Data arranged in table form for the chi-square independence test

Expected Frequency
 The frequencies obtained by calculation.

Goodness-of-fit Test
 A test to see if a sample comes from a population with the given distribution.
Independence Test
 A test to see if the row and column variables are independent.

Observed Frequency
 The frequencies obtained by observation. These are the sample frequencies.

Unit VIII

F-test

LEARNING OBJECTIVES

In this unit, we are going to understand another statistical test, one which uses the
F-distribution. By the end of this unit, the following targets are to be accomplished.

 To describe the properties and application of the F-test


 To compare the variance of two samples using the F-test
 To compare the means of three or more independent samples using one-way ANOVA
with the aid of Tukey's HSD or Scheffe's test
 To compare the means of three or more independent samples with two variables using
Two-way ANOVA

I. F-test

Definition:

An “F test” is a catch-all term for any test that uses the F-distribution. In most cases, when
people talk about the F-test, what they are actually talking about is the F-test to compare two
variances. However, the F-statistic is used in a variety of tests including regression analysis, the
Chow test and the Scheffe test (a post-hoc ANOVA test).

The F-test has some applications that are used in statistical theory; this section details those
applications.

The F-test is used by a researcher in order to carry out the test for the equality of the two
population variances. If a researcher wants to test whether or not two independent samples have
been drawn from a normal population with the same variability, then he generally employs the F-
test.
The F-test is also used by the researcher to determine whether or not the two independent
estimates of the population variances are homogeneous in nature.

As an example of the above case, suppose two sets of pumpkins are grown under two different
experimental conditions. The researcher selects random samples of size 9 and 11. The standard
deviations of their weights are 0.6 and 0.8 respectively. After making the assumption that the
distribution of their weights is normal, the researcher conducts an F-test to test the hypothesis of
whether or not the true variances are equal.

The researcher uses the F-test to test the significance of an observed multiple correlation
coefficient. It is also used by the researcher to test the significance of an observed sample
correlation ratio. The sample correlation ratio is defined as a measure of association as the
statistical dispersion in the categories within the sample as a whole. Its significance is tested by
the researcher.

The F-distribution is formed by the ratio of two independent chi-square variables divided by their
respective degrees of freedom.

Since F is formed from chi-square variables, many of the chi-square properties carry over to the
F distribution.

F = [ (df₁ · s₁²/σ₁²) / df₁ ] / [ (df₂ · s₂²/σ₂²) / df₂ ] = (s₁²/σ₁²) / (s₂²/σ₂²)


 The F-values are all non-negative
 The distribution is non-symmetric
 The mean is approximately 1
 There are two independent degrees of freedom, one for the numerator, and one for the
denominator.
 There are many different F distributions, one for each pair of degrees of freedom.

F-Test
The F-test is designed to test if two population variances are equal. It does this by comparing the
ratio of two variances. So, if the variances are equal, the ratio of the variances will be 1.

F = s₁² / s₂²
All hypothesis testing is done under the assumption the null hypothesis is true

If the null hypothesis is true, then the F test-statistic given above can be simplified
(dramatically). This ratio of sample variances will be the test statistic used. If the null hypothesis
is false, then we will reject the null hypothesis that the ratio was equal to 1, along with our
assumption that the variances were equal.

There are several different F-tables. Each one has a different level of significance. So, find the
correct level of significance first, and then look up the numerator degrees of freedom and the
denominator degrees of freedom to find the critical value.

You will notice that all of the tables only give level of significance for right tail tests. Because
the F distribution is not symmetric, and there are no negative values, you may not simply take
the opposite of the right critical value to find the left critical value. The way to find a left critical
value is to reverse the degrees of freedom, look up the right critical value, and then take the
reciprocal of this value. For example, the critical value with 0.05 on the left with 12 numerator
and 15 denominator degrees of freedom is found of taking the reciprocal of the critical value
with 0.05 on the right with 15 numerator and 12 denominator degrees of freedom.

Avoiding Left Critical Values


Since the left critical values are a pain to calculate, they are often avoided altogether. This is the
procedure followed in the textbook. You can force the F test into a right tail test by placing the
sample with the large variance in the numerator and the smaller variance in the denominator. It
does not matter which sample has the larger sample size, only which sample has the larger
variance.

The numerator degrees of freedom will be the degrees of freedom for whichever sample has the
larger variance (since it is in the numerator) and the denominator degrees of freedom will be the
degrees of freedom for whichever sample has the smaller variance (since it is in the
denominator).

If a two-tail test is being conducted, you still have to divide alpha by 2, but you only look up and
compare the right critical value.

Assumptions / Notes
 The larger variance should always be placed in the numerator
 The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
 Divide alpha by 2 for a two-tail test and then find the right critical value
 If standard deviations are given instead of variances, they must be squared
 When the degrees of freedom aren't given in the table, go with the value with the larger
critical value (this happens to be the smaller degrees of freedom). This is so that you are
less likely to reject in error (type I error)
 The populations from which the samples were obtained must be normal.
 The samples must be independent
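
As a worked sketch (Python, SciPy assumed), here is the pumpkin example from earlier in this
unit, with samples of size 9 and 11 and standard deviations 0.6 and 0.8:

from scipy.stats import f

n1, s1 = 9, 0.6     # first sample
n2, s2 = 11, 0.8    # second sample (the larger variance)
alpha = 0.05

# Larger variance in the numerator forces a right-tail test
F = s2**2 / s1**2                 # 0.64 / 0.36, about 1.78
dfn, dfd = n2 - 1, n1 - 1         # 10 and 8

# Two-tail test: divide alpha by 2, use only the right critical value
crit = f.ppf(1 - alpha / 2, dfn, dfd)
print(F, crit)   # 1.78 < 4.30, so we fail to reject equal variances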

General for an F test


If you’re running an F Test, you should use Excel, SPSS, Minitab or some other kind of
technology to run the test. Why? Calculating the F test by hand, including variances, is tedious
and time-consuming. Therefore you’ll probably make some errors along the way.

Step 1: State the null hypothesis and the alternate hypothesis.

Step 2: Calculate the F value. The F value is calculated using the formula
F = ((SSE₁ − SSE₂)/m) / (SSE₂/(n − k)), where SSE = residual sum of squares, m = number of
restrictions, and k = number of independent variables.

Step 3: Find the F Statistic (the critical value for this test). The F statistic formula is:

F Statistic = variance of the group means / mean of the within group variances.

You can find the F Statistic in the F-Table.

Step 4: Support or Reject the Null Hypothesis.

One-way ANOVA

The one-way analysis of variance (ANOVA) is used to determine whether there are any
statistically significant differences between the means of three or more independent (unrelated)
groups. This guide will provide a brief introduction to the one-way ANOVA, including the
assumptions of the test and when you should use this test.

What does this test do?

The one-way ANOVA compares the means between the groups you are interested in and
determines whether any of those means are statistically significantly different from each other.
Specifically, it tests the null hypothesis:

H0: µ₁ = µ₂ = µ₃ = … = µk

where µ = group mean and k = number of groups. If, however, the one-way ANOVA returns a
statistically significant result, we accept the alternative hypothesis (HA), which is that there are at
least two group means that are statistically significantly different from each other.

At this point, it is important to realize that the one-way ANOVA is an omnibus test statistic and
cannot tell you which specific groups were statistically significantly different from each other,
only that at least two groups were. To determine which specific groups differed from each
other, you need to use a post hoc test. Post hoc tests are described later in this guide.

For some statisticians the ANOVA doesn’t end there – they assume a cause-and-effect relationship
and say that one or more independent, controlled variables (the factors) cause the significant
difference of one or more characteristics. The way this works is that the factors sort the data
points into one of the groups and therefore they cause the difference in the mean value of the
groups.

Example: Let us claim that women have on average longer hair than men. We find twenty
undergraduate students and measure the length of their hair. A conservative statistician would
then claim we measured the hair of ten female and ten male students, and that we conducted an
analysis of variance and found that the average hair of female undergraduate students is
significantly longer than that of their fellow male students.

Assumptions
 The populations from which the samples were obtained must be normally or
approximately normally distributed.
 The samples must be independent.
 The variances of the populations must be equal.

Hypotheses
The null hypothesis will be that all population means are equal, the alternative hypothesis is that
at least one mean is different.

In the following, lower case letters apply to the individual samples and capital letters apply to the
entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.

Grand Mean

X̄_GM = Σx / N

The grand mean of a set of samples is the total of all the data values divided by the total sample
size. This requires that you have all of the sample data available to you, which is usually the
case, but not always. It turns out that all that is necessary to perform a one-way analysis of
variance are the number of samples, the sample means, the sample variances, and the sample
sizes.

X̄_GM = Σ(n · x̄) / Σn
Another way to find the grand mean is to find the weighted average of the sample means. The
weight applied is the sample size.

Total Variation

SS(T) = Σ (x − X̄_GM)²

The total variation (not variance) comprises the sum of the squares of the differences of each
data value from the grand mean.

There is the between group variation and the within group variation. The whole idea behind the
analysis of variance is to compare the ratio of between group variance to within group variance.
If the variance caused by the interaction between the samples is much larger when compared to
the variance that appears within each group, then it is because the means aren't the same.

Between Group Variation


SS(B) = Σ n (x̄ − X̄_GM)²

The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares
Between groups. If the sample means are close to each other (and therefore the Grand Mean) this
will be small. There are k samples involved with one data value for each sample (the sample
mean), so there are k – 1 degrees of freedom.

The variance due to the interaction between the samples is denoted MS(B) for Mean Square
Between groups. This is the between group variation divided by its degrees of freedom. It is also
denoted by s_b².
Within Group Variation
SS(W) = Σ df · s²
The variation due to differences within individual samples, denoted SS(W) for Sum of Squares
Within groups. Each sample is considered independently, no interaction between samples is
involved. The degree of freedom is equal to the sum of the individual degrees of freedom for
each sample. Since each sample has degrees of freedom equal to one less than their sample sizes,
and there are k samples, the total degrees of freedom is k less than the total sample size:

df = N – k

The variance due to the differences within individual samples is denoted MS(W) for Mean
Square Within groups. This is the within group variation divided by its degrees of freedom. It is
also denoted by s_w². It is the weighted average of the variances (weighted with the degrees of
freedom).

F test statistic
F = s_b² / s_w²

Recall that an F variable is the ratio of two independent chi-square variables divided by their
respective degrees of freedom. Also recall that the F test statistic is the ratio of two sample
variances; it turns out that's exactly what we have here. The F test statistic is found by
dividing the between group variance by the within group variance. The degrees of freedom for
the numerator are the degrees of freedom for the between group (k-1) and the degrees of freedom
for the denominator are the degrees of freedom for the within group (N-k).

Example 1: Suppose the National Transportation Safety Board (NTSB) wants to examine the
safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of
the treatments (cars types). Using the hypothetical data provided below, test whether the mean
pressure applied to the driver’s head during a crash test is equal for each type of car. Use α =
5%.

Table ANOVA.1

        Compact Cars    Midsize Cars    Full-size Cars
        643             469             484
        655             427             456
        702             525             402
Mean    666.67          473.67          447.33
S       31.18           49.17           41.68

Step 1: State the hypothesis

The null hypothesis for an ANOVA always assumes the population means are equal. Hence, we
may write the null hypothesis as:

H0: µ1 = µ 2 = µ 3 - The mean head pressure is statistically equal across the three types of cars.

Since the null hypothesis assumes all the means are equal, we could reject the null hypothesis if
only one mean is not equal. Thus, the alternative hypothesis is:

Ha: At least one mean pressure is not statistically equal.

Step 2: Calculate the appropriate test statistic

The test statistic in ANOVA is the ratio of the between and within variation in the data. It
follows an F distribution.

Total Sum of Squares – the total variation in the data. It is the sum of the between and within
variation.
Total Sum of Squares (SST) = Σᵢ Σⱼ (Xᵢⱼ − X̄)², where r is the number of rows in the table, c is
the number of columns, X̄ is the grand mean, and Xᵢⱼ is the ith observation in the jth column.

X̄ = ΣXᵢⱼ / N = (643 + 655 + 702 + 469 + 427 + 525 + 484 + 456 + 402) / 9 = 529.22

SST = (643 − 529.22)² + … + (402 − 529.22)² = 96303.55

Between Sum of Squares (or Treatment Sum of Squares) – variation in the data between the
different samples (or treatments).

Treatment Sum of Squares (SSTR) = Σ rⱼ (X̄ⱼ − X̄)², where rⱼ is the number of observations in
the jth treatment and X̄ⱼ is the mean of the jth treatment.

SSTR = 3(666.67 − 529.22)² + 3(473.67 − 529.22)² + 3(447.33 − 529.22)² = 86049.55

The error (within) sum of squares is SSE = SST − SSTR = 96303.55 − 86049.55 = 10254.00.
Dividing each by its degrees of freedom gives MSTR = SSTR/(c − 1) = 86049.55/2 = 43024.78
and MSE = SSE/(N − c) = 10254.00/6 = 1709.00, so

F = 43024.78 / 1709 = 25.17

Step 3: Obtain the Critical Value

To find the critical value from an F distribution you must know the numerator (MSTR) and
denominator (MSE) degrees of freedom, along with the significance level.

FCV has df1 and df2 degrees of freedom, where df1 is the numerator degrees of freedom equal
to c-1 and df2 is the denominator degrees of freedom equal to N-c.

In our example, df1 = 3 − 1 = 2 and df2 = 9 − 3 = 6. Hence we need to find the critical value
F2,6 corresponding to α = 5%. Using the F tables in your text we determine that F2,6 = 5.14.

Step 4: Decision Rule

You reject the null hypothesis if: F (observed value) > FCV (critical value). In our example
25.17 > 5.14, so we reject the null hypothesis.

Step 5: Interpretation

Since we rejected the null hypothesis, we are 95% confident (1 − α) that the mean head pressure
is not statistically equal for compact, midsize, and full-size cars. However, since only one mean
must be different to reject the null, we do not yet know which mean(s) is/are different. In short,
an ANOVA test will tell us that at least one mean is different, but an additional test must be
conducted to determine which mean(s) is/are different.
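
For reference, a minimal sketch of Example 1 in Python, assuming SciPy is installed:

from scipy.stats import f_oneway

compact  = [643, 655, 702]
midsize  = [469, 427, 525]
fullsize = [484, 456, 402]

F, p = f_oneway(compact, midsize, fullsize)
print(F, p)   # F is about 25.17 with a P-value near 0.001, so reject H0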

Determining Which Mean(s) Is/Are Different

If you fail to reject the null hypothesis in an ANOVA then you are done. You know, with some
level of confidence, that the treatment means are statistically equal. However, if you reject the
null then you must conduct a separate test to determine which mean(s) is/are different. There are
several techniques for testing the differences between means, but the most common test is the
Least Significant Difference Test.

SCHEFFE' AND TUKEY TESTS

Definition:

The Scheffe Test (also called Scheffe’s procedure or Scheffe’s method) is a post-hoc test used
in Analysis of Variance. It is named for the American statistician Henry Scheffe. After you have
run ANOVA and got a significant F-statistic (i.e. you have rejected the null hypothesis that the
means are the same), then you run Scheffe’s test to find out which pairs of means are significant.
The Scheffe test corrects alpha for simple and complex mean comparisons. Complex mean
comparisons involve comparing more than one pair of means simultaneously.

Of the three mean comparison tests you can run (the other two are Fisher’s LSD and Tukey’s
HSD), the Scheffe test is the most flexible, but it is also the test with the lowest statistical
power. Deciding which test to run largely depends on what comparisons you’re interested in:
 If you only want to make pairwise comparisons, run the Tukey procedure because it will
have a narrower confidence interval.
 If you want to compare all possible simple and complex pairs of means, run the Scheffe
test as it will have a narrower confidence interval.

This is where the Scheffe' and Tukey tests come into play. They will help us analyze pairs of
means to see if there is a difference -- much like the difference of two means covered earlier.

Hypotheses
H0 : i   j
H1 : i   j

Both tests are set up to test if pairs of means are different. The formulas refer to mean i and mean
j. The values of i and j vary, and the total number of tests will be equal to a combination of k
objects, 2 at a time C(k, 2), where k is the number of samples.

Scheffé Test
The Scheffe' test is customarily used with unequal sample sizes, although it could be used with
equal sample sizes.

The critical value for the Scheffe' test is the degrees of freedom for the between variance times
the critical value for the one-way ANOVA. This simplifies to be:

CV = (k – 1) F(k – 1, N – k, alpha)

The test statistic is a little bit harder to compute.


 xi  x j 
2

TS : Fs 
 1 1 
sw2   
 ni n j 
 

Pure mathematicians will argue that this shouldn't be called F because it doesn't have an F
distribution (it's the degrees of freedom times an F), but we'll live with it.

Reject H0 if the test statistic is greater than the critical value. Note, this is a right tail test. If there
is no difference between the means, the numerator will be close to zero, and so performing a left
tail test wouldn't show anything.

Tukey Test
The Tukey test is only usable when the sample sizes are the same.

The Critical Value is looked up in a table. There are actually several different tables, one for
each level of significance. The number of samples, k, is used as an index along the top, and the
degrees of freedom for the within group variance, v = N − k, are used as an index along the left
side.

xi  x j
TS : q 
sw2 / n

The test statistic is found by dividing the difference between the means by the square root of the
ratio of the within group variation and the sample size.

Reject the null hypothesis if the absolute value of the test statistic is greater than the critical
value (just like the linear correlation coefficient critical values).
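
Recent SciPy versions include scipy.stats.tukey_hsd, so as a sketch we can run the pairwise
comparisons for the equal-sized car samples from Example 1 of the one-way ANOVA:

from scipy.stats import tukey_hsd   # available in SciPy 1.8+

compact  = [643, 655, 702]
midsize  = [469, 427, 525]
fullsize = [484, 456, 402]

result = tukey_hsd(compact, midsize, fullsize)
print(result)   # pairwise differences, confidence intervals, and p-values;
                # compact differs from the other two groups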

Two-way ANOVA

Definition

The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors). The primary purpose of a two-way ANOVA is to
understand if there is an interaction between the two independent variables on the dependent
variable. For example, you could use a two-way ANOVA to understand whether there is an
interaction between gender and educational level on test anxiety amongst university students,
where gender (males/females) and education level (undergraduate/postgraduate) are your
independent variables, and test anxiety is your dependent variable. Alternately, you may want to
determine whether there is an interaction between physical activity level and gender on blood
cholesterol concentration in children, where physical activity (low/moderate/high) and gender
(male/female) are your independent variables, and cholesterol concentration is your dependent
variable.

Assumption:

 Your dependent variable should be measured at the continuous level (i.e., they are
interval or ratio variables). Examples of continuous variables include revision time
(measured in hours), intelligence (measured using IQ score), exam performance
(measured from 0 to 100), weight (measured in kg), and so forth.
 Your two independent variables should each consist of two or more categorical,
independent groups. Example independent variables that meet this criterion include
gender (2 groups: male or female), ethnicity (3 groups: Caucasian, African American and
Hispanic), profession (5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.
 You should have independence of observations, which means that there is no relationship
between the observations in each group or between the groups themselves. For example,
there must be different participants in each group with no participant being in more than
one group. This is more of a study design issue than something you would test for, but it
is an important assumption of the two-way ANOVA. If your study fails this assumption,
you will need to use another statistical test instead of the two-way ANOVA (e.g., a
repeated measures design).
 There should be no significant outliers. Outliers are data points within your data that do
not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean
score was 108 with only a small variation between students, one student had a score of
156, which is very unusual, and may even put her in the top 1% of IQ scores globally).
The problem with outliers is that they can have a negative effect on the two-way
ANOVA, reducing the accuracy of your results.
 Your dependent variable should be approximately normally distributed for each
combination of the groups of the two independent variables.
 There needs to be homogeneity of variances for each combination of the groups of the
two independent variables.
Hypotheses
There are three sets of hypotheses with the two-way ANOVA.

The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way ANOVA for
the row factor.
2. The population means of the second factor are equal. This is like the one-way ANOVA
for the column factor.
3. There is no interaction between the two factors. This is similar to performing a test for
independence with contingency tables.
Factors
The two independent variables in a two-way ANOVA are called factors. The idea is that there
are two variables, factors, which affect the dependent variable. Each factor will have two or more
levels within it, and the degrees of freedom for each factor is one less than the number of levels.

Treatment Groups
Treatment Groups are formed by making all possible combinations of the two factors. For
example, if the first factor has 3 levels and the second factor has 2 levels, then there will be 3 × 2
= 6 different treatment groups.

As an example, let's assume we're planting corn. The type of seed and type of fertilizer are the
two factors we're considering in this example. With 3 types of seed and 5 types of fertilizer, this
example has 3 × 5 = 15 treatment groups. There are 3 − 1 = 2 degrees of freedom for the type of
seed, and 5 − 1 = 4 degrees of freedom for the type of fertilizer. There are 2 × 4 = 8 degrees of
freedom for the interaction between the type of seed and type of fertilizer.

The data that actually appears in the table are samples. In this case, 2 samples from each
treatment group were taken.

  Fert I Fert II Fert III Fert IV Fert V


Seed A-402 106, 110 95, 100 94, 107 103, 104 100, 102
Seed B-894 110, 112 98, 99 100, 101 108, 112 105, 107
Seed C-952 94, 97 86, 87 98, 99 99, 101 94, 98

Main Effect
The main effect involves the independent variables one at a time. The interaction is ignored for
this part. Just the rows or just the columns are used, not mixed. This is the part which is similar
to the one-way analysis of variance. Each of the variances calculated to analyze the main effects
is like the between variances.

Interaction Effect
The interaction effect is the effect that one factor has on the other factor. The degrees of freedom
here are the product of the two degrees of freedom for each factor.

Within Variation
The Within variation is the sum of squares within each treatment group. You have one less than
the sample size (remember all treatment groups must have the same sample size for a two-way
ANOVA) for each treatment group. The total number of treatment groups is the product of the
number of levels for each factor. The within variance is the within variation divided by its
degrees of freedom.

The within group is also called the error.


F-Tests
There is an F-test for each of the hypotheses, and the F-test is the mean square for each main
effect and the interaction effect divided by the within variance. The numerator degrees of
freedom come from each effect, and the denominator degrees of freedom is the degrees of
freedom for the within variance in each case.

Two-Way ANOVA Table


It is assumed that main effect A has a levels (and A = a – 1 df), main effect B has b levels (and B
= b – 1 df), n is the sample size of each treatment, and N = abn is the total sample size. Notice
the overall degrees of freedom is once again one less than the total sample size.

Source               SS               df                   MS        F
Main Effect A        given            A: a − 1             SS / df   MS(A) / MS(W)
Main Effect B        given            B: b − 1             SS / df   MS(B) / MS(W)
Interaction Effect   given            AB: (a − 1)(b − 1)   SS / df   MS(AB) / MS(W)
Within               given            N − ab = ab(n − 1)   SS / df
Total                sum of others    N − 1 = abn − 1

Summary
The following results are calculated using a spreadsheet. It provides the p-values; the critical
values are for alpha = 0.05.

Source of Variation   SS           df   MS         F        P-value    F-crit
Seed                  512.8667     2    256.4333   28.283   0.000008   3.682
Fertilizer            449.4667     4    112.3667   12.393   0.000119   3.056
Interaction           143.1333     8    17.8917    1.973    0.122090   2.641
Within                136.0000     15   9.0667
Total                 1241.4667    29

From the above results, we can see that the main effects are both significant, but the interaction
between them isn't. That is, the types of seed aren't all equal, and the types of fertilizer aren't all
equal, but the type of seed doesn't interact with the type of fertilizer.
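
Those numbers can be reproduced with a short sketch (assuming Python with pandas and
statsmodels installed), built from the seed/fertilizer data table above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Two yield observations for each (seed, fertilizer) treatment group
data = {
    ("A-402", "I"): [106, 110], ("A-402", "II"): [95, 100],
    ("A-402", "III"): [94, 107], ("A-402", "IV"): [103, 104],
    ("A-402", "V"): [100, 102],
    ("B-894", "I"): [110, 112], ("B-894", "II"): [98, 99],
    ("B-894", "III"): [100, 101], ("B-894", "IV"): [108, 112],
    ("B-894", "V"): [105, 107],
    ("C-952", "I"): [94, 97], ("C-952", "II"): [86, 87],
    ("C-952", "III"): [98, 99], ("C-952", "IV"): [99, 101],
    ("C-952", "V"): [94, 98],
}
rows = [(seed, fert, y) for (seed, fert), ys in data.items() for y in ys]
df = pd.DataFrame(rows, columns=["seed", "fert", "y"])

# Two-way ANOVA with interaction; the design is balanced, so the sums of
# squares match the table above
model = ols("y ~ C(seed) * C(fert)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))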
