FLIPPED NOTES NUMBER 8
Hypothesis Testing

In partial fulfilment of the requirements for the course
ENGINEERING DATA ANALYSIS

By:
SUBONG, JOEMAR D.
BACANI, VALERIE ELAINE M.
DOCA, AL JOHNKENETH A.
TANNAGAN, NOREEN G.

January 06, 2020
UNIT I
Hypothesis Testing
LEARNING OUTCOMES:
In this lesson we are going to look at the parts of hypothesis testing, which are essential to thoroughly understanding the lesson as a whole. At the end of this unit, the following targets are to be accomplished:
1. to formulate null and alternative hypotheses and distinguish one-sided from two-sided tests;
2. to describe Type I and Type II errors, the level of significance, and rejection regions.
From previous experience we know that the birth weights of babies in England are normally distributed with a mean of 3000g and a standard deviation of 500g.
We think that maybe babies in Australia have a mean birth weight greater than 3000g and we
would like to test this hypothesis.
The statistical method used to make decisions from experimental data, based on an assumption we make about a population parameter, is called hypothesis testing.
A hypothesis test is a formal way to make a decision based on statistical analysis. It refers to the
process of making inferences or educated guesses about a particular parameter. This can either
be done using statistics and sample data, or it can be done on the basis of an uncontrolled
observational study.
Formulating Hypotheses
Setting up a new study or theory always begins with forming an assumption or claim, known in the field of statistics as a hypothesis.
Hypothesis testing is a set of logical and statistical guidelines used to make decision from sample
statistics to population statistics. The intent of hypothesis testing is to formally examine two
opposing hypotheses, the null and alternative hypotheses. These two hypotheses are mutually
exclusive and exhaustive.
1. Null hypothesis, denoted as H0, is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary. The null hypothesis attempts to show that no variation exists between variables or that a single variable is no different from its mean. It is presumed to be true until statistical evidence nullifies it in favor of an alternative hypothesis. Simply put, the null hypothesis is a type of conjecture used in statistics that proposes that no statistical significance exists in a set of given observations.
2. Alternative hypothesis, denoted as Ha, is the statement that contradicts the null hypothesis; it is what we conclude when the sample evidence leads us to reject H0.
For example, suppose a hypothesis test is set up so that the null hypothesis states that the mean cook time of a product is 12 minutes. The alternative hypothesis then states that the population mean is not equal to 12 minutes; it could be less than or greater than the stated value. If the statistical test indicates that the population mean could well be 12 minutes, the null hypothesis is not rejected and the alternative hypothesis is rejected, and vice versa.
Two-sided
Use a two-sided alternative hypothesis (also known as a nondirectional hypothesis) to determine
whether the population parameter is either greater than or less than the hypothesized value. A
two-sided test can detect when the population parameter differs in either direction, but has less
power than a one-sided test.
For example, a researcher has results for a sample of students who took a national exam at a high
school. The researcher wants to know if the scores at that school differ from the national average
of 850. A two-sided alternative hypothesis (also known as a nondirectional hypothesis) is
appropriate because the researcher is interested in determining whether the scores are either less
than or greater than the national average. (H0: μ = 850 vs. Ha: μ≠ 850)
One-sided
Use a one-sided alternative hypothesis (also known as a directional hypothesis) to determine
whether the population parameter differs from the hypothesized value in a specific direction.
You can specify the direction to be either greater than or less than the hypothesized value. A one-
sided test has greater power than a two-sided test, but it cannot detect whether the population
parameter differs in the opposite direction.
For example, a researcher has exam results for a sample of students who took a training course
for a national exam. The researcher wants to know if trained students score above the national
average of 850. A one-sided alternative hypothesis (also known as a directional hypothesis) can
be used because the researcher is specifically hypothesizing that scores for trained students are
greater than the national average. (H0: μ = 850 vs. Ha: μ > 850)
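To make the one-sided versus two-sided distinction concrete, here is a minimal Python sketch (assuming Python with scipy is available). The hypothesized national average of 850 comes from the example; the sample size, sample mean, and population standard deviation are made-up illustrative values, not figures from these notes.

# Hypothetical exam-score data: only mu0 = 850 is taken from the text.
from math import sqrt
from scipy.stats import norm

mu0 = 850                            # hypothesized national average (from the example)
n, xbar, sigma = 40, 865.0, 60.0     # assumed sample size, sample mean, population sd

z = (xbar - mu0) / (sigma / sqrt(n)) # standardized test statistic
p_two_sided = 2 * norm.sf(abs(z))    # Ha: mu != 850 (nondirectional)
p_one_sided = norm.sf(z)             # Ha: mu > 850 (directional)
print(f"z = {z:.3f}, two-sided P = {p_two_sided:.4f}, one-sided P = {p_one_sided:.4f}")

Note how the one-sided P-value is half of the two-sided one here, which is why a one-sided test has more power in the hypothesized direction but cannot detect a difference in the opposite direction.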
Hypothesis testing allows a mathematical model to validate or reject a null hypothesis within a
certain confidence level.
The end result of a hypothesis testing procedure is a choice of one of the following two possible conclusions:
1. Reject H0 (and therefore accept Ha), or
2. Do not reject H0 (and therefore do not accept Ha).
As a running example, suppose a manufacturer of emergency respirators claims that its units deliver air for an average of 75 minutes. The null hypothesis typically represents the status quo, or what has historically been true. In this example we would believe the claim of the manufacturer unless there is reason not to do so, so the null hypothesis is H0: μ = 75. The alternative hypothesis in the example is the contradictory statement Ha: μ < 75. The null hypothesis will always be an assertion containing an equals sign, but depending on the situation the alternative hypothesis can have any one of three forms: with the symbol <, as in the example just discussed, with the symbol >, or with the symbol ≠. The following two examples illustrate the latter two cases.
Example 1. The recipe for a bakery item is designed to result in a product that contains 8 grams
of fat per serving. The quality control department samples the product periodically to insure that
the production process is working as designed. State the relevant null and alternative hypotheses.
Solution: The default option is to assume that the product contains the amount of fat it was
formulated to contain unless there is compelling evidence to the contrary. Thus the null
hypothesis is H0:μ=8.0 . Since to contain either more fat than desired or to contain less fat than
desired are both an indication of a faulty production process, the alternative hypothesis in this
situation is that the mean is different from 8.0, so Ha:μ≠8.0.
Example 2. A publisher of college textbooks claims that the average price of all hardbound
college textbooks is $127.50 . A student group believes that the actual mean is higher and
wishes to test their belief. State the relevant null and alternative hypotheses.
Solution: The default option is to assume that the publisher's claim is true unless there is compelling evidence to the contrary. Thus the null hypothesis is H0: μ = 127.50. Since the student group believes that the actual mean price is higher than the publisher's figure, the alternative hypothesis in this situation is Ha: μ > 127.50.
In order to make the null and alternative hypotheses easy to distinguish, in every example and
problem in this text we will always present one of the two competing claims about the value of a
parameter with an equality. The claim expressed with an equality is the null hypothesis. This is
the same as always stating the null hypothesis in the least favorable light. So in the introductory
example about the respirators, we stated the manufacturer’s claim as “the average is 75
minutes” instead of the perhaps more natural “the average is at least 75 minutes,” essentially
reducing the presentation of the null hypothesis to its worst case
The format of the testing procedure in general terms is to take a sample and use the information it contains to come to a decision about the two hypotheses. As stated before, our decision will always be either to reject H0 or not to reject H0.
There are four possible outcomes of a hypothesis testing procedure, as shown in the following table:

                          True State of Nature
Decision               H0 is true          H0 is false
Do not reject H0       Correct decision    Type II error
Reject H0              Type I error        Correct decision

The null hypothesis can be either true or false; further, we will make a conclusion either to reject or not to reject the null hypothesis. Thus, there are four possible situations that may arise in testing a hypothesis: the two correct decisions, rejecting a true H0 (a Type I error), and failing to reject a false H0 (a Type II error).
Definition of Level of Significance
The number α that is used to determine the rejection region is called the level of significance of the test. It is the probability that the test procedure will result in a Type I error, that is, of rejecting the null hypothesis when it is in fact true. (A Type II error, in contrast, is failing to reject the null hypothesis when it is in fact false; its probability is denoted β.) The following example illustrates how hypotheses are formulated and how the two types of error are interpreted.
The probability of making a Type II error is too complicated to discuss in a beginning text, so we
will say no more about it than this: for a fixed sample size, choosing alpha smaller in order to
reduce the chance of making a Type I error has the effect of increasing the chance of making a
Type II error. The only way to simultaneously reduce the chances of making either kind of error
is to increase the sample size.
Example 3. A quality control engineer monitors a metal lathe that is set to produce bearings with a mean diameter of .5 inch. Periodically a sample of bearings is measured to decide whether the process is in control (μ = .5 inch) or out of control (μ ≠ .5 inch). Describe a Type I error and a Type II error for this test and their practical consequences.
Solution: A Type I error is the error of incorrectly rejecting the null hypothesis. In our example, this would occur if we conclude that the process is out of control when in fact the process is in control, i.e., if we conclude that the mean bearing diameter is different from .5 inch, when in fact the mean is equal to .5 inch. The consequence of making such an error would be that unnecessary time and effort would be expended to repair the metal lathe.
A Type II error, which of accepting the null hypothesis when it is false, would occur if we
conclude that the mean bearing diameter is equal to .5 inch when in fact the mean differs from .5
inch. The practical significance of making a Type II error is that the metal lathe would not be
repaired, when in fact the process is out of control.
The probability of making a Type I error (α) can be controlled by the researcher (how to do this will be explained in Section 4). α is often used as a measure of the reliability of the conclusion and is called the level of significance (or significance level) for a hypothesis test.
You may note that we have carefully avoided stating a decision in terms of "accept the null hypothesis H0." Instead, if the sample does not provide enough evidence to support the alternative hypothesis Ha, we prefer the decision "do not reject H0." This is because, if we were to "accept H0," the reliability of the conclusion would be measured by β, the probability of a Type II error. However, the value of β is not constant, but depends on the specific alternative value of the parameter and is difficult to compute in most testing situations.
The null hypothesis always has the form H0:μ=μ0 for a specific number μ0 (in the respirator
example μ0=75 , in the textbook example μ0=127.50 , and in the baked goods example
μ0=8.0). Since the null hypothesis is accepted unless there is strong evidence to the contrary, the
test procedure is based on the initial assumption that H0 is true. This point is so important that
we will repeat it in a display:
Think of the respirator example, for which the null hypothesis is H0:μ=75, the claim that the
average time air is delivered for all respirators is 75 minutes. If the sample mean is 75 or greater
then we certainly would not reject H0 (since there is no issue with an emergency respirator
delivering air even longer than claimed).
If the sample mean is slightly less than 75 then we would logically attribute the difference to
sampling error and also not reject H0 either.
Values of the sample mean that are smaller and smaller are less and less likely to come from a
population for which the population mean is 75. Thus if the sample mean is far less than 75, say
around 60 minutes or less, then we would certainly reject H0, because we know that it is highly
unlikely that the average of a sample would be so low if the population mean were 75. This is the
rare event criterion for rejection: what we actually observed (X<60) would be so rare an event if
μ=75 were true that we regard it as much more likely that the alternative hypothesis μ<75 holds.
In summary, to decide between H0 and Ha in this example we would select a “rejection region”
of values sufficiently far to the left of 75, based on the rare event criterion, and reject H0 if the
sample mean X lies in the rejection region, but not reject H0 if it does not.
Each different form of the alternative hypothesis Ha has its own kind of rejection region:
if (as in the respirator example) Ha has the form Ha: μ < μ0, we reject H0 if x̄ is far to the left of μ0, that is, to the left of some number C, so the rejection region has the form of an interval (−∞, C];
if (as in the textbook example) Ha has the form Ha: μ > μ0, we reject H0 if x̄ is far to the right of μ0, that is, to the right of some number C, so the rejection region has the form of an interval [C, ∞);
if (as in the baked goods example) Ha has the form Ha: μ ≠ μ0, we reject H0 if x̄ is far away from μ0 in either direction, that is, either to the left of some number C or to the right of some other number C′, so the rejection region has the form of the union of two intervals (−∞, C] ∪ [C′, ∞).
The key issue in our line of reasoning is the question of how to determine the number C, or the numbers C and C′, called the critical value or critical values of the statistic, that determine the rejection region.
Suppose the rejection region is a single interval, so we need to select a single number C. Here is
the procedure for doing so. We select a small probability, denoted α, say 1%, which we take as
our definition of “rare event:” an event is “rare” if its probability of occurrence is less than α .
(In all the examples and problems in this text the value of α will be given already.) The
probability that x̄ takes a value in an interval is the area under its density curve and above that interval, so as shown in the figure below (drawn under the assumption that H0 is true, so that the curve centers at μ0) the critical value C is the value of x̄ that cuts off a tail area α in the probability density curve of x̄. When the rejection region is in two pieces, that is, composed of two intervals, the total area above both of them must be α, so the area above each one is α/2, as also shown in the figure below.
The number α is the total area of a tail or a pair of tails.
Example 4: In the context of Example 1, suppose that it is known that the population is normally distributed with standard deviation σ = 0.15 gram, and suppose that the test of hypotheses H0: μ = 8.0 versus Ha: μ ≠ 8.0 will be performed with a sample of size 5. Construct the rejection region for the test for the choice α = 0.10. Explain the decision procedure and interpret it.
Solution:
If H0 is true then the sample mean x̄ is normally distributed with mean and standard deviation
μ_x̄ = μ = 8.0,
σ_x̄ = σ/√n = 0.15/√5 ≈ 0.067.
Since Ha contains the ≠ symbol, the rejection region will be in two pieces, each one corresponding to a tail of area α/2 = 0.10/2 = 0.05. The z-value that cuts off a tail area of 0.05 is 1.645, so C and C′ are 1.645 standard deviations of x̄ to the right and left of its mean 8.0:
C = 8.0 − (1.645)(0.067) ≈ 7.89 and C′ = 8.0 + (1.645)(0.067) ≈ 8.11.
The reasoning is that if the true average amount of fat per serving were 8.0 grams then there
would be less than a 10% chance that a sample of size 5 would produce a mean of either 7.89
grams or less or 8.11 grams or more. Hence if that happened it would be more likely that the
value 8.0 is incorrect (always assuming that the population standard deviation is 0.15 gram).
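A quick way to check the rejection region in Example 4 is to compute the critical values directly; the sketch below (assuming Python with scipy is available) uses only the numbers given in the example.

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 8.0, 0.15, 5, 0.10   # values from Example 4
se = sigma / sqrt(n)                        # standard deviation of the sample mean, about 0.067
z_crit = norm.ppf(1 - alpha / 2)            # about 1.645 for a tail area of 0.05
C = mu0 - z_crit * se                       # left critical value, about 7.89
C_prime = mu0 + z_crit * se                 # right critical value, about 8.11
print(f"Reject H0 if the sample mean is below {C:.2f} or above {C_prime:.2f}")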
Because the rejection regions are computed based on areas in tails of distributions, as shown in
Figure, hypothesis tests are classified according to the form of the alternative hypothesis in the
following way.
Unit II
ONE SAMPLE STATISTICAL TESTS
LEARNING OUTCOMES:
In this lesson we are going to move on and look at inferential statistics to test hypotheses
concerned with comparing a single sample (instead of a single score) with some population
parameter. At the end of this unit, the following targets are to be accomplished:
1. to apply the Z-test to compare the mean of a large group to a single statistic
2. to apply the one-sample t-test to compare the mean of a small group to a single value
or statistic.
I. INTRODUCTION
In our last lesson we looked at the process for making inferences about research.
In this context we looked at the significance of a single score. We wanted to see if a score
differed significantly from a population value. To test statistical hypotheses involving a single score we calculated the score's Z-score. We referred to this as the Z-score test. As a reminder, the formula for the Z-score (or the Z-score test) was
Z = (X − μ)/σ
II. THE Z-TEST
The z-test for the mean is a statistical test for a population mean. The z-test can be used when the population is normal and σ is known, or for any population when the sample size n is at least 30.
The test statistic is the sample mean x̄ and the standardized test statistic is z.
Using the formula:
z = (x̄ − μ)/σ_x̄, where σ_x̄ = σ/√n is the standard error,
where x̄ is the sample mean, μ is the specified value to be tested, σ is the population standard deviation, and n is the sample size. The area corresponding to the z-value in the standard normal table is then compared with the significance level.
The sampling distribution of the mean is the distribution of the sample means from many samples taken from a population. In practice, the mean of the sampling distribution of the mean is the same as the population mean, so we can use μ instead of μ_x̄.
σ_x̄ is the standard error of the mean. It is the standard deviation of many sample means. Unfortunately for us the standard error of the mean does not equal the population standard deviation, but instead is equal to the population standard deviation (σ) divided by the square root of the sample size (n).
The steps for using the z-test in hypothesis testing can be summarized into 7 steps:
1. State the claim mathematically and verbally; identify the null and alternative hypotheses. In symbols: state H0 and Ha.
2. Specify the level of significance. In symbols: identify α.
3. Determine the standardized test statistic: z = (x̄ − μ)/(σ/√n).
4. Find the area that corresponds to z, using the standard normal table.
5. Find the P-value (or the rejection region) from that area.
6. Make a decision to reject or fail to reject the null hypothesis: if P ≤ α (or if z falls in the rejection region), reject H0; otherwise, fail to reject H0.
7. Interpret the decision in the context of the original claim.
Example:
1. A manufacturer claims that its rechargeable batteries are good for an average of more
than 1,000 charges. A random sample of 100 batteries has a mean life of 1002 charges
and a standard deviation of 14. Is there enough evidence to support this claim at α = 0.01?
STEP 1. State the hypotheses. Ha: μ > 1000 (claim); H0: μ ≤ 1000
STEP 2. The level of significance is α = 0.01.
STEP 3. Compute the standardized test statistic:
z = (x̄ − μ)/(s/√n) = (1002 − 1000)/(14/√100) = 2/1.4 ≈ 1.43
STEP 4 AND STEP 5. The area to the right of z = 1.43 gives P ≈ 0.0764.
STEP 6. Since P = 0.0764 > α = 0.01, fail to reject H0.
STEP 7. Interpret:
At the 1% level of significance, there is not enough evidence to support the claim that the rechargeable batteries are good for an average of more than 1,000 charges.
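The same calculation can be checked with a short script (assuming Python with scipy is available); all numbers come from the battery example above.

from math import sqrt
from scipy.stats import norm

n, xbar, s, mu0, alpha = 100, 1002.0, 14.0, 1000.0, 0.01
z = (xbar - mu0) / (s / sqrt(n))   # (1002 - 1000) / 1.4, about 1.43
p_value = norm.sf(z)               # right-tail area, about 0.076
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")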
In a second example, a one-sample test of the claim that the average length of a phone call is 8 minutes, at the 5% level of significance, gives a standardized test statistic of about 1.70 in absolute value, while the critical values are ±2.110. The test statistic falls in the nonrejection region, so H0 is not rejected.
INTERPRET: At the 5% level of significance, there is not enough evidence to reject the claim that the average length of a phone call is 8 minutes.
Unit III
TWO PARAMETER TESTING
LEARNING OBJECTIVES
By the end of this chapter, the following targets should have been accomplished:
to differentiate independent and dependent samples;
to apply the independent t-test and the dependent t-test appropriately to compare the means of two samples.
I. INTRODUCTION
In a two-sample hypothesis test, two parameters from two populations are
compared.
For a two-sample hypothesis test,
1. the null hypothesis H0 is a statistical hypothesis that usually states there is no difference between the parameters of two populations. The null hypothesis always contains the symbol ≤, =, or ≥.
2. the alternative hypothesis Ha is a statistical hypothesis that is true when H0 is false. The alternative hypothesis always contains the symbol >, ≠, or <.
To write a null and alternative hypothesis for a two-sample hypothesis test, translate
the claim made about the population parameters from a verbal statement to a mathematical
statement.
H0: μ1 = μ2 versus Ha: μ1 ≠ μ2
H0: μ1 ≤ μ2 versus Ha: μ1 > μ2
H0: μ1 ≥ μ2 versus Ha: μ1 < μ2
Regardless of which pair of hypotheses is used, μ1 = μ2 is always assumed to be true.
There are two words that have to be defined to understand this unit-the dependent and
independent samples. Two samples are independent if the sample selected from one
population is not related to the sample selected from the second population. Two samples
are dependent if each member of one sample corresponds to a member of the other
sample. Dependent samples are also called paired samples or matched samples.
The independent t-test, as we have already mentioned is used when we wish to compare the
statistical significance of a possible difference between the means of two groups on some
independent variable and the two groups are independent of one another.
t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) · (1/n1 + 1/n2) ]
where
X1 is the mean for group 1,
X2 is the mean for group 2,
SS1 is the sum of squares for group 1,
SS2 is the sum of squares for group 2,
n1 is the number of subjects in group 1, and
n2 is the number of subjects in group 2.
The sum of squares is a new way of looking at variance. It gives us an indication of how spread out the scores in a sample are. The t-value we are finding is the difference between the two means divided by a standard-error term built from the pooled sums of squares, taking the degrees of freedom into consideration.
SS1 = ΣX1² − (ΣX1)²/n1
and
SS2 = ΣX2² − (ΣX2)²/n2
We can see that each sum of squares is the sum of the squared scores in the sample minus the
sum of the scores quantity squared divided by the size of the sample (n).
We also need to know the degrees of freedom for the independent t-test which is:
df = n1 + n2 – 2
In the worked problem below we have two samples, and the samples are independent of one another, so the inferential statistic we need to use here is the independent t-test.
We can use the totals from the worksheet and the number of subjects in each group to calculate the sum of squares for group 1, the sum of squares for group 2, the mean for group 1, the mean for group 2, and the value of the independent t.
SS1 = ΣX1² − (ΣX1)²/n1 = 55837 − (775)²/11 = 1234.73
SS2 = ΣX2² − (ΣX2)²/n2 = 40982 − (662)²/11 = 1141.64
X̄1 = 775/11 = 70.45        X̄2 = 662/11 = 60.18
t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) (1/n1 + 1/n2) ]
  = (70.45 − 60.18) / √[ ((1234.73 + 1141.64)/(11 + 11 − 2)) (1/11 + 1/11) ]
  = 2.209
We now have the information we need to complete the six step statistical inference process for
our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research
question.
H 0 : μ1 = μ 2
H 1 : μ1 ≠ μ2
Note: Our problem did not state which direction of significance we will be looking for;
therefore, we will be looking for a significant difference between the two means in either
direction.
2. Set the alpha level.
α = 0.05
Note: As usual we will set our alpha level at .05, we have 5 chances in 100 of making a
type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 2.209
df = n1 + n2 - 2 = 11 + 11 - 2 = 20
Note: We have calculated the t-value and will also need to know the degrees of freedom
when we go to look up the critical values of t.
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if t ≥ 2.086 or t ≤ −2.086.
Note: To write the decision rule we need to know the critical value for t, with an alpha level of .05 and a two-tailed test. We can do this by looking at Appendix C (Distribution of t) on page 318 of the text book. Look for the column of the table under .05 for Level of significance for two-tailed tests, read down the column until you are level with 20 in the df column, and you will find the critical value of t, which is 2.086. That means our result is significant if the calculated t value is less than or equal to −2.086 or greater than or equal to 2.086.
5. Make a decision about the null hypothesis.
Note: Since our calculated value of t (2.209) is greater than or equal to 2.086, we reject the null hypothesis and accept the alternative hypothesis.
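The whole computation can be reproduced from the summary sums alone; this is a minimal sketch (assuming Python with scipy is available), with every number taken from the worksheet above.

from math import sqrt
from scipy.stats import t as t_dist

n1, sum_x1, sum_x1_sq = 11, 775.0, 55837.0   # group 1 totals from the worksheet
n2, sum_x2, sum_x2_sq = 11, 662.0, 40982.0   # group 2 totals from the worksheet

ss1 = sum_x1_sq - sum_x1**2 / n1             # about 1234.73
ss2 = sum_x2_sq - sum_x2**2 / n2             # about 1141.64
mean1, mean2 = sum_x1 / n1, sum_x2 / n2      # about 70.45 and 60.18
df = n1 + n2 - 2                             # 20

t = (mean1 - mean2) / sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))   # about 2.209
t_crit = t_dist.ppf(0.975, df)               # two-tailed critical value at alpha = .05, about 2.086
print(f"t = {t:.3f}, critical value = {t_crit:.3f}")
print("Reject H0" if abs(t) >= t_crit else "Fail to reject H0")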
A second, one-tailed problem follows the same six steps; only the parts that differ are shown here.
1. State the null hypothesis and the alternative hypothesis based on your research question.
H0: μ1 = μ2
H1: μ1 > μ2
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
t = 2.252
df = n1 + n2 - 2 = 5 + 5 - 2 = 8
To perform a two-sample hypothesis test with dependent (paired) samples, the difference between each data pair is first found:
d = x1 − x2   (difference between the entries for a data pair)
The test statistic is the mean d̄ of these differences,
d̄ = Σd/n,
and the standardized test statistic is
t = (d̄ − μd) / (sd/√n).
The degrees of freedom are d.f. = n − 1.
If the samples are random and dependent, and the population of differences is approximately normally distributed, then the sampling distribution for d̄ is approximated by a t-distribution with n − 1 degrees of freedom, where n is the number of data pairs. The rejection region is determined by the critical value(s) −t0 and/or t0 of this distribution.
The following steps can be used to test a claim about two means using the paired t-test.
1. State the claim mathematically and verbally; identify the null and alternative hypotheses. In symbols: state H0 and Ha.
2. Specify the level of significance. In symbols: identify α.
3. Identify the degrees of freedom and sketch the sampling distribution: d.f. = n − 1.
4. Determine any critical value(s), using the table of critical values of the t-distribution with ν = n − 1 degrees of freedom.
5. Determine any rejection region(s).
6. Calculate d̄ and sd (use a table):
d̄ = Σd/n,   sd = √[ (nΣd² − (Σd)²) / (n(n − 1)) ].
7. Find the standardized test statistic: t = (d̄ − μd)/(sd/√n).
8. Make a decision to reject or fail to reject the null hypothesis: if t is in the rejection region, reject H0; otherwise, fail to reject H0.
9. Interpret the decision in the context of the original claim.
STEP 1. State the claim. H0: μd ≤ 0; Ha: μd > 0 (claim)
STEP 2. The level of significance is α = 0.05.
STEP 3. Identify the degrees of freedom: d.f. = 6 − 1 = 5.
STEP 4 AND STEP 5. Determine the critical value and the rejection region: t0 = 2.015, so reject H0 if t > 2.015.
STEP 6. Calculate d̄ and sd, where d = (score before) − (score after):
Σd = 43, Σd² = 833
d̄ = Σd/n = 43/6 ≈ 7.167
sd = √[ (nΣd² − (Σd)²)/(n(n − 1)) ] = √[ (6(833) − 43²)/(6·5) ] ≈ 10.245
STEP 7. Find the standardized test statistic:
t = (d̄ − 0)/(sd/√n) = 7.167/(10.245/√6) ≈ 1.714
STEP 8. Since t ≈ 1.714 is not in the rejection region (1.714 < 2.015), fail to reject H0.
STEP 9. Interpret:
There is not enough evidence at the 5% level to support the claim that the students' scores after the course are better than the scores before the course.
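A short script (assuming Python with scipy is available) reproduces the paired-sample calculation from the summary values given above.

from math import sqrt
from scipy.stats import t as t_dist

n, sum_d, sum_d_sq, alpha = 6, 43.0, 833.0, 0.05   # values from the example

d_bar = sum_d / n                                      # about 7.167
s_d = sqrt((n * sum_d_sq - sum_d**2) / (n * (n - 1)))  # about 10.245
t = d_bar / (s_d / sqrt(n))                            # about 1.714
t_crit = t_dist.ppf(1 - alpha, n - 1)                  # right-tailed critical value, about 2.015
print(f"t = {t:.3f}, critical value = {t_crit:.3f}")
print("Reject H0" if t >= t_crit else "Fail to reject H0")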
Unit IV
CORRELATION AND REGRESSION
LEARNING OBJECTIVES
In this chapter we will look at the most commonly used techniques for investigating the relationship between two quantitative variables: correlation and regression. Specifically, at the end of this unit, the following targets are to be accomplished:
Define correlation and regression
Compute and interpret a correlation coefficient
Compute and interpret coefficients in a linear regression analysis
I. INTRODUCTION
Our interest in this chapter is in situations in which we can associate to each
element of a population or sample two measurements x and y, particularly in the case that
it is of interest to use the value of x to predict the value of y. For example, the population
could be the air in automobile garages, x could be the electrical current produced by an
electrochemical reaction taking place in a carbon monoxide meter, and y the
concentration of carbon monoxide in the air. In this chapter we will learn statistical
methods for analyzing the relationship between variables x and y in this context.
II. CORRELATION
CORRELATION COEFFICIENT
The correlation coefficient is a measure of the strength and the direction of a linear
relationship between two variables. The symbol r represents the sample correlation coefficient.
The formula for r is:
r = (nΣxy − (Σx)(Σy)) / ( √[nΣx² − (Σx)²] · √[nΣy² − (Σy)²] )
The range of the correlation coefficient is −1 to 1. If x and y have a strong positive linear correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to −1. If there is no linear correlation or only a weak linear correlation, r is close to 0. For example, a scatter plot with r = −0.91 shows a strong negative correlation, one with r = 0.88 shows a strong positive correlation, one with r = 0.42 shows only a weak positive correlation, and one with r = 0.07 shows essentially no linear correlation.
The following steps are used to calculate the correlation coefficient.
1. Find the sum of the x-values: Σx
2. Find the sum of the y-values: Σy
3. Multiply each x-value by its corresponding y-value and find the sum: Σxy
4. Square each x-value and find the sum: Σx²
5. Square each y-value and find the sum: Σy²
6. Use these five sums to calculate the correlation coefficient:
r = (nΣxy − (Σx)(Σy)) / ( √[nΣx² − (Σx)²] · √[nΣy² − (Σy)²] )
The t-Test for the Correlation Coefficient: A t-test can be used to test whether the correlation between two variables is significant. The test statistic is r and the standardized test statistic follows a t-distribution with n − 2 degrees of freedom. In this text, only two-tailed hypothesis tests for ρ are considered. The standardized test statistic can be computed using the formula presented below:
t = r/σr = r / √[ (1 − r²)/(n − 2) ]
where r is the correlation coefficient, n − 2 is the degrees of freedom, and 1 − r² is the coefficient of non-determination.
The steps for the test are as follows.
1. State the null and alternative hypotheses. In symbols: state H0 and Ha.
2. Specify the level of significance. In symbols: identify α.
3. Identify the degrees of freedom: d.f. = n − 2.
4. Determine any critical value(s) and any rejection region(s), using the table of critical values of the t-distribution with ν = n − 2 degrees of freedom.
5. Find the standardized test statistic: t = r / √[ (1 − r²)/(n − 2) ].
6. Make a decision to reject or fail to reject the null hypothesis: if t is in the rejection region, reject H0; otherwise, fail to reject H0.
7. Interpret the decision in the context of the original claim.
EXAMPLE
1. The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student on a test taken the following Monday. The correlation coefficient is r ≈ −0.831. Is this correlation coefficient significant at α = 0.01?
STEP 1. State the claim.
H0: ρ = 0 (no correlation)   Ha: ρ ≠ 0 (significant correlation)
STEP 2. The level of significance is α = 0.01.
STEP 3. The degrees of freedom are d.f. = 12 − 2 = 10.
STEP 4 AND STEP 5. The critical values are −t0 = −3.169 and t0 = 3.169, so H0 is rejected if t < −3.169 or t > 3.169.
STEP 6 AND STEP 7. The standardized test statistic is
t = r / √[ (1 − r²)/(n − 2) ] = −0.831 / √[ (1 − (−0.831)²)/10 ] ≈ −4.72.
The test statistic falls in the rejection region, so H0 is rejected.
STEP 8. Interpret
At the 1% level of significance, there is enough evidence to conclude that
there is a significant linear correlation between the number of hours of TV
watched over the weekend and the test scores on Monday morning.
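The test statistic and critical values for this example can be verified with a few lines of Python (assuming scipy is available), using r = −0.831 and n = 12 from the example.

from math import sqrt
from scipy.stats import t as t_dist

r, n, alpha = -0.831, 12, 0.01
df = n - 2
t = r / sqrt((1 - r**2) / df)           # about -4.72
t_crit = t_dist.ppf(1 - alpha / 2, df)  # about 3.169
print(f"t = {t:.2f}, critical values = -{t_crit:.3f} and {t_crit:.3f}")
print("Reject H0" if abs(t) >= t_crit else "Fail to reject H0")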
The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables. If there is a significant correlation between two variables, you should consider the following possibilities: there may be a direct cause-and-effect relationship, a reverse cause-and-effect relationship, a relationship driven by a third variable or by a complex interaction of several variables, or the relationship may simply be a coincidence.
III. REGRESSION
A regression line, also called a line of best fit, is the line for which the sum of the
squares of the residuals is a minimum. Since it "best fits" the data, it makes sense that the line
passes through the means.
The idea behind regression is that when there is significant linear correlation, you can use
a line to estimate the value of the dependent variable for certain values of the independent
variable.
The regression equation should only be used when the following conditions hold.
When there is significant linear correlation. That is, when you reject the null hypothesis
that rho=0 in a correlation hypothesis test.
The value of the independent variable being used in the estimation is close to the original
values. That is, you should not use a regression equation obtained using x's between 10
and 20 to estimate y when x is 200.
The regression equation should not be used with different populations. That is, if x is the
height of a male, and y is the weight of a male, then you shouldn't use the regression
equation to estimate the weight of a female.
The regression equation shouldn't be used to forecast values not from that time frame. If
data is from the 1960's, it probably isn't valid in the 1990's.
Assuming that you've decided that you can have a regression equation because there is significant linear correlation between the two variables, the equation of the regression line for an independent variable x and a dependent variable y is ŷ = mx + b, where ŷ is the predicted y-value for a given x-value. The slope m and y-intercept b are given by
m = ( n(Σxy) − (Σx)(Σy) ) / ( n(Σx²) − (Σx)² )
b = ȳ − m x̄ = (Σy − mΣx)/n   (equivalently, b = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ])
where ȳ is the mean of the y-values and x̄ is the mean of the x-values. The regression line always passes through (x̄, ȳ).
EXAMPLE:
1. The following data represents the number of hours 12 different students watched
television during the weekend and the scores of each student who took a test the
following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.
Hours, x:         0     1     2     3     3     5     5     5     6     7     7    10
Test score, y:   96    85    82    74    95    68    76    84    58    65    75    50
xy:               0    85   164   222   285   340   380   420   348   455   525   500
x²:               0     1     4     9     9    25    25    25    36    49    49   100
y²:            9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500
Σx = 54,  Σy = 908,  Σxy = 3724,  Σx² = 332,  Σy² = 70836
SOLUTION:
m = ( nΣxy − (Σx)(Σy) ) / ( nΣx² − (Σx)² ) = ( 12(3724) − (54)(908) ) / ( 12(332) − 54² ) ≈ −4.067
b = ȳ − m x̄ = 908/12 − (−4.067)(54/12) ≈ 93.97
a.) The equation of the regression line is ŷ = −4.07x + 93.97.
b.) For a student who watches 9 hours of TV, the expected test score is ŷ = −4.07(9) + 93.97 ≈ 57.3.
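The slope, intercept, and the prediction asked for in part (b) can be recomputed directly from the data table; this is a minimal sketch assuming Python with numpy is available.

import numpy as np

x = np.array([0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10], dtype=float)
y = np.array([96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50], dtype=float)
n = len(x)

m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = y.mean() - m * x.mean()
print(f"y-hat = {m:.2f}x + {b:.2f}")                 # about y-hat = -4.07x + 93.97
print(f"predicted score at x = 9: {m * 9 + b:.1f}")  # about 57.4 (57.3 with the rounded coefficients)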
1. COEFFICIENT OF DETERMINATION
Coefficient of Determination
The coefficient of determination is
the percent of the variation that can be explained by the regression equation
the explained variation divided by the total variation
the square of r
Σ(y − ȳ)² = Σ(y′ − ȳ)² + Σ(y − y′)²
total variation = explained variation + unexplained variation
Well, the ratio of the explained variation to the total variation is a measure of how good the
regression line is. If the regression line passed through every point on the scatter plot exactly, it
would be able to explain all of the variation. The further the line is from the points, the less it is
able to explain.
Coefficient of Non-Determination
The coefficient of non-determination is ...
The percent of variation which is unexplained by the regression equation
The unexplained variation divided by the total variation
1 − r²
The standard error of the estimate is the square root of the coefficient of non-determination divided by its degrees of freedom:
se = √[ (1 − r²)/(n − 2) ]
The following only works when the sample size is large; large in this instance is usually taken to be more than 100. We're not going to cover this in class, but it is provided here for your information. The maximum error of the estimate is
E = z_(α/2) · √[ (1 − r²)/(n − 2) ]
and this maximum error of the estimate is subtracted from and added to the estimated value of y, giving the prediction interval
y′ − E < y < y′ + E.
Unit VII
CHI-SQUARE
LEARNING OUTCOMES:
In this lesson we are going to look at a statistical test that uses the chi-square distribution and that is applicable to both large and small samples depending on the context. At the end of this unit, the following targets should have been accomplished:
1. to use the chi-square distribution to test a claim about a population variance;
2. to carry out a chi-square goodness-of-fit test;
3. to carry out a chi-square test for independence.
I. INTRODUCTION
In this chapter we explore two types of hypothesis tests that require the Chi Square Distribution.
The Chi-Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the Chi-square test is that no relationship exists between the categorical variables in the population; they are independent.
χ² = (n − 1)s²/σ² = (d.f. · s²)/σ²
The chi-square (χ²) distribution is obtained from the values of the ratio of the sample variance and the population variance multiplied by the degrees of freedom. This occurs when the population is normally distributed with population variance σ².
The degrees of freedom depend on the application, as we will see later. Here are a few facts about the chi-square distribution: if χ² follows a chi-square distribution with df degrees of freedom, then its values are non-negative, the distribution is skewed to the right, its mean is df and its variance is 2·df, and as df increases the distribution approaches a normal distribution.
Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is
different from the method for looking up right tail values.
Area to the right - just use the area given.
Area to the left - the table requires the area to the right, so subtract the given area from
one and look this area up in the table.
Area in both tails - divide the area by two. Look up this area for the right critical value
and one minus this area for the left critical value.
The variable χ² = (df × s²)/σ² has a chi-square distribution if the population from which the sample is drawn is normally distributed. The degrees of freedom are n − 1. We can use this to test the population variance under certain conditions.
Confidence Interval
If you solve the test statistic formula χ² = (df × s²)/σ² for the population variance, you get σ² = (df × s²)/χ². To build a confidence interval:
1. Find the two critical values of chi-square (for areas α/2 and 1 − α/2).
2. Compute the value (df × s²)/χ² for each critical value.
3. Place the population variance between the two values calculated in step 2 (put the smaller one first).
Note, the left-hand endpoint of the confidence interval comes when the right critical value is
used and the right-hand endpoint of the confidence interval comes when the left critical value is
used. This is because the critical values are in the denominator and so dividing by the larger
critical value (right tail) gives the smaller endpoint.
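A minimal sketch of this interval (assuming Python with scipy is available); the sample variance, sample size, and confidence level below are hypothetical values chosen only to illustrate the three steps, not data from these notes.

from scipy.stats import chi2

n, s_sq, alpha = 20, 4.0, 0.05             # hypothetical: sample of 20 with s^2 = 4.0, 95% confidence
df = n - 1
chi2_right = chi2.ppf(1 - alpha / 2, df)   # right critical value
chi2_left = chi2.ppf(alpha / 2, df)        # left critical value
lower = df * s_sq / chi2_right             # dividing by the larger value gives the smaller endpoint
upper = df * s_sq / chi2_left
print(f"{lower:.2f} < sigma^2 < {upper:.2f}")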
We use the goodness of fit test to test whether a discrete categorical random variable matches a predetermined "expected" distribution. The hypotheses in a goodness of fit test are that the variable has the stated distribution (the null hypothesis) and that it does not (the alternative hypothesis).
Requirement: In order for a chi-square goodness of fit test to be appropriate, the expected value
in each category must be at least 5. It may be possible to combine categories to meet this
requirement.
The Chi-Square goodness of fit test is a non-parametric test that is used to find out whether the observed value of a given phenomenon differs significantly from the expected value. In the Chi-Square goodness of fit test, the term goodness of fit is used to compare the observed sample distribution with the expected probability distribution. The Chi-Square goodness of fit test determines how well a theoretical distribution (such as normal, binomial, or Poisson) fits the empirical distribution. In the Chi-Square goodness of fit test, sample data is divided into intervals. Then the numbers of points that fall into each interval are compared with the expected numbers of points in each interval.
Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis assumes that
there is no significant difference between the observed and the expected value.
Alternative hypothesis: In Chi-Square goodness of fit test, the alternative hypothesis
assumes that there is a significant difference between the observed and the expected
value.
Compute the value of the Chi-Square goodness of fit test using the following formula:
χ² = Σ (O − E)²/E
where χ² = Chi-Square goodness of fit test statistic, O = observed value, and E = expected value.
Degrees of freedom: In the Chi-Square goodness of fit test, the degrees of freedom depend on the distribution of the sample. In general, the degrees of freedom equal the number of categories minus one, minus the number of population parameters that had to be estimated from the sample.
Hypothesis testing: Hypothesis testing in Chi-Square goodness of fit test is the same as in other
tests, like t-test, ANOVA, etc. The calculated value of Chi-Square goodness of fit test is
compared with the table value. If the calculated value of Chi-Square goodness of fit test is
greater than the table value, we will reject the null hypothesis and conclude that there is a
significant difference between the observed and the expected frequency. If the calculated value
of Chi-Square goodness of fit test is less than the table value, we will accept the null hypothesis
and conclude that there is no significant difference between the observed and expected value
The idea is that if the observed frequency is really close to the claimed (expected) frequency,
then the square of the deviations will be small. The square of the deviation is divided by the
expected frequency to weight frequencies. A difference of 10 may be very significant if 12 was
the expected frequency, but a difference of 10 isn't very significant at all if the expected
frequency was 1200.
If the sum of these weighted squared deviations is small, the observed frequencies are close to
the expected frequencies and there would be no reason to reject the claim that it came from that
distribution. Only when the sum is large is the reason to question the distribution. Therefore, the
chi-square goodness-of-fit test is always a right tail test.
χ² = Σ (Observed − Expected)² / Expected
The test statistic has a chi-square distribution when the following assumptions are met:
The data are obtained from a random sample.
The expected frequency of each category must be at least 5. This goes back to the requirement that the data be approximately normally distributed. You're simulating a multinomial experiment (using a discrete distribution) with the goodness-of-fit test (and a continuous distribution), and if each expected frequency is at least five then you can use the normal distribution to approximate (much like the binomial). If an expected frequency is less than five, combine that category with an adjacent one so that the expected frequency is at least five.
We can use the chi-square statistic to test the distribution of measures over levels of a variable to
indicate if the distribution of measures is the same for all levels. This is the first use of the one-
variable chi-square test. This test is also referred to as the goodness-of-fit test.
Example 1: Fair Die: Suppose we wish to test if a die is weighted. We roll the die 120 times and
get the following “observed” results.
Roll Observed Expected
1 15
2 29
3 16
4 15
5 30
6 15
1. What is the expected distribution of the 120 die rolls? Complete the table.
2. Is the requirement for a chi-square goodness of fit test satisfied? Explain.
3. Write the null and alternative hypotheses for a goodness of fit test.
4. I can see that the rolls didn’t come out even. What’s the point of completing the test?
Solution:
Since the die is fair under H0, each face has probability 1/6, so the expected frequency for each roll is 120 × (1/6) = 20; every expected count is at least 5, so the requirement for the test is satisfied. The hypotheses are H0: the die is fair (each face has probability 1/6) and Ha: the die is not fair.
Our goal is to see if the observed values are close enough to the expected values that the differences could be due to random variation or, alternatively, if the differences are great enough that we can conclude that the distribution is not as expected. Therefore, our sample statistic (which is also the test statistic in this case) should provide a measure of how far the "observed" frequencies are from the "expected" frequencies, as a group. The test statistic for a goodness of fit test is:
χ² = Σ (O − E)²/E
where O = observed frequency, E = expected frequency, and the sum is taken over all the categories.
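For the fair-die data this computation can be done in a couple of lines (assuming Python with scipy is available); the observed counts are taken from the table in Example 1, and a fair die makes every expected count 120/6 = 20.

from scipy.stats import chisquare

observed = [15, 29, 16, 15, 30, 15]   # 120 rolls in total
expected = [20] * 6                   # fair-die expectation
stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, df = 5, P-value = {p_value:.4f}")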
Example 2: The data for 100 students are recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 measures or observations and there are three categories (Macintosh, IBM, and Other) we would indicate the expected frequency for each category to be 100/3 or 33.333. In the third column of the table we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency. The sum of the third column is the value of the chi-square statistic.

Category      Observed    Expected    (O − E)²/E
Macintosh        36        33.333       0.213
IBM              47        33.333       5.603
Other            17        33.333       8.003
Total           100       100.000      13.820
We now have the information we need to complete the six step process for testing statistical
hypotheses for our research problem.
Solution:
Step 1: State the null hypothesis and the alternative hypothesis based on your research question.
H0: O = E
H1: O ≠ E
Note: Our null hypothesis, for the chi-square test, states that there are no differences between the
observed and the expected frequencies. The alternate hypothesis states that there are significant
differences between the observed and expected frequencies.
Step 2: Set the alpha level.
α = .05
Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I error.
Step 3: Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for
the statistical test if necessary.
χ² = 13.820
df = C - 1 = 2
Step 4: Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and
noting the tabled value for the column for the .05 level and the row for 2 df.
Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
Example 3: Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% veterans but not All-Stars, and 10% are veteran All-Stars.
Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this
consistent with Acme's claim? Use a 0.05 level of significance.
Solution:
Step 1: State the null hypothesis and the alternative hypothesis based on your research question.
Null hypothesis: The proportion of rookies, veterans, and All-Stars is 30%, 60% and
10%, respectively.
Alternative hypothesis: At least one of the proportions in the null hypothesis is false.
Step 2: Formulate an analysis plan. For this analysis, the significance level is 0.05; using sample data, we will conduct a chi-square goodness of fit test.
Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I error.
Step 3: Analyze sample data: Applying the chi-square goodness of fit test to sample data, we
compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic.
Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
DF = k - 1 = 3 - 1 = 2 (Ei) = n * pi
(E1) = 100 * 0.30 = 30
(E2) = 100 * 0.60 = 60
(E3) = 100 * 0.10 = 10
Χ2 = Σ [ (Oi - Ei)2 / Ei ]
Χ2 = [ (50 - 30)2 / 30 ] + [ (45 - 60)2 / 60 ] + [ (5 - 10)2 / 10 ]
Χ2 = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58
where DF is the degrees of freedom, k is the number of levels of the categorical variable, n is the
number of observations in the sample, Ei is the expected frequency count for level i, Oi is the
observed frequency count for level i, and Χ2 is the chi-square test statistic.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 19.58; from a chi-square table or calculator, P(Χ2 > 19.58) ≈ 0.0001.
Step 4: Interpret results. Since the P-value (about 0.0001) is less than the significance level (0.05), we reject the null hypothesis and conclude that the observed card distribution is not consistent with Acme's claim.
Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple
random sampling, the variable under study was categorical, and each level of the categorical
variable had an expected frequency count of at least 5.
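A quick check of the Acme figures (assuming Python with scipy is available), using the observed counts and the claimed proportions from the example:

from scipy.stats import chisquare

observed = [50, 45, 5]                              # rookies, veterans, All-Stars
expected = [100 * p for p in (0.30, 0.60, 0.10)]    # 30, 60, 10 under Acme's claim
stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, P-value = {p_value:.5f}")   # chi-square about 19.58
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")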
Example 4: In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of the 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if this frequency of computer buying behavior is similar to or different from the national study data.
The data for 100 students is recorded in the table below (the observed frequencies). In this case
the expected frequencies are those from the national study. To get the expected frequency we
take the percentages from the national study times the total number of subjects in the current
study.
Expected frequency for IBM = 100 X 50% = 50
Expected frequency for Macintosh = 100 X 25% = 25
Expected frequency for Other = 100 X 25% = 25
The expected frequencies are recorded in the second column of the table. As before we have
calculated the square of the observed frequency minus the expected frequency divided by the
expected frequency and recorded this result in the third column of the table. The sum of the third
column would be the value of the chi-square statistic.
We now have the information we need to complete the six-step process for testing statistical
hypotheses for our research problem.
Step 1: State the null hypothesis and the alternative hypothesis based on your research
question.
H0: O = E
H1: O ≠ E
Note: Our null hypothesis, for the chi-square test, states that there are no differences between the
observed and the expected frequencies. The alternate hypothesis states that there are significant
differences between the observed and expected frequencies.
Step 2: Set the alpha level.
α = .05
Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I error.
Step 3: Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
χ² = 7.58
df = C - 1 = 2
Step 4: Write the decision rule for rejecting the null hypothesis.
Reject H0 if χ² >= 5.991.
Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
In the test for independence, the claim is that the row and column variables are independent of
each other. This is the null hypothesis.
The multiplication rule said that if two events were independent, then the probability of both
occurring was the product of the probabilities of each occurring. This is key to working the test
for independence. If you end up rejecting the null hypothesis, then the assumption must have
been wrong and the row and column variable are dependent. Remember, all hypothesis testing is
done under the assumption the null hypothesis is true.
The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the
test for independence is the same as the principle behind the goodness-of-fit test. The test for
independence is always a right tail test.
How to calculate the Chi-square statistics by hand: First we have to calculate the expected value
of the two nominal variables. We can calculate the expected value of the two nominal variables
by using this formula:
E_(i,j) = ( Σ_(k=1..c) O_(i,k) ) × ( Σ_(k=1..r) O_(k,j) ) / N, that is, (row total × column total) / N
where:
E_(i,j) = expected value for the cell in row i and column j
O_(i,j) = observed value
N = total number of observations
After calculating the expected value, we will apply the following formula to calculate the value
of the Chi-Square test of Independence:
χ² = Σ_(i=1..r) Σ_(j=1..c) (O_ij − E_ij)² / E_ij
The degrees of freedom are DF = (r − 1)(c − 1), where DF = degrees of freedom, r = number of rows, and c = number of columns.
Hypothesis:
Null hypothesis: Assumes that there is no association between the two variables
Alternative hypothesis: Assumes that there is an association between the two variables
Hypothesis testing: Hypothesis testing for the chi-square test of independence proceeds as it does for other tests, like ANOVA, where a test statistic is computed and compared to a critical value. The
critical value for the chi-square statistic is determined by the level of significance (typically .05)
and the degrees of freedom. The degrees of freedom for the chi-square are calculated using the
following formula: df = (r-1)(c-1) where r is the number of rows and c is the number of columns.
If the observed chi-square test statistic is greater than the critical value, the null hypothesis can
be rejected.
Example 4: A public opinion poll surveyed a simple random sample of 1000 voters.
Respondents were classified by gender (male or female) and by voting preference (Republican,
Democrat, or Independent). Results are shown in the contingency table below.
Is there a gender gap? Do the men's voting preferences differ significantly from the women's
preferences? Use a 0.05 level of significance.
Solution:
Step 1: State the hypotheses. The null hypothesis is that gender and voting preference are independent; the alternative hypothesis is that gender and voting preference are not independent.
Step 2: Formulate an analysis plan. For this analysis, the significance level is 0.05. Using
sample data, we will conduct a chi-square test for independence.
Step 3: Analyze sample data. Applying the chi-square test for independence to sample data, we
compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic.
Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n
Χ2 = Σ [ (Or,c - Er,c)2 / Er,c ] = 16.2
where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels
of the voting preference, nr is the number of observations from level r of gender, nc is the
number of observations from level c of voting preference, n is the number of observations in the
sample, Er,c is the expected frequency count when gender is level r and voting preference is
level c, and Or,c is the observed frequency count when gender is level r voting preference is
level c.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more
extreme than 16.2.
We use the Chi-Square Distribution Calculator to find P(Χ2 > 16.2) = 0.0003.
Step 5: Interpret results. Since the P-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple
random sampling, the variables under study were categorical, and the expected frequency count
was at least 5 in each cell of the contingency table.
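The contingency table itself is not reproduced in these notes, so the sketch below (assuming Python with scipy is available) uses an assumed set of counts, chosen so that they total 1000 voters and give the chi-square of about 16.2 stated above; treat them as illustrative rather than as the original survey data.

import numpy as np
from scipy.stats import chi2_contingency

# rows: male, female; columns: Republican, Democrat, Independent (assumed counts)
table = np.array([[200, 150, 50],
                  [250, 300, 50]])
stat, p_value, df, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {stat:.1f}, df = {df}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")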
Contingency Table
Data arranged in table form for the chi-square independence test
Expected Frequency
The frequencies obtained by calculation.
Goodness-of-fit Test
A test to see if a sample comes from a population with the given distribution.
Independence Test
A test to see if the row and column variables are independent.
Observed Frequency
The frequencies obtained by observation. These are the sample frequencies.
Unit VIII
F-test
LEARNING OBJECTIVES
In this unit, we are going to study another statistical test, one which uses the F-distribution. At the end of this unit, the following targets are to be accomplished:
1. to use the F-test to compare two population variances;
2. to describe the idea behind the one-way analysis of variance (ANOVA).
I. F-test
Definition:
An “F Test” is a catch-all term for any test that uses the F-distribution. In most cases, when
people talk about the F-Test, what they are actually talking about is The F-Test to Compare Two
Variances. However, the f-statistic is used in a variety of tests including regression analysis, the
Chow test and the Scheffe Test (a post-hoc ANOVA test).
The F-test has several applications in statistical theory; the main ones are detailed below.
The F-test is used by a researcher in order to carry out the test for the equality of the two
population variances. If a researcher wants to test whether or not two independent samples have
been drawn from a normal population with the same variability, then he generally employs the F-
test.
The F-test is also used by the researcher to determine whether or not the two independent
estimates of the population variances are homogeneous in nature.
An example depicting the above case in which the F-test is applied is, for example, if two sets of
pumpkins are grown under two different experimental conditions. In this case, the researcher
would select a random sample of size 9 and 11. The standard deviations of their weights are 0.6
and 0.8 respectively. After making an assumption that the distribution of their weights is normal,
the researcher conducts an F-test to test the hypothesis on whether or not the true variances are
equal.
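A minimal sketch of that pumpkin-weight test (assuming Python with scipy is available): the sample sizes and standard deviations come from the paragraph above, while the 5% significance level is an assumed choice for illustration.

from scipy.stats import f as f_dist

n1, s1 = 11, 0.8      # the sample with the larger variance goes in the numerator
n2, s2 = 9, 0.6
alpha = 0.05          # assumed significance level

F = s1**2 / s2**2                                    # about 1.78
f_crit = f_dist.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # right critical value for a two-tailed test
print(f"F = {F:.2f}, critical value = {f_crit:.2f}")
print("Reject H0" if F >= f_crit else "Fail to reject H0")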
The researcher uses the F-test to test the significance of an observed multiple correlation
coefficient. It is also used by the researcher to test the significance of an observed sample
correlation ratio. The sample correlation ratio is defined as a measure of association as the
statistical dispersion in the categories within the sample as a whole. Its significance is tested by
the researcher.
The F-distribution is formed by the ratio of two independent chi-square variables divided by their respective degrees of freedom:
F = [ (df1 · s1²/σ1²) / df1 ] / [ (df2 · s2²/σ2²) / df2 ] = (s1²/σ1²) / (s2²/σ2²)
F-Test
The F-test is designed to test if two population variances are equal. It does this by comparing the
ratio of two variances. So, if the variances are equal, the ratio of the variances will be 1.
F = s1² / s2²
All hypothesis testing is done under the assumption the null hypothesis is true
If the null hypothesis is true, then the F test statistic given above simplifies (dramatically) to the ratio of the two sample variances, and that ratio is the test statistic used. If this ratio is too far from 1, we reject the null hypothesis, and with it our assumption that the two population variances were equal.
There are several different F-tables. Each one has a different level of significance. So, find the
correct level of significance first, and then look up the numerator degrees of freedom and the
denominator degrees of freedom to find the critical value.
You will notice that all of the tables only give level of significance for right tail tests. Because
the F distribution is not symmetric, and there are no negative values, you may not simply take
the opposite of the right critical value to find the left critical value. The way to find a left critical
value is to reverse the degrees of freedom, look up the right critical value, and then take the
reciprocal of this value. For example, the critical value with 0.05 on the left with 12 numerator
and 15 denominator degrees of freedom is found of taking the reciprocal of the critical value
with 0.05 on the right with 15 numerator and 12 denominator degrees of freedom.
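That reciprocal relationship is easy to verify numerically (assuming Python with scipy is available), using the same 12 and 15 degrees of freedom mentioned above.

from scipy.stats import f as f_dist

left_direct = f_dist.ppf(0.05, 12, 15)      # area 0.05 in the left tail, df = (12, 15)
right_reversed = f_dist.ppf(0.95, 15, 12)   # area 0.05 in the right tail, df reversed to (15, 12)
print(f"left critical value = {left_direct:.4f}")
print(f"reciprocal of reversed right critical value = {1 / right_reversed:.4f}")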
The numerator degrees of freedom will be the degrees of freedom for whichever sample has the
larger variance (since it is in the numerator) and the denominator degrees of freedom will be the
degrees of freedom for whichever sample has the smaller variance (since it is in the
denominator).
If a two-tail test is being conducted, you still have to divide alpha by 2, but you only look up and
compare the right critical value.
Assumptions / Notes
The larger variance should always be placed in the numerator
The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
Divide alpha by 2 for a two-tail test and then find the right critical value
If standard deviations are given instead of variances, they must be squared
When the degrees of freedom aren't given in the table, go with the value with the larger
critical value (this happens to be the smaller degrees of freedom). This is so that you are
less likely to reject in error (type I error)
The populations from which the samples were obtained must be normal.
The samples must be independent
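Pulling these assumptions and notes together, the following is a minimal sketch of the two-sample F-test for equal variances, assuming NumPy and SciPy are installed; the sample data are made up for illustration:

```python
# A minimal sketch of the two-sample F-test for equal variances,
# assuming NumPy/SciPy are installed; the sample data are hypothetical.
import numpy as np
from scipy.stats import f

sample1 = np.array([4.2, 4.8, 5.1, 4.6, 5.3, 4.9, 5.0, 4.4, 4.7])
sample2 = np.array([5.6, 4.1, 6.2, 3.9, 5.8, 4.5, 6.0, 3.7, 5.2, 4.8, 5.9])

v1, v2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)

# Place the larger variance in the numerator, as noted above
if v1 >= v2:
    F = v1 / v2
    df_num, df_den = len(sample1) - 1, len(sample2) - 1
else:
    F = v2 / v1
    df_num, df_den = len(sample2) - 1, len(sample1) - 1

alpha = 0.05
# Two-tail test: divide alpha by 2 and compare against the right critical value
critical_value = f.ppf(1 - alpha / 2, df_num, df_den)

print(f"F = {F:.3f}, critical value = {critical_value:.3f}")
if F > critical_value:
    print("Reject H0: the population variances appear unequal.")
else:
    print("Fail to reject H0: no evidence the variances differ.")
```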
Step 2: Calculate the F value. When a restricted and an unrestricted regression model are being
compared, the F value is calculated using the formula F = ((SSE1 − SSE2) / m) / (SSE2 / (n − k)),
where SSE = residual sum of squares, m = number of restrictions, n = number of observations, and
k = number of independent variables.
Step 3: Find the F Statistic (the critical value for this test). The F statistic formula is:
F Statistic = variance of the group means / mean of the within group variances.
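For Step 2, here is a minimal numerical sketch; the sums of squares, restriction count, and sample size below are hypothetical, and SciPy is assumed for the p-value:

```python
# A minimal sketch of the restricted-vs-unrestricted F value from Step 2,
# using hypothetical sums of squares and assuming SciPy for the p-value.
from scipy.stats import f

SSE1 = 1600.0   # residual sum of squares of the restricted model (hypothetical)
SSE2 = 1200.0   # residual sum of squares of the unrestricted model (hypothetical)
m = 2           # number of restrictions
n = 50          # number of observations
k = 5           # number of independent variables in the unrestricted model

F = ((SSE1 - SSE2) / m) / (SSE2 / (n - k))
p_value = f.sf(F, m, n - k)   # right-tail p-value

print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```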
One-way ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any
statistically significant differences between the means of three or more independent (unrelated)
groups. This guide will provide a brief introduction to the one-way ANOVA, including the
assumptions of the test and when you should use this test.
The one-way ANOVA compares the means between the groups you are interested in and
determines whether any of those means are statistically significantly different from each other.
Specifically, it tests the null hypothesis:
H0: µ1 = µ2 = µ3 = … = µk
where µ = group mean and k = number of groups. If, however, the one-way ANOVA returns a
statistically significant result, we accept the alternative hypothesis (HA), which is that there are at
least two group means that are statistically significantly different from each other.
At this point, it is important to realize that the one-way ANOVA is an omnibus test statistic and
cannot tell you which specific groups were statistically significantly different from each other,
only that at least two groups were. To determine which specific groups differed from each
other, you need to use a post hoc test. Post hoc tests are described later in this guide.
For some statisticians the ANOVA doesn’t end there – they assume a cause effect relationship
and say that one or more independent, controlled variables (the factors) cause the significant
difference of one or more characteristics. The way this works is that the factors sort the data
points into one of the groups and therefore they cause the difference in the mean value of the
groups.
Example: Let us claim that women have on average longer hair than men. We find twenty
undergraduate students and measure the length of their hair. A conservative statistician would
then claim that we measured the hair of ten female and ten male students, conducted an analysis
of variance, and found that the average hair length of the female undergraduate students is
significantly greater than that of their fellow male students.
Assumptions
The populations from which the samples were obtained must be normally or
approximately normally distributed.
The samples must be independent.
The variances of the populations must be equal.
Hypotheses
The null hypothesis will be that all population means are equal, the alternative hypothesis is that
at least one mean is different.
In the following, lower case letters apply to the individual samples and capital letters apply to the
entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.
Grand Mean
$$\bar{X}_{GM} = \frac{\sum x}{N}$$
The grand mean of a set of samples is the total of all the data values divided by the total sample
size. This requires that you have all of the sample data available to you, which is usually the
case, but not always. It turns out that all that is necessary to perform a one-way analysis of
variance are the number of samples, the sample means, the sample variances, and the sample
sizes.
$$\bar{X}_{GM} = \frac{\sum n \, \bar{x}}{\sum n}$$
Another way to find the grand mean is to find the weighted average of the sample means. The
weight applied is the sample size.
Total Variation
$$SS(T) = \sum \left( x - \bar{X}_{GM} \right)^2$$
The total variation (not variance) comprises the sum of the squares of the differences of each
data value from the grand mean.
There is the between group variation and the within group variation. The whole idea behind the
analysis of variance is to compare the ratio of between group variance to within group variance.
If the variance caused by the interaction between the samples is much larger when compared to
the variance that appears within each group, then it is because the means aren't the same.
The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares
Between groups. If the sample means are close to each other (and therefore the Grand Mean) this
will be small. There are k samples involved with one data value for each sample (the sample
mean), so there are k – 1 degrees of freedom.
The variance due to the interaction between the samples is denoted MS(B) for Mean Square
Between groups. This is the between group variation divided by its degrees of freedom. It is also
denoted by $s_b^2$.
Within Group Variation
$$SS(W) = \sum df \cdot s^2$$
The variation due to differences within individual samples, denoted SS(W) for Sum of Squares
Within groups. Each sample is considered independently, no interaction between samples is
involved. The degree of freedom is equal to the sum of the individual degrees of freedom for
each sample. Since each sample has degrees of freedom equal to one less than their sample sizes,
and there are k samples, the total degrees of freedom is k less than the total sample size:
df = N – k
The variance due to the differences within individual samples is denoted MS(W) for Mean
Square Within groups. This is the within group variation divided by its degrees of freedom. It is
also denoted by $s_w^2$. It is the weighted average of the variances (weighted with the degrees of
freedom).
F test statistic
$$F = \frac{s_b^2}{s_w^2}$$
Recall that an F variable is the ratio of two independent chi-square variables divided by their
respective degrees of freedom. Also recall that the F test statistic is the ratio of two sample
variances, well, it turns out that's exactly what we have here. The F test statistic is found by
dividing the between group variance by the within group variance. The degrees of freedom for
the numerator are the degrees of freedom for the between group (k-1) and the degrees of freedom
for the denominator are the degrees of freedom for the within group (N-k).
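To tie the pieces together, here is a minimal sketch (assuming NumPy and SciPy, with made-up summary statistics) that computes SS(B), SS(W), their mean squares, and the F test statistic exactly as described above:

```python
# A minimal sketch of the one-way ANOVA F statistic computed from summary
# statistics, assuming NumPy/SciPy; the means, variances and sizes are hypothetical.
import numpy as np
from scipy.stats import f

means = np.array([23.1, 19.8, 25.4])      # sample means, one per group
variances = np.array([4.2, 3.8, 5.1])     # sample variances, one per group
sizes = np.array([10, 12, 9])             # sample sizes, one per group

k = len(means)                  # number of groups
N = sizes.sum()                 # total sample size

# Grand mean as the weighted average of the sample means
grand_mean = (sizes * means).sum() / N

# Between-group variation and variance: df = k - 1
SSB = (sizes * (means - grand_mean) ** 2).sum()
MSB = SSB / (k - 1)

# Within-group variation and variance: df = N - k
SSW = ((sizes - 1) * variances).sum()
MSW = SSW / (N - k)

F = MSB / MSW
p_value = f.sf(F, k - 1, N - k)

print(f"F({k - 1}, {N - k}) = {F:.3f}, p-value = {p_value:.4f}")
```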
Example 1: Suppose the National Transportation Safety Board (NTSB) wants to examine the
safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of
the treatments (cars types). Using the hypothetical data provided below, test whether the mean
pressure applied to the driver's head during a crash test is equal for each type of car. Use α =
5%.
Table ANOVA. 1
The null hypothesis for an ANOVA always assumes the population means are equal. Hence, we
may write the null hypothesis as:
H0: µ1 = µ2 = µ3 - The mean head pressure is statistically equal across the three types of cars.
Since the null hypothesis assumes all the means are equal, we can reject the null hypothesis if
even one mean is not equal. Thus, the alternative hypothesis is:
HA: At least one of the mean head pressures is not statistically equal to the others.
The test statistic in ANOVA is the ratio of the between and within variation in the data. It
follows an F distribution.
Total Sum of Squares – the total variation in the data. It is the sum of the between and within
variation.
Total Sum of Squares (SST) = $\sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar{X})^2$, where r is the
number of rows in the table, c is the number of columns, $\bar{X}$ is the grand mean, and
$X_{ij}$ is the ith observation in the jth column.
$$\bar{X} = \frac{\sum X_{ij}}{N} = \frac{643 + 655 + 702 + 469 + 427 + 484 + 456 + 525 + 402}{9} = 529.22$$
Between Sum of Squares (or Treatment Sum of Squares) – variation in the data between the
different samples (or treatments).
Treatment Sum of Squares (SSTR) = $\sum r_j (\bar{X}_j - \bar{X})^2$, where $r_j$ is the number of
rows in the jth treatment and $\bar{X}_j$ is the mean of the jth treatment.
The test statistic is the ratio of the mean square for treatments, MSTR = SSTR / (c − 1), to the
mean square error, MSE = SSE / (N − c). For these data,
$$F = \frac{MSTR}{MSE} = \frac{43024.78}{1709} = 25.17$$
To find the critical value from an F distribution you must know the numerator (MSTR) and
denominator (MSE) degrees of freedom, along with the significance level.
FCV has df1 and df2 degrees of freedom, where df1 is the numerator degrees of freedom equal
to c-1 and df2 is the denominator degrees of freedom equal to N-c.
You reject the null hypothesis if: F (observed value) > FCV (critical value). In our example
25.17 > 5.14, so we reject the null hypothesis.
Step 5: Interpretation
Since we rejected the null hypothesis, we are 95% confident (1 − α) that the mean head pressure is
not statistically equal for compact, midsize, and full-size cars. However, since only one mean
must be different to reject the null, we do not yet know which mean(s) is/are different. In short,
an ANOVA test will tell us that at least one mean is different, but an additional test must be
conducted to determine which mean(s) is/are different.
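For reference, here is a minimal sketch of this example (assuming SciPy is installed). The grouping of the nine observations into car types is an assumption made for illustration, chosen to be consistent with the grand mean of 529.22 computed earlier:

```python
# A minimal sketch of the crash-test one-way ANOVA example, assuming SciPy.
# The grouping of the nine observations into car types is hypothetical,
# chosen to be consistent with the grand mean of 529.22 used above.
from scipy.stats import f_oneway, f

compact  = [643, 655, 702]
midsize  = [469, 427, 525]
fullsize = [484, 456, 402]

F, p_value = f_oneway(compact, midsize, fullsize)

k = 3                      # number of treatments (car types)
N = 9                      # total number of observations
critical_value = f.ppf(0.95, k - 1, N - k)   # alpha = 0.05

print(f"F = {F:.2f}, critical value = {critical_value:.2f}, p = {p_value:.4f}")
# F exceeds the critical value, so we reject the null hypothesis.
```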
If you fail to reject the null hypothesis in an ANOVA then you are done. You know, with some
level of confidence, that the treatment means are statistically equal. However, if you reject the
null then you must conduct a separate test to determine which mean(s) is/are different. There are
several techniques for testing the differences between means, but the most common test is the
Least Significant Difference Test.
Definition:
The Scheffe Test (also called Scheffe’s procedure or Scheffe’s method) is a post-hoc test used
in Analysis of Variance. It is named for the American statistician Henry Scheffe. After you have
run ANOVA and got a significant F-statistic (i.e. you have rejected the null hypothesis that the
means are the same), then you run Scheffe's test to find out which pairs of means are significantly different.
The Scheffe test corrects alpha for simple and complex mean comparisons. Complex mean
comparisons involve comparing more than one pair of means simultaneously.
Of the three mean comparison tests you can run (the other two are Fisher's LSD and Tukey's
HSD), the Scheffe test is the most flexible, but it is also the test with the lowest statistical
power. Deciding which test to run largely depends on what comparisons you're interested in:
If you only want to make pairwise comparisons, run the Tukey procedure because it will
have a narrower confidence interval.
If you want to compare all possible simple and complex pairs of means, run the Scheffe
test as it will have a narrower confidence interval.
This is where the Scheffe' and Tukey tests come into play. They will help us analyze pairs of
means to see if there is a difference -- much like the difference of two means covered earlier.
Hypotheses
H0: µi = µj
H1: µi ≠ µj
Both tests are set up to test if pairs of means are different. The formulas refer to mean i and mean
j. The values of i and j vary, and the total number of tests will be equal to a combination of k
objects, 2 at a time C(k, 2), where k is the number of samples.
Scheffé Test
The Scheffe' test is customarily used with unequal sample sizes, although it could be used with
equal sample sizes.
The critical value for the Scheffe' test is the degrees of freedom for the between variance times
the critical value for the one-way ANOVA. This simplifies to be:
CV = (k – 1) F(k – 1, N – k, alpha)
TS: $F_s = \dfrac{(\bar{x}_i - \bar{x}_j)^2}{s_w^2 \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}$
Pure mathematicians will argue that this shouldn't be called F because it doesn't have an F
distribution (it's the degrees of freedom times an F), but we'll live with it.
Reject H0 if the test statistic is greater than the critical value. Note, this is a right tail test. If there
is no difference between the means, the numerator will be close to zero, and so performing a left
tail test wouldn't show anything.
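Here is a minimal sketch of one pairwise Scheffe comparison (assuming SciPy; the group means, sample sizes, and within-group variance below are hypothetical):

```python
# A minimal sketch of a single pairwise Scheffe comparison, assuming SciPy;
# the group means, sizes, and within-group variance are hypothetical.
from scipy.stats import f

k, N = 3, 31                 # number of groups, total sample size
s_w2 = 4.4                   # within-group variance, MS(W) (hypothetical)
alpha = 0.05

xbar_i, n_i = 23.1, 10       # mean and size of group i (hypothetical)
xbar_j, n_j = 19.8, 12       # mean and size of group j (hypothetical)

# Test statistic: squared difference of means over s_w^2 * (1/n_i + 1/n_j)
Fs = (xbar_i - xbar_j) ** 2 / (s_w2 * (1 / n_i + 1 / n_j))

# Critical value: (k - 1) times the one-way ANOVA critical value
CV = (k - 1) * f.ppf(1 - alpha, k - 1, N - k)

print(f"Fs = {Fs:.2f}, CV = {CV:.2f}")
if Fs > CV:
    print("Reject H0: these two means differ.")
```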
Tukey Test
The Tukey test is only usable when the sample sizes are the same.
The Critical Value is looked up in a table. There are actually several different tables, one for
each level of significance. The number of samples, k, is used as an index along the top, and the
degrees of freedom for the within group variance, v = N – k, are used as an index along the left
side.
TS: $q = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{s_w^2 / n}}$
The test statistic is found by dividing the difference between the means by the square root of the
ratio of the within group variation and the sample size.
Reject the null hypothesis if the absolute value of the test statistic is greater than the critical
value (just like the linear correlation coefficient critical values).
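A minimal sketch of one Tukey comparison (assuming SciPy 1.7 or later for the studentized range distribution; all values below are hypothetical):

```python
# A minimal sketch of a single Tukey pairwise comparison, assuming
# scipy >= 1.7 (for studentized_range); all values are hypothetical.
from math import sqrt
from scipy.stats import studentized_range

k, n = 3, 10                  # number of groups, common sample size per group
N = k * n                     # total sample size
s_w2 = 4.4                    # within-group variance, MS(W) (hypothetical)
xbar_i, xbar_j = 23.1, 19.8   # two group means (hypothetical)

# Test statistic: difference of means over sqrt(within variance / n)
q = (xbar_i - xbar_j) / sqrt(s_w2 / n)

# Critical value from the studentized range distribution at alpha = 0.05,
# indexed by k and the within-group degrees of freedom N - k
q_crit = studentized_range.ppf(0.95, k, N - k)

print(f"q = {q:.2f}, critical value = {q_crit:.2f}")
if abs(q) > q_crit:
    print("Reject H0: these two means differ.")
```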
Two-way ANOVA
Definition
The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors). The primary purpose of a two-way ANOVA is to
understand if there is an interaction between the two independent variables on the dependent
variable. For example, you could use a two-way ANOVA to understand whether there is an
interaction between gender and educational level on test anxiety amongst university students,
where gender (males/females) and education level (undergraduate/postgraduate) are your
independent variables, and test anxiety is your dependent variable. Alternately, you may want to
determine whether there is an interaction between physical activity level and gender on blood
cholesterol concentration in children, where physical activity (low/moderate/high) and gender
(male/female) are your independent variables, and cholesterol concentration is your dependent
variable.
Assumption:
Your dependent variable should be measured at the continuous level (i.e., it is an
interval or ratio variable). Examples of continuous variables include revision time
(measured in hours), intelligence (measured using IQ score), exam performance
(measured from 0 to 100), weight (measured in kg), and so forth.
Your two independent variables should each consist of two or more categorical,
independent groups. Example independent variables that meet this criterion include
gender (2 groups: male or female), ethnicity (3 groups: Caucasian, African American and
Hispanic), profession (5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.
You should have independence of observations, which means that there is no relationship
between the observations in each group or between the groups themselves. For example,
there must be different participants in each group with no participant being in more than
one group. This is more of a study design issue than something you would test for, but it
is an important assumption of the two-way ANOVA. If your study fails this assumption,
you will need to use another statistical test instead of the two-way ANOVA (e.g., a
repeated measures design).
There should be no significant outliers. Outliers are data points within your data that do
not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean
score was 108 with only a small variation between students, one student had a score of
156, which is very unusual, and may even put her in the top 1% of IQ scores globally).
The problem with outliers is that they can have a negative effect on the two-way
ANOVA, reducing the accuracy of your results.
Your dependent variable should be approximately normally distributed for each
combination of the groups of the two independent variables.
There needs to be homogeneity of variances for each combination of the groups of the
two independent variables.
Hypotheses
There are three sets of hypotheses with the two-way ANOVA.
The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way ANOVA for
the row factor.
2. The population means of the second factor are equal. This is like the one-way ANOVA
for the column factor.
3. There is no interaction between the two factors. This is similar to performing a test for
independence with contingency tables.
Factors
The two independent variables in a two-way ANOVA are called factors. The idea is that there
are two variables, factors, which affect the dependent variable. Each factor will have two or more
levels within it, and the degrees of freedom for each factor is one less than the number of levels.
Treatment Groups
Treatment Groups are formed by making all possible combinations of the two factors. For
example, if the first factor has 3 levels and the second factor has 2 levels, then there will be
3 × 2 = 6 different treatment groups.
As an example, let's assume we're planting corn. The type of seed and type of fertilizer are the
two factors we're considering in this example. This example has 15 treatment groups. There are 3
– 1 = 2 degrees of freedom for the type of seed, and 5 – 1 = 4 degrees of freedom for the type of
fertilizer. There are 2 × 4 = 8 degrees of freedom for the interaction between the type of seed
and type of fertilizer.
The data that actually appears in the table are samples. In this case, 2 samples from each
treatment group were taken.
Main Effect
The main effect involves the independent variables one at a time. The interaction is ignored for
this part. Just the rows or just the columns are used, not mixed. This is the part which is similar
to the one-way analysis of variance. Each of the variances calculated to analyze the main effects
is like the between variance.
Interaction Effect
The interaction effect is the effect that one factor has on the other factor. The degrees of freedom
here are the product of the two degrees of freedom for each factor.
Within Variation
The Within variation is the sum of squares within each treatment group. You have one less than
the sample size (remember all treatment groups must have the same sample size for a two-way
ANOVA) for each treatment group. The total number of treatment groups is the product of the
number of levels for each factor. The within variance is the within variation divided by its
degrees of freedom.
Source              SS              df                   MS        F
Main Effect A       given           a - 1                SS / df   MS(A) / MS(W)
Main Effect B       given           b - 1                SS / df   MS(B) / MS(W)
Interaction Effect  given           (a - 1)(b - 1)       SS / df   MS(AB) / MS(W)
Within              given           ab(n - 1) = N - ab   SS / df
Total               sum of others   abn - 1 = N - 1
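To make the table concrete, here is a minimal sketch (assuming pandas and statsmodels are installed) that produces a two-way ANOVA table; the seed/fertilizer yield data are made up and use a smaller 2 × 3 design than the example above:

```python
# A minimal sketch of a two-way ANOVA table, assuming pandas and statsmodels;
# the seed/fertilizer yield data below are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Two factors: 2 seed types x 3 fertilizer types, 2 replicates per treatment group
data = pd.DataFrame({
    "seed":       ["S1"] * 6 + ["S2"] * 6,
    "fertilizer": ["F1", "F1", "F2", "F2", "F3", "F3"] * 2,
    "yield_":     [54, 56, 61, 63, 58, 57, 60, 62, 66, 68, 59, 61],
})

# Fit a model with both main effects and their interaction
model = ols("yield_ ~ C(seed) + C(fertilizer) + C(seed):C(fertilizer)", data=data).fit()

# ANOVA table: SS, df, F, and p-value for each source of variation
table = sm.stats.anova_lm(model, typ=2)
print(table)
```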
Summary
The following results were calculated using a spreadsheet. It provides the p-values, and the
critical values are for alpha = 0.05.
From the above results, we can see that the main effects are both significant, but the interaction
between them isn't. That is, the types of seed aren't all equal, and the types of fertilizer aren't all
equal, but the type of seed doesn't interact with the type of fertilizer.