Chi Square Test

Uploaded by

pspalitto

THE CHI SQUARE (χ²) TEST          Name:

Period: 1 2 3 4 5 6 7

Background:
Science is a methodical way of knowing about the world, as well as a cumulative body of knowledge
that describes the way the natural world works. The Scientific Method poses ideas about the way the world works,
and the ideas are presented as causal statements called hypotheses. Hypotheses are tested in experiments. As a
result of experiments, data are collected, which can be either quantitative or qualitative. Collected data either
support or do not support the hypothesis being tested. Thus it is important to discern whether the data being
collected are meaningful.
The Chi Square is a test of statistical significance for quantitative data recorded as counts in categories. That
means it can determine whether the collected data differ from what is normal or expected by more than random
chance would explain. Whether or not the data show such a difference is important evidence for or against the
hypothesis under investigation.

The Process:
1. The Chi Square test begins with the creation of a hypothesis. A hypothesis suggests a relationship
between a specific cause and a specific effect. The EASIEST way to write a hypothesis is to use the
format “If…(cause)…, then…(effect)…” Of course you can write a hypothesis in multiple ways, but
you may mislead or confuse the reader if you use any method other than “If…Then…”

EXAMPLE: If you flip a coin 100 times, then you should see equal numbers of heads and tails. This is a testable
hypothesis, based on previously acquired data, linking a specific cause to a specific consequence/effect, and
assumes that coin flipping is a perfectly randomized process.

2. Once you have posed a specific cause-effect relationship, the next step is to write a Null Hypothesis.
The EASIEST way to write a null hypothesis is to use the format “There is no difference
between…(observed data)…and…(expected data)…” Note that this is about the DATA you collected,
not about the cause-effect relationship of the original hypothesis being tested by the experiment.

EXAMPLE: You expect to see the coin land equal numbers of times on each side, because there are only 2
outcomes, and if 100 flips is the trial pool, a perfectly fair coin flipped in a random manner should land on each
side an equal number of times. For this experiment, use the data given below:

# Heads     # Tails     TOTAL coin flips
47          53          100

Heavens to Betsy! The numbers did not come out exactly 50 heads : 50 tails! Does that mean that coin flips are
NOT supposed to be even? Or is there something wrong with the coin? Or the flipper? Hold on there, buckaroo—
maybe the results you see are simply the result of random variation…and not evidence of some Global 1%
Conspiracy. This is where Chi Square comes in. A sample null hypothesis would be: “There is no difference
between a 47 heads : 53 tails distribution and a 50 heads : 50 tails distribution when flipping a coin 100 times.”
When we say “there is no difference”, what we are REALLY saying is that “there is no significant difference, and
any deviation we observe is because of random chance variations in sampling.”
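
To get a feel for how much random variation to expect, the exact chance of a fair coin landing at least as far from 50:50 as 47:53 can be worked out by counting outcomes. This is a side illustration only, not a step of the Chi Square procedure. The function name `prob_at_least_this_uneven` is made up for this sketch, and the calculation assumes a perfectly fair coin.

```python
from math import comb

def prob_at_least_this_uneven(flips, heads_seen):
    """Chance that a fair coin lands at least as far from an even split
    as heads_seen out of flips (counting both directions)."""
    gap = abs(heads_seen - flips / 2)
    total_outcomes = 2 ** flips
    # Count every sequence whose number of heads is at least `gap` away from even.
    ways = sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - flips / 2) >= gap)
    return ways / total_outcomes

p = prob_at_least_this_uneven(100, 47)
print(f"Chance of a split at least as uneven as 47:53 = {p:.2f}")
```

A fair coin misses 50:50 by 3 or more flips well over half the time, so a 47:53 result on its own is no reason to suspect the coin.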

3. To calculate a Chi Square value, we need to know something called “degrees of freedom”. For any one
specific outcome, this is the number of OTHER outcomes that could have happened instead.

EXAMPLE: Since there are two classes of possible outcome (heads or tails), when a “heads” shows up, there is one
other possible outcome. Therefore, there is 1 degree of freedom. If there were ten categories of outcome, there
would be 9 degrees of freedom. So degrees of freedom will always be n-1, where n is the number of categories of
outcome.

4. We also need a standard to compare against. When the Chi Square test was developed by Pearson in
1900, he calculated what are called “critical values” at each degree of freedom. The table below shows
the critical values for df from 1 to 5 at a p=.05 level of probability. HUH? Translation: If the null
hypothesis is true, random chance alone produces a Chi Square value greater than the critical value
only 5% of the time. So if the calculated value is greater than the critical value, a deviation that
large is unlikely to be the result of random chance alone, and we conclude that the observed is
significantly different from the expected.

Critical Values of the Chi-Square Distribution

                      Degrees of Freedom (df)
Probability (p)       1       2       3       4       5
0.05                  3.84    5.99    7.82    9.49    11.1
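
Steps 3 and 4 amount to a subtraction followed by a table lookup. A minimal sketch is shown here, using only the five critical values printed in this handout; the names `degrees_of_freedom` and `critical_value` are invented for illustration.

```python
# Critical values at p = 0.05, copied from this handout's table (df 1 through 5 only).
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.1}

def degrees_of_freedom(number_of_categories):
    """df = n - 1, where n is the number of categories of outcome."""
    return number_of_categories - 1

def critical_value(number_of_categories):
    """Look up the p = 0.05 critical value for this many categories."""
    df = degrees_of_freedom(number_of_categories)
    if df not in CRITICAL_VALUES_P05:
        raise ValueError(f"The table only covers df 1 to 5, not df = {df}")
    return CRITICAL_VALUES_P05[df]

print(critical_value(2))  # two categories: heads or tails
print(critical_value(6))  # six categories
```

An experiment with more than six categories would need a longer table than the one given here, which is why the lookup raises an error instead of guessing.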

5. To actually calculate a Chi Square, we use this equation:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where: o is the observed number
e is the expected number
o – e is the difference between observed and expected
(o – e)² is the square of the difference
Σ means “add up the result for every category of outcome”

For each category of outcome, take the difference between the observed and the expected, square
it, and divide by the expected. The calculated Chi Square value is the sum of these results. The
easiest way to see this in action is to create a table that has our data, and creates the terms of
the equation one at a time.

EXAMPLE: For our heads/tails problem, this is what our data table would look like:

Coin face   # observed (o)   # expected (e)   (o-e)   (o-e)²   (o-e)²/e
Heads       47               50               -3      9        .18
Tails       53               50               3       9        .18
                                                      χ² = .36

For each possible outcome, you have the observed quantity, and the expected quantity. For each
possible outcome, you take the difference between observed and expected. Square that difference.
Divide the squared result by the expected number. Sum all of the quotients. This gives the
calculated Chi Square value. In the case of our heads/tails example, the calculated Chi Square of
.36 is less than the critical value from the table of 3.84 under 1 degree of freedom. Since the
calculated value is LESS THAN the critical value, we “fail to reject the null hypothesis”, and there
is no statistically significant difference between 47:53 and 50:50. A deviation this small is well
within what chance in the sampling process produces on its own. If the calculated value had been
GREATER THAN the critical value, we would “reject the null hypothesis”, meaning that the difference
we observe is unlikely to be due to chance alone, and there is probably SOMETHING going on that
skews the data from expected. What that something is, the Chi Square does not tell us.
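
The heads/tails calculation can also be checked with a few lines of code. This is a minimal sketch, not a statistics library; the function name `chi_square` is chosen for this example, and the critical value 3.84 is typed in from the table for 1 degree of freedom.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over every category of outcome."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [47, 53]   # heads, tails from the example
expected = [50, 50]   # a fair coin flipped 100 times
critical = 3.84       # p = 0.05, 1 degree of freedom

value = chi_square(observed, expected)
print(f"Chi Square = {value:.2f}")
if value > critical:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

The two lists must be in the same order (heads first, then tails) so that each observed number is paired with its own expected number.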
Let’s see this in practice:

Chi Square Modeling Using m&m’s® Candies


Introduction:
When you open a snack size bag of m&m’s®, there is a small handful of
these chocolatey treats. While the colors themselves have no flavor, the
candies have a visual appeal that just screams “eat me!” It is possible that
you will get a good mix of colors, or perhaps there will be a bag of mostly
blue m&m’s®. What’s going on at the Mars® Company? Is the number of
the different colors of m&m’s® in a package really different from one
package to the next, or does the Mars® Company do something to ensure
that each package gets a standard number of each color of m&m®? I
imagine you’ve stayed up nights pondering this!

Objectives: After this investigation you should be able to:


• write a null hypothesis that pertains to the investigation;
• determine the degrees of freedom (df) for an investigation;
• calculate the χ² value for a given set of data;
• use the critical values table to determine if the calculated value is equal to or less than the critical value;
• decide whether to reject or fail to reject the null hypothesis, based on whether the calculated Chi Square
value exceeds the critical value.

m&m® candies are manufactured at 2 different plants: Hackettstown, NJ; and Cleveland, TN (to determine origin,
look at the serial production code on the package: HKP or CLV). These plants ship to distributors in different
parts of the country. Each plant has a different distribution for colors in the packages. Here are the expected
percentages of m&m’s® as calculated by the website StatsMedic (https://www.statsmedic.com/m-m-distribution):

Color       Hackettstown    Cleveland
Brown       12.5%           12.4%
Red         12.5%           13.1%
Yellow      12.5%           13.5%
Orange      25%             20.5%
Blue        25%             20.7%
Green       12.5%           19.8%

Procedure:
1. If you plan to eat your results, wash your hands.

2. Lay out a paper towel on the desk. Tear open a bag of m&m® candies and spill them onto the
paper towel.

3. Count the number of candies in each color category and write them in the “number observed”
column of Table 1. Fill in the percentage expected based on the production code for the factory
that made your candy. Use these numbers to calculate the “number expected” for each color.
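
Table 1
Number Expected (e) = (Total number of candies) x (Percentage Expected)

Color of Candy   Number Observed (o)   Percentage Expected   Number Expected (e)
Brown            ________              ________              ________
Red              ________              ________              ________
Yellow           ________              ________              ________
Orange           ________              ________              ________
Blue             ________              ________              ________
Green            ________              ________              ________
Total # candies = ________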
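
The “Number Expected” column is one multiplication per color. A short sketch follows, using the Hackettstown percentages listed in this handout. The bag size of 56 candies is a HYPOTHETICAL number picked only so the arithmetic is easy to follow, and the name `expected_counts` is made up for this example.

```python
# Expected percentages for the Hackettstown plant, as listed in this handout.
HACKETTSTOWN = {"Brown": 12.5, "Red": 12.5, "Yellow": 12.5,
                "Orange": 25.0, "Blue": 25.0, "Green": 12.5}

def expected_counts(total_candies, percentages):
    """Number expected = total number of candies x percentage expected."""
    return {color: total_candies * pct / 100
            for color, pct in percentages.items()}

# HYPOTHETICAL bag size, chosen only for illustration.
total = 56
expected = expected_counts(total, HACKETTSTOWN)
for color, e in expected.items():
    print(f"{color:<7} {e:.1f}")
```

Expected numbers do not have to be whole numbers. If your bag gives an expected value such as 6.9, keep the decimal and do not round it to a whole candy.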

4. Write a null hypothesis for the distribution of colors of m&m® candies from your bag.

5. Write the number of candies observed in column (o) of Table 2. Write the number of candies
expected in column (e). Do the math to finish the rest of the columns in the table.

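Table 2
Classes (Colors)   Observed (o)   Expected (e)   o-e        (o-e)²     (o-e)²/e
Brown              ________       ________       ________   ________   ________
Red                ________       ________       ________   ________   ________
Yellow             ________       ________       ________   ________   ________
Orange             ________       ________       ________   ________   ________
Blue               ________       ________       ________   ________   ________
Green              ________       ________       ________   ________   ________

Degrees of freedom = _________          χ² = __________
(number of classes – 1)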

Critical Values of the Chi-Square Distribution

                      Degrees of Freedom (df)
Probability (p)       1       2       3       4       5
0.05                  3.84    5.99    7.82    9.49    11.1

Analysis:
If the calculated value for chi square is lower than the critical value, then we fail to reject the null
hypothesis. In other words, there is NO statistically significant difference between the observed
and expected distribution of data. Stated another way…any differences we see between the color
percentages Mars® claims are in a bag, and what is actually in a bag of m&m’s®, can be explained
by chance sampling error.

If the calculated value is higher than the critical value, then we reject the null hypothesis,
meaning there IS a statistically significant difference in m&m’s® color ratios between store-bought
bags of m&m’s® and what the Mars® Co. claims are the actual ratios. Stated another way…any
differences we see between what Mars® claims and what is actually in a bag of m&m’s® are unlikely
to be the result of chance sampling error alone.
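
The whole procedure for one bag, from counts to decision, might look like the sketch below. The observed counts are HYPOTHETICAL numbers from an imaginary bag, not real data. The expected percentages are the Hackettstown values listed in this handout, and the critical value is the df = 5 entry from the table.

```python
# Expected percentages for the Hackettstown plant, as listed in this handout.
EXPECTED_PCT = {"Brown": 12.5, "Red": 12.5, "Yellow": 12.5,
                "Orange": 25.0, "Blue": 25.0, "Green": 12.5}

# HYPOTHETICAL counts from one imaginary bag (not real data).
observed = {"Brown": 6, "Red": 8, "Yellow": 7,
            "Orange": 15, "Blue": 12, "Green": 8}

CRITICAL_P05_DF5 = 11.1  # p = 0.05 with 6 colors - 1 = 5 degrees of freedom

total = sum(observed.values())
chi_square = 0.0
for color, o in observed.items():
    e = total * EXPECTED_PCT[color] / 100   # number expected for this color
    chi_square += (o - e) ** 2 / e          # one row of Table 2

decision = "reject" if chi_square > CRITICAL_P05_DF5 else "fail to reject"
print(f"Chi Square = {chi_square:.2f}, so we {decision} the null hypothesis")
```

For this imaginary bag the calculated value falls far below 11.1, so the result is “fail to reject”. Swap in your own counts and your own plant's percentages to check the arithmetic in your tables.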

Individual Data
1. What is the calculated χ² value for your Individual data? __________________

2. What is the critical value (p = 0.05) for your Individual data? _____________

3. Based on your individual sample, should you reject or fail to reject the null hypothesis?
Why?

4. If you rejected your null hypothesis, what might be two explanations for your outcome?

Class Data
Now, let’s do the calculation again, but this time for the whole class:

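Table 1
Number Expected (e) = (Total number of all pieces of candy) x (Percentage Expected)

Color of Candy   Number Observed (o)   Percentage Expected   Number Expected (e)
Brown            ________              ________              ________
Red              ________              ________              ________
Yellow           ________              ________              ________
Orange           ________              ________              ________
Blue             ________              ________              ________
Green            ________              ________              ________
Total # candies = ________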
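
Table 2
Classes (Colors)   Observed (o)   Expected (e)   o-e        (o-e)²     (o-e)²/e
Brown              ________       ________       ________   ________   ________
Red                ________       ________       ________   ________   ________
Yellow             ________       ________       ________   ________   ________
Orange             ________       ________       ________   ________   ________
Blue               ________       ________       ________   ________   ________
Green              ________       ________       ________   ________   ________

Degrees of freedom = _________          χ² = __________
(number of classes – 1)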

Critical Values of the Chi-Square Distribution

                      Degrees of Freedom (df)
Probability (p)       1       2       3       4       5
0.05                  3.84    5.99    7.82    9.49    11.1
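
For the class data, the only new step is adding up every bag's counts before filling in the tables. A minimal sketch of that pooling step is shown here. The three bags are HYPOTHETICAL, invented for illustration, and the sketch assumes every bag came from the same plant, since each plant has its own expected percentages.

```python
from collections import Counter

# HYPOTHETICAL counts from three imaginary bags (not real data).
bags = [
    {"Brown": 6, "Red": 8, "Yellow": 7, "Orange": 15, "Blue": 12, "Green": 8},
    {"Brown": 9, "Red": 5, "Yellow": 6, "Orange": 13, "Blue": 16, "Green": 7},
    {"Brown": 7, "Red": 7, "Yellow": 8, "Orange": 12, "Blue": 14, "Green": 6},
]

class_observed = Counter()
for bag in bags:
    class_observed.update(bag)  # adds each color's count to the running total

class_total = sum(class_observed.values())
print(dict(class_observed))
print("Total # candies =", class_total)
```

The pooled counts go in the “Number Observed” column, and the expected numbers are recalculated from the class total, not added up from each student's individual expected numbers.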

1. What is the calculated χ² value for your Class data? __________________

2. What is the critical value (p = 0.05) for your Class data? _____________

3. Based on the class data, should you reject or fail to reject the null hypothesis? Why?

4. If you rejected your null hypothesis, what might be some explanations for your outcome?
