Chi-Square Tests - Handout
Statistics 101
Prof. Rundel
October 25, 2011

Roughly 90% of individuals are right handed. The shape of the sampling distribution of proportions of individuals that are right handed in random samples of size 80 will be
(a) nearly normal
(b) left skewed
(c) right skewed
(d) not enough information to tell
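One way to reason about this question is the usual success-failure rule of thumb for proportions (an assumption here, since this slide does not state it): the normal approximation is considered reasonable only if both n·p and n·(1 − p) are at least 10. A minimal sketch of that check, with variable names chosen for illustration:

```python
n = 80      # sample size from the question
p = 0.90    # proportion of right-handed individuals from the question

expected_right_handed = n * p
expected_other = n * (1 - p)

# Success-failure check (rule of thumb, assumed): both counts should be >= 10.
nearly_normal = expected_right_handed >= 10 and expected_other >= 10
print(expected_right_handed, expected_other, nearly_normal)
```

When the check fails and p is close to 1, sample proportions cannot exceed 1 but can fall well below p, so the sampling distribution tends to have a longer left tail.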
Statistics 101 (Prof. Rundel) L15: Chi square October 25, 2011 1 / 39
Testing for goodness of fit using chi-square
It was observed that 5s or 6s occurred more often than expected, and Pearson hypothesized that this was probably due to the construction of the dice. Most inexpensive dice have hollowed-out pips, and since opposite sides add to 7, the face with 6 pips is lighter than its opposing face, which has only 1 pip.

Each day there were ∼150 images to process manually. At this rate Weldon’s experiment was repeated in a little more than six full days.
http://www.youtube.com/watch?v=95EErdouO2w
Creating a test statistic for one-way tables
(a) $\frac{1}{6}$
(b) $\frac{12}{6}$
(c) $\frac{26{,}306}{6}$
(d) $\frac{12 \times 26{,}306}{6}$
The table below shows the observed and expected counts from Labby’s experiment.
Do these data provide convincing evidence to suggest an inconsistency between the observed and expected counts?
The chi-square test statistic
Outcome            1        2        3        4        5        6
Observed counts    53,222   52,118   52,465   52,338   52,244   53,285
Expected counts    52,612   52,612   52,612   52,612   52,612   52,612

$Z_1 = \frac{53{,}222 - 52{,}612}{\sqrt{52{,}612}} = 2.66$

(a) $Z_2 = \frac{52{,}118 - 52{,}612}{\sqrt{52{,}612}} = -2.15$
(b) $Z_2 = \frac{52{,}612 - 52{,}118}{\sqrt{52{,}612}} = 2.15$
(c) $Z_2 = \frac{52{,}612 - 52{,}118}{\sqrt{52{,}118}} = 2.16$
(d) $Z_2 = \frac{52{,}118 - 52{,}612}{\sqrt{52{,}118}} = -2.16$
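The standardized differences can be computed for all six faces at once. The sketch below uses the observed counts from the table and assumes the standard form of the chi-square statistic, the sum of the squared standardized differences (observed − expected) / √expected. Variable names are illustrative.

```python
import math

# Observed counts for faces 1..6, copied from the table.
observed = [53222, 52118, 52465, 52338, 52244, 53285]

# Under the null hypothesis every face is equally likely,
# so each expected count is the total divided by 6.
expected = [sum(observed) / len(observed)] * len(observed)

# Standardized difference for each face: (observed - expected) / sqrt(expected).
z_scores = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Chi-square statistic: sum of the squared standardized differences.
chi_square = sum(z * z for z in z_scores)

print([round(z, 2) for z in z_scores])
print(round(chi_square, 2))
```

Squaring makes every term positive, so large deviations in either direction increase the statistic.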
The chi-square distribution and finding areas
In order to determine if the χ² statistic we calculated is considered unusually high or not we need to first describe its distribution.
The chi-square distribution is sometimes used to characterize data sets and statistics that are always positive and typically right skewed.
The chi-square distribution has just one parameter called degrees of freedom (df), which influences the shape, center, and spread of the distribution.
[Figure: chi-square density curves for 2, 4, and 9 degrees of freedom; horizontal axis from 0 to 25]
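To see how df moves the center and spread, a small simulation can help. It relies on a standard fact that the handout does not state: a chi-square variable with df degrees of freedom behaves like a sum of df squared independent standard normal draws, with mean df and variance 2·df. The function name, seed, and number of draws are arbitrary choices for this sketch.

```python
import random
import statistics

def simulate_chi_square(df, n_draws, rng):
    """Draw n_draws values, each the sum of df squared standard normals."""
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]

rng = random.Random(101)  # fixed seed so the run is repeatable
summary = {}
for df in (2, 4, 9):
    draws = simulate_chi_square(df, 20000, rng)
    summary[df] = (statistics.mean(draws), statistics.variance(draws))
    # Sample mean should be near df, sample variance near 2 * df.
    print(df, round(summary[df][0], 2), round(summary[df][1], 2))
```

As df grows, both the simulated mean and the simulated variance grow, which matches curves that shift right and flatten.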
Finding areas under the chi-square curve
We will calculate the p-value for the hypotheses we set earlier as the tail area under the chi-square distribution.
For this we can use technology, or a chi-square probability table.
This table differs a bit from the normal probability table:

Upper tail   0.3     0.2     0.1     0.05    0.02    0.01    0.005   0.001
df 1         1.07    1.64    2.71    3.84    5.41    6.63    7.88    10.83
   2         2.41    3.22    4.61    5.99    7.82    9.21    10.60   13.82
   3         3.66    4.64    6.25    7.81    9.84    11.34   12.84   16.27
   4         4.88    5.99    7.78    9.49    11.67   13.28   14.86   18.47
   5         6.06    7.29    9.24    11.07   13.39   15.09   16.75   20.52
   6         7.23    8.56    10.64   12.59   15.03   16.81   18.55   22.46
   7         8.38    9.80    12.02   14.07   16.62   18.48   20.28   24.32

Finding areas under the chi-square curve (cont.)
Estimate the shaded area under the chi-square curve with df = 6.
[Figure: chi-square curve with df = 6 and a shaded upper tail; horizontal axis from 0 to 25]
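As an example of the "technology" route, the upper tail area can be computed directly. The sketch below is one possible approach, not necessarily what the course software does: it evaluates the regularized lower incomplete gamma function by its power series and subtracts from 1. The function name `chi_square_upper_tail` is made up for this example.

```python
import math

def chi_square_upper_tail(x, df):
    """Area under the chi-square curve with df degrees of freedom above x.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    # Add series terms until they no longer change the running total.
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

# Cutoffs from the df = 6 row of the table; each result should be
# close to the upper tail area in the column header.
for cutoff in (7.23, 8.56, 10.64, 12.59, 15.03, 16.81, 18.55, 22.46):
    print(cutoff, round(chi_square_upper_tail(cutoff, 6), 3))
```

Reading the table goes the other way: find the row for the df, locate the cutoffs on either side of the statistic, and read the bounding tail areas from the header.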
Finding areas under the chi-square curve (cont.)
Clicker question
Estimate the shaded area (above 17) under the χ² curve with df = 9.
[Figure: chi-square curve with the area above 17 shaded]
(e) between 0.01 and 0.02

Finding areas under the chi-square curve (one more)
Clicker question
Estimate the shaded area (above 30) under the χ² curve with df = 10.
[Figure: chi-square curve with the area above 30 shaded]
(e) cannot tell using this table

Upper tail   0.3     0.2     0.1     0.05    0.02    0.01    0.005   0.001
df 7         8.38    9.80    12.02   14.07   16.62   18.48   20.28   24.32
   8         9.52    11.03   13.36   15.51   18.17   20.09   21.95   26.12
   9         10.66   12.24   14.68   16.92   19.68   21.67   23.59   27.88
   10        11.78   13.44   15.99   18.31   21.16   23.21   25.19   29.59
   11        12.90   14.63   17.28   19.68   22.62   24.72   26.76   31.26
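The table lookup itself can be written out as a small procedure: count how many cutoffs in the row the statistic exceeds, then read off the tail areas on either side. This is a sketch of the lookup only; the names `UPPER_TAIL`, `CUTOFFS`, and `bracket_tail_area` are invented for this example, and the numbers are copied from the table rows for df 7 to 11.

```python
UPPER_TAIL = [0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.001]

# Cutoffs copied from the table, keyed by degrees of freedom.
CUTOFFS = {
    7: [8.38, 9.80, 12.02, 14.07, 16.62, 18.48, 20.28, 24.32],
    8: [9.52, 11.03, 13.36, 15.51, 18.17, 20.09, 21.95, 26.12],
    9: [10.66, 12.24, 14.68, 16.92, 19.68, 21.67, 23.59, 27.88],
    10: [11.78, 13.44, 15.99, 18.31, 21.16, 23.21, 25.19, 29.59],
    11: [12.90, 14.63, 17.28, 19.68, 22.62, 24.72, 26.76, 31.26],
}

def bracket_tail_area(statistic, df):
    """Return (low, high) bounds on the upper tail area from the table row."""
    row = CUTOFFS[df]
    # Number of cutoffs in the row that the statistic exceeds.
    k = sum(1 for cutoff in row if statistic > cutoff)
    if k == 0:
        return (UPPER_TAIL[0], 1.0)      # smaller than every cutoff
    if k == len(row):
        return (0.0, UPPER_TAIL[-1])     # larger than every cutoff
    return (UPPER_TAIL[k], UPPER_TAIL[k - 1])

print(bracket_tail_area(17, 9))
print(bracket_tail_area(30, 10))
```

A statistic beyond the last column still gives usable information: the tail area is smaller than the smallest area the table lists.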
Finding the tail areas in the year 2011

Finding a p-value for a chi-square test
Back to Labby’s dice
Degrees of freedom for a goodness of fit test
Finding a p-value for a chi-square test
The p-value for a chi-square test is defined as the tail area above the
calculated test statistic.
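To make "tail area above the test statistic" concrete, the sketch below integrates the chi-square density numerically for Labby's dice. It assumes the usual goodness of fit degrees of freedom, df = k − 1 with k = 6 faces, and uses a plain trapezoid rule with an arbitrary cutoff 200 units above the statistic. The function names and integration settings are choices made for this example.

```python
import math

def chi_square_density(x, df):
    """Density of the chi-square distribution with df degrees of freedom (x > 0)."""
    half = df / 2.0
    log_density = ((half - 1.0) * math.log(x) - x / 2.0
                   - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_density)

def tail_area(statistic, df, width=200.0, steps=200000):
    """Trapezoid-rule approximation of the area above `statistic`."""
    h = width / steps
    total = 0.5 * (chi_square_density(statistic, df)
                   + chi_square_density(statistic + width, df))
    for i in range(1, steps):
        total += chi_square_density(statistic + i * h, df)
    return total * h

# Labby's observed counts; expected counts are equal under the null hypothesis.
observed = [53222, 52118, 52465, 52338, 52244, 53285]
expected = sum(observed) / len(observed)
chi_square = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1          # k - 1 for a goodness of fit test (assumed)
p_value = tail_area(chi_square, df)
print(round(chi_square, 2), df, p_value)
```

The result falls below 0.001, the smallest upper tail area in the handout's probability table for df = 5.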
Recap: p-value for a chi-square test
The p-value for a chi-square test is defined as the tail area above the calculated test statistic.
This is because the test statistic is always positive, and a higher test statistic means a larger deviation from the null hypothesis.
[Figure: chi-square curve with the upper tail shaded and labeled "p-value"]

Conditions for the chi-square test
1 Independence: Each case that contributes a count to the table must be independent of all the other cases in the table.
2 Sample size / distribution: Just like for proportions, each particular scenario (i.e. cell count) must have at least 10 expected cases.
Failing to check conditions may unintentionally affect the test’s error rates.
Evaluating goodness of fit for a distribution
There was a lot of talk of election fraud in the 2009 Iran election. We’ll compare the data from a poll conducted before the election (observed data) to the reported votes in the election to see if the two follow the same distribution.

                   Observed # of     Reported % of
Candidate          voters in poll    votes in election
Ahmadinejad        338               63.29%
Mousavi            136               34.10%
Minor candidates   30                2.61%
Total              504               100%

What are the hypotheses for testing if the distributions of reported and polled votes are different?
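The goodness of fit calculation for the poll data can be sketched as follows. It treats the reported election percentages as the null distribution, computes expected poll counts as 504 times each percentage, and assumes df = k − 1 = 2. For df = 2 the upper tail area has the simple closed form exp(−x / 2), which is used here in place of a table. Variable names are illustrative, and the expected counts are left unrounded, so the statistic may differ slightly from a hand calculation with rounded counts.

```python
import math

# Poll counts and reported election shares, in the order of the table:
# leading candidate, runner-up, minor candidates.
observed = [338, 136, 30]
reported_share = [0.6329, 0.3410, 0.0261]

n = sum(observed)
expected = [n * share for share in reported_share]

# Sample size condition from the handout: at least 10 cases per cell.
condition_met = all(e >= 10 for e in expected)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # k - 1 for a goodness of fit test (assumed)

# Closed form for the upper tail area when df = 2.
p_value = math.exp(-chi_square / 2)

print([round(e, 2) for e in expected], condition_met)
print(round(chi_square, 2), df, p_value)
```

A p-value this small would be read as evidence that the poll and the reported votes do not follow the same distribution, provided the independence condition is also reasonable for the poll.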
Testing for independence in two-way tables
Expected counts in two-way tables
The chi-square test statistic for two-way tables
Calculating the test statistic in two-way tables
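The slide content for this calculation is not reproduced in the handout text, so the sketch below uses a made-up 2×2 table purely for illustration. It assumes the standard formulas: expected count = (row total × column total) / table total, the same sum of (observed − expected)² / expected over all cells, and df = (rows − 1) × (columns − 1).

```python
# Hypothetical 2x2 table of counts (NOT data from the handout).
table = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic summed over every cell of the table.
chi_square = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(expected, chi_square, df)
```

With the statistic and df in hand, the p-value is again the upper tail area, found from the table row for that df or with software.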
Clicker question
Which of the following is the correct p-value for this hypothesis test?
Upper tail 0.3 0.2 0.1 0.05 0.02 0.01 0.005 0.001
df 1 1.07 1.64 2.71 3.84 5.41 6.63 7.88 10.83
2 2.41 3.22 4.61 5.99 7.82 9.21 10.60 13.82
3 3.66 4.64 6.25 7.81 9.84 11.34 12.84 16.27
4 4.88 5.99 7.78 9.49 11.67 13.28 14.86 18.47
5 6.06 7.29 9.24 11.07 13.39 15.09 16.75 20.52