CHP 6
Two scientists want to know if a certain drug is effective against
high blood pressure. The first scientist wants to give the drug to
1000 people with high blood pressure and see how many of them
experience lower blood pressure levels. The second scientist wants
to give the drug to 500 people with high blood pressure, and not give
the drug to another 500 people with high blood pressure, and see
how many in both groups experience lower blood pressure levels.
Which is the better way to test this drug?
Results from the GSS
Parameter and point estimate
p (a population proportion)
p̂ (a sample proportion)
Inference on a proportion
point estimate ± ME
• And we also know that ME = critical value × standard error of
the point estimate.
Standard error of a sample proportion
$SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
Sample proportions are also nearly normally distributed
Back to experimental design...
The GSS found that 571 out of 670 (85%) Americans answered the
question on experimental design correctly. Estimate (using a 95%
confidence interval) the proportion of all Americans who have good
intuition about experimental design?
We are given that n = 670, p̂ = 0.85, we also just learned that the
standard error of the sample proportion is $SE = \sqrt{\frac{p(1-p)}{n}}$. Which of
the below is the correct calculation of the 95% confidence interval?
(a) $0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}}$ → (0.82, 0.88)
(b) $0.85 \pm 1.65 \times \sqrt{\frac{0.85 \times 0.15}{670}}$
(c) $0.85 \pm 1.96 \times \frac{0.85 \times 0.15}{\sqrt{670}}$
(d) $571 \pm 1.96 \times \sqrt{\frac{571 \times 99}{670}}$
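The interval in option (a) can be checked numerically. The snippet below is a minimal sketch, assuming the rounded p̂ = 0.85 and z* = 1.96 used on the slide; the function name `proportion_ci` is an illustrative choice, not part of the course material.

```python
import math

def proportion_ci(p_hat, n, z_star=1.96):
    """Return (lower, upper) for p_hat ± z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
    me = z_star * se                         # margin of error
    return p_hat - me, p_hat + me

lower, upper = proportion_ci(0.85, 670)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the bounds agree with the (0.82, 0.88) shown for option (a).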
Choosing a sample size
How many people should you sample in order to cut the margin of
error of a 95% confidence interval down to 1%?
$ME = z^\star \times SE$
$0.01 \ge 1.96 \times \sqrt{\frac{0.85 \times 0.15}{n}}$ → Use p̂ from previous study
$0.01^2 \ge 1.96^2 \times \frac{0.85 \times 0.15}{n}$
$n \ge \frac{1.96^2 \times 0.85 \times 0.15}{0.01^2}$
$n \ge 4898.04$ → n should be at least 4,899
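The same rearrangement can be written as a small helper. This is a sketch under the slide's assumptions (z* = 1.96, p̂ = 0.85 taken from the previous study); `required_sample_size` is a hypothetical name introduced here for illustration.

```python
import math

def required_sample_size(p_hat, margin, z_star=1.96):
    """Smallest integer n with z_star * sqrt(p_hat * (1 - p_hat) / n) <= margin."""
    # Solve ME >= z* * SE for n, then round up because n must be a whole number.
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / margin ** 2)

n_needed = required_sample_size(0.85, 0.01)
print(n_needed)
```

Rounding up rather than to the nearest integer matters here: 4,898 people would leave the margin of error slightly above 1%.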
What if there isn’t a previous study?
why?
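One common convention when no earlier estimate is available is to plan with p̂ = 0.5, because p(1 − p) is largest at 0.5 and therefore gives the largest, most conservative sample size. The sketch below illustrates this numerically; the helper name and the list of trial values are assumptions made for the example.

```python
import math

def required_sample_size(p_hat, margin, z_star=1.96):
    """Smallest integer n with z_star * sqrt(p_hat * (1 - p_hat) / n) <= margin."""
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / margin ** 2)

# p * (1 - p) peaks at p = 0.5, so that guess demands the most observations.
sizes = {p: required_sample_size(p, 0.01) for p in (0.1, 0.3, 0.5, 0.7, 0.85)}
for p, n in sizes.items():
    print(f"p_hat = {p:.2f} -> n >= {n}")
```

Any other guess for p̂ asks for fewer people, so a sample sized for 0.5 is large enough whatever the true proportion turns out to be.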
CI vs. HT for proportions
• Success-failure condition:
  • CI: At least 10 observed successes and failures
  • HT: At least 10 expected successes and failures, calculated
    using the null value
• Standard error:
  • CI: calculate using observed sample proportion: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  • HT: calculate using the null value: $SE = \sqrt{\frac{p_0(1-p_0)}{n}}$
The GSS found that 571 out of 670 (85%) Americans answered the
question on experimental design correctly. Do these data provide
convincing evidence that more than 80% of Americans have a good
intuition about experimental design?
(a) Yes
(b) No
(c) Cannot tell
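A one-sided test of H0: p = 0.80 against HA: p > 0.80 can be sketched as follows. The standard error uses the null value, as described for hypothesis tests; the function name is hypothetical, and the normal tail area is computed with `math.erfc` from the standard library.

```python
import math

def one_proportion_z_test(p_hat, p0, n):
    """Upper-tailed z-test of H0: p = p0 vs HA: p > p0, with SE from the null value."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
    return z, p_value

# Success-failure check with the null value: 670 * 0.8 = 536 and 670 * 0.2 = 134.
z, p_value = one_proportion_z_test(0.85, 0.80, 670)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these inputs the p-value falls well below 0.05, so the data favor the alternative that more than 80% of Americans have good intuition about experimental design.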
11% of 1,001 Americans responding to a 2006 Gallup survey stated
that they have objections to celebrating Halloween on religious
grounds. At 95% confidence level, the margin of error for this
survey is ±3%. A news piece on this study’s findings states: “More than
10% of all Americans have objections on religious grounds to
celebrating Halloween.” At 95% confidence level, is this news piece’s
statement justified?
(a) Yes
(b) No
(c) Cannot tell
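One way to reason about the news statement is to build the interval from the reported point estimate and margin of error and see whether it lies entirely above 10%. A minimal sketch, with variable names chosen for illustration:

```python
point_estimate = 0.11
margin_of_error = 0.03
claimed = 0.10

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

# "More than 10%" is only supported if the whole interval lies above 0.10.
claim_supported = lower > claimed
print(f"95% CI: ({lower:.2f}, {upper:.2f}); claim supported: {claim_supported}")
```

Because the interval runs from about 8% to 14%, it includes plausible values below 10%.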
Recap - inference for one proportion
Difference of two proportions
Melting ice cap
Scientists predict that global warming may have big effects on the
polar regions within the next 100 years. One of the possible effects
is that the northern ice cap may completely melt. Would this bother
you a great deal, some, a little, or not at all if it actually happened?
Results from the GSS
The GSS asks the same question. Below are the distributions of
responses from the 2010 GSS as well as from a group of
introductory statistics students at Duke University:
                GSS    Duke
A great deal    454     69
Some            124     30
A little         52      4
Not at all       50      2
Total           680    105
Parameter and point estimate
pDuke − pUS
Inference for comparing proportions
$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Conditions for CI for difference of proportions
Construct a 95% confidence interval for the difference between the
proportions of Duke students and Americans who would be bothered
a great deal by the melting of the northern ice cap ($p_{Duke} - p_{US}$).
Data                Duke          US
A great deal         69 (X1)     454 (X2)
Not a great deal     36          226
Total               105 (n1)     680 (n2)
Data                Duke     US
A great deal         69      454
Not a great deal     36      226
Total               105      680
p̂                  0.657    0.668
$(\hat{p}_{Duke} - \hat{p}_{US}) \pm z^\star \times \sqrt{\frac{\hat{p}_{Duke}(1-\hat{p}_{Duke})}{n_{Duke}} + \frac{\hat{p}_{US}(1-\hat{p}_{US})}{n_{US}}}$
$= (0.657 - 0.668) \pm 1.96 \times \sqrt{\frac{0.657 \times 0.343}{105} + \frac{0.668 \times 0.332}{680}}$
$= -0.011 \pm 1.96 \times 0.0497$
$= -0.011 \pm 0.097$
$= (-0.108, 0.086)$
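The calculation above can be reproduced from the raw counts. This sketch keeps the sample proportions unrounded, so the last digit may differ slightly from the hand calculation; `two_proportion_ci` is a name introduced for illustration only.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z_star=1.96):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Duke: 69 of 105 bothered a great deal; US (GSS): 454 of 680.
lower, upper = two_proportion_ci(69, 105, 454, 680)
print(f"95% CI for p_Duke - p_US: ({lower:.3f}, {upper:.3f})")
```

The interval spans zero, which is consistent with no detectable difference between the two proportions.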
Which of the following is the correct set of hypotheses for testing if
the proportion of all Duke students who would be bothered a great
deal by the melting of the northern ice cap differs from the
proportion of all Americans who do?
Flashback to working with one proportion
np0 ≥ 10 n(1 − p0 ) ≥ 10
Pooled estimate of a proportion
$\hat{p} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$
Calculate the estimated pooled proportion of Duke students and
Americans who would be bothered a great deal by the melting of
the northern ice cap. Which sample proportion ($\hat{p}_{Duke}$ or $\hat{p}_{US}$) is the
pooled estimate closer to? Why?
Data                Duke     US
A great deal         69      454
Not a great deal     36      226
Total               105      680
p̂                  0.657    0.668
$\hat{p} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2} = \frac{69 + 454}{105 + 680} = \frac{523}{785} = 0.666$
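The pooled estimate can also be read as a weighted average of the two sample proportions, with the sample sizes as weights. The sketch below shows both forms; the function and variable names are illustrative.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Total successes divided by total observations."""
    return (x1 + x2) / (n1 + n2)

p_duke, p_us = 69 / 105, 454 / 680
p_pool = pooled_proportion(69, 105, 454, 680)

# Same quantity written as a sample-size-weighted average of the two proportions.
weighted = (105 * p_duke + 680 * p_us) / (105 + 680)
print(f"pooled = {p_pool:.3f}, weighted average = {weighted:.3f}")
```

Since the US sample is much larger (680 versus 105), it carries more weight, which is why the pooled value sits closer to the US proportion.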
Do these data suggest that the proportion of all Duke students who
would be bothered a great deal by the melting of the northern ice
cap differs from the proportion of all Americans who do? Calculate
the test statistic, the p-value, and interpret your conclusion in
context of the data.
Data                Duke     US
A great deal         69      454
Not a great deal     36      226
Total               105      680
p̂                  0.657    0.668
$Z = \frac{\hat{p}_{Duke} - \hat{p}_{US}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_{Duke}} + \frac{\hat{p}(1-\hat{p})}{n_{US}}}}$
$= \frac{0.657 - 0.668}{\sqrt{\frac{0.666 \times 0.334}{105} + \frac{0.666 \times 0.334}{680}}} = \frac{-0.011}{0.0495} = -0.22$
p-value $= 2 \times P(Z < -0.22) = 2 \times 0.41 = 0.82$
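The test above can be sketched in code using the pooled proportion in the standard error. Because this version keeps unrounded proportions, it gives roughly z ≈ −0.21 and a p-value near 0.83 rather than the rounded −0.22 and 0.82; the function name is an assumption made for the example.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion for the SE."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return z, p_value

z, p_value = two_proportion_z_test(69, 105, 454, 680)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

Either way the p-value is large, so these data do not provide convincing evidence of a difference between the two proportions.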
Recap - comparing two proportions
• when H0 : p1 − p2 = (some value other than 0): use p̂1 and p̂2
- this is pretty rare
Reference - standard error calculations
mean         $SE = \frac{s}{\sqrt{n}}$          $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion   $SE = \sqrt{\frac{p(1-p)}{n}}$     $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Expected counts
(a) $\frac{1}{6}$
(b) $\frac{12}{6}$
(c) $\frac{26{,}306}{6}$
(d) $\frac{12 \times 26{,}306}{6} = 52{,}612$
Summarizing Labby’s results
The table below shows the observed and expected counts from
Labby’s experiment.
Why are the expected counts the same for all outcomes but the
observed counts are different? At a first glance, does there appear
to be an inconsistency between the observed and expected counts?
Setting the hypotheses
Evaluating the hypotheses
Anatomy of a test statistic
Chi-square statistic
When dealing with counts and investigating how far the observed
counts are from the expected counts, we use a new test statistic
called the chi-square ($\chi^2$) statistic.
$\chi^2$ statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(O - E)^2}{E}$ where k = total number of cells
Calculating the chi-square statistic
Outcome   Observed   Expected   $\frac{(O-E)^2}{E}$
1         53,222     52,612     $\frac{(53{,}222 - 52{,}612)^2}{52{,}612} = 7.07$
2         52,118     52,612     $\frac{(52{,}118 - 52{,}612)^2}{52{,}612} = 4.64$
3         52,465     52,612     $\frac{(52{,}465 - 52{,}612)^2}{52{,}612} = 0.41$
4         52,338     52,612     $\frac{(52{,}338 - 52{,}612)^2}{52{,}612} = 1.43$
5         52,244     52,612     $\frac{(52{,}244 - 52{,}612)^2}{52{,}612} = 2.57$
6         53,285     52,612     $\frac{(53{,}285 - 52{,}612)^2}{52{,}612} = 8.61$
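The per-outcome contributions in the table can be recomputed directly from the observed counts. This is a minimal sketch that assumes a fair die, so every outcome has the same expected count; the variable names are illustrative.

```python
observed = [53222, 52118, 52465, 52338, 52244, 53285]

# Under the fair-die null hypothesis each face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for face, c in enumerate(contributions, start=1):
    print(f"outcome {face}: {c:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Adding the six contributions listed in the table gives a statistic of about 24.73.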
Why square?
The chi-square distribution
- normal distribution: unimodal and symmetric with two parameters: mean and
standard deviation
- T distribution: unimodal and symmetric with one parameter: degrees of freedom
- F distribution: unimodal and right skewed with two parameters: degrees of freedom
of the numerator (between group variance) and denominator (within group variance)
Which of the following is false?
[Figure: chi-square density curves for 2, 4, and 9 degrees of freedom; horizontal axis from 0 to 25]
As the df increases,
Finding areas under the chi-square curve
Finding areas under the chi-square curve (cont.)
Estimate the shaded area (above the cutoff value of 10) under the
χ² curve with df = 6.
[1] 0.124652
Finding areas under the chi-square curve (cont.)
Estimate the shaded area (above the cutoff value of 17) under the
χ² curve with df = 9.
[Figure: χ² curve with df = 9, area above 17 shaded]
(a) 0.05
(b) 0.02
(c) between 0.02 and 0.05
(d) between 0.05 and 0.1
(e) between 0.01 and 0.02
[1] 0.04871598
Finding areas under the chi-square curve (one more)
Estimate the shaded area (above 30) under the χ² curve with df = 10.
[1] 0.0008566412
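The tail areas printed on these slides look like console output from statistical software. As a rough cross-check that needs only the Python standard library, the sketch below computes the upper tail of a chi-square distribution with integer degrees of freedom. `chi2_upper_tail` is a hypothetical helper written for this example, not a library function; it relies on the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(−y) / Γ(a + 1) for the regularized upper incomplete gamma function.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # df = 2: tail is exp(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # df = 1: tail is erfc(sqrt(x/2))
    # Step the shape parameter up by 1 (df up by 2) until it reaches df / 2.
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

for cutoff, df in [(10, 6), (17, 9), (30, 10)]:
    print(f"df = {df}, area above {cutoff}: {chi2_upper_tail(cutoff, df):.6f}")
```

The three calls correspond to the three shaded-area exercises: cutoff 10 with df = 6, cutoff 17 with df = 9, and cutoff 30 with df = 10.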
Back to Labby’s dice
Degrees of freedom for a goodness of fit test
df = k − 1
df = 6 − 1 = 5
Finding a p-value for a chi-square test
The p-value for a chi-square test is defined as the tail area above
the calculated test statistic.
[Figure: χ² curve with the tail area above 24.67 shaded]
Conclusion of the hypothesis test
(a) Reject H0 , the data provide convincing evidence that the dice
are fair.
(b) Reject H0 , the data provide convincing evidence that the dice
are biased.
(c) Fail to reject H0 , the data provide convincing evidence that the
dice are fair.
(d) Fail to reject H0 , the data provide convincing evidence that the
dice are biased.
Turns out...
• The 1-6 axis is consistently shorter than the other two (2-5
and 3-4), thereby supporting the hypothesis that the faces
with one and six pips are larger than the other faces.
• Pearson’s claim that 5s and 6s appear more often due to the
carved-out pips is not supported by these data.
• Dice used in casinos have flush faces, where the pips are
filled in with a plastic of the same density as the surrounding
material and are precisely balanced.
Recap: p-value for a chi-square test
Conditions for the chi-square test
2009 Iran Election
There was lots of talk of election fraud in the 2009 Iran election.
We’ll compare the data from a poll conducted before the election
(observed data) to the reported votes in the election to see if the
two follow the same distribution.
                        Observed # of     Reported % of
Candidate               voters in poll    votes in election
(1) Ahmadinejad         338               63.29%
(2) Mousavi             136               34.10%
(3) Minor candidates     30                2.61%
Total                   504               100%
                        (observed)        (expected distribution)
Hypotheses
H0 : The observed counts from the poll follow the same distribution
as the reported votes.
HA : The observed counts from the poll do not follow the same
distribution as the reported votes.
Calculation of the test statistic
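One way to carry out this calculation is to turn the reported vote shares into expected poll counts and then apply the chi-square formula. The sketch below keeps the expected counts unrounded, so a hand calculation with rounded counts may give a slightly different statistic; the variable names are illustrative.

```python
import math

observed = {"Ahmadinejad": 338, "Mousavi": 136, "Minor candidates": 30}
reported_share = {"Ahmadinejad": 0.6329, "Mousavi": 0.3410, "Minor candidates": 0.0261}

n = sum(observed.values())  # 504 poll respondents
expected = {name: n * share for name, share in reported_share.items()}

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# For df = 2 the upper tail area has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.2g}")
```

The statistic comes out close to 30 on 2 degrees of freedom, which corresponds to a very small p-value.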
Conclusion
Wednesday, 29 Nov
[Figure: goals (Grades, Popular, Sports) by grade (4th, 5th, 6th)]
        Grades   Popular   Sports
4th       63       31        25
5th       88       55        33
6th       96       55        32
Chi-square test of independence
*normally, on the test the table has 2 rows, 2 columns
Expected counts in two-way tables
$E_{row\ 1,\ col\ 1} = \frac{119 \times 247}{478} = 61$
Expected counts in two-way tables
(a) $\frac{176 \times 141}{478}$ → 52; more than expected # of 5th graders
    have a goal of being popular
(b) $\frac{119 \times 141}{478}$
(c) $\frac{176 \times 247}{478}$
(d) $\frac{176 \times 478}{478}$
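The expected count for any cell follows the same pattern: row total times column total, divided by the table total. A short sketch using the totals from the grade-by-goal table (row totals 119 and 176, column totals 247 and 141, table total 478); the helper name is hypothetical.

```python
def expected_count(row_total, column_total, table_total):
    """Expected cell count if the row and column variables are independent."""
    return row_total * column_total / table_total

e_4th_grades = expected_count(119, 247, 478)   # 4th graders whose goal is grades
e_5th_popular = expected_count(176, 141, 478)  # 5th graders whose goal is being popular
print(round(e_4th_grades), round(e_5th_popular))
```

The second value can be compared with the 55 fifth graders actually observed in that cell.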
Calculating the test statistic in two-way tables
Calculating the p-value
Which of the following is the correct p-value for this hypothesis test?
χ² = 1.3121, df = 4
[Figure: χ² curve with the tail area above 1.3121 shaded]
(d) between 0.1 and 0.05
(e) less than 0.001
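The statistic and degrees of freedom quoted above can be recomputed from the two-way table of counts. This is a minimal sketch; the p-value line uses a closed form that holds only for df = 4, and all names are chosen for illustration.

```python
import math

counts = [
    [63, 31, 25],  # 4th grade: Grades, Popular, Sports
    [88, 55, 33],  # 5th grade
    [96, 55, 32],  # 6th grade
]
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(counts):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / total  # expected count under independence
        chi_square += (obs - exp) ** 2 / exp

df = (len(counts) - 1) * (len(counts[0]) - 1)

# For df = 4 the upper tail area is exp(-x/2) * (1 + x/2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)
print(f"chi-square = {chi_square:.4f}, df = {df}, p-value = {p_value:.2f}")
```

The tail area is large, so these data do not provide convincing evidence that goals and grade level are associated.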
Conclusion