HW 13 220 Soln
** For Exercise 69 in Chapter 12, the textbook solution contains some errors. While the setup and work are fine, the part
reproduced below is incorrect:
It should say “The P‐value is 0.003. This gives strong support to conclude that more people in Quebec support abolishing
the Senate than in Ontario. This tiny P‐value easily meets the burden of proof set by a 1% significance level (a high
burden of proof). This result is highly statistically significant. It is also economically significant: an 11 percentage point
difference in the support in Quebec (43%) versus Ontario (32%) is large. Hence, this result is significant overall.”
Required Problems:
(1) (a) The P‐value measures the strength of the evidence in favor of the research hypothesis: weak, strong, or non‐
existent depending on the sample and what H1 claims. The P‐value is the probability that sampling error could have
caused the sample to appear to be inconsistent with the null in the direction of the research hypothesis. The P‐value
depends critically on the direction of the research hypothesis.
For example, the sex of babies where H0: p = 0.512; H1: p > 0.512. The null says the proportion of males is no different
from what human biology dictates. The research hypothesis says that the proportion of males is higher than the natural
rate. For babies (parity = 0) born in ON to Chinese‐born moms, the sample of n = 12,339 babies, of which 6,429 are male,
has a sample proportion of males (P‐hat) of 0.521. This supports the research hypothesis because the proportion of males
in the sample is higher than what nature would predict. But maybe it is just sampling error (by chance, slightly more of the
12,339 babies turned out to be male)? The P‐value measures the strength of the evidence. What is the probability of
such a high proportion of males if, in fact, the only thing affecting the sex is chance (sampling error)?
Use the sampling distribution of P‐hat under the presumption that H0 is true. If H0 were true then P‐hat ~ Normal with
mean 0.512 and s.d. 0.0045 (= ((0.512)(1‐0.512)/12,339)^0.5). Standardize and find P‐value = P(Z > 2.007) = 0.022, which
is fairly strong evidence in favor of the research hypothesis: it is pretty unlikely (2.2% chance) that we could end up with
a random sample of 12,339 babies where so many were male just by chance given the natural birth rate of males.
In contrast, if the research hypothesis were two‐tailed – H1: p ≠ 0.512 – the evidence is weaker because the P‐value
reflects the chance the sample proportion differs from the null by as much as it does in either direction. The difference
between 0.521 and 0.512 is 0.009 (i.e. just shy of one percentage point). If the research hypothesis were two‐tailed the
P‐value = P(P‐hat >=0.521) + P(P‐hat <= 0.503). The chance the data would come out to be so different from the natural
birth rate (so different naturally means in either direction) is 0.044 (=0.022*2).
In further contrast, if the research hypothesis were left‐tailed – H1: p < 0.512 – then the evidence contradicts the
research hypothesis, and the P‐value is greater than 0.5 (P‐value = P(Z < 2.007) = 0.978).
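As a rough check of the numbers in (a), the three P‐values can be recomputed with Python's standard library. This is only a sketch: the function name p_value and its tail argument are made up for illustration, and it uses the unrounded sample proportion 6,429/12,339, so the last digit can differ slightly from the rounded values quoted above.

```python
from math import sqrt
from statistics import NormalDist

def p_value(successes, n, p0, tail):
    """Normal-approximation P-value for a one-sample proportion test.

    tail is "right" (H1: p > p0), "left" (H1: p < p0) or "two" (H1: p != p0).
    The name and signature are illustrative, not from the textbook.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # s.d. of P-hat if H0 is true
    z = (p_hat - p0) / se
    upper = 1 - NormalDist().cdf(z)   # P(Z > z)
    if tail == "right":
        return upper
    if tail == "left":
        return 1 - upper              # P(Z < z)
    return 2 * min(upper, 1 - upper)  # equally extreme in either direction

for tail in ("right", "two", "left"):
    print(tail, round(p_value(6429, 12339, 0.512, tail), 3))
```

The same sample gives three very different P‐values, which is the point of (a): the direction of H1 matters.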
(b) We have NO evidence for the research hypothesis: the observed proportion is LESS THAN 0.512. Hence the P‐value is
huge: P‐value = P(P‐hat > 0.5102441 | p = 0.512, n = 14,789) = P(Z > (0.5102441 – 0.512)/(0.512*(1‐0.512)/14,789)^0.5)
= P(Z > ‐0.43) = 0.665. We cannot conclude there is sex selection in favor of boys if the proportion of boys in the sample
is LESS than the biological norm.
(c) Part (b) is a perfect example. To give another example, suppose we wished to prove that a political candidate has
more than a majority share of the votes: H0: p = 0.5 vs. H1: p > 0.5. If a poll (with sample size n) reveals they have only
30% support (P‐hat = 0.30), the P‐value for H1: p > 0.5 is greater than 0.5 (we have no evidence to support the research
hypothesis). However, the P‐value for the two‐tailed test (H0: p = 0.5 vs. H1: p ≠ 0.5) is not twice the P‐value from the
original one‐tailed test (which would be a probability greater than 1!). Instead, the P‐value for the two‐tailed test is
P‐value = P(P‐hat <= 0.30 | p = 0.5, n) + P(P‐hat >= 0.70 | p = 0.5, n).
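To see why doubling fails in (c), here is a small numerical sketch. The sample size n = 100 is a hypothetical value chosen only for illustration (the solution leaves n unspecified), and the helper name tail_probs is likewise made up.

```python
from math import sqrt
from statistics import NormalDist

def tail_probs(p_hat, p0, n):
    """Return (P(P-hat <= p_hat), P(P-hat >= p_hat)) if H0: p = p0 is true."""
    se = sqrt(p0 * (1 - p0) / n)
    lower = NormalDist(mu=p0, sigma=se).cdf(p_hat)
    return lower, 1 - lower

n = 100  # hypothetical sample size, not given in the solution
lower_030, upper_030 = tail_probs(0.30, 0.5, n)
right_tailed = upper_030                      # P-value for H1: p > 0.5
doubled = 2 * right_tailed                    # naive doubling, exceeds 1
# Correct two-tailed P-value: both tails at least as far from 0.5 as 0.30 is.
two_tailed = lower_030 + tail_probs(0.70, 0.5, n)[1]

print(right_tailed, doubled, two_tailed)
```

The right‐tailed P‐value is close to 1, so doubling it is not a probability, while the correct two‐tailed P‐value adds the two small tails beyond 0.30 and 0.70.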
(2) Only the rejection region and not the test statistic.
(3)(a) 𝐻 : 𝑝 𝑝 0 versus 𝐻 : 𝑝 𝑝 0
(b) 𝐻 : 𝑝 𝑝 0 versus 𝐻 : 𝑝 𝑝 0
(c) 𝐻 : 𝑝 𝑝 0 versus 𝐻 : 𝑝 𝑝 0
(d) It means the sample response rate for those with a match is lower than the sample response rate of those with no
match (in other words, a direct contradiction of the research hypothesis).
(e) The rejection region would be getting a z test statistic greater than 1.645.
(f) The rejection region would be getting a z test statistic less than ‐1.28.
(g) The rejection region would be getting a z test statistic less than ‐2.576 OR a z test statistic greater than 2.576.
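The cut‐offs in (e) to (g) can be reproduced from the standard Normal quantile function. In this sketch the significance levels (5% right‐tailed, 10% left‐tailed, 1% two‐tailed) are inferred from the critical values above rather than quoted from the question, and the function name critical_values is illustrative.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Standardized critical value(s) for a z test.

    tail is "right", "left" or "two"; a tuple is returned in every case.
    """
    quantile = NormalDist().inv_cdf
    if tail == "right":
        return (quantile(1 - alpha),)
    if tail == "left":
        return (quantile(alpha),)
    return (quantile(alpha / 2), quantile(1 - alpha / 2))

# Assumed significance levels, inferred from the critical values in (e)-(g).
for alpha, tail in ((0.05, "right"), (0.10, "left"), (0.01, "two")):
    print(alpha, tail, [round(c, 3) for c in critical_values(alpha, tail)])
```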
(4) (a) This result is highly statistically significant but is unlikely to be economically significant. Who cares if the
redemption rate for coupons is 0.1 percentage points higher than the claim (15.1 versus 15 percent redeemed)? This
tiny difference is unlikely to affect decisions. You may wonder how such a tiny difference can be statistically different
from zero. The trick is the huge sample size (3 million!). With huge samples, even tiny differences will be statistically
significant, but that does not mean that they are significant. Significant means both statistically significant and
economically significant.
(b) This result is highly economically significant but is not statistically significant. A male birth rate that is more than 10
percentage points higher than the natural rate would point to extreme sex selection in favor of male infants: a
demographic catastrophe. You may wonder how such a huge difference is not statistically different from zero. The trick
is the sample size (which the question did not disclose) must have been tiny. For example, suppose a family had 5 kids
and 3 were male, which is 60 percent male. Obviously, we have no basis to conclude that that family engaged in sex
selection. With a small sample size (𝑛 5), we could easily observe 60 percent male (or even 80 percent or 100
percent). We would not be able to reject the null and conclude there is sex selection.
(c) This result is significant: both statistically significant and economically significant. More than 2 percentage point
higher male birth rate would concern policy makers: that will be a lot of missing girls in a country’s population. (In other
contexts, 2 percentage points may not be economically significant.)
(d) This result is significant: both economically and statistically significant. (For example, Group 1 is Canadian born adults
of working age and Group 2 is non‐Canadian born adults of working age.)
(e) This result is statistically significant (at a 5% level) but is not economically significant. Who cares about 22 calories of
difference? A typical adult in Canada consumes about 2,000 calories per day: 22 is a mere 1 percent. That seems small
given the considerable costs of redoing menus to provide calorie information.
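The role of sample size in (4)(a) and (4)(b) can be illustrated numerically. This is a sketch under stated assumptions: the coupon test is taken to be right‐tailed, and the family example borrows the natural male birth rate of 0.512 from Problem (1); neither detail is restated in the solution to (4).

```python
from math import comb, sqrt
from statistics import NormalDist

# (4)(a): 15.1% redeemed versus a claimed 15%, with n = 3 million.
# Assumption: right-tailed test, H1: p > 0.15.
p0, p_hat, n = 0.15, 0.151, 3_000_000
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_coupons = 1 - NormalDist().cdf(z)

# (4)(b): a family with 5 kids, 3 of them male. n is far too small for the
# Normal approximation, so use the exact Binomial: P(X >= 3 | n = 5, p = 0.512).
p_boy = 0.512  # natural rate, borrowed from Problem (1)
p_value_family = sum(
    comb(5, k) * p_boy**k * (1 - p_boy) ** (5 - k) for k in range(3, 6)
)

print(z, p_value_coupons, p_value_family)
```

A 0.1 percentage point gap produces a tiny P‐value when n is 3 million, while an 8.8 percentage point gap (60 percent versus 51.2 percent) produces a P‐value above one half when n is 5.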
(5) FALSE. The idea that there is a trade‐off between the P‐value and Type II error is a common misconception. There is a
trade‐off between Type I and Type II error but that refers to the significance level (α, the predetermined threshold for
the maximum Type I error we are willing to tolerate) and not the P‐value (measure of the strength of the evidence). It is
true that the P‐value is the probability of making a Type I error if the null were true. However, it does NOT follow that a
smaller P‐value means a larger Type II error. First, the trade‐off refers to how big the burden of proof is: the choice of a
significance level (α). If you choose a large significance level, such as α = 0.10, then you would raise the chance of
sending an innocent person to jail (Type I error) and you would decrease the chance of letting a guilty person go free
(Type II error). If you choose a smaller significance level, such as α = 0.01 (closer to the “beyond a reasonable doubt”
standard), then you would lower the chance of sending an innocent person to jail (Type I error) and you would increase
the chance of letting a guilty person go free (Type II error). BUT, while you choose the significance level, you do NOT
choose the P‐value. Hence there is a difference between the significance level, which is the burden of proof, and the P‐
value, which is the proof/evidence you have. We would never say that because we have a huge amount of evidence of a
defendant’s guilt (very small P‐value) that there is an increased chance of letting a guilty person go free. We could say
that if we generally REQUIRE a huge amount of evidence to convict people (a very small significance level) then we will
increase the chance of letting guilty people go free. Another way to think about this is that there are many factors that
affect BOTH the P‐value and the chance of making a Type II error: sample size, the standard deviation of the sample
proportion, whether the research hypothesis is one or two directional, and the value specified in the null hypothesis.
Changing any of these underlying factors will change both the P‐value and the probability of making a Type II error.
Hence, we cannot say there is a causal relationship between the P‐value and the probability of making a Type II error,
which means that we cannot say that changing one would result in a change in the other.
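The trade‐off that does exist, between the significance level and the Type II error, can be made concrete. The sketch below reuses the setting of Problem (6)(a) (H0: p = 0.20, true p = 0.24, n = 400) purely as an illustration; that choice and the function name type_ii_error are assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, p0, p_true, n):
    """Beta for a right-tailed proportion test (Normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    critical = p0 + z_crit * sqrt(p0 * (1 - p0) / n)  # reject if P-hat > critical
    se_true = sqrt(p_true * (1 - p_true) / n)
    return NormalDist(mu=p_true, sigma=se_true).cdf(critical)

# Lowering alpha (a heavier burden of proof) raises beta, all else equal.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_error(alpha, 0.20, 0.24, 400), 3))
```

Note that the P‐value never enters this calculation: beta is fixed by the chosen significance level, the sample size, and the true p before any data are collected.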
(6) (a) Find the critical value of the hypothesis test.
P(Z > 1.645) = 0.05
P(P‐hat > 1.645*(0.20*(1‐0.20)/400)^0.5 + 0.20 | H0 is true, n = 400) = 0.05
P(P‐hat > 0.233 | H0 is true, n = 400) = 0.05
Hence the critical value (unstandardized) is p* = 0.233. The rejection region is P‐hat > 0.233: if the test statistic, P‐hat,
lies in this region then we (correctly) reject the false null hypothesis. If the test statistic, P‐hat, lies outside this region
then we (incorrectly) fail to reject the false null hypothesis: make a Type II error. Find the probability of a Type II error.
β = P(P‐hat < 0.233 | p = 0.24, n = 400)
β = P((P‐hat – 0.24)/(0.24*(1‐0.24)/400)^0.5 < (0.233 – 0.24)/(0.24*(1‐0.24)/400)^0.5)
β = P(Z < (0.233 – 0.24)/(0.24*(1‐0.24)/400)^0.5)
β = P(Z < ‐0.33) = 0.37
[Figure: two panels showing the sampling distribution of P‐hat with the critical value 0.233 marked; panel titles: alpha = .05 and beta = .37]
Hence the power of the statistical test is 0.63 (=1 – 0.37). This means that if we randomly sample 400 people and ask if
they recall the product there is a 63% chance that the statistical test will lead us to conclude correctly that at least 20%
recall the product if in fact 24% of the population do. On the dark side, there is a 37% chance that we will fail to find
sufficient evidence to support the research hypothesis even though it is in fact true.
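The two steps in (a) can be retraced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and it keeps the unrounded critical value, so intermediate digits may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

p0, p_true, n, alpha = 0.20, 0.24, 400, 0.05

# Step 1: unstandardized critical value, using the s.d. of P-hat under H0.
z_crit = NormalDist().inv_cdf(1 - alpha)              # about 1.645
p_star = p0 + z_crit * sqrt(p0 * (1 - p0) / n)        # about 0.233

# Step 2: chance that P-hat falls below p_star when the true p is 0.24.
se_true = sqrt(p_true * (1 - p_true) / n)
beta = NormalDist().cdf((p_star - p_true) / se_true)  # about 0.37
power = 1 - beta                                      # about 0.63

print(round(p_star, 3), round(beta, 2), round(power, 2))
```

Note that the standard deviation changes between the steps: Step 1 uses p = 0.20 (the null), while Step 2 uses p = 0.24 (the assumed truth).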
(b)
[Figure: two panels showing the sampling distribution of P‐hat with the critical value 0.233 marked; panel titles: alpha = .05 and beta = .108]
(c)
[Figure: two panels showing the sampling distribution of P‐hat with the critical value 0.233 marked; panel titles: alpha = .05 and beta = .0179]
(d) The power increases from (a) to (c) because as the effectiveness of the ad improves – as a higher and higher fraction
of the population recall the product – the chance that our random sample will contain a high fraction recalling the ad
improves. As the sample proportion increases, our ability to reject the false null hypothesis (which says the fraction is
small) improves.
(e) If half of the population recalls the ad and we sample 400 people we will almost surely obtain a high sample
proportion that will fall deep into the rejection region and allow us to reject the false null hypothesis that the proportion
is only 0.2. With the table you’d approximate the power as 1: the probability of a Type II error in this case is virtually zero
(out past the 25th decimal place!).
(f) If the proportion of people recalling the ad in the population is only slightly better than the null hypothesis – 20.5%
versus 20% ‐‐ it is very likely that the sample statistic will be small and provide insufficient proof of the research
hypothesis. Power is only 0.083: there is a low chance that we will obtain a sample that will allow us to infer the
research hypothesis is true (even though it is in fact true!).
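The claims in (d) to (f) about how power moves with the true proportion can be checked with a short sketch. The function names lower_tail and power are illustrative; erfc is used instead of a plain CDF call so that the extremely small tail probability in (e) is not rounded to exactly zero.

```python
from math import erfc, sqrt
from statistics import NormalDist

def lower_tail(z):
    """P(Z < z), computed with erfc so tiny tails do not round to zero."""
    return 0.5 * erfc(-z / sqrt(2))

def power(p_true, p0=0.20, n=400, alpha=0.05):
    """Return (power, beta) for the right-tailed test of H0: p = p0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    critical = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    beta = lower_tail((critical - p_true) / se_true)
    return 1 - beta, beta

for p_true in (0.205, 0.24, 0.50):
    pw, beta = power(p_true)
    print(f"true p = {p_true}: power = {pw:.3f}, beta = {beta:.3g}")
```

Power is lowest when the true p barely exceeds 0.20 and approaches 1 as the true p moves far into the rejection region.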
(g)
[Figure: two panels showing the sampling distribution of P‐hat with the critical value 0.226 marked; panel titles: alpha = .1 and beta = .0585]
(h)
[Figure: two panels showing the sampling distribution of P‐hat with the critical value 0.227 marked; panel titles: alpha = .05 and beta = .0321]
(7) Find the rejection region for this right‐tailed hypothesis test. The standardized rejection region is P(Z > 1.645) = 0.05
and the un‐standardized region is:
P(P‐hat > 1.645*(0.6*(1‐0.6)/100)^0.5 + 0.60 | H0, n = 100) = 0.05
P(P‐hat > 0.6806) = 0.05
Find probability of Type II error (β) if p = 0.58:
P(P‐hat < 0.6806 | p = 0.58, n = 100) = P(Z < (0.6806 – 0.58)/(0.58*(1‐0.58)/100)^0.5) = P(Z < 2.0383) = 0.9792
Find probability of Type II error (β) if p = 0.60: (Note: This is weird because the null is NOT false. To understand it, think
about a value very close to 0.60, such as 0.6001.)
P(P‐hat < 0.6806 | p = 0.60, n = 100) = P(Z < (0.6806 – 0.60)/(0.60*(1‐0.60)/100)^0.5) = P(Z < 1.645) = 0.95
Find probability of Type II error (β) if p = 0.62:
P(P‐hat < 0.6806 | p = 0.62, n = 100) = P(Z < (0.6806 – 0.62)/(0.62*(1‐0.62)/100)^0.5) = P(Z < 1.2485) = 0.8941
Find probability of Type II error (β) if p = 0.64:
P(P‐hat < 0.6806 | p = 0.64, n = 100) = P(Z < (0.6806 – 0.64)/(0.64*(1‐0.64)/100)^0.5) = P(Z < 0.8458) = 0.8012
Find probability of Type II error (β) if p = 0.66:
P(P‐hat < 0.6806 | p = 0.66, n = 100) = P(Z < (0.6806 – 0.66)/(0.66*(1‐0.66)/100)^0.5) = P(Z < 0.4349) = 0.6682
Find probability of Type II error (β) if p = 0.70:
P(P‐hat < 0.6806 | p = 0.70, n = 100) = P(Z < (0.6806 – 0.70)/(0.70*(1‐0.70)/100)^0.5) = P(Z < ‐0.4233) = 0.3360
Find probability of Type II error (β) if p = 0.72:
P(P‐hat < 0.6806 | p = 0.72, n = 100) = P(Z < (0.6806 – 0.72)/(0.72*(1‐0.72)/100)^0.5) = P(Z < ‐0.8775) = 0.1901
Find probability of Type II error (β) if p = 0.74:
P(P‐hat < 0.6806 | p = 0.74, n = 100) = P(Z < (0.6806 – 0.74)/(0.74*(1‐0.74)/100)^0.5) = P(Z < ‐1.3542) = 0.0878
Find probability of Type II error (β) if p = 0.76:
P(P‐hat < 0.6806 | p = 0.76, n = 100) = P(Z < (0.6806 – 0.76)/(0.76*(1‐0.76)/100)^0.5) = P(Z < ‐1.8591) = 0.0315
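The Type II error probabilities above all follow the same recipe, so they can be tabulated in one loop. The sketch below assumes the rounded critical value 0.6806 and n = 100 from this problem; the function name beta is illustrative.

```python
from math import sqrt
from statistics import NormalDist

N, CRITICAL = 100, 0.6806   # sample size and rounded unstandardized cut-off

def beta(p_true):
    """P(P-hat < CRITICAL) when the population proportion is p_true."""
    se = sqrt(p_true * (1 - p_true) / N)
    return NormalDist().cdf((CRITICAL - p_true) / se)

true_ps = [0.58, 0.60, 0.62, 0.64, 0.66, 0.70, 0.72, 0.74, 0.76]
for p in true_ps:
    print(f"p = {p:.2f}  beta = {beta(p):.4f}  power = {1 - beta(p):.4f}")
```

Plotting beta (or 1 minus beta) against these values of p gives the curve described next.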
Next, use these calculations to graph the power curve:
[Figure: "Probability of Type II error depends on true p"; vertical axis: Beta, from 0 to 1; horizontal axis: p, from .6 to .75]