CH 11 Notes
AP Statistics Name:
Chapter 11
Activity 11: “I Didn’t Get Enough Blues!”
Materials needed: One 1.69-ounce bag of plain M&M's per student, calculator,
computer, and AP Stat textbook
DO NOT EAT any M&Ms until you have completed the experiment
* The M&M/Mars Company, headquartered in Hackettstown, New Jersey, makes
plain and peanut chocolate candies.
In 1995, they decided to replace the tan-colored M&M’s with a new color.
After conducting an extensive national preference survey, they decided to replace
the tan M&M's with blue M&M's.
* The company's Consumer Affairs Department announced the following (this is
an updated announcement, and yes, I know the percentages add up to 101%):
According to the M&M website:
On average, each 1.69-ounce package of Milk Chocolate M&M's should
contain the following percentages of M&M's:
24% blue
14% brown
16% green
20% orange
13% red
14% yellow
They explained:
While we mix the colors as thoroughly as possible, the above ratios
may vary somewhat, especially in the smaller bags. This is because
we combine the various colors in large quantities for the last
production stage (printing). The bags are then filled on high-speed
packaging machines by weight, not by count.
Purpose of this Activity
Compare the color distribution of M&M's in your individual bag with the
advertised distribution. To make the sample as random as possible, it is best if the bags of M&M's are
purchased at different stores and not obtained from one or a few sources of supply.
1. Open your bag and carefully count the number of M&M's of each color (brown,
yellow, red, orange, green, and blue), as well as the total number of M&M's in
the bag.
2. Fill in the counts, by color, and the total number of M&M's in the "Observed"
row in Table 1.
3. To obtain the expected counts, multiply the total number of M&M's in your
bag by the company's stated percentages (expressed in decimal form) for each
of the colors. Write these values in the "Expected" row in Table 1.
4. For EACH color, perform this calculation: (observed − expected)²/expected,
and enter the result in the last row of Table 1.
5. Then add up all of these calculated values in the last row of Table 1 and call
the sum χ².
Table 1: (Make a table in your notebook)
Color | Brown | Yellow | Red | Orange | Green | Blue | Total
Observed | | | | | | |
Expected | | | | | | |
(O − E)²/E | | | | | | |
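Steps 3 through 5 can be sketched in Python as below. This is a minimal illustration, not part of the handout: `STATED` holds the company's percentages from the announcement above, `mm_table` is a hypothetical helper name, and the bag counts are made-up numbers to be replaced with your own. Because the stated percentages add to 101%, the expected counts add to slightly more than the bag total.

```python
# Company's stated color distribution (from the announcement above; sums to 1.01)
STATED = {"brown": 0.14, "yellow": 0.14, "red": 0.13,
          "orange": 0.20, "green": 0.16, "blue": 0.24}


def mm_table(observed):
    """Return (expected, components, chi_sq) for one bag of M&M's."""
    total = sum(observed.values())
    # Step 3: expected count = bag total x stated proportion
    expected = {color: total * p for color, p in STATED.items()}
    # Step 4: (O - E)^2 / E for each color
    components = {color: (observed[color] - expected[color]) ** 2 / expected[color]
                  for color in STATED}
    # Step 5: the chi-square statistic is the sum of the last row
    return expected, components, sum(components.values())


# Hypothetical bag (illustrative counts only; use your own)
bag = {"brown": 7, "yellow": 8, "red": 6, "orange": 12, "green": 9, "blue": 14}
expected, components, chi_sq = mm_table(bag)
for color in STATED:
    print(f"{color:>7}: O = {bag[color]:3d}  E = {expected[color]:6.2f}  "
          f"(O-E)^2/E = {components[color]:.3f}")
print(f"chi-square = {chi_sq:.3f}")
```

Each printed line is one column of Table 1; the final line is the χ² from step 5.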
Answer the following questions in your notebook:
6. Does your sample reflect the distribution advertised by the M&M/Mars
Company?
7. Are the entries in the last row all about the same, or do any of the quantities
stand out because they are "significantly" larger?
8. Did you get more of a particular color than you expected?
9. Did you get fewer of a particular color than you expected?
10. On my teacher website, click the link for the Excel form. Answer the
questions using your table above.
Chapter 11: Inference for Tables: Chi-Square Procedures
11.1 - Chi-Square Test for Goodness of Fit
From Chapter 9 (Inference for a Proportion):
* We performed significance tests for a single proportion, for example:
o p = the proportion of blue M&M's
o H₀: p = 0.24
o Hₐ: p < 0.24
* Performing 5 more of these tests, one for each of the other M&M colors in the
bag, would be inefficient.
* Doing this also wouldn't tell us how likely it is that six sample proportions
would differ from the values stated by M&M/Mars as much as our sample does.
Chi-Square (χ²) Goodness-of-Fit Test
* Determines whether a specified population distribution seems valid.
* A single test used to see if the observed sample distribution is
significantly different from the hypothesized population distribution.
o Null hypothesis:
  - The population distribution is the same as a reference
  distribution.
o Alternative hypothesis:
  - The population distribution is different from a reference
  distribution.
o Hypotheses can be stated in words:
  - H₀: The age group distribution in 1996 is the same as the
  1980 age group distribution.
  - Hₐ: The age group distribution in 1996 is different from the
  1980 age group distribution.
o Hypotheses can be stated in notation (the proportions that make
up the distribution), for example for the M&M colors:
  - H₀: p_brown = 0.14, p_blue = 0.24, p_green = 0.16, p_red = 0.13, p_yellow = 0.14, p_orange = 0.20
  - Hₐ: At least one of the proportions differs from the stated
  values.
* Can be applied to see if the observed sample distribution is significantly
different from the hypothesized population distribution.
* The more the observed counts differ from the expected counts, the more
evidence we have to reject the null hypothesis.
* (O − E)²/E is calculated for each category of the distribution. It measures
how far the observed count is from the count we would expect if the null
hypothesis were true.
* The SUM of the (O − E)²/E values is called the chi-square statistic, χ².
* The larger the differences between the observed and expected values, the
larger the chi-square statistic χ².
Chi-Square Statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Chi-Square (χ²) Distributions
Properties of the Chi-Square Distribution
o A family of distributions that take only positive values
o Skewed to the right
o The total area under a chi-square curve is 1
* Based on degrees of freedom: df = (number of categories) − 1, which is the
same as (number of proportions) − 1.
o We are dealing with percentages, so five of the six values are free to
vary, but the sixth is not, because they must (should) all add
up to 100%.
* A chi-square curve is specified by its degrees of freedom.
o Each row in the chi-square table is a distribution based on the
degrees of freedom.
* As degrees of freedom increase:
o The density curves become less skewed.
o Larger values become more probable.
o The curve becomes more and more symmetric and more like a
Normal curve.
* Each chi-square curve:
o Begins at 0 on the horizontal axis, increases to a peak, and then
approaches the horizontal axis asymptotically from above.
o The exceptions are df = 1 and df = 2: these curves have no peak and
decrease from their highest point at the left end.
P-Value
o The area under the curve to the right of the chi-square test statistic.
o The probability of observing a value of χ² at least as extreme as
the one actually observed, assuming H₀ is true.
o The larger the value of the chi-square statistic, the smaller the
P-value.
o The smaller the P-value, the more evidence against the null
hypothesis.
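If you want the P-value without Table C, the right-tail area can be computed directly. The sketch below is one standard-library-only approach, assuming whole-number degrees of freedom: it uses the closed-form right-tail expressions for the chi-square distribution (a finite sum for even df; erfc plus a finite sum for odd df). `chi2_sf` is a hypothetical name of our own; on a TI-84 the same area is χ²cdf(χ², 1E99, df).

```python
import math


def chi2_sf(x, df):
    """Area to the right of x under the chi-square density curve with df degrees of freedom.

    Assumes df is a positive whole number, as it is for every test in this chapter.
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * [1 + (x/2) + (x/2)^2/2! + ... + (x/2)^(df/2 - 1)/(df/2 - 1)!]
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * [sqrt(x)/1 + sqrt(x)^3/(1*3) + ...]
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Same df, larger chi-square statistic -> smaller P-value
for stat in (2.0, 5.0, 10.0, 20.0):
    print(f"chi-square = {stat:4.1f}, df = 5 -> P-value = {chi2_sf(stat, 5):.4f}")
```

As a sanity check, the familiar Table C critical values (for example 3.841 with df = 1, or 9.488 with df = 4) should each give a right-tail area of about 0.05.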
Conditions for Performing a Chi-Square Test
* Random: The data come from a well-designed random sample or randomized
experiment.
* 10%: When sampling without replacement, check that n ≤ (1/10)N.
* Large Counts: All expected counts are at least 5.
1. The chi-square test statistic compares observed and expected counts. Don't
try to perform calculations with the observed and expected proportions in each
category.
2. When checking the Large Counts condition, be sure to examine the
expected counts, not the observed counts.
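Both cautions can be illustrated in a few lines. This sketch assumes the M&M percentages above and a hypothetical bag; the helper names are illustrative. The Large Counts check looks only at the expected counts, and computing χ² from proportions instead of counts divides the statistic by the sample size n, which would make the P-value far too large.

```python
# Company's stated proportions: brown, yellow, red, orange, green, blue
STATED = [0.14, 0.14, 0.13, 0.20, 0.16, 0.24]


def expected_counts(n, proportions):
    """Expected count for each category = sample size x stated proportion."""
    return [n * p for p in proportions]


def large_counts_ok(expected):
    """Large Counts condition: every EXPECTED count (not observed count) is at least 5."""
    return all(e >= 5 for e in expected)


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(large_counts_ok(expected_counts(56, STATED)))  # True: smallest is 56 * 0.13 = 7.28
print(large_counts_ok(expected_counts(30, STATED)))  # False: 30 * 0.13 = 3.9 < 5

observed = [7, 8, 6, 12, 9, 14]  # hypothetical bag of n = 56
n = sum(observed)
with_counts = chi_square(observed, expected_counts(n, STATED))
with_props = chi_square([o / n for o in observed], STATED)  # WRONG: uses proportions
print(f"counts: {with_counts:.4f}  proportions: {with_props:.4f}  counts / n: {with_counts / n:.4f}")
```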
Suppose the conditions are met. To determine whether a categorical variable has a
specified distribution in the population of interest, expressed as the proportion of
individuals falling into each possible category, perform a test of
H₀: The stated distribution of the categorical variable in the population of interest is
correct.
Hₐ: The stated distribution of the categorical variable in the population of interest is not
correct.
Start by finding the expected count for each category, assuming that H₀ is true. Then
calculate the chi-square statistic
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
where the sum is over the k different categories. The P-value is the area to the right of
χ² under the density curve of the chi-square distribution with k − 1 degrees of freedom.
Always record the test statistic, the P-value, and the degrees of freedom.
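The whole procedure (expected counts, χ², df = k − 1, P-value) is sketched below on the astrological-sign counts from Exercise 15. The helper names are hypothetical, and the P-value uses a closed-form right-tail area (valid for whole-number df) in place of a calculator's χ²cdf. The script only does the arithmetic; a complete answer still states hypotheses, checks conditions, and concludes in context. The last loop lists the largest (O − E)²/E components, a common follow-up analysis.

```python
import math


def chi2_sf(x, df):
    """Right-tail area under the chi-square curve; closed form, assumes whole-number df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def goodness_of_fit(observed, proportions):
    """Return (chi_sq, df, p_value, components) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # expected counts assuming H0 is true
    if min(expected) < 5:
        raise ValueError("Large Counts condition fails: an expected count is below 5")
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi_sq = sum(components)
    df = len(observed) - 1  # k categories -> k - 1 degrees of freedom
    return chi_sq, df, chi2_sf(chi_sq, df), components


signs = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]
counts = [321, 360, 367, 374, 383, 402, 392, 329, 331, 354, 376, 355]

# H0: all 12 signs are equally likely (each proportion is 1/12)
chi_sq, df, p, comps = goodness_of_fit(counts, [1 / 12] * 12)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p:.4f}")

# Follow-up: the categories contributing most to chi-square
for sign, c in sorted(zip(signs, comps), key=lambda t: t[1], reverse=True)[:3]:
    print(f"  {sign}: {c:.2f}")
```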
11.2: Inferences for Two-Way Tables
Statistical Methods for Multiple Comparisons
* An overall test (one that calculates a chi-square statistic) to see if there is good
evidence of any differences among the parameters of interest.
* A detailed follow-up analysis to decide which of the parameters differ
and to estimate how large the differences are.
Two-Way Tables
* Organize data about categorical variables.
* Summarize large amounts of data by grouping outcomes into categories.
* The first step in this type of chi-square test is to arrange the data in a two-way
table.
* An "r × c" table (rows × columns).
To test H₀, compare the observed counts with the expected counts.
Expected Counts
* The counts we would expect (except for random variation) if H₀ were true.
The expected count in any cell of a two-way table when H₀ is true is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
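A minimal Python sketch of the expected-count formula, using the sports-goals data from Exercise 27 (rows are goals, columns are Female and Male). The function name is our own. Because both samples contain 67 students, each expected count works out to half of its row total.

```python
def expected_counts(table):
    """Expected count for each cell: (row total x column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Exercise 27: rows = HSC-HM, HSC-LM, LSC-HM, LSC-LM; columns = Female, Male
goals = [[14, 31], [7, 18], [21, 5], [25, 13]]
for label, row in zip(["HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"], expected_counts(goals)):
    print(label, row)
```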
Chi-Square Test with Two-Way Tables
* χ² is a measure of the distance between the observed counts and the expected
counts.
* It is a distance, so it is always zero or a positive value.
* When it equals zero, the observed counts and the expected counts are
exactly equal.
* Large values of χ² are evidence against H₀ because the observed counts
are far from what we would expect if H₀ were true.
* It is an approximate method that becomes more accurate as the counts in
the cells of the table get larger.
Chi-Square Statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The sum is over all r × c cells in the table.
Chi-Square Test for Homogeneity of Populations
* Compares several population proportions.
Arrange the data in a two-way table:
Select an independent SRS from each of c (several) populations. Classify each
individual in a sample according to a categorical response variable with r
possible values. There are c different sets of proportions to be compared, one
for each population.
State Hypotheses:
H₀: The distribution of the response variable is the same in all c
populations.
Hₐ: These c distributions are not all the same. (This allows any other
relationship among the population proportions.)
Check Conditions
* Random: The data must come from independent SRSs from the populations of interest.
* 10%: Each population is at least 10 times as large as its sample.
* Large Counts: All individual expected counts are at least 5.
If H₀ is true, the chi-square statistic has approximately a chi-square
distribution with (r − 1)(c − 1) degrees of freedom.
Degrees of freedom = (r − 1)(c − 1)
The P-value for the chi-square statistic is the area to the right of χ² under the
chi-square density curve with these degrees of freedom.
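Putting the pieces together for the test for homogeneity, this sketch computes χ² over all r × c cells and df = (r − 1)(c − 1) for the Exercise 27 data. It is an illustrative sketch with a hypothetical function name. Instead of computing an exact P-value, it compares χ² with the Table C entry for df = 3 and upper-tail probability 0.001 (16.27), which is how you would bracket the P-value by hand.

```python
def chi_square_two_way(table):
    """Return (chi_sq, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected < 5:
                raise ValueError(f"Large Counts fails in cell ({i}, {j}): expected = {expected:.2f}")
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


# Exercise 27: rows = HSC-HM, HSC-LM, LSC-HM, LSC-LM; columns = Female, Male
goals = [[14, 31], [7, 18], [21, 5], [25, 13]]
chi_sq, df = chi_square_two_way(goals)
print(f"chi-square = {chi_sq:.2f}, df = {df}")

TABLE_C_DF3_P001 = 16.27  # Table C critical value: df = 3, upper-tail probability 0.001
print("P-value < 0.001" if chi_sq > TABLE_C_DF3_P001 else "P-value >= 0.001")
```

The same function applies unchanged to a test for independence; only the study design and the wording of the hypotheses differ.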
The Chi-Square Test for Independence
* A single SRS is drawn from one population.
* Observations are classified according to two categorical variables (each
variable can have two or more levels).
* Tests the null hypothesis:
o H₀: There is no relationship between the row variable and the
column variable.
OR
o H₀: There is no relationship between the two categorical variables.
* In words: "The row and column variables are not related to each other."
Set Up the Two-Way Table
* Marginal distributions: each row total and each column total.
* Calculate conditional probabilities (conditional distributions).
Descriptive Statistics
Analysis of data: describe the relationship between the categorical variables by
comparing percents (not counts).
* Compute conditional distributions.
* Graph the data to examine them visually (bar chart).
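The conditional distributions can be computed as column percents. This sketch uses a hypothetical function name and the Exercise 27 data, and prints each gender's distribution of sports goals with a crude text bar as a stand-in for the bar chart.

```python
def conditional_by_column(table, row_labels, col_labels):
    """For each column, the distribution of the row variable as proportions of that column's total."""
    col_totals = [sum(col) for col in zip(*table)]
    return {
        col: {row: table[i][j] / col_totals[j] for i, row in enumerate(row_labels)}
        for j, col in enumerate(col_labels)
    }


goal_labels = ["HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"]
gender_labels = ["Female", "Male"]
goals = [[14, 31], [7, 18], [21, 5], [25, 13]]

dist = conditional_by_column(goals, goal_labels, gender_labels)
for gender, d in dist.items():
    print(gender)
    for goal, prop in d.items():
        # Crude text "bar chart": one # per 2 percentage points
        print(f"  {goal:7s} {prop:6.1%} {'#' * round(prop * 50)}")
```

Comparing the two blocks of output (percents, not counts) is the descriptive step that comes before the chi-square test.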
The Chi-Square Test for Independence
* A single SRS is drawn from a single population.
* Tests that there is no relationship between the row variable and the
column variable.
* Assesses whether the observed association is statistically significant.
Example Hypothesis Phrasing:
H₀: Smoking and SES are independent.
Hₐ: Smoking and SES are dependent.
Example Hypothesis Phrasing:
H₀: There is no association between smoking and SES.
Hₐ: There is an association between smoking and SES.
Check Conditions
* Random: The data must come from a well-designed random sample or randomized
experiment.
* 10%: When sampling without replacement, check that the population is at least 10 times as large as the sample.
* Large Counts: All expected counts are at least 5.
Degrees of freedom = (r − 1)(c − 1)
The P-value for the chi-square statistic is the area to the right of χ² under the
chi-square density curve with these degrees of freedom.
Note: If we conclude that there is an association, this
does not show or prove causation.
15. What's your sign? The University of Chicago's General Social Survey (GSS) is the nation's most
important social science sample survey. For reasons known only to social scientists, the GSS regularly asks a
random sample of people their astrological sign. Here are the counts of responses from a recent GSS:
Sign: | Aries | Taurus | Gemini | Cancer | Leo | Virgo
Count: | 321 | 360 | 367 | 374 | 383 | 402
Sign: | Libra | Scorpio | Sagittarius | Capricorn | Aquarius | Pisces
Count: | 392 | 329 | 331 | 354 | 376 | 355
If births are spread uniformly across the year, we expect all 12 signs to be equally likely. Do these data
provide convincing evidence that all 12 signs are not equally likely? If you find a significant result, perform
a follow-up analysis.
27. Why men and women play sports Do men and women participate in sports for the same reasons? One
goal for sports participants is social comparison: the desire to win or to do better than other people.
Another is mastery: the desire to improve one's skills or to try one's best. A study on why students
participate in sports collected data from independent random samples of 67 male and 67 female
undergraduates at a large university. Each student was classified into one of four categories based on his
or her responses to a questionnaire about sports goals. The four categories were high social comparison-
high mastery (HSC-HM), high social comparison-low mastery (HSC-LM), low social comparison-high
mastery (LSC-HM), and low social comparison-low mastery (LSC-LM). One purpose of the study was to
compare the goals of male and female students. Here are the data displayed in a two-way table:
Gender
Goal | Female | Male
HSC-HM | 14 | 31
HSC-LM | 7 | 18
LSC-HM | 21 | 5
LSC-LM | 25 | 13
(a) Calculate the conditional distribution (in proportions) of the reported sports goals for each
gender.
(b) Make an appropriate graph for comparing the conditional distributions in part (a).
(c) Write a few sentences comparing the distributions of sports goals for male and female
undergraduates.
29. Why women and men play sports Refer to Exercise 27. Do the data provide convincing evidence of a
difference in the distributions of sports goals for male and female undergraduates at the university?
(a) State appropriate null and alternative hypotheses for a significance test to help answer this
question.
(b) Calculate the expected counts. Show your work.
(c) Calculate the chi-square statistic. Show your work.
31. Why women and men play sports Refer to Exercise 27 and Exercise 29.
(a) Check that the conditions for performing the chi-square test are met.
(b) Use Table C to find the P-value. Then use your calculator's χ²cdf command.
(c) Interpret the P-value from the calculator in context.
(d) What conclusion would you draw? Justify your answer.
33. Python eggs How is the hatching of water python eggs influenced by the temperature of the snake's nest?
Researchers randomly assigned newly laid eggs to one of three water temperatures: hot, neutral, or cold.
Hot duplicates the extra warmth provided by the mother python, and cold duplicates the absence of the
mother. Here are the data on the number of eggs that hatched and didn't hatch:
Water Temperature
Hatched? | Cold | Neutral | Hot
Yes | 16 | 38 | 15
No | [illegible] | 18 | 29
(a) Compare the distributions of hatching status for the three treatments.
(b) Are the differences between the three groups statistically significant? Give appropriate evidence
to support your answer.
45. Regulating guns The National Gun Policy Survey asked a random sample of adults, "Do you think there
should be a law that would ban possession of handguns except for the police and other authorized persons?"
Here are the responses, broken down by the respondent's level of education:
Education
Response | Less than high school | High school grad | Some college | College grad | Postgrad degree
Yes | 58 | 84 | 169 | 98 | 7
No | 58 | 129 | 294 | 135 | 99
Does the sample provide convincing evidence of an association between education level and opinion about
a handgun ban in the adult population?
2003 AP® STATISTICS FREE-RESPONSE QUESTIONS
5. A random sample of 200 students was selected from a large college in the United States. Each selected student
was asked to give his or her opinion about the following statement.
"The most important quality of a person who aspires to be the President
of the United States is a knowledge of foreign affairs."
Each response was recorded in one of five categories. The gender of each selected student was noted.
The data are summarized in the table below.
Response Category
Strongly Disagree | Somewhat Disagree | Neither Agree nor Disagree | Somewhat Agree | Strongly Agree
Male 10
Female 20
25, 15
Is there sufficient evidence to indicate that the response is dependent on gender? Provide statistical evidence
to support your conclusion.
2002 AP® STATISTICS FREE-RESPONSE QUESTIONS
STATISTICS
SECTION II
Part B
Question 6
Spend about 25 minutes on this part of the exam.
Percent of Section II grade: 25
Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the
correctness of your methods as well as on the accuracy of your results and explanation.
6. A survey given to a random sample of students at a university included a question about which of two well-known
comedy shows, S or F, students preferred. The students were asked the question, "Do you prefer S or F?"
The responses are shown below.
Preference | S | F | Total
Count | 185 | 139 | 324
(a) Based on the results of this survey, construct and interpret a 95% confidence interval for the proportion of
students in the population who would respond S to the question, "Do you prefer S or F?"
(b) What is the meaning of "95% confidence" in part (a)?
(c) In a follow-up survey, a separate group of randomly selected students was asked, "Do you prefer F or S?"
The responses are shown below.
Preference | S | F | Total
Count | 68 | 88 | 156
Based on these two surveys, is there evidence that the stated preference depends on the order in which the
comedy shows were listed in the survey question? Justify your answer.
(d) Suppose the test in part (c) indicates that the order in which the shows were listed does make a difference.
Is the pooled value
$$\frac{185 + 68}{324 + 156} = 0.527$$
a reasonable estimate for the proportion of students at the university
who would respond S? If so, justify your answer. If not, what would be a more reasonable estimate?
Explain why.
2008 AP® STATISTICS FREE-RESPONSE QUESTIONS
5. A study was conducted to determine where moose are found in a region containing a large burned area. A map
of the study area was partitioned into the following four habitat types.
(1) Inside the burned area, not near the edge of the burned area,
(2) Inside the burned area, near the edge,
(3) Outside the burned area, near the edge, and
(4) Outside the burned area, not near the edge.
The figure below shows these four habitat types.
Note: Figure not drawn to scale.
The proportion of total acreage in each of the habitat types was determined for the study area. Using an aerial
survey, moose locations were observed and classified into one of the four habitat types. The results are given in
the table below.
Habitat Type | Proportion of Total Acreage | Number of Moose
1 | 0.340 | 25
2 | 0.101 | [illegible]
3 | 0.104 | 30
4 | 0.455 | 40
Total | 1.000 | [illegible]
(a) The researchers who are conducting the study expect the number of moose observed in a habitat type to be
proportional to the amount of acreage of that type of habitat. Are the data consistent with this expectation?
Conduct an appropriate statistical test to support your conclusion. Assume the conditions for inference are
met.
(b) Relative to the proportion of total acreage, which habitat types did the moose seem to prefer? Explain.
© 2008 The College Board. All rights reserved.
Test 11A   AP Statistics   Name:
Part 1: Multiple Choice. Circle the letter corresponding to the best answer.
Use the following for questions 1 - 3:
A well-known chewing gum maker wants to determine if any of its four flavors of gum are more
popular than the others. A random sample of 80 people who say they chew gum regularly is asked to
identify their favorite flavor of gum. Here are the results:
Flavor | Peppermint | Cinnamon | Wintergreen | Spearmint
Frequency | 25 | 19 | 22 | 14
1. Which of the following would be an appropriate null hypothesis for the company to test?
@ A= mr
(b) The observed counts are all equal to 20.
(c) Flavor preferences for the population are evenly distributed across the four flavors.
(d) At least one of the four flavor preferences in the population is different from the other three.
(e) The observed counts are equal to the expected counts.
2. Which of the following are conditions that must be met in order to test this hypothesis using a
chi-square test?
I. If p = proportion of gum-chewers in the population, then np ≥ 10 and n(1 − p) ≥ 10.
II. All expected cell counts are at least 5.
III. The sample size is no more than 10% of the population size.
(a) I and II only
(b) II and III only
(c) I and III only
(d) [illegible] only
(e) I, II, and III
3. Which of the following represents the component of the chi-square statistic for Wintergreen?
(a) 22
7
© a
© (ae)
© BFW Publishers, The Practice of Statistics for AP®, 5/e
Use the following for questions 4 to 6:
Do male and female children respond differently to colors? A study of color association in children asked
separate random samples of male and female fourth-graders what emotion they associated with the color
red. Here are the results for each group:
Emotion
| Anger | Happiness | Love | Pain | Total
Female | 27 | 19 | 39 | 17 | 102
Male | 34 | 12 | 38 | 28 | 112
Total | 61 | 31 | 77 | 45 | 214
4, Which of the following would be the appropriate null hypothesis for this test?
(a) The distribution of emotional associations with the color red is the same for male and female
fourth-graders.
(b) Gender is dependent upon emotional association with the color red.
(c) Emotional associations with the color red are independent of gender.
(d) The number of observations in each cell is the same for each emotional association.
(e) 25% of all fourth-graders associate the color red with each of the four listed emotions.
5. Under the assumption that the null hypothesis is true, which of the following represents the expected
count for female children who associate the color red with love?
(a) 39
(b) (77)(214) / 102
(c) (77)(102) / 214
39) (102)
@ G2)0%)
7
(39)
214
(©)
©
6. The chi-square statistic for these data is χ² = 4.629. Which of the following intervals contains the P-value
for this test?
(a) 0.005 ≤ P-value ≤ 0.01
(b) 0.01 ≤ P-value ≤ 0.025
(c) 0.025 ≤ P-value ≤ 0.05
(d) 0.05 ≤ P-value ≤ 0.1
(e) P-value ≥ 0.1
Use the following for questions 7 and 8:
State traffic engineers want to characterize the types of vehicles found on four state roads. They take a
random sample of vehicles on each road over a two-week period and get the results in the table for the
number of vehicles of each type on each road. The engineers perform a chi-square test of homogeneity,
using the null hypothesis that there is no difference in the distribution of vehicle types on the four roads.
Vehicle type
Cars Light trucks/SUVs Heavy trucks/trailers
Route 9 126 a2 16
Route 47 216 31 35
Route 116 271 4L 56
Route 176 413 37 a
7. For this chi-square test, what are the correct degrees of freedom?
(a) 3 (b) 5 (c) 6 (d) 11 (e) 12
8. Below are the individual components for the chi-square statistic for this test:
Cars Light trucks/SUVs __Heavy trucks/trailers
Route 9 18 14.1 03
Route 47, 13 38 09
Route 116 06 09 10.7
Route 176 60 95 1d
Based on the original data and the components, which of the following statements is true?
(a) The observed count of heavy trucks/trailers on Route 176 is much higher than the expected count
(b) There are many more light trucks on Route 9 than we would expect if the null hypothesis were
true.
(c) The number of observed cars on Route 116 is much lower than we would expect if the null
hypothesis were true.
(d) The greatest difference between observed and expected counts is for heavy trucks/trailers on
Route 9.
(e) The chi-square statistic for this test is less than 30.
9. Which of the following statements about chi-square distributions are true?
I. A chi-square distribution with fewer than 10 degrees of freedom is roughly symmetric.
II. The more degrees of freedom a chi-square distribution has, the larger the median of the
distribution.
III. For all chi-square distributions, P(χ² ≥ 0) = 1.
(a) I only
(b) II only
(c) III only
(d) II and III
(e) All three statements are true
10. Is the accident rate for some car colors different than for other car colors? An insurance company
selects a random sample of cars that it insures and records their color (using five categories: white,
silver, black, red, or "all others") and whether or not they have been involved in an accident in the last
three years. They perform a chi-square test of association and obtain a test statistic of χ² = 8.474,
which yields a P-value of 0.0758. Using a significance level of α = 0.05, which of the following is
the appropriate conclusion for this test?
(a) Reject H₀: there is convincing evidence of an association between car color and proportion of cars
involved in accidents.
(b) Accept H₀: there is convincing evidence that car color and proportion of cars involved in accidents
are independent.
(c) Reject H₀: there is insufficient evidence to establish an association between car color and
proportion of cars involved in accidents.
(d) Fail to reject H₀: there is insufficient evidence to establish an association between car color and
proportion of cars involved in accidents.
(e) Fail to reject H₀: there is convincing evidence that car color and proportion of cars involved in
accidents are independent.
Part 2: Free Response
Show all your work. Indicate clearly the methods you use, because you will be graded on the
correctness of your methods as well as on the accuracy and completeness of your results and
explanations.
11. Big Box Electronics, a large national chain store, has one store in the city of Kingston. One factor
in deciding whether to build a second store in the city is whether the current store is serving all
residents equally well, or whether unequal proportions of residents from different parts of town are
using the store because it’s located on one side of town. The national managers of Big Box divide
Kingston into four geographical regions and determine the percentage of residents who live in each
region. Here's what they find:
Region | North | South | East | West
Percentage of population | 40% | 24% | 22% | 14%
Then the managers take a simple random sample of 250 shoppers at Kingston's Big Box store and
determine which part of town they come from by asking for their zip code when they are checking
out:
Region | North | South | East | West
Number of shoppers | 120 | 48 | 62 | 20
Is Kingston's only Big Box store used by a higher proportion of the residents in some parts of town
than others? Support your conclusion with an appropriate statistical test.
12. A few weeks before the senatorial election between incumbent Senator Smirk and his challenger,
former Governor Graff, the senator’s polling organization wants to know where he should
concentrate his campaigning. They take simple random samples of potential voters in the southern
and northern portions of the state, and ask them if they have decided who to vote for or are still
undecided. Here are the results:
Region | Decided on a candidate | Still undecided | Total
North | 116 | 60 | 176
South | 148 | 52 | 200
Total | 264 | 112 | 376
(a) Do these data provide convincing evidence that there is a difference in the distribution of voters
who have decided or are still undecided in the two regions? Use a chi-square test to support
your conclusion.
(b) The pollsters are concerned that while all 200 people in the "South" sample responded, 24
people (out of the original SRS of 200) in the "North" sample did not respond. Is it possible
that the opinions of these people would change the pollsters' conclusions? Explain.