AP Stats Study Guide 1 1 1
r^2 - About (r^2)% of the variability in (y with units) is accounted for by the least-squares
regression line with x = (x with units)
Standard deviation of residuals: (s) - The actual [y] is typically s away from the value
predicted by the LSRL
P Value:
Assuming null hypothesis is true, there is a (P-value)% probability of getting a sample
mean as (large or larger/small or smaller/at or more extreme) just by chance
Confidence Interval: We are (C)% confident that the interval (lower, upper) contains the true
population parameter (in context).
Confidence Level: If we take many samples of [context] and calculate an interval from each,
then about C% (for example, 95%) of the intervals will contain the true population parameter.
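A quick simulation can make the confidence-level interpretation concrete. This is only a sketch: the population (Normal, mean 50, SD 10), the sample size of 25, and the 2000 repetitions are made-up values, and it uses a z-interval with known sigma to keep things simple.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(mu=50, sigma=10, n=25, reps=2000, level=0.95, seed=1):
    """Fraction of simulated z-intervals that capture the true mean mu."""
    rng = random.Random(seed)                        # seeded so the run is repeatable
    z_star = NormalDist().inv_cdf((1 + level) / 2)   # about 1.96 for 95%
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:   # did this interval capture mu?
            hits += 1
    return hits / reps

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95, but any single interval either contains the parameter or it does not.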
Standard error: If we take many, many samples and calculate the slope of each, the sample
slope will typically be about [standard error] from the true population slope
Acronyms:
(Ch. 1) Describe Distribution in a HISTOGRAM: CUSS -
Center (median)
Unusual Points (outliers by the 1.5*IQR rule)
Spread (most points lie between range of values, or say IQR)
Shape (uniform, bimodal, skewed left or right, symmetrical)
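The 1.5*IQR rule from the "Unusual Points" step can be sketched in a few lines. Assumptions: the data set is made up, and the quartiles are found as medians of the lower and upper halves (the convention usually taught in AP Stats; other software may use a slightly different quartile method).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # Skip the middle value when n is odd so it is in neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def outliers_by_iqr(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [2, 4, 5, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(quartiles(scores))
print(outliers_by_iqr(scores))
```

Here Q1 = 4.5 and Q3 = 8.5, so the fences are -1.5 and 14.5 and only 30 is flagged.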
Describe association in a SCATTERPLOT: DUFS
Direction,
Unusual Features,
Form,
Strength.
Binomial: BINS - Binary, Independent, Number of trials fixed, Same probability for all trials
Geometric: BIS - Binary, Independent, Same probability for all trials (no fixed number of trials).
Inferences:
Verify Conditions: SIN -
Random sample/SRS,
Independent: sample size n < 10% of population N.
Normal: np >= 10 and n(1-p) >= 10
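The SIN checks for a one-proportion procedure can be written out as a small checklist. This is an illustrative sketch: the function name and the example numbers are hypothetical, and it uses the usual "at least 10" cutoff for the large counts check.

```python
def check_proportion_conditions(n, p, population_size, is_random):
    """Return the three SIN checks for one-proportion inference."""
    return {
        "S: random sample": is_random,
        "I: n < 10% of N": n < 0.10 * population_size,
        "N: large counts": n * p >= 10 and n * (1 - p) >= 10,
    }

# Hypothetical examples: one that passes and one that fails two checks.
ok_case = check_proportion_conditions(n=100, p=0.6, population_size=5000, is_random=True)
bad_case = check_proportion_conditions(n=20, p=0.1, population_size=150, is_random=True)

for name, passed in bad_case.items():
    print(name, "->", "met" if passed else "NOT met")
```

On the exam the numbers still have to be shown (for example, 100(0.6) = 60 and 100(0.4) = 40, both at least 10); stating that the conditions are met is not enough.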
---------------------------------------------
There are more males than females because the bottom right box is bigger. However, females
were offered admission more.
Measures of Center and Spread:
Resistant: Median, IQR, quartiles
Non-resistant: Mean, Standard Deviation, Variance, Range
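Correlation is not affected by adding a constant or by multiplying by a positive constant
(a change of units). The standard error (standard deviation of a statistic) is not affected
by adding a constant, but multiplying the data by a constant rescales it.
If the mean decreases because of a transformation, the standard deviation either stays the
same (a constant was subtracted) or decreases (values were multiplied by a constant between 0 and 1).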
Transformations:
Changed by both Addition and Multiplication (+, x): Mean, Median, Mode
Changed by Multiplication (x) only: Standard Deviation, IQR, Range
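A small check of the transformation rules, using a made-up data set. Adding a constant shifts the measures of center but leaves the measures of spread alone; multiplying by a constant rescales both.

```python
from statistics import mean, median, stdev

def value_range(xs):
    """Range = max - min."""
    return max(xs) - min(xs)

data = [3, 5, 8, 10, 14]              # hypothetical data
shifted = [x + 10 for x in data]      # add 10 to every value
scaled = [x * 2 for x in data]        # multiply every value by 2

for label, xs in [("original", data), ("+10", shifted), ("x2", scaled)]:
    print(f"{label:>8}: mean={mean(xs)}, median={median(xs)}, "
          f"SD={stdev(xs):.3f}, range={value_range(xs)}")
```

The "+10" row shows the mean and median going up by 10 with the SD and range unchanged; the "x2" row shows all four doubled.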
Standard Deviation.
Average distance of values from the mean
High SD = flatter b/c values are farther away from mean
Low SD = skinnier and higher in center bc values are closer
For the smallest standard deviation number set, any set of four identical numbers will have
sx = 0, so there are multiple correct answer choices.
For the largest standard deviation number set, there is only one possible answer: the
largest standard deviation comes from putting two values at each extreme.
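The smallest/largest SD claim can be checked by brute force. Assumption for this sketch: the four numbers are whole numbers from 0 to 10 with repeats allowed (the range is chosen only for illustration).

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every way to choose four whole numbers from 0..10, repeats allowed.
candidates = list(combinations_with_replacement(range(11), 4))

smallest = [s for s in candidates if stdev(s) == 0]   # sets with sx = 0
largest = max(candidates, key=stdev)                  # set with the biggest sx

print("Number of sets with sx = 0:", len(smallest))
print("Set with the largest sx:", largest, "sx =", round(stdev(largest), 3))
```

Many sets tie for the smallest SD (all four values equal), but only one set gives the largest: two values at the minimum and two at the maximum.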
Correlation:
Unit changes: r is NOT affected by changing units (adding/subtracting a constant, or
multiplying/dividing by a positive constant). r is NOT resistant to outliers.
Not affected by which variable is x or y, x or y does not need to be defined
Correlation does NOT equal Causation, ONLY experiments can imply causation
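The unit-change and x/y-swap properties of r can be verified numerically. The heights and weights below are invented for illustration, and r is computed straight from its definition rather than with a library call.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [60, 62, 65, 68, 70]            # hypothetical heights (inches)
weights_lb = [110, 120, 140, 155, 160]       # hypothetical weights (pounds)
heights_cm = [h * 2.54 for h in heights_in]  # change of units

r_inches = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)        # same r after converting units
r_swapped = correlation(weights_lb, heights_in)   # same r with x and y swapped

print(round(r_inches, 4), round(r_cm, 4), round(r_swapped, 4))
```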
R^2
A clear pattern (such as a curve) in the residual plot = a linear model is not appropriate
Random scatter in the residual plot = a linear model is appropriate
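Blocking: Split 100 people into blocks of 50 males and 50 females, then randomly assign
treatments within each block.
Benefit: We account for the variability due to the blocking variable, so it cannot be
confounded with the treatments and differences between treatments are easier to see.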
Matched Pairs:
● 30 stores. Pair the 2 stores closest in location. Label one store as 1 and the other as 2.
Flip a coin: if the coin lands on heads, store 1 gets Treatment A and store 2 gets
Treatment B. If the coin lands on tails, store 1 gets Treatment B and store 2 gets
Treatment A. Compare the two treatments within the pair. Repeat for the remaining
pairs.
● 30 people.
For each pair of twins, label one person as twin A and label the other person as
twin B. For each pair of twins, toss a coin. If the coin lands on heads, twin A gets
the placebo and twin B gets the active drug. If the coin lands on tails, twin A gets
the active drug and twin B gets the placebo.
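The coin-flip assignment in a matched pairs design can be mimicked with a random number generator. This is a sketch only: the store labels are placeholders, and a random draw below 0.5 stands in for "heads".

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """For each pair, a coin flip decides which member gets Treatment A."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:                 # "heads"
            plan.append({first: "A", second: "B"})
        else:                                  # "tails"
            plan.append({first: "B", second: "A"})
    return plan

# Hypothetical pairs of nearby stores.
store_pairs = [("store 1", "store 2"), ("store 3", "store 4"), ("store 5", "store 6")]
plan = assign_matched_pairs(store_pairs, seed=7)
for pair_assignment in plan:
    print(pair_assignment)
```

Each pair always ends up with one A and one B; only the order is random, which is the point of the design.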
Double Blinding: Neither the subjects nor the people administering the treatment are aware of
who gets what treatment. Only the data collectors know.
Clustering
Stratification
More accurate (+), More expensive (-)
Bias:
A sampling method is biased if it produces estimates that are consistently smaller or
larger than the true value in the population.
Bias methods:
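Voluntary response survey - people choose whether to respond, which can lead to
non-response/undercoverage
Confounding: When another (lurking) variable is associated with the explanatory variable and
also affects the response variable, so their effects cannot be separated.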
Ch. 5 - Probability
When it says “Is it unusual that…”, find the probability of that result or one more extreme
(for example, that value or higher).
Don't just say "sampling variability"; always provide a probability.
sqrt(25^2 + 15^2) = 29.155
P(certificate of merit AND hrs worked < 90) / P(hrs worked < 90)
P(hrs worked < 90) = normalcdf(-10^99, 90, 80, 7) = 0.923
P(85.89 < X < 90) / P(X < 90) = 0.123 / 0.923 = 0.133
Answer: C
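The conditional probability above can be double-checked without a calculator. This sketch assumes, as the normalcdf arguments suggest, that hours worked are Normal with mean 80 and SD 7; `NormalDist.cdf` plays the role of normalcdf with a lower bound of -10^99.

```python
from statistics import NormalDist

hours = NormalDist(mu=80, sigma=7)        # model read off the normalcdf call

p_under_90 = hours.cdf(90)                        # P(X < 90)
p_between = hours.cdf(90) - hours.cdf(85.89)      # P(85.89 < X < 90)
p_conditional = p_between / p_under_90            # P(85.89 < X < 90 | X < 90)

print(f"P(X < 90)         = {p_under_90:.3f}")
print(f"P(85.89 < X < 90) = {p_between:.3f}")
print(f"conditional       = {p_conditional:.3f}")
```

Carrying full precision gives a value that can differ from 0.133 in the third decimal place, because the notes round 0.123 and 0.923 before dividing.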
Relationships
↑ n -> ↓ confidence interval width and margin of error
If you want to be more confident of capturing the true value (higher confidence level), the
interval becomes wider.
if the confidence level is the same, risk of being incorrect is the same
ME only accounts for sampling variability by chance NOT BIAS
MCQ Practice:
“Minimum sample size for margin of error”
z* · sqrt(0.60(1 − 0.60) / 900) ≤ 0.027
-> z* · 0.0163 ≤ 0.027 -> z* ≤ 1.653
normalcdf(-10^99, 1.653, 0, 1) = 0.95 (area to the left of z*, so the central area is
about 0.90)
Answer: C
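The arithmetic in this margin-of-error problem can be reproduced step by step. The values 0.60, 900, and 0.027 come from the work above; the variable names are just labels for this sketch.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.60      # sample proportion
n = 900           # sample size
max_me = 0.027    # largest margin of error allowed

se = sqrt(p_hat * (1 - p_hat) / n)        # standard error of p-hat
z_max = max_me / se                       # largest z* that keeps ME <= 0.027
left_area = NormalDist().cdf(z_max)       # same as normalcdf(-10^99, z_max, 0, 1)
central_area = 2 * left_area - 1          # area between -z* and +z*

print(f"SE = {se:.4f}, z* <= {z_max:.3f}")
print(f"area left of z* = {left_area:.3f}, central area = {central_area:.3f}")
```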
Relationships:
A two sided test only allows us to reject (or fail to reject) a hypothesized value for a
particular population parameter.
You cannot find power from only the probability of a Type I error, or vice versa.
T test -
● t-distributions are symmetric
● they are lower at the mean and higher at the tails and so are more spread out
than the normal distribution.
● The greater the df, the closer the t-distributions are to the normal distribution.
● The 68-95-99.7 Rule applies to the z-distribution and will work for t-models with
very large df.
● All probability density curves have an area of 1 below them.
Conditions (Normal/Large Sample): population normally distributed, or n >= 30, or sample data
roughly symmetrical with no outliers
Pooling:
2 Proportion Z Intervals DO NOT use pooling
2 Proportion Z Tests DO use pooling
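The difference between the pooled and unpooled standard errors is easiest to see side by side. The counts below are made up, and the function names are only labels for this sketch.

```python
from math import sqrt

def unpooled_se(x1, n1, x2, n2):
    """SE for a 2-proportion z INTERVAL: each sample keeps its own p-hat."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def pooled_se(x1, n1, x2, n2):
    """SE for a 2-proportion z TEST: H0 says p1 = p2, so combine the samples."""
    p_pooled = (x1 + x2) / (n1 + n2)
    return sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Hypothetical data: 45 of 100 successes vs. 60 of 120 successes.
se_interval = unpooled_se(45, 100, 60, 120)
se_test = pooled_se(45, 100, 60, 120)
print(f"unpooled SE = {se_interval:.4f}, pooled SE = {se_test:.4f}")
```

The test pools because the null hypothesis assumes one common proportion; the interval makes no such assumption, so it does not pool.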
11.1)
Goodness of Fit Test:
Ho: The distribution of [context] is uniform in population
Ha: The distribution of [context] is NOT uniform in population
11.2)
Homogeneity Test:
Ho: The distribution of [context] is the same for all populations
Ha: The distribution of [context] is NOT the same for all populations
Independence Test:
Ho: There is not an association between [context] and [context] in population
Ha: There is an association between [context] and [context] in population
Do: Put observed values in a matrix (2nd x^-1), then Stat -> Tests -> x2-Test.
Calculator programs(ti-84):
Goodness of fit test: x2GOFTest (2nd + Vars -> x2GOFTest)
x2GOFTest(L1,L2)
Homogeneity: x2-Test( Stat -> Tests - > x2-Test )
x2-Test {A,B}
Independence: x2-Test( Stat -> Tests - > x2-Test )
x2-Test {A,B}
(Enter the observed values into matrix A and the x2-Test will do the rest of your calculations. The
calculator fills matrix B with the expected counts when the test finishes.)
Clear matrix - 2nd + -> Mem Management -> Matrix -> Del A or B
Relationships:
The more degrees of freedom a chi-square distribution has, the HIGHER the mean will be.
Chi-square distribution with greater than 10 degrees of freedom is roughly symmetric.
When a question asks for the expected count of a cell, do (row total)(column total) / (grand total)
When a question asks for the contribution to the x^2 statistic, just do (O-E)^2 / E
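Both formulas, the expected count and the (O-E)^2 / E contribution, can be applied to a whole two-way table at once. The 2x2 table of observed counts is invented for illustration.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_contributions(observed):
    """Contribution of each cell to the chi-square statistic: (O - E)^2 / E."""
    expected = expected_counts(observed)
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]

observed = [[20, 30],
            [30, 20]]                      # hypothetical observed counts

expected = expected_counts(observed)
contributions = chi_square_contributions(observed)
chi_square = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1)(columns - 1)

print(expected, contributions, chi_square, df)
```

Every expected count here is 25, each cell contributes 1, and the statistic is 4 with df = 1.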
Ch 12 - Slope
12.1
The formula for the confidence interval for the slope is b1 ± t* · (standard error of the slope)
df = n-2
12.2
To undo a log transformation, raise 10 to the power of both sides: 10^log(x) = x
Standard error = 16.258. It will always be in the SE Coef column, in the same row as the slope.
b = -145.569
df = n-2, df = 23.
invT(0.975, 23) = 2.069
-145.57 ± 2.069 · 16.258 -> Answer is C
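Finishing the arithmetic for this interval, as a sketch. The slope, standard error, and t* are the values listed above; t* is typed in from invT(0.975, 23) because Python's standard library has no inverse t function.

```python
b1 = -145.569      # slope from the computer output
se_b1 = 16.258     # SE Coef for the slope
df = 23            # n - 2
t_star = 2.069     # invT(0.975, 23), taken from the calculator

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"margin of error = {margin:.3f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

The interval runs from about -179.21 to -111.93. It is entirely below 0, which fits a negative linear relationship.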
AP Tips:
1. When asked to describe a one-variable data set, always discuss shape, center, and
spread in context. That means your answer should mention the variable and include
units.
2. If you are asked to compare distributions, use phrases such as greater than, less
than, and the same as. And, again, always answer in context.
3. Understand how skewness can be used to differentiate between the mean and the
median.
7. Be able to use a residual plot to help determine if a linear model for a data set is
appropriate. Be able to explain your reasoning.
8. Recognize that the correlation coefficient (r) measures the strength and direction of a
relation we have reason to believe is linear. The correlation coefficient does NOT tell us
that the linear model is an appropriate model.
12. Know the definition of, and reasons for, choosing to do a stratified random sample
instead of a simple random sample.
14. Explain the difference between the purposes of randomization and blocking.
17. Be clear on the distinction between independent events and mutually exclusive
events (and why mutually exclusive events can’t be independent).
18. Be able to find the mean and standard deviation of a discrete random variable.
20. Never forget that hypotheses are always about parameters, never about statistics.
21. Any hypothesis testing procedure involves four steps. Know what they are and that
they must always be there. And never forget that your conclusion in context (Step 4)
must be linked to your calculations (Step 3) in some way.
22. When doing inference problems, remember that you must show that the conditions
for the inference procedure are present. It is not sufficient to simply declare them
present. Realize that you are often not instructed to check the conditions in the question
but you must do so anyway.
23. Be clear on the concepts of Type I and Type II errors and the power of a test.
24. If you are required to construct a confidence interval, remember that there are three
things you must do to receive full credit: justify that the conditions necessary to
construct the interval are present; construct the interval; and interpret the interval in
context. You’ll need to remember this, because often the only instruction you will see is
to construct the interval.
25. If you include graphs as part of your solution, be sure that axes are labeled and that
scales are clearly indicated. This is part of communication.