AP Stats Module 2 Notes

The document provides an overview of scatterplots, explaining the roles of explanatory and response variables, and the significance of direction, outliers, form, and strength in analyzing data relationships. It includes instructions for using a TI-83/84 calculator to compute the least squares regression line and emphasizes the importance of interpreting residuals and correlation coefficients correctly. Additionally, it highlights common mistakes to avoid and the necessity of contextualizing statistical interpretations.

Uploaded by

bjs63624
Copyright
© All Rights Reserved

Scatterplots (DOFS: Direction, Outliers, Form, Strength)

Explanatory variable: variable used to explain or predict changes in the values of another variable, AKA independent variable (x).

Response variable: variable that measures the outcome (prediction); it changes in response to the explanatory variable, AKA dependent variable (y).

Direction: Is the overall direction positive or negative?
Positive - As x increases, y increases.
Negative - As x increases, y decreases.

Outlier(s): Are there any points that do not follow the overall trend?

Form: Does the association follow a linear trend? Typical forms are linear, non-linear, or random scatter (no association).

Strength: How closely do the points follow a visible form? If the points are close to where a model can be drawn, the strength is strong. If linear, report r (see correlation coefficient below).

Review 02.02 page 6 of 8
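
To make "report r" concrete, here is a minimal Python sketch of how the correlation coefficient could be computed by hand. The data (hours studied and quiz score) and the function name `correlation` are made up for illustration; they are not from the notes.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation(hours, score)

# Swapping the explanatory and response variables leaves r unchanged.
r_swapped = correlation(score, hours)

# Changing the units of x (hours to minutes) also leaves r unchanged.
r_minutes = correlation([h * 60 for h in hours], score)

print(f"r = {r:.4f}")  # positive sign: as x increases, y tends to increase
```

For this made-up data set r is positive and close to 1, which would be described as a strong, positive, linear association between hours studied and quiz score.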

Calculator steps for TI-83/84:

Go to Stats, Edit
Enter explanatory variable (x) in L1 and response variable (y) in L2.
Then, go to: Stats, Calc, and LinReg(a+bx)

Least squares regression line (LSRL)

To receive full credit, be sure to define the x and y variables in your LSRL.

Y-intercept interpretation: When [x in context] is zero, the predicted [y in context] is [a units].

Slope interpretation: On average, for each additional [x in context], the predicted [y in context] changes by [b units].

*Always report four decimal places when possible.

Formulas from the AP Statistics Formula Sheet (2-Variable Statistics) that are necessary for calculating slope and y-intercept:

y-intercept: a = ȳ - b·x̄
Slope: b = r(s_y / s_x)
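
As a cross-check on the calculator output, the LSRL can also be computed directly from the formula-sheet relationships b = r(s_y / s_x) and a = ȳ - b·x̄. The following is a minimal sketch, not the calculator's own algorithm; the data set and the function name `lsrl` are hypothetical, and the line is written as y-hat = a + bx to match LinReg(a+bx).

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*(s_y/s_x), a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # r is the average product of z-scores, dividing by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x       # slope
    a = y_bar - b * x_bar   # y-intercept
    return a, b

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

a, b = lsrl(hours, score)

# Define both variables in context and use the word "predicted".
print(f"predicted quiz score = {a:.4f} + {b:.4f}(hours studied)")
```

For this made-up data the slope is 6 and the y-intercept is 46, so the interpretation would read: on average, for each additional hour studied, the predicted quiz score increases by 6 points.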

Residual values

Tip: Always show all work when calculating a residual and include units.

Residual Plots: Used to determine whether the current linear model is appropriate. The x-axis usually plots the x-variable and the y-axis is usually the residuals.

Residual = Actual - Predicted value. Think AP Statistics (A - P).

To interpret: The residual represents how much our model either over/underestimated the actual value to be.

Be careful, a positive residual is an UNDERestimate and a negative residual is an OVERestimate. This has to do with where the point is relative to the LSRL.

Random Scatter is GOOD! It means that the current linear model is appropriate.
Visible pattern is BAD! It means that another model could be better!

R-Squared: Coefficient of Determination

Helps to determine whether a linear model is appropriate (after checking that the residual plot shows no visible pattern).

The closer to 1 r-squared is, the more appropriate the linear model.

To interpret: r-squared is the percent of variation in [y] that can be accounted for by the LSRL relating [y in context] to [x in context].

When reading computer output, we NEVER report r-squared adj (adjusted).
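
The residual and r-squared calculations above can be sketched in a few lines of Python. This is an illustration only: the data are hypothetical, and the line y-hat = 46 + 6x is assumed to be the LSRL already fitted to that made-up data.

```python
def residuals(xs, ys, a, b):
    """Residual = actual - predicted, where predicted = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def r_squared(ys, resids):
    """Proportion of the variation in y accounted for by the line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in resids)         # variation left over
    ss_tot = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
    return 1 - ss_res / ss_tot

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
a, b = 46, 6  # assumed LSRL for this data: y-hat = 46 + 6x

resids = residuals(hours, score, a, b)
for x, e in zip(hours, resids):
    if e > 0:
        note = "model UNDERestimated the actual score"
    elif e < 0:
        note = "model OVERestimated the actual score"
    else:
        note = "prediction matched the actual score"
    print(f"x = {x} hours: residual = {e} points ({note})")

r2 = r_squared(score, resids)
print(f"r-squared = {r2:.4f}")
```

Plotting `hours` against `resids` would give the residual plot; random scatter around zero supports the linear model, while a visible pattern suggests another model could be better.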

Transformations

Transforming an association to achieve linearity depends on the original association that is present. Typically, there are two types of associations: Power (x^n) and Exponential (a^x). Appropriate transformations will be discussed for each. Begin by entering the explanatory variable (x) into L1 and the response variable (y) into L2.

For Power (x^2, x^3, x^(1/2), or x^n in general): Graph log(x) vs. log(y) (see top graph on right)
For Exponential (2^x, e^x, (1/4)^x, or a^x in general): Graph x vs. log(y) (see bottom graph on right)

To find log(x), find log(L1) and store in L3. To find log(y), find log(L2) and store in L4.

After transforming the original association, check to see if linearity was achieved by:
1. Check the new x vs. y in a scatterplot
2. Check the residual plot for no visible pattern
3. Check that r-squared is closer to 1.

Correlation

Correlation* coefficient (r) - measures both direction (+/-) and strength (closer to -1 or 1 stronger, closer to 0 weaker).
*Correlation does NOT imply causation!

[Figure: Examples of correlation]

Common mistakes

• Always show your work!

• Round to four decimal places!

• Be sure to include units for both your x- and y-variable.

• When interpreting slope and y-intercept, the use of the word "predicted" or "estimated" is mandatory. Otherwise, it seems as though y is the actual value.

• When interpreting slope, it needs to say "for each additional" or "for every one unit increase in [x in context]".

• Please remember that your variables (x and y-hat) must be defined. Be sure to include the context (what do x and y-hat stand for). We should always write 'where x represents ___ and y-hat represents the predicted ____.' Also, when defining variables, there is a BIG difference between saying x is the hand span versus x is the length of arm span.

• Always write your answers in the context of the problem when interpreting slope, y-intercept, r, r-squared, residuals, etc.

• The sign of the residual is opposite to what one would believe. A negative residual is an overestimate and a positive residual is an underestimate. Think about the order of the subtraction! :)
• Remember that correlation is a measure of association, not causation.

• When you are asked to describe the association shown in a scatterplot, you are expected to discuss the direction, form, and strength
of the association, along with any unusual features, in the context of the problem. This means that you need to use both variable
names in your description.

• Correlation:
  - is only appropriate for describing the strength and direction of linear relationships.
  - does not measure form.
  - is not a resistant measure of strength (similar to mean and standard deviation).
  - has no unit of measurement.
  - requires that both variables be quantitative.
  - makes no distinction between explanatory and response variables.

• Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data (known as
extrapolation, see 02.02 Page 4 of 8).

• When asked to interpret the slope or y intercept, it is very important to include the word predicted in your response. Otherwise, it
might appear that you believe the regression equation provides actual values of y.

• Remember that slope is the change in y over the change in x. The formula b = r(s_y / s_x) is still consistent with this and can be found on the AP Statistics Formula Sheet.
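
The log transformations described in the Transformations section can be checked with a short Python sketch. Everything here is an assumption for illustration: the data are generated from an exact power curve (y = 2x^3) and an exact exponential curve (y = 3·2^x), and the helper names `fit_line` and `r_squared` are hypothetical. `log10` plays the role of the calculator's log key.

```python
from math import log10
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def r_squared(xs, ys):
    """Square of the correlation coefficient between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

x = [1, 2, 3, 4, 5, 6]                # L1
y_power = [2 * v ** 3 for v in x]     # power association (made-up)
y_exp = [3 * 2 ** v for v in x]       # exponential association (made-up)

log_x = [log10(v) for v in x]         # like storing log(L1) in L3

# Power: graph log(x) vs. log(y).
log_y_power = [log10(v) for v in y_power]
a_pow, b_pow = fit_line(log_x, log_y_power)

# Exponential: graph x vs. log(y).
log_y_exp = [log10(v) for v in y_exp]
a_exp, b_exp = fit_line(x, log_y_exp)

print(f"power: r-squared before = {r_squared(x, y_power):.4f}, "
      f"after = {r_squared(log_x, log_y_power):.4f}, slope = {b_pow:.4f}")
print(f"exponential: r-squared before = {r_squared(x, y_exp):.4f}, "
      f"after = {r_squared(x, log_y_exp):.4f}, slope = {b_exp:.4f}")
```

In this sketch the slope of the transformed power data recovers the exponent (3), and the slope of the transformed exponential data recovers log(2), the log of the base. With real data the fit will not be exact, so the scatterplot, residual plot, and r-squared checks still apply.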
