
MY NOTES Linear Regression

This document contains notes from an AP Statistics class on linear regression. It discusses using a linear regression model to find the relationship between two variables, the protein and fat content of Burger King menu items. The model finds the slope and y-intercept of the regression line. The residual plot is examined to check if the linear model is a good fit to the data. The R-squared value indicates that 69% of the variability in fat content is explained by the linear relationship to protein content. TI calculator instructions are provided for performing linear regression analysis.

Uploaded by

bveshpsu
Copyright
© Attribution Non-Commercial (BY-NC)

AP Stats – Mr. Veshio
Unit 2 – Chapter 8
**ActivStats** 8-1 “Manatees and Motorboats”

Let’s look at a scatterplot of these data on ActivStats…

How could we come up with an equation of this line? What will this
help us do?

I – Residuals
**Using the attached data on Burger King menu items, make a
scatterplot of the data on your TI**

**ActivStats** 8-3 – “Residuals”

When a “line of best fit” is drawn, we refer to it as a LINEAR MODEL


 Use this line to make ESTIMATES

Predicted Values: Estimates made from a model, called “y-hat”

Residual: Difference between the observed value (y) and its associated predicted value (y-hat)
 Tells us how far off the model’s prediction is at that point

Residual = Observed – Predicted

Line of best fit – couldn’t we just add up the residuals, and see when
the sum is minimized?
 NO! – Why? The positive and negative residuals would cancel
each other out (LIKE STANDARD DEVIATION)
 How did we fix it? SQUARE THE RESIDUALS

Least Squares Line (aka Line of Best Fit) – minimizes the sum of the
squares of the residuals
 This line, and ONLY this line, will have the smallest variation
from the data as seen in the residuals
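
A minimal sketch of why we square the residuals, using made-up (x, y) pairs that are NOT the Burger King data. The names `xs`, `ys`, `residuals`, and `sum_sq` are hypothetical helpers for illustration. It compares the least squares line with a flat line through the point of averages: both have residuals that sum to zero, but only the least squares line has the smaller sum of squares.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def residuals(b0, b1):
    """Observed minus predicted for the line y-hat = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_sq(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum(e * e for e in residuals(b0, b1))


n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept from the usual formulas.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# A competing line: flat, but still through (x_bar, y_bar).
flat_b0, flat_b1 = y_bar, 0.0

print("sum of residuals, least squares:", sum(residuals(b0, b1)))
print("sum of residuals, flat line:    ", sum(residuals(flat_b0, flat_b1)))
print("sum of squares, least squares:  ", sum_sq(b0, b1))
print("sum of squares, flat line:      ", sum_sq(flat_b0, flat_b1))
```

The plain sums cannot tell the two lines apart, which is why the sum of squares is used instead.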

II – Correlation and the Least Squares Line

Let’s standardize the data – translate them into z-scores:

TI TIPS
STAT -> 1: Edit -> To create a new list, hit 2ND-> INS (above DEL
button)
Name it ZFAT -> Move your cursor on the name and hit ENTER -> The
cursor will be at the bottom now

Hit 2ND -> LIST (above STAT button) -> Select FAT
You can now use this as a variable – whatever you tell the calculator to
do with this list, it will do it for every one of the values in list FAT -> To
convert to z-scores, we need to subtract the mean of the data, and
divide by the standard deviation

Run 1-VarStats on FAT and PROT to find the mean and standard
deviation

(FAT – 23.11) / 16.29 -> Hit Enter -> The calc will now fill ZFAT with the
z-scores for all 32 items’ fat content

Do the same for PROT

(PROT – 17.09) / 13.79
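
The same list arithmetic can be sketched off the calculator. This is only an illustration: the `fat` values below are made up (they are not the attached Burger King data), `z_scores` is a hypothetical helper, and it assumes the sample standard deviation (Sx on the TI) is the one being used.

```python
from statistics import mean, stdev

# Hypothetical fat values in grams, invented for illustration only.
fat = [19.0, 31.0, 34.0, 35.0, 39.0, 43.0, 9.0, 26.0]


def z_scores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD, assumed to match Sx from 1-Var Stats
    return [(v - m) / s for v in values]


zfat = z_scores(fat)

# After standardizing, the list has mean 0 and standard deviation 1.
print("mean of z-scores:", round(mean(zfat), 10))
print("SD of z-scores:  ", round(stdev(zfat), 10))
```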

If you construct a scatterplot, you will see a picture similar to Fig. 8.2 on
p. 170 minus the line – we don’t know how to draw it yet – but do we
know anything about it?

 Should go through the point (x̄, ȳ), the averages of both variables – but on a standardized scatterplot, that point is (0, 0)
 Lines that go through the origin look like this: y = mx, where
“m” is the slope – but since variables were standardized, the
equation looks a little different:
ẑ_y = m·z_x
 Which slope will be “best”, i.e. which slope gives us the line that
minimizes the sum of the SQUARED residuals? THE CORRELATION
COEFFICIENT
ẑ_y = r·z_x
 “Moving one standard deviation away from the mean in x moves
our ESTIMATE r standard deviations away from the mean in y”
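
A quick numerical check of that claim, as a sketch only. The `prot` and `fat` lists are made-up values (not the class data) and `z_scores` is a hypothetical helper. It computes r from the z-scores and, separately, the least squares slope of a line through the origin on the standardized data; the two should agree.

```python
from statistics import mean, stdev

# Hypothetical protein and fat values in grams, invented for illustration.
prot = [12.0, 25.0, 30.0, 8.0, 22.0, 40.0, 5.0, 18.0]
fat = [19.0, 31.0, 34.0, 15.0, 39.0, 43.0, 9.0, 26.0]


def z_scores(values):
    """Standardize a list using its mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


zx = z_scores(prot)
zy = z_scores(fat)
n = len(zx)

# Correlation: sum of the products of z-scores, divided by n - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Least squares slope for a line through the origin, z_y-hat = m * z_x.
m_best = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print("r          =", round(r, 4))
print("best slope =", round(m_best, 4))
```

The two numbers match because the squared z-scores of x always add up to n - 1.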


Fig. 8.3 – The slope of a regression line on a standardized scatterplot


Just Checking
1.) about 0.85 standard deviations
2.) about 1.7 standard deviations below

***FROM MATH BOX, p. 172-173***

 Slope of line of best fit for z-scores is “r”
 Slope of regression line is b1 = r·s_y/s_x
 1 – r² = % of variability NOT explained by the regression line, so
 r² = % of variability in y that IS explained by x
o explained by our model -> at most 100% (which is why r is
between -1 and 1)

III – Specifying the Regression Line

Instead of y = mx + b, we’ll use

ŷ = b0 + b1x

You’ll need to find…

b0 = y-intercept
b1 = slope

MINI STEP-BY-STEP

1.) Find Slope

b1 = r·s_y/s_x = (0.83 x 16.4 g fat) / 14 g protein = 0.97 g fat per g protein

2.) Find y-intercept

 Regression line will always pass through (x̄, ȳ) -> substitute in with the slope, and solve:

x̄ = 17.2 g protein
ȳ = 23.5 g fat

23.5 g fat = b0 + 0.97 g fat/g protein · 17.2 g protein

b0 = 6.8 g fat

3.) Put slope and y-intercept back in, using specific variables

Predicted Fat = 6.8 + 0.97 Protein
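
The three steps above can be sketched in a few lines, using only the summary statistics quoted in these notes (r = 0.83, s_y = 16.4, s_x = 14, x̄ = 17.2, ȳ = 23.5). The variable names are illustrative, not calculator commands.

```python
# Summary statistics as quoted in the notes.
r = 0.83       # correlation between protein and fat
s_y = 16.4     # standard deviation of fat (g)
s_x = 14.0     # standard deviation of protein (g)
x_bar = 17.2   # mean protein (g)
y_bar = 23.5   # mean fat (g)

# Step 1: slope, in grams of fat per gram of protein.
b1 = r * s_y / s_x

# Step 2: intercept, using the fact that the line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

# Step 3: report the model with rounded coefficients.
print(f"Predicted Fat = {b0:.1f} + {b1:.2f} Protein")
```

Carrying the unrounded slope into step 2 gives the same rounded intercept as the hand calculation.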


How to Interpret

Slope: “y units per x unit” -> one additional gram of protein is associated with an additional 0.97 g of fat, on average
 UNITS MATTER – include them

Intercept: value of the line when x = 0 -> 6.8 g fat for an item with no
protein, on average
 Zero is not usually a plausible value -> y-intercept only serves as a
starting point

***SINCE CORRELATION IS AT THE HEART OF REGRESSION, YOU MUST
CHECK YOUR THREE CONDITIONS FIRST***

Just Checking p. 175

3.) Price increases on average $122,740 for every 1000 square feet

4.) dollars / square foot

5.) $245,480

6.) $377,784

IV – Residuals Revisited

Residuals: part of the data that HASN’T been modeled

Residual = Data – Model

e = y – ŷ

“What did the model miss?” -> If our model is “good”, nothing
interesting should be left behind -> PLOT THE RESIDUALS
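
A small sketch of one residual calculation using the model from these notes (Predicted Fat = 6.8 + 0.97 Protein). The menu item here is hypothetical: 30 g of protein and 25 g of observed fat are made-up numbers chosen only to show the arithmetic, and `predicted_fat` is an illustrative helper name.

```python
def predicted_fat(protein_g):
    """Model from the notes: predicted fat (g) from protein (g)."""
    return 6.8 + 0.97 * protein_g


# Hypothetical menu item, not taken from the class data.
protein_g = 30.0
observed_fat_g = 25.0

y_hat = predicted_fat(protein_g)
residual = observed_fat_g - y_hat  # Residual = Data - Model

print(f"predicted fat: {y_hat:.1f} g")
print(f"residual:      {residual:.1f} g")
```

A negative residual means the model overestimated: this item has less fat than the line predicts for its protein content.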


 What should we see? NOTHING! No patterns, curves, bends – it should stretch horizontally, with the same amount of scatter throughout
 Residual plot is a good way to check “Straight Enough Condition”
o Plot them against x or against the predicted values (ŷ)

V – R² – Variation Accounted For

r² = squared correlation

 Represents the fraction of data’s variation accounted for by the model
 REMEMBER – our model isn’t perfect – some of the variation
WON’T be accounted for -> 1 – r² of the variation is left in the
RESIDUALS

 Denoted R² -> given as a % -> WHY?

Burger King Example

Variation in fat = variance = s² = 268.42 (s ≈ 16.4 after rounding)

Variation in Residuals = 83.195

Variance of Resid. / Variance of Fat = 83.195 / 268.42 = 0.31 or 31%

 This 31% is the variation left in the residuals -> to find the
variation explained by the model (the rest of it), take 100% -
31% = 69%
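
The same arithmetic as a short sketch, using only the two variances quoted in these notes. The variable names are illustrative. As a cross-check, it also squares the correlation r = 0.83 used earlier in the notes.

```python
# Variances as quoted in the notes.
var_fat = 268.42     # variance of fat content
var_resid = 83.195   # variance of the residuals

# Fraction of variation left over in the residuals.
unexplained = var_resid / var_fat

# Fraction of variation accounted for by the model.
r_squared = 1 - unexplained

print(f"left in residuals:   {unexplained:.0%}")
print(f"accounted for (R^2): {r_squared:.0%}")

# Cross-check against the squared correlation from earlier in the notes.
print(f"0.83 squared:        {0.83 ** 2:.0%}")
```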

HOW TO INTERPRET R²
“According to our linear model, 69% of the variability in fat content of
30 Burger King items is accounted for by variation in protein content”

 ALWAYS report it with a scatterplot and linear regression model


TI TIPS – LINEAR REGRESSION

STAT -> Calc -> 8: LinReg (a + bx)

Enter YR, TUIT, Y1 -> Y1 can be found by VARS -> y-Vars -> 1:Function
 Will create a regression equation and store into Y1

**CHECK RESIDUALS**
 The calculator makes a residual list whenever it runs LinReg -> List name is
always RESID -> if you don’t see the residuals in the lists, create a new
list and store them in it by finding RESID in the list of list names
 STAT PLOT -> X: YR Y: RESID -> ZOOM -> 9: ZoomStat (might
help to turn axes off)
 WHAT DO YOU SEE? AND WHAT DOES THIS MEAN?

What Can Go Wrong?

 Linearity is KEY
 Don’t fit a line to a non-linear relationship
 Re-express if possible
 Watch out for extreme values – OUTLIERS!!
 Causation is STILL not valid
 Fitting a regression line doesn’t help your case
 R² does not work by itself
 Always look at the scatterplot in conjunction

HW: re-read Ch. 8, do #’s 2, 4, 15, 24, 40, 42
