MY NOTES Linear Regression
Veshio
Unit 2 – Chapter 8
**ActivStats** 8-1 “Manatees and Motorboats”
How could we come up with an equation of this line? What will this
help us do?
I – Residuals
**Using the attached data on Burger King menu items, make a
scatterplot of the data on your TI**
Line of best fit – couldn’t we just add up the residuals, and see when
the sum is minimized?
NO! – Why? The positive and negative residuals would cancel
each other out (LIKE THE DEVIATIONS IN STANDARD DEVIATION)
How did we fix it? SQUARE THE RESIDUALS
Least Squares Line (aka Line of Best Fit) – minimizes the sum of the
squares of the residuals
This line, and ONLY this line, will have the smallest variation
from the data as seen in the residuals
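A quick sketch of why the plain sum of residuals is useless but the sum of squares works. The (protein, fat) pairs here are made up for illustration and are NOT the Burger King data; the function names are just for this example. Every line through the point of means gives residuals that add to (about) zero, but only the least squares slope gives the smallest sum of squares.

```python
# Hypothetical (protein g, fat g) pairs, made up for illustration;
# these are NOT the Burger King data.
protein = [10, 15, 20, 25, 30]
fat = [14, 22, 25, 33, 36]

mean_x = sum(protein) / len(protein)
mean_y = sum(fat) / len(fat)

def residuals(slope):
    """Residuals (observed - predicted) for a line through (mean_x, mean_y)."""
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(protein, fat)]

def sum_of_squares(slope):
    """Sum of squared residuals for the same line."""
    return sum(e ** 2 for e in residuals(slope))

# Least squares slope = sum of (dx * dy) / sum of (dx squared)
best_slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(protein, fat))
    / sum((x - mean_x) ** 2 for x in protein)
)

# Compare a flat line, the least squares line, and a too-steep line
for slope in (0.0, best_slope, 2.0):
    print(f"slope {slope:.2f}: "
          f"sum of residuals = {sum(residuals(slope)):.4f}, "
          f"sum of squares = {sum_of_squares(slope):.2f}")
```

The sum of residuals cannot tell the three lines apart; the sum of squares can.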
TI TIPS
STAT -> 1: Edit -> To create a new list, hit 2ND-> INS (above DEL
button)
Name it ZFAT -> Move your cursor on the name and hit ENTER -> The
cursor will be at the bottom now
Run 1-VarStats on FAT and PROT to find the mean and standard
deviation
(FAT – 23.11) / 16.29 -> Hit Enter -> The calc will now fill ZFAT with the
z-scores for all 32 items’ fat content
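The same z-score step can be sketched off the calculator. This assumes the TI's Sx (sample standard deviation) is the SD being used; the fat values below are made up for illustration, not the real menu data.

```python
import statistics

# Hypothetical fat values (g), made up for illustration
fat = [14, 22, 25, 33, 36]

mean_fat = statistics.mean(fat)
sd_fat = statistics.stdev(fat)  # sample SD, like Sx from 1-Var Stats

# Same formula as the TI step: (FAT - mean) / SD
zfat = [(value - mean_fat) / sd_fat for value in fat]

for value, z in zip(fat, zfat):
    print(f"{value} g fat -> z = {z:.2f}")
```

The z-scores always come out with mean 0 and standard deviation 1, no matter what the original units were.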
If you construct a scatterplot, you will see a picture similar to Fig. 8.2 on
p. 170 minus the line – we don’t know how to draw it yet – but do we
know anything about it?
ŷ = b0 + b1x
MINI STEP-BY-STEP
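1.) Slope: b1 = r·sy/sx = (0.83 x 16.4 g fat) / 14 g protein = 0.97 g fat per g protein
2.) Intercept: b0 = ȳ – b1·x̄
x̄ = 17.2 g protein
ȳ = 23.5 g fat
b0 = 23.5 – 0.97 x 17.2 = 6.8 g fat
3.) Put slope and y-intercept back in, using specific variables:
predicted fat = 6.8 + 0.97 x protein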
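The mini step-by-step can be checked with a few lines of arithmetic. The summary statistics are the ones in these notes; the variable names and the 30 g protein item at the end are assumptions added only for illustration.

```python
# Summary statistics from the notes
r = 0.83        # correlation between protein and fat
s_y = 16.4      # SD of fat (g)
s_x = 14.0      # SD of protein (g)
x_bar = 17.2    # mean protein (g)
y_bar = 23.5    # mean fat (g)

# Slope: b1 = r * sy / sx
b1 = r * s_y / s_x

# Intercept: the line goes through (x_bar, y_bar), so b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.2f} g fat per g protein")
print(f"b0 = {b0:.1f} g fat")

# Hypothetical item with 30 g protein (not from the notes)
protein = 30
predicted_fat = b0 + b1 * protein
print(f"predicted fat for {protein} g protein: {predicted_fat:.1f} g")
```

The slope rounds to 0.97 and the intercept to 6.8, matching the values used in the interpretation.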
How to Interpret
Intercept: value of the line when x=0 -> 6.8g fat for an item with no
protein, on average
Zero is not usually a plausible value for x -> y-intercept only serves as a
starting point
3.) Price increases on average $122,740 for every 1000 square feet
5.) $245,480
6.) $377,784
IV – Residuals Revisited
e = y – ŷ
“What did the model miss?” -> If our model is “good”, nothing
interesting should be left behind -> PLOT THE RESIDUALS
R² = squared correlation (r²)
With r = 0.83, about 31% of the variation in fat is left in the
residuals -> to find the variation explained by the model (the rest
of it), take 100% - 31% = 69%, which matches r² = 0.83² ≈ 0.69
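A small sketch of where the 69% and 31% come from. The r = 0.83 is from these notes; the second half uses made-up (x, y) pairs, not the real data, just to check that "1 minus the fraction of variation left in the residuals" gives the same number as squaring r.

```python
import math

# From the notes: correlation between protein and fat
r = 0.83
r_squared = r ** 2
print(f"explained by model: {r_squared:.0%}, "
      f"left in residuals: {1 - r_squared:.0%}")

# Check the identity on hypothetical data (made up for illustration)
x = [10, 15, 20, 25, 30]
y = [14, 22, 25, 33, 36]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x
# Variation left over after fitting the line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_xy = s_xy / math.sqrt(s_xx * s_yy)
print(f"1 - SSE/SST = {1 - sse / s_yy:.4f}, r squared = {r_xy ** 2:.4f}")
```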
HOW TO INTERPRET R²
“According to our linear model, 69% of the variability in fat content of
30 Burger King items is accounted for by variation in protein content”
LinReg setup: enter YR, TUIT, Y1 -> Y1 can be found by VARS -> Y-VARS -> 1:Function
This creates the regression equation and stores it in Y1
**CHECK RESIDUALS**
The calc makes a residual list whenever it runs LinReg -> List name is
always RESID -> if you don’t see the residuals in the lists, create a new
list and store them in it by finding RESID in the list of list names
STAT PLOT -> X: YR Y: RESID -> ZOOM -> 9: ZoomStat (might
help to turn axes off)
WHAT DO YOU SEE? AND WHAT DOES THIS MEAN?
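A rough sketch of the same residual check off the calculator. The YR and TUIT values below are made up for illustration (the class data may look different), and the names just mirror the TI list names.

```python
# Hypothetical years (since the first year) and tuition values,
# made up for illustration only
yr = [0, 1, 2, 3, 4, 5]
tuit = [6000, 6300, 6700, 7200, 7800, 8500]

n = len(yr)
mean_x = sum(yr) / n
mean_y = sum(tuit) / n

# Least squares slope and intercept
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(yr, tuit))
      / sum((x - mean_x) ** 2 for x in yr))
b0 = mean_y - b1 * mean_x

# Residual = observed - predicted (what the calculator stores in RESID)
resid = [y - (b0 + b1 * x) for x, y in zip(yr, tuit)]

for x, e in zip(yr, resid):
    print(f"YR {x}: residual {e:8.1f}")
```

In this made-up data the residuals start positive, go negative in the middle, and end positive. A bend like that would mean the straight line is missing something; residuals scattered with no pattern would mean the linear model is doing its job.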