Chapter09 Part 2
Regression Wisdom
Slide 9 - 2
How do we use regressions to figure out what the data has to say?
Check Homework #20 with Group: Read Chapter 9, #1-9 ODD, 8
Homework #21: Read Chapter 9, #11-23 ODD
Slide 9 - 3
Extrapolation is always dangerous. But, when the x-variable in the model is time, extrapolation becomes an attempt to peer into the future. Knowing that extrapolation is dangerous doesn't stop people. The temptation to see into the future is hard to resist. Here's some more realistic advice: If you must extrapolate into the future, at least don't believe that the prediction will come true.
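One way to stay honest about extrapolation is to have the prediction step report whether the requested x-value lies outside the range of the data used to fit the line. The sketch below is only an illustration: the data are made up, and the helper names `fit_line` and `predict` are hypothetical, not from any library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Return (prediction, extrapolating) for a new x-value.

    extrapolating is True when x_new lies outside the observed x-range.
    """
    slope, intercept = fit_line(xs, ys)
    extrapolating = x_new < min(xs) or x_new > max(xs)
    return intercept + slope * x_new, extrapolating


# Made-up data: years since the start of a record, and some measured quantity
years = [0, 1, 2, 3, 4, 5]
values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

inside, inside_flag = predict(years, values, 3.5)   # within the data
future, future_flag = predict(years, values, 30)    # far beyond the data
print(inside, inside_flag)
print(future, future_flag)
```

The arithmetic is identical for both predictions. Only the flag separates a prediction the data can support from one that assumes the same straight-line pattern continues well past anything that was observed.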
Slide 9 - 4
Outlying points can strongly influence a regression. Even a single point far from the body of the data can dominate the analysis. Any point that stands away from the others can be called an outlier and deserves your special attention.
Slide 9 - 5
The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election:
Slide 9 - 6
The red line shows the effects that one unusual point can have on a regression:
Slide 9 - 7
The linear model doesn't fit points with large residuals very well. Because they seem to be different from the other cases, it is important to pay special attention to points with large residuals.
Slide 9 - 8
A data point can also be unusual if its x-value is far from the mean of the x-values. Such points are said to have high leverage.
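Leverage can be put on a numeric scale. The slides do not give a formula, but for a regression with one predictor the standard one is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², which grows as x_i moves away from the mean of the x-values. A minimal sketch with made-up shoe sizes follows. The function name `leverages` is hypothetical.

```python
def leverages(xs):
    """Leverage of each point in a one-predictor least-squares regression.

    Uses h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Made-up x-values: four ordinary shoe sizes and one far from the rest
shoe_sizes = [7, 8, 9, 10, 16]
h = leverages(shoe_sizes)
for x, h_i in zip(shoe_sizes, h):
    print(f"shoe size = {x:>2}  leverage = {h_i:.2f}")
```

With these numbers the size-16 point has leverage 0.92, while the others fall between 0.20 and 0.38. The leverages always add up to 2 in a one-predictor regression with an intercept, so a single point holding nearly half of that total stands out.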
Slide 9 - 9
A point with high leverage has the potential to change the regression line. We say that a point is influential if omitting it from the analysis gives a very different model.
Slide 9 - 10
The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow!
Slide 9 - 11
When we investigate an unusual point, we often learn more about the situation than we could have learned from the model alone. You cannot simply delete unusual points from the data. You can, however, fit a model with and without these points as long as you examine and discuss the two regression models to understand how they differ.
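Fitting the model with and without a suspect point can be sketched in a few lines. The shoe-size and IQ numbers below are invented for illustration, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data: the last point has an extraordinarily large shoe size
shoe_sizes = [7, 8, 9, 10, 11, 22]
iqs = [100, 95, 105, 100, 100, 175]

slope_with, intercept_with = fit_line(shoe_sizes, iqs)
slope_without, intercept_without = fit_line(shoe_sizes[:-1], iqs[:-1])

print(f"with the point:    slope = {slope_with:.2f}")
print(f"without the point: slope = {slope_without:.2f}")
```

Here the slope is 0.5 without the unusual point and roughly 5.4 with it. Because omitting one point gives a very different model, that point is influential. Both fits should be reported and discussed rather than quietly keeping one.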
Slide 9 - 12
Warning: Influential points can hide in plots of residuals. Points with high leverage pull the line close to them, so they often have small residuals. You'll see influential points more easily in scatterplots of the original data or by finding a regression model with and without the points.
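The warning can be checked numerically by computing residuals from a fit that includes a high-leverage point. This is a sketch on invented shoe-size and IQ data, with a hypothetical `fit_line` helper.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data: the last point sits far from the others in x
shoe_sizes = [7, 8, 9, 10, 11, 22]
iqs = [100, 95, 105, 100, 100, 175]

slope, intercept = fit_line(shoe_sizes, iqs)

# Residual = observed y minus the value predicted by the fitted line
residuals = [y - (intercept + slope * x) for x, y in zip(shoe_sizes, iqs)]
for x, r in zip(shoe_sizes, residuals):
    print(f"shoe size = {x:>2}  residual = {r:7.2f}")
```

In this example the high-leverage point drags the line toward itself, so its residual is smaller in size than the largest residuals among the ordinary points. A residual plot alone would not single it out, but a scatterplot of the original data would.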
Slide 9 - 13
No matter how strong the association, no matter how large the R² value, no matter how straight the line, there is no way to conclude from a regression alone that one variable causes the other. There's always the possibility that some third variable is driving both of the variables you have observed. With observational data, as opposed to data from a designed experiment, there is no way to be sure that a lurking variable is not the cause of any apparent association.
Copyright 2010 Pearson Education, Inc.
Slide 9 - 14
The following scatterplot shows that the average life expectancy for a country is related to the number of doctors per person in that country:
Slide 9 - 15
This new scatterplot shows that the average life expectancy for a country is related to the number of televisions per person in that country:
Slide 9 - 16
Since televisions are cheaper than doctors, send TVs to countries with low life expectancies in order to extend lifetimes. Right? How about considering a lurking variable? That makes more sense: countries with higher standards of living have both longer life expectancies and more doctors (and TVs!). If higher living standards cause changes in these other variables, improving living standards might be expected to prolong lives and increase the numbers of doctors and TVs.
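A small constructed example shows how a lurking variable can produce a strong association between two variables that do not affect each other. Everything below is invented: a hypothetical living-standard score drives both the TV figure and the life expectancy figure, and the TV figure never enters the life expectancy formula.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical lurking variable: a living-standard score for eight countries
living_standard = [1, 2, 3, 4, 5, 6, 7, 8]

# Small made-up disturbances so the relationships are not perfectly exact
tv_noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2]
life_noise = [-0.5, 0.4, -0.2, 0.5, -0.4, 0.3, -0.1, 0.2]

# Both variables depend on living standard only; TVs play no part in life
tvs = [2 * z + e for z, e in zip(living_standard, tv_noise)]
life = [50 + 3 * z + e for z, e in zip(living_standard, life_noise)]

r = correlation(tvs, life)
print(f"correlation between TVs and life expectancy: r = {r:.3f}")
```

The correlation comes out very close to 1, even though, by construction, changing the TV figure could not change life expectancy. A regression of life expectancy on TVs would fit well and still say nothing about cause.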
Slide 9 - 17
Conclusion
When do points have high leverage?
When would we consider a point influential?
How are lurking variables and causation related?
Slide 9 - 18