Multiple Linear Regression: Diagnostics: Statistics 203: Introduction To Regression and Analysis of Variance
Today
Spline models
What are the assumptions?
Problems in the regression function
Partial residual plot
Added-variable plot
Problems with the errors
Outliers & Influence
Dropping an observation
Different residuals
Crude outlier detection test
Bonferroni correction for multiple comparisons
DFFITS
Cook's distance
DFBETAS
Spline models
$$f(x) = \sum_{j=0}^{3} \beta_{0j} x^j + \sum_{i=1}^{h} \beta_i (x - t_i)_+^3$$
where
$$(x - t_i)_+ = \begin{cases} x - t_i & \text{if } x - t_i \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$
Here is an example.
Conditioning problem again: B-splines span the same model subspace
but give a less ill-conditioned design matrix.
Other bases one might use: Fourier: sin and cos waves;
Wavelet: space/time localized basis for functions.
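The truncated power basis in the spline formula above can be sketched in a few lines of NumPy. This is only an illustration: the knot locations, the grid `x`, and the function name `truncated_power_basis` are assumptions, not part of the lecture. The condition number of the resulting design gives a rough sense of the conditioning problem mentioned above.

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Columns 1, x, x^2, x^3, then (x - t_i)_+^3 for each knot t_i."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, 4, increasing=True)                    # x^0 .. x^3
    # (x - t_i)_+ is x - t_i where that is >= 0, and 0 otherwise
    trunc = np.clip(x[:, None] - np.asarray(knots, dtype=float), 0.0, None) ** 3
    return np.hstack([poly, trunc])

x = np.linspace(0.0, 1.0, 50)
knots = [0.25, 0.5, 0.75]            # hypothetical knot locations t_1..t_h
B = truncated_power_basis(x, knots)  # design matrix with 4 + h columns
cond = np.linalg.cond(B)             # large values indicate an ill-conditioned design
```

A B-spline basis would span the same column space as `B`, so the fitted function would be the same, only computed from a better-conditioned matrix.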
What are the assumptions?
$$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
Partial residual plot
For $1 \le j \le p-1$ let
$$e_{ij} = e_i + b_j X_{ij}.$$
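A minimal sketch of computing partial residuals, assuming simulated data and taking $e_i$ to be the ordinary least-squares residuals and $b_j$ the fitted coefficients (the data, seed, and variable names here are illustrative assumptions). Plotting `partial` against `X[:, j]` gives the partial residual plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)     # simulated response

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
e = y - X @ b                               # ordinary residuals e_i

j = 1
partial = e + b[j] * X[:, j]                # e_ij = e_i + b_j X_ij

# Regressing the partial residuals on X_j (with an intercept) recovers slope b_j,
# because e is orthogonal to every column of X.
slope = np.polyfit(X[:, j], partial, 1)[0]
```

Curvature in the plot of `partial` against `X[:, j]` would suggest that the $j$-th predictor enters the regression function nonlinearly.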
Added-variable plot
$$Y = X_{(j)} \beta_{(j)} + \beta_j X_j + \varepsilon$$
where $X_{(j)}$ is the design matrix with the $j$-th column removed.
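One common way to build an added-variable plot from this decomposition is to regress both $Y$ and $X_j$ on the remaining columns and plot one set of residuals against the other. The sketch below assumes simulated data and a hypothetical helper `residuals`; it also checks the standard fact that the slope through the plotted points equals the full-model coefficient $b_j$.

```python
import numpy as np

def residuals(A, v):
    """Residuals from the least-squares regression of v on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

rng = np.random.default_rng(1)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # simulated design
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

j = 2
X_rest = np.delete(X, j, axis=1)      # X_(j): all columns except the j-th
ry = residuals(X_rest, y)             # Y adjusted for the other predictors
rx = residuals(X_rest, X[:, j])       # X_j adjusted for the other predictors

slope_av = (rx @ ry) / (rx @ rx)      # slope of ry on rx (both have mean zero)
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The added-variable plot is the scatter of `ry` against `rx`.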
Dropping an observation
Basic idea: if $\hat{Y}_{j(i)}$ (the fitted value at point $j$ when observation $i$ is left out) is very different from $\hat{Y}_j$ (using all the data),
then $i$ is an influential point for determining $\hat{Y}_j$.
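The basic idea can be illustrated by brute force: refit the model once per deleted observation and record how far the fitted values move. This is a sketch on simulated data in which observation 0 is deliberately planted as a high-leverage outlier; the data and the helper name `fit` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
x[0] = 8.0                                  # hypothetical high-leverage point
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[0] += 10.0                                # ... that also sits far off the line
X = np.column_stack([np.ones(n), x])

def fit(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

fitted_all = X @ fit(X, y)                  # Yhat_j using all the data
change = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    fitted_without_i = X @ fit(X[keep], y[keep])    # Yhat_{j(i)} for every j
    change[i] = np.max(np.abs(fitted_all - fitted_without_i))

most_influential = int(np.argmax(change))   # index whose deletion moves the fit most
```

Refitting $n$ times is wasteful for large data sets, but it makes the definition explicit.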
Different residuals
Studentized residuals: $t_i = e_i / \left(\hat{\sigma}_{(i)} \sqrt{1 - H_{ii}}\right) \sim t_{n-p-1}$ (rstudent).
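A sketch of the studentized residuals on simulated data. To avoid refitting $n$ times it uses the standard leave-one-out identity $(n-p-1)\,\hat{\sigma}^2_{(i)} = (n-p)\,\hat{\sigma}^2 - e_i^2/(1-H_{ii})$, which is not stated on the slide and is brought in here as an assumption; the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # simulated design
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)                          # leverages H_ii
e = y - H @ y                           # ordinary residuals
sigma2 = e @ e / (n - p)                # usual variance estimate

# Variance estimate with observation i left out, via the identity above
sigma2_loo = ((n - p) * sigma2 - e**2 / (1 - h)) / (n - p - 1)

t = e / np.sqrt(sigma2_loo * (1 - h))   # studentized residuals t_i
```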
Bonferroni correction for multiple comparisons
Testing each of $m$ hypotheses at level $\alpha/m$ keeps the probability of at least one false positive at or below $\alpha$, by the union bound:
$$P\left(\bigcup_{i=1}^{m} \left\{ |T_i| \ge t_{1-\alpha/(2m),\,n-p-2} \right\}\right) \le \sum_{i=1}^{m} P\left(|T_i| \ge t_{1-\alpha/(2m),\,n-p-2}\right) = \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha.$$
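A small numeric check of the bound, assuming hypothetical values $\alpha = 0.05$ and $m = 100$. The union bound holds under any dependence between the tests; the independent case is included only for comparison.

```python
alpha, m = 0.05, 100          # hypothetical overall level and number of tests
per_test = alpha / m          # level used for each individual test

# Union bound on P(at least one false positive), valid under any dependence
fwer_bound = m * per_test

# Exact family-wise error rate if the m tests happened to be independent
fwer_independent = 1 - (1 - per_test) ** m
```

In the independent case the exact rate is slightly below the bound, so the correction is only mildly conservative there.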
DFFITS
$$\mathrm{DFFITS}_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\hat{\sigma}_{(i)} \sqrt{H_{ii}}}$$
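The definition can be evaluated directly by deleting each observation in turn, as in the sketch below. The data are simulated and the helper name `ols` is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # simulated design
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

b = ols(X, y)
XtX = X.T @ X
dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = ols(X[keep], y[keep])                      # fit without observation i
    r = y[keep] - X[keep] @ b_i
    sigma_i = np.sqrt(r @ r / (n - 1 - p))           # sigma-hat_(i)
    h_ii = X[i] @ np.linalg.solve(XtX, X[i])         # leverage H_ii
    dffits[i] = (X[i] @ b - X[i] @ b_i) / (sigma_i * np.sqrt(h_ii))
```

The same values can be obtained without refitting as $t_i \sqrt{H_{ii}/(1-H_{ii})}$, where $t_i$ is the studentized residual.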
Cook's distance
$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{p \, \hat{\sigma}^2}$$
This quantity measures how much the entire regression
function changes when the $i$-th observation is deleted.
It should be compared to the $F_{p,n-p}$ distribution: if $D_i$ falls at or above the
50th percentile of $F_{p,n-p}$, then the $i$-th point is likely influential:
investigate this point further.
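A sketch of Cook's distance on simulated data. Rather than refitting, it uses the standard closed form $D_i = e_i^2 H_{ii} / \left(p\,\hat{\sigma}^2 (1-H_{ii})^2\right)$, which is algebraically equal to the definition above but is not given on the slide, so treat it as an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # simulated design
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)                          # leverages H_ii
e = y - H @ y                           # ordinary residuals
sigma2 = e @ e / (n - p)                # sigma-hat squared from the full fit

# Closed form of sum_j (Yhat_j - Yhat_{j(i)})^2 / (p * sigma2)
cooks = e**2 * h / (p * sigma2 * (1 - h) ** 2)
```

Each entry of `cooks` would then be compared against the median of the $F_{p,n-p}$ distribution.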
DFBETAS
$$\mathrm{DFBETAS}_{j(i)} = \frac{b_j - b_{j(i)}}{\sqrt{\hat{\sigma}^2_{(i)} \, (X^T X)^{-1}_{jj}}}$$
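A direct evaluation of this formula on simulated data, giving one row per deleted observation $i$ and one column per coefficient $j$. The data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # simulated design
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                       # full-data coefficients b_j

dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)   # b_{j(i)}
    r = y[keep] - X[keep] @ b_i
    sigma2_i = r @ r / (n - 1 - p)                            # sigma-hat^2_(i)
    dfbetas[i] = (b - b_i) / np.sqrt(sigma2_i * np.diag(XtX_inv))
```

Large entries in row `i` indicate which individual coefficients move most when observation $i$ is dropped.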