NASA Regression Lecture
Geoff Vining
Virginia Tech
1. Overview of Modeling
2. Review of Simple Linear Regression
3. Multiple Linear Regression
4. Residual Analysis
5. Transformations
6. Influence Diagnostics
7. Collinearity
8. Model Selection
9. Logistic Regression
Chapter 1: Overview of Regression
Scientific Method
A deterministic model is
y = β0 + βt t + βT T + βA A
where
I the β’s are constants,
I t is the deposition time,
I T is the deposition temp, and
I A is the Argon flow rate.
Regression Models
A better model is
yi = β0 + βt ti + βT Ti + βA Ai + ϵi .
yi = β0 + β1 xi + ϵi
yi = β0 + β1 xi + β11 xi² + ϵi
Note: we can let x1i = xi , x2i = xi² , and let β2 = β11 .
yi = β0 + β1 x1i + β2 x2i + ϵi
A linear model means linear in the parameters (the β’s).
For example, yi = β0 + β1 (1/xi ) + ϵi is still a linear model.
Three basic methods of collecting data are a retrospective study based on historical data, an observational study, and a designed experiment.
Reboil
Temp. Pres. Flow Rate
120 2 100
150 2 100
120 3 100
150 3 100
120 2 150
150 2 150
120 3 150
150 3 150
2³ Factorial Experiment
Chapter 2: Review of Simple Linear Regression
y = mx + b
I m is the slope
I b is the y -intercept
Later, we will use Greek letters to define the model.
The following data are the vapor pressures of water (mm Hg) from its freezing point to its boiling point (temperatures reported in kelvins).
Scatter Plots
Temp (K)    vp (mm Hg)
273 4.6
283 9.2
293 17.5
303 31.8
313 55.3
323 92.5
333 149.4
343 233.7
353 355.1
363 525.8
373 760.0
Scatter Plots (vapor pressure versus temperature)
yi = β0 + β1 xi + ϵi ,
where
I yi is the response, in this case, the vapor pressure at the
ith temperature,
I xi is the predictor or regressor, in this case, the ith
temperature,
I β0 is the y -intercept,
I β1 is the slope (in our case, we expect β1 to be
positive), and
I ϵi is a random error.
The Formal Simple Linear Regression Model
E(yi ) = β0 + β1 xi ,
which is a straight line.
Note:
I ŷi is an estimate or prediction of yi .
I β̂0 is an estimate of the y -intercept.
I β̂1 is an estimate of the slope.
One possible line through our scatter plot is the following.
Least Squares Estimation of the Model
Least Squares Estimation of the Model
ei = yi − ŷi .
For a good estimated line, all of the residuals should be
“small”.
One possible overall measure is the sum of the residuals,
Σᵢ₌₁ⁿ ei = Σᵢ₌₁ⁿ (yi − ŷi ),
but positive and negative residuals cancel each other.
Least Squares Estimation of the Model
A better measure?
SSres = Σᵢ₌₁ⁿ ei² = Σᵢ₌₁ⁿ (yi − ŷi )².
β̂0 = ȳ − β̂1 x̄
β̂1 = SSxy / SSxx
where
SSxy = Σᵢ₌₁ⁿ (yi − ȳ)(xi − x̄)
and
SSxx = Σᵢ₌₁ⁿ (xi − x̄)²
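As a minimal numpy sketch (an illustration added here, not part of the original notes), these formulas can be applied directly to the vapor pressure data introduced earlier:

import numpy as np

# Vapor pressure data from the table above (Temp in K, vp in mm Hg)
temp = np.array([273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373], dtype=float)
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])

# Least squares estimates of the slope and intercept
SSxy = np.sum((vp - vp.mean()) * (temp - temp.mean()))
SSxx = np.sum((temp - temp.mean()) ** 2)
b1 = SSxy / SSxx                    # slope estimate
b0 = vp.mean() - b1 * temp.mean()   # intercept estimate

# Fitted values, residual sum of squares, and R-squared
y_hat = b0 + b1 * temp
SSres = np.sum((vp - y_hat) ** 2)
SStotal = np.sum((vp - vp.mean()) ** 2)
print(b0, b1, 1 - SSres / SStotal)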
Statistical Properties of the Estimators
E[β̂0 ] = β0 .
var[β̂0 ] = σ² [1/n + x̄²/SSxx ]
E[β̂1 ] = β1
var[β̂1 ] = σ²/SSxx
cov[β̂0 , β̂1 ] = −σ² x̄/SSxx .
Understanding the Variance of β̂1
Suppose you control the xs.
How should you choose them to minimize var[β̂1 ]?
Partitioning the Total Variability
SSreg = Σᵢ₌₁ⁿ (ŷi − ȳ)².
R² = SSreg /SStotal = 1 − SSres /SStotal
It can be shown that 0 ≤ R² ≤ 1.
H0 : β1 = 0
Ha : β1 ̸= 0
MSreg = SSreg / dfreg .
The test statistic is
F = MSreg / MSres .
The degrees of freedom for the test statistic are 1 for the
numerator and n − 2 for the denominator (for simple linear
regression).
The Overall F -Test
It can be shown that
E [MSres ] = σ²
One way to view the F statistic is as a signal-to-noise ratio.
Under the alternative hypothesis, the F statistic follows a noncentral F distribution, F′(dfreg , dfres , λ).
dfreg = number of parameters − 1 = 2 − 1 = 1.
The Overall F -Test
dfres = n − 2 = 11 − 2 = 9.
We obtain the mean squares by dividing the appropriate sum
of squares by the corresponding degrees of freedom.
The possible alternative hypotheses are Ha : β1 < 0, Ha : β1 > 0, or Ha : β1 ̸= 0.
The form of the test statistic is
t = β̂1 / σ̂β̂1
Apart from rounding errors, the square of the value for the t
statistic is the F statistic from the global test.
Test for β1
Under the alternative, the t statistic follows a noncentral t distribution, t′(dfres , δ),
where
δ = β1 √SSxx / σ
Note: δ controls the power of the test.
If you control the xs, thus, SSxx , how should you pick them
to maximize the power?
Confidence and Prediction Bands
We can construct confidence intervals around the fitted line by noting that the estimated variance of the predicted mean response at x is
MSres [ 1/n + (x − x̄)²/SSxx ].
For predicting a new observation, we add MSres to this variance.
Analysis of Variance
Source DF SS MS F P
Regression 1 491662 491662 35.57 0.000
Residual Error 9 124403 13823
Total 10 616065
Using Software
New Obs    Fit    SE Fit    95% CI    95% PI
1 -131.1 66.3 (-281.1, 18.9) (-436.5, 174.3)
2 -64.2 57.2 (-193.6, 65.1) (-360.0, 231.5)
3 2.6 48.9 (-107.9, 113.1) (-285.4, 290.6)
4 69.5 41.9 ( -25.4, 164.3) (-212.9, 351.8)
5 136.3 37.2 ( 52.2, 220.4) (-142.6, 415.3)
6 203.2 35.4 ( 123.0, 283.4) ( -74.6, 481.0)
7 270.0 37.2 ( 185.9, 354.1) ( -8.9, 549.0)
8 336.9 41.9 ( 242.0, 431.8) ( 54.5, 619.3)
9 403.7 48.9 ( 293.2, 514.3) ( 115.7, 691.8)
10 470.6 57.2 ( 341.3, 599.9) ( 174.9, 766.3)
11 537.4 66.3 ( 387.4, 687.5) ( 232.1, 842.8)
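The software output above can be reproduced (up to rounding) with statsmodels; this is an added sketch, and the function and column names below are statsmodels' own, not part of the original notes.

import numpy as np
import statsmodels.api as sm

temp = np.array([273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373], dtype=float)
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])

X = sm.add_constant(temp)               # design matrix with an intercept column
fit = sm.OLS(vp, X).fit()
print(fit.summary())                    # coefficients, t tests, overall F, R-squared

# 95% confidence intervals (mean response) and prediction intervals (new response)
pred = fit.get_prediction(X)
print(pred.summary_frame(alpha=0.05))   # mean, mean_ci_lower/upper, obs_ci_lower/upper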
Using Software
Some Considerations in the Use of Regression
x1 x2 x3 y
53.4 4.5 1.10 730
60.4 9.9 1.08 725
60.8 8.1 1.41 710
61.9 6.8 1.03 710
61.8 6.8 0.99 700
61.9 6.6 0.90 715
61.1 6.4 0.91 710
59.0 7.6 1.36 740
59.3 7.0 1.31 730
56.6 7.6 1.07 730
Introduction to Multiple Linear Regression
yi = β0 + Σⱼ₌₁ᵏ βj xij + ϵi
where
I yi is the ith response,
I xij is the ith value for the jth regressor,
I k is the number of regressors,
I β0 is the y -intercept,
I βj is the coefficient associated with the jth regressor, and
I ϵi is a random error with mean 0 and constant variance
σ2.
Model and Ordinary Least Squares Revisited
ŷi = β̂0 + Σⱼ₌₁ᵏ β̂j xij
where
I ŷi is the predicted response,
I β̂0 is the estimated y -intercept, and
I β̂j is the estimated coefficient for the jth regressor.
Model and Ordinary Least Squares Revisited
SSres = Σᵢ₌₁ⁿ (yi − ŷi )².
y = Xβ + ϵ
where
y = (y1 , y2 , . . . , yn )′ ,
β = (β0 , β1 , . . . , βk )′ ,
ϵ = (ϵ1 , ϵ2 , . . . , ϵn )′ , and
X is the n × (k + 1) matrix whose i th row is (1, xi1 , xi2 , . . . , xik ).
Matrix Notation
(X ′ X )β̂ = X ′ y .
β̂ = (X ′ X )−1 X ′ y .
Matrix Notation
The variance of β̂ is
σ 2 (X ′ X )−1 .
The sum of squares of the residuals is
y ′ [I − X (X ′ X )−1 X ′ ]y .
The vector of predicted values is
ŷ = X (X ′ X )−1 X ′ y .
Sometimes we call X (X ′ X )−1 X ′ the hat matrix.
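A small numpy sketch (an added illustration, not from the notes) of the matrix formulas above; the simulated data are purely for demonstration.

import numpy as np

def ols_matrix(X, y):
    """Ordinary least squares via the normal equations (X'X) beta-hat = X'y."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y             # (X'X)^{-1} X'y
    H = X @ XtX_inv @ X.T                    # hat matrix X(X'X)^{-1}X'
    y_hat = H @ y                            # vector of predicted values
    SSres = y @ (np.eye(len(y)) - H) @ y     # y'[I - X(X'X)^{-1}X']y
    return beta_hat, H, y_hat, SSres

# Demonstration with simulated data (two regressors plus an intercept)
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
beta_hat, H, y_hat, SSres = ols_matrix(X, y)
print(beta_hat, SSres)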
Matrix Notation
R² = SSreg / SStotal = 1 − SSres / SStotal ,
where
SSreg = Σᵢ₌₁ⁿ (ŷi − ȳ)².
R²adj = 1 − MSres / MStotal
Overall F Test
Second, the overall F test, which tests the hypotheses
H0 : β1 = β2 = · · · = βk = 0
Ha : at least one of the β’s ̸= 0.
MSreg = SSreg / dfreg ,
where dfreg is the number of regressors, and
MSres = SSres / dfres ,
where dfres is the number of observations (n) minus the
number of parameters estimated (k + 1).
Overall F Test
F = MSreg / MSres ,
and has k numerator degrees of freedom and n − k − 1
denominator degrees of freedom.
E (MSres ) = σ 2
E (MSreg ) = σ² + (1/dfreg ) β′X′[X (X′X )−1 X′ − 1(1′ 1)−1 1′ ]X β
where 1 is an n × 1 vector of 1's.
The t Test for an Individual Coefficient
H0 : βj = 0
Ha : βj ̸= 0
t = β̂j / σ̂β̂j .
The t Test for an Individual Coefficient
The estimated variance of the estimated mean response at a setting x0 is
MSres · x′0 (X′X )−1 x0 .
The estimated variance for predicting a new response at that setting is
MSres · [1 + x′0 (X′X )−1 x0 ].
y = Xβ + ϵ = X 1 β1 + X 2 β2 + ϵ
Consider a hypothesis test of the form
H0 : β 2 = 0
Ha : β 2 ̸= 0
Define SS(β 2 |β 1 ) by
SS(β2 |β1 ) = y′ [X (X′X )−1 X′ − X1 (X′1 X1 )−1 X′1 ] y
Extra Sum of Squares Principle
X = [1 X r ]
SSreg = y′ [X (X′X )−1 X′ − 1(1′ 1)−1 1′ ] y
This approach also includes the t tests on the individual
parameters.
Extra Sum of Squares Principle
E [SS(β2 |β1 )] = p2 σ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
where p2 is the number of parameters in β 2 .
F = [SS(β2 |β1 )/p2 ] / MSres
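The sketch below (added, not from the notes) computes SS(β2|β1) and the corresponding partial F statistic for a generic partition X = [X1 X2]; X1 is assumed to contain the intercept column.

import numpy as np
from scipy import stats

def partial_f_test(X1, X2, y):
    """Extra-sum-of-squares F test of H0: beta_2 = 0 in y = X1 b1 + X2 b2 + e."""
    X = np.column_stack([X1, X2])
    n, p = X.shape
    p2 = X2.shape[1]
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # full-model hat matrix
    H1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T     # reduced-model hat matrix
    SS_extra = y @ (H - H1) @ y                   # SS(beta_2 | beta_1)
    MSres = (y @ (np.eye(n) - H) @ y) / (n - p)   # full-model residual mean square
    F = (SS_extra / p2) / MSres
    return F, stats.f.sf(F, p2, n - p)            # statistic and p-value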
Extra Sum of Squares Principle
λ = (1/σ²) β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
= (1/σ²) β′2 [X′2 X2 − X′2 X1 (X′1 X1 )−1 X′1 X2 ] β2
If the columns of X2 are orthogonal to the columns of X1 (so X′1 X2 = 0), this reduces to
λ = (1/σ²) β′2 X′2 X2 β2 .
Thus, λ is maximized!
Note: suppose instead that X2 = X1 A for some matrix A. Then
λ = (1/σ²) β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
= (1/σ²) β′2 A′X′1 [I − X1 (X′1 X1 )−1 X′1 ] X1 A β2
= (1/σ²) β′2 [A′X′1 X1 A − A′X′1 X1 (X′1 X1 )−1 X′1 X1 A ] β2
= (1/σ²) β′2 [A′X′1 X1 A − A′X′1 X1 A ] β2 = 0
Impact of Collinearity on Testing
X 2 ≈ X 1A
In this situation, the regressors that form X 2 are almost
perfectly related to at least some of the regressors in X 1 .
A real example: Georgia Power Data
Using Software
We illustrate the basic multiple linear regression analysis with
the Coking Heat data.
Analysis of Variance
Source DF SS MS F P
Regression 3 9081.6 3027.2 13.48 0.000
Residual Error 21 4714.4 224.5
Total 24 13796.0
Chapter 4: Residual Analysis
Introduction to Residuals
Underlying Assumptions for OLS
Recall, ei = yi − ŷi .
E[e ] = 0.
var[e ] = σ 2 [I − X (X ′ X )−1 X ′ ].
Information about the Random Errors
The variance of the i th residual is σ²(1 − hii ), where hii is the i th diagonal element of the hat matrix.
Leverage points are distant from the other data in terms of the
regressors.
In many analyses, the outliers are the most informative data points!
Surfactant Data
Useful Plots
Transformations
Common Transformations
Primary purposes:
I correct problems with the constant variance assumption
I correct problems with the normality assumption.
Common Transformations
y (λ) = (y^λ − 1) / (λ (y*)^(λ−1)) ,  λ ̸= 0
y (λ) = y* ln(y ) ,  λ = 0
where y* is the geometric mean of the observations.
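A short Python sketch (added here as an illustration) of this scaled power family, taking y* to be the geometric mean of the responses:

import numpy as np

def boxcox_scaled(y, lam):
    """(y^lam - 1) / (lam * gm^(lam - 1)) for lam != 0, and gm * ln(y) for lam = 0,
    where gm is the geometric mean y* of the observations."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))     # geometric mean y*
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

# Example: the log member of the family (lambda = 0) applied to the vapor pressure data
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])
print(boxcox_scaled(vp, 0.0))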
Analysis of Variance
Source DF SS MS F P
Regression 1 28.238 28.238 916.71 0.000
Residual Error 9 0.277 0.031
Total 10 28.515
Influence Diagnostics
“Too Much in Love with Your Model”
Big Problem!
Major caution:
I Natural tendency is to say an observation is an outlier if the absolute value of its standardized residual is > 2 (α = .05) or > 3 (α = .0027).
I Such a cut-off ignores the multiple comparison problem!
I If we have n observations, we actually are performing (n choose 2) = n(n − 1)/2 comparisons.
I To preserve an overall α of .05 (which is large!), we need to use 0.05/(n choose 2) for each observation.
Leverage
H = X (X ′X )−1X ′
Leverage
trace [H ] = trace [X (X′X )−1 X′ ] = trace [(X′X )(X′X )−1 ] = trace [Ip ] = p
Cook’s D
The basic idea: How much does a specific data point impact the vector of predicted values?
Mathematically,
Di = (ŷ(i) − ŷ )′ (ŷ(i) − ŷ ) / (p MSres )
where ŷ(i) is the vector of predicted values computed with the i th observation deleted.
DFFITS
The basic idea: how much does the prediction of the i th response
change when we drop the i th data point?
Computational formula:
DFFITSi = (ŷi − ŷ(i) ) / √(MSres,(i) hii )
The generally recommended cut-off value is
|DFFITSi | > 2 √(p/n)
DFBETAS
Computationally:
DFBETASi,j = (β̂j − β̂j(i) ) / √(MSres,(i) Cjj )
where Cjj is the j th diagonal element of (X ′ X )−1 .
Note:
I COVRATIOi > 1 indicates that the i th data value improves
precision
I COVRATIOi < 1 indicates that the i th data value hurts
precision
Suggested cut-off values are
I COVRATIOi > 1 + 3p/n
I COVRATIOi < 1 − 3p/n
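As an added sketch (not part of the notes), statsmodels computes all of these diagnostics from a fitted OLS model; the attribute names below belong to statsmodels' OLSInfluence class.

import numpy as np
import statsmodels.api as sm

def influence_report(X, y):
    """Leverage, Cook's D, DFFITS, DFBETAS, and COVRATIO with the usual cut-offs."""
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    infl = fit.get_influence()
    n = int(fit.nobs)
    p = int(fit.df_model) + 1                    # number of parameters
    leverage = infl.hat_matrix_diag              # h_ii
    cooks_d = infl.cooks_distance[0]             # Cook's D_i
    dffits = infl.dffits[0]                      # DFFITS_i
    dfbetas = infl.dfbetas                       # DFBETAS_{i,j}
    covratio = infl.cov_ratio                    # COVRATIO_i
    flags = {
        "leverage": leverage > 2 * p / n,
        "dffits": np.abs(dffits) > 2 * np.sqrt(p / n),
        "covratio": (covratio > 1 + 3 * p / n) | (covratio < 1 - 3 * p / n),
    }
    return leverage, cooks_d, dffits, dfbetas, covratio, flags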
Jet Turbine Example: Table B.13 in the Appendix
Analysis of Variance
Source DF SS MS F P
Regression 6 9908846 1651474 2350.13 0.000
Residual Error 33 23190 703
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1–x6.
Influence Analysis
We note that p = 7:
I Cut-off for leverage: 2p/n = .35
I Cut-off for DFFITS: 2√(p/n) = .8367
Flagged observations:
I No. 11 hii = 0.4987, DFFITS = 1.6546
I No. 20 hii = 0.7409, Cook’s D = 3.0072, DFFITS = −5.1257
I No. 28 DFFITS = 0.8925
Chapter 9: Collinearity
Overview of Collinearity
What is Multicollinearity?
Most people define the condition number as the ratio of the largest
eigenvalue of X ′ X to the smallest.
Note:
I a VIF of 2 implies that Rj2 = 0.5, which implies a pairwise
r ≈ 0.7
I a VIF of 5 implies that Rj2 = 0.8, which implies a pairwise
r ≈ 0.9
I a VIF of 10 implies that Rj2 = 0.9, which implies a pairwise
r ≈ 0.95
Key point: VIFs > 5 represent serious relationships among the
regressors!
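A brief sketch (added here) of the VIF computation with statsmodels, using VIFj = 1/(1 − Rj²); the helper name is illustrative.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    """VIF for each regressor column of X; an intercept column is added internally."""
    Xc = sm.add_constant(np.asarray(X, dtype=float))
    # column 0 is the intercept, so report VIFs for columns 1..k
    return {j: variance_inflation_factor(Xc, j) for j in range(1, Xc.shape[1])}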
Jet Turbine Data
Analysis of Variance
Source DF SS MS F P
Regression 6 9908846 1651474 2350.13 0.000
Residual Error 33 23190 703
Total 39 9932036
Designed Experiments
Analysis of Variance
Source DF SS MS F P
Regression 4 233.218 58.304 21.77 0.000
Residual Error 11 29.467 2.679
Total 15 262.684
Correcting Multicollinearity
Model Selection
Basic Issues
Suppose we fit the model
y = X1 β1 + ϵ
when the true model is
y = X 1 β1 + X 2 β2 + ϵ .
We note that
β̂ = β̂ 1 = (X ′1 X 1 )−1 X ′1 y .
Impact of Underspecifying the Model
Consequences:
(1) E(β̂)
(2) var(β̂)
var(β̂1 ) = σ 2 (X ′1 X 1 )−1
(3) SSres
E(SSres ) = (n − p1 )σ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Impact of Underspecifying the Model
E(MSres ) = σ² + [1/(n − p1 )] β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Consequences:
I Our estimate of σ² is biased upward.
I The denominator of all of our tests is larger than it should be.
I We have reduced power!
Impact of Underspecifying the Model
(5) E [ŷi ]
Recall that the bias of an estimator θ̂ is E(θ̂) − θ, where θ is the parameter being estimated.
Impact of Underspecifying the Model
Σᵢ₌₁ⁿ bias²(ŷi ) = β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Note: This squared bias is very similar to the bias for MSres !
Impact of Overspecifying the Model
Consider
Σᵢ₌₁ⁿ var(ŷi ).
We observe that
var(ŷ ) = σ² X1 (X′1 X1 )−1 X′1 .
Thus, the sum of the predicted variances is the trace of σ² X1 (X′1 X1 )−1 X′1 , which is p1 σ².
Recall,
R² = SSreg / SStotal = 1 − SSres / SStotal .
We define R²adj by
R²adj = 1 − MSres / MStotal .
Consider
Σᵢ₌₁ⁿ MSE(ŷi ) = Σᵢ₌₁ⁿ var(ŷi ) + Σᵢ₌₁ⁿ bias²(ŷi ).
Σᵢ₌₁ⁿ MSE(ŷi ) = pσ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2 .
We note that
E(MSres ) = σ² + [1/(n − p1 )] β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2 .
As a result, an unbiased estimate of the sum of the squared biases is
(n − p1 ) [MSres − σ²].
Mallow’s Cp
Cp = (1/σ²) Σᵢ₌₁ⁿ MSE(ŷi )
Cp = p + (n − p) [MSres − MSres,full ] / MSres,full
where MSres,full is the mean squared residual from the full model.
Basically, we use the MSres from the full model as our estimate of
σ2.
Mallow’s Cp
Cp = 2p + SSres /σ² − n.
PRESS
The PRESS statistic is the prediction error sum of squares.
It is defined to be
PRESS = Σᵢ₌₁ⁿ [ei / (1 − hii )]².
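The following sketch (added, not from the notes) computes Mallows' Cp and PRESS for a candidate model, using the full-model MSres as the estimate of σ²; both design matrices are assumed to include the intercept column.

import numpy as np

def hat_and_residuals(X, y):
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H), y - H @ y

def cp_and_press(X_candidate, X_full, y):
    """Mallows' Cp (sigma^2 estimated by the full-model MSres) and PRESS."""
    n = len(y)
    _, e_full = hat_and_residuals(X_full, y)
    sigma2_hat = np.sum(e_full ** 2) / (n - X_full.shape[1])   # MSres of the full model
    h, e = hat_and_residuals(X_candidate, y)
    p = X_candidate.shape[1]
    Cp = np.sum(e ** 2) / sigma2_hat - n + 2 * p
    PRESS = np.sum((e / (1.0 - h)) ** 2)
    return Cp, PRESS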
In general,
I R²adj likes bigger models.
I PRESS likes smaller models.
I Cp falls in between.
Other Popular Measures
AIC = −2ln(L) + 2p
I BIC: Bayesian Information Criterion.
AIC = 2p + n ln(2π) + n ln(σ²) + SSres /σ²
= 2p + SSres /σ² − n + n + n ln(2π) + n ln(σ²)
= 2p + SSres /σ² − n + C
= Cp + C ,
where C = n + n ln(2π) + n ln(σ²) does not depend on which regressors are in the model.
Consequence: The best models in terms of AIC are the same best
models for Mallow’s Cp .
It estimates each one-regressor model and picks the best according to an entrance requirement based on the model's F statistic.
Note: the technique estimates k models at this first step.
The method then estimates each two-regressor model that contains the term already included.
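A simple forward-selection sketch (added illustration): at each step it adds the regressor with the largest partial F statistic, subject to an F-to-enter threshold. The threshold value and function name are illustrative, not from the notes.

import numpy as np

def forward_selection(X, y, f_to_enter=4.0):
    """Greedy forward selection driven by each candidate's partial F statistic."""
    n, k = X.shape
    selected, remaining = [], list(range(k))
    SSres_old = np.sum((y - y.mean()) ** 2)          # intercept-only model
    while remaining:
        best_j, best_F, best_SS = None, 0.0, None
        for j in remaining:
            Xc = np.column_stack([np.ones(n), X[:, selected + [j]]])
            H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
            SSres_new = y @ (np.eye(n) - H) @ y
            df_res = n - Xc.shape[1]
            F = (SSres_old - SSres_new) / (SSres_new / df_res)   # partial F for adding j
            if F > best_F:
                best_j, best_F, best_SS = j, F, SSres_new
        if best_j is None or best_F < f_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        SSres_old = best_SS
    return selected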
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
(an X marks the regressors included in each model)
1 99.0 99.0 101.8 50.486 X
1 99.0 99.0 104.7 51.010 X
1 95.3 95.1 633.0 111.22 X
2 99.6 99.5 26.7 33.941 X X
2 99.3 99.3 62.9 42.899 X X
2 99.3 99.3 65.2 43.400 X X
3 99.7 99.7 7.6 27.791 X X X
3 99.7 99.7 7.9 27.911 X X X
3 99.6 99.6 24.6 33.235 X X X
Example: Jet Turbine Data
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
4 99.7 99.7 5.6 26.725 X X X X
4 99.7 99.7 6.9 27.205 X X X X
4 99.7 99.7 8.8 27.923 X X X X
5 99.8 99.7 5.6 26.362 X X X X X
5 99.8 99.7 7.1 26.916 X X X X X
5 99.7 99.7 8.8 27.585 X X X X X
6 99.8 99.7 7.0 26.509 X X X X X X
Example: Jet Turbine Data
Models with Cp ≤ 7:
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
4 99.7 99.7 5.6 26.725 X X X X
4 99.7 99.7 6.9 27.205 X X X X
5 99.8 99.7 5.6 26.362 X X X X X
6 99.8 99.7 7.0 26.509 X X X X X X
Example: Jet Turbine Data
Analysis of Variance
Source DF SS MS F P
Regression 4 9907039 2476760 3467.86 0.000
Residual Error 35 24997 714
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x3, x5, x6.
Problem Children
We note that p = 5
I Cut-off for leverage: .25
I Cut-off for DFFITS: .7071
Problem Children:
I 10: hii = .3065
I 11: hii = .2516
I 20: DFFITS = -1.0518
I 28: DFFITS = .7620
Analysis of Model B
Source DF SS MS F P
Regression 5 9908408 1981682 2851.63 0.000
Residual Error 34 23628 695
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x3, x4, x5, x6.
Problem Children
We note that p = 6
I Cut-off for leverage: .30
I Cut-off for DFFITS: .7746
Problem Children:
I 10: hii = .3065
I 11: hii = .4903, DFFITS = 1.5109
I 20: DFFITS = -1.1401
I 21: DFFITS = -0.9404
I 28: DFFITS = .8128
Analysis of Model C
Analysis of Variance
Source DF SS MS F P
Regression 4 9906133 2476533 3346.22 0.000
Residual Error 35 25903 740
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x4, x5, x6.
Problem Children
We note that p = 5
I Cut-off for leverage: .25
I Cut-off for DFFITS: .7071
Problem Children
Problem Children:
I 6: hii = 0.2525
I 9: hii = 0.2648
I 10: hii = 0.2870
I 11: hii = 0.3892
I 20: DFFITS = -0.9746
I 21: DFFITS = -0.8429
I DFFITS = -0.7659
“Conclusions”
Forward: Model B
Backward: Model A
Stepwise: Model A
Final Comments on Model Selection
Let
pi = yi / mi ,
where yi is the number of successes observed in the mi trials at the i th setting.
Let πi be the expected value of pi .
var[pi ] = πi (1 − πi ) / mi .
Rather than modeling pi directly, we use the logistic function to
transform the pi ’s.
What the Logistic Function Does
Estimation of the Logistic Regression Model
For convenience, define
ηi = ln[πi / (1 − πi )]
and
η̂i = ln[π̂i / (1 − π̂i )]
At least initially, π̂i = pi .
To a first-order approximation:
var(η̂i ) = 1 / [mi πi (1 − πi )].
Estimation of the Logistic Regression Model
V̂ = diag{1 / [mi π̂i (1 − π̂i )]},
or
L = Σᵢ₌₁ⁿ yi ln(πi ) + Σᵢ₌₁ⁿ mi ln(1 − πi ) − Σᵢ₌₁ⁿ yi ln(1 − πi ).
Estimation of the Logistic Regression Model
Recall
πi = exp(x′i β) / [1 + exp(x′i β)].
The maximum likelihood estimate of β solves
X′(y − µ̂) = 0,
where
y = (y1 , y2 , . . . , yn )′ , and
µ̂ = (m1 π̂1 , m2 π̂2 , . . . , mn π̂n )′ .
Estimation of the Logistic Regression Model
or
X ′ V −1 (η − η̂) = 0,
where V is the diagonal matrix formed from the variances of the
η̂i ’s.
Using this procedure, if the model assumptions are correct, one can
show that asymptotically
E (β̂n ) = β  and  var(β̂n ) = (X′V −1 X )−1 .
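A compact numpy sketch (added here, not part of the notes) that solves the score equations X′(y − µ̂) = 0 by Fisher scoring (iteratively reweighted least squares); the data are the pneumoconiosis counts from the SAS example later in these notes.

import numpy as np

def logistic_irls(X, y, m, tol=1e-8, max_iter=50):
    """Binomial logistic regression: y successes out of m trials per row of X."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # inverse logit
        mu = m * pi                              # fitted counts
        W = m * pi * (1.0 - pi)                  # binomial variances (weights)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
        beta = beta + step                       # Fisher scoring update
        if np.max(np.abs(step)) < tol:
            break
    cov = np.linalg.inv(X.T @ (W[:, None] * X))  # asymptotic var(beta-hat)
    return beta, cov

years = np.array([5.8, 15.0, 21.5, 27.5, 33.5, 39.5, 46.0, 51.5])
cases = np.array([0, 1, 3, 8, 9, 8, 10, 5], dtype=float)
n = np.array([98, 54, 43, 48, 51, 38, 28, 11], dtype=float)
X = np.column_stack([np.ones_like(years), years])
beta_hat, cov = logistic_irls(X, cases, n)
print(beta_hat, np.sqrt(np.diag(cov)))           # estimates and standard errors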
Interpretation of the Parameters
The log odds at x is
ln[π̂(x) / (1 − π̂(x))] = β̂0 + β̂1 x.
The log odds at x + 1 is
ln[π̂(x + 1) / (1 − π̂(x + 1))] = β̂0 + β̂1 (x + 1).
Interpretation of the Parameters
Ôr = Oddsx+1 / Oddsx = exp(β̂1 ).
The odds ratio is the estimated multiplicative change in the odds of a success for a one-unit increase in x.
Analog to the Global F Test
For logistic regression, we use log-likelihood theory to construct
the test statistic.
Let Lred be the likelihood function for the reduced model evaluated
at the MLE for β 1 .
G² = 2 ln(Lfull / Lred )
Analog to the Global F Test
The test statistic, G 2 , asymptotically follows a χ2 distribution with
k degrees of freedom under the null hypothesis.
H0 : β1 = β2 = . . . = βk = 0
Ha : at least one βj ̸= 0
E(ηi ) = β0 .
Test for the Individual Coefficients
(X ′ V −1 X )−1 = −G .
Let cjj be the j th diagonal element of (X ′ V −1 X )−1 .
Test for the Individual Coefficients
The test statistic for
H0 : βj = 0
is
z = β̂j / √cjj .
Asymptotically, this statistic follows a standard normal distribution
under the null hypothesis.
Response Information
Odds Ratio    95% CI Lower    95% CI Upper
0.98          0.97            0.99
Log-Likelihood = -10.182
Goodness-of-Fit Tests
Method Chi-Square DF P
Pearson 19.5867 18 0.357
Deviance 17.5911 18 0.483
Hosmer-Lemeshow 7.0039 8 0.536
The Pneumoconiosis Data
options ls=70;
data coal;
input years cases n;
cards;
5.8 0 98
15.0 1 54
21.5 3 43
27.5 8 48
33.5 9 51
39.5 8 38
46.0 10 28
51.5 5 11
;
proc logistic descending;
model cases/n = years;
output out=coal2 resdev=r p=p;
run;
The Pneumoconiosis Data
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
1. Overview of Modeling
2. Review of Simple Linear Regression
3. Multiple Linear Regression
4. Residual Analysis
5. Transformations
6. Influence Diagnostics
7. Collinearity
8. Model Selection
9. Logistic Regression
Take-Home Messages