Water Study Model Selection
A. Introduction.
Observations on nine explanatory (independent) variables were obtained in the Water
Level Study: sex, gravity, totphys, bryant, vander, triangle, trailer, tree, and comphys.
Two new variables were created from these: (i) moving and (ii) total. The dependent
variable was y (pass or fail on the water level task). The study had two goals:
1. Which subset of the variables is statistically significantly related to passing or
failing the water level task?
2. Can the difference between females and males on the water level task be explained
by the independent variables?
The following sections address these two questions. We begin by using two subset
selection procedures in SAS Proc Logistic to choose variables related to the response:
1. Backward elimination
2. Stepwise selection
B. SAS Program:
options ls=72;
data water;
input obs y sex gravity ...
      comphys moving total;
cards;
  1   0   1   4  ...   3   10   0   6   1   1   1   25
  2   1   2   5  ...   6   12   0   6   4   4   1   37
...
166   0   1   4  ...   5   12   2   6   3   3   1   35
;
Proc Logistic; Model ...
      trailer tree comphys ...
Proc Logistic; Model ...
      trailer tree comphys ...
run;
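Step 1. In the full model, identify the variable with the smallest G2 for testing that
its parameter is 0, adjusted for all other variables in the model. In this case, the
variable is triangle. Calculate the change in deviance with the variable in and out of
the model: G2 = 102.932 - 102.897 = 0.035.
The criterion for significance is G2 > χ2(1, .05) = 3.84. Since 0.035 < 3.84, we conclude
that triangle is not a significant variable, adjusted for the other variables in the model.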
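The test for triangle can be checked by hand. The following is a minimal Python sketch, not part of the SAS analysis: it recomputes G2 from the two likelihood ratio chi-squares quoted above, derives the 3.84 cutoff as the square of the 97.5th standard normal percentile, and computes the 1-df chi-square p-value with the standard library. The function name `chi2_1df_pvalue` is an illustrative choice.

```python
import math

def chi2_1df_pvalue(g2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    # For 1 df, P(X > g2) = P(|Z| > sqrt(g2)) = erfc(sqrt(g2 / 2)).
    return math.erfc(math.sqrt(g2 / 2.0))

# Likelihood ratio chi-squares with triangle in and out of the model
lr_with_triangle = 102.932
lr_without_triangle = 102.897

g2 = lr_with_triangle - lr_without_triangle

# chi-square(1, .05) is the square of the 97.5th normal percentile
critical_value = 1.959964 ** 2

print(f"G2 = {g2:.3f}")
print(f"critical value = {critical_value:.2f}")
print(f"significant at .05: {g2 > critical_value}")
print(f"p-value = {chi2_1df_pvalue(g2):.3f}")
```

The p-value from this likelihood ratio test need not equal the Wald p-value that SAS prints for the same variable, since the two tests use different statistics.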
Step 2. Omit the variable identified as not significant in Step 1 and re-run the
logistic regression model with triangle deleted. As in Step 1, identify the variable with
the smallest G2 for testing that its parameter is 0, adjusted for all other variables in
the model. In this case, the variable is totphys. Calculate the change in deviance with
the variable in and out of the model: G2 = 102.897 - 101.660 = 1.237.
The criterion for significance is again G2 > χ2(1, .05) = 3.84. Since 1.237 < 3.84, we
conclude that totphys is not a significant variable, adjusted for the other variables in
the model.
Continue until no variable, adjusted for the others in the model, meets the criterion for
deletion.
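The whole sequence of deletions can be traced with the same arithmetic. The Python sketch below is illustrative only: it takes the likelihood ratio chi-squares printed in the SAS output of this handout for the successive backward elimination fits, differences consecutive values to get G2 for each removed variable, and compares each with 3.84. The helper name `deletion_tests` is hypothetical.

```python
# Likelihood ratio chi-squares from the "Testing Global Null Hypothesis"
# tables, in the order the variables were removed.
lr_chi_square = [
    ("full model", 102.9319),
    ("triangle removed", 102.8970),
    ("totphys removed", 101.6598),
    ("comphys removed", 100.4940),
    ("sex removed", 97.6504),
]
CRITICAL_VALUE = 3.84  # chi-square(1, .05)

def deletion_tests(fits, critical_value=CRITICAL_VALUE):
    """Return (label, G2, significant) for each consecutive pair of fits."""
    results = []
    for (_, lr_before), (label, lr_after) in zip(fits, fits[1:]):
        g2 = lr_before - lr_after  # change in deviance for the removed variable
        results.append((label, round(g2, 3), g2 > critical_value))
    return results

for label, g2, significant in deletion_tests(lr_chi_square):
    print(f"{label}: G2 = {g2:.3f}, significant = {significant}")
```

Each removed variable has G2 below 3.84, which agrees with the four deletions listed in the SAS summary table.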
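Response Profile
Ordered               Total
  Value      y    Frequency
      1      0           96
      2      1           70

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      145.104
SC             231.148      179.336
-2 Log L       226.036      123.104

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      102.9319    10        <.0001
Score                  75.9882    10        <.0001
Wald                   39.8693    10        <.0001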
Step 1. Effect triangle is removed:

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      143.139
SC             231.148      174.259
-2 Log L       226.036      123.139

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      102.8970     9        <.0001
Score                  75.3948     9        <.0001
Wald                   39.8243     9        <.0001

DF    Pr > ChiSq
 1        0.8527
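Step 2. Effect totphys is removed:

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      142.376
SC             231.148      170.384
-2 Log L       226.036      124.376

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      101.6598     8        <.0001
Score                  74.7438     8        <.0001
Wald                   39.6571     8        <.0001

DF    Pr > ChiSq
 2        0.6267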
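Step 3. Effect comphys is removed:

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      141.542
SC             231.148      166.438
-2 Log L       226.036      125.542

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      100.4940     7        <.0001
Score                  73.5038     7        <.0001
Wald                   37.9008     7        <.0001

DF    Pr > ChiSq
 3        0.5612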
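Step 4. Effect sex is removed:

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      142.385
SC             231.148      164.169
-2 Log L       226.036      128.385

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       97.6504     6        <.0001
Score                  70.8081     6        <.0001
Wald                   37.6798     6        <.0001

DF    Pr > ChiSq
 4        0.3050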
NOTE: No (additional) effects met the 0.05 significance level for entry
into the model.
Summary of Stepwise Selection
        Effect             Number
Step    Removed     DF         In    Chi-Square    Pr > ChiSq
   1    triangle     1          9        0.0345        0.8527
   2    totphys      1          8        0.8349        0.3609
   3    comphys      1          7        1.1476        0.2840
   4    sex          1          6        2.8044        0.0940
Analysis of Maximum Likelihood Estimates
                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1      7.8773     2.2166       12.6295        0.0004
gravity       1     -0.5583     0.1783        9.8099        0.0017
bryant        1     -0.3691     0.1737        4.5149        0.0336
vander        1     -0.2044     0.0712        8.2388        0.0041
trailer       1     -0.7125     0.2961        5.7890        0.0161
tree          1     -0.4932     0.1642        9.0239        0.0027
moving        1      2.4148     0.9041        7.1340        0.0076

Odds Ratio Estimates
              Point         95% Wald
Effect     Estimate     Confidence Limits
gravity       0.572      0.403      0.811
bryant        0.691      0.492      0.972
vander        0.815      0.709      0.937
trailer       0.490      0.274      0.876
tree          0.611      0.443      0.842
moving       11.187      1.902     65.808
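DF    Pr > ChiSq
10        <.0001

Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      182.993
SC             231.148      189.217
-2 Log L       226.036      178.993

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       47.0426     1        <.0001
Score                  41.5656     1        <.0001
Wald                   32.9603     1        <.0001

DF    Pr > ChiSq
 9        <.0001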
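Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      165.484
SC             231.148      174.820
-2 Log L       226.036      159.484

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       66.5513     2        <.0001
Score                  57.2050     2        <.0001
Wald                   41.6365     2        <.0001

DF    Pr > ChiSq
 8        0.0004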
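Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      156.637
SC             231.148      169.085
-2 Log L       226.036      148.637

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       77.3983     3        <.0001
Score                  64.0331     3        <.0001
Wald                   42.4435     3        <.0001

DF    Pr > ChiSq
 7        0.0039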
Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      150.284
SC             231.148      165.844
-2 Log L       226.036      140.284

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       85.7518     4        <.0001
Score                  66.7439     4        <.0001
Wald                   40.2427     4        <.0001

DF    Pr > ChiSq
 6        0.0177
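Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      143.792
SC             231.148      162.463
-2 Log L       226.036      131.792

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       94.2442     5        <.0001
Score                  69.0294     5        <.0001
Wald                   39.2090     5        <.0001

DF    Pr > ChiSq
 5        0.1543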
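Model Fit Statistics
                          Intercept
             Intercept          and
Criterion         Only   Covariates
AIC            228.036      141.725
SC             231.148      163.509
-2 Log L       226.036      127.725

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       98.3107     6        <.0001
Score                  71.9151     6        <.0001
Wald                   40.1958     6        <.0001

DF    Pr > ChiSq
 4        0.4023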
NOTE: No (additional) effects met the 0.05 significance level for entry
into the model.

Summary of Stepwise Selection
Step    DF    Chi-Square    Pr > ChiSq
   1     1       41.5656        <.0001
   2     1       19.1098        <.0001
   3     1       10.6764        0.0011
   4     1        6.6594        0.0099
   5     1        7.5976        0.0058
   6     1        3.9691        0.0463
Analysis of Maximum Likelihood Estimates
                              Standard          Wald
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1      7.7840     2.1401       13.2297        0.0003
totphys       1     -0.3914     0.1209       10.4818        0.0012
bryant        1     -0.3412     0.1739        3.8479        0.0498
vander        1     -0.2059     0.0719        8.2126        0.0042
trailer       1     -0.6802     0.2846        5.7122        0.0168
tree          1     -0.4534     0.1640        7.6388        0.0057
moving        1      2.3768     0.8904        7.1250        0.0076

Odds Ratio Estimates
              Point         95% Wald
Effect     Estimate     Confidence Limits
totphys       0.676      0.533      0.857
bryant        0.711      0.506      1.000
vander        0.814      0.707      0.937
trailer       0.507      0.290      0.885
tree          0.635      0.461      0.876
moving       10.771      1.881     61.684
D. Conclusions.
1. Variables Selected and Estimates
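Odds Ratio Estimates

Backward elimination
              Point         95% Wald
Effect     Estimate     Confidence Limits
gravity       0.572      0.403      0.811
bryant        0.691      0.492      0.972
vander        0.815      0.709      0.937
trailer       0.490      0.274      0.876
tree          0.611      0.443      0.842
moving       11.187      1.902     65.808

Stepwise Selection
              Point         95% Wald
Effect     Estimate     Confidence Limits
totphys       0.676      0.533      0.857
bryant        0.711      0.506      1.000
vander        0.814      0.707      0.937
trailer       0.507      0.290      0.885
tree          0.635      0.461      0.876
moving       10.771      1.881     61.684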
The two procedures each selected 6 variables, with 5 in common; backward elimination
chose gravity while stepwise selection chose totphys. The odds ratio and confidence
interval estimates are quite close for the five variables the two models share.
2. Neither model includes sex. We conclude that, adjusted for these 6 independent
variables, sex is not significantly related to passing or failing the water level task.
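The odds ratios compared in conclusion 1 follow directly from the parameter estimates. As a hedged illustration, the Python sketch below recomputes each point estimate as exp(estimate) and each 95% Wald limit as exp(estimate ± 1.96 × standard error), using the estimates and standard errors printed in the two "Analysis of Maximum Likelihood Estimates" tables. The function name `odds_ratio_with_ci` is hypothetical, and because the printed estimates are rounded to four decimals, the results agree with the SAS odds ratios only to within rounding.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def odds_ratio_with_ci(estimate, std_error, z=Z_975):
    """Return (odds ratio, lower limit, upper limit) for one coefficient."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

# (estimate, standard error) as printed in the SAS output
backward = {
    "gravity": (-0.5583, 0.1783),
    "bryant": (-0.3691, 0.1737),
    "vander": (-0.2044, 0.0712),
    "trailer": (-0.7125, 0.2961),
    "tree": (-0.4932, 0.1642),
    "moving": (2.4148, 0.9041),
}
stepwise = {
    "totphys": (-0.3914, 0.1209),
    "bryant": (-0.3412, 0.1739),
    "vander": (-0.2059, 0.0719),
    "trailer": (-0.6802, 0.2846),
    "tree": (-0.4534, 0.1640),
    "moving": (2.3768, 0.8904),
}

for name, fit in (("backward", backward), ("stepwise", stepwise)):
    for effect, (b, se) in fit.items():
        point, lower, upper = odds_ratio_with_ci(b, se)
        print(f"{name:9s} {effect:8s} {point:7.3f} {lower:7.3f} {upper:7.3f}")
```

Printing both models side by side makes it easy to see how little the estimates for bryant, vander, trailer, tree, and moving change between the two selected models.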