Principles of Model Building
Population:
Sample:
Why is Model Building so Important?
E.g. 1: How to fit with sample data?
Model 1:
Model 2:
E.g. 2: All Earthquake Data in Japan (1964 – 2011)
[Figure: frequency vs. magnitude; panels: Model 1, Model 2]
Lesson Outline
1 Introduction: Why Model Building Is Important
2 Models with a Single Quantitative Independent Variable
3 First-Order Models with Two or More Quantitative Independent Variables
4 Second-Order Models with Two or More Quantitative Independent Variables
5 Coding Quantitative Independent Variables (Optional)
6 Models with One Qualitative Independent Variable
7 Models with Two Qualitative Independent Variables
8 Models with Three or More Qualitative Independent Variables
9 Models with Both Quantitative and Qualitative Independent Variables
10 External Model Validation
Models with a Single Quantitative Independent Variable
Modeling exam score, y, as a function of study time, x
Two second-order polynomial models
Example of the use of a quadratic model
Two third-order polynomial models
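For reference, the polynomial models these figure titles refer to have the following standard forms. The equations on the slides did not survive extraction, so the symbols below (beta_0 through beta_3 for the coefficients) follow the usual textbook convention and are an assumption rather than the slides' own notation.

```latex
\begin{align*}
\text{First-order:}  \quad & E(y) = \beta_0 + \beta_1 x \\
\text{Second-order:} \quad & E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 \\
\text{Third-order:}  \quad & E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3
\end{align*}
```

In the second-order model the sign of beta_2 sets the direction of curvature: the parabola opens upward when beta_2 > 0 and downward when beta_2 < 0.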
Scatterplot for power load data
Excel Output for 3rd order model

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.979483347
R Square             0.959387626
Adjusted R Square    0.953585859
Standard Error       5.501031179
Observations         25

ANOVA
              df    SS             MS             F              Significance F
Regression     3    15012.16218    5004.054058    165.3612626    9.13676E-15
Residual      21    635.4882247    30.26134403
Total         24    15647.6504

Excel Output for 2nd order model

Regression Statistics
Multiple R           0.979470618
R Square             0.959362691
Adjusted R Square    0.95566839
Standard Error       5.376203469
Observations         25

ANOVA
              df    SS             MS             F              Significance F
Regression     2    15011.772      7505.885999    259.687216     4.99085E-16
Residual      22    635.8784022    28.90356374
Total         24    15647.6504

A. Either model is good, as both have a significant global F-test and a high value of R².
B. The third-order model is marginally better, as its value of R² is slightly higher than that of the second-order model.
C. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.91067.
D. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.20346.
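
The partial (nested) F-test mentioned in options C and D can be computed directly from the residual rows of the two ANOVA tables: F = [(SSE_reduced - SSE_complete) / (extra terms)] / [SSE_complete / (error df of the complete model)]. The sketch below is one way to do this using only the Python standard library. It assumes the fit with 3 regression degrees of freedom is the third-order (complete) model and the fit with 2 is the second-order (reduced) model. The function names are illustrative, and the F tail probability is obtained from a textbook continued-fraction evaluation of the regularized incomplete beta function rather than from a statistics package.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def partial_f_test(sse_reduced, sse_complete, df_extra, df_error_complete):
    """Return (F statistic, p-value) for comparing two nested models."""
    mse_complete = sse_complete / df_error_complete
    f_stat = ((sse_reduced - sse_complete) / df_extra) / mse_complete
    return f_stat, f_upper_tail(f_stat, df_extra, df_error_complete)


# Residual SS values taken from the two ANOVA tables above:
# reduced = second-order model (error df 22), complete = third-order model (error df 21).
f_stat, p_value = partial_f_test(635.8784022, 635.4882247, df_extra=1, df_error_complete=21)
print(f"partial F = {f_stat:.5f}, p-value = {p_value:.5f}")
```

A large p-value here means the extra cubic term adds no detectable explanatory power, so the simpler nested model is preferred.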
First-Order Models with Two or More Quantitative Independent Variables
Response surface for first-order model with two quantitative independent variables
Contour lines of E(y) for x2 = 1,2,3 (first-order model)
Second-Order Models with Two or More Quantitative Independent Variables
Response surface for an interaction model (second-order)
Contour lines of E(y) for x2 = 1,2,3 (first-order model plus interaction)
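
A small numerical sketch may help contrast the two sets of contour lines. In a first-order model E(y) = b0 + b1*x1 + b2*x2, the slope in x1 is the same at every value of x2, so the lines for x2 = 1, 2, 3 are parallel. Adding the interaction term b3*x1*x2 makes the slope equal to b1 + b3*x2, so the lines are no longer parallel. The coefficient values below are hypothetical, chosen only for illustration, and do not come from the lesson.

```python
# Hypothetical coefficients, chosen only for illustration (not from the lesson).
B0, B1, B2, B3 = 1.0, 2.0, 1.0, -0.5


def first_order(x1, x2):
    """E(y) = B0 + B1*x1 + B2*x2 (no interaction)."""
    return B0 + B1 * x1 + B2 * x2


def interaction(x1, x2):
    """E(y) = B0 + B1*x1 + B2*x2 + B3*x1*x2 (first-order plus interaction)."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_in_x1(model, x2):
    """Change in E(y) per unit increase in x1, holding x2 fixed."""
    return model(1.0, x2) - model(0.0, x2)


# Slopes of the contour lines at x2 = 1, 2, 3.
first_order_slopes = [slope_in_x1(first_order, x2) for x2 in (1, 2, 3)]
interaction_slopes = [slope_in_x1(interaction, x2) for x2 in (1, 2, 3)]

print("first-order slopes:", first_order_slopes)   # identical for every x2
print("interaction slopes:", interaction_slopes)   # B1 + B3*x2, changes with x2
```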
Definition of Interaction

Regression Statistics
Multiple R           0.996496918
R Square             0.993006107
Adjusted R Square    0.991340894
Standard Error       1.678696006
Observations         27

ANOVA
              df    SS             MS             F              Significance F
Regression     5    8402.264537    1680.452907    596.3239222    7.02346E-22
Residual      21    59.17842593    2.818020282
Total         26    8461.442963

Regression Statistics
Multiple R           0.45281092
R Square             0.205037729
Adjusted R Square    0.146151635
Standard Error       168.9478223
Observations         30

ANOVA
              df    SS             MS             F              Significance F
Regression     2    198772.4667    99386.23333    3.48193801     0.045152103
Residual      27    770670.9       28543.36667
Total         29    969443.3667

Regression Statistics
Multiple R           0.601691084
R Square             0.362032161
Adjusted R Square    0.122794221
Standard Error       13.74958677
Observations         12

ANOVA
              df    SS             MS             F              Significance F
Regression     3    858.2575758    286.0858586    1.513272356    0.283753136
Residual       8    1512.409091    189.0511364
Total         11    2370.666667

ANOVA
              df    SS             MS             F              Significance F
Regression     5    2303           460.6          40.84137931    0.000147737
Residual       6    67.66666667    11.27777778
Total         11    2370.666667

The number of observations in the new data set should be large enough to reliably assess the
model's prediction performance. Montgomery, Peck, and Vining (2006) recommend 15-20
new observations, at the minimum.
Leave each observation out of the data set, one at a time, and calculate:
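
The quantity to be calculated is cut off in the source. One common way to carry out this leave-one-out (jackknife) procedure is sketched below, as an assumption about what is intended: refit the model without observation i, predict the omitted y_i, and collect the "deleted" residuals y_i - yhat_(i). Their sum of squares gives a prediction-based analogue of R². The example uses a straight-line model and made-up data purely for illustration; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for the model y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


def leave_one_out(xs, ys):
    """Deleted residuals y_i - yhat_(i), each from a fit that excludes point i."""
    deleted = []
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(xs_rest, ys_rest)
        deleted.append(ys[i] - (b0 + b1 * xs[i]))
    return deleted


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

resid = leave_one_out(x, y)
press = sum(r ** 2 for r in resid)            # sum of squared deleted residuals
y_bar = sum(y) / len(y)
ss_yy = sum((v - y_bar) ** 2 for v in y)
r2_jackknife = 1 - press / ss_yy              # assumed summary: 1 - PRESS / SS_yy

print(f"PRESS = {press:.4f}, jackknife R^2 = {r2_jackknife:.4f}")
```

A jackknife R² that falls well below the ordinary R² of the full fit suggests the model predicts new observations less well than its in-sample fit implies.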