
Principles of Model Building

The document discusses principles of model building. It provides examples of using sample data to fit models and comparing different polynomial models to earthquake data in Japan from 1964 to 2011. It also covers topics like coding quantitative variables, models with qualitative independent variables, and interaction effects between variables.

Uploaded by

Wingtung Ho
Principles of Model Building

Dr. William Lau


Tel: 3943 8572
[email protected]
Why is Model Building so Important?
E.g. 1: Population Data VS Sample Data

Population:

Sample:
Why is Model Building so Important?
E.g. 1: How to fit with sample data?

Model 1:

Model 2:
Why is Model Building so Important?
E.g. 2: All Earthquake Data in Japan (1964–2011)
[Figure: frequency vs. magnitude, with fitted Model 1 and Model 2]
Lesson Outline
1 Introduction: Why Model Building Is Important
2 Models with a Single Quantitative Independent Variable
3 First-Order Models with Two or More Quantitative Independent Variables
4 Second-Order Models with Two or More Quantitative Independent Variables
5 Coding Quantitative Independent Variables (Optional)
6 Models with One Qualitative Independent Variable
7 Models with Two Qualitative Independent Variables
8 Models with Three or More Qualitative Independent Variables
9 Models with Both Quantitative and Qualitative Independent Variables
10 External Model Validation
Models with a Single Quantitative Independent Variable
Modeling exam score, y, as a function of study time, x
Two second-order polynomial models
Example of the use of a quadratic model
Two third-order polynomial models
Scatterplot for power load data
Excel Output for 3rd order model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.979483347
R Square            0.959387626
Adjusted R Square   0.953585859
Standard Error      5.501031179
Observations        25

ANOVA
             df   SS            MS            F             Significance F
Regression   3    15012.16218   5004.054058   165.3612626   9.13676E-15
Residual     21   635.4882247   30.26134403
Total        24   15647.6504

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   331.2526803    477.1114657      0.694287822    0.495111584   -660.9549293   1323.46029
TEMP        -6.391912447   16.79082602      -0.380678856   0.707265164   -41.3103467    28.5265218
TEMP^2      0.037753973    0.19451185       0.194096006    0.847965995   -0.366755563   0.44226351
TEMP^3      8.43217E-05    0.000742596      0.113549971    0.91067311    -0.001459991   0.001628634
MINITAB output for 3rd order model
Excel Output for 2nd order model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.979470618
R Square            0.959362691
Adjusted R Square   0.95566839
Standard Error      5.376203469
Observations        25

ANOVA
             df   SS            MS            F             Significance F
Regression   2    15011.772     7505.885999   259.687216    4.99085E-16
Residual     22   635.8784022   28.90356374
Total        24   15647.6504

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   385.0480932    55.17243578      6.978993909    5.2661E-07    270.6274646    499.4687219
TEMP        -8.292526804   1.29904502       -6.383556132   2.00975E-06   -10.98658128   -5.598472322
TEMP^2      0.059823368    0.007548554      7.925142711    6.8979E-08    0.044168625    0.075478111
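
As a minimal sketch of how the second-order fit above could be used, the snippet below evaluates the fitted curve from the reported coefficients and locates the temperature at which the curve has zero slope. The function names and the input temperature of 80 are illustrative assumptions and do not come from the slides.

```python
# Coefficients copied from the Excel output for the 2nd order model.
B0 = 385.0480932    # Intercept
B1 = -8.292526804   # TEMP
B2 = 0.059823368    # TEMP^2


def predict_y(temp):
    """Fitted value: b0 + b1*TEMP + b2*TEMP^2."""
    return B0 + B1 * temp + B2 * temp ** 2


def turning_point():
    """TEMP at which the fitted parabola has zero slope: -b1 / (2*b2)."""
    return -B1 / (2 * B2)


print(round(predict_y(80.0), 2))   # 80 is a hypothetical input temperature
print(round(turning_point(), 2))
```

Because the TEMP^2 coefficient is positive, the turning point is a minimum of the fitted curve.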
MINITAB output for 2nd order model
Test Your Understanding
Do you think the second-order model or the third-order model is a better choice in the previous example?

A. Either model is good, as both have a significant global F-test and a high value of R^2.
B. The third-order model is marginally better, as its R^2 is slightly higher than that of the second-order model.
C. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.91067.
D. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.20346.
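
One way to check the options is to compute the partial F statistic directly from the two Excel outputs above. The sketch below uses the standard nested-model formula; the function and variable names are illustrative assumptions.

```python
# Values copied from the two Excel outputs above.
SSE_REDUCED = 635.8784022    # Residual SS, 2nd order model
SSE_COMPLETE = 635.4882247   # Residual SS, 3rd order model
MSE_COMPLETE = 30.26134403   # Residual MS, 3rd order model
T_TEMP3 = 0.113549971        # t Stat for TEMP^3 in the 3rd order model


def partial_f(sse_reduced, sse_complete, extra_terms, mse_complete):
    """Partial (nested-model) F statistic for the extra terms."""
    return ((sse_reduced - sse_complete) / extra_terms) / mse_complete


f_stat = partial_f(SSE_REDUCED, SSE_COMPLETE, 1, MSE_COMPLETE)
print(f_stat, T_TEMP3 ** 2)   # the two numbers agree up to rounding
```

When the complete model adds a single term, the partial F statistic equals the square of that term's t statistic, so the partial F-test has the same p-value as the t-test for TEMP^3 in the third-order output (0.91067311).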
First-Order Models with Two or More Quantitative Independent Variables
Response surface for first-order model with two quantitative independent variables
Contour lines of E(y) for x2 = 1, 2, 3 (first-order model)
Second-Order Models with Two or More Quantitative Independent Variables
Response surface for an interaction model (second-order)
Contour lines of E(y) for x2 = 1, 2, 3 (first-order model plus interaction)
Definition of Interaction

Two variables are said to interact if the change in E(y) for a 1-unit change in one variable (holding the other variable fixed) depends on the value of the other variable.
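
The definition can be illustrated with a small numerical sketch. The model form E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 is the usual interaction model; the coefficient values below are made up purely for illustration.

```python
def mean_response(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=0.5):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 (coefficients are made up)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def change_per_unit_x1(x2, **coefs):
    """Change in E(y) when x1 goes from 0 to 1 with x2 held fixed."""
    return mean_response(1.0, x2, **coefs) - mean_response(0.0, x2, **coefs)


for x2 in (1, 2, 3):
    with_interaction = change_per_unit_x1(x2)
    without_interaction = change_per_unit_x1(x2, b3=0.0)
    print(x2, with_interaction, without_interaction)
```

With b3 set to 0 the change is the same at every x2 (parallel contour lines); with a nonzero b3 the change equals b1 + b3*x2, so it shifts with x2.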
Graphs of three second-order surfaces
Contours of E(y) for x2 = -1, 0, 1 (complete second-order model)
Definition of Complete Second-Order Model

The complete second-order model includes the constant, all linear (first-order) terms, all two-variable interaction terms, and all quadratic terms.
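
To make the definition concrete, the following sketch lists the terms of a complete second-order model for any set of quantitative variables. The function name and the text notation for terms (`x1*x2`, `x1^2`) are assumptions chosen for illustration.

```python
from itertools import combinations


def complete_second_order_terms(names):
    """List the terms of a complete second-order model, constant first."""
    terms = ["1"]                                              # constant
    terms += list(names)                                       # linear terms
    terms += [f"{a}*{b}" for a, b in combinations(names, 2)]   # interactions
    terms += [f"{a}^2" for a in names]                         # quadratic terms
    return terms


print(complete_second_order_terms(["x1", "x2"]))
```

For two variables this gives six terms (the constant plus five others), which agrees with the 5 regression degrees of freedom in the Excel output for the complete second-order model in TEMP and PRESSURE.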
Excel output for complete second-order model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.996496918
R Square            0.993006107
Adjusted R Square   0.991340894
Standard Error      1.678696006
Observations        27

ANOVA
             df   SS            MS            F             Significance F
Regression   5    8402.264537   1680.452907   596.3239222   7.02346E-22
Residual     21   59.17842593   2.818020282
Total        26   8461.442963

                Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept       -5127.899074   110.2960149      -46.49215185   1.15297E-22   -5357.272194   -4898.525954
TEMP            31.09638889    1.344413217      23.1300827     2.01046E-16   28.30052855    33.89224923
PRESSURE        139.7472222    3.140054116      44.50471777    2.86045E-22   133.2171222    146.2773222
TEMP^2          -0.133388889   0.006853248      -19.46360234   6.45546E-15   -0.147640998   -0.11913678
PRESSURE^2      -1.144222222   0.027412991      -41.74014512   1.0841E-21    -1.201230658   -1.087213787
TEMP*PRESSURE   -0.1455        0.009691956      -15.01244964   1.05883E-12   -0.165655526   -0.125344474
SAS output for complete second-order model
Graph of complete second-order model
Coding Quantitative Independent Variables (Optional)
MINITAB printout for the quadratic model
MINITAB descriptive statistics for temperature, x
MINITAB printout for the quadratic model with coded temperature
Models with One Qualitative Independent Variable

We first arbitrarily select one level as the base level, then set up one dummy (0/1) variable for each of the remaining levels.
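
A minimal sketch of this coding scheme follows. The level names Kentucky and Texas appear in the Excel output for the dummy variable model; "Base" is a placeholder for the unnamed base level, and the function name is an assumption.

```python
def dummy_code(values, base_level):
    """Return (dummy_names, rows): one 0/1 column per non-base level."""
    levels = sorted(set(values) - {base_level})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# "Base" is a placeholder name for the base level.
states = ["Base", "Kentucky", "Texas", "Kentucky"]
names, rows = dummy_code(states, base_level="Base")
print(names)
for state, row in zip(states, rows):
    print(state, row)
```

An observation at the base level has every dummy equal to 0, so a variable with three levels needs only two dummy variables.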
Excel Output for the dummy variable model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.45281092
R Square            0.205037729
Adjusted R Square   0.146151635
Standard Error      168.9478223
Observations        30

ANOVA
             df   SS            MS            F             Significance F
Regression   2    198772.4667   99386.23333   3.48193801    0.045152103
Residual     27   770670.9      28543.36667
Total        29   969443.3667

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   279.6          53.42599243      5.233407697    1.62749E-05   169.9789184    389.2210816
Kentucky    80.3           75.55576307      1.062791199    0.297290401   -74.72762037   235.3276204
Texas       198.2          75.55576307      2.623228089    0.01414919    43.17237963    353.2276204
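
In a model with one qualitative variable coded this way, the intercept estimates the mean response at the base level, and each dummy coefficient estimates the difference between that level's mean and the base-level mean. The sketch below applies this to the coefficients reported above; the function name is an illustrative assumption.

```python
# Coefficients copied from the Excel output for the dummy variable model.
INTERCEPT = 279.6
KENTUCKY = 80.3
TEXAS = 198.2


def estimated_mean(is_kentucky, is_texas):
    """Estimated E(y) for one level; both dummies 0 gives the base level."""
    return INTERCEPT + KENTUCKY * is_kentucky + TEXAS * is_texas


print(round(estimated_mean(0, 0), 1))  # base level
print(round(estimated_mean(1, 0), 1))  # Kentucky
print(round(estimated_mean(0, 1), 1))  # Texas
```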
SPSS Output for the dummy variable model
Models with Two Qualitative Independent Variables
Main effects model: Mean response as a function of F and B when F and B affect E(y) independently

Remark: μij stands for E(y) at level i of F and level j of B (FiBj).

Interaction model: Mean response as a function of F and B when F and B interact to affect E(y)

Number of interaction terms = (number of main effect terms for one variable) × (number of main effect terms for the other variable)
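
The counting rule can be sketched by pairing every dummy variable of one factor with every dummy variable of the other. The layout below, with X1 and X2 coding one factor and X3 coding the other, is an assumption inferred from the term names in the Excel output for the interaction model.

```python
from itertools import product


def interaction_terms(dummies_a, dummies_b):
    """Pair every dummy of one factor with every dummy of the other."""
    return [f"{a}{b}" for a, b in product(dummies_a, dummies_b)]


# Assumed layout: X1, X2 code one factor (3 levels); X3 codes the other (2 levels).
terms = interaction_terms(["X1", "X2"], ["X3"])
print(terms, len(terms))
```

Two main effect terms times one main effect term gives two interaction terms, X1X3 and X2X3.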
Excel output for the main effects model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.601691084
R Square            0.362032161
Adjusted R Square   0.122794221
Standard Error      13.74958677
Observations        12

ANOVA
             df   SS            MS            F             Significance F
Regression   3    858.2575758   286.0858586   1.513272356   0.283753136
Residual     8    1512.409091   189.0511364
Total        11   2370.666667

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   64.45454545    7.180487506      8.976346717    1.88931E-05   47.89631157    81.01277934
X1          6.704545455    9.940934811      0.674438127    0.519043082   -16.21929133   29.62838224
X2          -2.295454545   9.940934811      -0.230909325   0.823181119   -25.21929133   20.62838224
X3          -15.81818182   8.291312789      -1.907801843   0.09284519    -34.9379834    3.30161976
SAS output for the main effects model
Excel output for interaction model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.985625027
R Square            0.971456693
Adjusted R Square   0.947670604
Standard Error      3.35824028
Observations        12

ANOVA
             df   SS            MS            F             Significance F
Regression   5    2303          460.6         40.84137931   0.000147737
Residual     6    67.66666667   11.27777778
Total        11   2370.666667

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   68.66666667    1.93888093       35.41561816    3.37827E-08   63.92239594    73.41093739
X1          11.33333333    3.065639925      3.696889919    0.01012564    3.83198267     18.834684
X2          -21.66666667   3.065639925      -7.067583669   0.000401953   -29.16801733   -14.165316
X3          -32.66666667   3.877761859      -8.424103349   0.000152595   -42.15520812   -23.1781252
X1X3        -0.833333333   5.129796762      -0.162449581   0.876284759   -13.38549382   11.71882716
X2X3        47.16666667    5.129796762      9.194646271    9.32953E-05   34.61450618    59.71882716
SAS output for interaction model
MINITAB graph of sample means for engine performance
SAS printout for nested model: Partial F-test of interaction
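
The same partial F-test can be reproduced by hand from the Residual rows of the two Excel outputs above, treating the main effects model as the reduced model and the interaction model as the complete model. This is a sketch with assumed function and variable names, not the SAS procedure itself.

```python
# Values copied from the Residual rows of the two Excel outputs above.
SSE_MAIN, DF_MAIN = 1512.409091, 8    # main effects model (reduced)
SSE_INT, DF_INT = 67.66666667, 6      # interaction model (complete)


def nested_f(sse_reduced, df_reduced, sse_complete, df_complete):
    """F statistic for H0: all terms added to the reduced model are zero."""
    numerator = (sse_reduced - sse_complete) / (df_reduced - df_complete)
    denominator = sse_complete / df_complete
    return numerator / denominator


f_stat = nested_f(SSE_MAIN, DF_MAIN, SSE_INT, DF_INT)
print(round(f_stat, 2))
```

The statistic is compared against an F distribution with 2 numerator and 6 denominator degrees of freedom, since the interaction model adds the two terms X1X3 and X2X3.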
Models with Three or More Qualitative Independent Variables
Models with Both Quantitative and Qualitative Independent Variables
Model for E(y) as a function of engine speed
Model for E(y) as a function of fuel type and engine speed (no interaction)
Model for E(y) as a function of fuel type and engine speed (interaction)
A graphical portrayal of three factors - two qualitative and one quantitative - on DDT level
DDT curves for stages 1 and 2
External Model Validation

• Examining the predicted values
• Examining the estimated model parameters
• Collecting new data for prediction
• Data-splitting / cross-validation
• Jackknifing
External Model Validation: Collecting new data for prediction
R^2 = 1 - SSE / SSyy

The new data set should contain enough observations to assess the model's prediction performance reliably. Montgomery, Peck, and Vining (2006) recommend at least 15-20 new observations.
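
A common way to summarize performance on the new data is a prediction R^2, which applies the 1 - SSE/SSyy form to the new observations and the fitted model's predictions for them. The sketch below assumes that definition; the data are made up, and the sample is far shorter than the recommended 15-20 observations to keep the example small.

```python
def r2_prediction(y_new, y_pred):
    """1 - sum((y - yhat)^2) / sum((y - ybar)^2), computed on new data."""
    y_bar = sum(y_new) / len(y_new)
    sse = sum((y - p) ** 2 for y, p in zip(y_new, y_pred))
    ss_yy = sum((y - y_bar) ** 2 for y in y_new)
    return 1 - sse / ss_yy


# Made-up new observations and the fitted model's predictions for them.
y_new = [10.0, 12.0, 15.0, 19.0, 24.0]
y_pred = [11.0, 12.0, 14.0, 20.0, 23.0]
print(r2_prediction(y_new, y_pred))
```

A prediction R^2 that drops well below the R^2 from the original fit suggests the model does not predict new data as well as it fits the data used to build it.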

If using data-splitting / cross-validation, Snee (1977) recommends that the entire sample consist of at least 2k + 25 observations, where k is the number of β parameters in the model.
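
As a rough sketch, the 2k + 25 guideline can be checked before splitting the sample into an estimation set and a validation set. The function names, the fixed random seed, and the 50/50 split are assumptions for illustration and are not taken from the slides.

```python
import random


def snee_minimum(k):
    """Smallest total sample size under the 2k + 25 guideline."""
    return 2 * k + 25


def split_sample(rows, k, estimation_fraction=0.5, seed=0):
    """Randomly split rows into estimation and validation sets.

    Raises ValueError when the sample is below the 2k + 25 guideline.
    The 50/50 default split is an assumption, not taken from the slides.
    """
    rows = list(rows)
    if len(rows) < snee_minimum(k):
        raise ValueError("sample too small for data-splitting")
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * estimation_fraction)
    return rows[:cut], rows[cut:]


estimation, validation = split_sample(range(40), k=6)
print(snee_minimum(6), len(estimation), len(validation))
```

For example, a model with k = 6 β parameters calls for at least 37 observations in total.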
External Model Validation: Jackknifing

Leave each observation out of the data set, one at a time, and calculate:
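
A minimal sketch of the leave-one-out idea, assuming the quantity of interest is the deleted residual y_i - ŷ_(i), where ŷ_(i) is the prediction for observation i from a model fitted without it. The straight-line model, the jackknife R^2 summary, and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight-line model."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    return y_bar - slope * x_bar, slope


def deleted_residuals(xs, ys):
    """For each i, refit without observation i and return y_i - yhat_(i)."""
    residuals = []
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(xs_rest, ys_rest)
        residuals.append(ys[i] - (b0 + b1 * xs[i]))
    return residuals


def r2_jackknife(xs, ys):
    """1 - sum(deleted residual^2) / sum((y - ybar)^2)."""
    y_bar = sum(ys) / len(ys)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sum(d ** 2 for d in deleted_residuals(xs, ys)) / ss_yy


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(deleted_residuals(xs, ys))
print(r2_jackknife(xs, ys))
```

Because each prediction comes from a model that never saw the observation being predicted, the deleted residuals give a more conservative picture of predictive ability than the ordinary residuals.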
