Principles of Model Building

The document discusses principles of model building. It provides examples of using sample data to fit models and comparing different polynomial models to earthquake data in Japan from 1964 to 2011. It also covers topics like coding quantitative variables, models with qualitative independent variables, and interaction effects between variables.

Uploaded by

Wingtung Ho

Principles of Model Building

Dr. William Lau

Tel: 3943 8572
Why is Model Building so Important?
E.g. 1: Population Data vs. Sample Data
[Figure: population data and sample data]
E.g. 1: How to fit with sample data?
[Figure: Model 1 and Model 2 fitted to the sample data]
E.g. 2: All Earthquake Data in Japan (1964 – 2011)
[Figure: axes Frequency and Magnitude; panels Model 1 and Model 2]
Lesson Outline
1 Introduction: Why Model Building Is Important
2 Models with a Single Quantitative Independent Variable
3 First-Order Models with Two or More Quantitative Independent Variables
4 Second-Order Models with Two or More Quantitative Independent Variables
5 Coding Quantitative Independent Variables (Optional)
6 Models with One Qualitative Independent Variable
7 Models with Two Qualitative Independent Variables
8 Models with Three or More Qualitative Independent Variables
9 Models with Both Quantitative and Qualitative Independent Variables
10 External Model Validation
6 Models with a Single Quantitative Independent Variable
7 Modeling exam score, y, as a function of study time, x
11 Two second-order polynomial models
12 Example of the use of a quadratic model
14 Two third-order polynomial models
16 Scatterplot for power load data
17 Excel Output for 3rd order model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.979483347
R Square             0.959387626
Adjusted R Square    0.953585859
Standard Error       5.501031179
Observations         25

ANOVA
             df   SS            MS            F             Significance F
Regression   3    15012.16218   5004.054058   165.3612626   9.13676E-15
Residual     21   635.4882247   30.26134403
Total        24   15647.6504

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    331.2526803    477.1114657      0.694287822    0.495111584   -660.9549293   1323.46029
TEMP         -6.391912447   16.79082602      -0.380678856   0.707265164   -41.3103467    28.5265218
TEMP^2       0.037753973    0.19451185       0.194096006    0.847965995   -0.366755563   0.44226351
TEMP^3       8.43217E-05    0.000742596      0.113549971    0.91067311    -0.001459991   0.001628634
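
As a rough cross-check, the summary statistics in this output can be recomputed from the ANOVA sums of squares alone. The sketch below assumes the usual definitions (R Square = SS regression / SS total, and so on); the variable names are illustrative and do not come from the slides.

```python
# Values copied from the Excel ANOVA table for the 3rd-order model.
n, k = 25, 3                      # observations, number of predictors
ss_regression = 15012.16218
ss_residual = 635.4882247
ss_total = 15647.6504

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual      # global F statistic
std_error = ms_residual ** 0.5            # "Standard Error" in the output

print(f"R Square          = {r_square:.6f}")
print(f"Adjusted R Square = {adj_r_square:.6f}")
print(f"F                 = {f_stat:.4f}")
print(f"Standard Error    = {std_error:.6f}")
```

The recomputed values should agree with the Regression Statistics block to the printed precision.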
18 MINITAB output for 3rd order model
19 Excel Output for 2nd order model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.979470618
R Square             0.959362691
Adjusted R Square    0.95566839
Standard Error       5.376203469
Observations         25

ANOVA
             df   SS            MS            F            Significance F
Regression   2    15011.772     7505.885999   259.687216   4.99085E-16
Residual     22   635.8784022   28.90356374
Total        24   15647.6504

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    385.0480932    55.17243578      6.978993909    5.2661E-07    270.6274646    499.4687219
TEMP         -8.292526804   1.29904502       -6.383556132   2.00975E-06   -10.98658128   -5.598472322
TEMP^2       0.059823368    0.007548554      7.925142711    6.8979E-08    0.044168625    0.075478111
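
To see what the fitted second-order equation implies, it can be evaluated directly. This is a minimal sketch using the coefficients printed above; the temperatures passed in are hypothetical, since the observed temperature range is not shown in this text, and `predicted_load` is an illustrative name.

```python
# Coefficients copied from the Excel output for the 2nd-order model.
B0, B1, B2 = 385.0480932, -8.292526804, 0.059823368

def predicted_load(temp):
    """Estimated E(y) at a given temperature from the fitted quadratic."""
    return B0 + B1 * temp + B2 * temp ** 2

# B2 > 0, so the fitted parabola opens upward; its minimum is at -B1 / (2 * B2).
temp_at_minimum = -B1 / (2 * B2)

for temp in (70, 80, 90, 100):    # hypothetical temperatures
    print(temp, round(predicted_load(temp), 2))
print("fitted minimum at TEMP =", round(temp_at_minimum, 2))
```

Predictions are only trustworthy inside the range of temperatures actually sampled.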
20 MINITAB output for 2nd order model
21 Test Your Understanding
Do you think the second-order model or the third-order model is a better choice in the previous example?

A. Either model is good, as both have a significant global F-test and a high value of R².
B. The third-order model is marginally better, as its value of R² is slightly higher than that of the second-order model.
C. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.91067.
D. The second-order model is better, given that the p-value of the partial F-test between the two nested models is 0.20346.
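
The partial F statistic for these two nested models can be sketched from the Excel outputs alone, assuming the standard nested-model formula F = [(SSE_reduced - SSE_complete) / number of extra terms] / MSE_complete. Because only one term (TEMP^3) is added, this F should equal the square of that term's t statistic, so its p-value is the one printed for TEMP^3.

```python
# Values copied from the two Excel ANOVA tables above.
sse_reduced = 635.8784022     # 2nd-order model (TEMP, TEMP^2)
sse_complete = 635.4882247    # 3rd-order model (TEMP, TEMP^2, TEMP^3)
df_residual_complete = 21     # residual df of the 3rd-order model
extra_terms = 1               # only TEMP^3 is being tested

mse_complete = sse_complete / df_residual_complete
partial_f = ((sse_reduced - sse_complete) / extra_terms) / mse_complete

t_stat_temp3 = 0.113549971    # t Stat for TEMP^3 in the 3rd-order output
print(f"partial F = {partial_f:.6f}")
print(f"t^2       = {t_stat_temp3 ** 2:.6f}")
```

The two printed numbers should match up to rounding in the Excel output.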
22 First-Order Models with Two or More Quantitative Independent Variables
23 Response surface for first-order model with two quantitative independent variables
25 Contour lines of E(y) for x2 = 1,2,3 (first-order model)
26 Second-Order Models with Two or More Quantitative Independent Variables
27 Response surface for an interaction model (second-order)
28 Contour lines of E(y) for x2 = 1,2,3 (first-order model plus interaction)
30 Definition of Interaction

Two variables are said to interact if the change in E(y) for a 1-unit change in one variable (holding the other variable fixed) depends on the value of the other variable.
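
The definition can be illustrated numerically. The sketch below assumes an interaction model of the form E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 with made-up coefficients (they are not from the slides) and computes the change in E(y) for a 1-unit change in x1 at x2 = 1, 2, 3.

```python
# Hypothetical coefficients for E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2.
b0, b1, b2, b3 = 1.0, 2.0, -1.0, 0.5

def mean_response(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def change_per_unit_x1(x2, x1=0.0):
    """Change in E(y) when x1 increases by 1 with x2 held fixed."""
    return mean_response(x1 + 1, x2) - mean_response(x1, x2)

slopes = {x2: change_per_unit_x1(x2) for x2 in (1, 2, 3)}
print(slopes)   # {1: 2.5, 2: 3.0, 3: 3.5}
```

The change equals b1 + b3*x2, so it differs across values of x2. Setting b3 = 0 makes the three values identical, which is the no-interaction (first-order) case with parallel contour lines.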
32 Graphs of three second-order surfaces
33 Contours of E(y) for x2 = -1,0,1 (complete second-order model)
35 Definition of Complete Second Order Model

The complete second-order model includes the constant, all linear (first-order) terms, all two-variable interactions, and all quadratic terms.
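
As an illustration of this definition, the terms can be enumerated for any list of quantitative variables. The helper name `complete_second_order_terms` and the term labels are illustrative only.

```python
from itertools import combinations

def complete_second_order_terms(variables):
    """List the terms of a complete second-order model, in the order of the definition."""
    constant = ["1"]
    linear = list(variables)
    interactions = [f"{a}*{b}" for a, b in combinations(variables, 2)]
    quadratic = [f"{v}^2" for v in variables]
    return constant + linear + interactions + quadratic

terms = complete_second_order_terms(["TEMP", "PRESSURE"])
print(terms)
# ['1', 'TEMP', 'PRESSURE', 'TEMP*PRESSURE', 'TEMP^2', 'PRESSURE^2']
```

With k variables this gives 1 + k + k(k-1)/2 + k terms: 6 for two variables and 10 for three.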
37 Excel output for complete second-order model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.996496918
R Square             0.993006107
Adjusted R Square    0.991340894
Standard Error       1.678696006
Observations         27

ANOVA
             df   SS            MS            F             Significance F
Regression   5    8402.264537   1680.452907   596.3239222   7.02346E-22
Residual     21   59.17842593   2.818020282
Total        26   8461.442963

                Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept       -5127.899074   110.2960149      -46.49215185   1.15297E-22   -5357.272194   -4898.525954
TEMP            31.09638889    1.344413217      23.1300827     2.01046E-16   28.30052855    33.89224923
PRESSURE        139.7472222    3.140054116      44.50471777    2.86045E-22   133.2171222    146.2773222
TEMP^2          -0.133388889   0.006853248      -19.46360234   6.45546E-15   -0.147640998   -0.11913678
PRESSURE^2      -1.144222222   0.027412991      -41.74014512   1.0841E-21    -1.201230658   -1.087213787
TEMP*PRESSURE   -0.1455        0.009691956      -15.01244964   1.05883E-12   -0.165655526   -0.125344474
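
A short sketch of how the fitted complete second-order equation might be used for prediction. The coefficients are copied from the output above; the settings TEMP = 90 and PRESSURE = 55 are hypothetical inputs chosen only for illustration, and the function names are not from the slides.

```python
# Coefficients copied from the Excel output for the complete second-order model.
COEF = {
    "Intercept": -5127.899074,
    "TEMP": 31.09638889,
    "PRESSURE": 139.7472222,
    "TEMP^2": -0.133388889,
    "PRESSURE^2": -1.144222222,
    "TEMP*PRESSURE": -0.1455,
}

def predict(temp, pressure):
    """Estimated E(y) from the fitted complete second-order model."""
    return (COEF["Intercept"]
            + COEF["TEMP"] * temp
            + COEF["PRESSURE"] * pressure
            + COEF["TEMP^2"] * temp ** 2
            + COEF["PRESSURE^2"] * pressure ** 2
            + COEF["TEMP*PRESSURE"] * temp * pressure)

def temp_slope(temp, pressure):
    """Rate of change of E(y) with TEMP; it depends on PRESSURE via the interaction term."""
    return COEF["TEMP"] + 2 * COEF["TEMP^2"] * temp + COEF["TEMP*PRESSURE"] * pressure

y_hat = predict(90, 55)           # hypothetical settings
print(round(y_hat, 2))
print(round(temp_slope(90, 50), 3), round(temp_slope(90, 55), 3))
```

Because the TEMP*PRESSURE coefficient is negative, the effect of temperature on E(y) becomes smaller (more negative) as pressure increases.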
38 SAS output for complete second-order model
39 Graph of complete second-order model
40 Coding Quantitative Independent Variables (Optional)
43 MINITAB printout for the quadratic model (Optional)
44 MINITAB descriptive statistics for temperature, x (Optional)
46 MINITAB printout for the quadratic model with coded temperature (Optional)
47 Models with One Qualitative Independent Variable
48
To model a qualitative variable, first select one level (arbitrarily) as the base level, then set up a dummy (0-1) variable for each of the remaining levels.
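
The coding step can be sketched in a few lines. The level names "A", "B", "C" and the helper `dummy_code` are hypothetical; the point is only that a variable with three levels yields two dummy variables once a base level is chosen.

```python
def dummy_code(values, base_level):
    """Return (dummy column names, rows of 0/1 values), omitting the base level."""
    levels = sorted(set(values) - {base_level})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical observations of a three-level qualitative variable.
observations = ["A", "B", "C", "B", "A"]
names, rows = dummy_code(observations, base_level="A")

print(names)   # ['B', 'C']
for value, row in zip(observations, rows):
    print(value, row)
```

Observations at the base level get 0 on every dummy, so the intercept of the fitted model estimates the mean response at the base level.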
51 Excel Output for the dummy variable model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.45281092
R Square             0.205037729
Adjusted R Square    0.146151635
Standard Error       168.9478223
Observations         30

ANOVA
             df   SS            MS            F            Significance F
Regression   2    198772.4667   99386.23333   3.48193801   0.045152103
Residual     27   770670.9      28543.36667
Total        29   969443.3667

             Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept    279.6          53.42599243      5.233407697   1.62749E-05   169.9789184    389.2210816
Kentucky     80.3           75.55576307      1.062791199   0.297290401   -74.72762037   235.3276204
Texas        198.2          75.55576307      2.623228089   0.01414919    43.17237963    353.2276204
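
In a dummy variable model like this one, the intercept estimates the mean response at the base level and each dummy coefficient estimates the difference between that level's mean and the base-level mean. The sketch below recovers the three estimated means; the base level is not named in this output, so it is labelled generically.

```python
# Coefficients copied from the Excel output for the dummy variable model.
intercept = 279.6                                  # estimated mean for the base level
effects = {"Kentucky": 80.3, "Texas": 198.2}       # differences from the base level

estimated_means = {"base level": intercept}
for level, difference in effects.items():
    estimated_means[level] = intercept + difference

for level, mean in estimated_means.items():
    print(f"{level}: {mean:.1f}")
```

Judging by the printed p-values, only the Texas difference (p = 0.0141) is significant at the 0.05 level; the Kentucky difference (p = 0.2973) is not.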
52 SPSS Output for the dummy variable model
53 Models with Two Qualitative Independent Variables
55 Main effects model: Mean response as a function of F and B when F and B affect E(y) independently
56 Remark: µij stands for E(y) at level i of F and level j of B (FiBj)
57 Interaction model: Mean response as a function of F and B when F and B interact to affect E(y)
58 Number of interaction terms = (number of main effect terms for one variable) × (number of main effect terms for the other variable)
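
The counting rule can be sketched by forming every product of one dummy from each variable. The assignment of X1 and X2 to one factor and X3 to the other is an assumption inferred from the variable names in the Excel outputs of this section.

```python
from itertools import product

def interaction_terms(dummies_a, dummies_b):
    """All pairwise products of one dummy from each qualitative variable."""
    return [f"{a}{b}" for a, b in product(dummies_a, dummies_b)]

# Assumed layout: one factor with 3 levels -> 2 dummies,
# the other factor with 2 levels -> 1 dummy.
f_dummies = ["X1", "X2"]
b_dummies = ["X3"]

terms = interaction_terms(f_dummies, b_dummies)
print(terms)        # ['X1X3', 'X2X3']
print(len(terms))   # 2 = 2 x 1
```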
60 Excel output for the main effects model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.601691084
R Square             0.362032161
Adjusted R Square    0.122794221
Standard Error       13.74958677
Observations         12

ANOVA
             df   SS            MS            F             Significance F
Regression   3    858.2575758   286.0858586   1.513272356   0.283753136
Residual     8    1512.409091   189.0511364
Total        11   2370.666667

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    64.45454545    7.180487506      8.976346717    1.88931E-05   47.89631157    81.01277934
X1           6.704545455    9.940934811      0.674438127    0.519043082   -16.21929133   29.62838224
X2           -2.295454545   9.940934811      -0.230909325   0.823181119   -25.21929133   20.62838224
X3           -15.81818182   8.291312789      -1.907801843   0.09284519    -34.9379834    3.30161976
61 SAS output for the main effects model
62 Excel output for interaction model
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.985625027
R Square             0.971456693
Adjusted R Square    0.947670604
Standard Error       3.35824028
Observations         12

ANOVA
             df   SS            MS            F             Significance F
Regression   5    2303          460.6         40.84137931   0.000147737
Residual     6    67.66666667   11.27777778
Total        11   2370.666667

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept    68.66666667    1.93888093       35.41561816    3.37827E-08   63.92239594    73.41093739
X1           11.33333333    3.065639925      3.696889919    0.01012564    3.83198267     18.834684
X2           -21.66666667   3.065639925      -7.067583669   0.000401953   -29.16801733   -14.165316
X3           -32.66666667   3.877761859      -8.424103349   0.000152595   -42.15520812   -23.1781252
X1X3         -0.833333333   5.129796762      -0.162449581   0.876284759   -13.38549382   11.71882716
X2X3         47.16666667    5.129796762      9.194646271    9.32953E-05   34.61450618    59.71882716
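
With two qualitative variables and a full set of interaction terms, the fitted model reproduces one estimated mean per combination of levels. The sketch below evaluates the fitted equation at each of the six dummy-variable patterns; which real factor levels those patterns represent is not stated in this output, so cells are labelled by their (X1, X2, X3) values.

```python
# Coefficients copied from the Excel output for the interaction model.
coef = {
    "Intercept": 68.66666667,
    "X1": 11.33333333,
    "X2": -21.66666667,
    "X3": -32.66666667,
    "X1X3": -0.833333333,
    "X2X3": 47.16666667,
}

def cell_mean(x1, x2, x3):
    """Estimated E(y) for one combination of dummy values."""
    return (coef["Intercept"] + coef["X1"] * x1 + coef["X2"] * x2 + coef["X3"] * x3
            + coef["X1X3"] * x1 * x3 + coef["X2X3"] * x2 * x3)

# The six valid patterns: X1 and X2 are never both 1 (same factor).
cells = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]
means = {cell: round(cell_mean(*cell), 2) for cell in cells}
for cell, mean in means.items():
    print(cell, mean)
```

The large X2X3 coefficient shows up as a reversal: moving from X3 = 0 to X3 = 1 lowers the estimated mean when X2 = 0 but raises it when X2 = 1.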
63 SAS output for interaction model
64 MINITAB graph of sample means for engine performance
65 SAS printout for nested model: Partial F-test of interaction
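
The SAS printout itself is not reproduced in this text, but a partial F-test of the two interaction terms can be sketched from the Excel outputs for the main effects model and the interaction model, assuming the standard nested-model formula. The closed-form p-value used here is valid only because the numerator has exactly 2 degrees of freedom.

```python
# Values copied from the Excel ANOVA tables of the two nested models.
sse_reduced = 1512.409091      # main effects model (X1, X2, X3)
sse_complete = 67.66666667     # interaction model (adds X1X3, X2X3)
df_residual_complete = 6
extra_terms = 2                # X1X3 and X2X3

mse_complete = sse_complete / df_residual_complete
partial_f = ((sse_reduced - sse_complete) / extra_terms) / mse_complete

# For an F distribution with 2 numerator df and d2 denominator df,
# P(F > f) = (1 + 2*f/d2) ** (-d2/2). This shortcut does not hold for other numerator df.
p_value = (1 + 2 * partial_f / df_residual_complete) ** (-df_residual_complete / 2)

print(f"partial F = {partial_f:.2f}")
print(f"p-value   = {p_value:.2e}")
```

A p-value this small would indicate that the interaction terms contribute significantly to the model.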
66 Models with Three or More Qualitative Independent Variables
67 Models with Both Quantitative and Qualitative Independent Variables
68 Model for E(y) as a function of engine speed
69 Model for E(y) as a function of fuel type and engine speed (no interaction)
70 Model for E(y) as a function of fuel type and engine speed (interaction)
71 A graphical portrayal of three factors - two qualitative and one quantitative - on DDT level
72 DDT curves for stages 1 and 2
73 External Model Validation

✓ Examining the predicted values
✓ Examining the estimated model parameters
✓ Collecting new data for prediction
✓ Data-splitting / cross-validation
✓ Jackknifing
74 External Model Validation: Collecting new data for prediction
✓ R² = 1 - SSE / SSyy, computed on the new data

The number of observations in the new data set should be large enough to reliably assess the
model's prediction performance. Montgomery, Peck, and Vining (2006) recommend 15-20
new observations, at the minimum.
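
A minimal sketch of this check, assuming the prediction R² is computed as 1 - SSE / SSyy with both sums taken over the new observations. The data below are hypothetical, and only five points are used for brevity, far fewer than the recommended 15-20.

```python
def prediction_r_square(y_new, y_predicted):
    """1 - SSE/SSyy, where both sums use only the new observations."""
    y_bar = sum(y_new) / len(y_new)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_new, y_predicted))
    ss_yy = sum((y - y_bar) ** 2 for y in y_new)
    return 1 - sse / ss_yy

# Hypothetical new responses and the original model's predictions for them.
y_new = [10.0, 12.0, 14.0, 16.0, 18.0]
y_predicted = [11.0, 12.0, 13.0, 16.0, 18.0]

r2_prediction = prediction_r_square(y_new, y_predicted)
print(round(r2_prediction, 3))   # 0.95
```

A prediction R² that falls well below the R² of the original fit suggests the model does not generalize to new data.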

If using data-splitting / cross-validation, Snee (1977) recommended that the entire sample consist of at least 2k + 25 observations, where k is the number of β parameters in the model.
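
The size rule can be built into a splitting helper. This is only a sketch: the function name, the random equal-halves split, and the fixed seed are assumptions made for illustration, not part of the slides.

```python
import random

def split_sample(rows, k, seed=0):
    """Randomly split rows into estimation and validation halves,
    refusing if the sample is smaller than 2k + 25."""
    minimum = 2 * k + 25
    if len(rows) < minimum:
        raise ValueError(f"need at least {minimum} observations, got {len(rows)}")
    shuffled = rows[:]                       # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical sample of 40 observation indices for a model with k = 3.
estimation, validation = split_sample(list(range(40)), k=3)
print(len(estimation), len(validation))     # 20 20
```

The model is then fitted on the estimation half and its predictions are assessed on the validation half.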
75 External Model Validation: Jackknifing

Leave each observation out of the data set, one at a time, refit the model to the remaining observations, and calculate the predicted value for the omitted observation.
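
The leave-one-out procedure can be sketched for a straight-line model using only the standard library. The data are hypothetical, and summarizing the deleted predictions as 1 - (sum of squared deleted residuals) / SSyy is an assumption about the statistic intended on this slide, whose formula did not survive in this text.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

def jackknife_predictions(xs, ys):
    """Predict each y_i from a model fitted without observation i."""
    predictions = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(b0 + b1 * xs[i])
    return predictions

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

deleted = jackknife_predictions(xs, ys)
y_bar = sum(ys) / len(ys)
sse_deleted = sum((y - p) ** 2 for y, p in zip(ys, deleted))
ss_yy = sum((y - y_bar) ** 2 for y in ys)
r2_jackknife = 1 - sse_deleted / ss_yy
print(round(r2_jackknife, 4))
```

Because each prediction comes from a model that never saw the observation being predicted, this value is typically somewhat lower than the ordinary R² of the full fit.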
