228371 Statistical Modelling for Engineers & Technologists
Lecture Notes Week 3 (Lecture Set 3: More Regression)

Lecture 1: Polynomial Regression
- Enter your powers in order, and test for significance in order (2, 3, 4, ...).
- In practical terms, "new" variables equal to x², x³, etc. are calculated and added to the model.
Note:
plot(m2, 1) # same output
- Compare using residual error, adjusted R², and the standard residual analysis graphics.
- We look more at adjusted R² later ☺
summary(m1)
summary(m2)
These values can be compared as they are, OR we can use a statistical test.
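The relevant numbers can be pulled straight from the summary objects; a minimal sketch, assuming m1 (linear) and m2 (quadratic) are the fitted models from earlier:
summary(m1)$adj.r.squared   # adjusted R-squared of the linear model
summary(m2)$adj.r.squared   # adjusted R-squared of the quadratic model
summary(m1)$sigma           # residual standard error of the linear model
summary(m2)$sigma           # residual standard error of the quadratic model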
The linear model is a subset of the quadratic model (if the coefficient of the quadratic term is zero, the quadratic model reduces to the linear model).
→ This means we can compare the models with an F-test, as shown below.
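Because the models are nested, this F-test can be run directly with anova(); a minimal sketch, assuming m1 and m2 from above:
anova(m1, m2)   # F-test: does the quadratic term significantly improve the fit?
# Small p-value → prefer the quadratic model; otherwise keep the simpler linear one.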
Extrapolation
- Polynomials are, in general, very poor for extrapolation (prediction outside data range)
- They rapidly become large (positive or negative)
- They may be good for prediction within the range of the x-values
- Prediction intervals get larger outside the range of x-values, BUT they still assume that the model
is correct
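To see this, request a prediction interval outside the data range; a minimal sketch, assuming the Indy500 fit m3 from the example that follows:
predict(m3, newdata = data.frame(Year = 2030), interval = "prediction")
# The interval widens beyond the observed Years, but it still trusts the polynomial shape there.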
Example: Indianapolis.csv
head(Indy500)
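The slides use a model object m3 without showing its definition; a plausible reconstruction, assuming a cubic polynomial of Speed on Year:
Indy500 <- read.csv("Indianapolis.csv")                          # assumed file read
m3 <- lm(Speed ~ Year + I(Year^2) + I(Year^3), data = Indy500)   # assumed degree-3 fit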
plot(Indy500$Year, Indy500$Speed, xlab = "Year", ylab = "Speed")
lines(Indy500$Year, fitted(m3), col = 3, lwd = 3)   # lwd (not lw) sets the line width
summary(m3)
F-test
There is an overall test to see if the model is useful in predicting change in the response.
This is called the F-test; it uses Mean Squares (MS), which are computed from the Sums of Squares.
F-test statistic: $F = \dfrac{MS_{Reg}}{MS_{Res}} = \dfrac{SS_{Reg}/p}{SS_{Res}/(n-p-1)}$
The null hypothesis is that the model is NOT significant (all slope coefficients are zero); under H0 the test statistic follows an F distribution with p and n-p-1 degrees of freedom.
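R reports this overall test at the bottom of summary(); the pieces can also be extracted directly. A minimal sketch, assuming a fitted model m1:
fstat <- summary(m1)$fstatistic                       # value, numdf (= p), dendf (= n-p-1)
pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)  # p-value of the overall F-test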
Lecture 2: One-way ANOVA
Remember – the real world is messy, we are not just interested in fitting a model, we are interested in
the stuff around it – the variance.
ANOVA provides information about variability within our regression models, and allows us to test for significance.
For the polynomials in the last lecture, we were using ANOVA to test H0: βᵢ = 0 against Ha: βᵢ ≠ 0 using the F-statistic.
Factors
Factors are discrete-valued variables. The values a factor takes are called levels (for example, the fabric type in the next example is a factor with four levels).
- Factors are common in experimental design because you can set the levels.
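In R a factor and its levels look like this; a minimal sketch with made-up values:
f <- factor(c("low", "high", "low", "medium"))   # a discrete-valued variable
levels(f)                                        # the values it can take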
One-way ANOVA
Example: fabrics.txt
Q → are any of the fabrics more or less flammable than the others?
fabrics <- read.table("fabrics.txt", header = TRUE)
Data are "stacked" → need unstack()
attach(fabrics)
fabrics_table <- unstack(as.data.frame(fabrics), burntime ~ fabric)
names(fabrics_table) <- paste0("Fabric", 1:4)
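The ANOVA itself is fitted with lm(); a minimal sketch, assuming the object name oneway that is used later with TukeyHSD():
oneway <- lm(burntime ~ factor(fabric), data = fabrics)   # one-way ANOVA as a linear model
anova(oneway)   # F-test: do mean burn times differ between the four fabrics?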
Decompose Variation
$$\sum_{i}\sum_{j}\left(y_{ij} - \bar{y}\right)^2 = \sum_{i}\sum_{j}\left(\bar{y}_i - \bar{y}\right)^2 + \sum_{i}\sum_{j}\left(y_{ij} - \bar{y}_i\right)^2$$
(Total SS = Factor SS + Residual SS)
Let k = number of treatments (factor levels) and n = total number of observations (all groups).
Then: Total df = n - 1, Factor df = k - 1, and Residual df = n - k.
Mean Sq = Sum Sq / df
F = Factor MS / Residual MS
R fits this as a regression by creating indicator variables for each level after the first level, e.g., for the 2nd fabric:
$$I_2 = \begin{cases} 1 & \text{if the factor is at level 2} \\ 0 & \text{if the factor is at another level} \end{cases}$$
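The indicator columns can be inspected via the design matrix; a minimal sketch, using the oneway fit assumed earlier:
head(model.matrix(oneway))   # intercept column plus indicators for fabrics 2, 3, and 4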
TukeyHSD() gives joint 95% confidence intervals for all pairwise comparisons simultaneously. Note: it only works on aov() objects.
e.g.,
MC <- TukeyHSD(aov(oneway))
MC
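The intervals can also be plotted, since plot() has a method for TukeyHSD objects:
plot(MC)   # pairwise differences with joint 95% confidence intervals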
ANOVA assumptions
[similar to regression assumptions: errors are independent, normally distributed, and have equal variance across groups]
Two-way ANOVA
Decomposition of variation and ANOVA table similar to one-way, but now there are two factor SumSqs
giving two F tests and p-values.
Two-way ANOVA – example concrete.txt
concrete <- read.table("concrete.txt", header = TRUE)
attach(concrete)
Two-way ANOVA in R
twoway <- lm(strength ~ factor(aggregate) + factor(cement))
anova(twoway)
The p-value for aggregate is low → Aggregate IS significant.
The p-value for cement is high → Cement is NOT significant.
Choose whichever cement you want (cost, availability, etc.), but choose Aggregate 3 to maximise concrete strength.
Lecture 3: Two-way ANOVA
Suppose we hold one factor constant and vary the levels of the other factor:
If the response changes are the same (similar) regardless of which level of the first factor was chosen, then there is no (little) interaction.
However, if the response changes are not similar, then the two factors are interacting with one another.
Need to investigate whether interaction is present, and if so, it needs to be put into the fitted model.
Example: mixingtime.txt
A large paddle is used to mix milk that has been collected and stored in large vats.
The optimal mixing time depends on the diameter of the paddle, and its rotation speed (3 levels for
each).
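An interaction plot gives a quick visual check before fitting anything; a minimal sketch, assuming mixingtime.txt has columns speed, diameter, and time:
mixing <- read.table("mixingtime.txt", header = TRUE)          # assumed column names below
interaction.plot(mixing$speed, mixing$diameter, mixing$time)   # parallel traces → little interaction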
Model: fit = overall effect + row effect + column effect + interaction effect
Two factor SumSqs plus an interaction SumSq giving three F tests and p-values.
If interaction term is not significant then we can refit ANOVA model without interaction term.
Note:
df(rowfactor) = r - 1, where r is the number of levels of the row factor (there are three speeds)
df(colfactor) = c - 1, where c is the number of levels of the column factor (there are three diameters)
df(interaction) = (r-1)(c-1)
df(residuals) = (n-1) - (r-1) - (c-1) - (r-1)(c-1)
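In R the interaction model is fitted with * between the factors; a minimal sketch, continuing the assumed mixing data frame:
twoway_int <- lm(time ~ factor(speed) * factor(diameter), data = mixing)
anova(twoway_int)   # three F-tests: speed, diameter, and the speed:diameter interaction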
ANCOVA
The analysis allows different regression lines (different slopes and/or intercepts) for different levels of the factor, testing for significance using the linear model.
ANCOVA Model:
$$Y = \beta_0 + \beta_1 I_2 + \beta_2 I_3 + (\beta_3 + \beta_4 I_2 + \beta_5 I_3)X$$
Highway: $I_2 = I_3 = 0 \;\Rightarrow\; Y = \beta_0 + \beta_3 X$
Mall: $I_2 = 1, I_3 = 0 \;\Rightarrow\; Y = (\beta_0 + \beta_1) + (\beta_3 + \beta_4)X$
Street: $I_2 = 0, I_3 = 1 \;\Rightarrow\; Y = (\beta_0 + \beta_2) + (\beta_3 + \beta_5)X$
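This model corresponds to an interaction between the covariate X and the factor; a minimal sketch, assuming an attached data frame with sales, homes, and a three-level factor named location (Highway/Mall/Street):
anc <- lm(sales ~ factor(location) * homes)   # assumed factor name; yields the coefficients above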
Interaction model with X and a factor
model.matrix(anc)
Extract I2, and then regress Y on X and I2:
I2 <- model.matrix(anc)[ , 3]
final <- lm(sales ~ homes + I2)
summary(final)
summary(final)$coefficients
Full model: $Y = \beta_0 + \beta_1 I_2 + \beta_2 I_3 + (\beta_3 + \beta_4 I_2 + \beta_5 I_3)X$
Final (reduced) model: $Y = \beta_0 + \beta_1 I_2 + \beta_3 X$, with estimates $b_0 = -3.298$, $b_1 = 23.84$, $b_3 = 0.906$.