Multiple Linear Regression
Steps for building the Multiple Linear
Regression Model
1. Check for multicollinearity (correlation among
predictors) using the Variance Inflation Factor (VIF).
a. If VIF > 10, multicollinearity exists. Remove the variable
with the highest VIF value first, then re-check the VIFs of
the remaining variables.
b. Continue this process until the VIF of every remaining variable is less than or
equal to 10.
3. Test the overall significance of the fitted model
using the F-test and the coefficient of determination (R²).
5. Model validation.
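The overall F-test and the coefficient of determination from step 3 can both be read off the object returned by `summary()`. The sketch below is only illustrative: it uses R's built-in `mtcars` data and an arbitrary two-predictor model as stand-ins (assumptions, not the patient data from the example).

```r
# Illustrative stand-in model on built-in data (not the patient data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

# s$fstatistic is a named vector: value, numdf, dendf.
f <- s$fstatistic
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

cat("F statistic:", f["value"], "on", f["numdf"], "and", f["dendf"], "df\n")
cat("Overall p-value:", p_overall, "\n")
cat("R-squared:", s$r.squared, " Adjusted R-squared:", s$adj.r.squared, "\n")
```

A small overall p-value (below 0.05) indicates that at least one predictor is useful; R-squared gives the proportion of variation in the response explained by the model.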
Example 2
The following data were collected on a simple random
sample of 20 patients with hypertension. The variables are
y = mean arterial blood pressure (mm Hg)
x1 = age (years)
x2 = weight (kg)
x3 = body surface area (sq m)
x4 = duration of hypertension (years)
x5 = basal pulse (beats/min)
x6 = measure of stress
Patient y x1 x2 x3 x4 x5 x6
All VIF values are below 10: no multicollinearity.
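The VIF check can be sketched in base R. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors. The helper names `vif_manual` and `drop_high_vif` are hypothetical, and the built-in `mtcars` data stands in for the patient data, which is not reproduced here. The `vif()` function in the `car` package is a common ready-made alternative.

```r
# VIF for every column of a data frame that holds predictors only.
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    # Regress predictor v on all remaining predictors.
    r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
    1 / (1 - r2)
  })
}

# Repeatedly drop the predictor with the highest VIF until all are <= threshold.
drop_high_vif <- function(predictors, threshold = 10) {
  repeat {
    if (ncol(predictors) < 2) break          # nothing left to compare against
    v <- vif_manual(predictors)
    if (max(v) <= threshold) break
    worst <- names(v)[which.max(v)]
    predictors <- predictors[, names(predictors) != worst, drop = FALSE]
  }
  predictors
}

# Stand-in predictors from built-in data (assumption, not the patient data).
X <- mtcars[, c("cyl", "disp", "hp", "wt", "qsec")]
print(round(vif_manual(X), 2))
kept <- drop_high_vif(X)
print(names(kept))
```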
Fit the regression model
mlr1<-lm(y~x1+x2+x3+x4+x5+x6,data = patients)
• Obtaining the summary of the regression
model
summary(mlr1)
Insignificant predictors are present (p-value > 0.05).
Note
• If the intercept is not significant, how can you
remove it in R?
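One way to answer the note above: in an R model formula, adding `- 1` (or equivalently `+ 0`) drops the intercept. The sketch below uses the built-in `mtcars` data and a single predictor purely as a stand-in; the same formula syntax would apply to the patient model.

```r
# Stand-in data and model (assumption, not the patient data).
with_intercept <- lm(mpg ~ wt, data = mtcars)

# "- 1" removes the intercept; "mpg ~ 0 + wt" is an equivalent spelling.
no_intercept <- lm(mpg ~ wt - 1, data = mtcars)

print(coef(with_intercept))  # has "(Intercept)" and "wt"
print(coef(no_intercept))    # has "wt" only
```

Note that for a model without an intercept, `summary()` computes R-squared differently, so R-squared values of models with and without an intercept are not directly comparable.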
• Define the model by removing variable x4
mlr2<-lm(y~x1+x2+x3+x5+x6,data = patients)
All VIF values are below 10: no multicollinearity.
• Fitting the model by removing variable x4 and
obtaining the summary
mlr2<-lm(y~x1+x2+x3+x5+x6,data = patients)
summary(mlr2)
Insignificant predictors are present (p-value > 0.05).
• Define the model by removing variable x5
mlr3<-lm(y~x1+x2+x3+x6,data = patients)
All VIF values are below 10: no multicollinearity.
• Fitting the model by removing variable x5 and
obtaining the summary
mlr3<-lm(y~x1+x2+x3+x6,data = patients)
summary(mlr3)
Insignificant predictors are present (p-value > 0.05).
• Define the model by removing variable x6
mlr4<-lm(y~x1+x2+x3,data = patients)
All VIF values are below 10: no multicollinearity.
• Fitting the model by removing variable x6 and
obtaining the summary
mlr4<-lm(y~x1+x2+x3,data = patients)
summary(mlr4)
All parameters are significant (p-value < 0.05).
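The refitting sequence above (mlr1 to mlr4) removes one insignificant predictor at a time and refits. A minimal sketch of that loop follows. It assumes the rule "drop the predictor with the largest p-value while that p-value exceeds 0.05", which the slides do not state explicitly; the helper name `backward_eliminate` is hypothetical, and built-in `mtcars` data stands in for the patient data.

```r
# Backward elimination by p-value (assumed rule: drop the largest p-value > alpha).
# Works as written for numeric predictors, where coefficient names equal term names.
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    coefs <- summary(fit)$coefficients
    keep <- rownames(coefs) != "(Intercept)"
    pvals <- setNames(coefs[keep, "Pr(>|t|)"], rownames(coefs)[keep])
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst <- names(pvals)[which.max(pvals)]
    # Refit without the least significant predictor.
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Stand-in full model (assumption, not the patient data).
full <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)
final <- backward_eliminate(full)
print(formula(final))
print(summary(final)$coefficients)
```

In practice the VIF check would be repeated after each removal, as the slides do.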
Residual Analysis (Assumptions)
• H0: No serial correlation (autocorrelation) in the
residuals (Durbin-Watson test)
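The Durbin-Watson statistic itself is simple to compute from the residuals: it is the sum of squared successive differences divided by the sum of squared residuals. The base-R sketch below shows only the mechanics on a stand-in model fitted to built-in `mtcars` data (an assumption; those rows are not a time series). For a p-value, the `dwtest()` function in the `lmtest` package is a commonly used option.

```r
# Stand-in model (assumption, not the patient data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- residuals(fit)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation,
# values toward 0 suggest positive and toward 4 negative autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```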
Normality Test
• H0: Residuals are normally distributed (Anderson-Darling test)
Do not reject H0 (p-value > 0.05).
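A sketch of running a normality test on the residuals. The Anderson-Darling test is not in base R; `ad.test()` from the `nortest` package provides it, so the code below calls it only if that package is installed. As a base-R substitute it also runs the Shapiro-Wilk test, which is a different test of the same null hypothesis. The model and the built-in `mtcars` data are stand-ins, not the patient data.

```r
# Stand-in model (assumption, not the patient data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Anderson-Darling test, only if the nortest package is available.
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))
}

# Shapiro-Wilk test from base R (a substitute, not the Anderson-Darling test).
sw <- shapiro.test(res)
print(sw)

# Decision rule used in the slides: do not reject H0 when p-value > 0.05.
cat("Reject normality at the 5% level?", sw$p.value <= 0.05, "\n")
```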
• H0: The variance of the residuals is constant.
Since the residuals show a random pattern, the
constant-variance assumption is satisfied.
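The constant-variance check is usually made by plotting residuals against fitted values and looking for a patternless band around zero. A minimal base-R sketch, again with a stand-in model on built-in `mtcars` data (an assumption, not the patient data):

```r
# Stand-in model (assumption, not the patient data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
fitted_vals <- fitted(fit)
res <- residuals(fit)

# Residuals vs fitted values: a random scatter around the zero line supports
# constant variance; a funnel or curve suggests the assumption is violated.
plot(fitted_vals, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```

`plot(fit, which = 1)` produces a similar built-in diagnostic plot.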
Model Validation
Using the fitted model to predict the
blood pressure
ŷ = -13.6672 + 0.7016*x1 + 0.9058*x2 + 4.6273*x3
• With weight and body surface area held fixed, a one-year increase in
age raises the predicted mean arterial blood pressure by about 0.702 mm Hg.
• Calculating blood pressure when
age (x1) = 52 years,
weight (x2) = 83.7 kg,
Body surface area (x3) = 1.4 (sq m)
x1=52
x2=83.7
x3=1.4
Blood_pressure<-(-13.6672)+(0.7016*x1)+(0.9058*x2)+(4.6273*x3)
Blood_pressure
[1] 105.1097
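Typing the coefficients by hand, as above, uses rounded values. An alternative is to let `predict()` apply the fitted model to new data. The sketch below shows that both routes agree, using a stand-in model on built-in `mtcars` data with arbitrary new values (assumptions, not the patient data); with the slides' objects the call would take the form `predict(mlr4, newdata = data.frame(x1 = 52, x2 = 83.7, x3 = 1.4))`.

```r
# Stand-in model and new observation (assumptions, not the patient data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- data.frame(wt = 3.0, hp = 120)

# Prediction via predict(): uses the full-precision coefficients.
by_predict <- unname(predict(fit, newdata = new_obs))

# Same prediction assembled by hand from the coefficients.
b <- coef(fit)
by_hand <- unname(b["(Intercept)"] + b["wt"] * new_obs$wt + b["hp"] * new_obs$hp)

print(c(by_predict = by_predict, by_hand = by_hand))

# predict() can also return a 95% prediction interval for the new observation.
print(predict(fit, newdata = new_obs, interval = "prediction", level = 0.95))
```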