Multiple Linear Regression

1. Multiple linear regression involves building a model to analyze the relationship between one continuous dependent variable and multiple independent variables.
2. The steps include checking for multicollinearity, fitting the regression model and removing insignificant variables, testing the overall model significance, and validating the model assumptions.
3. As an example, a multiple linear regression model was built to analyze factors affecting blood pressure using data on 20 patients. Variables like age, weight, and body surface area were found to significantly impact blood pressure based on statistical tests run in R.
Steps for building the Multiple Linear Regression Model
1. Check for multicollinearity (correlation among predictors) using the Variance Inflation Factor (VIF).
   a. If any VIF > 10, multicollinearity exists. Remove the variable with the highest VIF value, then check the remaining variables for multicollinearity again.
   b. Continue this process until the VIF of every remaining variable is less than or equal to 10.

2. Fit the regression model with the remaining variables and test the significance of the parameters. First remove the insignificant parameter with the highest p-value.
   a. Continue this process until all the parameters are significant.

3. Test the overall significance of the fitted model using the F-test and the coefficient of determination (R²).

4. Carry out the residual analysis.

5. Model validation.
Example 2
The following data were collected on a simple random
sample of 20 patients with hypertension. The variables are
y  = mean arterial blood pressure (mm Hg)
x1 = age (years)
x2 = weight (kg)
x3 = body surface area (sq m)
x4 = duration of hypertension (years)
x5 = basal pulse (beats/min)
x6 = measure of stress

Patient    y   x1     x2    x3    x4   x5   x6
      1  105   47   85.4  1.75   5.1   63   33
      2  115   49   94.2  2.10   3.8   70   14
      3  116   49   95.3  1.98   8.2   72   10
      4  117   50   94.7  2.01   5.8   73   99
      5  112   51   89.4  1.89   7.0   72   95
      6  121   48   99.5  2.25   9.3   71   10
      7  121   49   99.8  2.25   2.5   69   42
      8  110   47   90.9  1.90   6.2   66    8
      9  110   49   89.2  1.83   7.1   69   62
     10  114   48   92.7  2.07   5.6   64   35
     11  114   47   94.4  2.07   5.3   74   90
     12  115   49   94.1  1.98   5.6   71   21
     13  114   50   91.6  2.05  10.2   68   47
     14  106   45   87.1  1.92   5.6   67   80
     15  125   52  101.3  2.19  10.0   76   98
     16  114   46   94.5  1.98   7.4   69   95
     17  106   46   87.0  1.87   3.6   62   18
     18  113   46   94.5  1.90   4.3   70   12
     19  110   48   90.5  1.88   9.0   71   99
     20  122   56   95.7  2.09   7.0   75   99
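
The slides import the data from a file that is not shown. As a minimal sketch, the same table can be typed in directly as a data frame. The name `patients` matches the commands used later; leaving out the Patient column is an assumption made here so that `cor()` and `pairs()` see only the model variables.

```r
# Build the "patients" data frame from the table above.
# Assumption: the patient ID column is omitted, only y and x1..x6 are kept.
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6,
         5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64,
         74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35,
         90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

# Quick structural check: 20 observations of 7 numeric variables.
str(patients)
```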
• Import the “patients” data set
• View the data set
View(patients)
• Attach the data set (optional, because every model call below passes data = patients)
attach(patients)
• Obtain the correlation matrix
cor(patients)
• Obtain a scatter plot matrix
pairs(patients)
• Define the model to check for multicollinearity
mlr1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = patients)

• Calculate the VIF values
install.packages("faraway")  # needed only once
library(faraway)
vif(mlr1)

All VIF values are below 10, so there is no multicollinearity.
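
To make the VIF less of a black box, the sketch below computes it from its definition, VIF_j = 1 / (1 - R_j²), where R_j² is obtained by regressing predictor j on all the other predictors. It uses base R only; `vif_manual` is a hypothetical name introduced here, and the data frame is retyped from the table so that the block runs on its own.

```r
# Data retyped from the table in Example 2 (patient ID column omitted).
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6,
         5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64,
         74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35,
         90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

predictors <- c("x1", "x2", "x3", "x4", "x5", "x6")

# For each predictor, regress it on the remaining predictors and
# convert the resulting R-squared into a variance inflation factor.
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit_p  <- lm(reformulate(others, response = p), data = patients)
  1 / (1 - summary(fit_p)$r.squared)
})

round(vif_manual, 2)
```

A VIF of 1 means the predictor is uncorrelated with the others; the larger the value, the more of its variation is already explained by the remaining predictors.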

Fit the regression model
mlr1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = patients)

• Obtain the regression coefficients of the model
coef(mlr1)
or
mlr1

• Obtain the summary of the regression model
summary(mlr1)

Parameters with a p-value above 0.05 are insignificant. Following step 2, x4 is removed first.

Note
• If the intercept is not significant, it can be removed in R by adding -1 to the model formula:

lm(y ~ -1 + x1 + x2 + x3 + x4 + x5 + x6, data = patients)

• Define the model without variable x4
mlr2 <- lm(y ~ x1 + x2 + x3 + x5 + x6, data = patients)

• Calculate the VIF values
vif(mlr2)

All VIF values are below 10, so there is no multicollinearity.

• Fit the model without variable x4 and obtain the summary
mlr2 <- lm(y ~ x1 + x2 + x3 + x5 + x6, data = patients)
summary(mlr2)

Some parameters are still insignificant (p-value > 0.05); x5 is removed next.

• Define the model without variable x5
mlr3 <- lm(y ~ x1 + x2 + x3 + x6, data = patients)

• Calculate the VIF values
vif(mlr3)

All VIF values are below 10, so there is no multicollinearity.

• Fit the model without variable x5 and obtain the summary
mlr3 <- lm(y ~ x1 + x2 + x3 + x6, data = patients)
summary(mlr3)

One parameter is still insignificant (p-value > 0.05); x6 is removed next.

• Define the model without variable x6
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)

• Calculate the VIF values
vif(mlr4)

All VIF values are below 10, so there is no multicollinearity.

• Fit the model without variable x6 and obtain the summary
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
summary(mlr4)

All parameters are significant (p-value < 0.05).
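
The manual sequence mlr1 to mlr4 follows one rule: refit, find the predictor with the highest p-value, and drop it if that p-value exceeds 0.05. As a rough sketch (not the method used in the slides, which refit by hand), the same rule can be written as a loop. The names `kept` and `alpha` are introduced here for illustration, and the data frame is retyped from the table so that the block runs on its own.

```r
# Data retyped from the table in Example 2 (patient ID column omitted).
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6,
         5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64,
         74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35,
         90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

kept  <- c("x1", "x2", "x3", "x4", "x5", "x6")  # predictors still in the model
alpha <- 0.05                                   # significance level

repeat {
  fit <- lm(reformulate(kept, response = "y"), data = patients)
  # p-values of the slope coefficients (the intercept row is dropped)
  pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]
  # Stop when every predictor is significant or only one is left
  if (max(pvals) <= alpha || length(kept) == 1) break
  worst <- names(which.max(pvals))
  message("Removing ", worst, " (p = ", signif(max(pvals), 3), ")")
  kept <- setdiff(kept, worst)
}

kept          # predictors that remain in the final model
summary(fit)  # summary of the final model
```

Automated elimination of this kind is convenient but mechanical; checking each intermediate summary, as the slides do, keeps the analyst in control of what is dropped.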

Residual Analysis (assumptions)
• H0: There is no serial correlation (autocorrelation) in the residuals. Durbin-Watson test.

• p-value = 0.4011 > 0.05, so do not reject H0.
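
The slides do not show the command behind this test. As a minimal sketch in base R, the Durbin-Watson statistic itself can be computed from the residuals of the final model: it is the sum of squared successive differences of the residuals divided by the residual sum of squares. It lies between 0 and 4, and values near 2 suggest no first-order autocorrelation. Obtaining a p-value requires a dedicated test function, for example `dwtest()` from the lmtest package, which is not called here. Only the columns used by the final model are retyped.

```r
# Columns of the Example 2 table that the final model uses.
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)

mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
e    <- residuals(mlr4)   # residuals in the order of the observations

# Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2
dw <- sum(diff(e)^2) / sum(e^2)
dw
```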

Normality Test
• H0: The residuals are normally distributed. Anderson-Darling test.
• p-value > 0.05, so do not reject H0.
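
The command used for this test is not shown in the slides. The sketch below computes only the Anderson-Darling statistic A² in base R, using the residuals standardised by their sample mean and standard deviation. Larger values of A² indicate a stronger departure from normality. A p-value needs a packaged implementation such as `ad.test()` from the nortest package, which is not called here. The variable names are introduced for illustration.

```r
# Columns of the Example 2 table that the final model uses.
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)

mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
e    <- residuals(mlr4)
n    <- length(e)

# Standardise the residuals and sort them in increasing order
z <- sort((e - mean(e)) / sd(e))
p <- pnorm(z)          # normal CDF evaluated at the ordered values
i <- seq_len(n)

# Anderson-Darling statistic:
# A^2 = -n - (1/n) * sum( (2i - 1) * [log p_(i) + log(1 - p_(n+1-i))] )
A2 <- -n - mean((2 * i - 1) * (log(p) + log(1 - rev(p))))
A2
```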

• H0: The variance of the residuals is constant.

Since the residuals show a random pattern, the constant-variance assumption is satisfied.
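
The plot referred to here is not reproduced in the text. A common way to make this check, sketched below under the assumption that a residuals-versus-fitted plot was used, is to plot the residuals of the final model against its fitted values and look for a funnel shape or curvature. Only the columns used by the final model are retyped.

```r
# Columns of the Example 2 table that the final model uses.
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)

mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)

# Residuals against fitted values; a patternless band around zero
# is consistent with constant variance.
plot(fitted(mlr4), residuals(mlr4),
     xlab = "Fitted blood pressure (mm Hg)",
     ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)   # reference line at zero
```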

Model Validation

• The p-value of the overall F-test is less than 0.05. Therefore, the model is significant.
• R² = 0.9935, so 99.35% of the total variation in blood pressure can be explained by the fitted model.
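
Both quantities are printed at the bottom of `summary(mlr4)`. As a small sketch, they can also be pulled out of the summary object directly, which is handy when the values are needed in later calculations. The names `p_overall` and `r2` are introduced here for illustration, and only the columns used by the final model are retyped.

```r
# Columns of the Example 2 table that the final model uses.
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114,
         114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48,
         47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7,
         94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07,
         2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)

mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
s    <- summary(mlr4)

# Overall F-test: s$fstatistic holds the F value and its two degrees of freedom
f <- s$fstatistic
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Coefficient of determination
r2 <- s$r.squared

p_overall
r2
```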

Using the fitted model to predict the blood pressure
$$\hat{y} = -13.6672 + 0.7016\,x_1 + 0.9058\,x_2 + 4.6273\,x_3$$
• When weight and body surface area are held fixed, each one-year increase in age increases the predicted blood pressure by 0.702 mm Hg.
• Calculate the blood pressure when
age (x1) = 52 years,
weight (x2) = 83.7 kg,
body surface area (x3) = 1.4 sq m
x1 <- 52
x2 <- 83.7
x3 <- 1.4
Blood_pressure <- (-13.6672) + (0.7016 * x1) + (0.9058 * x2) + (4.6273 * x3)
Blood_pressure
[1] 105.1097
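
The same arithmetic can be wrapped in a small function so that several patients can be scored at once. This is only a sketch: `predict_bp()` and `b` are hypothetical names introduced here, and the coefficients are copied from the fitted equation above rather than re-estimated.

```r
# Coefficients copied from the fitted equation in the slides.
b <- c(intercept = -13.6672, x1 = 0.7016, x2 = 0.9058, x3 = 4.6273)

# Hypothetical helper: predicted mean arterial blood pressure (mm Hg)
# for given age (years), weight (kg) and body surface area (sq m).
predict_bp <- function(age, weight, bsa) {
  unname(b["intercept"] + b["x1"] * age + b["x2"] * weight + b["x3"] * bsa)
}

# The patient from the example above
predict_bp(52, 83.7, 1.4)

# Effect of one additional year of age with weight and BSA held fixed
predict_bp(53, 83.7, 1.4) - predict_bp(52, 83.7, 1.4)
```

When the fitted model object is still available, `predict(mlr4, newdata = data.frame(x1 = 52, x2 = 83.7, x3 = 1.4))` gives the same kind of prediction using the unrounded coefficients.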
