
MULTIPLE LINEAR REGRESSION

NURULJANNAH BT NOR AZMI


INTRODUCTION

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable.

Dependent (outcome): numerical

Independent (predictor): 2 or more numerical variables
If the independent variables are a combination of numerical and categorical variables, or categorical only, the analysis is General Linear Regression.

Dependent (outcome): numerical

Independent (predictor): 2 or more variables, either a combination of numerical and categorical or categorical only
SIMPLE LINEAR REGRESSION - ONLY ONE INDEPENDENT VARIABLE

Independent variable (x): Mother's height
Dependent variable (y): Length of baby

MULTIPLE LINEAR REGRESSION - MORE THAN ONE INDEPENDENT VARIABLE

Independent variables (x): Mother's height, Mother's weight, Age
Dependent variable (y): Length of baby
WHEN TO APPLY MULTIPLE LINEAR REGRESSION?

You can use multiple linear regression when you want to know:

1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how mother's height, weight and age affect length of baby).
2. The value of the dependent variable at a certain value of the independent variables (e.g. the expected length of baby at certain levels of mother's height, weight and age).
Multiple Linear Regression Model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$$

$Y$ = outcome
$\beta_0$ = intercept
$\beta_1, \dots, \beta_n$ = regression coefficients for the independent variables
$X_1, \dots, X_n$ = independent variables
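As a small illustration of the model equation, the sketch below evaluates the intercept plus the sum of each regression coefficient times its independent variable for one observation. The function name `predict` and every number in it are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not estimates from the birthweight data.

```python
def predict(intercept, coefficients, values):
    """Return b0 + b1*x1 + ... + bn*xn for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per independent variable")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical numbers, chosen only to show the arithmetic.
b0 = 10.0                # intercept
b = [2.0, -0.5, 0.25]    # regression coefficients b1, b2, b3
x = [3.0, 4.0, 8.0]      # values of x1, x2, x3 for one observation

y_hat = predict(b0, b, x)   # 10 + 6 - 2 + 2
print(y_hat)                # prints 16.0
```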
STEPS IN MULTIPLE LINEAR REGRESSION

1 Descriptive statistics

2 Simple linear regression (Univariable analysis)

3 Multiple linear regression (Multivariable analysis)

4 Checking multicollinearity & interaction (Preliminary final model)

5 Checking assumptions (final model)

6 Interpretation & presentation


EXAMPLE
Open dataset: birthweight.sav

This dataset contains information on newborn babies and their parents admitted to Hospital Kuala Lumpur. A researcher is interested in determining the factors that are associated with the length of baby.
RQ: What are the factors associated with the length of baby?

List down all the variables:
Dependent variable (DV): Length of baby
Independent variables (IV): Mother's age, Mother's height, Mother's weight

Identify the types of variables:
DV: Numerical
IV: Numerical

Identify the right statistical analysis:
Multiple Linear Regression
STEP 1: DESCRIPTIVE STATISTICS

1. Data exploration and cleaning.
2. For categorical data, run Frequencies in SPSS.
3. For numerical data, run Descriptives/Explore in SPSS.
Run frequencies for categorical data

Go to: Analyze > Descriptive statistics > Frequencies

Enter categorical variables
Run descriptives for numerical data

Go to: Analyze > Descriptive statistics > Descriptives/Explore

Enter numerical variables
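For readers working outside SPSS, the same two summaries can be sketched with the Python standard library: a frequency count for a categorical variable and mean, SD, minimum and maximum for numerical ones. The records below, including the `smoker` variable, are made up for illustration and are not taken from birthweight.sav.

```python
from collections import Counter
import statistics

# Hypothetical records; variable names and values are illustrative only.
records = [
    {"smoker": "no",  "mage": 24, "mheight": 158.0},
    {"smoker": "yes", "mage": 31, "mheight": 162.5},
    {"smoker": "no",  "mage": 28, "mheight": 155.0},
    {"smoker": "no",  "mage": 35, "mheight": 160.0},
]

# Categorical variable: frequency and percentage of each category.
counts = Counter(r["smoker"] for r in records)
for category, n in counts.items():
    print(f"{category}: n={n} ({100 * n / len(records):.1f}%)")

# Numerical variables: mean, standard deviation, minimum and maximum.
summary = {}
for name in ("mage", "mheight"):
    values = [r[name] for r in records]
    summary[name] = {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }
    print(name, summary[name])
```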
STEP 2: SIMPLE LINEAR REGRESSION (UNIVARIABLE ANALYSIS)

1. Do a Simple Linear Regression analysis for each independent variable:
Mother's age
Mother's height
Mother's weight

2. At the end, choose variables with p-value < 0.25 and/or that are clinically important.
Go to: Analyze > Regression > Linear
Length of baby vs Mother's age

There is a significant relationship between mother's age and the length of baby.

Length of baby vs Mother's height

There is a significant relationship between mother's height and the length of baby.

Length of baby vs Mother's weight

There is a significant relationship between mother's pre-pregnancy weight and the length of baby.

Table 1: Associated factors of the length of baby by Simple Linear Regression
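The univariable step fits one simple linear regression per independent variable. A minimal sketch of that fit in plain Python is shown here, returning the intercept, the slope and the t statistic of the slope. The data values are hypothetical. The p-value used for the 0.25 screening comes from a t distribution with n - 2 degrees of freedom and is not computed in this sketch; a statistics package would normally report it.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, t statistic of b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


# Hypothetical data: five mothers, one outcome, two candidate predictors.
length = [48.0, 49.5, 49.0, 51.0, 51.5]          # length of baby (cm)
predictors = {
    "mheight": [150, 155, 160, 165, 170],         # mother's height (cm)
    "mage": [34, 30, 31, 25, 22],                 # mother's age (years)
}

results = {}
for name, x in predictors.items():
    results[name] = simple_linear_regression(x, length)
    b0, b1, t = results[name]
    print(f"{name}: b0={b0:.2f}, b={b1:.3f}, t={t:.2f}")
```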
STEP 3: MULTIPLE LINEAR REGRESSION (MULTIVARIABLE ANALYSIS)

1. Variable selection can be done by using the following methods:
Forward
Backward
Stepwise

2. Perform all the methods and select the model in which all variables are significant as the preliminary main effect model.
Go to: Analyze > Regression > Linear
METHOD: FORWARD (Automatically enters the IMPORTANT independent variables into the model)

Enter all the selected variables

Select Forward

Mother's age and height are significant.


METHOD: BACKWARD (Automatically removes the UNIMPORTANT independent variables from the model)

Select Backward

Mother's age and height are significant.


METHOD: STEPWISE (The procedure adds or removes independent variables one at a time based on each variable's statistical significance)

Select Stepwise

Mother's age and height are significant.


During this step, mother's age and height were found to be significant in all methods.
Run the model once again using the 'Enter' method with the chosen variables.

This will be the preliminary main effect model.
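To make the idea behind the Forward method concrete, here is a rough sketch of forward selection with NumPy. It is not SPSS's implementation. As a simplifying assumption it uses a fixed F-to-enter cutoff of 3.84 (roughly p = 0.05 for one added variable in large samples) instead of a p-value criterion, and the data are simulated with hypothetical variable names `mage`, `mheight` and `mweight`.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of an OLS fit (design already holds the intercept)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_select(X, y, names, f_to_enter=3.84):
    """Add one variable at a time while the best candidate's partial F passes the cutoff."""
    n = len(y)
    ones = np.ones((n, 1))
    selected, remaining = [], list(range(X.shape[1]))
    current_rss = rss(ones, y)
    while remaining:
        best = None
        for j in remaining:
            design = np.hstack([ones, X[:, selected + [j]]])
            new_rss = rss(design, y)
            df_resid = n - design.shape[1]
            f_stat = (current_rss - new_rss) / (new_rss / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, new_rss)
        if best[1] < f_to_enter:
            break  # no remaining variable improves the fit enough
        selected.append(best[0])
        remaining.remove(best[0])
        current_rss = best[2]
    return [names[j] for j in selected]


# Simulated (hypothetical) data: the outcome depends on age and height only.
rng = np.random.default_rng(0)
n = 42
mage = rng.normal(28, 5, n)
mheight = rng.normal(158, 6, n)
mweight = rng.normal(60, 8, n)
length = 40 - 0.3 * mage + 0.15 * mheight + rng.normal(0, 0.5, n)

X = np.column_stack([mage, mheight, mweight])
chosen = forward_select(X, length, ["mage", "mheight", "mweight"])
print(chosen)
```

Backward elimination follows the same pattern in reverse: start from the full model and drop the weakest variable while its partial F falls below a removal cutoff.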


STEP 4: CHECKING MULTICOLLINEARITY

1. Multicollinearity occurs when independent variables in a regression model are correlated.

2. This correlation is a problem because independent variables should be independent.

3. If the degree of correlation between variables is high enough, it can cause problems when you fit the model and interpret the results.

4. There is a high chance of getting inaccurate p-values and wide confidence intervals for the regression coefficients.

5. Multicollinearity can be checked by using the Variance Inflation Factor (VIF).

6. If VIF is more than 10, then there is multicollinearity amongst the independent variables.
Go to: Analyze > Regression > Linear

The values of VIF for both variables are less than 10. There is no
multicollinearity problem in this model.
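For reference, the VIF of an independent variable is 1 / (1 - R²), where R² comes from regressing that variable on the other independent variables. The sketch below computes it with NumPy. The function name `vif` and the simulated data are assumptions for illustration, and the final example deliberately adds a near-duplicate column to show a VIF above 10.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows are observations)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors


# Simulated (hypothetical) predictors.
rng = np.random.default_rng(0)
n = 42
mage = rng.normal(28, 5, n)
mheight = rng.normal(158, 6, n)

vif_ok = vif(np.column_stack([mage, mheight]))

# A near-copy of mage creates strong multicollinearity.
mage_copy = mage + rng.normal(0, 0.5, n)
vif_bad = vif(np.column_stack([mage, mheight, mage_copy]))

print([round(v, 2) for v in vif_ok])
print([round(v, 2) for v in vif_bad])
```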
STEP 4: CHECKING INTERACTION

1. An interaction effect occurs when the effect of one variable depends on the value of another variable.

2. The interaction terms need to be biologically meaningful.

3. The interaction term needs to be computed in SPSS and then added to the model as an independent variable. If you have more than one interaction term, add them to the model one by one.

4. If the interaction term is statistically significant, include the term in the model.
Go to: Transform > Compute variable

Interaction term (age*height): age_height = mage*mheight

Go to: Analyze > Regression > Linear

Add the interaction term.

The interaction age_height is not statistically significant (p=0.909).
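The same two actions, computing the product term and adding it to the model as one more independent variable, can be sketched with NumPy as follows. The data are simulated and hypothetical, with no true interaction built in. Only the t statistic of the interaction coefficient is computed; converting it to a p-value needs a t distribution with n - 4 degrees of freedom, which is left to a statistics package.

```python
import numpy as np

# Simulated (hypothetical) data with main effects only.
rng = np.random.default_rng(1)
n = 42
mage = rng.normal(28, 5, n)
mheight = rng.normal(158, 6, n)
length = 40 - 0.3 * mage + 0.15 * mheight + rng.normal(0, 0.5, n)

# Step 1: compute the interaction term as the product of the two variables.
age_height = mage * mheight

# Step 2: add it to the model alongside the main effects.
X = np.column_stack([np.ones(n), mage, mheight, age_height])
beta, *_ = np.linalg.lstsq(X, length, rcond=None)

# Standard error of each coefficient: sqrt of the diagonal of sigma^2 * (X'X)^-1.
# The pseudo-inverse is used because the product term is highly correlated
# with its components, which makes X'X poorly conditioned.
resid = length - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
P = np.linalg.pinv(X)
cov = sigma2 * (P @ P.T)
t_interaction = beta[3] / np.sqrt(cov[3, 3])

print(f"interaction coefficient = {beta[3]:.4f}, t = {t_interaction:.2f}")
```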
STEP 5: CHECKING ASSUMPTIONS

Assumptions and how to check them:

1. Independent observations: done during design stage.
2. Overall linearity: scatter plot between residuals and predicted values (XP - YR).
3. Homoscedasticity (equal variances): scatter plot between residuals and predicted values (XP - YR).
4. Linearity of each independent variable: scatter plot of residuals vs each independent variable (XI - YR).
5. Residuals should be approximately normally distributed: histogram of residuals with overlaid normal curve.
Checking assumption: Overall linearity & Homoscedasticity

Go to: Analyze > Regression > Linear


Go to: Graphs > Legacy Dialogs > Scatter/Dot
XP - YR
Double click the plot and click
Linearity:
If there is a peculiar shape of concavity or convexity, then the assumption is NOT MET.

Since there is no peculiar shape, the linearity assumption is MET.

Homoscedasticity (Equal variance):
If there is a peculiar shape of divergence or convergence (fan shape), then the assumption is NOT MET.

Since there is no peculiar shape, the homoscedasticity assumption is MET.

Example of non-linear relationship
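The two quantities plotted in the residual scatter plot, predicted values on the X axis and residuals on the Y axis, can be obtained as in this sketch. The data are simulated and hypothetical, and `fitted_and_residuals` is an illustrative helper name. Plotting is left out; the two arrays can be passed to any scatter plot routine.

```python
import numpy as np

def fitted_and_residuals(X, y):
    """Fit OLS with an intercept; return (predicted values, residuals)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ beta
    return predicted, y - predicted


# Simulated (hypothetical) data.
rng = np.random.default_rng(2)
n = 42
mage = rng.normal(28, 5, n)
mheight = rng.normal(158, 6, n)
length = 40 - 0.3 * mage + 0.15 * mheight + rng.normal(0, 0.5, n)

predicted, residuals = fitted_and_residuals(np.column_stack([mage, mheight]), length)

# X axis: predicted values, Y axis: residuals.
for p, r in list(zip(predicted, residuals))[:5]:
    print(f"predicted={p:.2f}  residual={r:+.2f}")
```

For the per-variable check, the same residuals are plotted against each independent variable in turn (for example `mage`, then `mheight`).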
Checking assumption: Linearity of each independent variable

Go to: Graphs > Legacy Dialogs > Scatter/Dot


Mother's age vs Residual

XI - YR
Double click the plot and click

There is no peculiar shape, so the linearity assumption is MET.


Mother's height vs Residual

XI - YR
Double click the plot and click

There is no peculiar shape, so the linearity assumption is MET.


Checking assumption: Normal distribution of residuals

Go to: Graphs > Legacy Dialogs > Histogram

Residuals are normally distributed. Assumption is met.
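A text-only version of the residual histogram can be sketched with NumPy, as below. The residuals here are simulated stand-ins, not the residuals of the model above, and the choice of 7 bins is arbitrary. A roughly symmetric, single-peaked shape is what the normality check looks for.

```python
import numpy as np

# Hypothetical stand-in for the residuals saved from the regression model.
rng = np.random.default_rng(3)
residuals = rng.normal(0, 0.5, 42)

# Bin the residuals and draw one row of '#' marks per bin.
counts, edges = np.histogram(residuals, bins=7)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:6.2f} to {right:6.2f} | {'#' * int(count)}")
```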
STEP 6: INTERPRETATION AND PRESENTATION

Run the final model. All the assumptions were checked and MET.

Table 2: Factors associated with the length of baby in HKL (n=42)

There is a significant negative linear relationship between mother's age and the length of baby. For every one-year increase in the mother's age, the baby's length is 0.28 cm lower (adjusted b = -0.28; 95% CI -0.37, -0.19; p<0.001).

There is a significant positive linear relationship between mother's height and the length of baby. For every 1 cm increase in the mother's height, the baby's length increases by 0.14 cm (adjusted b = 0.14; 95% CI 0.05, 0.24; p=0.004).

62.1% of the variation in the length of baby is explained by mother's age and height according to the multiple linear regression model (R² = 0.621).
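Because each adjusted coefficient is the expected change in baby length per one-unit change in a predictor with the other predictor held constant, changes of other sizes follow by multiplication. The sketch below applies this to the two reported coefficients. The 5-year and 10-cm differences are hypothetical examples, and `expected_difference` is an illustrative helper name.

```python
# Adjusted coefficients reported in the final model (cm of baby length per unit).
b_age = -0.28     # per 1 year of mother's age
b_height = 0.14   # per 1 cm of mother's height

def expected_difference(coefficient, change):
    """Expected change in the outcome for a given change in one predictor,
    holding the other predictor constant."""
    return coefficient * change

diff_age = expected_difference(b_age, 5)         # mother 5 years older
diff_height = expected_difference(b_height, 10)  # mother 10 cm taller

print(f"{diff_age:.2f} cm, {diff_height:.2f} cm")   # prints -1.40 cm, 1.40 cm
```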
MDM NURULJANNAH BT NOR AZMI

EMAIL: [email protected]
