Stats Multiple Regression

Regression analysis

Uploaded by

Justin Mfaume
Copyright
© © All Rights Reserved

Multiple regression

www.kent.ac.uk/student-learning-advisory-service
Multiple regression
Introduction
• We will introduce multiple regression. In particular, we will:

• Learn when we can use multiple regression

• Learn how multiple regression extends simple linear regression

• Learn how to use multiple regression in real applications

• This presentation is intended for students in the initial stages of Statistics. No previous knowledge is required. It is advised to first read the presentation on simple linear regression.

Multiple regression
• Multiple regression is used to study the relationship between one dependent variable and two or more independent variables.

• Just as in simple regression, the dependent variable must be numerical. The independent variables can be numerical or categorical.

• However, if all the independent variables are categorical, it is best to use ANOVA.
Motivation
• Simple regression (i.e., with one independent variable, IV) allows us to study the relationship between two variables only.

• However, in reality, we do not believe that a single variable explains all the variation of the dependent variable (DV).

• For example, in the scenario of IQ and income, we do not expect IQ alone to explain income; we expect that other variables, such as years of education, also help to explain income.

• Hence, to make the model more realistic, it makes sense to include multiple independent variables in the regression.
Examples
The following are situations where we can use multiple regression:
• Testing if IQ and level of education affect income (IQ and years of education are the IVs and income is the DV).
• Testing if study time and pre-test scores affect final grades (study time and pre-test scores are the IVs and final grades are the DV).
• Testing if exercise and amount of salt in the diet affect blood pressure (exercise and salt are the IVs and blood pressure is the DV).
Displaying the data

Unlike in simple linear regression, we do not have a way to plot all the variables at the same time.

Hence, a separate scatterplot of the dependent variable is drawn against each continuous independent variable.
Multiple linear regression
Example: Testing if study time and pre-test scores affect final grades (DV is final grades, and study time and pre-test scores are the IVs).

y = b0 + b1*X1 + b2*X2 + E

[Figure: final grade plotted against study time and pre-test score, with the slopes b1 and b2 marked]
Multiple linear regression

y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + E

[Figure: final grade plotted against study time and pre-test score, with the slopes b1 and b2 marked]
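As a rough illustration of how the coefficients b0, b1, ..., bn can be estimated, here is a minimal Python sketch that fits the model by ordinary least squares, the standard estimation method for linear regression. The data values and the function name `fit_multiple_regression` are made up for this example and do not come from the presentation. The final grades are generated without the error term E, so the fit should recover the chosen coefficients (20, 3 and 0.5) up to rounding.

```python
def fit_multiple_regression(X, y):
    """Return [b0, b1, ..., bn] that minimise the sum of squared errors."""
    # Add a leading 1 to every row so that the intercept b0 is estimated too.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])

    # Build the normal equations (A'A) b = A'y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]

    # Solve them by Gaussian elimination with partial pivoting.
    aug = [ata[i] + [aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back-substitution to read off the coefficients.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(aug[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (aug[i][k] - s) / aug[i][i]
    return b


# Hypothetical data: (study time in hours, pre-test score) for six students.
X = [(2, 50), (4, 60), (6, 40), (8, 70), (10, 55), (3, 80)]
# Final grades built from known coefficients, with no error term.
y = [20 + 3 * x1 + 0.5 * x2 for x1, x2 in X]

coefficients = fit_multiple_regression(X, y)
print([round(b, 4) for b in coefficients])
```

With real data the error term is not zero, so the estimates will only approximate the underlying relationship.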
Assumptions of regression
• The errors E are normally distributed.

This can be checked by plotting a histogram of the residuals of the regression and checking that it has a bell shape.

Alternatively, you could use the Shapiro-Wilk test for normality.
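The histogram check can be sketched in a few lines of Python. This is only an illustration: the residual values and the helper name `histogram` are invented for the example, and in practice the residuals would come from your fitted regression.

```python
def histogram(values, n_bins=5):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, width, counts


# Hypothetical residuals (observed value minus predicted value).
residuals = [-2.1, -1.4, -0.9, -0.6, -0.3, -0.1, 0.0,
             0.2, 0.4, 0.5, 0.8, 1.1, 1.6, 2.2]

lo, width, counts = histogram(residuals)
for i, count in enumerate(counts):
    left = lo + i * width
    right = left + width
    # One '#' per residual in the bin gives a quick text histogram.
    print(f"{left:6.2f} to {right:6.2f} | {'#' * count}")
```

A roughly bell-shaped picture (few residuals in the outer bins, most in the middle) is consistent with the normality assumption. If SciPy is installed, `scipy.stats.shapiro` offers the Shapiro-Wilk test as a formal alternative.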

Assumptions of regression
• There are no clear outliers.

This can be checked with a scatterplot. Any outliers (circled in red in the figure) can simply be removed from the analysis.

Hypothesis testing
Regression tests, for each variable Xi, the null hypothesis:

H0: There is no effect of Xi on Y.

versus the alternative hypothesis:

H1: There is an effect of Xi on Y.

If the null hypothesis is rejected, there is evidence of a significant relationship between Xi and Y.

Hypothesis testing

We perform multiple regression in SPSS and look at the p-value of each coefficient bi.

If the p-value is less than 0.05, we reject the null hypothesis; otherwise, we do not reject it.

Hence, we look at the p-value just as in simple regression, but for each variable.
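The decision rule above can be written as a short Python sketch. The p-values and variable names below are hypothetical placeholders for what SPSS would report. They are not results from any real analysis.

```python
ALPHA = 0.05  # significance level used in this presentation


def decide(p_values, alpha=ALPHA):
    """Apply the rule 'reject H0 if p < alpha' to each variable separately."""
    return {
        name: "reject H0" if p < alpha else "do not reject H0"
        for name, p in p_values.items()
    }


# Hypothetical p-values, one per independent variable.
p_values = {"study_time": 0.003, "pre_test_score": 0.21}

decisions = decide(p_values)
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]} -> {decision}")
```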

Regression in SPSS
(from statistics.laerd.com)

Assume that you’re trying to investigate the relationship between an individual’s VO2 max and the individual’s age, weight, heart rate and gender.

In this case, VO2 max is the dependent variable and all the others are independent variables.

Regression in SPSS
• First, go to Analyze > Regression > Linear...

Regression in SPSS
• In the Linear Regression box, transfer the DV (VO2max) to the Dependent box and the IVs (age, weight, heart rate and gender) to the Independent(s): box.

Regression in SPSS
• Click on “Statistics” and tick “Estimates” and “Model fit”, then click “Continue”.

• Finally, click on the OK button.

Regression in SPSS
• Look for the box “Coefficients” and identify the numbers under Sig.

• Those numbers are the p-values of the variables. If a number is less than 0.05, the respective variable is significant; otherwise it is not.

• In the example, all the variables are significant.

Regression in SPSS
• Similarly to simple regression, if the coefficient B of a variable is positive, the variable has a positive effect; if it is negative, the variable has a negative effect.

• In this case, age, weight and heart rate all have a negative effect (that is, as they increase, VO2max decreases).

• Gender has a positive effect. To understand what this means, we look at how gender was coded. Since gender was coded as 0 for females and 1 for males and the effect of gender is positive, being male increases the VO2max.
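To see how the signs and the 0/1 coding work together, here is a small Python sketch. The coefficient values are invented for illustration. Only their signs follow the example above, and they are not the values from the SPSS output.

```python
# Hypothetical coefficients B: negative for age, weight and heart rate,
# positive for gender (coded 0 = female, 1 = male).
B = {
    "intercept": 90.0,
    "age": -0.2,
    "weight": -0.4,
    "heart_rate": -0.1,
    "gender": 13.0,
}


def predict_vo2max(age, weight, heart_rate, gender):
    """Predicted VO2max = b0 + b1*age + b2*weight + b3*heart_rate + b4*gender."""
    return (B["intercept"]
            + B["age"] * age
            + B["weight"] * weight
            + B["heart_rate"] * heart_rate
            + B["gender"] * gender)


# Two people who differ only in gender.
female = predict_vo2max(age=30, weight=70, heart_rate=130, gender=0)
male = predict_vo2max(age=30, weight=70, heart_rate=130, gender=1)

# Two females who differ only in age.
older_female = predict_vo2max(age=40, weight=70, heart_rate=130, gender=0)

print("male minus female:", male - female)
print("older minus younger:", older_female - female)
```

With everything else held equal, the male and female predictions differ by exactly the gender coefficient, and the older person has the lower predicted VO2max because the age coefficient is negative.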
To book a maths/stats appointment…

www.kent.ac.uk/student-learning-advisory-service
