Assumptions of Multiple Linear Regression
The correlation matrix indicates large correlations between motivation and competence and between
mother's education and father's education. To deal with this problem, we would usually aggregate or
eliminate variables that are highly correlated. However, we want to show how the collinearity problems
created by these highly correlated predictors affect the Tolerance values and the significance of the beta
coefficients, so we will run the regression without altering the variables. To run the regression, follow the
steps below:
• Click on the following: Analyze => Regression => Linear. The Linear Regression window (Fig. 6.1)
should appear.
• Select math achievement and click it over to the Dependent box (dependent variable).
• Next select the variables motivation scale, competence scale, pleasure scale, grades in h.s., father's
education, mother's education, and gender and click them over to the Independent(s) box (independent
variables).
• Under Method, be sure that Enter is selected.
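The Tolerance statistic that these steps request is simply 1 − R² from regressing each predictor on all of the other predictors, so highly correlated predictors produce tolerances near zero. A minimal sketch of that computation, using synthetic (hypothetical) data rather than the HSB data set, with competence built to be nearly collinear with motivation as in the correlation matrix discussed above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical data: competence is almost a linear function of motivation,
# mimicking the high motivation-competence correlation noted in the text.
motivation = rng.normal(size=n)
competence = motivation + rng.normal(scale=0.3, size=n)
pleasure = rng.normal(size=n)  # essentially unrelated to the other two

def tolerance(X, j):
    """Tolerance of column j: 1 - R^2 from regressing X[:, j] on the rest."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r_squared = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 - r_squared

X = np.column_stack([motivation, competence, pleasure])
for j, name in enumerate(["motivation", "competence", "pleasure"]):
    print(f"{name}: tolerance = {tolerance(X, j):.3f}")
```

Tolerances near zero for motivation and competence flag the multicollinearity, while pleasure's tolerance stays near 1; SPSS reports the same statistic when Collinearity diagnostics is checked.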
Problem 6.2: Simultaneous Regression Correcting Multicollinearity
In Problem 6.2, we will combine (average) the two variables, mother's education and father's
education, into a single predictor and then recompute the multiple regression, after omitting competence and pleasure.
We combined father's education and mother's education because it makes conceptual sense and
because these two variables are quite highly related (r = .65). We know that entering them as two
separate variables created problems with multicollinearity because tolerance levels were low for
these two variables, and, despite the fact that both variables were significantly and substantially
correlated with math achievement, neither contributed significantly to predicting math achievement
when taken together. When it does not make sense to combine the highly correlated variables, one
can eliminate one or more of them. Because the conceptual distinction between motivation,
competence, and pleasure was important for us, and because motivation was more important to us
than competence or pleasure, we decided to delete the latter two scales from the analysis. We wanted
to see if motivation would contribute to the prediction of math achievement if its contribution was not
canceled out by competence and/or pleasure. Motivation and competence are so highly correlated
that they create problems with multicollinearity. We eliminate pleasure as well, even though its
tolerance is acceptable, because it is virtually uncorrelated with math achievement, the dependent
variable, and yet it is correlated with motivation and competence. Thus, it is unlikely to contribute
meaningfully to the prediction of math achievement, and its inclusion would only serve to reduce
power and potentially weaken the predictive contribution of motivation. It is particularly important
to eliminate a variable such as pleasure when it is strongly correlated with another predictor, as this
can lead to seriously misleading results.
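Combining two highly correlated predictors into one, as we do with the parents' education variables, can be sketched as follows. The data here are synthetic, generated to correlate at roughly r = .65 as reported above; the variable names merely echo those in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical parents' education scores correlated at roughly r = .65
shared = rng.normal(size=n)
faed = shared + rng.normal(scale=0.73, size=n)
maed = shared + rng.normal(scale=0.73, size=n)
print(f"r(faed, maed)   = {np.corrcoef(faed, maed)[0, 1]:.2f}")

# Average the two into a single predictor, as with parEduc in the text
par_educ = (faed + maed) / 2

# The combined variable tracks each parent's score more strongly than the
# parents' scores track each other, and only one predictor enters the model,
# so the within-pair collinearity disappears.
print(f"r(parEduc, faed) = {np.corrcoef(par_educ, faed)[0, 1]:.2f}")
print(f"r(parEduc, maed) = {np.corrcoef(par_educ, maed)[0, 1]:.2f}")
```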
6.2. Rerun Problem 6.1 using the parents' education variable (parEduc) instead of faed and maed and
omitting the competence and pleasure scales. First, we created a matrix scatterplot (as in chapter 2)
to see if the variables are related to each other in a linear fashion. You can use the syntax in Output
6.2 or use the Graphs => Scatter windows as shown below.
• Click on Graphs => Scatter...
• Select Matrix and click on Define.
• Move math achievement, motivation, grades, parents' education, and gender into the Matrix
Variables: box.
• Click on Options. Check to be sure that Exclude cases listwise is selected.
• Click on Continue and then OK.
Then, run the regression, using the following steps:
• Click on the following: Analyze => Regression => Linear. The Linear Regression window (Fig.
6.1) should appear. This window may still have the variables moved over to the Dependent and
Independent(s) boxes. If so, click on Reset.
• Move math achievement into the Dependent box.
• Next, select the variables motivation, grades in h.s., parents' education, and gender and move them
into the Independent(s) box (independent variables).
• Under Method, be sure that Enter is selected.
• Click on Statistics, click on Estimates (under Regression Coefficients), and click on Model fit,
Descriptives, and Collinearity diagnostics (See Fig. 6.2.).
• Click on Continue.
• Click on OK.
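The Enter method fits all of the listed predictors simultaneously in a single ordinary least-squares step. A bare-bones numerical sketch of what that fit produces, using synthetic data whose variable names are hypothetical stand-ins for those in the steps above:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 250

# Hypothetical predictors standing in for motivation, grades in h.s.,
# parents' education, and gender (0/1)
X = np.column_stack([
    rng.normal(size=n),           # motivation
    rng.normal(size=n),           # grades in h.s.
    rng.normal(size=n),           # parents' education
    rng.integers(0, 2, size=n),   # gender
])
# Hypothetical outcome: a linear combination of the predictors plus noise
math_ach = X @ np.array([1.0, 2.0, 1.5, -0.5]) + rng.normal(size=n)

# "Enter" = fit every predictor at once, with an intercept
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, math_ach, rcond=None)
fitted = A @ beta
r_squared = 1 - np.sum((math_ach - fitted) ** 2) / np.sum(
    (math_ach - math_ach.mean()) ** 2)
print("coefficients (intercept first):", np.round(beta, 2))
print(f"R^2 = {r_squared:.3f}")
```

The coefficients and R² correspond to the unstandardized B column and the Model Summary R Square that SPSS prints for the same specification.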
Then, we added a plot to the multiple regression to examine the relationship between the predicted values and the residuals.
To make this plot follow these steps:
• Click on Plots... (in Fig. 6.1 to get Fig. 6.3.)
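The plot requested here charts the standardized residuals against the standardized predicted values; for a well-specified linear model the points should form a patternless horizontal band. A sketch of the two quantities being plotted, on synthetic (hypothetical) data rather than the HSB data set:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 250

# Hypothetical design matrix (intercept plus four predictors) and outcome
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 1.0, 2.0, 1.5, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Standardize both axes, as in SPSS's *ZPRED vs. *ZRESID plot
z_pred = (fitted - fitted.mean()) / fitted.std()
z_resid = residuals / residuals.std()

# In OLS the residuals are exactly orthogonal to the fitted values, so any
# visible pattern in the plot signals a violated assumption, not the fit.
print(f"corr(z_pred, z_resid) = {np.corrcoef(z_pred, z_resid)[0, 1]:.2e}")
```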
Problem 6.3: Hierarchical Multiple Linear Regression
In Problem 6.3, we will use the hierarchical approach, which enters variables in a series of blocks or
groups, enabling the researcher to see if each new group of variables adds anything to the prediction
produced by the previous blocks of variables. This approach is an appropriate method to use when
the researcher has a priori ideas about how the predictors go together to predict the dependent
variable. In our example, we will enter gender first and then see if any of the other variables make an
additional contribution. This method is intended to control for or eliminate the effects of gender on
the prediction.
6.3. If we control for gender differences in math achievement, do any of the other variables
significantly add anything to the prediction over and above what gender contributes?
We will include all of the variables from the previous problem; however, this time we will enter the
variables in two separate blocks to see how motivation, grades in high school, and parents' education
improve on prediction from gender alone.
• Click on the following: Analyze => Regression => Linear.
• Click on Reset.
• Select math achievement and click it over to the Dependent box (dependent variable).
• Next, select gender and move it over to the Independent(s) box (independent variables).
• Select Enter as your Method. (See Fig. 6.4.)
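The hierarchical logic of these blocks can be sketched numerically: fit block 1 (gender only), record its R², then fit block 1 plus block 2 and examine the change in R². A minimal illustration with synthetic data (the variable names are hypothetical stand-ins, not the HSB data):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300

gender = rng.integers(0, 2, size=n).astype(float)
motivation = rng.normal(size=n)
grades = rng.normal(size=n)
par_educ = rng.normal(size=n)
# Hypothetical outcome influenced by gender and by the block-2 predictors
math_ach = (1.0 * gender + 1.5 * motivation + 2.0 * grades
            + 1.0 * par_educ + rng.normal(size=n))

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_block1 = r_squared(gender[:, None], math_ach)
r2_block2 = r_squared(
    np.column_stack([gender, motivation, grades, par_educ]), math_ach)
print(f"Block 1 (gender only): R^2 = {r2_block1:.3f}")
print(f"Block 2 (all entered): R^2 = {r2_block2:.3f}")
print(f"R^2 change = {r2_block2 - r2_block1:.3f}")
```

SPSS reports the same quantity as R Square Change in the Model Summary table when the R squared change statistic is requested, which is how we judge whether the block-2 variables add to the prediction over and above gender.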