Assumptions of Multiple Linear Regression
The correlation matrix indicates large correlations between motivation and competence and between
mother's education and father's education. To deal with this problem, we would usually aggregate or
eliminate variables that are highly correlated. However, we want to show how the collinearity problems
created by these highly correlated predictors affect the Tolerance values and the significance of the beta
coefficients, so we will run the regression without altering the variables. To run the regression, follow the
steps below:
• Click on the following: Analyze => Regression => Linear. The Linear Regression window (Fig. 6.1)
should appear.
• Select math achievement and click it over to the Dependent box (dependent variable).
• Next select the variables motivation scale, competence scale, pleasure scale, grades in h.s., father's
education, mother's education, and gender and click them over to the Independent(s) box (independent
variables).
• Under Method, be sure that Enter is selected.
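The Tolerance statistic that these steps request is simply 1 − R² from regressing each predictor on all of the other predictors, so highly correlated predictors produce tolerances near zero. A minimal sketch of that computation, using synthetic (hypothetical) data rather than the HSB data set, with competence built to be nearly collinear with motivation as in the correlation matrix discussed above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical data: competence is almost a linear function of motivation,
# mimicking the high motivation-competence correlation noted in the text.
motivation = rng.normal(size=n)
competence = motivation + rng.normal(scale=0.3, size=n)
pleasure = rng.normal(size=n)  # essentially unrelated to the other two

def tolerance(X, j):
    """Tolerance of column j: 1 - R^2 from regressing X[:, j] on the rest."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r_squared = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 - r_squared

X = np.column_stack([motivation, competence, pleasure])
for j, name in enumerate(["motivation", "competence", "pleasure"]):
    print(f"{name}: tolerance = {tolerance(X, j):.3f}")
```

Tolerances near zero for motivation and competence flag the multicollinearity, while pleasure's tolerance stays near 1; SPSS reports the same statistic when Collinearity diagnostics is checked.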
Problem 6.2: Simultaneous Regression Correcting Multicollinearity
In Problem 6.2, we will combine (average) the two variables, mother's education and father's
education, into a single predictor and then recompute the multiple regression, after omitting competence and pleasure.
We combined father's education and mother's education because it makes conceptual sense and
because these two variables are quite highly related (r = .65). We know that entering them as two
separate variables created problems with multicollinearity because tolerance levels were low for
these two variables, and, despite the fact that both variables were significantly and substantially
correlated with math achievement, neither contributed significantly to predicting math achievement
when taken together. When it does not make sense to combine the highly correlated variables, one
can eliminate one or more of them. Because the conceptual distinction between motivation,
competence, and pleasure was important for us, and because motivation was more important to us
than competence or pleasure, we decided to delete the latter two scales from the analysis. We wanted
to see if motivation would contribute to the prediction of math achievement if its contribution was not
canceled out by competence and/or pleasure. Motivation and competence are so highly correlated
that they create problems with multicollinearity. We eliminate pleasure as well, even though its
tolerance is acceptable, because it is virtually uncorrelated with math achievement, the dependent
variable, and yet it is correlated with motivation and competence. Thus, it is unlikely to contribute
meaningfully to the prediction of math achievement, and its inclusion would only serve to reduce
power and potentially weaken the predictive contribution of motivation. It is particularly important
to eliminate a variable such as pleasure when it is strongly correlated with another predictor, as this
can lead to seriously misleading results.
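Combining two highly correlated predictors into one, as we do with the parents' education variables, can be sketched as follows. The data here are synthetic, generated to correlate at roughly r = .65 as reported above; the variable names merely echo those in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical parents' education scores correlated at roughly r = .65
shared = rng.normal(size=n)
faed = shared + rng.normal(scale=0.73, size=n)
maed = shared + rng.normal(scale=0.73, size=n)
print(f"r(faed, maed)   = {np.corrcoef(faed, maed)[0, 1]:.2f}")

# Average the two into a single predictor, as with parEduc in the text
par_educ = (faed + maed) / 2

# The combined variable tracks each parent's score more strongly than the
# parents' scores track each other, and only one predictor enters the model,
# so the within-pair collinearity disappears.
print(f"r(parEduc, faed) = {np.corrcoef(par_educ, faed)[0, 1]:.2f}")
print(f"r(parEduc, maed) = {np.corrcoef(par_educ, maed)[0, 1]:.2f}")
```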
6.2. Rerun Problem 6.1 using the parents' education variable (parEduc) instead of faed and maed and
omitting the competence and pleasure scales. First, we created a matrix scatterplot (as in chapter 2)
to see if the variables are related to each other in a linear fashion. You can use the syntax in Output
6.2 or use the Graphs => Scatter windows as shown below.
• Click on Graphs => Scatter...
• Select Matrix and click on Define.
• Move math achievement, motivation, grades, parents' education, and gender into the Matrix
Variables: box.
• Click on Options. Check to be sure that Exclude cases listwise is selected.
• Click on Continue and then OK.
Then, run the regression, using the following steps:
• Click on the following: Analyze => Regression => Linear. The Linear Regression window (Fig.
6.1) should appear. This window may still have the variables moved over to the Dependent and
Independent(s) boxes. If so, click on Reset.
• Move math achievement into the Dependent box.
• Next, select the variables motivation, grades in h.s., parents' education, and gender and move them
into the Independent(s) box (independent variables).
• Under Method, be sure that Enter is selected.
• Click on Statistics, click on Estimates (under Regression Coefficients), and click on Model fit,
Descriptives, and Collinearity diagnostics (See Fig. 6.2.).
• Click on Continue.
• Click on OK.
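The Enter method fits all of the listed predictors simultaneously in a single ordinary least-squares step. A bare-bones numerical sketch of what that fit produces, using synthetic data whose variable names are hypothetical stand-ins for those in the steps above:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 250

# Hypothetical predictors standing in for motivation, grades in h.s.,
# parents' education, and gender (0/1)
X = np.column_stack([
    rng.normal(size=n),           # motivation
    rng.normal(size=n),           # grades in h.s.
    rng.normal(size=n),           # parents' education
    rng.integers(0, 2, size=n),   # gender
])
# Hypothetical outcome: a linear combination of the predictors plus noise
math_ach = X @ np.array([1.0, 2.0, 1.5, -0.5]) + rng.normal(size=n)

# "Enter" = fit every predictor at once, with an intercept
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, math_ach, rcond=None)
fitted = A @ beta
r_squared = 1 - np.sum((math_ach - fitted) ** 2) / np.sum(
    (math_ach - math_ach.mean()) ** 2)
print("coefficients (intercept first):", np.round(beta, 2))
print(f"R^2 = {r_squared:.3f}")
```

The coefficients and R² correspond to the unstandardized B column and the Model Summary R Square that SPSS prints for the same specification.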
Then, we added a plot to the multiple regression to examine the relationship between the predicted values and the residuals.
To make this plot follow these steps:
• Click on Plots... (in Fig. 6.1 to get Fig. 6.3.)
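The plot requested here charts the standardized residuals against the standardized predicted values; for a well-specified linear model the points should form a patternless horizontal band. A sketch of the two quantities being plotted, on synthetic (hypothetical) data rather than the HSB data set:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 250

# Hypothetical design matrix (intercept plus four predictors) and outcome
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 1.0, 2.0, 1.5, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Standardize both axes, as in SPSS's *ZPRED vs. *ZRESID plot
z_pred = (fitted - fitted.mean()) / fitted.std()
z_resid = residuals / residuals.std()

# In OLS the residuals are exactly orthogonal to the fitted values, so any
# visible pattern in the plot signals a violated assumption, not the fit.
print(f"corr(z_pred, z_resid) = {np.corrcoef(z_pred, z_resid)[0, 1]:.2e}")
```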
Problem 6.3: Hierarchical Multiple Linear Regression
In Problem 6.3, we will use the hierarchical approach, which enters variables in a series of blocks or
groups, enabling the researcher to see if each new group of variables adds anything to the prediction
produced by the previous blocks of variables. This approach is an appropriate method to use when
the researcher has a priori ideas about how the predictors go together to predict the dependent
variable. In our example, we will enter gender first and then see if any of the other variables make an
additional contribution. This method is intended to control for or eliminate the effects of gender on
the prediction.
6.3. If we control for gender differences in math achievement, do any of the other variables
significantly add anything to the prediction over and above what gender contributes?
We will include all of the variables from the previous problem; however, this time we will enter the
variables in two separate blocks to see how motivation, grades in high school, and parents' education
improve on prediction from gender alone.
• Click on the following: Analyze => Regression => Linear.
• Click on Reset.
• Select math achievement and click it over to the Dependent box (dependent variable).
• Next, select gender and move it over to the Independent(s) box (independent variables).
• Select Enter as your Method. (See Fig. 6.4.)
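The hierarchical logic of these blocks can be sketched numerically: fit block 1 (gender only), record its R², then fit block 1 plus block 2 and examine the change in R². A minimal illustration with synthetic data (the variable names are hypothetical stand-ins, not the HSB data):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300

gender = rng.integers(0, 2, size=n).astype(float)
motivation = rng.normal(size=n)
grades = rng.normal(size=n)
par_educ = rng.normal(size=n)
# Hypothetical outcome influenced by gender and by the block-2 predictors
math_ach = (1.0 * gender + 1.5 * motivation + 2.0 * grades
            + 1.0 * par_educ + rng.normal(size=n))

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_block1 = r_squared(gender[:, None], math_ach)
r2_block2 = r_squared(
    np.column_stack([gender, motivation, grades, par_educ]), math_ach)
print(f"Block 1 (gender only): R^2 = {r2_block1:.3f}")
print(f"Block 2 (all entered): R^2 = {r2_block2:.3f}")
print(f"R^2 change = {r2_block2 - r2_block1:.3f}")
```

SPSS reports the same quantity as R Square Change in the Model Summary table when the R squared change statistic is requested, which is how we judge whether the block-2 variables add to the prediction over and above gender.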