

Chapter 5: Multiple Regression Analysis

For use with Hair, Black, Babin and Anderson, Multivariate Data Analysis 8e © 2018 Cengage Learning EMEA

Overview

What Is Multiple Regression Analysis?


Multiple Regression in the Era of Big Data
A Decision Process for Multiple Regression Analysis
• Stage 1: Objectives of Multiple Regression
• Stage 2: Research Design of a Multiple Regression Analysis
• Stage 3: Assumptions in Multiple Regression Analysis
• Stage 4: Estimating the Regression Model and Assessing Overall Model Fit
• Stage 5: Interpreting the Regression Variate
• Stage 6: Validation of the Results
Extending Multiple Regression
Illustration of a Regression Analysis

Learning Objectives
Upon completing this chapter, you should be able to do the following:
• Determine when regression analysis is the appropriate statistical tool.
• Understand how regression helps us make both predictions and explanations using the
least squares concept.
• Use dummy variables with an understanding of their interpretation.
• Be aware of the assumptions underlying regression analysis and how to assess them.
• Understand the implications of managing the variate and its impact on the results.
• Address the implications of user- versus software-controlled variable selection and
explain the options available in software controlled variable selection.
• Interpret regression results and variable importance, especially in light of
multicollinearity.
• Apply the diagnostic procedures necessary to assess influential observations.
• Understand the benefits gained from the extended forms of regression, namely multi-
level models and panel models.

WHAT IS MULTIPLE REGRESSION ANALYSIS?


Multiple Regression Defined

Multiple Regression Analysis


• Statistical technique that can be used to analyze the relationship between a single
dependent (criterion) variable and several independent (predictor) variables.

Key Component
• Regression variate
• Linear combination of weighted independent variables that best predicts the dependent variable.

Variable Types in Multiple Regression


• Dependent – metric
• Independent – metric or transformed non-metric (through dummy variable
coding)


The Regression Variate


Variate Specification

Y’ = b0 + b1X1 + b2X2 + . . . + bnXn + ε

Y’ – dependent variable
b0 – intercept (constant)
b1, b2, … bn – regression weights indicating the change in the dependent variable associated with a unit change in the corresponding independent variable
X1, X2, … Xn – independent variables
ε – prediction error (residual)

Example: Credit Card Usage Based on Family Size and Income

Y’ = b0 + b1X1 + b2X2 + ε

Y’ – number of credit cards
b0 – number of credit cards independent of family size and income
b1, b2 – change in the number of credit cards for a unit change in family size and income, respectively
X1, X2 – values of family size and income, respectively
ε – prediction error (residual)
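As a concrete illustration, the following is a minimal sketch of estimating the two-predictor variate for the credit card example with ordinary least squares (Python with numpy and statsmodels is assumed — the slides do not prescribe software — and the data are hypothetical, not the textbook's):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: family size (X1), income in $000s (X2), credit cards held (Y)
X1 = np.array([2, 2, 4, 4, 5, 5, 6, 6])
X2 = np.array([14, 16, 14, 17, 18, 21, 17, 25])
Y  = np.array([4, 3, 6, 6, 7, 7, 8, 10])

X = sm.add_constant(np.column_stack([X1, X2]))  # adds the b0 (intercept) column
model = sm.OLS(Y, X).fit()

print(model.params)    # b0, b1, b2 – the estimated regression weights
print(model.resid)     # ε – prediction error (residual) for each observation
print(model.rsquared)  # proportion of variance in Y explained by the variate
```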



Graphically Portraying the Variate

The Y’ value is a linear combination of the entire set of independent variables that best achieves the statistical objective.

[Figure: Venn diagram with circles for the independent variables X1, X2 and X3 overlapping the circle for Y’]

Circles represent each independent variable's association/correlation with Y:
• The entire circle for each X represents the univariate correlation with Y.
• Areas of overlap represent correlation shared with other variable(s).
• Non-overlapping areas for each IV indicate the unique impact (i.e., the regression weight).

MULTIPLE REGRESSION IN THE ERA OF BIG DATA


Multiple Regression in the Era of Big Data

Historical Relevance
• Multiple regression has historically been the dominant statistical technique for
understanding dependence relationships.
• Particularly useful in providing explanation of importance of independent variables.

Era of Big Data


• Still dominant technique for estimation of the statistical/data model (see Chapter 1).
• Yet many new applications, such as automated decision-making (e.g., instantaneous credit approval, online advertising placement, and many others), shift the focus to alternative techniques (e.g., machine learning) that emphasize prediction at the expense of explanation.


MULTIPLE REGRESSION DECISION PROCESS

Stage 1: Objectives of Multiple Regression


Stage 2: Research Design of Multiple Regression
Stage 3: Assumptions in Multiple Regression Analysis
Stage 4: Estimating the Regression Model and Assessing Overall Fit
Stage 5: Interpreting the Regression Variate
Stage 6: Validation of the Results


STAGE 1: OBJECTIVES OF MULTIPLE REGRESSION

❑ Research Problems Appropriate for Multiple Regression


❑ Specifying a Statistical Relationship
❑ Selection of Dependent and Independent Variables


Research Problems Appropriate for Multiple Regression

Prediction with Multiple Regression:


• Maximize predictive accuracy
• always crucial to ensuring the validity of the set of independent variables.
• Model comparison
• comparing two or more sets of independent variables to ascertain the predictive power of
each variate.
Explanation With Multiple Regression:
• Relative importance of independent variables
• objectively assessing the magnitude and direction (positive or negative) of each independent
variable’s relationship.
• Nature of relationships (e.g., linear versus nonlinear) with the dependent variable
• Nature of relationships among independent variables
• Impact of multicollinearity in assessing relative importance of independent variables.


Selection of Dependent and Independent Variables


The researcher should always consider three issues that can affect any
decision about variables:

• The theory that supports using the variables.

• Measurement error, especially in the dependent variable


• Only structural equation modeling (SEM) can directly accommodate measurement error, but
using summated scales can mitigate it when using multiple regression.

• Specification error – the exclusion of relevant (and inclusion of irrelevant)


independent variables
• When in doubt, include potentially irrelevant variables (at worst they only complicate interpretation) rather than omit a possibly relevant variable (whose exclusion can bias all regression estimates).

Examples of Specification Error: Exclusion and Inclusion of Variables

When in doubt, include potentially irrelevant variables (they can only


confuse interpretation) rather than possibly omitting a relevant variable
(which can bias all regression estimates).
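The contrast can be seen in a small simulation sketch (numpy/statsmodels assumed; synthetic data, not an example from the text): omitting a relevant, correlated predictor biases the remaining coefficient, while adding an irrelevant predictor leaves the estimates essentially unchanged.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # x2 is correlated with x1
x3 = rng.normal(size=n)                     # irrelevant variable
y  = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def fit(*cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.OLS(y, X).fit().params

print(fit(x1, x2))        # correct specification: b1 ≈ 2.0, b2 ≈ 1.5
print(fit(x1))            # relevant x2 excluded: b1 is biased upward (≈ 2.9)
print(fit(x1, x2, x3))    # irrelevant x3 included: b1, b2 still ≈ 2.0 and 1.5, b3 ≈ 0
```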


STAGE 2: RESEARCH DESIGN OF A


MULTIPLE REGRESSION ANALYSIS
❑ Sample Size
❑ Creating Additional Variables


Sample Size Considerations

Statistical Power
• Simple regression can be effective with a sample size of 20.
• Maintaining power at .80 in multiple regression requires a minimum sample of 50
and preferably 100 observations for most research situations.

Generalizability
• The minimum ratio of observations to variables is 5 to 1, but the preferred ratio is
15 or 20 to 1, and this should increase when stepwise estimation is used.
• Maximizing the degrees of freedom improves generalizability and addresses both
model parsimony and sample size concerns.

Variable Transformations To Represent Unique Elements of the
Dependence Relationship

Nonmetric variables
• can only be included in a regression analysis by creating dummy variables.
• Dummy variables can only be interpreted in relation to their reference category.
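A minimal sketch of dummy-variable coding (pandas assumed; the variable names are hypothetical) — each dummy is interpreted relative to the omitted reference category:

```python
import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "West", "North", "West"],
                   "sales":  [10, 12, 9, 11, 8]})

# drop_first=True omits one category ("North"), which becomes the reference category;
# each dummy coefficient is then interpreted relative to "North"
dummies = pd.get_dummies(df, columns=["region"], drop_first=True)
print(dummies)
```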

Representing Nonlinear Relationships


• Adding an additional polynomial term represents another inflection point in the
curvilinear relationship.
• Quadratic and cubic polynomials are generally sufficient to represent most curvilinear
relationships.
• Assessing the significance of a polynomial or interaction term is accomplished by evaluating
incremental R2, not the significance of individual coefficients, due to high multicollinearity.
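A minimal sketch of testing a quadratic term through its incremental R2 rather than its individual coefficient (numpy/statsmodels/scipy assumed; simulated data):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = 1 + 0.5 * x + 0.8 * x**2 + rng.normal(size=200)   # truly curvilinear

base = sm.OLS(y, sm.add_constant(x)).fit()                           # linear term only
poly = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()  # adds quadratic term

# F test of the incremental R2 contributed by the quadratic term
delta_r2 = poly.rsquared - base.rsquared
f_stat = (delta_r2 / 1) / ((1 - poly.rsquared) / poly.df_resid)
p_value = stats.f.sf(f_stat, 1, poly.df_resid)
print(delta_r2, f_stat, p_value)
# poly.compare_f_test(base) gives the same test directly
```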


Representing Interaction or Moderator Effects


Moderator effect
• When the moderator variable, a second independent variable, changes the form
of the relationship between another independent variable and the dependent
variable.
• Has long been a primary topic of interest because it addresses the fundamental question of when an effect occurs.
• Also known as an interaction effect and is similar to the interaction term found in
analysis of variance and multivariate analysis of variance (see Chapter 6).
• The moderator term is a compound variable formed by multiplying X1 by the
moderator X2, which is entered into the regression equation.
• The coefficient of the interaction term (i.e., the moderator effect) indicates the
unit change in the effect of X1 as X2 changes.
• The coefficients of the two independent variables now represent the effects when
the other independent variable is zero.
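A minimal sketch of adding a moderator (interaction) term to the variate (numpy/statsmodels assumed; simulated data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                      # moderator
y  = 1 + 0.5 * x1 + 0.3 * x2 + 0.7 * x1 * x2 + rng.normal(size=n)

# the compound (moderator) term is simply x1 * x2 added to the regression equation
X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # b3 estimates how the effect of x1 changes per unit change in x2;
                    # b1 and b2 are now the effects when the other variable equals zero
```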

Adding A Mediation Effect

Mediation
• Occurs when the effect of an independent variable may “work through” an
intervening variable (the mediating variable) to predict the dependent variable.
• In this situation the independent variable may have a direct effect on the dependent measure as well as an indirect effect through the mediating variable to the dependent variable.
• Although most commonly associated with ANOVA and MANOVA models (see Chapter 6 for an extended discussion), mediation can play an important role in defining the roles of potential independent variables.
Designation of Mediation Effects
• Designation of a mediation effect is a conceptual decision by the researcher as it
has little or no impact on the effects of other independent variables.


STAGE 3: ASSUMPTIONS IN
MULTIPLE REGRESSION ANALYSIS
❑ Assessing Individual Variables Versus the Variate
❑ Methods of Diagnosis
❑ Linearity of the Phenomenon
❑ Constant Variance of the Error Term
❑ Normality of the Error Term Distribution
❑ Independence of the Error Terms


Primary Assumptions of Multiple Regression


Four Primary Assumptions
• Linearity of the phenomenon measured.
• Homoscedasticity – Constant variance of the error terms.
• Normality of the error term distribution.
• Independence of the error terms.
Assessing Individual Variables Versus the Variate
• Testing assumptions must be done:
• for each dependent and independent variable.
• for the variate as well.
Methods of Diagnosis
• Principal diagnostic measure is the standardized or studentized residual.
• Graphical analyses (i.e., null plot/residual plot, partial regression plots, and normal
probability plots) are the most widely used diagnostic methods.

Assessing Linearity and Homoscedasticity


Linearity
• Critical issue in representing the “true” relationship since the concept of
correlation, the measure of association underlying regression analysis, is based on
a linear relationship.
• Examined through residual plots and comparison to null plot
• Corrective actions available are:
• Transforming the data values.
• Including the nonlinear relationships in the regression model (e.g., polynomials).
• Specialized methods such as nonlinear regression.
Homoscedasticity – constant variance of the error term
• Diagnosis with graphical plots or simple statistical tests
• Remedies include:
• Variable transformation, weighted least squares or heteroscedasticity-consistent standard
errors (HCSE).
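As one illustration of the "simple statistical tests" and the HCSE remedy mentioned above, the sketch below runs a Breusch-Pagan test and refits with heteroscedasticity-consistent (HC3) standard errors (statsmodels assumed; simulated data):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 400)
y = 2 + 0.5 * x + rng.normal(scale=x, size=400)   # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan test: a small p-value signals heteroscedasticity
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(ols.resid, X)
print(lm_p)

# Remedy: heteroscedasticity-consistent standard errors (HCSE)
robust = sm.OLS(y, X).fit(cov_type="HC3")
print(ols.bse, robust.bse)   # compare conventional vs. robust standard errors
```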

Assessing Normality and Independence of the Error Terms

Normality
• Applies to error terms/residuals, but remedies are to the variables themselves.
• Graphical diagnostic – normal probability plot.
• Regression generally considered robust to violations of normality when sample
size exceeds 200.

Independence of Error Terms


• Predicted values, and thus error terms, should not be correlated with any variable not included in the analysis.
• Graphical test – residuals plotted versus offending variables.
• Two basic types of offending variables:
• Time series data.
• Clustered data – common example is students within classroom.

STAGE 4: ESTIMATING THE REGRESSION MODEL AND


ASSESSING OVERALL MODEL FIT
❑ Managing the Variate
❑ Variable Specification
❑ Variable Selection
❑ Testing the Regression Variate for Meeting the Regression Assumptions
❑ Examining the Statistical Significance of Our Model
❑ Understanding Influential Observations


Managing The Variate: Variable Specification and Selection


Variable Specification

Two options
• Use variables in their original form
• Allows for use of direct measures of the variables of interest.
• As number of variables increases, interpretability may become problematic.

• Employ some form of dimensional reduction


• Most common approach to address multicollinearity among the independent variables.
• Can be either software controlled or user controlled.
• Software controlled – the software performs the dimensional reduction itself and then proceeds with the analysis (e.g., principal components regression).
• User controlled – the researcher performs some form of dimensional reduction (e.g., exploratory factor analysis) and forms composites which are then substituted for the original variables in the analysis.


Variable Selection Approaches – User Controlled

User-controlled
• Confirmatory (Simultaneous)
• the only method to allow direct testing of a pre-specified model.
• also the most complex from the perspectives of specification error, model parsimony and
achieving maximum predictive accuracy.

• Combinatorial (All-Possible-Subsets)
• provides control by allowing the researcher to review the entire set of roughly equivalent
models in terms of predictive accuracy.


Variable Selection Approaches – Software Controlled


Software-controlled
• Sequential Search Methods:
• while maximizing predictive accuracy, represents a completely “automated” approach to
model estimation, leaving the researcher almost no control over the final model specification.
• Forward Inclusion & Backward Elimination.
• Stepwise (variables may be removed after having been included in the regression equation).
• Caveats – impact of multicollinearity, loss of researcher control and increased alpha level.
• Constrained
• A variant of sequential methods whereby variables' regression weights are constrained to maximize parsimony in the final model results.
• Ridge
• LASSO

Choosing Between User- and Software-Controlled Approaches


• No single method is “Best” and the prudent strategy is to use a combination of
approaches to capitalize on the strengths of each.

Assess the Statistical Significance of the Overall Model

Components of Model Fit


• Total Sum of Squares (SST)
• total amount of variation that exists to be explained by the independent variables.
• SST = SSE + SSR.
• Sum of Squared Errors (SSE)
• the variance in the dependent variable not accounted for by the regression model = residual.
• The objective is to obtain the smallest possible sum of squared errors as a measure of
prediction accuracy.
• Sum of Squares Regression (SSR)
• the amount of improvement in explanation of the dependent variable attributable to the
independent variables.
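A minimal sketch computing SST, SSE and SSR from a fitted line and verifying that SST = SSE + SSR (numpy assumed; simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = 3 + 2 * x + rng.normal(size=50)

# simple least squares fit
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total variation to be explained
sse = np.sum((y - y_hat) ** 2)          # variation not accounted for (residual)
ssr = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the regression
print(sst, sse + ssr)                   # SST = SSE + SSR (up to rounding)
```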


Measures of Fit for Individual Observations

Least Squares Regression Line and Individual Observations

[Figure: least squares regression line with an individual observation's total deviation from the mean of Y (Ȳ) partitioned into the deviation explained by the regression and the deviation not explained by the regression.]


Measures of Overall Model Fit

F statistic – statistical significance of overall model


• Significance means that it is unlikely your sample will produce a large R2 when the
population R2 is actually zero.
• A common rule of thumb requires a probability of less than .05 for statistical significance.
R2 (Coefficient of Determination) – strength of overall variate relationship
• Represents the percent of variation (i.e., goodness of fit) in the dependent variable associated with ("explained" by) all of the independent variables considered together.
• R2 ranges from 0 to 1.0, with large R2 indicating the linear relationship works well.
• Statistical significance does not ensure practical significance, which is based on the
meaningfulness of the results. Example, is explaining 4 percent of the variation
worth the cost of collecting and analyzing the data?
Adjusted R2
• based on the number of independent variables relative to the sample size.
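The sketch below computes R2, adjusted R2 and the overall F statistic directly from the sums of squares using the standard formulas (numpy/scipy assumed; the helper name overall_fit is hypothetical):

```python
import numpy as np
from scipy import stats

def overall_fit(y, y_hat, k):
    """R2, adjusted R2, F statistic and p-value for a model with k independent variables."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)          # variation not explained (residual)
    sst = np.sum((y - np.mean(y)) ** 2)     # total variation to be explained
    ssr = sst - sse                         # variation explained by the regression
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (ssr / k) / (sse / (n - k - 1))
    p = stats.f.sf(f, k, n - k - 1)         # probability of this F if the population R2 is zero
    return r2, adj_r2, f, p

# usage with any fitted model's predictions, e.g.:
# r2, adj_r2, f, p = overall_fit(y, model.predict(X), k=2)
```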

Significance Tests of Regression Coefficients

Statistical Inference test for each estimated coefficient


• Establishing a confidence interval
• Specify desired alpha level (typically .05).
• Compute standard error – expected sampling error of the coefficient, similar to a standard
deviation of an individual variable.
• Confidence interval – the number of standard errors based on the alpha level (e.g., 1.96 for a .05 alpha) times the value of the standard error [e.g., .05 level: b ± 1.96 × standard error].
• Applying the confidence interval
• Statistical significance established if confidence interval does not include zero.
• Sample size has direct influence on standard error – increased sample size reduces standard
error, thus making statistical significance more likely.
• Practical significance must also be assessed . . .
• Always ensure practical significance when using large sample sizes, as the model results and
regression coefficients could be deemed irrelevant even when statistically significant due just to the
statistical power arising from large sample sizes.
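A minimal sketch of building the confidence interval for each coefficient and checking whether it includes zero (statsmodels assumed; simulated data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = 1 + 0.8 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(size=100)
results = sm.OLS(y, sm.add_constant(X)).fit()

z = 1.96                                   # number of standard errors for a .05 alpha level
lower = results.params - z * results.bse   # bse = expected sampling error of each coefficient
upper = results.params + z * results.bse
print(np.column_stack([lower, upper]))
print((lower > 0) | (upper < 0))           # True where the interval excludes zero -> significant
# results.conf_int(alpha=.05) gives the exact t-based interval for comparison
```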

Types of Influential Observations

Influential observations include all observations that
• lie outside the general patterns of the data set, or
• have a disproportionate effect on the regression results.

Three basic types based upon the nature of their impact on the results:
• Outliers are observations that have large residual values and can be identified
only with respect to a specific regression model.
• Leverage points are observations that are distinct from the remaining
observations based on their independent variable values.
• Influential observations are the broadest category, including all observations that
have a disproportionate effect on the regression results. Influential observations
potentially include outliers and leverage points but may include other
observations as well.

Impacts of Influential Observations

Reinforcing (Figures 5.1a,b)


• Reinforce (5.1a) or even strongly
define (5.1b) the relationships
Conflicting (Figures 5.1c, e, f)
• have an effect that is contrary to
the general pattern of the
remaining data but still have
small residuals
Shifting (Figure 5.1d)
• affect all of the results in a similar
manner


Identifying Influential Observations

Step 1: Examining Residuals and Partial Regression Plots


• Residuals – defined by a) cases used to calculate and b) type of standardization
• Partial Regression Plots – depict relationship of variable controlling for other variables
Step 2: Identifying Leverage Points
• Leverage points – substantially different on one or more independent variables
• Diagnostic measures – Hat matrix and Mahalanobis distance (D2)
Step 3: Single-case Diagnostics
• Empirical measures of each case’s influence on:
• Individual coefficients – SDFBETA
• Overall results – Cook’s distance, COVRATIO and SDFFIT
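A minimal sketch of these diagnostics using statsmodels' influence measures (an assumption — the slides are software-agnostic; the standardized dfbetas/dffits below correspond roughly to SDFBETA/SDFFIT):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(60, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=60)
y[0] += 6                                            # plant one outlier

influence = sm.OLS(y, X).fit().get_influence()

studentized = influence.resid_studentized_external   # Step 1: studentized residuals
leverage    = influence.hat_matrix_diag              # Step 2: leverage (hat) values
dfbetas     = influence.dfbetas                      # Step 3: influence on individual coefficients
cooks_d, _  = influence.cooks_distance               # Step 3: influence on overall results
dffits, _   = influence.dffits
cov_ratio   = influence.cov_ratio

print(np.argmax(np.abs(studentized)), np.argmax(cooks_d))  # both flag the planted outlier
```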


Step 4: Selecting Influential Observations


Observations classified into four
groups based on combination
of residuals and leverage
A. No issues – fit well and no
extreme values
B. High leverage, but not outlier
– very different on IVs, but still
predicted well by model
C. Outliers, But Acceptable
Leverage – high residual, but
no extreme values on IV
D. Outliers and High Leverage –
poor prediction and quite
different on IV

Corrective Actions for Influentials

Four conditions reflecting the potential remedy for an influential observation


1. An error in observations or data entry
• remedy by correcting the data or deleting the case.
2. A valid but exceptional observation that is explainable by an extraordinary situation
• remedy by deletion of the case unless variables reflecting the extraordinary situation are
included in the regression equation.
3. An exceptional observation with no likely explanation
• presents a special problem because there is no reason for deleting the case, but its inclusion
cannot be justified either, suggesting analyses with and without the observations to make a
complete assessment.
4. An ordinary observation in its individual characteristics but exceptional in its combination of characteristics
• indicates modifications to the conceptual basis of the regression model and should be retained.


STAGE 5: INTERPRETING THE REGRESSION VARIATE

❑ Using the Regression Coefficients


❑ Assessing Multicollinearity
❑ Relative Importance of Independent Variables


Using the Regression Coefficients


Key Functions of Regression Coefficients
• Prediction
• Estimation – minimize the residuals and produce expected value for each observation.
• Forecasting – allow for predicted value for any set of values for independent variables.
• Explanation
• Interpretation with Regression Coefficients – primary measure of the relative impact and
importance of the independent variables in their relationship with the dependent variable
• Comparison between independent variables problematic if on different scales.
• Standardizing the Regression Coefficients: Beta Coefficients – converts variables to a common
scale and variability, the most common being a mean of zero (0.0) and standard deviation of
one (1.0).
• Eliminate the problem of dealing with different units of measurement and thus reflect the relative
impact on the dependent variable of a change in one standard deviation in either variable.
• Caveats in use as measure of importance:
• guide to the relative importance of individual independent variables only when collinearity is minimal.
• interpreted only in the context of the other variables in the equation.
• Reflect change in the dependent measure for a one standard deviation change in the independent variable.
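A minimal sketch contrasting raw regression weights with beta (standardized) coefficients (numpy/statsmodels assumed; simulated data measured on very different scales):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
income_dollars = rng.normal(50_000, 10_000, 200)   # measured in dollars
family_size    = rng.normal(3, 1, 200)             # measured in persons
y = 2 + 0.0001 * income_dollars + 0.9 * family_size + rng.normal(size=200)

raw = sm.OLS(y, sm.add_constant(np.column_stack([income_dollars, family_size]))).fit()

def z(v):  # standardize to mean 0 and standard deviation 1
    return (v - v.mean()) / v.std()

beta = sm.OLS(z(y), np.column_stack([z(income_dollars), z(family_size)])).fit()

print(raw.params)    # raw weights depend on the measurement units
print(beta.params)   # beta weights: change in SDs of Y per one-SD change in each X
```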

Assessing Multicollinearity

Multicollinearity
• relationship between two (collinearity) or more (multicollinearity) independent
variables. Multicollinearity occurs when any single independent variable is highly
correlated with a set of other independent variables.

Steps in Assessing and Addressing Multicollinearity


1. Understand new measures of correlation which incorporate multicollinearity.
2. Assess the degree of multicollinearity.
3. Determine its impact on the results.
4. Apply the necessary remedies if needed.


Measures of Correlation Incorporating Multicollinearity

Three Measures of Correlation


• Bivariate or zero-order correlation
• Association between two variables,
not accounting for the variation
shared with any other variables.
• Appears in the correlation matrix.
Measures reflecting unique explanation
• Semi-partial or part correlation
• Unique predictive effect.
• Partial correlation
• Incremental predictive effect.

Insights from each form of correlation will be seen in later sections.

Identifying Multicollinearity
Variance Inflation Factor (VIF) – measures how much the variance of the regression
coefficients is inflated by multicollinearity problems. The square root of the VIF is the
expected increase in the standard error of the coefficients.
• A VIF of 1 (the minimum possible value) indicates no correlation between the independent measures.
• As the VIF increases, it indicates a higher degree of association among the predictor variables.
• VIF values close to 1 are generally not enough to cause problems.
• However, a VIF value of 5 is generally thought to be the maximum acceptable; anything higher would indicate a problem with multicollinearity.
Tolerance – the amount of variance of an independent variable that is not explained by the
other independent variables (i.e., an independent variable is considered a dependent
variable, predicted by all the other independent variables).
• Small values for tolerance indicate problems of multicollinearity.
• The minimum cutoff value for tolerance is typically .20: values less than .20 signal a multicollinearity problem.

VIF and Tolerance are inversely related: VIF = 1 / Tolerance
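A minimal sketch computing VIF and tolerance for each predictor (statsmodels' variance_inflation_factor assumed; simulated collinear data):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=200)   # highly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):               # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"X{i}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```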



Effects of Multicollinearity

All impacts arise from the shared variance among variables, which cannot be attributed to any single variable
Impacts on Estimation
• Decrease in explained variance – as multicollinearity increases, unique explanatory
effects of variables decline, thus overall decline in predicted variation (R2).
• Singularity – if multicollinearity reaches 1.0 (perfect collinearity), model estimation is precluded.
• Increases in standard error – as shown by VIF, multicollinearity increases standard
errors and makes it more difficult to achieve statistical significance.
• Reversal of signs of Coefficients – signs can “reverse” from bivariate relationships.
Impacts on Explanation
• Since coefficients only represent unique explanation, multicollinearity can obscure the total effect of a variable, which requires newer measures of relative importance.

How Much Multicollinearity is Acceptable And Remedies


Bivariate correlations
• Values of .70 or higher may result in problems and lower values may be problematic
if they are higher than the correlations with the dependent variable.
Tolerance or VIF
• Tolerance values of .20 or less, corresponding to a VIF of 5 or more, almost always indicate problems with multicollinearity.
• VIF values of even 3 to 5 may result in interpretation or estimation problems,
particularly when the relationships with the dependent variable are weaker.
Remedies for Multicollinearity
• Delete collinear variable(s).
• Apply dimensional reduction, such as composites from exploratory factor analysis.
• Specific estimation techniques – Bayesian or principal components regression.
• Do nothing – particularly if used solely for prediction, but still risky.

Relative Importance of Independent Variables

Represent the overall impact of the independent variables


• Accounting for both shared and unique variance explained.
• Measures that are comparable across all the independent variables.
All measures provide some insights, best to use combination
Direct Measures (from regression results)
• Bivariate correlations – "starting point" representing the total relationship, but do not separate unique versus shared effects.
• Squared semi-partial correlation – percentage of the variance in the dependent variable that is unique to the independent variable.
• Regression weights – only unique relationship, potentially impacted by
multicollinearity.
• Beta (standardized) weights – regression weights on standardized scale, but still
impacted by multicollinearity.
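A minimal sketch contrasting two of these direct measures — zero-order correlations and squared semi-partial correlations computed as the drop in R2 when a variable is removed (numpy/statsmodels assumed; simulated collinear data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)            # shares variance with x1
x3 = rng.normal(size=n)
y  = 1 + 0.6 * x1 + 0.4 * x2 + 0.5 * x3 + rng.normal(size=n)

Xs = {"x1": x1, "x2": x2, "x3": x3}

def r2(cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.OLS(y, X).fit().rsquared

full_r2 = r2(list(Xs.values()))
for name, x in Xs.items():
    zero_order = np.corrcoef(x, y)[0, 1]        # total (shared + unique) relationship
    others = [v for k, v in Xs.items() if k != name]
    sq_semipartial = full_r2 - r2(others)       # unique contribution to R2
    print(f"{name}: r = {zero_order:.3f}, unique R2 = {sq_semipartial:.3f}")
```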

Additional Measures of Relative Importance


Provide additional insights into variable impact in presence of multicollinearity
• All possible subsets regression – foundation for several measures discussed below.
• Structure coefficients – bivariate correlations with predicted value.
• Direct measure of contribution to predicted value.
• Do not distinguish between unique and shared variance.
• Commonality analysis – divides impact into unique and shared components
• Based on all possible subsets regression and all unique combinations of variables.
• Negative effects may indicate suppression effects.
• Dominance analysis – average squared semi-partial correlation across all possible
subsets regression models.
• Two forms of dominance between variables, based on whether one variable always has the greater squared semi-partial correlation, no matter what other variables are in the model.
• Relative Weights – sum to R2, but do not distinguish between unique and shared
variance.

STAGE 6: VALIDATION OF THE RESULTS

❑ Additional or Split Samples


❑ Calculating the PRESS Statistic
❑ Comparing Regression Models
❑ Forecasting with the Model


Validation of the Regression Model

Ensure that the model represents the general population (generalizability)


and is appropriate for the situations in which it will be used (transferability).
• Additional or Split Samples
• Comparison of results to ensure comparability of results across differing samples.
• Calculating the PRESS (Prediction Sum of Squares) statistic
• Employs a jackknife procedure to calculate a measure of predictive fit.
• Used to calculate P2 (coefficient of prediction), which is a measure of expected predictive accuracy.
• Comparing Regression Models
• Most common standard – R2, but it always increases as variables are added.
• Alternatives – Akaike Information Criterion and Bayesian Information Criterion.
• The model with the best predictive power has the lowest values on these measures.
• Forecasting
• Must always ensure comparability of new data to dataset used in estimation.
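A minimal sketch of the PRESS statistic using the leave-one-out (jackknife) identity PRESS = sum of (e_i / (1 - h_ii))^2, with 1 - PRESS/SST shown as one common form of a coefficient of prediction (numpy/statsmodels assumed; the text's exact P2 formula may differ):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
X = sm.add_constant(rng.normal(size=(80, 3)))
y = X @ np.array([1.0, 0.5, -0.4, 0.3]) + rng.normal(size=80)

fit = sm.OLS(y, X).fit()
h = fit.get_influence().hat_matrix_diag

# leave-one-out (jackknife) prediction errors without refitting the model n times
press = np.sum((fit.resid / (1 - h)) ** 2)
sst = np.sum((y - y.mean()) ** 2)
print(press, 1 - press / sst)   # PRESS and a predictive-R2 style coefficient of prediction
```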

EXTENDING MULTIPLE REGRESSION

❑ Multilevel Models
❑ Panel Models


Multilevel Models

Unified framework for addressing many of the statistical issues which occur
naturally when hierarchical/nested data structures are present

Background
• Context – any external factor outside the unit of analysis that:
• impacts the outcome of multiple individuals.
• creates differences between separate contexts and fosters dependencies within a single context.
• Hierarchical data structure – observations which have a natural nesting effect created by contexts, with Level-1 observations nested within contexts represented at Level-2.
• Multilevel model (MLM) – extension of regression analysis that allows for the
incorporation of both individual (Level-1) and contextual (Level-2) effects with the
appropriate statistical treatment.


Basic Concepts and Issues


Matching Measurement Properties to Level
• Ensure that you avoid the ecological fallacy (using characteristics of a higher level to represent characteristics of a lower level) and the atomistic fallacy (assuming group-level relationships equate to individual-level relationships).
Intraclass Correlation (ICC)
• Degree of dependence among individuals within a higher-level grouping.
• Demonstrates that the individuals in a context are not independent, thus violating
this assumption of regression.
Fixed Versus Random Effects
• Fixed effects – regression coefficient made as a point estimate with no variation.
• Random effects – best estimate of the variability or distribution of effects across a
set of groups/contexts.
Sample size by level – small group sizes are acceptable as long as the number of groups is large.
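A minimal sketch of the ICC estimated from an intercept-only random-intercept model, as the share of total variance lying between groups (statsmodels MixedLM assumed, which the slides do not prescribe; simulated students-within-classrooms data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
groups = np.repeat(np.arange(40), 25)               # 40 classrooms, 25 students each
group_effect = rng.normal(0, 1.0, 40)[groups]       # context (Level-2) effect
df = pd.DataFrame({"group": groups,
                   "y": 5 + group_effect + rng.normal(0, 2.0, groups.size)})

null = smf.mixedlm("y ~ 1", df, groups=df["group"]).fit()  # intercept-only, random intercept
between = null.cov_re.iloc[0, 0]                    # Level-2 (between-group) variance
within = null.scale                                 # Level-1 (residual) variance
print("ICC =", between / (between + within))        # degree of dependence within groups
```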

Five stage Modeling Strategy for MLM


1. Sufficient Variation At Level 2
• Ensure enough variation between groups to justify including the level in the analysis.
2. Level-1 Model with Level-2 Effects
• Basic regression equation for Level-1 is specified as fixed effects and the Level-2
effects (intercepts) are added.
3. Introduce Level-2 Independent Variables
• Level-2 characteristics are introduced into the Level-2 equations to establish their
relationships with the Level-1 parameters.
4. Test for Random Coefficients of Level-1 Variables
• Do the intercepts or Level-1 coefficients vary across groups?
5. Add Cross-level Interactions to Explain Variations in Coefficients
• identify which Level-2 characteristics are related to the Level-1 variables that had
random coefficients.

Panel Models

Cross-sectional analysis of longitudinal or time-series data

Similarity to MLM
• Unit of observation (e.g., individual, class, firm) becomes a group (Level-2) with
multiple observations (Level-1).
• Accommodates serial correlation inherent in longitudinal data.

Benefits of Unified Framework


• By using a fixed effects estimate for an effect, the omitted variable problem
(endogeneity) is accounted for.
• Ability to employ a full range of independent variables also allows for the testing
of more complex models than alternative methods.
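A minimal sketch of a fixed-effects (within/LSDV) panel estimate obtained by adding unit dummies to a pooled regression (numpy/pandas/statsmodels assumed; simulated firm-year data — dedicated panel estimators would normally be used instead):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
firms, years = 30, 8
firm = np.repeat(np.arange(firms), years)
firm_effect = rng.normal(0, 2, firms)[firm]          # unobserved, time-invariant firm effect
x = 0.5 * firm_effect + rng.normal(size=firm.size)   # x is correlated with the firm effect
y = 1 + 0.7 * x + firm_effect + rng.normal(size=firm.size)
df = pd.DataFrame({"firm": firm, "x": x, "y": y})

pooled = smf.ols("y ~ x", df).fit()                  # ignores the within-firm dependence
fixed  = smf.ols("y ~ x + C(firm)", df).fit()        # firm dummies = fixed effects (LSDV)

print(pooled.params["x"])   # biased by the omitted firm effect (endogeneity)
print(fixed.params["x"])    # ≈ 0.7 once the firm effects are controlled
```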

Basic Issues in Panel Models

Four Types of Variables


• Differ between units of analysis, but don’t change over time (e.g., gender).
• Change over time, but are the same for all units of analysis at any given time
period (e.g., national economic indicators).
• Vary over both time and units (e.g., income).
• Vary over time in a predictable pattern (e.g., any measure of age or tenure).

Selecting Between Fixed Versus Random Effects


• Trade-off between controlling for endogeneity with fixed effects and the greater statistical efficiency, but potentially biased results, of random effects.


Basic Issues in Panel Models

Types of Models
• Basic model – simple pooled regression, which disregards the interdependencies
among observations within a unit of analysis.
• Unit-specific results (similar to the random effects in the multilevel models) where
intercepts, coefficients or both vary by unit.
• Unique model – time dependent effects model, where the intercept and
coefficients may vary over time as well.

Adding Time
• Panel models also provide for estimating time-variant effects just as was possible
for unit-specific effects.
• Requires at least enough time periods for a basic relationship to be estimated (five
or more).

ILLUSTRATION OF A REGRESSION ANALYSIS


Stages One, Two and Three

Stage 1: Objectives
• Predict customer satisfaction based on customers' perceptions of HBAT's performance and identify the factors that lead to increased satisfaction.
Stage 2: Research Design
• Thirteen independent variables (X6 to X18).
• Meets minimum ratio of observations per variable – 7:1 with adequate power.
Stage 3: Assumptions
• Linearity – graphical analysis did not reveal nonlinear relationships.
• Homoscedasticity – only two variables (X6 and X17) had minimal violations.
• Normality – six variables indicated violations, thus requiring further analysis.


Stepwise Results

R2 – .791
Standard error of the estimate – .559

Five significant variables:


X9 – Complaint Resolution
X6 – Product Quality
X12 – Salesforce Image
X7 – E-Commerce
X11 – Product Line


Influential Plot
• Eight observations qualify as outliers, but still have acceptable leverage.
• Four observations have high leverage, but are well predicted by the model.
• No observations are outliers with high leverage.

Measures of Variable Importance

Most direct measures of variable importance point to X12 because it has the largest unique impact of all of the independent variables.

X9 also becomes relatively important when shared impact is considered.

EVALUATING ALTERNATIVE REGRESSION MODELS

❑ Confirmatory Regression Model


❑ Use of Summated Scales as Remedies for Multicollinearity
❑ Including a Nonmetric Independent Variable


Comparison of 4 Models

Combination of variable specification (original variables versus composite


measures) and variable selection (simultaneous and stepwise) options


Including a Nonmetric Independent Variable: X3

• X3 (Firm size) is binary nonmetric variable.


• Positive value of the coefficient indicates that large firms, given their
characteristics on the other five independent variables in the equation,
still have a customer satisfaction level that is about a quarter point higher
(.271) on the 10-point customer satisfaction question.


Learning Checkpoints

1. When should multiple regression be used?


2. Why should multiple regression be used?
3. What level of statistical significance and R2 would justify use of
multiple regression?
4. How do you use regression coefficients?
5. What are the options for interpreting the relative importance of
the significant variables?
6. How does variable specification (using original variables versus composites from exploratory factor analysis) impact the regression results for prediction and explanation?
