CH 05
For use with Hair, Black, Babin and Anderson, Multivariate Data Analysis 8e © 2018 Cengage Learning EMEA
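© 2019 Cengage. All rights reserved. This is supplementary material to the textbook and is likewise protected by copyright law. It is for classroom teaching use only; printing, photocopying, unauthorized reproduction, and public distribution are prohibited.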
Overview
Learning Objectives
Upon completing this chapter, you should be able to do the following:
• Determine when regression analysis is the appropriate statistical tool.
• Understand how regression helps us make both predictions and explanations using the
least squares concept.
• Use dummy variables with an understanding of their interpretation.
• Be aware of the assumptions underlying regression analysis and how to assess them.
• Understand the implications of managing the variate and its impact on the results.
• Address the implications of user- versus software-controlled variable selection and
explain the options available in software-controlled variable selection.
• Interpret regression results and variable importance, especially in light of
multicollinearity.
• Apply the diagnostic procedures necessary to assess influential observations.
• Understand the benefits gained from the extended forms of regression, namely
multilevel models and panel models.
Key Component
• Regression variate
• The linear combination of weighted independent variables that best predicts the dependent variable.
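As a minimal sketch of how a regression variate can be estimated, the block below finds least squares weights by solving the normal equations. The data, the function names (`solve`, `fit_variate`) and the "true" weights are hypothetical and chosen only for illustration; statistical software would normally do this step.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_variate(X, y):
    """Return least squares weights [b0, b1, ..., bk] for the rows of X."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 carries the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: y is built exactly as 2 + 1.5*x1 - 0.5*x2 (no error term).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 1.5 * x1 - 0.5 * x2 for x1, x2 in X]
weights = fit_variate(X, y)

# The variate is the weighted combination used for prediction.
predicted = [weights[0] + weights[1] * x1 + weights[2] * x2 for x1, x2 in X]
```

Because the toy data contain no error, the recovered weights match the ones used to build `y`; with real data the weights are the values that minimize the sum of squared prediction errors.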
Historical Relevance
• Multiple regression has historically been the dominant statistical technique for
understanding dependence relationships.
• Particularly useful in providing explanation of importance of independent variables.
Statistical Power
• Simple regression can be effective with a sample size of 20.
• Maintaining power at .80 in multiple regression requires a minimum sample of 50
and preferably 100 observations for most research situations.
Generalizability
• The minimum ratio of observations to variables is 5 to 1, but the preferred ratio is
15 or 20 to 1, and this should increase when stepwise estimation is used.
• Maximizing the degrees of freedom improves generalizability and addresses both
model parsimony and sample size concerns.
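The ratio guideline above can be checked with a few lines of code. This is a sketch only: the function names and the returned labels are hypothetical, and the cutoffs are the 5:1 minimum and 15:1 lower preferred bound stated in the bullets.

```python
def observations_per_variable(n_observations, n_independent_variables):
    """Ratio of observations to independent variables."""
    return n_observations / n_independent_variables


def ratio_guideline(n_observations, n_independent_variables):
    """Classify a design against the 5:1 minimum and 15:1 preferred ratios."""
    ratio = observations_per_variable(n_observations, n_independent_variables)
    if ratio < 5:
        return "below minimum"
    if ratio < 15:
        return "meets minimum"
    return "meets preferred"


# Hypothetical designs, each with 10 independent variables.
for n in (40, 70, 200):
    print(n, ratio_guideline(n, 10))
```

The guideline becomes stricter under stepwise estimation, so a design that "meets minimum" here may still be too small in that case.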
Variable Transformations To Represent Unique Elements of the
Dependence Relationship
Nonmetric variables
• can only be included in a regression analysis by creating dummy variables.
• Dummy variables can only be interpreted in relation to their reference category.
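A small sketch of dummy coding, assuming a hypothetical nonmetric variable `region` with "north" chosen as the reference category. The helper name `dummy_code` and the `is_` column prefix are illustrative, not taken from the text.

```python
def dummy_code(values, reference):
    """Create one 0/1 indicator per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical nonmetric variable with three categories.
region = ["north", "south", "east", "north"]
coded = dummy_code(region, reference="north")
for original, row in zip(region, coded):
    print(original, row)
```

Rows for the reference category are all zeros, which is why a coefficient on `is_south` is read as the difference between "south" and "north" rather than as an absolute level.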
Mediation
• Occurs when the effect of an independent variable may “work through” an
intervening variable (the mediating variable) to predict the dependent variable.
• In this situation the independent variable may have a direct effect on the
dependent measure as well as an indirect effect through the mediating variable to
the dependent variable.
• Although most commonly associated with ANOVA and MANOVA models (see Chapter 6 for
an extended discussion), mediation can play an important role in defining the roles of
potential independent variables.
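One common way to write the direct and indirect effects, assuming linear relationships, is shown below. The symbols (X for the independent variable, M for the mediator, Y for the dependent variable, and the path labels a, b, c') are conventional notation introduced here for illustration, not notation from the slides.

```latex
\begin{align}
M &= i_1 + aX + e_1 \\
Y &= i_2 + c'X + bM + e_2 \\
\text{total effect of } X &= \underbrace{c'}_{\text{direct}} + \underbrace{ab}_{\text{indirect}}
\end{align}
```

Under these assumptions the indirect effect is the product of the path from X to M and the path from M to Y.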
Designation of Mediation Effects
• Designation of a mediation effect is a conceptual decision by the researcher as it
has little or no impact on the effects of other independent variables.
STAGE 3: ASSUMPTIONS IN
MULTIPLE REGRESSION ANALYSIS
❑ Assessing Individual Variables Versus the Variate
❑ Methods of Diagnosis
❑ Linearity of the Phenomenon
❑ Constant Variance of the Error Term
❑ Normality of the Error Term Distribution
❑ Independence of the Error Terms
Normality
• Applies to error terms/residuals, but remedies are to the variables themselves.
• Graphical diagnostic – normal probability plot.
• Regression generally considered robust to violations of normality when sample
size exceeds 200.
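The coordinates of a normal probability plot can be sketched as follows. The residual values are hypothetical, and the plotting position (i + 0.5)/n is one common convention assumed here; software packages may use a slightly different one.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each ordered residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]


# Hypothetical residuals from a fitted regression.
points = normal_probability_points([0.3, -1.2, 0.1, 0.8, -0.4])
for theoretical, observed in points:
    print(round(theoretical, 3), observed)
```

If the residuals are approximately normal, the plotted points fall close to a straight line; systematic curvature suggests a departure from normality.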
Variable Specification
Two options
• Use variables in their original form
• Allows for use of direct measures of the variables of interest.
• As number of variables increases, interpretability may become problematic.
User-controlled
• Confirmatory (Simultaneous)
• the only method to allow direct testing of a pre-specified model.
• also the most complex from the perspectives of specification error, model parsimony and
achieving maximum predictive accuracy.
• Combinatorial (All-Possible-Subsets)
• provides control by allowing the researcher to review the entire set of roughly equivalent
models in terms of predictive accuracy.
[Figure: the total deviation of Y from its average, split into the deviation explained by regression and the deviation not explained by regression.]
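The decomposition of deviation can be verified numerically. The sketch below fits a simple regression to hypothetical data and computes the total, explained, and unexplained sums of squares; the function names are illustrative.

```python
def simple_regression(x, y):
    """Least squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def deviation_sums(x, y):
    """Return (total, explained, unexplained) sums of squared deviations."""
    b0, b1 = simple_regression(x, y)
    mean_y = sum(y) / len(y)
    predicted = [b0 + b1 * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)
    explained = sum((pi - mean_y) ** 2 for pi in predicted)
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return total, explained, unexplained


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, explained, unexplained = deviation_sums(x, y)
r_squared = explained / total  # share of total deviation explained by regression
```

For a least squares fit with an intercept, the explained and unexplained sums add up to the total, and their ratio to the total gives the coefficient of determination.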
Three basic types based upon the nature of their impact on the results:
• Outliers are observations that have large residual values and can be identified
only with respect to a specific regression model.
• Leverage points are observations that are distinct from the remaining
observations based on their independent variable values.
• Influential observations are the broadest category, including all observations that
have a disproportionate effect on the regression results. Influential observations
potentially include outliers and leverage points but may include other
observations as well.
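The difference between an outlier and a leverage point can be illustrated with a simple regression. This sketch uses hypothetical data and the standard simple-regression formulas for leverage (hat values) and standardized residuals; no cutoff values are applied, and the names are illustrative.

```python
import math


def leverage_and_residuals(x, y):
    """Return (leverage, standardized residuals) for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error of estimate
    # Leverage depends only on the independent variable values.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return leverage, standardized


# Hypothetical data: the last point is far out on x, the fourth is poorly predicted.
x = [1, 2, 3, 4, 5, 6, 20]
y = [1.1, 1.9, 3.2, 7.0, 5.1, 5.9, 20.0]
leverage, standardized = leverage_and_residuals(x, y)

highest_leverage = max(range(len(x)), key=lambda i: leverage[i])
largest_residual = max(range(len(x)), key=lambda i: abs(standardized[i]))
```

Here the observation with the most extreme x value has the highest leverage but a small residual, while the poorly predicted observation has the largest residual but ordinary leverage, so the two diagnostics flag different cases.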
Assessing Multicollinearity
Multicollinearity
• relationship between two (collinearity) or more (multicollinearity) independent
variables. Multicollinearity occurs when any single independent variable is highly
correlated with a set of other independent variables.
Identifying Multicollinearity
Variance Inflation Factor (VIF) – measures how much the variance of the regression
coefficients is inflated by multicollinearity. The square root of the VIF is the
factor by which the standard error of a coefficient is inflated.
• A VIF of 1 (its minimum value) indicates no correlation between the independent measures.
• As the VIF increases it indicates a higher degree of association between the predictor variables.
• For example, a VIF only slightly above 1 is generally not enough to cause problems.
• However, a VIF value of 5 is generally thought to be the maximum acceptable; anything higher would
indicate a problem with multicollinearity.
Tolerance – the amount of variance of an independent variable that is not explained by the
other independent variables (i.e., an independent variable is considered a dependent
variable, predicted by all the other independent variables).
• Small values for tolerance indicate problems of multicollinearity.
• The cutoff value for tolerance is typically .20 (equivalent to a VIF of 5); values less than .20
indicate a problem of multicollinearity.
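Tolerance and VIF are two views of the same quantity, which a short sketch makes concrete. Here `r_squared` is assumed to be the R² obtained when one independent variable is regressed on all the other independent variables; the function names are illustrative.

```python
import math


def tolerance(r_squared):
    """Share of an independent variable's variance not explained by the others."""
    return 1 - r_squared


def vif(r_squared):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1 / tolerance(r_squared)


def std_error_inflation(r_squared):
    """Factor by which the coefficient's standard error is inflated."""
    return math.sqrt(vif(r_squared))


# Hypothetical R² values from regressing one predictor on the others.
for r2 in (0.0, 0.5, 0.8):
    print(r2, round(tolerance(r2), 2), round(vif(r2), 2), round(std_error_inflation(r2), 2))
```

An R² of .80 gives a tolerance of .20 and a VIF of 5, which is why the two cutoffs describe the same threshold.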
Effects of Multicollinearity
All impacts arise from the shared variance among variables, which cannot be
attributed to any single variable.
Impacts on Estimation
• Decrease in explained variance – as multicollinearity increases, the unique explanatory
effects of variables decline, and with them the overall predicted variation (R²).
• Singularity – if multicollinearity reaches 1.0 (perfect), it precludes model estimation.
• Increases in standard error – as shown by VIF, multicollinearity increases standard
errors and makes it more difficult to achieve statistical significance.
• Reversal of signs of Coefficients – signs can “reverse” from bivariate relationships.
Impacts on Explanation
• Since coefficients only represent unique explanation, multicollinearity can obscure
the total effect of a variable, which requires newer measures of relative importance.
❑ Multilevel Models
❑ Panel Models
Multilevel Models
Unified framework for addressing many of the statistical issues which occur
naturally when hierarchical/nested data structures are present
Background
• Context – any external factor outside the unit of analysis that:
• impacts the outcome of multiple individuals.
• creates differences between separate contexts and fosters dependencies within a single context.
• Hierarchical data structure – observations which have a natural nesting effect created
by contexts, with Level-1 observations nested within contexts represented by Level-2.
• Multilevel model (MLM) – extension of regression analysis that allows for the
incorporation of both individual (Level-1) and contextual (Level-2) effects with the
appropriate statistical treatment.
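A minimal two-level specification, written in conventional multilevel notation, illustrates how individual and contextual effects enter the same model. The symbols are assumptions introduced here: i indexes Level-1 observations, j indexes Level-2 contexts, x is an individual-level predictor, and w is a context-level predictor. This random-intercept form is only one of several possible specifications.

```latex
\begin{align}
\text{Level 1:}\quad y_{ij} &= \beta_{0j} + \beta_{1} x_{ij} + e_{ij} \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01} w_{j} + u_{0j}
\end{align}
```

The context-specific term u_{0j} is shared by all observations in context j, which is how the model represents dependencies within a single context.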
Panel Models
Similarity to MLM
• Unit of observation (e.g., individual, class, firm) becomes a group (Level-2) with
multiple observations (Level-1).
• Accommodates serial correlation inherent in longitudinal data.
Types of Models
• Basic model – simple pooled regression, which disregards the interdependencies
among observations within a unit of analysis.
• Unit-specific results (similar to the random effects in the multilevel models) where
intercepts, coefficients or both vary by unit.
• Unique model – time-dependent effects model, where the intercept and
coefficients may vary over time as well.
Adding Time
• Panel models also provide for estimating time-variant effects just as was possible
for unit-specific effects.
• Requires at least enough time periods for a basic relationship to be estimated (five
or more).
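The contrast between the pooled model and a unit-specific model can be sketched with a toy panel. The data are hypothetical (two units, five time periods each), and unit-specific intercepts are handled here by demeaning within each unit, a common technique chosen for this illustration rather than one named in the slides.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def demean(values):
    """Subtract the unit's own mean, removing its unit-specific intercept."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical panel: within each unit y rises one-for-one with x,
# but the units have different intercepts.
panel = {
    "unit_a": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
    "unit_b": ([6, 7, 8, 9, 10], [2, 3, 4, 5, 6]),
}

# Pooled regression ignores which unit each observation belongs to.
pooled_x = [v for x, _ in panel.values() for v in x]
pooled_y = [v for _, y in panel.values() for v in y]
pooled_slope = slope(pooled_x, pooled_y)

# Within-unit regression allows each unit its own intercept.
within_x = [v for x, _ in panel.values() for v in demean(x)]
within_y = [v for _, y in panel.values() for v in demean(y)]
within_slope = slope(within_x, within_y)
```

In this toy panel the pooled slope understates the within-unit relationship because it mixes differences between units with changes over time inside each unit.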
Stage 1: Objectives
• Predict customer satisfaction based on customers' perceptions of HBAT's
performance and identify the factors that lead to increased satisfaction.
Stage 2: Research Design
• Thirteen independent variables (X6 to X18).
• Meets minimum ratio of observations per variable – 7:1 with adequate power.
Stage 3: Assumptions
• Linearity – graphical analysis did not reveal nonlinear relationships.
• Homoscedasticity – only two variables (X6 and X17) had minimal violations.
• Normality – six variables indicated violations, thus requiring further analysis.
Stepwise Results
• R² – .791
• Standard error of the estimate – .559
Influential Plot
• Eight observations qualify as outliers, but still have acceptable leverage.
• Four observations have high leverage, but are well predicted by the model.
• No observations are outliers with high leverage.
Comparison of 4 Models
Learning Checkpoints