Leeds University Business School

LUBS5902M

Data Analysis in International Business

Lecture 6: Linear Regression Assumptions

Emma LIU
Overview

• Understand the SPSS output
– Parameter estimates / coefficients (a, b)
– Significance level
– Standardized coefficients
– Link SPSS output to hypothesis testing in research
• Model selection based on goodness of fit
– Use R-square for regression with a single predictor
– Use adjusted R-square for regression with multiple predictors
• Linear regression assumptions
– Be able to assess whether a statistical model violates the assumptions and suggest improvements



Exercise

• Researchers aim to explore the factors influencing firm performance. The four predictors are:
– Patent number
– Industry experience of top management team
– International experience of top management team
– Firm size
• The SPSS outputs are displayed as follows.



Exercise

• Write down the regression model
• Explain the impact of each predictor on performance.
• Explain R-square and adjusted R-square. Which is more appropriate to use in this model?
• Business managers want to know how this result can help them improve performance. Can you give some practical suggestions?



Exercise

• Write down the regression model

• Explain the impact of each predictor on performance
– Industry experience: positive, insignificant
– International experience: positive, significant
– Firm size: positive, significant
– Patent: positive, significant
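A general form of the model, with the four predictors from the exercise written under hypothetical short names (Patent, IndExp, IntExp, Size). The estimated values of a and b_1 to b_4 come from the SPSS Coefficients table, which is not reproduced here:

```latex
\text{Performance}_i = a + b_1\,\text{Patent}_i + b_2\,\text{IndExp}_i + b_3\,\text{IntExp}_i + b_4\,\text{Size}_i + e_i
```

Under this notation, the results above correspond to b_1, b_3 and b_4 being positive and significant, and b_2 being positive but not significant.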



Exercise

• Explain R-square and adjusted R-square. Which is more appropriate to use in this model?
– Adjusted R-square, because this is a multiple linear regression with four predictors
• Business managers want to know how this result can help them improve performance. Can you give some practical suggestions?
– Improve innovation capability by developing patents;
– Hire or train top management team members with more international experience;
– Increase firm size through expansion or M&As;
– …
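A minimal sketch of the two goodness-of-fit measures, using the standard formulas R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) for n cases and k predictors. The R² = 0.40 and n = 50 below are hypothetical values, not the SPSS output from this exercise.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k, given n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: R^2 = 0.40 from a model with four predictors and 50 firms
print(round(adjusted_r_squared(0.40, n=50, k=4), 4))  # prints 0.3467
```

Because adding a predictor never lowers R² in an OLS model with an intercept, adjusted R² is the fairer basis for comparing models with different numbers of predictors.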



Linear Regression Assumptions

• Linear relationship
• Multivariate normality
• Lack of multicollinearity
• No auto-correlation (independence of residuals/errors)
• Homoscedasticity (constant variance)



Linear Assumption

• Linear regression requires the relationship between the independent and dependent variables to be linear.
• It is also important to check for outliers, since linear regression is sensitive to outlier effects.
• The linearity assumption is best tested with scatter plots between the independent variable(s) (x) and the dependent variable (y).
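The lecture checks linearity and outliers visually. As a complementary numeric check (not part of the lecture), the sketch below fits y = a + bx by least squares and flags cases whose standardized residual (residual divided by the residual standard error) exceeds 3 in absolute value; the cut-off of 3 is a commonly used rule of thumb and an assumption here, and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def flag_outliers(x, y, threshold=3.0):
    """Indices of cases whose standardized residual exceeds the threshold in absolute value."""
    a, b = fit_simple_ols(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (intercept and slope estimated)
    s = (sum(e ** 2 for e in residuals) / (len(residuals) - 2)) ** 0.5
    if s == 0:
        return []  # perfect fit: no residual spread, so nothing to flag
    return [i for i, e in enumerate(residuals) if abs(e / s) > threshold]


# Hypothetical data: y = 2x for x = 1..20, except one case pushed far above the line
x = list(range(1, 21))
y = [2 * xi for xi in x]
y[9] = 50  # the case x = 10 would be 20 on the line
print(flag_outliers(x, y))  # prints [9]
```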



Linearity & Test Outliers



Examples





Multivariate Normality

• Linear regression analysis requires all variables to be multivariate normal; in practice, this is checked on the errors (residuals), which should be normally distributed (a bell curve).
• This assumption is best checked with a histogram of residuals or a P-P plot.
• A P-P plot is a probability plot for assessing how closely two data sets agree: it plots their two cumulative distribution functions against each other
– e.g., comparing the distribution of the residuals (errors) with a theoretical normal distribution
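A minimal sketch of how the points of a normal P-P plot can be computed from residuals: each sorted residual's empirical cumulative probability is paired with the normal CDF of its standardized value, and points close to the 45-degree line suggest normality. The plotting position i/(n + 1) is one common convention and an assumption here (SPSS may use a different one), and `max_gap_from_diagonal` is a hypothetical summary, not an SPSS statistic.

```python
from statistics import NormalDist


def pp_plot_points(residuals):
    """(observed, expected) cumulative probabilities for a normal P-P plot of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((e - mean) ** 2 for e in residuals) / (n - 1)) ** 0.5
    standard_normal = NormalDist()  # mean 0, sd 1
    points = []
    for i, e in enumerate(sorted(residuals), start=1):
        observed = i / (n + 1)                           # empirical cumulative probability
        expected = standard_normal.cdf((e - mean) / sd)  # normal CDF of the standardized residual
        points.append((observed, expected))
    return points


def max_gap_from_diagonal(points):
    """Largest vertical distance from the 45-degree line; small values suggest normality."""
    return max(abs(obs - exp) for obs, exp in points)


# Hypothetical, roughly symmetric residuals
pts = pp_plot_points([-2.0, -1.0, 0.0, 1.0, 2.0])
print(pts[2])  # prints (0.5, 0.5): the median residual sits on the diagonal
```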



Test Normality

The histogram is not normally distributed; it is skewed to the right.
The observed probability distribution of residuals does not match the expected normal distribution.



Example (IFDI)



Example (Performance)



Multivariate Normality

• When the data are not normally distributed, a non-linear transformation, e.g., a log transformation, might fix this issue.
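A minimal sketch of the log transformation on a right-skewed variable, comparing skewness before and after. The firm-size values are hypothetical, the skewness formula is the simple moment version (mean of cubed z-scores, no small-sample correction), and a log transformation requires all values to be strictly positive.

```python
import math


def skewness(values):
    """Moment skewness: mean of cubed z-scores (population sd, no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / sd) ** 3 for v in values) / n


# Hypothetical right-skewed variable, e.g. firm size in employees (all values > 0)
firm_size = [10, 12, 15, 20, 25, 40, 60, 100, 250, 1000]
log_size = [math.log(v) for v in firm_size]

print(round(skewness(firm_size), 2), round(skewness(log_size), 2))
```

In this example the log transformation reduces the right skew substantially but does not remove it entirely, so the residual plots should be rechecked after transforming.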





Lack of multicollinearity

• Multicollinearity occurs when the independent variables are not independent of each other, i.e., they are highly correlated with one another.

• Multicollinearity might be tested with the following criteria:
– Correlation matrix: compute the matrix of Pearson's correlation coefficients among all independent variables
– Variance Inflation Factor (VIF): the variance inflation factor of predictor $x_j$ is defined as $\mathrm{VIF}_j = 1/(1 - R_j^2)$. A VIF > 10 indicates that multicollinearity may be present; with VIF > 100 there is certainly multicollinearity in the sample. [Note: $R_j^2$ in $\mathrm{VIF}_j$ is the $R^2$ from the regression of $x_j$ on the other predictors (a regression that does not involve the dependent variable $y$).]
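A minimal sketch of both criteria in plain Python. It uses the standard identity that the VIFs equal the diagonal elements of the inverse of the predictors' correlation matrix, which gives the same values as 1/(1 − R_j²) from regressing each x_j on the others. The matrix inversion is a small hand-written Gauss-Jordan routine (not a library call) and fails if the predictors are perfectly collinear; the two predictors below are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns):
    """Matrix of pairwise Pearson correlations among the predictor columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF_j = j-th diagonal element of the inverse predictor correlation matrix = 1 / (1 - R_j^2)."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical predictors with Pearson r = 0.8, so each VIF should be 1 / (1 - 0.64)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print([round(v, 2) for v in vifs([x1, x2])])  # prints [2.78, 2.78]
```

With only two predictors R_j² is simply r², which is why both VIFs equal 1/(1 − 0.8²); with more predictors the inverse-matrix route avoids running a separate auxiliary regression for each x_j.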

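• If multicollinearity is found in the data, centering the data (that is, subtracting the mean score) might help to solve the problem, particularly when the collinearity comes from interaction or squared terms built from the same variables.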
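A minimal sketch of why centering helps in one specific case: a predictor and its own square. Subtracting a constant does not change the correlation between two distinct predictors, so centering mainly helps when collinearity arises between a variable and terms built from it (squares or interactions). The predictor below is hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictor x and its square, as in a model with a quadratic term
x = list(range(1, 11))
raw_r = pearson(x, [v ** 2 for v in x])

# Center x by subtracting its mean, then build the squared term from the centered values
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]
centered_r = pearson(xc, [v ** 2 for v in xc])

print(round(raw_r, 3), round(centered_r, 3))  # prints 0.975 0.0
```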



Test Multicollinearity



Example (IFDI)



Example (Performance)





No auto-correlation

• Autocorrelation occurs when the residuals are not independent of each other.
• This typically occurs in time-series data such as stock prices, where the price is not independent of the previous price.

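• Autocorrelation can be tested with the Durbin-Watson test.
– d lies between 0 and 4; d ≈ 2 indicates no autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation
– Rule of thumb: 1.5 < d < 2.5 indicates no auto-correlation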
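A minimal sketch of the Durbin-Watson statistic, d = Σ_{t=2..n}(e_t − e_{t−1})² / Σ_{t=1..n} e_t², combined with the lecture's 1.5 < d < 2.5 rule of thumb. The residuals must be in time order; the values below are hypothetical, and `interpret_dw` is an illustrative helper, not an SPSS function.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2, with residuals in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def interpret_dw(d, lower=1.5, upper=2.5):
    """Apply the 1.5 < d < 2.5 rule of thumb."""
    if lower < d < upper:
        return "no autocorrelation"
    return "positive autocorrelation" if d <= lower else "negative autocorrelation"


# Hypothetical residuals ordered in time: runs of same-signed errors
e = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
d = durbin_watson(e)
print(round(d, 3), interpret_dw(d))  # prints 0.667 positive autocorrelation
```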



Examples





Homoscedasticity

• Homoscedasticity means the error terms (i.e., residuals) have equal variance along the regression line; that is, they have constant variance.
• The scatter plot of standardized predicted values (y-hat) against standardized residuals is a good way to check whether homoscedasticity holds.
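The lecture checks homoscedasticity visually. As a rough numeric companion, swapped in here in the spirit of the Goldfeld-Quandt test (without its formal F-test, and not the lecture's method), the sketch below sorts cases by fitted value and compares the residual variance of the upper half with that of the lower half. No cut-off is implied: a ratio near 1 is consistent with constant variance, while a ratio far from 1 hints at the funnel shape seen in heteroscedastic scatter plots. The data are hypothetical.

```python
def sample_variance(values):
    """Sample variance with n - 1 in the denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def spread_ratio(fitted, residuals):
    """Residual variance for the upper half of fitted values divided by that of the lower half."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return sample_variance(high) / sample_variance(low)


# Hypothetical funnel-shaped residuals: spread grows with the fitted value
fitted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
residuals = [0.1, -0.1, 0.2, -0.2, 0.3, -1.0, 1.2, -1.5, 2.0, -2.5]
print(round(spread_ratio(fitted, residuals), 1))
```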



Heteroscedasticity

• If the data are heteroscedastic, the scatter plots look like the following examples:



Homoscedasticity
Examples



Q&A

• Which of the following points reflects the assumption of lack of multicollinearity?

a. Data must be normally distributed and not skewed
b. There must not be any extreme scores in the data set
c. The dependent variable cannot be a combination of
other independent variables
d. The variance across your variables must be equal
e. The relationship between your independent variables
must not be above |r|=0.75



Answer

• e. The relationship between your independent variables must not be above |r| = 0.75


Q&A

• The assumption that the variance of the residuals about the predicted dependent variable scores should be the same for all predicted scores reflects which assumption?

a. Homoscedasticity
b. Multicollinearity
c. Normality
d. No auto-correlation
e. Linear relationship



Answer

• a. Homoscedasticity


Q&A

• What does a P-P plot help you to test?

a. Linearity
b. Auto-correlation
c. Homoscedasticity
d. Normality
e. Multicollinearity



Answer

• d. Normality


Q&A

• The following two P-P plots can be used to ascertain whether or not the residuals for the regression analysis are normally distributed. Which of the two P-P plots might give you cause for concern?

a. Plot A
b. Plot B





Q&A

• Examining a scatter plot of residuals against predicted outcomes can help you identify the issue of

a. Linearity
b. Outlier
c. Heteroscedasticity
d. Normality
e. Multicollinearity



Answer

• c. Heteroscedasticity


Q&A

• A single case which has a disproportionately strong influence over your regression model is called

a. Linearity
b. Outlier
c. Homoscedasticity
d. Normality
e. Multicollinearity



Answer

• b. Outlier


Q&A

• The “Variance Inflation Factor (VIF)” can be used to identify the issue of

a. Linearity
b. Outlier
c. Heteroscedasticity
d. Normality
e. Multicollinearity



Answer

• e. Multicollinearity


Q&A

• When two or more predictor variables are highly correlated with one another, this causes the issue of

a. Linearity
b. Outlier
c. Heteroscedasticity
d. Normality
e. Multicollinearity



Answer

• e. Multicollinearity


Linear Regression Assumptions

• Linear relationship
– Scatter plot to identify the linear relationship and outliers
• Multivariate normality
– P-P plot or histogram to test the normality of residuals/errors
• Lack of multicollinearity
– Correlation matrix and VIF
• No auto-correlation (independence of residuals/errors)
– Durbin-Watson test
• Homoscedasticity (constant variance)
– Scatter plot (standardized y-hat vs. standardized residual)



Exercise

• A study is conducted to examine the factors influencing the performance of cross-border M&As (measured by the difference in the acquirer's ROA between before the M&A and 3 years after it). Three factors are considered:
– Cultural distance (“CD”, measured by Hofstede’s cultural dimensions using the formula )
– Linguistic distance (“LD”, measured as in Dow and Karunaratna (2006))
– Geographic distance (“GD”, measured by the distance between the HQs in km)
• The SPSS outputs are included in the following pages. Evaluate
the use of linear regression against the five assumptions.



Answer

• Linear relationship
– The scatter plot (top-right figure) shows a linear but flat relationship, with a few potential outliers.
• Multivariate normality
– The P-P plot shows that the distribution of residuals is not consistent with the expected normal distribution.
– The histogram of residuals (bottom-right figure) shows that the residuals are not normally distributed (not a bell curve).
• No auto-correlation (independence of residuals/errors)
– Durbin-Watson test (reported in the Model Summary box)



Answer

• Lack of multicollinearity
– Correlation matrix
– All VIF values are below 10.
• Homoscedasticity (constant variance)
– Scatter plot (standardized y-hat vs. standardized residual in the bottom
left figure) shows that there is no clear pattern, except for a few outliers.
