Case 2, in Class, Correlation and Multiple Regression (R)
1. Is there any association between the firm’s income and manager’s age,
work experience, & company’s age?
As the DV (firm income) and the IVs (manager age, work experience, and
company age) are all scale variables, we will use Pearson correlation.
H0: Firm’s income and manager’s age, work experience, & company’s age
are independent
H1: Firm’s income and manager’s age, work experience, & company’s age
are NOT independent
From the correlation table, the sig. value (p-value) for the correlation
between the firm’s income and each of manager’s age, work experience, and
company’s age is reported as 0.000 (that is, p < 0.001), which is less than
0.01. Therefore, we have enough evidence to reject H0 in favor of H1: all
IVs are significantly related to the DV. In addition, the correlation
between the dependent variable and the manager’s age is moderate
(0.479), the correlation with the work experience is strong (0.620), and the
correlation with the company’s age is weak to moderate (0.316). Since every
IV is linearly associated with the DV, we can proceed with linear regression.
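
To make the test behind the correlation table concrete, here is a minimal
Python sketch of the Pearson coefficient and its t statistic (tested against
n - 2 degrees of freedom). The function names and the five data points are
invented for illustration; they are not the case dataset, and SPSS remains
the tool used in this answer.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up illustration data (assumption), NOT the case dataset.
experience = [1, 2, 3, 4, 5]
income = [2, 1, 4, 3, 5]

r = pearson_r(experience, income)
t = t_statistic(r, len(experience))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(experience) - 2}")
```

The p-value shown by SPSS is the two-tailed probability of a t statistic at
least this large in absolute value under H0.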
2. Find the suitable linear regression that allows you to predict the firm
income as a function of manager’s age, work experience, company’s age.
Provide the necessary explanation of the various assumptions and
provide the conclusion to be provided to the CEO.
Assumption no. 1:
First, we plot a scatter graph of the dependent variable against each
independent variable, and then we add the loess line. Select “Graphs” →
“Legacy dialogs” → “Scatter/dot” → “Simple scatter” → “Define” → Move
“manager’s age” to “X Axis” and “firm’s income” to “Y Axis” → OK. Repeat
the same steps for the work experience and company’s age.
Double-click the scatter plot → Click on the icon “Add Fit Line at Total” →
select “Loess” → Apply, then in “confidence intervals” select “individual”
and enter 95% → Unselect “Attach label to line” → Apply → Close.
As can be seen from the graphs, all the relationships are very close to
linear, so assumption no. 1 has been met.
The normality of the residuals can be checked by looking at the Normal P-P
plot of the standardized residuals for the model.
As shown in the figure, the distribution of the standardized residuals is far
from normal. So, assumption no. 4 (normality of residuals) has not
been met.
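
As a rough illustration of what a P-P plot compares, the Python sketch below
pairs the observed cumulative proportion of each sorted standardized residual
with the cumulative probability expected under a normal distribution. The
plotting position (i - 0.5)/n, the function names, and both residual sets are
assumptions made for illustration; SPSS may use a different plotting position,
and these are not the case residuals.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    z = sorted((e - m) / s for e in residuals)  # standardized, ascending
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is an assumption of this sketch.
    observed = [(i - 0.5) / n for i in range(1, n + 1)]
    expected = [std_normal.cdf(v) for v in z]
    return list(zip(observed, expected))

def max_pp_deviation(residuals):
    """Largest vertical gap between the P-P points and the diagonal."""
    return max(abs(o - e) for o, e in pp_points(residuals))

# Made-up residuals (assumption), NOT the case data.
n = 50
near_normal = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [i ** 4 for i in range(1, n + 1)]

print("near-normal:", round(max_pp_deviation(near_normal), 3))
print("skewed:     ", round(max_pp_deviation(skewed), 3))
```

Points that stay close to the diagonal indicate approximately normal
residuals; a large gap, as for the skewed set, is the kind of pattern
described above.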
Looking at the COO_1 variable in the original dataset, all Cook’s distances
are below 1, which means there are no influential outliers. So, assumption
no. 5 has been met.
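
For readers who want to see where a Cook’s distance comes from, here is a
minimal Python sketch for a regression with a single predictor (kept to one
predictor for brevity; the model in this case has three, and SPSS computes
COO_1 for the full model). The function name and the data are hypothetical
and were chosen so that the last case is clearly influential.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for a, e in zip(x, residuals):
        leverage = 1 / n + (a - mean_x) ** 2 / sxx  # diagonal of the hat matrix
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Made-up data (assumption), NOT the case dataset: the last case is extreme.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2, 4, 6, 8, 10, 12, 14, 10]

d = cooks_distances(x, y)
flagged = [i for i, value in enumerate(d) if value > 1]
print("cases with Cook's distance above 1:", flagged)
```

The same rule of thumb used above (a distance above 1 flags a case) is what
the last line applies.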
The Variance Inflation Factor (VIF) should ideally be close to 1; values
under 5 are acceptable, whereas values of 10 or more need checking. In our
case all VIF values are less than 2. Tolerance values should be higher than
0.2, which is also true in our case.
In addition, all the Pearson correlation values among the independent
variables in the Correlations table (answer 1) are less than 0.8. So, we do
not face the problem of multicollinearity, where continuous independent
variables are too highly correlated. Therefore, assumption no. 6 has been met.
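
The two rules of thumb are linked, because tolerance is the reciprocal of
VIF. The short Python sketch below shows the relationship, where r_squared
stands for the R-squared obtained when one independent variable is regressed
on the others. The function names and the sample R-squared values are
illustrative assumptions, not output from the case dataset.

```python
def tolerance(r_squared):
    """Tolerance of a predictor: share of its variance NOT explained by the other predictors."""
    return 1.0 - r_squared

def vif(r_squared):
    """Variance Inflation Factor: reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared)

# Illustrative R-squared values (assumption), not taken from the case output.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R^2 = {r2:.1f}  tolerance = {tolerance(r2):.2f}  VIF = {vif(r2):.2f}")
```

A tolerance of 0.2 corresponds to a VIF of 5, so the two cut-offs quoted
above express the same limit. A VIF below 2 implies a tolerance above 0.5.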
Conclusion 1:
A multiple linear regression was carried out to test whether manager’s age,
work experience, and company’s age significantly predict the income of the
company. The residuals showed non-normal behavior and a shape close to a
funnel, which suggests heteroscedasticity. The results of the regression
indicated that the model explained 41.3% of the variance in the company’s
income. The results also showed that the independent variables can be used
together to reliably predict the income (ANOVA p < 0.05). The final
predictive model was:
Income = -8.655 + 0.82*manager age + 2.817*work experience +
0.246*company age.
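
To show how the fitted equation would be used, here is a small Python sketch
that plugs values into it. The coefficients are the ones reported above. The
function name and the input values (a 40-year-old manager with 10 years of
experience in a 20-year-old company) are hypothetical, and the units are
whatever units the dataset uses, which this answer does not state.

```python
# Coefficients copied from the reported model.
INTERCEPT = -8.655
B_MANAGER_AGE = 0.82
B_WORK_EXPERIENCE = 2.817
B_COMPANY_AGE = 0.246

def predict_income(manager_age, work_experience, company_age):
    """Predicted firm income from the three-predictor model."""
    return (INTERCEPT
            + B_MANAGER_AGE * manager_age
            + B_WORK_EXPERIENCE * work_experience
            + B_COMPANY_AGE * company_age)

# Hypothetical inputs (assumption), not a row from the dataset.
example = predict_income(manager_age=40, work_experience=10, company_age=20)
print(f"predicted income: {example:.3f}")
```

Because the residual assumptions were not fully met, a prediction of this
kind should be read as a point estimate only.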
Conclusion 2:
It should be noted that the company age was found not to be a significant
predictor of the income. Hence, we should repeat the test without the company
age. In summary, we obtain a Durbin-Watson (DW) value of 2.063, with a
funnel shape of residuals, which are not normally distributed. This confirms
the results of the previous case. The R-squared is 0.412, which is very
close to the previous one (0.413), so dropping the company age costs almost
no explanatory power. The ANOVA test is satisfactory and all
coefficients are statistically significant. So, we can use this model: