Case 2, in Class, Correlation and Multiple Regression (R)

The document discusses a multiple linear regression analysis conducted by Complex Systems, Inc. (CSI) to determine the factors influencing firm income, specifically examining the relationships between manager's age, work experience, and company age. The analysis revealed that while manager's age and work experience significantly predict income, company age does not contribute significantly. The final predictive model was established as Income = -10.986 + 0.945*Manager Age + 2.817*Work Experience, with an R-squared value indicating that approximately 41.2% of the variance in income can be explained by the independent variables.

Uploaded by

Khaled
Copyright
© All Rights Reserved

College of Business Administration

BITM 350 Fundamentals of Data Analytics


Fall 2022 - 2023

Given: Multiple linear regression

Complex Systems, Inc. (CSI) is an organization that provides technical MIS solutions to companies looking to improve the flow of information among their departments, which could lead to considerable gains in efficiency and responsiveness. CSI performs a lengthy and complicated process in order to reach the stage where it can offer a suitable MIS solution to its client. This raises the cost of producing the consulting service, and the high cost makes clients reluctant to make the “Buy” decision. CSI wants to analyze a few variables to find out what could increase the likelihood that those clients make the “Buy” decision.

1. Is there any association between the firm’s income and manager’s age,
work experience, & company’s age?

As the DV (firm income) and the IVs (manager age, work experience, and
company age) are all scale variables, we will use Pearson correlation.
H0: Firm’s income and manager’s age, work experience, & company’s age
are independent

H1: Firm’s income and manager’s age, work experience, & company’s age
are NOT independent

From the correlation table, the Sig. value (p-value) between firm’s income and each of manager’s age, work experience, and company’s age is reported as 0.000, which is less than 0.01. Therefore, we have enough evidence to reject H0 in favor of H1: all three IVs are related to the DV. In addition, the correlation between the DV and manager’s age is moderate (0.479), the correlation with work experience is strong (0.620), and the correlation with company’s age is weak to moderate (0.316). Therefore, we can proceed with linear regression.
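The coefficients reported above come from the SPSS correlation table. As a minimal sketch of what a Pearson coefficient measures, the function below computes it from first principles. The data values are hypothetical and are not the CSI dataset; the function name `pearson_r` is also just an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the CSI data)
manager_age = [30, 35, 40, 45, 50, 55]
firm_income = [42, 40, 55, 48, 63, 60]
r = pearson_r(manager_age, firm_income)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear association and a value near 0 a weak one; the significance test (the Sig. column in SPSS) is a separate step that this sketch does not perform.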

2. Find a suitable linear regression model that allows you to predict the firm’s income as a function of manager’s age, work experience, and company’s age. Explain the various assumptions and state the conclusion to be presented to the CEO.

Assumption no. 1:

The relation between the dependent variable (firm’s income) and the independent variables (manager’s age, work experience, & company’s age) is linear.

First, we draw a scatter plot of the dependent variable against each independent variable, and then we add the loess line. Select “Graphs” → “Legacy dialogs” → “Scatter/dot” → “Simple scatter” → “Define” → Move “manager’s age” to “X Axis” and “firm’s income” to “Y Axis” → OK. Repeat the same steps for work experience and company’s age.

Double click on the scatter plot → Click on the icon “Add Fit Line at Total” → select “Loess” → Apply, then in “confidence intervals” select “individual” and enter 95% → Unselect “Attach label to line” → Apply → Close.

As can be seen from the graphs, all the relations are very close to linear, so assumption no. 1 has been met.

To test the other assumptions, we have to run the regression in SPSS:

To do this, click on Analyze → Regression → Linear → move “firm’s income” to “Dependent” and “manager’s age”, “work experience”, and “company’s age” to “Independent(s)”. Click “Statistics” → under “Residuals” select “Durbin-Watson” (assumption no. 2) → select “Collinearity diagnostics” (assumption no. 6) → “Continue”. Click “Plots” → move the ZPRED variable to the X axis and the ZRESID variable to the Y axis (assumption no. 3) → under “Standardized residual plots” select “Normal probability plot” (assumption no. 4) → “Continue”. Click “Save” in the main regression dialog box → under “Distances” select “Cook’s” (assumption no. 5) → “Continue” → OK in the main regression dialog box to run the analysis.

The Durbin-Watson value is 2.064, which is very close to 2, so assumption no. 2 has been met (residuals are independent).
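
SPSS reports the Durbin-Watson statistic directly. As a rough sketch of how it is defined, the function below divides the sum of squared differences between consecutive residuals by the sum of squared residuals. The statistic ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation. The residual lists are made up for illustration and the function name is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared differences between each residual and the one before it
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals (NOT from the CSI regression)
alternating = [1, -1, 1, -1]   # neighbours have opposite signs
trending = [1, 2, 3, 4]        # neighbours are similar
print(durbin_watson(alternating))  # 3.0, above 2: negative autocorrelation
print(durbin_watson(trending))     # 0.1, near 0: positive autocorrelation
```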

As shown in the figure, the residual plot shows a slight funnel shape. So, assumption no. 3 (the assumption of homoscedasticity) has been only partially met.

The normality of the residuals can be tested by looking at the P-P plot for
the model.
As shown in the figure, the distribution of the standardized residuals is far
from being normal. So, assumption no. 4 (normality of residuals) has not
been met.

Looking at the COO_1 variable saved to the dataset, all Cook’s distances are below 1, which means there are no influential outliers. So, assumption no. 5 has been met.
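
To show what the COO_1 values represent, here is a simplified sketch of Cook’s distance. It assumes a regression with a single predictor (so the leverage has a closed form and no matrix algebra is needed), whereas the case itself uses three predictors. The data are hypothetical, with the last point deliberately placed far from the others.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for a, e in zip(x, residuals):
        leverage = 1 / n + (a - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data (NOT the CSI dataset); the last point is an outlier
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 10.0]
d = cooks_distances(x, y)
flagged = [i for i, value in enumerate(d) if value > 1]
print(flagged)  # indices of points whose Cook's distance exceeds 1
```

With the rule used in this case (distance above 1 signals an influential point), only the deliberately distorted last observation is flagged in this toy example.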

The Variance Inflation Factor (VIF) should ideally be close to 1; values under 5 are acceptable, whereas values of 10 or more need checking. In our case, all VIF values are less than 2. Tolerance values should be higher than 0.2, which is also the case here.

In addition, all the Pearson correlation values among the independent variables in the Correlations table (answer 1) are less than 0.8. So, we do not face the problem of multicollinearity, where continuous independent variables are too highly correlated. Therefore, assumption no. 6 has been met.
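
The VIF and Tolerance columns are two views of the same quantity. As a small sketch: if R_j² is the R-squared obtained when predictor j is regressed on the other predictors, then Tolerance = 1 - R_j² and VIF = 1 / Tolerance. The function names and the input values below are illustrative assumptions, not SPSS output.

```python
def tolerance_from_r_squared(r_squared):
    """Tolerance of a predictor, given the R-squared of regressing it on the others."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 - r_squared

def vif_from_r_squared(r_squared):
    """Variance Inflation Factor, the reciprocal of tolerance."""
    return 1.0 / tolerance_from_r_squared(r_squared)

# Illustrative values only: a VIF below 2 corresponds to R_j^2 below 0.5,
# and a tolerance above 0.2 corresponds to R_j^2 below 0.8.
print(vif_from_r_squared(0.5))        # 2.0
print(tolerance_from_r_squared(0.5))  # 0.5
```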

Now, we have to analyze the regression output:

The adjusted R square = 0.413, which indicates a moderate relationship between the firm’s income and the 3 IVs. This means that about 41.3% of the variance in income is explained by the independent variables.

The Sig. value (p-value) of the ANOVA test is 0.000 < 0.05, which means that we have enough evidence that the independent variables, taken together, reliably predict the company’s income.

Manager age statistically significantly contributes to the model because its Sig. value is 0.000, which is less than 0.05.

Work experience statistically significantly contributes to the model because its Sig. value is 0.000, which is less than 0.05.

Company age does not statistically significantly contribute to the model because its Sig. value is 0.208, which is more than 0.05.

We can write the model as:

Income = -8.655 + 0.82*Manager Age + 2.817*Work Experience + 0.246*Company Age

Conclusion 1:

A multiple linear regression was carried out to test whether manager’s age, work experience, and company’s age significantly predict the income of the company. The residuals were not normally distributed and their plot showed a slight funnel shape, so the normality and homoscedasticity assumptions were not fully met. The results of the regression indicated that the model explained 41.3% of the variance in the company’s income. The results also showed that the independent variables can be used together to reliably predict the income (ANOVA p<0.05). The final predictive model was: Income = -8.655 + 0.82*Manager Age + 2.817*Work Experience + 0.246*Company Age.

Conclusion 2:

It should be noted that company age was found not to be a significant predictor of income. Hence, we should repeat the test without company age. In summary, the reduced model gives a DW value of 2.063, with a funnel shape in the residuals, which are not normally distributed. This confirms the results of the previous case. The R-squared is 0.412, which is very close to that of the previous model (0.413). The ANOVA test is satisfactory and all coefficients are statistically significant. So, we can use this model:

Income = -10.986 + 0.945*Manager Age + 2.817*work experience
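
As a brief worked example of how the reduced model could be applied, the sketch below encodes its coefficients in a function. The function name and the input values (a 40-year-old manager with 10 years of work experience) are hypothetical, and the income unit is whatever unit the original dataset uses, which the case does not state.

```python
def predict_income(manager_age, work_experience):
    """Predicted firm income from the reduced model reported in Conclusion 2."""
    intercept = -10.986
    coef_manager_age = 0.945
    coef_work_experience = 2.817
    return (
        intercept
        + coef_manager_age * manager_age
        + coef_work_experience * work_experience
    )

# Hypothetical inputs, chosen only to illustrate the arithmetic:
# -10.986 + 0.945*40 + 2.817*10 = -10.986 + 37.8 + 28.17
prediction = predict_income(manager_age=40, work_experience=10)
print(round(prediction, 3))  # about 54.984
```

Because the model explains only about 41% of the variance in income, such a prediction should be read as a rough estimate rather than a precise forecast.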
