Assignment-15 BA

Business Analysis homework

16. Using the data in the Excel file Credit Card Spending, develop a multiple linear
regression model for estimating the average credit card expenditure as a function of both the
income and family size. Predict the average expense of a family that has two members and an
income of $188,000 per annum, and another that has three members and an income of $39,000
per annum.

NO DATA FILE
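
Although the data file is not available, the procedure the question asks for can still be sketched. The Python below fits a two-predictor least squares model and then evaluates it for the two families in the question. It is only an illustration: the rows, the coefficients used to generate them, and the names `fit_ols` and `predict` are hypothetical placeholders, not the Credit Card Spending data or its results. Income is expressed in thousands of dollars, an assumption made here to keep the arithmetic well scaled.

```python
# Minimal ordinary least squares with two predictors, standard library only.
# Every number in `rows` and `y` is a MADE-UP placeholder, not the real data file.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Return [b0, b1, b2] from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, income_k, family_size):
    """Predicted expenditure = b0 + b1 * income + b2 * family size."""
    return coefs[0] + coefs[1] * income_k + coefs[2] * family_size

# Placeholder rows: (income in $ thousands, family size).
rows = [(40, 1), (60, 2), (85, 3), (120, 4), (150, 2), (200, 5)]
# Placeholder response generated from an arbitrary exact linear rule.
y = [500 + 20 * inc + 300 * fam for inc, fam in rows]

coefs = fit_ols(rows, y)
pred_a = predict(coefs, 188, 2)  # two members, $188,000 per annum
pred_b = predict(coefs, 39, 3)   # three members, $39,000 per annum
print(coefs, pred_a, pred_b)
```

With the real file, `rows` and `y` would be read from the spreadsheet instead, and the printed predictions would then answer the question.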

17. The Excel file Cereal Data provides a variety of nutritional information about 67 cereals
and their shelf location in a supermarket. Use regression analysis to find the best model that
explains the relationship between calories and the other variables. Investigate the model
assumptions and clearly explain your conclusions. Keep in mind the principle of parsimony!

Dependent: Calories


Independent   Model 1    Model 2    Model 3    Model 4    Model 5
Sodium        .104       -          -          .115       .345***
Fiber         .088       -          .105       -          -.329***
Carbs         .706***    .690***    .757***    .646***    -
Sugars        .926***    .917***    .964***    .883***    -
R Square      .742       .726       .72        .725       .228
F             44.652     84.702     57.659     58.940     10.721
P             < 0.001    < 0.01     < 0.01     < 0.01     < 0.01

[Regression output for MODEL 1 through MODEL 5]

18. The Excel file Salary Data provides information on current salary, beginning salary,
previous experience (in months) when hired, and total years of education for a sample of 100
employees in a firm.

a. Develop a multiple regression model for predicting current salary as a function of the other
variables.
R² = 0.803: the model explains about 80.3% of the variation in Current Salary, so the regression line fits the data strongly.

The Adjusted R² is 79.7%, showing that the model still fits the data well after
adjusting for the number of independent variables.
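
The adjusted figure can be checked by hand from the reported R². The short sketch below applies the standard adjusted R² formula, assuming n = 100 (the sample size given in the question) and k = 3 predictors; the function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R^2 = 0.803 as reported; n = 100 employees; k = 3 independent variables.
adj = adjusted_r_squared(0.803, n=100, k=3)
print(f"{adj:.3f}")  # 0.797
```

This agrees with the 79.7% quoted above.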

H0: The other variables do not significantly affect the Current Salary
H1: The other variables significantly affect the Current Salary

F = 130.521
P < 0.001
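
As a rough consistency check, the overall F statistic can be recovered from R² alone. The sketch below uses the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)) with the rounded R² = 0.803, n = 100 and k = 3; the function name is illustrative.

```python
def f_statistic(r2, n, k):
    """Overall F for a regression with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Inputs are the rounded values reported in the answer.
f_approx = f_statistic(0.803, n=100, k=3)
print(round(f_approx, 1))  # 130.4
```

The result is close to the reported F = 130.521; the small gap is presumably due to R² being rounded to three decimals.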
⇒ Reject H0 and accept H1: the regression is significant overall, so the other variables jointly affect the Current Salary.

+ Constant: t = 0.0001; P > 0.05 (1.000)
+ Beginning Salary: t = 15.203; P < 0.001
+ Previous Experience (months): t = -1.404; P > 0.05 (0.164)
+ Education (years): t = 2.045; P < 0.05 (0.044)

⇒ The constant (intercept) is NOT statistically significant.
⇒ Previous Experience is NOT a significant predictor of the Current Salary.

- Beginning Salary is the most important predictor of Current Salary.
- Education also contributes positively to predicting Current Salary, though its effect is smaller
than that of Beginning Salary.
- Previous Experience does not seem to be a significant predictor in this model.

b. Find the best model for predicting current salary using the t-value criterion.

We exclude Previous Experience because its t-value is not significant (t = -1.404, P = 0.164).

[Regression output for MODEL 2 and MODEL 3]

Dependent: Current Salary (entries are t-values)

Independent           Model 1      Model 2      Model 3
Beginning Salary      15.203***    15.261***    -
Previous Experience   -1.404       -            1.341
Education             2.045***     2.507***     6.894***
R Square              .803         .799         .315
F                     130.521      192.868      23.787
P                     < 0.01       < 0.01       < 0.001

Model 2 is the best model under the t-value criterion. It removes the non-significant variable
(Previous Experience) and keeps only the variables whose t-values show a real impact on
Current Salary. This follows the principle of dropping statistically insignificant variables,
which makes the model simpler while maintaining high accuracy.

The fit of Models 1 and 2 is nearly identical (R Square of .803 versus .799), but Model 2 has the
advantage of eliminating the irrelevant variable, making the model more concise and efficient.
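
The t-value screen used for part (b) can be sketched in a few lines of Python. The t-values are those reported for the full model; the cutoff of 1.985 is an assumption, namely the approximate two-sided 5% critical value of the t distribution with about 96 degrees of freedom (100 observations minus 4 estimated coefficients). The function name is illustrative.

```python
# Assumed cutoff: approximate two-sided 5% critical t for ~96 degrees of freedom.
T_CRITICAL = 1.985

# t-values reported for the full model (Model 1).
t_values = {
    "Beginning Salary": 15.203,
    "Previous Experience": -1.404,
    "Education": 2.045,
}

def split_by_t(t_values, critical):
    """Separate predictors whose |t| reaches the cutoff from those that do not."""
    keep = [name for name, t in t_values.items() if abs(t) >= critical]
    drop = [name for name, t in t_values.items() if abs(t) < critical]
    return keep, drop

keep, drop = split_by_t(t_values, T_CRITICAL)
print("keep:", keep)
print("drop:", drop)
```

In practice the model is refitted after each removal, because the remaining t-values change once a variable is dropped; here only Previous Experience falls below the cutoff, so a single step is enough.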

21. The Excel file Major League Baseball provides data on the 2010 season.

a. Construct and examine the correlation matrix. Is multicollinearity a potential problem?


The following pairs are correlated with ∣r∣ > 0.7, each at the 0.001 significance level:

- Wins and losses: r = −1.000 (perfect negative correlation)
- Wins and runs: r = 0.785 (strong positive correlation)
- Wins and runs batted in: r = 0.758 (strong positive correlation)
- Losses and runs: r = −0.785 (strong negative correlation)
- Losses and runs batted in: r = −0.758 (strong negative correlation)
- Runs and runs batted in: r = 0.995 (near-perfect positive correlation)
- Runs and home runs: r = 0.727 (strong positive correlation)
- Runs batted in and home runs: r = 0.759 (strong positive correlation)

⇒ Since ∣r∣ > 0.7 for every pair above, multicollinearity is a potential problem among these variables.
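
The ∣r∣ > 0.7 screen above can be written as a small Python check. The correlations are the ones quoted in this answer; the dictionary layout and the function name are illustrative. Pairs involving Wins are excluded in the last step on the assumption that Wins is the dependent variable, so only correlations among candidate predictors are counted as multicollinearity.

```python
THRESHOLD = 0.7

# Pairwise correlations quoted in the answer above.
correlations = {
    ("Wins", "Losses"): -1.000,
    ("Wins", "Runs"): 0.785,
    ("Wins", "Runs Batted In"): 0.758,
    ("Losses", "Runs"): -0.785,
    ("Losses", "Runs Batted In"): -0.758,
    ("Runs", "Runs Batted In"): 0.995,
    ("Runs", "Home Runs"): 0.727,
    ("Runs Batted In", "Home Runs"): 0.759,
}

def flag_pairs(corr, threshold, exclude=()):
    """Return pairs with |r| above the threshold, strongest first,
    skipping any pair that contains an excluded variable."""
    flagged = [
        (pair, r) for pair, r in corr.items()
        if abs(r) > threshold and not set(pair) & set(exclude)
    ]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

# Multicollinearity concerns predictors only, so the dependent variable is excluded.
predictor_pairs = flag_pairs(correlations, THRESHOLD, exclude=("Wins",))
for (a, b), r in predictor_pairs:
    print(f"{a} vs {b}: r = {r:+.3f}")
```

Runs and runs batted in come out as the most strongly related pair of predictors, which suggests that at most one of the two belongs in the same model.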

b. Suggest an appropriate set of independent variables that predict the number of wins by
examining the correlation matrix.

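The appropriate set of independent variables should be significantly correlated with the number of wins while not being strongly correlated with one another, so that multicollinearity is avoided.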

c. Find the best multiple regression model for predicting the number of wins. How good is your
model? Does it use the same variables you thought were appropriate in part (b)?
