Assignment-15 BA
Using the data in the Excel file Credit Card Spending, develop a multiple linear
regression model for estimating the average credit card expenditure as a function of both the
income and family size. Predict the average expense of a family that has two members and an
income of $188,000 per annum, and another that has three members and an income of $39,000
per annum.
No data file was provided for this exercise.
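Without the data file the requested predictions cannot be computed, but the procedure can still be sketched. The example below fits a two-predictor least-squares model with the normal equations and then predicts for the two families in the question. The data are HYPOTHETICAL toy values (income in thousands of dollars), and the helper names `solve`, `fit_ols`, and `predict` are illustrative assumptions; the printed numbers are not answers to the exercise.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    """Evaluate intercept + sum(b_i * x_i) for one observation."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


# HYPOTHETICAL toy data: (income in $1,000s, family size).
# This is NOT the Credit Card Spending file.
rows = [(30, 1), (45, 2), (60, 4), (80, 3), (120, 2), (150, 5)]
# Toy response generated exactly from 500 + 20*income + 300*family_size.
y = [500 + 20 * inc + 300 * fam for inc, fam in rows]

coef = fit_ols(rows, y)
for income, family in [(188, 2), (39, 3)]:
    print(income, family, round(predict(coef, (income, family)), 2))
```

With the real file, the same steps apply: load the income and family-size columns as `rows`, the expenditure column as `y`, and keep the income units consistent between fitting and prediction.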
17. The Excel file Cereal Data provides a variety of nutritional information about 67 cereals
and their shelf location in a supermarket. Use regression analysis to find the best model that
explains the relationship between calories and the other variables. Investigate the model
assumptions and clearly explain your conclusions. Keep in mind the principle of parsimony!
MODEL 1
MODEL 2
MODEL 3
MODEL 4
MODEL 5
18. The Excel file Salary Data provides information on current salary, beginning salary,
previous experience (in months) when hired, and total years of education for a sample of 100
employees in a firm.
a. Develop a multiple regression model for predicting current salary as a function of the other
variables.
R² = 0.803: the regression model fits the data strongly, explaining about 80.3% of the variation in Current Salary.
The Adjusted R² is 79.7%, showing that the model still fits the data well after
adjusting for the number of independent variables.
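As a quick check, the Adjusted R² can be recomputed from the reported R², the sample size (100 employees), and the number of predictors (3). The function name below is an illustrative assumption; the formula is the standard adjustment.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values from the text: R^2 = 0.803, n = 100 employees, k = 3 predictors.
adj = adjusted_r_squared(0.803, n=100, k=3)
print(f"Adjusted R^2 = {adj:.3f}")  # Adjusted R^2 = 0.797
```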
H0: None of the other variables significantly affects Current Salary
H1: At least one of the other variables significantly affects Current Salary
F = 130.521
P < 0.001
Since P < 0.001, reject H0 and accept H1: the regression is significant overall, so the other variables jointly have a significant effect on Current Salary.
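The overall F statistic is tied to R² and can be roughly reproduced from the reported values. The sketch below assumes n = 100 and k = 3 predictors; because it starts from the rounded R² of 0.803, it lands near the reported 130.521 rather than exactly on it.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values from the text: R^2 = 0.803, n = 100 employees, k = 3 predictors.
f_stat = overall_f(0.803, n=100, k=3)
print(f"F = {f_stat:.1f}")  # close to the reported 130.521
```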
+ Constant: t = 0.0001; P = 1.000 > 0.05
The constant (intercept) is not statistically significant.
+ Beginning Salary: t = 15.203; P < 0.001, significant
+ Previous Experience (months): t = -1.404; P = 0.164 > 0.05, not significant
+ Education (years): t = 2.045; P = 0.044 < 0.05, significant
b. Find the best model for predicting current salary using the t-value criterion.
MODEL 2
MODEL 3
Dependent variable: Current Salary (t-values)
Independent            Model 1      Model 2      Model 3
Beginning Salary       15.203***    15.261***    -
Previous Experience    -1.404       -            1.341
Education              2.045***     2.507***     6.894***
R Square               0.803        0.799        0.315
F                      130.521      192.868      23.787
P                      < 0.001      < 0.01       < 0.001
Model 2 is the best model under the t-value criterion. It removes the
non-significant variable (Previous Experience, t = -1.404, P = 0.164) and keeps the variables that have a
statistically significant effect on Current Salary. This follows the principle of removing statistically
insignificant variables in regression, making the model simpler while maintaining high accuracy.
The Adjusted R-squared of Models 1 and 2 is nearly identical (R² is 0.803 and 0.799, respectively), but Model 2
has the advantage of eliminating the insignificant variable, making the model more concise and efficient.
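The t-value criterion can be sketched as one elimination step: find the predictor with the smallest |t| and drop it if it falls below the critical value. The critical value of 1.985 (two-tailed, 5% level, roughly 96 degrees of freedom) is an approximation assumed here, and the function name is hypothetical. The t-values are those reported for Models 1 and 2; in practice the model must be refitted after each removal to get new t-values.

```python
# Approximate two-tailed 5% critical t-value for about 96 degrees of freedom (assumption).
T_CRITICAL = 1.985


def weakest_insignificant(t_values, t_crit=T_CRITICAL):
    """Return the predictor with the smallest |t| if it is below t_crit, else None."""
    name, t = min(t_values.items(), key=lambda item: abs(item[1]))
    return name if abs(t) < t_crit else None


# t-values reported in the text.
model_1 = {"Beginning Salary": 15.203, "Previous Experience": -1.404, "Education": 2.045}
model_2 = {"Beginning Salary": 15.261, "Education": 2.507}

print(weakest_insignificant(model_1))  # Previous Experience
print(weakest_insignificant(model_2))  # None
```

Model 1 flags Previous Experience for removal, and the refitted Model 2 has nothing left to drop, which matches the choice of Model 2.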
21. The Excel file Major League Baseball provides data on the 2010 season.
The number of wins has a strong positive correlation with the number of runs,
with r = 0.785 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of wins has a strong positive correlation with the number of runs batted in,
with r = 0.758 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of losses has a strong negative correlation with the number of runs,
with r = −0.785 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of losses has a strong negative correlation with the number of runs batted in,
with r = −0.758 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of runs has a near-perfect positive correlation with the number of runs batted in,
with r = 0.995 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of runs has a strong positive correlation with the number of home runs,
with r = 0.727 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
The number of runs batted in has a strong positive correlation with the number of home runs,
with r = 0.759 at the 0.001 significance level; ∣r∣ > 0.7 ⇒ there is multicollinearity between them.
b. Suggest an appropriate set of independent variables that predict the number of wins by
examining the correlation matrix.
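The appropriate set of independent variables should be significantly correlated with the number of wins
and should avoid multicollinearity, that is, no two chosen variables should have ∣r∣ > 0.7 with each other.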
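One possible way to apply this rule is a greedy screen over the reported correlations: rank candidates by their correlation with wins and skip any candidate that is too strongly correlated with one already chosen. This is only a sketch. The function name is hypothetical, the text does not report the correlation between home runs and wins so home runs are left out, and unreported pairs are treated as uncorrelated (an assumption).

```python
THRESHOLD = 0.7  # cut-off used in the text for a strong correlation

# Correlations reported in the text.
r_with_wins = {"runs": 0.785, "runs batted in": 0.758}
r_between = {frozenset(["runs", "runs batted in"]): 0.995}


def select_predictors(r_target, r_pairs, threshold=THRESHOLD):
    """Greedy screen: take predictors in order of |r| with the target, skipping any
    candidate whose |r| with an already-selected predictor exceeds the threshold.
    Pairs missing from r_pairs are treated as uncorrelated (assumption)."""
    chosen = []
    for name in sorted(r_target, key=lambda v: abs(r_target[v]), reverse=True):
        clash = any(
            abs(r_pairs.get(frozenset([name, kept]), 0.0)) > threshold
            for kept in chosen
        )
        if not clash:
            chosen.append(name)
    return chosen


print(select_predictors(r_with_wins, r_between))  # ['runs']

# Variance inflation factor for two predictors correlated at r: 1 / (1 - r^2).
vif = 1.0 / (1.0 - 0.995 ** 2)
print(round(vif, 1))  # roughly 100, far above the common rule-of-thumb limit of 10
```

Runs and runs batted in carry almost the same information (r = 0.995), so the screen keeps only runs, the one more strongly correlated with wins.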
c. Find the best multiple regression model for predicting the number of wins. How good is your
model? Does it use the same variables you thought were appropriate in part (b)?