MATH 533 Part C - Regression and Correlation Analysis
1. Generate a scatterplot for CREDIT BALANCE vs. SIZE, including the graph of the "best
fit" line. Interpret.
[Figure: Scatterplot of Credit Balance($) vs Size. Vertical axis: Credit Balance($); horizontal axis: Size.]
The scatterplot of Credit Balance ($) versus Size shows that the slope of the best fit line
is upward (positive); this indicates that Credit Balance varies directly with Size. As Size
increases, Credit Balance also tends to increase, and vice versa. Correct
MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Size

The regression equation is
Credit Balance($) = 2591 + 403 Size

Predictor    Coef  SE Coef      T      P
Constant   2591.4    195.1  13.29  0.000
Size       403.22    50.95   7.91  0.000

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

   Fit  SE Fit            95% CI            95% PI  Size
4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)  5.00
2. Determine the equation of the "best fit" line, which describes the relationship between
CREDIT BALANCE and SIZE.
The equation of the best fit line, which describes the relationship between Credit Balance
and Size, is
Credit Balance ($) = 2591 + 403.2 Size Correct
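The fitted equation can be checked by plugging in a household size. The short sketch below uses the unrounded coefficients from the Minitab "Coef" column; the function name `predict_balance` is only an illustrative choice, not part of the Minitab output.

```python
# Coefficients copied from the Minitab "Coef" column.
INTERCEPT = 2591.4   # fitted balance ($) at Size = 0
SLOPE = 403.22       # fitted change in balance ($) per additional household member


def predict_balance(size: float) -> float:
    """Return the fitted Credit Balance ($) for a given household size."""
    return INTERCEPT + SLOPE * size


if __name__ == "__main__":
    # For Size = 5 this reproduces the "Fit" value reported by Minitab.
    print(f"Fitted balance at Size = 5: {predict_balance(5):.1f}")
```

For Size = 5 the result agrees with the Fit value of 4607.5 in the Minitab output.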
MINITAB OUTPUT:
Pearson correlation of Credit Balance ($) and Size = 0.752
P-Value = 0.000
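As a rough check on the significance of the correlation, the t statistic can be recomputed from r alone. This is a sketch under two assumptions: the sample size n = 50 is inferred from the Total DF of 49 in the Analysis of Variance table, and the critical value 2.0106 is the usual t-table entry for 48 degrees of freedom at a two-tailed 0.05 level.

```python
import math

r = 0.752        # Pearson correlation reported by Minitab
n = 50           # assumption: sample size inferred from Total DF = 49
T_CRIT = 2.0106  # t-table value for 48 DF, two-tailed alpha = 0.05


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_stat = correlation_t(r, n)
significant = abs(t_stat) > T_CRIT

if __name__ == "__main__":
    print(f"t = {t_stat:.2f}, significant at 0.05: {significant}")
```

The value comes out close to the T of 7.91 reported for the Size coefficient; the small difference is due to r being rounded to three decimals.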
MINITAB OUTPUT:
S = 620.162
R-Sq = 56.6%
R-Sq(adj) = 55.7%
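These three summary figures can be recovered from the sums of squares in the Analysis of Variance table for the Size model. The sketch below assumes n = 50 observations (Total DF = 49) and k = 1 predictor; the variable names are illustrative.

```python
import math

# Sums of squares from the Analysis of Variance table (Size model).
ss_regression = 24092210
ss_error = 18460853
ss_total = 42553062   # as printed by Minitab (rounded)
n, k = 50, 1          # assumption: n inferred from Total DF = 49

r_sq = ss_regression / ss_total                    # coefficient of determination
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)  # adjusted for number of predictors
s = math.sqrt(ss_error / (n - k - 1))              # standard error of the estimate
r = math.sqrt(r_sq)                                # positive root, since the slope is positive

if __name__ == "__main__":
    print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.3f}, r = {r:.3f}")
```

The results agree with the reported S = 620.162, R-Sq = 56.6%, R-Sq(adj) = 55.7%, and the Pearson correlation of 0.752.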
5. Test the utility of this regression model (use a two tail test with α = .05). Interpret your
results, including the p-value.
The null hypothesis, H0, states that there is no significant correlation between Credit
Balance and Size, i.e., the correlation coefficient ρ = 0.
From the Analysis of Variance table, the p-value is 0.000, which is much less than 0.05.
Therefore, I reject the null hypothesis of no correlation and conclude that, according to
the overall test of significance, the regression model is useful. Correct
MINITAB OUTPUT:
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062
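The F statistic in this table is the ratio of the two mean squares, and in a one-predictor model it equals the square of the t statistic for the slope. A minimal sketch follows; the critical value 4.04 is an approximate F-table entry for (1, 48) degrees of freedom at the 0.05 level and is an assumption of this example, not part of the Minitab output.

```python
# Values from the Analysis of Variance table and the coefficient table.
ss_regression, df_regression = 24092210, 1
ss_error, df_error = 18460853, 48
slope, se_slope = 403.22, 50.95

F_CRIT = 4.04  # assumption: approximate F-table value for (1, 48) DF at alpha = 0.05

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error   # overall F test
t_slope = slope / se_slope          # t test for the slope

reject_h0 = f_stat > F_CRIT

if __name__ == "__main__":
    print(f"F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}, reject H0: {reject_h0}")
```

The computed F matches the reported 62.64, and the squared t statistic agrees with it up to rounding of the printed coefficients.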
6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT
BALANCE? Explain.
Based on my findings, Size is a good predictor of Credit Balance because the two variables
are significantly and positively correlated (r = 0.752, p-value = 0.000). As the Size of a
household increases, its Credit Balance tends to increase as well, and Size alone explains
56.6% of the variation in Credit Balance. Correct
The average credit balance for customers with a household size of 5 is estimated to lie
within the interval (4368.2, 4846.9). This is the 95% confidence interval estimate for the
mean credit balance of customers with a household size of 5. Correct
MINITAB OUTPUT:
Predicted Values for New Observations
New Obs     Fit  SE Fit            95% CI            95% PI  Size
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)  5.00
9. Using an interval, predict the credit balance for a customer that has a household size of 5.
Interpret this interval.
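The credit balance for an individual customer with a household size of 5 is expected to lie
within the interval (3337.9, 5877.2). This is the 95% prediction interval estimate for the
credit balance of a single customer with a household size of 5. It is wider than the
confidence interval for the mean because it also reflects the variation of individual
customers around that mean. Correct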
MINITAB OUTPUT:
Predicted Values for New Observations
New Obs     Fit  SE Fit            95% CI            95% PI  Size
      1  4607.5   119.0  (4368.2, 4846.9)  (3337.9, 5877.2)  5.00
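Both intervals in this output can be approximately reproduced from the Fit, SE Fit, and S values. The sketch below assumes the t-table value 2.0106 for 48 degrees of freedom (two-tailed, 0.05); the helper name `interval` is illustrative. Because SE Fit is printed to one decimal, the limits agree with Minitab only to within a few tenths.

```python
import math

fit = 4607.5      # fitted balance at Size = 5
se_fit = 119.0    # standard error of the fitted mean (rounded in the output)
s = 620.162       # standard error of the estimate
T_CRIT = 2.0106   # assumption: t-table value for 48 DF, two-tailed alpha = 0.05


def interval(center: float, std_err: float, t_crit: float = T_CRIT) -> tuple:
    """Return (lower, upper) limits: center +/- t_crit * std_err."""
    margin = t_crit * std_err
    return center - margin, center + margin


# Confidence interval for the MEAN balance at Size = 5.
ci = interval(fit, se_fit)
# Prediction interval for ONE customer: adds the residual variance s^2.
pi = interval(fit, math.sqrt(s ** 2 + se_fit ** 2))

if __name__ == "__main__":
    print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
    print(f"95% PI: ({pi[0]:.1f}, {pi[1]:.1f})")
```

The only difference between the two calculations is the extra s² term under the square root, which is why the prediction interval is so much wider.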
10. What can we say about the credit balance for a customer that has a household size of 10?
Explain your answer.
We cannot say anything reliable about the credit balance for a customer that has a
household size of 10. The maximum value of the predictor variable (Size) used to fit the
regression model is only 7, which is much less than 10. Predicting at Size = 10 would be
extrapolation beyond the range of the data, so we cannot use the given regression model
to accurately estimate the credit balance for that customer. Correct
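One way to make this restriction explicit is to refuse predictions outside the observed range of Size. The sketch below is illustrative only: the upper bound of 7 comes from the answer above, while the lower bound of 1 is an assumption, since the minimum household size is not stated in the output.

```python
INTERCEPT, SLOPE = 2591.4, 403.22  # coefficients from the Minitab output
SIZE_MIN = 1   # assumption: smallest household size in the sample
SIZE_MAX = 7   # largest household size in the sample, as stated in the answer


def predict_within_range(size: float) -> float:
    """Return the fitted balance, or raise if the request is an extrapolation."""
    if not SIZE_MIN <= size <= SIZE_MAX:
        raise ValueError(
            f"Size = {size} is outside the observed range "
            f"[{SIZE_MIN}, {SIZE_MAX}]; the model should not be extrapolated."
        )
    return INTERCEPT + SLOPE * size


if __name__ == "__main__":
    print(f"{predict_within_range(5):.1f}")
    try:
        predict_within_range(10)
    except ValueError as err:
        print(err)
```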
Predictor          Coef  SE Coef     T      P
Constant         1276.0    273.6  4.66  0.000
Income ($1000)   32.272    4.348  7.42  0.000
Size             346.85    36.03  9.63  0.000
Years              7.88    12.34  0.64  0.526

R-Sq = 80.5%   R-Sq(adj) = 79.2%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  34255444  11418481  63.30  0.000
Residual Error  46   8297619    180383
Total           49  42553062

Source          DF    Seq SS
Income ($1000)   1  16703393
Size             1  17478430
Years            1     73620

Unusual Observations
     Income      Credit
Obs  ($1000)  Balance($)     Fit  SE Fit  Residual  St Resid
  3     32.0      5100.0  3830.1    93.7    1269.9     3.07R
  5     31.0      1864.0  3001.7   139.3   -1137.7    -2.84R
 11     25.0      4208.0  3210.1   103.3     997.9     2.42R
 17     55.0      4412.0  5250.3   116.3    -838.3    -2.05R
Correct
12. Perform the Global Test for Utility (F-Test). Explain your conclusion.
The null hypothesis, H0, states that none of the independent variables is linearly related
to Credit Balance, i.e., all of the slope coefficients are zero (β1 = β2 = β3 = 0).
MINITAB OUTPUT:
Test for Equal Variances: Credit Balance($) versus Income ($1000)
95% Bonferroni confidence intervals for standard deviations

Income
($1000)  N    Lower    StDev   Upper
     21  2  267.855   830.85  344720
     22  2  188.069   583.36  242037
     23  1        *        *       *
     25  1        *        *       *
     26  1        *        *       *
     27  2  101.215   313.96  130260
     29  1        *        *       *
     30  3  123.736   309.43    7053
     31  1        *        *       *
     32  1        *        *       *
     33  1        *        *       *
     34  1        *        *       *
     35  1        *        *       *
     37  2  328.265  1018.23  422465
     39  2  276.062   856.31  355281
     40  1        *        *       *
     41  1        *        *       *
     42  1        *        *       *
     44  1        *        *       *
     46  1        *        *       *
     48  2   80.471   249.61  103563
     50  2  259.193   803.98  333571
     51  1        *        *       *
     52  1        *        *       *
     54  3  396.622   991.86   22607
     55  4  290.865   647.76    5780
     61  1        *        *       *
     62  2  221.807   688.01  285457
     63  1        *        *       *
     64  1        *        *       *
     65  1        *        *       *
     66  2   87.765   272.24  112951
     67  2   70.212   217.79   90361

 N    Lower    StDev    Upper
 5  137.540  271.807  1303.27
15  459.836  698.998  1337.23
 8  193.542  336.323   943.85
 9  415.251  701.689  1796.00
 5  340.696  673.284  3228.28
 5  360.277  711.981  3413.83
 3  150.085  356.267  5956.16

N    Lower    StDev   Upper
2  541.930  1714.03  875261
1        *        *       *
2  452.950  1432.60  731550
2  130.788   413.66  211232
1        *        *       *
2   78.920   249.61  127462
1        *        *       *
2   76.013   240.42  122768
2  135.483   428.51  218815
4  204.115   461.26    4413
4  348.641   787.86    7538
4  167.957   379.55    3631
5  584.321  1221.32    7236
3  232.333   590.58   14935
4  231.705   523.61    5010
2  111.114   351.43  179457
5  452.721   946.25    5607
2  121.398   383.96  196067
2  540.589  1709.78  873094
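The conclusion is that, since all the p-values of Bartlett's Test (which assumes a Normal
distribution) are greater than 0.05, I am unable to reject the null hypothesis of equal
variances. Levene's Test, which does not assume Normality, also fails to reject the null
hypothesis of equal variances.
For the global F-test itself, the Analysis of Variance table for the multiple regression
model gives F = 63.30 with a p-value of 0.000, which is less than 0.05. Therefore, I reject
the null hypothesis and conclude that the model as a whole is useful for predicting Credit
Balance.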
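The global F statistic and both R-Sq values for the multiple regression model can be recomputed from its Analysis of Variance table (Regression DF = 3, Residual Error DF = 46). In this sketch the critical value 2.81 is an approximate F-table entry for (3, 46) degrees of freedom at the 0.05 level, which is an assumption of the example rather than part of the Minitab output.

```python
# Sums of squares from the multiple regression Analysis of Variance table.
ss_regression = 34255444
ss_error = 8297619
ss_total = 42553062
n, k = 50, 3   # 50 observations (Total DF = 49), 3 predictors

F_CRIT = 2.81  # assumption: approximate F-table value for (3, 46) DF at alpha = 0.05

ms_regression = ss_regression / k
ms_error = ss_error / (n - k - 1)
f_stat = ms_regression / ms_error

r_sq = ss_regression / ss_total
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

reject_h0 = f_stat > F_CRIT

if __name__ == "__main__":
    print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
    print(f"Reject H0 (all slopes zero): {reject_h0}")
```

The computed values agree with the reported F = 63.30, R-Sq = 80.5%, and R-Sq(adj) = 79.2%.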
13. Perform the t-test on each independent variable. Explain your conclusions and clearly
state how you should proceed. In particular, which independent variables should we keep
and which should be discarded.
Test the significance for the individual coefficients of the independent variables.
The null hypothesis, H0, states that there is no significant correlation between the
independent variable and Credit Balance, i.e., the correlation coefficient ρ = 0.
Decision Rule: Reject Ho if p-value <0.05
MINITAB OUTPUT:

Income ($1000)
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  16703393  16703393  31.02  0.000
Residual Error  48  25849669    538535
Total           49  42553062

Years
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1      2878      2878   0.00  0.955
Residual Error  48  42550184    886462
Total           49  42553062

Size
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062
The independent variables Income ($1000) and Size should be kept because each makes a
significant contribution to the regression model (p-value = 0.000 < 0.05). The variable
Years should be discarded because it does not make a significant contribution
(p-value = 0.955 > 0.05). Correct
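The same decision can be reached from the t statistics of the multiple regression coefficient table, where each T is the coefficient divided by its standard error. This sketch assumes the rows of that table are ordered Constant, Income ($1000), Size, Years (the order used in the Seq SS table), and that 2.013 is the approximate t-table value for 46 degrees of freedom at a two-tailed 0.05 level.

```python
T_CRIT = 2.013  # assumption: approximate t-table value for 46 DF, two-tailed alpha = 0.05

# (coefficient, standard error) pairs from the multiple regression output.
# Assumption: row labels follow the predictor order of the Seq SS table.
coefficients = {
    "Constant": (1276.0, 273.6),
    "Income ($1000)": (32.272, 4.348),
    "Size": (346.85, 36.03),
    "Years": (7.88, 12.34),
}

# t statistic for H0: coefficient = 0.
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

# Keep a predictor only if its |t| exceeds the critical value.
keep = [name for name, t in t_stats.items()
        if name != "Constant" and abs(t) > T_CRIT]
discard = [name for name, t in t_stats.items()
           if name != "Constant" and abs(t) <= T_CRIT]

if __name__ == "__main__":
    for name, t in t_stats.items():
        print(f"{name:15s} t = {t:5.2f}")
    print("Keep:", keep)
    print("Discard:", discard)
```

Under these assumptions the t statistics reproduce the reported 4.66, 7.42, 9.63, and 0.64, and only Years falls below the critical value.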
14. Is this multiple regression model better than the linear model that we generated in
parts 1-10? Explain.
The proportion of variability in the dependent variable that is accounted for by the model
is given by the coefficient of determination, R-Sq. Thus, the higher the value of R-Sq, the
better the regression model fits. R-Sq is greater for the multiple regression model (0.805)
than for the simple linear regression model (0.566). Because R-Sq never decreases when
predictors are added, R-Sq(adj) is the fairer comparison, and it is also greater for the
multiple regression model (0.792 versus 0.557). Hence the multiple regression model is
better than the linear regression model. Correct
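A more formal way to compare the two models, not used in the original answer, is a partial (nested) F-test: the Size-only model is a special case of the multiple regression model, so the drop in residual sum of squares can be tested directly. The sketch below takes both Residual Error rows from the Minitab output; the critical value 3.20 is an approximate F-table entry for (2, 46) degrees of freedom at the 0.05 level and is an assumption of this example.

```python
# Residual Error rows from the two Analysis of Variance tables.
sse_reduced, df_reduced = 18460853, 48   # model with Size only
sse_full, df_full = 8297619, 46          # model with Income, Size, Years

F_CRIT = 3.20  # assumption: approximate F-table value for (2, 46) DF at alpha = 0.05

extra_predictors = df_reduced - df_full  # Income and Years
f_partial = ((sse_reduced - sse_full) / extra_predictors) / (sse_full / df_full)

full_model_is_better = f_partial > F_CRIT

if __name__ == "__main__":
    print(f"Partial F = {f_partial:.2f} on ({extra_predictors}, {df_full}) DF")
    print(f"Added predictors improve the fit: {full_model_is_better}")
```

A partial F well above the critical value indicates that Income and Years together explain a significant amount of variation beyond Size alone, which supports the R-Sq comparison.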
Project Part C: Grading Rubric

Category / Description                                  Points Value  Your Points
Questions 1 - 12 and 14 - 5 pts. each.
  Everyone gets credit for No. 7                                  65           65
Question 13                                                       15           15
Summary                                                           20           20
Total                                                            100          100