Assessment
Do you think that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Yes, the assertion is justified. The one-sample t-test gives no evidence that the average
balance on their credit cards differs from $500.
Explanation:
Null Hypothesis: The average credit card balance is $500.
Alternate Hypothesis: The average credit card balance is not $500.
Balance
Mean 520.015
Variance 211378.2253
Observations 400
Hypothesized Mean 500
df 399
t Stat 0.870673781
P(T<=t) one-tail 0.192227914
t Critical one-tail 1.648681534
P(T<=t) two-tail 0.384455827
t Critical two-tail 1.965927296
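Because the alternate hypothesis is two-sided, the two-tail P value applies. It is 0.384,
which is greater than our significance level of 0.05, so the null hypothesis cannot be
rejected, i.e. the data are consistent with an average credit card balance of $500.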
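The t Stat in the output can be reproduced from the summary statistics alone. The following is a minimal sketch, not the original Excel workflow: the variable names are illustrative, and the critical value is simply copied from the table above rather than computed.

```python
import math

# Summary statistics copied from the Excel output above.
sample_mean = 520.015
sample_variance = 211378.2253
n = 400
hypothesized_mean = 500.0
t_critical_two_tail = 1.965927296  # from the output (alpha = 0.05, df = 399)

# Standard error of the mean, then the one-sample t statistic.
standard_error = math.sqrt(sample_variance / n)
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-sided decision: reject only if |t| exceeds the two-tail critical value.
reject_null = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```

Since |t| is well below the two-tail critical value, the null hypothesis is not rejected.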
2. Is there a difference between men and women as far as average balance is concerned? Use
a two-sample t-test to draw your conclusion.
There is no significant difference between men and women as far as average balance is
concerned.
Explanation:
Null Hypothesis: The average credit card balance is the same for men and women.
Alternate Hypothesis: The average credit card balance is different for men and women.
Men Women
Mean 509.8031088 529.5362
Variance 213554.5652 210187.1
Observations 193 207
Hypothesized Mean Difference 0
df 396
t Stat -0.42838443
P(T<=t) one-tail 0.334302083
t Critical one-tail 1.648710601
P(T<=t) two-tail 0.668604165
t Critical two-tail 1.965972608
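The reported t Stat and df are consistent with a two-sample t-test that does not assume equal variances (Welch's test). That this variant was the one used is an inference from the numbers, not something the output states. A minimal sketch with illustrative variable names:

```python
import math

# Summary statistics from the table above (men, women).
mean_m, var_m, n_m = 509.8031088, 213554.5652, 193
mean_w, var_w, n_w = 529.5362, 210187.1, 207

# Variance of each group mean.
vm = var_m / n_m
vw = var_w / n_w

# Welch t statistic (no equal-variance assumption).
t_stat = (mean_m - mean_w) / math.sqrt(vm + vw)

# Welch-Satterthwaite approximation for the degrees of freedom.
df = (vm + vw) ** 2 / (vm ** 2 / (n_m - 1) + vw ** 2 / (n_w - 1))

print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

Because |t| is far smaller than the two-tail critical value of about 1.966, and the two-tail P value of 0.669 is greater than 0.05, the null hypothesis of equal average balances is not rejected.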
4. It is generally assumed that if there are more credit cards then the balance on the cards will
be more. Based on this dataset, do you think this is true? Calculate a correlation coefficient
and show a scatter plot to support your answer.
No, the data do not support this. The correlation between the number of cards and the
balance is very weak.
Correlation coefficient:
Cards Balance
Cards 1
Balance 0.086456 1
The correlation coefficient (0.086) is close to zero, which implies there is essentially no
linear relationship between the number of cards and the balance on the cards.
Scatter plot:
The points are widely scattered and do not follow the trend line, so the correlation is very weak.
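How weak a correlation of 0.086 is can be made concrete by converting it to a t statistic and to a share of explained variance. This is a sketch under the assumption that all 400 observations reported elsewhere in this assessment were used for the correlation; the variable names are illustrative.

```python
import math

r = 0.086456  # correlation between Cards and Balance, from the matrix above
n = 400       # assumed sample size (the observation count reported in the other outputs)

# t statistic for testing H0: true correlation = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Share of the variance in Balance that a straight-line fit on Cards would explain.
r_squared = r ** 2

print(f"t = {t_stat:.3f}, r^2 = {r_squared:.4f}")
```

The t statistic stays below the two-tail critical value of roughly 1.97 reported in the t-test outputs for similar degrees of freedom, and the number of cards explains less than 1% of the variation in balance.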
5. Examine whether the following demographic variables influence balance: (a) age, (b) years
of education, (c) marital status. For age and years of education, use scatter plots to depict
their relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion.
The demographic variables age, years of education and marital status have no significant
influence on credit balance.
5a & 5b
Correlation coefficient:
Age Education Balance
Age 1
Education 0.003619 1
Balance 0.001835 -0.00806 1
Scatter plot:
The trend lines show no correlation, so credit balance does not depend on
these variables.
5c.
Null Hypothesis: The average credit card balance is the same for single and married cardholders.
Alternate Hypothesis: The average credit card balance is different for single and married
cardholders.
Single Married
Mean 523.2903226 517.9429
Variance 221735.0385 205696.7
Observations 155 245
Hypothesized Mean Difference 0
df 319
t Stat 0.112233601
P(T<=t) one-tail 0.455354389
t Critical one-tail 1.649644319
P(T<=t) two-tail 0.910708777
t Critical two-tail 1.967428387
Based on the ANOVA, the P value (0.957) is greater than 0.05, so ethnicity has no significant
impact on balance.
SUMMARY
Groups Count Sum Average Variance
African American 99 52569 531 235839.2
Asian 102 52256 512.3137 231748.3
Caucasian 199 103181 518.4975 190922.4
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 18454.20047 2 9227.1 0.043443 0.957492 3.018452
Within Groups 84321457.71 397 212396.6
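The F value in the ANOVA table is the ratio of the two mean squares, which can be rechecked from the sums of squares and degrees of freedom. A minimal sketch with illustrative variable names; the critical value is copied from the table, not computed.

```python
# Sums of squares and degrees of freedom from the ANOVA table above.
ss_between, df_between = 18454.20047, 2
ss_within, df_within = 84321457.71, 397
f_critical = 3.018452  # F crit from the table (alpha = 0.05)

# Mean squares and the F statistic.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# The group means differ significantly only if F exceeds the critical value.
reject_null = f_stat > f_critical

print(f"F = {f_stat:.6f}, reject H0: {reject_null}")
```

The variation between ethnic groups is tiny compared with the variation within them, which is why F is so far below F crit.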
7. A general principle that credit card companies often follow is to assign a higher credit limit
to people with a higher credit rating. Does the data show that this principle is being followed?
Yes, this principle is followed.
Correlation coefficient:
Limit Rating
Limit 1
Rating 0.99688 1
Limit and rating are almost perfectly correlated (r = 0.997).
Scatter plot:
The principle of assigning a higher credit limit to people with a higher credit rating
holds in our data. This is justified by the strong positive correlation.
8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and
the balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.861697
R Square 0.742522
Adjusted R Square 0.741875
Standard Error 233.585
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 62624255 62624255 1147.764 2.5E-119
Residual 398 21715657 54561.95
Total 399 84339912
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -292.79 26.68341 -10.9728 1.18E-24 -345.249 -240.332 -345.249 -240.332
Credit Limit 0.171637 0.005066 33.87867 2.5E-119 0.161677 0.181597 0.161677 0.181597
Scatter plot:
Credit limit is a significant predictor of balance (P-value 2.5E-119), and the fit is
reasonably strong: R² = 0.74.
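R Square and F in the output follow directly from the sums of squares in the ANOVA block. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Sums of squares and degrees of freedom from the ANOVA block above.
ss_regression, df_regression = 62624255, 1
ss_residual, df_residual = 21715657, 398

# R-squared: share of the total variation in balance explained by credit limit.
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

# F statistic: mean square of the regression over mean square of the residuals.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R^2 = {r_squared:.6f}, F = {f_stat:.3f}")
```

With the reported coefficients, the fitted line is Balance = -292.79 + 0.171637 × Credit Limit.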
9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients
and R-squared. Show a scatter plot.
Simple linear regression :
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.863625161
R Square 0.745848418
Adjusted R Square 0.745209846
Standard Error 232.0713048
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 62904789.88 62904790 1167.994581 1.8989E-120
Residual 398 21435122.03 53857.09
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -390.8463418 29.06851463 -13.4457 3.07318E-34 -447.993365 -333.6993186 -447.993365 -333.699
Credit Rating(X) 2.566240327 0.075089102 34.17594 1.8989E-120 2.418619483 2.713861171 2.418619483 2.713861
Scatter plot:
Yes, credit rating is a significant predictor of credit balance (P-value 1.8989E-120).
The fit is reasonably strong: R² = 0.746.
10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or
decrease the balance on credit cards. Try to quantify your answers.
Both credit rating and credit limit have a significant impact on credit card balance,
and both are strongly correlated with it. Balances are higher for cardholders with a
high credit rating and a high credit limit. From the regressions in questions 8 and 9,
each additional $1 of credit limit is associated with about $0.17 more balance, and each
additional rating point with about $2.57 more balance.
Based on this analysis, the balance of cardholders with a higher rating and a higher
credit limit can be increased, whereas the balance of cardholders with a lower rating
and a lower credit limit has to be decreased.
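To quantify the mechanisms, the slopes from questions 8 and 9 can be applied to a proposed change. The sketch below uses hypothetical changes (a $1,000 limit increase and a 50-point rating increase) chosen only for illustration; they are not taken from the dataset.

```python
# Slopes reported in the regression outputs for questions 8 and 9.
slope_limit = 0.171637       # change in balance per $1 of credit limit
slope_rating = 2.566240327   # change in balance per credit rating point

# Hypothetical changes, assumed here purely for illustration.
limit_increase = 1000        # raise the credit limit by $1,000
rating_increase = 50         # credit rating improves by 50 points

# Predicted change in balance under each simple regression.
delta_from_limit = slope_limit * limit_increase
delta_from_rating = slope_rating * rating_increase

print(f"Limit +${limit_increase}: balance {delta_from_limit:+.2f}")
print(f"Rating +{rating_increase}: balance {delta_from_rating:+.2f}")
```

The two figures come from separate simple regressions, and limit and rating are almost perfectly correlated (0.99688), so they should be read as alternative views of the same effect rather than added together.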
11. The credit limit is provided as a consolidated amount for all the credit cards the
cardholder has. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X
variables. Report the coefficients. Discuss the effect on the balance of (a) increasing the
credit limit on the same number of cards and (b) increasing the number of cards without
altering the total credit limit.
Multiple Linear regression:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.865188295
R Square 0.748550786
Adjusted R Square 0.74728404
Standard Error 231.1247525
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 63132707.37 31566354 590.9238 9.8E-120
Residual 397 21207204.54 53418.65
Total 399 84339911.91
Credit limit and number of cards are both significant predictors of credit balance.
Multiple R = 0.865 and R-square = 0.748.
(a) With the number of cards unchanged, each additional $1 of credit limit increases the
balance by about $0.17. (Credit limit is measured on a much broader scale than the number
of cards; it has 34.2 as a standard error.)
(b) With the total credit limit unchanged, each additional card increases the balance by
about $26.03, i.e. more cards mean a higher balance.
12. Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficients. Is the coefficient of Income significantly different from zero? What does this say
about the effect of income on balance?
Scatter plot: Balance (Y) against Income (X), with trend line y = 6.0484x + 246.51, R² = 0.215
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.463656457
R Square 0.21497731
Adjusted R Square 0.213004891
Standard Error 407.8647195
Observations 400
ANOVA
df SS MS F Significance F
Regression 1 18131167.4 18131167 108.9917152 1.03089E-22
Residual 398 66208744.51 166353.6
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 246.5147506 33.19934735 7.425289 6.90344E-13 181.2467485 311.7827527 181.2467485 311.7827527
Income 6.048363409 0.579350163 10.43991 1.03089E-22 4.909394402 7.187332415 4.909394402 7.187332415
Income Balance(y)
Income 1
Balance(y) 0.463656457 1
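Whether the Income coefficient differs significantly from zero can be read off the output: the t Stat is the coefficient divided by its standard error, and the 95% confidence interval shows whether zero is a plausible value. A minimal sketch using only the reported numbers, with illustrative variable names:

```python
# Income row of the coefficient table above.
coefficient = 6.048363409
standard_error = 0.579350163
lower_95, upper_95 = 4.909394402, 7.187332415

# t statistic for H0: the Income coefficient equals zero.
t_stat = coefficient / standard_error

# The coefficient is significant at the 5% level if the 95% interval excludes zero.
interval_excludes_zero = lower_95 > 0 or upper_95 < 0

print(f"t = {t_stat:.4f}, interval excludes zero: {interval_excludes_zero}")
```

Both checks agree with the reported P-value of 1.03089E-22: the coefficient is significantly different from zero, so higher income goes with a higher balance, although an R Square of about 0.215 means income alone explains only about a fifth of the variation.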
13. Based on the equation derived in question 12, what is the estimated balance for a person
with an income of USD 100k per year?
Scatter plot: Balance (Y) against Income (X), with trend line f(x) = 6.05 x + 246.51, R² = 0.21
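The estimate follows from plugging the income into the fitted equation from question 12. The sketch below assumes that Income is recorded in thousands of USD, which is suggested by the chart's X axis running from 0 to 200 but is not stated explicitly; the function name is illustrative.

```python
# Coefficients from the regression output in question 12.
intercept = 246.5147506
slope_income = 6.048363409

def estimated_balance(income_thousands: float) -> float:
    """Predicted balance for a given income, in thousands of USD (assumed unit)."""
    return intercept + slope_income * income_thousands

# Income of USD 100k per year, expressed in the assumed unit.
prediction = estimated_balance(100)

print(f"Estimated balance: {prediction:.2f}")
```

Under that assumption, the estimated balance for an income of USD 100k is about $851.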
14. Based on the dataset, explore the relationship between credit card balance (Y) and (a)
Income, (b) Age, (c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple
linear regression model and report the statistical significance of each of these variables.
Regression Statistics
Multiple R 0.936702578
R Square 0.87741172
Adjusted R Square 0.875856031
Standard Error 161.9917647
Observations 400
ANOVA
df SS MS F Significance F
Regression 5 74000827.17 14800165.43 564.0020686 4.5908E-177
Residual 394 10339084.74 26241.33183
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -473.2514026 55.10833546 -8.587655545 2.08837E-16 -581.5945666 -364.908 -581.595 -364.908
Income -7.608832003 0.381931562 -19.92197755 1.37077E-61 -8.359710677 -6.85795 -8.35971 -6.85795
Limit 0.07901642 0.044791005 1.764113581 0.078487737 -0.009042839 0.167076 -0.00904 0.167076
Rating 2.773843725 0.667079559 4.158190261 3.93909E-05 1.462363177 4.085324 1.462363 4.085324
Age -0.860030445 0.478700493 -1.796594023 0.073165937 -1.801157147 0.081096 -1.80116 0.081096
Education 1.967791521 2.605290902 0.755305874 0.450516748 -3.154218733 7.089802 -3.15422 7.089802
Regression Statistics
Multiple R 0.93547739
R Square 0.875117948
Adjusted R Square 0.874488819
Standard Error 162.8813393
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 73807370.62 36903685.31 1390.999823 4.5212E-180
Residual 397 10532541.29 26530.33071
Total 399 84339911.91
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -534.8121502 21.60269845 -24.75672896 1.66359E-82 -577.2821357 -492.3421648 -577.282136 -492.3421648
Income -7.672124366 0.378462026 -20.2718472 3.1071E-63 -8.416164597 -6.928084134 -8.4161646 -6.928084134
Rating 3.949264832 0.086209035 45.81033566 1.4482E-160 3.77978154 4.118748125 3.77978154 4.118748125
Explanation:
Based on the P-values in the multiple regression, income and rating are the only two
predictors that are statistically significant at the 0.05 level (limit: 0.078, age: 0.073,
education: 0.451).
Together, the five variables (income, education, age, limit and rating) explain
87.7% of the variation in the credit card balance.
To check whether all of these variables contribute or only some of them, only the
variables with acceptable P-values were retained: the regression was run again with
income and rating alone.
This two-variable regression explains 87.5% of the variation in the credit card
balance, which is almost the same R-square value as before.
This confirms that income and rating are the two significant predictors.
The residuals were also examined for patterns.
For income, more of the residuals fall on the negative side, particularly for the
lower income groups, and the line fit is also not linear.
For rating, the residuals are positive at low and high ratings and negative at the
more typical ratings in between, while the line fit for rating is decent.
Concluding remarks:
Income and rating are the two important variables contributing to the
change in balance, whereas limit, age and education are not significant
variables for the balance.
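The comparison between the five-variable and two-variable models can be checked with the adjusted R-square, which penalises extra predictors. The sketch below recomputes it from the reported R Square values; the function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

n = 400  # observations in both regressions

# Five predictors: income, limit, rating, age, education.
adj_full = adjusted_r_squared(0.87741172, n, 5)

# Two predictors: income and rating.
adj_reduced = adjusted_r_squared(0.875117948, n, 2)

print(f"Full model:    {adj_full:.6f}")
print(f"Reduced model: {adj_reduced:.6f}")
print(f"Difference:    {adj_full - adj_reduced:.6f}")
```

Dropping limit, age and education lowers the adjusted R-square by only about 0.001, which supports keeping the simpler model with income and rating.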