Assessment

The document analyzes credit card data using statistical tests to address various questions: 1. A one-sample t-test finds the average credit card balance of $500 is justified. 2. A two-sample t-test shows no significant difference between average balances of men and women. 3. Students and non-students have significantly different average balances based on a two-sample t-test. Regression analyses show credit limit and rating are good predictors of balance, with R-squared values around 0.74. Demographic factors like age, education, and marital status do not influence balance. Ethnicity also has no impact. The data supports the principle that higher credit limits and ratings lead

Uploaded by

subburaj

1. A company manager says that the average balance on their credit cards is $500. Do you
think that this assertion is justified? Use a one-sample t-test to draw your conclusion.

Yes, the assertion is justified. A one-sample t-test finds no significant difference between
the sample mean balance ($520.02) and the claimed average of $500.

Explanation :
Null Hypothesis: Average balance of credit card is $500
Alternate Hypothesis: Average balance of credit card is not $500

t-Test: Two-Sample Assuming Unequal Variances

                        Balance
Mean                    520.015
Variance                211378.2253
Observations            400
Hypothesized Mean       500
df                      399
t Stat                  0.870673781
P(T<=t) one-tail        0.192227914
t Critical one-tail     1.648681534
P(T<=t) two-tail        0.384455827
t Critical two-tail     1.965927296

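The alternate hypothesis is two-sided, so the two-tail P value applies. As the two-tail P value (0.384)
is greater than our significance level of 0.05, the null hypothesis cannot be rejected, i.e. the data are
consistent with an average credit card balance of $500.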
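
The t statistic in the table can be reproduced from the summary figures alone. The following is a minimal Python sketch, not the spreadsheet procedure used for the table; the function name `one_sample_t` is illustrative, and the critical value is copied from the table rather than computed.

```python
import math

def one_sample_t(mean, variance, n, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    standard_error = math.sqrt(variance / n)
    return (mean - mu0) / standard_error

# Summary figures copied from the table above.
t_stat = one_sample_t(mean=520.015, variance=211378.2253, n=400, mu0=500)

# Two-tail 5% critical value for df = 399, copied from the table (not computed here).
T_CRITICAL_TWO_TAIL = 1.965927296
reject_null = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(f"t = {t_stat:.4f}, reject H0 at the 5% level: {reject_null}")
```

Because |t| is well below the two-tail critical value, the sketch reaches the same decision as the P value comparison.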

2. Is there a difference between men and women as far as average balance is concerned? Use
a two-sample t-test to draw your conclusion. 
There is no significant difference between men and women as far as average balance is
concerned.
Explanation :
Null Hypothesis: Average credit card balance is the same for men and women.
Alternate Hypothesis: Average credit card balance is different for men and women.

t-Test: Two-Sample Assuming Unequal Variances

                               Men            Women
Mean                           509.8031088    529.5362
Variance                       213554.5652    210187.1
Observations                   193            207
Hypothesized Mean Difference   0
df                             396
t Stat                         -0.42838443
P(T<=t) one-tail               0.334302083
t Critical one-tail            1.648710601
P(T<=t) two-tail               0.668604165
t Critical two-tail            1.965972608

The two-tail P value (0.669) is greater than 0.05, so the null hypothesis that µ1 is equal to µ2
cannot be rejected.

Therefore, the average balances of men and women do not differ significantly.
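
The unequal-variance (Welch) t statistic and its degrees of freedom can be recomputed from the group summaries. This is a sketch under the assumption that the table was produced by the standard Welch formula with the Welch-Satterthwaite degrees of freedom; the function name `welch_t` is illustrative.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    a = var1 / n1  # squared standard error contributed by group 1
    b = var2 / n2  # squared standard error contributed by group 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Men first, then women, using the figures from the table above.
t_stat, df = welch_t(509.8031088, 213554.5652, 193, 529.5362, 210187.1, 207)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

The computed degrees of freedom are fractional; the df of 396 in the table appears to be this value rounded to a whole number.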

3. Is there a difference between students and non-students as far as average balance is
concerned? Use a two-sample t-test to draw your conclusion.

Yes, there is a significant difference between students and non-students as far as
average balance is concerned.
Explanation :
Null Hypothesis: Average credit card balance is the same for students and non-students.
Alternate Hypothesis: Average credit card balance is different for students and non-students.
As the P value of the two-tail test is less than our significance level of 0.05, the null hypothesis can be
rejected, i.e. the average credit card balance is different for students and non-students.

4. It is generally assumed that if there are more credit cards then the balance on the cards will
be more. Based on this dataset, do you think this is true? Calculate a correlation coefficient
and show a scatter plot to support your answer.
No, the data do not support this. The correlation between the number of cards and the
balance is very weak.

Correlation coefficient:
           Cards      Balance
Cards      1
Balance    0.086456   1

The correlation coefficient is close to zero, which implies there is almost no linear relation between
the number of cards and the balance on the cards.
Scatter plot:
The points are widely scattered and do not follow the trend line, so the correlation is very weak.
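
Whether a correlation this small differs from zero can be checked with the usual t statistic for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below assumes n = 400, the observation count reported in the other tables of this document; the function name is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: population correlation == 0, given sample r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r between Cards and Balance from the table above; n = 400 is assumed.
t_cards = correlation_t(r=0.086456, n=400)
print(f"t = {t_cards:.3f} on {400 - 2} degrees of freedom")
```

The resulting t is below the two-tail 5% critical value of roughly 1.97 quoted in the t-test tables for similar degrees of freedom, so on its own the number of cards is not significantly correlated with balance.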

5. Examine whether the following demographic variables influence balance: (a) age, (b) years
of education, (c) marital status. For age and years of education, use scatter plots to depict
their relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion.
The demographic variables age, years of education and marital status have no influence on
credit balance.
5a & 5b
Correlation coefficient:
             Age        Education   Balance
Age          1
Education    0.003619   1
Balance      0.001835   -0.00806    1

The correlation coefficients of age and education with balance are almost equal to zero, which
implies there is no linear relation between either variable and credit balance.

Scatter plot:
The scatter plots show no trend, so credit balance does not depend on these variables.

5c.
Null Hypothesis: Average credit card balance is the same for single and married cardholders.
Alternate Hypothesis: Average credit card balance is different for single and married cardholders.

t-Test: Two-Sample Assuming Unequal Variances

                               Single         Married
Mean                           523.2903226    517.9429
Variance                       221735.0385    205696.7
Observations                   155            245
Hypothesized Mean Difference   0
df                             319
t Stat                         0.112233601
P(T<=t) one-tail               0.455354389
t Critical one-tail            1.649644319
P(T<=t) two-tail               0.910708777
t Critical two-tail            1.967428387

The two-tail P value (0.911) is greater than 0.05, so the null hypothesis cannot be rejected, which
means marital status makes no significant difference to average balance.

6. “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out
an analysis of variance (ANOVA) and discuss whether this statement is supported by the data
or not.
Null Hypothesis: Ethnicity of the cardholder does not matter as far as balance is concerned, i.e. the average balance is the same for all groups.
Alternate Hypothesis: Ethnicity of the cardholder matters as far as balance is concerned, i.e. at least one group has a different average balance.

The ANOVA P value (0.957) is greater than 0.05, so the null hypothesis cannot be rejected. The data
support the statement that ethnicity has no significant impact on balance.

Anova: Single Factor

SUMMARY
Groups             Count   Sum      Average    Variance
African American   99      52569    531        235839.2
Asian              102     52256    512.3137   231748.3
Caucasian          199     103181   518.4975   190922.4

ANOVA
Source of Variation   SS            df    MS         F          P-value    F crit
Between Groups        18454.20047   2     9227.1     0.043443   0.957492   3.018452
Within Groups         84321457.71   397   212396.6

Total                 84339911.91   399
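
The F statistic can be rebuilt from the SUMMARY rows alone, which makes it easier to see where the between-groups and within-groups sums of squares come from. This is an illustrative sketch using the counts, sums and (rounded) variances printed above, so the last digits may differ slightly from the spreadsheet output.

```python
# (count, sum, variance) per group, copied from the SUMMARY table above.
groups = {
    "African American": (99, 52569, 235839.2),
    "Asian": (102, 52256, 231748.3),
    "Caucasian": (199, 103181, 190922.4),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / total_n

# Between groups: group size times squared distance of the group mean from the grand mean.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
# Within groups: (n - 1) times each group's sample variance.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}, F = {f_stat:.6f}")
```

An F value this far below the F crit of 3.018 in the table is what leads to the large P value.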

7. A general principle that credit card companies often follow is to assign a higher credit limit
to people with a higher credit rating. Does the data show that this principle is being followed?
Yes, this principle is followed.

Correlation coefficient:

          Limit     Rating
Limit     1
Rating    0.99688   1

Credit limit and credit rating are almost perfectly positively correlated.
Scatter plot :

In this dataset, cardholders with a higher credit rating are assigned a higher credit limit, which the
correlation of 0.997 confirms.

8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and
the balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.

Simple linear regression :

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.861697
R Square            0.742522
Adjusted R Square   0.741875
Standard Error      233.585
Observations        400

ANOVA
             df    SS         MS         F          Significance F
Regression   1     62624255   62624255   1147.764   2.5E-119
Residual     398   21715657   54561.95
Total        399   84339912

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -292.79        26.68341         -10.9728   1.18E-24   -345.249    -240.332    -345.249      -240.332
Credit Limit   0.171637       0.005066         33.87867   2.5E-119   0.161677    0.181597    0.161677      0.181597
Scatter plot :

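Credit limit is a significant predictor of balance (P-value 2.5E-119). The fitted line is
Balance = -292.79 + 0.1716 × Credit Limit, and it explains about 74% of the variation in balance (R² = 0.74).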
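
As a consistency check on the output above, R Square and F can be recomputed from the sums of squares in the ANOVA block. The sketch below only re-derives figures already reported; the variable names are illustrative.

```python
# Sums of squares copied from the ANOVA block of the regression output above.
ss_regression = 62624255
ss_residual = 21715657
ss_total = 84339912
n_obs = 400
k_predictors = 1

# R Square is the share of total variation captured by the regression.
r_squared = ss_regression / ss_total

# F is the regression mean square divided by the residual mean square.
ms_residual = ss_residual / (n_obs - k_predictors - 1)
f_stat = (ss_regression / k_predictors) / ms_residual

# With a single predictor, F equals the square of the slope's t Stat.
t_slope = 33.87867
print(f"R^2 = {r_squared:.6f}, F = {f_stat:.3f}, t^2 = {t_slope ** 2:.3f}")
```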

9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients
and R-squared. Show a scatter plot
Simple linear regression :
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.863625161
R Square            0.745848418
Adjusted R Square   0.745209846
Standard Error      232.0713048
Observations        400

ANOVA
             df    SS            MS         F             Significance F
Regression   1     62904789.88   62904790   1167.994581   1.8989E-120
Residual     398   21435122.03   53857.09
Total        399   84339911.91

                   Coefficients   Standard Error   t Stat     P-value       Lower 95%     Upper 95%      Lower 95.0%   Upper 95.0%
Intercept          -390.8463418   29.06851463      -13.4457   3.07318E-34   -447.993365   -333.6993186   -447.993365   -333.699
Credit Rating(X)   2.566240327    0.075089102      34.17594   1.8989E-120   2.418619483   2.713861171    2.418619483   2.713861

Scatter plot :
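Yes, credit rating is a significant predictor of credit balance (P-value 1.8989E-120). The fitted line is
Balance = -390.85 + 2.566 × Credit Rating, and it explains about 75% of the variation in balance (R² = 0.746).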

10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or
decrease the balance on credit cards. Try to quantify your answers.

- Credit rating and credit limit both have a significant impact on credit card balance, and
each is strongly correlated with it (Multiple R of about 0.86 in both regressions). Balances are
higher for cardholders who have a high credit rating and a high credit limit.

- Each additional $1 of credit limit is associated with about $0.17 more balance (question 8),
and each additional credit rating point with about $2.57 more balance (question 9). Balances
can therefore be increased by raising limits for cardholders with a higher rating and limit, and
decreased by lowering limits for cardholders with a lower rating and limit (based on this
analysis).
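
To put numbers on these mechanisms, the slopes from questions 8 and 9 can be applied to a chosen change in limit or rating. The sketch below is illustrative only: the scenarios (a $1,000 limit increase, a 50-point rating drop) are hypothetical, and it assumes the fitted associations still hold when the company changes a cardholder's limit.

```python
# Slopes copied from the simple regressions in questions 8 and 9.
SLOPE_LIMIT = 0.171637       # balance per unit of credit limit
SLOPE_RATING = 2.566240327   # balance per credit rating point

def expected_balance_change(slope, change_in_x):
    """Predicted change in balance for a given change in the predictor."""
    return slope * change_in_x

# Hypothetical scenarios, chosen only for illustration.
change_limit = expected_balance_change(SLOPE_LIMIT, 1000)    # limit raised by 1,000
change_rating = expected_balance_change(SLOPE_RATING, -50)   # rating lowered by 50 points

print(f"Limit +1000  -> balance change {change_limit:+.2f}")
print(f"Rating -50   -> balance change {change_rating:+.2f}")
```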

11. The credit limit is provided as a consolidated amount for all the credit cards the
cardholder has. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X
variables. Report the coefficients. Discuss the effect on the balance of (a) increasing the
credit limit on the same number of cards and (b) increasing the number of cards without
altering the total credit limit.
Multiple Linear regression:
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.865188295
R Square            0.748550786
Adjusted R Square   0.74728404
Standard Error      231.1247525
Observations        400

ANOVA
             df    SS            MS         F          Significance F
Regression   2     63132707.37   31566354   590.9238   9.8E-120
Residual     397   21207204.54   53418.65
Total        399   84339911.91

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -369.0359554   36.16414657      -10.2045   7.23E-22   -440.133    -297.939    -440.133      -297.939
Credit Limit   0.171479037    0.005013136      34.20594   2E-120     0.161623    0.181335    0.161623      0.181335
Cards          26.03375427    8.438363509      3.085166   0.002177   9.444291    42.62322    9.444291      42.62322

Credit limit and number of cards are both significant predictors of credit balance (P-values
2E-120 and 0.002).
Multiple R = 0.865 and R-square = 0.749
(a) With the number of cards unchanged, an increase of one unit ($) in the credit limit increases
the predicted balance by about 0.17. Credit limit is by far the stronger predictor: its t Stat of 34.2
means the coefficient is 34.2 standard errors away from zero, against 3.1 for cards.
(b) With the total credit limit unchanged, one additional card increases the predicted balance by
about 26.03, i.e. more cards mean a higher balance.
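
The two effects can be illustrated by evaluating the fitted equation for a baseline cardholder and then changing one input at a time. The baseline (a limit of 5,000 on 3 cards) is a hypothetical example, not a row from the dataset.

```python
# Coefficients copied from the multiple regression output above.
INTERCEPT = -369.0359554
COEF_LIMIT = 0.171479037
COEF_CARDS = 26.03375427

def predicted_balance(limit, cards):
    """Predicted balance from the fitted equation Balance = b0 + b1*Limit + b2*Cards."""
    return INTERCEPT + COEF_LIMIT * limit + COEF_CARDS * cards

# Hypothetical baseline cardholder, for illustration only.
base = predicted_balance(limit=5000, cards=3)
more_limit = predicted_balance(limit=6000, cards=3)  # (a) +1,000 limit, same cards
more_cards = predicted_balance(limit=5000, cards=4)  # (b) +1 card, same total limit

print(f"(a) effect of +1000 limit: {more_limit - base:+.2f}")
print(f"(b) effect of +1 card:     {more_cards - base:+.2f}")
```

Because the model is linear, these differences do not depend on the baseline chosen.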

12. Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficients. Is the coefficient of Income significantly different from zero? What does this say
about the effect of income on balance?
Scatter plot: Balance (y) against Income (x), with trend line y = 6.0484x + 246.51, R² = 0.215

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.463656457
R Square            0.21497731
Adjusted R Square   0.213004891
Standard Error      407.8647195
Observations        400

ANOVA
             df    SS            MS         F             Significance F
Regression   1     18131167.4    18131167   108.9917152   1.03089E-22
Residual     398   66208744.51   166353.6
Total        399   84339911.91

            Coefficients   Standard Error   t Stat     P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   246.5147506    33.19934735      7.425289   6.90344E-13   181.2467485   311.7827527   181.2467485   311.7827527
Income      6.048363409    0.579350163      10.43991   1.03089E-22   4.909394402   7.187332415   4.909394402   7.187332415

Correlation coefficient:
             Income        Balance(y)
Income       1
Balance(y)   0.463656457   1

Correlation coefficient for the two variables = 0.46

The regression coefficient of income is 6.048. Yes, it is significantly different from zero: its 95%
confidence interval runs from 4.91 to 7.19 and does not include zero, and the t Stat shows it is 10.4
standard errors away from zero (P-value 1.03E-22). Each additional unit of income is associated with
a balance about 6.05 higher, so income is a significant predictor, although on its own it explains only
about 21% of the variation in balance (R² = 0.215).
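
The t Stat and the 95% interval for the income coefficient follow directly from the coefficient and its standard error. In the sketch below the critical value 1.966 is an assumed, rounded two-tail 5% value for 398 degrees of freedom (the t-test tables earlier in this document give about 1.966 for similar df), so the interval matches the reported one only to a few decimals.

```python
# Coefficient and standard error of Income, copied from the output above.
coef = 6.048363409
std_err = 0.579350163

# Assumed, rounded two-tail 5% critical value for df = 398 (not computed here).
T_CRITICAL = 1.966

t_stat = coef / std_err
ci_low = coef - T_CRITICAL * std_err
ci_high = coef + T_CRITICAL * std_err

# The coefficient differs significantly from zero if the interval excludes zero.
significant = not (ci_low <= 0 <= ci_high)
print(f"t = {t_stat:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), significant: {significant}")
```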

13. Based on the equation derived in question 12, what is the estimated balance for a person
with an income of USD 100k per year?
Scatter plot: Balance (y) against Income (x), with trend line f(x) = 6.05x + 246.51, R² = 0.21

Based on the equation derived, Y = 6.0484(X) + 246.51, where X = Income (in thousands of USD):
Y = 6.0484(100) + 246.51 = 851.35
Estimated balance for a person with an income of USD 100k per year = $851.35.
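
The point estimate can be reproduced in a few lines, and the regression's Standard Error from question 12 gives a feel for how uncertain it is. The ±2 standard error band below is only a rough approximation added for illustration: it ignores the uncertainty in the coefficients themselves and is not part of the original analysis.

```python
# Trend line from question 12; income is entered in thousands of USD.
SLOPE = 6.0484
INTERCEPT = 246.51
# "Standard Error" from the regression statistics in question 12.
RESIDUAL_STD_ERROR = 407.8647195

def estimate_balance(income_thousands):
    """Point estimate of balance from the fitted line."""
    return SLOPE * income_thousands + INTERCEPT

estimate = estimate_balance(100)

# Rough band of +/- 2 residual standard errors (an approximation, see the note above).
rough_low = estimate - 2 * RESIDUAL_STD_ERROR
rough_high = estimate + 2 * RESIDUAL_STD_ERROR

print(f"Estimate: {estimate:.2f}, rough range: {rough_low:.2f} to {rough_high:.2f}")
```

The width of the band reflects the low R² of this model: income alone leaves most of the variation in balance unexplained.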

14. Based on the dataset, explore the relationship between credit card balance (Y) and (a)
Income, (b) Age, (c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple
linear regression model and report the statistical significance of each of these variables.

Multiple regression model:


SUMMARY OUTPUT (Income, Limit, Rating, Age and Education)

Regression Statistics
Multiple R          0.936702578
R Square            0.87741172
Adjusted R Square   0.875856031
Standard Error      161.9917647
Observations        400

ANOVA
             df    SS            MS            F             Significance F
Regression   5     74000827.17   14800165.43   564.0020686   4.5908E-177
Residual     394   10339084.74   26241.33183
Total        399   84339911.91

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -473.2514026   55.10833546      -8.587655545   2.08837E-16   -581.5945666   -364.908    -581.595      -364.908
Income      -7.608832003   0.381931562      -19.92197755   1.37077E-61   -8.359710677   -6.85795    -8.35971      -6.85795
Limit       0.07901642     0.044791005      1.764113581    0.078487737   -0.009042839   0.167076    -0.00904      0.167076
Rating      2.773843725    0.667079559      4.158190261    3.93909E-05   1.462363177    4.085324    1.462363      4.085324
Age         -0.860030445   0.478700493      -1.796594023   0.073165937   -1.801157147   0.081096    -1.80116      0.081096
Education   1.967791521    2.605290902      0.755305874    0.450516748   -3.154218733   7.089802    -3.15422      7.089802

Correlation coefficient:
            Income     Limit      Rating     Age        Education   Balance
Income      1
Limit       0.792088   1
Rating      0.791378   0.99688    1
Age         0.175338   0.100888   0.103165   1
Education   -0.02769   -0.02355   -0.03014   0.003619   1
Balance     0.463656   0.861697   0.863625   0.001835   -0.00806    1

SUMMARY OUTPUT (Income and Rating only)

Regression Statistics
Multiple R          0.93547739
R Square            0.875117948
Adjusted R Square   0.874488819
Standard Error      162.8813393
Observations        400

ANOVA
             df    SS            MS            F             Significance F
Regression   2     73807370.62   36903685.31   1390.999823   4.5212E-180
Residual     397   10532541.29   26530.33071
Total        399   84339911.91

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%   Upper 95.0%
Intercept   -534.8121502   21.60269845      -24.75672896   1.66359E-82   -577.2821357   -492.3421648   -577.282136   -492.3421648
Income      -7.672124366   0.378462026      -20.2718472    3.1071E-63    -8.416164597   -6.928084134   -8.4161646    -6.928084134
Rating      3.949264832    0.086209035      45.81033566    1.4482E-160   3.77978154     4.118748125    3.77978154    4.118748125

Explanation :
- In the multiple regression on all five variables, only income and rating are statistically
significant predictors at the 0.05 level, based on their P-values.
- Together, income, education, age, limit and rating explain 87.7% of the variation in the
credit card balance.
- To check whether all of these variables contribute to the variation in balance or only some
of them do, only the variables with acceptable P-values were retained.
- The regression was therefore run again with only the low P-value variables, income and
rating.
- This two-variable regression explains 87.5% of the variation in the credit card balance,
which is almost the same R-square as before.
- This confirms that income and rating are the two significant predictors. Limit is not
significant alongside rating because the two are almost perfectly correlated (0.997), so limit
adds little information once rating is in the model.
- The residuals were also examined for patterns.
- For income, more of the residuals are negative, particularly in the lower income groups, and
the relationship does not look linear.
- For rating, the residuals are positive at low and high ratings and negative for the ratings in
between, although the fitted line for rating is reasonable.
Concluding Remarks,
- Income and rating are the two important variables contributing to the change in balance,
whereas limit, age and education are not significant variables for the balance.
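
The comparison of the five-variable and two-variable models can be made more formal with adjusted R-square and a partial F statistic for the three dropped variables. The sketch below uses only figures from the two SUMMARY OUTPUT blocks; the partial F-test is an addition for illustration and is not part of the original analysis.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for a model with k predictors fitted on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# R Square values copied from the two regression outputs above.
adj_full = adjusted_r_squared(0.87741172, n=400, k=5)
adj_reduced = adjusted_r_squared(0.875117948, n=400, k=2)

# Residual sums of squares and residual df, copied from the two ANOVA blocks.
sse_full, df_full = 10339084.74, 394
sse_reduced, df_reduced = 10532541.29, 397

# Partial F: extra residual variation per dropped variable, relative to the
# full model's residual mean square.
partial_f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

print(f"Adjusted R^2: full = {adj_full:.6f}, reduced = {adj_reduced:.6f}")
print(f"Partial F for dropping Limit, Age and Education = {partial_f:.3f}")
```

The partial F comes out at about 2.46. The 5% critical value of F with 3 and 394 degrees of freedom is roughly 2.6 (recalled from standard F tables, not computed here), so dropping limit, age and education does not significantly worsen the fit, which is consistent with the nearly identical adjusted R-square values.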
