DADM Assignment
1. A company manager says that the average balance on their credit cards is $500. Do you think
that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Answer: Yes, the assertion is justified. A one-sample t-test finds no significant difference
between the sample mean balance ($520.02) and the claimed average of $500.
Null Hypothesis: Average balance on their credit card is $500.
Alternate Hypothesis: Average balance on their credit card is not $500.
t-Test: Two-Sample Assuming Unequal Variances

                                Balance
Mean                            520.015
Variance                        211378.2253
Observations                    400
Hypothesized Mean Difference    500
df                              399
t Stat                          0.870673781
P(T<=t) one-tail                0.192227914
t Critical one-tail             1.648681534
P(T<=t) two-tail                0.384455827
t Critical two-tail             1.965927296
The p-value (0.192 one-tail, 0.384 two-tail) is greater than the 0.05 significance level, so the
null hypothesis cannot be rejected.
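As a cross-check, the t statistic can be recomputed from the summary statistics in the table. The following is a minimal Python sketch, not the spreadsheet procedure used for this assignment; the function name one_sample_t is hypothetical, and the critical value is copied from the table rather than computed.

```python
import math

def one_sample_t(mean, variance, n, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = math.sqrt(variance / n)  # standard error of the sample mean
    return (mean - mu0) / standard_error, n - 1

# Summary statistics copied from the table for question 1.
t_stat, df = one_sample_t(mean=520.015, variance=211378.2253, n=400, mu0=500)

# Two-tail critical value at the 0.05 level, copied from the table (not computed here).
t_critical_two_tail = 1.965927296
reject_null = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.4f}, df = {df}, reject null: {reject_null}")
```

The null hypothesis is rejected only when the absolute t statistic exceeds the critical value, which is equivalent to the two-tail p-value falling below 0.05.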
2. Is there a difference between men and women as far as average balance is concerned?
Use a two-sample t-test to draw your conclusion.
Answer: There is no difference between men and women with respect to average credit card
balance.
Null Hypothesis: The average credit card balance does not differ between men and women.
Alternate Hypothesis: The average credit card balance differs between men and women.
t-Test: Two-Sample Assuming Unequal Variances

                                Female Balance    Male Balance
Mean                            529.5362319       509.8031088
Variance                        210187.1043       213554.5652
Observations                    207               193
Hypothesized Mean Difference    0
df                              396
t Stat                          0.42838443
P(T<=t) one-tail                0.334302083
t Critical one-tail             1.648710601
P(T<=t) two-tail                0.668604165
t Critical two-tail             1.965972608
The two-tail p-value (0.669) is greater than the 0.05 significance level, so the null hypothesis
cannot be rejected.
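The unequal-variances (Welch) t statistic and its degrees of freedom can be reproduced from the group means, variances and counts. The sketch below is illustrative only; the function name welch_t is hypothetical, and the critical value is copied from the table for question 2 rather than computed.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of the two group means
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Female vs. male balance, values copied from the table for question 2.
t_stat, df = welch_t(529.5362319, 210187.1043, 207,
                     509.8031088, 213554.5652, 193)

# Two-tail critical value at the 0.05 level, copied from the table.
t_critical_two_tail = 1.965972608
reject_null = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.4f}, df = {df:.1f}, reject null: {reject_null}")
```

The same function can be applied to the other two-sample tables in this assignment by substituting their means, variances and counts.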
t-Test: Two-Sample Assuming Unequal Variances

                                Student Balance    Non-Student Balance
Mean                            876.825            480.3694444
Variance                        240101.9429        193085.1361
Observations                    40                 360
Hypothesized Mean Difference    0
df                              46
t Stat                          4.902778661
P(T<=t) one-tail                6.08619E-06
t Critical one-tail             1.678660414
P(T<=t) two-tail                1.21724E-05
t Critical two-tail             2.012895599
4. It is generally assumed that if there are more credit cards then the balance on the cards
will be more. Based on this dataset, do you think this is true? Calculate a correlation
coefficient and show a scatter plot to support your answer.
Answer: There is no meaningful correlation between the number of cards and the balance on the
cards, since the correlation coefficient (0.086) is close to zero.
           Cards       Balance
Cards      1
Balance    0.086456    1
[Figure: scatter plot of Balance (y-axis) against number of Cards (x-axis)]
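Whether a correlation this small differs from zero can be checked with the usual t statistic for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below assumes n = 400, the number of observations reported in the t-test tables, and uses an approximate critical value; both are assumptions rather than figures stated for this question.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r_cards_balance = 0.086456  # from the correlation matrix for question 4
n = 400                     # assumption: same 400 observations as the t-test tables

t_stat = correlation_t(r_cards_balance, n)

# Assumption: approximate 0.05 two-tail critical value for about 398 degrees of freedom.
t_critical_two_tail = 1.966
significant = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

Under these assumptions the t statistic stays below the critical value, which is consistent with the answer that the number of cards is not meaningfully related to the balance.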
5. Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion.
Answer: 5(a) & 5(b) Correlation Coefficient

             Age         Education    Balance
Age          1
Education    0.003619    1
Balance      0.001835    -0.00806     1
The correlation coefficients of balance with age (0.0018) and with education (-0.0081) are almost
zero, so neither age nor years of education has a linear relationship with credit balance.
[Figure: scatter plot of Balance (y-axis) against Age (x-axis)]
[Figure: scatter plot of Balance (y-axis) against Education (x-axis)]
t-Test: Two-Sample Assuming Unequal Variances

                                Married Balance    Unmarried Balance
Mean                            517.9428571        523.2903226
Variance                        205696.7262        221735.0385
Observations                    245                155
Hypothesized Mean Difference    0
df                              319
t Stat                          -0.112233601
P(T<=t) one-tail                0.455354389
t Critical one-tail             1.649644319
P(T<=t) two-tail                0.910708777
t Critical two-tail             1.967428387
5(c) Answer: The two-tail p-value (0.911) is greater than 0.05, so the null hypothesis cannot be
rejected; there is no significant difference in balance due to marital status.
6. “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an
analysis of variance (ANOVA) and discuss whether this statement is supported by the data or
not.
Answer:
Null Hypothesis: Ethnicity of the cardholder does not affect the balance.
Alternate Hypothesis: Ethnicity of the cardholder affects the balance.
Based on the ANOVA, the p-value (0.957) is greater than the 0.05 significance level, so the null
hypothesis cannot be rejected. The data support the statement that ethnicity has no effect on the
balance.
SUMMARY
Groups              Count    Sum       Average     Variance
African American    99       52569     531         235839.2
Asian               102      52256     512.3137    231748.3
Caucasian           199      103181    518.4975    190922.4

ANOVA
Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         18454.2     2      9227.1      0.043443    0.957492    3.018452
Within Groups          84321458    397    212396.6
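The F statistic in the ANOVA table can be rebuilt from the SUMMARY rows alone: the between-groups sum of squares comes from the group means and counts, and the within-groups sum of squares from the group variances. The sketch below is a minimal illustration with hypothetical variable names; because the variances in the table are rounded, the within-groups sum of squares differs slightly from the reported 84321458.

```python
# (count, sum, variance) per group, copied from the SUMMARY table.
groups = {
    "African American": (99, 52569, 235839.2),
    "Asian": (102, 52256, 231748.3),
    "Caucasian": (199, 103181, 190922.4),
}

total_n = sum(count for count, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / total_n

# Between-groups SS: weighted squared distance of each group mean from the grand mean.
ss_between = sum(count * (total / count - grand_mean) ** 2
                 for count, total, _ in groups.values())
# Within-groups SS: each sample variance times its degrees of freedom.
ss_within = sum((count - 1) * variance for count, _, variance in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

f_critical = 3.018452  # copied from the ANOVA table, not computed here
reject_null = f_stat > f_critical

print(f"F = {f_stat:.6f}, df = ({df_between}, {df_within}), reject null: {reject_null}")
```

An F statistic far below the critical value means the variation between ethnic groups is small relative to the variation within them.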
[Figure: scatter plot of Rating (y-axis) against Credits (x-axis)]
8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot. State
inference.
Answer:
Simple Linear Regression
Balance = 0.1716 × Limit - 292.79 (slope 0.1716, intercept -292.79)
R² = 0.7425
[Figure: scatter plot of Balance (y-axis) against credit limit (x-axis) with the fitted line]
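To make the fitted line concrete, the reported coefficients can be used to predict a balance for a given credit limit, and the correlation implied by R² can be recovered as its square root. The sketch below is illustrative; the $5,000 limit is a hypothetical input, not a value from the dataset.

```python
import math

SLOPE = 0.1716       # reported coefficient of credit limit
INTERCEPT = -292.79  # reported intercept
R_SQUARED = 0.7425   # reported R-squared

def predict_balance(limit):
    """Predicted balance for a given credit limit under the fitted line."""
    return SLOPE * limit + INTERCEPT

# Hypothetical cardholder with a $5,000 credit limit.
predicted = predict_balance(5000)

# With a single predictor, |r| = sqrt(R^2); the sign follows the (positive) slope.
correlation = math.sqrt(R_SQUARED)

print(f"predicted balance = {predicted:.2f}, implied correlation = {correlation:.4f}")
```

The slope says that each additional $1 of credit limit is associated with about $0.17 of additional balance, and the R² says that credit limit accounts for about 74% of the variation in balance.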
9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and
R-squared. Show a scatter plot. State inference.
Answer: Yes, credit rating influences the credit balance; the two are correlated.
[Figure: scatter plot of Balance (y-axis) against credit rating (x-axis)]
10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease
the balance on credit cards. Try to quantify your answers. In this context, focus on possible
specific strategies using variables in Q8 and Q9 that the business could adopt to increase the
balance on credit cards.
Answer:
Based on the findings of Q8 and Q9, credit rating and credit limit are well correlated with
each other, and both affect an individual's credit card balance.
Individuals with a high credit rating have a high credit limit, which increases their
credit card balance.
Individuals with a lower credit rating have a lower credit limit, which leads them to
hold a lower credit card balance.
11. 11a. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X variables;
11b. Report the coefficients; 11c. Discuss the effect on balance of (a) increasing the credit
limit on the same number of cards and (b) increasing the number of cards without altering
the total credit limit.
Answer:
Credit limit and the number of cards both have a significant effect on the credit card balance.
The multiple correlation coefficient is 0.865 and the R² value is 0.748.
An increase of $1 in the credit limit increases the balance by $0.17, with the number of cards unchanged.
One additional card increases the balance by $26.03, with the credit limit unchanged.
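The two reported coefficients can be combined to quantify the scenarios in 11c. The sketch below works only with changes in balance, because the intercept of the multiple regression is not reported here; the $1,000 limit increase is a hypothetical example.

```python
LIMIT_COEF = 0.17   # reported change in balance per $1 of credit limit
CARDS_COEF = 26.03  # reported change in balance per additional card

def balance_change(delta_limit, delta_cards):
    """Predicted change in balance for given changes in limit and card count."""
    return LIMIT_COEF * delta_limit + CARDS_COEF * delta_cards

# (a) Raise the credit limit by a hypothetical $1,000 on the same number of cards.
raise_limit = balance_change(delta_limit=1000, delta_cards=0)

# (b) Add one card without altering the total credit limit.
add_card = balance_change(delta_limit=0, delta_cards=1)

print(f"(a) +{raise_limit:.2f}  (b) +{add_card:.2f}")
```

Each coefficient is read with the other variable held fixed, which is why the two scenarios are evaluated separately.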
12. Run a simple linear regression equation with Income as X and Balance as Y. b) Report the
coefficients. c) Is the coefficient of Income significantly different from zero? d) What does this
say about the effect of income on balance?
Answer:
           Income      Balance
Income     1
Balance    0.463656    1
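In a simple linear regression with one predictor, R² equals the squared correlation, and the t statistic of the slope equals the t statistic of the correlation. This allows part (c) to be checked from the correlation alone. The sketch below assumes n = 400, the number of observations reported in the t-test tables; that sample size is an assumption for this question.

```python
import math

r_income_balance = 0.463656  # from the correlation matrix for question 12
n = 400                      # assumption: same 400 observations as the t-test tables

# With one predictor, R-squared is the square of the correlation coefficient.
r_squared = r_income_balance ** 2

# t statistic of the slope, which equals the t statistic of the correlation.
slope_t = r_income_balance * math.sqrt((n - 2) / (1 - r_squared))

print(f"R-squared = {r_squared:.4f}, slope t = {slope_t:.2f}")
```

Under this assumption the t statistic is far above the two-tail critical values of about 1.97 seen in the earlier tables, which would indicate that the coefficient of Income is significantly different from zero, while income alone explains only about 21% of the variation in balance.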
14. Generate a regression output and discuss the statistical significance of the coefficients.
Mention how credit balance depends on each of the variables (a) Income (b) Age (c)
Education (d) Limit and (e) Rating.
Answer:
Based on the multiple linear regression model, credit rating and income are the two factors
that have a significant effect on the credit balance. The other factors (age, education
and limit) do not have a significant effect. Income and rating are therefore the
predictors of the credit balance.