DADM Assignment

The document analyzes credit card data using statistical tests to understand relationships between various variables. T-tests and ANOVA are used to compare average balances between groups, and correlations and linear regressions identify relationships between balance and factors like credit limit and rating.

Uploaded by

Shammi
DADM Assignment

1. A company manager says that the average balance on their credit cards is $500. Do you think
that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Answer: Yes, the assertion is justified. Based on a one-sample t-test, the sample mean balance
of $520.02 is not significantly different from $500.
Null Hypothesis: Average balance on their credit card is $500.
Alternate Hypothesis: Average balance on their credit card is not $500.

t-Test: Two-Sample Assuming Unequal Variances

                                Balance
Mean                            520.015
Variance                        211378.2253
Observations                    400
Hypothesized Mean Difference    500
df                              399
t Stat                          0.870673781
P(T<=t) one-tail                0.192227914
t Critical one-tail             1.648681534
P(T<=t) two-tail                0.384455827
t Critical two-tail             1.965927296

The two-tail P value (0.384) is greater than the 0.05 significance level, so the null hypothesis
cannot be rejected.
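
The t statistic in the table can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not the method used in the assignment (the table looks like spreadsheet output). The helper name `one_sample_t` is hypothetical, and the critical value is copied from the table rather than computed.

```python
import math

def one_sample_t(sample_mean, sample_variance, n, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = math.sqrt(sample_variance / n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1

# Summary statistics taken from the table above.
t_stat, df = one_sample_t(520.015, 211378.2253, 400, 500)

# Two-tail critical value at the 0.05 level, copied from the table (df = 399).
T_CRITICAL_TWO_TAIL = 1.965927296
reject_null = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(f"t = {t_stat:.4f}, df = {df}, reject null: {reject_null}")
```

Because the absolute t statistic is below the two-tail critical value, the sketch reaches the same decision as the P value comparison: the null hypothesis of a $500 mean is not rejected.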

2. Is there a difference between men and women as far as average balance is concerned?
Use a two-sample t-test to draw your conclusion.

Answer: There is no significant difference between men and women in average credit card
balance.

Null Hypothesis: The average credit card balance is the same for men and women.

Alternate Hypothesis: The average credit card balance differs between men and women.
t-Test: Two-Sample Assuming Unequal Variances

                                Female Balance    Male Balance
Mean                            529.5362319       509.8031088
Variance                        210187.1043       213554.5652
Observations                    207               193
Hypothesized Mean Difference    0
df                              396
t Stat                          0.42838443
P(T<=t) one-tail                0.334302083
t Critical one-tail             1.648710601
P(T<=t) two-tail                0.668604165
t Critical two-tail             1.965972608

The two-tail P value (0.669) is greater than 0.05, so the null hypothesis cannot be rejected.
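
For reference, the unequal-variance (Welch) t statistic and its approximate degrees of freedom can be recomputed from the group summaries in the table. This is a hedged sketch: `welch_t` is a hypothetical helper name, and the degrees of freedom use the Welch-Satterthwaite approximation, which is assumed to be what the reported df is based on.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite df) for two independent samples."""
    se1_sq = var1 / n1  # squared standard error of the first group mean
    se2_sq = var2 / n2  # squared standard error of the second group mean
    t_stat = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t_stat, df

# Female and male balance summaries from the table above.
t_stat, df = welch_t(529.5362319, 210187.1043, 207,
                     509.8031088, 213554.5652, 193)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

The unrounded degrees of freedom come out slightly below the reported 396, which is consistent with the table showing the value rounded to the nearest integer.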

3. Is there a difference between students and non-students as far as average balance is
concerned? Use a two-sample t-test to draw your conclusion.
Answer: Yes, there is a significant difference between students and non-students in
average balance. The two-tail P value (1.22E-05) is far below 0.05, so the null hypothesis is
rejected; students carry a higher average balance (876.83 versus 480.37).
Null hypothesis: No difference in average balance between students and non-students.
Alternate hypothesis: Difference in average balance between students and non-students.

t-Test: Two-Sample Assuming Unequal Variances

                                Student Balance    Non-Student Balance
Mean                            876.825            480.3694444
Variance                        240101.9429        193085.1361
Observations                    40                 360
Hypothesized Mean Difference    0
df                              46
t Stat                          4.902778661
P(T<=t) one-tail                6.08619E-06
t Critical one-tail             1.678660414
P(T<=t) two-tail                1.21724E-05
t Critical two-tail             2.012895599

4. It is generally assumed that if there are more credit cards then the balance on the cards
will be more. Based on this dataset, do you think this is true? Calculate a correlation
coefficient and show a scatter plot to support your answer.

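Answer: The correlation between the number of cards and the balance is very weak: the
correlation coefficient is 0.086, which is close to zero. The data therefore give little support
to the assumption that more cards mean a higher balance.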

           Cards       Balance
Cards      1
Balance    0.086456    1

[Scatter plot: Balance (y-axis, 0 to 2500) against number of cards (x-axis, 0 to 10)]
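
One way to make "close to zero" more concrete is to test whether the correlation differs from zero, using t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below assumes the correlation was computed on the same 400 observations as the earlier t-tests; `correlation_t` is a hypothetical helper name.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r between Cards and Balance from the correlation matrix; n = 400 is an assumption.
N_OBSERVATIONS = 400
t_stat = correlation_t(0.086456, N_OBSERVATIONS)
print(f"t = {t_stat:.2f} on {N_OBSERVATIONS - 2} degrees of freedom")
```

The resulting t statistic is below the two-tail critical value of roughly 1.966 that the t-test tables in this assignment list for about 400 degrees of freedom. Under these assumptions, the correlation is not significantly different from zero at the 0.05 level.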

5. Examine whether the following demographic variables influence balance: (a) age, (b) years of
education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion.
Answer: 5(a) & 5(b) Correlation coefficients

             Age         Education    Balance
Age          1
Education    0.003619    1
Balance      0.001835    -0.00806     1

The correlation coefficients of balance with age (0.0018) and with education (-0.0081) are
almost zero, so there is no linear relationship between age or education and credit balance.
[Scatter plot: Balance (y-axis, 0 to 2500) against Age (x-axis, 0 to 120)]

[Scatter plot: Balance (y-axis, 0 to 2500) against Education (x-axis, 0 to 25)]

t-Test: Two-Sample Assuming Unequal Variances

                                Married Balance    Unmarried Balance
Mean                            517.9428571        523.2903226
Variance                        205696.7262        221735.0385
Observations                    245                155
Hypothesized Mean Difference    0
df                              319
t Stat                          -0.112233601
P(T<=t) one-tail                0.455354389
t Critical one-tail             1.649644319
P(T<=t) two-tail                0.910708777
t Critical two-tail             1.967428387

5(c) Answer: The two-tail P value (0.911) is greater than 0.05, so the null hypothesis cannot be
rejected; there is no significant difference in balance due to marital status.

6. “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry out an
analysis of variance (ANOVA) and discuss whether this statement is supported by the data or
not.
Answer:
Null Hypothesis: The ethnicity of the cardholder has no effect on the balance.
Alternate Hypothesis: The ethnicity of the cardholder has an effect on the balance.
Based on the ANOVA, the P value (0.957) is greater than the 0.05 significance level, so the null
hypothesis cannot be rejected: the data support the statement that ethnicity does not matter
as far as balance is concerned.

Anova: Single Factor

SUMMARY
Groups              Count    Sum       Average     Variance
African American    99       52569     531         235839.2
Asian               102      52256     512.3137    231748.3
Caucasian           199      103181    518.4975    190922.4

ANOVA
Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         18454.2     2      9227.1      0.043443    0.957492    3.018452
Within Groups          84321458    397    212396.6

Total                  84339912    399
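
The F statistic can be rebuilt from the SUMMARY rows alone, which is a useful check on the ANOVA table. The sketch below is illustrative; `one_way_anova_f` is a hypothetical helper name, and small differences from the table are expected because the published variances are rounded.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic from (count, sum, variance) per group."""
    total_n = sum(count for count, _, _ in groups)
    grand_mean = sum(total for _, total, _ in groups) / total_n

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(
        count * (total / count - grand_mean) ** 2 for count, total, _ in groups
    )
    # Within-group sum of squares: recovered from each group's sample variance.
    ss_within = sum((count - 1) * variance for count, _, variance in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# (count, sum, variance) rows from the SUMMARY table above.
summary = [
    (99, 52569, 235839.2),    # African American
    (102, 52256, 231748.3),   # Asian
    (199, 103181, 190922.4),  # Caucasian
]
f_stat, df_between, df_within = one_way_anova_f(summary)
print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) degrees of freedom")
```

The recomputed F is far below the reported F crit of 3.018452, in line with the conclusion that the group means do not differ significantly.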


7. A general principle that credit card companies often follow is to assign a higher credit limit
to people with a higher credit rating. Does the data show that this principle is being
followed?
Answer: Yes, the data show that this principle is being followed: cardholders with higher credit
ratings have higher credit limits.

[Scatter plot: Rating (y-axis, 0 to 1200) against Credits (x-axis, 0 to 16000)]

8. Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and the
balance is the Y). Report the coefficients and the R-squared. Show a scatter plot. State
inference.
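Answer:
Simple linear regression of balance (y) on credit limit (x):
y = 0.1716x - 292.79
R² = 0.7425

[Scatter plot: Balance (y-axis, -500 to 2500) against credit limit (x-axis, 0 to 15000), with the fitted regression line]

Inference: the slope is 0.1716, so each additional $1 of credit limit is associated with an
increase of about $0.17 in balance; the intercept is -292.79. With R² = 0.74, credit limit
explains about 74% of the variation in balance.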
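
As a rough illustration of how the slope, intercept and R² of a simple linear regression are obtained, the sketch below fits a least-squares line by hand. The toy numbers are hypothetical and are not the assignment's credit card data; `fit_line` is likewise a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical toy values, NOT the assignment's data.
toy_limit = [1, 2, 3, 4, 5]
toy_balance = [2, 4, 5, 4, 5]
slope, intercept, r_squared = fit_line(toy_limit, toy_balance)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

Applying the fitted line reported for the assignment data, y = 0.1716x - 292.79, to a hypothetical credit limit of $5,000 gives a predicted balance of 0.1716 × 5000 - 292.79 = $565.21.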

9. Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients and
R-squared. Show a scatter plot. State inference.
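Answer: Yes, credit rating is strongly related to credit balance.

Simple linear regression of balance (y) on credit rating (x):
y = 2.5662x - 390.85
R² = 0.7458

[Scatter plot: Balance (y-axis, -500 to 2500) against credit rating (x-axis, 0 to 1200), with the fitted regression line]

Inference: each additional rating point is associated with an increase of about $2.57 in balance,
and credit rating explains about 75% of the variation in balance.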
10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease
the balance on credit cards. Try to quantify your answers. In this context, focus on possible
specific strategies using variables in Q8 and Q9 that the business could adopt to increase the
balance on credit cards.
Answer:
• Based on the findings of Q8 and Q9, credit rating and credit limit are closely related to each
other, and both are strongly associated with an individual's credit card balance.
• Individuals with a high credit rating receive a high credit limit, which is associated with a
higher balance. From Q8, raising the credit limit by $1,000 corresponds to about $172 more
balance (0.1716 × 1000); from Q9, a rating 100 points higher corresponds to about $257 more
balance (2.5662 × 100).
• Individuals with a lower credit rating have a lower credit limit and therefore tend to hold a
lower credit card balance.
11. 11a. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X variables;
11b. Report the coefficients; 11c. Discuss the effect on balance of (a) increasing the credit
limit on the same number of cards and (b) increasing the number of cards without altering
the total credit limit.
Answer:

Credit limit and the number of cards both have a significant effect on the credit card balance.
The multiple correlation coefficient (multiple R) is 0.865 and the R² value is 0.748.
(a) Holding the number of cards fixed, an increase of $1 in the credit limit increases the balance by about $0.17.
(b) Holding the total credit limit fixed, one additional card increases the balance by about $26.03.
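
The two coefficients can be combined to estimate the change in predicted balance for any mix of changes in limit and card count. This is only a sketch: the intercept is not reported in the answer, so the code works with changes rather than balance levels, and the constant and function names are hypothetical.

```python
# Coefficients as reported in the answer above; the intercept is not reported,
# so only changes in predicted balance can be computed, not balance levels.
LIMIT_COEF = 0.17    # dollars of balance per extra dollar of credit limit
CARDS_COEF = 26.03   # dollars of balance per additional card

def predicted_balance_change(delta_limit, delta_cards):
    """Change in predicted balance for given changes in limit and card count."""
    return LIMIT_COEF * delta_limit + CARDS_COEF * delta_cards

# (a) Hypothetical $1,000 limit increase on the same number of cards.
change_a = predicted_balance_change(1000, 0)
# (b) One more card with the total credit limit unchanged.
change_b = predicted_balance_change(0, 1)
print(f"(a) {change_a:.2f}  (b) {change_b:.2f}")
```

Each coefficient is a partial effect: it describes the change in balance when the other variable is held constant.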
12. Run a simple linear regression equation with Income as X and Balance as Y. b) Report the
coefficients. c) Is the coefficient of Income significantly different from zero? d) What does this
say about the effect of income on balance?
Answer:

           Income      Balance
Income     1
Balance    0.463656    1

Correlation coefficient of income and balance is 0.46.


b) The regression coefficient of income is 6.04.
c) Yes, it is significantly different from zero: its confidence interval, from a lower bound of
4.90 to an upper bound of 7.18, does not contain zero.
d) Each additional unit of income is associated with an increase of 6.04 in balance, so income
is a significant predictor of balance.
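
The reported bounds also allow a rough back-calculation of the standard error and t statistic of the income coefficient. The sketch below assumes the bounds form a 95% confidence interval with about 398 degrees of freedom, for which the two-tail critical t value is roughly 1.966. Neither assumption is stated in the answer, so the output is approximate.

```python
# Values reported in the answer above.
COEF = 6.04
LOWER, UPPER = 4.90, 7.18

# Assumption: the bounds are a 95% interval with about 398 degrees of freedom,
# for which the two-tail critical t value is roughly 1.966.
T_CRITICAL = 1.966

excludes_zero = LOWER > 0 or UPPER < 0
standard_error = (UPPER - LOWER) / (2 * T_CRITICAL)  # half-width / critical value
t_stat = COEF / standard_error
print(f"excludes zero: {excludes_zero}, SE ~ {standard_error:.2f}, t ~ {t_stat:.1f}")
```

Under these assumptions the t statistic is far above the critical value, which agrees with the interval excluding zero.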
13. 13.a) Write the regression equation b) Estimate the balance.
Answer: Regression equation: Y = B0 + B1(X), where B0 is the intercept and B1 is the slope coefficient.

14. Generate a regression output and discuss the statistical significance of the coefficients.
Mention how credit balance depends on each of the variables (a) Income (b) Age (c)
Education (d) Limit and (e) Rating.
Answer:

Based on the multiple linear regression model, credit rating and income are the two factors
that have a significant effect on the credit balance. The other factors (age, education
and limit) are not significant. Income and rating are therefore the predictors of the
credit balance.
