Banking Analysis
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Data exploration
df.info() # Analyze the datatypes of every column
df.describe() # Analyze the statistics of numerical columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45216 entries, 0 to 45215
Data columns (total 19 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   age             45216 non-null  int64
 1   job             45216 non-null  object
 2   marital         45213 non-null  object
 3   marital_status  45213 non-null  object
 4   education       45213 non-null  object
 5   default         45216 non-null  object
 6   balance         45216 non-null  int64
 7   housing         45216 non-null  object
 8   loan            45216 non-null  object
 9   contact         45216 non-null  object
 10  day             45216 non-null  int64
 11  month           45216 non-null  object
 12  day_month       45216 non-null  object
 13  duration        45216 non-null  int64
 14  campaign        45216 non-null  int64
 15  pdays           45216 non-null  int64
 16  previous        45216 non-null  int64
 17  poutcome        45216 non-null  object
 18  y               45216 non-null  object
dtypes: int64(7), object(12)
memory usage: 6.6+ MB
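The summary above shows 45213 non-null values out of 45216 rows for marital, marital_status and education, so three rows have a missing value in each of those columns. The sketch below shows one way to count missing values per column and drop the affected rows. It runs on a small made-up frame called `toy` (a hypothetical stand-in, not the real data), and dropping rows is only one possible treatment.

```python
import pandas as pd

# Hypothetical toy frame; only the column names are taken from the df.info() output.
toy = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "marital": ["single", None, "married", "divorced"],
    "education": ["primary", "tertiary", None, None],
})

# Count missing values in every column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop rows that have a missing value in any of the selected columns.
cleaned = toy.dropna(subset=["marital", "education"])
print(f"Rows before: {len(toy)}, rows after: {len(cleaned)}")
```

On the real data the same two calls would be applied to `df` instead of `toy`.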
# For the second plot, show distribution of days for those who were contacted
plt.subplot(1, 2, 2)
# Keep only previously contacted clients (pdays == -1 marks clients who were never contacted)
contacted_days = df[df['pdays'] != -1]['pdays']
plt.hist(contacted_days, bins=30, edgecolor='black')
plt.title('Distribution of Days Since Previous Contact\n(For Previously Contacted Clients)')
plt.xlabel('Number of days since previous contact')
plt.ylabel('Number of clients')
plt.tight_layout()
plt.show()
# Statistical Analysis
print("\nStatistical Analysis:")
JOB category:
               Count  Subscription Rate
job
admin.          5171              0.122
blue-collar     9731              0.073
entrepreneur    1487              0.083
housemaid       1240              0.088
management      9458              0.138
retired         2266              0.229
self-employed   1579              0.118
services        4154              0.089
student          936              0.287
technician      7597              0.111
unemployed      1303              0.155
unknown          288              0.118

MARITAL category:
               Count  Subscription Rate
marital
divorced        5207              0.120
married        27216              0.101
single         12787              0.150

EDUCATION category:
               Count  Subscription Rate
education
primary         6851              0.086
secondary      23201              0.106
tertiary       13301              0.150
unknown         1857              0.136

DEFAULT category:
               Count  Subscription Rate
default
no             44395              0.118
yes              815              0.064

HOUSING category:
               Count  Subscription Rate
housing
no             20080              0.167
yes            25130              0.077

LOAN category:
               Count  Subscription Rate
loan
no             37966              0.127
yes             7244              0.067

CONTACT category:
               Count  Subscription Rate
contact
cellular       29288              0.149
telephone       2902              0.134
unknown        13020              0.041

MONTH category:
               Count  Subscription Rate
month
apr             2932              0.197
aug             6247              0.110
dec              214              0.467
feb             2649              0.166
jan             1403              0.101
jul             6895              0.091
jun             5341              0.102
mar              477              0.520
may            13766              0.067
nov             3972              0.102
oct              735              0.439
sep              579              0.465

POUTCOME category:
               Count  Subscription Rate
poutcome
failure         4900              0.126
other           1838              0.167
success         1513              0.648
unknown        36959              0.092
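
Per-category tables like the ones above can be produced with a single groupby. The following is a minimal sketch on a made-up frame `toy`, assuming the target column y is coded as the strings "yes" and "no" (the notebook output does not show the actual coding).

```python
import pandas as pd

# Hypothetical toy data; assumes y holds the strings "yes" / "no".
toy = pd.DataFrame({
    "housing": ["no", "no", "yes", "yes", "yes", "no"],
    "y": ["yes", "no", "no", "no", "yes", "yes"],
})

# Boolean flag: True where the client subscribed.
flagged = toy.assign(subscribed=toy["y"].eq("yes"))

# Group size and share of subscribers for each category.
summary = (
    flagged.groupby("housing")["subscribed"]
    .agg(["count", "mean"])
    .rename(columns={"count": "Count", "mean": "Subscription Rate"})
    .round(3)
)
print("HOUSING category:")
print(summary)
```

Looping the same expression over a list of categorical column names would print one table per column.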
Statistical Analysis:
AGE:
p-value: 0.0748132131
Median for subscribers: 38.0
Median for non-subscribers: 39.0
BALANCE:
p-value: 0.0000000000
Median for subscribers: 733.0
Median for non-subscribers: 417.0
DURATION:
p-value: 0.0000000000
Median for subscribers: 426.0
Median for non-subscribers: 164.0
CAMPAIGN:
p-value: 0.0000000000
Median for subscribers: 2.0
Median for non-subscribers: 2.0
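
The output above does not say which test produced the p-values for the numeric columns. Because medians are reported, a rank-based test such as Mann-Whitney U is a plausible choice, and the sketch below assumes it. The durations are made-up toy values, not taken from the dataset. A rank-based test compares the whole distributions, which is why CAMPAIGN can show equal medians and still a p-value that rounds to zero.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical toy durations (seconds) for the two groups.
subscribers = np.array([426, 510, 388, 620, 305, 450])
non_subscribers = np.array([164, 120, 210, 95, 180, 240, 150, 130])

# Two-sided Mann-Whitney U test (assumed test, not confirmed by the source).
stat, p_value = mannwhitneyu(subscribers, non_subscribers, alternative="two-sided")

print("DURATION:")
print(f"p-value: {p_value:.10f}")
print(f"Median for subscribers: {np.median(subscribers):.1f}")
print(f"Median for non-subscribers: {np.median(non_subscribers):.1f}")
```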
JOB:
Chi-square p-value: 0.0000000000
Success rates by category:
job
admin. 12.20
blue-collar 7.28
entrepreneur 8.27
housemaid 8.79
management 13.77
retired 22.90
self-employed 11.84
services 8.88
student 28.74
technician 11.06
unemployed 15.50
unknown 11.81
Name: count, dtype: float64
MARITAL:
Chi-square p-value: 0.0000000000
Success rates by category:
marital
married 10.13
single 14.95
divorced 11.96
Name: count, dtype: float64
EDUCATION:
Chi-square p-value: 0.0000000000
Success rates by category:
education
secondary 10.57
tertiary 15.01
primary 8.64
unknown 13.57
Name: count, dtype: float64
DEFAULT:
Chi-square p-value: 0.0000023759
Success rates by category:
default
no 11.81
yes 6.38
Name: count, dtype: float64
HOUSING:
Chi-square p-value: 0.0000000000
Success rates by category:
housing
no 16.72
yes 7.70
Name: count, dtype: float64
LOAN:
Chi-square p-value: 0.0000000000
Success rates by category:
loan
no 12.67
yes 6.68
Name: count, dtype: float64
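
The chi-square p-values and success rates above can be reproduced from a contingency table of each categorical column against y. The sketch below uses `scipy.stats.chi2_contingency` on a made-up frame `toy`. It again assumes y is coded "yes" / "no", and the counts are illustrative only.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: 60 clients without a loan, 40 with one.
toy = pd.DataFrame({
    "loan": ["no"] * 60 + ["yes"] * 40,
    "y": ["yes"] * 20 + ["no"] * 40 + ["yes"] * 4 + ["no"] * 36,
})

# Contingency table of category vs. outcome, then the chi-square test of independence.
table = pd.crosstab(toy["loan"], toy["y"])
chi2, p_value, dof, expected = chi2_contingency(table)

# Success rate (percent of "yes") within each category.
success_rates = (
    pd.crosstab(toy["loan"], toy["y"], normalize="index")["yes"] * 100
).round(2)

print("LOAN:")
print(f"Chi-square p-value: {p_value:.10f}")
print("Success rates by category:")
print(success_rates)
```

Note that `chi2_contingency` applies Yates' continuity correction by default for 2x2 tables; pass `correction=False` to turn it off.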
x = np.arange(3)
width = 0.35
plt.tight_layout()
plt.show()