TITLE: Bank Marketing Classification

Submitted to:

Dr. Supriya Kumar De

Professor
XLRI, Jamshedpur

Report Prepared By:

Soumit Ghosh
PGCBA -2
(SID - EA19053, SMSID - 120755)
Course – Data Mining
XLRI, Jamshedpur

October 29, 2019

Contents

1. Problem Context
2. Description of fields of dataset
3. Exploratory Data Analysis
4. Modelling Technique
5. ROC Curves
6. Confusion Matrices
7. Conclusion

1. Problem Context – Targeting Bank Clients

The problem is to target clients through telemarketing phone calls in order to sell long-term
deposits of a Portuguese bank. Within a campaign, human agents call a list of clients to sell the
deposit; if a client calls the contact center for any other reason in the meantime, the client is
also asked to subscribe to the deposit. The result of each contact is therefore binary:
unsuccessful or successful.

2. Description of fields of dataset

For this statistical analysis, we analyze data from a single table. Its fields are described
below:

- Variables related to Bank Client Data

Fields      Description
Age         Client’s age.
Job         Client’s type of job.
Marital     Client’s marital status; divorced means divorced or widowed.
Education   Client’s education.
Default     Client has previously defaulted.
Housing     Client has a housing loan.
Loan        Client has a personal loan.

- Variables related to the last contact of the current campaign:

Fields        Description
Contact       Contact communication type (telephone or cellular).
Month         Last contact month of year.
day_of_week   Last contact day of week.
duration      Last contact duration in seconds. If duration is 0s, then we never
              contacted a client to sign up for a term deposit account.

- Other Attributes

Fields     Description
Campaign   number of contacts performed during this campaign and for this client
Pdays      number of days that passed by after the client was last contacted from a
           previous campaign (numeric; 999 means client was not previously contacted)
Previous   number of contacts performed before this campaign and for this client (numeric)
Poutcome   outcome of the previous marketing campaign (categorical,
           ‘failure’,‘nonexistent’,‘success’)

- Social and economic context attributes

Fields           Description
Emp.var.rate     employment variation rate - quarterly indicator (numeric)
Cons.price.idx   consumer price index - monthly indicator (numeric)
Cons.conf.idx    consumer confidence index - monthly indicator (numeric)
Euribor3m        euribor 3 month rate - daily indicator (numeric)
Nr.employed      number of employees - quarterly indicator (numeric)

- Output Variable

Fields   Description
y        has the client subscribed to a term deposit? (binary: yes, no)

3. Exploratory Data Analysis

3.1 Initial Exploration of Data

> str(bank)
'data.frame':   41188 obs. of  21 variables:
 $ age           : int  56 57 37 40 56 45 59 41 24 25 ...
 $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
 $ marital       : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
 $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
 $ default       : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
 $ housing       : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
 $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
 $ contact       : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
 $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ duration      : int  261 149 226 151 307 198 139 217 380 50 ...
 $ campaign      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
 $ previous      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ poutcome      : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ emp.var.rate  : num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ cons.price.idx: num  94 94 94 94 94 ...
 $ cons.conf.idx : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
 $ euribor3m     : num  4.86 4.86 4.86 4.86 4.86 ...
 $ nr.employed   : num  5191 5191 5191 5191 5191 ...
 $ y             : num  0 0 0 0 0 0 0 0 0 0 ...

> summary(bank)

      age                 job            marital                    education
 Min.   :17.00   admin.     :10422   divorced: 4612   university.degree  :12168
 1st Qu.:32.00   blue-collar: 9254   married :24928   high.school        : 9515
 Median :38.00   technician : 6743   single  :11568   basic.9y           : 6045
 Mean   :40.02   services   : 3969   unknown :   80   professional.course: 5243
 3rd Qu.:47.00   management : 2924                    basic.4y           : 4176
 Max.   :98.00   retired    : 1720                    basic.6y           : 2292
                 (Other)    : 6156                    (Other)            : 1749

    default          housing           loan            contact
 no     :32588   no     :18622   no     :33950   cellular :26144
 unknown: 8597   unknown:  990   unknown:  990   telephone:15044
 yes    :    3   yes    :21576   yes    : 6248

     month       day_of_week      duration          campaign
 may    :13769   fri:7827      Min.   :   0.0   Min.   : 1.000
 jul    : 7174   mon:8514      1st Qu.: 102.0   1st Qu.: 1.000
 aug    : 6178   thu:8623      Median : 180.0   Median : 2.000
 jun    : 5318   tue:8090      Mean   : 258.3   Mean   : 2.568
 nov    : 4101   wed:8134      3rd Qu.: 319.0   3rd Qu.: 3.000
 apr    : 2632                 Max.   :4918.0   Max.   :56.000
 (Other): 2016

     pdays           previous            poutcome        emp.var.rate
 Min.   :  0.0   Min.   :0.000   failure    : 4252   Min.   :-3.40000
 1st Qu.:999.0   1st Qu.:0.000   nonexistent:35563   1st Qu.:-1.80000
 Median :999.0   Median :0.000   success    : 1373   Median : 1.10000
 Mean   :962.5   Mean   :0.173                       Mean   : 0.08189
 3rd Qu.:999.0   3rd Qu.:0.000                       3rd Qu.: 1.40000
 Max.   :999.0   Max.   :7.000                       Max.   : 1.40000

 cons.price.idx  cons.conf.idx     euribor3m      nr.employed
 Min.   :92.20   Min.   :-50.8   Min.   :0.634   Min.   :4964
 1st Qu.:93.08   1st Qu.:-42.7   1st Qu.:1.344   1st Qu.:5099
 Median :93.75   Median :-41.8   Median :4.857   Median :5191
 Mean   :93.58   Mean   :-40.5   Mean   :3.621   Mean   :5167
 3rd Qu.:93.99   3rd Qu.:-36.4   3rd Qu.:4.961   3rd Qu.:5228
 Max.   :94.77   Max.   :-26.9   Max.   :5.045   Max.   :5228

       y
 Min.   :0.0000
 1st Qu.:0.0000
 Median :0.0000
 Mean   :0.1127
 3rd Qu.:0.0000
 Max.   :1.0000

3.2 Checking for Missing Values

The dataset contains no missing (NA) values. Some categorical fields contain the value
‘unknown’, but we treat these as non-influential data points and ignore them.
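
One way to run this check is sketched below in base R. The data frame toy and its values are made up for illustration; with the real data the same two lines would be applied to bank.

```r
# Toy data frame standing in for `bank`; the values are illustrative only.
toy <- data.frame(
  job     = c("admin.", "unknown", "services", "technician"),
  marital = c("married", "single", "unknown", "married"),
  age     = c(56, 37, 40, 24),
  stringsAsFactors = TRUE
)

# Number of true missing values (NA) in each column.
na_counts <- colSums(is.na(toy))

# Number of "unknown" labels in each column.
unknown_counts <- sapply(toy, function(col) sum(as.character(col) == "unknown"))

print(na_counts)
print(unknown_counts)
```

Counting the ‘unknown’ labels separately from NA makes it easier to judge, column by column, whether ignoring them is reasonable.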

3.3 Univariate Analysis

Plot Distribution of Age

Here we can conclude that the bank mostly contacts people between the ages of 20 and 60; the full
range in the data is 17 to 98.

Plot Distribution of Jobs

Here, we can conclude that the bank contacts people with admin. and blue-collar jobs more often
than the rest.

Distribution of Marital Status

Married clients are the most likely to be contacted by the bank.
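
As a small check of this statement, the shares can be computed from the marital counts printed by summary(bank) in Section 3.1. This is only a sketch; the vector name marital_counts is introduced here for illustration.

```r
# Counts of marital status copied from the summary(bank) output in Section 3.1.
marital_counts <- c(divorced = 4612, married = 24928, single = 11568, unknown = 80)

# Share of each category among all contacted clients.
marital_share <- round(marital_counts / sum(marital_counts), 3)
print(marital_share)
```

Passing the same vector to barplot() would draw the distribution as a bar chart.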

Distribution of Education

3.4 Bivariate Analysis

Bivariate analysis of Jobs with respect to outcome yes.

Total Observations in Table: 41188

              | bank$y
     bank$job |         0 |         1 | Row Total |
--------------|-----------|-----------|-----------|
       admin. |      9070 |      1352 |     10422 |
              |     3.423 |    26.961 |           |
              |     0.870 |     0.130 |     0.253 |
              |     0.248 |     0.291 |           |
              |     0.220 |     0.033 |           |
--------------|-----------|-----------|-----------|
  blue-collar |      8616 |       638 |      9254 |
              |    19.926 |   156.951 |           |
              |     0.931 |     0.069 |     0.225 |
              |     0.236 |     0.138 |           |
              |     0.209 |     0.015 |           |
--------------|-----------|-----------|-----------|
 entrepreneur |      1332 |       124 |      1456 |
              |     1.240 |     9.767 |           |
              |     0.915 |     0.085 |     0.035 |
              |     0.036 |     0.027 |           |
              |     0.032 |     0.003 |           |
--------------|-----------|-----------|-----------|
    housemaid |       954 |       106 |      1060 |
              |     0.191 |     1.507 |           |
              |     0.900 |     0.100 |     0.026 |
              |     0.026 |     0.023 |           |
              |     0.023 |     0.003 |           |
--------------|-----------|-----------|-----------|
   management |      2596 |       328 |      2924 |
              |     0.001 |     0.006 |           |
              |     0.888 |     0.112 |     0.071 |
              |     0.071 |     0.071 |           |
              |     0.063 |     0.008 |           |
--------------|-----------|-----------|-----------|
      retired |      1286 |       434 |      1720 |
              |    37.814 |   297.849 |           |
              |     0.748 |     0.252 |     0.042 |
              |     0.035 |     0.094 |           |
              |     0.031 |     0.011 |           |
--------------|-----------|-----------|-----------|
self-employed |      1272 |       149 |      1421 |
              |     0.097 |     0.767 |           |
              |     0.895 |     0.105 |     0.035 |
              |     0.035 |     0.032 |           |
              |     0.031 |     0.004 |           |
--------------|-----------|-----------|-----------|
     services |      3646 |       323 |      3969 |
              |     4.375 |    34.458 |           |
              |     0.919 |     0.081 |     0.096 |
              |     0.100 |     0.070 |           |
              |     0.089 |     0.008 |           |
--------------|-----------|-----------|-----------|
      student |       600 |       275 |       875 |
              |    40.090 |   315.775 |           |
              |     0.686 |     0.314 |     0.021 |
              |     0.016 |     0.059 |           |
              |     0.015 |     0.007 |           |
--------------|-----------|-----------|-----------|
   technician |      6013 |       730 |      6743 |
              |     0.147 |     1.156 |           |
              |     0.892 |     0.108 |     0.164 |
              |     0.165 |     0.157 |           |
              |     0.146 |     0.018 |           |
--------------|-----------|-----------|-----------|
   unemployed |       870 |       144 |      1014 |
              |     0.985 |     7.758 |           |
              |     0.858 |     0.142 |     0.025 |
              |     0.024 |     0.031 |           |
              |     0.021 |     0.003 |           |
--------------|-----------|-----------|-----------|
      unknown |       293 |        37 |       330 |
              |     0.000 |     0.001 |           |
              |     0.888 |     0.112 |     0.008 |
              |     0.008 |     0.008 |           |
              |     0.007 |     0.001 |           |
--------------|-----------|-----------|-----------|
 Column Total |     36548 |      4640 |     41188 |
              |     0.887 |     0.113 |           |
--------------|-----------|-----------|-----------|
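
Each cell of the table above holds five numbers. Reading them as count, chi-square contribution, row proportion, column proportion and table proportion reproduces the printed values for the admin. row, as the sketch below shows. This reading is an inference from the numbers (the layout resembles the output of CrossTable in the gmodels package, which is an assumption), and the variable names are introduced here for illustration.

```r
# Observed counts for the "admin." row and the column totals, copied from the table.
n_admin   <- c(`0` = 9070, `1` = 1352)    # admin. clients by outcome
col_total <- c(`0` = 36548, `1` = 4640)   # all clients by outcome
n_total   <- sum(col_total)               # 41188

# Expected count under independence: row total * column total / table total.
expected <- sum(n_admin) * col_total / n_total

# The five numbers in each cell, in the order they are printed.
chisq_contribution <- (n_admin - expected)^2 / expected
row_prop   <- n_admin / sum(n_admin)
col_prop   <- n_admin / col_total
table_prop <- n_admin / n_total

print(round(rbind(n_admin, chisq_contribution, row_prop, col_prop, table_prop), 3))
```

The row proportion under outcome 1 is the subscription rate for that job, so it is the number to compare across rows: 0.130 for admin. against 0.113 overall.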

Bivariate analysis of Marital status and outcome variable yes:

             | bank$y
bank$marital |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
    divorced |      4136 |       476 |      4612 |
             |     0.464 |     3.652 |           |
             |     0.897 |     0.103 |     0.112 |
             |     0.113 |     0.103 |           |
             |     0.100 |     0.012 |           |
-------------|-----------|-----------|-----------|
     married |     22396 |      2532 |     24928 |
             |     3.450 |    27.174 |           |
             |     0.898 |     0.102 |     0.605 |
             |     0.613 |     0.546 |           |
             |     0.544 |     0.061 |           |
-------------|-----------|-----------|-----------|
      single |      9948 |      1620 |     11568 |
             |     9.778 |    77.021 |           |
             |     0.860 |     0.140 |     0.281 |
             |     0.272 |     0.349 |           |
             |     0.242 |     0.039 |           |
-------------|-----------|-----------|-----------|
     unknown |        68 |        12 |        80 |
             |     0.126 |     0.990 |           |
             |     0.850 |     0.150 |     0.002 |
             |     0.002 |     0.003 |           |
             |     0.002 |     0.000 |           |
-------------|-----------|-----------|-----------|
Column Total |     36548 |      4640 |     41188 |
             |     0.887 |     0.113 |           |
-------------|-----------|-----------|-----------|

Bivariate analysis of Education with respect to outcome variable

                    | bank$y
     bank$education |         0 |         1 | Row Total |
--------------------|-----------|-----------|-----------|
           basic.4y |      3748 |       428 |      4176 |
                    |     0.486 |     3.829 |           |
                    |     0.898 |     0.102 |     0.101 |
                    |     0.103 |     0.092 |           |
                    |     0.091 |     0.010 |           |
--------------------|-----------|-----------|-----------|
           basic.6y |      2104 |       188 |      2292 |
                    |     2.423 |    19.088 |           |
                    |     0.918 |     0.082 |     0.056 |
                    |     0.058 |     0.041 |           |
                    |     0.051 |     0.005 |           |
--------------------|-----------|-----------|-----------|
           basic.9y |      5572 |       473 |      6045 |
                    |     8.065 |    63.527 |           |
                    |     0.922 |     0.078 |     0.147 |
                    |     0.152 |     0.102 |           |
                    |     0.135 |     0.011 |           |
--------------------|-----------|-----------|-----------|
        high.school |      8484 |      1031 |      9515 |
                    |     0.198 |     1.561 |           |
                    |     0.892 |     0.108 |     0.231 |
                    |     0.232 |     0.222 |           |
                    |     0.206 |     0.025 |           |
--------------------|-----------|-----------|-----------|
         illiterate |        14 |         4 |        18 |
                    |     0.244 |     1.918 |           |
                    |     0.778 |     0.222 |     0.000 |
                    |     0.000 |     0.001 |           |
                    |     0.000 |     0.000 |           |
--------------------|-----------|-----------|-----------|
professional.course |      4648 |       595 |      5243 |
                    |     0.004 |     0.032 |           |
                    |     0.887 |     0.113 |     0.127 |
                    |     0.127 |     0.128 |           |
                    |     0.113 |     0.014 |           |
--------------------|-----------|-----------|-----------|
  university.degree |     10498 |      1670 |     12168 |
                    |     8.292 |    65.317 |           |
                    |     0.863 |     0.137 |     0.295 |
                    |     0.287 |     0.360 |           |
                    |     0.255 |     0.041 |           |
--------------------|-----------|-----------|-----------|
            unknown |      1480 |       251 |      1731 |
                    |     2.041 |    16.079 |           |
                    |     0.855 |     0.145 |     0.042 |
                    |     0.040 |     0.054 |           |
                    |     0.036 |     0.006 |           |
--------------------|-----------|-----------|-----------|
       Column Total |     36548 |      4640 |     41188 |
                    |     0.887 |     0.113 |           |
--------------------|-----------|-----------|-----------|

Default with respect to outcome variable

             | bank$y
bank$default |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
          no |     28391 |      4197 |     32588 |
             |     9.562 |    75.315 |           |
             |     0.871 |     0.129 |     0.791 |
             |     0.777 |     0.905 |           |
             |     0.689 |     0.102 |           |
-------------|-----------|-----------|-----------|
     unknown |      8154 |       443 |      8597 |
             |    36.198 |   285.122 |           |
             |     0.948 |     0.052 |     0.209 |
             |     0.223 |     0.095 |           |
             |     0.198 |     0.011 |           |
-------------|-----------|-----------|-----------|
         yes |         3 |         0 |         3 |
             |     0.043 |     0.338 |           |
             |     1.000 |     0.000 |     0.000 |
             |     0.000 |     0.000 |           |
             |     0.000 |     0.000 |           |
-------------|-----------|-----------|-----------|
Column Total |     36548 |      4640 |     41188 |
             |     0.887 |     0.113 |           |
-------------|-----------|-----------|-----------|

In default, the value ‘yes’ occurs in only 3 observations, all with outcome 0. The variable
therefore carries almost no information and should be ignored while modelling.

3.5 Multicollinearity between the socio-economic attributes

The socio-economic attributes look similar and are likely to be highly correlated. To test for
multicollinearity, we used the variance inflation factor (VIF) method. We took the nr.employed
variable and compared it with the other four:

> vif(mod2)
     euribor3m  cons.conf.idx cons.price.idx   emp.var.rate
     25.992676       1.215625       3.123787      32.470054

The VIFs of three variables (euribor3m, cons.price.idx and emp.var.rate) are greater than 3.
Hence, they were ignored in our final model.
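
For reference, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. The sketch below computes it by hand in base R on synthetic data; x1, x2 and x3 are made up and are not the bank variables. The vif() call above is presumably the one from the car package, which is an assumption.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic predictors (illustrative only, not the bank data):
# x2 is almost a copy of x1, while x3 is unrelated to both.
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF of a predictor = 1 / (1 - R^2), with R^2 taken from the regression
# of that predictor on all the remaining predictors.
vif_by_hand <- sapply(names(predictors), function(v) {
  fit <- lm(reformulate(setdiff(names(predictors), v), response = v),
            data = predictors)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_by_hand, 2))
```

In this toy case x1 and x2 get very large VIFs because each is nearly determined by the other, while x3 stays close to 1. The same pattern appears above for euribor3m and emp.var.rate against cons.conf.idx.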

3.6 Class Imbalance

> table(bank$y)

    0     1
36548  4640

Here, we can see that the outcome variable is imbalanced: only about 11% of clients (4640 of
41188) subscribed. This class imbalance can make performance metrics misleading; a model can
show high accuracy while still performing poorly on the minority class. To deal with the class
imbalance, we used the technique of over-sampling.
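
The sketch below illustrates both points with the class counts above: the accuracy obtained by always predicting class 0, and simple random over-sampling of the minority class. The report does not say which over-sampling method was used, so drawing minority rows with replacement is an assumption made for illustration.

```r
set.seed(42)  # reproducible resampling; the seed value is arbitrary

# Outcome vector with the class counts reported above (0 = no, 1 = yes).
y <- rep(c(0, 1), times = c(36548, 4640))

# Accuracy of a "model" that always predicts the majority class 0.
baseline_accuracy <- mean(y == 0)

# Random over-sampling: draw minority rows with replacement until
# both classes have the same number of rows.
idx_major <- which(y == 0)
idx_minor <- which(y == 1)
idx_extra <- sample(idx_minor,
                    size = length(idx_major) - length(idx_minor),
                    replace = TRUE)
balanced_idx <- c(idx_major, idx_minor, idx_extra)

print(round(baseline_accuracy, 3))
print(table(y[balanced_idx]))
```

A baseline accuracy of about 0.887 without any model shows why accuracy alone is not a reliable measure here. Over-sampling is usually applied to the training split only, so that the test set keeps the original class proportions.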

4. Modelling Technique

Data Splitting:

We used a 75%/25% train-test split of the data.
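
A random split of this kind can be sketched in base R as follows. The seed and the index names are arbitrary choices for illustration, and the report does not state whether the split was stratified.

```r
set.seed(123)  # reproducible split; the seed value is arbitrary

n <- 41188                                            # rows in the dataset
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# With the data frame `bank`, the two parts would be
# bank[train_idx, ] and bank[test_idx, ].
print(c(train = length(train_idx), test = length(test_idx)))
```

The resulting test size of 10297 rows equals the totals of the logistic regression and decision tree confusion matrices in Section 6.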

Algorithms Used:

- Logistic Regression
- Decision Tree
- Naïve Bayes Algorithm

Parameter Settings for R algorithms:

We tried out various parameter settings to find the best ones, using a greedy approach, i.e.
changing one parameter at a time, for each algorithm on each dataset separately. We also used
the chi-squared test between each categorical variable and the outcome variable to find the
significant variables.
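
As an illustration of such a test, the sketch below runs chisq.test on the marital status counts from Section 3.4. Whether the tests were run in exactly this way is an assumption; the matrix name is introduced here for illustration.

```r
# Marital status by outcome, copied from the cross-tabulation in Section 3.4.
marital_by_y <- matrix(
  c( 4136,  476,
    22396, 2532,
     9948, 1620,
       68,   12),
  ncol = 2, byrow = TRUE,
  dimnames = list(marital = c("divorced", "married", "single", "unknown"),
                  y = c("0", "1"))
)

# Pearson chi-squared test of independence between marital status and y.
marital_test <- chisq.test(marital_by_y)

print(marital_test$statistic)  # equals the sum of the per-cell contributions
print(marital_test$parameter)  # degrees of freedom: (4 - 1) * (2 - 1)
print(marital_test$p.value)
```

The statistic is the sum of the chi-square contributions printed in the cross-tabulation, and the very small p-value indicates that marital status and the outcome are not independent.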

The following variables were removed from or changed in the model:

1. Removed four variables: default (lack of variability), housing (lack of information), loan
(lack of information), and emp.var.rate (lack of significance).
2. Re-framed one variable: campaign, because it had outliers.
3. Removed due to correlation issues: euribor3m, cons.price.idx, emp.var.rate.

Below are the best parameter settings for each of the algorithms on each dataset:

a) Logistic Regression

In Logistic Regression, we tuned the model according to the area under the ROC curve, the
accuracy and the AIC score.

The following is the final model specification we used to maximize the accuracy:

glm(formula = y ~ job + contact + month + day_of_week + poutcome + nr.employed +
      cons.conf.idx, family = binomial, data = train)

b) Decision Tree

In Decision Tree, we tuned the model according to the area under the ROC curve and the
accuracy. We used two R packages, rpart and C50, to implement the decision tree.

We decided to keep all the variables to maximize the accuracy:

library(rpart)  # provides rpart()
library(C50)    # provides C5.0()

# ndov: the prepared data frame used for modelling
dt_mod1 = rpart(y ~ ., data = ndov, method = "class")
dt_mod3 = C5.0(as.factor(y) ~ ., data = ndov)

c) Naïve Bayes

In Naïve Bayes, we tuned the model according to the area under the ROC curve and the accuracy.

We decided to keep all the variables to maximize the accuracy:

dt_mod3 = C5.0(as.factor(y) ~ .,data=ndov)


5. ROC Curves:
The ROC curves for the different algorithms are:

Logistic Regression:

Decision Tree:

Naïve Bayes:

6. Confusion Matrices

Logistic Regression:

                 Predicted 0 (No)   Predicted 1 (Yes)
Actual 0 (No)          7791               1346
Actual 1 (Yes)          413                747

Decision Tree:

                 Predicted 0 (No)   Predicted 1 (Yes)
Actual 0 (No)          7310               1827
Actual 1 (Yes)           56               1104

Naïve Bayes:

                 Predicted 0 (No)   Predicted 1 (Yes)
Actual 0 (No)          7563               1572
Actual 1 (Yes)          408                885

7. Conclusion:
Performance Parameters Summary:

Models                  Accuracy    Precision (0 / 1)   Recall (0 / 1)   F1 score (0 / 1)   AUC
Logistic Regression     0.8291735   0.94 / 0.35         0.85 / 0.64      0.89 / 0.45        0.79
Decision Tree           0.817       0.99 / 0.37         0.80 / 0.95      0.88 / 0.53        0.91
Naïve Bayes Algorithm   0.82        0.96 / 0.36         0.82 / 0.77      0.89 / 0.49        0.87
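
The logistic regression figures can be recomputed from its confusion matrix in Section 6, as sketched below. The first row and column of that matrix are read here as class 0 (no subscription); this reading is an assumption, chosen because it reproduces the reported figures.

```r
# Logistic regression confusion matrix from Section 6.
# Rows are actual classes, columns are predicted classes. Reading the first
# row and column as class 0 is an assumption.
cm <- matrix(c(7791, 1346,
                413,  747),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy  <- sum(diag(cm)) / sum(cm)
precision <- diag(cm) / colSums(cm)  # per class: correct / predicted as that class
recall    <- diag(cm) / rowSums(cm)  # per class: correct / actually that class
f1        <- 2 * precision * recall / (precision + recall)

print(accuracy)
print(round(rbind(precision, recall, f1), 4))
```

Cut to two decimals without rounding, these values match the Logistic Regression row of the summary table.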

The following are the analytical insights we get from the models:

- The bank should contact customers who are highly educated.
- The bank should contact customers who are married.
- The bank should contact customers during the months of April, September and December.
- The bank should always contact previous customers.

All our models have high predictive power for class 0, as indicated by the F1 scores (0.88 to
0.89), and this is almost the same for all algorithms. The F1 score for class 1 is what separates
the models: the Decision Tree has the highest (0.53, against 0.45 for Logistic Regression and
0.49 for Naïve Bayes), which makes it the better model. The accuracy is similar for all models,
but the AUC scores differ considerably, and here also the Decision Tree has the highest score
(0.91).
Therefore, we can conclude that the Decision Tree algorithm is the best for this dataset.
