Business Report - ML
Table of Contents
Business Context - Problem Statement
Key Features for Analysis
Types of Charts
Business Insights
Thera Bank recently saw a steep decline in the number of users of its credit card.
Credit cards are a good source of income for banks because of the different kinds of
fees they charge, such as annual fees, balance transfer fees, cash advance fees, late
payment fees, foreign transaction fees, and others. Some fees are charged to every user
irrespective of usage, while others are charged under specified circumstances.
Customers leaving the credit card service means lost revenue for the bank, so the bank
wants to analyse customer data, identify the customers who are likely to leave, and
understand why, so that it can improve in those areas.
As a data scientist at Thera Bank, you need to build a classification model that will
help the bank improve its services so that customers do not renounce their credit cards.
2. Banking Relationship:
Months_on_book: Duration of the customer’s relationship with the bank.
Total_Relationship_Count: Number of products (services) the customer has with the
bank.
3. Credit Card Activity:
Months_Inactive_12_mon: Number of months the customer was inactive in the past
12 months.
Contacts_Count_12_mon: Number of contacts made with the customer in the past 12
months.
4. Credit and Usage Information:
Credit_Limit: Total credit limit on the customer’s credit card.
Total_Revolving_Bal: The balance that the customer did not pay in full, carried over
from month to month.
Avg_Open_To_Buy: Amount left on the credit card for spending (on average over the
last 12 months).
Avg_Utilization_Ratio: The percentage of the credit limit used by the customer
(calculated using the relationship between credit limit and revolving balance).
Total_Amt_Chng_Q4_Q1: Change in transaction amount from Q4 to Q1.
Total_Trans_Amt: Total transaction amount in the last 12 months.
Total_Trans_Ct: Total number of transactions in the last 12 months.
Total_Ct_Chng_Q4_Q1: Change in transaction count from Q4 to Q1.
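The relationship between credit limit, revolving balance, and utilization mentioned above can be illustrated with a small sketch. The report does not state the exact formulas, so the definitions below (utilization as revolving balance divided by credit limit, open-to-buy as credit limit minus revolving balance) are assumptions, and the customer values are hypothetical.

```python
def utilization_ratio(revolving_balance: float, credit_limit: float) -> float:
    """Share of the credit limit in use (assumed definition, not from the report)."""
    if credit_limit <= 0:
        raise ValueError("credit_limit must be positive")
    return revolving_balance / credit_limit


def open_to_buy(revolving_balance: float, credit_limit: float) -> float:
    """Credit still available for spending (assumed definition, not from the report)."""
    return credit_limit - revolving_balance


# Hypothetical customer: a 4,500 limit with 900 carried over from month to month.
ratio = utilization_ratio(900.0, 4500.0)
available = open_to_buy(900.0, 4500.0)
print(f"utilization = {ratio:.2f}, open to buy = {available:.0f}")
```

Because Avg_Open_To_Buy is described as a 12-month average, the identity may hold only approximately in the actual data.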
1. Data Preprocessing:
Handle missing values in categorical features like Education_Level or
Income_Category.
Convert categorical features (e.g., Gender, Education_Level, Marital_Status) to
numerical values for modeling.
2. Exploratory Data Analysis (EDA):
Explore correlations between key features such as credit utilization, transaction
patterns, and customer attrition.
Analyze the distribution of customer demographics (age, income, etc.) and credit card
usage.
3. Model Selection:
Use classification algorithms (e.g., Logistic Regression, Random Forest, Gradient
Boosting) to predict whether a customer will leave.
Evaluate model performance using metrics such as accuracy, precision, recall, and F1-
score to ensure proper detection of attrited customers.
4. Feature Importance:
Identify the most significant predictors of customer attrition, such as inactivity, high
revolving balances, or a decrease in transaction amounts.
5. Actionable Insights:
Based on the model results, the bank can take proactive measures, such as offering
better credit terms, sending reminders to inactive customers, or creating personalized
offers to retain valuable customers.
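The evaluation metrics named under Model Selection can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the function name is hypothetical, and the baseline is an assumed "predict that nobody leaves" model scored against the class counts reported in this document (8,500 existing and 1,627 attrited customers). It shows why accuracy alone can mislead on this data.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1, with attrited customers as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical baseline: every customer is predicted to stay, so all 1,627
# attrited customers are missed (false negatives) and all 8,500 existing
# customers are classified correctly (true negatives).
baseline = classification_metrics(tp=0, fp=0, fn=1627, tn=8500)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

The baseline reaches roughly 84% accuracy while detecting no attrited customers at all, which is why recall and F1-score on the attrited class are the more informative numbers for this problem.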
Type of Data
Data Dictionary
Data Considerations:
Missing values in the Education_Level and Marital_Status columns need handling.
Attrition_Flag is the target column, which will be used to predict whether a
customer will leave.
The dataset has missing values in the following columns:
Education_Level: 1,519 missing values.
Marital_Status: 749 missing values.
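To put these counts in proportion, the sketch below relates them to the 10,127 customers in the dataset (the total shown in the cross-tabulations) and then shows one possible way of handling the gaps. Mode imputation is only an assumed choice, since the report does not say which method was used, and the sample values are hypothetical.

```python
from collections import Counter

TOTAL_ROWS = 10127  # total customers, from the "All" cell of the cross-tabulations
missing_counts = {"Education_Level": 1519, "Marital_Status": 749}

# Percentage of rows with a missing value in each column.
missing_pct = {
    column: round(100 * count / TOTAL_ROWS, 1)
    for column, count in missing_counts.items()
}
print(missing_pct)


def impute_mode(values: list) -> list:
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical sample of Education_Level values with gaps.
sample = ["Graduate", None, "High School", "Graduate", None]
imputed = impute_mode(sample)
print(imputed)
```

With about 15% of Education_Level missing, treating "missing" as its own category may be a safer alternative to mode imputation, since filling that many rows with a single value can distort the distribution.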
Data Analysis
The Customer Age column has a mean of approximately 46 and a median of 46. This
indicates that about half of the customers are aged 46 or younger and that the age
distribution is roughly symmetric.
The Dependent Count column has a mean and median of ~2.
The Months on Book column has a mean and median of 36 months. The minimum value is 13
months, showing that the dataset only covers customers who have been with the bank for
at least one whole year.
Total Relationship Count has a mean and median of ~4.
Credit Limit has a wide range of 1.4K to 34.5K, with a median of 4.5K, well below
the mean of 8.6K, which points to a right-skewed distribution.
Total Transaction Count has a mean of ~65 and a median of 67.
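Comparing mean and median, as done above, is a quick check on the shape of a distribution. The sketch below shows that heuristic; it is not a formal skewness test, the 5% tolerance is an arbitrary assumption, and both samples are hypothetical values chosen only to resemble the reported statistics.

```python
import statistics


def skew_direction(values: list, tolerance: float = 0.05) -> str:
    """Classify a sample by the gap between its mean and median, relative to the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    scale = abs(median) or 1.0  # avoid dividing by zero when the median is 0
    relative_gap = (mean - median) / scale
    if relative_gap > tolerance:
        return "right-skewed"
    if relative_gap < -tolerance:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical credit limits: a few very large limits pull the mean above the median.
credit_limits = [1400, 2500, 3000, 4500, 6000, 9000, 34500]
# Hypothetical ages: mean and median coincide.
ages = [26, 40, 46, 52, 66]

credit_shape = skew_direction(credit_limits)
age_shape = skew_direction(ages)
print("Credit_Limit:", credit_shape)
print("Customer_Age:", age_shape)
```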
[Figures not reproduced: distribution plots for Months_on_book, Credit_Limit,
Total_Revolving_Bal, Avg_Open_To_Buy, Total_Trans_Ct, Total_Amt_Chng_Q4_Q1,
Total_Trans_Amt, Total_Ct_Chng_Q4_Q1, Avg_Utilization_Ratio,
Total_Relationship_Count, Months_Inactive_12_mon, Contacts_Count_12_mon, Gender,
Marital_Status, Income_Category, Card_Category, and Attrition_Flag]
Bivariate Distributions
Let's see which attributes are strongly correlated with each other
Correlation Check
Attrition_Flag     0     1    All
Gender
All             8500  1627  10127
F               4428   930   5358
M               4072   697   4769
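The counts in this table can be turned into attrition rates per gender, and a chi-square test of independence gives a rough sense of whether the difference is larger than chance would explain. The sketch below is not part of the original analysis: the choice of test is an assumption, and it uses the Pearson statistic without continuity correction.

```python
import math

# Counts from the Gender cross-tabulation: (existing, attrited).
counts = {"F": (4428, 930), "M": (4072, 697)}

# Attrition rate per gender: attrited / (existing + attrited).
rates = {
    gender: attrited / (existing + attrited)
    for gender, (existing, attrited) in counts.items()
}


def chi_square_2x2(table: list) -> tuple:
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_square_2x2([counts["F"], counts["M"]])
for gender, rate in rates.items():
    print(f"{gender}: {rate:.1%} attrited")
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

Female customers attrite at about 17.4% versus about 14.6% for male customers. A small p-value suggests the gap is unlikely to be chance alone, but it says nothing about why the rates differ.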
Attrition_Flag vs Education_Level
Education_Level  College  Doctorate  Graduate  High School  Post-Graduate  ...
Attrition_Flag
All                 1013        451      3128         2013            516  ...
0                    859        356      2641         1707            424  ...
1                    154         95       487          306             92  ...
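To compare education levels of different sizes, the counts can be converted to attrition rates and ranked. The sketch below covers only the columns visible in this table (the remaining columns are cut off in the source), so the ranking is partial.

```python
# Counts from the Education_Level cross-tabulation: (existing, attrited).
education_counts = {
    "College": (859, 154),
    "Doctorate": (356, 95),
    "Graduate": (2641, 487),
    "High School": (1707, 306),
    "Post-Graduate": (424, 92),
}

# Attrition rate in percent for each level, highest first.
ranked = sorted(
    (
        (level, 100 * attrited / (existing + attrited))
        for level, (existing, attrited) in education_counts.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)

for level, rate in ranked:
    print(f"{level:<15}{rate:5.1f}%")
```

Among the visible columns, Doctorate holders show the highest attrition rate (about 21%), followed by Post-Graduate (about 18%). The other three levels sit close together at 15-16%.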
Attrition_Flag vs Income_Category
Income_Category  $120K +  $40K - $60K  $60K - $80K  $80K - $120K  ...
Attrition_Flag
All                  727         1790         1402          1535  ...
0                    601         1519         1213          1293  ...
1                    126          271          189           242  ...
Attrition_Flag vs Contacts_Count_12_mon
Let's see how the number of months a customer was inactive in the last 12 months
(Months_Inactive_12_mon) varies with the customer's account status (Attrition_Flag)
Attrition_Flag vs Credit_Limit
Attrition_Flag vs Customer_Age
Alongside its existing card types, the bank can introduce credit cards aimed at
online shopping (with percentage cashback offers) or online food ordering, so that the
cards are used more frequently. With our model, we can predict which customers are
likely to attrite and, ranking them by predicted probability, reach out to at least
the top 20-30% of customers to discuss credit card offers, credit limit increases,
etc., in an effort to retain them.
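Selecting the top share of customers by predicted probability can be sketched as follows. The function name, customer IDs, and probabilities are all hypothetical. In practice the scores would come from the fitted classification model.

```python
def top_percent(scores: dict, percent: int) -> list:
    """Return the IDs with the highest scores, covering `percent` of all customers."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Ceiling division in integer arithmetic, so at least one customer is selected.
    k = -(-len(scores) * percent // 100)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical predicted attrition probabilities per customer ID.
scores = {
    "C001": 0.91, "C002": 0.12, "C003": 0.55, "C004": 0.78, "C005": 0.05,
    "C006": 0.33, "C007": 0.64, "C008": 0.20, "C009": 0.47, "C010": 0.08,
}

outreach = top_percent(scores, 30)
print(outreach)
```

The cut-off (20% or 30%) is a business decision that trades outreach cost against the number of at-risk customers reached.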