Milestone 1

The document outlines a project proposal for a bank aiming to address customer attrition through data mining and machine learning techniques. It describes the business problem, the dataset containing over 10,000 customer records, and the data mining agenda focusing on demographic analysis, variable significance, inactivity correlation, and transaction behaviors. The proposal emphasizes the importance of customer retention in a competitive environment and suggests actionable insights to improve customer satisfaction and minimize losses.

COVER PAGE

GBUS 738 - Course Project Assignment


Milestone 1. Project Proposal

GROUP 3
Elizabeth Henderson
Colleen Leavell
Lana Hashem
Logan Slominski
Vi Tran
1) Business Problem Description
A bank manager faces a growing customer attrition problem: the bank is losing more and more
customers. The manager wants to leverage existing customer data to predict churn and offer better
services to retain customers. Churn prediction not only provides insight into customer behavior but
also allows the bank to improve customer satisfaction, maintain revenue, and minimize losses.
Customer retention is crucial for service companies because of the competitive business
environment. Yet there is a significant gap in addressing retention, with 85% of customers
reporting that companies could do more to keep them (Ascarza, 2017).
This problem is well suited to data mining because it is data-rich and action-oriented, which
makes it a natural application for machine learning techniques. For example, American Express
(Amex) has gained traction by applying machine learning to forecast potential churn. It analyzes
115 customer behavior variables, and Amex believes that it can identify 24% of the Australian
accounts that will close within the next four months (Harvard Business School, n.d.). Amex also
launched Amex Advance in 2017, a predictive analytics platform designed to provide customized
services that help merchants understand their customers' behaviors (Ascarza, 2018).

2) Dataset Description
Bank Churners is a dataset representing over 10,000 customers of one credit card company,
with 21 data points per customer. The data points include a unique identifying number for each
customer, demographic variables, and customer variables that together create a picture of each
customer and of the general makeup of the population. The dataset can be viewed and downloaded
from kaggle.com. One data point that the manager can assess and act on immediately to retain
customers is the contact count over the past 12 months. Data points that help characterize the
customers who trend toward leaving include average utilization ratio, total transaction count,
months inactive over the past 12 months, and total count change from Q4 to Q1. The demographic
variables are also worth assessing because they may account for outlier cases of account closure.
There are a total of 21 variables: a unique identifier for the customer holding the account; an
attrition flag labeling existing and attrited customers; customer age; gender; the number of
dependents a customer claims; education level, ranging from high school to doctorate plus an
unknown category; marital status; income category, grouped into uneven ranges of less than
$40,000, $40,000 to $60,000, $60,000 to $80,000, $80,000 to $120,000, $120,000+, and unknown;
card category of blue, silver, gold, or platinum; months on book, the length of the relationship
with the bank; total relationship count, the total number of products held by the customer; the
number of months inactive in the past 12 months; contacts count, the number of contacts in the
past 12 months; credit limit; total revolving balance on the credit card; average open-to-buy
credit line over the last 12 months; total amount change Q4 to Q1, the change in transaction
amount at the end of the year compared with the beginning; total transaction amount; total
transaction count; total count change Q4 to Q1, the change in transaction count at the end of the
year compared with the beginning; and average utilization ratio, the average card utilization
ratio. Below are the descriptive statistics for the numerical (Table 1) and categorical (Table 2)
variables.

Table 1
Numerical Variables       Min    1st Quartile  Median  Mean    3rd Quartile  Max
Customer_Age              26     41            46      46.33   52            73
Dependent_count           0      1             2       2.346   3             5
Months_on_book            13     31            36      35.93   40            56
Total_Relationship_Count  1      3             4       3.813   5             6
Months_Inactive_12_mon    0      2             2       2.341   3             6
Contacts_Count_12_mon     0      2             2       2.455   3             6
Credit_Limit              1438   2555          4549    8632    11068         34516
Total_Revolving_Bal       0      359           1276    1163    1784          2517
Avg_Open_To_Buy           3      1324          3474    7469    9859          34516
Total_Amt_Chng_Q4_Q1      0      0.631         0.736   0.7599  0.859         3.397
Total_Trans_Amt           510    2156          3899    4404    4741          18484
Total_Trans_Ct            10     45            67      64.86   81            139
Total_Ct_Chng_Q4_Q1       0      0.582         0.702   0.7122  0.818         3.714
Avg_Utilization_Ratio     0      0.023         0.176   0.2749  0.503         0.999
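
A summary row like those in Table 1 could be computed with a sketch along the following lines. It uses only the Python standard library; the function name and the sample ages are hypothetical, and the "inclusive" quartile method is an assumption, since the proposal does not say which quartile convention produced the table.

```python
import statistics


def six_number_summary(values):
    """Return min, 1st quartile, median, mean, 3rd quartile, and max."""
    # method="inclusive" interpolates between sorted data points;
    # other tools may use a slightly different quartile convention.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical ages for illustration, not rows from the dataset.
ages = [26, 41, 46, 52, 73]
summary = six_number_summary(ages)
print(summary)
```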

Table 2
Categorical Variables  Unique Categories  Most Common Value  Most Common Frequency  Percentage (%)
Attrition_Flag         2                  Existing Customer  8500                   83.95
Gender                 2                  F                  5358                   52.92
Education_Level        7                  Graduate           3128                   30.89
Marital_Status         4                  Married            4687                   46.25
Income_Category        6                  Less than $40K     3561                   34.94
Card_Category          4                  Blue               9436                   90.52
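
The columns of Table 2 (unique categories, most common value, its frequency, and its percentage) can be derived from a single column of labels. The sketch below is a minimal illustration using the standard library; the function name and the ten sample labels are hypothetical and do not come from the dataset.

```python
from collections import Counter


def categorical_summary(values):
    """Summarize one categorical column: category count and the modal value."""
    counts = Counter(values)
    top_value, top_freq = counts.most_common(1)[0]
    return {
        "unique": len(counts),
        "top": top_value,
        "freq": top_freq,
        # Percentage is the modal frequency over the number of rows.
        "pct": round(100 * top_freq / len(values), 2),
    }


# Hypothetical card categories for ten customers.
card_category = ["Blue"] * 7 + ["Silver"] * 2 + ["Gold"]
result = categorical_summary(card_category)
print(result)
```

Computing the percentage as frequency divided by the total row count makes it straightforward to cross-check each row of the table against the dataset size.
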
3) Data Mining Agenda
Question 1: Which demographic groups are most likely to leave?

Examining the variables Customer Age, Gender, Dependents, Income Category, Education Level,
and Marital Status provides insight into the specific demographics prone to attrition. We will
study the means and percentages of attrited customers and identify the groups they fall into, so
that the bank can proactively convince those groups to stay through targeted marketing.
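
As a rough sketch of the Question 1 analysis, the share of attrited customers can be computed per demographic group and ranked. The column names follow Table 2; the "Attrited Customer" label, the function name, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def attrition_rate_by_group(customers, group_key):
    """Return {group: share of attrited customers}, highest rate first."""
    totals = defaultdict(int)
    attrited = defaultdict(int)
    for customer in customers:
        group = customer[group_key]
        totals[group] += 1
        # "Attrited Customer" is an assumed label for the attrition flag.
        if customer["Attrition_Flag"] == "Attrited Customer":
            attrited[group] += 1
    rates = {group: attrited[group] / totals[group] for group in totals}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rows, not taken from the dataset.
customers = [
    {"Marital_Status": "Married", "Attrition_Flag": "Existing Customer"},
    {"Marital_Status": "Married", "Attrition_Flag": "Attrited Customer"},
    {"Marital_Status": "Single", "Attrition_Flag": "Attrited Customer"},
    {"Marital_Status": "Single", "Attrition_Flag": "Attrited Customer"},
    {"Marital_Status": "Single", "Attrition_Flag": "Existing Customer"},
    {"Marital_Status": "Married", "Attrition_Flag": "Existing Customer"},
    {"Marital_Status": "Married", "Attrition_Flag": "Existing Customer"},
]
rates = attrition_rate_by_group(customers, "Marital_Status")
print(rates)
```

The same function can be called with any of the other demographic columns, such as Gender or Income_Category, to compare groups on a common scale.
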
Question 2: Which variables are the most significant for predicting whether a customer will
leave?

Variable selection methods such as forward or backward selection will be used to find the factors
most significant to customer attrition, which builds a good foundation for the rest of the data
analysis.
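
To illustrate the idea of forward selection, the sketch below greedily adds the variable that most reduces the residual sum of squares of a least-squares fit, and stops when the gain is small. It is a simplification under stated assumptions: it fits a linear model to a 0/1 attrition indicator rather than a proper classifier, the 10% stopping threshold is arbitrary, and the helper names and sample rows are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(rows, y, features):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = [[1.0] + [row[f] for f in features] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, y))


def forward_select(rows, y, candidates, min_gain=0.10):
    """Add features one at a time while each explains enough extra variance."""
    total = rss(rows, y, [])          # intercept-only baseline
    best, selected, remaining = total, [], list(candidates)
    while remaining:
        scores = {f: rss(rows, y, selected + [f]) for f in remaining}
        feature = min(scores, key=scores.get)
        # Stop when the best candidate removes too little of the baseline RSS.
        if best - scores[feature] <= min_gain * total:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = scores[feature]
    return selected


# Hypothetical customers: 1 = attrited, 0 = existing.
counts = [20, 30, 40, 70, 80, 90]
ages = [40, 55, 45, 50, 42, 53]
rows = [{"Total_Trans_Ct": c, "Customer_Age": a} for c, a in zip(counts, ages)]
y = [1, 1, 1, 0, 0, 0]
selected = forward_select(rows, y, ["Total_Trans_Ct", "Customer_Age"])
print(selected)
```

Backward selection would run the same loop in reverse, starting from all candidates and dropping the variable whose removal increases the error least.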

Question 3: Is there a relationship between inactivity and attrition?

We will perform hypothesis tests on the variables Months Inactive 12 months and Contacts Count
12 months to determine whether the two variables are related and correlated, and in turn whether a
customer's inactivity period affects attrition, so that the bank can act to prevent it.
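
One possible form of such a test is a Pearson chi-square test of independence between inactivity and attrition. The sketch below is an assumption about how the test might be set up, not the proposal's stated method: it splits customers at a hypothetical cutoff of three or more inactive months, uses made-up counts, and applies no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts. Rows: inactive 3+ months, inactive under 3 months.
# Columns: attrited, existing.
table = [[30, 70], [10, 90]]
stat, p_value = chi_square_2x2(table)
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
```

A small p-value would suggest that attrition is not independent of inactivity in the sample; it does not by itself show that inactivity causes attrition.
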
Question 4: How do transaction behaviors impact customer attrition?

Studying the variables Total Transaction Amount, Total Transaction Count, and Total Count Change
Q4 to Q1 can help determine whether a decline in spending activity is an early indicator of churn,
so that the bank can intervene with cashback offers or personalized rewards.
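
A first pass at Question 4 could compare average transaction activity between attrited and existing customers and flag accounts whose activity has dropped. In the sketch below, the column names follow Table 1, while the "id" field, the status labels, the 0.5 threshold, and the sample rows are hypothetical; treating Total_Ct_Chng_Q4_Q1 as a ratio where low values indicate a decline is also an assumption.

```python
from statistics import fmean


def mean_by_status(customers, field):
    """Average a numeric field separately for each attrition status."""
    groups = {}
    for customer in customers:
        groups.setdefault(customer["Attrition_Flag"], []).append(customer[field])
    return {status: fmean(values) for status, values in groups.items()}


def flag_declining(customers, threshold=0.5):
    """Return ids of customers whose count-change value is below the threshold."""
    return [c["id"] for c in customers if c["Total_Ct_Chng_Q4_Q1"] < threshold]


# Hypothetical rows, not taken from the dataset.
customers = [
    {"id": 1, "Attrition_Flag": "Existing Customer",
     "Total_Trans_Ct": 80, "Total_Ct_Chng_Q4_Q1": 0.80},
    {"id": 2, "Attrition_Flag": "Existing Customer",
     "Total_Trans_Ct": 70, "Total_Ct_Chng_Q4_Q1": 0.70},
    {"id": 3, "Attrition_Flag": "Attrited Customer",
     "Total_Trans_Ct": 40, "Total_Ct_Chng_Q4_Q1": 0.40},
    {"id": 4, "Attrition_Flag": "Attrited Customer",
     "Total_Trans_Ct": 30, "Total_Ct_Chng_Q4_Q1": 0.30},
    {"id": 5, "Attrition_Flag": "Existing Customer",
     "Total_Trans_Ct": 60, "Total_Ct_Chng_Q4_Q1": 0.45},
]
means = mean_by_status(customers, "Total_Trans_Ct")
at_risk = flag_declining(customers)
print(means, at_risk)
```

Existing customers who fall below the threshold, such as customer 5 in this sample, are the natural targets for the retention offers described above.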

References
• Ascarza, E. (2017). In pursuit of enhanced customer retention management: Review, key
issues, and future directions. Journal of Marketing Research, 54(2), 1-21.
https://www.hbs.edu/faculty/Publication%20Files/ascarza_et_al_cns_17_e08d63cf-0b65-4526-9d23-b0b09dcee9b9_538a6ea6-a480-4841-b9f0-a87be24989ba.pdf
• Harvard Business School. (n.d.). American Express: Machine learning for customer
churn prediction and more effective customer retention. Harvard Business Review.
https://d3.harvard.edu/platform-rctom/submission/american-express-machine-learning-for-customer-churn-prediction-and-more-effective-customer-retention/#
• Ascarza, E. (2018). Retention futility: Targeting high-risk customers might be ineffective.
Journal of Marketing Research, 55(3), 1-13.
https://www.hbs.edu/faculty/Publication%20Files/ascarza_jmr_18_783d54d4-e548-41ed-b1d7-8a180f1ae85a.pdf