Yash - Capstone Report PDF Notes1
SUBMITTER:
YASH GUPTA
PGP-DSBA
CONTENTS
Data Visualization
Scaling
Split of Data
Insights
PROBLEM INTRODUCTION
PROBLEM STATEMENT:
An e-commerce company or DTH provider (you can choose either of these two domains) is facing a lot of competition in the current market, and retaining existing customers has become a challenge. Hence, the company wants to develop a model to predict account churn and provide segmented offers to potential churners. In this company, account churn is a major concern because one account can have multiple customers, so by losing one account the company might be losing more than one customer. You have been assigned to develop a churn prediction model for this company and provide business recommendations on the campaign. Your campaign suggestion should be unique and very clear about the campaign offer, because your recommendation will go through the revenue assurance team. If they find that you are giving away a lot of free (or subsidized) stuff, thereby causing a loss to the company, they will not approve your recommendation. Hence, be very careful while providing campaign recommendations.
PROBLEM UNDERSTANDING:
➢ An e-commerce company/DTH provider is facing challenges in retaining existing customers because of the competitive market situation.
➢ As one account might have multiple users, losing one account means losing multiple customers.
➢ The company wants to develop a model to predict future customer churn, so that it can provide attractive offers to potential churners and thereby improve its revenue.
➢ The company can also carry out an internal study of any operational causes of churn and improve its product offerings.
➢ Proactive action will prevent loss of revenue for the company and will help it improve, or at least retain, its market share among industry peers in terms of the number of active customers.
DATA DICTIONARY:
SAMPLE OF DATA:
AccountID | Churn | Tenure | City_Tier | CC_Contacted_LY | Payment | Gender | Service_Score | Account_user_count
DATA TYPE:
➢ The dataset has 19 variables: AccountID, Churn, Tenure, City_Tier, CC_Contacted_LY, Payment, Gender, Service_Score, Account_user_count, Account_segment, CC_Agent_Score, Marital_Status, rev_per_month, Complain_ly, rev_growth_yoy, coupon_used_for_payment, Day_Since_CC_connect, cashback and Login_device. The variables have the data types shown above, but the data types of a few variables need to be changed.
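The report does not show how the data types were changed. Below is a minimal pandas sketch. It assumes that numeric columns may contain stray non-numeric tokens. `pd.to_numeric(..., errors="coerce")` turns any unparseable value into NaN so that it can be imputed later, and categorical columns are cast to pandas' `category` dtype. The helper name `fix_dtypes`, the toy values and the `#` symbol are hypothetical.

```python
import pandas as pd


def fix_dtypes(df: pd.DataFrame, numeric_cols, categorical_cols) -> pd.DataFrame:
    """Return a copy of df with numeric and categorical columns cast explicitly.

    Any token that cannot be parsed as a number (e.g. a stray special
    character) becomes NaN, so it can be imputed in a later step.
    """
    out = df.copy()
    for col in numeric_cols:
        # errors="coerce" turns unparseable strings into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    for col in categorical_cols:
        out[col] = out[col].astype("category")
    return out


# Hypothetical toy frame with one stray symbol in a numeric column
raw = pd.DataFrame({
    "Tenure": ["4", "0", "#", "15"],
    "Gender": ["Male", "Female", "Female", "Male"],
})
clean = fix_dtypes(raw, numeric_cols=["Tenure"], categorical_cols=["Gender"])
print(clean.dtypes)
print(clean["Tenure"].isna().sum())
```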
Figure 4: Box Plot of CC_Contacted_LY
Table 6: Description of CC_Agent_Score
Figure 7: Box Plot of CC_Agent_Score
Figure 8: Distribution of CC_Agent_Score
Table 7: Description of rev_per_month
Figure 9: Distribution of rev_per_month
Figure 10: Box Plot of rev_per_month
Table 11: Description of rev_growth_yoy
Figure 17: Box Plot of rev_growth_yoy
Figure 18: Distribution of rev_growth_yoy
BIVARIATE ANALYSIS:
Figure 20: Collinearity
➢ From both figures above, we can see that no significant multicollinearity is present in the data.
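The report relies on figures to judge multicollinearity. One common numeric check is the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features plus an intercept. The sketch below computes the VIF with plain NumPy least squares on synthetic data. The function name `vif` and the data are illustrative, and a VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all other columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2) if r2 < 1.0 else np.inf
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)      # nearly a copy of a, so collinear
print(vif(np.column_stack([a, b])))     # both values close to 1
print(vif(np.column_stack([a, b, c])))  # a and c far above 10
```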
DATA VISUALIZATION:
Figure 21: Payment VS Churn
Figure 22: City_Tier VS Churn
Figure 23: Gender VS Churn
Figure 24: Marital_Status VS Churn
Figure 25: account_segment VS Churn
Figure 26: Service_Score VS Churn
Figure 27: CC_Contacted_LY VS Churn
Figure 28: Tenure VS Churn
Figure 29: Login_device VS Churn
Figure 30: rev_growth_yoy VS Churn
Figure 31: rev_per_month VS Churn
Figure 32: Service_Score VS Churn
➢ From Figure 21, we can see that more male customers churned than female customers.
➢ From Figure 22, we can see that customers from Tier 1 cities churned the most, followed by Tier 3 and then Tier 2.
➢ From Figure 23, we can see that customers who pay by debit card are the most likely to churn, followed by credit card, e-wallet, cash on delivery and then UPI users.
➢ From Figure 24, customers with a service score of 3 are the most likely to churn, followed by those with scores of 2 and then 1.
➢ From Figure 25, we can see that customers in the Regular Plus account segment are the most likely to churn, followed by the Super and then the HNI segments.
➢ From Figure 26, single customers are the most likely to churn, followed by married and then divorced customers.
➢ From Figure 27, we can see that mobile users are the most likely to churn.
➢ From Figure 28, we can see that churn is higher in the initial 2-3 months, which is usually when new customers try out the service and decide whether to continue or cancel. This can largely be attributed to uncertainty in the customer's mind.
➢ From Figure 29, we can see that customers who contacted customer care 10 to 25 times in the last year are the most likely to churn.
➢ From Figure 30, we can see that customers with a service score of 3 are the most likely to churn.
➢ From Figure 31, we can see that customers whose percentage of revenue for the last year is between 2 and 15 are the most likely to churn.
➢ From Figure 32, we can see that customers with revenue growth of 10 to 25 percent are the most likely to churn.
➢ From Figure 33, we can see that customers who received cashback of Rs.125 to Rs.200 are the most likely to churn.
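Several observations above compare raw numbers of churned customers (for example, male versus female). Group sizes differ, so a per-group churn rate can rank the groups differently. Below is a minimal pandas sketch on hypothetical toy data. It assumes a binary `Churn` column as in the dataset, and the helper `churn_summary` is illustrative.

```python
import pandas as pd


def churn_summary(df: pd.DataFrame, col: str, target: str = "Churn") -> pd.DataFrame:
    """Churned count, group size and churn rate for each level of `col`."""
    g = df.groupby(col)[target]
    summary = pd.DataFrame({
        "churned": g.sum(),     # number of churners per level
        "customers": g.size(),  # total customers per level
    })
    summary["churn_rate"] = summary["churned"] / summary["customers"]
    return summary.sort_values("churn_rate", ascending=False)


# Hypothetical toy data: more male churners, but fewer female customers overall
toy = pd.DataFrame({
    "Gender": ["Male"] * 6 + ["Female"] * 2,
    "Churn":  [1, 1, 0, 0, 0, 0, 1, 0],
})
print(churn_summary(toy, "Gender"))
```

Here more male customers churn in absolute terms (2 vs 1), yet the female churn rate is higher (0.50 vs about 0.33).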
▪ The dataset contains missing values and special characters, so we replaced all special characters and missing values with the mean, median or mode of the column, depending on its data type.
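Below is a minimal sketch of this imputation. It assumes that missing values and special characters have already been converted to NaN. The report chooses between the mean, median and mode according to data type. This sketch uses the median for every numeric column (the mean is a common alternative for roughly symmetric columns) and the mode for non-numeric columns. The helper `impute` and the toy data are hypothetical.

```python
import pandas as pd


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs: median for numeric columns, mode for everything else."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() ignores NaN by default; take the first mode if there are ties
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data with one gap in each column
toy = pd.DataFrame({
    "Tenure": [4.0, None, 30.0, 15.0],
    "Login_device": ["Mobile", None, "Mobile", "Computer"],
})
filled = impute(toy)
print(filled)
```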
We can see from the above graphs that outliers are present in the data, so they need to be treated for better results.
➢ We treated the outliers using the IQR (interquartile range) method, capping values that fall outside the IQR-based limits.
➢ IQR = Q3 - Q1
➢ where Q1 is the 25th percentile (first quartile) and Q3 is the 75th percentile (third quartile) of the variable.
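The capping step can be sketched as follows. The report does not give the multiplier for the limits. This sketch assumes the conventional lower limit Q1 - 1.5 * IQR and upper limit Q3 + 1.5 * IQR, and uses pandas `Series.clip` to pull any value beyond a limit back to that limit. The helper `iqr_cap` is hypothetical. The toy series reuses cashback values from the sample rows plus one invented outlier (1000).

```python
import pandas as pd


def iqr_cap(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest limit."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s.clip(lower=lower, upper=upper)


cashback = pd.Series([120.0, 129.0, 134.0, 159.0, 165.0, 1000.0])
print(iqr_cap(cashback).tolist())
```

Quantiles here use pandas' default linear interpolation (Q1 = 130.25, Q3 = 163.5). As a result, the 1000 outlier is capped at 213.375 and the other values stay unchanged.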
OUTLIERS TREATMENT:
ENCODING OF DATASET:
Churn | Tenure | City_Tier | CC_Contacted_LY | Payment | Gender | Service_Score | Account_user_count | Account_segment | CC_Agent_Score | Marital_Status | rev_per_month | Complain_ly | rev_growth_yoy | coupon_used_for_payment | Day_Since_CC_connect | cashback | Login_device
1 | 4 | 3 | 6 | 0 | 1 | 3 | 3 | 0 | 2 | 1 | 9 | 1 | 11 | 1 | 5 | 159 | 0
1 | 0 | 1 | 8 | 4 | 0 | 3 | 4 | 1 | 3 | 1 | 7 | 1 | 15 | 0 | 0 | 120 | 0
1 | 0 | 1 | 30 | 0 | 0 | 2 | 4 | 1 | 3 | 1 | 6 | 1 | 14 | 0 | 3 | 165 | 0
1 | 0 | 3 | 15 | 0 | 0 | 2 | 4 | 0 | 5 | 1 | 8 | 0 | 23 | 0 | 3 | 134 | 0
1 | 0 | 1 | 12 | 1 | 0 | 2 | 3 | 1 | 5 | 1 | 3 | 0 | 11 | 1 | 3 | 129 | 0
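The report does not say which encoder produced these integer codes. Below is a minimal label-encoding sketch that assumes pandas category codes, with categories sorted alphabetically. The resulting codes may therefore differ from those in the table above. The helper `label_encode` is hypothetical, and the Payment and Gender spellings are assumed from the observations in this report.

```python
import pandas as pd


def label_encode(df: pd.DataFrame, cols) -> tuple[pd.DataFrame, dict]:
    """Replace each categorical column with integer codes; return the mappings."""
    out = df.copy()
    mappings = {}
    for col in cols:
        as_cat = out[col].astype("category")  # categories are sorted alphabetically
        mappings[col] = dict(enumerate(as_cat.cat.categories))
        out[col] = as_cat.cat.codes
    return out, mappings


toy = pd.DataFrame({
    "Payment": ["Debit Card", "UPI", "Credit Card", "E wallet", "Cash on Delivery"],
    "Gender": ["Female", "Male", "Male", "Female", "Male"],
})
encoded, maps = label_encode(toy, ["Payment", "Gender"])
print(encoded)
print(maps["Payment"])
```

Integer codes suit tree-based models. A linear model, however, would read them as ordered quantities. One-hot encoding (`pd.get_dummies`) avoids this for nominal columns such as Payment.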
SCALING:
Churn | Tenure | City_Tier | CC_Contacted_LY | Payment | Gender | Service_Score | Account_user_count | Account_segment | CC_Agent_Score | Marital_Status | rev_per_month | Complain_ly | rev_growth_yoy | coupon_used_for_payment | Day_Since_CC_connect | cashback | Login_device
1 | 4 | 3 | 6 | 0 | 1 | 3 | 3 | 0 | 2 | 1 | 9 | 1 | 11 | 1 | 5 | 159 | 0
1 | 0 | 1 | 8 | 4 | 0 | 3 | 4 | 1 | 3 | 1 | 7 | 1 | 15 | 0 | 0 | 120 | 0
1 | 0 | 1 | 30 | 0 | 0 | 2 | 4 | 1 | 3 | 1 | 6 | 1 | 14 | 0 | 3 | 165 | 0
1 | 0 | 3 | 15 | 0 | 0 | 2 | 4 | 0 | 5 | 1 | 8 | 0 | 23 | 0 | 3 | 134 | 0
1 | 0 | 1 | 12 | 1 | 0 | 2 | 3 | 1 | 5 | 1 | 3 | 0 | 11 | 1 | 3 | 129 | 0
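The report does not name the scaling method. Below is a minimal z-score standardization sketch with pandas, computing (x - mean) / std for every feature except the target. The helper `zscore` is hypothetical, and the toy columns reuse values from the sample rows.

```python
import pandas as pd


def zscore(df: pd.DataFrame, exclude=("Churn",)) -> pd.DataFrame:
    """Standardize every column not in `exclude` to mean 0 and std 1."""
    out = df.copy()
    for col in out.columns:
        if col in exclude:
            continue  # leave the target untouched
        # population std (ddof=0), the same convention as scikit-learn's StandardScaler
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


sample = pd.DataFrame({
    "Churn": [1, 1, 1, 1, 1],
    "CC_Contacted_LY": [6, 8, 30, 15, 12],
    "cashback": [159, 120, 165, 134, 129],
})
scaled = zscore(sample)
print(scaled.round(3))
```

If the mean and standard deviation are computed before the train/test split, the test rows influence them. Computing them on the training split only and reusing them for the test split avoids this leakage.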
SPLIT OF DATA:
Table 16: Y Test
Table 17: Y Train
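The report does not state the split ratio or method. Below is a minimal stratified split sketch that assumes a 70/30 train/test ratio and a binary `Churn` target. Sampling the test rows separately within each class keeps the churn proportion the same in Y Train and Y Test. scikit-learn's `train_test_split` with `stratify=y` does the same job, but this version uses only NumPy and pandas. The helper name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def stratified_split(df, target="Churn", test_size=0.3, seed=1):
    """Split df into X_train, X_test, y_train, y_test with equal class ratios."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for _, group in df.groupby(target):
        # draw the same fraction of rows from each class for the test set
        n_test = int(round(len(group) * test_size))
        test_idx.extend(rng.choice(group.index.to_numpy(), size=n_test, replace=False))
    test = df.loc[test_idx]
    train = df.drop(index=test_idx)
    return (train.drop(columns=target), test.drop(columns=target),
            train[target], test[target])


# Hypothetical toy data: 100 accounts, 20 % churners
toy = pd.DataFrame({"Tenure": np.arange(100), "Churn": [1] * 20 + [0] * 80})
X_train, X_test, y_train, y_test = stratified_split(toy)
print(len(X_train), len(X_test), y_train.mean(), y_test.mean())
```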
REGRESSION RESULT:
INSIGHTS AS OF NOW:
➢ Customers from Tier 1 cities churned the most, followed by Tier 3 and then Tier 2.
➢ Customers who pay by debit card are the most likely to churn, followed by credit card, e-wallet, cash on delivery and then UPI users.
➢ Customers with a service score of 3 are the most likely to churn, followed by those with scores of 2 and then 1.
➢ Customers in the Regular Plus account segment are the most likely to churn, followed by the Super and then the HNI segments.
➢ Single customers are the most likely to churn, followed by married and then divorced customers.
➢ Churn is higher in the initial 2-3 months, which is usually when new customers try out the service and decide whether to continue or cancel. This can largely be attributed to uncertainty in the customer's mind.
➢ Customers who contacted customer care 10 to 25 times in the last year are the most likely to churn.
➢ Customers with revenue growth of 10 to 25 percent are the most likely to churn.
➢ Customers whose percentage of revenue for the last year is between 2 and 15 are the most likely to churn.
➢ Customers who received cashback of Rs.125 to Rs.200 are the most likely to churn.