Telecom Churn Report

The document discusses a telecom company's capstone project analyzing customer churn. It notes high customer attrition is a concern. The company currently only acts when customers cancel, but wants more proactive retention strategies. The project aims to derive insights from customer data to predict potential churn and recommend reduction steps. It outlines importing relevant libraries and reading in a dataset of 26518 customers and 81 variables for exploratory analysis.

Uploaded by

Abhay Poddar

CAPSTONE PROJECT

PROJECT NOTE 1
MAY 2021

TELECOM CHURN ANALYSIS

Submitted By,
Ajuna P John
Introduction
The telecommunications sector has become one of the main industries in developed
countries. Technical progress and the growing number of operators have raised the level of
competition, and companies rely on multiple strategies to survive in this market.
Three main strategies have been proposed to generate more revenue:
(1) acquire new customers, (2) upsell existing customers, and (3) increase the retention
period of customers. Comparing these strategies by their return on investment (RoI) has
shown that the third is the most profitable: retaining an existing customer costs much
less than acquiring a new one, and is also considered much easier than upselling.
Customer churn is a major problem and one of the most important concerns for large
companies with highly competitive services. Because churn directly affects revenue,
especially in the telecom field, companies are seeking ways to predict which customers
are likely to churn. Finding the factors that increase customer churn is therefore
important for taking the actions needed to reduce it. In addition, identifying the
customers who are likely to leave can represent a potentially large additional
revenue source if it is done at an early stage.
The reasons that lead customers to cancel can be numerous: poor service quality,
delays in customer support, prices, new competitors entering the market, and so on.
Usually there is no single reason, but a combination of events that culminates in
customer displeasure.
Understanding business/social opportunity
If a company cannot identify these signals and act before the customer clicks the
cancel button, there is no turning back: the customer is already gone. But something
valuable remains: the data. Departed customers leave good clues about where the service
fell short. This data can be a valuable source of meaningful insights and can be used to
train customer churn models. Learning from the past puts strategic information at hand to
improve future experiences.
When it comes to the telecommunications segment, there is great room for opportunity.
The wealth and volume of customer data that carriers collect can contribute a lot to shifting
from a reactive to a proactive position. The emergence of sophisticated artificial intelligence
and data analytics techniques further helps leverage this rich data to address churn much
more effectively.
Problem Statement:
The senior management in a telecom provider organization is worried about the rising
customer attrition levels. Additionally, a recent independent survey has suggested that the
industry will face increasing churn rates and decreasing ARPU (average revenue per unit).
Need of the study/project:
The effort to retain customers so far has been very reactive. The company acts only when a
customer calls to close their account, which has not proved to be a great strategy. The
management team is keen to take more proactive measures on this front. The aim of this
project is to derive insights, predict the potential behaviour of customers, and then
recommend steps to reduce churn.
Understanding how data was collected in terms of time, frequency and methodology
Generating hypotheses is key to unlocking any analytics project. We first list our
understanding of the data in order to derive insights through various approaches, and then
proceed from there.

Importing libraries
The following packages were installed prior to running the analysis:

• install.packages("rpivotTable")
• install.packages("VIM")
• install.packages("corrplot")

The following libraries are then loaded:

• library(rpivotTable)
• library(tidyverse)
• library("VIM")
• library(readr)
• library(corrplot)
The working directory is set and the dataset is read:
#Set working directory

• setwd("D:/GreatLakes/Capstone")
• getwd()
#Read the dataset

• mydata= read.csv("Telecom_Sampled.csv")
• attach(mydata)
• View(mydata)

The snapshot of the data is provided below:

Analysis of Dataset
Results of Exploratory Data Analysis

Dimension of the dataset


[1] 26518 81

Number of rows in the dataset


[1] 26518

Number of Columns in the dataset


[1] 81
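The commands behind these outputs are not shown in the report. A minimal sketch of the base R calls that produce results of this shape is given below. The data frame `toy` and its values are hypothetical stand-ins, not the report's data; only the column names are borrowed from the dataset.

```r
# Hypothetical stand-in for mydata (illustrative values only).
toy <- data.frame(
  mou_Mean = c(190.2, 443.0, 400.5),
  churn    = c(0L, 0L, 1L)
)

dim(toy)    # number of rows and columns together
nrow(toy)   # number of rows
ncol(toy)   # number of columns
names(toy)  # variable names
class(toy)  # "data.frame"
```

On the real data the same calls would be applied to `mydata` instead of `toy`.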
Names of the variables in the dataset
[1] "mou_Mean" "totmrc_Mean" "rev_Range" "mou_Range"
[5] "change_mou" "drop_blk_Mean" "drop_vce_Range" "owylis_vce_Range"
[9] "mou_opkv_Range" "months" "totcalls" "income"
[13] "eqpdays" "custcare_Mean" "callwait_Mean" "iwylis_vce_Mean"
[17] "callwait_Range" "ccrndmou_Range" "adjqty" "ovrrev_Mean"
[21] "rev_Mean" "ovrmou_Mean" "comp_vce_Mean" "plcd_vce_Mean"
[25] "avg3mou" "avgmou" "avg3qty" "avgqty"
[29] "avg6mou" "avg6qty" "crclscod" "asl_flag"
[33] "prizm_social_one" "area" "refurb_new" "hnd_webcap"
[37] "marital" "ethnic" "age1" "age2"
[41] "models" "hnd_price" "actvsubs" "uniqsubs"
[45] "forgntvl" "dwlltype" "dwllsize" "mailordr"
[49] "occu1" "opk_dat_Mean" "mtrcycle" "numbcars"
[53] "retdays" "truck" "wrkwoman" "roam_Mean"
[57] "recv_sms_Mean" "blck_dat_Mean" "mou_pead_Mean" "churn"
[61] "solflag" "proptype" "mailresp" "cartype"
[65] "car_buy" "children" "csa" "da_Mean"
[69] "da_Range" "datovr_Mean" "datovr_Range" "div_type"
[73] "drop_dat_Mean" "drop_vce_Mean" "adjmou" "totrev"
[77] "adjrev" "avgrev" "Customer_ID" "comp_dat_Mean"
[81] "plcd_dat_Mean"
The following checks were also done as part of the EDA
> head(mydata,10)
> tail(mydata,15)

Class of the dataset


"data.frame"

Summary of the dataset
mou_Mean totmrc_Mean rev_Range mou_Range
Min. : 0.0 Min. :-26.91 Min. : 0.00 Min. : 0.0
1st Qu.: 160.5 1st Qu.: 30.00 1st Qu.: 1.98 1st Qu.: 114.0
Median : 368.5 Median : 44.99 Median : 15.99 Median : 245.0
Mean : 533.6 Mean : 47.19 Mean : 44.13 Mean : 378.7
3rd Qu.: 733.2 3rd Qu.: 59.99 3rd Qu.: 57.44 3rd Qu.: 485.0
Max. :7667.8 Max. :399.99 Max. :1524.39 Max. :6233.0
NA's :58 NA's :58 NA's :58 NA's :58
change_mou drop_blk_Mean drop_vce_Range owylis_vce_Range
Min. :-2785.00 Min. : 0.000 Min. : 0.00 Min. : 0.0
1st Qu.: -83.00 1st Qu.: 2.000 1st Qu.: 1.00 1st Qu.: 3.0
Median : -4.75 Median : 5.333 Median : 3.00 Median : 9.0
Mean : -10.21 Mean : 10.220 Mean : 5.52 Mean : 15.9
3rd Qu.: 65.25 3rd Qu.: 12.667 3rd Qu.: 7.00 3rd Qu.: 20.0
Max. : 3046.75 Max. :411.667 Max. :313.00 Max. :542.0
NA's :161
mou_opkv_Range months totcalls income eqpdays
Min. : 0.00 Min. : 6.00 Min. : 0 Min. :1.000 Min. : -5.0
1st Qu.: 16.20 1st Qu.:11.00 1st Qu.: 868 1st Qu.:4.000 1st Qu.: 202.0
Median : 55.73 Median :16.00 Median : 1808 Median :6.000 Median : 326.0
Mean : 117.42 Mean :18.69 Mean : 2936 Mean :5.772 Mean : 376.5
3rd Qu.: 142.03 3rd Qu.:24.00 3rd Qu.: 3534 3rd Qu.:7.000 3rd Qu.: 510.0
Max. :4783.67 Max. :60.00 Max. :92076 Max. :9.000 Max. :1812.0
NA's :6697
custcare_Mean callwait_Mean iwylis_vce_Mean callwait_Range
Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000
Median : 0.000 Median : 0.3333 Median : 2.000 Median : 0.000
Mean : 1.903 Mean : 1.8927 Mean : 8.307 Mean : 1.924
3rd Qu.: 1.667 3rd Qu.: 1.6667 3rd Qu.: 9.333 3rd Qu.: 2.000
Max. :365.667 Max. :212.6667 Max. :519.333 Max. :143.000

ccrndmou_Range adjqty ovrrev_Mean rev_Mean


Min. : 0.000 Min. : 0 Min. : 0.000 Min. : -2.52
1st Qu.: 0.000 1st Qu.: 850 1st Qu.: 0.000 1st Qu.: 33.76
Median : 0.000 Median : 1776 Median : 0.975 Median : 48.80

Mean : 7.453 Mean : 2896 Mean : 13.318 Mean : 59.36
3rd Qu.: 7.000 3rd Qu.: 3486 3rd Qu.: 14.175 3rd Qu.: 71.95
Max. :600.000 Max. :92076 Max. :896.087 Max. :926.08
NA's :58 NA's :58
ovrmou_Mean comp_vce_Mean plcd_vce_Mean avg3mou
Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
1st Qu.: 0.00 1st Qu.: 30.67 1st Qu.: 41.0 1st Qu.: 159.0
Median : 2.75 Median : 78.00 Median : 103.8 Median : 369.0
Mean : 40.48 Mean : 112.59 Mean : 149.8 Mean : 538.5
3rd Qu.: 41.25 3rd Qu.: 155.00 3rd Qu.: 204.3 3rd Qu.: 737.0
Max. :3472.25 Max. :1812.67 Max. :2180.3 Max. :7270.0
NA's :58
avgmou avg3qty avgqty avg6mou avg6qty
Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0.0
1st Qu.: 177.9 1st Qu.: 57.0 1st Qu.: 63.69 1st Qu.: 169 1st Qu.: 60.0
Median : 365.9 Median : 129.0 Median : 128.84 Median : 375 Median : 130.0
Mean : 493.9 Mean : 185.9 Mean : 177.23 Mean : 527 Mean : 183.8
3rd Qu.: 670.3 3rd Qu.: 247.0 3rd Qu.: 233.62 3rd Qu.: 721 3rd Qu.: 244.0
Max. :6329.4 Max. :3261.0 Max. :2475.75 Max. :5589 Max. :2759.0
NA's :814 NA's :814
crclscod asl_flag prizm_social_one area
AA :9602 N:22535 C :4594 NEW YORK CITY AREA : 3055
A :4338 Y: 3983 R :1279 DC/MARYLAND/VIRGINIA AREA: 1805
BA :3283 S :8475 MIDWEST AREA : 1782
CA :2298 T :3921 LOS ANGELES AREA : 1750
EA :1889 U :6368 SOUTHWEST AREA : 1613
B :1046 NA's:1881 (Other) :16508
(Other):4062 NA's : 5
refurb_new hnd_webcap marital ethnic age1 age2
N:22863 UNKW: 68 A :1361 N :8912 Min. : 0.00 Min. : 0.00
R: 3655 WC : 3408 B :1880 H :3512 1st Qu.: 0.00 1st Qu.: 0.00
WCMB:20660 M :8159 S :3417 Median :36.00 Median : 0.00
NA's: 2382 S :4829 U :2940 Mean :31.16 Mean :21.06
U :9821 G :1584 3rd Qu.:48.00 3rd Qu.:42.00
NA's: 468 (Other):5685 Max. :94.00 Max. :99.00
NA's : 468 NA's :468 NA's :468
models hnd_price actvsubs uniqsubs forgntvl

Min. : 1.000 Min. : 9.99 Min. : 0.000 Min. : 1.00 Min. :0.000
1st Qu.: 1.000 1st Qu.: 59.99 1st Qu.: 1.000 1st Qu.: 1.00 1st Qu.:0.000
Median : 1.000 Median : 99.99 Median : 1.000 Median : 1.00 Median :0.000
Mean : 1.569 Mean :104.95 Mean : 1.349 Mean : 1.52 Mean :0.058
3rd Qu.: 2.000 3rd Qu.:149.99 3rd Qu.: 2.000 3rd Qu.: 2.00 3rd Qu.:0.000
Max. :15.000 Max. :499.99 Max. :11.000 Max. :12.00 Max. :1.000
NA's :254 NA's :468
dwlltype dwllsize mailordr occu1 opk_dat_Mean
M : 5169 A :12521 B : 9508 1 : 2723 Min. : 0.0000
S :12968 B : 1348 NA's:17010 2 : 1347 1st Qu.: 0.0000
NA's: 8381 C : 420 5 : 799 Median : 0.0000
J : 384 4 : 493 Mean : 0.4317
O : 315 3 : 428 3rd Qu.: 0.0000
(Other): 1440 (Other): 1240 Max. :247.3333
NA's :10090 NA's :19488
mtrcycle numbcars retdays truck wrkwoman
Min. :0.0000 Min. :1.000 Min. : 0.00 Min. :0.0000 Y : 3272
1st Qu.:0.0000 1st Qu.:1.000 1st Qu.: 37.25 1st Qu.:0.0000 NA's:23246
Median :0.0000 Median :1.000 Median :163.00 Median :0.0000
Mean :0.0137 Mean :1.562 Mean :233.84 Mean :0.1874
3rd Qu.:0.0000 3rd Qu.:2.000 3rd Qu.:391.00 3rd Qu.:0.0000
Max. :1.0000 Max. :3.000 Max. :966.00 Max. :1.0000
NA's :468 NA's :12959 NA's :25652 NA's :468
roam_Mean recv_sms_Mean blck_dat_Mean mou_pead_Mean
Min. : 0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0000
Median : 0.0000 Median : 0.00000 Median : 0.00000 Median : 0.0000
Mean : 1.1811 Mean : 0.03733 Mean : 0.02104 Mean : 0.7392
3rd Qu.: 0.2575 3rd Qu.: 0.00000 3rd Qu.: 0.00000 3rd Qu.: 0.0000
Max. :488.7800 Max. :98.33333 Max. :122.33333 Max. :310.0933
NA's :58
churn solflag proptype mailresp cartype car_buy
Min. :0.00 N : 509 A : 6753 R : 9896 B : 2012 New :11051
1st Qu.:0.00 Y : 6 B : 406 NA's:16622 E : 1582 UNKNOWN:14999
Median :0.00 NA's:26003 D : 204 A : 1524 NA's : 468
Mean :0.24 E : 95 F : 1426
3rd Qu.:0.00 G : 18 C : 1021

Max. :1.00 M : 55 (Other): 998
NA's:18987 NA's :17955
children csa da_Mean da_Range datovr_Mean
N : 2705 NYCBRO917: 918 Min. : 0.0000 Min. : 0.000 Min. : 0.0000
Y : 6251 HOUHOU281: 784 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0000
NA's:17562 DALDAL214: 762 Median : 0.2475 Median : 0.990 Median : 0.0000
NYCMAN917: 649 Mean : 0.9034 Mean : 1.638 Mean : 0.2544
APCFCH703: 414 3rd Qu.: 0.9900 3rd Qu.: 1.980 3rd Qu.: 0.0000
(Other) :22986 Max. :72.7650 Max. :57.420 Max. :242.8725
NA's : 5 NA's :58 NA's :58 NA's :58
datovr_Range div_type drop_dat_Mean drop_vce_Mean
Min. : 0.0000 BTH : 512 Min. : 0.00000 Min. : 0.0000
1st Qu.: 0.0000 LDD : 4260 1st Qu.: 0.00000 1st Qu.: 0.6667
Median : 0.0000 LTD : 245 Median : 0.00000 Median : 3.0000
Mean : 0.7073 NA's:21501 Mean : 0.03939 Mean : 6.0837
3rd Qu.: 0.0000 3rd Qu.: 0.00000 3rd Qu.: 7.6667
Max. :475.0200 Max. :48.33333 Max. :195.3333
NA's :58
adjmou totrev adjrev avgrev
Min. : 0 Min. : 9.12 Min. : 8.77 Min. : 1.13
1st Qu.: 2454 1st Qu.: 509.31 1st Qu.: 441.51 1st Qu.: 35.45
Median : 5105 Median : 800.98 Median : 733.22 Median : 50.21
Mean : 7707 Mean : 1037.70 Mean : 965.30 Mean : 58.37
3rd Qu.: 9739 3rd Qu.: 1268.38 3rd Qu.: 1191.08 3rd Qu.: 70.27
Max. :174383 Max. :13358.37 Max. :12982.62 Max. :588.27

Customer_ID comp_dat_Mean plcd_dat_Mean


Min. :1000004 Min. : 0.0000 Min. : 0.000
1st Qu.:1025148 1st Qu.: 0.0000 1st Qu.: 0.000
Median :1050404 Median : 0.0000 Median : 0.000
Mean :1050532 Mean : 0.8284 Mean : 0.917
3rd Qu.:1076291 3rd Qu.: 0.0000 3rd Qu.: 0.000
Max. :1099998 Max. :463.3333 Max. :465.000

Structure of the dataset

Dataframe - 26518 observations of 81 variables


$ mou_Mean : num 190.2 443 400.5 53.5 37 ...
$ totmrc_Mean : num 63.9 40 45 34.7 21 ...
$ rev_Range : num 26 5.1 13.9 18.6 35.8 ...
$ mou_Range : num 43 199 172 78 74 381 462 1280 23 107 ...
$ change_mou : num -11.2 -78 -67.5 12.5 -33 ...
$ drop_blk_Mean : num 4.67 4.33 2 1.33 5.67 ...
$ drop_vce_Range : int 8 5 1 2 1 2 3 19 1 5 ...
$ owylis_vce_Range : int 20 33 7 0 4 40 8 17 2 11 ...
$ mou_opkv_Range : num 28.4 72.5 93.6 4.7 36.6 ...
$ months : int 14 13 29 25 33 25 16 19 9 8 ...
$ totcalls : int 1104 2237 3276 1932 652 5122 3138 5456 314 3441 ...
$ income : int 9 9 6 8 5 NA 8 NA 3 5 ...
$ eqpdays : int 403 404 213 757 983 273 391 283 258 215 ...
$ custcare_Mean : num 0 1.333 1 4.667 0.333 ...
$ callwait_Mean : num 0.667 0 0 0 0 ...
$ iwylis_vce_Mean : num 10.3 16.3 0 0 0 ...
$ callwait_Range : int 2 0 0 0 0 0 2 3 0 2 ...
$ ccrndmou_Range : int 0 14 28 76 4 0 0 220 16 0 ...
$ adjqty : int 1104 2230 3269 1924 636 5098 3138 5415 313 3431 ...
$ ovrrev_Mean : num 2.05 8.84 4.72 9.3 0 ...
$ rev_Mean : num 53.5 34.4 40.5 38.7 21 ...
$ ovrmou_Mean : num 5.5 25 13.5 23.2 0 ...
$ comp_vce_Mean : num 59.3 87 124.7 32.3 10.7 ...
$ plcd_vce_Mean : num 73.3 104.3 153 43 18 ...
$ avg3mou : int 194 469 423 49 48 1204 435 1738 90 1222 ...
$ avgmou : num 182.8 463.8 291.3 243.5 50.7 ...
$ avg3qty : int 91 188 134 25 12 226 129 462 38 520 ...
$ avgqty : num 84.9 185.8 121.1 80.2 20.5 ...
$ avg6mou : int 211 467 333 244 43 1331 452 1419 112 1287 ...

$ avg6qty : int 99 166 118 95 16 217 145 374 44 572 ...
$ crclscod : Factor w/ 49 levels "A","A2","A3",..: 4 22 8 22 26 4 4 11 4 21 ...
$ asl_flag : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 2 ...
$ prizm_social_one : Factor w/ 5 levels "C","R","S","T",..: 4 5 1 5 5 5 3 5 1 1 ...
$ area : Factor w/ 19 levels "ATLANTIC SOUTH AREA",..: 1 9 10 9 6 1 6 12 5 14 ...
$ refurb_new : Factor w/ 2 levels "N","R": 1 1 1 1 1 2 1 1 1 1 ...
$ hnd_webcap : Factor w/ 3 levels "UNKW","WC","WCMB": 3 3 3 2 NA 3 3 3 3 3 ...
$ marital : Factor w/ 5 levels "A","B","M","S",..: 3 5 3 5 2 5 3 5 3 5 ...
$ ethnic : Factor w/ 17 levels "B","C","D","F",..: 10 14 10 15 10 10 14 6 10 10 ...
$ age1 : int 36 0 32 34 78 0 50 0 40 0 ...
$ age2 : int 0 0 30 38 0 0 48 0 42 0 ...
$ models : int 1 1 2 1 1 2 2 1 1 1 ...
$ hnd_price : num 200 150 100 30 30 ...
$ actvsubs : int 2 1 1 1 1 1 1 1 3 1 ...
$ uniqsubs : int 2 1 1 1 1 2 1 1 3 1 ...
$ forgntvl : int 0 0 0 0 0 0 0 0 0 0 ...
$ dwlltype : Factor w/ 2 levels "M","S": 2 NA 2 NA NA NA 2 NA 2 2 ...
$ dwllsize : Factor w/ 15 levels "A","B","C","D",..: 1 NA 1 NA NA NA 1 NA 1 1 ...
$ mailordr : Factor w/ 1 level "B": 1 NA 1 NA NA NA 1 NA 1 NA ...
$ occu1 : Factor w/ 21 levels "1","2","3","4",..: 10 NA NA NA NA NA 1 NA 4 NA ...
$ opk_dat_Mean : num 0 0 0 0 0 ...
$ mtrcycle : int 0 0 0 0 0 0 0 0 0 0 ...
$ numbcars : int NA 1 1 1 NA NA 2 NA NA 1 ...
$ retdays : int NA NA NA NA NA NA NA NA NA NA ...
$ truck : int 0 0 0 0 0 0 0 0 0 0 ...
$ wrkwoman : Factor w/ 1 level "Y": 1 NA NA NA NA NA NA NA NA NA ...
$ roam_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ recv_sms_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ blck_dat_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ mou_pead_Mean : num 0 0.883 0 0 0 ...
$ churn : int 0 0 0 0 0 0 0 0 0 0 ...
$ solflag : Factor w/ 2 levels "N","Y": NA NA NA NA NA NA NA NA NA NA ...

$ proptype : Factor w/ 6 levels "A","B","D","E",..: 1 NA NA NA NA NA NA NA 1 NA
$ mailresp : Factor w/ 1 level "R": 1 NA 1 NA NA NA 1 NA 1 NA ...
$ cartype : Factor w/ 7 levels "A","B","C","D",..: NA NA 5 NA NA NA 5 NA NA 5 ...
$ car_buy : Factor w/ 2 levels "New","UNKNOWN": 2 1 1 1 2 2 1 2 2 2 ...
$ children : Factor w/ 2 levels "N","Y": 2 NA 2 NA NA NA 2 NA 2 NA ...
$ csa : Factor w/ 694 levels "AIRAIK803","AIRAND864",..: 7 271 688 263 344 58 38 435 521 665 ...
$ da_Mean : num 0 0.247 0.495 0 0 ...
$ da_Range : num 0 0.99 1.98 0 0 0 0 0.99 0 0 ...
$ datovr_Mean : num 0 0.877 0 0 0 ...
$ datovr_Range : num 0 3.51 0 0 0 0 0 0.78 0 0 ...
$ div_type : Factor w/ 3 levels "BTH","LDD","LTD": NA NA NA 2 NA NA NA NA NA
$ drop_dat_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ drop_vce_Mean : num 3.667 3 0.333 0.667 1.333 ...
$ adjmou : num 2376 5565 7866 5844 1573 ...
$ totrev : num 519 599 1617 1682 968 ...
$ adjrev : num 489 539 1586 1648 917 ...
$ avgrev : num 37.6 44.9 58.8 68.7 29.6 ...
$ Customer_ID : int 1064525 1048538 1010139 1014496 1012053 1028421 1061694 1038047 1084670 1096350 ...
$ comp_dat_Mean : num 0 0.333 0 0 0 ...
$ plcd_dat_Mean : num 0 0.333 0.333 0 0 ...

Conversion to factors
The variables forgntvl, mtrcycle, truck and churn were converted to factors.
The income levels were converted into an ordered factor.
eqpdays (“Number of days (age) of current equipment”) has negative values. These were
replaced with the corresponding absolute values.
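The conversion code is not shown in the report. A minimal sketch of these three steps in base R follows, using a small hypothetical data frame `toy` with illustrative values; the helper name `flag_cols` is also an assumption.

```r
# Hypothetical stand-in rows; values are illustrative only.
toy <- data.frame(
  forgntvl = c(0L, 1L, 0L),
  mtrcycle = c(0L, 0L, 1L),
  truck    = c(0L, 0L, 0L),
  churn    = c(0L, 1L, 0L),
  income   = c(9L, 6L, 3L),
  eqpdays  = c(403L, -5L, 213L)
)

# Convert the 0/1 flag columns to factors.
flag_cols <- c("forgntvl", "mtrcycle", "truck", "churn")
toy[flag_cols] <- lapply(toy[flag_cols], factor)

# Income is coded 1 to 9, so treat it as an ordered factor.
toy$income <- factor(toy$income, levels = 1:9, ordered = TRUE)

# Replace negative equipment ages with their absolute values.
toy$eqpdays <- abs(toy$eqpdays)

str(toy)
```

Taking the absolute value assumes the negative ages are sign errors rather than genuinely invalid records; if that is in doubt, setting them to NA would be the more cautious choice.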

Check for missing values

Found missing values in 42 variables

mou_Mean totmrc_Mean rev_Range mou_Range change_mou


58 58 58 58 161
drop_blk_Mean drop_vce_Range owylis_vce_Range mou_opkv_Range months
0 0 0 0 0
totcalls income eqpdays custcare_Mean callwait_Mean
0 6697 0 0 0
iwylis_vce_Mean callwait_Range ccrndmou_Range adjqty ovrrev_Mean
0 0 0 0 58
rev_Mean ovrmou_Mean comp_vce_Mean plcd_vce_Mean avg3mou
58 58 0 0 0
avgmou avg3qty avgqty avg6mou avg6qty
0 0 0 814 814
crclscod asl_flag prizm_social_one area refurb_new
0 0 1881 5 0
hnd_webcap marital ethnic age1 age2
2382 468 468 468 468
models hnd_price actvsubs uniqsubs forgntvl
0 254 0 0 468
dwlltype dwllsize mailordr occu1 opk_dat_Mean
8381 10090 17010 19488 0
mtrcycle numbcars retdays truck wrkwoman
468 12959 25652 468 23246
roam_Mean recv_sms_Mean blck_dat_Mean mou_pead_Mean churn
58 0 0 0 0
solflag proptype mailresp cartype car_buy
26003 18987 16622 17955 468
children csa da_Mean da_Range datovr_Mean
17562 5 58 58 58
datovr_Range div_type drop_dat_Mean drop_vce_Mean adjmou
58 21501 0 0 0
totrev adjrev avgrev Customer_ID comp_dat_Mean
0 0 0 0 0
plcd_dat_Mean
0
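A per-variable count like the one above can be obtained with `colSums(is.na(...))`. The sketch below shows the idea on a hypothetical three-column data frame `toy`; the values are illustrative and the report's actual command is not shown in the source.

```r
# Hypothetical stand-in data with a few missing values.
toy <- data.frame(
  mou_Mean = c(190.2, NA, 400.5, 53.5),
  income   = c(9L, NA, NA, 8L),
  months   = c(14L, 13L, 29L, 25L)
)

# Number of missing values per variable.
na_counts <- colSums(is.na(toy))
na_counts

# Number of variables that contain at least one missing value.
sum(na_counts > 0)

# Percentage of missing values per variable.
round(100 * na_counts / nrow(toy), 1)
```

The percentage view is what the later drop and imputation thresholds are compared against.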

Missing value treatment
Variables with more than 40% of their values missing are dropped: mailordr, occu1, numbcars,
retdays, wrkwoman, solflag, proptype, mailresp, cartype, children and div_type.
The remaining 31 variables with “NA” values also carry other vital information, so
instead of discarding them we replace NA with either the “median value of the column” or
the “KNN value of the column”.
Missing values in the variables `avg6mou`, `avg6qty`, `mou_Mean`, `totmrc_Mean`,
`rev_Range`, `mou_Range`, `change_mou`, `ovrrev_Mean`, `rev_Mean`, `ovrmou_Mean`,
which are continuous with less than 10% of data values missing, are replaced with the median
of the column.
Missing values in the variables "prizm_social_one", "income", "dwlltype", "dwllsize",
"hnd_webcap", "marital", "ethnic", "age1", "age2", "hnd_price", "forgntvl", "mtrcycle", "truck",
"roam_Mean", "car_buy", "csa", "da_Mean", "da_Range", "datovr_Mean", "datovr_Range",
"area", which are mostly categorical and have less than 40% of data values missing, are
replaced with KNN-imputed values.

Now the working dataset has 26518 observations and 70 variables
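The treatment code is not included in the report. As a rough sketch under stated assumptions, the drop step and the median step could look like this in base R; `toy`, `na_share` and `work` are hypothetical names and the values are illustrative.

```r
# Hypothetical stand-in data: retdays is mostly missing, mou_Mean only slightly.
toy <- data.frame(
  mou_Mean = c(190.2, NA, 400.5, 53.5, 37.0),
  retdays  = c(NA, NA, NA, 12L, NA),
  months   = c(14L, 13L, 29L, 25L, 33L)
)

# Share of missing values per column; drop columns above the 40% threshold.
na_share <- colMeans(is.na(toy))
work <- toy[, na_share <= 0.40, drop = FALSE]

# Median imputation for the remaining numeric columns with missing values.
for (col in names(work)) {
  if (is.numeric(work[[col]]) && anyNA(work[[col]])) {
    work[[col]][is.na(work[[col]])] <- median(work[[col]], na.rm = TRUE)
  }
}

# The KNN step could use VIM::kNN() (VIM is loaded in the report); the exact
# call and the value of k are not given in the source, so it is omitted here.
work
```

This sketch imputes every numeric column with the median for simplicity, whereas the report reserves the median for the ten listed continuous variables and uses KNN for the rest.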

Structure of working dataset:
'data.frame': 26518 obs. of 70 variables:
$ mou_Mean : num 190.2 443 400.5 53.5 37 ...
$ totmrc_Mean : num 63.9 40 45 34.7 21 ...
$ rev_Range : num 26 5.1 13.9 18.6 35.8 ...
$ mou_Range : num 43 199 172 78 74 ...
$ change_mou : num -11.2 -78 -67.5 12.5 -33 ...
$ drop_blk_Mean : num 4.67 4.33 2 1.33 5.67 ...
$ drop_vce_Range : num 8 5 1 2 1 2 3 16 1 5 ...
$ owylis_vce_Range: num 20 33 7 0 4 40 8 17 2 11 ...
$ mou_opkv_Range : num 28.4 72.5 93.6 4.7 36.6 ...
$ months : num 14 13 29 25 33 25 16 19 9 8 ...
$ totcalls : num 1104 2237 3276 1932 652 ...
$ income : Ord.factor w/ 9 levels "1"<"2"<"3"<"4"<..: 9 9 6 8 5 5 8 6 3 5 ...
$ eqpdays : num 403 404 213 757 970 273 391 283 258 215 ...
$ custcare_Mean : num 0 1.333 1 4 0.333 ...
$ callwait_Mean : num 0.667 0 0 0 0 ...
$ iwylis_vce_Mean : num 10.3 16.3 0 0 0 ...
$ callwait_Range : num 2 0 0 0 0 0 2 3 0 2 ...
$ ccrndmou_Range : num 0 14 17 17 4 0 0 17 16 0 ...
$ adjqty : num 1104 2230 3269 1924 636 ...
$ ovrrev_Mean : num 2.05 8.84 4.72 9.3 0 ...
$ rev_Mean : num 53.5 34.4 40.5 38.7 21 ...
$ ovrmou_Mean : num 5.5 25 13.5 23.2 0 ...
$ comp_vce_Mean : num 59.3 87 124.7 32.3 10.7 ...
$ plcd_vce_Mean : num 73.3 104.3 153 43 18 ...
$ avg3mou : num 194 469 423 49 48 ...
$ avgmou : num 182.8 463.8 291.3 243.5 50.7 ...
$ avg3qty : num 91 188 134 25 12 226 129 462 38 520 ...
$ avgqty : num 84.9 185.8 121.1 80.2 20.5 ...
$ avg6mou : num 211 467 333 244 43 ...
$ avg6qty : num 99 166 118 95 16 217 145 374 44 502 ...

$ crclscod : Factor w/ 49 levels "A","A2","A3",..: 4 22 8 22 26 4 4 11 4 21 ...
$ asl_flag : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 2 ...
$ prizm_social_one: Factor w/ 5 levels "C","R","S","T",..: 4 5 1 5 5 5 3 5 1 1 ...
$ area : Factor w/ 19 levels "ATLANTIC SOUTH AREA",..: 1 9 10 9 6 1 6 12 5 14 ...
$ refurb_new : Factor w/ 2 levels "N","R": 1 1 1 1 1 2 1 1 1 1 ...
$ hnd_webcap : Factor w/ 3 levels "UNKW","WC","WCMB": 3 3 3 2 2 3 3 3 3 3 ...
$ marital : Factor w/ 5 levels "A","B","M","S",..: 3 5 3 5 2 5 3 5 3 5 ...
$ ethnic : Factor w/ 17 levels "B","C","D","F",..: 10 14 10 15 10 10 14 6 10 10 ...
$ age1 : int 36 0 32 34 78 0 50 0 40 0 ...
$ age2 : int 0 0 30 38 0 0 48 0 42 0 ...
$ models : num 1 1 2 1 1 2 2 1 1 1 ...
$ hnd_price : num 200 150 100 30 30 ...
$ actvsubs : num 2 1 1 1 1 1 1 1 3 1 ...
$ uniqsubs : num 2 1 1 1 1 2 1 1 3 1 ...
$ forgntvl : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ dwlltype : Factor w/ 2 levels "M","S": 2 1 2 1 1 1 2 1 2 2 ...
$ dwllsize : Factor w/ 15 levels "A","B","C","D",..: 1 1 1 1 8 2 1 3 1 1 ...
$ opk_dat_Mean : num 0 0 0 0 0 ...
$ mtrcycle : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ truck : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ roam_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ recv_sms_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ blck_dat_Mean : num 0 0 0 0 0 0 0 0 0 0 ...
$ mou_pead_Mean : num 0 0.883 0 0 0 ...
$ churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ car_buy : Factor w/ 2 levels "New","UNKNOWN": 2 1 1 1 2 2 1 2 2 2 ...
$ csa : Factor w/ 694 levels "AIRAIK803","AIRAND864",..: 7 271 688 263 344 58 38 435 521 665 ...
$ da_Mean : num 0 0.247 0.495 0 0 ...
$ da_Range : num 0 0.99 1.98 0 0 0 0 0.99 0 0 ...
$ datovr_Mean : num 0 0.877 0 0 0 ...
$ datovr_Range : num 0 3.51 0 0 0 0 0 0.78 0 0 ...
$ drop_dat_Mean : num 0 0 0 0 0 0 0 0 0 0 ...

$ drop_vce_Mean : num 3.667 3 0.333 0.667 1.333 ...
$ adjmou : num 2376 5565 7866 5844 1573 ...
$ totrev : num 519 599 1617 1682 968 ...
$ adjrev : num 489 539 1586 1648 917 ...
$ avgrev : num 37.6 44.9 58.8 68.7 29.6 ...
$ Customer_ID : int 1064525 1048538 1010139 1014496 1012053 1028421 1061694 1038047 1084670 1096350 ...
$ comp_dat_Mean : num 0 0.333 0 0 0 ...
$ plcd_dat_Mean : num 0 0.333 0.333 0 0 ...

Summary of the working dataset:

mou_Mean totmrc_Mean rev_Range mou_Range


Min. : 0.0 Min. :-26.91 Min. : 0.00 Min. : 0.0
1st Qu.: 161.0 1st Qu.: 30.00 1st Qu.: 2.00 1st Qu.: 115.0
Median : 368.5 Median : 44.99 Median : 15.99 Median : 245.0
Mean : 504.4 Mean : 46.54 Mean : 36.97 Mean : 341.2
3rd Qu.: 732.4 3rd Qu.: 59.99 3rd Qu.: 57.28 3rd Qu.: 484.0
Max. :1589.7 Max. :104.95 Max. :140.20 Max. :1037.0

change_mou drop_blk_Mean drop_vce_Range owylis_vce_Range


Min. :-2785.00 Min. : 0.000 Min. : 0.000 Min. : 0.00
1st Qu.: -82.25 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 3.00
Median : -4.75 Median : 5.333 Median : 3.000 Median : 9.00
Mean : -27.39 Mean : 8.533 Mean : 4.606 Mean :13.36
3rd Qu.: 64.00 3rd Qu.:12.667 3rd Qu.: 7.000 3rd Qu.:20.00
Max. : 283.00 Max. :28.000 Max. :16.000 Max. :45.00

mou_opkv_Range months totcalls income eqpdays


Min. : 0.00 Min. : 6.00 Min. : 0 6 :7423 Min. : 0.0
1st Qu.: 16.20 1st Qu.:11.00 1st Qu.: 868 7 :5140 1st Qu.:202.0
Median : 55.73 Median :16.00 Median :1808 5 :3196 Median :326.0
Mean : 96.17 Mean :18.55 Mean :2525 9 :2993 Mean :371.7
3rd Qu.:142.03 3rd Qu.:24.00 3rd Qu.:3534 4 :2592 3rd Qu.:510.0
Max. :330.00 Max. :43.00 Max. :7530 8 :1754 Max. :970.0
(Other):3420
custcare_Mean callwait_Mean iwylis_vce_Mean callwait_Range ccrndmou_Range
Min. :0.000 Min. :0.0000 Min. : 0.000 Min. :0.000 Min. : 0.000
1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:0.000 1st Qu.: 0.000
Median :0.000 Median :0.3333 Median : 2.000 Median :0.000 Median : 0.000
Mean :1.019 Mean :0.9908 Mean : 6.016 Mean :1.277 Mean : 4.217
3rd Qu.:1.667 3rd Qu.:1.6667 3rd Qu.: 9.333 3rd Qu.:2.000 3rd Qu.: 7.000

Max. :4.000 Max. :4.0000 Max. :23.000 Max. :5.000 Max. :17.000

adjqty ovrrev_Mean rev_Mean ovrmou_Mean comp_vce_Mean


Min. : 0 Min. : 0.000 Min. : -2.52 Min. : 0.00 Min. : 0.00
1st Qu.: 850 1st Qu.: 0.000 1st Qu.: 33.79 1st Qu.: 0.00 1st Qu.: 30.67
Median :1776 Median : 0.975 Median : 48.80 Median : 2.75 Median : 78.00
Mean :2488 Mean : 8.657 Mean : 55.81 Mean : 25.43 Mean :105.51
3rd Qu.:3486 3rd Qu.:14.141 3rd Qu.: 71.86 3rd Qu.: 41.19 3rd Qu.:155.00
Max. :7440 Max. :35.000 Max. :128.00 Max. :103.00 Max. :341.00

plcd_vce_Mean avg3mou avgmou avg3qty avgqty


Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 41.0 1st Qu.: 159.0 1st Qu.: 177.9 1st Qu.: 57.0 1st Qu.: 63.69
Median :103.8 Median : 369.0 Median : 365.9 Median :129.0 Median :128.84
Mean :139.8 Mean : 508.8 Mean : 472.5 Mean :172.4 Mean :165.93
3rd Qu.:204.3 3rd Qu.: 737.0 3rd Qu.: 670.3 3rd Qu.:247.0 3rd Qu.:233.62
Max. :448.0 Max. :1604.0 Max. :1409.0 Max. :532.0 Max. :488.00

avg6mou avg6qty crclscod asl_flag prizm_social_one


Min. : 0.0 Min. : 0 AA :9602 N:22535 C:4932
1st Qu.: 174.0 1st Qu.: 62 A :4338 Y: 3983 R:1337
Median : 375.0 Median :130 BA :3283 S:9176
Mean : 494.4 Mean :169 CA :2298 T:4208
3rd Qu.: 704.0 3rd Qu.:238 EA :1889 U:6865
Max. :1499.0 Max. :502 B :1046
(Other):4062
area refurb_new hnd_webcap marital ethnic
NEW YORK CITY AREA : 3055 N:22863 UNKW: 68 A:1375 N :9162
DC/MARYLAND/VIRGINIA AREA: 1805 R: 3655 WC : 4258 B:1883 H :3566
MIDWEST AREA : 1783 WCMB:22192 M:8503 S :3479
LOS ANGELES AREA : 1750 S:4889 U :2976
ATLANTIC SOUTH AREA : 1614 U:9868 G :1601

SOUTHWEST AREA : 1613 Z :1348
(Other) :14898 (Other):4386
age1 age2 models hnd_price actvsubs
Min. : 0.00 Min. : 0.00 Min. :1.000 Min. : 9.99 Min. :0.00
1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:1.000 1st Qu.: 59.99 1st Qu.:1.00
Median :36.00 Median : 0.00 Median :1.000 Median : 99.99 Median :1.00
Mean :31.42 Mean :21.37 Mean :1.526 Mean :104.42 Mean :1.34
3rd Qu.:48.00 3rd Qu.:44.00 3rd Qu.:2.000 3rd Qu.:149.99 3rd Qu.:2.00
Max. :94.00 Max. :99.00 Max. :3.500 Max. :284.00 Max. :3.50

uniqsubs forgntvl dwlltype dwllsize opk_dat_Mean mtrcycle


Min. :1.000 0:25000 M:11768 A :16148 Min. : 0.0000 0:26162
1st Qu.:1.000 1: 1518 S:14750 B : 3773 1st Qu.: 0.0000 1: 356
Median :1.000 J : 1217 Median : 0.0000
Mean :1.485 C : 1081 Mean : 0.4317
3rd Qu.:2.000 O : 738 3rd Qu.: 0.0000
Max. :3.500 N : 722 Max. :247.3333
(Other): 2839
truck roam_Mean recv_sms_Mean blck_dat_Mean mou_pead_Mean
0:21511 Min. :0.0000 Min. : 0.00000 Min. : 0.00000 Min. : 0.0000
1: 5007 1st Qu.:0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0000
Median :0.0000 Median : 0.00000 Median : 0.00000 Median : 0.0000
Mean :0.1554 Mean : 0.03733 Mean : 0.02104 Mean : 0.7392
3rd Qu.:0.2575 3rd Qu.: 0.00000 3rd Qu.: 0.00000 3rd Qu.: 0.0000
Max. :0.6400 Max. :98.33333 Max. :122.33333 Max. :310.0933

churn car_buy csa da_Mean da_Range


0:20154 New :11350 NYCBRO917: 918 Min. :0.0000 Min. :0.000
1: 6364 UNKNOWN:15168 HOUHOU281: 784 1st Qu.:0.0000 1st Qu.:0.000
DALDAL214: 762 Median :0.2475 Median :0.990
NYCMAN917: 649 Mean :0.5871 Mean :1.297
APCFCH703: 414 3rd Qu.:0.9900 3rd Qu.:1.980

SANSAN210: 380 Max. :2.4000 Max. :4.900
(Other) :22611
datovr_Mean datovr_Range drop_dat_Mean drop_vce_Mean
Min. : 0.0000 Min. : 0.0000 Min. : 0.00000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.6667
Median : 0.0000 Median : 0.0000 Median : 0.00000 Median : 3.0000
Mean : 0.2538 Mean : 0.7058 Mean : 0.03939 Mean : 5.1429
3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.00000 3rd Qu.: 7.6667
Max. :242.8725 Max. :475.0200 Max. :48.33333 Max. :18.0000

adjmou totrev adjrev avgrev


Min. : 0 Min. : 9.12 Min. : 8.77 Min. : 1.13
1st Qu.: 2454 1st Qu.: 509.31 1st Qu.: 441.51 1st Qu.: 35.45
Median : 5105 Median : 800.98 Median : 733.22 Median : 50.21
Mean : 7707 Mean : 1037.70 Mean : 895.45 Mean : 55.92
3rd Qu.: 9739 3rd Qu.: 1268.38 3rd Qu.:1191.08 3rd Qu.: 70.27
Max. :174383 Max. :13358.37 Max. :2315.00 Max. :120.00

Customer_ID comp_dat_Mean plcd_dat_Mean


Min. :1000004 Min. : 0.0000 Min. : 0.000
1st Qu.:1025148 1st Qu.: 0.0000 1st Qu.: 0.000
Median :1050404 Median : 0.0000 Median : 0.000
Mean :1050532 Mean : 0.8284 Mean : 0.917
3rd Qu.:1076291 3rd Qu.: 0.0000 3rd Qu.: 0.000
Max. :1099998 Max. :463.3333 Max. :465.000

Balance of the target variable

24% of customers have churned and 76% have not churned
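These shares follow from the churn counts in the summary of the working dataset (0: 20154, 1: 6364). A short check in R, rebuilding the target variable from those two counts (the vector `churn` here is a reconstruction for illustration, not the original column):

```r
# Counts taken from the summary of the working dataset.
churn <- factor(rep(c("0", "1"), times = c(20154, 6364)))

# Percentage of customers in each class.
balance <- round(100 * prop.table(table(churn)), 1)
balance
```

With roughly one churner for every three non-churners, the classes are imbalanced but not extremely so.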

Univariate and Bivariate Analysis along with Outlier Treatment

Continuous Variables
Boxplots of the continuous variables show that outliers are present in most of them.
Outliers are treated by capping at the upper value. Boxplots, histograms and density
plots of the treated values are used for analysis.

The chart for each variable consists of five plots: a boxplot of the data with outliers,
a boxplot after outlier treatment, a histogram, a density plot and a bivariate plot.
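The report does not state which upper value is used as the cap. A minimal sketch, assuming the cap is the upper boxplot whisker, Q3 + 1.5 × IQR: this assumption is at least consistent with the treated maxima in the summary of the working dataset (for example, models has a first quartile of 1, a third quartile of 2 and a treated maximum of 3.5). The function name `cap_upper` and the sample vector are hypothetical.

```r
# Cap values above the upper whisker (Q3 + 1.5 * IQR); lower values are untouched.
# Assumption: this is the "upper value" the report refers to.
cap_upper <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  upper <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
  pmin(x, upper)
}

# Illustrative values with one extreme observation.
models <- c(1, 1, 1, 1, 2, 2, 2, 15)
cap_upper(models)
```

Only the upper tail is capped in this sketch, which matches the summary, where minimum values such as that of change_mou are unchanged after treatment.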

AN1 - Analysis of mean number of monthly minutes of use
AN2 - Analysis of mean total monthly recurring charge
AN3 - Range of revenue (charge amount)
AN4 - Range of number of minutes of use
AN5 - Percentage change in monthly minutes of use vs previous 3 months
AN6 - Analysis of mean number of dropped or blocked calls
AN7 - Range of number of dropped (failed) voice calls
AN8 - Analysis of range of number of outbound wireless to wireless calls
AN9 - Range of unrounded minutes of use off-peak voice calls
AN10 - Analysis of total number of months in service
AN11 - Range of unrounded minutes of use off-peak voice calls
AN12 - Analysis of number of days of current equipment
AN13 - Range of mean number of customer care calls
AN14 - Analysis of mean number of call waiting calls
AN15 - Range of inbound wireless to wireless voice calls
AN16 - Analysis of range of number of call waiting calls
AN17 - Range of unrounded minutes of use off-peak voice calls
AN18 - Analysis of total number of calls over the life of the customer
AN19 - Range of mean overage revenue
AN20 - Analysis of mean monthly revenue (charge amount)
AN21 - Range of mean overage minutes of use
AN22 - Analysis of mean number of completed voice calls
AN23 - Mean number of attempted voice calls placed
AN24 - Average monthly minutes of use over previous three months
AN25 - Average monthly minutes of use over the life of the customer
AN26 - Analysis of monthly number of calls over the previous three months
AN27 - Analysis of average monthly number of calls over the life
AN28 - Analysis of average monthly minutes of use over six months
AN29 - Average monthly number of calls over six months
AN30 - Age of first household member
AN31 - Age of second household member
AN32 - Analysis of number of models issued
AN33 - Analysis of current handset price
AN34 - Analysis of active subscribers in household
AN35 - Analysis of unique AN36 - Anlaysis of mean number
users in the household of off-peak data calls

Page 38
AN37 - Analysis of mean AN38 - Anlaysis of mean number
number of roaming calls of received SMS calls

Page 39
AN39 - Analysis of mean number AN40 - Anlaysis of mean minutes
of blocked (failed) data calls of use of peak data calls

Page 40
AN41 - Analysis of mean number AN42 - Anlaysis of number of directory
of directory assisted calls assisted calls

Page 41
AN43 - Analysis of mean revenue AN44 - Anlaysis of range of revenue of
of data overage data overage

Page 42
AN45 - Analysis of dropped (failed) AN46 - Anlaysis of mean number of
data calls dropped (failed) voice calls

Page 43
AN47 - Analysis of billing adjusted AN48 - Anlaysis of Total Revenue
total minutes of use over life

Page 44
AN49 - Analysis of adjusted total AN50 - Anlaysis of Average monthly
revenue over life revenue over the life

Page 45
AN51 - Analysis of mean number AN52 - Anlaysis of mean number of
of completed data calls attempted data calls placed

Correlation between the numeric variables

Categorical Variables
1. Income

prop.table(table(mydata2$income,mydata2$churn),1)*100
0 1
1 77.56654 22.43346
2 78.40735 21.59265
3 77.72595 22.27405
4 78.85802 21.14198
5 76.72090 23.27910
6 76.76142 23.23858
7 72.39300 27.60700
8 72.12087 27.87913
9 77.28032 22.71968
table(mydata2$churn, mydata2$income)
1 2 3 4 5 6 7 8 9
0 816 512 1333 2044 2452 5698 3721 1265 2313
1 236 141 382 548 744 1725 1419 489 680
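The row percentages in the first table follow directly from the counts in the second one: each income level's churn rate is its churned count divided by its total. A minimal sketch that rebuilds the percentages from the printed counts, assuming they are entered by hand rather than read from `mydata2`:

```r
# Income-by-churn counts copied from the table above
# (rows: churn 0/1, columns: income levels 1 to 9)
counts <- matrix(
  c(816, 512, 1333, 2044, 2452, 5698, 3721, 1265, 2313,
    236, 141,  382,  548,  744, 1725, 1419,  489,  680),
  nrow = 2, byrow = TRUE,
  dimnames = list(churn = c("0", "1"), income = as.character(1:9))
)

# Transpose so income is the row variable, then take row percentages,
# mirroring prop.table(table(income, churn), 1) * 100
row_pct <- prop.table(t(counts), margin = 1) * 100
print(round(row_pct, 2))
```

For income level 7, for instance, 1419 / (3721 + 1419) gives the 27.6% churn rate shown in the first table. The same pair of calls is used for every categorical variable in this section.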
2. Credit class code

prop.table(table(mydata2$crclscod,mydata2$churn),1)*100

0 1
A 75.034578 24.965422
A2 65.934066 34.065934
A3 0.000000 100.000000
AA 75.234326 24.765674
B 71.032505 28.967495
B2 78.260870 21.739130
BA 73.865367 26.134633
C 76.334107 23.665893
C2 78.260870 21.739130
C5 85.185185 14.814815
CA 78.633594 21.366406
CC 70.000000 30.000000
CY 82.000000 18.000000
D 74.358974 25.641026
D2 100.000000 0.000000
D4 86.597938 13.402062
D5 93.023256 6.976744
DA 79.162512 20.837488
E 83.870968 16.129032
E2 90.000000 10.000000
E4 87.398374 12.601626
EA 80.836421 19.163579
EC 94.117647 5.882353
EF 100.000000 0.000000
EM 64.705882 35.294118
G 71.186441 28.813559
GA 77.777778 22.222222
GY 70.000000 30.000000
H 100.000000 0.000000
I 79.245283 20.754717
IF 66.666667 33.333333
J 80.392157 19.607843
JF 74.137931 25.862069
K 72.727273 27.272727
M 77.500000 22.500000
O 71.428571 28.571429
TP 0.000000 100.000000
U 76.984127 23.015873
U1 82.352941 17.647059
V1 90.000000 10.000000
W 77.777778 22.222222
Y 100.000000 0.000000
Z 72.549020 27.450980
Z1 100.000000 0.000000
Z2 100.000000 0.000000
Z4 84.722222 15.277778
Z5 81.818182 18.181818
ZA 75.783784 24.216216
ZY 78.787879 21.212121

3. Account spending limit

prop.table(table(mydata2$asl_flag,mydata2$churn),1)*100

0 1
N 74.97670 25.02330
Y 81.79764 18.20236
table(mydata2$churn, mydata2$asl_flag)
N Y
0 16896 3258
1 5639 725

4. Social group letter

prop.table(table(mydata2$prizm_social_one,mydata2$churn),1)*100
0 1
C 76.45985 23.54015
R 74.64473 25.35527
S 76.32956 23.67044
T 74.00190 25.99810
U 76.72251 23.27749

table(mydata2$churn, mydata2$prizm_social_one)

C R S T U
0 3771 998 7004 3114 5267
1 1161 339 2172 1094 1598

5. Area
prop.table(table(mydata2$area,mydata2$churn),1)*100

0 1
ATLANTIC SOUTH AREA 76.64188 23.35812
CALIFORNIA NORTH AREA 74.25743 25.74257
CENTRAL/SOUTH TEXAS AREA 78.49462 21.50538
CHICAGO AREA 75.32567 24.67433
DALLAS AREA 75.37367 24.62633
DC/MARYLAND/VIRGINIA AREA 77.17452 22.82548
GREAT LAKES AREA 77.53968 22.46032
HOUSTON AREA 77.34513 22.65487
LOS ANGELES AREA 77.08571 22.91429
MIDWEST AREA 78.63152 21.36848
NEW ENGLAND AREA 75.89928 24.10072
NEW YORK CITY AREA 74.79542 25.20458
NORTH FLORIDA AREA 72.64957 27.35043
NORTHWEST/ROCKY MOUNTAIN AREA 71.70732 28.29268
OHIO AREA 78.52713 21.47287
PHILADELPHIA AREA 72.80000 27.20000
SOUTH FLORIDA AREA 73.13609 26.86391
SOUTHWEST AREA 75.44947 24.55053
TENNESSEE AREA 79.28669 20.71331

table(mydata2$churn, mydata2$area)

    ATLANTIC SOUTH AREA CALIFORNIA NORTH AREA CENTRAL/SOUTH TEXAS AREA CHICAGO AREA
  0                1237                  1125                      949          983
  1                 377                   390                      260          322

    DALLAS AREA DC/MARYLAND/VIRGINIA AREA GREAT LAKES AREA HOUSTON AREA
  0        1059                      1393              977          874
  1         346                       412              283          256

    LOS ANGELES AREA MIDWEST AREA NEW ENGLAND AREA NEW YORK CITY AREA
  0             1349         1402             1055               2285
  1              401          381              335                770

    NORTH FLORIDA AREA NORTHWEST/ROCKY MOUNTAIN AREA OHIO AREA PHILADELPHIA AREA
  0                850                           735      1013               455
  1                320                           290       277               170

    SOUTH FLORIDA AREA SOUTHWEST AREA TENNESSEE AREA
  0                618           1217            578
  1                227            396            151

6. Handset: refurbished or new

prop.table(table(mydata2$refurb_new,mydata2$churn),1)*100

0 1
N 76.52976 23.47024
R 72.69494 27.30506

table(mydata2$churn, mydata2$refurb_new)
N R
0 17497 2657
1 5366 998

7. Handset web capability

prop.table(table(mydata2$hnd_webcap,mydata2$churn),1)*100

0 1
UNKW 91.176471 8.823529
WC 68.529826 31.470174
WCMB 77.388248 22.611752
table(mydata2$churn, mydata2$hnd_webcap)

UNKW WC WCMB
0 62 2918 17174
1 6 1340 5018

8. Marital status

prop.table(table(mydata2$marital,mydata2$churn),1)*100
0 1
A 75.20000 24.80000
B 76.63303 23.36697
M 77.09044 22.90956
S 77.17325 22.82675
U 74.47304 25.52696

table(mydata2$churn, mydata2$marital)

A B M S U
0 1034 1443 6555 3773 7349
1 341 440 1948 1116 2519

9. Ethnicity roll-up code

prop.table(table(mydata2$ethnic,mydata2$churn),1)*100
0 1
B 70.317003 29.682997
C 91.780822 8.219178
D 70.000000 30.000000
F 75.130435 24.869565
G 77.451593 22.548407
H 75.322490 24.677510
I 73.278520 26.721480
J 74.827109 25.172891
M 80.555556 19.444444
N 76.555337 23.444663
O 70.216963 29.783037
P 81.818182 18.181818
R 72.908367 27.091633
S 75.452716 24.547284
U 75.403226 24.596774
X 83.870968 16.129032
Z 83.605341 16.394659

table(mydata2$churn, mydata2$ethnic)
B C D F G H I J M N O P R S U X
0 244 67 154 432 1240 2686 713 541 29 7014 712 117 183 2625 2244 26
1 103 6 66 143 361 880 260 182 7 2148 302 26 68 854 732 5
Z
0 1127
1 221

10. Foreign travel dummy variable

prop.table(table(mydata2$forgntvl,mydata2$churn),1)*100
0 1
0 75.97200 24.02800
1 76.48221 23.51779
table(mydata2$churn, mydata2$forgntvl)

0 1
0 18993 1161
1 6007 357

11. Dwelling unit type

prop.table(table(mydata2$dwlltype,mydata2$churn),1)*100
0 1
M 74.79606 25.20394
S 76.96271 23.03729
table(mydata2$churn, mydata2$dwlltype)
M S
0 8802 11352
1 2966 3398

12. Dwelling size

prop.table(table(mydata2$dwllsize,mydata2$churn),1)*100

0 1
A 76.66584 23.33416
B 75.05963 24.94037
C 79.92599 20.07401
D 71.33479 28.66521
E 85.22427 14.77573
F 72.54902 27.45098
G 76.26582 23.73418
H 78.38828 21.61172
I 78.14208 21.85792
J 71.07642 28.92358
K 76.39257 23.60743
L 76.76768 23.23232
M 76.84729 23.15271
N 71.05263 28.94737
O 70.46070 29.53930

table(mydata2$churn, mydata2$dwllsize)

A B C D E F G H I J K L M
0 12380 2832 864 326 323 185 241 214 143 865 288 304 156
1 3768 941 217 131 56 70 75 59 40 352 89 92 47

N O
0 513 520
1 209 218

13. Motorcycle indicator

prop.table(table(mydata2$mtrcycle,mydata2$churn),1)*100

0 1
0 76.0263 23.9737
1 74.1573 25.8427

table(mydata2$churn, mydata2$mtrcycle)

0 1
0 19890 264
1 6272 92

14. Truck indicator

prop.table(table(mydata2$truck,mydata2$churn),1)*100

0 1
0 75.92395 24.07605
1 76.33313 23.66687
table(mydata2$churn, mydata2$truck)
0 1
0 16332 3822
1 5179 1185

15. New or used car buyer

prop.table(table(mydata2$car_buy,mydata2$churn),1)*100
0 1
New 76.66960 23.33040
UNKNOWN 75.50105 24.49895

table(mydata2$churn, mydata2$car_buy)

New UNKNOWN
0 8702 11452
1 2648 3716

Observation

• Most of the variables (continuous/categorical) follow a normal distribution (based on the
histogram and density plots).
• "mou_Mean" seems to be highly correlated with "comp_vce_Mean" and "plcd_vce_Mean".
• The highly correlated variables are: "comp_vce_Mean", "plcd_vce_Mean", "avg3mou",
"avgmou", "avg3qty", "avgqty", "avg6mou", "avg6qty".
• Churn does not seem to be highly correlated with any of the variables. Churn has its
maximum correlation with "opk_dat_Mean" and "mou_pead_Mean".
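Highly correlated pairs like those listed above can be pulled out of a correlation matrix programmatically rather than read off a plot. A minimal sketch, assuming Pearson correlation and an arbitrary cutoff of 0.9; the helper `high_cor_pairs`, the cutoff and the toy data are hypothetical and not taken from the report:

```r
# Hypothetical helper: list variable pairs whose absolute Pearson
# correlation exceeds a cutoff (0.9 is an arbitrary choice here).
high_cor_pairs <- function(df, cutoff = 0.9) {
  cm <- cor(df, use = "pairwise.complete.obs")
  # Keep the upper triangle only, so each pair is reported once
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data for illustration only: b is almost a multiple of a, c is noise
set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a, b = 2 * a + rnorm(200, sd = 0.1), c = rnorm(200))
res <- high_cor_pairs(toy)
print(res)
```

On the report's data this would presumably be called on the numeric columns only, for example `high_cor_pairs(mydata2[sapply(mydata2, is.numeric)])`.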
