Task 2 Exploratory Data Analysis
Background information
The BCG project team sees potential in building a churn model to test whether price
sensitivity is the largest driver of churn. The client has sent over some data, and
the AD wants you to perform some exploratory data analysis.
Task
Sub-Task 1:
Perform some exploratory data analysis. Look into the data types, data
statistics, specific parameters, and variable distributions. This first subtask is
for you to gain a holistic understanding of the dataset. You should spend
around 1 hour on this.
Sub-Task 2:
Verify the hypothesis that price sensitivity is, to some extent, correlated with
churn. It is up to you to define price sensitivity and calculate it. You should
spend around 30 minutes on this.
Sub-Task 3:
Prepare a half-page summary or slide of key findings and add some suggestions
for data augmentation – which other sources of data should the client provide
you with and which open source datasets might be useful? You should spend
10-15 minutes on this.
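For Sub-Task 2, the brief leaves the definition of price sensitivity open. As a minimal sketch of one possible proxy, the example below measures how much the price each customer faced varied over the observation period, then correlates that with the churn flag. The frames `prices` and `churn`, their column names, and all values are hypothetical and do not come from the client data.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative assumptions only.
prices = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "price": [0.10, 0.12, 0.15, 0.20, 0.20, 0.20, 0.11, 0.11, 0.13],
})
churn = pd.DataFrame({"id": ["a", "b", "c"], "churn": [1, 0, 0]})

# Assumed proxy: the spread of prices each customer was exposed to.
exposure = (
    prices.groupby("id")
    .agg(
        price_range=("price", lambda s: s.max() - s.min()),
        price_std=("price", "std"),
    )
    .reset_index()
)

# Join the proxy to the churn flag and compute a Pearson correlation.
merged = exposure.merge(churn, on="id")
corr = merged["price_range"].corr(merged["churn"])
print(merged)
print("correlation between price_range and churn:", corr)
```

A correlation on three made-up customers says nothing about the real data; the point is only the shape of the calculation. Other definitions, such as the change between the first and last observed price, would slot into the same `agg` call.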
Sub-Task 1:
# load packages
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 50)
Four features hold dates (date_activ, date_end, date_modif_prod, date_renewal), so it is better to convert them to the datetime data type.
# convert datetime feature to datetime data type
for f in ['date_activ','date_end','date_modif_prod','date_renewal']:
    client[f] = pd.to_datetime(client[f])
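It can be worth checking that the conversion behaved as expected. The sketch below runs the same kind of loop on a small hypothetical frame, `sample`, with made-up values, and assumes the dates are stored as YYYY-MM-DD strings. Passing errors="coerce" is an optional choice that turns unparseable entries into NaT rather than raising, so they show up in a missing-value count.

```python
import pandas as pd

# Hypothetical sample standing in for the client data; values are made up.
sample = pd.DataFrame({
    "date_activ": ["2013-06-15", "2009-08-21"],
    "date_end": ["2016-06-15", "not a date"],
})

for col in ["date_activ", "date_end"]:
    # errors="coerce" converts entries that do not match the format to NaT.
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d", errors="coerce")

# Confirm the new dtypes and count values that failed to parse.
print(sample.dtypes)
print(sample.isna().sum())
```

If the missing-value count rises after conversion, the affected rows are worth inspecting before any date arithmetic is done on them.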