
Task 2: Exploratory Data Analysis

Background information
The BCG project team thinks that building a churn model to understand whether price
sensitivity is the largest driver of churn has potential. The client has sent over some data and
the AD wants you to perform some exploratory data analysis.

The data that was sent over includes:

- Historical customer data: customer data such as usage, sign-up date, forecasted usage, etc.
- Historical pricing data: variable and fixed pricing data, etc.
- Churn indicator: whether each customer has churned or not

Task
Sub-Task 1:
Perform some exploratory data analysis. Look into the data types, data
statistics, specific parameters, and variable distributions. This first subtask is
for you to gain a holistic understanding of the dataset. You should spend
around 1 hour on this.

Sub-Task 2:
Verify the hypothesis that price sensitivity is, to some extent, correlated with
churn. It is up to you to define price sensitivity and calculate it. You should
spend around 30 minutes on this.
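
One possible starting point, sketched below, is to treat the forecast price columns in the client dataset (loaded in the Sub-Task 1 walkthrough that follows) as a rough proxy for the prices each customer faces and check how they correlate with the churn flag. This is only one of several reasonable ways to define price sensitivity.

# rough proxy for price sensitivity: correlation between forecast price features and churn
# (assumes the `client` DataFrame loaded in Sub-Task 1 below)
price_cols = ['forecast_price_energy_off_peak',
              'forecast_price_energy_peak',
              'forecast_price_pow_off_peak']
print(client[price_cols + ['churn']].corr()['churn'].drop('churn'))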

Sub-Task 3:
Prepare a half-page summary or slide of key findings and add some suggestions
for data augmentation – which other sources of data should the client provide
you with and which open source datasets might be useful? You should spend
10-15 minutes on this.
Sub-Task 1:

# load packages
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 50)

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import LabelEncoder

Client dataset feature descriptions: (feature description table not reproduced here)


# load client dataset
client = pd.read_csv('../input/bcginternshipprogram/client_data.csv')
client.head(3)

OUTPUT: (first three rows of the client dataset; table not reproduced here)

Have a look at the general information on the client dataset


client.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14606 entries, 0 to 14605
Data columns (total 26 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   id                              14606 non-null  object
 1   channel_sales                   14606 non-null  object
 2   cons_12m                        14606 non-null  int64
 3   cons_gas_12m                    14606 non-null  int64
 4   cons_last_month                 14606 non-null  int64
 5   date_activ                      14606 non-null  object
 6   date_end                        14606 non-null  object
 7   date_modif_prod                 14606 non-null  object
 8   date_renewal                    14606 non-null  object
 9   forecast_cons_12m               14606 non-null  float64
 10  forecast_cons_year              14606 non-null  int64
 11  forecast_discount_energy        14606 non-null  float64
 12  forecast_meter_rent_12m         14606 non-null  float64
 13  forecast_price_energy_off_peak  14606 non-null  float64
 14  forecast_price_energy_peak      14606 non-null  float64
 15  forecast_price_pow_off_peak     14606 non-null  float64
 16  has_gas                         14606 non-null  object
 17  imp_cons                        14606 non-null  float64
 18  margin_gross_pow_ele            14606 non-null  float64
 19  margin_net_pow_ele              14606 non-null  float64
 20  nb_prod_act                     14606 non-null  int64
 21  net_margin                      14606 non-null  float64
 22  num_years_antig                 14606 non-null  int64
 23  origin_up                       14606 non-null  object
 24  pow_max                         14606 non-null  float64
 25  churn                           14606 non-null  int64
dtypes: float64(11), int64(7), object(8)
memory usage: 2.9+ MB
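
Eight columns are of object dtype. Four of them are dates (handled next); the remaining ones (id, channel_sales, has_gas, origin_up) appear to be identifiers or categorical codes. The LabelEncoder imported above could be used to turn the categorical ones into integers before modelling; a minimal sketch, shown here for illustration only and not applied to the DataFrame at this stage:

# illustration only: integer-encode one categorical column with LabelEncoder
le = LabelEncoder()
channel_sales_encoded = le.fit_transform(client['channel_sales'])
print(dict(zip(le.classes_, range(len(le.classes_)))))  # category -> integer code mapping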

There are four date-related features stored as the object data type; it is better to convert them to the datetime data type.
# convert the date features to the datetime data type
for f in ['date_activ', 'date_end', 'date_modif_prod', 'date_renewal']:
    client[f] = pd.to_datetime(client[f])

Add some new features derived from the datetime features above.

# extract the contract start and end years as new features
client['contract_start_year'] = client['date_activ'].dt.year
client['contract_end_year'] = client['date_end'].dt.year
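
Other features could be derived from the same dates. As a hypothetical example (computed here for illustration only, not added as a column), the length of each contract in years:

# hypothetical illustration: contract length in years, not added to the DataFrame
contract_length_years = (client['date_end'] - client['date_activ']).dt.days / 365.25
print(contract_length_years.describe())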

Next, check for missing values and duplicate records.

# define a function to display missing values and duplicate rows
def duplicate_and_missing(dataset, dataset_name):
    print('There are', dataset.shape[0], 'rows and', dataset.shape[1],
          'columns in the dataset', '"' + dataset_name + '"', '\n' + '--'*40)
    # display missing values
    if dataset.isna().sum().sum() != 0:  # if there are missing values
        missing_value = dataset.isna().sum()[dataset.isna().sum() != 0].to_frame(name='count')
        missing_value['proportion'] = missing_value['count'] / len(dataset)
        print('There are', dataset.isna().sum().sum(), 'missing values')
        print(missing_value, '\n' + '--'*40)
    else:
        print('There is no missing value')
    # display duplicate rows
    if dataset.duplicated().sum() != 0:
        print('There are', dataset.duplicated().sum(), 'duplicate rows\n')
    else:
        print('There is no duplicate row\n')

There are no missing values or duplicate rows.


duplicate_and_missing(dataset=client, dataset_name='Client')
There are 14606 rows and 28 columns in the dataset "Client"
--------------------------------------------------------------------------------
There is no missing value
There is no duplicate row
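
The brief also asks for data statistics and variable distributions. A minimal sketch of how that part of Sub-Task 1 could continue, using the libraries imported above (the plotted columns are illustrative choices; sns.histplot assumes seaborn 0.11 or newer):

# summary statistics for the numeric features
print(client.describe())

# churn rate: proportion of customers who have churned
print(client['churn'].value_counts(normalize=True))

# illustrative distribution plots (column choices are examples only)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(client['cons_12m'], bins=50, ax=axes[0])
axes[0].set_title('cons_12m distribution')
sns.countplot(x='churn', data=client, ax=axes[1])
axes[1].set_title('Churn indicator')
plt.tight_layout()
plt.show()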
