SourceCode&Writeup_Comcast telecom
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mpl_dates
#import plotly.express as px
#import plotly.graph_objects as go
In [73]: df.dtypes
1 of 28 18-06-2021, 21:28
Comcast Telecom Consumer Complaints-Assessment file:///C:/Users/SUJITS~1/AppData/Local/Temp/Comcast Telecom Con...
In [74]: df.head()
#df.tail()
Out[74]:
  | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State | Zip code
0 | 250635 | Comcast Cable Internet Speeds | 22-04-15 | 22-Apr-15 | 3:53:50 PM | Customer Care Call | Abingdon | Maryland | 21009
1 | 223441 | Payment disappear - service got disconnected | 04-08-15 | 04-Aug-15 | 10:22:56 AM | Internet | Acworth | Georgia | 30102
3 | 277946 | Comcast Imposed a New Usage Cap of 300GB that ... | 05-07-15 | 05-Jul-15 | 11:59:35 AM | Internet | Acworth | Georgia | 30101
4 | 307175 | Comcast not working and no service to boot | 26-05-15 | 26-May-15 | 1:25:26 PM | Internet | Acworth | Georgia | 30101
Out[75]:
  | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State | Zip code | Status | Filing on Behalf of Someone
In [ ]: # based on the results, we see that there are no duplicate row entries based on all the columns
Out[76]: Ticket # 0
Customer Complaint 0
Date 0
Date_month_year 0
Time 0
Received Via 0
City 0
State 0
Zip code 0
Status 0
Filing on Behalf of Someone 0
dtype: int64
In [77]: df.describe(include='all')
Out[77]:
      | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State | Zip code
count | 2224 | 2224 | 2224 | 2224 | 2224 | 2224 | 2224 | 2224 | 2224.000000
top   | 317911 | Comcast | 24-06-15 | 24-Jun-15 | 3:25:33 PM | Customer Care Call | Atlanta | Georgia |
mean  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 47994.393435
std   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 28885.279427
min   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1075.000000
25%   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 30056.500000
50%   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 37211.000000
75%   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 77058.750000
max   | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 99223.000000
Observations 1:
We have 2224 rows and 11 columns
We see that Date, Date_month_year and Time are of type object, so we may need to convert them to the correct data type for further analysis
There are no duplicate records based on all columns
There are no missing values in the given dataset
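A minimal sketch of the checks behind these observations, assuming the data is already loaded into a DataFrame. Here a small hypothetical frame built from three rows of Out[74] stands in for the real file (whose name is not shown in this export): `duplicated()` flags rows that are identical across all columns, and `isnull().sum()` counts missing values per column.

```python
import pandas as pd

# Hypothetical stand-in for the complaints data, using rows from Out[74]
df = pd.DataFrame({
    'Ticket #': ['250635', '223441', '307175'],
    'Customer Complaint': ['Comcast Cable Internet Speeds',
                           'Payment disappear - service got disconnected',
                           'Comcast not working and no service to boot'],
    'Zip code': [21009, 30102, 30101],
})

n_rows, n_cols = df.shape                    # number of rows and columns
n_duplicates = int(df.duplicated().sum())    # rows identical across all columns
missing_per_col = df.isnull().sum()          # NaN count for each column

print(n_rows, n_cols, n_duplicates)
print(missing_per_col)
```

On the full dataset the same three lines would report 2224 rows, 11 columns, 0 duplicates and 0 missing values per column, matching the outputs above.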
To analyze:
Provide the trend chart for the number of complaints at monthly and daily granularity levels.
Provide a table with the frequency of complaint types.
Which complaint types are most frequent, e.g., internet, network issues, or any other domain.
Create a new categorical variable with value as Open and Closed. Open & Pending is to be categorized
as Open and Closed & Solved is to be categorized as Closed.
Provide state wise status of complaints in a stacked bar chart. Use the categorized variable from Q3.
Provide insights on:
Which state has the maximum complaints
Which state has the highest percentage of unresolved complaints
Provide the percentage of complaints resolved till date, which were received through the Internet and
customer care calls.
To check the trend of the number of complaints, we need to convert the given date and time data, which is currently stored as string/object, into the correct datetime format so that we can apply datetime functions to the data to summarise it or create plots correctly.
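A minimal sketch of that conversion, assuming `Date` is day-first (dd-mm-yy, as the matching `Date_month_year` values such as 22-Apr-15 and 04-Aug-15 indicate) and `Time` uses a 12-hour clock with AM/PM. The two sample rows come from Out[74]. Passing an explicit `format` stops pandas from guessing month-first for ambiguous dates like 04-08-15.

```python
import pandas as pd

# Two sample rows copied from Out[74]
df = pd.DataFrame({
    'Date': ['22-04-15', '04-08-15'],
    'Time': ['3:53:50 PM', '10:22:56 AM'],
})

# Combine the two string columns and parse them with an explicit format,
# assuming every row follows the dd-mm-yy + 12-hour clock pattern shown above
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                format='%d-%m-%y %I:%M:%S %p')

print(df['datetime'].dtype)
print(df['datetime'].dt.day_name())
```

Once the column has a datetime dtype, the `.dt` accessor (for example `dt.day_name()`, `dt.month`, `dt.strftime`) becomes available, which is what the later cells rely on.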
To find out the complaint types, we need to create a custom Complaint Types category based on the customer complaints.
Since no complaint type or category is provided, we can use pandas methods to find keyword frequencies (for example, internet, billing, etc.) in the Customer Complaint data to get a general sense of what types of issues/complaints are being reported.
Using these keyword frequencies, we can create a custom category of customer complaints.
Using this newly created Complaint Types category, we can create a frequency table of complaint types.
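One way to get those keyword frequencies, sketched on a few complaint texts taken from the outputs in this notebook: lower-case each complaint, extract alphabetic words with `str.findall`, flatten the lists with `explode`, drop a small stop-word set and count with `value_counts`. The stop-word set is a hypothetical placeholder, not a list taken from the notebook.

```python
import pandas as pd

complaints = pd.Series([
    'Comcast Cable Internet Speeds',
    'Payment disappear - service got disconnected',
    'Comcast not working and no service to boot',
    'Comcast Billing',
    'Comcast Internet',
])

stop_words = {'and', 'to', 'no', 'not', 'got', 'the', 'of', 'a'}  # hypothetical list

word_counts = (complaints.str.lower()
               .str.findall(r'[a-z]+')             # list of words per complaint
               .explode()                          # one row per word
               .loc[lambda s: ~s.isin(stop_words)] # drop filler words
               .value_counts())                    # frequency of each word
print(word_counts)
```

Frequent words such as "internet", "billing" or "service" in the real output are the candidates for the custom Complaint Types categories.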
We need to create a new Status Category column in the dataframe using the Status column (Open, Closed, Pending and Solved), such that Open & Pending are categorized as Open and Closed & Solved are categorized as Closed.
Any additional helper columns can be added to the dataframe to make the analysis easier.
In [78]: df_comcast=df.copy()
In [79]: df_comcast.head()
Out[79]:
  | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State | Zip code
0 | 250635 | Comcast Cable Internet Speeds | 22-04-15 | 22-Apr-15 | 3:53:50 PM | Customer Care Call | Abingdon | Maryland | 21009
1 | 223441 | Payment disappear - service got disconnected | 04-08-15 | 04-Aug-15 | 10:22:56 AM | Internet | Acworth | Georgia | 30102
3 | 277946 | Comcast Imposed a New Usage Cap of 300GB that ... | 05-07-15 | 05-Jul-15 | 11:59:35 AM | Internet | Acworth | Georgia | 30101
4 | 307175 | Comcast not working and no service to boot | 26-05-15 | 26-May-15 | 1:25:26 PM | Internet | Acworth | Georgia | 30101
Out[80]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1416 | 218108 | Comcast Business Phone/Internet Contract Disag... | 04-04-15 | 04-Apr-15 | 6:39:55 PM | Internet | Newnan | Georgia
1483 | 217985 | bait and switch services for monetary gain | 04-04-15 | 04-Apr-15 | 4:07:36 PM | Internet | Orcutt | California
584  | 217999 | Misleading information given | 04-04-15 | 04-Apr-15 | 4:21:46 PM | Internet | Des Moines | Washington
561  | 218043 | comcast services | 04-04-15 | 04-Apr-15 | 5:32:05 PM | Internet | Denver | Colorado
1892 | 218168 | Multiple Unauthorized and Unwarranted Credit C... | 04-04-15 | 04-Apr-15 | 8:10:35 PM | Customer Care Call | Shoreview | Minnesota
Out[81]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1416 | 218108 | Comcast Business Phone/Internet Contract Disag... | 04-04-15 | 04-Apr-15 | 6:39:55 PM | Internet | Newnan | Georgia
1483 | 217985 | bait and switch services for monetary gain | 04-04-15 | 04-Apr-15 | 4:07:36 PM | Internet | Orcutt | California
584  | 217999 | Misleading information given | 04-04-15 | 04-Apr-15 | 4:21:46 PM | Internet | Des Moines | Washington
561  | 218043 | comcast services | 04-04-15 | 04-Apr-15 | 5:32:05 PM | Internet | Denver | Colorado
1892 | 218168 | Multiple Unauthorized and Unwarranted Credit C... | 04-04-15 | 04-Apr-15 | 8:10:35 PM | Customer Care Call | Shoreview | Minnesota
In [82]: df_comcast[['datetime']].info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1416 to 464
Data columns (total 1 columns):
datetime 2224 non-null object
dtypes: object(1)
memory usage: 34.8+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1416 to 464
Data columns (total 1 columns):
datetime 2224 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 34.8 KB
In [84]: #checking that the datetime col is formatted correctly and applying datetime methods
df_comcast.loc[1416,'datetime'].day_name()
Out[84]: 'Saturday'
In [18]: df_comcast.shape
In [85]: df_comcast.head(2)
Out[85]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1416 | 218108 | Comcast Business Phone/Internet Contract Disag... | 04-04-15 | 04-Apr-15 | 6:39:55 PM | Internet | Newnan | Georgia
1483 | 217985 | bait and switch services for monetary gain | 04-04-15 | 04-Apr-15 | 4:07:36 PM | Internet | Orcutt | California
In [86]: df_comcast.sort_values('datetime',inplace=True)
df_comcast.head()
Out[86]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1430 | 211478 | Comcast | 04-01-15 | 04-Jan-15 | 10:47:35 AM | Internet | North Huntingdon | Pennsylvania
2144 | 211677 | Comcast refusal of service | 04-01-15 | 04-Jan-15 | 12:01:06 PM | Customer Care Call | Wayne | Pennsylvania
In [63]: # now that we have created a col with the correct datetime format, we can create additional columns for Year, Month and date
# using the pandas datetime methods
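In [25]: #df_comcast.drop(['date'],axis=1,inplace=True)
#df_comcast.head(2)
df_comcast['Year'] = df_comcast['datetime'].dt.strftime('%Y')
df_comcast['Month'] = df_comcast['datetime'].dt.strftime('%b')
# keep 'date' as datetime64 (midnight of each day) rather than a '%d-%b-%Y' string,
# so it sorts chronologically and matches the dtype reported by In [88]
df_comcast['date'] = df_comcast['datetime'].dt.normalize()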
In [27]: df_comcast.head(2)
Out[27]:
  | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
In [88]: df_comcast[['date']].info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1852 to 441
Data columns (total 1 columns):
date 2224 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 34.8 KB
In [89]: df_comcast['Ticket_count']=1
df_comcast.head(2)
Out[89]:
  | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
In [ ]:
Q1. Provide the trend chart for the number of complaints at monthly and daily granularity levels.
In [61]: # now that we have created a col with the correct datetime format, we can use the col to plot the trend charts
Out[90]:
Month Ticket_count
0 Jan 55
1 Feb 59
2 Mar 45
3 Apr 375
4 May 317
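The cell that built `monthly_tickets_df` is not shown in this export. A minimal sketch, assuming a `datetime` column and the `Ticket_count` helper column created above (the timestamps here are hypothetical): grouping by the monthly period rather than by the `Month` string keeps the months in calendar order instead of alphabetical order.

```python
import pandas as pd

# Hypothetical timestamps standing in for df_comcast['datetime']
df_comcast = pd.DataFrame({'datetime': pd.to_datetime([
    '2015-01-04 10:47:35', '2015-01-04 12:01:06',
    '2015-04-04 18:39:55', '2015-04-04 16:07:36', '2015-04-22 15:53:50',
    '2015-02-10 09:00:00',
])})
df_comcast['Ticket_count'] = 1

# Group by month period (sorted chronologically), then label with the month name
monthly = (df_comcast
           .groupby(df_comcast['datetime'].dt.to_period('M'))['Ticket_count']
           .sum())
monthly_tickets_df = pd.DataFrame({
    'Month': monthly.index.strftime('%b'),
    'Ticket_count': monthly.values,
})
print(monthly_tickets_df)
```

Because the data covers a single year, the month abbreviation alone is an unambiguous label; with several years, `'%b-%Y'` would be safer.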
In [92]: plt.figure(figsize=(10,5))
plt.plot(monthly_tickets_df['Month'],monthly_tickets_df['Ticket_count'])
plt.xticks(rotation=90)
plt.margins(0.02)
Out[93]:
date Ticket_count
86 2015-11-05 12
87 2015-11-06 21
88 2015-12-04 15
89 2015-12-05 7
90 2015-12-06 43
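Similarly, a sketch of the daily table, assuming the same `datetime` and `Ticket_count` columns with hypothetical timestamps. Unlike a plain `groupby` on the date, `resample('D')` also emits days with no tickets as 0, so gaps show up in the daily trend chart instead of being bridged by the line.

```python
import pandas as pd

# Hypothetical timestamps standing in for df_comcast['datetime']
df_comcast = pd.DataFrame({'datetime': pd.to_datetime([
    '2015-06-23 09:15:00', '2015-06-24 10:00:00', '2015-06-24 14:30:00',
    '2015-06-26 08:45:00',
])})
df_comcast['Ticket_count'] = 1

# One row per calendar day; days without tickets get a count of 0
daily_tickets_df = (df_comcast.set_index('datetime')['Ticket_count']
                    .resample('D').sum()
                    .rename_axis('date')
                    .reset_index())
print(daily_tickets_df)
```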
In [95]: plt.figure(figsize=(20,10))
# plt.plot_date is deprecated in recent Matplotlib; plt.plot accepts datetime x values directly
plt.plot(daily_tickets_df['date'], daily_tickets_df['Ticket_count'], linestyle='solid', marker='o')
plt.gcf().autofmt_xdate()
plt.gca().xaxis.set_major_formatter(mpl_dates.DateFormatter('%d-%b-%Y'))
plt.gca().xaxis.set_major_locator(mpl_dates.DayLocator(interval=7))
plt.xticks(rotation=90)
plt.margins(0.02)
plt.show()
In [213]: # using plotly charts to get the full view of the complaints by date
Observations 2:
* Looking at the monthly chart and the daily trend of customer complaint tickets, we see a steep rise in the number of tickets in April, May and June
* The highest number of complaint tickets was handled in June
* On June 24, 2015, we handled 218 complaint tickets, the highest of any day in the year
* We need further analysis to check why there is a sudden rise in tickets, especially in June
In [98]: df_comcast['Status'].value_counts()
Out[99]:
Month Status Ticket_count
0 Jan Closed 39
1 Jan Open 5
2 Jan Solved 11
3 Feb Closed 46
4 Feb Open 4
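A sketch of how a month/status table like Out[99] can be built, assuming `Month` and `Status` columns on `df_comcast` (the sample rows are hypothetical). The numeric `month_num` helper is an assumption added only to keep calendar order; it is dropped once the grouping is done.

```python
import pandas as pd

# Hypothetical tickets standing in for df_comcast
df_comcast = pd.DataFrame({
    'datetime': pd.to_datetime(['2015-01-05', '2015-01-09', '2015-01-20',
                                '2015-02-02', '2015-02-14']),
    'Status': ['Closed', 'Open', 'Closed', 'Solved', 'Closed'],
})
df_comcast['Ticket_count'] = 1
df_comcast['Month'] = df_comcast['datetime'].dt.strftime('%b')
df_comcast['month_num'] = df_comcast['datetime'].dt.month  # helper for calendar order

# Sum tickets per (month, status) pair, ordered by month number
monthly_tickets_status_df = (df_comcast
    .groupby(['month_num', 'Month', 'Status'], as_index=False)['Ticket_count'].sum()
    .drop(columns='month_num'))
print(monthly_tickets_status_df)
```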
In [408]: df1=monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Closed']
df2=monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Open']
df3=monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Solved']
df4=monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Pending']
plt.figure(figsize=(15,5))
plt.plot(df1['Month'], df1['Ticket_count'],label='Closed')
plt.plot(df2['Month'], df2['Ticket_count'],label='Open')
plt.plot(df3['Month'], df3['Ticket_count'],label='Solved')
plt.plot(df4['Month'], df4['Ticket_count'],label='Pending')
plt.legend()
plt.show()
In [ ]:
Observations 3:
* We can see that in June we received the largest number of tickets, and most of these tickets were solved
* This partly explains the spike in tickets handled in June
* It would be interesting to see how the trend of tickets handled by status changed month on month
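#def highlightvalue(s):
#    if s.Open_pct > 0.10:
#        return ['background-color: yellow']*10
#    else:
#        return ['background-color: white']*10
#df_status=df_status.style.apply(highlightvalue, axis=1)
def color_negative_red(value):
    """
    Colors elements in a dataframe
    green if positive and red if
    negative. Does not color NaN
    values.
    """
    # leave NaN and non-numeric cells (e.g. the Month names) unstyled
    if not isinstance(value, (int, float, np.number)) or pd.isna(value):
        return ''
    return 'color: red' if value < 0 else 'color: green'
df_status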
Out[191]:
Month Closed Open Pending Solved Closed+Solved Open+pending Total Open_pct
0 Jan 39 5 0 11 50 5 55 9.09%
1 Feb 46 4 0 9 55 4 59 6.78%
2 Mar 24 7 0 14 38 7 45 15.56%
6 Jul 31 6 0 12 43 6 49 12.24%
7 Aug 40 4 0 23 63 4 67 5.97%
8 Sep 28 5 0 22 50 5 55 9.09%
9 Oct 25 3 0 25 50 3 53 5.66%
10 Nov 10 2 0 26 36 2 38 5.26%
11 Dec 22 5 1 37 59 6 65 9.23%
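A sketch of one way to build this wide status table, using the Jan and Dec counts from Out[191] as input (the notebook's own code for `df_status` is not shown). `pivot_table` with `fill_value=0` puts each status in its own column, so months without Pending tickets get 0, and `Open_pct` is (Open + Pending) / Total; this reproduces the 9.09% and 9.23% shown above.

```python
import pandas as pd

# Monthly counts per status for Jan and Dec, taken from Out[191]
monthly_tickets_status_df = pd.DataFrame({
    'Month':  ['Jan', 'Jan', 'Jan', 'Dec', 'Dec', 'Dec', 'Dec'],
    'Status': ['Closed', 'Open', 'Solved', 'Closed', 'Open', 'Pending', 'Solved'],
    'Ticket_count': [39, 5, 11, 22, 5, 1, 37],
})

# One column per status, months kept in their original order
df_status = (monthly_tickets_status_df
             .pivot_table(index='Month', columns='Status', values='Ticket_count',
                          aggfunc='sum', fill_value=0, sort=False)
             .reset_index())
df_status.columns.name = None

df_status['Closed+Solved'] = df_status['Closed'] + df_status['Solved']
df_status['Open+pending'] = df_status['Open'] + df_status['Pending']
df_status['Total'] = df_status['Closed+Solved'] + df_status['Open+pending']
df_status['Open_pct'] = df_status['Open+pending'] / df_status['Total']

# Display the ratio as a percentage string without changing the numeric column
print(df_status.assign(Open_pct=df_status['Open_pct'].map('{:.2%}'.format)))
```

Keeping `Open_pct` numeric lets a condition such as `Open_pct > 0.10` (as in the commented-out `highlightvalue`) work directly on it.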
Observations 4:
It appears that there has been a sudden jump in customer complaints in April, May and June
In Mar, May and Jul, we seem to have the highest share of open tickets compared to the other months
Out[202]:
Ticket_count
Customer Complaint
Comcast 83
Comcast Internet 18
comcast 13
Comcast Billing 11
Data Caps 11
Creating a frequency table of complaints directly from the Customer Complaint column does not give us much information:
firstly, the Customer Complaint values are the full complaint text, not a category of complaints that would show which types of complaints are more common than others.
secondly, we do not have predefined Complaint Types to group these customer complaints into.
To find out the complaint types, we need to create a custom Complaint Types category based on the customer complaints.
Since no complaint type or category is provided, we can use pandas methods to find keyword frequencies (for example, internet, billing, etc.) in the Customer Complaint data to get a general sense of what types of issues/complaints are being reported; using these keyword frequencies, we can create a custom category of customer complaints.
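In [204]: # finding the frequency of the words that appear in the Customer Complaint column and visualising it using a wordcloud
from wordcloud import WordCloud, STOPWORDS

# assumed reconstruction of the step that builds `wordcloud`:
# join all complaint texts into one string; WordCloud counts word frequencies internally
complaint_text = ' '.join(df_comcast['Customer Complaint'].astype(str))
wordcloud = WordCloud(width=1600, height=800, background_color='white',
                      stopwords=STOPWORDS).generate(complaint_text)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()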
In [214]: # from the word cloud we can see that Internet and Billing seem to be the more prominent types of issues
# we can explore further to see if there are any other groups or types
In [ ]:
Looking at the word cloud and the details of the complaints, we can probably create a Complaint Type category with the following values:
• Internet # issues related to internet connectivity, speed, network
• Billing # issues related to billing, charges, etc.
• Data # issues related to data, data caps, etc.
• Service # customer service issues
• Others # others
This is just a best guess: the Complaint Type category is mapped based on the frequency of the words found in the Customer Complaint details.
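# one Complaint Type label per keyword condition; the matching `conditions`
# list is not shown in this export
values = [
    'Internet',
    'Internet',
    'Internet',
    'Billing',
    'Billing',
    'Billing',
    'Billing',
    'Billing',
    'Data',
    'Data',
    'Service'
]
import numpy as np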
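A sketch of the keyword-to-category mapping with `np.select`. The keyword patterns below are hypothetical, since the notebook's own `conditions` are not visible here, and the sample complaints come from earlier outputs. `np.select` takes the first condition that matches, so the order of the list sets priority, and `default='Others'` catches complaints that match no keyword.

```python
import numpy as np
import pandas as pd

df_comcast = pd.DataFrame({'Customer Complaint': [
    'Comcast Cable Internet Speeds',
    'Comcast Billing',
    'Comcast Imposed a New Usage Cap of 300GB',
    'Comcast refusal of service',
    'Comcast',
]})

text = df_comcast['Customer Complaint'].str.lower()

# Hypothetical keyword patterns (regular expressions), checked in this order
conditions = [
    text.str.contains('internet|speed|network'),
    text.str.contains('bill|charge'),
    text.str.contains('data|cap'),
    text.str.contains('service'),
]
values = ['Internet', 'Billing', 'Data', 'Service']

df_comcast['Complaint_Types'] = np.select(conditions, values, default='Others')
print(df_comcast)
```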
In [239]: df_comcast.head()
Out[239]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1430 | 211478 | Comcast | 04-01-15 | 04-Jan-15 | 10:47:35 AM | Internet | North Huntingdon | Pennsylvania
2144 | 211677 | Comcast refusal of service | 04-01-15 | 04-Jan-15 | 12:01:06 PM | Customer Care Call | Wayne | Pennsylvania
In [240]: #creating a table with the frequency of the newly created complaint Types
df_comcast.groupby(['Complaint_Types'])[['Ticket_count']].sum().sort_values(by='Ticket_count', ascending=False)
Out[240]:
Ticket_count
Complaint_Types
Internet 636
Others 612
Billing 542
Service 239
Data 195
As per the data, Internet, Others and Billing are the three areas where most of the complaints are recorded.
We can look deeper into these complaints to understand the issues better.
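As a quick check on the table above, the counts from Out[240] can be turned into percentage shares. The five counts sum to 2224, the same as the row count in Observations 1, so each ticket was assigned exactly one complaint type.

```python
import pandas as pd

# Counts copied from Out[240]
type_counts = pd.Series({'Internet': 636, 'Others': 612, 'Billing': 542,
                         'Service': 239, 'Data': 195}, name='Ticket_count')

total = int(type_counts.sum())
type_share = (type_counts / total * 100).round(1)  # percent of all tickets
print(total)
print(type_share)
```

On these counts Internet, Others and Billing together cover roughly 80% of all tickets.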
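# map the four raw statuses onto two categories: Closed & Solved -> Closed, Open & Pending -> Open
conditions = [
    df_comcast['Status'] == 'Closed',
    df_comcast['Status'] == 'Solved',
    df_comcast['Status'] == 'Open',
    df_comcast['Status'] == 'Pending',
]
values=['Closed','Closed','Open','Open']
# a string default keeps the result dtype consistent with the string choices
df_comcast['Ticket_Status'] = np.select(conditions,values,default='Unknown')
df_comcast.head()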
Out[243]:
     | Ticket # | Customer Complaint | Date | Date_month_year | Time | Received Via | City | State
1430 | 211478 | Comcast | 04-01-15 | 04-Jan-15 | 10:47:35 AM | Internet | North Huntingdon | Pennsylvania
2144 | 211677 | Comcast refusal of service | 04-01-15 | 04-Jan-15 | 12:01:06 PM | Customer Care Call | Wayne | Pennsylvania
Out[244]:
State Ticket_Status Ticket #
0 Alabama Closed 17
1 Alabama Open 9
2 Arizona Closed 14
3 Arizona Open 6
4 Arkansas Closed 6
In [250]: #plt.figure(figsize=(15,5))
#plt.bar(x=df_state[df_state['Ticket_Status']=='Closed']['State'],height=df_state[df_state['Ticket_Status']=='Closed']['Ticket #'],label ='Closed')
#plt.bar(x=df_state[df_state['Ticket_Status']=='Open']['State'],height=df_state[df_state['Ticket_Status']=='Open']['Ticket #'],label ='Open')
#plt.title('State wise status of complaints')
#plt.xlabel('State')
#plt.ylabel('# of Customer complaints')
#plt.xticks(rotation=90)
#plt.legend()
#plt.show()
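The commented-out `plt.bar` calls in In [250] draw the Open bars over the Closed bars at the same x positions rather than stacking them. A minimal stacked version, sketched on the first rows of Out[244]: pivot to one column per status, then let pandas stack the columns with `stacked=True`. The `Agg` backend line is only there so the sketch runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# State/status counts from the first rows of Out[244]
df_state = pd.DataFrame({
    'State': ['Alabama', 'Alabama', 'Arizona', 'Arizona'],
    'Ticket_Status': ['Closed', 'Open', 'Closed', 'Open'],
    'Ticket #': [17, 9, 14, 6],
})

# One row per state, one column per status; missing combinations become 0
state_status = df_state.pivot_table(index='State', columns='Ticket_Status',
                                    values='Ticket #', aggfunc='sum', fill_value=0)

ax = state_status.plot(kind='bar', stacked=True, figsize=(15, 5), rot=90)
ax.set_title('State wise status of complaints')
ax.set_xlabel('State')
ax.set_ylabel('# of Customer complaints')
plt.tight_layout()
```

The height of each full bar is the state's total number of complaints, so the same chart also answers which state has the most complaints.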
From the above bar chart, it is clear that Georgia has the maximum number of complaints.
Out[270]:
Ticket_count
State
Georgia 288
In [271]: ax=df_state_grp.plot(kind='bar',figsize=(15,6),rot=90,color='b',width=0.5)
plt.title('Customer Complaints by State')
plt.ylabel('Customer Complaints')
plt.show()
Out[285]:
Ticket_Status Closed Open Open%
State
Out[378]:
Ticket_Status Closed Open Open%
State
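A sketch of both state-level questions, assuming per-ticket rows with `State` and `Ticket_Status` columns; the state labels and counts here are hypothetical. `pd.crosstab` counts Open and Closed tickets per state, `Open%` is the unresolved share, and `idxmax` returns the state with the largest total and the one with the largest unresolved share.

```python
import pandas as pd

# Hypothetical per-ticket rows; the notebook would use df_comcast
tickets = pd.DataFrame({
    'State': ['State_A'] * 5 + ['State_B'] * 3,
    'Ticket_Status': ['Closed', 'Closed', 'Closed', 'Open', 'Closed',
                      'Closed', 'Open', 'Open'],
})

by_state = pd.crosstab(tickets['State'], tickets['Ticket_Status'])
by_state['Open%'] = by_state['Open'] / (by_state['Open'] + by_state['Closed']) * 100

most_complaints = (by_state['Open'] + by_state['Closed']).idxmax()
highest_unresolved = by_state['Open%'].idxmax()
print(by_state.sort_values('Open%', ascending=False))
print(most_complaints, highest_unresolved)
```

Note that the two answers can differ: a state with few complaints can still have the highest unresolved percentage.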
Out[365]:
Received Via Customer Care Call Internet Total
Ticket_Status
Out[366]:
Received Via Customer Care Call Internet Total
Ticket_Status
Out[370]:
Received Via Customer Care Call Internet Total
Ticket_Status
In [371]: # approximately 77% of all tickets have been closed to date:
# closed tickets received via Customer Care Call account for 39% of all tickets, and those received via Internet for 38%
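A sketch of how these shares can be computed in one step, on hypothetical tickets: `pd.crosstab` with `normalize='all'` divides every cell by the grand total, and `margins=True` adds row and column totals, named `Total` here to match the headers of Out[365]. The Closed row then gives the closed share per channel, and its Total cell gives the overall closed share.

```python
import pandas as pd

# Hypothetical tickets standing in for df_comcast
tickets = pd.DataFrame({
    'Ticket_Status': ['Closed', 'Closed', 'Open', 'Closed',
                      'Open', 'Closed', 'Closed', 'Closed'],
    'Received Via': ['Customer Care Call', 'Internet', 'Internet', 'Customer Care Call',
                     'Customer Care Call', 'Internet', 'Customer Care Call', 'Internet'],
})

# Every cell as a percentage of all tickets, with row/column totals
share = pd.crosstab(tickets['Ticket_Status'], tickets['Received Via'],
                    normalize='all', margins=True, margins_name='Total') * 100
print(share.round(1))
```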
In [ ]: