SourceCode&Writeup_Comcast telecom

The document assesses consumer complaints against Comcast Telecom, analyzing a dataset containing 2224 complaints across various attributes. It outlines steps for data preparation, including formatting date and time, checking for duplicates and missing values, and creating categorical variables for complaint status. The analysis aims to identify trends in complaint types, state-wise complaint status, and resolution rates, using visualizations for better insights.

Uploaded by

niksagar6242


Comcast Telecom Consumer Complaints-Assessment file:///C:/Users/SUJITS~1/AppData/Local/Temp/Comcast Telecom Con...

Comcast Telecom Consumer Complaints

In [70]: #Importing libraries

import pandas as pd
import numpy as np

#from sorted_months_weekdays import *

#from sort_dataframeby_monthorweek import *

from matplotlib import pyplot as plt

%matplotlib inline
import seaborn as sns

#import plotly.express as px
#import plotly.graph_objects as go

In [71]: # reading data to data frame

df = pd.read_csv('Comcast_telecom_complaints_data.csv')

Basic data check

In [72]: df.shape

Out[72]: (2224, 11)

In [73]: df.dtypes

Out[73]: Ticket # object

Customer Complaint object
Date object
Date_month_year object
Time object
Received Via object
City object
State object
Zip code int64
Status object
Filing on Behalf of Someone object
dtype: object

1 of 28 18-06-2021, 21:28

In [74]: df.head()
#df.tail()

Out[74]:
   Ticket #  Customer Complaint                                 Date      Date_month_year  Time         Received Via        City      State     Zip code
0  250635    Comcast Cable Internet Speeds                      22-04-15  22-Apr-15        3:53:50 PM   Customer Care Call  Abingdon  Maryland  21009
1  223441    Payment disappear - service got disconnected       04-08-15  04-Aug-15        10:22:56 AM  Internet            Acworth   Georgia   30102
2  242732    Speed and Service                                  18-04-15  18-Apr-15        9:55:47 AM   Internet            Acworth   Georgia   30101
3  277946    Comcast Imposed a New Usage Cap of 300GB that ...  05-07-15  05-Jul-15        11:59:35 AM  Internet            Acworth   Georgia   30101
4  307175    Comcast not working and no service to boot         26-05-15  26-May-15        1:25:26 PM   Internet            Acworth   Georgia   30101

In [75]: #Checking duplicate rows based on all columns

dup_df = df[df.duplicated()]
dup_df

Out[75]:
Ticket #  Customer Complaint  Date  Date_month_year  Time  Received Via  City  State  Zip code  Status  Filing on Behalf of Someone

In [ ]: # based on the results, we see that there are no duplicate rows across all the columns


In [76]: #checking missing values

df.isnull().sum()

Out[76]: Ticket # 0
Customer Complaint 0
Date 0
Date_month_year 0
Time 0
Received Via 0
City 0
State 0
Zip code 0
Status 0
Filing on Behalf of Someone 0
dtype: int64

In [ ]: # we see that there are no missing values in the data set.

In [77]: df.describe(include='all')

Out[77]:
        Ticket #  Customer Complaint  Date      Date_month_year  Time        Received Via        City     State    Zip code
count   2224      2224                2224      2224             2224        2224                2224     2224     2224.000000
unique  2224      1841                91        91               2190        2                   928      43
top     317911    Comcast             24-06-15  24-Jun-15        3:25:33 PM  Customer Care Call  Atlanta  Georgia
freq    1         83                  218       218              2           1119                63       288
mean    NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      47994.393435
std     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      28885.279427
min     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      1075.000000
25%     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      30056.500000
50%     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      37211.000000
75%     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      77058.750000
max     NaN       NaN                 NaN       NaN              NaN         NaN                 NaN      NaN      99223.000000

Observations 1:
We have 2224 rows and 11 columns.
Date, Date_month_year and Time are of type object, so we need to convert them to a proper datetime type for further analysis.
There are no duplicate records based on all columns.
There are no missing values in the given dataset.
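
The basic checks above can be bundled into one small helper. This is only a sketch: the three-row sample and the function name `basic_checks` are hypothetical and stand in for the real CSV data.

```python
import pandas as pd

# Hypothetical three-row sample; the real data comes from the CSV file.
sample = pd.DataFrame({
    "Ticket #": ["250635", "223441", "223441"],
    "Status": ["Closed", "Open", "Open"],
    "Date": ["22-04-15", "04-08-15", "04-08-15"],
})


def basic_checks(frame):
    """Return the shape, the duplicate-row count and the missing-value count."""
    return {
        "shape": frame.shape,
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_values": int(frame.isnull().sum().sum()),
    }


report = basic_checks(sample)
print(report)
```

On the sample, the third row repeats the second one, so one duplicate is reported. On the real dataset the same helper should report zero duplicates and zero missing values, matching the cells above.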


To analyze:
Provide the trend chart for the number of complaints at monthly and daily granularity levels.
Provide a table with the frequency of complaint types.
Which complaint types are maximum, i.e., around internet, network issues, or across any other domains.
Create a new categorical variable with value as Open and Closed. Open & Pending is to be categorized as Open and Closed & Solved is to be categorized as Closed.
Provide state wise status of complaints in a stacked bar chart. Use the categorized variable from Q3.
Provide insights on:
Which state has the maximum complaints
Which state has the highest percentage of unresolved complaints
Provide the percentage of complaints resolved till date, which were received through the Internet and customer care calls.

Data Preparation & Approach

Looking at the questions above, we first need to prepare the given data.

To check the trend in the number of complaints, we need to convert the given date and time data, which is currently stored as string/object, into a proper datetime format so that we can apply datetime functions to summarise the data and create plots correctly.

To find the complaint types, we need to create a custom Complaint Types category based on the customer complaints.
Since no complaint type or category is provided, we can use pandas methods to find the keyword frequency (for example, internet, billing, etc.) in the Customer Complaint column to get a general sense of what types of issues are being reported.
Using this keyword frequency, we can create a custom category of customer complaints.

Using this newly created Complaint Types category, we can create a frequency table of complaint types.

We need to create a new Status Category column in the data frame from the Status column (Open, Closed, Pending and Solved), such that Open & Pending are categorized as Open and Closed & Solved are categorized as Closed.

Any additional helper columns can be added to the dataframe to make the analysis easier.
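
As a minimal sketch of the date and time preparation described above, the two rows below reuse the Date and Time strings shown in the first rows of the dataset. The frame name `raw` is hypothetical; the format string assumes day-month-year dates and a 12-hour clock.

```python
import pandas as pd

# Two Date/Time pairs copied from the first rows of the dataset.
raw = pd.DataFrame({
    "Date": ["22-04-15", "04-08-15"],
    "Time": ["3:53:50 PM", "10:22:56 AM"],
})

# Join the two string columns and parse them with an explicit format:
# day-month-two-digit-year, then a 12-hour time with an AM/PM marker.
raw["datetime"] = pd.to_datetime(
    raw["Date"] + " " + raw["Time"],
    format="%d-%m-%y %I:%M:%S %p",
)

# Once the column is a real datetime, the .dt accessor gives derived fields.
raw["Month"] = raw["datetime"].dt.strftime("%b")
raw["hour"] = raw["datetime"].dt.hour
print(raw)
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates such as 04-08-15.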

Formatting date and time

In [78]: df_comcast=df.copy()


In [79]: df_comcast.head()

Out[79]:
   Ticket #  Customer Complaint                                 Date      Date_month_year  Time         Received Via        City      State     Zip code
0  250635    Comcast Cable Internet Speeds                      22-04-15  22-Apr-15        3:53:50 PM   Customer Care Call  Abingdon  Maryland  21009
1  223441    Payment disappear - service got disconnected       04-08-15  04-Aug-15        10:22:56 AM  Internet            Acworth   Georgia   30102
2  242732    Speed and Service                                  18-04-15  18-Apr-15        9:55:47 AM   Internet            Acworth   Georgia   30101
3  277946    Comcast Imposed a New Usage Cap of 300GB that ...  05-07-15  05-Jul-15        11:59:35 AM  Internet            Acworth   Georgia   30101
4  307175    Comcast not working and no service to boot         26-05-15  26-May-15        1:25:26 PM   Internet            Acworth   Georgia   30101


In [80]: # sorting all data by the Date_month_year column in ascending order
# note: Date_month_year is still a string here, so this sort is lexicographic, not chronological
df_comcast.sort_values(by='Date_month_year', ascending=True, inplace=True)
df_comcast.head()

Out[80]:
      Ticket #  Customer Complaint                                 Date      Date_month_year  Time        Received Via        City        State
1416  218108    Comcast Business Phone/Internet Contract Disag...  04-04-15  04-Apr-15        6:39:55 PM  Internet            Newnan      Georgia
1483  217985    bait and switch services for monetary gain         04-04-15  04-Apr-15        4:07:36 PM  Internet            Orcutt      California
584   217999    Misleading information given                       04-04-15  04-Apr-15        4:21:46 PM  Internet            Des Moines  Washington
561   218043    comcast services                                   04-04-15  04-Apr-15        5:32:05 PM  Internet            Denver      Colorado
1892  218168    Multiple Unauthorized and Unwarranted Credit C...  04-04-15  04-Apr-15        8:10:35 PM  Customer Care Call  Shoreview   Minnesota


In [81]: # combining Date and Time and creating a new column

df_comcast['datetime']= df_comcast['Date']+' '+df_comcast['Time']
df_comcast.head()

Out[81]:
      Ticket #  Customer Complaint                                 Date      Date_month_year  Time        Received Via        City        State
1416  218108    Comcast Business Phone/Internet Contract Disag...  04-04-15  04-Apr-15        6:39:55 PM  Internet            Newnan      Georgia
1483  217985    bait and switch services for monetary gain         04-04-15  04-Apr-15        4:07:36 PM  Internet            Orcutt      California
584   217999    Misleading information given                       04-04-15  04-Apr-15        4:21:46 PM  Internet            Des Moines  Washington
561   218043    comcast services                                   04-04-15  04-Apr-15        5:32:05 PM  Internet            Denver      Colorado
1892  218168    Multiple Unauthorized and Unwarranted Credit C...  04-04-15  04-Apr-15        8:10:35 PM  Customer Care Call  Shoreview   Minnesota

In [82]: df_comcast[['datetime']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1416 to 464
Data columns (total 1 columns):
datetime 2224 non-null object
dtypes: object(1)
memory usage: 34.8+ KB

In [83]: # converting datetime, which is in string/object format, to datetime format
df_comcast['datetime'] = pd.to_datetime(df_comcast['datetime'], format='%d-%m-%y %I:%M:%S %p')
df_comcast[['datetime']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1416 to 464
Data columns (total 1 columns):
datetime 2224 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 34.8 KB


In [84]: # checking that the datetime column is formatted correctly by applying datetime methods
df_comcast.loc[1416,'datetime'].day_name()

Out[84]: 'Saturday'

In [18]: df_comcast.shape

Out[18]: (2224, 12)

In [85]: df_comcast.head(2)

Out[85]:
      Ticket #  Customer Complaint                                 Date      Date_month_year  Time        Received Via  City    State
1416  218108    Comcast Business Phone/Internet Contract Disag...  04-04-15  04-Apr-15        6:39:55 PM  Internet      Newnan  Georgia
1483  217985    bait and switch services for monetary gain         04-04-15  04-Apr-15        4:07:36 PM  Internet      Orcutt  California

In [ ]: # sorting the dataframe by date in ascending order

In [86]: df_comcast.sort_values('datetime',inplace=True)
df_comcast.head()

Out[86]:
      Ticket #  Customer Complaint          Date      Date_month_year  Time         Received Via        City              State
1852  211255    Comcast harassment          04-01-15  04-Jan-15        12:18:47 AM  Customer Care Call  Schaumburg        Illinois
1160  211472    comcast cable               04-01-15  04-Jan-15        10:43:20 AM  Customer Care Call  Lockport          Illinois
1430  211478    Comcast                     04-01-15  04-Jan-15        10:47:35 AM  Internet            North Huntingdon  Pennsylvania
2144  211677    Comcast refusal of service  04-01-15  04-Jan-15        12:01:06 PM  Customer Care Call  Wayne             Pennsylvania
1237  211775    Horrible Service            04-01-15  04-Jan-15        12:28:58 PM  Customer Care Call  Mckeesport        Pennsylvania

In [63]: # now that we have created a column with the correct datetime format, we can create additional columns for Year, Month and date
# using the pandas datetime methods


In [25]: #df_comcast.drop(['date'],axis=1,inplace=True)
#df_comcast.head(2)

In [87]: #df_comcast['Year'] = pd.DatetimeIndex(df_comcast['datetime']).year

#df_comcast['Month'] = pd.DatetimeIndex(df_comcast['datetime']).month
#df_comcast['date'] = pd.DatetimeIndex(df_comcast['datetime']).date

df_comcast['Year'] = df_comcast['datetime'].dt.strftime('%Y')
df_comcast['Month'] = df_comcast['datetime'].dt.strftime('%b')
df_comcast['date']=df_comcast['datetime'].dt.strftime('%d-%b-%Y')

# convert this date to correct date format

df_comcast['date'] = pd.to_datetime(df_comcast['date'])

In [27]: df_comcast.head(2)

Out[27]:
      Ticket #  Customer Complaint  Date      Date_month_year  Time         Received Via        City        State
1852  211255    Comcast harassment  04-01-15  04-Jan-15        12:18:47 AM  Customer Care Call  Schaumburg  Illinois
1160  211472    comcast cable       04-01-15  04-Jan-15        10:43:20 AM  Customer Care Call  Lockport    Illinois

In [88]: df_comcast[['date']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2224 entries, 1852 to 441
Data columns (total 1 columns):
date 2224 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 34.8 KB

In [ ]: # adding a helper column with a count of 1 per ticket for aggregation

In [89]: df_comcast['Ticket_count']=1
df_comcast.head(2)

Out[89]:
      Ticket #  Customer Complaint  Date      Date_month_year  Time         Received Via        City        State
1852  211255    Comcast harassment  04-01-15  04-Jan-15        12:18:47 AM  Customer Care Call  Schaumburg  Illinois
1160  211472    comcast cable       04-01-15  04-Jan-15        10:43:20 AM  Customer Care Call  Lockport    Illinois
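
The helper column works because summing a column of ones is the same as counting rows. As a hedged alternative sketch, `groupby(...).size()` gives the same counts without adding a column. The frame `tickets` and the third ticket number are hypothetical.

```python
import pandas as pd

# Small sample: two tickets on 4 Jan 2015 and one (hypothetical) on 5 Jan 2015.
tickets = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-04", "2015-01-04", "2015-01-05"]),
    "Ticket #": ["211255", "211472", "000001"],
})

# Approach used in this notebook: add a column of ones and sum it per group.
with_helper = tickets.assign(Ticket_count=1).groupby("date")["Ticket_count"].sum()

# Alternative: count the rows in each group directly.
with_size = tickets.groupby("date").size()

print(with_helper.tolist(), with_size.tolist())
```

Both approaches return the same numbers, so the choice is mostly a matter of readability.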


Q1.Provide the trend chart for the number of complaints at monthly and daily
granularity levels.

In [61]: # now that we have created a column with the correct datetime format, we can use it to plot the trend charts

In [90]: monthly_tickets_df = df_comcast.groupby(['Month'])[['Ticket_count']].sum().reset_index()
monthly_tickets_df.head()
monthly_tickets_df.tail()

# sorting the monthly_tickets_df in calendar months order

from sorted_months_weekdays import *
from sort_dataframeby_monthorweek import *
monthly_tickets_df = Sort_Dataframeby_Month(df=monthly_tickets_df, monthcolumnname='Month')
monthly_tickets_df.head()

Out[90]:
Month Ticket_count

0 Jan 55

1 Feb 59

2 Mar 45

3 Apr 375

4 May 317
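
If the month-sorting helper packages are not available, calendar order can also be obtained with an ordered categorical. This is a sketch of that alternative, not the method used in this notebook. The frame name `monthly` is hypothetical; the four counts are the Jan to Apr values from the table above.

```python
import calendar

import pandas as pd

# Counts for Jan-Apr, listed in the alphabetical order that groupby returns.
monthly = pd.DataFrame({
    "Month": ["Apr", "Feb", "Jan", "Mar"],
    "Ticket_count": [375, 59, 55, 45],
})

# calendar.month_abbr starts with an empty string, so skip the first entry.
month_order = list(calendar.month_abbr)[1:]  # 'Jan', 'Feb', ..., 'Dec'

# An ordered categorical makes sort_values follow calendar order.
monthly["Month"] = pd.Categorical(monthly["Month"], categories=month_order, ordered=True)
monthly = monthly.sort_values("Month").reset_index(drop=True)
print(monthly)
```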

In [91]: import matplotlib.pyplot as plt

%matplotlib inline


In [92]: plt.figure(figsize=(10,5))
plt.plot(monthly_tickets_df['Month'], monthly_tickets_df['Ticket_count'])
plt.xticks(rotation=90)
plt.margins(0.02)

plt.title('Customer Complaints by Month')

plt.xlabel('Months')
plt.ylabel('Complaints')
plt.show()

In [93]: daily_tickets_df = df_comcast.groupby(['date'])[['Ticket_count']].sum().reset_index()
daily_tickets_df.head()
daily_tickets_df.tail()

Out[93]:
date Ticket_count

86 2015-11-05 12

87 2015-11-06 21

88 2015-12-04 15

89 2015-12-05 7

90 2015-12-06 43

In [94]: from datetime import datetime, timedelta

from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
# on Matplotlib 3.6 and later this style is named 'seaborn-v0_8'
plt.style.use('seaborn')


In [95]: plt.figure(figsize=(20,10))
plt.plot_date(daily_tickets_df['date'], daily_tickets_df['Ticket_count'], linestyle='solid')
plt.gcf().autofmt_xdate()
plt.gca().xaxis.set_major_formatter(mpl_dates.DateFormatter('%d-%b-%Y'))
plt.gca().xaxis.set_major_locator(mpl_dates.DayLocator(interval=7))
plt.xticks(rotation=90)
plt.margins(0.02)

plt.title('Customer Complaints by day')

plt.xlabel('Days')
plt.ylabel('Complaints')

plt.show()

In [213]: # using plotly charts to get the full view of the complaints by date

In [96]: import plotly.express as px

import plotly.graph_objects as go


In [97]: fig = px.line(daily_tickets_df, x="date", y="Ticket_count", title='Customer complaints by day')
fig.update_xaxes(ticklabelmode="period", tickangle = -45, tick0 = 0.5, dtick = 0, rangeslider_visible=True)
fig.show()

Observations 2:
* Looking at the monthly chart and the daily trend of customer complaint tickets, we see a steep rise in the number of tickets in April, May and June
* The highest number of complaint tickets was handled in the month of June
* On June 24 2015, we handled 218 complaint tickets, the highest of any day in the year
* We need to do further analysis to check why there is a sudden rise in tickets, especially in June
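
The busiest day can also be read off programmatically rather than from the chart. The sketch below assumes a frame shaped like daily_tickets_df. The five December and November rows are taken from the tail shown above, and the 24 June row uses the count of 218 reported in the observations. The names `daily` and `peak` are hypothetical.

```python
import pandas as pd

# A few rows shaped like daily_tickets_df (values taken from this notebook).
daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-24", "2015-11-05", "2015-11-06",
        "2015-12-04", "2015-12-05", "2015-12-06",
    ]),
    "Ticket_count": [218, 12, 21, 15, 7, 43],
})

# idxmax returns the index label of the largest value; loc fetches that row.
peak = daily.loc[daily["Ticket_count"].idxmax()]
print(peak["date"].date(), peak["Ticket_count"])
```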


In [226]: # Lets check number of tickets by status

In [98]: df_comcast['Status'].value_counts()

Out[98]: Solved 973

Closed 734
Open 363
Pending 154
Name: Status, dtype: int64

In [99]: monthly_tickets_status_df = df_comcast.groupby(['Month','Status'])[['Ticket_count']].sum().reset_index()
monthly_tickets_status_df.head()
monthly_tickets_status_df.tail()

# sorting the monthly_tickets_df in calendar months order

from sorted_months_weekdays import *
from sort_dataframeby_monthorweek import *
monthly_tickets_status_df = Sort_Dataframeby_Month(df=monthly_tickets_status_df, monthcolumnname='Month')
monthly_tickets_status_df.head()

Out[99]:
Month Status Ticket_count

0 Jan Closed 39

1 Jan Open 5

2 Jan Solved 11

3 Feb Closed 46

4 Feb Open 4


In [408]: # one frame per status so that each status gets its own line in the plot
df1 = monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Closed']
df2 = monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Open']
df3 = monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Solved']
df4 = monthly_tickets_status_df[monthly_tickets_status_df['Status']=='Pending']

plt.figure(figsize=(15,5))
plt.plot(df1['Month'], df1['Ticket_count'],label='Closed')
plt.plot(df2['Month'], df2['Ticket_count'],label='Open')
plt.plot(df3['Month'], df3['Ticket_count'],label='Solved')
plt.plot(df4['Month'], df4['Ticket_count'],label='Pending')
plt.legend()

plt.title('Customer complaints by Month & Status')

plt.xlabel('Months')
plt.ylabel('Complaints')

plt.show()

In [ ]:

In [410]: #fig = px.line(monthly_tickets_status_df, x="Month", y="Ticket_count", color='Status',
#              title='Customer complaints by Month & Status')
#fig.update_xaxes(tickangle = -45)
#fig.show()

Observations 3:
* In June we received the highest number of tickets, and most of those tickets were solved
* This partly explains the spike of tickets being handled in June
* It will be interesting to see how the trend of tickets by status changed month on month


In [233]: # Creating a table with the monthly_tickets_status_df


In [191]: # grouping the dataframe by Month and Status
df_status = df_comcast.groupby(['Month','Status'])['Ticket_count'].sum().unstack().fillna(0)
df_status.reset_index(inplace=True)

# sorting the Month column in correct month order
from sorted_months_weekdays import *
from sort_dataframeby_monthorweek import *
df_status = Sort_Dataframeby_Month(df=df_status, monthcolumnname='Month')

# calculation to see total open tickets % and closed tickets % by month
df_status['Closed+Solved'] = df_status['Closed'] + df_status['Solved']
df_status['Open+pending'] = df_status['Open'] + df_status['Pending']
df_status['Total'] = df_status['Closed+Solved'] + df_status['Open+pending']
df_status['Open_pct'] = df_status['Open+pending']/df_status['Total']
df_status['Closed_pct'] = df_status['Closed+Solved']/df_status['Total']

#def highlightvalue(s):
#    if s.Open_pct > 0.10:
#        return ['background-color: yellow']*10
#    else:
#        return ['background-color: white']*10

#df_status=df_status.style.apply(highlightvalue, axis=1)

#df_status[['Open%']].style.applymap(lambda x: 'background-color : yellow' if x>0.10 else '')
# formatting the Open_pct and Closed_pct columns to display the values in %
#df_status['Open_pct'] = df_status[['Open_pct']].applymap(lambda x: "{0:.2f}%".format(x*100))
#df_status['Closed_pct'] = df_status[['Closed_pct']].applymap(lambda x: "{0:.2f}%".format(x*100))

def color_negative_red(value):
    """
    Colors a cell of the dataframe red if its
    value is above 0.10 (10%) and black otherwise.
    """
    if value > 0.10:
        color = 'red'
    else:
        color = 'black'

    return 'color: %s' % color

df_status = df_status.style.applymap(color_negative_red, subset=['Open_pct']).format({'Open_pct': "{:.2%}", 'Closed_pct': "{:.2%}"})

df_status


Out[191]:
    Month  Closed  Open  Pending  Solved  Closed+Solved  Open+pending  Total  Open_pct
0   Jan    39      5     0        11      50             5             55     9.09%
1   Feb    46      4     0        9       55             4             59     6.78%
2   Mar    24      7     0        14      38             7             45     15.56%
3   Apr    348     10    0        17      365            10            375    2.67%
4   May    47      63    26       181     228            89            317    28.08%
5   Jun    74      249   127      596     670            376           1046   35.95%
6   Jul    31      6     0        12      43             6             49     12.24%
7   Aug    40      4     0        23      63             4             67     5.97%
8   Sep    28      5     0        22      50             5             55     9.09%
9   Oct    25      3     0        25      50             3             53     5.66%
10  Nov    10      2     0        26      36             2             38     5.26%
11  Dec    22      5     1        37      59             6             65     9.23%

Observations 4:
It appears that there has been a sudden jump in customer complaints in May and Jun.
Mar, May, Jun and Jul have the highest share of open tickets (Open_pct above 10%) compared to any other months.
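
The same month-by-status table can be sketched with pd.crosstab, which counts rows directly. The six-row frame `toy` below is hypothetical and only illustrates the calculation of the open share per month.

```python
import pandas as pd

# Hypothetical tickets: four in Jan (one still open), two in Feb (one pending).
toy = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb"],
    "Status": ["Closed", "Solved", "Open", "Closed", "Pending", "Solved"],
})

# Rows are months, columns are statuses, cells are ticket counts.
counts = pd.crosstab(toy["Month"], toy["Status"])

# Open and Pending both count as unresolved; reindex guards against a
# status that does not occur at all in the data.
unresolved = counts.reindex(columns=["Open", "Pending"], fill_value=0).sum(axis=1)
open_pct = unresolved / counts.sum(axis=1)
print(open_pct)
```

In the sample, Jan has 1 unresolved ticket out of 4 and Feb has 1 out of 2.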

Q2 - Provide a table with the frequency of complaint types.

Which complaint types are maximum i.e., around internet, network issues, or across any other domains.


In [202]: df_comcast_Complaints = df_comcast.groupby('Customer Complaint')[['Ticket_count']].sum().sort_values(by='Ticket_count', ascending=False)
df_comcast_Complaints.head(10)

Out[202]:
Ticket_count

Customer Complaint

Comcast 83

Comcast Internet 18

Comcast Data Cap 17

comcast 13

Comcast Billing 11

Comcast Data Caps 11

Data Caps 11

Unfair Billing Practices 9

Comcast data cap 8

Comcast data caps 8

Creating a frequency table directly from the Customer Complaint column does not give us much information.

Firstly, the customer complaints provided are the full complaint descriptions and not a category of complaints, so they do not show which types of complaints are more common than others.
Secondly, we do not have predefined Complaint Types to group this list of customer complaints.

To find the complaint types, we need to create a custom Complaint Types category based on the customer complaints.

Since no complaint type or category is provided, we can use pandas methods to find the keyword frequency (for example, internet, billing, etc.) in the Customer Complaint column to get a general sense of what types of issues are being reported. Using this keyword frequency, we can create a custom category of customer complaints.
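
A minimal sketch of the keyword-frequency idea, using only the standard library. The three complaint texts are copied from the sample rows of the dataset; the stop-word set is a lower-cased assumption, and lower-casing the text merges variants such as "Service" and "service".

```python
from collections import Counter

# Three complaint texts copied from the sample rows of the dataset.
complaints = [
    "Comcast Cable Internet Speeds",
    "Speed and Service",
    "Comcast not working and no service to boot",
]

# Assumed stop words: common filler words plus the company name.
stopwords = {"comcast", "and", "to", "for", "my", "the", "of", "not", "with"}

# Lower-case each complaint, split it into words and count what remains.
word_counts = Counter(
    word
    for text in complaints
    for word in text.lower().split()
    if word not in stopwords
)
print(word_counts.most_common(3))
```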

In [204]: # finding the frequency of the words that appear in the Customer Complaint column and visualizing it with a word cloud

In [210]: from wordcloud import WordCloud, STOPWORDS


In [212]: # converting the Customer Complaint column to a list
complaints_list = df_comcast['Customer Complaint'].tolist()

# capitalizing every word in the list
new_complaints_list = [complaint.title() for complaint in complaints_list]

# creating the final list of words by removing unwanted words
# (each complaint is split into words so that word frequencies can be counted;
# keeping whole complaints in a set would give every entry a frequency of 1)
stopwords = {'Comcast', 'And', 'To', 'For', 'My', 'The', 'Of', 'Not', 'With'}
final_Complaint_list = [word
                        for complaint in new_complaints_list
                        for word in complaint.split()
                        if word not in stopwords]

In [213]: from collections import Counter

word_could_dict=Counter(final_Complaint_list)
wordcloud = WordCloud(width = 1000, height = 500).generate_from_frequencies(word_could_dict)

plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

In [214]: # from the word cloud we can see that Internet and Billing seem to be the more prominent types of issues
# we can explore more to see if there are any other groups or types


In [217]: complaints = df_comcast['Customer Complaint'].str.split(expand=True).stack().value_counts()
complaints.head(50)

Out[217]: Comcast 1031

Internet 317
and 269
service 251
internet 182
to 175
for 173
Billing 156
Service 153
Data 151
of 111
billing 111
comcast 90
my 75
not 75
with 72
Cap 63
speed 63
data 63
on 56
- 53
Unfair 48
me 47
in 47
Speed 47
customer 45
a 45
Caps 44
is 44
Xfinity 43
Practices 43
issues 43
bill 43
Issues 40
by 40
the 39
speeds 39
Complaint 39
I 38
Customer 38
no 37
from 37
cap 37
charges 36
services 35
COMCAST 35
Throttling 33
& 32
caps 31
Slow 30
dtype: int64


Looking at the word cloud and the details of the complaints, we can create a Complaint Type category with the following groups:

• Internet # issues related to internet connectivity, speed, network
• Billing # issues related to billing, charges, etc.
• Data # issues related to data, data cap, etc.
• Service # customer service issues
• Others # others

This is just a best guess; the Complaint Type category is mapped based on the frequency of words found in the Customer Complaint details.
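
The mapping from keywords to categories can be sketched with np.select, which assigns the label of the first matching mask. The reduced keyword list and the first sample text are hypothetical; they only illustrate that the order of the masks decides the category when a complaint matches more than one keyword.

```python
import numpy as np
import pandas as pd

# Sample complaint texts (the first one is hypothetical and matches two keywords).
text = pd.Series([
    "Comcast Internet billing",
    "Data Caps",
    "Horrible Service",
    "Comcast harassment",
]).str.lower()

# Reduced keyword list for illustration, in order of precedence.
keyword_to_type = [
    ("internet", "Internet"),
    ("bil", "Billing"),
    ("data", "Data"),
    ("service", "Service"),
]

masks = [text.str.contains(keyword, regex=False) for keyword, _ in keyword_to_type]
labels = [label for _, label in keyword_to_type]

# np.select picks the label of the first mask that is True for each row.
types = np.select(masks, labels, default="Others")
print(list(types))
```

The first text mentions both internet and billing but is labelled Internet, because the Internet mask comes first. Reordering the list would change the resulting frequency table.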

In [218]: # creating an additional column in df_comcast with Complaint_Types

In [238]: # np.select uses the first mask that matches, so the order below sets the precedence
complaint_text = df_comcast['Customer Complaint'].str.lower()
masks = [complaint_text.str.contains('internet'),
         complaint_text.str.contains('speed'),
         complaint_text.str.contains('connection'),
         complaint_text.str.contains('billing'),
         complaint_text.str.contains('bil'),
         complaint_text.str.contains('fee'),
         complaint_text.str.contains('charge'),
         complaint_text.str.contains('pric'),
         complaint_text.str.contains('data'),
         complaint_text.str.contains('cap'),
         complaint_text.str.contains('service')]

values = [
    'Internet',
    'Internet',
    'Internet',
    'Billing',
    'Billing',
    'Billing',
    'Billing',
    'Billing',
    'Data',
    'Data',
    'Service']

import numpy as np

df_comcast['Complaint_Types'] = np.select(masks, values, default='Others')


In [239]: df_comcast.head()

Out[239]:
      Ticket #  Customer Complaint          Date      Date_month_year  Time         Received Via        City              State
1852  211255    Comcast harassment          04-01-15  04-Jan-15        12:18:47 AM  Customer Care Call  Schaumburg        Illinois
1160  211472    comcast cable               04-01-15  04-Jan-15        10:43:20 AM  Customer Care Call  Lockport          Illinois
1430  211478    Comcast                     04-01-15  04-Jan-15        10:47:35 AM  Internet            North Huntingdon  Pennsylvania
2144  211677    Comcast refusal of service  04-01-15  04-Jan-15        12:01:06 PM  Customer Care Call  Wayne             Pennsylvania
1237  211775    Horrible Service            04-01-15  04-Jan-15        12:28:58 PM  Customer Care Call  Mckeesport        Pennsylvania

In [240]: # creating a table with the frequency of the newly created complaint types
df_comcast.groupby(['Complaint_Types'])[['Ticket_count']].sum().sort_values(by='Ticket_count', ascending=False)

Out[240]:
Ticket_count

Complaint_Types

Internet 636

Others 612

Billing 542

Service 239

Data 195

As per the data, Internet, Others and Billing are the three categories with the most recorded complaints.
We can look deeper into these complaints to understand more about the issues.

Q3. Create a new categorical variable with value as Open and Closed. Open & Pending is to be categorized as Open and Closed & Solved is to be categorized as Closed.


In [243]: conditions = [df_comcast['Status']=='Closed', df_comcast['Status']=='Solved',
              df_comcast['Status']=='Open', df_comcast['Status']=='Pending']

values = ['Closed','Closed','Open','Open']

# an explicit string default keeps the result a string column if an unexpected status appears
df_comcast['Ticket_Status'] = np.select(conditions, values, default='Unknown')
df_comcast.head()

Out[243]:
      Ticket #  Customer Complaint          Date      Date_month_year  Time         Received Via        City              State
1852  211255    Comcast harassment          04-01-15  04-Jan-15        12:18:47 AM  Customer Care Call  Schaumburg        Illinois
1160  211472    comcast cable               04-01-15  04-Jan-15        10:43:20 AM  Customer Care Call  Lockport          Illinois
1430  211478    Comcast                     04-01-15  04-Jan-15        10:47:35 AM  Internet            North Huntingdon  Pennsylvania
2144  211677    Comcast refusal of service  04-01-15  04-Jan-15        12:01:06 PM  Customer Care Call  Wayne             Pennsylvania
1237  211775    Horrible Service            04-01-15  04-Jan-15        12:28:58 PM  Customer Care Call  Mckeesport        Pennsylvania
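
Since the four statuses map one-to-one onto two groups, the same recoding can also be sketched with a plain dictionary. The names `status_map`, `status` and `status_counts` are hypothetical; the four counts are the Status value counts shown earlier in this notebook.

```python
import pandas as pd

# Dictionary from the original status to the two-level category.
status_map = {"Solved": "Closed", "Closed": "Closed", "Open": "Open", "Pending": "Open"}

# Recoding a column: every status is looked up in the dictionary.
status = pd.Series(["Solved", "Closed", "Open", "Pending"])
ticket_status = status.map(status_map)

# Cross-check on the Status value counts reported earlier in this notebook.
status_counts = pd.Series({"Solved": 973, "Closed": 734, "Open": 363, "Pending": 154})
grouped = status_counts.groupby(status_map).sum()
print(grouped)
```

The grouped totals should match the Closed and Open totals that appear later in the resolved-tickets table.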

Q4. Provide state wise status of complaints in a stacked bar chart. Use the categorized variable from Q3. Provide insights on:
Which state has the maximum complaints
Which state has the highest percentage of unresolved complaints

In [244]: df_state = df_comcast.groupby(['State','Ticket_Status'])[['Ticket #']].count().reset_index()
df_state.head()

Out[244]:
State Ticket_Status Ticket #

0 Alabama Closed 17

1 Alabama Open 9

2 Arizona Closed 14

3 Arizona Open 6

4 Arkansas Closed 6


In [250]: #plt.figure(figsize=(15,5))
#plt.bar(x=df_state[df_state['Ticket_Status']=='Closed']['State'], height=df_state[df_state['Ticket_Status']=='Closed']['Ticket #'], label='Closed')
#plt.bar(x=df_state[df_state['Ticket_Status']=='Open']['State'], height=df_state[df_state['Ticket_Status']=='Open']['Ticket #'], label='Open')
#plt.title('State wise status of complaints')
#plt.xlabel('State')
#plt.ylabel('# of Customer complaints')
#plt.xticks(rotation=90)
#plt.legend()
#plt.show()

In [251]: fig = px.bar(df_state, x="State", y='Ticket #', color="Ticket_Status", title="state wise status of complaints")
fig.show()


Which state has the maximum complaints

From the above bar chart it is clear that Georgia has the maximum number of complaints.

In [270]: #State Group

df_state_grp = df_comcast.groupby(['State'])[['Ticket_count']].sum()
df_state_grp.loc[df_state_grp['Ticket_count'] == df_state_grp['Ticket_count'].max()]

Out[270]:
Ticket_count

State

Georgia 288

In [ ]: df_state_grp.loc[df_state_grp['Ticket_count'] == df_state_grp['Ticket_count'].max()]

In [271]: ax = df_state_grp.plot(kind='bar', figsize=(15,6), rot=90, color='b', width=0.5)
plt.title('Customer Complaints by State')
plt.ylabel('Customer Complaints')
plt.show()

Which state has the highest percentage of unresolved complaints


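In [285]: df_Open = df_comcast.groupby(['State','Ticket_Status'])['Ticket_count'].sum().unstack().fillna(0)
# Open% here is each state's share of all open tickets across states,
# not the share of that state's own tickets that are still open
df_Open['Open%'] = df_Open['Open']/df_Open['Open'].sum()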
df_Open.sort_values(by='Open%',ascending=False).head(3)

Out[285]:
Ticket_Status  Closed  Open     Open%
State
Georgia         208.0  80.0  0.154739
California      159.0  61.0  0.117988
Tennessee        96.0  47.0  0.090909

In [378]: df_Open.loc[df_Open['Open%']== df_Open['Open%'].max()].style.format({'Open%': "{:.2%}"})

Out[378]:
Ticket_Status  Closed  Open   Open%
State
Georgia           208    80  15.47%

In [284]: df_Open.plot.bar(y='Open%',figsize=(15,6), color='red')

Out[284]: <matplotlib.axes._subplots.AxesSubplot at 0x12c321a3518>
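
The `Open%` column above measures each state's share of all open tickets. The question can also be read as asking for the open rate within each state, that is, open tickets divided by that state's own total. The sketch below contrasts the two readings on the three rows shown in Out[285] only; the column names `share_of_open` and `open_rate` are hypothetical, and the ranking over the full dataset may differ from what these three rows show.

```python
import pandas as pd

# Three rows copied from Out[285]; the full table has one row per state.
df_open = pd.DataFrame(
    {"Closed": [208.0, 159.0, 96.0], "Open": [80.0, 61.0, 47.0]},
    index=pd.Index(["Georgia", "California", "Tennessee"], name="State"),
)

# Reading 1: state's share of open tickets (here relative to these three rows only)
df_open["share_of_open"] = df_open["Open"] / df_open["Open"].sum()

# Reading 2: open rate within each state (Open / (Closed + Open))
df_open["open_rate"] = df_open["Open"] / (df_open["Closed"] + df_open["Open"])

print(df_open.sort_values("open_rate", ascending=False))
```

The two readings can rank states differently: a state with many complaints can hold the largest share of open tickets while a smaller state has a higher proportion of its own tickets still open.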

Provide the percentage of complaints resolved till date, which were received through the Internet and customer care calls.


In [365]: df_resolved =df_comcast.groupby(['Ticket_Status','Received Via'])['Ticket_count'].sum().unstack().fillna(0)
df_resolved['Total']=df_resolved['Customer Care Call'] +df_resolved['Internet']
#df_resolved['Closed%'] = df_resolved['Closed']/df_resolved['Total']
df_resolved
#df_resolved.style.format({'Closed%': "{:.2%}"})

Out[365]:
Received Via   Customer Care Call  Internet  Total
Ticket_Status
Closed                        864       843   1707
Open                          255       262    517

In [366]: df_resolved.loc['Grand Total'] = df_resolved.sum()

df_resolved

Out[366]:
Received Via   Customer Care Call  Internet  Total
Ticket_Status
Closed                        864       843   1707
Open                          255       262    517
Grand Total                  1119      1105   2224

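In [370]: # Divide every cell by the grand total (last row of 'Total'); .iloc avoids positional [] on a labelled Series
df_resolved_pct = df_resolved.div(df_resolved['Total'].iloc[-1]).mul(100).round(2).astype(str) + '%'
df_resolved_pct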

Out[370]:
Received Via   Customer Care Call  Internet   Total
Ticket_Status
Closed                     38.85%     37.9%  76.75%
Open                       11.47%    11.78%  23.25%
Grand Total                50.31%    49.69%  100.0%

In [371]: # Approximately 77% of all tickets have been closed to date:
# about 39% of all tickets are closed Customer Care Call tickets and about 38% are closed Internet tickets.
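
The percentages above are all relative to the grand total of 2224 tickets. A complementary view, sketched below, divides each channel's counts by that channel's own total, which gives the resolution rate per channel. The counts are copied from Out[365]; the name `per_channel_pct` is hypothetical.

```python
import pandas as pd

# Counts copied from Out[365]
df_resolved = pd.DataFrame(
    {"Customer Care Call": [864, 255], "Internet": [843, 262]},
    index=pd.Index(["Closed", "Open"], name="Ticket_Status"),
)

# Divide each column by its own column total, so every channel sums to 100%
per_channel_pct = (
    df_resolved.div(df_resolved.sum(axis=0), axis=1).mul(100).round(2)
)
print(per_channel_pct)
```

On these counts, roughly 77% of Customer Care Call tickets and roughly 76% of Internet tickets are closed, so the two channels resolve at nearly the same rate.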
