Simplilearn - Customer-Service-Requests-Analysis - Ipynb - Colaboratory
Problem Objective :
Perform a service request data analysis of New York City 311 calls. You will focus on the data
wrangling techniques to understand the pattern in the data and also visualize the major
complaint types.
Domain:
Customer Service
Tasks Performed:
Import the NYC 311 service request dataset.
Read or convert the columns ‘Created Date’ and ‘Closed Date’ to datetime datatype and
create a new column ‘Request_Closing_Time’ as the time elapsed between request
creation and request closing.
Provide major insights/patterns that you can offer in a visual format (graphs or tables); at
least 4 major conclusions that you can come up with after generic data mining.
Order the complaint types based on the average ‘Request_Closing_Time’, grouping them
for different locations.
Perform statistical tests.
Please note: For the statements below you need to state the null and alternative hypotheses and then provide
a statistical test to accept or reject the null hypothesis, along with the corresponding ‘p-value’.
Whether the average response time across complaint types is similar or not (overall)
Are the type of complaint or service requested and location related?
https://colab.research.google.com/drive/1stj2RbtiToEGH22EuWWOfcd_fFqNodrH#scrollTo=RxcXhZo3uas9&printMode=true 1/15
11/17/22, 4:47 AM customer-service-requests-analysis.ipynb - Colaboratory
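import numpy as np
import pandas as pd
import seaborn as sns  # used by the catplot cells below
import matplotlib.pyplot as plt  # used for axis limits and tick rotation below
import warnings
warnings.filterwarnings("ignore")
import statsmodels.api as sm
from statsmodels.formula.api import ols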
df=pd.read_csv("../input/311-service-requests-nyc/311_Service_Requests_from_2010_to_Presen
df.head()
Descriptive Analysis
df.describe()
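[df.describe() output: summary statistics for the numeric columns, including Unique Key, Incident Zip, X Coordinate (State Plane) and Y Coordinate (State Plane); the remaining columns are cut off in the source.]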
df.shape
(364558, 53)
We see a lot of missing values. The summary statistics above do not give very clear
insight into the data, so we move on to Exploratory Data Analysis.
Feature Creation
# Create a new column holding the time taken to resolve each complaint, in minutes
# Convert both date columns to datetime first (a no-op if they already are)
df["Created Date"]=pd.to_datetime(df["Created Date"])
df["Closed Date"]=pd.to_datetime(df["Closed Date"])
# Vectorised: timedelta -> seconds -> minutes (unclosed requests stay NaN)
df["Request_Closing_Time"]=(df["Closed Date"]-df["Created Date"]).dt.total_seconds()/60
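As a quick sanity check on the unit, the same computation can be run on two rows by hand. The sketch below is only illustrative: the frame `toy` is hypothetical, and its timestamps are copied from the first two rows shown later in the notebook's `new_df.head()` output.

```python
import pandas as pd

# Hypothetical two-row frame; timestamps copied from the notebook's head() output
toy = pd.DataFrame({
    "Created Date": ["2015-12-31 23:59:45", "2015-12-31 23:59:44"],
    "Closed Date": ["2016-01-01 00:55:15", "2016-01-01 01:26:57"],
})

# Parse the strings into datetime64 columns
for col in ["Created Date", "Closed Date"]:
    toy[col] = pd.to_datetime(toy[col])

# Elapsed time as a timedelta, converted to minutes
elapsed = toy["Closed Date"] - toy["Created Date"]
toy["Request_Closing_Time"] = elapsed.dt.total_seconds() / 60

print(toy["Request_Closing_Time"])
```

The results agree with the 55.5 and 87.216667 values that appear in the notebook's later output, which suggests that Request_Closing_Time is measured in minutes.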
df["Agency"].unique()
array(['NYPD'], dtype=object)
All of our data belongs to a single agency, NYPD, i.e. the New York City Police Department.
Percentage of Requests took less than 100 hour to get solved : 33.63 %
Percentage of Requests took less than 1000 hour to get solved : 97.44 %
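From the above we can see that the data is heavily skewed, with many outliers. Note that Request_Closing_Time
is computed in minutes (seconds divided by 60), so the thresholds above are 100 and 1000 minutes rather than hours:
more than 97% of the requests are closed in under 1000 minutes, i.e. about 17 hours.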
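The cell that printed these percentages is not shown in the source. A minimal sketch of how such shares could be computed follows; the helper `share_below` and the series `closing` are hypothetical names with made-up values, not the notebook's own code or data.

```python
import pandas as pd

# Hypothetical closing times in minutes, including one extreme value
closing = pd.Series([30.0, 55.5, 87.2, 120.0, 250.0, 400.0, 950.0, 20000.0])

def share_below(series, threshold):
    """Percentage of non-missing values strictly below the threshold."""
    valid = series.dropna()
    return (valid < threshold).mean() * 100

for threshold in (100, 1000):
    pct = share_below(closing, threshold)
    print(f"Requests closed in under {threshold} minutes: {pct:.2f} %")

# A mean far above the median is one simple indicator of right skew
print("mean:", closing.mean(), "median:", closing.median())
```

Comparing the mean with the median, as in the last line, is a cheap way to confirm skew before deciding whether to clip the y-axis in later plots.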
About 85% of the requests relate to transport (Blocked Driveway, Illegal Parking,
Vehicle Noise, Road Traffic, etc.).
#Categorical Scatter Plot to understand which type of complaints are taking more time to g
g=sns.catplot(x='Complaint Type', y="Request_Closing_Time",data=df)
g.fig.set_figwidth(15)
g.fig.set_figheight(7)
plt.xticks(rotation=90)
plt.ylim((0,5000))
plt.show()
As noted above, about 85% of the requests relate to transport (Blocked
Driveway, Illegal Parking, Vehicle Noise, Road Traffic, etc.). The plot shows that
most of these complaint types also take longer to resolve. The government should work on
increasing awareness and find measures to reduce traffic problems.
#Request Closing Time for all location Type sorted in ascending Order
pd.DataFrame(df.groupby("Location Type")["Request_Closing_Time"].mean()).sort_values("Requ
Request_Closing_Time
Location Type
Club/Bar/Restaurant 183.492218
Store/Commercial 192.928792
Highway 204.372348
Park/Playground 206.594724
Bridge 229.458333
Street/Sidewalk 261.052945
Commercial 270.649846
Park 20210.566667
Ferry NaN
Terminal NaN
We see that the highest mean time to resolve a complaint is in Park, Vacant Lot and
Commercial areas, whereas cases in Subway Station and Restaurant locations are resolved in much
less time.
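A group mean on its own does not show how many requests it is based on, and the Park figure above is roughly two orders of magnitude larger than the rest. One way to check whether such a value comes from a few extreme records is to report the median and the group size next to the mean. The sketch below uses a hypothetical frame `toy` with made-up values; it is not the notebook's data.

```python
import pandas as pd

# Hypothetical closing times in minutes; "Park" has a single extreme record
toy = pd.DataFrame({
    "Location Type": ["Park", "Street/Sidewalk", "Street/Sidewalk",
                      "Highway", "Highway", "Street/Sidewalk"],
    "Request_Closing_Time": [20210.0, 250.0, 270.0, 200.0, 210.0, 260.0],
})

# Mean, median and number of requests per location, ordered by mean
summary = (
    toy.groupby("Location Type")["Request_Closing_Time"]
       .agg(["mean", "median", "count"])
       .sort_values("mean")
)
print(summary)
```

If a location with a very high mean also has a very small count, the ranking for that location should be read with caution.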
Request_Closing_Time
City
ARVERNE 137.840605
BAYSIDE 160.062978
FLUSHING 177.446478
WHITESTONE 187.976467
CORONA 188.984584
ELMHURST 194.108392
BROOKLYN 236.607935
Astoria 242.452302
Long Island City 245.388922
ASTORIA 265.236501
RIDGEWOOD 268.285547
SAINT ALBANS 271.040767
Woodside 281.455622
JAMAICA 305.346459
Handling Missing Values
#Percentage Of Missing Value
pd.DataFrame((df.isnull().sum()/df.shape[0]*100)).sort_values(0,ascending=False)[:20]
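The percentage of missing values per column can also be written with `isnull().mean()`, since the mean of a boolean column is the fraction of True values. The following is a small sketch on a hypothetical frame `toy`; the column names are borrowed from the notebook, the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with different amounts of missing data per column
toy = pd.DataFrame({
    "Complaint Type": ["Blocked Driveway", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Illegal Parking"],
    "Closed Date": ["2016-01-01 00:55:15", None, "2016-01-01 01:26:57", None],
    "Facility Type": ["Precinct", np.nan, np.nan, np.nan],
})

# Fraction of nulls per column, as a percentage, largest first
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

This returns a Series indexed by column name, which is equivalent in content to the single-column DataFrame built in the notebook cell.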
rem=[]
for x in new_df.columns.tolist():
    if new_df[x].nunique()<=3:
        print(x+ " "*10+" : ",new_df[x].unique())
        rem.append(x)
Agency : ['NYPD']
Agency Name : ['New York City Police Department' 'NYPD' 'Internal Affairs
Facility Type : ['Precinct' nan]
Park Facility Name : ['Unspecified' 'Alley Pond Park - Nature Center']
We see that the columns above carry little detail (a single value or 'Unspecified'), so we can remove those
columns to simplify our analysis.
new_df.drop(rem,axis=1,inplace=True)
new_df.shape
(364558, 26)
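The same selection can be sketched without mutating the original frame, which makes it easier to inspect what is about to be dropped. The frame `toy`, the threshold name `max_unique` and the toy values below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two near-constant columns and two informative ones
toy = pd.DataFrame({
    "Agency": ["NYPD", "NYPD", "NYPD", "NYPD"],
    "Facility Type": ["Precinct", np.nan, "Precinct", "Precinct"],
    "Complaint Type": ["Blocked Driveway", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Vehicle"],
    "City": ["NEW YORK", "ASTORIA", "BROOKLYN", "JAMAICA"],
})

max_unique = 3  # same cut-off as the notebook's loop
# nunique() ignores NaN, so "Facility Type" counts as one distinct value
low_cardinality = [col for col in toy.columns if toy[col].nunique() <= max_unique]

# drop() returns a new frame; toy itself is left unchanged
trimmed = toy.drop(columns=low_cardinality)
print(low_cardinality)
print(trimmed.columns.tolist())
```

A fixed cut-off like this would also remove a genuinely informative two- or three-level column, so it is worth printing the candidate list before dropping, as the notebook's loop does.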
new_df.drop(rem1,axis=1,inplace=True)
new_df.head()
0 2015-12-31 23:59:45 2016-01-01 00:55:15 Noise - Street/Sidewalk Street/Sidewalk 10034.0 ADDRESS NEW YORK C
1 2015-12-31 23:59:44 2016-01-01 01:26:57 Blocked Driveway Street/Sidewalk 11105.0 ADDRESS ASTORIA C
Hypothesis Testing
g=sns.catplot(x="Complaint Type",y="Request_Closing_Time",kind="box",data=new_df)
g.fig.set_figheight(8)
g.fig.set_figwidth(15)
plt.xticks(rotation=90)
plt.ylim((0,2000))
(0.0, 2000.0)
anova_df=pd.DataFrame()
anova_df["Request_Closing_Time"]=new_df["Request_Closing_Time"]
anova_df["Complaint"]=new_df["Complaint Type"]
anova_df.dropna(inplace=True)
anova_df.head()
Request_Closing_Time Complaint
0 55.500000 Noise - Street/Sidewalk
1 87.216667 Blocked Driveway
lm=ols("Request_Closing_Time~Complaint",data=anova_df).fit()
table=sm.stats.anova_lm(lm)
table
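Null hypothesis: the mean response time is the same across all complaint types. Alternative hypothesis: at least
one complaint type has a different mean response time. Since the p-value for Complaint is less than 0.01, we reject
the null hypothesis, i.e. there is a significant difference in mean response time across complaint types.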
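The decision rule described above can be illustrated on synthetic data. The sketch below uses `scipy.stats.f_oneway` rather than the notebook's statsmodels `ols` / `anova_lm` route; both perform a one-way ANOVA, but this is a swapped-in shortcut, and the frame `toy` and its group means are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Synthetic closing times (minutes) for three complaint types, illustrative only
toy = pd.DataFrame({
    "Complaint": np.repeat(
        ["Blocked Driveway", "Illegal Parking", "Noise - Street/Sidewalk"], 50
    ),
    "Request_Closing_Time": np.concatenate([
        rng.normal(280, 30, 50),
        rng.normal(260, 30, 50),
        rng.normal(200, 30, 50),
    ]),
})

# One array of closing times per complaint type
groups = [g["Request_Closing_Time"].to_numpy() for _, g in toy.groupby("Complaint")]

# H0: all group means are equal; H1: at least one mean differs
f_stat, p_value = f_oneway(*groups)
alpha = 0.01
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"F = {f_stat:.2f}, p = {p_value:.3g} -> {decision}")
```

One-way ANOVA assumes roughly normal residuals and similar variances across groups. Given the heavy skew noted earlier, a rank-based alternative such as `scipy.stats.kruskal` may be worth running alongside it.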
chi_sq=pd.DataFrame()
chi_sq["Location Type"]=new_df["Location Type"]
chi_sq["Complaint Type"]=new_df["Complaint Type"]
chi_sq.dropna(inplace=True)
alpha = 0.05
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (H0 holds true)')
Since the p-value for the chi-square test is less than 0.05 (the level of significance), we reject the null hypothesis
of independence and conclude that Complaint Type depends on Location Type, i.e. specific types of complaint are raised from specific places.
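The cell that computes `p` is not shown in the source. A common way to obtain it, assumed here rather than confirmed by the notebook, is to build a contingency table with `pd.crosstab` and pass it to `scipy.stats.chi2_contingency`. The frame `toy` and its counts are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: complaint mix differs strongly between two location types
toy = pd.DataFrame({
    "Location Type": ["Street/Sidewalk"] * 40 + ["Club/Bar/Restaurant"] * 40,
    "Complaint Type": (
        ["Blocked Driveway"] * 30 + ["Noise - Commercial"] * 10
        + ["Blocked Driveway"] * 5 + ["Noise - Commercial"] * 35
    ),
})

# Observed counts for every (location, complaint) pair
contingency = pd.crosstab(toy["Location Type"], toy["Complaint Type"])

# H0: complaint type and location type are independent
chi2, p, dof, expected = chi2_contingency(contingency)

alpha = 0.05
if p <= alpha:
    print("Dependent (reject H0)")
else:
    print("Independent (fail to reject H0)")
```

The `expected` array is worth inspecting on real data: the chi-square approximation becomes unreliable when many cells have very small expected counts, which can happen with rare location types.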
Conclusions
Most complaints are raised in road- and parking-related (vehicle) categories.
On average, complaints are closed within 150 to 300 minutes (about 2.5 to 5 hours).
Transport- and road-related issues take longer to resolve, as the number of
these cases is quite high.
The number of cases by borough ranks as follows: BROOKLYN > QUEENS > MANHATTAN >
BRONX > STATEN ISLAND.
Complaint Type depends on Location Type.
The time taken to resolve a complaint differs across complaint types.