Simplilearn - Customer-Service-Requests-Analysis - Ipynb - Colaboratory

The document discusses analyzing customer service request data from New York City's 311 system. It describes importing the data, converting date fields to datetime format, and creating a new field for the time elapsed to close each request. Exploratory analysis found that over 97% of requests were closed within 1,000 minutes (about 17 hours), with transportation-related complaints like parking and traffic making up about 85% of requests and taking varying times to resolve on average.

Uploaded by ritwij maunas

11/17/22, 4:47 AM customer-service-requests-analysis.ipynb - Colaboratory

Customer Service Requests Analysis

Background of Problem Statement :


NYC 311's mission is to provide the public with quick and easy access to all New York City
government services and information while offering the best customer service. Each day,
NYC311 receives thousands of requests related to several hundred types of non-emergency
services, including noise complaints, plumbing issues, and illegally parked cars. These requests
are received by NYC311 and forwarded to the relevant agencies such as the police, buildings, or
transportation. The agency responds to the request, addresses it, and then closes it.

Problem Objective :
Perform a service request data analysis of New York City 311 calls. You will focus on the data
wrangling techniques to understand the pattern in the data and also visualize the major
complaint types.

Domain:
Customer Service

Tasks Performed:
Import a 311 NYC service request dataset.
Read or convert the columns ‘Created Date’ and ‘Closed Date’ to datetime datatype and
create a new column ‘Request_Closing_Time’ as the time elapsed between request
creation and request closing.
Provide major insights/patterns that you can offer in a visual format (graphs or tables); at
least 4 major conclusions that you can come up with after generic data mining.
Order the complaint types based on the average ‘Request_Closing_Time’, grouping them
for different locations.
Perform statistical tests.

Please note: For the statements below, state the null and alternate hypotheses, then perform
a statistical test to accept or reject the null hypothesis and report the corresponding ‘p-value’.

Whether the average response time across complaint types is similar or not (overall)
Whether the type of complaint or service requested and the location are related

Importing the Libraries

https://colab.research.google.com/drive/1stj2RbtiToEGH22EuWWOfcd_fFqNodrH#scrollTo=RxcXhZo3uas9&printMode=true 1/15

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt


import seaborn as sns
sns.set()

import warnings
warnings.filterwarnings("ignore")

from scipy import stats


from scipy.stats import chi2_contingency

import statsmodels.api as sm
from statsmodels.formula.api import ols

Loading the Data

df=pd.read_csv("../input/311-service-requests-nyc/311_Service_Requests_from_2010_to_Presen
df.head()

   Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type
0  32310363    12/31/2015 11:59:45 PM  01/01/2016 12:55:15 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk
1  32309934    12/31/2015 11:59:44 PM  01/01/2016 01:26:57 AM  NYPD    New York City Police Department  Blocked Driveway
2  32309159    12/31/2015 11:59:29 PM  01/01/2016 04:51:03 AM  NYPD    New York City Police Department  Blocked Driveway
3  32305098    12/31/2015 11:57:46 PM  01/01/2016 07:43:13 AM  NYPD    New York City Police Department  Illegal Parking

Descriptive Analysis

df.describe()


       Unique Key    Incident Zip   X Coordinate (State Plane)  Y Coordinate (State Plane)
count  3.645580e+05  361560.000000  3.605280e+05                360528.000000
mean   3.106595e+07  10858.496659   1.005043e+06                203425.305782
std    7.331531e+05  578.263114     2.196362e+04                29842.192857
min    2.960737e+07  83.000000      9.133570e+05                121185.000000
25%    3.049938e+07  10314.000000   9.919460e+05                182945.000000

df.shape

(364558, 53)

The count row already shows missing values: for example, Incident Zip has 361,560 non-null entries and the X/Y coordinates 360,528, out of 364,558 rows. These summary statistics alone do not give clear insights into the data, so we move on to Exploratory Data Analysis.

Feature Creation

# Converting the date columns into datetime format
df["Created Date"]=pd.to_datetime(df["Created Date"])
df["Closed Date"]=pd.to_datetime(df["Closed Date"])

# Creating the new column Request_Closing_Time: time taken to resolve the complaint, in MINUTES
# (timedelta -> total seconds -> /60; a missing Closed Date gives NaN)
df["Request_Closing_Time"]=(df["Closed Date"]-df["Created Date"]).dt.total_seconds()/60
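As a quick check of the unit, the sketch below applies the same subtraction to the first timestamps shown in df.head(). The toy DataFrame and the explicit date format string are assumptions for illustration, not part of the original notebook; the point is that the result is in minutes and that a missing Closed Date yields NaN instead of an error.

```python
import pandas as pd

# Toy frame built from the first rows of df.head(); the third row has no
# Closed Date, as happens for requests that are still open.
toy = pd.DataFrame({
    "Created Date": ["12/31/2015 11:59:45 PM", "12/31/2015 11:59:44 PM", "12/31/2015 11:59:29 PM"],
    "Closed Date": ["01/01/2016 12:55:15 AM", "01/01/2016 01:26:57 AM", None],
})

# Parse both columns; None becomes NaT.
for col in ["Created Date", "Closed Date"]:
    toy[col] = pd.to_datetime(toy[col], format="%m/%d/%Y %I:%M:%S %p")

# Vectorised elapsed time: timedelta -> seconds -> minutes (NaT -> NaN).
toy["Request_Closing_Time"] = (toy["Closed Date"] - toy["Created Date"]).dt.total_seconds() / 60

print(toy["Request_Closing_Time"].round(4).tolist())  # [55.5, 87.2167, nan]
```

The first two values match the Request_Closing_Time column printed later in anova_df.head() (55.500000 and 87.216667), which is consistent with the column being in minutes.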

Exploratory Data Analysis

df["Agency"].unique()

array(['NYPD'], dtype=object)

All requests in this dataset belong to a single agency, the NYPD (New York City Police Department).


#Univariate Distribution Plot for Request Closing Time
sns.distplot(df["Request_Closing_Time"])
plt.show()

print("Total Number of Concerns : ",len(df),"\n")


print("Percentage of Requests took less than 100 hour to get solved : ",round((len(df)-(
print("Percentage of Requests took less than 1000 hour to get solved : ",round((len(df)-(d

Total Number of Concerns : 364558

Percentage of Requests took less than 100 hour to get solved : 33.63 %
Percentage of Requests took less than 1000 hour to get solved : 97.44 %

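The distribution is heavily right-skewed, with many outliers. Note that Request_Closing_Time is measured in minutes (total_seconds()/60), so the "hour" in the printed labels should read "minutes": about 33.6% of the requests were closed in under 100 minutes, and more than 97% in under 1,000 minutes (about 16.7 hours).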
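The two print statements above are cut off in this export, so here is a minimal sketch of the same kind of percentage, assuming the threshold is compared directly against Request_Closing_Time in minutes. The helper name pct_closed_within and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def pct_closed_within(minutes: pd.Series, threshold: float) -> float:
    """Hypothetical helper: % of all requests closed in under `threshold` minutes.

    Rows with no closing time (NaN) stay in the denominator but never count
    as closed, because NaN < threshold evaluates to False.
    """
    pct = 100.0 * float((minutes < threshold).sum()) / len(minutes)
    return round(pct, 2)

closing = pd.Series([55.5, 87.2, 291.6, 465.5, 1500.0, np.nan])  # toy minutes
print(pct_closed_within(closing, 100))   # 33.33
print(pct_closed_within(closing, 1000))  # 66.67
print(round(1000 / 60, 1), "hours")      # 16.7 hours
```

Converting the threshold with /60 (minutes to hours) rather than /24 avoids the hours/days mix-up.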

#Univariate Distribution Plot for Request Closing Time


sns.distplot(df["Request_Closing_Time"])
plt.xlim((0,5000))
plt.ylim((0,0.0003))
plt.show()


# Count plot to understand the type of the complaint raised


df['Complaint Type'].value_counts()[:10].plot(kind='barh',alpha=0.6,figsize=(15,10))
plt.show()

About 85% of the requests are transport-related (Blocked Driveway, Illegal Parking, vehicle noise, road traffic, etc.).
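A share like this can be computed by mapping complaint types to a group and taking the mean of a boolean mask. The sketch below assumes a hypothetical TRANSPORT set of complaint-type labels and toy data; the notebook does not show which types the author counted as transport.

```python
import pandas as pd

# Toy complaint column standing in for df["Complaint Type"].
complaints = pd.Series([
    "Blocked Driveway", "Illegal Parking", "Blocked Driveway", "Noise - Street/Sidewalk",
    "Illegal Parking", "Noise - Vehicle", "Traffic", "Animal Abuse",
])

# Hypothetical grouping of complaint types into a "transport" bucket.
TRANSPORT = {"Blocked Driveway", "Illegal Parking", "Noise - Vehicle", "Traffic"}

# isin() gives a boolean mask; its mean is the fraction of matching rows.
transport_pct = complaints.isin(TRANSPORT).mean() * 100
print(round(transport_pct, 2))  # 75.0
```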

#Categorical Scatter Plot to understand which type of complaints are taking more time to get resolved
g=sns.catplot(x='Complaint Type', y="Request_Closing_Time",data=df)
g.fig.set_figwidth(15)
g.fig.set_figheight(7)
plt.xticks(rotation=90)
plt.ylim((0,5000))
plt.show()


As noted above, about 85% of the requests are transport-related (Blocked Driveway, Illegal Parking, vehicle noise, road traffic, etc.). This plot suggests that most of these complaint types also take longer to resolve. The city government should take measures to increase awareness and to reduce traffic problems.

# Count plot to know the status of the requests


df['Status'].value_counts().plot(kind='bar',alpha=0.6,figsize=(15,7))
plt.show()


Almost 98% of the requests are currently in the Closed state.

#Count Plot for Column Borough


plt.figure(figsize=(12,7))
df['Borough'].value_counts().plot(kind='bar',alpha=0.7)
plt.show()

#Percentage of cases in each Borough


for x in df["Borough"].unique():
    print("Percentage of Request from ",x," Division : ",round((df["Borough"]==x).sum()/le

Percentage of Request from MANHATTAN Division : 21.25
Percentage of Request from QUEENS Division : 27.64
Percentage of Request from BRONX Division : 13.49
Percentage of Request from BROOKLYN Division : 32.6
Percentage of Request from Unspecified Division : 0.81
Percentage of Request from STATEN ISLAND Division : 4.21

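The per-borough loop can also be written as a single value_counts call. This is a sketch on a toy Borough column, not the notebook's code.

```python
import pandas as pd

# Toy Borough column standing in for df["Borough"].
borough = pd.Series(["BROOKLYN", "QUEENS", "BROOKLYN", "MANHATTAN",
                     "BRONX", "BROOKLYN", "QUEENS", "Unspecified"])

# Share of rows per borough, in percent, rounded like the loop's output.
pct = (borough.value_counts(normalize=True) * 100).round(2)
print(pct)
```

One difference to keep in mind: value_counts drops NaN by default, while the loop divides by len(df). Passing dropna=False keeps the two in agreement when the column has missing values. Here the printed percentages already sum to 100, so the results match.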
#Unique Location Types
df["Location Type"].unique()

array(['Street/Sidewalk', 'Club/Bar/Restaurant', 'Store/Commercial',
       'House of Worship', 'Residential Building/House',
       'Residential Building', 'Park/Playground', 'Vacant Lot',
       'House and Store', 'Highway', 'Commercial', 'Roadway Tunnel',
       'Subway Station', 'Parking Lot', 'Bridge', 'Terminal', nan,
       'Ferry', 'Park'], dtype=object)

#Request Closing Time for all location Type sorted in ascending Order
pd.DataFrame(df.groupby("Location Type")["Request_Closing_Time"].mean()).sort_values("Requ

                            Request_Closing_Time
Location Type
Subway Station                        145.120000
Club/Bar/Restaurant                   183.492218
House of Worship                      190.052861
Store/Commercial                      192.928792
Highway                               204.372348
Park/Playground                       206.594724
Bridge                                229.458333
Street/Sidewalk                       261.052945
Residential Building                  267.260350
Commercial                            270.649846
Roadway Tunnel                        283.486047
House and Store                       291.750204
Parking Lot                           296.526747
Residential Building/House            300.233145
Vacant Lot                            404.561930
Park                                20210.566667
Ferry                                        NaN
Terminal                                     NaN

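The longest mean closing times are for Park (about 20,211 minutes, roughly two weeks), Vacant Lot (about 405 minutes) and Residential Building/House (about 300 minutes). Requests from Subway Stations (about 145 minutes) and Club/Bar/Restaurant locations (about 183 minutes) are resolved fastest. Ferry and Terminal show NaN because all of their Request_Closing_Time values are missing.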
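The task list asks to order complaint types by average Request_Closing_Time for different locations. Below is a sketch of that on hypothetical toy rows. It also puts a count next to each mean, since a very large mean such as Park's could rest on few requests. The notebook does not show counts, so that part is only a suggestion.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real columns (closing times in minutes).
toy = pd.DataFrame({
    "Location Type": ["Park", "Street/Sidewalk", "Street/Sidewalk", "Subway Station", "Ferry"],
    "Complaint Type": ["Noise - Park", "Illegal Parking", "Blocked Driveway", "Noise", "Noise"],
    "Request_Closing_Time": [20000.0, 300.0, 200.0, 145.0, np.nan],
})

# Mean and number of non-missing closing times per location; an all-NaN
# group (Ferry here) gets mean NaN and count 0, and sorts last.
summary = (toy.groupby("Location Type")["Request_Closing_Time"]
              .agg(["mean", "count"])
              .sort_values("mean"))
print(summary)

# Complaint types ordered by mean closing time within each location.
per_loc = (toy.groupby(["Location Type", "Complaint Type"], as_index=False)["Request_Closing_Time"]
              .mean()
              .sort_values(["Location Type", "Request_Closing_Time"]))
print(per_loc)
```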


#Request Closing Time for all City sorted in ascending Order


pd.DataFrame(df.groupby("City")["Request_Closing_Time"].mean()).sort_values("Request_Closi


                      Request_Closing_Time
City
ARVERNE                         137.840605
ROCKAWAY PARK                   139.602908
LITTLE NECK                     155.031437
OAKLAND GARDENS                 156.240167
BAYSIDE                         160.062978
FAR ROCKAWAY                    161.193068
NEW YORK                        175.343723
FLUSHING                        177.446478
FOREST HILLS                    184.097636
WHITESTONE                      187.976467
CORONA                          188.984584
COLLEGE POINT                   190.393782
JACKSON HEIGHTS                 190.885368
ELMHURST                        194.108392
FRESH MEADOWS                   200.741045
REGO PARK                       202.462138
BREEZY POINT                    205.197849
EAST ELMHURST                   206.801481
CENTRAL PARK                    206.921364
STATEN ISLAND                   228.038305
BROOKLYN                        236.607935
Howard Beach                    241.750000
Astoria                         242.452302
Long Island City                245.388922
ASTORIA                         265.236501
RIDGEWOOD                       268.285547
SAINT ALBANS                    271.040767
East Elmhurst                   273.630556
Woodside                        281.455622
KEW GARDENS                     283.319775
JAMAICA                         305.346459
SOUTH OZONE PARK                308.283046
SOUTH RICHMOND HILL             318.020470
WOODHAVEN                       321.714469
RICHMOND HILL                   321.749064
MIDDLE VILLAGE                  323.290492
OZONE PARK                      328.309146
MASPETH                         328.997706
HOLLIS                          332.061427
HOWARD BEACH                    346.959615
BRONX                           353.116425
LONG ISLAND CITY                367.326726
SUNNYSIDE                       380.744297
WOODSIDE                        389.758733
NEW HYDE PARK                   423.396512
GLEN OAKS                       501.653463
SPRINGFIELD GARDENS             510.113239
CAMBRIA HEIGHTS                 542.883117
ROSEDALE                        569.194745
BELLEROSE                       576.173614
QUEENS VILLAGE                  593.920472
FLORAL PARK                     609.812160
QUEENS                          717.171171

Handling Missing Values

#Percentage Of Missing Value
pd.DataFrame((df.isnull().sum()/df.shape[0]*100)).sort_values(0,ascending=False)[:20]

                                       0
School or Citywide Complaint  100.000000
Garage Lot Name               100.000000
Vehicle Type                  100.000000
Taxi Pick Up Location         100.000000
Taxi Company Borough          100.000000
Ferry Direction                99.999726
Ferry Terminal Name            99.999451
Road Ramp                      99.928132
Bridge Highway Segment         99.928132
Bridge Highway Direction       99.918531
Bridge Highway Name            99.918531
Landmark                       99.897136
Intersection Street 2          86.144317
Intersection Street 1          85.977540
Cross Street 2                 15.856187
Cross Street 1                 15.686941
Street Name                    14.181283
Incident Address               14.181283
Descriptor                      1.783255
Latitude                        1.105448

The School or Citywide Complaint column is completely empty, presumably because none of the requests or complaints come from the school sector. Several other columns (Garage Lot Name, Vehicle Type, and the taxi, ferry, bridge/highway and landmark fields) are also more than 99.8% missing. We can therefore remove these columns.
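The missing-value percentage and the 50% cut-off used in the next cell can be illustrated on a small toy frame (hypothetical data). isnull().mean() is a shorter equivalent of isnull().sum()/df.shape[0].

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Complaint Type": ["Illegal Parking", "Blocked Driveway", "Noise - Street/Sidewalk", "Illegal Parking"],
    "Vehicle Type": [np.nan, np.nan, np.nan, np.nan],               # 100% missing
    "Intersection Street 1": [np.nan, np.nan, np.nan, "BROADWAY"],  # 75% missing
    "Incident Zip": [10034.0, 11105.0, np.nan, 11211.0],            # 25% missing
})

# Percentage of missing values per column.
missing_pct = toy.isnull().mean() * 100
print(missing_pct.sort_values(ascending=False))

# Keep only the columns with at most 50% missing values.
kept = toy.loc[:, missing_pct <= 50]
print(kept.columns.tolist())  # ['Complaint Type', 'Incident Zip']
```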
#Remove the column with very high percentage of missing value
new_df=df.loc[:,(df.isnull().sum()/df.shape[0]*100)<=50]

print("Old DataFrame Shape :",df.shape)


print("New DataFrame Shape : ",new_df.shape)

Old DataFrame Shape : (364558, 54)


New DataFrame Shape : (364558, 40)

rem=[]
for x in new_df.columns.tolist():
    if new_df[x].nunique()<=3:
        print(x+ " "*10+" : ",new_df[x].unique())
        rem.append(x)

Agency : ['NYPD']
Agency Name : ['New York City Police Department' 'NYPD' 'Internal Affairs
Facility Type : ['Precinct' nan]
Park Facility Name : ['Unspecified' 'Alley Pond Park - Nature Center']
School Name : ['Unspecified' 'Alley Pond Park - Nature Center']
School Number : ['Unspecified' 'Q001']
School Region : ['Unspecified' nan]
School Code : ['Unspecified' nan]
School Phone Number : ['Unspecified' '7182176034']
School Address : ['Unspecified' 'Grand Central Parkway, near the soccer fi
School City : ['Unspecified' 'QUEENS']
School State : ['Unspecified' 'NY']
School Zip : ['Unspecified' nan]
School Not Found : ['N']

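The columns above carry little information: each has at most three distinct non-null values, and most of them are either constant (e.g. Agency) or filled with "Unspecified". We therefore remove them to simplify the analysis.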
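The low-cardinality filter in the loop above can be sketched as a list comprehension on hypothetical toy data. One detail worth knowing is that nunique() ignores NaN by default, so a column like ['Precinct', nan] counts as having a single distinct value.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Agency": ["NYPD"] * 4,
    "Facility Type": ["Precinct", np.nan, "Precinct", np.nan],
    "School Name": ["Unspecified", "Unspecified", "Alley Pond Park - Nature Center", "Unspecified"],
    "Complaint Type": ["Illegal Parking", "Blocked Driveway", "Noise - Street/Sidewalk", "Noise - Vehicle"],
})

# Columns with at most 3 distinct non-null values (NaN is not counted).
low_card = [c for c in toy.columns if toy[c].nunique() <= 3]
print(low_card)  # ['Agency', 'Facility Type', 'School Name']

toy = toy.drop(columns=low_card)
print(toy.columns.tolist())  # ['Complaint Type']
```

A fixed threshold removes any column with few categories, whether or not it is useful. Printing the candidate columns before dropping them, as the notebook does, is a sensible safeguard.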

new_df.drop(rem,axis=1,inplace=True)

new_df.shape

(364558, 26)

#Remove columns that are not needed for our analysis


rem1=["Unique Key","Incident Address","Descriptor","Street Name","Cross Street 1","Cross S

new_df.drop(rem1,axis=1,inplace=True)

new_df.head()

   Created Date         Closed Date          Complaint Type           Location Type    Incident Zip  Address Type  City
0  2015-12-31 23:59:45  2016-01-01 00:55:15  Noise - Street/Sidewalk  Street/Sidewalk  10034.0       ADDRESS       NEW YORK
1  2015-12-31 23:59:44  2016-01-01 01:26:57  Blocked Driveway         Street/Sidewalk  11105.0       ADDRESS       ASTORIA

Hypothesis Testing

g=sns.catplot(x="Complaint Type",y="Request_Closing_Time",kind="box",data=new_df)
g.fig.set_figheight(8)
g.fig.set_figwidth(15)
plt.xticks(rotation=90)
plt.ylim((0,2000))
plt.show()

H0: There is no significant difference in the mean Request_Closing_Time across Complaint Types.

H1: There is a significant difference in the mean Request_Closing_Time across Complaint Types.

anova_df=pd.DataFrame()
anova_df["Request_Closing_Time"]=new_df["Request_Closing_Time"]
anova_df["Complaint"]=new_df["Complaint Type"]

anova_df.dropna(inplace=True)
anova_df.head()


   Request_Closing_Time  Complaint
0  55.500000             Noise - Street/Sidewalk
1  87.216667             Blocked Driveway
2  291.566667            Blocked Driveway
3  465.450000            Illegal Parking
4  207.733333            Illegal Parking

lm=ols("Request_Closing_Time~Complaint",data=anova_df).fit()
table=sm.stats.anova_lm(lm)
table

           df        sum_sq        mean_sq       F          PR(>F)
Complaint  22.0      1.487316e+09  6.760526e+07  565.26157  0.0
Residual   362154.0  4.331361e+10  1.196000e+05  NaN        NaN

The p-value for Complaint (PR(>F), shown as 0.0) is below 0.01, so we reject the null hypothesis: the mean response time differs significantly across complaint types.
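The anova_lm table is a standard one-way ANOVA. With k complaint types and n rows, the Complaint row has k - 1 = 22 degrees of freedom and the Residual row has n - k = 362154. Each mean_sq is sum_sq / df, and F is the ratio of the two mean squares (6.760526e+07 / 1.196000e+05 ≈ 565.26). The sketch below uses made-up closing times. It computes the same F statistic with scipy.stats.f_oneway and by hand, as an illustration rather than the notebook's code.

```python
import numpy as np
from scipy import stats

# Toy closing times (minutes) for three complaint types.
groups = {
    "Blocked Driveway": np.array([87.2, 291.6, 120.0, 150.0]),
    "Illegal Parking": np.array([465.5, 207.7, 300.0, 350.0]),
    "Noise - Street/Sidewalk": np.array([55.5, 60.0, 80.0, 95.0]),
}

# scipy's one-way ANOVA: same F-test as ols(...) + anova_lm with a single factor.
f_stat, p_value = stats.f_oneway(*groups.values())

# The same F by hand: between-group mean square / within-group mean square.
all_vals = np.concatenate(list(groups.values()))
grand_mean = all_vals.mean()
k, n = len(groups), all_vals.size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"F = {f_stat:.3f} (manual {f_manual:.3f}), p = {p_value:.4f}")
```

The closing times are heavily skewed, so ANOVA's normality assumption is questionable. A rank-based test such as scipy.stats.kruskal could serve as a cross-check, although it is not part of the original analysis.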

H0 : Complaint Type and Location Type are independent

H1 : Complaint Type and Location Type are related

chi_sq=pd.DataFrame()
chi_sq["Location Type"]=new_df["Location Type"]
chi_sq["Complaint Type"]=new_df["Complaint Type"]

chi_sq.dropna(inplace=True)

data_crosstab = pd.crosstab( chi_sq["Location Type"],chi_sq["Complaint Type"])

stat, p, dof, expected = chi2_contingency(data_crosstab)

alpha = 0.05
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

Dependent (reject H0)

Since the p-value of the chi-square test is at most 0.05 (the chosen level of significance), we reject H0 and conclude that Complaint Type and Location Type are related, i.e., specific types of complaints tend to be raised from specific kinds of places.
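The chi-square test compares each observed count in the crosstab with the count expected under independence, which is row total × column total / grand total. The sketch below uses a tiny, hypothetical table (far too small for a reliable test) and checks those expected counts by hand. It also prints the p-value itself, which the task asks for.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Toy rows standing in for the Location Type / Complaint Type columns.
toy = pd.DataFrame({
    "Location Type": ["Street/Sidewalk"] * 6 + ["Club/Bar/Restaurant"] * 4,
    "Complaint Type": ["Blocked Driveway"] * 5 + ["Noise - Commercial"] * 5,
})

table = pd.crosstab(toy["Location Type"], toy["Complaint Type"])
stat, p, dof, expected = chi2_contingency(table)

# Expected counts under independence: outer product of the margins / total.
manual_expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.values.sum()

print(table)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
print(np.allclose(expected, manual_expected))
```

For a 2×2 table (dof = 1), chi2_contingency applies Yates' continuity correction by default. The real Location Type × Complaint Type table is much larger, so no correction applies there.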

Conclusions
Most complaints are raised in road- and parking-related (vehicle) sectors: about 85% of the requests concern transport.
On average, complaints are closed in roughly 150 to 300 minutes (about 2.5 to 5 hours).
Transport- and road-related issues take longer to get resolved, possibly because the number of these cases is quite high.
The number of cases per borough ranks as follows: BROOKLYN > QUEENS > MANHATTAN > BRONX > STATEN ISLAND.
Complaint Type is dependent on Location Type (chi-square test, p ≤ 0.05).
The mean time taken to resolve a request differs across complaint types (one-way ANOVA, p < 0.01).
