Chicago Crime: .Shape .Dropna .Info Pandas DF - Convert Dtypes Date Updated On Datetime64 (NS) To Datetime
Chicago Crime: .Shape .Dropna .Info Pandas DF - Convert Dtypes Date Updated On Datetime64 (NS) To Datetime
ID Int64
Case Number string
Date datetime64[ns]
Block string
IUCR string
Primary Type string
Description string
Location Description string
Arrest boolean
Domestic boolean
Beat Int64
District Int64
Ward Int64
Community Area Int64
FBI Code string
X Coordinate Int64
Y Coordinate Int64
Year Int64
Updated On datetime64[ns]
Latitude Float64
Longitude Float64
Location string
(e) (2 pts) Writing a data frame to a .csv (comma separated format) file does not save information on column
data types. Use the parquet binary format instead. Not only will the file size be smaller, but data types
are preserved as well. Hint: df.to parquet()
(f) (2 pts) Reload df from the saved parquet file and check that data types have been preserved. Note the
.parquet file is significantly smaller than the .csv file.
(g) (2 pts) Construct a bar chart of the number of police crime reports each year from 2001 to 2017. Do you
observe a trend in the number of reported crimes? Explain.
(h) (2 pts) The commands below create a plot of the total number of crimes reported on each day of the year.
Explain the spikes at the beginning and end of the plot. Hint: Could it have something to do with New
Year’s eve and also leap years?
grouping_day = df.groupby(df[’Date’].dt.dayofyear)
grouping_day[’ID’].count().plot()
plt.title(’Total number of crimes reported on each day for all years.’)
1 https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
1
Total number of crimes reported on each day for all years.
25000
20000
15000
10000
5000
(i) (4 pts) How many unique FBI codes are listed in the data set? Hint: .unique() Construct a rough
description of the FBI codes by performing a cross tabulation with the Primary Type description of
each arrest record. Use the most common Primary Type description for each FBI code. Hint: I found
.idxmax() to be helpful. What does an FBI code of 03 correspond to?
(j) (4 pts) What types of crime are increasing and what types of crime are deceasing with time? Perform a
cross tabulation of FBI Code with Year. (FBI Code should be the index.) Then substitute the FBI code
with the Primary Type description determined above. Normalize the rows so that the rows are proportions
of each type of crime committed each year. The rows should add to 1. Finally, plot a heat map of the
data, sorting the rows by the proportions of crime committed in the first year, 2001. Draw conclusions
from the heat map. The commands below increase the size of the the figure and plot the heat map.
Hint: .div
plt.subplots(figsize=(20,10))
sns.heatmap(prob_crimes)
DECEPTIVE PRACTICE
LIQUOR LAW VIOLATION 0.200
DECEPTIVE PRACTICE
BATTERY
SEX OFFENSE 0.175
ARSON
MOTOR VEHICLE THEFT
PROSTITUTION 0.150
ASSAULT
ASSAULT
BATTERY
0.125
HOMICIDE
CRIMINAL DAMAGE
ROBBERY
0.100
OTHER OFFENSE
CRIM SEXUAL ASSAULT
THEFT
0.075
NARCOTICS
BURGLARY
DECEPTIVE PRACTICE
0.050
WEAPONS VIOLATION
GAMBLING
DECEPTIVE PRACTICE
0.025
OFFENSE INVOLVING CHILDREN
PUBLIC PEACE VIOLATION
HOMICIDE
0.000
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Year