Lab 05 - Data Analysis and Visualization

This document outlines Lab 05 for the CS433 course on Internet of Things, focusing on data analysis and visualization using the San Francisco Crime dataset. It details the objectives, required resources, and steps for importing Python packages, loading, preparing, analyzing, and visualizing the data. The lab emphasizes the use of Python and Jupyter Notebook to demonstrate the Data Analysis Lifecycle.

Uploaded by

hoanganhyen0901

Faculty of Computers and Artificial Intelligence

CS433: Internet of Things (IoT)

---------------------------------------------------------------------------------------------
Lab no 05 – Data Analysis and Visualization

This lab provides an introduction to data analysis and visualization.

In this lab, our data source is the San Francisco Crime data.

Parts: -

1. Python Packages.
2. Load Data.
3. Prepare Data.
4. Analyze Data.
5. Visualize Data.

© 2022 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page 1 of 16

Lab - San Francisco Crime

Objectives
Demonstrate your knowledge of the Data Analysis Lifecycle using a given set of data and the tools Python
and Jupyter Notebook.
Part 1: Import the Python Packages
Part 2: Load the Data
Part 3: Prepare the Data
Part 4: Analyze the Data
Part 5: Visualize the Data
Background / Scenario
In this lab, you will import some Python packages required to analyze a data set containing San Francisco
crime information. You will then use Python and Jupyter Notebook to prepare this data for analysis, analyze
it, graph it, and communicate your findings.
Required Resources

• 1 PC with Internet access

• Raspberry Pi version 2 or higher
• Python libraries: pandas, numpy, matplotlib, folium, datetime, and csv
• Datafiles: Map-Crime_Incidents-Previous_Three_Months.csv

Part 1: Import the Python Packages

In this part, you will import the following Python packages necessary for the rest of this lab.
numpy
NumPy is the fundamental package for scientific computing with Python. It contains among other things: a
powerful N-dimensional array object and sophisticated (broadcasting) functions.
pandas
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures
and data analysis tools for the Python programming language.
matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics
extension NumPy.
folium
Folium is a library for creating interactive maps.

In [27]:
# Code cell 1
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import folium


Part 2: Load the Data

In this part, you will load the San Francisco Crime Dataset and the Python packages necessary to analyze
and visualize it.

Step 1: Load the San Francisco Crime data into a data frame.
In this step, you will import the San Francisco crime data from a comma separated values (csv) file into a
data frame.

In [28]:
# code cell 2
# This should be a local path
dataset_path = './Data/Map-Crime_Incidents-Previous_Three_Months.csv'

# read the original dataset (in comma separated values format) into a DataFrame
SF = pd.read_csv(dataset_path, sep=",")
print(SF)
IncidntNum Category Descript \
0 NaN LARCENY/THEFT GRAND THEFT FROM UNLOCKED AUTO
1 NaN LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO
2 NaN LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO
3 NaN DRUG/NARCOTIC POSSESSION OF METH-AMPHETAMINE
4 NaN DRUG/NARCOTIC POSSESSION OF COCAINE
... ... ... ...
30755 NaN LARCENY/THEFT PETTY THEFT SHOPLIFTING
30756 NaN OTHER OFFENSES DRIVERS LICENSE, SUSPENDED OR REVOKED
30757 NaN ASSAULT BATTERY
30758 NaN ASSAULT ASSAULT WITH CAUSTIC CHEMICALS
30759 NaN OTHER OFFENSES DRIVERS LICENSE, SUSPENDED OR REVOKED

DayOfWeek Date Time PdDistrict \

0 Sunday 08/31/2014 07:00:00 AM +0000 20:30 CENTRAL
1 Sunday 08/31/2014 07:00:00 AM +0000 14:30 CENTRAL
2 Sunday 08/31/2014 07:00:00 AM +0000 11:30 CENTRAL
3 Sunday 08/31/2014 07:00:00 AM +0000 17:49 MISSION
4 Sunday 08/31/2014 07:00:00 AM +0000 18:05 NORTHERN
... ... ... ... ...
30755 Sunday 06/01/2014 07:00:00 AM +0000 15:30 SOUTHERN
30756 Sunday 06/01/2014 07:00:00 AM +0000 16:00 NORTHERN
30757 Sunday 06/01/2014 07:00:00 AM +0000 15:00 TENDERLOIN
30758 Sunday 06/01/2014 07:00:00 AM +0000 15:20 CENTRAL
30759 Sunday 06/01/2014 07:00:00 AM +0000 13:15 INGLESIDE

Resolution Address X Y \
0 NONE HYDE ST / CALIFORNIA ST -122.417393 37.790974
1 NONE COLUMBUS AV / JACKSON ST -122.404418 37.796302
2 NONE SUTTER ST / STOCKTON ST -122.406959 37.789435
3 ARREST, BOOKED 16TH ST / MISSION ST -122.419672 37.765050
4 ARREST, BOOKED LARKIN ST / OFARRELL ST -122.417904 37.785167
... ... ... ... ...
30755 ARREST, BOOKED 900.0 Block of MARKET ST -122.408052 37.783957
30756 ARREST, CITED POLK ST / MCALLISTER ST -122.418601 37.780261
30757 ARREST, CITED 0.0 Block of JONES ST -122.412122 37.781379
30758 NONE 200.0 Block of GEARY ST -122.407434 37.787494
30759 ARREST, CITED MISSION ST / BOSWORTH ST -122.426391 37.733675

Location
0 (37.7909741243888, -122.417392830334)
1 (37.7963018736036, -122.404417620748)
2 (37.7894347630337, -122.406958660602)
3 (37.7650501214965, -122.419671780296)
4 (37.7851670875814, -122.417903977564)
... ...
30755 (37.7839574642528, -122.408051765969)
30756 (37.7802607511488, -122.418600974625)
30757 (37.7813786419025, -122.412121608136)
30758 (37.7874944447786, -122.407434204569)
30759 (37.7336749150401, -122.426391018521)

[30760 rows x 12 columns]

To view the first five lines of the csv file, the Linux command head is used.

In [29]:
# code cell 3
!head -n 5 ./Data/Map-Crime_Incidents-Previous_Three_Months.csv
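The head command is only available on Linux-like systems. As a portable alternative, the same preview can be sketched in plain Python with itertools.islice. The sample file written below is a made-up stand-in (hypothetical rows and a hypothetical helper named head) so that the example runs on its own; in the lab you would pass the dataset path instead.

```python
import itertools
import os
import tempfile

# Hypothetical stand-in for the lab's CSV file, so the sketch is self-contained.
sample_rows = ["Category,DayOfWeek,PdDistrict"] + [
    f"CATEGORY_{i},Sunday,CENTRAL" for i in range(10)
]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("\n".join(sample_rows) + "\n")
    sample_path = tmp.name


def head(path, n=5):
    """Return the first n lines of a text file without reading the whole file."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


first_lines = head(sample_path, n=5)
for line in first_lines:
    print(line)

os.remove(sample_path)  # clean up the temporary file
```

Because islice stops after n lines, only the beginning of the file is read, which keeps the preview fast even for large CSV files.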

Step 2: View the imported data.

a) By typing the name of the data frame variable into a cell, you can visualize the top and bottom rows in a
structured way.

In [30]:
# Code cell 4
pd.set_option('display.max_rows', 10) #Visualize 10 rows
SF
Out[30]:
       IncidntNum  Category        Descript  \
0      NaN         LARCENY/THEFT   GRAND THEFT FROM UNLOCKED AUTO
1      NaN         LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO
2      NaN         LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO
3      NaN         DRUG/NARCOTIC   POSSESSION OF METH-AMPHETAMINE
4      NaN         DRUG/NARCOTIC   POSSESSION OF COCAINE
...    ...         ...             ...
30755  NaN         LARCENY/THEFT   PETTY THEFT SHOPLIFTING
30756  NaN         OTHER OFFENSES  DRIVERS LICENSE, SUSPENDED OR REVOKED
30757  NaN         ASSAULT         BATTERY
30758  NaN         ASSAULT         ASSAULT WITH CAUSTIC CHEMICALS
30759  NaN         OTHER OFFENSES  DRIVERS LICENSE, SUSPENDED OR REVOKED

       DayOfWeek  Date                          Time   PdDistrict  Resolution  \
0      Sunday     08/31/2014 07:00:00 AM +0000  20:30  CENTRAL     NONE
1      Sunday     08/31/2014 07:00:00 AM +0000  14:30  CENTRAL     NONE
2      Sunday     08/31/2014 07:00:00 AM +0000  11:30  CENTRAL     NONE
3      Sunday     08/31/2014 07:00:00 AM +0000  17:49  MISSION     ARREST, BOOKED
4      Sunday     08/31/2014 07:00:00 AM +0000  18:05  NORTHERN    ARREST, BOOKED
...    ...        ...                           ...    ...         ...
30755  Sunday     06/01/2014 07:00:00 AM +0000  15:30  SOUTHERN    ARREST, BOOKED
30756  Sunday     06/01/2014 07:00:00 AM +0000  16:00  NORTHERN    ARREST, CITED
30757  Sunday     06/01/2014 07:00:00 AM +0000  15:00  TENDERLOIN  ARREST, CITED
30758  Sunday     06/01/2014 07:00:00 AM +0000  15:20  CENTRAL     NONE
30759  Sunday     06/01/2014 07:00:00 AM +0000  13:15  INGLESIDE   ARREST, CITED

       Address                   X            Y          Location
0      HYDE ST / CALIFORNIA ST   -122.417393  37.790974  (37.7909741243888, -122.417392830334)
1      COLUMBUS AV / JACKSON ST  -122.404418  37.796302  (37.7963018736036, -122.404417620748)
2      SUTTER ST / STOCKTON ST   -122.406959  37.789435  (37.7894347630337, -122.406958660602)
3      16TH ST / MISSION ST      -122.419672  37.765050  (37.7650501214965, -122.419671780296)
4      LARKIN ST / OFARRELL ST   -122.417904  37.785167  (37.7851670875814, -122.417903977564)
...    ...                       ...          ...        ...
30755  900.0 Block of MARKET ST  -122.408052  37.783957  (37.7839574642528, -122.408051765969)
30756  POLK ST / MCALLISTER ST   -122.418601  37.780261  (37.7802607511488, -122.418600974625)
30757  0.0 Block of JONES ST     -122.412122  37.781379  (37.7813786419025, -122.412121608136)
30758  200.0 Block of GEARY ST   -122.407434  37.787494  (37.7874944447786, -122.407434204569)
30759  MISSION ST / BOSWORTH ST  -122.426391  37.733675  (37.7336749150401, -122.426391018521)

30760 rows × 12 columns

b) Use the columns attribute to view the names of the variables in the DataFrame.

In [31]:
# Code cell 5
SF.columns
Out[31]:
Index(['IncidntNum', 'Category', 'Descript', 'DayOfWeek', 'Date', 'Time',
'PdDistrict', 'Resolution', 'Address', 'X', 'Y', 'Location'],
dtype='object')

How many variables are contained in the SF data frame (ignore the Index)?

c) Use the function len to determine the number of rows in the dataset.

In [32]:
# Code cell 6
len(SF)
Out[32]:
30760
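The row and column counts from steps b) and c) can also be read in one call through the shape attribute. A minimal sketch on a small made-up frame (the name toy and its values are illustrative and not taken from the lab dataset):

```python
import pandas as pd

# Illustrative data only; in the lab you would inspect SF instead.
toy = pd.DataFrame({
    "Category": ["ASSAULT", "BURGLARY", "ASSAULT"],
    "PdDistrict": ["CENTRAL", "MISSION", "PARK"],
})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(toy) == n_rows)          # len counts rows
print(len(toy.columns) == n_cols)  # the length of columns counts variables
```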


Part 3: Prepare the Data

Now that you have the data loaded into the work environment and determined the analysis you want to
perform, it is time to prepare the data for analysis.

Step 1: Extract the month and day from the Date field.

lambda is a Python keyword to define so-called anonymous functions. lambda allows you to specify a
function in one line of code, without using def and without defining a specific name for it. The syntax for a
lambda expression is :
lambda parameters : expression.
In the following, the lambda function is used to create an inline function that selects only the month digits
from the Date variable, and int to transform a string representation into an integer. Then, the pandas
function apply is used to apply this function to an entire column (in practice, apply implicitly defines a for
loop and passes one by one the rows to the lambda function). The same procedure can be done for the
Day.

In [33]:
# Code cell 7
SF['Month'] = SF['Date'].apply(lambda row: int(row[0:2]))
SF['Day'] = SF['Date'].apply(lambda row: int(row[3:5]))
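The slicing used in code cell 7 can be checked on a single Date string, outside of pandas. The sketch below also shows a named function written with def that is equivalent to the lambda; the names get_month, get_day, and get_month_def are chosen here for illustration only.

```python
# One Date value in the format used by the dataset.
date_string = "08/31/2014 07:00:00 AM +0000"

# Anonymous (lambda) versions, as in code cell 7.
get_month = lambda row: int(row[0:2])   # characters 0-1 hold the month
get_day = lambda row: int(row[3:5])     # characters 3-4 hold the day


# Equivalent named function written with def.
def get_month_def(row):
    return int(row[0:2])


print(get_month(date_string), get_day(date_string))
print(get_month(date_string) == get_month_def(date_string))
```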

To verify that these two variables were added to the SF data frame, use the print function to print some
values from these columns, and type to check that these new columns do indeed contain numerical values.

In [34]:
# Code cell 8
print(SF['Month'][0:2])
print(SF['Day'][0:2])
0 8
1 8
Name: Month, dtype: int64
0 31
1 31
Name: Day, dtype: int64

In [35]:
# Code cell 9
print(type(SF['Month'][0]))
<class 'numpy.int64'>
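As an alternative to slicing characters with apply, pandas can parse the dates and expose the month and day through the .dt accessor. This is a sketch of that alternative, not the method the lab uses; the two sample strings below are illustrative and only the leading MM/DD/YYYY part is parsed.

```python
import pandas as pd

# Illustrative Date strings in the dataset's format.
dates = pd.Series([
    "08/31/2014 07:00:00 AM +0000",
    "06/01/2014 07:00:00 AM +0000",
])

# Parse only the leading MM/DD/YYYY part, then read month and day
# through the .dt accessor instead of slicing characters by hand.
parsed = pd.to_datetime(dates.str[:10], format="%m/%d/%Y")
months = parsed.dt.month
days = parsed.dt.day

print(months.tolist())
print(days.tolist())
```

Parsing has the side benefit of failing loudly on a malformed date, whereas character slicing would silently extract the wrong digits.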

Step 2: Remove variables from the SF data frame.

a) The column IncidntNum contains many cells with NaN, meaning the data is missing. Furthermore,
IncidntNum does not provide any value to the analysis, so the column can be dropped from the data frame.
One way to remove unwanted variables from a data frame is the del statement.

In [36]:
# Code cell 10
del SF['IncidntNum']


b) Similarly, the Location attribute will not be used in this analysis, so it can be dropped from the data frame.
As an alternative to del, you can use the drop function on the data frame, specifying axis=1 to drop a column
(0 is for rows) and inplace=True so that the data frame is modified directly instead of the result being assigned to another variable.

In [37]:
# Code cell 11
SF.drop('Location', axis=1, inplace=True )
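When the original frame should stay untouched, drop can be called without inplace=True, in which case it returns a new frame. A minimal sketch on a made-up frame (the name toy and its values are illustrative; the column names mirror the lab):

```python
import pandas as pd

# Illustrative frame; column names mirror the lab, values are made up.
toy = pd.DataFrame({
    "IncidntNum": [None, None],
    "Category": ["ASSAULT", "BURGLARY"],
    "Location": ["(37.79, -122.41)", "(37.76, -122.42)"],
})

# Without inplace=True, drop returns a new frame and leaves toy unchanged.
trimmed = toy.drop(columns=["IncidntNum", "Location"])

print(list(trimmed.columns))
print(list(toy.columns))
```

The columns= keyword is equivalent to passing the labels together with axis=1, and it accepts several columns at once.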

In [38]:
SF
Out[38]:
       Category        Descript                               DayOfWeek  \
0      LARCENY/THEFT   GRAND THEFT FROM UNLOCKED AUTO         Sunday
1      LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO           Sunday
2      LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO           Sunday
3      DRUG/NARCOTIC   POSSESSION OF METH-AMPHETAMINE         Sunday
4      DRUG/NARCOTIC   POSSESSION OF COCAINE                  Sunday
...    ...             ...                                    ...
30755  LARCENY/THEFT   PETTY THEFT SHOPLIFTING                Sunday
30756  OTHER OFFENSES  DRIVERS LICENSE, SUSPENDED OR REVOKED  Sunday
30757  ASSAULT         BATTERY                                Sunday
30758  ASSAULT         ASSAULT WITH CAUSTIC CHEMICALS         Sunday
30759  OTHER OFFENSES  DRIVERS LICENSE, SUSPENDED OR REVOKED  Sunday

       Date                          Time   PdDistrict  Resolution  \
0      08/31/2014 07:00:00 AM +0000  20:30  CENTRAL     NONE
1      08/31/2014 07:00:00 AM +0000  14:30  CENTRAL     NONE
2      08/31/2014 07:00:00 AM +0000  11:30  CENTRAL     NONE
3      08/31/2014 07:00:00 AM +0000  17:49  MISSION     ARREST, BOOKED
4      08/31/2014 07:00:00 AM +0000  18:05  NORTHERN    ARREST, BOOKED
...    ...                           ...    ...         ...
30755  06/01/2014 07:00:00 AM +0000  15:30  SOUTHERN    ARREST, BOOKED
30756  06/01/2014 07:00:00 AM +0000  16:00  NORTHERN    ARREST, CITED
30757  06/01/2014 07:00:00 AM +0000  15:00  TENDERLOIN  ARREST, CITED
30758  06/01/2014 07:00:00 AM +0000  15:20  CENTRAL     NONE
30759  06/01/2014 07:00:00 AM +0000  13:15  INGLESIDE   ARREST, CITED

       Address                   X            Y          Month  Day
0      HYDE ST / CALIFORNIA ST   -122.417393  37.790974  8      31
1      COLUMBUS AV / JACKSON ST  -122.404418  37.796302  8      31
2      SUTTER ST / STOCKTON ST   -122.406959  37.789435  8      31
3      16TH ST / MISSION ST      -122.419672  37.765050  8      31
4      LARKIN ST / OFARRELL ST   -122.417904  37.785167  8      31
...    ...                       ...          ...        ...    ...
30755  900.0 Block of MARKET ST  -122.408052  37.783957  6      1
30756  POLK ST / MCALLISTER ST   -122.418601  37.780261  6      1
30757  0.0 Block of JONES ST     -122.412122  37.781379  6      1
30758  200.0 Block of GEARY ST   -122.407434  37.787494  6      1
30759  MISSION ST / BOSWORTH ST  -122.426391  37.733675  6      1

30760 rows × 12 columns

c) Check that the columns have been removed.

In [39]:
# Code cell 12
SF.columns
Out[39]:
Index(['Category', 'Descript', 'DayOfWeek', 'Date', 'Time', 'PdDistrict',
'Resolution', 'Address', 'X', 'Y', 'Month', 'Day'],
dtype='object')

Part 4: Analyze the Data

Now that the data frame has been prepared with the data, it is time to analyze the data.

Step 1: Summarize variables to obtain statistical information.

a) Use the function value_counts to summarize the number of crimes committed by type, then print to
display the contents of the CountCategory variable.

In [40]:
# Code cell 13
CountCategory = SF['Category'].value_counts()
print(CountCategory)
LARCENY/THEFT 8205
OTHER OFFENSES 4004
NON-CRIMINAL 3653
ASSAULT 2518
VEHICLE THEFT 1885
...
LOITERING 5
BAD CHECKS 3
PORNOGRAPHY/OBSCENE MAT 1
BRIBERY 1
GAMBLING 1
Name: Category, Length: 36, dtype: int64

b) By default, the counts are ordered in descending order. The value of the optional parameter ascending
can be set to True to reverse this behavior.


In [41]:
# Code cell 14
SF['Category'].value_counts(ascending=True)
Out[41]:
GAMBLING 1
BRIBERY 1
PORNOGRAPHY/OBSCENE MAT 1
BAD CHECKS 3
LOITERING 5
...
VEHICLE THEFT 1885
ASSAULT 2518
NON-CRIMINAL 3653
OTHER OFFENSES 4004
LARCENY/THEFT 8205
Name: Category, Length: 36, dtype: int64

What type of crime was committed the most?

c) By nesting the two functions into one command, you can accomplish the same result with one line of
code.

In [42]:
# Code cell 15
print(SF['Category'].value_counts(ascending=True))
GAMBLING 1
BRIBERY 1
PORNOGRAPHY/OBSCENE MAT 1
BAD CHECKS 3
LOITERING 5
...
VEHICLE THEFT 1885
ASSAULT 2518
NON-CRIMINAL 3653
OTHER OFFENSES 4004
LARCENY/THEFT 8205
Name: Category, Length: 36, dtype: int64
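Instead of reading the top of the printed list, the most frequent category can be extracted directly with idxmax, and normalize=True turns the counts into fractions. A minimal sketch on made-up values (the Series categories below is illustrative and does not reproduce the lab's counts):

```python
import pandas as pd

# Illustrative categories only, not the lab's counts.
categories = pd.Series(
    ["LARCENY/THEFT", "ASSAULT", "LARCENY/THEFT", "BURGLARY", "LARCENY/THEFT", "ASSAULT"]
)

counts = categories.value_counts()
most_common = counts.idxmax()                     # label with the highest count
shares = categories.value_counts(normalize=True)  # fractions instead of counts

print(most_common, counts.max())
print(round(shares[most_common], 2))
```

The same two calls work on any column, so they could also be applied to SF['PdDistrict'] for the challenge question.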

Challenge Question: Which PdDistrict had the most incidents of reported crime? Provide the Python
command(s) used to support your answer.

In [43]:
# code cell 16
# Possible code for the challenge question
print(SF['PdDistrict'].value_counts(ascending=True))
RICHMOND 1622
PARK 1800
TARAVAL 2038
TENDERLOIN 2449
INGLESIDE 2613
BAYVIEW 2970
NORTHERN 3205
CENTRAL 3867
MISSION 4011
SOUTHERN 6185
Name: PdDistrict, dtype: int64


Step 2: Subset the data into smaller data frames.

a) Logical indexing can be used to select only the rows for which a given condition is satisfied. For example,
the following code extracts only the crimes committed in August, and stores the result in a new DataFrame.

In [44]:
# Code cell 17
AugustCrimes = SF[SF['Month'] == 8]
AugustCrimes
Out[44]:
       Category                     Descript                                      DayOfWeek  \
0      LARCENY/THEFT                GRAND THEFT FROM UNLOCKED AUTO                Sunday
1      LARCENY/THEFT                GRAND THEFT FROM LOCKED AUTO                  Sunday
2      LARCENY/THEFT                GRAND THEFT FROM LOCKED AUTO                  Sunday
3      DRUG/NARCOTIC                POSSESSION OF METH-AMPHETAMINE                Sunday
4      DRUG/NARCOTIC                POSSESSION OF COCAINE                         Sunday
...    ...                          ...                                           ...
9715   NON-CRIMINAL                 AIDED CASE, MENTAL DISTURBED                  Friday
9716   OTHER OFFENSES               MISCELLANEOUS INVESTIGATION                   Friday
9717   ASSAULT                      THREATS AGAINST LIFE                          Friday
9718   DRIVING UNDER THE INFLUENCE  DRIVING WHILE UNDER THE INFLUENCE OF ALCOHOL  Friday
9719   SEX OFFENSES, FORCIBLE       ASSAULT TO RAPE WITH BODILY FORCE             Friday

       Date                          Time   PdDistrict  Resolution  \
0      08/31/2014 07:00:00 AM +0000  20:30  CENTRAL     NONE
1      08/31/2014 07:00:00 AM +0000  14:30  CENTRAL     NONE
2      08/31/2014 07:00:00 AM +0000  11:30  CENTRAL     NONE
3      08/31/2014 07:00:00 AM +0000  17:49  MISSION     ARREST, BOOKED
4      08/31/2014 07:00:00 AM +0000  18:05  NORTHERN    ARREST, BOOKED
...    ...                           ...    ...         ...
9715   08/01/2014 07:00:00 AM +0000  19:55  MISSION     NONE
9716   08/01/2014 07:00:00 AM +0000  22:47  RICHMOND    NONE
9717   08/01/2014 07:00:00 AM +0000  23:55  BAYVIEW     NONE
9718   08/01/2014 07:00:00 AM +0000  23:38  NORTHERN    ARREST, BOOKED
9719   08/01/2014 07:00:00 AM +0000  00:01  MISSION     NONE

       Address                       X            Y          Month  Day
0      HYDE ST / CALIFORNIA ST       -122.417393  37.790974  8      31
1      COLUMBUS AV / JACKSON ST      -122.404418  37.796302  8      31
2      SUTTER ST / STOCKTON ST       -122.406959  37.789435  8      31
3      16TH ST / MISSION ST          -122.419672  37.765050  8      31
4      LARKIN ST / OFARRELL ST       -122.417904  37.785167  8      31
...    ...                           ...          ...        ...    ...
9715   1100.0 Block of POTRERO AV    -122.406497  37.754279  8      1
9716   1500.0 Block of BRODERICK ST  -122.441458  37.784427  8      1
9717   400.0 Block of TUNNEL AV      -122.401364  37.709748  8      1
9718   OAK ST / LAGUNA ST            -122.425892  37.774599  8      1
9719   1000.0 Block of POTRERO AV    -122.406657  37.756826  8      1

9720 rows × 12 columns

How many crime incidents were there for the month of August?

How many burglaries were reported in the month of August?

In [45]:
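# code cell 18
# Possible code for the question: How many burglaries were reported in the month of August?
# Filter the August subset (not the full SF data frame) so that only August burglaries are counted.
AugustCrimes = SF[SF['Month'] == 8]
AugustCrimesB = AugustCrimes[AugustCrimes['Category'] == 'BURGLARY']
len(AugustCrimesB)

Note that filtering the full data frame instead, with SF[SF['Category'] == 'BURGLARY'], returns 1257, which is the number of burglaries across all three months rather than for August alone.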

b) To create a subset of the SF data frame for a specific day, use the query function to compare
Month and Day at the same time.

In [46]:
# Code cell 19
Crime0704 = SF.query('Month == 7 and Day == 4')
Crime0704
Out[46]:
       Category        Descript                                          DayOfWeek  \
19087  LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO                      Friday
19088  LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO                      Friday
19089  BURGLARY        BURGLARY,RESIDENCE UNDER CONSTRT, FORCIBLE ENTRY  Friday
19090  NON-CRIMINAL    LOST PROPERTY                                     Friday
19091  ASSAULT         BATTERY                                           Friday
...    ...             ...                                               ...
19423  LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO                      Friday
19424  OTHER OFFENSES  LOST/STOLEN LICENSE PLATE                         Friday
19425  LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO                      Friday
19426  LARCENY/THEFT   GRAND THEFT FROM LOCKED AUTO                      Friday
19427  LARCENY/THEFT   PETTY THEFT OF PROPERTY                           Friday

       Date                          Time   PdDistrict  Resolution  \
19087  07/04/2014 07:00:00 AM +0000  22:30  SOUTHERN    NONE
19088  07/04/2014 07:00:00 AM +0000  18:15  SOUTHERN    NONE
19089  07/04/2014 07:00:00 AM +0000  00:50  TARAVAL     NONE
19090  07/04/2014 07:00:00 AM +0000  19:00  PARK        NONE
19091  07/04/2014 07:00:00 AM +0000  21:00  NORTHERN    NONE
...    ...                           ...    ...         ...
19423  07/04/2014 07:00:00 AM +0000  19:25  SOUTHERN    NONE
19424  07/04/2014 07:00:00 AM +0000  11:00  INGLESIDE   NONE
19425  07/04/2014 07:00:00 AM +0000  20:30  SOUTHERN    NONE
19426  07/04/2014 07:00:00 AM +0000  08:00  SOUTHERN    NONE
19427  07/04/2014 07:00:00 AM +0000  15:30  RICHMOND    NONE

       Address                                X            Y          Month  Day
19087  8TH ST / MISSION ST                    -122.413161  37.777457  7      4
19088  CLEMENTINA ST / 9TH ST                 -122.412174  37.774201  7      4
19089  0.0 Block of MENDOSA AV                -122.466414  37.748011  7      4
19090  CASTRO ST / 16TH ST                    -122.435318  37.764102  7      4
19091  1000.0 Block of POLK ST                -122.419783  37.785894  7      4
...    ...                                    ...          ...        ...    ...
19423  THE EMBARCADEROSOUTH ST / BRYANT ST    -122.388007  37.787103  7      4
19424  0.0 Block of FRATESSA CT               -122.399762  37.716129  7      4
19425  THE EMBARCADEROSOUTH ST / HARRISON ST  -122.388486  37.789573  7      4
19426  11TH ST / HARRISON ST                  -122.412483  37.770631  7      4
19427  3900.0 Block of GEARY BL               -122.461295  37.781181  7      4

341 rows × 12 columns

In [47]:
# Code cell 20
SF.columns
Out[47]:
Index(['Category', 'Descript', 'DayOfWeek', 'Date', 'Time', 'PdDistrict',
'Resolution', 'Address', 'X', 'Y', 'Month', 'Day'],
dtype='object')
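The two subsetting approaches from Step 2, logical indexing and query, can be compared side by side when more than one condition is involved. A minimal sketch on a made-up frame (the name toy and its values are illustrative; the column names mirror the lab):

```python
import pandas as pd

# Illustrative rows; column names mirror the lab, values are made up.
toy = pd.DataFrame({
    "Category": ["BURGLARY", "ASSAULT", "BURGLARY", "BURGLARY"],
    "Month": [8, 8, 7, 8],
    "Day": [1, 1, 4, 15],
})

# Logical indexing: each condition needs its own parentheses,
# and & (not the keyword "and") combines the two boolean Series.
mask_result = toy[(toy["Month"] == 8) & (toy["Category"] == "BURGLARY")]

# The same selection expressed as a query string.
query_result = toy.query('Month == 8 and Category == "BURGLARY"')

print(len(mask_result), len(query_result))
print(mask_result.equals(query_result))
```

Both forms select the same rows; query tends to read more naturally for several conditions, while logical indexing also works when column names are not valid Python identifiers.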

Part 5: Visualize the Data

Visualization and presentation of the data provide an instant overview that might not be apparent from simply
looking at the raw data. The SF data frame contains longitude (X) and latitude (Y) coordinates that can be used to
plot the data.


Step 1: Plot a graph of the SF data frame using the X and Y variables.

a) Use the plot() function to plot the SF data frame. The optional format string 'ro' plots the points in red (r)
and sets the marker shape to a circle (o).

In [48]:
# Code cell 21
plt.plot(SF['X'],SF['Y'], 'ro')
plt.show()
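The 'ro' format string can be inspected on a handful of points to see what it sets. The sketch below uses made-up coordinates and adds axis labels; the Agg backend line is an assumption so that the example runs without a display, and it can be left out inside a notebook that uses %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A few illustrative longitude/latitude pairs (made up, roughly in San Francisco).
x = [-122.42, -122.41, -122.40]
y = [37.77, 37.78, 37.79]

fig, ax = plt.subplots()
# 'ro' is shorthand: 'r' selects red and 'o' selects circle markers.
# Because no line style is given, the points are not connected.
(points,) = ax.plot(x, y, "ro")
ax.set_xlabel("X (longitude)")
ax.set_ylabel("Y (latitude)")

print(points.get_color(), points.get_marker())
```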

b) Identify the police department districts, then build the dictionary pd_districts_levels to associate each
district name with an integer.

In [49]:
# Code cell 22
pd_districts = np.unique(SF['PdDistrict'])
pd_districts_levels = dict(zip(pd_districts, range(len(pd_districts))))
pd_districts_levels
Out[49]:
{'BAYVIEW': 0,
'CENTRAL': 1,
'INGLESIDE': 2,
'MISSION': 3,
'NORTHERN': 4,
'PARK': 5,
'RICHMOND': 6,
'SOUTHERN': 7,
'TARAVAL': 8,
'TENDERLOIN': 9}

c) Use apply and lambda to add the police department integer ID to a new column of the DataFrame.


In [50]:
# Code cell 23
SF['PdDistrictCode'] = SF['PdDistrict'].apply(lambda row: pd_districts_levels[row])
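The dictionary lookup in code cell 23 can also be written with map, and pandas can generate equivalent integer codes on its own through the category dtype. This is a sketch of those alternatives on a few illustrative district names; the variable names are chosen here and are not part of the lab.

```python
import numpy as np
import pandas as pd

# Illustrative district names, a subset of those in the lab.
districts = pd.Series(["MISSION", "CENTRAL", "SOUTHERN", "CENTRAL"])

# Same dictionary construction as code cell 22.
labels = np.unique(districts)
levels = dict(zip(labels, range(len(labels))))

codes_from_map = districts.map(levels)                   # dictionary lookup per value
codes_from_cat = districts.astype("category").cat.codes  # pandas-generated codes

print(codes_from_map.tolist())
print(codes_from_cat.tolist())
```

Both approaches number the districts in alphabetical order, so they produce the same codes here.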

d) Use the newly created PdDistrictCode column to automatically change the color of each point.

In [51]:
# Code cell 24
plt.scatter(SF['X'], SF['Y'], c=SF['PdDistrictCode'])
plt.show()
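A scatter plot colored by integer code is easier to read when the colors are tied back to district names. One possible way, sketched below with made-up points, is a colorbar with one labeled tick per code. The viridis colormap, the variable names, and the Agg backend line are assumptions for this example; the backend line can be omitted inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative districts, coordinates, and codes (made up).
district_names = ["CENTRAL", "MISSION", "SOUTHERN"]
x = [-122.41, -122.42, -122.40, -122.41]
y = [37.79, 37.76, 37.78, 37.80]
codes = [0, 1, 2, 0]

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=codes, cmap="viridis")

# One colorbar tick per district code, labeled with the district name.
cbar = fig.colorbar(scatter, ax=ax, ticks=list(range(len(district_names))))
cbar.ax.set_yticklabels(district_names)
```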

Step 2: Add Map packages to enhance the plot.

In Step 1, you created a simple plot that displays where crime incidents took place in SF County. This plot is
useful, but folium provides additional functions that will allow you to overlay this plot onto an OpenStreetMap
map.

a) Folium requires the color of the marker to be specified using a hexadecimal value. For this reason, we
use the colors package from matplotlib and select the necessary colors.

In [52]:
# Code cell 25
from matplotlib import colors
districts = np.unique(SF['PdDistrict'])
print(list(colors.cnames.values())[0:len(districts)])
['#9932CC', '#FAEBD7', '#778899', '#00FF7F', '#C71585', '#3CB371', '#00FFFF', '#556B2F', '#808080', '#FFA07A']

b) Create a color dictionary for each police department district.

In [53]:
# Code cell 26
color_dict = dict(zip(districts, list(colors.cnames.values())[0:-1:len(districts)]))
color_dict
Out[53]:
{'BAYVIEW': '#9932CC',
'CENTRAL': '#FFA500',
'INGLESIDE': '#FFF8DC',
'MISSION': '#FF7F50',
'NORTHERN': '#A0522D',
'PARK': '#FFE4B5',
'RICHMOND': '#FFB6C1',
'SOUTHERN': '#5F9EA0',
'TARAVAL': '#C0C0C0',
'TENDERLOIN': '#191970'}
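The colors picked from colors.cnames depend on the order of that dictionary, so some districts can end up with similar shades. One possible alternative, sketched here as an assumption rather than as the lab's method, is to take ten distinct colors from the qualitative tab10 colormap and convert them to hexadecimal strings with colors.to_hex.

```python
import matplotlib.pyplot as plt
from matplotlib import colors

# District names as listed in the lab output.
districts = ["BAYVIEW", "CENTRAL", "INGLESIDE", "MISSION", "NORTHERN",
             "PARK", "RICHMOND", "SOUTHERN", "TARAVAL", "TENDERLOIN"]

# 'tab10' is a qualitative colormap with ten visually distinct colors.
cmap = plt.get_cmap("tab10")

# to_hex converts an RGBA tuple into the '#rrggbb' form that folium accepts.
color_dict = {name: colors.to_hex(cmap(i)) for i, name in enumerate(districts)}

print(color_dict["BAYVIEW"])
```

The resulting dictionary has the same shape as the one in code cell 26, so it could be passed to the marker loop unchanged.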

c) Create the map using the mean coordinates of the SF data to center the map. To reduce
the computation time, plotEvery is used to limit the amount of plotted data. Set this value to 1 to plot all the rows
(the map might then take a long time to render).

In [54]:
# Code cell 27
# Create map
map_osm = folium.Map(location=[SF['Y'].mean(), SF['X'].mean()], zoom_start = 12)
plotEvery = 50
obs = list(zip( SF['Y'], SF['X'], SF['PdDistrict']))

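for el in obs[0:-1:plotEvery]:
    # el is (latitude, longitude, district); use the district's hex color for both outline and fill
    folium.CircleMarker(el[0:2], color=color_dict[el[2]],
                        fill_color=color_dict[el[2]], radius=10).add_to(map_osm)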

In [55]:
# Code cell 28
map_osm
Out[55]:
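The effect of plotEvery on the number of markers can be checked without drawing a map. The sketch below uses a list of placeholder integers as a stand-in for obs, with the same length as the SF data frame, and compares the slice from code cell 27 with a slice that has no explicit stop; the names sampled and sampled_all are chosen here for illustration.

```python
# Stand-in for the obs list: 30760 placeholder entries, one per incident.
obs = list(range(30760))
plotEvery = 50

sampled = obs[0:-1:plotEvery]   # slice used in code cell 27
sampled_all = obs[::plotEvery]  # same stride without an explicit stop

print(len(sampled), len(sampled_all))
print(len(obs[0:-1:1]), len(obs))
```

With a stride of 50 both slices keep the same number of rows, but the stop value of -1 always excludes the final row, so setting plotEvery to 1 with the original slice plots one row fewer than the full dataset.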
