Crime Analysis
By
Sridhar Akula
Introduction
• Data science has long been used to advance human growth and prosperity. The advent of Big Data has created a new paradigm in data analysis and has had a significant impact on the decision-making processes of many organisations and departments, particularly when dealing with unstructured data.
• Big Data analysis has already proved valuable in law enforcement and can make police departments more effective, accountable, efficient, and proactive. As Hadoop continues to spread through law enforcement agencies, it has the potential to permanently change how policing is practiced and administered.
Problem Statement
• The dataset describes about 35,000 crime incidents that occurred in San Francisco over the previous three months.
Write programs to answer the following queries on this data:
• Build a regression model for predicting the category (and occurrence) of crime based on the remaining attributes.
• https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry
Execution Steps
• Step 1: Download the three months of data from the website using the link above.
• Step 2: Pre-process the data before ingesting it into the Hadoop Distributed File System (HDFS).
– Replace commas inside field values with blanks, so that commas only separate fields, and drop the Date and Location attributes.
• Step 3: Create a directory named “project7” under “/tmp”.
• Step 4: Copy the downloaded data file to “/tmp/project7” on the Cloudera virtual machine.
• Step 5: Open a terminal, create a database in MySQL, create a table in that database, and load the data file into the table.
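The Step 2 pre-processing can be sketched in Python with the standard csv module. This is a minimal sketch, assuming the downloaded file has a header row that names the columns Date and Location; the function name preprocess and the sample row are hypothetical. Parsing with csv.reader matters because a quoted value such as the Location pair contains its own comma.

```python
import csv
import io

DROP_COLUMNS = {"Date", "Location"}  # attributes dropped in Step 2

def preprocess(src, dst):
    """Hypothetical helper: copy CSV rows from src to dst, dropping the Date and
    Location columns and replacing any comma inside a field value with a blank."""
    reader = csv.reader(src)  # handles quoted fields such as "(37.765, -122.419)"
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in DROP_COLUMNS]
    dst.write(",".join(header[i] for i in keep) + "\n")
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Plain comma-separated output, one record per '\n'-terminated line
        dst.write(",".join(row[i].replace(",", " ") for i in keep) + "\n")

# Tiny in-memory sample with the SFPD column layout (values are made up)
sample = io.StringIO(
    "IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,"
    "Address,X,Y,Location,PdId\n"
    '1,ASSAULT,"BATTERY, FORMER SPOUSE",Monday,01/02/2017,13:05,MISSION,NONE,'
    '16TH ST / MISSION ST,-122.419,37.765,"(37.765, -122.419)",100\n'
)
out = io.StringIO()
preprocess(sample, out)
print(out.getvalue())
```

On the virtual machine, the same function would be called with real files opened with newline="", writing the result to /tmp/project7/SFPD.csv so that each line has 11 comma-separated fields.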
Commands
• Open a terminal and switch to the root user with the “su -” command (password: cloudera).
• Start the MySQL service with the command “service mysqld start”.
• Type “mysql” to enter the MySQL shell.
• Once MySQL has started, you will see the “mysql>” prompt. Now write MySQL statements to create the database and table, and then load the data into the table.
• Create a database named crime with the command “create database crime;”.
• Switch to the crime database with the command “use crime;”.
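• Create a table with the command “create table <tablename>(<attribute> <datatype>, ...);”. For this dataset, X and Y hold fractional coordinates, so they are declared as double (a bare decimal in MySQL means decimal(10,0) and would drop the fractional part), and PdId values exceed the int range, so PdId is declared as bigint:
• create table crimeinfo(IncidntNum int, Category varchar(40), Descript varchar(100), DayOfWeek varchar(100), Time time, PdDistrict varchar(100), Resolution varchar(200), Address varchar(200), X double, Y double, PdId bigint);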
• Load the data from the “/tmp/project7” directory into the crimeinfo table of the crime database with the following command:
• LOAD DATA LOCAL INFILE '/tmp/project7/SFPD.csv' INTO TABLE crimeinfo FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
• Once the data is loaded into the crime database, run queries to view and analyse it as required.
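The table definition and load can be rehearsed without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is a substitution and not the MySQL workflow itself: SQLite has no LOAD DATA INFILE, so the rows are read with the csv module and inserted with executemany, and the header row is skipped to mimic IGNORE 1 LINES. The column types are SQLite's closest equivalents (real for the coordinates, integer for the identifiers), and the two sample rows are made up.

```python
import csv
import io
import sqlite3

# Stand-in for /tmp/project7/SFPD.csv after pre-processing (made-up rows)
cleaned = io.StringIO(
    "IncidntNum,Category,Descript,DayOfWeek,Time,PdDistrict,Resolution,Address,X,Y,PdId\n"
    "1,ASSAULT,BATTERY,Monday,13:05,MISSION,NONE,16TH ST / MISSION ST,-122.419,37.765,100\n"
    "2,LARCENY/THEFT,GRAND THEFT,Friday,18:40,SOUTHERN,NONE,800 Block of MARKET ST,-122.407,37.784,200\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table crimeinfo(IncidntNum integer, Category text, Descript text,
       DayOfWeek text, Time text, PdDistrict text, Resolution text, Address text,
       X real, Y real, PdId integer)"""
)

reader = csv.reader(cleaned)
next(reader)  # like IGNORE 1 LINES: skip the header row
# Column affinity converts numeric-looking strings to integer/real values
conn.executemany("insert into crimeinfo values (?,?,?,?,?,?,?,?,?,?,?)", reader)
conn.commit()

# Quick look at the loaded data
for row in conn.execute(
    "select Category, count(*) from crimeinfo group by Category order by Category"
):
    print(row)
```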
Sqoop
• Step 6: Use Sqoop to ingest the data into HDFS, now that it is in structured form.
• Open a new terminal.
• Ingest the MySQL data into HDFS with the following Sqoop command:
• sqoop import --connect jdbc:mysql://localhost/crime --table
crimeinfo --username root -m 1 --hive-import
• The above command ingests the data into HDFS and also imports it into Hive; because no Hive database is specified, the table is created in the “default” database.
• Once ingestion is complete, Hive queries can be written to solve the given problem.
• Each query stated in the problem statement is now analysed on this dataset.
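When Step 6 is scripted, the same Sqoop command can be assembled as an argument list. This is a minimal sketch; the helper name sqoop_import_args is hypothetical and it emits only the flags already used in the command above. The -m 1 setting (short for --num-mappers) runs a single map task: Sqoop needs a primary key or a --split-by column to divide a table across several mappers, and crimeinfo is created without a primary key.

```python
import shlex

def sqoop_import_args(database, table, username, mappers=1, hive_import=True):
    """Hypothetical helper: build the argument list for the Step 6 `sqoop import`."""
    args = [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://localhost/{database}",
        "--table", table,
        "--username", username,
        "-m", str(mappers),  # number of parallel map tasks
    ]
    if hive_import:
        args.append("--hive-import")  # also create and load a Hive table
    return args

args = sqoop_import_args("crime", "crimeinfo", "root")
print(shlex.join(args))  # shell-ready form of the command
```

On the Cloudera VM, the list could be passed to subprocess.run(args, check=True), assuming sqoop is on the PATH.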
Hive
• Step 7: Open a browser and launch Hue (Hadoop User Experience), the Hadoop web user interface for writing Hive queries interactively.
• Relative frequencies of different types of crime
incidents:
-create table sfd_crime_tab_relfreq as select category, count(1) as no_crimes from crimeinfo group by category;
-select a.category, a.no_crimes, b.total, (a.no_crimes / b.total) * 100 as relfreq from sfd_crime_tab_relfreq a cross join (select sum(no_crimes) as total from sfd_crime_tab_relfreq) b order by category;
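The relative-frequency arithmetic, (no_crimes / total) * 100 for each category, can be checked on a small sample in plain Python. This sketch assumes the Category values have already been fetched into a list; the function name relative_frequencies and the sample values are hypothetical.

```python
from collections import Counter

def relative_frequencies(categories):
    """Hypothetical helper: percentage of incidents per category, ordered by category."""
    counts = Counter(categories)   # per-category count, like GROUP BY category
    total = sum(counts.values())   # like SUM(no_crimes) in the cross-joined subquery
    return {cat: counts[cat] / total * 100 for cat in sorted(counts)}

# Made-up Category values
sample = ["ASSAULT", "LARCENY/THEFT", "LARCENY/THEFT", "VANDALISM"]
result = relative_frequencies(sample)
print(result)  # {'ASSAULT': 25.0, 'LARCENY/THEFT': 50.0, 'VANDALISM': 25.0}
```

The percentages always sum to 100, which is a quick sanity check on the Hive output as well.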
HUE - (Hive UI)
• Crime occurrence frequency as a function of
day of the week
-select dayofweek, count(category) as no_crimes from crimeinfo group by dayofweek;
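The day-of-week query returns one count per DayOfWeek value, in no guaranteed order. The Python sketch below performs the same aggregation and lists the days in calendar order, reporting 0 for days with no incidents. It assumes DayOfWeek holds full English day names such as "Monday"; the function name crimes_by_day and the sample values are hypothetical.

```python
from collections import Counter

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def crimes_by_day(day_values):
    """Hypothetical helper: incident count per DayOfWeek, in calendar order."""
    counts = Counter(day_values)  # like GROUP BY dayofweek
    return [(day, counts[day]) for day in DAYS]

# Made-up DayOfWeek values
sample = ["Friday", "Monday", "Friday", "Sunday"]
by_day = crimes_by_day(sample)
print(by_day)
```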
HUE - (Hive UI): Result
HUE - (Hive UI)
• Crime occurrence frequency as a function of
hour of the day
-select hour(`time`) as hour_of_day, count(category) as no_crimes from crimeinfo group by hour(`time`);
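For an hour-of-day profile, each Time value is reduced to its hour before counting; grouping on the full Time value would instead give one row per distinct clock time. The Python sketch below assumes Time strings of the form HH:MM or HH:MM:SS and returns all 24 hours, including empty ones; the function name crimes_by_hour and the sample values are hypothetical.

```python
from collections import Counter

def crimes_by_hour(time_values):
    """Hypothetical helper: incident count for each hour 0-23 from 'HH:MM[:SS]' strings."""
    counts = Counter(int(t.split(":")[0]) for t in time_values)  # keep only the hour part
    return [(hour, counts[hour]) for hour in range(24)]

# Made-up Time values
sample = ["00:15", "13:05", "13:59:00", "23:53"]
by_hour = crimes_by_hour(sample)
print([pair for pair in by_hour if pair[1] > 0])  # [(0, 1), (13, 2), (23, 1)]
```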
HUE - (Hive UI): Result
HUE - (Hive UI)
• Regression model for predicting the category
(& occurrence) of crime based on the
remaining attributes
C:\Users\Sridhar\Desktop\Crime-ana Rcode.
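The R code for the regression model is not included in this document. As a point of reference only, the sketch below fits a much simpler frequency baseline, which is not a regression model: for each PdDistrict it predicts the most frequent Category, falling back to the overall most frequent Category for districts not seen in training. A useful model built on the remaining attributes would be expected to beat this baseline's accuracy. The function names and the training and test rows are hypothetical.

```python
from collections import Counter, defaultdict

def fit_majority_by_district(rows):
    """Hypothetical baseline. rows: iterable of (PdDistrict, Category) pairs.
    Returns (district -> most frequent category, overall most frequent category)."""
    per_district = defaultdict(Counter)
    overall = Counter()
    for district, category in rows:
        per_district[district][category] += 1
        overall[category] += 1
    model = {d: c.most_common(1)[0][0] for d, c in per_district.items()}
    return model, overall.most_common(1)[0][0]

def predict(model, fallback, district):
    """Predicted category; unseen districts get the overall majority category."""
    return model.get(district, fallback)

def accuracy(model, fallback, rows):
    """Fraction of (district, category) rows whose category is predicted correctly."""
    rows = list(rows)
    hits = sum(predict(model, fallback, d) == c for d, c in rows)
    return hits / len(rows)

# Made-up training and test rows
train = [
    ("MISSION", "ASSAULT"), ("MISSION", "ASSAULT"), ("MISSION", "LARCENY/THEFT"),
    ("SOUTHERN", "LARCENY/THEFT"), ("SOUTHERN", "LARCENY/THEFT"), ("SOUTHERN", "VANDALISM"),
]
test = [("MISSION", "ASSAULT"), ("SOUTHERN", "VANDALISM"), ("BAYVIEW", "LARCENY/THEFT")]
model, fallback = fit_majority_by_district(train)
print(model, fallback, accuracy(model, fallback, test))
```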