Crime Analysis

Project 7 analyzes crime data from San Francisco to understand crime patterns. The document outlines steps to download crime incident data, preprocess it, load it into HDFS and MySQL, then analyze it using Hive queries. Key analyses include determining relative frequencies of crime types, crime occurrences by day of week and hour, and building a regression model to predict crime category based on attributes. Hive queries on the data in HDFS are demonstrated to analyze crime patterns.

Uploaded by

aashrit

Project 7

Crime Analysis
By
Sridhar Akula
Introduction
• Data science has long been applied to problems of public
benefit. Big Data has brought a new paradigm to data analysis
and has had a significant impact on decision making in many
organisations and departments, particularly when they deal
with unstructured data.
• Big Data analysis has already been highly effective in law
enforcement and can make police departments more
effective, accountable, efficient, and proactive. As Hadoop
continues to spread through law enforcement agencies, it has
the potential to permanently change the way policing is
practiced and administered.
Problem Statement
• The dataset covers about 35,000 crime incidents reported in the city of San Francisco over
the previous three months.
Write programs to answer the following queries on this data:

– Relative frequencies of different types of crime incidents

– Crime occurrence frequency as a function of day of the week

– Crime occurrence frequency as a function of hour of the day

– Regression model for predicting the category (& occurrence) of crime based on the remaining
attributes

The data can be downloaded from the following link:

• https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry
Execution Steps
• Step 1: Download the three months of data from the website
at the link given in the problem statement.
• Step 2: Pre-process the data before ingesting it into the
Hadoop Distributed File System (HDFS).
– Replace commas embedded in field values with blanks (otherwise they
would be read as field separators) and drop the attributes Date and
Location.
• Step 3: Create a directory named “project7” under “/tmp”.
• Step 4: Copy the downloaded data file into the virtual machine
for the Cloudera user, at “/tmp/project7”.
• Step 5: Open a terminal, create a database in MySQL, create
a table in that database, and load the data file into the table.
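The pre-processing in Step 2 can be sketched in Python as follows. This is a minimal sketch, not the author's script: it assumes the downloaded CSV has a header row with columns named Date and Location, the function name preprocess is hypothetical, and the sample row is made up for illustration.

```python
import csv
import io

# Assumption: the columns to drop are named exactly "Date" and "Location".
DROP_COLUMNS = {"Date", "Location"}

def preprocess(src, dst):
    """Copy CSV rows from src to dst, dropping DROP_COLUMNS and
    replacing commas inside field values with blanks."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in DROP_COLUMNS]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        # With no commas left in any value, a plain FIELDS TERMINATED BY ','
        # load can split the line safely.
        writer.writerow([row[i].replace(",", " ") for i in keep])

# Made-up sample in the shape of the SFPD export (illustration only).
SAMPLE = (
    "IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,"
    "Resolution,Address,X,Y,Location,PdId\n"
    '150098210,ROBBERY,"ROBBERY, BODILY FORCE",Sunday,02/01/2015,15:45,'
    'TENDERLOIN,NONE,EDDY ST / JONES ST,-122.41,37.78,'
    '"(37.78, -122.41)",15009821003074\n'
)

cleaned = io.StringIO()
preprocess(io.StringIO(SAMPLE), cleaned)
print(cleaned.getvalue())
```

For a real file, the same function can be called with two handles opened with newline="" (one for reading the download, one for writing the cleaned copy).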
Commands
• Open a terminal and switch to the root user with the “su -” command (password: cloudera).
• Start the MySQL service with the command “service mysqld start”.
• Type “mysql” to enter the MySQL shell.
• Once MySQL has started you will see the “mysql>” prompt. Now write MySQL statements to create the database
and the table, and then load the data into the table.
• Create a database named crime with the command “create database crime;”.
• Switch to the crime database with the command “use crime;”.
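• Create a table with a command of the form “create table <tablename>(<attribute> <datatype>, ...);”.
• create table crimeinfo(IncidntNum int(40),Category varchar(40),Descript varchar(100),DayOfWeek
varchar(100),Time time,PdDistrict varchar(100),Resolution varchar(200),Address varchar(200),X decimal(10,6),Y
decimal(10,6),PdId bigint);
• X and Y are given an explicit scale because a bare DECIMAL in MySQL means DECIMAL(10,0) and would drop the
fractional part of the coordinates (six decimal places is an assumed precision). PdId is declared bigint on the
assumption that its values exceed the INT range.
• Load the data from the “/tmp/project7” directory into the crimeinfo table with the command:
• LOAD DATA LOCAL INFILE '/tmp/project7/SFPD.csv' INTO TABLE crimeinfo FIELDS TERMINATED BY ',' LINES
TERMINATED BY '\n' IGNORE 1 LINES;
• Once the data is loaded into the crime database, run queries to view it and carry out any further analysis required.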
Sqoop
• Step 6: Use Sqoop to ingest the data into Hadoop HDFS, now
that it is in structured form.
• Open a new terminal.
• Ingest the MySQL data into HDFS with the Sqoop command:
• sqoop import --connect jdbc:mysql://localhost/crime --table
crimeinfo --username root -m 1 --hive-import
• This command ingests the data into HDFS and also into
Hive; by default the table is created in Hive’s “default” database.
• Once ingestion is done, Hive queries can be written to solve
the given problem.
• Each of the queries stated in the problem statement is
now run against this data set.
Hive
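• Step 7: Open a browser and launch Hue (Hadoop User
Experience), the web interface used here to write Hive queries interactively.
• Relative frequencies of different types of crime
incidents:
- Count the incidents in each category:
select category, count(1) as no_crimes from crimeinfo group by
category;
- Divide each count by the overall total. The query below assumes the
counts above were saved as a table sfd_crime_tab_relfreq (columns
category, no_crimes); the slides do not show that step:
select a.no_crimes, a.category, b.total,
(a.no_crimes/b.total)*100 as relfreq from
sfd_crime_tab_relfreq a cross join (select
sum(no_crimes) as total from sfd_crime_tab_relfreq) b
order by category;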
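The relative-frequency calculation (count per category divided by the overall total, times 100) can be checked on a small sample outside Hive. The following is a minimal Python sketch of the same arithmetic; the function name relative_frequencies and the sample categories are illustrative assumptions, not part of the original project.

```python
from collections import Counter

def relative_frequencies(categories):
    """Return {category: (count, percent_of_total)}, sorted by category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in sorted(counts.items())}

# Made-up sample of the Category column (illustration only).
SAMPLE_CATEGORIES = ["LARCENY/THEFT", "ASSAULT", "LARCENY/THEFT", "VANDALISM"]

for cat, (n, pct) in relative_frequencies(SAMPLE_CATEGORIES).items():
    print(f"{cat}: {n} incidents, {pct:.1f}% of total")
```

The percentages always sum to 100 across categories, which gives a quick sanity check on the Hive result.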
HUE - (Hive UI)
• Crime occurrence frequency as a function of
day of the week
- select dayofweek, count(category) as no_crimes
from crimeinfo group by
dayofweek;
HUE - (Hive UI): Result
HUE - (Hive UI)
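• Crime occurrence frequency as a function of
hour of the day
- select hour(time) as hour_of_day, count(category) as no_crimes from
crimeinfo group by hour(time);
(hour() extracts the hour from the time value; grouping on time itself would give one row per distinct time of day rather than per hour.)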
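An hourly frequency needs the hour extracted from each time value before counting; grouping on the full time value yields one row per distinct minute instead. A minimal Python sketch of that bucketing follows, assuming times arrive as 'HH:MM' or 'HH:MM:SS' strings; the function name crimes_by_hour and the sample times are hypothetical.

```python
from collections import Counter

def crimes_by_hour(times):
    """Count incidents per hour of day (0-23) from 'HH:MM[:SS]' strings."""
    counts = Counter(int(t.split(":")[0]) for t in times)
    # Return all 24 buckets so that hours with no incidents show up as 0.
    return [counts[h] for h in range(24)]

# Made-up sample of the Time column (illustration only).
SAMPLE_TIMES = ["00:05", "15:45", "15:10:00", "23:59"]

for hour, n in enumerate(crimes_by_hour(SAMPLE_TIMES)):
    if n:
        print(f"{hour:02d}:00-{hour:02d}:59  {n} incident(s)")
```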
HUE - (Hive UI): Result
HUE - (Hive UI)
• Regression model for predicting the category
(& occurrence) of crime based on the
remaining attributes
C:\Users\Sridhar\Desktop\Crime-ana Rcode.
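The slides point to an R script for this step but do not show the model itself. As one possible reading of "regression model for predicting the category", the sketch below fits a multinomial logistic (softmax) regression on one-hot encoded attributes in plain Python. It is an illustrative assumption, not the author's R code: the choice of model, all function names, the training settings, and the four toy rows are made up.

```python
import math

def build_index(rows):
    """Assign a feature position to every distinct (column, value) pair."""
    index = {}
    for row in rows:
        for col, val in enumerate(row):
            index.setdefault((col, val), len(index))
    return index

def encode(row, index):
    """One-hot encode a row; the last position is a constant bias term."""
    x = [0.0] * (len(index) + 1)
    x[-1] = 1.0
    for col, val in enumerate(row):
        j = index.get((col, val))
        if j is not None:  # values unseen in training are ignored
            x[j] = 1.0
    return x

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(weights, x):
    return [sum(w * xi for w, xi in zip(row_w, x)) for row_w in weights]

def train(rows, labels, epochs=200, lr=0.5):
    """Fit softmax regression with plain stochastic gradient descent."""
    index = build_index(rows)
    classes = sorted(set(labels))
    encoded = [encode(r, index) for r in rows]
    weights = [[0.0] * len(encoded[0]) for _ in classes]
    for _ in range(epochs):
        for x, label in zip(encoded, labels):
            probs = softmax(class_scores(weights, x))
            for k, cls in enumerate(classes):
                err = probs[k] - (1.0 if cls == label else 0.0)
                for j, xi in enumerate(x):
                    weights[k][j] -= lr * err * xi
    return index, classes, weights

def predict(row, model):
    index, classes, weights = model
    scores = class_scores(weights, encode(row, index))
    return classes[scores.index(max(scores))]

# Toy (DayOfWeek, PdDistrict) rows with made-up categories.
ROWS = [("Friday", "TENDERLOIN"), ("Saturday", "TENDERLOIN"),
        ("Friday", "RICHMOND"), ("Monday", "RICHMOND")]
LABELS = ["DRUG/NARCOTIC", "DRUG/NARCOTIC", "VEHICLE THEFT", "VEHICLE THEFT"]

model = train(ROWS, LABELS)
print(predict(("Sunday", "TENDERLOIN"), model))
```

On the real data, the rows would carry the remaining attributes of crimeinfo (for example DayOfWeek, hour of day, and PdDistrict), and accuracy should be measured on incidents held out from training.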
References
• Olesker, A. (2012). Big Data Solutions for Law
Enforcement, CTO Labs.
• University of Strathclyde project on Using big
data analytics and genetic algorithms to
predict street crime and optimise crime
reduction measures.
• BIG DATA: SEIZING OPPORTUNITIES,
PRESERVING VALUES – White House
