Fyp Caps
Prepared by:
Mr. Noman Rashid
IS/BS(CS)/A-16/M/031
BS(CS) in Software Systems
Mr. Umair Ahmed
IS/BS(CS)/A-16/M/143
BS(CS) in Software Systems
Supervised by:
Dr. Saeed Ullah
Associate Professor
HOD.
Department of
Computer Science
Coordinated by:
Ms. Sabira Feroz
Department of Computer Science
Session [2016-2020]
Federal Urdu University of Arts, Science and Technology Islamabad
Abstract
“Crime Analysis & Prediction System”, based on predictive modeling, analyzes and predicts crimes on the basis of time and location.
To respond better to criminal activity, it is important to understand the patterns in crime. We analyze these patterns using crime datasets from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. This dataset includes different crimes in the city of Chicago. The major aim of this project is to predict the type of crime that is most likely to take place at a given time and place in Chicago. Finally, this paper uses several algorithms, namely Decision Tree, KNN, MLP, XG-Boost and FBProphet, for forecasting.
INTRODUCTION:
Historically, solving crimes has been the responsibility of criminal justice and law enforcement specialists. With the increasing use of computerized systems to track crimes and trace criminals, computer data analysts have started lending a hand to law enforcement officers and detectives to speed up the process of solving crimes. Criminology is a process that is used to identify crime and criminal characteristics. In the mid-1990s, data mining emerged as a strong tool to extract useful information from large datasets and to find relationships between the attributes of the data. Data mining originally grew out of statistics and machine learning as an interdisciplinary field, and it grew so much that in 2001 it was considered one of the top 10 leading technologies that would change the world. To solve crimes faster, we need a data mining paradigm that takes an interdisciplinary approach between computer science and criminal justice. Criminology aims to identify crime characteristics, and it is one of the most important fields for applying data mining. In this field, data mining algorithms can produce crime reports and help identify criminals much faster than any human could.
Crime analysis is a process that includes exploring the behavior of crimes and detecting crimes and their relationships with criminals. Identifying crime characteristics is the first step before any further analysis. The quality of data analysis depends greatly on the background knowledge of the analyst. The knowledge gained from data mining approaches is very useful and can support the police. More specifically, classification and clustering based models can help identify crime patterns and criminals. The wide range of data mining applications in criminology has made it an important field of research.
The motivation for this survey work is to lend a helping hand to young researchers working in the areas of criminal analysis and crime prediction.
The document is organized to provide insights into the crime analysis procedure and then to present different types of crime analysis operations, including those which can be combined into an end-user product that can be applied to crime analysis in any police station or detective agency. This work will be a valuable reference for those who pursue research in crime analysis and crime prediction using data mining techniques.
This survey work is organized for easy understanding of the concepts. The criminal analysis methods discussed in this document are grouped under their own categories, and at the end, a quantitative analysis of the crime analysis and prediction techniques is presented.
Related Work:
project is to give a general idea of how machine learning can be used by law enforcement agencies to detect, predict and solve crimes at a much faster rate and thus reduce the crime rate. It is not restricted to Chicago; it can be used in other states or countries, depending on the availability of datasets.
Dataset:
The dataset used in this project is taken from Kaggle.com. The dataset obtained from Kaggle is maintained and updated by the Chicago Police Department.
Figure 1: Dataset
Figure 3: System Architecture
ii.) Data mining Techniques in detecting and predicting Cyber crimes in Banking
Sector.
Author:
K. Chitra Lekha, Dr. S. Prakasam
Objectives:
Data mining applications are used in many banking sectors for client segmentation and productivity, credit scoring and authorization, predicting payment default, advertising, detecting fraudulent transactions, etc. This paper presents a general idea of data mining techniques and of the diverse cyber crimes in banking applications. It also provides a comprehensive survey of efficient and valuable data mining techniques for cyber crime data analysis. The objective of cyber crime data mining is to recognize patterns in criminal behavior in order to predict crime, anticipate criminal activity and prevent it. This paper applies data mining techniques such as K-Means, an Influenced Association Classifier and the J48 prediction tree to investigate cyber crime datasets and address the existing problems. The K-Means algorithm is used for unsupervised clustering within the Influenced Association Classification. K-Means selects the initial centroids so that the classifier can mine the records and make predictions of cyber crimes with the J48 algorithm. The combination of K-Means, the Influenced Association Classifier and the J48 prediction tree tends to give an enhanced, integrated and more precise result for cyber crime prediction in the banking sector. Our law enforcement organizations need to be adequately equipped to defeat and prevent cyber crime.
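To make the clustering step above concrete, here is a minimal pure-Python sketch of Lloyd's K-Means on 2-D points. It is not the reviewed paper's implementation: the function name `kmeans`, the toy points, and seeding with the first k points (chosen only to make the result deterministic) are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-Means on 2-D points; returns (centroids, cluster label per point)."""
    # Assumption: seed with the first k points so the output is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels


centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(centroids, labels)  # two well-separated groups end up in different clusters
```

In the reviewed paper the clusters then feed an association classifier and J48; only the clustering step is sketched here.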
Dataset:
The crime dataset was collected from the Andhra Pradesh Police Department.
develop a data mining procedure that can help solve crimes faster. Criminals can also be predicted based on the crime data. Crime events and illegal activities have increased in the past few years. We propose a system which can analyze, detect and predict the probability of various crimes in a given region. To accomplish this, we obtain raw data from the police department's official website. On these preprocessed datasets, we apply the Naïve Bayesian algorithm to create a predictive model which analyzes the data and helps predict crime trends for a given region in the future. With the aim of protecting society from crime, there is a need for advanced systems and new approaches that improve crime analytics for protecting communities. Accurate real-time crime predictions help to reduce the crime rate, but this remains a challenging problem for the scientific community, as crime occurrences depend on many complex factors. Hidden relationships in the data are used to report and discover crime patterns that are valuable to crime analysts, who analyze these crime networks through various interactive visualizations for crime prediction; this in turn supports the prevention of crimes. The probabilistic trend is also displayed in the form of graphs for easy understanding by the police department. This paper explains various types of criminal analysis and crime prediction using several data mining techniques. Towards this goal, crime hotspot prediction has previously been suggested. Crime hotspot prediction leverages past data in order to identify crime hotspots, while ignoring the predictive power of other data such as urban or social media data. Crime data analysts can help law enforcement officers to speed up the process of solving crimes. Using the concept of data mining, we can extract previously unknown, useful information from unstructured data. Here we take an approach between computer science and criminal justice to develop a data mining procedure that can help solve crimes faster.
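The Naïve Bayes model described above can be sketched in a few lines. The paper does not state which Naïve Bayes variant it uses, so this assumes categorical features with Laplace (add-one) smoothing; the class name `CategoricalNaiveBayes`, the features (region, time of day) and the crime labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows: List[Sequence[str]], labels: List[str]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)  # number of rows per label
        self.total = len(labels)
        n_features = len(rows[0])
        # pair_counts[i][(value, label)] = rows where feature i == value with that label
        self.pair_counts = [Counter() for _ in range(n_features)]
        self.vocab = [set() for _ in range(n_features)]  # distinct values per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.pair_counts[i][(value, label)] += 1
                self.vocab[i].add(value)
        return self

    def predict_proba(self, row: Sequence[str]) -> Dict[str, float]:
        scores = {}
        for label, n_label in self.class_counts.items():
            score = n_label / self.total  # prior P(label)
            for i, value in enumerate(row):
                # smoothed likelihood P(value | label)
                score *= (self.pair_counts[i][(value, label)] + 1) / (n_label + len(self.vocab[i]))
            scores[label] = score
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}  # normalise to sum to 1

    def predict(self, row: Sequence[str]) -> str:
        probs = self.predict_proba(row)
        return max(probs, key=probs.get)


rows = [("north", "night"), ("north", "night"), ("south", "day"), ("south", "day"), ("north", "day")]
labels = ["THEFT", "THEFT", "BATTERY", "BATTERY", "BATTERY"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("north", "night")))
```

With many features the product of probabilities can underflow; summing log-probabilities is the usual remedy.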
Crime analysis and prevention is a systematic approach for identifying and analyzing patterns and trends in crime. Our system can predict regions which have a high probability of crime occurrence and can visualize crime-prone areas. With the increasing advent of computerized systems, crime data analysts can help law enforcement officers to speed up the process of solving crimes. Instead of focusing on causes of crime occurrence like the criminal background of the offender, political enmity, etc., we focus mainly on the crime factors of each day.
Dataset:
The crime dataset was collected from the Navi Mumbai Police Department, India.
Figure 1: Pseudocode of Naïve Bayes
There has been an enormous increase in crime in the recent past. Crime is a common social problem affecting the quality of life and the economic growth of a society. With the increase in crime, law enforcement agencies continue to demand advanced systems and new approaches to improve crime analytics and better protect their communities. The decision tree (J48) applied in the context of law enforcement and intelligence analysis holds the promise of alleviating this problem. Data mining is a way to extract knowledge from usually large datasets; in other words, it is an approach to discovering hidden relationships among data using artificial intelligence methods, of which the decision tree (J48) is one. The wide range of machine learning applications has made it an important field of research. Criminology, a process that aims to identify crime characteristics, is one of the most important fields for applying data mining. This study considered the development of a crime prediction prototype model using the decision tree (J48) algorithm, because it has been described in the related literature as the most efficient machine learning algorithm for the prediction of crime data. In the experimental results, the J48 algorithm predicted the unknown category of crime data with an accuracy of 94.25287%, which is considered good enough for the system to be relied on for the prediction of future crimes.
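J48 is Weka's Java implementation of the C4.5 decision tree algorithm, which chooses the attribute to split on by its gain ratio. The sketch below computes that criterion for one categorical attribute. It is a simplified illustration, not the study's code: the function names are hypothetical, and full C4.5 also handles numeric attributes (as in the all-numeric UCI dataset) by testing thresholds, which is omitted here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


print(gain_ratio(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # perfectly informative split
```

The tree builder would evaluate this score for every candidate attribute and split on the highest one, then recurse on each branch.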
Dataset:
The dataset used for this study is real and authentic. It was acquired from the UCI Machine Learning Repository website, and its title is ‘Communities and Crime’. This dataset contains a total of 128 attributes and 1994 instances. All data in this dataset is numeric and normalized. The complete details of all 128 attributes can be found on the UCI Machine Learning Repository website.
Figure 1: Block diagram of the proposed crime predictive System
avoid more crimes. The main objective of this paper is to classify clustered crimes based on their frequency of occurrence in different years. Data mining is used extensively for the analysis, investigation and discovery of patterns in the occurrence of different crimes. We applied a theoretical model based on data mining techniques such as clustering and classification to a real crime dataset recorded by police in England and Wales from 1990 to 2011. We assigned weights to the features in order to improve the quality of the model and remove low-value features. A Genetic Algorithm (GA) is used to optimize the parameters of the Outlier Detection operator in the RapidMiner tool.
Dataset:
A dataset of crimes recorded by police in England and Wales from 1990 to 2011 has been used, and RapidMiner is used for the implementation.
Figure 2: Predicting future crime trends
vii.) CRIME ANALYSIS AND PREDICTION USING DATA MINING
Author:
JANNATUL FARIA
Objectives:
Crime analysis and prediction is a systematic way of detecting crime, analyzing crime patterns and predicting crime trends. Data mining is a suitable method to apply to large quantities of crime data, and the information obtained from data mining techniques is very effective in supporting law enforcement officers. Our system can detect and predict areas which are crime-prone and have a high probability of crime occurrence, and can visualize crime-prone areas within a radius. The results of this method can be used to increase people's awareness of dangerous locations and to help the police force predict crimes that will occur in a target area within a target time. With the rapid increase in the use of computerized and information systems, crime data analysts can help the police forces speed up the process of solving crimes in our community. Interestingly, about 15% of criminals commit 40% of the total crimes. Even though it is not possible to estimate who may be the victims of a crime, we can estimate the places that have a high chance of crime occurrence. The K-means algorithm, which is popular for cluster analysis in data mining, splits data into different subsets based on their means. The Expectation-Maximization algorithm is an extension that makes it possible to split the data based on their available parameters. The data mining framework is easy to implement, works with the geospatial plot of crime and criminal activities, and helps increase the effectiveness of the police forces and other law enforcement agencies. Our developed system can also be applied by the Bangladesh crime departments; by doing this, the Bangladesh police can reduce crime and solve crimes in minimum time.
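The paper does not say how "crime-prone areas within a radius" are computed. One straightforward reading, sketched below under that assumption, is to count incidents whose great-circle (haversine) distance from a chosen point is within the radius. The function names and the sample Boston-area coordinates are hypothetical.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def crimes_within_radius(incidents: Iterable[Tuple[float, float]],
                         center: Tuple[float, float], radius_km: float) -> int:
    """Count incidents whose great-circle distance to `center` is at most `radius_km`."""
    return sum(1 for point in incidents if haversine_km(point, center) <= radius_km)


incidents = [(42.36, -71.06), (42.37, -71.06), (43.36, -71.06)]
print(crimes_within_radius(incidents, (42.36, -71.06), 2.0))
```

Repeating the count over a grid of centre points gives a simple density surface that can be plotted to highlight crime-prone areas.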
Dataset:
Boston Police Department crime data was downloaded and used for research purposes.
Figure 2: WEKA, Test data file import
viii.) Crime Analysis Through Machine Learning
Author:
Suhong Kim, Param Joshi, Parminder Singh Kalsi and Pooya Taheri
Objectives:
This paper investigates machine-learning-based crime prediction. In this work, Vancouver crime data for the last 15 years is analyzed using two different data-processing approaches. Two machine-learning predictive models, K-nearest-neighbour and boosted decision tree, are implemented, and a crime prediction accuracy between 39% and 44% is obtained when predicting crime in Vancouver.
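The K-nearest-neighbour model mentioned above (and also used later in this project) classifies a new incident by majority vote among the k closest training incidents. The minimal sketch below assumes numeric coordinate-like features and plain Euclidean distance; the function name, the toy points and the crime labels are hypothetical, and real latitude/longitude data would normally be scaled or use a geographic distance.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tie, most_common keeps first-encountered order, i.e. the label seen first
    # when walking the neighbours from nearest to farthest.
    return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["THEFT"] * 3 + ["ASSAULT"] * 3
print(knn_predict(X, y, (0.5, 0.5)))
```

Choosing k is a trade-off: small k follows local noise, large k blurs neighbourhood boundaries.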
Dataset:
The Vancouver Police Department (VPD), Canada, crime dataset is used.
Figure 1: Dataset
METHODOLOGY:
We use a specific dataset to train the algorithms. The algorithms used for training are Decision Tree, XG-Boost, KNN, MLP and Extra Trees Classifier. The following steps are followed for all the implemented algorithms:
1. Features
The dataset used is collected from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. It consists of records of crimes that took place in Chicago from 1/1/2012 to 1/1/2017. The dataset is in CSV format and contains more than 1,500,000 records/rows. It has many attributes; the attributes used in this paper are given in the table:
Latitude
Longitude
Districts
Arrest
Location Description
Description
Year
Month
Day
Hour
Minute
Primary Type
Data Preprocessing:
For preprocessing the dataset, we used the Python library Scikit-learn (sklearn).
1. The dataset consists of some attributes with string values and other attributes with numeric values. To train the model, the text features in this dataset need to be converted into numeric values. This conversion is done using the Python library NumPy.
2. Attributes in our dataset with string type are “Date”, “Location”, “Location Description”, etc. Using Python, we assigned numeric values to those features.
3. Since time is considered the main factor, “Date” has been split into “Day”, “Month”, “Year”, “Hour”, “Minute” and “Second” attributes.
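Steps 2 and 3 above can be sketched with the standard library alone. This is an illustration, not the project's code: it assumes the "Date" strings look like "01/31/2016 11:45:07 PM" (month/day/year, 12-hour clock), and the helper names `split_date` and `label_encode` are hypothetical. The encoding assigns integer codes in sorted order, which matches how scikit-learn's `LabelEncoder` orders its classes.

```python
from datetime import datetime
from typing import Dict, List

# Assumption: timestamps use month/day/year and a 12-hour clock with AM/PM.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def split_date(value: str) -> Dict[str, int]:
    """Split one "Date" string into Day, Month, Year, Hour, Minute and Second attributes."""
    ts = datetime.strptime(value, DATE_FORMAT)
    return {"Day": ts.day, "Month": ts.month, "Year": ts.year,
            "Hour": ts.hour, "Minute": ts.minute, "Second": ts.second}


def label_encode(column: List[str]) -> List[int]:
    """Map each distinct string to an integer code, numbering the values in sorted order."""
    codes = {value: code for code, value in enumerate(sorted(set(column)))}
    return [codes[value] for value in column]


print(split_date("01/31/2016 11:45:07 PM"))
print(label_encode(["STREET", "RESIDENCE", "STREET", "APARTMENT"]))
```

Integer codes like these impose an arbitrary order on categories; tree-based models usually cope with that, while distance-based models such as KNN may benefit from one-hot encoding instead.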
Fig. 4: Criminal activities occurring on different days
Fig. 6: Normalized Crime Types by Location
Fig. 8: Normalized Crime Types by Days
Fig. 10: Crime Count Per Week
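The weekly crime counts shown in the figure above, which are also the kind of time series a forecaster such as FBProphet takes as input, come from grouping incidents by week. The sketch below is an assumption about how such counts could be produced: it groups parsed dates by ISO (year, week) using only the standard library, and the function name `crimes_per_week` is hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def crimes_per_week(dates: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count incidents per ISO (year, week) pair, returned in chronological order."""
    counts = Counter(d.isocalendar()[:2] for d in dates)  # (ISO year, ISO week)
    return dict(sorted(counts.items()))


print(crimes_per_week([date(2016, 1, 4), date(2016, 1, 5), date(2016, 1, 11), date(2016, 1, 1)]))
```

ISO weeks start on Monday, so a date such as 1 January 2016 (a Friday) is counted in the last week of ISO year 2015 rather than in week 1 of 2016.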
Results:
We use a dataset which contains a mixture of categorical and numeric values. Thus, this work mainly focuses on algorithms which can work on a combination of categorical and numeric values, while also performing well for our classification problem. Therefore, several algorithms are chosen to serve the purpose: Decision Tree, KNN, XG-Boost, MLP and an ensemble method, the Extra Trees Classifier. The main motive of this paper is to use these algorithms on the dataset to classify the type of crime occurring based on time and location. The chosen algorithms provide a simple and fast way of learning a function that maps data x to outputs y, where x is a mixture of categorical and numeric variables and y is the categorical value for classification (the crime type). These algorithms generally perform well on classification problems of this kind. The results after reducing the number of classes are shown in the table below for all algorithms.
KNN 81% 72%
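The results paragraph mentions reducing the number of classes but does not say how. One common approach, shown here purely as a hypothetical sketch, is to keep the most frequent crime types and merge the rest into a single "OTHER" label. The results table does not name its metric in this section; accuracy is shown only as one plausible way such percentages are computed. Both function names are illustrative.

```python
from collections import Counter
from typing import List


def reduce_classes(labels: List[str], keep: int, other: str = "OTHER") -> List[str]:
    """Keep the `keep` most frequent labels and map every other label to `other`."""
    top = {label for label, _ in Counter(labels).most_common(keep)}
    return [label if label in top else other for label in labels]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


print(reduce_classes(["THEFT", "THEFT", "BATTERY", "BATTERY", "ARSON"], keep=2))
print(accuracy(["A", "B", "C", "D"], ["A", "B", "X", "D"]))
```

Merging rare classes usually raises reported accuracy because the model no longer has to separate many infrequent crime types, so results before and after reduction are not directly comparable.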