Crime Analysis and Prediction Using Datamining: A Review
ABSTRACT
Crime analysis and prediction is a systematic approach to identifying crime. Such a system can predict
regions that have a high probability of crime occurrence and visualize crime-prone areas. Using data mining,
we can extract previously unknown, useful information from unstructured data; new information is predicted
from the existing datasets. Crime is a treacherous and common social problem faced worldwide. It affects the
quality of life, the economic growth, and the reputation of a nation. To secure society from crime, advanced
systems and new approaches are needed to improve crime analytics and protect communities. We propose a
system that can analyze, detect, and predict the probability of various crimes in a given region. This paper
explains various types of criminal analysis and crime prediction using several data mining techniques.
KEYWORDS
1. INTRODUCTION
The crime rate is increasing day by day because modern technologies and hi-tech methods help criminals to
carry out illegal activities. According to the Crime Record Bureau, crimes like burglary and arson have
decreased while crimes like murder, sex abuse, and gang rape have increased [2]. Crime data will be collected
from various blogs, news sites, and websites. This large volume of data is used as a record for creating a
crime report database. The knowledge acquired through data mining techniques will help in reducing crime,
as it helps in finding the culprits faster and in identifying the areas that are most affected by crime
IJCRT2102057 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org 465
www.ijcrt.org © 2021 IJCRT | Volume 9, Issue 2 February 2021 | ISSN: 2320-2882
[11]. Data mining helps in solving crimes faster and gives good results when applied to crime datasets; the
information obtained from data mining techniques can help the police department. One approach that the
police have found useful is the identification of crime 'hot spots', which indicate areas with a high
concentration of crime [1]. Data mining techniques can produce important results from crime report datasets.
The very first step in the study of crime is crime analysis. Crime analysis means exploring, interrelating,
and detecting relationships between various crimes and the characteristics of the crime. This analysis helps
in preparing statistics, queries, and maps on demand. It also helps to see whether a crime fits a known
pattern or whether a new pattern is necessary.
Crimes can be predicted because criminals are active and operate in their comfort zones. Once successful,
they try to replicate the crime under similar circumstances. The occurrence of crime depends on several
factors, such as the intelligence of the criminals and the security of a location. This work follows the steps
used in data analysis, whose important phases are data collection, classification, pattern identification,
prediction, and visualization. The proposed framework uses different visualization techniques to show crime
trends and various ways of predicting crime using machine learning algorithms.
Collection and analysis of crime-related data are imperative to security agencies. The most important aspects
to be addressed are the use of coherent methods to classify these data based on the rate and location of
occurrence, the detection of hidden patterns among crimes committed at different times, and the prediction of
their future relationships. One of the most popular approaches is hot spot analysis; some of the most popular
methods used for this purpose are point pattern analysis and clustering/distance statistics. Another popular
approach is the discovery of patterns or trends through various techniques from data mining, text mining,
spatial analysis, and self-organizing maps [1]. A crime analysis tool should be able to identify crime
patterns quickly and efficiently for future crime pattern detection and action.
Extraction of crime patterns through crime analysis, based on available criminal information.
Crime recognition [3].
The problem of identifying techniques that are efficient and accurate.
Data Collection
Classification
Pattern Identification
Prediction
Visualization
Data Collection
Data collection is the first step in crime analysis. Data are collected from various websites, news sites,
and blogs. The collected data is stored in a database for further processing. The data is unstructured, and
working with it through object-oriented programming is easy and flexible.
Crime data is unstructured because the number of fields, the content, and the size of a document can differ
from one document to another, so the better option is a schema-less database. The absence of joins also
reduces complexity.
complexity. Other benefits of using an unstructured database are that:
Classification
This step uses the Naive Bayes algorithm, which is a supervised learning method. The Naive Bayes classifier
is a probabilistic classifier which, given an input, returns a probability distribution over the set of all
classes rather than a single output. One of the main advantages of the Naive Bayes classifier is that it is
simple and converges more quickly than logistic regression [2]. It also compares well with other algorithms
such as SVM (Support Vector Machine), which takes a lot of memory.
Using the Naive Bayes algorithm, we create a model by training on crime data related to vandalism, murder,
robbery, burglary, sex abuse, gang rape, etc. Naive Bayes works well with a small amount of training data for
estimating the classification parameters. A zero estimate can sometimes arise: when evaluating a product such
as P(A) * P(B/D) * P(C/D) * P(E/D) where P(C/D) = 0, the whole product becomes zero [2].
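The classifier described above can be sketched in a few lines. The sketch below is a minimal illustration, not the implementation used in [2]: the feature names, the toy records, and the use of add-one (Laplace) smoothing as a remedy for the zero-probability case are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_proba(model, row, alpha=1.0):
    """Return a probability for every class; alpha > 0 avoids zero factors."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-alpha estimate of P(value | class); never exactly zero.
            score *= (count + alpha) / (n_label + alpha * len(vocab[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training records: (area type, time of day) -> crime type.
rows = [("residential", "night"), ("residential", "night"),
        ("commercial", "day"), ("commercial", "night")]
labels = ["burglary", "burglary", "robbery", "robbery"]

model = train_naive_bayes(rows, labels)
# ("residential", "day") never occurs with either class in training,
# yet smoothing still yields a usable distribution over both classes.
probs = predict_proba(model, ("residential", "day"))
```

With `alpha=0` every class score for this query would contain a zero factor, which is exactly the situation described above; the smoothed estimate keeps the distribution well defined.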
Pattern Identification
The third step is pattern identification, where we have to identify trends and patterns in crime. To find
crime patterns that occur frequently we use the Apriori algorithm. Apriori can be used to determine
association rules which highlight general trends in the database. Pattern identification helps police
officials work effectively and prevent crime in particular places by providing security, CCTV, alarms, etc.
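As an illustration of how Apriori finds frequent patterns and rules, the following sketch mines frequent itemsets level by level and then computes the confidence of a rule. The crime records, the attribute names, and the support threshold are hypothetical and chosen only to keep the example small.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical crime records, each described by a set of attributes.
records = [
    {"burglary", "night", "residential"},
    {"burglary", "night", "residential"},
    {"robbery", "night", "commercial"},
    {"burglary", "day", "residential"},
]
frequent = apriori(records, min_support=0.5)
rule_conf = confidence(frequent, {"residential"}, {"burglary"})
```

In this toy data the rule "residential implies burglary" holds in every record that mentions a residential area, which is the kind of general trend an association rule is meant to surface.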
Crime Prediction
The second approach is predicting the type of crime that might occur in a specific location within a
particular time. To predict the expected crime type, four related features of the crime are provided: the
occurrence month, the occurrence day of the week, the occurrence time, and the crime location. Prediction is
stating the probability of an event in a future period of time. A classification approach to crime prediction
in data mining is used in [1] to classify areas into hotspots and cold spots and to predict whether an area
will be a hotspot for residential burglary. A variety of classification techniques are used for predicting
crime [1].
Linear regression methods are also used for crime prediction, based on the crime probability.
The formula for a regression line is
$Y = bX + A$, where $Y$ is the predicted score, $b$ is the slope of the line, and $A$ is the Y intercept.
The slope is $b = r \, s_Y / s_X$, where $r$ is the correlation between $X$ and $Y$, and $s_X$ and $s_Y$ are
their standard deviations.
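A small worked example may make the regression line concrete. The sketch below computes the slope b from the correlation and the two standard deviations, and takes the intercept as A = mean(Y) - b * mean(X), which is the standard least-squares result. The monthly counts are invented for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (b, A) for the regression line Y = b*X + A."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations of X and Y.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Pearson correlation r between X and Y.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)
    b = r * s_y / s_x          # slope
    a = mean_y - b * mean_x    # Y intercept
    return b, a

# Hypothetical data: month index vs. number of reported incidents.
months = [1, 2, 3, 4, 5]
incidents = [12, 15, 19, 20, 24]

slope, intercept = fit_line(months, incidents)
predicted_month_6 = slope * 6 + intercept
```

For these numbers the slope is 2.9 incidents per month and the intercept is 9.3, so the line predicts about 26.7 incidents for month 6.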
Integrated theory
Biological theory
Psychological theory
Sociological theory
Conflict theory
Victimization theory
Choice theory
Visualization
The crime-prone area can be represented graphically using a heat map, which indicates the level of activity:
a dark colour indicates low activity and a brighter colour indicates high activity.
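The counting step behind such a heat map can be sketched as follows: incident locations are binned into a grid, and each cell's count is mapped to a shade. The coordinates, the bounding box, and the text-based rendering (denser characters standing in for brighter colours) are assumptions made so that the example runs without a plotting library.

```python
def crime_heat_grid(points, min_lat, max_lat, min_lon, max_lon, rows, cols):
    """Count incidents per grid cell inside the given bounding box."""
    grid = [[0] * cols for _ in range(rows)]
    for lat, lon in points:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # ignore incidents outside the mapped region
        r = min(int((lat - min_lat) / (max_lat - min_lat) * rows), rows - 1)
        c = min(int((lon - min_lon) / (max_lon - min_lon) * cols), cols - 1)
        grid[r][c] += 1
    return grid

def render(grid, shades=" .:*#"):
    """Map each count to a character; later characters mean higher activity."""
    peak = max(max(row) for row in grid) or 1
    lines = []
    for row in grid:
        lines.append("".join(shades[count * (len(shades) - 1) // peak]
                             for count in row))
    return "\n".join(lines)

# Hypothetical incident coordinates, normalised to the unit square.
incidents = [(0.10, 0.10), (0.15, 0.12), (0.20, 0.05), (0.90, 0.90)]
grid = crime_heat_grid(incidents, 0.0, 1.0, 0.0, 1.0, rows=2, cols=2)
picture = render(grid)
```

A plotting library would replace `render` with a colour map, but the per-cell counts are the same quantity that the colours encode.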
[Figure: prediction flow with the elements Test data, Algorithm, Prediction Estimate, Result, and Output result]
3. DATA
This dataset contains a record of incidents that the Austin Police Department responded to and wrote a report on.
Data is from 2003 to present. This dataset is updated weekly. Understanding the following conditions will
allow you to get the most out of the data provided. Due to the methodological differences in data collection,
different data sources may produce different results. This database is updated weekly, and a similar or same
search done on different dates can produce different results.
Comparisons should not be made between numbers generated with this database to any other official police
reports. Data provided represents only calls for police service where a report was written. Totals in the
database may vary considerably from official totals following investigation and final categorization.
Therefore, the data should not be used for comparisons with Uniform Crime Report statistics. The Austin
Police Department does not assume any liability for any decision made or action taken or not taken by the
recipient in reliance upon any information or data provided. Pursuant to section 552.301 (c) of the
Government Code, the City of Austin has designated certain addresses to receive requests for public
information sent by electronic mail.
The dataset covers different types of crime (attributes), such as murder, rape, kidnapping, dacoity,
robbery, burglary, cheating, dowry deaths, arson, etc.
2 Burglary 22,081
6 Drugs 8,336
7 Robbery 2,166
4. ALGORITHMS
1. Instance-Based Learning
Instance-based learning, also called memory-based learning, is a family of learning algorithms that, instead
of performing explicit generalization, compare new problem instances with instances seen in training, which
have been stored in memory. These algorithms store their training set; when predicting a value or class for a
new instance, they compute its distance to the training instances to make a decision.
The algorithms in this category for numerical prediction can be divided into two types: similarity-based
(e.g., Euclidean or entropy-based) and regression-based (e.g., LWL), since regression is one of the most
popular methods for numerical prediction [1].
The advantage of instance-based algorithms over other methods of machine learning is the ability to adapt the
model to previously unseen data. Instance-based learners may simply store a new instance or throw an old
instance away. The disadvantages of instance-based algorithms are that they need more storage and have higher
computational complexity.
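A k-nearest-neighbour classifier is one common instance of this idea and can serve as a short sketch: the training set is simply stored, and a prediction is made by measuring the Euclidean distance from the query to the stored instances. The feature vectors and the hotspot/coldspot labels below are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Majority vote among the k stored instances closest to the query.

    training is a list of (feature_vector, label) pairs kept in memory.
    """
    neighbours = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: (x, y) location features with a label.
training = [
    ((1.0, 1.0), "hotspot"),
    ((1.2, 0.8), "hotspot"),
    ((0.9, 1.1), "hotspot"),
    ((5.0, 5.0), "coldspot"),
    ((5.2, 4.9), "coldspot"),
]

near_cluster = knn_predict(training, (1.1, 1.0))
far_cluster = knn_predict(training, (5.1, 5.0))

# Adapting to new data is just storing another instance; nothing is retrained.
training.append(((5.1, 5.1), "coldspot"))
```

The sketch also shows the cost noted above: every prediction scans the whole stored training set, so storage and query time grow with the data.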
2. Linear Regression
Linear regression is the simplest form of regression. It attempts to model the relationship between two
variables by fitting a linear equation to the observed data, and it is widely used in statistics. For this
purpose, linear functions are used whose unknown parameters, i.e., the weights of the independent variables,
are estimated from the training data [1]; the fitted function can then be used to predict values. One of the
most common estimation methods is least mean squares.
Linear regression algorithms for prediction include simple regression, multiple regression, and pace
regression, which is suitable for data of high dimensionality and only accepts binary nominal attributes [1].
The main advantage of linear regression is that it gives a far greater understanding of the variables that
can affect outcomes in the coming weeks, months, and years. The disadvantages of the regression
3. Decision Tree
A decision tree is used for both prediction and classification. For classification, a function can be
learned that is defined over intervals given by splits on the individual attribute values.
No No No Yes No
Yes No No No No
No Yes No No No
[Figure: Decision tree for the above table, with nodes Attribute 1, Attribute 2, Attribute 3, and Attribute 4 and branches labelled Yes/No]
A root node, which has no incoming edges and zero or more outgoing edges.
Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
Leaf or end nodes, each of which has exactly one incoming edge and no outgoing edges.
For prediction purposes, the decision tree algorithms for classification have been adapted to output a
numerical value the main difference
Advantages of decision trees: they are very simple to understand, they help determine the worst, best, and
expected values for different scenarios, and they can be combined with other decision techniques.
Disadvantages of decision trees: they are unstable, they are often relatively inaccurate, and calculations
can get very complex.
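The node types listed above map directly onto a small data structure. The sketch below builds a tree by hand and classifies a record by walking from the root to a leaf; it does not learn the tree from data, and the attributes, branch values, and class labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # attribute tested at a root or internal node
    children: Dict[str, "Node"] = field(default_factory=dict)  # outgoing edges
    label: Optional[str] = None       # class stored at a leaf node

def classify(node, record):
    """Follow the edge matching the record's attribute value until a leaf."""
    while node.label is None:
        node = node.children[record[node.attribute]]
    return node.label

# Hand-built tree: the root tests "night", one internal node tests "residential".
tree = Node(
    attribute="night",
    children={
        "Yes": Node(
            attribute="residential",
            children={
                "Yes": Node(label="burglary"),
                "No": Node(label="robbery"),
            },
        ),
        "No": Node(label="other"),
    },
)

first = classify(tree, {"night": "Yes", "residential": "Yes"})
second = classify(tree, {"night": "Yes", "residential": "No"})
third = classify(tree, {"night": "No", "residential": "Yes"})
```

Each prediction touches only the nodes on one root-to-leaf path, which is why a decision tree is easy to read and explain.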
4. K-Means Algorithm
K-means is the simplest and most commonly used partitioning algorithm among the clustering algorithms in
scientific and industrial software [3]. The acceptance of k-means is mainly due to its simplicity. The
algorithm is also suitable for clustering large datasets, since its computational complexity is low and grows
linearly with the number of data points.
The advantages of the k-means algorithm are that it is relatively simple to implement, scales to large
datasets, guarantees convergence, and easily adapts to new examples. Its disadvantages are that k must be
chosen manually, the result depends on the initial values, and it has difficulty clustering data of varying
sizes and densities.
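The assignment and update steps of k-means can be sketched in plain Python. The points and the two starting centroids below are hypothetical; passing the initial centroids explicitly makes the dependence on initial values visible.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical incident locations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

Each pass computes one distance per point per centroid, so the work grows linearly with the number of data points, as noted above.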
2 | Crime Analysis and Prediction Using Data Mining | Shiju Sathyadevan, Devan M.S, Surya Gangadharan | First International Conference on Networks & Soft Computing | According to the Crime Record Bureau, crimes like burglary, arson, etc. have decreased while crimes like murder, sex abuse, gang rape | Naïve Bayes, Apriori algorithm, Decision tree, NER, MongoDB, GraphDBs | In this paper predict the crime based on the occurrences of the crimes and | This paper has tested the accuracy of classification and prediction based on different test | Our system predicts crime prone regions in India on a particular day. It will be more
3 | Crime Detection Techniques Using Data Mining and K-Means | Khushabu A. Bokde, Tisksha P. Kakade, Dnyaneshwari S. Tumsare, Chetan G. Wadhai, B.E Student | International Journal of Engineering Research & Technologie Engineering (IJETER) | This paper is mainly focusing on crime analysis, clustering, and clustering by K-means algorithms methods. Some of the purposes of crime analysis are extraction of crime patterns by crime analysis and based on available criminal information, crime recognition. Clustering means | Crime, Clustering, K-Means Algorithm | The main objective of this paper is to classify clustered crimes based on occurrence frequently during different years. Data mining is used to extensively in. terms of analysis, could be concluded that crime details increasingly to very large quantities running into zettabytes. | From the accuracy result, crime data mining has a promising future for increasing the effectiveness and efficiency of criminal and intelligence analysis. | From the clustered result it is easy to identify crime trends over year and can be used to design precaution methods for future.
6 | Crime Analysis And Prediction Using Data Mining Techniques | Rajkumar.S, Sakkarai Pandi.M, Soundarya Jagan.J, Varnikasree.P | International Journal of Recent Trends in Engineering and Research | Main contribution of this paper is to propose a new approach based on deep learning success in different classification task such object detection image recognition natural image processing and dimensionality reduction deep learning algorithms use deep architectures of multiple layers to extract features from raw data. | Data mining, machine learning, crime analysis, crime prediction. | Crime hotspot prediction has previously been suggested. Crime hotspot prediction leverages past data in order to identify crime hotspots, or social media data. | Analyzed and compared different algorithm on crime data determine which algorithm performs better of crime prediction. | As a future extension of our work, we plan to apply more classification models to increase crime prediction accuracy and to enhance the overall performance.
7 | Systematic Review of Crime Data Mining | Sapreet Kaur, Dr. Williamjeet Singh | International Journals of Advanced Research in Computer Science (IJARCS) | This paper is explains techniques used, challenges addressed, methodologies used, and crime data mining and analysis paper. The methodologies is classification, pattern identification, prediction. | Crime data mining, crime data analysis, systematic review, systematic study | Crime data mining has the ability of extracting useful information and hidden patterns from the large data | The results of this result may help new potential users in understanding the range of available crime data mining | In future to improve the performance of these classification. Hence, the usage of other classification
10 | Crime analysis Journal of engineering & technology | Deepika K.K, Smitha Vinod | International Journal of Engineering & Technology | According to this paper burglary and robbery has reduced over a period of 53 years by 79.84% and 28.85% respectively. Crime like murder and kidnapping has hiked by 7.39% and 47.80% respectively. Crime analysis is a part of criminology plays an important role in crime detection. Data mining helps in solving the crimes faster and this techniques give good results when applied on crime dataset, the information obtained from the data mining techniques can help the police department. | Clustering, classification, Visualization, k-means, random forest, Neural networks | The approach consists of the following steps: data preprocessing, clustering, classification, and visualization. Data mining techniques are often applied to criminology as it provides good results. | A good accuracy of 99.93% is obtained and this work verifies the correctness of the instances. | The future enhancement of this research focuses on training bots to predict the crime prone area by using machine learning techniques.
This paper focused on building predictive models for crime frequencies per crime type per month. Crime rates
in India are increasing day by day due to many factors such as the increase in poverty, implementation,
corruption, etc. The proposed model is very useful for both the investigating agencies and police officials
in taking the necessary steps to reduce crime. The project helps crime analysts to analyse these crime
networks by means of various interactive visualizations.
Future enhancement of this research will focus on training bots to predict crime-prone areas using machine
learning techniques. Since machine learning is similar to data mining, advanced machine learning concepts can
be used for better prediction. Data privacy, reliability, and accuracy can be improved for enhanced
prediction.
REFERENCES
[1] Ginger Saltos and Mihaela Cocea, An Exploration of Crime Prediction Using Data Mining on Open Data,
International Journal of Information Technology & Decision Making, 2017.
[2] Shiju Sathyadevan, Devan M.S, Surya Gangadharan.S, Crime Analysis and Prediction Using Data Mining,
First International Conference on Networks & Soft Computing (IEEE), 2014.
[3] Khushabu A. Bokde, Tisksha P. Kakade, Dnyaneshwari S. Tumasare, Chetan G. Wadhai, B.E Student, Crime
Detection Techniques Using Data Mining and K-Means, International Journal of Engineering Research &
Technology (IJERT), 2018.
[4] H. Benjamin Fredrick David and A. Suruliandi, Survey on Crime Analysis and Prediction Using Data Mining
Techniques, ICTACT Journal on Soft Computing, 2017.
[5] Tushar Sonawanev, Shirin Shaikh, Rahul Shinde, Asif Sayyad, Crime Pattern Analysis, Visualization and
Prediction Using Data Mining, Indian Journal of Computer Science and Engineering (IJCSE), 2015.
[6] RajKumar.S, Sakkarai Pandi.M, Crime Analysis and Prediction Using Data Mining Techniques, International
Journal of Recent Trends in Engineering & Research, 2019.
[7] Sarpreet Kaur, Dr. Williamjeet Singh, Systematic Review of Crime Data Mining, International Journal of
Advanced Research in Computer Science, 2015.
[8] Ayisheshim Almaw, Kalyani Kadam, Survey Paper on Crime Prediction Using Ensemble Approach,
International Journal of Pure and Applied Mathematics, 2018.
[9] Dr. M. Sreedevi, A. Harha Vardhan Reddy, Ch. Venkata Sai Krishna Reddy, Review on Crime Analysis and
Prediction Using Data Mining Techniques, International Journal of Innovative Research in Science,
Engineering and Technology, 2018.
[10] K.S.N. Murthy, A.V.S. Pavan Kumar, Gangu Dharmaraju, International Journal of Engineering, Science and
Mathematics, 2017.
[11] Deepiika K.K, Smitha Vinod, Crime Analysis in India Using Data Mining Techniques, International Journal
of Engineering and Technology, 2018.
[12] Hitesh Kumar Reddy ToppyiReddy, Bhavana Saini, Ginika Mahajan, Crime Prediction & Monitoring Framework
Based on Spatial Analysis, International Conference on Computational Intelligence and Data Science
(ICCIDS 2018).