Volume 7, Issue 6, June – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Crime Analysis and Prediction using Machine Learning

Nazma Sultana Shaik, K. Sai Krishna, P. Naveen Kumar
Department of Information Technology
Vignan's Foundation for Science, Technology and Research

Abstract:- Crime is one of the biggest problems in today's world, and it is growing at a rapid pace, which is a primary cause for concern. There is a need to monitor and keep track of all crimes so that the police department can analyze cases easily and quickly. In this experiment, a machine learning K-means algorithm is used to predict and analyze crime in the city of Chicago. The dataset of crimes in Chicago is available on the Kaggle website; it is used to make predictions and to visualize the patterns and trends of various crimes. The other purpose of this project is to assess how well the K-means algorithm can identify and solve the present problem.

Keywords:- Crime Analysis, Training Datasets, Decision tree, Naïve Bayes, k Nearest Neighbor (KNN).

I. INTRODUCTION

The crime rate is rising daily as current technologies and high-tech methods assist criminals in carrying out their unlawful activities. According to the Crime Record Bureau, burglary, arson, and similar crimes have increased, and so have murder, sex abuse, gang rape, and other crimes [2]. Data about crime is gathered from a variety of blogs, news outlets, and websites. This massive volume of data is used to create a crime report database. The knowledge gained via data mining techniques will aid in reducing crime by speeding up the identification of criminals and of the areas most affected by crime.

Research methodologies are expanding with the aim of extracting background information from the available crime reports, both to gain a better view of criminal behavior and to prevent future crimes. This has resulted in rapid growth of crime data records combined with data analytics. Crime can be a complicated social problem that has grown in response to major underlying developments. Law enforcement agencies must discover the elements that lead to a rise in criminal tendencies. There is always a need for crime prevention techniques and policies to combat this.

Data mining and artificial intelligence (DM and AI) techniques are becoming two major tools of the law enforcement professions as a result of technological advancements, research, and information.

To predict and prevent crime, crime analysis must be carried out using data mining techniques. Law enforcement organizations deal with a lot of data, and valuable information can be obtained by processing it. Criminal data can be processed using the several models available to law enforcement agencies with a view to preventing crime.

II. CRIME DATA ANALYSIS

Security agencies require the collection and analysis of crime-related data. The most essential tasks are the use of coherent methods to classify this data by rate and place of incidence, the identification of underlying patterns among offences committed at different times, and the forecasting of their future relationship. Hot spot analysis is one of the most popular methods. Point analysis and clustering/distance statistics are two of the most used methodologies for this purpose. Finding patterns or trends using techniques such as data mining, text mining, spatial analysis, and self-organizing maps is another prominent approach.

Crime analysis is a branch of criminology tasked with investigating and uncovering crime and its links to criminals. The goal of law enforcement is to identify the features of crime, and identifying crime characteristics is the first step toward further analysis. Because of the large volume of crime data and the intricacy of the links within it, criminology is an ideal field for data mining.

The main aspects of crime analysis are:
• Data mining can be used to investigate very large datasets with a large number of variables that are beyond the scope of a single analyst, or even an analytical team or task force.
• Based on the leads and circumstances accessible to law enforcement agencies, the goal of criminal investigative analysis is to find parallels in specific occurrences of serial crimes.
• Crime analysis is possible at the tactical, operational, and strategic levels. Crime analysts examine crime data, arrest reports, and police calls for service as rapidly as possible in order to spot emerging patterns, series, and trends.

IJISRT22JUN073 www.ijisrt.com 149


A. CRIME ANALYSIS METHODOLOGY
The methodologies that help in the analysis of crime are:-
• Data Collection
• Classification
• Pattern Identification
• Prediction
• Visualization

Fig. 1: Crime Analysis Steps (Data Collection, Classification, Pattern Identification, Prediction, Visualization)

B. Data Collection:
Data collection is the first step in crime analysis. Data is gathered from a variety of websites, news sites, and blogs. The collected data is stored in a database for further processing. The storage is unstructured; it is easy to use and flexible because it is built on object-oriented programming. The format of crime data is mostly unstructured, which becomes clear when the number of fields and the type of content stored in those fields are compared between the format used at one place and the format used at another. To overcome this, a schema must be used that operates more efficiently on small datasets from time to time; accordingly, complexity is reduced by removing joins. Unstructured data brings further benefits:
• A large amount of all three types of data
• A programming language that is simple to use and flexible in type.

C. Classification:
This step uses the Naïve Bayes algorithm, a supervised learning method. Naïve Bayes is a probabilistic classifier: given an input, it provides a probability distribution over all the classes rather than a single output. Apart from its simplicity, the primary benefits of Naïve Bayes are its coverage and its classification speed, which is higher than that of other methods. By comparison, algorithms such as the Support Vector Machine (SVM) take a lot of memory. Using the Naïve Bayes algorithm, a model is created by training on crime data related to murder, vandalism, sex abuse, burglary, robbery, rape, assault, etc. The main purpose of Naïve Bayes here is to work well on a small amount of data with a limited number of fields describing the crime. When a probability is estimated as a product such as P[A]*P[B|D]*P[C|D]*P[E|D], a single factor can occasionally be zero, for example P[C|D] = 0 [2].

D. Pattern Identification:
The third step is pattern identification, in which trends and patterns in crime are identified. To find crime patterns that occur frequently, the Apriori algorithm is used. Apriori can determine association rules that highlight general trends in the database. Pattern identification helps police officers work effectively and prevent crime occurrences in a particular place by providing security, CCTV, alarms, etc.

E. Crime Prediction:
The second approach is predicting the crime type that could occur in a particular place within a particular time. To predict an expected crime type, four related features of the crime are provided: occurrence month, occurrence day of the week, occurrence time, and crime location. Prediction states the probability of an event in a future time period. In data mining, classification is used for crime prediction to classify areas into hotspots and cold spots and to predict whether an area will be a hotspot for residential burglary. A variety of classification techniques are used for predicting crime:-
• KNN
• J48
• SVM
• Neural Networks
• Naïve Bayes and ensemble learning

Linear regression techniques are also used for predicting crime, based on the crime probability. The formula for a regression line is

$Y = bX + A$

where $Y$ is the predicted score, $b$ is the slope of the line, and $A$ is the $Y$ intercept. The slope is $b = r \, s_Y / s_X$, where $r$ is the correlation between $X$ and $Y$ and $s_X$, $s_Y$ are their standard deviations. The intercept is calculated as $A = M_Y - b M_X$, where $M_X$ and $M_Y$ are the means of $X$ and $Y$.

Some theories used in predicting crimes are:
• Integrated theory
• Biological theory
• Psychological theory
• Sociological theory
• Conflict theory
• Victimization theory
• Choice theory

F. Visualization:
Visualization can be done using a heat map, which shows the level of activity in different areas: dark colors represent low activity and light colors indicate areas where activity is very high.

Advantages of using a heat map are:-
• Colored images based on both numerical and categorical data.
• Choice to select the data we want to analyze.
• Irrelevant data is not taken into consideration.
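The regression line from Section E can be illustrated with a short worked example. This is a minimal sketch, not the authors' implementation: the data (a month index against a count of reported incidents) and the function names `pearson_r` and `regression_line` are hypothetical, and only the Python standard library is assumed.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


def regression_line(xs, ys):
    """Return (b, A) for Y = b*X + A, with b = r*s_Y/s_X and A = M_Y - b*M_X."""
    b = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return b, a


# Hypothetical data: month index vs. number of reported incidents.
months = [1, 2, 3, 4, 5]
incidents = [12, 15, 19, 22, 27]

slope, intercept = regression_line(months, incidents)

# Extrapolate the fitted line one month ahead.
predicted_month_6 = slope * 6 + intercept
```

Computing the slope through the correlation and the two standard deviations mirrors the formula as stated in the text; it gives the same line as an ordinary least-squares fit on a single predictor.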

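The zero-probability case mentioned in Section C (one factor of the Naïve Bayes product being 0) can be illustrated as follows. This is a sketch under stated assumptions: the tiny dataset, the feature meanings, and the names `train` and `posterior` are hypothetical, and Laplace (add-one) smoothing is a common remedy that the paper itself does not name.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def posterior(row, model, alpha=1.0):
    """Return normalized class probabilities; alpha > 0 applies Laplace smoothing."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # class prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            p *= (count + alpha) / (n + alpha * len(vocab[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: (p / norm if norm else 0.0) for label, p in scores.items()}


# Hypothetical training records: (time of day, area type) -> crime type.
rows = [
    ("night", "residential"),
    ("night", "commercial"),
    ("day", "commercial"),
    ("day", "residential"),
]
labels = ["burglary", "burglary", "theft", "theft"]
model = train(rows, labels)

query = ("night", "residential")
unsmoothed = posterior(query, model, alpha=0.0)  # "theft" never seen at night
smoothed = posterior(query, model, alpha=1.0)    # every class keeps some mass
```

Without smoothing, a single unseen feature value wipes out the whole product for that class; with `alpha=1.0` the class is still penalized but remains a candidate.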

Fig. 3: FLOWCHART OF CRIME ANALYSIS
1. Taking the dataset of crime
2. Filtering the data based on the requirements
3. Reading the Excel file in the RapidMiner tool
4. Replacing the missing values and proceeding to execute
5. Performing normalization on the resulting dataset
6. Performing K-Means clustering on the resulting dataset
7. Using the plot view to get the clusters
8. Performing the crime analysis algorithm on the clusters formed

Fig. 4: PREDICTING FUTURE CRIME TRENDS (boxes: Crime Database, Test Data, Prediction Algorithm, Estimated Result)

III. DATA
The Austin police department logs a response to every crime incident that comes to its notice and stores those responses in a dataset. The data stored in this way covers the period from 2003 to the present. The dataset is updated every week with the new incidents reported in that week, which shows how much data is collected in a week. Because information-gathering techniques differ from one source to another, the collected data also comes in different formats depending on the collector. The data uploaded to the dataset comes from these sources, so records for the same dates may differ depending on the source that gathered the information.

Nowadays each report generated by the police has a unique number, which is used to look up all the details of the incident, including the time, the date, and the person who took responsibility for that incident. This makes it easy for police in any other place to find the information in the police database using that reference number. It also benefits the police department, which no longer needs to store the files written at the time of incident reporting.

Nowadays no legal responsibility is provided for the papers written and stored by the Austin police. Storing such a quantity of paper in the police station is also a big challenge that should be taken into consideration. Section 552.301(c) of the law states that members of the public may discuss an incident with police officers after viewing the data sent via e-mail. The dataset includes different kinds of crimes (attributes), such as murder, rape, kidnapping, dacoity, robbery, burglary, cheating, dowry deaths, and arson.
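The preprocessing and clustering steps of the Fig. 3 flowchart (replacing missing values, normalization, K-Means clustering) can be sketched in plain Python. The paper reports carrying out these steps in the RapidMiner tool, so the code below is only an illustrative stand-in: the records, the choice of mean imputation and min-max scaling, the seeding of centroids with the first k points, and all function names are assumptions.

```python
import math


def fill_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_normalize(rows):
    """Scale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lows[j]) / spans[j] for j, v in enumerate(row)] for row in rows]


def k_means(points, k, iterations=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical records: (hour of day, incidents per week); None marks a missing value.
records = [[1, 40], [2, 42], [3, None], [13, 5], [14, 8], [15, 6]]

filled = fill_missing(records)
normalized = min_max_normalize(filled)
assignment, centroids = k_means(normalized, k=2)
```

Normalizing before clustering keeps one feature with a large numeric range from dominating the Euclidean distance, which is why the flowchart places normalization ahead of K-Means.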


S.No   Name of the Crime       Records Found
1      Violent crime           20384
2      Burglary                21048
3      Vehicle crime           17964
4      Anti-social behavior    46152
5      Robbery                 7452
6      Drugs                   13425
7      Theft from person       1486
8      Other theft             8945
Table 1: Different crime data information

IV. ALGORITHMS

The algorithms used in our experiment are:
• Instance based algorithm
• Decision tree
• Linear regression
• K-means algorithm

A. Instance Based Algorithm
The instance-based algorithm, also known as memory-based learning, is a family of learning algorithms that, instead of performing explicit generalization, compares new problem instances with instances seen in training, which have been stored in memory. These algorithms store their training set; when predicting a value or class for a new instance, they compute distances to the training instances to make a decision. The advantage of instance-based algorithms over other machine learning methods is their ability to adapt their model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away. Their disadvantages are that they need more storage and have higher computational complexity.

B. Linear Regression
It is a simple form of regression. Linear regression attempts to model the relationship between two variables by fitting a linear equation to the observed data, and it is widely used in statistics. The unknown parameters, i.e., the weights of the independent variables, are estimated from the training data for the linear function used for this purpose, which can then be used to predict values. One of the most common estimation techniques is least mean squares. Simple regression, multiple regression, and rhythmic regression are among the different models of linear regression, which are suitable for high-dimensional data and only accept nominal binary attributes [1]. The main benefit of linear regression is a better understanding of the variables that could affect success in the weeks, months, and years to come. Linearity is the main drawback of regression models: if the data is nonlinearly dependent, linear regression still produces the best-fitting straight line, which may not fit very well.

C. Decision Tree
The decision tree is used in several ways, such as predicting and classifying data. To classify, a function must be learned; this function is defined over intervals determined by the distribution of the values of the individual attributes. The advantages of decision trees are that they are quite simple to understand and help determine worst, best, and expected values for different scenarios. They can be combined with other decision techniques. Some of the disadvantages of decision trees are that they are unstable, they are often relatively inaccurate, and calculations can get very complex.

Area Sensitivity   YES   YES   NO    YES   YES   NO
Notable Event      YES   YES   NO    NO    YES   YES
VIP Presence       YES   NO    NO    NO    YES   NO
Criminal Group     NO    YES   YES   NO    YES   NO
Crime              YES   NO    NO    NO    YES   NO
Table 2: Describing the Attributes

D. K-Means algorithm
K-means is also a very simple method for handling datasets and the most commonly used partitioning algorithm among the clustering algorithms in scientific and industrial applications. The acceptance of K-means is mainly due to its simplicity. The algorithm is also well suited to clustering large datasets because it has much lower computational complexity, which grows linearly with the number of data points. The advantages of the K-means algorithm are that it is relatively simple to implement, scales to large datasets, guarantees convergence, and easily adapts to new examples. Its disadvantages are that k must be chosen manually, that the result depends on the initial values, and that it has difficulty clustering data of varying sizes and density.

V. CONCLUSION

As we have seen, this paper focuses on building a way to predict the types of crime that occur most frequently, together with the incidence of crime per month. Crime rates in India are increasing every day because of many factors, including the rise in poverty, implementation, corruption, etc. The proposed model can be very useful for both the investigating agencies and police officials in taking the necessary steps to reduce crime.

The project helps crime analysts to analyze these crime networks by means of various interactive visualizations. A future enhancement of this research work is training bots to predict crime-prone areas using machine learning techniques. Since machine learning is similar to data mining, advanced concepts of machine learning can be used for better prediction. Data privacy, reliability, and accuracy can be improved for better prediction.

REFERENCES

[1] M. Sreedevi, A. Harsha Vardhan, Ch. Venkata Sai Krishna, "Review on Crime Analysis and Prediction Using Data Mining Techniques", International Journal of Innovative Research in Science, Engineering and Technology, 2018.
[2] Rajkumar and Sakkarai, "Review on Crime Analysis and Prediction Using Data Mining Techniques", International Journal of Recent Trends in Engineering & Research, 2019.
[3] Hitesh Kumar, Bhavana, Ginika, "Crime Prediction & Monitoring Framework Based on Spatial Analysis", ICCIDS, 2018.
