A PROJECT REPORT
Submitted by
ANANDAKUMAR. A (920819104005)
MAY 2023
ANNA UNIVERSITY::CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE
Dr. K. RAMANAN, M.Tech., Ph.D.,
HEAD OF THE DEPARTMENT
Professor,
Computer Science and Engineering,
NPR College of Engineering and Technology,
Natham, Dindigul – 624001.

SIGNATURE
Dr. K. RAMANAN, M.Tech., Ph.D.,
MENTOR
Assistant Professor,
Computer Science and Engineering,
NPR College of Engineering and Technology,
Natham, Dindigul – 624001.
ABSTRACT
crime database. Our system predicts what criminal activity occurs in everyday life in various parts of the world. Using machine learning and data mining algorithms, we can extract predictive information from the dataset. This approach helps solve crimes faster: instead of focusing on why a crime happened, we analyse crime data on a monthly and weekly basis.
ACKNOWLEDGEMENT
First and foremost, I praise and thank nature from the depth of my heart, which has been an immense source of strength, comfort and inspiration in the completion of this project work.

I would like to express my sincere thanks to our beloved Principal, Dr. J. SUNDARARAJAN, B.E., M.Tech., Ph.D., for permitting us to carry out our project.
TABLE OF CONTENTS
CHAPTER NO.  TITLE

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
I  INTRODUCTION
   1.1 Need for Study
   1.2 Objectives of Study
   1.3 Research Objective
   1.4 Challenges
   1.5 Technique Used
   1.6 Plan of Implementation
   1.7 Problem Statement
II  LITERATURE SURVEY
IV  SYSTEM STUDY
   4.1 Feasibility Study
      4.1.1 Economical Feasibility
      4.1.2 Technical Feasibility
   5.2 Advantages
VI  SYSTEM SPECIFICATION
   6.1 Technologies Used
APPENDIX
   OUTPUT SCREENSHOTS
   SAMPLE CODE
REFERENCES
LIST OF FIGURES

FIGURE NO.  FIGURE NAME

LIST OF TABLES

TABLE NO.  TABLE NAME
ACRONYMS ABBREVIATIONS
ML Machine Learning
IT Information Technology
K-MEANS K-Means Clustering (k = number of clusters)
IDLE Integrated Development and Learning Environment
COLAB Colaboratory
VS CODE Visual Studio Code
RAM Random Access Memory
GB Gigabyte
FPDA False Positive Data Aggregate
UML Unified Modeling Language
CHAPTER – I
INTRODUCTION
Crime is a significant threat to humankind. Many crimes happen at regular intervals of time, and crime is increasing and spreading at a fast and vast rate, from small villages and towns to big cities. Crimes are of different types: robbery, murder, rape, assault, battery, false imprisonment, kidnapping and homicide. Since crimes are increasing, there is a need to solve cases much faster, and it is the responsibility of the police department to control and reduce criminal activity. Crime prediction and identification are thus major problems for the police department. The aim of this project is to make crime predictions using the features present in the dataset. With the help of machine learning algorithms, we can predict the type of crime that will occur in a particular area.
Crime prediction and criminal identification are major problems for the police department, as a tremendous amount of crime data exists. There is a need for technology through which cases can be solved faster. Many studies and cases show that machine learning and data science can make this work easier and faster. The aim of this project is to make crime predictions using the features present in the dataset, which is extracted from official sites. With the help of machine learning algorithms, using Python at the core, we can predict the type of crime that will occur in a particular area along with its crime per capita. The objective is to train a model for prediction: training is done on a training dataset, which is then validated using the test dataset. Multiple Linear Regression (MLR) is used for crime prediction. The dataset is visualised to analyse the crimes that occurred in a particular year, based on population and number of crimes. This work helps law enforcement agencies predict and detect the crime per capita in an area and thus reduce the crime rate.
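As a minimal sketch of this idea (assuming a hypothetical crime.csv with year, population and crimes columns; the column names are illustrative placeholders, not the project's actual schema):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical schema: one row per area and year
df = pd.read_csv('crime.csv')
df['per_capita'] = df['crimes'] / df['population']

X = df[['year', 'population']]   # predictor features
y = df['per_capita']             # target: crime per capita

# Hold out a test set to validate the trained model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print('R^2 on test data:', model.score(X_test, y_test))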
Machine Learning is a sub-area of artificial intelligence; the term refers to the ability of IT systems to independently find solutions to problems by recognizing patterns in databases. In other words, machine learning enables IT systems to recognize patterns on the basis of existing algorithms and data sets and to develop adequate solution concepts. In machine learning, artificial knowledge is therefore generated on the basis of experience. To enable the software to generate solutions independently, prior action by people is necessary: the required algorithms and data must be fed into the systems in advance, and the respective analysis rules for recognizing patterns in the data stock must be defined. Once these two steps have been completed, the system can perform tasks of this kind by machine learning.
1.1 Need for the study
1.2 Objectives of the study
The aim of the study is to analyse and predict the crime rate and crime types in various locations, based on the spatial distribution of data and anticipation of the crime rate. The main objective of the project is to predict the crime rate and analyse the crime expected to happen in the future. Based on this information, the officials can take charge and try to reduce the crime rate.

→ The concept of Multiple Linear Regression is used to model the relationship between the year (independent variable) and the types and counts of crimes (dependent variable).
1.4 Challenges
A crime rate prediction and analysis system has many advantages, such as accessibility, simplicity, security and efficiency, but there are a number of challenging problems associated with designing, planning and implementing the entire system. Some of the challenges we faced during the project are:
Technical Challenges:
Data Security
General Challenges:
1. Undefined goals
2. Scope changes
3. Lack of engagement
4. Resource depreciation
5. Lack of time
Python was first released in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
NumPy:
NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely. NumPy stands for Numerical Python. In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray, and it provides a lot of supporting functions that make working with ndarray very easy. Arrays are very frequently used in data science, where speed and resources are very important. Unlike lists, NumPy arrays are stored at one continuous place in memory, so processes can access and manipulate them very efficiently. This behavior is called locality of reference in computer science, and it is the main reason why NumPy is faster than lists. It is also optimized to work with the latest CPU architectures.
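A small illustrative sketch of the ndarray object described above:

import numpy as np

# Create an ndarray from a Python list and operate on it element-wise
a = np.array([1, 2, 3, 4, 5])
print(a * 2)       # vectorised multiply: [ 2  4  6  8 10]
print(a.mean())    # built-in aggregate: 3.0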
Pandas:
Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating data. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008. Pandas allows us to analyze big data and draw conclusions based on statistical theories. Pandas can clean messy data sets and make them readable and relevant. Pandas is also able to delete rows that are not relevant or contain wrong values, such as empty or NULL values; this is called cleaning the data. Relevant data is very important in data science. For visualising such data, the following kinds of plots are commonly used:
Relational plots: used to understand the relation between two variables.
Categorical plots: deal with categorical variables and how they can be visualized.
Distribution plots: used for examining univariate and bivariate distributions.
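These three plot families correspond, for example, to the seaborn library's relplot, catplot and displot functions (seaborn is named here as an assumption, since the text does not say which library it means; a minimal sketch):

import seaborn as sns
import matplotlib.pyplot as plt

# Small example dataset shipped with seaborn
tips = sns.load_dataset('tips')
# Relational plot: relation between two variables
sns.relplot(data=tips, x='total_bill', y='tip')
plt.show()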
Matplotlib:
Matplotlib is a plotting library for the Python programming language and
its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI
toolkits like Tkinter, wxPython, Qt, or GTK. There is also a procedural "pylab"
interface based on a state machine (like OpenGL), designed to closely resemble
that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.
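A tiny sketch of the object-oriented API mentioned above (the numbers are made up for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()                        # Figure and Axes objects (the OO API)
ax.plot([2015, 2016, 2017], [120, 135, 128])    # illustrative yearly counts
ax.set_xlabel('Year')
ax.set_ylabel('Crime count')
fig.savefig('crime_trend.png')                  # export the embedded plot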
Matplotlib was originally written by John D. Hunter. Since then it has had
an active development community and is distributed under a BSD-style license.
Michael Droettboom was nominated as matplotlib's lead developer shortly
before John Hunter's death in August 2012 and was further joined by Thomas
Caswell. Matplotlib is a NumFOCUS fiscally sponsored project.
number of elements per cluster, we will be able to decide which element has
more weight inside the algorithm. Technically there is no data loss. Once the
data has been treated, the following algorithms will be tried (in order of
complexity):
- K-Nearest Neighbors.
- Neural Networks.
- Confusion matrix.
Each of them will be explained in depth in its own chapter later on. All development and testing was done on a server lent by a university department. This way, executions could run all day without needing to be watched, and they were a little faster.
1.7 Problem Statement
The main problem is that the population increases day by day, and with it crime also increases in different areas, so the crime rate cannot be accurately predicted by the officials. The officials, as they focus on many issues, may not predict the crimes that will happen in the future. Although the police try to reduce the crime rate, they may not manage to reduce it in a full-fledged manner, and predicting the future crime rate may be difficult for them. Countless works have been done related to crime. Large datasets have been reviewed, and information such as the location and type of crimes has been extracted to help people follow law enforcement. Existing methods have used these databases to identify crime hotspots based on location. There are several map applications that show the exact crime location along with the crime type for any given city. Even though crime locations have been identified, there is no information available that includes the crime occurrence date and time, along with techniques that can accurately predict what crimes will occur in the future.
CHAPTER – II
LITERATURE SURVEY
The main goal of this work is the identification and classification of crime data. Here, the system detects linked regions and chains them together by relative position. Modern technology also helps to catch mistakes.
2. “Machine Learning based criminal shortlisting using modus operandi features”, University of Mumbai, Shree L.R. Tiwari College of Engineering and Technology. Year: 2017.
3. “Crime analysis using data mining techniques and algorithms”, S. Sivaranjani, S. Sivakumari, M. Aasha, Dept. of Information Technology, University of Mumbai, L.R. Tiwari College. Year: 2017.
Existing: Does not collect the raw data.
Proposed: Collects the raw data that the hotspot uses after sharing the data, to create a new hotspot and ultimately predict crime rates.
This study deals with different types of crime scenes. The analysis was conducted from both research and behavioral perspectives. It provided information about unknown criminals and a recommendation for investigation. Classification is a data mining method used to classify each object; data mining creates classification models by observing confidential information. Naive Bayes is a classification algorithm used for prediction.

Crime is one of the biggest and most dominant problems in our society, and a huge number of crimes are committed daily. Here the dataset consists of the dates and the crime rates that occurred in the corresponding years; in this project the crime rate is based only on robbery. We use a linear regression algorithm to predict the percentage of the crime rate in the future from the previous data. The date is given as input to the algorithm, and the output is the percentage of the crime rate in that particular year.
Existing: Does not use the Naïve Bayes algorithm.
Proposed: Naive Bayes and a linear SVM are the classification algorithms used for prediction.
Merits: It gives the police a time frame and points out “hot spots”.
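A minimal sketch of this classification idea, with synthetic stand-in data in place of the survey's actual crime dataset (the features and labels here are placeholders):

from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic stand-in for crime features (e.g. location, time) and crime-type labels
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit both classifiers and compare their held-out accuracy
for model in (GaussianNB(), LinearSVC()):
    model.fit(X_train, y_train)
    print(type(model).__name__, 'accuracy:', model.score(X_test, y_test))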
5. “Cluster Analysis for Anomaly Detection in Accounting Data”
The purpose of this study is to examine the possibility of using clustering technology for continuous auditing. Automating fraud filtering can be of great value to preventive continuous audits. In this paper, cluster-based outliers help auditors focus their efforts when evaluating group life insurance claims. Claims with similar characteristics were grouped together, and clusters with small populations were flagged for further investigation. Some dominant characteristics of those clusters are, for example, a large beneficiary payment, a huge interest amount, or having been submitted a long time before getting paid. This study examines the application of cluster analysis in the accounting domain. The results provide a guideline and evidence for the potential application of this technique in the field of audit.
6. “Analysing Violent Criminal Behaviour by Simulation Model”
Crime analysis, a part of criminology, is a task that includes exploring and
detecting crimes and their relationships with criminals. The high volume of
crime datasets and also the complexity of relationships between these kinds of
data have made criminology an appropriate field for applying data mining
techniques. Identifying crime characteristics is the first step for developing
further analysis. The knowledge that is gained from data mining approaches is a
very useful tool which can help and support in identifying violent criminal
behaviour. The idea here is to try to capture years of human experience into
computer models via data mining and by designing a simulation model.
CHAPTER – III
EXISTING SYSTEM
3.1 Overview
Crime has been increasing day by day and everyone is trying to figure out how to control and reduce the crime rate. Various crime data mining techniques have been proposed in existing work, but each of them involves extraction only. Using hotspots, only crime zones are identified; criminal records cannot be analysed, and clustering techniques are not included in the existing systems. Without internet connectivity the system cannot work.

The existing work separates crime models based on crime analysis and the available crime information, and predicts crimes based on the spatial distribution of the available information. Because the authorities cannot accurately predict this crime rate, crime keeps increasing in various areas, and officers who focus on multiple issues may not predict future crimes. Current methods use these databases to identify crime scenes based on locations; although the crime scenes have been identified, there is no further information about the crime occurrences themselves.
3.3 Disadvantages
Code readability is inadequate.
Code is not optimized.
Cannot be modified.
CHAPTER – IV
SYSTEM STUDY
Economical feasibility
Technical feasibility
Social feasibility
CHAPTER – V
PROPOSED SYSTEM
5.1 Overview
In the proposed work, we analyse crime data with several parameters and factors, including daily, weekly and monthly crime and domestic violence. Using a decision tree algorithm and the K-means clustering algorithm, the crime type is predicted from latitude and longitude. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean. There are two kinds of data, a training set and a test set. A regression model predicts the transitivity of future crime data with various parameters.
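A minimal sketch of this pipeline, assuming a hypothetical crime.csv with Latitude, Longitude and Primary Type columns (placeholder names, not the project's real schema):

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('crime.csv')

# Group locations into k spatial clusters with K-means
coords = df[['Latitude', 'Longitude']]
df['cluster'] = KMeans(n_clusters=5, random_state=0).fit_predict(coords)

# Train a decision tree to predict the crime type from location and cluster
X = df[['Latitude', 'Longitude', 'cluster']]
y = df['Primary Type']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)
print('Test accuracy:', tree.score(X_test, y_test))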
1) Pre-processing Stage
In the pre-processing stage, the data collected from various resources is transformed into clean information. The transformed data is then stored in a graph data store. The system then automatically parses it to understand the type of information, its relationships, and so on, to provide all possible combinations of events.
2) Processing Stage
In the processing stage, the generated event combinations are analysed to produce possible configurations. The system decides on the most suitable combination with the help of previous data, thereby producing the set of locations with the set of possible events.
3) Post processing Stage
In the post-processing stage, the set of events is filtered down to the interesting and important events. These can be found by using several threshold-based filters at the output stage.
5.2 Advantages
Reusability of code.
Effective accuracy.
Prediction can be easy.
Precautionary methods can be taken to prevent crimes.
CHAPTER – VI
SYSTEM SPECIFICATION
The technologies may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design. They describe what the system does, not how it should be implemented.
Code Editor
It is a text editor program designed specifically for editing the source code of computer programs. It may be a standalone application, or it may be built into an integrated development environment or web browser.
Code editors make writing and reading source code easier by differentiating its elements and routines, so programmers can more easily look at their code.
Figure 6.1 Code editors
PyCharm :
The beta version of the product was released in July 2010, with version 1.0 arriving three months later. Version 2.0 was released on 13 December 2011, version 3.0 on 24 September 2013, and version 4.0 on 19 November 2014. PyCharm became open source on 22 October 2013. The open-source variant is released under the name Community Edition, while the commercial variant, Professional Edition, contains closed-source modules.
Visual Studio Code :
On November 18, 2015, the source of Visual Studio Code was released under the MIT License and made available on GitHub, and extension support was announced. On April 14, 2016, Visual Studio Code graduated from the public preview stage and was released to the web. Microsoft has released most of Visual Studio Code's source code on GitHub under the permissive MIT License, while the releases by Microsoft themselves are proprietary freeware. Visual Studio Code is a source-code editor that can be used with a variety of programming languages, including C, C#, C++, Fortran, Go, Java, JavaScript, Node.js, Python and Rust. It is based on the Electron framework, which is used to develop Node.js web applications that run on the Blink layout engine.
Visual Studio Code employs the same editor component (codenamed
"Monaco") used in Azure DevOps (formerly called Visual Studio Online and
Visual Studio Team Services). Out of the box, Visual Studio Code includes
basic support for most common programming languages. This basic support
includes syntax highlighting, bracket matching, code folding, and configurable
snippets. Visual Studio Code also ships with IntelliSense for JavaScript,
TypeScript, JSON, CSS, and HTML, as well as debugging support for Node.js.
Support for additional languages can be provided by freely available extensions
on the VS Code Marketplace.
Sublime Text:
Sublime Text is a shareware text and source code editor available for
Windows, macOS, and Linux. It natively supports many programming
languages and markup languages. Users can customize it with themes and
expand its functionality with plugins, typically community-built and maintained
under free-software licenses. To facilitate plugins, Sublime Text features a Python API. The editor utilizes a minimal interface and contains features for programmers, including configurable syntax highlighting, code folding, search-and-replace with regular-expression support, a terminal output window, and more. It is proprietary software, but a free evaluation version is available. The following is a list of features of Sublime Text:
5) Project-specific preferences.
6) Extensive customizability via JSON settings files, including project- specific
and platform-specific settings.
Spyder :
Spyder uses Qt for its GUI and is designed to use either of the PyQt or
PySide Python bindings. QtPy, a thin abstraction layer developed by the Spyder
project and later adopted by multiple other packages, provides the flexibility to
use either backend.
Atom :
Atom is a deprecated free and open-source text and source code editor for
macOS, Linux, and Windows with support for plug-ins written in JavaScript,
and embedded Git Control. Developed by GitHub, Atom was released on June
25, 2015. On June 8, 2022, GitHub announced that Atom's end-of-life would
occur on December 15 of that year, "in order to prioritize technologies that
enable the future of software development", specifically its GitHub Codespaces
and Microsoft's Visual Studio Code.
Jupyter :
Jupyter is a project to develop open-source software, open standards, and
services for interactive computing across multiple programming languages. It
was spun off from IPython in 2014 by Fernando Pérez and Brian Granger.
Project Jupyter's name is a reference to the three core programming languages
supported by Jupyter, which are Julia, Python and R. Its name and logo are an
homage to Galileo's discovery of the moons of Jupiter, as documented in
notebooks attributed to Galileo. Project Jupyter has developed and supported the
interactive computing products Jupyter Notebook, JupyterHub, and JupyterLab.
Jupyter is financially sponsored by NumFOCUS. The first version of
Notebooks for IPython was released in 2011 by a team including Fernando
Pérez, Brian Granger, and Min Ragan-Kelley. In 2014, Pérez announced a spin-
off project from IPython called Project Jupyter.
IPython continues to exist as a Python shell and a kernel for Jupyter, while
the notebook and other language-agnostic parts of IPython moved under the
Jupyter name. Jupyter supports execution environments (called "kernels") in
several dozen languages, including Julia, R, Haskell, Ruby, and Python (via the
IPython kernel).
NotePad ++ :
Notepad++ is a highly functional, free, open-source, editor for MS Windows
that can recognize (i.e., highlight syntax for) several different programming
languages from Assembly to XML, and many others in between, including, of
course, Python. Besides syntax highlighting, Notepad++ has some features that are
particularly useful to coders. It will allow you to create shortcuts to program calls,
such as a Run Python menu item that will invoke python.exe to execute your
Python code without having to switch over to another window running a Python
shell, such as IPython. Another very convenient feature is that it will group
sections of code and make them collapsible so that you can hide blocks of code to
make the page/window more readable.
Google Colab :
Colab notebooks allow you to combine executable code and rich text in a
single document, along with images, HTML, LaTeX and more. When you create
your own Colab notebooks, they are stored in your Google Drive account. You
can easily share your Colab notebooks with co-workers or friends, allowing
them to comment on your notebooks or even edit them. With Colab you can
harness the full power of popular Python libraries to analyze and visualize data.
The code cell below uses numpy to generate some random data, and uses
matplotlib to visualize it.
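Such a cell might look like the following sketch (illustrative, not the original notebook cell):

import numpy as np
import matplotlib.pyplot as plt

# Generate some random data and visualise it
data = np.random.randn(100)
plt.hist(data, bins=20)
plt.title('Random data histogram')
plt.show()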
The Core i3 processor is available in multiple speeds, ranging from 1.30 GHz
up to 3.50 GHz, and features either 3 MB or 4 MB of cache. It utilizes either the
LGA 1150 or LGA 1155 socket on a motherboard. Core i3 processors are often
found as dual-core, having two cores. However, a select few high-end Core i3
processors are quad-core, featuring four cores. The most common type of RAM
used with a Core i3 processor is DDR3 1333 or DDR3 1600.
CHAPTER – VII
SYSTEM DESIGN
Block diagrams are typically used for higher level, less detailed
descriptions that are intended to clarify overall concepts without concern for the
details of implementation. Contrast this with the schematic diagrams and layout diagrams used in computer engineering, which show the implementation details of physical construction.
Figure 7.1 Block diagram of our proposed work
The data set can be collected from the official website for the analysis
process. After collecting the data set, start the preprocessing method. This
preprocessing method can be used to remove unknown or unwanted values from
a loaded dataset. We can clean the data and prepare it for our clustering
algorithm. It converts the raw data into an understandable form. After the data is
pre-processed, we begin the train/test distribution. A K-means clustering algorithm is then applied to partition the N observations into K clusters, where each observation belongs to the closest cluster. We then use a linear regression algorithm to find the ratio and percentage of crime that occurred. The Naive Bayes algorithm is the third algorithm we use in our project.
CHAPTER – VIII
SYSTEM IMPLEMENTATION
First, users complete the registration process; the user's information is sent to the authentication server. After logging in, users are able to access the system.
8.1 Modules
c. Filter unwanted outliers: Often there are individual observations which, at first glance, do not seem to fall within the limits of the data being analyzed. If we have a legitimate reason to correct an anomaly, such as an incorrect data entry, doing so helps the work with the data. However, sometimes the appearance of an outlier will prove a theory you are working on, and just because an outlier exists doesn't mean it is incorrect. This step is needed to determine the validity of that number. If an outlier proves to be irrelevant for the analysis or is a mistake, consider removing it.
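One common filter of this kind is the interquartile-range rule (a sketch; the column name is a placeholder):

import pandas as pd

def filter_iqr(df, col):
    # Keep rows whose value lies within 1.5 IQR of the quartiles
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[(df[col] >= q1 - 1.5 * iqr) & (df[col] <= q3 + 1.5 * iqr)]

# Example: drop extreme values from a hypothetical 'crime_count' column
# df = filter_iqr(df, 'crime_count')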
a. Data Smoothing: Data Smoothing is a process that is used to remove
noise from the dataset using some algorithms. It allows for highlighting
important features present in the dataset. It helps in predicting the patterns.
When collecting data, it can be manipulated to eliminate or reduce any
variance or any other noise form. The concept behind data smoothing is
that it will be able to identify simple changes to help predict different
trends and patterns.
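For instance, a moving average is one simple smoothing algorithm (a sketch with made-up numbers):

import pandas as pd

# 3-point moving average over a hypothetical monthly crime series
s = pd.Series([30, 45, 28, 50, 42, 39])
print(s.rolling(window=3).mean())   # smoothed values; the first two are NaN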
b. Attribute Construction: In this method, new attributes are derived from the existing attributes to construct a new dataset that eases data mining. New attributes are created from the given attributes and applied to assist the mining process.
For example, suppose we have a dataset referring to measurements of different plots: we may have the height and width of each plot, from which an area attribute can be constructed.
c. Data Aggregation: Data aggregation is the method of storing and
presenting data in a summary format. The data may be obtained from
multiple data sources to integrate these data sources into a data analytics
description. This is a crucial step since the accuracy of data analysis insights
is highly dependent on the quantity and quality of the data used.
For example, if we have a dataset of an enterprise's sales report with quarterly sales for each year, we can aggregate the data to get the enterprise's annual sales report, as sketched below.
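A sketch of that aggregation (the sales figures are made up):

import pandas as pd

# Quarterly sales aggregated into annual totals
sales = pd.DataFrame({
    'year':    [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022],
    'quarter': [1, 2, 3, 4, 1, 2, 3, 4],
    'sales':   [100, 120, 90, 150, 110, 130, 95, 160],
})
print(sales.groupby('year')['sales'].sum())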
d. Data Normalization: Normalization of the data refers to scaling the data values to a much smaller range such as [-1, 1] or [0.0, 1.0]. There are different methods to normalize the data, for example the three sketched below:
Min-max normalization
Z-score normalization
Decimal scaling
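A sketch of the three methods on a single column x (illustrative values):

import numpy as np

x = np.array([200.0, 300.0, 400.0, 600.0, 1000.0])

# Min-max normalization: rescale to [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score normalization: zero mean, unit standard deviation
z_score = (x - x.mean()) / x.std()

# Decimal scaling: divide by 10^j so that max(|x|) < 1 (here j = 4)
j = int(np.ceil(np.log10(np.abs(x).max() + 1)))
decimal = x / (10 ** j)

print(min_max, z_score, decimal, sep='\n')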
e. Data Discretization: This is a process of converting continuous data into a set of data intervals. Continuous attribute values are substituted by small interval labels. This makes the data easier to study and analyze and improves the efficiency of the task. This method is also called a data reduction mechanism, as it transforms a large dataset into a set of categorical data. Discrete values also let decision-tree-based algorithms produce short, compact and accurate results.
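For example (a sketch; the interval labels are placeholders):

import pandas as pd

# Convert a continuous attribute into labelled intervals
ages = pd.Series([5, 17, 24, 45, 67, 80])
bins = pd.cut(ages, bins=[0, 18, 40, 65, 100],
              labels=['child', 'young', 'middle-aged', 'senior'])
print(bins)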
Use Case Diagram
The use case diagram of the proposed system shows the user inputting the dataset, the pre-processing of the dataset, and the Decision Tree and K-means clustering algorithms generating the trained model to predict the crime type. The actors and use cases are represented: an ellipse shape represents each use case, namely input image, pre-process, split features, prediction and output.

Class Diagram
The class diagram explains the properties and functions of each class. The classes are Main, pre-process, split data, train and test. In the diagram, every class is represented with its attributes and operations.
8.1.2 Classification Module
Figure 8.5 Crime Dataset
8.1.3. Clustering Module
Figure 8.7 Clustering using K-Means Algorithm
Figure 8.8 Split the data into training & testing
CHAPTER – IX
The graph represents how false positive data gets aggregated. The False Positive Data Aggregate rate (FPDA) is the ratio of falsely aggregated data to the overall aggregated data.

Table 1: False positive data aggregate rate with respect to detect count
This graph shows the time taken for data gathering, i.e. the difference between the end time and the start time of crime data gathering. It is measured in milliseconds and is formulated as given below.

DCt = (End TimeDC – Start TimeDC)
[Graph: time taken for data gathering (ms) versus frame count]
Table 2: Time taken for data gathering with respect to frame count

Frame count    Time taken (ms)
10             125    174
20             146    176
30             155    187
40             164    190
50             168    195
Scatter Graph:
The scatter graph represents the total crime that appears in various regions. With its help, the crimes appearing in the various hotspots can be determined. The total crime values are taken from the crime dataset.
Table 3: Total crime count

nm_pol                 Total crime
CHITRANJAN PARK        512
DABRI                  397
MALVIYA NAGAR          837
CHANDNI MAHAL          588
MODEL TOWN             466
ANANDVIHAR             619
KASHMERE GATE          398
GOVIND PURI            741
BINDAPUR               509
NEW FRIENDS COLONY     694
SARITA VIHAR           245
Bar Graph:
The bar graph represents the murders out of the total crime that appears in various regions. With its help, the murder-type crimes in the various hotspots can be determined. The murder values are taken from the crime dataset.
Line Graph:
The line graph represents the relationship between the normal murders and assault murders that appear in various regions. With its help, the murder-type crimes in the various hotspots can be determined. The murder and assault-murder values are taken from the crime dataset.
Data Visualization:
A large amount of information represented in graphic form is easier to
understand and analyze. Some companies specify that a data analyst must know
how to create slides, diagrams, charts, and templates. In our approach, the data
histogram and scatter matrix are shown as data visualization part.
Table 6: Top 5 crime values
Number of arrests:
The following line graph denotes the number of arrests, determined on a monthly and weekly basis.
Elbow Method:
The elbow method runs k-means clustering on the dataset for a range of
values of k. In the elbow method we plot mean distance and look for the elbow
point where the rate of decrease shifts. For each k, calculate the total within-
cluster sum of squares.
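In symbols, with cluster centres $\mu_j$, the quantity computed for each $k$ is

$$\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,$$

and the elbow is the $k$ beyond which WCSS stops decreasing sharply.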
k    Distortion
1    0.42
2    0.27
3    0.24
4    0.22
5    0.15
6    0.09
CHAPTER – X
TESTING
After finishing the development of any computer-based system, the next complicated, time-consuming process is system testing. Only during testing can the development company know how far the user requirements have been met, and so on. Software testing is an important element of software quality assurance and represents the ultimate review of specification, design and coding. The increasing visibility of software as a system element and the costs associated with software failure are motivating forces for well-planned, thorough testing.
Testing procedures for the project were done in the following sequence:
System testing is done to check the server name of the machines being connected between the customer and the executive.
The product information provided by the company to the executive is tested against validation with the centralized data store.
System testing is also done to check the executive's ability to connect to the server.
The server name authentication is checked, along with its availability to the customer.
Proper communication chat line viability is tested to make the chat system function properly.
Mail functions are tested against user concurrency and customer mail data validation.
10.2 Specification Testing
We specify what the program should do and how it should perform under various conditions. This testing is a comparative evaluation of system performance against the system requirements.
In this, errors are found in each individual module, which encourages the programmer to find and rectify them without affecting the other modules. Unit testing focuses the verification effort on the smallest unit of software: the module. The local data structure is examined to ensure that data stored temporarily maintains its integrity during all steps in the algorithm's execution. Boundary conditions are tested to ensure that the module operates properly at the boundaries established to limit or restrict processing.
10.7 Recovery Testing
10.11 Output Testing
After performing validation testing, the next step is output testing of the proposed system, since no system can be termed useful unless it produces the required output in the specified format. The output format is considered in two ways: the screen format and the printer format.
User acceptance testing is a key factor for the success of any system. The system under consideration was tested for user acceptance by constantly keeping in touch with prospective system users at the time of development.
CHAPTER – XI
CONCLUSION
CHAPTER – XII
FUTURE SCOPE
APPENDICES
OUTPUT SCREENSHOTS
SAMPLE CODING
Header page:
<!DOCTYPE html>
<html>
<head>
<title>Etihaad</title>
<style>
h1 {
margin-top: 3.0em;
}
i.icon {
font-size: 2.0em;
}
</style>
</head>
<body>
Index Page
<%include partials/header%>
<!DOCTYPE html>
<html>
<head>
<title>Major Project</title>
<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
Pre-processing and feature scaling:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the crime dataset and select the feature columns
dataset = pd.read_csv('crime.csv')
X = dataset.iloc[:, [1,2,3,4,5,6,7,12]].values

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X = sc_X.fit_transform(X)
Elbow method to find the optimum k value:
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt

# K values to try (the original range was not shown; 1-9 is assumed here)
distortions = []
K = range(1, 10)
for k in K:
    # data_normalize is the scaled feature matrix from the previous step
    kmeanModel = KMeans(n_clusters=k).fit(data_normalize)
    # Mean distance of each point to its nearest cluster centre
    distortions.append(sum(np.min(cdist(data_normalize,
        kmeanModel.cluster_centers_, 'euclidean'), axis=1)) /
        data_normalize.shape[0])

# Plot the elbow
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()
Split the data into training and testing
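A minimal sketch of such a split, assuming X holds the scaled features from the earlier step and y is a hypothetical label column:

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)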
REFERENCES
3. Anon: A review: Crime analysis using data mining techniques and algorithms. Accessed 30 Aug. 2019.
4. Marchant, R., Haan, S., Clancey, G., Cripps, S.: Applying machine learning to criminology: semi-parametric spatial-demographic Bayesian regression. Security Informatics 7(1) (2018).
6. Lin, Y., Chen, T., Yu, L.: Using machine learning to assist crime prevention. In: 2017 Sixth IIAI International Congress on Advanced Applied Informatics (IIAI-AAI).
8. Sivaranjani, S., Sivakumari, S., Aasha, M.: Crime prediction and forecasting in Tamilnadu using clustering approaches. Dept. of Information Technology, University of Mumbai, L.R. Tiwari College (2017).
12. Joshi, S., Nigam, B.: “Categorizing the document using multi class classification in data mining,” International Conference on Computational Intelligence and Communication Systems, 2011.