PAASBAAN
REPORT
ON
Submitted By:
Affiliated to
Savitribai Phule Pune University
Himanshu Sharma
Atharva
<Your Topic>
With the rapid urbanization and development of big cities and towns, crime is also on the rise. This sharp increase in offences in cities is a matter of great concern to all of us.
Frequent thefts, burglaries, robberies, murders, rapes, shoplifting, pickpocketing, drug abuse, illegal trafficking, smuggling and vehicle theft have left ordinary citizens with sleepless nights and restless days.
They feel insecure and vulnerable in the presence of anti-social elements. Criminals often operate in an organized way and sometimes have nationwide and even international connections.
Scope
● Security organizations can use the recorded crime data (the data visualization aspect of the project) to decide where to deploy preventive and response forces.
● Crime prediction is an important capability that intelligence agencies such as the CBI and RAW can use to help prevent crimes from occurring.
Requirements
Functional Requirements
● Anaconda Navigator-Spyder/Jupyter Notebook
● Microsoft Excel
Non-Functional Requirements
● To provide maximum accuracy.
● Ease of use
● Availability
● Reliability
● Maintainability
List of Tables
S.No.   Table Number   Table Name
1       1.1            Police Dataset
2       1.4            Dataset after preprocessing
3       1.5            Role and Responsibilities
CONTENTS
Title Page
Declaration
Certificate by the Supervisor
Acknowledgement
List of Figures
List of Tables
Abstract
Chapter 1: Introduction
1.1 Rationale
1.2 Goal
Chapter-1
Introduction
Paasbaan is an Urdu word meaning "protector". Many important questions in public safety and protection relate to crime, and a better understanding of crime is beneficial in multiple ways: it can lead to targeted and sensitive practices by law enforcement authorities to mitigate crime, and to more concerted efforts by citizens and authorities to create healthy neighborhood environments.
With the advent of the Big Data era and the availability of fast, efficient algorithms for data analysis, understanding patterns in crime from data is an active and growing field of research.
The inputs to our algorithms are time (hour, day, month, and year) and place (latitude and longitude). Each record is labelled with its class of crime, identified by the Act (section) under which it was registered, for example:
Act 13 - Gambling
Act 279 - Accident
The output is the class of crime that is likely to have occurred. We try out multiple classification algorithms: KNN (K-Nearest Neighbors), Decision Trees, and Random Forests.
We also perform multiple classification tasks: we first try to predict which of 6 classes of crime is likely to have occurred, and later try to differentiate between violent and non-violent crimes.
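The two labelling schemes described above can be sketched as follows. The six Act-to-class pairs are the ones listed in this report; the violent/non-violent grouping is an assumption made purely for illustration, since the report does not state which classes fall into which group, and the function names are hypothetical.

```python
# Act (section) numbers and class names as listed in this report.
ACT_TO_CLASS = {
    379: "Robbery",
    13: "Gambling",
    279: "Accident",
    323: "Violence",
    302: "Murder",
    363: "Kidnapping",
}

# ASSUMPTION: this grouping is illustrative only; the report does not
# say which of the six classes it treats as violent.
VIOLENT_CLASSES = {"Robbery", "Violence", "Murder", "Kidnapping"}


def six_class_label(act_number):
    """Return the crime class for an Act number (KeyError if unknown)."""
    return ACT_TO_CLASS[act_number]


def binary_label(act_number):
    """Collapse the six classes into 'violent' / 'non-violent'."""
    if six_class_label(act_number) in VIOLENT_CLASSES:
        return "violent"
    return "non-violent"
```

Keeping the mapping in one dictionary means the six-class task and the two-class task are derived from the same source labels, so they cannot drift apart.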
1.1 Rationale
Indore, the commercial capital of Madhya Pradesh, topped the country's crime record in 2008, followed by Bhopal and Jaipur. According to the National Crime Records Bureau (NCRB) report "Crime in India 2008", Indore's crime rate was 941.4, the highest in the country.
With the rapid urbanization and development of big cities and towns, crime is also on the rise. This sharp increase in offences in cities is a matter of great concern to all of us.
Frequent thefts, burglaries, robberies, murders, rapes, shoplifting, pickpocketing, drug abuse, illegal trafficking, smuggling and vehicle theft have left ordinary citizens with sleepless nights and restless days.
They feel insecure and vulnerable in the presence of anti-social elements. Criminals often operate in an organized way and sometimes have nationwide and even international connections.
1.2 Goal
Much of the current work is focused in two major directions:
1.3 Objective
The objective of our work is to:
1.4 Methodology
1.4.1 Machine learning
The term machine learning refers to the automated detection of meaningful patterns in data. In the past couple of decades it has become a common tool in almost any task that requires information extraction from large data sets. We are surrounded by machine-learning-based technology: search engines learn how to bring us the best results (while placing profitable ads), anti-spam software learns to filter our email messages, and credit card transactions are secured by software that learns how to detect fraud. Digital cameras learn to detect faces, and intelligent personal assistant applications on smartphones learn to recognize voice commands. Cars are equipped with accident prevention systems that are built using machine learning algorithms.
The inputs to our algorithms are time (hour, day, month, year) and place (latitude and longitude). Each record is labelled with one of the following classes of crime:
Act 379 - Robbery
Act 13 - Gambling
Act 279 - Accident
Act 323 - Violence
Act 302 - Murder
Act 363 - Kidnapping
The output is the class of crime that is likely to have occurred. We try out multiple classification algorithms: KNN (K-Nearest Neighbors), Decision Trees, and Random Forests.
However, the dataset is in Hindi, so it cannot be used for machine learning as it is.
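Table 1.1 Police Dataset (two sample records, translated from Hindi; entries that could not be recovered from the source are marked [illegible])

Record 1
Police station: Juni Indore
Crime/Marg number: 89/18
Section: 379
Complainant name and address: Sunil Anjane, age 28 years, father/husband Suresh Anjane, resident of 56 Type 2, P&T Colony, Khatiwala Tank, Indore
Accused name and address: Unknown
Place of incident: 46 Type 2, BSNL Quarter, Khatiwala Tank, Indore
Date and time of incident: 08-02-18, between 11:0 and 12:0
Date and time of registration: 2/10/2018 12:45:00 PM
Reason for delay in registration: On the complainant's arrival at the police station
Description of incident: An unknown person stole the complainant's motorcycle, which had no registration number, from the place where it was parked

Record 2
Police station: Rau
Crime/Marg number: 64/18
Section: 13, Gambling Act
Complainant name and address: On behalf of the State, Police ASI [illegible] Shrivastava, father/husband Bhagwan Das, resident of Police Station Rau
Accused name and address: Dinesh - [illegible], Suresh - Ramesh Yadav, Mukesh - Satyanarayan Panwar
Place of incident: Badi Mohalla, Rau
Date and time of incident: 10-02-18, between 19:10 and 20:0
Date and time of registration: 2/10/2018 8:10:00 PM
Reason for delay in registration: (blank)
Description of incident: On the date of the incident, the accused were caught placing win-or-lose bets with playing cards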
Dropping features such as police station, station number, complainant name and address, and accused name and address.
Dropping features such as Resolution, Description and Address: The resolution and description of a crime are only known once the crime has occurred, and have limited significance in a practical, real-world scenario where one is trying to predict what kind of crime has occurred, so these were omitted. The address was dropped because we had the latitude and longitude, and in that context the address added little marginal value.
The timestamp contained the year, date and time of occurrence of each crime. This was
decomposed into five features: Year (2018), Month (1-12), Date (1-31), Hour (0-23) and
Minute (0-59).
Following these preprocessing steps, we ran some out-of-the-box learning algorithms as part of our initial exploratory steps. Our new feature set consisted of 9 features, all of which were now numeric.
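The preprocessing steps above (dropping fields, then splitting the timestamp into five numeric features) might look roughly like the sketch below. It uses only the Python standard library. The English field names, the sample coordinates and the timestamp format (month/day/year with a 12-hour clock, as in the registration column of the sample records) are assumptions, not the project's actual schema.

```python
from datetime import datetime

# ASSUMPTION: hypothetical English names standing in for the Hindi columns.
DROPPED_FIELDS = {
    "police_station", "station_number",
    "complainant", "accused",
    "description", "address",
}


def decompose_timestamp(text, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Split one timestamp string into Year, Month, Date, Hour, Minute."""
    moment = datetime.strptime(text, fmt)
    return {
        "year": moment.year,
        "month": moment.month,
        "date": moment.day,
        "hour": moment.hour,      # 0-23
        "minute": moment.minute,  # 0-59
    }


def preprocess(record):
    """Drop unused fields and replace 'timestamp' with five numeric features."""
    kept = {
        key: value
        for key, value in record.items()
        if key not in DROPPED_FIELDS and key != "timestamp"
    }
    kept.update(decompose_timestamp(record["timestamp"]))
    return kept


# Illustrative record; the coordinates are made-up placeholder values.
sample = {
    "police_station": "Rau",
    "station_number": "64/18",
    "act": 13,
    "complainant": "...",
    "accused": "...",
    "description": "...",
    "timestamp": "2/10/2018 8:10:00 PM",
    "latitude": 22.64,
    "longitude": 75.81,
}
row = preprocess(sample)
```

After this step every remaining value in `row` is numeric, which is what the classifiers expect.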
1.4.4 Methodology
After the preprocessing described in the previous sections, we had three different classification problems to solve, which we proceeded to attack with an assortment of classification algorithms.
The following are the algorithms which we are using:
K-Nearest Neighbors (KNN)
Decision Tree
Random Forests
Table 1.5 Role and Responsibilities

Name: Sourabh Tiwari
Role: Data Scientist & GUI Developer
Responsibilities:
∙ Data Entry
∙ Data Preprocessing
∙ GUI (Flask)
∙ Documentation
∙ Machine Learning
∙ Data Analysis
∙ Data Mining
∙ Kernel Designing
∙ Data Visualization

Name: Vikramaditya Singh Bhati
Role: Data Scientist & GUI Developer
Responsibilities:
∙ Data Entry
∙ Data Preprocessing
∙ GUI (Flask)
∙ Documentation
∙ Machine Learning
∙ Data Analysis
∙ Data Mining
∙ Kernel Designing
∙ Data Visualization
The use of AI/ML in predicting crimes or an individual's likelihood of committing a crime has promise but is still largely unproven. The biggest challenge will probably be "proving" to politicians that it works: when a system is designed to stop something from happening, it is difficult to prove the negative. Companies that are directly involved in providing governments with AI tools to monitor areas or predict crime will likely benefit from a positive feedback loop. Improvements in crime prevention technology will likely spur increased total spending on this technology.
Possible avenues through which to extend this work include time-series modeling of the data to understand temporal correlations in it, which can then be used to predict surges in different categories of crime. It would also be interesting to explore relationships between surges in different categories of crime. For example, it could be the case that two or more classes of crime surge and sink together, which would be an interesting relationship to uncover. Other areas to work on include implementing a more accurate multi-class classifier and exploring better ways to visualize our results.
1.6.2 Innovativeness
The idea behind this project is that crimes are relatively predictable; it just requires being able to sort through a massive volume of data to find patterns that are useful to law enforcement. This kind of data analysis was technologically impossible a few decades ago, but the hope is that recent developments in machine learning are up to the task.
1.6.3 Usefulness
Public safety and protection relate to crime, and a better understanding of crime is beneficial in
multiple ways: it can lead to targeted and sensitive practices by law enforcement authorities to
mitigate crime, and more concerted efforts by citizens and authorities to create healthy
neighborhood environments. With the advent of the Big Data era and the availability
of fast, efficient algorithms for data analysis, understanding patterns in crime from data is an active
and growing field of research.
Chapter 5 presents the conclusion, the future scope, and the future applications of this project.
Chapter-2
Requirement Engineering
Ease of use.
Availability
Reliability
Maintainability
Chapter-3
Analysis and Design
A use case is thus a set of scenarios tied together by a common goal. Use case diagrams are drawn to expose the functionality of the system.
Fig 3.1-Use case diagram of Paasbaan
It is a time-oriented view of the interaction between objects to accomplish a behavioural goal of the system.
Fig 3.3-Sequence diagram of Paasbaan
Chapter-4
Construction
4.1 Implementation
The project is implemented in Python. For the machine learning work in particular, the Anaconda distribution is used.
Anaconda is one of several Python distributions. The company behind it was formerly known as Continuum Analytics. Anaconda bundles more than 100 packages and is used for scientific computing, data science, statistical analysis, and machine learning.
We found Anaconda the easier way to work with Python, since it helps with the following problems:
The data was scraped from the publicly available records on the Indore police website, which are entered by staff at police stations in different areas. We started with Indore city itself so as to limit the area covered by the prediction and keep the problem less complex. The data was sorted and converted into a new format of timestamp, longitude and latitude, which is the input the model takes in order to predict crime in a particular location or city.
These entries were used to train the models, that is, to let them learn the relationship between the input data and the expected output. Once the models were trained, the accuracy of the different algorithms was measured, and the most accurate one, Random Forest, was used for the prediction kernel.
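The selection step described above, measuring the accuracy of each algorithm and keeping the best one, can be sketched in plain Python as below. The model names and the toy labels are hypothetical placeholders chosen only to exercise the code; they are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_model(y_true, predictions_by_model):
    """Return (name of the most accurate model, accuracy of every model)."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (NOT project results): four held-out labels, two candidate models.
y_true = ["Robbery", "Gambling", "Accident", "Murder"]
toy_predictions = {
    "model_a": ["Robbery", "Accident", "Accident", "Violence"],
    "model_b": ["Robbery", "Gambling", "Accident", "Violence"],
}
winner, scores = best_model(y_true, toy_predictions)
```

In practice the predictions would come from each classifier evaluated on data it was not trained on, so that the comparison reflects generalization rather than memorization.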
K-Nearest Neighbors (KNN) is a powerful classification algorithm used in pattern recognition and one of the most widely used data mining algorithms. It stores all available cases and classifies new cases based on a similarity measure (e.g. a distance function). It is a non-parametric, lazy learning algorithm (an instance-based learning method).
An object (a new instance) is classified by a majority vote of its neighbors' classes: it is assigned to the most common class among its K nearest neighbors, as measured by the distance function.
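The majority-vote rule just described can be written in a few lines of standard-library Python. This is a minimal sketch rather than the project's code: it assumes Euclidean distance, and the function name, the toy points and the labels "A"/"B" are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
```

Because KNN is a lazy learner, there is no training step: all the work happens at prediction time, when distances to the stored cases are computed.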
A decision tree differs from the other algorithms in that it works intuitively, taking decisions one at a time.
A decision tree picks the most informative variable according to a splitting criterion (for example, Gini impurity or information gain) and splits the dataset on it. This is repeated until the resulting subsets are homogeneous enough to give predictions with high confidence.
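To make the idea of splitting towards homogeneous subsets concrete, the sketch below scores candidate thresholds on a single numeric feature. It assumes Gini impurity as the criterion, which is one common choice; the report does not say which criterion was actually used, and the function names and toy values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure subset, larger for more mixed subsets."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy feature (e.g. hour of day) with two cleanly separable classes.
hours = [1, 2, 3, 20, 21, 22]
classes = ["A", "A", "A", "B", "B", "B"]
```

A full tree applies this search to every feature at every node and recurses on the two resulting subsets.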
Thus, the Random Forests algorithm is a variance-minimizing algorithm that uses randomness when making split decisions to help avoid overfitting on the training data.
Also, each θ_k is a randomly chosen parameter vector. If D(x, y) denotes the training dataset, each classification tree in the ensemble is built using a different subset D_θk(x, y) ⊂ D(x, y) of the training dataset.
Thus, h(x|θ_k) is the kth classification tree, which uses a subset of features x_θk ⊂ x to build a classification model. Each tree then works like a regular decision tree: it partitions the data based on the value of a particular feature (which is selected randomly from the subset), until the data is fully partitioned or the maximum allowed depth is reached. The final output y is obtained by aggregating the results of the individual trees.
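The aggregation formula itself is not present in this copy of the report. For classification, the usual choice in Random Forests is a majority vote over the trees, which in the notation above would read as follows. Treat this as an assumption about the intended equation; K (the number of trees), c (a candidate class) and the indicator function are symbols introduced here.

```latex
y = \arg\max_{c} \sum_{k=1}^{K} \mathbb{1}\!\left[\, h(x \mid \theta_k) = c \,\right]
```

That is, each tree h(x|θ_k) casts one vote for a class, and the forest outputs the class with the most votes.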
Python (3.6.5)
Packages Used:
HTML 5
CSS 3
Bootstrap 4
RAM: 4 GB or greater.
4.5 Testing
The development of software involves a series of production activities where opportunities for the injection of human fallibility are enormous.
Errors may begin to occur at the very inception of the process, where the objectives may be erroneously or imperfectly specified, as well as in later design and development stages. Because of the human inability to perform and communicate with perfection, software development is accompanied by quality assurance activities.
Software testing is a crucial element of software quality assurance and represents the ultimate review of specification, design and coding.
Chapter-5
Conclusion and future scope
5.1 Conclusion
The initial problem of classifying 6 different crime categories was a challenging multi-class classification problem, and there was not enough predictability in our initial dataset to obtain very high accuracy on it. We found that a more meaningful approach was to collapse the crime categories into fewer, larger groups in order to find structure in the data. With this approach we obtained high accuracy and precision on prediction. However, the violent/non-violent crime classification did not yield remarkable results with the same classifiers; this was a significantly harder classification problem. Thus, collapsing crime categories is not an obvious task and requires careful choice and consideration.
Possible avenues through which to extend this work include time-series modeling of the data to understand temporal correlations in it, which can then be used to predict surges in different categories of crime. It would also be interesting to explore relationships between surges in different categories of crime: for example, it could be the case that two or more classes of crime surge and sink together, which would be an interesting relationship to uncover. Other areas to work on include implementing a more accurate multi-class classifier and exploring better ways to visualize our results.
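One simple way to check whether two crime categories "surge and sink together" is to correlate their counts per time period. The sketch below computes the Pearson correlation coefficient by hand; the weekly counts are invented toy numbers, and the choice of weekly buckets is an assumption rather than something this project has implemented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (+1 = move together)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy weekly counts (NOT real data) for three hypothetical categories.
weekly_a = [1, 2, 3, 4]
weekly_b = [2, 4, 6, 8]   # rises whenever weekly_a rises
weekly_c = [4, 3, 2, 1]   # falls whenever weekly_a rises
```

A coefficient near +1 for a pair of categories would suggest they rise and fall together, while a value near -1 would suggest one rises as the other falls; correlation alone does not show that one causes the other.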
Predicting Future Crime Spots: By using historical data and observing where recent crimes took place, we can predict where future crimes are likely to happen. For example, a rash of burglaries in one area could correlate with more burglaries in surrounding areas in the near future. The system would highlight possible hotspots on a map that the police should consider patrolling more heavily.
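A very simple form of hotspot detection is to lay a grid over the city and count recent incidents per cell. The sketch below does this with latitude/longitude pairs; the cell size of 0.01 degrees, the function names and the coordinates are all illustrative assumptions, not part of the current system.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_size=0.01):
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def hotspots(incidents, top_n=1, cell_size=0.01):
    """Return the top_n (cell, incident count) pairs, busiest first."""
    counts = Counter(grid_cell(lat, lon, cell_size) for lat, lon in incidents)
    return counts.most_common(top_n)


# Made-up coordinates: three incidents close together, one elsewhere.
toy_incidents = [
    (22.715, 75.855),
    (22.716, 75.857),
    (22.718, 75.852),
    (22.745, 75.895),
]
```

Counting per cell ignores time; a predictive version would weight recent incidents more heavily or feed the per-cell counts into a forecasting model.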
Predicting Who Will Commit a Crime: Using face recognition to predict whether an individual will commit a crime before it happens. The system would detect suspicious changes in behavior or unusual movements, for example an individual walking back and forth in a certain area over and over, which might indicate a pickpocket or someone casing the area for a future crime. It would also track individuals over time.
Pretrial Release and Parole: After being charged with a crime, most individuals are released until they actually stand trial. Deciding who should be released pretrial, or what an individual's bail should be set at, has mainly been done by judges using their best judgment. In just a few minutes, a judge has to attempt to determine whether someone is a flight risk, a serious danger to society, or at risk of harming a witness if released. It is an imperfect system open to bias. Automated risk-assessment systems have been introduced to support these decisions, but a media organization's analysis of one such system indicated that it might indirectly contain a strong racial bias. They found "that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent)." The report raises the question of whether better AI/ML can eventually produce more accurate predictions or whether it would reinforce existing problems. Any system will be based on real-world data, but if that data is generated by biased police officers, it can make the AI/ML biased as well.
The idea behind this project is that crimes are relatively predictable; it just requires being able to
sort through a massive volume of data to find patterns that are useful to law enforcement. This
kind of data analysis was technologically impossible a few decades ago, but the hope is that recent
developments in machine learning are up to the task.
The use of AI and machine learning to detect crime via sound or cameras already exists, has been shown to work, and is expected to continue to expand. The use of AI/ML in predicting crimes or an
individual’s likelihood for committing a crime has promise but is still more of an unknown. The
biggest challenge will probably be “proving” to politicians that it works. When a system is
designed to stop something from happening, it is difficult to prove the negative.
Companies that are directly involved in providing governments with AI tools to monitor areas or
predict crime will likely benefit from a positive feedback loop. Improvements in crime prevention
technology will likely spur increased total spending on this technology.
Appendix-A
Fig A.1-Snapshot 1
Fig A.2-Snapshot 2
Appendix-B
Fig B.1-Snapshot 3
Fig B.2-Snapshot 4
Appendix-C
Fig C.1-Snapshot 5
Fig C.2-Snapshot 6
Report Documentation & Accounting Page
Address (Details):
Computer Department, K. K. Wagh Institute of Engineering Education & Research,
Hirabai Haridas Vidyanagari, Amrutdham, Nashik
Pin – 422 003, M.S. INDIA.
Report Title: <TITLE of PROJECT>
Author Details (Year, Branch, Roll):
Author [with Address, phone, E-mail]:
Address Year: 2019– 2020