Criminova Crime Forecast
Abhijith A V (PEC20CS002)
Anziya A S (PEC20CS011)
Don Sabu (PEC20CS019)
Nithin Kumar S (PEC20CS030)
Under the guidance of
Mrs. Liji Sara Varghese
Certificate
This is to certify that the Project Phase - II report entitled “CRIMINOVA: Crime Forecast”,
submitted by Abhijith A V, Anziya A S, Don Sabu, and Nithin Kumar S during the year
2023-2024 in partial fulfilment of the requirements for the award of the B-Tech Degree in
Computer Science and Engineering of APJ Abdul Kalam Technological University, is a
bonafide record of the project work carried out by them under our guidance and supervision.
This report has not been submitted in any form to any other University or Institute for any
purpose.
Place: CE Pathanapuram
Date: May-2024
Abhijith A V (PEC20CS002)
Anziya A S (PEC20CS011)
Don Sabu (PEC20CS019)
Nithin Kumar S (PEC20CS030)
Acknowledgement
Firstly, we would like to thank the Almighty, by whose grace we were able to complete our
project within the given time.
We also express our sincere gratitude to Mr. Prasanth R, Head of the Computer Science
and Engineering Department, for providing all support and cooperation.
We are very thankful to Mrs. Jooby E and Mrs. Prameela S, our Project coordinators,
for giving us moral support and cooperation. It is our utmost pleasure to convey
our sincere gratitude to Mrs. Liji Sara Varghese, our guide, for providing valuable
suggestions and support.
Finally, we thank our families and friends, who contributed to the successful fulfilment of
this project work.
Abhijith A V (PEC20CS002)
Anziya A S (PEC20CS011)
Don Sabu (PEC20CS019)
Nithin Kumar S (PEC20CS030)
Abstract
Criminal activity is one of the major problems in our society. With such activities rising
globally every day, crime investigation agencies find it difficult to manage and investigate
incidents, either because of a shortage of police personnel or because criminals outpace the
investigation process. The traditional investigation process takes police departments a long
time to build criminal profiles, to anticipate the location of the next crime, or to recognize
crime patterns. There is therefore a need to analyze historical crime patterns more effectively
and in less time, and to predict the location and type of future crimes. Police departments
need a systematic way to analyze criminal profiles easily and to find the criminals who may be
associated with a given crime. An advanced analytics system is also required to track other
information sources, such as traffic sensors, calls, videos, and police service calls, for
monitoring criminal activity. In this project, we discuss how machine learning approaches can
be used to prevent and deal with such cases.
Contents
Acknowledgement
Abstract
List of Figures
1 Introduction
1.1 Background
1.2 Objectives
1.3 Purpose and Scope
1.4 Problem Definition
1.5 Outline of the Report
2 Literature Review
2.1 Spatio-Temporal crime hotspot detection and prediction
2.2 An empirical analysis of machine learning algorithms for crime prediction using Stacked Generalization: An Ensemble Approach
2.3 Smart policing technique with crime type and risk score prediction based on machine learning for early awareness of risk situation
2.4 A study on predicting crime rates through machine learning and data mining
2.5 Criminal behavior analysis based on machine learning techniques
2.6 Novel Multi-Module approach to predict crime
2.7 Multimodal deep learning crime prediction using Tweets
3 Requirement Analysis
3.1 Functional Requirements
3.2 Non Functional Requirements
3.3 Hardware Requirements
3.4 Software Requirements
3.5 Project Schedule
4 Methodology
4.1 Proposed System
4.2 System Architecture
4.3 Module Description
4.3.1 Data Processing Module
4.3.2 Machine Learning Module
4.3.3 Web Application Module
4.4 Algorithm
5 Implementation
References
List of Figures
6.1 Data
6.2 Heatmap
6.3 Distribution
6.4 Output 1
6.5 Output 2
6.6 Output 3
Chapter 1
Introduction
1.1 Background
Crime remains a significant challenge faced by societies worldwide. From property crimes
to violent offenses, crime rates continue to pose a threat to public safety and well-being. Law
enforcement agencies, despite their dedication, often struggle with limited resources and a
reactive approach focused on solving crimes after they occur. Predicting criminal activity can
be difficult, making it challenging to prevent crimes before they happen. Predictive policing,
utilizing data-driven technologies and predictive analytics, offers a proactive solution. By
analyzing past crime data, identifying patterns, and forecasting future criminal activity, law
enforcement can optimize resource allocation and reduce crime rates. However, challenges
such as data privacy, algorithmic bias, and ethical implications must be addressed. Despite
these challenges, the potential benefits of predictive policing, including improved crime
prevention and enhanced public safety, make it a promising tool in modern law enforcement.
1.2 Objectives
This project aims to enhance public safety through crime prediction. Firstly, we will develop
advanced predictive models that leverage data on past crimes, geographical factors, and socio-
economic indicators. By analyzing these combined datasets, the models will learn to identify
patterns and trends associated with criminal activity. Secondly, these models will be used to
assist law enforcement in allocating their resources more effectively. By pinpointing areas
with a higher likelihood of crime occurrence, police can proactively deploy their personnel to
high-risk zones, deterring potential crimes before they happen. Ultimately, this system aims to
contribute to a significant reduction in crime by anticipating potential occurrences and enabling
preventative measures to be taken by law enforcement.
• Develop predictive models using crime data, geography, and socioeconomics to forecast
criminal activity accurately.
• Aid law enforcement in proactive resource allocation for public safety measures.
Chapter 2
Literature Review
2.3 Smart policing technique with crime type and risk score prediction based on machine learning for early awareness of risk situation
This paper presents a novel machine learning-based technique for predicting crime type and
risk level, crucial for prompt and efficient law enforcement responses. Leveraging text-based
criminal case summaries, the system utilizes the KICS data format, containing comprehensive
policing data. With consideration for 21 representative crime types, the system predicts the
specific type of crime for each case. Additionally, a formula is developed to calculate crime risk
level, considering severity and damage. DNN and CNN-based prediction models are designed
for both crime type and risk score prediction. Evaluation demonstrates superior performance
compared to traditional algorithms, with CNN-based models outperforming SVM and naïve
Bayes by 7% and 8% respectively in crime type prediction. The developed technology,
implemented as a user-friendly software platform, empowers field personnel like police officers
to swiftly identify crime types and risk levels upon receiving new cases, enhancing operational
efficiency.
underscoring the necessity of predictive patterns in crime identification and sets out to survey
ML approaches applied to criminal profiling. The literature review section summarizes
various studies, including clustering techniques, blockchain technology for surveillance, and
analytic algorithms for crime prediction. Additionally, it delves into the application of ML
in criminal activities, particularly in financial institutions for fraud detection and money
laundering prevention. The section stresses the importance of scenario analysis in evaluating
the risks associated with AI-driven crime-fighting tools. The conclusion underscores ethical
considerations and risk assessments in deploying AI for crime prevention, advocating for
responsible use and suggesting avenues for future research. Overall, the paper offers valuable
insights for researchers, law enforcement agencies, and policymakers seeking to harness ML
for enhancing crime prediction and prevention efforts.
enhancing the predictive capabilities of the model. By applying data fusion to a ConvBiLSTM
model, independent vectors from both tweet and crime modalities are combined into a unified
representation. The study conducted experiments using datasets from the Chicago police
department and crime-related tweets specific to Chicago. Performance evaluation against
various crime prediction models, including traditional deep-learning and BERT-based models,
demonstrated the superiority of the proposed ConvBiLSTM model with multimodal data
fusion, achieving an accuracy of 97.75%. This approach showcases promising results in
enhancing crime prediction accuracy by incorporating social media sentiment analysis into
predictive models.
Chapter 3
Requirement Analysis
• Result Presentation: Display predicted crime type and nature on the web page.
The application should be designed to operate efficiently across various operating systems to
accommodate a wide range of users. Compatibility with major OS platforms such as Windows,
macOS, and various distributions of Linux, including Ubuntu, CentOS, and Fedora, is crucial.
By ensuring compatibility with multiple operating environments, the application can reach a
larger user base and provide a consistent user experience across different platforms.
Python serves as the primary programming language for the project due to its versatility,
ease of use, and extensive ecosystem of libraries and frameworks. Leveraging the latest
stable version of Python 3.x ensures access to the latest language features, performance
improvements, and security updates. Additionally, Python’s strong support for scientific
computing and machine learning makes it an ideal choice for implementing machine learning
algorithms and processing large datasets efficiently.
Pandas is essential for handling structured data within the application, particularly for tasks
such as data cleaning, transformation, and analysis. Its powerful data manipulation capabilities,
including support for data alignment, indexing, and aggregation, streamline the processing
of crime datasets. Integrating the latest version of Pandas ensures compatibility with
new features, bug fixes, and performance improvements, enhancing the application’s data
processing capabilities.
NumPy plays a critical role in scientific computing tasks, providing support for
multidimensional arrays, mathematical functions, and linear algebra operations. Within the
project, NumPy facilitates numerical computations, data preprocessing, and statistical analysis,
enabling efficient manipulation and processing of numerical data. Leveraging the latest
version of NumPy ensures access to new features, optimizations, and bug fixes, improving
the performance and reliability of numerical computations.
Web Browser
Users interact with the application through modern web browsers such as Google Chrome,
Mozilla Firefox, Apple Safari, or Microsoft Edge. Ensuring compatibility with a variety of
web browsers enhances the accessibility and usability of the application, allowing users to
access it seamlessly from their preferred browser on desktop or mobile devices. Compatibility
testing across different browsers helps identify and address any compatibility issues, ensuring
a consistent user experience across platforms.
Developers use integrated development environments (IDEs) or text editors for writing,
editing, and debugging code during the development process. Popular IDEs such as PyCharm,
Jupyter Notebook, Visual Studio Code, Sublime Text, and Atom provide features such as
syntax highlighting, code completion, and version control integration, improving developer
productivity and code quality. Choosing the right IDE or text editor based on individual
preferences and project requirements helps streamline the development workflow and ensure
code consistency and quality.
Flask Framework
The system shall employ Flask, a lightweight and flexible web framework, for developing
and deploying the web application. Flask shall provide routing, request handling, template
rendering, and other essential functionalities to facilitate the development of user-friendly web
interfaces.
Geocoding Service
The system shall integrate with a geocoding service, such as Nominatim (provided by
OpenStreetMap), to convert user-provided addresses into geographical coordinates (latitude
and longitude). API requests to the geocoding service shall be efficiently managed to minimize
latency and ensure timely response to user input.
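The address-to-coordinate requirement can be sketched as follows. This is a minimal illustration, assuming the geopy library is used as the Nominatim client; the helper names `make_nominatim_geocoder` and `address_to_coords`, the user agent string, and the one-second delay are hypothetical choices and are not taken from the project code. The lookup function is passed in as a parameter so that the conversion logic can be exercised without network access.

```python
from typing import Callable, Optional, Tuple


def make_nominatim_geocoder(user_agent: str = "criminova-demo") -> Callable:
    """Build a rate-limited Nominatim lookup (needs geopy and network access)."""
    # Imported lazily so the rest of this sketch runs without geopy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Space requests out so the public service is not flooded (assumed delay).
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def address_to_coords(address: str, geocode: Callable) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if it cannot be resolved."""
    address = address.strip()
    if not address:
        return None
    location = geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)
```

With geopy installed, calling `address_to_coords("Pathanapuram, Kerala", make_nominatim_geocoder())` would query the live service; returning None for unresolved input lets the web layer show an error instead of passing bad coordinates to the model.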
Chapter 4
Methodology
• User Interface (UI): This component enables users to interact with the system by
providing input, such as location and time data, and receiving crime predictions. The
UI may take the form of a web application, mobile app, or other user-friendly interface.
• Machine Learning Kernel: At the heart of the system lies the machine learning kernel,
built on a model such as a Random Forest classifier, which is responsible for crime
prediction. This component processes user input, extracts relevant features, queries the
database for historical crime data, and applies the trained machine learning model to
make predictions.
• Database: The database stores historical crime data used for training the machine
learning model and making predictions. It contains detailed information about past
crime incidents, including timestamps, locations, crime types, and other relevant factors.
The database provides the necessary data for the machine learning kernel to analyze and
derive insights.
• Result Presentation: Once the machine learning model has processed the user input
and made predictions, the results are presented back to the user through the UI. This
component displays the predicted crime likelihoods, associated crime types, severity
levels, or overall risk assessments based on the specified location and time.
Within this module, a suite of libraries (Pandas, NumPy, Matplotlib, and Seaborn) comes
into play. Pandas serves as the cornerstone for data manipulation and analysis,
offering robust tools for handling structured data. NumPy complements Pandas with its efficient
numerical operations, while Matplotlib and Seaborn provide versatile options for visualizing
data in various formats. Together, these libraries form a cohesive framework for loading,
cleaning, preprocessing, and visualizing crime data, ensuring its readiness for subsequent
analysis.
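The loading and cleaning stage described above might look like the sketch below. It is an illustration only: the column names (`timestamp`, `latitude`, `longitude`, `crime_type`), the sample rows, and the function name `load_and_clean` are assumptions, since the report does not list the dataset schema.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the project's crime CSV; column names are assumed.
RAW_CSV = """timestamp,latitude,longitude,crime_type
2023-01-05 22:15:00,9.0920,76.8610,theft
2023-01-05 22:15:00,9.0920,76.8610,theft
not-a-date,9.1000,76.8700,robbery
2023-02-11 03:40:00,,76.8800,assault
2023-03-02 14:05:00,9.0850,76.8550,Theft
"""


def load_and_clean(source) -> pd.DataFrame:
    """Load crime records and drop rows that cannot be used for training."""
    df = pd.read_csv(source)
    # Unparseable timestamps become NaT so they are dropped with other missing values.
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    # Normalize label spelling so 'Theft' and 'theft' count as one class.
    df["crime_type"] = df["crime_type"].str.strip().str.lower()
    df = df.dropna(subset=["timestamp", "latitude", "longitude", "crime_type"])
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
```

In this sample the duplicate row, the row with an invalid timestamp, and the row with a missing latitude are all discarded before any feature extraction takes place.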
At the heart of the crime prediction system lies the machine learning module, empowered
by libraries like Scikit-learn and Joblib. Scikit-learn stands out as a comprehensive toolkit for
machine learning tasks, offering a wide array of algorithms and utilities for model training,
evaluation, and prediction. Meanwhile, Joblib plays a crucial role in the persistence of
machine learning models, enabling seamless saving and loading of trained models. Within
this module, sophisticated algorithms such as the Random Forest Classifier are employed to
construct predictive models based on historical crime data, leveraging patterns and correlations
to anticipate future crime occurrences.
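Model persistence with Joblib can be illustrated as below. This is a sketch under stated assumptions: the two-column feature layout and the toy labels are invented for the example, and a temporary directory is used so that nothing is written next to the script. Only the file name 'rf_model.pkl' comes from the report.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative training set: [hour, day_of_week] -> crime type (values are made up).
X = [[22, 5], [23, 6], [2, 6], [14, 1], [15, 2], [13, 3]]
y = ["theft", "theft", "theft", "fraud", "fraud", "fraud"]

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

# Save the fitted model, then load it back as a web application would at startup.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rf_model.pkl")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original model's predictions.
same = list(model.predict(X)) == list(restored.predict(X))
print("Predictions match after reload:", same)
```

Separating training from serving in this way means the web application only needs the saved file and does not have to retrain on every start.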
The user interface of the crime prediction system is powered by the Flask web framework,
which forms the cornerstone of the web application module. Flask’s lightweight and flexible
architecture makes it an ideal choice for developing interactive web applications. It handles
user requests, routing them to appropriate functions, and rendering HTML templates to
present prediction results in a visually appealing manner. Additionally, Geopy, a geocoding
library, may be incorporated to process location data input by users, translating addresses into
geographic coordinates for precise spatial analysis.
4.4 Algorithm
Here’s a step-by-step algorithm for crime prediction using the Random Forest algorithm:
Step 1. Input: Historical crime dataset containing features such as timestamps, locations,
and crime types.
Step 2. Data Preprocessing:
- Clean the historical crime data to remove missing values, outliers, and inconsistencies.
- Extract relevant features from the dataset, such as year, month, day, hour, and day of the
week from the timestamps.
- Encode categorical features like crime types using techniques such as one-hot encoding.
- Split the dataset into training and testing sets for model evaluation.
Step 3. Model Training:
- Initialize a Random Forest classifier with hyperparameters like the number of trees
(n_estimators), maximum depth, and minimum samples per leaf.
- Train the Random Forest model on the preprocessed training data, where features are the
extracted attributes, and labels are the crime types.
Step 4. Feature Importance:
- Assess the importance of features in the trained Random Forest model to understand which
factors contribute most to crime occurrences.
- Features with higher importance scores indicate a stronger influence on crime prediction.
Step 5. Prediction:
- Preprocess user-provided input, including location and timestamp, similar to the training
data preprocessing steps.
- Extract relevant features from the input data and encode categorical features as needed.
- Use the trained Random Forest model to predict the likelihood of different crime types
occurring at the specified location and time.
- The prediction is based on the majority vote or probability distribution of the individual
decision trees in the Random Forest ensemble.
Step 6. Output:
- Display the predicted crime likelihoods for various crime types to the user, providing
insights into potential risks associated with the specified location and time.
- Visualize the predictions using charts or maps to enhance user understanding and
decision-making.
This algorithm enables users to input location and timestamp data and receive predictions
about potential crime occurrences, helping them make informed decisions to enhance personal
safety and security.
Chapter 5
Implementation
The implementation of the Crime prediction system involves several steps and functions to
perform various tasks.
Importing Libraries: The code starts by importing necessary libraries such as pandas,
numpy, matplotlib, Flask, joblib, and RandomForestClassifier from sklearn. These libraries
are used for data manipulation, model training, web application development, and machine
learning tasks.
Loading Data: The code loads a CSV file named ’data.csv’ into a pandas DataFrame using
the pd.read_csv() function. The loaded data is stored in the variable data.
Feature Engineering: Additional features like year, month, day, etc., are extracted from
the timestamp column using the ’dt’ accessor provided by pandas. These features are stored
in a new DataFrame called dat and then concatenated with the original DataFrame data using
’pd.concat()’.
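A small sketch of this feature engineering step is shown below. The sample rows and the exact set of derived columns are assumptions made for illustration; the variable names `data` and `dat` follow the description above.

```python
import pandas as pd

# Hypothetical records; the report only states that a timestamp column exists.
data = pd.DataFrame({
    "timestamp": ["2023-01-05 22:15:00", "2023-02-11 03:40:00", "2023-03-02 14:05:00"],
    "crime_type": ["theft", "assault", "fraud"],
})
data["timestamp"] = pd.to_datetime(data["timestamp"])

# Derive calendar features with the .dt accessor.
dat = pd.DataFrame({
    "year": data["timestamp"].dt.year,
    "month": data["timestamp"].dt.month,
    "day": data["timestamp"].dt.day,
    "hour": data["timestamp"].dt.hour,
    "dayofweek": data["timestamp"].dt.dayofweek,  # Monday = 0
})

# Join the new columns side by side with the original frame.
data = pd.concat([data, dat], axis=1)
print(data)
```

Because `dat` is built from `data`, both frames share the same index, so `pd.concat(..., axis=1)` lines the rows up correctly.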
Model Training: The data is split into training and testing sets using the train_test_split()
function from sklearn. Then, a RandomForestClassifier model is instantiated with 100
estimators and trained on the training data using the fit() method.
Model Evaluation: The trained model is used to make predictions on the testing data
using the predict() method. The accuracy of the model is evaluated using the accuracy_score()
function and the classification_report() function from sklearn.metrics.
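The training and evaluation steps can be illustrated as follows. This is a sketch on synthetic data: the two features, the labelling rule, the 80/20 split, and the random seeds are assumptions, while the 100 estimators and the variable name `rfc` follow the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic features [hour, day_of_week]; labels follow a made-up rule for illustration.
X = np.column_stack([rng.integers(0, 24, 400), rng.integers(0, 7, 400)])
y = np.where(X[:, 0] >= 18, "theft", "fraud")

# Hold out 20% of the rows so accuracy is measured on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)

y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
# Per-class precision, recall, and F1 give more detail than accuracy alone.
print(classification_report(y_test, y_pred))
```

On real crime data the classes are usually imbalanced, so the per-class figures in the classification report are often more informative than the single accuracy value.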
Setting up Flask App: The code initializes a Flask application instance by creating an
object of the Flask class with the name app.
Loading the Trained Model: The code loads the trained Random Forest model (’rf_model.pkl’)
using the joblib.load() function and assigns it to the variable rfc.
Defining Routes: The code defines several routes using the @app.route() decorator. Each
route corresponds to a different URL path and HTTP method.
Rendering HTML Templates: The routes return HTML templates using the
render_template() function. The templates are located in the ’templates’ directory of the Flask
application.
Processing Form Data: When the user submits a form with timestamp data, the
/result.html route receives the form data using the POST method. The predict() function
preprocesses the data, makes predictions using the trained model, and renders the result on
the web page.
Running the Flask App: The code checks if the script is run directly (__name__ ==
’__main__’) and starts the Flask application with debug mode enabled. This Flask code sets
up routes for different pages of the web application, loads a trained machine learning model,
processes form data, and renders HTML templates to display the results on the web page.
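A compact sketch of such a Flask application is given below. Several parts are assumptions made so that the example runs on its own: `StubModel` replaces the model that the project loads with joblib.load(), the templates are inline strings rendered with render_template_string() instead of files in the ’templates’ directory, and the form field name, timestamp format, and feature order are hypothetical. The usual `app.run(debug=True)` call under the `__name__ == '__main__'` guard is left out so the sketch does not start a blocking server.

```python
from datetime import datetime

from flask import Flask, render_template_string, request

app = Flask(__name__)


class StubModel:
    """Placeholder for the trained classifier loaded from 'rf_model.pkl'."""

    def predict(self, rows):
        # Made-up rule (late hour means theft) so the sketch runs without a model file.
        return ["theft" if row[3] >= 20 else "fraud" for row in rows]


rfc = StubModel()

# Inline templates keep the sketch in one file.
FORM_PAGE = """
<form action="/result.html" method="post">
  <input name="timestamp" placeholder="YYYY-MM-DD HH:MM">
  <button type="submit">Predict</button>
</form>
"""
RESULT_PAGE = "<p>Predicted crime type: {{ prediction }}</p>"


@app.route("/")
def index():
    return render_template_string(FORM_PAGE)


@app.route("/result.html", methods=["POST"])
def predict():
    # Parse the submitted timestamp and rebuild the features used in training.
    try:
        ts = datetime.strptime(request.form.get("timestamp", ""), "%Y-%m-%d %H:%M")
    except ValueError:
        return "Invalid timestamp; expected YYYY-MM-DD HH:MM", 400
    features = [[ts.year, ts.month, ts.day, ts.hour, ts.weekday()]]
    prediction = rfc.predict(features)[0]
    return render_template_string(RESULT_PAGE, prediction=prediction)


# Exercise the route without starting a server, using Flask's built-in test client.
client = app.test_client()
response = client.post("/result.html", data={"timestamp": "2024-05-10 23:00"})
print(response.status_code, response.get_data(as_text=True))
```

Rejecting malformed timestamps with a 400 response keeps invalid input away from the model. Debug mode is convenient during development but should be turned off for any public deployment.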
Chapter 6
Result and Discussions
Figures 6.1 and 6.2 depict the processed crime dataset and its corresponding heatmap,
showcasing the distribution of crime occurrences across different locations and timestamps. The
heatmap provides a visual representation of crime hotspots and trends, aiding in understanding
patterns and identifying areas of high crime density.
Figure 6.3 displays a bar graph illustrating the distribution of crime incidents by month
and day of the week. This visualization helps in identifying any temporal patterns or trends in
crime occurrences, such as seasonal variations or specific days with higher crime rates.
Figures 6.4, 6.5, and 6.6 showcase the user interface of the crime prediction system. Users
are prompted to input their location (address) and the desired date for which they seek crime
predictions. Upon submission, the system processes the input data, applies the trained
machine learning model, and provides predictions on potential crime occurrences at
the specified location and date.
Chapter 7
Conclusion and Future Scope
7.1 Conclusion
In conclusion, the development and implementation of the crime prediction system mark a
significant step forward in leveraging machine learning technology to enhance public safety
and law enforcement efforts. Through the analysis of historical crime data and the utilization
of advanced predictive algorithms, the system provides valuable insights into potential crime
occurrences, enabling proactive measures to be taken to mitigate risks and allocate resources
effectively. The system’s ability to identify crime patterns, hotspots, and temporal trends
empowers both individuals and law enforcement agencies to make informed decisions and
take timely actions to address security concerns. By leveraging data-driven approaches, such
as heatmap visualizations and predictive modeling, the system contributes to a more proactive
and strategic approach to crime prevention and intervention. Furthermore, the user-friendly
interface facilitates seamless interaction with the system, allowing users to input their location
and desired date to receive personalized crime predictions. This empowers individuals to take
proactive measures to safeguard themselves and their communities, fostering a sense of security
and confidence in the effectiveness of the system.
References
[1] Mandalapu, Varun, et al. “Crime prediction using machine learning and deep learning: A
systematic review and future directions.” IEEE Access (2023).
[2] Baek, Myung-Sun, et al. “Smart policing technique with crime type and risk score
prediction based on machine learning for early awareness of risk situation.” IEEE Access
9 (2021): 131906-131915.
[3] Kshatri, Sapna Singh, et al. “An empirical analysis of machine learning algorithms for
crime prediction using stacked generalization: an ensemble approach.” IEEE Access 9
(2021): 67488-67500.
[4] Travaini, Guido Vittorio, et al. “Machine learning and criminal justice: A systematic
review of advanced methodology for recidivism risk prediction.” International Journal of
Environmental Research and Public Health 19.17 (2022): 10594.
[5] Tam, Sakirin, and Ömer Özgür Tanrıöver. “Multimodal Deep Learning Crime Prediction
Using Crime and Tweets.” IEEE Access (2023).
[6] Kwan-Loo, Kevin B., et al. “Detection of violent behavior using neural networks and pose
estimation.” IEEE Access 10 (2022): 86339-86352.
[7] Tasnim, Nowshin, Iftekher Toufique Imam, and M. M. A. Hashem. “A novel multi-module
approach to predict crime based on multivariate spatio-temporal data using
attention and sequential fusion model.” IEEE Access 10 (2022): 48009-48030.
[8] Chen, Fan, et al. “Wifi Log-Based Student Behavior Analysis and Visualization
System.” The International Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences 43 (2022): 493-499.
[9] Lin, Chih-Yang, et al. “Invisible adversarial attacks on deep learning-based face
recognition models.” IEEE Access (2023).
[10] Abdelfattah, Mazen, et al. “Towards universal physical attacks on cascaded camera-lidar
3d object detection models.” 2021 IEEE International Conference on Image Processing
(ICIP). IEEE, 2021.