Suspicious Activity Detection Using Different Models
https://doi.org/10.22214/ijraset.2023.50729
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Video surveillance systems play a significant role in keeping indoor and outdoor spaces secure. Real-time
applications can use video surveillance components such as behavior recognition, which involves understanding activities and
classifying them as normal or suspicious. Suspicious activities put people at risk, and as criminal activity increases, detecting it
in urban and suburban areas is necessary to minimize such incidents. Early surveillance was carried out manually by human
operators and was fatiguing, since suspicious activities are rare compared to everyday activities. With the advent of intelligent
surveillance systems, various automated surveillance approaches were introduced. This paper analyzes several cases that could
pose a threat to human lives if ignored, namely the detection of gun-related crimes, abandoned luggage, human violence, lock
hammering, wallet theft, and ATM tampering in surveillance video frames. The papers surveyed used neural network models,
namely Faster R-CNN and YOLOv3, to detect these activities.
I. INTRODUCTION
Crimes such as stealing bags, abandoning bags at stations, stabbing with knives, and using guns are on the rise, and video
surveillance systems are a primary means of detecting them. However, these systems have the disadvantage of requiring
continuous human attention, which reduces their efficiency. It is impossible to manually monitor all events on CCTV cameras
today, and even after an event has occurred, manually searching the recorded video wastes a lot of time. Video surveillance can be
automated to solve this problem. Automated video surveillance systems look for abnormal events in video footage and raise
alarms or other indications when predefined abnormal activities occur. The papers surveyed used a semantic-based approach,
which involves defining suspicious activities, background subtraction, object detection, tracking, and classification of suspicious
activities within the framework of a system.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2668
Most researchers use a machine learning approach to detect objects. Training requires a standard, reliable dataset, which is
difficult to obtain, and this makes the machine learning approach less reliable. The hierarchical semantic approach is used in our
system. The focus is on areas such as early detection and recognition of activities. The research paper presents a method for
predicting human activity. Its primary concern is recognizing events early (for instance, a man picking up a gun with his hand).
The authors formulate a probabilistic activity prediction problem and introduce new methodologies to solve it. Spatio-temporal
features are analyzed using an integral histogram. Because the method considers the sequential nature of human activities and
handles noisy data, they named their new recognition methodology dynamic bag-of-words.
IV. METHODOLOGY
A. Input Data
The input is a dataset of images covering three activities: gun detection, knife detection and fight detection. This is raw data
containing images of different shapes and resolutions. We had 55,000 images, but due to limited computing power we were able to
use only 19,000 of them. For the fight dataset, we converted fight videos to frames from the following link:
[https://github.com/seymanurakti/fight-detection-surv-dataset]. The images for gun and knife detection were collected from
different sources on the internet and are available at the link below:
[https://drive.google.com/drive/folders/1bC3BmrRsx-_papIyvUbBvXpnR3Dsxr-s?usp=sharing]
B. Reshaping
The images in our dataset had different resolutions. Hence, we resized all images to a fixed size of 300x200 pixels.
C. Splitting
The dataset is split into three subfolders: training (70%), testing (15%) and validation (15%).
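The 70/15/15 split described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_dataset`, the fixed random seed and the example file names are hypothetical, and the paper does not say whether the split was shuffled or stratified by class.

```python
import random


def split_dataset(items, train=0.70, test=0.15, seed=42):
    """Shuffle items and split them into train, test and validation lists.

    The validation set receives whatever remains after the train and test
    portions are taken (15% for the default fractions).
    """
    shuffled = list(items)
    # A seeded generator keeps the split reproducible (an assumption,
    # not something stated in the paper).
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    val_set = shuffled[n_train + n_test:]
    return train_set, test_set, val_set


# Hypothetical file names standing in for the real image paths.
files = [f"img_{i:05d}.jpg" for i in range(1000)]
train_files, test_files, val_files = split_dataset(files)
print(len(train_files), len(test_files), len(val_files))  # 700 150 150
```

In practice each of the three lists would then be copied into its own subfolder. A per-class (stratified) split may be preferable when the classes are imbalanced.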
D. Labelling
bag : 0
fight : 1
gun : 2
knife : 3
normal : 4
stealing : 5
The input data was converted to labelled input data, from which the model learns to predict the activity.
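As a minimal sketch of this labelling step, the class-to-index mapping listed above can be applied to image paths. The folder layout `training/<class name>/<file>` and the helper names `label_for` and `one_hot` are assumptions made for illustration; the paper does not describe how labels were attached to files.

```python
from pathlib import PurePosixPath

# Class-to-index mapping exactly as listed in this section.
LABELS = {"bag": 0, "fight": 1, "gun": 2, "knife": 3, "normal": 4, "stealing": 5}


def label_for(path):
    """Return the integer label for an image, taken from its parent folder name.

    Assumes a hypothetical layout such as 'training/gun/img_001.jpg'.
    """
    class_name = PurePosixPath(path).parent.name
    return LABELS[class_name]


def one_hot(label, num_classes=len(LABELS)):
    """Encode an integer label as a one-hot list, e.g. for a softmax classifier."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


print(label_for("training/gun/img_001.jpg"))  # 2
print(one_hot(2))                             # [0, 0, 1, 0, 0, 0]
```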
E. Binary Conversion
The images are read from the training, testing and validation folders, converted to color images, and then converted into binary
format. After this step, the binary .npy file is ready to be provided as input to the model.
F. Model
For the classification of these activities we used three models: a simple CNN, ResNet50v2 and VGG19. These models take the
binary images as input, extract features from them and learn from them. Each model reports accuracy, loss, validation accuracy
and validation loss. Based on these, we can judge the performance of the models and find the best model for prediction. These
parameters are explained below:
1) Accuracy: Accuracy is the fraction of correct predictions made by the model on a given set of data.
Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
2) Loss: Loss is a measure of how well the model is able to predict the target variable. It is the error between the predicted value
and the actual value of the target variable. The loss is usually computed using a loss function, such as mean squared error
(MSE) or cross-entropy loss.
Loss = Loss Function(Predicted Value, Actual Value)
For example, the mean squared error (MSE) loss function can be defined as:
MSE = (1 / n) * Σ(y_pred - y_actual)²
where ‘n’ is the number of examples, ‘y_pred’ is the predicted value, and ‘y_actual’ is the actual value.
3) Validation Accuracy: Validation accuracy is the accuracy of the model on a validation set, which is a set of data that is not used
for training but is used to evaluate the model's performance. The validation accuracy is a measure of how well the model is
able to generalize to new, unseen data.
Validation Accuracy = (Number of Correct Predictions on Validation Set) / (Total Number of Predictions on Validation Set)
4) Validation Loss: Validation loss is the loss of the model on a validation set. It is a measure of how well the model is able to
predict the target variable on new, unseen data. The validation loss is usually used to monitor the model's performance during
training and to prevent overfitting.
Validation Loss = Loss Function(Predicted Value on Validation Set, Actual Value on Validation Set)
For example, the binary cross-entropy loss function can be defined as:
Cross-entropy loss = -Σ(y_actual * log(y_pred) + (1 - y_actual) * log(1 - y_pred))
where ‘y_pred’ is the predicted probability and ‘y_actual’ is the actual value (0 or 1), and the sum is taken over all examples in the
validation set.
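The formulas above can be checked with a small worked sketch in plain Python. The function names and the clipping constant `eps` are assumptions added for illustration. The cross-entropy function follows the binary form written above, summed over examples; a six-class classifier would more typically use the categorical form.

```python
import math


def accuracy(y_actual, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(y_actual, y_pred) if a == p)
    return correct / len(y_actual)


def mse(y_actual, y_pred):
    """Mean squared error: (1 / n) * sum((y_pred - y_actual)^2)."""
    return sum((p - a) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)


def binary_cross_entropy(y_actual, y_pred, eps=1e-12):
    """Binary cross-entropy summed over all examples, as in the formula above.

    Predictions are clipped to [eps, 1 - eps] so that log(0) never occurs
    (the clipping is an assumption, not part of the formula).
    """
    total = 0.0
    for a, p in zip(y_actual, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += a * math.log(p) + (1 - a) * math.log(1 - p)
    return -total


print(accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
print(mse([1.0, 2.0], [1.0, 4.0]))           # 2.0
print(binary_cross_entropy([1], [0.5]))      # log(2), about 0.693
```

Validation accuracy and validation loss use the same functions, evaluated on the validation set instead of the training set.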
G. Prediction
The model predicts the activity in the image along with an accuracy label. For example, the following image is detected as a fight
with 99.84% accuracy.
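One common way to obtain such a label and percentage is to pass the model's raw class scores through a softmax and report the highest-scoring class. The sketch below assumes this setup; the paper does not state how the percentage was computed, and the raw scores shown are made up for illustration.

```python
import math

# Class names in the index order given in the Labelling subsection.
CLASS_NAMES = ["bag", "fight", "gun", "knife", "normal", "stealing"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most likely class name and its probability."""
    probs = softmax(scores)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[idx], probs[idx]


# Hypothetical raw scores for one image (not taken from the paper).
label, score = predict([0.1, 7.2, 0.3, 0.2, 0.5, 0.1])
print(f"{label}: {score:.2%}")
```

Note that a percentage produced this way is the model's confidence in a single prediction, which is a different quantity from the accuracy measured over a dataset.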
V. SIMULATION
From the confusion matrix, we can calculate various metrics such as accuracy, precision, recall, and F1-score, which provide
insight into the performance of the classification model. Hence, we first take a look at the confusion matrices of these models.
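To make the relationship between a confusion matrix and these metrics concrete, the following sketch builds a matrix from label lists and derives accuracy, precision, recall and F1-score. The function names and the small three-class example are hypothetical and are not the paper's results.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where cm[t][p] counts samples of true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm, k):
    """Return (precision, recall, F1-score) for class k."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(len(cm))) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                              # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def overall_accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical labels for six samples and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_metrics(cm, 1))
print(overall_accuracy(cm))
```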
1) Simple CNN:
2) Simple CNN:
Figure 5: Model accuracy and loss graph for simple CNN model
3) ResNet50v2:
4) VGG19:
Figure 11: Model accuracy and loss graph for VGG19 model
VI. RESULTS
1) Gun Detection:
The VGG19 model is used because it has the highest accuracy. It detects guns accurately, as can be seen in the above images,
which are the model's output with an accuracy percentage label.
2) Knife Detection:
The simple CNN will also provide the same results, since both models (simple CNN and VGG19) have the same accuracy for knife
detection.
3) Fight Detection:
VIII. CONCLUSION
In this work, we performed gun detection, knife detection and fight detection. We used three models: a simple CNN,
ResNet50v2 and VGG19. Of these three, the VGG19 model achieved the highest accuracy, 95.4827. Despite this high validation
accuracy, the model sometimes misclassifies activities in unseen image data. This issue is known as overfitting, which commonly
occurs in machine learning models. It could be mitigated by increasing the training data, given more computing power. Also, we
were not able to perform live detection of these activities because we did not have an annotated dataset, so the system works only
for classification.
REFERENCES
[1] U. M. Kamthe and C. G. Patil, "Suspicious Activity Recognition in Video Surveillance System," 2018 Fourth International Conference on Computing
Communication Control and Automation (ICCUBEA), 2018, pp. 1-6, doi: 10.1109/ICCUBEA.2018.8697408.
[2] S. Loganathan, G. Kariyawasam and P. Sumathipala, "Suspicious Activity Detection in Surveillance Footage," 2019 International Conference on Electrical
and Computing Technologies and Applications (ICECTA), 2019, pp. 1-4, doi: 10.1109/ICECTA48151.2019.8959600.
[3] N. Bordoloi, A. K. Talukdar and K. K. Sarma, "Suspicious Activity Detection from Videos using YOLOv3," 2020 IEEE 17th India Council International
Conference (INDICON), 2020, pp. 1-5, doi: 10.1109/INDICON49873.2020.9342230.
[4] Aditya G, Prasham S, Sumon G, Vaibhav S, Dr. Vaqar A, "Suspicious Activity Detection", Volume: 10, Issue: XI, Month of publication: November 2022, pp:
113-116 (ISSN no. 2321-9653, IC Value: 45.98, Impact Factor: 7.538), UGC Approved, doi: https://doi.org/10.22214/ijraset.2022.47186