PIMPRI CHINCHWAD EDUCATION TRUST’S
PIMPRI CHINCHWAD COLLEGE OF
ENGINEERING
SECTOR NO. 26, PRADHIKARAN, NIGDI, PUNE- 411044.
Query Retrieval for Object Detection in
Video using OpenCV & Jetson Xavier
Development Board
Guided By: Prof. S. P. Jagtap
Group Members
Utkarsha Chinchore - BEETB254
Ameya Nerkar - BEETB217
Vrunda Khare - BEETA159
Content
❏ Introduction
❏ Literature Survey
❏ Methodology
❏ Implementation
❏ Expected Conclusion
❏ References
Introduction
Video surveillance is an important part of security systems in every
domain. CCTV footage is often treated as stronger evidence than
eyewitness testimony, so improving this technology is a necessity for
security applications.

This project builds a model that automatically retrieves a query from
video captured by a camera. The user provides the query in the form of
an image. The model was created using MATLAB, Python 3.8.10, the OpenCV
libraries, YOLO (You Only Look Once), and TensorFlow. Two pieces of
hardware are needed: an Intel RealSense Depth Camera for recording
high-quality video and an Nvidia Jetson Xavier Development Board for
performing the computations.

A user-friendly GUI is developed to give the user easy access to the
model.

This model can be deployed in
★ security
★ healthcare
★ transport
★ defence
★ automation domains
Problem Statement
To retrieve the object queried by
the user from a video database
by training a model, with the
help of OpenCV, that analyses
the video and detects the
queried object.
Motivation & Need of Project
01 Healthcare : Historically, healthcare institutions have invested large amounts of money in
video surveillance solutions to ensure the safety of their patients, staff, and visitors, at levels
that are often regulated by strict legislation. Theft, infant abduction, and drug diversion are
some of the most common problems addressed by surveillance systems.
02 Smart cities / Transportation : Video analytics has proven to be a tremendous
help in the area of transport, aiding in the development of smart cities.
03 Retail : Brick and mortar retailers can use video analytics to understand who their
customers are and how they behave.
04 Security : Facial and license plate recognition (LPR) techniques can be used to identify people
and vehicles in real-time and make appropriate decisions. For instance, it’s possible to search for
a suspect both in real-time and in stored video footage, or to recognize authorized personnel
and grant access to a secured facility.
Objective &
Scope of Project
1. To train the model according to the sample database.
2. To process the query.
3. To retrieve the query.
4. To implement the model on the Xavier board.
5. To create a GUI for the end user.
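Objectives 2 and 3 can be illustrated with a minimal sketch. It assumes the detector has already produced labelled detections for each frame and that the query image has been reduced to a class label such as "car". The `Detection` record, the `retrieve` function, and the 0.5 confidence threshold are hypothetical choices made for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object in one video frame (hypothetical record layout)."""
    frame_index: int
    label: str
    confidence: float


def retrieve(detections: List[Detection], query_label: str, fps: float,
             min_confidence: float = 0.5) -> List[Tuple[int, float]]:
    """Return (frame_index, timestamp_seconds) for frames containing the query."""
    hits = {}
    for det in detections:
        if det.label == query_label and det.confidence >= min_confidence:
            # Keep one entry per frame even if the object appears several times.
            hits[det.frame_index] = det.frame_index / fps
    return sorted(hits.items())


if __name__ == "__main__":
    sample = [
        Detection(0, "person", 0.91),
        Detection(30, "car", 0.88),
        Detection(30, "car", 0.64),
        Detection(60, "car", 0.32),  # below the threshold, so it is ignored
        Detection(90, "car", 0.77),
    ]
    for frame, seconds in retrieve(sample, "car", fps=30.0):
        print(f"car found in frame {frame} at {seconds:.1f} s")
```

Returning timestamps alongside frame indices lets the end user jump straight to the relevant moment in the footage.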
Literature Survey
S.No | Title of the Paper | Year of Publication | Publisher
1 | Multi-actor activity detection by modeling object relationships in extended videos based | June 2022 | Elsevier
2 | Secure video retrieval using image query on an untrusted cloud | October 2020 | Elsevier
3 | Surveillance Video Retrieval using Effective Matching Techniques | 2018 | IEEE
4 | Video analytics using deep learning for crowd analysis: a review | 2019 | Springer, Volume 1
5 | VSS: A Storage System For Video Analytics | 2019 | Cornell University
Summary of Literature survey
1) The papers surveyed above cover video analytics across a wide range of
methods and applications.
2) Current systems have begun to look into automatic event detection. These
are often point solutions for detecting license plate numbers, abandoned
objects, or motion in restricted locations. However, context-based
interpretation of events in a monitored space is still in its infancy.
3) Open challenges include: using knowledge of time and deployment
conditions to improve video analysis, using geometric models of the
environment and other object and activity models to interpret events, and
using learning techniques to improve system performance and detect unusual
events.
4) Work on indexing broadcast video has focused on tasks such as shot
boundary detection, story segmentation, and high-level semantic concept
extraction.
5) The latter is based on classifying video, audio, and text into a small
(10-20) but increasing number of semantically interesting categories such as
outdoor, people, building, road, vegetation, and vehicle.
6) For broadcast video, the goal is to find a high-level indexing scheme to
facilitate retrieval. The objectives are very different for surveillance
video, where the primary interest is to learn higher-level behavior
patterns.
7) In both broadcast and surveillance video, there is a semantic gap
between the feasible low-level feature set and the high-level semantics or
ontology desired by system users.
Methodology : Block Diagram
Hardware
Jetson Xavier NX Development Board
The NVIDIA® Jetson Xavier NX™ Developer Kit includes a power-efficient,
compact Jetson Xavier NX module for AI edge devices. For intelligent machine
OEMs, start-ups and AI application developers who want to create
breakthrough products, the Jetson Xavier NX Developer Kit delivers the
capability to develop and test power-efficient, small form factor solutions with
accurate, multi-modal AI inference.
Intel RealSense Depth Camera
The stereo Intel RealSense depth camera D435 provides high-quality depth for a range
of applications. Its wide field of view is ideal for applications such as robotics or
augmented and virtual reality, where seeing as much of the scene as possible is crucial.
The camera has a compact form factor, can be easily integrated into any system, and has
a range of up to 10 meters. It is supported by the cross-platform Intel RealSense
SDK 2.0.
Software
OpenCV
The Open Source Computer Vision Library (OpenCV) is an open source computer
vision and machine learning software library. OpenCV was built to provide a
common infrastructure for computer vision applications and to accelerate
the use of machine perception in commercial products. Because it is released
under a permissive open-source license (BSD for older releases, Apache 2.0
from version 4.5.0), OpenCV makes it easy for businesses to use and modify
the code.

YOLO
YOLO is an abbreviation for ‘You Only Look Once’. It is an algorithm that
detects and recognizes various objects in a picture. YOLO treats object
detection as a regression problem and provides the class probabilities of
the detected objects. The algorithm employs a convolutional neural network
(CNN) to detect objects in real-time. Prediction over the entire image is
done in a single run of the network, which predicts the class probabilities
and bounding boxes simultaneously.
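Because a single pass yields many overlapping candidate boxes, YOLO-style detectors are typically followed by a confidence filter and non-maximum suppression (NMS). The sketch below shows that post-processing step in plain Python. The corner-format boxes, the function names, and the two threshold values are illustrative assumptions, not settings taken from this project.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        score_threshold: float = 0.5,
                        iou_threshold: float = 0.45) -> List[int]:
    """Return the indices of the boxes that are kept, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

In practice this step is applied per class, so that a car and a person standing close together do not suppress each other.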
Implementation Steps
1. We first extracted multiple frames from the video, since object
detection was to be performed on each frame.
2. We implemented the concept using a Gaussian algorithm in the
MATLAB Simulink Computer Vision Toolbox.
3. After analyzing the results, we shifted to YOLO, as it does not require
a separate frame-extraction step and is more precise (higher confidence
score) than the Gaussian algorithm.
4. After successfully implementing object detection, we built a GUI for the
end user using the Gradio library.
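The frame-extraction step can be sketched with OpenCV's `VideoCapture`, as shown below. This is a minimal illustration rather than the project's script: the helper names, the one-frame-per-second default, and the file name in the usage comment are assumptions.

```python
from typing import Iterator, List, Tuple


def frame_step(fps: float, every_seconds: float) -> int:
    """Number of frames to advance between two sampled frames (at least 1)."""
    return max(1, round(fps * every_seconds))


def sample_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Indices of the frames kept when sampling one frame every `every_seconds`."""
    return list(range(0, max(0, total_frames), frame_step(fps, every_seconds)))


def extract_frames(video_path: str, every_seconds: float = 1.0) -> Iterator[Tuple[int, object]]:
    """Yield (frame_index, image) pairs read from a video file with OpenCV."""
    import cv2  # imported here so the helpers above work without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    try:
        # Some containers report an FPS of 0; frame_step then falls back to 1.
        step = frame_step(capture.get(cv2.CAP_PROP_FPS), every_seconds)
        index = 0
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            if index % step == 0:
                yield index, frame
            index += 1
    finally:
        capture.release()


# Example usage (the file name is hypothetical):
# for index, frame in extract_frames("input.mp4"):
#     cv2.imwrite(f"frame_{index:06d}.jpg", frame)
```

Sampling at a fixed interval instead of keeping every frame reduces the number of images the detector has to process.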
1. Frame extraction from video
EXTRACTED FRAMES
To implement object detection,
we used a MATLAB Image
Processing and Computer Vision
model based on a Gaussian
algorithm.
The object selected for detection is a car.
RESULTS
OBJECT DETECTION USING MOBILENET SSD
OBJECT DETECTION USING YOLO V3
CONFUSION MATRIX
Sr No | True Positive | False Negative | False Positive | Precision % | Recall % | F1 Score % | Accuracy %
1 | 6 | 4 | 2 | 75 | 60 | 60 | 66
2 | 9 | 1 | 0 | 100 | 90 | 95 | 94
3 | 12 | 2 | 0 | 100 | 86 | 92 | 92
4 | 9 | 1 | 0 | 100 | 90 | 95 | 94
5 | 39 | 2 | 0 | 100 | 95 | 98 | 97
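The precision, recall, and F1 columns follow from the true positive, false negative, and false positive counts. The sketch below recomputes them using the standard definitions. The function name is hypothetical, and accuracy is not recomputed because the table does not list true negatives. With these definitions the first row's F1 evaluates to roughly 66.7 rather than the 60 listed, so that entry may be worth rechecking.

```python
from typing import Tuple


def detection_metrics(tp: int, fn: int, fp: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) as percentages from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return 100 * precision, 100 * recall, 100 * f1


# (true positive, false negative, false positive) counts from the table
rows = [(6, 4, 2), (9, 1, 0), (12, 2, 0), (9, 1, 0), (39, 2, 0)]
for number, (tp, fn, fp) in enumerate(rows, start=1):
    p, r, f1 = detection_metrics(tp, fn, fp)
    print(f"test {number}: precision {p:.1f}%  recall {r:.1f}%  F1 {f1:.1f}%")
```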
GUI for the end user
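A Gradio front end of this kind can be sketched as follows, assuming the user uploads a video and a query image and receives a text summary. The `detector` argument stands in for the trained model and is hypothetical, as are the helper names and component labels. This is not the project's actual interface code.

```python
from typing import Callable, List, Tuple

Hits = List[Tuple[int, float]]  # (frame_index, timestamp_seconds)


def format_hits(hits: Hits) -> str:
    """Turn retrieval results into the text shown to the end user."""
    if not hits:
        return "Query object not found in the video."
    lines = ["Query object found at:"]
    lines += [f"frame {frame} ({seconds:.1f} s)" for frame, seconds in hits]
    return "\n".join(lines)


def make_callback(detector: Callable[[str, object], Hits]):
    """Wrap a detector (hypothetical stand-in for the model) as a GUI callback."""
    def callback(video_path, query_image):
        return format_hits(detector(video_path, query_image))
    return callback


def build_demo(detector: Callable[[str, object], Hits]):
    """Build the Gradio interface; Gradio is imported only when this is called."""
    import gradio as gr

    return gr.Interface(
        fn=make_callback(detector),
        inputs=[gr.Video(label="Video"), gr.Image(label="Query image")],
        outputs=gr.Textbox(label="Result"),
        title="Query retrieval in video",
    )


# To start the interface locally with a real detector:
# build_demo(my_detector).launch()
```

Passing the detector in as an argument keeps the GUI code independent of the model, so the same interface can run on a desktop or on the Xavier board.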
Expected Conclusion
To successfully implement an efficient video analytics
model that accurately outputs the query retrieved from
a given video, and thus helps reduce the frequency of
criminal acts and create a healthy and safe environment
for all. The accuracy of the model should be as high as
possible using YOLO. Averaged over the five tests in the
results table, the model's accuracy is 88.6% and its
precision is 95%. The model should be easily accessible
to the user through a GUI.
Applications
★ Security - Video surveillance
★ Healthcare - Patient Monitoring
★ Transport - Traffic Violation
★ Defence - Object detection for Military guard
★ Automation domains - Industrial manufacturing unit automation
References
1. F. F. Chamasemani, L. S. Affendey, N. Mustapha and F. Khalid, "Surveillance Video Retrieval Using Effective Matching
Techniques," 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP),
2018, pp. 1-5, doi: 10.1109/INFRKM.2018.8464772.
2. Maureen Daum, Brandon Haynes, Dong He, Amrita Mazumdar, Magdalena
Balazinska, Paul G. Allen School of Computer Science & Engineering, University of
Washington, IEEE 37th International Conference on Data Engineering (ICDE), 2021.
3. W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE
Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 34, no. 3, pp. 334-352, 2004.
4. K. Huang and T. Tan, "Vs-star: A visual interpretation system for visual surveillance," Pattern Recognition Letters, vol.
31, no. 14, pp. 2265- 2285, 2020.
5. T.-L. Le, "Relevance feedback for surveillance video retrieval at object level," in Future Information Technology:
Springer, 2021, pp. 175-182.
6. A. Hauptmann, R. Yan, W.-H. Lin, M. Christel, and H. Wactlar, "Can high-level concepts fill the semantic gap in video
retrieval? A case study with broadcast news," IEEE transactions on multimedia, vol. 9, no. 5, pp. 958-966, 2019.
7. F. F. Chamasemani, L. S. Affendey, N. Mustapha, and F. Khalid, "A framework for automatic video surveillance indexing
and retrieval," Research Journal of Applied Sciences, Engineering and Technology, vol. 10, no. 11, pp. 1316-1321, 2019.
8. T. Brodsky et al., "Visual surveillance in retail stores and in the home," in Video-Based Surveillance Systems: Springer, 2018,
pp. 51-61.
9. N. Durak, A. Yazici, and R. George, "Online surveillance video archive system," in International Conference on
Multimedia Modeling, 2020, pp. 376-385: Springer.
10. A. Hampapur et al., "Searching surveillance video," in Advanced Video and Signal
Based Surveillance, 2007. AVSS 2007. IEEE Conference on, 2007, pp. 75-80: IEEE.
11. W. Hu, D. Xie, Z. Fu, W. Zeng, and S. Maybank, "Semantic-based surveillance video retrieval," IEEE Transactions on
image processing, vol. 16, no. 4, pp. 1168-1181, 2017.
12. Y.-K. Jung, K.-W. Lee, and Y.-S. Ho, "Content-based event retrieval using semantic scene interpretation for automated
traffic surveillance," IEEE Transactions on Intelligent Transportation Systems, vol. 2, no. 3, pp. 151-163, 2001.
13. H. Lee, A. F. Smeaton, N. O'Connor, and N. Murphy, "User-interface to a CCTV video search system," 2015.
14. E. Şaykol, U. Güdükbay, and Ö. Ulusoy, "Scenario-based query processing for video-surveillance archives," Engineering
Applications of Artificial Intelligence, vol. 23, no. 3, pp. 331-345, 2010.
15. A. Doulamis, N. Doulamis, L. Van Gool, and M. Nixon, "Guest editorial: Event-based video analysis/retrieval,"
Multimedia Tools and Applications, vol. 69, no. 2, pp. 247-251, 2014.
16. F. F. Chamasemani, L. S. Affendey, F. Khalid, and N. Mustapha, "Object detection and representation method for
surveillance video indexing," in Smart Instrumentation, Measurement and Applications (ICSIMA), 2015 IEEE 3rd
International Conference on, 2015, pp. 1-5: IEEE.
Thank You !