
A

Synopsis/Project Report
On

REAL TIME OBJECT DETECTION


in Python
Submitted in partial fulfillment of the requirement for the VI semester
Bachelor of Computer Science
By

Asha Pandey
Harshit Lohani
Under the Guidance of
Mrs. Senam Pandey
Assistant Professor
Department of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


GRAPHIC ERA HILL UNIVERSITY, BHIMTAL CAMPUS
SATTAL ROAD, P.O. BHOWALI,
DISTRICT- NAINITAL-263132
2022- 2023
STUDENT’S DECLARATION

We, Asha Pandey and Harshit Lohani, hereby declare that the work presented in the project entitled "Object Detection", submitted in partial fulfillment of the requirement for the award of the degree of B.Tech in the session 2022-2023, is an authentic record of our own work carried out under the supervision of "Mrs. Senam Pandey", Assistant Professor, Department of CSE, Graphic Era Hill University, Bhimtal.

The matter embodied in this project has not been submitted by us for the award of any other degree.

Date:

Asha Pandey

Harshit Lohani

CERTIFICATE
The project report entitled "Real-Time Object Detection", being submitted by Harshit Lohani and Asha Pandey to Graphic Era Hill University, Bhimtal Campus, is a record of bonafide work carried out by them. They have worked under my guidance and supervision and have fulfilled the requirements for the submission of this report.

(Mrs. Senam Pandey)                                  (Dr. Ankur Bisht)
Project Guide                                        (HOD, CSE Dept.)

ACKNOWLEDGEMENT
We take immense pleasure in thanking "Mrs. Senam Pandey" (Assistant Professor, CSE, GEHU Bhimtal Campus) for permitting us to carry out this project work under her excellent and optimistic supervision. This has all been possible due to her novel inspiration, able guidance, and useful suggestions that helped us develop as creative researchers and complete the work on time.

Words are inadequate in offering our thanks to GOD for providing us everything that we need. We also want to extend our thanks to our President "Prof. (Dr.) Kamal Ghanshala" for providing us all the infrastructure and facilities, without which this work would not have been possible.

Many thanks to Professor "Dr. Manoj Chandra Lohani" (Director, GEHU Bhimtal) and the other faculty members for their insightful comments, constructive suggestions, valuable advice, and time in reviewing this report.

Finally, yet importantly, we would like to express our heartiest thanks to our beloved parents for their moral support, affection, and blessings. We would also like to pay our sincere thanks to all our friends and well-wishers for their help and wishes for the successful completion of this project.

Harshit Lohani

Asha Pandey
TABLE OF CONTENTS

Declaration…………………………………………………………………………..I

Certificate……………………………………………………………………………II

Acknowledgement…………………………………………………………………..III

Abstract………………………………………………………………………………IV

Table of Contents…………………………………………………………………….

List of Publications…………………………………………………………………..

List of Tables…………………………………………………………………………

List of Figures………………………………………………………………………..

List of Symbols……………………………………………………………………….

List of Abbreviations………………………………………………………………...

CHAPTER 1: INTRODUCTION……………………………………………

1.1 Objective………………………………………………………

1.2 Background and Motivations………………………………….

1.3 Problem Statement…………………………………………….

1.4 Objectives and Research Methodology……………………….

1.5 Project Organization…………………………………………..

CHAPTER 2: PROPOSED SYSTEM………………………………………

2.1 History………………………………………………………...

CHAPTER 3: S/W AND H/W REQUIREMENTS

3.1 S/W and H/W requirements …………………………………………

3.1.1 Security………………………………………………………………
3.2 Resources and Technology used……………………………………..

CHAPTER 4: ER DIAGRAM……………………………………………………….

4.1 ER Diagram…………………………………………………………...

CHAPTER 5: CODING OF FUNCTIONS…………………………………………..

5.1 Basic modules of the project…………………………………………..

CHAPTER 6: LIMITATIONS

CHAPTER 7: CONCLUSION

REFERENCES………………………………………………………...
PROJECT ABSTRACT
This project focuses on implementing a real-time object detection system using OpenCV and the

SSD-MobileNetV3 model. The objective is to leverage computer vision techniques to identify

and localize objects in a live video stream from a webcam. The system utilizes a pre-trained deep

neural network model, SSD-MobileNetV3, to achieve efficient and accurate object detection.

The workflow involves capturing frames from the webcam, feeding them into the SSD-

MobileNetV3 model, and processing the model's predictions to draw bounding boxes around

detected objects. The project also incorporates the COCO (Common Objects in Context) dataset

for labeling classes and displaying relevant information such as class names and confidence

scores.

Key components of the project include video capture setup, loading class names, configuring and

loading the deep neural network model, and real-time visualization of the detection results. The

system is designed for flexibility and ease of use, allowing for potential applications in areas

such as surveillance, human-computer interaction, and augmented reality.

Through this project, we aim to explore the capabilities of deep learning-based object detection

in real-world scenarios, demonstrate its implementation using the OpenCV library, and showcase

the potential applications of such technology in various domains.

I. INTRODUCTION
In the realm of computer vision, the confluence of advanced technologies such as deep learning

and OpenCV has paved the way for innovative applications, notably real-time object detection.

This project aims to showcase the practical implementation of a real-time object detection

system using Python and OpenCV, employing the sophisticated SSD-MobileNetV3 deep neural

network architecture. The primary objective is to create an efficient system capable of accurately

identifying and localizing objects in live video streams from a webcam. The project leverages the

capabilities of OpenCV for video capture, while the SSD-MobileNetV3 model, pre-trained on

the COCO dataset, demonstrates proficiency in recognizing a diverse array of objects commonly

encountered in various contexts. The implementation encompasses video capture setup, class

name loading from the COCO dataset, SSD-MobileNetV3 model configuration, and real-time

visualization of detection results. Bounding boxes are overlaid on detected objects, accompanied

by pertinent information such as class names and confidence scores. This iterative process,

encapsulated in a continuous loop, exemplifies the system's real-time capabilities. By exploring

the practical application of deep learning-based object detection, this project serves to highlight

the integration of cutting-edge models into tangible computer vision tasks, with the provided

code offering a foundational understanding and implementation base for real-time object

detection systems across diverse applications.


OBJECTIVE

The primary aim of this project is to create an efficient and versatile real-time object detection
system using Python and OpenCV, specifically leveraging the capabilities of the SSD-
MobileNetV3 deep neural network architecture. Key objectives include:

1. Real-time Object Detection: Implementing a robust system capable of real-time object


detection in live video streams from a webcam. The project aims to showcase the
practical application of deep learning in accurately identifying and localizing various objects
within diverse environments.

2. OpenCV Integration: Leveraging the OpenCV library for video capture setup, enabling
seamless interfacing with the webcam, and processing video frames in real-time. The
integration with OpenCV serves as the foundation for the project's video processing
capabilities.

3. SSD-MobileNetV3 Model Implementation: Configuring and implementing the SSD-


MobileNetV3 model for object detection. This model, pre-trained on the COCO dataset,
demonstrates proficiency in recognizing a wide range of objects commonly found in
diverse scenarios.
4. Bounding Box Visualization: Overlaying bounding boxes on detected objects to provide
a visual representation of the system's object localization capabilities. Each bounding box
will be accompanied by pertinent information such as class names and confidence scores.

5. Real-time Iterative Processing: Encapsulating the entire object detection process within
a continuous loop to ensure real-time and iterative processing of incoming video frames.
This design choice showcases the system's capability to handle a constant stream of data
and provide instantaneous results.

6. User-Friendly Interaction: Designing the system with a user-friendly interface,


ensuring ease of use and understanding. The project aims to provide clear visual feedback
on detected objects, enhancing the overall user experience.

7. Error Handling and Robustness: Implementing effective error handling mechanisms to


ensure system robustness. The project will incorporate validation processes to handle
unexpected scenarios, preventing crashes or errors and ensuring the system's reliability.

8. Extensibility and Modularity: Designing the project with extensibility and modularity
in mind to facilitate the addition of new functionalities or integration with other
technologies. This approach enables future enhancements and customization based on
specific project requirements.
PROBLEM STATEMENT
The problem addressed in this object detection project revolves around the limitations of
existing solutions in the field. While object detection technologies have made significant
strides, certain challenges persist. The project aims to overcome these challenges and
develop an efficient object detection system using Python and OpenCV.
1. Inaccurate Object Detection: Existing object detection systems may struggle with
accurate and real-time identification of objects in various environments. This can lead to
misinterpretation or failure to recognize certain objects, limiting the system's reliability.
The project addresses this challenge by implementing a real-time object detection system
using the SSD-MobileNetV3 architecture, aiming for accurate and efficient identification
across diverse scenarios.
2. Limited Object Recognition Functionality: Some object detection systems may have
limited capabilities, restricting their usefulness to specific types of objects or scenarios.
The project aims to provide a comprehensive solution by leveraging the SSD-
MobileNetV3 model, pre-trained on the COCO dataset. This model's versatility enables
the recognition of a wide range of objects commonly found in different contexts,
enhancing the system's utility.
3. Real-time Processing Constraints: The efficiency of object detection systems is crucial,
especially in real-time applications. Existing solutions might face challenges in achieving
real-time processing, leading to delays in object identification. This project addresses
such constraints by utilizing the OpenCV library and optimizing the SSD-MobileNetV3
model for real-time processing, ensuring timely and accurate detection of objects.
4. User Interface for Object Visualization: The project acknowledges the significance of
providing a user-friendly interface for object visualization. Some existing solutions may
lack clear visual feedback, hindering user understanding. This project aims to overcome
this limitation by overlaying bounding boxes on detected objects, accompanied by
relevant information such as class names and confidence scores. This enhances the user
experience and provides intuitive feedback.
5. Error Handling and Robustness: Object detection systems need to be robust in
handling unexpected scenarios or errors during operation. Existing solutions may struggle
with error handling, potentially leading to system crashes or incorrect responses. This
project addresses this concern by implementing proper error handling mechanisms,
ensuring the system remains resilient in scenarios such as low-light conditions,
occlusions, or unexpected object types.
By addressing these challenges, the project seeks to provide an object detection system that
offers accurate and real-time identification, versatile object recognition functionality, efficient
processing, a user-friendly interface, and robust error handling. The goal is to enhance the
reliability and effectiveness of the object detection system in diverse applications, from
surveillance to interactive technologies.
II. Proposed System
The journey of real-time object detection can be traced back to the early days of computer
vision, where researchers embarked on exploring the possibilities of identifying and
localizing objects in dynamic environments. The inception of this project was fueled by
the convergence of powerful computing, advancements in deep learning, and the
increasing demand for robust object detection systems.

• In its nascent stages during the late 1950s and 1960s, computer vision enthusiasts delved
into the realm of pattern recognition algorithms and statistical models to tackle the
challenges of object identification. Initial efforts focused on simple object recognition,
laying the groundwork for subsequent advancements in the field.
• The 1990s marked a pivotal era, witnessing a surge in the capabilities of object detection
as powerful computers and machine learning algorithms came to the forefront. Neural
networks and hidden Markov models were introduced, improving the accuracy and
performance of object detection systems. This paved the way for the development of
sophisticated systems capable of understanding and responding to the visual
environment.
• The advent of smartphones and personal digital assistants in the early 2000s acted as a
catalyst, propelling object detection into the mainstream. Tech giants like Apple, Google,
and Amazon integrated object detection functionalities into their devices, ushering in an
era of interactive and visually aware systems. This transformative period showcased the
potential of real-time object detection in various applications.
• Motivated by these technological strides and the desire to create a more versatile object
detection system, this project was initiated. The primary goal was to develop a real-time
object detection system using Python, specifically leveraging the capabilities of the SSD-
MobileNetV3 architecture. The project aimed to push the boundaries of object detection
by providing a system that could identify and locate objects seamlessly.
• The development process involved extensive research and experimentation with machine
learning algorithms, particularly deep neural networks, to train models for accurate and
efficient object detection. The integration of the OpenCV library became instrumental in
creating a reliable framework for video capture, image processing, and real-time
visualization of detected objects.
• Throughout the project's evolution, iterative improvements were made to enhance the
system's capabilities and address any challenges. User feedback and real-world testing
played a pivotal role in refining the object detection system, ensuring its adaptability to
diverse scenarios. The project team dedicated efforts to create an intuitive user interface,
enabling seamless interaction and robust error handling mechanisms.
III. Software and Hardware Requirements
Software Requirements for Real-time Object Detection with OpenCV and SSD-
MobileNetV3 in Python:
• Programming Language: Python 3.x
• Computer Vision Library: OpenCV (Open Source Computer Vision Library)
• Deep Learning Library: TensorFlow or PyTorch (for utilizing the pre-trained SSD-
MobileNetV3 model)
• Integrated Development Environment (IDE): Any Python IDE, such as PyCharm,
Visual Studio Code, or Jupyter Notebook, for development and testing
• Operating System Compatibility: The object detection system should be compatible
with the target operating systems (e.g., Windows, macOS, Linux)
• Internet Connection: While not mandatory for the core functionality, an internet
connection may be required for accessing additional resources or updating model
weights.
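As a practical note, the Python-side dependencies listed above can typically be installed with pip. The package names below are the common community ones (a setup sketch; exact names and versions may vary by platform):

```shell
# OpenCV, including the cv2.dnn module used by this project
pip install opencv-python

# Optional: a deep learning framework, only needed if you plan to
# retrain or convert the SSD-MobileNetV3 model yourself
pip install tensorflow
```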
Hardware Requirements for Real-time Object Detection with OpenCV and SSD-
MobileNetV3 in Python:
• Computer: A desktop or laptop computer capable of running Python, OpenCV, and deep
learning libraries.
• Webcam: A working webcam for capturing real-time video frames.
• Sufficient Processing Power: The computer should have ample processing power to
handle real-time video processing, object detection, and visualization.
• Adequate Graphics Processing Unit (GPU): While not mandatory, having a GPU can
significantly accelerate deep learning computations and improve overall performance.
• Adequate Memory: The computer should have sufficient memory (RAM) to store and
process data efficiently during real-time object detection.
• Speakers or Headphones: Optional but may be useful for receiving audio feedback or
alerts related to detected objects.
• Stable Power Supply: Ensuring a stable power supply to prevent interruptions during
real-time object detection processes.
These software and hardware requirements provide a foundational framework for developing a
real-time object detection system with OpenCV and SSD-MobileNetV3 in Python. Depending on
the project's specific features or additional functionalities, further considerations may be
necessary. It's crucial to adapt the requirements based on the intended use case, platform, and any
external devices or sensors that might be integrated into the system.

IV. E-R Diagram


V. CODING
1. Importing Libraries
import cv2

This line imports the OpenCV library, which is a powerful tool for computer vision tasks such as
image and video processing.

2. Setting Threshold and Video Capture

thres = 0.45
cap = cv2.VideoCapture(0)
cap.set(3, 1280)
cap.set(4, 720)
cap.set(10, 70)

• thres: This variable sets the confidence threshold for object detection. Objects with
confidence scores below this threshold will be ignored.
• cap: Initializes a video capture object using the default camera (index 0). The subsequent
lines set various properties for the video capture, such as width, height, and brightness.
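The bare numbers 3, 4, and 10 passed to cap.set() are OpenCV capture-property IDs. Assuming OpenCV's standard numbering, they correspond to the following named constants, shown here as a readability sketch:

```python
# Mapping of the numeric property IDs used above to the OpenCV
# constant names they correspond to (standard OpenCV numbering).
CAP_PROPS = {
    3: "CAP_PROP_FRAME_WIDTH",   # frame width in pixels (set to 1280)
    4: "CAP_PROP_FRAME_HEIGHT",  # frame height in pixels (set to 720)
    10: "CAP_PROP_BRIGHTNESS",   # camera brightness (set to 70)
}

for prop_id, name in CAP_PROPS.items():
    print(f"cap.set({prop_id}, ...)  # cv2.{name}")
```

With the named constants, cap.set(3, 1280) can be written as cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280), which makes the intent clearer to readers of the code.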

3. Loading Class Names

classNames = []
classFile = 'coco.names'
with open(classFile, 'rt') as f:
    classNames = f.read().splitlines()

• classNames: A list that will store the names of the classes that the model can detect.
• classFile: The file containing the names of the classes, typically associated with a pre-
trained model (like COCO dataset classes).
• This code reads the class names from the file and splits them into a list.
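The coco.names file is plain text with one class name per line. The parsing step can be sketched in isolation using a four-line sample file in place of the real 80-class list (the sample content below is illustrative, not the actual file):

```python
# Write a small stand-in for coco.names (one class name per line).
sample = "person\nbicycle\ncar\nmotorcycle\n"
with open("coco_sample.names", "w") as f:
    f.write(sample)

# Same parsing logic as the project code: read the file and split
# it into a list of class names, one per line.
classNames = []
with open("coco_sample.names", "rt") as f:
    classNames = f.read().splitlines()

print(classNames)  # ['person', 'bicycle', 'car', 'motorcycle']
```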

4. Loading Pre-trained Model

net = cv2.dnn_DetectionModel('frozen_inference_graph.pb',
                             'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt')
net.setInputSize(320, 320)
net.setInputScale(1.0/127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

• net: Initializes an object detection model using the MobileNet architecture with Single
Shot Multibox Detector (SSD) for real-time object detection.
• 'frozen_inference_graph.pb' and 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt':
These are files containing the pre-trained weights and configuration for the model.
• setInputSize(320, 320): Sets the input size of the images for the model.
• setInputScale(1.0/127.5): Sets the scale factor for the pixel values normalization.
• setInputMean((127.5, 127.5, 127.5)): Sets the mean values for image normalization.
• setInputSwapRB(True): Swaps the Red and Blue channels in the input images.
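Together, the input scale and mean map 8-bit pixel values from [0, 255] into roughly [-1, 1], the range the network expects. A minimal sketch of that arithmetic, assuming OpenCV's usual blob preprocessing order (subtract the mean, then multiply by the scale):

```python
# Per-channel normalization: (pixel - mean) * scale,
# with mean = 127.5 and scale = 1/127.5 as configured above.
def normalize_pixel(p, mean=127.5, scale=1.0 / 127.5):
    return (p - mean) * scale

print(normalize_pixel(0))      # ~ -1.0 (darkest pixel maps to the low end)
print(normalize_pixel(127.5))  #   0.0 (mid-gray maps to the center)
print(normalize_pixel(255))    # ~  1.0 (brightest pixel maps to the high end)
```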

5. Object Detection Loop

while True:
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=thres)

    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            cv2.rectangle(img, box, (0, 255, 0), 2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, f'{round(confidence*100, 2)}%', (box[0]+200, box[1]+30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("Output", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

• The code enters an infinite loop to continuously capture video frames.


• cap.read(): Reads a frame from the video capture.
• net.detect(img, confThreshold=thres): Detects objects in the frame, considering only
those with confidence scores above the threshold (thres).
• If objects are detected (classIds is not empty):
• The code iterates through the detected objects.
• cv2.rectangle(): Draws a rectangle around the detected object.
• cv2.putText(): Places text on the image, showing the class name and confidence score.
• cv2.imshow(): Displays the annotated frame.
• cv2.waitKey(1): Waits for a key press (1 millisecond delay).
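The confidence filtering performed internally by net.detect(img, confThreshold=thres) can be illustrated with plain Python. The raw detections below are hypothetical values, not real model output:

```python
thres = 0.45  # same threshold as the project code

# Hypothetical raw (classId, confidence) pairs from a detector.
raw_detections = [(1, 0.91), (3, 0.30), (17, 0.62), (44, 0.44)]

# Keep only detections at or above the confidence threshold,
# mirroring what confThreshold does inside net.detect().
kept = [(cid, conf) for cid, conf in raw_detections if conf >= thres]

print(kept)  # [(1, 0.91), (17, 0.62)]
```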

Summary
The code sets up a real-time object detection system using OpenCV, a pre-trained model, and a
webcam feed. It continuously captures frames, detects objects, and annotates the video feed with
bounding boxes and class labels.

VI. LIMITATIONS
1. Accuracy and Generalization:
• Limited Training Data: The model's accuracy may be constrained by the quantity and
diversity of the training data. An insufficient dataset may result in poor generalization to
unseen scenarios.
• Overfitting: The model might overfit to the training data, meaning it performs well on the
training set but poorly on new, unseen data.
2. Algorithm Selection:
• Algorithm Constraints: Depending on the chosen algorithm (e.g., Haarcascades, SSD,
YOLO), there might be limitations in handling specific object types, sizes, or
orientations.
• Trade-offs: Different algorithms have trade-offs between speed and accuracy. Some may
sacrifice accuracy for real-time performance.
3. Real-world Conditions:
• Environmental Factors: Changes in lighting conditions, background clutter, or variations
in object appearance might impact detection accuracy.
• Scale and Perspective: The model may struggle with objects at different scales or
perspectives, affecting its ability to accurately detect them.
4. Resource Intensiveness:
• Hardware Requirements: The project's performance may be resource-intensive, requiring
a powerful computer for real-time processing.
• Processing Speed: The speed of object detection may be limited by the computing
resources available.
5. Lack of Semantic Understanding:
• Limited Context Awareness: The model may lack semantic understanding, meaning it
might not comprehend the context of the detected objects or their relationships.
6. Single Object Type:
• Specialization: If the model is trained on a specific type of object, it may struggle with
detecting objects outside its training scope.
7. Lack of Real-time Learning:
• Static Model: The model is likely static and doesn't adapt to new objects or
environmental changes in real-time.
8. Evaluation Metrics:
• Subjectivity: Evaluation metrics for object detection can be subjective, and different
metrics may highlight different aspects of performance.
9. Interpretability:
• Black-box Nature: Deep learning models, depending on the complexity, can be
challenging to interpret, making it difficult to understand how and why certain
predictions are made.
10. Integration with Other Systems:
• Compatibility: Integrating the object detection system with other software or systems
may pose challenges, especially if they use different technologies or standards.
11. Lack of Robustness:
• Vulnerability to Adversarial Attacks: Some deep learning models, if not designed to be
robust, can be vulnerable to adversarial attacks, where small input perturbations lead to
incorrect predictions.
12. Maintenance and Updates:
• Continuous Improvement: Regular updates and maintenance are crucial for addressing
emerging challenges, improving performance, and ensuring compatibility with evolving
standards.
Understanding these limitations can guide further development, improvements, and set realistic
expectations for the project's capabilities.
VII. CONCLUSION

In conclusion, the development of the object detection project in Python has been an insightful
journey aimed at creating a real-time system capable of identifying and annotating objects in a
video feed. The primary goal was to leverage OpenCV and a pre-trained model for efficient
object detection, with a focus on practical applications such as surveillance or real-time
monitoring.
Throughout the project, key components such as the choice of object detection algorithm (in this
case, MobileNet with SSD), hardware specifications, and model configuration were carefully
considered. The project successfully demonstrated the capability to detect and annotate objects in
real-time, providing a foundation for applications requiring automated visual recognition.
Various limitations were identified, including challenges related to the algorithm's accuracy
under diverse environmental conditions, potential overfitting, and the project's dependency on
stable hardware and resource availability. These limitations highlight areas for future refinement
and optimization.
Despite these limitations, the project has illuminated the potential of computer vision in real-
world applications. The seamless integration of OpenCV and a pre-trained model showcased the
power of existing technologies in addressing object detection challenges.
Looking ahead, there is room for improvement and expansion. Future developments could
involve refining the accuracy and robustness of the object detection model, addressing
limitations in challenging environments, and exploring opportunities for integration with other
systems or sensors. Enhancements in multi-object detection, real-time learning capabilities, and
support for diverse scenarios could further extend the project's utility.
Overall, this object detection project serves as a foundational exploration into computer
vision applications, laying the groundwork for potential advancements and broader use cases in
the field of automated visual recognition.

References
1. OpenCV Documentation: the official documentation for OpenCV, with comprehensive
information on functions, modules, and tutorials.
2. Haarcascades: a machine-learning object detection method available in OpenCV.
3. Single Shot MultiBox Detector (SSD): GitHub repository with information on implementing
SSD using MobileNet.
4. MobileNetV3: PyTorch Hub implementation of MobileNetV3.
5. MobileNetV2 + SSD: GitHub repository for MobileNetV2 with SSD.
6. COCO (Common Objects in Context): a large-scale object detection, segmentation, and
captioning dataset.
