
REAL TIME OBJECT DETECTION

A Major Project-II Report


Submitted in partial fulfillment of the requirements for the award of
Bachelor of Technology in the Department of Artificial Intelligence & Machine Learning

Submitted to
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL (M.P)

MAJOR PROJECT-II REPORT


Submitted by
Harsh Lakshkar [0103AL211073] Sonali Kumari [0103AL211181]
Kartikey Dadarya [0103AL211086] Shruti Verma [0103AL211175]

Under the supervision of


Dr. Nikita Shivhare Mitra
Associate Professor

Department of Artificial Intelligence & Machine Learning
Lakshmi Narain College of Technology, Bhopal (M.P.)

Session
2024-25
LAKSHMI NARAIN COLLEGE OF TECHNOLOGY, BHOPAL
Department of Artificial Intelligence & Machine Learning

CERTIFICATE

This is to certify that the work embodied in this project work entitled “Real Time
Object Detection” has been satisfactorily completed by Harsh Lakshkar
(0103AL211073), Sonali Kumari (0103AL211181), Kartikey Dadarya
(0103AL211086), and Shruti Verma (0103AL211175).
It is a bonafide piece of work, carried out under the guidance of the Department of
CSE, Lakshmi Narain College of Technology, Bhopal, in partial fulfillment of
the Bachelor of Technology degree during the academic year 2024-25.

Guided By Approved By

Dr. Nikita Shivhare Mitra Dr. Tripti Saxena


Associate Professor Professor & Head
LAKSHMI NARAIN COLLEGE OF TECHNOLOGY, BHOPAL

Department of Artificial Intelligence & Machine Learning

ACKNOWLEDGEMENT

We express our deep sense of gratitude to Dr. Nikita Shivhare Mitra, Department of
CSE-AIML, L.N.C.T., Bhopal, whose kindness, valuable guidance, and timely help
encouraged us to complete this project.

A special thanks goes to Dr. Tripti Saxena (HOD), who helped us in completing this
project work and whose interesting ideas and thoughts made this project work
successful.

We would also like to thank our institution and all the faculty members, without
whom this project work would have been a distant reality.

Harsh Lakshkar Shruti Verma


[0103AL211073] [0103AL211175]

Sonali Kumari Kartikey Dadarya


[0103AL211181] [0103AL211086]
INDEX

S.NO. TOPICS

1. Problem Domain Description

2. Literature Survey

3. Major Objective & Scope of Project

4. Problem Analysis and Requirement Specification

5. Detailed Design (Modeling and ERD/DFD)

6. Hardware/Software Platform Environment

7. Snapshots of Input & Output

8. Coding

9. Project Limitations and Future Scope

10. References
CHAPTER 1
PROBLEM DOMAIN DESCRIPTION

1.1 What is Real-Time Object Detection?


Real-time object detection is a computer vision technique that enables the identification and
localization of objects in video or image streams in real-time. It involves analyzing visual data and
detecting objects of interest promptly, allowing for immediate decision-making and response.

People from many walks of life use machine learning, knowingly or unknowingly, and the near
future seems set to be built on image processing. Nowadays face detection and eye scanners are
used to unlock devices, and widely used recommendation systems are likewise based on image
processing. The most popular machine learning library in the world is Google’s TensorFlow,
developed by the Google Brain team. Nearly every Google product uses machine learning in some
way, whether for image search, image capturing, caption translation, and so on; Google needs
machine learning to take advantage of such bulky datasets. The TensorFlow library was built to
scale and to run on multiple CPUs and GPUs. Wrapper classes are available in several languages,
with Python being the most used; TensorFlow is installed through the Python package manager,
pip, with the command pip install tensorflow on different machines depending on their CPU and
GPU. Initially, libraries are loaded and models are downloaded as required. Various COCO-trained
models are available based on CNN, R-CNN, and Faster R-CNN architectures; among them, Faster
R-CNN is found to be the most accurate. A widely used model is ssd_mobilenet_v2_coco, which
follows the SSD (Single Shot MultiBox Detector) architecture. In addition, the protobuf compiler
is used to convert the .proto files cloned inside the TensorFlow models repository into C++, Java,
and Python source code.

1.2 Applications for Real-Time Object Detection:


• Object detection is a powerful computer vision technology that enables systems to identify and
locate objects within an image or video. It involves recognizing what objects are present (like a
car, person, or dog) and determining where each object is located. This technology is widely used
in various applications, including security cameras, self-driving cars, and smartphone apps that
can identify objects through the camera.

• Object detection is making a significant impact across diverse industries and use cases,
including autonomous vehicles, surveillance systems, retail, healthcare, agriculture,
manufacturing, sports analytics, environmental monitoring, and smart cities.
• Autonomous vehicles rely on sensors and cameras to detect and classify objects like pedestrians,
other vehicles, traffic signs, and obstacles. This information helps the car make safe driving
decisions, avoid collisions, and navigate complex environments.
Application: Tesla’s Autopilot uses a combination of cameras, radar, and ultrasonic sensors to
detect and classify objects around the vehicle. This system helps the car navigate roads, change
lanes, and avoid obstacles autonomously, enhancing safety and driving efficiency.
• In security and surveillance, object detection identifies and tracks individuals, recognizes
suspicious activities, and detects intrusions. Modern surveillance systems can alert security
personnel to potential threats in real time and provide detailed footage for forensic analysis.
Application: Hikvision’s AI-powered cameras detect and track individuals, recognize suspicious
activities, and detect intrusions. These cameras can alert security personnel in real time to
potential threats and provide detailed footage for forensic analysis, improving overall security
measures.
• Retail stores use object detection to monitor customer behaviour, manage inventory, and prevent
theft. Smart cameras analyze shopper movements, detect when shelves need restocking, and track
items to reduce shrinkage and improve overall store management.
Application: Amazon Go stores use advanced object detection to monitor customer behaviour,
manage inventory, and prevent theft. Smart cameras and sensors track shopper movements and
detect when shelves need restocking, providing a seamless shopping experience without the need
for checkout lines.
• In healthcare, object detection aids in medical imaging and diagnostics. For instance, it helps
radiologists detect tumours, fractures, and other anomalies in X-rays, MRIs, and CT scans. It also
plays a role in monitoring patients and ensuring adherence to treatment protocols.
Application: Zebra Medical Vision uses object detection to aid radiologists in detecting tumours,
fractures, and other anomalies in medical images like X-rays, MRIs, and CT scans. This
technology improves diagnostic accuracy and helps monitor patient conditions more effectively.
• Object detection in agriculture helps farmers monitor crops and livestock. Drones and cameras
equipped with this technology can detect weeds, pests, and diseases in crops, enabling targeted
treatment. It also assists in counting livestock, monitoring their health, and managing farm
equipment, leading to more efficient and productive farming practices.
Application: John Deere's See & Spray technology uses object detection to identify weeds in
crops. This enables targeted herbicide application, reducing chemical usage and improving crop
yield. It also assists in monitoring crop health and detecting diseases early.
• In manufacturing, object detection is used for quality control and automation. Cameras and
sensors inspect products on assembly lines, identifying defects and ensuring they meet quality
standards. This technology also helps automate production processes by guiding robots to handle
and assemble parts accurately, increasing efficiency and reducing errors.
Application: Cognex vision systems use object detection for quality control on assembly lines.
Cameras inspect products for defects, ensuring they meet quality standards. This technology also
guides robots in handling and assembling parts accurately, increasing efficiency and reducing
errors in production processes.
• Sports analytics benefit from object detection by tracking players and equipment during games.
This technology provides detailed statistics on player movements, actions, and game strategies.
Coaches and analysts use this data to improve team performance, develop game plans, and
enhance the viewing experience for fans with real-time insights.
Application: Hawk-Eye uses object detection to track the ball and players during tennis matches.
This technology provides detailed statistics on player movements, actions, and game strategies.
It assists referees with line calls and enhances the viewing experience for fans with real-time
insights and replays.
• Object detection helps monitor and protect the environment by identifying and tracking changes
in natural habitats. It is used in wildlife conservation to track animal movements and monitor
populations. Additionally, it can detect illegal activities such as poaching or deforestation,
enabling timely interventions to protect ecosystems.
Application: The World Wildlife Fund (WWF) uses drones equipped with object detection to
monitor wildlife in their natural habitats. These drones track animal movements and monitor
populations, helping conservationists protect endangered species and detect illegal activities like
poaching.
• In smart cities, object detection enhances urban living. It helps manage traffic flow by detecting
vehicles and pedestrians, reducing congestion and improving safety. It also monitors public
spaces for cleanliness and security, supports waste management by identifying full bins, and
contributes to energy efficiency by controlling lighting and other utilities based on occupancy
detection.
Application: Barcelona uses object detection to manage traffic flow by detecting vehicles and
pedestrians, reducing congestion and improving safety. The technology also monitors public
spaces for cleanliness and security, supports waste management by identifying full bins, and
contributes to energy efficiency through occupancy-based control of lighting and utilities.
• Object detection technology is transforming industries by improving efficiency, safety, and
decision-making processes. Its applications are vast and varied, demonstrating its significant
impact on modern society.
• Real-time object detection relies on various techniques, with deep learning-based methods being
widely used. These techniques include Single Shot Multibox Detector (SSD), You Only Look
Once (YOLO), and Faster R-CNN (Region-based Convolutional Neural Networks). They employ
convolutional neural networks (CNNs) to extract features, perform object localization, and
classify objects simultaneously.
• Object detection is usually framed as a classification problem, assigning each detected object to
a category. More broadly, the various approaches to detecting objects can be divided into
classification-based and regression-based methods.

1.3. Challenges in Real-Time Object Detection:


Real-time object detection faces several challenges. One major challenge is the need for high
computational power and memory bandwidth to process large amounts of visual data in real-time.
Additionally, achieving the desired speed and accuracy trade-off requires efficient algorithms and
architecture.

1.4. Applications of Real-Time Object Detection:


Real-time object detection has diverse applications.

• In autonomous vehicles, it helps identify pedestrians, vehicles, and traffic signs to ensure
safe navigation.

• Surveillance systems utilize real-time object detection to detect and track suspicious
activities.

• Robotics benefits from object detection to enable robots to perceive and interact with their
surroundings effectively.

• In augmented reality, real-time object detection allows for virtual objects to be placed and
interacted with in the real world.

In conclusion, real-time object detection is a vital computer vision technique that enables computers
to analyze visual information in real-time. With its wide range of applications, it relies on advanced
deep learning techniques to achieve accurate and timely object detection and classification.

CHAPTER 2

LITERATURE SURVEY

2.1 Detection of Real Time Objects Using TensorFlow and OpenCV

Object recognition and detection can be done in structured as well as unstructured environments.
It is one of the most interesting and challenging jobs in computer science, and it can be a helping
hand for people who have lost their eyesight. Object detection can be implemented in many ways.
One of them is by using sensors, which involves complex hardware such as ultrasonic sensors,
radar, and stereo-vision optical flow; this makes the product more expensive. A cost-effective
prototype model can be developed instead.

Two widely used frameworks for detecting living or non-living things are TensorFlow and PyTorch.
TensorFlow is an open-source Python software library used for machine learning applications, with
math as a core component. PyTorch, developed primarily by Facebook's AI research team, is
likewise an open-source library.

2.1.1 Setting up the Environment:

Before starting with real-time object detection, it is essential to set up the development
environment. Install Python, TensorFlow, and OpenCV on your system. Ensure that the required
dependencies are installed and accessible.

 Install Python:
• Download and install Python from the official Python website
(https://www.python.org/) based on your operating system.

• Make sure to select the option to add Python to the system PATH during the installation process.
 Install TensorFlow:
• Open a command prompt or terminal and run the following command to install TensorFlow:
pip install tensorflow
• Depending on your system configuration, you can install the CPU-only version or the GPU-enabled
version of TensorFlow. GPU-enabled TensorFlow requires additional dependencies like CUDA and
cuDNN, which need to be installed separately.

 Install OpenCV:

• OpenCV can be installed using pip as well. Run the following command in the command prompt
or terminal:

pip install opencv-python


 Install Additional Dependencies:

• Real-time object detection may require additional libraries and dependencies. Some commonly
used libraries include NumPy, Matplotlib, and Pillow. Install them using the following commands:
pip install numpy
pip install matplotlib
pip install pillow

 Download TensorFlow Object Detection API:


• Clone or download the TensorFlow Object Detection API repository from the official GitHub
repository (https://github.com/tensorflow/models).

• Extract the downloaded archive to a suitable location on your system.


 Install Protobuf Compiler:
• The TensorFlow Object Detection API relies on Protocol Buffers (protobuf) to configure and train
object detection models. Download and install the protobuf compiler from the official Protocol
Buffers releases page.

 Compile Protobuf Files:


• In the TensorFlow Object Detection API directory, navigate to the research folder and run the
following command to compile the protobuf files:

protoc object_detection/protos/*.proto --python_out=.

 Add Object Detection API to PYTHONPATH:


• To use the TensorFlow Object Detection API, you need to add the research and slim directories
to the PYTHONPATH environment variable.

• On Windows:
• Open the command prompt and run the following command:

set PYTHONPATH=<path_to_tensorflow_object_detection_folder>\research;<path_to_tensorflow_object_detection_folder>\research\slim;%PYTHONPATH%

• On Linux/Mac:
• Open a terminal and run the following command:

export PYTHONPATH=<path_to_tensorflow_object_detection_folder>/research:<path_to_tensorflow_object_detection_folder>/research/slim:$PYTHONPATH

 Test the Installation:
• To verify that the environment is set up correctly, run the following command in the command
prompt or terminal:

python -c "import tensorflow as tf;print(tf. version )"

• It should display the installed TensorFlow version without any errors.


2.1.2 TensorFlow Object Detection API:
The TensorFlow Object Detection API is a powerful tool that provides pre-trained models and a
comprehensive framework for training custom object detection models. This API supports various
object detection architectures, including SSD, Faster R-CNN, and EfficientDet.

2.1.3 Selecting a Pre-trained Model:


Choose a pre-trained model that suits your specific object detection needs. The TensorFlow Object
Detection Model Zoo offers a variety of models trained on popular datasets like COCO and Open
Images. Select a model based on factors such as accuracy, speed, and the objects you need to detect.

2.1.4 Model Conversion:


To use a pre-trained TensorFlow model with OpenCV, you need to convert the model to a format
compatible with OpenCV's deep learning module. Convert the TensorFlow model to a frozen graph
using TensorFlow's freeze_graph.py script, then optimize the frozen graph using TensorFlow's
optimize_for_inference.py script. Finally, either load the optimized frozen graph directly into
OpenCV's dnn module together with a matching text graph (.pbtxt) configuration, or convert it to
OpenVINO's Intermediate Representation (IR) format using the Model Optimizer tool.

2.1.5 Loading the Model in OpenCV:


Use OpenCV's dnn module to load the converted model. Create a cv2.dnn.Net object and load the

model's configuration file and the corresponding weights file.

2.1.6 Real-Time Object Detection:


To perform real-time object detection, capture video frames using OpenCV's VideoCapture class.
Preprocess each frame by resizing it to the required input size, normalizing pixel values, and
converting it to a blob. Pass the preprocessed blob through the loaded model using the cv2.dnn.Net
object's forward() method. Retrieve the bounding box coordinates, class labels, and confidence
scores of detected objects.

2.1.7 Drawing Bounding Boxes:


Iterate over the detected objects and draw bounding boxes on the frame using OpenCV's drawing
functions. Display the frame with bounding boxes using OpenCV's imshow() function. Optionally,
you can also display the class labels and confidence scores alongside the bounding boxes.
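
As a minimal sketch tying sections 2.1.5–2.1.7 together: the following assumes a converted frozen
graph named frozen_inference_graph.pb with a matching graph.pbtxt configuration (hypothetical
file names) whose output follows the SSD-style (1, 1, N, 7) layout; the 0.5 confidence threshold is
likewise an assumption.

import cv2
import numpy as np

# load the converted model (file names are assumptions, see section 2.1.4)
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")

cap = cv2.VideoCapture(0)  # capture frames from the default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    (h, w) = frame.shape[:2]

    # preprocess: resize to the network input size and convert to a blob
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7) for SSD-style models

    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence < 0.5:  # assumed confidence threshold
            continue
        class_id = int(detections[0, 0, i, 1])
        # scale the normalized box back to frame coordinates
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x1, y1, x2, y2) = box.astype("int")
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, "id {}: {:.0%}".format(class_id, confidence),
            (x1, max(y1 - 10, 10)), cv2.FONT_HERSHEY_SIMPLEX,
            0.5, (0, 255, 0), 2)

    cv2.imshow("Detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()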

2.1.8 Performance Optimization:


Real-time object detection requires efficient processing to achieve acceptable frame rates.
Implement performance optimizations like hardware acceleration (e.g., GPUs), batch processing,
and multi-threading to improve the detection speed. Experiment with different optimizations and
measure their impact on performance.
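
As one concrete example of the optimizations described above, OpenCV's dnn module can be
pointed at a CUDA backend where available. This is a sketch assuming an OpenCV build compiled
with CUDA support; on builds without it, OpenCV typically warns and falls back to the default CPU
path.

import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt.txt",
    "MobileNetSSD_deploy.caffemodel")

# request GPU inference; requires OpenCV built with CUDA support
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)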

2.1.9 Further Enhancements:


To enhance the object detection system, consider implementing additional features such as object
tracking, object counting, or integrating with other modules for specific application requirements.
These enhancements can be achieved using various computer vision techniques and algorithms.

2.2 Appearance-based object detection

This method has shown very good results in detecting a range of 3-D objects, from smaller ones to
larger ones. For instance, it can recognize cars, buses, trucks, and airplanes. Additionally, it is
programmed to identify which animal is in front of the web camera.

The project was done on a large scale: it has been verified on more than 2,000 separate images,
which ultimately increases recognition performance, and the rigorous dataset/database keeps being
updated. The major problem with this approach is that the whole object frame must not be
separated bottom-up. In the current system, this is done automatically for 2-D views of 3-D
objects.

2.3 Implementation
YOLO is an object detector used in object recognition systems and real-time object detection.
YOLO is an abbreviation of “You Only Look Once”. It can process 45 frames per second. Another
requirement is a webcam, operated through Python; it is driven by calls to OpenCV functions. For
example, cv2.VideoCapture() opens the incoming stream, and the read() method reads the image's
pixels and returns them to the script. The system then matches the pixels against the objects
residing in the dataset and fetches the corresponding name.
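
A minimal sketch of the OpenCV calls mentioned above for reading webcam frames:

import cv2

cap = cv2.VideoCapture(0)  # open the default webcam
while True:
    ok, frame = cap.read()  # read() returns a success flag and the frame's pixels
    if not ok:
        break
    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()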

2.4 Tensorflow: A system for large scale machine learning

The Google Brain team was formed specifically to make machines intelligent. One of its projects
was the development of the TensorFlow system, for building models and training them on large
datasets. The traditional functional style of computation, previously used for graph representations
over immutable data, is replaced by tensors (multidimensional arrays) over mutable data:
TensorFlow allows vertices to represent computations that own or update state. The previously used
system was DistBelief, developed for training neural networks on distributed systems, which Google
had used since 2011. DistBelief follows a parameter-server architecture, and its limitations were
tackled by the development of TensorFlow. TensorFlow has several benefits: along with Python it
also provides APIs in C++, which makes it easy and flexible to implement, and it has a relatively
faster compilation time than other deep learning libraries such as Keras and Torch. Unlike other
libraries, TensorFlow supports both CPU and GPU computing on any device, which is very
beneficial because nowadays the volume of datasets used for deep learning is huge and the CPU's
computing power alone cannot handle such complex operations; using the CPU along with the
GPU is really the key to processing hybrid deep learning applications, especially games with high
graphics requirements. However, TensorFlow was not always considered a tool for deep learning;
originally it was developed to run large numerical computations. TensorFlow uses tensors, which
are multidimensional arrays, to accept data. Input to neural networks is taken in the form of arrays
of different dimensions and ranks. By rank, a tensor's dimensionality can be classified: for
example, s = 200 is a tensor of rank 0, containing only one element, also known as a scalar; v =
[10, 11, 12] is a tensor of rank 1, a vector; and higher-rank tensors are possible through
multidimensionality. TensorFlow programs are executed on data flow graphs, which consist of
several nodes and edges. Unlike typical programming, in TensorFlow we prepare graphs containing
various nodes, which run in the form of a session using data from a tensor. TensorFlow operations
are arranged into what is generally called a computational graph, the graph of the main logic of a
program, which TensorFlow builds in memory. This allows very large-scale neural networks, since
computing can be distributed across several CPUs or GPUs: blocks created from the computational
graph can run across distinct GPUs and CPUs simultaneously, a process called parallel
computation.
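
A minimal sketch of the ranks and the graph-and-session execution model described above, written
in the TensorFlow 1.x style (in TensorFlow 2.x this requires the tf.compat.v1 layer with eager
execution disabled):

import tensorflow as tf

# must be called before any operations are created (TensorFlow 2.x only)
tf.compat.v1.disable_eager_execution()

# tensors of different ranks
s = tf.constant(200)           # rank 0: a single element, a scalar
v = tf.constant([10, 11, 12])  # rank 1: a vector

# nodes in the computational graph; nothing is computed yet
a = tf.compat.v1.placeholder(tf.float32)
b = tf.compat.v1.placeholder(tf.float32)
total = a + b

# the graph only executes when run inside a session
with tf.compat.v1.Session() as sess:
    print(sess.run(total, feed_dict={a: 3.0, b: 4.5}))  # 7.5
    print(sess.run(v))                                   # [10 11 12]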

2.5 Microsoft COCO: Common Objects in Context

The dataset developed by Microsoft is commonly known as COCO: Common Objects in Context,
meaning it contains objects that are easily recognized by a 4-year-old child. The dataset contains
330K images with 1.5M object instances across 80 object categories, including 5 captions per
image. The annotations are stored in a file with the .json extension. The COCO images are split for
training, validation, and testing across all 80 categories, and all object instances are annotated with
detailed segmentation masks. The JSON file contains five different sections: info, licenses, images,
annotations, and categories. The info and licenses sections only contain information related to the
dataset, like date, contributor, year, URL, and licenses. Each entry in the images section has its own
unique ID, which is used as a reference in the annotations. Apart from that, coco_url and flickr_url
specify the paths to the hosted images; widths and heights are stored in pixels.

Each image has distinct annotations with different IDs, and a category_id that specifies the
category to which the object belongs. Categories have a name, an id, and a supercategory. To
illustrate, if the name is bicycle, then the supercategory is vehicle, and its id is 2. iscrowd is an
annotation field set to 0 or 1: if iscrowd is 0, the annotation covers only one lion, and if iscrowd is
1, there is more than one lion in the image. The object region is described in the segmentation
field, with values stored as arrays of the form [[x, y, x, y, ...], [x, y, x, y, ...]]; that is, segmentation
is a list of points that outline where the actual object is present in the image. Counts are also
defined in annotations (run-length encoding), identifying the unshaded parts where the object is not
present. For example, in the array [147, 3, 1, 3, 89, ...], the leading 147 means the first 147 pixels
are not part of the object, the following 3 means the next 3 pixels are part of the object, the next 1
means one pixel is not part of the object, and so on for the rest of the array.
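
A minimal sketch of reading a COCO-style annotation file with Python's json module;
instances_val2017.json is a hypothetical local path.

import json

with open("instances_val2017.json") as f:
    coco = json.load(f)

# the five top-level sections described above
print(sorted(coco.keys()))
# ['annotations', 'categories', 'images', 'info', 'licenses']

# categories carry a name, an id, and a supercategory, e.g. bicycle -> vehicle
cat_by_id = {c["id"]: c for c in coco["categories"]}
print(cat_by_id[2])  # {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}

# each annotation references an image by image_id and a category by category_id
ann = coco["annotations"][0]
print(ann["image_id"], cat_by_id[ann["category_id"]]["name"], ann["iscrowd"])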

The annotations in the COCO dataset are manually created by human annotators, ensuring high-
quality and accurate annotations. This meticulous annotation process makes COCO a valuable
resource for training and evaluating computer vision models.

The COCO dataset is commonly used for several important computer vision tasks:

• Object Detection: The dataset provides bounding box annotations for objects in images,
enabling researchers to train models that can accurately detect and localize objects in
various scenes.

• Instance Segmentation: In addition to bounding box annotations, COCO also includes pixel-
level annotations for some images, which makes it suitable for training models that can
segment objects at the pixel level.

• Keypoint Detection: The dataset contains annotations for human poses, including keypoints
like joints and body parts. This makes COCO useful for training models that can detect and
track key body points in images or videos.

• Image Captioning: The dataset includes descriptive captions for each image, allowing
researchers to develop models that can generate natural language descriptions of the visual
content.

The Microsoft COCO evaluation metric is designed to measure the performance of computer
vision models on these tasks. It uses average precision (AP) as the primary metric for object
detection and instance segmentation. AP summarizes the precision-recall curve of a model
and provides a single numerical value to evaluate its performance. For image captioning, the
evaluation is based on the quality of generated captions using various metrics like BLEU,
METEOR, CIDEr, and ROUGE.

Overall, Microsoft COCO has become a standard benchmark in the field of computer vision,
providing researchers and developers with a common dataset and evaluation metric to compare and
advance the state-of-the-art in object detection, instance segmentation, and image captioning.

CHAPTER 3

MAJOR OBJECTIVE & SCOPE OF PROJECT

3.1 OBJECTIVE
The objective of real-world object detection is to develop a robust and accurate system that can
detect and localize objects of interest in real-world scenarios. The primary goal is to create a
practical solution that can effectively identify objects in diverse environments, handle various
challenges (e.g., occlusion, scale variations, complex backgrounds), and provide reliable results in
real-time or near real-time.

3.2 SCOPE

3.2.1 Application:

Specific Objectives: Real-world object detection projects often have specific objectives tailored to
their application domain. For example, in autonomous driving, the objective might be to detect and
track vehicles, pedestrians, traffic signs, and other relevant objects for safe navigation. In
surveillance, the objective might be to identify suspicious activities or recognize specific
individuals. Defining the specific objects and requirements for detection within the chosen
application domain is a critical aspect of the project scope.

3.2.2 Data Collection and Annotation:

The scope includes collecting and creating a comprehensive dataset that represents the real-world
scenarios and objects of interest. This involves capturing images or videos from the target
environment and annotating them with accurate labels, such as bounding boxes, segmentation
masks, or keypoints. Data collection may involve multiple sources, including public datasets, custom
data collection, or data augmentation techniques to increase diversity.

3.2.3 Algorithm Selection and Development:

The scope involves researching, selecting, and adapting state-of-the-art object detection algorithms
or architectures to address the project's specific requirements. This may include variants of
convolutional neural networks (CNNs), such as one-stage detectors (e.g., YOLO, SSD) or two-stage
detectors (e.g., Faster R-CNN, Mask R-CNN). The development process may include modifying the
architecture, adjusting hyperparameters, or incorporating additional modules to improve
performance or handle specific challenges.

3.2.4 Model Training and Optimization:

Training the selected object detection model using the annotated dataset is an important part of the
scope. This involves preprocessing the data, selecting appropriate loss functions, fine-tuning pre-
trained models, or training from scratch. Optimization techniques, such as learning rate schedules,
regularization methods, or model distillation, may be applied to improve model performance and
generalization.

3.2.5 Performance Evaluation:

The scope includes evaluating the performance of the object detection system using appropriate
metrics and benchmarks. This evaluation should cover aspects like accuracy, precision, recall, mean
average precision (mAP), F1-score, or intersection over union (IoU). Evaluation may involve splitting
the dataset into training, validation, and test sets, and conducting thorough analysis to understand
the strengths and limitations of the system.

3.2.6 Real-Time Processing and Efficiency:

Real-time or near real-time processing is a crucial aspect of real-world object detection. The scope
includes optimizing the algorithm and implementation to achieve fast inference speeds while
maintaining accuracy. Techniques such as model compression, hardware acceleration (e.g., GPUs,
TPUs), or parallel processing may be explored to ensure efficient execution on different platforms or
devices.

3.2.7 Robustness and Generalization:

The scope involves designing the object detection system to be robust and capable of generalizing
to unseen or challenging scenarios. This may require incorporating techniques like data
augmentation, domain adaptation, ensemble methods, or attention mechanisms to improve
robustness against variations in lighting, occlusion, viewpoint, or object scale.

3.2.8 Integration and Deployment:

Integrating the object detection system into a practical application or system is part of the scope.
This may involve designing APIs, interfaces, or software components that allow seamless integration
with other systems or frameworks. Consideration should be given to scalability, performance, and
compatibility with target platforms (e.g., edge devices, cloud servers).

3.2.9 Ethical and Privacy Considerations:

Real-world object detection projects need to address ethical considerations, privacy concerns, and
legal regulations. The scope includes ensuring data privacy, avoiding biases in the training data,
and implementing appropriate safeguards.

CHAPTER 4

PROBLEM ANALYSIS & REQUIREMENT SPECIFICATION

4.1 Problem Statement:


The objective of the real-world object detection project is to develop a system that can accurately
detect and classify various objects present in images or video streams captured by cameras or
other sensors. The system should be able to handle diverse environmental conditions, varying
object sizes, and different viewpoints.

4.2 User Requirements:

• Accuracy:

The system should have a high level of accuracy in detecting and classifying objects. It should be
able to correctly identify objects even in challenging scenarios, such as occlusion, cluttered
backgrounds, and low lighting conditions.

• Speed:

The object detection process should be performed in real-time or near real-time to enable timely
decision-making and response.

• Object Classes:

The system should support detection and classification of a wide range of object classes, including
but not limited to people, vehicles, animals, and common household items.

• Scalability:

The system should be scalable and able to handle a large number of objects in a scene, especially
in crowded environments.

• Robustness:

The system should be robust to variations in object appearance, such as changes in lighting
conditions, object orientation, and scale.

• Integration:

The system should be compatible with different camera or sensor types and capable of integrating
with existing systems or platforms.

• User Interface:

The system should have a user-friendly interface that allows users to interact with the system,
configure settings, and visualize the detected objects.

• Customizability:

The system should provide options for customization, such as adding new object classes, adjusting
detection thresholds, and fine-tuning the model.

• Data Privacy:

The system should ensure the privacy and security of captured data, adhering to relevant data
protection regulations.

4.3 Functional Requirements:

• Object Detection:

The system should be able to detect objects in images or video frames, accurately localizing their
positions.

• Object Classification:

The system should be capable of classifying the detected objects into predefined classes or
categories.

• Real-Time Processing:

The system should process the input data in real-time or near real-time to ensure timely detection
and response.

• Multiple Object Detection:

The system should handle scenarios with multiple objects and accurately identify and label each
object.

• Tracking:

The system should provide object tracking capabilities, allowing the tracking of objects across
frames or video streams.

• Alarm/Alert Generation:

The system should generate alarms or alerts when specific objects or events of interest are
detected, enabling proactive response.

• Integration APIs:

The system should provide application programming interfaces (APIs) or SDKs for integration with
other systems or platforms.

• Model Training and Fine-Tuning:

The system should support training and fine-tuning of object detection models using labeled
datasets, allowing customization for specific use cases.

• Data Storage and Management:

The system should provide mechanisms for storing, managing, and querying the detected object
data for analysis and future reference.

4.4 Non-Functional Requirements:

• Performance:

The system should have high performance, capable of processing a large number of frames per
second (FPS) while maintaining accuracy.

• Reliability:

The system should be reliable, with minimal downtime or failures, ensuring continuous operation
in critical scenarios.

• Compatibility:

The system should be compatible with different hardware platforms, operating systems, and
camera/sensor configurations.

• Usability:

The system should have a user-friendly interface, providing intuitive controls, clear visualizations,
and comprehensive documentation.

• Scalability:

The system should scale seamlessly to handle increasing data volumes and accommodate
additional cameras or sensors.

• Security:

The system should implement appropriate security measures to protect against unauthorized
access, data breaches, and tampering.

• Maintainability:

The system should be easy to maintain and update.

CHAPTER 5
DETAILED DESIGN

 DATA FLOW DIAGRAM:

Fig. 5.1: DFD Level 0 Diagram

Fig. 5.2: DFD Level 1 Diagram

 ENTITY RELATIONSHIP DIAGRAM:

Fig.5.3: E-R Diagram

CHAPTER 6
SOFTWARE PLATFORM ENVIRONMENT

6.1 Programming Language:


Python is widely used in the field of computer vision and deep learning due to its extensive libraries
and frameworks. Python is also used in fields like web development, machine learning, artificial
intelligence, and automation, making it a versatile tool for professionals and learners alike, from
beginners writing their first lines of code to experienced developers.

6.2 Deep Learning Framework:


TensorFlow and PyTorch are the most popular frameworks for training and deploying deep learning

models. They provide efficient tools and APIs for object detection tasks.

 Building the computational graph. A computational graph is nothing but a series of TensorFlow
operations arranged into a graph of nodes.

 Running the computational graph. To actually evaluate the nodes, we must run the computational
graph within a session. A session encapsulates the control and state of the TensorFlow runtime.

 Deep learning is a branch of machine learning in which algorithms are written to mimic the
functioning of the human brain. The most commonly used libraries in deep learning are TensorFlow
and PyTorch. PyTorch is an open-source deep learning framework with Python and C++ interfaces;
its functionality resides in the torch module. In PyTorch, the data to be processed is input in the
form of a tensor.
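
A minimal sketch of PyTorch's tensor-based input, assuming the torch package is installed
(pip install torch):

import torch

# data enters PyTorch models as tensors
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
w = torch.randn(2, 2, requires_grad=True)  # trainable weights

y = x @ w        # matrix multiply; the autograd graph is built on the fly
loss = y.sum()
loss.backward()  # gradients of loss with respect to w
print(w.grad)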

6.3 Object Detection Libraries:


You can leverage established object detection libraries built on top of deep learning frameworks,
such as:

• TensorFlow Object Detection API:

A powerful library that provides pre-trained models, training scripts, and evaluation tools for object
detection tasks. It supports various state-of-the-art models like Faster R-CNN, SSD, and EfficientDet.

• Detectron2:

Built on PyTorch, Detectron2 offers a modular and flexible framework for object detection. It

provides pre-trained models and tools for training and inference.

6.4 Image Processing Libraries:


OpenCV (Open-Source Computer Vision Library) is a popular library for image and video processing.
It offers various functions for image manipulation, feature extraction, and object detection
preprocessing.

OpenCV allows you to perform various operations on an image:

 Read the Image: OpenCV helps you read the image from a file, or directly from a camera, to
make it accessible for further processing.

 Image Enhancement: You can enhance an image by adjusting its brightness, sharpness, or
contrast. This helps improve the visual quality of the image.

 Object Detection: Objects such as bracelets, watches, patterns, and faces can be detected using
OpenCV. This can also include recognizing faces, shapes, or other objects.

 Image Filtering: You can change an image by applying various filters, such as blurring or
sharpening.

 Drawing on the Image: OpenCV allows you to draw text, lines, and arbitrary shapes on images.

 Saving the Changed Images: After processing, you can save the modified images for future
analysis.
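
A minimal sketch of the operations listed above, assuming a local image file named input.jpg
(a hypothetical path):

import cv2

img = cv2.imread("input.jpg")  # read the image from a file

# enhancement: adjust contrast (alpha) and brightness (beta)
brighter = cv2.convertScaleAbs(img, alpha=1.2, beta=30)

# filtering: blur the image
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# drawing: a rectangle and a text label
cv2.rectangle(brighter, (50, 50), (200, 200), (0, 255, 0), 2)
cv2.putText(brighter, "label", (50, 40), cv2.FONT_HERSHEY_SIMPLEX,
    0.6, (0, 255, 0), 2)

cv2.imwrite("output.jpg", brighter)  # save the modified image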

6.5 Hardware Acceleration:
Depending on the scale of your project and performance requirements, you can utilize hardware
acceleration technologies, such as NVIDIA GPUs or specialized AI accelerators (e.g., NVIDIA Jetson)
to speed up training and inference processes.

6.6 Development Environment:


Integrated Development Environments (IDEs) like PyCharm, Jupyter Notebook, or Visual Studio Code

provide a user-friendly interface for coding, debugging, and experimentation.

6.7 Version Control and Collaboration:


Use version control systems like Git along with platforms like GitHub or GitLab to manage your
codebase, track changes, and collaborate with team members.

6.8 Cloud Services:


Cloud platforms, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft
Azure, offer convenient infrastructure for training and deploying object detection models. They
provide scalable compute resources, storage, and services for managing AI workloads.

6.9 Data Management:


Utilize data management tools like TensorFlow Data API or PyTorch DataLoader to efficiently handle

large-scale datasets during training and inference.
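
A minimal sketch of batching data with the tf.data API, assuming images and labels are already
loaded as NumPy arrays (the shapes here are arbitrary placeholders):

import numpy as np
import tensorflow as tf

images = np.random.rand(100, 300, 300, 3).astype("float32")  # dummy data
labels = np.random.randint(0, 21, size=(100,))

# build a shuffled, batched, prefetched input pipeline
ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.shuffle(buffer_size=100).batch(8).prefetch(tf.data.AUTOTUNE)

for batch_images, batch_labels in ds.take(1):
    print(batch_images.shape, batch_labels.shape)  # (8, 300, 300, 3) (8,)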

6.10 Visualization Tools:


Matplotlib, Seaborn, or other data visualization libraries can help visualize the results of object

detection, display bounding boxes, and analyze performance metrics.

CHAPTER 7

SNAPSHOTS OF INPUTS & OUTPUTS

Fig.7.1

Fig 7.2

Fig. 7.3

Fig 7.4

Fig 7.5

Fig 7.6

Fig 7.7

Fig 7.8

CHAPTER 8

CODING

# How to run:
# python real_time_object_detection.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel

# import packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak predictions")
args = vars(ap.parse_args())

# Arguments used here:
# prototxt   = MobileNetSSD_deploy.prototxt.txt (required)
# model      = MobileNetSSD_deploy.caffemodel (required)
# confidence = 0.2 (default)

# SSD (Single Shot MultiBox Detector) is a popular object detection algorithm.
# It has no dedicated region proposal network and predicts the boundary boxes
# and the classes directly from feature maps in one single pass.
# To improve accuracy, SSD introduces small convolutional filters to predict
# object classes and offsets to default boundary boxes.
# MobileNet is a convolutional neural network used to produce high-level
# features. SSD is designed for object detection in real time; the SSD object
# detector is composed of two parts: extract feature maps, and apply
# convolutional filters to detect objects.

# Initialise the list of the 21 class labels MobileNet SSD was trained on.
# Each prediction is composed of a boundary box and 21 scores, one per class
# (one extra "background" class for no object); we pick the highest score as
# the class for the bounded object.
CLASSES = ["aeroplane", "background", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# Assign a random color to each class: COLORS is a list of 21 (R, G, B)
# values, like [101.09, 172.34, 111.84], one per label;
# length of COLORS = length of CLASSES = 21
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from Caffe: MobileNetSSD_deploy.prototxt.txt and
# MobileNetSSD_deploy.caffemodel
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
# print(net)  # <dnn_Net 0x128ce1310>

# initialize the video stream and warm up the camera for a couple of seconds
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# FPS is used to compute the (approximate) frames per second; start the timer
fps = FPS().start()

# OpenCV provides two functions to facilitate image preprocessing for deep
# learning classification: cv2.dnn.blobFromImage and cv2.dnn.blobFromImages.
# Here we will use cv2.dnn.blobFromImage. These functions perform mean
# subtraction, scaling, and optional channel swapping.
#
# Mean subtraction helps combat illumination changes in the input images, so
# we can view it as a technique that aids our convolutional neural network.
# Before training, we compute the average pixel intensity across all training
# images for each of the Red, Green, and Blue channels, ending up with three
# values: mu_R, mu_G, and mu_B (for example, the ImageNet training set means
# are R=103.93, G=116.77, B=123.68). When passing an image through the
# network (for training or testing), we subtract the mean from each channel,
# optionally adding a scaling factor sigma for normalization:
#   R = (R - mu_R) / sigma
#   G = (G - mu_G) / sigma
#   B = (B - mu_B) / sigma
# sigma may be the standard deviation across the training set (turning the
# preprocessing into a standard score/z-score) or manually set to scale the
# input image into a particular range; it depends on the architecture and on
# how the network was trained.
#
# cv2.dnn.blobFromImage creates a 4-dimensional blob from the image: it
# optionally resizes and center-crops the image, subtracts the mean values,
# scales by scalefactor, and can swap the Blue and Red channels. A blob is
# just one or more images with the same spatial dimensions (width and height)
# and the same depth (number of channels) that have all been preprocessed in
# the same manner.

# Consider the video stream as a series of frames captured at a certain FPS,
# and loop over each frame
while True:
    # grab the frame from the threaded video stream and resize it to have a
    # maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    # print(frame.shape)  # (225, 400, 3)

    # grab the frame dimensions: here h = 225 and w = 400
    (h, w) = frame.shape[:2]

    # resize the frame to the network's 300x300 input size
    resized_image = cv2.resize(frame, (300, 300))

    # Create the blob. The function:
    #   blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean, swapRB)
    # image: the input image to preprocess before classification
    # scalefactor: applied after mean subtraction; it should be 1/sigma,
    #   since the input channels are multiplied by it (here 1/127.5)
    # swapRB: OpenCV assumes images are in BGR channel order, while the
    #   'mean' value assumes RGB order; setting this to True swaps the
    #   R and B channels to resolve the discrepancy
    blob = cv2.dnn.blobFromImage(resized_image, (1 / 127.5), (300, 300),
        127.5, swapRB=True)
    # print(blob.shape)  # (1, 3, 300, 300)

    # pass the blob through the network and obtain the predictions
    net.setInput(blob)
    predictions = net.forward()

    # loop over the predictions (predictions.shape[2] = 100 here)
    for i in np.arange(0, predictions.shape[2]):
        # extract the confidence (i.e., probability) of the prediction
        confidence = predictions[0, 0, i, 2]

        # filter out predictions below the minimum confidence level;
        # the default confidence is 0.2, anything less is filtered
        if confidence > args["confidence"]:
            # extract the index of the class label from the predictions,
            # e.g. for person idx = 15, for chair idx = 9, etc.
            idx = int(predictions[0, 0, i, 1])

            # compute the (x, y)-coordinates of the bounding box, e.g.
            # box = [130.96, 76.75, 393.03, 224.03], then convert them
            # to integers: 130 76 393 224
            box = predictions[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # get the label with the confidence score
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            print("Object detected: ", label)

            # draw a rectangle across the boundary of the object
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                COLORS[idx], 2)

            # put the label text outside the rectangular detection; other
            # fonts include FONT_HERSHEY_PLAIN, FONT_HERSHEY_DUPLEX,
            # FONT_HERSHEY_COMPLEX, FONT_HERSHEY_SCRIPT_COMPLEX, FONT_ITALIC
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    cv2.imshow("Frame", frame)

    # HOW TO STOP THE VIDEO STREAM?
    # cv2.waitKey(1) returns -1 when no input is made; as soon as a button
    # is pressed it returns a 32-bit integer. 0xFF represents 11111111, an
    # 8-bit binary value; since only 8 bits are needed to represent a
    # character, we AND waitKey(1) with 0xFF so an integer of at most 255 is
    # obtained. ord(char) returns the ASCII value of the character (also at
    # most 255), so by comparing the two we can check for a key-press event
    # and break the loop. ord("q") is 113, so once 'q' is pressed we break.
    # Case 1: no button pressed: cv2.waitKey(1) is -1; -1 & 255 gives 255.
    # Case 2: 'q' pressed: ord("q") is 113; 113 & 255 gives 113.
    # Explaining the bitwise AND operator ('&'): convert the numbers to
    # binary, then AND them bit by bit. For example (113 & 255):
    #   binary of 113: 01110001
    #   binary of 255: 11111111
    #   113 & 255    = 01110001, the binary for 113, which is the output
    # So ANDing with 255 yields the ord() of the key that was pressed.
    key = cv2.waitKey(1) & 0xFF

    # press the 'q' key to break the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information: total elapsed time and an
# approximate FPS over the entire video stream
fps.stop()
print("[INFO] Elapsed Time: {:.2f}".format(fps.elapsed()))
print("[INFO] Approximate FPS: {:.2f}".format(fps.fps()))

# destroy windows, cleanup, and stop the video stream
cv2.destroyAllWindows()
vs.stop()
CHAPTER 9

PROJECT LIMITATIONS & FUTURE SCOPE

9.1 Limitations:
9.1.1 Multiple feature maps

Single-shot detectors must place special emphasis on the issue of multiple scales because they detect
objects with a single pass through the CNN framework. If objects are detected from the final CNN
layers alone, only large items will be found as smaller items may lose too much signal during
downsampling in the pooling layers. To address this problem, single-shot detectors typically look for
objects within multiple CNN layers including earlier layers where higher resolution remains. Despite
the precaution of using multiple feature maps, single-shot detectors notoriously struggle to detect
small objects, especially those in tight groupings like a flock of birds.

9.1.2 Limited data

The limited amount of annotated data currently available for object detection proves to be another
substantial hurdle. Object detection datasets typically contain ground truth examples for about a
dozen to a hundred classes of objects, while image classification datasets can include upwards of
100,000 classes. Furthermore, crowdsourcing often produces image classification tags for free (for
example, by parsing the text of user-provided photo captions). Gathering ground truth labels along
with accurate bounding boxes for object detection, however, remains incredibly tedious work.

The COCO dataset, provided by Microsoft, currently leads as some of the best object detection data
available. COCO contains 300,000 segmented images with 80 different categories of objects with
very precise location labels. Each image contains about 7 objects on average, and items appear at
very broad scales. As helpful as this dataset is, object types outside of these 80 select classes will not
be recognized if training solely on COCO.

A very interesting approach to alleviating data scarcity comes from YOLO9000, the second version
of YOLO. YOLO9000 incorporates many important updates into YOLO, but it also aims to narrow
the dataset gap between object detection and image classification. YOLO9000 trains simultaneously
on both COCO and ImageNet, an image classification dataset with tens of thousands of object classes.
COCO information helps precisely locate objects, while ImageNet increases YOLO’s classification
“vocabulary.” A hierarchical WordTree allows YOLO9000 to first detect an object’s concept (such as
“animal/dog”) and to then drill down into specifics (such as “Siberian husky”). This approach appears
to work well for concepts known to COCO, like animals, but performs poorly on less prevalent
concepts, since RoI suggestion comes solely from the training with COCO.

9.2 Future Scope:
Computers have become the most important technology in our lives and can solve increasingly
difficult problems. More specifically, computers are generally better than humans at repetitive,
data-intensive, and computational tasks. Over the last decade computers have become so powerful
that they can be used to perform complex tasks. This report elucidates one of these highly
computational applications that has become possible in recent years: object detection, the complex
task of finding objects in a given image or video frame. It has been around for years but is
becoming more apparent across a range of industries now more than ever before.

Existing algorithms most often only tackle a small subset of the different tasks necessary for
understanding an image and are very demanding in terms of computational resources and runtime. In
order to reproduce at least a part of the human visual perception abilities, one would have to combine
several different algorithms. Making such a combined system run in real time with today’s hardware
is a big challenge.

Indeed, object detection is a key ability for most computer and robot vision systems. Although great
progress has been observed in the last years and there will be improvements in the future thanks to
the remarkable evolution of artificial intelligence, and some existing techniques that are now part of
many consumer electronics or have been integrated in assistant driving technologies, we are still far
from achieving human-level performance, in particular in terms of open-world learning. It should be
noted that object detection has not yet been used much in many areas where it could be of great
help. Hence, we will need object detection systems for nano-robots, or for robots that will explore
areas that have not been seen by humans, such as the deep parts of the sea or other planets, and the
detection systems will have to learn new object classes as they are encountered. In such cases, a
real-time open-world learning ability will be critical.

This fascinating computer technology is related to computer vision and image processing that
detects and defines objects such as people, vehicles and animals from digital images and videos.
Therefore, object detection has the power to classify just one or several objects within a digital
image or video at once. Many methods for building object detection systems have been developed
to date, but object detection using deep learning techniques promises higher accuracy for a wide
variety of object classes. This technology is breaking into a wide range of industries, with use cases
ranging from personal security to productivity in the workplace. It is applied in many areas of
computer vision, including image retrieval, security, surveillance, automated vehicle systems, and
machine inspection. Significant challenges remain in the field of object recognition. The
possibilities are endless when it comes to future use cases for object detection.

Here are some of the main applications of object detection: vehicle plate recognition, self-driving
cars, object tracking, face recognition, medical imaging, object counting, object extraction from an
image or video, and person detection.

9.3 Conclusion
Object detection is customarily considered to be much harder than image classification, particularly
because of these five challenges: dual priorities, speed, multiple scales, limited data, and class
imbalance. Researchers have dedicated much effort to overcome these difficulties, often yielding
amazing results; however, significant challenges persist.

Basically, all object detection frameworks continue to struggle with small objects, especially those
bunched together with partial occlusions. Real-time detection with top-level classification and
localization accuracy remains challenging, and practitioners must often prioritize one or the other
when making design decisions. Video tracking may see improvements in the future if some continuity
between frames is assumed rather than processing each frame individually. Furthermore, an
interesting improvement that may see more exploration would extend the current two-dimensional
bounding boxes into three-dimensional bounding cubes. Even though many object detection
obstacles have seen creative solutions, these additional considerations–and plenty more–signal that
object detection research is certainly not done!

CHAPTER 10

REFERENCES

• Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng, and S. Avidan, “Fast Human Detection Using a
Cascade of Histograms of Oriented Gradients,” Computer Vision and Pattern Recognition,
2006 IEEE Computer Society Conference, pp. 1491–1498, 2006.

• Yoshua Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine
Learning, 2(1):1–127, 2009.

• J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, “Selective search for object
recognition,” IJCV, 2013.

• R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for
accurate object detection and segmentation,” TPAMI, 2015.

• R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision (ICCV),
2015.

• S. Ren, K. He, R. Girshick, X. Zhang, and J. Sun, “Object detection networks on
convolutional feature maps,” arXiv:1504.06066, 2015.

• Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T.
Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proc. of the ACM
International Conf. on Multimedia, 2014.

• Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards Real-Time
Object Detection with Region Proposal Networks,” arXiv:1506.01497, 2016.

• Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias
Weyand, Marco Andreetto, and Hartwig Adam, “MobileNets: Efficient Convolutional
Neural Networks for Mobile Vision Applications,” arXiv:1704.04861v1, 2017.

• J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale
hierarchical image database,” in CVPR, 2009.

• A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional
neural networks,” in Neural Information Processing Systems (NIPS), 2012.
