
“Real-Time Object Detection and Recognition”

A Major Project Report Submitted to

Rajiv Gandhi Proudyogiki Vishwavidyalaya

Towards Partial Fulfillment for the Award of

Bachelor of Engineering in Computer Science Engineering

Guided by:                              Submitted by:
Prof. ……………………                          Vedik Sharma (0827CS141117)
Professor, CSE                          Yash Sanskari (0827CS141119)

Acropolis Institute of Technology & Research, Indore
July - Dec 2024
EXAMINER APPROVAL

The Major Project entitled "Real-Time Object Detection and Recognition" submitted by Vedik Sharma (0827CS141117) and Yash Sanskari (0827CS141119) has been examined and is hereby approved towards partial fulfillment for the award of the Bachelor of Engineering degree in the Computer Science Engineering discipline, for which it has been submitted. It is understood that by this approval the undersigned do not necessarily endorse or approve any statement made, opinion expressed, or conclusion drawn therein, but approve the project only for the purpose for which it has been submitted.

(Internal Examiner)                     (External Examiner)

Date:                                   Date:
RECOMMENDATION

This is to certify that the work embodied in this major project entitled "Real-Time Object Detection and Recognition" submitted by Vedik Sharma (0827CS141117) and Yash Sanskari (0827CS141119) is a satisfactory account of the bonafide work done under the supervision of Dr. Kamal Kumar Sethi, and is recommended towards partial fulfillment for the award of the Bachelor of Engineering (Computer Science Engineering) degree by Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal.

(Project Guide)

(Project Coordinator)

(Dean Academics)
STUDENTS' UNDERTAKING

This is to certify that the major project entitled "Real-Time Object Detection and Recognition" has been developed by us under the supervision of Dr. Kamal Kumar Sethi. The whole responsibility of the work done in this project is ours. The sole intention of this work is practical learning and research.

We further declare that, to the best of our knowledge, this report does not contain any part of any work which has been submitted for the award of any degree either in this University or in any other University / Deemed University without proper citation, and if such work is found, we are liable to explain it.

Vedik Sharma (0827CS141117)

Yash Sanskari (0827CS141119)


Acknowledgement

We thank the almighty Lord for giving us the strength and courage to sail through the tough times and reach the shore safely.

There are a number of people without whom this project would not have been feasible. Their high academic standards and personal integrity provided us with continuous guidance and support.

We owe a debt of sincere gratitude, a deep sense of reverence and respect to our guide and mentor Dr. Kamal Kumar Sethi, Professor, AITR, Indore, for his motivation, sagacious guidance, constant encouragement, vigilant supervision, and valuable critical appreciation throughout this project work, which helped us to successfully complete the project on time.

We express profound gratitude and heartfelt thanks to Dr. Kamal Kumar Sethi, Professor & Head, CSE, AITR, Indore, for his support, suggestions, and inspiration for carrying out this project. We are very thankful to the other faculty and staff members of the department for providing us all the support, help and advice during the project. We would be failing in our duty if we did not acknowledge the support and guidance received from Dr. S. C. Sharma, Director, AITR, Indore, whenever needed. We take this opportunity to convey our regards to the management of Acropolis Institute, Indore, for extending academic and administrative support and providing us all the necessary facilities to achieve our project objectives.

We are grateful to our parents and family members who have always loved and supported us unconditionally. To all of them, we want to say "Thank you", for being the best family that one could ever have and without whom none of this would have been possible.

Vedik Sharma (0827CS141117), Yash Sanskari (0827CS141119)


Executive Summary

Real-Time Object Detection and Recognition

This project is submitted to Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (MP), India, towards partial fulfillment of the Bachelor of Engineering in the Computer Science Engineering branch, under the sagacious guidance and vigilant supervision of Dr. Kamal Kumar Sethi.

The project is based on Deep Learning, a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. The project uses TensorFlow, an open-source software library created by Google for machine learning applications, to detect, identify and track objects through the camera in real time. The project uses a model pre-trained on the Microsoft Common Objects in Context (COCO) dataset, which covers most common object categories. The purpose of this project is to implement a 'students and vehicles counter' for the college in real time.

Keywords: Image Processing, Neural Networks, TensorFlow


"Where the vision is one year, cultivate flowers;
Where the vision is ten years, cultivate trees;
Where the vision is eternity, cultivate people."
- Oriental Saying
List of Figures
Figure 1-1: Counting people at wedding function 3
Figure 1-2: Counting of eggs and soda bottles 4
Figure 1-3: Counting people at railway stations and airports 4
Figure 3-1: Block Diagram 16
Figure 3-2: R-CNN: Regions with CNN Features 18
Figure 3-3: Fast R-CNN Architecture 18
Figure 3-4: YOLO Architecture 18
Figure 3-5: Faster R-CNN Architecture 19
Figure 3-6: Bounding Box 20
Figure 3-7: Data Flow Diagram Level 0 20
Figure 3-8: Data Flow Diagram Level 1 20
Figure 4-1: Deep Learning 24
Figure 4-2: Neural Networks 25
Figure 4-3: TensorFlow Architecture 26
Figure 4-4: TensorFlow Working 27
Figure 4-5: Data Structure in JSON Format 29
Figure 4-6: Objects in training set 29
Figure 4-7: Instances per Category 30
Figure 4-8: Comparison Graphs 30
Figure 4-9: Screenshot 1 32
Figure 4-10: Screenshot 2 32
Figure 4-11: Screenshot 3 32
Figure 4-12: Test Case 1 Output 34
Figure 4-13: Test Case 2 Output 1 35
Figure 4-14: Test Case 2 Output 2 35

List of Tables
Table 1: Types of Datasets 19
Table 2: Database Structure 21
Table 3: Types of Models 27
Table 4: Test Case 1 33
Table 5: Test Case 2 34


List of Abbreviations

R-CNN  - Region-based Convolutional Neural Networks
COCO   - Common Objects in Context
OpenCV - Open Source Computer Vision
JSON   - JavaScript Object Notation
CIF    - Count In Frame
GPU    - Graphics Processing Unit
YOLO   - You Only Look Once


Table of Contents

CHAPTER 1. INTRODUCTION ............................................. 1
  1.1 Overview ...................................................... 1
  1.2 Background and Motivation ..................................... 2
  1.3 Problem Statement and Objectives .............................. 2
  1.4 Scope of the Project .......................................... 3
  1.5 Team Organization ............................................. 5
  1.6 Report Structure .............................................. 5

CHAPTER 2. REVIEW OF LITERATURE ..................................... 7
  2.1 Preliminary Investigation ..................................... 7
    2.1.1 Current System ............................................ 7
  2.2 Limitations of Current System ................................. 8
  2.3 Requirement Identification and Analysis for Project ........... 8
    2.3.1 Conclusion ................................................ 14

CHAPTER 3. PROPOSED SYSTEM .......................................... 15
  3.1 The Proposal .................................................. 15
  3.2 Benefits of the Proposed System ............................... 15
  3.3 Block Diagram ................................................. 16
  3.4 Feasibility Study ............................................. 16
    3.4.1 Technical ................................................. 16
    3.4.2 Economical ................................................ 17
    3.4.3 Operational ............................................... 17
  3.5 Design Representation ......................................... 18
    3.5.1 Data Flow Diagrams ........................................ 20
    3.5.2 Database Structure ........................................ 21
  3.6 Deployment Requirements ....................................... 21
    3.6.1 Hardware .................................................. 21
    3.6.2 Software .................................................. 22

CHAPTER 4. IMPLEMENTATION ........................................... 23
  4.1 Technique Used ................................................ 23
    4.1.1 Deep Learning ............................................. 23
    4.1.2 Neural Networks ........................................... 24
  4.2 Tools Used .................................................... 25
    4.2.1 OpenCV .................................................... 25
    4.2.2 TensorFlow ................................................ 26
    4.2.3 Models .................................................... 27
  4.3 Language Used ................................................. 31
  4.4 Screenshots ................................................... 32
  4.5 Testing ....................................................... 33
    4.5.1 Strategy Used ............................................. 33
    4.5.2 Test Case and Analysis .................................... 33

CHAPTER 5. CONCLUSION ............................................... 36
  5.1 Conclusion .................................................... 36
  5.2 Limitations of the Work ....................................... 36
  5.3 Suggestion and Recommendations for Future Work ................ 37

BIBLIOGRAPHY ........................................................ 38
PROJECT PLAN ........................................................ 41
GUIDE INTERACTION SHEET ............................................. 42
SOURCE CODE ......................................................... 43

Chapter 1. Introduction

Introduction

The modern world is surrounded by gigantic masses of digital visual information. To analyze and organize this vast ocean of visual information, image analysis techniques are a major requisite. Particularly useful would be methods that could automatically analyze the semantic contents of images or videos. In most potential uses, the significance of an image is determined by its content, and one important aspect of image content is the objects in the image that need to be recognized.

The project uses image analysis for object detection and recognition in order to determine activities and track objects. Here, machine learning is used to train the machine on a particular set of objects, so that a system can be implemented for counting those objects by detecting and then recognizing them in real time.

1.1 Overview
The project is based on image processing and analysis of a video that is captured from a camera in real time. A camera placed at the entrance of the college is used to detect students who enter the campus.

The system is made in such a way that it detects students as objects of class "Person" and counts them. Along with counting the number of students, it is also able to enumerate vehicles and unwanted animals like dogs, cows etc. that may enter the campus.


A record is maintained for each entry, be it a person, vehicle or animal, as a log file in the database, so that it can be used for analysis purposes in the future.

1.2 Background and Motivation

As the holy grail of computer vision research is to tell a story from a single image or a sequence of images, object detection and recognition have been studied for more than four decades. Significant efforts have been made to develop representation schemes and algorithms aimed at recognizing generic objects in images taken under different imaging conditions (e.g., viewpoint, illumination, and occlusion).

Object detection and recognition become a necessity when there is a need for automation, where the identification is done by machines instead of manually, for better performance and reliability. Normally, people are hired specially to count the number of students and vehicles that enter a college every day and to maintain their records manually in a register. Automation by this system provides a better way to perform the same work.

1.3 Problem Statement and Objectives

In our college, a large number of students and cars enter through the main entrance every day. Keeping a record of the count of all the students and vehicles manually by a single person is a very tedious and time-consuming task, and specially hiring people for this purpose is not a feasible solution.

Thus, the system implemented has the following objectives:

1. Objective 1: To count the number of students and vehicles that enter the college campus through automation with the help of machine learning, and thus reduce manual effort and increase performance.


2. Objective 2: To keep a record of all the objects such as persons, vehicles and animals as log files in a database that can be retrieved whenever required for analysis.

1.4 Scope of the Project

As the project uses image processing for detecting and counting objects from video in real time, it can have a wide variety of applications in various areas. Some of them are given below:

- It can be used to count the number of students and vehicles that enter a college / school campus.

- People can be counted in crowded public places such as temples, shopping malls, industrial and corporate areas, concerts etc.

- In weddings, where counting the guests is necessary for billing and other accounting purposes, the project can be used efficiently and easily.

Figure 1-1: Counting people at wedding function


- In packaging industries, where packed contents like bottles, eggs etc. can be counted.

Figure 1-2: Counting of eggs and soda bottles

- It has applications at railway stations and airports for counting persons.

Figure 1-3: Counting people at railway stations and airports



1.5 Team Organization

- Vedik Sharma:

Along with doing the preliminary investigation and understanding the limitations of the current system, I studied the topic and its scope and surveyed various research papers related to object detection and the technology to be used. I also worked on the implementation of the TensorFlow framework and the object-counting logic in the project. I worked on creating the database for storing results. Documentation is also a part of the work done by me in this project.

- Yash Sanskari:

I investigated and found the right technology and studied it in depth. For the implementation of the project, I collected the object data and trained the model on it. The implementation logic for the project objective and the coding of internal functionalities were also done by me. I also worked on the back-end design for storing results in the database for maintaining logs.

1.6 Report Structure

The project Real-Time Object Detection and Recognition is primarily concerned with image processing in real time, and the whole project report is organized into five chapters.

Chapter 1: Introduction - introduces the background of the problem, followed by the rationale for the project undertaken. The chapter describes the objectives, scope and applications of the project. Further, the chapter gives the details of the team members and their contributions to the development of the project, and ends with the report outline.

Chapter 2: Review of Literature - explores the work done in the area of the project undertaken, discusses the limitations of the existing system and highlights the issues and challenges of the project area. The chapter ends with the requirement identification for the present project work, based on findings drawn from the reviewed literature and end-user interactions.

Chapter 3: Proposed System - starts with the project proposal based on the requirements identified, followed by the benefits of the project. The chapter also illustrates the software engineering paradigm used, along with different design representations. It includes the block diagram and details of the major modules of the project, gives insights into the different types of feasibility study carried out, and finally details the deployment requirements for the developed project.

Chapter 4: Implementation - includes the details of the different technologies, techniques, tools and programming languages used in developing the project. The chapter also covers the different user interfaces designed in the project along with their functionality. Further, it discusses the experimental results along with the testing of the project. The chapter ends with an evaluation of the project on different parameters like accuracy and efficiency.

Chapter 5: Conclusion - concludes with an objective-wise analysis of the results and the limitations of the present work, followed by suggestions and recommendations for further improvement.


Chapter 2. Review of Literature

Review of Literature

Object detection and recognition have been studied for more than four decades now, and significant efforts have been made to develop representation schemes and algorithms aimed at recognizing generic objects in images taken under different imaging conditions. Within a limited scope of distinct objects, such as handwritten digits, fingerprints, faces, and road signs, substantial success has been achieved. Object recognition is also related to content-based image retrieval and multimedia indexing, as a number of generic objects can be recognized. In addition, significant progress towards object categorization from images has been made in recent years. Object recognition has also been studied extensively in psychology, computational neuroscience and cognitive science.

2.1 Preliminary Investigation

2.1.1 Current System

- The current system for fulfilling the need is to have a single dedicated person who manually does all the work of counting the students and vehicles and keeping away the animals - work that this project performs through automation.

- A CCTV camera at the entrance of the college campus has no real-time significance; merely having a camera cannot solve the problem of counting objects in real time.


2.2 Limitations of Current System

The limitations of the current system are as follows:

- On the economic front, it is not feasible to have a person who only stands at the gate and counts the number of students entering the college, or to hire personnel just to check whether dogs or other animals are entering the building.

- A lot of manpower is wasted in doing the work manually.

- It is not possible for a person to stand for 24 hours and keep a record in a file, but our system can be used 24x7.

- With a plain CCTV camera, students or persons cannot be counted, but in our system they can be counted in real time.

2.3 Requirement Identification and Analysis for Project

Significant work has been done in the field of object detection and recognition; however, it is not easy to achieve the desired results. The review of literature leads to certain major findings, which are as under:

- The study brought out that moving targets are extracted from video streams and classified according to predefined categories. Objects are detected using the pixel-wise difference between consecutive frames and are categorized mainly as humans, vehicles and background clutter. They are then tracked by template matching. Two key elements make the system robust: a classification system based on temporal consistency and a tracking system based on a combination of temporal differencing and correlation matching. [1]

- The paper presents a new approach to the tracking of non-rigid objects based on features like color and texture, which is appropriate for a variety of objects with different color patterns. Mean-shift iterations are employed to find the target candidate most similar to the target model, and this similarity is based on a metric which, in turn, is based on the Bhattacharyya coefficient. [2]

- The paper focused on the problem of learning to detect objects from a small training database. Performance depends crucially on the features used to represent the objects. Specifically, it shows that using local edge orientation histograms (EOH) as features can significantly improve performance compared to the standard linear features used in existing systems. For frontal faces, local orientation histograms enable state-of-the-art performance using only a few hundred training examples. For profile-view faces, they enable learning a system that seems to outperform the state of the art in real-time systems even with a small number of training examples. [3]

- The paper advocated the use of randomized trees as the classification technique. It is both fast enough for real-time performance and more robust. It also gives a principled way not only to match keypoints but also to select, during a training phase, those that are the most recognizable. This results in a real-time system able to detect and position in 3D planar, non-planar, and even deformable objects. It is robust to illumination changes, scale changes and occlusions. [4]

- The work presents a real-time system for multiple-object tracking in dynamic scenes. A unique characteristic of the system is its ability to cope with long-duration and complete occlusion without prior knowledge about the shape or motion of objects. The system produces good segmentation and tracking results at a frame rate of 15-20 fps for an image size of 320x240, as demonstrated by extensive experiments on video sequences under different indoor and outdoor conditions, with long-duration and complete occlusions against changing backgrounds. [5]
- The work presented a real-time robust human detection and tracking system for video surveillance which can be used in varying environments. The system consists of human detection, human tracking and false-object detection. The human detection utilizes background subtraction to segment the blob and uses a codebook to classify human beings from other objects; an optimal design algorithm for the codebook is proposed. Tracking is performed at two levels: human classification and individual tracking. The color histogram of the human body is used as the appearance model to track individuals. In order to reduce false alarms, algorithms for false-object detection are also provided. [6]

- The paper advocates a general trainable framework for object detection in static images of cluttered scenes. The detection technique developed is based on a wavelet representation of an object class, derived from a statistical analysis of the class instances. By learning an object class in terms of a subset of an overcomplete dictionary of wavelet basis functions, a compact representation of the object class is derived, which is used as the input to a support vector machine classifier. This representation overcomes the problem of in-class variability and provides a low false-detection rate in unconstrained environments. [7]

- The paper introduced a new real-time traffic light recognition system for on-vehicle camera applications. The approach is mainly based on a spot detection algorithm, and is therefore able to detect lights from a large distance, with the main advantage of being not very sensitive to motion blur and illumination variations. The detected spots, together with other shape analysis, form strong hypotheses that are fed to an Adaptive Template Matcher. A high rate of correctly recognized traffic lights and very few false alarms were observed. Processing is performed in real time on 640x480 images using a 2.9 GHz single-core desktop computer. [8]

- In this paper, a real-time vision system is described that can recognize 100 complex three-dimensional objects. In contrast to traditional strategies that rely on object geometry and local image features, the present system is founded on the concept of appearance matching. Appearance manifolds of the 100 objects were automatically learned using a computer-controlled turntable. A recognition loop was implemented that performs scene change detection, image segmentation, region normalization, and appearance matching in less than 1 second. [9]

- A novel algorithm for the detection of certain types of unusual events was presented in this paper. The algorithm is based on multiple local monitors which collect low-level statistics. Each local monitor produces an alert if its current measurement is unusual, and these alerts are integrated into a final decision regarding the existence of an unusual event. Since the algorithm is not based on objects' tracks, it is robust and works well in crowded scenes where tracking-based algorithms are likely to fail. [10]

- The paper focused on a computer-vision-based system for real-time robust traffic sign detection, tracking, and recognition. The proposed approach consists of two components. First, signs are detected using a set of Haar wavelet features obtained from AdaBoost training. Compared to previously published approaches, the solution offers a generic, joint modeling of color and shape information without the need to tune free parameters. Once detected, objects are tracked within a temporal information propagation framework. Second, classification is performed using Bayesian generative modeling. [11]

- In the work surveyed, a real-time vision system was developed that analyzes color videos taken from a forward-looking video camera in a car driving on a highway. The system uses a combination of color, edge, and motion information to recognize and track the road boundaries, lane markings and other vehicles on the road. Cars are recognized by matching templates that are cropped from the input data online, by detecting highway scene features and evaluating how they relate to each other through temporal differencing, and by tracking motion parameters that are typical of cars. Experimental results demonstrate robust, real-time car detection and tracking over thousands of image frames. [12]

- The paper describes a two-level approach to solving the problem of real-time vision-based hand gesture classification. The lower level of the approach implements posture recognition with Haar-like features and the AdaBoost learning algorithm, with which real-time performance and high recognition accuracy can be obtained. The higher level implements linguistic hand gesture recognition using a context-free grammar-based syntactic analysis. Given an input gesture, based on the extracted postures, composite gestures can be parsed and recognized with a set of primitives and production rules. [13]


- The paper brings in new techniques to detect and analyze periodic motion as seen from both a static and a moving camera. By tracking objects of interest, it computes an object's self-similarity as it evolves over time. For periodic motion, the self-similarity measure is also periodic, and Time-Frequency analysis is applied to detect and characterize the periodic motion. The periodicity is also analyzed robustly using the 2D lattice structures inherent in similarity matrices. A real-time system has been implemented to track and classify objects using periodicity. Examples of object classification (people, running dogs, vehicles), person counting, and non-stationary periodicity are provided. [14]

- The paper noted that camera-based systems are routinely used for monitoring highway traffic, supplementing the inductive loops and microwave sensors employed for counting purposes. These techniques achieve very good counting accuracy and are capable of discriminating trucks from cars. However, pedestrians and cyclists are mostly counted manually. This paper described a new camera-based automatic system that utilizes Kalman filtering for tracking and Learning Vector Quantization for classifying the observations into pedestrians and cyclists. [15]

- The paper presented a method for real-time 3D object detection that does not require a time-consuming training stage and can handle untextured objects. At its core is a novel template representation designed to be robust to small image transformations. It tests only a small subset of all possible pixel locations when parsing the image, and represents a 3D object with a limited set of templates. It showed that, together with a binary representation that makes evaluation very fast and a branch-and-bound approach to efficiently scan the image, it can detect untextured objects in complex situations and provide their 3D pose in real time. [21]
- According to the paper, robust object recognition is a crucial skill for robots operating autonomously in real-world environments. Range sensors such as LiDAR and RGB-D cameras are increasingly found in modern robotic systems, providing a rich source of 3D information that can aid in this task. However, many current systems do not fully utilize this information and have trouble efficiently dealing with large amounts of point cloud data. The paper proposes VoxNet, an architecture that tackles this problem by integrating a volumetric Occupancy Grid representation with a supervised 3D Convolutional Neural Network (3D CNN). [16]

- This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class. The learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. It quantifies how the representation affects detection performance by considering several alternate representations, including pixels and principal components. It also describes a real-time application of the person detection system as part of a driver assistance system. [17]

- The paper brings in YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. In YOLO, a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. The base YOLO model processes images in real time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects: it outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork. [18]

- The paper proposes a new method to quickly and accurately predict the 3D positions of body joints from a single depth image, using no temporal information. An object recognition approach is taken, designing an intermediate body-parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. A large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. The system runs at 200 frames per second on consumer hardware. The evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. [19]

2.3.1 Conclusion

This chapter reviews the literature surveyed during the research work and discusses the related work proposed by many researchers. Research papers related to the detection and recognition of objects from 1985 to 2015 have been covered, discussing different methods and algorithms for identifying objects.

Chapter 3. Proposed System

Proposed System

3.1 The Proposal

The proposal is to deploy a system at the entry gate which can identify various objects like persons and animals through a webcam or CCTV camera, perform appropriate operations like counting and tracking of objects in real time, and save the information about each object in the database at that very moment.

It can also count the vehicles that come inside the college, classify them as car, motorcycle, bicycle, truck etc., and store that information in the database. It can give the individual count of the different objects.

3.2 Benefits of the Proposed System

The current system had a lot of challenges that are overcome by this system:

- Economic: The proposed system is economical, as no person is required to keep watch at the entrance.

- Real-Time Observation: Unlike plain CCTV, objects are identified in real time and the results can be saved for later use.

- Manpower: It does not require any person or their efforts to stand and count the number of people.

- 24x7 Availability: The camera does not require a person to stand by it around the clock.

- Statistical Analysis: The numbers of students and vehicles can be counted individually and recorded for calculating various factors.


3.3 Block Diagram

Figure 3-1 : Block Diagram

3.4 Feasibility Study

A feasibility study is an analysis of how successfully a system can be implemented, accounting for factors that affect it, such as economic, technical and operational factors, to determine its potential positive and negative outcomes before investing a considerable amount of time and money into it.

3.4.1 Technical

Any real-time detection system needs to process images from the video. For this, the framework used must be capable of extracting objects from the images easily and accurately in real time. The framework used here is TensorFlow, designed by Google for efficiently dealing with deep learning and concepts like neural networks, making the system technically feasible.

Once set up completely, the system works automatically without needing any person to operate it. The results (count and other information) get automatically saved in the database, without requiring any manual effort.

For the system to perform well, a GPU-equipped machine with a fast processor is required.

3.4.2 Economical

Any real-time object detection system needs a high-definition camera for better and more accurate results.

Since the system is completely automated, a continuous electricity supply is needed for it to operate 24x7.

The TensorFlow framework used in the system works best on GPU-equipped machines, which are somewhat expensive.

Since the system uses high-performance processors continuously, a cooling system is required in the environment where it is deployed, to prevent failures due to very high temperatures.

3.4.3 Operational

The main goal of our system is to reduce the manual effort of counting students and vehicles by automating it. The system is able to do this accurately and efficiently, making it operationally feasible.

3.5 Design Representation

Figure 3-2: R-CNN: Regions with CNN Features

Figure 3-3: Fast R-CNN Architecture

Figure 3-4: YOLO Architecture



Figure 3-5: Faster R-CNN Architecture

Table 1: Types of Datasets



Figure 3-6: Bounding Box

3.5.1 Data Flow Diagrams

Figure 3-7: Data Flow Diagram Level 0

Figure 3-8: Data Flow Diagram Level 1


3.5.2 Database Structure

The name of the database created is "db_detect", and there is one table in the database, named "logs", for storing the records.

The "logs" table has the following structure:

Name     | Data Type | Description
---------|-----------|----------------------------------------------------------
Datetime | Timestamp | The complete date and time when the person/vehicle enters and is identified
Type     | Varchar2  | The type of object, for example Person, Car, Dog
CIF      | Number    | Count In Frame; the number of objects of this type in the frame

Table 2: Database Structure
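
The report does not reproduce the script that creates this table, and the Varchar2/Number types in Table 2 suggest an Oracle-style database. Purely as a minimal, self-contained illustration, the sketch below creates an equivalent "logs" table using Python's built-in sqlite3 module; the file name and the helper function are assumptions, not taken from the project code.

import sqlite3
from datetime import datetime

# Open (or create) the detection database; the file name is illustrative.
conn = sqlite3.connect("db_detect.sqlite")
cur = conn.cursor()

# One row per detection event, mirroring Table 2:
#   Datetime - when the person/vehicle/animal was identified
#   Type     - object class, e.g. Person, Car, Dog
#   CIF      - Count In Frame: how many such objects were in the frame
cur.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        Datetime TEXT,
        Type     TEXT,
        CIF      INTEGER
    )
""")

def log_detection(obj_type, count_in_frame):
    """Insert one detection record with the current timestamp."""
    cur.execute(
        "INSERT INTO logs (Datetime, Type, CIF) VALUES (?, ?, ?)",
        (datetime.now().isoformat(sep=" "), obj_type, count_in_frame),
    )
    conn.commit()

log_detection("Person", 3)  # e.g. three students visible in one frame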

3.6 Deployment Requirements

There are various requirements (hardware, software and services) to successfully deploy the system. These are mentioned below:

3.6.1 Hardware

- 32-bit, x86 processing system
- Windows 7 or later operating system
- High-performance computer system, with or without a GPU (a GPU is preferred for performance)
- High-definition camera


3.6.2 Software

- OpenCV
- Python and its supported libraries
- TensorFlow
- If installing TensorFlow on GPU systems:
  1. CUDA® Toolkit 9.0
  2. The NVIDIA drivers associated with CUDA Toolkit 9.0, and cuDNN v7.0
  3. GPU card with CUDA Compute Capability 3.0 or higher

Chapter 4. Implementation

Implementation

To solve the problem of manually counting the number of students and vehicles entering the college campus, the system is designed to automate the process by placing a camera at the entrance gate, so that students, bikes and cars getting inside the college campus can be identified and counted.

4.1 Technique Used

4.1.1 Deep Learning

Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.

Deep learning models are loosely related to information processing and communication patterns in a biological nervous system, such as neural coding, which attempts to define a relationship between various stimuli and the associated neuronal responses in the brain.

Figure 4-1: Deep Learning

Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics and drug design, where they have produced results comparable to, and in some cases superior to, human experts.

4.1.2 Neural Networks

In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that has been applied successfully to analyzing visual imagery.

CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as shift-invariant or space-invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation-invariance characteristics.

Figure 4-2: Neural Networks

Convolutional networks were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field. CNNs use relatively little pre-processing compared to other image classification algorithms: the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems and natural language processing.

4.2 Tools Used

4.2.1 OpenCV

OpenCV (Open Source Computer Vision Library) is released under a BSD license and is hence free for both academic and commercial use. It has C++, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.

Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art, to mine inspection, to stitching maps on the web, to advanced robotics.
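
To make OpenCV's role in the pipeline concrete, the following is a minimal sketch of real-time frame capture, assuming the default camera at device index 0 (a CCTV/IP stream URL could be passed instead); it is an illustration, not the project's actual capture code.

import cv2

# Open the default camera; a CCTV/IP stream URL could be used instead.
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()  # grab one BGR frame from the video stream
    if not ok:
        break

    # Each captured frame would be handed to the detector here
    # (see the TensorFlow sketch in Section 4.2.2).
    cv2.imshow("Real-Time Object Detection", frame)

    # Stop when the user presses 'q'.
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()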

4.2.2 TensorFlow

TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google's AI organization, it comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains.

Figure 4-3: TensorFlow Architecture



Figure 4-4: TensorFlow Working
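
The report's source code appendix covers dataset preparation; for the detection step itself, here is a hedged sketch in the TensorFlow 1.x style implied by the CUDA 9.0 requirement, loading a frozen COCO-pretrained graph and counting "person" detections in one frame. The model file name, the 50% score threshold (matching Test Case 1), and the tensor names (the standard ones exported by the TensorFlow Object Detection API) are assumptions, not taken from the project code.

import numpy as np
import tensorflow as tf

# Load a frozen, COCO-pretrained detection graph (file name is illustrative;
# any model zoo export with the standard tensors below behaves the same).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

def count_persons(sess, frame_rgb, score_threshold=0.5):
    """Run detection on one RGB frame and count 'person' boxes (COCO id 1)."""
    batch = np.expand_dims(frame_rgb, axis=0)  # the model expects a batch
    scores, classes = sess.run(
        ["detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": batch},
    )
    keep = scores[0] >= score_threshold        # drop low-confidence boxes
    return int(np.sum(classes[0][keep] == 1))  # class id 1 is "person"

# frame_rgb would come from the OpenCV loop in Section 4.2.1, converted
# from BGR to RGB (cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)) before use.
with tf.Session(graph=graph) as sess:
    pass  # call count_persons(sess, frame_rgb) for each captured frame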

4.2.3 Models

The TensorFlow official models are a collection of example models that use TensorFlow's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read.

Table 3: Types of Models

2
7
Real-Time Object Detection and Recognition

Below is a list of the models available:

- COCO: a large-scale object detection, segmentation, and captioning dataset.

- MNIST: a basic model to classify digits from the MNIST dataset.

- ResNet: a deep residual network that can be used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.

- Wide & Deep: a model that combines a wide model and a deep network to classify census income data.

The system uses the COCO dataset model, Common Objects in Context, designed by Microsoft.

COCO Model:

This dataset has several features:

- Object detection
- Recognition in context
- Superpixel stuff segmentation
- 330K images (>200K labeled)
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image
- 250,000 people with keypoints

COCO currently has three annotation types: object instances, object keypoints, and image captions. The annotations are stored using the JSON file format. All annotations share the basic data structure shown below:


Figure 4-5: Data Structure in JSON Format
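
As a small illustration of this structure, the sketch below loads a COCO instances annotation file with Python's json module and inspects its top-level sections; the file name is illustrative, not taken from the project.

import json

# Load a COCO-style annotation file; the file name is illustrative.
with open("instances_train.json") as f:
    coco = json.load(f)

# Instance annotation files share these top-level sections.
print(sorted(coco.keys()))  # annotations, categories, images, info, licenses

# Each annotation ties one object instance to an image and a category.
ann = coco["annotations"][0]
print(ann["image_id"], ann["category_id"], ann["bbox"])  # bbox = [x, y, w, h]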

The distribution of the objects in this dataset can be obtained from the COCO website. In the Explore section, it is possible to choose and combine each of the objects and observe how many images those objects appear in. The distribution of each of the objects in the training/validation set is shown in the following image:

Figure 4-6: Objects in training set



Finally, in order to better understand the dataset, the following images show some of its characteristics, comparing it with other important databases.

Figure 4-7: Instances per Category

Figure 4-8: Comparison Graphs



4.3 Language Used

The Python language is used in the system due to the following characteristics:

Simple:

Python is a simple and minimalistic language. Reading a good Python program feels almost like reading English (but very strict English!). This pseudo-code nature of Python is one of its greatest strengths. It allows you to concentrate on the solution to the problem rather than the syntax, i.e. the language itself.

Free and Open Source:

Python is an example of FLOSS (Free/Libre and Open Source Software). In simple terms, you can freely distribute copies of this software, read its source code, make changes to it, and use pieces of it in new free programs. FLOSS is based on the concept of a community which shares knowledge. This is one of the reasons why Python is so good: it has been created and improved by a community that just wants to see a better Python.

Object Oriented:

Python supports procedure-oriented programming as well as object-oriented programming. In procedure-oriented languages, the program is built around procedures or functions, which are nothing but reusable pieces of programs. In object-oriented languages, the program is built around objects, which combine data and functionality. Python has a very powerful but simple way of doing object-oriented programming, especially when compared to languages like C++ or Java.

Extensive Libraries:

The Python Standard Library is huge indeed. It can help you do various things involving regular expressions, documentation generation, unit testing, threading, databases, web browsers, CGI, FTP, email, XML, XML-RPC, HTML, WAV files, cryptography, and GUIs (graphical user interfaces) using Tk, as well as other system-dependent functionality. Remember, all this is always available wherever Python is installed. This is called the "batteries included" philosophy of Python.

4.4 Screenshots

The following are screenshots of the results of the project:

Figure 4-9: Screenshot 1

Figure 4-10: Screenshot 2

Figure 4-11: Screenshot 3


4.5 Testing

Testing is the process of evaluating a system to detect differences between the given input and the expected output, and also to assess the features of the system. Testing assesses the quality of the product and is carried out during the development process.

4.5.1 Strategy Used

Tests can be conducted based on two approaches:

- Functionality testing
- Implementation testing

The testing method used here is Black Box Testing, which is carried out to test the functionality of the program; it is also called 'behavioral' testing. The tester, in this case, has a set of input values and the respective desired results. On providing an input, if the output matches the desired result, the program is marked 'OK', and problematic otherwise.
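
Such a black-box check can be phrased directly in code: only the input frame and the expected output are known, and the detector is exercised through its public interface. The helper below is hypothetical, written only to show the comparison of actual output against expected output.

def run_black_box_case(case_id, frame, expected_count, detect_and_count):
    """Feed one input to the system and compare output with the expectation."""
    actual = detect_and_count(frame)  # system under test; internals unseen
    status = "Pass" if actual == expected_count else "Fail"
    print(f"{case_id}: expected={expected_count} actual={actual} -> {status}")
    return status == "Pass"

# e.g. run_black_box_case("TC001", frame, 3,
#                         lambda fr: count_persons(sess, fr))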

4.5.2 Test Case and Analysis

TEST CASE: 1

Test Case ID      | TC001
Test Case Summary | Checks whether the system detects the students entering the college with accuracy >= 50%.
Test Procedure    | Place and start the camera at the college entrance.
Expected Result   | The students must be detected with accuracy greater than 50%.
Actual Result     | The students are detected with accuracy greater than 50%.
Status            | Pass

Table 4: Test Case 1



TEST CASE 1 OUTPUT

Figure 4-12: Test Case 1 Output

TEST CASE: 2

Test Case ID      | TC002
Test Case Summary | Checks whether an entry for each student entering the college is stored in the database with a timestamp.
Test Procedure    | 1. Place and start the camera at the college entrance. 2. Check the database after detection.
Expected Result   | For each student, an entry must be present in the database.
Actual Result     | 3 students were detected and their entries were present in the database with timestamps.
Status            | Pass

Table 5: Test Case 2


TEST CASE 2 OUTPUT

Figure 4-13: Test Case 2 Output 1

Figure 4-14: Test Case 2 Output 2


Chapter 5. Conclusion

Conclusion

5.1 Conclusion

The aim of the project, which was to automatically detect, identify and count the number of students, vehicles and animals and save their details in the database, has been successfully and accurately achieved by this project with the use of concepts like deep learning, convolutional neural networks, OpenCV and TensorFlow.

The work done manually can now be completely replaced by this automated system, which reduces all the extra effort of maintaining the records.

5.2 Limitations of the Work

- The working of this project is a little slow, because frameworks like TensorFlow and deep learning need high-performance hardware and GPU (graphics processing unit) systems, but we are using a CPU only.

- The models that we are using for identifying the objects are pre-trained models, so if we want to train our own model, it takes a lot of time and processing.

- In the system, each frame is scanned once per second, which still needs improvement: if the objects move too fast, the system may not detect them.

5.3 Suggestion and Recommendations for Future Work

- The model would be trained to detect a larger number of objects.

- An SNS service will be integrated into this project for alert notifications when an unwanted object is detected.

- Currently, the bounding-box technique is used, which bounds the targeted object within a rectangle. In the future, segmentation will be used.

Bibliography

[1] H. Fujiyoshi, A. J. Lipton and R. S. Patil, "Moving Target Classification and Tracking from Real-time Video," in Fourth IEEE Workshop, 1998.

[2] D. Comaniciu, V. Ramesh and P. Meer, "Real-Time Tracking of Non-Rigid Objects using Mean Shift," in IEEE, 2000.

[3] K. Levi and Y. Weiss, "Learning Object Detection from a Small Number of Examples: the Importance of Good Features," in IEEE Computer Society Conference, 2004.

[4] V. Lepetit, P. Lagger and P. Fua, "Randomized Trees for Real-Time Keypoint Recognition," in IEEE Computer Society Conference, 2005.

[5] T. Yang, S. Z. Li and J. Li, "Real-time Multiple Objects Tracking with Occlusion Handling in Dynamic Scenes," in IEEE Computer Society Conference, 2005.

[6] J. Zhou and J. Hoang, "Real Time Robust Human Detection and Tracking System," in IEEE Computer Society Conference, 2005.

[7] C. P. Papageorgiou, M. Oren and T. Poggio, "A General Framework for Object Detection," Center for Biological and Computational Learning, Artificial Intelligence Laboratory, Cambridge, 2005.

[8] R. de Charette and F. Nashashibi, "Real Time Visual Traffic Lights Recognition Based on Spot Light Detection and Adaptive Traffic Lights Template," in IEEE, 2009.


[9] S. K. Nayar, S. A. Nene and H. Murase, "Real-Time 100 Object Recognition System," in IEEE International Conference, 1996.

[10] A. Adam, E. Rivlin, I. Shimshoni and D. Reinitz, "Robust Real-Time Unusual Event Detection Using Multiple Fixed-Location Monitors," in IEEE, 2008.

[11] C. Bahlmann, Y. Zhu, V. Ramesh, M. Pellkofer and T. Koehler, "A System for Traffic Sign Detection, Tracking, and Recognition Using Color, Shape, and Motion Information," in IEEE, 2005.

[12] M. Betke, E. Haritaoglu and L. S. Davis, "Real-time multiple vehicle detection and tracking from a moving vehicle," Machine Vision and Applications, p. 12, 2000.

[13] Q. Chen, N. D. Georganas and E. M. Petriu, "Real-time Vision-based Hand Gesture Recognition Using Haar-like Features," in IEEE Conference, 2007.

[14] R. Cutler and L. S. Davis, "Robust Real-Time Periodic Motion Detection, Analysis, and Applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.

[15] J. Heikkila and O. Silven, "A real-time system for monitoring of cyclists and pedestrians," Image and Vision Computing, 2004.

[16] D. Maturana and S. Scherer, "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition," in IEEE International Conference, 2015.

[17] C. Papageorgiou and T. Poggio, "A Trainable System for Object Detection," International Journal of Computer Vision, 2000.


[18] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[19] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman and A. Blake, "Real-Time Human Pose Recognition in Parts from Single Depth Images," Computer Vision and Pattern Recognition, 2011.

[20] K. P. A. Menon, "Management of Agriculture and Rural Development Programme: Problems and Prospects, India," J Extension Education, vol. 21, no. 1, pp. 94-98, 1985.

[21] S. Hinterstoisser, V. Lepetit, S. Ilic, P. Fua and N. Navab, "Dominant Orientation Templates for Real-Time Detection of Texture-Less Objects," in IEEE Conference, 2010.

Project Plan

Gantt Chart

Guide Interaction Sheet

Date       | Discussion                                                                 | Action Plan
-----------|-----------------------------------------------------------------------------|----------------------------------------------------------
04/01/2018 | Discussed the title of the project                                            | "Real-Time Object Detection and Recognition" was decided as the title.
10/01/2018 | Discussion on the technology to be used for object detection in real time    | TensorFlow, OpenCV and other tools were finalized.
14/01/2018 | Discussion on the creation of the project synopsis                            | Gathering of information for synopsis creation.
17/01/2018 | Suggestions on how to do a literature survey and preliminary investigation on the topic | Many research papers were read and understood, and their abstracts were written.
22/02/2018 | Discussion on the implementation of the project                               | Decided to implement detection using TensorFlow and other tools.
15/03/2018 | Discussion on the objective of the project (counting students at the entrance gate of the college in real time) | Decided to include the logic for counting students in the program.
26/03/2018 | Suggestion to also count vehicles like cars, bikes and buses at the college entrance | Took steps to add to and modify the program for counting vehicles as well.
10/04/2018 | For generating log files and storing the results, adding a database was advised | For each detection, an entry is made in the database so that counting is easy.
15/04/2018 | Discussion on project documentation                                           | Decided to write the content and integrate it in the proper format of the report.

Source Code
1. Create_record.py
import hashlib
import io
import logging
import os
import random
import re

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util

flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw pet dataset.')
flags.DEFINE_string('output_dir', '', 'Path to directory to output TFRecords.')
flags.DEFINE_string('label_map_path', 'data/pet_label_map.pbtxt',
                    'Path to label map proto')
FLAGS = flags.FLAGS

def get_class_name_from_filename(file_name):
  """Gets the class name from a file name.

  Args:
    file_name: The file name to get the class name from,
      e.g. "american_pit_bull_terrier_105.jpg".

  Returns:
    The class name as a string, e.g. "american_pit_bull_terrier".
  """
  match = re.match(r'([A-Za-z_]+)(_[0-9]+\.jpg)', file_name, re.I)
  return match.groups()[0]


def dict_to_tf_example(data,
                       label_map_dict,
                       image_subdirectory,
                       ignore_difficult_instances=False):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    label_map_dict: A map from string label names to integer ids.
    image_subdirectory: String specifying subdirectory within the
      Pascal dataset directory holding the actual image data.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset (default: False).

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """

  img_path = os.path.join(image_subdirectory, data['filename'])
  # Read the image as raw bytes ('rb'); the encoded JPEG is stored verbatim.
  with tf.gfile.GFile(img_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []


  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []

  for obj in data['object']:
    difficult = bool(int(obj['difficult']))
    if ignore_difficult_instances and difficult:
      continue

    difficult_obj.append(int(difficult))
    # Normalize the box coordinates to [0, 1] relative to the image size.
    xmin.append(float(obj['bndbox']['xmin']) / width)
    ymin.append(float(obj['bndbox']['ymin']) / height)
    xmax.append(float(obj['bndbox']['xmax']) / width)
    ymax.append(float(obj['bndbox']['ymax']) / height)
    class_name = get_class_name_from_filename(data['filename'])
    classes_text.append(class_name.encode('utf8'))
    classes.append(label_map_dict[class_name])
    truncated.append(int(obj['truncated']))
    poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example

def create_tf_record(output_filename,
                     label_map_dict,
                     annotations_dir,
                     image_dir,
                     examples):
  """Creates a TFRecord file from examples.

  Args:
    output_filename: Path to where output file is saved.
    label_map_dict: The label map dictionary.
    annotations_dir: Directory where annotation files are stored.
    image_dir: Directory where image files are stored.
    examples: Examples to parse and save to tf record.
  """
  writer = tf.python_io.TFRecordWriter(output_filename)

  for idx, example in enumerate(examples):
    if idx % 100 == 0:
      logging.info('On image %d of %d', idx, len(examples))
    path = os.path.join(annotations_dir, 'xmls', example + '.xml')

    if not os.path.exists(path):
      logging.warning('Could not find %s, ignoring example.', path)
      continue
    with tf.gfile.GFile(path, 'r') as fid:
      xml_str = fid.read()
    xml = etree.fromstring(xml_str)
    data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']

    tf_example = dict_to_tf_example(data, label_map_dict, image_dir)
    writer.write(tf_example.SerializeToString())
  writer.close()


# TODO: Add test for pet/PASCAL main files.
def main(_):
  data_dir = FLAGS.data_dir
  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  logging.info('Reading from Pet dataset.')
  image_dir = os.path.join(data_dir, 'images')
  annotations_dir = os.path.join(data_dir, 'annotations')
  examples_path = os.path.join(annotations_dir, 'trainval.txt')
  examples_list = dataset_util.read_examples_list(examples_path)

  # Test images are not included in the downloaded data set, so we shall
  # perform our own split.
  random.seed(42)
  random.shuffle(examples_list)
  num_examples = len(examples_list)
  num_train = int(0.7 * num_examples)
  train_examples = examples_list[:num_train]
  val_examples = examples_list[num_train:]
  logging.info('%d training and %d validation examples.',
               len(train_examples), len(val_examples))

  train_output_path = os.path.join(FLAGS.output_dir, 'pet_train.record')
  val_output_path = os.path.join(FLAGS.output_dir, 'pet_val.record')
  create_tf_record(train_output_path, label_map_dict, annotations_dir,
                   image_dir, train_examples)
  create_tf_record(val_output_path, label_map_dict, annotations_dir,
                   image_dir, val_examples)

if __name__ == '__main__':
  tf.app.run()
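
A minimal usage sketch for the script above, assuming the dataset directory contains images/ and annotations/ subfolders and an annotations/trainval.txt examples list, as main() expects (the paths shown are illustrative placeholders, not taken from the project):

python Create_record.py \
    --data_dir=/path/to/pet_dataset \
    --output_dir=/path/to/records \
    --label_map_path=data/pet_label_map.pbtxt

This writes pet_train.record and pet_val.record into the output directory, using the 70/30 split coded in main().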
2. Object_detection.py


import os
import cv2
import time
import argparse
import multiprocessing
import numpy as np
import tensorflow as tf
import datetime
import pymysql

from time import gmtime, strftime
from utils.app_utils import FPS, WebcamVideoStream
from multiprocessing import Queue, Pool
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

CWD_PATH = os.getcwd()

# Path to frozen detection graph. This is the actual model that is used for
# the object detection.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
PATH_TO_CKPT = os.path.join(CWD_PATH, 'object_detection', MODEL_NAME,
                            'frozen_inference_graph.pb')

# List of the strings that is used to add the correct label for each box.
PATH_TO_LABELS = os.path.join(CWD_PATH, 'object_detection', 'data',
                              'mscoco_label_map.pbtxt')
NUM_CLASSES = 90

# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def detect_objects(image_np, sess, detection_graph):
  # Expand dimensions since the model expects images to have shape:
  # [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')


  # Each box represents a part of the image where a particular object was
  # detected.
  boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

  # Each score represents the level of confidence for each of the objects.
  # The score is shown on the result image, together with the class label.
  scores = detection_graph.get_tensor_by_name('detection_scores:0')
  classes = detection_graph.get_tensor_by_name('detection_classes:0')
  num_detections = detection_graph.get_tensor_by_name('num_detections:0')

  # Actual detection.
  (boxes, scores, classes, num_detections) = sess.run(
      [boxes, scores, classes, num_detections],
      feed_dict={image_tensor: image_np_expanded})

  # print([category_index.get(i) for i in classes[scores > 0.5]])
  conn = pymysql.connect(host="localhost", user="root", passwd="",
                         db="DB_detect")
  mycursor = conn.cursor()
  co = 0
  for i in classes[scores > 0.5]:
    # Class ids come back as floats, so cast before indexing the label map.
    class_id = category_index[int(i)]['id']
    if class_id == 1:  # person
      co = co + 1
      print(strftime("%Y-%m-%d %H:%M:%S", gmtime()))
      print(category_index[int(i)]['name'])
      print("The count_per_Frame: %i " % co)
      # Parameterized query, so the current count value is what gets stored.
      mycursor.execute(
          "INSERT INTO logs (datetime, type, cif) VALUES (NOW(), %s, %s)",
          ('person', co))
      print("Data inserted !!")
      print("-" * 78)
    if class_id == 18:  # dog
      co = co + 1
      print(strftime("%Y-%m-%d %H:%M:%S", gmtime()))
      print(category_index[int(i)]['name'])
      print("The count_per_Frame: %i " % co)
      mycursor.execute(
          "INSERT INTO logs (datetime, type, cif) VALUES (NOW(), %s, %s)",
          ('dog', co))
      print("Data inserted !!")
      print("-" * 78)


    if class_id == 3:  # car
      co = co + 1
      print(strftime("%Y-%m-%d %H:%M:%S", gmtime()))
      print(category_index[int(i)]['name'])
      print("The count_per_Frame: %i " % co)
      mycursor.execute(
          "INSERT INTO logs (datetime, type, cif) VALUES (NOW(), %s, %s)",
          ('car', co))
      print("Data inserted !!")
      print("-" * 78)

  conn.commit()
  conn.close()
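
The script above assumes that a MySQL database named DB_detect with a logs table already exists; the report does not show its schema. The following one-off setup sketch creates a table that satisfies the INSERT statements used in detect_objects(). Only the column names datetime, type and cif are taken from the code; the auto-increment id column and the VARCHAR size are assumptions.

# Hypothetical setup script for the `logs` table written to by
# Object_detection.py. Columns datetime, type and cif come from the INSERT
# statements in the code; the id column is an assumed convenience.
import pymysql

conn = pymysql.connect(host="localhost", user="root", passwd="", db="DB_detect")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        id INT AUTO_INCREMENT PRIMARY KEY,
        datetime DATETIME NOT NULL,
        type VARCHAR(32) NOT NULL,
        cif INT NOT NULL
    )
""")
conn.commit()
conn.close()

With such a schema, daily totals per object type can be read back with a query such as SELECT type, DATE(datetime) AS day, SUM(cif) FROM logs GROUP BY type, day.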
