
DESIGN AND DEVELOPMENT OF MOBILE OBJECT DETECTION USING TENSORFLOW LITE AND DEEP LEARNING APPROACHES

PHASE I REPORT

Submitted by

D. NIVETHA (920122421006)

In partial fulfilment for the award of the degree of

MASTER OF ENGINEERING IN SOFTWARE ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BHARATH NIKETAN ENGINEERING COLLEGE, AUNDIPATTY

ANNA UNIVERSITY: CHENNAI 600 025

NOV/DEC 2023
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report "DESIGN AND DEVELOPMENT OF MOBILE OBJECT DETECTION USING TENSORFLOW LITE AND DEEP LEARNING APPROACHES" is the bonafide work of "D. NIVETHA (920122421006)", who carried out the project work under my supervision.

SIGNATURE                                 SIGNATURE
Head of the Department                    Project Supervisor
Mr. S. OYYATHEVAN, M.E.,                  Mrs. S. SUBHA, M.E.,
Department of Computer Science            Department of Computer Science
and Engineering,                          and Engineering,
Bharath Niketan Engineering College,      Bharath Niketan Engineering College,
Aundipatty-625 536.                       Aundipatty-625 536.

Submitted for the project Viva-Voce examination held on _

INTERNAL EXAMINER                         EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We thank the most gracious creator of the universe, our almighty GOD, who ideally supported us throughout this project.

At this moment of having successfully completed our project, we wish to convey our sincere thanks to the management and our Chairman Dr. S. MOHAN, who provided all the facilities to us.

We would like to express our sincere thanks to our Principal Dr. P. V. ARUL KUMAR, M.E., M.B.A., Ph.D., for letting us do our project and offering adequate time for completing it.

We are also grateful to Mr. S. OYYATHEVAN, M.E., our Head of the Department, for his constructive suggestions during our project, which we acknowledge with a deep sense of gratitude.

We extend our sincere thanks to our guide Mrs. S. SUBHA, M.E., Assistant Professor, Department of Computer Science and Engineering, who has been our lighthouse in the vast ocean of learning, for her inspiring guidance and encouragement in completing the project.

We would like to express our gratitude to all the teaching and non-teaching staff members of the Computer Science and Engineering Department and to our friends for the kind help extended to us.
ABSTRACT

Mobile object detection has become an important research area due to its various practical applications. In this paper, we propose a deep learning-based approach for mobile object detection using TensorFlow Lite. We use a deep neural network with multiple convolutional layers to extract features from the input image. The extracted features are then fed to a set of detection heads to predict the bounding boxes and corresponding class labels of the objects in the image. We use the MobileNetV2 architecture as our backbone network, and the SSD (Single Shot Detector) algorithm as our detection framework. Our approach achieves high accuracy and real-time performance on mobile devices. We evaluate our approach on several popular object detection datasets, and demonstrate that it outperforms existing state-of-the-art approaches. Our approach can be used in a wide range of applications, such as autonomous driving, surveillance, and augmented reality.
TABLE OF CONTENTS

CHAPTER    TITLE                                      PAGE NO

           ABSTRACT                                   iv
           LIST OF FIGURES                            ix
           LIST OF ABBREVIATIONS                      x

1.         INTRODUCTION                               11
           1.1 General Introduction                   11
           1.2 Project Objectives                     14
           1.3 Problem Statement                      15

2.         EXISTING SYSTEM                            16
           2.1 Disadvantages                          16

3.         PROPOSED SYSTEM                            17
           3.1 Advantages                             17

4.         SYSTEM DIAGRAMS                            19
           4.1 Architecture Diagram                   19
           4.2 Data Flow Diagram                      20
           4.3 ER Diagram                             22
           4.4 Use Case Diagram                       23
           4.5 Class Diagram                          24
           4.6 Sequence Diagram                       25
           4.7 Activity Diagram                       26

5.         LITERATURE SURVEY                          27

6.         SYSTEM IMPLEMENTATION                      42
           6.1 Modules                                42
           6.2 Modules Description                    42
           6.2.1 Examine and Understand Dataset       42
           6.2.2 Build an Input Dataset               43
           6.2.3 Build the Model                      46
           6.2.4 Train the Model                      49
           6.2.5 Test the Model                       49
           6.2.6 Integration with TensorFlow Lite     50

7.         SYSTEM SPECIFICATION                       51

           REFERENCES                                 52
LIST OF FIGURES

FIGURE NO    TITLE                                           PAGE NO

4.1          Architecture Diagram                            19
4.2          Data Flow Diagram                               20
4.3          ER Diagram                                      22
4.4          Use Case Diagram                                23
4.5          Class Diagram                                   24
4.6          Sequence Diagram                                25
4.7          Activity Diagram                                26
6.1          Build the Model for Teachable Machine           43
6.2          Training Datasets with Label                    45
6.3          Sample Input Training Model                     45
6.4          Structure of CNN                                46
6.5          Integration Architecture for TensorFlow Lite    50
LIST OF ABBREVIATIONS

SSD     Single Shot Detector
CNN     Convolutional Neural Network
ADAS    Advanced Driver Assistance Systems
FDDB    Face Detection Data Set and Benchmark
CHAPTER 1

INTRODUCTION

1.1 General Introduction

In our project we develop a model for image classification using a deep learning approach, which can automate the feature extraction process and is effective for image recognition. We evaluate the viability of using deep learning models for object detection in real-time video feeds on mobile devices, in terms of object detection performance and inference delay, as either an end-to-end system or a feature extractor. We used TensorFlow, a relatively new library from Google, to model our neural network. The TensorFlow Object Detection API is used to detect multiple objects in real-time video streams, and an algorithm can be applied to detect patterns and alert the user if an anomaly is found. Mobile object detection is the process of detecting and localizing objects of interest in images or videos using a mobile device such as a smartphone or a tablet. This technology has become increasingly important in recent years as the use of mobile devices has become more widespread and the demand for applications that can recognize and understand the visual world has grown.

Mobile object detection algorithms use a combination of machine learning and


computer vision techniques to analyze images and identify objects within them.
These algorithms can be trained on large datasets of labeled images to learn to
recognize specific objects or categories of objects. Once trained, the algorithms can
be deployed on mobile devices to analyze real-world images or video in real-time.

Mobile object detection has a wide range of applications, from augmented
reality and gaming to autonomous vehicles and security systems. By enabling
mobile devices to identify and understand their environment, object detection
technology can provide users with a more immersive and interactive experience and
help businesses and organizations improve their operations and security.

Object detection plays a vital role in computer vision and robotics. Common examples of computer vision include video surveillance, automated inspection in industries, traffic monitoring, digital libraries, and electronic gadgets such as mobiles, cameras, and tablets. Object detection is applied in robotics for navigational tasks, interactions with humans and the environment, manipulation or building of objects, etc. It is one of the primary objectives involved in building a fully autonomous system. Integrating object detection with sophisticated cameras that feature long-range zooming, night vision, etc., can give rise to hybrid systems that can be deployed for surveillance and military applications. Since most modern visual detection techniques are inspired by the human brain, they can be used in providing remedies and alternatives for vision-related impairments. They also provide tangible insight into how vision is interpreted by the brain.

Applying computer vision algorithms in each of the fields mentioned above has its own challenges, and many of these challenges remain open problems in research. In a broader perspective, object detection can be accomplished using two approaches: machine learning based and deep learning based. In the machine learning-based approach, computer vision features are used to identify a group of pixels. The features used are primary properties such as edges, shapes, texture and colour. The features are fed to a regression-based algorithm which returns the label and location of the target object. In the deep learning-based approaches, Convolutional Neural Networks (CNN) are employed. The usage of CNN results in end-to-end object detection without the need for hand-crafted feature extraction. In this thesis, both these approaches have been deployed for object detection and recognition. Advanced Driver Assistance Systems (ADAS) are a popular disruptive technology in the automobile industry. ADAS needs software, hardware and technologies like RADAR, LIDAR, vision and image processing systems, together with artificial intelligence, to assist the driver towards a safe driving experience. In this thesis, ADAS is given only as a use case for the problems solved, and this does not include any real-time image acquisition. However, these algorithms can be successfully deployed in real-time vision systems such as ADAS.

TensorFlow Lite is a lightweight version of the TensorFlow framework


specifically designed for running machine learning models on mobile and
embedded devices. It allows developers to deploy machine learning models on
smartphones, tablets, and other mobile devices with low computational resources
and limited memory. Mobile object detection using TensorFlow Lite involves
converting a pre-trained object detection model into a format that can be executed
on mobile devices. This involves a process called model optimization, which
includes techniques such as quantization, pruning, and weight compression to
reduce the size of the model and improve its efficiency. Once the model is
optimized, it can be integrated into a mobile application using the TensorFlow Lite
API. This allows developers to create custom object detection applications that can
be run on mobile devices in real-time. There are several benefits to using
TensorFlow Lite for mobile object detection. First, it allows developers to leverage
the power of machine learning for object detection without requiring high-end
hardware. Second, it provides a flexible and customizable platform for building
object detection applications that can be tailored to specific use cases. Finally, it
enables real-time object detection on mobile devices, which can be useful for
applications such as augmented reality and mobile robotics.
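To make this workflow concrete, the following is a minimal sketch of the conversion step in Python, assuming a trained Keras model saved as detector.h5 (a hypothetical file name). Setting tf.lite.Optimize.DEFAULT enables post-training dynamic-range quantization, one of the optimization techniques mentioned above; pruning and weight compression would be applied separately before or during training.

```python
import tensorflow as tf

# Load a trained Keras model (hypothetical file name).
model = tf.keras.models.load_model("detector.h5")

# Convert to TensorFlow Lite with post-training quantization,
# shrinking the model to fit mobile memory and compute budgets.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the flatbuffer that the mobile application will bundle.
with open("detector.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be loaded on the device through the TensorFlow Lite Interpreter, as sketched in Section 6.2.6.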

1.2 Project Objectives

 The fundamental task of image classification is to make sure all the images are categorized according to their respective categories or groups. Classification is easy for humans, but it has proved to be a major problem for machines.
 Classification involves unidentified patterns, compared with detecting a single object, since each image must be assigned to its proper category. Image classification technology is used in various applications such as vehicle navigation, robot navigation and remote sensing.
 In this project we have solved an image recognition problem, where our goal is to tell which class the input image belongs to. We achieve this by training an artificial neural network on 1,00,000 images of airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck, and making the Convolutional Neural Network (CNN) learn to predict which class an image belongs to the next time it sees an image containing, say, a cat or a dog.

1.3 Problem Statement

This application needs to stay connected online throughout the image classification process. Computational cost is a concern during the recall (inference) phase of such a model, as it should be capable of running on a mobile device with limited computational power. As multiple objects might exist in a single frame, the frame must be divided into a grid of multiple cells, where each cell is analyzed independently, which inherently increases the computational cost. So we have implemented a project for object detection in real-time video feeds on mobile devices in an effective way. We have also aimed to achieve good accuracy in identifying the objects present in the images.

CHAPTER 2

EXISTING SYSTEM


The existing image classification system is based on the structure of a Convolutional Neural Network (CNN). The training was performed such that a balanced number of face images and non-face images were used, by deriving additional face images from the face image data. The image classification system employs the bi-scale CNN with 120 trained data, and the auto-stage training achieves an 81.6% detection rate with only six false positives on the Face Detection Data Set and Benchmark (FDDB), where the current state of the art achieves about an 80% detection rate with 50 false positives. Another work proposed fast image classification by boosting fuzzy classifiers. It offered a simple way to differentiate between known and unknown categories. This method simply boosts meta-knowledge where local characteristics can mostly be found. It was tested on large image datasets and compared with the bag-of-features image model. The result was much better classification accuracy, with a testing phase about 30% shorter than the previous approach.

2.1 Disadvantages

 Low Accuracy

 Required High Memory Space

CHAPTER 3

PROPOSED SYSTEM

There are many challenges in identifying or detecting the objects present in an image, and detecting objects in images from a large image dataset is a difficult process. So we have implemented a project for image recognition in deep learning using the convolutional neural network algorithm. The main objective is to recognize the objects present in the image dataset in an effective way. We have also aimed to achieve good accuracy in identifying the objects present in the images. The fundamental task of image classification is to make sure all the images are categorized according to their respective categories or groups. Classification is easy for humans, but it has proved to be a major problem for machines. It involves unidentified patterns, compared with detecting a single object, since each image must be assigned to its proper category. Image classification technology is used in various applications such as vehicle navigation, robot navigation and remote sensing.

3.1 Advantages

There are several advantages to using TensorFlow Lite and deep learning approaches for mobile object detection:

 High accuracy:

Deep learning models are capable of achieving high accuracy in object detection
tasks. By using TensorFlow Lite, we can deploy these models on mobile devices
without compromising on accuracy.

 Real-time performance:

TensorFlow Lite is optimized for mobile devices, enabling deep learning models
to run in real-time on mobile devices. This is particularly useful in applications such
as autonomous driving and surveillance, where real-time object detection is crucial.

 Efficient memory and power consumption:

Mobile devices have limited memory and battery life. TensorFlow Lite models
are designed to be lightweight and efficient, minimizing memory and power
consumption on mobile devices.

 Customizable:

TensorFlow Lite and deep learning models allow for customization and fine-tuning to specific use cases. This means that we can train models to detect
specific objects, such as faces or vehicles, and optimize them for specific
applications.

 Portability:

TensorFlow Lite models can be easily deployed on a wide range of mobile


devices, making them a versatile solution for object detection in different
environments.

CHAPTER 4

SYSTEM DIAGRAMS

4.1 Architecture Diagram

FIGURE 4.1 Architecture Diagram
4.2 Data Flow Diagram

A data flow diagram (DFD) is a common method for illustrating how information moves throughout a system. A good deal of the system requirements can be graphically depicted in a clean and clear DFD. A system may be manual, automated, or a hybrid of the two. The DFD demonstrates how information enters and exits the system, what modifies the information, and where information is stored. A DFD's main function is to outline the scope and bounds of a system as a whole. It may be utilised as a tool for communication between a systems analyst and any individual who plays a part in the system, and it serves as the foundation for redesigning a system.

FIGURE 4.2 Data Flow Diagram
4.3 ER Diagram

An entity-relationship model (ER model) employs a diagram referred to as an entity-relationship diagram (ER diagram) to explain the organisation of a database. A database can later be built using the ER model as a blueprint. The two primary components of the E-R model are the entity set and the relationship set. The relationships between entity sets are shown in an ER diagram. An entity set is a group of connected entities, each of which may have attributes. In a DBMS, an entity is a table or an attribute of a table, so the ER diagram shows the relationships between tables and their attributes to depict the entire logical structure of a database. To better understand this idea, let's look at a straightforward ER diagram.

FIGURE 4.3 ER Diagram
4.4 Use Case Diagram

FIGURE 4.4 Use Case Diagram

4.5 Class Diagram

FIGURE 4.5 Class Diagram

4.6 Sequence Diagram

FIGURE 4.6 Sequence Diagram

4.7 Activity Diagram

FIGURE 4.7 Activity Diagram
CHAPTER 5

LITERATURE SURVEY

This chapter reviews the relevant literature and several strands of research in the area of object detection using deep learning. There are many papers in the literature related to object detection techniques, such as the Scale-Invariant Feature Transform, Speeded-Up Robust Features and Convolutional Neural Networks, out of which the study in this thesis is confined to Convolutional Neural Networks. Object detection is used in several walks of life, such as security, the military, medical imaging, biometric recognition, iris recognition, natural language processing, video analysis and weather forecasting. Deep learning models are mainly used for object detection algorithms due to their accurate image recognition capability. In various fields there is a necessity to detect the target object and also track it effectively while handling occlusions and other complexities. Many researchers have attempted various approaches to object tracking; the nature of the techniques largely depends on the application domain. Some of the research works which led to the evolution of the proposed work in the field of object tracking are described below. The following is a list of several papers that emphasize the research works carried out on object detection using deep learning.

[1] Lichao Mou et al. proposed a novel method for vehicle instance segmentation, which requires identifying, at the pixel level, where vehicles appear and associating each pixel with a physical instance of a vehicle. This work developed a unified multitask learning network that learns two complementary tasks: segmenting vehicle regions and detecting semantic boundaries. Also, a new dataset for vehicle instance segmentation, namely the Busy Parking Lot Unmanned Aerial Vehicle Video dataset, was developed for future research purposes.

[2] Waqas Hassan et al. proposed a pixel classification method which uses a segmentation technique to identify stationary objects in the image. These objects are then tracked using a new technique called adaptive edge orientation. This technique detects objects with an accuracy of 95%, which is an improvement over state-of-the-art methods.

[3] Richard J. Radke et al. surveyed various Image change detection algorithms and
discussed the principles for comparing performance of change detection algorithms.

[4] Hao Long et al. demonstrated object detection in aerial images, obtaining better detection performance by designing a novel deep neural network framework called Feature Fusion Deep Networks (FFDN). The novel architecture provides powerful hierarchical representation and also strengthens the spatial relationship between high-density objects. The proposed FFDN model achieves good improvement on benchmark datasets like UAV123 and UAVDT. The advantage of the proposed method is that objects which appear small, partially occluded or out of view, as well as objects against dark backgrounds, can also be detected accurately.

[5] Ross Girshick et al. proposed a scalable object detection algorithm which improves the mean Average Precision (mAP) by more than 30% relative to the previous best result on the VOC 2012 dataset, achieving a mAP of 53.3%. The proposed method is a culmination of two ideas: the usage of high-capacity CNNs on bottom-up region proposals, and supervised pre-training for an auxiliary task when labelled training data are scarce, followed by domain-specific fine-tuning, which boosts the performance. Hoo-Chang Shin et al. discussed the reasons for using CNNs in computer-aided detection problems. This work concentrates on two computer-aided detection problems: thoraco-abdominal Lymph Node (LN) detection and Interstitial Lung Disease (ILD) classification. Xuesong Zhang et al. described a method of combining a set of fine-tuned CNN models for multi-class object category recognition tasks. In this method the fine-tuned CNN models are trained on the target dataset and the last fully connected layers are changed according to the targeted task. The results obtained from the proposed method are good enough for detecting objects.

[6] Zhong-Qiu Zhao et al. discussed various deep learning-based object detection frameworks. This work focuses on refinements that can be applied to object detection architectures to improve their performance further. It also gives some promising directions for designing a better object detector.

[7] Ross Girshick et al. proposed a Fast Region-based Convolutional Neural Network (Fast R-CNN) for object detection. This method became popular because of its speed when compared to the state-of-the-art techniques. Fast R-CNN trains the VGG-16 network three times faster than SPPnet and tests it ten times faster. Olaf Ronneberger et al. presented a training strategy that uses data augmentation to make extensive use of the available images. This strategy gave the best results even with few training images and outperformed the preceding best method on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

[8] Hasbi Ash Shiddieqy, Farkhad Ihsan Hariadi and Trio Adiono, "Implementation of Deep-Learning based Image Classification on Single Board Computer". In this paper, a deep learning algorithm based on a convolutional neural network is implemented using Python and TFLearn for image classification, in which two different CNN structures are used, namely with two and five layers. It concludes that the CNN with more layers performs the classification process with much higher accuracy.
[9] Rui Wang, Wei Li, Runnan Qin and JinZhong Wu, "Blur Image Classification based on Deep Learning". In this paper, a convolutional neural network (CNN) called Simplified-Fast-Alexnet (SFA), based on learned features, is proposed for handling the classification of four blur types: defocus blur, Gaussian blur, haze blur and motion blur. The experimental results demonstrate that the classification accuracy of SFA, which is 96.99% for the simulated blur dataset and 92.75% for the natural blur dataset, is equivalent to AlexNet and superior to other classification methods.

[10] Sameer Khan and Suet-Peng Yong, "A Deep Learning Architecture for Classifying Medical Images of Anatomy Objects". In this paper, a modified CNN architecture that combines multiple convolution and pooling layers for higher-level feature learning is proposed. Medical image anatomy classification is carried out, and the results show that the proposed CNN feature representation outperforms three baseline architectures for classifying medical image anatomies.

[11] Ye Tao, Ming Zhang and Mark Parsons, "Deep Learning in Photovoltaic Penetration Classification". This paper proposed a deep learning based algorithm to differentiate photovoltaic events from other grid events, and it concludes that a deep convolutional neural network can achieve higher classification accuracy than a fully connected model.

[12] C. A. Ronao and S.-B. Cho, "Human activity recognition with smartphone sensors using deep learning neural networks," Expert Syst. Appl., vol. 59, pp. 235–244, Oct. 2016. Convolutional networks, comprised of one or more convolutional and pooling layers followed by one or more fully connected layers, have gained popularity due to their ability to learn unique representations from images or speech, capturing local dependency and distortion invariance.
[13] Hernández-Serna, A.; Jimenez, L. "Automatic identification of species with neural networks." 2014, 11,563. This paper reveals that, as tasks get more complicated than ever before, the volume of datasets has increased to a size which cannot be handled by traditional machine learning methods; as a result, the role of artificial neural networks is revealed. The authors developed a neural network structure intended to classify a dataset including 740 species and 11,198 samples. They achieved 91.65% accuracy with fish, and 92.87% and 93.25% with plants and butterflies, respectively.

[14] Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. In the traditional machine learning area, SVM is one of the most effective methods for addressing various kinds of problems such as classification and regression. Recently, the Principal Component Analysis (PCA) algorithm combined with SVM has also become prevalent, since this method can enhance the efficiency of training the model to a large extent.

[15] Alsing, O. Mobile Object Detection using TensorFlow Lite and Transfer Learning (Dissertation), 2018. This work built real-time CNN networks based on transfer learning to detect regions containing Post-it notes, and transformed these models into a format usable on smartphones with TensorFlow Lite tools. A Faster Region CNN (RCNN) ResNet50 architecture achieved the highest mAP (mean Average Precision) of 99.33%, but the inference time was 20018 milliseconds (ms). The difference in accuracy between traditional machine learning, deep learning and transfer learning methodologies for butterfly detection has not been the focus of any research so far, and a mobile application to detect butterflies has never been developed. Therefore, in this paper, after optimizing certain parameters, not only did we improve the accuracy over previous studies with deep learning models, but we also created a practical Android application with a relatively short inference time.
[16] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). XNOR-Net: ImageNet classification using binary convolutional neural networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). This work describes an image classification system based on the structure of a Convolutional Neural Network (CNN). The training was performed such that a balanced number of face images and non-face images were used, by deriving additional face images from the face image data. The image classification system employs the bi-scale CNN with 120 trained data, and the auto-stage training achieves an 81.6% detection rate with only six false positives on the Face Detection Data Set and Benchmark (FDDB), where the current state of the art achieves about an 80% detection rate with 50 false positives.

[17] Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). DRAW: A Recurrent Neural Network For Image Generation. This paper studied a neural network architecture as a method for image classification. The framework combines a spatial attention mechanism that mimics the foveation of the human eye with a sequential variational auto-encoding framework. It involves many complex images, but over the course of the study the system slowly improves the MNIST models. MNIST is an open-source database used as the training set. The system was also tested on the Street View House Numbers dataset, where the results improved to the point that even the human eye cannot distinguish the generated images from real ones.

[18] Kamavisdar, P., Saluja, S., & Agrawal, S. (2013). A survey on image classification approaches and techniques. International Journal of Advanced Research in Computer and Communication Engineering, 2(1), 1005–1009. This paper used the Decision Tree (DT) as a technique for image classification. The DT has multiple datasets located under each hierarchical classifier, which must be processed in order to calculate membership for each of the classes. The classifier allows some rejection of a class at the intermediary stages. This method requires three parts: the first is finding the terminal nodes, the second is the placement of classes within them, and the third is the partitioning of the nodes. This method is considered very simple and highly efficient.

[19] "MobileNetV2: Inverted Residuals and Linear Bottlenecks" by Mark Sandler,


Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. This
paper proposes a novel mobile architecture for object detection, called
MobileNetV2. The architecture uses inverted residuals and linear bottlenecks to
improve accuracy and efficiency.

[20] "SSD: Single Shot MultiBox Detector" by Wei Liu, Dragomir Anguelov,
Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C.
Berg. This paper presents a new object detection framework, SSD, that is optimized
for mobile devices. The framework achieves high accuracy with low computational
cost.

[21] "MobileNet: Efficient Convolutional Neural Networks for Mobile Vision


Applications" by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry
Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig
Adam. This paper proposes a mobile architecture for object detection that uses
lightweight convolutional neural networks (CNNs) to achieve high accuracy while
minimizing computational cost.

[22] "YOLOv3: An Incremental Improvement" by Joseph Redmon and Ali Farhadi.
This paper introduces YOLOv3, the latest version of the YOLO (You Only Look
Once) object detection algorithm. The algorithm is optimized for mobile devices
and achieves state-of-the-art performance in terms of speed and accuracy.

[23] "EfficientDet: Scalable and Efficient Object Detection" by Mingxing Tan,


Ruoming Pang, and Quoc V. Le. This paper proposes a family of scalable and
efficient object detection models called EfficientDet. The models achieve high
accuracy while being computationally efficient and are designed for deployment on
mobile devices.

[24] "MobileNets for Object Detection: A Practical Guide" by Andrew G. Howard,


Menglong Zhu, Bo Chen, Dmitry Kalenichenko, and Hartwig Adam. This paper
provides a practical guide for using MobileNets, a family of lightweight CNNs
designed for mobile devices, for object detection. The guide includes
implementation details and performance benchmarks.

[25] "SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural


Networks for Real-Time Object Detection for Autonomous Driving" by Bichen Wu,
Alvin Wan, Xiangyu Yue, Peter Jin, Sicheng Zhao, Noah Golmant, Amir
Gholaminejad, Joseph Gonzalez, and Kurt Keutzer. This paper proposes
SqueezeDet, a real-time object detection algorithm optimized for autonomous
driving applications. The algorithm achieves high accuracy with low computational
cost and is designed to run on low-power mobile hardware.

[26] An improved Gabor Wavelet Transform based algorithm has been proposed by Wang et al. (2019) which employs a two-dimensional Discrete Cosine Transform (DCT). This method focuses on the DC part of the image, reducing the correlation and helping to extract the features, after which the Gabor Wavelet Transform is applied. The use of 2D-DCT increases the recognition rate by removing unnecessary parts of the image that carry no information, thus reducing the recognition time. From these extracted features, the images are classified using a nearest neighbour classifier with Euclidean distance. This helps in reducing the computation time and improves the recognition rate.

[27] Facial features are extracted using a Gabor wavelet based on the contour feature line (Zhang et al. 2019). This forms the Gabor face feature matrix, from which redundant information is removed using uniform sampling. Dimensionality reduction is done using Principal Component Analysis, which calculates the covariance and characteristic matrices. Finally, classification is done using a Support Vector Machine. Face recognition using the 2D Gabor Wavelet Transform and Local Binary Patterns (LBP) is proposed by Zhang et al. (2019).

[28] Initially, the dimensions of the input image are reduced using Principal Component Analysis (PCA) and then LBP is applied. LBP describes an image's texture features and each pixel's neighbourhood characteristics using binary encoding of the grey-level differences between the centre pixel and its neighbours. This gives the pattern representation of the image and reduces the impact of factors such as illumination. It is shown that the proposed idea of implementing LBP and Gabor together has a better recognition rate compared to applying each technique separately. The problem of high feature dimensionality is addressed by applying the Ant Colony Optimization (ACO) algorithm to select relevant features, as discussed by Aro et al. (2018).

35
[29] The images are pre-processed to ensure proper recognition using geometrical and illumination normalization. After pre-processing is done, the face image is convolved with a Gabor function and the magnitudes of the Gabor responses are concatenated into the face image. These extracted Gabor features are optimized using ACO. A feature extraction method combining the Gabor wavelet and uniform-pattern LBP has been proposed by Wang et al. (2017) for improving the accuracy of human face feature recognition and the flexibility of practical operation. The output of the Gabor filter is coded using a uniform LBP. In a uniform LBP, the binary values do not have excessive jumps. Depending on the binary values, the uniform pattern can be detected using the number of binary digits and the number of binary sequences. Based on the literature survey, it has been observed that face recognition invariant to pose and illumination is still an open problem to be addressed. In this work, an algorithm has been proposed for the same.

[30] Krishneel Chaudhary et al. presented a method that allows robots to autonomously interpret objects from observations of human-object interactions. The proposed method takes RGB-D data of a known object template and a search space as input, segments the objects using a class-agnostic deep comparison and segmentation network, and finally outputs a pixel-wise label and objectness score for the object. The advantage of this approach is that it can handle highly deformed objects.

[31] Zhong-Qiu Zhao et al. surveyed various Deep Learning based Object
Detection frameworks. This work made some modifications to the existing Object
Detection architectures to improve the detection accuracy. Also, this work
formulated some promising guidelines for future work in Object Detection and
Neural Network based learning systems.

[32] Wei Liu et al. proposed a method called SSD which discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature-map location. The network outputs the score for the presence of an object in each box and produces adjustments to the box to better match the object shape. Evaluated on the PASCAL VOC, COCO and ILSVRC datasets, SSD achieved 74.3% mAP on the VOC 2007 test set, outperforming the state-of-the-art Faster R-CNN model.

[33] Xingyi Zhou et al. developed a detector which uses keypoint estimation to find centre points and regresses all other object properties, such as size, orientation, 3D location and pose. CenterNet achieves the best speed-accuracy trade-off on the MS-COCO dataset. The same approach is used to estimate 3D bounding boxes on the KITTI dataset and human pose on the COCO keypoint dataset. This method performed well on all the considered datasets.

[34] Yushan Yu et al. designed a novel deep neural network for domain-adaptive object detection, further improving the localization accuracy of objects. The following sequence of steps is followed for experimentation. Step 1: refine the pseudo-labels generated by current object detection methods and use these labels with a weighted loss function to train the network on the target domain. Step 2: insert residual blocks into the shallow layers of the CNN in the target domain to augment the spatial information that helps with object localization. This network was tested on three publicly available benchmark datasets, and the experimental results show that the Domain Adaptive Localization Network (DALocNet) outperformed the state-of-the-art methods on all three datasets. Bernardo Augusto Godinho de Oliveira et al. observed that object detection methods require either high processing power or large storage availability, making it hard for resource-constrained devices to perform detection in real time without a connection to a powerful server. This work then designed a model that requires only 95 megabytes of storage and took only milliseconds on average per image running on a laptop CPU, making it suitable for standalone devices.

[35] Bin Xue et al. proposed a fast semi-supervised method, called DIOD, which is based on Fully Convolutional Region Candidate Networks (FCRCNs) and deep convolutional neural networks for effective object localization in minimal time. The FCRCN uses "seed" boxes at multiple scales and aspect ratios for object localization. The proposed model is passed through a semi-supervised pre-training process followed by fine-tuning, which produces accurate results and overcomes the lack of labelled training data. Finally, to further improve the accuracy and speed of the detection system, a novel sharing mechanism is designed which uses a joint learning strategy that extracts more discriminative and comprehensive features while simultaneously learning the latent shared and individual features and their correlations. Experiments were conducted on two real-world ISAR datasets, and the results demonstrate that DIOD outperforms the state-of-the-art methods.

[36] Alex Krizhevsky et al. trained a deep convolutional neural network with 1.2 million high-resolution images belonging to 1000 classes. This network achieved top-1 and top-5 error rates of 37.5% and 17.0% on the test data, which is better than the previous state-of-the-art techniques. Also, a variant of this model was entered in the ILSVRC-2012 competition and became the winner with a top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
[37] Tie Liu et al. gave a new approach for detecting objects by template matching from a large database collection. The approach was suitable for multi-scale contrast backgrounds and used a colour-spatial method to detect objects. This approach failed to detect multiple objects in a given user scenario and also failed when the objects were in non-linear motion. This problem of failed detections was effectively overcome in the proposed system by training the system to identify the objects through an effective system learning technique; objects in non-linear motion are tracked using the proposed particle grouping approach.

[38] Gong Cheng et al. proposed a novel approach to learning a Rotation-Invariant CNN (RICNN) model for advancing the performance of object detection in VHR optical remote sensing images. This is achieved by introducing a new rotation-invariant layer. The RICNN model is trained by optimizing a new objective function which explicitly enforces that the feature representations of the training samples before and after rotation are mapped close to each other, hence achieving rotation invariance. The proposed work was tested on a publicly available ten-class object detection dataset and gave satisfactory results.

[39] Wendi Cai et al. discussed the increase in road accidents and tried to find a solution for the same. With the advent of self-driving cars, automatic detection of street objects became essential. Generic model detection algorithms based on Convolutional Neural Networks (CNN) need a purpose-designed training model, and training and testing such a model takes a lot of time. Transfer learning is used to fine-tune the pre-trained models: using the COCO image datasets, a generic deep learning model is transferred to a specific one with different weights and outputs. Furthermore, the CNN structure is adjusted to improve overall performance, and the model is trained on the special scene of the street environment.

[40] Chenglin Liu et al. proposed a cascade method for lung nodule detection using CNN. This method uses a transfer learning technique to train the object detection network on the considered dataset of 2954 chest X-ray images. The results from the proposed method are effective in reducing false positives.

[41] Ruhan Sa et al. used a deep learning based object detection method for identifying landmark points in lateral lumbar X-ray images. In this work, fine-tuning of Faster R-CNN using small annotated clinical datasets was performed. In this experiment only 81 lateral lumbar X-ray images were taken for training, and the method could achieve better performance than the traditional sliding-window detection method. The number of training and testing images was then increased to 974 and 108, fine-tuning was applied, and the network achieved an average precision of 0.905 with an average computation time of 3 s/image, which greatly outperformed the state-of-the-art methods.

[42] Wenhui Diao et al. proposed an efficient coarse object locating method for remote sensing images using Deep Belief Networks (DBN). DBNs are trained on remote sensing images for feature extraction and classification. The feature learning of the DBN is performed by pre-training each layer of Restricted Boltzmann Machines (RBMs) using the general layer-wise training algorithm. This makes an RBM generate edge filters which help in identifying edges more precisely. The precise edge position information and pixel value information are more efficient for building a good model of images.

[43] Jayraj Bandariya et al. developed an object detection system to assist totally blind individuals in managing their activities without anybody's help. This work also compares different object detection algorithms like Haar Cascade and Convolutional Neural Networks (CNN). The Haar Cascade classifier is basically used for face detection but can also be trained to detect other objects, whereas Convolutional Neural Networks fall under deep learning approaches, which are mainly used for object recognition. A dataset was created with 2300 images belonging to 3 different classes. This comparison was executed to establish CNN as a suitable algorithm for this system from the aspect of accuracy.

[44] Chun Zhan et al. proposed an image recognition technique using Optical Wavelets and a Support Vector Machine (OWSVM). Optical wavelets are used for feature extraction, and the support vector machine is used for creating the image recognition model. In this study, 80 images belonging to 9 classes were taken to test the effectiveness of the proposed method. The image recognition accuracy of the proposed method is 96.25%, which is an improvement over the state-of-the-art Optical Wavelet Back Propagation Neural Network (OWBPNN) method, whose accuracy is 88.75%.

[45] Zhi Zhang et al. developed a new approach to generate animal object region proposals using multilevel graph cut in the spatiotemporal domain. Then a cross-frame temporal patch verification method was developed to determine whether these region proposals are true animals or background patches. An efficient feature description was developed by using deep learning in combination with histogram-of-oriented-gradient features encoded with Fisher vectors for animal detection. Results indicate that the proposed spatiotemporal object proposal and patch verification framework performs better than the state-of-the-art methods and increased the animal detection accuracy by 4.5%.

CHAPTER 6

SYSTEM IMPLEMENTATION

6.1 Modules

 Examine and understand dataset

 Build an input dataset

 Build the model

 Train the model

 Test the model

 Integration with TensorFlow Lite

6.2 Module Description

6.2.1 EXAMINE AND UNDERSTAND DATASET

A dataset (or data set) is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to a given member of the dataset in question. It lists values for each of the variables, such as the height and weight of an object; each value is known as a datum. The dataset may comprise data for one or more members, corresponding to the number of rows. If your dataset is messy, building models will not help you solve your problem. In order to build a powerful machine learning system, we need to explore and understand our dataset before we define a predictive task and solve it.
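As a concrete illustration, the following is a small exploration sketch, assuming the CIFAR-100 dataset introduced in the next module is the target; it inspects the shapes and class balance before any model is defined:

```python
import numpy as np
import tensorflow as tf

# Load CIFAR-100 and inspect its size and class balance.
(x_train, y_train), (x_test, y_test) = \
    tf.keras.datasets.cifar100.load_data(label_mode="fine")

print("train images:", x_train.shape)       # (50000, 32, 32, 3)
print("test images:", x_test.shape)         # (10000, 32, 32, 3)
print("classes:", len(np.unique(y_train)))  # 100

# A balanced dataset has the same image count for every class.
_, counts = np.unique(y_train, return_counts=True)
print("images per class:", counts.min(), "-", counts.max())  # 500 - 500
```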

6.2.2 Build an Input Dataset

Teachable Machine is a web tool that makes it fast and easy to create machine learning models for your projects by training a computer to recognize your images.

FIGURE 6.1 Build The Model For Teachable Machine

TRAINING SAMPLE IMAGES

The image dataset is CIFAR-100, which has numerous superclasses of general object images and a number of subclass categories within each superclass. CIFAR-100 has 100 classes of images, with each class having 600 images. These 600 images are divided into 500 training images and 100 testing images per class, therefore making a total of 60,000 different images. These 100 classes are grouped into 20 superclasses. Every image in the dataset comes with a "fine" label (depicting the class to which it belongs) and a "coarse" label (the superclass of the detected "fine" label). The selected categories for training and testing are bed, bicycle, bus, chair, couch, motorcycle, streetcar, table, train, and wardrobe.

For the proposed work, some wide categories of each superclass need to be used for training the networks; the superclasses used are household furniture and vehicles. The chosen categories are shown below. The second dataset used was the ImageNet dataset, which has superclasses of images that are further divided into subclasses. ImageNet is an image dataset organized according to the WordNet hierarchy, with the dataset organized into meaningful concepts.
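A sketch of building the input subset from CIFAR-100 follows. The fine-label indices are assumptions based on the standard alphabetical CIFAR-100 label ordering and should be verified against the dataset's label file:

```python
import numpy as np
import tensorflow as tf

# Assumed fine-label indices (standard alphabetical CIFAR-100 order).
SELECTED = {5: "bed", 8: "bicycle", 13: "bus", 20: "chair", 25: "couch",
            48: "motorcycle", 81: "streetcar", 84: "table",
            90: "train", 94: "wardrobe"}

(x_train, y_train), (x_test, y_test) = \
    tf.keras.datasets.cifar100.load_data(label_mode="fine")

# Keep only the ten selected categories.
mask = np.isin(y_train.ravel(), list(SELECTED))
x_sel, y_sel = x_train[mask], y_train[mask]
print(x_sel.shape)  # 10 classes x 500 images = (5000, 32, 32, 3)
```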

FIGURE 6.2 Training Datasets With Label

FIGURE 6.3 Sample Input Training Model

6.2.3 BUILD AND TRAIN THE MODEL

Teachable Machine is a GUI tool that allows you to create training datasets and train several types of machine learning models, including image classification, pose classification and sound classification. Teachable Machine uses TensorFlow.js under the hood to train your machine learning model. You can export the trained models in TensorFlow.js format for use in web browsers, or export them in TensorFlow Lite format for use in mobile applications or IoT devices.

Here are the steps to train your models:

1. Go to the Teachable Machine website.
2. Create an Image project.
3. Capture or upload some sample images for each category that you want to recognize. Only a small number of samples per category is needed to get started.
4. Start training. Once it has finished, you can test your model on a live camera feed.
5. Export the model in TFLite format.

FIGURE 6.4 Structure of CNN

46
The most important advancement of deep learning over traditional machine learning is that its performance improves as the amount of data increases. A 4-convolutional-layer model (4-Conv CNN) was built from scratch for this project. The structure of the CNN is shown in Figure 6.4. This 4-Conv CNN has 4 Conv2D layers, 2 MaxPooling layers, and 5 Dropout layers, followed by fully connected layers; a Keras sketch of this stack is given after the list below.

Functions of the different layers:

 The first layer in this structure is a Conv2D layer for the convolution operation, which extracts features from the input data by sliding a filter over the input to generate a feature map. In this case, the size of the filter is 3x3.
 The second layer is a MaxPooling2D layer for the max-pooling operation, which reduces the dimensionality of each feature map. In this case, the size of the pooling window is 2x2.
 The third layer is a Dropout layer for reducing overfitting. In this case, the dropout function randomly abandons 20% of the outputs.
 We repeated these steps to add more hidden layers, giving 4 Conv2D layers, 2 MaxPooling2D layers, and 5 Dropout layers in total. Then, a Flatten layer is used to flatten the output to a 1D shape, because the classifier finally outputs a 1D prediction (category). Two Dense layers provide the fully connected function, and another Dropout layer between them discards 30% of the outputs. Because this task is a 10-category classification, the final layer is combined with a Softmax activation to transform the outputs into 10 probabilities.
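The following Keras sketch mirrors the stack described above (4 Conv2D, 2 MaxPooling2D, 5 Dropout, a Flatten layer, and two Dense layers with a 30% Dropout between them, ending in a 10-way softmax). The filter counts and the Dense width are assumptions, since the report does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# 4-Conv CNN sketch; filter counts and dense width are assumed.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Dropout(0.2),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),                     # discards 30% between Dense layers
    layers.Dense(10, activation="softmax"),  # 10-category prediction
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```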

Deep Convolutional Neural Networks Algorithm

Image Normalization

The first step in our approach is to normalize the input image. Images are given as a set of RGB values arranged in channels: a channel count of 1 corresponds to greyscale and 3 to RGB. The pixel values need to be normalized into a common range, so in the first layer we normalize our images while keeping the 3 RGB channels.
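In code, a common form of this normalization is simply scaling the 8-bit channel values into the [0, 1] range; a minimal sketch, with array names following the earlier dataset snippets:

```python
# Scale 8-bit RGB values from [0, 255] into [0, 1] before training.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```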

Rectified Linear Unit

We use ReLU to threshold the input. ReLU is usually defined in the convolutional layer itself. In TensorFlow there are two ways to define it: first, define it in the convolutional layer; second, add a separate layer that contains the activation function. We use the first option in our approach. The formula for the rectified linear unit is given as: f(x) = max(0, x)

Convolutional Layer

The convolutional layer convolves a set of filters over the input. A high filter response indicates similarity between the filter and the input, and vice versa. The filter outputs obtained in this layer enable the network to make a decision about the class of the input image. The layer performs a linear transformation from input to output without changing the spatial dimensions of the input image, but it changes the number of channels in the output image. The convolutional layer takes the input with some pre-defined bias. Weights are defined for each input as well; these weights help us reduce the error when we backpropagate the error value and adjust the weights so as to improve the model.

Maxpool Layer

The maxpool operation is used to down-sample the images. It takes an input of a given size and turns it into our desired size. In this step we change the number of rows and columns, but the depth remains the same. The maxpool operation is important because it helps the network avoid overfitting to our data.

Fully Connected Layer

This layer comes into use when we are done with all our convolutional, maxpool and ReLU operations. It computes class scores from the numerical values of the given input, which is the output of the above operations.

Soft-max Layer

The end of a convolutional neural network is usually a softmax classifier. In our network we have used softmax. The softmax classifier takes in an array of scores and outputs probabilities for the different categories in the dataset.
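For reference, a minimal NumPy sketch of the softmax computation used in this final layer:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```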

6.2.5 TEST THE MODEL

1. The dataset was downloaded from Caltech's website. There were 101 classes, each containing multiple images. We then split the dataset into 80% training and 20% testing. The model was designed using TensorFlow in Keras with a combination of convolutional and max pooling layers. This was then flattened and fed to the fully connected layer. The validation accuracy that we got was around 30%. With some tweaking of the hyperparameters we were able to get accuracy of up to 62%.

2. This was happening due to class imbalance, so we decided to restrict our training to the 10 classes which had the maximum number of images.

3. As our model was ready, we had a lot of room to improve our model's performance on the 10 classes. We decided to go ahead with 2 activation functions and then compare the results. The two activation functions were tanh and relu. As an initiative to learn the working of the neural network, we decided to visualize the output of every convolutional and max pooling layer. Using the tanh activation we were able to get 32% accuracy.

4. We then used the relu activation function for the same model. With the relu activation function, we were able to get 93% accuracy for our model.

5. We also plotted multiple graphs throughout the progress of the project. We compared training accuracy vs validation accuracy and training loss vs validation loss for all the models. We also plotted comparison plots for the tanh and relu activation functions.
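A sketch of the tanh-versus-relu comparison described above is given below. The layer sizes and the 128x128 input resolution are assumptions; the report only fixes the 10 classes and the two activation functions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(activation):
    # Same stack both times; only the activation function differs.
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation=activation,
                      input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=activation),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation=activation),
        layers.Dense(10, activation="softmax"),
    ])

for act in ("tanh", "relu"):
    model = build_model(act)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, validation_split=0.2, epochs=20)
```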

6.2.6 INTEGRATION WITH TENSORFLOW LITE

FIGURE 6.5 Integration Architecture for TensorFlow Lite
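A minimal sketch of on-device-style inference with the TensorFlow Lite Interpreter is shown below; model.tflite stands for the converted model from the workflow in Chapter 1, and the 32x32 input shape is an assumption that must match the exported model:

```python
import numpy as np
import tensorflow as tf

# Load the converted flatbuffer and allocate tensors once at startup.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A dummy normalized image standing in for a camera frame.
frame = np.random.rand(1, 32, 32, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

scores = interpreter.get_tensor(output_details[0]["index"])
print("predicted class:", int(np.argmax(scores)))
```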

CHAPTER 7

SYSTEM SPECIFICATION

7.1 HARDWARE SPECIFICATION

 Hard Disk: 500 GB and above
 RAM: 4 GB and above
 Processor: Intel Core i3 and above

7.2 SOFTWARE SPECIFICATION

 Operating System: Windows 7, 8, 10 (64-bit)
 Software: Python 3.7
 IDE: PyCharm

TECHNOLOGIES USED:

 Python

REFERENCES

[1] A. A. Cruz-Roa, J. E. A. Ovalle, A. Madabhushi, and F. A. G. Osorio, ‘‘A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection,’’ in Medical Image Computing and Computer-Assisted Intervention—MICCAI 2013, C. Salinesi, M. C. Norrie, and Ó. Pastor, Eds. Berlin, Germany: Springer, 2013, pp. 403–410, doi: 10.1007/978-3-642-40763-5_50.

[2] A. B. Amjoud and M. Amrouch, ‘‘Transfer learning for automatic image orientation detection using deep learning and logistic regression,’’ IEEE Access, vol. 10, pp. 128543–128553, 2022, doi: 10.1109/ACCESS.2022.3225455.

[3] A. B. Nassif, M. A. Talib, Q. Nasir, Y. Afadar, and O. Elgendy, ‘‘Breast cancer detection using artificial intelligence techniques: A systematic literature review,’’ Artif. Intell. Med., vol. 127, May 2022, Art. no. 102276, doi: 10.1016/j.artmed.2022.102276.

[4] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, and J. Li, ‘‘Salient object
detection: A survey,’’ Comput. Vis. Media, vol. 5, no. 2, pp. 117–150, Jun. 2019,
doi: 10.1007/s41095-019-0149-9.

[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Advances in Neural Information Processing Systems, vol. 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Red Hook, NY, USA: Curran Associates, 2012, pp. 1097–1105. Accessed: Oct. 22, 2019. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[6] A.-M. Founta, D. Chatzakou, N. Kourtellis, J. Blackburn, A. Vakali, and I. Leontiadis, ‘‘A unified deep learning architecture for abuse detection,’’ 2018, arXiv:1802.00385; C. Chen, A. Seff, A. Kornhauser, and J. Xiao, ‘‘DeepDriving: Learning affordance for direct perception in autonomous driving,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec. 2015.

[7] G. Guo, H. Wang, C. Shen, Y. Yan, and H.-Y.-M. Liao, ‘‘Automatic image cropping for visual aesthetic enhancement using deep neural networks and cascaded regression,’’ IEEE Trans. Multimedia, vol. 20, no. 8.

[8] I. Lenz, H. Lee, and A. Saxena, ‘‘Deep learning for detecting robotic
grasps,’’ Int. J. Robot. Res., vol. 34, nos. 4–5, pp. 705–724, Apr. 2015, doi:
10.1177/0278364914549607.

[9] J. Ni, K. Shen, Y. Chen, W. Cao, and S. X. Yang, ‘‘An improved deep network-based scene classification method for self-driving cars,’’ IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022, doi: 10.1109/TIM.2022.3146923.

[10] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang, ‘‘Abusive


language detection in online user content,’’ in Proc. 25th Int. Conf. World Wide
Web, Montreal, QC, Canada, Apr. 2016, pp. 145–153, doi:
10.1145/2872427.2883062.

[11] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman,


‘‘Discovering objects and their location in images,’’ in Proc. 10th IEEE Int. Conf.
Comput. Vis. (ICCV), vol. 1, Beijing, China, Oct. 2005,

[12] K. Aurangzeb, S. Aslam, M. Alhussein, R. A. Naqvi, and M. Arsalan; K. Tong and Y. Wu, ‘‘Deep learning-based detection from the perspective of small or tiny objects: A survey,’’ Image Vis. Comput., vol. 123, Jul. 2022, Art. no. 104471, doi: 10.1016/j.imavis.2022.104471.

[13] L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, and R. Qu, ‘‘A survey of deep learning-based object detection,’’ IEEE Access, vol. 7, 2019.

[14] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, ‘‘Sequential deep learning for human action recognition,’’ in Human Behavior Understanding, A. A. Salah and B. Lepri, Eds. Berlin, Germany: Springer, 2011, pp. 29–39, doi: 10.1007/978-3-642-25446-8_4.

[15] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan,


‘‘Object detection with discriminatively trained part-based models,’’ IEEE Trans.
Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010, doi:
10.1109/TPAMI.2009.167.

[16] P. Viola and M. Jones, ‘‘Rapid object detection using a boosted cascade of
simple features,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Recognit. (CVPR), Kauai, HI, USA, Dec. 2001, pp. I-511–I-518, doi:
10.1109/CVPR.2001.990517.

[17] S. Agarwal, J. O. D. Terrail, and F. Jurie, ‘‘Recent advances in object


detection in the age of deep convolutional neural networks,’’ 2018,
arXiv:1809.03193.

[18] S. I. Haider, ‘‘Contrast enhancement of fundus images by employing modified PSO for improving the performance of deep learning models,’’ IEEE Access, vol. 9, pp. 47930–47945, 2021, doi: 10.1109/ACCESS.2021.3068477.
[19] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, ‘‘Detecting
unexpected obstacles for self-driving cars: Fusing deep learning and geometric
modeling,’’ 2016, arXiv:1612.06573. Accessed: Oct. 21, 2019.

[20] W. Liu, I. Hasan, and S. Liao, ‘‘Center and scale prediction: Anchor-free
approach for pedestrian and face detection,’’ Pattern Recognit., vol. 135, Mar.
2023, Art. no. 109071, doi: 10.1016/j.patcog.2022.109071.

[21] W. Zhiqiang and L. Jun, ‘‘A review of object detection based on convolutional neural network,’’ in Proc. 36th Chin. Control Conf. (CCC), Jul. 2017, pp. 11104–11109, doi: 10.23919/ChiCC.2017.8029130.

[22] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, ‘‘Multi-view 3D object detection network for autonomous driving,’’ 2016, arXiv:1611.07759. Accessed: Oct. 21, 2019; X. Zhang, ‘‘Learning-based object detection and localization for a mobile robot manipulator in SME production,’’ Robot. Comput.-Integr. Manuf., vol. 73, Feb. 2022, Art. no. 102229, doi: 10.1016/j.rcim.2021.102229.

[23] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521,
no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.

[24] Z. Liu, P. Luo, X. Wang, and X. Tang, ‘‘Deep learning face attributes in
the wild,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec.
2015, pp. 3730–3738, doi: 10.1109/ICCV.2015.425.

[25] Z. Sun, Q. Ke, H. Rahmani, M. Bennamoun, G. Wang, and J. Liu,
‘‘Human action recognition from various data modalities: A review,’’ IEEE Trans.
Pattern Anal. Mach. Intell., vol. 45, no. 3, pp. 3200–3225, Mar. 2022, doi:
10.1109/TPAMI.2022.3183112.

[26] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy, ‘‘Speed/accuracy trade-offs for modern convolutional object detectors,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 3296–3297, doi: 10.1109/CVPR.2017.351.
[27] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, ‘‘Object detection with deep
learning: A review,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp.
3212–3232, Nov. 2019, doi: 10.1109/TNNLS.2018.2876865.

[28] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun, ‘‘DetNet: Design backbone for object detection,’’ in Computer Vision—ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds. Cham, Switzerland: Springer, 2018, pp. 339–354, doi: 10.1007/978-3-030-01240-3_21.

[29] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun, ‘‘Light-head R-CNN: In defense of two-stage object detector,’’ 2017, arXiv:1711.07264.

[30] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, ‘‘Random erasing data augmentation,’’ in Proc. AAAI, Apr. 2020, vol. 34, no. 7, pp. 13001–13008, doi: 10.1609/aaai.v34i07.7000.
