PHASE I REPORT
Submitted by
D.NIVETHA (920122421006)
NOV/DEC 2023
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Head of the Department Project Supervisor
Mr. S. OYYATHEVAN, M.E., Mrs. S. SUBHA, M.E.,
ACKNOWLEDGEMENT
We thank the most gracious creator of the universe, our almighty GOD, who ideally
supported us throughout this project.
At this moment of having successfully completed our project, we wish to convey our
sincere thanks to the management and our Chairman Dr. S. MOHAN, who provided
all the facilities to us.
We extend our sincere thanks to our guide Mrs. S. SUBHA, M.E., Assistant Professor,
Department of Computer Science and Engineering, who has been our lighthouse in the
vast ocean of learning, for her inspiring guidance and encouragement to complete the
project.
We would like to express our gratitude to all the teaching and non-teaching staff
members of the Computer Science and Engineering Department and our friends for the
kind help extended to us.
ABSTRACT
Mobile object detection has become an important research area due to its
wide range of applications. This report presents an approach for mobile object
detection using TensorFlow Lite. We use a deep neural network with multiple
convolutional layers to extract features from the input image. The extracted
features are then fed to a set of detection heads to predict the bounding boxes
and corresponding class labels of the objects in the image, and we use the SSD
(Single Shot Detector) algorithm as our detection framework.
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 General Introduction
1.2 Project Objectives
1.3 Problem Statement
2. EXISTING SYSTEM
2.1 Disadvantages
3. PROPOSED SYSTEM
3.1 Advantages
4. SYSTEM DIAGRAMS
4.1 Architecture Diagram
4.2 Data Flow Diagram
4.3 ER Diagram
4.4 Use Case Diagram
4.5 Class Diagram
4.6 Sequence Diagram
4.7 Activity Diagram
5. LITERATURE SURVEY
6. SYSTEM IMPLEMENTATION
6.1 Modules
6.2 Modules Description
6.2.1 Examine and Understand Dataset
7. SYSTEM SPECIFICATION
REFERENCES
LIST OF FIGURES

LIST OF ABBREVIATIONS
CHAPTER-1
INTRODUCTION
1.1 General Introduction
In our project we develop a model for image classification using a Deep Learning
approach, which automates the feature extraction process and is effective for image
recognition. We evaluate the viability of using deep learning models for object
detection in real-time video feeds on mobile devices, in terms of object detection
performance and inference delay, as either an end-to-end system or a feature
extractor. We used TensorFlow, a relatively new library from Google, to model our
neural network. The TensorFlow Object Detection API is used to detect multiple
objects in real-time video streams, and an algorithm can detect patterns and alert
the user if an anomaly is found. Mobile object detection is the process of detecting
and localizing objects of interest in images or videos using a mobile device such as
a smartphone or a tablet. This technology has become increasingly important in
recent years as the use of mobile devices has become more widespread and the
demand for applications that can recognize and understand the visual world has
grown.
Mobile object detection has a wide range of applications, from augmented
reality and gaming to autonomous vehicles and security systems. By enabling
mobile devices to identify and understand their environment, object detection
technology can provide users with a more immersive and interactive experience and
help businesses and organizations improve their operations and security.
Object detection plays a vital role in computer vision and robotics. Common
examples of computer vision include video surveillance, automated inspection in
industries, traffic monitoring, digital libraries, and electronic gadgets such as
mobiles, cameras, and tablets. Object detection is applied in robotics for
navigational tasks, interactions with humans and the environment, manipulation or
building of objects, etc. It is one of the primary objectives involved in building a
fully autonomous system. Integrating object detection with sophisticated cameras
that feature long-range zooming, night vision, etc., can give rise to hybrid systems
that can be deployed for surveillance and military-based applications. Since most
modern visual detection techniques are inspired by the human brain, they can be
used in providing remedies and alternatives for vision-related impairments. They
also provide tangible insight into how vision is being interpreted by the brain.
Advanced Driver Assistance Systems (ADAS) are a popular disruptive technology
in the automobile industry. They combine software, hardware, and technologies
such as RADAR, LIDAR, and vision and image processing systems with artificial
intelligence to assist the driver towards a safe driving experience. In this thesis,
ADAS is given only as a use case for the problems solved, and the work does not
include any real-time image acquisition. However, these algorithms can be
successfully deployed in real-time vision systems such as ADAS.
1.2 Project Objectives
The fundamental task of image classification is to make sure all images are
categorized into their specific sectors or groups. Classification is easy for
humans, but it has proved to be a major problem for machines, since previously
unseen patterns must be assigned to the proper categories rather than merely
detected. Image classification technology supports various applications such as
vehicle navigation, robot navigation, and remote sensing.
In this project we solve an image recognition problem, where the goal is to tell
which class an input image belongs to. We achieve this by training an artificial
neural network on 1,00,000 images of airplanes, automobiles, birds, cats, deer,
dogs, frogs, horses, ships and trucks, making the Convolutional Neural Network
(CNN) learn to predict which class an image belongs to the next time it sees an
image containing, say, a cat or a dog.
1.3 Problem Statement
CHAPTER-2
EXISTING SYSTEM
2.1 Disadvantages
Low Accuracy
CHAPTER-3
PROPOSED SYSTEM
3.1 Advantages
There are several advantages of using TensorFlow Lite and deep learning
approaches for mobile object detection:
High accuracy:
Deep learning models are capable of achieving high accuracy in object detection
tasks. By using TensorFlow Lite, we can deploy these models on mobile devices
without compromising on accuracy.
Real-time performance:
TensorFlow Lite is optimized for mobile devices, enabling deep learning models
to run in real-time on mobile devices. This is particularly useful in applications such
as autonomous driving and surveillance, where real-time object detection is crucial.
Efficient memory and power consumption:
Mobile devices have limited memory and battery life. TensorFlow Lite models
are designed to be lightweight and efficient, minimizing memory and power
consumption on mobile devices.
Customizable:
TensorFlow Lite and deep learning models allow for customization and fine-
tuning to specific use cases. This means that we can train models to detect
specific objects, such as faces or vehicles, and optimize them for specific
applications.
Portability:
TensorFlow Lite models are packaged as single, self-contained files, so the same
model can be deployed across Android, iOS, and embedded devices. A minimal
on-device inference sketch is shown below.
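The sketch below runs a converted .tflite model with the TensorFlow Lite
Interpreter in Python. The model path, input size, and class count are illustrative
assumptions rather than values fixed by this project.

import numpy as np
import tensorflow as tf

# Load the converted model and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")  # assumed path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A dummy RGB frame normalized to [0, 1]; a real app would feed camera frames.
frame = np.random.rand(1, 224, 224, 3).astype(np.float32)  # assumed input size

# Run one inference step and read back the class scores.
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])
print("Predicted class:", int(np.argmax(scores)))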
CHAPTER-4
SYSTEM DIAGRAMS
4.1 Architecture Diagram
4.2 Data Flow Diagram
4.3 ER Diagram
FIGURE 4.3 ER Diagram
4.4 Use Case Diagram
4.5 Class Diagram
4.6 Sequence Diagram
4.7 Activity Diagram
CHAPTER-5
LITERATURE SURVEY
This chapter reviews the relevant literature and several strands of research in
the area of Object Detection using Deep Learning. There are many papers in the
literature on Object Detection techniques, such as the Scale Invariant Feature
Transform, Speeded-Up Robust Features, and Convolutional Neural Networks, of
which the study in this thesis is confined to Convolutional Neural Networks.
Object Detection is used in several walks of life such as security, military, medical
imaging, biometric recognition, iris recognition, natural language processing, video
analysis, and weather forecasting. Deep Learning models are mainly used in Object
Detection algorithms due to their accurate image recognition capability. In various
fields there is a necessity to detect the target object and also to track it effectively
while handling occlusions and other complexities. Many researchers have attempted
various approaches to object tracking, and the nature of the techniques largely
depends on the application domain. Some of the research works that led to the
proposed work in the field of object tracking are described below. The following
is a list of several papers that emphasize the research carried out on Object
Detection using Deep Learning.
[1] Lichao Mou et al. proposed a novel method for vehicle instance segmentation,
which requires identifying, at the pixel level, where the vehicles appear and
associating each pixel with a physical instance of a vehicle. This work developed a
unified multitask learning network that learns two complementary tasks:
segmenting vehicle regions and detecting semantic boundaries. Also, a new dataset
for vehicle instance segmentation, namely the Busy Parking Lot Unmanned Aerial
Vehicle Video dataset, was developed for future research purposes.
[2] Waqas Hassan et al. proposed a pixel classification method which uses a
segmentation technique to identify stationary objects in the image. These objects
are then tracked using a new technique called adaptive edge orientation. This
technique detects objects with an accuracy of 95%, which is an improvement over
state-of-the-art methods.
[3] Richard J. Radke et al. surveyed various Image change detection algorithms and
discussed the principles for comparing performance of change detection algorithms.
[4] Hao Long et al. demonstrated Object Detection in aerial images, obtaining
better detection performance by designing a novel Deep Neural Network
framework called Feature Fusion Deep Networks (FFDN). The novel architecture
provides a powerful hierarchical representation and also strengthens the spatial
relationship between high-density objects. The proposed FFDN model achieves
good improvements on benchmark datasets such as UAV123 and UAVDT. The
advantage of the proposed method is that objects that appear small, partially
occluded, or out of view, as well as objects against dark backgrounds, can also
be detected accurately.
[5] Ross Girshick et al. proposed a scalable Object Detection algorithm which
improves the mean Average Precision (mAP) by more than 50% relative to the
previous best result, achieving 62.4% on the VOC 2012 dataset. The proposed
method is a culmination of two ideas: applying high-capacity CNNs to bottom-up
region proposals, and supervised pre-training on an auxiliary task when labelled
training data are scarce, followed by domain-specific fine-tuning, which boosts
the performance. Hoo-Chang Shin et al. discussed the reasons for using CNNs in
computer-aided detection problems. This work concentrates on two computer-aided
detection problems: thoraco-abdominal Lymph Node (LN) detection and
Interstitial Lung Disease (ILD) classification. Xuesong Zhang et al. [13] described a
method of combining a set of fine-tuned CNN models for multi-class object
category recognition tasks. In this method the fine-tuned CNN models are trained
on the target dataset and the last fully connected layers are changed according to the
targeted task. The results obtained from the proposed method are good enough for
detecting objects.
[6] Zhong-Qiu Zhao et al. discussed various Deep Learning-based Object Detection
frameworks. This work focuses on refinements that can be applied to Object
Detection architectures to improve their performance further, and it also gives
some promising directions for designing a better Object Detector.
[8] Hasbi Ash Shiddieqy, Farkhad Ihsan Hariadi, and Trio Adiono, "Implementation
of Deep-Learning based Image Classification on Single Board Computer": in this
paper, a deep-learning algorithm based on a convolutional neural network is
implemented using Python and TFLearn for image classification. Two different
CNN structures are used, with two and five layers, and the paper concludes that
the CNN with more layers performs the classification with much higher accuracy.
[9] Rui Wang, Wei Li, Runnan Qin and JinZhong Wu, "Blur Image Classification
based on Deep Learning": in this paper, a convolutional neural network (CNN)
called Simplified-Fast-Alexnet (SFA), based on learned features, is proposed for
classifying four types of blurred images: defocus blur, Gaussian blur, haze blur
and motion blur. The experimental results demonstrate that the classification
accuracy of SFA, which is 96.99% on a simulated blur dataset and 92.75% on a
natural blur dataset, is equivalent to Alexnet and superior to other classification
methods.
[10] Sameer Khan and Suet-Peng Yong, "A Deep Learning Architecture for
Classifying Medical Image of Anatomy Object": in this paper, a modified CNN
architecture that combines multiple convolution and pooling layers for higher-level
feature learning is proposed. Medical image anatomy classification was carried
out, and the results show that the proposed CNN feature representation outperforms
the three baseline architectures for classifying medical image anatomies.
[12] C. A. Ronao and S.-B. Cho, "Human activity recognition with smartphone
sensors using deep learning neural networks," Expert Syst. Appl., vol. 59, pp.
235–244, Oct. 2016. Convolutional networks, comprising one or more
convolutional and pooling layers followed by one or more fully connected layers,
have gained popularity due to their ability to learn unique representations from
images or speech, capturing local dependency and distortion invariance.
[13] Hernández-Serna, A.; Jimenez, L. "Automatic identification of species with
neural networks." 2014, 11, 563. This paper observes that as tasks become more
complicated than ever before, dataset volumes have grown to a size that cannot be
handled by traditional machine learning methods; as a result, the role of artificial
neural networks becomes clear. The authors developed a neural network structure
to classify a dataset of 740 species and 11,198 samples, achieving 91.65%
accuracy on fish, and 92.87% and 93.25% on plants and butterflies, respectively.
[14] Cortes, C.; Vapnik, V. "Support-vector networks." Mach. Learn., 1995, 20,
273–297. In the traditional machine learning area, SVM is one of the most
effective methods for addressing various kinds of problems such as classification
and regression. Recently, the Principal Component Analysis (PCA) algorithm
combined with SVM has also become prevalent, since this combination can
enhance the efficiency of training the model to a large extent.
[15] Alsing, O. Mobile Object Detection using TensorFlow Lite and Transfer
Learning (Dissertation). 2018. This work built real-time CNN models based on
transfer learning to detect regions containing Post-it notes, and converted these
models to a format that can be used on smartphones with the TensorFlow Lite
tools. A Faster Region CNN (R-CNN) ResNet50 architecture achieved the highest
mAP (mean Average Precision), at 99.33%, but its inference time was 20,018
milliseconds (ms). The difference in accuracy among traditional machine learning,
deep learning and transfer learning methodologies for butterfly detection had not
been the focus of any research so far, and a mobile application to detect butterflies
had never been developed. Therefore, after optimizing certain parameters, the
authors not only improved the accuracy over previous studies with deep learning
models, but also created a practical Android application with a relatively short
inference time.
[16] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). XNOR-net:
Imagenet classification using binary convolutional neural networks. Lecture Notes
in Computer Science. This entry describes an image classification system based on
a Convolutional Neural Network (CNN) structure. The training was performed
such that a balanced number of face images and non-face images were used, by
deriving additional face images from the face image data. The image classification
system employs a bi-scale CNN with 120 training samples, and the auto-stage
training achieves an 81.6% detection rate with only six false positives on the Face
Detection Data Set and Benchmark (FDDB), where the current state of the art
achieves about an 80% detection rate with 50 false positives.
[17] Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015).
DRAW: A Recurrent Neural Network For Image Generation. This paper studied a
neural network architecture for image generation. The framework combines a
spatial attention mechanism that mimics the foveation of the human eye with a
sequential variational auto-encoding framework, allowing complex images to be
constructed iteratively. Over the course of training, the system steadily improved
its MNIST models; MNIST is an open-source database used as the training set. It
was also tested on the Street View House Numbers dataset, where the generated
images could not be distinguished from real data even by the human eye.
[18] Kamavisdar, P., Saluja, S., & Agrawal, S. (2013). A survey on image
classification approaches and techniques. International Journal of Advanced
Research in Computer and Communication Engineering, 2(1), 1005–1009. This
paper discusses the Decision Tree (DT) technique for image classification. A DT
organizes multiple datasets under a hierarchical classifier, which is used to
calculate membership for each of the classes, and the classifier allows some
rejection of a class at intermediate stages. The method requires three parts: the
first is finding the terminal nodes, the second is the placement of classes within
them, and the third is partitioning of the nodes. This method is considered very
simple and has a high rate of efficiency.
[20] "SSD: Single Shot MultiBox Detector" by Wei Liu, Dragomir Anguelov,
Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C.
Berg. This paper presents a new object detection framework, SSD, which achieves
high accuracy with low computational cost, making it well suited to deployment
on mobile devices.
[22] "YOLOv3: An Incremental Improvement" by Joseph Redmon and Ali Farhadi.
This paper introduces YOLOv3, the latest version of the YOLO (You Only Look
Once) object detection algorithm. The algorithm is optimized for mobile devices
and achieves state-of-the-art performance in terms of speed and accuracy.
[26] An improved Gabor Wavelet Transform based algorithm has been proposed
by Wang et al. (2019) which employs a two-dimensional Discrete Cosine Transform
(DCT). This method focuses on the DC part of the image, reducing correlation and
helping to extract the features, after which the Gabor Wavelet Transform is applied.
The use of the 2D-DCT increases the recognition rate by removing unnecessary
parts of the image that carry no information, thus reducing the recognition time.
From these extracted features, the images are classified using a nearest-neighbour
classifier with Euclidean distance. This helps in reducing the computation time and
improves the recognition rate.
[27] Facial features are extracted using a Gabor wavelet based on the contour
feature line (Zhang et al. 2019). This forms the Gabor face feature matrix, from
which the redundant information is removed using uniform sampling.
Dimensionality reduction is done using Principal Component Analysis, which
computes the covariance and characteristic matrices. Finally, classification is done
using a Support Vector Machine. Face recognition using the 2D Gabor Wavelet
Transform and the Local Binary Pattern (LBP) is also proposed by Zhang et al. (2019).
[28] Initially, the dimensions of the input image are reduced using Principal
Component Analysis (PCA) and then LBP is applied. LBP describes an image's
texture features and each pixel's neighbourhood characteristics using a binary
encoding of the gray-level differences between the center pixel and its neighbours.
This yields a pattern representation of the image and reduces the impact of factors
such as illumination. It is shown that the proposed idea of implementing LBP and
Gabor together has a better recognition rate than applying each technique
separately. The problem of high feature dimensionality, addressed by applying the
Ant Colony Optimization (ACO) algorithm to select the relevant features, is
discussed by Aro et al. (2018).
[29] The images are pre-processed to ensure proper recognition using geometrical
and illumination normalization. After pre-processing is done, the face image is
convolved with a Gabor function and the magnitudes of the Gabor responses are
concatenated into the face image. The extracted Gabor features are optimized using
ACO. A feature extraction method combining the Gabor wavelet and uniform-
pattern LBP has been proposed by Wang et al. (2017) for improving the accuracy
of human face feature recognition and the flexibility of practical operation. The
output of the Gabor filter is coded using a uniform LBP, in which the binary values
do not have excessive jumps. Depending on the binary values, the uniform pattern
can be detected using the number of binary digits and the number of binary
sequences. Based on the literature survey, it has been observed that face
recognition invariant to pose and illumination is still an open problem to be
addressed, and in this work an algorithm has been proposed for the same.
[31] Zhong-Qiu Zhao et al. surveyed various Deep Learning based Object
Detection frameworks. This work made some modifications to the existing Object
Detection architectures to improve the detection accuracy. Also, this work
formulated some promising guidelines for future work in Object Detection and
Neural Network based learning systems.
[32] Wei Liu et al. proposed a method called SSD which discretizes the output
space of bounding boxes into a set of default boxes over different aspect ratios and
scales per feature map location. The network outputs the score for the presence of
an object in each box and produces adjustments to the box to better match the
object shape. SSD was evaluated on the PASCAL VOC, COCO, and ILSVRC
datasets; it achieved 74.3% mAP on the VOC 2007 test set, outperforming the
state-of-the-art Faster R-CNN model.
[33] Xingyi Zhou et al. developed a detector which uses key-point estimation to
find center points and regresses all other object properties, such as size, orientation,
3D location and pose. CenterNet achieves the best speed-accuracy trade-off on the
MS-COCO dataset. The same approach is used to estimate 3D bounding boxes on
the KITTI dataset and human pose on the COCO key-point dataset. This method
performed well on all the considered datasets.
[34] Yushan Yu et al. designed a novel Deep Neural Network for domain-adaptive
Object Detection that further improves the localization accuracy of objects. The
following sequence of steps is followed for experimentation. Step 1: refine the
pseudo labels generated from current Object Detection methods and use these
labels with a weighted loss function to train the network on the target domain.
Step 2: insert residual blocks into the shallow layers of the CNN in the target
domain to augment the spatial information which helps with object localization.
This network was tested on three publicly available benchmark datasets, and the
experimental results show that the Domain Adaptive Localization Network
(DALocNet) outperformed the state-of-the-art methods on all three datasets.
Bernardo Augusto Godinho de Oliveira et al. observed that Object Detection
methods require either high processing power or large storage availability, making
it hard for resource-constrained devices to perform detection in real time without a
connection to a powerful server. This work then designed a model that requires
only 95 megabytes of storage and takes milliseconds on average per image running
on a laptop CPU, making it suitable for standalone devices.
[34] Bin Xue et al. proposed a fast semi-supervised method, called DIOD, which is
based on Fully Convolutional Region Candidate Networks (FCRCNs) and Deep
Convolutional Neural Networks, for effective object localization in minimal time.
The FCRCN uses "seed" boxes at multiple scales and aspect ratios for object
localization. The proposed model is passed through a semi-supervised pre-training
process followed by fine-tuning, which produces accurate results and overcomes
the lack of labelled training data. Finally, to further improve the accuracy and
speed of the detection system, a novel sharing mechanism is designed which uses a
joint learning strategy that extracts more discriminative and comprehensive
features while simultaneously learning the latent shared and individual features
and their correlations. Experiments were conducted on two real-world ISAR
datasets, and the results demonstrate that DIOD outperforms the state-of-the-art
methods.
[35] Alex Krizhevsky et al. trained a Deep Convolutional Neural Network with 1.2
million high-resolution images belonging to 1000 classes. This network achieved
top-1 and top-5 error rates of 37.5% and 17.0% on the test data, which is better
than the previous state-of-the-art techniques. Also, a variant of this model was
entered in the ILSVRC-2012 competition and became the winner with a top-5 test
error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
[36] Tie Liu et al. presented a new approach for detecting objects by template
matching against a large database collection. The approach was suitable for multi-
scale, contrasting backgrounds and used a colour-spatial method to detect objects.
This approach failed to detect multiple objects in a given user scenario and also
failed when the objects were in non-linear motion. This problem of failed
detections is effectively overcome in the proposed system by training the system to
identify objects through an effective system-learning technique, and objects in
non-linear motion are tracked using the proposed particle-grouping approach.
[38] Wendi Cai et al. discussed the increase in road accidents and tried to find a
solution for the same. With the advent of self-driving cars, the need for automatic
detection of street objects has become essential. Generic detection algorithms
based on Convolutional Neural Networks (CNNs) need a purpose-designed training
model, and training and testing the model take a lot of time. Transfer Learning is
therefore used to fine-tune pre-trained models: using the COCO image datasets, a
generic Deep Learning model is transferred to a specific one with different weights
and outputs. Furthermore, the CNN structure is adjusted to improve overall
performance, and the model is trained on the particular street environment.
[39] Chenglin Liu et al. proposed a cascade method for lung nodule detection
using CNNs. This method uses a transfer learning technique to train the Object
Detection network on the considered dataset of 2,954 chest X-ray images. The
results of the proposed method are effective in reducing false positives.
[40] Ruhan Sa et al. used a Deep Learning based Object Detection method for
identification of landmark points in lateral lumbar X-ray images. In this work,
Faster R-CNN was fine-tuned using small annotated clinical datasets. In the first
experiment only 81 lateral lumbar X-ray images were taken for training, yet the
model achieved better performance than the traditional sliding-window detection
method. The numbers of training and testing images were then increased to 974
and 108, fine-tuning was applied, and the network achieved an average precision
of 0.905 with an average computation time of 3 s/image, which greatly
outperformed the state-of-the-art methods.
[41] Wenhui Diao et al. proposed an efficient coarse object locating method for
Remote Sensing Images using Deep Belief Networks (DBNs). DBNs are trained on
remote sensing images for feature extraction and classification. The feature
learning of the DBN operates by pre-training each layer of Restricted Boltzmann
Machines (RBMs) using the general layer-wise training algorithm. This makes an
RBM generate edge filters which help in identifying edges more precisely. The
precise edge-position information and pixel-value information make it easier to
build a good model of the images.
[42] Jayraj Bandariya et al. developed an Object Detection system to assist totally
blind individuals in managing their activities without anybody's help. This work
also compares different Object Detection algorithms, namely the Haar Cascade and
the Convolutional Neural Network (CNN). The Haar Cascade classifier is basically
used for face detection, but it can also be trained to detect other objects, whereas
the Convolutional Neural Network falls under Deep Learning approaches, which
are mainly used for object recognition. A dataset was created with 2,300 images
belonging to 3 different classes. The comparison found the CNN to be the more
suitable algorithm for this system in terms of accuracy.
[43] Chun Zhan et al. proposed an image recognition technique using an Optical
Wavelet and Support Vector Machine (OWSVM). The optical wavelet is used for
feature extraction and the support vector machine for building the image
recognition model. In this study, 80 images belonging to 9 classes were taken to
test the effectiveness of the proposed method. The image recognition accuracy of
the proposed method is 96.25%, which is an improvement over the state-of-the-art
Optical Wavelet Back Propagation Neural Network (OWBPNN), whose accuracy
is 88.75%.
[44] Zhi Zhang et al. developed a new approach to generate animal object region
proposals using multilevel graph cut in the spatiotemporal domain. A cross-frame
temporal patch verification method was then developed to determine whether these
region proposals are true animals or background patches. An efficient feature
description was developed by using Deep Learning in combination with histogram-
of-oriented-gradient features encoded with Fisher vectors for animal detection.
Results indicate that the proposed spatiotemporal object proposal and patch
verification framework performs better than the state-of-the-art methods and
increases animal detection accuracy by 4.5%.
CHAPTER-6
SYSTEM IMPLEMENTATION
6.1 Modules
6.2.2 Build An Input Dataset
Teachable Machine is a web tool that makes it fast and easy to create machine
learning models for your projects: it lets you train a computer to recognize your
own images.
TRAINING SAMPLE IMAGES
FIGURE 6.2 Training Datasets With Labels
6.2.3 TRAIN THE MODEL
Teachable Machine is a GUI tool that allows you to create training datasets and
train several types of machine learning models, including image classification,
pose classification and sound classification. Teachable Machine uses
TensorFlow.js under the hood to train your machine learning model. You can
export the trained models in TensorFlow.js format for use in web browsers, or in
TensorFlow Lite format for use in mobile applications or IoT devices, as in the
sketch below.
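As an illustration of the TensorFlow Lite export path mentioned above, the sketch
below converts a trained Keras model into a .tflite file. The tiny stand-in model
only keeps the example self-contained; in practice it would be the trained
classifier, and the optimization flag is an optional assumption.

import tensorflow as tf

# Stand-in for the trained model (in practice, the CNN trained above).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert the Keras model to the TensorFlow Lite flatbuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# Write the .tflite file so it can be bundled with a mobile or IoT app.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)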
The most important advancement of deep learning over traditional machine
learning is that its performance improves as the amount of data increases. A
4-convolutional-layer model (4-Conv CNN) was built from scratch for this project.
The structure of the CNN is shown in Figure 1. This 4-Conv CNN has 4 Conv2D
layers, 2 MaxPooling layers, and 5 Dropout layers, followed by a fully connected
stage.
The first layer in this structure is a Conv2D layer for the convolution
operation which can extract features from the input data by sliding the filter
over the input to generate a feature map. In this case, the size of the filter is
3x3.
The second layer is a MaxPooling2D layer for the max-pooling operation
which can reduce the dimensionality of each feature. In this case, the size of
the pooling window is 2 x 2.
The third layer is a Dropout layer for reducing overfitting. In this case, the
dropout function will randomly discard 20% of the outputs.
We repeated these steps to add more hidden layers, for a total of 4 Conv2D layers,
2 MaxPooling2D layers, and 5 Dropout layers. Then, a Flatten layer is used to
flatten the output to a 1D shape, because the classifier finally outputs a 1D
prediction (the category). Two Dense layers provide the fully connected stage, and
another Dropout layer between them discards 30% of the outputs. Because this
task is a 10-category classification, the final layer is combined with a Softmax
activation to transform the outputs into 10 class probabilities.
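A Keras sketch of the 4-Conv CNN just described is given below. The layer counts
follow the text (4 Conv2D, 2 MaxPooling2D, 5 Dropout, two Dense layers with a
30% dropout between them, and a 10-way Softmax); the input shape, filter counts,
and optimizer are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),              # assumed 32x32 RGB input
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.Dropout(0.2),                          # drop 20% of outputs
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),                  # 2x2 pooling window
    layers.Dropout(0.2),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Dropout(0.2),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Flatten(),                             # flatten to 1D for the classifier
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),                          # 30% dropout between the Dense layers
    layers.Dense(10, activation="softmax"),       # 10 class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])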
Deep Convolutional Neural Networks Algorithm
Image Normalization
The first step in our approach is to normalize the input image. Images are given as
sets of pixel values arranged in channels: a channel count of 1 corresponds to
grayscale and a count of 3 to RGB. These values need to be brought into a common
range, so in the first layer we normalize our images as three-channel RGB inputs.
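As a concrete (assumed) normalization choice, the 8-bit pixel values can be scaled
into a common [0, 1] range before training; the helper below is a sketch, not code
from this project.

import numpy as np

def normalize(images: np.ndarray) -> np.ndarray:
    # images: (N, H, W, 3) uint8 array with RGB values in [0, 255]
    return images.astype("float32") / 255.0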
ReLU Activation
We use ReLU to threshold the input. Activations are usually defined in the
convolutional layer itself. In TensorFlow there are two ways to define one: first,
inside the convolutional layer, and second, as a separate layer that contains the
activation function. We use the first option in our approach. The formula for the
rectified linear unit is: f(x) = max(0, x).
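The two ways of defining the activation in TensorFlow mentioned above look like
this (a sketch; the filter sizes are arbitrary):

from tensorflow.keras import layers

# Option 1 (used in our approach): ReLU defined inside the convolutional layer.
conv_inline = layers.Conv2D(32, (3, 3), activation="relu")

# Option 2: a separate layer that contains the activation function.
conv_plain = layers.Conv2D(32, (3, 3))
relu_layer = layers.ReLU()  # applies f(x) = max(0, x) elementwise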
Convolutional Layer
The convolutional layer convolves a set of filters over the input. A high filter
response indicates similarity between the filter and the input, and vice versa. The
filter outputs obtained in this layer enable the network to make a decision about
the class of the input image. The layer performs a linear transformation from input
to output without changing the spatial dimensions of the input image, but it
changes the number of channels in the output image. The convolutional layer takes
the input together with a pre-defined bias, and weights are defined for each input
as well. These weights help reduce the error when we backpropagate the error
value, and they are adjusted so as to improve the model.
Maxpool layer
The max-pooling layer reduces the spatial dimensions of each feature map by
keeping only the maximum value inside each pooling window. This lowers the
computational cost and makes the learned features more robust to small
translations of the input.
Soft-max Layer
The soft-max layer converts the raw outputs of the final fully connected layer into
a probability distribution over the classes, softmax(z_i) = exp(z_i) / sum_j exp(z_j),
and the class with the highest probability is taken as the prediction.
1. The dataset was downloaded from Caltech's website. There were 101 classes,
each containing multiple images. We then split the dataset into 80% training and
20% testing (a sketch of this split appears after this list). The model was designed
using TensorFlow with Keras, with a combination of convolutional and max-
pooling layers. This was then flattened and fed to the fully connected layer. The
validation accuracy that we got was around 30%; with some tweaking of the
hyperparameters we were able to get accuracy of up to 62%.
3. As our model was ready, we had a lot of room to improve the model's
performance on 10 classes. We decided to go ahead with 2 activation functions
and then compare the results. The two activation functions were 'tanh' and 'relu'.
As an exercise in understanding the working of the neural network, we decided to
visualize the output of every convolutional and max-pooling layer. Using the tanh
activation we were able to get 32% accuracy.
4. We then used the relu activation function for the same model, and with it we
were able to get 93% accuracy.
5. We also plotted multiple graphs throughout the progress of the project. We
compared training accuracy vs. validation accuracy and training loss vs. validation
loss for all the models, and we plotted comparison plots for the tanh and relu
activation functions, as in the sketch below.
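The sketch below, under assumed names and paths, strings these steps together: an
80/20 split of a class-per-folder image directory, training the model from the
earlier sketch, and plotting training vs. validation accuracy from the returned
History object.

import tensorflow as tf
import matplotlib.pyplot as plt

# 80/20 train/validation split; "caltech101/" is a placeholder directory
# laid out with one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "caltech101", validation_split=0.2, subset="training",
    seed=42, image_size=(32, 32), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "caltech101", validation_split=0.2, subset="validation",
    seed=42, image_size=(32, 32), batch_size=32)

# "model" is the compiled 4-Conv CNN from the earlier sketch; in practice the
# pixels should also be scaled to [0, 1], e.g. with a Rescaling(1/255) layer.
history = model.fit(train_ds, validation_data=val_ds, epochs=20)

# Compare training vs. validation accuracy across epochs.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()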
CHAPTER 7
SYSTEM SPECIFICATION
TECHNOLOGIES USED:
Python
REFERENCES
[4] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, and J. Li, ‘‘Salient object
detection: A survey,’’ Comput. Vis. Media, vol. 5, no. 2, pp. 117–150, Jun. 2019,
doi: 10.1007/s41095-019-0149-9.
[6] A.-M. Founta, D. Chatzakou, N. Kourtellis, J. Blackburn, A. Vakali, and
I. Leontiadis, ‘‘A unified deep learning architecture for abuse detection,’’ 2018,
arXiv:1802.00385. Accessed: Oct. 21, 2019.
[7] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, ‘‘DeepDriving: Learning
affordance for direct perception in autonomous driving,’’ in Proc. IEEE Int. Conf.
Comput. Vis. (ICCV), Santiago, Chile, Dec. 2015.
[8] I. Lenz, H. Lee, and A. Saxena, ‘‘Deep learning for detecting robotic
grasps,’’ Int. J. Robot. Res., vol. 34, nos. 4–5, pp. 705–724, Apr. 2015, doi:
10.1177/0278364914549607.
small or tiny objects: A survey,’’ Image Vis. Comput., vol. 123, Jul. 2022, Art. no.
104471, doi: 10.1016/j.imavis.2022.104471.
[13] L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, and R. Qu, ‘‘A survey
of deep learning-based object detection,’’ IEEE Access, vol. 7, 2019.
[16] P. Viola and M. Jones, ‘‘Rapid object detection using a boosted cascade of
simple features,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Recognit. (CVPR), Kauai, HI, USA, Dec. 2001, pp. I-511–I-518, doi:
10.1109/CVPR.2001.990517.
[19] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, ‘‘Detecting
unexpected obstacles for self-driving cars: Fusing deep learning and geometric
modeling,’’ 2016, arXiv:1612.06573. Accessed: Oct. 21, 2019.
[20] W. Liu, I. Hasan, and S. Liao, ‘‘Center and scale prediction: Anchor-free
approach for pedestrian and face detection,’’ Pattern Recognit., vol. 135, Mar.
2023, Art. no. 109071, doi: 10.1016/j.patcog.2022.109071.
[21] W. Zhiqiang and L. Jun, ‘‘A review of object detection based on
convolutional neural network,’’ in Proc. 36th Chin. Control Conf. (CCC), Jul. 2017,
pp. 11104–11109, doi: 10.23919/ChiCC.2017.8029130.
[23] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521,
no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
[24] Z. Liu, P. Luo, X. Wang, and X. Tang, ‘‘Deep learning face attributes in
the wild,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec.
2015, pp. 3730–3738, doi: 10.1109/ICCV.2015.425.
[25] Z. Sun, Q. Ke, H. Rahmani, M. Bennamoun, G. Wang, and J. Liu,
‘‘Human action recognition from various data modalities: A review,’’ IEEE Trans.
Pattern Anal. Mach. Intell., vol. 45, no. 3, pp. 3200–3225, Mar. 2022, doi:
10.1109/TPAMI.2022.3183112.
[30] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, ‘‘Random erasing
data augmentation,’’ in Proc. AAAI, Apr. 2020, vol. 34, no. 7,
pp. 13001–13008, doi: 10.1609/aaai.v34i07.7000.