

Deep Learning Models for Gesture-controlled Drone
Operation
Tahajjat Begum, Israat Haque, and Vlado Keselj
Department of Computer Science, Dalhousie University, Halifax, Canada
Email: [email protected], [email protected], [email protected]

Abstract—Recently Unmanned Aerial Vehicles (UAVs) or Drones have gained enormous attention in applications like military, agriculture, industry, etc. One approach of controlling the operation of a drone is using hand gestures, which enables designing a low-cost system. However, the accuracy of such a system highly depends on the gesture recognition models. We can use a neural network-based gesture recognition model, which is a widely accepted image recognition scheme. In this work, we first design three deep neural network-based gesture recognition models: simple Convolutional Neural Networks (CNN), VGG-16, and ResNet-50 to uncover the best model for drone control. We evaluate the proposed models over our generated hand-gesture images in terms of their accuracy, precision, and complexity. The analysis reveals that each of the three models has its advantages and disadvantages while balancing between accuracy and complexity. For example, Simple CNN offers 92% accuracy on the testing set validation with the lowest validation loss compared to VGG-16 and ResNet-50. Thus, users can choose one of the proposed models to match their drone application.

Index Terms—Convolutional Neural Networks (CNN), Human-Computer Interaction (HCI), Unmanned Aerial Vehicles (UAV).

I. INTRODUCTION

The usage of drones is not limited to military use anymore; instead, their popularity is increasing in different applications like aerial photography, shipping-and-delivery, entertainment, law enforcement, wildlife monitoring, search-and-rescue, precision agriculture, disaster management, storm tracking, etc. [1], [2]. According to the Federal Aviation Administration, the drone market will reach $17 billion by 2024, and 7 million drones will reach the sky [3]. Today we are experiencing the evolution of drone technology from one generation to another. The next generation of smart drones, called Solo, is already in the market to capture visual images [4]. Remote-controlled drones are gradually transforming into semi- or completely automated devices exploiting Artificial Intelligence (AI) based implementations.

Humans usually use hand and body gestures to communicate, e.g., face or hand motion. Thus, gesture recognition can enable man-machine interactions to realize human-controlled drone operation. Controlling drones using hand gestures gives humans an edge to directly interact with the drones and avoids additional hardware like a remote control, which incurs an additional cost. A hand gesture-controlled drone can also minimize physical labor for humans. As the hand gestures can be comprised of images or live video streaming, we can design a system for vision-based drone control that can reduce cost and avoid any interruptions in the drone operation due to the failure of hardware like a remote control. Furthermore, vision-based drones will be able to provide a better data extraction facility than remote-controlled ones in terms of flexibility and ease of use [5].

We consider a vision-based gesture recognition system to control the operation of a drone. In particular, we focus on applications like navigation, surveillance, and training, where users do not require a remote control. Instead, they can use hand gestures to control the drone operation. For instance, the DJI Spark is a hand gesture-controlled drone that helps novice pilots learn to fly [6]. The system consists of an image recognizer to capture gesture images to feed to an image processing unit. The drone controller reacts to the image processing unit's outcome, i.e., the controller controls a drone in real-time following users' gestures. Thus, the core component of such a gesture-controlled system is the image processing unit and its corresponding accuracy.

A deep neural network, such as a convolutional neural network (CNN), is widely recognized as an accurate image classification algorithm because of its automatic feature extraction capability [7]. In particular, a CNN learns features by studying complex hidden layers. It can also effectively reduce the growing number of parameters without compromising the model accuracy [8], [9]. Over time researchers have proposed different CNN architectures for better accuracy, processing time, and model complexity. We select three architectures varying in the number of fully connected layers. For example, we consider a basic CNN architecture with fifteen layers, the medium-sized VGG-16, and a large neural network, ResNet-50. We plan to explore their trade-offs in accuracy, complexity, and resource requirements (CPU, memory, etc.); based on these recommendations, drone users can choose an appropriate model to meet their application requirements and available resources.

The research in neural network-based drone control is still in its infancy except for a couple of schemes. Hu and Wang [10] present a framework for neural network-based drone control, where different images of hand gestures are classified using an eight-layer CNN. We extend their work by incorporating VGG-16, ResNet-50, and a 15-layer CNN in the image classification module of the drone control system to generalize it for various applications. In addition to measuring the accuracy, we consider precision, recall, and F1 values in our model evaluation, which are missing in [10].

Hadri [11] uses a simple CNN with a varying number of fully connected layers. We offer the above three different neural networks that users can choose from depending on their application demands. Thus, we present an extensive evaluation and comparison of these three models: simple CNN, VGG-16, and ResNet-50.

We consider drone applications where simple hand gestures like left, right, stop, and forward can control drones for navigation, surveillance, or training (e.g., a novice pilot). However, any required hand gestures can be accommodated in our drone control system. Sensors from a collector (e.g., a Leap Motion Controller) can capture the gestures to feed to the image classification module, a neural network in our design. We collect around five hundred and fifty hand gestures from different locations in Halifax (NS, Canada) over twenty days. Seventeen people participated in the image collection phase to generate images in eight different light intensities (day and night) for each gesture. We then use 80% of the data to train the above neural networks and the rest of the data for testing. We used different people and their images for the training and testing datasets to ensure the testing images are entirely different from the images produced for training.

We measure the accuracy, recall, precision, and F1 values of the three CNNs. The accuracy of the training dataset for VGG-16 was 100%; however, its validation accuracy and loss value are worse compared to simple CNN. Based on precision, recall, validation accuracy, and loss value, simple CNN offers a better result. We notice that the validation accuracy of simple CNN increases with the increasing epoch count while it decreases for the other two architectures. Also, the validation loss is the lowest in simple CNN compared to VGG-16 and ResNet-50. It offers 92.5% accuracy on the testing dataset. We suspect that simple CNN offers good accuracy in the case of a small dataset compared to the other two architectures.

The rest of the paper is organized as follows. Section II provides the necessary background. The following section presents the related work. We outline our methodology in Section IV, and the next section presents the evaluation results. We discuss future research directions in Section VI, followed by the concluding remarks.

II. BACKGROUND

This section presents the necessary background on the operation of a vision-based drone and the three neural networks that we deploy in this work.

Drones Fundamentals. A drone is an Unmanned Aerial Vehicle (UAV) without any onboard pilot. The core component is its flight controller hardware, which consists of various sensors like an accelerometer and a gyroscope, and firmware to control the drone movement. The drone control system also requires a ground base station consisting of essential software to install and set up the firmware at the flight controller. It also calibrates different parts of the drone. The base station and the flight controller communicate over a standard protocol called the Micro Air Vehicle Communication Protocol (MAVLink) [11].

In a vision-based drone control system, we need another component to capture and process the gestures. There are two approaches to capture gestures: a front-facing camera mounted on a drone or a camera connected to the ground base station. In this study, we consider the second option to capture gestures. For example, a Leap Motion Controller can capture the gestures to feed to the learning-based image processing unit. A trained CNN model then classifies the captured gestures to send an appropriate control signal from the base station to the flight controller over MAVLink. The entire vision-based drone control system is depicted in Fig. 1.

Fig. 1: An example of a gesture-controlled drone system.

Deep Learning Fundamentals. The deep learning methodology is mostly based on artificial neural networks, which are computational models inspired by the human brain structure. The neurons are usually organized into several layers, which are the core entity of a neural network. Each layer has a collection of neuron nodes, where the information processing takes place. The information is transferred from one layer to another over connecting channels called weighted channels, and each neuron includes a unique bias. The bias is added to the weighted sum of inputs reaching the neuron, which later passes through an activation function to activate the neuron and compute the output value. The output of one layer connects to the next one until it reaches the final output layer. Each neuron can have one or more output connections that pass information as signals to the next layer. The weights and biases are adjusted throughout the network to produce a well-trained neural network, which can recognize patterns.

The forward propagation of information through a neuron combines a set of inputs x1, x2, x3, ..., xm in Fig. 2, which have corresponding weights w1 through wm, respectively. The weighted sum of the inputs passes through a non-linear activation function to produce the final output y. The bias allows the shift of the activation function to the left or right regardless of the input. The Sigmoid function, Hyperbolic Tangent, Rectified Linear Unit, and SoftMax are common types of non-linear activation functions in neural networks.
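In symbols, the neuron computation just described takes the standard form (our restatement of the textbook artificial-neuron model, not notation from the paper itself):

```latex
% Forward propagation through one artificial neuron: the weighted sum
% of the inputs plus the bias, passed through an activation function f.
y = f\left( \sum_{i=1}^{m} w_i x_i + b \right)
```

Here x1 through xm are the inputs, wi the weights, b the bias, and f the activation function, e.g., the ReLU f(z) = max(0, z) or the sigmoid f(z) = 1/(1 + e^{-z}).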

Fig. 2: An example of artificial neuron computation.

Each neuron's value is calculated for each hidden layer based on the weight, bias, and activation function [12]. The activation function introduces non-linearities into the network, allowing it to approximate arbitrarily complex functions as decision boundaries and making the neural network a powerful method to detect classes. The number and arrangement of layers distinguish neural network architectures. The neural network is the most efficient method to process unstructured data, such as images. As CNN is a widely utilized deep learning method to analyze visual images, this paper focuses on comparing three different CNN architectures for gesture recognition, which can be implemented in the drone control system.

Simonyan and Zisserman proposed Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet) [13], which outperformed AlexNet, the previous state of the art. The authors showed that having more layers with smaller convolutional kernels can increase a neural network's accuracy. However, optimizing models with many layers is inherently difficult because of the vanishing and exploding gradient problems. Residual networks (ResNet) [14] address this gradient problem. As the name suggests, residual functions form the building block for these models (see Fig. 4). Explicitly adding the identity function to deeper convolutional layers appears to be beneficial during training and improves accuracy. The ResNet-50 network achieves this by adding a skip connection with an identity mapping that adds the previous layer's output x to the learned residual, approximating the final function F(x) + x.

III. RELATED WORK

In this section, we present and discuss existing literature related to our proposed design.

Deep neural network-based image classification for drone operation is gaining momentum in academia and industry. Industries have started implementing drones in their services and proposing to build advanced drone support systems. For instance, the Google AI Blog announces a new framework for hand and finger tracking, which can be used for real-time hand perception experiments [15]. They also introduce another solution to train their provided model to recognize images, poses, or sound [16]. However, these designs are not for drone applications.

We notice a couple of drone control systems using neural networks, specifically CNNs. Hu and Wang [10] present a state-of-the-art gesture recognition system to control a UAV. In their experiment, the authors focus on three deep neural networks to recognize dynamic hand gestures. They use skeleton data from a leap motion controller and split the data among training, validation, and testing. The former dataset is used to train three neural networks: 2-layer and 5-layer neural networks and an 8-layer convolutional neural network. The evaluation results reveal that the CNN offers the highest accuracy among the three models and can be used to control a drone in real-time.

Hadri proposes a similar drone control system deploying VGG-16 as the image classifier [11]. In particular, the system uses the Single Shot MultiBox Detector (SSD), which is based on VGG-16, to detect hand gestures. The gestures are grouped as one to five fingers and a closed wrist. The author also discusses different hardware- and software-based drone controlling schemes. However, their accuracy is lower than the one we report in this work. We compare and contrast simple and complex CNNs to allow users to select an appropriate model based on their available data volume and resources.

IV. METHODOLOGY

This section first describes the data collection process. Then, we focus on presenting data preprocessing and decomposition for model training and testing.

A. Data Collection and Preprocessing

We captured images (our dataset) around Halifax, Nova Scotia, Canada, in indoor and outdoor settings, where both long- and short-distance images are considered to increase posture trajectory. We captured a total of 544 images in different light conditions, where 17 people participated in the image capturing. The dataset includes four types of gestures: forward, stop, right, and left. Each person provided a total of 32 images in 8 different lights for each gesture. We used both mobile and web cameras to capture images, saved in JPG format. Then, we rescaled the collected images of different sizes into 128 × 128 pixels as part of the preprocessing. We then split the 544 images into 80% and 20% for training and testing, respectively. In particular, we use 344, 88, and 68 images for training, testing, and validation, respectively. Fig. 3 shows samples of the forward, stop, right, and left gesture images. Thus, our CNN models need to detect four classes of hand gestures.

Fig. 3: Examples of different gestures.
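As a concrete illustration of this preprocessing, the sketch below loads and rescales the images with Keras. It is our reconstruction, not the authors' released code; the directory layout (one subfolder per gesture class under `data/train` and `data/val`) is an assumption.

```python
# Minimal sketch of the 128x128 rescaling and train/validation loading
# described in Section IV-A, using Keras' ImageDataGenerator.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (128, 128)   # target resolution used in the paper
BATCH = 32              # batch size used for simple CNN and VGG-16

datagen = ImageDataGenerator(rescale=1.0 / 255)  # normalize pixel values

# Assumed layout: data/train/<class>/*.jpg and data/val/<class>/*.jpg,
# with the four classes: forward, stop, right, left.
train_gen = datagen.flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="categorical")
val_gen = datagen.flow_from_directory(
    "data/val", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="categorical")
```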

B. Deep Learning Models

In our experiment, we started with pre-trained VGG-16 and ResNet-50 models. However, we created the simple CNN based on a basic CNN architecture. We calculated model parameters, including the number of epochs, the optimizer, the number of dense layers, the neurons in the dense layers, the activation function, and a dropout layer to deal with the model overfitting problem. For VGG-16, we started with a pre-trained VGG architecture with 18 weight layers as input, one pooling layer, and three dense layers; however, to fit our problem, we changed the output layer into four classes: right, left, forward, and stop. In the original VGG-16 architecture, the width of the convolutional layers (the number of channels) is relatively small. It starts from 64 in the first layer and then increases by a factor of 2 after each max-pooling layer until it reaches 512. We followed the same procedure for ResNet-50 (see Fig. 4), i.e., we used a pre-trained ResNet model and kept the same number of input layers while changing the output layers based on our classification requirement.

Fig. 4: ResNet-50 architecture.

C. Training and Testing

We did not transform the images into gray-scale; instead, we used the color ones. After the preprocessing step, which was done using the Keras library, the training data is used to build the different CNN models. In the simple CNN, the 128 × 128 dimension was set as the width and height, and the Conv2D layer type was used. We used the ReLU function as the activation function for the hidden layers and the SoftMax activation function in the final output layer for this multiclass problem. We trained a total of 831780 parameters during the training of the simple CNN. A similar process has been followed for the VGG-16 model; however, the default image size for VGG-16 is set to 224 × 224. VGG-16 is a published pre-trained model that was trained on ImageNet and showed excellent accuracy results. We simply imported that model; however, the output layer is modified to fit our problem of four classes. For the VGG-16 model, a total of 527364 trainable parameters were available to learn within the network. The Adam optimization algorithm was used to optimize and train the deep neural network, with a batch size of 32. Likewise, ResNet-50 used a pre-trained model built on ImageNet with 128 × 128 width and height and a batch size of 16, half of that of the VGG-16 model, resulting in 34609156 learnable parameters.

All three models used the same activation functions, and ResNet-50 and VGG-16 use the same optimizer. After training all three deep neural network models, the accuracy of each model is tested using the 88 testing samples, which were separated from the training dataset to ensure the model validity is assessed on an unknown dataset. However, before examining the models using the testing dataset, the validation dataset is used for the VGG-16 and ResNet-50 models to reduce the overfitting problem. Testing the models helped us understand the learning rate and the need for epoch-size adjustment. We used an epoch size of 30 for the simple CNN and VGG-16; however, the epoch size for the ResNet-50 model was adjusted during the experiment for an improved result. Multiple epochs update the weights of the model as the network biases and weights are trained to ascertain the final output layer. The accuracy and validation increased at the end of epoch training for all the models, which will be discussed in the next section.

V. EVALUATION

In this section, a complete discussion of the three trained models is presented in terms of accuracy, error, and robustness of each model.

A. Model Comparison

It is essential to fine-tune weights to reconcile the difference between the actual and predicted outcomes, as the weights decide how quickly the activation function will react. We can adjust weights using back-propagation to improve the prediction accuracy by calculating the loss function gradient. The validation and training errors are the two main factors that determine the training time. Model training usually continues while these two errors keep dropping. An increase in the validation error, however, indicates overfitting. Usually, when the validation error starts to increase, the training needs to be terminated. Another factor is the number of epochs, which depends on the dataset, and it controls the number of complete passes of the learning algorithm through the training dataset. The epochs allow the learning algorithm to continue looping until it reaches the minimal model error. In our experiment, the number of epochs is chosen based on the accuracy and validation loss, which is 30. However, this hyperparameter can be tested for different epoch counts to compare validation losses of a single model for a better comparison.
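To make the training setup of Sections IV-B and IV-C and the early-stopping heuristic above concrete, the following Keras sketch builds a four-class VGG-16 transfer model and trains it with validation-based early stopping. This is our illustrative reconstruction, not the authors' code; the classification-head width, dropout rate, and early-stopping patience are assumptions.

```python
# Illustrative sketch (not the authors' code): a pre-trained VGG-16
# backbone with a new four-class head, trained with early stopping so
# training halts once validation loss starts to rise (Section V-A).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model

CLASSES = ["forward", "stop", "right", "left"]  # the four gestures

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))  # VGG-16's default 224x224 input
base.trainable = False  # keep the pre-trained convolutional weights

x = Flatten()(base.output)
x = Dense(128, activation="relu")(x)   # assumed head size
x = Dropout(0.5)(x)                    # assumed dropout rate
out = Dense(len(CLASSES), activation="softmax")(x)
model = Model(base.input, out)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stop when validation loss stops improving, as described in Section V-A.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
# train_gen / val_gen as in the Section IV-A sketch (with generators
# resized to 224x224 for VGG-16):
# model.fit(train_gen, validation_data=val_gen, epochs=30,
#           callbacks=[early_stop])
```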

TABLE I: Comparison of the Three Models

  Model                 Simple CNN      VGG-16      ResNet-50
  Training accuracy     0.97            1           0.9389
  Training loss         0.119209e-08    0.005770    0.6531
  Validation accuracy   0.9250          0.9545      0.8295
  Validation loss       0.11921e-08     0.1556      0.7129
  Optimizer             rmsprop         adam        adam

TABLE II: Simple CNN Classification Performance.

              Precision   Recall   F1-Score   Support
  Forward     0.70        0.33     0.29       21
  Left        0.29        0.43     0.34       23
  Right       0.00        0.00     0.00       22
  Stop        0.07        0.05     0.05       22
  Avg/Total   0.15        0.20     0.17       88

TABLE III: VGG-16 Classification Performance.

              Precision   Recall   F1-Score   Support
  Forward     0.16        0.14     0.15       21
  Left        0.27        0.26     0.27       23
  Right       0.17        0.18     0.17       22
  Stop        0.30        0.32     0.31       22
  Avg/Total   0.23        0.23     0.23       88

TABLE IV: ResNet-50 Classification Performance.

              Precision   Recall   F1-Score   Support
  Forward     0.26        0.33     0.45       21
  Left        0.58        0.83     0.68       23
  Right       0.67        0.55     0.60       22
  Stop        0.78        0.67     0.65       22
  Avg/Total   0.68        0.67     0.65       88
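For reference, the standard definitions behind the per-class numbers in Tables II, III, and IV (our restatement; the paper defers to [17]) are:

```latex
% Precision, recall, and F1 in terms of true/false positives/negatives.
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{P \cdot R}{P + R}
```

These per-class values can be reproduced from a trained model's test-set predictions with scikit-learn; the minimal sketch below is our illustration, assuming a trained Keras `model` and a non-shuffled test generator built like the ones in the Section IV-A sketch:

```python
# Illustrative sketch: compute per-class precision/recall/F1 tables
# (as in Tables II-IV) from a trained model's test-set predictions.
import numpy as np
from sklearn.metrics import classification_report

# `model` and `test_gen` are assumed from the earlier sketches; the
# generator must not shuffle so predictions align with test_gen.classes.
y_prob = model.predict(test_gen)       # softmax scores, shape (88, 4)
y_pred = np.argmax(y_prob, axis=1)     # predicted class index per image
y_true = test_gen.classes              # ground-truth class indices

# Class names must follow the generator's index order (alphabetical by
# folder name for flow_from_directory).
names = list(test_gen.class_indices.keys())
print(classification_report(y_true, y_pred, target_names=names))
```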

Table I provides the accuracy and loss comparison for the three models presented in this paper. The results confirm that the simple CNN offers the lowest training loss, but VGG-16 has 100% (i.e., 1.0) accuracy on its training data. In the simple CNN, the accuracy increases with the increasing epoch count, whereas the validation accuracy decreases, which indicates the model fits the training set better. Furthermore, it has the lowest validation loss compared to the other two models and offers 92.5% accuracy on the testing dataset. Thus, the simple CNN is the best fit to recognize gesture images, better than the other two models, on a small dataset like ours.

Furthermore, compared to VGG-16 and ResNet-50, the simple CNN offers better precision and recall values, as presented in Tables II, III, and IV. Model accuracy represents the number of accurate predictions; however, accuracy alone is not the right predictor to ensure the validity and reliability of a model. We need to dive deeper into the confusion matrix to comprehend other classification metrics. It is very crucial to account for the numbers of correct and incorrect predictions. We calculate precision to capture the correctness of the model's positive predictions, and recall to capture the model's reliability in finding positive instances. The combination of precision and recall is represented by the F1 score, which provides a weighted average of precision and recall [17]. We can understand the number of actual occurrences of a class in the specified dataset by checking the support. The simple CNN has better precision for the forward and stop gestures, with an average 68% error rate for labeling a negative instance as positive. Table II presents that the simple CNN results in a 67% recall value, which means that in 68% of the cases, the simple CNN can find positive instances compared to the other two models. Table II also shows that the left and stop gestures' positive instances are identified by the simple CNN at rates of 83% and 95%, respectively.

We also investigated the training and validation accuracy and loss for all three models, where we observe an upward accuracy trend at the end of the epochs for both training and testing data. However, VGG-16 shows the highest accuracy for both testing and training datasets compared to the other two models. The loss presents a fascinating insight, mainly for the simple CNN. The loss value for the simple CNN is better than for VGG-16 and ResNet-50. These two models show fluctuations in loss value for both training and testing datasets; however, the simple CNN presents similar and much lower loss values for training and testing. The validation and accuracy curves can also signal when to stop adding epochs, namely when the training and testing values start to depart consistently. These accuracy and loss values also support the simple CNN as the more accurate gesture detection algorithm in our case.

Fig. 5 presents the prediction performance of the three models, which indicates that all three models can accurately predict different gestures. In particular, we present the prediction outcomes of the left and stop gestures, which are predicted correctly.

Fig. 5: Gesture prediction of the three models.

B. Model Test in a Simulator

In this experiment, we integrate the trained models in a simulator using the Python Turtle library. The Python Turtle module provides a drawing window where shapes can be drawn with simple repetitive moves. Fig. 6 presents the simulation process: shapes are moving in the Turtle simulation environment during the implementation process. All four gestures, right, left, stop, and forward, are identified and labeled by the Windows command prompt and the Turtle simulation. We also plan to use graphical software in this simulator in the future. Finally, we plan to extend this simulation to a testbed deployment, which we discuss in the following section.
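The paper does not publish its simulator script, so the following is a minimal sketch of how such a loop could look, assuming a trained Keras `model` and a hypothetical `capture_frame()` helper that returns a preprocessed (1, 128, 128, 3) frame:

```python
# Illustrative sketch (not the authors' script): drive the Turtle
# drawing window from recognized gestures, mimicking Section V-B.
import numpy as np
import turtle

GESTURES = ["forward", "left", "right", "stop"]  # assumed index order
t = turtle.Turtle()

def act(gesture):
    """Translate a recognized gesture into a simple repetitive move."""
    if gesture == "forward":
        t.forward(20)
    elif gesture == "left":
        t.left(90)
    elif gesture == "right":
        t.right(90)
    # "stop" draws nothing

# while True:                                    # main recognition loop
#     frame = capture_frame()                    # hypothetical helper
#     idx = int(np.argmax(model.predict(frame)))
#     print("Recognized:", GESTURES[idx])        # command-prompt label
#     act(GESTURES[idx])
```

In the testbed extension discussed in Section VI, act() would instead issue the corresponding commands to the flight controller, e.g., over MAVLink or through Parrot's Olympe Python interface [18].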

Fig. 6: Gesture prediction in a simulation environment.

VI. DISCUSSION

In this section, we outline a couple of limitations of our work and discuss how to mitigate them in future research.

Dataset. The current dataset includes around 550 gestures, as we could not collect more because of difficulties reaching out to more participants. We did not use a publicly available dataset, as our goal is to generate a dataset for the target applications from diverse participants (by gender, age, height, etc.). CNN models are usually trained on large datasets, especially the ResNet-50 and VGG-16 models. Thus, we plan to conduct another data collection cycle to gather thousands of gestures from a wide range of participants. We will then test these architectures to see the impact of the dataset size on their performance. Another plan is to extensively explore different hyperparameters and epochs for the three models; as they have different numbers of hidden layers, they may converge to the optimal solution for different hyperparameters and epochs.

CNN based drone system. We plan to integrate the proposed gesture recognition model in a simulator and a real testbed. For example, we can consider the system presented in Fig. 7. In that system, a leap motion controller acts as a gesture sensor to capture hand images. These images then feed into the proposed CNN-based image recognition module. The model outcome next goes to a Raspberry Pi serving as a ground station. The entire controlling system can be managed using a Python script. For example, Olympe by Parrot [18] provides a Python interface to connect to and control a drone in a simulator and a testbed, so users can write their own control applications. We can use a customized Python script to control a drone (e.g., a Parrot AR drone). The drone will have a Raspberry Pi controller attached to it, which will process all the gesture commands via WiFi networks. We plan to test the proposed system with the integrated deep NN-based gesture recognition module.

Fig. 7: A drone control system.

VII. CONCLUSION

In this paper, we have focused on gesture-controlled drone operation, where gestures are composed of hand images. The core of this drone control system is its gesture recognition component. The accuracy of the recognition system has a crucial impact on the drone movement. CNN is widely recognized as an accurate image recognition solution. Thus, we have implemented and compared three CNN architectures: simple CNN, VGG-16, and ResNet-50. We have considered the most common hand gestures, right, left, forward, and stop, to evaluate the performance of these architectures. The hand gestures were collected from different participants in diverse environments (e.g., indoors and outdoors). We have tested the trained models over the Turtle simulator to recognize hand gestures in real-time. The evaluation results have shown that the simple CNN has the best performance. The accuracy of the training dataset for VGG-16 is 100%, whereas its validation accuracy and loss value are worse compared to the simple CNN. We have also proposed a drone controlling system consisting of a leap motion controller, a gesture recognition module, and a Raspberry Pi to control an AR drone, which we envision testing as part of future work.

ACKNOWLEDGEMENTS

We would like to thank the anonymous AnServApp reviewers for their constructive feedback. Also, we would like to thank Dr. Sageev Oore from Dalhousie University for his useful comments.

REFERENCES

[1] N. Joshi, "10 stunning applications of drone technology," February 27, 2019. [Online]. Available: https://www.allerin.com/blog/10-stunning-applications-of-drone-technology
[2] D. Joshi, "Drone technology uses and applications for commercial, industrial and military drones in 2020 and the future," December 18, 2019. [Online]. Available: https://www.businessinsider.com/drone-technology-uses-applications
[3] G. Jeremy, "7 reasons why drones are the future of business," May 05, 2018. [Online]. Available: https://www.inc.com/jeremy-goldman/7-reasons-why-drones-are-future-of-business.html

[4] F. Stephen, "The next generation 3DR Solo smart drone takes flight," September 16, 2016. [Online]. Available: https://www.businessinsider.com/drone-technology-uses-applications
[5] A. Kumar, A. Singha, A. Swarupa, and D. Singh, "Vision based rail track extraction and monitoring through drone imagery," ICT Express, vol. 5, no. 4, pp. 250–255, December 2019.
[6] F. Jonathan, "7 reasons to choose the DJI Spark," May 20, 2020. [Online]. Available: https://dronerush.com/reasons-to-choose-dji-spark-9807/
[7] P. Mishra, "Why are convolutional neural networks good for image classification?"
[8] Edge AI and Vision Alliance, "Vision processing opportunities in drones," September 15, 2016. [Online]. Available: https://www.edge-ai-vision.com/2016/09/vision-processing-opportunities-in-drones/
[9] A. Bonner, "The complete beginner's guide to deep learning: Convolutional neural networks and image classification," February 2, 2019. [Online]. Available: https://towardsdatascience.com/wtf-is-image-classification-8e78a8235acb
[10] B. Hu and J. Wang, "Deep learning based hand gesture recognition and UAV flight controls," International Journal of Automation and Computing, vol. 17, pp. 17–29, 2020.
[11] S. Hadri, "Hand gesture for drone control using deep learning," 2018.
[12] N. Kang, "Introducing deep learning and neural networks — deep learning for rookies (1)."
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[15] V. Bazarevsky and F. Zhang, "Google AI Blog," August 19, 2019. [Online]. Available: https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html
[16] Google, "Teachable Machine," 2019. [Online]. Available: https://teachablemachine.withgoogle.com/
[17] J. Brownlee, "What is a confusion matrix in machine learning," August 15, 2020. [Online]. Available: https://machinelearningmastery.com/confusion-matrix-machine-learning/
[18] Parrot Developers, "Parrot SDK," n.d. [Online]. Available: https://developer.parrot.com/
