Automated Hand Gesture Recognition using a Deep Convolutional Neural Network model

Ishika Dhall, Shubham Vashisth, Garima Aggarwal
Department of Computer Science & Engineering
Amity University, Uttar Pradesh, India
[email protected] [email protected] [email protected]

Abstract—The tremendous growth in the domain of deep learning has helped in achieving breakthroughs in computer vision applications, especially after convolutional neural networks came into the picture. The unique architecture of CNNs allows them to extract relevant information from the input images without any hand-tuning. Today, with such powerful models, we have considerable flexibility to build technology that may ameliorate human life. One such technique can be used for detecting and understanding various human gestures, as it would make human-machine communication effective. This could make conventional input devices like touchscreens, mouse pads, and keyboards redundant. Also, it is considered a highly secure technology compared to other devices. In this paper, hand gesture technology along with Convolutional Neural Networks has been explored, followed by the construction of a deep convolutional neural network to build a hand gesture recognition application.

Keywords—Convolutional Neural Networks, Hand Gesture Recognition system, Feature Map, Deep neural network

I. INTRODUCTION

A gesture is a body movement that conveys a meaningful message. Gesture recognition is a computer science technology that helps users interact with their digital devices using simple and natural body gestures. Gesture recognition technology can be beneficial in many places, such as automated home appliances, hand signal interpretation [1], automobiles, etc. Hand gesture recognition is a part of gesture recognition that is based on recognizing the movements of hands and the meaning they are meant to deliver; for example, showing a forefinger could denote the number "1", or a thumbs up could be an indication of agreement.

Deep learning is part of the wider family of Artificial Intelligence methods. It essentially builds on the concept of multi-layer perceptron learning. A Convolutional Neural Network, commonly known as a ConvNet, is a neural network class used in deep learning which is most commonly applied to images and videos for their analysis. A CNN is a technique, or a machine learning model, that can be applied to images to make them interpretable by machines. It can be implemented in other data analysis and classification problems as well. It is a type of artificial neural network that specializes in detecting, distinguishing, and understanding patterns. It differs from other deep learning models in that it has an extra set of hidden layers, called the convolutional layers, along with the standard hidden layers. It can have one or more convolutional layers followed by fully connected layers. The system learns features from each gesture and then classifies it. The entire notion of making a machine learn and making it smart is based on the abundance of data or information.

Data fuels the machines and is used to make the machine learn to make predictions. The main aim of this paper is to train an algorithm that can classify images of various hand gestures and signs such as thumbs up, clenched fist, finger count, etc. Since visual imagery is being analyzed, the class of deep learning model used is the Convolutional Neural Network, implemented with Keras and TensorFlow, as it is a regularized version of a multilayer perceptron.

This research provides the reader with in-depth knowledge of a deep convolutional neural network. Also, this paper uses data captured using the OpenCV library, which will contribute to improving the accuracy score of existing hand gesture recognition techniques.

In this research, a real-time anti-encroaching hand gesture recognition and hand tracking mechanism has been proposed which will improve human-computer interaction and bring ease to those who rely on gestures for their day-to-day communication. It can be a significant communication tool for deaf people and people with ASD (autism spectrum disorder). It can also be of great assistance for SOS signaling.

II. LITERATURE REVIEW

A technique of hand gesture recognition for a video game-based application has been proposed in [1].

978-1-7281-2791-0/20/$31.00 ©2020 IEEE 811
A new algorithm has been discussed in [1] to recognize and track hand gestures for better interaction with a video game. It consists of four hand gestures and four hand-direction classes to fulfil its requirements, and it could have been extended to make it more powerful. It uses segmentation and tracking [2]. The proposed algorithm was evaluated on 40 samples, and the accuracy turned out to be quite impressive.

The use of a convolutional neural network to reduce the feature extraction process and the number of parameters has been discussed in [2]. There, hand gesture recognition is performed using a convolutional neural network, whereas our paper presents a deep convolutional neural network implementation. The results shown in [3] are very impressive when a training set of 50% of the database is used. That work uses a Max Pooling Convolutional Neural Network to advance human-robot interaction using color segmentation and edge blurring with morphological digital image processing, and then experiments with mobile robots using an ARM 11 533 MHz processor [3].

They managed to get an accuracy score of around 96%. The vocabulary in the proposed project was up to 11 classes. It could have been used in a human-swarm interaction setting. Driver's hand gesture recognition via a 3D convolutional neural network was pursued in [4]. It employs spatial data augmentation and pre-processing techniques for better results.

They achieved a score of 77%, which could have been improved by constructing a deeper neural network. The challenges of using a 3D CNN to perform classification and detection on a given dataset were addressed in [5], which also introduces a challenging multi-modal dynamic dataset and achieved an accuracy of 83.8%.

III. PRELIMINARIES

A convolutional neural network is a class of deep learning neural network which is most commonly applied to images and videos for their analysis. It is a kind of artificial neural network built from perceptrons, a machine learning unit used for supervised problems.

A CNN is basically a technique, or a machine learning model, applied to images to make them interpretable by machines. It can have one or more convolutional layers followed by fully connected layers. Many types of cognitive tasks are performed using CNNs, such as natural language processing, image processing, etc. The concept of machine learning is not new: the first Artificial Intelligence-based programs came into play with a learned version of a game, and eventually an Artificial Intelligence program was built that understood natural language.

In Fig 1, the image given as the input is used to find primitive features such as horizontal and vertical lines.

[Figure 1 labels: PRIMITIVE FEATURES, OBJECT CLASS, BUILDING]
Fig 1. Working of a CNN Model

After the primitive feature extraction phase, the next stage determines the parts of the given object using the extracted features. Object parts are then used to interpret the class of the object.

Interfaces that need not be touched not only improve the driver's focus and prevent possible mishaps but also make devices much more user-friendly, which is why the implementation of such technologies in several control systems is preferred.

IV. METHODOLOGY

Fig 2 shows a convolutional neural network model and the various layers involved.

A CNN is a type of neural network that is equipped with certain specific layers, such as:

1. Input layer
2. Hidden layer:

812 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)


   2.1. A Convolutional Layer
   2.2. A Max Pooling Layer
   2.3. A Fully Connected Layer
3. Output layer

A convolutional layer is the first hidden layer of a deep convolutional network, and its key purpose is to detect and extract features from the input image, such as its edges and vertices. As an example, assume that we have an input image to be analyzed; the hidden convolutional layer helps the system find the edges of that image. To do so, we define a filter matrix, also called a 'kernel', which is used to produce a new image containing the edges. The filter is slid over the whole image with a certain stride, and at each position the dot product between the filter and the underlying pixels is computed, which yields a new image with just the edges. The edge image is useful for the initial steps and layers of a CNN; in fact, it is the very first set of primitive features in the hierarchy of all the features.

The convolution process is about detecting the edges, small patterns, and orientations present in an image, and it is expressed as a mathematical function. The image is converted into a matrix; in the binary case, the value 0 is assigned to all places where the image is black and 1 to all the white parts of the image. Commonly, values between 0 and 255 are used in this matrix, where the numbers represent different shades of grey in a grayscale image. A three-channel image is used for input images that are not in grayscale format but in color. The type of filter is selected randomly and the values of its matrix elements are initialized arbitrarily.

The values of the matrix elements are updated toward optimum values as the network goes through the training phase. The dot product of the pixels with the kernel values is taken, i.e. the element-wise multiplication of the kernel matrix and the image patch is summed; the kernel window is then slid over the image to produce a new matrix called the convolved image. Multiple kernels can be applied to one image, which results in multiple convolved matrices.

Using different kernels helps to find different patterns present in the image, such as curves, edges, etc. At the end of this layer, we receive a feature map, which is the output of the convolution process. The rectified linear unit (ReLU) [13] is a non-linear activation function that provides a mapping between the inputs and the response variables. The ReLU function (tf.nn.relu(x)) replaces all negative numbers with 0 and leaves all positive numbers unchanged.

In a pooling layer, the output of the ReLU function is down-sampled to perform dimensionality reduction on the activated neurons. The max pooling layer is used to perform this dimensionality reduction. Max pooling finds the maximum values in its input and simplifies the inputs. It reduces the number of parameters within the model and converts lower-level data into a piece of higher-level information. Fig 3 shows the process of extracting a feature map.

For example, a 2x2 matrix is reduced to a single pixel by choosing the maximum value of the matrix. The stride dictates the sliding behavior of the max-pooling process [6]. It prevents overlapping; for example, if the stride has a value of two, the window moves 2 pixels every time. A fully connected layer takes the high-level filtered output of the previous layer and converts it into a vector. Each matrix from the previous layer is first flattened into a one-dimensional vector, and each vector is then fully connected to the next layer through a weight matrix [7]. Softmax is an activation function that helps to find the class of each digit and generates a probability for each output.

The advantages of a CNN include being a strong and computationally fast model that is more efficient in terms of memory and shows accurate results on images.

The disadvantages of a CNN are that it is computationally expensive, quite complex, and requires a GPU and a large dataset, along with the problem of overfitting. It can also result in an imbalance in class. One part of the collected dataset is presented in Fig 4. It shows the images of the class showing '1'.
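The sliding-kernel step described for the convolutional layer can be illustrated with a small NumPy sketch. It is not the paper's implementation (the paper relies on Keras and TensorFlow); the function name `convolve2d`, the toy 5x5 image, and the hand-picked vertical-edge kernel are assumptions made for illustration, whereas in a trained CNN the kernel values are learned.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the
    element-wise products at each position, as done in a CNN layer."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy grayscale image (assumed): two dark columns followed by three bright ones.
image = np.array([[0, 0, 255, 255, 255]] * 5, dtype=float)

# Hand-picked kernel (assumed) that responds to dark-to-bright vertical edges.
vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-1, 0, 1],
                                 [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # large values where the edge is, 0 in the flat bright region
```

The response is high wherever the window straddles the dark-to-bright boundary and zero where the patch is uniform, which is the "image with just the edges" the text refers to.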

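The ReLU and max pooling steps can be sketched in the same way. This is a minimal NumPy illustration under assumed inputs (the 4x4 feature map below is made up); the pool size of 2 and stride of 2 follow the values the paper lists among its hyperparameters.

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and keep positive values unchanged."""
    return np.maximum(x, 0)

def max_pool2d(x, pool=2, stride=2):
    """Keep the maximum of each pool x pool window, moving by `stride`."""
    out_h = (x.shape[0] - pool) // stride + 1
    out_w = (x.shape[1] - pool) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = x[r:r + pool, c:c + pool].max()
    return pooled

# Made-up 4x4 feature map with positive and negative responses.
feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [-4.,  5., -6.,  7.],
                        [ 8., -1.,  2., -3.],
                        [ 0.,  4., -5.,  6.]])

activated = relu(feature_map)
pooled = max_pool2d(activated)
print(pooled)  # [[5. 7.] [8. 6.]]
```

With a stride equal to the pool size, the windows do not overlap, so the 4x4 map shrinks to 2x2 while the strongest activation in each region is kept.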

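The flatten, fully connected, and softmax steps can be sketched as follows. All sizes here are assumptions chosen for illustration (a 2x2 pooled output with 4 channels and 5 gesture classes), and the weights are random rather than trained, so the predicted class carries no meaning; the point is only the shape of the computation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# Hypothetical pooled output: 2x2 spatial grid with 4 feature maps.
pooled_maps = rng.random((2, 2, 4))

# Flatten into a one-dimensional vector before the fully connected layer.
flat = pooled_maps.reshape(-1)

# Fully connected layer: one row of weights per class (untrained, random here).
num_classes = 5
weights = rng.normal(size=(num_classes, flat.size))
bias = np.zeros(num_classes)

probabilities = softmax(weights @ flat + bias)
predicted_class = int(np.argmax(probabilities))
print(probabilities, predicted_class)
```

The class with the highest probability is taken as the prediction; during training, these probabilities are what the loss function compares against the true label.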

Fig. 4. Dataset of one of the labels

All the necessary libraries, such as Keras and TensorFlow, were imported, and the data set was gathered using the OpenCV library. This was followed by data augmentation, an approach that allows practitioners to significantly increase the variety of data available for model training without collecting much more data.

V. RESULT

The experimental results of this paper show that the proposed model can distinguish among several dominant and low-level features of the input images and can classify various hand gestures [8] with greater accuracy and a negligible model loss of 0.0504.

Fig 5(a) illustrates the hand gesture showing the value "1", 5(b) represents "2", 5(c) represents "3", 5(d) represents "4" and 5(e) represents "5" being captured and detected successfully.

Fig 5. Hand gesture detection, panels (a) to (e)

Fig 6(a), 6(b), 6(c), 6(d) and 6(e) represent the pre-processed threshold images of "1", "2", "3", "4" and "5" respectively.

Fig 6. Threshold images, panels (a) to (e)
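To make the data augmentation step concrete, the sketch below applies three of the settings the paper reports (rescaling by 1/255, a width shift range of 0.2, and horizontal flipping) using plain NumPy. It is a simplified stand-in, not the Keras pipeline the authors used: rotation and height shift are omitted, vacated pixels are filled with zeros, and the helper names and the random 8x8 `frame` are assumptions.

```python
import numpy as np

def rescale(image, factor=1.0 / 255):
    """Map 8-bit pixel values into the range [0, 1]."""
    return image.astype(np.float32) * factor

def horizontal_flip(image):
    """Mirror the image left to right."""
    return image[:, ::-1]

def shift_width(image, fraction):
    """Shift columns by a fraction of the width; vacated pixels become 0."""
    width = image.shape[1]
    shift = int(round(fraction * width))
    shifted = np.zeros_like(image)
    if shift > 0:
        shifted[:, shift:] = image[:, :width - shift]
    elif shift < 0:
        shifted[:, :width + shift] = image[:, -shift:]
    else:
        shifted[:] = image
    return shifted

def augment(image, rng, shift_range=0.2):
    """Produce one randomly flipped and shifted variant of a rescaled image."""
    out = rescale(image)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    fraction = rng.uniform(-shift_range, shift_range)
    return shift_width(out, fraction)

rng = np.random.default_rng(seed=0)
# Random 8x8 array standing in for a captured hand image (assumed).
frame = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
augmented = augment(frame, rng)
print(augmented.shape, augmented.min(), augmented.max())
```

Calling `augment` repeatedly on the same frame yields different variants, which is how augmentation increases the variety of the training data without new captures.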

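The effect of the kernel size (3, 3), pool size (2, 2), and strides (2, 2) from Table 1 on the feature-map dimensions can be traced with a short calculation. The paper does not state the input resolution, the padding mode, or the exact layer order, so the 64x64 input, the unpadded convolutions with stride 1, and the conv-then-pool arrangement with 32 and 64 filters are assumptions for illustration only.

```python
def output_size(n, window, stride=1):
    """Spatial size after sliding a window of the given size without padding."""
    return (n - window) // stride + 1

def trace_shapes(input_size, filters=(32, 64), kernel=3, pool=2, pool_stride=2):
    """Follow the feature-map shape through conv + max-pool blocks."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size = output_size(size, kernel, stride=1)       # convolution
        shapes.append(("conv", size, size, n_filters))
        size = output_size(size, pool, stride=pool_stride)  # max pooling
        shapes.append(("pool", size, size, n_filters))
    flattened = size * size * filters[-1]
    return shapes, flattened

# Assumed 64x64 single-channel input image.
shapes, flattened = trace_shapes(64)
for layer in shapes:
    print(layer)
print("flattened vector length:", flattened)
```

Under these assumptions the 64x64 input shrinks to 62, 31, 29, and finally 14 pixels per side, so the dense layer with 128 nodes would receive a vector of 14 * 14 * 64 = 12544 values.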


Before actual detection, the images to be trained on are segmented. Afterwards, features [9] are extracted and reduced (finding the significant features and removing the unnecessary details that may create noise or reduce accuracy). Fig 7 presents a graph of the results obtained from the presented model.

Fig 7. Result accuracy graph of the model

The model was able to achieve a testing accuracy of 99.13%, which was much higher than the existing models. Reaching such a high accuracy was a tough job, but it can be achieved by tuning various hyperparameters [10] and by performing proper data pre-processing and augmentation.

Data pre-processing and data augmentation values:

Rescaling = 1./255
Rotation range = 20
Width shift range = 0.2
Height shift range = 0.2
Horizontal flip = True

Techniques like rescaling, cropping, padding, and horizontal flipping are very common for training dense neural networks. Data pre-processing [11] is also an important step, mostly used in cases where we need to reduce the amount of data. It reduces the number of attributes, the number of attribute values, and the number of tuples. The optimal values found when testing the accuracy of the model [12] were as mentioned in the results above.

Here, Table 1 presents the values of the various parameters that were tuned. The model was tested and its validation score was calculated. The resultant deep convolutional network model comprises seven hidden layers and tuned hyperparameters [14] for our application.

Table 1. Optimal hyperparameters for the CNN

Parameters                                   Value
Hidden layers                                7
Dropout layer                                1
Number of filters per convolutional layer    32, 64
Nodes in dense layer                         128
Batch size                                   32
Activation function                          ReLU
Optimizer                                    Adam
Epochs                                       30
Kernel size                                  (3, 3)
Size of pool                                 (2, 2)
Strides                                      (2, 2)

VI. CONCLUSION AND FUTURE WORK

This paper discusses and offers a state-of-the-art deep multi-layer Convolutional Neural Network for performing hand gesture recognition in Human-Robot Interaction systems. It is an efficient model for image data when tuned properly and combined with proper image pre-processing [15]. While detecting gestures in live video or static images, the images and labels that were fed to and trained into the model are used to compare the output. Its evident ability to solve the invariance problem of recognizing gestures despite all the noise and complications is hard to beat. Also, it has been tested with real hand gesture images, and despite large numbers and extensive structural overlapping of noise [16], the program worked well and provided a decent level of accuracy. This work can be extended further by adding more functionality to the model and by increasing the number of classes.

The above-mentioned points set it in persuasive distinction to the proposed recognition mechanisms [17] which only work on simple images having fewer varying objects to be distinguished [18]. Therefore, this work is a foot in the door. Its remaining complications are meant to be resolved progressively.

REFERENCES

[1] Pigou, Lionel, et al. "Sign language recognition using convolutional neural networks." European Conference on Computer Vision. Springer, Cham, pp. 572-578, 2014.



[2] Li, Gongfa, et al. "Hand gesture recognition based on convolution neural network." Cluster Computing, pp. 2719-2729, 2017.
[3] Nagi, Jawad, et al. "Max-pooling convolutional neural networks for vision-based hand gesture recognition." 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). IEEE, pp. 342-347, 2011.
[4] Molchanov, Pavlo, et al. "Hand gesture recognition with 3D convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-7, 2015.
[5] Molchanov, Pavlo, et al. "Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4207-4215, 2016.
[6] Tolias, Giorgos, Ronan Sicre, and Herve Jegou. "Particular object retrieval with integral max-pooling of CNN activations." arXiv preprint arXiv:1511.05879, 2015.
[7] Ren, Zhou, et al. "Robust part-based hand gesture recognition using kinect sensor." IEEE Transactions on Multimedia 15.5, pp. 1110-1120, 2013.
[8] Nishihara, H. Keith, et al. "Hand-gesture recognition method." U.S. Patent No. 9,696,808. 4 Jul. 2017.
[9] Chaudhary, Anita, and Sonit Sukhraj Singh. "Lung cancer detection on CT images by using image processing." 2012 International Conference on Computing Sciences. IEEE, pp. 142-146, 2012.
[10] Binh, Nguyen Dang, Enokida Shuichi, and Toshiaki Ejima. "Real-time hand tracking and gesture recognition system." Proc. GVIP, pp. 19-21, 2005.
[11] Murthy, G. R. S., and R. S. Jadon. "A review of vision based hand gestures recognition." International Journal of Information Technology and Knowledge Management 2.2, pp. 405-410, 2009.
[12] Manresa, Cristina, et al. "Hand tracking and gesture recognition for human-computer interaction." ELCVIA Electronic Letters on Computer Vision and Image Analysis 5.3, pp. 96-104, 2005.
[13] Agarap, Abien Fred. "Deep learning using rectified linear units (relu)." arXiv preprint arXiv:1803.08375, 2018.
[14] Wang, Binghui, and Neil Zhenqiang Gong. "Stealing hyperparameters in machine learning." 2018 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 36-52, 2018.
[15] Poostchi, Mahdieh, et al. "Image analysis and machine learning for detecting malaria." Translational Research 194, pp. 36-55, 2018.
[16] Santhanam, T., and S. Radhika. "A Novel Approach to Classify Noises in Images Using Artificial Neural Network." 2010.
[17] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556, 2014.
[18] Kim, Youngwook, and Brian Toomajian. "Hand gesture recognition using micro-Doppler signatures with convolutional neural network." IEEE Access 4, pp. 7125-7130, 2016.
