
Hand Gesture Recognition Using Deep Learning

Soeb Hussain and Rupal Saxena
Department of Chemistry
Indian Institute of Technology, Guwahati
Guwahati, India
[email protected] and [email protected]

Xie Han, Jameel Ahmed Khan, Prof. Hyunchul Shin
Dept. of Electronics and Communication Engineering
Hanyang University
Sangnok-gu, Korea
[email protected] and [email protected]

Abstract—In order to offer new possibilities for interacting with machines and to design more natural and more intuitive interactions with computing machines, our research aims at the automatic interpretation of gestures based on computer vision. In this paper, we propose a technique which commands a computer using six static and eight dynamic hand gestures. The three main steps are: hand shape recognition, tracing of the detected hand (if dynamic), and converting the data into the required command. Experiments show 93.09% accuracy.

Keywords—computer vision, deep learning, hand gesture, neural network, transfer learning, hand gesture recognition.

I. INTRODUCTION

Gesture recognition is the mathematical interpretation of a human motion by a computing device. Modern research on the control of computers is shifting from standard peripheral devices to remotely commanding computers through speech, emotions and body gestures [1]. Our application belongs to the domain of hand gesture recognition, which is generally divided into two categories: contact-based and vision-based approaches. The second type is simpler and more intuitive, as it employs video image processing and pattern recognition.

The aim is to recognize six static and eight dynamic gestures while maintaining the accuracy and speed of the system. The recognized gestures are used to command the computer.

The division of hand gestures is explained in the block diagram shown in Fig. 1: the eleven hand shapes yield fourteen gestures, split into static gestures (6 shapes, 6 gestures) and dynamic gestures (5 shapes, 8 gestures). Dynamic gestures are further divided into multidirectional gestures (2 shapes, 2 gestures: Pointer and Cursor) and unidirectional gestures (3 shapes, 6 gestures: Swap Left, Swap Right, Zoom In, Zoom Out, Scroll Up, Scroll Down).

Fig. 1. Division of eleven shapes into fourteen gestures

For hand shape recognition, a CNN-based classifier is trained through the process of transfer learning over a pretrained convolutional neural network which is initially trained on a large dataset. We are using VGG16 [2] as the pretrained model.

Each frame, after resizing and padding, is fed to the classifier. If the classified hand shape is a static gesture, it immediately passes to the commanding phase. Otherwise, it passes to the hand tracing phase. The block diagram of our proposed method is shown in Fig. 2.

Fig. 2. Workflow

II. HAND SHAPE RECOGNITION USING TRANSFER LEARNING

For hand shape recognition, the classifier is trained through the process of transfer learning [3] over a pretrained CNN that is initially trained on a large dataset.

Transfer learning transfers the learned features of a pretrained network to a new problem. The initial layers of the pretrained network can be fixed, while the last few layers must be fine-tuned to learn the specific features of the new dataset.

In our work, VGG16, a CNN architecture, is used as the pretrained model. It consists of 13 convolution layers followed by 3 fully connected layers. A convolutional neural network (CNN) is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. We need to recognize eleven hand shapes, hence a CNN is trained as a classifier using the transfer learning method. To reach the desired output, the network model needs to be altered. Therefore, two layers of the model were replaced with a set of layers that can classify 11 classes. All other layers remained unaltered. To avoid overfitting, regularization along with a more diverse dataset was introduced. Regularization involves modifying the performance function, which is normally chosen to be the sum of the squares of the network errors on the training set. The classifier used a self-created dataset of over 55 thousand images, of which 70 percent were used for training and the rest for testing. If a recognized hand gesture is a dynamic hand gesture, it is further traced to detect motion.

978-1-5386-2285-8/17/$31.00 ©2017 IEEE 48 ISOCC 2017
III. TRACING OF DETECTED HAND (IF DYNAMIC)

Recognition of a static gesture requires only the hand shape. Once a hand shape is classified as a static gesture by the trained classifier, the command is given to the computer. Unlike a static gesture, a dynamic gesture requires both the hand shape and the motion of the hand. For tracing dynamic hand gestures, the hand area is segmented out using an HSV (Hue, Saturation, Value) skin color algorithm in each frame, followed by cropping the blob area. The centroid of the blob is detected and traced. The main idea in this stage is to retrieve the coordinates of the traced hand's center in each frame. These coordinates are used to determine which computer command corresponds to which motion. The coordinates are used differently for each gesture, depending on the detected hand shape.
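This stage can be sketched with NumPy alone: threshold the HSV frame into a skin mask, take the centroid of the resulting blob, and (for unidirectional gestures) derive a direction of motion by comparing centroids across frames. The skin-color thresholds below are hypothetical placeholders, as the paper does not give its exact HSV ranges.

```python
import numpy as np

# Hypothetical HSV skin range; the paper does not specify exact thresholds.
H_LO, H_HI, S_LO, V_LO = 0, 25, 40, 60

def skin_mask(hsv):
    """Binary mask of skin-colored pixels in an HSV image of shape (H, W, 3)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= H_LO) & (h <= H_HI) & (s >= S_LO) & (v >= V_LO)

def centroid(mask):
    """Centroid (row, col) of the blob: the mean of the mask's True pixels."""
    ys, xs = np.nonzero(mask)
    return (float(ys.mean()), float(xs.mean())) if len(ys) else None

def direction(centroids):
    """Direction of motion from the centroids of a few frames, by comparing
    the first and last centroid along the dominant displacement axis."""
    (y0, x0), (y1, x1) = centroids[0], centroids[-1]
    dy, dx = y1 - y0, x1 - x0
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"

# Toy frame: a 5x5 "skin-colored" square centered at row 12, col 12.
frame = np.zeros((64, 64, 3))
frame[10:15, 10:15] = (10, 120, 200)
print(centroid(skin_mask(frame)))                             # (12.0, 12.0)
print(direction([(12.0, 12.0), (12.0, 20.0), (12.0, 30.0)]))  # right
```

A real pipeline would first convert each BGR camera frame to HSV and crop the blob's bounding box before classification, but the centroid-per-frame idea is the same.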
Five out of the eleven hand shapes are used for dynamic hand gestures and the rest for static hand gestures. These dynamic hand shapes are categorized into unidirectional and multidirectional hand gestures. Unidirectional hand gestures require the shape and the direction of motion of the hand for commanding, whereas multidirectional gestures require the position of the hand along with its shape. Out of the five dynamic hand shapes, three are used for unidirectional gestures, namely swap, scroll and zoom, and the remaining two are used for the multidirectional gestures of pointer and cursor. Each unidirectional gesture can further be used to differentiate two hand gestures depending on the direction of motion, e.g. a swap can be left or right, depending on the direction of motion of the hand.

Tracing involves extracting the position of the hand, which is done by skin color detection, skin cropping, blob detection and centroid extraction. Hence tracing as a whole is a comparatively time-consuming process. This process of tracing can be avoided after a certain number of frames for unidirectional gestures, as they only require the direction of motion, which can be derived from the few initial frames. Hence, the direction of unidirectional dynamic gestures can be determined by comparing the centroids of the initial frames.

IV. EXPERIMENTAL RESULTS

Our prototype was tested in different backgrounds by seven volunteers who did not train the system. Each of them performed all the hand gestures. We compared our results with the CNN architecture AlexNet. The obtained results are shown in Fig. 3 below. The overall accuracy for AlexNet is 76.96%, while our recorded accuracy is 93.09%.

Fig. 3. Experimental results

V. CONCLUSION

We propose a vision-based hand gesture recognition method using transfer learning. The method was made robust by avoiding skin color segmentation, blob detection, skin area cropping and centroid extraction for unidirectional dynamic gestures. The prototype was tested successfully on seven different volunteers in different backgrounds and lighting conditions, with an accuracy of 93.09%.

ACKNOWLEDGMENT

We would like to thank the IC Design Education Centre (IDEC) at Hanyang University for supporting the tools and environment for this research.

REFERENCES

[1] T. Bouchrika, M. Zaied, O. Jemai and C. Ben Amar, "Ordering computers by hand gestures recognition based on wavelet networks," International Conference on Communications, Computing and Control Applications (CCCA), 2012. DOI: 10.1109/CCCA.2012.6417911.
[2] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR (International Conference on Learning Representations), 2015.
[3] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, Oct. 2010.

