IEEE International Conference on Information Reuse and Integration for Data Science (IRI)

Hand Gesture Recognition with Convolution Neural Networks

Felix Zhan
USAOT

Abstract: Hand gestures are among the most common forms of communication and have great importance in our world. They can help in building safe and comfortable user interfaces for a multitude of applications. Various computer vision algorithms have employed color and depth cameras for hand gesture recognition, but robust classification of gestures from different subjects is still challenging. I propose an algorithm for real-time hand gesture recognition using convolutional neural networks (CNNs). The proposed CNN achieves an average accuracy of 98.76% on a dataset comprising 9 hand gestures with 500 images for each gesture.

Keywords: deep learning; Convolution Neural Networks; Hand Gesture Recognition

I. INTRODUCTION

In recent years, robotics and artificial intelligence have been leveraged to increase the autonomy of people living with disabilities. In this context, the main objective is to improve the quality of life by enabling users to perform a wider range of day-to-day tasks more efficiently. In particular, hand gesture recognition has been recognized as a valuable technology for several application fields, especially for Sign Language Recognition (SLR). Sign languages comprise complex hand movements, and even minuscule hand changes can have a variety of possible meanings. In response, many vision-based dynamic hand gesture recognition algorithms have been introduced in the last decade [1,2]. To recognize gestures, different features, such as hand-crafted spatio-temporal descriptors [3] and articulated models [4], have been used. As gesture classifiers, hidden Markov models [5], conditional random fields [6], and support vector machines (SVMs) [7] have been widely used. However, robust classification of gestures under varying lighting conditions and from different subjects is still a challenging problem [8,9,10].

An intuitive approach for creating interfaces is to look at the muscle activity of the user. This activity can be recorded by the device using a camera. The recorded images can then be analyzed using deep learning algorithms to determine the sign.

Recently, classification with deep convolutional neural networks has been successful in various recognition challenges [11,12,13,14]. Multi-column deep CNNs that employ multiple parallel networks have been shown to improve recognition rates of single networks by 30-80% for various image classification tasks [15]. Similarly, for large-scale video classification, Karpathy et al. [16] observed the best results when combining CNNs trained with two separate streams of the original and spatially cropped video frames.

Several authors have emphasized the importance of using many diverse training examples for CNNs [12, 17, 18]. They have proposed data augmentation strategies to prevent CNNs from overfitting when training with datasets containing limited diversity. Krizhevsky et al. [12] employed translation, horizontal flipping, and RGB jittering of the training and testing images for classifying them into 1000 categories. Simonyan and Zisserman [18] employed similar spatial augmentation on each video frame to train CNNs for video-based human activity recognition. However, these data augmentation methods were limited to spatial variations. To add variations to video sequences containing dynamic motion, Pigou et al. [17] temporally translated the video frames in addition to applying spatial transformations. Other research that motivated my ideas includes [20-65].

In this paper, I introduce a hand gesture recognition system that extracts hand components in the image and learns and predicts using 2D convolutional neural networks. To reduce potential overfitting and improve generalization of the gesture classifier, I propose an effective spatio-temporal data augmentation method to deform the input volumes of hand gestures. The augmentation method also incorporates existing spatial augmentation techniques [12].

II. METHOD

I used a CNN classifier for dynamic hand gesture recognition. Section II-A briefly describes the hand gesture dataset used in this paper. Sections II-B and II-C describe the preprocessing steps needed for my model, the details of the classifier, and the training pipeline for the two sub-networks (Fig. 1). Finally, I introduce a spatio-temporal data augmentation method in Section II-D and show how it is combined with spatial transformations.

A. DATASET

I acquired 500 images for each of 9 hand gestures using a webcam to evaluate the model. Skin pixels are extracted from each color image, and the result is converted to black and white. These black-and-white images are then reduced to 50x50 pixels. Sample images for each of the 9 hand gestures are shown in Fig. 1.

Figure 1
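As a rough illustration of the preprocessing described in the Dataset subsection (skin extraction, conversion to black and white, reduction to 50x50 pixels), the following Python sketch may help. It is not the procedure used to build this dataset: the skin test in `is_skin` is an assumed, commonly used RGB threshold heuristic, the nearest-neighbor resize is likewise an assumption, and all function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
ColorImage = List[List[Pixel]]    # rows of pixels
BinaryImage = List[List[int]]     # 1 = skin (white), 0 = background (black)


def is_skin(pixel: Pixel) -> bool:
    """Assumed RGB threshold heuristic for skin; not the paper's rule."""
    r, g, b = pixel
    return (r > 95 and g > 40 and b > 20
            and max(pixel) - min(pixel) > 15
            and abs(r - g) > 15 and r > g and r > b)


def to_skin_mask(image: ColorImage) -> BinaryImage:
    """Convert a color image to black and white: skin -> 1, everything else -> 0."""
    return [[1 if is_skin(p) else 0 for p in row] for row in image]


def resize_nearest(mask: BinaryImage, size: int = 50) -> BinaryImage:
    """Reduce the mask to size x size using nearest-neighbor sampling."""
    h, w = len(mask), len(mask[0])
    return [[mask[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(image: ColorImage, size: int = 50) -> BinaryImage:
    """Skin extraction followed by reduction to size x size pixels."""
    return resize_nearest(to_skin_mask(image), size)
```

In practice, a webcam frame would be converted to rows of `(R, G, B)` tuples before being passed to `preprocess`. A threshold rule of this kind is sensitive to lighting, which is one reason the thresholds here should be read as placeholders.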

Images pertaining to each hand gesture are stored in a separate folder. Each folder has a text file with one entry per image in the folder, and each entry denotes the hand gesture that the image depicts. In addition to this dataset, I used spatio-temporal data augmentation techniques to obtain an additional 4000 images. More details about the technique are given in Section II-D.

B. CLASSIFIER

The network consists of six 2D convolution layers, each of which is followed by a max-pooling operator. Fig. 2 shows the sizes of the convolution kernels, the volumes at each layer, and the pooling operators. The output of the sixth convolution layer is given as input to a fully connected network having 9 layers. Each layer has 512 hidden neurons except the last output layer, which has 9 neurons, one for each of the 9 hand gestures. A sigmoid activation function is used in the output layer, and a tanh activation function is used in the remaining eight layers.

In the context of this article, acquiring a large dataset for each individual subject would be time-consuming and impractical when considering real-life applications, as a user would often not endure hours of data recording for each training session. With so little data, the network is prone to overfitting. To address this issue, Batch Normalization [20] is used, as explained in the following subsection.

B.1 BATCH NORMALIZATION

Batch Normalization (BN) [20] is a recent technique that normalizes each batch of data through every layer during training. After training, the data is fed one last time through the network to compute the data statistics in a layer-wise fashion, which are then fixed at test time. BN was shown to yield faster training times whilst achieving better system accuracy and regularization [20]. When BN was removed, the proposed CNN failed to converge to acceptable solutions. As recommended in [20], BN was applied before the non-linearity.

C. TRAINING

The process of training a CNN involves the optimization of the network parameters to minimize a cost function for the dataset. I selected mean squared error as the cost function. I performed optimization via stochastic gradient descent and updated the network's parameters with the Nesterov accelerated gradient at every iteration. I initialized the weights of the 2D convolutional layers with random samples. These terms are explained in greater detail in the following subsections.

For tuning the learning rate, I initialized the rate to 0.005 and reduced it by a factor of 2 if the cost function did not improve by more than 10% in the preceding 40 epochs. I terminated network training after the learning rate had decayed at least 4 times or if the number of epochs had exceeded 300. Since the dataset is small, I did not reserve data from any subjects to construct a validation set. Instead, I selected the network configuration that resulted in the smallest error on the training set.

C.1 STOCHASTIC GRADIENT DESCENT

Stochastic gradient descent (often shortened to SGD), also known as incremental gradient descent, is a stochastic approximation of gradient descent optimization and an iterative method for minimizing an objective function that is written as a sum of differentiable functions. In other words, SGD tries to find minima or maxima by iteration.

D. SPATIO-TEMPORAL DATA AUGMENTATION

The dataset has 4500 gestures for training, which are not enough to prevent overfitting. To avoid overfitting, I performed spatio-temporal data augmentation. I performed horizontal mirroring of the images to generate a new set of data, as shown in Fig. 3.

Fig. 3. Spatio-temporal data augmentation

III. RESULTS

I evaluated the performance of the hand gesture recognition system using a test set. The original dataset was split in a 7:3 ratio: 70% was used for training and the remaining 30% was used for testing. The classifier showed an accuracy of 98.74% on the test set.

IV. CONCLUSION

I developed an effective method for dynamic hand gesture recognition with 2D convolutional neural networks. The proposed classifier utilizes spatio-temporal data augmentation to avoid overfitting. By means of extensive evaluation, I demonstrated that the combination of low and high resolution sub-networks improves classification accuracy considerably. I further demonstrated that the proposed data augmentation technique plays an important role in achieving superior performance. For the dataset, my proposed system achieved a validation accuracy of 98.2%. My future work will include more adaptive selection of the optimal hyper-parameters of the CNNs, and investigating robust classifiers that can classify higher-level dynamic gestures, including activities and motion contexts.

REFERENCES

[1] S. Mitra and T. Acharya. Gesture recognition: A survey. IEEE Systems, Man, and Cybernetics, 37:311–324, 2007.
[2] V. I. Pavlovic, R. Sharma, and T. S. Huang. Visual interpretation of hand gestures for human-computer interaction: A review. PAMI, 19:677–695, 1997.
[3] P. Trindade, J. Lobo, and J. Barreto. Hand gesture recognition using color and depth images enhanced with hand angular pose data. In IEEE Conf. on Multisensor Fusion and Integration for Intelligent Systems, pages 71–76, 2012.
[4] J. J. LaViola Jr. An introduction to 3D gestural interfaces. In SIGGRAPH Course, 2014.
[5] T. Starner, A. Pentland, and J. Weaver. Real-time American sign language recognition using desk and wearable computer based video. PAMI, 20(12):1371–1375, 1998.
[6] S. B. Wang, A. Quattoni, L. Morency, D. Demirdjian, and T. Darrell. Hidden conditional random fields for gesture recognition. In CVPR, pages 1521–1527, 2006.
[7] N. Dardas and N. D. Georganas. Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Transactions on Instrumentation and Measurement, 60(11):3592–3607, 2011.
[8] M. Zobl, R. Nieschulz, M. Geiger, M. Lang, and G. Rigoll. Gesture components for natural interaction with in-car devices. In Gesture-Based Communication in Human Computer Interaction, pages 448–459. Springer, 2004.
[9] F. Althoff, R. Lindl, and L. Walchshausl. Robust multimodal hand-and head gesture recognition for controlling automotive infotainment systems. In VDI-Tagung: Der Fahrer im 21. Jahrhundert, 2005.
[10] F. Parada-Loira, E. Gonzalez-Agulla, and J. Alba-Castro. Hand gestures to control infotainment equipment in cars. In IEEE Intelligent Vehicles Symposium, pages 1–6, 2014.
[11] D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In International Joint Conference on Artificial Intelligence, pages 1237–1242, 2011.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
[13] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
[14] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Int. Conference on Document Analysis and Recognition, pages 958–963, 2003.
[15] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In CVPR, pages 3642–3649, 2012.
[16] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, pages 1725–1732, 2014.
[17] L. Pigou, S. Dieleman, P.-J. Kindermans, and B. Schrauwen. Sign language recognition using convolutional neural networks. In ECCVW, 2014.
[18] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, pages 568–576, 2014.
[19] P. Molchanov, S. Gupta, K. Kim, and K. Pulli. Multi-sensor System for Driver’s Hand-gesture Recognition. In AFGR, 2015.
[20] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
[21] Raymond Ahn, Justin Zhan, Using proxies for node immunization identification on large graphs, IEEE Access, Vol. 5, pp. 13046-13053, 2017.
[22] Gary Blosser, Justin Zhan, Privacy preserving collaborative social network, International Conference on Information Security and Assurance, pp. 543-548, 2008.
[23] C Chiu, J Zhan, F Zhan, Uncovering suspicious activity from partially paired and incomplete multimodal data, IEEE Access, Vol. 5, pp. 13689-13698, 2017.
[24] Brittany Cozzens, Richard Huang, Maxwell Jay, Kyle Khembunjong, Sahan Paliskara, Felix Zhan, Mark Zhang, Shahab Tayeb, Signature Verification Using a Convolutional Neural Network, University of Nevada Las Vegas AEOP/STEM/REAP/RET Programs Technical Report, 2018.


[25] Pravin Chopade, Justin Zhan, Marwan Bikdash, Node attributes and edge structure for large-scale big data network analytics and community detection, 2015 IEEE International Symposium on Technologies for Homeland Security, pp. 1-8, 2015.
[26] Pravin Chopade, Justin Zhan, A framework for community detection in large networks using game-theoretic modeling, IEEE Transactions on Big Data, Vol. 3, Issue 3, pp. 276-288, 2017.
[27] Pravin Chopade, Justin Zhan, Structural and functional analytics for community detection in large-scale complex networks, Journal of Big Data, Vol. 2, Issue 1, 2015.
[28] Matin Pirouz, Justin Zhan, Shahab Tayeb, An optimized approach for community detection and ranking, Journal of Big Data, Vol. 3, Issue 1, pp. 22, 2016.
[29] Matin Pirouz, Justin Zhan, Optimized relativity search: node reduction in personalized page rank estimation for large graphs, Journal of Big Data, Vol. 3, Issue 1, 2016.
[30] Shahab Tayeb, Matin Pirouz, Brittany Cozzens, Richard Huang, Maxwell Jay, Kyle Khembunjong, Sahan Paliskara, Felix Zhan, Mark Zhang, Justin Zhan, Shahram Latifi, Toward data quality analytics in signature verification using a convolutional neural network, 2017 IEEE International Conference on Big Data, pp. 2644-2651, 2017.
[31] Haysam Selim, Justin Zhan, Towards shortest path identification on large networks, Journal of Big Data, Vol. 3, Issue 1, pp. 10, 2016.
[32] Xian-Ming Xu, Justin Zhan, Hai-tao Zhu, Using social networks to organize researcher community, International Conference on Intelligence and Security Informatics, pp. 421-427, 2008.
[33] Felix Zhan, Gabriella Laines, Sarah Deniz, Sahan Paliskara, Irvin Ochoa, Idania Guerra, Shahab Tayeb, Carter Chiu, Matin Pirouz, Elliott Ploutz, Justin Zhan, Laxmi Gewali, Paul Oh, Prediction of online social networks users' behaviors with a game theoretic approach, pp. 1-2, 2018 15th IEEE Annual Consumer Communications & Networking Conference.
[34] Felix Zhan, Brandon Waters, Maria Mijangos, LeAnn Chung, Raghav Bhagat, Tanvi Bhagat, Matin Pirouz, Carter Chiu, Shahab Tayeb, Elliott Ploutz, Justin Zhan, Laxmi Gewali, An efficient alternative to personalized page rank for friend recommendations, pp. 1-2, 2018 15th IEEE Annual Consumer Communications & Networking Conference.
[35] Justin Zhan, Xing Fang, A novel trust computing system for social networks, IEEE Third International Conference on Social Computing, pp. 1284-1289, 2011.
[36] Justin Zhan, Secure collaborative social networks, IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 40, Issue 6, pp. 682-689, 2010.
[37] Justin Zhan, Xing Fang, Social computing: the state of the art, International Journal of Social Computing and Cyber-Physical Systems, Vol. 1, Issue 1, pp. 1-12, 2011.
[38] Justin Zhan, Xing Fang, A computational trust framework for social computing (a position paper for panel discussion on social computing foundations), IEEE Second International Conference on Social Computing, pp. 264-269, 2010.
[39] Felix Zhan, Gabriella Laines, Sarah Deniz, Sahan Paliskara, Irvin Ochoa, Idania Guerra, Shahab Tayeb, University of Nevada Las Vegas AEOP/STEM/REAP/RET Programs Technical Report, Vol. 2, pp. 26-30, 2017.
[40] Felix Zhan, Brandon Waters, Maria Mijangos, Raghav Bhagat, Tanvi Bhagat, A Low Cost, High Speed Alternative to Personalized Page Rank for Friend Recommendations, http://aeop.asecamps.com/wp-content/uploads/2017/07/TeamC.pdf.
[41] Felix Zhan, How to Optimize Social Network Influence, 2019 IEEE International Conference on Artificial Intelligence and Knowledge Engineering, Cagliari, Italy, June 3-5, 2019.
[42] Justin Zhan, Gary Blosser, Chris Yang, Lisa Singh, Privacy-preserving collaborative social networks, International Conference on Intelligence and Security Informatics, pp. 114-125, 2008.
[43] Justin Zhan, Xing Fang, Trust maximization in social networks, International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pp. 205-211, 2011.
[44] Justin Zhan, Vivek Guidibande, Sai Phani Krishna Parsa, Identification of top-K influential communities in big networks, Journal of Big Data, Vol. 3, Issue 1, pp. 16, 2016.
[45] Justin Zhan, Sweta Gurung, Sai Phani Krishna Parsa, Identification of top-K nodes in large networks using Katz centrality, Journal of Big Data, Vol. 4, Issue 1, pp. 16, 2017.
[46] Justin Zhan, Xing Fang, Peter Killion, Trust optimization in task-oriented social networks, 2011 IEEE Symposium on Computational Intelligence in Cyber Security, pp. 137-143, 2011.
[47] Carter Chiu and Justin Zhan, Deep Learning for Link Prediction in Dynamic Networks Using Weak Estimators, IEEE Access, Volume 6, Issue 1, pp., 2018.
[48] Moinak Bhaduri and Justin Zhan, Using Empirical Recurrences Rates Ratio for Time Series Data Similarity, IEEE Access, Volume 6, Issue 1, pp. 30855-30864, 2018.
[49] Jimmy Ming-Tai Wu, Justin Zhan, and Sanket Chobe, Mining Association Rules for Low Frequency Itemsets, PLoS ONE 13(7): e0198066, 2018.
[50] Payam Ezatpoor, Justin Zhan, Jimmy Ming-Tai Wu, and Carter Chiu, Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework, IEEE Access, Volume 6, Issue 1, pp. 7872-7887, 2018.
[51] Pravin Chopade and Justin Zhan, Towards A Framework for Community Detection in Large Networks using Game-Theoretic Modeling, IEEE Transactions on Big Data, Volume 3, Issue 3, pp. 276-288, 2017.
Moinak Bhaduri, Justin Zhan, and Carter Chiu, A Weak Estimator For Dynamic Systems, IEEE Access, Volume 5, Issue 1, pp. 27354-27365, 2017.
[52] Matin Pirouz and Justin Zhan, Toward Efficient Hub-Less Real Time Personalized PageRank, IEEE Access, Volume 5, Issue 1, pp. 26364-26375, 2017.
[53] Moinak Bhaduri, Justin Zhan, Carter Chiu, and Felix Zhan, A Novel Online and Non-Parametric Approach for Drift Detection in Big Data, IEEE Access, Volume 5, Issue 1, pp. 15883-15892, 2017.
[54] Carter Chiu, Justin Zhan, and Felix Zhan, Uncovering Suspicious Activity from Partially Paired and Incomplete Multimodal Data, IEEE Access, Volume 5, Issue 1, pp. 13689-13698, 2017.
[55] Jimmy Ming-Tai Wu, Justin Zhan, Jerry Lin, Ant Colony System Sanitization Approach to Hiding Sensitive Itemsets, IEEE Access, Vol. 5, No. 1, pp. 10024–10039, 2017.
[56] Justin Zhan and Binay Dahal, Using Deep Learning for Short Text Understanding, Journal of Big Data, 4:34, pp. 1-15, 2017.
[57] Justin Zhan, Timothy Rafalski, Gennady Stashkevich, Edward Verenich, Vaccination Allocation in Large Dynamic Networks, Journal of Big Data, 4:2, 2017.
[58] Zhan, J., Oommen, J., and Crisostomo, J., Anomaly Detection in Dynamic Systems Using Weak Estimator, ACM Transaction on Internet Technology, Vol. 11, No. 1, pp. 53-69, 2011.
[59] Zhan, J., Hsieh C., Wang, I., Hsu, T., Liau, C., and Wang D., Privacy-Preserving Collaborative Recommender Systems, IEEE Transaction on Systems, Man, and Cybernetics, Part C, Volume 40, Issue 4, pp. 472-476, 2010.
[60] Wang, I., Shen, C., Zhan, J., Hsu, T., Liau, C. and Wang, D., Empirical Evaluations of Secure Scalar Product, IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 39, Issue 4, pp. 440-447, 2009.
[61] Andrea Hart, Brianna Smith, Sean Smith, Elijah Sales, Jacqueline Hernandez-Camargo, Yarlin Mayor Garcia, Felix Zhan, Lori Griswold, Brian Dunkelberger, Michael R. Schwob, Sharang Chaudhry, Justin Zhan, Laxmi Gewali, Paul Oh, Resolving Intravoxel White Matter Structures in the Human Brain Using Regularized Regression and Clustering, Journal of Big Data, 2019.
[62] Jimmy Ming-Thai Wu, Justin Zhan, and Jerry Chun-Wei Lin, An ACO-based Approach to Mine High-Utility Itemsets, Knowledge-Based Systems, Vol. 116, pp. 102-113, 2017.
[63] Justin Zhan, Vivek Gudibande, and Sai Phani Krishna Parsa, Identification of Top-K Influential Communities in Large Networks, Journal of Big Data, 3:16, 2016.
[64] Matin Pirouz, Justin Zhan, Node Reduction in Personalized Page Rank Estimation for Large Graphs, Journal of Big Data, 3-12, 2016.
[65] Haysam Selim, Justin Zhan, Towards Shortest Path Identification on Large Networks, Journal of Big Data, 3-10, 2016.


