Research Paper 4
Dynamic Hand Gesture Based Sign Word Recognition Using Convolutional Neural
Network with Feature Fusion
Md Abdur Rahim1, Jungpil Shin2, Md Rashedul Islam3
pre-processing techniques are employed. We convert the input image from the RGB color space to YCbCr, which separates the luminance (Y) from the chrominance (Cb and Cr) values, and obtain a grayscale image. The pixel values of the grayscale image lie between 0 and 255, where 0 is generally black and 255 is white. Setting 128 as the threshold and redefining pixel values 0-127 as 0 and 128-255 as 255, we convert the grayscale image into a binary image. We then apply erosion to the binary image, which removes the boundary regions of the foreground pixels; the foreground therefore shrinks and the holes within it grow larger. Finally, we fill those holes and use the resulting image for feature extraction. Fig. 2 shows the preprocessing steps of an input image.
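The steps above can be summarized in a short script. The following is a minimal sketch using OpenCV under stated assumptions: the function name preprocess_frame, the 3x3 erosion kernel, and the corner flood-fill seed are illustrative choices, not details taken from the paper.

```python
import cv2
import numpy as np

def preprocess_frame(bgr_image):
    """Sketch of the described pipeline: RGB -> YCbCr -> grayscale ->
    binary threshold at 128 -> erosion -> hole filling."""
    # Convert to YCbCr (OpenCV orders the channels as Y, Cr, Cb);
    # the Y channel carries the luminance, i.e. the grayscale image.
    ycbcr = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    gray = ycbcr[:, :, 0]

    # Binarize: pixel values 0-127 -> 0 (black), 128-255 -> 255 (white).
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Erode to strip boundary pixels of the foreground
    # (the 3x3 kernel size is an illustrative assumption).
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(binary, kernel, iterations=1)

    # Fill holes: flood-fill the background from a corner (assumed to be
    # background), invert the result, and OR it with the eroded mask.
    flood = eroded.copy()
    h, w = eroded.shape
    mask = np.zeros((h + 2, w + 2), np.uint8)
    cv2.floodFill(flood, mask, (0, 0), 255)
    filled = eroded | cv2.bitwise_not(flood)
    return filled
```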
C. Feature Extraction and Classification
As a popular class of machine learning techniques, the Convolutional Neural Network (CNN) has contributed significantly to technological advances in human-computer interaction [12]. The CNN described in Fig. 3 includes convolutional, pooling, and fully connected layers, an activation function, and a classifier. The gesture images and the segmented images are the inputs of two channels. The convolutional layer detects local features of the input by moving the convolution kernel in steps of a specified stride; the kernel is shared across the input, and its weight parameters are learned level by level. In the pooling layer, a transformation function is applied over the receptive fields of the convolutional output, so that higher-level invariant features can be achieved; the pooling layer also reduces the data volume while preserving feature information. The fully connected layer connects to the last pooling layer and the classifier, and is used to categorize the various extracted features. In the proposed method, the feature descriptors provide complete information to the last fully connected layer. Because a fully connected layer accepts only one-dimensional data, a Flatten function is used. Finally, the fully connected layer integrates all the features and feeds them to the softmax classifier. The ReLU function is used as the activation function.
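Since the exact layer counts and filter sizes are not given in this section, the following Keras sketch only mirrors the described structure: two input branches for the raw gesture image and the segmented image, convolution and pooling with ReLU activations, a Flatten step before the fully connected layers, feature fusion by concatenation, and a softmax classifier over the fifteen sign words. All layer widths and kernel sizes are illustrative assumptions.

```python
from tensorflow.keras import layers, models

def branch(inp):
    # Convolution detects local features; pooling reduces the feature
    # maps while retaining feature information (sizes are illustrative).
    x = layers.Conv2D(32, (3, 3), activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Fully connected layers accept only one-dimensional data,
    # hence the Flatten step.
    return layers.Flatten()(x)

gesture_in = layers.Input(shape=(200, 200, 3))    # raw gesture image
segmented_in = layers.Input(shape=(200, 200, 1))  # segmented binary image

# Fuse the two feature streams at the fully connected stage.
fused = layers.concatenate([branch(gesture_in), branch(segmented_in)])
x = layers.Dense(128, activation="relu")(fused)
out = layers.Dense(15, activation="softmax")(x)   # fifteen sign words

model = models.Model([gesture_in, segmented_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```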
Experimental Results
A. Experimental Dataset
In the experiments, hand gesture-based sign word images are used to evaluate the effectiveness of the proposed method. The dataset comprises fifteen isolated gestures, with images collected using a webcam. For each gesture, 900 images are captured; therefore, a total of 13,500 images are used for sign word recognition. Three volunteers participated in creating the gesture database, and we collected 300 images per gesture from each individual. The gesture images are 200x200 pixels in size. The input images are preprocessed using our proposed methods and then provided for feature extraction. Fig. 4 shows the sign word gesture images and Fig. 5 depicts examples of segmented gesture images. The experiments were conducted on a computer with an Intel Core i5-2400 (3.10 GHz) CPU and a GTX 1080 Ti GPU, with the webcam mounted at an appropriate position.
B. Experimental Evaluation
We evaluate the recognition of different gesture-based sign words in this section. The proposed CNN architecture is trained on the dataset described above, using 80% of the data for training and the remaining 20% for testing. After training, the extracted features are fed to the softmax classifier for recognition.
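As a minimal sketch of the 80/20 split described above, one common implementation uses scikit-learn's train_test_split; the placeholder arrays, the stratification per gesture, and the fixed random seed are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 13,500 preprocessed images
# and their fifteen-class labels (shapes are illustrative).
images = np.zeros((13500, 200, 200, 1), dtype=np.uint8)
labels = np.repeat(np.arange(15), 900)  # 900 samples per gesture

# 80% for training, 20% held out for testing, stratified per gesture.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)
```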
References
[1] F.S. Chen, C.M. Fu and C.L. Huang. "Hand gesture recognition
using a real-time tracking method and hidden Markov models."
Image and vision computing, vol. 21, no. 8, pp. 745-758, Aug.
2003.
[2] G. Marin, F. Dominio, and P. Zanuttigh. "Hand gesture
recognition with jointly calibrated leap motion and depth
sensor." Multimedia Tools and Applications, vol. 75, no. 22, pp.
14991-15015, Nov. 2016.
[3] P. Kumar, H. Gauba, P.P. Roy, and D.P. Dogra. "Coupled
HMM-based multi-sensor data fusion for sign language
recognition." Pattern Recognition Letters, vol. 86, pp. 1-8, Jan.
2017.
[4] D. Lifeng, R. Jun, M. Qiushi, and W. Lei, "The gesture identification
based on invariant moments and SVM." Microcomputer and
Its Applications, vol. 31, no. 6, pp. 32-35, 2012.
[5] Y.H. Sui, Y.S. Guo, “Hand gesture recognition based on
combing Hu moments and BoF-SURF support vector machine.”
Application Research of Computers, vol. 31, no. 3, pp. 953–956,
2014.
[6] M.A. Rahim, T. Wahid, M.K. Islam, “Visual recognition of
Bengali sign language using artificial neural network.”
International Journal of Computer Applications, vol. 94, no. 17,
Jan. 2014.
[7] M.A. Rahim, J. Shin, and M.R. Islam, “Human-Machine
Interaction based on Hand Gesture Recognition using Skeleton