0% found this document useful (0 votes)

27 views12 pages

Deep Learning-Based Sign Language Recognition System For Static Signs

This document presents a deep learning-based system for recognizing static Indian Sign Language (ISL) signs from images using convolutional neural networks (CNNs). The system was trained on a dataset of 35,000 images of 100 static ISL signs collected from multiple users. CNN models were evaluated using different optimizers, achieving a maximum training accuracy of 99.72% on colored images and 99.90% on grayscale images. The system demonstrates improved performance over prior work that recognized only a few ISL signs, and provides a foundation for static sign language recognition in ISL.

Uploaded by

smartarun726

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views12 pages

Deep Learning-Based Sign Language Recognition System For Static Signs

Uploaded by

smartarun726

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Neural Computing and Applications (2020) 32:7957–7968

https://doi.org/10.1007/s00521-019-04691-y (0123456789().,-volV)
(0123456789().,-volV)

S . I . : H Y B R I D A R T I F I C I A L I N T E L L I G E N CE A N D M A C H I N E L E A R N I N G
TECHNOLOGIES

Deep learning-based sign language recognition system for static signs

Ankita Wadhawan1 • Parteek Kumar1

Received: 3 December 2018 / Accepted: 18 December 2019 / Published online: 1 January 2020
Ó Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Sign language for communication is efficacious for humans, and vital research is in progress in computer vision systems.
The earliest work in Indian Sign Language (ISL) recognition considers the recognition of significant differentiable hand
signs and therefore often selecting a few signs from the ISL for recognition. This paper deals with robust modeling of static
signs in the context of sign language recognition using deep learning-based convolutional neural networks (CNN). In this
research, total 35,000 sign images of 100 static signs are collected from different users. The efficiency of the proposed
system is evaluated on approximately 50 CNN models. The results are also evaluated on the basis of different optimizers,
and it has been observed that the proposed approach has achieved the highest training accuracy of 99.72% and 99.90% on
colored and grayscale images, respectively. The performance of the proposed system has also been evaluated on the basis
of precision, recall and F-score. The system also demonstrates its effectiveness over the earlier works in which only a few
hand signs are considered for recognition.

Keywords Sign language Data acquisition Convolutional neural network Max-pooling Softmax Optimizer

1 Introduction Humans have been adapting to sign language to com-

municate since ancient times. Hand gestures are as ancient
Sign language is a computer vision-based complete con- as the human civilization itself [1]. Hand signs are espe-
voluted language that engrosses signs shaped by the cially useful to express any word or feeling to communi-
movements of hands in combination with facial expres- cate. Therefore, people around the world use signals from
sions. It is a natural language used by people with low or hand constantly to express despite the formulation of
no hearing sense for communication. A sign language can writing conventions.
be used for communication of letters, words or sentences In recent times, much research has been ongoing in
using different signs of the hands. This type of communi- developing systems that are able to classify signs of dif-
cation makes it easier for hearing-impaired people to ferent sign languages into the given class. Such systems
express their views and also help in bridging the commu- have found applications in games, virtual reality environ-
nication gap between hearing-impaired people and other ments, robot controls and natural language communica-
person. tions. At present, the Indian Sign Language systems are in
the developing stage and no sign language recognition
system is available for recognizing signs in real time. So,
there is a need to develop a complete recognizer which
& Ankita Wadhawan identifies signs of Indian Sign Language.
[email protected] The automatic recognition of human signs is a complex
Parteek Kumar multidisciplinary problem that has not yet been completely
[email protected] solved. In the past years, a number of approaches were
1 used which involve the use of machine learning techniques
Computer Science and Engineering Department, Thapar
Institute of Engineering and Technology, Patiala, Punjab, for sign language recognition. Since the advent of deep
India learning techniques, there have been attempts to recognize

123
7958 Neural Computing and Applications (2020) 32:7957–7968

human signs. Networks which are based on deep learning Huang et al. [5] presented a Kinect-based sign language
paradigms deal with the architectures and learning algo- recognition system using 3D convolutional neural net-
rithms that are biologically inspired, in distinction to con- works. They used 3D CNN to capture spatial–temporal
ventional networks. Generally, the training of deep features from raw data, which help in extracting authentic
networks occurs in a layer-wise manner and depends on features to adapt to the large differences of hand gestures.
more distributed features as present in the human visual This model is validated on a real dataset collected from 25
cortex. In this, the abstract features from the collected signs signs with a recognition rate of 94.2%. Huang et al. [6]
in the first layer are grouped into primary features in the proposed a real-sense-based sign language recognition
second layer, which further combined into more defined system. They collected total 65,000 image frames con-
features present in the next layer. These features are then taining 26 alphabet signs, out of which 52,000 were used
further combined together into more engrossing features in for training and 13,000 for testing. The deep neural net-
the following layers, which help in the better recognition of work model was trained and classified using deep belief
different signs [2]. network and achieved an accuracy of 98.9% with real-
The sign language presents a huge variability in postures sense and 97.8% with Kinect. Pigou et al. [7] contributed
that a hand can have, which makes this discipline a par- their efforts on Microsoft Kinect and CNN-based recog-
ticularly complex problem. To deal with this, a correct nition system. In this system, they used thresholding,
generation of the static postures is necessary. In addition, background removal and median filtering for preprocess-
because each region has specific language grammar, it is ing. They implemented Nesterov’s Accelerated Gradient
required to develop the Indian Sign Language database, descent (NAG) optimizer and achieved a validation accu-
which has not been available yet. racy of 91.7% to recognize Italian gestures. Molchanov
Most of the research work in sign language recognition et al. [8] presented a multi-sensor system for gesture
based on deep learning technique is performed on sign recognition of the driver’s hand. They calibrate the data
languages other than Indian Sign Language. Of recent, this received from depth, radar and optical sensors, and use
area is gaining popularity among research experts. The CNN to classify ten different gestures. The experimental
earliest reported work on sign language recognition is results showed that the system achieved the best accuracy
mainly based on machine learning techniques. These of 94.1% using a combination of all three sensors. Tang
methods result in low accuracy as it does not extract fea- et al. [9] proposed a hand posture recognition system for
tures automatically. The main goal of deep learning tech- sign language recognition using the Kinect sensor. They
niques is automatic feature engineering. The idea behind employed hand detection and tracking algorithms for pre-
this is to automatically learn a set of features from raw data processing of the captured data. The proposed system is
that can be useful in sign language recognition. In this trained on 36 different hand postures using LeNet-5 CNN-
manner, it avoids the manual process of handcrafted feature based model. The testing has been performed using Deep
engineering by learning as a set of features automatically. Belief Network (DBN) and CNN, and it has been found
There exist many reported research systems related to that DBN outperformed CNN with the overall average
sign language recognition based on deep learning and accuracy of 98.12%.
machine learning techniques. Nagi et al. [3] proposed a Yang and Zhu [10] presented a video-based Chinese
max-pooling CNN for vision-based hand gesture recogni- Sign Language (CSL) recognition using CNN. They col-
tion. They employed color segmentation to retrieve hand lected data using 40 daily vocabularies and showed that the
contour and morphological image processing to remove developed method simplifies the hand segmentation
noisy edges. The experiments were performed on 6000 sign method and avoids information loss while extracting fea-
images collected from six gesture classes only and tures. They used Adagrad and Adadelta optimizers for
achieved an accuracy of 96%. learning CNN and found that Adadelta outperformed
Rioux-Maldague and Giguere [4] presented a feature Adagrad. Tushar et al. [11] proposed a numerical hand sign
extraction technique for the recognition of hand pose using recognition method using Deep CNN. They presented a
depth images and intensity images that are captured using layer-wise optimized architecture in which batch normal-
Kinect. They employed threshold on the maximum hand ization contributes to faster training convergence and the
depth for segmentation, resize the image and use image involvement of the dropout technique alleviates data over-
centralization for preprocessing. The results were evaluated fitting. The collected American Sign Language (ASL)
on known users and unseen users using a deep belief net- images were optimized using Adadelta optimizer of CNN
work. The recall and precision of 99% were achieved with and resulted in an accuracy of 98.50%. Oyedotun and
known users, 77% recall and 79% precision was achieved khashman [2] developed a vision-based static hand gesture
with unseen users. recognition system for recognizing 24 American Sign
Language alphabets. The complete hand gestures were

123
Neural Computing and Applications (2020) 32:7957–7968 7959

obtained from the publicly available Thomas Moeslund’s 2 CNN architecture components
gesture recognition database. They implemented the CNN
network and Stacked Denoising Autoencoders (SDAE) The objective of CNN is to learn the features present in the
network and achieved an accuracy of 91.33% and 92.83% data with higher order using convolutions. The CNN
on testing data, respectively. Bheda and Radpour [12] architecture works well for the recognition of objects
presented an American Sign Language-based recognition which includes images. They can recognize individuals,
system for letters and digits. The proposed CNN-based faces, street signs and other facets of visual data. There
architecture consists of three groups of convolutional lay- exist a number of CNN variations, but each of them is
ers followed by a max-pool layer and a dropout layer and based on the pattern of layers present, as shown in Fig. 1.
two groups of fully connected layers. The collected images CNN architecture consists of different components
were preprocessed using background subtraction technique which include different types of layers and activation
and achieved an accuracy of 82.5% on alphabets and 97% functions. The listing describes the purpose and functioning
on digits using stochastic gradient descent optimizer. of some commonly used layers which is discussed below.
Rao et al. [13] developed a selfie-based sign language
recognition system using Deep CNN. They created the Convolutional layer The core building blocks of CNN
dataset which performs 200 signs in different angles and architecture are the convolutional layer. Convolutional
under various background environments. They adopted layers (Conv) modify the input data with the help of a patch
mean pooling, max-pooling and stochastic pooling strate- of neurons connected locally from the previous layer. The
gies on CNN, and it has been observed that a stochastic dot product will be computed by the layer between the
pooling outperformed other pooling strategies with a region of the neurons present in the input layer and the
recognition rate of 92.88%. Koller et al. [14] proposed the weights to which they are locally connected present in the
hybrid approach that combines the strong discriminative output layer.
qualities of CNN with the sequence modeling property of A convolution is a mathematical operation that describes
Hidden Markov Model (HMM) for recognition of contin- the rule for merging two sets of information. The convo-
uous signs. The collected data have been preprocessed by lution operation takes input, applies a convolution filter or
using a dynamic programming-based approach. It has been kernel, and returns a feature map as an output as shown in
observed that the hybrid CNN-HMM approach outper- Fig. 2. This operation demonstrates the sliding of the ker-
forms the other state-of-the-art approaches. nel across the input data which produces the convoluted
Kumar et al. [15] proposed a two stream CNN archi- output data. At each step, the input data values are multi-
tecture, which takes two color-coded images the joint plied by the kernel within its boundaries and a single value
distance topographic descriptor (JDTD) and joint angle in the output feature map is created.
topographical descriptor (JATD) as input. They collected Let us suppose the frame size of an input image
and developed the dataset of 50,000 sign videos of Indian W 2 RwXh . The convolutional filter with size F is used for
Sign Language and achieved an accuracy of 92.14%. convolution with a stride of S and P padding for input
Based on the requirements mentioned above, this paper image boundary. The size of the output of the convolution
aims to develop a complete system based on deep learning layer is presented by Eq. (1).
models to recognize static signs of Indian Sign Language W F þ 2P
Output ¼ þ1 ð1Þ
collected from different users. It presents an effective S
method for the recognition of Indian Sign Language digits, For example, there is one neuron with a receptive field
alphabets and words used in day-to-day life. The deep size of F = 3, the input size is W = 128, and there is zero
learning-based convolutional neural network (CNN) padding of P = 1. The neuron stride across the input in
architecture is constructed using convolutional layers, fol- stride of S = 1, giving output of size (128 - 3 ? 2)/
lowed by other layers. A web camera-based dataset of 1 ? 1 = 128.
static signs has been created under different environmental The output of a convolutional layer is denoted with
conditions. The performance of the proposed system has standardized Eq. (2).
been evaluated using different deep learning models, 0 1
optimizers, precision, recall and F-score. X
The paper is organized as follows. Section 2 describes anj ¼ f @ yn1
i kijn þ bnj A ð2Þ
i2Cj
the generalized CNN architecture used for classification.
The proposed system design and architecture are demon- where * is the convolution operation, n represents the nth
strated in Sect. 3. Section 4 describes the experimental layer, anj is the jth output map, yn1 represents the ith input
i
results and analysis. Finally, the research has been con- map in the ðn 1Þth layer, the convolutional kernel is
cluded in Sect. 5.

123
7960 Neural Computing and Applications (2020) 32:7957–7968

Fig. 1 High-level general CNN architecture

Fig. 2 The convolution

operation

represented by kij , bj represents bias, Cj is used for repre- operation used by the pooling layer helps in the resizing of
senting input maps and f is an activation function [10]. the input data spatially (width, height). This operation is
For example, suppose that the input volume has size called as max-pooling. The down-sampling in this layer has
[128 9 128 9 3]. If the filter size is 3 9 3, then each been performed using filters on the input data.
neuron in the convolution layer will have weights to a For example, the input volume of size
[3 9 3 9 3] region in the input volume, for a total of [126 9 126 9 16] is pooled with filter size 2, stride 2 into
3*3*3 = 27 weights and ? 1 bias parameter. output volume of size [63 9 63 9 16].
The main objective of other feature extraction layers is
ReLU layer ReLU stands for Rectified Linear Units. The
to reduce the dimensions of the output generated by con-
ReLU layer helps in applying an element-wise activation
volutional layers. After convolution, the max-method will
function over the input data thresholding, for example,
be used over a region with some specific size for sub-
maxð0; xÞ at zero, giving the same dimension output as the
sampling of feature map. This operation is given by
input to the layer. The usage of ReLU layers does not affect
Eq. (3).
the receptive field of the convolution layer and at the same
anj ¼ s an1
i ; 8i 2 Vj ð3Þ time provides nonlinearity to the network. This nonlinear
property of the function helps in the better generalization of
where s is the subsampling operation and Vj is the jth
the classifier. The nonlinear function f ð xÞ used in the ReLU
region of subsampling in the nth input map [10].
layer is shown in Eq. (4).
Pooling layer Pooling layers help in reducing the repre-
f ð xÞ ¼ maxð0; xÞ ð4Þ
sentation of data gradually over the network and control
over-fitting. The pooling layer operates in an independent The sigmoid function and hyperbolic tangent are some
manner on every depth slice of the input. The max () other activation functions that can also be used to influence

123
Neural Computing and Applications (2020) 32:7957–7968 7961

nonlinearity in the network. The usage of ReLU is pre- the CNN architecture parameters are fine-tuned until the
ferred because the derivative of the function helps back- results match the desired accuracy.
propagation work considerably faster without making any
noticeable difference to generalization accuracy [16]. 3.1 Data acquisition
Fully connected layer/output layer Fully connected layer is
The three-channel image frames (RGB) are retrieved from
used to compute scores of different features for classifi-
the camera, and then these images are passed to the image
cation. The dimensions of the output volume are
preprocessing module. The dataset consists of the collec-
[1 9 1 9 N], where N represents the number of output
tion of the RGB images for different static signs. The
classes to be evaluated. Each output neuron is connected
dataset comprises 35,000 images which include 350 images
with all other neurons in the previous layer with different
for each of the static signs. There are 100 distinct sign
sets of weights. Furthermore, the fully connected layer is a
classes that include 23 alphabets of English, 0–10 digits
set of convolutions in which each feature map is connected
and 67 commonly used words (e.g., bowl, water, stand,
with every field of the consecutive layer and filters consist
hand, fever, etc.). The dataset consists of static sign images
of the same size as that of the input image [16].
with various sizes, colors and taken under different envi-
For example, a fully connected layer with
ronmental conditions to assist in the better generalization
[63 9 63 9 16] volume and a convolutional layer use fil-
of the classifier. A few examples from the dataset are
ter size 16, giving output volume [1 9 1 9 63,504].
shown in Fig. 4.
The final and last layer is the classification layer. As this
sign language recognition is a multi-classification problem,
3.2 Data preprocessing
softmax function is used in the output layer for classifica-
tion. Finally, the last fully connected layer with 1000
The data preprocessing is the application of different
neurons is used that computes the class scores. Here, 1000
morphological operations that are used to remove noise
represents the total number of classes in the dataset.
from the data. In this phase, the sign images are prepro-
Generally, the CNN architecture consists of four main
cessed using two methods that are image resizing and
layers that are a convolutional layer, the pooling layer, the
normalization. In image resizing, the image is resized to
ReLU layer and the fully connected or output layer. The
128 9 128. These images are then normalized to change
proposed sign language recognition system has been tested
the range of pixel intensity values which results in mean 0
on approximately 50 models of CNN by making variations
and variance 1.
in the hyperparameters such as filter size, stride and pad-
ding as presented in Sect. 3. The system has also been
3.3 Model training
tested by changing the number of convolutional and
pooling layers. To enhance the effectiveness of the results,
The model training is based upon convolutional neural
one more layer, i.e., dropout layer, is also added in the
networks. The proposed model is trained using the Tesla
proposed approach, which is a regularization technique
K80 Graphical Processing Unit (GPU), 12 GB memory,
used to ignore randomly selected neurons at the time of
64 GB Random Access Memory (RAM) and 100 GB Solid
training and it helps in reducing the chances of over-fitting.
State Drive (SSD). The classifier takes the preprocessed
sign images and classifies it into the corresponding cate-
gory. The classifier is trained on the dataset of different ISL
3 System design and rationale
signs. The dataset is shuffled and divided into training and
validation set with the size of training set being 80% of the
The proposed sign language recognition system includes
whole dataset. Shuffling the dataset is very significant in
four major phases that are data acquisition, image prepro-
terms of adding randomness to the process of neural net-
cessing, training and testing of the CNN classifier. Figure 3
work training which prevents the network from being
describes the data flow diagram depicting the working
biased toward certain parameters. The configuration of the
model of the system. The first phase is the data acquisition
CNN architecture used in the proposed system is described
phase, in which the RGB data of static signs get collected
in Table 1.
using a camera. The collected sign images are then pre-
processed using image resizing and normalization. These
3.4 Testing
normalized images are stored in the data store for future
use. In the next phase, the proposed system gets trained
The developed sign language recognition system has been
using CNN classifier and then the trained model is used to
tested on approximately 50 convolutional neural network
perform testing. The last phase is the testing phase in which
models. The algorithms with different optimizers are used

123
7962 Neural Computing and Applications (2020) 32:7957–7968

Fig. 3 System flowchart

to train the network for a maximum of 100 epochs with the function and predict results as accurate as possible. In this
loss function as categorical cross-entropy. Some of the paper, the proposed model is tested on different optimizers
other parameters which were used to fine-tune the network such as Adaptive Moment Estimation (Adam), Adagrad,
architecture based upon the preliminary results and after Adadelta, RMSprop and Stochastic Gradient Descent
applying some heuristics to increase the accuracy and find (SGD). The model is trained using AlexNet and Adam as
an optimal CPU/GPU computing usage are described in an optimizer and achieved training and validation accuracy
Table 2. of 10% and 5%, respectively. It took a total 4 h to train our
It can be observed from Table 2 that the accuracy of the model, and it has been observed that the model obtained is
proposed model gets increased as we limit the number of highly under-fitted. In the next step, we have reduced the
layers in CNN architecture. The training and validation number of layers from 8 to 5 and it has been found that the
accuracy get increased to 99.17% and 98.80% by reducing training and validation accuracy get increased to 42% and
the number of layers from 8 to 4, respectively. On the other 26%, respectively, using Adam as an optimizer and 16
hand, the accuracy gets decreased as we alter the number of filters. The proposed model achieved the best result with
filters from 16 filters to 32 filters and then to 64 filters with training and validation accuracy of 99.17% and 98.80%,
20 epochs. It has been observed that the recognition rate is respectively, using total 4 layers, 16 filters and Adam as an
high with only 20 epochs. optimizer.
The optimizers are used to tweak the parameters or The proposed model is tested using different optimizers.
weights of the model which helps in minimizing the loss Experimental results with respect to optimizers and colored

123
Neural Computing and Applications (2020) 32:7957–7968 7963

Fig. 4 Sample dataset

Table 1 Proposed system architecture faster calculations and performs updates more frequently
on massive datasets.
Layer type Output size Parameters
The proposed model is also tested on grayscale data.
Input 128 9 128 9 3 – The results obtained with respect to different optimizers, 16
Conv2d_1 (128, 128, 16) 448 filters, 4 layers and grayscale image datasets are given in
Conv2d_2 (126, 126, 16) 2320 Table 4. It has been observed that the model achieved the
Maxpooling2d_1 (63, 63, 16) 0 training and validation accuracy of 99.24% and 98.85%,
Dropout (63, 63, 16) 0 respectively, using Adam optimizer. The system achieved
Flatten 63,504 0 training and validation accuracy of 99.76% and 98.35%,
Dense_1 (FC1) 64 4,064,272 respectively, using RMSProp and it has been found that the
Dense_2 (FC2) 100 6500 SGD optimizer outperformed Adam, RMSProp and other
Total parameters 4,073,540 optimizers with training and validation accuracy of 99.90%
Trainable parameters 4,073,540 and 98.70%, respectively, on grayscale image dataset.
Non-trainable parameters 0

4 Experimentation and results

image datasets are represented in Table 3. It has been The performance of the Indian Sign Language recognition
observed that the SGD outperformed RMSProp, Adam and system is evaluated on the basis of two different experi-
other optimizers with 16 filters and 4 layers. The proposed ments. Firstly, the parameters used in training the model
model obtained the training and validation accuracy of are fine-tuned in which the number of layers, number of
99.72% and 98.56% using SGD optimizer, respectively. filters and optimizers have been changed. In the second
However, it is the distinct advantage of SGD that it does experiment, the performance of the trained model is

123
7964 Neural Computing and Applications (2020) 32:7957–7968

Table 2 Experimental results with respect to parameters

Number of layers Number of filters Training accuracy (%) Validation accuracy (%) Number of epochs

8 (5 CL, 3FC) 16 10 5 100

5 (3 CL, 2FC) 16 42 26 100
4 (2 CL, 2FC) 16 99.17 98.80 20
4 (2 CL, 2FC) 32 98.82 98.53 20
4 (2 CL, 2FC) 64 99.05 98.76 20

Table 3 Experimental results

Model Training accuracy (%) Training loss Validation accuracy (%) Validation loss Optimizer
with respect to optimizer and
colored images I 99.17 0.0280 98.80 0.0684 Adam
II 99.59 0.0378 98.27 0.1940 RMSProp
III 99.72 0.0126 98.56 0.0759 SGD

Table 4 Experimental results

Model Training accuracy (%) Training loss Validation accuracy (%) Validation loss Optimizer
with respect to optimizer and
grayscale images I 99.24 0.0280 98.85 0.0684 Adam
II 99.76 0.0378 98.35 0.1940 RMSProp
III 99.90 0.0126 98.70 0.0759 SGD

evaluated on color as well as on the grayscale image The training concluded after the 20th epoch due to stag-
dataset. The average precision, recall, F1-score and accu- nation in the improvement in validation loss.
racy of the ISL recognition system have also been
computed. 4.1 Comparison with existing systems
Precision is defined as,
TP=ðTP þ FPÞ ð5Þ The comparative analysis of the proposed Indian Sign
Language recognition system with other classifiers using
where TP and FP are the numbers of true and false posi- our own dataset is shown in Table 6. It has been found that
tives, respectively. the authors of the existing systems have used machine
The Recall is defined as, learning-based techniques for classification, whereas in our
TP=ðTP þ FNÞ ð6Þ methodology we have proposed an Indian Sign Language
recognition system using a deep learning-based CNN
where FN is the number of false negatives technique. It has been observed that the proposed Indian
The F1-score is defined as, Sign Language recognition system outperformed all the
2 Precision Recall/ðPrecision þ RecallÞ ð7Þ other existing ISL systems with an accuracy of 99.90%. It
has been also concluded that the CNN convolute structure
The classification performance for some of the grayscale
in large datasets by using the algorithm of backpropagation
sign samples showing precision, recall and F1 score is
which indicates how a machine could change its parame-
shown in Table 5. The complete description of results for
ters that are used to evaluate the representation in each
all the signs is given in ‘‘Appendix.’’
layer from the representation in the previous layer.
The compilation accuracy and loss range from about
The results of the proposed CNN-based sign language
12% and 3.623 after the third epoch to 99.90% and 0.012
recognition system are best when experimentation was
after the 20th epoch on training data, whereas the valida-
performed with different number of layers in CNN archi-
tion accuracy and validation loss range from 14 and 3.458
tecture. The rigorous experimentation was also performed
to 98.70% and 0.023 during the first 20 epochs as described
to find the optimal parameter values (number of layers,
in Fig. 5. The early stopping mechanism is also applied in
kernel size) for the implementation of the algorithm.
case the validation accuracy stops improving before the
completion of maximum of 30 epochs to avoid over-fitting.

123
Neural Computing and Applications (2020) 32:7957–7968 7965

Table 5 Classification
Sign Precision Recall F1-score Sign Precision Recall F1-score
performance
A 1.00 0.96 0.98 Me 1.00 1.00 1.00
Afraid 0.97 0.97 0.97 Nose 0.98 1.00 0.99
B 1.00 1.00 1.00 Oath 1.00 1.00 1.00
Bent 0.97 1.00 0.99 Open 1.00 0.97 0.98
Coolie 0.97 0.94 0.96 P 1.00 0.97 0.98
Claw 1.00 1.00 1.00 Pray 1.00 1.00 1.00
D 0.79 0.97 0.87 Q 0.97 1.00 0.99
Doctor 0.98 1.00 0.99 S 0.95 1.00 0.97
Eight 0.96 0.90 0.93 Sick 1.00 1.00 1.00
Eye 1.00 1.00 1.00 Strong 0.97 1.00 0.98
Fever 0.95 1.00 0.97 T 0.99 1.00 0.99
Fist 0.97 0.98 0.97 Tongue 0.99 1.00 0.99
Gun 0.97 1.00 0.99 Trouble 1.00 0.95 0.97
H 1.00 1.00 1.00 U 1.00 0.99 0.99
Hand 0.97 1.00 0.98 V 1.00 1.00 1.00
I 1.00 1.00 1.00 West 1.00 0.93 0.96
Jain 0.99 1.00 0.99 Water 0.93 0.98 0.95

Fig. 5 Accuracy and loss curves for training and validation datasets

Table 6 Comparative analysis

Author Technique used Recognition rate (%)
of the proposed ISL system with
other classifiers Rahaman et al. [17] K-nearest neighbors 95.95
Uddin and Chowdhury [18] Support vector machine 97.90
Rao and Kishore [19] Artificial neural network 98
Proposed system CNN 99.90

5 Conclusion and future scope layers. Each convolutional layer consists of different fil-
tering window sizes which help in improving the speed and
In this research, an effective method for the recognition of accuracy of recognition. A web camera-based dataset of
ISL digits, alphabets and words used in daily routine is 35,000 images from 100 static signs has been generated
presented. The proposed CNN architecture is designed with under different environmental conditions. The proposed
convolutional layers, followed by ReLU and max-pooling architecture has been tested on approximately 50 deep

123
7966 Neural Computing and Applications (2020) 32:7957–7968

learning models using different optimizers. The system videos into frames. A video sequence contains temporal as
results in the highest training and validation accuracy of well as spatial features. Firstly, a hand object is focused to
99.17% and 98.80%, respectively, with respect to change reduce the time and space complexity of network. After
in parameters such as the number of layers and number of that, the spatial features are extracted from the video
filters. The proposed system is also tested using different frames and the temporal features are extracted by relating
optimizers, and it has been found that SGD outperformed the video frames in the meantime. The frames of the
Adam and RMSProp optimizers with training and valida- training set will be given to the CNN model for training
tion accuracy of 99.90% and 98.70%, respectively, on the process. Finally, the trained model will be used as future
grayscale image dataset. The results of the proposed system reference to make predictions of the training and test data.
have also been evaluated on the basis of precision, recall The work will also be extended to develop a mobile-based
and F-score. It has been found that the system outper- application for the recognition of different signs in real
formed other existing systems even with less number of time.
epochs.
The major source of challenge in sign language recog- Acknowledgements This publication is an outcome of R&D work
undertaken in the project under the Visvesvaraya PhD Scheme of
nition is the capability of sign recognition systems to Ministry of Electronics and Information Technology, Government of
adequately process a large number of different manual India, being implemented by Digital India Corporation (formerly
signs while executing with low error rates. For this con- Media Lab Asia). We gratefully acknowledge the support of NVIDIA
dition, it has been shown that the proposed system is robust Corporation with the donation of the Titan XP GPU used for this
research.
enough to learn 100 different static manual signs with
lower error rates, as in contrast to other recognition systems
described in other works in which few hand signs are
Compliance with ethical standards
considered for recognition. Conflict of interest The authors declare that they have no conflict of
For future work, there is a need to collect more datasets interest.
to refine the recognition method. Furthermore, the experi-
mentation is ongoing on the trained CNN model to rec-
ognize signs in real time. In addition, the system will be Appendix
extended to recognize dynamic signs which require the
collection and development of a video-based dataset and
the system is tested using CNN architecture by dividing the

Grayscale sign samples showing precision, recall and F1 score

S no. Sign Precision Recall F1-score S no. Sign Precision Recall F1-score

1 A 1.00 0.96 0.98 51 Leprosy 1.00 1.00 1.00

2 Afraid 0.97 0.97 0.97 52 M 0.97 0.79 0.87
3 Add 1.00 1.00 1.00 53 Me 1.00 1.00 1.00
4 B 1.00 1.00 1.00 54 N 1.00 1.00 1.00
5 Bottle 1.00 0.97 0.99 55 Nose 0.98 1.00 0.99
6 Bud 1.00 1.00 1.00 56 Nine 1.00 1.00 1.00
7 Bent 0.97 1.00 0.99 57 Nurse 0.99 0.98 0.98
8 Between 1.00 0.99 0.99 58 O 0.96 0.97 0.96
9 Blind 1.00 1.00 1.00 59 Oath 1.00 1.00 1.00
10 Bowl 0.97 1.00 0.98 60 One 1.00 0.99 0.99
11 Brain 0.99 0.99 0.99 61 Open 1.00 0.97 0.98
12 C 1.00 1.00 1.00 62 Owl 1.00 1.00 1.00
13 Coolie 0.97 0.94 0.96 63 P 1.00 0.97 0.98
14 Cough 1.00 1.00 1.00 64 Policy 0.98 1.00 0.99
15 Cow 0.99 0.97 0.98 65 Pray 1.00 1.00 1.00
16 Chest 1.00 1.00 1.00 66 Promise 1.00 1.00 1.00
17 Claw 1.00 1.00 1.00 67 Q 0.97 1.00 0.99
18 D 0.79 0.97 0.87 68 R 0.98 1.00 0.99
19 Devil 0.98 0.94 0.96 69 S 0.95 1.00 0.97

123
Neural Computing and Applications (2020) 32:7957–7968 7967

S no. Sign Precision Recall F1-score S no. Sign Precision Recall F1-score

20 Doctor 0.98 1.00 0.99 70 Seven 0.96 1.00 0.98

21 E 1.00 1.00 1.00 71 Soldier 1.00 1.00 1.00
22 East 1.00 1.00 1.00 72 Shirt 1.00 1.00 1.00
23 Eight 0.96 0.90 0.93 73 Six 1.00 0.97 0.99
24 Evening 1.00 0.96 0.98 74 Sick 1.00 1.00 1.00
25 Elbow 0.95 1.00 0.98 75 Skin 1.00 1.00 1.00
26 Eye 1.00 1.00 1.00 76 Shoulder 1.00 0.98 0.99
27 F 0.97 0.97 0.97 77 Stand 1.00 1.00 1.00
28 Fat 1.00 1.00 1.00 78 Strong 0.97 1.00 0.98
29 Faith 0.98 1.00 0.99 79 Sleep 1.00 0.95 0.98
30 Fever 0.95 1.00 0.97 80 Sunday 1.00 1.00 1.00
31 Feel 0.97 1.00 0.98 81 T 0.99 1.00 0.99
32 Few 1.00 1.00 1.00 82 Ten 1.00 0.98 0.99
33 Food 0.98 0.86 0.92 83 Telephone 1.00 1.00 1.00
34 Four 1.00 0.96 0.98 84 Tongue 0.99 1.00 0.99
35 Fist 0.97 0.98 0.97 85 Thorn 0.96 1.00 0.98
36 Five 1.00 1.00 1.00 86 Three 0.97 1.00 0.99
37 G 0.97 0.97 0.97 87 Trouble 1.00 0.95 0.97
38 Gun 0.97 1.00 0.99 88 Two 1.00 1.00 1.00
39 Good 1.00 1.00 1.00 89 U 1.00 0.99 0.99
40 H 1.00 1.00 1.00 90 V 1.00 1.00 1.00
41 Hair 1.00 1.00 1.00 91 W 1.00 1.00 1.00
42 Hand 0.97 1.00 0.98 92 West 1.00 0.93 0.96
43 Head 0.90 0.99 0.94 93 Wedding 0.99 0.99 0.99
44 Hear 0.97 1.00 0.98 94 Water 0.93 0.98 0.95
45 I 1.00 1.00 1.00 95 White 1.00 0.98 0.99
46 Jain 0.99 1.00 0.99 96 X 1.00 1.00 1.00
47 K 0.98 0.98 0.98 97 Y 1.00 1.00 1.00
48 King 1.00 0.95 0.97 98 You 0.97 1.00 0.99
49 L 1.00 1.00 1.00 99 Z 0.99 1.00 0.99
50 Love 1.00 0.98 0.99 100 Zero 1.00 1.00 1.00

References 5. Huang J, Zhou W, Li H, Li W (2015) Sign language recognition

using 3D convolutional neural networks. In: IEEE international
conference on multimedia and expo (ICME), pp 1–6
1. Corballis MC (2003) From mouth to hand: gesture, speech and
6. Huang J, Zhou W, Li H, Li W (2015) Sign language recognition
the evolution of right-handedness. Behav Brain Sci
using real-sense. In: IEEE China summit and international con-
26(2):199–208
ference on signal and information processing (ChinaSIP),
2. Oyedotun OK, Khashman A (2017) Deep learning in vision-
pp 166–170
based static hand gesture recognition. Neural Comput Appl
7. Pigou L, Dieleman S, Kindermans PJ, Schrauwen B (2014) Sign
28(12):3941–3951
language recognition using convolutional neural networks. In:
3. Nagi J, Ducatelle F, Di Caro GA, Cireşan D, Meier U, Giusti A,
Workshop at the European conference on computer vision.
Gambardella LM (2011) Max-pooling convolutional neural net-
Springer, Cham, pp 572–578
works for vision-based hand gesture recognition. In: IEEE
8. Molchanov P, Gupta S, Kim K, Pulli K (2015) Multi-sensor
international conference on signal and image processing appli-
system for driver’s hand-gesture recognition. In: 11th IEEE
cations (ICSIPA), pp 342–347
international conference and workshops on automatic face and
4. Rioux-Maldague L, Giguere P (2014) Sign language finger-
gesture recognition (FG), vol 1, pp 1–8
spelling classification from depth and color images using a deep
9. Tang A, Lu K, Wang Y, Huang J, Li H (2015) A real-time hand
belief network. In: IEEE Canadian conference on computer and
posture recognition system using deep neural networks. ACM
robot vision (CRV), pp 92–97
Trans Intell Syst Technol (TIST) 6(2):21

123
7968 Neural Computing and Applications (2020) 32:7957–7968

10. Yang S, Zhu Q (2017) Video-based Chinese sign language topographical descriptor on a 2—stream CNN. Neurocomput
recognition using convolutional neural network. In: IEEE 9th 372:40–54
international conference on communication software and net- 16. Prabhu R (2018) Understanding of convolutional neural network
works (ICCSN), pp 929–934 (CNN) — deep learning. https://medium.com/@RaghavPrabhu/
11. Tushar AK, Ashiquzzaman A, Islam MR (2017) Faster conver- understanding-of-convolutional-neural-network-cnn-deep-learn
gence and reduction of overfitting in numerical hand sign ing-99760835f148. Accessed 4 Mar 2018
recognition using DCNN. In: Humanitarian technology confer- 17. Rahaman MA, Jasim M, Ali MH, Hasanuzzaman M (2014) Real-
ence (R10-HTC), IEEE Region 10, pp 638–641 time computer vision-based Bengali Sign Language recognition.
12. Bheda V, Radpour D (2017) Using deep convolutional networks In: 17th IEEE international conference on computer and infor-
for gesture recognition in American sign language. arXiv preprint mation technology (ICCIT), pp 192–197
arXiv:1710.06836 18. Uddin MA, Chowdhury SA (2016) Hand sign language recog-
13. Rao GA, Syamala K, Kishore PVV, Sastry ASCS (2018) Deep nition for Bangla alphabet using support vector machine. In:
convolutional neural networks for sign language recognition. In: IEEE international conference on innovations in science, engi-
IEEE conference on signal processing and communication engineering and technology (ICISET), pp 1–4
neering systems (SPACES), pp 194–197 19. Rao GA, Kishore PVV (2017) Selfie video based continuous
14. Koller O, Zargaran S, Ney H, Bowden R (2018) Deep sign: Indian sign language recognition system. Ain Shams Eng J
enabling robust statistical continuous sign language recognition 9(4):1929–1939
via hybrid CNN-HMMs. Int J Comput Vis 126(12):1311–1325
15. Kumar EK, Kishore PVV, Kiran Kumar MT (2019) 3D sign Publisher’s Note Springer Nature remains neutral with regard to
language recognition with joint distance and angular coded color jurisdictional claims in published maps and institutional affiliations.

123

SIGNLANGUAGE PPT
100% (1)
SIGNLANGUAGE PPT
15 pages
SushmaArora-Cyber Crimes and Laws
0% (1)
SushmaArora-Cyber Crimes and Laws
22 pages
Software Project Management Plan
100% (1)
Software Project Management Plan
15 pages
The Complete Guide To Simple OEE
100% (3)
The Complete Guide To Simple OEE
26 pages
IRC5-DeviceNet Application Manual3HAC020676-001 RevB en
100% (1)
IRC5-DeviceNet Application Manual3HAC020676-001 RevB en
155 pages
Kendall Sad9 PP 12 GE
No ratings yet
Kendall Sad9 PP 12 GE
53 pages
Operations (Math IBA)
No ratings yet
Operations (Math IBA)
45 pages
Sign Lang Detection Project
No ratings yet
Sign Lang Detection Project
16 pages
OI Gateway User Guide
No ratings yet
OI Gateway User Guide
115 pages
Advantages and Disadvantages of Information Gathering Techniques
100% (3)
Advantages and Disadvantages of Information Gathering Techniques
4 pages
LPC17xx ARM Cortex M3 Assembly Language Example
67% (3)
LPC17xx ARM Cortex M3 Assembly Language Example
11 pages
Discrete Mathematics For Computer Science
No ratings yet
Discrete Mathematics For Computer Science
19 pages
Cycle Counting: Configuration, Process Flow and Implementation
No ratings yet
Cycle Counting: Configuration, Process Flow and Implementation
7 pages
CNN and Stacked LSTM Model For Indian Sign Language Recognition
No ratings yet
CNN and Stacked LSTM Model For Indian Sign Language Recognition
9 pages
Table of Specifications (Tos) Epp 6 - Ict and Entrepreneurship - Quarter 1
100% (1)
Table of Specifications (Tos) Epp 6 - Ict and Entrepreneurship - Quarter 1
1 page
Verification Mag 2018 Issue
No ratings yet
Verification Mag 2018 Issue
39 pages
Data Structure
No ratings yet
Data Structure
109 pages
S4 B.tech (2019) Syllabus
No ratings yet
S4 B.tech (2019) Syllabus
175 pages
All Research
No ratings yet
All Research
133 pages
Plag Free
No ratings yet
Plag Free
28 pages
C.Homework 4 PDF
No ratings yet
C.Homework 4 PDF
3 pages
Sign Language AI With Box Detectors
No ratings yet
Sign Language AI With Box Detectors
29 pages
Report 1
No ratings yet
Report 1
30 pages
Sign Language To Text-Speech Translator Using Machine Learning
No ratings yet
Sign Language To Text-Speech Translator Using Machine Learning
5 pages
Deloitte Consulting 401
No ratings yet
Deloitte Consulting 401
23 pages
Gv500 User Manual v1.00
No ratings yet
Gv500 User Manual v1.00
17 pages
Paper 3+ijisae
No ratings yet
Paper 3+ijisae
15 pages
Final PPT Capstone Project
No ratings yet
Final PPT Capstone Project
17 pages
OSS BSS The Challenges Ahead
No ratings yet
OSS BSS The Challenges Ahead
25 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
47 pages
Deep Learning-Based Approach For Sign Language Gesture Recognition With Efficient Hand Gesture Representation
No ratings yet
Deep Learning-Based Approach For Sign Language Gesture Recognition With Efficient Hand Gesture Representation
16 pages
Sign Language RECOGNITION USING DEEP LEARNING
No ratings yet
Sign Language RECOGNITION USING DEEP LEARNING
28 pages
Practical: 1 Unit Impulse Response: Num (0 0 1) Den (1 0.2 1) Impulse (Num, Den) Grid Title
No ratings yet
Practical: 1 Unit Impulse Response: Num (0 0 1) Den (1 0.2 1) Impulse (Num, Den) Grid Title
20 pages
Presentation 1
No ratings yet
Presentation 1
12 pages
Particles Basic English
No ratings yet
Particles Basic English
21 pages
Real Time Hand Gesture Recognition Research
No ratings yet
Real Time Hand Gesture Recognition Research
11 pages
Journal Paper - Sign Language
No ratings yet
Journal Paper - Sign Language
10 pages
Changes Affecting Feasibility
No ratings yet
Changes Affecting Feasibility
13 pages
Sign Language
No ratings yet
Sign Language
22 pages
Mathematics 11 03729
No ratings yet
Mathematics 11 03729
20 pages
Indian Sign Language Recognition System For Dynamic Signs
No ratings yet
Indian Sign Language Recognition System For Dynamic Signs
9 pages
Updated
No ratings yet
Updated
30 pages
Final Year Project
No ratings yet
Final Year Project
19 pages
Sign Language Recognition
No ratings yet
Sign Language Recognition
24 pages
Hermes Opcodes Table
No ratings yet
Hermes Opcodes Table
28 pages
Sign Lang Detection Project
No ratings yet
Sign Lang Detection Project
18 pages
A Survey On Sign Language Recognition Systems
No ratings yet
A Survey On Sign Language Recognition Systems
27 pages
Development of An End-To-End Deep Learning Framework For Sign Language Recognition Translation and Video Generation
No ratings yet
Development of An End-To-End Deep Learning Framework For Sign Language Recognition Translation and Video Generation
17 pages
Sign 1
No ratings yet
Sign 1
10 pages
Selfie Video Based Continuous Indian Sign Language Recognition System PDF
No ratings yet
Selfie Video Based Continuous Indian Sign Language Recognition System PDF
11 pages
1 s2.0 S2590005622000121 Main
No ratings yet
1 s2.0 S2590005622000121 Main
14 pages
Hand Gesture Detection Using Deep Learning Demo
No ratings yet
Hand Gesture Detection Using Deep Learning Demo
9 pages
A Survey of Sign Language Recognition
No ratings yet
A Survey of Sign Language Recognition
6 pages
Sensors 21 01120 v2
No ratings yet
Sensors 21 01120 v2
22 pages
Hand Gesture Based Sign Language Recognition Using Deep Learning
No ratings yet
Hand Gesture Based Sign Language Recognition Using Deep Learning
5 pages
Visual Language Interpreter
No ratings yet
Visual Language Interpreter
7 pages
Referencia N°03
No ratings yet
Referencia N°03
11 pages
PFX 48420843
No ratings yet
PFX 48420843
6 pages
University of Southern Mindanao College of Engineering and Computing Department of Computer Engineering
No ratings yet
University of Southern Mindanao College of Engineering and Computing Department of Computer Engineering
9 pages
A Review On The Perception and Recognition Systems For Interpreting Sign Languages Used by Deaf and Mute
No ratings yet
A Review On The Perception and Recognition Systems For Interpreting Sign Languages Used by Deaf and Mute
6 pages
Sign Language Recognition System Using Convolutional Neural Network and Computer Vision
No ratings yet
Sign Language Recognition System Using Convolutional Neural Network and Computer Vision
6 pages
Deepcnn Handgestures
No ratings yet
Deepcnn Handgestures
9 pages
Paper 2728
No ratings yet
Paper 2728
10 pages
Sign Language Detection Presentation
No ratings yet
Sign Language Detection Presentation
9 pages
PC 3000
No ratings yet
PC 3000
3 pages
Sign Language Detection Using Mediapipe and Deep Learning
No ratings yet
Sign Language Detection Using Mediapipe and Deep Learning
6 pages
Solution8 9
No ratings yet
Solution8 9
7 pages
Quick Start Guide: Industrial Automation
No ratings yet
Quick Start Guide: Industrial Automation
25 pages
A Review of Sign Language Classification Techniques
No ratings yet
A Review of Sign Language Classification Techniques
7 pages
An Empirical Analysis of CNN For American Sign Language Recognition
No ratings yet
An Empirical Analysis of CNN For American Sign Language Recognition
8 pages
Vigneshkumar - Sign Language Recognition
No ratings yet
Vigneshkumar - Sign Language Recognition
8 pages
Sign Language Recognition Using Deep Learning Through LSTM and CNN
No ratings yet
Sign Language Recognition Using Deep Learning Through LSTM and CNN
5 pages
IJCRT2402668
No ratings yet
IJCRT2402668
7 pages
Deep Learning For Sign Language Recognition
No ratings yet
Deep Learning For Sign Language Recognition
4 pages
Recognition of Indian Sign Language Alphanumeric Gestures Based On Global Features
No ratings yet
Recognition of Indian Sign Language Alphanumeric Gestures Based On Global Features
6 pages
Elements of Statistics - Fergus Daly, Et Al
No ratings yet
Elements of Statistics - Fergus Daly, Et Al
4 pages
Sign Language Recognition System - A Survey
No ratings yet
Sign Language Recognition System - A Survey
5 pages
Sign Language Recognition Using Deep Learning and Computer Vision
No ratings yet
Sign Language Recognition Using Deep Learning and Computer Vision
6 pages
Electronics 13 01229 v2
No ratings yet
Electronics 13 01229 v2
13 pages
Indian Sign Language Recognition System
No ratings yet
Indian Sign Language Recognition System
3 pages
Review Paper
No ratings yet
Review Paper
5 pages
IEEE Conference Template 1
No ratings yet
IEEE Conference Template 1
5 pages
Sign Language Detection From Hand Gesture Images Using Deep Multi-Layered Convolution Neural Network
No ratings yet
Sign Language Detection From Hand Gesture Images Using Deep Multi-Layered Convolution Neural Network
5 pages
A Real Time Hand Gesture Recognition For Indian Sign Language Using Advanced Neu
No ratings yet
A Real Time Hand Gesture Recognition For Indian Sign Language Using Advanced Neu
5 pages
Aman Jain
No ratings yet
Aman Jain
2 pages
Sign Language Recognition Using ML
No ratings yet
Sign Language Recognition Using ML
3 pages
At 2 Manuscript
No ratings yet
At 2 Manuscript
2 pages
Lit Survey
No ratings yet
Lit Survey
2 pages
Deep Learning: Fundamentals and Applications
From Everand
Deep Learning: Fundamentals and Applications
Fouad Sabry
No ratings yet

Deep Learning-Based Sign Language Recognition System For Static Signs

Uploaded by

Deep Learning-Based Sign Language Recognition System For Static Signs

Uploaded by

Neural Computing and Applications (2020) 32:7957–7968

Deep learning-based sign language recognition system for static signs

1 Introduction Humans have been adapting to sign language to com-

Fig. 1 High-level general CNN architecture

Fig. 2 The convolution

Fig. 3 System flowchart

Fig. 4 Sample dataset

4 Experimentation and results

Table 2 Experimental results with respect to parameters

8 (5 CL, 3FC) 16 10 5 100

Table 3 Experimental results

Table 4 Experimental results

Table 6 Comparative analysis

Grayscale sign samples showing precision, recall and F1 score

1 A 1.00 0.96 0.98 51 Leprosy 1.00 1.00 1.00

20 Doctor 0.98 1.00 0.99 70 Seven 0.96 1.00 0.98

References 5. Huang J, Zhou W, Li H, Li W (2015) Sign language recognition

You might also like