International Journal of Advanced Research in Science, Communication and Technology (IJARSCT)
Abstract: Sign language is a visual means of communication that uses hand signals, gestures, and facial expressions. It is the main form of communication for people with hearing or speech impairments, and people with disabilities such as autism spectrum disorder may also find it beneficial for communicating. The proposed system recognizes Indian Sign Language using a keypoint detection model: keypoints are extracted from each frame, assembled into a sequence, and passed to an action detection model that predicts which sign is being demonstrated across several frames. The system uses MediaPipe Holistic to extract the keypoints, which allows keypoints to be taken from the user's hands, body, and face. TensorFlow and Keras are then used to build a deep neural network with LSTM layers that captures the temporal component of the sequence and predicts the action being performed. Finally, MediaPipe Holistic and the trained LSTM model are combined using OpenCV to predict signs in real time from a webcam feed.
Keywords: Machine Learning, Feature Extraction, Mediapipe Holistic, LSTM, Sign Language
Recognition.
I. INTRODUCTION
Sign language is the primary mode of communication for the hearing and vocally impaired population, and for people with hearing impairment in India, Indian Sign Language is an important communication medium. People with disabilities including autism and Down syndrome may also find sign language useful for communicating. There are more than 300 different sign languages, and they vary from nation to nation; countries that share a spoken language do not necessarily share a sign language. American, British, and Australian Sign Language, for example, are three distinct sign languages even though all three countries speak English.
Our application uses a deep learning model for its implementation. Deep learning is a subfield of machine learning and is essentially a neural network with three or more layers. It is concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain. These networks aim to simulate the behavior of the human brain, allowing them to learn from large amounts of data. Using a combination of data inputs, weights, and biases, deep learning neural networks attempt to mimic the human brain. Different types of neural networks are suited to specific problems or datasets. For example:
1. Convolutional Neural Network (CNN): A CNN contains one or more convolutional layers and a three-dimensional arrangement of neurons. It takes an image as input, assigns significance to various aspects or objects within the image, and learns to differentiate one from another. CNNs are used for image processing, computer vision, speech recognition, and similar tasks.
2. Recurrent Neural Network (RNN): In an RNN, the output of a particular layer is saved and fed back to the input, which helps the network predict the layer's next output. If the prediction is wrong, the learning rate is used to make small adjustments. RNNs appear in text-to-speech (TTS) conversion models, in text-processing tasks such as grammar checking and auto-suggestion, and in sentiment analysis, image tagging, and translation. LSTM networks, the RNN variant used in the proposed system, are illustrated in the sketch that follows this list.
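As a concrete illustration of the pipeline outlined in the abstract, the following is a minimal sketch, assuming a 30-frame window and a three-sign vocabulary (both illustrative choices, not values taken from any of the surveyed papers). It uses MediaPipe Holistic to flatten each frame's landmarks into a fixed-length feature vector and a small stack of Keras LSTM layers to classify the resulting sequence:

```python
# Minimal sketch of the proposed keypoint-sequence pipeline (illustrative,
# not the authors' exact code). Requires: opencv-python, mediapipe, tensorflow.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten pose, face, and both-hand landmarks into one feature vector.

    Missing detections are zero-filled so every frame has the same length:
    33*4 (pose) + 468*3 (face) + 21*3 (left hand) + 21*3 (right hand) = 1662.
    """
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z]
                      for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[p.x, p.y, p.z]
                    for p in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[p.x, p.y, p.z]
                    for p in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])

SEQUENCE_LENGTH = 30                   # assumed frames per sign
ACTIONS = ["hello", "thanks", "yes"]   # illustrative sign vocabulary

# Three stacked LSTM layers learn the temporal component across the window;
# the softmax head assigns a probability to each sign.
model = Sequential([
    LSTM(64, return_sequences=True, activation="relu",
         input_shape=(SEQUENCE_LENGTH, 1662)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    Dense(64, activation="relu"),
    Dense(len(ACTIONS), activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])

# Real-time loop: OpenCV reads webcam frames, MediaPipe Holistic produces
# landmarks, and the most recent 30 keypoint vectors are fed to the model
# (which would be trained on recorded sign sequences before this point).
sequence = []
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        sequence = sequence[-SEQUENCE_LENGTH:]
        if len(sequence) == SEQUENCE_LENGTH:
            probs = model.predict(np.expand_dims(sequence, axis=0))[0]
            print(ACTIONS[int(np.argmax(probs))])
cap.release()
```

Keeping only a sliding window of the latest keypoint vectors lets predictions run continuously on the live feed rather than on pre-segmented clips.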
Figure 3: Flow diagram of the proposed system
Figure 4: Diagrammatic representation of the proposed Signet architecture
C. An Efficient Binarized Neural Network for Recognizing Two Hands Indian Sign Language Gestures in Real-time Environment [2]
Considering the challenges of sign language recognition on targeted embedded platforms, the authors proposed a novel binarized neural network (BNN) architecture with binary weights and activations, implemented using bitwise operations. Its advantage is that it achieves an overall accuracy of 98.8%, higher than other existing methods. Its disadvantages are that it misclassifies some signs, such as M, N, and E, because of their similar shapes, and that the proposed BNN architecture is limited to a small number of gesture classes.
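To make the binarization idea concrete, here is a minimal sketch (an illustration of the general BNN technique, not the architecture of [2]) of how a dense layer behaves once weights and activations are constrained to {-1, +1}; the resulting +/-1 matrix product is exactly what embedded hardware can realize with XNOR and popcount instead of floating-point multiplications:

```python
# Illustrative binarized dense layer; shapes and data are made up.
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dense(activations, real_weights):
    """Binarize inputs and weights, then take their integer matrix product.

    On bit-packed operands this product reduces to XNOR followed by a
    popcount, which is why BNNs suit targeted embedded platforms.
    """
    a_bin = binarize(activations).astype(np.int32)
    w_bin = binarize(real_weights).astype(np.int32)
    return a_bin @ w_bin

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64))    # one 64-dimensional activation vector
w = rng.standard_normal((64, 10))   # real-valued weights kept for training
print(binary_dense(x, w).shape)     # (1, 10) integer logits
```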
F. Video-based isolated hand sign language recognition using a deep cascaded model [8]
The authors proposed a deep cascaded model for systematic hand sign recognition from RGB videos, built from SSD, CNN, and LSTM stages. The model improved the accuracy and complexity of hand sign recognition and provided fast processing even in uncontrolled conditions such as rapid hand motion. Using more data, the detection accuracy can be improved further.
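The cascade can be sketched generically as a per-frame CNN feature extractor followed by an LSTM over time. The Keras stand-in below is an assumption for illustration, not the SSD/CNN/LSTM cascade of [8]; the clip length, crop size, and vocabulary size are made-up values:

```python
# Generic TimeDistributed-CNN + LSTM video classifier (illustrative only).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, Dense)

NUM_FRAMES, H, W = 16, 64, 64   # assumed clip length and hand-crop size
NUM_SIGNS = 10                  # illustrative vocabulary size

model = Sequential([
    # Per-frame CNN encoder (stands in for the detection + CNN stages).
    TimeDistributed(Conv2D(32, 3, activation="relu"),
                    input_shape=(NUM_FRAMES, H, W, 3)),
    TimeDistributed(MaxPooling2D()),
    TimeDistributed(Conv2D(64, 3, activation="relu")),
    TimeDistributed(MaxPooling2D()),
    TimeDistributed(Flatten()),
    # Temporal modelling over the sequence of per-frame features.
    LSTM(128),
    Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```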
G. A Modified-LSTM Model for Continuous Sign Language Recognition using Leap motion [10]
In this paper, the authors presented a novel framework for continuous sign language recognition (SLR) using the Leap Motion sensor, together with a modified LSTM architecture for recognizing sign words and sentences. Average accuracies of 72.3% and 89.5% were recorded on signed sentences and isolated sign words, respectively. Recognition performance could be improved with more training data for better model learning.
Figure 9: A proposed framework for continuous SLR using Leap Motion sensor
I. Deep learning-based sign language recognition system for static signs [5]
The authors achieved the highest training accuracy of 99.17% and a validation accuracy of 98.80% while varying the number of layers and filters. To refine the recognition method, more datasets need to be collected.
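For contrast with the sequence models above, a static-sign recognizer reduces to a plain image classifier. The following is a minimal sketch in that spirit; the layer counts, input size, and 35-class vocabulary are assumptions, not the configuration reported in [5]:

```python
# Illustrative CNN for single-image (static) sign classification.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

NUM_CLASSES = 35  # e.g. letters plus digits; illustrative only

model = Sequential([
    Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    MaxPooling2D(),
    Conv2D(64, 3, activation="relu"),
    MaxPooling2D(),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),                 # regularization against overfitting
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```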
6. Video-based isolated hand sign language recognition using a deep cascaded model, by Razieh Rastgoo, Kourosh Kiani, Sergio Escalera (2020). Advantages: The accuracy and complexity of hand sign recognition were improved by this model, and it provided fast processing even in uncontrolled environments such as rapid hand motions. Limitations: Using more data, the accuracy of detection can be improved.
7. A Modified-LSTM Model for Continuous Sign Language Recognition using Leap motion, by Anshul Mittal, Pradeep Kumar, Partha Pratim Roy, Raman Balasubramanian, Bidyut B. Chaudhuri (2019). Advantages: Average accuracies of 72.3% and 89.5% were recorded on signed sentences and isolated sign words, respectively. Limitations: Recognition performance can be improved by adding more training data for better model learning.
8. A depth-based Indian Sign Language recognition using Microsoft Kinect, by T. Raghuveera, R. Deepthi, R. Mangalashri, R. Akshaya (2019). Advantages: The average recognition accuracy was improved to 71.85% with this method, and the system achieved 100% accuracy for a few signs. Limitations: The system does not consider the environment of the gestures, leading to incorrect translations on many occasions.
9. Deep learning-based sign language recognition system for static signs, by Ankita Wadhawan, Parteek Kumar (2020). Advantages: The authors achieved the highest training accuracy of 99.17% and a validation accuracy of 98.80% while varying the number of layers and filters. Limitations: To refine the recognition method, more datasets need to be collected.
10. Deep Convolutional Neural Networks for Sign Language Recognition, by G. Anantha Rao, K. Syamala, P. V. V. Kishore, A. S. C. S. Sastry (2018). Advantages: Less training and validation loss is observed with the proposed CNN architecture. Limitations: The database is not available publicly.
IV. CONCLUSION
After studying the above-mentioned papers in detail, we observed that no existing system uses real-time video sequences to predict words of Indian Sign Language, nor works on a live video feed. Most systems work only for static signs, or capture images and compare them against a trained database or public datasets. There is therefore a need for a system that predicts words, and not just the alphabet, of Indian Sign Language from a live video feed.
ACKNOWLEDGMENT
We acknowledge with thanks the assistance and timely guidance we have received from our guide, Mrs. Shubhangi Vairagar. We are very thankful to Dr. S. V. Chobe, Head of the Department of Computer Engineering, and Dr. Pramod Patil, Principal, Dr. D Y Patil Institute of Technology, for their constant support and encouragement at various stages of our project. We wish to express our deep sense of gratitude to Prof. Abhijit Jadhav and Prof. Snehal, whose guidance, encouragement, suggestions, and very constructive criticism have contributed immensely to the evolution of our ideas on the project. Lastly, we would like to express our warm appreciation and sense of gratitude to every member of this group for their valuable contribution to making this project a success.
REFERENCES
[1]. P. K. Athira, C. J. Sruthi, A. Lijiya, "A Signer Independent Sign Language Recognition with Co-articulation Elimination from Live Videos: An Indian Scenario", 2019. https://www.sciencedirect.com/science/article/pii/S131915781831228X
[2]. M. Jaiswal, V. Sharma, A. Sharma, S. Saini, R. Tomar, "An Efficient Binarized Neural Network for Recognizing Two Hands Indian Sign Language Gestures in Real-time Environment", 2020. https://ieeexplore.ieee.org/abstract/document/9342454
[3]. C. J. Sruthi, A. Lijiya, "Signet: A Deep Learning based Indian Sign Language Recognition System", 2019. https://ieeexplore.ieee.org/abstract/document/8698006
[4]. T. Raghuveera, R. Deepthi, R. Mangalashri, R. Akshaya, "A depth-based Indian Sign Language recognition using Microsoft Kinect", 2020. https://link.springer.com/article/10.1007/s12046-019-1250-6
[5]. A. Wadhawan, P. Kumar, "Deep learning-based sign language recognition system for static signs", 2020. https://link.springer.com/article/10.1007/s00521-019-04691-y
[6]. S. Z. Gurbuz, A. C. Gurbuz, E. A. Malaia, D. J. Griffin, C. S. Crawford, M. M. Rahman, E. Kurtoglu, R. Aksu, T. Macks, R. Mdrafi, "American Sign Language Recognition Using RF Sensing", 2020. https://ieeexplore.ieee.org/abstract/document/9187644
[7]. M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, M. A. Mekhtiche, "Hand Gesture Recognition for Sign Language Using 3DCNN", 2020. https://ieeexplore.ieee.org/abstract/document/9078786
[8]. R. Rastgoo, K. Kiani, S. Escalera, "Video-based isolated hand sign language recognition using a deep cascaded model", 2020. https://link.springer.com/article/10.1007%2Fs11042-020-09048-5
[9]. G. A. Rao, K. Syamala, P. V. V. Kishore, A. S. C. S. Sastry, "Deep Convolutional Neural Networks for Sign Language Recognition", 2018. https://ieeexplore.ieee.org/abstract/document/8316344
[10]. A. Mittal, P. Kumar, P. P. Roy, R. Balasubramanian, B. B. Chaudhuri, "A Modified-LSTM Model for Continuous Sign Language Recognition using Leap motion", 2019. https://ieeexplore.ieee.org/abstract/document/8684245