International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 07 Issue: 10 | October - 2023 SJIF Rating: 8.176 ISSN: 2582-3930

A Survey of Sign Language Recognition

Vaishnavi Karanjkar, Rutuja Bagul, Raj Ranjan Singh, Rushali Shirke


Department of Computer Engineering, Smt. Kashibai Navale College of Engineering, Pune
[email protected], [email protected], [email protected], [email protected]

---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Sign language is mainly used by deaf (hard of hearing) and mute people to exchange information within their own community and with other people. It is a language in which people communicate through hand gestures, as they cannot speak or hear. The goal of sign language recognition (SLR) is to identify acquired hand motions and to continue until the related hand gestures are translated into text and speech. Static and dynamic hand gestures for sign language can be distinguished; the human community values both types of recognition, even though static hand gesture recognition is easier than dynamic hand gesture recognition. We use deep learning computer vision to recognize hand gestures by creating deep neural network designs (convolutional neural network designs), in which the model learns to detect hand-motion images over the training epochs. After the model successfully recognizes a motion, an English text file is created that can subsequently be translated to speech, and the user can choose from a variety of target languages for this translation. The application works entirely offline, without an internet connection. With this model's improved efficiency, communication will be easier for deaf (hard of hearing) and disabled people. We discuss the use of deep learning for sign language recognition in this paper.

Key Words: sign language, convolutional neural network, computer vision.

1. INTRODUCTION

The incorporation of sign language into multilingual text and voice output creates an innovative nexus of technology, linguistic accessibility, and inclusivity. For Deaf and hard of hearing people, sign language is a crucial means of communication that opens up the outside world to them. However, this language has frequently encountered difficulties when dealing with spoken and written languages, posing obstacles in daily life, education, and the workplace.

Innovative solutions have surfaced in response to these issues, utilizing technology to close the communication gap. A diverse, multilingual world will benefit from these applications' increased accessibility, comprehension, and inclusivity of sign language. These apps are redefining how sign language is incorporated into our global culture by utilizing cutting-edge advancements in natural language processing, computer vision, and machine learning.

Various techniques and methods for sign language recognition have been developed by different researchers. One example is the use of Recurrent Neural Networks (RNNs), which are commonly used for sign language recognition systems that rely on sequential data [1]. One of the most common types of RNN used for sign language recognition is the Long Short-Term Memory (LSTM), created to solve the vanishing gradient problem that can occur in traditional RNNs, where the gradient becomes too small to be useful during backpropagation, resulting in poor training and performance [1]. The study in [1], which used an LSTM model for a system that recognizes Indian sign language, reports a high-accuracy result.

Such a system uses a camera to sense the information conveyed by finger motions; this is the most commonly used vision-based method. Tremendous effort has gone into the development of vision-based sign recognition systems worldwide [8].

In recent years, there has been increasing interest in deep learning applied to various fields, and it has contributed to technological improvement [10]. There have recently been numerous studies in the field of sign language recognition that use deep learning to classify images or videos. We chose sign language recognition because it has both the characteristics of motion recognition and the characteristics of time-series language translation. Deep learning models that classify images have low complexity compared to models that classify videos [10].

A multilayer perceptron (MLP) is a deep artificial neural network in which the first layer, the input layer, receives the signal, and the last layer, the output layer, predicts the class of the input. Between these two layers lies an arbitrary number of hidden layers, which form the true computational engine of the MLP [7].
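As a concrete illustration (our own, not code from any surveyed paper), the following is a minimal sketch of such an MLP in Keras; the input size, hidden-layer widths, and class count are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 15  # assumed number of sign classes

# Input layer -> hidden layers (the computational engine) -> output layer.
mlp = keras.Sequential([
    layers.Input(shape=(63,)),             # assumed feature-vector size
    layers.Dense(128, activation="relu"),  # hidden layer 1
    layers.Dense(64, activation="relu"),   # hidden layer 2
    layers.Dense(NUM_CLASSES, activation="softmax"),  # predicts the class
])
mlp.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])
```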
Researchers have used several picture-capturing tools to classify photos, including cameras or webcams, data gloves, the Kinect, and Leap Motion controllers. In contrast to data-glove-based systems, a camera or webcam is the instrument most researchers employ, since it offers better and more natural interaction with no need for extra equipment. Data gloves have proven to be more accurate in data collection, despite being relatively pricey and cumbersome [4].

The system of [9] consists of three main modules: a feature extraction module, a processing module, and a classification module. The feature extraction module uses MMDetection to detect hand or body bounding boxes depending on the dataset's characteristics. If the dataset has full-body images, body bounding boxes are extracted; otherwise, hand bounding boxes are extracted from the hands-only dataset. The detected bounding boxes are then forwarded to HRNet, a CNN-based model, to determine the key points, which are normalized in the processing module. In addition, with the whole-body dataset, the hand bounding boxes are derived in the processing module from the farthest key-point to the left, the farthest key-point to the right, the farthest key-point at the top, and the farthest key-point at the bottom. The hand gestures are identified in the classification module, which takes the key points and hand bounding boxes as inputs [9].


A. MediaPipe

Google has created an open-source framework called MediaPipe that allows developers to build machine learning and computer vision pipelines for multimedia applications. It provides pre-built components and tools for processing, analyzing, and visualizing multimedia data. The framework's modular architecture enables the proponents to create pipelines for recognizing static and dynamic Filipino Sign Language gestures [1].

The first step is to generate a dataset, as there are no publicly available datasets for these words. Signs were captured from a webcam and the dataset was generated. The commonly used words in which only one hand represents a particular word are 'okay', 'yes', 'peace', 'thumbs up', 'call me', 'stop', 'live long', 'fist', 'smile', 'thumbs down', and 'rock'; the words that use both hands are 'alright', 'hello', 'good', and 'no'. After 2536 images were captured by a stable camera, they were converted into frames. The dataset images are divided into 75% for training and 25% for testing. The second step is to pass the video frames to the MediaPipe framework. Google's MediaPipe Hands is a solution for accurate hand and finger tracking. It uses machine learning (ML) to deduce 21 3D hand landmarks from a single frame. Various existing state-of-the-art approaches rely on desktop environments for inference, whereas this approach achieves real-time performance even on a mobile phone and scales to multiple hands [2].
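As an illustration of this step, the following minimal sketch (our own, not the surveyed authors' code) uses MediaPipe Hands to extract the 21 3D landmarks per detected hand from webcam frames; the confidence thresholds are assumed values.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # webcam capture, as in the dataset-generation step
with mp_hands.Hands(max_num_hands=2,
                    min_detection_confidence=0.5,  # assumed threshold
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # 21 (x, y, z) landmarks per hand -> a 63-value feature vector.
                features = [c for lm in hand.landmark
                            for c in (lm.x, lm.y, lm.z)]
cap.release()
```

Per-frame feature vectors like this can then be stacked into fixed-length sequences for the temporal models described next.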
B. LSTM

Long Short-Term Memory is a kind of recurrent neural network, designed by Hochreiter and Schmidhuber. It tackled the long-term dependency problem of RNNs, in which an RNN cannot recall information stored in long-term memory and only gives accurate predictions from recent information; as the gap length increases, an RNN no longer performs efficiently. An LSTM can by default retain information for a long period of time, and it is used for processing, predicting, and classifying on the basis of time-series data.

Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) specifically designed to handle sequential data such as time series, speech, and text. LSTM networks are capable of learning long-term dependencies in sequential data, which makes them well suited for tasks such as language translation, speech recognition, and time-series forecasting.

LSTMs can be employed to process sign language video sequences, which are essentially sequential data frames. Each video frame can be considered a time step, and the LSTM network can analyze the temporal patterns and dependencies between these frames. For example, it can capture the dynamic movement of hands and facial expressions in sign language gestures.

LSTMs can also be used for feature extraction from the video data. They can learn to represent important temporal features from the video frames, such as the trajectory of hand movements, handshapes, and the order of signs in a sentence. These features can then be used as input to the overall recognition model.

LSTMs can be employed to recognize sign language gestures as sequences. As a user signs a phrase or sentence, the LSTM network processes each video frame sequentially and maintains context. This allows it to make predictions in real time about the sign language signs being performed.

LSTMs are often used in combination with CNNs in a hybrid architecture. CNNs are suitable for processing static visual features in individual frames, while LSTMs excel at handling temporal sequences. The output of the CNNs can be fed as sequences into the LSTM network, allowing the model to consider both spatial and temporal information for sign language recognition.

The LSTM network is trained using labeled sign language data, where the sequences of video frames are associated with specific sign language signs or phrases. The LSTM learns to capture the dynamics and context needed for accurate recognition.

Once trained, the LSTM model can be used for real-time sign language recognition. It takes video frames as input and provides predictions on the signs being performed as the user signs. The model's ability to maintain context and consider temporal dependencies is particularly valuable for this application.
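Building on the landmark features above, the following is a minimal Keras sketch of such a sequence classifier; the 30-frame window, the 63-dimensional features, the layer widths, and the 15 classes are illustrative assumptions, not values taken from the surveyed systems.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, N_FEATURES, NUM_CLASSES = 30, 63, 15  # assumed shapes

# Stacked LSTMs read the landmark sequence frame by frame, keeping context.
model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.LSTM(64, return_sequences=True),  # pass full sequence onward
    layers.LSTM(128),                        # final state summarizes the clip
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```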
C. CNN

CNNs are primarily used for image feature extraction. In this system, each frame of the sign language video can be considered an image, and CNNs are employed to capture and analyze the spatial features within these images. They can identify handshapes, facial expressions, and the position of the hands in the frame. CNNs consist of convolutional layers that apply filters to the input images. These filters detect various patterns and features, such as edges, corners, and textures, in the sign language video frames, and the network learns to recognize the features most relevant for sign language recognition.

Pooling layers downsample the feature maps, reducing the spatial dimensions while preserving the essential features. This reduces computational complexity and enhances the model's invariance to small variations in hand positions or orientations.

The convolutional filters can be trained to identify key aspects of sign language gestures, such as the shapes made by the fingers and the positions of the hands relative to the face. This helps the CNN understand the visual characteristics of signs. A CNN often requires preprocessing techniques such as image resizing, normalization, and data augmentation. Preprocessing ensures that the input data is appropriately prepared for the network, and data augmentation techniques can increase the robustness of the model by introducing variations in the training data.

CNNs are often used in combination with Long Short-Term Memory (LSTM) networks in a hybrid architecture. While CNNs capture spatial features, LSTMs handle the temporal aspects of sign language gestures. The output of the CNNs can be passed as sequences to the LSTM network, allowing the model to consider both spatial and temporal information for recognition. The CNN model is trained using labeled sign language image data; it learns to recognize the important visual features and patterns associated with sign language gestures, and training includes optimization techniques and the adjustment of model parameters to minimize recognition errors.
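A minimal sketch of such a hybrid is shown below, assuming 30-frame clips of 64x64 RGB images (all shapes and layer sizes are assumptions): the same small CNN is applied to every frame via TimeDistributed, and an LSTM then aggregates the per-frame features over time.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, H, W, NUM_CLASSES = 30, 64, 64, 15  # assumed clip and image sizes

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 3)),
    # Spatial features, computed per frame by the shared CNN:
    layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),  # one feature vector per frame
    # Temporal aggregation of the frame features:
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```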


2. METHODOLOGY

In this study, during model training the LSTM network learns to tune the weights and biases of its gates and memory cells through backpropagation through time. Moreover, training is set to stop at the nth epoch, and any model whose loss is lower than the previous lowest is saved.
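A hedged sketch of this save-on-improvement rule in Keras (reusing the model from the sketches above; N_EPOCHS and the training arrays X_train and y_train are assumptions):

```python
from tensorflow import keras

N_EPOCHS = 200  # assumed value of the stopping epoch "n"

# Save the model only when its loss beats the previous lowest value.
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_model.keras",
    monitor="loss",
    save_best_only=True,
)
history = model.fit(X_train, y_train, epochs=N_EPOCHS, callbacks=[checkpoint])
```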
After the model training, the generated models are evaluated and visualized through a confusion matrix, with the accuracy score as the basis of the evaluation. TP corresponds to "True Positive", where the model's prediction is correct; FP corresponds to "False Positive", where the model's prediction is incorrect.

After the evaluation of the models, the model with the highest accuracy score is used as the final model. It then goes through inferencing to determine whether the model correctly predicts each sign language phrase.
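As an illustration of this evaluation step (using scikit-learn with made-up labels, not the actual experimental data), the diagonal of the confusion matrix holds each class's true positives, and accuracy is the fraction of correct predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical integer class ids for six test clips.
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

cm = confusion_matrix(y_true, y_pred)
# cm[i, i] counts class i's true positives (TP); an off-diagonal entry in
# column j is a prediction falsely assigned to class j (FP).
print(cm)
print("accuracy:", accuracy_score(y_true, y_pred))  # correct / total
```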
Table -1: SUMMARY OF RELATED WORK / GAP ANALYSIS

Ref No | Paper Name | Algorithm & Accuracy | Limitations
1 | Long Short-Term Memory-based Static and Dynamic Filipino Sign Language Recognition | LSTM neural network; accuracy 98% | The system is not able to recognize rare or complex gestures.
2 | Hand Gesture Based Sign Language Recognition Using Deep Learning | AlexNet classifier; accuracy 99% | It can only recognize fingerspelling, which is a subset of sign language.
3 | Deep Learning based Sign Language Recognition robust to Sensor Displacement | CNN, LSTM; accuracy 95.6% | It only works with a large amount of data.
4 | A Review of Segmentation and Recognition Techniques for Indian Sign Language using Machine Learning and Computer Vision | CNN | The current datasets used for ISL recognition and segmentation are limited in size and diversity.
5 | Speech to Sign Language Translation for Indian Languages | Wavelet-based MFCC & LSTM; accuracy 80% | Accuracy for voice input is very low, especially with a lot of noise.
6 | Sign Language to Text Conversion using Deep Learning | CNN; accuracy 99% | The model's accuracy is lower with a smaller dataset.
7 | Sign Language Translator using ML | KNN, decision tree classifier, neural network; accuracy 97% | The system does not take the context of the signs into account, so it may not correctly classify a sign performed in a different context.
8 | Sign Language to Speech Translation using ML | CNN; accuracy 90% | If the user places the sensor in a different location, the system's performance may degrade.
9 | An improved hand gesture recognition system using key-points and hand bounding boxes | CNN; accuracy 95% | The proposed method is computationally expensive, especially for large images.
10 | Dataset Transformation System for Sign Language Recognition Based on Image Classification Network | STmap, CNN-RNN; accuracy 99% | The system converts the skeleton data into an image (an STmap) before training the image classification network; this conversion may lead to some loss of information.


3. CONCLUSION AND FUTURE WORK

Sign language recognition has greatly benefited from advancements in machine learning, particularly in computer vision and natural language processing. Deep learning models such as convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks have demonstrated impressive results in recognizing signs accurately.

Sign language recognition technology is poised to make a significant impact on the lives of Deaf and hard of hearing individuals. This survey underscores the importance of ongoing research and collaboration between experts in machine learning, computer vision, and the Deaf community to drive innovation and make sign language recognition more accurate, accessible, and inclusive. One of the most important pieces of future work is to improve the response time and accuracy of the textual and speech outputs. With further advancements and increased awareness, we can look forward to a future in which communication barriers are significantly reduced for the Deaf community.
ACKNOWLEDGEMENT

In the present world of competition, there is a race for existence in which those who have the will to come forward succeed. A project is a bridge between theoretical and practical work, and with this willingness we joined this particular project. First of all, we would like to thank the supreme power, the Almighty God, who has always guided us to work on the right path of life. We sincerely thank Prof. R. H. Borhade, Head of the Department of Computer Engineering of Smt. Kashibai Navale College of Engineering, for all the facilities provided to us in the pursuit of this project.

We are indebted to our project guide, Prof. P. V. Bhaskare, Department of Computer Engineering of Smt. Kashibai Navale College of Engineering. It is a pleasure to be indebted to our guide for his valuable support, advice, and encouragement, and we thank him for his superb and constant guidance towards this project.

We are deeply grateful to all the staff members of the computer department for supporting us in all aspects. We acknowledge our deep sense of gratitude to our loving parents for being a constant source of inspiration and motivation.

REFERENCES

[1] C. L. L. Evangelista, C. J. R. Geli, and M. M. V. Castillo, "Long Short-Term Memory-based Static and Dynamic Filipino Sign Language Recognition," 2023.

[2] R. Kushwaha, G. Kaur, and M. Kumar, "Hand Gesture Based Sign Language Recognition Using Deep Learning," 2023.

[3] R. Gupta and R. M. Dadwal, "Deep Learning based Sign Language Recognition robust to Sensor Displacement," 2023.

[4] S. Kumari, E. Tarlue, A. Diallo, M. Chhabra, G. S. Mishra, and M. K. Goyal, "A Review of Segmentation and Recognition Techniques for Indian Sign Language using Machine Learning and Computer Vision," 2023.

[5] J. Peguda, V. S. S. Santosh, Y. Vijayalata, A. Deepa R N, and V. Mounish, "Speech to Sign Language Translation for Indian Languages," 2022.

[6] A. Bhat, V. Yadav, V. Dargan, and Yash, "Sign Language to Text Conversion using Deep Learning," 2022.

[7] J. Nirmala, "Sign language translator using machine learning," 2022.

[8] A. Swetha, V. Pooja, V. Vedavyas, C. D. V. N. S. Kiran, and S. Sravan, "Sign Language to Speech Translation using Machine Learning," 2022.

[9] T. L. Dang, S. D. Tran, T. H. Nguyen, S. Kim, and N. Monet, "An improved hand gesture recognition system using keypoints and hand bounding boxes," 2022.

[10] S.-G. Choi, Y. Park, and C.-B. Sohn, "Dataset Transformation System for Sign Language Recognition Based on Image Classification Network," 2022.

[11] E. B. Villagomez, R. A. King, M. J. Ordinario, J. Lazaro, and J. F. Villaverde, "Hand Gesture Recognition for Deaf-Mute using Fuzzy-Neural Network," 2019 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2019, pp. 30-33, doi: 10.1109/ICCEAsia46551.2019.8942220.

[12] G. K. R. Madrid, R. G. R. Villanueva, and M. V. C. Caya, "Recognition of Dynamic Filipino Sign Language using MediaPipe and Long Short-Term Memory," 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2022, pp. 1-6, doi: 10.1109/ICCCNT54827.2022.9984599.

[13] M. B. D. Jarabese, C. S. Marzan, J. Q. Boado, R. R. M. F. Lopez, L. G. B. Ofiana, and K. J. P. Pilarca, "Sign to Speech Convolutional Neural Network-Based Filipino Sign Language Hand Gesture Recognition System," 2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), Rome, Italy, 2021, pp. 147-153, doi: 10.1109/ISCSIC54682.2021.00036.

[14] K. E. Oliva, L. L. Ortaliz, M. A. Tobias, and L. Vea, "Filipino Sign Language Recognition for Beginners using Kinect," 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Baguio City, Philippines, 2018, pp. 1-6, doi: 10.1109/HNICEM.2018.8666346.

[15] M. Allen Cabutaje, K. Ang Brondial, A. Franchesca Obillo, M. Abisado, S. Lor Huyo-a, and G. Avelino Sampedro, "Ano Raw: A Deep Learning Based Approach to Transliterating the Filipino Sign Language," 2023 International Conference on Electronics, Information, and Communication (ICEIC), Singapore, 2023, pp. 1-6, doi: 10.1109/ICEIC57457.2023.10049890.

[16] A. S. M. Miah, J. Shin, M. A. M. Hasan, and M. A. Rahim, "BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network," Applied Sciences, vol. 12, no. 8, p. 3933, Apr. 2022, doi: 10.3390/APP12083933.

[17] D. Li, X. Yu, C. Xu, L. Petersson, and H. Li, "Transferring Cross-Domain Knowledge for Video Sign Language Recognition," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 6204-6213, 2020, doi: 10.1109/CVPR42600.2020.00624.

[18] L. Pigou, A. van den Oord, S. Dieleman, M. van Herreweghe, and J. Dambre, "Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video," Int J Comput Vis, vol. 126, no. 2-4, pp. 430-439, Apr. 2018, doi: 10.1007/S11263-016-0957-7.

[19] S. Adhikary, A. K. Talukdar, and K. Kumar Sarma, "A Vision-based System for Recognition of Words used in Indian Sign Language Using MediaPipe," Proceedings of the IEEE International Conference on Image Information Processing, vol. 2021-November, pp. 390-394, 2021, doi: 10.1109/ICIIP53038.2021.9702551.

[20] O. Nafea, W. Abdul, G. Muhammad, and M. Alsulaiman, "Sensor-based human activity recognition with spatio-temporal deep learning," Sensors, vol. 21, no. 6, p. 2141, 2021.
