Proceedings of the 7th International Conference on Trends in Electronics and Informatics (ICOEI 2023)

IEEE Xplore Part Number: CFP23J32-ART; ISBN: 979-8-3503-9728-4

Sign Language Translation in WebRTC Application


Gangadhar Chakali 1, Ch. Govardhan Reddy 2, Dr. B. Bharathi 3
1, 2, 3 Department of CSE, Sathyabama Institute of Science and Technology, Chennai, India
1 [email protected], 2 chinthalacheruvugovardhanreddy@gmail.com, 3 [email protected]

Abstract—Communication has been an essential part of human life. Different languages are present around the world for communication. Still, people who have lost their hearing and speaking ability through accidents or genetic conditions at birth often face difficulty in communication. Auditory-impaired people have found sign language helpful in communicating with others. Hearing-impaired people must otherwise have a personal interpreter available for translations whenever they need to communicate, and they find it challenging to interact with others on social media and the internet to form new relationships on their own. An open-source video-conferencing application that can translate sign language is therefore quite helpful for the hearing impaired. Sign language recognition (SLR) has drawn a lot of attention as a way to close this enormous communication gap. However, sign language is far more complex and unpredictable than other activities, making reliable recognition challenging. A Speech-to-Text API enables speech-impaired people who can read to comprehend others, while the proposed Sign Language Translation Application (SLTA) allows them to communicate by translating their sign language into text that others can understand. The proposed method uses Python, the MediaPipe Framework for gesture data extraction, and a Deep Gesture Recognition (DGR) model to identify sign motion in real time. It achieves its highest accuracy of 98.81% using a neural network comprised of Long Short-Term Memory units for sequence identification.

Keywords—American Sign Language (ASL), Neural Networks, Deep Learning, Web Real-Time Communication (WebRTC), Long Short-Term Memory (LSTM), Gesture Recognition, Sign Language Translation.
I. INTRODUCTION

Human beings communicate with each other through different languages and expressions. The first language evolved around 50,000 to 150,000 years ago, and since then humans have depended on voice communication to understand each other better. Not everyone is able to speak and hear in order to communicate and understand others. Many people around the globe suffer from listening and speaking problems and are referred to as deaf and mute or hearing impaired. Approximately 18 million Indians and 300 million people around the world suffer from auditory impairment. It is hard to communicate with these people because of their disabilities, and hearing-impaired people find it challenging to form new connections and relationships. These people communicate with each other through sign language. Sign language has wide varieties like American Sign Language (ASL) [20], Chinese Sign Language (CSL), Arabic Sign Language, etc., of which American Sign Language (ASL) is the most recognized and widely used across different countries. Even with sign language, there is a huge gap in communication with others, because everyone would need to know sign language to interpret, understand, and communicate with people suffering from an auditory impairment. Creating an interface that translates this sign language to the preferred language and vice versa will minimize the communication gap. This study attempts to develop a system that helps such users communicate through a video-conferencing application [7][15]. For developing the Sign Language Recognition (SLR) system, this study uses deep learning techniques to collect, train, and test data with open-source tools like TensorFlow, Keras, NumPy, the MediaPipe Framework, OpenCV, etc.

Fig. 1.1. Alpha-Numeric Hand Gestures of American Sign Language. Source: [24]

The first part of this paper explains the usage of sign language, how open-source platforms use SLR services, the technology used, and the studies that help in understanding SLR model development, along with the inferences drawn from the literature review. The second part explains the definitions and analysis of the requirements essential for developing the model, the software integration tools used in this study, and the Software Requirements Specification (SRS). The third part explains the proposed model and describes the software implementation and development process, including the model's description, performance, test results, and a system management plan to fulfil the system requirements.

II. RELATED WORK

Researchers and computer scientists have researched this problem statement in great detail over the past few years to find a solution, and their answers range from examining numerous patterns of gesture recognition techniques to the analysis of different sign languages and different styles of data collection.

The study by Yeresime Suresh [21] suggested a method that employs Canny Edge detection, which provides a more accurate result in detecting the edges of hand symbols in the frames. Compared to using embedded sensors in gloves to collect data for gesture identification, as seen in [3], the suggested system also uses a Convolutional Neural Network (CNN) prediction model for the transcription of speech and of the hand symbols shown in figure 1.1. The combination of CNN and Canny Edge Detection provides reliable and accurate results when trained with a large dataset.

The feasibility of using wearable sensor-based devices to recognize hand movements in an application directly connected to sign language was investigated by Karly Kudrinko [14] in her work. Her review aims to identify trends and best approaches by examining earlier studies, and it also points out the difficulties and gaps the sensor-based SLR field is experiencing. Such analysis could help create better SLR [17] systems that can be applied in real-world situations without depending on sensor-based wearables. A standardized data collection protocol and evaluation process might also be developed for the field as a result of looking at diverse study methodologies.

Mathieu De Coster's [13] study tested a variety of neural network topologies, including Hidden Markov Models (HMM), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), to improve the continuous SLR model. OpenPose is used as a feature extractor to gather the skeleton motion of gestures. Since SLR [17] relies on hand form, location, orientation, and non-manual components like mouth shape, OpenPose is the sole full-body pose estimation methodology used to estimate gesture action; other pose estimation methods identify only, for instance, body key points or hand key points (Fang et al., 2017; Mueller et al., 2018). He retrieved data using the OpenPose Framework, then trained and developed the model.

In a study by Bhushan Bhokse, he created a program for gesture recognition that allows a user to demonstrate his hand making a particular motion in front of a video camera connected to a computer. The computer program must gather images of his moves, examine them, and recognize the sign. It was decided that identification would include counting the user's fingers and identifying the American Sign Language used in the input image to make the system more manageable [12]. He performed experiments using static images on simple backgrounds, retrieved the pictures as grayscale images, and used the binary data from the images for gesture detection.

Sushmita Mitra described gestures, like those used in sign languages, including their static and dynamic components, in a study conducted by her [16]. She additionally discussed how various body motions, including those made with the hands, arms, head, and torso, are used in sign languages. For the evaluation, she conducted tests on different face, hand, and body movement-based algorithms for sign language recognition, and improved accuracy in those studies by fusing Hidden Markov Models (HMM) and finite state machines (FSM) in hybridization.

Zhibo Wang's study [19] covered past works and system trials on various SLRs with wearables that have embedded sensors, such as RF-based [5], PPG-based [9], Acoustic-based [6], Sensing Gloves [8], Vision-based [2][4], EMG-based [9][10][3], and SignSpeaker [1]. These tests are contrasted with his DeepSLR work, which is based on a multi-channel CNN architecture and produces findings that take less than 1.1 seconds to detect signals and recognize a sentence with four sign words, demonstrating DeepSLR's recognition effectiveness and real-time capability in practical situations. The average word error rate for continuous sentence recognition is 10.8%.

In this proposed system, the problems in the existing systems will be addressed, such as the model's prediction dependence on sensor gloves and on static signs to predict the words, as shown in figure 1.1. This system can remove the dependence on sensor gloves by using an intelligent framework that can extract key points of the gesture skeleton structure using a digital camera commonly embedded in any personal computing (PC) device. Extraction of gesture holistic (skeletal) data is possible with the MediaPipe Framework developed by Google, which can map the skeleton of the human body and extract the coordinates of those key point areas for gesture recognition. Combined over multiple frames, this creates a motion sequence that makes up a word in sign language. From the literature survey, LSTM neural network architectures could perform better for motion recognition. The LSTM algorithm is popularly known for the memory units forming its neural network nodes, which can remember the output of previous data predictions. The memory gate in the LSTM makes it a significant algorithm for SLR applications, where most of ASL consists of motion gestures. Using the DGR model with LSTM architecture solves the second problem, the barrier of using only static images to predict signs, by expanding the vocabulary and making the model more versatile and usable in different conditions.

Using the NLP model, these predicted words can be converted into sentences. The NLP model will add meaning and send it to the WebRTC transmission stream to communicate seamlessly by displaying sentences as captions in the interface. The Speech-to-Text model can also help the deaf to read and understand by converting the audio of speech into text.

III. PROPOSED SYSTEM

The proposed system, the Sign Language Translation Application (SLTA), helps people who face challenges with communication by translating American Sign Language (ASL) for those who are not exposed to sign language communication. SLTA is a video-conferencing application that makes use of WebRTC protocols for real-time two-way video and audio communication. It is similar to Google Meet, Zoom, and Microsoft Teams, but embedded with the DGR and NLP models that help in SLR [17]. Development of such a system comes with specific challenges like creating visual motion gestures, building the vocabulary of a language, training a neural network to accommodate a vast vocabulary for prediction, and making a user-friendly interface for the ML translator model.

SLTA can capture hand gestures with the help of the Python libraries OpenCV and MediaPipe, via video frames from the user's camera on a PC. It detects the positions of the hand, palm, torso, and face as spatial positioning landmarks of the person, with 21 points for each palm, which are pixel coordinates that help to predict the sign more precisely, as shown in figure 3.2. [18]
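As a rough illustration of this capture pipeline (not the authors' exact code), the following Python sketch assumes the opencv-python and mediapipe packages and uses the MediaPipe Holistic solution to read webcam frames with OpenCV and obtain the 21-point hand landmarks together with pose and face landmarks:

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

# Open the default PC camera (device 0) as the video source.
cap = cv2.VideoCapture(0)

with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB images, while OpenCV delivers BGR frames.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Each detected hand exposes 21 landmarks with x, y, z coordinates.
        if results.right_hand_landmarks:
            print(len(results.right_hand_landmarks.landmark))  # 21

cap.release()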

Fig. 3.1. System Architecture of the Proposed System SLTA

Fig. 3.2. Hand Landmarks detected by MediaPipe. Source: [22]

The positions of the face, hand, and torso are generally involved in ASL, where movements of different combinations of finger gestures separate each word in the vocabulary. In a gesture, the positions of the face, body, and torso are important, as they create unique vocabulary relating to wider sign gesture actions. The MediaPipe Framework can also detect the 33 points of the pose holistic landmarks (coordinates) of the full body, as shown in figure 3.3. [18] Most of the ASL vocabulary doesn't involve the legs and hips, so only the first 22 points shown in figure 3.3 are used to record the dataset.

Fig. 3.3. Pose Full Body Landmarks detected by MediaPipe. Source: [23]

ASL follows the subject-verb-object (SVO) order, meaning sentences are communicated through sign language without breaks. Phrases intended to communicate are understood with the help of the topic of discussion and the verbs signed in relation to that topic, through the cognitive perception of humans. Sign language gestures can be converted to words using the Deep Gesture Recognition (DGR) model, a module in SLTA developed using a neural network consisting of LSTM units. Previous work has shown the limitations of the Convolutional Neural Network (CNN), which can only predict static sign gestures [20]. Another study on motion gesture recognition employed wearables [8][11] embedded with sensors that collected spatial 3D data of the hand motion of gestures with acceptable performance using LSTM neural networks; still, it is challenging to deploy economical gloves that require little maintenance. The DGR model instead accepts data from a digital front camera, commonly found in personal devices, as its primary video source. Data from the digital camera is pipelined to the SLTA via the OpenCV module, which helps to manage the camera data and manipulate it according to the requirement. The MediaPipe Framework [18] performs gesture recognition using Machine Learning (ML) and Deep Learning (DL) models to detect the gesture holistic data of the person at every frame of the gesture video, and the motion gesture is predicted with an LSTM neural network trained on ASL.

The DGR model can detect sign language and predict the vocabulary in keyword form, like "EAT," "DRINK," "HELP," "HELLO," and "THANK YOU," but it can't predict the tense of the keywords, like "EATING," "DRANK," or "HELPED," and it also can't predict articles and prepositions in the English language. A Natural Language Processing (NLP) model can analyze and predict sentences with grammar and reconstruct sentences with the meaning the user meant to communicate. Sentences that are predicted and reconstructed by the NLP model are displayed as closed captions (CC) in the SLTA application.

Hearing-impaired people learn the words and sentences of a language together with sign language as they progress in communication. The context of the person on the other side of the SLTA call can be understood by the hearing impaired using the speech-to-text model, which converts the speaker's speech to live captions, making the context easy to read. Developing a more user-friendly interface and additional features using the Web Real-Time Communication (WebRTC) method and software engineering techniques is possible future work.

IV. RESULTS AND DISCUSSIONS

This section gives a brief discussion of the results of the experiments and analyses conducted on the proposed system. Sign language translation requires processing the video frames into sequence data of 30 frames per video for each gesture captured by the system, which forms one data point in the dataset. The thirty frames of each gesture data point hold the gesture data in the form of the positional coordinates used by MediaPipe, representing the gesture holistic of one frame. The sequence of frames creates the motion of the skeletal structure, making it a motion gesture detected by the MediaPipe Framework. These data points are ASL sign gestures that are mapped to words. Based on the vocabulary size, an LSTM architecture is built to process the 30 frames of sequence gesture data from the live video stream and detect the motion.

A. Conversion of video frames to sequence data

Using OpenCV, the streamed data is pipelined to the application, where the gesture skeletal structure data is detected from the video frames with the help of the MediaPipe Framework, as shown in figure 4.1. MediaPipe [18] uses ML and DL models that detect the skeletal structure in real time and return collection arrays for each frame as output, which together are considered one point in the ASL dataset of motion gestures.
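A hedged sketch of this frame-to-sequence conversion is given below; the helper name extract_keypoints is illustrative, and the per-part sizes follow the landmark counts reported later in this section (132 pose, 1404 face, 63 per hand). Missing parts are filled with zeros so every frame yields a fixed-length vector, and thirty such vectors form one motion-gesture data point.

import numpy as np

def extract_keypoints(results):
    """Flatten one frame of MediaPipe Holistic output into a 1D vector."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    # 132 + 1404 + 63 + 63 = 1662 values per frame, ordered pose-face-left-right.
    return np.concatenate([pose, face, lh, rh])

# One data point is a sequence of 30 consecutive frames:
# np.array([extract_keypoints(r) for r in last_30_results]) has shape (30, 1662).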

"HELP," and "PLEASE" along with the word "NONE," also


added to the vocabulary. The word "NONE" means the user
does not perform a sign gesture. The dataset is developed with
the MediaPipe Framework, OpenCV, and other libraries.
MediaPipe uses built-in libraries to extract the landmarks of
the gesture data.
DGR model architecture comprises six layers where the
first three layers are LSTM layers, and the remaining three
Fig. 4.1. Live Holistic Landmarks detected by MediaPipe layers are fully connected layers. DGR model takes 30 frames
of gesture data as input and predicts seven words as output,
The ASL dataset for training ML models is built using the
including the word "None." The detailed structure of the DGR
OpenCV and MediaPipe framework. Each word in the
model is shown in figure 4.3.
vocabulary is a gesture in ASL performed in front of a digital
camera and pipelined to a data collection module that converts
video frames to sequence data. The converted sequence data
of each frame in 30 frames in motion gesture is stored in a
folder where; the folder is considered as one data point
representing the word. An independent executable file (.exe)
is developed based on the python language and makes use of
the 'auto-py-to-exe' library. It can run on any windows
machine without installing any python libraries or
frameworks. By making this system open-source, users can
train their sign gestures with data collector executable files to
work according to the ASL, allowing for a vast vocabulary.
The ASL dataset is used to experiment with the LSTM
architecture and parameters of Neural Networks to develop a
reliable SLR [17] system.
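A minimal sketch of how the data-collection module might lay out these folders is shown below; the root directory name, vocabulary list, and counts (30 sequences per word, 30 frames per sequence) are illustrative assumptions rather than the authors' exact configuration.

import os
import numpy as np

DATA_PATH = "ASL_Data"                 # assumed root folder of the dataset
words = ["HELLO", "EAT", "THANKYOU"]   # one sub-folder per vocabulary word
num_sequences, num_frames = 30, 30

for word in words:
    for seq in range(num_sequences):
        os.makedirs(os.path.join(DATA_PATH, word, str(seq)), exist_ok=True)

# Inside the capture loop, the flattened keypoints of each frame are saved so
# that one folder of 30 .npy files represents a single motion-gesture data point:
#   np.save(os.path.join(DATA_PATH, word, str(seq), f"{frame_num}.npy"), keypoints)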
B. Data Pre-processing

Data returned from the MediaPipe model is in the form of an object variable. The pose, face, left-hand, and right-hand (PFLR) landmarks are extracted from the MediaPipe objects as x, y, z, and visibility variables, where the visibility variable exists only for pose landmarks. The extracted PFLR landmarks are two-dimensional (2D) arrays that are flattened per category and arranged in the order pose, face, left hand, right hand. So, every frame in one motion-gesture data point is a 1D PFLR sequence, and 30 frames give 30 arrays stored in one folder on the local drive, making one data point.

Fig. 4.2. Array representation of the training dataset

The ASL datasets stored in folders are retrieved and combined into a 2D array from the file manager to make a data structure to feed the DGR model for training, as shown in figure 4.2. Based on the label of the top folder, the training labels are generated for neural network training.
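A minimal sketch of turning the stored folders into training tensors follows, assuming the folder layout sketched earlier; the label of each data point is derived from its top-level word folder, matching the description above.

import os
import numpy as np
from tensorflow.keras.utils import to_categorical

DATA_PATH = "ASL_Data"                          # assumed dataset root
words = sorted(os.listdir(DATA_PATH))           # folder names act as labels
label_map = {word: idx for idx, word in enumerate(words)}

sequences, labels = [], []
for word in words:
    for seq in os.listdir(os.path.join(DATA_PATH, word)):
        frames = [np.load(os.path.join(DATA_PATH, word, seq, f"{i}.npy"))
                  for i in range(30)]            # 30 frames per data point
        sequences.append(frames)
        labels.append(label_map[word])

X = np.array(sequences)      # shape: (number of data points, 30, 1662)
y = to_categorical(labels)   # one-hot labels for the softmax output layer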

C. Building the LSTM architectural Neural Network

The ASL dataset used to train the DGR model comprises six words, "HELLO," "EAT," "THANK YOU," "EXCUSE ME," "HELP," and "PLEASE," along with the word "NONE," which is also added to the vocabulary. The word "NONE" means the user is not performing a sign gesture. The dataset is developed with the MediaPipe Framework, OpenCV, and other libraries; MediaPipe uses its built-in libraries to extract the landmarks of the gesture data.

The DGR model architecture comprises six layers, where the first three layers are LSTM layers and the remaining three layers are fully connected layers. The DGR model takes 30 frames of gesture data as input and predicts seven words as output, including the word "NONE." The detailed structure of the DGR model is shown in figure 4.3.

Fig. 4.3. DGR Model Architecture with LSTM layers

The presented training and testing results of the LSTM architecture are based on the 360 data points of the dataset given for training the LSTM neural network, consisting of the words mentioned above. As shown in figure 4.4, various hyperparameters of the LSTM architecture are experimented with and tested to develop the model, which achieves an accuracy of 98.81%.

Fig. 4.4. Training results of the DGR Model

D. Sign Gesture Prediction and results

The MediaPipe Framework extracts the gesture data from the real-time video stream of a digital camera pipelined with OpenCV to the application. It captures the pose, face, left-hand, and right-hand coordinates from each video frame and stores them as sequence data. This gesture data is continuously given to the DGR model, which considers 30 frames of data for the prediction of a single gesture. Each frame sequence consists of 132 pose landmark values, 1404 face landmark values, 63 left-hand values, and 63 right-hand values, giving 1662 values of sequence data in the whole data point. The results of the DGR model, with a confusion matrix, accuracy, and loss, are shown in figure 4.5.

Fig. 4.5. Confusion Matrix, Accuracy and Loss of the DGR Model

The DGR model with LSTM architecture predicts the gesture from the given data after pre-processing and conversion of the live video data into the above-mentioned data structure. Each gesture is predicted based on the probability of the gesture performed over every sequence of 30 frames. The predicted word is the one with the highest probability in the vocabulary, provided that it crosses the threshold probability of 0.9. Each predicted word is given to the display output, as shown in figure 4.6.

Fig. 4.6. Prediction of ASL with the DGR model
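The prediction logic described above, keeping only the latest 30 frames and emitting a word only when its probability crosses the 0.9 threshold, might look like the following sketch; the variable and function names are illustrative, and it reuses extract_keypoints, model, and words from the earlier sketches.

import numpy as np

THRESHOLD = 0.9
sequence, sentence = [], []

def on_new_frame(results):
    """Update the sliding window and emit a caption word when confident."""
    global sequence
    sequence.append(extract_keypoints(results))
    sequence = sequence[-30:]                       # keep the latest 30 frames
    if len(sequence) == 30:
        probs = model.predict(np.expand_dims(sequence, axis=0))[0]
        best = int(np.argmax(probs))
        # Emit the word only when the model is confident and a sign is present.
        if probs[best] > THRESHOLD and words[best] != "NONE":
            if not sentence or sentence[-1] != words[best]:
                sentence.append(words[best])        # displayed as closed captions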
V. CONCLUSION

Sign language has been the primary way of communication for hearing- and speech-impaired people. Most people aren't aware of sign language, especially American Sign Language, and often find it difficult to understand disabled people. The Sign Language Translation Application (SLTA) aims to bridge the communication barrier between disabled and abled persons using sign language recognition models. SLTA captures signed gestures with a digital camera and extracts the gesture data using the MediaPipe Framework. With the help of the DGR model developed with LSTM architecture, it can predict the dynamic motion gestures performed by the person in front of the camera in real time. An accuracy of 98.81% has been achieved with a seven-word vocabulary and 252 test samples. The proposed system also overcomes the existing problems of static gesture recognition by predicting motion gestures with the LSTM architecture and replacing the usage of sensor-based gloves and wearables for gesture motion data collection.

Using WebRTC protocols, this application can be implemented as video conferencing for speech- and hearing-impaired people. Further development and refinement of this system, including the expansion of the ASL vocabulary, the development of the NLP model that turns predicted words into sentences, and the implementation of the WebRTC protocol for video-conferencing communication, can be considered for future work.

REFERENCES

[1] J. Hou, X.-Y. Li, P. Zhu, Z. Wang, Y. Wang, J. Qian, and P. Yang, "Signspeaker: A real-time, high-precision smartwatch-based sign language translator," in Proc. of ACM MobiCom, 2019.
[2] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, "Video-based sign language recognition without temporal segmentation," arXiv preprint arXiv:1801.10111, 2018.
[3] J. Wu, Z. Tian, L. Sun, L. Estevez, and R. Jafari, "Real-time American sign language recognition using wrist-worn motion and surface EMG sensors," in Proc. of IEEE BSN, 2015, pp. 1–6.
[4] J. Zang, L. Wang, Z. Liu, Q. Zhang, G. Hua, and N. Zheng, "Attention-based temporal weighted convolutional neural network for action recognition," in Proc. of IFIP INTERACT, 2018, pp. 97–108.
[5] J. Zhang, J. Tao, and Z. Shi, "Doppler-radar based hand gesture recognition system using convolutional neural networks," in International Conference in Communications, Signal Processing, and Systems. Springer, 2017, pp. 1096–1113.
[6] R. Nandakumar, V. Iyer, D. Tan, and S. Gollakota, "FingerIO: Using active sonar for fine-grained finger tracking," in Proc. of ACM CHI, 2016, pp. 1515–1525.
[7] R. Julian Menezes, J. Albert Mayan, and M. Breezely George, "Development of a Functionality Testing Tool for Windows Phones," Indian Journal of Science and Technology, vol. 8, no. 22, pp. 1-7, September 2015.
[8] T. T. Swee, A. Ariff, S.-H. Salleh, S. K. Seng, and L. S. Huat, "Wireless data gloves Malay sign language recognition system," in Information, Communications & Signal Processing, 2007 6th International Conference on. IEEE, 2007, pp. 1–4.
[9] T. Zhao, J. Liu, Y. Wang, H. Liu, and Y. Chen, "PPG-based finger level gesture recognition leveraging wearables," in Proc. of IEEE INFOCOM, 2018, pp. 1457–1465.
[10] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Wang, and J. Yang, "A framework for hand gesture recognition based on accelerometer and EMG sensors," IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 41, no. 6, pp. 1064–1076, 2011.
[11] Z. Lu, X. Chen, Q. Li, X. Zhang, and P. Zhou, "A Hand Gesture Recognition Framework and Wearable Gesture-Based Interaction Prototype for Mobile Devices," IEEE Transactions on Human-Machine Systems, vol. 44, no. 2, pp. 293-299, April 2014, doi: 10.1109/THMS.2014.2302794.
[12] B. Bhokse, "Hand gesture recognition using a neural network," IJISET - International Journal of Innovative Science, Engineering & Technology, 2015. Retrieved October 24, 2022, from https://www.ijiset.com/vol2/v2s1/IJISET_V2_I1_01.pdf

[13] M. De Coster and J. Dambre, "Sign language recognition with transformer networks." Retrieved October 24, 2022, from http://hdl.handle.net/1854/LU-8660743
[14] K. Kudrinko, E. Flavin, X. Zhu, and Q. Li, "Wearable Sensor-Based Sign Language Recognition: A Comprehensive Review," IEEE Reviews in Biomedical Engineering, vol. 14, pp. 82-97, 2021, doi: 10.1109/RBME.2020.3019769.
[15] Asha Pandian, Bharathi B, Albert Mayan J, Prem Jacob, and Pravin, "A Comprehensive View of Scheduling Algorithms for MapReduce Framework in Hadoop," Journal of Computational and Theoretical Nanoscience, vol. 16, no. 8, pp. 3582-3586, 2019.
[16] S. Mitra, "Gesture recognition: A survey," IEEE Xplore, 2007. Retrieved October 24, 2022, from https://ieeexplore.ieee.org/document/4154947
[17] Razieh Rastgoo, Kourosh Kiani, and Sergio Escalera, "Sign Language Recognition: A Deep Survey," Expert Systems with Applications, vol. 164, 2021, 113794, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.113794.
[18] Camillo Lugaresi et al., "MediaPipe: A Framework for Building Perception Pipelines," arXiv, June 14, 2019. https://arxiv.org/abs/1906.08172
[19] Z. Wang, T. Zhao, J. Ma, H. Chen, K. Liu, H. Shao, Q. Wang, and J. Ren, "Hear sign language: A real-time end-to-end sign language recognition system," IEEE Transactions on Mobile Computing, 2020. https://doi.org/10.1109/tmc.2020.3038303
[20] Wikimedia Foundation, "American Sign Language," Wikipedia, October 23, 2022. Retrieved October 24, 2022, from https://en.wikipedia.org/wiki/American_Sign_Language
[21] Y. Suresh, J. Vaishnavi, M. Vindhya, M. S. A. Meeran, and S. Vemala, "MUDRAKSHARA - A Voice for Deaf/Dumb People," 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020, pp. 1-8, doi: 10.1109/ICCCNT49239.2020.9225656.
[22] V. Bazarevsky and F. Zhang, "Hands: On-Device, Real-Time Hand Tracking with MediaPipe," August 19, 2019. Retrieved February 10, 2023, from https://google.github.io/mediapipe/solutions/hands.html
[23] V. Bazarevsky and I. Grishchenko, "Pose: On-device, Real-time Body Pose Tracking with MediaPipe BlazePose," August 13, 2020. Retrieved February 10, 2023, from https://google.github.io/mediapipe/solutions/pose.html
[24] C. Teak-Wei and L. Boon Giin, "The 26 letters and 10 digits of American Sign Language (ASL)," in American Sign Language Recognition Using Leap Motion Controller with Machine Learning Approach, October 2018. Retrieved February 10, 2023, from https://www.researchgate.net/figure/The-26-letters-and-10-digits-of-American-Sign-Language-ASL_fig1_328396430
