Sign Language Detection

Abstract—Communication has been an essential part of human life. Different languages are present around the world for communication. Still, people who have lost their hearing and speaking ability through accidents or genetic conditions often face difficulty in communication. Auditory-impaired people have sign language as their primary way of communication, and translating sign language to the preferred language and vice versa will minimize the communication gap. This study attempts to develop a system that helps people communicate through a video-conferencing application [7][15].
The study by Yeresime Suresh [21] suggested a method that employs Canny Edge detection, which provides a more accurate result in detecting the edges of hand symbols in the frames. Compared to using embedded sensors in gloves for the identification of gestures, as seen in [3], the suggested system also uses a Convolutional Neural Network (CNN) prediction model for the transcription of speech and hand symbols, as shown in figure 1.1. The combination of CNN and Canny Edge Detection provides reliable and accurate results when trained with a large dataset.
The feasibility of using wearable sensor-based devices to recognize hand movements in an application directly connected to sign language was investigated by Karly Kudrinko [14] in her work. Her review aims to identify trends and best approaches by examining earlier studies. The review also points out the difficulties and gaps the sensor-based SLR field is experiencing. This analysis could help create better SLR [17] systems that can be applied in real-world situations without depending on sensor-based wearables. A standardized data collection protocol and evaluation process might also be developed for the field as a result of examining diverse study methodologies.
Mathieu De Coster's [13] study tested a variety of neural network topologies, including Hidden Markov Models (HMM), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), to improve the continuous SLR model. OpenPose is used as a feature extractor to gather the skeleton motion of gestures. Since SLR [17] relies on hand shape, location, orientation, and non-manual components like mouth shape, OpenPose is the sole full-body pose estimation methodology used to estimate gesture action. Other pose estimation methods exist, but they only identify, for instance, body key points (Fang et al., 2017) or hand key points (Mueller et al., 2018). He retrieved data using the OpenPose Framework and then trained and developed the model.
In a study by Bhushan Bhokse, a program was created for gesture recognition that allows a user to demonstrate a hand making a particular motion in front of a video camera connected to a computer. The computer program must gather images of the movements, examine them, and recognize the sign. To make the system more manageable, it was decided that identification would include counting the user's fingers and identifying the American Sign Language they use in the input image [12]. He performed experiments using static images on simple backgrounds, retrieved the pictures as grayscale images, and used the binary data from the images for gesture detection.
Sushmita Mitra described gestures, like those used in sign languages, including their static and dynamic components, in a study conducted by her [16]. She also discussed how various body motions, including those made with the hands, arms, head, and torso, are used in sign languages. For the evaluation, she conducted tests on different face, hand, and body movement-based algorithms for sign language recognition. She improved accuracy in those studies by fusing Hidden Markov Models (HMM) and finite state machines (FSM) in hybridization.
Zhibo Wang's study [19] covered past works and system trials on various SLRs with wearables that have embedded sensors, such as RF-based [5], PPG-based [9], acoustic-based [6], sensing gloves [8], vision-based [2][4], EMG-based [9][10][3], and SignSpeaker [1]. These tests are contrasted with his DeepSLR work, which is based on a multi-channel CNN architecture and takes less than 1.1 seconds to detect signals and recognize a sentence with four sign words, demonstrating DeepSLR's recognition effectiveness and real-time capability in practical situations. The average word error rate for continuous sentence recognition is 10.8%.

In this proposed system, the problems found in the existing systems will be solved, such as the prediction model's dependence on sensor gloves and on static signs to predict the words, as shown in figure 1.1. This system can reduce the dependence on sensor gloves by using an intelligent framework that extracts key points of the gesture skeleton structure using a digital camera commonly embedded in any personal computing (PC) device. Extraction of holistic gesture (skeletal) data is possible with the MediaPipe Framework developed by Google, which can map the skeleton of the human body and extract the coordinates of those key point areas for gesture recognition. Combined across multiple frames, these key points form a motion sequence that makes up a word in sign language. From the literature survey, LSTM-based neural network architectures could perform better for motion recognition. The LSTM algorithm is popularly known for the memory units present in its neural network nodes, which can remember the output of previous data predictions. The memory gate in the LSTM makes it a significant algorithm for SLR applications, since most of ASL consists of motion gestures. Using the DGR model with LSTM architecture solves the second problem: the barrier of using only static images to predict signs. It means expanding the vocabulary and making the model more versatile and usable in different conditions.

Using the NLP model, these predicted words can be converted into sentences. The NLP model will add meaning and send the result to the WebRTC transmission stream so that users can communicate seamlessly, with the sentences displayed as captions in the interface. The Speech-to-Text model can also help the deaf to read and understand by converting the audio of speech to text.
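The paper names WebRTC as the transport but does not detail how captions travel over it. As a rough illustration only, the sketch below uses the aiortc library (an assumption; the paper only names WebRTC) to open a data channel called "captions" and push each NLP-formed sentence to the remote peer. The function names and the callback passed in are hypothetical, and offer/answer signaling with the remote client is omitted.

```python
import asyncio
from aiortc import RTCPeerConnection


async def run_caption_channel(get_next_sentence):
    """Open a WebRTC data channel and stream predicted sentences over it.

    `get_next_sentence` is an async callable (hypothetical) that returns the
    next sentence produced by the NLP step. Signaling is intentionally omitted.
    """
    pc = RTCPeerConnection()
    channel = pc.createDataChannel("captions")

    async def pump():
        while True:
            sentence = await get_next_sentence()
            channel.send(sentence)  # remote client renders this as a caption

    @channel.on("open")
    def on_open():
        # Start streaming captions once the channel is established.
        asyncio.ensure_future(pump())

    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    # ...exchange pc.localDescription / remote answer via the app's signaling...
    return pc, channel
```

On the receiving side, the same data-channel events could be used to display incoming sentences as captions in the conferencing interface.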
III. PROPOSED SYSTEM

The proposed system, the Sign Language Translation Application (SLTA), helps people who face challenges with communication by translating American Sign Language (ASL) for those who aren't exposed to sign language communication. SLTA is a video-conferencing application that makes use of WebRTC protocols for real-time two-way video and audio communication. It is similar to Google Meet, Zoom, and Microsoft Teams, but it is embedded with the DGR and NLP models that help in SLR [17]. Development of such a system comes with specific challenges, like creating visual motion gestures and the vocabulary of a language, training a neural network to accommodate a vast vocabulary for prediction, and making a user-friendly interface for the ML translator model.

SLTA can capture hand gestures with the help of the Python libraries OpenCV and MediaPipe, via video frames from the user's camera on a PC. It detects the positions of the hand, palm, torso, and face as spatial positioning landmarks of the person, along with 21 points on each palm, which are pixel coordinates that help to predict the sign more precisely, as shown in figure 3.2 [18].
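As a minimal sketch of this capture step, the code below uses OpenCV and the MediaPipe Holistic solution to read webcam frames, extract pose and 21-point hand landmarks, and stack them into a fixed-length sequence. The 30-frame window and the 258-value feature layout are illustrative assumptions, not the paper's published configuration.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic


def extract_keypoints(results):
    """Flatten pose and hand landmarks into one feature vector (zeros when a part is not detected)."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility] for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    left = (np.array([[p.x, p.y, p.z] for p in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[p.x, p.y, p.z] for p in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, left, right])  # 132 + 63 + 63 = 258 values per frame


def capture_sequence(seq_len=30):
    """Collect one gesture sequence of `seq_len` frames from the default webcam."""
    frames = []
    cap = cv2.VideoCapture(0)
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while len(frames) < seq_len:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frames.append(extract_keypoints(results))
    cap.release()
    return np.array(frames)  # shape: (seq_len, 258)
```

Stacked over successive frames, these keypoint vectors form the motion sequence that the DGR model consumes.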
The results of the DGR model, with the confusion matrix, accuracy, and loss, are shown in figure 4.5.

Fig. 4.5. Confusion Matrix, Accuracy and Loss of the DGR Model
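Metrics of this kind can be computed from a held-out test set as sketched below; the use of scikit-learn and the helper name are assumptions for illustration, since the paper only reports the resulting figures.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate_dgr(model, x_test, y_test):
    """Accuracy and confusion matrix for a trained DGR model.

    x_test: keypoint sequences, shape (n_samples, seq_len, n_features).
    y_test: one-hot encoded word labels, shape (n_samples, n_words).
    """
    y_true = np.argmax(y_test, axis=1)
    y_pred = np.argmax(model.predict(x_test, verbose=0), axis=1)
    return accuracy_score(y_true, y_pred), confusion_matrix(y_true, y_pred)
```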
The DGR model with LSTM architecture predicts the gesture from live video data after pre-processing and conversion into the data structure of the above-mentioned sequence. Each gesture is predicted based on the probability of the gesture performed over every sequence of 30 frames. A word is predicted when it has the highest probability in the vocabulary and that probability crosses the threshold of 0.9. Each predicted word is sent to the display output, as shown in figure 4.6.

Fig. 4.6. Prediction of ASL with the DGR model
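The paper does not list the exact layers of the DGR model, so the following is only a minimal sketch of an LSTM classifier over 30-frame keypoint sequences together with the argmax-plus-threshold word selection described above; the layer sizes, the 258-value feature vector, and the function names are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN, N_FEATURES, N_WORDS = 30, 258, 7  # 30-frame window, 7-word vocabulary; feature size assumed


def build_dgr_model():
    """Small LSTM classifier over keypoint sequences (layer sizes are illustrative)."""
    model = Sequential([
        LSTM(64, return_sequences=True, activation="tanh", input_shape=(SEQ_LEN, N_FEATURES)),
        LSTM(128, activation="tanh"),
        Dense(64, activation="relu"),
        Dense(N_WORDS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model


def predict_word(model, sequence, vocabulary, threshold=0.9):
    """Return the highest-probability word if it crosses the threshold, else None."""
    probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
    best = int(np.argmax(probs))
    return vocabulary[best] if probs[best] > threshold else None
```

In use, `sequence` would be the most recent 30-frame keypoint window produced by the capture step, and only predictions that clear the 0.9 threshold are pushed to the caption display.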
V. CONCLUSION

Sign language has been the primary way of communication for hearing and speech-impaired people. Most people aren't aware of sign language, especially American Sign Language, and often find it difficult to understand disabled people. The Sign Language Translation Application (SLTA) aims to bridge the communication barrier between disabled and abled persons using sign language recognition models. SLTA captures signed gestures with a digital camera and extracts the gesture data using the MediaPipe Framework. With the help of the DGR model developed with LSTM architecture, it can predict the dynamic motion gestures performed by the person in front of the camera in real time. An accuracy of 98.81% has been achieved with a seven-word vocabulary and 252 test samples. The proposed system also overcomes the existing problems of static gesture recognition by using LSTM architecture to predict motion gestures, and it replaces the usage of sensor-based gloves and wearables for gesture motion data collection.

Using WebRTC protocols, this application can be implemented as a video-conferencing system for speech- and hearing-impaired people. Further development and refinement of the system, including expansion of the ASL vocabulary, development of the NLP model that turns predicted words into sentences, and implementation of the WebRTC protocol for video-conferencing communication, can be considered as future work.
REFERENCES

[1] J. Hou, X.-Y. Li, P. Zhu, Z. Wang, Y. Wang, J. Qian, and P. Yang, "SignSpeaker: A real-time, high-precision smartwatch-based sign language translator," in Proc. of ACM MobiCom, 2019.
[2] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, "Video-based sign language recognition without temporal segmentation," arXiv preprint arXiv:1801.10111, 2018.
[3] J. Wu, Z. Tian, L. Sun, L. Estevez, and R. Jafari, "Real-time American sign language recognition using wrist-worn motion and surface EMG sensors," in Proc. of IEEE BSN, 2015, pp. 1–6.
[4] J. Zang, L. Wang, Z. Liu, Q. Zhang, G. Hua, and N. Zheng, "Attention-based temporal weighted convolutional neural network for action recognition," in Proc. of IFIP INTERACT, 2018, pp. 97–108.
[5] J. Zhang, J. Tao, and Z. Shi, "Doppler-radar based hand gesture recognition system using convolutional neural networks," in International Conference in Communications, Signal Processing, and Systems, Springer, 2017, pp. 1096–1113.
[6] R. Nandakumar, V. Iyer, D. Tan, and S. Gollakota, "FingerIO: Using active sonar for fine-grained finger tracking," in Proc. of ACM CHI, 2016, pp. 1515–1525.
[7] Julian Menezes R., Albert Mayan J., and M. Breezely George, "Development of a Functionality Testing Tool for Windows Phones," Indian Journal of Science and Technology, vol. 8, no. 22, pp. 1–7, September 2015.
[8] T. T. Swee, A. Ariff, S.-H. Salleh, S. K. Seng, and L. S. Huat, "Wireless data gloves Malay sign language recognition system," in Proc. of 6th International Conference on Information, Communications & Signal Processing, IEEE, 2007, pp. 1–4.
[9] T. Zhao, J. Liu, Y. Wang, H. Liu, and Y. Chen, "PPG-based finger level gesture recognition leveraging wearables," in Proc. of IEEE INFOCOM, 2018, pp. 1457–1465.
[10] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Wang, and J. Yang, "A framework for hand gesture recognition based on accelerometer and EMG sensors," IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 41, no. 6, pp. 1064–1076, 2011.
[11] Z. Lu, X. Chen, Q. Li, X. Zhang, and P. Zhou, "A Hand Gesture Recognition Framework and Wearable Gesture-Based Interaction Prototype for Mobile Devices," IEEE Transactions on Human-Machine Systems, vol. 44, no. 2, pp. 293–299, April 2014, doi: 10.1109/THMS.2014.2302794.
[12] B. Bhokse, "Hand gesture recognition using a neural network," IJISET - International Journal of Innovative Science, Engineering & Technology, 2015. Retrieved October 24, 2022, from https://www.ijiset.com/vol2/v2s1/IJISET_V2_I1_01.pdf
[13] M. De Coster and J. Dambre, "Sign language recognition with Transformer networks." Retrieved October 24, 2022, from http://hdl.handle.net/1854/LU-8660743
[14] K. Kudrinko, E. Flavin, X. Zhu, and Q. Li, "Wearable Sensor-Based Sign Language Recognition: A Comprehensive Review," IEEE Reviews in Biomedical Engineering, vol. 14, pp. 82–97, 2021, doi: 10.1109/RBME.2020.3019769.
[15] Asha Pandian, Bharathi B, Albert Mayan J, Prem Jacob, and Pravin, "A Comprehensive View of Scheduling Algorithms for MapReduce Framework in Hadoop," Journal of Computational and Theoretical Nanoscience, vol. 16, no. 8, pp. 3582–3586, 2019.
[16] S. Mitra, "Gesture recognition: A survey," 2007. Retrieved October 24, 2022, from https://ieeexplore.ieee.org/document/4154947
[17] Razieh Rastgoo, Kourosh Kiani, and Sergio Escalera, "Sign Language Recognition: A Deep Survey," Expert Systems with Applications, vol. 164, 113794, 2021, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.113794 (https://www.sciencedirect.com/science/article/pii/S095741742030614X).
[18] C. Lugaresi et al., "MediaPipe: A Framework for Building Perception Pipelines," arXiv, June 14, 2019. https://arxiv.org/abs/1906.08172
[19] Z. Wang, T. Zhao, J. Ma, H. Chen, K. Liu, H. Shao, Q. Wang, and J. Ren, "Hear sign language: A real-time end-to-end sign language recognition system," IEEE Transactions on Mobile Computing, 2020, https://doi.org/10.1109/tmc.2020.3038303.
[20] Wikimedia Foundation, "American Sign Language," Wikipedia, October 23, 2022. Retrieved October 24, 2022, from https://en.wikipedia.org/wiki/American_Sign_Language
[21] Y. Suresh, J. Vaishnavi, M. Vindhya, M. S. A. Meeran, and S. Vemala, "MUDRAKSHARA - A Voice for Deaf/Dumb People," 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020, pp. 1–8, doi: 10.1109/ICCCNT49239.2020.9225656.
[22] V. Bazarevsky and F. Zhang, "Hands: On-Device, Real-Time Hand Tracking with MediaPipe," August 19, 2019. Retrieved February 10, 2023, from https://google.github.io/mediapipe/solutions/hands.html
[23] V. Bazarevsky and I. Grishchenko, "Pose: On-device, Real-time Body Pose Tracking with MediaPipe BlazePose," August 13, 2020. Retrieved February 10, 2023, from https://google.github.io/mediapipe/solutions/pose.html
[24] Teak-Wei Chong and Boon Giin Lee, "The 26 letters and 10 digits of American Sign Language (ASL)," figure in American Sign Language Recognition Using Leap Motion Controller with Machine Learning Approach, October 2018. Retrieved February 10, 2023, from https://www.researchgate.net/figure/The-26-letters-and-10-digits-of-American-Sign-Language-ASL_fig1_328396430