
International Research Journal on Advanced Engineering Hub (IRJAEH)
e-ISSN: 2584-2137
Vol. 02 Issue: 02 February 2024
Page No: 93 - 97
https://irjaeh.com

Real-Time Sign Language Recognition and Translation Using Deep Learning Techniques
Tazyeen Fathima1*, Ashif Alam2, Ashish Gangwar3, Dev Kumar Khetan4, Prof. Ramya K5
1,2,3,4UG, Artificial Intelligence and Machine Learning Engineering, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India.
5Assistant Professor, Artificial Intelligence and Machine Learning Engineering, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India.
Emails: [email protected], [email protected], [email protected], [email protected], [email protected]
*Corresponding Author Orcid ID: https://orcid.org/0009-0001-2597-1044

Abstract
Sign Language Recognition (SLR) recognizes hand gestures and produces the corresponding text or speech. Despite advances in deep learning, SLR still faces challenges in terms of accuracy and visual quality. Sign Language Translation (SLT) aims to translate sign language images or videos into spoken language, but is hampered by limited language-comprehension datasets. This paper presents an innovative approach for sign language recognition and conversion to text using a custom dataset containing 15 different classes, each class containing 70–75 different images. The proposed solution uses the YOLOv5 architecture, a state-of-the-art Convolutional Neural Network (CNN), to achieve robust and accurate sign language recognition. With careful training and optimization, the model achieves impressive mAP (mean Average Precision) values of 92% to 99% across the 15 classes. An extensive dataset combined with the YOLOv5 model provides effective real-time sign language interpretation, showing the potential to improve accessibility and communication for the hearing impaired. This application lays the groundwork for further advances in sign language recognition systems, with implications for inclusive technology applications.
Keywords: Sign Language Recognition (SLR), Sign Language Translation (SLT), YOLOv5 architecture, Convolutional Neural Network (CNN), mAP values

1. Introduction
The main form of communication for deaf and speech-impaired people is sign language (SL), which differs from spoken or written language in terms of vocabulary, meaning, and grammar. There are between 138 and 300 distinct forms of sign language used worldwide; in India, where there are 7 million deaf people, there are only around 250 licensed interpreters. It is difficult to teach sign language to the community because of this shortage. To overcome communication hurdles, sign language recognition uses computer vision and deep learning to identify hand motions and transform them into text or voice [1]. The importance of a Sign Language Recognition (SLR) system for hard-of-hearing and speech-impaired people is emphasised in the study. Current SLR systems frequently depend on several depth-sensor cameras or expensive wearable sensors. The suggested method presents a vision-based framework for multilingual sign language recognition that tracks and extracts multi-semantic manual co-articulations, such as one- and two-handed signals, in addition to non-manual components like body language and facial emotions.

The objective is to isolate and extract different signals and non-manual motions to create a realistic, multi-signer Indo-Russian Sign Language Database [2]. To close the communication gap between the public and the hearing-impaired, the authors present a Hybrid Deep Neural Architecture (H-DNA) that integrates CNN, LSTM, GRU, and GAN for real-time sign language detection and translation. The H-DNA model highlights how it may improve communication in a variety of sign languages by showcasing accurate detection and translation [3]. The paper explores the difficulties and methods involved in translating text into sign language or in obtaining voice from sign language videos. Translation, interpretation, and sign segmentation are some of the tasks involved in the process. Identifying sign glosses is a common emphasis of current research, instead of doing a whole translation. To improve translation accuracy, the study presents Sign Language Transformers, which use an encoder-decoder architecture with gloss representations. It also gives baseline data and offers possible directions for further study [4]. The paper offers a framework for translating sign language videos into spoken language using sign language translation (SLT). Modules for quantifying semantic similarity, conditional sentence construction, and word existence verification are all included in the system. By resolving issues like word-order variances and missing words in sign language videos, this system increases the efficacy of SLT. Results from experiments using sign language datasets show how useful the method is for improving communication between the hearing and the deaf groups [5].
2. Experimental Methods or Methodology
2.1 Data Preprocessing
2.1.1 Custom Dataset Creation
To begin the project, we carefully chose a unique dataset designed specifically for sign language interpretation and identification. This carefully selected dataset includes a wide range of hand gestures and emotions that are often used in sign language communication. Notably, our dataset includes 70–75 photos for each of 15 different classes. The wide range of sign language expressions represented by this selection guarantees a solid basis for our model's training, enabling it to correctly interpret and convert these gestures into meaningful text or speech [6].
2.1.2 Data Preprocessing and Labelling
We preprocess and label the gathered data using Roboflow, a powerful tool for dataset management. To guarantee consistency and accuracy in training, this stage entails cleaning, normalizing, and annotating the dataset.
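As an illustration of this step, the sketch below pulls a Roboflow-managed dataset in YOLOv5 format using the Roboflow Python SDK; the API key, workspace, project name, and version number are placeholders, since the paper does not disclose them.

```python
from roboflow import Roboflow  # pip install roboflow

# API key, workspace, project, and version are hypothetical placeholders;
# the paper does not disclose its Roboflow project details.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("sign-language-gestures")
dataset = project.version(1).download("yolov5")

# The export contains train/valid/test images, YOLO-format labels, and a
# data.yaml listing the 15 gesture classes for the training script.
print(dataset.location)
```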
2.2 Model Construction
2.2.1 YOLO v5 Model Architecture Selection
We chose the YOLO v5 architecture because it has a well-established track record of effectiveness in object-identification tasks, which makes it a perfect fit for our sign language recognition goals. YOLO, short for "You Only Look Once," is well known for its real-time image-processing capabilities and its ability to predict bounding boxes around objects. The architecture's speed and efficacy stem from its single forward run through the neural network, which precisely matches our objectives for sign language identification [7].
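To make the single-forward-run property concrete, the following sketch runs one image through the publicly available pretrained YOLOv5s checkpoint via torch.hub and reads out all predicted boxes at once; the image path is a placeholder, and this is not the paper's trained model.

```python
import torch

# Public pretrained YOLOv5s checkpoint (not the paper's trained model);
# "gesture.jpg" is a placeholder image path.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("gesture.jpg")   # a single forward run predicts all boxes
results.print()                  # summary: classes, confidences, box counts
print(results.xyxy[0])           # tensor rows: [x1, y1, x2, y2, conf, class]
```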
2.2.2 Training the YOLO v5 Model
We next use our carefully chosen and labelled dataset to train the model with the chosen YOLO v5 architecture. Through a series of iterative procedures, the model learns to identify and classify the wide range of sign language gestures included in the dataset. Exposure to varied hand gestures allows the model to fine-tune its parameters, ultimately becoming skilled at reliably detecting and categorizing signs in a variety of contexts. The system overview is shown in Figure 1.
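The paper does not report its exact training command, so the following is a hedged sketch of a typical run using the train.py entry point of the cloned ultralytics/yolov5 repository; the dataset path, epoch count, image size, and batch size are illustrative assumptions.

```python
# Assumes the ultralytics/yolov5 repository is cloned and this script runs
# from its root, so its train.py module is importable.
import train

# All values below are illustrative assumptions, not the paper's settings.
train.run(
    data="sign-language-gestures/data.yaml",   # placeholder Roboflow export path
    weights="yolov5s.pt",                      # start from pretrained COCO weights
    imgsz=640,
    batch_size=16,
    epochs=100,
    project="runs/train",
    name="slr_yolov5s",
)
```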
2.2.3 Fine-Tuning and Optimization
After the first training phase, we enter a crucial stage of optimization and fine-tuning. To improve accuracy and robustness, we tweak hyperparameters as we examine the finer points of the model's performance. Any particular difficulties or problems found during the first training are methodically resolved. By carefully optimizing the model, we aim to improve its performance and make it more resilient to various sign language expressions and environmental factors.
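One plausible way to realize this hyperparameter tweaking, sketched below under the same assumptions as the training snippet, is a second run that resumes from the best checkpoint with a lowered initial learning rate; the file names and values are illustrative, not taken from the paper.

```python
import yaml
import train  # yolov5/train.py, as in the training sketch

# Lower the initial learning rate in a copy of a stock hyperparameter file;
# the file path and values are illustrative assumptions.
with open("data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)
hyp["lr0"] = 0.001  # gentler start for fine-tuning

with open("hyp.finetune-slr.yaml", "w") as f:
    yaml.safe_dump(hyp, f)

train.run(
    data="sign-language-gestures/data.yaml",
    weights="runs/train/slr_yolov5s/weights/best.pt",  # resume from best checkpoint
    hyp="hyp.finetune-slr.yaml",
    imgsz=640,
    epochs=50,
)
```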


Figure 1 System Overview

2.3 Evaluation and Deployment
2.3.1 Assessment and Validation
To determine the model's capacity for generalization and to guarantee dependable performance in a range of scenarios, a thorough assessment and validation are carried out utilizing distinct datasets.
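A minimal sketch of such an assessment, assuming the same cloned ultralytics/yolov5 repository and placeholder paths as earlier, uses the repository's val.py evaluator on a held-out split:

```python
import val  # yolov5/val.py, assuming the same cloned repository

# Evaluate the trained weights on the held-out test split; paths follow
# the placeholder names used in the training sketch.
val.run(
    data="sign-language-gestures/data.yaml",
    weights="runs/train/slr_yolov5s/weights/best.pt",
    imgsz=640,
    task="test",   # score the test split rather than the training data
)
# The evaluator prints per-class precision, recall, mAP@0.5, and mAP@0.5:0.95.
```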
2.3.2 Deployment Readiness
After the model performs well enough, we get ready to deploy it. This entails drafting a plan for smooth integration and taking into account any deployment-specific issues [8].
2.3.3 User Interface Implementation
The usability of the programme depends on the design of a user-friendly interface. Our goal is to develop an intuitive interface that supports several forms of communication, such as text, speech, and sign language.
2.3.4 Integration of Accessibility Features
We use features like text-to-speech, screen readers, and haptic feedback to improve accessibility. These enhancements make the programme more accessible and easier to use for people with a variety of impairments.
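The paper names text-to-speech as a feature without specifying a library; as one possible realization, the sketch below voices a recognized gesture label with the offline pyttsx3 engine.

```python
import pyttsx3  # pip install pyttsx3; one offline TTS option (assumption)

def speak(label: str) -> None:
    """Voice a recognized sign label as speech."""
    engine = pyttsx3.init()
    engine.say(label)
    engine.runAndWait()

speak("Hello")  # e.g. after the detector emits a 'Hello' class (placeholder name)
```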
2.3.5 Testing with User Feedback
A wide range of users, including those who are deaf, speech-impaired, or have other impairments, test the completed programme. We actively gather user input to pinpoint areas that need development and enhance the usability of the programme.
2.3.6 Maintenance and Deployment
The programme is placed on a cloud-based platform, guaranteeing its ongoing efficacy and availability. Frequent maintenance is carried out to fix any new problems and provide upgrades or improvements as required.
3. Results and Discussion
The proposed model seamlessly recognizes and converts hand movements into text, as shown in Figure 2; it closes social divides and promotes efficient communication. Its precise comprehension of many sign language expressions makes a substantial contribution to inclusive communication, overcoming linguistic barriers and improving accessibility. The object detection output from the YOLO model, showing different signs, is presented in Figure 2.
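As a hedged illustration of this real-time pipeline, the sketch below feeds webcam frames to a trained YOLOv5 model loaded through torch.hub and prints each detected gesture as text; the weight path and confidence threshold are assumptions, not values reported in the paper.

```python
import cv2
import torch

# Load custom-trained weights; the path is a placeholder from the training sketch.
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/slr_yolov5s/weights/best.pt")
model.conf = 0.5                          # confidence threshold (illustrative)

cap = cv2.VideoCapture(0)                 # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # YOLOv5 expects RGB input
    results = model(rgb)                            # one forward pass per frame
    for *_, conf, cls in results.xyxy[0].tolist():
        print(f"{model.names[int(cls)]} ({conf:.2f})")   # gesture as text
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
    cv2.imshow("Sign Language Recognition", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```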


Figure 2 Object detection output from the YOLO model showing different signs
After being trained on 15 different classes, each with 70–75 photos, the model's Mean Average Precision (mAP) metrics, shown in Figure 3, demonstrate its outstanding accuracy. The strong performance, achieved by training on a varied dataset, highlights the model's efficacy in promoting inclusive and accessible communication platforms [9].
Figure 3 mAP values of each class
Furthermore, a thorough study of the confusion matrix, as in Figure 4, sheds further light on the model's performance. The matrix provides valuable insights into the intricate details of identification and translation. It illustrates the model's ability to discriminate among the 15 classes, which contributes to our comprehension of its strengths and potential areas for improvement.
Figure 4 Confusion matrix
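For readers who want to reproduce such a study, the sketch below builds a per-class confusion matrix from ground-truth and predicted labels with scikit-learn; the class names and label lists are illustrative placeholders (YOLOv5's evaluator also saves a confusion-matrix plot for each run).

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
import matplotlib.pyplot as plt

# Placeholder ground-truth and predicted gesture labels; in practice these
# come from the test split and the detector's per-image top predictions.
classes = ["Hello", "No", "Thanks", "Yes"]   # illustrative subset of the 15 classes
y_true = ["Hello", "Thanks", "Yes", "No", "Hello"]
y_pred = ["Hello", "Thanks", "Yes", "Yes", "Hello"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
ConfusionMatrixDisplay(cm, display_labels=classes).plot()
plt.show()
```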
Conclusion
The recognition and translation of sign language have entered a new era of advancements because of the incorporation of cutting-edge deep learning techniques, particularly neural networks and transformers. Accurate sign language identification is a crucial difficulty that this revolutionary wave attempts to address. Scholars have worked hard to address the challenges of identifying semantic co-articulations, non-manual features, and spatial-temporal features present in sign language expressions, demonstrating a commitment to improving the area. Simultaneously, research into cutting-edge translation methods like cross-modal reranking and word existence verification has produced encouraging results, especially when conducted on publicly available datasets. These creative methods represent a major advancement in the development of translation techniques by improving the interpretability of sign language statements. Moreover, the field has been enhanced by the use of emotion analysis algorithms, improving the accuracy of text and voice classification in sign language scenarios. The DED algorithm and the suggested empathetic speech synthesis approach are examples of pioneering technologies that highlight the practical significance of these breakthroughs. They stress the continued dedication to research that advances sign language processing and promotes effective communication in linguistically heterogeneous communities, hence advancing inclusion and accessibility.
References
[1]. Satwik Ram Kodandaram, N Pavan Kumar and Sunil G L, "Sign Language Recognition", vol. 12, no. 14 (2021), pp. 994-1009, doi: 10.13140/RG.2.2.29061.47845.


[2]. E. Rajalakshmi et al., "Multi-Semantic Discriminative Feature Learning for Sign Gesture Recognition Using Hybrid Deep Neural Architecture," in IEEE Access, vol. 11, pp. 2226-2238, 2023, doi: 10.1109/ACCESS.2022.3233671.
[3]. B. Natarajan et al., "Development of an End-to-End Deep Learning Framework for Sign Language Recognition, Translation, and Video Generation," in IEEE Access, vol. 10, pp. 104358-104374, 2022, doi: 10.1109/ACCESS.2022.3210543.
[4]. N. Cihan Camgöz, O. Koller, S. Hadfield, and R. Bowden, "Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 10020-10030, doi: 10.1109/CVPR42600.2020.01004.
[5]. J. Zhao, W. Qi, W. Zhou, N. Duan, M. Zhou, and H. Li, "Conditional Sentence Generation and Cross-Modal Reranking for Sign Language Translation," in IEEE Transactions on Multimedia, vol. 24, pp. 2662-2672, 2022, doi: 10.1109/TMM.2021.3087006.
[6]. Moganapriya, C., et al., "Dry machining performance studies on TiAlSiN coated inserts in turning of AISI 420 martensitic stainless steel and multi-criteria decision-making using Taguchi-DEAR approach," Silicon (2021): 1-14.
[7]. Kaliyannan, Gobinath Velu, et al., "Development of sol-gel derived gahnite anti-reflection coating for augmenting the power conversion efficiency of polycrystalline silicon solar cells," Materials Science-Poland 37.3 (2019): 465-472.
[8]. Velu Kaliyannan, Gobinath, et al., "An extended approach on power conversion efficiency enhancement through deposition of ZnS-Al2S3 blends on silicon solar cells," Journal of Electronic Materials 49 (2020): 5937-5946.
[9]. Sathishkumar, T. P., et al., "Investigation of chemically treated randomly oriented sansevieria ehrenbergii fiber reinforced isophthallic polyester composites," Journal of Composite Materials 48.24 (2014): 2961-2975.
