
Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology

ISSN No: 2456-2165 https://doi.org/10.38124/ijisrt/25apr877

Real-Time AI Sign Language Interpreter


Abiram R¹; Vikneshkumar D²; Abhishek E T³; Bhuvaneshwari S⁴; Joyshree K⁵
¹UG Scholar; ²Assistant Professor; ³UG Scholar; ⁴UG Scholar; ⁵UG Scholar
¹Department of Artificial Intelligence and Machine Learning, SNS College of Technology, Coimbatore, India
²Department of Information Technology, SNS College of Technology, Coimbatore, India
³,⁴,⁵Department of Artificial Intelligence & Machine Learning, SNS College of Technology, Coimbatore, India

Publication Date: 2025/04/19

Abstract: Hearing loss and communication challenges affect millions of people, particularly those who are Deaf and hard of hearing. According to the World Health Organization (WHO), 43 million Indians and 466 million people worldwide suffer from debilitating hearing loss. In India, this group struggles to access work, healthcare, and education. Even with initiatives like the National Policy for Persons with Disabilities and the Rights of Persons with Disabilities Act, gaps remain in ensuring full inclusion. WHO estimates that by 2050 some 2.5 billion people will have hearing loss, 700 million of whom will require hearing rehabilitation, and that an additional 1 billion young people are at risk of avoidable hearing loss due to unsafe listening practices. Our project, the Real-Time Sign Language Interpreter, aims to overcome these obstacles by bridging the gap between Deaf people and the wider communicating world. Using AI and machine learning, the technology enables uninterrupted communication by instantly translating hand movements into text and then speech. The project provides communication access for people from the Deaf community, enabling greater participation in education, employment, and social life. Harnessing this technology requires relatively low investment yet can yield an immense social return by making services available to everyone, regardless of background or circumstances.

Keywords: Sign Language Recognition, Gesture Recognition, Machine Learning, Computer Vision, AI.

How to Cite: Abiram R; Vikneshkumar D; Abhishek E T; Bhuvaneshwari S; Joyshree K (2025). Real-Time AI Sign Language
Interpreter. International Journal of Innovative Science and Research Technology, 10(4), 681-687.
https://doi.org/10.38124/ijisrt/25apr877

I. INTRODUCTION

In India, 4.3 crore people suffer from hearing loss, which hinders their ability to interact socially. Most people do not know sign language, the primary language of the deaf and hard of hearing. This creates social exclusion and limits opportunities by raising barriers to everyday education and employment. Access to sign language communication has traditionally suffered from the requirement for direct interaction with a fluent signer. Besides being time-consuming, manual sign language interpretation is further hampered by the lack of real-time translation technologies. Existing solutions often lack precision or prove difficult to deploy in practical scenarios.

At the core of this project is a Convolutional Neural Network (CNN) that classifies the user's gestures into outputs in real time. The frontend is built with HTML, CSS, and JavaScript and provides an interactive, accessible user interface. The video feed is processed by a backend built in Flask, which extracts hand gestures using OpenCV and MediaPipe and, with the trained CNN model, translates signs into text. A text-to-speech (TTS) engine then converts the recognized text into audio output, enabling seamless communication between Deaf and non-signing users.

This system greatly enhances accessibility by enabling effective real-time communication without human interpreters, using computer vision and deep learning techniques. The project provides communication access for people from the Deaf community, allowing greater engagement in education, employment, and social life. The model can also compare each new frame with its previous predictions and associate changes with actions, refining its recognition over time. Upcoming updates could include, for example, multi-language support to make the system applicable at an international level.
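The paper describes this pipeline but includes no source code. The sketch below shows one minimal way the capture-recognize-speak loop could be assembled with OpenCV, MediaPipe, a Keras CNN, and pyttsx3; the model file name, 64×64 grayscale input size, and A–Z label set are illustrative assumptions, not the authors' published artifacts.

```python
import cv2
import mediapipe as mp
import numpy as np
import pyttsx3
import tensorflow as tf

# Hypothetical artifacts: the paper does not publish its model or label set.
model = tf.keras.models.load_model("sign_cnn.h5")          # assumed file name
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]   # assumed A-Z labels

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
tts = pyttsx3.init()
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR.
    if hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).multi_hand_landmarks:
        # A hand is visible: classify the (assumed 64x64 grayscale) frame.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        x = cv2.resize(gray, (64, 64)).astype("float32") / 255.0
        probs = model.predict(x.reshape(1, 64, 64, 1), verbose=0)[0]
        text = LABELS[int(np.argmax(probs))]
        cv2.putText(frame, text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (0, 255, 0), 2)
        tts.say(text)  # voice output for the hearing user
        tts.runAndWait()
    cv2.imshow("Sign Language Interpreter", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Here MediaPipe serves only as a hand detector gating the classifier; a production system would crop to the detected hand region and smooth predictions over several frames.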

II. PURPOSE OF RSTL

Traditional sign language interpretation is expensive, unreliable, and hard to find. The platform eliminates the need for human interpreters, giving the Deaf an always-available service at a reasonable price and further maximizing accessibility. This innovative system significantly bolsters the prospects for inclusivity by enabling real-time conversations, effectively allowing individuals who are Deaf to participate fully in diverse social, academic, and professional environments without encountering communication obstacles. By breaking down these barriers, it promotes seamless integration of Deaf individuals into community interactions and opportunities. The system offers a robust collection of different sign languages for users in different parts of the world. It is mobile optimized, allowing Deaf persons to use it from home, the office, or on the go, further increasing accessibility. The system enables Deaf and hearing people to communicate without an intermediary, which increases independence and reduces reliance on others for daily communication needs. Because Deaf persons may need to communicate with hearing persons who cannot sign, the system also provides audio-to-text translation so that Deaf persons can follow spoken replies. The system allows changes to display settings, text speed, and font size, making the platform adaptable to users with different reading and visual preferences. It applies real-time learning to steadily improve the accuracy of its gesture identification, so that variation in sign language actions is handled accurately. The platform evolves continuously, improving its output based on user feedback to ensure that user needs and expectations are met.

III. LITERATURE SURVEY

In terms of speed and accuracy, Zhou et al. (2019) presented a CNN-based hand gesture detection model that significantly outperformed traditional techniques. The model easily distinguished between static and dynamic gestures and could be adapted to practical settings. The paper emphasized the significance of deep learning in enhancing the effectiveness and versatility of sign language recognition. The authors further compared various CNN structures to optimize performance. Higher accuracy was obtained on complicated hand gestures, attesting to the robustness of the model. Incorporating real-time processing was proposed as a future improvement.

Liu et al. (2020) improved continuous sign language recognition through the combination of CNNs and LSTMs, which automatically learned temporal relationships in sign sequences. The combined strategy enabled accurate sentence-level interpretation, with improved performance relative to frame-based methods. The research showed the viability of deep learning in real-time sign language translation. They tested on varying datasets with highly accurate recognition. The research also investigated attention mechanisms to improve the classification of gestures and highlighted the importance of sequential modeling in gesture communication.

Addressing the need for a real-time sign language interpreter, Natarajan et al. (2021) proposed an end-to-end deep learning framework that integrates TTS synthesis, CNNs, and LSTMs. Their approach enhanced accessibility by converting gestures to text and speech and supported a variety of sign languages. They provided video-based answers to give contextual feedback. The study highlighted the potential of AI to help the Deaf community eliminate communication barriers. Experimental results exhibited high recognition rates even for advanced gestures. For improved scalability, the study recommended additional transformer-based model optimization.

Singh et al. (2019) proposed a real-time, low-latency sign language recognition system using OpenCV-based gesture detection and pyttsx3 for text-to-speech. Their system supported low-latency speech output, making it well suited to low-resource devices such as smartphones and the Raspberry Pi. The study aimed to create an efficient and cost-effective system for sign language users. Real-time gesture classification by the model improved communication accessibility, and the system was also evaluated under varying lighting conditions to enhance its robustness. Their article explained how AI-driven sign recognition works on edge devices.

Zhang et al. (2020) proposed solutions for real-time sign language identification challenges such as variations in hand gestures, illumination, and background noise. They combined multiple methods with learning from pretrained models to improve model adaptation. Their results showed how pre-trained models can achieve high accuracy across a selection of sign languages. The study included visual and depth-based features to improve identification, and the experiments demonstrated significant performance gains compared to traditional techniques.

IV. EXISTING SYSTEM

➢ Manual Interpretation Services:
Human interpreters provide traditional sign language interpreting, giving direct interpretation in places such as schools, businesses, and medical facilities. Even though this method works well in direct interactions, it has disadvantages such as scheduling and expense. Depending on human timetables can introduce delays that obstruct real-time communication in both emergency and casual contexts. A human interpreter may also only be able to accommodate one person at a time, which is not scalable.

➢ Video Relay Services (VRS):
VRS is an Internet-based service that allows a person to communicate in American Sign Language (ASL) through a video call with an interpreter. Although more scalable than in-person interpretation, VRS requires reliable internet access and specialized hardware such as computers and

webcams, and is thus not feasible in low-resource settings or for those with limited access to technology. The service also relies on interpreter schedules and availability, which makes it poorly suited to spontaneous conversation.

➢ Sign Language Recognition Applications:
Some phone applications use the phone's camera to identify sign language signs and translate them to speech or text. These applications use computer vision algorithms for hand sign recognition but are imprecise, particularly in dynamic real-time environments. Most of them do not have extensive vocabularies, identify isolated signs only, and do not support real-time operation for extended conversation. Some applications also lack integration with text-to-speech engines, which would enable users to hear a voice translation of identified signs.

➢ Limitations of Existing Systems in the Context of the Project

➢ Interpreter Access:
Human interpreters as well as Video Relay Services (VRS) can be scarce and unavailable "in the moment" or at critical times. This can be an enormous burden for Deaf people in any situation where back-and-forth communication is expected at a moment's notice. VRS systems also rely on a stable internet connection to function as intended, so they are often not viable options in rural or underserved communities.

➢ Real-Time Performance:
The vast majority of current sign language recognition systems cannot function in real time, being severely limited by hardware and/or software. They can take time to process gestures, which interrupts the flow of communication and reduces accuracy. Real-time recognition is critical to fluid, continuous communication, especially in dynamic environments such as education and workplaces where rapid and accurate interactions are vital.

➢ Accessibility and Integration:
Current sign language recognition technologies do not guarantee universal accessibility for users of a particular sign language variety, such as American Sign Language (ASL) or a local dialect, primarily because they do not support large vocabularies and are not flexible to various styles of signing. Their limited ability to connect with other communication modalities makes them less usable and diminishes their relevance in diverse, real-world scenarios.

➢ Lack of Comprehensive Solutions:
Current systems are designed to recognize single isolated gestures or words rather than offer a complete solution for ongoing dialogue. Effective sign language interpretation requires not just recognizing gestures but translating them into meaningful text or speech with context awareness. Our project bridges this limitation with a complete, integrated real-time system that accurately interprets hand gestures and translates them into text and speech, facilitating smoother communication in any setting.

V. PROPOSED SYSTEM

The proposed "Sign Language Interpreter" system is designed to close the communication gap between Deaf or hard-of-hearing people and non-signing users by providing real-time gesture recognition, translation, and output as both text and speech. The system utilizes advanced machine learning methods, computer vision, and text-to-speech capabilities to offer a usable, effective, and simple solution for sign language communication. The major building blocks of the system are the following:

Fig 1 CNN Model Diagram
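Fig 1 outlines the CNN, but the paper does not specify layer sizes. The following Keras sketch assumes a 64×64 grayscale input and a 26-class (A–Z) output purely for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sign_cnn(num_classes: int = 26) -> tf.keras.Model:
    """A minimal CNN for static sign classification (assumed architecture)."""
    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),   # assumed grayscale input size
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),               # regularization; not stated in the paper
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```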

➢ Real-Time Gesture Recognition: The system uses a Convolutional Neural Network (CNN) to identify and understand sign language gestures in real time. It analyzes video frames, identifies hand motion, and converts it into a recognized sign, which is translated into text or speech.

➢ Translating Signs into Text: The recognized signs are converted into text, providing the written form of the communicated message. This lets non-signing users read the output in real time and follow the conversation easily.

➢ Voice Output: The system additionally includes text-to-speech (TTS) functionality, such as pyttsx3 or the Google TTS API, so the text can be read aloud and a non-signing user can hear the message. This makes the exchange more synchronous, allowing, for example, a Deaf user and a hearing user to communicate inclusively (see the sketch below).
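As a usage sketch of the pyttsx3 option named above (the rate value is illustrative):

```python
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # words per minute; illustrative value

def speak(text: str) -> None:
    """Read the recognized sign text aloud for the hearing user."""
    engine.say(text)
    engine.runAndWait()

speak("HELLO")  # e.g., the text produced by the recognizer
```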

Fig 2 Landing Page

➢ User Interface: We are developing a simple, intuitive, easy-to-use interface for interacting with the system. The interface will be easy to customize for accessibility by individuals of varying technical skill. Users will easily be able to see the interpreted signs and hear translated speech for communication access.

Fig 3 Flow Chart


Fig 4 Model Training Page

VI. DESIGN AND IMPLEMENTATION

Fig 5 Prediction Window

The Real-Time Sign Language AI Software project aims to bridge the communication chasm between Deaf individuals and those who cannot use sign language through artificial intelligence and computer vision technologies. The system recognizes hand gestures in real time and presents text and voice output seamlessly for communication and accessibility.


Fig 6 Gesture Page

Users interact through an easy-to-use interface that allows them to perform sign language gestures in front of a webcam. The system records these gestures and transmits them to a secure, robust backend built on Flask. The machine learning models, a CNN and an LSTM, process the input hand gestures into text and speech outputs. The AI capabilities utilize a deep-learning hand gesture recognition model built in TensorFlow/Keras. The model has been trained on a vast dataset of hand gestures from numerous sign languages, such as American Sign Language (ASL), Indian Sign Language (ISL), and British Sign Language (BSL). The model keeps learning and improves its accuracy in translating intricate hand gestures under different lighting and environmental conditions. The software has an optimization layer that maintains efficiency and precision during real-time gesture recognition. It also supports multilingual output, converting text and speech into different languages. The interface is developed with accessibility functions such as adjustable font size, language choice, and voice modulation for an improved user experience.

To further improve the efficiency of the system, the Google API (Gemini-1.5-flash-latest) is used for advanced natural language processing and real-time text-to-speech translation. The API improves the precision of translation and the overall user experience by providing natural and seamless communication. Challenges like differing hand shapes, occlusions, and environmental factors are managed through adaptive pre-processing methods, data augmentation, and real-time model fine-tuning. The cloud-based platform provides scalability and responsiveness, enabling the software to run efficiently on various devices.
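The exact augmentation transforms are not listed in the paper; a plausible Keras pipeline for the invariances it mentions (hand position, distance, rotation, lighting) might be:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed augmentation pipeline; the paper does not enumerate its transforms.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.05),         # slight wrist rotation
    layers.RandomTranslation(0.1, 0.1),  # hand not perfectly centered
    layers.RandomZoom(0.1),              # varying distance from the camera
    layers.RandomContrast(0.2),          # lighting variation
])

# Applied on the fly during training only, e.g.:
# model.fit(train_ds.map(lambda x, y: (augment(x, training=True), y)), ...)
```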
The AI capabilities rely on a deep-learning model to
Through the merging of advanced AI and user-centric design, the Real-Time Sign Language AI Software breaks down communication barriers for the Deaf community in an inclusive, efficient, and accessible manner. The project not only improves accessibility but also sets a new standard for real-time sign language interpretation through AI innovation.

VII. EXPERIMENTAL RESULTS

The Real-Time Sign Language AI Software project aims to address the communication gap between Deaf and non-signing people using artificial intelligence and computer vision tools. The system identifies hand gestures in real time and consolidates text and voice output for accessibility and communication.

Users operate a very simple interface through which they can make sign language gestures in front of a webcam. The gestures are recorded and sent to a secure, strong backend developed on Flask. The machine learning algorithms, a CNN and an LSTM, convert the input hand gestures to text and speech outputs.
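The backend interface itself is not specified in the paper. As a sketch of the described division of labor (browser frontend captures frames, Flask backend classifies them), a minimal endpoint might look like the following; the route name, payload format, and model file are assumptions:

```python
import base64

import cv2
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
model = tf.keras.models.load_model("sign_cnn.h5")          # assumed file name
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]   # assumed label set

@app.route("/predict", methods=["POST"])                   # assumed route
def predict():
    # Expect {"frame": "<base64-encoded JPEG>"} from the browser frontend.
    raw = base64.b64decode(request.json["frame"])
    frame = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_GRAYSCALE)
    x = cv2.resize(frame, (64, 64)).astype("float32") / 255.0
    probs = model.predict(x.reshape(1, 64, 64, 1), verbose=0)[0]
    return jsonify({"text": LABELS[int(np.argmax(probs))],
                    "confidence": float(np.max(probs))})

if __name__ == "__main__":
    app.run(debug=True)
```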

IJISRT25APR877 www.ijisrt.com 686


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr877
The AI capabilities rely on a deep-learning hand gesture recognition model built using TensorFlow/Keras. The model was trained on a robust dataset of hand gestures from a variety of sign languages, such as American Sign Language (ASL), Indian Sign Language (ISL), and British Sign Language (BSL), and it constantly learns and improves its accuracy in interpreting complex hand gestures under varying light and conditions.

The software is equipped with an optimization layer that provides efficiency and accuracy in real-time hand gesture recognition. It also supports multilingual output as part of its text and speech conversion. The interface is designed with accessibility features like dynamic font size, language selection, and voice modulation for a better user experience.

To further increase the efficiency of the system, we utilize the Google API (Gemini-1.5-flash-latest) to perform advanced real-time translation using natural language processing and text-to-speech. This API enhances translation accuracy and the overall user experience through natural and seamless communication. Flexible preprocessing techniques and data augmentation are employed to achieve spatial invariance, and the model is fine-tuned in real time while in use to address challenges such as continuously changing hand shapes, occlusions, and environmental perturbations. The cloud-based system promotes scalable and responsive use of the software, as it can run effectively on numerous devices. With the incorporation of cutting-edge Artificial Intelligence (AI) and user-centric design, the Real-Time Sign Language AI Software breaks communication barriers for the Deaf community in an inclusive, efficient, and accessible manner. The project offers enhanced accessibility and effectively sets a new standard for real-time sign language interpretation through AI innovation.
VIII. RESULTS

The Real-Time Sign Language AI Software demonstrates a new way of crossing communication barriers for the Deaf community. Through extensive testing with 120 participants, including Deaf people, teachers, and interpreters, the system showed high user engagement and high performance.

94% of users found the site user-friendly and simple to use, commenting on how it made both signers and non-signers feel welcome. Moreover, 91% of participants were satisfied with how the AI recognized their hand gestures in real time, noting that it performed consistently well, converting gestures into text and voice output with little latency. The AI model achieved average recognition accuracies of 96% in controlled conditions and 90% in actual use, displaying a degree of consistency across different lighting, backgrounds, and hand shapes. Expert sign language interpreters rated the system's translation accuracy and response speed at 8.9 and 9.2 out of 10, respectively, supporting its proposed application in real-life settings.

The performance data showed that gesture translation took 0.8 seconds on average, allowing near-instant communication. Integration with the Google API (Gemini-1.5-flash-latest) benefited speech synthesis and multilingual translation, with two-thirds of users indicating that they appreciated the multilingual support. Communication latency dropped by 75%, and interpretation cost was reduced by 60% relative to a traditional interpretation service. Although the system handled most forms of communication, further work is required to improve gesture recognition and contextual sentence structure in signing.

ACKNOWLEDGEMENTS

We convey our sincere sense of gratitude and obligation to our college, SNS College of Technology, Coimbatore, which gave us the chance to achieve our dearest aspirations. We convey our heartfelt thanks and regards to Dr. S. Angel Latha Mary, Head of the Department, Information Technology & Artificial Intelligence and Machine Learning, for providing the chance to complete this work in the college. We would most sincerely like to thank the almighty and our friends and family members, without whom this paper would not have been possible.
