IEEE Research Paper
1st Aditya Vikram Singh, 2nd Anmol Dev, 3rd Mrinal Tyagi, 4th Paras Singh, 5th Ms. Tanu Gupta
Abstract—India hosts an estimated 18 million deaf individuals, facing communication challenges that are often underreported and underserved. While global sign language use abounds, efficient models converting audio to sign language are scarce. This paper explores the profound link between language and cognitive understanding, emphasizing the need for effective communication tools. The focus lies on the Indian Sign Language (ISL) Generator, bridging communication gaps and fostering inclusivity in education, healthcare, employment, and social interactions. Methodologically, the project employs tokenization, lemmatization, POS tagging, and ISL grammar conversion, culminating in visually represented output that renders individual words through video representation. Results demonstrate successful communication gap reduction, empowerment of deaf individuals, and broader inclusivity. The ISL Generator represents a significant stride in breaking down communication barriers, with future directions including database expansion and enhanced adaptability. This project highlights language's transformative role, promoting inclusivity for individuals with hearing impairments.

Index Terms—Sign Language Generator, NLTK

I. INTRODUCTION

The 2011 census reveals a stark reality: an estimated 18 million individuals in India grapple with deafness, a challenge exacerbated in a nation where disabilities, particularly deafness, are systematically underreported and underserved. This demographic revelation serves as the poignant backdrop against which the profound significance of sign language emerges. Globally recognized as a vital tool to bridge the communication gap for the deaf or hard of hearing, sign language, despite its ubiquity, faces a critical shortfall in efficient models capable of translating audio inputs into its expressive visual form.

Within the intricate tapestry of linguistic diversity, sign language stands as more than a mode of communication—it is the foundational means of connection for individuals within the deaf community. Analogous to how spoken language functions as the mother tongue for those engaged in vocal communication, sign language becomes a source of solace and fluency for deaf individuals. It transcends mere communication; it becomes a conduit for the exploration and mastery of underlying principles, echoing the universal truth that the language in which we learn profoundly shapes our cognitive journey.

The intrinsic connection between an individual and their mother language extends far beyond the surface of communication; it permeates the very essence of understanding and processing information efficiently. Learning and assimilating knowledge within the framework of one's mother language facilitates a seamless comprehension of intricate concepts. In the context of sign language for the deaf, this linguistic medium serves not only as a mode of communication but as a profound pathway for the exploration of underlying principles.

The narrative pivots to the primary focus of this transformative project—an earnest endeavor to overcome a significant communication barrier between deaf individuals proficient in Indian Sign Language (ISL) and those unacquainted with this expressive form of communication. At its core, the project is a bridge-building initiative, aiming to furnish a medium that enhances understanding and interaction between these two linguistic worlds. However, this initiative is not confined to mere bridging; it extends to empowerment.

Enter the ISL Generator—a potent tool that goes beyond bridging the communication gap. It serves as an empowerment engine, enabling deaf individuals not only to express themselves effectively but also to access written content and actively engage in societal discourse. The aspirations of the project stretch further, reaching into various domains of life—education, healthcare, employment, and social interactions. The ISL Generator, through effective communication, becomes a catalyst for inclusivity.

In essence, the project's culmination is envisioned as the development of a comprehensive ISL generator. Proficient in receiving input in various forms—text or audio—and producing accurate ISL signs, this innovative tool emerges as a beacon in the dismantling of communication barriers. The narrative, fueled by an acknowledgment of the challenges faced by the deaf in India, takes a transformative turn, unveiling the power embedded in sign language and culminating in the introduction of the ISL Generator—a beacon of hope in the quest for a more inclusive and accessible world for individuals with hearing impairments.

II. LITERATURE REVIEW

D. Bharat et al. in their paper present a project that focuses on developing a communication system for deaf people by converting audio messages into Indian Sign Language (ISL) using a microphone and a camera. The system takes audio as input, converts it into text, and displays relevant ISL images or GIFs, making communication between deaf and hearing people easier. [1]

Sharma et al. in their paper present a system that translates audio and text inputs into Indian Sign Language (ISL) using natural language processing techniques such as tokenization, parsing, lemmatization, and part-of-speech tagging. It aims to bridge the communication gap between hearing-impaired individuals and society by providing a means for them to express their thoughts and ideas effectively. The system matches the input with videos in a database created by the authors, enabling more effective and inclusive communication. [5]
H. Mishra et al. present a web application that uses natural language processing (NLP) technologies to translate spoken or written language into Indian Sign Language (ISL), aiming to bridge the communication gap between deaf and hearing individuals. Users can provide input as speech through a microphone or as text, and the system converts it into ISL by matching the words with corresponding videos in the database. This application simplifies communication for deaf and speech-impaired individuals, making it more practical and effective. [6]
A. Maurya et al. present an application that uses AI to bridge the communication gap for deaf individuals who have difficulty understanding the thoughts of others based on their movement or motion. The application incorporates predefined expressions of the English alphabet to help convey messages effectively, aiming to assist deaf individuals in understanding and communicating with others. [7]

Fig. 1. English Alphabets in ISL
The system proposed in [2] generates corresponding ISL sign movements based on the grammar rules of ISL. It also utilizes neural network techniques and the hidden Markov model. However, the system currently supports only English-to-ISL translation and requires an exact match with the database for generating output.

P. Shreelekha et al. in their paper present a comparative analysis of technologies used in small, medium, and large vocabulary speech recognition systems, focusing on the role of language models in improving accuracy. The experiment conducted shows a prominent result for randomly chosen sentences compared to sequential sets, highlighting the potential of the system in converting audio signals to text for sign language translation. [3]

R. Tiwari et al. propose a real-time system that converts voice input into text using speech recognition techniques and then displays the text as a series of images or motioned video representing Indian Sign Language (ISL) on the screen. The system utilizes Pyaudio, SPHINX, and the Google speech recognition API for voice recognition, and various Python libraries for image processing and display. The paper builds upon previous research on converting sign language to text and vice versa, utilizing techniques such as gesture recognition, KNN classification, and neural networks. It also explores the importance and applications of speech recognition in various domains. The system is designed as a desktop application with a low error rate in speech recognition and has been thoroughly tested and validated. [4]

D. Tejaswi et al. present a communication system for deaf people in their paper that converts audio messages into text and displays relevant Indian Sign Language images or GIFs, aiming to make communication between deaf and hearing individuals easier. This system addresses the difficulties faced by deaf individuals in communication, allowing them to understand audio messages through text and sign language images. It bridges the communication gap between deaf and hearing individuals.

V. Jyoti et al. in their paper aim to recognize hand gestures and convert them into readable text and audio speech using machine learning algorithms. The system also allows written text to be converted into hand gestures. Sign language recognition and translation enable real-time learning of spatial representations, language models, and mapping between sign and spoken language. [8]

In their paper, V. Ravi Kumar et al. propose a system that converts spoken language into Indian Sign Language (ISL) using a combination of machine learning techniques and computer vision. The system consists of three main components: speech recognition, translation, and sign language generation. The speech recognition component uses a deep learning model to convert spoken language into text. The translation component maps the text to ISL glosses, which are linguistic representations of signs. Finally, the sign language generation component uses computer vision techniques to generate animated sign language videos based on the ISL glosses. The system achieves promising results in terms of accuracy and real-time performance, making it a potential tool for facilitating communication between hearing and deaf individuals. The output is shown to the user through two methods: video from the ISL dictionary and synthetic animation. [9]

In their study, H. Monga et al. focus on the development of a system to facilitate communication for individuals with hearing impairment, particularly those who want to learn sign language or reduce communication gaps. It highlights the lack of research on Indian Sign Language (ISL) and the prevalence of learning American Sign Language (ASL) in India. The system aims to convert spoken English into ISL gloss using a rule-based technique and a phrase-based algorithm to improve efficiency and reduce time. The paper also mentions the challenges of rendering and portability from ASL to ISL. [10]
III. METHODOLOGY

In the proposed system, user input in English is received through text or audio modes. The methodology unfolds through a sequence of key processes, beginning with tokenization and extending to the conversion of English sentences into Indian Sign Language (ISL) grammar for visual representation.
A. Audio Input

Incorporating a fusion of computer science and linguistics, speech recognition is a multifaceted discipline aimed at discerning spoken words and transforming them into textual form, enabling computers to comprehend human language. The integration of speech recognition into Python facilitates the conversion of spoken words into text, fostering diverse applications. The procedural steps involve:

1) Input Handling: Utilizing Pyaudio, which interfaces with the cross-platform audio I/O library, PortAudio. This empowers the system to play and record audio across Windows, Linux, and Mac platforms. Pyaudio offers fine-grained control over input and output devices, allowing parameter adjustments and performance monitoring, such as CPU load and latency.
2) Recording: Pyaudio facilitates the audio recording process by writing to the designated stream, ensuring efficient capture and handling of audio data.
3) Data Storage: The Wave module, dependent on NumPy, serves as a tool for reading WAV files as NumPy arrays and saving NumPy arrays as WAV files, contributing to effective data storage.
4) Speech-to-Text Conversion: Leveraging the AssemblyAI API, specifically tailored for speech-to-text transcription and other AI-driven functionalities. The AssemblyAI Python SDK streamlines API integration within Python code, offering a user-friendly interface for tasks such as uploading audio files, transcription, and result retrieval.

The culmination of this methodology results in the generation of a .txt file through AssemblyAI, translating user-input audio into English text seamlessly.
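As a concrete illustration of steps 1) to 4), the following minimal Python sketch records a short clip with Pyaudio, saves it as a WAV file, and sends it to AssemblyAI for transcription. The recording parameters (16 kHz, mono, five seconds), the file names, and the API-key placeholder are illustrative assumptions rather than the project's actual configuration, and storage here uses the standard-library wave module; the NumPy-based WAV handling mentioned above would be a drop-in alternative.

import wave
import pyaudio
import assemblyai as aai

CHUNK, RATE, SECONDS = 1024, 16000, 5            # assumed capture settings

# 1) Input handling and 2) recording with Pyaudio/PortAudio.
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

# 3) Data storage: write the captured frames to a WAV file.
with wave.open("input.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# 4) Speech-to-text conversion through the AssemblyAI Python SDK.
aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"     # placeholder key
transcript = aai.Transcriber().transcribe("input.wav")

# Persist the recognized English text for the later NLP stages.
with open("transcript.txt", "w") as f:
    f.write(transcript.text or "")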
B. Tokenization

First, we take the input text and tokenize the sentence. Tokenization, in the realm of Natural Language Processing (NLP), is the process of splitting a sequence of text into smaller parts, known as tokens, which can be as small as characters and as long as words. It helps machines understand human language by analyzing it in bite-sized pieces, making it easier to identify patterns and derive meaning from the text. It is similar to dissecting a sentence to understand its structure and meaning, allowing NLP practitioners to study the individual components of text and their relationships. Tokenization is crucial in NLP and machine learning as it enables algorithms to analyze and respond to human input effectively. The tokenizer is well trained on standard English text, so it works very well on standard English and converts the given sentence into a list of its individual words as strings. There are many ways to tokenize input text; here we tokenize sentences into words. Types of tokenization:

1) Word tokenization: This technique divides text into discrete words and works well with languages like English that have distinct word boundaries.
2) Character tokenization: By dissecting text into individual characters, this technique enables more in-depth text analysis.
3) Subword tokenization: This technique breaks text down into smaller units such as syllables or morphemes, which is helpful for languages that combine smaller units to form meaning or when working with words that are not in the dictionary.
4) Sentence tokenization: Sentence tokenization is the process of breaking a text into individual sentences, allowing for more granular analysis and facilitating natural language processing tasks such as sentiment analysis and language modeling. This segmentation enhances the understanding of textual content at the sentence level, aiding in various language-related applications.

Fig. 2. Example of Tokenization
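Since the paper's index terms point to NLTK, a minimal word- and sentence-tokenization sketch with that library is shown below; the sample sentence is illustrative and not drawn from the project's dataset.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)   # one-time download of the tokenizer models

text = "My name is Ravi. I am going to the market."

# Sentence tokenization: split the text into individual sentences.
print(sent_tokenize(text))
# ['My name is Ravi.', 'I am going to the market.']

# Word tokenization: split the text into individual word tokens.
print(word_tokenize(text))
# ['My', 'name', 'is', 'Ravi', '.', 'I', 'am', 'going', 'to', 'the', 'market', '.']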
C. Lemmatization

Next, we lemmatize the words in the list. In the realm of Natural Language Processing (NLP), lemmatization is a text pre-processing technique used in NLP and machine learning to reduce a word to its root form, called a lemma, by considering the word's meaning in the language it belongs to. Unlike stemming, which simply chops off part of the word at the end, lemmatization algorithms have knowledge of the word's meaning and aim to identify similarities between words. Lemmatization is commonly used in NLP and machine learning tasks to improve text analysis and understanding. The lemmatization algorithm, for instance, reduces the word "best" to its lemma, "good".

Fig. 3. Example of Lemmatization
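A minimal sketch of this step using NLTK's WordNet lemmatizer follows; whether the project uses this particular lemmatizer, and the example words chosen here, are assumptions. In practice the POS tags produced in the next step can be mapped to WordNet's categories so that each word is lemmatized with the correct part of speech.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # WordNet data used by the lemmatizer

lemmatizer = WordNetLemmatizer()

# The lemma returned depends on the part of speech passed in.
print(lemmatizer.lemmatize("feet"))              # foot  (default POS is noun)
print(lemmatizer.lemmatize("going", pos="v"))    # go
print(lemmatizer.lemmatize("better", pos="a"))   # good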
D. POS Tag

In the linguistic realm, Part of Speech (POS) tagging, also called grammatical tagging, is the process of marking each word in a sentence with its corresponding part of speech, based on both its definition and its context in the sentence. Common parts of speech are nouns, verbs, adjectives, and many more. POS tags are written in short form; for example, a noun receives the tag NN.
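The sketch below tags a tokenized sentence with NLTK's default Penn Treebank tagger; the example sentence and the expected tags shown in the comment are illustrative.

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("She plays the violin beautifully")
print(nltk.pos_tag(tokens))
# [('She', 'PRP'), ('plays', 'VBZ'), ('the', 'DT'),
#  ('violin', 'NN'), ('beautifully', 'RB')]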
E. Parsing and ISL Grammar Conversion

Parsing is the process of comprehending and analyzing human language. It involves breaking down a sentence into smaller components and identifying its grammatical structure. The primary goal of parsing is to understand the grammatical rules followed in a given input, enabling the analysis of its underlying structure and the extraction of meaning.

In this project, the initial step involves analyzing the grammar of a provided English sentence. This sentence has already undergone tokenization (breaking it into words), lemmatization (reducing words to their base or root form), and POS tagging (assigning parts of speech to each word).

Next, the input English is converted into ISL grammar by rearranging the words in the order in which they appear in the ISL visual form. To achieve this, the English sentence's grammar is analyzed, with the POS tags providing insights into its structure, and the words are then rearranged according to the visual structure of ISL. An example of all these procedures can be seen in the following image.
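To make the rearrangement step concrete, the sketch below applies two rules that are commonly used when approximating ISL grammar from English: dropping articles and linking verbs, and moving verbs after their objects to mimic ISL's subject-object-verb order. These particular rules and the uppercase gloss convention are illustrative assumptions, not the authors' exact rule set; in the full system each resulting gloss would be matched to a sign video in the database.

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

DROP_TAGS = {"DT", "TO"}                      # articles/determiners and "to"
DROP_WORDS = {"is", "am", "are", "was", "were", "be"}

def english_to_isl_gloss(sentence):
    """Return an ISL-style gloss list for an English sentence (illustrative rules)."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    kept = [(w, t) for w, t in tagged
            if w.isalpha() and t not in DROP_TAGS and w.lower() not in DROP_WORDS]
    verbs = [w for w, t in kept if t.startswith("VB")]
    others = [w for w, t in kept if not t.startswith("VB")]
    return [w.upper() for w in others + verbs]  # verbs moved to the end (SOV-like order)

print(english_to_isl_gloss("I am going to the market"))
# Expected: ['I', 'MARKET', 'GOING']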