Sign Language Recog Paper-1
1* Department of Computer Science, NFC-IET Multan
1 Introduction
Language is undeniably the most essential social and cultural element created by mankind throughout history. It allows people
to communicate either orally (through spoken discussion) or nonverbally (by gestures, glances, body language, and so on) [1].
It has been suggested that language reflects the identity of each nation; studying this relationship further may help us understand and address situations involving language and the attitudes or behavior of groups of speakers. Language can be thought of as a system made up of fixed, patterned components. It is traditionally defined as a tool for interacting or communicating, which in turn involves conveying concepts, beliefs, theories, or even feelings [2]. Humans need to communicate to exchange information and express their emotions. One of human language's essential characteristics is that all spoken languages combine meaningless elements to derive meaningful symbols. Language is used not only for everyday communication; it can also be an important asset for an individual. Speech refers to the method used for transmission, which can consist of a combination of vocalizations, vocabulary, syntax, and context. It serves as a means of exchanging ideas, thoughts, and feelings between
individuals, communities, and cultures. Language is often shaped by social, historical, and geographic factors, resulting in a
range of dialects and variations. It is an essential tool for conveying information and establishing connections, whether it is
within a specific community, country, or profession [3]. According to the World Health Organization, disabling hearing loss affects approximately 5% of the global population and requires some form of rehabilitation [4]. More than 700 million people are expected to suffer disabling hearing loss by 2050. Hearing loss is commonly graded as mild, moderate, severe, or profound. People classified as "hard of hearing" experience mild to severe hearing loss, while those who are "deaf" experience a profound loss of hearing. Hearing loss is a prevalent chronic condition that often results from sensory cell degeneration associated with aging; however, it is treatable with conventional hearing aids and communication gadgets
[5]. Persons with normal hearing can listen and converse, but deaf people cannot hear and, if born deaf, cannot even speak [6].
Although many deaf people, particularly those who developed their hearing disability after infancy, can converse effectively
and may use skills such as lip-reading when conversing with hearing people, such communication approaches are frequently
insufficient for communication within the Deaf community. Deaf and hearing-impaired youngsters endure even more difficult
language comprehension issues than their hearing counterparts. People who are deaf have severely limited social connections.
To participate in social settings, Deaf individuals often rely on means such as body language interpretation or sign language to
communicate. For deaf children, it is crucial to develop early literacy abilities and become familiar with written language.
Without these competencies, they may experience limited social involvement with peers and family, negative perceptions from
their family members, immaturity, dependence on others, poor language proficiency, and a range of cultural, behavioral, and
communication challenges [7]. As a result, the use of body movements and gestures has become the primary mode of communication in these communities [8]; this is referred to as nonverbal communication.
Nonverbal communication is the act of expressing or conveying messages without using words. It plays an important role
in human connection, and various nonverbal communication techniques are used, including hand signals and gestures, facial
expressions, posture, eye contact, and more. Hand signals and gestures are nonverbal communication methods that involve
hand movements to convey a message. They have become an integral part of language, enabling individuals to communicate
their emotions effectively and enhance communication among themselves. When people from various cultures communicate,
they employ hand signals and gestures. Even those who are born blind and have never seen somebody gesture use their hands
when speaking. Signing or signaling is used by deaf individuals to communicate. So, if a deaf person wishes to communicate,
he or she can use his or her hands/gestures to produce sign language. Each symbol is associated with a distinct word, phrase, or
expression. As a result, it highlights the effectiveness of nonverbal communication methods in conveying messages [9]. Sign
Language (SL) serves as a communication bridge between individuals who are deaf and those who can hear, utilizing gestures
to convey messages [10]. Expressive language involves conveying feelings through body gestures, particularly using the arms and hands. This method is employed when verbal communication is difficult, and it aids interaction with deaf and hard-of-hearing (HOH) individuals [8]. SL is a visual language used by many Deaf individuals to communicate. Sign language relies on physical movements, facial expressions, and certain body positions during human-to-human conversation, as well as on social media and television. Many hearing-impaired people, as well as those with other speech impairments, use Sign Language as their first language [11]. Physical gestures such as hand and body movements, together with facial expressions, are combined in sign language to form expressions that communicate specific ideas or concepts [7]. It is essential for the Deaf community to understand and grasp the association between words and phrases, so that we can move toward a time when automated translation between individual signs and full sentences becomes possible [12]. This research aims to explore the potential of nonverbal communication in enhancing social interaction between humans and robots, investigating how nonverbal cues can improve the connection between social robots and people [9].
The study of social robotics falls under the umbrella of human-robot interaction (HRI) and has garnered significant interest
in recent times due to the widespread integration of robots in daily activities. Interactive robots are being developed for
practical applications including rehabilitation, eldercare, and educational purposes. Consequently, there has been a surge in
research in this area, resulting in the emergence of diverse subtopics within the HRI research domain [13]. Human-robot interaction research has traditionally emphasized the physical aspects of the interaction between robots and humans, primarily in industrial settings with collaborative robots (cobots) and robotic industrial arms, and HRI research has since branched into distinct directions. Social robotics, in particular, has experienced noteworthy progress, presenting new avenues for research and innovation in creating robots that can engage with humans in a more natural and social manner [14]. Robot designers are therefore always looking for new methods to improve HRI and its adoption in the real world. Human-robot interaction can
take place either verbally or nonverbally. The advancements made in Natural Language Processing (NLP) and speech
recognition technology have led to significant improvements in the accuracy and effectiveness of the verbal aspects of human-
robot interaction (HRI) [15], but nonverbal communication, although an essential component of human relationships, is only lightly implemented in real-world social robots. Nonverbal cues such as hand signals, postures, and facial expressions provide extra information and meaning beyond an individual's verbal communication [9]. The scientific community has long recognized the need for social robots to aid hearing-impaired persons with communication and social inclusion. Deep learning has emerged as a powerful tool in many fields, such as cancer detection [16], brain disorders [17], COVID-19 detection [18], pattern recognition [19], underwater species detection, and sign detection [20]. Recognizing sign language gestures using deep learning
methods can be a challenging task, primarily due to the diverse range of sign languages and a scarcity of large, annotated
datasets available for training the models. However, recent advancements in machine learning and artificial intelligence have
facilitated the automation and enhancement of such technologies. Sign language recognition (SLR) involves converting a user's hand signals
and movements into textual form, which helps bridge the communication gap between those who cannot communicate through
spoken language and the wider community. This is accomplished through classification procedures and neural networks,
trained to interpret gestures, and convert them into easily understood text. The raw image and video data is converted into
readable text, making it possible for individuals to communicate effectively without the need for verbal communication [21].
SLR research aims to develop advanced machine learning algorithms capable of accurately categorizing
human articulations into individual signs or continuous sentences. However, the accuracy and generalization capabilities of
SLR algorithms are currently limited by the absence of large, annotated datasets, as well as the challenges posed by
recognizing sign boundaries in continuous SLR scenarios. Addressing these limitations is crucial for the development of robust
and accurate SLR systems that can effectively overcome the communication barrier between those who use sign language and those who do not [12]. SLR is the scientific field concerned with capturing and translating sign language using computer vision and artificial intelligence algorithms [5]. Most research efforts on sign language processing concentrate on recognition, with little attention paid to detection in its true sense. Deep neural network (DNN)-based models have been demonstrated to be a successful approach for sign language detection in a variety of tasks. Ongoing research is focused on enhancing the precision and speed of these models for improved effectiveness [22]. The fact that most sign language datasets are created for sign recognition and gathered under controlled conditions adds to the lack of suitable publicly accessible data for sign language detection. As a result, researchers have turned to samples drawn from multiple combined Kaggle datasets. This approach allows for the use of real-world data, but it also presents challenges such as variability in
lighting, background, and signing speed, which can affect the accuracy of sign language identification models. Therefore,
efforts are being made to develop more comprehensive and diverse datasets for sign language detection and identification.
Continued research endeavors to enhance the precision and effectiveness of these systems, enabling their application in a wider
range of sign language recognition tasks.
The primary problem addressed in this study is identifying the best possible way to detect sign language with the highest possible accuracy. The study also focuses on the response a hearing-impaired person receives from a robot that utilizes NLP, and on assessing the correctness of that response.
The overall research plan comprises the following steps:
- Collect and preprocess the ASL dataset: The researchers will gather and preprocess a large collection of sign language images to meet the first study goal. Deep learning algorithms will be trained and evaluated using this dataset.
- Implement and compare deep learning models: To determine the most effective deep learning model for sign language detection, the researchers will implement and compare three different models: MLP, CNN, and ResNet50V2. The models will be trained on the ASL dataset, and their performance will be compared based on accuracy and other evaluation metrics.
- Develop an NLP model: To generate responses in sign language communication, the authors will develop an NLP model using BERT. This model will be optimized to generate frequent and accurate responses.
- Integrate the deep learning and NLP models: Once the deep learning and NLP models are developed, the researchers will integrate them to create a sign language communication system that can recognize sign language and generate automated responses.
- Assess the system's effectiveness: To assess the impact of incorporating NLP in sign language communication systems, the authors will perform experiments to test the precision and efficacy of the suggested system and analyze the impact of NLP on the system's overall performance.
Overall, this research aims to develop an innovative sign language communication method that can help people who are hard of hearing communicate more effectively. Using deep learning and NLP, the researchers hope to achieve higher accuracy and
efficiency in sign language recognition and response generation.
2 Literature
In conducting the literature review, a systematic approach was employed to facilitate an unbiased evaluation of each paper's substance, adhering to established best practices in the field. The procedure encompassed several tasks: identifying relevant variables, delineating the authors' strategic approaches, scrutinizing the methodologies employed to obtain the results, and compiling a comprehensive report. A pre-set list of factors was used to choose which scientific papers to include in the acquisition process [23]. In this
section, a bibliometric analysis is presented to examine the use of autonomous machines in sign language identification during the previous twenty years. This analysis applies bibliometric methods to recent publications on the topics of intelligent machines and SLR. Another objective was to identify patterns and trends in these publications,
including the journals in which they are published, the regions where the research is conducted, and collaborations between
different institutions and organizations. The interest in automatic sign language recognition has increased over the years, and
therefore, investigating the primary focus of research and existing patterns in this field would be significant [24].
In this research, we performed an analysis of scholarly publications related to the use of intelligent machines in sign language
recognition from 2018 onwards, as shown in Figure 1. The primary objective was to systematize the existing research and
identify patterns and publication trends based on journals, regions, and institutional collaborations. After this initial phase, we
employed a forward/backward technique to thoroughly analyze the collected materials. By adopting this approach, we were
able to comprehend each paper in detail and track crucial research areas, preventing the omission of any fundamental studies.
By employing this method, we were able to create a representative collection of SLR papers that accurately reflected the most
fruitful research directions in the field [23]. In the paper [25], the authors propose a hybrid approach for recognizing Bangla
Sign Language using a deep transfer learning model combined with a random forest classifier. The recognition rates achieved
are 72%-85% for character recognition and 95%-96% for digit recognition. However, the study acknowledges that there is
room for improving the accuracy of the model and highlights the limitation of limited dataset availability. In their research, the
authors introduce "SignExplainer" [26], a framework that combines explainable AI techniques with ensemble learning for sign language recognition. The framework achieves recognition rates of 98% and 92.60%. However, one
limitation of the study is the limited utilization of Explainable AI (XAI) methodology, which could have provided deeper
insights into the model's decision-making process. In their research, the authors propose a deep learning model called
"SIGNFORMER" [27], which employs a deep vision transformer architecture for sign language recognition. However, the reported recognition rate is a relatively low 0.9%. The study also identifies two main limitations: the absence of preprocessing techniques
for input data and the limited availability of a comprehensive dataset. These limitations may impact the model's performance
and generalization capabilities. The paper presents a novel approach for unsupervised speech-to-sign language recognition
called "Speak, Decipher and Sign" [28]. The proposed method combines a hybrid architecture consisting of MLP and LSTM deep learning algorithms and achieves a recognition rate of at least 0.6%. However, the study acknowledges a limitation
regarding the dataset, which is relatively small. This limitation may impact the model's performance and its ability to
generalize to a wider range of sign language data.
NLP
NLP is a subfield of AI that focuses on enabling machines to understand natural human language [49]. Its applications
are vast, including sentiment analysis, text analysis, text summarization, and speech recognition. There are numerous strategies
and techniques utilized in NLP [50], with researchers focusing on specific techniques in their experiments as described in
various research papers.
Children with special needs have diverse learning styles and require different kinds of educational support; they often cannot be taught in the traditional manner. Intelligent tutoring systems that incorporate AI, ML, and DL may motivate and reinforce these children in an educational and learning setting. One study compares the effectiveness of three distinct approaches for recognizing the Sign Language MNIST dataset: the Convolutional Neural Network (CNN), the Support Vector Machine (SVM), and the decision tree. The results demonstrate that the CNN delivers up to 86% accuracy, while the SVM achieves an accuracy of 84%, the best among the classical machine learning algorithms [51]. The purpose of the research in [52] is to perform a scoping review of the use of AI in instant translation applications from spoken language to sign language. The goal is to provide an AI-based technique to translate South African official languages from speech to text to sign language, allowing interactive communication between those who can hear and those who cannot. The study revealed a lack of information on the use and adoption of machine learning and related methods as potential solutions for the deaf and hard-of-hearing (HOH) population, specifically in Africa.
The studies below, drawn from this body of work, are summarized by year, interaction mechanism, techniques used, reported accuracy, and limitations:

Study | Year | Mechanism | Techniques | Accuracy | Limitations
Deep learning based assistive technology on audio visual speech recognition for hearing impaired [53] | 2022 | n/a | n/a | n/a | n/a
EasyTalk: A Translator for Sri Lankan Sign Language using Machine Learning and Artificial Intelligence [54] | 2020 | 2-way process (sign-to-text/graphics + vice versa) | Fastest RNN, CNN, NLP & semantic analysis | 80% | System stops halfway; invalid dataset
2-way Arabic Sign Language Translator using CNNLSTM Architecture and NLP [55] | 2020 | 2-way process (text-to-sign + vice versa) | Hybrid model: CNNLSTM | 88.67% | Only capable of translating single dynamic expressions
Literation Hearing Impairment (I-Chat Bot): Natural Language Processing (NLP) and Naïve Bayes Method [56] | 2019 | Conversation mechanism (chatbot) | Naïve Bayes | 88.75% | Limited dataset
Sign Language Recognition System Using Deep Neural Network [57] | 2019 | Vision-based system | Adam, SGD, CNN | 99.12% | Complicated to use
Intelligent Mobile Assistant for Hearing Impairers to Interact with the Society in Sinhala Language [58] | 2018 | Instant messaging app (text-to-SL + vice versa) | 2D GIF, semantic analysis, TTS engine, ML, NLP (app) | >90% | The file format does not work
A Wearable System for Recognizing American Sign Language in Real-Time Using IMU and Surface EMG Sensors [59] | 2016 | Wearable system using sensors | Sensors (IMU, sEMG) | 96.16% | Talking and hand-held sensors would not offer the same precision level; limited dataset (80)
3 Methodology
3.1 Dataset
A dataset is a collection of data organized into a specific format for analysis; it is the information used to train machine learning models and can take many forms, including text, images, and audio. Understanding the dataset is essential to ensuring that the resulting model is accurate and efficient. The dataset must match the task being solved, whether that is recognizing handwritten digits or detecting objects in images, so it is crucial to understand both the data and the problem before building any model. Our dataset consists of a set of images that serve as training data for the machine learning models. Each image is associated with a label identifying the object or entity it represents, and these labels are used to train the models to recognize similar objects in new images. The size and diversity of the dataset are crucial for effective training; however, a large dataset also introduces challenges such as increased computational requirements and the need for efficient data storage and retrieval. Before it can be used for machine learning, the dataset must undergo preprocessing to clean and transform the data into a format the models can readily consume, which may include resizing images, removing noise, and converting the data to a numerical representation. A well-prepared, carefully preprocessed dataset can make a significant difference in the accuracy and efficiency of a machine learning model. The database used in this study comprises a well-organized and labeled collection of image data
containing American Sign Language alphabets [60]. The data is sorted into 29 separate folders, each representing a different
class. The training dataset contains 87,000 images, each measuring 200x200 pixels. Twenty-six folders represent the letters A-Z, while the remaining three folders contain images for SPACE, DELETE, and NOTHING; these three classes are of particular value for real-time applications and classification. Additionally, a test set of 29 images is provided to facilitate evaluation on real-world test images. The dataset was created by Akash Nagaraj and is publicly available. The project's goal was to generate a dataset of real-world test images, so personal photographs were taken, with the hand gestures signed against a black background. No special equipment such as gloves or a particular marking system is required, and all necessary photographs can be captured with a single basic camera. The suggested system's goal is to communicate with deaf individuals, in addition to the computers and robots employed in previous studies.
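As an illustration of the preprocessing described above, the sketch below shows one way the 29 class folders could be loaded with Keras image generators; the directory name, batch size, and 80/20 validation split are assumptions for illustration rather than details reported in the paper.

```python
# Hedged sketch: load and preprocess the ASL Alphabet images with Keras.
# The directory path, batch size, and validation split are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

DATA_DIR = "asl_alphabet_train"   # hypothetical folder holding the 29 class subfolders
IMG_SIZE = (200, 200)             # images in the dataset are 200x200 pixels
BATCH_SIZE = 32

# Rescale pixel values to [0, 1] and reserve 20% of the images for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = datagen.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="training")      # 29 one-hot classes

val_gen = datagen.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="categorical", subset="validation")
```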
Sign language is a form of nonverbal communication that employs hand gestures, facial expressions, and body language to
express ideas and convey meaning. It is widely used by deaf and hard-of-hearing individuals across the world to interact with
others. Recognizing sign language poses a significant challenge to researchers because of its complexity and variation. However,
recent advances in deep learning methods [20], such as recurrent neural networks (RNN), long short-term memory (LSTM),
artificial neural networks (ANN), and spiking neural networks (SNN), have shown considerable potential in identifying and
interpreting sign language. Deep learning is a subset of machine learning that involves the training of artificial neural networks
using huge amounts of data to perform complex tasks, such as speech recognition, natural language processing, and image
recognition. Deep learning models can learn patterns from the data and use them to make accurate predictions [57].
SLR is a crucial field of study that seeks to improve communication between deaf and hearing persons. Multi-Layer Perceptron (MLP) models [61], a kind of neural network that can recognize patterns in data, are one approach to this challenge. An MLP with several layers of neurons and different activation functions was implemented for sign language recognition. The categorical cross-entropy loss function and the Adam optimizer are used to train the model, and accuracy metrics are used to assess its performance. MLP models can recognize sign language gestures with a reasonable degree of accuracy when trained on a sufficiently large collection of sign language images and when the model parameters are carefully tuned. The implemented model is a sequential network with 10 layers: 3 dense layers, each with 200 units and the ReLU activation function; 2 dropout layers, each with a 10% dropout rate; and 4 further dense layers with progressively fewer units and a different activation function. The final output layer produces probabilities for 29 classes using the softmax activation function. The model uses the categorical cross-entropy loss function and the Adam optimizer, and is trained with both training and validation generators using the fit_generator() method. A learning rate scheduler callback controls the learning rate during training, and the number of epochs is set to 30. Finally, the plotCurves() function is used to visualize the training and validation curves.
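A minimal Keras sketch of the MLP just described is given below. The three Dense(200) ReLU layers, the two 10% dropout layers, the 29-way softmax output, the loss, the optimizer, and the 30 epochs follow the description; the unit counts and activation of the four smaller dense layers are assumptions, and model.fit is used in place of the older fit_generator call. Here, train_gen and val_gen refer to the generators sketched in the dataset section.

```python
# Hedged sketch of the described MLP; the sizes of the four smaller layers
# and their activation function are assumptions, not values from the paper.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.callbacks import LearningRateScheduler

mlp = Sequential([
    Flatten(input_shape=(200, 200, 3)),   # flatten raw pixels for the dense layers
    Dense(200, activation="relu"),
    Dropout(0.1),                         # 10% dropout
    Dense(200, activation="relu"),
    Dropout(0.1),
    Dense(200, activation="relu"),
    Dense(128, activation="tanh"),        # four smaller layers (assumed sizes/activation)
    Dense(96, activation="tanh"),
    Dense(64, activation="tanh"),
    Dense(48, activation="tanh"),
    Dense(29, activation="softmax"),      # probabilities over the 29 ASL classes
])

mlp.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Simple learning rate schedule (assumed form): decay the rate after epoch 5.
def lr_schedule(epoch, lr):
    return lr * 0.95 if epoch > 5 else lr

mlp.fit(train_gen, validation_data=val_gen, epochs=30,
        callbacks=[LearningRateScheduler(lr_schedule)])
```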
Convolutional Neural Networks (CNNs) are a powerful type of deep learning model that excel in image recognition tasks,
including sign language recognition. CNNs learn spatial hierarchies of features from raw image data, allowing them to identify
and classify objects within an image [29]. In the context of sign language recognition, CNNs can be trained to identify the hand
gestures and movements that correspond to letters, words, or phrases in American Sign Language (ASL). The gestures can be
represented as image data, either as still images or as sequences of images captured in real-time.
The architecture of a CNN consists of convolutional layers, pooling layers, and fully connected layers [62]. The convolutional
layers use small filters to scan across the input image and identify patterns and features at various scales. The pooling layers
then reduce the spatial dimensionality of the feature maps, extracting the most valuable information while discarding redundant
details. Finally, the fully connected layers use the learned features to classify the image into one of several categories.
To recognize sign language, a CNN can be trained using labeled image data, with each image corresponding to a particular sign
or gesture. The network can then learn to identify the key features of each sign and use these to accurately classify new, unseen
images. This approach offers a flexible and accurate way of recognizing a wide range of gestures and movements, with
potential applications in assistive technology and human-computer interaction. The code initializes a CNN model using Keras'
Sequential API and adds layers to the model in a linear order. The layers consist of convolutional layers with ReLU activation,
MaxPooling2D layers, a Flatten layer, and Dense layers with ReLU and softmax activation. The model is compiled with the categorical cross-entropy loss function and the Adam optimizer. The CNN is designed to recognize hand gestures with high accuracy by extracting features from input images and using them to make predictions.
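A sketch of such a CNN in Keras is shown below, with three convolution and max-pooling blocks, a Flatten layer, a 512-unit fully connected layer, and a 29-way softmax output; the filter counts and kernel sizes are assumptions.

```python
# Hedged sketch of the CNN: three Conv+MaxPool blocks, Flatten, Dense(512), softmax.
# Filter counts and kernel sizes are assumptions, not values from the paper.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(200, 200, 3)),
    MaxPooling2D((2, 2)),                 # shrink spatial size after each block
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation="relu"),        # fully connected layer with 512 neurons
    Dense(29, activation="softmax"),      # one output per ASL class
])

cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```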
Deep learning algorithms have shown considerable potential in the field of SLR, which is a challenging computer vision task. ResNet50V2, a pre-trained deep learning model that can be fine-tuned for sign language detection, is one such model. ResNet50V2 is used as the base model in this work, with additional layers placed on top to adapt it to sign recognition. A Flatten layer turns the base model's output into a one-dimensional vector that can be fed into the dense layers. A Dense layer with 512 units and the ReLU activation function is used to learn more complex representations of the extracted features, and a dropout layer with a rate of 0.5 is placed after it to avoid overfitting. Finally, a prediction layer with a softmax activation function generates the output over the 29 possible classes of American Sign Language hand gestures and movements.
In general, this type of architecture [63] is intended to learn spatial feature hierarchies from raw image data and reliably classify hand gestures and movements. With proper training and optimization, this model can attain high accuracy, enabling innovative applications in areas such as assistive devices and human-computer interaction.
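The sketch below reconstructs this transfer-learning head on top of ResNet50V2; loading ImageNet weights and freezing the backbone are assumptions, while the Flatten layer, the 512-unit ReLU Dense layer, the 0.5 dropout, and the 29-way softmax follow the description above.

```python
# Hedged sketch of the ResNet50V2-based model; ImageNet weights and a frozen
# backbone are assumptions, the custom head follows the text.
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras import Model
from tensorflow.keras.layers import Flatten, Dense, Dropout

base = ResNet50V2(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
base.trainable = False                         # train only the new head (assumption)

x = Flatten()(base.output)                     # flatten backbone features into a vector
x = Dense(512, activation="relu")(x)           # learn richer task-specific features
x = Dropout(0.5)(x)                            # regularize to reduce overfitting
outputs = Dense(29, activation="softmax")(x)   # 29 ASL gesture classes

resnet_model = Model(inputs=base.input, outputs=outputs)
resnet_model.compile(optimizer="adam", loss="categorical_crossentropy",
                     metrics=["accuracy"])
```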
4 Results
The key issue investigated in this research is the precise interpretation of gestures made by hearing-impaired individuals and the usefulness of applying NLP to offer responses via a robot. This study employed a variety of deep learning approaches, including the Multi-Layer Perceptron (MLP), the Convolutional Neural Network (CNN), and ResNet50V2, to determine the most accurate method for detecting sign language. The ASL dataset, which contains 87,000 images, was used for the study. This section summarizes the study's findings, including the outcomes of each deep neural network model and the chosen approach for creating an auto-response utilizing NLP.
MLP
When developing the multi-layer perceptron (MLP) system for sign language recognition, the researchers employed a training strategy that progressively increased and then decreased the learning rate as the number of epochs advanced. Despite these efforts, they found that MLP networks were inefficient on image data, and as a result they did not track accuracy over the full training period. Instead, they focused on CNNs, which have been found to outperform MLP networks on image data because of their ability to exploit spatial structure. Finally, the researchers concluded that a CNN was the better option for the sign language detection task, and then used the ResNet50V2 framework, which achieved even higher recognition accuracy.
CNN
The CNN model was the second model explored in this research, and it achieved an accuracy of 0.9599, which rounds to 0.96 (96%). Convolutional Neural Networks (CNNs) were used to construct the model, and the ASL dataset was used for training. The model is composed of three convolutional blocks and a fully connected layer with 512 neurons. It employs the Rectified Linear Unit (ReLU) activation function in the convolutional and fully connected layers and the softmax function for classification in the output layer. To reduce the spatial size of the input, a max-pooling operation is additionally applied after each convolutional block. The CNN model offers an effective approach to recognizing sign language, showing potential improvements in accuracy and efficiency compared to existing systems.
RESNET50V2
Following an exhaustive series of trials with three distinct deep learning models (MLP, CNN, and ResNet50V2), the researchers obtained an accuracy of 0.97 (97%) with the ResNet model. To achieve this, a learning rate schedule function and callbacks, such as early stopping and a custom callback that cancels training when the accuracy reaches 97%, were implemented. Despite attempts to enhance the performance of the other two models by adding layers and modifying the learning rate, the ResNet model demonstrated superior results, affirming its position as the most suitable model for the given task. These results indicate the importance of selecting the appropriate model to attain maximum accuracy in similar deep learning tasks.
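A possible form of the callbacks described above is sketched below; the monitored metrics, the patience value, and the decay factor of the learning rate schedule are assumptions.

```python
# Hedged sketch of the training callbacks: early stopping, a learning rate
# schedule, and a custom callback that stops training at 97% accuracy.
import tensorflow as tf

class StopAtTargetAccuracy(tf.keras.callbacks.Callback):
    """Cancel training once training accuracy reaches the 0.97 target."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get("accuracy", 0.0) >= 0.97:
            print(f"\nReached 97% accuracy at epoch {epoch + 1}; stopping training.")
            self.model.stop_training = True

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
    tf.keras.callbacks.LearningRateScheduler(
        lambda epoch, lr: lr * 0.9 if epoch > 5 else lr),
    StopAtTargetAccuracy(),
]

# resnet_model, train_gen, and val_gen come from the earlier sketches.
# resnet_model.fit(train_gen, validation_data=val_gen, epochs=30, callbacks=callbacks)
```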
To identify the most effective deep learning model for image classification on the ASL dataset, a comprehensive evaluation was conducted. This involved running three different models (MLP, CNN, and ResNet50V2), testing each on the dataset, and calculating their respective accuracies.
Due to its excellent accuracy of 0.97 (97%), the ResNet50V2 model was chosen as the main research model after analyzing the accuracy results obtained from the deep learning algorithms evaluated on the ASL dataset. The algorithm's outstanding performance makes it an ideal candidate for image classification tasks that require high precision. To achieve this level of accuracy, the researchers implemented a learning rate schedule function and callbacks such as early stopping and a custom callback that cancels training when the accuracy reaches 97%. Despite attempts to increase the effectiveness of the other two models by adding layers and adjusting the learning rate, the ResNet model's results surpassed theirs. These findings highlight the significance of selecting the right model to achieve the best results for deep learning tasks. Finally, the evaluation of the deep neural network models on the ASL dataset yielded useful insights, leading to the selection of the ResNet50V2 model, in line with the research's objective of achieving high accuracy in image classification.
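An illustrative comparison step consistent with this evaluation is sketched below; it assumes the model objects (mlp, cnn, resnet_model) and the validation generator from the earlier sketches, and simply evaluates each model and reports its accuracy.

```python
# Hedged sketch: evaluate each trained model on the held-out generator and
# report accuracy; the variable names refer to the earlier sketches.
results = {}
for name, model in [("MLP", mlp), ("CNN", cnn), ("ResNet50V2", resnet_model)]:
    loss, acc = model.evaluate(val_gen, verbose=0)   # returns [loss, accuracy]
    results[name] = acc
    print(f"{name}: accuracy = {acc:.4f}")

best = max(results, key=results.get)
print(f"Best model: {best} ({results[best]:.4f})")
```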
A further goal of this research is to create an auto-response generator using Bidirectional Encoder Representations from Transformers (BERT) to aid deaf and hard-of-hearing users. Existing systems are typically either translation or interpretation machines; synchronous response generation aims to develop a technology that can communicate directly with the user without an intermediary.
BERT is a pre-trained language processing model that can be used for text classification, machine translation, and text generation, among other tasks. The suggested system implements this transformer-based paradigm and generates text answers using the PyTorch framework. The system was evaluated using the BLEU score, which assesses the similarity between the generated text and a reference text. The program begins by importing a pre-trained GPT-2 model and tokenizer, a transformer-based language model designed specifically for text generation. The model is then put into evaluation mode and used to generate text. The BLEU score is then computed for the generated text, and the system's correctness is calculated. According to the results, the suggested system achieves a reliability of 0.8, representing a substantial improvement over existing systems.
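The generation and scoring steps described above could look roughly like the sketch below, using the Hugging Face Transformers library with PyTorch and NLTK's BLEU implementation; the prompt, the decoding settings, and the reference sentence are illustrative placeholders rather than values from the study.

```python
# Hedged sketch: generate a reply with a pretrained GPT-2 model and score it
# against a reference sentence with BLEU. Prompt and reference are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()                                     # inference mode, no gradient updates

prompt = "HELLO HOW ARE YOU"                     # e.g. text decoded from recognized signs
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=30, do_sample=True, top_k=50,
                                pad_token_id=tokenizer.eos_token_id)

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# BLEU between the generated response and a hypothetical reference answer.
reference = "hello i am fine how are you".split()
candidate = response.lower().split()
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)
print(f"Generated: {response}\nBLEU: {bleu:.2f}")
```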
The auto-response generator is an intriguing development with the capacity to significantly improve the quality of life for deaf and HOH people. The proposed technique achieved good precision, confirming the effectiveness of the BERT-based strategy for creating text responses. The system may be further improved by incorporating additional machine learning algorithms and by expanding its response dictionary. In the end, the study demonstrates the potential of employing BERT to create text answers with high accuracy, and the need to analyze the produced output using measures such as the BLEU score to ensure its quality.
5 Conclusion
This study aimed to create a system that would let deaf people communicate with technology. To achieve this goal, the
researchers utilized an ASL dataset containing 87,000 images and implemented three different deep learning models (MLP, CNN, and ResNet50V2) to recognize sign language gestures. ResNet50V2 outperformed the other models, achieving an accuracy of 0.97 (97%). The system created by the researchers not only recognizes sign language but also generates automated responses using the NLP model BERT, which utilizes transformers for text generation. The system's overall accuracy was determined to be 0.8, with the accuracy of the generated responses, measured by the BLEU score, being about 0.83. This approach provides an innovative, technology-based solution aimed at enhancing interaction for deaf people. It opens new avenues for
developing intelligent chatbots that can better understand human language and nonverbal communication, which can be used in
various industries such as healthcare, education, and entertainment.
More research can be done to increase the system's accuracy and speed, and to expand its functionality to recognize a
wider range of sign language gestures. This can be achieved by implementing more advanced deep learning models and
increasing the size of the dataset. Additionally, user feedback and testing can be used to improve the user interface and make it
more user-friendly for individuals with hearing impairments. Overall, this study demonstrates the potential of technology to
provide novel solutions that enhance the lives of deaf people. It is a significant step forward in addressing the
communication barriers faced by hard of hearing individuals and can have a positive impact on their social and professional
lives.
6 References
[1] M. B. Miralles, “Supervisor: Teresa Morell Moll,” no. June, p. 93, 2020, [Online]. Available:
https://rua.ua.es/dspace/bitstream/10045/107795/1/The_orchestration_of_verbal_and_nonverbal_modes_of_co_Bastias_Mirall
es_Marta.pdf
[2] L. C. Moats, “Speech to print language essentials for teachers,” pp. 1–11, 2020, [Online]. Available: http://slubdd.de/katalog?
TN_libero_mab216782845
[3] K. C. F. Kurnianti, “The Study of Verbal and Nonverbal Language in Communication to Create Images in Coca Cola Zero
Advertisement,” pp. 2–15, 2010.
[4] World Health Organization (WHO).
[5] B. Joksimoski et al., “Technological Solutions for Sign Language Recognition: A Scoping Review of Research Trends,
Challenges, and Opportunities,” IEEE Access, vol. 10, no. January, pp. 40979–40998, 2022, doi:
10.1109/ACCESS.2022.3161440.
[6] A. A. Haseeb and A. Ilyas, “Speech Translation into Pakistan Sign Language Speech Translation into Pakistan Sign Language
Speech Translation into Pakistan Sign Language,” 2012.
[7] P. Akach, “The grammar of sign language,” Language Matters, vol. 28, no. 1, pp. 7–35, 1997, doi:
10.1080/10228199708566118.
[8] S. Vamplew, “Recognition of sign language gestures using neural networks,” Neuropsychological Trends, no. 1, 2021, doi:
10.7358/neur-2007-001-vamp.
[9] Mayuresh Amberkar, “Humanoid Robot handling Hand-Signs Recognition,” no. August, 2020.
[10] M. Burton and S. Gilbert, “Evaluation of sign language learning tools: Understanding features for improved collaboration and
communication between a parent and a child,” ProQuest Dissertations and Theses, p. 101, 2013, [Online]. Available:
http://ezproxy.nottingham.ac.uk/login?url=https://search.proquest.com/docview/1415424736?accountid=8018%0Ahttps://
nusearch.nottingham.ac.uk/openurl/44NOTUK/44NOTUK?genre=dissertations+%26+theses&atitle=&author=Burton
%2C+Melissa&volume=&issue=&spage=&date=
[11] D. Amelia, "Deep Learning Based Sign Language Translation System," vol. 21, no. 1, pp. 1–9, 2020, [Online]. Available: http://mpoc.org.my/malaysian-palm-oil-industry/
[12] I. Papastratis, C. Chatzikonstantinou, D. Konstantinidis, K. Dimitropoulos, and P. Daras, "Artificial intelligence technologies for sign language," Sensors, vol. 21, no. 17, 2021, doi: 10.3390/s21175843.
[13] D. Mazzei, F. Chiarello, and G. Fantoni, "Analyzing Social Robotics Research with Natural Language Processing Techniques," Cognit Comput, vol. 13, no. 2, pp. 308–321, 2021, doi: 10.1007/s12559-020-09799-1.
[14] F. Stulp, E. Oztop, P. Pastor, M. Beetz, and S. Schaal, "Compact models of motor primitive variations for predictable reaching and obstacle avoidance," 9th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS09, pp. 589–595, 2009, doi: 10.1109/ICHR.2009.5379551.
[15] I. A. Hameed, “Using natural language processing (NLP) for designing socially intelligent robots,” 2016 Joint IEEE
International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2016, pp. 268–269, 2017, doi:
10.1109/DEVLRN.2016.7846830.
[16] M. Tahir, A. Naeem, H. Malik, J. Tanveer, R. A. Naqvi, and S. W. Lee, “DSCC_Net: Multi-Classification Deep Learning
Models for Diagnosing of Skin Cancer Using Dermoscopic Images,” Cancers (Basel), vol. 15, no. 7, Apr. 2023, doi:
10.3390/cancers15072179.
[17] A. Naeem, T. Anees, R. A. Naqvi, and W. K. Loh, “A Comprehensive Analysis of Recent Deep and Federated-Learning-Based
Methodologies for Brain Tumor Diagnosis,” Journal of Personalized Medicine, vol. 12, no. 2. MDPI, Feb. 01, 2022. doi:
10.3390/jpm12020275.
[18] H. Malik, T. Anees, A. Naeem, R. A. Naqvi, and W. K. Loh, “Blockchain-Federated and Deep-Learning-Based Ensembling of
Capsule Network with Incremental Extreme Learning Machines for Classification of COVID-19 Using CT Scans,”
Bioengineering, vol. 10, no. 2, Feb. 2023, doi: 10.3390/bioengineering10020203.
[19] A. Naeem, T. Anees, K. T. Ahmed, R. A. Naqvi, S. Ahmad, and T. Whangbo, “Deep learned vectors’ formation using auto-
correlation, scaling, and derivations with CNN for complex and huge image retrieval,” Complex and Intelligent Systems, Apr.
2022, doi: 10.1007/s40747-022-00866-8.
[20] Y. Obi, K. S. Claudio, V. M. Budiman, S. Achmad, and A. Kurniawan, “Sign language recognition system for communicating
to people with disabilities,” Procedia Comput Sci, vol. 216, pp. 13–20, 2023, doi: 10.1016/j.procs.2022.12.106.
[21] bheemilimandal, "A Robust Sign Language and Hand Gesture Recognition System Using Convolution Neural Networks," Anits, pp. 2019–2020, 2020, [Online]. Available: http://cse.anits.edu.in/projects/projects1920C6.pdf
[22] M. Borg and K. P. Camilleri, "Sign Language Detection 'in the Wild' with Recurrent Neural Networks," pp. 1637–1641, 2019.
[23] M. Al-Qurishi, T. Khalid, and R. Souissi, "Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues," IEEE Access, vol. 9, pp. 126917–126951, 2021, doi: 10.1109/ACCESS.2021.3110912.
[24] I. A. Adeyanju, O. O. Bello, and M. A. Adegboye, "Machine learning methods for sign language recognition: A critical review and analysis," Intelligent Systems with Applications, vol. 12, p. 200056, 2021, doi: 10.1016/j.iswa.2021.200056.
[25] S. Das, M. S. Imtiaz, N. H. Neom, N. Siddique, and H. Wang, “A hybrid approach for Bangla sign language recognition using
deep transfer learning model with random forest classifier,” Expert Syst Appl, vol. 213, Mar. 2023, doi:
10.1016/j.eswa.2022.118914.
[26] D. R. Kothadiya, C. M. Bhatt, A. Rehman, F. S. Alamri, and T. Saba, “SignExplainer: An Explainable AI-Enabled Framework
for Sign Language Recognition With Ensemble Learning,” IEEE Access, vol. 11, pp. 47410–47419, 2023, doi:
10.1109/ACCESS.2023.3274851.
[27] D. R. Kothadiya, C. M. Bhatt, T. Saba, A. Rehman, and S. A. Bahaj, “SIGNFORMER: DeepVision Transformer for Sign
Language Recognition,” IEEE Access, vol. 11, pp. 4730–4739, 2023, doi: 10.1109/ACCESS.2022.3231130.
[28] L. Wang et al., "Speak, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition." [Online]. Available: https://www.researchgate.net/publication/370832215
[29] T. Petkar, T. Patil, A. Wadhankar, V. Chandore, V. Umate, and D. Hingnekar, “Real Time Sign Language Recognition System
for Hearing and Speech Impaired People,” Int J Res Appl Sci Eng Technol, vol. 10, no. 4, pp. 2261–2267, 2022, doi:
10.22214/ijraset.2022.41765.
[30] S. Sharma and K. Kumar, “ASL-3DCNN: American sign language recognition technique using 3-D convolutional neural
networks,” Multimed Tools Appl, vol. 80, no. 17, pp. 26319–26331, 2021, doi: 10.1007/s11042-021-10768-5.
[31] R. Rastgoo, K. Kiani, and S. Escalera, “Real-time isolated hand sign language recognition using deep networks and SVD,” J
Ambient Intell Humaniz Comput, vol. 13, no. 1, pp. 591–611, 2022, doi: 10.1007/s12652-021-02920-8.
[32] M. M. Islam, S. Siddiqua, and J. Afnan, "Real time Hand Gesture Recognition using different algorithms based on American Sign Language," 2017 IEEE International Conference on Imaging, Vision and Pattern Recognition, icIVPR 2017, vol. 21, no. 03, pp. 1–6, 2020, doi: 10.1109/ICIVPR.2017.7890854.
[33] D. Huh et al., “Generative Multi-Stream Architecture for American Sign Language Recognition,” 2019 IEEE MIT
Undergraduate Research Technology Conference, URTC 2019, 2019, doi: 10.1109/URTC49097.2019.9660587.
[34] A. Elboushaki, R. Hannane, K. Afdel, and L. Koutti, “MultiD-CNN: A multi-dimensional feature learning approach based on
deep convolutional networks for gesture recognition in RGB-D image sequences,” Expert Syst Appl, vol. 139, p. 112829, 2020,
doi: 10.1016/j.eswa.2019.112829.
[35] Ç. Özdemir, Oğulcan Gökçe, A. A. Kındıroğlu, and L. Akarun, “Score-Level Multi Cue Fusion for Sign Language
Recognition,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), vol. 12536 LNCS, no. September 2020, pp. 294–309, 2020, doi: 10.1007/978-3-030-66096-3_21.
[36] M. Borg and K. P. Camilleri, “Phonologically-Meaningful Subunits for Deep Learning-Based Sign Language Recognition,”
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 12536 LNCS, pp. 199–217, 2020, doi: 10.1007/978-3-030-66096-3_15.
[37] R. Rastgoo, K. Kiani, and S. Escalera, "Hand sign language recognition using multi-view hand skeleton," Expert Syst Appl, vol. 150, p. 113336, 2020, doi: 10.1016/j.eswa.2020.113336.
[38] A. Mittal, P. Kumar, P. P. Roy, R. Balasubramanian, and B. B. Chaudhuri, “A Modified LSTM Model for Continuous Sign
Language Recognition Using Leap Motion,” IEEE Sens J, vol. 19, no. 16, pp. 7056–7063, 2019, doi:
10.1109/JSEN.2019.2909837.
[39] H. B. D. Nguyen and H. N. Do, “Deep learning for American sign language fingerspelling recognition system,” 2019 26th
International Conference on Telecommunications, ICT 2019, pp. 314–318, 2019, doi: 10.1109/ICT.2019.8798856.
[40] K. M. Lim, A. W. C. Tan, C. P. Lee, and S. C. Tan, “Isolated sign language recognition using Convolutional Neural Network
hand modelling and Hand Energy Image,” Multimed Tools Appl, vol. 78, no. 14, pp. 19917–19944, 2019, doi: 10.1007/s11042-
019-7263-7.
[41] C. C. de Amorim, D. Macêdo, and C. Zanchettin, “Spatial-Temporal Graph Convolutional Networks for Sign Language
Recognition,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), vol. 11731 LNCS, pp. 646–657, 2019, doi: 10.1007/978-3-030-30493-5_59.
[42] Y. Chen, L. Zhao, X. Peng, J. Yuan, and D. N. Metaxas, "Construct dynamic graphs for hand gesture recognition via spatial-temporal attention," 30th British Machine Vision Conference 2019, BMVC 2019, pp. 1–13, 2020.
[43] Y. Ma, G. Zhou, S. Wang, H. Zhao, and W. Jung, “SignFi: Sign Language Recognition Using WiFi,” Proc ACM Interact Mob
Wearable Ubiquitous Technol, vol. 2, no. 1, pp. 1–21, 2018.
[44] S. Ameen and S. Vadera, “A convolutional neural network to classify American Sign Language fingerspelling from depth and
colour images,” Expert Syst, vol. 34, no. 3, 2017, doi: 10.1111/exsy.12197.
[45] S. Y. Kim, H. G. Han, J. W. Kim, S. Lee, and T. W. Kim, “A hand gesture recognition sensor using reflected impulses,” IEEE
Sens J, vol. 17, no. 10, pp. 2975–2976, 2017, doi: 10.1109/JSEN.2017.2679220.
[46] T. K. Õ, "Sign Language Recognition," 2017.
[47] P. M. Ferreira, J. S. Cardoso, and A. Rebelo, "Multi Model Learning for Sign Language Recognition," 2017.
[48] O. K. Oyedotun and A. Khashman, “Deep learning in vision-based static hand gesture recognition,” Neural Comput Appl, vol.
28, no. 12, pp. 3941–3951, 2017, doi: 10.1007/s00521-016-2294-8.
[49] F. Mallek, N. Tan Le, and F. Sadat, Automatic machine translation for Arabic tweets, vol. 740. 2018. doi: 10.1007/978-3-319-
67056-0_6.
[50] “nlp basic ventsislav 2018.”
[51] J. Jayanthi, "Role of Machine Learning and Deep Learning in Assisting the Special Children's Learning Process," vol. 13, no. 2, pp. 2327–2334, 2022.
[52] M. Madahana, K. Khoza-Shangase, N. Moroe, D. Mayombo, O. Nyandoro, and J. Ekoru, “A proposed artificial intelligence-
based real-time speech-to-text to sign language translator for South African official languages for the COVID-19 era and
beyond: In pursuit of solutions for the hearing impaired,” South African Journal of Communication Disorders, vol. 69, no. 2,
2022, doi: 10.4102/sajcd.v69i2.915.
[53] L. A. Kumar, D. K. Renuka, S. L. Rose, M. C. Shunmuga priya, and I. M. Wartana, “Deep learning based assistive technology
on audio visual speech recognition for hearing impaired,” International Journal of Cognitive Computing in Engineering, vol. 3,
no. January, pp. 24–30, 2022, doi: 10.1016/j.ijcce.2022.01.003.
[54] D. Manoj Kumar, K. Bavanraj, S. Thavananthan, G. M. A. S. Bastiansz, S. M. B. Harshanath, and J. Alosious, “EasyTalk: A
translator for Sri Lankan sign language using machine learning and artificial intelligence,” ICAC 2020 - 2nd International
Conference on Advancements in Computing, Proceedings, pp. 506–511, 2020, doi: 10.1109/ICAC51239.2020.9357154.
[55] T. Agrawal and S. Urolagin, “2-way Arabic Sign Language Translator using CNNLSTM Architecture and NLP,” ACM
International Conference Proceeding Series, pp. 96–101, 2020, doi: 10.1145/3378904.3378915.
[56] M. Anggraeni, M. Syafrullah, and H. A. Damanik, “Literation Hearing Impairment (I-Chat Bot): Natural Language Processing
(NLP) and Naïve Bayes Method,” J Phys Conf Ser, vol. 1201, no. 1, 2019, doi: 10.1088/1742-6596/1201/1/012057.
[57] S. Suresh, T. P. Mithun Haridas, and M. H. Supriya, “Sign Language Recognition System Using Deep Neural Network,” 2019
5th International Conference on Advanced Computing and Communication Systems, ICACCS 2019, pp. 614–618, 2019, doi:
10.1109/ICACCS.2019.8728411.
[58] Y. Perera, N. Jayalath, S. Tissera, O. Bandara, and S. Thelijjagoda, “Intelligent mobile assistant for hearing impairers to
interact with the society in Sinhala language,” International Conference on Software, Knowledge Information, Industrial
Management and Applications, SKIMA, vol. 2017-Decem, 2018, doi: 10.1109/SKIMA.2017.8294116.
[59] J. Wu, L. Sun, and R. Jafari, “A Wearable System for Recognizing American Sign Language in Real-Time Using IMU and
Surface EMG Sensors,” IEEE J Biomed Health Inform, vol. 20, no. 5, pp. 1281–1290, 2016, doi: 10.1109/JBHI.2016.2598302.
[60] "ASL Alphabet," Kaggle.
[61] M. Pérez-Enciso and L. M. Zingaretti, “A guide for using deep learning for complex trait genomic prediction,” Genes, vol. 10,
no. 7. MDPI AG, Jul. 01, 2019. doi: 10.3390/genes10070553.
[62] T. W. Chong and B. G. Lee, "American sign language recognition using leap motion controller with machine learning approach," Sensors (Switzerland), vol. 18, no. 10, 2018, doi: 10.3390/s18103554.
[63] L. Cai, Z. Wang, R. Kulathinal, S. Kumar, and S. Ji, “Deep Low-Shot Learning for Biological Image Classification and
Visualization from Limited Training Samples,” Oct. 2020, [Online]. Available: http://arxiv.org/abs/2010.10050