7th Sem Report on Sign Language Recognition
Rasika Ukarde
Ashwini Ramteke
Khemant Shahare
Tejas Bangre
Prof. D. D. Meshram
Session 2023-24
PRIYADARSHINI J. L. COLLEGE OF ENGINEERING, NAGPUR
Certificate
The synopsis titled SIGN LANGUAGE RECOGNITION AND SPEECH GENERATION, submitted by
students of the 7th Semester of B. Tech in Computer Science and Engineering, Rashtrasant Tukadoji
Maharaj Nagpur University, Nagpur, shall be carried out under my supervision in the Department of
Computer Science and Engineering of Priyadarshini J. L. College of Engineering, Nagpur during the
academic session 2023-24. The proposed project work and the synopsis enclosed herewith have my
approval.
Name of Guide
CONTENTS
Abstract
1 Introduction
2 Objective
3 Literature Review
Proposed Methodology
Tools and Platform Used
Advantages and Application
Project Plan
References
ABSTRACT
Sign language is an important mode of communication for millions of deaf people around the world,
allowing them to express thoughts, feelings, and ideas effectively. In India, where a large proportion of
the population is affected by hearing loss, the importance of Indian Sign Language as a means of
communication cannot be overstated. With approximately 6.3% of India's population having severe
hearing impairment, there is an urgent need to develop efficient communication systems for people with
hearing disabilities. This paper provides a comprehensive overview and comparison of various deep
learning algorithms for Indian Sign Language recognition, with the objective of identifying the algorithm
best suited to bridging the communication gap between sign language users and those who do not know
sign language. The importance of this research lies in its potential to improve accessibility and inclusion
for deaf and hard-of-hearing people in India and beyond: integrating such recognition systems into
everyday applications allows sign language users to participate more effectively in daily activities and
interactions, further increasing their inclusion in modern society. Through a rigorous comparison of these
deep learning approaches, this paper aims to provide valuable insights into the most promising directions
for future research and the development of practical sign language recognition systems. Ultimately, our
goal is to help create a more inclusive and accessible world for people with hearing loss by leveraging
the potential of deep learning in the context of Indian Sign Language recognition.
Keywords - ISL (Indian Sign Language), CNN (Convolutional Neural Network), SLR (Sign Language
Recognition), RNN (Recurrent Neural Network), Deep Learning
INTRODUCTION
Sign language is a means of communication for those who are deaf or hard of hearing; it conveys
meaning visually through sign patterns. Hearing loss is the most common sensory disability in modern
society. Estimates from the World Health Organization place the prevalence of substantial auditory
impairment at 6.3% of the population in India, amounting to over 63 million people. For people with
hearing disabilities, an efficient communication system must be developed.
Sign language is a vital means of communication for millions of deaf and hard-of-hearing individuals
around the world. It serves as their primary mode of expressing thoughts, emotions, and ideas. However,
effective communication between sign language users and those who do not understand sign language
remains a significant challenge. Sign language uses visual cues such as facial expressions, hand gestures,
and body movements to convey meaning, and sign language recognition refers to the conversion of these
gestures into the words or alphabets of an existing spoken language. Converting sign language into words
with an algorithm or model can help bridge the gap between people with hearing or speech impairments
and the rest of the world.
ISL signs fall into two main groups: one-handed and two-handed signs. One hand is used to make the
gestures in one-handed signs, while both hands are used for two-handed signs. Furthermore, both
one-handed and two-handed signs are divided into static and dynamic signs: static signs are still images,
while dynamic signs are continuous or isolated videos or movements.
In recent years, Convolutional Neural Networks (CNNs) [1], a class of deep learning models, have
shown remarkable success in various computer vision tasks, including image classification and object
detection. Their ability to automatically learn and extract features from visual data makes them a
compelling choice for sign language recognition. The significance of our research lies in its potential to
enhance accessibility and inclusivity for deaf and hard-of-hearing individuals by enabling seamless
communication with the broader community. Real-time sign language detection can be integrated into
various applications, including video conferencing, educational tools, and accessibility services, making
it easier for sign language users to engage in everyday activities and interactions.
Fig 1. Indian Sign Language
Indian Sign Language is predominantly used in the South Asian region. Unlike America and Europe,
India does not have a single standardized sign language; much of ISL derives from British Sign
Language. Localized sign languages have been used in India for a very long time, and researchers have
found that the sign languages of different towns share some signs in common while differing in others.
Figure 1 shows the ISL alphabet chart. Automatic translation of gestures into text or speech is essential
for interaction between people with hearing disabilities and people who do not understand sign language.
Sign language interpretation is an extensive research area. A sign recognition technique performs
automatic conversion of gestures into human-readable text or speech, and a dataset is used to train the
machine to detect, predict, and analyze the target signs.
Sign language recognition is a computational process that recognizes gestures and classifies them into
text or voice using various techniques. It is used in sectors such as human-computer interaction (HCI);
general gesture recognition in places like hospitals, banks, railway stations, and bus stops; human-robot
interaction (HRI); sign language tutoring for the hearing-impaired; and special education for differently
abled people. Sign language recognition relies on a few key factors: hand shapes, hand motions, hand
and head orientation, hand and head position, and facial expressions. The purpose of a sign language
recognition system is to interpret these gestures correctly in the form of text or voice.
OBJECTIVE
The objectives of Sign Language Recognition and Speech Generation are as follows:
1. Facilitating Communication: To develop systems that can understand and recognize sign language
gestures and translate them into spoken or written language. This is done to bridge the
communication gap between sign language users and those who do not understand sign language.
2. Enhancing Accessibility: To improve accessibility for deaf and hard-of-hearing individuals by
providing them with a means to communicate effectively with the broader hearing population.
3. Promoting Inclusion: To promote the inclusion of the deaf and hard-of-hearing community in
various aspects of life, such as education, employment, and social interactions, by making
communication more seamless.
4. Real-time Interaction: Enabling real-time and interactive communication, which is especially
valuable for scenarios like education, customer service, or social interactions.
5. Development of Assistive Technology: Developing assistive technology that empowers individuals
with hearing disabilities and reduces the barriers they face in daily life.
6. Research and Innovation: Advancing the field of deep learning and artificial intelligence in the
context of sign language recognition and speech generation, contributing to ongoing research and
innovation.
7. Indian Sign Language: Depending on the scope of the project, the system aims to recognize and
generate speech for specific Indian Sign Language (ISL) signs.
8. User-Friendly Systems: Creating user-friendly systems that are easy for both sign language users
and non-sign language users to operate.
9. Improving Quality of Life: Ultimately, the overarching objective is to enhance the quality of life
for individuals with hearing disabilities by providing them with effective tools for communication
and interaction.
LITERATURE REVIEW
Kausar Mia, Tariqul I, et al. [2] This research study proposed a hand gesture detection and sign
language recognition system using a deep CNN with VGG16 pretrained on ImageNet. The system
worked well at identifying hand gestures, and with transfer learning the improved deep CNN model gave
better results than the other methods, achieving 95% accuracy over all classes. The authors used picture
data of several hand gestures, which they state will be helpful for speech- and hearing-impaired people
in the future. Their current model, however, cannot detect hand gestures from real-time or video data.
Purva B, Vaishali K, et al. [3] This paper presents a novel approach that combines a Convolutional
Neural Network (CNN) with a Recurrent Neural Network (RNN) for sign language recognition. The
study makes use of a self-built database of signers who are naturally deaf. The feature extraction process
utilizes the widely adopted Inception V3 model. To address the limitation of a relatively smaller
database, transfer learning is employed, where pre-trained model weights are imported to effectively
train the current system. Following successful feature selection, gesture classification is performed using
the Long Short-Term Memory (LSTM) approach. By leveraging the combination of CNN and RNN, the
proposed methodology takes advantage of both hierarchical and sequential features inherent in dynamic
gesture datasets.
Sanket D, Aniket K, et al. [4] This paper presented a vision-based system able to interpret hand gestures
from American Sign Language and convert them to text or speech, as well as perform the reverse
translation. The proposed solution was evaluated in real-world scenarios, demonstrating that the
classification models could identify all of the trained gestures while remaining user independent, a
crucial requirement for this kind of system. The selected hand features, in conjunction with machine
learning algorithms, proved to be very efficient, allowing their application in real-time sign language
recognition systems.
Srushti R, Panav P, et al. [5] This research paper provides a comprehensive CNN model implementation
for the recognition of Indian Sign Language. The stepwise implementation is discussed, covering image
collection, image segmentation, feature extraction, classification, and text-to-speech.
Sahil R, Dhara S, et al. [6] The fundamental goal of this paper is to provide a practical mechanism for
hearing and deaf individuals to communicate through hand gestures. The suggested approach can be
used with webcams or any other built-in cameras that can capture and process signs for recognition.
The model's findings indicate that the suggested system produces reliable results under conditions of
regulated light and intensity.
PROPOSED METHODOLOGY
The proposed system is a real-time system in which live sign gestures are recognized as words or
sentences. A CNN is used for processing image data and performs image recognition and classification.
After the gesture is predicted, the recognized text is converted to speech. The methodology is as follows:
2. Segmentation:
The motion and location of the hand must be detected and segmented in order to recognize gestures.
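As an illustration of this step, the sketch below isolates the hand region with simple skin-colour
thresholding in OpenCV. The HSV bounds, kernel size, and function name are illustrative assumptions,
not details taken from the report, and would need tuning for real lighting conditions and skin tones.

import cv2
import numpy as np

def segment_hand(frame_bgr):
    # Convert to HSV, where skin colour is easier to threshold.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed lower skin bound
    upper = np.array([25, 255, 255], dtype=np.uint8)  # assumed upper skin bound
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening/closing removes speckle noise in the mask.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask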
3. Features Extraction:
Predefined features such as shape, contour, geometric features (position, angle, distance), colour
features, and histograms are extracted from the preprocessed images and used later for sign classification
or recognition. Feature extraction is a dimensionality reduction step that divides and organises a large
collection of raw data into smaller, easier-to-manage classes, making subsequent processing simpler.
The defining characteristic of these massive data sets is their large number of variables, and processing
these variables requires a large amount of computational power. Feature extraction therefore helps derive
the best features from large data sets by selecting and combining variables into features, reducing the
size of the data. These features are simple to use while still accurately and uniquely describing the actual
data collection.
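A minimal sketch of how such geometric features might be computed from a binary hand mask with
OpenCV. The choice of area, perimeter, and Hu moments as the feature set is an assumption made for
illustration; the report does not fix a specific feature vector.

import cv2
import numpy as np

def geometric_features(mask):
    # Find external contours in the binary hand mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
    area = cv2.contourArea(hand)
    perimeter = cv2.arcLength(hand, True)
    # Hu moments: seven rotation- and scale-invariant shape descriptors.
    hu = cv2.HuMoments(cv2.moments(hand)).flatten()
    return np.concatenate(([area, perimeter], hu))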
4. Preprocessing:
Preprocessing techniques are applied to an input image to remove unwanted noise and enhance quality.
This can be accomplished by resizing, colour conversion, noise removal, or a combination of several of
these techniques. With a good selection of preprocessing techniques, the output of this step can greatly
improve recognition accuracy. Image preprocessing techniques can be broadly classified into image
enhancement and image restoration. Image enhancement techniques include Histogram Equalization
(HE), Adaptive Histogram Equalization (AHE), Contrast Limited Adaptive Histogram Equalization
(CLAHE), and logarithmic transformation. Image restoration includes the median filter, mean filter,
Gaussian filter, adaptive filter, and Wiener filter.
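The sketch below strings together a few of the techniques named above (resizing, colour conversion, a
median filter, and CLAHE) using OpenCV. The target size and CLAHE parameters are placeholder
assumptions, not values prescribed by the report.

import cv2

def preprocess(image_bgr, size=(64, 64)):
    img = cv2.resize(image_bgr, size)              # resize to a fixed input size
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # colour conversion
    gray = cv2.medianBlur(gray, 3)                 # restoration: median filter
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)                       # enhancement: CLAHE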
5. Classification:
5.1 CNN (Convolutional Neural Network):
In this paper [7], the authors suggested a solid CNN model for feature extraction and classification of
ISL gestures, employing depth-wise separable convolutions for SLR. The suggested model recognizes
the primary characteristics of an input frame automatically, and to lower the computational cost it was
built with a modest number of convolutional layers and filters. Three separate sets of self-created data,
all with black backgrounds, are used in the experiment, and the model achieves the highest level of
precision. CNNs process pixel data carefully, provide highly accurate image recognition, and learn the
necessary features automatically. On the other hand, their computational cost is substantial: without a
GPU, training takes long for difficult tasks, and a large amount of training data is needed.
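A minimal Keras sketch of such a model, assuming 64x64 grayscale inputs and using depth-wise
separable convolutions as in [7]; the layer counts and widths are illustrative choices, not those of the
cited paper.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes, input_shape=(64, 64, 1)):
    # Depth-wise separable convolutions keep the parameter count and
    # computational cost low compared with standard convolutions.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.SeparableConv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.SeparableConv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model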
5.2 RNN (Recurrent Neural Network):
In this paper [8] the authors suggested a technique for learning both spatial and temporal characteristics
using a CNN and an RNN. Individual frames were classified with the CNN to obtain a sequence of
predictions, and the sequence was then given to the RNN to learn the temporal characteristics. On a
self-created dataset of isolated hand motions, the suggested model with a pooling layer achieves an
excellent accuracy of 96%. An RNN is designed to remember every step of the sequence, which helps
tackle time-series challenges, and the size of the model does not grow as the size of the input increases.
However, the recurrent nature of the computation makes it slow, processing long sequences can be
challenging, and it is sensitive to problems such as vanishing gradients.
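A compact Keras sketch of this CNN-then-RNN idea: a small CNN is applied to every frame and an
LSTM models the resulting feature sequence. The frame count, image size, and layer widths are
assumptions for illustration, not the configuration of [8].

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_rnn(num_classes, frames=30, height=64, width=64):
    # TimeDistributed applies the same CNN to each frame; the LSTM then
    # learns the temporal pattern across the per-frame feature vectors.
    model = models.Sequential([
        layers.Input(shape=(frames, height, width, 1)),
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D()),
        layers.TimeDistributed(layers.Flatten()),
        layers.LSTM(64),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model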
5.3 LSTM (Long Short-Term Memory):
In this paper [9] the authors propose a deep learning approach for encoder-decoder-based two-way
translation between sign language and conventional English, comparing two encoder-decoder network
topologies that employ GRU and LSTM units. The proposed methodology is tested on the ASLG-PC12
and PHOENIX-2014T corpora, and the experimental findings demonstrate that the suggested approach
performs better than similar work on the same corpora. Future research should examine the possibility
of translating from text to pose estimation and vice versa.
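A skeleton of a GRU-based encoder-decoder in Keras, in the spirit of the topologies described in [9].
Vocabulary sizes, unit counts, and the training setup are placeholder assumptions; the cited paper's exact
architecture is not reproduced here.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_seq2seq(src_vocab, tgt_vocab, units=256):
    # Encoder: embed the source gloss sequence and keep the final GRU state.
    enc_in = layers.Input(shape=(None,))
    enc_emb = layers.Embedding(src_vocab, units)(enc_in)
    _, enc_state = layers.GRU(units, return_state=True)(enc_emb)
    # Decoder: generate the target sentence conditioned on the encoder state.
    dec_in = layers.Input(shape=(None,))
    dec_emb = layers.Embedding(tgt_vocab, units)(dec_in)
    dec_out, _ = layers.GRU(units, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=enc_state)
    probs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)
    return Model([enc_in, dec_in], probs)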
5.4 KNN (K-Nearest Neighbor Algorithm):
In this paper [10], the authors note that sign language recognition technologies enable greater
communication accessibility and inclusion for the non-verbal and hearing-impaired communities. Their
graphical user interface training tool, which performs sign letter prediction and collects user input,
achieves an accuracy of 97% for Random Forest, 96% for KNN, and 95% for SVM. The success of this
project demonstrates the potential for future research to enhance the precision and effectiveness of sign
language recognition technology and ultimately have a substantial impact on the lives of people who
use sign language to communicate.
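A minimal scikit-learn sketch of KNN classification on precomputed gesture feature vectors (for
example, the geometric features extracted earlier). The 80/20 split and k = 5 are illustrative choices, not
settings reported in [10].

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def train_knn(features, labels, k=5):
    # Hold out 20% of the gesture samples for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, knn.predict(X_test)))
    return knn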
6. Text-to-speech:
The hand sign recognized by the CNN is displayed and then converted to speech using the Python
text-to-speech (pyttsx3) library.
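A minimal usage sketch of pyttsx3 for this step; the speech-rate setting is an optional illustrative tweak.

import pyttsx3

def speak(text):
    engine = pyttsx3.init()          # initialise the offline TTS engine
    engine.setProperty("rate", 150)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()              # block until speech finishes

speak("Hello")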
TOOLS AND PLATFORM USED
1. Programming Languages:
Python: Python is the go-to language for machine learning and deep learning projects. It offers
extensive libraries and frameworks for data processing, model development, and integration.
2. Deep Learning Frameworks:
TensorFlow: TensorFlow is widely used for building deep learning models. Its high-level API,
Keras, simplifies model development and training.
3. Data Collection and Annotation Tools:
OpenCV: OpenCV (Open Source Computer Vision Library) is an open-source computer vision
library that is used for capturing and processing video data. It can handle tasks like video frame
extraction and basic video processing.
Labeling Tools: Tools like LabelImg or Labelbox are used for annotating and labeling sign
language gestures in the collected data. Annotating the data is crucial for training the model
effectively.
4. Data Preprocessing:
NumPy and Pandas: NumPy and Pandas are Python libraries for data manipulation and
preprocessing, including feature extraction and dataset management.
Scikit-learn: Scikit-learn provides utilities for data preprocessing, feature scaling, and dataset splitting.
5. User Interface and Development:
HTML, CSS, JavaScript: Web technologies like HTML, CSS, and JavaScript are used for creating
user interfaces that capture sign language gestures.
Frameworks like Flask or Django: If a web application is part of the project, web frameworks like
Flask or Django can be used for development.
6. Data Visualization:
Matplotlib and Seaborn: Matplotlib and Seaborn are Python libraries used for data visualization.
They can create various types of charts, graphs, and plots, which are useful for understanding data
distributions and presenting results.
ADVANTAGES AND APPLICATION
Application
The project's primary application is to provide a bridge for communication between deaf and
hard-of-hearing individuals and the general population, making information and services more
accessible to this community.
In educational settings, the technology can be used to aid sign language learners and facilitate
communication among teachers, students, and classmates with hearing impairments.
The project can be integrated into assistive devices such as mobile applications or wearable
technology to help individuals with hearing disabilities in their daily interactions.
Sign language recognition can be used in healthcare settings to enable better communication
between healthcare providers and patients with hearing impairments.
In customer service and public facilities, it can provide efficient communication between staff
and customers with hearing impairments.
The project promotes social inclusion and equal participation in various aspects of life, including
employment, social interactions, and community involvement.
Advantages
1. Improved Communication: The project enhances communication for individuals with hearing
impairments, reducing barriers and enabling more natural interactions.
2. Independence: It empowers users to communicate independently without relying on
intermediaries or written messages.
3. Real-time Interaction: Real-time recognition and speech generation facilitate fluid and
spontaneous conversations in sign language and spoken language.
4. Learning Support: It serves as a valuable tool for sign language learners and educators,
offering a platform for practice and instruction.
5. Accessibility Compliance: Organizations and institutions that adopt this technology
demonstrate their commitment to accessibility and inclusivity, ensuring compliance with
accessibility standards and regulations.
6. Remote Communication: The project is adaptable for remote communication, allowing users
to connect with others even when physically distant.
PROJECT PLAN
[Table of project phases and corresponding tasks]
REFERENCES
[1] O. Rodrigues, A. Shinde, K. Desle, S. Yadav, "A Review on Sign Language Recognition Using
CNN," International Conference on Machine Intelligence and Data Analytics, ACSR 105, pp. 251-259,
2023.
[2] Kausar Mia, Tariqul Islam, Md Assaduzzaman, Sonjoy Prosad Shaha, Arnab Saha, Md. Abdur
Razzak, Angkur Dhar, Teerthanker Sarker, Al Imran Alvy, "Isolated Sign Language Recognition Using
Hidden Transfer Learning," International Journal of Computer Science and Information Technology
Research, ISSN 2348-120X (online), Vol. 11, Issue 1, pp. 28-38, January-March 2023.
[3] Purva C. Badhe, Vaishali Kulkarni, "Dynamic Gestural Sign Recognition Using Deep Neural
Network," JETIR, Volume 10, Issue 8, ISSN 2349-5162, August 2023.
[4] Sanket Dhobale, Aniket Kandrikar, Sumeet Manapure, Aniket Zullarwar, Ankush Temburnikar,
"Real Time Sign Language Detector Using Deep Learning," International Journal of Advanced Research
in Science, Communication and Technology (IJARSCT), Volume 2, Issue 1, March 2022. DOI:
10.48175/568.
[5] Srushti Raut, Panav Patel, Sohan Vichare, Gayatri Hegde, Rahul Durvas, "Indian Sign Language
Recognition System for Deaf and Dumb Using CNN," International Journal of Scientific Engineering
and Research (IJSER), ISSN (Online): 2347-3878, Volume 11, Issue 4, April 2023.
[6] Sahil Rawal, Dhara Sampat, Priyank Sanghvi, "Real Time Sign Language Detection," International
Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056, Volume 9, Issue 4,
April 2022.
[7] S. Sharma and S. Singh, "Recognition of Indian Sign Language (ISL) Using Deep Learning Model,"
Wireless Personal Communications, 123(1), pp. 671-692, 2022.
[8] A. Divkar, R. Bailkar, and D. C. S. Pawar, "Gesture Based Real-time Indian Sign Language
Interpreter," Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol., pp. 387-394, 2021. DOI:
10.32628/cseit217374.
[9] M. Amin, H. Hefny, and A. Mohammed, "Sign Language Gloss Translation Using Deep Learning
Models," International Journal of Advanced Computer Science and Applications, 12, 2021.
[10] Anjana Devi V, Charulatha T, Dharishinie P, "Sign Language Recognition and Training Module,"
July 2023.