Final Report
BACHELOR OF ENGINEERING
IN
INFORMATION SCIENCE AND ENGINEERING
For the Academic Year 2023-2024
Submitted by
Abhay Umesh Hegde [1JS20IS001]
Arun M Mirle [1JS20IS024]
KG Arjun Krishna [1JS20IS046]
Kushal KN [1JS20IS050]
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr.Vishnuvardhan Road, Bengaluru-560060
JSS MAHAVIDYAPEETHA, MYSURU
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr.Vishnuvardhan Road, Bengaluru-560060
CERTIFICATE
This is to certify that Project Work Phase - 1 (18CSP77) Report entitled “SymbolSync:
Bridging Communication Gaps with EfficientNetv2 in Sign Language Recognition” is a
bonafide work carried out by Abhay Umesh Hegde [1JS20IS001], Arun M Mirle
[1JS20IS024], KG Arjun Krishna [1JS20IS046], Kushal KN [1JS20IS050] in partial fulfillment
for the award of the degree of Bachelor of Engineering in Information Science and
Engineering of Visvesvaraya Technological University, Belagavi, during the academic year
2023-2024.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without mentioning the people who made it possible. With gratitude, we
acknowledge all those whose guidance and encouragement crowned our efforts with success.
First and foremost, we would like to thank His Holiness Jagadguru Sri Shivarathri
Deshikendra Mahaswamiji and Dr. Bhimasen Soragaon, Principal, JSSATE, Bangalore,
for providing the opportunity to carry out Project Work Phase – 1 (18CSP77) as part of
our curriculum, in partial fulfillment of the degree course.
We express our sincere gratitude to our beloved Head of the Department, Dr. Rekha P M,
for her cooperation and encouragement at every stage of our work.
It is our pleasant duty to place on record our deepest sense of gratitude to our respected
guide, Mrs. Sahana V, Assistant Professor, for her constant encouragement, valuable help,
and assistance in every possible way.
We are thankful to the Project Coordinators, Dr. Nagamani N P, Associate Professor, and
Mrs. Sahana V, Assistant Professor, for their continuous cooperation and support.
We would like to thank all the teaching and non-teaching staff of the ISE department for
their valuable guidance and support at every stage of our work.
ABSTRACT
The motivation behind this initiative is to address the communication barriers faced by the
deaf and dumb community. Traditional methods of communication can be challenging for
individuals with hearing and speech impairments, often leading to social isolation and
difficulties in accessing information. By leveraging cutting-edge technology, this project
aims to create a platform that empowers this community, enabling them to communicate
more freely and effectively with the wider world.
In conclusion, this project represents a significant step forward in harnessing the power of
deep learning for social good. By utilizing the EfficientNetV2 model for real-time sign
language recognition, this initiative aims to break down communication barriers and open up
new avenues for interaction and understanding in the deaf and dumb community. The project
is a testament to the transformative potential of technology in creating more inclusive
societies.
INTRODUCTION
In image processing, deep learning is used to perform tasks such as image classification,
object detection, and image generation. By training on a vast dataset of images, deep learning
models can learn to recognize various objects and features in new images. This capability is
crucial for applications like autonomous vehicles, facial recognition systems, medical image
analysis, and even artistic image generation. The versatility and accuracy of deep learning in
interpreting and manipulating images have made it a key technology in modern image
processing.
Human-computer interaction (HCI) has become a routine part of our lives as computer
technology and hardware have advanced. The use of hand gestures in HCI has attracted
considerable interest, since gesturing is a natural way of engaging with a computer;
hand-controlled robots, virtual gaming, and natural user interfaces are just a few of the
applications. A well-known application of hand gesture recognition is the interpretation of
sign language. Sign language is a visual language in which ideas are communicated through a
series of expressive hand motions in a particular order. For many deaf people, it is their
primary means of communication.
According to the World Health Organization (WHO), 5% of the world's population (about
360 million people) has moderate to severe hearing loss, and many of these people
communicate primarily through their local sign language (WHO, 2015). Because sign
language is difficult for the hearing population to understand, a communication gap exists
between the hearing community and people with speech and hearing impairments.
Computer-assisted gesture recognition could therefore be used to translate sign language;
this would be beneficial and would act as a bridge between the two communities.
Recognition of gestures in ISL is a challenging task. ISL uses both hands to portray a
gesture, whereas the ASL alphabet uses only one hand. This increases the complexity of
applying feature extractors such as the Hough Transform and the Scale-Invariant Feature
Transform (SIFT). Moreover, when predicting gestures in real time, background complexity
can inhibit accurate prediction, so it becomes essential to segment the hand gesture region
from the background. Although techniques such as colour-space segmentation and Otsu's
method exist, they all have limitations with respect to background conditions.
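For concreteness, here is a minimal OpenCV sketch of the two segmentation approaches mentioned above, colour-space thresholding and Otsu's method; the HSV skin-tone bounds are illustrative assumptions and would need tuning for real backgrounds.

```python
import cv2
import numpy as np

def segment_hand(frame_bgr):
    """Rough hand segmentation combining a skin-colour mask with Otsu's method."""
    # Colour-space segmentation: keep pixels inside assumed skin-tone HSV bounds.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 30, 60], dtype=np.uint8)     # illustrative lower bound
    upper = np.array([20, 150, 255], dtype=np.uint8)  # illustrative upper bound
    skin_mask = cv2.inRange(hsv, lower, upper)

    # Otsu's method: automatic global threshold on the grayscale frame.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, otsu_mask = cv2.threshold(gray, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Combine both cues and remove small speckles before masking the frame.
    mask = cv2.bitwise_and(skin_mask, otsu_mask)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
```

Both cues degrade under cluttered or skin-toned backgrounds, which is exactly the limitation noted above and one motivation for learning features directly with a CNN.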
LITERATURE SURVEY
Each entry below lists the surveyed paper's objectives, methodology, advantages, and disadvantages.

1. Human-Computer Interaction with Hand Gesture Recognition Using ResNet and MobileNet (2022)
Objectives: To bridge the communication gap between the deaf and hearing communities; to implement a vision-based technique for recognizing static hand gestures in Arabic sign language using CNNs; to enhance the models' robustness and avoid overfitting through image preprocessing and data augmentation.
Methodology: A combination of ResNet50 and MobileNetV2, with the accuracy of each model calculated separately (ResNet50: 97.5%; MobileNetV2: 97.1%; ResNet50 + MobileNetV2 ensemble: 98.2%).
Advantages: High accuracy in sign language recognition (about 97%); effective preprocessing techniques for image enhancement; an ensemble of two different CNN models for improved performance.
Disadvantages: Limited dataset variation, including similar poses and background settings; restricted to static gestures, which may not cover the full range of sign language expressions; potential challenges in real-world application due to dataset limitations.

2. Dynamic Gesture Recognition Based on 2D Convolutional Neural Network and Feature Fusion (2022)
Objectives: Enhance dynamic gesture recognition efficiency and accuracy; implement a 2D CNN-based method with feature fusion; optimize the recognition process using fractional-order HS optical flow and keyframe extraction.
Methodology: A 2D convolutional neural network with feature fusion. Original keyframes and optical-flow keyframes represent spatial and temporal features respectively, and are then sent to the 2D CNN for feature fusion and final recognition. Accuracy: 98.2%.
Advantages: Higher accuracy with reduced network complexity; effective data preprocessing and augmentation; innovative use of fractional-order calculus in neural networks.
Disadvantages: Limited to specific datasets (Cambridge Hand Gesture and Northwestern University Hand Gesture datasets); potential scalability issues in diverse real-world applications; the method's dependency on specific feature extraction techniques may limit its generalizability.

3. Deepsign: Sign Language Detection and Recognition Using Deep Learning (2022)
Objectives: Develop a system for ISL recognition using deep learning; utilize LSTM and GRU models to process video frames of sign language; achieve high accuracy in sign recognition; utilize a variational autoencoder for effective cross-modal alignment.
Methodology: Deep learning models such as LSTM and GRU, where a single LSTM layer is followed by a GRU layer. Accuracy: 97%.
Advantages: High accuracy (around 97%) in recognizing signs from ISL; effective in real-time scenarios; does not require a specific environment or camera setup for inference.
Disadvantages: Limited to the specific dataset (IISL2020) created for this study; potential challenges in generalizing the model to other sign languages or varied environments; focuses mainly on isolated signs, not continuous sign language.

5. American and Indian Sign Language Translation Using Convolutional Neural Networks (2021)
Objectives: To translate ASL and ISL into English using CNNs; to utilize image processing methods for sign language alphabet recognition.
Methodology: A CNN model in which the image is converted to a binary image, the gesture is recognised, and the result is passed to text-to-speech. Accuracy: 98.34%.
Advantages: Effective use of CNNs for sign language recognition; addresses the need for communication aids for the deaf community.
Disadvantages: Limited to the static alphabets of ASL and ISL; challenges in real-world application due to varied sign language dialects and user conditions.

6. Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder (2020)
Objectives: Develop a compact DCNN model using node/filter pruning and Spatial Pyramid Pooling (SPP); improve gesture recognition efficiency and accuracy; address fixed input dimensionality and high computational costs in traditional DCNNs.
Methodology: Selection of the optimal nodes in the DCNN by pruning, transformation of the DCNN into a compact SPP-based DCNN, and a practical approach to decoding video gestures in real time based on the compact SPP-based DCNN. Accuracy: 91%.
Advantages: Reduction in model size and computational demands; high accuracy in gesture recognition; flexibility in handling variable input image sizes due to SPP integration.
Disadvantages: Complexity of the model optimization and pruning processes; potential limitations in generalizing to diverse real-world scenarios beyond the tested datasets; the need to balance pruning for efficiency against maintaining recognition accuracy.

7. Recognition of Indian Sign Language Using ORB with Bag of Visual Words by Kinect Sensor (2020)
Objectives: Enhance ISL recognition using ORB and Bag of Visual Words; use the Kinect sensor for efficient image capture in varied environments.
Methodology: A depth image is obtained from the Kinect sensor; hand features are extracted from the captured hand region using SIFT, SURF, and ORB; the k-means algorithm builds the visual vocabulary; and the ISL sign is recognized by an M-SVM. Accuracy: 87.8%.
Advantages: High accuracy in ISL gesture recognition; effective in different lighting and background conditions.
Disadvantages: Limited to the specific dataset and gestures used in the study; reliance on the Kinect sensor's capabilities.

8. Vision-Based Hand Gesture Recognition for Indian Sign Language Using Convolution Neural Network (2020)
Objectives: Develop a vision-based ISL recognition system using a CNN and the Microsoft Kinect sensor; address challenges such as hand detection and segmentation in cluttered environments.
Methodology: The hand region is detected and segmented using the depth image, and a CNN is used to recognize the ISL signs (alphabet). Accuracy: 99%.
Advantages: High accuracy in recognizing ISL gestures (up to 99.3%); efficient hand detection and segmentation in various environments; robust against varying lighting conditions.
Disadvantages: Limited to static ISL signs, not dynamic or continuous signs; the system's effectiveness is subject to the Kinect sensor's limitations; challenges in generalizing to other sign languages or more complex gesture interpretations.

9. EfficientNetV2: Smaller Models and Faster Training (2021)
Objectives: Create a more compact EfficientNet model while preserving or enhancing accuracy; enable deployment in resource-constrained environments; improve efficiency and cost-effectiveness in model development.
Methodology: The paper evaluates EfficientNetV2 variants against ConvNets, Vision Transformers, and hybrid ConvNet-Transformer models.
Advantages: EfficientNetV2 achieves a balance between model size and accuracy, making it highly efficient for diverse applications in image processing and recognition; with a smaller model size and faster training, EfficientNetV2 reduces computational requirements.
Disadvantages: Although smaller and faster, EfficientNetV2 poses challenges in design and optimization, demanding specialized knowledge of model architecture and scaling; as with other deep learning models, there is a potential for overfitting, especially when trained on limited or specific datasets.

10. Indian Sign Language Gesture Recognition Using Image Processing and Deep Learning (2019)
Objectives: To recognize ISL gestures in real time using deep learning and image processing; to utilize depth perception and computer vision techniques for accurate gesture segmentation.
Methodology: Depth-based segmentation is used to extract features from the images; the images are then mapped onto single RGB frames and passed to an LSTM model. Accuracy: 97.52%.
Advantages: High accuracy in recognizing static and dynamic ISL gestures; effective in various background conditions and with different hand sizes; adaptable to American Sign Language through transfer learning.
Disadvantages: Specific to the Kinect sensor, which may limit generalizability; focused on ISL, potentially requiring adaptation for other sign languages; reliance on a custom dataset, which may not represent all variations in ISL.

11. Assistive Sign Language Converter for Deaf and Dumb (2019)
Objectives: Develop a portable, real-time sign-language-to-speech conversion device; utilize image processing and deep learning for gesture recognition.
Methodology: The camera captures an image, which is sent to the controller for further processing; the controller processes and classifies the image using different algorithms, recognizes the sign, and generates the corresponding alphabet from a predefined dataset. Accuracy: 99%.
Advantages: Portability and real-time operation; high accuracy in sign recognition and conversion to speech.
Disadvantages: May face challenges in diverse environments or with complex gestures; reliance on specific hardware (camera, processing unit) for functionality.

12. Automatic Indian Sign Language Recognition System (2012)
Objectives: Combine Hu invariant moments and structural shape descriptors into a new feature vector for sign recognition, using a multi-class Support Vector Machine (MSVM).
Methodology: Each class is trained with a multiclass support vector machine (MSVM); an input gesture is tested against the different classes, and the most probable group is identified to recognize the gesture. Accuracy: 96.23%.
Advantages: High recognition rate of 96%; effective combination of different feature extraction methods.
Disadvantages: Limited to the dataset used in the study, which may affect generalizability; focused on manual signs, potentially limiting the scope of recognizable gestures.
PROBLEM IDENTIFICATION
1. Fundamental Nature of Communication:
- Communication is integral to human interaction, serving key roles in sharing
information, expressing emotions, and participating in social activities.
- Effective communication is central to personal development, education, and
professional success.
2. Challenges Faced by the Deaf and Dumb Community:
- The deaf and dumb community faces significant communication barriers,
primarily due to the prevalent use of spoken and written languages they
cannot access.
- This barrier often results in social isolation, as they find it challenging to
engage in everyday conversations and social interactions.
- Limited access to information and communication restrictions can adversely
affect education and employment opportunities.
- The psychological impact, including feelings of exclusion and frustration, can
be profound.
3. The Communication Gap:
- There exists a substantial gap in communication methods between the deaf
and dumb community and those unfamiliar with sign language.
- The majority of the hearing population does not understand sign language,
leading to a disconnect in communication.
- Existing solutions like interpreter services or text-to-speech tools are often not
practical for everyday, spontaneous use.
- This gap restricts the participation of individuals with hearing and speech
impairments in various aspects of society, including education and
employment.
4. Need for Accessible Technologies:
- Current technologies and services are either inaccessible, expensive, or
impractical for daily use.
- The necessity for an efficient, real-time communication tool is evident and
urgent.
OBJECTIVES
The primary objective of this project is to develop a real-time sign language recognition
system using EfficientNet and React, aimed at enhancing communication for the deaf and
dumb community. This system seeks to bridge the gap between individuals with hearing and
speech impairments and those who do not understand sign language, facilitating more
effective and inclusive interactions.
Technical Objectives:
1. Efficient and Accurate Gesture Recognition: To utilize the EfficientNet model for
its state-of-the-art efficiency and accuracy in image processing, ensuring the system can
accurately interpret a wide range of sign language gestures in real time.
In summary, the objectives of this project encompass a range of technical, user experience,
and social goals, all aimed at creating an efficient, effective, and inclusive real-time sign
language recognition system. Achieving these objectives will not only provide practical
communication solutions for the deaf and dumb community but also contribute to the broader
goal of creating a more inclusive and understanding society.
METHODOLOGY
1. EfficientNet Overview:
EfficientNet, particularly the EfficientNetV2 version used in this project, represents a
significant advancement in CNN (Convolutional Neural Network) architectures. Its
design is rooted in the principle of compound scaling, which uniformly scales the
depth, width, and resolution of the network. This allows EfficientNet to achieve
higher accuracy without a proportional increase in computational complexity.
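For illustration, in the original EfficientNet formulation the compound-scaling rule scales depth, width, and resolution as α^φ, β^φ, and γ^φ for a compound coefficient φ, subject to α·β²·γ² ≈ 2. A minimal sketch of instantiating such a backbone with a gesture-classification head via tf.keras.applications follows; the B0 variant, input size, and class count are our assumptions rather than the project's final configuration.

```python
import tensorflow as tf

NUM_CLASSES = 26        # assumption: one class per static alphabet gesture
IMG_SIZE = (224, 224)   # assumed input resolution

# Pretrained EfficientNetV2 backbone without its ImageNet classifier head.
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg")

# Lightweight classification head for sign-language gestures.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Starting from ImageNet weights and fine-tuning only the head is a common way to keep training cheap on a small gesture dataset; the backbone can be unfrozen later for full fine-tuning.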
2. Preprocessing:
The data preprocessing steps are critical in preparing the images for the model; a
representative sketch of the pipeline is given below.
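This is a minimal sketch assuming steps typical for an EfficientNetV2 input pipeline (resizing to the model resolution, mild photometric augmentation, and batching via tf.data); these steps are illustrative assumptions, not the report's exact list.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed model input resolution

def preprocess(image, label, training=False):
    """Representative preprocessing sketch (assumed steps, see note above)."""
    # Resize to the network input; tf.image.resize also yields float32 output.
    image = tf.image.resize(image, IMG_SIZE)
    if training:
        # Mild photometric augmentation; horizontal flips are deliberately
        # avoided because mirroring can change a gesture's meaning.
        image = tf.image.random_brightness(image, max_delta=20.0)  # [0, 255] scale
        image = tf.image.random_contrast(image, 0.9, 1.1)
    # Keras EfficientNetV2 models rescale raw pixel values internally,
    # so no explicit normalization is applied here.
    return image, label

# Example wiring into a tf.data pipeline:
# train_ds = train_ds.map(lambda x, y: preprocess(x, y, training=True)).batch(32)
```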
This project combines state-of-the-art deep learning techniques with modern web
development practices to create an accessible, efficient, and user-friendly tool for real-time
sign language recognition. The use of EfficientNet for backend processing ensures high
accuracy and efficiency, while React provides a robust and responsive front-end. Together,
these technologies create a system that can significantly enhance communication for the deaf
and dumb community.
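As a sketch of how the two halves might be wired together, the backend below exposes the trained model over HTTP so the React client can POST webcam frames to it. FastAPI, the /predict route, the model file name, and the label set are all assumptions for illustration; the report does not fix these details.

```python
import io

import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
model = tf.keras.models.load_model("symbolsync_effnetv2.h5")  # hypothetical path
LABELS = [chr(ord("A") + i) for i in range(26)]               # placeholder classes

@app.post("/predict")
async def predict(frame: UploadFile = File(...)):
    """Classify one webcam frame sent by the React front-end."""
    image = Image.open(io.BytesIO(await frame.read())).convert("RGB")
    x = np.asarray(image.resize((224, 224)), dtype=np.float32)[None, ...]
    probs = model.predict(x, verbose=0)[0]
    top = int(np.argmax(probs))
    return {"gesture": LABELS[top], "confidence": float(probs[top])}
```

On the front-end, the React client would capture a frame from the webcam (for example, via a canvas snapshot) and send it as multipart form data to this endpoint.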
MODEL: EFFICIENTNETV2
EXPECTED OUTCOMES
1. Immediate and Accurate Gesture Translation: The primary expected outcome is
the real-time translation of sign language into text or spoken words with high
accuracy. This feature will enable seamless communication for individuals with
hearing and speech impairments.
2. Reduced Social Isolation: By facilitating easier communication, the project is
expected to significantly reduce the social isolation often experienced by the deaf and
dumb community. It will allow for more spontaneous and engaging interactions with
the wider society.
3. Enhanced Educational and Professional Opportunities: The system's
implementation in educational and professional settings is anticipated to open up new
opportunities for learning and employment, making these environments more
accessible and inclusive.
4. Advancement in Assistive Technology: This project is expected to be a significant
contribution to the field of assistive technology, demonstrating the effective
application of advanced deep learning models like EfficientNet in real-world
scenarios.
5. User-Friendly Interface: With the integration of React for front-end development, a
highly intuitive and accessible user interface is anticipated, which will be easy to
navigate even for users with limited technical skills.
6. Increased Awareness and Empathy: The project is expected to raise awareness
about the challenges faced by the deaf and dumb community, fostering greater
empathy and understanding within the broader society.
7. Scalable Solution for Diverse Applications: The system is designed to be scalable,
allowing for future enhancements such as support for multiple sign languages and
dialects, and integration into various digital platforms and devices.
8. Valuable Feedback for Improvements: Continuous user feedback is expected,
which will be crucial for the iterative improvement of the system, ensuring that it
meets the evolving needs of its users effectively.