
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Santhibastawad Road, Machhe


Belagavi - 590018, Karnataka, India

PROJECT WORK PHASE- 1 (18CSP77) REPORT


ON
“SymbolSync: Bridging Communication Gaps with EfficientNetv2 in
Sign Language Recognition”
Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN
INFORMATION SCIENCE AND ENGINEERING
For the Academic Year 2023-2024
Submitted by

Abhay Umesh Hegde 1JS20IS001


Arun M Mirle 1JS20IS024
KG Arjun Krishna 1JS20IS046
Kushal KN 1JS20IS050
Under the Guidance of
Sahana V
Assistant Professor, Dept. of ISE, JSSATE

2023-2024
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr. Vishnuvardhan Road, Bengaluru-560060
JSS MAHAVIDYAPEETHA, MYSURU
JSS ACADEMY OF TECHNICAL EDUCATION
JSS Campus, Dr. Vishnuvardhan Road, Bengaluru-560060

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that Project Work Phase - 1 (18CSP77) Report entitled “SymbolSync:
Bridging Communication Gaps with EfficientNetv2 in Sign Language Recognition” is a
bonafide work carried out by Abhay Umesh Hegde [1JS20IS001], Arun M Mirle
[1JS20IS024], KG Arjun Krishna [1JS20IS046], Kushal KN [1JS20IS050] in partial fulfillment
for the award of the degree of Bachelor of Engineering in Information Science and
Engineering of Visvesvaraya Technological University, Belagavi during the year 2023-2024.

Signature of the Guide Signature of the HOD


Sahana V Dr. Rekha P M
Assistant Professor Professor & Head
Dept. of ISE Dept. of ISE
JSSATE, Bengaluru JSSATE, Bengaluru
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without the mention of the people who made it possible. So, with gratitude, we
acknowledge all those whose guidance and encouragement crowned our efforts with success.

First and foremost, we would like to thank His Holiness Jagadguru Sri Shivarathri
Deshikendra Mahaswamiji and Dr. Bhimasen Soragaon, Principal, JSSATE, Bangalore,
for providing an opportunity to carry out the Project Work Phase – 1 (18CSP77) as a part of
our curriculum in partial fulfillment of the degree course.

We express our sincere gratitude to our beloved Head of the Department, Dr. Rekha P M,
for her cooperation and encouragement at every stage of our work.

It is our pleasant duty to place on record our deepest sense of gratitude to our respected guide,
Mrs. Sahana V, Assistant Professor, for her constant encouragement, valuable help, and
assistance in every possible way.

We are thankful to the Project Coordinators, Dr. Nagamani N P, Assoc. Professor, and Mrs.
Sahana V, Asst. Professor, for their continuous cooperation and support.

We would like to thank all ISE department teachers and non-teaching staff for providing
us with their valuable guidance and for being there at all stages of our work.

Abhay Umesh Hegde [1JS20IS001]


Arun M Mirle [1JS20IS024]
KG Arjun Krishna [1JS20IS046]
Kushal KN [1JS20IS050]
ABSTRACT
This research introduces a groundbreaking approach in assistive technologies, focusing on
the development of a real-time sign language recognition system using the EfficientNetV2
neural network. This project is driven by the goal of creating an inclusive communication
tool that caters to the needs of the deaf and dumb community, thereby fostering more
accessible and effective interactions in various settings.

The motivation behind this initiative is to address the communication barriers faced by the
deaf and dumb community. Traditional methods of communication can be challenging for
individuals with hearing and speech impairments, often leading to social isolation and
difficulties in accessing information. By leveraging cutting-edge technology, this project
aims to create a platform that empowers this community, enabling them to communicate
more freely and effectively with the wider world.

In conclusion, this project represents a significant step forward in harnessing the power of
deep learning for social good. By utilizing the EfficientNetV2 model for real-time sign
language recognition, this initiative aims to break down communication barriers and open up
new avenues for interaction and understanding in the deaf and dumb community. The project
is a testament to the transformative potential of technology in creating more inclusive
societies.
TABLE OF CONTENTS

SL NO.  TITLE

1  INTRODUCTION
2  LITERATURE SURVEY
3  PROBLEM IDENTIFICATION
4  OBJECTIVES
5  METHODOLOGY
6  MODEL : EFFICIENTNETV2
7  EXPECTED OUTCOMES
INTRODUCTION
Deep learning is a subset of machine learning where artificial neural networks, algorithms
inspired by the human brain, learn from large amounts of data. These networks can identify
patterns and features in data, making them particularly effective for complex tasks like
speech recognition, language translation, and image recognition.

In image processing, deep learning is used to perform tasks such as image classification,
object detection, and image generation. By training on a vast dataset of images, deep learning
models can learn to recognize various objects and features in new images. This capability is
crucial for applications like autonomous vehicles, facial recognition systems, medical image
analysis, and even artistic image generation. The versatility and accuracy of deep learning in
interpreting and manipulating images have made it a key technology in modern image
processing.

Human-computer interaction (HCI) has become a routine part of our lives as computer
technology and hardware have advanced. The use of hand gestures in HCI has attracted
considerable interest because it offers a natural way of engaging with a computer. Hand-based
robot control, virtual gaming, and natural user interfaces are just a few of the applications. A
well-known use of hand gesture recognition is the interpretation of sign language. Sign
language is a visual language in which ideas are communicated through a sequence of
expressive hand motions performed in a particular order. For many deaf people, it is their
primary means of communication.

According to the World Health Organization (WHO), about 5% of the world's population
(roughly 360 million people) has moderate to severe hearing loss, and many of them
communicate primarily through their local sign language (WHO, 2015). Because sign
language is difficult for most hearing people to understand, a communication gap exists
between the hearing community and people with hearing and speech impairments.
Computer-assisted gesture recognition can therefore be used to translate sign language; this
would be beneficial and would act as a bridge between the two communities.

Recognition of gestures in Indian Sign Language (ISL) is a challenging task. ISL uses both
hands to portray a gesture, as opposed to the ASL alphabet, which uses only one hand. This
increases complexity when applying feature extractors such as the Hough Transform and the
Scale Invariant Feature Transform (SIFT). Also, when predicting gestures in real time, the
problem of background complexity arises, which can inhibit accurate prediction of the
gestures. It therefore becomes essential to segment the hand gesture region from the
background. Though there are techniques such as segmentation using colour spaces and
Otsu's method, they all have limitations with respect to the background conditions.
LITERATURE SURVEY
1. Human-Computer Interaction with Hand Gesture Recognition Using ResNet and MobileNet (2022)
   Objectives: To bridge the communication gap between the deaf and hearing communities; implement a vision-based technique for recognizing static hand gestures in Arabic sign language using CNNs; enhance the models' robustness and avoid overfitting through image preprocessing and data augmentation.
   Methodology: A combination of ResNet50 and MobileNetV2 is used and the accuracy of each model is calculated: ResNet50 – 97.5%, MobileNetV2 – 97.1%, ensemble of the two – 98.2%.
   Advantages: High accuracy in sign language recognition (about 97%); effective preprocessing techniques for image enhancement; ensemble of two different CNN models for improved performance.
   Disadvantages: Limited dataset variations, including similar poses and background settings; restricted to static gestures, which may not cover the full range of sign language expressions; potential challenges in real-world application due to dataset limitations.

2. Dynamic Gesture Recognition Based on 2D Convolutional Neural Network and Feature Fusion (2022)
   Objectives: Enhance dynamic gesture recognition efficiency and accuracy; implement a 2D CNN-based method with feature fusion; optimize the recognition process using fractional-order HS optical flow and keyframe extraction.
   Methodology: A 2D convolutional neural network with feature fusion is used; original keyframes and optical-flow keyframes represent spatial and temporal features respectively and are then sent to the 2D CNN for feature fusion and final recognition. Accuracy – 98.2%.
   Advantages: Higher accuracy with reduced network complexity; effective data preprocessing and augmentation; innovative use of fractional-order calculus in neural networks.
   Disadvantages: Limited to specific datasets (Cambridge Hand Gesture and Northwestern University Hand Gesture datasets); potential scalability issues in diverse real-world applications; dependency on specific feature extraction techniques may limit generalizability.

3. Deepsign: Sign Language Detection and Recognition Using Deep Learning (2022)
   Objectives: Develop a system for ISL recognition using deep learning; utilize LSTM and GRU models to process video frames of sign language; achieve high accuracy in sign recognition.
   Methodology: Deep learning models such as LSTM and GRU are used, where a single LSTM layer is followed by a GRU layer, achieving an accuracy of 97%.
   Advantages: High accuracy (around 97%) in recognizing signs from ISL; effective in real-time scenarios; does not require a specific environment or camera setup for inference.
   Disadvantages: Limited to the specific dataset (IISL2020) created for the study; potential challenges in generalizing the model to other sign languages or varied environments; focuses mainly on isolated signs rather than continuous sign language.

4. CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment (2022)
   Objectives: Improve sign language recognition by integrating visual and language modalities; implement a novel contrastive visual-textual transformation framework (CVT-SLR); utilize a variational autoencoder for effective cross-modal alignment.
   Methodology: A VAE network is pretrained as the textual model; the visual model is then transferred into the pretrained model within the CVT-SLR framework. WER: Test – 20.3, Dev – 19.4.
   Advantages: Effective integration of visual and textual information; enhanced recognition accuracy using contrastive alignment methods; versatile application across different datasets.
   Disadvantages: Potentially complex implementation and optimization; may require extensive training data for optimal performance; specificity to the datasets used (e.g., PHOENIX-2014, PHOENIX-2014T) may limit generalization.

5. American and Indian Sign Language Translation Using Convolutional Neural Networks (2021)
   Objectives: Translate ASL and ISL into English using CNNs; utilize image processing methods for sign language alphabet recognition.
   Methodology: A CNN model is used; the input image is converted to a binary image, the gesture is recognized, and the result is then converted to speech using text-to-speech. Accuracy – 98.34%.
   Advantages: Effective use of CNNs for sign language recognition; addresses the need for communication aids for the deaf community.
   Disadvantages: Limited to static alphabets of ASL and ISL; challenges in real-world application due to varied sign language dialects and user conditions.

6. Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder (2020)
   Objectives: Develop a compact DCNN model using node/filter pruning and Spatial Pyramid Pooling (SPP); improve gesture recognition efficiency and accuracy; address the fixed input dimensionality and high computational cost of traditional DCNNs.
   Methodology: The optimal nodes in the DCNN are selected by pruning, and the DCNN is transformed into a compact SPP-based DCNN, along with a practical approach to decoding video gestures in real time based on the compact SPP-based DCNN. Accuracy – 91%.
   Advantages: Reduction in model size and computational demands; high accuracy in gesture recognition; flexibility in handling variable input image sizes due to SPP integration.
   Disadvantages: Complexity of the model optimization and pruning processes; potential limitations in generalizing to diverse real-world scenarios beyond the tested datasets; a balance must be struck between pruning for efficiency and maintaining recognition accuracy.

7. Recognition of Indian Sign Language Using ORB with Bag of Visual Words by Kinect Sensor (2020)
   Objectives: Enhance ISL recognition using ORB and Bag of Visual Words; use the Kinect sensor for efficient image capture in varied environments.
   Methodology: A depth image is obtained from the Kinect sensor; hand features are extracted from the captured hand region using SIFT, SURF, and ORB; the K-means algorithm is used to form the visual vocabulary, and the ISL sign is recognized by an M-SVM. Accuracy – 87.8%.
   Advantages: High accuracy in ISL gesture recognition; effective in different lighting and background conditions.
   Disadvantages: Limited to the specific dataset and gestures used in the study; reliance on the Kinect sensor's capabilities.

8. Vision-Based Hand Gesture Recognition for Indian Sign Language Using Convolution Neural Network (2020)
   Objectives: Develop a vision-based ISL recognition system using a CNN and the Microsoft Kinect sensor; address challenges such as hand detection and segmentation in cluttered environments.
   Methodology: The hand region is detected and segmented using the depth image, and a CNN is used to recognize the ISL signs (alphabet). Accuracy – 99%.
   Advantages: High accuracy in recognizing ISL gestures (up to 99.3%); efficient hand detection and segmentation in various environments; robust against varying lighting conditions.
   Disadvantages: Limited to static ISL signs, not dynamic or continuous signs; the system's effectiveness is subject to the Kinect sensor's limitations; challenges in generalizing to other sign languages or more complex gesture interpretations.

9. EfficientNetV2: Smaller Models and Faster Training (2021)
   Objectives: Create a more compact EfficientNet model while preserving or enhancing accuracy; enable deployment in resource-constrained environments; improve efficiency and cost-effectiveness in model development.
   Methodology: The paper evaluates EfficientNetV2 variants against other architectures, including ConvNets, Vision Transformers, and hybrid models.
   Advantages: EfficientNetV2 achieves a balance between model size and accuracy, making it highly efficient for diverse applications in image processing and recognition; with a smaller model size and faster training, it reduces computational requirements.
   Disadvantages: Although smaller and faster, EfficientNetV2 poses challenges in design and optimization, demanding specialized knowledge of model architecture and scaling; like other deep learning models, it can overfit, especially when trained on limited or specific datasets.

10. Indian Sign Language Gesture Recognition Using Image Processing and Deep Learning (2019)
   Objectives: Recognize ISL gestures in real time using deep learning and image processing; utilize depth perception and computer vision techniques for accurate gesture segmentation.
   Methodology: Depth-based segmentation is used to extract features from the images; the images are then mapped onto single RGB frames and passed to an LSTM model. Accuracy – 97.52%.
   Advantages: High accuracy in recognizing static and dynamic ISL gestures; effective in various background conditions and with different hand sizes; adaptable to American Sign Language through transfer learning.
   Disadvantages: Specific to the Kinect sensor, which may limit generalizability; focused on ISL, potentially requiring adaptation for other sign languages; reliance on a custom dataset, which may not represent all variations in ISL.

11. Assistive Sign Language Converter for Deaf and Dumb (2019)
   Objectives: Develop a portable, real-time sign-language-to-speech conversion device; utilize image processing and deep learning for gesture recognition.
   Methodology: The camera captures an image, which is sent to the controller for further processing; the image is processed and classified by the controller using different algorithms; the system recognizes the sign and generates the corresponding alphabet from the predefined dataset. Accuracy – 99%.
   Advantages: Portability and real-time operation; high accuracy in sign recognition and conversion to speech.
   Disadvantages: May face challenges in diverse environments or with complex gestures; reliance on specific hardware (camera and processing unit) for functionality.

12. Automatic Indian Sign Language Recognition System (2012)
   Objectives: Combine Hu invariant moments and structural shape descriptors to form a new feature vector for sign recognition, using a multi-class Support Vector Machine (MSVM).
   Methodology: Each class is trained with a multi-class support vector machine (MSVM); different classes are used for testing an input gesture, and the outcome with the most probable group is identified to recognize the gesture. Accuracy – 96.23%.
   Advantages: High recognition rate of about 96%; effective combination of different feature extraction methods.
   Disadvantages: Limited to the dataset used in the study, which may affect generalizability; focused on manual signs, potentially limiting the scope of recognizable gestures.

PROBLEM IDENTIFICATION
1. Fundamental Nature of Communication:
- Communication is integral to human interaction, serving key roles in sharing
information, expressing emotions, and participating in social activities.
- Effective communication is central to personal development, education, and
professional success.
2. Challenges Faced by the Deaf and Dumb Community:
- The deaf and dumb community faces significant communication barriers,
primarily due to the prevalent use of spoken and written languages they
cannot access.
- This barrier often results in social isolation, as they find it challenging to
engage in everyday conversations and social interactions.
- Limited access to information and communication restrictions can adversely
affect education and employment opportunities.
- The psychological impact, including feelings of exclusion and frustration, can
be profound.
3. The Communication Gap:
- There exists a substantial gap in communication methods between the deaf
and dumb community and those unfamiliar with sign language.
- The majority of the hearing population does not understand sign language,
leading to a disconnect in communication.
- Existing solutions like interpreter services or text-to-speech tools are often not
practical for everyday, spontaneous use.
- This gap restricts the participation of individuals with hearing and speech
impairments in various aspects of society, including education and
employment.
4. Need for Accessible Technologies:
- Current technologies and services are either inaccessible, expensive, or
impractical for daily use.
- The necessity for an efficient, real-time communication tool is evident and
urgent.

5. Leveraging Technological Advancements:


- Advances in technology, especially in deep learning and image processing,
open new avenues for innovative solutions.
- The application of these technologies in creating real-time sign language
recognition systems is a promising area of development.
6. Developing a Real-Time Sign Language Recognition System:
- The proposed system aims to translate sign language into text or spoken
language instantly.
- It seeks to utilize deep learning models, like EfficientNet, which are capable
of processing complex visual data.
- The system could potentially use a camera to capture sign language gestures
and translate them in real time.
7. Bridging the Communication Divide:
- Such a system would serve as a bridge between the deaf and dumb community
and the rest of the world.
- It would allow for more spontaneous and natural communication, reducing
dependence on third-party interpretation.
- The system could be implemented in various settings, including public
services, education, and healthcare, to facilitate better accessibility.
8. Impact on Society and Inclusion:
- Implementing this technology could significantly reduce the communication
barriers faced by the deaf and dumb community.
- It has the potential to foster greater inclusivity, allowing for fuller
participation in social and professional spheres.
- The technology would not only benefit individuals with hearing and speech
impairments but also enhance societal awareness and inclusivity towards
diverse communication needs.

OBJECTIVES
The primary objective of this project is to develop a real-time sign language recognition
system using EfficientNet and React, aimed at enhancing communication for the deaf and
dumb community. This system seeks to bridge the gap between individuals with hearing and
speech impairments and those who do not understand sign language, facilitating more
effective and inclusive interactions.
Technical Objectives:

1. Efficient and Accurate Gesture Recognition: To utilize the EfficientNet model for
its state-of-the-art efficiency and accuracy in image processing, ensuring the system can
accurately interpret a wide range of sign language gestures in real time.

2. Optimized Data Preprocessing: To implement an effective preprocessing pipeline that
enhances the model's ability to recognize and interpret sign language gestures accurately,
considering various environmental conditions.

3. Robust Model Training: To train the EfficientNet model on a comprehensive dataset,
ensuring it can accurately recognize diverse sign language gestures, including subtle
variations and different sign language dialects.

4. Effective Model Evaluation: To rigorously evaluate the model using relevant metrics
such as accuracy, precision, and F1 score, ensuring its reliability and effectiveness in
real-world scenarios.

User Interface and Experience Objectives:

1. Intuitive User Interface: To develop a user-friendly interface using React, ensuring that
the system is accessible and easy to use for individuals with varying levels of technical
proficiency.

2. Real-Time Interaction: To facilitate real-time communication between the user and the
system, ensuring that the sign language gestures are translated quickly and accurately.

3. Accessibility and Inclusivity: To design the interface with accessibility in mind,
ensuring that it accommodates the needs of users with different abilities, including those
with visual, motor, and cognitive impairments.

In summary, the objectives of this project encompass a range of technical, user experience,
and social goals, all aimed at creating an efficient, effective, and inclusive real-time sign
language recognition system. Achieving these objectives will not only provide practical
communication solutions for the deaf and dumb community but also contribute to the broader
goal of creating a more inclusive and understanding society.

METHODOLOGY
1. EfficientNet Overview:
EfficientNet, particularly the EfficientNetV2 version used in this project, represents a
significant advancement in CNN (Convolutional Neural Network) architectures. Its
design is rooted in the principle of compound scaling, which uniformly scales the
depth, width, and resolution of the network. This allows EfficientNet to achieve
higher accuracy without a proportional increase in computational complexity.
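
To make the compound-scaling idea concrete, the short sketch below scales a hypothetical baseline by a compound coefficient phi. The coefficient values (alpha = 1.2, beta = 1.1, gamma = 1.15) are the ones reported for EfficientNet-B0 in the original EfficientNet paper and are shown here purely for illustration; they are not part of this project's configuration.

```python
# Illustrative only: compound scaling as described in the original EfficientNet
# paper. alpha, beta, gamma are the base coefficients reported for
# EfficientNet-B0 (constrained so that alpha * beta^2 * gamma^2 is roughly 2);
# phi is the user-chosen compound coefficient.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def compound_scale(phi: float, base_resolution: int = 224):
    """Return (depth, width, resolution) scaling for a given phi."""
    depth = ALPHA ** phi                                  # more layers
    width = BETA ** phi                                   # more channels
    resolution = round(base_resolution * GAMMA ** phi)    # larger input images
    return depth, width, resolution

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input ~{r}px")
```
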
2. Preprocessing:
The data preprocessing steps are critical in preparing the images for the model. The
pipeline involves:

- Resizing images to 256x256 pixels.


- Applying a Random Resized Crop to 224x224 pixels, accommodating the
input size requirement of EfficientNet.
- Random Rotation to introduce rotational invariance.
- Color Jittering to enhance the robustness of the model against variations in
brightness and contrast.
- Gaussian Blur to simulate variations in image quality and focus.
- Normalization using predefined mean and standard deviation values, aligning
the dataset with the conditions of the pretrained model.
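
A minimal torchvision sketch of this preprocessing pipeline is shown below. The rotation range, jitter strengths, and blur kernel size are illustrative assumptions (the report does not fix exact values), and the normalization statistics are the standard ImageNet values commonly used with pretrained EfficientNet weights.

```python
from torchvision import transforms

# Minimal sketch of the preprocessing pipeline described above. Rotation range,
# jitter strengths, and blur kernel size are illustrative assumptions; mean/std
# are the standard ImageNet statistics used with pretrained weights.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),                         # resize to 256x256
    transforms.RandomResizedCrop(224),                     # crop to EfficientNet input size
    transforms.RandomRotation(degrees=15),                 # rotational invariance
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast robustness
    transforms.GaussianBlur(kernel_size=3),                # simulate focus/quality variation
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```
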
3. Training Process:
1. Data Loading: The images are loaded using the ImageFolder dataset class, which
assumes that images are organized in a folder structure where each folder
corresponds to a class. This dataset is then split into training and test sets.
2. Model Configuration: The EfficientNetV2 model is initialized with pretrained
weights to leverage transfer learning. The final classifier layer is replaced to
match the number of classes in the sign language dataset. The model is then
transferred to the CUDA device if available, enabling GPU acceleration.
3. Training Loop:
 For each epoch, the model is set to training mode.
 Inputs and labels from the training data loader are forwarded through
the model.
 Loss is calculated using Cross-Entropy, which is appropriate for
multi-class classification tasks.
 Backpropagation and optimization steps are performed using Adam
optimizer.
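
The following condensed PyTorch sketch ties the steps above together, assuming torchvision's EfficientNetV2-S variant, the `train_transform` defined in the preprocessing sketch, and a dataset folder laid out for ImageFolder. The dataset path, split ratio, batch size, learning rate, and epoch count are illustrative assumptions, not the project's final settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models

# Sketch of the training procedure described above. Dataset path, split ratio,
# batch size, learning rate, and epoch count are illustrative assumptions.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = datasets.ImageFolder("data/signs", transform=train_transform)  # one folder per class
train_size = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# EfficientNetV2 with pretrained weights; the classifier head is replaced so
# its output matches the number of sign classes.
model = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.DEFAULT)
num_classes = len(dataset.classes)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)  # multi-class cross-entropy
        loss.backward()
        optimizer.step()
```
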
4. Evaluation Metrics:
After training, the model's performance is evaluated using metrics like accuracy,
precision, and F1 score. These metrics provide a comprehensive understanding of the
model's effectiveness in classifying sign language gestures correctly.
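
A brief evaluation sketch using scikit-learn is shown below; it assumes the `model`, `test_set`, and `device` objects from the training sketch and uses macro averaging, which is one reasonable choice for a multi-class problem.

```python
import torch
from sklearn.metrics import accuracy_score, precision_score, f1_score
from torch.utils.data import DataLoader

# Evaluation sketch: collect predictions on the held-out split and compute the
# metrics mentioned above. Assumes `model`, `test_set`, and `device` from the
# training sketch.
test_loader = DataLoader(test_set, batch_size=32)
model.eval()

all_preds, all_labels = [], []
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())

print("Accuracy :", accuracy_score(all_labels, all_preds))
print("Precision:", precision_score(all_labels, all_preds, average="macro"))
print("F1 score :", f1_score(all_labels, all_preds, average="macro"))
```
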
5. Front-End Development Using React:
1. Overview: The front-end application is being developed using React, a popular
JavaScript library for building user interfaces. React's component-based
architecture allows for efficient and flexible development of the application's user
interface.
2. User Interface: The user interface will feature a clean and intuitive design. It will
include a live video feed section where users can perform sign language gestures.
Real-time recognition results will be displayed, either as text or through an avatar
mimicking the sign.
3. Integration with Backend: The front-end will communicate with the backend (the
EfficientNet model) through a RESTful API or a WebSocket for real-time interaction. The
model's predictions will be sent back to the front-end, where they will be rendered for
the user (a minimal backend sketch is shown after this list).
4. Accessibility and Responsiveness: Special attention is being paid to ensure the
application is accessible, with considerations for users with various disabilities.
Additionally, the design will be responsive, ensuring usability across different
devices and screen sizes.
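
As a sketch of the backend side of this integration, the snippet below exposes a single REST endpoint that accepts an uploaded frame and returns a predicted label. FastAPI is an assumption made only for illustration (the report specifies a RESTful API or WebSocket but not a framework), and the `model`, `device`, and `dataset.classes` objects are assumed to come from the training sketch.

```python
import io

import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import transforms

# Minimal REST endpoint sketch for the integration described above. FastAPI is
# an assumption; `model`, `device`, and `dataset` are assumed to come from the
# training sketch.
app = FastAPI()

infer_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@app.post("/predict")
async def predict(frame: UploadFile = File(...)):
    """Accept one captured frame and return the predicted sign label."""
    image = Image.open(io.BytesIO(await frame.read())).convert("RGB")
    tensor = infer_transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        class_index = model(tensor).argmax(dim=1).item()
    return {"label": dataset.classes[class_index]}
```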

This project combines state-of-the-art deep learning techniques with modern web
development practices to create an accessible, efficient, and user-friendly tool for real-time
sign language recognition. The use of EfficientNet for backend processing ensures high
accuracy and efficiency, while React provides a robust and responsive front-end. Together,
these technologies create a system that can significantly enhance communication for the deaf
and dumb community.

MODEL : EFFICIENTNETV2

EfficientNetV2 Architecture Design:


1. Review of EfficientNet: The original EfficientNet models were optimized for FLOPs
(floating-point operations) and parameter efficiency. They used Neural Architecture
Search (NAS) to find a baseline model (EfficientNet-B0), which was then scaled to create
a range of models (B1-B7).
2. Training Efficiency Challenges: EfficientNetV2 addresses issues like the slow
training speed caused by large image sizes used in EfficientNet, memory constraints,
and the inefficiency of depthwise convolutions in early layers. Techniques like
FixRes are applied, using smaller image sizes during training to improve speed and
enable larger batch sizes.
3. Depthwise Convolutions: The paper discusses the trade-offs of depthwise convolutions,
noting that while they are parameter-efficient, they do not fully utilize modern hardware
accelerators. Fused-MBConv, which replaces the separate expansion and depthwise
convolutions of MBConv with a single regular convolution, is introduced for better
hardware utilization (a simplified sketch of the two block types follows this list).
4. Scaling Strategy: EfficientNet's equal scaling of all stages is identified as sub-
optimal. EfficientNetV2 adopts a non-uniform scaling strategy, adding more layers to
later stages and limiting maximum image size to reduce memory consumption and
improve training speed.
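
The simplified sketch below contrasts the two block types. Batch normalization, activations, squeeze-and-excitation, and skip connections are omitted for brevity, so this illustrates only the convolution structure rather than the exact EfficientNetV2 block definition.

```python
import torch
from torch import nn

# Simplified sketch of the two block types discussed above (batch norm,
# activation, squeeze-and-excitation, and residual connections omitted).
class MBConv(nn.Module):
    """Expansion 1x1 conv -> depthwise 3x3 conv -> projection 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int, expand: int = 4):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1),                       # expansion
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, groups=mid),  # depthwise
            nn.Conv2d(mid, out_ch, kernel_size=1),                      # projection
        )

    def forward(self, x):
        return self.block(x)

class FusedMBConv(nn.Module):
    """Expansion and depthwise stages fused into one regular 3x3 conv."""
    def __init__(self, in_ch: int, out_ch: int, expand: int = 4):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=3, padding=1),  # fused regular conv
            nn.Conv2d(mid, out_ch, kernel_size=1),            # projection
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 24, 56, 56)
print(MBConv(24, 24)(x).shape, FusedMBConv(24, 24)(x).shape)
```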

Training-Aware NAS and Scaling:

1. NAS Search: EfficientNetV2 employs a training-aware NAS framework, optimizing for
accuracy, parameter efficiency, and training efficiency. The search space includes design
choices like convolution types (MBConv, Fused-MBConv), number of layers, kernel sizes,
and expansion ratios.
2. EfficientNetV2 Architecture: The resulting EfficientNetV2 architecture uses both
MBConv and Fused-MBConv in early layers and prefers smaller expansion ratios for
MBConv.

Experimental Results and Findings:

1. Model Performance: EfficientNetV2 models demonstrate faster training speeds and better
parameter efficiency compared to previous models. The models are up to 6.8 times smaller
and significantly outperform previous models on various datasets such as ImageNet,
CIFAR, Cars, and Flowers.
2. Accuracy and Speed: By pretraining on ImageNet21k, EfficientNetV2 achieves
87.3% top-1 accuracy on ImageNet ILSVRC2012, surpassing the recent Vision
Transformer (ViT) models in accuracy and training speed.
3. Progressive Learning Method: The paper proposes an improved method of
progressive learning to address the accuracy drop caused by increasing image size
during training. This method adaptively adjusts regularization and data augmentation
along with image size.
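
The sketch below illustrates the progressive-learning idea: image size and regularization strength are increased together across training stages. The stage count, sizes, and dropout values are assumptions for illustration, not the schedule used in the paper.

```python
# Illustrative sketch of progressive learning: image size and regularization
# strength grow together over training stages. Stage count, sizes, and dropout
# values are assumptions, not the paper's actual schedule.
def progressive_schedule(num_stages: int = 4,
                         min_size: int = 128, max_size: int = 300,
                         min_dropout: float = 0.1, max_dropout: float = 0.3):
    for stage in range(num_stages):
        t = stage / (num_stages - 1)
        image_size = int(min_size + t * (max_size - min_size))
        dropout = min_dropout + t * (max_dropout - min_dropout)
        yield stage, image_size, dropout

for stage, image_size, dropout in progressive_schedule():
    # In practice, transforms would be rebuilt at `image_size` and the model's
    # dropout/augmentation strength set to `dropout` before training this stage.
    print(f"stage {stage}: image size {image_size}px, dropout {dropout:.2f}")
```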

EXPECTED OUTCOMES
1. Immediate and Accurate Gesture Translation: The primary expected outcome is
the real-time translation of sign language into text or spoken words with high
accuracy. This feature will enable seamless communication for individuals with
hearing and speech impairments.
2. Reduced Social Isolation: By facilitating easier communication, the project is
expected to significantly reduce the social isolation often experienced by the deaf and
dumb community. It will allow for more spontaneous and engaging interactions with
the wider society.
3. Enhanced Educational and Professional Opportunities: The system's
implementation in educational and professional settings is anticipated to open up new
opportunities for learning and employment, making these environments more
accessible and inclusive.
4. Advancement in Assistive Technology: This project is expected to be a significant
contribution to the field of assistive technology, demonstrating the effective
application of advanced deep learning models like EfficientNet in real-world
scenarios.
5. User-Friendly Interface: With the integration of React for front-end development, a
highly intuitive and accessible user interface is anticipated, which will be easy to
navigate even for users with limited technical skills.
6. Increased Awareness and Empathy: The project is expected to raise awareness
about the challenges faced by the deaf and dumb community, fostering greater
empathy and understanding within the broader society.
7. Scalable Solution for Diverse Applications: The system is designed to be scalable,
allowing for future enhancements such as support for multiple sign languages and
dialects, and integration into various digital platforms and devices.
8. Valuable Feedback for Improvements: Continuous user feedback is expected,
which will be crucial for the iterative improvement of the system, ensuring that it
meets the evolving needs of its users effectively.

In summary, the expected outcomes of this project encompass significant advancements in
communication for the deaf and dumb community, broader social inclusion, educational and
professional accessibility, technological innovation, and increased societal awareness.
The successful implementation and adoption of this real-time sign language recognition
system have the potential to bring about transformative changes in the lives of individuals
with hearing and speech impairments, and in society as a whole.
