Final Project
GestureConnect-ASL Translator
By
ALEENA JOHNSON (SCM21CS017)
AINA ARUN (SCM21CS010)
AMIYA SHEREEF (SCM21CS022)
CHRISTY SAJI (SCM21CS040)
Under the Supervision of
Ms. SRUTHY K JOSEPH
Assistant Professor
Department of Computer Science and Engineering
SCMS School of Engineering & Technology, Ernakulam
1.INTRODUCTION
Introduction to ASL Language Translator
• The ASL Language Translator is an advanced system designed to bridge the
communication gap between hearing-impaired individuals and those unfamiliar with
American Sign Language (ASL).
• Using image capture techniques, the translator recognizes and translates ASL
gestures into text or speech in real time.
• The system employs Convolutional Neural Networks (CNN) to analyze hand gestures,
ensuring accurate identification of signs.
• This project aims to make communication more inclusive and accessible for the
hearing-impaired community.
1.INTRODUCTION
Technology Overview
• The translator starts by capturing hand gestures through a camera. CNN processes
these images, identifying key features and patterns to understand the gestures.
• The model is trained using TensorFlow, which allows the system to provide quick and
accurate real-time translation; a minimal model sketch is given below.
• This combination of technologies (image capture, CNN, and TensorFlow) ensures a
seamless user experience, promoting efficient communication and fostering
inclusivity in everyday interactions.
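The following is a minimal, illustrative TensorFlow/Keras sketch of such a CNN classifier; the 64x64 grayscale input size and the 26 gesture classes are assumptions for illustration, not project specifications.

# Illustrative CNN sketch for static gesture classification.
# The input size (64x64 grayscale) and class count (26) are assumed values.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26      # assumed: one class per ASL letter
IMG_SIZE = (64, 64)   # assumed input resolution

model = models.Sequential([
    layers.Input(shape=(*IMG_SIZE, 1)),                # single-channel (grayscale) input
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # probabilities over gesture classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])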
2.PROBLEM STATEMENT
● Communication between hearing-impaired individuals and non-ASL speakers
poses significant challenges.
● Current solutions often require human interpreters or specialized devices, which
may not be accessible or efficient.
● There is a lack of real-time, automated systems that can accurately capture and
interpret ASL gestures.
● The need is for a system that translates ASL into text or speech instantly, enabling
seamless communication.
● This project aims to address these challenges by using image capturing techniques,
CNN, and TensorFlow for an efficient and inclusive translation system.
3.MOTIVATION
● Communication Barrier: Millions of hearing-impaired individuals face difficulties in
daily communication with non-ASL speakers, limiting their ability to interact freely.
● Lack of Accessibility: Existing ASL translation tools are either costly, limited in
availability, or not real-time, making them inefficient in daily use.
● Technological Advancement: With advancements in machine learning and image
processing, there is an opportunity to create an affordable, real-time solution that makes
communication smoother and more inclusive.
● Inclusivity: Developing this system will foster greater inclusivity by enabling hearing-
impaired individuals to communicate effortlessly with the world, promoting equality and
breaking down social barriers.
4.OBJECTIVES
● Real-Time ASL Translation: To develop a system that can accurately and efficiently
translate American Sign Language gestures into text or speech in real time.
● Leverage Advanced Technologies: To utilize image capturing techniques,
Convolutional Neural Networks (CNN), and TensorFlow for precise gesture recognition
and translation.
● Promote Inclusivity: To create an accessible tool that bridges communication gaps,
making interactions between hearing-impaired individuals and non-ASL speakers more
inclusive and seamless.
5.LITERATURE SURVEY
Paper 1: An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion, by Hamzah Luqman, 2022
Proposed Methodology:
• The proposed solution is a Hierarchical Sign Learning Model, a trainable deep learning network for sign language recognition.
• The network uses a key posture extractor and accumulative video motion.
• Utilizes 3 specialized networks: DMN, AMN, and SRN.
• Evaluated on the KArSL-190 and KArSL-502 Arabic Sign Language datasets.
• Experiments were conducted in two modes: signer dependent and signer independent.
Drawbacks:
• Limited generalization to different signers, especially in signer-independent mode.
• Potential overfitting to extracted postures, leading to reduced recognition accuracy on dynamic, nuanced gestures.
• Struggled to identify digits and individual letters due to similar signs with the slightest variation in finger positions.
5.LITERATURE SURVEY
Paper 2: BIM Sign Language Translator Using Machine Learning (TensorFlow), by Herrick Yeap Han Lin, Norhanifah Murl, 2022
Proposed Methodology:
• An offline BIM sign language translator that uses a camera system to convert sign gestures into text in multiple languages.
• Uses TensorFlow, Python, OpenCV, and Qt as its core development libraries.
• Develops an interactive GUI using PyQt, featuring a video feed display area that captures real-time gestures.
• Using OpenCV, the model captures sign gestures through an HD camera, processes them, and stores them locally as JPG images.
• The system was distributed as a standalone executable file to ensure simplicity for users.
Drawbacks:
• Unable to sense dynamic signs; sensitive to background light and to the skin color of signers.
• Lack of diversified training data, which diminished accuracy.
• The sign language dictionary was limited, which restricted translation capabilities.
5.LITERATURE SURVEY
Paper 3: Static Sign Language Recognition Using Deep Learning, by Lean Karlo, Ronnie O, August C, 2019
Proposed Methodology:
• A CNN-based model for sign language recognition with real-time sign capture using Keras and a skin color modeling technique.
• A vision-based approach using Convolutional Neural Networks (CNNs) is employed for real-time recognition of static ASL gestures.
• Data collection involved capturing a diverse set of hand gesture images in a controlled environment.
• Images were then enhanced by resizing and converting from RGB to HSV color space for better skin detection.
• The system achieved an average accuracy of 90.04% for letter recognition.
Drawbacks:
• Static gesture focus: the current model is limited to recognizing static gestures.
• Recognition accuracy was affected by lighting; optimal conditions led to better performance.
• The system struggled with complex backgrounds, indicating that simpler backgrounds are essential for accurate recognition.
5.LITERATURE SURVEY
Paper 4: A Moroccan Sign Language Recognition Algorithm Using a Convolution Neural Network, by Nourdine Herbaz, Hassan El Idrissi, 2022
Proposed Methodology:
• Utilized a CNN and an image preprocessing based algorithm to classify single- and double-handed static sign language.
• Combines convolution and max pooling for real-time classification and localization.
• The system uses a database of 20 stored signs (letters & numbers) of Moroccan Sign Language.
• Input images captured from the webcam are first converted from RGB to grayscale.
• From the processed image, the hand gesture is extracted using binary image conversion techniques.
Drawbacks:
• The model is primarily effective for static signs, making it less suitable for dynamic or continuous gestures that require spatiotemporal analysis.
• Highly dependent on dataset quality; accuracy drops significantly when datasets vary from the norm.
5.LITERATURE SURVEY
Paper 5: A CNN sign language recognition system with single & double-handed gestures, by Neil Buckley, Lewis Sherret, 2022
Proposed Methodology:
• A webcam-based static sign language recognition system that uses a CNN architecture to translate BSL.
• Uses TensorFlow, Keras, and OpenCV for image capture and analysis.
• The design was implemented in the Python programming language.
• The graphical user interface was implemented using the OpenCV library.
• The testing dataset consisted of 2,375 images in total, i.e. 125 images per gesture.
Drawbacks:
• Limited to static signs; unable to recognize dynamic gestures, sequences, or motion.
• Sensitivity to hand positioning: minor deviations in hand position or orientation may result in misclassification.
• Lower recognition rate for signs outside the training dataset.
6.PROPOSED ARCHITECTURE
Block Diagram
7.DESCRIPTION
1.Data Collection Phase
Capturing Images:
This block initiates the data collection process by capturing images of hand gestures. A camera
or digital imaging device is used to acquire multiple frames of each sign language gesture,
ensuring the system captures sufficient visual data for each gesture.
To create a robust dataset, multiple images are often captured from slightly different angles,
lighting conditions, and backgrounds to account for variability in real-world conditions.
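As an illustration of this step, the sketch below captures sample frames for one gesture class with OpenCV; the camera index, key bindings, and output folder are assumptions, not fixed project choices.

# Capture sample frames for a single gesture class from a webcam.
# Camera index 0, the 'c'/'q' keys, and the output folder are illustrative assumptions.
import os
import cv2

label = "A"                                       # hypothetical gesture label
out_dir = os.path.join("dataset", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):                           # press 'c' to save the current frame
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count}.jpg"), frame)
        count += 1
    elif key == ord("q"):                         # press 'q' to stop capturing
        break
cap.release()
cv2.destroyAllWindows()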
Collecting the Images:
After capturing, the images are collected and organized into a structured dataset. This dataset is
typically labeled according to the corresponding gestures, which could be letters, numbers, or
specific signs in American Sign Language.
Organizing images into labeled classes (e.g., ‘A’, ‘B’, ‘1’, etc.) is crucial as it enables supervised
learning in the training phase.
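One common way to realize such labeling is a folder-per-class layout that Keras can read directly, as in the sketch below; the folder name, image size, and split parameters are illustrative assumptions.

# Load a folder-per-class dataset (dataset/A/*.jpg, dataset/B/*.jpg, ...)
# so that folder names become the supervised labels. Paths and sizes are assumed.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    color_mode="grayscale",
    image_size=(64, 64),
    batch_size=32,
    validation_split=0.2,
    subset="training",
    seed=42,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    color_mode="grayscale",
    image_size=(64, 64),
    batch_size=32,
    validation_split=0.2,
    subset="validation",
    seed=42,
)
print(train_ds.class_names)   # e.g. ['1', 'A', 'B', ...], taken from the folder names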
7.DESCRIPTION
Process the Gestures:
The preprocessing step involves several image processing techniques to prepare the images
for feature extraction (a combined sketch of these steps follows the list):
• Resizing: Ensures all images are of uniform dimensions, which helps the neural network
handle the images more effectively.
• Grayscale Conversion: Converts images from RGB to grayscale to reduce the
computational load by focusing on intensity values rather than color.
• Thresholding: A binary thresholding operation separates the hand gesture (foreground) from
the background by setting a threshold value, turning pixels above this threshold white and
those below it black. This segmentation highlights the hand's shape and reduces background
noise.
• Normalization: This step may also involve normalizing pixel values to bring all images to a
similar range, facilitating more consistent neural network performance.
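A combined sketch of these preprocessing steps using OpenCV is given below; the 64x64 target size and the threshold value of 127 are assumptions chosen for illustration.

# Resize, grayscale, threshold, and normalize one captured gesture image.
# The 64x64 size and the threshold of 127 are illustrative assumptions.
import cv2
import numpy as np

def preprocess(image_bgr):
    resized = cv2.resize(image_bgr, (64, 64))                      # uniform dimensions
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)               # intensity only, no color
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # hand (white) vs background (black)
    return binary.astype(np.float32) / 255.0                       # scale pixel values to [0, 1]

# Example usage on a stored sample (path is hypothetical):
# x = preprocess(cv2.imread("dataset/A/A_0.jpg"))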
7.DESCRIPTION
2. Database Phase
Reception of New Gestures:
• In this block, new hand gesture images (test data) are fed into the system for recognition.
These images are similar in structure to the training images, having undergone the same
preprocessing steps.
• This stage is critical for real-time applications where new gestures are continuously
introduced for interpretation.
Extraction of Gesture Characteristics:
• Feature extraction is a core process where the system identifies essential gesture features using
image processing and deep learning techniques.
• Edge Detection (e.g., using Sobel or Canny filters) and Contours Extraction can be applied
to emphasize the outline of the hand, capturing unique patterns like finger positions and hand
orientation; a sketch of this step follows below.
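A sketch of this outline-extraction step with OpenCV; the Canny thresholds (50, 150) and the largest-contour heuristic are illustrative choices, not project specifications.

# Emphasize the hand outline with Canny edges and extract the dominant contour.
# The Canny thresholds and the "largest contour = hand" heuristic are illustrative.
import cv2

def extract_hand_contour(gray_image):
    edges = cv2.Canny(gray_image, 50, 150)                      # edge map of the gesture
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)                   # keep the largest outline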
7.DESCRIPTION
Training/Testing Images:
• This block represents a repository where both training and testing images are stored in the database.
The training images contain labeled data that the Convolutional Neural Network (CNN) uses to
learn the patterns and associations between image features and specific gestures.
• Testing images, also preprocessed and labeled, are used to evaluate the model's accuracy and ability
to generalize to unseen data. The database ensures that the system has access to a large, diverse set of
images for both training and testing; a training sketch follows below.
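A minimal sketch of how these stored sets drive training and evaluation; it reuses the illustrative `model`, `train_ds`, and `val_ds` objects from the earlier sketches rather than defining new ones.

# Fit the CNN on the labeled training images and evaluate generalization on held-out data.
# `model`, `train_ds`, and `val_ds` refer to the earlier illustrative sketches.
history = model.fit(train_ds, validation_data=val_ds, epochs=10)   # epoch count is an assumption
test_loss, test_acc = model.evaluate(val_ds)
print(f"Held-out accuracy: {test_acc:.2%}")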
3. Recognition Phase
Make a Decision:
After processing and extracting features from a new gesture, the system uses a decision-making
algorithm (typically the CNN classifier) to determine the gesture’s identity.
7.DESCRIPTION
Identify Gestures:
This block is responsible for identifying the gesture by comparing the features of the new gesture image
with known patterns in the model. The model uses a classification algorithm to match the gesture to one
of the pre-defined categories.
For example, a Softmax layer at the end of the CNN outputs a probability distribution over all gesture
classes, and the class with the highest probability is selected as the identified gesture.
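A sketch of this decision step is given below: the softmax output assigns a probability to each class and argmax selects the identified gesture. The `model` and `preprocess` names refer to the earlier illustrative sketches, and the label list and file path are placeholders.

# Classify one new gesture image and pick the most probable class.
# `model` and `preprocess` come from the earlier sketches; labels/paths are placeholders.
import numpy as np
import cv2

class_names = [chr(ord("A") + i) for i in range(26)]       # assumed 26 letter classes

img = cv2.imread("new_gesture.jpg")                         # hypothetical incoming frame
x = preprocess(img)                                         # same preprocessing as training data
probs = model.predict(x[np.newaxis, ..., np.newaxis])[0]    # softmax probability per class
best = int(np.argmax(probs))
print(f"Identified gesture: {class_names[best]} (p={probs[best]:.2f})")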
8.TIMELINE
Dec 21 - Dec 26: Research ASL datasets; gather and preprocess data.
Dec 27 - Dec 31: Set up the development environment, libraries (TensorFlow,
OpenCV), and initial project structure.
Feb 11 - Feb 20: Conduct user testing, resolve errors, and optimize model
performance.
Feb 21 - Feb 29: Perform final tests and prepare the model for deployment.
10.REFERENCES
• H. B. D. Nguyen and H. N. Do, BIM Sign Language Translator Using Machine Learning (TensorFlow), 2022
• Nourdine Herbaz, Hassan El Idrissi and Abdelmajid Badri, A Moroccan Sign Language Recognition Algorithm Using a Convolution Neural Network, 2022
• Lean Karlo, Ronnie O, August C, Maria Abigail B. Pamahoy et al., Static Sign Language Recognition Using Deep Learning, 2019
• Neil Buckley, Lewis Sherret and Emanuele Lindo Secco, A CNN sign language recognition system with single & double-handed gestures, 2022
10.REFERENCES
• H. B. D. Nguyen and H. N. Do, Deep learning for American sign language fingerspelling recognition system, 2019
• J. P. Sahoo, A. J. Prakash and P. Plawiak, Real Time Hand Gesture Recognition using Fine-Tuned Convolution Neural Network, 2022
• J. A. Deja, P. Arceo, D. G. David, P. Lawrence, and R. C. Roque, “MyoSL: A Framework for measuring
usability of two-arm gestural electromyography for sign language,” in Proc. International Conference
on Universal Access in Human-Computer Interaction, 2018, pp. 146-159.
• G. Joshi, S. Singh, and R. Vig, Taguchi-TOPSIS based HOG parameter selection for complex background sign language recognition, 2020.
THANK YOU!