Development of American Sign Language to Text Converter using Machine Learning
A Real-Time Sign Language to Text Converter Using Convolutional Neural Networks

AUTHORS
Mashyaat Shafia Chowdhury
Tazriyan Sayeed
Saaquib Rahman
Barsha Talukdar

SUPERVISOR
Dr. K.M.A. Salam
Associate Professor
Department of Electrical and Electronic Engineering
North South University
INTRODUCTION

In today's interconnected world, communication barriers hinder effective interactions between American Sign Language (ASL) users and those who do not use it, limiting access to information, education, and employment opportunities for deaf and hard-of-hearing individuals. This project addresses these challenges by developing an ASL-to-text converter using machine learning techniques, particularly Convolutional Neural Networks (CNNs). By translating ASL gestures into text in real time, the technology aims to enhance accessibility, inclusivity, and participation for the deaf community in various settings, including education, healthcare, employment, and social interactions.
ABSTRACT

Disabled people, comprising roughly 15% of the global population, often rely on support for daily needs. Deaf and mute individuals, many of whom are unable to read or write, communicate via sign language. This project introduces a novel system to translate sign language into text and speech. Using samples of the 26 American Sign Language (ASL) alphabet signs, the system captures hand gestures from a live feed and predicts labels using models such as CNN, Faster R-CNN, and YOLO, with the CNN providing the highest accuracy. Trained on 15,600 hand sign images captured with a Logitech webcam, the system achieves accurate real-time detection of the ASL alphabet.
ANALYSIS

The Convolutional Neural Network (CNN) model used in this project is designed to effectively process and recognize American Sign Language (ASL) gestures. The model architecture includes:

Convolutional Layers: Extract features from the input images using 32, 64, and 128 filters with ReLU activation.
Pooling Layers: Reduce dimensionality and computational complexity.
Flattening and Dense Layers: Convert the feature maps into a single vector and process it through fully connected layers.
Output Layer: A softmax layer translates the features into probabilities over the 26 ASL alphabet classes.

The CNN model was trained on a dataset of 15,600 images and achieved high accuracy, with optimizations such as the Adam optimizer and dropout layers to prevent overfitting. The CNN architecture is illustrated below.
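For concreteness, the following is a minimal Keras sketch consistent with the architecture described above. The filter counts (32, 64, 128), ReLU activations, dropout, and the 26-class softmax output come from the poster, and the 100x100 grayscale input matches the preprocessing step; the kernel sizes, pooling windows, and dense-layer width are assumptions, not the authors' exact code.

    # Minimal sketch of the CNN described above (not the authors' exact code).
    # Kernel sizes, pooling windows, and the dense width of 128 are assumptions.
    from tensorflow.keras import layers, models

    def build_model():
        model = models.Sequential([
            layers.Input(shape=(100, 100, 1)),          # 100x100 grayscale input
            layers.Conv2D(32, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(128, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),                           # feature maps -> vector
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),                        # dropout to curb overfitting
            layers.Dense(26, activation="softmax"),     # 26 ASL alphabet classes
        ])
        return model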
RESULTS/FINDINGS

The results of the project demonstrated exceptional performance, with the Convolutional Neural Network (CNN) model achieving an accuracy rate of 99.94% in sign language recognition. This high accuracy underscores the model's proficiency in learning intricate patterns from the training data and its potential for robust performance on unseen data. The system, trained on a dataset of 15,600 ASL hand sign images, performed effectively in real-time applications, accurately detecting and translating ASL gestures into text. Although there was a slight decline in accuracy when the model was integrated into the GUI, the system continued to function well, maintaining its reliability and effectiveness.

CONCLUSION

In conclusion, our Sign Language Translation System significantly advances communication for the hearing-impaired community by achieving a remarkable 99.94 percent accuracy rate in sign language recognition. This project underscores our commitment to empowering deaf individuals to engage more fully in various aspects of life. While we have navigated challenges such as data quality and model architecture, our journey does not end here: there remains room for further enhancements, promising an even more efficient and impactful system in the future. Our continuing efforts aim to bridge communication gaps and enrich the quality of life of those who communicate through sign language.

METHODOLOGY
Data Collection: Images of American Sign Language (ASL) hand gestures were captured using a Logitech webcam. A dataset of 15,600 images, with 600 images for each of the 26 ASL letters, was created.
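As an illustration of this step, a capture script along the following lines could be used. This is a hypothetical sketch: the key binding, folder layout, and file naming are assumptions, while the 600-images-per-letter count comes from the poster.

    # Hypothetical capture script: saves webcam frames for one ASL letter.
    import os
    import cv2

    LETTER = "A"                                # letter currently being recorded
    OUT_DIR = os.path.join("dataset", LETTER)   # assumed folder layout
    os.makedirs(OUT_DIR, exist_ok=True)

    cap = cv2.VideoCapture(0)                   # webcam (device 0)
    count = 0
    while count < 600:                          # 600 images per ASL letter
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("capture", frame)
        if cv2.waitKey(1) & 0xFF == ord("s"):   # press 's' to save a frame
            cv2.imwrite(os.path.join(OUT_DIR, f"{LETTER}_{count}.jpg"), frame)
            count += 1
    cap.release()
    cv2.destroyAllWindows()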
Data Preprocessing: The captured images were converted to grayscale and resized to 100x100 pixels for uniformity. The preprocessed images were stored in a designated folder for further use.
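A minimal sketch of this preprocessing step, assuming the folder layout from the capture sketch above; only the grayscale conversion and the 100x100 resize come from the poster.

    # Hypothetical preprocessing step: grayscale + resize to 100x100.
    import os
    import cv2

    SRC_DIR, DST_DIR = "dataset", "preprocessed"    # assumed paths
    for letter in sorted(os.listdir(SRC_DIR)):
        os.makedirs(os.path.join(DST_DIR, letter), exist_ok=True)
        for name in os.listdir(os.path.join(SRC_DIR, letter)):
            img = cv2.imread(os.path.join(SRC_DIR, letter, name))
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # to grayscale
            small = cv2.resize(gray, (100, 100))            # 100x100 pixels
            cv2.imwrite(os.path.join(DST_DIR, letter, name), small)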
Model Training: A Convolutional Neural Network (CNN) model was chosen for its high accuracy in image recognition tasks. The model architecture included several convolutional layers, pooling layers, and dense layers, as detailed in the ANALYSIS section. The CNN was trained using the preprocessed dataset, achieving near-perfect accuracy.
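Compiling the model sketched under ANALYSIS with the Adam optimizer named in the poster might look as follows; the categorical cross-entropy loss is an assumed (standard) choice for 26-class classification, not something the poster specifies.

    # Compile the CNN from the ANALYSIS sketch; Adam comes from the poster,
    # the loss function is an assumed standard choice.
    model = build_model()
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])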
Testing and Validation: The dataset was split into training (80%) and testing (20%) sets. The model's performance was evaluated on the testing set to ensure accuracy and generalization.
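A hedged sketch of this split-train-evaluate step, assuming X holds the preprocessed images scaled to [0, 1] with shape (N, 100, 100, 1) and y holds one-hot labels over the 26 classes; the epoch count, batch size, and random seed are assumptions, while the 80/20 split comes from the poster.

    # Hypothetical 80/20 split, training, and evaluation.
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)       # 80% train / 20% test
    model.fit(X_train, y_train, epochs=10, batch_size=32,
              validation_data=(X_test, y_test))     # epochs/batch assumed
    loss, acc = model.evaluate(X_test, y_test)
    print(f"Test accuracy: {acc:.4f}")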
Implementation: The trained model was integrated into a system that captures hand gestures in real time, predicts the corresponding ASL letter, and converts it into text.
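A minimal sketch of such a real-time loop, reusing the trained model from the steps above. Classifying the full frame and the A-Z index-to-letter mapping are assumptions; a deployed system would likely crop a hand region first.

    # Hypothetical real-time loop: grab a frame, preprocess it exactly as in
    # training, and overlay the predicted letter; hand-region cropping omitted.
    import string
    import cv2
    import numpy as np

    LETTERS = string.ascii_uppercase                # class index -> letter A-Z
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        x = cv2.resize(gray, (100, 100)).astype("float32") / 255.0
        probs = model.predict(x.reshape(1, 100, 100, 1), verbose=0)
        letter = LETTERS[int(np.argmax(probs))]
        cv2.putText(frame, letter, (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("ASL to text", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):       # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()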
GUI Development: A user-friendly graphical interface was created to display the
recognized ASL gestures as text, allowing users to interact with the system seamlessly.
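The poster does not name the GUI toolkit, so purely as one possibility, a Tkinter sketch of an interface that accumulates recognized letters into displayed text might look like this.

    # Hypothetical Tkinter GUI: a label that accumulates recognized letters.
    # Toolkit choice and update mechanism are assumptions; the poster only
    # states that recognized gestures are displayed as text.
    import tkinter as tk

    root = tk.Tk()
    root.title("ASL to Text")
    text_var = tk.StringVar(value="")
    tk.Label(root, textvariable=text_var,
             font=("Helvetica", 24)).pack(padx=20, pady=20)

    def on_letter(letter):
        """Append a recognized letter to the displayed text."""
        text_var.set(text_var.get() + letter)

    # The recognition loop would call on_letter(...) for each prediction,
    # e.g. scheduled via root.after(...) so the GUI stays responsive.
    root.mainloop()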
Optimization: Techniques such as the Adam optimizer and dropout layers were employed to reduce overfitting and improve the model's robustness.

SYSTEM DIAGRAM

(Figure: system diagram of the ASL-to-text converter.)

ACKNOWLEDGEMENT

We extend our heartfelt gratitude to everyone who contributed to the successful completion of this project, including our peers, friends, and everyone else who provided valuable constructive criticism and first-hand feedback, resulting in a significantly enriched research experience. Above all, we deeply acknowledge and appreciate our supervisor, Dr. K.M.A. Salam, for his unwavering support, insightful guidance, and invaluable feedback throughout this research. His expertise and encouragement have been instrumental in shaping the direction and ultimate success of our project.