Project Report
on
ASL Sign Language Recognition System
Place: Delhi
Date: 19 May 2025
CERTIFICATE OF DECLARATION
This is to certify that the work embodied in the project thesis titled “American Sign Language
Recognition System” by Kartik Saini (2021UEA6504), Lakshaya Singh (2021UEA6511),
Deepak Kumar (2021UEA6522) and Yash Rajora (2021UEA6562) is the bonafide work of the
group, submitted to Netaji Subhas University of Technology for consideration in the 8th Semester
B.Tech. Project Evaluation.
The original research work was carried out by the team under my guidance and supervision in the
academic year 2024-2025. This work has not been submitted for any other diploma or degree of
any university. On the basis of the declaration made by the group, I recommend the project report
for evaluation.
(Professor)
ACKNOWLEDGEMENT
We would like to express our gratitude and appreciation to all those who made it possible
to complete this project. Special thanks to our project supervisor, Dr. R.K. Sharma, whose help,
stimulating suggestions and encouragement guided us in writing this report. We also sincerely
thank our colleagues for the time they spent proofreading and correcting our mistakes.
We would also like to acknowledge with much appreciation the crucial role of the staff of the
Department of Electronics and Communication Engineering, who gave us permission to use the
laboratory equipment, machines and all the necessary tools.
ABSTRACT
Sign language plays a vital role in enabling communication for individuals with hearing or speech
impairments. However, because sign language is not widely known or understood among the general
population, this form of communication often remains inaccessible, resulting in social and
communicative isolation for the deaf and mute community. This research presents a real-time sign language
to text conversion system that utilizes a combination of deep learning and computer vision techniques,
specifically Long Short-Term Memory (LSTM) networks integrated with a Temporal Attention mechanism.
The proposed system captures hand and body gestures using a webcam and processes them using MediaPipe,
an efficient real-time hand and pose landmark detection framework.
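The full extraction pipeline is detailed in Chapter 2. As an illustration only, the short Python sketch below shows one common way to obtain such per-frame keypoints with MediaPipe's Holistic solution and OpenCV; the choice of landmark subset, the zero-filling strategy, and all parameter values are assumptions made for this sketch, not the thesis's verified implementation.

import cv2
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    # Flatten pose and hand landmarks into one fixed-length vector;
    # absent landmarks are zero-filled so every frame yields 258 values
    # (33*4 pose + 21*3 per hand). Face landmarks are omitted here.
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    lh = (np.array([[p.x, p.y, p.z]
                    for p in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[p.x, p.y, p.z]
                    for p in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, lh, rh])

cap = cv2.VideoCapture(0)  # default webcam
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR frames.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = extract_keypoints(results)  # one frame's feature vector
cap.release()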
After extracting high-dimensional keypoints from video sequences, the LSTM model captures temporal
dependencies across frames, while the attention mechanism enhances performance by focusing on the most
significant time steps in each gesture sequence. The model was trained on a custom dataset of American Sign
Language (ASL) gestures, achieving exceptional accuracy in classification. Performance metrics such as
accuracy (99.7%), precision (97%), recall (96.5%), and F1 score (96.74%) confirm the model's robustness
and reliability.
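The architecture is specified layer by layer in Chapter 2 (see Table 1). The minimal TensorFlow/Keras sketch below illustrates only the general pattern of an LSTM followed by temporal attention, which scores each time step, softmax-normalizes the scores, and pools a weighted context vector; the sequence length, feature size, layer width and class count here are illustrative assumptions, not the thesis's exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm_attention(seq_len=30, n_features=258, n_classes=10):
    # Input: a sequence of per-frame keypoint vectors.
    inputs = layers.Input(shape=(seq_len, n_features))
    # return_sequences=True keeps per-time-step outputs for attention.
    x = layers.LSTM(128, return_sequences=True)(inputs)
    # Temporal attention: one scalar score per time step,
    # normalized across time with a softmax.
    scores = layers.Dense(1, activation="tanh")(x)   # (batch, T, 1)
    weights = layers.Softmax(axis=1)(scores)         # attention over time
    # Weighted sum of LSTM outputs yields a single context vector.
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])
    outputs = layers.Dense(n_classes, activation="softmax")(context)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model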
This system offers a non-intrusive, camera-based solution without the need for additional sensors or gloves,
ensuring usability and accessibility. Its real-time capability makes it ideal for deployment in educational,
healthcare, and public service settings. Moreover, the model architecture is scalable and can be expanded to
recognize additional signs and integrate with voice synthesis or chatbot systems for interactive
communication. This research contributes significantly to the field of assistive technology and paves the way
toward a more inclusive society.
INDEX
CANDIDATE(S) DECLARATION i
CERTIFICATE ii
ACKNOWLEDGMENTS iii
ABSTRACT iv
INDEX v
LIST OF FIGURES ix
LIST OF TABLES x
LIST OF ABBREVIATIONS
CHAPTER 1 1-4
INTRODUCTION AND LITERATURE REVIEW 1
1.1 Motivation 1
1.2 Key Challenges 2
1.3 Problem Addressed in the Thesis 2
1.4 Approach to the Problem and Organization of the Thesis 3
CHAPTER 2 5-10
MATHEMATICAL MODELING/EXPERIMENTAL METHODS AND MATERIALS 5
2.1 Dataset Creation 5
2.2 Feature Extraction using MediaPipe 6
2.3 Model Architecture: LSTM with Temporal Attention 8
2.4 Training and Real-Time Deployment 9
CHAPTER 3 10-12
RESULTS AND DISCUSSION 10
3.1 Performance Evaluation 10
3.2 Graphical Analysis 11
3.3 Discussion 12
3.4 Limitations 12
CHAPTER 4 14-
CONCLUSIONS AND SCOPE FOR FUTURE WORK 14
4.1 Task, Achievement and Possible Beneficiaries 15
4.2 Review of Contributions 16
4.3 Scope for Future Work 16
REFERENCES 19-20
APPENDIX
LIST OF FIGURES
1. American Sign Language
2. Image Preprocessing
3. Confusion Matrix
LIST OF TABLES
1. Layer-wise Breakdown of the Model Used
2. Software and Tools Used
3. Performance Metrics of the Proposed Model