Seminar Report
on
Deep learning-based sign language recognition system for static
signs
Submitted in partial fulfillment of the requirements for the VIII semester
Bachelor of Engineering
in
Computer Science and Engineering
of
Visvesvaraya Technological University, Belagavi.
by
Spoorthi S V
(1CD21CS158)
CERTIFICATE
Certified that Ms. Spoorthi S V, bearing USN 1CD21CS158, a bonafide student of Cambridge
Institute of Technology, has successfully completed the technical seminar entitled “Deep learning-
based sign language recognition system for static signs” in partial fulfillment of the
requirements for the VIII semester Bachelor of Engineering in Computer Science and Engineering
of Visvesvaraya Technological University, Belagavi, during the academic year 2024-2025. It is
certified that all corrections and suggestions indicated for Internal Assessment have been incorporated
in the report deposited in the departmental library. The seminar report has been approved as it
satisfies the academic requirements in respect of the technical seminar prescribed for the Bachelor of
Engineering degree.
DECLARATION
I, Spoorthi S V, a student of VIII semester BE, Computer Science and Engineering, Cambridge
Institute of Technology, hereby declare that the technical seminar entitled “Deep learning-based
sign language recognition system for static signs” has been carried out by me and submitted in
partial fulfillment of the course requirements of the VIII semester Bachelor of Engineering in
Computer Science and Engineering as prescribed by Visvesvaraya Technological University,
Belagavi, during the academic year 2024-2025.
I also declare that, to the best of my knowledge and belief, the work reported here does
not form part of any other report on the basis of which a degree or award was conferred on an
earlier occasion on this or any other student.
ACKNOWLEDGEMENT
I am grateful for the academic environment at CITech, without which this work would not have been possible.
I am extremely thankful to Dr. G. Indumathi, Principal, CITech, Bangalore, for providing the
academic ambience and everlasting motivation to carry out this work and for shaping my career.
I express my sincere gratitude to Dr. Shreekanth Prabhu M., HOD, Dept. of Computer Science
and Engineering, CITech, Bangalore, for his stimulating guidance, continuous encouragement, and
motivation throughout the course of the present work.
I also wish to extend my thanks to Ms. Vasumathi AK, Assistant Professor, Seminar Coordinator,
Dept. of CSE, CITech, Bangalore, for her critical and insightful comments, guidance, and constructive
suggestions to improve the quality of this work.
I also wish to extend my thanks to Mr. Arun P, Assistant Professor, Dept. of CSE, CITech, for his
guidance and valuable technical suggestions to complete my seminar.
I express my gratitude to the authors of the paper entitled “Deep learning-based sign language
recognition system for static signs”, which forms the base for this report.
Finally, I would like to express my deepest gratitude to my friends and classmates for their
unwavering support, especially in technical aspects. I’m also thankful to my faculty members for
their guidance and encouragement. Lastly, I extend my heartfelt thanks to my parents, whose
constant support and encouragement were my pillar of strength in completing this work.
Spoorthi S V
ABSTRACT
Sign language is an effective medium of communication for individuals with hearing and speech
impairments. With the rapid progress in computer vision, researchers are increasingly focusing on
developing automated sign language recognition systems to enhance accessibility. Traditional
approaches to Indian Sign Language (ISL) recognition often concentrate on a small set of distinct
static signs, limiting their applicability in real-world scenarios. This paper proposes a robust deep
learning-based system for the recognition of static ISL signs using Convolutional Neural Networks
(CNNs). A large and diverse dataset of 35,000 images representing 100 static signs was collected
from multiple users under varying conditions to ensure model robustness and generalizability. The
system's performance was extensively evaluated using around 50 CNN architectures and tested
with different optimizers to identify the most efficient configuration. The proposed model achieved
a remarkable training accuracy of 99.72% on colored images and 99.90% on grayscale images. In
addition to accuracy, evaluation metrics such as precision, recall, and F-score were used to validate
the system's reliability. The results demonstrate significant improvement over previous works,
which were limited to recognizing only a few hand signs. This research contributes to building a
more comprehensive and scalable sign language recognition system, paving the way for better
human-computer interaction and accessibility solutions.
CONTENTS
Abstract i
Contents ii
4.2 Challenges 10
4.2.1 Dataset Imbalance and Collection Difficulty 10
4.2.2 Variations in Lighting and Background 10
4.2.3 Overfitting in CNN Models 11
4.2.4 Computational Resource Constraints 11
4.2.5 Accuracy Trade-off in Similar Signs 11
Chapter 5 Real-World Applications 12
5.1 Deep learning in sign language recognition 12
5.1.1 Real-Time Sign-to-Text Conversion System 12
5.1.2 Education Tools for Learning Sign Language in Real Time 15
5.1.3 Assistive Communication Devices for Accessibility 13
5.1.4 Integration into Mobile and Web Application Platforms 13
5.1.5 Enhanced Customer Support Accessibility 13
5.1.6 Smart Classrooms and Inclusive Education 13
5.4 Challenges in Real-World Implementations 16
Conclusion 17
References 18
List of Figures
CHAPTER 1
INTRODUCTION
Deep learning, especially Convolutional Neural Networks (CNNs), offers powerful capabilities in
feature extraction and image classification. This motivates the development of a CNN-based system
that can handle varied hand shapes, lighting conditions, and user differences. The ultimate goal is to
bridge the communication gap and create a supportive tool that enhances accessibility and
independence for the hearing-impaired community.
CHAPTER 2
LITERATURE SURVEY
2.1 Introduction
Sign language recognition has evolved significantly with the introduction of machine learning
and deep learning techniques. Early research efforts primarily utilized traditional machine
learning algorithms, which required manual feature extraction and offered limited accuracy.
However, recent advances in deep learning, especially convolutional neural networks (CNNs),
have significantly improved the performance and accuracy of sign language recognition
systems.
2.5 Gesture Recognition Using Autoencoders and Deep Belief Networks [5]
Autoencoders and Deep Belief Networks (DBNs) have been explored for their capacity to
extract abstract and layered features from gesture images. Oyedotun and Khashman used
Stacked Denoising Autoencoders (SDAE) and CNNs for static ASL gestures, achieving
accuracies of 91.33% and 92.83% respectively. These models were trained on public gesture
databases and showed significant improvement over shallow learning models. DBNs were
particularly effective in learning hierarchical representations, while SDAEs helped denoise and
refine the input images before classification. These findings demonstrate that unsupervised
deep models can provide an effective alternative to standard CNNs. By fine-tuning the final
layers of these models, researchers were able to leverage powerful feature extractors without
needing extensive data.
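To make the autoencoder-based approach described above concrete, the following is a minimal Keras sketch of a stacked denoising autoencoder used as a feature extractor, assuming flattened 128×128 grayscale gesture images. The layer widths, noise level, and classifier head are illustrative assumptions, not the configuration reported by Oyedotun and Khashman.

```python
# Minimal stacked denoising autoencoder (SDAE) sketch for gesture images.
# Image size, layer widths, and noise level are illustrative assumptions.
from tensorflow.keras import layers, models

IMG_DIM = 128 * 128  # flattened grayscale gesture image


def build_sdae(noise_std=0.3):
    inputs = layers.Input(shape=(IMG_DIM,))
    # Corrupt the input so the encoder learns noise-robust features.
    noisy = layers.GaussianNoise(noise_std)(inputs)
    encoded = layers.Dense(1024, activation="relu")(noisy)
    encoded = layers.Dense(256, activation="relu")(encoded)
    decoded = layers.Dense(1024, activation="relu")(encoded)
    decoded = layers.Dense(IMG_DIM, activation="sigmoid")(decoded)
    autoencoder = models.Model(inputs, decoded)  # trained to reconstruct inputs
    encoder = models.Model(inputs, encoded)      # reusable feature extractor
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


def build_classifier(encoder, num_classes):
    # "Fine-tune the final layers": freeze the pretrained encoder and
    # train only a small softmax head on the encoded features.
    encoder.trainable = False
    clf = models.Sequential([
        encoder,
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    clf.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
    return clf
```

In this sketch, the autoencoder would first be fitted on unlabelled gesture images (reconstructing its own input), after which the classifier head is trained on the labelled set using the frozen encoder as a feature extractor.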
3.1.4 Server:
• The server acts as a central hub for managing model inference, handling client
requests, and interacting with the database.
• It stores prediction results and related data in the database; a minimal sketch of such a server is given below.
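The sketch below illustrates one way this server role could be realised, assuming a Flask web server, a saved Keras model file named isl_cnn.h5, 128×128 grayscale input, and a SQLite table for logged predictions. The endpoint name, model path, label list, and schema are illustrative assumptions, not the system’s actual implementation.

```python
# Minimal sketch of an inference server. The endpoint name, model path,
# input size, label list, and database schema are illustrative assumptions.
import sqlite3
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)
model = tf.keras.models.load_model("isl_cnn.h5")   # trained CNN classifier
LABELS = [str(i) for i in range(10)]                # placeholder label list


@app.route("/predict", methods=["POST"])
def predict():
    # Client sends a preprocessed 128x128 grayscale frame as a flat pixel list.
    pixels = np.array(request.json["pixels"], dtype=np.float32)
    frame = pixels.reshape(1, 128, 128, 1) / 255.0
    probs = model.predict(frame, verbose=0)[0]
    label = LABELS[int(np.argmax(probs))]
    # Persist the prediction so results can be reviewed later.
    with sqlite3.connect("predictions.db") as db:
        db.execute("CREATE TABLE IF NOT EXISTS results(label TEXT, conf REAL)")
        db.execute("INSERT INTO results VALUES (?, ?)",
                   (label, float(probs.max())))
    return jsonify({"label": label, "confidence": float(probs.max())})


if __name__ == "__main__":
    app.run(port=5000)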
3.3 Workflow
The system begins with the user performing a static or dynamic sign in front of the camera
on a mobile device. The camera captures the frame and applies preprocessing techniques like
background subtraction or grayscale transformation. The image is then sent to the server
where the CNN model processes the input through multiple layers including convolutional,
ReLU, pooling, and fully connected layers to classify the sign.
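As a concrete illustration of that layer stack, the following is a minimal Keras sketch of a CNN built from the layer types named above (convolution, ReLU, pooling, fully connected, softmax output). The filter counts, input size, and dropout rate are assumptions for illustration, not the exact architecture evaluated in the paper.

```python
# Minimal CNN sketch using the layer types named in the workflow.
# Filter counts, input size, and dropout rate are illustrative assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 100  # 100 static ISL signs


def build_cnn(input_shape=(128, 128, 1)):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),   # convolution + ReLU
        layers.MaxPooling2D((2, 2)),                    # pooling
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),           # fully connected
        layers.Dropout(0.5),                            # guards against overfitting
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


model = build_cnn()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```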
4.1 Methods
The proposed Sign Language Recognition System is implemented through four key phases: Data
Acquisition, Data Preprocessing, Model Training, and Testing. Each phase is crucial in
developing a robust and accurate CNN-based classifier for recognizing static Indian Sign
Language (ISL) signs. The system aims to bridge the communication gap for individuals with
hearing or speech impairments by translating visual gestures into meaningful text.
• These steps ensure consistency in image size and scale for CNN input.
• Preprocessed images are stored and used in both the training and testing phases (an illustrative preprocessing sketch follows below).
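The sketch referenced above shows the kind of preprocessing applied before CNN input using OpenCV: grayscale conversion, resizing to a fixed size, and pixel normalization. The 128×128 target size is an assumption for illustration.

```python
# Illustrative preprocessing sketch: grayscale conversion, resizing to a
# fixed CNN input size, and pixel normalization. Target size is assumed.
import cv2
import numpy as np


def preprocess(image_path, size=(128, 128)):
    img = cv2.imread(image_path)                      # BGR image from disk
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # reduce to one channel
    resized = cv2.resize(gray, size)                  # consistent input size
    normalized = resized.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    return normalized.reshape(size[0], size[1], 1)    # add channel axis for CNN
```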
4.1.5 Testing
• The model is evaluated by testing it with unseen data after training.
• Around 50 different CNN models with various optimizers are tested (see the sketch after this list).
• Testing helps in identifying the model with the best performance.
• Fine-tuning is done by adjusting parameters to enhance accuracy.
• The best-performing model is finalized for deployment.
• Testing validates the system’s effectiveness in recognizing static ISL signs.
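The sketch referenced in the list above shows one way a comparison of optimizer configurations on held-out data could be organised, in the spirit of evaluating many CNN/optimizer combinations. The candidate optimizers, epoch count, and batch size are illustrative assumptions, and build_cnn refers to a model-construction helper such as the one sketched earlier.

```python
# Sketch of comparing candidate optimizers on held-out data. The candidate
# list, epoch count, and batch size are illustrative assumptions.
import tensorflow as tf


def compare_optimizers(build_cnn, x_train, y_train, x_val, y_val):
    candidates = {
        "sgd": tf.keras.optimizers.SGD(learning_rate=0.01),
        "adam": tf.keras.optimizers.Adam(),
        "rmsprop": tf.keras.optimizers.RMSprop(),
    }
    results = {}
    for name, opt in candidates.items():
        model = build_cnn()                       # fresh model for each trial
        model.compile(optimizer=opt,
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
        _, acc = model.evaluate(x_val, y_val, verbose=0)
        results[name] = acc                       # validation accuracy per optimizer
    best = max(results, key=results.get)
    return best, results
```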
4.2 Challenges
During the development of the system, we faced several challenges such as collecting a balanced and
diverse dataset, managing variations in lighting and background during image capture, and avoiding
overfitting in CNN models.
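A common way to address the lighting-variation and overfitting issues noted above is light data augmentation combined with dropout (the model described in this report already includes a dropout layer). The following is a minimal Keras sketch; the augmentation ranges are illustrative assumptions, not the settings used in this work.

```python
# Sketch of overfitting mitigation: light data augmentation to complement
# the dropout layer in the CNN. Ranges are illustrative assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,            # small rotations tolerate hand-angle variation
    width_shift_range=0.1,        # shifts mimic imperfect hand centering
    height_shift_range=0.1,
    brightness_range=(0.8, 1.2),  # simulates lighting differences
    validation_split=0.2,         # hold out part of the data for validation
)

# Usage (x_train: NxHxWx1 float array, y_train: one-hot labels):
# train_gen = augmenter.flow(x_train, y_train, batch_size=32, subset="training")
# val_gen = augmenter.flow(x_train, y_train, batch_size=32, subset="validation")
# model.fit(train_gen, validation_data=val_gen, epochs=20)
```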
Real-time tuning and debugging were delayed due to limited hardware availability. Handling large
image batches sometimes caused memory overflow or system lag. These constraints slowed down
experimentation and development cycles significantly.
CONCLUSION
The proposed Deep Learning-Based Sign Language Recognition System efficiently recognizes
static signs from Indian Sign Language using a customized Convolutional Neural Network (CNN).
The system achieves high accuracy by extracting spatial features through convolutional, pooling,
ReLU, and fully connected layers. A modular architecture with front-end components like cameras
and back-end components such as a trained CNN model, server, and database ensures smooth
functioning. The dataset, collected under varied environmental conditions, enhances the system’s
robustness. Preprocessing techniques like background subtraction and image resizing further
improve recognition performance. Metrics like precision, recall, and F1-score confirm the system's
reliability. The dropout layer prevents overfitting, ensuring better generalization. The web camera-
based interface allows real-time interaction, making the system accessible and user-friendly. The
CNN model is trained to classify digits, alphabets, and common words in ISL with high
confidence. The system helps bridge communication gaps for hearing or speech-impaired
individuals. Its scalable design allows future extensions such as dynamic sign recognition and
integration with mobile platforms. The hybrid architecture supports multi-class classification and
efficient data handling. With nearly 50 model variations tested, the chosen configuration shows
superior performance. This sign recognition system represents a valuable application of AI in
assistive technology, making communication more inclusive.
REFERENCES
[1] Corballis MC (2003) From mouth to hand: gesture, speech and the evolution of right-handedness.
Behav Brain Sci 26(2):199–208.
[2] Oyedotun OK, Khashman A (2017) Deep learning in vision-based static hand gesture recognition.
Neural Comput Appl 28(12):3941–3951.
[3] Nagi J, Ducatelle F, Di Caro GA, Cireşan D, Meier U, Giusti A, Gambardella LM (2011) Max-pooling
convolutional neural networks for vision-based hand gesture recognition. In: IEEE
international conference on signal and image processing applications (ICSIPA), pp 342–347.
[4] Huang J, Zhou W, Li H, Li W (2015) Sign language recognition using 3D convolutional neural
networks. In: IEEE international conference on multimedia and expo (ICME), pp 1–6.
[5] Arora, S.; Roy, A. Recognition of sign language using image processing. Int. J. Bus. Intell. Data
Min. 2018, 13, 163–176.
[6] Lin, H.; Hong, X.; Wang, Y. Object Counting: You Only Need to Look at One. arXiv 2021,
arXiv:2112.05993.
[7] Rioux-Maldague L, Giguere P (2014) Sign language fingerspelling classification from depth and
color images using a deep belief network. In: IEEE Canadian conference on computer and robot
vision (CRV), pp 92–97.
[8] Dhulipala, S.; Adedoyin, F.F.; Bruno, A. Sign and Human Action Detection Using Deep Learning.
J. Imaging 2022, 8, 192.
[9] Alvarez-Estevez, D.; Rijsman, R.M. Inter-database validation of a deep learning approach for automatic
sleep scoring. PLoS ONE 2021, 16, e0256111.
[10] Kaluri, R.; Pradeep Reddy, C.H. Sign gesture recognition using modified region growing algorithm
and Adaptive.
[11] A.M.; Kamel, A.E.; Slim, S.O.; Abdallah, M.S.; Cho, Y.I. MediaPipe’s Landmarks with RNN for
Dynamic Sign Language Recognition. Electronics 2022, 11, 3228.
[12] Dang, C.N.; Moreno-García, M.N.; De La Prieta, F. Hybrid Deep Learning Models for Sentiment
Analysis. Complexity 2021, 2021, 9986920.
[13] Aly, S.; Aly, W. DeepArSLR: A novel signer-independent deep learning framework for isolated
arabic sign language gestures recognition. IEEE Access 2020, 8, 83199–83212.
[14] Huang, Y.; Huang, J.; Wu, X.; Jia, Y. Dynamic Sign Language Recognition Based on CBAM with
Autoencoder Time Series Neural Network. Mob. Inf. Syst. 2022, 2022, 3247781.
[15] Mekala, P.; Gao, Y.; Fan, J.; Davari, A. Real-time sign language recognition based on neural
network architecture. In Proceedings of the 2011 IEEE 43rd Southeastern Symposium on System
Theory, Auburn, AL, USA, 14–16 March 2011; pp. 195–199.
[16] Al-Shaheen, A.; Çevik, M.; Alqaraghuli, A. American Sign Language Recognition using YOLOv4
Method. Int. J. Multidiscip. Stud. Innov. Technol. 2022, 6, 61.
[17] Kothadiya, D.; Bhatt, C.; Sapariya, K.; Patel, K.; Gil-González, A.B.; Corchado, J.M. Deepsign:
Sign Language Detection and Recognition Using Deep Learning. Electronics 2022, 11, 1780.
[18] Gunji, B.M.; Bhargav, N.M.; Dey, A.; Zeeshan Mohammed, I.K.; Sathyajith, S. Recognition of
Sign Language Based on Hand Gestures. J. Adv. Appl. Comput. Math. 2022, 8, 21–32.
[19] Agarwal, S.R.; Agrawal, S.B.; Latif, A.M. Sentence Formation in NLP Engine on the Basis of
Indian Sign Language using Hand Gestures. Int. J. Comput. Appl. 2015, 116, 18–22.