
BE Project Group Number 19:

1] Nigade Prathamesh
2] Dhaytonde Yash
3] Shendage Dipali
4] Jadhav Bhagyashri

Sign Language Conversion to Text and Audio for Divyang Communication
Developed as a Microservice

Bridging silence with technology, empowering communication for all.

Guide: Mrs. Madhuri Potey, HOD, Computer Department
Problem Statement

Develop a microservice solution that bridges the communication gap for individuals with hearing and speech disabilities (Divyang) by translating sign language gestures into real-time text and audio. This system aims to enhance accessibility and inclusivity in daily interactions across settings such as education, healthcare, and social environments.
Proposed Work
1. Sign Language Recognition System:
Develop a robust computer vision-based system to capture and interpret hand gestures and body movements. The
system will leverage machine learning models, such as convolutional neural networks (CNNs) or deep learning-based
action recognition techniques, to accurately recognize sign language gestures from video input.

2. Text and Audio Conversion:
Once gestures are recognized, the system will convert them into corresponding text. This text will then be transformed into audio using text-to-speech (TTS) technology, providing a dual output—visual (text) and auditory (speech)—to cater to diverse communication needs.

3. Microservice Architecture:
The solution will be developed as a cloud-based microservice to ensure scalability and platform independence. This approach will allow seamless integration with various web and mobile applications while ensuring low-latency performance for real-time communication (a minimal endpoint sketch follows this list).

4. Multilingual and Multi-dialect Support:
The system will be designed to support multiple sign languages and regional dialects, allowing adaptability to different linguistic contexts. Machine learning models will be trained on datasets that cover various sign language dialects to ensure accuracy across different regions and communities.
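
To make point 3 concrete, here is a minimal sketch of what the translation endpoint could look like. FastAPI, the /translate route, and the helper functions recognize_gesture and synthesize_speech are illustrative assumptions, not the final design.

```python
# Minimal sketch of the translation microservice (assumed stack: FastAPI + Uvicorn;
# python-multipart is needed for file uploads). recognize_gesture() and
# synthesize_speech() are hypothetical placeholders for the gesture-recognition
# and TTS modules described in this proposal.
import base64

import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="Sign Language Translation Service")

def recognize_gesture(frame: np.ndarray) -> str:
    """Placeholder: run the trained gesture-recognition model on one frame."""
    raise NotImplementedError

def synthesize_speech(text: str) -> bytes:
    """Placeholder: convert text to audio bytes with the TTS module."""
    raise NotImplementedError

@app.post("/translate")
async def translate(frame: UploadFile = File(...)):
    # Decode the uploaded image into an OpenCV BGR array.
    data = np.frombuffer(await frame.read(), dtype=np.uint8)
    image = cv2.imdecode(data, cv2.IMREAD_COLOR)

    text = recognize_gesture(image)
    audio = synthesize_speech(text)

    # Return the recognized text and the audio as base64 so the response stays JSON.
    return {"text": text, "audio_b64": base64.b64encode(audio).decode("ascii")}
```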
Literature Review
1. Deep Sign: Deep Learning Based Indian Sign Language Recognition System
   IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.3016407
   Relevance: Provides insights into CNN architectures and training methodologies for Indian Sign Language (ISL) recognition. Emphasizes dataset diversity, which informs our approach to gesture data collection and annotation. Highlights challenges like signing variations for optimized model accuracy.

2. A Real-Time Continuous American Sign Language Recognition System Based on Deep Learning
   IEEE Transactions on Multimedia, vol. 22, no. 7, 2020, doi: 10.1109/TMM.2019.2957468
   Relevance: Discusses real-time recognition of continuous ASL with CNNs and RNNs. Offers techniques for handling continuous signing, enhancing our system's capability to handle varied signing speeds and styles. Insights on optimizing model performance in real time align with our project's accuracy goals.

3. Gesture Recognition for Sign Language: A Comparative Study of Machine Learning Algorithms
   IEEE Transactions on Human-Machine Systems, vol. 50, no. 2, 2020, doi: 10.1109/THMS.2020.2974376
   Relevance: Compares different algorithms for gesture recognition, helping us select the best approach for our module. Algorithm performance insights guide optimization for accuracy and robustness, crucial for effective Divyang communication.

4. A Hybrid Approach for Sign Language Translation Using CNNs and NLP
   IEEE Access, vol. 9, 2021, doi: 10.1109/ACCESS.2021.3059849
   Relevance: Proposes a hybrid approach combining CNNs and NLP, directly relevant to our text-to-speech capabilities. The integration of gesture recognition with text/audio processing informs our design for accurate and natural-sounding audio outputs, enhancing user experience.

5. Sign Language Recognition: A Review and Future Directions
   IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 5, 2021, doi: 10.1109/TNNLS.2020.3034001
   Relevance: Provides a comprehensive overview of current challenges in sign language recognition, such as accuracy and real-time processing. Guides our efforts to develop a robust, responsive system, with recommendations for areas of improvement to support Divyang communication.
Mathematical Model
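
As a rough sketch grounded in the Algorithms slides that follow, the core classification and detection objectives can be written as below; the loss weights λ and exact notation are assumptions for illustration.

```latex
% Gesture classification: softmax over the CNN's class scores z
% (see "Feature Extraction with CNN").
P(y = c \mid x) = \frac{e^{z_c}}{\sum_{j} e^{z_j}}

% Gesture detection: YOLO-style loss combining bounding-box localization,
% objectness, and classification terms (the weights \lambda are tunable assumptions).
\mathcal{L} = \lambda_{box}\,\mathcal{L}_{box}
            + \lambda_{obj}\,\mathcal{L}_{obj}
            + \lambda_{cls}\,\mathcal{L}_{cls}
```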
System Architecture
Algorithms

• Gesture Detection (YOLO)
  • Goal: Detect hand gestures in real time from video frames.
  • Process: Divides the image into a grid, predicting bounding boxes and confidence scores for each cell. Optimizes a loss based on bounding box accuracy, objectness score, and classification error.

• Feature Extraction with CNN
  • Goal: Extract specific features of detected gestures.
  • Process: Uses convolution operations to create feature maps, followed by activation and pooling. Applies Softmax at the output layer for gesture classification probabilities.
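
A rough sketch of these two stages, assuming an off-the-shelf YOLO detector (Ultralytics) to localize hands and a small PyTorch CNN with a softmax head to classify the cropped regions; the weights file, class count, and layer sizes are illustrative, not the trained models.

```python
# Sketch of the detection + classification stages (assumed libraries: ultralytics, torch).
# The YOLO weights, class count, and CNN layout are illustrative placeholders.
import torch
import torch.nn as nn
from ultralytics import YOLO

NUM_GESTURES = 26               # assumption: one class per alphabet sign

detector = YOLO("yolov8n.pt")   # generic pretrained detector; would be fine-tuned on hand data

class GestureCNN(nn.Module):
    """Small CNN: convolution -> activation -> pooling, softmax probabilities at the output."""
    def __init__(self, num_classes: int = NUM_GESTURES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 input crops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)
        return torch.softmax(self.classifier(z), dim=1)         # gesture class probabilities

def classify_frame(frame, cnn: GestureCNN):
    """Detect hand regions in a BGR frame, then classify each resized crop."""
    results = detector(frame, verbose=False)[0]
    gestures = []
    for x1, y1, x2, y2 in results.boxes.xyxy.int().tolist():
        crop = frame[y1:y2, x1:x2]
        crop = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        crop = nn.functional.interpolate(crop, size=(64, 64))
        probs = cnn(crop)
        gestures.append(int(probs.argmax(dim=1)))
    return gestures
```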
Algorithms

• Gloss Sequence to Text Translation (LLM)
  • Goal: Translate gloss sequences into natural language.
  • Process: Uses embeddings and attention to weigh glosses, optimizing for maximum likelihood in decoding to coherent sentences.

• Text-to-Speech Conversion (TTS)
  • Goal: Convert text to audio for feedback.
  • Process: Converts text to mel spectrograms, then synthesizes waveforms with a neural vocoder, generating realistic speech from the spectrogram.
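
A minimal sketch of these two stages. A generic pretrained seq2seq model (t5-small via Hugging Face transformers) stands in for a gloss-to-text translator that would be fine-tuned on gloss/sentence pairs, and the offline pyttsx3 engine stands in for the mel-spectrogram + neural-vocoder pipeline described above; the model name and prompt are assumptions.

```python
# Sketch of gloss sequence -> text -> speech (assumed libraries: transformers, pyttsx3).
# "t5-small" is a stand-in for a model fine-tuned on gloss/sentence pairs, and pyttsx3
# replaces the mel-spectrogram + neural-vocoder pipeline for simplicity.
import pyttsx3
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def gloss_to_text(glosses: list[str]) -> str:
    """Translate a gloss sequence (e.g. ['ME', 'GO', 'SCHOOL']) into a sentence."""
    inputs = tokenizer("translate gloss to English: " + " ".join(glosses),
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def speak(text: str) -> None:
    """Convert the translated sentence to audible speech."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    sentence = gloss_to_text(["HELLO", "MY", "NAME", "WHAT"])
    speak(sentence)
```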
Module Wise Algorithms

1. Input Capture (Video Frame Collection):
   1. Start the video stream and capture frames continuously.
   2. Store captured frames in a buffer for further processing.
2. Feature Extraction (Gesture Identification):
   1. Use a CNN model to extract key features from each frame.
   2. Generate feature vectors representing hand gestures.
3. Gesture Classification:
   1. Feed feature vectors into a classification model.
   2. Predict the corresponding gesture with the highest confidence.
4. Text and Audio Conversion:
   1. Map the recognized gesture to a corresponding word/phrase.
   2. Convert the mapped text into audio using the TTS module.
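
Tying the four modules together, a minimal capture-and-process loop could look like the sketch below; OpenCV is assumed for video capture, and the recognition and translation functions are hypothetical placeholders for the models described on the earlier slides.

```python
# Sketch of the module-wise pipeline: capture -> recognition -> text -> audio.
# OpenCV is assumed for the webcam stream; recognize_gestures() and
# translate_and_speak() are placeholders for the models sketched earlier.
from collections import deque

import cv2

FRAME_BUFFER = deque(maxlen=30)   # roughly 1 second of frames at 30 fps (assumption)

def recognize_gestures(frame) -> list[str]:
    """Placeholder: detection + CNN classification of one frame (Modules 2 and 3)."""
    raise NotImplementedError

def translate_and_speak(glosses: list[str]) -> None:
    """Placeholder: map gestures to words and voice them with TTS (Module 4)."""
    raise NotImplementedError

def run_pipeline() -> None:
    cap = cv2.VideoCapture(0)               # Module 1: start the video stream
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            FRAME_BUFFER.append(frame)      # buffer frames for further processing

            glosses = recognize_gestures(frame)
            if glosses:
                translate_and_speak(glosses)

            cv2.imshow("frame", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to stop
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

if __name__ == "__main__":
    run_pipeline()
```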
Data Flow

Video Frames → Region Detection → Feature Extraction → Gloss Detection → Gloss Conversion → Text → Audio
UML Diagrams
Use Case Diagram
Sequence Diagram
Class Diagram
Test Cases
Conclusion and Future Scope

Conclusion:
The development of a microservice for converting sign language into text and audio is a significant
step toward enhancing communication for Divyang individuals. By utilizing computer vision,
machine learning, and text-to-speech technology, the system facilitates real-time, accessible
communication, fostering inclusion in various sectors like healthcare, education, and public services.
Future Scope:
The system can be expanded to support more regional sign languages and dialects, improving
accuracy through larger, more diverse datasets. Additionally, integration with wearable devices like
smart glasses and advancements in gesture recognition technology can make the solution more
portable and widely applicable in everyday scenarios.
Thank You!!
