
SIGN LANGUAGE TRANSLATION

USING DEEP LEARNING

GUIDED BY:
Dr. K Satyanarayana
Designation : Associate Professor

Team Members:
G. Bhagyalakshmi : 20B81A1236
H. Eswar Pranav Nadh : 20B81A1239
K. Anand Jagadeesh : 20B81A1262
G. Harini : 20B81A1233
CONTENTS
 Abstract
 Introduction
 Existing system
 Disadvantages
 Proposed system
 Advantages
 System Requirements
 Problem statement
 Literature survey
 Methodology
ABSTRACT
Communication plays a crucial role in understanding others and
building relationships. However, it can be challenging for people to
communicate with those who are deaf and vocally impaired, especially when
not everyone is familiar with sign language. To address this gap, our research
explores the application of deep learning techniques for Sign Language
Translation (SLT). This study proposes a deep learning model that uses the
power of neural networks to automatically translate sign language gestures
into written language, facilitating effective communication between sign
language users and those unfamiliar with signing. The potential impact of this
research extends beyond individual interactions, as the proposed deep
learning-based SLT system has the capacity to enhance accessibility in various
sectors, such as education, healthcare, and public services.
INTRODUCTION
In a world that thrives on communication, the diverse
ways in which individuals express themselves should be
recognized and supported. Unfortunately, many people
with hearing impairments face challenges in effectively
communicating with the broader community due to the
limited understanding of sign language. To address this
issue and promote inclusivity, our Sign Language
Conversion Project aims to develop a groundbreaking
solution that facilitates seamless communication
between individuals who use sign language and those
who rely on spoken language.
LITERATURE SURVEY
Throughout our research, we came across a number of publications focusing on translation systems for deaf and vocally impaired people, as well as their numerous components and methodologies.
 [1] The first paper demonstrates a practical and meaningful system that can detect hand gestures for the digits 0-9 and the alphabets A-Z.
 [2] The second paper presents a transcriptor: it analyses images, locates and segments the hands, and then classifies what the hand gestures mean, effectively turning hand gestures in photos into words we can read.
 [3] The third paper describes a sign language recognition system built with Convolutional Neural Networks and machine vision. Images of hand signs are segmented into smaller parts and analysed, and images captured under different conditions are resized to a uniform scale so the hand movements are easier to recognize.
EXISTING SYSTEM

Several prevalent models primarily rely on Convolutional Neural Networks (CNNs) and are commonly trained on image datasets to achieve high accuracy in detecting sign language letters. However, the drawback of these models is the considerable time and resource investment required during the training process.
DISADVANTAGES
Existing models mostly use Convolutional Neural Networks (CNNs). While CNNs are powerful for image-related tasks, including sign language detection, they have several disadvantages for sign language translation:
 Limited Temporal Understanding
 Fixed Input Size
 Inability to Capture Sequential Context
 Resource Intensive
 They can only detect letters

PROPOSED SYSTEM

Our proposed system combines MediaPipe and a deep learning model, specifically an LSTM model, to identify sign language gestures and consistently predict words to construct sentences, going beyond individual letters. This holds the potential to bridge the communication divide between deaf and vocally impaired individuals and others.
ADVANTAGES
Utilizing LSTM (Long Short-Term Memory)
models for sign language detection and
translation offers several advantages:
 Sequential Information Handling
 Memory Retention
 Handling Variable-Length Sequences
 Real-time Processing
 Reduced Vanishing Gradient Problem
 LSTMs can detect letters along with words

SYSTEM REQUIREMENTS
Software Requirements
 Google Colab
 Jupyter Notebook
Hardware Requirements
• Camera
• System with graphics card
• 4GB RAM
• Intel Core i3 Processor
Tech Stack Requirements
 Python
 Deep Learning
PROBLEM STATEMENT
 It can be challenging for people to communicate with those who are deaf and vocally
impaired, especially when not everyone is familiar with sign language.
 Existing models primarily concentrate on recognizing individual sign language letters, lacking
the capability to seamlessly track actions across continuous frames and construct complete
words.
 A substantial amount of data is necessary for detecting complete words, yet there is a scarcity of datasets containing sequences of images that represent complete words in sign language. Typically, sign language words are composed of actions captured across a sequence of image frames. Recognizing them requires a continuous flow of data as input, which existing models cannot handle because they operate only on a fixed input size.
METHODOLOGY
 Importing dependencies
 Detecting landmarks using MP Holistic
 Collect key point values for training and testing
 Preprocess data and create labels
 Build and train LSTM neural network
 Evaluation using confusion matrix and accuracy score
 Predict in real time
IMPORTING DEPENDENCIES
 Here we import the required dependencies for our project, such as:
 Numpy
 Mediapipe
 Matplotlib
 Tensorflow==2.12.0
 Sklearn
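The snippet below is a minimal sketch of these imports; cv2 (OpenCV) is an added assumption, since the later webcam steps rely on it.

```python
# Minimal sketch of the project imports; cv2 (OpenCV) is an added assumption
# for webcam capture used in the later steps.
import numpy as np
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score
```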
DETECTING LANDMARKS
USING MP HOLISTIC
 It turns on your webcam.
 It sets up MediaPipe Holistic to recognize where a person is in the video.
 It processes each frame of the video from the webcam.
 For each frame, it estimates where the person is and how they are positioned.
 It draws landmark dots and connecting lines on the video to mark the person's key body parts.
 It displays this annotated video on your screen.
 It keeps doing this until you press the 'q' key to stop (a sketch of this loop follows).
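A minimal sketch of this webcam loop, using the standard MediaPipe Holistic solution; the window name and the 0.5 confidence thresholds are assumptions.

```python
# Hedged sketch of the webcam loop: MediaPipe Holistic detects the person in
# each frame, and the detected landmarks are drawn before the frame is shown.
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                              # turn on the webcam
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB
        results = holistic.process(rgb)
        # draw the detected pose and hand landmarks on the frame
        mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                  mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.left_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.right_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        cv2.imshow('MediaPipe Holistic', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):         # press 'q' to stop
            break
cap.release()
cv2.destroyAllWindows()
```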
COLLECT KEY POINT
VALUES FOR TRAINING
AND TESTING
 Training sign actions involve defining an array of sign
actions to be trained.
 For each sign action, a separate folder is created,
and a loop is executed to process each sign
individually.
 Within each sign folder, 30 sequences are recorded,
with each sequence consisting of 10 frames.
 For each frame, we gather x, y, and z coordinates for the hand landmarks, while for the body pose we collect x, y, and z coordinates along with visibility. If no landmarks are detected, arrays of zeros are returned. These key points are combined into a single array and saved for each sequence, as sketched below.
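A sketch of the key point extraction, assuming the MediaPipe Holistic results from the previous step; the landmark counts follow MediaPipe (33 pose, 21 per hand), and the Data/<action>/<sequence>.npy layout is a hypothetical choice.

```python
# Hedged sketch of key point extraction: pose landmarks carry visibility,
# hand landmarks do not, and missing detections become arrays of zeros.
import os
import numpy as np

def extract_keypoints(results):
    # pose: x, y, z and visibility for each of 33 landmarks
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    # each hand: x, y, z for 21 landmarks (zeros when that hand is undetected)
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, lh, rh])   # 132 + 63 + 63 = 258 values per frame

# Per sequence, the 10 per-frame arrays are stacked and saved together
# (hypothetical folder layout):
# sequence_keypoints = np.stack(frames)                # shape (10, 258)
# np.save(os.path.join('Data', action, str(sequence)), sequence_keypoints)
```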
PREPROCESS DATA AND CREATE LABELS
AND FEATURES
 Data preprocessing is an essential step in developing a deep learning model. It involves preparing and cleaning the data so that the deep learning algorithms can work effectively.
 We loop through all frames of each sequence, progressively appending the data of each frame to the data of the previous frames. Likewise, we append the data of each sequence of an action to the data of the previous sequences, forming an individual array for each action containing all its sequences.
 X represents the array of sequences, while y represents the label of each action.
 The data is split into training and testing sets, with a test size of 5% (see the sketch below).
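A hedged sketch of this step, assuming the per-sequence arrays saved earlier; the example action words and the folder layout are hypothetical, while the 30 sequences per action, 10 frames per sequence, and 5% test size follow the slides.

```python
# Load per-sequence key point arrays into X, build one-hot labels y, split 95/5.
import os
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

actions = np.array(['hello', 'thanks', 'iloveyou'])       # example action words
label_map = {label: num for num, label in enumerate(actions)}

sequences, labels = [], []
for action in actions:
    for sequence in range(30):                            # 30 sequences per action
        window = np.load(os.path.join('Data', action, f'{sequence}.npy'))
        sequences.append(window)                          # shape (10, 258) per sequence
        labels.append(label_map[action])

X = np.array(sequences)                                   # (num_sequences, 10, 258)
y = to_categorical(labels).astype(int)                    # one-hot label per sequence
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
```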
BUILD AND TRAIN LSTM
NEURAL NETWORK
 The defined model architecture consists of
multiple LSTM layers followed by dense layers,
culminating in a SoftMax output layer.
 The model is compiled using the Adam optimizer
and categorical cross-entropy loss function, with
categorical accuracy as the evaluation metric.
 Training is performed for 100 epochs on the
training data.
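A minimal sketch of such an architecture; the layer sizes are assumptions, while the stacked LSTM layers, dense layers, softmax output, Adam optimizer, categorical cross-entropy loss, and 100 epochs follow the slides (10 frames of 258 key points per sequence follow the earlier sketches).

```python
# Stacked LSTM layers followed by dense layers and a softmax output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

num_actions = 3   # number of sign words, e.g. len(actions)

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(10, 258)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(num_actions, activation='softmax')   # one probability per sign word
])

model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=100)        # X_train, y_train from the previous step
```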
EVALUATION USING CONFUSION
MATRIX AND ACCURACY SCORE
 The model's predictions are generated for the test data.
 The true labels of the test data are extracted.
 Confusion matrices are computed for each class, depicting the model's performance across different classes.
 The accuracy of the model's predictions is calculated and presented as a single scalar value.
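A short sketch of this evaluation, assuming the `model`, `X_test`, and `y_test` objects from the previous steps.

```python
# Per-class confusion matrices and a single accuracy score on the test set.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

yhat = model.predict(X_test)
ytrue = np.argmax(y_test, axis=1).tolist()        # true class indices
ypred = np.argmax(yhat, axis=1).tolist()          # predicted class indices

print(multilabel_confusion_matrix(ytrue, ypred))  # one 2x2 matrix per class
print(accuracy_score(ytrue, ypred))               # single scalar accuracy value
```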
TEST IN REAL TIME
 In real-time testing, input data, such as video
streams capturing sign language gestures, is
continuously fed into the trained model for
prediction without the luxury of batch processing
or prior knowledge of the entire dataset.
 Real-time predictions involve processing each frame, or each sequence of frames, individually to produce a prediction, as sketched below.
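A hedged sketch of the real-time loop, assuming `model`, `actions`, `extract_keypoints`, and `mp_holistic` from the earlier sketches; the rolling window of 10 frames matches the training sequences, while the 0.8 confidence threshold is an assumption.

```python
# Keep a rolling window of the last 10 frames of key points and predict from it.
import cv2
import numpy as np

sequence, sentence = [], []
threshold = 0.8                                            # assumed confidence cut-off

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        sequence = sequence[-10:]                          # keep only the last 10 frames
        if len(sequence) == 10:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:            # accept confident predictions only
                word = actions[np.argmax(res)]
                if not sentence or sentence[-1] != word:   # avoid repeating the same word
                    sentence.append(word)
        cv2.putText(frame, ' '.join(sentence[-5:]), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.imshow('Real-time prediction', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```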
CODING & EXECUTION
THANK YOU
