A5 Batch
Resource Intensive
Memory Retention
Real-time Processing
Tech stack requirements
• Camera
• System with a graphics card
• 4GB RAM
• Intel Core i3 Processor
• Python
• Deep Learning
PROBLEM STATEMENT
It can be challenging for people to communicate with those who are deaf or vocally
impaired, especially since not everyone is familiar with sign language.
Existing models primarily concentrate on recognizing individual sign language letters; they lack
the capability to seamlessly track actions across continuous frames and construct complete
words.
A substantial amount of data is necessary to address the problem of detecting words, yet
there is a scarcity of datasets containing sequences of images that represent complete words
in sign language. Sign language words are typically composed of actions captured across a
sequence of image frames. Recognizing them requires a continuous flow of data as input,
which existing models cannot handle because they operate only on a limited input size.
METHODOLOGY
This capture loop keeps running until the 'q' key is pressed to stop it.
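A minimal sketch of such a capture loop using OpenCV; the window name and the 10 ms wait time are illustrative choices, not taken from the project code.

```python
import cv2

cap = cv2.VideoCapture(0)                        # open the default camera
while cap.isOpened():
    ret, frame = cap.read()                      # grab one frame per iteration
    if not ret:
        break
    cv2.imshow('Sign capture', frame)            # show the live feed
    if cv2.waitKey(10) & 0xFF == ord('q'):       # stop when 'q' is pressed
        break
cap.release()
cv2.destroyAllWindows()
```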
COLLECT KEY POINT VALUES FOR TRAINING AND TESTING
Training the sign actions involves defining an array of the sign actions to be trained.
For each sign action, a separate folder is created, and a loop processes each sign individually.
Within each sign folder, 30 sequences are recorded, with each sequence consisting of 10 frames.
For each frame, we gather the x, y, and z coordinates of the hand landmarks, while for the
body pose we collect the x, y, and z coordinates along with visibility. If no landmarks are
detected, arrays of zeros are returned. These key points are combined into a single array and
saved for each sequence.
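A minimal sketch of this keypoint extraction, assuming the landmarks come from MediaPipe Holistic (33 pose landmarks with visibility, 21 landmarks per hand); the function name and the save path are illustrative, not the project's exact code.

```python
import numpy as np

def extract_keypoints(results):
    # Body pose: 33 landmarks, each with x, y, z and visibility (zeros if undetected)
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    # Hands: 21 landmarks each, with x, y, z (zeros if undetected)
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    # Combine into a single array, saved per frame of each recorded sequence
    return np.concatenate([pose, lh, rh])

# e.g. np.save('MP_Data/hello/0/0.npy', extract_keypoints(results))  # hypothetical path
```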
PREPROCESS DATA AND CREATE LABELS AND FEATURES
Data preprocessing is an essential step in developing a deep learning model. It involves
preparing and cleaning the data so that the deep learning algorithms can work effectively.
We loop through all frames of each sequence, progressively appending the data of each frame
to the data of the previous frames. Likewise, we append the data of each sequence of an
action to the data of the previous sequences, forming an individual array for each action
containing all of its sequences.
X represents the array of sequences, while Y represents the label of each action.
The data is then split into training and testing sets, with a test size of 5%.
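A minimal sketch of this preprocessing step, continuing from the keypoint sketch above; the action names and the data directory are hypothetical placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

actions = np.array(['hello', 'thanks', 'please'])        # hypothetical sign actions
label_map = {action: idx for idx, action in enumerate(actions)}

sequences, labels = [], []
for action in actions:
    for seq in range(30):                                # 30 sequences per action
        window = []
        for frame_num in range(10):                      # 10 frames per sequence
            window.append(np.load(f'MP_Data/{action}/{seq}/{frame_num}.npy'))
        sequences.append(window)                         # add sequence to the action's data
        labels.append(label_map[action])

X = np.array(sequences)                                  # features: (num_sequences, 10, keypoints)
y = to_categorical(labels).astype(int)                   # one-hot label per action

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
```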
BUILD AND TRAIN LSTM NEURAL NETWORK
The model architecture consists of multiple LSTM layers followed by dense layers,
culminating in a softmax output layer.
The model is compiled using the Adam optimizer
and categorical cross-entropy loss function, with
categorical accuracy as the evaluation metric.
Training is performed for 100 epochs on the
training data.
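A minimal sketch of such an architecture; the layer sizes are illustrative, and the input shape assumes 10-frame sequences of 258 keypoint values (33 pose landmarks × 4 + 2 × 21 hand landmarks × 3).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

num_actions = 3   # number of sign actions being trained (illustrative)

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(10, 258)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(num_actions, activation='softmax'),            # one probability per sign action
])

model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

model.fit(X_train, y_train, epochs=100)                  # X_train, y_train from the split above
```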
EVALUATION USING CONFUSION MATRIX AND ACCURACY SCORE
The model's predictions are generated for the
test data.
The true labels of the test data are extracted.
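A minimal sketch of this evaluation, continuing from the sketches above and using scikit-learn's confusion matrix and accuracy score.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

# Convert one-hot vectors back to class indices before scoring
y_hat = np.argmax(model.predict(X_test), axis=1)         # predictions on the test data
y_true = np.argmax(y_test, axis=1)                       # true labels of the test data

print(multilabel_confusion_matrix(y_true, y_hat))        # one confusion matrix per action
print(accuracy_score(y_true, y_hat))                     # overall accuracy score
```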