Sign Speak: Recognizing Sign Language With Machine Learning
Integrating GRU and 3D Convolutional Neural Networks (CNNs) is crucial to address the temporal dynamics and spatial dependencies inherent in sign language communication. The challenge lies in capturing the nuanced movements and expressions within sign language gestures, ensuring accurate translation into text or speech. By leveraging deep learning models and computer vision algorithms, SignSpeak aims to achieve real-time and precise interpretation of sign language, promoting accessibility and inclusion for the deaf and hard of hearing community. The project seeks to overcome existing limitations in sign language recognition systems by advancing state-of-the-art machine learning approaches. Evaluation metrics such as F1 score, accuracy, recall, and AUC-ROC are employed to assess the performance of the predictive models and verify their precision. SignSpeak aims to revolutionize communication accessibility for the deaf and hard of hearing population, contributing to a more inclusive society through technological innovation.

Objective:
The objective of SignSpeak is to develop a robust machine learning system capable of accurately recognizing and interpreting sign language gestures in real time. By integrating GRU and 3D Convolutional Neural Networks, the project aims to address the temporal dynamics and spatial dependencies inherent in sign language communication. The system will provide seamless translation of sign language into text or speech, fostering accessibility and inclusion for the deaf and hard of hearing community. SignSpeak seeks to advance existing sign language recognition technology by leveraging deep learning models and computer vision algorithms. The project aims to achieve high accuracy and reliability in interpreting a wide range of sign language gestures. Additionally, SignSpeak aims to create a user-friendly platform that can be easily accessed and utilized by both individuals fluent in sign language and those unfamiliar with it. Ultimately, the objective is to break down communication barriers and promote equal participation and engagement for all individuals, regardless of their hearing abilities.

II. LITERATURE SURVEY

The literature survey in the domain of sign language recognition spans several years, each marked by significant advancements in deep learning, computer vision, and gesture recognition techniques. Beginning in 2018, researchers delved into the application of deep learning and computer vision for recognizing sign language gestures, paving the way for subsequent studies. In 2019, a focus on deep learning-based approaches emerged, showcasing promising results in sign language identification. The year 2020 saw the development of systems tailored for recognizing static signs, underscoring the practical applications of sign language recognition technology. Real-time interpretation systems gained traction in 2021, addressing the need for seamless communication between individuals who are deaf or hard of hearing and those who are hearing. Finally, in 2022, researchers explored wearable devices like gloves for capturing and interpreting sign language gestures, offering innovative solutions to gesture recognition challenges. This literature survey provides a comprehensive overview of the evolution of sign language recognition techniques over the past few years, highlighting key advancements and research trends in the field.

In 2018, significant progress was made in deep learning and computer vision techniques for sign language recognition, as evidenced by works such as "American Sign Language Recognition using Deep Learning and Computer Vision" by K. Bantupalli and Y. Xie. This study explored the application of deep learning methods to recognize American Sign Language gestures, laying the groundwork for subsequent research in this area.

In 2019, Lean Karlo S. Tolentino et al. proposed a novel approach to sign language identification using deep learning, as detailed in "Sign language identification using Deep Learning." This work contributed to the growing body of literature on deep learning-based approaches for sign language recognition, demonstrating promising results and opening up new avenues for research. Moving into 2020, Ankita Wadhawan and Parteek Kumar presented a deep learning-based sign language recognition system for static signs. This study highlighted the importance of static sign recognition in practical applications and showcased the potential of deep learning techniques to achieve accurate and efficient recognition of sign language gestures.

In 2021, there was a growing emphasis on real-time sign language interpretation systems, building on work such as the "Real Time Sign Language Interpreter" presented by Geethu G Nath and Arun C S at the 2017 International Conference on Electrical, Instrumentation and Communication Engineering (ICEICE2017). This research addressed the need for systems capable of interpreting sign language gestures in real time, enabling seamless communication between individuals who are deaf or hard of hearing and those who are hearing.

Finally, in 2022, researchers such as Cabrera, Maria et al. continued to explore gesture recognition systems, with their work on a "Glove-Based Gesture Recognition System." This study investigated the use of wearable devices such as gloves for capturing and interpreting sign language gestures, offering a hands-on approach to gesture recognition technology.

Existing System:
The existing system employs a combination of Bidirectional Long Short-Term Memory (BiLSTM) networks and Convolutional Neural Networks (CNNs) to tackle tasks such as action recognition and gesture detection in sign language videos. BiLSTM networks are adept at capturing long-range dependencies within sequential data, making them well suited for modeling the temporal dynamics present in video sequences. CNNs, on the other hand, are particularly effective at extracting spatial features from image frames, enabling the identification of discriminative patterns crucial for recognizing gestures. By integrating these two architectures, the system can leverage
both temporal and spatial information, thereby enhancing its ability to perform robustly in gesture recognition and action classification tasks. However, despite the advantages of this hybrid approach, several challenges persist. BiLSTM networks may encounter difficulties in capturing highly complex temporal dependencies, which can limit their effectiveness, particularly when applied to large-scale video datasets. Similarly, while CNNs excel at extracting spatial features, they may struggle to model the long-range temporal relationships inherent in sign language videos, requiring extensive preprocessing to extract relevant features effectively. Addressing these challenges is crucial for further improving the system's performance and advancing the field of sign language recognition.

Existing System Architecture
[Figure: Architecture of MSP-NET]
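For illustration, the sketch below shows one common way a per-frame CNN can be combined with a BiLSTM for clip-level gesture classification in Keras. The clip size, layer widths, and number of classes are assumed values for the example and are not the configuration of MSP-NET.

# Illustrative sketch of a CNN + BiLSTM style model as described above.
# Frame size, layer widths, and class count are assumptions, not the
# published MSP-NET configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 64, 64, 3   # assumed clip size
NUM_CLASSES = 26                                       # assumed gesture vocabulary

# A small 2D CNN that turns a single frame into a feature vector.
frame_encoder = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(HEIGHT, WIDTH, CHANNELS)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
])

model = models.Sequential([
    # Apply the frame encoder to every frame of the clip (spatial features).
    layers.TimeDistributed(frame_encoder,
                           input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    # The bidirectional LSTM models temporal dependencies across frames.
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])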
Proposed System:
The proposed system introduces a novel architecture combining 3D Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to address the limitations of the existing approach. 3D CNNs extend traditional CNNs by incorporating an additional dimension, time, allowing them to capture both spatial and temporal features directly from video data. This enhancement enables more effective modeling of the intricate temporal dynamics present in sign language videos. By leveraging 3D CNNs, the proposed system aims to overcome the challenges associated with capturing long-range temporal relationships, which were difficult for the existing CNN and BiLSTM based approach to model.
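As an illustration of this design, the sketch below feeds a small 3D CNN front end into a GRU classifier in Keras. The clip dimensions, filter counts, and number of gesture classes are assumptions made for the example rather than the project's final settings.

# Minimal sketch of a 3D CNN + GRU model; all shapes and hyperparameters
# below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 64, 64, 3   # assumed clip size
NUM_CLASSES = 26                                       # assumed gesture vocabulary

model = models.Sequential([
    # 3D convolutions learn joint spatial and temporal (motion) features.
    layers.Conv3D(32, (3, 3, 3), activation="relu",
                  input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    layers.Conv3D(64, (3, 3, 3), activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    # Pool away the spatial axes per frame so the GRU receives a
    # (time steps, features) sequence.
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    # The GRU models longer-range temporal dependencies across the clip.
    layers.GRU(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()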
III. METHODOLOGY
Employ techniques such as early stopping and learning rate scheduling to improve training efficiency and convergence.

Import the Libraries:
The libraries required are NumPy, Pandas, Matplotlib, TensorFlow, Seaborn, Scikit-learn (sklearn), and Keras, along with the Keras utilities ImageDataGenerator and ReduceLROnPlateau.
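The snippet below gathers the imports listed above and shows one way the early stopping and learning rate scheduling mentioned earlier can be configured through Keras callbacks; the augmentation settings, patience values, and monitored metric are assumptions for the example.

# Imports corresponding to the libraries listed above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Augment and rescale image frames during training (illustrative settings).
train_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                   rotation_range=10,
                                   zoom_range=0.1,
                                   validation_split=0.2)

# Early stopping and learning rate scheduling to improve convergence;
# patience values and the monitored quantity are assumed, not prescribed.
callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6),
]

# These callbacks would then be passed to model.fit(..., callbacks=callbacks).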
Basic Architecture:
The Multilayer Perceptron (MLP) architecture is a type of feedforward artificial neural network commonly used for
supervised learning tasks, including regression and classification. It consists of multiple layers of interconnected neurons, each
performing specific operations on the input data. Here's a breakdown of the key components of the MLP architecture:
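As a minimal sketch of these components, an MLP classifier can be written in Keras as follows; the input dimension, hidden layer sizes, and class count are placeholder assumptions rather than values specified by the project.

# Minimal MLP sketch for a classification task; all sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

INPUT_DIM = 63      # assumed size of the flattened input feature vector
NUM_CLASSES = 26    # assumed number of output classes

mlp = models.Sequential([
    # Input layer: receives the flattened feature vector.
    layers.Input(shape=(INPUT_DIM,)),
    # Hidden layers: fully connected neurons with non-linear activations.
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    # Output layer: one neuron per class with softmax probabilities.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

mlp.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"])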