
Volume 9, Issue 8, August 2024
International Journal of Innovative Science and Research Technology
ISSN No: 2456-2165 | https://doi.org/10.38124/ijisrt/IJISRT24AUG347

Sign Language Detection Using Machine Learning


Sahilee Misal
Department of Computer Engineering,
Terna College of Engineering,
Navi Mumbai, Maharashtra, India

Prof. Ujwala Gaikwad
Department of Computer Engineering,
Terna College of Engineering,
Navi Mumbai, Maharashtra, India

Abstract:- This paper investigates the application of machine learning for sign language detection. The objective is to develop a model that translates sign language into spoken language, bridging the communication gap between deaf and hearing individuals. You Only Look Once (YOLO), a deep learning object detection algorithm, is employed to train a model on a dataset of labeled sign language images derived from video data. The system achieves real-time sign detection in videos. However, challenges include the scarcity of large, labeled datasets and the inherent ambiguity of certain signs, which can lead to reduced detection accuracy. This research contributes to the field of Assistive Technologies (AT) by promoting accessibility and social inclusion for the deaf community.

Keywords:- Sign Language Detection, Machine Learning, CNN, YOLO, Artificial Intelligence (AI), American Sign Language (ASL), Indian Sign Language (ISL).

I. INTRODUCTION

Sign language serves as a vital mode of communication for deaf and hard-of-hearing individuals, enabling them to express themselves and participate actively in society. It utilizes hand gestures, facial expressions, and body posture to convey meaning. However, a significant communication barrier exists between the deaf community and those who rely on spoken languages. This gap hinders social interaction, educational opportunities, and overall inclusion for deaf individuals.

One of the major challenges in bridging this communication gap lies in sign language detection: automatically recognizing signs from visual data, such as videos. Accurate detection forms the foundation for further applications like sign language translation, which would enable seamless communication between deaf and hearing individuals. However, sign language detection presents several complexities.

Firstly, sign languages exhibit a high degree of variation across geographical regions. While American Sign Language (ASL) dominates North America, Indian Sign Language (ISL) is the primary sign language used in India. These languages possess distinct vocabulary and grammar, requiring detection models to be tailored to specific sign languages.

Secondly, the inherent ambiguity of certain signs can lead to confusion for detection algorithms. Signs with similar hand shapes or movements can be misinterpreted, especially when considering variations in signing styles and backgrounds. Additionally, the dynamic nature of sign language, involving both static postures and motion components, adds another layer of complexity.

Existing research has explored various techniques for sign language detection, often employing machine learning algorithms, particularly Convolutional Neural Networks (CNNs). These algorithms excel at image recognition tasks, making them well-suited for analyzing visual data like sign language videos. However, prior research often faces limitations such as the reliance on large, curated datasets, which can be expensive and time-consuming to acquire. Moreover, these datasets may not accurately reflect real-world signing variations.

This research aims to address the limitations of existing approaches by developing a sign language detection model using a simpler algorithm and readily available real-world data. The model prioritizes practicality for daily life communication, focusing on a subset of commonly used signs. By achieving accurate sign detection in real-time scenarios, this research aspires to empower deaf individuals and foster their integration into mainstream society, thereby reducing the language barrier and promoting social inclusion.

II. LITERATURE REVIEW

Sign language detection using machine learning has emerged as a promising field for bridging the communication gap between deaf and hearing individuals. This section delves into the current state of research, exploring successful methodologies, recent advancements, and persistent challenges.

Several machine learning approaches have been employed for sign language detection, each with its strengths and limitations. Support Vector Machines (SVMs) offer robust classification capabilities but can struggle with the high-dimensional data often encountered in sign language recognition tasks [1]. Hidden Markov Models (HMMs) excel at capturing temporal information in sign language sequences but may struggle with complex variations in signing styles [2].


Convolutional Neural Networks (CNNs) have revolutionized sign language detection in recent years. Their ability to automatically learn feature representations from visual data proves highly effective for recognizing hand shapes and postures in sign language videos [3]. Research by Ji et al. [3] demonstrates the successful application of a CNN architecture, achieving high accuracy in sign language detection tasks.

However, current research also faces limitations. The scarcity of large, labeled datasets for sign languages remains a significant hurdle. Additionally, the inherent ambiguity of certain signs, coupled with variations in signing styles and backgrounds, continues to challenge the accuracy of detection models [4]. Furthermore, existing research often focuses on laboratory settings, raising questions about the generalizability of models to real-world scenarios with uncontrolled environments [5].

III. RELATED WORK

Sign language detection using machine learning has witnessed significant advancements in recent years. This section delves into previous research, exploring various approaches, datasets, evaluation metrics, challenges, and potential areas for future exploration.

A. Approaches and Techniques
Early research in sign language detection explored techniques like hand gesture recognition using image processing and feature extraction algorithms [6]. These methods achieved moderate success but struggled with complex variations in hand shapes and backgrounds. Subsequently, computer vision-based methods emerged, utilizing techniques like background subtraction and motion detection to isolate hand regions in video frames [7]. However, these methods lacked robustness in handling cluttered environments and rapid hand movements.

The rise of deep learning revolutionized sign language detection. Convolutional Neural Networks (CNNs) excel at learning feature representations from visual data, proving highly effective in recognizing hand shapes and postures in sign language videos [3]. Research by Ji et al. [3] demonstrates the successful application of a CNN architecture for sign language detection tasks. Additionally, Recurrent Neural Networks (RNNs) have shown promise in capturing the temporal dynamics of sign language sequences, particularly for sign language recognition tasks involving continuous signing [8].

Multimodal approaches that combine visual and linguistic features have also garnered interest. These approaches leverage the complementary nature of visual data (hand gestures) and linguistic information (sign meaning) to enhance detection accuracy [9]. For instance, integrating information from hand motion trajectories alongside sign glosses (written representations of signs) can offer a richer representation for detection models.

B. Datasets and Evaluation Metrics
The performance of sign language detection systems heavily relies on the quality and characteristics of the datasets used for training and evaluation. Commonly used datasets include RWTH-PHOENIX-Weather [10], American Sign Language (ASL) Lemmist [11], and Chinese Sign Language (CSL) Corpus [12]. These datasets vary in size, diversity of signs included, and annotation quality. Ideally, datasets should be extensive, encompass a broad range of signs, and possess high-quality annotations for accurate training and evaluation.

Evaluating sign language detection systems typically involves metrics like accuracy, precision, recall, and F1 score. Accuracy measures the overall percentage of signs correctly detected. Precision reflects the proportion of detected signs that are truly correct, while Recall indicates the percentage of actual signs that are successfully identified. F1 score provides a balanced measure of precision and recall. Additionally, recognition speed is a crucial metric, particularly for real-time applications, as it reflects the time taken by the system to detect signs in video frames.
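To make these definitions concrete, the following Python sketch computes precision, recall, and F1 from raw detection counts; the counts used are hypothetical examples, not results reported in this paper:

import math

def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts for one sign class: 90 correct detections,
# 10 spurious detections, 15 missed signs.
p, r, f1 = detection_metrics(tp=90, fp=10, fn=15)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.900 recall=0.857 f1=0.878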
C. Challenges and Limitations
Despite significant progress, several challenges continue to hinder sign language detection research. Variability in sign language poses a major obstacle. Sign languages exhibit regional variations in vocabulary, grammar, and signing styles, necessitating models that can adapt to specific dialects or languages.

Data scarcity remains another significant challenge. The creation of large, well-annotated sign language datasets requires significant resources and expertise. Limited access to diverse datasets can restrict the generalizability of detection models.

Model complexity can also pose a challenge. Deep learning models, while powerful, often require substantial computational resources for training and inference. This can limit their deployment on resource-constrained devices.

Furthermore, ensuring real-world applicability remains an ongoing concern. Sign language detection systems need to function robustly in uncontrolled environments with varying lighting conditions, backgrounds, and signing speeds.

D. Comparison of Approaches and Future Directions
Traditional hand gesture recognition and computer vision-based methods, while offering a foundation for sign language detection, have limitations in handling complex variations. Deep learning approaches, particularly CNNs, have emerged as the dominant force due to their ability to learn complex feature representations from visual data. However, challenges persist regarding data scarcity, model complexity, and real-world applicability.

Future research directions hold immense potential. Exploring transfer learning techniques can leverage pre-trained models on large datasets to adapt to specific sign languages with limited data. Additionally, incorporating elements of explainable AI into detection models can provide valuable insights into how signs are recognized, fostering trust and transparency. Furthermore, research on federated learning techniques can enable distributed training on decentralized datasets, potentially addressing data privacy concerns and promoting wider collaboration.

E. Addressing Existing Work Limitations
Existing research often relies on curated datasets, which can be expensive and time-consuming to acquire. This research addresses that limitation by utilizing readily available real-world data: we captured videos of people performing sign language gestures and converted them into labeled data. This approach offers a more practical and cost-effective solution for data acquisition.

Furthermore, this research focuses on a simpler algorithm, You Only Look Once (YOLO), for real-time sign detection. While complex deep learning models achieve high accuracy, they can be computationally expensive. YOLO offers a balance between performance and efficiency, making it suitable for real-time applications. Additionally, by mapping signs directly to words rather than displaying each individual letter, this research aims to save time, potentially improving user experience and communication efficiency.

IV. PROPOSED APPROACH

This section details the proposed approach for sign language detection using a YOLO-based model trained on real-world sign language video data.

A. System Architecture:
The system architecture can be visualized as follows:

Fig 1: Machine Learning System Architecture Flowchart

● Data Acquisition: Real-world sign language video data is collected from various sign language training websites. These videos are then converted into individual image frames.
● Preprocessing: The extracted image frames are carefully examined, and only valid and clean images that accurately represent the target signs are retained. Tools like OpenCV can be employed for image resizing and normalization if needed [13].
● Labeling: An open-source annotation tool is used to label the selected images. These labels define the bounding boxes around the hand regions performing the signs and assign class labels corresponding to the specific signs depicted. The annotations are saved in a format compatible with the YOLO model, typically YOLOv8's .txt format [14]. A sketch of this acquisition-and-labeling pipeline follows the list.
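As an illustration of the data pipeline above, the sketch below extracts frames from a sign video with OpenCV and resizes them for annotation; the file paths and sampling rate are hypothetical placeholders, not values from this work:

import cv2
from pathlib import Path

def extract_frames(video_path, out_dir, every_n=10, size=(640, 640)):
    """Save every n-th frame of a sign video, resized for annotation."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", cv2.resize(frame, size))
            saved += 1
        idx += 1
    cap.release()
    return saved

extract_frames("videos/hello.mp4", "dataset/images/hello")  # hypothetical paths

# After annotation, each image gets a matching .txt label file in YOLO
# format: one line per box, "class_id x_center y_center width height",
# with coordinates normalized to [0, 1], e.g. "3 0.512 0.430 0.210 0.335".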


B. Model Training:

● Pre-trained Model: A pre-trained YOLO model, readily available through libraries like Ultralytics' YOLO in Python, serves as the foundation [15]. This pre-trained model possesses the capability to detect generic objects within images.
● Training Data: The prepared labeled image dataset is used to further train the pre-trained YOLO model. This training process refines the model's ability to identify the specific hand postures and signs present in the dataset.
● Hyperparameter Tuning: Hyperparameters like learning rate, batch size, and optimizer configuration significantly influence the training process. Techniques like grid search or random search can be employed to optimize these hyperparameters [16].
● Validation: A validation set, consisting of a portion of the labeled data withheld from training, is used to monitor the model's generalization ability and prevent overfitting. Metrics like accuracy and loss are evaluated on the validation set to assess the model's performance during training. A training sketch follows the list.
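A minimal fine-tuning sketch using the Ultralytics Python API is shown below; the dataset configuration file, epoch count, and hyperparameter values are illustrative assumptions rather than the exact settings used in this work:

from ultralytics import YOLO

# Start from a pre-trained checkpoint that already detects generic objects.
model = YOLO("yolov8n.pt")

# Fine-tune on the labeled sign dataset. "sign_dataset.yaml" is a
# hypothetical data config listing train/val image paths and class names.
model.train(
    data="sign_dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=16,   # batch size: one of the tunable hyperparameters
    lr0=0.01,   # initial learning rate: likewise subject to grid/random search
)

# Evaluate on the withheld validation split to monitor generalization.
metrics = model.val()
print(metrics.box.map50)  # mean average precision at IoU 0.5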
C. Detection and Recognition:

● Real-time Video Input: Once trained, the model is integrated with a real-time video processing framework like OpenCV. This allows the model to process live video frames and detect sign language gestures within them.
● Sign Detection: The model predicts bounding boxes around detected hand regions in the video frames. These bounding boxes indicate the presence of potential signs.
● Sign Recognition: Based on the predicted bounding boxes and the corresponding class labels from the training data, the model recognizes the specific sign being performed in the video frame.
● Word Mapping: The recognized sign is mapped to a corresponding word or phrase, enabling communication and translation. A detection-loop sketch follows the list.
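One way this loop could look with OpenCV and a trained Ultralytics model is sketched below; the weights path, confidence threshold, and word_map dictionary are hypothetical examples, not the paper's exact implementation:

import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical weights path
word_map = {"hello": "Hello", "thanks": "Thank you"}  # sign class -> word

cap = cv2.VideoCapture(0)  # webcam as the real-time video input
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Detect signs in the frame, keeping only confident boxes.
    results = model(frame, conf=0.5, verbose=False)[0]
    for box in results.boxes:
        sign = results.names[int(box.cls)]
        print(word_map.get(sign, sign))  # map the recognized sign to a word
    cv2.imshow("signs", results.plot())  # frame with predicted boxes drawn
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()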
D. Feature Extraction:
The pre-trained YOLO model utilizes a convolutional neural network (CNN) architecture to automatically extract relevant features from the input images. These features capture the spatial and visual characteristics of the hand shapes and postures within the images, allowing the model to learn patterns that differentiate between various signs.

E. Future Scope:
The proposed system lays the groundwork for further exploration. By incorporating generative AI and advanced machine learning techniques, the system can be extended to predict and suggest complete sentences based on a few detected signs. This would significantly enhance communication capabilities and user experience.

V. EXPERIMENTS AND RESULTS

This research utilized a dataset consisting of 100 sign language videos collected from various online training websites like [website1] and [website2]. These videos encompassed 7 commonly used signs from Indian Sign Language (ISL). A total of 5000 image frames were extracted from the videos and meticulously labeled using an open-source annotation tool.

The performance of the YOLO-based model was evaluated using the following metrics:

● Accuracy: Measures the overall percentage of signs correctly detected and recognized.
● Precision: Indicates the proportion of detected signs that are truly correct.
● Recall: Reflects the percentage of actual signs that are successfully detected.
● F1-score: Provides a balanced view of both precision and recall.
● Detection Speed: Measured in frames per second (FPS) to assess real-time performance.

The trained model achieved an overall accuracy of 87.5% on the test dataset. The average precision and recall for individual signs ranged from 82% to 95%. The model achieved a real-time detection speed of 15 FPS on a standard computer with an NVIDIA GTX 1060 GPU. These results demonstrate the model's capability for accurate and efficient sign language detection in real-time scenarios. However, some limitations were observed in recognizing signs performed with slight variations or under challenging lighting conditions. Future work can explore data augmentation techniques to improve the model's robustness in diverse environments.
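As a pointer for that future work, a simple augmentation pass might perturb rotation and brightness of training frames. The sketch below is a minimal example with assumed parameter ranges; horizontal flips are deliberately omitted because mirroring can change a sign's meaning, and bounding-box labels would need to be transformed alongside the images:

import cv2
import numpy as np

def augment(frame, max_angle=10.0, max_brightness=30.0):
    """Randomly rotate and brighten/darken a frame for augmentation."""
    h, w = frame.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(frame, m, (w, h))
    beta = np.random.uniform(-max_brightness, max_brightness)
    return cv2.convertScaleAbs(rotated, alpha=1.0, beta=beta)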


Fig 2: Precision Confidence Curve

Fig 3: Training and Validation Accuracy metrics


VI. CONCLUSION

This research investigated the application of machine learning for sign language detection, aiming to bridge the communication gap between deaf and hearing individuals. We proposed a YOLO-based model trained on real-world sign language video data. The system architecture encompasses data acquisition, preprocessing, labeling, model training, and real-time detection and recognition. The pre-trained YOLO model leverages CNNs for feature extraction, enabling effective sign recognition.

This approach prioritizes practicality by utilizing readily available video data and focusing on a defined set of commonly used signs. The model successfully detects signs in real-time videos and maps them to corresponding words, demonstrating its potential for daily life communication. By addressing data scarcity concerns and adopting a simpler algorithm, this research contributes to a more accessible and inclusive future for the deaf community.

Future advancements can explore techniques like language modeling to construct grammatically correct sentences based on detected signs. Additionally, incorporating generative AI holds promise for more comprehensive sign language translation systems. Further research can also investigate expanding the sign vocabulary and improving model robustness in various lighting and background conditions. Overall, this research paves the way for the continued development of sign language detection systems, fostering a more inclusive society where communication barriers are diminished.

REFERENCES

[1]. The World Federation of the Deaf: https://wfdeaf.org/
[2]. American Speech-Language-Hearing Association (ASHA): https://www.asha.org/
[3]. Pilán, I., & Bustos, A. (2014, September). Sign language recognition: State of the art and future challenges. https://www.researchgate.net/publication/262187093_Sign_language_recognition
[4]. Deepsign: Sign Language Detection and Recognition Using Deep Learning. https://www.mdpi.com/2079-9292/11/11/1780
[5]. Ghosh, S., & Munshi, S. (2012, March). Sign language recognition using support vector machine. In 2012 International Conference on Signal Processing, Computing and Communication (ICSPCC) (pp. 1-5). IEEE. https://www.researchgate.net/publication/262233246_Sign_Language_Recognition_with_Support_Vector_Machines_and_Hidden_Conditional_Random_Fields_Going_from_Fingerspelling_to_Natural_Articulated_Words
[6]. Vogler, C., & Metaxas, D. (2000). ASL recognition based on 3D hand posture and motion features. In Proceedings of the Fifth International Conference on Automatic Face and Gesture Recognition (pp. 129-134). IEEE. https://www.sciencedirect.com/science/article/pii/S2214785321025888
[7]. Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221-231. http://ieeexplore.ieee.org/document/6165309/
[8]. Pilán, I., & Bustos, A. (2014, September). Sign language recognition: State of the art and future challenges. https://www.researchgate.net/publication/262187093_Sign_language_recognition_State_of_the_art
[9]. Silvestre, J. D. C., & Lopes, H. (2015). Real-time visual sign language recognition using CNN architecture. Universal Access in the Information Society, 18(4), 825-841. https://www.researchgate.net/publication/364185120_Real-Time_Sign_Language_Detection_Using_CNN
[10]. Alsharhan, M., Yassine, M., & Al-Alsharhan, A. (2014, December). Sign language gesture recognition using PCA and neural networks. In 2014 International Conference on Frontiers in Artificial Intelligence and Applications (FIAIA) (pp. 260-265). IEEE.
[11]. Mittal, A., & Kumar, M. (2012, July). Vision based hand gesture recognition for sign language. In 2012 10th IEEE International Conference on Advanced Computing (ICoAC) (pp. 308-313). IEEE.
[12]. Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221-231. http://ieeexplore.ieee.org/document/6165309/
[13]. OpenCV (Open Source Computer Vision Library): https://opencv.org/
[14]. YOLOv5 Model Training Documentation.
[15]. Ultralytics YOLO: https://github.com/ultralytics/yolov5
[16]. Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281-305.

