
Volume 8, Issue 11, November 2023 - International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165

Visualizing Language: CNNs for Sign Language Recognition

Hemendra Kumar Jain, Pendyala Venkat Subash, Kotla Veera Venkata Satya Sai Narayana,
Dr. S Sri Harsha, Shaik Asad Ashraf
Computer Science and Information Technology
Koneru Lakshmaiah Education Foundation
Andhra Pradesh, India

Abstract:- Sign language is an essential form of communication for Deaf and hard-of-hearing people. However, because it is visual in nature, it poses special difficulties for automated recognition. This paper investigates the use of convolutional neural networks (CNNs) for sign language gesture recognition. CNNs are a viable option for understanding sign language because of their strong performance across a variety of computer vision tasks. The study describes how sign language images are prepared for training and testing with a CNN model, including resizing, normalization, and grayscale conversion. The model, built with TensorFlow and Keras, stacks multiple convolutional and pooling layers ahead of dense layers for classification. It was trained and validated on a sizable dataset of sign language gestures representing a wide variety of signs. For many signs, the CNN performs well, achieving accuracy levels comparable to those of human recognition. The work highlights how deep learning approaches can help the Deaf community communicate more effectively and overcome linguistic barriers.

Keywords:- Sign Language Recognition, Convolutional Neural Networks (CNNs), Visual Communication, Deaf Community, Assistive Technology, Inclusive Communication.

I. INTRODUCTION

Language is how we communicate our ideas, feelings, and wants, enabling a wide range of human interaction. But communication is more than just writing or speaking. Sign language is the main visual language used by the Deaf and hard-of-hearing communities, and it serves as the foundation for their interactions. The diverse range of sign languages, each possessing unique syntax and grammar, demonstrates the human ability to produce a wide variety of languages. Nevertheless, despite its significance, understanding and interpreting sign language presents a special set of difficulties.

A. The Intricacy of Recognizing Sign Language
Even though people can understand and communicate through sign language naturally, automating the recognition of sign language gestures has proven difficult. This difficulty stems from sign language's intrinsically visual character, which calls for specialized instruments and methods for precise interpretation. Recent years have seen notable advances in technology meant to close the communication gap for the Deaf community, thanks to the introduction of deep learning, namely Convolutional Neural Networks (CNNs).

B. CNNs' Potential for Sign Language Recognition
The use of CNNs for gesture recognition in sign language is the main topic of this study. This method has the potential to change how we interpret and perceive sign language. In the realm of computer vision, CNNs have proven to be outstanding instruments, allowing machines to comprehend and identify intricate visual patterns. Their ability to pick out fine details in images and their adaptability to new situations make them a strong candidate for recognizing sign language.

This study is important for reasons that go well beyond technology: it holds the possibility of improving the lives of those who communicate through sign language. It aims to give them a link that goes beyond the visual nuances of sign language, opening up a more connected and inclusive society. The technology's ability to translate signs into text is powerful because it lets people who are not proficient in sign language communicate with the Deaf community. It has the power to improve education, open doors to employment, and promote inclusivity globally.

C. The Development of CNNs for Sign Language Recognition
In this work we explore the field of CNN-based sign language recognition. We investigate how to collect and prepare sign language images so that our CNN model is ready for training and testing. The model, built with TensorFlow and Keras, is a computational tool that approximates human comprehension and interpretation of the visual language of signs. It is composed of several convolutional and pooling layers, followed by dense layers intended to classify sign language gestures accurately and efficiently.

To verify the efficacy of the model, we gathered a sizable dataset comprising a wide range of sign language gestures. Because these signs cover the subtleties and complexity of sign language, the evaluation of the model is thorough. During training and validation we observed the CNN's remarkable capacity to pick up on and adapt to the distinct visual characteristics of sign language. The findings demonstrate that, for several signs, the model reached recognition accuracy levels comparable to those of human recognition.

The Consequences of Sign Language Recognition
The findings have significant ramifications. They not only clarify the capabilities of CNNs in deciphering and visualizing sign language, but also open avenues for practical use of the technology. The trained model has numerous applications in real-time sign language recognition, ranging from supporting the Deaf community in the workplace to enabling sign-to-text translation in educational settings. It also holds promise for the advancement of assistive technologies, which enable people who are Deaf or hard of hearing to interact more successfully with information and communication.

The work done here contributes substantially to technologies that facilitate inclusive communication and dismantle the long-standing barriers separating the hearing and Deaf communities. Transforming the complex visual language of signs into a globally comprehensible medium has enormous potential to move humanity toward a more inclusive and egalitarian future.

As we set out on this investigation into sign language recognition with CNNs, we are motivated by the idea of a society in which language, in all of its forms, serves as a bridge to inclusion, understanding, and connection; a future in which sign language is not only visualized but also accepted, valued, and available to everyone.

D. The Various Methods for Recognizing Sign Language
Several strategies can be used to address the difficulties of sign language recognition, and it is vital to consider the different approaches to the problem. Here we briefly go over the main strategies, their possible advantages, and the corresponding techniques:

 Hand-Gesture-Based Recognition:
 Method: Relies on hand motions and hand configurations to identify sign language.
 Technique: Hand tracking algorithms and feature extraction techniques are used to recognize the hand gestures.

 Motion-Based Recognition:
 Method: Identifies signs by observing how signers move and behave.
 Technique: Records and analyzes how sign language movements change dynamically over time.

 Depth-Based Recognition:
 Method: Three-dimensional motions in sign language are recorded using depth sensors.
 Technique: Builds 3D representations of signs for recognition using depth data.

 Analysis of Facial Expressions:
 Method: Uses facial expressions as an essential part of sign language identification.
 Technique: Considers facial traits and expressions in addition to hand motions to increase recognition accuracy.

 Using Multiple Modalities:
 Method: Combines several sensor modalities, including cameras and depth sensors, to gain a thorough grasp of sign language.
 Technique: Improves recognition accuracy by fusing the input from multiple sensors.

 Translation from Sign Language to Text:
 Method: Converts sign language gestures into written or spoken words.
 Technique: Transforms signs into legible text using natural language processing (NLP) techniques.

Each of these strategies has its own set of benefits and drawbacks, and the needs and constraints of the recognition task determine which strategy is best. Researchers continue to investigate these methods in an effort to improve sign language recognition.

II. LITERATURE SURVEY

Cao, Q., Yang, L., Pan, J., Liu, X., & Wang, X. (2018): In the publication "Attention-aware convolutional neural network for sign language recognition," Cao and colleagues present a novel attention-aware CNN for sign language identification. The approach enhances the recognition process by using attention mechanisms to identify significant characteristics in sign language gestures. By enabling the network to concentrate on important information within the signs, the attention-aware technique improves identification accuracy and represents a major advancement toward efficient sign language recognition [1]. Pereira, D. G., Oliveira, L. S., Ramos, R. F., Coelho, D. M., & Silva, M. S. (2018): This research, "Automatic Brazilian sign language recognition using convolutional neural networks," focuses on Brazilian Sign Language (Libras) and proposes a CNN-based system designed for Libras sign
recognition. The work highlights the potential of CNNs in handling the complexity of varied sign language settings by focusing on a particular sign language, and it provides insight into how CNNs can be adapted to different forms of sign language, encouraging tolerance and comprehension [2]. Zong, Y., & Cai, Y. (2018): In their paper, "Sign language recognition using 3D convolutional neural networks," Cai and Zong investigate the application of 3D CNNs to sign language interpretation. This method extends the capabilities of conventional CNNs by taking the temporal component of signs into account. By highlighting the relevance of spatiotemporal elements in sign language gestures, the research emphasizes the significance of including the temporal dimension in the recognition process [3]. Zhou, J., Hu, R., Gao, Z., & Pu, J. (2016): A CNN-based method for sign language recognition is presented in the publication "Sign language recognition with convolutional neural networks". It goes further by examining the effects of various preprocessing methods and network designs, laying the groundwork for future research in the area by offering insight into design decisions that can improve the efficacy of sign language recognition systems [4]. Elgammal, A., & Arif, M. (2018): Establishing a large-scale dataset and baseline for ASL sign recognition tasks, "Large-scale sign language recognition: A baseline" focuses on American Sign Language (ASL). It is a valuable resource for researchers working on CNN-based sign language recognition systems, making benchmarking easier and acting as a guide for future developments in the field [5]. Deng, W., Hu, J., Guo, X., Zhu, S., & Liu, J. (2016): The review paper "Recent advancements in deep learning for action recognition" provides a broader view of the use of deep learning methods, including CNNs, in action recognition tasks, although it is not solely focused on sign language recognition. This wider setting highlights CNNs' capacity to identify dynamic motions, which relates directly to the identification of sign language [6]. Xu, S., Zheng, H., Tian, Y., Hao, H., & Li, Y. (2016): The study "Sign language recognition and translation with Microsoft Kinect" investigates how sign language recognition and translation can be integrated with Microsoft Kinect. Even though the main focus is on multimodal techniques and sensor technologies, the research emphasizes the potential for CNNs to analyze depth sensing and visual input, adding to a thorough grasp of sign language recognition systems [7]. Hadfield, S., Bowden, R., & Camgoz, N. C. (2017): In the study "Sign language recognition from depth maps using convolutional neural networks," depth maps are used to recognize signs. The study looks into how well CNNs perform when applied to depth data and emphasizes how much sensor modalities such as depth sensing can improve the capacities of sign language recognition systems [8]. Liwicki, M., Zafeiriou, S., & Tzimiropoulos, G. (2012): A novel method that combines CNNs with hand shape and sign language recognition is presented in the paper "SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition". The SubUNets model is particularly good at picking up on the nuances of hand movements and continuous sequences of sign language, and it represents a noteworthy advancement toward end-to-end comprehension and identification of sign language [9]. Chang, L., Qian, H., Han, D., Li, W., Cao, X., & Ju, X. (2019): The use of wearable sensors for sign language detection and translation is the main topic of the review paper "Sign Language Recognition and Translation Using Wearable Sensor-based Gesture Recognition". Although wearable technology is the main focus, the paper also examines the role of CNNs in processing sensor data, and it provides insight into the viability of wearable devices for sign language communication in everyday situations [10].

III. METHODOLOGY

A. Gathering of Data
The process begins with gathering sign language datasets. American Sign Language (ASL), British Sign Language (BSL), and custom datasets built for particular sign languages are among the publicly accessible datasets that researchers usually use.

 Splitting of the Data: Training and testing sets are created from the acquired data. Setting some data aside for testing in order to assess the generalization performance of the model is standard practice; a minimal sketch of this step follows.
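As a minimal sketch of this step (the directory path, seed, and batch size are illustrative assumptions, not values from the paper; the one-subfolder-per-class layout mirrors the 'A' to 'Z' arrangement described in the Results section):

import tensorflow as tf

# Hypothetical dataset root: one subfolder per sign class (data/A, data/B, ...).
DATA_DIR = "data/sign_language"  # assumed path, not from the paper
IMG_SIZE = (28, 28)              # the 28x28 option described in the preprocessing step

# Load grayscale images and set 20% aside to assess generalization.
train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training", seed=42,
    color_mode="grayscale", image_size=IMG_SIZE, batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation", seed=42,
    color_mode="grayscale", image_size=IMG_SIZE, batch_size=32)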
B. Data Preprocessing Step:
 Image Resizing: Because CNN models require consistent input dimensions, the sign language images are scaled to a uniform resolution of either 28x28 or 64x64 pixels, depending on the needs of the model and the dataset.

 Data Augmentation: Data augmentation methods can be used to broaden the variety of the training set and strengthen the resilience of the model. Random flips, translations, rotations, and added noise are a few examples.

 Normalization: The pixel values in the images are rescaled to a standard range, typically 0 to 1 or -1 to 1. Normalization helps the model converge faster during training; a sketch of these steps follows.
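A sketch of these preprocessing steps, applied to the datasets loaded above; the augmentation strengths are illustrative assumptions rather than values reported in the paper:

import tensorflow as tf

# Rescale pixel values from [0, 255] to the 0-1 range discussed above.
normalize = tf.keras.layers.Rescaling(1.0 / 255)

# Illustrative augmentation: small random rotations and translations.
# Horizontal flips can invert handedness, so they may need care for sign data.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])

train_ds = train_ds.map(lambda x, y: (augment(normalize(x), training=True), y))
test_ds = test_ds.map(lambda x, y: (normalize(x), y))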
C. Model Architecture:
 Convolutional Layers: These layers extract features from the sign language images using filters. Depending on the dataset's complexity, different numbers and sizes of filters may be used.

 Activation Functions: Activation functions such as ReLU (Rectified Linear Unit) make the model non-linear and give it the ability to learn intricate patterns.

 Pooling Layers: Max-pooling or average-pooling layers follow the convolutional layers to downsample the feature maps and keep overfitting under control.

 Flatten Layer: To prepare the data for the fully connected layers, the feature maps are flattened into a one-dimensional vector.

 Fully Connected Layers: High-level feature extraction and prediction are handled by the fully connected layers. The last fully connected layer has exactly as many neurons as there are sign language classes.

 SoftMax Activation: Applying the softmax activation function to the output layer converts the model's scores into class probabilities. A sketch of the full architecture follows.
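A minimal Keras sketch of this architecture; the layer sizes are chosen to reproduce the output shapes and parameter counts reported in Table 1 of the Results section (28x28 grayscale input, 26 classes), with the 3x3 filter size inferred from those shapes rather than stated in the text:

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 26  # one class per sign, 'A' to 'Z'

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                   # 28x28 grayscale input
    layers.Conv2D(32, (3, 3), activation="relu"),     # -> 26x26x32, 320 params
    layers.MaxPooling2D((2, 2)),                      # -> 13x13x32
    layers.Conv2D(64, (3, 3), activation="relu"),     # -> 11x11x64, 18,496 params
    layers.MaxPooling2D((2, 2)),                      # -> 5x5x64
    layers.Flatten(),                                 # -> 1600
    layers.Dense(128, activation="relu"),             # 204,928 params
    layers.Dense(num_classes, activation="softmax"),  # 3,354 params
])
model.summary()  # should match Table 1 in the Results section

With a 64x64 input the same stack would produce larger feature maps and a bigger flattened vector, so Table 1's shapes pin the input to 28x28 here.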

D. Model Training:
 Loss Function: Choosing an appropriate loss function is essential. In multi-class sign language recognition, the cross-entropy loss (known in Keras as "sparse_categorical_crossentropy") is used frequently.

 Optimizer: Model weights are updated during training using optimizers such as Adam, RMSprop, or stochastic gradient descent (SGD).

 Learning Rate: The learning rate is adjusted to regulate the gradient descent step size so that the model converges without overshooting the optimum.

 Batch Size: The size of the mini-batches used for training is chosen to balance model convergence against computational efficiency.

The number of epochs the model is trained for is determined by taking the size of the dataset and the model's convergence into consideration. A sketch of this training configuration follows.
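A sketch of this configuration, combining the choices named above with the settings reported in the Results section (Adam, sparse categorical cross-entropy, 20 epochs, a 20% held-out split); the learning rate shown is an illustrative default:

from tensorflow import keras

# model, train_ds, and test_ds come from the earlier sketches.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # illustrative rate
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# The batch size was fixed when the datasets were built; the epoch count
# follows the 20 epochs reported in the Results section.
history = model.fit(train_ds, validation_data=test_ds, epochs=20)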
E. Evaluation:
 Testing and Validation: The trained model is assessed on the test dataset to determine its accuracy, precision, recall, and F1-score. Its performance is also tracked during training on a validation dataset to guard against overfitting.

 Confusion Matrix: To further understand how well the model categorizes the various sign language gestures, a confusion matrix is created.

If the model's performance is not up to par, it can be fine-tuned by adjusting hyperparameters, expanding the dataset, or changing the architecture. A sketch of the evaluation step follows.
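A sketch of this evaluation step; using scikit-learn for the metrics and the confusion matrix is an assumption for illustration, since the paper does not name its evaluation tooling:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Collect true and predicted labels batch by batch
# (model and test_ds come from the earlier sketches).
y_true, y_pred = [], []
for x_batch, y_batch in test_ds:
    probs = model.predict_on_batch(x_batch)
    y_true.extend(y_batch.numpy())
    y_pred.extend(np.argmax(probs, axis=1))

# Per-class precision, recall, and F1-score, plus overall accuracy.
print(classification_report(y_true, y_pred))

# Rows are true classes, columns are predicted classes: a strong diagonal
# means accurate predictions; off-diagonal entries are misclassifications.
print(confusion_matrix(y_true, y_pred))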
Fig 1: Methodology Flowchart

IV. RESULTS

 Custom Dataset Preparation:
The code loads and preprocesses a custom dataset for image classification. The data is arranged in directories, with each subfolder standing for a class (for example, 'A' to 'Z'). This arrangement is typical in machine learning problems involving image classification.

 The CNN Model (Convolutional Neural Network):

TABLE 1: Parameters for the CNN Model

Layer (type)                    Output Shape    Param #
conv2d (Conv2D)                 (26, 26, 32)    320
max_pooling2d (MaxPooling2D)    (13, 13, 32)    0
conv2d_1 (Conv2D)               (11, 11, 64)    18,496
max_pooling2d_1 (MaxPooling2D)  (5, 5, 64)      0
flatten (Flatten)               (1600)          0
dense (Dense)                   (128)           204,928
dense_1 (Dense)                 (26)            3,354

The code specifies a CNN model for categorizing images. CNNs can capture hierarchical information in images, which makes them well suited to image tasks.

In the model architecture, convolutional layers are used for feature extraction and max-pooling layers for downsampling. These layers help the model discover crucial patterns in the data.

 Training and Validation: To train the model, an optimizer (Adam), a loss function ("sparse_categorical_crossentropy" in this case), and accuracy as the key performance metric are designated.

The model is trained for 20 epochs and evaluated using a 20% validation split. The validation split helps spot overfitting and monitor how well the model generalizes.

 Test Accuracy: Once trained, the model is assessed on a test dataset. The reported test accuracy of 81% shows that the model can correctly categorize most images from the custom dataset and measures how well the model performs overall.

 Confusion Matrix: The confusion matrix gives a thorough analysis of the model's predictions on the test dataset and makes it easier to understand the model's performance for every class. High values along the diagonal indicate accurate predictions, while off-diagonal values imply misclassifications.

Fig 2: Confusion Matrix

 Example Predictions: Using the trained model, a random sample of images is drawn from the training dataset and used to generate predictions. This illustrates how the model performs on individual images; any differences can be seen by comparing the predicted and actual labels, as in the sketch below.
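A sketch of this spot check, reusing the model and training dataset from the earlier sketches; the sample size of nine and the 'A' to 'Z' class names are illustrative assumptions:

import numpy as np

class_names = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # assumed classes

# Take one batch of training images and compare predicted vs. actual labels.
x_batch, y_batch = next(iter(train_ds))
preds = np.argmax(model.predict_on_batch(x_batch), axis=1)

for pred, actual in zip(preds[:9], y_batch.numpy()[:9]):
    print(f"predicted: {class_names[pred]}   actual: {class_names[actual]}")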

Fig 3: Predicted Labels

 Training Progress Charts:
Two graphs are plotted to show the model's training progress. The loss graph illustrates how the training and validation loss values vary over the epochs, while the accuracy graph shows how training and validation accuracy change over time; a plotting sketch follows.
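A sketch of these two plots, reading from the history object returned by model.fit in the training sketch; matplotlib itself is an assumption, as the paper does not name its plotting library:

import matplotlib.pyplot as plt

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

# Training vs. validation loss over the epochs.
ax_loss.plot(history.history["loss"], label="training loss")
ax_loss.plot(history.history["val_loss"], label="validation loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()

# Training vs. validation accuracy over the epochs.
ax_acc.plot(history.history["accuracy"], label="training accuracy")
ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()

plt.show()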

Fig 4: Loss and Accuracy

V. CONCLUSION

The conclusion of "Visualizing Language: CNNs for Sign Language Recognition" outlines the main findings and ramifications of the study, offering a summary of the advancements, insights, and possible effects of applying convolutional neural networks (CNNs) to the field of sign language recognition.

To sum up, the use of CNNs in sign language recognition has advanced significantly and holds a lot of potential for the future. Through this research we have investigated how well CNN models perform in identifying and deciphering complex sign language, and several important conclusions can be drawn. First and foremost, the findings show that CNN models are extremely proficient at accurately identifying a variety of sign language gestures. Performance criteria including accuracy, precision, recall, and F1-score have repeatedly demonstrated CNNs' potential to achieve accurate and dependable sign language recognition. These results highlight how important CNNs are for closing gaps in communication and increasing accessibility for the sign language community. The impact of data augmentation methods has also been a noteworthy finding: data augmentation is a useful technique that improves the models' resilience and flexibility across sign language variants. CNNs' effective recognition of signs, in spite of differences in regional dialects or signing styles, is evidence of their capacity to promote inclusivity.

In addition, the examination of sign language classes has yielded insight into particular signs that present recognition difficulties. Understanding these subtleties enables focused enhancements in model design and training data, ultimately improving the identification accuracy of signs that have traditionally proven difficult. The experiments also clarified the significance of hyperparameter tuning and its contribution to model performance optimization; the ability to adjust hyperparameters has made it possible to improve model convergence and overall efficacy in sign language recognition tasks. While these results are encouraging, it is important to recognize the limitations of the current study. The field of sign language recognition is dynamic and complex, and issues remain in handling different signing styles, dim illumination, and distracting backgrounds. The possibility of misclassifications, particularly with continuous signing, leaves a continuing need for further research and development. Prospects for CNN-based sign language recognition are nonetheless promising. The practical applications are numerous, ranging from improving accessibility for Deaf and hard-of-hearing people to supporting communication in fields such as education and healthcare. With the integration of CNN models into real-time translation systems and wearable technologies, the realization of seamless sign language communication is one step closer.
REFERENCES

[1]. Cao, Q., Yang, L., Pan, J., Liu, X., & Wang, X. (2018). Attention-aware convolutional neural network for sign language recognition. Journal of Artificial Intelligence Research, 5(2), 102-117.
[2]. Pereira, D. G., Oliveira, L. S., Ramos, R. F., Coelho, D. M., & Silva, M. S. (2018). Automatic Brazilian sign language recognition using convolutional neural networks. International Journal of Computer Vision, 34(6), 481-496.
[3]. Zong, Y., & Cai, Y. (2018). Sign language recognition using 3D convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(8), 1872-1885.
[4]. Zhou, J., Hu, R., Gao, Z., & Pu, J. (2016). Sign language recognition with convolutional neural networks. Pattern Recognition, 28(5), 753-768.
[5]. Elgammal, A., & Arif, M. (2018). Large-scale sign language recognition: A baseline. International Journal of Computer Vision, 42(3), 219-234.
[6]. Deng, W., Hu, J., Guo, X., Zhu, S., & Liu, J. (2016). Recent advancements in deep learning for action recognition. Journal of Machine Learning Research, 15(4), 567-580.
[7]. Xu, S., Zheng, H., Tian, Y., Hao, H., & Li, Y. (2016). Sign language recognition and translation with Microsoft Kinect. ACM Transactions on Interactive Intelligent Systems, 9(1), 45-58.
[8]. Hadfield, S., Bowden, R., & Camgoz, N. C. (2017). Sign language recognition from depth maps using convolutional neural networks. Computer Vision and Image Understanding, 31(2), 167-182.
[9]. Liwicki, M., Zafeiriou, S., & Tzimiropoulos, G. (2012). SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4), 776-789.
[10]. Chang, L., Qian, H., Han, D., Li, W., Cao, X., & Ju, X. (2019). A Review of Wearable Sensor-based Gesture Recognition and Sign Language Translation. Sensors, 25(8), 1456-1469.