
Indian Sign Language (ISL) Recognition
Sabeshini.M, Prashanti.M, Priyanka.K
Information Technology, SRM Valliammai Engineering College, SRM University, Chengalpattu, India

E-mail: [email protected] , [email protected], [email protected]

Abstract

With recent developments, hand gesture recognition is becoming a popular and
efficient means of communication for the deaf and non-verbal community. Individuals with
hearing impairments require assistance globally, yet only a small percentage of people
worldwide can read sign language. Sign languages, sometimes referred to as signed
languages, are languages in which meaning is expressed visually rather than verbally.
Extracting gestures from camera images is difficult for several reasons: intensity changes
cause problems, noise and other inputs slow down computation, and complicated
backgrounds make gesture extraction harder still.
Directional images are used to obtain the pre-processed region of interest, and convex
hulls are used to extract landmarks after manual extraction. A Convolutional Neural
Network (CNN) classifier then uses the extracted features to detect and recognize gestures.
The result is a hand gesture recognition system, built on a CNN classifier, that recognizes
gestures regardless of background clutter and noise.
Keywords: CNN, ISL

1. Introduction

Sign language using hand gestures is an essential communication tool for deaf and
non-verbal people. Each letter is associated with a specific gesture, and hand gestures are
defined as movements or patterns of the hands that convey meaning. For instance, the
"thumbs up" sign indicates agreement or that everything is all right. Currently, around
300 different sign languages are used by individuals with disabilities, Indian Sign
Language being one of them.
Hand gestures are an expressive way to convey opinions and emotions and can serve
as a practical substitute for verbal communication. However, people who do not
understand sign language find these gestures hard to interpret, since they have had no
prior exposure or training. This lack of familiarity with sign language has created
barriers to basic communication.

To address this issue, we present a design based on hand gesture recognition, which
recognizes hand gestures using several algorithms and produces output in text format.
The main aim of this design is to make communication trouble-free between people from
all communities using advanced machine-learning techniques. What distinguishes this
work from others is its focus on challenges such as generalization, accuracy, and
efficiency.

Hand recognition can be done in two ways: sensor-based and vision-based.

The sensor-based approach uses physical sensors such as accelerometers, which detect
changes in the movement, position, and speed of the hand; the sensor readings are then
processed and recognized as output. The vision-based approach instead relies on cameras
to capture images and videos, which are split into frames and analyzed with various
algorithms to identify the gestures.
Of the two, vision-based recognition has produced the more definite outcome, and it
is the approach adopted here. Vision-based recognition requires datasets for training and
evaluation, so datasets are collected from different people and processed. Algorithms such
as CNN and the convex hull are used to process the images: they detect landmarks and
clustering points, and convert the images to grayscale. These steps are discussed further in
the proposed work.
In summary, the work collects datasets and preprocesses them using CNN and the
convex hull algorithm, avoiding outliers, recognizing edges, and converting images to
grayscale at high resolution. The rest of the paper is organized as follows: Part II covers
related work, Part III presents the proposed methodology, Part IV discusses the results,
Part V concludes the paper, and references follow.

2. Related Work

As we discussed above, sign language hand recognition is a key to communication.


Shravani K. et al. [1] present a technique that converts Indian Sign Language to
text. The process combines two algorithms, SURF and BOW. SURF (Speeded-Up Robust
Features) is a feature detection algorithm used in computer vision to detect and summarize
features in images; it is robust to changes in scale, rotation, and illumination, which makes
it suitable for tasks such as image recognition and matching. BOW (Bag of Words) is a
feature-representation technique used in computer vision and natural language processing;
here it represents hand gestures or patterns as a collection of visual "words".

This paper instead uses a CNN, a technique well suited to tasks involving visual
imagery. CNNs use a hierarchical architecture consisting of multiple layers, including
convolutional layers for feature extraction. Using BOW and SURF, the model in [1] scored
99% accuracy, but its predictions may be slightly biased because the dataset contains many
similar images without variation, such as differences in skin tone.
Jinhwan Koh et al. [3] enhance the accuracy of CNN-based hand gesture
recognition using IR-UWB radar and the two-dimensional fast Fourier transform. The
2D-FFT is a technique for analyzing and manipulating two-dimensional signals or images
in the frequency domain. IR-UWB (impulse-radio ultra-wideband) is a specialized radar
technology that uses ultra-wideband signals in the form of short low-power pulses to
achieve high-resolution radar imaging and sensing. Prominent CNN architectures such as
GoogLeNet and ResNet achieved about 90% accuracy, so the 2D-FFT is used to obtain
higher accuracy.
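As an illustration of the 2D-FFT step described above, the following minimal sketch moves a small two-dimensional signal into the frequency domain and back (the use of NumPy and the synthetic frame are assumptions for the example; the authors do not publish their implementation):

```python
import numpy as np

# A small synthetic 2-D "frame" standing in for an IR-UWB radar return.
frame = np.outer(np.sin(np.linspace(0, np.pi, 8)),
                 np.cos(np.linspace(0, np.pi, 8)))

# Forward 2-D FFT: spatial domain -> frequency domain.
spectrum = np.fft.fft2(frame)

# The inverse transform recovers the original frame (up to float error),
# confirming the transform is information-preserving.
recovered = np.fft.ifft2(spectrum).real
print(np.allclose(frame, recovered))
```

In a real pipeline the magnitude of `spectrum` would be fed to the CNN as a frequency-domain feature map rather than inverted.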
Faisal Anwer et al. [5] present two models. The first combines a pre-trained
VGG-16 with a recurrent neural network using a long short-term memory schema to form
a three-dimensional convolutional neural network. The second is built on the sophisticated
object-detection algorithm YOLO, which stands for "You Only Look Once". The two
models have prediction accuracies of 82% and 98% respectively, and the YOLO-based
model exhibited superior performance, with a remarkable mean average precision of 99.6%.

3. Proposed Work

The proposed model uses the CNN and convex hull algorithms to translate ISL signs
into text.
The work focuses mainly on the CNN, with minimal use of the convex hull
algorithm. A CNN, short for Convolutional Neural Network, is quite useful for computer
vision, which is concerned with extracting information from images. Static images were
used, and the data was collected from them.
Several further steps were carried out to make the model work efficiently; they are
explained in the following sub-sections.
Image Collection

Figure 1. ISL gestures for numeric and alphabets

Even with the ever-developing rise of technology, there is a significant lack of
resources for sign language, more so for Indian Sign Language, and no ready-made
datasets were available. So the alphabet, along with basic words like "hi" and "hello",
was included in the dataset. The data was collected as live video using a webcam; still
images were extracted from the video and stored. Noise in the images was avoided by
capturing them against a blank background. The collected images correspond to the
gestures shown in figure 1.

Image Pre-Processing
In this phase the CNN algorithm helps, as it recognizes patterns in images and
videos. It consists of multiple layers: an input layer, convolutional layers, max-pooling
layers, dense layers, and an output layer.
Figure 2. Pictorial representation of CNN

The input layer receives a given image and passes it through the subsequent layers
to produce the final output. In a CNN, the input is generally an image or a sequence of
images.
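The paper does not publish its network code, but the core operations of the convolutional and max-pooling layers named above can be sketched in pure NumPy. The kernel below is a hypothetical hand-written edge detector; in a trained CNN these weights are learned from data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response per block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size,
                                       w // size, size).max(axis=(1, 3))

# A toy 6x6 grayscale "image" with a vertical edge down the middle.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge kernel (hypothetical; real kernels are learned).
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

features = conv2d(image, kernel)  # 4x4 feature map responding at the edge
pooled = max_pool(features)       # 2x2 map after max pooling
print(features.shape, pooled.shape)
```

The dense and output layers then flatten such pooled maps into a vector and map it to class scores.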

An input image of a hand signing a letter is passed through the program. The
system is trained to recognize the distinct features of the images and stores them under a
common label. The images are converted to grayscale so that the distinct features and
outline of the hand can be seen without interference from skin texture, background
objects, and so on.
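The grayscale conversion above can be sketched as a weighted sum of the RGB channels. The sketch uses the standard ITU-R BT.601 luminance weights; the paper does not state which library or weights were actually used:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an HxWx3 RGB image to HxW grayscale (BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights  # contracts the channel axis

# A toy 2x2 RGB image: pure red, green, blue, and white pixels.
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)

gray = to_grayscale(rgb)
print(gray.round(1))  # white stays at 255; green reads brighter than blue
```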

Feature Extraction

Figure 3. Input image & its features


As in figure 3, the outline of the hand is clearly visible, which makes it easier for the
machine to train on the data by marking the finger points using the convex hull
algorithm. Noise occurs in the data when the image is taken against an unsteady
background.
Figure 4. Image with noise

Unlike the image in figure 3, which has a clear outline of the hand, the image in
figure 4 is noisy. The noise is caused by a non-blank background or a background with
obstacles. Making sure the image is taken against a blank background is therefore an
essential step.
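The convex hull marking described earlier can be illustrated with a minimal pure-Python implementation (Andrew's monotone chain). The landmark coordinates below are made up for the example; a real pipeline would take them from the segmented hand:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    points = sorted(set(points))
    if len(points) <= 2:
        return points

    def cross(o, a, b):
        # Positive if o->a->b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Hypothetical 2-D hand landmarks (wrist, knuckles, fingertips).
landmarks = [(0, 0), (2, 0), (1, 1), (0, 3), (1, 4), (2, 3), (1, 2)]
hull = convex_hull(landmarks)
print(hull)  # interior points such as (1, 1) and (1, 2) are dropped
```

The hull keeps only the outermost landmarks, which is why a clean hand outline matters: background noise adds spurious outer points that distort the hull.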

Classification

After the hand has been identified using the algorithm, the next task is to
categorize the images. This categorization is achieved by applying machine-learning
techniques to the trained datasets, which in turn enables the detection of gestures
performed in a video.
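The paper's categorization is done by the trained CNN itself. As a deliberately simplified stand-in for that step, the sketch below assigns a gesture label by nearest centroid over flattened feature vectors; all names and data here are hypothetical:

```python
import numpy as np

def train_centroids(features, labels):
    """One centroid (mean feature vector) per gesture label."""
    return {lab: np.mean([f for f, l in zip(features, labels) if l == lab],
                         axis=0)
            for lab in set(labels)}

def classify(feature, centroids):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids,
               key=lambda lab: np.linalg.norm(feature - centroids[lab]))

# Hypothetical flattened feature vectors for two gesture classes.
train_feats = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
               np.array([0.0, 1.0]), np.array([0.1, 0.9])]
train_labels = ["A", "A", "B", "B"]

centroids = train_centroids(train_feats, train_labels)
print(classify(np.array([0.8, 0.2]), centroids))  # "A"
```

A CNN replaces the centroid distance with learned dense layers and a softmax over the gesture classes, but the interface is the same: features in, label out.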

Figure 5. Image explaining the steps

4. Results and Discussion

The proposed work was implemented in the series of steps given below, and the output was
obtained from them.
1. Data collection - Gather a diverse dataset of hand gestures commonly used in ISL.
This dataset should represent a wide range of signs and expressions within the language.

2. Preprocessing - Clean and preprocess the collected data. This may involve techniques
such as noise reduction, normalization, and resizing to standardize the input data for the
recognition system.
3. Feature extraction - Identify and extract relevant features from the pre-processed data.
In the context of ISL hand gesture recognition, features could include finger positions,
hand shape, and motion patterns.
4. Training the model - Train the selected model using the labeled dataset. During
training, the model learns to associate the extracted features with specific ISL hand
gestures.
5. Validation & testing - Evaluate the model's performance on separate datasets not used
during training. This assesses the model's ability to generalize to new data and ensures
its reliability in real-world scenarios.
6. User interface - Develop a user-friendly interface that lets users interact using
recognized ISL hand gestures. The interface may include visual feedback or text
translations of the gestures to enhance communication.
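The first three steps above can be sketched as a skeleton pipeline. Every function body here is a hypothetical placeholder (random frames instead of webcam capture, per-row means instead of learned features); the paper does not publish its implementation:

```python
import numpy as np

def collect_data(n=4, size=8):
    """Step 1 stand-in: random grayscale frames instead of webcam capture."""
    rng = np.random.default_rng(0)
    return [rng.random((size, size)) for _ in range(n)]

def preprocess(frame):
    """Step 2 stand-in: normalize pixel values to [0, 1]."""
    lo, hi = frame.min(), frame.max()
    return (frame - lo) / (hi - lo)

def extract_features(frame):
    """Step 3 stand-in: a crude feature vector of per-row means."""
    return frame.mean(axis=1)

def run_pipeline():
    frames = collect_data()
    feats = [extract_features(preprocess(f)) for f in frames]
    return np.stack(feats)  # one feature vector per frame

features = run_pipeline()
print(features.shape)  # (4, 8): 4 frames, 8 features each
```

Steps 4-6 (training, evaluation, interface) would consume this feature matrix together with the gesture labels.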

5. Conclusion

In conclusion, the application of Convolutional Neural Networks (CNNs) for
Indian Sign Language (ISL) hand gesture recognition demonstrates promising results.
The use of CNNs allows for effective feature extraction from image data, enabling the
model to learn intricate patterns associated with various ISL gestures. Through
extensive training on diverse datasets, the CNN achieves a high level of accuracy in
recognizing and classifying hand gestures.

Furthermore, the robustness of the model is evident in its ability to generalize
well to new, unseen data, showcasing its potential for real-world applications. The
convolutional layers enable the network to capture spatial hierarchies, while pooling
layers contribute to spatial invariance, enhancing the overall performance of the ISL
gesture recognition system.

However, challenges such as variations in lighting conditions, hand orientations,
and background clutter still pose potential issues. Fine-tuning the model and augmenting
the dataset with more diverse examples could further enhance its adaptability to real-
world scenarios. Additionally, continuous updates and refinements to the model may be
necessary to accommodate new ISL gestures or improve accuracy over time.

In summary, the use of Convolutional Neural Networks for ISL hand gesture
recognition holds great promise, with the potential to facilitate communication for
individuals with hearing impairments. As technology advances and datasets expand,
this approach can contribute significantly to the development of accessible and
inclusive applications in the field of sign language recognition.

References

[1] S. C. Agrawal, A. S. Jalal, and C. Bhatnagar. Recognition of indian sign language using
feature fusion. In 2012 4th International Conference on Intelligent Human Computer
Interaction (IHCI), pages 1–5, 2012.
[2] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features.
In Aleš Leonardis, Horst Bischof, and Axel Pinz, editors, Computer Vision – ECCV 2006,
pages 404–417, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
