
Second International Conference on Computational Intelligence in Data Science (ICCIDS-2019)

Real-Time Recognition of Indian Sign Language

Muthu Mariappan H
Department of Computer Science and Engineering
National Engineering College
Kovilpatti, Tamil Nadu, India
0000-0001-7801-3512

Dr Gomathi V
Department of Computer Science and Engineering
National Engineering College
Kovilpatti, Tamil Nadu, India
0000-0003-3639-485X

Abstract – The real-time sign language recognition system is developed for recognising the gestures of Indian Sign Language (ISL). Generally, sign languages consist of hand gestures and facial expressions. For recognising the signs, the Regions of Interest (ROI) are identified and tracked using the skin segmentation feature of OpenCV. The training and prediction of hand gestures are performed by applying the fuzzy c-means clustering machine learning algorithm. Gesture recognition has many applications, such as gesture-controlled robots and automated homes, game control, Human-Computer Interaction (HCI) and sign language interpretation. The proposed system recognises signs in real time; hence it is very useful for hearing- and speech-impaired people to communicate with normal people.

Keywords – ISL, Sign language recognition, HCI, Fuzzy c-means clustering

I. INTRODUCTION

A World Health Organization (WHO) survey states that above 6% of the world's population suffers from hearing impairment. In March 2018, the number of people with this disability was 466 million, and it is expected to reach 900 million by 2050. Also, the 2011 census of India states that 7 million Indians suffer from hearing and speech impairment. They do not regard these impairments as disabilities; it is simply another way of living. However, their social circle is very limited, and they should not have to be part of the deaf world alone, which can seem cloistered. Text messaging, writing, using visual media and finger spelling are a few methods used to establish communication between normal people and hearing- and speech-impaired people. However, the latter prefer sign language, because they can express their emotions and feelings best through signs. Conversing in their regional sign language therefore makes it more comfortable for them to share their ideas and thoughts with their near and dear ones.

Sign languages are a visual representation of thoughts through hand gestures, facial expressions and body movements. Sign languages also have several variants, such as American Sign Language (ASL), Argentinean Sign Language (LSA), British Sign Language (BSL) and ISL. Hearing- and speech-impaired people prefer the sign language that is most widely used in their region. Moreover, in India there is no universal sign language. Although many sign languages exist, most normal people do not know them, so communicating with deaf and mute people becomes more complex.

Recognition of sign language can be done in two ways: glove-based recognition or vision-based recognition. In the glove-based technique, a network of sensors is used to capture the movements of the fingers. Facial expressions cannot be recognised with this method, wearing a glove is always uncomfortable for the user, and the method cannot be deployed widely because data gloves are very expensive. So, the proposed system uses the non-invasive vision-based recognition method. Vision-based recognition can be achieved in two ways: static recognition or dynamic recognition. In a static recognition system, the input may be an image of a hand pose; it provides only a 2D representation of the gesture, and this can be used to recognise only alphabets and numbers. For recognition of continuous sign language, a dynamic gesture recognition system is used. Here, real-time videos are given as inputs to the system, and a sequence of hand movements forms the gesture of a word or sentence. Information Technology, with its modern methodologies such as artificial intelligence and cloud computing, has an impressive role to play in enhancing communication between people with vocal disabilities and normal people.


II. RELATED WORKS

Hand gesture recognition has been a key area of research for the past two decades, and researchers have tried a variety of techniques for gesture recognition. Geethu Nath and Arun C.S. [1] developed a system using an ARM Cortex-A8 processor for recognising ASL symbols. The system uses the Jarvis algorithm to recognise numbers and a template matching algorithm to recognise alphabets. Kumud Tripathi [2] designed a system for recognising continuous ISL gestures using Principal Component Analysis (PCA) with various distance classifiers. From their own dataset, features from the keyframes are extracted using an Orientation Histogram and given as input to the system. Manasa Srinivasa H.S. and Suresha H.S. [3] used the codebook algorithm for background subtraction and generated binary images from the given image frames. The binary image is used to calculate the convex hull and convexity defects, and depending on the defect points the unfolded fingers are counted.

Joyeeta Singha and Karen Das [4] described a novel approach to recognising the alphabets of ISL. An eigenvector-based technique is used for feature extraction, and an eigenvalue-weighted Euclidean distance technique is used for classification of 24 different ISL alphabets. Archana S. Ghotkar and Gajanan K. Kharate [5] explored rule-based and dynamic time warping (DTW) based methods to recognise ISL words. Their experiments showed that DTW performs much better for continuous word recognition. M.K. Bhuyan [6] segments frames into video object planes (VOPs) to obtain a semantically meaningful hand position. Key VOPs and temporal information are tracked to form a complete gesture sequence. The test results concluded that, by using keyframes, a gesture can be uniquely represented as a finite state machine with keyframes and the corresponding frame durations as states. M.M. Gharasuie and H. Seyedarabi [7] proposed a system that recognises real-time hand gestures for the numbers 0 to 9 using Hidden Markov Models (HMMs); preprocessing and tracking take place in a hand-trajectory extraction phase, and feature extraction takes place in the classification phase. The system achieves a 93% recognition rate.

Kairong Wang [8] presented a Codebook (CB) modelling method and spatial moments for recognising dynamic hand gestures. Background subtraction with skin-colour detection is used for hand-region segmentation. The palm centre is identified from the spatial moments of the hand contour, and the fingers are tracked by a curvature-based algorithm. Francke H. [9] developed a real-time hand gesture recognition system using active learning and bootstrap training techniques. The use of colour-based hand tracking and a multi-gesture classification tree increases the robustness of the system, giving it 86% accuracy, better than similar systems. Hari Prabhat Gupta [10] uses accelerometer and gyroscope sensors for recognising continuous hand gestures. The starting and ending points of meaningful gesture segments are detected using an automatic gesture-spotting algorithm, and the resulting gesture code is compared with the gesture database using the DTW algorithm to recognise the corresponding gesture. Noor Tubaiz [11] suggested sequential data classification using a Modified k-Nearest Neighbour (MKNN) approach. The hand motions are sensed using data gloves, and the raw data are augmented with window-based statistical features computed from the previous and future raw feature vectors. Building on these existing systems, the proposed system has been developed with novel techniques to recognise words of Indian Sign Language (ISL).

III. METHODOLOGY

The proposed system has a camera unit for capturing the gestures of hearing- and speech-impaired people. The real-time sign language recognition system was designed as a portable unit for the convenience of its users. Raw videos taken against a dynamic background are given as input to the system, and the image frames are resized so that all videos share the same dimensions. OpenCV (the Open Source Computer Vision library) is used for feature extraction and video classification.
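As a rough illustration of this capture stage, the sketch below (Python with OpenCV, the library named above) reads frames from a camera and resizes them to a common size. The camera index and the frame size are illustrative assumptions, since the paper does not specify them.

import cv2

FRAME_SIZE = (320, 240)  # assumed working resolution; the paper does not state one

def capture_frames(camera_index=0):
    """Yield resized BGR frames from the camera until it stops delivering frames."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # resize every frame so that all recorded videos share the same dimensions
            yield cv2.resize(frame, FRAME_SIZE)
    finally:
        cap.release()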

978-1-5386-9471-8/19/$31.00 ©2019 IEEE

Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on July 27,2024 at 06:02:18 UTC from IEEE Xplore. Restrictions apply.
Second International Conference on Computational Intelligence in Data Science (ICCIDS-2019)

A. PREPROCESSING

During preprocessing, high-intensity noise is removed from the video frames. The first and foremost step in preprocessing is smoothing or blurring, and the most common goal of blurring is to reduce noise. The blurred image is obtained by performing a convolution with a low-pass box filter. A 3x3 normalised box filter can be represented as

K = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}

Coloured objects can be extracted more easily in the HSV colour space, so the images are converted from BGR to HSV, where H ranges from 0 to 179, S from 0 to 255 and V from 0 to 255. At the end of preprocessing, binary images are obtained in which the white area is the skin region and the black area represents everything else.
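A minimal sketch of this preprocessing step is given below, assuming Python with OpenCV. The HSV skin-colour bounds are illustrative assumptions; the paper does not list the exact thresholds it uses.

import cv2
import numpy as np

# assumed HSV skin-colour bounds, with H in [0, 179] and S, V in [0, 255] as noted above
LOWER_SKIN = np.array([0, 40, 60], dtype=np.uint8)
UPPER_SKIN = np.array([25, 255, 255], dtype=np.uint8)

def preprocess(frame_bgr):
    """Blur the frame, convert it to HSV and return a binary skin mask."""
    blurred = cv2.blur(frame_bgr, (3, 3))            # 3x3 normalised box filter
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)   # BGR -> HSV colour space
    mask = cv2.inRange(hsv, LOWER_SKIN, UPPER_SKIN)  # white = skin region, black = rest
    return mask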
B. MORPHOLOGICAL TRANSFORMATIONS

Morphological transformations operate on binary images based on the shape of the image. They require the original image and a structuring element (kernel) as inputs. Erosion and dilation are the two basic morphological operators. Erosion removes noise near the edges, depending on the kernel size, so it is very useful for removing small noise from the foreground. Erosion is followed by dilation, which grows the foreground object (the white region) in the output image, because the object may shrink while eroding.

Let E be a Euclidean space, A a binary image in E and B the structuring element. The erosion of A by B is

A \ominus B = \bigcap_{b \in B} A_{-b}

where A_{-b} denotes the translation of A by -b. The dilation of A by B is

A \oplus B = \bigcup_{b \in B} A_{b}

where A_{b} is the translation of A by b.

C. BACKGROUND NOISE REMOVAL

The two morphological operations are repeated until a clear foreground object is extracted. The kernel used for the morphological operations depends on the needs of the system, and it may be created manually or with the OpenCV module. Morphological operations together with median blurring achieve high efficiency in noise removal. The proposed system uses the 5x5 elliptical kernel (OpenCV's MORPH_ELLIPSE shape)

\begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}

The median blurring technique is very efficient at removing salt-and-pepper noise from the image: in median filtering, the centre pixel is replaced by the median of all neighbouring pixels. After applying the morphological operations and median blurring, a simple threshold function is used to obtain the final preprocessed image.
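This noise-removal pipeline can be sketched as follows. It is an illustrative example built around the kernel and threshold described above, not the authors' exact code, and the iteration counts are assumptions.

import cv2

# 5x5 elliptical structuring element, matching the kernel shown above
KERNEL = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def clean_mask(mask):
    """Erode, dilate and median-blur a binary mask, then re-threshold it."""
    eroded = cv2.erode(mask, KERNEL, iterations=2)       # strip small foreground noise
    dilated = cv2.dilate(eroded, KERNEL, iterations=2)   # restore the shrunken object
    smoothed = cv2.medianBlur(dilated, 5)                # remove salt-and-pepper noise
    _, binary = cv2.threshold(smoothed, 127, 255, cv2.THRESH_BINARY)
    return binary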

D. FINDING CONTOURS

In the proposed system, contours are used for detecting the objects. A contour is a curve joining all the points along an edge that have the same colour or intensity; in other words, a contour is the outline or silhouette of an object.

Contours work best on binary images, so thresholding or Canny edge detection is applied before finding the contours. The result is a list of all the contours in the image, where each contour is an array of the (x, y) coordinates of the boundary points of the object. The area of every contour is calculated and, using these areas, the top three contours are selected. These three contours represent the face and the left and right hands, which contribute to the gestures.
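A possible implementation of this contour selection is shown below, assuming OpenCV 4.x, where cv2.findContours returns the contours and their hierarchy.

import cv2

def top_three_contours(binary_mask):
    """Return the three largest contours (face, left hand, right hand) by area."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return sorted(contours, key=cv2.contourArea, reverse=True)[:3]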
E. FEATURE EXTRACTION

Feature selection and extraction are crucial steps in an image-processing application: the most relevant features should be identified and extracted for the system to work correctly.

Criteria for feature selection/extraction:
- Either improve or maintain the accuracy of the classifier
- Simplify the complexity of the classifier


These two criteria must be satisfied while doing feature selection and extraction. The top three contours obtained earlier completely cover the Regions of Interest (the face and the left and right hands). The required features are extracted from these regions as a feature vector for each frame in a video.

Fig. 1. Flow diagrams of training and testing.

F. FUZZY CLUSTERING

Clustering is the process of grouping similar data items together while keeping the items in different clusters as dissimilar as possible. In fuzzy clustering, a data item may belong to more than one cluster. Among the fuzzy clustering algorithms, the fuzzy c-means (FCM) algorithm is the most widely used, and it can be applied to both supervised and unsupervised learning, depending on the needs.

FCM partitions n data elements X = \{x_1, x_2, \ldots, x_n\} into c clusters C = \{c_1, c_2, \ldots, c_c\} based on the criteria used. The partition matrix is

W = [w_{ij}], \quad w_{ij} \in [0, 1], \quad i = 1, \ldots, n, \quad j = 1, \ldots, c

where w_{ij} is the membership of x_i in cluster c_j. FCM minimises the objective function

\arg\min_{C} \sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}

where m \geq 1, and the fuzzy partitioning is carried out by iterative optimisation of this function together with the update of w_{ij}:

w_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}

Here the index k runs over the c cluster centres, and the update is repeated at each iteration step. During training, the extracted features are given to the c-means algorithm, which partitions the input data items into the specified number of clusters. During testing, the algorithm matches the test file against the existing clusters and returns the id of the cluster centre with the highest degree of membership.
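The update equations above translate directly into a compact NumPy sketch of fuzzy c-means. This is a generic implementation written against the paper's notation, not the authors' code, and the fuzzifier m, the tolerance and the iteration limit are assumed values.

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster the rows of X into c fuzzy clusters; returns (centres, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.random((n, c))
    W /= W.sum(axis=1, keepdims=True)              # memberships of each point sum to 1
    for _ in range(max_iter):
        Wm = W ** m
        # cluster centres are the membership-weighted means of the data points
        centres = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # distances ||x_i - c_j|| between every data point and every centre
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                # guard against division by zero
        # membership update: w_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_W = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.linalg.norm(new_W - W) < tol:
            W = new_W
            break
        W = new_W
    return centres, W

During training, the feature vectors extracted from the videos would form the rows of X, and the returned centres act as the learned gesture clusters.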
IV. EXPERIMENTAL ANALYSIS

The data samples were collected for 80 words and 50 sentences of everyday ISL usage. The videos were recorded from ten volunteers of our collaborator school, using a digital camera.

Sign Type   Number of Signs   Number of Signers   Total Samples
Word        80                10                  800
Sentence    50                10                  500

Table 1: Distribution of the dataset


The data collection camp was planned in two sessions, with samples for 40 words and 25 sentences recorded in each session. At this earlier stage, the system was developed to recognise 40 words. Eight samples of each sign were used for training, and two samples were used for testing.

Word: Book, Note, Pen, Face, Father, Mother, Brother, Soap, Temple, School
Sentences: What is your name? How are you? Where are you going?

Table 2: Samples from the dataset

Fig. 2. Comparison of the original frame (left) and the smoothened frame (right).

In Fig. 2, the right-hand image is obtained by applying the smoothing or blurring technique to reduce the noise in the image.

Fig. 3. HSV colour-space conversion.

The BGR image is converted into the HSV colour space, since coloured-object tracking can be done effectively in HSV; the result is displayed in Fig. 3.

The morphological operations are performed on the HSV image to remove the noise present in the foreground. The morphological transformation gives the binary images shown in Fig. 4, where the white region represents the skin area and the rest is black.

Fig. 4. Erosion (left) and dilation (right).

The contours are used to track the foreground object (the skin area). All the contours in the current frame are identified, and among them the top three represent our ROIs. Fig. 5 depicts the contours covering the face and both hands.

Fig. 5. Segmentation of ROIs with contours.

Features such as the number of points in the convex hull, the number of defect points and the distance from the centre to each finger are extracted from the Regions of Interest through these three contours, and the orientation between the contours is also tracked.
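A rough sketch of how such features might be computed for a single contour with OpenCV is given below. The exact feature set and its encoding are assumptions, since the paper names the feature types but not their precise representation.

import cv2
import numpy as np

def contour_features(contour):
    """Hull size, defect count and mean defect-to-centroid distance for one contour."""
    # contour centroid from the spatial moments
    mom = cv2.moments(contour)
    cx = mom["m10"] / (mom["m00"] + 1e-6)
    cy = mom["m01"] / (mom["m00"] + 1e-6)

    hull = cv2.convexHull(contour)                          # points on the convex hull
    hull_idx = cv2.convexHull(contour, returnPoints=False)  # indices, needed for defects
    defects = cv2.convexityDefects(contour, hull_idx)

    distances = []
    if defects is not None:
        for start, end, far, _ in defects[:, 0]:
            fx, fy = contour[far][0]
            distances.append(np.hypot(fx - cx, fy - cy))    # defect point to centroid

    return [len(hull),
            0 if defects is None else len(defects),
            float(np.mean(distances)) if distances else 0.0]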

The FCM algorithm is used to group similar data items. FCM assigns each data point a membership with respect to every cluster centre: the nearer a data point is to a cluster centre, the higher its membership value, and the memberships of a data point over all clusters sum to one. Memberships and cluster centres are updated iteratively, and finally the algorithm returns the cluster centres and the membership of each data point.


By using these cluster centres and memberships, the fuzzy c-means prediction step classifies new data items: the cluster with the highest membership for the corresponding data point is chosen as the gesture id, and the gestures are identified from this gesture id.
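A minimal sketch of this prediction step is shown below: given the trained cluster centres, the memberships of a new feature vector are computed with the same update rule and the index of the best-matching cluster is returned as the gesture id. The function and variable names are illustrative, not taken from the paper.

import numpy as np

def predict_gesture(feature_vec, centres, m=2.0):
    """Return the index (gesture id) of the cluster with the highest membership."""
    dist = np.linalg.norm(centres - feature_vec, axis=1)
    dist = np.fmax(dist, 1e-12)
    # memberships of the new point against every cluster centre
    memberships = 1.0 / ((dist[:, None] / dist[None, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    return int(np.argmax(memberships))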
V. RESULTS

This FCM-based real-time sign language recognition system for recognising the words of Indian Sign Language has produced 75% accuracy in gesture labelling, which is somewhat higher than similar systems. The developed system also compares favourably with other systems in that it is capable of recognising 40 words of ISL in real time, whereas similar systems can recognise only static gestures. The performance of FCM makes it more efficient and reliable than other clustering algorithms in many applications.

VI. CONCLUSION

The system for recognising real-time Indian Sign Language (ISL) plays an impressive role in enhancing casual communication between people with hearing disabilities and normal persons. Though FCM is efficient, it requires more computation time than other algorithms, and most traditional algorithms struggle with high-dimensional datasets. Hence it is planned to extend the system by combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to capture the spatial and temporal features. In future work, more words will be added to the system.

VII. ACKNOWLEDGEMENT

Sincere thanks to EPICS in IEEE for providing the initial funding to develop this assistive product. The research team appreciates and heartily thanks the high school volunteers for their contribution to the dataset.

VIII. REFERENCES

[1]. Geethu G Nath and Arun C S, "Real Time Sign Language Interpreter," 2017 International Conference on Electrical, Instrumentation, and Communication Engineering (ICEICE 2017).
[2]. Kumud Tripathi, Neha Baranwal and G. C. Nandi, "Continuous Indian Sign Language Gesture Recognition and Sentence Formation," Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015), Procedia Computer Science 54 (2015) 523-531.
[3]. Manasa Srinivasa H S and Suresha H S, "Implementation of Real Time Hand Gesture Recognition," International Journal of Innovative Research in Computer and Communication Engineering, Vol. 3, Issue 5, May 2015.
[4]. Joyeeta Singha and Karen Das, "Automatic Indian Sign Language Recognition for Continuous Video Sequence," ADBU Journal of Engineering Technology, Volume 2, Issue 1, 2015.
[5]. Archana S. Ghotkar and Gajanan K. Kharate, "Dynamic Hand Gesture Recognition and Novel Sentence Interpretation Algorithm for Indian Sign Language Using Microsoft Kinect Sensor," Journal of Pattern Recognition Research 1 (2015) 24-38.
[6]. M.K. Bhuyan, "FSM-based recognition of dynamic hand gestures via gesture summarization using key video object planes," World Academy of Science, Engineering and Technology, Vol. 6, 2012.
[7]. M.M. Gharasuie and H. Seyedarabi, "Real-time Dynamic Hand Gesture Recognition using Hidden Markov Models," 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP).
[8]. Kairong Wang, Bingjia Xiao, Jinyao Xia, and Dan Li, "A Dynamic Hand Gesture Recognition Algorithm Using Codebook Model and Spatial Moments," 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics.
[9]. Francke H., Ruiz-del-Solar J. and Verschae R., "Real-Time Hand Gesture Detection and Recognition Using Boosted Classifiers and Active Learning," Advances in Image and Video Technology, PSIVT 2007, Lecture Notes in Computer Science, vol. 4872, Springer, Berlin, Heidelberg.
[10]. Hari Prabhat Gupta, Haresh S Chudgar, Siddhartha Mukherjee, Tanima Dutta, and Kulwant Sharma, "A Continuous Hand Gestures Recognition Technique for Human-Machine Interaction using Accelerometer and Gyroscope Sensors," IEEE Sensors Journal, Vol. 16, Issue 16, Aug. 15, 2016, pp. 6425-6432.
[11]. Noor Tubaiz, Tamer Shanableh, and Khaled Assaleh, "Glove-Based Continuous Arabic Sign Language Recognition in User-Dependent Mode," IEEE Transactions on Human-Machine Systems, Vol. 45, No. 4, August 2015.
