Real-Time Recognition of Indian Sign Language
Abstract – A real-time sign language recognition system is developed for recognising the gestures of Indian Sign Language (ISL). Generally, sign languages consist of hand gestures and facial expressions. For recognising the signs, the Regions of Interest (ROI) are identified and tracked using the skin segmentation feature of OpenCV. The training and prediction of hand gestures are performed by applying the fuzzy c-means clustering machine learning algorithm. Gesture recognition has many applications, such as gesture-controlled robots and automated homes, game control, Human-Computer Interaction (HCI) and sign language interpretation. The proposed system recognizes signs in real time, and is therefore very useful for hearing and speech impaired people to communicate with normal people.

Keywords – ISL, Sign language recognition, HCI, Fuzzy c-means clustering

I. INTRODUCTION

A World Health Organization (WHO) survey states that above 6% of the world's population suffers from hearing impairment. In March 2018, the number of people with this disability was 466 million, and it is expected to reach 900 million by 2050. Also, the 2011 census of India states that 7 million Indians suffer from hearing and speech impairment. They do not think of these impairments as disabilities; it is another way of living a different life. However, their circle is very limited, and they should not be confined to the deaf world alone, which seems cloistered sometimes. Text messaging, writing, using visual media and finger spelling are a few methods used to establish communication between normal people and hearing and speech impaired people. However, they prefer sign language, because they can express their emotions and feelings fully only through signs. So conversing in their regional sign language brings more comfort to these people when sharing their ideas and thoughts with their near and dear ones.

Sign languages are a visual representation of thoughts through hand gestures, facial expressions, and body movements. Sign languages also have several variants, such as American Sign Language (ASL), Argentinean Sign Language (LSA), British Sign Language (BSL) and ISL. Hearing and speech impaired people prefer the sign language that is mostly used in their region. Moreover, in India, there is no universal sign language. Though there exist many sign languages, normal people do not know about them. Hence communicating with deaf and dumb people becomes more complex.

Recognition of sign language can be done in two ways: glove-based recognition or vision-based recognition. In the glove-based technique, a network of sensors is used to capture the movements of the fingers. Facial expressions cannot be recognized in this method, and wearing a glove is always uncomfortable for the users. This method also cannot be deployed widely, since data gloves are very expensive. So the proposed system uses the non-invasive vision-based recognition method. Vision-based recognition can be achieved in two ways: static recognition or dynamic recognition. In a static recognition system, the input may be an image of a hand pose. It provides only a 2D representation of the gesture, and this can be used to recognize only alphabets and numbers. For recognition of continuous sign language, a dynamic gesture recognition system is used. Here, real-time videos are given as inputs to the system, and a sequence of hand movements forms the gesture of a word or sentence. Information Technology, with its modern methodologies such as artificial intelligence and cloud computing, has an impressive role in enhancing intercommunication between people with vocal disabilities and normal people.
Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on July 27,2024 at 06:02:18 UTC from IEEE Xplore. Restrictions apply.
Second International Conference on Computational Intelligence in Data Science (ICCIDS-2019)
blurring, and the most common goal of blurring is to reduce noise. The blurred image is obtained by performing a convolution operation with a low-pass box filter. A 3x3 normalised box filter can be represented as:

K = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}

Coloured object extraction can be achieved more easily in the HSV colourspace. Hence the images are converted from the BGR colourspace to the HSV colourspace, where H ranges from 0 to 179, S from 0 to 255, and V from 0 to 255. At the end of the preprocessing, binary images are obtained in which the white area is the skin region and the black area represents the rest.

B. MORPHOLOGICAL TRANSFORMATIONS

Morphological transformations operate on binary images based on the shape of the image. They require the original image and the structuring element, or kernel, as inputs. Erosion and dilation are the two basic morphological operators. Erosion removes the noise near the edges, based on the kernel size; thus erosion is very useful for removing small noise from the foreground. The erosion is followed by dilation, which grows the foreground object (the white region) in the output image, because the object may shrink while eroding.

Let E be a Euclidean space, A a binary image in E, and B the structuring element. Then the erosion of A by B is

A \ominus B = \bigcap_{b \in B} A_{-b}

where A_{-b} denotes the translation of A by -b. The dilation of A by B is

A \oplus B = \bigcup_{b \in B} A_{b}

where A_{b} is the translation of A by b.

C. BACKGROUND NOISE REMOVAL

The two morphological operations are repeated until a clear foreground object is extracted. While performing the morphological operations, the choice of kernel depends upon the needs of the system, and it may be created manually using the OpenCV module. Morphological operations together with median blurring achieve high efficiency in noise removal. In the proposed system the 5x5 elliptical kernel shown below is used:

\mathrm{MORPH\_ELLIPSE} = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}

The median blurring technique is very efficient at removing salt-and-pepper noise from the image. In median filtering, the centre value is replaced by the median of all neighbouring pixels. After applying morphological operations and median blurring, a simple threshold function is used to obtain the final preprocessed image.

D. FINDING CONTOURS

In the proposed system, contours are used for detecting the objects. A contour is a curve that joins all the points along an edge having the same colour or intensity; in other words, a contour is the outline or silhouette of an object. Contours work very well on binary images, hence thresholding or Canny edge detection is applied before finding contours. The result is a list of all the contours in the image, where each contour is an array of (x, y) coordinates of the boundary points of an object. The areas of all the contours are calculated, and the top three contours by area are selected. These three contours represent the face, the left hand and the right hand, which together form the gestures.

E. FEATURE EXTRACTION

Feature selection and extraction are crucial steps in an image processing application. The most relevant features should be identified and extracted for correct functionality.
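To make the contour-based ROI selection concrete, here is a minimal pure-NumPy sketch that labels connected white regions and keeps the three largest by area. It is a hypothetical stand-in for the OpenCV pipeline (cv2.findContours plus cv2.contourArea), and the toy mask below is illustrative, not the paper's data.

```python
from collections import deque
import numpy as np

def top_regions(binary, k=3):
    """Label connected white regions (4-connectivity) and return the
    k largest as (area, label) pairs, largest first."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    areas = []
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                next_label += 1
                area = 0
                q = deque([(sy, sx)])
                labels[sy, sx] = next_label
                while q:                       # BFS flood fill of one region
                    y, x = q.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            q.append((ny, nx))
                areas.append((area, next_label))
    areas.sort(reverse=True)
    return areas[:k]

# Toy binary mask with three blobs (standing in for face and two hands)
mask = np.zeros((8, 8), dtype=bool)
mask[0:3, 0:3] = True   # area 9
mask[5:7, 0:2] = True   # area 4
mask[6:8, 5:8] = True   # area 6
print(top_regions(mask))  # → [(9, 1), (6, 3), (4, 2)]
```

In the real system the three largest regions would be taken as the face and the two hands before feature extraction.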
Criteria for feature selection/extraction:
• Either improve or maintain the accuracy of the classifier
• Simplify the complexity of the classifier

These two criteria must be satisfied while doing feature selection and extraction. The top three contours obtained earlier completely cover the Regions of Interest (the face, left hand and right hand). The required features are extracted from these regions as feature vectors for each frame in a video.

Fig.1. Flow Diagrams of Training and Testing

F. FUZZY CLUSTERING

Clustering is the process of grouping similar data items together, while keeping the items in other clusters as dissimilar as possible. In fuzzy clustering, a data item may belong to more than one cluster. Among the several fuzzy clustering algorithms, the fuzzy c-means (FCM) algorithm is the most widely used, and it can be applied to both supervised and unsupervised learning, depending upon the needs. FCM minimises the objective

J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} w_{ij}^{m} \lVert x_i - c_j \rVert^2

where N is the number of data items, C is the number of clusters, and m > 1 is the fuzziness exponent. The fuzzy partitioning is carried out by an iterative optimisation of the above function with updates of the memberships w_{ij}:

w_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}

Here k runs over the cluster centres. During training, the extracted features are given to the c-means algorithm, which partitions the input data items into a specified number of clusters. During testing, it matches the test file against the existing clusters and returns the id of the cluster centre with the highest degree of membership.

IV. EXPERIMENTAL ANALYSIS

The data samples are collected for 80 words and 50 sentences of everyday-usage terms of ISL. The videos are recorded from ten volunteers of our collaborator school, using a digital camera.

Sign Type | Number of Signs | Number of Signers | Total Samples
Word      | 80              | 10                | 800
Sentence  | 50              | 10                | 500

Table.1: Distribution of Dataset
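As a concrete illustration of the two FCM equations above, here is a minimal NumPy sketch that alternates the standard centre update c_j = Σ_i w_ij^m x_i / Σ_i w_ij^m with the membership update given earlier. This is not the authors' implementation; the toy data and parameter choices are illustrative assumptions.

```python
import numpy as np

def fcm(X, C=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: random fuzzy memberships, then alternate
    centre and membership updates per the formulas in Section F."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), C))
    W /= W.sum(axis=1, keepdims=True)        # memberships sum to 1 per item
    for _ in range(iters):
        Wm = W ** m
        # c_j = sum_i w_ij^m x_i / sum_i w_ij^m
        centres = Wm.T @ X / Wm.sum(axis=0)[:, None]
        # distances d[i, j] = ||x_i - c_j|| (small epsilon avoids /0)
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        # w_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        W = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return W, centres

# Two well-separated toy clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
W, centres = fcm(X)
labels = W.argmax(axis=1)   # hard assignment: highest degree of membership
```

During testing the system would likewise return the id of the cluster centre with the highest membership for a new feature vector.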
The data collection camp was planned in two sessions, with the samples for 40 words and 25 sentences recorded in each session. At the earlier stage, the system was developed to recognize 40 words. Eight samples of each sign were used for training, and two samples were used for testing.

The morphological operations are performed on the HSV image to remove the noise present in the foreground. The morphological transformation gives the binary images shown in Fig. 4; here the white region represents the skin area and the rest is black.
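To make the noise-removal step concrete, here is a pure-NumPy sketch of binary erosion followed by dilation (an "opening", the behaviour of cv2.erode and cv2.dilate). A 3x3 square kernel and a toy image are used here to keep the example easy to follow; the paper itself uses the 5x5 elliptical kernel shown earlier.

```python
import numpy as np

def erode(img, kernel):
    """A pixel stays white only if every white cell of the kernel,
    centred on it, lies over a white image pixel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=0)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            patch = padded[y:y + kh, x:x + kw]
            out[y, x] = np.all(patch[kernel == 1] == 1)
    return out

def dilate(img, kernel):
    """A pixel becomes white if any white cell of the kernel
    overlaps a white image pixel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=0)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            patch = padded[y:y + kh, x:x + kw]
            out[y, x] = np.any(patch[kernel == 1] == 1)
    return out

kernel = np.ones((3, 3), dtype=int)   # simple square kernel for illustration
img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1                     # 3x3 white foreground block
img[0, 0] = 1                         # isolated speck of noise
opened = dilate(erode(img, kernel), kernel)   # speck removed, block preserved
```

Erosion deletes the isolated speck (and shrinks the block to its centre pixel); the following dilation restores the block to its original 3x3 extent.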
using these, the fuzzy c-means prediction algorithm classifies the new data items. The cluster with the highest membership for the corresponding data points is chosen as the gesture id, and the gestures are identified by this gesture id.

V. RESULTS

This FCM-based real-time sign language recognition system for recognising the words of Indian Sign Language has produced 75% accuracy in gesture labelling, which is somewhat higher than similar systems. Also, the developed system improves on other systems in that it is capable of recognising 40 words of ISL in real time, while similar systems can recognize only static gestures. FCM is more efficient and reliable in performance than other clustering algorithms in many applications.

VI. CONCLUSION

The system for recognizing real-time Indian Sign Language (ISL) plays an impressive role in enhancing casual communication between people with hearing disabilities and normal persons. Though FCM is efficient, it requires more computation time than other clustering algorithms. Also, most traditional algorithms struggle with high-dimensionality datasets. Hence it is planned to extend the system by combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to capture the spatial and temporal features. In future work, more words will be added to the system.

VII. ACKNOWLEDGEMENT
Sincere thanks to EPICS in IEEE for providing the
initial funding to develop this assistive product. The
research team appreciates and heartily thanks the
high school volunteers for their contribution to the
dataset.