Real-Time Conversion for Sign-to-Text and Text-to-Speech Communication using Machine Learning
This research paper introduces a real-time translation system that uses the power of machine learning to instantly convert hand gestures into text and text into speech. The project applies computer vision and natural language processing technologies to address the communication problems of sign language users and of people who rely on spoken language. This research aims to contribute to an integrated society by eliminating communication barriers and providing technological solutions that support communication for people who use sign language. Performance evaluation in terms of translation accuracy and the naturalness of the synthesized speech demonstrates the social impact of this approach.
| Title | Authors | Advantages | Limitations |
|---|---|---|---|
| Sign Language to Text Conversion in Real Time | Shubham Thakar, Samveg Shah, Bhavya Shah, Anant V. Nimkar | High accuracy (98.7%) achieved with transfer learning compared to CNN (94%). | Assumes a smooth background in images; future scope includes diversifying the model for different sign languages and improving robustness to diverse image backgrounds. |
| Sign Language to Text and Speech Conversion Using CNN | Shreyas Viswanathan, Saurabh Pandey, Kartik Sharma, Dr. P. Vijayakumar | Affordable and efficient solution using Raspberry Pi; hand gesture recognition for American Sign Language. | Limited to 11 ASL alphabets due to processing-power constraints; challenges in diverse lighting conditions. |
| Sign Language to Text-Speech Translator Using Machine Learning | Akshatha Rani K, Dr. N Manjanaik | Bridges the communication gap between deaf-mute individuals and others; utilizes efficient hand tracking with MediaPipe; converts recognized signs to speech, aiding blind individuals. | Achieves 74% accuracy; recognizes almost all letters in ASL; addresses the challenge of communication for deaf and mute individuals. |
| Sign Language Recognition and Response via Virtual Reality | S. Kumara Krishnan, V. Prasanna Venkatesan, V. Suriya Ganesh, D.P. Sai Prassanna, K. Sundara Skandan | Utilizes a virtual reality headset for immersive sign language learning; employs Leap Motion controller features for real-time gesture recognition. | Dependency on hardware; limitation to alphabets; cost implications with increased sensors. |
| KoSign Sign Language Translation Project: Introducing The NIASL2021 Dataset | Mathew Huerta-Enochian, Du Hui Lee, Hye Jin Myung, Kang Suk Byun, Jun Woo Lee | Quantitative evaluation of the translation methodology, revealing that text-free prompting produced better translations than text-based prompting. | Difficulty translating low-context and unclear phrases into KSL; dates cannot be signed as the day of the month without also signing the month. |
In recent years there has been tremendous research on hand gesture recognition. From the literature survey, we identified the basic steps in hand gesture recognition:

Data acquisition:

Use of sensory devices: Electromechanical devices provide the exact hand configuration and position. Different glove-based approaches can be used to extract this information, but they are expensive and not user friendly.

Vision-based approach: In vision-based methods, a computer webcam is the input device for observing information about the hands and/or fingers. Vision-based methods require only a camera, thus realizing natural interaction between humans and computers without the use of any extra devices, thereby reducing cost. These systems tend to complement biological vision by describing artificial vision systems that are implemented in software and/or hardware. The main challenge of vision-based hand detection ranges from coping with the large variability of the human hand's appearance due to the huge number of possible hand movements, to different skin colours, as well as to variations in viewpoint, scale, and the speed of the camera capturing the scene.
Data pre-processing and feature extraction for the vision-based approach:

One approach to hand detection combines threshold-based colour detection with background subtraction. An AdaBoost face detector can be used to differentiate between faces and hands, since both have a similar skin colour.

We can also extract the image to be trained by applying a Gaussian Blur filter (also known as Gaussian smoothing). The filter can be applied easily using Open Source Computer Vision (OpenCV), as in the sketch below.

Alternatively, instrumented gloves can be used to capture the image to be trained. This reduces pre-processing computation time and gives more concise and accurate data than applying filters to frames extracted from video.
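As a rough illustration of this pipeline, the following Python/OpenCV sketch combines threshold-based skin-colour detection, Gaussian smoothing, and background subtraction. The HSV skin bounds, kernel size, and subtractor parameters are illustrative assumptions that would need tuning, not values taken from the surveyed papers.

```python
import cv2
import numpy as np

# Illustrative HSV bounds for skin colour; these are assumptions
# and usually need tuning per camera and lighting condition.
SKIN_LOWER = np.array([0, 40, 60], dtype=np.uint8)
SKIN_UPPER = np.array([25, 180, 255], dtype=np.uint8)

# Background subtractor for separating the moving hand from the scene.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

def extract_hand_mask(frame_bgr):
    """Combine threshold-based skin detection with background subtraction."""
    # Gaussian blur (Gaussian smoothing) suppresses sensor noise
    # before thresholding; the 5x5 kernel is an arbitrary choice.
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)

    # Threshold-based colour detection in HSV space.
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    skin_mask = cv2.inRange(hsv, SKIN_LOWER, SKIN_UPPER)

    # Keep only skin-coloured pixels that also moved relative
    # to the learned background model.
    motion_mask = bg_subtractor.apply(blurred)
    return cv2.bitwise_and(skin_mask, motion_mask)
```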
Gesture classification:

Hidden Markov Models (HMMs) can be used for the classification of gestures. This model deals with the dynamic aspects of gestures. Gestures are extracted from a sequence of video images by tracking the skin-colour blobs corresponding to the hand in a body-face space centred on the face of the user. The goal is to recognize two classes of gestures: deictic and symbolic. The image is filtered using a fast look-up indexing table. After filtering, skin-colour pixels are gathered into blobs. Blobs are statistical objects based on the location (x, y) and the colorimetry (Y, U, V) of the skin-colour pixels, used to determine homogeneous areas.
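For concreteness, one common way to realize HMM-based gesture classification is to fit one HMM per gesture class on per-frame blob features and assign an unseen sequence to the class whose model gives the highest log-likelihood. The sketch below uses the third-party hmmlearn library and a five-dimensional (x, y, Y, U, V) feature vector; both choices are assumptions, not details given in the cited work.

```python
import numpy as np
from hmmlearn import hmm  # third-party library; an assumption, not named in the paper

def train_gesture_hmms(sequences_by_class, n_states=4):
    """Fit one Gaussian HMM per gesture class.

    sequences_by_class: dict mapping class name to a list of
    (T_i, 5) arrays of per-frame blob features (x, y, Y, U, V).
    """
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)          # stack all frames of all sequences
        lengths = [len(s) for s in seqs]  # frame count per sequence
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(X, lengths)
        models[label] = model
    return models

def classify_sequence(models, seq):
    """Pick the class whose HMM assigns the sequence the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```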
The Naïve Bayes classifier is another effective and fast method for static hand gesture recognition. It is based on classifying the different gestures according to geometric invariants obtained from image data after segmentation. Thus, unlike many other recognition methods, this method is not dependent on skin colour. The gestures are extracted from each frame of the video against a static background. The first step is to segment and label the objects of interest and to extract geometric invariants from them. The next step is the classification of gestures using a k-nearest-neighbour algorithm aided with a distance-weighting algorithm (KNNDW) to provide suitable data for a locally weighted Naïve Bayes classifier.
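A minimal sketch of this static-gesture path follows. Hu moments serve as one standard example of geometric invariants, and scikit-learn's distance-weighted k-NN stands in for the KNNDW stage; the locally weighted Naïve Bayes step is omitted. These substitutions are assumptions for illustration only.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def geometric_invariants(binary_mask):
    """Hu moments: seven values invariant to translation, scale, and rotation.
    Log-scaling keeps their very different magnitudes comparable."""
    moments = cv2.moments(binary_mask, binaryImage=True)
    hu = cv2.HuMoments(moments).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

# Distance-weighted k-NN: closer neighbours get larger votes,
# standing in for the KNNDW stage described above.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")

# Hypothetical usage, with train_masks a list of segmented binary hand
# images and train_labels the corresponding gesture names:
# knn.fit(np.stack([geometric_invariants(m) for m in train_masks]), train_labels)
# predicted = knn.predict([geometric_invariants(test_mask)])
```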
According to the paper "Human Hand Gesture Recognition Using a Convolution Neural Network" by Hsien-I Lin, Ming-Hsiang Hsu, and Wei-Kai Chen (Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan), a skin model is constructed to extract the hands from an image, and a binary threshold is then applied to the whole image. After obtaining the thresholded image, it is calibrated about the principal axis in order to centre the image on that axis. This image is fed to a convolutional neural network model for training and prediction. Trained on 7 hand gestures, their model achieved an accuracy of around 95% for those 7 gestures.
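The principal-axis calibration step that Lin et al. describe can be approximated from second-order image moments, as in the sketch below; this is a reconstruction of the idea rather than their published code.

```python
import cv2
import numpy as np

def align_to_principal_axis(binary_hand):
    """Rotate a thresholded hand image so its principal axis aligns with the image axes."""
    m = cv2.moments(binary_hand, binaryImage=True)
    # Orientation of the principal axis from second-order central moments.
    theta = 0.5 * np.arctan2(2 * m["mu11"], m["mu20"] - m["mu02"])
    # Rotate about the region's centroid to undo that orientation;
    # the sign convention may need flipping for a given alignment target.
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    rot = cv2.getRotationMatrix2D((cx, cy), np.degrees(theta), 1.0)
    h, w = binary_hand.shape
    return cv2.warpAffine(binary_hand, rot, (w, h))
```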
TensorFlow:
TensorFlow is an end-to-end open-source platform
for Machine Learning. It has a comprehensive,
flexible ecosystem of tools, libraries and
community resources that lets researchers push the
state-of-the-art in Machine Learning and developers
easily build and deploy Machine Learning powered
applications.
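To make this concrete, the following is a minimal sketch of a small gesture-classification CNN in TensorFlow's Keras API, along the lines of the model discussed above; the 64x64 grayscale input, layer sizes, and 7-class output are illustrative assumptions.

```python
import tensorflow as tf

# Minimal CNN for classifying 7 static hand gestures from
# 64x64 grayscale images; all sizes here are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(7, activation="softmax"),  # one output per gesture
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Hypothetical training call on prepared image/label arrays:
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```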
OpenCV:
OpenCV (Open Source Computer Vision) is an open-source library of programming functions aimed at real-time computer vision.
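A typical real-time loop with OpenCV reads webcam frames and would pass each one through the recognition pipeline sketched earlier; the capture and display calls below are standard OpenCV usage.

```python
import cv2

# Standard OpenCV capture loop: read webcam frames until 'q' is pressed.
cap = cv2.VideoCapture(0)  # 0 selects the default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # A real system would run hand detection / classification here.
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```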
V. CONCLUSION