
Research Article
Volume-1 | Issue-1 | Jan-Jun-2024
Journal of Image Processing and Image Restoration
Double Blind Peer Reviewed Journal
DOI: https://doi.org/10.48001/JoIPIR

Voice Conversion and Hand Gesture Recognition for Aphonic People

Monith M¹, Punith Kumar N¹, Naveen Kumar R¹, Lokesh B S¹, Raghunath B H¹*
¹Department of Electronics and Communication Engineering, Acharya Institute of Technology, Bengaluru, Karnataka, India
*Corresponding Author's Email: [email protected]

ARTICLE HISTORY:
Received: 14th Dec, 2023
Revised: 18th Jan, 2024
Accepted: 28th Jan, 2024
Published: 9th Feb, 2024

KEYWORDS: Aphonic people, Communication, Database, Hand gesture, Voice conversion

ABSTRACT: Aphonia, a condition resulting in the loss of voice, presents significant challenges in interpersonal interactions. This project proposes a dual-pronged approach involving hand gesture recognition and voice conversion techniques to facilitate effective communication for aphonic individuals. The integration of real-time hand gesture recognition provides an alternative means of expressing ideas and emotions: by capturing hand gestures and translating them into textual or auditory output, this approach offers a versatile mode of communication. Additionally, voice conversion algorithms are employed to synthesize natural and intelligible speech from typed or selected text. This coupling of technologies empowers aphonic individuals to engage in fluid conversations, fostering improved social interactions and enhancing their overall quality of life. A webcam is used to communicate with deaf and aphonic people. When modalities of communication such as speech are unavailable, the human hand is the preferred option: hand gestures that convey concepts through diverse shapes and finger alignments enable human-machine interaction. The purpose of this work is to develop a hand gesture detection model and translate its results into text and audio formats. The model also responds to user voice commands and displays hand signs from the database.

1. INTRODUCTION

This work creates a system that allows aphonic people, who rely heavily on sign language, to communicate with others. It is extremely difficult for aphonic persons to convey their message to non-aphonic people, because most hearing people are never taught sign language, and expressing a message during an emergency is especially hard. The solution, therefore, is to transform sign language into audible speech and text. In this project, machine learning is used to train on hand gesture photos, and the trained model is then used to predict those learned hand gestures from a camera. The main goal of the system is to offer deaf and mute persons a more regular existence. A system of this type also allows visually impaired people to readily understand language, and people who are deaf or hard of hearing can communicate their message using text and gestures. Deaf persons can interpret other people's speech from the text shown, which also enables them to live more autonomous lives (Al-Obodi et al., 2020).
The ability to perceive, listen, talk, and respond to situations is one of the most valuable gifts a human being can have, yet some unfortunate people are denied it. It is difficult to create a single compact model for those with visual, hearing, and vocal disabilities, and communication between deaf-mute and hearing people has always been difficult. This project presents a communication system for deaf and mute individuals in a single compact model. We present a method for a blind person to read text by taking an image with a camera and translating the text to speech (TTS). The system enables deaf people to read others' speech as text using speech-to-text (STT) conversion technology, and it includes a method for mute people to use text-to-voice conversion. Blind individuals can read words using PyTesseract OCR (Optical Character Recognition). A laptop is used to carry out all these tasks (Amrutha & Prabu, 2021).
2. OBJECTIVES AND METHODOLOGY

The proposed hand gesture recognition and voice conversion system for deaf and aphonic people, together with the strategies for accomplishing its objectives, is as follows:

• To recognize distinct hand gestures.
• To convert hand gestures to text format.
• To convert hand gestures to speech format.
• To convert speech to text format.

Objective 1: To Recognize Different Hand Gestures

First, images of the hand gestures are taken with a web camera and stored in JPEG format. Using the MediaPipe algorithm, landmarks are detected on the palm of the hand and their coordinates are obtained. There are 21 landmarks available, and each hand sign has its own unique landmarks and coordinates. Each hand sign is labelled and stored in the dataset, and multiple images of different hand signs are captured and stored to train the model accurately, as given in Figure 1 (Liu et al., 2016).

Figure 1: Flowchart to Train the Model.

While running the model, an image is captured with the web camera and compared with the labelled information in the database. If the image matches the labelled information accurately, the hand gesture is recognized and the output is shown; if it does not match, the gesture is rejected, as shown in Figure 2.

Figure 2: Flowchart to Recognize the Hand Gesture.
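As a rough illustration of the capture step, the sketch below (ours, not the authors' code) reads webcam frames with cv2 and extracts the 21 palm landmarks with MediaPipe Hands; the gesture label "hello" and the file landmarks.csv are illustrative placeholders, not the project's actual dataset layout.

```python
# Minimal sketch of landmark capture with MediaPipe Hands; the label
# and CSV path are hypothetical placeholders for the labelled dataset.
import csv
import cv2
import mediapipe as mp

LABEL = "hello"            # hypothetical gesture label
OUT_CSV = "landmarks.csv"  # hypothetical dataset file

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
cap = cv2.VideoCapture(0)  # default web camera

with open(OUT_CSV, "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark  # 21 landmarks
            # One labelled row of 63 values (x, y, z per landmark).
            writer.writerow([LABEL] + [c for p in lm for c in (p.x, p.y, p.z)])
        cv2.imshow("capture", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```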
Objective 2: To Convert Hand Gestures to Text Format

The image captured by the web camera is compared with the labelled information stored in the database. If the image matches the stored information accurately, the hand gesture is recognized. The recognized gesture is then rendered as a text message using PyTesseract (OCR) and displayed on the output screen.

A few features, such as eigenvalues and eigenvectors, are extracted and used in recognition. The linear discriminant analysis (LDA) algorithm is then applied to recognize gestures before they are converted to text and audio format. Noise is reduced as a result of the dimensionality reduction, and the system works with high accuracy, as given in Figure 3.

Figure 3: Flowchart to Convert Hand Gestures to Word Format.
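The paper does not give its recognition code; as a hedged sketch of the LDA step described here, scikit-learn's LinearDiscriminantAnalysis (our substitution, not the authors' implementation) can both project the 63-value landmark vectors into a lower-dimensional space, an eigendecomposition-based reduction that suppresses noise, and act as the classifier. The file name and helper below are illustrative.

```python
# Sketch of LDA-based gesture recognition over landmark feature
# vectors like those saved above; scikit-learn stands in for any
# custom code, and "landmarks.csv" is a hypothetical dataset file.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Column 0 holds the gesture label; columns 1..63 hold coordinates.
data = np.loadtxt("landmarks.csv", delimiter=",", dtype=str)
X = data[:, 1:].astype(float)
y = data[:, 0]

# LDA projects to at most (n_classes - 1) components, reducing
# dimensionality (and noise) before classification.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

def recognize(landmarks_63):
    """Return the predicted gesture label for one landmark vector."""
    return lda.predict(np.asarray(landmarks_63).reshape(1, -1))[0]
```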
Objective 3: To Convert Hand Gestures to Speech Format

The obtained text should also be converted to an audio file so that visually impaired individuals can understand the message being conveyed. The captured images are compared with the information in the database, and the output is shown in the form of text using PyTesseract (OCR). This text message is then converted to speech format using the eSpeak tool, a text-to-speech (TTS) converter (Sawant & Kumbhar, 2014). The resulting speech message is played through a laptop or computer, as given in Figure 4.

Figure 4: Flowchart to Convert Hand Gestures to Audio Format.
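A minimal sketch of this TTS step follows, using pyttsx3 (listed among the project's modules later in the paper), which drives the eSpeak engine on Linux; the recognized text is a placeholder for the gesture recognizer's output.

```python
# Sketch of text-to-speech playback via pyttsx3, a wrapper that uses
# the eSpeak engine on Linux; recognized_text is a placeholder.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # words per minute, a readable pace

recognized_text = "hello"  # would come from the gesture recognizer
engine.say(recognized_text)
engine.runAndWait()  # block until the audio finishes playing
```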
Objective 4: To Convert Speech to Text Format

Speech is given as input through a microphone on the laptop or computer. The system then recognizes the speech and checks whether the voice is audible and clear. If it is, the system converts the speech to text using a speech-to-text (STT) converter and displays the text on the screen; if it is not, the system shows an error stating that it did not capture the voice properly. Once the text is obtained, it is checked against the database and the corresponding hand-sign images are returned.
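A sketch of this flow is given below, assuming the SpeechRecognition package is what the paper's later mention of a "Google speech recognizer module" refers to (our assumption); microphone input additionally requires PyAudio. The error branches mirror the audible-and-clear check described above.

```python
# Sketch of the speech-to-text step with the SpeechRecognition
# package (assumed); needs PyAudio for microphone access.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for noise
    print("Speak now...")
    audio = recognizer.listen(source)

try:
    # Web-based recognizer; raises if the voice was not clear.
    text = recognizer.recognize_google(audio)
    print("You said:", text)  # then matched against the database
                              # to fetch the hand-sign images
except sr.UnknownValueError:
    print("Error: the system did not capture the voice properly.")
except sr.RequestError as e:
    print("Error: speech service unavailable:", e)
```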
3. BLOCK DIAGRAM

Figure 5: Block Diagram.

In this system, the image of the hand is taken from the web camera, and the captured images are pre-processed to eliminate noise. The features of the images are then extracted and compared with the feature dataset, the image is classified to its correct hand gesture, and it is recognized. With the use of the audio and text datasets, the recognized image is converted into text and speech format, as shown in Figure 5 (Vijayalakshmi & Aarthi, 2016).

In the proposed system, we make use of the MediaPipe algorithm, which classifies hand gestures using the 21 landmarks present in a person's palm. We make use of Python modules such as cv2 for image processing, NumPy to work on arrays, gTTS and pyttsx3 for text-to-speech conversion, pygame to play MP3 files, pytesseract for character recognition, and the Google speech recognizer module to recognize speech and convert it to text. A combined sketch of one frame's path through this pipeline is given below.
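In the sketch below (ours, under the same assumptions as the earlier fragments), recognize() is the hypothetical LDA helper defined above, and the noise-elimination step is shown as a simple Gaussian blur; the authors' actual preprocessing is not specified.

```python
# End-to-end sketch for a single frame: capture -> denoise ->
# landmarks -> classify -> text and speech. recognize() is the
# illustrative LDA helper from the earlier sketch, not real code
# from the paper.
import cv2
import mediapipe as mp
import pyttsx3

hands = mp.solutions.hands.Hands(max_num_hands=1)
engine = pyttsx3.init()
cap = cv2.VideoCapture(0)

ok, frame = cap.read()
if ok:
    # Pre-process: blur to suppress sensor noise before detection.
    frame = cv2.GaussianBlur(frame, (5, 5), 0)
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        features = [c for p in lm for c in (p.x, p.y, p.z)]
        label = recognize(features)  # hypothetical LDA classifier
        print(label)                 # text output
        engine.say(label)            # speech output
        engine.runAndWait()
cap.release()
```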
4. CONCLUSION

The hand gesture recognition framework is a smart communication system for deaf and aphonic individuals. It helps them communicate with hearing people in emergency situations and narrows the gap between hearing people and deaf and aphonic people. The techniques aim to assist deaf and aphonic individuals by creating an interface that recognizes hand gestures and converts them into text and speech format, and that also converts voice and text input into hand gestures, making the proposed framework a bidirectional system. It is ordinarily difficult for deaf and aphonic individuals to communicate with other people in society, which can prevent them from realizing their dreams or reaching greater heights in their lives. This framework helps to reduce the communication gap and removes one barrier that deaf and aphonic people face on their journey to success.
REFERENCES

Al-Obodi, A. H., Al-Hanine, A. M., Al-Harbi, K. N., Al-Dawas, M. S., & Al-Shargabi, A. A. (2020). A Saudi sign language recognition system based on convolutional neural networks. Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia. https://dx.doi.org/10.37624/IJERT/13.11.2020.3328-3334

Amrutha, K., & Prabu, P. (2021, February). ML based sign language recognition system. In 2021 International Conference on Innovative Trends in Information Technology (ICITIIT) (pp. 1-6). IEEE. https://doi.org/10.1109/ICITIIT51526.2021.9399594

Liu, X., Sacks, J., Zhang, M., Richardson, A. G., Lucas, T. H., & Van der Spiegel, J. (2016). The virtual trackpad: An electromyography-based, wireless, real-time, low-power, embedded hand-gesture-recognition system using an event-driven artificial neural network. IEEE Transactions on Circuits and Systems II: Express Briefs, 64(11), 1257-1261. https://doi.org/10.1109/TCSII.2016.2635674

Sawant, S. N., & Kumbhar, M. S. (2014, May). Real time sign language recognition using PCA. In 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies (pp. 1412-1415). IEEE. https://doi.org/10.1109/ICACCCT.2014.7019333

Vijayalakshmi, P., & Aarthi, M. (2016, April). Sign language to speech conversion. In 2016 International Conference on Recent Trends in Information Technology (ICRTIT) (pp. 1-6). IEEE. https://doi.org/10.1109/ICRTIT.2016.756954
