
GestureInterpreter: A low-cost real-time Bangla sign language interpreter robot


Safina Islam Aboni, School of Engineering, Technology and Sciences, Independent University, Bangladesh, Dhaka, Bangladesh ([email protected])
Saniul Islam Sani, School of Engineering, Technology and Sciences, Independent University, Bangladesh, Dhaka, Bangladesh ([email protected])
Farhan Zaman Ha-mim, School of Engineering, Technology and Sciences, Independent University, Bangladesh, Dhaka, Bangladesh ([email protected])
Mohammad Shidujaman, Department of Computer Science and Engineering, Independent University, Bangladesh, Dhaka, Bangladesh ([email protected])
Md. Omar Bin Sarwar, School of Engineering, Technology and Sciences, Independent University, Bangladesh, Dhaka, Bangladesh ([email protected])

Abstract— By employing the latest technologies such as the ESP32 camera, TFT display, TTL serial converter, and speaker, this project seeks to create a device for real-time interpretation of Bangla Sign Language (BSL) movements. By addressing communication barriers, the device aims to improve accessibility and inclusivity for people with hearing impairments within the deaf community in Bangladesh. The device analyzes BSL movements collected by the camera and sensor using machine learning techniques and image processing algorithms, providing clear output over a speaker and display.
Keywords— Bangla Sign Language (BSL), ESP32 camera, TTL serial converter, real-time interpretation, accessibility, inclusivity, image processing, machine learning.
I. INTRODUCTION

Sign language is a method of interaction usually associated with individuals who are deaf or have other communication challenges. Sign language enables people to communicate with others more easily and more efficiently, reducing the need for assistive devices such as hearing aids or cochlear implants[1]. It also allows them to participate more fully in interactions with others, such as discussions, social events, performances, and presentations[1]. Bangla Sign Language (BdSL) is critical for encouraging communication and promoting social integration among Bangladesh's deaf people[2]. BdSL recognition and detection is a difficult problem in computer vision and deep learning research because sign language recognition accuracy changes with skin tone, hand position, and background[3]. Minor changes in hand placement, facial expressions, and movement are common in BdSL signs, and they can drastically change the meaning. Those who are not familiar with the language may easily overlook this complexity, which can result in misunderstandings. Resources for BdSL are frequently more limited than those for spoken languages, which makes it difficult for people to learn and use the language outside of specialized classes or contact with the deaf community[4]. Large datasets of BdSL movements can be used to train the system, giving it the capacity to detect the small variations and complexities that are essential for accurate interpretation[5]. Researchers in artificial intelligence and natural language processing are at present attempting to create systems that can automatically recognize sign language. ASL, BSL, and JSL are among the sign languages whose automatic recognition has reached a high degree of accuracy (more than 99%)[6]. In the future, this study could lead to improvements in assistive devices for people with hearing loss, not just in Bangladesh but throughout the world. To encourage inclusion and accessibility on a larger scale, comparable devices could be implemented in other regions where sign languages are used, by improving and expanding the technology established in this research.

II. LITERATURE REVIEW

A. Previous work summary (in tabular form)

Table 1. Literature Review

1. Deep Learning-based Bangla Sign Language Detection with an Edge Device, Siddique et al. (2023) [7]
   Application: Assistive technology for individuals with hearing disabilities.
   Methodology: The system is trained on the Okkhornama dataset with 49 BdSL categories.
   Performance: The YOLOv7 model achieved mAP@0.5 accuracies ranging from 85% to 97%, and mAP@0.5:0.95 from 41% to 53%.

2. Bangla Sign Language Recognition using YOLOv5, M. Karim et al. (2023) [8]
   Application: YOLOv5-based BdSL recognition that aids communication in Bangladesh/India.
   Methodology: 24.2 GB of BdSL video from SignBD converted to images using FFmpeg.
   Performance: The prototype implementation of the project achieves an average accuracy of 91.62%.

3. Sign Language Recognition System using TensorFlow Object Detection API, Mishra et al. (2022) [9]
   Application: A TensorFlow-powered sign language interpreter that translates hand gestures captured in real time into English text.
   Methodology: The model is trained to recognize different gestures of sign language and translate them into English.
   Performance: The developed system shows an average confidence rate of 85.45%.

4. Deep learning-based sign language recognition system for static signs, Kumar et al. (2020) [13]
   Application: Recognition of Indian Sign Language (ISL) static signs, including digits, alphabets, and words used in day-to-day life.
   Methodology: Deep learning-based convolutional neural networks (CNN) using robust modeling of static signs.
   Performance: Highest training accuracy of 99.72% and 99.90% on colored and grayscale images, respectively.

5. A CNN sign language recognition system with single & double-handed gestures, N. Buckley et al. (2021) [14]
   Application: Real-time, web-camera-based British Sign Language recognition system.
   Methodology: Convolutional Neural Network (CNN) based system.
   Performance: Average recognition accuracy of 89%.

6. Machine learning based sign language recognition: a review and its research frontier, Hasan et al. (2016) [15]
   Application: The application of the research is to convert sign language to speech.
   Methodology: Hand gesture recognition using HOG (Histogram of Oriented Gradients) features with an SVM (Support Vector Machine) classifier; conversion of sign language to audio using a TTS (Text-to-Speech) converter.
   Performance: Works well with a black background and white foreground.

B. Some previous HCI work with sign language

Figure 1. Reference image of the SignAloud and MobileASL projects

Plenty of creative tools and initiatives have been developed to help with interaction and interpretation in sign language. The SignAloud gloves are designed to assist those who have no expertise in American Sign Language (ASL), along with those who are deaf, by capturing hand motions and movements using sensors such as flex sensors and accelerometers and translating them into text or speech[10]. ASL video-based communication is made possible with MobileASL, a mobile application that saves bandwidth and allows real-time engagement via mobile networks[11]. In addition, in order to improve accessibility and inclusivity in sign language interaction, GestureTek, a company that focuses on gesture recognition technology, has developed interactive systems that employ motion-tracking algorithms and depth-sensing cameras to translate and interpret sign language gestures into spoken language or text[12].

C. Survey Analysis

In order to understand the needs of the user base for this project, an initial survey was administered to people in the age range of 5 to 55 and older. The surveyed demographic was primarily made up of friends and family members of people who are Deaf or hard of hearing. When asked how they wanted to benefit from the initiative, respondents indicated that they wanted to improve educational opportunities and learning as well as communication accessibility, with an emphasis on understanding. When responding to questions about what aspects of the project they would like to see improved, participants emphasized how critical it is to achieve high accuracy and speed of interpretation, in addition to being able to identify and accommodate different signers and dialects. Moreover, there was a clear preference for the use of intelligent robotic elements in the proposed system.

D. Low-Cost Device

Table 2. Detail Cost

Component                             Price
Arduino Uno R3                        ৳789
ESP-32 Cam Module                     ৳848
1.8-inch TFT Display                  ৳480
HC-05 Bluetooth Module                ৳315
PL2303 USB to TTL Serial Converter    ৳98
Mini Speaker                          ৳30
Breadboard                            ৳88
Jumper Wires                          ৳112
Total                                 ৳2760

III. METHODOLOGY

Figure 2. System Block Diagram of the proposed system
The project process includes several critical phases for creating a gesture recognition device. Initially, hardware components such as an ESP32-CAM module for hand gesture capture, a PL2303 USB to TTL serial converter for communication, and a TFT display for visual output are assembled and linked. The ESP32-CAM captures hand motions, while the APDS-9960 sensor detects additional gestures. Instead of using a Raspberry Pi, a PL2303 USB to TTL converter makes data transfer easier. Collected data is sent to the ESP32-CAM for analysis. The ESP32-CAM uses gesture recognition algorithms to interpret received gestures. The TFT display shows interpreted gestures. The system is extensively tested and refined to ensure precise gesture interpretation and overall efficiency. This approach ensures the development of a robust gesture recognition device using the ESP32-CAM, PL2303 converter, and TFT display, enabling both visual and aural feedback for users.

IV. SYSTEM DESIGN

A. Hardware Design:

Figure 3. Hardware design of the proposed system

Our new system architecture includes an ESP32 microcontroller as the central processing unit. The ESP32 collects and processes data according to programmed instructions. It is linked to the ESP32-CAM module, which captures hand movements, and a PL2303 USB to TTL serial converter, which facilitates communication. A TFT display and a speaker serve as output devices. When the microcontroller receives input data, it converts it to text and then into audio using the TTS (text-to-speech) method. The data is subsequently shown on the TFT display and broadcast via the speaker. To simplify circuit connections, we use the LMC 1602 IIC with the LCD and an amplifier with the speaker. The microprocessor transforms spoken words into text, which is then shown on the LCD panel, allowing for smooth communication between the user and the system.
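As an illustration of the text-to-speech step described above, the following minimal Python sketch voices a recognized gesture label with pyttsx3. The paper does not list the device's actual TTS routine, so this should be read as one plausible host-side implementation rather than the deployed code.

# Minimal TTS sketch; assumes pyttsx3 is installed (pip install pyttsx3).
import pyttsx3

def speak_label(label: str) -> None:
    """Voice a recognized gesture label through the speaker."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # a moderate speaking rate
    engine.say(label)
    engine.runAndWait()              # blocks until playback finishes

if __name__ == "__main__":
    speak_label("Hello")             # hypothetical recognized gesture meaning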
B. Software Design:

The ESP32-CAM firmware is in charge of recording hand gesture images with the camera module and connecting with external devices such as the Raspberry Pi, which is substituted here by a TTL converter. It uses the ESP32 camera library to take and process photos. The TTL converter enables the UART connection between the ESP32-CAM and other components. Python acts as the core processing unit for gesture analysis, using libraries such as OpenCV to handle data from the ESP32-CAM and sensors. Additional tools, such as PySerial, facilitate UART communication. Text-to-speech libraries such as pyttsx3 offer audible feedback. Machine learning packages such as scikit-learn and TensorFlow enable gesture detection algorithms. A graphical interface, possibly created with Tkinter or PyQt, shows interpreted gestures on a TFT display. The seamless integration of these components, including UART connectivity and visual feedback, is critical. Testing and debugging ensure system reliability.
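The training side of these gesture detection algorithms is not published in the paper; the sketch below shows one plausible scikit-learn approach, pairing HOG features with a linear SVM in the spirit of the HOG/SVM work summarized in Table 1. The dataset layout (dataset/<label>/*.jpg) and file names are assumptions made for illustration.

# Illustrative training sketch (not the authors' code): HOG features + linear SVM.
import glob
import os

import cv2
import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def hog_features(path):
    """Load an image, resize it to the HOG window, and return a flat feature vector."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))
    return hog.compute(img).ravel()

X, y = [], []
for label_dir in glob.glob("dataset/*"):          # assumed layout: dataset/<label>/*.jpg
    label = os.path.basename(label_dir)
    for path in glob.glob(os.path.join(label_dir, "*.jpg")):
        X.append(hog_features(path))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(np.array(X), y, test_size=0.2)
clf = LinearSVC().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
joblib.dump(clf, "bdsl_svm.joblib")               # reused by the sketches below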

V. INTERACTION METHOD

The ESP32-CAM camera module is used in this project to record hand movements, which are then processed for analysis as part of the interaction model. Using the ESP32 camera library for effective processing, the ESP32-CAM firmware takes pictures of hand motions. After that, these pictures are sent via UART communication to other parts of the system, such as a display module or the TTL converter, for additional processing or visualization. Parallel to this, Python scripts operating on a different system (represented by the TTL converter) receive the gesture data and use OpenCV and other libraries for image processing, along with machine learning for gesture identification, to conduct a thorough analysis. The analyzed data is then presented on a TFT display after being meaningfully interpreted, possibly with the aid of text-to-speech libraries for audio feedback. Real-time data collection, processing, analysis, and display are guaranteed by this interaction approach.
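The receive-and-interpret loop described above can be sketched as follows. The paper does not specify the UART framing used by the firmware, so the snippet assumes a simple length-prefixed JPEG protocol; the serial port name, baud rate, and model file are likewise illustrative assumptions.

# Host-side sketch: read a framed JPEG over the PL2303 link, classify, report.
import struct
from typing import Optional

import cv2
import joblib
import numpy as np
import serial

clf = joblib.load("bdsl_svm.joblib")   # classifier from the training sketch
hog = cv2.HOGDescriptor()
port = serial.Serial("/dev/ttyUSB0", 115200, timeout=5)  # PL2303 converter

def read_frame(ser) -> Optional[np.ndarray]:
    """Assumed framing: a 4-byte big-endian length, then the JPEG bytes."""
    header = ser.read(4)
    if len(header) < 4:
        return None                    # timed out waiting for a frame
    (length,) = struct.unpack(">I", header)
    payload = ser.read(length)
    return cv2.imdecode(np.frombuffer(payload, np.uint8), cv2.IMREAD_GRAYSCALE)

while True:
    frame = read_frame(port)
    if frame is None:
        continue
    features = hog.compute(cv2.resize(frame, (64, 128))).ravel()
    label = clf.predict([features])[0]
    print("Interpreted gesture:", label)  # would also drive the TFT and TTS output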
VI. EXPERIMENT DESIGN

Figure 4. Experimental Design of the project

The experimental design involves capturing hand gestures using the ESP32-CAM's camera module, transmitting the data via UART to a TTL converter, and processing it on a separate system, represented by the TTL converter. Interpretations are displayed on a TFT display, aided by text-to-speech for auditory feedback. The experiment assesses the accuracy and efficiency of gesture recognition algorithms in real-time applications, focusing on user interaction and system responsiveness.
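An offline harness of the kind implied by this design could measure both recognition accuracy and per-frame latency. The sketch below assumes a testset/<label>/*.jpg layout and the classifier file from the earlier sketches; it is not the authors' evaluation script.

# Evaluation sketch: accuracy and mean per-frame latency on a held-out set.
import glob
import os
import time

import cv2
import joblib

clf = joblib.load("bdsl_svm.joblib")
hog = cv2.HOGDescriptor()

correct = total = 0
latencies = []
for path in glob.glob("testset/*/*.jpg"):          # assumed: testset/<label>/*.jpg
    truth = os.path.basename(os.path.dirname(path))
    img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (64, 128))
    start = time.perf_counter()
    pred = clf.predict([hog.compute(img).ravel()])[0]
    latencies.append(time.perf_counter() - start)
    correct += int(pred == truth)
    total += 1

print(f"accuracy: {correct / total:.2%}")
print(f"mean latency: {1000 * sum(latencies) / len(latencies):.1f} ms/frame")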
Figure 5. Proposed 3D Design
VII. CONCLUSION

In conclusion, the Bangla sign language interpretation device demonstrated the viability of developing assistive technologies for people with hearing difficulties using easily available hardware components and software resources. The device can accurately analyze and express the meanings of hand movements in real time through the integration of image processing, machine learning, and aural feedback. While further modification and optimization may be required to improve accuracy and the user experience, the project provides a solid platform for future progress in this area.
VIII. FUTURE WORKS

There are several directions future research could take to improve the functionality and performance of the Bangla sign language interpreting tool. The development of more advanced gesture recognition algorithms to increase accuracy and robustness is one possible avenue of progress. To improve efficiency and real-time performance, efforts can also be made to optimize the hardware and software components of the device. Moreover, supporting more sign languages and adding more interpreted motions to the device's vocabulary could increase its usefulness and impact. User feedback sessions and collaborations with sign language specialists can yield insights that can be used to improve the device's functionality and user interface.
REFERENCES

[1] Harati, R. (2023). Importance of Sign Language in Communication and its Down Barriers. J Commun Disord, 11, 247.
[2] Finger detection for sign language recognition. (2010, July 1). ResearchGate. https://www.researchgate.net/publication/44259601_Finger_Detection_for_Sign_Language_Recognition
[3] Podder, K. K., et al. (2022). Bangla Sign Language (BdSL) Alphabets and Numerals Classification Using a Deep Learning Model. Sensors, 22(2), 574. https://doi.org/10.3390/s22020574
[4] Handbook of Research on Medical Interpreting. (2020). In Advances in Medical Diagnosis, Treatment, and Care (AMDTC) book series. https://doi.org/10.4018/978-1-5225-9308-9
[5] Sultan, A., Makram, W., Kayed, M., & Ali, A. (2022). Sign language identification and recognition: A comparative study. Open Computer Science, 12(1), 191–210. https://doi.org/10.1515/comp-2022-0240
[6] Tao, W., Leu, M., & Yin, Z. (2018). American Sign Language alphabet recognition using Convolutional Neural Networks with multiview augmentation and inference fusion. Engineering Applications of Artificial Intelligence, 76, 202–213. https://doi.org/10.1016/j.engappai.2018.09.006
[7] Siddique, S., Islam, S., Neon, E. E., Sabbir, T., Naheen, I. T., & Khan, R. (2023). Deep Learning-based Bangla Sign Language Detection with an Edge Device. Intelligent Systems with Applications, 18, 200224. https://doi.org/10.1016/j.iswa.2023.200224
[8] Karim, M., Ali, M., Hasan, M. F., Fatema-Tuj-Jahra, M., Sultana, S. S., & Farid, D. M. (2023). Bangla Sign Language Recognition using YOLOv5. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Lonavla, India, pp. 1–7. https://doi.org/10.1109/I2CT57861.2023.10126419
[9] Srivastava, S., Gangwar, A., Mishra, R., & Singh, S. (2022). Sign Language Recognition System using TensorFlow Object Detection API. https://doi.org/10.1007/978-3-030-96040-7_48
[10] Azodi, N., & Pryor, T. (2016). SignAloud: Gloves That Translate Sign Language Into Speech. IEEE SoutheastCon 2016, pp. 1–5.
[11] Cavender, A. C., Ladner, R. E., & Riskin, E. A. (2006). MobileASL: Intelligibility of sign language video as constrained by mobile phone technology. https://doi.org/10.1145/1168987.1169001
[12] GestureTek. (n.d.). Sign Language Recognition. Retrieved from http://www.gesturetek.com/sign-language-recognition.php
[13] Wadhawan, A., & Kumar, P. (2020). Deep learning-based sign language recognition system for static signs. Neural Computing and Applications, 32, 7957–7968. https://doi.org/10.1007/s00521-019-04691-y
[14] Buckley, N., Sherrett, L., & Lindo Secco, E. (2021). A CNN sign language recognition system with single & double-handed gestures. 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, pp. 1250–1253. https://doi.org/10.1109/COMPSAC51774.2021.00173
[15] Hasan, M., Sajib, T. H., & Dey, M. (2016, December). A machine learning based approach for the detection and recognition of Bangla sign language. 2016 International Conference on Medical Engineering, Health Informatics and Technology (MediTec), pp. 1–5. IEEE.
