Keywords: Arabic sign language recognition; Deep learning; YOLOv8; Real-time communication; Deaf-mute communication
Abstract
Deaf-mute individuals encounter substantial difficulties in their daily lives due to communication impediments. These individuals may encounter difficulties in social contact, communication, and capacity to acquire knowledge and engage in employment. Recent studies have contributed to decreasing the gap in communication between deaf-mute people and normal people by studying sign language interpretation. In this paper, a real-time Arabic avatar system is created to help deaf-mute people communicate with other people. The system translates text or spoken input into Arabic Sign Language (ArSL) movements that the avatar makes using deep-learning-based translation. The dynamic generation of the avatar movements allows smooth and organic real-time communication. In order to improve the precision and effectiveness of ArSL translation, this study depends on a state-of-the-art deep learning model, which makes use of YOLOv8, to recognize and interpret sign language gestures in real time. The avatar is trained on three diverse datasets of Arabic sign language images, namely Sign-language-detection Image (SLDI), Arabic Sign Language (ArSL), and RGB Arabic Alphabet Sign Language (AASL), enabling it to accurately capture the nuances and variations of hand movements. The best recognition accuracy of the suggested approach was 99.4% on the AASL dataset. The experimental results of the suggested approach demonstrate that deaf-mute people will be able to communicate with others in Arabic-speaking communities more effectively and easily.
1. Introduction
The World Health Organization (WHO) declared that over 5% of people on the planet are deaf, and they face significant challenges
in communicating with normal people. Additionally, the WHO predicts that by 2050, 1 in every 10 people will have a disabling hearing loss [1]. Deaf people use hand gestures and movements that correspond to alphabets and words to communicate with other people, which is known as sign language [2]. The World Federation of the Deaf (WFD) states that there are 200+ sign languages
worldwide [3]. Deaf people face real challenges in expressing their thoughts and needs to normal people without an interpreter for their signs. This has a serious impact on their social and daily life experiences. Additional challenges may arise if (i) a signer communicates with someone who does not understand signs, or (ii) a signer communicates with a user of a different sign language (among the existing 200+ sign languages).
Recently, various studies have contributed to helping deaf people to communicate with normal people by providing automatic sign
language translators (SLT), which aim to produce spoken words for the provided sign language, or vice versa [4,5]. Automatic SLT
provides a unified tool that allows communication between signers and non-signers by translating the signs into either a different, mutually understood sign language or spoken/written words, and vice versa [6,7].
Recent advancements in modern technologies can facilitate the process of automatic SLT. For example, convolutional neural
networks (CNNs) have proven particularly effective in this domain, as they can learn the complex spatial and temporal patterns
inherent in sign language gestures [8,9]. In addition, recent developments in computer vision, machine learning, and natural language
processing have paved the way for innovative solutions in automatic SLT [10]. Moreover, wearable sensors, such as data gloves and
motion capture systems, can provide precise hand and finger tracking, enabling the capture of fine-grained movement data [11].
Computer vision algorithms can then be employed to extract features from the sensor data and translate them into corresponding sign
language gestures [12,13]. The combination of deep learning, sensor-based devices, and computer vision techniques holds great
promise for the development of robust and accurate automatic SLT systems. These systems have the potential to break down
communication barriers between deaf and hearing individuals, empowering deaf people to participate more fully in society.
Different challenges can be faced when building an automatic SLT, ranging from collecting the required data for training the model to deploying the trained model [14–16].
Researchers have explored various approaches to build robust and accurate SLR systems. Two prominent sub-categories have
emerged: pixel-based and skeleton-based approaches. The pixel-based approaches directly analyze the entire image or video frame
containing the signer’s hand(s) and body posture. CNNs are frequently employed for feature extraction and classification. These
approaches can capture intricate details of signs, including hand orientation, finger configuration, and facial expressions which
contribute to meaning in some sign languages [17]. On the other hand, the skeleton-based approaches focus on extracting key points
(joints) from the signer’s hand and body, forming a skeletal representation [18]. Techniques like CNNs and Support Vector Machines
(SVMs) are often used for recognition based on the movement trajectories of these key points [19,20]. These approaches are more
robust to background variations and lighting conditions [21].
In this paper, a deep learning-based Arabic Sign Language (ArSL) translator is presented. The suggested model depends on the
YOLOv8 framework to recognize and interpret sign language gestures in real time, which helps deaf and mute people to communicate
with each other in the Arabic language. The contributions of the current work can be summarized as follows:
• Reviewing the literature on various constructed systems and frameworks in the context of automatic SLT.
• Proposing a deep-learning-based framework using YOLOv8 architecture for interpreting Arabic sign language. This helps deaf and
mute Muslims to understand the meanings and interpretations of the Holy Qur’an and perform Islamic rituals.
• Overcoming the problem of available dataset limitations and user-dependent datasets by adopting diverse datasets and data
augmentation techniques to achieve system generalization.
• Developing a real-time translation method that may be deployed as a mobile application.
• Conducting various experiments and comparing the results with those of other state-of-the-art works.
The rest of the paper is organized as follows. Section 2 presents a discussion of some related works and studies. The proposed ArSL
translation framework is presented in detail in Section 3. The experiments, their results, and a comparative analysis are presented in
Section 4. The paper conclusion is given in Section 5.
2. Literature review
In this section, recent related works are investigated and analyzed to discover and highlight research gaps. Here, various
recent works related to automatic sign language and Arabic sign language interpretation using advancements in artificial intelligence
(AI), transfer learning, and computer vision techniques are discussed.
CNNs have demonstrated significant advancements in various fields, including visual object recognition [22,23], natural language processing [24], and medical image processing [25–28], among others. However, there is a lack of research on the utilization of CNNs for video classification, owing to the difficulty of adapting CNNs to incorporate both spatial and temporal data [29].
Computer vision is a subfield of artificial intelligence that aims to extract meaning from images and videos, similar to how the human visual system perceives and comprehends its surroundings [30]. Computer vision intersects computer science, machine learning, and image processing, and its applications span image classification [31], object detection [32], image segmentation [30], medical image analysis [33,34], and robotics [35].
Recently, CNNs and computer vision approaches have been used as a standard practice to improve the accuracy of SLR models. For
example, Sreemathy et al. [36] proposed an image-based SLR model to interpret 80 static signs. They adopted both YOLOv4 and
Mediapipe with SVM classifiers to recognize the hand gestures of the 80 words. Their method achieved an accuracy of 98.8% for the
YOLOv4 classifier and 98.62% for the Mediapipe with an SVM classifier.
Attia et al. [37] developed a method for automatic SLR based on YOLOv5 architecture. They presented three models based on
YOLOv5 structure with attention techniques to recognize alphabets and numbers from hand gestures. They identified that the attention
mechanism helps in improving the detection performance through observing important areas and distinguishing objects from their
surroundings, efficiently, even with occlusions or cluttered backgrounds. The authors validated their methods on two datasets: MU
HandImages ASL and OkkhorNama, and some augmentation techniques were applied to the images. The models achieved a 98% F1
score. The authors of [38] created an automated Bangla Sign Language (BSL) detection system using machine vision technology and an
embedded device. Detectron2, EfficientDet, and YOLOv7 with Jetson Nano are among the deep learning methods that have been
trained on the Okkhornama open-source dataset. The Jetson Nano, however, might cost more than other available edge computing
products, which would make it unaffordable for certain clients or projects with tight budgets. Al Ahmadi et al. [39] presented a
YOLOv8-based model for ArSL recognition. They relied on both channel attention and spatial attention modules with a cross-convolution module to extract and process features from input images. The validation of the model revealed that it achieved a classification accuracy of 99% on the ArSL21L dataset.
Miah et al. [19] developed an automatic sign recognition method based on hand skeleton joint information to overcome partial
occlusion and redundant background problems that may exist in images. Their method involves capturing the positions of joints in the
skeleton and the movement of these joints to analyze the whole body movement of a person during sign language gestures. This information is then fed into two separate streams. The first stream processes joint key features through a combination of a Separable
Temporal Convolutional Network (Sept-TCN), a Graph Convolutional Network (GCN), and an attention module. The second stream
processes the joint motion. Features extracted from the two streams are fused to extract spatial-temporal features from sign language
videos.
The authors of [4] developed an application named Sign4PSL that translates sentences to Pakistan Sign Language (PSL) for deaf
people with visual representation using virtual signing characters. The application is compatible with several platforms, such as web
and mobile. This system accepts English language text as input, translates it into sign language, and visually represents the movements
through a virtual character. The system attained an accuracy of 100% when processing alphabets, numerals, words, and phrases. Nevertheless, when it comes to sentences, the architecture attained an accuracy of 80%. In [40], the authors developed a computer
vision system that depends on deep convolutional neural networks to translate Amharic alphabets into Ethiopian Sign Language. The
suggested model achieved a training accuracy of 98.5%, a validation accuracy of 95.59%, and a testing accuracy of 98.3% in signal
identification. Furthermore, this technology has the capability to transform Amharic sign language visuals into written text.
In [41], the authors proposed a multistream deep learning model for recognizing signs of Brazilian, Indian, and Korean sign
languages. They incorporated 3D CNN networks and Generative Adversarial Networks (GAN) to generate depth maps of sign articulation. The method receives information from facial expressions, hands, and distances between joints and provides a visual explanation to identify which regions are important for model decision making. The method achieved an accuracy of 91% and an F1-score of
90% on a Brazilian sign language dataset. The authors of [42] introduced a framework called SignGraph, a pose-based SLR model using
a graph convolution network (GCN) and a residual neural network (RNN). They relied on Mediapipe to extract spatiotemporal features
from 65 joints from the face, hands, and pose from video sequences, which are given as input to the model. They used a ResNet-based
architecture for recognition. The authors reported that their method is computationally efficient in terms of the number of learnable
parameters, the computational cost, the number of flops, and the inference time.
Lee et al. [12] developed a wearable hand device to interpret sign language from hand gestures. They attached flex sensors,
pressure sensors, and IMU sensors on a signer’s hand to distinguish language characters. Their system involves three modules: a sensor
module, a processing module, and a display module on a mobile application. First, sensor data are analyzed using an embedded SVM
classifier to recognize characters. Second, the recognized character is transmitted to a mobile device through BLE communication. The
mobile application converts the received text into voice output. The system achieved an average accuracy of 65.7% without pressure
sensor and 98.2% with pressure sensor.
Sharma et al. [43] proposed a transfer-learning-based methodology for continuous sign language translation. Data was collected
using multiple IMUs on both hands. Their architecture incorporated CNN, Bi-LSTM, and connectionist temporal classification layers.
First, they trained the model on static isolated signs which enhanced the classification of continuous sentences of sign data. A limited
amount of data is used to train the lower layers of the pre-trained network. The classification accuracy reached 88.5%. Barbhuiya et al.
proposed a transfer-learning-based SLR method for human-computer interaction (HCI) applications. The authors utilized pre-trained AlexNet and VGG16 architectures for feature extraction along with an SVM classifier. They validated their method using a
dataset of 36 characters for 5 persons. They reported an accuracy of 99.82%.
Naz et al. [18] presented a pose-based approach for SLR. This method involves three steps: pose extraction, handcrafted feature
generation, and feature space mapping and recognition. The pose-based features include joints, bone lengths, and bone angles. A
lightweight residual graph convolutional network (ResGCN) and a new part attention method incorporate body spatial and temporal
information in a compact feature space and recognize signs. The presented technique achieved an accuracy of 83.33%.
AbdElghfar et al. [44] proposed an Arabic SLR model to help deaf and dumb Muslims recite the Holy Qur’an. They used a
CNN-based deep learning model to extract the features and recognize hand motions referring to the fourteen dashed Qur'anic letters. They
used 24,137 images from the ArSL2018 Arabic sign language dataset. The testing accuracy of the model with SMOTE technique was
97.67%.
Alsaadi et al. [45] proposed an Arabic alphabet sign language recognition model using deep learning and transfer learning techniques. They utilized the ArSLA dataset, consisting of 54,049 images representing 32 Arabic letters. The authors employed a number of CNN architectures, including VGG16, ResNet50, EfficientNet, and AlexNet. The results indicated that the AlexNet architecture achieves the highest accuracy of 94.81%.
Islam et al. [46] presented a novel deep learning approach that leverages stacked autoencoders and the Internet of Things (IoT)
infrastructure to refine feature extraction and classification for accurate ArSL recognition. Saleem et al. [47] proposed a machine
learning-based system for two-way communication between deaf and mute (DnM) and non-deaf and mute (NDnM) individuals. Their
system depends on deep learning for hand gesture recognition and supervised machine learning for multi-language support. This
approach highlights the potential of machine learning in bridging communication gaps but may require user training on the specific
system interface.
Balaha et al. [14] proposed a deep-learning-based approach for Arabic sign language recognition. They created an Arabic sign
language dataset with 8467 videos of 20 Arabic words, using a mobile phone. They developed a method combining CNN and RNN to
interpret sign language based on the features extracted from video frames. They adopted two CNN networks for feature extraction, and
the features extracted from the two streams are fused to produce a feature sequence. The RNN network is used as a final classifier to
produce the final prediction. Their approach achieved a testing accuracy of 98%.
Kamruzzaman [48] proposed a CNN-based Arabic sign language recognition model that can translate the recognized words into
audible speech. The author also created an image dataset with 125 images for each of 31 letters of Arabic sign language. These images
are subjected to a number of preprocessing techniques, such as resizing, color conversion, and augmentation. The recognized Arabic
words are converted into speech using Google Text to Speech API. The achieved accuracy was 90%. The author reported that the
system can be further improved by employing an Xbox Kinect device.
3. The proposed ArSL translation framework
The proposed ArSLGen algorithm is intended to dynamically translate Arabic Sign Language (ArSL) from spoken or written Arabic
using deep learning, with a focus on real-time object detection. The algorithm consists of four basic phases, each of which contributes
to the overall efficacy and efficiency of the Arabic Sign Language translation process. The architecture of the proposed framework
consists of four main phases as illustrated in Fig. 1, which are: (i) Data Collection and Preprocessing, (ii) Model Training with YOLOv8,
(iii) Performance Evaluation and Metrics, and (iv) Real-time Translation and Validation.
3.1. Data collection and preprocessing
The main objective of this phase is to compile an extensive dataset of ArSL gestures and preprocess it in order to get it ready for model training. The collection ensures a diverse representation of ArSL motions and expressions by drawing from a variety of sources, such as videos, photographs, and motion capture data.
An important stage is data annotation, where each gesture is labeled with its meaning or translation in ArSL. This annotated dataset forms the basis for supervised learning, helping the model learn how to translate ArSL gestures into their corresponding verbal expressions. To guarantee uniformity in the input data, the photos or videos are resized to a standard resolution during preprocessing. Normalizing pixel values to a consistent scale makes training of the model easier. Rotation, flipping, and scaling are examples of data augmentation techniques that can be used to improve the model's capacity to generalize to new data and to diversify the dataset. Ultimately, the preprocessed dataset is divided into training, validation, and test sets to ensure accurate assessment of the model's effectiveness. The overall steps of the Data Collection and Preprocessing phase are shown in Algorithm 1.
Algorithm 1
Data collection and preprocessing algorithm.
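For illustration, the following is a minimal Python sketch of the preprocessing steps described above (resizing to a uniform resolution, normalizing pixel values, and splitting into training, validation, and test sets). It is not the authors' exact Algorithm 1; the directory layout, image size, and split ratios are assumptions.

```python
import glob
import random
import cv2
import numpy as np

def preprocess_images(image_dir, img_size=640):
    """Resize and normalize all JPEG images found under image_dir."""
    samples = []
    for path in glob.glob(f"{image_dir}/*.jpg"):
        img = cv2.imread(path)                        # BGR image as a NumPy array
        if img is None:
            continue                                  # skip unreadable files
        img = cv2.resize(img, (img_size, img_size))   # uniform input resolution
        img = img.astype(np.float32) / 255.0          # scale pixel values to [0, 1]
        samples.append((path, img))
    return samples

def split_dataset(samples, train=0.8, val=0.1, seed=42):
    """Shuffle and split samples into train/validation/test subsets."""
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(n * train), int(n * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# Example usage with a hypothetical dataset folder:
# train_set, val_set, test_set = split_dataset(preprocess_images("AASL/images"))
```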
The three primary subphases of the Data Collection and Preprocessing procedure are (i) Gathering Datasets, (ii) Manual Annotation, and (iii) Data Augmentation.
i. Gathering Datasets
Several datasets containing Arabic sign language motions are collected in this sub-phase. In this paper, Sign-language-detection Image (SLDI), Arabic Sign Language (ArSL), and RGB Arabic Alphabet Sign Language (AASL) are the curated datasets. The
goal of this phase is to give a complete representation of Arabic sign language motions, encompassing a range of facial expressions and
movements.
ii. Manual Annotation
The manual annotation sub-phase involves labeling images after data gathering. Every image in the datasets has been annotated
with Arabic text transcripts and bounding boxes surrounding hands. This generates a cleanly labeled dataset as illustrated in Fig. 2 that
serves as the basis for training and assessing the model in the future.
iii. Data Augmentation
Strategies for data augmentation are employed to broaden the diversity of the datasets. In this study, various techniques are used with
images for this purpose, including rotation, flipping, zooming, stretching, and adjusting brightness. The goal of this sub-phase is to
increase the model resilience by subjecting it to a greater variety of sign language gestures and environmental factors. It allows the
model to efficiently generalize to different hand morphologies and activities.
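As a sketch of these augmentation operations (rotation, flipping, zooming/scaling, and brightness adjustment), the snippet below uses the albumentations library; the specific limits and probabilities are illustrative assumptions rather than the values used in this work.

```python
import cv2
import albumentations as A

# Illustrative augmentation pipeline covering the operations named above;
# parameter values are assumed, not taken from the paper.
augment = A.Compose([
    A.Rotate(limit=20, p=0.5),                                 # random rotation within +/-20 degrees
    A.HorizontalFlip(p=0.5),                                   # random horizontal flip
    A.RandomScale(scale_limit=0.2, p=0.5),                     # random zoom in/out (scaling/stretching)
    A.RandomBrightnessContrast(brightness_limit=0.2, p=0.5),   # brightness adjustment
])

image = cv2.imread("sample_sign.jpg")            # hypothetical input image
augmented = augment(image=image)["image"]        # augmented copy of the image
cv2.imwrite("sample_sign_aug.jpg", augmented)
```

When bounding-box annotations accompany the images, the same pipeline can be wrapped with A.BboxParams so that the boxes are transformed consistently with the pixels.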
The data augmentation processes utilized in the proposed method with the speech signals are shown in Fig. 3. With this step, the
model is exposed to more environmental factors to enhance its robustness. In this step, random noise with different types and SNRs is
added to the speech signals. The inclusion of noise enhances the model ability to withstand disturbances and uncertainties in the input
data, hence optimizing its application in noisy or less controlled environments.
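The additive-noise step can be sketched as follows; this is a minimal example assuming white Gaussian noise and a single-channel speech array, not necessarily the exact noise model used in this study.

```python
import numpy as np

def add_noise_at_snr(speech: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise to a speech signal at a target SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(speech ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))   # SNR = P_signal / P_noise
    noise = rng.normal(0.0, np.sqrt(noise_power), size=speech.shape)
    return speech + noise

# Example: corrupt a (hypothetical) clean waveform at several SNR levels.
# for snr in (20, 10, 5):
#     noisy = add_noise_at_snr(clean_speech, snr)
```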
3.2. Model training with YOLOv8
The deep learning model is trained to recognize and detect ArSL motions in input images or videos during the model training phase using YOLOv8. YOLOv8 is a cutting-edge object detection system known for its precision and high speed [49]. In order to precisely predict bounding boxes and class labels for ArSL gestures in real-time scenarios, the model is trained by adjusting its parameters.
Throughout the training phase, the dataset is iterated over, batches of images are fed through the model, the loss is computed, and an
optimizer is used to update the model weights. At that point, real-time ArSL translation can be performed using the learned YOLOv8
model. The general procedure for training the model with YOLOv8 is described in Algorithm 2.
3.3. Performance evaluation and metrics
The proposed algorithm is evaluated in this phase using a range of measures to identify the efficacy of the model in translating ArSL
motions into spoken or written Arabic. In order to assess the model predictions against the ground truth translations, three datasets,
including SLDI, ArSL, and AASL are employed. To measure the performance of the model, metrics including accuracy, precision, recall,
and F1-score are calculated. These metrics give a quantitative assessment of the model efficacy by indicating how well it can translate
ArSL motions. In addition, researchers can find cases in which the model might be deficient and change it to increase overall performance by examining the performance indicators.
In order to make sure that the model can translate ArSL gestures in practical settings, the Performance Evaluation and Metrics phase
is essential for the development of the ArSLGen algorithm. The overall steps of the Performance Evaluation and Metrics phase are shown in
Algorithm 3.
The trained YOLOv8 model and the test dataset are the inputs for this algorithm, and the computed performance metrics are the
output. The test dataset is loaded for assessment by the algorithm after the evaluation metrics have been defined. Evaluation metrics
are computed by comparing the model prediction with the ground truth for every sample in the test dataset. The average performance
metrics across all samples are then calculated by the algorithm and saved for later examination.
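A minimal sketch of this evaluation step using the ultralytics API is shown below. The weight path, dataset YAML, and attribute names follow current ultralytics conventions and are assumptions; the sketch is illustrative and does not reproduce the authors' exact Algorithm 3.

```python
from ultralytics import YOLO

# Load the trained detector (hypothetical weights file) and evaluate it on the
# held-out split defined in the dataset YAML.
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="arsl_dataset.yaml", split="test")

# The returned object exposes aggregate detection metrics across all classes.
print("precision:     ", metrics.box.mp)      # mean precision
print("recall:        ", metrics.box.mr)      # mean recall
print("mAP@0.5:       ", metrics.box.map50)   # mean average precision at IoU 0.5
print("mAP@0.5:0.95:  ", metrics.box.map)
```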
Fig. 3. Data augmentation steps.
Algorithm 2
Model training with YOLOv8 algorithm.
Algorithm 3
Performance evaluation and metrics algorithm.
3.4. Real-time translation and validation
In this phase of the proposed ArSLGen algorithm, the trained model is put into practice in a real-time environment to translate ArSL gestures into spoken or written Arabic. In addition, the model performance is validated in real-time scenarios to make sure it can reliably and quickly translate ArSL gestures as they are made. The objective is to give deaf-mute people in Arabic-speaking communities a smooth and useful communication tool. The overall steps of the Real-time Translation and Validation phase are shown in Algorithm 4.
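Such a real-time loop can be sketched as follows, assuming a webcam feed processed with OpenCV and a trained YOLOv8 weight file; the paths, class-name mapping, and display details are illustrative, and the sketch is not the authors' Algorithm 4.

```python
import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical trained weights
cap = cv2.VideoCapture(0)                          # default camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run detection on the current frame; results[0] holds boxes and class ids.
    results = model(frame, verbose=False)
    for box in results[0].boxes:
        cls_id = int(box.cls[0])
        label = model.names[cls_id]                # recognized ArSL sign label
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("ArSL translation", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

The recognized labels could then be passed to a text-to-speech or avatar component to complete the two-way communication pipeline described above.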
The ArSLGen method is designed to dynamically transform ArSL movements into text or spoken Arabic in real time, leveraging
deep learning techniques. The algorithm is divided into four main stages: Data Collection and Preprocessing, which gathers and
prepares a diverse dataset of ArSL gestures for training; Model Training with YOLOv8, which depends on the YOLOv8 architecture to
train a deep learning model to detect and translate ArSL gestures; Performance Evaluation and Metrics, which is performed to assess
the performance of the trained model using different metrics; and Real-time Translation and Validation, which depends on the trained
model to translate ArSL gestures in a real-time setting. The ArSLGen algorithm seeks to improve the communication and social interaction skills of deaf-mute people in Arabic-speaking communities by offering a smooth and efficient communication tool.
4. Results
This section provides a description of the used datasets, performance measurements, and a comparison with the most cutting-edge
techniques currently in use.
Algorithm 4
Real-time translation and validation algorithm.
4.1. Dataset
Three comprehensive collections of images were created to train and validate the proposed approach to recognize gestures in ArSL.
The first dataset is the RGB Arabic Alphabet Sign Language (AASL) dataset [50]. It includes thirty-one classes, such as Ain, Al, Alef,
Beh, Dal, Feh, Ghain, Hah, Heh, Jeem, Kaf, Khah, Laa, Lam, Meem, Noon, Qaf, Reh, Seen, Sheen, Tah, Teh, Teh_Marbuta, Thal, Theh,
Waw, Yeh, Zah, and Zain. The dataset constitutes a total number of 7534 images; 6027 of them were used for training and 1507 for
testing. Each image in the dataset involves a manual annotation with bounding boxes around the hands and the accompanying Arabic
text transcription. Samples of the images from this dataset are shown in Fig. 4.
The second adopted dataset is the Sign-Language-Detection Image (SLDI) dataset [51]. Thirty classes covering all Arabic letters are included. The dataset contains 7494 images, which are partitioned into 5234 training images, 1129 testing images, and 1131 validation images. Some of the images from this dataset are displayed in Fig. 5.
The third dataset is the Arabic sign language (ArSL) dataset [52]. There are 5832 images in this dataset, each with 416 × 416 pixels.
The testing set consists of 290 images, the training set of 4651 images, and the validation set of 891 images. These images were
captured with a cell phone camera in a variety of environments, with different backgrounds and hand orientations. Samples of this
dataset are displayed in Fig. 6.
The following metrics are used to compare the state-of-the-art methods with the proposed model for Arabic speech/text translation into Arabic Sign Language using YOLOv8: (i) Precision: the portion of predicted positive outcomes that are actually positive, out of all predicted positive outcomes. (ii) Recall: the portion of actual positive outcomes that are correctly predicted as positive. (iii) F1-score: the harmonic mean of recall and precision. (iv) mAP: the mean of the average precision over all classes. (v) Matthews Correlation Coefficient (MCC): a metric used to evaluate the performance of classification models. Unlike accuracy, which can be misleading on imbalanced datasets, MCC takes into account true positives, true negatives, false positives, and false negatives, providing a more robust measure of model performance. MCC values are interpreted as follows: 1 indicates perfect prediction, values between 0 and 1 indicate good prediction, 0 indicates random prediction (no better than chance), and values between −1 and 0 indicate poor prediction. We also discuss the quantitative analysis that was conducted to evaluate the relative effectiveness of various strategies. Prominent methods that use YOLO architectures and deep learning approaches to identify Arabic sign language are compared with the proposed model for translating Arabic speech/text into Arabic Sign Language.
\text{Recall} = \frac{TP}{TP + FN} \tag{1}

\text{Precision} = \frac{TP}{TP + FP} \tag{2}

\text{F1-score} = 2 \times \frac{\text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{3}

\text{mAP} = \frac{1}{n} \sum_{k=1}^{n} AP_k \tag{4}

\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5}
Table 1
Configuration parameters for the YOLOv8 model.
Parameters Values
Epochs 100
Learning rate 0.01
Image size 640
Batch size 8
Number of images 20,860
Number of training images 15,912
Number of testing images 4948
Layers 225
Fig. 7. Overall performance of the suggested model for translating Arabic speech/text into Arabic Sign Language using YOLOv8 with the
three datasets.
Fig. 8. The precision-recall and F1-confidence curves of the suggested model for translating Arabic speech/text into Arabic Sign Language using
YOLOv8 with the three datasets.
Fig. 9. The labeled validation results of the suggested model for translating Arabic speech/text into Arabic Sign Language using YOLOv8 with the three datasets.
where TP (True Positive) denotes the number of cases correctly classified as positive, FP (False Positive) denotes the number of cases incorrectly classified as positive, and TN (True Negative) and FN (False Negative) denote the numbers of cases correctly and incorrectly classified as negative, respectively. In addition, APk is the average precision for class k, and n is the number of classes.
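For completeness, Eqs. (1)–(5) can be computed directly from these confusion-matrix counts. The sketch below is a straightforward implementation of the formulas (with illustrative counts in the usage comment) and does not reproduce the authors' evaluation code.

```python
import math

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute recall, precision, F1-score, and MCC from confusion-matrix counts (Eqs. 1-3, 5)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}

def mean_average_precision(per_class_ap: list[float]) -> float:
    """Eq. (4): mAP is the mean of the per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)

# Example with illustrative counts (not taken from the paper):
# print(classification_metrics(tp=95, fp=2, tn=97, fn=5))
```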
We employed stratified 10-fold cross-validation to evaluate the model performance on the unseen evaluation data. This technique partitions the data into 10 folds while preserving class proportions, where 9 folds are used for training and the remaining fold for testing. This process is
repeated 10 times, ensuring all data points are used for both training and testing. The final reported performance metrics (MCC,
precision, recall, F1-score) represent the average values obtained across all 10 folds.
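A minimal sketch of this cross-validation protocol with scikit-learn is given below; the feature/label arrays and the fit_predict callable are placeholders, since in this work the folds are run over the detection datasets rather than a generic feature matrix.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import matthews_corrcoef, f1_score

def stratified_cv_scores(X, y, fit_predict, n_splits=10, seed=42):
    """Average MCC and macro F1 over stratified folds.

    fit_predict(X_train, y_train, X_test) -> predicted labels for X_test.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mccs, f1s = [], []
    for train_idx, test_idx in skf.split(X, y):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        mccs.append(matthews_corrcoef(y[test_idx], y_pred))
        f1s.append(f1_score(y[test_idx], y_pred, average="macro"))
    return float(np.mean(mccs)), float(np.mean(f1s))
```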
The proposed YOLOv8-based model is trained using the three preprocessed datasets. The architecture of YOLOv8 consists of multi-layered CNNs that enable real-time object detection. After training, the model was able to identify and locate Arabic sign language gestures in real time. The training procedure used 100 epochs, a batch size of 8, and a learning rate of 0.01 with an Adam optimizer. To speed up convergence, we employed pre-trained weights from a general-purpose object
detection model. The model’s parameters were modified during the training phase by iterating through the dataset for a predefined
number of epochs using gradient descent and backpropagation. A detailed summary of the YOLOv8 configuration parameters is
displayed in Table 1.
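Using the ultralytics API, the configuration in Table 1 corresponds to a training call like the one below. The dataset YAML path is an assumption, and the sketch is illustrative rather than the authors' exact training script.

```python
from ultralytics import YOLO

# Start from COCO-pretrained weights to speed up convergence, then fine-tune
# on the ArSL data described by a (hypothetical) dataset YAML file.
model = YOLO("yolov8n.pt")
results = model.train(
    data="arsl_dataset.yaml",  # paths to the train/val images and class names
    epochs=100,                # as listed in Table 1
    imgsz=640,                 # image size
    batch=8,                   # batch size
    lr0=0.01,                  # initial learning rate
    optimizer="Adam",          # Adam optimizer, as stated in the text
)
```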
The proposed model for translating Arabic speech/text into Arabic Sign Language using YOLOv8 was developed, trained, and
verified on the GPU platform on Kaggle. The results of the experiments demonstrate that our model can identify Arabic Sign Language
with mean average precision (mAP) values of 99.5%, 99.2%, and 98.7%, respectively, across all classes for the AASL, SLDI, and ArSL
datasets.
Fig. 7 displays the results of recall, accuracy, and loss of the proposed model across the three datasets. For each of the three datasets,
the F1-score and precision-recall curves are displayed in Fig. 8. Moreover, Fig. 9 displays the validation results of the proposed model
for the three datasets.
Table 2 and Fig. 10 display the detailed results of the proposed model. For the AASL dataset, the achieved results for precision,
Table 2
Performance comparison of the proposed model for the three adopted datasets.
Dataset | Precision (%) | Recall (%) | mAP (%)
AASL | 99.4 | 99.5 | 99.5
SLDI | 99.2 | 99.3 | 99.2
ArSL | 98.2 | 99.8 | 98.7
Table 3
A comparison of the proposed YOLOv8-based model's performance with different techniques.
Reference | Model | Recall% (↑) | mAP% (↑) | Precision% (↑) | MCC (↑) | Dataset
Alsaadi et al. [45] | AlexNet | – | – | 94.81 (accuracy) | 0.83 | 54,049 grayscale JPEG pictures, each with a 64 × 64 resolution, corresponding to the 32 Arabic characters
Attia et al. [37] | YOLOv5x | 96.6 | 98.5 | 97.6 | 0.96 | 3 datasets
Kamruzzaman [48] | EfficientNetB4 | 96.2 | – | 95.6 | 0.95 | ArSL2018 dataset with 54,049 images
Balaha et al. [14] | CNN+RNN | – | – | 98 (accuracy) | 0.87 | 8467 videos of 20 signs
Al-Barham et al. [53] | Modified ResNet-18 model | – | – | 99.47 (accuracy) | 0.88 | ArSL2018 dataset with 54,049 images
AbdElghfar et al. [54] | QSLRS-CNN | – | – | 97.13 (accuracy) | 0.86 | 24,137 images dataset
Dabwan et al. [55] | CNN model with EfficientNetB1 scaling | – | – | 97.9 (accuracy) | 0.86 | 16,192 images dataset
Sreemathy et al. [36] | YOLOv4 | 99.17 | 98.17 | 97.2 | 0.98 | 676 images
Al-Barham et al. [56] | ResNet-18 | – | – | 96.36 (accuracy) | 0.84 | 54,000 images
Al Ahmadi et al. [39] | YOLOv8 | – | – | 99 (accuracy) | – | ArSL21L dataset with 14,202 images
Proposed technique | YOLOv8 | 99.5 | 99.5 | 99.4 | 0.99 | AASL (7534 images)
Proposed technique | YOLOv8 | 99.3 | 99.2 | 99.2 | 0.99 | SLDI (7494 images)
Proposed technique | YOLOv8 | 99.8 | 98.7 | 98.2 | 0.99 | ArSL (5832 images)
recall, and mAP are 99.4%, 99.5%, and 99.5%, respectively. Additionally, for the SLDI dataset, the suggested model achieved results
for precision, recall, and mAP of 99.2%, 99.3%, and 99.2%, respectively. For the ArSL dataset, the suggested model obtained 98.2%,
99.8%, and 98.7% for precision, recall, and mAP, respectively.
Table 3 compares the performance of the proposed model with various approaches in the literature. It is clear that the proposed
model outperforms existing methods in terms of precision, recall, and mAP, indicating its efficiency in translating Arabic speech or text
into Arabic sign language using YOLOv8.
Furthermore, Table 3 illustrates the performance comparison of deep learning architectures using MCC. As Table 3 reveals, the
proposed YOLOv8 achieves the highest MCC score (0.99), indicating exceptional performance in recall, mean Average Precision
(mAP), and precision. This surpasses other architectures like AlexNet (0.83 MCC) and ResNet-18 (0.84 MCC). Notably, YOLOv4 also
exhibits strong results with an MCC of 0.98. These findings provide a robust evaluation of the proposed model, highlighting its superiority over existing architectures.
Moreover, compared to previous versions of YOLO, the proposed YOLOv8-based model outperforms the YOLOv5x-based technique [37] and the YOLOv4-based technique [36] for the same task.
The results of the comparison demonstrate how effectively YOLOv8 converts Arabic voice or text to Arabic sign language. The
model can recognize and translate sign language motions with high accuracy and performs well in real-time. The authors hope that
these comparisons will demonstrate the superiority of the proposed model and illustrate its practical application in enhancing
communication between the Arabic-speaking community and individuals with hearing impairments.
Training Set Performance:
To evaluate the model’s learning behavior during training, we also measured performance metrics on the training dataset. The
model achieved an MCC score of 0.99 on the training data, indicating that it learned effectively from the training data; together with the comparable score on the test data, this suggests that the model generalizes well to unseen data.
5. Conclusion
This paper proposes a methodology for automatic Arabic sign language recognition. This helps deaf and mute Muslims to understand the meanings and interpretations of the Holy Qur'an and perform Islamic rituals. The proposed model is based on YOLOv8
architecture, which translates Arabic sign language and Arabic speech into text. The model can effectively recognize and translate
Arabic sign language movements, as demonstrated by the high results of precision, recall, and mAP attained across all three datasets
(AASL, SLDI, and ArSL). For the AASL dataset, the achieved results for precision, recall, and mAP are 99.4%, 99.5%, and 99.5%,
respectively. Additionally, for the SLDI dataset, the suggested model achieved results for precision, recall, and mAP of 99.2%, 99.3%,
and 99.2%, respectively. For the ArSL dataset, the suggested model obtained 98.2%, 99.8%, and 98.7% for precision, recall, and mAP,
respectively. The enhanced efficacy of the suggested model compared to current techniques, such as those predicated on YOLOv4 and
YOLOv5x, highlights the progress achieved in the identification and interpretation of Arabic sign language. The model’s practical
application in improving communication for the Arabic-speaking community and people with hearing impairments is demonstrated by
its high accuracy and real-time performance.
In this study, the application of deep learning techniques, specifically the YOLOv8 architecture, has proven successful,
demonstrating the potential of such methods in the development of assistive technology for sign language translation. In order to
increase translation efficiency and accuracy, future research could concentrate on refining the model’s performance even more, adding
more diverse sign language gestures to the dataset, and investigating other deep learning architectures.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
Acknowledgement
The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research
Group no. KSRG-2023-183.
References
[1] World Health Organization. Deafness and hearing loss. Feb-2024. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss [Accessed: 16-Feb-2024].
[2] Johnston T, Schembri A. Australian sign language (Auslan): an introduction to sign language linguistics. Cambridge University Press; 2007.
[3] World Federation of the Deaf. [Online]. Available: https://wfdeaf.org/our-work/ [Accessed: 18-Feb-2024].
[4] Sanaullah M, et al. A real-time automatic translation of text to sign language. Comput Mater Continua 2022;70(2):2471–88. https://doi.org/10.32604/
cmc.2022.019420.
[5] Núñez-Marcos A, Perez-de-Viñaspre O, Labaka G. A survey on sign language machine translation. Expert Syst Appl 2023;213:118993. https://doi.org/10.1016/
j.eswa.2022.118993. Mar.
[6] Rastgoo R, Kiani K, Escalera S, Athitsos V, Sabokrou M. A survey on recent advances in sign language production. Expert Syst Appl 2024;243:122846. https://
doi.org/10.1016/j.eswa.2023.122846. Jun.
[7] Dhanjal AS, Singh W. An automatic machine translation system for multi-lingual speech to Indian sign language. Multimedia Tools Appl 2022;81(3):4283–321.
https://doi.org/10.1007/s11042-021-11706-1. Jan.
[8] Barbhuiya AA, Karsh RK, Jain R. CNN based feature extraction and classification for sign language. Multimedia Tools Appl 2021;80(2):3051–69. https://doi.
org/10.1007/s11042-020-09829-y. Jan.
[9] Hao W, Hou C, Zhang Z, Zhai X, Wang L, Lv G. A sensing data and deep learning-based sign language recognition approach. Comput Electr Eng 2024;118:
109339. https://doi.org/10.1016/j.compeleceng.2024.109339. Aug.
[10] Siam AI, Soliman NF, Algarni AD, Abd El-Samie FE, Sedik A. Deploying machine learning techniques for human emotion detection. Comput Intell Neurosci
2022;2022:1–16. https://doi.org/10.1155/2022/8032673. Feb.
[11] Gupta R, Kumar A. Indian sign language recognition using wearable sensors and multi-label classification. Comput Electr Eng 2021;90:106898. https://doi.org/
10.1016/j.compeleceng.2020.106898. Mar.
[12] Lee BG, Lee SM. Smart wearable hand device for sign language interpretation system with sensors fusion. IEEE Sens J 2018;18(3):1224–32. https://doi.org/
10.1109/JSEN.2017.2779466. Feb.
[13] Qahtan S, Alsattar HA, Zaidan AA, Deveci M, Pamucar D, Martinez L. A comparative study of evaluating and benchmarking sign language recognition system-
based wearable sensory devices using a single fuzzy set. Knowl Based Syst 2023;269:110519. https://doi.org/10.1016/j.knosys.2023.110519. Jun.
[14] Balaha MM, et al. A vision-based deep learning approach for independent-users Arabic sign language interpretation. Multimedia Tools Appl 2023;82(5):
6807–26. https://doi.org/10.1007/s11042-022-13423-9. Feb.
[15] Rastgoo R, Kiani K, Escalera S. Sign language recognition: a deep survey. Expert Syst Appl 2021;164:113794. https://doi.org/10.1016/j.eswa.2020.113794.
Feb.
[16] Er-Rady A, Faizi R, Thami ROH, Housni H. Automatic sign language recognition: a survey. In: 2017 International conference on advanced technologies for signal
and image processing (ATSIP); 2017. p. 1–7. https://doi.org/10.1109/ATSIP.2017.8075561.
[17] Pathan RK, Biswas M, Yasmin S, Khandaker MU, Salman M, Youssef AAF. Sign language recognition using the fusion of image and hand landmarks through
multi-headed convolutional neural network. Sci Rep 2023;13(1):16975. https://doi.org/10.1038/s41598-023-43852-x. Oct.
[18] Naz N, Sajid H, Ali S, Hasan O, Ehsan MK. MIPA-ResGCN: a multi-input part attention enhanced residual graph convolutional framework for sign language
recognition. Comput Electr Eng 2023;112:109009. https://doi.org/10.1016/j.compeleceng.2023.109009. Dec.
[19] Miah ASM, Hasan MAM, Nishimura S, Shin J. Sign language recognition using graph and general deep neural network based on large scale dataset. IEEE Access
2024;12:34553–69. https://doi.org/10.1109/ACCESS.2024.3372425.
[20] Abdul W, et al. Intelligent real-time Arabic sign language classification using attention-based inception and BiLSTM. Comput Electr Eng 2021;95:107395.
https://doi.org/10.1016/j.compeleceng.2021.107395. Oct.
[21] Varshini CS, Hruday G, Mysakshi Chandu GS, Sharif SK. Sign language recognition. Int J Eng Res Technol 2020;V9(05). https://doi.org/10.17577/
IJERTV9IS050781. Jun.
[22] Siam AI, El-Bahnasawy NA, El Banby GM, Abou Elazm A, Abd El-Samie FE. Efficient video-based breathing pattern and respiration rate monitoring for remote
health monitoring. J Opt Soc Am 2020;37(11):C118. https://doi.org/10.1364/JOSAA.399284. Nov.
[23] Islam MM, Nooruddin S, Karray F, Muhammad G. Human activity recognition using tools of convolutional neural networks: a state of the art review, data sets,
challenges, and future prospects. Comput Biol Med 2022;149:106060. https://doi.org/10.1016/j.compbiomed.2022.106060. Oct.
[24] Khurana D, Koli A, Khatter K, Singh S. Natural language processing: state of the art, current trends and challenges. Multimedia Tools Appl 2023;82(3):3713–44.
https://doi.org/10.1007/s11042-022-13428-4. Jan.
[25] Siam AI, et al. Biosignal classification for human identification based on convolutional neural networks. Int J Commun Syst 2021;34(7). https://doi.org/
10.1002/dac.4685. May.
[26] Alnaggar M, Siam AI, Handosa M, Medhat T, Rashad MZ. Video-based real-time monitoring for heart rate and respiration rate. Expert Syst Appl 2023;225:
120135. https://doi.org/10.1016/j.eswa.2023.120135. Sep.
[27] Alharbey R, Dessouky MM, Sedik A, Siam AI, Elaskily MA. Fatigue State detection for tired persons in presence of driving periods. IEEE Access 2022. https://doi.
org/10.1109/ACCESS.2022.3185251.
[28] El-Rashidy N, Sedik A, Siam AI, Ali ZH. An efficient edge/cloud medical system for rapid detection of level of consciousness in emergency medicine based on
explainable machine learning models. Neural Comput Appl 2023. https://doi.org/10.1007/s00521-023-08258-w. Mar.
[29] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L. Large-scale video classification with convolutional neural networks. In: Proceedings of the
IEEE conference on computer vision and pattern recognition; 2014. p. 1725–32.
[30] Szeliski R. Computer vision: algorithms and applications. Springer Nature; 2022.
[31] Chai J, Zeng H, Li A, Ngai EWT. Deep learning in computer vision: a critical review of emerging techniques and application scenarios. Mach Learn Appl 2021;6:
100134. https://doi.org/10.1016/j.mlwa.2021.100134. Dec.
[32] Cazzato D, Cimarelli C, Sanchez-Lopez JL, Voos H, Leo M. A survey of computer vision methods for 2D object detection from unmanned aerial vehicles.
J Imaging 2020;6(8):78. https://doi.org/10.3390/jimaging6080078. Aug.
[33] Olveres J, et al. What is new in computer vision and artificial intelligence in medical image analysis applications. Quant Imaging Med Surg 2021;11(8):3830–53.
https://doi.org/10.21037/qims-20-1151. Aug.
[34] Elyan E, et al. Computer vision and machine learning for medical image analysis: recent advances, challenges, and way forward. Artif Intell Surg 2022. https://
doi.org/10.20517/ais.2021.15.
[35] Abaspur Kazerouni I, Fitzgerald L, Dooly G, Toal D. A survey of state-of-the-art on visual SLAM. Expert Syst Appl 2022;205:117734. https://doi.org/10.1016/j.
eswa.2022.117734. Nov.
[36] Sreemathy R, Turuk M, Chaudhary S, Lavate K, Ushire A, Khurana S. Continuous word level sign language recognition using an expert system based on machine
learning. Int J Cognitive Comput Eng 2023;4:170–8. https://doi.org/10.1016/j.ijcce.2023.04.002. Jun.
[37] Attia NF, Ahmed MTFS, Alshewimy MAM. Efficient deep learning models based on tension techniques for sign language recognition. Intell Syst Appl 2023;20:
200284. https://doi.org/10.1016/j.iswa.2023.200284. Nov.
[38] Siddique S, Islam S, Neon EE, Sabbir T, Naheen IT, Khan R. Deep learning-based bangla sign language detection with an edge device. Intell Syst Appl 2023;18:
200224. https://doi.org/10.1016/j.iswa.2023.200224. May.
[39] Al Ahmadi S, Mohammad F, Al Dawsari H. Efficient YOLO-based deep learning model for arabic sign language recognition. J Disabil Res 2024;3(4). https://doi.
org/10.57197/JDR-2024-0051. May.
[40] Abeje BT, Salau AO, Mengistu AD, Tamiru NK. Ethiopian sign language recognition using deep convolutional neural network. Multimedia Tools Appl 2022;81
(20):29027–43. https://doi.org/10.1007/s11042-022-12768-5. Aug.
[41] de Castro GZ, Guerra RR, Guimarães FG. Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps. Expert Syst
Appl 2023;215:119394. https://doi.org/10.1016/j.eswa.2022.119394. Apr.
[42] Naz N, Sajid H, Ali S, Hasan O, Ehsan MK. Signgraph: an efficient and accurate pose-based graph convolution approach toward sign language recognition. IEEE
Access 2023;11:19135–47. https://doi.org/10.1109/ACCESS.2023.3247761.
[43] Sharma S, Gupta R, Kumar A. Continuous sign language recognition using isolated signs data and deep transfer learning. J Ambient Intell Human Comput 2023;
14(3):1531–42. https://doi.org/10.1007/s12652-021-03418-z. Mar.
[44] AbdElghfar HA, et al. QSLRS-CNN: qur’anic sign language recognition system based on convolutional neural networks. Imaging Sci J 2024;72(2):254–66.
https://doi.org/10.1080/13682199.2023.2202576. Feb.
[45] Alsaadi Z, Alshamani E, Alrehaili M, Alrashdi AAD, Albelwi S, Elfaki AO. A real time Arabic sign language alphabets (ArSLA) recognition model using deep
learning architecture. Computers 2022;11(5):78. https://doi.org/10.3390/computers11050078. May.
[46] Islam M, et al. Toward a vision-based intelligent system: a stacked encoded deep learning framework for sign language recognition. Sensors 2023;23(22):9068.
https://doi.org/10.3390/s23229068. Nov.
[47] Saleem MI, Siddiqui A, Noor S, Luque-Nieto M-A, Otero P. A novel machine learning based two-way communication system for deaf and mute. Appl Sci 2022;13
(1):453. https://doi.org/10.3390/app13010453. Dec.
[48] Kamruzzaman MM. Arabic sign language recognition and generating arabic speech using convolutional neural network. Wireless Commun Mobile Comput
2020;2020:1–9. https://doi.org/10.1155/2020/3685614. May.
[49] Ultralytics. [Online]. Available: https://github.com/ultralytics/ultralytics [Accessed: 20-Apr-2024].
[50] Al-Barham M, et al. RGB Arabic alphabets sign language dataset. arXiv preprint arXiv:2301.11932; 2023. https://doi.org/10.48550/arXiv.2301.11932.
[51] C. Project, “sign-language-detection dataset,” Apr-2023. [Online]. Available: https://universe.roboflow.com/capston-project/sign-language-detection-qztxk.
[Accessed: 15-Feb-2024].
[52] Belmadoui S. Arabic sign language unaugmented dataset. [Online]. Available: https://www.kaggle.com/datasets/sabribelmadoui/arabic-sign-language-unaugmented-dataset [Accessed: 15-Feb-2024].
[53] Al-Barham M, Sa’Aleek AA, Al-Odat M, Hamad G, Al-Yaman M, Elnagar A. Arabic sign language recognition using deep learning models. In: 2022 13th
International conference on information and communication systems (ICICS); 2022. p. 226–31. https://doi.org/10.1109/ICICS55353.2022.9811162.
[54] AbdElghfar HA, et al. A model for Qur’anic sign language recognition based on deep learning algorithms. J Sens 2023;2023:1–13. https://doi.org/10.1155/
2023/9926245. Jun.
[55] Dabwan BA, Jadhav ME, Ali YA, Olayah FA. Arabic sign language recognition using efficientnetB1 and transfer learning technique. In: 2023 International
conference on IT innovation and knowledge discovery (ITIKD); 2023. p. 1–5. https://doi.org/10.1109/ITIKD56332.2023.10099710.
[56] Al-Barham M, Jamal A, Al-Yaman M. Design of Arabic sign language recognition model. arXiv preprint arXiv:2301.02693; 2023. https://doi.org/10.48550/arXiv.2301.02693.