
Development of a Sign Language Recognition System Using Machine Learning

2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) | 979-8-3503-1480-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICABCD59051.2023.10220456

Hope Orovwode
Electrical and Information Engineering
Covenant University
Ota, Ogun State, Nigeria
hope.orovwode@covenantuniversity.edu.ng

Ibukun Deborah Oduntan
Electrical and Information Engineering
Covenant University
Ota, Ogun State, Nigeria
oduntan.ibukunoluwa@stu.cu.edu.ng

John Abubakar
Electrical and Information Engineering
Covenant University
Ota, Ogun State, Nigeria
john.abubakar@covenantuniversity.edu.ng

Abstract-Deafness and voice impairment have been persistent disabilities throughout history, hindering individuals from engaging in verbal communication and leading to their isolation from the predominantly vocally communicating society. Sign language has emerged as the primary mode of communication for people with these disabilities. However, it presents a language barrier, as it is not commonly understood by those who can hear. To address this issue, various methods for recognizing sign language have been proposed. This paper aims to develop a machine learning-based system that can recognize sign language in real time. The paper involved the acquisition of a dataset consisting of 44,654 images representing the static American Sign Language (ASL) alphabet signs. The HandDetector module was utilized to detect and capture images of the signer's hand forming each sign through a PC webcam. The dataset was split into three sets: training data (20,772 cases), validation data (8,903 cases), and test data (14,979 cases). Image pre-processing techniques were applied to the images, and a convolutional neural network (CNN) model was trained and compiled. The CNN utilized in the paper comprised three convolutional layers and a SoftMax output layer, and it was compiled using the Adam optimizer and the categorical cross-entropy loss function. The performance of the system was evaluated using the test dataset. Notably, the system achieved remarkable accuracy rates: a training accuracy of 99.86%, a validation accuracy of 99.94%, and a test accuracy of 94.68%. The results obtained from this study demonstrated significant advancements in sign language recognition, surpassing previous findings in the literature.

Keywords-machine learning, sign language recognition, CNN, sign language, American Sign Language (ASL)

I. INTRODUCTION

Based on a World Health Organization (WHO) report, greater than 6% of the general population has hearing impairment, while approximately 5% of the population of the world, or 430 million people (432 million adults and 34 million children), need therapy to treat their 'disabling' hearing loss [1]. Debilitating hearing loss is projected to affect about 700 million individuals by 2050, or one in ten people [2]. Treat (2016) estimates that 23.7% of Nigerians, out of a population of over 155 million people, have a hearing disability (total deafness, hearing loss, or other impairment related to hearing) [3]. Additionally, up to 84% of Nigeria's deaf population is uneducated and underdeveloped economically [4]. WHO defines someone as having loss of hearing if they are unable to hear at the level of a person with normal hearing, which is characterized by possessing hearing levels of 20 dB or higher in both ears [5]. It can range from mild to severe and can affect either or both ears [6]. The term "deaf and dumb" is frequently used to refer to those who, due to their deafness, are unable to hear what others are saying and, as a result, remain unable to speak [7]. Children who lose their hearing early due to illness or an accident quickly lose their ability to speak [8]. This is because deaf children are unable to acquire speech by imitating others [9]. Hereditary, degenerative, and accidental factors, among others, can lead to hearing loss [10].

Loss of hearing may impede the capacity of individuals to express themselves verbally. This restriction makes it difficult for hearing and deaf individuals to communicate with one another. Due to this, deaf and mute people often find it difficult to have meaningful conversations with the people they encounter on a regular basis. The primary method of communication for persons with speaking or hearing impairments is sign language [11]. For this mode of communication to work, the hearing individual must be able to understand sign language. Acquiring adequately trained translators in any of the numerous signed languages often requires much time and many resources, and it may also hinder the privacy of the hearing-impaired person [12]. For this reason, many research papers have been carried out to develop sign recognition systems for various languages. The most effective methods for understanding sign language are vision-based and wearable sensing methods such as sensory gloves [13]. To determine the hand posture for identification, the glove-based system uses mechanical or optical sensors which are connected to the user's glove and transform finger motions into electrical signals [14][15]. However, these devices are bulky and typically have many cables connecting to a computer. This demonstrates the necessity for non-intrusive, image-based approaches for recognizing motions [16][17]. In a vision/image-based technique, joint angles, finger positions, and characteristics relating to the palms are calculated and used to accomplish the recognition [18]. With this methodology, the signs must be photographed or recorded on video, then processed using image-processing software [19][20]. The three types of image-based sign language recognition systems are alphabet, individual word, and continuous sequences [21]. Since it is quicker to use words and continuous sequences than to spell out each individual word, people who have trouble hearing or speaking most often interact with others in this way. However, signers will employ finger spelling if there is no conventional sign to represent the required word [22]. They spell out the word with hand gestures that represent the letters of the alphabet. In this instance, a static posture individually produces each letter. In certain sign languages, a single hand is used for finger spelling, whereas two hands are used in others [23].


There are many distinct kinds of sign language in the world, including British Sign Language, Spanish Sign Language, Arabic Sign Language, Chinese Sign Language, and so on [24]. The proposed sign language recognition system focuses on American Sign Language and uses a vision-based approach to detect finger-spelling ASL hand gestures and translate them into text. Image processing algorithms are used by the system to detect and track the movements of the hand and fingers in real time. The system also uses machine learning algorithms to recognize the gestures and translate them into text.

In this paper, we discuss some related works on the development of sign language recognition systems. We also explain in detail how the proposed system was developed and implemented, and the results obtained.

II. LITERATURE REVIEW

Numerous research studies have been conducted on sign language recognition, utilizing various sensor technologies and machine learning methods. Below is a quick rundown of some of the recent research that has been carried out on the detection and recognition of sign language.

In their study, Thakur et al. [25] proposed employing a Convolutional Neural Network (CNN) to achieve real-time detection of sign language and generate corresponding speech. The dataset used in the study mostly consisted of the American Sign Language alphabet, and the preprocessed gesture dataset was trained using the CNN VGG-16 model with Python libraries and tools such as OpenCV and skimage. The system detected input and generated speech accordingly. The results showed that the training loss and accuracy were 0.0259 and 99.65%, respectively, and 99.62% of the tests were accurate. By demonstrating the feasibility of using a CNN, the study showed how real-time detection of sign language and subsequent speech generation could be achieved, potentially offering a more efficient communication method for people who are deaf or hearing-impaired. Further research can be done to broaden the application of the system to include more sign languages and gestures.

In the study conducted by Shrenika and Madhu Bala [26], the authors explored the use of a template matching algorithm for sign language recognition. The study involved recording different hand gestures using a camera and processing the images using various algorithms. The first step in the process was pre-processing the image, followed by edge detection to identify the edges of the sign. Once the sign was recognized using the template matching algorithm, the corresponding text was displayed. The system was able to successfully detect basic static hand signs, indicating that template matching can be an efficient technique for sign language recognition.

Tolentino et al. [27] conducted research on the application of deep learning techniques to recognize static sign language. The authors employed a skin-color modeling technique to isolate the pixels of the hand from the background. Afterwards, the images were subjected to classification using a CNN model, which had been trained using Keras. When adequate illumination and a consistent background were maintained, the average testing accuracy reached an impressive 93.67%. The system's accuracy in recognizing American Sign Language (ASL) letters was 90.04%, number recognition was 93.44%, and static word identification was 97.52%. The study showed that deep learning techniques can be useful for static sign language recognition and can outperform previous research in this field. The authors suggest that further research could focus on improving the system's accuracy under varying lighting conditions and complex backgrounds to make the system more robust for real-world applications.

Puri et al. [28] proposed a method for Indian Sign Language (ISL) recognition through the use of the Python programming language. The program's code was written in Python, and several modules, including NumPy, os, TensorFlow, Keras, OpenCV (cv2), and various preprocessors, were employed to train the system. To enhance accuracy, the training process utilized both a locally created library of ISL symbols, such as the numbers 0-9, and an online database obtained from GitHub. The proposed algorithm demonstrated an accuracy range of 79 to 100%. The study demonstrated the potential of using Python-based deep learning techniques to recognize Indian Sign Language gestures accurately. The authors suggest further research in this direction could be fruitful in enhancing the system's efficiency and generalizability to various signing styles.

The study by Sharma and Kumar [29] proposed a technique called ASL-3DCNN for recognizing American Sign Language (ASL) using 3-D Convolutional Neural Networks (CNNs). To prepare the video sequences for processing, the frames were separated and then pre-processed by converting them to grayscale, filtering out noise and spots, and removing illumination variations through histogram equalization. Twenty-five frames are then condensed and normalized before being trained on 3-D CNNs. The suggested technique outperformed current cutting-edge models in metrics including accuracy, recall, and f-measure, with a 0.19-second computation duration per frame, which makes it well suited for real-time applications.

Using machine learning techniques, Amrutha and Prabu [22] devised a system capable of recognizing sign language. The system's performance was assessed using a dataset comprising hand gestures representing the numbers 1 to 5. During the pre-processing phase, the background was eliminated using a threshold approach, and the outlines of the fingers were extracted using contour-based segmentation. To extract features and classify the gestures, K-NN with the convex hull method was used, and Euclidean distance was employed. The system had an accuracy rate of 65% on the test dataset, which could be improved by using a larger dataset and a different classifier.

Quinn and Olszewska [30] developed a British Sign Language recognition system with multi-class support vector machines (SVM) and histogram of oriented gradients (HOG) computer vision techniques. A real-world dataset of 13,066 cases was used to successfully test the system, which had an accuracy rate of approximately 99% and a 170 ms mean processing time, making it suitable for real-time visual sign recognition.

Overall, the existing studies have made significant contributions to sign language recognition, but there is a need for further research to address the representation and recognition of a wider range of sign languages and gestures, including dynamic signing. This would ensure that the developed systems are more inclusive, applicable, and beneficial to diverse sign language communities.

III. METHODOLOGY

There are five main stages in the system's development, as seen in Fig. 1: data acquisition, image pre-processing, training of the model, model evaluation, and deployment of the model, in which the model's real-time predictions and results are obtained. The system's design specifications, as well as the step-by-step procedure for developing the system, are explained in this section.

Fig. 1. Block Diagram representing the System Design

A. Data Acquisition

A collection of images depicting static ASL alphabet signs was generated to form a dataset. To acquire the image datasets, the laptop webcam was used in combination with the Python OpenCV library to capture images of the signer's hand. The HandDetector module from cvzone was used to locate and crop out the signer's hand within the camera's field of view. This was done to minimize the effect of the user's background on the model prediction. Once the hand was detected, the OpenCV library was used to resize the image to a fixed size of 300 x 300 pixels. The dynamic signs J and Z were removed from all datasets, leaving only 24 image classes. This resulted in a total of 44,654 images, which were saved in JPG format. The training dataset was divided into two separate portions: one designated for training the model and the other reserved for validation purposes. 70% (20,772 instances) was used for training, while the remaining 30% (8,903 instances) was used for validation. The test dataset was a separate dataset and contained 14,979 instances. The distribution of the dataset is shown in Fig. 2.

Fig. 2. Bar graph illustrating the distribution of data for each class
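To make the acquisition step concrete, the following is a minimal sketch of how it could be implemented with OpenCV and the cvzone HandDetector. The dataset folder layout, key bindings, and crop padding are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of the data acquisition step: detect the signer's hand with cvzone,
# crop it from the webcam frame, resize to 300x300, and save as JPG.
import os
import cv2
from cvzone.HandTrackingModule import HandDetector

SIGN_LABEL = "A"                       # hypothetical: one folder per ASL letter
OUT_DIR = os.path.join("dataset", SIGN_LABEL)
os.makedirs(OUT_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)              # laptop webcam
detector = HandDetector(maxHands=1)    # locate a single hand in the frame
count = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hands, frame = detector.findHands(frame)
    key = cv2.waitKey(1) & 0xFF
    if hands:
        x, y, w, h = hands[0]["bbox"]  # bounding box around the detected hand
        pad = 20                       # assumed margin so fingers are not clipped
        crop = frame[max(0, y - pad):y + h + pad, max(0, x - pad):x + w + pad]
        if crop.size:
            crop = cv2.resize(crop, (300, 300))   # fixed size used in the paper
            cv2.imshow("hand", crop)
            if key == ord("s"):                   # press 's' to save a sample
                cv2.imwrite(os.path.join(OUT_DIR, f"{count}.jpg"), crop)
                count += 1
    if key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```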
B. Data preparation/pre-processing

The sign images were pre-processed at this stage using image resizing and normalization techniques. In image resizing, the images were obtained in RGB format and resized to 224 x 224 pixels to suit the model. This was done using built-in functions from OpenCV. In the normalization step, the to_categorical function from Keras was used to convert the class labels into one-hot encoded vectors. The images were then adjusted to alter the range of pixel intensity values by dividing the image datasets by 255, scaling every pixel value into the 0-1 range. This was done to ensure that all the pixel values have a similar scale. The formula for normalization when the input data is in the 0-255 range is given in equation (1):

X_norm = (X/255 - X_min/255) / (X_max/255 - X_min/255) (1)
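The resizing and normalization just described could look roughly like the sketch below. Only the 224 x 224 target size, the division by 255, and the use of Keras's to_categorical come from the text; the loading loop and the variable names are assumptions.

```python
# Sketch of the pre-processing stage: resize each image to 224x224 RGB,
# scale pixel values to [0, 1], and one-hot encode the integer class labels.
import cv2
import numpy as np
from tensorflow.keras.utils import to_categorical

def preprocess(image_paths, labels, num_classes=24):
    images = []
    for path in image_paths:
        img = cv2.imread(path)                        # loads as BGR
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # model expects RGB
        img = cv2.resize(img, (224, 224))             # match the input shape
        images.append(img)
    X = np.asarray(images, dtype="float32") / 255.0   # pixel range 0-255 -> 0-1
    y = to_categorical(labels, num_classes)           # one-hot label vectors
    return X, y
```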

C. The CNN Architecture

The model architecture was built using the Keras framework. The validation dataset was used to obtain optimal settings for the CNN hyper-parameters. The input shape of the images is specified as (224, 224, 3), indicating that the images have a resolution of 224 x 224 pixels and contain three color channels (Red, Green, and Blue) to represent the RGB values of each pixel. RGB images were used to provide additional color information to help the model distinguish between different signs and improve sign language recognition accuracy.

The model's structure comprises three convolutional layers, each utilizing Rectified Linear Unit (ReLU) activation followed by a max pooling layer, to extract characteristics from the input images. The number of filters in the convolutional layers progressively increases from 32 to 64 to 128. The convolutional layer outputs are then flattened and passed on to fully connected layers with ReLU activation to categorize the input images based on the extracted characteristics. The output layer has 24 neurons with Softmax activation, which generates a distribution of probabilities over the 24 classes the model is designed to classify. The model is shown in Fig. 3; a code sketch of the architecture is given after the layer-by-layer analysis below.

Fig. 3. CNN Model (Conv2d 3x3 + ReLU blocks with max pooling, followed by a flattening step and fully connected layers)

The model was compiled using the Adam optimizer once the required hyper-parameters were established, with the loss function being categorical cross-entropy. Updates to network weights are made using Adam. It has a simple implementation, is computationally efficient, and uses little memory. This approach estimates the first and second moments of the gradient to determine an adaptive learning rate for each parameter. The layers of the model are analyzed as follows:

1) Input layer: This contains the image data.

2) The 2D convolution layer (Conv2d): This layer applies a kernel that convolves with the input of the layer, resulting in a tensor of outputs. The 3x3 kernel which makes up the convolution layer is a convolution matrix that, when convolved with an image, can perform several different tasks, including blurring, edge detection, and sharpening. Let us assume an input image frame of dimensions Win x Rin. A convolutional kernel with dimensions K x K is convolved with it using a stride of S and padding P. The size of the output (Wout x Rout) of the convolution layer is given by equations (2) and (3):

Wout = (Win - K + 2P)/S + 1 (2)

Rout = (Rin - K + 2P)/S + 1 (3)

For example, a 224 x 224 input convolved with a 3x3 kernel (K = 3) at stride S = 1 with no padding (P = 0) yields an output of (224 - 3 + 0)/1 + 1 = 222 pixels in each dimension.

There are three convolutional layers in the CNN model's design. The standard equation in (4) is typically used to represent the convolutional layer output:

Y_j^n = f( Σ_{i ∈ C_j} Y_i^(n-1) * k_ij^n + g ) (4)

Where n denotes the nth layer, k_ij represents the convolutional kernel, g is the bias, and the input maps are symbolized by C_j. The CNN employs the piecewise linear ReLU activation function, which yields zero in the absence of a positive input; otherwise, it outputs the input directly. It is formulated as shown in equation (5):

f(x) = max(0, x) (5)
3) Pooling Layer: A convolved image can be too large and therefore must be shrunk without sacrificing any of its features. Max and min pooling are two varieties of pooling. The proposed CNN makes use of max pooling, in which the maximum value from each region is chosen. Each convolutional layer is followed by a max-pooling layer with a 2x2 kernel size that downsamples the output by a factor of 2. The formula for calculating the output max pooling spatial shape is shown in equation (6):

⌊(Ix - P)/S⌋ + 1 (6)

Where P is the pooling window size, S represents the stride, and Ix represents the input x or y shape.
4) Flattening Layer: In order to feed a multi-dimensional matrix into the fully connected layer, this layer converts it to a 1-dimensional array.

5) Fully Connected or Dense Layers: The model has two fully connected layers with 256 and 128 units, respectively. The dense layer of a CNN is a densely connected layer that learns features based on all potential combinations of the features from the preceding layer. It carries out the operation in equation (7):

Output = activation(dot(input, kernel) + bias) (7)

6) Output Layer: The output layer of the model comprises 24 units, which are activated by a Softmax activation function. The Softmax function generates a probability distribution over the 24 possible classes to which the input image can belong. In classification models, it is common to utilize the softmax activation function in the output layer. This function helps convert the output of the last layer into a probability distribution across the various classes. The output is transformed into a vector of values ranging from 0 to 1. The sum of all the values in the vector is equal to 1, with each value denoting the likelihood or probability of the input image being associated with a specific class. The model predicts the class with the highest probability as the output. The Softmax activation function can be mathematically expressed as shown in equation (8):

σ(p)_i = e^(p_i) / Σ_j e^(p_j) (8)
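Pulling the layer-by-layer analysis together, a minimal Keras sketch of the described architecture and its compilation might look as follows. The 32/64/128 filter counts, 3x3 kernels, 2x2 max pooling, 256- and 128-unit dense layers, 24-way Softmax output, Adam optimizer, and categorical cross-entropy loss all follow the text; details such as padding and weight initialization are left at Keras defaults as an assumption.

```python
# Sketch of the CNN described above, built with the Keras Sequential API.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),                  # downsample by a factor of 2
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),                             # to a 1-D vector for the dense layers
    Dense(256, activation="relu"),
    Dense(128, activation="relu"),
    Dense(24, activation="softmax"),       # probabilities over the 24 classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```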
D. Model Training

The model underwent training on the training dataset for a duration of 5 epochs with a batch size of 64. This number of epochs was chosen because the dataset was relatively small, so a small number of epochs was sufficient to achieve reasonable performance while minimizing the training time. The decision was also made to prevent overfitting to the training data, which can be caused by prolonged training and results in poor performance on new and previously unseen data. The validation dataset was used to obtain optimal settings for the CNN hyper-parameters.
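Given the compiled model, the training step described above reduces to a single fit call; X_train, y_train, X_val, and y_val are assumed to be the pre-processed arrays from the earlier sketch.

```python
# Train for 5 epochs with a batch size of 64, monitoring the validation split.
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=5,
    batch_size=64,
)
```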
E. Model Evaluation

To assess the model's performance on novel, unseen data, it was evaluated using the test dataset. Various metrics, including accuracy, precision, recall, and F1-score, were computed on the test data to gauge the model's effectiveness.
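A sketch of this evaluation with scikit-learn, which the paper names as the library used; the per-class report corresponds to Table I and the confusion matrix to Fig. 7. X_test and y_test are assumed to be the pre-processed test arrays.

```python
# Compute test-set accuracy, per-class precision/recall/F1, and the
# confusion matrix from the model's predicted class indices.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)   # predicted class per image
y_true = np.argmax(y_test, axis=1)                  # decode the one-hot labels

print(classification_report(y_true, y_pred, digits=2))
cm = confusion_matrix(y_true, y_pred)               # 24x24 table as in Fig. 7
```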


F. Model Deployment

The model was deployed for testing in real time. The application was able to access the device camera to acquire a live video feed and then display the model's prediction on the screen. In order for the system to detect signs, the user's background should preferably be free of clutter and there needs to be sufficient lighting. Once the hand is detected, the application generates the equivalent text representation of the sign.
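A rough sketch of this real-time loop under the same assumptions as the earlier snippets: detect the hand, crop and pre-process it to the model's input shape, and overlay the predicted letter on the frame. The class-name list (A-Y without J and Z) matches the 24 classes; the drawing details are illustrative.

```python
# Real-time deployment sketch: webcam -> hand crop -> CNN -> on-screen text.
# `model` is the trained network from the training sketch above.
import string
import cv2
import numpy as np
from cvzone.HandTrackingModule import HandDetector

CLASS_NAMES = [c for c in string.ascii_uppercase if c not in ("J", "Z")]

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hands, frame = detector.findHands(frame)
    if hands:
        x, y, w, h = hands[0]["bbox"]
        crop = frame[max(0, y - 20):y + h + 20, max(0, x - 20):x + w + 20]
        if crop.size:
            inp = cv2.cvtColor(cv2.resize(crop, (224, 224)), cv2.COLOR_BGR2RGB)
            inp = inp.astype("float32")[None] / 255.0    # shape (1, 224, 224, 3)
            probs = model.predict(inp, verbose=0)
            letter = CLASS_NAMES[int(np.argmax(probs))]
            cv2.putText(frame, letter, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("sign recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```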

IV. EXPERIMENTS AND RESULTS

The overall training accuracy was determined to be 99.86%, while the validation accuracy was determined to be 99.94%. The test accuracy was determined to be 94.68%. Fig. 5 exhibits the curves for training and validation accuracy, and Fig. 6 depicts the training and validation loss over 5 epochs. A confusion matrix and the calculated test accuracy, precision, recall, and F1-score were obtained on the test dataset using the scikit-learn library in Python.

Accuracy is the most intuitive performance metric and is computed as the ratio of correct predictions to all predictions. The mathematical expression is given in equation (9):

Accuracy = (number of correct predictions)/(number of all predictions) = (TP + TN)/(TP + FP + FN + TN) (9)

Precision is determined by the proportion of accurately predicted positive observations to all predicted positive observations. The mathematical expression is given in equation (10):

Precision = TP/(TP + FP) (10)

The ratio of correctly predicted positive observations to the total number of actual positive observations is known as recall. The mathematical expression is given in equation (11):

Recall = TP/(TP + FN) (11)

These are the key terms used in the equations for the evaluation metrics:

True Positives (TP) are instances where both the actual class and the predicted class are positive.

True Negatives (TN) are instances where both the actual class and the predicted class are negative.

False Positives (FP) are instances where the predicted class is positive but the actual class is negative.

False Negatives (FN) are instances where the predicted class is negative but the actual class is positive.

Table I depicts the precision, recall, and F1-score of recognition for each letter class, with class labels 0 to 23 representing the letters A to Y, excluding J. These metrics were calculated on the test data.

TABLE I. MODEL PRECISION, RECALL AND F1-SCORE

Class   Precision   Recall   F1-score
0       1.00        0.98     0.99
1       0.89        0.99     0.93
2       1.00        0.98     0.99
3       0.73        0.80     0.77
4       0.97        0.95     0.99
5       0.99        0.99     0.99
6       1.00        1.00     1.00
7       0.99        0.99     1.00
8       1.00        0.67     0.80
9       0.86        0.98     0.92
10      0.96        1.00     0.98
11      0.93        0.87     0.90
12      0.88        1.00     0.94
13      1.00        1.00     1.00
14      1.00        1.00     1.00
15      1.00        1.00     1.00
16      0.83        0.97     0.89
17      0.97        0.90     0.94
18      1.00        1.00     1.00
19      0.92        0.96     0.94
20      0.94        0.88     0.91
21      1.00        0.78     0.87
22      1.00        1.00     1.00
23      1.00        1.00     1.00

Fig. 4. Graph of precision, recall and F1-score metrics (performance indices vs. classes) calculated using the test dataset

The precision, recall, and F1-score metrics are plotted as a line chart in Fig. 4. The x-axis represents the 24 different classes, and the y-axis represents the corresponding metric scores. The lines are color-coded for ease of interpretation. From the graph, all the classes have relatively high precision, recall, and F1-score, which is indicative of good prediction performance.

The curve in Fig. 5 shows the accuracy of the model on the training dataset and the validation dataset over 5 epochs, or training iterations. Fig. 6 shows the loss of the model on both the training and validation datasets over the same 5 epochs. The x-axis in both figures represents the number of training iterations or epochs, while the y-axis represents the accuracy of the model in Fig. 5 and the loss in Fig. 6 on the corresponding dataset.

In Fig. 5, at the beginning of training, the training accuracy is lower. This is because the CNN model has not yet learned to recognize hand gestures and is essentially making random guesses: at the start of training, the model's weights are randomly initialized, resulting in a lower training accuracy. As the training progresses, the model updates its weights, which leads to an increase in accuracy on both the training and validation sets. As the number of epochs increases, the training accuracy gradually improves, as the model learns to recognize hand gestures by adjusting the weights of its neurons. The validation accuracy remained consistently high for all epochs, with a slight decrease between the first and third epoch. This indicates that the CNN model was not overfitting to the training data, which is important because overfitting can cause the model to perform poorly on new data.

In Fig. 6, it is shown that at the beginning of training, the training loss is high. This is because the CNN model is making random guesses and has not yet learned to recognize hand gestures accurately.
The loss curve represents how the model's weights affect its ability to minimize the difference between the predicted output and the actual output. At the start of training, the model's loss is high because its weights are random, and it is essentially making random guesses. As the training progresses, the model updates its weights, which leads to a decrease in the loss on both the training and validation sets. As the number of epochs increases, the training loss gradually decreases, indicating that the CNN model is getting better at recognizing hand gestures on the training dataset. The validation loss remained consistently near 0 for all epochs, with a slight increase between the first and third epoch. This suggests that the CNN model is able to recognize hand gestures accurately with a high level of confidence, and it provides a strong indication that the model is a good candidate for use in sign language prediction, as it is able to make accurate predictions with a very low level of error.

Fig. 5. Graph of training and validation accuracy for 5 epochs

Fig. 6. Graph of training and validation loss for 5 epochs


Fig. 7 displays the confusion matrix of the model, which was generated using the test dataset. It is a 24x24 table that provides an overview of the multiclass classification model's performance by comparing the predicted and actual labels for each class. To verify the model's accuracy on the test dataset, one can sum the number of true positives and divide it by the total number of instances, as expressed in equation (12). This equation offers a reliable method to assess the accuracy of the model using the information provided in the confusion matrix.

Accuracy = (total number of true positives)/(total number of instances) (12)

Therefore, the test accuracy of the model can be calculated from the diagonal of the confusion matrix, as seen in equation (13):

Accuracy = (600 + 603 + 603 + 499 + 621 + 611 + 611 + 619 + 422 + 624 + 635 + 555 + 613 + 620 + 630 + 625 + 629 + 569 + 621 + 605 + 557 + ...) / 14979 = 14182/14979 (13)

which yields an accuracy value of 94.68%.
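In code, equation (12) is simply the trace of the confusion matrix obtained in the evaluation sketch, divided by the number of test instances:

```python
# The diagonal of the confusion matrix holds the per-class true positives.
import numpy as np

test_accuracy = np.trace(cm) / cm.sum()       # equation (12)
print(f"Test accuracy: {test_accuracy:.2%}")  # approx. 94.68% as reported
```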

Fig. 7. Confusion matrix of the CNN model (rows: actual class labels 0-23; columns: predicted class labels; the diagonal holds the per-class true positives)

V. CONCLUSION

This system was successfully developed, implemented, tested, and deployed. The paper emphasized the significance of sign language recognition systems in the lives of the deaf and the hearing-impaired, and the system was able to recognize sign language gestures in real time with high accuracy. However, additional data covering different environmental conditions may improve the model's efficacy and reliability. Also, the system's recognition capabilities are currently limited to static ASL alphabet gestures. It does not consider the dynamic and continuous nature of sign language, which involves intricate movements, facial expressions, and body language. Future research should aim to expand the system to recognize dynamic sign language gestures for more comprehensive communication. The system has also been specifically trained and tested on the ASL alphabet. However, sign languages vary across different regions and countries, each with its own unique vocabulary and grammar. The system's performance on other sign languages remains unknown and requires further investigation and adaptation to ensure its generalizability.

ACKNOWLEDGMENT

The authors acknowledge the financial support offered by Covenant University in the publication of this paper.

REFERENCES

[1] S. A. Khan, "Importance of hearing and hearing loss treatment & recovery," IJSA, vol. 3, no. 1, pp. 14-16, 2022.
[2] K. Aashritha and V. M. Manikandan, "Assistive Technology for Blind and Deaf People: A Case Study," in Machine Vision and Augmented Intelligence: Select Proceedings of MAI 2022, Springer, 2023, pp. 539-551.
[3] S. Treat, "Deaf Education: Gallaudet University: How deaf education and special education is being advanced in Nigeria," retrieved Nov. 2, 2016.
[4] C. J. Eleweke, I. O. Agboola, and S. I. Guteng, "Reviewing the pioneering roles of Gallaudet University alumni in advancing deaf education and services in developing countries: Insights and challenges from Nigeria," Am. Ann. Deaf, vol. 160, no. 2, pp. 75-83, 2015.
[5] E. Sammari and A. Naceur, "The Effectiveness of an Intervention Program for the Development of Social and Emotional Capacities of Children with a Hearing Impairment," Psychology, vol. 13, no. 7, pp. 1025-1062, 2022.
[6] S. Shave, C. Botti, and K. Kwong, "Congenital sensorineural hearing loss," Pediatr. Clin., vol. 69, no. 2, pp. 221-234, 2022.
[7] L. Lee, "The Importance of Learning Deaf Culture through a Black Deaf Perspective in the Field of Communication Sciences and Disorders," 2022.
[8] S. Colibaba, I. Gheorghiu, A. Colibaba, O. Ursa, C. Antonić, and R. Cirșmari, "The Voice Project: Habilitating Hearing-Impaired Children to Recover Hearing and Lead a Normal Life," in 2022 E-Health and Bioengineering Conference (EHB), IEEE, 2022, pp. 1-4.
[9] Y. Xusnora and A. Yulduz, "Ways to develop vocabulary in children with hearing impairment," in E Conference Zone, 2022, pp. 229-230.
[10] K. N. Robillard, E. de Vrieze, E. van Wijk, and J. J. Lentz, "Altering gene expression using antisense oligonucleotide therapy for hearing loss," Hear. Res., p. 108523, 2022.
[11] A. S. Dhanjal and W. Singh, "An optimized machine translation technique for multi-lingual speech to sign language notation," Multimed. Tools Appl., vol. 81, no. 17, pp. 24099-24117, 2022.
[12] S. H. Hamerdinger and C. J. Crump, "Sign language interpreters and clinicians working together in mental health settings," Routledge Handb. Sign Lang. Transl. Interpret., 2022.
[13] A. H. Alhafdee, H. Abbas, and H. I. Shahadi, "Sign Language Recognition and Hand Gestures Review," Kerbala J. Eng. Sci., vol. 2, no. 4, pp. 192-316, 2022.
[14] W. Lu et al., "Artificial Intelligence-Enabled Gesture-Language-Recognition Feedback System Using Strain-Sensor-Arrays-Based Smart Glove," Adv. Intell. Syst., p. 2200453, 2023.
[15] D. W. Alausa et al., "PalmMatchDB: An On-Device Contactless Palmprint Recognition Corpus," in 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), IEEE, 2023, pp. 318-325, doi: 10.1109/ICPECA56706.2023.10076097.
[16] E. Hisham and S. N. Saleh, "ESMAANI: A Static and Dynamic Arabic Sign Language Recognition System Based on Machine and Deep Learning Models," in 2022 5th International Conference on Communications, Signal Processing, and their Applications (ICCSPA), IEEE, 2022, pp. 1-6.
[17] A. Ademola, T. E. Somefun, A. F. Agbetuyi, and A. Olufayo, "Web based fingerprint roll call attendance management system," Int. J. Electr. Comput. Eng., vol. 9, no. 5, pp. 4364-4371, 2019, doi: 10.11591/ijece.v9i5.pp4364-4371.
[18] J. Eunice, Y. Sei, and D. J. Hemanth, "Sign2Pose: A Pose-Based Approach for Gloss Prediction Using a Transformer Model," Sensors, vol. 23, no. 5, p. 2853, 2023.
[19] T. A. Siby, S. Pal, J. Arlina, and S. Nagaraju, "Gesture based Real-Time Sign Language Recognition System," in 2022 International Conference on Connected Systems & Intelligence (CSI), IEEE, 2022, pp. 1-6.
[20] E. Adetiba et al., "FEDGEN Testbed: A Federated Genomics Private Cloud Infrastructure for Precision Medicine and Artificial Intelligence Research," in Communications in Computer and Information Science, Springer, 2022, pp. 78-91, doi: 10.1007/978-3-030-95630-1_6.
[21] A. M. Soliman, M. M. Khattab, and A. M. Ahmed, "Arabic Sign Language Recognition System: Using an Image-Based Hand Gesture Detection Method to Help Deaf and Dumb Children to Engage in Education," vol. 32, no. 58, pp. 1-28, 2023.
[22] K. Amrutha and P. Prabu, "ML based sign language recognition system," in 2021 Int. Conf. Innov. Trends Inf. Technol. (ICITIIT), 2021, doi: 10.1109/ICITIIT51526.2021.9399594.
[23] P. Pranav and R. Katarya, "Optimal Sign language recognition employing multi-layer CNN," in 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), IEEE, 2022, pp. 288-293.
[24] Y. Farhan, A. A. Madi, A. Ryahi, and F. Derwich, "American Sign Language: Detection and Automatic Text Generation," in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), IEEE, 2022, pp. 1-6.
[25] A. Thakur, P. Budhathoki, S. Upreti, S. Shrestha, and S. Shakya, "Real Time Sign Language Recognition and Speech Generation," J. Innov. Image Process., vol. 2, no. 2, pp. 65-76, 2020, doi: 10.36548/jiip.2020.2.001.
[26] S. Shrenika and M. Madhu Bala, "Sign Language Recognition Using Template Matching Technique," in 2020 Int. Conf. Comput. Sci. Eng. Appl. (ICCSEA), pp. 5-9, 2020, doi: 10.1109/ICCSEA49143.2020.9132899.
[27] L. K. S. Tolentino, R. O. Serfa Juan, A. C. Thio-ac, M. A. B. Pamahoy, J. R. R. Forteza, and X. J. O. Garcia, "Static sign language recognition using deep learning," Int. J. Mach. Learn. Comput., vol. 9, no. 6, pp. 821-827, 2019, doi: 10.18178/ijmlc.2019.9.6.879.
[28] S. Puri, M. Sinha, S. Golaya, and A. K. Dubey, "Indian sign language recognition using python," in Emerging Technologies in Data Mining and Information Security, Springer, 2021, pp. 427-434.
[29] S. Sharma and K. Kumar, "ASL-3DCNN: American sign language recognition technique using 3-D convolutional neural networks," Multimed. Tools Appl., vol. 80, no. 17, pp. 26319-26331, 2021, doi: 10.1007/s11042-021-10768-5.
[30] M. Quinn and J. I. Olszewska, "British sign language recognition in the wild based on multi-class SVM," in 2019 Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE, 2019, pp. 81-86.

