


A CNN sign language recognition system with single & double-handed gestures

Neil Buckley, AI Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK, [email protected]
Lewis Sherrett, AI Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK, [email protected]
Emanuele Lindo Secco, Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK, [email protected], ORCID 0000-0002-3269-6749

Conference Paper, May 2021. DOI: 10.1109/COMPSAC51774.2021.00173

Abstract— This work presents a novel Computer Vision approach to the development of a real-time, web-camera based British Sign Language recognition system. A literature review focused on (1) the current state of sign language recognition systems and (2) the techniques they use is conducted. This review is used as a foundation on which a Convolutional Neural Network (CNN) based system is designed and then implemented. A bespoke British Sign Language dataset, containing 11,875 images, is then used to train and test the CNN, which classifies the gestures performed by the human hand. The resulting CNN architecture recognises 19 static British Sign Language gestures, incorporating both single and double-handed gestures. During testing, the system achieved an average recognition accuracy of 89%.

Keywords— sign language recognition, AI, CNN, human-machine interaction
I. INTRODUCTION

Research into systems capable of recognising sign language has received substantial attention over the past few decades, fuelled in particular by the rapid evolution of artificial intelligence techniques [1-3]. In turn, this has led to the development of many Sign Language Recognition Systems, referred to as SLR systems throughout the remainder of this paper. These systems, though varying in sign language dialect, share the common goal of correctly recognising hand gestures performed by a signer. However, the varying proposed approaches to achieving this goal have produced a diverse area of research and development encompassing areas of computer science such as Computer Vision (CV), Sensor Processing, Human-Computer Interaction, and Pattern Recognition [1-4]. SLR systems fall into two main types of design and implementation: those that use wearable sensors, and those that use video footage and images. Both are discussed below.

SLR systems that utilise sensors worn on the body to capture sign language gestures usually comprise sensor-embedded gloves worn on the hands. These systems are one of the two main approaches to capturing gestures to be classified [1]. Many sensor-based SLR systems have been developed, and many of them rely on sensor fusion to achieve an accurate recognition rate, such as the system proposed by Kim et al [5], in which 'bi-channel sensor fusion' is used to combine data from an accelerometer and an electromyogram embedded in a glove covering the hand and upper wrist to recognise German Sign Language gestures [6, 7].

SLR systems designed to use video footage or image data to capture a gesture performed by a user can be further categorised into two methods: the first captures gesture data using a 3D camera, and the second captures gesture data using a 2D camera. The literature is therefore reviewed here in terms of SLR systems which use 3D cameras and SLR systems which use 2D cameras, respectively.

An advantage of using a 3D camera over a 2D camera is that the depth-capturing capability of a 3D camera allows for easier image pre-processing: everything registered as deeper than the signer can quickly be removed, solving the issue of complex environmental backgrounds and lighting seen with 2D cameras [8].

Other SLR systems are only capable of recognising sign language gestures in uniform background and lighting conditions, such as the work of Tolentino et al [9], which achieved a recognition accuracy of 93.67% in recognising American Sign Language gestures in uniform lighting and background conditions when tested with 30 individuals. That system operated in real time and used a CNN to classify the gestures performed. However, it modified some of the gestures because their similarity to other gestures would have caused misrecognition and affected the accuracy of the system. The SLR system proposed by Sawant and Kumbhar [10] also avoided the issue of complex backgrounds by capturing all testing footage on a white background, which not only limited background interference but also limited non-uniform lighting. This SLR system was able to recognise 26 Indian Sign Language gestures and used Principal Component Analysis (PCA) during the classification stage. A paper authored by Berru-Novoa et al [11] tested a host of classification methods for an SLR system using a uniform background and lighting approach. This study found that amongst a Support Vector Machine (SVM), a KNN, and an ANN, the SVM performed best, achieving a recognition accuracy of 89.17%; however, the KNN and ANN were less than 2% behind. This system also used a Histogram of Oriented Gradients (HOG) in the feature extraction stage.
II. MATERIALS AND METHODS

The design of the sign language recognition system is a multi-step process. All of the steps involved are detailed below.

A. The Complexity of Sign Language

All sign languages are gesturally complex, with some signs performed with one hand and others performed with two. The position and rotation of the hands are important in the gestures, as is the position of the fingers.



British Sign Language is no different, and the 19 gestures chosen for this work reflect these diverse variations in hand and finger position. This ensures that the system produced has the capacity to deal with such variations. The 19 gestures used by the system can be seen being performed in Fig. 1.

Fig. 1. The British Sign Language gestures to be used in the system.
Among the signed gestures seen in Fig. 1, a diverse range can be identified. For example, signs such as 0 and C are performed one-handed, whereas signs such as Q and W are performed double-handed. Furthermore, the importance of finger position is evident with signs such as A and L, in which the splayed fingers and the position of the fingers on the secondary hand of A make the difference between gesturing an A and gesturing an L. However, there are also clear similarities between some gestures, such as 5 and E, in which the only defining feature of the E gesture is the secondary index finger pointing to the index finger of the main hand; without it, the E gesture would become a 5 gesture. These are challenging cases for the system to deal with, and they have been incorporated to test the robustness of the system.
B. A web-camera based system

The system will make use of a web-camera for capturing the gesture performed by the user, because in the real world web-cameras are common, portable, and cheap, making them accessible to users of a sign language recognition system. The web-camera based approach was discussed in the literature review above, and the many SLR systems developed using this approach prove the viability of web-camera based systems. The web-camera used in this implementation is the Advent AWC72015, which contains a 12-megapixel camera with a 720p resolution and captures at 30 frames per second [12]. However, the system is not specific to any web-camera model and can therefore be used with any web-camera.
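As a minimal sketch (not taken from the paper), the OpenCV capture front end might look as follows; the camera index and the explicit 720p request are assumptions:

```python
import cv2

# Open the default web-camera (index 0 is an assumption); any model works,
# since the system is not web-camera specific.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # request 720p, as provided by
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)   # the Advent AWC72015

ok, frame = cap.read()                    # one frame of the 30 fps feed
if not ok:
    raise RuntimeError("could not read from the web-camera")
```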
C. System Architecture Design

The system will be designed around two main components. The first is the sign language recognition system itself. This component will capture frames from the web-camera, pre-process these frames, and send them to the classifier for recognition; it will also handle the graphical user interface and user input. The other component will house the classifier and will consist of two smaller parts: the first is the creator of the classifier, which will construct the model and perform training and testing of it; the second is the compiled, trained, and saved model itself. The saved model will be used by the sign language recognition system component during the feature extraction and classification stage. This system architectural design is visualised in Fig. 2.

Fig. 2. The system architecture design.
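As a sketch of the hand-off between the two components, assuming the model is persisted to a single file (the name bsl_cnn.h5 is hypothetical, not from the paper):

```python
from tensorflow.keras.models import load_model

# Component 2 (classifier-creator) would end with something like:
#     model.save("bsl_cnn.h5")       # hypothetical file name
# Component 1 (recognition system) then reloads the saved model once at
# start-up and reuses it for every captured frame:
classifier = load_model("bsl_cnn.h5")
```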
D. CNN Architecture

The system will make use of a CNN for the feature extraction and classification process, because of the proven ability of CNNs to perform feature extraction in CV based systems, as discussed in [13]. A bespoke CNN will be created for the system, meaning it will not have been pre-trained on any previous data; this allows the CNN to be tailored specifically to providing accurate gesture recognition in this system. The architecture of the CNN can be seen in Fig. 3.

Fig. 3. The CNN set up.
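Since the exact layer configuration appears only in Fig. 3, the following Keras model is purely illustrative of a small CNN for 19-class classification of the 288 x 312 images; every layer size here is an assumption:

```python
from tensorflow.keras import layers, models

# Illustrative only -- the authors' actual architecture (Fig. 3) may differ.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(312, 288, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(19, activation="softmax"),   # one output per BSL gesture
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```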
E. Dataset set up

The design of the dataset involves two steps: the first is the structure of the dataset, and the second is its contents. These steps are detailed below.

1. Dataset Structure

The dataset will be housed in a structure that is conveniently accessible for training and testing the CNN. Two main folders shall be used: one will house the training data, and the other shall hold the testing data. Both folders will contain an identical set of 19 sub-folders, one for each gesture recognisable by the system. Inside these sub-folders the image data shall be stored in .jpg format, and all images will have dimensions of 288 x 312 pixels.
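Assuming paths such as dataset/train and dataset/test (the paper does not give the exact names), this folder layout could be loaded with Keras generators, for example:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (312, 288)  # (height, width) for the 288 x 312 images

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/train",           # hypothetical path; one sub-folder per gesture
    target_size=IMG_SIZE,
    class_mode="categorical",  # one-hot labels for the 19 gestures
    batch_size=32,
)
test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/test",
    target_size=IMG_SIZE,
    class_mode="categorical",
    batch_size=32,
    shuffle=False,             # keep order stable for per-image evaluation
)
```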
2. Gesture position and distance

The 19 gestures recognisable to the system are displayed in Fig. 1; in that example, however, all signs were performed close to the web-camera and in a neutral position. It is important to build a dataset containing data depicting gestures performed at varying angles, positions, and distances to the web-camera. Including this data will result in a more robust classification model which is able to deal with fringe events. This process of collecting data also makes the system more versatile and better able to deal with other users, who may perform gestures slightly differently [14].

F. Implementation

The design was implemented using the Python programming language, due to Python's expansive libraries and suitability for developing Artificial Intelligence based approaches [15]. The Python libraries used in the implementation of the system, and the reasons for their use, are listed below:
▪ TensorFlow, an AI library, was used as the backend for the Keras library.
▪ Keras, an API built on top of TensorFlow, abstracts complex TensorFlow commands behind more user-friendly Keras commands. Keras was used to construct, train, test, and save the CNN model used in the implementation.
▪ OpenCV, a computer vision oriented AI library, was used to import the video footage from the web-camera and to carry out multiple steps of image pre-processing on the captured frames. It was also used for the GUI implementation and to save the dataset images to file.
▪ NumPy is a library containing high-level mathematical functions and many matrix and array functions.
▪ The OS module allows for interaction with the operating system. It was used to define the file directories and to save the trained CNN model.
▪ The Time module provides time-based functions, and was used to pause the camera capture feed while tasks such as background subtraction were performed.
Python, ver. 3.7.0, was used for the implementation, as this was the most recent version compatible with all imported libraries. The implemented system architecture can be seen in Fig. 4. The components which make up the sign language recognition system can all be found in the same directory, along with the training and testing datasets. As stated in the design, both datasets contain a folder corresponding to each of the 19 recognisable gestures.

Fig. 4. The Implemented System Architecture.
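The paper states that the OS module defines the file directories; a sketch of that shared directory with hypothetical names, since the actual files are not listed:

```python
import os

# All components and datasets live in the same directory; every name below
# is an assumption, as the paper does not name the individual files.
BASE = os.path.dirname(os.path.abspath(__file__))
TRAIN_DIR = os.path.join(BASE, "dataset", "train")   # 19 gesture sub-folders
TEST_DIR = os.path.join(BASE, "dataset", "test")     # identical structure
MODEL_PATH = os.path.join(BASE, "bsl_cnn.h5")        # the saved CNN model
```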
III. RESULTS

Each image is captured by the web-camera and fed to the system in real-time. No pre-processing steps are taken until the user presses the 'b' key on their keyboard, which runs a background subtraction task. This background subtraction process is the first of multiple steps of image pre-processing used by the system. Fig. 5 displays the process of background subtraction.

Fig. 5. Pre-processing steps.
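The paper does not detail the subtraction algorithm itself, so the following frame-differencing loop is only one plausible way the 'b' key step could be implemented:

```python
import cv2

cap = cv2.VideoCapture(0)
background = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)

    key = cv2.waitKey(1) & 0xFF
    if key == ord("b"):                # store the current scene as background
        background = gray.copy()
    elif key == ord("q"):
        break

    if background is not None:
        # keep only the pixels that differ from the stored background
        diff = cv2.absdiff(background, gray)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        cv2.imshow("foreground", mask)
    cv2.imshow("frame", frame)

cap.release()
cv2.destroyAllWindows()
```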
The graphical user interface was implemented using the OpenCV library. The interface shows the original frame in its entirety, with a red box in the top left-hand corner. The red box represents the region of interest: the area in which the user must perform the gestures. If the CNN recognises a gesture with 60% or more certainty, the gesture's name is displayed on-screen to the user (Fig. 6).

Fig. 6. The Graphical User Interface (GUI).
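A minimal sketch of that behaviour, reusing the classifier reloaded in the Section II.C sketch; the ROI coordinates and the abbreviated gesture-name list are assumptions:

```python
import cv2
import numpy as np

# Placeholder class names -- the real list holds all 19 gestures in the
# order of the training sub-folders.
GESTURES = ["0", "1", "2", "5", "A", "B", "C"]  # abbreviated for illustration

cap = cv2.VideoCapture(0)
ok, frame = cap.read()

x0, y0, x1, y1 = 10, 10, 298, 322                          # hypothetical ROI box
cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 0, 255), 2)   # the red box

roi = cv2.resize(frame[y0:y1, x0:x1], (288, 312))          # match training size
probs = classifier.predict(np.expand_dims(roi / 255.0, axis=0))[0]

if probs.max() >= 0.60:                                    # the 60% certainty rule
    cv2.putText(frame, GESTURES[int(probs.argmax())], (x0, y1 + 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
cv2.imshow("BSL recognition", frame)
```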
A. Dataset testing

The testing carried out on the implementation using the testing dataset is discussed in this sub-section. The testing dataset consisted of a total of 2,375 images, which is 125 images per gesture. Examples of the testing images can be seen in Fig. 7.

Fig. 7. Selection of testing dataset images.

The dataset was initialised with the labels withheld, and the CNN was used to predict the gesture being performed in each image. If the CNN's prediction matched the label for that image, the test was deemed a success; if not, it was deemed a failure. Table 1 displays the accuracy of the 10 tests carried out on the trained model.

TABLE I. DATASET TESTING RESULTS

Test number    1      2      3      4      5
Accuracy [%]   88.13  87.92  87.99  87.88  88.81

Test number    6      7      8      9      10
Accuracy [%]   87.88  87.92  87.99  87.86  87.92
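Under the assumptions of the earlier sketches (the shuffle-free test_gen loader and the reloaded classifier), the label-withheld test could be reproduced as follows:

```python
import numpy as np

# Predict every test image, then count how often the arg-max class matches
# the withheld folder label (valid because test_gen uses shuffle=False).
probs = classifier.predict(test_gen)                 # shape (2375, 19)
predicted = probs.argmax(axis=1)
accuracy = np.mean(predicted == test_gen.classes)    # fraction of matches
print(f"dataset test accuracy: {accuracy:.2%}")
```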
B. Real-time testing

Testing the trained CNN with the testing dataset provides useful insight into the effectiveness of the model; however, that dataset contains only static images, and the system's real-world function is real-time gesture recognition. Therefore a further series of tests was carried out on the system to determine its robustness as a real-time sign language recognition system. Comments on some of these tests can be found in Table 2.

TABLE II. REAL-TIME TESTING RESULTS

Signed gesture 0: The 0 sign had the highest recognition accuracy of all the signs and was recognised correctly almost every time, regardless of distance from the web-camera or rotation of the wrist. The placement of the hand within the region of interest also had no negative effect on the recognition accuracy. This is most likely down to the uniqueness of the gesture: the sign is one-handed, and no other sign shares its finger positions.

Signed gesture 1: The 1 sign had a high recognition accuracy and could be distinguished by the CNN across many distances and hand locations inside the region of interest. However, rotating the wrist to the left could cause misrecognition as a 2 sign in some cases, especially where the background lighting was not highly uniform.

Signed gesture 2: The 2 sign had a high recognition accuracy and could be distinguished by the CNN in most cases. This could be due to the vast diversity of testing images used for the 2 gesture.

Signed gesture A: The A sign boasted a high recognition accuracy and came second among the alphabetical gestures. A trend across all double-handed alphabetical gestures is a greater reliance upon uniform background lighting to achieve high recognition accuracies.

Signed gesture B: The B sign is quite unique, which led to its relatively high recognition accuracy; on some occasions it could be misrecognised as a D gesture. Rotating the hands inwards slightly, thereby ensuring that the gaps between the curled fingers were visible on both hands, usually corrected the recognition to a B sign.

Signed gesture C: The C sign possessed the highest recognition accuracy of the alphabetical gestures and was recognised accurately in almost the same cases as the 5 sign. This is most likely due to the sign being performed one-handed and to its unique curved finger shape. This sign was the most versatile in terms of hand position and recognition accuracy.
IV. CONCLUSION & DISCUSSION

This work aimed to research the current state of sign language recognition systems and to use this research as a foundation to develop a computer vision, web-camera based British Sign Language recognition system, along with a bespoke British Sign Language gesture dataset. The system was designed to recognise 19 British Sign Language gestures and to operate in real-time. This primary aim was motivated by the communication gap that exists between the hearing and hearing-impaired communities, and by the ability of SLR systems to bridge this gap. Further motivation came from the lack of British Sign Language recognition systems, and from the stark difference between the number of people with a hearing impairment and the number of registered users of Sign Language in the United Kingdom.

ACKNOWLEDGMENT

This work was presented in thesis form in fulfilment of the requirements for the BSc in Computer Science of the student Lewis Sherrett, under the supervision of Dr Neil Buckley from the AI Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University.

REFERENCES

[1] Shanableh T, Assaleh K, Al-Rousan M (2007) Spatio-Temporal Feature Extraction Techniques for Isolated Gesture Recognition in Arabic Sign Language. IEEE Trans on Systems, Man, and Cybernetics, Part B, 37(3), pp.641-650.
[2] Setiawardhana, Hakkun RY, Baharuddin A (2015) Sign Language Learning Based on Android for Deaf and Speech Impaired People. In: 2015 Int Electronics Symposium, Surabaya, pp.114-117.
[3] Soodtoetong N, Gedkhaw E (2018) The Efficiency of Sign Language Recognition using 3D Convolutional Neural Networks. In: 2018 15th Int Conf on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, Chiang Rai, pp.70-73.
[4] Chu TS, Chua AY, Secco EL (2020) A Wearable MYO Gesture Armband Controlling Sphero BB-8 Robot. HighTech and Innovation Journal, 1(4), pp.179-186. http://dx.doi.org/10.28991/HIJ-2020-01-04-05
[5] Kim J, Wagner J, et al (2008) Bi-Channel Sensor Fusion for Automatic Sign Language Detection. In: 8th IEEE Int Conf on Automatic Face and Gesture Recognition, Amsterdam, 17-19 September, pp.1-6.
[6] Maereg AT, Lou Y, et al (2020) Hand Gesture Recognition Based on Near-Infrared Sensing Wristband. In: Proc 15th Int Joint Conf on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp.110-117.
[7] Myers K, Secco EL (2020) A Low-Cost Embedded Computer Vision System for the Classification of Recyclable Objects. Lecture Notes on Data Engineering and Communications Technologies, 61.
[8] Galicia R, Carranza O, et al (2015) Mexican Sign Language Recognition using Movement Sensor. In: 2015 IEEE 24th Int Symposium on Industrial Electronics, Buzios, 3-5 June, pp.573-578.
[9] Tolentino LK, Juan RS, et al (2019) Static Sign Language Recognition Using Deep Learning. Int J of Machine Learning and Computing, 9(6), pp.821-827.
[10] Sawant SN, Kumbhar MS (2014) Real Time Sign Language Recognition using PCA. In: 2014 IEEE Int Conf on Advanced Communications, Control and Computing Technologies, 8-10 May, pp.1412-1415.
[11] Berru-Novoa B, Gonzales-Valenzuela R, et al (2018) Peruvian Sign Language Recognition using Low Resolution Camera. In: 2018 IEEE XXV Int Conf on Electronics, Electrical Engineering and Computing, Lima, 8-10 August, pp.1-4.
[12] Advent (2015) Instruction Manual, PC Webcam Model AWC72015.
[13] Pigou L, Dieleman S, et al (2015) Sign Language Recognition Using Convolutional Neural Networks. In: Computer Vision - ECCV 2014 Workshops, Zurich, Switzerland, 6-7 September. Springer, pp.572-578.
[14] Kanwal K, Abdullah S, et al (2014) Assistive Glove for Pakistani Sign Language Translation. In: 17th IEEE Int Multi Topic Conference, Karachi, 8-10 December, pp.173-176.
[15] Sodhi P, Awasthi N, et al (2018) Introduction to Machine Learning and its Basic Application in Python. In: Proc 10th Int Conf on Digital Strategies for Organizational Success, India, 5-7 January, pp.1-22.
