
A Wearable System for Recognizing American Sign Language in Real-time Using IMU and Surface EMG Sensors

Jian Wu, Student Member, IEEE, Lu Sun, Roozbeh Jafari, Senior Member, IEEE

Abstract—A Sign Language Recognition (SLR) system translates signs performed by deaf individuals into text/speech in real time. Inertial measurement unit (IMU) and surface electromyography (sEMG) sensors are both useful modalities for detecting hand/arm gestures: they are able to capture signs, and the fusion of these two complementary sensor modalities will enhance system performance. In this paper, a wearable system for recognizing American Sign Language (ASL) in real time is proposed, fusing information from an inertial sensor and sEMG sensors. An information gain based feature selection scheme is used to select the best subset of features from a broad range of well-established features. Four popular classification algorithms are evaluated for 80 commonly used ASL signs on four subjects. The experimental results show 96.16% and 85.24% average accuracies for intra-subject and intra-subject cross-session evaluation, respectively, with the selected feature subset and a support vector machine classifier. The significance of adding sEMG for American Sign Language recognition is explored and the best channel of sEMG is highlighted.

Index Terms—American Sign Language recognition; IMU sensor; surface EMG; feature selection; sensor fusion

I. INTRODUCTION

A sign language is a language which uses manual communication to convey meaning, as opposed to acoustically conveyed sound patterns. It is a natural language widely used by deaf people to communicate with each other [1]. However, there are communication barriers between hearing people and deaf individuals, either because signers may not be able to speak and hear or because hearing individuals may not be able to sign. This communication gap can have a negative impact on the lives and relationships of deaf people. Two traditional ways of communication exist between deaf persons and hearing individuals who do not know sign language: interpreters and text writing. Interpreters are very expensive for daily conversations, and their involvement results in a loss of privacy and independence for deaf persons. Text writing is not an efficient way to communicate because writing is too slow compared with either spoken or sign language, and the facial expressions produced while signing or speaking are lost. Thus, a low-cost, more efficient way of enabling communication between hearing and deaf people is needed.

A sign language recognition (SLR) system is a useful tool for enabling communication between deaf people and hearing people who do not know sign language by translating sign language into speech or text [2, 3]. Fig. 1 shows a typical application of a sign language recognition system. The system can be worn by deaf people who cannot talk; it translates the signs they perform into text or speech on the cell phone of the hearing person. The speech recognition system on the deaf person's cell phone translates speech into sign language images/videos; this speech recognition part is not considered in this paper. The real-time translation enables them to communicate in a more convenient and natural way.

There are different sign languages in different countries and regions. Around 300 sign languages are in use all over the world today. Sign languages are natural languages and, like spoken languages, they differ from each other, so an SLR system should be studied and designed for a specific sign language. In this paper, we focus on the recognition of ASL. There are thousands of signs in the ASL dictionary, but most of them are not commonly used. In our paper, 80 commonly used signs are chosen from 100 basic ASL signs [4, 5]. A sign consists of hand shape, hand location, hand orientation, hand and arm movement, and facial expression. In our paper, facial expression is not considered when we design our system.

Vision-based and glove-based SLR systems are well-studied systems which capture signs using cameras and sensory glove devices, respectively [6, 7, 8, 9, 10]. Vision-based techniques typically require cameras to be mounted in the environment, which inherently suffer from a limited range of vision. Further, the required infrastructure may not be available at all of the desired locations or may be too expensive to implement. Issues associated with users' privacy also limit the utility of vision-based techniques. Due to the high cost of glove devices, glove-based SLR systems are not ideal for use in daily life.

This work was supported in part by the National Science Foundation, under grants CNS-1150079 and ECCS-1509063. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.
Jian Wu is with the Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77840 USA (e-mail: [email protected]).
Lu Sun was with the University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: [email protected]).
Roozbeh Jafari is with the Center for Remote Health Technologies and Systems, Departments of Biomedical Engineering, Computer Science and Engineering, and Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/JBHI.2016.2598302

2168-2194 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
Fig. 1. Typical application of sign language recognition system.

Wearable inertial measurement unit (IMU) based gesture recognition systems attract much research attention due to their low cost, low power consumption and ubiquitous sensing ability [11, 12]. An IMU measures acceleration and gravity with a 3-axis accelerometer and angular velocities with a 3-axis gyroscope. A surface electromyography (sEMG) sensor measures muscle electrical activity; it can be used to detect neuromuscular diseases and to analyze human kinetics. Different signs generate different muscle electrical patterns, and sEMG is able to capture this information to distinguish different gestures [13, 14]. For sign language recognition systems, a wrist-worn IMU sensor is good at capturing hand orientations and hand and arm movements, while sEMG does well in distinguishing different hand shapes and finger movements when the sensors are placed on the forearm. Thus, they each have their own advantages in capturing different information about a sign. The fusion of these two complementary modalities will enhance the performance of an SLR system and thus enable the recognition of a larger number of signs [15].

A wearable system for recognizing American Sign Language in real time, fusing information from inertial and sEMG sensors, is proposed in this paper. Although such a system has been studied for Chinese Sign Language [16], to the best of the authors' knowledge this is the first time such a system is studied for American Sign Language. In our work, an adaptive auto-segmentation technique using sEMG is proposed to define the beginning and ending of a sign. A broad range of well-studied features from both inertial and sEMG sensors are extracted from each segment, and a best feature subset is selected using an information gain based feature selection approach. Four popular classification algorithms are evaluated for intra- and inter-subject testing, and the significance of adding sEMG for SLR is explored.

The remainder of this paper is organized as follows. Related work is discussed in Section II. Our lab-customized sEMG data acquisition and IMU hardware platforms are introduced in Section III. The details of our system are explained in Section IV, followed by the experimental setup in Section V. The experimental results are explained in Section VI and limitations are discussed in Section VII. Finally, the paper is concluded in Section VIII.

II. RELATED WORK

SLR systems are well studied in the areas of computer vision and image processing. Two vision-based real-time ASL recognition systems are studied for sentence-level continuous American Sign Language using Hidden Markov Models (HMM) [6]. In the first system, the camera is mounted on a desk, while in the second system the camera is mounted on a cap worn by the user. Both are tested on 40 signs and achieve 92% and 98% accuracy, respectively. A framework for recognizing the simultaneous aspects of ASL is proposed [7]. This framework targets the scalability issue associated with HMMs: it breaks signs down into their phonemes and models them with parallel HMMs. In this way, the state space decreases significantly as the number of signs increases. Another vision-based SLR system is studied for a medium-vocabulary Chinese Sign Language [17]. Robust hand detection, background subtraction and pupil detection are implemented as the first module, followed by a tied-mixture density HMM. With the aid of a colored glove, this system achieves 92.5% accuracy for 439 Chinese Sign Language words. A combination of three new vision-based features is explored for ASL recognition [18]. The three features are mapped to four components of ASL: hand shape, place of articulation, hand orientation and movement. The proposed features achieve a 10.90% error rate on an existing dataset.

Glove-based SLR systems implement multiple sensors on a glove and capture the physical features of the gestures. Unlike vision-based systems, they do not require cameras mounted around the user, and the system can perform recognition at any place and any time with a wearable glove. A glove-based Australian SLR system is studied using two classifiers (i.e., an instance-based classifier and a decision tree classifier) with some simple features; 80% accuracy is achieved for 95 AUSLAN signs [19]. The performance of artificial neural networks is explored for an ASL recognition system using a sensory glove [9]; it achieves about 90% accuracy for 50 ASL signs.

Low-cost wearable accelerometer and sEMG based SLR systems have the same advantages as glove-based systems compared with vision-based approaches, while they cost much less than glove-based systems since they have fewer sensors deployed. Therefore, this kind of wearable SLR system is

gaining more and more popularity in recent years. SLR systems fusing information from accelerometer and sEMG sensors have been explored in several studies. A comparison of accelerometer-based and sEMG-based gesture recognition systems is discussed in [20]. It suggests that accelerometer and sEMG sensors are good at capturing different information about gestures, and the performance enhancement obtained by combining the two modalities has been studied: experiments show a 5% - 10% accuracy improvement after fusing them [21]. An accuracy of 93% in recognizing 60 Greek Sign Language signs is achieved using only one effective sample-entropy-based feature set for both accelerometer and sEMG [22]. A Chinese SLR framework is proposed fusing data from an accelerometer and 4-channel sEMG sensors [16]. Auto-segmentation is applied to extract sign words from continuous sentences according to sEMG signal intensity. Multiple classifiers are implemented at different stages and the decisions achieved by each individual classifier are fused. At the first stage, the linear discriminant analysis (LDA) classifier is applied to both sEMG and accelerometer data, which are able to capture hand shape and hand orientation, respectively. All sEMG and accelerometer features are cascaded and fed into a multi-stream HMM to recognize signs. A Gaussian mixture model is applied to fuse the decisions obtained in the first stage. Although this system obtains 96.5% accuracy for 120 Chinese sign words with sensors deployed on two hands, the multiple stages and multiple classifiers make it unfavorable for implementation on real-time wearable computers, which are constrained by limited computational resources. Another system is proposed to detect seven German sign words, with 99.82% accuracy achieved using an accelerometer and one channel of sEMG [23]. However, this work is not extensively evaluated for a large number of signs and does not include auto-segmentation, which makes it difficult to operate in real time. The major differences between our work and the previous works are as follows: 1) An adaptive auto-segmentation is proposed to extract the periods during which signs are performed using sEMG. 2) The best feature subset is selected from a broad range of features using the information gain criterion, and the selected features from the different modalities (e.g. accelerometer, gyroscope and 4-channel sEMG) are discussed. 3) A gyroscope is incorporated and the significance of adding sEMG is analyzed. 4) Although such a system has been studied for Chinese Sign Language [16], our paper is the first study for American Sign Language recognition fusing these two modalities.

III. HARDWARE DESCRIPTION

A. IMU Sensor

Fig. 2. Motion Sensor Board.

Fig. 2 shows the 9-axis motion sensor customized in our lab. The InvenSense MPU9150, a combination of a 3-axis accelerometer, 3-axis gyroscope and 3-axis magnetometer, serves as the IMU sensor. A Texas Instruments (TI) 32-bit microcontroller SoC, the CC2538, is used to control the whole system. The board also includes a microSD storage unit and a dual-mode Bluetooth module, the BC127 from BlueCreation. The system can be used for real-time data streaming or can store data for later analysis. It also has an 802.15.4 wireless module which can offer low-power proximity measurement or ZigBee communication. In this paper, the sampling rates for the accelerometer and gyroscope are chosen to be 100 Hz, which is sufficient for the sign language recognition system [24].

B. sEMG Acquisition System

Fig. 3. 8-channel sEMG acquisition system.

sEMG measures the electrical activity generated by skeletal muscle. Fig. 3 shows a customized 16-channel Bluetooth-enabled physiological signal acquisition system. It can be used for ECG, sEMG and EEG data acquisition; it is used as a four-channel sEMG acquisition system in this study. A TI low-power analog front end, the ADS1299, is used to capture the four-channel sEMG signals, and a TI MSP430 microcontroller is responsible for forwarding the data to a PC via Bluetooth. A resolution of 0.4 μV is achieved by setting a gain of 1 on the ADS1299. Covidien Kendall disposable surface EMG patches are attached to the skin, and the same electrodes are used as introduced in our previous work [25].

Generally, sEMG signals are in the frequency range of 0 Hz - 500 Hz, depending on the space between electrodes and the muscle type [26]. To meet the Nyquist criterion, the sampling rate is chosen as 1 kHz, which is commonly used in surface EMG based pattern recognition tasks [27].

Fig. 4. Diagram of proposed system.

IV. PROPOSED SLR SYSTEM

The block diagram of our proposed multi-modal ASL recognition system is shown in Fig. 4. Two phases are included: a training phase and a testing phase. In the training phase, the signals from the 3-D accelerometer (ACC), 3-D gyroscope (GYRO) and four-channel sEMG are preprocessed for noise rejection and synchronization purposes. The sEMG-based auto-segmentation technique obtains the beginning and ending of a sign for both IMU and sEMG. Once the segmentation is done, a broad set of well-established features is extracted from both IMU and sEMG signals. All extracted features are then put into one feature vector. The best feature subset is obtained using an information gain (IG) based feature selection scheme. Four different classifiers are evaluated (i.e. decision tree, support vector machine, NaïveBayes and nearest neighbor) on the selected feature subset and the best one is selected. In the testing phase, the same techniques are repeated for preprocessing and segmentation. The selected features are extracted and recognition of the sign is achieved by the chosen classifier.

A. Preprocessing

The synchronization between IMU and sEMG data is important for fusion. In our system, IMU data samples and sEMG data samples are sent to a PC via Bluetooth and time-stamped with the PC clock. The synchronization is done by aligning samples with the same PC clock. Bluetooth causes a transmission delay (5-20 ms) for both IMU and sEMG data, and this small synchronization error is negligible for the purposes of our system. To remove low-frequency noise in sEMG, a 5 Hz IIR high-pass filter is used, since the frequency components of sEMG beyond the range of 5 Hz - 450 Hz are negligible [28]. The raw data is used for the accelerometer and gyroscope.

B. Segmentation

Automatic segmentation is crucial for real-time applications. It extracts the period during which each sign word is performed, such that features can be extracted on the correct segment before classification is done. For certain parts of some signs, only finger movements are observed and no obvious motion signal can be detected from the wrist. Thus, sEMG signals are used for our automatic segmentation technique, since sEMG signals can capture a larger number of movements.

To explain our segmentation technique, we first define the average energy E of the four sEMG channels in an n-sample window in Equation (1), where S_c(i) denotes the ith sample of the cth channel of sEMG and m is the total number of channels, which equals four in our case:

E = \frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{m} S_c^2(i)    (1)

A non-overlapping sliding window is used to calculate E for every window. The length of the window is set to 128 milliseconds, which covers 128 samples at the 1000 Hz sampling frequency. If E in five continuous windows is larger than a threshold T, the first sample of the first window is taken as the beginning of a gesture. If E in four continuous windows is smaller than the threshold, the last sample of the last window is considered to be the ending of the gesture.

Different people have different muscular strengths, which results in different E. A simple fixed threshold may not be suitable for all subjects. An adaptive estimation technique is therefore proposed to adjust the threshold on-line according to different subjects and different noise levels. The proposed approach is explained in two steps. In the first step, the average energy E is calculated for five continuous windows. If all five E are smaller than a*T, it is assumed that no muscle activity is detected, and the threshold is updated with b*T in the second step. a is called the converge parameter, and it reduces the threshold T when quiet periods are detected. b is the diverge parameter, which enlarges the threshold T as the noise level increases. The values of a, b and T are set to 0.5, 4 and 0.01 for the system empirically. The initial value 0.01 is much bigger than E for all subjects, and the user is requested to keep a 2-3 second quiet period at the beginning of system operation so that the system converges to a suitable threshold.
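The fixed-threshold core of this windowed-energy scheme can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the adaptive update of T via a and b is omitted, and all function and parameter names are ours.

```python
import numpy as np

def average_energy(window):
    """Average energy E of one (n_samples, n_channels) sEMG window,
    following Eq. (1): E = (1/n) * sum_i sum_c S_c(i)^2."""
    n = window.shape[0]
    return float(np.sum(window ** 2) / n)

def segment_signs(emg, fs=1000, win_ms=128, thresh=0.01,
                  onset_windows=5, offset_windows=4):
    """Return (start, end) sample indices of candidate signs in `emg`
    (shape: n_samples x n_channels), using non-overlapping windows.

    Onset: `onset_windows` consecutive windows with E above `thresh`;
    the start is the first sample of the first such window.
    Offset: `offset_windows` consecutive windows with E below `thresh`;
    the end is the last sample of the last such window.
    """
    win = fs * win_ms // 1000                     # 128 samples at 1 kHz
    energies = [average_energy(emg[i * win:(i + 1) * win])
                for i in range(emg.shape[0] // win)]

    segments, start, above, below = [], None, 0, 0
    for i, e in enumerate(energies):
        if e > thresh:
            above, below = above + 1, 0
            if start is None and above >= onset_windows:
                start = (i - onset_windows + 1) * win
        else:
            below, above = below + 1, 0
            if start is not None and below >= offset_windows:
                segments.append((start, (i + 1) * win - 1))
                start = None
    return segments
```

On a synthetic recording, a burst of muscle activity embedded in silence is returned as a single (start, end) pair whose bounds bracket the burst.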

C. Feature Extraction

A large number of features have been proposed and studied for both sEMG and IMU sensors for detecting activities or gestures. We adopt some of these well-established features in our paper [29, 30, 31, 32, 33]. Table I and Table II show the features from the sEMG and IMU sensors, respectively; the dimension of each feature is given in parentheses. The sEMG features are extracted for all four channel signals and the total dimension is 76. The IMU sensor features are extracted for the 3-axis accelerometer, the 3-axis gyroscope and the magnitudes of the accelerometer and gyroscope, leading to a 192-dimension feature space. The features from the sEMG and IMU sensors are combined into a final feature vector of dimension 268.

TABLE I. sEMG FEATURES

  Mean Absolute Value (1)                   Variance (1)
  Fourth-order Reflection Coefficients (4)  Willison Amplitude in 5 amplitude ranges (5)
  Histogram (1)                             Modified Median Frequency (1)
  Root Mean Square (1)                      Modified Mean Frequency (1)
  Fourth-order AR Coefficients (4)

TABLE II. IMU SENSOR FEATURES

  Mean (1)                Variance (1)
  Standard Deviation (1)  Integration (1)
  Root Mean Square (1)    Zero Cross Rate (1)
  Mean Cross Rate (1)     Skewness (1)
  Kurtosis (1)            First three orders of 256-point FFT Coefficients (3)
  Entropy (1)             Signal Magnitude Area (1)
  AR Coefficients (10)

D. Feature Selection

Feature selection provides a way to select the most suitable feature subset for a given task from the well-established features. It reduces the over-fitting problems and information redundancy existing in the feature set. It can also suggest the best feature subset if a smaller feature set is required by applications with limited computational resources.

There are three kinds of feature selection methods: filter methods, wrapper methods, and embedded methods [34]. Wrapper methods generate scores for each feature subset based on a specific predictive model: cross validation is done for each feature subset, each subset is assigned a score based on the prediction performance, and the best subset is chosen. Filter methods use general measurement metrics of a dataset to score a feature subset instead of using the error rate of a predictive model; some common measures are mutual information and inter/intra-class distance. Embedded methods perform the feature subset selection in conjunction with the model construction. In our work, an information gain filter method is used in conjunction with a ranking algorithm to rank all the features. The best n features form the best feature subset, which is evaluated with different classifiers; the choice of n is discussed in Section V. Compared with wrapper methods, the features selected by filter methods will operate with any classifier instead of working only with a specific one.

E. Classification

Four popular classification algorithms are studied in this paper: decision tree (DT) [35], support vector machine (LibSVM) [36], nearest neighbor (NN) and NaiveBayes. Weka, a widely used open-source machine learning tool, is applied for the implementations of these four algorithms [37]. The radial basis function (RBF) kernel is selected for LibSVM, and the best kernel parameters are tuned using a grid search algorithm. The default parameters are selected for the other three classifiers. In machine learning, it is usually hard to determine which classifier is more suitable for a specific application, and thus it is worth testing several algorithms before choosing one.

V. EXPERIMENTAL SETUP

A. Sensor Placement

Signs can involve one hand or two hands. In our paper, we only look at the right-hand movements for both one-hand and two-hand signs; if the system were deployed on two hands, it would increase the recognition accuracy. Fig. 5 shows the sensor placement on the right forearm of the user. Four major muscle groups are chosen for placing the four-channel sEMG electrodes: (1) extensor digitorum, (2) flexor carpi radialis longus, (3) extensor carpi radialis longus and (4) extensor carpi ulnaris. The IMU sensor is worn on the wrist, where a smart watch is usually placed. To improve the signal-to-noise ratio of the sEMG readings, a bi-polar configuration is applied for each channel, and the space between the two electrodes of each channel is set to 15 mm [38]. The electrode placements are also annotated in the figure.

Fig. 5. Placement of sEMG electrodes.
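Several of the Table I and Table II features reduce to one-line computations over a segmented window. The sketch below shows a handful of them; the naming and the Willison-amplitude threshold are our own illustrative choices, not the authors' code, and the full 268-dimension vector in the paper includes many more features than shown here.

```python
import numpy as np

def mav(x):
    """Mean Absolute Value (Table I)."""
    return float(np.mean(np.abs(x)))

def rms(x):
    """Root Mean Square (Tables I and II)."""
    return float(np.sqrt(np.mean(x ** 2)))

def zero_cross_rate(x):
    """Zero Cross Rate (Table II): fraction of consecutive sample pairs
    that change sign."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def willison_amplitude(x, thresh=0.05):
    """Willison Amplitude (Table I) for a single amplitude range: count of
    consecutive-sample differences exceeding the range threshold."""
    return int(np.sum(np.abs(np.diff(x)) > thresh))

def semg_feature_vector(segment):
    """Concatenate per-channel features over a (n_samples, n_channels)
    segment, mirroring how the paper extracts sEMG features per channel."""
    feats = []
    for ch in segment.T:
        feats += [mav(ch), rms(ch), float(np.var(ch)),
                  zero_cross_rate(ch), willison_amplitude(ch)]
    return np.asarray(feats)
```

For a four-channel segment this yields a 20-dimensional vector (5 features x 4 channels), to be concatenated with the IMU features into one final vector before selection.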
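The information-gain ranking used for feature selection can be approximated as follows. Equal-width binning of continuous features is our simplification (Weka's filter uses its own discretization), so treat this as a sketch of the criterion rather than the authors' exact procedure.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Y) in bits of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels, bins=10):
    """IG = H(Y) - H(Y | binned feature), with equal-width binning."""
    edges = np.histogram_bin_edges(feature, bins=bins)[1:-1]
    binned = np.digitize(feature, edges)
    h_cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

def rank_features(X, y, k=40):
    """Indices of the k highest-IG columns of X (the paper keeps 40)."""
    scores = [information_gain(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

A column that determines the label gets an IG near H(Y), while a column independent of the label scores near zero, which is what pushes it down the ranking.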

B. Data Collection

80 commonly used ASL signs in daily conversations are selected in our paper. Three male volunteers and one female volunteer are recruited for data collection. They are all first-time learners and did not know ASL before. For each subject, data is collected in three sessions on three different days, and in each session the subject repeats each sign 25 times. The dataset has 24000 instances in total.

C. Experiments

Four different experiments are conducted to test our system: intra-subject testing, all cross validation, inter-subject testing and intra-subject cross-session testing. For intra-subject testing, the data collected from the three sessions of the same subject is put together and a 10-fold cross validation is done for each subject separately. 10-fold cross validation means the data is split into 10 subsets randomly and the model is trained on 9 subsets and tested on the 10th; this process is repeated 10 times and the average is taken. For the all cross validation analyses, data from all four subjects is put together and a 10-fold cross validation is performed. For the inter-subject testing, the classifier is trained with data from three subjects and tested on the fourth subject; the performance is averaged over the four tests. The feature selection for the first three experiments is carried out during all cross validation, since it uses data from all four subjects, which makes it a good generalization for the classification algorithms. For the intra-subject cross-session testing, the feature selection is performed and the classifier is trained with two sessions from each subject and tested on the third session of the same subject. The process is repeated three times for each subject and the performance is averaged per subject. This experiment gives an indication of how well the system will generalize to new data collected in the future for the same subject.

VI. EXPERIMENTAL RESULTS

A. Auto-segmentation

In our experiment, we do not have a gold standard (e.g. a video record) and thus it is hard to measure the error of our automatic segmentation technique directly. However, we know the total number of signs each subject performed and the number of signs our algorithm recognized. An error rate (ER) is defined as:

\mathrm{ER} = \frac{\text{detected nums} - \text{performed nums}}{\text{performed nums}}    (2)

where detected nums and performed nums are the number of signs our algorithm detected and the number of signs the user actually performed, respectively. The ER of our approach is 1.3%, which indicates that our segmentation technique achieves good performance. The intra-subject classification results in Section VI.C also indicate suitable performance of the segmentation.

B. Feature Selection

All 268 features are ranked with a score obtained from the information gain criterion, and the highest-ranked ones are selected to form the best subset. To decide the size of the best feature set, all cross validation is performed on the four different classifiers as the feature subset size increases from 10 to 268.

Fig. 6. Results of feature selection.

Fig. 6 shows the classification accuracies of the four classifiers as the size of the best feature subset increases. As the size of the feature subset increases, the accuracies of all classifiers increase. However, when the feature number is bigger than 120 for LibSVM and nearest neighbor, their accuracies start to decrease as a result of over-fitting. This illustrates one of the reasons why feature selection is necessary. Table III lists the four data points at which the classifiers achieve their best performance.

TABLE III. OPTIMAL DATA POINT OF FEATURE SELECTION

  Classifier       Optimal point (feature number, accuracy)
  NaiveBayes       (270, 82.13%)
  NearestNeighbor  (120, 98.73%)
  Decision Tree    (100, 78.00%)
  LibSVM           (120, 98.96%)

Fig. 6 also shows that when the number of selected features reaches 40, LibSVM already achieves 96.16% accuracy. Due to the computational constraints associated with wearable systems, the feature size is thus selected to be 40. Among the 40 features, the numbers of features selected from the different sensors are shown in Table IV.

TABLE IV. NUMBER OF FEATURES SELECTED FROM DIFFERENT SENSORS

  Sensor         Number of features selected
  Accelerometer  21
  Gyroscope      10
  sEMG1          4
  sEMG2          2
  sEMG3          0
  sEMG4          3

More than half of the features are selected from the accelerometer, which means the accelerometer plays the most important role in recognizing signs. The accelerometer measures both gravity and the acceleration caused by movement. Gravity is usually the major component, and it is capable of capturing hand orientation information. This indicates that hand orientation information is more significant than hand shape in distinguishing different signs.
TABLE V. FOURTY SELECTED FEATURES
Rank # Feature name Rank # Feature name Rank # Feature name Rank # Feature name
Signal magnitude area of
1 Mean of Acc_y 11 RMS of Gyro_x 21 RMS of sEMG1 31
Acc_x
RMS of amplitude of Zero cross rate
2 Mean of Acc_z 12 22 32 Variance of sEMG4
accelerometer of Acc_y
Mean of amplitude of
3 RMS of Acc_x 13 23 Variance of Gyro_z 33 Entropy of Gyro_x
accelerometer
Standard deviation
4 RMS of Acc_z 14 Mean of Acc_x 24 34 RMS of sEMG4
Of Gyro_z
Signal magnitude area of Signal magnitude area of
5 RMS of Acc_y 15 25 Variance of Acc_y 35
Acc_x Gyro_x
Standard deviation Standard deviation Zero cross rate
6 Integration of Acc_y 16 26 36
of Acc_z of Acc_y of Acc_z
Modified mean frequency Mean absolute value of
7 Integration of Acc_x 17 Variance of Acc_z 27 37
of sEMG1 sEMG4
Standard deviation Mean absolute value of Signal magnitude area of
8 Integration of Acc_z 18 28 38
of Gyro_z sEMG1 Gyro_z
First auto-regression
9 Entropy of Acc_x 19 Variance of Gyro_x 29 39 RMS of sEMG2
coefficient of Acc_x
Mean absolute value of Mean of amplitude of
10 RMS of Gyro_z 20 Variance of sEMG1 30 40
sEMG2 gyroscope
accelerometer features have very high rank which indicates
accelerometer is the most important modality in our system.

TABLE VI. R
ESULTS OF INTRA-SUBJECT VALIDATION

NaiveBayes DT NN LibSVM
Subject 1 88.81% 83.89% 96.6% 98.22%
Subject 2 97.01% 91.54% 99.16% 99.48%
Subject 3 92.74% 81.97% 92.89% 96.61% The gyroscope features have higher ranks than sEMG features
Subject 4 91.15% 77.98% 95.77% 97.23% on average. Although the gyroscope is not as important as the
Average 93.68% 83.85% 96.11% 97.89% accelerometer, it contributes more than sEMG. sEMG features
are the least important among the three modalities which
selected which means that the hand and arm rotation is also indicates it may not be significant in our system. Among
valuable information. Nine selected sEMG features make this accelerometer and gyroscope features, the most important ones
modality necessary for our system. include mean, integration, standard deviation, RMS and
To have a better understanding of the importance of different variance. Mean absolute value, variance and RMS are valuable
sensor features, forty selected features are listed in Table V features for sEMG signal. One interesting observation of sEMG
along with their rankings. In the table, Acc_x, Acc_y and features is that four selected features from channel one have
Acc_z represent accelerometer readings along x-axis, y-axis higher ranks than the others from channel two and channel four.
and z-axis, respectively. Similarly, Gyro_x, Gyro_y and Channel one is placed near the wrist where a smart watch is
Gyro_z are gyroscope readings along x-axis, y-axis and z-axis, usually worn. In reality, if only one electrode is available,
respectively. From the table, we can see that most of the channel one would be selected and it can be integrated into a
smart watch to capture the most important sEMG features.
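As a rough illustration of how per-window statistics of the kind ranked in Table V can be computed from one segmented sign, the sketch below derives a subset of them from raw sensor arrays. The array shapes, sampling rate, dictionary keys and helper name are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def window_features(acc, gyro, semg, fs=100.0):
    """Sketch of per-window statistics similar to Table V.
    acc/gyro are (N, 3) arrays, semg is (N, channels); the window
    length, sampling rate and feature set are illustrative only."""
    feats = {}
    for name, sig in (("Acc", acc), ("Gyro", gyro)):
        for i, axis in enumerate("xyz"):
            x = sig[:, i]
            feats[f"Mean of {name}_{axis}"] = float(x.mean())
            feats[f"RMS of {name}_{axis}"] = float(np.sqrt(np.mean(x ** 2)))
            feats[f"Variance of {name}_{axis}"] = float(x.var())
            # Rectangle-rule integral of the signal over the window
            feats[f"Integration of {name}_{axis}"] = float(x.sum() / fs)
            # Signal magnitude area taken here as mean absolute value
            feats[f"Signal magnitude area of {name}_{axis}"] = float(np.abs(x).mean())
            # Fraction of consecutive samples with a sign change
            feats[f"Zero cross rate of {name}_{axis}"] = float(np.mean(np.diff(np.sign(x)) != 0))
        # Amplitude (vector magnitude) features over all three axes
        amp = np.linalg.norm(sig, axis=1)
        feats[f"Mean of amplitude of {name}"] = float(amp.mean())
        feats[f"RMS of amplitude of {name}"] = float(np.sqrt(np.mean(amp ** 2)))
    for c in range(semg.shape[1]):
        x = semg[:, c]
        feats[f"Mean absolute value of sEMG{c + 1}"] = float(np.abs(x).mean())
        feats[f"RMS of sEMG{c + 1}"] = float(np.sqrt(np.mean(x ** 2)))
        feats[f"Variance of sEMG{c + 1}"] = float(x.var())
    return feats
```

Feature-selection rankings such as those in Table V would then be obtained by scoring these values over the labeled training windows.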

C. Classification results

Table VI shows the classification results of intra-subject
testing on the four subjects. In this experiment, each classifier is
trained and tested with data from the same subject. We can see
that the nearest neighbor and LibSVM classifiers achieve high
accuracies, while the decision tree classifier obtains the lowest
accuracy. The nearest neighbor classifier is a lazy learning
classifier and does not require a trained model: in the testing
phase, it compares the testing instance with all instances in the
training set and assigns it the same class label as the most similar
training instance. It does not scale well as the size of the training
set increases, since the testing instance needs to be compared to
every instance in the training set. LibSVM, in contrast, trains a
model based on the training data; as the size of the training set
increases, only the training time grows, without affecting the
time needed in the testing phase. This is crucial for real-time
applications, and LibSVM is therefore the classifier we select for
our system implementation. The results achieved for 80 signs are
consistent with the results obtained for 40 signs in our prior
investigation [39], which indicates that our technique scales well
for intra-subject testing.

Table VII shows the classification results of the all-cross
validation. For all classifiers, the classification results with and
without sEMG are given: with sEMG means we use all 40
features, while without sEMG means we only use the 31 features
from the accelerometer and gyroscope. The
performance improvement with adding sEMG is also listed in
the table.

TABLE VII. RESULTS OF ALL-CROSS VALIDATION

                         NaiveBayes   DT       NN       LibSVM
Accuracy with sEMG       63.87%       76.18%   94.02%   96.16%
Accuracy without sEMG    48.75%       68.93%   87.62%   92.29%
Improvement              15.12%       7.25%    6.4%     3.84%
Precision with sEMG      66.9%        76.3%    94.0%    96.7%
Precision without sEMG   51.8%        69.0%    87.7%    92.3%
Improvement              15.1%        7.3%     6.3%     4.4%
Recall with sEMG         63.9%        76.2%    94.0%    96.7%
Recall without sEMG      48.8%        68.9%    87.7%    92.3%
Improvement              15.1%        7.3%     6.3%     4.4%
F-score with sEMG        63.6%        76.2%    94.0%    96.7%
F-score without sEMG     47.6%        68.9%    87.6%    92.3%
Improvement              16.0%        7.3%     6.4%     4.4%

TABLE VIII. RESULTS OF INTRA-SUBJECT CROSS-SESSION TESTING

Classifier    Accuracy      Classifier   Accuracy
NaiveBayes    50.11%        NN           81.37%
DT            46.01%        LibSVM       85.24%

Among the four classifiers, LibSVM achieves the best
performance in accuracy, precision, recall and F-score, while
NaiveBayes gives the worst performance. The accuracy,
precision, recall and F-score are very close to each other for
every classifier, which indicates that all classifiers achieve
balanced performance on our dataset. With 40 features, LibSVM
achieves 96.16% accuracy. This is consistent with the results
(95.16%) we obtained for 40 sign words with 30 features in our
prior study [39], which demonstrates the scalability of the
approach under all-cross validation.

The improvement after adding the sEMG modality is most
significant for the NaiveBayes classifier, which gains about 15%
on all four classification performance metrics. For our chosen
classifier, LibSVM, the accuracy improvement is about 4%, but
the error rate is reduced by about 50%. This indicates that sEMG
is necessary and significant; its significance is further analyzed
in the next section.

Fig. 7. Results of inter-subject testing.

Fig. 7 shows the average accuracy of inter-subject testing for
both eighty sign words and forty sign words. As seen from the
figure, none of the classifiers offers good accuracy for
recognizing either 40 or 80 signs, although LibSVM still offers
the best performance among the four classifiers. There are three
reasons for such low accuracies. First, different people perform
the same signs in different ways. Second, all subjects in our
experiment are first-time ASL learners who never had
experience with ASL before; even though they follow the
instructions, their gestures for the same signs differ from each
other. Third, different subjects have very different muscular
strength, leading to different sEMG features for the same signs.
Comparing the accuracy for 40 signs against 80 signs, our
technique offers low accuracy for all classifiers consistently. For
NaiveBayes, NN and LibSVM, the accuracy obtained for 40
signs is higher than that obtained for 80 signs; surprisingly,
however, NN offers higher accuracy for 80 signs. The results
suggest our system is not suitable for inter-subject use: the
system should be trained on each subject before use to obtain
high accuracy.

Fig. 8. Results of intra-subject cross-session testing.

The first three experiments show that our system achieves
suitable performance when it is trained and tested on the same
subject, and less ideal performance for inter-subject testing. We
further investigate how well the system generalizes to new data
collected in the future from the same subject. Fig. 8 shows the
results of the intra-subject cross-session testing, in which feature
selection is performed and the classifier is trained with two days
of data from a subject and tested on the third day's data from the
same subject. This process is repeated three times for each
subject and the accuracy measures are averaged. We can see that
both NaiveBayes and decision tree yield poor accuracies, while
LibSVM offers the best accuracy. Table VIII shows the accuracy
of the different classification algorithms averaged over the four
subjects. LibSVM achieves 85.24%, which is lower than the
96.16% of intra-subject testing. Two reasons may explain this
performance decrease. First, the user may have placed the sEMG
and IMU sensors at slightly different locations, and the IMU
sensor with a slightly different orientation, across sessions.
Second, all four subjects are first-time learners who have not
developed consistent patterns for the signs; they may have
performed the same signs somewhat differently on different
days.
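The cross-session protocol described above (train on two days of data, test on the held-out third day, then average over folds) can be sketched as follows. The data layout and the nearest-mean stand-in classifier are illustrative assumptions only, not the paper's pipeline.

```python
import numpy as np

def cross_session_accuracy(X, y, day, fit, predict):
    """Leave-one-day-out evaluation: for each recording day, train on
    the remaining days, test on the held-out day, and average the
    per-fold accuracies. `fit`/`predict` wrap any classifier."""
    accs = []
    for held_out in np.unique(day):
        train = day != held_out
        model = fit(X[train], y[train])
        accs.append(float(np.mean(predict(model, X[~train]) == y[~train])))
    return float(np.mean(accs))

# Illustrative stand-in classifier: assign the label of the nearest class mean.
def fit_nearest_mean(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_nearest_mean(model, X):
    classes = sorted(model)
    return np.array([min(classes, key=lambda c: np.linalg.norm(x - model[c]))
                     for x in X])
```

The same harness covers intra-subject testing (random split within one subject's data) and inter-subject testing (hold out one subject instead of one day) by changing what the grouping array encodes.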

Fig. 9. Sequence of postures when performing ‘Please’ and ‘Sorry’. (a). Sequence of postures when performing ‘Please’.
(b). Sequence of postures when performing ‘Sorry’.
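The per-sign improvements reported in Table IX below are differences in true-positive rate. As a minimal sketch (a hypothetical helper, not code from the paper), such values can be derived from the confusion matrices of the classifier evaluated with and without the sEMG features:

```python
import numpy as np

def tp_rate_improvement(cm_without, cm_with):
    """Per-class TP rate gain from adding sEMG. Each confusion matrix
    has rows = true signs and columns = predicted signs; the TP rate
    of a sign is its diagonal count divided by its row total."""
    def tp_rate(cm):
        cm = np.asarray(cm, dtype=float)
        return np.diag(cm) / cm.sum(axis=1)
    return tp_rate(cm_with) - tp_rate(cm_without)
```

Sorting the returned vector in descending order and keeping the top ten entries reproduces the shape of Table IX.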

TABLE IX. 10 SIGNS WITH MOST TP RATE IMPROVEMENT

Sign ID   Sign     Improvement
29        Thank    21%
19        My       18.2%
9         Have     16.7%
24        Please   16.7%
37        Work     16.5%
57        Tall     14.3%
67        Girl     13.9%
26        Sorry    13.8%
76        Doctor   12.5%
66        Boy      12.5%

D. Significance of sEMG

From the analysis of inter-subject testing in the previous
section, LibSVM achieves about a 4% improvement in accuracy,
precision, recall and F-score, while the error rates for these
metrics are reduced by about 50%. In this section, we further
analyze the importance of sEMG. In American Sign Language,
there are signs which are very similar in arm movement but
differ in hand shape and finger configuration (e.g. fist versus
palm). sEMG is able to capture the difference in finger
configuration and thereby distinguish such signs; if only the
inertial sensor is considered, their identical motion profiles make
these signs confusable with each other. Fig. 9 shows an example:
the sequences of postures when the user performs the two signs
'Please' and 'Sorry'. We can see from the figures that the arm
makes the same movement, drawing a circle in front of the chest,
so the inertial sensor will offer the same readings for these two
different signs. However, the hand is closed (a fist) when
performing 'Sorry' and open (a palm) when performing
'Please'. This difference can be captured by sEMG, and thus the
two signs become distinguishable when sEMG is included.

Instead of the average improvement, the improvement of the
true positive (TP) rate is analyzed to show how sEMG impacts
each individual sign. The TP rate is the rate of true positives,
where the true positives are the number of instances correctly
classified as a given class, so the improvement of each sign's TP
rate with sEMG tells how much sEMG helps for that sign. Fig.
10 shows the TP rate improvement for all 80 signs, sorted in
descending order. From the figure, we can see that for most of
the signs (ranks 29 to 80), the improvement is within the range
[-5%, 5%]. However, for the signs ranked 1 to 11, the
improvement is bigger than 10%, which is very helpful for
recognizing these signs. Table IX lists the 10 signs with the
highest TP rate improvement. We can see that 'Sorry' and
'Please' are both improved significantly, since they are confused
with each other. In practice, it is important to eliminate the
confusion between signs which have a similar motion profile but
different sEMG characteristics. Therefore, sEMG is significant
for our system.

VII. LIMITATIONS AND DISCUSSION

Wearable inertial sensor and sEMG sensor based sign
language/gesture recognition systems have become more and
more popular in recent years because of their low cost and their
privacy non-intrusive and ubiquitous sensing ability compared
with vision-based approaches. However, they may not be as
accurate as vision-based approaches: a vision-based approach
achieves 92.5% accuracy for 439 frequently used Chinese Sign
Language words [17]. Although we have not tested such a large
vocabulary, it may be challenging for wearable inertial and
sEMG systems to recognize such a big number of signs. Another
disadvantage with wearable inertial sensor and
sEMG based sign language recognition systems is that the facial
expression is not captured.

In our study, we observe that the accelerometer is the most
significant modality for detecting signs. When designing such
systems, if fusion of multiple modalities is not possible, the
suggested order of choice among the three is: accelerometer,
gyroscope and sEMG. The significance of sEMG is to
distinguish sets of signs which are similar in motion, and this is
crucial for sign language recognition. For some gesture
recognition tasks, if the number of gestures is not big and no
gestures are very similar in motion, one inertial sensor may be
sufficient for the task, which reduces the system cost.

Fig. 10. TP rate improvement of all signs.

Our system offers high accuracy for both 40 signs and 80
signs in intra-subject testing and all-cross validation. This shows
our system is scalable for American Sign Language recognition
if it is trained and tested on the same subjects. However, very
low accuracy is achieved for inter-subject testing, which
indicates our system is not very suitable for individuals it has not
been trained for. We have talked to several experts in American
Sign Language, and they consider it reasonable to train for each
individual, since even experts perform the same signs quite
differently from each other based on their preferences and
habits. This is the major limitation of sign language recognition
systems. Our system is studied and designed to recognize
individual signs, assuming a pause exists between two sign
words. However, in daily conversation, a whole sentence may
be performed continuously without an obvious pause between
words. To recognize continuous sentences, a different
segmentation technique or other probabilistic models should be
considered.

Machine learning is a powerful tool for different applications
and has gained a lot of popularity in recent years in wearable
computer based applications. However, it is important to use it
in a correct way. For different applications, different features
and different classifiers may deliver significantly different
performance, so it is suggested to try different approaches to
determine the best one. The other point is that the classifier
parameters should be carefully tuned: in our approach, if we do
not choose the correct parameters for LibSVM, only 68%
accuracy can be achieved.

VIII. CONCLUSION

A wearable real-time American Sign Language recognition
system is proposed in this paper. This is a first study of an
American Sign Language recognition system fusing IMU
sensors and sEMG signals, which are complementary to each
other. Feature selection is performed to select the best subset of
features from a large number of well-established features, and
four popular classification algorithms are investigated for our
system design. The system is evaluated with 80 commonly used
ASL signs in daily conversation, and an average accuracy of
96.16% is achieved with 40 selected features. The significance
of sEMG to the American Sign Language recognition task is also
explored.

REFERENCES

[1] W. C. Stokoe, "Sign language structure: An outline of the visual
communication systems of the American deaf," Journal of Deaf Studies
and Deaf Education, vol. 10, no. 1, pp. 3–37, 2005.
[2] D. Barberis, N. Garazzino, P. Prinetto, G. Tiotto, A. Savino, U. Shoaib,
and N. Ahmad, "Language resources for computer assisted translation
from Italian to Italian Sign Language of deaf people," in Proceedings of
Accessibility Reaching Everywhere AEGIS Workshop and International
Conference, Brussels, Belgium, 2011.
[3] A. B. Grieve-Smith, "SignSynth: A sign language synthesis application
using Web3D and Perl," in Gesture and Sign Language in
Human-Computer Interaction, pp. 134–145, Springer, 2002.
[4] B. Vicars, "Basic ASL: First 100 signs."
[5] E. Costello, American Sign Language Dictionary. Random House
Reference, 2008.
[6] T. Starner, J. Weaver, and A. Pentland, "Real-time American Sign
Language recognition using desk and wearable computer based video,"
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 20, no. 12, pp. 1371–1375, 1998.
[7] C. Vogler and D. Metaxas, "A framework for recognizing the
simultaneous aspects of American Sign Language," Computer Vision and
Image Understanding, vol. 81, no. 3, pp. 358–384, 2001.
[8] T. E. Starner, "Visual recognition of American Sign Language using
hidden Markov models," tech. rep., DTIC Document, 1995.
[9] C. Oz and M. C. Leu, "American Sign Language word recognition with a
sensory glove using artificial neural networks," Engineering Applications
of Artificial Intelligence, vol. 24, no. 7, pp. 1204–1213, 2011.
[10] E. Malaia, J. Borneman, and R. B. Wilbur, "Analysis of ASL motion
capture data towards identification of verb type," in Proceedings of the
2008 Conference on Semantics in Text Processing, pp. 155–164,
Association for Computational Linguistics, 2008.
[11] A. Y. Benbasat and J. A. Paradiso, "An inertial measurement framework
for gesture recognition and applications," in Gesture and Sign Language
in Human-Computer Interaction, pp. 9–20, Springer, 2002.
[12] O. Amft, H. Junker, and G. Troster, "Detection of eating and drinking
arm gestures using inertial body-worn sensors," in Ninth IEEE
International Symposium on Wearable Computers, pp. 160–163, IEEE,
2005.
[13] A. B. Ajiboye and R. F. Weir, "A heuristic fuzzy logic approach to EMG
pattern recognition for multifunctional prosthesis control," IEEE
Transactions on Neural Systems and Rehabilitation Engineering, vol. 13,
no. 3, pp. 280–291, 2005.
[14] J.-U. Chu, I. Moon, and M.-S. Mun, "A real-time EMG pattern
recognition based on linear-nonlinear feature projection for multifunction
myoelectric hand," in 9th International Conference on Rehabilitation
Robotics (ICORR 2005), pp. 295–298, IEEE, 2005.
[15] Y. Li, X. Chen, X. Zhang, K. Wang, and J. Yang, "Interpreting sign
components from accelerometer and sEMG data for automatic sign
language recognition," in 2011 Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC),
pp. 3358–3361, IEEE, 2011.
[16] Y. Li, X. Chen, X. Zhang, K. Wang, and Z. J. Wang, "A
sign-component-based framework for Chinese Sign Language
recognition using accelerometer and sEMG data," IEEE Transactions on
Biomedical Engineering, vol. 59, no. 10, pp. 2695–2704, 2012.
[17] L.-G. Zhang, Y. Chen, G. Fang, X. Chen, and W. Gao, "A vision-based
sign language recognition system using tied-mixture density HMM," in
Proceedings of the 6th International Conference on Multimodal
Interfaces, pp. 198–204, ACM, 2004.
[18] M. M. Zaki and S. I. Shaheen, "Sign language recognition using a
combination of new vision based features," Pattern Recognition Letters,
vol. 32, no. 4, pp. 572–577, 2011.
[19] M. W. Kadous et al., "Machine recognition of Auslan signs using
PowerGloves: Towards large-lexicon recognition of sign language," in
Proceedings of the Workshop on the Integration of Gesture in Language
and Speech, pp. 165–174, Citeseer, 1996.
[20] D. Sherrill, P. Bonato, and C. De Luca, "A neural network approach to
monitor motor activities," in Proceedings of the Second Joint
EMBS/BMES Conference, vol. 1, pp. 52–53, IEEE, 2002.
[21] X. Chen, X. Zhang, Z.-Y. Zhao, J.-H. Yang, V. Lantz, and K.-Q. Wang,
"Hand gesture recognition research based on surface EMG sensors and
2D-accelerometers," in 2007 11th IEEE International Symposium on
Wearable Computers, pp. 11–14, IEEE, 2007.
[22] V. E. Kosmidou and L. J. Hadjileontiadis, "Sign language recognition
using intrinsic-mode sample entropy on sEMG and accelerometer data,"
IEEE Transactions on Biomedical Engineering, vol. 56, no. 12,
pp. 2879–2890, 2009.
[23] J. Kim, J. Wagner, M. Rehm, and E. André, "Bi-channel sensor fusion for
automatic sign language recognition," in 2008 8th IEEE International
Conference on Automatic Face & Gesture Recognition (FG '08),
pp. 1–6, IEEE, 2008.
[24] J.-S. Wang and F.-C. Chuang, "An accelerometer-based digital pen with a
trajectory recognition algorithm for handwritten digit and gesture
recognition," IEEE Transactions on Industrial Electronics, vol. 59,
no. 7, pp. 2998–3007, 2012.
[25] V. Nathan, J. Wu, C. Zong, Y. Zou, O. Dehzangi, M. Reagor, and
R. Jafari, "A 16-channel Bluetooth enabled wearable EEG platform with
dry-contact electrodes for brain computer interface," in Proceedings of
the 4th Conference on Wireless Health, p. 17, ACM, 2013.
[26] C. J. De Luca, L. Donald Gilmore, M. Kuznetsov, and S. H. Roy,
"Filtering the surface EMG signal: Movement artifact and baseline noise
contamination," Journal of Biomechanics, vol. 43, no. 8, pp. 1573–1579,
2010.
[27] I. Mesa, A. Rubio, I. Tubia, J. De No, and J. Diaz, "Channel and feature
selection for a surface electromyographic pattern recognition task,"
Expert Systems with Applications, vol. 41, no. 11, pp. 5190–5200, 2014.
[28] R. Merletti and P. Di Torino, "Standards for reporting EMG data,"
Journal of Electromyography and Kinesiology, vol. 9, no. 1, pp. 3–4,
1999.
[29] A. Phinyomark, C. Limsakul, and P. Phukpattaranont, "A novel feature
extraction for robust EMG pattern recognition," arXiv preprint
arXiv:0912.3973, 2009.
[30] M. Zhang and A. A. Sawchuk, "Human daily activity recognition with
sparse representation using wearable sensors," IEEE Journal of
Biomedical and Health Informatics, vol. 17, no. 3, pp. 553–560, 2013.
[31] S. H. Khan and M. Sohail, "Activity monitoring of workers using single
wearable inertial sensor."
[32] O. Paiss and G. F. Inbar, "Autoregressive modeling of surface EMG and
its spectrum with application to fatigue," IEEE Transactions on
Biomedical Engineering, no. 10, pp. 761–770, 1987.
[33] A. M. Khan, Y.-K. Lee, S. Y. Lee, and T.-S. Kim, "A triaxial
accelerometer-based physical-activity recognition via augmented-signal
features and a hierarchical recognizer," IEEE Transactions on
Information Technology in Biomedicine, vol. 14, no. 5, pp. 1166–1172,
2010.
[34] I. Guyon and A. Elisseeff, "An introduction to variable and feature
selection," The Journal of Machine Learning Research, vol. 3,
pp. 1157–1182, 2003.
[35] J. R. Quinlan, C4.5: Programs for Machine Learning. Elsevier, 2014.
[36] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector
machines," ACM Transactions on Intelligent Systems and Technology,
vol. 2, pp. 27:1–27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[37] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H.
Witten, "The WEKA data mining software: an update," ACM SIGKDD
Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[38] M. Z. Jamal, "Signal acquisition using surface EMG and circuit design
considerations for robotic prosthesis," 2012.
[39] J. Wu, Z. Tian, L. Sun, L. Estevez, and R. Jafari, "Real-time American
Sign Language recognition using wrist-worn motion and surface EMG
sensors," in 2015 IEEE 12th International Conference on Wearable and
Implantable Body Sensor Networks (BSN), pp. 1–6, IEEE, 2015.
