

Sentence-Level Sign Language Recognition Using RF Signals

Xianjia Meng, Lin Feng*, Xiao Yin, Huanting Zhou, Chang Sheng, Chongyang Wang, Anxun Du, Linzhi Xu
School of Information Science and Technology, Northwest University
Shaanxi International Research Center for Passive Internet of Things, Xi'an, China
([email protected], (lynnefeng, 15188108151, zht1257765631, sc2018117118, wcynwu, a474698501, x3597421680)@163.com)

Abstract—Sign language recognition is emerging as a vital component of smart life, and commercial RFID is becoming a popular technology for it. About 70 million deaf people use sign language as their first language, and sign language recognition can facilitate communication with them. However, most existing research addresses isolated word recognition; there is little work on sentence-level sign language recognition, and what exists is limited and hard to carry into real-world applications. This paper therefore introduces the first sentence-level sign language recognition system based on RFID. The system collects the phase sequence of signals received by a commercial RFID device, extracts relatively pure phase characteristics, and applies a sign language segmentation method. Effective feature extraction and classifier selection are crucial to recognition. By evaluating our system in real-world environments, we fill the gap in low-cost sentence-level sign language recognition. We implement and evaluate the system through extensive experiments; the average accuracy is 96% and 98.11% in two scenarios with different multipath conditions. The results show that our method has high recognition accuracy and robustness.

Index Terms—Sentence-level Sign Language Recognition; RF Signals; RFID

I. INTRODUCTION

Although spoken language dominates everyday communication, there is no denying that sign language is still important, especially in the era of artificial intelligence, where its application range is very wide: smart homes, showrooms, and so on. Sign language recognition not only helps deaf people integrate into intelligent life faster, but also helps others communicate with them. For example, in a history museum, staff members who do not know sign language cannot help deaf visitors who need assistance. If the museum has a robot (or another interactive device) that can recognize deaf people's sign language and respond in sign language, the robot can communicate with them and provide help and services (see Fig. 1).

In recent years, research on sign language recognition has mainly fallen into three categories: computer-vision-based [1]–[6], wearable-sensor-based [7]–[13], and WiFi-signals-based [14]–[18]. Most of this research focuses only on isolated-word recognition, which is difficult to use in daily communication, and few studies address sentence-level sign language recognition. Although computer-vision-based systems can achieve high recognition accuracy, privacy is a big issue, especially in private environments. Wearable-sensor-based systems are intrusive and restrict the freedom of users. WiFi-signals-based systems do not violate privacy and require no wearable device, but WiFi deployment is inconvenient and susceptible to interference, so it is not a good choice. RFID has low cost, and its tags can be deployed easily.

This paper introduces the first sentence-level sign language recognition system based on RFID. We believe it represents an important step toward breaking the communication barrier between deaf people and others. Different sign language actions affect wireless signals differently, so we extract the features of different sign languages and use them as input to a classifier for recognition. Experimentally, our method achieves high recognition accuracy and robustness.

The first challenge of our system is how to recognize sentence-level sign language. It is hard to divide continuous sign language into words and translate them. Moreover, there are differences between isolated words and word-level signs separated from sentences. To illustrate this problem, Fig. 2 shows the difference between the isolated word "you" and the word "you" in two sentences ("you" as subject and object, respectively). The second challenge is sign language segmentation. Previous work often extracts sign language segments manually, which is obviously a huge workload. To address the first challenge, our system realizes sentence-level sign language recognition, which is highly practical and has a wide range of application scenarios. To address the second, we present a sign language segmentation algorithm whose accuracy is as high as 95%.

In summary, our main contributions are as follows:
• We introduce the first sentence-level sign language recognition system based on RFID. We implement and evaluate it through extensive experiments; the average accuracy is 96% and 98.11% in two scenarios with different multipath conditions.
• We design each part of our system (data preprocessing, effective feature extraction, sign language segmentation, and recognition) for high recognition performance. In particular, we introduce a new feature, SOS (sum of standard deviations), which measures how gently or severely the data fluctuates.
• We present a sign language segmentation algorithm whose accuracy is as high as 95%.
• We design and collect a large Chinese sign language data set. Our method can be extended to other sign languages.

The rest of this paper is organized as follows. Section II reviews related work. Section III presents the design of the system. Section IV introduces the system implementation and experiment settings. Section V demonstrates the evaluation of our system. Section VI discusses future work, and Section VII concludes the paper.
Fig. 1: An example of our system.

Fig. 2: The same word in different positions of different sentences has different phases.

Fig. 3: Framework flow of our system.

II. RELATED WORK

There are three categories of sign language recognition systems.

A. Computer-vision-based Systems

These systems usually use a camera [1], [2], depth cameras such as the Kinect [3]–[5], or the Leap Motion [6]. Although these systems achieve a certain recognition accuracy (the recognition accuracy over 56 basic ASL (American Sign Language) signs is 94.5% in paper [6]), they are severely constrained by lighting conditions, and the influence of the background environment remains a problem. Besides, the required infrastructure is very cumbersome, and these systems can raise sensitive issues such as personal privacy.

B. Wearable-sensor-based Systems

In these systems, special sensors are used to sense the gestures of arms, hands, and fingers. Wearable sensing is well suited to perceiving subtle movements and usually achieves high recognition accuracy. Some systems embed sensors in data gloves [7]–[9], but wearing the gloves is inconvenient for users. A lighter approach embeds the sensors in wristwatches or armbands [10]–[13]. These approaches are nonetheless intrusive and require users to wear equipment, which greatly limits their freedom.

C. WiFi-signals-based Systems

WiFi-signals-based systems rely on the fact that different gestures cause distinguishable signal fluctuations. Some recent surveys, such as the review of sign language recognition techniques in [14], have not yet covered this development. In fact, WiFi signals have been used successfully to recognize sign language in a non-intrusive way [15]–[17]. However, only a few signs can be recognized by these systems. Paper [18] involves more signs, but all of the above WiFi-based methods can only recognize single ASL words and are susceptible to interference.

All of the above related work can only recognize word-level sign language. Computer-vision-based systems are susceptible to environmental influences and may affect users' privacy. Wearable-sensor-based systems require users to wear devices, which greatly affects comfort. WiFi-signals-based systems are not flexible to deploy and are subject to interference. The RFID-based system presented in this paper performs sentence-level sign language recognition: it is less susceptible to environmental interference, does not involve users' privacy, does not require users to wear any device, and can be deployed more flexibly.


III. SYSTEM DESIGN

In this section, we give a detailed description of the design of our RFID system. It contains four basic functional blocks: data preprocessing, sign language segmentation, feature extraction, and sign language recognition (see Fig. 3). For each functional block, we introduce the specific techniques and state the insights behind them.

A. Data Preprocessing

Signal collection in our system is based on COTS RFID technology, which can identify specific targets by radio signals and read and write related data without contact between the recognition system and the targets. An RFID system includes an RFID tag, an RFID reader, and an antenna.

A commercial RFID device provides an application program interface (API) through which users can acquire signal features of passive tags, such as RSSI (Received Signal Strength Indication) and phase. Among these features, the phase sequence is less disturbed by the environment and more stable than RSSI. Therefore, our method uses the phase sequence for sign language recognition.

The phase sequence obtained directly from the reader API generally contains various noises, including phase shift caused by hardware and Gaussian noise from the environment. Therefore, we first preprocess the received signal.

There is a hardware-induced phase shift in the received signal; generally, it shifts by π or 2π:

θ_true = θ − θ_n  (θ_n = π or 2π)    (1)

where θ is the measured phase and θ_n is the phase shift. To calibrate the shift, we compute the average of the phase sequence and restore any sample that differs from the average by about π or 2π to the real phase by adding or subtracting π or 2π (see Fig. 4).

Fig. 4: Phase shift calibration.
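For concreteness, the calibration of Eq. (1) can be sketched as follows. Our pipeline is implemented in MATLAB; the Python/NumPy code below is only an illustrative re-implementation, and the deviation tolerance tol is an assumed parameter not specified above.

import numpy as np

def calibrate_phase(phase, tol=1.0):
    # Restore pi / 2*pi hardware phase shifts (a sketch of Eq. (1)).
    # A sample deviating from the sequence average by roughly pi or
    # 2*pi (within tol radians) is shifted back by that amount.
    out = np.asarray(phase, dtype=float).copy()
    mean = out.mean()
    for shift in (np.pi, 2 * np.pi):
        out[np.abs((out - mean) - shift) < tol] -= shift
        out[np.abs((out - mean) + shift) < tol] += shift
    return out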
To remove the Gaussian noise from the environment and extract a pure phase sequence, we adopt a threshold-based discrete wavelet denoising method. The phase changes before and after wavelet denoising are shown in Fig. 5.

Fig. 5: Comparison before and after wavelet denoising.
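The denoising step can be sketched with the PyWavelets library. The paper does not specify the mother wavelet, decomposition level, or threshold rule, so the db4 wavelet, four levels, and the universal soft threshold below are assumptions.

import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db4", level=4):
    # Threshold-based discrete wavelet denoising (illustrative sketch).
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # noise scale estimated from the finest detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]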
B. Sign Language Segmentation

Because hand movements affect the signal, each sign language action corresponds to a rapidly changing waveform. The key to signal segmentation is to determine the start and end of each sign so as to locate the corresponding segment. The method used in this paper is a sign language segmentation method based on the standard deviation. The phase sequence obtained after removing noise from the data acquired through the RFID API is

θ = {θ_1, θ_2, ..., θ_i, ..., θ_n}    (2)

where n is the sequence length and θ_i is the ith phase value.

Step 1: We first normalize and smooth the phase sequence:

S(i) = (θ(i) − min(θ)) / (max(θ) − min(θ))    (3)

a(i) = Σ_{k=i}^{i+10} S(k)  (i ≤ n − 10)    (4)

where S(i) is the intermediate result of normalization and a(i) is the ith sample after normalization (a sliding sum over 10 consecutive samples).

Step 2: The data in sequence a(i) are then centered on their median:

m(i) = |a(i) − median(a)|    (5)

where median(a) is the median of the sequence a and m(i) is the absolute deviation of each sample of a(i) from that median.

Step 3: The data in m(i) are grouped into windows, and the standard deviation of each window is calculated in turn:

d(i) = √( (1/k) Σ_{j=i}^{i+k} (m(j) − µ)² )    (6)

where d(i) is the resulting sequence of standard deviations, k is the window length (k = 50), and µ is the average of the window.

Step 4: Within the standard-deviation sequence d(i), the parts whose values exceed the threshold r (r = 0.1) form the valid range for extraction. As shown in Fig. 6, this achieves the correct sign language segmentation.

Fig. 6: Sign language segmentation. (a) Normalized results (S(i)). (b) Deviation from median(a) (m(i)). (c) Windowed standard deviation (d(i)). (d) Segmentation results.
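Steps 1–4 amount to a short sliding-window procedure. The following Python/NumPy sketch (an illustrative re-implementation, not our MATLAB code) returns a boolean mask marking the valid range of Step 4:

import numpy as np

def segment_signs(theta, win=10, k=50, r=0.1):
    # Standard-deviation-based segmentation (sketch of Eqs. (3)-(6)).
    theta = np.asarray(theta, dtype=float)
    # Step 1: normalize, then smooth with a sliding sum of `win` samples
    s = (theta - theta.min()) / (theta.max() - theta.min())
    a = np.convolve(s, np.ones(win), mode="valid")               # Eq. (4)
    # Step 2: absolute deviation from the median
    m = np.abs(a - np.median(a))                                 # Eq. (5)
    # Step 3: sliding standard deviation over windows of length k
    d = np.array([m[i:i + k].std() for i in range(len(m) - k)])  # Eq. (6)
    # Step 4: windows whose standard deviation exceeds r are active
    return d > r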


C. Feature Extraction

After data preprocessing, and once the target sign language has been segmented, we extract features. We extract 11 features, including skewness, expected value, third-order central moment, average, variance, standard deviation, kurtosis, energy, maximum of the frequency peak, dominant frequency ratio (dfr), and SOS (our new feature).

• SOS: Assume N is the length of the array. Starting from the first element, each element and the following (k − 1) elements form a window of length k. We compute the standard deviation of each window, which yields (N − k + 1) standard deviations; their sum is the SOS:

SOS = Σ_{n=0}^{N−k} √( (1/k) Σ_{i=n}^{n+k} (x_i − µ)² )    (7)

where N is the total number of data points, µ is the average of the k data points in each window, and k is the number of data points used for each standard deviation (k = 5 in this experiment). Function: SOS captures how gently the data fluctuates; its value is large when the data fluctuates severely and small when the data fluctuates mildly.

• Standard deviation: It reflects the dispersion of the gesture data around the mean:

Std = √( (1/K) Σ_{i=1}^{N} (x_i − x̄)² )    (8)

where N is the number of samples of sign language data, x̄ is their average, K is the total number of data points, and x_i is the ith data point.

• Kurtosis: It reflects the steepness of the sign language data at the peak of the data curve:

kurtosis = ( Σ_{i=1}^{N} (x_i − x̄)⁴ f_i ) / σ⁴    (9)

where N is the number of samples of sign language data, σ is their standard deviation, x̄ is their average, and f_i is the time interval between the ith sample and the center of the sign language segment.

• Third-order central moment: It is mainly used to measure whether the distribution of the sign language data is skewed:

E₃ = E[(X − E(X))³]    (10)

where E(X) is the mean of the sign language data X.

Through the above feature calculations, we can effectively classify sign language. Effective features fully capture the characteristics of different sign language signals, which is the guarantee of accurate recognition.
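As an illustrative sketch of the feature computation (a subset of the 11 features; note that scipy's kurtosis is the classical unweighted form, whereas Eq. (9) weights each sample by f_i):

import numpy as np
from scipy.stats import kurtosis, skew

def sos(x, k=5):
    # Sum of standard deviations over sliding windows (Eq. (7))
    x = np.asarray(x, dtype=float)
    return sum(x[n:n + k].std() for n in range(len(x) - k + 1))

def extract_features(seg):
    # A subset of the 11 features for one sign language segment
    seg = np.asarray(seg, dtype=float)
    return np.array([
        seg.mean(),                        # average / expected value
        seg.var(),                         # variance
        seg.std(),                         # standard deviation, Eq. (8)
        skew(seg),                         # skewness
        kurtosis(seg),                     # kurtosis (classical form)
        np.mean((seg - seg.mean()) ** 3),  # third-order central moment, Eq. (10)
        np.sum(seg ** 2),                  # energy
        sos(seg),                          # SOS, Eq. (7)
    ])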
D. Sign Language Recognition

With the effective features extracted from the measured phase sequences of the RFID tag, our system employs a Random Forest (RF) classifier to recognize sign language; the valid features serve as the input of the RF classifier. We choose RF because it can achieve high performance without complicated parameter tuning [19].
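As a minimal sketch of this classification stage (the synthetic placeholder data below merely mirror the data-set shape described in Section IV; they are not our measurements):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# placeholder: 9 sentences x 100 repetitions, 11 features per segment
rng = np.random.default_rng(0)
X = rng.normal(size=(900, 11))
y = np.repeat(np.arange(9), 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier(n_estimators=50)  # 50 trees, as in Sec. IV
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))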
IV. SYSTEM IMPLEMENTATION AND EXPERIMENT SETTINGS

In this section, we describe our hardware setup, scenario deployment, and experiment process. System performance is then evaluated under different settings.

A. Experiment Setup

Our experimental equipment consists of a commodity RFID reader (Impinj R420), a directional antenna (circular polarization), off-the-shelf UHF passive tags (AZ-9662), and a computer (Windows 8), as shown in Fig. 7. We choose MATLAB as the programming tool and set the number of decision trees to 50 for the RF classifier [19]. We recruited 8 volunteers, 6 males and 2 females, aged 18 to 28, 1.56 to 1.85 meters in height, and 46 to 70 kilograms in weight.

Fig. 7: Experimental equipment.


B. Scenario Deployment

We implement and evaluate the system through extensive experiments in two different scenarios. The first is a typical laboratory, about 70 m² and containing several tables and chairs; the distance between the antenna and the tag is 1.2 m, and the volunteer stands at the center between the antenna and the tag (see Fig. 8). The second is a narrow corridor, 1.4 m wide (see Fig. 9).

Fig. 8: The less-multipath scenario (laboratory).

Fig. 9: The multipath scenario (corridor).
C. Data Acquisition

We use the RFID reader, antenna, and tag to collect sign language data and process it with MATLAB. We mainly collect the sign language data of the volunteers in the laboratory. There are nine sign language sentences: "Hi", "Da Tang Treasures Exhibition (DTTE)", "Digital Exhibition Hall (DEH)", "How old are you? (HOAY)", "Tang Dynasty Mural Exhibition Hall (TDMEH)", "Exhibition Exchange (EE)", "Nice to meet you. (NTMY)", "What is your name? (WIYN)", and "Exhibition Hall (EH)". Each volunteer learns the standard signs before the experiment and executes each sentence 100 times during the experiment. The accuracy of the sign language segmentation algorithm is as high as 95%. The experimental sampling rate is approximately 270 samples per second. We repeat the experiment in the corridor.
V. PERFORMANCE EVALUATION

We evaluate the performance in the two scenarios and compare it with different literatures. The main evaluation tool is the confusion matrix of recognition accuracy, in which the elements of each row represent the probability that a sign language sentence is recognized as each sign language sentence (including itself).

A. About the Impact of Multipath

RF-based systems are susceptible to multipath effects. In order to verify the robustness of our system, we perform sign language recognition experiments in the corridor and the laboratory. Compared to the laboratory (see Fig. 10; the average accuracy is 96%), the corridor is more affected by the multipath effect, so we use the corridor as the multipath scenario. The results are shown in Fig. 12: the average accuracy of sign language recognition in the multipath scenario is about 98.11%, slightly higher than in the laboratory. We also evaluate the performance of our system by accuracy and F1 score in the two scenarios (see Fig. 11 and Fig. 13). In addition, we perform iterative calculations over the sign language data so that every sample has the opportunity to appear in both the test set and the training set. All the evaluations prove our system's robustness and its very high recognition accuracy.

Fig. 10: Confusion matrix of sign language recognition in the less-multipath scenario.

Fig. 11: Accuracy of sign language recognition in the less-multipath scenario.

Fig. 12: Confusion matrix of sign language recognition in the multipath scenario.

Fig. 13: Accuracy of sign language recognition in the multipath scenario.
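The iterative calculation above corresponds to cross-validation, in which every sample is tested exactly once. A minimal sketch of such an evaluation (the 10-fold setting and the names clf, X, y from the Random Forest sketch above are assumptions):

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_pred = cross_val_predict(clf, X, y, cv=10)   # each sample tested once
cm = confusion_matrix(y, y_pred).astype(float)
cm /= cm.sum(axis=1, keepdims=True)            # rows as probabilities
print("accuracy:", accuracy_score(y, y_pred))
print("macro F1:", f1_score(y, y_pred, average="macro"))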


B. Recognition Accuracy Comparison of Different Literatures

Paper [5] uses the SVM method to classify sign language. We implement the SVM algorithm and run it on our sign language data (laboratory and corridor, respectively). The accuracies of RF are 96.00% (laboratory) and 98.11% (corridor), while the accuracies of SVM are 23.28% (laboratory) and 28.89% (corridor) on the same data. Moreover, SVM requires parameters to be set manually, which reduces its feasibility. Besides effective feature extraction and data preprocessing, the selection of the classifier is thus also a vital part of sign language recognition. Paper [13] uses camera data as the input of a CNN. The CNN method needs a large amount of experimental data, and its data type differs from that of our system. Therefore, for a fair comparison, we quote the results of paper [13] and compare them with ours. Table I summarizes the comparison: we achieve higher accuracy with less data while protecting users' privacy.

TABLE I: Recognition accuracy comparison of different literatures.

Literature   Method         Accuracy (%)
Paper [5]    SVM            23.28 (laboratory), 28.89 (corridor)
Paper [13]   CNN and LSTM   93.1
Our system   RF             96 (laboratory), 98.11 (corridor)
VI. DISCUSSION

In this section, we discuss several final thoughts on the future of our system. First, this paper considers only nine basic sentence-level signs in a history-museum scenario; in the future we need to test more signs to verify the performance of our system. Second, our system is robust across two scenarios and eight volunteers, but many other factors may influence recognition performance.

On the one hand, the same sign currently needs to be performed by the same person, because different people perform the same sign differently, and the same person should perform the action in a standard way. Our future work aims to recognize the same sign performed by different people. On the other hand, a change in the signer's position changes the signal of the same sign. In preliminary experiments with different positions, we found that standing between the devices works better than standing to the side; for the application scenario of this paper, we therefore have people stand between the devices. In the future, we hope to expand the supported locations and give a detectable range. Lastly, we use algorithms for automatic temporal segmentation; however, when the interval between actions is long or an action is not standard, the waveform becomes milder and misjudgment may occur.
VII. CONCLUSION

In this paper, we present a sentence-level sign language recognition system based on RFID. It carries out sign language segmentation and introduces a new feature (SOS) for sign language recognition. Extensive experiments have been conducted in various settings to evaluate its performance from different aspects. The results show that the average accuracy of the method is 96% and 98.11% in two scenarios with different multipath conditions, which proves that our method has high recognition accuracy and robustness.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 61702416, No. 61602382), the Science and Technology Innovation Team Support Project of Shaanxi Province (No. 2018TD-026), and the Key Research and Development Program of Shaanxi Province (No. 2019GY-012). Lin Feng is the corresponding author.

REFERENCES

[1] SignAll. http://www.signall.us.com.
[2] O. Koller, J. Forster and H. Ney, "Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers", Computer Vision and Image Understanding, 2015.
[3] C. Sun, T.Z. Zhang and C.S. Xu, "Latent support vector machine modeling for sign language recognition with Kinect", ACM Transactions on Intelligent Systems and Technology (TIST), 2015.
[4] J. Huang, W.G. Zhou, H.Q. Li and W.P. Li, "Sign Language Recognition using 3D Convolutional Neural Networks", 2015 IEEE International Conference on Multimedia and Expo (ICME), 2015.
[5] L. Pigou, S. Dieleman, P.J. Kindermans and B. Schrauwen, "Sign Language Recognition Using Convolutional Neural Networks", Springer International Publishing, Cham, 572–578.
[6] B.Y. Fang, J. Co and M. Zhang, "DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation", the 15th ACM Conference on Embedded Network Sensor Systems.
[7] K.H. Li, Z.Y. Zhou and C.H. Lee, "Sign transition modeling and a scalable solution to continuous sign language recognition for real-world applications", ACM Transactions on Accessible Computing (TACCESS), 2016.
[8] Y. Du, Y.K. Wong, W.G. Jin, W.T. Wei, Y. Hu, M. Kankanhalli and W.D. Geng, "Semi-supervised learning for surface EMG-based gesture recognition", the 26th International Joint Conference on Artificial Intelligence, AAAI Press, 1624–1630.
[9] Z.H. Zhou, Y.K. Dai and W.S. Li, "Gesture recognition based on Global Template DTW for Chinese Sign Language", Journal of Intelligent & Fuzzy Systems, 2018.
[10] Y. Li, X. Chen, X. Zhang, K.Q. Wang and Z.J. Wang, "A sign-component-based framework for Chinese sign language recognition using accelerometer and sEMG data", IEEE Transactions on Biomedical Engineering, 2012.
[11] J. Wu, L. Sun and R. Jafari, "A Wearable System for Recognizing American Sign Language in Real-Time Using IMU and Surface EMG Sensors", IEEE Journal of Biomedical and Health Informatics, 2016.
[12] X.D. Yang, X. Chen, X. Cao, S.J. Wei and X. Zhang, "Chinese Sign Language Recognition Based on an Optimized Tree-Structure Framework", IEEE Journal of Biomedical and Health Informatics, 2017.
[13] Q. Zhang, D. Wang, R. Zhao and Y.G. Yu, "MyoSign: Enabling End-to-End Sign Language Recognition with Wearables", the 24th International Conference on Intelligent User Interfaces.
[14] M.J. Cheok, Z. Omar and M.H. Jaward, "A review of hand gesture and sign language recognition techniques", Springer-Verlag GmbH Germany, 2017.
[15] P. Melgarejo, X.Y. Zhang, P. Ramanathan and D. Chu, "Leveraging Directional Antenna Capabilities for Fine-grained Gesture Recognition", 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '14).
[16] H. Li, W. Yang, J.X. Wang, Y. Xu and L.S. Huang, "WiFinger: Talk to Your Smart Devices with Finger-grained Gesture", 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '16).
[17] J.C. Shang and J. Wu, "A Robust Sign Language Recognition System with Multiple Wi-Fi Devices", the Workshop on Mobility in the Evolving Internet Architecture (MobiArch '17).
[18] Y.S. Ma, G. Zhou, S.Q. Wang, H.Y. Zhao and W. Jung, "SignFi: Sign Language Recognition Using WiFi", Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2018.
[19] L. Feng, Z.Y. Li and C. Liu, "Are You Sitting Right? Sitting Posture Recognition Using RF Signals", PACRIM 2019.
