Inertial Motion Sensing Glove For Sign Language Gesture Acquisition and Recognition

Abstract— The most popular systems for automatic sign language recognition are based on vision. They are user-friendly, but very sensitive to changes in recording conditions. This paper presents a description of the construction of a more robust system—an accelerometer glove—as well as its application in the recognition of sign language gestures. The basic data regarding inertial motion sensors and the design of the gesture acquisition system, as well as project proposals, are presented. The evaluation of the solution presents the results of a gesture recognition experiment using a selected set of sign language gestures and a described method based on Hidden Markov Model (HMM) and parallel HMM approaches. The proposed usage of parallel HMM for sensor-fusion modeling reduced the equal error rate by more than 60%, while preserving 99.75% recognition accuracy.

Index Terms— Inertial motion sensors, gesture analysis, sign language recognition, sensor glove.

I. INTRODUCTION

The movement is then not limited by any gear or additional equipment. Vision-based approaches allow for up to 95% correct recognition of sign language gestures [1], [10]. In the context of evaluation scenarios and testing conditions, this accuracy is reasonably high, although it is not sufficient in cases where a highly reliable and robust system is needed. Inertial and orientation sensors such as accelerometers, magnetometers, or gyroscopes are highly efficient. These devices are not influenced by environmental conditions such as illumination or the background, which are usually problematic in vision systems. Those sensors also allow for relatively easy acquisition of parameters which are hard to obtain in vision systems, such as hand shape or forward/backward movement (related to the image depth axis).

Inertial-based systems also have drawbacks. The sensors are mounted on the entire upper limb, which often introduces
B. Measurement Resolution
The manufacturer guarantees 10-bit resolution for the acceleration sensors, which translates to 1024 distinguishable states. The transmission line length, caused by the length of the human arm, introduces high-frequency noise into the system. The highest observed noise values reveal that the actual resolution is 7-bit (128 distinguishable states). Another experiment, which tested the Signal to Noise Ratio (SNR), showed that the SNR is 40 dB for most sensors (the noise level was estimated as the average signal measured with a stationary glove in different positions). Both parameters indicate that the noise amplitude is about 1/100 of the signal amplitude. Following this reasoning, 1024/100 ≈ 10 states in the 10-bit range are uncertain, and the 3 lowest bits span 2³ = 8 states, which is closest to that number. It can therefore be assumed that in most situations the 3 lowest bits contain noise. This confirms the resolution being 7-bit.
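To make the arithmetic explicit, the estimate can be reproduced as in the following minimal sketch, which assumes only the 10-bit range and the measured 40 dB SNR reported above (variable names are illustrative):

```python
# Sketch of the effective-resolution estimate described above.
# Assumes the 10-bit ADC range and the measured ~40 dB SNR.
import math

full_scale_states = 2 ** 10          # 1024 states guaranteed by the manufacturer
snr_db = 40.0                        # measured signal-to-noise ratio

noise_ratio = 10 ** (-snr_db / 20)   # 40 dB -> noise amplitude is 1/100 of signal
uncertain_states = full_scale_states * noise_ratio   # ~10 states lost to noise

noisy_bits = round(math.log2(uncertain_states))      # log2(~10) ~= 3.4 -> ~3 bits
effective_bits = 10 - noisy_bits                     # -> 7-bit effective resolution
print(noisy_bits, effective_bits)                    # 3 7
```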
C. Signal Acquisition and Processing
All of the sensors employed in the Accelerometer Glove are 3-axis accelerometers. Each sensor is connected to a microcontroller using the Serial Peripheral Interface (SPI) bus.
Data is acquired from the sensors synchronously. Following
data collection, the entire set is sent to the PC through USB.
The PC recognizes the device as a Serial Port (due to the
Virtual COM Protocol implemented in the microcontroller).
Then, the dataflow is intercepted by the GUI for acquisition
and further processing.
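For illustration, a PC-side reader for this dataflow might look like the sketch below. It assumes the pyserial package and a hypothetical frame layout; the sensor count, sample width, port name, and baud rate are not specified in the paper and are placeholders here.

```python
# Hypothetical PC-side reader for the glove's virtual COM port.
# The port name, baud rate, and frame layout are illustrative
# assumptions; the paper does not specify the wire format.
import serial
import struct

N_SENSORS = 7                    # assumed sensor count (placement as in Fig. 1)
FRAME_LEN = N_SENSORS * 3 * 2    # 3 axes per sensor, 16-bit samples (assumed)

with serial.Serial("COM3", baudrate=115200, timeout=1.0) as port:
    while True:
        frame = port.read(FRAME_LEN)
        if len(frame) < FRAME_LEN:
            continue  # read timed out: incomplete frame, wait for the next one
        # Unpack little-endian signed 16-bit samples: x0, y0, z0, x1, ...
        samples = struct.unpack("<" + "h" * (N_SENSORS * 3), frame)
        # Group into (x, y, z) triples per sensor for further processing
        accel = [samples[i:i + 3] for i in range(0, len(samples), 3)]
```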
The sensors are connected to a single SPI bus. The measurements are collected with a frequency of 400 Hz. Each sensor is queried for its data in proper order and, after a whole cycle, the measurements are sent to the PC. Due to timing uncertainty, in the worst case the data may be delayed by 2.5 ms. Such a delay is fully acceptable, because natural upper limb movement, even rapid movement, is slow enough to be recorded by acceleration sensors.

The saved data is later processed on the PC. After calibration, the data is filtered through a low-pass Hamming-window-based running-average digital filter. The length of the filter is 250 ms. This stage of pre-processing ensures the removal of the higher noise frequencies from the signal. The feature vector consists of the 3D acceleration measured by each sensor. Prior to gesture modeling, a standardization procedure is applied to all of the features, separately for each gesture. The signals for the gesture "good", measured on the forefinger, are presented in Fig. 3.

Fig. 3. Acceleration signals for gesture "good", measured on forefinger.
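As a concrete reference, the described pre-processing can be sketched as follows: a Hamming-weighted running average whose 250 ms length corresponds to 100 taps at the 400 Hz sampling rate. The tap count follows from the text; the unit-DC-gain normalization is our assumption, as the paper does not state how the window is scaled.

```python
# Minimal sketch of the described low-pass pre-processing filter:
# a Hamming-window-weighted running average, 250 ms long at 400 Hz.
import numpy as np

FS = 400                      # sampling frequency [Hz]
TAPS = int(0.250 * FS)        # 250 ms filter length -> 100 taps

window = np.hamming(TAPS)
window /= window.sum()        # unit DC gain, so accelerations keep their scale

def smooth(signal):
    """Filter one acceleration axis with the Hamming-weighted average."""
    return np.convolve(signal, window, mode="same")

# Example: smooth each axis of one sensor's (n_samples, 3) recording
# acc_filtered = np.column_stack([smooth(acc[:, k]) for k in range(3)])
```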
Fig. 4. Scheme of Parallel Hidden Markov Model with independent HMMs for each channel.

In the approach presented in this article, PaHMM channels correspond to multiple sensors attached to the signer's hand. The gesture in each channel is modeled as a sequence of subunits. Taking into account the results of the conducted experiments, a joint-feature HMM has also been attached to the PaHMM as a separate channel.

Unlike in [12], where data fusion was performed at the feature level and employed full joint-feature modeling only, we adopted another approach, in which the fusion of the different sensor signals is performed at the score level. A full joint-feature model is also included as an additional stream. This solution increases the robustness of the system significantly when compared to feature-level fusion (the joint-feature model) alone. A comparison between both approaches is discussed later, in the evaluation section of the paper.
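The score-level fusion itself reduces to a weighted combination of per-channel HMM scores. The sketch below illustrates the idea under the assumption of equal channel weights and per-gesture log-likelihood vectors; the actual system is built with HTK [22], so this is a schematic, not the authors' implementation.

```python
# Schematic score-level fusion across PaHMM channels. Each channel
# (one per sensor, plus the joint-feature HMM stream) contributes a
# vector of per-gesture log-likelihoods. Equal weights are assumed.
import numpy as np

def fuse_scores(channel_scores, weights=None):
    """channel_scores: iterable of (n_gestures,) log-likelihood arrays,
    one per channel. Returns the fused (n_gestures,) score vector."""
    channel_scores = [np.asarray(s, dtype=float) for s in channel_scores]
    if weights is None:
        weights = [1.0] * len(channel_scores)   # assumption: equal weighting
    return sum(w * s for w, s in zip(weights, channel_scores))

# Recognition picks the gesture with the highest fused score, e.g.:
# best_gesture = int(np.argmax(fuse_scores(per_channel_loglikelihoods)))
```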
TABLE II
CLASSIFICATION RESULTS (IN %) FOR THE VIDEO DATABASE

TABLE III
CLASSIFICATION RESULTS (IN %) OBTAINED FOR SEPARATE SENSORS COMPARED TO PaHMM AND JOINT-FEATURE HMM APPROACHES (SENSOR NUMBERS AS IN FIG. 1)

Fig. 8. DET plot for accelerometer features for joint-feature HMM, PaHMM, and recognition for separate sensors (sensor numbers as in Fig. 1).
The presented system can be used to improve the effectiveness of vision systems, e.g. at the stage of gesture model training. Temporal models (such as the HMM) can be assumed to be better suited to the variability of movement, due to the high efficiency of recognition in the case of inertial movement parameters. For this reason, such a model (or even the time division of the gesture into segments) could be used as input information for model training in a system based on RGB cameras or depth sensors. This issue needs further investigation.

Some of the results (Table III, Fig. 8 and Fig. 9) give grounds for optimism regarding the use of a single inertial sensor for gesture recognition (autonomously or in cooperation with another system). In the future, it could be quite efficient and ergonomic to use smartphones or smartwatches for this purpose.
REFERENCES

[1] Y. Wu and T. S. Huang, "Vision-based gesture recognition: A review," in Gesture-Based Communication in Human-Computer Interaction (Lecture Notes in Computer Science), vol. 1739. Berlin, Germany: Springer, 1999, pp. 103–115.
[2] D. J. Sturman and D. Zeltzer, "A survey of glove-based input," IEEE Comput. Graph. Appl., vol. 14, no. 1, pp. 30–39, Jan. 1994.
[3] H. Teleb and G. Chang, "Data glove integration with 3D virtual environments," in Proc. ICSAI, 2012, pp. 107–112.
[4] H. Zhou, H. Hu, N. D. Harris, and J. Hammerton, "Applications of wearable inertial sensors in estimation of upper limb movements," Biomed. Signal Process. Control, vol. 1, no. 1, pp. 22–32, 2006.
[5] Z. Lu, X. Chen, Q. Li, X. Zhang, and P. Zhou, "A hand gesture recognition framework and wearable gesture-based interaction prototype for mobile devices," IEEE Trans. Human-Mach. Syst., vol. 44, no. 2, pp. 293–299, Apr. 2014.
[6] S. Zhou et al., "2D human gesture tracking and recognition by the fusion of MEMS inertial and vision sensors," IEEE Sensors J., vol. 14, no. 4, pp. 1160–1170, Apr. 2014.
[7] Y.-C. Kan and C.-K. Chen, "A wearable inertial sensor node for body motion analysis," IEEE Sensors J., vol. 12, no. 3, pp. 651–657, Mar. 2012.
[8] S. C. Mukhopadhyay, "Wearable sensors for human activity monitoring: A review," IEEE Sensors J., vol. 15, no. 3, pp. 1321–1330, Mar. 2015.
[9] R. C. King, L. Atallah, B. P. L. Lo, and G.-Z. Yang, "Development of a wireless sensor glove for surgical skills assessment," IEEE Trans. Inf. Technol. Biomed., vol. 13, no. 5, pp. 673–679, Sep. 2009.
[10] S. C. W. Ong and S. Ranganath, "Automatic sign language analysis: A survey and the future beyond lexical meaning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 873–891, Jun. 2005.
[11] A. F. da Silva, A. F. Gonçalves, P. M. Mendes, and J. H. Correia, "FBG sensing glove for monitoring hand posture," IEEE Sensors J., vol. 11, no. 10, pp. 2442–2448, Oct. 2011.
[12] K. Liu, C. Chen, R. Jafari, and N. Kehtarnavaz, "Fusion of inertial and depth sensor data for robust hand gesture recognition," IEEE Sensors J., vol. 14, no. 6, pp. 1898–1903, Jun. 2014.
[13] C. Chen, R. Jafari, and N. Kehtarnavaz, "A real-time human action recognition system using depth and inertial sensor fusion," IEEE Sensors J., vol. 16, no. 3, pp. 773–781, Feb. 2016.
[14] W. C. Stokoe, Jr., "Sign language structure: An outline of the visual communication systems of the American deaf," J. Deaf Stud. Deaf Educ., vol. 10, no. 1, pp. 3–37, 1960.
[15] E. J. Dijkstra, "Upper limb project, modeling of the upper limb," Dept. Eng. Technol., Univ. Twente, Enschede, The Netherlands, Tech. Rep. s0142395, Dec. 2010.
[16] D. J. Sturman, "Whole-hand input," Ph.D. dissertation, Media Arts and Sciences Section, School Archit. Planning, Massachusetts Inst. Technol., Cambridge, MA, USA, 1992.
[17] H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands," in Proc. IEEE ICSLP, Philadelphia, PA, USA, Oct. 1996, pp. 426–429.
[18] H. Bourlard and S. Dupont, "Subband-based speech recognition," in Proc. IEEE ICASSP, Munich, Germany, Apr. 1997, pp. 1251–1254.
[19] C. Vogler and D. Metaxas, "A framework for recognizing the simultaneous aspects of American sign language," Comput. Vis. Image Understand., vol. 81, no. 3, pp. 358–384, Mar. 2001.
[20] U. von Agris, J. Zieren, U. Canzler, B. Bauer, and K.-F. Kraiss, "Recent developments in visual sign language recognition," Universal Access Inf. Soc., vol. 6, no. 4, pp. 323–362, Feb. 2008.
[21] S. Theodorakis, V. Pitsikalis, and P. Maragos, "Dynamic–static unsupervised sequentiality, statistical subunits and lexicon for sign language recognition," Image Vis. Comput., vol. 32, no. 8, pp. 533–549, Aug. 2014.
[22] S. Young et al., The HTK Book (for HTK Version 3.4). Cambridge, U.K.: Eng. Dept., Cambridge Univ., 2006.
[23] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London, U.K.: Prentice-Hall, 1982.
[24] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: A study on human hand gesture recognition," Artif. Intell. Rev., vol. 40, no. 4, pp. 495–516, Dec. 2013.
[25] S. G. M. Almeida, F. G. Guimarães, and J. A. Ramírez, "Feature extraction in Brazilian sign language recognition based on phonological structure and using RGB-D sensors," Expert Syst. Appl., vol. 41, no. 16, pp. 7259–7271, 2014.
[26] M. W. Kadous, "Temporal classification: Extending the classification paradigm to multivariate time series," Ph.D. dissertation, School Comput. Sci. Eng., Univ. New South Wales, Kensington, NSW, Australia, 2002.

Jakub Gałka (M'14) received the M.Sc. and Ph.D. degrees in telecommunications and electronic engineering from the AGH University of Science and Technology, Kraków, Poland, in 2003 and 2008, respectively. He has been with the Department of Electronics, AGH University of Science and Technology, where he is currently a Researcher and a Lecturer. He was involved in several Polish and European research projects related to speech and audio processing. His research focus lies in speech and language processing and recognition, speaker recognition, multimedia signal processing, and data analysis. He is working on the development of commercially available ASR and speaker verification systems.

Mariusz Mąsior received the M.Sc. and Engineering degrees in telecommunications and electronic engineering from the AGH University of Science and Technology, Kraków, Poland, in 2010. He has been an Assistant Professor, a Lecturer, and a member of the Signal Processing Group, Department of Electronics, AGH University of Science and Technology. He specializes in signal processing, speech technology, embedded systems, and systems engineering.

Mateusz Zaborski received the Engineering degree from the Department of Electronics, AGH University of Science and Technology, Kraków, Poland, where he is continuing his education. His scientific interests concentrate on electronic design and sensor data acquisition.

Katarzyna Barczewska received the M.Sc. degree in biomedical engineering from the AGH University of Science and Technology, Kraków, Poland, in 2011. She is currently pursuing the Ph.D. degree with the Department of Automatics and Biomedical Engineering, AGH University of Science and Technology. Her research interests include statistical learning, machine learning, and gesture recognition.