
Inertial Motion Sensing Glove for Sign Language Gesture Acquisition and Recognition

Jakub Gałka, Member, IEEE, Mariusz Mąsior, Mateusz Zaborski, and Katarzyna Barczewska

Abstract—The most popular systems for automatic sign language recognition are based on vision. They are user-friendly, but very sensitive to changes in recording conditions. This paper presents a description of the construction of a more robust system—an accelerometer glove—as well as its application in the recognition of sign language gestures. The basic data regarding inertial motion sensors and the design of the gesture acquisition system, as well as project proposals, are presented. The evaluation of the solution presents the results of a gesture recognition experiment using a selected set of sign language gestures and a described method based on the Hidden Markov Model (HMM) and parallel HMM approaches. The proposed usage of parallel HMM for sensor-fusion modeling reduced the equal error rate by more than 60%, while preserving 99.75% recognition accuracy.

Index Terms—Inertial motion sensors, gesture analysis, sign language recognition, sensor glove.

Manuscript received April 11, 2016; revised June 6, 2016; accepted June 7, 2016. Date of publication June 22, 2016; date of current version July 18, 2016. This work was supported by the Polish National Centre for Research and Development through the Applied Research Program entitled Virtual Sign Language Translator under Grant PBS2/B3/21/2013. The associate editor coordinating the review of this paper and approving it for publication was Dr. Roozbeh Jafari. (Corresponding author: Jakub Gałka.)

J. Gałka, M. Mąsior, and M. Zaborski are with the Department of Electronics, AGH University of Science and Technology, 30-059 Kraków, al. Mickiewicza 30, Poland (e-mail: [email protected]; [email protected]; [email protected]).

K. Barczewska is with the Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, 30-059 Kraków, al. Mickiewicza 30, Poland (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSEN.2016.2583542

I. INTRODUCTION

HAND movement recognition, in its different approaches, has been a topic of research since the early 90s [1], [2]. Regardless of the passage of time, the topic is still relevant [3]–[7], most likely due to the wealth of data provided by human limb movement (measured by different devices, such as IoT, CCTV, smart home electronics, etc.).

Researchers are trying to use the human hand as a precise controller of electronic devices. There are domains where this method of movement acquisition is in high demand [8]. The first example concerns medicine: for young interns, the possibility of surgery simulation, including hand movements, would be a valuable experience [9]. The second example, which is directly connected to the topic of this work, is sign language recognition. Deaf people, in order to maintain their independence, must be able to communicate with other people and interact with consumer devices. An efficient gesture recognition system, when correctly used, could improve their quality of life.

The most popular sign language recognition systems use contactless RGB cameras and image processing, as user movement is then not limited by any gear or additional equipment. Vision-based approaches allow for up to 95% correct recognition of sign language gestures [1], [10]. In the context of evaluation scenarios and testing conditions, this accuracy is reasonably high, although it is not enough in cases where a highly reliable and robust system is needed. Inertial and orientation sensors such as accelerometers, magnetometers, or gyroscopes are highly efficient. These devices are not influenced by environmental conditions such as illumination or the background, which are usually problematic in vision systems. Such sensors also allow for relatively easy acquisition of parameters which are hard to obtain in vision systems, such as hand shape or forward/backward movement (related to the image depth axis).

Inertial-based systems also have drawbacks. The sensors are mounted on the entire upper limb, which often introduces limitations in hand movement. Additionally, when using a wired solution, the user's freedom of movement may be limited. However, even with such drawbacks, a device based on inertial sensors could be employed in the first stages of a system project, e.g. in data acquisition support, gesture training, or as a validator of vision data. In this case, the use of a hybrid system should be considered, wherein the inertial sensor measurements are treated as support for the simultaneously acquired vision data.

II. INERTIAL MOTION SENSORS

Researchers have developed a multitude of different solutions based on diverse sensors. Some of them are commercialized, but there is no widespread and integrated solution used in gesture recognition yet. Many solutions employ only flex sensors, which are used mainly for hand movement or hand posture acquisition. One such solution for hand posture acquisition is presented in [11], where the sensing glove was equipped with fiber Bragg grating sensors, which allowed for the measurement of finger bending. Another solution used an accelerometer wristband to capture arm movements [12], [13]. In these works, the detected arm movements were used as an additional data stream along with the body joint positions extracted from the depth image for fusion-based gesture recognition. This approach, however, used only one sensor and did not allow for the hand and finger posture analysis required in more complicated sign language gesture modeling.

A sensor-based solution was employed in the presented project. The idea of the authors is to create a sensor glove which will be used as an additional synchronous input in a hybrid system along with an RGB camera for sign language
hand-gesture recognition. The glove tracks the movement of


the entire upper limb due to an additional accelerometer placed
on the arm, as well as the movement of each of the fingers,
allowing for precise hand-gesture modeling and recognition.
As part of the design of the Accelerometer Glove, the designers wanted to be able to acquire an exact model of limb movement. There are several criteria the device must satisfy. The first is a sufficient number of sensors. Each sensor requires its own communication line, and a significant number of sensors requires a significant number of communication lines, which creates more connections on the printed circuit boards (PCBs). The second criterion concerns the surface area of the PCBs: it should be as small as possible so as not to limit limb movement.
Sign language introduces additional ergonomic requirements. A sign language user needs total freedom of movement in each direction for every joint of their upper limbs. Sign language gestures are highly dynamic and complex, with the signer moving their arms and fingers at the same time [14]. This simultaneous, complex movement is especially difficult to capture when using solely vision systems [2].

Fig. 1. Motion sensor placement.
From the user’s point of view, an inertial system set up on
the upper limb is less comfortable than using a contactless
camera-based visual solution. However, in order to obtain
information regarding precise limb movement, only the sensors
placed on the wrist and fingers can be used. These sensors
acquire information regarding the general arm movement and
hand shape dynamics. The hand shape data are considered
crucial information in the case of the more complicated
gestures involving multiple rotations or joint bends, which are
elusive for vision-based systems.

III. THE ARCHITECTURE OF THE SYSTEM

In order to have a mechanically accurate model of an upper limb, a very complex model should be considered: seven degrees of freedom should be assumed for the three joints of the upper limb (glenohumeral, elbow, and wrist) [15], and 23 degrees of freedom distal to the wrist [16], which results in a total of 30 degrees of freedom. To copy such an exact model, the use of at least 30 sensors should be considered. Not all of the information provided by such a big set of sensors would be needed in the recognition process. There are anatomical points whose behavior is more distinctive than others, so the number of sensors could be significantly limited. The glove made by the authors covers the most important degrees of freedom, as the arm, wrist, and fingers are all monitored. Particular parts of the upper limb are connected, which allows e.g. the position of the elbow, or of a finger's proximal interphalangeal joint, to be estimated. The number of sensors used allows for sufficient hand-posture and gesture modeling, even for complicated sign-language gestures.

A. Hardware Description

The Accelerometer Glove is a device designed for sign language users. The device has seven active sensors, with five located on the fingers (one sensor on each finger), one on the wrist, and one on the arm (Fig. 1). Each of them is a 3-axis acceleration sensor. The device has a modular structure: the designed PCBs have zero insertion force (ZIF) sockets, and the PCBs are connected by using ZIF connectors. Each sensor is on a separate board. All of the boards are electrically connected to an educational kit, which contains a microcontroller. The microcontroller manages measurement acquisition and streaming. The measurements are then sent through the universal serial bus (USB) to the PC and received via a graphical user interface (GUI). The entire acquisition system configuration is shown in Fig. 2.

Fig. 2. Acquisition system component configuration.

Modularity allows for the fast relocation of sensors in case a sensor breaks, or if the user wants to test another type of sensor. This is a rarely encountered design in glove-type devices: in most gloves, the positions of the sensors are fixed.

The small sizes of the designed boards and proper attachment to the user's body mean that the glove does not limit the freedom of movement. This is why sensor glove usage is not exhausting for the user during longer sessions.
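For illustration, the sketch below shows how a host-side application might read such a stream from the glove's virtual COM port. The port name, baud rate, and 42-byte frame layout (7 sensors × 3 axes × 16-bit samples) are assumptions made for this example; the paper does not specify the device's wire protocol.

```python
# Hypothetical reader for the glove's USB virtual COM stream.
# Assumed frame layout: 7 sensors x 3 axes x int16 = 42 bytes per frame;
# the real device protocol is not specified in the paper.
import serial  # pyserial
import struct

SENSORS = 7                        # five fingers, wrist, arm (Fig. 1)
AXES = 3                           # 3-axis accelerometers
FRAME_BYTES = SENSORS * AXES * 2   # int16 samples

def read_frames(port="/dev/ttyACM0", baudrate=921600):
    """Yield one frame per read: a tuple of 7 (x, y, z) acceleration triples."""
    with serial.Serial(port, baudrate, timeout=1.0) as link:
        while True:
            raw = link.read(FRAME_BYTES)
            if len(raw) < FRAME_BYTES:
                continue  # timeout or partial frame; resynchronization omitted
            samples = struct.unpack("<21h", raw)        # little-endian int16
            yield tuple(zip(*[iter(samples)] * AXES))   # group by sensor

# Example: print the wrist sensor (index 5 here, by assumption)
# for frame in read_frames():
#     print(frame[5])
```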
B. Measurement Resolution

The manufacturer guarantees 10-bit resolution for the acceleration sensors, which translates to 1024 recognizable states. The transmission line length, caused by the length of the human arm, results in high-frequency noise in the system. The highest values of noise reveal that the actual resolution is 7-bit (128 recognizable states). Another experiment, which tested the Signal to Noise Ratio (SNR), proved that the value of the SNR is 40 dB for most sensors (the noise level was estimated as the average signal measured with a stationary glove in different positions). Both parameters indicate that the value of the noise is about 1/100 of the value of the signal. Following this reasoning, 1024/100 ≈ 10 states in the 10-bit range are uncertain, and the 3 lowest bits represent 2³ = 8 states. It can therefore be assumed that, in most situations, the 3 lowest bits contain noise values. This confirms the effective 7-bit resolution.
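The arithmetic behind this estimate can be stated explicitly; the derivation below restates the paper's reasoning, assuming the usual amplitude definition of SNR:

$$
\mathrm{SNR} = 20\log_{10}\frac{A_{\text{signal}}}{A_{\text{noise}}} = 40\,\mathrm{dB}
\;\Rightarrow\;
\frac{A_{\text{noise}}}{A_{\text{signal}}} = 10^{-40/20} = \frac{1}{100},
$$

$$
\frac{1024\ \text{states}}{100} \approx 10\ \text{uncertain states}
\approx 2^{3} = 8\ \text{states (3 bits)},
\qquad 10 - 3 = 7\ \text{effective bits}.
$$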
C. Signal Acquisition and Processing

All of the sensors employed in the Accelerometer Glove are 3-axis accelerometers. Each sensor is connected to a microcontroller by using the Serial Peripheral Interface (SPI) bus. Data is acquired from the sensors synchronously. Following data collection, the entire set is sent to the PC through USB. The PC recognizes the device as a serial port (due to the virtual COM protocol implemented in the microcontroller). Then, the dataflow is intercepted by the GUI for acquisition and further processing.

The sensors are connected to a single SPI bus. The measurements are collected with a frequency of 400 Hz. Each sensor is queried for its data in the proper order, and, following a whole cycle, the measurements are sent to the PC. Due to timing uncertainty, in a worst-case scenario the data may be delayed by 2.5 ms. Such a delay is fully acceptable, because natural upper limb movement, even rapid movement, is slow enough to be recorded by the acceleration sensors.

The saved data is later processed by the PC. After calibration, the data is filtered with a low-pass, Hamming-window-based running-average digital filter. The length of the filter is 250 ms. This pre-processing stage removes the higher noise frequencies from the signal. The feature vector consists of the 3D acceleration measured by each sensor. Prior to gesture modeling, a standardization procedure is applied to all of the features, separately for each gesture. The signals for the gesture "good", measured on the forefinger, are presented in Fig. 3.

Fig. 3. Acceleration signals for the gesture "good", measured on the forefinger.
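As a minimal sketch of the described pre-processing chain (the calibration step is omitted, and the array shapes are assumptions), the 250 ms Hamming-weighted running average and the per-gesture standardization could look as follows:

```python
# Sketch of the described pre-processing: 250 ms Hamming-weighted
# moving-average low-pass filter, then per-gesture standardization.
import numpy as np

FS = 400                       # sampling frequency [Hz]
FILTER_LEN = int(0.250 * FS)   # 250 ms -> 100 taps

def preprocess(gesture: np.ndarray) -> np.ndarray:
    """gesture: (T, 21) raw accelerations (7 sensors x 3 axes)."""
    taps = np.hamming(FILTER_LEN)
    taps /= taps.sum()                           # unity DC gain
    smoothed = np.apply_along_axis(
        lambda ch: np.convolve(ch, taps, mode="same"), 0, gesture)
    # Standardize each feature separately, per gesture, as in the paper.
    mean = smoothed.mean(axis=0)
    std = smoothed.std(axis=0) + 1e-12           # guard against division by zero
    return (smoothed - mean) / std
```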

IV. SIGN LANGUAGE GESTURE RECOGNITION

The authors' model isolates sign language gestures by using Parallel Hidden Markov Models (PaHMM), first used in Automatic Speech Recognition (ASR) systems and described by [17] and [18], but also successfully adopted in Automatic Sign Language Recognition (ASLR) systems [19]–[21]. Usually, the PaHMM is used for the modeling of sign language gestures in accordance with sign language linguistics, taking into account the parallelism of the elements of articulation indicated e.g. by Stokoe [14]. Each PaHMM channel corresponds to a group of features which describes a different articulatory element, and is modeled as an independent HMM (Fig. 4).

Fig. 4. Scheme of a Parallel Hidden Markov Model with independent HMMs for each channel.

In the approach presented in this article, the PaHMM channels correspond to the multiple sensors attached to the signer's hand. The gesture in each channel is modeled as a sequence of subunits. After taking into account the results of the conducted experiments, a joint-feature HMM has also been attached to the PaHMM as a separate channel.

Unlike in [12], where data fusion was performed at the feature level and employed full joint-feature modeling only, we adopted another approach in which the fusion of the different sensor signals is performed at the score level. A full joint-feature model is also included as an additional stream. This solution increases the robustness of the system significantly when compared to feature-level fusion (the joint-feature model) alone. A comparison between both approaches is discussed later, in the evaluation section of the paper.

A. Recognition Architecture

Each gesture is modeled as a sequence of subunits, which are smaller elements, similar to phonemes in speech analysis. An isolated gesture can be transcribed as

$$\textit{gesture}: sub_1\, sub_2\, \dots\, sub_K. \tag{1}$$

The simultaneous character of sign language means that each independent articulatory element can be described separately in a parallel model. Independent parallel channels can correspond to independent parallel events which happen during signing, as well as to the different measurement devices used in the experiment. Thus, gestures can also be transcribed as

$$\textit{gesture}:
\begin{cases}
\text{channel}_1: \; sub_1^1\, sub_2^1\, \dots\, sub_{K_1}^1 \\
\text{channel}_2: \; sub_1^2\, sub_2^2\, \dots\, sub_{K_2}^2 \\
\quad \vdots \\
\text{channel}_L: \; sub_1^L\, sub_2^L\, \dots\, sub_{K_L}^L
\end{cases} \tag{2}$$

where $L$ is the number of channels, and $K_L$ is the number of subunits in the $L$-th channel. In this article, the parallel channels correspond to the different sensors attached to the signer's hand.
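To make the transcription in (2) concrete, one possible in-memory representation maps each channel to its own subunit sequence; the channel names and subunit labels below are purely illustrative:

```python
# Hypothetical per-channel transcription of one gesture, mirroring Eq. (2).
# Channel names and subunit labels are invented for this example.
gesture_transcription = {
    "thumb":      ["s1", "s2", "s3"],        # K_1 = 3 subunits
    "forefinger": ["s1", "s2"],              # K_2 = 2 subunits
    "wrist":      ["s1", "s2", "s3", "s4"],  # subunit counts may differ
    # ... one entry per sensor channel, plus the joint-feature channel
}
```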
An isolated gesture is represented by a sequence of observations $O$, which consists of the feature vectors $o_t$ observed at each time frame $t$:

$$O = o_1, o_2, \dots, o_T. \tag{3}$$

To recognize a gesture, one needs to find the gesture model $g^*$ for which

$$g^* = \arg\max_i \left( P(g_i|O) \right) \tag{4}$$

where $P(g_i|O)$ is unknown and can be calculated using the Bayes rule

$$P(g_i|O) = \frac{P(O|g_i)\,P(g_i)}{P(O)}. \tag{5}$$

$P(g_i)$ is the prior probability, assumed to be equal for different gestures, and $P(O|g_i)$ is a generative gesture model likelihood based on the sequence of observations, the probability of which, $P(O)$, can be calculated with

$$P(O) = \sum_i P(O|g_i)\,P(g_i). \tag{6}$$

Every subunit in a given channel is modeled as a single-state HMM with a Gaussian Mixture Model (GMM), denoted as $\lambda = \{\mu_i, \Sigma_i, \omega_i\}$, which describes the probability of the observation emission

$$p(o|\lambda) = \sum_{i=1}^{M} \omega_i\, p_i(o), \tag{7}$$

where $o$ is a $D$-dimensional observed feature vector, $\omega_i$ are the mixture weights, and $M$ is the number of mixtures. The mean vector $\mu_i$ and the diagonal covariance matrix $\Sigma_i$ are the parameters of the unimodal probability densities $p_i(o)$.

For a single PaHMM channel, the feature vector is $D = 3$ dimensional, containing the acceleration measured in each dimension by a single sensor. The joint-feature HMM contains the signals from all of the sensors, which means that its feature vector is $D = 21$ dimensional (7 sensors × 3D acceleration).

To model an entire gesture, the models of the subunits are sequentially connected into a composite left-to-right HMM (Fig. 5).

Fig. 5. Four-subunit left-to-right HMM gesture model.

B. Gesture Model Training

A gesture model is trained separately in each parallel channel. Initially, the mixture parameters $\{\mu_i, \Sigma_i, \omega_i\}$ are assumed to be global values calculated for the whole training set and are equal in all composite HMM states (flat start). At first, the GMMs have single components, and their number is incremented by one in each training step by using mixture splitting [22]. The parameters of the model are re-estimated in further training steps by the use of the Baum-Welch algorithm. In each step, the subunit borders are realigned, taking into account the best match of the new model to the observations. The subunits are not synchronized in time between channels, as the synchronization is done by performing a fusion of the channel responses at the whole-gesture level.

C. Recognition

The recognition for a single channel $l$ is performed by a token passing algorithm and an analysis of the N-best list, which contains the log-likelihood values (scores) obtained by each gesture model $g_{l,i}$. For single channels, the test sign is recognized as the one for which the log-likelihood value is the highest:

$$g^* = \arg\max_i \left( \log P(g_{l,i}|O) \right). \tag{8}$$

To include the information from the different channels, a fusion of their responses is performed. To compare different channels, the scores must be scaled to a similar range by using score normalization. It is performed separately in each channel $l$, and within each tested sign $i$:

$$score_{l,i} = \frac{\log P(g_{l,i}|O) - \mu_l}{\sigma_l} \tag{9}$$

where the mean $\mu_l$ and variance $\sigma_l^2$ values are estimated taking into account all of the sign model scores from the N-best list, except for the highest one.

Fusion is performed as the weighted sum of the normalized channel responses. The sign is recognized as the one for which the weighted sum of normalized scores has the highest value:

$$g^* = \arg\max_i \sum_{l=1}^{L} w_l\, score_{l,i}. \tag{10}$$

The weights $w_l$ for the different channels are proportional to the recognition accuracy $Acc_l$ obtained independently by each single channel:

$$w_l = \frac{Acc_l}{\sum_{r=1}^{L} Acc_r}. \tag{11}$$
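A compact sketch of the score-level fusion of (9)–(11), assuming the per-channel N-best log-likelihoods have already been computed and stored in an (L, N) array; all names are illustrative:

```python
# Sketch of score normalization and weighted fusion, Eqs. (9)-(11).
import numpy as np

def fuse_scores(log_likelihoods: np.ndarray, accuracies: np.ndarray) -> int:
    """log_likelihoods: (L, N) N-best log P(g_{l,i}|O) per channel l.
    accuracies: (L,) per-channel recognition accuracy Acc_l.
    Returns the index of the recognized gesture."""
    L, N = log_likelihoods.shape
    normalized = np.empty_like(log_likelihoods)
    for l in range(L):
        scores = log_likelihoods[l]
        rest = np.delete(scores, scores.argmax())   # drop the highest score
        mu, sigma = rest.mean(), rest.std()         # Eq. (9) statistics
        normalized[l] = (scores - mu) / sigma       # Eq. (9)
    weights = accuracies / accuracies.sum()         # Eq. (11)
    fused = weights @ normalized                    # Eq. (10): weighted sum
    return int(fused.argmax())

# Example with random scores for 8 channels and 40 gestures:
# rng = np.random.default_rng(0)
# print(fuse_scores(rng.normal(size=(8, 40)), rng.uniform(0.9, 1.0, 8)))
```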
V. RECOGNITION EVALUATION

A. Gesture Database

The quality of the entire sign language gesture recognition system has been verified on a set of specifically collected gestures, recorded with the designed Accelerometer Glove. The purpose of creating the database of recordings was to verify the recognition efficiency and the possibilities of the interoperability of inertial gesture recognition with a vision system (based on RGB cameras and infra-red depth sensors). In the case of a recognition system adapted to the purpose of a dialogue system, there is no need for a large dictionary. The cardinality of a dictionary for video gesture recognition systems is often less than 100 [24], [25].

The dictionary gestures were matched to a dialog system for the use case of a deaf person booking a visit to a doctor's office. This decision conditioned the type of gestures and determined their semantic content. The database contains isolated gestures describing days of the week, months, basic numerals, and the names of medical specialties (pediatrician, cardiologist, dermatologist, etc.). Finally, 40 such gestures were selected, which is an acceptable number for initial research.

It should also be mentioned that the gestures were chosen by taking into account the possibility of proper efficiency verification. The creation of the dictionary ensures that the gesture database covers and involves the entire space of all possible centers of articulation for sign language.

The database contains recorded gestures which either differ in the entire range of movements (shape, direction, and speed), or differ only in a small part of the total motion (e.g. the final movement of the hand, the number of taps, or the exposed fingers).

This approach to the design of the gesture database allows for a reliable and consistent verification of the operation of the system, and for the determination of its full applicability to the case of sign language recognition, as well as to that of a supportive data stream for the development of vision-only gesture modeling.

Finally, the created recording database contains 10 repetitions of each of the 40 gestures, registered for 5 signers. This results in a total of 2000 recordings used for the validation of the solution.

The signers were in a sitting position in order to minimize the movement of the entire body, just as in vision-based setups. Each isolated gesture begins and ends in the same position (both hands rested on the knees). Recordings were made for each person using a single Accelerometer Glove worn on the dominant hand (Fig. 6).

Fig. 6. Recording of an example sign for the evaluation corpus.

The gesture recording was divided into several recording sessions (taking place on different days). Gestures were shown in a different order and with a maximum of 3 repetitions of the same gesture in a row during the course of a single recording session. This approach was designed to minimize the effect of the signers familiarizing themselves with a gesture and therefore signing it in a similar way.

The recording procedures were designed to proceed as smoothly as possible. Automatic computer software was created for this purpose (Fig. 7). It allows for the simultaneous acquisition of the video representations of the gestures (recorded from the RGB cameras and the depth sensor).

Fig. 7. User interface of the multisource acquisition application. Sensor glove data visible in the bottom-left corner of the screen.

B. Evaluation Procedures

The entire evaluation was performed in compliance with all required practices for the validation of algorithms in pattern recognition problems.

As the scheme of 5-fold cross-validation was adopted, the gesture records were divided into training and test sets in the ratio of 80%–20% in each validation. This division was made ensuring that the training and test sets consisted of gestures made during different recording sessions and with different people.
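One way to realize such a split is with grouped k-fold cross-validation; the sketch below uses scikit-learn's GroupKFold and groups by recording session, which is an assumption about how the paper's constraint could be implemented:

```python
# Sketch of 5-fold cross-validation with session-aware grouping.
# Grouping by recording session keeps the 80%/20% train/test split from
# mixing recordings of the same session, as described above.
import numpy as np
from sklearn.model_selection import GroupKFold

def session_aware_folds(features, labels, session_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs; each fold holds out whole sessions."""
    splitter = GroupKFold(n_splits=n_folds)
    yield from splitter.split(features, labels, groups=session_ids)

# Example with 2000 dummy recordings spread over 10 sessions:
# X = np.zeros((2000, 1)); y = np.zeros(2000)
# sessions = np.repeat(np.arange(10), 200)
# for train_idx, test_idx in session_aware_folds(X, y, sessions):
#     print(len(train_idx), len(test_idx))
```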
C. Evaluation Results

An experiment was conducted in accordance with the described evaluation procedure and the training and recognition scenarios. The recognition performance is presented in Table I. It contains the recognition accuracy (Acc), equal error rate (EER), F1 score, precision, and recall. The results also present the recognition evaluation for the joint-feature HMM as a reference for the effectiveness of the PaHMM.

TABLE I. Classification Results (in %) for the Accelerometer Database.

The recognition accuracy, precision, and recall are very high, while the EER is very small, which demonstrates the effectiveness of an accelerometer-based system for the recognition of isolated sign language gestures.

Using the PaHMM approach leads to an even lower value of the EER in comparison to the joint-feature HMM, while achieving the same recognition accuracy. A low EER is particularly important for end-user applications, which require low false acceptance and false rejection rates. For comparison, the results obtained for a similar database, containing fewer gestures (31), but created for video-based recognition methods, are presented in Table II.

TABLE II. Classification Results (in %) for the Video Database.

As can be observed, all of the parameters of the vision-based method are worse: lower accuracy, precision, and recall, and a higher EER, even though fewer gestures were used in the experiment.
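Because the EER is the operating point at which the false acceptance and false rejection rates coincide, it can be estimated directly from the recognition scores; a minimal sketch, assuming separate arrays of genuine (correct-model) and impostor scores:

```python
# Minimal EER estimate: sweep thresholds until the false acceptance
# rate (FAR) and false rejection rate (FRR) cross.
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """genuine: scores of correct gesture models; impostor: all others."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # accepted impostors
    frr = np.array([(genuine < t).mean() for t in thresholds])    # rejected genuines
    best = np.abs(far - frr).argmin()                             # closest crossing
    return (far[best] + frr[best]) / 2.0

# Example:
# rng = np.random.default_rng(1)
# print(equal_error_rate(rng.normal(3, 1, 500), rng.normal(0, 1, 5000)))
```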
Because of the very high efficiency of recognition performed with the features from all of the accelerometer sensors, the possibility of recognition with individual sensors, in relation to the overall processing structure, is worth taking into consideration. Table III presents the recognition performance achieved by the separate sensors in comparison to the PaHMM and the joint-feature HMM.

TABLE III. Classification Results (in %) Obtained for Separate Sensors Compared to the PaHMM and Joint-Feature HMM Approaches (Sensor Numbers as in Fig. 1).

The results obtained for the inertial features of the separate sensors are worse than the results for the PaHMM and HMM approaches, which is also illustrated in the Detection Error Trade-off and precision-recall plots in Fig. 8 and Fig. 9, respectively. However, it can be observed that the best single sensors in terms of accuracy (Sensor 3 – middle finger) and in terms of EER (Sensor 6 – wrist) achieve significantly better results than those of the vision-based method.

Fig. 8. DET plot for accelerometer features for the joint-feature HMM, PaHMM, and recognition with separate sensors (sensor numbers as in Fig. 1).

Fig. 9. Precision-recall plot for the accelerometer joint-feature HMM, PaHMM, and recognition with separate sensors (sensor numbers as in Fig. 1).

VI. CONCLUSION

The evaluation results of the described acquisition system and sign language gesture recognition, using accelerometer sensors, clearly show that such an approach can result in an extremely high efficiency of recognition. The efficiency is much higher than in systems based solely on video sensors. The very high stability and resistance to different recording conditions and variances of the recorded gestures, as well as to the differences caused by recording different people (an EER of 0.5%), lead to the conclusion that using inertial motion sensors may result in very high recognition confidence and robustness to data variability.

It would obviously be quite difficult and inconvenient to use this approach (the Accelerometer Glove) with dialog systems for deaf people. The vision system, as a contactless approach, is more convenient, and does not require any special sensors or devices for acquisition, except for an RGB camera.
The presented system can be used to improve the effectiveness of vision systems – e.g. at the stage of gesture model training. Temporal models (such as the HMM) can be assumed to be better suited to the variability of movement, given the high efficiency of recognition in the case of inertial movement parameters. For this reason, such a model (or even the time division of the gesture into segments) could be used as input information for the model training of a system based on RGB cameras or depth sensors. This issue needs further investigation.

Some of the results (Table III, Fig. 8, and Fig. 9) allow for optimism in regard to the ability to use a single inertial sensor for gesture recognition (autonomously or in cooperation with another system). It could be quite efficient and ergonomic to use smartphones or smartwatches in this way in the future.

REFERENCES

[1] Y. Wu and T. S. Huang, "Vision-based gesture recognition: A review," in Gesture-Based Communication in Human-Computer Interaction (Lecture Notes in Computer Science), vol. 1739. Berlin, Germany: Springer, 1999, pp. 103–115.
[2] D. J. Sturman and D. Zeltzer, "A survey of glove-based input," IEEE Comput. Graph. Appl., vol. 14, no. 1, pp. 30–39, Jan. 1994.
[3] H. Teleb and G. Chang, "Data glove integration with 3D virtual environments," in Proc. ICSAI, 2012, pp. 107–112.
[4] H. Zhou, H. Hu, N. D. Harris, and J. Hammerton, "Applications of wearable inertial sensors in estimation of upper limb movements," Biomed. Signal Process. Control, vol. 1, no. 1, pp. 22–32, 2006.
[5] Z. Lu, X. Chen, Q. Li, X. Zhang, and P. Zhou, "A hand gesture recognition framework and wearable gesture-based interaction prototype for mobile devices," IEEE Trans. Human-Mach. Syst., vol. 44, no. 2, pp. 293–299, Apr. 2014.
[6] S. Zhou et al., "2D human gesture tracking and recognition by the fusion of MEMS inertial and vision sensors," IEEE Sensors J., vol. 14, no. 4, pp. 1160–1170, Apr. 2014.
[7] Y.-C. Kan and C.-K. Chen, "A wearable inertial sensor node for body motion analysis," IEEE Sensors J., vol. 12, no. 3, pp. 651–657, Mar. 2012.
[8] S. C. Mukhopadhyay, "Wearable sensors for human activity monitoring: A review," IEEE Sensors J., vol. 15, no. 3, pp. 1321–1330, Mar. 2015.
[9] R. C. King, L. Atallah, B. P. L. Lo, and G.-Z. Yang, "Development of a wireless sensor glove for surgical skills assessment," IEEE Trans. Inf. Technol. Biomed., vol. 13, no. 5, pp. 673–679, Sep. 2009.
[10] S. C. W. Ong and S. Ranganath, "Automatic sign language analysis: A survey and the future beyond lexical meaning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 873–891, Jun. 2005.
[11] A. F. da Silva, A. F. Gonçalves, P. M. Mendes, and J. H. Correia, "FBG sensing glove for monitoring hand posture," IEEE Sensors J., vol. 11, no. 10, pp. 2442–2448, Oct. 2011.
[12] K. Liu, C. Chen, R. Jafari, and N. Kehtarnavaz, "Fusion of inertial and depth sensor data for robust hand gesture recognition," IEEE Sensors J., vol. 14, no. 6, pp. 1898–1903, Jun. 2014.
[13] C. Chen, R. Jafari, and N. Kehtarnavaz, "A real-time human action recognition system using depth and inertial sensor fusion," IEEE Sensors J., vol. 16, no. 3, pp. 773–781, Feb. 2016.
[14] W. C. Stokoe, Jr., "Sign language structure: An outline of the visual communication systems of the American deaf," J. Deaf Stud. Deaf Edu., vol. 10, no. 1, pp. 3–37, 1960.
[15] E. J. Dijkstra, "Upper limb project, modeling of the upper limb," Dept. Eng. Technol., Univ. Twente, Enschede, The Netherlands, Tech. Rep. s0142395, Dec. 2010.
[16] D. J. Sturman, "Whole-hand input," Ph.D. dissertation, Media Arts and Sciences Section, School of Architecture and Planning, Massachusetts Inst. Technol., Cambridge, MA, USA, 1992.
[17] H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands," in Proc. IEEE ICSLP, Philadelphia, PA, USA, Oct. 1996, pp. 426–429.
[18] H. Bourlard and S. Dupont, "Subband-based speech recognition," in Proc. IEEE ICASSP, Munich, Germany, Apr. 1997, pp. 1251–1254.
[19] C. Vogler and D. Metaxas, "A framework for recognizing the simultaneous aspects of American sign language," Comput. Vis. Image Understand., vol. 81, no. 3, pp. 358–384, Mar. 2001.
[20] U. von Agris, J. Zieren, U. Canzler, B. Bauer, and K.-F. Kraiss, "Recent developments in visual sign language recognition," Universal Access Inf. Soc., vol. 6, no. 4, pp. 323–362, Feb. 2008.
[21] S. Theodorakis, V. Pitsikalis, and P. Maragos, "Dynamic–static unsupervised sequentiality, statistical subunits and lexicon for sign language recognition," Image Vis. Comput., vol. 32, no. 8, pp. 533–549, Aug. 2014.
[22] S. Young et al., The HTK Book (for HTK Version 3.4). Cambridge, U.K.: Eng. Dept., Cambridge Univ., 2006.
[23] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London, U.K.: Prentice-Hall, 1982.
[24] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: A study on human hand gesture recognition," Artif. Intell. Rev., vol. 40, no. 4, pp. 495–516, Dec. 2013.
[25] S. G. M. Almeida, F. G. Guimarães, and J. A. Ramírez, "Feature extraction in Brazilian sign language recognition based on phonological structure and using RGB-D sensors," Expert Syst. Appl., vol. 41, no. 16, pp. 7259–7271, 2014.
[26] M. W. Kadous, "Temporal classification: Extending the classification paradigm to multivariate time series," Ph.D. dissertation, School Comput. Sci. Eng., Univ. New South Wales, Kensington, NSW, Australia, 2002.

Jakub Gałka (M'14) received the M.Sc. and Ph.D. degrees in telecommunications and electronic engineering from the AGH University of Science and Technology, Kraków, Poland, in 2003 and 2008, respectively. He has been with the Department of Electronics, AGH University of Science and Technology, where he is currently a Researcher and a Lecturer. In the course of his work, he was involved in several Polish and European research projects related to speech and audio processing. His research focus lies in speech and language processing and recognition, speaker recognition, multimedia signal processing, and data analysis. He is working on the development of commercially available ASR and speaker verification systems.

Mariusz Mąsior received the M.Sc. and Engineering degrees in telecommunications and electronic engineering from the AGH University of Science and Technology, Kraków, Poland, in 2010. He has been an Assistant Professor, a Lecturer, and a member of the Signal Processing Group, Department of Electronics, AGH University of Science and Technology. He specializes in signal processing, speech technology, embedded systems, and systems engineering.

Mateusz Zaborski received the Engineering degree from the Department of Electronics, AGH University of Science and Technology, Kraków, Poland, where he is continuing his education. His scientific interests concentrate in the areas of electronic design and sensor data acquisition.

Katarzyna Barczewska received the M.Sc. degree in biomedical engineering from the AGH University of Science and Technology, Kraków, Poland, in 2011. She is currently pursuing the Ph.D. degree with the Department of Automatics and Biomedical Engineering, AGH University of Science and Technology. Her research interests include statistical learning, machine learning, and gesture recognition.
