
Towards Sign Language Recognition System in Human-Computer Interactions

Maher JEBALI#1, Patrice DALLE*2, Mohamed JEMNI#3
#Research Lab. LaTICE - ESSTT, University of Tunis - Tunisia
1 [email protected]
3 [email protected]
*Research Lab. IRIT, Univ. of Toulouse3 - France
2 [email protected]
Abstract— With the efforts to improve Human-Computer Interaction, there has been considerable interest in integrating human gestures into human-computer interfaces. This paper presents a model of a sign language recognition system, formulated as a dialogue between deaf people and a signing avatar. With this modelling, the system is configurable: we keep the general model and change only the scenario and the vocabulary. We have added two important elements to this model, context and prediction, to improve the reliability of the sign language recognition system compared with classic systems, which do not use semantic concepts.

Keywords— French sign language recognition, HCI, avatar

I. INTRODUCTION
In several disciplines, many researchers have been interested in the field of gesture-based Human-Computer Interaction (HCI) and gesture recognition. Among these disciplines we cite computer vision, natural language processing, pattern recognition, HCI and linguistics. This multi-disciplinary research field can yield useful applications such as robotics control, emotion analysis, psychological behaviour analysis and sign language recognition. HCI is constantly defining new communication modalities and new ways of interacting with machines. Gesture can transmit information for which other modalities are not suitable or efficient. In spontaneous interaction, gestures can be used as a single modality or combined in multi-modal interaction involving textual media, speech or facial expression. Sign language constitutes a multi-aspect interaction in which different manual and non-manual components may occur simultaneously.
Most French deaf people do not understand the French language perfectly, which explains the difficulties they encounter when communicating with computers and new technologies. Sign language recognition is an application area of HCI that aims to facilitate interaction between deaf people and technology.
The grammars of sign languages are as complex as the grammars of voice languages and share with them many universal features, despite the difference in modality between sign languages (which use the visual channel) and voice languages (which use the auditory channel). Yet sign languages also differ from spoken languages in radical ways: morphological information in SL is often conveyed simultaneously and from the beginning of signing; moreover, certain aspects of their phonological, syntactic and semantic structures are not commonly found in voice languages. These differences raise an interesting challenge both for including context and for handling a large vocabulary.
This paper is structured as follows. Section 2 gives an overview of related work. Section 3 presents the system model and the benefits of prediction. Finally, we present the conclusion and some perspectives.

II. RELATED WORKS
The automatic recognition of sign language is almost 20 years behind speech recognition, for multiple reasons. Classification and processing of one-dimensional audio signals are easier than of two-dimensional video signals. Also, sign language processing is by far not completely explored yet: understanding sign language requires deeper linguistic knowledge, but until now there are no general rules that define signing from a linguistic point of view.
The first scientific publications in the field of sign language recognition appeared in the beginning of the 90s. Most applications presented in previous works do not operate in real time and need up to 20 seconds after the sign production to complete the processing. Published work rarely gives details on camera hardware and resolution, suggesting that professional hardware, optimal camera placement, low noise and high resolution were used.
The data acquisition method constitutes the first feature that classifies the different works. The
most simple, exact and reliable techniques are intrusive. Putting magnetic or optical markers on the hands and face facilitates the determination of manual configuration and facial expression. However, this is restrictive and unnatural for the user. Furthermore, data gloves, which measure the flexion of the finger joints, are undesirable for practical systems due to their high cost. Moreover, most existing applications do not exploit non-manual features [4].
Many works deal only with signer-dependent recognition, where every signer is required to train the system before being able to use it. Signer-independent recognition requires suitable feature normalization from the first step of processing, to rid the features of dependencies on the signer's distance from the camera, his position in the image and other morphological factors.
Many researchers have focused on isolated signs, as speech recognition did in its early days. Some existing systems process continuous production of signs, but their vocabulary is not large. To improve the recognition rate, the exploitation of grammar and context is necessary.
The features of the described systems and several important works are listed in Table 1. In contrast to speech recognition, we cannot compare the reported performances, due to the absence of a standardized benchmark for sign language recognition.

TABLE I
CLASSIFIER CHARACTERISTICS FOR SIGN LANGUAGE RECOGNITION

Author | Features | Interface        | Vocab | Language Level | Recog. rate in %
[3]    | M        | Optical markers  | 22    | Word           | 95.5
[8]    | M        | Video            | 40    | Word           | 98.1
[2]    | M        | Data glove       | 203   | S              | 92.1
[6]    | M        | Video            | 40    | S              | 97.8
[7]    | M        | Video            | 22    | S              | 91.8
[4]    | M        | Video            | 39    | S              | 92.0
[1]    | M        | Video            | 164   | SB             | 74.3
[5]    | M        | Video            | 961   | Word           | 82.0
M: Manual, S: Sentence, SB: Subunits

All recognition rates are valid only for the test examples. We also observe that when the vocabulary size increases, the recognition rate decreases sharply and becomes insufficient.
In summary, we can judge that the existing systems do not meet the requirements of a robust real-world system. In the following sections, we describe a framework for experimentation that takes context into account and matches the real world more closely.

III. MODELLING INTERACTION ANALYSIS
Interpreters are capable of converting voice or text to sign, despite the difficult conditions (simultaneity and the non-equivalence between source language and target language: omissions, additions, substitutions…), because they exploit the context and semantics. On the other hand, existing systems have poor performance because they do not take context and semantics into account.
We have chosen to deal with situations of dialogue led by the system, because the system can then exploit the sense and control the context (illustrated in Figure 1).

Fig. 1: General schema of dialogue (recognition system is controlled by context)

We model the scenarios of a dialogue that can occur between a real interlocutor and a virtual one, knowing that this dialogue is driven by the avatar. The different stages of the dialogue are presented in the following subsection.

A. Description of the dialogue algorithm
1- In the beginning, the avatar launches a welcome message.
2- The avatar begins the scenario:
  • explanation of the rules (in this case, the avatar is the master);
  • asks the interlocutor to state his request.
2 (a) - If the interlocutor has understood the request but does not have the answer (hesitation, long inactivity, sign ...), the system (avatar) intervenes and reformulates the request.
2 (b) - If the interlocutor has understood the request, he responds to it.
3- The system analyzes the signs produced by the interlocutor.
3 (a) - If the system does not recognize a sign or the whole statement, it generates a message in avatar language and passes the information to the interlocutor. The interlocutor produces the request again.
3 (b) - If the system recognizes the request, it generates a message in avatar language and passes the information to the interlocutor.
4- If there are further iterations, we repeat the same process from step 2; otherwise we end the dialogue.

B. Exploitation
In all language analysis, context is very important. It shows why some sign or word is used in a certain situation.
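To make the role of context concrete, the following minimal sketch shows how a dialogue step could restrict recognition to the candidate signs expected in the current context, verifying only the characteristics that distinguish them. This is a hypothetical illustration: the sign names, characteristic encoding and function names are assumptions, not part of the described system.

```python
# Hypothetical sketch: context-driven candidate filtering.
# Each sign is described by a set of coarse characteristics
# (hand location, hand shape, movement, ...). Values are illustrative only.

LEXICON = {
    "YES":   {"loc:head", "shape:fist", "move:nod"},
    "NO":    {"loc:head", "shape:flat", "move:side"},
    "HELLO": {"loc:head", "shape:flat", "move:arc"},
    "EAT":   {"loc:mouth", "shape:pinch", "move:repeat"},
}

# Dialogue context -> signs that are plausible answers at this step.
CONTEXT_CANDIDATES = {
    "yes_no_question": ["YES", "NO"],
    "greeting":        ["HELLO"],
}

def discriminating_characteristics(candidates):
    """Characteristics NOT shared by all candidates: only these need
    to be verified to tell the candidates apart."""
    sets = [LEXICON[s] for s in candidates]
    common = set.intersection(*sets)
    return set.union(*sets) - common

def recognize_in_context(observed, context):
    """Pick the candidate whose discriminating characteristics
    best match the observed ones."""
    candidates = CONTEXT_CANDIDATES[context]
    relevant = discriminating_characteristics(candidates)
    return max(candidates, key=lambda s: len(LEXICON[s] & observed & relevant))

# Usage: in a yes/no context, only shape and movement need checking,
# since both candidates share loc:head.
print(recognize_in_context({"loc:head", "shape:fist", "move:nod"},
                           "yes_no_question"))  # -> YES
```

Because the context shrinks the candidate set, the recognizer never has to discriminate the whole vocabulary at once, which is the practical benefit of a dialogue led by the system.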
We cannot talk about the optimality of a sign language recognition system without adding context as an input to the system, because the same concept may appear in a variety of contexts and its appearance can be very different depending on the context.
In French sign language, there are similarities between several signs with different meanings; thus, adding context at each step of the dialogue allows us to improve the reliability of the recognition system.

C. Interest of dialogue led by avatar
In addition to context, prediction is one of the most essential issues that needs to be explored for sign language recognition. As in human behaviour, it is possible – in a sign language recognition system – to predict the future outcome, rather than simply providing backward-looking data about past interactions, and to make these predictions in real time.
The wide use of sign characterization data, whether 2D or 3D, sometimes improves the recognition rate. But these approaches require huge processing time, since they attempt to define sign-detection-based methods. Accurate prediction of location information is also crucial in processing location-dependent queries.

Fig. 2 Movement prediction

Each dialogue step is controlled by the system in a known context, which is used to remove the ambiguities of similarity between signs and to predict certain sign characteristics (hand location, head orientation…). Processing with prediction is easier than bottom-up processing (segmentation, tracking, characterization, ...), because if there are errors in one of the bottom-up stages, the rest will be false. Also, measurements are simpler and easier to check with prediction processing.
For example, Figure 2 shows how we can predict the hand position according to the speed and movement of the hand. We can also check whether the global shape of the hand has changed, instead of segmenting the different fingers, which is costly in terms of time.
Dialogue is a particular case of Human-Machine Interaction in which the interaction is controlled by the machine. So there are expectations on the response, which can rightly be exploited in order to make a feasible system.
In each dialogue interaction, we predict all candidate signs to be recognized. In the following phase (recognition), we are concerned with verifying the subset of characteristics that identifies a sign relative to another (Table 2).

TABLE II
VERIFICATION OF SUBSET CHARACTERISTICS BASED ON PREDICTION

Sign | Charac1 Charac2 Charac3 Charac4 … CharacN
S1   | X X X X X
S2   | X X X X
S3   | X X X X

IV. CONCLUSIONS
Gestural interfaces can help deaf people have more natural communication with computers. In this scope, we first presented the main problems of sign language recognition for real-world applications. We then detailed a model of HCI as a dialogue between a deaf person and a signing avatar. This model is constrained by the concept of context, and the concept of prediction was proposed to handle the problem of large-vocabulary complexity.
One of the most useful approaches for SL recognition is to use HMMs, a powerful generative model. However, the observations of these generative models are conditionally independent, which may lead us to focus on discriminative models.

REFERENCES
[1] Cooper, H.M., Bowden, R. Large Lexicon Detection of Sign Language. IEEE Workshop on Human Computer Interaction, ICCV, Rio de Janeiro, Brazil, LNCS 4796, Springer Verlag, pp. 88-97, 2007.
[2] Fang, G., Gao, W., Chen, X., Wang, C., Ma, J. Signer-independent continuous sign language recognition based on SRN/HMM. In: Revised Papers from the International Gesture Workshop on Gestures and Sign Languages in Human-Computer Interaction, pp. 76-85. Springer, Heidelberg, 2002.
[3] Holden, E.J., Owens, R.A. Visual sign language recognition. In: Proceedings of the 10th International Workshop on Theoretical Foundations of Computer Vision, pp. 270-288. Springer, Heidelberg, 2001.
[4] Parashar, A.S. Representation and interpretation of manual and non-manual information for automated American sign language recognition. Ph.D. Thesis, Department of Computer Science and Engineering, College of Engineering, University of South Florida, 2003.
[5] Pitsikalis, V., Theodorakis, S., Vogler, C., Maragos, P. Advances in Phonetics-based Sub-Unit Modeling for Transcription Alignment and Sign Language Recognition. IEEE CVPR Workshop on Gesture Recognition, Colorado Springs, USA, 2011.
[6] Starner, T., Weaver, J., Pentland, A. Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20(12), 1371-1375, 1998.
[7] Vogler, C., Metaxas, D. Parallel hidden Markov models for American sign language recognition. In: Proceedings of the International Conference on Computer Vision, 1999.
[8] Yang, M., Ahuja, N., Tabb, M. Extraction of 2D motion trajectories and its application to hand gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 24, 1061-1074, 2002.
