Machine Translation From Text To Sign Language: A Systematic Review
https://doi.org/10.1007/s10209-021-00823-1
LONG PAPER
Abstract
An equal opportunity for all is a basic right of every human being. The deaf community of the world needs access to all information, just as hearing people have. For this to happen, there should be a mode of direct communication between hearing and deaf people. The need of the hour is to automate this communication so that the deaf community is not dependent upon human interpreters. This paper presents a systematic survey of conventional and state-of-the-art sign language machine translation and sign language generation projects. We used a standard procedure for carrying out a systematic literature review of 148 studies published in 30 reputed journals and 40 premium conferences and workshops. The existing literature on sign language machine translation is broadly classified into three categories, which are further sub-classified depending upon the type of machine translation. Studies pertaining to the specified classifications are presented with their advantages and limitations. Different methods for sign language generation are reported with their benefits and limitations. The manual and automatic evaluation methods used in each study are presented along with their respective performance metrics. We call for increased efforts in the presentation of signs to make them an easy and comfortable mode of communication for the deaf community. There is also a need to improve translation methods and to include the contribution of advanced technologies such as deep learning and neural networks to achieve an optimal translation process between text and sign language.
Keywords Sign language · Deaf people communication · Machine translation · Systematic literature review · Sign language
generation
Universal Access in the Information Society
to Portuguese Sign Language (LIBRAS) to improve communication between deaf and hearing people.
There are various projects, such as SignAloud, Kinect Sign Language Translator, SignAll, and MotionSavvy, that translate signed words or sentences into spoken language. However, in this paper we review only the research papers that deal with text/voice to sign language translation.

1.1 Motivation for work

• Sign Language Machine Translation (SLMT) is a fundamental idea for making information easily accessible to the deaf community. Our study discusses various conventional and contemporary approaches along with their shortcomings.
• Our study explores various kinds of SLMT methods and the sign synthesis required to perceive the final output for the deaf community. Furthermore, we discuss the multiple evaluation methods and performance metrics used by different studies.
• The lack of a complete systematic literature review in the field of SLMT was a motivating factor. We analyzed the entire existing database for SLMT and summarized it to report opportunities for future investigations.

The main goal of this review was to classify the existing literature focusing on different SLMT systems, sign synthesis, and the evaluation methods applied in various studies.

2 Review method

The systematic review reported in this paper followed the guidelines discussed in [4–6]. The steps included developing a review protocol, conducting the study, and discussing the findings and future prospects. The review presents different classifications and sub-classifications of SLMT interspersed over time. Table 1 lists the set of research questions that were required to plan this review.

2.1 Planning the review

The review protocol comprises the research question framework, the databases searched, and the search strings used to find the studies relevant to this review. The review method was discussed between both authors and then finalized. Electronic databases were extensively searched, and the studies extracted are reported. The results were extracted using Google Scholar, a search engine for scholarly documents. The search strings required the titles to contain keywords such as "sign language," "machine translation," and "sign generation/synthesis." While searching, studies related to sign language recognition were excluded, as this review cites only studies based upon text to sign language translation. The lack of systematic studies on SLMT is the primary motivation behind this review.

The main goal of this systematic review was to identify and classify the existing literature focusing on SLMT, sign synthesis, and the performance metrics used to evaluate translation techniques. Planning the review required a set of research questions. Table 1 lists the specific research questions and sub-questions. Every research question is supported with the motivation for considering it.

2.3 Sources of information

A broad perspective is required for extensive coverage of the literature. Before starting the review, an appropriate set of databases must be chosen to make finding highly relevant articles probable. The following electronic databases were searched for this review:

• Springer (https://link.springer.com/)
• ScienceDirect (https://www.sciencedirect.com/)
• IEEE Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp)
• Taylor and Francis (https://www.tandfonline.com/)
• ACM Digital Library (https://www.acm.org/publications/digital-library)

2.4 Search criteria

In nearly all the searches, the keywords "machine translation," "sign language," and "sign generation" are included in the abstract. We tried to extract as much relevant literature as possible from the e-resources mentioned and undertook a meticulous database search to ensure the comprehensiveness of our review. Table 2 lists the e-resources and the search strings used to extract relevant papers. Many well-known research papers were not included in the review, mainly because our review is one-sided, i.e., from text to sign language and not vice versa. Only papers in English were included in our study. The studies' abstracts and titles were used as the initial filtration parameters for identifying relevant studies.
Table 1 Research questions and their motivation

1. What is the conventional status of Sign Language Machine Translation?
   Motivation: It helps in understanding the initial research endeavours in the field. Various state-of-the-art works have been reported with their respective advantages and limitations.

2. What is the current status of Rule-Based Machine Translation?
   2.1 What are the categories of RBMT, and what are the methods used for pre-processing, translation, and generation of signs?
   Motivation: The study of various RBMT research papers helps in understanding the role of linguistic knowledge of both the source and destination languages. We have mentioned studies that picked up each of the three kinds of RBMT and the number of studies that followed a rule-based translation strategy.

3. What is the research status of Corpus-Based Machine Translation?
   3.1 What are the different categories of corpus-based machine translation, the methods used for pre-processing, translation, and sign generation, and the importance of bilingual corpora?
   Motivation: The study of corpus-based machine translation research papers answers the need to move from rule-based to corpus-based approaches. The research question explores the studies which have adopted different kinds of data-driven strategies for SLMT.

4. What are the studies discussing the current scenario of sign synthesis?
   4.1 What are the various types of modes used for sign synthesis?
   4.2 What are the different types of annotation systems and platforms used for synthesis using a 3D avatar?
   Motivation: It helps in knowing the final output of the translation process. It is essential to know the kinds of modes available for sign synthesis. The studies discussed show the advantages of 3D avatars over other types of sign generation outputs.

5. What are the evaluation methods used in different studies?
   5.1 What are the different types of evaluation methods?
   5.2 What are the performance metrics used under each evaluation method?
   Motivation: It is essential to understand the evaluation measures adopted to measure the efficiency of the system. Various performance metrics and the number of studies for each performance metric are also reported.
Table 2 E-resources and search strings

1. ieeexplore.ieee.org: abstract contains "sign language", "machine translation", "sign generation" (all dates; conferences and journals)
2. www.acm.org: abstract contains "sign language", "machine translation", "sign generation" (all dates; journals and proceedings)
3. https://www.sciencedirect.com/: abstract contains "sign language", "machine translation", "sign generation" (all dates; all sources)
4. https://link.springer.com/: abstract contains "sign language", "machine translation", "sign generation" (all dates; all sources)
5. https://www.tandfonline.com/: abstract contains "sign language", "machine translation", "sign generation" (all dates; journals)
2.5 Inclusion and exclusion criteria

In the first stage, irrelevant papers were excluded based on their titles. In our study, machine translation is a broad topic, including translation between spoken languages and from sign to spoken languages. Such studies were ineligible for review, as we were focusing on text to SLMT. The systematic review included qualitative and quantitative research studies published up to and including 2020. Research papers repeated in different e-resources were individually excluded to remove any redundancy.

Each study was also assessed for bias and for the internal and external validity of its results. Using the quality assessment as per "Appendix A", all of the included papers contain premium SLMT research, thus increasing the selected database's validity. In the quality assessment, after the screening questions of A.1 and A.2 were considered, the study of detailed findings was performed in A.3.

3 Background
The following subsections summarize the translation process from spoken languages to sign language.

3.1 Sign language

Sign languages are full-fledged natural languages with their own grammar and lexicon [135]. They are expressed through manual articulations in combination with non-manual components.

There are two kinds of features to be considered in signs: manual features and non-manual features. Manual features include the movement of hands, fingers, and arms. Non-manual features are a fundamental component of all sign languages. They include facial expressions, eye gaze, head and upper-body movement, and position. Non-manual signs, in combination with manual signs, give a complete representation of sign language.

Sign languages, like spoken languages, organize elementary units called phonemes into meaningful units called semantic units. These units are represented through hand shape, orientation, location, movement, and non-manual expressions. Unlike spoken languages, sign languages take advantage of the spatial nature of the language through the use of classifiers. Classifiers help in spatially showing referent type, size, shape, and movement.

There is a common misconception that sign languages are dependent on spoken languages. As a sign language develops, it borrows some elements from spoken languages, but how much is borrowed varies. The grammar of sign language is different from that of spoken language. Therefore, while translating from a spoken language to a sign language, the grammar of both the source and target languages must be considered for an adequate translation.

Machine translation is a computer science field that investigates the use of software to translate text or voice from one language to another. It can assist in breaking linguistic barriers and providing easy access to information. A human translator can be substituted with machine translation software to perform this translation. In [8] the author describes the different types of machine translation listed below:

• Rule-Based Machine Translation (RBMT): This translation is based upon linguistic rules, and it involves information about the source and target language. It is further classified into the following categories:
  • Direct-based: This consists of translating word by word using a bilingual dictionary after applying morphological analysis;
  • Interlingua-based: In this, the source text is transformed into an Interlingua (i.e., an abstract language-independent representation), and from this representation the target text is generated;
  • Transfer-based: This is similar to Interlingua-based translation, as both have an intermediate representation; the difference is that in the interlingua-based approach the intermediate representation is independent of the languages in question, whereas in the transfer-based approach it has some dependency on the language pair involved.
• Corpus-Based Machine Translation (CBMT): This type of translation requires bilingual data, i.e., data of both the spoken and sign languages. It is further classified as follows:
  • Statistical Machine Translation (SMT): This process uses probability and is based on bilingual text corpora. It is dependent upon a large parallel context;
  • Example-Based Machine Translation (EBMT): It is based on the idea of analogy and requires a number of bilingual examples without any linguistic knowledge;
  • Hybrid Machine Translation: This combines the strengths of multiple machine translation approaches within a single machine translation system.
• Neural Machine Translation (NMT): This method of translation uses an artificial neural network for prediction. It uses an encoder that divides the sentence into constituent words, whose meanings are represented using vectors. The sentences are then interpreted as a whole and are further decoded using a weighted distribution over the encoded vectors [78].

Sign languages differ from spoken languages in various ways. Apart from the grammatical differences, sign and spoken languages differ in structure, word order, and, for some languages, lexicon. San-Segundo et al. point out specific issues in translating Spanish to Spanish Sign Language (LSE), such as mapping one semantic concept to a specific sign, mapping several semantic concepts onto a unique sign, and generating several signs from one semantic concept [134]. Similarly, in translation from Arabic text to Arabic Sign Language (ArSL), comparable challenges can be noticed, wherein the differences in grammatical rules and in the word order of the source and target languages pose issues in the translation process [102].

Another vital difference between spoken and sign languages is sequencing [95, 164]; in spoken languages, phonemes are produced in sequence, whereas sign languages
have non-sequential components, because finger, hand, and face movements can be involved in sign language simultaneously. Similar differences can be found between Thai and Thai Sign Language (TSL), wherein the Thai language is linear, but Thai Sign Language is simultaneous, with parallel, temporal, and spatial configurations [30]. TSL, like other sign languages, differs from the Thai language in word order, as a Thai sentence contains Subject (S), Verb (V), and Object (O) in a sequence that differs from TSL.

The above differences indicate that sign languages have some specific features, which are listed below [126]:

• Non-manual components: The articulators in sign languages are the hands, arms, face, head, neck, and body. While signing, the signer uses all the non-manual components to perform a sign;
• The use of space: The space around the signer is a lexical space with a phonological value and is used to articulate signs;
• Parts of speech: Sign languages have different parts of speech, and sometimes the noun, the adjective, and the verb are represented by the same sign;
• Classifiers: Classifiers are used to give meaning to a linguistic category of nouns and nominals. In sign languages, classifiers are certain hand shapes that substitute for other signs and have morphological value;
• Syntax: Not all languages organize the sentence structure in SVO order; many sign languages place the object and verb differently in the sentence.

3.4 Process of text to sign language machine translation

Automated sign language translation systems make more information and services accessible to deaf and hearing-impaired people in an economical way [30].

In this paper, we discuss translations from spoken languages to sign languages. Each country generally has its native sign language, such as American Sign Language (ASL), British Sign Language (BSL), Spanish Sign Language (LSE), Indian Sign Language (ISL), etc. To translate from one language to another, a parallel corpus is to be created in which corresponding sentences, phrases, or words from the two languages can be identified [123].

The text to sign language process is divided into three modules:

• The first module pre-processes the input text, in which the text is morpho-syntactically analysed and broken into words with their types (noun, verb, particle, etc.) with a language model's help;
• In the second module, the pre-processed words are converted into sign sequences. For this module, various machine translation strategies (RBMT, EBMT, SMT, HBMT, and NMT) can be followed;
• The last module converts the sign sequences into sign glosses, videos, or animated avatars.

The following section of the paper discusses the current status of machine translation in sign language. It includes conventional as well as state-of-the-art studies in this field. The studies covered are milestones in the field of sign language generation, some having high citations.

4 Current status of machine translation in sign language

Several SLMT projects have been carried out throughout the world, such as TESSA, ViSiCAST, TEAM, ZARDOZ, SASL-MT, and TGT [12, 130, 149, 156, 164, 165]. All the projects mentioned above have played a significant role in making machine translation an essential technology for text to sign language translation systems. In the next part of the paper, we discuss different machine translation studies and some contemporary works in this field. Table 3 shows the categorization of machine translation with the count of studies in each category.
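The three-module process can be illustrated with a deliberately minimal sketch. The stop-word list, gloss dictionary, and fingerspelling fallback below are invented placeholders standing in for a real morpho-syntactic analyser, one of the translation strategies, and an avatar renderer:

```python
import re

# Hypothetical resources for illustration only; a real system would use a
# full language model and a large bilingual lexicon.
STOP_WORDS = {"is", "a", "the", "to", "of"}               # words with no sign
GLOSS_DICT = {"i": "I", "read": "READ", "book": "BOOK"}   # toy word-to-gloss lexicon

def preprocess(text: str) -> list[str]:
    """Module 1: tokenize the input and drop words that carry no sign."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def translate(tokens: list[str]) -> list[str]:
    """Module 2: map words to sign glosses (plain dictionary lookup here);
    unknown words fall back to a fingerspelling notation."""
    return [GLOSS_DICT.get(t, f"FS:{t.upper()}") for t in tokens]

def synthesize(glosses: list[str]) -> str:
    """Module 3: hand the gloss sequence to a renderer; here we emit a gloss
    string, where real systems produce videos or animated avatars."""
    return " ".join(glosses)

print(synthesize(translate(preprocess("I read the book"))))  # prints: I READ BOOK
```

In a full system the second module would be replaced by an RBMT, EBMT, SMT, HBMT, or NMT component, and the third by a gloss-to-animation engine.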
Table 3 Number of studies referring to different categories of machine translation

1. Rule-Based Machine Translation (C1), 38 studies: [12, 53, 66, 76, 87, 129, 130, 149, 152, 156, 161, 164, 165], [9, 51, 54, 62, 64, 104–106, 131, 133, 166, 167], [4, 17, 29, 42, 44, 46, 48, 81, 88, 102, 120, 126]
2. Example-Based Machine Translation (C2), 5 studies: [5, 15, 114, 127, 139]
3. Statistical Machine Translation (C3), 10 studies: [3, 22, 23, 30, 92, 95, 110, 112, 146, 162]
4. Hybrid Machine Translation (C4), 9 studies: [2, 19, 79, 89, 97, 113, 115, 117, 132, 162]
5. Neural Machine Translation (C5), 6 studies: [20, 103, 136–138, 147]
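Of the categories in Table 3, the statistical approach learns word-to-sign correspondences from a bilingual corpus rather than from hand-written rules. As a toy illustration of the idea (in the style of IBM Model 1 alignment, not the implementation of any surveyed system), the following sketch runs expectation-maximization over an invented three-pair word/gloss corpus:

```python
from collections import defaultdict

# Toy parallel corpus of (spoken words, sign glosses); the data is invented.
corpus = [
    (["my", "name"], ["MY", "NAME"]),
    (["my", "book"], ["MY", "BOOK"]),
    (["name"], ["NAME"]),
]

src_vocab = {w for src, _ in corpus for w in src}
tgt_vocab = {g for _, tgt in corpus for g in tgt}

# Start from uniform translation probabilities t(gloss | word).
t = {(g, w): 1.0 / len(tgt_vocab) for g in tgt_vocab for w in src_vocab}

for _ in range(20):                        # EM iterations
    count = defaultdict(float)             # expected gloss/word co-occurrence counts
    total = defaultdict(float)
    for src, tgt in corpus:
        for g in tgt:
            norm = sum(t[(g, w)] for w in src)
            for w in src:                  # E-step: fractional alignment of g to each w
                c = t[(g, w)] / norm
                count[(g, w)] += c
                total[w] += c
    for (g, w), c in count.items():        # M-step: renormalize per source word
        t[(g, w)] = c / total[w]

# The third sentence disambiguates the first, so "my" aligns with MY
# and "name" with NAME even though they always co-occur in pair one.
```

Real SMT systems estimate such lexical and phrase tables from far larger parallel corpora and combine them with a target-side language model during decoding.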
The language processing model is an essential prerequisite for machine translation from spoken to sign language. A primary language processing model includes input text parser, word eliminator, stemmer, and phrase reordering modules [6]. A parser applies tokenization to obtain part-of-speech (POS) tags for the input text. The newly generated tokens are then transferred to the eliminator module, which removes all the unwanted tokens from the parsed text. The stemmer module works on verbs and converts every verb into its simple present tense. As mentioned above, spoken and sign languages differ from each other in terms of sequencing. This difference gives rise to the phrase reordering module, which reorders the arguments according to sign language grammar [95].

The language model, or pre-processing model, reduces variability in the source language, which is necessary in both rule- and corpus-based systems. Language models perform morphological analyses on source data to produce accurate and maximal rules and significantly reduce source-target language alignment in corpus-based systems.

The earliest use of language models was seen in systems like ZARDOZ [154, 156], TEAM [164], ASL Workbench [144], and ViSiCAST [13, 104, 129, 130]. The TEAM system employed Synchronous Tree Adjoining Grammar (STAG) [140] rules to build an English dependency tree during the analysis stage. ASL Workbench used Lexical Functional Grammar (LFG) to convert English texts into functional structures, which were further converted to ASL output. The ViSiCAST system, also considered the most innovative system amongst those mentioned above, uses the CMU Link Parser [141] to analyze English input text. Prolog declarative clause grammar rules are further used to convert the output into a Discourse Representation Structure (DRS) [68].

Earlier parsers and morphological analyzers were based on English syntax. As more and more researchers worldwide started working in this field, several multilingual parsers and language-specific parsers were also developed. Parsers like the ILSP parser, VnTokenizer, and the Al-Khalil morpho system were explicitly made for the Greek, Vietnamese, and Arabic languages, respectively [1, 93, 124]. The Stanford Parser is another present-day parser that, apart from English, is adapted to work on other languages like Chinese, German, and Arabic [153].

Figure 4 shows some parsers used by various studies of machine translation. Table 3 shows the different machine translation types referred to in Fig. 4 as CX, where X is a number from 1 to 5. The reference numbers of the corresponding studies are also mentioned in Fig. 4. Tables 4, 5, 6, 7, and 8 give a comparative analysis of various studies related to the different types of machine translation. Studies in these tables are picked from Table 3, and the sign synthesis modes used by the corresponding works are also presented.

During the 1990s, the need for a sign language corpus and for translating spoken language into sign language started surfacing. Some significant works during this time were done for different languages, like ZARDOZ for English to several sign languages, TESSA for British English to British Sign Language, INGIT for English to Indian Sign Language, TEAM for English to American Sign Language, and SYUWAN for Japanese to Japanese Sign Language [76, 152, 156, 161, 164].

Sign translation raises various interesting issues about how machine translation can be used to make the translation automatic. Machine translation helps synthesize existing translation technologies into a workable and socially relevant application [156].

Another critical factor in an SLMT program is measuring the suitability of the translation. Several efforts have been made to measure translation systems' performance, considering both automatic and manual evaluation methods. The most common evaluation metrics used in the studies are WER (word error rate), PER (position-independent word error rate), BLEU (Bilingual Evaluation Understudy), and TER (Translation Error Rate) [121, 125, 142, 151]. Various other performance metrics and criteria of manual evaluation will be discussed in detail in Sect. 4.4.

Different approaches to machine translation have come up in past years for text to sign language translation. They can be broadly categorized as (1) rule-based machine translation; (2) corpus-based machine translation, which is sub-categorized into (a) example-based machine translation, (b) statistical machine translation, and (c) hybrid machine translation; and (3) neural machine translation. All approaches have their strengths and weaknesses (Fig. 1). Translation systems under these approaches will be discussed in the following part of the paper. Figure 2 gives the taxonomy of machine translation.

4.2.1 Rule-based machine translation

A rule-based machine translation (RBMT) system is based upon linguistic information about the source and target language. Rule-based systems take sentences of the source language as input and generate output sentences based on morphological, syntactic, and semantic analysis of both the source and target languages. An RBMT system consists of a source language morphological analyser, a source language parser and translator, a target language morphological generator, and a target language parser for composing the output sentences. The Vauquois Pyramid presented in Fig. 3 is used to describe the complexity and sophistication of the rule-based approaches [87].

Earlier approaches to rule-based translation include some outstanding works like the ZARDOZ system, the SL …
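Of the automatic metrics listed above, WER is the most widely reported: the word-level Levenshtein distance between the system output and a reference, normalized by the reference length. A minimal sketch (the gloss sequences used here are invented examples):

```python
def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: (substitutions + insertions + deletions) needed to
    turn the hypothesis into the reference, divided by the reference length."""
    n, m = len(reference), len(hypothesis)
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[n][m] / n

# One substitution in a three-gloss reference gives WER = 1/3.
print(wer(["I", "READ", "BOOK"], ["I", "READ", "NEWSPAPER"]))
```

PER is computed similarly but ignores word order, and BLEU and TER apply related n-gram and edit-based comparisons; the surveyed studies report these on gloss sequences before sign synthesis.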
Table 4 Projects following rule-based machine translation and their comparative analysis

[156] English to Irish, British, and Japanese Sign Language (1994)
  Description: Uses an Interlingua approach for translation
  Strengths: A complete infrastructure, including parsing, interlingua generation, and an animation component, has been implemented
  Limitations: Comprehensive grammar for all languages has not been developed
  Sign synthesis: Animation sequences

[164] English to American Sign Language (ASL) (2000)
  Description: A prototype MT system made based upon linguistic, spatial, and visual information
  Strengths: Takes into account the visual and spatial information associated with ASL signs
  Limitations: No evaluation of the system has been done
  Sign synthesis: Animations using Parallel Transition Networks (Pat-Nets) [8]

[161] British English to British Sign Language (BSL) (2000)
  Description: Translation on the basis of a phrase lookup approach instead of …
  Strengths: Demonstrates virtual signers with high fidelity to deliver legible …
  Limitations: Issues in language translation and understanding
  Sign synthesis: Animation is achieved by motion capture (Simon-the-Signer)

[166] English to SASL (2006)
  Description: Uses the STAG parser (earlier used in TEAM) in order to generate non-manual signs
  Strengths: Stress patterns are included to improve non-manual generation of signs
  Limitations: Signing space construction needs improvement
  Sign synthesis: 3D avatar

[131] Spanish to Spanish Sign Language (LSE) (2006)
  Description: Applied 153 rules for translation; GER and BLEU are used for evaluating results
  Strengths: Conducts field experiments using automatic performance metrics
  Limitations: Errors in speech recognition inhibit generation of gestures
  Sign synthesis: 3D avatar (VGuido)

[133] Spanish to LSE (2007)
  Description: Gesture sequence generation and gesture animation are the main focus
  Strengths: Develops an animated agent and a strategy for reducing gesture design time
  Limitations: The AGR agent is not sophisticated
  Sign synthesis: AGR (Agent for Gesture Animation) avatar made up of geometric shapes

[46] Greek to Greek Sign Language (GSL) (2007)
  Description: Combines natural language knowledge, machine translation, and avatar technology for dynamic generation of signs
  Strengths: The analysis adopted for GSL gives the multilayer information required for the performance of a grammatical GSL utterance
  Limitations: The system provides optimum results in a restricted, sub-language-oriented environment
  Sign synthesis: 3D avatar

[9] Spanish to LSE (2009)
  Description: Uses a separate module for analysis and transformation of the input text; the model also includes the mood preferences of the deaf person
  Strengths: Inclusion of the mood of the signer in translation
  Limitations: The evaluation conducted is informal
  Sign synthesis: Maxine animation engine

[88] Greek to Greek Sign Language (GSL) (2010)
  Description: Detailed implementation of the language processing component; focuses upon problems of knowledge extraction of SL grammar
  Strengths: The GSL conversion module is open-source, platform independent, and functional through the Web
  Limitations: The GSL conversion module works with a language-specific parser
  Sign synthesis: 3D sign representation

[126] Spanish to LSE (2014)
  Description: Follows a transfer-based approach and applies a word order generation algorithm to deal with the topic-oriented surface order of LSE
  Strengths: Follows algorithms to cover semantic and lexical gaps during translation
  Limitations: Use of glosses as final output
  Sign synthesis: Glosses

[4] Portuguese to Portuguese Sign Language (2014)
  Description: Several information extraction techniques are applied for sentiment analysis before being passed on to the animation stage
  Strengths: Use of open-source software for animation
  Limitations: Less linguistic understanding of the language
  Sign synthesis: 3D avatar using Blender

[48] French to French Sign Language (LSF) (2015)
  Description: Formalizes LSF production rules and triggers them with text pre-processing
  Strengths: Rules are formed based upon corpus analysis and are not influenced by text structure
  Limitations: The proposed evaluation model is premature
  Sign synthesis: 3D avatar

[17] French to French Sign Language (LSF) (2016)
  Description: Web application for sign language generation; a linguistic model is used
  Strengths: Focuses on both manual and non-manual articulators
  Limitations: Not available for public access
  Sign synthesis: WebGL for 3D animation

[42] Greek to GSL (2016)
  Description: Presents the implementation of a post-processing stage for grammar-based machine translation
  Strengths: Incorporation of an SL editor in the MT system, which increases the accessibility potential for deaf users
  Limitations: Evaluators demanded the availability of synonyms in the editing environment
  Sign synthesis: 3D avatar using HamNoSys transcription

[87] Greek to GSL (2018)
  Description: A large, good-quality parallel corpus is created for sign language
  Strengths: Produces good language models which can be used in SMT approaches
  Limitations: Only glosses are generated as final output
  Sign synthesis: Glosses

[102] Arabic to Arabic Sign Language (ArSL) (2019)
  Description: A semantic system is developed which performs lexical, semantic, and syntactic analysis on the input sentence
  Strengths: Creation of a parallel corpus in the health domain which is freely available for researchers
  Limitations: No animation for the SL output is produced
  Sign synthesis: Gloss and GIF images

[29] Vietnamese to Vietnamese Sign Language (VSL) (2019)
  Description: Uses a machine learning decision tree to convert structured sentences
  Strengths: Use of a decision tree to shorten complex Vietnamese sentences
  Limitations: The database is small, resulting in lesser accuracy
  Sign synthesis: 3D avatar using HamNoSys transcription
Table 5 Projects using example-based machine translation and their comparative analysis

[114] English to Dutch Sign Language (2005)
  Description: One of the initial approaches in the field of EBMT in SLMT; uses the Marker Hypothesis for segmentation of English text
  Strengths: The segmentation approaches used provided similar chunks of data, which helped in alignment during translation
  Limitations: Small corpus and dictionary size; no 3D animation for SL output
  Sign synthesis: Annotations using the ELAN tool

[5] Arabic to ArSL (2011)
  Description: A morphological analyzer and root extractor are used along with chunk-based EBMT to overcome small-corpus problems
  Strengths: Addition of a morphological analyzer and root extractor to the system
  Limitations: The small corpus leads to a missing-signs error problem
  Sign synthesis: Sign clips

[15] Any text to ASL (2012)
  Description: EBMT coupled with a genetic algorithm to produce naturalized animations
  Strengths: Use of specific form and emotion interpolation factors in SL animation
  Limitations: Automatic management of the signing area is not included
  Sign synthesis: 3D humanoid using Sign Modelling Language

[139] Turkish to Turkish Sign Language (2017)
  Description: A lexical supervision component is used in the learning and translating components of the procedure
  Strengths: Use of the k-fold cross-validation method for evaluation to determine training and test sets
  Limitations: Use of glosses as final SL output
  Sign synthesis: TSL glosses

[127] Vietnamese to Vietnamese Sign Language (2018)
  Description: Translation happens based on word type to reduce the problem of unexpected words in the sentence
  Strengths: Text processing resulted in better accuracy
  Limitations: Long processing time
  Sign synthesis: VSL glosses
Table 6 Projects using statistical machine translation and their comparative analysis

| Project | Description | Language | Year | Strengths | Limitations | Sign synthesis |
|---|---|---|---|---|---|---|
| [22] | A bilingual corpus trains the translation system, and methods to develop SL corpora are demonstrated | German to German Sign Language (DGS) | 2004 | Produces satisfactory results on a small corpus | Low translation performance | Glosses using annotation |
| [146] | Morpho-syntactic knowledge source of German is used to improve the translation quality | German to DGS | 2006 | Use of flexible POS parser allowing transforming words based on lexical assumptions | Poorly supported avatar | Glosses, avatar |
| [92] | Follows phrase-based translation which uses word reordering to speed up the process | Czech to Czech Sign Language | 2007 | The sign editor provides online access for creating and editing signs | Lack of intelligibility in isolated signs | Animation |
| [30] | Syntax and semantic differences between source and target languages are considered in developing this model | Thai to Thai Sign Language | 2007 | The model is simple, modular, accurate and user-friendly | No animation or 3D avatar as SL output | Pictures with text |
| [162] | Viterbi algorithms are applied along with context-free grammars in this three-stage translation process | Chinese to Taiwanese Sign Language (TSL) | 2007 | Evaluation results indicate that the model outperforms IBM Model 3 | No animation or 3D avatar as SL output | TSL glosses |
| [95] | The pre-processing module of the system (uses a word-tag list) is incorporated into the phrase-based and SFST architectures of the system | Spanish to LSE | 2011 | The pre-processing model reduces variability in the source language | No animation or 3D avatar as SL output | LSE glosses |
| [112] | Involves sense-based and pronunciation-based translation; a corpus of Japanese proper names is also constructed | Japanese to Japanese Sign Language | 2014 | Proper names are presented as CG animations | Only human hands and fingers are used to present SL output | 3D model of human hands and fingers |
| [3] | Uses a rich module of semantic interpretation, a language model and a support dictionary of signs | Arabic to ArSL | 2017 | 3D avatar is based on character behaviours and gestures used by ArSL users | Lack of synchronization between facial expressions, lip movements and hand movements | 3D avatar |
| [110] | A word-based translation process; uses IBM models for word alignment | English to Indian Sign Language | 2017 | No dependency on rules or grammar of either of the languages | No 3D animation as SL output | ISL glosses |
| [23] | SMT is applied on small corpora, giving out satisfactory results | Turkish to TID | 2019 | Produces satisfactory results with small corpora | No 3D animation as SL output | TID glosses |
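All of the statistical systems in Table 6 rest on the same Bayes-theorem decomposition (the noisy-channel formulation mentioned in Sect. 4.2.4): for an input sentence $w$ of the written language, the decoder searches for the sign-gloss sequence $\hat{g}$ that maximises

```latex
\hat{g} \;=\; \operatorname*{arg\,max}_{g} P(g \mid w)
        \;=\; \operatorname*{arg\,max}_{g} P(w \mid g)\, P(g)
```

where the translation model $P(w \mid g)$ and the gloss language model $P(g)$ are both estimated from the bilingual corpus, and the constant $P(w)$ can be dropped from the maximisation.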
Universal Access in the Information Society
Table 7 Projects using hybrid machine translation and their comparative analysis

| Project | Combination | Description | Language | Year | Strengths | Limitations | Sign synthesis |
|---|---|---|---|---|---|---|---|
| [162] | RBMT + SMT | Phrase-structured trees are used along with CFG rules to perform the translation | Chinese to Taiwanese Sign Language (TSL) | 2007 | Evaluation results indicate that the model outperforms IBM Model 3 | No animation or 3D avatar as SL output | TSL glosses |
| [115] | EBMT + SMT | Focuses on Irish Sign Language grammar and linguistics; also highlights the importance of native signers for manual evaluation | English to Irish Sign Language | 2007 | Sub-sentential chunking of data improves translation accuracy | No animation or 3D avatar as SL output | Annotated output |
| [113] | SMT + EBMT | The decoder of the system compares the input data with the bilingually aligned resources | English to Irish Sign Language | 2008 | Addition of avatar component | Naturalness of avatar needs improvement | Avatar using POSER animation software |
| [132] | RBMT + EBMT + SMT | Translation module has a hierarchical structure using all the MT systems in each step | Spanish to LSE | 2012 | Field evaluation is adopted to measure the real-time efficiency of the system | Naturalness of avatar is not comparable to human sign language | VGuido 3D avatar |
| [117] | EBMT + SMT | Adds an animation module to the previous project | English to Irish Sign Language | 2008 | Incorporates additional modules over the baseline system | Manual evaluation of signed output is not scheduled | 3D avatar |
| [97] | EBMT + SMT | Only data-oriented strategies are used in this system | Spanish to LSE | 2013 | Sign editor has options like predefined positions and orientations, which reduce sign creation time | Not reported | 3D avatar |
| [99] | EBMT + SMT + RBMT | Technology developed in the previous work has been adapted to a new domain | Spanish to LSE | 2013 | Translation system is developed for two separate domains | Inclusion of RBMT makes translation a time-consuming task | 3D avatar |
| [94] | EBMT + SMT | Analysis of different translation strategies and their integration to achieve the best accuracy | Spanish to LSE | 2014 | Extensive field evaluation results in real-time effectiveness of the system | Lack of normalization of LSE leads to several sign mistakes | 3D avatar |
| [89] | RBMT + SMT | Creating a large parallel corpus that is further used as training data | Greek to GSL | 2018 | Process does not need deep grammar knowledge of GSL | No use of animation technologies for sign synthesis | GSL glosses |
| [2] | RBMT + SMT | Building an artificial corpus using grammar dependency rules | English to ASL | 2019 | IBM algorithms are enhanced by integrating Jaro-Winkler distances | No use of animation technologies for sign synthesis | ASL glosses |
| [19] | RBMT + EBMT | Translation rules and a database of signs are used for the translation, and proper names are fingerspelled | Arabic text to ArSL | 2019 | Combination of linguistic knowledge and database of signs results in better translation accuracy | No 3D animation. No evaluation | GIF images |
| [79] | RBMT + SMT | After the application of rules, the intermediate result is fed into the statistical component | Turkish to Turkish Sign Language (TID) | 2019 | Language-specific rules increase the overall performance of the system | No 3D avatar or animation for SL output | TID glosses |
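Word error rate (WER), reported for many of the systems in Tables 5 to 7, is the word-level Levenshtein (edit) distance between the system output and a reference, normalised by the reference length. A minimal implementation:

```python
# Word Error Rate: minimum number of word insertions, deletions and
# substitutions needed to turn the hypothesis into the reference,
# divided by the number of reference words.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)

print(wer("the boy eats an apple", "the boy ate an apple"))  # → 0.2
```

The position-independent error rate (PER) quoted by some of the studies is computed similarly but ignores word order, comparing the two sides as bags of words.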
Table 8 Projects using neural machine translation and their comparative analysis

| Project | Description | Language | Year | Strengths | Limitations | Sign synthesis | Dataset |
|---|---|---|---|---|---|---|---|
| [103] | NMT system wholly based on attention was used in this video-to-video system | English to ASL | 2018 | Combines machine learning and deep learning | Tokenization errors due to low vocabulary size | Virtual avatar | 83,618 pairs of sentences [122] |
| [20] | The project uses a feed-forward back-propagation artificial neural network | Arabic to ArSL | 2019 | Morphological characteristics are utilized to derive maximum information from each word in the input to NMT | Limited database of sentences and signs | 3D avatar | 9715 pairs of sentences (interrogative, affirmative and imperative) [19] |
| [147] | Recurrent neural network method is combined with motion graphs | German to German Sign Language (DGS) | 2019 | Uses a motion graph to produce sign video frames comparable with avatar animation | Lower training resolution fails to produce sign sequences | HD sign video sequences | 8257 sequences of weather broadcasts [24], videotaped repeated production of 100 isolated signs [38], multiple videos of BSL sequences [16] |
| [138] | Progressive Transformer uses a counter decoding technique to predict continuous sign language sequences of varying length | German to DGS | 2020 | Uses several data augmentation techniques to improve SLP production | Focuses mainly on manual features of a sign, and under-expressed sign production | 3D sign pose sequences | 8257 sequences of weather forecast with 2887 German words and sign language videos of 1066 different signs [24] |
| [136] | Progressive Transformer architecture is employed with a conditional adversarial discriminator | German to DGS | 2020 | Appends regression loss with adversarial loss and introduces production of non-manual sign features | Uses skeletal representation of signs | 3D sign pose sequences | 8257 sequences of weather forecast with 2887 German words and sign language videos of 1066 different signs [24] |
| [157] | Uses a state-of-the-art human motion transfer method | English to ASL | 2020 | Study shows preference of videos over skeletal visualizations | Bad synthesis of the hands | Sign language videos | 60 h of sign language videos [37] |
| [137] | Transformer architecture with Mixture Density Network (MDN) is employed to generate skeletal poses | German to DGS | 2020 | Novel keypoint-based loss improves the quality of hand image synthesis; controllable video generation enables training on large and diverse datasets | Not reported | Continuous sign language videos | German vocabulary of 2887 words, sign language videos of 1066 different signs [24] and high-quality sign interpreter broadcast data |
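The BLEU, BLEU-1, BLEU-2, etc. figures quoted throughout this review are built on modified n-gram precision: counts of candidate n-grams are clipped by their maximum count in the reference. A minimal sketch of that core quantity (full BLEU additionally combines several n-gram orders geometrically and applies a brevity penalty):

```python
# Modified n-gram precision, the building block of BLEU-n scores:
# each candidate n-gram is credited at most as often as it occurs
# in the reference ("clipping"), which penalises over-generation.
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, candidate, n):
    ref_counts = Counter(ngrams(reference.split(), n))
    cand_counts = Counter(ngrams(candidate.split(), n))
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

ref = "the cat is on the mat"
cand = "the cat the cat on the mat"
print(modified_precision(ref, cand, 2))  # → 0.5
```

Without clipping, a degenerate candidate repeating a single correct word would score a perfect unigram precision; the clipped count removes that loophole.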
[Figure: machine translation projects mapped to the external language-processing tools they employ, including Google Tashkeel [5], AlKhalil Morpho System [19], BOUN morphological analyzer [79], SRI-LM [133, 95] and MOSES [23, 104]]
like dependency on perfect POS tagging and word sense disambiguation (WSD) [111].

San-Segundo et al. presented one of the first experiments of translating Spanish speech to a sign language translation system [131]. It followed a limited domain (identification documents, passports) of 458 words and 270 gestures. The sign gestures were represented using VGuido (the eSIGN 3D avatar) by taking an input script in the form of SiGML (Signing Gesture Markup Language). The study's evaluation resulted in a 27.2% gesture error rate and a BLEU (Bilingual Evaluation Understudy) score of 0.62%, which was due to a high percentage of deletions and other errors. Montero et al. also presented an architecture for translating Spanish into LSE gestures [133]. This work includes four modules: speech recognizer, semantic analysis, gesture sequence generation, and gesture playing. The central focus of the work is on the last two phases. For gesture animations, an animated agent is developed, and the agent positions created by the developer are combined with positions generated automatically by the system. Evaluation of the proposed strategy is done using a position distance metric and measurement of gesture complexity. Manual evaluation has not been conducted, which is also marked as a future endeavour in which the perception of deaf people will be considered. For Spanish to LSE translation, there is another rule-based approach, proposed by Baldassarri et al., which also has different modules for translation of phrases and sentences: morphological analyzer, grammatical analyzer, morphological transformer, grammatical transformer, and sign generator [9]. The signs are generated on the Maxine animation engine, which also includes the mood preferences of a deaf person, which is a step up from the previous work [10].

Regarding teaching SL grammar, Kouremenos et al. presented a prototype Greek text to Greek Sign Language (GSL) conversion system, which is integrated into an educational platform to address the needs of teaching GSL grammar [88]. Kouremenos et al. present the language processing component's detailed implementation in this work, focusing on SL grammar knowledge problems. In the same language pair, Fotinea et al. proposed a dynamic combination of linguistic knowledge and avatar performance [46]. GSL is annotated using HamNoSys, and VRML controlled by the STEP (Scripting Technology for Embodied Persona) language [60] is used for avatar animation. Manual and non-manual features like eye gaze and facial expressions have been considered, but the avatar lacks naturalness.

Later, systems using rule-based approaches followed sophisticated parsers and visual avatars for sign synthesis. Mazzei et al. use a dependency parser, semantic interpreter, and spatial planner to treat hand positions during sign generation [109]. The system evaluates and compares the results with a statistical translator using mean and standard deviation, giving out approximately the same results. Another rule-based translation approach, proposed by Porta et al. for translating Spanish to Spanish Sign Language glosses, addresses issues like the lexical gap using lexical-semantic relationships, topic-oriented analysis using a word order algorithm, and classifier predicates and classifier names [126]. The system is evaluated, giving BLEU as 0.30 and Translation Error Rate (TER) as 42%. The paper's linguistic error analysis indicates that the difference in the system output and
reference translations arises from the variations in the linguistic structures, and that classifier predicates are the most complex expressions to be generated. However, the paper is incomplete, as the translation system gives out glosses as the output, which is considered an intermediate symbolic representation and not a final output.

Almeida et al. use a rule-based pipeline with a deep structural transfer and analysis up to the semantic level to better cope with the language gap [4]. This system has also explored various challenges in the domain of avatar animation. It used Blender to set up a character with a rig move mechanism to generate a fluid animation of Portuguese Sign Language (LGP) utterances [169]. Though the results were positive, the deep linguistic study of sign language was a future concern.

In sign generation or sign synthesis, Braffort et al. developed a web application named KAZOO for French Sign Language (LSF) generation using a virtual signer [17]. The authors describe the platform's architecture in detail and discuss the WebGL technology used in it to display 3D animations of readymade animations together with synthetic animations based on an SL-specific linguistic model [178]. This research uses AZee as the linguistic model, which allows more flexibility, precision, and completeness, and pays attention to manual and non-manual articulators and combines them effectively [49]. Filhol et al. further use KAZOO in another French to LSF translation system based upon two distinct efforts: formalizing LSF production rules and triggering them with text pre-processing [48]. The AZee framework uses the concept of a "production rule" identified by its function. They formally describe the form to produce so that it can be unambiguously generated by SL synthesis with a virtual signer. This approach's second distinctive effort deals with breaking the entire text into as many text processing problems as there are rules available. Therefore each established rule's function creates an information extraction problem. This approach creates a large number of rules to make the system linguistically robust. On the other hand, a large set of rules leads to the problem of rule combination, thus creating confusion about how the rules are to be nested under one another.

RBMT approaches still hold a relevant place in text to SL translation due to the lack of parallel corpora required in modern machine translation systems. Lack of grammar and lack of sign language corpora are the two problems addressed by Kouremenos et al., who, to further improve upon them, propose a processing method for creating large and good-quality parallel data for SLs [87]. Kouremenos et al.'s system is an advancement over a Greek to GSL translation system of Efthimiou et al. It uses the open-source Python NLTK toolkit rather than Java technologies, and the open-source AUEB Greek POS parser [86] over the Greek ILSP parser, which was not open source [42]. The proposed system's gloss system is based on the Berkley gloss system, contains non-manual component information, and can be directly used for any 3D animation. The performance of the system is evaluated on a BLEU metric score.

In the Turkish language, Eryiğit et al., following a rule-based approach, perform translation of Turkish primary school educational material into TID (Turkish Sign Language) [44]. It follows a transfer-based approach in which the translation stage consists of translational rules from Turkish to TID. The input to the translational rules stage is an analysis of the source language produced by a Turkish NLP pipeline [45], and the output is fed into the animation layer. The animation signs are collected using a motion capture scheme, which uses RGB-D cameras to capture signs from native signers. Resource and corpus creation for TID is the main focus, for which it introduces a machine-readable representation scheme of TID, which is linked to the ELAN annotation tool. The work is part of an ongoing research project which is not yet complete; thus no evaluation is reported.

Luqman et al. develop a gloss notation system to transcribe Arabic Sign Language (ArSL) and also create a semantic RBMT to translate from Arabic to ArSL [102]. The translation process follows three main stages of morphological analysis, syntactic analysis, and ArSL generation. Input to the system is an Arabic sentence, and output is an ArSL sentence represented in a gloss notation displayed as a sign sequence of GIF images. A bilingual parallel corpus of 600 sentences targeting the health domain was built and translated into ArSL by two expert signers. The system generates 15 rules which cover the mapping at word, phrase, and sentence level. The out-of-vocabulary (OOV) problem is also handled using the synonyms of the OOV words. The results reported by the evaluation parameters BLEU, WER, and TER are 0.35, 0.55, and 0.53, respectively. The errors arise mainly from using a parser trained on news data, which is not appropriate for the proposed work dealing with health domain data.

Conversion of sentences to gloss notations means converting standard source language sentences into short sentences for the deaf. Nguyen et al. propose a rule-based approach that reduces or shortens spoken or written Vietnamese sentences by reducing prepositions, conjunctions, and auxiliary words and replacing synonyms [120]. The system evaluates 200 simple sentences, giving out a BLEU score of 97.5%, thus proving the proposed method's effectiveness. However, the system did not perform sign synthesis, which was incorporated by Da et al., who use a machine-learning decision tree (ID3) to convert Vietnamese sentences to short sentences of Vietnamese Sign Language [29]. In this project, a system is developed that builds a data set and applies ID3 to convert sentence structures to reduced SL forms. HamNoSys notation is used for transcription, and then SiGML code expresses the SL by a virtual signer. Manual evaluation
of the results indicates that the understanding of the generated clips is 97.06%. As in all RBMT approaches, the need for a more extensive dataset is the limitation of this project.

One of the latest applications of RBMT is a Pakistan Sign Language (PSL) project. In this project, Khan et al. use a grammar-based MT model to translate English sentences into equivalent PSL sequences using core NLP techniques [81]. Khan et al.'s project is one of the first projects in PSL using core NLP techniques. This approach analyzes the linguistic structure of PSL and formulates the grammatical structure of PSL sentences. The rules created by the analysis are formalized into a context-free grammar, used as a parsing module for translation and validation of target PSL sentences. Before the generation of rules, a dataset was created with the help of the deaf community and PSL experts. The created dataset is then extensively analysed for grammatical differences between English and PSL. Only the sentences present inside the sign database are animated for sign generation, and the rest are fingerspelled. In manual evaluation, valid and invalid sentences were used as parameters. The automatic evaluation was done with BLEU, WER, and TER as the performance metrics. The respective scores for all three metrics are 0.78, 0.10, and 0.15. The authors also discuss several points that can be incorporated in the future; one is to use a deep learning approach to generate more data. The rule-based approaches try to model linguistic knowledge to formalize rules, allowing processing data from the input source to the target languages.

As reviewed above, many distinctive projects have used a rule-based approach because it generates accurate results for small-sized datasets. Table 4 lists all the studies regarding RBMT discussed above and their comparative analysis.

4.2.2 Corpus-based machine translations

Corpus-Based Machine Translation (CBMT) is generated based on bilingual text. Though RBMT systems can produce efficient translations, constructing a whole RBMT system is a laborious task, as linguistic resources need to be handcrafted and new rules added to the system from time to time, making it time-consuming. CBMT systems, on the other hand, are based upon a large amount of bilingual data. CBMT systems, also sometimes called data-driven machine translations, are classified into three categories: (1) Example-based machine translation (EBMT), (2) Statistical machine translation (SMT), (3) Hybrid machine translation (HBMT).

4.2.3 Example-based machine translation

Makoto Nagao first suggested Example-Based Machine Translation in 1984. Training of EBMT is done on bilingual parallel corpora containing sentence pairs of both languages [118]. Sara Morrissey and Andy Way have been pioneers in using the example-based approach in SLMT systems. In one of the first approaches, Morrissey et al. use the Marker Hypothesis [52] to translate English to Dutch Sign Language [114]. The Marker Hypothesis proved to be a promising approach for the segmentation of English input text into chunks that can be aligned accurately with SL annotation. The ELAN annotation tool has been used for sign language corpora, including richly annotated data of three different sign languages, including Dutch Sign Language. Improvement was expected in the project's chunk alignment segment to make close matches between the English text and sign annotation.

Almohimeed et al. propose that EBMT is suitable to produce reasonable translation output even with existing small-size corpora [5]. It uses a corpus of 203 signed sentences for conversion from Arabic text to Arabic Sign Language in the domain of instructional language typically used in deaf education. The EBMT system works on chunks taken from the input text and aligned with the equivalent signs. Google Tashkeel is used in the pre-processing step to avoid ambiguity. In the final step, the signs are recombined, and the whole sentence is delivered using Windows Media Video (WMV). The evaluation results indicate a word error rate (WER) of 46.7% and a position-independent word error rate (PER) of 29.4% using the leave-one-out evaluation technique. The higher error rate is due to the translation of those sentences that are not similar to the examples, because EBMT depends on the quality of the examples. The results are expected to improve with a larger corpus.

Boulares et al. combine EBMT with a genetic algorithm and fuzzy logic to translate English into ASL [15]. This approach performs a global proximity search (Needleman-Wunsch algorithm [119]) to perform global alignment on two sequences and then proceeds to an example proximity search (Smith-Waterman algorithm [160]) for local alignment. This approach results in a set of scores representing proximity between all words in the two sentences. Boulares et al. use fuzzy logic concepts to detect compound emotion from the text, which yielded good results that can be used further in systems that use interpolations between different facial expression modules to produce emotions [15].

A bidirectional EBMT approach for Turkish to Turkish Sign Language (TSL) proposed by Selcuk-Simsek et al. prefers EBMT over other corpus-based approaches, as the grammar of TSL is barely known and EBMT is suitable for a limited dataset [139]. Learning and translation are the two primary components of this system, and both procedures use a lexical supervision component (LSC) as a subpart. The LSC is constituted of a morphological analyzer, an orthography control tool, and a disambiguation tool. Furthermore, k-fold cross-validation determines training and test sets, and results are obtained in terms of BLEU as 43% and TER as 38%. A
significantly lower TER rate was observed by Quach et al. in the process of converting Vietnamese grammar to Vietnamese Sign Language (VSL) [127]. A corpus of 740 sentences was used, and a TER value of 2.58% was observed. However, the system's processing time was long, as EBMT systems translate based on available sentence patterns. The EBMT approaches mentioned above are usually applied in projects having a limited dataset. Table 5 lists all the studies discussed regarding EBMT along with their comparative analysis. For larger datasets, this approach is unsuitable, as it is challenging to create a large number of examples. Thus, for larger datasets, we will discuss the implementation of statistical machine translation.

4.2.4 Statistical machine translation

Statistical translations are another form of Corpus-Based Machine Translation (CBMT) that works on the bilingual text paradigm. Statistical translation is based on probability distributions and was first approached with Bayes' theorem. Statistical translation, unlike RBMT, does not require manually developed rules, and unlike EBMT, is not suitable for small corpora but is only efficient with large bilingual corpora. Earlier works on statistical machine translation include Koehn et al. [84, 85], wherein the former works on word alignment and a framework that enables evaluating and comparing various phrase translation methods. The latter presents a suite of open-source toolkits that, along with an SMT decoder, includes a wide variety of training and tuning tools. Bungeroth et al.'s work is also one of the earlier attempts of SMT to translate German text into German Sign Language (DGS) [22]. In this approach, a statistical scheme modified from the IBM models [110] is proposed. Corpus size was an issue in this work, as only 167 sentence pairs for training and 33 for testing in DGS and German were investigated. The evaluation results of 59.9% word error rate (WER) and 23.6% position-independent word error rate (PER) indicated improvement over the referenced baseline model, but it was considered a small-scale example.

Stein et al. proposed another German-DGS translation system [146]. This paper's pre- and post-processing steps, based upon morpho-syntactic analysis of German, are included to enhance the machine translation. The pre-processing steps used a CG parser and POS tags, split words at breakpoints, and deleted words commonly unused in DGS. Post-processing steps try to curtail basic errors of translation algorithms. Results are measured using WER and PER, which signify a 9% improvement over the baseline system. Corpus building has been a task in much SMT research. Freksa et al. describe the first stage of corpus building and a translation system based on a phrase bilingual dictionary for Czech text to Czech Sign Speech synthesis, giving a sentence error rate of 50.5% [74]. Another Czech text-to-sign synthesis system, proposed by Krnoul et al., converts written text to an artificial human model animation [92]. The translation system implements a simple monotonic phrase-based decoder (SiMPAD) [75], which does not have a reordering module and uses a trigram language model. The SLAPE editor [91] is used to create and edit signs, and HamNoSys is used to represent intermediate signs, which are further synthesized in the H-Anim standard animation model. Dangsaart et al. presented the Intelligent Thai text-Thai sign translation (IT3STL), which follows re-ordering rules and a Thai-sign dictionary to convert Thai text to Thai Sign Language (TSL) [30]. It reaches an F-score of 97% for 297 sentences for language learning, using the knowledge base and architecture of Thai-Thai Sign Machine Translation (TTSMT) to enhance Thai sign language learning [31].

Data sparseness has been a significant problem in the statistical machine translation systems discussed above. Su et al. introduced thematic roles to counter the problem of data scarcity for Chinese to Taiwanese Sign Language translation [148]. Thematic roles attempt to capture similarities and differences in verb meanings, e.g., agent (the 'doer' or instigator of the action) [175]. The authors adopted Synchronous Context-Free Grammar (SCFG) instead of PCFG to convert a Chinese structure to a corresponding Taiwanese SL structure. The translation memory, which comprises thematic role sequences of both languages, contains the learned templates from which the bilingual corpus can be applied for thematic role sequence matching. Another important effort of this approach was the determination of verb agreements to produce expressive sign sequences. BLEU scores achieved by the system for long sentences (with n values of 3 and 4 in n-gram precisions) are 0.65 and 0.671, respectively. However, due to little research on Taiwanese SL linguistics, non-manual features are hardly constructed in this structural statistical approach.

Lopez-Ludena et al. proposed improvements to statistical methods of machine translation in their study. The approach in this paper analyzed two different statistical strategies: a phrase-based system (Moses [84]) and a Statistical Finite State Transducer (SFST) [100]. Two new methods are used to improve translation from Spanish to LSE. The first consists of implementing a categorization module (which replaces Spanish words with associated tags) in the pre-processing step before the translation. The second is the use of Factored Translation Models (FTMs) for improving translation performance. The first method has been incorporated into the phrase-based system and an SFST, but FTMs are considered only in the phrase-based system. These methods allow incorporating syntactic-semantic information during the translation process, thus reducing the source language variability and the number of words composing the input sentence. The evaluation reveals that
the categorization module's use increases BLEU from 69.1 to 78.8% for the phrase-based system and from 69.8 to 75.6% for SFST outputs. The inclusion of FTMs also increased the BLEU score from 69.1 to 73.9%. Lopez-Ludena et al. took a step up from this work, in which the pre-processing module replaces Spanish words with associated tags and removes the words having 'non-relevant' tags [96]. The pre-processing module was incorporated in a phrase-based and an SFST statistical translation system. A parallel corpus of 4080 Spanish sentences and their LSE translations has been used, and the evaluation results of BLEU rose from 73.8 to 81.0% in the phrase-based system and from 70.6 to 78.4% for SFST. The paper has also presented a human evaluation (two experts); the pre-processing module obtains an increase from 0.64 to 0.83 in the phrase-based system and from 0.65 to 0.82 in SFST.

Data collection is also a significant component of SMT systems. Stein et al. analyse existing data collections and emphasise their quality and usability for statistical machine translation [145]. This work analyses different existing corpora like the RWTH-Phoenix corpus [120], which is a richly annotated corpus, Corpus-NGT [121], and SIGNSPEAK [122]. The second part of the project deals with the preparation of sign language corpora for DGS and translation from German to DGS. Sentence end markers are introduced in the pre-processing phase of the translation process. The last phase of the paper discusses the optimization of the scarce-resource translation procedure. The results indicate whether the project's domain was suitable for the machine translation applied, but still, it had less impact and usefulness for the deaf user.

Miyazaki et al. proposed a Japanese proper name translation system that involved sense-based and pronunciation-based translation [112]. The sense-based translation is learned from phrase pairs in a Japanese-JSL (Japanese Sign Language) corpus, and the pronunciation-based one is learned from a Japanese proper name corpus. The corresponding CG animation, a high-quality 3D model of human hands and fingers, is created when a proper name is entered. The CG animation is rendered from scripts written in TVML (TM

translated English to ISL glosses using a word-based translation model, and the methodologies are implemented on MOSES [84]. The corpus consists of 326 English sentences and 537 ISL glosses. Integration with phrase-based translation for better results is the prospect of the paper [110].

In the Turkish language, Buz et al. presented a novel approach to convert primary school education book material to TID using SMT [23]. Earlier, Eryiğit et al. attempted the same language pair translation, following the RBMT approach [44]. In this work, five different approaches have been tested, each applying a different kind of pre-processing to the source data. In the first approach, no operation was performed on the data, giving a BLEU score of 61.69% and a WER of 42%. In the second approach, stemming was applied, which improved BLEU-1 to 77.66%; in the third approach, positive and negative tags were given to each verb to retain their meanings while stemming. In approaches 4 and 5, pronouns were added after examining verbs, leading to a drop in BLEU-2 and BLEU-3. Though the system indicated an excellent SMT approach even for smaller corpora, it lacked any manual evaluation and visual synthesis of signs. Table 6 lists all the SMT systems considered in this review. SMT and EBMT approaches do not require high-end linguistic knowledge, but they suffer from the limitation of parallel data unavailability. Thus several researchers have tried combining corpus-based and rule-based strategies to achieve better results and response.

4.2.5 Hybrid machine translation system

Multiple machine translation systems within a single machine translation system constitute Hybrid Machine Translation (HMT) systems. The need for a hybrid machine translation system arises from the failure of single machine translation systems to achieve an adequate accuracy level. For example, Hogan et al. at Carnegie Mellon University combined example-based, transfer-based, knowledge-based (a rule-based system displaying extensive semantic and pragmatic knowledge of the domain), and statistical translation sub-sys-
program Making Language). tems into one machine translation system [59].
Al-Barahamtoshy et al. use a rich module of semantic Wu et al. combine rule-based and statistical approaches
interpretation, language model, and support dictionary of to achieve translations from Chinese to Taiwanese Sign
signs to understand the type, tense, number, gender, and Language [162]. The authors use a parallel corpus of 2036
the semantic features for subject and object, which will be sentences of Chinese and a parallel annotated sequence of
scripted by a 3D avatar [3]. This approach uses SMT for corresponding Taiwanese Sign Language words from which
alignment, which is considered a function of transforma- CFG rules are created, and transfer probabilities are derived.
tion between source and target language. The future implica- Context-Free Grammars (CFGs) are formal grammar
tions of this work include acquiring facial expressions and designed for transfer-based statistical translations. Another
lip movement synchronized with hand orientation. corpus, a Chinese Treebank containing 36,925 manually
In the domain of ISL (Indian Sign Language), Mishra annotated sentences, is also used. Both corpora’s are used
et al. point the limitations of the Dasgupta et al. approach, to derive a probabilistic CFG. The system produced a BLEU
indicating that the latter was not a generic model to be score of 0.86 and was also manually evaluated with good,
followed for machine translation [32, 110]. Mishra et al. fair, and poor three traditional opinions. Though the results
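Deriving transfer probabilities from a rule-aligned corpus, as Wu et al. do, reduces in its simplest form to a relative-frequency estimate over observed rule pairs. The sketch below is generic and illustrative only; the rule strings and counts are invented, not taken from Wu et al.'s corpus:

```python
from collections import Counter, defaultdict

def transfer_probabilities(aligned_rule_pairs):
    """Relative-frequency estimate of P(target rule | source rule)
    from observed (source_rule, target_rule) applications."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_rule_pairs:
        counts[src][tgt] += 1
    return {src: {tgt: n / sum(tgts.values()) for tgt, n in tgts.items()}
            for src, tgts in counts.items()}

# Invented toy alignments: one source-side rule rewritten two ways
# on the sign-language side, with different frequencies.
pairs = [("S -> NP VP", "S -> NP VP")] * 3 + [("S -> NP VP", "S -> VP NP")]
probs = transfer_probabilities(pairs)
```

Rules seen more often in the aligned data receive proportionally higher transfer probability, which is what the derived probabilistic CFG encodes.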
13
Universal Access in the Information Society
Though the results generated were satisfactory, the system suffered from small-corpus problems and was consequently unsuitable for statistical translation. The use of rule-based approaches inhibits the system's extensibility to new language pairs. Also, the system did not produce any signing avatar.

Researchers have tried using HMT systems to improve practical applications of SLMT. Morrissey et al. use the MaTrEx machine translation system [121] and combine SMT and EBMT methodologies into a modular design that is, above all, adaptable for converting English to Irish Sign Language (ISL) [115]. The results indicate that the system does a good job of conversion but loses practical value as it does not have a signing avatar. Morrissey improves this system in a follow-up study [113]. In this approach, the MaTrEx decoder is fed with three bilingual data resources: aligned sentences, sub-sentential chunks, and words. For sign synthesis, POSER [170] animation software version 6.0 is used to create an animated avatar. Though the system used the concept of animation, the avatar lacked naturalness in its movement and expressions.

Morrissey et al. addressed corpus-driven MT predominantly for Irish Sign Language and DGS [117]. They extended the MaTrEx system with two additional modules for recognition and SL animation. Though SL recognition is out of our survey's scope, SL transcription and evaluation are other challenges handled in this work. The approach conducts two experiments using two different corpora. The Air Traffic Information System (ATIS) corpus [21] is used in the first experiment, and the POSER animation software tool is used for producing 3D human figures. In the second experiment, a medical corpus is used, and HamNoSys is used for annotation. Both manual and automatic evaluations have been conducted in the experiments.

San-Segundo et al. develop a comprehensive HMT approach wherein they initially implement and evaluate rule-based, example-based, and statistical translators, and the final version combines all the alternatives into a hierarchical structure [132]. A corpus of 2214 Spanish sentences was used, which two LSE experts converted to LSE. Different strategies are used for every module in this paper: the first is EBMT, wherein translation is carried out based on the similarity between the sentence to be translated and examples in the parallel corpus; the second follows a set of translation rules for rule-based translation; and for the last strategy, statistical translation, parallel corpora are used for training the language and translation models. Finally, the alternatives mentioned above are combined into a hierarchical structure. The system's evaluation is conducted on objective parameters like Sign Error Rate (SER), position-independent word error rate (PER), and BLEU. The results indicate that RBMT obtains better results than the EBMT and SMT systems because the rules introduce translation knowledge not seen in the parallel corpus. The combination of techniques improved the results manifold relative to the single-technique results. The paper also depicts the sign animation using VGuido [168] in eSignEditor [57].

Lopez-Ludena et al. presented a more refined version of the approach in which, instead of following all data-oriented strategies, the required modules are generated automatically from a parallel corpus [97]. The statistical translation includes a pre-processing module [95] that increases performance. In this step-up project, the combination of EBMT and SMT is used, and the rule-based strategy module is not included. The sign editor uses inverse kinematics (IK), which helps reduce the sign specification time. The whole system presents an SER 10% lower than the San-Segundo et al. approach [132]. Lopez-Ludena et al. again used the combination of EBMT and SMT in two new domains of hotel reception and bus information [99, 94].

In the hotel reception domain paper, Lopez-Ludena et al. use a declarative abstraction module for all internal components and H-Anim for sign generation [99]. The paper's objective evaluation was done using WER, SER, and average translation time, with respective scores of 6.7%, 10.7%, and 3.1 s. Subjective measurement is done in the form of questionnaires. This approach's main disadvantage is that the methodology is sequential, and the technology adaptation depends upon parallel corpus generation. Lopez-Ludena et al. follow the same translation strategy for the bus information domain but focus majorly on the sign synthesis part [94]. Non-manual signs are also considered during sign language generation. Another advantage of the representation module is its adaptation to different kinds of devices (computers, mobile phones, etc.). The subjective evaluation of the paper involves the two main aspects of intelligibility and naturalness. Automatic evaluation metrics indicate an SER below 10%, BLEU above 90%, and a translation time of 8.5 s.

Kouremenos et al. proposed a novel prototype system that creates a parallel corpus using RBMT and then uses the same corpus as training data [89]. A professional translator, with the help of RBMT, produces a high-quality parallel Greek-text-to-GSL glossed corpus of 1,015 sentences and 20,287 tokens. The RBMT uses different tools and technologies: AUEB's POS parser, the NLTK 3.0 suite, and Java and Perl scripts. Finally, the parallel corpus trains the MOSES SMT system. When evaluated using BLEU n-grams, the results indicate that the larger the n-gram, the better the translation accuracy. On similar lines, but without using a professional translator, Achraf et al. created an artificial corpus using grammatical dependency rules [2]. The corpus was of the English-ASL language pair and was fed into an SMT system. Both systems mentioned above produced gloss output and expressed the need to incorporate 3D animation in the future.

Brour et al. formulated a hybrid approach, ATLASLang MTS, combining EBMT and RBMT to facilitate translation from Arabic to ArSL [19].
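The hierarchical combinations described in this subsection (example lookup first, then rules, then statistics) reduce at their core to a fallback cascade. A minimal sketch, with toy stand-ins for the example database, rule set, and decoder; none of these are the components of any system surveyed here:

```python
def hybrid_translate(sentence, example_db, rules, smt_decode):
    """Fallback cascade typical of hybrid MT:
    EBMT exact-example lookup -> RBMT pattern rules -> SMT decoder."""
    if sentence in example_db:            # EBMT: reuse a stored translation
        return example_db[sentence]
    for pattern, rewrite in rules:        # RBMT: first matching rule wins
        if pattern in sentence:
            return rewrite(sentence)
    return smt_decode(sentence)           # SMT: statistical fallback

# Toy components, invented for illustration (Spanish text -> gloss strings).
example_db = {"buenos dias": "GOOD-MORNING"}
rules = [("gracias", lambda s: "THANK-YOU")]

def smt_decode(sentence):
    """Stub statistical decoder: here it just uppercases tokens."""
    return " ".join(w.upper() for w in sentence.split())
```

Real hybrid systems replace each stage with a full component (a similarity-based EBMT matcher, a rule engine, a trained SMT decoder), but the control flow is the same.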
In this hybrid approach, if the sentence to be translated exists in the database, EBMT is applied; if it does not, the rule-based interlingua approach is applied. For the pre-processing of Arabic text, the Alkhalil Morpho System [1] is used as an analyzer, and the SAFAR Platform [143] is used to transform the analyzed output from HTML to XML format. When the sentence exists in the database, it is displayed directly without any analysis; if it does not, eleven rules of syntactic reordering are applied after morpho-syntactic analysis. A database of GIF images is used to display the final output, and if a sentence contains a proper noun, it is animated using a 3D hand. The final version of the translation should use a 3D avatar instead of GIFs.

Kayahan et al. observe that language-specific rules tend to increase the overall system's performance and work efficiently when combined with other machine translation systems [79]. Their system converts spoken Turkish to Turkish Sign Language by combining rule-based and statistical machine translation. The whole system is divided into three components: the first is a rule-based translation component, which is Python-based and comprises 13 translation rules; the second is a pre-processor that reduces data sparseness for the next segment, a statistical translation component. The SMT component of the system uses MOSES for generating a language model and decoding the input sentence. The approach uses the BLEU metric for evaluation and reports scores of BLEU-4 12.64%, BLEU-3 19.28%, BLEU-2 31.48%, and BLEU-1 53.17%. The results reported are satisfactory; however, the system needs a virtual avatar tool for completeness.

Table 7 lists all hybrid translation strategies used in this study. A large amount of work in text to sign language translation has been done using the conventional machine translation approaches. Since the advent of neural networks, researchers worldwide have focused on bringing neural network technologies into this field.

4.2.6 Neural machine translations

The Neural Machine Translation (NMT) model uses an artificial neural network to predict the likelihood of a sequence of words. NMT systems require less memory than SMT systems, and they do not use separate language, translation, and reordering models but are one integrated model. NMT uses deep learning and representation learning to perform translations. The basic idea behind NMT is to encode a sequence of variable-length words into a fixed-length vector that summarizes the complete sentence [20]. NMT methods for translation from text to sign language are still underexplored and an open problem; a few but significant NMT works are discussed in this part.

As deep learning and artificial neural network technologies advanced machine translation methods, Manzano et al. used NMT to translate English to ASL [103]. The approach worked on the ASLG-PC dataset [122], consisting of 83,618 sentence pairs, and produced ASL glosses as output. The vocabulary size of the project was small, which led to tokenization errors.

A major translation effort from Arabic to Arabic Sign Language, ATLASLang, had earlier used classical machine translation methods, which suffered from the limitation of the linguistic knowledge necessary to develop the rules [19]. A dataset of around 9715 input–output pairs of Arabic-ArSL example sentences trains the ATLASLang MTS1 system. The process starts with a morpho-syntactic analysis wherein each word is given morphological characteristics, and then the sentence is encoded. On the target side, a target vector is generated using a feed-forward neural network with backpropagation. In the end, the vector produced is decoded using a 3D avatar. For sign generation in the form of a 3D avatar, an XML encoding of HamNoSys called SiGML was developed; by converting HamNoSys symbols to SiGML form, all signs have been displayed using the JASigning API. ATLASLang NMT outperformed ATLASLang MTS, with the former scoring a BLEU score of 0.79.

Another NMT system, Text2Sign, uses a Generative Adversarial Network and motion generation to produce sign videos from spoken language sentences [147]. The project is divided into phases; in the first phase, the Recurrent Neural Network (RNN) [26] method of NMT with Luong attention [101] is combined with Motion Graphs [90] to generate sign pose sequences. The resulting pose is used to condition a generative model that produces video sequences. Multiple datasets have been used in this approach: the PHOENIX-14T German weather broadcast dataset [24], the SMILE dataset [38] to train the multi-signer generation network, and HD dissemination material acquired by the Learning to Recognize Dynamic Visual Content from Broadcast Footage (Dynavis) project [16] to train the HD sign generation network. The use of multiple datasets demonstrates the robustness of the system. This approach counters the use of avatars and motion capture: for avatars, deep expert knowledge of animation is required, and the motion capture method is an unscalable and costly process. Despite this, the system cannot compete with the existing avatar approaches due to the low resolution of the translation training data.

Contrary to the Stoll et al. approach, which used gloss as a prior to generate sign sequences, the approach proposed by Saunders et al. focuses on automatic sign language production, learning the mapping between text and sign pose sequences directly [138]. This approach used a transformer architecture and produced 3D sign pose sequences as the final output. The approach increased SLP performance when evaluated on the PHOENIX-14T dataset.
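The encoding step at the heart of these NMT systems, folding a variable-length word sequence into one fixed-length vector [20], can be sketched with a toy, untrained recurrent encoder. This is purely illustrative: the weights are random, whereas real systems learn embeddings and recurrence parameters from parallel data.

```python
import math
import random

def encode(tokens, hidden_size=8, seed=0):
    """Toy Elman-style recurrent encoder: folds a variable-length
    token sequence into a single fixed-length hidden vector."""
    rng = random.Random(seed)
    # Random (untrained) word embeddings and recurrence weights.
    emb = {w: [rng.uniform(-1, 1) for _ in range(hidden_size)]
           for w in sorted(set(tokens))}
    W = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
         for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for w in tokens:
        x = emb[w]
        h = [math.tanh(x[i] + sum(W[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h  # same length regardless of how many tokens were read
```

Whatever the input length, the returned vector has hidden_size entries; a trained decoder then generates the target (gloss or pose) sequence from that summary.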
However, the approach focused mainly on hand and body articulators, thus ignoring a sign's non-manual features.

As non-manual features provide the contextual and grammatical information of a sign, Saunders et al. included NMFs in their adversarial multi-channel SLP approach [136]. This model fully encapsulated all sign articulators, thus generating realistic sign productions.

The approaches mentioned above mainly produced skeleton pose sequences, which resulted in under-articulation, and there have been no studies so far on whether they are helpful to deaf people. Ventura et al. go one step further than skeletal visualizations and generate realistic videos using the state-of-the-art human motion transfer method Everybody Dance Now (EDN) [25, 157]. For signing videos and keypoints, they use a subset of the How2Sign [37] dataset, a large dataset of ASL sign videos. The study results indicate that the generated videos are preferred over skeletal visualizations, but the model fails to generate high-quality hand images. Subsequently, the SIGNGAN approach used a Mixture Density Network [14] for more expressive sign production [137]. This approach produced photo-realistic continuous sign language videos directly from the spoken language. The model was trained on separate datasets: PHOENIX-14T and a corpus collected from sign language interpreter broadcasts. This system outperformed all baseline systems when evaluated on quantitative metrics and human perception.

All of the NMT approaches discussed above are state-of-the-art systems. They have been trained and evaluated on challenging datasets like PHOENIX-14T. Studies reviewed under NMT are listed in Table 8.

4.3 Sign synthesis

An avatar accompanies a complete SLMT system to perform signs. However, certain state-of-the-art systems have not synthesized the translation into an avatar but have presented the sign words using some notation. Gloss, too, is considered a notation system for representing sign language: a gloss is an approximation of a word of another language, written in upper-case stem form. Stein et al.'s statistical machine translation study uses gloss notation to represent DGS [146]. A rule-based system proposed by Porta et al. uses a transfer approach to convert Spanish to LSE glosses [126]. Several other annotation tools help in representing sign languages. Morrissey et al. used the ELAN annotation tool [171], which provides a graphical interface and helps illustrate the corpora in a video format [114]. Almohimeed et al. also use ELAN annotation to transcribe ArSL in their Arabic to ArSL translation [5]. Likewise, many recent approaches have confined their work to notation representation, though some, like the IT3STL and ATLASLang MTS approaches, have used pictures and GIF images as the presentation mode [19, 30].

Although many SLMT works focus on translation, several comprehensive research pieces have made sign generation a prominent part of the study. In this part of the paper, we will discuss some of the earliest approaches of SLMT which incorporated sign synthesis as a significant part.

The interlingua approach ZARDOZ describes a multilingual translation system using a blackboard control structure [154]. This system offers a complete generation phase, including a detailed avatar animation phase. Sign tokens are compiled into a Doll Control Language (DCL) program by the DCL animator [156]. This program controls an on-screen animated doll to articulate the correct gesture sequence. The TEAM project also uses gloss notation to present the SL output. In the last phase of the project, the sign synthesizer uses an avatar model to show each sign [164]. The TEAM project formed the basis for the translation work carried out in the SASL-MT project [165, 167]. That system differs from the TEAM project in that it did not integrate the translation phase with the animation phase.

The ViSiCAST and eSIGN projects produce SL output in HamNoSys notation, DRS, and HPSG [12] and focus on animation generation [107]. Though these projects cater to a wide area of SL animation, they lack the handling of non-manual features.

The ASL Workbench focuses majorly on the representation of ASL rather than translation [144]. It adopts the Movement-Hold model of sign notation, which divides signs into sequences according to their phonological features. Each phoneme comprises a set of features specifying its articulation. The author noted that a translation system could produce output in many forms, like glossed output, linguistic representations, and animated signing, but the project did not include any. However, the study tried presenting a phonetic structure for non-manual features.

The representations mentioned above help in understanding SLMT. Still, they are not sufficient in the real world, because deaf people need a system that shows the sign words performed by life-like virtual humans, which are much more suitable for sign representation. The following part of the paper discusses the current status of 3D sign synthesis in SLMT.

4.3.1 3D avatar

The 3D avatar as a sign representation mode started as early as the late nineties. ZARDOZ uses a doll animator to perform different articulations of signs. Advancing on the same scheme, TESSA came into play, which used the motion capture method to directly capture human signer movements and coupled this with virtual humans [12]. To produce smooth movements captured using the motion capture technique, TESSA used the Simon-the-Signer 3D model for animation. TESSA formed a stepping stone for another system named VANESSA.
VANESSA was one of several applications based on synthetic signing avatar technology developed under the eSIGN project. In this system, each lexicon entry is expressed in HamNoSys notation, which defines the manual and non-manual features of a sign. Further, the HamNoSys-scripted lexicons are automatically converted to SiGML notation to drive the avatar.

Many projects in the early 2000s focussed majorly on sign synthesis, one of them being the sign synthesis project by Grieve-Smith et al. This project took sign language text as input in ASCII-Stokoe notation, converted it into a linguistic representation and further into a 3D animation sequence in the Virtual Reality Modelling Language (VRML or Web3D), automatically rendered in a Web3D browser [54]. The prototype was one of the initial attempts to create a 3D avatar and thus faced several inverse kinematics problems. The paper also discusses adopting HamNoSys/SiGML or some other linguistic representation in the future.

Kennaway et al.'s synthetic animation system automatically synthesizes deaf signing animations from HamNoSys transcription [80]. A simple control model of hand movement is combined with inverse kinematics calculations for the placement of the arms. The synthesized animation is further combined with motion capture data for the spine and neck to add natural ambient motion. This approach produced results that compared favorably with the existing alternatives of that time.

Though the systems mentioned above focussed on synthesis technology to produce a better-performing sign language avatar, there was also a need to exploit natural language processing mechanisms to build structural rules and create a sign-coded lexicon. Efthimiou et al.'s Greek to GSL approach used the parsed output of GSL structure patterns, enriched with sign-specific information, to activate a virtual avatar in Web3D [41]. This approach also introduced the idea and importance of classifiers in sign language conversion systems. Karpouzis et al. present another Greek to GSL system, which utilizes language resources suitable for young students to implement a virtual signer software component using a VRML plug-in in a Web3D browser [77]. WebSign, a web-based technology, has also been majorly used by Mohamed Jemni et al. in his text to sign language approaches [70–72]. All the approaches describe the web-based module and its integration with the VRML player. This tool has been used to create sign dictionaries and to make information more accessible to deaf users.

Meanwhile, Huenerfauth et al. presented the idea of classifier predicates, wherein a 3D visualization of the arrangement of objects in the input English sentence is done using AnimNL and multimodal NLG technology (a technology for illustrating gestures that are not easily encoded as text strings) [63, 65, 66]. Huenerfauth et al.'s approaches also allowed a feedback evaluation, wherein the 3D prototype was compared against earlier motion capture approaches. Another significant work by Huenerfauth et al. added linguistically motivated pauses and variations in sign duration to avatars created with Sign Smith Studio [177] to improve the signers' performance [61]. The results indicate that signs were more comprehensible and understandable to deaf users after the proposed changes. Furthermore, Huenerfauth et al. also tested the effect of spatial reference and verb inflection on SL animations [67]. In the experiment, 10 paragraph-length stories in ASL were designed, and computer animations were scripted using Sign Smith Studio. Native deaf evaluators used a Likert scale to answer questions regarding the comprehensibility, understandability, and naturalness of the animations. The study results demonstrated that the inclusion of properly inflected ASL verbs produced more realistic animations.

Sign synthesis has been a significant focus in many Spanish to LSE projects. San-Segundo et al.'s approach incorporates avatar use after the system's transfer phase [133]. An animated agent is developed, which is a simple representation of a human person but is detailed enough to produce the accurate gestures required for sign language. This animated agent is a 2D agent named AGR (agent for gesture representation). This avatar had poor accuracy, which can be attributed to the 2D presentation. Baldassarri et al. adopted HamNoSys and SiGML notation as intermediary representations and then translated them into a signing avatar [9]. This work included a deaf person's mood and used the Maxine animation engine [10] to present 3D scenes in real time using a virtual avatar. The use of avatars kept gaining popularity, and people worldwide tried finding new ways to incorporate virtual signer animation into SLMT systems.

KAZOO used two animation frameworks, Octopus [18] and GeneALS [34], to create a specialized animator for animation generation [17]. The GeneALS framework is used to compute postures, and the co-articulation and combination capabilities of Octopus produce the final animation.

Another famous 3D sign synthesis standard used in various studies is the H-Anim standard [173]. With this standard, each phalanx of the avatar can be positioned and rotated using realistic human animations. Lopez-Ludena et al. used a sign editor module and H-Anim to reduce the time consumed in the sign generation process [98]. Lopez-Ludena et al. again used the same standard in a subsequent study where manual and non-manual features are composed to produce the final animation using Non-Linear Animation (NLA) techniques [94].

Kipp et al. focus majorly on the creation and evaluation of SL avatars [82]. This work has introduced delta testing as an evaluation method, a unique way of comparing avatars with human signers.
After discussing earlier projects like ViSiCAST, eSIGN, and PAULA (Practising ASL Using Linguistic Animation [33]), the paper proposes the use of the EMBR [58] character animation engine, as it offers a high degree of control over the animation and is publicly available. Evaluation tests for comprehensibility resulted in a score of 58.6%, which is close to the 62% score of the state-of-the-art method ViSiCAST. The paper stresses steering research towards SL synthesis, focussing majorly on non-manuals and prosody, to achieve next-level naturalness in avatars.

Another breakthrough project in 3D sign synthesis was the DICTA-SIGN project, the first multilingual system, which worked on four European languages (Greek, British, German, and French) [40]. The project aims to amalgamate recognition, animation, and machine translation techniques. The signs are synthesised using HamNoSys/SiGML notation. DICTA-SIGN includes prosodic information along with phonetic and grammatical information. The authors aimed to make DICTA-SIGN a multidisciplinary approach in the future.

As researchers became comfortable using 3D animation as a sign synthesis approach, they moved into the arena of making 3D animation more and more natural. Non-manual features are capable of carrying crucial linguistic information. Ebling et al. present a work that bridges the gap between the sign translation system's output and the sign animation system's input by incorporating non-manual features at the end of translation [39]. Sequence classification is used to generate non-manual information in sign languages. The glosses generated after translation serve as input to the sequence classification system. Sequential conditional random fields (CRFs) [150] are the state-of-the-art approach to perform sequence classification. The key feature of this paper, the cascading of sequential predictions, can be used for other sign languages.

In recent years, free software for creating 3D avatars has become popular. Software available for 3D creation includes Blender, Unity, and Maya [169, 174, 176]. Blender, being open software, has been used in many pieces of research. Almeida et al. used Blender to convert Portuguese to Portuguese Sign Language (LGP) [4]. For the same language pair, a prototype called OpenLibras was developed, which offered a multilingual platform for text-to-sign translation [158]. OpenLibras was an extension of VLibras, which followed the same suite of animation [7]. In this system, a 3D avatar model was created, composed of 82 bones for facial expressions, hand shapes, body, and arm movements. The model developed is capable of representing non-manual features.

The above overview of several papers dealing with 3D sign synthesis highlights the importance and advantages of the 3D avatar model over other sign synthesis representations like gloss, videos, pictures, and GIFs, because 3D avatars add more naturalness and flexibility to the presentation of SL.

4.4 Performance metric in SLMT

An SLMT performance metric is a standard measure of the degree of translation accuracy from a spoken language to a sign language. Evaluation of SLMT is broadly divided into manual and automatic evaluation. Both evaluation methods have their own performance metrics that help better understand the accuracy of the system. For manual evaluation, feedback from deaf users and SL experts is one of the best ways to check the understandability of the sign language generation output. Automatic evaluation performance metrics depend upon the type of machine translation. In the earlier approaches, like ZARDOZ and TESSA, evaluation was not a part of the project [12, 53, 155, 156].

The performance metrics followed in SLMT are more or less the same as those followed in text-to-text MT or transliteration. Word Error Rate (WER) and Sentence Error Rate (SER) are the primary and most widely used performance metrics in transliteration [121, 116]. SER computes the percentage of complete sentences that do not match the reference. WER calculates the distance (Levenshtein distance algorithm) between the reference and the candidate translation. The formula of WER is as follows, summing up all kinds of errors (substitutions, insertions, deletions):

WER = (S + D + I) / N

where S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of words in the reference.

As WER follows the order of the words, another performance metric, PER (Position-independent Word Error Rate), completely neglects the word order [151]. It measures the difference in the counts of words occurring in the candidate and reference sentences; the resulting number is divided by the number of words in the reference.

Translation Error Rate (TER) is another commonly used metric for MT evaluation. It attempts to measure the minimum amount of editing that a human would have to perform to convert the system output into the reference translation [142]. A lower TER indicates a higher accuracy of the MT system [108]. The TER score is determined from the minimum number of corrections Nb(op) over the average size of the references AvregNref:

TER = Nb(op) / AvregNref
13
Universal Access in the Information Society
MT evaluation metric to show a high correlation with human judgment. The BLEU score is a precision-based metric that compares a system's translation output against a set of reference translations by summing the 4-gram, trigram, bigram, and unigram matches found, divided by the sum of those found in the reference translation set. The output translation score lies between 0 and 1. BLEU is a precision measurement; a higher value indicates a better translation.

An improvement over BLEU was developed, called NIST [36]. NIST is also a precision measure, but it avoids BLEU's bias towards short n-gram candidates. This bias is termed the brevity penalty and was an unwanted effect of BLEU [108].

Though SLMT can use the same metrics, other parameters help measure the quality of an SLMT system's output. For example, San-Segundo et al. use the feedback of 10 users asked to recognize several gestures played by the system and measure the gesture quality based on it [133]. Similarly, Almeida et al. evaluate an interface created for 3D viewing in terms of usefulness, usability, translation quality, and the adequacy of the 3D avatar [4]. Thus, evaluation is divided into two categories: (1) automatic evaluation and (2) manual evaluation. The next part of the paper discusses both evaluation criteria and the importance of each.

4.4.1 Performance metric in RBMT

Cox et al. carried out a detailed evaluation of a translation system developed for communication between a deaf person and a clerk in the post office, to determine the usefulness of the Wray et al. prototype system for people who have BSL as their first language [28, 161]. The system's evaluation measures the quality of signs, the difficulty of performing a sign with TESSA, and the perception of the deaf users and post office clerks. On a 3-point scale from 1 (Low) to 3 (High), the average rating of acceptability by deaf persons was 1.9 with TESSA and 2.6 without TESSA. The clerks also rated the transactions completed with TESSA slightly lower than without using it. In conclusion, the average rating was 2.5 with TESSA and 2.6 without TESSA. For subjective evaluation, several questions were asked of deaf participants about ease of communication in the post office. The responses were measured on a 5-point scale from 1 (Very difficult) to 5 (Very easy). Herein the clerks responded that communication was "slightly easier" or "much easier" with TESSA than without it.

VANESSA, a significant project under the eSiGN project, was amongst several applications based on synthetic signing avatars [51]. The evaluation conducted in this project was a threefold user evaluation. The first was on-site with the live VANESSA system with two BSL users. The second was a laboratory-based evaluation with 7 BSL users. The last was a simple evaluation of the intelligibility of the sequences performed by the signing avatar by 15 BSL users. Though no automatic evaluation was done, the results of the manual evaluation were well documented, along with scale values marked by (1) Liked, (2) Quite liked, (3) Neutral, (4) Not much liked, (5) Disliked, and relevant participant comments. Fotinea et al. also subjected their system to evaluation regarding the system's usability and its appeal to the users concerning navigation, educational targets, and virtual signers' acceptability [46]. Evaluators' comments were categorized according to the virtual signer's naturalness, performance accuracy, and appearance.

A study is considered to be complete if both automatic and manual evaluations are conducted. Dangsaart et al.'s evaluation system measured translation accuracy (translation of sentence and sign representation) for automatic evaluation and user satisfaction for manual evaluation. Translation accuracy was measured in terms of intelligibility and fidelity by testers to determine if the system generates correct and reliable translations [30]. The performance metrics used in this case were accuracy, precision, recall, and F-score. The equations of the mentioned metrics are as follows:

Accuracy (Y|X) = |Y| / |X|, (Y, X > 0)

Precision (p) = |X ∩ Y| / |Y|, (Y, X > 0)

Recall (r) = |X ∩ Y| / |X|, (Y, X > 0)

F1 (r, p) = 2rp / (r + p)

User satisfaction was evaluated in terms of the thoughts and preferences of the deaf via questionnaire. The scale taken for the measurement was rated on 5 points (5-Excellent, 4-Good, 3-Fair, 2-Poor, 1-Very poor). The evaluation results show that the system performance satisfies the users' needs.

Translation speed is another metric that measures the quality of the system. Baldassarri et al. measure the system's performance based upon well-translated words, words in the correct order, and the translation speed [11]. Thus the time required to translate from a spoken language text to the animated sign indicates the translator's suitability for work in a real-time system. Translation speed and TER are also performance metrics used by Da et al. to measure the performance of translating Vietnamese television news into 3D SL animation [29].

Porta et al. have evaluated a Spanish to LSE translation system using the BLEU and TER performance metrics and compared the evaluation with other approaches [126]. They conduct six experiments for a comparative analysis with different methods based on the TER and BLEU performance metrics.
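To make the BLEU computation described above concrete, the following is a minimal single-sentence sketch (our own illustrative code, not taken from any surveyed system; real evaluations compute BLEU at corpus level, often against multiple references):

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if clipped == 0:
            return 0.0  # one zero precision zeroes the geometric mean
        log_prec_sum += math.log(clipped / total)
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec_sum / max_n)
```

For identical sentences the score is 1.0; a single missing 4-gram already lowers the score noticeably, which is one reason BLEU is usually reported over whole corpora rather than single sentences.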
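The set-based accuracy, precision, recall, and F1 formulas reported earlier for Dangsaart et al.'s evaluation can be sketched as follows (our own illustrative code, assuming X is the set of reference words and Y the set of candidate words, following that notation):

```python
def set_metrics(reference_words, candidate_words):
    """Precision, recall and F1 over word sets:
    p = |X ∩ Y| / |Y|,  r = |X ∩ Y| / |X|,  F1 = 2rp / (r + p)."""
    X, Y = set(reference_words), set(candidate_words)
    overlap = len(X & Y)          # |X ∩ Y|
    p = overlap / len(Y)
    r = overlap / len(X)
    f1 = 2 * r * p / (r + p) if overlap else 0.0
    return p, r, f1

p, r, f1 = set_metrics(["i", "read", "book"], ["i", "book", "like"])
# 2 of the 3 candidate words occur in the reference: p = r = f1 = 2/3
```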
The system does not produce any animation; thus, no evaluation of synthesis is done. The study also lacks manual evaluation, which is considered necessary for a complete SLMT. In the ATLAS project, a similar comparative evaluation has been presented, but using different metrics. In this project, the mean and standard deviation of correctly recognized correct and incorrect paraphrases by the proposed and referent systems have been compared [109].

The manual evaluation applied differs from system to system; on the contrary, automatic evaluation metrics are common to several systems. Luqman et al. present BLEU, WER, and TER as performance metrics for automatic evaluation. For manual evaluation, user feedback from one deaf person and one bilingual translator is collected, with good, fair, and poor ratings to measure translation quality [102]. In an Arabic translation approach, Al-Barahamtoshy et al. conduct only manual evaluation. They evaluate the quality of signs by classifying them into valid, partially valid, or not valid categories [3].

OpenSigns, a multilingual translation platform from multiple spoken languages to Brazilian Sign Language (LIBRAS), was compared with the original VLibras using the BLEU and WER performance metrics [158]. The evaluation was conducted in two phases. In the first phase, text-to-text translation was tested using BLEU and WER, and in the second phase, using the same metrics, text-to-gloss translation was tested. OpenSigns was one of the first systems using the BLEU and WER metrics to test a multilingual platform. A manual evaluation of the system was conducted using a three-part questionnaire, with every part consisting of different questions. The test was performed with 15 deaf Brazilian students and three LIBRAS interpreters from three public institutions. The user had to select between 5 responses. Though animation quality was not tested, this study tried to perform a complete and proper evaluation of the system.

Evaluation is a challenge when there is a lack of a substantial corpus for assessment. In a study based on English text to Pakistan Sign Language (PSL), the authors undertook the time-consuming task of compiling a sentence-level evaluation corpus of 500 sentences, which covers all categories of PSL sentences corresponding to various linguistic types of English [81]. The corpus is evaluated using the BLEU, WER, and TER performance metrics. Manual evaluation is also conducted with two deaf subjects and two bilingual experts in PSL and English. Another approach that reduces evaluation time is followed wherein only the translation rules are evaluated, and not a manual comparison of each sentence with its transcription. In this work, 820 transfer rules are extracted, and precision is calculated as

T (precision) = count(valid sentences) / count(sentences) × 100

We can infer from the above-submitted reviews that though manual evaluation is time-consuming, it still holds an important place to test a system's validity, usability, and overall acceptance. Automatic evaluation, a less time-consuming task, becomes a challenge in several RBMT situations due to a lack of reference texts.

4.4.2 Performance metric in corpus-based machine translation

The performance metrics for rule-based and data-based machine translation are more or less the same for automatic evaluation. In one of the earliest systems of example-based and statistical machine translation, MaTrEx, the evaluation was done against 'gold standard' annotations [115]. Two types of error measurement metrics, WER and PER, measured the distance between reference and candidate translations. Another EBMT system, proposed by Almohimeed et al., used the same performance metrics but applied a leave-one-out cross-validation technique before applying PER and WER [5]. In both cases, accuracy measurement metrics and manual evaluation were left out, thus falling short of a complete evaluation system.

Lopez-Ludena et al. evaluate their system's performance using both accuracy and error measuring metrics [100]. For accuracy, BLEU and NIST are used, and for error calculation mSER (multiple reference Sign Error Rate) and mPER (multiple reference Position-Independent Sign Error Rate) have been added. The paper compares three different alternatives of the system, and for comparison, the confidence interval (at 95%) for every BLEU score is also presented. This interval is calculated as

±Δ = 1.96 √(BLEU(100 − BLEU) / n)

A Spanish to LSE translation system proposed by San-Segundo et al. evaluates both RBMT and SMT systems in different modules [134]. The evaluation metrics used are SER and BLEU in both cases. These automatic evaluation metrics are capable of indicating that RBMT gives better results than SMT. Another evaluation used in this study is the use of confidence measures to inform the user of the translated signs' reliability. Three confidence levels are defined: (1) High confidence (confidence value higher than 0.5), (2) Medium confidence (confidence value between 0.25 and 0.5), (3) Low confidence (confidence value less than 0.25). Both the above Spanish translation systems lack manual evaluation.

Though the above systems paved a good way for the use of automatic metrics for the evaluation of translation systems, in a subsequent study [132] San-Segundo et al. performed an extensive field evaluation of the proposed prototype of a Spanish to LSE translation system. The system combined EBMT, SMT, and RBMT
translation methods and measured the performance of each method and of the integrated system, using translation time, SER, PER, BLEU, and NIST metrics. The field evaluation included objective measurements from the system and subjective measures from both Deaf users' and government employees' questionnaires. The assessment conducted helped to learn the actual views of deaf people and the problems they face regarding avatar naturalness.

The systems discussed above mostly produced gloss notations and thus relied only on automatic evaluation metrics. In systems where animation is the final output of the translation, manual evaluation is imperative to evaluate the signed production, especially in the absence of automatic evaluation. In Stein et al.'s approach, the translation is measured using the WER and PER metrics, and manual evaluation is conducted using human experts to rate the coherence of the sentence to the avatar output with numbers ranging from 1 (incomprehensible) to 5 (perfect match) [146]. In a Chinese to Taiwanese Sign Language translation system, Su et al.'s proposed approach is compared with the baseline system proposed by Wu et al. using the BLEU metric [148, 162]. The corpus is divided into short and long sentences, and BLEU scores are measured with different n-gram precisions. The second metric, WER, is used on a similar corpus for both the proposed and baseline approaches. For manual evaluation, two approaches were followed. First, the mean opinion score (MOS) method was adopted to score translation results for the proposed and baseline approaches. Scores for evaluating the translation were divided into five grades, from 1 for bad to 5 for excellent. The second subjective assessment evaluated the reading comprehension of the ten subjects involved in the evaluation. For this, fifth- and sixth-grade deaf students were invited as subjects, and a special educator designed 20 questions for testing the reading comprehension of the translated TSL sequences.

Similarly, the MaTrEx model had not synthesized the Irish SL gloss notation into virtual signs. In the following years, this system was extended to include bidirectionality, which added two modules: recognition and SL animation [117]. The translation part of the system was evaluated automatically using the BLEU, WER, and PER metrics. The animation part of the system was manually evaluated. The evaluation was done in terms of intelligibility (understanding how understandable the animation was) and fidelity (assessing how good a translation of the English to the animation was). A scale of 1–4 with qualifying descriptions was used for both metrics. The resultant feedback allowed the evaluator to attribute a completely negative, mostly negative, mostly positive, or completely positive rating to each translation. The evaluation was further concluded with a questionnaire of general questions regarding translation and technology. This evaluation was done in a web-based format by 4 ISL evaluators provided by the Irish Deaf Society.

A Spanish translation system proposed by Lopez-Ludena et al. uses automatic evaluation metrics at different levels [95]. First, the paper presents the BLEU, NIST, mSER, and mPER percentage values of the baseline phrase-based and Statistical Finite-State Transducer (SFST) approaches. Next, a pre-processing module is applied to both methods to reduce the occurrence of errors. The pre-processing module applies tags to the words manually as well as automatically. After applying the pre-processing module to both the phrase-based and SFST approaches, automatic evaluation is conducted using the same metrics to compare automatic and manual tag application and to see the effect of the pre-processing module. Lastly, two experts have been involved in manual evaluation to complete the analysis. Both experts evaluated every sentence with one of three possible scores: 1 (the sentence is well constructed and the meaning is the same as the original), 0.5 (there are errors but the meaning is the same as the original), and 0 (the sentences are not understandable, nor is the meaning the same as the original). Lastly, to show the correlation between the automatic metrics and human evaluation, the Pearson correlation has been presented. The study has tried to perform an exhaustive assessment of the whole system and kept the avatar presentation of signs and its human evaluation for future research.

A bidirectional EBMT translation process from Turkish to Turkish Sign Language proposed by Selcuk-Simsek et al. highlighted the importance of datasets for a fair evaluation [139]. In this system, three datasets were created to observe the characteristics of the system. All three datasets are evaluated separately using the BLEU and TER performance metrics. The TER metric was also used by Quach et al. in the Vietnamese EBMT system [127].

From the above discussion, it is apparent that BLEU is a critical metric, used in collaboration with metrics like TER and WER to measure a system's accuracy and precision.

4.4.3 Performance metrics in neural machine translation (NMT)

The performance metrics used in NMT for automatic evaluation are similar to those used in RBMT or data-driven approaches. In NMT, BLEU is also considered to be one of the most reliable performance metrics. ATLASLang NMT is compared with its original data-driven version, ATLASLang MTS 1, over 73 sentences using a 4-gram BLEU score [19, 20]. Apart from BLEU, NMT systems have used other performance metrics. Stoll et al. use BLEU, the ROUGE (ROUGE-L F1) score, and WER [147]. This system has been compared with a Gloss2Text system at different n-gram granularities [24]. This system also extends its evaluation to the quality of the generated output. For this, it uses the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Mean Squared Error (MSE) to assess
image quality [159]. SSIM is used to measure the perceptual degradation of images and videos in a broadcast by comparing a corrupted image to its original. PSNR and MSE are used to assess the quality of compressed images compared to their originals. The SIGNGAN approach used the same metrics for measuring the quality of synthesized images [137].

Another NMT study, proposed by Jung et al., follows different performance metrics apart from BLEU [73]. In this approach, a complex word reordering technique is formulated. The three MT evaluation metrics used are BLEU, METEOR, and RIBES [35, 69]. METEOR calculates word order similarity using the smallest number of chunks in which the system's results can be aligned with a reference sentence [35]. RIBES directly measures the number of reordering events between the system results and the reference sentence [69].

The results generated from the scores mentioned above were able to show that an NMT system can be created at a low cost and in a low-resource environment.

As NMT is a relatively new field compared to RBMT and data-driven approaches, several researchers in this area have not completed the evaluation part of their work. Between manual and automatic evaluation, the latter has mainly been seen in NMT works. Table 9 catalogs the different evaluation measures used in various studies, and Fig. 5 depicts them in a pie chart. In this section, state-of-the-art performance metrics have been discussed. BLEU has been used in many studies, as it is considered the standard metric for machine translation evaluation [73]. Furthermore, authors have used metrics like RIBES and METEOR to elevate the performance evaluation of their systems. As sign synthesis is an integral part of the above studies, metrics like SSIM are used to measure the quality of the synthesized output.
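The MSE and PSNR measures mentioned above can be sketched for two grayscale frames as follows (a minimal illustration of the formulas, not the evaluation code of [147] or [137]; SSIM involves local luminance, contrast, and structure terms and is usually taken from an image-processing library):

```python
import math

def mse(img_a, img_b):
    """Mean squared error between two equally sized grayscale images
    given as nested lists of pixel intensities (0-255)."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means the synthesized
    frame is closer to the reference frame."""
    err = mse(img_a, img_b)
    if err == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / err)

ref = [[120, 130], [140, 150]]
out = [[121, 129], [142, 148]]
# mse = (1 + 1 + 4 + 4) / 4 = 2.5; psnr = 10 * log10(255^2 / 2.5) ≈ 44.15 dB
```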
Table 9 Evaluation measures used in the reviewed studies

S. No. | Performance metric | No. of studies | References
1 | WER | 11 | [5, 23, 81, 92, 102, 113, 117, 146, 148, 158]
2 | PER | 12 | [5, 23, 95–97, 99, 113, 117, 132, 134, 145, 146]
3 | SER | 7 | [94–97, 99, 132, 134]
4 | TER | 6 | [29, 81, 126, 127, 139, 145]
5 | BLEU | 25 | [2, 20, 73, 79, 81, 89, 94–97, 99, 100, 102, 113, 117, 126, 131, 132, 134, 139, 145, 147, 148, 158, 162]
6 | NIST | 6 | [95–97, 99, 132, 134]
7 | METEOR | 2 | [73, 117]
8 | RIBES | 1 | [73]
9 | Gesture animation quality / gesture error rate / output picture quality (SSIM, PSNR, MSE) | 3 | [131, 133, 137, 147]
10 | F1-score | 1 | [30]
11 | Mean and standard deviation | 1 | [109]
12 | Manual evaluation | 13 | [3, 4, 28–30, 46, 95, 99, 132, 146, 158, 162]
(Fig. 5: pie chart of evaluation metric usage across the reviewed studies; BLEU accounts for 28%)
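As a concrete illustration of the most frequently used error metrics cataloged above, the WER and PER formulas given at the start of Sect. 4.4 can be sketched as follows (our own minimal implementation, not taken from any surveyed system; the PER variant here is one common simplification):

```python
from collections import Counter

def wer(reference, candidate):
    """Word error rate (S + D + I) / N, with N the reference length,
    computed via word-level Levenshtein distance."""
    ref, cand = reference.split(), candidate.split()
    # dp[i][j] = minimum edits turning ref[:i] into cand[:j]
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                            # deletions only
    for j in range(len(cand) + 1):
        dp[0][j] = j                            # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(cand) + 1):
            sub = 0 if ref[i - 1] == cand[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # substitution / match
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    return dp[-1][-1] / len(ref)

def per(reference, candidate):
    """Position-independent error rate: compares bag-of-words counts,
    ignoring word order entirely."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    matched = sum((ref & cand).values())
    n = sum(ref.values())
    return (n - matched) / n

# A pure reordering counts as errors for WER but not for PER:
# wer("i read the book", "read the book i") = 0.5, per(...) = 0.0
```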
5 Discussion on implications for research and practice

We consider the following avenues as the most promising for future research. These are discussed according to the categories of machine translation, sign synthesis, and a general view.

5.1 Rule-based machine translation

A rule-based system is a traditional approach to machine translation. However, when bilingual data are lacking, rule-based methods can still produce effective translation systems. The following may be some suggested areas of research.

5.1.1 Translating complex sentences

Present research mainly deals with the translation of words or simple sentences to sign language, reducing the translation's accuracy. Moreover, this facility is only useful to the deaf if there are platforms to convert the sentences used in everyday conversations, which can help deaf people communicate more effectively and efficiently. Thus, to improve the accuracy and usability of a translation system, complex and compound sentences (sentences with two independent clauses with a related idea) need to be covered.

5.2 Corpus-based machine translation

Thus corpus-based methods hold significant potential for future research. Some are listed below:

5.2.1 Lack of bilingual corpora in all SLs

Corpus-based machine translation systems work on sentence pairs of spoken and sign languages. A large corpus is essential for data-driven systems (e.g., statistical, hybrid, or neural). A few datasets have been created, such as RWTH-Phoenix-14T, ATIS, DICTA-SIGN, How2Sign, and the latest public DGS corpus [55]. However, corpora of similar size and depth do not exist for all sign languages. Henceforth, researchers can focus on creating large datasets for other sign languages.

5.2.2 Acquisition of data in multiple SLs

The general studies of SLMT present single-language translation systems. Though multilingual datasets like DICTA-SIGN and the MultiATIS++ corpus [163] exist, more efforts can be made to bridge communication barriers within different deaf communities. Thus there are plenty of possibilities for future developments in making SLs translatable like any other natural language. Henceforth, acquiring data in different sign languages is an area that holds high future potential.

5.3 Neural machine translation
fluent in sign language as compared to the corresponding spoken language. Thus sign generation is an essential topic of research.

5.4.1 Naturalness of the 3D avatar

Prevailing sign synthesis studies reveal that avatar animation is the most suitable sign production form amongst all kinds of sign synthesis. Despite its suitability, avatars have not been entirely accepted by the deaf community. As an avatar's design requires a considerable amount of hand engineering, its performance remains robotic and under-articulated. Non-manual components such as facial expressions and eye and eyebrow movements can increase the comprehensibility of a 3D avatar. For future studies, researchers can focus on temporal coordination between manual and non-manual components for signing to appear natural.

5.4.2 Alternative methods of sign generation

As the chances of achieving a completely acceptable 3D avatar are slight in the foreseeable future, it is wise to explore other sign synthesis methods. Videos most closely resemble the ground truth in overall structure and detail. Thus, sign generation capabilities can be analyzed to synthesize meticulous sign videos with signers of acceptable appearance. Furthermore, data processing strategies should be enhanced to focus on intricate features of SL data such as the size and speed of motion.

6 Limitations of the review

The main limitation of the present study is that the review is limited to one side of translation, i.e., text to sign language and not the other way around. The data extraction process has been conducted meticulously. A manual search for including all the articles under this field was done. However, some relevant articles published in different languages were not included in this study.

Disagreements between the authors arose regarding the inclusion and exclusion of some studies; these were resolved through discussions, and a joint agreement was achieved. The experience of one author in conducting systematic reviews was used notably at each stage. Study categories were decided based upon the available literature in the collected studies.

7 Conclusion

is the number of publications which come out every year. We have conducted a literature review to survey the techniques being used in this field. From more than 200 papers, we have selected 151 studies published in approximately 40 conferences and workshops and 30 reputed journals which contributed directly to this field. We classified the papers according to the different types of machine translation systems, sign language generation methods, and evaluation metrics used.

After analyzing the selected papers, we have noticed that the number of publications has been consistent over the last two decades. Although the balance of techniques has changed from conventional to contemporary styles, traditional techniques are still noticed. We have seen that in the nineties and early 2000s, RBMT held a monopoly over the machine translation field. The need for larger datasets paved the path for data-driven machine translation. Another consideration is that many papers have focussed on creating large bilingual corpora, which is an essential requirement of data-driven systems.

Several research efforts do not directly deal with the translation process but focus on the pre-processing of data and on sign synthesis. Sign synthesis has been the main focus of many researchers because it is relevant to the deaf. Several kinds of sign generation have been reported. Though a significant amount of research has been done in the sign synthesis field, a level of completeness and satisfaction for the deaf has not yet been achieved.

Evaluation is an essential aspect of SLMT, as the feedback of the deaf is the best source to understand the use, effectiveness, and limitations of proposed projects. We have reported both the manual and automatic evaluations and the respective performance metrics used in all the studies.

Lastly, we have identified several gaps in sign language translation and generation from the complete analysis. RBMT requires strong linguistic knowledge, which limits the translation system to a particular language. The amount of bilingual data is scarce in many languages, making it hard for data-driven techniques to produce accurate results. Even after two decades of research, the sign generation module lacks the naturalness, flexibility, and comfort required by the deaf. Artificial neural networks are the latest upcoming technology and have shown promising results in other fields. Based upon them, neural machine translation can achieve the much-awaited breakthrough in text-to-sign language translation in the coming future.
16. Bowden, R. et al.: Learning to recognise dynamic visual content from broadcast footage. https://cvssp.org/projects/dynavis/index.html. Last Accessed 2021/04/04
17. Braffort, A., et al.: KAZOO: a sign language generation platform based on production rules. Univers. Access Inf. Soc. 15(4), 541–550 (2016). https://doi.org/10.1007/s10209-015-0415-2
18. Braffort, A. et al.: Virtual signer coarticulation in octopus, a sign language generation platform. In: Proceedings of the 9th International Gesture Workshop, Gesture in Embodied Communication and Human–Computer Interaction, pp. 29–32 (2011)
19. Brour, M., Benabbou, A.: ATLASLang MTS 1: Arabic text language into Arabic sign language machine translation system. In: 2nd International Conference on Intelligent Computing in Data Sciences, pp. 236–245 (2019). https://doi.org/10.1016/j.procs.2019.01.066
20. Brour, M., Benabbou, A.: ATLASLang NMT: Arabic text language into Arabic sign language neural machine translation. J. King Saud Univ. Comput. Inf. Sci. (2019). https://doi.org/10.1016/j.jksuci.2019.07.006
21. Bungeroth, J. et al.: The ATIS sign language corpus. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, pp. 2943–2946 (2008)
22. Bungeroth, J., Ney, H.: Statistical sign language translation. In: Workshop on Representation and Processing of Sign Languages, pp. 105–108 (2004)
23. Buz, B., Gungor, T.: Developing a statistical Turkish sign language translation system for primary school students. In: IEEE International Symposium on Innovations in Intelligent SysTems and Applications, pp. 1–6 (2019). https://doi.org/10.1109/INISTA.2019.8778246
24. Camgoz, N.C. et al.: Neural sign language translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7784–7793 (2018). https://doi.org/10.1109/CVPR.2018.00812
25. Chan, C. et al.: Everybody dance now. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5933–5942 (2019)
26. Chung, J. et al.: Empirical evaluation of gated recurrent neural networks on sequence modeling (2014)
27. Coetzee, L. et al.: The national accessibility portal: an accessible information sharing portal for the South African disability sector. In: Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, pp. 44–53. Banff, Canada (2007). https://doi.org/10.1145/1243441.1243456
28. Cox, S., et al.: The development and evaluation of a speech-to-sign translation system to assist transactions. J. Hum. Comput. Interact. 16(2), 141–161 (2003). https://doi.org/10.1207/S15327590IJHC1602
29. Da, Q.L., et al.: Converting the Vietnamese television news into 3D sign language animations for the deaf. In: Duong, T., Vo, N.S. (eds.) Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 257. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05873-9_13
30. Dangsaart, S., et al.: Intelligent Thai text—Thai sign translation for language learning. Comput. Educ. 51(3), 1125–1141 (2008). https://doi.org/10.1016/j.compedu.2007.11.008
31. Dangsaart, S., Cercone, N.: Bridging the gap: Thai–Thai sign machine translation. In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, pp. 191–199 (2007)
32. Dasgupta, T., Basu, A.: Prototype machine translation system from text-to-Indian sign language. In: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 313–
33. Davidson, M.J.: PAULA: a computer-based sign language tutor for hearing adults. In: Intelligent Tutoring Systems Workshop on Teaching with Robots, Agents, and Natural Language Processing, pp. 66–72 (2006)
34. Delorme, M. et al.: Thumb modelling for the generation of sign language. In: Proceedings of the 9th International Conference on Gesture and Sign Language in Human–Computer Interaction and Embodied Communication, pp. 151–160 (2012). https://doi.org/10.1007/978-3-642-34182-3_14
35. Denkowski, M., Lavie, A.: Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the 9th Workshop on Statistical Machine Translation, pp. 376–380 (2014). https://doi.org/10.3115/v1/W14-3348
36. Doddington, G.: Automatic evaluation of machine translation quality using N-gram co-occurrence statistics. In: Proceedings of the 2nd International Conference on Human Language Technology Research, pp. 138–145 (2002)
37. Duarte, A.: Cross-modal neural sign language translation. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 1650–1654 (2019). https://doi.org/10.1145/3343031.3352587
38. Ebling, S. et al.: SMILE Swiss German sign language dataset. In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 19–25 (2018)
39. Ebling, S., Huenerfauth, M.: Bridging the gap between sign language machine translation and sign language animation using sequence classification. In: Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, pp. 2–9 (2015). https://doi.org/10.18653/v1/W15-5102
40. Efthimiou, E. et al.: DICTA-SIGN: sign language recognition, generation, and modelling: a research effort with applications in deaf communication. In: Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, pp. 80–84 (2009). https://doi.org/10.1007/978-3-642-02707-9_3
41. Efthimiou, E., et al.: Feature-based natural language processing for GSL synthesis. Sign Lang. Linguist. 10(1), 1–21 (2007). https://doi.org/10.1075/sll.10.1.03eft
42. Efthimiou, E., Dimou, S.F.A.: From grammar-based MT to post-processed SL representations. Univers. Access Inf. Soc. 15(4), 499–511 (2016). https://doi.org/10.1007/s10209-015-0414-3
43. Elliott, R. et al.: The development of language processing support for the ViSiCAST project. In: Proceedings of the 4th International ACM Conference on Assistive Technologies, pp. 101–108 (2000). https://doi.org/10.1145/354324.354349
44. Eryiğit, C., et al.: Building machine-readable knowledge representations for Turkish sign language generation. Knowl.-Based Syst. 108, 179–194 (2016). https://doi.org/10.1016/j.knosys.2016.04.014
45. Eryiğit, G.: ITU Turkish NLP web service. In: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 1–4 (2014). https://doi.org/10.3115/v1/E14-2001
46. Evita Fotinea, S., et al.: A knowledge-based sign synthesis architecture. Univers. Access Inf. Soc. 6(4), 405–418 (2008). https://doi.org/10.1007/s10209-007-0094-8
47. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Massachusetts (1998)
48. Filhol, M., et al.: A rule triggering system for automatic text-to-sign translation. Univers. Access Inf. Soc. 15(4), 487–498 (2016). https://doi.org/10.1007/s10209-015-0413-4
49. Filhol, M.: Combining two synchronisation methods in a linguistic model to describe sign language. In: Proceedings of the 9th International Conference on Gesture and Sign Language in
316 (2008). https://doi.org/10.1145/1378773.1378818 Human–Computer Interaction and Embodied Communication,
pp. 194–203 (2011). https://doi.org/10.1007/978-3-642-34182-3_18
50. Glauert, J., et al.: Linguistic modelling and language-processing technologies for Avatar-based sign language presentation. Univers. Access Inf. Soc. 6(4), 375–391 (2008). https://doi.org/10.1007/s10209-007-0102-z
51. Glauert, J.R.W., et al.: VANESSA—a system for communication between deaf and hearing people. Technol. Disabil. 18(4), 207–216 (2006). https://doi.org/10.3233/TAD-2006-18408
52. Gough, N.: Example-based machine translation using the marker hypothesis. PhD thesis. Dublin City University (2005)
53. Grieve-Smith, A.B.: English to American sign language machine translation of weather reports. In: Proceedings of the Second High Desert Student Conference in Linguistics. High Desert Linguistics Society, pp. 23–30 (1999)
54. Grieve-Smith, A.B.: SignSynth: a sign language synthesis application using Web3D and Perl. In: Gesture and Sign Language Based Human–Computer Interaction, pp. 134–145. London, UK (2001). https://doi.org/10.1007/3-540-47873-6_14
55. Hanke, T., et al.: Extending the public DGS corpus in size and depth. In: Proceedings of the 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pp. 75–82 (2020)
56. Hanke, T.: HamNoSys—representing sign language data in language resources and language processing contexts. In: Proceedings of the LREC Workshop on the Representation and Processing of Sign Languages, pp. 1–6 (2004)
57. Hanke, T., Popescu, H.: eSIGN deliverable D2.3: intelligent sign editor (2003)
58. Heloir, A., Kipp, M.: Real-time animation of interactive agents: specification and realization. Appl. Artif. Intell. 24(6), 510–529 (2010). https://doi.org/10.1080/08839514.2010.492161
59. Hogan, C., Frederking, R.: An evaluation of the multi-engine MT architecture. In: Proceedings of the Conference of the Association for Machine Translation in the Americas, pp. 113–123 (1998). https://doi.org/10.1007/3-540-49478-2_11
60. Huang, Z., Eli, A.: STEP: a scripting language for embodied agents. In: Proceedings of the Workshop of Lifelike Animated Agents, pp. 1–6 (2002)
61. Huenerfauth, M.: A linguistically motivated model for speed and pausing in animations of American sign language. ACM Trans. Access. Comput. 2, 2 (2009). https://doi.org/10.1145/1530064.1530067
62. Huenerfauth, M.: A multi-path architecture for machine translation of English text into American sign language animation. In: Proceedings of the Student Workshop at the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics, pp. 25–30 (2004). https://doi.org/10.3115/1614038.1614043
63. Huenerfauth, M.: American sign language generation: multimodal NLG with multiple linguistic channels. In: Proceedings of the ACL Student Research Workshop, pp. 37–42 (2005). https://doi.org/10.5555/1628960.1628968
64. Huenerfauth, M.: An accessibility motivation for an English-to-ASL machine translation system (2004)
65. Huenerfauth, M.: Generating American sign language classifier predicates for English-to-ASL machine translation. Ph.D thesis. University of Pennsylvania (2006)
66. Huenerfauth, M.: Spatial representation of classifier predicates for machine translation into American sign language. In: Proceedings of the Workshop on the Representation and Processing of Signed Languages, 4th International Conference on Language Resources and Evaluation, pp. 24–31 (2004)
67. Huenerfauth, M., Lu, P.: Effect of spatial reference and verb inflection on the usability of sign language animations. Univers. Access Inf. Soc. 11(2), 169–184 (2012). https://doi.org/10.1007/s10209-011-0247-7
68. Huenerfauth, M.P.: A survey and critique of American sign language natural language generation and machine translation systems. Technical report (2003)
69. Isozaki, H., et al.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 944–952 (2010). https://doi.org/10.5555/1870658.1870750
70. Jemni, M., et al.: A Web-based tool to create online courses for deaf pupils. In: Proceedings of the International Conference on Interactive Mobile and Computer Aided Learning, pp. 1–8. Amman, Jordan (2007)
71. Jemni, M., Elghoul, O.: A system to make signs using collaborative approach. In: International Conference on Computers for Handicapped Persons. Lecture Notes in Computer Science, pp. 670–677 (2008). https://doi.org/10.1007/978-3-540-70540-6_96
72. Jemni, M., Elghoul, O.: Towards Web-based automatic interpretation of written text to sign language. In: Proceedings of the 1st International Conference on ICT & Accessibility, pp. 43–48 (2008)
73. Jung, H.Y., et al.: Word reordering for translation into Korean sign language using syntactically-guided classification. ACM Trans. Asian Low-Resource Lang. Inf. Process. 19(2), 1–20 (2019). https://doi.org/10.1145/3357612
74. Kanis, J., et al.: Czech-sign speech corpus for semantic based machine translation. In: Sojka, P., Kopeček, I., Pala, K. (eds.) Proceedings of the 9th International Conference on Text, Speech and Dialogue, pp. 613–620 (2006). https://doi.org/10.1007/11846406_77
75. Kanis, J., Müller, L.: Automatic Czech—sign speech translation. In: Proceedings of the 10th International Conference on Text, Speech and Dialogue, pp. 488–495 (2007). https://doi.org/10.1007/978-3-540-74628-7_63
76. Kar, P., et al.: INGIT: limited domain formulaic translation from Hindi strings to Indian sign language. In: International Conference on Natural Language Processing (2007)
77. Karpouzis, K., Caridakis, G.: Educational resources and implementation of a Greek sign language synthesis architecture. Comput. Educ. 49(1), 54–74 (2007). https://doi.org/10.1016/j.compedu.2005.06.004
78. Katyana, Q.: Google's neural network learns to translate languages it hasn't been trained on. https://www.theregister.co.uk/2016/11/17/googles_neural_net_translates_languages_not_trained_on/. Last accessed 2020/09/23
79. Kayahan, D., Gungor, T.: A hybrid translation system from Turkish spoken language to Turkish sign language. In: IEEE International Symposium on Innovations in Intelligent Systems and Applications, pp. 1–6 (2019). https://doi.org/10.1109/INISTA.2019.8778347
80. Kennaway, R.: Synthetic animation of deaf signing gestures. In: 4th International Workshop on Gesture and Sign Language Based Human–Computer Interaction, pp. 146–157 (2002). https://doi.org/10.1007/3-540-47873-6_15
81. Khan, N.S., et al.: A novel natural language processing (NLP)-based machine translation model for English to Pakistan sign language translation. Cognit. Comput. 12, 748–765 (2020). https://doi.org/10.1007/s12559-020-09731-7
82. Kipp, M., et al.: Sign language avatars: animation and comprehensibility. In: Proceedings of the 10th International Conference on Intelligent Virtual Agents, pp. 113–126 (2011). https://doi.org/10.1007/978-3-642-23974-8
83. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Technical report EBSE-2007-01 (2007)
84. Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Companion Volume to the Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 177–180 (2007)
85. Koehn, P., et al.: Statistical phrase-based translation. In: Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference, pp. 48–54 (2003)
86. Koleli, E.: A new Greek part-of-speech tagger, based on a maximum entropy classifier. Master's thesis. Athens University of Economics and Business (2011)
87. Kouremenos, D., et al.: A novel rule based machine translation scheme from Greek to Greek sign language: production of different types of large corpora and language models evaluation. Comput. Speech Lang. 51, 110–135 (2018). https://doi.org/10.1016/j.csl.2018.04.001
88. Kouremenos, D., et al.: A prototype Greek text to Greek sign language conversion system. Behav. Inf. Technol. 29(5), 467–481 (2010). https://doi.org/10.1080/01449290903420192
89. Kouremenos, D., et al.: Statistical machine translation for Greek to Greek sign language using parallel corpora produced via rule-based machine translation. In: IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1–15 (2018)
90. Kovar, L., et al.: Motion graphs. In: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pp. 473–482 (2002). https://doi.org/10.1145/566570.566605
91. Krnoul, Z., et al.: 3D symbol base translation and synthesis of Czech sign speech. In: Proceedings of the 11th International Conference on Speech and Computer, pp. 530–535 (2006)
92. Krňoul, Z., Železný, M.: Translation and conversion for Czech sign speech synthesis. Lect. Notes Comput. Sci. 4629, 524–531 (2007). https://doi.org/10.1007/978-3-540-74628-7_68
93. Le, H.P., et al.: A hybrid approach to word segmentation of Vietnamese texts. In: Proceedings of the 2nd International Conference on Language and Automata Theory and Applications, pp. 240–249 (2008). https://doi.org/10.1007/978-3-540-88282-4_23
94. López-Ludeña, V., et al.: Translating bus information into sign language for deaf people. Eng. Appl. Artif. Intell. 32, 258–269 (2014). https://doi.org/10.1016/j.engappai.2014.02.006
95. López-Ludeña, V., et al.: Automatic categorization for improving Spanish into Spanish Sign Language machine translation. Comput. Speech Lang. 26(3), 149–167 (2012). https://doi.org/10.1016/j.csl.2011.09.003
96. López-Ludeña, V., et al.: Factored translation models for improving a speech into sign language translation system. In: Proceedings of the Conference of the International Speech Communication Association, pp. 1605–1608 (2011)
97. López-Ludeña, V., et al.: Increasing adaptability of a speech into sign language translation system. Expert Syst. Appl. 40(4), 1312–1322 (2013). https://doi.org/10.1016/j.eswa.2012.08.059
98. López-Ludeña, V., et al.: Methodology for developing an advanced communications system for the deaf in a new domain. Knowl.-Based Syst. 56, 240–252 (2014). https://doi.org/10.1016/j.knosys.2013.11.017
99. López-Ludeña, V., et al.: Methodology for developing a speech into sign language translation system in a new semantic domain. In: Proceedings of the Conference on IberSPEECH, pp. 193–203. Madrid, Spain (2012)
100. Ludeña, V.L., San-Segundo, R.: Statistical methods for improving Spanish into Spanish sign language translation. In: Proceedings of the 15th Mexican International Conference on Artificial Intelligence, pp. 1–11 (2016)
101. Luong, M.T., et al.: Effective approaches to attention-based neural machine translation. In: Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421 (2015). https://doi.org/10.18653/v1/d15-1166
102. Luqman, H., Mahmoud, S.A.: Automatic translation of Arabic text-to-Arabic sign language. Univers. Access Inf. Soc. 18(4), 939–951 (2019). https://doi.org/10.1007/s10209-018-0622-8
103. Manzano, D.M.: English to ASL translator for SPEECH2SIGNS (2018)
104. Marshall, I., Safar, E.: Extraction of semantic representations from syntactic CMU link grammar linkages. In: Recent Advances in Natural Language Processing, pp. 154–159 (2001)
105. Marshall, I., Safar, E.: Grammar development for sign language avatar-based synthesis. In: Proceedings of the 11th International Conference on Human Computer Interaction (2005)
106. Marshall, I., Safar, E.: Sign language generation in an ALE HPSG. In: Proceedings of the 11th International Conference on Head-Driven Phrase Structure Grammar, pp. 189–201 (2004)
107. Marshall, I., Sáfár, É.: A prototype text to British sign language (BSL) translation system. In: 41st Annual Meeting of the Association for Computational Linguistics, pp. 113–116 (2003). https://doi.org/10.3115/1075178.1075194
108. Mauser, A., Ney, H.: Automatic evaluation measures for statistical machine translation system optimization. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, pp. 28–30
109. Mazzei, A., et al.: Deep natural language processing for Italian sign language translation. In: Proceedings of the 13th Conference of the Italian Association for Artificial Intelligence, pp. 193–204 (2013). https://doi.org/10.1007/978-3-319-03524-6_17
110. Mishra, G.S., et al.: Word based statistical machine translation from English text to Indian sign language. ARPN J. Eng. Appl. Sci. 12(2), 481–489 (2017)
111. Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford University Press, Oxford (2005)
112. Miyazaki, T., et al.: Proper name machine translation from Japanese to Japanese sign language. In: Language Technology for Closely Related Languages and Language Variants, pp. 67–75 (2014). https://doi.org/10.3115/v1/w14-4209
113. Morrissey, S.: Assistive technology for deaf people: translating into and animating Irish sign language. In: Proceedings of the 12th International Conference on Computers Helping People with Special Needs, pp. 8–14 (2008)
114. Morrissey, S., Way, A.: An example-based approach to translating sign language. In: Workshop on Example-Based Machine Translation (MT X-05), pp. 109–116 (2005)
115. Morrissey, S., Way, A.: Joining hands: developing a sign language machine translation system with and for the deaf community. In: Proceedings of the Conference and Workshop on Assistive Technologies for People with Vision & Hearing Impairments, pp. 1–6 (2007)
116. Morrissey, S., Way, A.: Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation. In: Proceedings of the 5th SALTMIL Workshop on Minority Languages at the Language Resources and Evaluation Conference, pp. 91–98 (2006)
117. Morrissey, S., Way, A.: Manual labour: tackling machine translation for sign languages. Mach. Transl. 27(1), 25–64 (2013). https://doi.org/10.1007/s10590-012-9133-1
118. Nagao, M.: Framework of a mechanical translation between Japanese and English by analogy principle. In: Proceedings of the International NATO Symposium on Artificial and Human Intelligence, pp. 173–180 (1984). https://doi.org/10.7551/mitpress/5779.003.0038
119. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970). https://doi.org/10.1016/0022-2836(70)90057-4
120. Nguyen, T.B.D., et al.: A rule based method for text shortening in Vietnamese sign language translation. In: Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing (2018). https://doi.org/10.1007/978-981-10-7512-4_65
121. Nießen, S., et al.: An evaluation tool for machine translation: fast evaluation for MT research. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation, pp. 39–45 (2000)
122. Othman, A., Tmar, Z.: English-ASL gloss parallel corpus 2012: ASLG-PC12. In: 5th Workshop on the Representation and Processing of Sign Languages (2012)
123. Otoom, M., Alzubaidi, M.A.: Ambient intelligence framework for real-time speech-to-sign translation. Assist. Technol. 30(3), 119–132 (2018). https://doi.org/10.1080/10400435.2016.1268218
124. Papageorgiou, H., et al.: A unified POS tagging architecture and its application to Greek. In: Proceedings of the 2nd Language Resources and Evaluation Conference, pp. 1455–1462 (2000)
125. Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002). https://doi.org/10.3115/1073083.1073135
126. Porta, J., et al.: A rule-based translation from written Spanish to Spanish sign language glosses. Comput. Speech Lang. 28(3), 788–811 (2014). https://doi.org/10.1016/j.csl.2013.10.003
127. Quach, L., Nguyen, C.-N.: Conversion of the Vietnamese grammar into sign language structure using the example-based machine translation algorithm. In: International Conference on Advanced Technologies for Communications, pp. 27–31 (2018). https://doi.org/10.1109/ATC.2018.8587584
128. Safar, E., Glauert, J.: Computer modelling. In: Pfau, R., et al. (eds.) Sign Language, pp. 1075–1102. De Gruyter Mouton, Berlin (2012). https://doi.org/10.1515/9783110261325.1075
129. Safar, E., Marshall, I.: The architecture of an English-text-to-sign-languages translation system. In: Recent Advances in Natural Language Processing, pp. 223–228. Bulgaria (2001)
130. Sáfár, É., Marshall, I.: Sign language translation via DRT and HPSG. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics, pp. 58–68 (2002). https://doi.org/10.1007/3-540-45715-1_5
131. San-Segundo, R., et al.: A Spanish speech to sign language translation system for assisting deaf-mute people. In: Proceedings of the 9th International Conference on Spoken Language Processing, pp. 1399–1402 (2006)
132. San-Segundo, R., et al.: Design, development and field evaluation of a Spanish into sign language translation system. Pattern Anal. Appl. 15(2), 203–224 (2012). https://doi.org/10.1007/s10044-011-0243-9
133. San-Segundo, R., et al.: Proposing a speech to gesture translation architecture for Spanish deaf people. J. Vis. Lang. Comput. 19(5), 523–538 (2008). https://doi.org/10.1016/j.jvlc.2007.06.002
134. San-Segundo, R., et al.: Speech to sign language translation system for Spanish. Speech Commun. 50(11–12), 1009–1020 (2008). https://doi.org/10.1016/j.specom.2008.02.001
135. Sandler, W., Lillo-Martin, D.: Sign Language and Linguistic Universals. J. Linguist. 42(3), 738–742 (2006). https://doi.org/10.1017/CBO9781139163910
136. Saunders, B., et al.: Adversarial training for multi-channel sign language production. In: Proceedings of the British Machine Vision Conference (2020)
137. Saunders, B., et al.: Everybody sign now: translating spoken language to photo realistic sign language video (2020)
138. Saunders, B., et al.: Progressive transformers for end-to-end sign language production. In: Proceedings of the European Conference on Computer Vision, pp. 687–705 (2020). https://doi.org/10.1007/978-3-030-58621-8_40
139. Selcuk-Simsek, M., Cicekli, I.: Bidirectional machine translation between Turkish and Turkish sign language: a data-driven approach. Int. J. Nat. Lang. Comput. 6(3), 33–46 (2017). https://doi.org/10.5121/ijnlc.2017.6303
140. Shieber, S.M., Yves, S.: Synchronous tree-adjoining grammars. In: Proceedings of the 13th International Conference on Computational Linguistics, pp. 253–258 (1990). https://doi.org/10.3115/991146.991191
141. Sleator, D.D., Temperley, D.: Parsing English with a link grammar. Technical report CMU-CS-91-196 (1991)
142. Snover, M., et al.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pp. 223–231. Cambridge, MA (2006)
143. Souteh, Y., Bouzoubaa, K.: SAFAR platform and its morphological layer. In: Proceedings of the 11th Conference on Language Engineering, pp. 14–15 (2011)
144. Speers, d'A.L.: Representation of American sign language for machine translation. Ph.D thesis. Georgetown University (2001)
145. Stein, D., et al.: Analysis, preparation, and optimization of statistical sign language machine translation. Mach. Transl. 26(4), 325–357 (2012). https://doi.org/10.1007/s10590-012-9125-1
146. Stein, D., et al.: Morpho-syntax based statistical methods for automatic sign language translation. In: Proceedings of the 11th Annual Conference of the European Association for Machine Translation, pp. 169–177. Oslo, Norway (2006)
147. Stoll, S., et al.: Text2Sign: towards sign language production using neural machine translation and generative adversarial networks. Int. J. Comput. Vis. 128, 891–908 (2020). https://doi.org/10.1007/s11263-019-01281-2
148. Su, H.Y., Wu, C.H.: Improving structural statistical machine translation for sign language with small corpus using thematic role templates as translation memory. IEEE Trans. Audio Speech Lang. Process. 17(7), 1305–1315 (2009). https://doi.org/10.1109/TASL.2009.2016234
149. Suszczanska, N., et al.: Translating Polish texts into sign language in the TGT system. In: 20th IASTED International Multi-Conference on Applied Informatics, pp. 282–287 (2002)
150. Sutton, C., McCallum, A.: An introduction to conditional random fields. Found. Trends Mach. Learn. 4(4), 267–373 (2012). https://doi.org/10.1561/2200000013
151. Tillmann, C., et al.: Accelerated DP based search for statistical translation. In: Proceedings of the 5th European Conference on Speech Communication and Technology, pp. 2667–2670 (1997)
152. Tokuda, M., Okumura, M.: Towards automatic translation from Japanese into Japanese sign language. Assist. Technol. Artif. Intell. Robot. User Interfaces Nat. Lang. Process. 1458, 97–108 (1998). https://doi.org/10.1007/bfb0055973
153. Toutanova, K., et al.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Conference of the North American Chapter of the Association for Computational Linguistics & Human Language Technologies on Human Language Technologies, pp. 173–180 (2003)
154. Veale, T., et al.: The challenges of cross-modal translation: English-to-sign-language translation in the Zardoz system. Mach. Transl. 13(1), 81–106 (1998). https://doi.org/10.1023/A:1008014420317
155. Veale, T., Collins, B.: Space, metaphor and schematization in sign: sign language translation in the ZARDOZ system.
In: Proceedings of the 2nd Conference of the Association for Machine Translation in the Americas, pp. 168–179 (1996)
156. Veale, T., Conway, A.: Cross modal comprehension in ZARDOZ, an English to sign-language translation system. In: 4th International Workshop on Natural Language Generation, pp. 67–72 (1994). https://doi.org/10.3115/1641417.1641450
157. Ventura, L., et al.: Can everybody sign now? Exploring sign language video generation from 2D poses. In: Sign Language Recognition, Translation and Production Workshop (2020)
158. Veríssimo, V.M., et al.: Towards an open platform for machine translation of spoken languages into sign languages. Mach. Transl. 33(4), 315–348 (2019). https://doi.org/10.1007/s10590-019-09238-5
159. Wang, Z., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
160. Waterman, M., Smith, T.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981). https://doi.org/10.1016/0022-2836(81)90087-5
161. Wray, A., et al.: A formulaic approach to translation at the post office: reading the signs. Lang. Commun. 24, 59–75 (2004). https://doi.org/10.1016/j.langcom.2003.08.001
162. Wu, C.H., et al.: Transfer-based statistical translation of Taiwanese sign language using PCFG. ACM Trans. Asian Lang. Inf. Process. 6(1), 1–18 (2007). https://doi.org/10.1145/1227850.1227851
163. Xu, W., et al.: End-to-end slot alignment and recognition for cross-lingual NLU. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 5052–5063 (2020)
164. Zhao, L., et al.: A machine translation system from English to American sign language. In: Proceedings of the 4th Conference of the Association of Machine Translation, pp. 293–300 (2000). https://doi.org/10.1007/3-540-39965-8_6
165. Van Zijl, L.: South African sign language machine translation project. In: Proceedings of the 8th International Conference on Computers and Accessibility, pp. 233–234 (2006). https://doi.org/10.1145/1168987.1169031
166. Van Zijl, L., Combrink, A.: The South African sign language machine translation project: issues on non-manual sign generation. In: Proceedings of the Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, pp. 127–134 (2006). https://doi.org/10.1145/1216262.1216276
167. Van Zijl, L., Olivrin, G.: South African sign language assistive translation. In: Proceedings of the IASTED International Conference on Telehealth/Assistive Technologies, pp. 7–12 (2008)
168. Zwitserlood, I., et al.: Synthetic signing for the deaf: eSIGN. In: Proceedings of the Conference and Workshop on Assistive Technologies for Vision and Hearing Impairment (2004)
169. Blender Tool. https://www.blender.org/features/animation/. Last accessed 2020/09/04
170. Curious Labs POSER. https://curious-labs-poser.software.informer.com/6.0/. Last accessed 2020/09/22
171. ELAN annotation tool. https://www.mpi.nl/corpus/html/elan/. Last accessed 2020/08/09
172. Google Neural Machine Translation. https://en.wikipedia.org/wiki/Google_Neural_Machine_Translation. Last accessed 2020/09/18
173. Humanoid Animation. https://www.web3d.org/working-groups/humanoid-animation-hanim. Last accessed 2020/09/23
174. Maya Tool. https://www.autodesk.in/products/maya/overview?plc=MAYA&term=1-YEAR&support=ADVANCED&quantity=1. Last accessed 2020/08/08
175. Semantic Role Lists. http://elies.rediris.es/elies11/cap5111.htm. Last accessed 2021/04/05
176. Unity 3D. https://unity.com/. Last accessed 2020/08/08
177. VCom3D: Sign Smith Studio. http://www.vcom3d.com/signsmith.php. Last accessed 2010/08/08
178. WebGL. https://www.khronos.org/webgl/. Last accessed 2021/04/12

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.