MT Impact - Horizon 2020 (And Beyond) : Rudy Tirry

MT Impact – Horizon 2020 (and beyond)
Rudy Tirry
President EUATC
Country Manager LIONBRIDGE BELGIUM
Agenda
• MT usage today
• NMT vs SMT
• Post-editing
• MT usage tomorrow
• Machine vs Human
MT usage
2017 Language Industry Survey - Trends
[Bar chart, 0% to 35% scale; the category labels are not recoverable from the slide text.]
2018 – the year of MT
No MT usage    2016    2017    2018 (prel.)
LSPs           80%     57%     36%
Freelancers    69%     67%     48%
Source: European Language Industry Survey
No MT
[Chart: share of respondents not using MT in 2016, 2017 and 2018 (prel.), companies vs. individuals.]
What about Neural ?
Neural MT – why the hype ?
Where do you find neural MT today?
• Google Translate
• Skype Translate
• Microsoft Translator Live
• Facebook
• Amazon
…
• DeepL
• Omniscien
• SDL
• Systran
• KantanMT
• Tilde
…
The new Wild Wild West in machine translation

SMT vs NMT: main differences

Statistical Machine Translation (SMT)
• Phrase-based
• Separate language model, translation model and reordering model
• Fast training

Neural Machine Translation (NMT)
• Uses (recurrent) neural networks (« deep » NMT uses several layers of neural networks)
• Sentence-based
• One single sequencing model – simpler than the SMT approach
• ‘Predicts’ the next word
• Restricted vocabulary (max. 50,000)
• More time needed for training
• No easy solution for terminology
• Less tolerant of low-quality source text
• More pre- and post-processing required
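
The ‘restricted vocabulary’ point can be illustrated with a minimal sketch. It assumes whitespace tokenisation and a hypothetical `<unk>` placeholder for every word outside the most frequent entries; the function names are illustrative and not taken from any particular NMT toolkit. Real systems usually soften this limit with subword segmentation.

```python
from collections import Counter


def build_vocab(corpus, max_size):
    """Keep only the max_size most frequent tokens of the training corpus."""
    counts = Counter(tok for sentence in corpus for tok in sentence.split())
    return {tok for tok, _ in counts.most_common(max_size)}


def apply_vocab(sentence, vocab, unk="<unk>"):
    """Replace every out-of-vocabulary token by the unknown-word placeholder."""
    return " ".join(tok if tok in vocab else unk for tok in sentence.split())


# Toy corpus; the slide mentions a limit of about 50,000 entries, here it is 2.
corpus = ["the pump starts", "the pump stops", "the valve opens"]
vocab = build_vocab(corpus, max_size=2)
masked = apply_vocab("the valve starts", vocab)
```

Rare words, which is where domain terminology tends to sit, are the first to fall outside the vocabulary. This is one way to read the ‘no easy solution for terminology’ point.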
Is it really that good ?
Findings Tilde (www.tilde.com/about/news/316)
Findings DFKI/QT21 Project
Phenomenon             Occurrences   Correct (NMT)   Correct (Moses)
Formal address         138           90%             86%
Genitive               114           92%             68%
Modal construction     290           94%             75%
Negation               101           93%             86%
Passive voice          109           83%             40%
Predicate adjective    122           81%             75%
Prepositional phrase   104           81%             75%
Terminology            330           35%             68%
Tagging                145           83%             100%
Sum/average            1453          89%             73%
Findings DFKI/QT21 Project
Source: MultiLingual Jan ’18, John Tinsley
So, is it better ?
• NMT makes 3 to 5 times fewer errors in

– Word ordering
– Morphology
– Syntax
– Agreements
Source: Tilde (EN>ET)
This leads to more fluent translations, mainly for ‘difficult’ languages
• BUT
– Older techniques (RBMT, trained SMT) can perform better on ambiguity (source: PBML, Aljoscha Burchardt et al., June 2017), terminology and tagging
– Still quite a few ‘dangerous’ errors, such as in negation (although fewer than with SMT)
– Traditional automated evaluation methods (BLEU score) do not always agree with human evaluation results
– Uneven results, depending on language pair
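
Why an n-gram metric can disagree with a human judge is easier to see with a simplified sketch. The code below is a sentence-level BLEU with a single reference, whitespace tokens and no smoothing. Those choices are assumptions made for brevity (production tools score whole corpora and apply smoothing), and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by the brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "do not open the cover while the device is running"
dropped_negation = "do open the cover while the device is running"
paraphrase = "never lift the lid when the machine operates"

score_dangerous = bleu(dropped_negation, reference)
score_paraphrase = bleu(paraphrase, reference)
```

In this toy case the hypothesis that drops the negation shares almost all of its n-grams with the reference and therefore scores far higher than the paraphrase, although a human reviewer would rank the two the other way round.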
Let’s talk Post-editing
Post-editing levels
• Full MTPE
– No distinction with full human translation
• Light MTPE
– Correct understanding
– No effort on stylistic aspects
– Varying practices regarding linguistic accuracy
– Run automated QA rules (e.g. check for missing negation)
• Focused MTPE
– Specific rules
• Specific – highly visible – parts of the content
• Check important elements like numbers, names, etc.
• …
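
The automated QA rules mentioned for light and focused MTPE (missing negation, numbers) could look roughly like the sketch below. It assumes an English to German pair, and the negation cue lists are hypothetical and far too short for real use. It only flags segments for a post-editor to check and proves nothing about the translation.

```python
import re

# Hypothetical, deliberately tiny cue lists; a real rule set would be larger.
NEGATION_CUES = {
    "en": {"not", "no", "never", "cannot", "don't", "doesn't", "isn't", "won't"},
    "de": {"nicht", "kein", "keine", "keinen", "nie", "niemals"},
}

# Words made of letters, optionally with one apostrophe part (e.g. "don't").
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def has_negation(text, lang):
    """True if the text contains at least one negation cue for lang."""
    tokens = WORD.findall(text.lower().replace("’", "'"))
    return any(tok in NEGATION_CUES[lang] for tok in tokens)


def missing_negation(source, target, src_lang="en", tgt_lang="de"):
    """Flag segments where the source is negated but the target is not."""
    return has_negation(source, src_lang) and not has_negation(target, tgt_lang)


def number_mismatch(source, target):
    """Flag segments whose numbers differ. Separators are stripped first,
    so 1.5 and 1,5 count as the same number."""
    def numbers(text):
        return sorted(re.sub(r"[.,]", "", n) for n in NUMBER.findall(text))
    return numbers(source) != numbers(target)
```

Such checks are cheap to run on every segment, which fits the idea of spending human attention only where a rule fires.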
Traps
• Higher fluency misleads post-editor

• Terminology
• Tag order issues
• « Target first » approach not ideal
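
The tag order trap can also be caught with a simple rule. The sketch below assumes XML-like inline tags and numbered placeholders such as `{1}`; the actual tag syntax depends on the CAT tool and is not specified on the slide. A ‘reordered’ result is only a prompt for review, since a different word order in the target can legitimately move tags.

```python
import re
from collections import Counter

# Assumed tag syntax: XML-like inline tags (<b>, </b>) and placeholders ({1}).
TAG_PATTERN = re.compile(r"</?\w+[^>]*>|\{\d+\}")


def tag_report(source, target):
    """Compare the inline tags of a source segment and its MT output."""
    src_tags = TAG_PATTERN.findall(source)
    tgt_tags = TAG_PATTERN.findall(target)
    if Counter(src_tags) != Counter(tgt_tags):
        return "missing_or_extra"  # a tag was dropped or invented
    if src_tags != tgt_tags:
        return "reordered"         # same tags, different sequence
    return "ok"
```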
So where does that lead us ?
MT penetration – a personal view
[Chart: estimated MT penetration (up to 100%) in 2018, 2020 and 2030 for user-generated content, controlled language, standardized content, basic procedural content, complex procedural content, basic editorial content, complex editorial content (marketing brochures) and creative content (ads). The plotted positions are not recoverable from the slide text.]
Future role(s) of translator
« Machine translation will replace only those translators that translate like a machine »
« The machine will take care of the keystrokes. The translator will add the human dimension - the cherry on the cake. »
• Non-MT content (<20% - same as non-CAT content)

– Transcreator / Copy-editor
• MT content
– Full post-editing
– Light post-editing
– Focused post-editing
Translator = Post-editor = [Augmented Translator] ?
• Do we need the same profile ?

• Editorial translation
• Full post-editing
• Light post-editing (cf. software testing)
• Do we need the same training ?

• Creative writing
• Pattern recognition (search for typical MT errors)
• Eye for detail and critical sense
• General (world) knowledge: disambiguation, logical errors
Workable MT is here. Are we ready to work with it ?
Will translation buyer expectations become realistic ?

Will translators embrace technology ?
Will translation companies find a workable business model ?
Will universities adapt training programmes to prepare future generations ?
Will translation tool providers be able to integrate and standardise ?
Come and see in 2, or rather 12 years !

Q&A
