MT Impact - Horizon 2020 (And Beyond) : Rudy Tirry


MT Impact – Horizon 2020 (and beyond)

Rudy Tirry
President EUATC
Country Manager LIONBRIDGE BELGIUM
Agenda

• MT usage today
• NMT vs SMT
• Post-editing
• MT usage tomorrow
• Machine vs Human
MT usage
[Chart: 2017 Language Industry Survey - Trends]
2018 – the year of MT

No MT usage     2016   2017   2018 (prel.)
LSPs            80%    57%    36%
Freelancers     69%    67%    48%
Source: European Language Industry Survey
[Chart: No MT usage, 2016-2018 (prel.), companies vs. individuals]
What about neural?
Neural MT – why the hype?
Where do you find neural MT today?

• Google Translate
• Skype Translator
• Microsoft Translator Live
• Facebook
• Amazon

• DeepL
• Omniscien
• SDL
• Systran
• KantanMT
• Tilde

The new Wild Wild West in machine translation


SMT vs NMT: main differences

Statistical Machine Translation (SMT)
• Phrase-based
• Separate language model, translation model and reordering model
• Fast training

Neural Machine Translation (NMT)
• Uses (recurrent) neural networks ("deep" NMT uses several layers of neural networks)
• Sentence-based
• A single sequence-to-sequence model, simpler than the SMT approach
• ‘Predicts’ the next word
• Restricted vocabulary (max. 50,000)
• More time needed for training
• No easy solution for terminology
• Less tolerant of low-quality source text
• More pre- and post-processing required
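The "restricted vocabulary" point can be illustrated with a toy sketch: a system with a capped vocabulary keeps only its most frequent tokens and maps everything else to a single unknown symbol before the network sees it. The cap, the `<unk>` symbol, the whitespace tokenization and the function names below are illustrative assumptions, not the settings of any particular engine.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def build_vocab(corpus: list[str], max_size: int) -> set[str]:
    """Keep only the max_size most frequent whitespace tokens of the corpus."""
    counts = Counter(token for sentence in corpus for token in sentence.split())
    return {token for token, _ in counts.most_common(max_size)}


def encode(sentence: str, vocab: set[str]) -> list[str]:
    """Replace every token outside the capped vocabulary with UNK."""
    return [token if token in vocab else UNK for token in sentence.split()]


if __name__ == "__main__":
    corpus = [
        "the printer is ready",
        "the printer is off",
        "replace the toner cartridge",
    ]
    vocab = build_vocab(corpus, max_size=3)  # keeps: the, printer, is
    print(encode("replace the toner cartridge", vocab))
```

Rare domain words such as "toner" and "cartridge" fall outside the capped vocabulary in this toy setup, which gives one intuition for why terminology is harder to control in NMT.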
Is it really that good?
Findings Tilde (www.tilde.com/about/news/316)
Findings DFKI/QT21 Project
Phenomenon             Occurrences   % correct NMT   % correct Moses (SMT)
Formal address         138           90%             86%
Genitive               114           92%             68%
Modal construction     290           94%             75%
Negation               101           93%             86%
Passive voice          109           83%             40%
Predicate adjective    122           81%             75%
Prepositional phrase   104           81%             75%
Terminology            330           35%             68%
Tagging                145           83%             100%
Sum/average            1453          89%             73%
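As a quick consistency check on the transcribed DFKI/QT21 figures, the sketch below recomputes the summary row from the nine phenomena listed. The slide does not say how "Sum/average" was aggregated, so both an occurrence-weighted and a simple mean are computed; the function names are hypothetical and this is an illustrative check, not the project's own methodology.

```python
# Rows transcribed from the slide:
# (phenomenon, occurrences, % correct NMT, % correct Moses)
ROWS = [
    ("Formal address", 138, 90, 86),
    ("Genitive", 114, 92, 68),
    ("Modal construction", 290, 94, 75),
    ("Negation", 101, 93, 86),
    ("Passive voice", 109, 83, 40),
    ("Predicate adjective", 122, 81, 75),
    ("Prepositional phrase", 104, 81, 75),
    ("Terminology", 330, 35, 68),
    ("Tagging", 145, 83, 100),
]
NMT, MOSES = 2, 3  # column indices of the two accuracy columns


def total_occurrences(rows: list[tuple[str, int, int, int]]) -> int:
    return sum(row[1] for row in rows)


def weighted_mean(rows: list[tuple[str, int, int, int]], column: int) -> float:
    """Mean accuracy weighted by the number of occurrences per phenomenon."""
    return sum(row[1] * row[column] for row in rows) / total_occurrences(rows)


def simple_mean(rows: list[tuple[str, int, int, int]], column: int) -> float:
    """Unweighted mean over the listed phenomena."""
    return sum(row[column] for row in rows) / len(rows)


if __name__ == "__main__":
    print("Total occurrences:", total_occurrences(ROWS))
    for label, column in (("NMT", NMT), ("Moses", MOSES)):
        print(f"{label}: weighted {weighted_mean(ROWS, column):.1f}%, "
              f"unweighted {simple_mean(ROWS, column):.1f}%")
```

The occurrences add up to the reported 1453. For these nine rows, NMT comes out at roughly 76% (weighted) or 81% (unweighted) and Moses at roughly 75%, so the 89% NMT figure is evidently not a plain average of the rows shown; it may rest on a different aggregation or on categories not listed here.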
Findings DFKI/QT21 Project

Source: MultiLingual Jan ’18, John Tinsley
So, is it better?

• NMT makes 3 to 5 times fewer errors in:
– Word ordering
– Morphology
– Syntax
– Agreement
Source: Tilde (EN>ET)
This leads to more fluent translations, especially for ‘difficult’ languages

• BUT
– Older techniques (RBMT, trained SMT) can perform better on ambiguity (source: PBML, Aljoscha Burchardt et al., June 2017), terminology and tagging
– Still a fair number of ‘dangerous’ errors, such as negation errors (although NMT handles negation better than SMT)
– Traditional automated evaluation methods (BLEU score) do not always agree with human evaluation results
– Uneven results, depending on the language pair
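The BLEU point is easier to see with a minimal sentence-level version of the metric. This sketch follows the usual definition (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) but assumes a single reference, whitespace tokenization and no smoothing; production toolkits such as sacreBLEU compute corpus-level scores with standardized tokenization, so treat this as an illustration only.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """All contiguous n-grams of a token list, as a multiset."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one hypothesis against one reference.

    Uses clipped n-gram precisions for n = 1..max_n with uniform weights
    and the standard brevity penalty.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: only hypotheses not longer than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum)
```

With this definition, "there is a cat on the mat" scored against the reference "the cat is on the mat" gets 0.0 because it shares no 4-gram with the reference, even though a human evaluator would accept it as a correct rendering. That is the kind of disagreement between automated scores and human judgment the slide refers to.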
Let’s talk post-editing
Post-editing levels

• Full MTPE
– Indistinguishable from full human translation
• Light MTPE
– Ensure the text is correctly understood
– No effort on stylistic aspects
– Varying practices regarding linguistic accuracy
– Run automated QA rules (e.g. check for missing negation)
• Focused MTPE
– Specific rules
• Specific – highly visible – parts of the content
• Check important elements like numbers, names, etc.
• …
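The automated QA rules mentioned for light and focused MTPE (missing negation, numbers) can be approximated with simple source/target comparisons. The sketch below assumes an English > German job; the marker lists, function names and the digit-run comparison are hypothetical simplifications, and a flag only means "look at this segment", not a confirmed error.

```python
import re
from collections import Counter

# Hypothetical negation markers for an English > German job; a real QA
# profile would be maintained per language pair.
EN_NEGATIONS = {"not", "no", "never", "none", "nothing", "cannot"}
DE_NEGATIONS = {"nicht", "nichts", "kein", "keine", "keinen", "keinem",
                "keiner", "keines", "nie", "niemals"}

# Letter sequences, optionally with one apostrophe inside ("don't").
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def has_negation(text: str, markers: set[str]) -> bool:
    """True if any word is a known marker or an English "n't" contraction."""
    return any(word in markers or word.endswith("n't")
               for word in WORD.findall(text.lower()))


def numbers_match(source: str, target: str) -> bool:
    """Compare digit runs, ignoring separators so 1.5 and 1,5 are treated alike."""
    return Counter(re.findall(r"\d+", source)) == Counter(re.findall(r"\d+", target))


def qa_flags(source: str, target: str) -> list[str]:
    """Return the list of QA warnings for one source/target segment pair."""
    flags = []
    if has_negation(source, EN_NEGATIONS) and not has_negation(target, DE_NEGATIONS):
        flags.append("possible missing negation")
    if not numbers_match(source, target):
        flags.append("number mismatch")
    return flags


if __name__ == "__main__":
    print(qa_flags("Do not remove the cover.", "Entfernen Sie die Abdeckung."))
    print(qa_flags("Wait 30 seconds.", "Warten Sie 20 Sekunden."))
```

Comparing digit runs rather than whole numbers keeps the sketch tolerant of decimal commas versus decimal points, at the cost of missing separator errors, such as an English thousands separator copied unchanged into a German target, where "1,000" would read as a decimal.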
Traps

• Higher fluency can mislead the post-editor
• Terminology
• Tag order issues
• ‘Target first’ approach not ideal
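Tag problems are among the easier traps to catch automatically. A minimal sketch, assuming simple XML-style inline tags without attributes (a hypothetical stand-in for whatever placeholder format the CAT tool actually uses): it reports tags that were dropped or added, a changed tag order, and tags that are no longer properly nested in the target.

```python
import re
from collections import Counter

# Assumed inline tag syntax: <b>, </b>, <br/> (no attributes). CAT tools use
# their own placeholder formats, so this pattern is illustrative only.
TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)\s*(/?)>")


def extract_tags(text: str) -> list[str]:
    """Return the inline tags of a segment in document order."""
    return [m.group(0) for m in TAG.finditer(text)]


def is_well_nested(text: str) -> bool:
    """True if every opening tag is closed, and closed in the right order."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue  # standalone placeholder, nothing to pair
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack


def tag_issues(source: str, target: str) -> list[str]:
    """Compare the inline tags of a source segment and its MT output."""
    src, tgt = extract_tags(source), extract_tags(target)
    issues = []
    if Counter(src) != Counter(tgt):
        issues.append("tags missing or added")
    elif src != tgt:
        # Reordering can be legitimate when word order changes, so only flag it.
        issues.append("tag order differs from source")
    if not is_well_nested(target):
        issues.append("tags badly nested in target")
    return issues


if __name__ == "__main__":
    print(tag_issues("Press <b>Start</b> now.", "Drücken Sie </b>Start<b> jetzt."))
```

A changed order is reported separately from broken nesting because reordering can be correct when the target language moves the tagged words, whereas badly nested tags usually need fixing before the file can be delivered.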
So where does that lead us?
MT penetration – a personal view
[Chart: MT penetration (up to 100%) by content type in 2018, 2020 and 2030. Content types: user-generated content, controlled language, standardized content, basic procedural content, complex procedural content, basic editorial content, complex editorial content (marketing brochures), creative content (ads)]
Future role(s) of the translator

“Machine translation will replace only those translators that translate like a machine”

“The machine will take care of the keystrokes. The translator will add the human dimension: the cherry on the cake.”

• Non-MT content (<20%, same as non-CAT content)
– Transcreator / copy-editor
• MT content
– Full post-editing
– Light post-editing
– Focused post-editing
Translator = Post-editor = [Augmented Translator]?

• Do we need the same profile?
• Editorial translation
• Full post-editing
• Light post-editing (cf. software testing)

• Do we need the same training?
• Creative writing
• Pattern recognition (search for typical MT errors)
• Eye for detail and critical sense
• General (world) knowledge: disambiguation, logical errors
Workable MT is here. Are we ready to work with it?

Will translation buyer expectations become realistic?
Will translators embrace technology?
Will translation companies find a workable business model?
Will universities adapt training programmes to prepare future generations?
Will translation tool providers be able to integrate and standardise?

Come and see in 2, or rather 12, years!


Q&A

