Lionbridge 2023 Machine Translation Report Whitepaper
Despite past advancements and growing use cases, MT still has limitations. Some of its longstanding quality issues include its inability to consistently achieve the right formality level, tone, or handling of negation. These limitations deter growth. Research into and the use of Large Language Models (LLMs) hold promise to resolve these issues and unlock a new technological leap for Machine Translation.

Big Tech’s investments in LLM technology, such as Microsoft’s $10B investment in OpenAI, the company behind the ChatGPT, GPT-3, and GPT-4 models, are accelerating the development of this technology and advancing the Natural Language Processing (NLP) field. These advancements will inevitably disrupt the translation and localization industry and change how companies create and translate content.

The exponential advancements of NLP, specifically LLMs, will transform how content is created and localized. The upshot will be exponential gains in productivity and speed as human translators process much larger volumes of content.

Companies that master and leverage AI in their content engines will gain a significant competitive advantage in our increasingly digital economies.

What Does the Future Hold for Machine Translation?
Having tracked the major MT engines for many years and masterfully leveraged new technologies, Lionbridge is well-suited to analyze developments in 2023 and beyond. We anticipate MT’s existing Neural Machine Translation (NMT) paradigm will end. A new paradigm will replace it, likely based on Large Language Models (LLMs) like ChatGPT. The release of GPT-4 and growth in LLMs are having significant business implications.

You can expect the following:
• A significant leap in MT quality, including workflow automations
• Increased content output
• A reduced supply of top-notch human translators
• Increased adoption of Machine Translation
• Machine Translation as a means for Customer Experience (CX) enhancement

Every global company wishing to thrive in our interconnected economy must embrace and fully leverage Machine Translation. Read on as we examine the technology’s developments, or lack thereof, in 2022, what that has meant in 2023, and what it will mean for the years ahead.
To fully capitalize on Machine Translation and enjoy its profound benefits (for the first time, enabling companies to Localize everything™), it’s necessary to have a fundamental understanding of the evolution of the technology.

What is MT? What has triggered widespread global adoption? What are its major strengths and pitfalls to avoid when using it? And what is the backdrop against which it has been evolving?

Artificial Intelligence
In the most basic sense, Machine Translation uses Artificial Intelligence (AI), or the “intelligence” machines demonstrate, to perform tasks that usually require inherently human thinking, such as learning and problem-solving. In this case, AI is used to perform translations. In recent years, AI has benefited from increasing computer power. More powerful computers yield more intensive processing during a task at hand and more advanced Machine Learning, which is how computers gain the knowledge required for AI applications.

Machine Learning
Machine Learning (ML) is a branch of computer science that uses massive amounts of data to teach computers how to perform tasks. Machine Learning examines data related to a particular task, finds patterns in those data, makes associations among those patterns, and then uses those new learnings to shape how the computer performs the task.

If, after this analysis, the computer gets better at performing the task, then Machine Learning has occurred. Because we have vast language and localization data, people are using Machine Learning to improve computer performance in everything from weather forecasting to automatic stock selection to Machine Translation.
Statistical Models
Statistical Machine Translation (SMT) relies on a large number of translation candidates for a given source sentence, then selects the best one based on the likelihood of words and phrases appearing together in the target language.

SMT learns about translation through the lens of “n-grams,” small groupings of words that appear together in the source and target language. The SMT system is given training material: many examples of sentences in the source language and their translations into the target language. The learning algorithm divides source sentences and target sentences into n-grams. It determines which target language n-grams are likely to appear in a translation when a certain source language n-gram appears in a sentence.

The learning algorithm then builds a language model that calculates the likelihood that given words and phrases appear next to one another in the target language. When it’s time to translate new material, the SMT system breaks the new source sentence down into n-grams, finds the highly associated target language n-grams, and generates candidate sentences.

The final translation is that sentence whose target language n-grams correlate most highly with the source sentence’s n-grams and whose target language words are most likely to appear together in the target language. SMT works surprisingly well, especially since there is nothing linguistic about an SMT system; indeed, the system only considers n-grams, never a complete sentence.

Hybrid Machine Translation
Companies then began experimenting with Hybrid Machine Translation (HMT), which combined the output of Statistical Machine Translation and Rule-based Machine Translation systems. These advancements popularized Machine Translation technology and helped adoption on a global scale. Another technological leap would come from a newer approach to MT: Neural Machine Translation.
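The language-model step described under Statistical Models can be sketched in a few lines of Python. This is a toy illustration, not how any production SMT system is built: it covers only the target-language model (the translation model that associates source and target n-grams is omitted), and the bigram order, add-one smoothing, tiny corpus, and all function names are assumptions made for the example.

```python
from collections import Counter
from math import log


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class BigramLanguageModel:
    """Toy target-language model; add-one smoothing is an assumption."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(ngrams(tokens, 2))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Log-likelihood that the words of `sentence` appear together."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in ngrams(tokens, 2):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += log(numerator / denominator)
        return total


def pick_best(candidates, model):
    """Select the candidate the language model finds most likely."""
    return max(candidates, key=model.log_prob)


# Hypothetical target-language training sentences and candidates.
corpus = ["the house is small", "the house is big", "a small house"]
model = BigramLanguageModel(corpus)
candidates = ["the house is small", "small the is house"]
best = pick_best(candidates, model)
```

Both candidates contain the same words, so only word order separates them; the model prefers the order it has seen in training, which mirrors the point that SMT scores n-grams rather than analyzing a sentence linguistically.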
1954 Georgetown researchers perform the first-ever public demonstration of an early MT system.
1962 The Association for Machine Translation and Computational Linguistics is formed in the U.S.
1970 The French Textile Institute begins translating abstracts using an MT system.
1989 Trados is the first to develop and market Translation Memory technology.
1991 The first commercial MT system between Russian, English, and German-Ukrainian is
developed at Kharkov State University.
1996 Systran and Babelfish offer free translations of small texts on the web.
2002 Lionbridge executes its first commercial MT project using its rule-based MT engine.
Mid-2000s Statistical MT systems launch to the public. Google Translate launched in 2006, and Microsoft
Live Translator launched in 2007.
2012 Google announces that Google Translate translates enough text to fill 1 million books daily.
2016 Both Google and Microsoft enable Neural Machine Translation (NMT), slashing word order
mistakes and significantly improving lexicon and grammar.
2020 As of October, Google Neural Machine Translation (GNMT) supports 109 languages.
2022 ChatGPT, a Large Language Model (LLM) that can generate human-like text based on context,
goes mainstream in November with significant implications for Machine Translation.
2023 A major MT paradigm shift is anticipated as a type of LLM evolves and disrupts MT.
Why MT Engines Make Catastrophic Errors
Think of a catastrophic error as an MT engine malfunction. It can occur if the engine doesn’t understand the context of the text, such as when one word has two meanings or if there is a typo in the source text. These errors can happen if the engine is not trained well or a flawed glossary is used, which then causes the same mistakes to appear repeatedly.

Catastrophic errors occur because engines are imperfect despite their sophistication. Machines cannot exercise judgment the way people can.

The public witnessed a real-world example of a catastrophic error involving a proper name on a Spanish governmental agency website. In that instance, the department head’s name, Dolores del Campo, was omitted from the ministry’s official site. Instead, the literal translation, “It is pain of field,” appeared in place of the name.
What Are the 2022 MT Key Trends?
Lionbridge MT experts found 2022 to be notable for both what did and did not transpire. Having observed so many MT-related technological advancements during the past few years, our team anticipated more of the same. But MT did not make significant strides, as our Machine Translation Tracker revealed.

With rare exceptions, the major engines made little to no improvement during the year. This trend has implications for the future. But first, let’s take a closer look at the 2022 results.

How Did the Top MT Engines Perform in 2022?
When a company wants to start using MT or improve the way it currently uses MT, it is critical to identify which MT engines will work best based on its specific needs. As we delve deeper into how the major MT engines performed in 2022, one thing becomes very clear: One engine can’t do it all.

Comparison of MT Engine Performance Based on Language
A company working with Spanish content benefited from selecting DeepL for its automated translations; it had better alternative options when translating Japanese. That’s because each engine’s performance varies based on the language it handles.

We calculated how well three major engines handled automated translations from English into numerous languages. We determined the quality by calculating the average edit distance: the number of edits a human must make to the MT output for the resulting translation to be as good as a human translation.

The lower the number, the more effective the automated translation is. As shown in Figure 1, paying attention to these results is worth a company’s while.

According to our analysis, in certain situations:
• DeepL translated Spanish better than Google and Microsoft
• Google translated Japanese better than DeepL
• Microsoft translated Polish better than DeepL
• The three engines performed similarly for Italian, Turkish, and Hebrew

These results demonstrate the complexity and challenges inherent in Machine Translation, which involves navigating the nuances and complexities of different languages, cultures, and domains.

It is not surprising to see variations in performance across various MT engines, as no single algorithm or approach can work perfectly for all languages and content types.
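To make the average edit distance concrete, here is a minimal Python sketch. The report does not say how edits are counted, so the word-level Levenshtein distance used here (insertions, deletions, and substitutions of whole words) is an assumption, as are the function names and the two hypothetical engines in the usage example.

```python
def edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(reference) + 1))
    for i, hyp_token in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_token in enumerate(reference, start=1):
            substitution = previous[j - 1] + (hyp_token != ref_token)
            current.append(min(previous[j] + 1,      # delete hyp_token
                               current[j - 1] + 1,   # insert ref_token
                               substitution))        # keep or replace
        previous = current
    return previous[-1]


def average_edit_distance(pairs):
    """Mean edit distance over (mt_output, post_edited) sentence pairs."""
    distances = [edit_distance(mt.split(), ref.split()) for mt, ref in pairs]
    return sum(distances) / len(distances)


def best_engine(results):
    """Return the engine name with the lowest average edit distance."""
    return min(results, key=lambda name: average_edit_distance(results[name]))


# Hypothetical MT outputs paired with their human post-edited versions.
results = {
    "engine_a": [("the cat sat on mat", "the cat sat on the mat"),
                 ("he go home", "he goes home")],
    "engine_b": [("cat the sat mat", "the cat sat on the mat"),
                 ("he goes home", "he goes home")],
}
winner = best_engine(results)
```

Running the same comparison separately for each target language is one way to arrive at per-language conclusions like those listed above, since an engine that wins for one language may lose for another.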
[Figure: MT engine performance by target language for the MT providers DeepL, Google, and MicrosoftV3. Languages shown: Turkish, Thai, Swedish, Spanish, Slovenian, Romanian, Polish, Norwegian, Latvian, Japanese, Italian, Hebrew, Greek, French, Dutch, Danish, Chinese, Brazilian.]
[Figure: edit distance by domain for the MT providers Amazon, DeepL, Google, and MicrosoftV3. Domains shown: Travel and Leisure, Textile and Fashion, Media and Marketing, Life Sciences, Financial, Electronics, Computing Software, Computing Hardware.]
[Figure legend: Bing, DeepL, Google, Yandex, Amazon, plotted against a timeline.]
Comparison of Machine Translation Engine Quality per Language
How did the main engines perform against one another in 2022, specifically for German, Spanish, Russian, and Chinese? We measured quality based on the inverse edit distance.

The edit distance measures the number of edits a human must make to the MT output for the resulting translation to be as good as a human translation.

The inverse edit distance means the higher the resulting number, the better the quality.

As shown in Figure 6:
• There were minimal MT improvements overall, as reflected by the scale used to measure the inverse edit distance
• Microsoft Bing made minor improvements in German, Spanish, and Chinese during October/November
• 2022 proved to be a flat year

We can conclude that Neural Machine Translation has hit a plateau. A new iteration will be necessary for MT to make significant quality gains.
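The report does not give a formula for the inverse edit distance, so the following Python sketch shows one common way to turn an edit distance into a "higher is better" score. Normalizing by the length of the longer string, working at character level, and keeping the best score across several reference translations are all assumptions made for illustration; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def inverse_edit_distance(mt_output, references):
    """Score in [0, 1]; 1.0 means the output matches a reference exactly."""
    scores = []
    for reference in references:
        longest = max(len(mt_output), len(reference))
        if longest == 0:
            # Two empty strings are identical.
            scores.append(1.0)
            continue
        scores.append(1.0 - levenshtein(mt_output, reference) / longest)
    # With several references, keep the most favorable comparison.
    return max(scores)
```

Under these assumptions, an output that needs one character changed out of four scores 0.75, and an output identical to any reference scores 1.0.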
Performance of Machine Translation Engines per Select Languages via Inverse Edit Distance
[Figure 6 panels: German, Spanish, Russian, Chinese. Each panel plots Inverse Edit Distance against Timeline.]
Figure 6. A comparison of MT quality per language based on the inverse edit distance
[Figure 7 panels with surviving titles: Media, Advertising, and Marketing; Travel, Tourism, Recreation, Leisure, and Arts. Each panel plots Inverse Edit Distance against Timeline.]
Figure 7. A comparison of MT quality per domain based on the inverse edit distance
1 Portuguese 15 Turkish
2 Spanish 16 Slovak
3 French 17 Hebrew
4 Italian 18 Latvian
7 Danish 21 Lithuanian
8 Japanese 22 Czech
9 Greek 23 Arabic
10 Romanian 24 Estonian
11 Thai 25 Korean
12 Norwegian 26 Russian
13 German 27 Hungarian
14 Swedish 28 Finnish
Terminology To Improve Domain Performance
As noted, generic MT engines can put out erroneous translations; they can especially cause undesired results for specific domains from a terminological point of view. The impact can be particularly harmful to the medical and legal fields. The effective use of terminology can enable you to improve the quality of MT and achieve accurate, consistent translations no matter what your subject matter is.

It’s imperative to train customized MT systems with domain-specific bilingual texts that include specialized terminology. Still, when engines are trained with specialized texts, accurate translations cannot be guaranteed if the terminology is not used consistently. Research in this area proposes to inject linguistic information into Neural Machine Translation (NMT) systems. Implementing manual or semi-automatic annotation depends on available resources, such as glossaries, and constraints, such as time, cost, and availability of human annotators.

Lionbridge’s Smairt MT allows the application of linguistic rules to the source and target text and the enforcement of terminology based on Do Not Translate (DNT) and glossary lists added to a specific profile to address Machine Translation terminology. We help our customers create and maintain glossaries, regularly refined to include new, relevant terms and retire obsolete terminology. When glossaries are created once in Smairt MT, they can be used for all the MT engines, saving time and money.

Using glossaries for MT projects is more complex than it may seem. Glossaries, if used inappropriately, can negatively affect the overall quality of Machine Translation. The best way to follow terminology in MT is through MT training. The combination of trained MT engines, glossary customization, and the identification of preprocessing and post-processing rules ensures MT output contains proper terminology and is similar in style to the customer's documentation.

MT Customization vs. MT Training
MT customization and MT training can help you get more out of your MT output, but you must be intentional about when to apply these methods. Table 2 provides an overview of Machine Translation customization vs. Machine Translation training and offers some considerations when evaluating each method.
Table 2. MT Customization vs. MT Training

What it does
• MT Customization: Improves MT’s suggestions for more accurate output and reduces the need for post-editing
• MT Training: Improves MT’s suggestions for more accurate output and reduces the need for post-editing

Specific benefits
• MT Customization: Enables companies to adhere to their brand name and terminology and achieve regional variations
• MT Training: Enables companies to attain a specific brand voice, tone, and style and achieve regional variations

When to use it
• MT Customization: Ideal for technological and detail-oriented content and any content that requires accurate translations of terminology, or regional variation when you lack sufficient data for MT training
• MT Training: Ideal for highly specialized content, marketing and creative content, and any content that requires a specific brand voice, tone, or style, or regional variation when you have enough data for MT training
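The idea of enforcing a Do Not Translate (DNT) list through preprocessing and post-processing rules, mentioned in the terminology discussion, can be illustrated with a small Python sketch. This is not Smairt MT’s implementation, which the report does not describe; the placeholder scheme, the function names, and the word-by-word `fake_engine` stand-in are all hypothetical, and real engines may alter placeholders in ways this sketch does not handle.

```python
import re


def protect_terms(source, dnt_terms):
    """Preprocessing: swap DNT terms for placeholders before MT."""
    mapping = {}
    protected = source
    # Longest terms first so a short term never splits a longer one.
    for index, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        placeholder = f"__DNT{index}__"
        pattern = re.compile(re.escape(term))
        if pattern.search(protected):
            protected = pattern.sub(placeholder, protected)
            mapping[placeholder] = term
    return protected, mapping


def restore_terms(translated, mapping):
    """Post-processing: put the original terms back after MT."""
    for placeholder, term in mapping.items():
        translated = translated.replace(placeholder, term)
    return translated


def fake_engine(text):
    """Stand-in for an MT engine (hypothetical): literal word lookup."""
    lexicon = {"directora:": "director:", "dolores": "pain",
               "del": "of", "campo": "field"}
    return " ".join(lexicon.get(word.lower(), word) for word in text.split())


source = "Directora: Dolores del Campo"
unprotected = fake_engine(source)
protected, mapping = protect_terms(source, ["Dolores del Campo"])
result = restore_terms(fake_engine(protected), mapping)
```

Without protection, the stand-in engine translates the proper name literally, much like the catastrophic error described earlier in the report; with the name on the DNT list, it passes through unchanged.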
What can we conclude about the state of Machine Translation from the 2022 data and surprising results that mainly showed stagnant quality performances for the year? The technology is mature and will continue to attain widespread adoption as it has unequivocally proven its value as a business-grade technology.

People recognize the technology’s usefulness for almost any translation case, with or without human intervention and hybrid approaches. Indeed, according to Global Market Insights, the translation market size is projected to grow at a Compound Annual Growth Rate (CAGR) of 30% from 2022 to 2030. Companies will increasingly embrace MT, including those businesses in traditionally MT-resistant domains, such as games and life sciences. The ability to fully capitalize on the technology, in conjunction with the use of AI-driven technology that automates workflows and translator selection, will position companies to increase their content velocity, produce captivating multilingual content that is always on brand, grow their markets, and thrive in what has become a brutally competitive digital market.

What is the Future of Machine Translation?
2022 Machine Translation results made us question the current Neural Machine Translation paradigm.
• Is the NMT paradigm reaching a plateau?
• Is a new paradigm shift needed, given the engines’ inability to make significant strides?
• What could be next?

We’re betting that Large Language Models (LLMs), with their massive amounts of content, including multimodality and multilingualism, will have something to do with a future paradigm.

Why do we think this? Because of the results of our ground-breaking analysis that compared ChatGPT's translation performance with the performance of MT engines.

OpenAI’s ChatGPT produced results inferior to those of designated MT engines, but not by much. Its performance was nothing short of remarkable. GPT-4 even surpassed one major Neural Machine Translation engine in one instance and one language pair. These results undoubtedly have implications for the future of Machine Translation.

Why Is a New Machine Translation Paradigm Likely Underway?
Current MT engine trends give us a sense of déjà vu. During the end of the Statistical Machine Translation era, which NMT replaced, there was virtually no change in MT quality output. In addition, the quality output of different MT engines converged. These things are happening now.

NMT may not be replaced imminently. But if we believe in exponential growth and accelerating returns theories, consider Rule-based MT’s 30-year run and Statistical MT’s decade-long prominence, and note that NMT is now in its sixth year, a new paradigm shift is near.
[Figure 8 bar chart. Engines on the horizontal axis, left to right: Bing NMT, Yandex, Google NMT, DeepL, Amazon, ChatGPT. Bar value labels as extracted, highest to lowest: 0.722, 0.714, 0.708, 0.706, 0.697, 0.658. Vertical axis: inverse edit distance, 0.600 to 0.720.]
Figure 8. Comparison of automated translation quality between ChatGPT and the major Machine Translation engines based on the inverse edit distance using multiple references for the English-to-Spanish language pair.
[Figure 9 bar chart. Engines on the horizontal axis, left to right: Bing NMT, Amazon, Google NMT, DeepL, GPT-4, Yandex, ChatGPT, GPT-3. Bar value labels as extracted, highest to lowest: 0.553, 0.552, 0.550, 0.525, 0.521, 0.505, 0.488, 0.459. Vertical axis: inverse edit distance, 0 to 0.600.]
Figure 9. Comparison of automated translation quality between GPT models and the five major Neural MT engines based on the inverse edit distance using multiple references for the English-to-Chinese language pair.
Why Are the LLM Translation Results Noteworthy?
The results of our comparative analysis are remarkable because the generic model has been trained to do many different Natural Language Processing (NLP) tasks as opposed to the single NLP task of translation that MT engines have been trained to do. And even though GPT has not been specifically trained to execute translations, its quality is exceptional.

How Might Machine Translation Evolve as a Result of Large Language Models?
Given the growth of LLMs, based on the public’s attention and the significant investments tech companies are making in this technology, we may soon see whether MT will start adopting a new LLM paradigm.

MT may use LLMs as a base but then fine-tune the technology specifically for Machine Translation. It would be like what OpenAI and other LLM companies are doing to improve their generic models for specific use cases, such as making it possible for the machines to communicate with humans in a conversational manner. Specialization adds accuracy to the performed tasks.

What Does the Future Hold for Large Language Models in General?
The great thing about Large Language “Generic” Models is that they can do many different things and offer outstanding quality in most of their tasks. For example, DeepMind’s GATO, another general intelligence model, has been tested in more than 600 tasks, with State-of-the-Art (SOTA) results in 400 of them.

Two development lines will continue to exist: generic models, such as GPT and GATO, and specialized models for specific purposes based on those generic models. The generic models are important for advancing Artificial General Intelligence (AGI) and possibly advancing even more impressive developments in the longer term. Specialized models will have practical uses in the short run for specific areas.

One of the remarkable things about LLMs is that both lines can progress and work in parallel.
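As a rough illustration of how a generic LLM can be pointed at the single task of translation without any translation-specific training, the Python sketch below builds a plain-language instruction and passes it to a text-completion function. Everything here is hypothetical: the prompt wording, the `formality` option, and the `complete` callable, which stands in for whatever LLM interface is available and does not correspond to any specific vendor API.

```python
def build_translation_prompt(text, source_lang, target_lang, formality="formal"):
    """Compose an instruction asking a general-purpose LLM to translate."""
    if formality not in {"formal", "informal"}:
        raise ValueError("formality must be 'formal' or 'informal'")
    return (
        f"Translate the following text from {source_lang} to {target_lang}. "
        f"Register: {formality}. Keep product names unchanged.\n\n"
        f"Text: {text}\nTranslation:"
    )


def translate_with_llm(text, source_lang, target_lang, complete, formality="formal"):
    """`complete` is any callable mapping a prompt string to a completion."""
    prompt = build_translation_prompt(text, source_lang, target_lang, formality)
    return complete(prompt).strip()
```

Passing the model as a callable keeps the sketch independent of any provider. A fine-tuned, translation-specialized model of the kind discussed above could be dropped in behind the same `complete` interface.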
Enhanced Quality
There will be a leap in Machine Translation quality as
technological advancements resolve longstanding
issues, such as language formality and other quality
issues pertaining to tone. LLMs may even solve
MT engines’ biggest problem: their lack of world
knowledge. This achievement may be made possible
through their multimodality training.
Rafa Moral
Vice President, Innovation
Rafa oversees R&D activities related to language and translation, including Machine
Translation initiatives, Content Profiling and Analysis, Terminology Mining, and
Linguistic Quality Assurance and Control.
Yolanda Martin
MT Specialist
Yolanda is responsible for the creation of customized translation models, as well as
quality analysis and the development of strategies to fine-tune them. In parallel, she
collaborates with the R&D department to develop new linguistic tools and resources.
Thomas McCarthy
MT Business Analyst
Thomas ensures Lionbridge customers and stakeholders obtain maximum benefits
from MT-related technologies, services, and consultancy.
To learn more about how Lionbridge can help you fully capitalize on
automated translations, contact our team today.
ABOUT LIONBRIDGE
Lionbridge partners with brands to break barriers and build bridges all over the
world. For over 25 years, we have helped companies connect with their global
customers and employees by delivering translation and localization solutions in
350+ languages. Through our world-class platform, we orchestrate a network
of passionate experts across the globe who partner with brands to create
culturally rich experiences. Relentless in our love of linguistics, we use the best
of human and machine intelligence to forge understanding that resonates with
our customers’ clients. Based in Waltham, Massachusetts, Lionbridge maintains
solution centers in 24 countries.
LEARN MORE AT
LIONBRIDGE.COM
Lionbridge | 2023 Machine Translation Report © 2023 Lionbridge. All Rights Reserved.