Machine Translation
TRANSLATION TECHNOLOGY
Consecutive Interpreting
A Short Course
Andrew Gillies
For more information on any of these and other titles, or to order, please go to
www.routledge.com/Translation-Practices-Explained/book-series/TPE
Additional resources for Translation and Interpreting Studies are available on the
Routledge Translation Studies Portal: http://routledgetranslationstudiesportal.com/
A PROJECT-BASED
APPROACH TO
TRANSLATION
TECHNOLOGY
Rosemary Mitchell-Schuitevoerder
First published 2020
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2020 Rosemary Mitchell-Schuitevoerder
The right of Rosemary Mitchell-Schuitevoerder to be identified as author of this work
has been asserted by her in accordance with sections 77 and 78 of the Copyright,
Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised
in any form or by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks,
and are used only for identification and explanation without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
Names: Mitchell-Schuitevoerder, Rosemary, author.
Title: A project-based approach to translation technology /
Rosemary Mitchell-Schuitevoerder.
Description: London ; New York : Routledge, 2020. |
Series: Translation practices explained |
Includes bibliographical references and index.
Identifiers: LCCN 2020003466 | ISBN 9780367138820 (hardback) |
ISBN 9780367138844 (paperback) | ISBN 9780367138851 (ebook)
Subjects: LCSH: Translating and interpreting–Technological innovations. |
Translating and interpreting–Data processing.
Classification: LCC P306.97.T73 M58 2020 | DDC 418/.020285–dc23
LC record available at https://lccn.loc.gov/2020003466
ISBN: 978-0-367-13882-0 (hbk)
ISBN: 978-0-367-13884-4 (pbk)
ISBN: 978-0-367-13885-1 (ebk)
Typeset in Bembo
by Newgen Publishing UK
For John, Thomas, Benjamin, Simeon
ABBREVIATIONS
MS Microsoft
MT machine translation
NDA non-disclosure agreement
NGO non-governmental organisation
NLP natural language processing
NMT neural machine translation
PDF portable document format
PE post-edit
PEMT post-edited machine translation
PM project manager
PO purchase order
POS parts of speech
Q&A question and answer
QA quality assurance
RBS risk breakdown structure
RSI repetitive strain injury
SaaS software as a service
SEO search engine optimisation
SL source language
ST source text
SR speech recognition
STT speech-to-text
T&C terms and conditions
TAUS Translation Automation User Society
TBX TermBase eXchange
TEnT translation environment tool
TL target language
TM translation memory
Tmdb terminology database
TMS translation management system
TMX translation memory eXchange
ToB terms of business
TQA translation quality assessment
TTS text-to-speech
TT target text
TU translation unit
WCMS web content management system
WWW World Wide Web
XLIFF XML Localization Interchange File Format
XML Extensible Markup Language
GLOSSARY
This book is designed for instructors whose students are new to translation technology. It can also be used by students and professionals who want to learn more about translation technology and project management. This book is not a manual of technological translation tools, nor is it a guide to translation project management. If students have not yet been introduced to CAT tools, the suggestion is to throw them in at the deep end. The manufacturers of CAT tools and other translation software products provide detailed manuals, videos, Q&A pages, and knowledge base portals. It is important to point students to the Help sections in the CAT tool. Don Kiraly (APTIS Newcastle upon Tyne 2019) suggested that a 30-minute introduction to the CAT tool by the instructor should be enough to get students started, on condition that they use the CAT tool daily. The instructor's colleagues should allow and encourage students to deliver all their translation work in CAT tools. An introduction to CAT tools for teachers may be helpful. The key to CAT tool competence is regular use. A single weekly instruction class is relatively unproductive, because it is not enough to consolidate the many newly learnt skills, however small each may be.
The project-based approach aims to give students an all-round perspective of translation projects, of workflow, and of the teamwork required to complete large multilanguage translation projects using translation environment tools (TEnTs). The book does not aim to be prescriptive: suggested TEnTs may not be accessible and they could become obsolete over time. One can only use what is available in the educational institution. There is the option to download demo versions on personal devices. The URL www.routledgetranslationstudiesportal.com/ in the textbook points to the (editable) Routledge Translation Studies Portal. The double asterisks ** in the book are important markers that signify useful URLs in the Portal. The asterisks are followed by the title of the relevant section on the web page. If URLs are not accessible, you may be able to find alternative sites from your location.
Introduction for instructors
Situated learning brings the real world into the learning process (Rohlfing et al. 2003). A student who struggles with the CAT tool and does not experience the interplay, who cannot understand the context, and cannot visualise the situation, is likely to give up on translation technology. They must learn to see the whole picture: the situation is created by the source text with its characteristics and constraints (e.g. PDF formats), and the context is a construct that depends on factors such as language pair and direction (are there enough students in class with the same language pairs?), access to the internet, etc. The context controls and challenges the user. A CAT tool presents its own boundaries and limitations, and they too are part of the context. The student is confronted with a situation and is challenged to act appropriately within it. They need to learn to shape translation technology instead of being shaped by it; in the book they learn about issues surrounding machine translation, translation quality in TEnTs, digital ethics, and translating in the cloud, and this knowledge should help them make choices and set parameters.
‘Situated learning’ gives the student a chance to reflect on the role and impact
of translation technologies and to see the bigger picture (Bowker 2015). We need
to create an authentic workplace in the classroom where students work and learn
collaboratively in preparation for a career in the translation industry. The ‘Food for
thought’ sections in this book invite discussion or reflection on the theory. The
project-based assignments combine understanding, reflection, learning, and teamwork in practice.
Project-based assignments
The project-based assignments in the chapters offer frameworks which can and should be adapted by the instructor, in line with situation, context, and competencies. The assignments challenge the students to remember, understand, apply, analyse, evaluate, and create (Bloom's revised taxonomy, 2001). The instructor leads
discussions and offers support during planning and execution stages. After the
team’s self-analysis and evaluation, the instructor sits down with the students for a
final assessment of their collaborative translation project.
By the end of the book, students will have experienced most management and linguist roles pertaining to the creation and completion of a translation project. They will also have tried and managed a variety of technological features. They should have taken turns in managing multiple translation projects. Project management and other business-focussed or career-oriented skills are consolidated in the final chapters. The instructor could
arrange a presentation on project management by a colleague or student from the
business school at any time.
Assessment
A project-based approach to translation technology is built on the premise that
what happens in the industry should be trialled in class. Errors are made, outcomes
may vary, and evaluation is essential to achieving improvement. Three types of
assessment, carried out in collaboration between instructor and students, can be used as a starting point (Robinson et al. 2008):
Students must not forget the ultimate objective of learning about translation tech-
nology, which is to improve translation quality. The submission of a final translation
by a team can be graded by the instructor.
• the individual student should assess their own translation quality and efficiency
achieved through collaboration and project management
Individual assessment should compare the student's own translation against the benchmark translation.
Team setup
Students who have not yet worked in the translation industry may not necessarily be familiar with concepts such as project management or project-based translation. Both concepts are explained in Chapter 1, but students will need guidance and support when setting up their teams to carry out a project-based assignment. The assignments vary between two kinds of team setup: the project team, in which a team of students work together, and the project management team, which manages and contracts students to do the work. The boundary between the two types of team is not strict, and there may be situations where there are not enough students to form teams, in which case assignments will have to be carried out collaboratively between individuals. Setting up a team is challenging and needs to be well organised:
1. the instructor prepares a range of URLs with suitable source texts containing short sentences and repetition. The digital format of the text matters, too. A text without layout requirements or HTML markup codes would be preferable in the initial assignment;
2. the instructor invites an external 'client' to supply their source material for translation. A good example is University College London (Shuttleworth 2017), where trainee translators collaborated with the Museum of Zoology, which holds a repository of 68,000 specimens. The museum was delighted to have students translating for free and the students were highly motivated to produce translated material that would be published.
The Routledge Translation Studies Portal is therefore the site to consult regularly for updates and additions and the names of products and tools. The URL is www.routledgetranslationstudiesportal.com/. In the book, URL links on the Portal are marked like this: **. The information in parentheses gives the title of the relevant section in the book on the Portal's web page.
Another concern has been deciding on the title of this book: outsiders interpret 'project-based approach' as meaning project management and 'translation technology' as machine translation. The project-based approach refers to the style of learning advocated and to translation as a complex, frequently collaborative undertaking: volume and time constraints on source documents often mean that they must be split and shared among several translators. The translation technology component is best explained as the software that is used by translators and LSPs in large translation projects. The range of tools is vast. The CAT (computer-aided translation) tool is the main player, because it is the tool into which so many other technological aids can be integrated. The machine translation engine is not the main player, even though it likes to compete with and within the CAT tool. It is, however, a very serious competitor in terms of both efficiency and improved translation quality. Watch this space!
The project-based assignments at the end of each chapter provide opportunities to practise translation technology skills, preferably within a framework of collaboration, through the use of appropriate tools. If circumstances do not permit organised collaboration, you will have to find other ways to collaborate: you can ask revisers anywhere in the world to revise your target files; you can ask teachers to comment on your work in bilingual CAT files; you can crowdsource; you can use digital platforms.
There may be several setbacks ahead of you while working through this book: the tool may not be available or work well, there may not be enough peers with the same language pair to facilitate collaboration or enough translators/revisers to tackle the job, or deadlines may be missed. These problems crop up in the working lives of professional translators too. This book aims to give you a better understanding, and hands-on experience within the assignments, of how translation technology is managed in the workplace. It also aims to help you use translation technology comfortably in your personal work environment. Translation technology is promoted as being a translator's best friend: manufacturers claim that it radically improves efficiency. Their often-used mantra is less time, more profit, better quality. We will put this to the test by using their translation technology tools and reviewing them through the eyes of the different parties involved.
This book is written not only for trainee translators and aspiring translation project managers, but also for practising translators, project managers, and instructors. Hopefully, practitioners within each category will appreciate being given an overview of translation technology tools and their impact on translation work. The project-based assignments at the end of the chapters are set up as frameworks and give users and readers of the book an opportunity to experience
Introduction for students and translators
Pitfalls
Rather than my listing a whole string of recommendations here for avoiding potential problems with hardware and software, translation texts and resources, language pairs and direction, I suggest you discuss any specific concerns with your instructor in advance of beginning assignments. Here are a few general tips and warnings:
• internet
CAT tools work well with certain formats and love the repetition of terms and
terminological phrases.
• hardware
Key concepts
• The translator, the language service provider (LSP), and computer-aided translation (CAT) tools have become partners in the translation process
• The computer-aided translation tool comes first among translation environment tools (TEnTs)
• The CAT tool is one of many TEnTs that can be linked or integrated
• The management of translation projects requires good teamwork and administrative and problem-solving abilities
Introduction
CAT tools are invariably used in managed translation projects, but they can also be helpful in translation projects where the translator is working directly for the client. The CAT tool was built to assist the translator and is shaped by the translator. It can efficiently process the work of collaborating translators and revisers, and it has become the tool used by project managers to help a team of linguists, in-house or freelance, deliver a high volume of good quality translations, in multiple languages, shared among many contractees, in the shortest period of time. Any difficulties associated with the tool must be solved by the users. Most likely, the problems were created by humans in the first place and not by the technology (Kenny 2016). This chapter builds on a basic level of CAT tool skills, good MS Word competence, and the availability of a Microsoft (MS) Windows operating system and an appropriate CAT tool. In addition to MS Word and MS Windows expertise, your administrative skills will be challenged: effective personal file management is the precursor to successful CAT tool operations.
2 Computer-aided translation tools
1.2 Compatibility
Most CAT tools operate on MS Windows systems. Users of non-MS Windows systems can either download virtual MS software or use CAT tools specifically designed for their systems. Currently the leading programs are only Windows-friendly. LSPs often prefer to contract translators who use MS operating systems for ease and compatibility. Many MS-based TEnTs are a challenge for non-Windows users.
OmegaT was launched in 2000 to work on multiple operating systems, and others have followed. Whereas the big players in the MS Windows market are expensive to purchase and to maintain (annual service agreements, upgrades), OmegaT is an example of free open-source software for non-MS users. 'Open' means that the software can be inspected, modified, and enhanced by anyone, either producer or user. OmegaT's compatibility with other CAT programs is supported through filter plugins, which allow you to process XLIFF files, the standard work files in CAT (2.4). XLIFF is an XML-based format, created to facilitate the exchange of files between programs and users. Extensible Markup Language (XML) encodes documents in a format that is both human-readable and machine-readable. The demand to use macOS (Apple's desktop operating system) software and not only Microsoft on desktops and laptops has increased so much that many suitable CAT programs have been designed since OmegaT entered the market**(link: non-MS compatible CAT tools). The leading CAT tool competitors remain Microsoft-only programs.
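To make the XLIFF work file more concrete, the sketch below builds and reads a minimal XLIFF 1.2 document with Python's standard library. This is a simplified skeleton for illustration only, not the output of any particular CAT tool; real XLIFF files carry many more attributes and metadata.

```python
import xml.etree.ElementTree as ET

# A minimal XLIFF 1.2 skeleton: one file element, one translation unit.
# The filename and segment text are invented for the example.
xliff = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="brochure.docx" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Welcome to the museum.</source>
        <target>Bienvenue au musée.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

ns = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
root = ET.fromstring(xliff)

# Each trans-unit pairs a source segment with its target segment.
for tu in root.iterfind(".//x:trans-unit", ns):
    source = tu.find("x:source", ns).text
    target = tu.find("x:target", ns).text
    print(f"{tu.get('id')}: {source} -> {target}")
```

This pairing of `source` and `target` per segment is exactly what makes XLIFF files exchangeable between translators, revisers, and different CAT programs.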
Compatibility is an issue not only for hardware and software but also for the format of source files. CAT tools can handle major file types like Microsoft Office files and webpages, but more complicated file types such as image files are not always supported. For example, InDesign files (INDD) are currently only supported by one CAT tool**(link: INDD compatible), and even then, only after conversion in the cloud. XML files, in which text is created and stored in databases rather than in document files, can be converted in CAT tools, but not without limitations or applied filters. Once they are set up in the CAT tool, the process is straightforward and they will be delivered as a perfect image product, ready for publication, if tags and codes have been applied correctly (2.4). PDF files can be problematic in relation to formatting. Hence, there are conversion TEnTs to make the import into a CAT tool straightforward (2.4).
A different kind of compatibility, or conflict, arises between web-based and standalone CAT tools (1.6.2). When the translator is required to work in a web- or cloud-based CAT tool, such as a server managed by an LSP, the translator cannot take ownership of the data they have entered in the TM database. If the translator cannot download the XLIFF file or work in, for example, the program's offline editor**(link: offline mode), the translation units entered in the TM are no longer the translator's intellectual property. This loss of control often means that translators cannot build their own TMs. The technical and ethical aspects of TM sharing and online translation/collaboration will be discussed in more detail in Chapters 2 and 6.
Lack of compatibility between CAT tools has brought much frustration to working translators. The manufacturers are fully aware of this, and they improve compatibility to remain ahead of their competitors. Technology is often driven by manufacturers, who do listen if users make their voices heard. Compared with the early programs of the 1990s, the user-friendliness of today's CAT tools has made them intuitive and much more accessible.
In 2010, two decades after the software had entered the market and competition had flourished, TAUS, the Translation Automation User Society, predicted a more advanced future for translation memory**(link: TAUS). The EAGLES definition of TM as a text archive from which translators select has given way to programs which are highly compatible. They use a ranking system to suggest and propose, and it is up to the user to reject or accept. TAUS wants to prove that algorithms, metrics, ranked matches in TM, and machine translation (MT) are on a par. They believe that TM will move into the cloud, like its MT sibling:
TM will now finally become a smart tool, bridging the gap with its more
intelligent MT sister and significantly increasing the recycling of previous
translations. At the same time, TM will move into the cloud. Leveraging of
translations will be done in the cloud through web services links in desktop and enterprise translation tools. The combination of advanced leveraging
and the sharing of TMs in the cloud will boost translation productivity by
30% to 50%. And just as important, we will see greater consistency and
accuracy in translations.
TAUS Predictions Webinar, 21 January 2010
The listed features show that the CAT tool can be supportive in different genres
or styles of text and in a variety of formats. TM supports our personal working
memory; it remembers what might have slipped our mind.
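The ranked matching that a TM performs can be sketched with a simple character-similarity ratio. The toy example below uses Python's difflib; it is an illustration of the idea of a fuzzy match, not the matching algorithm of any commercial CAT tool (those typically work on tokens and weighted edit distance), and the segments and threshold are invented.

```python
from difflib import SequenceMatcher

# A toy translation memory: stored source segments mapped to translations.
tm = {
    "The battery must be charged before first use.":
        "La batterie doit être chargée avant la première utilisation.",
    "Do not expose the battery to heat.":
        "N'exposez pas la batterie à la chaleur.",
}

def best_match(segment, memory, threshold=0.7):
    """Return (score%, stored source, stored target) for the closest
    TM entry, or None if nothing reaches the threshold."""
    scored = [
        (SequenceMatcher(None, segment, src).ratio(), src, tgt)
        for src, tgt in memory.items()
    ]
    score, src, tgt = max(scored)
    return (round(score * 100), src, tgt) if score >= threshold else None

# A new segment close to a stored one yields a high "fuzzy match",
# which the translator can accept, edit, or reject.
match = best_match("The battery must be charged before use.", tm)
print(match)
```

The percentage score is what CAT tools display next to each suggestion, and it is also the figure on which TM-match pricing (discussed later in the book) is based.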
The formatting function, a helpful characteristic of the CAT tool (1.6.1), is a great time saver. It replicates the layout of the ST in the TT, and it facilitates the translation of file formats which cannot be processed in MS Word. The CAT tool automatically converts the different formats to its standard bilingual interface (2.4). Web formats are one example of the many new technological features that have leveraged translation above the pure act of translating. They have placed more demands on the translator. CAT tools can help (Melby and Wright 2015: 665).
What value does the CAT tool add to the translation process?
• source file
• bilingual doc file (for review)
• XLIFF file (the work file that can be shared with others)
• TM database
• terminology database
• target file
The number of files and folders, as well as the storage system, varies between programs. The translator cannot control the generation of files but can control where they are stored. Before importing the source file in the CAT tool, it is important to
1.4.1 Filenames
Do not change filenames
The naming of a computer file should be done with care, clarity, and precision. Once in the CAT tool, the name cannot be changed. LSPs request that a filename not be changed, because a different name would prevent the file from being imported into their system.
01_ASD
ASD_01
BSD_04
BSD_15
BSD_7
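The list above shows why zero-padding matters: a computer sorts filenames character by character, so BSD_7 lands after BSD_15 unless the number is padded. A quick sketch (the filenames are the example ones above):

```python
# Lexicographic sort: "7" > "1" as a character, so BSD_7 wrongly
# ends up after BSD_15.
unpadded = ["BSD_15", "BSD_04", "BSD_7"]
print(sorted(unpadded))

# Zero-padding restores the intended numeric order.
padded = ["BSD_15", "BSD_04", "BSD_07"]
print(sorted(padded))
```

This is why consistent, zero-padded numbering in filenames keeps project files in the order the LSP and translator expect.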
Non-alphabetic languages
If the language is non-alphabetic, the computer will have its own method of ordering filenames containing its characters, for example by strokes or by sounds (as in pinyin). To achieve a degree of conformity and homogeneity between languages, filenames generally begin with numbers.
Parent folder
    EN-FR
        ST
        TT
    EN-IT
        ST
        TT
    TM
Parent folder
A parent folder is the higher directory and can be filled with subdirectories or subfolders. Figure 1.1 shows an example set up by a translator: a parent folder with three subfolders:
CAT
    Agency F
        BackupByRetrofit
        en_GB
        External Review
        nl-NL
        Reports
        Retrofit
        TM_eco_en-nl.sdlproj
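A folder structure like the ones shown above can be set up before a project starts. The sketch below creates a parent folder with per-language-pair subfolders using Python's pathlib; the folder names are illustrative, not a convention imposed by any CAT tool.

```python
from pathlib import Path
import tempfile

# Build the structure of Figure 1.1 inside a temporary parent folder:
# one subfolder per language pair, each with ST and TT, plus a TM folder.
parent = Path(tempfile.mkdtemp()) / "Parent_folder"
for pair in ["EN-FR", "EN-IT"]:
    for sub in ["ST", "TT"]:
        (parent / pair / sub).mkdir(parents=True, exist_ok=True)
(parent / "TM").mkdir(parents=True, exist_ok=True)

# List everything that was created, relative to the parent folder.
created = sorted(p.relative_to(parent).as_posix() for p in parent.rglob("*"))
print(created)
```

Creating the folders up front, rather than letting the CAT tool generate them, keeps source texts, target texts, and TMs where the translator decided they should live.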
Tips
[…] symbol
If you are not familiar with operations, and you want to export or generate a target file, do so slowly and click on the […] symbol next to the filename. It opens your browser and allows you to select a folder where you want to store the target files. By clicking on 'next' or 'finish' without opening your browser, the CAT tool takes control and places files in its program folders or generates new folders. The general rule of thumb is that if we click on Next, Next, Next, without much consideration as to what is happening, we allow the CAT tool to make choices that tend to ignore our directories.
Right click
Left click
The ribbon, in MS Word and the CAT tool, is a powerful command bar that has
organised the program’s features into a series of tabs at the top of the window.
Several MS Word core functions are quite similar to CAT tool functions:
The functions on the ribbon operate the program. We tend to use the functions we think we need, and ignore others. The CAT tool ribbon contains functions that override the default settings and make the program more adaptable to our needs, for example, to our language pairs. Our productivity, quality, and efficiency will benefit from taking time to become familiar with the ribbon.
Shortcuts help us work faster and are more precise than moving the mouse to commands on the ribbon. If there is a shortcut, it can be found by hovering over an icon on the ribbon: the key combination will appear. We save time by keeping our fingers on the keyboard, and it prevents clicking on the wrong icon or command.
Most CAT tools support MS Office formats, such as .docx, .odt, .csv, and .xlsx, plus .html and .xml files. The translation industry has introduced the XLIFF file (interchangeable CAT tool format) and the TMX file (interchangeable format for databases), which improve compatibility and homogeneity. Advanced CAT tools also support software formats, such as .json, which is a language-independent format, Visual Studio, and software for layout and building plans, such as InDesign, Corel Draw, and sometimes AutoCAD. The usefulness of CAT tools cannot be denied when translating multimedia formats. They extract text for translation, then rebuild and reformat the file in the target language after the translation is finished. The translator works in the standard bilingual CAT tool text display and is not distracted by metadata or, for example, the autocompletion function in Excel (2.4 Formats). The spellchecker operates automatically in the CAT tool, unlike in Excel and some other formats.
A new type of web-based CAT tool was launched in 2019: the smartphone or tablet version**(link: web-based and standalone parity). Obviously, the phone version cannot show all the functions, but it works well as a backup for the translator on the road who is asked to make quick changes or add a paragraph.
The developers of CAT tools are constantly introducing modified or new features. The similarity between the main tools has increased greatly, and so has their compatibility. Many LSPs prefer translators to work on their servers, and if this trend continues it will be interesting to see whether it will be necessary to invest in standalone tools at all.
What advantages and disadvantages does the CAT tool bring to the translator?
Translations have become projects, and translation is only one component in the workflow. The translation project requires management of communications and accounts in addition to the actual translation process. Sheer volume demands a different approach to how we translate and how we manage the work.
Figure: the translation workflow — client, LSP, project managers, translator, reviser, and proofreader; the source text becomes the target text via the translator and the revised text via the reviser, coordinated by the project manager.
The LSP is the intermediary between client and translator. The LSP must source translators (and other linguists, such as revisers). A purchase order (PO), the first official document with details about type, word count, deadline, and agreed price for the translation service, needs to be sent to the translator to complete the first stage. Translation is the second stage, followed by the third stage of revision; the final stage in the workflow includes delivery and invoices from translators to the LSP, and from the LSP to the client. If the contracted translator uses a CAT tool, stage 2 consists of:
receipt of file > preparation > ST import > translation > self-revision > return to LSP > third-party revision > return to LSP > return to translator > export of XLIFF file or generation of target text > return to LSP.
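The stage-2 sequence above is, in effect, an ordered checklist the translator walks through. A trivial sketch (step names taken directly from the sequence):

```python
# Stage 2 of the workflow as an ordered checklist.
stage2 = [
    "receipt of file",
    "preparation",
    "ST import",
    "translation",
    "self-revision",
    "return to LSP",
    "third-party revision",
    "return to LSP",
    "return to translator",
    "export of XLIFF file or generation of target text",
    "return to LSP",
]

def next_step(completed):
    """Return the step that follows `completed` finished steps."""
    return stage2[completed] if completed < len(stage2) else "done"

print(next_step(3))
```

Keeping the sequence explicit like this is what translation management systems automate: each hand-off back to the LSP is a tracked event, not an informal email.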
During the process, additional nodes of interaction are generated through contact with the project manager at different points, such as delivery, return of the revised translation, review by the translator, and return to the LSP project manager (Figure 1.5). If the translator works in a web-based CAT tool controlled by the LSP, the cycle is smoother because the entire workflow is in the cloud (Figure 1.6).
Figure: the cloud workflow — client, LSP, translator, and reviser all interact through the LSP's cloud platform.
The time allowance for the following translation request is three hours from receipt of email to delivery of the translation. The subject is technical (cosmetics) and requires research. Note that self-revision is more effective the following day than immediately after completion:
Good morning,
I hope you are well today. We are currently looking for a translator to help with
a translation for our Cosmetics client.
The text is about a new range of summer products for 2020; there are 414
words. Please take note of the source file in which you can also see the product
names for reference.
Would you have capacity to translate this by 2pm UK time today?
Email from LSP (2019)
Translation work has generally become more complex, with less remuneration (see 2.1.3). Good translation management is crucial for well-being and satisfaction. It needs to be more comprehensive than the delivery of quality translations.
specialism), tool use, and the types of service(s) they offer such as translation,
proofreading, post-edit of machine translation, desktop publishing, transcription.
Project management can be mapped into five processes (Dunne and
Dunne 2011):
• initiation
• planning
• execution
• monitoring
• signing-off
Increasingly, LSPs automate the process and use either CAT tool servers or web-
based translation management systems.
The 'project management team' can be modified (reduced) to a 'project team' in which the members perform tasks themselves. Very often, project team members then become project managers because they need third parties to assist them in their roles and functions. Thus, the project team may evolve into a project management team. There may be circumstances when teamwork is not at all possible in class: the components of the translation project can then be performed from the perspective of a contracted freelance translator who collaborates with others.
In the real world, a project management team will ideally consist of linguists
with different native languages and language pairs. Team members have different
functions and can therefore be assigned different tasks. The teams usually have
one senior project manager, responsible for leading the team, ensuring that
the members have everything they need to complete their tasks to achieve
goals and objectives. The project management team will have any number of
project managers to manage the tasks through contracted external linguists
(freelancers).
The project-based assignment will always have a set time frame in which members fulfil their roles. After completion the team is dissolved. Teams can reconvene, be modified, or be reconfigured for new project-based assignments. If the team works well, it can stay together. For the duration of the project, the team members work towards the common goal of delivering the project within time and budget constraints and according to agreed standards. Third parties are the stakeholders, such as clients and service providers. If the service providers are translators, they are contractees. The project managers may well spend much time overseeing and managing their team of service providers. Completion and quality checks of the translation should be performed by team members, such as revisers and quality assurers. Project managers within a team are accountable to the senior project manager and need to communicate effectively with their colleague project managers and stakeholders to deliver a successful project.
Although there is a distinct difference in the distribution of tasks performed by
project teams or project management teams, hybrid project (management) teams
are quite acceptable.
• the rules that matter to the client are standard, style, price, and speed;
• the project management team is faced with hierarchical rules of seniority in
the team, which also affect external translators and revisers;
• the translator requires fair pay, sufficient time, helpful resources, and support.
Speed and rates are common causes of friction because the different parties do
not share the same objectives: the translator generally asks for more time than the
client/LSP is willing to give, and the same applies to rates. The contracted translator
may fear losing their contract with the LSP if they do not abide by the rules.
Cognitive friction also exists on a technological level. What is the impact of
TEnTs on the translator?
• upgrades and new versions of their own CAT tools have a negative impact
on time
• translating in the LSPs web-based tools challenges routine and habit
• financial depreciation in relation to the pricing of TM matches
• reduced word fees for post-edited machine translation (PEMT)
• prohibited use of integrated MT in a CAT tool in relation to potential breaches
of confidentiality (6.2)
There has always been cognitive friction, but its nature has changed since the arrival
of TEnTs. LSPs find themselves in a difficult position between the translation buyer
and the translation vendor where the parties have a different understanding of time,
efficiency and quality. Good LSPs will try to satisfy the needs and wishes of both
parties. Project managers must be skilled and diplomatic negotiators.
22 Computer-aided translation tools
Project-based assignment
Objective:
A hands-on experience of managing translation technology and human
resources in a translation project.
Method:
The assignment is designed for teamwork, but collaboration between individ-
uals is possible.
Tools:
CAT-tool (a server version is not necessary at this stage)
Suggested resources:
Digital source text (for example a brochure for museums, cars, domestic
products, services, hotels and catering, promotional university websites,
health and safety in organisations, non-government organisations).
Consult your instructor to check if the text is suitable for a collaborative
translation in a CAT tool.
Assignment brief
In this project-based assignment we will try our hands at managing a translation
project. The LSP team consists of different roles and the project offers different
perspectives: the project requires project managers (PM) and translators. The LSP
manages a translation project and contractees, but does not translate. The translator
managed by the LSP will be asked to translate or to revise. TEnTs play an important
part. The brief is as follows:
The client requests the translation of a brochure into multiple languages. The
LSP is asked for a quote and best turnaround time. The brochure has an approximate
word count of 3000+ words. Consider this request as an LSP and work through the
following stages as a team:
Possible pitfalls
• Unsuitable material for translation. The TM in CAT tools is effective when
there is repetitive terminology. Consult your instructor.
• Shortage of translators
• Role overlap
• Formats: it is advisable to use uncomplicated formats in this assignment, such as
docx. Other formats often require tags, which are discussed in the next chapter
(2.1.1 and 2.4).
Concluding remarks
This chapter has introduced you to a range of translation environment tools with
an overview of the CAT tool and the TM, its features and functions. We have
considered the benefits and challenges of the CAT tool from the translator’s and
the client’s or LSP’s perspectives. We have considered the similarities between
ribbons and commands in CAT tools and MS Office, and their usefulness. We have
seen the importance of good file management with respect to the CAT tool. We
have learnt how to manage translation projects that involve multiple target
languages, managed by LSPs who collaborate with the client, contract freelance
translators, and take care of accounts and quality standards in translations. We
have looked at the different
roles in a translation project management team with project managers and in-house
or contracted translators and other linguists. You have been given an opportunity
to organise the setup of a project management team and to determine the roles
of its team members, create workflow charts and operate features and functions in
the CAT tool. All these activities help us reflect not only on our different roles and
contributions to the outcome and quality of translations, but also on the use of
CAT tools in the translation process.
Further reading
Dunne, Keiran J. and Elena S. Dunne (eds) (2011). Translation and Localization Project
Management. Amsterdam and Philadelphia: John Benjamins.
Ehrensberger-Dow, Maureen and Sharon O’Brien (2014). ‘Ergonomics of the translation
workplace: Potential for cognitive friction’. In: Deborah A. Folaron, Gregory M. Shreve,
and Ricardo Muñoz Martín (eds), Translation Spaces, 4(1): 98–118. Amsterdam and
Philadelphia: John Benjamins.
Melby, Alan K. and Sue Ellen Wright (2015). ‘Translation memory’. In: Chan Sin-Wai
(ed.), The Routledge Encyclopedia of Translation Technology, pp. 662–77. London and
New York: Routledge.
Walker, Andy (2014). SDL Trados Studio – A Practical Guide. Packt Publishing.
Wright, Sue Ellen (2015). ‘Language codes and language tags’. In: Chan Sin-Wai
(ed.), The Routledge Encyclopedia of Translation Technology, pp. 536–49. London and
New York: Routledge.
2
THE TRANSLATION MEMORY
DATABASE
Key concepts
• The translation memory database can be filled to suit our individual needs
• The translation memory database accepts several methods to boost its content
• The translation memory database can be customised to suit our language
pairs
• The translation memory database includes features to improve translation
quality and productivity
• The translation memory database can affect profit margins
Introduction
‘Translation memory’ means TM database. In the translation industry the terms
‘TM’ and ‘CAT tool’ are often interchangeable: TM meaning CAT tool or vice
versa. We will try to avoid this confusion. The TM is a database consisting of
translation units (TUs), which we have entered over time. We can increase the size of our
TM by importing TMX files (interchangeable TMs) from colleagues or LSPs for
our use or reference purposes (2.3.3). A TM needs to be managed intelligently to
be effective. The TM is more than a useful storage system. Regular edits and good
maintenance make the TM a resource that is customised to our own needs and keep
it up to date. The TM is the primary database in the CAT tool.
The TM’s special feature is that it automatically stores TUs when we confirm
a translated segment, whereas the terminology database (Tmdb) must be built
manually: we must highlight and add the terminology pairs we want to store.
The Tmdb is integrated in CAT tools. One CAT program prefers an external
Tmdb, because it can then also be used outside the CAT tool. In the tool it can
be linked to the translation project and integrated like any internal Tmdb**(link:
External Tmdb).
The TM in a new CAT program arrives empty and it can take quite a while to
fill, depending on the amount of translation we do. If you are a student who uses
a public PC, the TM will be cleared each time the PC is shut down. Therefore, we
must be careful to export our TMs as interchangeable TMX files and save them in
our personal accounts for importation when we open a public CAT tool (2.3.3). If
we want to fill a TM before we start a new project, we can import resources (2.3.1)
or TMX files (external TMs).
In this chapter we will explore how we can best use and boost the TM to
suit our needs, even in the short term, and make it a superb tool to increase and
enhance quality, efficiency, and potential profit. Although translators and LSPs share
the objective of better quality and more productivity through the TM, translators
often cross swords with LSPs about word rates when costing a project (2.1.3).
Manufacturers constantly update their programs, adding new features. The cloud
plays a crucial part in CAT tool development; it gives access to infinite new oppor-
tunities. In the project management assignment, you can experience TM features
and view them from different perspectives, through the eyes of the LSP and the
translator.
2.1.1 Segmentation
The CAT tool splits a source document into segments so that the TM can store
the TUs systematically for recall. File preparation, after importing a document,
is an automated procedure that happens almost unnoticeably in the CAT tool.
‘Project preparation’ is the process that is clearly displayed in
one of the CAT tools (Figure 2.1). The first task, ‘Convert to Translatable Format’
(Figure 2.1), is the segmentation process and converts the source text to a
segmented bilingual CAT tool format. By default, the segment boundaries are
spaces or punctuation marks, which works well in word-based languages such
as English where a full stop signifies the end of a sentence. The segmentation is
defined by rules specific to each source language and may vary between different
CAT tool programs. We can to some extent edit the rules to suit our needs, but
they are designed to support match searches and our intervention could prevent
recall. The inclusion or exclusion of one word in a new string will not generate
a perfect match. In non-whitespace languages (character-based languages in East
and South East Asia) segmentation has been traditionally defined by characters.
CAT tools are now enabling a choice so that words instead of characters can
be ticked as the basic unit for segmentation in Chinese and Japanese source
languages.
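The default rule described above, a break after end punctuation followed by whitespace, can be sketched in a few lines of Python. This is a bare illustration of the principle: real CAT tools work from per-language rule sets (often SRX files) with exception lists for abbreviations such as ‘e.g.’ or ‘Dr.’, which this sketch does not have.

```python
import re

# Naive sentence segmentation: break after ., ! or ? followed by whitespace.
SEGMENT_BREAK = re.compile(r"(?<=[.!?])\s+")

def segment(source_text):
    """Split a source text into sentence-like segments."""
    return [s for s in SEGMENT_BREAK.split(source_text) if s]

print(segment("This is a new model. Is it in stock? Yes!"))
# -> ['This is a new model.', 'Is it in stock?', 'Yes!']
```

Editing the segmentation rules of a CAT tool amounts to changing patterns like `SEGMENT_BREAK` — which is why careless intervention can prevent the TM from recalling earlier matches.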
have been missed during a bilingual review in the CAT tool editor, the interface
with grids containing the segmented ST and TL units.
Despite improvements and new developments in the tools to help us deal with
segmentation, opinions on the ultimate translation quality are divided. Pym (2008)
claims that TM software may help maintain terminological consistency, but it requires
too much management to bring about great productivity gains. Meanwhile, LSPs
are great advocates of the TM. Their priority is to meet the client’s requirements,
which is made possible through a shared TM in large projects. Furthermore,
segmentation structures source texts, improves repetition rates, guarantees
consistency, and leads to cost reduction (2.1.3).
Ultimately the translator is the decision-maker regarding acceptance or modi-
fication of segmentation rules. We need to be aware of syntactic differences in
relation to our language pairs so that we can change segmentation rules to suit
our needs (2.5.2). We can also control segmentation rules by being critical of the
way in which source texts are formatted before we import them. Editable source
texts can be formatted or reformatted by the translator in MS Word. For example,
we can use the formatting features in MS Word, rather than spaces, tabs, and hard
returns or line breaks. Reducing these special characters prior to importing
the file will greatly cut spurious segmentation and reduce the number of format tags
(2.4). The translator should prioritise precision over recall and favour quality over
productivity (Bowker 2005). Precision refers not only to content, but also to
source formatting.
FIGURE 2.2 Dialog box opened in concordance search in SDL Trados Studio 2019
is often used for that purpose. In Figure 2.6, the concordance dialog highlights
four search results for ‘supplied’ in the source language but there are only two
identical matches in the target language. Adjectival endings in the target language
prevent matches. However, if you highlight the stem of the word in the concord-
ance search, it will show results with and without the adjectival ending. A more
translation project (Figure 2.7). They expect a translator to offer a discount for
repetitions, 75% discount for 90–100% matches, 50% discount for 70–90% fuzzy
matches and no discount for (fuzzy) matches below 70%. Alternatively, the LSP sets
up a Purchase Order (PO) and chooses a different breakdown against their TM and
it is up to the translator to accept or reject the translation job. Some translators will
reject discounts altogether. Their arguments may be that they purchased an expen-
sive CAT tool for their benefit and that even 100% matches need to be checked in
context. They risk not being offered the job.
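The discount grid above translates directly into a weighted word count. The sketch below uses the bands just mentioned; the base word rate, the band counts in the example analysis, and the assumption that repetitions are discounted at the same level as 90–100% matches are all hypothetical figures of ours, not the chapter’s.

```python
# Discount per match band, as in the grid discussed above.
DISCOUNTS = {
    "repetition": 0.75,   # assumed equal to the 90-100% band
    "90-100%": 0.75,
    "70-90%": 0.50,
    "below 70%": 0.00,    # fuzzy matches below 70%: no discount
}

def quote(band_counts, rate=0.10):
    """Price a job from an analysis report: words per band at a discounted rate."""
    return sum(words * rate * (1 - DISCOUNTS[band])
               for band, words in band_counts.items())

analysis = {"repetition": 37, "90-100%": 120, "70-90%": 400, "below 70%": 2500}
print(round(quote(analysis), 2))
```

Running the same analysis through two different discount grids shows at a glance why the LSP’s purchase order and the translator’s own costing can diverge.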
The analysis gives word and character counts and other statistical data, such
as the number of new words and repetitions in a file, before or after translation.
In Figure 2.7 the ribbon at the top gives the name of the project, file(s), active
document, or part of the document from the cursor, and counts. It gives a status
report of what has been translated and the amount of repetition in the ST. It
also gives the option to calculate the word count in another program, as CAT
tools vary. The variance is caused by the method chosen to identify and count
numbers or tags, and different segmentation rules. The analysis in the lower half
of Figure 2.7 shows that there is repetition in 15 segments and in 37 source
words. A calculation in the other tool showed the same number of segments
but less repetition, a lower source word count and fewer words with no match
(Table 2.1).
There are also language-specific variances in pricing. For example, 1000 Chinese
characters will usually be translated into about 600–700 English words; 1000 English
words will be translated into about 1500–1700 Chinese characters (depending on
the nature of the text). A Dutch target text generally has a word count some 15%
higher than the English source text, and this variance within language pairs is not
uncommon; it is due to differences in syntax. In other languages a letter or character
count per line may offer a more reliable fee basis. Fees are generally based on the
source text because these counts are available at the initial negotiation stage. Ideally
fees should be set per language direction.
In German, a language that joins many noun compounds into single long words,
fees tend to be based on standard lines consisting of 55 characters including spaces
between words. Fees can also be based on hourly rates, page rates, and flat fees
(minimum fees).
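The German standard-line convention lends itself to a simple calculation: divide the character count (spaces included) by 55 and multiply by a line rate. The line rate in the sketch below is a made-up figure for illustration.

```python
# Fee per standard line of 55 characters including spaces, as is common
# for German. The rate of 1.50 per line is a hypothetical figure.
def standard_lines(text, chars_per_line=55):
    """Character count (spaces included) expressed in standard lines."""
    return len(text) / chars_per_line

def line_fee(text, rate_per_line=1.50):
    return standard_lines(text) * rate_per_line

text = "Ein Beispieltext " * 100       # 1700 characters
print(round(line_fee(text), 2))        # 1700 / 55 lines at 1.50 each
```

A character-based count like this is indifferent to compound length, which is precisely why it suits a language that joins many nouns into single long words.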
The analysis feature in the CAT tool is convenient for costing, both for translators
and LSPs. When CAT tools first entered the market and introduced match matrices
in the analysis feature, the agencies began to ask contracted translators for reduced
rates for repetitions. Many translators felt let down because they saw their profit
margins fall below the level they had enjoyed before CAT tools. Translators argued that fuzzy
matches needed editing and that revision included a critical review of context
matches and therefore there should not be any discounts. One of the manufacturers’
selling points was that CAT tools would increase productivity and increase profit.
The question many translators ask is ‘whose profit?’.
FIGURE 2.7 Analysis report (statistics tab in memoQ 9.1)
TABLE 2.1 MemoQ (M) and TRADOS 2007-like (T) counts compared in MemoQ 9.2
What are arguments to defend a 100% payment of fees for perfect and fuzzy
matches?
in translation results. We can see in which TM, associated with which client
or domain, they occur. Displayed dates are significant too and may remind us
that it is time to clean a TM if obsolete terms present themselves. Although
we cannot add metadata to the TM database on the fly, we can use the Tmdb
for this purpose. It may be useful to know the part of speech (POS), gender
or number, so that we can account for declensions and inflections. The TM
searches in authentic strings, while the Tmdb searches term pairs or phrases,
which we can modify, extend, make plural, etc. Fuzzy matches are only generated
if there is a high level of similarity. Although it is possible to lower the fuzzy
match percentage in the settings, it is not recommended. The TM will propose
many unhelpful suggestions. If translators find the TM’s recall disappointing, it is
because matches are part of longer strings in segments. In this respect, the recall
by Tmdb is superior (Flanagan 2014).
Software developers are working hard to develop subsegment matching in the
TM to improve match results.TAUS, the Translation Automation User Society**(go
to www.routledgetranslationstudiesportal.com/ – A Project-Based Approach to
Translation Technology – link: TAUS), was set up in the Netherlands in 2009.
The association consists of major translation buyers and other organisations and
companies interested in the advance of translation technology, especially machine
translation. It offers webinars to educate its members and other stakeholders, and
conferences to generate networking possibilities. TAUS has developed a frame-
work to test (machine) translated data and to collect and disseminate data among
members (Zetzsche 2019 (297th Journal)).
TAUS has launched a new product called Matching Data. Its developers recog-
nise that adding metadata to the TM could in fact be counterproductive if metadata
cause TUs to be locked into one domain. Their solution to inadequate matching
lies in receiving huge quantities of parallel language data from different sources and
owners, which are then transformed into unique corpora, i.e. bodies of text, such as
glossaries. They are not only domain-specific (e.g. associated with a client’s product,
field or science) but are also customised to individual search requirements. In
Matching Data, the matches are based on granular subsegments, which means they
focus on much smaller units, with more lexical or morphemic detail and descrip-
tion. TM manufacturers have adopted this approach. Many TMs can now search at
subsegment level in the TU to give a partial perfect match, rather than a fuzzy match
which needs post-editing. This kind of subsegment leverage (Flanagan 2014) has
been given different names by different CAT tool manufacturers: ‘Deepminer’,
‘Uplift’, or ‘Longest Substring Concordance’. The TM looks for a matching string
of words within another TU and that substring is not affected by any surrounding
words which would previously have prevented a match.
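The ‘longest substring’ idea can be illustrated at word level with Python’s standard library: find the longest run of words a new segment shares with a stored TU, unaffected by the words around it. This is a sketch of the general principle only, not a reconstruction of any vendor’s ‘Deepminer’, ‘Uplift’, or concordance algorithm, and the example sentences are invented.

```python
from difflib import SequenceMatcher

def longest_substring(new_segment, stored_segment):
    """Longest run of words shared by a new segment and a stored TU."""
    a, b = new_segment.split(), stored_segment.split()
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])

tu = "The spare parts are supplied with the new model."
print(longest_substring("All spare parts are supplied on request.", tu))
# -> "spare parts are supplied"
```

The surrounding words (‘All … on request’ versus ‘The … with the new model’) would have blocked a segment-level fuzzy match, yet the shared substring is still recovered.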
The identification process is straightforward, but CAT programs operate dif-
ferently when their respective TMs reassemble or auto-assemble new and longer
strings, and results vary. ‘Uplift’, which matches subsegments in the TM, works better
if the TM is smaller and more specific. The program claims that Uplift saves us using
their external terminology database. This might be true if the searched terms are
stored in the TM, but if they are new or stored in external resources the suggested
match may be less perfect without access to terminology and reference features.
CAT tool developers are trying to remedy this shortcoming, and some CAT tools
give the translator more control over resources that can be used in addition to
the TM. ‘Deepminer’ was the first form of subsegmentation. The TM ‘mines’ for
subsegments and uses statistical data to analyse TM content (DTA=Dynamic TM
Analysis). DTA was called ‘Guess Translation’ in another CAT tool. What all CAT
tools do is perform a concordance search, i.e. find subsegment pairs in source and
target strings and present the subsegments combined with the non-matching word
strings in the Translation Results window. This procedure allows the translator to
select the target subsegment, insert it, and translate the remainder of the string. The
feature Auto-assemble was also a product of data analysis. It was a laudable attempt
to create good matches, but the edit distance was high. Autosuggest is not unlike
predictive text on smartphones: it recalls potential matches in the TM, triggered
by the translator’s keystrokes. Subsegmentation in CAT tools is work in progress.
CAT tool upgrades bring new features.
Obviously, the quality of subsegment matching will depend on the quality
and specificity of the TM. A random TM will give random matches, whereas a
customised, domain-focused, well-maintained TM will deliver higher quality
matches and a lower edit distance. Again, it is the translator who holds the controls.
Sophisticated features need data, and they are provided by the translator.
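The ‘edit distance’ mentioned above can be made concrete: the Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn a TM suggestion into the final target text; the lower the number, the less post-editing was required. A minimal dynamic-programming version:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("This is a new model.", "This is the new model."))
```

Comparing the edit distance of matches from a random TM against those from a well-maintained, domain-focused TM is one way to measure the quality difference the paragraph above describes.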
2.3.1 Alignment
The alignment feature in the CAT tool turns previously translated documents and
their source texts into translation units (TUs) so that we can add them to a TM.
Performing alignment to boost the TM can be useful but it is time-consuming.
CAT tool developers have tried to facilitate and automate the alignment process.
It has become more intuitive and one CAT tool claims it is now integrated and
automated. Different programs have different names for the alignment feature. In
one CAT program (memoQ 9.1), we add reference files to a LiveDocs corpus and
the program will then align them automatically. It claims that the aligned results are
correct in most cases, but that human revision is needed each time a match presents
itself to ensure good results. It is advisable to make yourself familiar with the
alignment feature before the need arises. Automated alignment is a great improve-
ment compared to the manual adjustment of join-up lines between segments which
used to be the standard method in CAT tools (Figure 2.8).
CAT tools offer five types of alignment: alignment with review, which means
checking the join-up lines between segments or review in the translation editor on
the fly, and alignment of single files and of multiple files. The fifth type is the
monolingual review (5.3.1): the translator can revise an exported clean target file.
When the reviewed target file is re-imported, the TM performs an automatic
alignment and updates changes. The alignment of source and target texts and
external reference material is not the only way to boost a TM; we can create TM
content with existing public translations or public, bilingual corpora (4.3).
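At its simplest, alignment pairs source and target sentences by position. The sketch below shows the idea, and also why human review remains necessary: a sentence that was split or merged in translation breaks the one-to-one assumption. The Dutch example sentences are invented.

```python
# Naive positional alignment of previously translated material into TUs.
# Real aligners also handle 1:2 and 2:1 mappings (split or merged
# sentences), which this sketch deliberately refuses.
def align(source_sentences, target_sentences):
    if len(source_sentences) != len(target_sentences):
        raise ValueError("sentence counts differ: review needed")
    return list(zip(source_sentences, target_sentences))

tus = align(["This is a new model.", "It is in stock."],
            ["Dit is een nieuw model.", "Het is op voorraad."])
print(tus)
```

The old join-up-lines interface (Figure 2.8) is essentially a manual correction pass over exactly this kind of positional pairing.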
FIGURE 2.8 Alignment with join-up lines between source and target segments
2.4 Formats
Source text files for translation come in many different formats (up to 50) and
CAT tool manufacturers compete to keep up with changes and the interchange-
ability of new formats. CAT tools have made it possible to translate files that were
previously inaccessible or not readable on translators’ desktops. There are image
files or files with such complex layouts that reformatting the TT would put the
translation job beyond the scope of the translator and the budget of the client.
Before accepting an unusual file format, it is advisable to check if your CAT
tool can process it. You may receive files in formats with formatting instructions
hidden in tags, which must be observed in the target segments. Tags are entered
in the TM as part of the string. Tags affect not only the word count but also the
identification of strings. The TM may not identify a string as a match due to
an added or missing tag. Tags are considered inconvenient and their insertion takes
extra time, but they are the best way to retain difficult formats. Some formats are
easier to manage within the CAT tool because of its standardisation of all formats
in the translation editor. Formats such as Excel spreadsheets, web-based HTML
files, PDF files, and PowerPoint files all conform to the segmented grid in the
editor. The only difference is that their formatting must be kept by means of tags.
There are format tags in the SL segments and they must be inserted in the target
segments by the translator.
MS Excel
Translating an MS Excel file is more convenient in a CAT tool than in Excel
itself: the text does not hide behind other cells; furthermore, it can be
spellchecked, the word count feature operates, and Excel’s predictive autocomplete
is absent in the CAT tool. Although Excel spreadsheets were not designed for translations,
they have become popular with LSPs to manage short or fragmented source texts
such as instructions or specifications that need translating in multiple languages.
All the target languages can be assembled in one spreadsheet. When working in
Excel, the translators must insert translations in the respective language columns,
and if their language is in the P column, it is difficult to see the source text in the
A column without moving the scroll bar. The CAT tool imports and converts the
Excel format to its clear bilingual editor interface and exports the translation in its
original Excel format. One CAT tool (DVX3) has a function that excludes red text
(fonts) in spreadsheets. If the translator copies the source text to their column and
changes the font colour to red in the A column, the source text will remain in the
A column and not be overwritten in the CAT tool and the target text is exported
in the required column.
HTML
HTML (HyperText Markup Language) files are written in a ‘markup’ language
which, although readable, has tags with all the information needed to define
structure and layout for the text on the World Wide Web. The tags are placed within
angle brackets <>, which is what the eye can see. The CAT tool reads all (hidden)
instructions and presents them with its own set of tags (also called codes) in the
source segments. To replicate the original layout, the CAT tags must be included
in the right places in the target texts. Translators complain about tags, because they
require extreme accuracy and cost extra time. If the syntax of SL and TL is very
different, the insertion of tags in the right place can be demanding. The CAT tool
flags up error warnings if tags are forgotten or misplaced and the file cannot be
exported until the omission is corrected. CAT tool manufacturers are working
hard to simplify and automate the insertion of tags. It would be impossible to post
HTML target files on the web without the markup and we must therefore respect
the CAT tool tags.
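A sketch of the kind of check a CAT tool performs before allowing export: the target segment must contain the same inline tags as the source, in the same order. The regex and the Dutch example below are illustrative, not any particular tool’s verification routine.

```python
import re

# Inline HTML-style tags: an opening "<", optional "/", a tag name,
# anything up to the closing ">".
TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def tags_match(source_seg, target_seg):
    """True if the target carries the same tags as the source, in order."""
    return TAG.findall(source_seg) == TAG.findall(target_seg)

src = "Press <b>Start</b> to begin."
print(tags_match(src, "Druk op <b>Start</b> om te beginnen."))  # True
print(tags_match(src, "Druk op Start om te beginnen."))         # False
```

A `False` here corresponds to the error warning described above: the file cannot be exported until the missing or misplaced tag is corrected.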
Adobe Acrobat Reader could not open ‘xxx.pdf ’ because it is either not a supported
file type or because the file has been damaged. For example, it was sent as an email
attachment and was not decoded correctly.
PDF
The Portable Document Format (PDF) is one of the most widely used file
formats. It is an easy method to secure a file and change it from an editable format
in MS Word to an uneditable image file. It is a problematic format for the trans-
lator. Translation in a CAT tool is not always possible and if it is, sentence sequen-
cing and format may change. If the PDF file is text-based you will find that
you can search and copy text, but if it is image-based this is not possible and an
Optical Character Recognition (OCR) program is needed to convert the file and
make it editable.
It is important to understand why CAT tools have more difficulty converting
some files, and why the success of the TT layout is not guaranteed. The clearer
the text, the better the conversion result, but a scanned text file saved in PDF
format is enough to make a translator and their CAT tool despair. The pseudo-
translation feature is helpful to see if the file can be exported and what the target
file will look like prior to attempting a translation. If a translator receives many
PDF files, it is worth considering the use of a high-quality OCR program to
convert PDF files prior to importing them in the CAT tool. It guarantees the
retention of format and sentence structures. In comparison, CAT tools can pro-
cess other common formats extremely well, such as PowerPoints, and formatting
results are good.
2.5.1 Filters
We have stressed the importance of maintaining and editing the TM. CAT
programs have their own servicing mechanism and we should apply the TM
repair tool regularly. It does most of the ‘maintenance work’ for us by re-indexing
the database and we can set the edit tab in the TM to remove duplicates. This
process is called setting up filters. The TM editor has many filters and deserves
our attention:
• Edit filters can be used to search for a specific source/target term or phrase in
the TM.
• We can check modification dates in ‘history’ to recall previous versions for
comparison.
• We can set or add filters. For example, TM maintenance is made easier if we set
the filter to show duplicates, because they can then be deleted on the fly each
time they appear in the Translation Results window.
• We can set a filter to capitalise certain words in our languages or we can tell it
not to capitalise.
All these functions and more are available in the TM editor. When a TM slows
down because of its size, or if we use several TMs simultaneously in a project to
recall as many matches as possible, filters help improve matches and prevent false
positives.
Some programs have a filter that sets penalties. These produce percentages in
the margin and alert us to matches which may be 100% identical but include a
tag or a space. If we ignore this penalty it will affect the format. A percentage of,
for example, 90% will alert us to differences and allow us to accept or edit the
proposed match. We can add or remove penalties depending on our requirements.
Furthermore, if one of our clients requests the term ‘item’ (ST) to be translated
as ‘product’ (TT) and another client prefers ‘article’, we can set penalties to alert
us that our clients use different terms. Penalties are also useful when we import a
TM from a colleague. A penalty set for the entire external TM will tell us that its
search results are not our matches. Penalties are not ‘punishments’ but alerts that
matches need closer scrutiny. Filters give us a means to control the TM regarding
the matches it recalls.
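The arithmetic behind a penalty is simple: a fixed number of percentage points is subtracted from the raw match value, demoting a deceptive 100% match so that it surfaces for closer scrutiny. The penalty values below are invented for illustration; they are not any specific tool’s defaults.

```python
# Demote a raw match score by the penalties that apply to it.
def apply_penalties(base_score, penalties):
    """Subtract percentage-point penalties from a match score (floor of 0)."""
    return max(0, base_score - sum(penalties.values()))

penalties = {"formatting difference": 1, "external TM": 10}
print(apply_penalties(100, penalties))  # -> 89
```

A 100% match demoted to 89% is no longer inserted silently, which is exactly the ‘alert, not punishment’ behaviour described above.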
2.5.2 Regex
Our texts are full of patterns, such as dates, which are ordered differently per country
and language. Regex in the CAT tool is set up to apply transfer rules for different
formats of date and time, currencies, metrics, numbers, and email addresses. The
transfer rules will adapt dates from, for example, US 03.13.22 ST to UK 13.03.22
TT. Regex helps the TM recognise the string and it will match the digits in the
source and target segments on the fly.
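A transfer rule like the US-to-UK date conversion above is a classic regex substitution: capture the three two-digit groups and emit them in the target order. A minimal sketch:

```python
import re

# US MM.DD.YY in the source becomes UK DD.MM.YY in the target.
US_DATE = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{2})\b")

def us_to_uk(text):
    """Swap the first two captured groups: month.day.year -> day.month.year."""
    return US_DATE.sub(r"\2.\1.\3", text)

print(us_to_uk("Delivery on 03.13.22."))  # -> "Delivery on 13.03.22."
```

This is the mechanism by which the TM can still recognise a string as a match even though the digits are ordered differently in source and target.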
Regex, which stands for ‘regular expressions’, is rooted in the mathematical theory
of formal languages on which pattern matching is based. MT and CAT tools could
not operate without regex. It is not about regular language expressions but about matching recurrent
patterns. Why is regex important to us? If we recognise and understand pattern
matching, we can change segmentation rules and improve our error checking in
the CAT tool’s QA (quality assurance) functions. Here follows an example where
the TM will not recognise the sentences as identical because they lack automatic
numbering:
B)
This is a new model.
1.1.
This is a new model.
If you go to segmentation rules in the CAT tool you can add the following regex:
^\(?[a-zA-Z0-9]+\)[\s\t]*
It means: look for all segments that start with any lowercase letter or
uppercase letter or any number between 0 and 9, which repeats itself one or
more times, is preceded or not by a left parenthesis, is followed by a right paren-
thesis, then by a space character or a tab character, which repeats zero or more
times. To change this rule we then add #!#, which means apply a segment break
here. Regex may not immediately appeal to linguists, but translators who have
learnt a few codes find it extremely helpful and an excellent way to customise
their TMs.
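The rule above can be tried out directly in Python. Note that, as written, it matches the ‘B)’ style of numbering but not the dotted ‘1.1.’ prefix from the earlier example, which would need a variant rule with `\.` in place of `\)`:

```python
import re

# The rule discussed above: an optional "(", one or more letters/digits,
# a mandatory ")", then any spaces or tabs.
rule = re.compile(r"^\(?[a-zA-Z0-9]+\)[\s\t]*")

print(bool(rule.match("B) This is a new model.")))    # True
print(bool(rule.match("(1) This is a new model.")))   # True
print(bool(rule.match("1.1. This is a new model.")))  # False
```

Testing a candidate rule against real numbering prefixes like this, before adding the `#!#` segment break, is the quickest way to see whether it will fire where we intend.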
Regex codes consist of metacharacters, which are standard. Here are some
examples:
If you want to check for inconsistencies, and you realize/realise that you may have
confused GB and US spelling, you can click on Find and search for one spelling
and then again for the other spelling. The regex metacharacter | enables a single
search: realise|realize.
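As a brief illustration in Python, the same single search can also be written with a character class, an equivalent of spelling out both alternatives:

```python
import re

text = "We realise the risk, although the LSP did not realize it."
# The character class [sz] is a compact equivalent of spelling out
# both alternatives: it matches realise and realize alike.
print(re.findall(r"reali[sz]e", text))  # ['realise', 'realize']
```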
Many regexes are pre-set: if boxes in the CAT tool settings are ticked, cur-
rency in the English group should appear in the target format as 1,234.56 and
in the German target format as 1.234,56. This feature is automatically enabled
when we select source and target languages. However, we may have a client who
requires a space instead of a comma as the thousands separator, as in, for example,
2 000, and it is helpful to set the regex accordingly in the custom list in
auto-translation rules**(link: regex).
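What such auto-translation rules do can be sketched as follows; the function names are invented for illustration, since each CAT tool implements this internally:

```python
import re

def to_german_format(number: str) -> str:
    """Swap English separators (1,234.56) for German ones (1.234,56)."""
    return number.replace(",", "\x00").replace(".", ",").replace("\x00", ".")

def space_thousands(number: str) -> str:
    """Client-specific rule from the text: a space as thousands separator."""
    return re.sub(r",(?=\d{3})", " ", number)

print(to_german_format("1,234.56"))  # 1.234,56
print(space_thousands("2,000"))      # 2 000
```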
CAT tools explain how to create a regex list in their Help sections and although
it requires some code writing, it is a skill that can dispel irritation: it prevents
the TM from producing repeated errors, fuzzy matches, or what we call false
positives: unwanted matches in Search and Find. MT and TM systems both rely
on regexes to match TUs; MT needs regexes as a fully automated system. CAT tools
and MT systems are gradually merging, based on the same modus operandi. The
next chapter discusses integrated and adaptive MT in CAT tools, which would not
be possible without regexes.
Project-based assignment
Objective:
Collaboration in a translation project and the management of one or more
shared TMs either online or offline
Method:
The assignment is designed for a project management team but collaboration
between individuals is possible
Tools:
A desktop CAT tool to test the exchange of TMX and XLIFF files. A server
version is not necessary but would facilitate real-time collaboration between
translators.
Suggested resources:
Digital source text and digital reference materials (monolingual for reference,
bilingual for alignment)
Assignment brief
Set up a project management team (1.7.4.1) to include a senior project manager, a
relations manager, who is responsible for negotiations with the client and translators,
a resource/reference manager, and other managers as appropriate.
Your translation project involves a client, contracted translators, and one trans-
lation with a high word count plus adequate reference materials. The source text
for translation into one or more languages is split and shared between two or more
translators per language pair/direction. The main objective is to build a useful TM
(to be shared as a TMX) and supply reference materials to maintain consistency in
the translation. Consistency can be measured according to key terms in the TMX
and reference materials.
In this assignment most time should be spent on project and TMX preparation
within the team.
The following actions are suggested; the order is not set:
Possible pitfalls
• duplicates in the TMX
• poor quality of the source text. Manipulate the ST if needed. Consistency in
the ST can be helpful.
• import and export of TMX files impact (sub)segmentation
• loss of data in TMs and TMXes during migration – keep backups
• forgetting to attach a TM(X) to the project
Concluding remarks
This chapter has illustrated how the TM as the main component in the CAT
tool can improve translation quality and increase your productivity. The industry’s
mantra has been reviewed in the light of quality and consistency, productivity and
quantity, and the impact of the TM on the profit margins of manufacturer, LSP, and
translator. The CAT tool is an extremely sophisticated and comprehensive program.
The TM is however relatively inflexible. We cannot make alterations or additions
in the TM on the fly, as it can only accept our TUs, store them and recall matches
with high similarities. Improved translation quality depends on the user’s mainten-
ance of TM quality. The TM can only increase productivity if it is well filled and
customised, possibly accompanied by a range of smaller customised TMs, rather
than one big master TM. Furthermore, if we do not use filters and regex, and we
must change straight apostrophes to curly apostrophes manually after completion, we
are not using the tool efficiently. If we are asked by the client to use terminology
consistently and there are multiple translators involved, we must check matches
from an external TMX file before we insert them. We need to know their origin,
date, and version history. If our TM is boosted with TMX data, it might be prefer-
able to view it in Read Only mode. Alternatively, we can create a new TM for the
client and import the TMX data into the customised TM. Productivity and quality
are ultimately controlled by the translator and not by the CAT tool.
Further reading
Dragsted, Barbara (2005). ‘Segmentation in translation. Differences across levels of expertise
and difficulty.’ In: Target, 17(1): 49–70. John Benjamins. https://doi.org/10.1075/
target.17.1.04dra.
Flanagan, Kevin (2014). ‘Subsegment recall in Translation Memory – perceptions, expectations
and reality’. In: JoSTrans, 23.
Garcia, Ignacio (2015). ‘Computer-Aided Translation’. In: Chan, Sin-Wai, The Routledge
Encyclopedia of Translation Technology, pp. 68–87. London, UK: Routledge.
Gintrowicz, Jacek and Krzysztof Jassem (2007). ‘Using regular expressions in translation
memories’. In: Proceedings of the International Multiconference on Computer Science and
Information Technology, pp. 87–92.
Zetzsche, Jost (2003–17). A Translator’s Tool Box for the 21st Century: A Computer Primer for
Translators. Winchester Bay: International Writers’ Group.
Zetzsche, Jost (2019). ‘The 297th Tool Box Journal’. The International Writers’ Group.
3
INTEGRATION OF MACHINE
TRANSLATION IN TRANSLATION
MEMORY SYSTEMS
Key concepts
• The translation industry has embraced technology in response to high volume
translation requiring a fast turnaround
• Artificial intelligence (AI) and the further development of machine translation
(MT) meet the high demands
• Translation technology developers have invested significantly in research to
improve MT
• We can combine MT search engines with the TM in the CAT tool to improve
and speed up our translation work
Introduction
Machine translation (MT) is an important TEnT. In this chapter we will evaluate
MT’s added value to translation when it is integrated in the CAT tool. The MT
function requires little preparation or action. When it is enabled in the CAT tool,
its proposals pop up in the Translation Results window for the translator to accept
or reject. TM matches take priority and MT matches may appear at the bottom of
the list, depending on the CAT tool.
The MT database cannot be edited: it is part of the external search engine.
We can only accept, reject, or post-edit proposed matches. Our discussion of MT
is specific: MT as a TEnT, integrated in the CAT tool and potentially adaptive.
For example, if we accept and edit an MT match, the TU enters the TM when
it is confirmed. Next time, the previously edited MT match will be proposed as
a TM match. Developers have caught on to this and designed closed circuits in
which the MT learns from the TM and becomes adaptive. This is not possible in an
independent MT engine such as Google, which learns from huge corpora and not
from the individual translator’s CAT tool entries.
It is important to understand the operation of TM and MT in the CAT tool. We
must also know how to estimate and evaluate MT quality in the CAT tool. If we
have a better understanding of MT features as a TEnT and use the tool appropri-
ately, we can reduce the need to post-edit (PE) after the translation is completed.
We can do the editing on the fly. We will therefore discuss MT as a TEnT in a sup-
portive capacity, its integration in CAT tools, the quality it delivers, and the evalu-
ation of its quality through models and metrics. All this can be tried and tested in
the project assignment.
[Figure 3.1: diagram of the CAT tool linking custom MT, the terminology database, and the translation memory]
percentages (2.1), whereas MT matches are not rated or scored. This is an imbal-
ance that is detrimental to MT matches. If algorithms were given more domin-
ance, the TM might be able to select from the MT database, but NMT requires
such a large amount of training data that it would be too large for the TM to
hold and process.
Most CAT tools enable the integration of NMT and various programs list the
NMT engines that can be used. Although the integrated MT comes second in the
CAT tool, TM matches and MT matches are recycled in each other’s databases. If
the TM does not leverage a match, the MT will generate one, which will then be
entered in the TM after it has been edited or confirmed by the translator. It will
populate the segment next time as a TM match. In an adaptive MT system (3.6),
TM matches enter the MT through deep learning (Figure 3.1). The searches of
TMs and MT engines in their respective databases follow a similar pattern, but the
qualitative outcome is quite different.
CAT systems have shown improvement in MT match accuracy. They can now
apply another layer of metadata to segments to indicate whether an MT translation
will reach an acceptable level of quality (TAUS). Web-based CAT tools can predict
which MT engines are likely to give the best integration into the CAT workflow.
This is called machine translation quality estimation, which is still in its
early stages.
TAUS (2019) compares the different levels of productivity between TM and
MT. The study shows that segments translated with TM have a shorter edit distance
on average and are translated faster. The minimum edit distance between two
strings of words is the minimum number of editing operations that are needed
to transform one to match the other. Editing operations are generally insertion
(I), deletion (D), and substitution (S). The example shows how the MT transla-
tion (line 2 in Table 3.1) compares with the ‘ideal’ reference translation (line 1 in
Table 3.1):
The edit distance is determined by the weighting given to each editing operation. If
S (substitution) scores 1, D (deletion) scores 2, and I (insertion) scores 3, the edit
distance in this sentence is 8.
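The weighted calculation can be reproduced with a standard dynamic-programming routine; the sentence pair below is invented for illustration and is not the pair in Table 3.1:

```python
def weighted_edit_distance(ref, hyp, sub=1, dele=2, ins=3):
    """Weighted Levenshtein distance over word lists, with the text's
    example weights: substitution 1, deletion 2, insertion 3."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele                      # delete all ref words
    for j in range(1, n + 1):
        d[0][j] = j * ins                       # insert all hyp words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,   # match / substitution
                          d[i - 1][j] + dele,       # deletion
                          d[i][j - 1] + ins)        # insertion
    return d[m][n]

ref = "the cat sat on the mat".split()
hyp = "the cat is on mat".split()
print(weighted_edit_distance(ref, hyp))  # 3: one substitution + one deletion
```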
DQF, the TAUS Data Quality Framework, looks at the distribution of
MT+PE+TM+HT in hybrid TM/MT processes. Any process will require some
post-editing (PE): MT matches, TM fuzzy matches, and human translation (HT)
without matches. What TAUS discovered was that 100% matches in the TM were
more reliable than any MT match, and that fuzzy matches, which by default have a
correctness threshold of around 85%, are less reliable than MT: the lower the
fuzzy match percentage, the more reliable the MT match. Their study of efficiency in
the different processes shows that a combination of all processes (MT+PE+TM+HT)
achieves an efficiency percentage of over 45%, that TM achieves nearly 45%
in suitable word matches, and that HT is given the highest percentage of time spent on
each segment. TAUS’ findings indicate that the edit distance in TM is longer than
in MT and yet TM productivity is higher. A suggested explanation is the translator’s
familiarity with their TM and the awareness of the level of editing that is required in
fuzzy matches according to their percentages. TM fuzzy matches are, however, a grey
area because of the different match rates. TAUS concludes that there is no clear point
where MT efficiency proves to be higher than TM efficiency. They also found that the
target language plays a role in quality and output. If English is the source language, MT
productivity in Western European languages can be up to three times higher than in
Asian languages. In terms of productivity, MT comes close to the lower fuzzy matches.
Advances seem to come from the CAT program manufacturers who have
developed adaptive MT in CAT tools to leverage MT as a TEnT that learns from
the TM. But before we discuss adaptive MT in CAT tools and translation man-
agement systems, we will discuss how MT search engines learn from TMs in the
CAT tool.
Why are many translators less enthusiastic about MT than the companies who
make the engines?
consistency and results. It appeared that modifications were possible in either dir-
ection, and could have an upgrading or downgrading effect, the latter caused by
repeated changed word order in the MT.
Quality and edits are the responsibilities of the user. Confirming unedited or
poorly edited matches will have a negative impact on the TM and consequently
on the MT, regardless of which database presented the match in the first place.
The conclusion in the study was that TM data performed better than the baseline
engine, which correlates with TAUS’ findings (3.3).
A research group at the University of Trento, Italy (Farajian et al. 2017) focused on
MT input and discovered that the integration of NMT in a translator’s workflow seemed
to work reasonably well if source domains were controlled, but that quality deteriorated
if there were multiple domains, large domains, and hence greater data diversity. They
also found that NMT worked well when sentences were processed from domains close
to the training data. Hence the success rate is likely to be higher when translators work
in a limited number of domains, feed their own data into an integrated MT in their
CAT tool, or build their own MT engines in translation management systems (7.5).
Adaptive and self-build MT engines enable translators to influence, control, and edit
MT content and quality before matches are confirmed and propagated.
collaborate with other translators. The interface integrates the components MT,
TM, and Tmdb and the engine provides the best suggestions based on the three
components. It states the source of the proposed matches. The topic of adaptive MT
engines is continued in translation management systems (7.5).
How do you see the benefits of MT as a TEnT when it is integrated in a CAT tool?
• Extra-quality translation: fluent, idiomatic, accurate, and culturally correct
in the target language; to be used for advertisements and literature
• Adaptation, which is not a direct translation of an original text; to be used for
press releases and advertising
EAGLES 1996
work together and there is a better understanding of the applied technology, the
translator and their TEnTs are more likely to deliver a better-quality translation.
a set of linguistic criteria for testing against a human reference translation. AEMs can
make use of the information retrieval concepts precision and recall (Doherty 2017).
Precision is the fraction of words in the MT output that are correct compared
with the HT benchmark, and recall is the fraction of the benchmark’s words that
the MT output reproduces. This approach is called error typology, whereas
other AEMs measure the edit distance according to the minimum number of edits
needed to match MT output to the reference HT. Of course, we could question the
objectivity and reliability of the ‘golden standard’ human translation, which is used
as the MT reference (Doherty 2017). Two translators will rarely generate identical
target texts, and two reviewers will seldom produce the same review.
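A minimal sketch of these two measures, simplified in that it ignores word order (which real AEMs do take into account):

```python
def precision_recall(mt_words, ref_words):
    """Word-level precision and recall of MT output against a human
    reference translation (word order is ignored in this sketch)."""
    correct = sum(min(mt_words.count(w), ref_words.count(w))
                  for w in set(mt_words))
    return correct / len(mt_words), correct / len(ref_words)

mt = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()
p, r = precision_recall(mt, ref)
print(round(p, 2), round(r, 2))  # 1.0 0.83
```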
The following models, LISA (Localization Industry Standards Association) and
TAUS, use metrics that are meaningful measurements and calculations to improve the
engine and estimate the quality of the output. In this book we are not measuring pure
MT but comparing MT and TM output in CAT tools in integrated and adaptive mode.
Nonetheless, we can still apply the following metrics after we have generated a ‘golden’
benchmark translation in HT. The important feature is that the metrics are weighted.
The LISA Quality Assurance (QA) model is used by companies or LSPs to structure
feedback from reviewers on translation quality in the localisation industry. Errors are
weighted with 1 point for a minor error, 5 for a major error, and 10 for a critical error:
Weighting means that if we record two minor mistranslations, they will generate
a score of 2. But if we record two critical mistranslations, they will generate a score
of 20. Scores for all segments in the task are totalled to give a final score for the task.
If the overall quality of the task falls below a predefined threshold, the task will fail
the LISA check. It is important that scoring models are agreed to make sure that all
reviewers score to the same standard.
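The weighting described above can be sketched as a small scoring routine; the pass/fail threshold is invented here, since the model leaves it to be agreed per project:

```python
WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def lisa_score(errors, threshold=20):
    """Total the weighted errors for a task; a task fails the check
    when the score exceeds the agreed threshold (20 is invented here)."""
    score = sum(WEIGHTS[severity] for severity in errors)
    return score, score <= threshold

print(lisa_score(["minor", "minor"])[0])        # 2
print(lisa_score(["critical", "critical"])[0])  # 20
```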
The error typology template is used by LSPs to manage their quality program. TAUS
has developed a Dynamic Quality Framework (DQF) which uses Multidimensional
Quality Metrics to standardise translation evaluation. Their objective is that buyers
and providers of translation can compare and benchmark MT productivity and
quality. For translators it would be useful if we could apply metrics to a sample
translation to give a client an estimate of quality, time, and cost. With the DQF plugin we
can track our translation quality and productivity in a CAT tool. The plugin collects
data from our CAT tools and sends a report to visualise our data.
Doc language
Category         Minor  Major  Critical
Mistranslation     1      5      10
Accuracy           1      5      10
Terminology        1      5      10
Language           1      5      10
Style              1      5      10
Consistency        1      5      10
The new metrics are analytic metrics that try to explain why translations are good
or bad, whereas holistic metrics merely state the quality level. In the DQF, TAUS sets
new standards to define errors. It creates a taxonomy, a more accurate classification
of translation errors, and a scoring method to produce numeric indications. A root
cause analysis also refers to the source text and its quality. Its metrics require human
intelligence: it is not a one-size-fits-all metric, not an automatic approach, nor a
BLEU reference-based score. It renders a TQ qualification for any quality management
system so that it is in accordance with ISO standards (5.4). It includes validity,
verification, and reliability as properties of a metric.
The DQF-MQM (Dynamic Quality Framework-Multidimensional Quality
Metrics) is an error typology used by TAUS like the LISA Quality Assessment. It
is extensive:
The category Verity (Table 3.3) means that translations should be rendered according
to locale conventions, for example, a copper wire in a product in one country might
not be copper in another. The category Style tends to be subjective and is an area of
disagreement among translators. The best rule is to aim for a minimal set of error types.
In fact, it is best to avoid checking style if not relevant (e.g. in manuals or specifications).
The emphasis of metrics tends to be on error management and error repair.
Good MT translator management should consist of domain-focused engine selec-
tion, TM data input and editing, and ST quality checks to enhance output quality.
At the beginning of the chapter, MT was presented as a rigid, inflexible TEnT but
we have discovered many ways of managing input, which should give us better
results in integrated (adaptive) MT and reduce PE time within the CAT tool. The
following project-based assignment gives us an opportunity to trial this aspect of
translation tool management.
TABLE 3.3 TAUS (2019) Dynamic Quality Framework (DQF) uses Multidimensional
Quality Metrics
Project-based assignment
Objective:
Ability to estimate and assess output quality and efficiency of integrated TM
and MT in a CAT tool
Method:
The assignment is designed for project teams with one language pair. A clear
distribution of tasks is necessary to provide rich data for metrics; good
data records will increase the reliability of results and restrict the impact of
variable data
Tools:
CAT tool with integrated or adaptive MT; a translation management system (TMS); APIs
Suggested resources:
Several source texts from different domains with high word counts to try
different MT search engines
Assignment brief
The project assignment is about testing and assessing integrated MT in the
CAT tool. You will need to set up a project team and appoint a project manager.
As a team you prepare a quote for a 6000-word translation (a deadline of four
workdays) which gives the client an estimate of the anticipated quality levels and
Possible pitfalls
• TM(s) with little data
• Poor MT data
• Text suitability for MT: the evaluation may be less dependent on data than the
suitability of the selected source texts. Test samples first
Concluding remarks
The objective of this chapter is to help you understand the processes used by MT
engines and the conditions that make them work well in a CAT tool. We have
explained how neural machine translation is a significant improvement over
statistical machine translation, particularly in fluency, syntax, and word order, although
not necessarily in terminology. You have had a chance to experience post-edits
Further reading
Doherty, Stephen (2017). ‘Issues in human and automatic translation quality assessment’.
In: D. Kenny (ed.), Human Issues in Translation Technology, pp. 131–48. London,
UK: Routledge.
Forcada, Mikel (2015). ‘Open-source machine translation technology’. In: Chan Sin-Wai
(ed.), Encyclopedia of Translation Technology, pp. 152–66. London and New York:
Routledge.
Garcia, Ignacio (2015). ‘Computer-aided translation’. In: Chan Sin-Wai (ed.), The Routledge
Encyclopedia of Translation Technology, pp. 68–87. London and New York: Routledge.
Hutchins, John (2015). ‘Machine translation. History of research and applications’. In: Chan
Sin-Wai (ed.), The Routledge Encyclopedia of Translation Technology, pp. 120–36. London and
New York: Routledge.
Kit, Chunyu and Billy Wong Tak-ming (2015). ‘Evaluation in machine translation and
computer-aided translation’. In: Chan Sin-Wai (ed.), The Routledge Encyclopedia of
Translation Technology, pp. 213–36. London and New York: Routledge.
Moorkens, Joss, Stephen Doherty, Dorothy Kenny, and Sharon O’Brien (2013). ‘A vir-
tuous circle: laundering translation memory data using statistical machine translation’.
In: Perspectives: Studies in Translatology. DOI: 10.1080/0907676X.2013.811275
TAUS (2019). DQF BI Bulletin – Q1
Wilks, Yorick (2008). Machine Translation. Its Scope and Limits. New York: Springer.
Zetzsche, Jost (2019). ‘The 298th Tool Box Journal’. The International Writers’ Group.
4
THE TERMINOLOGY DATABASE
Key concepts
• The terminology database is a comprehensive database in the CAT tool
• A terminology database is usually internal and integrated in the CAT tool (but
linked to the CAT tool if external)
• The terminology database complements the TM, especially when morpho-
logical changes affect matches
• The quality of online terminology resources is variable
• Corpora and concordances are indispensable terminological resources
Introduction
The terminology database (Tmdb), also called termbase in some CAT tools, is
one you build yourself within the CAT tool. A good Tmdb optimises, refines, and
customises TM recall. When the TM does not give anticipated matches, the Tmdb
can assist. Terminological pairs must be added manually. They are not stored auto-
matically like TUs in the TM. We create or open a Tmdb at the start of a project
and add new entries by highlighting the source and target term pairs or phrases.
We access the ‘add term’ function on the menu with a right click, a shortcut code,
or through an icon on the ribbon. To add metadata, such as parts of speech (POS),
synonyms or definitions, we use a different shortcut code or click on full entry
in the drop-down box. When the term recurs, matches appear in the Translation
Results window. They are marked with different icons or colours to show that
they are Tmdb results. The Translation Results window in Figure 4.1 (top left)
shows from top to bottom: two red TM matches with red ticks (1–2), three blue
Tmdb matches with green ticks (3–5), and four EuroTermBank results with their
typical icon (6–9) (4.6).
4.1.1 Glossary
A glossary can be defined as a dictionary in one or more languages containing
all the terminology of a domain, usually as preferred by an organisation.
A vocabulary is a list of terms with synonyms and definitions (or explanations) in
two or more languages, relating to a specific subject field, but not pertaining to the
activities of a particular company or organisation as a glossary does. A good example of an online
glossary is ‘Electropedia’, the online version of the International Electrotechnical
Vocabulary, produced by the International Electrotechnical Commission, a leading global
organisation that prepares and publishes International Standards for electrotechnology**(go
to www.routledgetranslationstudiesportal.com/ – A Project-Based Approach to
Translation Technology – link:glossary). Electropedia is multilingual, currently offering
terms in 14 languages, accessible to all. Another example of a glossary created by
an organisation is the Online Glossary on Governance and Public Administration
developed by the Division for Public Institutions and Digital Government (DPIDG),
of the UN Department of Economic and Social Affairs in collaboration with
CEPA**(link: glossary), the Committee of Experts on Public Administration:
The purpose of the Glossary is to provide United Nations Member States, and
all other interested parties, with a common definition of the basic terms and concepts
related to governance and public administration used in United Nations
documents, and DPIDG. In particular, the Glossary aims at improving the
clarity of the intergovernmental deliberations of the United Nations itself; and
at assisting Member States to better implement United Nations resolutions
by providing a more unified understanding of governance and public admin-
istration terminology. This Glossary contains non-legally binding definitions
Why is a glossary, a potential source for a CAT Tmdb, more suitable than a
vocabulary?
4.1.3 Metadata
If we compare the translation memory with the terminology database, there is a sig-
nificant difference in the amount and type of metadata either can store, and our control
over them. Metadata in the TM tell us when and by whom a segment was translated
or edited. This can be useful when comparing previously translated versions, which
are all stored in the TM. When we set up the project, we can state the domain, client,
and deadline. The Tmdb, in contrast, is filed manually and metadata are added by the
translator: SL/TL variant(s), fields of application, part of speech (POS), definitions,
sources, usage, and context (2.2, 4.4). When a Tmdb match appears in the Translation
Results window, we can open a dialog with a right click to view added metadata. If
the Tmdb is well filled and edited we call it a ‘high-end termbase’ (Melby 2012). This
means that the concept entries have been checked (not taken from the web without
quality control), and are accompanied by equally checked information, the metadata.
These data help the translator choose appropriately if there are several target options
for the source term. This work practice has a positive impact on the quality of the
TM and reduces TM edit time if appropriate terms are selected immediately and con-
sistently. Table 4.1 shows how a comprehensive terminological entry in the database
removes ambiguities and supports terminological consistency in a translation project.
It gives an example of a definition of a special breed of cat in the Tmdb of a
20,000-word translation project for an encyclopaedia on cats:
The window ‘Create term base entry’ (memoQ 9.2) in Figure 4.2 opens after
you have highlighted a source and target term pair in the translation editor and
clicked on ‘Add term’. It has entered the following metadata:
Termbase(s): en-nl-nl
Term entry: supplied (duplicates: 3)
Languages: English, Dutch
TL term: …….
TL contextual definition: …….
TL source: www....
Translation equivalent (full or partial): …….
• entry – source term(s): [+] for additional terms, [|] for change, and [-] for
removal
• matching:
• matching ‘50% prefix’: morphological changes (4.5) such as verbal
prefixes or endings are not immediately identified as matches by the
TM. The ‘50% prefix’ is the default setting. In this case, the Tmdb will
look for ‘supplies’ and ‘supplied’ but not for ‘supplying’. The default can
be changed to fuzzy, exact, or custom
• case sensitivity: this can be set to ‘yes’ or ‘no’, depending on whether a
term should contain lowercase letters, capitals, or both.
• usage enables us to add forbidden terms, meaning how a term should not be
translated
• grammar enables us to add POS, gender, and number. For example, the term
‘research’ can be a verb or a noun
• definition (Table 4.1)
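Pulling these settings together, a single high-end termbase entry might be modelled as a simple record; every name and value below is hypothetical:

```python
# Hypothetical high-end termbase entry: every value is invented to
# illustrate the metadata fields discussed above.
entry = {
    "source_term": "supply",
    "target_term": "leveren",                # Dutch target, illustrative
    "forbidden_targets": ["bevoorraden"],    # usage: how NOT to translate
    "grammar": {"pos": "verb"},
    "matching": {"prefix": "50%", "case_sensitive": False},
    "definition": "to provide something that is needed",
    "source": "https://example.org",         # placeholder reference URL
}

# A proposed match is acceptable only if it is not a forbidden term.
proposed = "leveren"
print(proposed not in entry["forbidden_targets"])  # True
```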
Definitions are sometimes needed to clarify the meaning and standardise the
term (4.1.4, 5.4), particularly in technical texts. A good and helpful definition will
meet the following criteria:
Which metadata would the translator find helpful in a TBX file provided by the
LSP to assist a translation on fuel converters? Rank them in order of importance.
search techniques and validity checks of our web hits. When we add a target
term retrieved from the internet to the Tmdb, we must be sure of its validity
and accuracy. Furthermore, the insertion of target terms in the CAT editor
needs to be based on our understanding of the entire source text, and our trans-
lation benefits from a linear, syntagmatic approach.
• (General-purpose) search engines are software applications that help search for
websites in WWW and then access and display them. Each search engine has
its own special features which also vary per country
• Specialised search engines show index pages for special topics only, which may not
be shown in general-purpose search engines
• Meta-search engines are search tools that operate as aggregators and use other
search engines’ data to produce their own results**(link: meta-SE)
Our search techniques can be made more term specific by using the operators
AND, OR, and NOT in the search box of the search engine. These operators
implement Boolean logic, and their use has distinct advantages and immediate results:
• AND narrows the search by retrieving only documents that contain both
keywords we enter, e.g. ‘tea AND coffee’ excludes ‘tea and chocolate’
• OR expands the search by returning findings with either or both keywords, for
example, ‘nursery OR day care centre’
• NOT limits the search to the first keyword only, for example ‘coffee ANDNOT tea’.
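The effect of these operators can be made concrete with a toy in-memory search over an invented document collection:

```python
def boolean_search(documents, first, op=None, second=None):
    """Toy Boolean search over strings: AND narrows, OR widens,
    NOT (the engines' ANDNOT) excludes the second keyword."""
    def has(doc, word):
        return word.lower() in doc.lower()
    if op == "AND":
        return [d for d in documents if has(d, first) and has(d, second)]
    if op == "OR":
        return [d for d in documents if has(d, first) or has(d, second)]
    if op == "NOT":
        return [d for d in documents if has(d, first) and not has(d, second)]
    return [d for d in documents if has(d, first)]

docs = ["tea and coffee", "tea and chocolate", "coffee beans"]
print(boolean_search(docs, "tea", "AND", "coffee"))   # ['tea and coffee']
print(boolean_search(docs, "coffee", "NOT", "tea"))   # ['coffee beans']
```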
The validity and accuracy of our term hits on the web are important. Boolean logic
is a good filter and makes hits more specific. Another filtering method is to check
the source and only accept term hits from reliable URL website addresses. The
URLs presented to us by the search engine tell us about the origin of the web-
site and help us determine a level of reliability. A domain suffix**(link: domain
suffix) is the last part of a domain and defines the type of website we may be
about to access. ‘Com’ domains are commercial websites, whereas ‘org’ domains
are used by organisations, and ‘co.uk’ means that the business is run in and from
the UK. Domain suffixes are not entirely reliable and the actual website must
be checked to be sure that it was not hijacked by a less trustworthy body for
their own purposes. The quality of the website does not necessarily guarantee
the quality of its terminology and linguistic checks are necessary. We can check
the quality and usage of term hits by inserting them in linguistic corpora and
concordances.
4.3 Corpora
‘Corpora’ (the plural of ‘corpus’) are large structured sets of texts or words.
A terminology database in a CAT tool is a corpus of words, like a dictionary; the
TM is a corpus of phrases or sentences. Aligned texts and reference files constitute
corpora in a CAT tool (Alignment 2.3.1 and Reference files 2.3.2). Digital
corpora can be external or integrated in the CAT tool. In the following sections
we will examine some external corpora and their TEnT qualities. Glossaries and
external terminology databases (interchangeable TBX files) can be imported into
the Tmdb.
FIGURE 4.3 Bilingual data corpora in Linguee stating that ‘external sources are not yet
reviewed’
The first entry in Figure 4.4 gives information on industry (Legal Services), type
(Standards, Statutes, and Regulations), Data owner (European Parliament), Data
Provider (TAUS).
Google offers us multilingual data corpora by letting us search the web in the
requested language. It gives a random set of links to the search term in context.
Other kinds of open-source multilingual corpora programs**(link: multilingual
corpora) contain words and phrases which are too new or too rare to appear
in a dictionary or standard corpus. Similar to the TAUS Data Cloud, they operate
as a concordance in Key Word In Context (KWIC) style (4.4). There are many
other free downloadable parallel corpora of texts (for alignment) and terms available
to fill your tools, but this does not guarantee the quality of your translations
(Zetzsche 2003–2017). Their web mining for material is done automatically and
can extract terms from reference materials by ‘bootstrapping’ our corpora onto
existing and available web corpora. For this there is the webBootCaT technology
and there are BootCaT tools**(link: creating corpora for Tmdb). The designers
of webBootCaT (Baroni and Bernardini 2004) built a tool that does not need to
be downloaded: a corpus can be created by using the Google search engine. The
basic method is that you first select a few seed terms (keywords), then send queries
with the seed terms to Google, and then collect the pages in Google’s hits. The
vocabulary in your created corpus of texts can be extracted automatically in the
tool. The designers’ web server will hold the corpus and load it into a corpus
query tool, such as Sketch Engine**(link: corpus building), where it is investigated
and analysed. The terminology database you have created must be converted to a
suitable format (CSV) before it can be imported into CAT tools. The corpus query
language in the term-extracting tool is a special language that looks for complex
grammatical or lexical patterns to establish suitable search criteria. We can see the
outcome of its searches in the concordance, where our queries are presented in
context.
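The KWIC presentation such a concordancer produces can be sketched in a few lines. This toy version centres each occurrence of a search term in a window of neighbouring words; the corpus sentence is invented for illustration.

```python
# A toy Key Word In Context (KWIC) concordance: each hit is shown with
# a fixed window of words to its left and right.
def kwic(text, keyword, window=3):
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left} [{w}] {right}")
    return lines

corpus = ("the translator checks the term in context before the term "
          "is added to the database")
for line in kwic(corpus, "term"):
    print(line)
```

Each printed line shows the keyword bracketed between its left and right context, which is how a concordance lets us judge the usage of a candidate term.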
In the following section about the concordance feature we return to the
parameters of the CAT tool which can give us more control over quality in our
databases.
How could standards and quality in multilingual data corpora be made more
transparent?
4.6 Termbanks
Large external terminology databases are often referred to as termbanks**(link:
termbanks). They have different modes of operation within or outside the CAT
FIGURE 4.6 Termbank and lookup term in CAT tool (memoQ 9.1)
Project-based assignment
Objective:
A hands-on experience of creating, building and sharing a CAT terminology
database (TBX) in a translation project to ensure quality and consistency in
all translations across the project
Method:
The assignment is designed for:
• A project team with several terminologists, revisers and a project manager
who create a TBX for sharing. You translate the text in-house.
A revised translation on completion of the project will provide the
basis for your assessment
• A project management team with terminologists, revisers and project
managers who split the source text between multiple contracted
translators and provide them with a TBX and monolingual/bilingual
reference texts
• Collaboration between individuals to create a TBX prior to or during
the translation of a shared source text
Tools:
CAT-tool (a server version is not necessary), access to online concordances,
termbanks, WWW and other terminology supporting TEnTs discussed in
the chapter
Suggested resources:
One digital source text with domain-specific terminology which can be
split between multiple translators in a CAT tool. Consult your instructor.
Assignment brief
A client requires the translation of a text in a highly specialised domain. They contact
a language service provider (LSP) and ask for a quote and best turnaround time.
The source text has an approximate word count of 500 words per available translator.
The client cannot produce a glossary. The source text is split between translators.
Teamwork is necessary to test the outcome of a shared translation supported by
good terminology resources. The following minimum requirements for resource/
reference materials and TBX apply:
• The TBX file for translators contains extracted terminology from mined
sources
• The TBX file includes metadata if you use a CAT tool with an integrated
Tmdb
• Reference files (mono/bilingual) are supplied to translators for alignment in
the CAT tool
Possible pitfalls
• Inferior quality of mined terminology. If ISO standards are not associated with
the resources, how do you check quality?
• If TEnTs are not available or require a subscription/licence, use available
resources, glossaries and dictionaries and create your own quality Tmdb/
TBX file
• Incompatibility of file formats of reference files/glossaries. TXT files or Excel
files (saved as CSV) can be imported into most CAT tools or Tmdbs
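The CSV workaround in the last pitfall can be sketched with the standard library: a two-column file of term pairs that most CAT tools can map to source and target fields. The language codes and term pairs are illustrative.

```python
import csv

# Illustrative term pairs; a real glossary would be mined and validated first.
term_pairs = [
    ("hard disk", "disque dur"),
    ("terminology database", "base de données terminologique"),
]

# Write a two-column CSV that CAT tools can map to source/target term fields.
with open("glossary.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["en", "fr"])  # header row: language codes are assumptions
    writer.writerows(term_pairs)

# Read the file back to confirm the round trip before importing it.
with open("glossary.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['en', 'fr']
print(rows[1])  # ['hard disk', 'disque dur']
```

Saving with UTF-8 encoding matters here: accented characters in the French terms would otherwise be corrupted on import.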
Concluding remarks
Translation without terminological support would at times be impossible. We have
seen that terminology occupies a database in the CAT tool in which term pairs must
be added manually, unlike the TM database which accepts all confirmed TUs auto-
matically. One CAT tool comes with an external Tmdb which gives the impression
that terminology is a separate entity outside the CAT tool. The objective of this
chapter is to demonstrate that the two databases are inseparable. The Tmdb is a
benign, clever, supportive, and indispensable friend of the TM. Where the TM fails
to present a match, despite subsegmentation, the Tmdb can do so, with data support
from many other TEnTs, either in the cloud or through digital terminological
resources. The volume of multilingual corpora we can draw from is infinite, ranging
from digital dictionaries and WWW term searches to databanks. Our TEnT
findings can be added to the CAT Tmdb. When they become confirmed TUs in
the TM, they can be checked for consistency and usage in a CAT concordance
or external concordance. Various CAT programs accept and integrate termino-
logical reference materials to varying degrees, but one aspect stands out: a high-end
terminological database greatly assists the operation of the TM. The translator is
accountable for the quality of the CAT Tmdb, which is determined by our working
practice and standards.
Further reading
Müller-Spitzer, Carolin and Alexander Koplenig (2014). ‘Online dictionaries: expectations
and demands’. In: Ebook Package Linguistics, pp. 144–88. Berlin: De Gruyter,
ZDB-23-DSP.
Key concepts
• Translation technology tools offer quality assurance; translators offer revision,
evaluation, or assessment of translations
• Translation quality assurance in the CAT tool and revision are
complementary
• The final quality check is performed by the user of the CAT tool
• ISO standards apply to the translation process rather than the translation
• A post-editor’s task is complex and deserves more recognition
Introduction
In the previous chapters we presented translation environment tools (TEnT) as
invaluable to the translator for ‘better quality, more efficiency and productivity, and
higher profit’. Ultimately, improved quality, productivity, and profit are determined
by our efficient management of the tools. In this chapter we will try to discern a
balance between human assessment and its technological counterpart, which is
referred to as translation quality assurance (TQA), quality assurance (QA), or
language quality assurance (LQA), depending on the CAT tool program.
QA does not only mean quality assurance but can also mean quality assessment.
There is a difference:
• Quality assurance is the function we use when we enable the CAT tool to
resolve errors or flag up warnings to prevent errors. QA in the CAT refers to
the checking of terms, spelling, grammar, non-translatables, omissions, tags.
The process, also called ‘verification’, uses default specifications, such as the
Ideally, we need a basic revision model like translation job model contracts,
which are made available to translators by professional organisations (7.3.1), with
reference to technological tools and methods that can assist good translation quality.
The revision model should include guidelines for self and third-party revision, but
also for full use of the QA function available in CAT tools. Quality assurance and
quality assessment are essential parts of the revision process. Revision is a critical
stage in the translation project.
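The kind of automatic verification a CAT tool's QA function performs can be illustrated with a simplified sketch that checks whether numbers and inline tags in a source segment survive into the target segment. This is a stand-in for real verification routines, not any particular tool's implementation; the segments are invented.

```python
import re

# Simplified QA check: numbers and inline tags found in the source
# segment should reappear in the target segment.
def qa_warnings(source, target):
    warnings = []
    for pattern, label in ((r"\d+", "number"), (r"<[^>]+>", "tag")):
        missing = [m for m in re.findall(pattern, source)
                   if m not in re.findall(pattern, target)]
        for m in missing:
            warnings.append(f"{label} '{m}' missing from target")
    return warnings

src = "Press <b>Start</b> within 30 seconds."
tgt = "Appuyez sur Démarrer dans les 30 secondes."
print(qa_warnings(src, tgt))  # flags the two missing tags; the number passes
```

A real QA function runs dozens of such checks (terminology, spelling, non-translatables, omissions) and raises the results as warnings for the translator or reviser to resolve.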
Web localisation (often written as l10n: ‘l’, then ten letters, then ‘n’) requires
a different take on translation quality. Hypertexts can be presented in the tool
in a nonlinear fashion, which means that what the translator sees is quite
different to how it will be read by users. The content is often split and shared among
translators, and it is difficult to adopt a holistic approach to the text (Jiménez-
Crespo 2013: 30). It is all the more important that the localising translator has a
good awareness of digital genres, so that they know what kind of localisation is
required, particularly if the topic is not immediately clear from the segmented text.
Compliance with local norms and conventions will make or break the quality of
the translation in the eyes of the reader.
We use special localisation tools to localise websites**(go to www.routledge
translationstudiesportal.com/ – A Project-Based Approach to Translation Technology
– link: localisation tools). They are like CAT tools and have a TM. Their filters are
prepared for the most common types of files found in software and websites. They
separate text segments (called ‘strings’) from the source code so that the coding
is not visible to the translator. This makes translation more straightforward, and the
codes are not exposed to unintended changes that could damage the product.
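How such a tool shields code from the translator can be sketched by replacing inline markup with numbered placeholders before translation and restoring it afterwards. The placeholder format here is invented for illustration; real tools each have their own conventions.

```python
import re

# Replace inline tags with numbered placeholders so the translator sees
# only the translatable string, then restore the tags after translation.
def protect(segment):
    tags = re.findall(r"<[^>]+>", segment)
    protected = segment
    for i, tag in enumerate(tags, 1):
        protected = protected.replace(tag, f"{{{i}}}", 1)
    return protected, tags

def restore(translated, tags):
    for i, tag in enumerate(tags, 1):
        translated = translated.replace(f"{{{i}}}", tag, 1)
    return translated

protected, tags = protect('<a href="/help">Click here</a>')
print(protected)  # {1}Click here{2}
print(restore("{1}Cliquez ici{2}", tags))
```

Because the translator only moves or keeps the placeholders, the underlying HTML cannot be damaged by an accidental edit.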
Another important feature of localisation is that files are ‘internationalised’
(internationalisation – i18n), an intermediary stage in which the source file has
been delocalised. The ST has had linguistic and culturally specific features removed
so that translators can start their localisation from a neutral file (Jiménez-Crespo
2013). Figure 5.1 illustrates how a UK clock, which shows 10 o’clock at night,
is internationalised and localised to a 24-hour clock, used in many other countries.
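The clock example can be mimicked with standard date formatting: the same neutral, internationalised time value is rendered with two different locale conventions. The format patterns are illustrative.

```python
from datetime import datetime

# One internationalised (neutral) time value, localised to two conventions.
t = datetime(2020, 1, 1, 22, 0)  # 10 o'clock at night

uk_12_hour = t.strftime("%I %p")           # '10 PM' in an English locale
continental_24_hour = t.strftime("%H:%M")  # 24-hour clock: '22:00'

print(uk_12_hour)
print(continental_24_hour)
```

The point of i18n is that the stored value never changes; only the rendering pattern is swapped per target locale.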
Localisation is a growing industry which should not be ignored by any translator.
Dedicated localisation tools, some of which can be integrated in the CAT tool,
may be better suited to digital material. Web content is becoming so heavily coded
that even CAT tools struggle. Localisation requires the translator to have a good
understanding of metalanguage and adaptation. The client may internationalise web
texts before they are sent out for translation, simply to standardise the process and
standardise strings in future updates.
Quality management in localisation is comprehensive. The industry’s approach is
to use QA procedures to guarantee that quality requirements are met. It uses quality
control (QC) procedures from ST to TT delivery to check the quality of product
or service (Jiménez-Crespo 2013: 104). QA procedures include the localisation of
links and images hidden in the website structure.
Quality in web localisation is of interest because of its fluid nature and diversity.
One-size quality cannot fit all localised texts. In this respect the ‘corpus-assisted
approach to quality evaluation’ (Bowker 2001) deserves a mention. We have seen
how many methods, despite their metrics, are subjective, such as the human reference
What the QA does not check is smoothness or cohesion, which relate to style and a
smooth flow of words. We already explained (4.2) how segmentation can interrupt
the translation process if the translator does not take a linear approach to the whole
text. If segmentation affects the TT in spite of precautions, the final monolingual
revision of the TT must address style parameters.
TABLE 5.1 Positive and negative sides to revision in XLIFF or bilingual files
A final comment about monolingual revisions: after hours of revising your own
work (it is estimated we can revise approximately 1000 words per hour), we become
tired. We read but can no longer take in what we read. Other than leaving the desktop
and taking the dog for a walk or making a non-essential trip to the supermarket, there
are a couple of tricks that can wake us up and renew our critical faculties, for example,
changing the background colour of your file, or reading white letters on a dark grey
background. The other trick is to make yourself alert by moving to a different place,
such as taking the laptop to another room or even sitting on the stairs. The latter may
well be uncomfortable and will make you want to get the job done!
ISO 17100:2015 provides requirements for the core processes, resources, and
other aspects necessary for the delivery of a quality translation service that
In other words, the ISO 17100:2015 standard specifies translation steps such as:
The following definitions, taken from the Oxford English Dictionary (2019) and
Merriam-Webster (2019), accurately define the terms used in the revision
process:
ISO 17100 Translation Services Management Standard has superseded the old
European quality standard BS EN 15038 for language service providers that was set up
by the European Committee for Standardisation in 2006. The former BS EN standard
continues to be embraced by many language service providers as a stamp of approval,
which indicates that their products are delivered with an acceptable quality. The
translation industry’s initial enthusiasm for the BS EN ‘fit for purpose’ standard may be
due to a lack of consensus within the industry (Gouadec 2007; Pym 2010a) as to what
a standard should be. One of the differences between ISO 17100:2015 and the earlier
BS EN 15038:2006 ‘fit for purpose’ standard is that the ISO now has an additional
competence requirement which means that the translator must be able to translate into
the target language using appropriate style and terminology (Mitchell-Schuitevoerder
2015). Another key difference is that there is a greater focus on the customer:
Create three sets of revision standards: for human translation, for CAT translation,
and machine translation. Are there any major differences?
TABLE 5.2 Random selection of criteria found in LSP evaluation statements (2019)
Agreed criteria for revision are crucial. Third-party revision and self-revision
require different mindsets: third-party revision is best performed with circumspec-
tion and respect for the translator who has undoubtedly translated to the best of
their ability. We do not know the circumstances under which the translation was
carried out, for example, was there time pressure? What was the quality of the
original ST? Were reference files sent to the translator? Was it a shared translation?
A reviser should check a translation without comparing or modifying according
to personal preferences. It is better to query when there is doubt and leave it
to the translator to review their own choices and decisions, especially if they are
not evident errors (Mossop 2019). He suggests revisers should ask themselves the
following questions after completing their revisions:
• Was each change I made needed? If so, did it adequately correct the
problem?
• If the change was adequate, how would I justify it (does it fit the checklist
supplied by the LSP)?
• Have I missed any errors?
• Have I introduced new errors?
Mossop 2019: 205
In heavily revised translations it is not unusual to find that the reviser has
unintentionally introduced new errors. If Track Changes is used, it is best to hide
the red lines during the revision process to keep the text readable.
Revisions and evaluations require skill, competence and experience. They are
potentially subjective: they should be carried out according to appropriate criteria
and returned to the translator for acceptance or rejection, ideally in the CAT tool
to update the TM (and Tmdb).
TABLE 5.3 Error typology grid applied to a CAT tool with integrated MT. Error
categories: Mistranslation, Accuracy, Terminology, Language, Consistency, Country,
Format, Style, Over-correction, Totals
Do you read the whole ST before you start and mark your problem areas, or do
you begin without reading the ST and stop and start each time you come across an
unfamiliar term? Do you confirm each segment, or do you send segments to the
TM in draft mode? What is your best method and what is your quickest way to
revise? How is your CAT tool set when you re-import revised bilingual documents?
Does it import the file with Track Changes enabled, which means the translator
must review each modification before confirming? If time is short, is it acceptable
to skip some steps by, for example, not checking the revisions in the editor but
simply clicking on accept and confirm all? After all, the CAT file history function
records all your changes: you can compare them after you have confirmed them, if
there is any doubt. Your work method has a significant impact on your time and
productivity, as well as on quality.
Can you agree on the editorial corrections needed in a machine translated para-
graph and a suitable financial reward on an hourly or a word basis? Discuss and
offer suggestions.
bitten > bittin (1 change); bittin > bitin (2nd change); bitin > biting
(3rd change)
Three single-character changes have been made. In PEMT, the edit distance can be used to
measure the post-editing effort: fewer edits indicate less effort on the post-editor’s
part – and payment will be less as the post-editor is only paid for changes made. If
this approach is taken, many aspects of PEMT that are in fact similar to translating
from scratch in a CAT tool, such as job preparation, reading and comparing of
source and target segments, topic research and checking supplied glossaries or style
guides, final QA and delivery and administration, are overlooked.
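The edit distance used in this pricing model is typically the Levenshtein distance, which can be computed with a compact dynamic-programming routine. This is a generic sketch of the algorithm, not any tool's implementation.

```python
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions turning string a into string b.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(edit_distance("bitten", "biting"))  # 3, matching the example above
```

A post-editor paid purely on this measure would receive the same fee whether the three edits took seconds or required research, which is exactly the objection raised above.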
The differences between MT post-editing and HT revision are largely
determined by the purpose of the translations. For example, if an MT text is not
for publication and only for gist, there is less emphasis on high quality and time
and cognitive effort will decrease. Another difference relates to the predictability
of errors in MT, such as terminology inconsistency, lack of cohesion in
grammatical number, gender, or punctuation. These typical errors make post-edits
easier and quicker, whereas errors in HT are less predictable: they vary from
translator to translator. It is difficult to predict how remuneration and fees will
continue to be calculated. It is to be expected that time and effort in PEMT are
put under pressure depending on the purpose of the translation and that quality
may not be the main priority. On the positive side, adaptive MT in the CAT tool
allows us to make corrections to proposed matches before they are confirmed
in the TM. This should automatically reduce the need for PEMT and improve
quality levels.
Project-based assignment
Objective:
To measure efficiency (post-edit time) and quality during the revision process
and after revision or post-edit
Method:
The assignment is designed for a project management team but collaboration
within a project team or between individuals is possible
Tools:
CAT tool (a server version is not necessary but would facilitate collaboration
between contracted linguists and team members)
Suggested resources:
One digital source text of approximately 1000 words
Assignment brief
The assignment incorporates steps discussed in previous chapters and requires
you to manage them through collaboration: ranging from the recruitment of
required linguists to the sending out of Purchase Orders (PO) (1.7.2) and
translation briefs (3.7.1), the distribution of files for translation and revision, and
the receipt of invoices from contracted linguists. The new component is human
revision combined with QA features in the CAT tool. In your team you must agree
on revision and assessment criteria. You arrange the translation of your source
document and third-party revision.
The project-based assignment provides the framework to manage and assess
translation quality and must be shaped by your team with due consideration of the
following:
• Manipulate your source text and send out different versions to translators or
revisers, as appropriate, for example:
• prepare and send a clean ST to translators
• prepare and send a clean ST with a TBX or glossary to translators
• pre-translate the ST with MT in the CAT tool and send to revisers
• pre-translate the ST with MT but clear 50% of the target segments and
send to translators
• Prepare files for revisers in the following formats: XLIFF, bilingual doc or clean
target files
• Return revised files to the translators for rejection/acceptance
• An accounts PM manages incoming invoices and creates a spreadsheet
• Compare and discuss revisions and evaluation statements; compare the quality
of target files translated and revised through different methods; discuss and
compare the target quality of manipulated source files after revision. Show
your findings in tables or diagrams
Possible pitfalls
• Quality – too many criteria will make it difficult to measure your objectives
• Time – the time factor plays a significant role in revision and editing. If the
revision of a 1000-word file exceeds one hour, the reviser should be given the
option to return the file to the PM, who will then contact the translator and
request self-revision. In PEMT files with reduced fees (lower than translation),
post-editors should be advised about the time allowed and the quality required
• Profit and fees – in this assignment fees are important and must be negotiated
with the different parties including the client. Fees can be variable according to
the type of service required but the PM team must stay within a budget when
contracting their linguists. The budget is decided after the price of the translation
service charged to the client is set.
Concluding remarks
This chapter discusses translation standards and quality, evaluation or assessment
and review, revision and edits without the requirement for a benchmark translation.
Standards are necessary, but they cannot be standardised, nor should we say that
we are fully in control of translation quality. The arrival of the internet and TEnTs
has accelerated the completion and delivery of translations. Reduced time will
not necessarily improve translation quality, but if the end user is content, maybe we
should accept that the quality must suit the purpose. It is time for us to reconsider
our ideas of standards and quality benchmark translations. ISO 17100:2015 fails to
set standards for the translation product but sets them for the translation process.
We must accept that boundaries are shifting (Jiménez-Crespo 2013:112–13), for
example, between professional and non-professional translations (in crowdsourcing
Further reading
Cronin, Michael (2003). Translation and Globalization. London and New York: Routledge.
European Commission. Directorate-General for Translation (2015). DGT Translation
Quality Guidelines. Brussels/Luxembourg. [Online] https://ec.europa.eu/translation/
maltese/guidelines/documents/dgt_translation_quality_guidelines_en.pdf [accessed
October 2019].
Jiménez-Crespo, Miguel A. (2013). Translation and Web Localization. London and New York:
Routledge.
Mossop, Brian (2014; 2019). Revising and Editing for Translators. Manchester: St Jerome
Publishing.
Nord, Christiane (1991). Text Analysis in Translation. Amsterdam-Atlanta: Rodopi.
Nunes Vieira, Lucas (2017). ‘From process to product: links between post-editing effort and
post-edited quality’. In: Arnt Lykke Jakobsen and Bartolomé Mesa-Lao (eds), Translation
in Transition. Between Cognition, Computing and Technology, pp. 161–86. Amsterdam/
Philadelphia: John Benjamins Publishing Company.
O’Brien, Sharon (2012). ‘Towards a dynamic quality evaluation model for translation’, The
Journal of Specialised Translation, 17. [Online] www.jostrans.org/issue17/art_obrien.pdf
[accessed May 2014].
6
DIGITAL ETHICS AND RISK MANAGEMENT
Key concepts
• Intellectual property rights of translations need a clear definition about
ownership
• Confidentiality is at risk when web-based TMs and terminology databases
are shared
• Collaborative translation environments challenge digital ethics
• Non-disclosure agreements could potentially put the translator at risk
• Security and risk in the translation project are the responsibility of all parties
Introduction
This chapter is about our consideration of ethics in the digital translation industry,
where terms of business should govern the ways in which translators, LSPs, and
the client collaborate. The contractor trusts that good work will be delivered and
the contractee expects to be remunerated fully and on time. Purchase
orders (POs) can give both parties some legal standing. POs carry agreements
between LSP and contracted linguists. Invoices from linguists will include their
terms and conditions about payments.
Digital ethics relate to digital materials: how we store them, protect them, what
we do with them and what happens to them once they are in the cloud. Published
and printed materials are copyrighted. They are protected and cannot be sold for
profit, wholly or partly, without permission from the owner, the author, and
publisher. We must now ask ourselves how we can protect our translations in digital
format against modification and other forms of violation, which could impact the
quality and consequently our reputation. And the second question is: how can we
1. Audio-visual material for translation may not be shown or used for any other
purpose than translation
2. The linguist promises to take measures to protect confidentiality by preventing
digital copies from being passed on
3. The linguist promises not to discuss the client or the product or mention that
they work for the client
4. The linguist must sign that all files and materials will be deleted, removed, or
destroyed after completion and delivery of the translation
5. Logos, ideas, designs, notes, etc. shall not be used by the linguist
The second condition is intended for the linguist, but is it also signed by the LSP
and any project manager dealing with the digital files? And how does this condition
• The translation remains the property of the translator unless agreed otherwise
in writing.
• The translator guarantees to give the contracting LSP all intellectual property
rights associated with their services.
• The translator shall do their best not to use or store confidential information
in an externally accessible computer or electronic information retrieval system
without security software and shall prevent unlawful access by third parties.
• Copyright must be ensured by the client before the source …
• The translator gives the LSP all copyrights and other intellectual …
• Confidential information requires the use of software to protect …
There is an urgent need for guidelines as to who owns the TM before, during and
after the job is completed: the client or the translator (Berthaud 2019). In our digital
work environment, confidentiality is challenged in source texts, target texts, file
transfers, interchangeable file formats, CAT tools with or without MT, and many
other cloud-based tools. T&Cs are meant to protect the client or the translator,
but there needs to be more specific information as to who and what are being
protected and where. Confidentiality is a broad concept and a statement like ‘the
client’s documents are only considered confidential if this was stated by the client’
(Table 6.1) is an obvious starting point. The same applies to copyright and
ownership, which can and should be defined in an initial agreement. A good point of
departure would be for the translator to inform the client which digital tool is used
and the level of security practised.
Discuss and create a workable set of terms of business for translators in relation to
their translation memory databases.
The General Data Protection Regulation (GDPR) addresses the transfer of personal
data outside the EU and EEA areas. It not only
affects the way in which we communicate digitally, it regulates how businesses should
handle and process data. LSPs have become keen to show that they are GDPR
compliant and ask their contractees to sign that they agree. GDPR has given a significant
boost to uniformity in work practice within the EU/EEA area and beyond.
Large global organisations outside the EU/EEA area need to be GDPR compliant
because the legislation affects their business to and from the EU/EEA area. TAUS, the
independent data collection and sharing network, collects language data, metadata, and
personal data. Language data include source and target texts, metadata are associated
data, including language pairs, data about industry sector and content type; personal data
include information that will identify the person (name, email address, IP address).
Since GDPR, TAUS has introduced a different method for selecting data: it removes
all metadata, including, for example, company names. It is important for companies to know
that their company name is no longer attached to any of their language data uploads.
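The removal of identifying metadata from a data upload can be sketched as a simple field filter. The field names below are invented for illustration; they are not TAUS's actual schema.

```python
# Strip identifying metadata from a (fictional) language-data record,
# keeping only the language data and the language pair.
RETAINED_FIELDS = {"source_text", "target_text", "language_pair"}

def anonymise(record):
    return {k: v for k, v in record.items() if k in RETAINED_FIELDS}

upload = {
    "source_text": "hard disk",
    "target_text": "disque dur",
    "language_pair": "en-fr",
    "company_name": "Example Translations Ltd",  # identifying metadata
    "uploader_email": "pm@example.com",          # personal data
}
print(anonymise(upload))
# {'source_text': 'hard disk', 'target_text': 'disque dur', 'language_pair': 'en-fr'}
```

After this step, nothing in the retained record ties the language data back to the uploading company or person.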
GDPR has affected LSP work practice in several ways. LSPs store a large amount
of personal data, such as bank details of their contractees. They must ensure that
such details are stored safely and that they have permission to store them. Group
emails must not show names other than that of the addressee. LSPs now ask their
contractees to take precautions regarding the materials they store on their devices,
and may ask for encryption of files and emails, or password protection. There is
a growing awareness on all sides that it is essential to treat all transferable data
associated with translation projects with great care.
Google has been certified compliant with ISO 27018 for G Suite and Google
Cloud Platform (cloud privacy). ISO 27018 is an international standard of
practice for protection of personally identifiable information (PII) in public
cloud services.
Google does not claim any ownership in any of the content (including text
and labels) that you transmit to the Cloud Translation API.
Google 2019
Unlike Google Translate, which stores data for training, Google Cloud Translation
claims its API (7.1) is secure and that data uploaded to the MT engine will not be
stored and reused:
When you send text to Cloud Translation API, we must store that text for
a short period of time in order to perform the translation and return the
results to you. The stored text is typically deleted after a few days, although
occasionally we will retain it for longer while we perform debugging and
other testing. Google also temporarily logs some metadata about your Cloud
Translation API requests (such as the time the request was received and the
size of the request) to improve our service and combat abuse.
Google 2019
Clients and LSPs are concerned about the clause ‘we must store that text for
a short period of time in order to perform the translation and return the results to
you’, and in their terms and conditions LSPs may prohibit use of the MT engine.
The difference in GDPR compliance between open-source Google Translate and
API-controlled Google Cloud Translation is not always recognised by all parties in
the translation industry. In LSP terms of business (ToB) MT engines may be listed as
‘third parties’ that have unauthorised access to content. Furthermore, the difference
between CAT tools with secure integrated closed-circuit MT engines, CAT tools
with secure APIs, and APIs to open-source MT engines deserves clearer delineation
in job descriptions. In her study, Berthaud (2019) discovered an unwillingness
to engage with digital ethics relating to MT. Some of her responding LSPs did not
know or want to know whether their translators use MT. This kind of ignorance
creates ethical issues. LSPs must know which tools their contractees use and know
the level of security and GDPR compliance necessary to deliver subcontracted
translations to clients with confidence. The impact of MT on translation quality is
recognised, and a more graduated view of the quality of machine-translated text is
emerging. The digital ethics of Google Cloud Translation deserve closer examination,
rather than an unqualified decree by LSPs that MT must not be used by their
contracted translators.
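For readers who want to see what the API route actually involves, the shape of a Cloud Translation v2 REST call can be sketched as below. The endpoint and field names follow Google's publicly documented v2 API; the API key is a placeholder, and the function only assembles the request without sending anything.

```python
# Sketch of a Cloud Translation v2 REST request. Endpoint and field names
# follow Google's documented v2 API; the API key is a placeholder and
# nothing is sent over the network here.

def build_translate_request(text, target_lang, api_key):
    """Assemble the URL, query parameters, and JSON body for a v2 call."""
    url = "https://translation.googleapis.com/language/translate/v2"
    params = {"key": api_key}  # authenticates the caller
    body = {"q": text, "target": target_lang, "format": "text"}
    return url, params, body

url, params, body = build_translate_request("Dear customer", "nl", "PLACEHOLDER_KEY")
# Only the text and language pair travel to the engine; Google's terms
# (quoted above) govern how long that text is retained server-side.
```

Sending the assembled request (for example with an HTTP library) returns the translation as JSON; the point here is simply that the API route is an authenticated, contract-governed channel, unlike pasting text into the public web interface.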
TMs and MT are merging in CAT tools and may soon be inseparable. Adaptive
machine translation (3.2) learns from our entries in the CAT tool. Programmers
and manufacturers insist that all data belong to us and are kept safe, in other words
not shared with other users. Neural Machine Translation uses encryption. Data
centres and all operations are ISO 27001:2013 certified, which means that they will
not use our translation for training purposes**(link: adaptive MT confidentiality).
Any content uploaded by users to the NMT engine, new or edited, is not shared.
Once or twice a year the NMT engine will, however, use our data together with
multi-billion-word corpora to improve the performance of the engine. Apparently,
this does not impact confidentiality or security, because it is not possible to
reconstruct authentic sentences from NMT training data. Zetzsche (2019 (297th Journal))
queries whether this process raises a digital ethics issue, given that other major
MT systems decided, after the introduction of GDPR legislation, not to use data for
training purposes when APIs**(link: adaptive MT confidentiality) are used. If there
is concern about confidentiality, an organisation can run a model on its own server
for the utmost privacy. Translators do not have this option and
must trust that their entries in the MT corpus cannot be used by third parties.
Whatever our feelings about the digital ethics of integrated and adaptive MT
support in CAT tools, the NMT engine needs our TM entries before it can generate
matches. If there is a confidentiality requirement in the ST, we can disable the MT
function in the CAT tool. The programmers and manufacturers claim that MT has
been trained on legally acquired data and that it is therefore safe to use without
breaches of confidence. If we accept commissions with an awareness of ethical
risks in shared digital sources and resources, and make our commissioning
LSPs and clients aware of our and their authoring rights, combined efforts will be
a step in the right direction. Confidentiality in digital materials is just as important
as our intellectual property rights.
Pym questions Gouadec’s opinion that job descriptions may lead to risk reduction,
and suggests that high-risk jobs may benefit, but low-risk jobs could be left to
the discretion of the translator. Undoubtedly, translation projects requiring NDAs
need detailed job descriptions, but in practice the focus tends to be on quick
recruitment, prompt agreement between LSP and contractees, and on a smooth
transfer of digital files between users. Important details about readership and target
audience of the translation job in hand are often only supplied on request. The LSP
is not always informed by the client.
There seems to be an unspoken gentleman’s agreement that a good collaboration
between LSPs and linguists depends on trust, communication, and understanding.
Clients cannot be part of this relationship if they do not know enough about
translation to draft a translation job description. Our conclusion must be that in the
digital era a gentleman’s agreement can be a risky agreement. All parties need to
increase their awareness of security risks and breaches that could adversely affect
translation work, file storage and transfers.
We take risks even if we also see a positive outcome. What kind of digital risks
would an LSP be prepared to take which might affect a translation positively or
negatively?
[Figure: risk factors in a translation project – project management (requirements,
complexity, technology, resources, quality) and technical (planning, priorities,
estimates, communication, control)]
FIGURE 6.2 PDF of three sides of a carton with text sideways and upside down
Such a file is a graphics file that cannot be processed in a CAT tool and requires
specialist tools for conversion:
Dear translator,
Would you be available to translate a brochure? The source texts are EPS files and we
would require the translations to be supplied as EPS files so that we can place them
into the artwork for printing.
Email from language service provider, 2018
Should the job offer have been sent to the translator? The translator may take it
on without realising that the files are incompatible with their CAT tool…. It is
interesting to read threads on translator mailing lists and forums. Quite possibly the
technical Q&A threads outnumber terminology queries.
• Resources
Good resources include reference files, glossaries, and more, but also appro-
priate human resources (specialised translators) with appropriate tools. How
do LSPs prepare translators for collaboration in a shared translation project?
• Quality
The translation brief must state the purpose of machine and human translated texts,
the target audience, the locale (for example, GB or US English), etc. What does the
translation brief state about quality? Is the application of QA in the CAT tool adequate?
Once risk and identified risk factors are understood, the course of action can be as
follows (Figure 6.3).

FIGURE 6.3 Risk responses: avoidance, mitigation, transfer, acceptance, contingency

It can be decided to avoid certain situations at all costs and take preventative
measures, or it is decided that risks are acceptable, either because their impact is low
and not worth the investment of time and effort, or because the risk is worth taking.
Alternatively, potentially serious risks can be mitigated by introducing measures that
are not too demanding on time and effort, or risks can be transferred by using, for
example, a different translator. If risks are taken seriously, it is wise to design contingency
plans so that if plan A does not work, a quick transfer to plan B can be made.
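These responses can even be sketched programmatically. In the toy function below, a risk's probability and impact are rated 1 to 3; the thresholds are invented for illustration and are not an industry standard:

```python
# Illustrative sketch of the risk responses described above: map a risk's
# probability and impact (each rated 1-3) to a response. The score
# thresholds are invented for illustration, not an industry standard.

def risk_response(probability, impact):
    """Choose a risk response from a simple probability x impact score."""
    score = probability * impact
    if score >= 7:
        return "avoid"      # too dangerous: decline or redesign the job
    if score >= 5:
        return "mitigate"   # worth reducing with low-effort measures
    if score >= 3:
        return "transfer"   # e.g. pass the job to a different translator
    return "accept"         # impact too low to justify the effort

# A contingency plan (plan B) sits alongside any accepted or mitigated risk.
print(risk_response(3, 3))  # -> avoid
print(risk_response(1, 2))  # -> accept
```

The design choice mirrors the text: acceptance is the default for low scores, and contingency is not a scored outcome but a safety net attached to whichever response is chosen.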
Risk management is indispensable and should be practised by anyone involved
in the translation business. Understanding risks is the first step, but the implemen-
tation of appropriate measures can also be a risk. Very often a balance is needed
between the investment of time and resources to mitigate or avoid risks and the
potential gain if those investments are made (Lammers in Dunne 2011: 211–32).
In other words, the translation provider, LSP, or translator, must prioritise risks
and make choices. Risks are not always negative; they can have a positive out-
come and be worth taking. The other much-needed balance is between the client’s
expectations and the actual role of the translator. Technology has made translators
and other linguists easily accessible, which is good news. The downside is that
working hours are not and cannot be respected if collaboration takes place across
time zones. The accessibility of MT engines and translators on the web carries the
risk that translators and MT are juxtaposed and perceived as equal but different
modes. Furthermore, translation technology is considered to lighten the translator’s
load, without any realisation that the opposite can be true: time pressure due to
increased volume, shared projects with fluctuations in consistency and quality, and
breaches in confidentiality arising from poorly regulated electronic transfer, storage
and sharing of databases. These are problems that need to be dealt with by freelance
linguists and LSP teams. Raised awareness and transparency as to what constitutes
breaches or risks would be a step in the right direction.
Project-based assignment
Objective:
Identifying digital risk and taking appropriate measures to prevent breach of
confidentiality in project-based digital translation
Method:
The assignment is designed for a project management team but collaboration
within a project team or between individuals is possible
Tools:
CAT tool with an MT engine (a server version would facilitate an examination
of security breaches and/or risks in web-based shared databases and files)
Suggested resources:
One manipulated digital source text (you may have to create confidentiality
issues artificially) of approximately 1000 words; a (manipulated) TMX and
TBX file. The ST must contain confidentiality issues so that you can check the
suitability of your safeguarding measures
Assignment brief
Call your project managers and other team members together to discuss a previous
translation project: you have received a serious complaint from the client, who has
discerned a breach of confidentiality. Alternatively, the press release you were asked to
translate has been leaked and consequently a competitor has announced the imminent
launch of a similar product. Some of the terminology that was provided to you
by the client was leaked – intentionally or unintentionally. The client has threatened
legal action.
• You must now identify the risk factors that may have caused breach of
confidence, e.g.:
• Inadequate Terms of Business on all sides
• Your human resources database does not tell you which TEnTs your freelance
translators use, or whether they use MT in their CAT tools
• You did not ask your contractees to return XLIFF files, which contain
a great deal of metadata (e.g. about possible use of MT)
• Other risk factors
• Identify and discuss your risk responses to prevent recurrence
• Set up a new translation project, apply your preventative measures and check
to what level you have managed to avoid potential breaches or risks that were
experienced in the previous project
• Conclude with a debrief for your team and discuss how well your contractees
responded to your new set of requirements and how well you managed risks
Possible pitfalls
• Too many potential risks – it is not possible to prevent and avoid risk entirely.
The identification of too many risks may cause you to lose focus, sap your
motivation to keep agreements, and cost too much time and effort
• Being too risk averse or too risk tolerant. Try to find a workable balance
• GDPR compliance begins with the way you communicate with your contracted
linguists, and adherence must be stressed in the job description sent to your contractees
Concluding remarks
In this chapter we have discussed the digital ethics of intellectual property, copy-
right in digital materials, in TM databases and in translated digital files. We know
the risks surrounding digital ethics, we can identify the problems, but the solutions
are not always apparent. We see conflicting Terms of Business sent by LSPs and
translators. If there is an issue, the parties may prefer not to look for answers
because it requires too much effort, and blame goes to the party that signed. It is
not easy to define confidentiality, but we could accept that material is confidential
only if the client or LSP states its confidentiality. We could also check the security
of the digital sources and resources we have received. If we are likely to receive
confidential material, our TEnTs require security software and strong passwords to
prevent unauthorised access.
Our reputation could be damaged if the target text or TM is modified by a third
party and no longer matches the authentic source text which is under the client’s
copyright. Copyright also applies to attachments in emails. Copyright is difficult
to define in shared tools: the materials can be uploaded or downloaded innumerable
times, and modified, edited, and revised. We must be aware of potential risks when
sharing files.
Digital ethics affect the working practice of all parties involved in a transla-
tion workflow: client, LSP, and contracted linguists. It may be appropriate to sign
a non-disclosure agreement. However, conditions that remove the rights of the
translator to their translations should be questioned. All parties must know which
TEnTs have been used and there needs to be multilateral agreement on TEnT
usage. A global recognition of GDPR would be a solid step forward towards a
working code that is understood by all.
Further reading
Chesterman, Andrew (2018). ‘Translation ethics’. In: Lieven d’Hulst and Yves Gambier
(eds), A History of Modern Translation Knowledge. Sources, Concepts, Effects, pp. 443–8.
Amsterdam: Benjamins.
Drugan, Jo and Bogdan Babych (2010). ‘Shared resources, shared values? Ethical implications
of sharing translation resources.’ In: Ventsislav Zhechev (ed.), Proceedings of the Second
Joint EM+/CNGL Workshop. Bringing MT to the User: Research on Integrating MT in the
Translation Industry. Available from: https://pdfs.semanticscholar.org/4acd/2c229ef9dfa3fa903911ed7447e62f726edc.pdf.
Kenny, Dorothy (2010). ‘The ethics of machine translation’. Proceedings XI NZSTI National
Conference.
Lammers, Mark (2011). ‘Risk management in localization’. In: Keiran J. Dunne and Elena S.
Dunne (eds), Translation and Localization Project Management, pp. 211–32. Amsterdam and
Philadelphia: John Benjamins.
Moorkens, Joss (2019). ‘Uses and limits of machine translation’. In: Ethics and Machines in an
Era of New Technologies. ITI research e-book.
Project Management Institute (2004). A Guide to the Project Management Body of Knowledge.
(PMBOK® Guide). 3rd ed. Newtown Square, PA: PMI Press.
7
WEB-BASED TRANSLATION
ENVIRONMENT TOOLS
Key concepts
• Translation environment tools move into the cloud
• Collaborative translation projects can be managed centrally when using servers
and translation management systems
• Digital platforms offer opportunities for translators and LSPs to collaborate, to
recruit linguists or source jobs and exchange knowledge
• Translation must adapt to different reading techniques of web-based material
• LSPs and translators have different views on the impact of web-based transla-
tion technology on quality, efficiency and profit
Introduction
This chapter is about the interaction between the translator and web-based tools. It
describes new trends in translation technology and examines the technologies that
may be helpful in our work. The TEnT market evolves; new tools come and go.
Translation management systems, for example, incorporate CAT features and are
gaining popularity among LSPs, because the comprehensive system enables them to
manage the workflow in the translation project from beginning to end. LSPs expect
contractees to work online in their translation management system. The translator’s
challenge is to be up to date with any web-based TEnTs we may be required to use.
We will also visit several digital platforms to find out what is on offer there. Digital
platforms are virtual meeting places where vendors and buyers of translation can
discuss matters, ask for support, look for answers, translate, localise, offer or bid for
translation jobs. They are gaining popularity among freelance translators as a virtual
meeting place where jobs are advertised, or as a resource centre and knowledge base.
The ownership of translation tools is changing too. New trends show that it is
unnecessary to invest in an expensive CAT tool with its annual maintenance fees
paid to the manufacturer. There is the option to subscribe to web-based CAT tools
or machine-assisted translation systems**(go to www.routledgetranslationstudiesportal.com/
– A Project-Based Approach to Translation Technology – link:
machine-assisted translation systems) with integrated CAT functions. You are not
tied into a subscription but can opt out when required. If, however, you feel
comfortable with your CAT tool and gain much benefit from your TM and
terminology databases, it is worth considering APIs and plugins which give access to many
web-based tools. In the project-based assignment you will be given an opportunity
to test translation project workflow in different web-based tools. The objective is
that your experiences with a variety of TEnTs give you the skills and competence
to try new ones and to be ready for changing trends.
can be downloaded free of charge. The following list of plugins shows a selection
which are provided for and by a variety of CAT tools:
In sum, plugins are developed to enhance the CAT tool by integrating external
programs. APIs, sometimes called connectors, are a convenient way of integrating
tools into your CAT tool but require a fee to be paid to the provider. Plugins
appear to be under-used by translators (Nunes Vieira and Alonso 2018). Lack of
awareness may explain why translators tend to export their translations for a spell
and grammar check outside the CAT tool (most CAT QAs include spellchecks;
not all include grammar checks**(link: CAT grammar check)). The plugin was
not initially designed for the translation market but has found a new niche in the
translation industry and we will undoubtedly see the range increase. It is good to
‘shop around’ because quality varies greatly, and so does cost. Some plugins are free,
others require a licence or API. Several CAT tools now have the microphone icon
for speech recognition (SR). SR is not a new TEnT, but it is new to CAT tools. It may
require the use of a smartphone as a microphone**(link: CAT tool with ASR).
commands to make the STT do what you want it to do, such as ‘move down’ to
the next segment. A wireless microphone means that you do not have to sit down
and stare at your screen. Sight translation (also practised by interpreters) can be
done from your ST on a piece of paper while walking around, if you wish. TTS
allows you to listen to your translation read by an artificial voice. TTS presents an
alternative and very different way of revising your translation. It also invalidates
the criticism that segmentation in CAT tools reduces translation quality, because
TTS helps you ‘see and hear’ your translations in context. The TT is presented to
you as a coherent text. TTS helps you assess the level of fluency in your transla-
tion (Ciobanu 2019).
ASR, STT, and TTS are forms of weak AI: they do not operate like NMT; they
are not creative and cannot produce novel utterances, but rely on historical data,
i.e. the exact data previously entered by the speaker. ASR can, however, be trained
by you, corrected on the fly, and taught shortcut codes for functions which you
would normally type on your keyboard. Of course, ASR
becomes strong AI when it is combined with NMT: a speaker at a conference
who uses ASR will see machine translated speech-to-text appear on the audience’s
tablets without time delay. NMT uses the ASR generated data to generate the
target text.
A good-quality ASR program can achieve a very high level of accuracy and
can be used in the CAT tool. It needs to be told and trained how to move from
one segment to the next and how to open your frequently used functions such as
concordance, lookups, find, etc. If you spot an error while you are translating, you
can correct it in real time or leave it and do an ASR or manual keyboard revision
later. Some translators claim that they can increase their translation speed by 500%
(Ciobanu 2019). This increase can only be achieved if the translator does terminology
research in advance, if speech recognition errors are few, if the text has a
smooth narrative with full sentences to facilitate recognition, and if the translator
can ‘sight translate’. Pauses and incomplete sentences or phrases hinder the tool
in recognising speech matches in the corpus. A paid SR program can recognise
over 86 languages, including character languages. Free online SR programs recognise
around 30 languages.
In the CAT tool it is possible to combine STT and the keyboard without conflict.
If you want to increase your productivity, the ASR tool needs to be taught
commands. Your tone of voice is important for the ASR to recognise when an
utterance is a command and not a phrase for translation. There are a few CAT tools
with integrated STT**(link: CAT tools with ASR). They call the function ‘dictation’,
and it can be enabled with or without an app.
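The command-versus-dictation split can be illustrated with a toy router. Real ASR tools rely on acoustic cues such as tone and pauses; here a plain lookup table stands in for that, and the command phrases are invented for the example:

```python
# Hedged sketch of the command-versus-dictation split described above.
# Real ASR tools use acoustic cues (tone, pauses); a lookup table stands
# in for that here, and the command phrases are invented for illustration.

COMMANDS = {
    "move down": "go to next segment",
    "confirm segment": "commit translation unit to the TM",
    "open concordance": "search the TM for the selected term",
}

def route_utterance(utterance):
    """Return ('command', action) for known commands, else ('dictation', text)."""
    action = COMMANDS.get(utterance.strip().lower())
    if action is not None:
        return ("command", action)
    return ("dictation", utterance)

print(route_utterance("Move down"))      # routed as a command
print(route_utterance("Dear customer"))  # routed as translatable dictation
```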
There is still much scope for CAT tool developers to improve the audio feature,
but also for translators to understand the benefits of ASR. Adaptive technology
should help ASR work with other integrated tools. For example, MT and ASR do
not yet speak to each other, in other words give each other priority, which causes
conflicts when both are used simultaneously. Integrated audio data in our
tools would also offer opportunities for different kinds of immediate communication
The advantages of the server for the contracted translator are significant too.
The translation process on the server is shorter and quicker because the document
files are ready for translation without the need to save, store, and import into the
CAT tool. They do not need exporting either. If collaboration between multiple
translators in a translation project is well organised, the TM and Tmdb will update
in real time and support consistency. The translator can see what colleagues have
entered in the database. Well-edited and customised TMs and Tmdbs, plus reference
material, are great resources for the translator. A temporary LSP licence supplied
to the translator means that there is no direct need for the translator to purchase a
CAT tool. If the translator would prefer to pay monthly for a licence rather than
buy a CAT tool, it is possible to lease a web-based CAT tool**(link: web-based
CAT tool).
There are, of course, disadvantages to translating on a server or web-based CAT
tool for the translator, such as potential unfamiliarity with the CAT program, an
inability to link or integrate your usual TEnTs, such as dictionaries, your TM and
Tmdb databases, or MT API. If the web-based CAT tool has an offline editor, the
file can be downloaded as an XLIFF file from the cloud and translated in the
offline editor. This allows you to work beyond a wifi signal. It is possible to
download the file from the editor into your own CAT tool before returning it to the
web-based tool via the offline editor**(link: web-based CAT tool link to personal
CAT tool). This procedure does not allow you to benefit from matches in the
web-based TM: they are only accessible online. When translating in the offline editor,
matches can be checked once the translation is uploaded to the web-based CAT
tool. It is not a straightforward method for the translator, whereas the LSP enjoys
an automated workflow without having to manage attachments in emails. Working
on the server raises two questions: whose are the property rights associated with
the translation and TM entries, and how can the translator make good use of
their own resources?
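The XLIFF files exchanged in these workflows are plain XML, so the metadata they carry can be inspected directly. The sketch below assumes the XLIFF 1.2 namespace and uses that specification's `mt-suggestion` state-qualifier as an example of an MT flag; real CAT tools differ in exactly which attributes they write:

```python
# Sketch of inspecting XLIFF metadata. The namespace and the
# 'mt-suggestion' state-qualifier come from the XLIFF 1.2 specification;
# the sample document itself is invented, and real CAT tools vary in
# how they flag machine-translated segments.
import xml.etree.ElementTree as ET

XLIFF = """<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file source-language="en" target-language="nl"
        original="brochure.docx" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Dear customer</source>
        <target state-qualifier="mt-suggestion">Beste klant</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
root = ET.fromstring(XLIFF)
for unit in root.iterfind(".//x:trans-unit", NS):
    target = unit.find("x:target", NS)
    # The state-qualifier reveals whether the segment came from MT.
    print(unit.get("id"), target.get("state-qualifier"))
```

This is why asking contractees to return the XLIFF files, rather than only the final document, gives an LSP evidence of how a translation was produced.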
Web-based CAT tools are becoming more sophisticated: as translation management
systems and in portals or on platforms. At this level of sophistication, the
main market consists of companies, organisations, and LSPs. The translator risks
becoming a confused user with the reins taken from them. These developments are
important and in the following sections we will discuss various aspects of working
in the cloud.
purpose and user groups. We will concentrate on two types: the business platform
(and portal) for translation and crowdsourcing, where translations are produced,
and the platform where translators meet, which can be a mix of advertising,
marketplace and social media, or be knowledge based.
working on live content in the development cycle, can keep up with language
updates. The translator can either push content for translation, i.e. upload files to the
platform, where localisation will be performed, or use the pull approach (4.6) and
consult the platform database.
When localising website material, the choice is between a file-based solution
and a cloud-based solution. Figure 7.2 shows potential workflows in a simple
localisation activity: analogue twelve-hour clocks as used in the UK must be localised
to digital 24-hour clocks. The manufacturer may deliver the ST in an i18n version
and the translator can perform l10n in a CAT tool (file-based) by using an API.
Alternatively, the process can happen in the cloud (cloud-based) in a TMS (7.5).
A cloud-based platform requires an API to obtain access, and payment is based
on usage**(link: localisation platform). In this case, the localisation function is
automatically integrated in translation management systems (7.5).
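At its core, the clock localisation in Figure 7.2 is a format change that any date library can express. A minimal sketch in Python, assuming times written in the UK style ‘3.45 pm’:

```python
# Minimal sketch of the 12-hour to 24-hour localisation in Figure 7.2,
# assuming UK-style input such as "3.45 pm". A real l10n workflow would
# also handle locale-specific separators and invalid input.
from datetime import datetime

def to_24_hour(uk_time):
    """Convert a UK-style 12-hour clock string to a 24-hour string."""
    return datetime.strptime(uk_time, "%I.%M %p").strftime("%H:%M")

print(to_24_hour("3.45 pm"))  # -> 15:45
```

Whether this conversion runs in a CAT tool via an API or on a cloud platform, the underlying transformation is the same; what differs is where the content and the rules live.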
translator, for PEMT and/or translation of any segments beyond the scope of the
tool**(link: online open source adaptive NMT platforms).
Discuss the different digital platforms and their benefits from the translator’s
perspective.
• project planning
• workflow management
• integration of TEnTs for large-scale translation activities
• coordination of all contributors within, outside and across organisations
• automation of repetitive tasks
• reduction in file management and transfer
The TMS has the following translation features to suit the translator:
The main selling point of TMS is adaptive MT: whereas CAT tools offer MT
access as a plugin, the primary TMS focus is on its access to an MT engine. The TM
and Tmdb databases are integrated in the TMS, which needs the two databases
to make the MT adaptive. The MT engine learns from the TUs you enter in the
TM and adjusts as you translate. TMS developers believe that TMS tools will offer
such high-quality MT that the user will not even have to post-edit (Zetzsche 2017
(281st Journal)). The translator edits on the fly, confirms and adds TUs to the TM,
and the MT learns from the TM. The other major difference between CAT tools
and TMS is the latter’s inclusion of business management functions.
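The loop the paragraph describes can be sketched in a few lines: confirmed TUs go into the TM, and subsequent segments are matched fuzzily against it. The difflib scoring below merely stands in for a real engine's matching and learning:

```python
# Sketch of the TM-feeds-adaptive-MT loop described above. The difflib
# fuzzy scoring stands in for a real engine's matching; the segments and
# the 0.75 threshold are invented for illustration.
import difflib

translation_memory = {}  # source segment -> confirmed target segment

def confirm(source, target):
    """The translator confirms a TU; the TM (and adaptive MT) learn from it."""
    translation_memory[source] = target

def best_match(source, threshold=0.75):
    """Return the closest TM entry above the threshold, or None."""
    candidates = difflib.get_close_matches(source, translation_memory,
                                           n=1, cutoff=threshold)
    if candidates:
        return candidates[0], translation_memory[candidates[0]]
    return None

confirm("Press the green button.", "Druk op de groene knop.")
print(best_match("Press the red button."))  # fuzzy hit from the confirmed TU
```

The sketch also makes the ownership question concrete: whoever holds `translation_memory` holds the translator's accumulated work.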
• Google uses crawlers, programs that systematically search the World Wide Web
to collect data. Based on those data, Google creates ranking algorithms
• Algorithms determine the ranking order of a website: at the top of the
page, at the bottom, or on pages 2, 3, etc. In order to push a website up the
ranking order, visits can be influenced through suitable website content,
appropriate length, relevance and keywords or keyword phrases
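The keyword signal mentioned in the second bullet is easy to illustrate: count how often seed-list keywords occur in the page copy. Real ranking algorithms weigh many more signals; this sketch shows only the keyword part, with invented sample copy:

```python
# Sketch of the keyword signal described in the bullets above: count how
# often seed-list keywords occur in page copy. Real ranking algorithms
# weigh far more signals; this only illustrates the keyword part.
import re
from collections import Counter

def keyword_counts(page_text, seed_list):
    """Count occurrences of each seed-list keyword in the page text."""
    words = Counter(re.findall(r"[a-z']+", page_text.lower()))
    return {kw: words[kw] for kw in seed_list}

copy = "Quality translation services. Our translation team localises websites."
print(keyword_counts(copy, ["translation", "localisation"]))
```

Note that ‘localises’ does not count towards ‘localisation’: a literal seed-list check is blind to morphology, which is one reason SEO work in a target language needs a linguist rather than a dictionary.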
Although SEO based on a seed list is not a creative form of translation, the client
who needs web translation may ask for transcreation rather than translation,
including SEO. Transcreation (translation + creation) is a marketing term that refers
to the adaptation of the ST to a localised TT. One LSP (2019) referred to transcreation
as a ‘creative and adaptive translation’. Another LSP (2019) described transcreation
on their website as a ‘hybrid of innovative, culturally adapted content and
straightforward translation, in which the sense or feeling of the original must be retained in
the target language’. The question worth asking is whether these features have not
always been part of translation.
What do clients want? The designers, developers, and manufacturers of CAT tools
believed their tools delivered the mantra of less time, more profit, better quality. LSPs
and translators agreed that quality of service came first, followed by responsiveness,
meaning prompt delivery, the quality of the translation, and flexibility. It is
interesting that price, or rather low cost, was not considered a top priority for the
customer by translation providers. LSPs and translators disagreed about the lan-
guage focus which they thought was expected by the client: translators reported
domain specialism, whereas the LSPs believed the client was looking for a broader
spectrum of language services.
When it comes to technology, improvement wishes vary. Translators would like
more ease of use of their tools and a lower cost of ownership, whereas LSPs
would like to see better integration and better MT quality. They agreed on the cost of
ownership of tools, which both parties considered too high.
The gap between LSPs and translators widened in terms of tool-investment
plans: LSPs scored 62% for machine translation (cf. 16% for translators) and 52%
for automated workflow (cf. 6% for translators). QA tools were not popular with
translators, contrary to LSPs, which suggests that translators are not (yet) convinced
that QA tools offer added value.
Tool familiarity was on a par across the board: MS Office, a major CAT tool and
Google Translate were among the top five. The difference between familiarity and
usage, however, was significant. Google Translate dropped from high familiarity to
50% among LSP users and to 75% among translators as a frequently used tool.
LSPs do not appear to recognise the different levels of confidentiality risks between
open-source web-based MT engines (high risk) and API access with adaptive
use (low risk). In the survey, concerns for breaches of confidentiality and inferior
quality were high among LSPs. The following communication from the LSP to
their contractees confirms these concerns:
Two main points stood out in the survey: translators are not habitual users of
TMS, and although translators guard their TMs jealously, they are seemingly generous
with their terminology databases. LSPs show remarkably little interest in the ownership
and transfer of terminology databases. Both points are closely related to ethics and
property rights. Contracted translators who work in web-based tools, such as the
TMS, do not have ownership of their work. The transfer of user rights is very much
beyond the translator’s control, especially if a contract agreement is signed in which
the translator is requested to give away their rights. The seemingly involuntary
transfer of ownership and user rights deserves a wake-up call among translators. They
could lay claim to the industry’s mantra and take ownership of translation quality,
efficiency, and profit by informing the client and LSP what they can deliver within
the given parameters.
Project-based assignment
Objective:
Planning, implementation, and management of your own web-based translation
project through teamwork and contracted translators with new
cloud-based TEnTs
Method:
The assignment is designed for a project management team but collaboration
within a project team or between individuals is possible
Tools:
Server or TMS; CAT tool with internet access; ASR
Suggested resources:
HTML source text for localisation
Assignment brief
In this assignment you must
• apply all your management skills, design your own team translation project
and run it on a TMS or server. Alternatively, use a CAT tool but access digital
platforms, or create your own (marketplace? crowdsourcing? portal?) platform.
Your choice depends on what is available to you
• apply any new TEnTs available to you: TMS, CAT + server, plugins, APIs,
speech recognition
• recruit translators and revisers in the cloud on existing platforms or on a self-
created platform
• apply localisation skills (SEO) to the text through recruited translators, through
localisation tools, or in your team.
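The localisation bullet above can be sketched in code. The following Python illustration is not from the book: the HTML sample, the glossary, and the `localise` helper are all invented for this example, and a real project would use a localisation tool or a proper HTML parser rather than plain string replacement. It shows the two SEO-relevant moves in HTML localisation: updating the language tag and replacing source phrases with agreed target-language keywords.

```python
import re

# Hypothetical source page: the title, meta description, and lang
# attribute are the main SEO-relevant items to localise.
SOURCE_HTML = """<html lang="en">
<head>
<title>Handmade leather bags</title>
<meta name="description" content="Buy handmade leather bags online">
</head>
<body><h1>Handmade leather bags</h1></body>
</html>"""

# Toy glossary standing in for translator and SEO keyword research output.
GLOSSARY = {
    "Handmade leather bags": "Sacs en cuir faits main",
    "Buy handmade leather bags online": "Achetez des sacs en cuir faits main en ligne",
}

def localise(html: str, target_lang: str, glossary: dict) -> str:
    # Update the language tag so search engines index the right locale.
    html = re.sub(r'lang="[^"]*"', f'lang="{target_lang}"', html)
    # Replace source phrases with their agreed target-language equivalents.
    for source, target in glossary.items():
        html = html.replace(source, target)
    return html

print(localise(SOURCE_HTML, "fr", GLOSSARY))
```

In a team project, the glossary would come from the recruited translators, while the project manager owns the script or localisation tool that applies it consistently across all files.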
Do not forget to draw up a plan and workflow at the beginning of this assignment
and to carry out a quality assessment of the translation at the end.
Make sure that you can tick the following activities in your evaluation of your
project management skills:
Possible pitfalls
• Impact on quality as a result of unqualified translators
• Impact on quality as a result of incompetent usage of TEnTs or platforms
• Impact on the product (the translation) as a result of project manager incompetence
• Stagnation in the workflow if project management team members are not sure
about their roles
Concluding remarks
How does the industry’s mantra sit with freelance translators and LSPs? Data
for this assessment are based on interviews conducted in 2018 with translators
and LSPs by Lucas Nunes Vieira and Elisa Alonso (University of Bristol, UK)
in collaboration with the Institute of Translation and Interpreting (UK) and its
Western Regional Group. The interview questions related to MT output. Bearing
in mind that the translation management systems discussed in this chapter use MT
as their primary tool, the answers in the survey are pertinent.
The mantra of quality was considered most important and yet also most problematic:
the concept of quality needed defining, and quality needed negotiating.
LSPs would ask translators to post-edit within a given time. That was not a problem
if the MT output was good, but detrimental to final quality if the output was poor.
Translators felt that their sense of quality and their pride in delivering quality were
under attack. We have seen that adaptive MT in a TMS can resolve this problem if the
translator can select and modify proposed MT matches on the fly rather than perform
a post-edit. None of the interviewees stated that MT provided better quality.
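How a tool might let a translator select proposed matches on the fly can be illustrated with a small fuzzy-matching sketch. This Python example is hypothetical and not drawn from any particular TMS: the memory content, the 0.6 threshold, and the `rank_matches` helper are invented, and `difflib.SequenceMatcher` stands in for whatever proprietary scoring a real tool would use.

```python
import difflib

# Invented translation-memory content for illustration.
TM = {
    "The invoice must be paid within 30 days.":
        "La facture doit être payée sous 30 jours.",
    "The invoice is attached to this email.":
        "La facture est jointe à ce courriel.",
}

def rank_matches(source: str, memory: dict, threshold: float = 0.6):
    # Score each stored source segment against the new one (0.0–1.0)
    # and return only the candidates above the fuzzy-match threshold,
    # best match first, so the translator can pick and modify one.
    scored = []
    for src, tgt in memory.items():
        score = difflib.SequenceMatcher(None, source, src).ratio()
        if score >= threshold:
            scored.append((round(score, 2), src, tgt))
    return sorted(scored, reverse=True)

for score, src, tgt in rank_matches("The invoice must be paid within 60 days.", TM):
    print(score, "→", tgt)
```

The point of the sketch is the workflow, not the scoring: a high-scoring proposal appears in the editor as a starting point, which the translator modifies in place instead of post-editing raw MT output afterwards.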
The mantra of productivity revealed an imbalance. LSPs did not see an initial gain
in productivity when MT technology was introduced, but after the arrival of NMT
productivity increased by more than 100%, from 2000 words a day to 4000 to 5000
words. Translators commented that they saw little gain if MT output was poor.
In this respect, adaptive MT in a TMS may not save time if the translator is faced
with having to make too many match choices and too many edits, which are
time-consuming.
The mantra of profit is closely related to the experiences of MT quality described
by LSPs and translators. The LSPs appreciated the increased output of translated
words, and the translators felt that post-editing was more profitable than translation
if the MT quality was good. It was agreed that a pricing structure based on word
rates did not suit hybrid translation in which MT was used. Translators were not
happy with alternatives such as pricing according to edit distance because they
felt it did not account for effort: time spent on thinking was not part of the
calculation.
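Pricing by edit distance can be made concrete with a sketch. The Python below is purely illustrative and not a formula from the survey or this chapter: `post_edit_rate` and its normalisation by segment length are invented. It also demonstrates exactly the complaint reported above, since time spent thinking about a segment that ends up barely changed contributes nothing to the paid amount.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: the minimum number of
    # single-character insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def post_edit_rate(mt_output: str, final: str, full_rate: float) -> float:
    # Hypothetical scheme: pay a fraction of the full per-word rate
    # proportional to the relative edit distance between the raw MT
    # output and the delivered translation.
    distance = levenshtein(mt_output, final)
    effort = distance / max(len(mt_output), len(final), 1)
    return round(full_rate * effort, 4)

mt = "The translator edit the segment quick"
final = "The translator edits the segment quickly"
print(levenshtein(mt, final))  # → 3
```

Here only three character edits were needed, so the paid "effort" is tiny, however long the translator spent verifying that the MT proposal was in fact usable.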
A pertinent conclusion of the survey was that in large collaborative projects
translators felt like a small cog in the wheel with little say, especially when they were
expected to work within a given structure or procedure, such as a TMS or a platform/
server. Translators stressed the human side of translation: they considered feedback
important and communication between all parties of prime importance. They
would like the industry to listen and respond to their concerns about the TEnTs
they want to use or are expected to use.
Further reading
Cronin, Michael (2003). Translation and Globalization. London and New York: Routledge.
GALA (Globalization and Localization Association) www.gala-global.org/what-translation-management-system
Jiménez-Crespo, Miguel A. (2013). Translation and Web Localization, pp. 193–7. London and
New York: Routledge.
Nunes Vieira, Lucas and Elisa Alonso (2018). ‘The use of machine translation in human
translation workflows. Practices, perceptions and knowledge exchange’. Milton Keynes:
Institute of Translation and Interpreting.
O’Hagan, Minako (2017). ‘Crowdsourcing and the Facebook initiative’. In: D. Kenny (ed.), Human
Issues in Translation Technology, pp. 25–44. London, UK: Routledge.
Pym, Anthony (2010b). ‘The Translation Crowd’. Tradumàtica 8. www.fti.uab.cat/
tradumatica/revista/num8/sumari.htm# [accessed November 2019].
Shuttleworth, Mark (2015). ‘Translation management systems’. In: Chan Sin-Wai (ed.),
The Routledge Encyclopedia of Translation Technology, pp. 687–91. London and New York:
Routledge.
BIBLIOGRAPHY
Anderson, Lorin W. and David R. Krathwohl (eds) (2001). A Taxonomy for Learning, Teaching,
and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York: Addison
Wesley Longman.
Austermühl, Frank (2001). Electronic Tools for Translators. Manchester: St Jerome Publishing.
Baker, Mona (2018). In Other Words: A Coursebook on Translation. London: Routledge.
Baker, Mona and Gabriela Saldanha (eds) (2009). Routledge Encyclopedia of Translation Studies.
London: Routledge.
Baroni, Marco and Silvia Bernardini (2004). BootCaT: Bootstrapping corpora and terms
from the web. Proceedings of LREC 2004.
Berthaud, Sarah (2019). ‘Ethical issues surrounding the use of technologies in the translation
and interpreting market in the Republic of Ireland’. In: Ethics and Machines in an Era of
New Technologies. ITI research e-book.
Bowker, Lynne (2001). ‘Towards a methodology for a corpus-based approach to translation
evaluation’, Meta, 46(2): 345–64.
Bowker, Lynne (2002). Computer-aided Translation Technology: A Practical Introduction.
Ottawa: University of Ottawa Press.
Bowker, Lynne (2005). Productivity vs quality? A pilot study on the impact of translation
memory systems. Localisation Focus, 4(1): 13–20.
Bowker, Lynne (2015). ‘Computer-aided translation. Translator training’. In: Chan Sin-
Wai (ed.) The Routledge Encyclopedia of Translation Technology, pp. 88–119. London and
New York: Routledge.
Byrne, Jody (2014). Scientific and Technical Translation Explained: A Nuts and Bolts Guide for
Beginners. Hoboken: Taylor & Francis.
Chan, Sin-Wai (2015). The Routledge Encyclopedia of Translation Technology. London and
New York: Routledge.
Chan, Sin-Wai (2017). The Future of Translation Technology. Towards a World without Babel.
London and New York: Routledge.
Chaume, Frederic (2012). Audiovisual Translation: Dubbing. Translation Practices Explained.
Manchester: St Jerome Publishing.
Chesterman, Andrew (2001). ‘Proposal for a hieronymic oath’. In Anthony Pym (ed.), The
Return to Ethics, special issue of The Translator, 7(2): 139–54.
Chesterman, Andrew (2018). ‘Translation ethics’. In: Lieven d’Hulst and Yves Gambier
(eds), A History of Modern Translation Knowledge. Sources, Concepts, Effects, pp. 443–8.
Amsterdam: Benjamins.
Ciobanu, Dragoş (2019). ‘Speech technologies: the latest word in AI-driven game-changing
language technologies’. In: Ethics and Machines in an Era of New Technologies. ITI research
e-book.
Cooper, Alan (2004). The Inmates are Running the Asylum: Why Hi-Tech Products Drive Us
Crazy and How to Restore the Sanity. Indianapolis: Sams Publishing.
Cronin, Michael (2003). Translation and Globalization. London and New York: Routledge.
Cronin, Michael (2010). ‘The translation crowd’. Revista Tradumàtica 8. Catalonia: UAB.
Cronin, Michael (2012). Translation in the Digital Age. London: Routledge.
Daems, Joke, Lieve Macken, and Sonia Vandepitte (2013). ‘Quality as the sum of its parts: a
two-step approach for the identification of translation problems and translation quality
assessment for HT and MT+PE’. In: Sharon O’Brien, Michel Simard, and Lucia
Specia (eds), MT Summit XIV Workshop on Post-editing Technology and Practice, Proceedings,
pp. 63–71. European Association for Machine Translation.
Doherty, Stephen (2017). ‘Issues in human and automatic translation quality assessment’.
In: D. Kenny (ed.), Human Issues in Translation Technology, pp. 131–48. London: Routledge.
Doherty, Stephen and Joss Moorkens (2013). ‘Investigating the experience of translation
technology labs: pedagogical implications’. JoSTrans 19. www.jostrans.org/issue19/art_
doherty.pdf
Dragsted, Barbara (2005). ‘Segmentation in translation. Differences across levels of expertise
and difficulty’. Target, 17(1): 49–70. John Benjamins. https://doi.org/10.1075/
target.17.1.04dra.
Drugan, Jo (2009). ‘Intervention through computer-assisted translation: the case of the
EU’. In: J. Munday (ed.), Translation as Intervention, pp. 118–37. London & New York:
Continuum International Publishing Group.
Drugan, Jo (2013). Quality in Professional Translation: Assessment and Improvements. London/
New York: Bloomsbury Academic.
Drugan, Jo and Bogdan Babych (2010). ‘Shared resources, shared values? Ethical implications
of sharing translation resources’. In Ventsislav Zhechev (ed.), Proceedings of the second
joint EM+/CNGL workshop. Bringing MT to the USER: Research on Integrating MT in the
Translation Industry. Available from: https://pdfs.semanticscholar.org/4acd/2c229ef9dfa3f
a903911ed7447e62f726edc.pdf.
Dunne, Keiran J. and Elena S. Dunne (eds) (2011). Translation and Localization Project
Management. Amsterdam and Philadelphia: John Benjamins.
EAGLES-EWG (1996). Eagles Evaluation of Natural Language Processing Systems,
Final Report EAG-EWG-PR.2, Project LRE-61-100, Center for Sprogteknologi,
Copenhagen, Denmark. [available at: www.issco.unige.ch/projects/ewg96/]
Ehrensberger-Dow, Maureen and Gary Massey (2014). ‘Cognitive issues in professional
translation’. In: John W. Schwieter and Aline Ferreira (eds), The Development of Translation
Competence: Theories and Methodologies from Psycholinguistics and Cognitive Science, pp. 58–86.
Cambridge: Cambridge Scholars Publishing.
Ehrensberger-Dow, Maureen and Sharon O’Brien (2014). ‘Ergonomics of the translation
workplace: potential for cognitive friction’. In: Deborah A. Folaron, Gregory M. Shreve,
and Ricardo Muñoz Martin (eds), Translation Spaces, 4(1): 98–118. Amsterdam and
Philadelphia: John Benjamins.
European Commission. Directorate-General for Translation (2015). DGT Translation Quality
Guidelines. Brussels/Luxembourg. [Online] https://ec.europa.eu/translation/maltese/
guidelines/documents/dgt_translation_quality_guidelines_en.pdf [accessed October 2019].
Farajian, M. Amin, Marco Turchi, Matteo Negri, and Marcello Federico (2017). ‘Multi-
domain neural machine translation through unsupervised adaptation’. Proceedings of
the second conference on machine translation. Denmark, Copenhagen. DOI: 10.18653/v1/
W17-4713.
Flanagan, Kevin (2014). ‘Subsegment recall in translation memory — perceptions,
expectations and reality’. JoSTrans 23.
Forcada, Mikel L. (2015). ‘Open-source machine translation technology’. In: Chan Sin-
Wai (ed.), The Routledge Encyclopedia of Translation Technology, pp. 152–66. London and
New York: Routledge.
Garcia, Ignacio (2015). ‘Computer-aided translation’. In: Chan Sin-Wai (ed.), The Routledge
Encyclopedia of Translation Technology, pp. 68–87. London and New York: Routledge.
Gintrowicz, Jacek and Krzysztof Jassem (2007). ‘Using regular expressions in translation
memories’. Proceedings of the International Multiconference on Computer Science and
Information Technology, pp. 87–92.
Gouadec, Daniel (2007). Translation as a Profession. Amsterdam and Philadelphia: John
Benjamins.
Guerberof, Ana (2017). ‘Quality is in the eyes of the reviewer. A report on post-editing
quality evaluation’. In: Arnt Lykke Jakobsen and Bartolomé Mesa-Lao (eds), Translation
in Transition. Between Cognition, Computing and Technology, pp. 187–206. Amsterdam/
Philadelphia: John Benjamins Publishing Company.
Hernandez-Morin, Katell, Franck Barbin, Fabienne Moreau, Daniel Toudic, and Gaëlle
Phuez-Favris (2017). ‘Translation technology and learner performance: professionally-
oriented translation quality assessment with three translation technologies’. In: Arnt
Lykke Jakobsen and Bartolomé Mesa-Lao (eds), Translation in Transition. Between Cognition,
Computing and Technology, pp. 207–34. Amsterdam/Philadelphia: John Benjamins
Publishing Company.
Holmes, James S. (1988a). Translated! Papers on Literary Translation and Translation Studies.
Amsterdam: Rodopi.
Holmes, James S. (1988b/2004). ‘The name and nature of translation studies’. In: Lawrence
Venuti (ed.), The Translation Studies Reader, pp. 180–92. London and New York:
Routledge.
Hutchins, W. John and Harold L. Somers (1992). An Introduction to Machine Translation. London: Academic Press.
ISO 704:1987 Principles and Methods of Terminology – http://lipas.uwasa.fi/termino/
library.html [accessed 2019].
Jiménez-Crespo, Miguel A. (2013). Translation and Web Localization. London and New York:
Routledge.
Jiménez-Crespo, Miguel A. (2015). ‘Translation quality, use and dissemination in an internet
era: using single-translation and multi-translation parallel corpora to research translation
quality on the web’. JoSTrans 23.
Kenny, Dorothy (2010). ‘The Ethics of Machine Translation’. Proceedings XI NZSTI
National Conference.
Kenny, Dorothy (2011). ‘The effect of translation memory tools in translated web texts: evi-
dence from a comparative product-based study’. In: Walter Daelemans and Veronique
Hoste (eds), Evaluation of Translation Technology, series 8/2009 Linguistica antverpiensia.
New Series – Themes in Translation Studies, pp. 213–34. Antwerpen: Artesis Hogeschool
Antwerpen. Departement Vertalers en Tolken.
Kenny, Dorothy (ed.) (2017). Human Issues in Translation Technology. London and New York:
Routledge.
Kiraly, Donald C. (1995). Pathways to Translation: Pedagogy and Process. Kent, Ohio: Kent State
University Press.
INDEX
Note: Entries given in bold refer to tables; entries in italics refer to figures.
invoice 113, 134, 142, 144
ISO (International Organisation for Standardisation) 60, 77, 92–4, 101–4, 120
Jiménez-Crespo, M.A. 97–8, 111, 141
joined and split segments 27, 29, 39, 42
Kenny, D. 1
knowledge platform 141
KWIC (Key Word in Context) 81, 85
l10n (localisation) 98
language: code 7; direction 27, 34; pair 11, 18, 44, 60, 63, 138; quality assurance 92; tag 7
language service provider 18, 80, 102
Levenshtein distance 108
leverage 5, 37, 54, 56
lexicographer 79, 82
licence 114; software 13, 134, 136–7, 144
linear 77–8, 100; see also nonlinear
linguist: contracted 34, 103; contractee 120–3, 127, 132, 136, 138; see also freelancer; freelance translator; reviser; subcontracted 121
LISA (Localization Industry Standard Association): model 64
localisation 95, 97–9, 140–6
lookup 12, 26–7, 87, 134–5
LSP see language service provider
machine translation: adaptive 54–60; integrated 12, 21, 26, 48, 51; neural 52–3, 60, 96, 107, 122; see also deep learning
management: business 2, 145; content 144; file 6, 24, 144; quality 60, 65, 98; risk 113–4, 122, 124, 128
mantra 14–5, 147, 149, 151
marketplace see platform
match: accuracy 55; context 26, 34, 87; fuzzy 26, 34, 37, 52–8, 86; perfect 26–8, 33, 37, 54; reversed 27; statistics 13; see also no match
metadata 33, 36–7, 53–5, 73–4, 120, 147; see also data
metrics 46, 52, 62–5, 96, 98; automatic evaluation (AEM) 63; data quality framework (DQF) 64–5; multidimensional quality (MQM) 64–5
Microsoft (MS) Office 2–3, 13, 24; MS Windows 1–3; operating system 3
model: business 137, 139; encoder 53; see also BLEU; LISA; metrics; TAUS
monolingual see revision
morphological change 69, 75, 84, 86
Mossop, B. 99, 105
MQF (Multidimensional Quality Framework) see TAUS
MS Word: competence 1; shortcut code see code; status bar 10; toolbar 10–11
Müller-Spitzer, C. and A. Koplenig 79
Munday, J. 94
natural language processing 52
NDA see non-disclosure agreement
neural machine translation (NMT) see machine translation
no match 26–7, 30, 34, 36, 105–6
non-disclosure agreement (NDA) see digital ethics
nonlinear 97, 142; see also syntagmatic
non-translatable 40, 92
non-whitespace language 28; see also alphabetic language
Nunes Vieira, L. 106, 108
Nunes Vieira, L. and E. Alonso 151
O’Brien, S. 96, 106
offline editor 137
O’Hagan, M. 142
Olohan, M. 85
on-demand software see SaaS
open-source software 3, 56
optimisation see search engine optimisation
output quality 65
outsource 15, 143
outsourced material 141; see also crowdsourcing
ownership see digital ethics
package 39–40; project 136; return 39, 136
paid crowdsourcing platform 141
paradigmatic 77, 107
parallel corpora see corpora
parent folder 7–8; see also subfolder
pattern matching see regex
pdf 40, 43–4; see also format
platform 116, 125; marketplace 139, 142; translation 58, 138, 140, 143, 145; translator 4, 138; see also crowdsourcing; digital knowledge base; localisation; portal
PO see purchase order
portal 127, 134, 137, 142–4; see also platform
post-edit machine translation (PEMT) 21, 61–2, 107–9, 141
post-editing (PE) 37, 54–5, 62, 102–3, 106
post-editor 52, 92, 107–9
precision 7, 30, 64; see also recall
predictive feature 41, 53; typing 13; writing 38
pre-translate 27–8, 94, 140
preview 12, 29
process: conversion 3, 125; file 13, 42; revision 99; segmentation 28, 42; translation 13–17, 71–3, 92–4, 136–7, 144
professional organisation 4, 95, 116, 121
project management 17–19, 124–5; team 13, 19–21; translation 15, 125–6; see also project manager
project manager 13, 16–20, 33, 115, 122
propagate 27, 33, 58
pseudo-translation 42–4
purchase order (PO) 17, 34, 113, 144
push and pull approach 87, 140
Pym, A. 29–30, 78, 103, 122, 142
QA see quality
quality: assessment 65, 92–6; assurance (QA) 46, 64, 77, 85–6, 92–5; check 20, 60, 92; control (QC) 74, 88, 93, 98, 104; estimation 55, 62–3; specification 93; TQA (translation quality assurance) 63, 92, 96
ranking: algorithm 146; order 4
recall 37, 45, 64, 69, 84; see also precision
reference: file 38–41, 85, 105, 122, 126; material 70–2, 80, 83, 85, 137; translation 55–6, 64, 96, 99; see also alignment
regex 44, 46–7, 95; pattern matching 46; see also filter
repetition 5, 34, 36, 95
repetitive strain injury 11
review: bilingual 9, 29–30; monolingual 52, 104; preview 12, 29; tab 11, 31; translation 17, 34, 53, 64, 102, 107; see also revision
reviser 17–18, 94, 104–5, 112, 136
revision: bilingual 100–1; CAT tool 99–102, 105, 107, 136; model 95; monolingual 29; self- 17–18, 94, 100, 105; third-party 5, 17, 94, 101–2, 104, 150
ribbon 10–11, 44–5; CAT tool 11, 30–1, 34, 45, 69; MS Word 10, 45
risk: breakdown structure 124–5; factors 124–5, 127; management 114, 124, 127; response 124, 127
SaaS (software as a service) 136, 143–4
sampling 79
score: edit distance 55; error 52, 59, 63, 95, 102; MT 59, 63–4; underscore 7, 95
search: local 134; phrase-based 95, 146; tool 12
search engine 56, 62, 69, 84; optimisation 143, 146–7; MT 56; specialised 77
security: breach 121; breach of confidentiality 21, 113–15, 121–2; digital 115–18
seed 146; list 147–8
segment: source 4, 12, 26; split 27, 29; target 12, 26, 29, 80; subsegment 4; see also join; split
segmentation 28, 42, 84, 135; rule 28–9, 34, 46–7; see also subsegmentation
server 13–14, 42, 125, 132–3, 136–7, 144
sight translation 135; see also text to speech
smartphone 14, 38, 134, 143
software see hardware
source: file 6, 18, 98, 125–6; language 28–9, 32, 55, 104; text 42, 57, 73, 99, 115–16
speech recognition see automated speech recognition
speech technology see automated speech recognition
speech to text 134–5; see also automated speech recognition; text to speech
spellcheck 12–13, 41, 93, 99, 134
split see joined and split segments
standalone 2–3, 13–14, 52, 87–8
standard see ISO; quality standardisation
statistical machine translation 53, 121; see also neural machine translation
statistics 13, 34–6, 53
status bar see MS Word
storage system 6, 25
string 27–30, 37–8, 53, 55, 84, 98–99
subfolder 7–9
subsegment: matching 8, 36–7, 70; see also segment
subsegmentation 38, 90
syllabic language 12; see also alphabetic language
syntagmatic 77–8, 107; see also nonlinear
tablet 14, 119, 135, 143
tag: language 7, see also code; formatting 14, 41–2, 99
target: file 9, 39, 44, 95, 100; language (TL) 32, 42, 55, 60, 96, 147; text (TT) 34, 42, 53, 61, 96, 99–100