Lecture 3

Simultaneous interpreting

1. Translation vs. interpreting

In English, the words ‘translation’ and ‘translating’ are often used as an umbrella term to
cover both written translation and interpreting, while the words ‘interpretation’ and
‘interpreting’ are generally used to refer to the spoken and/or signed translation modalities
only.

Two further points need to be made here: the first is that translation also includes a
hybrid, ‘sight translation’, which is the translation of a written text into a spoken or signed
speech. Simultaneous interpreting with text, discussed later in this article, combines
interpreting and sight translation. The second point is that while in the literature, there is
generally a strong separation between the world of spoken (and written) languages and the
world of signed languages, in this paper, both will be considered. Both deserve attention and
share much common ground when looking at the simultaneous interpreting mode.

2. Interpreting modes and modalities

In the world of interpreting, the consensus is that there are two basic interpreting modes:
simultaneous interpreting, in which the interpreter produces his/her speech while the
interpreted speaker is speaking/signing – though with a lag of up to a few seconds – and
consecutive interpreting, in which the speaker produces an utterance, pauses so that the
interpreter can translate it, and then produces the next utterance, and so on.

In everyday interaction between hearing people who speak different languages and
need an interpreter, consecutive interpreting is the natural mode: the speaker makes an
utterance, the interpreter translates it into the other language, then there is either a response or
a further utterance by the first speaker, after which the interpreter comes in again, and so on.
In interaction between a hearing person and a deaf person, or between two deaf persons who
do not use the same sign language (American Sign Language, British Sign Language, French
Sign Language etc.), simultaneous interpreting is more likely to be used.
Conference interpreters also make a distinction between what they often call ‘long/true
consecutive’, in which each utterance by a speaker is rather long, up to a few minutes or
longer, and requires note-taking by the interpreter, and ‘short consecutive’ or ‘sentence-by-
sentence consecutive’, in which each utterance is short, one to a few sentences, and
interpreters do not take notes systematically.
Simultaneous with text is clearly in the simultaneous mode, because the interpreter
translates while the original speech (the ‘source speech’) is being delivered. The hybrid sight translation is more
difficult to classify. On the one hand, it entails translation after the source text has been
produced, which suggests it should be considered a consecutive translation mode. On the
other, the sight translator translates while reading the text, which suggests simultaneous
reception and production operations.
Simultaneous interpreting is conducted with or without electronic equipment
(microphones, amplifiers, headsets, an interpreting booth), generally in teams of at least two
interpreters who take turns interpreting every thirty minutes or so, because it is widely
assumed that the pressure is too high for continuous operation by just one person.
In conference interpreting between spoken languages, speakers generally speak into a
microphone, the sound is electronically transmitted to an interpreting booth where the
interpreter listens to it through a headset and interprets it into a dedicated microphone, and the
target language speech is heard in headsets by listeners who need it.
Sometimes, portable equipment is used, without a booth. When simultaneously
interpreting for radio or television, listeners/viewers hear the interpreted speech through the
radio or TV set’s loudspeakers rather than through a headset, but the principle remains the
same.
Sometimes, no equipment at all is used, and the interpreter, who sits next to the
listener who does not understand the speaker’s language, interprets the speech into the
listener’s ears in what is called ‘whispered (simultaneous) interpreting’.
When interpreting between a signed language and a spoken language or between two
signed languages, no equipment is used in dialogue settings. In video relay interpreting, a deaf
person with a camera is connected through a telephone or other telecommunications link to a
remote interpreting center where interpreters generally sit in booths and send their spoken
interpretation of the signed output to a hearing interlocutor and vice-versa. In the USA, such
a free interpreting service has been available to all for a number of years and has allegedly had
major implications for deaf signers (e.g. Keating & Mirus, 2003; Taylor, 2009).
To users of interpreting services, the main advantage of simultaneous interpreting over
consecutive interpreting is the time gained, especially when more than two languages are used
at a meeting. Its main drawbacks are its higher price and lack of flexibility. The former is due
to both the cost of the interpreting booth and electronic equipment generally required for
simultaneous and to the rule that with the exception of very short meetings, simultaneous
interpreters work in teams of at least two people – at least in interpreting between spoken
languages – whereas for consecutive interpreting assignments, interpreters tend to accept
working alone more often. The lack of flexibility is due to the use of booths and electronic
equipment in usual circumstances, which restricts mobility and requires a certain layout of the
conference room.

3. The history of simultaneous interpreting: a few pointers

The literature about the history of interpreting tends to associate simultaneous interpreting
with the development of conference interpreting, and in particular with the Nuremberg trials,
after World War II (e.g. Ramler, 1988; Baigorri Jalón, 2004). It is definitely the Nuremberg
trials which gave high visibility to simultaneous interpreting, which had been experimented
with at the ILO (International Labor Organization) and at the League of Nations with limited
success (Baigorri Jalón, 2004, chapter III), perhaps to a large extent because of resistance by
leading conference interpreters who were afraid that this development would reduce their
prestige and be detrimental to working conditions (Baigorri Jalón, 2004, p. 148).
In signed language interpreting, in all likelihood, simultaneous interpreting became a
popular interpreting mode, perhaps even a default mode early on. It allowed faster
communication than consecutive. Moreover, whereas in spoken language interpreting, there is
vocal interference between the source speech and the interpreter’s speech, in signed language
interpreting, there is none. Ball (2013, p. 4-5) reports that as early as 1818, Laurent Clerc, a
deaf French teacher, addressed US President James Monroe and the Senate and Congress of
the United States in sign language, and “while he signed”, Henry Hudson, an American
teacher, “spoke the words”.
After World War II, simultaneous was used mostly in international organizations
where fast interpreting between several languages became necessary and where waiting for
several consecutive interpretations into more than one language was not an option. But it soon
extended to other environments such as multinational corporations (in particular for board of
directors’ meetings, shareholders’ meetings and briefings), press conferences, international
medical, scientific and technological conferences and seminars, and the media. Television
interpreting, for instance, has probably become the most visible form of (mostly)
simultaneous interpreting, both for spoken languages and for signed languages, and there are
probably few people with access to radio and TV worldwide who have not encountered
simultaneous interpreting on numerous occasions.
Professional conference interpreter organizations such as AIIC (the International
Association of Conference Interpreters, the most prestigious organization, which was set up in
Paris in 1953 and has shaped much of the professional practices and norms of conference
interpreting) claim high level simultaneous interpreting as a major conference interpreting
asset, but simultaneous interpreting is also used in the courtroom and in various public service
settings, albeit most often in its whispered form.
All in all, it is probably safe to say that besides signed language interpreting settings,
where it is ever-present, simultaneous interpreting has become the dominant interpreting
mode in international organizations and in multi-language political, economic, scientific,
technical and even high-level legal meetings, as well as in television programs,
while consecutive interpreting is strong in dialogue interpreting, e.g. in one-on-one
negotiations, in visits of personalities to foreign countries, and in encounters in field
conditions where setting up interpreting equipment is difficult.

4. An analysis of the simultaneous interpreting process

4.1 How is simultaneous interpreting done?

Is simultaneous interpreting possible at all? One of the early objections to simultaneous
interpreting between two spoken languages was the idea that listening to a speech in one
language while simultaneously producing a speech in another language was impossible.
Intuitively, there were two obstacles. Firstly, simultaneous interpreting required paying
attention to two speeches at the same time (the speaker’s source speech and the interpreter’s
target speech), whereas people were thought to be able to focus only on one at a time because
of the complexity of speech comprehension and speech production. The second, not unrelated
to the first, was the idea that the interpreter’s voice would prevent him/her from hearing the
voice of the speaker – later, Welford (1968) claimed that interpreters learned to ignore the
sound of their own voice (see Moser, 1976, p. 20). Interestingly, while the debate was going
on among spoken language conference interpreters, there are no traces in the literature of
anyone raising the case of signed language interpreting, which presumably was done in the
simultaneous mode as a matter of routine and showed that attention could be shared between
speech comprehension and speech production.
As the evidence in the field showed that simultaneous interpreting was possible
between two spoken languages, from the 1950s on, investigators began to speculate on how
this seemingly unnatural performance was made possible, how interpreters distributed their
attention most effectively between the various components of the simultaneous interpreting
process (see Barik, 1973, quoted in Gerver, 1976, 168).
One idea was that interpreters use the speaker's pauses, which occur naturally in any
speech, to cram in much of their own (‘target’) speech – see Goldman-Eisler, 1968; Barik, 1973.
However, in a study of recordings of 10 English speakers at conferences, Gerver found that
only 4% of the pauses lasted more than 2 seconds and 17% lasted more than 1 second. Since
usual articulation rates in such speeches range from close to 100 words per minute to about
120 words per minute, it would be difficult for interpreters to utter more than a few words at
most during such pauses, which led him to the conclusion that pauses could play only a very
limited role in the production of the target speech (Gerver, 1976, 182-183). He also found that
even though interpreters listened to the source speech and produced the target speech
simultaneously 75 percent of the time on average, they interpreted correctly more than 85
percent of the source speech.
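
As a rough illustration of this argument, the sketch below (purely illustrative, not part of Gerver's study) converts the articulation rates and pause durations quoted above into an approximate number of words that would fit into a pause:

# Rough estimate of how many target-language words fit into a source-speech pause,
# using the articulation rates quoted above (an illustrative sketch, not data from Gerver).
def words_per_pause(pause_seconds: float, words_per_minute: float) -> float:
    """Number of words an interpreter could utter during a pause at a given rate."""
    return pause_seconds * words_per_minute / 60.0

for rate in (100, 120):          # articulation rates cited in the text (words per minute)
    for pause in (1.0, 2.0):     # pause durations cited in the text (seconds)
        print(f"{rate} wpm, {pause:.0f} s pause -> ~{words_per_pause(pause, rate):.1f} words")

# At 100-120 wpm, even a 2-second pause accommodates only about 3 to 4 words,
# which is why pauses alone cannot carry much of the target speech.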
There are no longer doubts about the genuine simultaneousness of speaking and
listening during simultaneous interpreting – though most of the time, at micro-level, the
information provided in the target speech lags behind the speaker’s source speech by a short
span. Anticipation also occurs – sometimes, interpreters actually finish their target language
utterance before the speaker has finished his/hers. According to Chernov (2004), such
anticipation, which he refers to as “probabilistic prognosis”, is what makes it possible to
interpret in spite of the cognitive pressure involved in the exercise.
Basically, the simultaneous interpreter analyzes the source speech as it unfolds and
starts producing his/her own speech when s/he has heard enough to start an idiomatic
utterance in the target language. This can happen after the speaker who is being translated
has produced just a few words, a phrase, or, more rarely, a longer speech segment.
For instance, if, in a conference, after a statement by the Chinese representative, the
British speaker says “I agree with the distinguished representative of China”, interpreters can
generally anticipate and even start producing their target language version of the statement as
soon as they have heard “I agree with the distinguished” with little risk of going wrong. In
other cases, the beginning of the sentence is ambiguous, or they have to wait longer until they
can start producing their translation because the subject, the object and the verb are normally
positioned at different places in the target language.
One of the earliest and most popular theories in the field, Interpretive Theory, which
was developed at ESIT, France, by Danica Seleskovitch and Marianne Lederer in the late
1960s and early 1970s (e.g. Israël & Lederer, 2005), presents the interpreting process in both
consecutive and simultaneous as a three-phase sequence. The interpreter listens to the source
speech ‘naturally’, as in everyday life, understands its ‘message’, which is then
‘deverbalized’, i.e. stripped of the memory of its actual wording in the source speech. This
idea was probably inspired by psychologists, and in particular Sachs (1967), who found that
memory for the form of text decayed rapidly after its meaning was understood. The
interpreter then reformulates the message in the target language from its a-lingual mental
representation (see Seleskovitch & Lederer, 1989). Central to this theory is the idea that
interpreting differs from ‘transcoding’, i.e. translating by seeking linguistic equivalents in the
target language (for instance lexical and syntactic equivalents) to lexical units and
constituents of the source speech as it unfolds. While the theory that total deverbalization
occurs during interpreting has been criticized, the idea that interpreting is based more on
meaning than on linguistic form transcoding is widely accepted. As explained later, it is
particularly important in simultaneous where the risk of language interference is high.

4.2 Cognitive challenges in simultaneous interpreting

Lay people often ask how simultaneous interpreters manage to translate highly technical
speeches at scientific and technical conferences. Actually, the language of specialized
conferences is not particularly complex in terms of syntax, much less so than the language of
non-technical flowery speeches, and its main difficulty for interpreters is its specialized
lexicon. The relevant terminology needs to be studied before every assignment, which can be
done with the appropriate documents, and interpreters tend to prepare ad hoc glossaries for
specialized meetings.
Language is not the only challenge that simultaneous interpreters face. There are also
cultural, social and affective challenges, having to do with their role as message mediators
between groups with different cultures and sometimes different interests, with their position as
witnesses of events and actions about which they may feel strongly, and with their status as
persons whose social and cultural status and identity can be perceived differently by the
principals in the interpreter-mediated communication and by themselves; but these challenges
are not specific to simultaneous interpreting and will not be discussed here.
The main cognitive challenge of simultaneous interpreting is precisely the high
pressure on interpreters' mental resources, which stems from the fact that they must
understand one speech and produce another at the same time, at a rate imposed by the speaker. A
more detailed analysis of the nature of this challenge is presented in Section 4.3. At this point,
suffice it to say that interpreters have always been aware of the fact that the difficulty was
considerable as soon as the speech was delivered rapidly, and that interpreters could not
always cope (see for example George Mathieu’s statement made in 1930 as quoted in Keiser,
2004, p. 585; Herbert, 1952; Moser, 1976; Quicheron, 1981).
The practical consequence of this challenge is the presence of errors, omissions and
infelicities (e.g. clumsy wording or syntax) in the simultaneous interpreters’ production. How
many there are in any interpreted speech or statement is a topic that interpreters are reluctant
to discuss. It depends on a number of factors, including the interpreter’s skills and experience,
features of the speech (see the discussion of problem triggers in the next section) and
environmental conditions such as the quality of the sound (or image) which reaches the
interpreter, background noise, the availability of information for thematic and terminological
preparation, and probably language-pair specific features. In many cases, interpreters are able
to translate a speaker’s statement faithfully and in idiomatic, sometimes elegant language, but
in other cases, which are far from rare, errors, omissions and infelicities (EOIs) can be
numerous. In a study of authentic online simultaneous interpretations of President Obama’s
inaugural speech in January 2009 by 10 professional interpreters working into French,
German or Japanese, Gile (2011) found 5 to 73 blatant errors and omissions over the first 5
minutes of the speech. In other words, these experienced, proficient interpreters made on
average from 1 to more than 14 blatant meaning errors or omissions every minute when
translating a difficult, but not extraordinarily difficult speech.
How this affects the comprehension of the speaker’s message and intentions by users
remains to be investigated. Some EOIs may have little or no impact, for instance if they affect
speech segments which are highly redundant or of little relevance to the message, while
others may deprive the users of important information – for example if numbers measuring
the financial performance of a company are omitted or translated incorrectly. The number of
EOIs is therefore not a sufficiently reliable metric to measure the amount of information
actually transmitted to users of the target language, but the image of the simultaneous
interpreter producing a very faithful and idiomatic version of the source speech in the target
language at all times is clearly not a realistic one.

4.3 The Effort Model of Simultaneous Interpreting

In the late 1970s and early 1980s, Gile observed, through their EOIs, the difficulties that even
highly experienced interpreters with an excellent reputation encountered, reflected upon his own
interpreting experience, started reading the literature on cognitive psychology and developed
a set of ‘Effort Models’ of interpreting to account for the problems which occurred regularly
in the field (e.g. Gile, 2009). The Effort Model for simultaneous interpreting conceptualizes
SI as consisting of four ‘Efforts’:

The Reception Effort, which encompasses all mental operations involved in perceiving and
understanding the source speech as it unfolds, including the perception of the speech sounds –
or signs when working from a signed language – and of other environmental input such as
documents on screen or reactions of other people present, the identification of linguistic
entities from these auditory or visual signals, their analysis leading to a conclusion about their
meaning.

The Production Effort, which encompasses all mental operations leading from decisions on
ideas or feelings to be expressed (generally on the basis of what was understood from the
source speech) to the actual production of the target speech, be it spoken or signed, including
the selection of words or signs and their assembly into a speech, self-monitoring and
correction if required.

The Memory Effort, which consists in storing for a short period of up to a few seconds
information from the source speech which has been understood or partly understood and
awaits further processing or needs to be kept in memory until it is either discarded or
reformulated into the target language.

The Coordination Effort, which consists in allocating attention to the other three Efforts
depending on the needs as the source and target speeches unfold.

Increasingly, speakers read texts. When these are provided to the interpreters as well, the
resulting interpreting mode is called ‘simultaneous with text’. In simultaneous with text, the
Reception Effort is actually composed of a Listening Effort and a Reading Effort. This
distinction is based firstly on the fact that one relies on sound and the other on visual signals,
which means that, at least at the perception stage, different processes are involved, and
secondly on the fact that speakers often depart from the written text and modify, add or omit
segments, which forces interpreters either to use the oral signal only or to attend to both the
text and the speaker’s voice. They generally do the latter, because read speeches tend to have
a prosody and a rhythm that make them more difficult to follow than ad-libbed speeches (see
Déjean Le Féal, 1978), and having a text is a help – though the additional Effort entails
additional cognitive pressure.

Two further Efforts were added later for the case of an interpreter working from a spoken
language into a sign language (on the basis of input from Sophie Pointurier-Pournin – see
Pointurier-Pournin, 2014). One is the SMS Effort, for Self-Management in Space: besides
paying attention to the incoming speech and their own target language speech, interpreters
need to be aware of spatial constraints and position themselves physically so as to be able to
hear the speaker and see material on screen if available, and at the same time remain visible to
the deaf audience without standing out and without causing disturbance to the hearing
audience.

The other is the ID Effort, for Interaction with the deaf audience: deaf people often sign while
an interpreter is working, either making comments to each other or saying something to the
interpreter, for instance asking him/her to repeat or explain or making a comment about the
speech being interpreted. This is a disturbance factor for the interpreter, whose attention is
distracted from focusing on the incoming speech and outgoing speech.

All these Efforts include non-automatic components: in other words, they require attentional
resources (see Gile, 2009). For instance, in the Reception Effort, some processing capacity is
required to identify linguistic units from the sounds or visual signals, and more capacity is
required to make sense out of the linguistic units. In the Production Effort, the retrieval of
lexical units, be they spoken or signed, can also require processing capacity, especially in the
case of infrequently occurring words. So does the assembly of lexical units into
idiomatic utterances, especially under the possible interference of the source language.
More importantly, the total processing capacity required for all these Efforts tends to
be close to the total available attentional resources, close enough for interpreters to be
destabilized and risk saturation and EOIs in their output when making errors in their
management (such as lagging too far behind the speaker or focusing too strongly on
producing an elegant target speech) and when encountering certain difficulties in the speech
itself – the so-called ‘problem triggers’. The assumption that simultaneous interpreters
typically find themselves in this fragile situation is the ‘Tightrope Hypothesis’, which Gile
considers to be the main explanation for EOIs in interpreting (Gile, 2009).
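
As a minimal toy sketch (not Gile's own formalization; the numeric values and the function below are invented purely for illustration), the Tightrope Hypothesis can be pictured as a simple capacity check:

# Toy illustration of the Tightrope Hypothesis: if the momentary processing-capacity
# requirements of the Efforts exceed the interpreter's available capacity, some Effort
# is under-served and errors, omissions or infelicities (EOIs) become likely.
# All figures are arbitrary illustrative values, not empirical data.

TOTAL_CAPACITY = 10.0  # hypothetical total attentional resources

def at_risk(reception: float, production: float, memory: float, coordination: float) -> bool:
    """Return True if the sum of momentary Effort requirements exceeds total capacity."""
    return reception + production + memory + coordination > TOTAL_CAPACITY

# An easy segment: requirements comfortably below capacity.
print(at_risk(reception=3.0, production=3.0, memory=2.0, coordination=1.0))  # False

# A dense, fast segment (a 'problem trigger'): Reception and Memory requirements rise
# and total demand exceeds capacity, so EOIs become likely somewhere in the output.
print(at_risk(reception=5.0, production=3.5, memory=3.0, coordination=1.0))  # True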

4.4 Cognitive problem triggers

According to the Tightrope Hypothesis, three types of phenomena cause the overwhelming
majority of errors and omissions among professional simultaneous interpreters. The first is
mismanagement of attentional resources. The second is an increase of processing capacity
requirements. The third is short information-carrying signals with little redundancy, such as
short names, numbers, and short lexical units in text with little grammatical or semantic
redundancy, as is often the case in Chinese. These are particularly vulnerable to short lapses
of attention, which cause a loss of signal from the speaker with little chance of
recovery through redundancy. Triggers most often discussed among interpreters and studied
in the literature belong to the second category and include in particular the following:

a. Rapid delivery of speeches, dense speeches and speech segments as well as written
speeches read aloud
In all these cases, interpreters are forced to analyze much incoming information-
containing signal over very short periods, which puts the Reception Effort under a heavy
workload. Moreover, since they cannot afford to lag behind (see section 4.5), they also have
to formulate their target speech rapidly, which imposes a heavy load on the Production Effort.

b. Embedded structures and multi-word names (such as names of organizations, conventions etc.)
In both of these cases, interpreters have to store much information in memory as the
source speech unfolds before they can reformulate it. In multi-word names, the problem arises
mainly due to the need to reorganize the components in the target language. For instance,
WIPO, the World Intellectual Property Organization, translates into French as OMPI,
Organisation Mondiale de la Propriété Intellectuelle. The first word in the name in English is
second in French, the second becomes fourth, the third remains third, and the fourth becomes
the first. If the interpreter cannot anticipate the name in a speech and is not very familiar with
the French equivalent, s/he will be forced to wait until the fourth word is pronounced in
English before starting to translate it, with repeated retrievals from memory of the English
name and what has already been translated into French at every stage of the translation, a very
taxing task. In a small experiment with a 12-minute extract from an authentic speech, Gile
(1984) found that out of 15 interpreters who interpreted the speech from English into French,
only 3 managed to interpret correctly one of the two multi-word names it contained and none
interpreted correctly the other.
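
To make the reordering concrete, here is a small illustrative sketch (the component lists and the permutation reflect the WIPO/OMPI example above; the code itself is only a demonstration):

# Illustration of the word-order problem with multi-word names (WIPO -> OMPI).
# The English components surface in a different order in French, so the interpreter
# has to hold the whole name in memory before starting to reformulate it.
english = ["World", "Intellectual", "Property", "Organization"]
french = ["Organisation", "Mondiale", "de la Propriété", "Intellectuelle"]

# Position in French of each English component (1-based), as described in the text:
# English word 1 -> French position 2, 2 -> 4, 3 -> 3, 4 -> 1.
order_in_french = [2, 4, 3, 1]

for i, word in enumerate(english, start=1):
    pos = order_in_french[i - 1]
    print(f"English component {i} ({word!r}) surfaces as French component {pos} ({french[pos - 1]!r})")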

c. Noise and low-quality signal from the speaker
This includes poor sound produced by the electronic equipment and background noise, but
also strong accents, poor language quality such as incorrect lexical usage and grammar,
careless and non-standard signing when using a sign language, poor logic and ambiguous
formulation of ideas. In all these cases, Reception becomes more difficult. Strong accents and
poor language quality are frequent when working from English, which has become a lingua
franca worldwide and is often used by speakers who do not necessarily master it very well.

d. Working from one language into another which is syntactically and/or lexically very
different.
The lexical difficulty applies mostly to interpreting from spoken languages into sign
languages, which have a much smaller lexicon, and finding a way to express in a sign
language a concept which is lexicalized in the spoken language but has no sign (these are
known as ‘lexical gaps’) can require considerable effort and time – at cognitive scale (see
Pointurier-Pournin and Gile, 2012; Pointurier-Pournin, 2014). As to syntactic differences, they have the
same effect as multi-word names, in that they require waiting and storing much information in
short-term memory before reformulation of the message in the target language (e.g. Wilss,
1978; Gile, 2009; Seeber, 2013).

4.5 Failure sequences

A few typical examples can illustrate how speech- and speaker-related difficulties and
attentional resource management errors can combine to produce EOIs in simultaneous
interpreting due to the fact that interpreters tend to work close to cognitive saturation.

Example 1: Direct Reception Effort failure

A common failure sequence starts with a source speech segment with high information
density (e.g. a number, a complex name, an enumeration) or noisy signal (background noise,
strong accent, faulty logic and lexical usage, non-standard signing) the analysis of which
requires much processing capacity at a time when little is available, for instance when the
interpreter is busy reformulating a previous segment. This results in a failure to ‘hear’ the
segment – actually not a failure to hear, but a failure to understand the sounds or signs – and in
the omission of the relevant information from the target speech, or in an error.
In signed language interpreting, background ‘noise’ can come from deaf people other
than the speaker who sign to each other or to the interpreter, asking for clarification or making
other comments.
Example 2: Indirect effect of increased processing requirements in the Reception Effort

Another typical sequence starts with a similar, informationally dense or noisy source speech
segment which the interpreter identifies as requiring more processing capacity. S/he may
decide to focus on the perception and analysis of this segment, which takes away attentional
resources from production of the target speech. This may result in a deterioration of the
linguistic quality of the target speech, or in slower output, leading to increased lag behind the
speaker. This lag may overload the interpreter’s short-term memory (‘working memory’),
leading to the interpreter’s inability to remember previously analyzed information which has
yet to be reformulated in the target language. Alternatively, the interpreter’s short-term
memory is not affected immediately, but the lag forces the interpreter to accelerate his/her
production after the dense source speech segment is analyzed, at which time less processing
capacity is available for the Reception Effort, and even simple segments with low information
density and no problem-triggering features may be missed (see Gile, 2009, chapter 8).

Example 3: Effect of forced waiting

An excessive lag behind the speaker can also occur when the source speech is neither
informationally dense nor noisy, but syntactic differences between the source language and
target language force the interpreter to wait a while before reformulating the content of the
beginning of the sentence, which may result in an overload of the Memory Effort.

Example 4: Indirect effect of intensified Production Effort

Interpreters may understand a concept or idea expressed in the source speech but find it
difficult to reformulate it in the target language, for instance because the specific lexical unit
they need is unknown to them, or does not exist in the target language (a frequent occurrence
when interpreting from a spoken language into a sign language), or is not immediately
available to them (‘tip of the tongue’ phenomenon). The extra time and effort mobilize
processing capacity away from the Reception and Memory Efforts, which may result in errors
and omissions through the process described above.
Sometimes, the production difficulty stems from the interpreter's wish to produce an
elegant speech, which calls for lexical units that are less readily available, or a complex
sentence as opposed to a simple one. In such cases, the problem lies not with features of the source speech or its
delivery by the speaker, but with the interpreter’s strategic and tactical choices.

Example 5: Effect of excessive focus on text in simultaneous with text

Having in the booth the text of an address which a speaker reads has both advantages and
drawbacks. The attractiveness of the text to interpreters stems from the fact that it makes the
linguistic content of the speech easily available to them in print (save for rare handwritten
texts), with virtually no effort required for the linguistic identification of the signals and more
freedom to manage the Reading part of the Reception Effort while the Listening part is
entirely paced by the speaker. As a result, they are tempted to sight-translate the text and use
the speech sounds for the sole purpose of checking the speaker’s progression in the text. This
entails two risks: one is that the speaker’s deviations from the text – skipped parts and added
comments – can be missed because not enough attention is devoted to the Listening
component; the other is that difficulties in sight translation slow down the interpreter while
the speaker is speeding ahead, with the associated lag and consequences described above.

5. The simultaneous interpreter’s language skills

AIIC offers very general descriptions of language skills required for conference interpreting.
It defines three types of working languages (see http://aiic.net/node/6/working-languages/lang/1):
The 'A' language is the interpreters’ mother tongue (or its strict equivalent) into which
they work from all their other working languages in both consecutive and simultaneous
interpretation. It is the language they speak best, and “in which they can easily express even
complicated ideas”, and the interpreter’s main ‘active language’. ‘B languages’ are languages
in which interpreters are “perfectly fluent” and into which they can work (they are also ‘active
languages’), and ‘C languages' are languages which they “understand perfectly”, from which
they work but into which they do not translate (they are ‘passive’ languages).
All conference interpreters are supposed to have an A language and at least a C
language. However, there is little work for interpreters with one A language and one C
language only. The vast majority of them have at least two active languages (one A language
and one B language or two A languages) or one active language (generally an A language)
and at least two passive languages. In many parts of the world, and in particular in Asia,
interpreters tend to have one A language and one B language and work both ways (from A
into B and vice-versa), though the prevailing norm is that it is better to work into one’s A
language only – a controversial norm (e.g. Kelly et al. 2003).
Due to the cognitive pressure explained earlier, in terms of language skills, the
requirements on simultaneous interpreters are more stringent than being “perfectly fluent”,
being “able to express easily even complicated ideas” and being “able to understand a
language perfectly.”
Because of the vulnerability of simultaneous interpreters to cognitive saturation,
linguistic processing of the incoming speech sounds or visual signs must be very rapid and
require as little attentional capacity as possible. This ‘comprehension availability’ comes after
repeated exposure to speech (or signed utterances in the case of a sign language) in a variety
of situations and with a variety of sociolects and accents. It does not necessarily develop after
repeated exposure to written texts, which are perceived visually with only an indirect link to
their phonological form. Student interpreters with an excellent comprehension of a passive
language in its written form, including those with considerable experience as translators, often
fail to have the required availability for the spoken form of their passive languages.
With respect to active languages, cognitive pressure on the simultaneous interpreting
process, especially limitations on maximum time lag between reception and reformulation,
imposes two requirements. One is the interpreters’ ability to access lexical units and assemble
them into idiomatic statements rapidly and with little attentional processing capacity
expenditure so as to leave enough resources free for other operations, in particular those
making up the Reception Effort. The other is flexibility, in other words the ability to start an
utterance on the basis of partial information and continue its assembly into an idiomatic
sequence of sentences as the incoming source speech unfolds while maintaining rigorous
compliance with a given information content. This contrasts sharply with everyday situations
in which speakers can plan their utterances in advance or change their content online if they
encounter difficulties in formulating their ideas. Such production skills, when they are not
part of a person’s baseline aptitudes, come after much speaking/signing practice – as opposed
to writing, in which, at cognitive scale, that is, fractions of a second, text producers have
much more time to retrieve words from memory, assemble them and write them down.
As discussed in section 7, interpreters also need to have correct prosody and speak
without a strong accent so as to be easily understood by users.
In the case of signed language interpreting, for Reception, interpreters need to be
familiar with non-standard forms of signing, as they may encounter signers from various
backgrounds and geographic areas, with dialects and idiosyncrasies. For Production, they
need to be creative in the use of their sign language in order to deal with frequent lexical gaps.
Such requirements are only met by a small proportion of ‘bilinguals’ or
‘multilinguals’, and earning a foreign language degree is far from sufficient to provide the
necessary linguistic qualifications. Initially, in the West, it was thought that only persons who
came from a culturally and linguistically mixed background or had lived for many years in
foreign countries could acquire them. Experience has shown that this is not the case, as some
competent simultaneous interpreters have actually acquired their foreign language(s) as adults
and have not lived in the relevant country for any significant length of time, but such people
presumably have higher than average talent. In prestigious conference interpreter training
programs in Europe, insufficient language skills are probably by far the most frequent reason
for students’ failure in graduation examinations.
Requirements are far less stringent in consecutive interpreting, in which, while the
source speech unfolds, the interpreter’s attention can be focused on the incoming speech –
and on note-taking when notes are taken. Production follows – after the comprehension
process is completed, and at that stage, the interpreter’s attention can be focused on word
retrieval and utterance assembly, without the need to keep part of it available for
comprehension of the incoming speech as is the case in simultaneous. This is why some
interpreters who refuse to work from their A language into their B language in simultaneous
do work regularly into their B language in consecutive.

6. Tactics and strategies

Over time, simultaneous interpreters have developed ways of facing the linguistic and
cognitive difficulties they encounter on a regular basis. In the literature, they are often
referred to as ‘strategies’ indiscriminately, but it is perhaps more rational to distinguish
between preparatory actions with a desired medium-term or long-term effect, which can
indeed be called ‘strategies’, and online decisions aiming at immediate or quasi-immediate
effects, which will be referred to as ‘tactics’.

6.1 Preparation strategies

The most fundamental strategies used by interpreters to cope with the cognitive difficulties of
simultaneous revolve around preparation of their interpreting assignments. This is done
mainly through documents, including both conference documents such as the agenda or
program of the meeting, lists of participants, calls for papers, documents describing the
conference, texts to be read and abstracts, PowerPoint presentations, and external documents
such as newspaper articles, books, scientific journals and, increasingly, internet sources of
various kinds. The preparation process often continues up to the beginning of the meeting and
even beyond, in the interpreting booth, with interpreters reading new documents as they arrive
and seeking information on the Web about concepts and names just heard or read.
Through these documents, interpreters acquire and/or refresh background information
on the topic and the meeting as well as relevant language-related data, including terminology
and phraseology. This helps them resolve potential ambiguity in the speeches they hear and
facilitates comprehension as well as production by lowering the processing capacity and time
required to analyze incoming signals and to retrieve appropriate lexical units when producing
their translation.
Interpreters also prepare glossaries which they can use in the booth to help further
speed up the comprehension and production processes.

A specific preparation strategy for simultaneous interpreting with text consists in
marking the texts in advance for more rapid reading and translation. In particular, important
concepts and names can be highlighted with a marker or underlined; glosses for certain lexical
items, idiomatic expressions, citations and names can be written between the lines or in the
margins; complex names which require reordering of their components can be marked with
numbers above each component to indicate the order in which they will be translated into the
target language; and sentences can be segmented with slashes to show visually the boundaries of
their syntactic or logical constituents.
In signed language interpreting, one important preparation strategy consists in
consulting with other interpreters to see what signs have been used for particular concepts and
names with the same deaf user(s) of interpreting services or consulting with the deaf user(s)
and reaching an agreement on how to sign particular concepts and names which are likely to
come up during the meeting.
Preparation strategies give interpreters access to knowledge, including terminology
and relevant names and acronyms for people, places, organizations, products etc. which make
it possible for them to understand and reformulate very specific and very specialized
information to which they are normally outsiders. Beyond information per se, they also
reduce processing capacity requirements for both comprehension of source speech signals,
even under ‘noisy’ conditions, and retrieval of lexical units and names for reformulation, thus
lowering the frequency of occurrence of errors, omissions and infelicities associated with
cognitive saturation.

6.2 Online tactics

The literature on interpreting abounds with descriptions and analyses of the simultaneous
interpreter’s online tactics, which is further evidence of the high prevalence of difficulties
interpreters encounter while in the booth. In the literature, some of the tactics are studied
from a psycho-sociological angle, as reflecting the interpreter’s position as a participant in
the interpreter-mediated event (see for instance Diriker, 2004; Monacelli, 2009; Torikai,
2009). These will not be taken up here, because they are not specific to simultaneous
interpreting. The following analysis focuses on tactics used to counter cognitive challenges
arising from the simultaneousness of the Efforts in the simultaneous mode.

a. Collaborative tactics

There are two reasons why simultaneous interpreters tend to work in teams of at least two.
One is the possibility for them to take turns to relieve each other of fatigue. The other is
collaboration in the face of difficulties. For instance, the non-active interpreter (the one who is
not currently producing the target speech) can write down or sign a source speech concept, term,
name or number which has been missed by the active interpreter or which s/he believes may be
missed. In signed language interpreting, when a team of two is working and the active
interpreter faces the deaf users and therefore turns his/her back on the screen where
information is shown, it is particularly useful for the non-active interpreter to face the screen
and the active interpreter and serve as his/her eyes.

b. Individual tactics
Individual tactics are the ones that have received most attention in the literature (e.g. Snelling,
1992; Van Besien, 1999; Tohyama & Matsubara, 2006; Wallmach, 2000; Liontou, 2012).
Numerous analyses have been conducted, including some that compare students’ tactics with
tactics used by seasoned professionals and others that compare tactics used when working
into one’s A language to those used when working into one’s B language. Giving a full
inventory of those identified so far is beyond the scope of this article, which will only offer an
illustrative list of how interpreters attempt to cope with the cognitive challenges that arise
while they are active in the booth.
Some tactics are aimed at preventing predictable overload. For instance, when
interpreting from one language into another language with very different syntax, interpreters
often start their target language reformulation as soon as they have enough information from
the unfolding source speech with a short autonomous sentence or with a ‘neutral’ beginning
rather than committing themselves to one specific direction. This is intended to avoid working
memory saturation associated with the syntactic difference-induced lag. Another preventive
tactic consists in reformulating the last element in an enumeration of names first: this is
assumed to save on processing capacity, as working memory can dispose of this last name
before it is processed in more depth along with the other components of the list.
Other tactics aim at limiting loss. For instance, when interpreters fail to grasp or
remember a large number such as 3 251 127, they may resort to an approximation such as
“over three million”; when they do not know the exact target language equivalent for a
specific term, they may resort to a hypernym, for instance “device” to replace the name of a
specific machine; similarly, when the speaker cites the name of an organization or a
convention, interpreters who are not familiar with the name or have missed it may choose to
say “the group”, “our organization”, “the convention”. While strictly speaking, the
information explicitly expressed in their speech is incomplete, if users are familiar with the
topic, they may know exactly what group, organization or convention is referred to, and no
information is lost. At other times, under high cognitive pressure, interpreters realize they will
not have the resources to reformulate all of the content of the relevant speech segment and
may choose to omit some to preserve what is more important. For instance, in an enumeration
followed by a critical piece of information, if they lag too far behind, they may decide to
leave out some of the terms of the enumeration and thus save time and processing capacity for
the reformulation of the more important content.
When encountering a term they do not know in the target language, they may choose
to explain the relevant concept with a few words. They may also decide to reproduce the
source language term in their target language utterance. This may work well in some cases,
for instance when translating from English into many languages in fields like medicine,
information technology or finance, because many users read reference documents in English
and are familiar with English terms, but this tactic is direction-sensitive: it is likely to be
efficient when working from English into Japanese, but obviously not when working from
Japanese into English.
This particular tactic is also used in signed language interpreting, through
fingerspelling: the spelling of the term or name in the relevant spoken language is signed to
the deaf persons. However, this is often problematic, not necessarily in terms of information
efficiency if the term is short (see the discussion on strategy and tactic selection below), but in
terms of social acceptability, as many deaf persons are very sensitive about the use of sign
languages, which they consider an essential part of their cultural identity, and frown upon the
intrusion of elements of spoken languages in their signed exchanges (Pointurier-Pournin,
2014).
Another possibility is to ‘translate’ word for word (‘transcode’) the source language
term into the target language. At a dentistry meeting, the term ‘mandibular block’, which
refers to a type of anesthesia, was interpreted into French as “bloc mandibulaire” by an
interpreter who did not know the corresponding French term, tronculaire. French participants
later commented that they had understood the translation.
In signed language interpreting, as opposed to simultaneous interpreting between
two spoken languages, the setting is most often ‘dialogic’, with very few persons present and
the interpreter in close physical proximity to them. This allows interpreters to communicate
with speakers while they are signing and ask for clarification when encountering a
comprehension difficulty, something that is difficult to do from a booth. In a booth, when an
interpreter misses important information, s/he may decide to inform the audience, for instance
by saying something like “the result/number/name which the interpreter has missed”.

6.3 Strategy and tactic selection

All these strategies and tactics are used to overcome difficulties, but their efficiency is relative
and each has a price. For instance, using the transcoding option may convey the full
information without loss, but at the cost of credibility for the interpreter. Similarly,
fingerspelling into a sign language may convey the information, but generate a negative
reaction towards the interpreter who let the spoken language intrude. Such confidence loss
can have dire consequences for the interpreter in the medium and long term, but even in the
short term, immediate communication through his/her interpretation may become more
difficult, especially when the event involves a strong affective dimension for the users. A loss
of credibility may also result from an interpreter’s decision to inform the audience that s/he
has missed a bit of information.
Another type of price associated with certain tactics is cognitive in nature: explaining
a term whose equivalent in the target language is unknown to the interpreter, fingerspelling a
long word, or writing down a large number lest it be forgotten all involve extra time and can lead to
excessive lag behind the speaker and to saturation of working memory with the consequences
described earlier.
When selecting online tactics, interpreters probably attempt to attain, consciously or
not, two partly interdependent objectives: conveying a maximum amount of information that
the speaker wants to convey, ideally all of it, while prioritizing the most important
information, and formulating a target speech which will produce the effect sought by the
speaker.
What information is important depends on the circumstances. In a diplomatic event,
many facts mentioned in speeches can have a secondary role, whereas in a technical event,
some information can be critically important, and other information more anecdotal.
Interpreters are often capable of determining which should be prioritized if some loss is
inevitable. In some environments, information as a whole is far less important than a general
atmosphere. This is inter alia the case of entertainment programs on TV, where a further
requirement is the smooth flow of information, if possible without lags, so it is often more
important to finish the interpretation of an utterance at the same time as the speaker than to
convey the full information, which has implications for the tactics that will be selected.

7. The quality of simultaneous interpreting

Cognitive pressure and its implications as analyzed so far show that on the whole, the product
of simultaneous interpreting cannot be expected to be as fully faithful to the original and of
uniformly high linguistic quality as desired from written translation. This prompts the
question of how much is lost in interpreting versus direct communication between a speaker
and his/her interlocutor(s).
With respect to information conveyed to users, it would be simplistic to calculate the
proportion of information explicitly present in the source speech and absent in the target
speech, if only because information loss also occurs in direct speaker to listener
communication. In some cases, for instance when speakers are under stress or speak in a non-
native language, which is a frequent occurrence, interpreters may be able to convey more to
their listeners. How much information is actually lost through interpretation remains to be
investigated and is probably highly variable across speakers, interpreters and situations.
But as already mentioned, the relative importance of information in a speaker’s
message is also highly variable. Verbal utterances also have social, affective, ritual and formal
goals, and information loss does not necessarily have an adverse effect on speakers’ success in
attaining these goals.
What interpreting quality actually means has been a concern of professional
interpreters for a long time, and the issue has generated considerable literature, in the form of
theoretical reflection on one hand, and empirical research on user expectations and perception
of interpreting quality on the other. Much of it is relevant for all interpreting modes in various
environments and cannot be covered here. The focus in this article will be on three quality
parameters or components most relevant to the simultaneous mode, but more general
expectations as regards informational fidelity, ethical behavior, attitudes (in signed language
interpreting – see Taylor, 2009) should be kept in mind.
The quality component most obviously affected by the cognitive pressure of
simultaneous interpreting is informational fidelity. As discussed earlier, errors and omissions
are a frequent occurrence in this interpreting mode, which is traditionally considered less
informationally faithful than consecutive. This may well be the case for ‘short’ or sentence-
by-sentence consecutive, but does not necessarily hold for long consecutive, in which
interpreters take notes, for the simple reason that while listening to the speaker, interpreters
need to share attention between the Listening Effort and a Note Production Effort, just as in
simultaneous interpreting, they need to share attention between the Reception Effort and the
Production Effort. Moreover, since writing is slower than speaking, note-taking can involve
a longer lag during Reception than the usual lag in simultaneous, with the associated risks of
working memory saturation and errors and omissions. This is particularly true of dense speech
segments such as enumerations, long numbers and multi-word names. In a small case study,
Gile (2001) found that consecutive renderings of a speech were not informationally superior
to simultaneous renderings.
Another vulnerable quality component in simultaneous is intonation, which often
becomes unnatural (see Shlesinger, 1994; Ahrens, 2004). In her doctoral work, Collados Aís
(1998) found that monotonous intonation in simultaneous had a negative impact on the
listeners’ perception of many other quality components, including fidelity and
professionalism. In later work by the group led by Collados Aís (e.g. Collados Aís et al.,
2011), it was found that manipulation of other single quality components (e.g. terminology,
grammar, voice) similarly ‘contaminated’ users’ perception of other components which had
been left untouched.
The third most vulnerable quality component in simultaneous interpreting is language
quality. This is due to two factors. The first is that because attentional resources must be
shared between Reception, Production, Memory and Coordination (see section 4.3), fewer
resources are available for each, including Production, than in everyday verbal interaction,
and infelicities or even language errors may occur. The second is that while in consecutive
and in everyday life, people listen to a speaker’s words, analyze them and extract their
meaning, and by the time they respond, they generally have forgotten most of the verbal form
of the utterance (Sachs, 1967), in simultaneous interpreting, the processing of the incoming
source language speech and the production of the target language speech occur at the same
time, and source language and target language elements are probably often being processed at
the same time in working memory (which is shared by the Reception Effort, the Memory
Effort and the Production Effort). This entails higher risks of linguistic contamination or
‘interference’ (see Seleskovitch & Lederer, 1989, chapter III).

8. Learning simultaneous interpreting

In view of all these difficulties, it is remarkable that simultaneous interpreters so often
manage to produce faithful translations of source speeches. Some do so without any special
training thanks to their natural talent, but it is generally considered that performing good
simultaneous requires training. Low success rates at graduation exams in demanding
interpreter training programs which strive to meet AIIC criteria for conference interpreting
give some weight to this idea, especially considering that admission criteria tend to be
stringent as well.
In such programs, the main objectives of training are fourfold: to teach students the basic
techniques of interpreting, generally both consecutive and simultaneous; to improve their
proficiency in the use of these techniques up to professional level; to teach them professional
norms; and to test their level of proficiency at the end of training so as to provide the market with
qualified professionals. In many programs, (long) consecutive is taught before simultaneous.
Some trainers consider that simultaneous is nothing but accelerated consecutive, presumably
because in both, there is comprehension and 'deverbalization' of the source speech and then
reformulation. In cognitive terms, this is a somewhat simplistic position, inter alia because in
simultaneous there is no management of note taking, note reading and reconstruction of a
speech after a prolonged waiting period of several dozen seconds to several minutes, and in
consecutive, the risk of interference due to the simultaneous presence of source language and
target language lexical and syntactic units is very low. Nevertheless, teaching consecutive
before simultaneous has a number of advantages. One of them is that it is good training for
careful listening (Seleskovitch & Lederer, 1989, chapter II). Because memory for syntax and
words rapidly fades away, consecutive interpreters, as opposed to simultaneous interpreters,
necessarily translate from meaning, not from the linguistic form of the source speech. If they
have not listened carefully to the speaker, despite the notes they have taken, they are generally
incapable of reconstructing the speech. This also makes consecutive a good diagnostic tool
for language comprehension, language production and analytical skills: if an interpreter's
output in simultaneous interpreting is weak, the problem may lie in poor comprehension of
the source language, in poor analytical skills, in poor language production skills in the target
language and/or in poor management of attentional resources.
Besides consecutive as general preparation, specific exercises are sometimes given to
students for a ‘softer’ introduction to the pressure of simultaneous, and in particular to the
sharing of attention. These include shadowing exercises, paraphrasing exercises, and counting
backwards in the booth while listening to a speech. Once this stage is over, simultaneous
skills are acquired through practice, both in the classroom, with guidance from the instructors,
and in autonomous student groups. Through practice, attention management is fine-tuned and
some cognitive components of simultaneous are gradually automated. While it probably takes
up to ten years or longer to become an expert in the sense of cognitive psychology, after one
or two years of training, many students reach a high enough proficiency level to go into the
marketplace (see Seeber, this volume).

9. The present and future of simultaneous interpreting


As evidenced by its widespread use around the globe, simultaneous interpreting is a major
success in immediate multilingual communication, but it is associated with a number of
problems. One is its cost. While such cost is relative, and, as some conference interpreters like
to point out, it can be lower than the price of one coffee break at a large congress, in other
circumstances, for instance in dialogue interpreting where only two principals are involved, it
can legitimately be considered high. The interpreting budget of the European institutions is also
reported to represent a substantial proportion of their total budget. Another problem is the
quality of the output, especially at a time when more and more speeches are read and
organizers are less and less inclined to send documents for advance preparation to the
interpreters.
There is increasing market pressure to lower interpreters' remuneration, but
below a certain threshold, which may vary across countries and types of interpreters, they
will refuse assignments, even though their services are indispensable in many cases, in particular
for interaction between deaf people and hearing people. As to quality, no matter how selective
training programs are and no matter how efficient the training methods, there are cognitive
limitations to what humans can do, and it is impossible for most if not all interpreters to
translate rapidly read speeches faithfully and idiomatically without appropriate preparation.
One possible answer to these two problems is the use of a lingua franca, or several,
which would make it possible to do without interpreters. This is already done in some
specialized fields, often with English – or perhaps Globish. Interpreters tend to claim that
communication in a language other than one’s own is imperfect. But so is communication
through interpreting, and it cannot be ruled out that in many cases, people communicate with
each other better (and at a negligible price) in the lingua franca spoken by non-natives than
through interpreting. In other cases, a lingua franca is not an option, because the principals
do not master an appropriate language. Deaf people, by definition, cannot
use any spoken language as a lingua franca – though they could choose one sign language to
serve that role when communicating with other deaf people.
Another possibility is automatic interpreting. Automatic speech recognition is
making great strides, and the performance of dictation software is impressive. Automatic
translation has also made spectacular progress. By combining the two, a quasi-simultaneous
written translation of speech becomes a distinct possibility. In all probability, for natural speech,
the quality of the output will remain far below that of human simultaneous interpreting because
of errors in speech recognition, errors in the semantic interpretation of the text and infelicities in
the production of target texts or speeches, but the cost of the process may well be brought down
to a negligible amount, to the extent that interlocutors may prefer to submit their main
exchanges to automatic interpreting and only turn to interpreters or bilinguals to clarify
residual issues.
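
As a purely illustrative aside, the cascade described above can be sketched in a few lines of Python: speech is transcribed chunk by chunk and each transcript is immediately machine-translated, yielding a written translation that trails the speaker by roughly one chunk. The functions recognize_speech_chunk and machine_translate are hypothetical placeholders standing in for whatever speech recognition and machine translation engines would actually be used; the sketch shows the shape of the pipeline, not a particular implementation.

# Minimal sketch of a cascaded 'automatic interpreting' pipeline:
# automatic speech recognition followed by machine translation,
# producing a quasi-simultaneous written translation of the speech.
# recognize_speech_chunk() and machine_translate() are hypothetical
# placeholders, not functions of any specific library.

from typing import Iterable, Iterator

def recognize_speech_chunk(audio_chunk: bytes, language: str) -> str:
    """Placeholder ASR step: transcribe one short chunk of audio."""
    raise NotImplementedError("plug in a real speech recognizer here")

def machine_translate(text: str, source: str, target: str) -> str:
    """Placeholder MT step: translate one transcribed segment."""
    raise NotImplementedError("plug in a real translation engine here")

def automatic_interpreting(audio_chunks: Iterable[bytes],
                           source: str = "en",
                           target: str = "fr") -> Iterator[str]:
    """Yield a running written translation, one segment at a time.

    Each chunk is transcribed and translated as soon as it arrives, so
    the written output trails the speaker by roughly one chunk; errors
    made by the recognizer propagate into the translation, which is one
    reason the overall quality stays below human interpreting.
    """
    for chunk in audio_chunks:
        transcript = recognize_speech_chunk(chunk, language=source)
        if transcript.strip():
            yield machine_translate(transcript, source=source, target=target)

In such a cascade, the chunking strategy (pausing on silences, fixed time windows, or incremental decoding) largely determines the lag and the extent to which recognition errors propagate into the translation.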
Such options may develop for specialized purposes and settings, but are less likely to
be popular in settings where natural communication is important for the preservation of a
certain atmosphere or of human relations. However fast technological development may
be, human simultaneous interpreters will probably keep certain market segments, in particular
political speeches, debates, and the media.
