Module 3 - Updated
TECHNOLOGY
NDJIMI MALAKA, PHD.
1. DEFINITION OF TERMS
2. MACHINE TRANSLATION
3. MACHINE TRANSLATION SYSTEMS
4. COMPUTER-AIDED TRANSLATION TOOLS AND RESOURCES
5. EVALUATING TRANSLATION TOOLS
6. RECENT DEVELOPMENTS AND FUTURE DIRECTIONS
• AIMS:
• TO DISCUSS THE DEVELOPMENT OF DIFFERENT MACHINE TRANSLATION TOOLS.
TOPICS:
a. OVERVIEW
b. MACHINE TRANSLATION MODEL
c. LEVELS OF MT
Restricted dictionary expansion.
Restricted to single-clause/basic sentence translations.
2. STANDARD LEVEL SYSTEM
1. HAS MORE THAN 50,000 ENTRIES IN ITS LARGEST DICTIONARY,
2. ALLOWS FOR DICTIONARY EXPANSION,
3. ALLOWS MORE THAN SINGLE-CLAUSE/BASIC SENTENCE TRANSLATIONS,
4. IS SUITABLE FOR HOME USE AND STAND-ALONE OFFICE USE.
Has more than 75,000 entries in its smallest dictionary,
• AIMS:
a. TO DESCRIBE AND INVESTIGATE SPECIFIC MT APPROACHES.
b. TO GIVE A DETAILED DESCRIPTION OF DIFFERENT MACHINE TRANSLATION SYSTEM DESIGNS, ALSO KNOWN AS ‘ARCHITECTURES’.
IN 1629, DESCARTES MAY HAVE BEEN THE FIRST TO PROPOSE THE IDEA THAT A LANGUAGE
COULD BE REPRESENTED BY CODES AND THAT WORDS OF DIFFERENT LANGUAGES WITH
EQUIVALENT MEANING COULD SHARE THE SAME CODE.
IN THE EARLY YEARS OF MACHINE TECHNOLOGY, THE COMMON TERM USED WAS ‘AUTOMATIC
TRANSLATION’ OR ‘MECHANICAL TRANSLATION’.
IT WAS ONLY AFTER THE SECOND WORLD WAR THAT THE POSSIBILITIES OF
LANGUAGE TRANSLATION USING STORED-PROGRAM COMPUTERS WERE EXPLORED.
They all consist of only word and sentence structure analysis, with many of the lexical
ambiguities left unresolved. Their domains are restricted to certain subject fields such as
computer science and information technology. These machine translation systems
require extensive pre-editing and post-editing by human translators.
CHECK!
IMPORTANCE OF MTS
MAJOR HISTORICAL DEVELOPMENTS
DISTINGUISH BETWEEN 1ST AND 2ND GENERATION SYSTEMS
Parser
The goal of the parser is to identify the
relationships between source-language
words and their structural representations
(*A-SUPPLIES
  (TENSE PRESENT)
  (MOOD DECLARATIVE)
  (PUNCTUATION PERIOD)
  (SOURCE (*O-HOT AIR
    (REFERENCE DEFINITE)
    (NUMBER SINGULAR)
    (ATTRIBUTE (*P-INSTANT))))
  (THEME (*U-HEAT
    (REFERENCE DEFINITE)
    (NUMBER SINGULAR)
    (ATTRIBUTE (*P-NECESSARY))))
  (GOAL_TO (*O-ALL LABORATORIES
    (REFERENCE INDEFINITE)
    (NUMBER PLURAL))))
• EXAMPLE OF STRUCTURAL REPRESENTATIONS
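A bracketed structural representation of this kind can be read into a nested data structure that later translation stages could walk. The sketch below is only an illustration, not the parser of any particular MT system: the function name `parse_frame` and the `(label, children)` tuple layout are assumptions made for the example, and it reads the bracket notation itself, not natural-language text.

```python
import re

def parse_frame(text):
    """Read a bracketed frame into nested (label, children) tuples.

    Hypothetical helper for illustration only.
    """
    # Split the text into "(", ")" and the runs of text between them.
    tokens = re.findall(r"\(|\)|[^()]+", text.strip())
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1  # consume "("
        label, children = "", []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested feature or role
            else:
                label += tokens[pos].strip()   # head text of this node
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()

# A shortened excerpt of the frame above.
frame = "(*A-SUPPLIES (TENSE PRESENT) (SOURCE (*O-HOT AIR (NUMBER SINGULAR))))"
label, children = parse_frame(frame)
print(label)                          # *A-SUPPLIES
print([child[0] for child in children])  # ['TENSE PRESENT', 'SOURCE']
```

Each node keeps its head text as the label and its nested features or roles as children, so a role such as SOURCE can be reached without re-scanning the brackets.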
3.5.1.1 MODELS OF THE DTA
AN ‘INTERLINGUA’ REPRESENTS ‘ALL SENTENCES THAT MEAN THE “SAME” THING IN THE
SAME WAY, REGARDLESS OF THE LANGUAGE THEY HAPPEN TO BE IN’.
3. INTERLINGUA SYSTEMS ARE HIGHLY MODULAR IN THE SENSE THAT ONE PART OF
THE SYSTEM DOES NOT AFFECT OTHER PARTS.
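The modularity of an interlingua design can be pictured with a toy sketch. Everything here is assumed for illustration: the concept codes (`C_WATER`, `C_HOUSE`), the tiny word lists and the function names are hypothetical, and real interlingua systems represent whole sentence meanings, not single words.

```python
# Hypothetical analysis tables: each language maps its words to
# language-neutral concept codes (the "interlingua").
ANALYSIS = {
    "en": {"water": "C_WATER", "house": "C_HOUSE"},
    "pt": {"água": "C_WATER", "casa": "C_HOUSE"},
    "fr": {"eau": "C_WATER", "maison": "C_HOUSE"},
}

# Generation tables are the analysis tables reversed: code -> word.
GENERATION = {
    lang: {code: word for word, code in table.items()}
    for lang, table in ANALYSIS.items()
}

def analyse(words, source_lang):
    """Source-language words -> interlingua codes."""
    return [ANALYSIS[source_lang][word] for word in words]

def generate(codes, target_lang):
    """Interlingua codes -> target-language words."""
    return [GENERATION[target_lang][code] for code in codes]

def translate(words, source_lang, target_lang):
    # Analysis and generation never refer to each other's language.
    return generate(analyse(words, source_lang), target_lang)

print(translate(["water", "house"], "en", "fr"))  # ['eau', 'maison']
print(translate(["casa"], "pt", "en"))            # ['house']
```

In this sketch, adding a further language means adding one analysis table and one generation table; the existing tables stay untouched, which is the sense in which one part of the system does not affect the others.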
The example-based approach
The rule-based approach is often expensive, and may produce inconsistent results
when new linguistic rules are added.
In contrast, the corpus-based approach is flexible enough to
process sentences even if they are ill formed. However, when
long sentences are involved, the processing time tends to be
lengthy.
• WHAT IS HAMT?
Ideally, pre-edited and controlled language texts are free from ambiguity and
complex sentences.
Unedited text, on the other hand, has had no editing prior to translation.
For systems that have an interactive mode, a human is allowed to correct or select
appropriate equivalents during the automatic translation process. Otherwise,
corrections can only be performed at the post-editing stage, which is after the
machine translation system has produced the translation.
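The difference between an interactive mode and post-editing only can be sketched in a few lines. This is a toy, word-for-word illustration under stated assumptions: the lexicon, the `choose` callback and the function name `translate_words` are all hypothetical and do not describe any real HAMT system.

```python
# Hypothetical toy lexicon: each source word lists its candidate equivalents.
LEXICON = {
    "the": ["el"],
    "river": ["río"],
    "bank": ["banco", "orilla"],  # ambiguous: financial bank / river bank
}

def translate_words(words, choose=None):
    """Word-for-word toy translation.

    If `choose` is given (interactive mode), the human picks among
    ambiguous candidates during translation. Otherwise the first
    candidate is used and any correction is left to post-editing.
    """
    output = []
    for word in words:
        candidates = LEXICON.get(word, [word])  # unknown words pass through
        if choose is not None and len(candidates) > 1:
            output.append(choose(word, candidates))
        else:
            output.append(candidates[0])
    return output

# Without interaction: the first candidate is taken; a post-editor fixes it later.
print(translate_words(["the", "river", "bank"]))  # ['el', 'río', 'banco']

# Interactive mode: the callback stands in for the human's selection.
pick_second = lambda word, candidates: candidates[1]
print(translate_words(["the", "river", "bank"], choose=pick_second))  # ['el', 'río', 'orilla']
```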
EXAMPLES OF HUMAN-AIDED MACHINE TRANSLATION SYSTEMS
• MATRA PRO AND LITE, DEVELOPED AT THE NATIONAL CENTRE FOR SOFTWARE TECHNOLOGY IN MUMBAI,
INDIA, TRANSLATE FROM ENGLISH INTO HINDI.
HUMAN-AIDED MACHINE TRANSLATION SYSTEMS HAVE BEEN IMPLEMENTED AT SCHREIBER
TRANSLATIONS, INC., FOREIGN LANGUAGE SERVICES, INC. AND RALPH MCELROY
TRANSLATION COMPANY, ALL OF WHICH ARE CONTRACTED TO TRANSLATE PATENTS FOR
THE UNITED STATES (US) PATENT AND TRADEMARK OFFICE.
CATTR
• Fuzzy matching
• The translation workflow
• Workbenches
• Alignment
• Terminology management systems
• Segmentation
• Filter
Terminology is arranged by concept. Each concept has a label, or a set of labels where
synonyms exist, called a ‘term’: a single word or a string of words used to represent the
concept in the language of the specialized field.
• CONCEPTS ARE ARRANGED ‘SYSTEMATICALLY’ TO REFLECT THE ORGANIZATION OF
KNOWLEDGE IN A PARTICULAR SUBJECT FIELD, FOR EXAMPLE TO EXHIBIT A HIERARCHICAL
RELATIONSHIP OF SCIENTIFIC CLASSIFICATION OR TAXONOMY.
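A concept-oriented arrangement of this kind can be sketched as a small data structure. The class name `Concept`, its fields and the sample entries are assumptions made for illustration; they do not reproduce the data model of any specific terminology management system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Concept:
    """One entry per concept; terms are labels attached to it (hypothetical model)."""
    concept_id: str
    # language code -> list of terms (several terms = synonyms)
    terms: Dict[str, List[str]] = field(default_factory=dict)
    parent: Optional["Concept"] = None  # broader concept in the hierarchy

    def path(self) -> List[str]:
        """Concept ids from the top of the hierarchy down to this concept."""
        node, chain = self, []
        while node is not None:
            chain.append(node.concept_id)
            node = node.parent
        return list(reversed(chain))

def find_concept(concepts, term, lang):
    """Return the first concept that carries `term` in language `lang`, else None."""
    for concept in concepts:
        if term in concept.terms.get(lang, []):
            return concept
    return None

# Illustrative entries only.
animal = Concept("animal", {"en": ["animal"], "pt": ["animal"]})
mammal = Concept("mammal", {"en": ["mammal"], "pt": ["mamífero"]}, parent=animal)
dog = Concept("dog", {"en": ["dog", "domestic dog"], "pt": ["cão", "cachorro"]}, parent=mammal)

hit = find_concept([animal, mammal, dog], "cachorro", "pt")
print(hit.concept_id)   # dog
print(hit.path())       # ['animal', 'mammal', 'dog']
print(hit.terms["en"])  # ['dog', 'domestic dog']
```

Because synonyms and equivalents in other languages hang off the same concept, a lookup by any one term reaches all the others, and the `parent` link carries the hierarchical relationship.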
After selecting a TMS solution, the next step is to plan and execute the
implementation process. This may involve tasks such as data migration,
configuration, customization, and integration with existing systems and
workflows. It’s important to involve key stakeholders throughout the
implementation process and provide adequate training and support to
ensure a smooth transition and adoption of the TMS.