Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau

This presentation gives an overview of machine translation and statistical machine translation models. It covers language alignment systems, the datasets used for training the models, and decoding approaches. The models discussed include IBM Models 1 and 2, which use an expectation-maximization algorithm to produce word alignments; the decoding approaches covered are the Viterbi algorithm, greedy hill climbing, and beam search. Results from studies of word-alignment error rates are reported, and potential future work on improving the models and implementing example-based machine translation is discussed.


Machine Translation

A Presentation by:
Julie Conlonova,
Rob Chase,
and Eric Pomerleau
Overview

Language Alignment System
Datasets
 Sentence-aligned sets for training (e.g. the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)
 A word-aligned set for testing and evaluation, to measure accuracy and precision
Decoding
Language Alignment

Goal: Produce a word-aligned set from a sentence-aligned dataset
First step on the road toward Statistical Machine Translation

Example Problem:
The motion to adjourn the House is now deemed to have been adopted.
La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2
-Kevin Knight, A Statistical MT Tutorial Workbook, 1999

 Each capable of being used separately to produce a word-aligned dataset.
 EM Algorithm
 Model 1 produces T-values based on normalized fractional counting of corresponding words.
 Additionally, Model 2 uses A-values for "reverse distortion probabilities": probabilities based on the positions of the words.
Training Data
 European Parliament Proceedings Parallel
Corpus 1996-2003
 Aligned Languages:
English - French
English - Dutch
English - Italian
English - Finnish
English - Portuguese
English - Spanish
English - Greek
Training Data cont.

Eliminated:
 Misaligned sentences
 Sentences with 50 or more words
 XML tags
 Symbols and numerical characters other than commas and periods
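The filtering steps above can be sketched as a small preprocessing routine. This is a minimal illustration, not the presenters' code: the regular expressions and the exact 50-word cutoff are assumptions about how the cleanup might be done.

```python
import re

MAX_WORDS = 50  # slides: drop sentences with 50 or more words

def clean_pair(src, tgt):
    """Return a cleaned (src, tgt) pair, or None if it should be dropped."""
    def clean(s):
        s = re.sub(r"<[^>]+>", " ", s)      # strip XML tags
        s = re.sub(r"[^\w\s,.]", " ", s)    # drop symbols except commas/periods
        s = re.sub(r"\d+", " ", s)          # drop numerical characters
        return re.sub(r"\s+", " ", s).strip()
    src, tgt = clean(src), clean(tgt)
    if not src or not tgt:
        return None                          # empty side: likely misaligned
    if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
        return None
    return src, tgt

pair = clean_pair("<p>Hi there!</p>", "Hola, amigo!")
```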
Ideally…

http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III

Variables contributing to the probability of a sentence:
 Correlation between words in the source/target languages
 Fertility of a word
 Correlation between order of words in the source sentence and order of words in the target
A Translation Matrix
         Rob   Cat   is    Dog
Rob      1     0     0     0
Gato     0     1     0     0
es       0     0     .5    0
esta     0     0     .5    0
Perro    0     0     0     1
Building the Translation Matrix: Starting
from alignments

Find the sentence alignment
If a word in the source aligns with a word in the target, then increment the translation matrix.
Normalize the translation matrix.
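The increment-and-normalize procedure above can be sketched as follows. This is an illustrative sketch; the input format (lists of word-index pairs) is an assumption, not the presenters' data format.

```python
from collections import defaultdict

def build_t_matrix(aligned_pairs):
    """aligned_pairs: list of (src_words, tgt_words, alignment) triples, where
    alignment is a list of (i, j) index pairs linking src[i] to tgt[j]."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, alignment in aligned_pairs:
        for i, j in alignment:
            counts[src[i]][tgt[j]] += 1.0   # increment for each aligned pair
    # normalize each source word's row into a probability distribution
    t = {}
    for s, row in counts.items():
        total = sum(row.values())
        t[s] = {w: c / total for w, c in row.items()}
    return t

t = build_t_matrix([
    (["Rob", "is"], ["Rob", "es"], [(0, 0), (1, 1)]),
    (["Rob", "is"], ["Rob", "esta"], [(0, 0), (1, 1)]),
])
# "is" splits its probability mass between "es" and "esta"
```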
Can’t find alignments

Most sentences in the Hansards corpus are 60 words long; many can be over 100.
100^100 possible alignments
Counting

Rob is a boy.    Rob es nino.
Rob is tall.     Rob es alto.
Eric is tall.    Eric es alto.
…                …

Base counts on co-occurrence, weighting based on sentence length.
Iterative Convergence
 Use the Expectation-Maximization algorithm
 Creates the translation matrix

         Rob    Is     Tall   boy
Rob      .66    .33    .25    .25
es       .30    .66    .25    .25
alto     .2     .05    .5     0
nino     .2     .05    0      .5
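One iteration of the IBM Model 1 EM training loop (fractional counting in the E-step, renormalization in the M-step) can be sketched like this. A minimal illustration under our own variable names, not the presenters' implementation; it ignores the NULL word for brevity.

```python
from collections import defaultdict

def model1_em(pairs, iterations=10):
    """pairs: list of (src_words, tgt_words). Returns t[s][w] = P(w | s)."""
    vocab_tgt = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(vocab_tgt)
    t = defaultdict(lambda: defaultdict(lambda: uniform))  # start uniform
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for w in tgt:
                # E-step: spread one fractional count for w over all source
                # words, proportional to the current t-values
                norm = sum(t[s][w] for s in src)
                for s in src:
                    counts[s][w] += t[s][w] / norm
        # M-step: renormalize the counts into probabilities
        for s, row in counts.items():
            total = sum(row.values())
            for w, c in row.items():
                t[s][w] = c / total
    return t

t = model1_em([(["Rob", "is"], ["Rob", "es"]),
               (["Eric", "is"], ["Eric", "es"])])
```

Even on this toy corpus, co-occurrence pushes the probability mass toward the right pairings: "is" ends up favoring "es" over "Rob" or "Eric".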
Distorting the Sentence

Word order changes between languages


How is a sentence with 2 words distorted?
How is a sentence with 3 words distorted?
How is a sentence with …

To keep track of this information we use…


A tesseract!

(A quadruply nested default


dictionary)
This could be a problem if there
are more than 100 words in a
sentence.
100x100x100x100 = too big for
RAM and takes too much time
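The "quadruply nested default dictionary" for the A-values might be built like this. A sketch: the index order (source position, target position, source length, target length) is an assumption about how the table is keyed.

```python
from collections import defaultdict

def nested_default(depth, leaf):
    """Build a dictionary nested `depth` levels deep with `leaf()` defaults."""
    if depth == 1:
        return defaultdict(leaf)
    return defaultdict(lambda: nested_default(depth - 1, leaf))

# a[i][j][l_src][l_tgt] ~ P(source position i | target position j,
#                           source length l_src, target length l_tgt)
a = nested_default(4, float)
a[1][1][2][2] = 0.9   # in 2-word sentences, position 1 tends to map to 1
# unseen entries read back as 0.0 without KeyErrors
```

Only the entries actually touched are stored, but with sentences near 100 words the table can still approach 100^4 entries, which is the memory concern raised above.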
Broad Look at MT

"The translation process can be described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target language."
- "Translation Process", Wikipedia, May 2006
Decoding

How do we go from the T-matrix and the A-matrix to a word alignment?

There are several approaches…

Viterbi

 If only doing alignment, much smaller memory and time requirements.
 Returns the optimal path.
 T-matrix probabilities function as the "emission" matrix.
 A-matrix probabilities are concerned with the positioning of words.
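A Viterbi pass over alignment positions, with the T-values as emission scores, can be sketched as follows. This is a generic sketch, not the presenters' decoder: the `trans` function is a stand-in for the A-value lookups, and the smoothing constant `1e-12` is our own.

```python
import math

def viterbi_align(src, tgt, t, trans):
    """Find the best source position for each target word.
    t[s][w]: emission probability; trans(prev_i, i): transition probability
    between consecutive alignment positions (stand-in for the A-values)."""
    n = len(src)
    # delta[i] = best log-score of an alignment ending at source position i
    delta = [math.log(t[src[i]].get(tgt[0], 1e-12)) for i in range(n)]
    back = []
    for w in tgt[1:]:
        step, new_delta = [], []
        for i in range(n):
            best_prev = max(range(n),
                            key=lambda p: delta[p] + math.log(trans(p, i)))
            new_delta.append(delta[best_prev] + math.log(trans(best_prev, i))
                             + math.log(t[src[i]].get(w, 1e-12)))
            step.append(best_prev)
        delta, back = new_delta, back + [step]
    # trace the optimal path backwards
    i = max(range(n), key=lambda p: delta[p])
    path = [i]
    for step in reversed(back):
        i = step[i]
        path.append(i)
    return list(reversed(path))

t = {"Rob": {"Rob": 1.0}, "is": {"es": 1.0}, "tall": {"alto": 1.0}}
align = viterbi_align(["Rob", "is", "tall"], ["Rob", "es", "alto"], t,
                      trans=lambda p, i: 0.8 if i == p + 1 else 0.1)
```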
Decoding as a Translator

Without supplying a translated sentence to the program, it is capable of being a stand-alone translator instead of a word aligner.

However, while the Viterbi algorithm runs quickly with pruning for decoding, for translating the run time skyrockets.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Best-first search
 2-step look-ahead to avoid getting stuck in the most probable local maxima
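In skeleton form, greedy hill climbing looks like this. A generic sketch, not the presenters' decoder: in MT decoding the states would be candidate translations or alignments and the neighbors would be small edits (e.g. changing one word's link); the toy integer example below is ours.

```python
def hill_climb(score, initial, neighbors, max_iters=100):
    """Greedy hill climbing: repeatedly move to the best-scoring neighbor
    until no neighbor improves on the current state (a local maximum)."""
    current = initial
    for _ in range(max_iters):
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current          # local maximum reached
        current = best
    return current

# Toy usage: climb toward the maximum of -(x - 3)^2 over the integers.
best = hill_climb(score=lambda x: -(x - 3) ** 2,
                  initial=0,
                  neighbors=lambda x: [x - 1, x + 1])
```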
Beam Search
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Optimization of best-first search with heuristics and a "beam" of choices
 Exponential tradeoff when increasing the "beam" width
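Beam search keeps only a fixed number of the best partial hypotheses at each step, trading optimality for speed. The sketch below is generic (names and the toy word-picking example are ours, not the presenters' decoder):

```python
def beam_search(start, expand, score, is_goal, beam_width=3):
    """Keep only the `beam_width` best partial hypotheses at each step."""
    beam = [start]
    while beam and not all(is_goal(h) for h in beam):
        candidates = []
        for h in beam:
            candidates.extend([h] if is_goal(h) else expand(h))
        # prune: keep only the top-scoring hypotheses ("the beam")
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)

# Toy usage: pick one word per position to maximize the summed scores.
options = [{"the": 0.9, "a": 0.1}, {"cat": 0.7, "dog": 0.3}]
best = beam_search(
    start=[],
    expand=lambda h: [h + [w] for w in options[len(h)]],
    score=lambda h: sum(options[i][w] for i, w in enumerate(h)),
    is_goal=lambda h: len(h) == len(options),
    beam_width=2)
```

A wider beam prunes less and explores more candidates per step, which is the tradeoff the slide refers to.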
Other Decoding Methods
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Finite-State Transducer
 Mapping between languages based on a finite automaton
 Parsing
 String-to-Tree Model
Problem: One to Many

Necessary to take all alignments over a certain probability in order to capture the "probability that e has fertility at least a given value"

- Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
Results

Study done in 2003 on word-alignment error rates on the Hansards corpus:

Model 2:
 29.3% on 8K training sentence pairs
 19.5% on 1.47M training sentence pairs
Optimized Model 6:
 20.3% on 8K training sentence pairs
 8.7% on 1.47M training sentence pairs

Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
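The alignment error rate (AER) reported in that study compares a hypothesized alignment A against human-annotated sure links S and possible links P (this is the standard definition from Och and Ney, 2003):

```latex
\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```

Lower is better: 0 means every hypothesized link is at least possible and every sure link is recovered.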
Expected Accuracy

70% overall
Language performance:
 Dutch
 French
• Italian, Spanish, Portuguese
 Greek
 Finish
Possible Future Work

 Given more time, we would have implemented IBM Model 3
 Model 3 additionally uses n, d, and p parameters for weighted alignments:
 n: fertility, the number of words produced by one word
 d: distortion
 p: a parameter involving words that aren't involved directly
 Invokes Model 2 for scoring
Another Possible Translation Scheme

Example-Based Machine Translation
 Translation-by-Analogy
 Can sometimes achieve better than the "gist" translations from other models
Why Is Improving Machine
Translation Necessary?
A Chinese to English Translation
The End
Are there any
questions/comments?
