Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau

This presentation gives an overview of machine translation and statistical machine translation models. It covers language alignment systems, the datasets used for training the models, and decoding approaches. The models discussed include IBM Models 1 and 2, which use an expectation-maximization algorithm to produce word alignments; the decoding approaches covered are the Viterbi algorithm, greedy hill climbing, and beam search. Results from studies of word-alignment error rates are reported, and potential future work on improving the models and implementing example-based machine translation is discussed.


Machine Translation

A Presentation by:
Julie Conlonova,
Rob Chase,
and Eric Pomerleau
Overview

Language Alignment System
Datasets
 Sentence-aligned sets for training (e.g. the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)
 A word-aligned set for testing and evaluation, to measure accuracy and precision
Decoding
Language Alignment

Goal: Produce a word-aligned set from a sentence-aligned dataset
First step on the road toward Statistical Machine Translation

Example Problem:
The motion to adjourn the House is now deemed to have been adopted.
La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2
-Kevin Knight, A Statistical MT Tutorial Workbook, 1999

 Each capable of being used separately to produce a word-aligned dataset.
 EM Algorithm
 Model 1 produces T-values based on normalized fractional counting of corresponding words.
 Additionally, Model 2 uses A-values for "reverse distortion probabilities": probabilities based on the positions of the words.
Training Data
 European Parliament Proceedings Parallel
Corpus 1996-2003
 Aligned Languages:
English - French
English - Dutch
English - Italian
English - Finnish
English - Portuguese
English - Spanish
English - Greek
Training Data cont.

Eliminated:
 Misaligned sentences
 Sentences with 50 or more words
 XML tags
 Symbols and numerical characters other than commas and periods
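The filtering steps above can be sketched as a small preprocessing routine. This is a minimal illustration, not the presenters' code: the regular expressions and the exact 50-word cutoff are assumptions about how the cleanup might be done.

```python
import re

MAX_WORDS = 50  # slides: drop sentences with 50 or more words

def clean_pair(src, tgt):
    """Return a cleaned (src, tgt) pair, or None if it should be dropped."""
    def clean(s):
        s = re.sub(r"<[^>]+>", " ", s)      # strip XML tags
        s = re.sub(r"[^\w\s,.]", " ", s)    # drop symbols except commas/periods
        s = re.sub(r"\d+", " ", s)          # drop numerical characters
        return re.sub(r"\s+", " ", s).strip()
    src, tgt = clean(src), clean(tgt)
    if not src or not tgt:
        return None                          # empty side: likely misaligned
    if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
        return None
    return src, tgt

pair = clean_pair("<p>Hi there!</p>", "Hola, amigo!")
```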
Ideally…

http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III

Variables contributing to the probability of a sentence:
 Correlation between words in the source/target languages
 Fertility of a word
 Correlation between order of words in the source sentence and order of words in the target
A Translation Matrix
         Rob   Cat   is    Dog
Rob      1     0     0     0
Gato     0     1     0     0
es       0     0     .5    0
esta     0     0     .5    0
Perro    0     0     0     1
Building the Translation Matrix: Starting
from alignments

Find the sentence alignment
If a word in the source aligns with a word in the target, then increment the translation matrix.
Normalize the translation matrix.
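The increment-and-normalize procedure above can be sketched as follows. This is an illustrative sketch; the input format (lists of word-index pairs) is an assumption, not the presenters' data format.

```python
from collections import defaultdict

def build_t_matrix(aligned_pairs):
    """aligned_pairs: list of (src_words, tgt_words, alignment) triples, where
    alignment is a list of (i, j) index pairs linking src[i] to tgt[j]."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, alignment in aligned_pairs:
        for i, j in alignment:
            counts[src[i]][tgt[j]] += 1.0   # increment for each aligned pair
    # normalize each source word's row into a probability distribution
    t = {}
    for s, row in counts.items():
        total = sum(row.values())
        t[s] = {w: c / total for w, c in row.items()}
    return t

t = build_t_matrix([
    (["Rob", "is"], ["Rob", "es"], [(0, 0), (1, 1)]),
    (["Rob", "is"], ["Rob", "esta"], [(0, 0), (1, 1)]),
])
# "is" splits its probability mass between "es" and "esta"
```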
Can’t find alignments

Most sentences in the Hansards corpus are 60 words long; many can be over 100.
100^100 possible alignments
Counting

Rob is a boy.    Rob es nino.
Rob is tall.     Rob es alto.
Eric is tall.    Eric es alto.
…                …

Base counts on co-occurrence, weighting based on sentence length.
Iterative Convergence
 Use the Expectation-Maximization algorithm
 Creates the translation matrix

         Rob    Is     Tall   boy
Rob      .66    .33    .25    .25
es       .30    .66    .25    .25
alto     .2     .05    .5     0
nino     .2     .05    0      .5
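One iteration of the IBM Model 1 EM training loop (fractional counting in the E-step, renormalization in the M-step) can be sketched like this. A minimal illustration under our own variable names, not the presenters' implementation; it ignores the NULL word for brevity.

```python
from collections import defaultdict

def model1_em(pairs, iterations=10):
    """pairs: list of (src_words, tgt_words). Returns t[s][w] = P(w | s)."""
    vocab_tgt = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(vocab_tgt)
    t = defaultdict(lambda: defaultdict(lambda: uniform))  # start uniform
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for w in tgt:
                # E-step: spread one fractional count for w over all source
                # words, proportional to the current t-values
                norm = sum(t[s][w] for s in src)
                for s in src:
                    counts[s][w] += t[s][w] / norm
        # M-step: renormalize the counts into probabilities
        for s, row in counts.items():
            total = sum(row.values())
            for w, c in row.items():
                t[s][w] = c / total
    return t

t = model1_em([(["Rob", "is"], ["Rob", "es"]),
               (["Eric", "is"], ["Eric", "es"])])
```

Even on this toy corpus, co-occurrence pushes the probability mass toward the right pairings: "is" ends up favoring "es" over "Rob" or "Eric".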
Distorting the Sentence

Word order changes between languages


How is a sentence with 2 words distorted?
How is a sentence with 3 words distorted?
How is a sentence with …

To keep track of this information we use…


A tesseract!

(A quadruply nested default


dictionary)
This could be a problem if there
are more than 100 words in a
sentence.
100x100x100x100 = too big for
RAM and takes too much time
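The "quadruply nested default dictionary" for the A-values might be built like this. A sketch: the index order (source position, target position, source length, target length) is an assumption about how the table is keyed.

```python
from collections import defaultdict

def nested_default(depth, leaf):
    """Build a dictionary nested `depth` levels deep with `leaf()` defaults."""
    if depth == 1:
        return defaultdict(leaf)
    return defaultdict(lambda: nested_default(depth - 1, leaf))

# a[i][j][l_src][l_tgt] ~ P(source position i | target position j,
#                           source length l_src, target length l_tgt)
a = nested_default(4, float)
a[1][1][2][2] = 0.9   # in 2-word sentences, position 1 tends to map to 1
# unseen entries read back as 0.0 without KeyErrors
```

Only the entries actually touched are stored, but with sentences near 100 words the table can still approach 100^4 entries, which is the memory concern raised above.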
Broad Look at MT

"The translation process can be described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target language."
- "Translation Process", Wikipedia, May 2006
Decoding

How do we go from the T-matrix and the A-matrix to a word alignment?

There are several approaches…

Viterbi

 If only doing alignment, much smaller memory and time requirements.
 Returns the optimal path.
 T-matrix probabilities function as the "emission" matrix.
 A-matrix probabilities are concerned with the positioning of words.
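A Viterbi pass over alignment positions, with the T-values as emission scores, can be sketched as follows. This is a generic sketch, not the presenters' decoder: the `trans` function is a stand-in for the A-value lookups, and the smoothing constant `1e-12` is our own.

```python
import math

def viterbi_align(src, tgt, t, trans):
    """Find the best source position for each target word.
    t[s][w]: emission probability; trans(prev_i, i): transition probability
    between consecutive alignment positions (stand-in for the A-values)."""
    n = len(src)
    # delta[i] = best log-score of an alignment ending at source position i
    delta = [math.log(t[src[i]].get(tgt[0], 1e-12)) for i in range(n)]
    back = []
    for w in tgt[1:]:
        step, new_delta = [], []
        for i in range(n):
            best_prev = max(range(n),
                            key=lambda p: delta[p] + math.log(trans(p, i)))
            new_delta.append(delta[best_prev] + math.log(trans(best_prev, i))
                             + math.log(t[src[i]].get(w, 1e-12)))
            step.append(best_prev)
        delta, back = new_delta, back + [step]
    # trace the optimal path backwards
    i = max(range(n), key=lambda p: delta[p])
    path = [i]
    for step in reversed(back):
        i = step[i]
        path.append(i)
    return list(reversed(path))

t = {"Rob": {"Rob": 1.0}, "is": {"es": 1.0}, "tall": {"alto": 1.0}}
align = viterbi_align(["Rob", "is", "tall"], ["Rob", "es", "alto"], t,
                      trans=lambda p, i: 0.8 if i == p + 1 else 0.1)
```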
Decoding as a Translator

Without supplying a translated sentence to the program, it is capable of being a stand-alone translator instead of a word aligner.

However, while the Viterbi algorithm runs quickly with pruning for decoding, for translating the run time skyrockets.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Best-first search
 2-step look-ahead to avoid getting stuck in the most probable local maxima
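In skeleton form, greedy hill climbing looks like this. A generic sketch, not the presenters' decoder: in MT decoding the states would be candidate translations or alignments and the neighbors would be small edits (e.g. changing one word's link); the toy integer example below is ours.

```python
def hill_climb(score, initial, neighbors, max_iters=100):
    """Greedy hill climbing: repeatedly move to the best-scoring neighbor
    until no neighbor improves on the current state (a local maximum)."""
    current = initial
    for _ in range(max_iters):
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current          # local maximum reached
        current = best
    return current

# Toy usage: climb toward the maximum of -(x - 3)^2 over the integers.
best = hill_climb(score=lambda x: -(x - 3) ** 2,
                  initial=0,
                  neighbors=lambda x: [x - 1, x + 1])
```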
Beam Search
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Optimization of best-first search with heuristics and a "beam" of choices
 Exponential tradeoff when increasing the "beam" width
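Beam search keeps only a fixed number of the best partial hypotheses at each step, trading optimality for speed. The sketch below is generic (names and the toy word-picking example are ours, not the presenters' decoder):

```python
def beam_search(start, expand, score, is_goal, beam_width=3):
    """Keep only the `beam_width` best partial hypotheses at each step."""
    beam = [start]
    while beam and not all(is_goal(h) for h in beam):
        candidates = []
        for h in beam:
            candidates.extend([h] if is_goal(h) else expand(h))
        # prune: keep only the top-scoring hypotheses ("the beam")
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)

# Toy usage: pick one word per position to maximize the summed scores.
options = [{"the": 0.9, "a": 0.1}, {"cat": 0.7, "dog": 0.3}]
best = beam_search(
    start=[],
    expand=lambda h: [h + [w] for w in options[len(h)]],
    score=lambda h: sum(options[i][w] for i, w in enumerate(h)),
    is_goal=lambda h: len(h) == len(options),
    beam_width=2)
```

A wider beam prunes less and explores more candidates per step, which is the tradeoff the slide refers to.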
Other Decoding Methods
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Finite-State Transducer
 Mapping between languages based on a finite automaton
 Parsing
 String-to-Tree Model
Problem: One to Many

Necessary to take all alignments over a certain probability in order to capture the "probability that e has fertility at least a given value"

- Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
Results

Study done in 2003 on word-alignment error rates on the Hansards corpus:

Model 2:
 29.3% on 8K training sentence pairs
 19.5% on 1.47M training sentence pairs
Optimized Model 6:
 20.3% on 8K training sentence pairs
 8.7% on 1.47M training sentence pairs

Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
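The alignment error rate (AER) reported in that study compares a hypothesized alignment A against human-annotated sure links S and possible links P (this is the standard definition from Och and Ney, 2003):

```latex
\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```

Lower is better: 0 means every hypothesized link is at least possible and every sure link is recovered.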
Expected Accuracy

70% overall
Language performance:
 Dutch
 French
• Italian, Spanish, Portuguese
 Greek
 Finish
Possible Future Work

 Given more time, we would have implemented IBM Model 3
 Model 3 additionally uses n, d, and p parameters for weighted alignments:
 n: fertility, the number of words produced by one word
 d: distortion
 p: a parameter involving words that aren't involved directly
 Invokes Model 2 for scoring
Another Possible Translation Scheme

Example-Based Machine Translation
 Translation-by-Analogy
 Can sometimes achieve better than the "gist" translations from other models
Why Is Improving Machine
Translation Necessary?
A Chinese to English Translation
The End
Are there any
questions/comments?
