
AI VIETNAM

All-in-One Course

NLP Project

Neural Machine Translation

AI VIET NAM
Nguyen Quoc Thai

1
Year 2023
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

2
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

3
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

Automatic Speech Recognition (ASR): translation of spoken language into text
Natural Language Understanding (NLU): a computer's ability to understand language
Natural Language Generation (NLG): generation of natural language by a computer

q Syntax
q Semantics
q Phonology
q Pragmatics
q Morphology

4
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

Ø Can be formulated as an optimization problem:


! (") = argmax 𝜃( 𝑤 (%) , 𝑤 (&) )
𝑤
$(")
Where 𝜃 is a scoring function over source and target sentences
Ø Requires two components:
q Learning algorithm to compute parameters of 𝜃
! (")
q Decoding algorithm for computing the best translation 𝑤

5
Introduction

[Figure: timeline of machine translation approaches, from 1950 to 2015]


6
Introduction
! Evaluating translation quality

Ø Human judgement
q Given: machine translation output
q Given: source / reference translation
q Task: assess the quality of machine translation output
Ø Different translations of “A Vinay le gusta Python” (Spanish for “Vinay likes Python”)

7
Introduction
! Evaluating translation quality

Ø Two main criteria:


q Adequacy: Translation w(t) should adequately reflect the linguistic content of w(s)
q Fluency: Translation w(t) should be fluent text in the target language

Ø Different translations of “A Vinay le gusta Python”

8
Introduction
! Evaluating translation quality

Ø Two main criteria:


q Adequacy: Translation w(t) should adequately reflect the linguistic content of w(s)
q Fluency: Translation w(t) should be fluent text in the target language

Ø Adequacy and fluency scales:

Score   Adequacy         Fluency
5       All meaning      Flawless English
4       Most meaning     Good English
3       Much meaning     Non-native English
2       Little meaning   Disfluent English
1       None             Incomprehensible

9
Introduction
! Evaluation Metrics

Ø Manual evaluation is most accurate, but expensive


Ø Automated evaluation metrics:
q Compare system hypothesis with reference translations
q BLEU Score (BiLingual Evaluation Understudy): Modified n-gram Precision
q SacreBLEU Score (A Call for Clarity in Reporting BLEU Scores)

10
Introduction
! Evaluation Metrics

Precision and Recall of words


System A A officials responsibility of airport safety
Reference A officials are responsible for airport security

Ø Precision = correct / output-length = 3/6 = 50%
Ø Recall = correct / reference-length = 3/7 ≈ 43%
Ø F-measure = (P × R) / ((P + R) / 2) = (0.5 × 0.43) / ((0.5 + 0.43) / 2) ≈ 46%
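As a concrete check of these formulas, here is a minimal Python sketch (the function name word_overlap_scores is ours, not from the slides) that reproduces the System A numbers:

from collections import Counter

def word_overlap_scores(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    # "Correct" words: hypothesis words that also occur in the reference,
    # clipped by how often they occur there.
    ref_counts = Counter(ref)
    correct = sum(min(c, ref_counts[w]) for w, c in Counter(hyp).items())
    precision = correct / len(hyp)   # correct / output-length
    recall = correct / len(ref)      # correct / reference-length
    f_measure = (precision * recall) / ((precision + recall) / 2)
    return precision, recall, f_measure

p, r, f = word_overlap_scores(
    "A officials responsibility of airport safety",
    "A officials are responsible for airport security")
print(p, r, f)  # ~0.50, ~0.43, ~0.46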

11
Introduction
! Evaluation Metrics

Precision and Recall of words


v Flaw: no penalty for reordering
System A A officials responsibility of airport safety
Reference A officials are responsible for airport security
System B airport security A officials are responsible

Metric      System A   System B

Precision   50%        100%
Recall      43%        86%
F-measure   46%        92.5%

12
Introduction
! Evaluation Metrics

BLEU
v N-gram overlap between machine translation output and reference translation
v Compute precision for n-grams of size 1 to 4
v Add brevity penalty (for too short translations)
BLEU = min(1, output-length / reference-length) × (∏_{n=1}^{4} precision_n)^(1/4)
v Typically computed over the entire corpus, not single sentences
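The formula above can be turned into a short sentence-level sketch. This is an illustrative implementation, not the official BLEU script: real BLEU is computed over a whole corpus, aggregating clipped counts before taking the geometric mean.

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = Counter(ngrams(hyp, n)), Counter(ngrams(ref, n))
        # Modified n-gram precision: clip each n-gram count by the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = len(hyp) - n + 1
        if total <= 0 or overlap == 0:
            return 0.0  # the geometric mean is zero if any precision is zero
        precisions.append(overlap / total)
    brevity = min(1.0, len(hyp) / len(ref))  # penalty for too-short output
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(sentence_bleu("airport security A officials are responsible",
                    "A officials are responsible for airport security"))  # ~0.52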

13
Introduction
! Evaluation Metrics

BLEU 1-gram
System A A officials responsibility of airport safety
Reference A officials are responsible for airport security
System B airport security A officials are responsible
Metric               System A   System B
Precision (1-gram)   3/6        6/6
Precision (2-gram)
Precision (3-gram)
Precision (4-gram)
Brevity penalty
BLEU
14
Introduction
! Evaluation Metrics

BLEU
System A A officials responsibility of airport safety
Reference A officials are responsible for airport security
System B airport security A officials are responsible

Metric               System A   System B

Precision (1-gram)   3/6        6/6
Precision (2-gram)   1/5        4/5
Precision (3-gram)   0/4        2/4
Precision (4-gram)   0/3        1/3
Brevity penalty      6/7        6/7
BLEU                 0          0.52
15
Introduction
! Evaluation Metrics

BLEU
log BLEU = min(1 − r/c, 0) + Σ_{n=1}^{N} w_n · log p_n

r: reference length, c: output (candidate) length
n: n-gram order (n = 1, 2, 3, 4)
w_n: weight for each n-gram order; uniform weights w_n = 1/N (here N = 4)
p_n: modified n-gram precision
SacreBLEU (A Call for Clarity in Reporting BLEU)
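In practice, scores are usually reported with the sacrebleu package (pip install sacrebleu), which fixes tokenization and reference handling so numbers are comparable across papers. A minimal usage sketch with our running example:

import sacrebleu

hypotheses = ["airport security A officials are responsible"]
references = [["A officials are responsible for airport security"]]  # one reference stream
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # corpus-level BLEU on a 0-100 scale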

16
Introduction
! Evaluation Metrics

17
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

18
NMT using Transformer
! Sequence to Sequence

v A single neural network is used to translate from source to target


v Architecture: Encoder-Decoder
v Encoder: Convert source sentence (input) into a vector/matrix (State)
v Decoder: Convert encoding into a sentence in target language (output)

Input → Encoder → State → Decoder → Output

Thought Vector: the state that captures all the information of the input sentence
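A minimal PyTorch sketch of this encoder-decoder setup follows; the vocabulary sizes, model dimension, and use of nn.Transformer are illustrative assumptions, not the exact model from these slides.

import torch
import torch.nn as nn

src_vocab, tgt_vocab, d_model = 8000, 8000, 512   # illustrative sizes
src_emb = nn.Embedding(src_vocab, d_model)
tgt_emb = nn.Embedding(tgt_vocab, d_model)
transformer = nn.Transformer(d_model=d_model, batch_first=True)
generator = nn.Linear(d_model, tgt_vocab)  # decoder states -> token logits

src = torch.randint(0, src_vocab, (1, 5))  # source token ids (batch of 1)
tgt = torch.randint(0, tgt_vocab, (1, 6))  # decoder input token ids
causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
states = transformer(src_emb(src), tgt_emb(tgt), tgt_mask=causal_mask)
logits = generator(states)  # (1, 6, tgt_vocab): one next-token distribution per position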

19
NMT using Transformer
! Transformer Model

20
NMT using Transformer
! Training
[Figure: training with teacher forcing]
Source (Vietnamese subword tokens for “Tôi đi làm”) → ENCODER → DECODER
Decoder input: <start> I go to work
Prediction:    I go _earn work <end>
Target:        I go to work <end>
Loss: computed between the prediction and the target at each position
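Continuing the sketch above (same transformer, generator, and logits), training with teacher forcing feeds the gold prefix to the decoder and scores each predicted token against the shifted target:

criterion = nn.CrossEntropyLoss()
# Gold next tokens, i.e. "I go to work <end>" as ids; random here for illustration.
target = torch.randint(0, tgt_vocab, (1, 6))
loss = criterion(logits.reshape(-1, tgt_vocab), target.reshape(-1))
loss.backward()  # gradients flow into encoder, decoder, and embeddings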
21
NMT using Transformer
! Training

How to choose “Best candidate”


Input Sequence (Source) → ENCODER → DECODER → Output Sequence (Target)

22


NMT using Transformer
! Greedy Decoding
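Greedy decoding keeps only the single most probable token at each step and feeds it back as the next decoder input. A sketch reusing the model objects from the earlier encoder-decoder example (start_id and end_id are assumed special-token ids):

def greedy_decode(src_ids, start_id, end_id, max_len=50):
    memory = transformer.encoder(src_emb(src_ids))  # encode the source once
    out_ids = torch.tensor([[start_id]])
    for _ in range(max_len):
        mask = nn.Transformer.generate_square_subsequent_mask(out_ids.size(1))
        dec = transformer.decoder(tgt_emb(out_ids), memory, tgt_mask=mask)
        next_id = generator(dec[:, -1]).argmax(dim=-1, keepdim=True)  # best token only
        out_ids = torch.cat([out_ids, next_id], dim=1)
        if next_id.item() == end_id:
            break
    return out_ids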

23
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

24
NMT using Pre-trained LMs
! Pre-trained LMs

25
NMT using Pre-trained LMs
! Pre-trained LMs

26
NMT using Pre-trained LMs
! Pre-trained LMs

27
NMT using Pre-trained LMs
! Pre-trained LMs: BERT

v BERT: An encoder-only model


v Maps an input sequence to a contextualized sequence: f_θ^BERT : X_{1:n} → X̄_{1:n}
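A sketch of this mapping with the Hugging Face transformers library; the checkpoint bert-base-uncased is an illustrative choice:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("A officials are responsible", return_tensors="pt")
states = model(**inputs).last_hidden_state  # (1, n, 768): one contextual vector per token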

28
NMT using Pre-trained LMs
! Pre-trained LMs: BERT

29
NMT using Pre-trained LMs
! Pre-trained LMs: GPT2

v GPT2: A decoder-only model that uses uni-directional (causal) self-attention

v Maps an input sequence to a “next word” logit vector sequence:
  f_θ^GPT2 : X_{0:m−1} → L_{1:m}
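The same mapping in code, again a sketch using transformers with the small gpt2 checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("I go to", return_tensors="pt")
logits = model(**inputs).logits  # (1, m, vocab): position i scores word i+1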

30
NMT using Pre-trained LMs
! Pre-trained LMs: GPT2

31
NMT using Pre-trained LMs
! Encoder-Decoder with BERT and GPT2
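One way to build such a model is transformers' EncoderDecoderModel, which can warm-start from pre-trained checkpoints; the checkpoints below are illustrative, and the newly added cross-attention weights start out randomly initialized, so the combined model must be fine-tuned:

from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2")  # BERT encoder + GPT2 decoder with fresh cross-attention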

32
NMT using Pre-trained LMs
! BERT for Encoder

33
NMT using Pre-trained LMs
! BERT for Decoder
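Reusing BERT as a decoder requires switching it to causal self-attention and adding cross-attention over the encoder outputs. A sketch with transformers (the flags shown are real config options; the checkpoint is illustrative):

from transformers import BertConfig, BertLMHeadModel

config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)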

34
NMT using Pre-trained LMs
! GPT2 for Decoder

35
NMT using Pre-trained LMs
! Experiment

v Dataset: IWSLT’15 English-Vietnamese
  Training: 133,317 sentence pairs, Validation: 1,553, Test: 1,269

Experiment   Model                                   SacreBLEU   1/2/3/4-gram precisions
#1           Standard Transformer (Greedy Search)    24.66       55.9/30.3/18.5/11.8
#2           BERT-to-BERT (Greedy Search)            25.41       53.8/31.8/19.8/12.3
#3           BERT-to-GPT2 (Greedy Search)            23.56       49.1/28.5/18.4/12.0

36
Thanks!
Any questions?

37
