BARTpho: Pre-Trained Sequence-to-Sequence Models For Vietnamese
• Word-level (the syllables of each multi-syllable word are joined by "_"), e.g.:
VinAI công_bố các kết_quả nghiên_cứu khoa_học tại hội_nghị hàng_đầu thế_giới về trí_tuệ nhân_tạo
("VinAI publishes its scientific research results at the world's leading conferences on artificial intelligence")
• BARTpho in fairseq
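To make the word-level format above concrete, here is a purely illustrative sketch of what word segmentation produces: syllables belonging to the same word are joined with "_" so each word becomes a single token. The tiny lexicon and the greedy matcher below are hypothetical stand-ins for demonstration only; real pipelines use a dedicated segmenter (e.g. VnCoreNLP's RDRSegmenter).

```python
# Illustration only: join syllables of known multi-syllable words with "_".
# LEXICON is a toy, hypothetical resource, not part of BARTpho itself.
LEXICON = {("công", "bố"): "công_bố",
           ("kết", "quả"): "kết_quả",
           ("trí", "tuệ"): "trí_tuệ"}

def segment(syllables):
    out, i = [], 0
    while i < len(syllables):
        pair = tuple(syllables[i:i + 2])
        if pair in LEXICON:          # greedy match of a two-syllable word
            out.append(LEXICON[pair])
            i += 2
        else:                        # otherwise keep the single syllable
            out.append(syllables[i])
            i += 1
    return out

print(segment("VinAI công bố kết quả".split()))
# ['VinAI', 'công_bố', 'kết_quả']
```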
Problems addressed
• Can be used with the popular libraries fairseq (Facebook, 2019) and
transformers (huggingface.co)
• Can serve as a strong baseline for future research and applications of
generative natural language processing tasks for Vietnamese
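As a quick illustration of the transformers support mentioned above, the sketch below loads a BARTpho checkpoint and extracts features. The checkpoint names vinai/bartpho-word and vinai/bartpho-syllable are the ones published by the authors on the Hugging Face hub; details here are a minimal sketch, not the full usage guide from the repository.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Word-level variant: the input must already be word-segmented
# (syllables of each word joined by "_").
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-word")
model = AutoModel.from_pretrained("vinai/bartpho-word")

sentence = "Chúng_tôi là những nghiên_cứu_viên ."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)            # encoder-decoder forward pass
print(outputs.last_hidden_state.shape)   # contextual features for each token
```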
Comparison with the baseline: mBART (Facebook, 2020)
• "Multilingual Denoising Pre-training for Neural Machine Translation"
• Previous approaches had focused only on the encoder, the decoder, or reconstructing parts of the text
• Fine-tuned for supervised (both sentence-level and document-level) and
unsupervised machine translation
• mBART gains up to 12 BLEU points for low-resource MT and over 5 BLEU points
for many document-level and unsupervised models
                            Train            Dev             Test
Original                    105,418 (~70%)   22,642 (~15%)   22,644 (~15%)
After filtering duplicates  102,044 (~70%)   21,040 (~15%)   20,733 (~15%)
Comparison with the baseline: mBART (Facebook, 2020)
Comparison with other models
Architecture
• 12 encoder layers and 12 decoder layers, with the pre-training scheme of BART
• BART pre-training has two stages: (1) corrupt the input text with an arbitrary
noising function, and (2) learn a sequence-to-sequence model to reconstruct the original text
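A minimal sketch of these two stages, assuming the Hugging Face transformers API and the public vinai/bartpho-syllable checkpoint: the input is corrupted (here with a single masked span, standing in for BART's full noising scheme), and the model is trained to reconstruct the original text with the usual seq2seq cross-entropy loss. This is an illustration of the objective, not the authors' actual fairseq pre-training pipeline.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
model = AutoModelForSeq2SeqLM.from_pretrained("vinai/bartpho-syllable")

original = "Chúng tôi là những nghiên cứu viên."                       # clean text
corrupted = f"Chúng tôi là {tokenizer.mask_token} viên."                # stage 1: noising

# Stage 2: reconstruct the original text from the corrupted input.
inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss   # denoising reconstruction loss
loss.backward()
```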
Pre-training data
• Reuses PhoBERT's tokenizer and BPE vocabulary
• Uses the PhoBERT pre-training corpus
• A large-scale corpus of 20GB of Vietnamese texts
• Pre-training corpus of 145M word-segmented sentences (~4B word tokens)
Architecture
• Transformer architecture
-> Attention Is All You Need
• Fine-tuned for downstream tasks
• Uses a batch size of 512 sequence blocks
• Learning rate of 0.0001
• etc.
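A hedged sketch of an optimizer set-up matching the numbers on this slide (peak learning rate 1e-4). The choice of AdamW and a linear warm-up schedule, as well as the step counts, are assumptions for illustration, not the authors' exact fairseq recipe.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("vinai/bartpho-word")

# Learning rate taken from the slide; warm-up and total steps are placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)
```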
Transformer evolution
BARTpho
Transformer Model
Attention mechanism
Demo: Multiplication
https://www.symbolab.com/graphing-calculator
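Since the demo above is about multiplication, here is a minimal NumPy sketch of the matrix form of scaled dot-product attention, softmax(QK^T / sqrt(d)) V, with arbitrary toy values; it is only meant to show how the matrix multiplications compose.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the value vectors

# Toy example: 3 tokens, dimension 4 (values chosen arbitrarily).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4)); K = rng.normal(size=(3, 4)); V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)
```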
Attention mechanism
(figure: attention over value vectors v1, v2, v3)
Multi-Head Attention Layer
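A compact sketch of a multi-head attention layer using PyTorch's built-in module, standing in for the layer depicted on this slide; the dimensions (8 heads over a 512-dimensional model, as in the original Transformer) and the toy batch are illustrative values.

```python
import torch
import torch.nn as nn

# 8 attention heads over a 512-dimensional model.
mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 512)           # (batch, sequence length, model dim)
out, attn_weights = mha(x, x, x)      # self-attention: queries = keys = values = x
print(out.shape, attn_weights.shape)  # torch.Size([2, 10, 512]) torch.Size([2, 10, 10])
```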
Demo:
https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb
BERT Model
GPT Model
BART Model
Document Rotation: A token is chosen at random and the document is rotated so that it
begins with that token. This teaches the model to identify the start of the document.
Text Infilling: A number of text spans are chosen at random and each is replaced with a
single [MASK] token. In particular, a span can be empty.
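A token-level sketch of the two noising transformations described above, written as plain Python over a list of tokens. In real BART the infilled span lengths are drawn from a Poisson(λ=3) distribution; here the span is fixed for simplicity and the input tokens are toy values.

```python
import random

def document_rotation(tokens):
    """Rotate the document so it starts at a randomly chosen token."""
    start = random.randrange(len(tokens))
    return tokens[start:] + tokens[:start]

def text_infilling(tokens, span_start, span_len, mask="[MASK]"):
    """Replace a span of tokens (possibly of length 0) with a single [MASK] token."""
    return tokens[:span_start] + [mask] + tokens[span_start + span_len:]

tokens = "VinAI công_bố kết_quả nghiên_cứu khoa_học".split()
print(document_rotation(tokens))
print(text_infilling(tokens, span_start=1, span_len=2))
# ['VinAI', '[MASK]', 'nghiên_cứu', 'khoa_học']
```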
Demo
• Colab:
https://colab.research.google.com/drive/1JRSGghV7oWgRSLHqqyxpfZgUjxSqz1YB?usp=sharing
• Source code:
https://github.com/VinAIResearch/BARTpho
• Ours: https://github.com/hmthanh/BARTpho_code
Conclusion
• BARTpho is based entirely on BART, adapted to the Vietnamese language
• The authors' main contributions are training the model weights and the
tokenization for Vietnamese
• Evaluation results show that BARTpho achieves SOTA performance on the
Vietnamese text summarization task
• These outstanding SOTA results are a premise for future research
• BARTpho-syllable and BARTpho-word are the first large-scale pre-trained
monolingual seq2seq models for Vietnamese