Srihari Thirumaligai Precis 2
Eisner
02/22/24
PRECIS FOR: “Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine
Translation”
In the article "Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation" (2022), Wenxuan Wang
et al., researchers in Neural Machine Translation (NMT), find that applying Sequence-to-Sequence pretraining
to both the encoder and decoder of NMT models can improve their translation performance across many languages.
Wang et al. demonstrate these ideas by comparing the capabilities of NMT models that undergo encoder-decoder pretraining with
those of models trained using other techniques, by comparing the models' chances of hallucinating, and by comparing the word
frequencies and hallucination rates of models given in-domain pretraining meant to reduce hallucinations.
Using tables and figures (models undergoing encoder-decoder pretraining earn the highest BLEU scores among the models tested in the
tables (5), yet also show higher HUP scores, indicating an increased chance of hallucinating (6)), they showcase different
methodologies and ideas in order to convince their readers that Sequence-to-Sequence pretraining of the encoder and decoder
provides worthwhile improvements to the translation capabilities of NMT models.
Their intended audience is experts researching different NMT techniques, because the article uses complicated, advanced
vocabulary without explanation ("We recap the beam search problem on the application of our approaches in Table 12" (9)) and
showcases its ideas in an objective, inquisitive tone; however, by using easy-to-understand graphs and charts, Wang et al.
make their article palatable to laymen and aim to explain complicated ideas through simple figures and tables.
Wang, Wenxuan, et al. "Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine
Translation." arXiv, 16 Mar. 2022, https://doi.org/10.48550/arXiv.2203.08442. Accessed 14 Feb. 2024.
------------------------------------------------------------------------------------------------------------------------------
PRECIS FOR: “Chain-of-Dictionary Prompting Elicits Translation in Large Language Models”
In the article "Chain-of-Dictionary Prompting Elicits Translation in Large Language Models" (2023), Hongyuan Lu et al.,
researchers on the multilingual machine translation (MNMT) capabilities of Large Language Models (LLMs), find that applying
Chain-of-Dictionary Prompting (CoD) generally improves ChatGPT's translation performance, especially when translating
low-resource languages.
Lu et al. demonstrate these ideas by comparing the translations of LLMs with and without CoD, by comparing the scores when
translating corpora from non-English languages into English, and by comparing the scores when translating corpora from English
into non-English languages.
Using direct statistics and explanations ("ChatGPT does not handle the translation perfectly and it reports a score under
30 points for 100 out of 200 languages," "the translation performance of ChatGPT is promising, and it reports a score
over 40 points in ChrF for around 50 languages" (6)), they continuously showcase statistics and examples in order to
reinforce the idea that, after applying CoD, ChatGPT's performance generally improves, especially when translating
low-resource languages.
Their intended audience is experts researching the MNMT capabilities of LLMs, because the article uses intricate
terminology ("Table 3 reports the ablation study using GPT-3.5-0213" (7)) and explores advanced techniques in a
speculative, investigative tone; however, by often following these passages with simple explanations, Lu et al. make their
article readable for amateurs and aim to make complicated explanations of advanced topics seem simple.
Lu, Hongyuan, et al. "Chain-of-Dictionary Prompting Elicits Translation in Large Language Models." arXiv, 24 May 2023,
https://doi.org/10.48550/arXiv.2305.06575.
PRECIS FOR: “Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in
Machine Translation”
In the article "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine
Translation" (2024), Haoran Xu et al., researchers in Machine Translation (MT), find that the MT capabilities
of Large Language Models (LLMs) can be improved by applying Contrastive Preference Optimization (CPO), which
allows moderate-sized 7B- and 13B-parameter models to match or even surpass state-of-the-art systems such
as WMT competition winners and GPT-4.
Xu et al. demonstrate these ideas by comparing the capabilities of moderate-sized models without CPO against GPT-4,
by comparing those capabilities after using CPO to train the moderate-sized models, and by comparing the models'
translations with so-called "gold references" in order to question the quality of those references.
Using tables and explanations ("The incorporation of CPO significantly enhances ALMA's capabilities," "LLaMa2-7B
outperforms previously released open-source LLMs" (4)), they continuously reinforce their ideas in order to convince
their readers that, while the MT capabilities of LLMs currently fall short of Neural Machine Translation (NMT) systems,
those capabilities are improving rapidly and will likely soon surpass NMT systems.
Their intended audience is researchers exploring and training moderate-sized language models, since the technique
pertains only to such models ("We reconstruct the preference data using only KIWI-XXL or XCOMET and re-train the
ALMA-13B-LoRA model using the CPO method" (4)), and the article teaches them at a high level in a questioning,
analytical tone; however, by using examples that call gold references into question, Xu et al. make their article
palatable to general NMT researchers and hope to teach them important ideas that apply universally in
Machine Translation.
Xu, Haoran, et al. "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine
Translation." arXiv, 2024, https://doi.org/10.48550/arXiv.2401.08417.
------------------------------------------------------------------------------------------------------------------------------