NLP Project Final Report
MuhammadMahdi Abdurahimov
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
3 Method

A typical neural machine translation model consists of two parts: an encoder that forms contextualized word embeddings from the source sentence, and a decoder that generates the target translation from left to right. In our experiment, we first use the Transformer (Vaswani et al., 2017) as our baseline, and then build the BERT-fused model (Zhu et al., 2019) on top of it.

Let H_B denote the output of BERT for the source sentence, and let H_E^{l-1} = (h_1^{l-1}, ..., h_{l_x}^{l-1}) denote the output of the (l-1)-th encoder layer. The l-th encoder layer computes

    \hat{h}_i^l = \frac{1}{2} \left( attn_S(h_i^{l-1}, H_E^{l-1}, H_E^{l-1}) + attn_B(h_i^{l-1}, H_B, H_B) \right), \quad i \in [l_x],    (1)

where attn_S and attn_B are attention models with different parameters, defined in Eqn. (2). Then each \hat{h}_i^l is further processed by FFN(x) defined in Eqn. (3), and we get the output of the l-th layer: H_E^l = (FFN(\hat{h}_1^l), ..., FFN(\hat{h}_{l_x}^l)). The encoder eventually outputs H_E^L from the last layer. In short, the BERT-fused model combines the output of BERT with attention modules to incorporate it into the machine translation model.
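To make Eqn. (1) concrete, below is a minimal PyTorch sketch of one BERT-fused encoder layer. It is an illustration written for this report rather than the authors' fairseq implementation: the class and argument names are ours, and residual connections, layer normalization, dropout and padding masks are omitted.

    import torch
    import torch.nn as nn

    class BertFusedEncoderLayer(nn.Module):
        """One encoder layer following Eqn. (1): average self-attention over the
        previous layer with attention over the (frozen) BERT output H_B."""

        def __init__(self, d_model: int, d_bert: int, n_heads: int = 8, d_ff: int = 2048):
            super().__init__()
            # attn_S: self-attention over H_E^{l-1}.
            self.attn_s = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # attn_B: attention whose keys/values come from BERT (possibly a different width).
            self.attn_b = nn.MultiheadAttention(d_model, n_heads, kdim=d_bert, vdim=d_bert,
                                                batch_first=True)
            # FFN(x) = W_2 max(W_1 x + b_1, 0) + b_2, as in Eqn. (3).
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))

        def forward(self, h_prev: torch.Tensor, h_bert: torch.Tensor) -> torch.Tensor:
            # h_prev: (batch, src_len, d_model) = H_E^{l-1}; h_bert: (batch, bert_len, d_bert) = H_B.
            s, _ = self.attn_s(h_prev, h_prev, h_prev)   # attn_S(h_i, H_E^{l-1}, H_E^{l-1})
            b, _ = self.attn_b(h_prev, h_bert, h_bert)   # attn_B(h_i, H_B, H_B)
            h_hat = 0.5 * (s + b)                        # the 1/2 (... + ...) combination of Eqn. (1)
            return self.ffn(h_hat)                       # contribution to H_E^l

Stacking L such layers and feeding the last layer's output H_E^L to the decoder reproduces the structure described above.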
The attention layer is defined as

    attn(q, K, V) = \sum_{i=1}^{|V|} a_i W_v v_i, \quad a_i = \frac{\exp((W_q q)^T (W_k k_i))}{Z}, \quad Z = \sum_{i=1}^{|V|} \exp((W_q q)^T (W_k k_i)),    (2)

where attn(q, K, V) defines the attention layer and q, K and V represent query, key and value respectively. Here q is a d_q-dimensional vector, and K and V are two sets with |K| = |V|. Each k_i ∈ K and v_i ∈ V are also d_k-/d_v-dimensional vectors (d_q, d_k and d_v can be different), i ∈ [|K|], and W_q, W_k and W_v are the parameters to be learned.
We define the non-linear transformation layer as

    FFN(x) = W_2 \max(W_1 x + b_1, 0) + b_2,    (3)

where x is the input and W_1, W_2, b_1, b_2 are the parameters to be learned.
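As a worked example of Eqn. (2) and Eqn. (3), the following PyTorch snippet implements both definitions directly on single vectors (no batching or multi-head splitting); all tensor shapes are illustrative.

    import torch

    def attn(q, K, V, W_q, W_k, W_v):
        """Eqn. (2): a_i is proportional to exp((W_q q)^T (W_k k_i)); output = sum_i a_i W_v v_i.
        Shapes: q (d_q,), K (|K|, d_k), V (|V|, d_v); W_q (d, d_q), W_k (d, d_k), W_v (d, d_v)."""
        scores = (K @ W_k.T) @ (W_q @ q)      # (|K|,): (W_q q)^T (W_k k_i) for every i
        a = torch.softmax(scores, dim=0)      # a_i = exp(score_i) / Z
        return a @ (V @ W_v.T)                # sum_i a_i (W_v v_i), shape (d,)

    def ffn(x, W1, b1, W2, b2):
        """Eqn. (3): FFN(x) = W2 max(W1 x + b1, 0) + b2."""
        return W2 @ torch.clamp(W1 @ x + b1, min=0) + b2

    # Tiny usage example with arbitrary dimensions.
    d_q, d_k, d_v, d = 4, 4, 4, 8
    q, K, V = torch.randn(d_q), torch.randn(5, d_k), torch.randn(5, d_v)
    W_q, W_k, W_v = torch.randn(d, d_q), torch.randn(d, d_k), torch.randn(d, d_v)
    print(attn(q, K, V, W_q, W_k, W_v).shape)   # torch.Size([8])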
Let S_{<t}^l denote the hidden states of the l-th decoder layer preceding time step t, i.e., S_{<t}^l = (s_0^l, ..., s_{t-1}^l). Note that s_0^l is a special token indicating the start of a sequence, and s_t^0 is the embedding of the word predicted at time step t − 1. At the l-th layer, we have

    \hat{s}_t^l = attn_S(s_t^{l-1}, S_{<t+1}^{l-1}, S_{<t+1}^{l-1}), \quad
    \tilde{s}_t^l = \frac{1}{2} \left( attn_B(\hat{s}_t^l, H_B, H_B) + attn_E(\hat{s}_t^l, H_E^L, H_E^L) \right), \quad
    s_t^l = FFN(\tilde{s}_t^l),    (4)

where attn_S, attn_B and attn_E denote self-attention, BERT-decoder attention and encoder-decoder attention respectively.
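The decoder step in Eqn. (4) can be sketched in the same way as the encoder layer above; again this is only an illustration with our own naming (no causal masking, residuals or incremental decoding), not the fairseq implementation used in the experiments.

    import torch
    import torch.nn as nn

    class BertFusedDecoderLayer(nn.Module):
        """One decoder layer following Eqn. (4)."""

        def __init__(self, d_model: int, d_bert: int, n_heads: int = 8, d_ff: int = 2048):
            super().__init__()
            self.attn_s = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.attn_b = nn.MultiheadAttention(d_model, n_heads, kdim=d_bert, vdim=d_bert,
                                                batch_first=True)
            self.attn_e = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))

        def forward(self, s_prev: torch.Tensor, h_bert: torch.Tensor, h_enc: torch.Tensor) -> torch.Tensor:
            # s_prev: S_{<t+1}^{l-1} (batch, tgt_len, d_model); h_bert: H_B; h_enc: H_E^L.
            s_hat, _ = self.attn_s(s_prev, s_prev, s_prev)    # \hat{s}_t^l (self-attention)
            b, _ = self.attn_b(s_hat, h_bert, h_bert)         # attn_B(\hat{s}, H_B, H_B)
            e, _ = self.attn_e(s_hat, h_enc, h_enc)           # attn_E(\hat{s}, H_E^L, H_E^L)
            return self.ffn(0.5 * (b + e))                    # s_t^l = FFN(\tilde{s}_t^l)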
4 Experiments

4.1 Dataset

In our experiment, we use the IWSLT2017 En-Ar dataset, which was constructed from transcripts and manual translations of TED talks. As shown in Table 1, this dataset contains 235,527 parallel sentences in the training set and 888 parallel sentences in the validation set. It also contains 6 test sets, collected from TED talks given in the years 2010 to 2015. Figure 3 shows some examples of En→Ar and Ar→En translations produced by the baseline models. For each translation direction, we show a good case, a tricky case and a bad case (Example 1 to Example 3, respectively). As shown in Example 2, the Arabic sentence contains two unknown words. This causes two problems: 1) for En→Ar translation, the BLEU score does not reflect the actual translation quality; 2) for Ar→En translation, the MT model cannot generate correct text due to the missing information. Therefore, it is of great importance to reconsider the tokenization scheme for Arabic text.

    Split             Parallel sentences
    train             235,527
    dev               888
    test   tst2010    1,565
           tst2011    1,427
           tst2012    1,705
           tst2013    1,380
           tst2014    1,301
           tst2015    1,205

Table 1: Overview of the IWSLT2017 En-Ar dataset.
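For readers who want to reproduce the data statistics, a convenient way to obtain the corpus is the Hugging Face datasets hub. This is only a sketch: the report itself does not state how the data were obtained, the configuration name iwslt2017-ar-en is an assumption that should be verified, and the hub version may bundle the yearly tst sets into a single test split.

    from datasets import load_dataset

    # Assumption: the Ar-En pair is published under this configuration name.
    ds = load_dataset("iwslt2017", "iwslt2017-ar-en")

    # Print the number of parallel sentences per split (compare with Table 1).
    print({split: len(ds[split]) for split in ds})

    # Each example is expected to hold a {'ar': ..., 'en': ...} translation pair.
    print(ds["train"][0])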
Training Following the practice of Zhu et al. (2019), we first train the Transformer until convergence and then initialize the encoder and decoder of the BERT-fused model with the obtained model. The BERT-encoder attention and BERT-decoder attention are randomly initialized. During training, all parameters in BERT are frozen. We use fairseq (https://github.com/facebookresearch/fairseq) for training. For each translation direction, we train the BERT-fused model with max_tokens = 4,000 in each batch. It takes roughly 10 hours to train on a single NVIDIA A6000 48GB GPU. We also employ label smoothing with value 0.1 during training.
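The snippet below sketches the main ingredients of this setup in plain PyTorch: freezing all BERT parameters and using label smoothing of 0.1. It is not the fairseq command actually used; the multilingual BERT checkpoint name is an assumption (the report does not say which pretrained BERT was used), and the warm start from the baseline transformer is only indicated in a comment.

    import torch
    from transformers import BertModel

    # Assumed pretrained checkpoint; the report does not name the BERT variant it uses.
    bert = BertModel.from_pretrained("bert-base-multilingual-cased")
    for p in bert.parameters():
        p.requires_grad = False  # all BERT parameters stay frozen during training
    print(sum(p.numel() for p in bert.parameters() if p.requires_grad))  # -> 0

    # Label smoothing of 0.1, as used during training.
    criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

    # Warm start: load the converged baseline transformer's weights into the
    # BERT-fused model with strict=False so that the newly added BERT-encoder /
    # BERT-decoder attention modules keep their random initialization, e.g.:
    #   fused_model.load_state_dict(torch.load("checkpoints/baseline.pt"), strict=False)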
Evaluation During evaluation, we use the model with the best validation score to generate translations of the given source input. For decoding, we use a beam search algorithm with a beam size of 5. The evaluation metric is the BLEU score (Papineni et al., 2002), which automatically measures word and phrase matching between the MT output and the reference translations. Specifically, we use BLEU4, following common practice.
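Scoring can be reproduced with sacrebleu once the hypotheses have been generated (with beam size 5 in our setup); the file names below are placeholders.

    from pathlib import Path
    import sacrebleu

    hyps = Path("tst2015.hyp").read_text(encoding="utf-8").splitlines()
    refs = Path("tst2015.ref").read_text(encoding="utf-8").splitlines()

    # Corpus-level BLEU; sacrebleu uses 4-gram BLEU by default.
    bleu = sacrebleu.corpus_bleu(hyps, [refs])
    print(f"BLEU = {bleu.score:.2f}")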
4.3 Results

Main Results Table 2 shows the performance of the baseline MT models and our BERT-fused models on the 6 separate test sets. For Ar→En translation, the baseline model achieves a 26.71 BLEU score on average across the 6 test sets, with a standard deviation of 1.80. The BERT-fused model achieves a 28.52 BLEU score on average, outperforming the baseline by an absolute improvement of 1.81 BLEU. For En→Ar translation, the baseline model achieves a 12.78 BLEU score on average with a standard deviation of 1.96. The BERT-fused model achieves a 13.81 BLEU score on average, outperforming the baseline by an absolute improvement of 1.03 BLEU. For both translation directions, incorporating BERT into the Transformer results in consistent improvements over the vanilla Transformer across all six test sets. In addition, comparing the absolute performance of Ar→En MT and En→Ar MT, we can see that translating into Arabic is much more difficult than translating into English in terms of BLEU. However, it is also possible that BLEU4 is simply not an appropriate metric for evaluating Arabic translation.

Effect of Tokenization Table 3 shows the average performance of Transformer models trained with different tokenization schemes. From Table 3, we can see that for both translation directions, using BPE results in a much better BLEU score than using whole-word tokenization. This is mainly because BPE addresses out-of-vocabulary issues.
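For illustration, a BPE vocabulary of the kind discussed here can be trained with the Hugging Face tokenizers library; this is a sketch and not necessarily the exact pipeline used for the experiments (which could equally be built with subword-nmt or fairseq's preprocessing). The file names and vocabulary size are assumptions.

    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    # Learn a joint BPE vocabulary over the (hypothetical) training files.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=8000, special_tokens=["[UNK]"])
    tokenizer.train(files=["train.ar", "train.en"], trainer=trainer)

    # Rare or unseen words are split into known subword units instead of
    # being mapped to a single unknown token.
    print(tokenizer.encode("unrecognizable").tokens)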
Figure 3: Examples of En→Ar and Ar→En translations produced by the baseline models.
Table 2: Results of baseline models and BERT-fused models. We report BLEU4 in this table. Bold denotes the best
result.
Effect of Preprocessing Table 4 shows the average performance of Transformer models trained on the raw Arabic corpus and on the preprocessed Arabic corpus. When using BPE, the results of models trained on preprocessed Arabic and on raw Arabic are almost the same for Ar→En MT. This result indicates that for Ar→En MT, preprocessing Arabic is not necessary when using BPE, which makes it easier for people who do not understand any Arabic to perform this task. However, for En→Ar MT, preprocessing Arabic is important both with and without BPE.
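The Arabic preprocessing referred to here (Unicode normalization, orthographic normalization and dediacritization, see Section 5) can be implemented with CAMeL Tools (Obeid et al., 2020). The steps below are a plausible reconstruction rather than a verbatim copy of our pipeline.

    # pip install camel-tools
    from camel_tools.utils.dediac import dediac_ar
    from camel_tools.utils.normalize import (
        normalize_alef_ar,
        normalize_alef_maksura_ar,
        normalize_teh_marbuta_ar,
        normalize_unicode,
    )

    def preprocess_ar(line: str) -> str:
        """Clean one Arabic sentence: Unicode and orthographic normalization, then dediacritization."""
        line = normalize_unicode(line)           # unify Unicode presentation forms
        line = normalize_alef_ar(line)           # collapse alef variants
        line = normalize_alef_maksura_ar(line)   # normalize alef maksura
        line = normalize_teh_marbuta_ar(line)    # normalize teh marbuta
        return dediac_ar(line)                   # strip diacritics

    print(preprocess_ar("إِنَّهُ مِثَالٌ"))  # example sentence with diacritics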
5 Conclusion and Future Work

In this work, we have used the Transformer as our baseline model for machine translation from Arabic to English as well as from English to Arabic. During the experiments, we performed Unicode normalization, orthographic normalization, dediacritization and BPE tokenization on the IWSLT2017 dataset. We compared the performance of the baseline MT models on 6 separate test sets and obtained good results on all of them. We not only explored the low-resource translation directions Ar→En and En→Ar, but also leveraged pre-trained BERT and fused it into the Transformer, which consistently improved the results across the six test sets for both directions. What is more, we found that preprocessing Arabic is critical for translating English into Arabic, and we think this could also hold for some other languages. In future work, we will continue to use Arabic-BERT in our models.
    Model             Ar→En    En→Ar
    word, raw Ar      17.95     8.24
    BPE, raw Ar       26.31    10.53
    ∆                 +8.36    +2.29
    word, clean Ar    21.02     9.93
    BPE, clean Ar     26.71    12.78
    ∆                 +5.69    +2.85

Table 3: Results of Transformer models trained with different tokenization schemes. We report the average BLEU score over the six test sets. "word" refers to whole-word tokenization, "BPE" refers to byte-pair encoding, "raw Ar" refers to the raw Arabic corpus, and "clean Ar" refers to the preprocessed Arabic corpus.

References

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Shuoyang Ding, Adithya Renduchintala, and Kevin Duh. 2019. A call for prudent choice of subword merge operations in neural machine translation. arXiv preprint arXiv:1905.10453.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26.

Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, and Nizar Habash. 2020. CAMeL Tools: An open source Python toolkit for Arabic natural language processing. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 7022–7032, Marseille, France. European Language Resources Association.

Mai Oudah, Amjad Almahairi, and Nizar Habash. 2019. The impact of preprocessing on Arabic-English statistical and neural machine translation. arXiv preprint arXiv:1906.11751.
Uri Shaham and Omer Levy. 2020. Neural machine translation without embeddings. arXiv preprint arXiv:2008.09396.

Pamela Shapiro and Kevin Duh. 2018. Morphological word embeddings for Arabic neural machine translation in low-resource settings. In Proceedings of the Second Workshop on Subword/Character LEvel Models, pages 1–11.
Abu Bakr Soliman, Kareem Eissa, and Samhaa R. El-Beltagy. 2017. AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Computer Science, 117:256–265.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.

Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, and Tieyan Liu. 2019. Incorporating BERT into neural machine translation. In International Conference on Learning Representations.

Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. arXiv preprint arXiv:1604.02201.