Abstract
In the current era of global economic integration and digital economy development, multilingual English translation plays a crucial role in cultural exchange. Traditional translation models have poor adaptability and fitting ability. To improve translation effectiveness and quality, this article combines the Text-To-Text Transfer Transformer (T5) and Model-Agnostic Meta-Learning (MAML) to study their sustainable improvement and application in multilingual English translation. First, the autoregressive learning method is used to fine-tune the pre-trained parameters of the T5 model, and a generative multilingual English translation model is constructed. Then, combined with the MAML framework, the model is trained on multiple tasks to achieve rapid adaptation to new task data. Finally, a multilingual parallel corpus is constructed using web crawlers, and the translation model based on T5 and MAML is evaluated using the Bilingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) metrics. The experimental results show that, compared to the baseline models Open Neural Machine Translation (OpenNMT), Transformer, and Open Parallel Corpus-Machine Translation (Opus-MT), the mean BLEU score of the proposed model is 6.05%, 2.59%, and 2.05% higher, respectively. The conclusion indicates that the T5-MAML model can effectively improve the quality of multilingual English translation and achieve more natural and fluent translation output.
1 Introduction
With the acceleration of economic globalization, cross-cultural communication and the exchange of multilingual information are increasingly valued [1]. Current machine translation methods still face limitations in precision and fluency when addressing multilingual translation problems. This restriction not only hinders the precise transmission of cross-linguistic information, but also affects the efficiency and quality of multilingual communication. How to improve the adaptability and coherence of multilingual translation while ensuring quality is therefore an urgent and important issue. The Text-To-Text Transfer Transformer (T5), as a powerful generation model, has shown good performance in various types of automatic text generation [2]. Model-Agnostic Meta-Learning (MAML) is a model-agnostic meta-learning method that can improve a model's adaptability to new data [3]. Integrating the two organically to continuously improve the quality of multilingual English translation has important practical value for promoting the in-depth development of multilingual translation technology and facilitating cross-cultural communication and understanding.
To improve translation quality and promote cultural exchange and communication, this article combines T5 and MAML and studies their application in multilingual English translation. Through large-scale pre-training, the T5 model acquires powerful language representation capabilities and captures the structural and semantic features of multiple languages. Combined with the MAML mechanism, the model is endowed with the ability to quickly adapt to new language pairs and adjust rapidly to other languages. During pre-training, cross-language alignment techniques are used to further enhance the understanding of different language structures, ensuring the accuracy and naturalness of translation results. The experimental analysis is conducted from three aspects: baseline comparison, advanced method comparison, and ablation experiments. At the baseline comparison level, compared to the Open Neural Machine Translation (OpenNMT), Transformer, and Open Parallel Corpus-Machine Translation (Opus-MT) baseline models, the translation model based on T5-MAML achieves a mean Bilingual Evaluation Understudy (BLEU) score that is 6.05%, 2.59%, and 2.05% higher, respectively, and a mean Translation Error Rate (TER) that is 11.13%, 6.03%, and 6.09% lower across different language pairs, respectively. The results under the METEOR metric are also more favorable. In the comparison with advanced methods, our model outperforms the mBART and XLM-R models in the BLEU, TER, and METEOR evaluations. At the ablation level, compared to the T5-Large model without meta-learning, the mean BLEU score of the proposed model is 3.43% higher, the mean TER is 8.52% lower, and the mean METEOR is 0.08 higher. In practical applications, using T5 and MAML for multilingual English translation can help improve translation quality and promote cultural understanding and communication.
2 Related work
With the rapid development and application of machine translation, multilingual translation models have achieved notable results [4, 5]. Fan Angela created a multilingual translation model through large-scale mining, utilizing a combination of dense scaling and language-specific sparse parameters to effectively increase model capacity. In non-English translation practice, the proposed model achieved gains of over 10 BLEU [6]. To solve the machine translation problem in multilingual environments, Singh Salam Michael trained a multilingual translation system based on long short-term memory (LSTM), which integrated cross-language functionality. The experimental results showed that for Manipuri translation tasks, the proposed multilingual model outperformed both the vanilla multilingual and bilingual baselines, verifying the good translation quality of the model [7]. Lalrempuii Candy proposed a translation model based on statistics and modern neural networks. The translation performance of the trained model was evaluated with automatic and manual evaluation methods across different tagging methods, architectures, and configurations, and the model performance was compared with existing baselines. The results indicated that the proposed model had advantages in prediction error and prediction quality [8]. Escolano Carlos proposed a new architecture that introduces an interlingual loss as an additional training objective. By adding and enforcing this interlingual loss, multiple encoders and decoders were trained for each language, sharing a common intermediate representation. The results showed that the BLEU of the proposed architecture improved by 2.8 points [9]. Although existing models can help improve the quality of multilingual translation and promote multilingual communication and collaboration, most of them perform poorly in low-resource scenarios.
The development of T5 and MAML provides more possibilities for improving the generalization performance of models across multiple languages [10, 11]. To avoid costly and time-intensive data collection and annotation in low-resource scenarios, Fuad Ahlam explored the effectiveness of cross-lingual transfer learning in building an end-to-end Arabic task-oriented dialogue system using a multilingual T5 model. The results suggested that, under the same settings, the T5 model outperformed traditional cross-lingual pre-training methods [12]. Regarding data preprocessing for Thai, Phakmongkol Puri generated additional questions and answers using multilingual T5. The results showed that, compared with other modern transformer models, the proposed enhanced model exhibited better performance on the dataset [13]. To achieve accurate classification for low-resource languages, Awal Md Rabiul proposed a MAML framework that utilized self-supervised strategies to overcome the limitations of scarce data in low-resource scenarios and to generate better fine-tuned language model initializations for rapid cross-lingual transfer. The experimental results showed that, in cross-domain multilingual transfer settings, the performance of the MAML framework was more than 3% higher than the advanced baseline [14]. For the Event Detection (ED) task in low-resource languages, Roy Aniruddha proposed a MAML model to address data scarcity by training on instances for cross-lingual ED. The results demonstrated that this method could find good parameter initializations and quickly adapt to new low-resource languages [15]. By utilizing the powerful pre-training capability of T5 and the fast adaptability of MAML, a model can effectively adapt to new tasks and improve the overall performance of the translation system. However, current research mostly focuses on using T5 or MAML alone for rapid adaptation, lacking studies that combine the two to further improve model generalization.
3 Multilingual English translation based on T5 and MAML
3.1 Model architecture design
3.1.1 T5 model
The T5 model is a unified-framework language model open-sourced by Google, whose core idea is to transform various natural language processing (NLP) problems into “text-to-text” tasks [16, 17]. As a universal language model, T5 can be widely applied to various NLP tasks such as language translation [18]. The input and output of the model are both text strings, and task-specific prefixes guide the model to perform different NLP tasks. This article uses autoregressive learning to fine-tune the pre-trained parameters of the T5 model and constructs a generative multilingual English translation model, as shown in Fig. 1.
In Fig. 1, the model is trained using a combination of text input and text output. In the pre-training stage, the autoregressive learning method is used to predict each text segment from left to right based on contextual information, gradually forming a complete text sequence.
The T5 model mainly consists of an encoder and a decoder [19]. The encoder is a multi-layer Transformer encoder used to encode and represent the input text, including its positional information. By converting a series of input texts into vector representations with contextual associations and segmenting the text, the model can better understand the relationships between words. The decoder is also based on the Transformer architecture and converts the output of the encoder into the target text based on the context vectors provided by the encoder.
During training, the T5 model adopts a text-to-text architecture, treating multilingual translation as a text-to-text task, that is, producing translation outputs in text form for text inputs. The same objective function can be used in pre-training and fine-tuning, and the same decoding process can be used in the testing phase.
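As a concrete illustration of this text-to-text setup, the sketch below fine-tunes a pre-trained T5 checkpoint on a source-target sentence pair with a task prefix. It assumes the Hugging Face transformers library; the checkpoint name, prefix, and toy data are illustrative placeholders rather than the exact configuration used in this article.

```python
# Minimal sketch of text-to-text fine-tuning of T5 on a translation pair.
# Assumes the Hugging Face transformers library; the checkpoint, prefix, and
# data are illustrative placeholders, not the setup used in this article.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

pairs = [("translate German to English: Guten Morgen.", "Good morning.")]

model.train()
for src, tgt in pairs:
    inputs = tokenizer(src, return_tensors="pt", truncation=True)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True).input_ids
    # When labels are supplied, the model computes the token-level
    # cross-entropy loss of teacher-forced autoregressive decoding.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```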
When fine-tuning the parameters, the source language text string is used as the input text sequence and the target language text string as the output text sequence. Using autoregression, which generates one token at a time and feeds it back as input for the next time step, a complete output sequence is generated. The steps are shown in Table 1:
In autoregressive learning, parameter learning is achieved by maximizing the conditional probability of the target sequence. This step is represented by Formula (1) [20, 21].
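The equation image is not reproduced here; with the variable definitions deferred to Table 2, the standard autoregressive factorization assumed for Formula (1) is:

$$P\left(y\mid x;\theta \right)=\prod_{t=1}^{T}P\left({y}_{t}\mid {y}_{<t},x;\theta \right)$$

where \(x\) denotes the source sequence, \(y\) the target sequence of length \(T\), and \(\theta\) the model parameters (notation assumed where Table 2 is not shown).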
Among them, the definitions of the variables in Formula (1) are shown in Table 2:
In Table 2, there is \({y}_{<t}=\left({y}_{1},{y}_{2},\cdots ,{y}_{t-1}\right)\).
To optimize the model parameters, cross-entropy is used as the loss function to measure the difference between the predicted and actual values of the model. The cross-entropy loss function is defined below.
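The loss itself is not reproduced in this version; a standard form consistent with the notation above is assumed:

$$\mathcal{L}=-\sum_{t=1}^{T}\log P\left({y}_{t}\mid {y}_{<t},x;\theta \right)$$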
By minimizing \(\mathcal{L}\) and optimizing the parameters of the model, the generation precision of the target language sequence in the translation process can be improved.
During the training phase, gradient descent is used to update the parameters of the model. Assuming the model parameters are \(\theta\), the parameter update rule is given below.
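The update equation is not shown in this version; the standard gradient-descent step matching the definitions below is assumed:

$$\theta \leftarrow \theta -\eta {\nabla }_{\theta }\mathcal{L}$$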
Among them, \(\eta\) is the learning rate, and \({\nabla }_{\theta }\mathcal{L}\) is the gradient of the loss function with respect to parameter \(\theta\).
3.1.2 MAML
Different languages differ significantly in grammar, vocabulary, syntax, and other aspects, which to some extent leads to the high complexity of applying T5 in NLP [22]. Low-resource languages in multilingual environments often lack structured sample data, and building a universal multilingual model requires massive amounts of data and computing resources. When faced with new language pairs, the model often needs to be retrained or fine-tuned, which increases computational costs and restricts its adaptability in real scenarios. In response, this article combines the MAML framework and trains the model on multiple tasks to enable rapid adaptation to new task data, as shown in Fig. 2.
In Fig. 2, the MAML framework considers each language pair as a separate task. By training and learning through MAML, the model can quickly adjust parameters in low-resource environments, learn new language pairs quickly, and achieve better translation results on new language pairs.
It is assumed that the \(r\)-th \(\left(1\le r\le R\right)\) batch of tasks is sampled, and that this batch includes \(Q\) tasks. Given the model parameter \(\theta\), each task in the current batch is traversed to calculate the loss gradient \({\nabla }_{\theta }{\mathcal{L}}_{{{T}_{i}}_{support}}f(\theta )\) on the support set of the current task. The intermediate variable \({\theta }_{i}^{\prime}\) is then updated based on this loss gradient [23, 24].
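In the standard MAML formulation, this inner-loop update takes the form below, where \(\alpha\) denotes the inner-loop (task-level) learning rate, a symbol not defined in the surrounding text:

$${\theta }_{i}^{\prime}=\theta -\alpha {\nabla }_{\theta }{\mathcal{L}}_{{{T}_{i}}_{support}}f(\theta )$$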
The loss \({\mathcal{L}}_{{{T}_{i}}_{query}}f({\theta }_{i}^{\prime})\) of the current task's query set, related to \({\theta }_{i}^{\prime}\), is calculated, and the cumulative loss \(\sum {\mathcal{L}}_{{{T}_{i}}_{query}}f({\theta }_{i}^{\prime})\) is recorded.
The gradient \({\nabla }_{\theta }{\mathcal{L}}_{{{T}_{i}}_{query}}f({\theta }_{i}^{\prime})\) of the cumulative loss with respect to the model parameter \(\theta\) is then used to update the final model parameter \(\theta\) [25, 26].
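In the standard MAML formulation, this meta-update over the \(Q\) tasks of the batch is assumed to be:

$$\theta \leftarrow \theta -\beta {\nabla }_{\theta }\sum_{i=1}^{Q}{\mathcal{L}}_{{{T}_{i}}_{query}}f({\theta }_{i}^{\prime})$$

where \(\beta\) is the meta (outer-loop) learning rate, again an assumed symbol. A compact sketch of this two-loop procedure is given below, using a first-order approximation of MAML on a toy model for brevity; the task data is synthetic and the loss is only a placeholder for the translation loss used in this article.

```python
# First-order MAML sketch on a toy regression model: inner adaptation on the
# support set, outer update from the query-set loss. Data is synthetic and
# the model/loss are placeholders for the actual T5 translation loss.
import copy
import torch
import torch.nn as nn

def maml_step(model, tasks, inner_lr=1e-2, meta_lr=1e-3):
    loss_fn = nn.MSELoss()
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for (xs, ys), (xq, yq) in tasks:                  # Q tasks in the batch
        learner = copy.deepcopy(model)                # start from theta
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        loss_fn(learner(xs), ys).backward()           # support-set loss
        inner_opt.step()                              # theta_i' = theta - a*grad
        learner.zero_grad()
        loss_fn(learner(xq), yq).backward()           # query-set loss at theta_i'
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad                               # first-order meta-gradient
    with torch.no_grad():                             # outer update of theta
        for p, g in zip(model.parameters(), meta_grads):
            p -= meta_lr * g / len(tasks)

model = nn.Linear(1, 1)
tasks = [((torch.randn(8, 1), torch.randn(8, 1)),
          (torch.randn(8, 1), torch.randn(8, 1))) for _ in range(4)]
maml_step(model, tasks)
```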
3.2 Data collection and preprocessing
Starting from the needs of practical applications and the availability of data, this article takes English as the main target language and Chinese, Japanese, German, Spanish, and French as source languages. In constructing the multilingual parallel corpus, English monolingual data is first acquired from authoritative websites such as Wikipedia and Europarl to obtain sufficient English monolingual text. The obtained English text data is then cleaned and preprocessed to form an English monolingual corpus. On this basis, Google Translate is used to translate the English monolingual corpus into five languages: Chinese, Japanese, German, Spanish, and French, forming a multilingual parallel corpus. The overall framework is shown in Fig. 3:
The data collection is carried out with web crawlers, as shown in Fig. 4. Before collection, the crawl targets are determined. On Wikipedia, pages containing English texts about historical events and scientific articles are selected, ensuring that the pages have corresponding versions in multiple languages. On the Europarl website, relevant meeting minutes and report texts are crawled. The open-source scraping framework Scrapy is used to obtain the data. First, the Scrapy project is configured, and its target pages and rules are defined. Scrapy's selectors are used to locate the relevant Hyper Text Markup Language (HTML) elements. On Wikipedia, the interlanguage link section is located, and texts of the same content in other languages are retrieved through the link relationships. On Europarl, the Uniform Resource Locator (URL) structure is used to find the meeting-minutes pages. Normal browser behavior is simulated by configuring request headers and proxy servers, and collection times and delays are set to reduce the load on the site servers.
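The spider itself is not listed in the article; the fragment below is an illustrative Scrapy spider in the spirit of this description, where the start URL, CSS selectors, delay, and user agent are placeholder assumptions.

```python
# Illustrative Scrapy spider: the start URL, selectors, and settings are
# placeholder assumptions, not the actual crawler used in this article.
import scrapy

class EuroparlSpider(scrapy.Spider):
    name = "europarl_minutes"
    start_urls = ["https://www.europarl.europa.eu/"]  # placeholder entry page
    custom_settings = {
        "DOWNLOAD_DELAY": 2,            # delay to reduce load on the server
        "USER_AGENT": "Mozilla/5.0",    # simulate normal browser behaviour
    }

    def parse(self, response):
        # Extract paragraph text from the page (placeholder selector).
        for paragraph in response.css("div.content p::text").getall():
            yield {"url": response.url, "text": paragraph.strip()}
        # Follow links to further minutes pages based on the URL structure.
        for href in response.css("a::attr(href)").getall():
            if "minutes" in href:
                yield response.follow(href, callback=self.parse)
```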
After the data collection work is completed, the collected text is preliminarily cleaned and processed. Regular expressions are utilized to extract valuable text content from web pages, removing irrelevant content such as advertisements and navigation bars. Some examples are shown in Table 3:
On the basis of the content extracted from the pages in Table 3, a translation model is used to match the different-language texts of the English monolingual data, ensuring that the corpora are parallel to each other. During acquisition, language detection and consistency checks are performed on the collected text. To avoid bias or errors in the training corpus generated by Google Translate, a bidirectional (back-translation) method is adopted to verify and improve translation quality. First, a portion of the original English text is selected as the initial input; the sentence structures covered include simple declarative sentences, complex subordinate clauses, and sentences containing professional terminology. Google Translate is then used to translate the input English text into Chinese, Japanese, German, Spanish, and French, and these translations are translated back into English. The differences between the original English text and the back-translated English text are compared in terms of word meaning, word order, and grammatical structure, including word-order differences, deviations in word meaning, changes in grammatical structure, and contextual misunderstandings. A terminology list is established for fixed phrases and proper nouns, and word order and grammatical structure are automatically adjusted through integrated conversion rules based on the observed differences. After all necessary corrections, the corpus is updated and the validated, corrected parallel corpora are added to the training set. The open-source Natural Language Toolkit (NLTK) is used to complete the text processing, standardize the format and content of the text, and remove unnecessary punctuation and special characters. Finally, the data is saved in JavaScript Object Notation (JSON) format.
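A minimal sketch of this cleaning and storage step is shown below; the regular expressions, sample pair, and file name are illustrative, and NLTK's punkt tokenizer data is assumed to be installed.

```python
# Minimal cleaning/storage sketch: strip markup-like noise, tokenize with
# NLTK, and save parallel pairs as JSON. Regexes and file name are examples.
import json
import re
from nltk.tokenize import word_tokenize   # requires the 'punkt' data package

def clean(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)           # drop residual HTML tags
    text = re.sub(r"\s+", " ", text).strip()       # normalize whitespace
    return text

pairs = [{"en": clean("The Earth revolves around the Sun."),
          "de": clean("Die Erde dreht sich um die Sonne.")}]
for pair in pairs:
    pair["en_tokens"] = word_tokenize(pair["en"])  # standardized token form

with open("parallel_corpus.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
```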
To expand the training dataset and improve its applicability to different translation scenarios, this paper adopts a cross-language alignment method, aligning data from different language pairs to enhance the performance of the model in multilingual conversion. First, features are extracted from the text of each language. Word2Vec word vectors are used to represent each word in the text as a vector, in order to capture the semantic relationships between words. For each sentence, the average of all word vectors is calculated as the vector representation of the sentence.
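The averaging formula is not reproduced in this version; given the definitions that follow, the assumed form is (with \(\overline{v}\) used here to denote the averaged sentence vector):

$$\overline{v}=\frac{1}{n}\sum_{i=1}^{n}{v}_{i}$$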
Among them, \({v}_{i}\) is the vector representation of the \(i\)-th word, and \(n\) is the number of words in the sentence.
Dependency parsing is used to obtain the syntax tree of a sentence and convert it into a vector representation. For each word in the sentence, its dependency relations (such as subject-verb) are determined and encoded in the adjacency matrix \(A\), where \({A}_{ij}=1\) indicates that one word depends on another, and \({A}_{ij}=0\) otherwise.
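Written out with \({w}_{i}\) and \({w}_{j}\) as defined below, the assumed form of this indicator is:

$${A}_{ij}=\left\{\begin{array}{ll}1,& \text{if}\ {w}_{i}\ \text{depends on}\ {w}_{j}\\ 0,& \text{otherwise}\end{array}\right.$$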
Among them, \({w}_{i}\) and \({w}_{j}\) are words in the sentence. Tree-Structured Neural Networks (TreeNN) are used to transform the dependency graph into a vector representation. TreeNN recursively passes information along the dependency graph and ultimately aggregates a vector representation of the entire sentence.
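The recursion is not reproduced in this version; one common TreeNN formulation consistent with the definitions below is assumed, where the concatenation and the tanh nonlinearity are modelling assumptions rather than details stated in the article:

$${h}_{i}=\text{tanh}\left(W\left[{v}_{i};\sum_{j\in children(i)}{h}_{j}\right]+b\right)$$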
\(children(i)\) represents all child nodes of word \(i\), \(W\) and \(b\) are learnable parameter matrices and bias terms, and \({h}_{i}\) is the hidden state vector of the word.
Finally, the vector representation of the sentence is obtained by aggregating the hidden state vectors of all words:
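Given the definitions that follow, the assumed averaging form is:

$$s=\frac{1}{\left|V\right|}\sum_{i=1}^{\left|V\right|}{h}_{i}$$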
Among them, \(\left|\text{V}\right|\) is the number of words in the sentence, and \(s\) is the vector representation of the sentence.
The Latent Dirichlet Allocation (LDA) topic model is used to extract the topic distribution of the text as a high-level sentence feature.
In cross-lingual feature mapping, feature vectors from different languages are mapped into the same feature space for effective alignment. For each language pair, existing word vectors are used to represent the vocabulary, and a linear transformation matrix is then used to align the word vector spaces of the two languages. The goal is to minimize the distance between the two sets of word vectors.
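With the notation defined below, the standard least-squares alignment objective is assumed:

$${W}^{*}=\underset{W}{\text{arg min}}{\left\|WX-Y\right\|}_{F}^{2}$$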
Among them, \(X\) and \(Y\) represent the word vector matrices of the source language and the target language, respectively, and \(W\) is the alignment matrix. A Siamese network is used to learn sentence similarity and the mapping relationships between sentences in different languages, which are then used for the alignment task.
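The similarity function is not reproduced; a typical Siamese formulation scores a cross-lingual sentence pair by the cosine similarity of their encodings, where \(f\) denotes the shared encoder (an assumed symbol):

$$sim\left({s}_{a},{s}_{b}\right)=\text{cos}\left(f\left({s}_{a}\right),f\left({s}_{b}\right)\right)$$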
By analyzing the structural and semantic features of texts in different languages and aligning their sentences, a diverse and balanced multilingual parallel corpus can be constructed.
The final collected data is shown in Table 4:
According to Table 4, 7592 monolingual English samples have been collected, along with 3766, 3082, 3615, 3044, and 3128 parallel samples for the Chinese-English, Japanese-English, German-English, Spanish-English, and French-English language pairs, respectively.
3.3 Quality evaluation indicators
3.3.1 BLEU
The BLEU algorithm compares the n-grams generated in the candidate translation with the n-grams in the standard reference translation to determine the number of matches [27]. On this basis, the matched n-grams in the candidate translation are counted. In addition, the BLEU algorithm introduces a parameter, the Brevity Penalty (BP), which penalizes candidate translations whose length is shorter than that of the standard reference translation. The BLEU calculation formula [28] is given below.
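The formula itself is not reproduced here; its standard form, with the variable definitions deferred to Table 5, is:

$$BLEU=BP\cdot \text{exp}\left(\sum_{n=1}^{N}{w}_{n}\text{log}\,{p}_{n}\right)$$

where \({p}_{n}\) is the modified n-gram precision and \({w}_{n}\) its weight (notation assumed where Table 5 is not shown).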
The calculation formula of \(BP\) is expressed as:
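Its standard form, where \(c\) is the length of the candidate translation and \(r\) the length of the reference (notation likewise assumed), is:

$$BP=\left\{\begin{array}{ll}1,& c>r\\ {e}^{1-r/c},& c\le r\end{array}\right.$$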
Among them, the definitions of formula variables are shown in Table 5:
The value of BLEU lies in the range [0, 1]. In the BLEU algorithm, the N-gram mechanism is used to evaluate the quality of the translated text generated by the model, where N means that each matching unit contains N adjacent words. Taking “The Earth revolves around the Sun.” as an example, its matches under the N-gram mechanism are shown in Table 6:
In the evaluation of translation quality, N is generally taken as 4, which is BLEU-4. This article evaluates the translation quality of the model using the BLEU-4 standard.
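As a side note, BLEU-4 can be computed with NLTK, which is already used for preprocessing in this article; the sketch below is illustrative, and the smoothing function is an implementation choice rather than a stated part of this article's evaluation pipeline.

```python
# Illustrative BLEU-4 computation with NLTK; the smoothing function is an
# implementation choice, not a stated part of this article's evaluation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "earth", "revolves", "around", "the", "sun"]]
candidate = ["the", "earth", "rotates", "around", "the", "sun"]

score = sentence_bleu(
    reference,
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),          # uniform 1- to 4-gram weights
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.4f}")
```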
3.3.2 TER
TER is an indicator that measures the minimum number of editing operations required to transform the translation output into the reference translation. These editing operations include insertion, deletion, and substitution. Its specific implementation is shown in Fig. 5:
The formula is [29]:
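The formula image is not reproduced; given the definitions below, the assumed form, expressed as a percentage in line with the results reported later, is:

$$TER=\frac{\text{min}\,{E}_{n}}{{W}_{t}}\times 100\%$$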
Among them, \({min E}_{n}\) is the minimum number of edits required for the translated output to become a reference translation, and \({W}_{t}\) is the total number of words in the reference translation.
3.3.3 METEOR
METEOR not only focuses on vocabulary matching, but also considers synonyms and morphological variation. By introducing a penalty for word order, METEOR can better reflect the naturalness of a translation. The calculation formula is given below.
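The formula is not reproduced here; its standard form, consistent with the definitions below and the word-order penalty mentioned above, is assumed to be:

$$METEOR={F}_{mean}\cdot \left(1-Penalty\right),\qquad {F}_{mean}=\frac{P\cdot R}{\alpha \cdot P+\left(1-\alpha \right)\cdot R}$$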
Among them, \(P\) is the proportion of words in the automatic translation that match the reference translation; \(R\) is the proportion of words in the reference translation that match the automatic translation; and \(\alpha\) is the weight parameter, set to 0.9.
4 Experimental evaluation of multilingual English translation quality based on T5 and MAML
To evaluate the quality of multilingual English translation based on T5 and MAML, this article conducts the experimental analysis at three levels: baseline comparison, advanced method comparison, and ablation experiments.
4.1 Experimental data
The collected English monolingual sample data is divided as shown in Table 7:
The English text data obtained in Table 7 is cleaned and preprocessed, and Google Translate is used to form a multilingual parallel corpus. The bilingual sample data is shown in Table 8:
In Table 8, EC, EJ, EG, ES, and EF represent the bilingual sample data for English-Chinese, English-Japanese, English-German, English-Spanish, and English-French, respectively.
On this basis, the dataset samples in the experiment are divided in an 8:2 ratio, as shown in Table 9:
According to Table 9, the dataset is divided into two disjoint meta-datasets \({D}_{train}\) and \({D}_{test}\). In translation evaluation, a task consists of two parts: a support set and a query set. Support set: n-way k-shot training refers to training with n categories of data in the same task, each category containing k labeled samples, that is, n × k labeled samples in total. Query set: one task contains q samples.
The model and meta learning parameter settings are shown in Table 10:
4.2 Baseline comparison
In the experiment, three types of baseline models are compared with the T5-MAML model, and their performance in multilingual translation tasks is evaluated.
Baseline models:
OpenNMT: an open-source translation model that supports multiple neural network architectures.
Transformer: a typical multilingual transformer model trained on the same multilingual dataset.
Opus-MT: a multilingual translation system based on the OPUS corpus.
Experimental model:
T5-MAML: adopting the MAML meta-learning strategy on top of the T5 model.
For each model, the same training dataset is used for training, and the baseline model settings are kept consistent with those of the experimental model.
(1) Baseline comparison analysis of BLEU score.
The BLEU score evaluates translation quality based on the degree of N-gram matching, with a particular emphasis on the precision and fluency of the translation. This article compares the BLEU scores of the different models on the same translation task to examine their differences in practical use. The final results are shown in Fig. 6:
From Fig. 6, it can be seen that the BLEU score of the model in this article is generally high. The mean BLEU score of the proposed model reaches about 32.53% across the different language tasks; the mean BLEU scores of the OpenNMT, Transformer, and Opus-MT models across the different language pairs are approximately 26.48%, 29.94%, and 30.47%, respectively. From the specific comparison, the mean BLEU score of the translation model based on T5-MAML is 6.05%, 2.59%, and 2.05% higher than the other three models, respectively. This result indicates that the T5-MAML model can better adapt to multilingual translation tasks and achieve better translation results with fewer data samples. By pre-training the T5 model on unlabeled text data, rich language representations are learned, giving the model better language understanding and generation capabilities, thereby improving translation quality and increasing the BLEU score.
(2) Baseline comparison analysis of TER.
TER focuses on the editing differences between the translated output and the reference translation. In the TER comparison experiment, this article compares the TER values of the models on the same text to assess translation quality, readability, and attention to detail. The final results are shown in Fig. 7:
From Fig. 7, it can be seen that the four models show certain differences in the baseline TER comparison. The lowest TER value of the model in this article reaches 50.30%, and its mean TER across the different language tasks is about 52.59%. The lowest TER values of the OpenNMT, Transformer, and Opus-MT models are 59.14%, 52.47%, and 52.26%, respectively, with mean TER values of approximately 63.72%, 58.62%, and 58.68%. Compared with the baseline models, the mean TER of the model in this article is 11.13%, 6.03%, and 6.09% lower across the different language pairs, respectively. This indicates that the T5-MAML model can better adapt to multilingual translation tasks in terms of editing distance and achieve better translation results with fewer data samples. The rich language representation provided by T5, combined with the fast learning ability of MAML, gives the model good basic language understanding and effectively improves fluency in specific translation tasks.
(3) Baseline comparison analysis of METEOR.
The METEOR scores for the four types of models are shown in Table 11:
From Table 11, it can be seen that compared to the baseline models, the T5-MAML model performs best in the METEOR evaluation, with an average score of 0.73. The Transformer model has an average score of 0.70, and OpenNMT and Opus-MT have average METEOR scores of 0.66 and 0.62, respectively. This result reflects that the T5-MAML model, by combining advanced pre-training techniques and a meta-learning strategy, more effectively improves translation quality in multilingual translation tasks.
4.3 Comparison of advanced methods
Building on the baseline comparison, in order to comprehensively evaluate the effectiveness of our model, we compare it with advanced multilingual models. mBART (Multilingual BART): mBART is the multilingual version of the BART (Bidirectional and Auto-Regressive Transformers) model proposed by Facebook AI. It combines a bidirectional (BERT-like) encoder with an autoregressive decoder and supports unsupervised machine translation across multiple languages.
XLM-R (Cross-lingual Language Model Pretraining): XLM-R is another powerful multilingual pre-training model and an improved version of XLM (Cross-lingual Language Model). By pre-training on large-scale multilingual texts, XLM-R can capture relationships between different languages and performs well in tasks involving multiple languages.
The model presented in this article is compared with the mBART and XLM-R models, and its performance in BLEU, TER, and METEOR is discussed.
(1) Comparative analysis of BLEU with advanced methods.
The comparison results of three types of models in BLEU are shown in Table 12:
According to Table 12, the mean BLEU results of the mBART model and XLM-R model across the language sequences reach 31.54 and 31.02, respectively. Compared with these two models, the mean BLEU of T5-MAML is 0.99% and 1.51% higher, respectively. This result indicates that, in terms of translation accuracy, our model is better than the other two advanced multilingual translation models. Based on meta-learning and fine-tuning, the T5-MAML model can learn a wider range of language patterns and contextual information, and has better generalization ability and stronger adaptability when dealing with multilingual translation.
(2) Comparative analysis of TER with advanced methods.
The comparison results of the three types of models in TER are shown in Table 13:
According to Table 13, the mean TER results of the mBART model and XLM-R model for each language sequence are 54.24% and 54.21%, respectively, which are slightly higher than those of the T5-MAML model. This indicates that the translations generated by the T5-MAML model are closer to manual translations and require less correction. In multilingual training, compared with the mBART and XLM-R models, the T5-MAML model focuses more on optimizing the editing distance; through an effective adaptive training strategy, it minimizes the gap between the generated translation and the reference translation as much as possible.
(3) Comparative analysis of METEOR with advanced methods.
The comparison results of three types of models in METEOR are shown in Table 14:
From Table 14, it can be seen that the T5-MAML model has a more significant advantage. The mean METEOR results of the mBART model and XLM-R model across the language sequences are 0.70 and 0.69, respectively, slightly lower than the model in this paper. This indicates that the translations generated by the T5-MAML model not only match the reference translation well at the lexical level, but also perform better in terms of word order, fragment-length matching, and semantic coherence. Although the mBART and XLM-R models also learn cross-lingual features during pre-training, they lack a dedicated meta-learning mechanism to accelerate adaptation to new tasks.
4.4 Ablation experiment
In the ablation experiment, the impact of the MAML meta-learning strategy on translation quality is studied by removing it. In the T5-MAML model, the MAML meta-learning strategy is used to train the model. The T5-Large model is used as the ablated model, trained only with standard supervised learning. The hyperparameter settings of the two models remain consistent.
(1) Ablation comparison analysis of BLEU score.
The impact of the MAML strategy on translation quality is analyzed by comparing the BLEU scores of the T5-MAML model and the T5-Large model. The ablation comparison results are shown in Fig. 8:
From Fig. 8, it can be seen that in the ablation experiment, the BLEU score of the model in this article is significantly better than that of the T5-Large model. Specifically, the highest BLEU score of the model in this article reaches 33.63%, with a mean of 31.67% across the different language tasks. The highest BLEU score of the T5-Large model is 30.67%, with a mean of 28.24%. Compared to the T5-Large model, the mean BLEU score of the model in this article is 3.43% higher. The model in this article integrates the MAML algorithm, which enables it to quickly adapt to new tasks with limited data. In translation tasks, this means the model can learn the mapping relationships between new language pairs faster, thereby improving translation quality.
(2) Ablation comparison analysis of TER.
By comparing the TER values of the two models on the same translation task, the impact of the meta-learning strategy on translation quality and detail handling is analyzed. The final comparison results are shown in Fig. 9:
In Fig. 9, the TER results of the model in this article across the different language pairs are generally lower than those of the T5-Large model. Specifically, the TER of the model in this article is the lowest at 49.45%, with a mean of about 50.82%. The T5-Large model achieves its lowest TER of 58.27%, with a mean of approximately 59.34%. The mean TER of the T5-MAML model is 8.52% lower than that of the T5-Large model without meta-learning. From this result, it can be seen that the T5-MAML model using meta-learning performs better in English translation quality in multilingual environments. The MAML algorithm enables the model to learn quickly from data, and in multilingual environments the model can quickly adapt and generate high-quality translations.
(3) Ablation comparison analysis of METEOR.
By comparing the METEOR scores of the models on the same translation task, the impact of the meta-learning strategy on the naturalness of the translated text is analyzed. The final comparison results are shown in Table 15:
From Table 15, it can be seen that in the ablation experiment, the mean METEOR score of our model is 0.75, while that of the T5-Large model is 0.67. From this comparison, it can be seen that the meta-learning strategy has a key impact on the naturalness of the model's generated translations. The meta-learning strategy utilizes prior knowledge to better capture the mapping relationship between the source and target languages in translation tasks, thereby generating more natural and fluent translations.
5 Discussion
This article verifies the sustainable improvement and application effect of multilingual English translation quality based on T5 and MAML through baseline comparison, advanced method comparison, and ablation experiments. From the baseline comparison, compared to the other three baseline models, the T5-MAML model performs better in terms of BLEU score and TER. The T5-MAML model can learn more semantic information and structure through joint training on multiple tasks, and thus has higher translation quality. From the ablation experiment, compared with the T5-Large model without meta-learning, the T5-MAML model has higher BLEU scores and lower TER results. MAML gives the model strong multilingual English translation capabilities, reducing overfitting to specific languages and improving the overall quality of the translation. Overall, the T5-MAML model has better adaptability and generalization ability, can better perform multilingual translation tasks, and obtains high-quality translations with a smaller sample size.
T5, as the base model, has learned rich language representations through pre-training on large amounts of text data. Combined with the MAML meta-learning mechanism, the model can quickly adapt to new tasks during corpus training and learn the conversion rules between new language pairs more quickly. Through cross-language alignment techniques and the use of multilingual datasets, the diversity and balance of the corpus have been further enhanced. Although this article provides some guidance for improving the quality of multilingual translation, there are still limitations in cross-domain adaptability. T5, pre-trained on large-scale datasets, may be affected by data imbalance during corpus training, and the choice of hyperparameters in the MAML method has a significant impact on model performance. In future research, we will consider building multimodal translation scenarios and studying how to maintain consistency and naturalness of translation in a multimodal environment, so as to promote the high-quality development of multilingual intelligent translation. From the perspective of improving multilingual pre-trained models, cross-lingual transfer learning can be utilized to enhance performance on low-resource languages, and the meta-learning algorithm can be further optimized to be more efficient and stable across different language pairs.
6 Conclusion
In the context of global integration, cross-cultural communication is becoming increasingly frequent, making multilingual translation an inevitable trend. Traditional translation models have limited adaptability and fitting ability in multilingual environments. To improve the translation performance and level of the model, this article combines T5 and MAML to study their sustainable quality improvement and application effect in multilingual English translation. Compared with the baseline models, the T5-MAML model achieves more ideal translation quality, and MAML effectively enhances the model's rapid adaptation and generalization ability, improving the practicality of the translation model. Although this article can provide some guidance for multilingual English translation, it also has limitations. This article mainly focuses on translation tasks between English and other languages, and the translation effect between non-English languages still needs further verification. In future research, applying T5 and the MAML meta-learning strategy to translation between other languages can be explored to further promote cross-cultural communication and exchange.
Data availability
The data are available from the corresponding author on reasonable request.
References
Munoz-Basols J. Going beyond the comfort zone: multilingualism, translation and mediation to foster plurilingual competence. Lang Cult Curric. 2019;32(3):299–321. https://doi.org/10.1080/07908318.2019.1661687.
Hwang M-H, Jikang S, Hojin S, Jeong-Seon Im, Hee C, Chun-Kwon L. Ensemble-nqg-t5: Ensemble neural question generation model based on text-to-text transfer transformer. Appl Sci. 2023;13(2):903–14. https://doi.org/10.3390/app13020903.
Ji K, Yang J, Liang Y. Theoretical convergence of multi-step model-agnostic meta-learning. J Mach Learn Res. 2022;23(29):1–41. https://doi.org/10.48550/arXiv.2002.07836.
Dabre R, Chu C, Kunchukuttan A. A survey of multilingual neural machine translation. ACM Comput Surv. 2020;53(5):1–38. https://doi.org/10.1145/3406095.
Goitom M. Multilingual research: reflections on translating qualitative data. Br J Soc Work. 2020;50(2):548–64. https://doi.org/10.1093/bjsw/bcz162.
Fan A, Shruti B, Holger S, Zhiyi M, Ahmed E-K, Siddharth G, et al. Beyond english-centric multilingual machine translation. J Mach Learning Res. 2021;22(107):1–48.
Singh SM, Thoudam DS. An empirical study of low-resource neural machine translation of manipuri in multilingual settings. Neural Comput Appl. 2022;34:14823–44. https://doi.org/10.1007/s00521-022-07337-8.
Lalrempuii C, Soni B, Pakray P. An improved English-to-Mizo neural machine translation. Trans Asian Low-Res Lang Inf Process. 2021;20(4):1–21. https://doi.org/10.1145/3445974.
Escolano C, Marta RC-J, Jose ARF. From bilingual to multilingual neural-based machine translation by incremental training. J Assoc Inf Sci Technol. 2021;72:190–203. https://doi.org/10.1002/asi.24395.
Goyal R, Parteek K, Singh VP. Automated question and answer generation from texts using text-to-text transformers. Arabian J Sci Eng. 2024;49:3027–41. https://doi.org/10.1007/s13369-023-07840-7.
Yao X, Zhu J, Huo G, Ning Xu, Liu X, Zhang Ce. Model-agnostic multi-stage loss optimization meta learning. Int J Mach Learn Cybern. 2021;12(8):2349–63. https://doi.org/10.1007/s13042-021-01316-6.
Fuad A, Al-Yahya M. Cross-lingual transfer learning for Arabic task-oriented dialogue systems using multilingual transformer model mT5. Mathematics. 2022;10(5):746–54. https://doi.org/10.3390/math10050746.
Phakmongkol P, Vateekul P. Enhance text-to-text transfer transformer with generated questions for Thai question answering. Appl Sci. 2021;11(21):10267–83. https://doi.org/10.3390/app112110267.
Rabiul AM, Lee R-W, Tanwar E, Garg T, Chakraborty T. Model-agnostic meta-learning for multilingual hate speech detection. IEEE Trans Comput Soc Syst. 2023;11(1):1086–95. https://doi.org/10.1109/TCSS.2023.3252401.
Roy A, Isha S, Sudeshna S, Pawan G. Cross-lingual event detection using meta-learning for Indian languages. ACM Trans Asian Low-Resource Lang Inf Process. 2023;22(2):1–22. https://doi.org/10.1145/3555340.
Agrawal A, Shukla P. Context aware automatic subjective and objective question generation using fast text to text transfer learning. Int J Adv Comput Sci Appl. 2023;14(4):456–63.
Qi Z, Yongsheng F. News text summarization generation based on improved T5 PEGASUS model. Electron Sci Technol. 2023;36(12):72–8. https://doi.org/10.16180/j.cnki.issn1007-7820.2023.12.010.
Jian XU, Yu SU, Liming ZH. An automatic generation model of multiple-choice questions based on T5. J Qujing Normal Univ. 2021;406:36–42.
Etemad AG, Ali IA, Megha C. Fine-tuned T5 for abstractive summarization. Int J Performability Eng. 2021;17(10):900–6. https://doi.org/10.23940/ijpe.21.10.p8.900906.
Naiyu W, Yuxin Ye, Liu Lu, Lizhou F, Tie B, Tao P. Research progress on language models based on deep learning. J Softw. 2020;32(4):1082–115. https://doi.org/10.13328/j.cnki.jos.006169.
Zengying Y, Xia Ye, Ruiheng L. A review of pre training techniques based on language models. J Chin Inf Processing. 2021;35(9):15–29. https://doi.org/10.3969/j.issn.1003-0077.2021.09.002.
Raffel C, Noam S, Adam R, Katherine L, Sharan N, Michael M, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learning Res. 2020;21(140):1–67.
Renjie Xu, Baodi L, Kai Z, Weifeng L. Model independent meta learning algorithm based on Bayesian weight function. J Comput Appl. 2022;42(3):708–12. https://doi.org/10.11772/j.issn.1001-9081.2021040758.
Yang N, Bangning Z, Guoru D, Yimin W, Guofeng W, Jian W, et al. Specific emitter identification with limited samples: a model-agnostic meta-learning approach. IEEE Commun Lett. 2021;26(2):345–9. https://doi.org/10.1109/LCOMM.2021.3110775.
Fanchang Li, Liu Yang Wu, Pengxiang DF, Qi C, Zhe W. A review of meta learning research. Chin J Comput. 2021;44(2):422–46. https://doi.org/10.11897/SP.J.1016.2021.00422.
Li D, Haojun F, Biying Z, Jiangzhou L, Haichao L. A meta learning knowledge reasoning framework that integrates semantic paths and language models. J Electron Inf Technol. 2022;44(12):4376–83. https://doi.org/10.11999/JEIT211034.
Chauhan S, Philemon D, Archita M, Abhay K. Adableu: a modified bleu score for morphologically rich languages. IETE J Res. 2023;69(8):5112–23. https://doi.org/10.1080/03772063.2021.1962745.
Diab N. Out of the BLEU: an error analysis of statistical and neural machine translation of WikiHow articles from English into Arabic. CDELT Occasional Papers Dev English Educ. 2021;75(1):181–211. https://doi.org/10.21608/opde.2021.208437.
Rivera-Trigueros I. Machine translation systems and quality assessment: a systematic review. Lang Resour Eval. 2022;56(2):593–619. https://doi.org/10.1007/s10579-021-09537-5.
Funding
This work was supported by the academic funding for Youth Talent of “On the Waixuan Translation Mechanism of Huizhou Culture from the Perspective of ‘Cultural Confidence’” of Anhui Provincial Department of Education in 2022 under Grant no. gxyqZD2022092, and by the academic funding project for “Study on the Independent Professional Development of College English Teachers in the Context of New Liberal Arts” of Anhui Provincial Department of Education in 2022 under Grant no. 2022jyxm628.
Author information
Authors and Affiliations
Contributions
HS: Writing - original draft, review and editing, Conceptualization, Formal analysis, Methodology, Validation; BK: Review and editing, Supervision, Visualization, Project administration.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This study did not involve any animal or human testing.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sun, H., Kong, B. Sustainable improvement and application of multilingual english translation quality using T5 and MAML. Discov Artif Intell 4, 98 (2024). https://doi.org/10.1007/s44163-024-00213-5