Controllable Sentence Simplification With A Unified Text-to-Text Transfer Transformer
Proceedings of the 14th International Conference on Natural Language Generation (INLG), pages 341–352,
Aberdeen, Scotland, UK, 20-24 September 2021. ©2021 Association for Computational Linguistics
2 Related Work

2.1 Sentence Simplification

Sentence simplification is often regarded as a monolingual translation problem (Zhu et al., 2010; Coster and Kauchak, 2011; Wubben et al., 2012), where models are trained on parallel complex-simple sentences extracted from English Wikipedia and Simple English Wikipedia (SEW) (Zhu et al., 2010).

There are many approaches based on Statistical Machine Translation (SMT), including phrase-based MT (PBMT) (Štajner et al., 2015) and syntax-based MT (SBMT) (Xu et al., 2016). Nisioi et al. (2017) introduced Neural Text Simplification (NTS), a Neural-Machine-Translation-based (NMT) system that performs better than SMT. Zhang and Lapata (2017) took a similar approach, adding lexical constraints and combining the NMT model with reinforcement learning. After the release of the Transformer (Vaswani et al., 2017), Zhao et al. (2018a) introduced a Transformer-based approach and integrated it with a paraphrase database for simplification called Simple PPDB (Pavlick and Callison-Burch, 2016a). The model outperforms all previous state-of-the-art models in sentence simplification.

Our proposed model is also a sequence-to-sequence Transformer-based model, but instead of using the original Transformer of Vaswani et al. (2017), we use T5 (Raffel et al., 2020).

2.2 Controllable Sentence Simplification

In recent years, there has been increased interest in conditional training with sequence-to-sequence models. It has been applied to NLP tasks such as controlling the length and content of summaries (Kikuchi et al., 2016; Fan et al., 2017), politeness in machine translation (Sennrich et al., 2016), and linguistic style in text generation (Ficler and Goldberg, 2017). Scarton and Specia (2018) introduced a controllable TS model by embedding a grade-level token <grade> into the sequence-to-sequence model. Martin et al. (2020b) took a similar approach, adding four tokens to source sentences to control different aspects of the output such as length, paraphrasing, lexical complexity, and syntactic complexity. Kariuk and Karamshuk (2020) took the idea of using control tokens from Martin et al. (2020b) and applied it in an unsupervised approach, integrating those control tokens into the back-translation algorithm, which allows the model to self-supervise the process of learning inter-relations between a control sequence and the complexity of the outputs. The results of Scarton and Specia (2018), Martin et al. (2020b), and Kariuk and Karamshuk (2020) have shown that adding control tokens improves the performance of sentence simplification models quite significantly.

Building upon Martin et al. (2020b), we fine-tune T5 with all the control tokens defined there to control different aspects of the output sentences. Moreover, we add one more control token (the number-of-words ratio) in order to generate sentences of similar length to the source but with shorter words, as we believe that the number-of-characters ratio alone is not enough for the model to generate shorter words.

3 Model

In this work, we fine-tune a pre-trained T5 model with the controllable mechanism on Text Simplification. T5, a Unified Text-to-Text Transfer Transformer (Raffel et al., 2019), is pre-trained on a number of supervised and unsupervised tasks such as machine translation, document summarization, question answering, classification tasks, and reading comprehension, as well as BERT-style token and span masking (Devlin et al., 2019). There are five variants of pre-trained T5 models, among them T5-small (6 attention modules, 60 million parameters) and T5-base (12 attention modules, 220 million parameters). Due to the limited resources of Colab Pro, we are able to train only T5-small and T5-base.

3.1 Control Tokens

We use control tokens to control different aspects of simplification such as the compression ratio (#Chars), paraphrasing (Levenshtein similarity), lexical complexity (word rank), and syntactic complexity (the depth of the dependency tree), as defined in Martin et al. (2020b). We then add another control token, the word ratio (#Words), to control word length. We argue that the word ratio is another important control token because word frequency normally correlates well with familiarity, and word length can be an additional factor, as long words tend to be hard to read (Rello et al., 2013b). Moreover, corpus studies of original and simplified texts show that simple texts contain shorter and more frequent words (Drndarević and Saggion, 2012). Therefore, we add the word ratio to help the model generate simplified sentences with a similar number of words but shorter word length, whereas #Chars alone could help the model regulate sentence length but not word length.

• #Chars (C): character length ratio between the source sentence and the target sentence; the number of characters in the target divided by that of the source.

• LevSim (L): normalized character-level Levenshtein similarity (Levenshtein, 1966) between the source and the target.

• WordRank (WR): inverse frequency order of all words in the target divided by that of the source.

• DepTreeDepth (DTD): maximum depth of the dependency tree of the target divided by that of the source.

• #Words (W): number-of-words ratio between the source sentence and the target sentence; the number of words in the target divided by that of the source.

Table 1 shows an example of a sentence embedded with control tokens for training.

Source
simplify: W 0.58 C 0.52 L 0.67 WR 0.92 DTD 0.71 In architectural decoration Small pieces of colored and iridescent shell have been used to create mosaics and inlays, which have been used to decorate walls, furniture and boxes.

Target
Small pieces of colored and shiny shell has been used to decorate walls, furniture and boxes.

Table 1: This table shows how control tokens are embedded into the source sentence for training. The keyword simplify is added at the beginning of each source sentence to mark it as a simplification task.

4 Experiments

Our model is developed using the Huggingface Transformers library (Wolf et al., 2019; https://huggingface.co/transformers/model_doc/t5.html) with PyTorch (https://pytorch.org) and PyTorch Lightning (https://pytorchlightning.ai).

4.1 Datasets

We use the WikiLarge dataset (Zhang and Lapata, 2017) for training. It is the largest and most commonly used text simplification dataset, containing 296,402 automatically aligned complex-simple sentence pairs from English Wikipedia and Simple English Wikipedia, compiled from (Zhu et al., 2010; Woodsend and Lapata, 2011; Kauchak, 2013).

For validation and testing, we use TurkCorpus (Xu et al., 2016), which has 2000 samples for validation and 359 samples for testing, where each complex sentence has 8 human simplifications. We also use a newly created dataset called ASSET (Alva-Manchego et al., 2020) for testing, which contains 2000/359 samples (validation/test) with 10 simplifications per source sentence.

4.2 Evaluation Metrics

Following previous research (Zhang and Lapata, 2017; Martin et al., 2020a), we use automatic evaluation metrics widely used in the text simplification task.

SARI (Xu et al., 2016) compares system outputs with the references and the source sentence. It measures the performance of text simplification at the lexical level by explicitly measuring the goodness of words that are added, deleted, and kept. So far, it is the most commonly adopted metric, and we use it as an overall score.

BLEU (Papineni et al., 2002) was originally designed for Machine Translation and was commonly used in earlier work. BLEU has lost its popularity in Text Simplification because it correlates poorly with human judgments and often penalizes simpler sentences (Sulem et al., 2018). We keep using it so that we can compare our system with previous systems.

FKGL (Kincaid et al., 1975) In addition to SARI and BLEU, we use FKGL to measure readability; however, it does not take into account grammaticality and meaning preservation.

We compute SARI, BLEU, and FKGL using EASSE (Alva-Manchego et al., 2019; https://github.com/feralvam/easse), a simplification evaluation library.

4.3 Training Details

We performed a hyperparameter search using Optuna (Akiba et al., 2019) with T5-small and a reduced-
size dataset to speed up the process. All models are trained with the same hyperparameters: a batch size of 6 for T5-base and 12 for T5-small, a maximum token length of 256, a learning rate of 3e-4, weight decay of 0.1, Adam epsilon of 1e-8, 5 warm-up steps, and 5 epochs; the rest of the parameters are left at the default values of the Transformers library. Also, the seed is set to 12 for reproducibility. For generation, we use a beam size of 8. Our models are trained and evaluated using Google Colab Pro, which assigns a random GPU, either a T4 or a P100. Both have 16GB of memory, up to 25GB of RAM, and a maximum time limit of 24h for the execution of cells. Training the T5-base model for 5 epochs usually takes around 20 hours.

4.4 Choosing Control Token Values at Inference

In this experiment, we want to search for control token values that make the model generate the best possible simplifications. Thus, we select the values that achieve the best SARI on the validation set using the same tool that we use for hyperparameter tuning, Optuna (Akiba et al., 2019), and keep those values fixed for sentences in the test set. We repeat the same process for each evaluation dataset.

4.5 Baselines

We benchmark our model against several well-known state-of-the-art systems:

YATS (Ferrés et al., 2016; http://able2include.taln.upf.edu) A rule-based system with linguistically motivated rule-based syntactic analysis and a corpus-based lexical simplifier, which generates sentences based on part-of-speech tags and dependency information.

PBMT-R (Wubben et al., 2012) A phrase-based MT system trained on a monolingual parallel corpus with candidate re-ranking based on dissimilarity using Levenshtein distance.

UNTS (Surya et al., 2019) Unsupervised Neural Text Simplification, based on the encode-attend-decode architecture (Bahdanau et al., 2014) with a shared encoder and two decoders, trained on unlabeled data extracted from an English Wikipedia dump.

Dress-LS (Zhang and Lapata, 2017) A Seq2Seq model trained with deep reinforcement learning, combined with a lexical simplification model to improve complex word substitutions.

DMASS+DCSS (Zhao et al., 2018b) A Seq2Seq model trained with the original Transformer architecture (Vaswani et al., 2017), combined with the simple paraphrase database for simplification, Simple PPDB (Pavlick and Callison-Burch, 2016b).

ACCESS (Martin et al., 2020b) A Seq2Seq system trained with four control tokens attached to the source sentence: the character length ratio, Levenshtein similarity ratio, word rank ratio, and dependency tree depth ratio between source and target sentence.

BART+ACCESS (Martin et al., 2020a) A system that fine-tunes BART (Lewis et al., 2020) and adds the simplification control tokens from ACCESS.

4.6 Results

We evaluate our models automatically on two different datasets, TurkCorpus and ASSET. In addition, we also perform a human evaluation on one of our models, which is described in Section 5. Table 2 reports the results of the automatic evaluation of our models compared with other state-of-the-art systems. Our model T5-base+#Chars+WordRank+LevSim+DepTreeDepth performs best on TurkCorpus with a SARI score of 43.31, while the model T5-base+All Tokens performs best on ASSET with a SARI score of 45.04, compared to the current state of the art, BART+ACCESS, with SARI scores of 42.62 on TurkCorpus and 43.63 on ASSET. Following these results, our models outperform all the state-of-the-art models in the literature across all approaches (rule-based, supervised, and unsupervised), even without using any additional resources.

5 Human Evaluation

In addition to the automatic evaluation, we performed a human evaluation on the outputs of different systems. Following recent works (Alva-Manchego et al., 2017; Dong et al., 2019; Zhao et al., 2020), we run our evaluation on Amazon Mechanical Turk, asking five workers to rate each sentence on a 5-point Likert scale on three aspects: (1) Fluency (or Grammaticality): is it grammatically correct and well-formed?; (2) Simplicity: is it simpler than the original sentence?; and (3) Adequacy (or Meaning preservation): does it preserve the meaning of the original sentence? More detailed instructions can be found in Appendix A. For this evaluation, we
Model                                        Data               ASSET                     TurkCorpus
                                                                SARI↑  BLEU↑  FKGL↓      SARI↑  BLEU↑  FKGL↓
YATS                                         Rule-based         34.4   72.07  7.65       37.39  74.87  7.67
PBMT-R                                       PWKP (Wikipedia)   34.63  79.39  8.85       38.04  82.49  8.85
UNTS                                         Unsup. Data        35.19  76.14  7.60       36.29  76.44  7.60
Dress-LS                                     WikiLarge          36.59  86.39  7.66       36.97  81.08  7.66
DMASS+DCSS                                   WikiLarge          38.67  71.44  7.73       39.92  73.29  7.73
ACCESS                                       WikiLarge          40.13  75.99  7.29       41.38  76.36  7.29
BART+ACCESS                                  WikiLarge          43.63  76.28  6.25       42.62  78.28  6.98
T5-base+#Chars+WordRank+LevSim+DepTreeDepth  WikiLarge          44.91  71.96  6.32       43.31  66.23  6.17
T5-base+All Tokens                           WikiLarge          45.04  71.21  5.88       43.00  64.42  5.63

Table 2: SARI, BLEU and FKGL results of our models compared with other systems on the ASSET and TurkCorpus test sets (higher is better for SARI and BLEU, lower is better for FKGL). BLEU and FKGL are not very informative for sentence simplification; we keep them only for comparison with previous models. All results from the literature are taken from Martin et al. (2020a), except YATS, which is generated using its web interface.
randomly select 100 sentences from the different simplification systems trained on the WikiLarge dataset, except YATS, which is rule-based. Table 3 reports the averaged results.

Model         Fluency  Simplicity  Adequacy
YATS          4.03*    3.62*       3.92*
DMASS+DCSS    3.84*    3.70*       3.48*
BART+ACCESS   4.41     4.02        4.13
Our Model     4.30     3.99        4.18

and in some cases, the subject is repeated twice when the sentence is split into two (e.g., a relative clause). Repetition is also considered one of the key features of simplification, as it makes text easier to understand, but for native or fluent speakers, the repetition and the longer sentence make the fluency worse. Moreover, due to these problems, the evaluators also tend to lower the simplicity score, as they consider such output harder to read.
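The control-token value selection of Section 4.4 (used to pick the values reported in Table 6) boils down to scoring candidate token combinations on the validation set and keeping the best one. A minimal, self-contained sketch of that idea follows; this is our own illustration, not the authors' code: the paper uses Optuna, and `validation_sari` here is a hypothetical stand-in for a real SARI scorer.

```python
import itertools

def best_token_values(validation_sari, grid=(0.25, 0.5, 0.75, 1.0)):
    """Exhaustively score every combination of the four ratio tokens
    over a small grid and keep the one with the highest validation score."""
    names = ("C", "L", "WR", "DTD")
    best, best_score = None, float("-inf")
    for combo in itertools.product(grid, repeat=len(names)):
        values = dict(zip(names, combo))
        score = validation_sari(values)  # in practice: decode + SARI on the dev set
        if score > best_score:
            best, best_score = values, score
    return best, best_score

# Toy scorer standing in for SARI on the validation set: it simply
# prefers moderate compression and paraphrasing (purely illustrative).
toy = lambda v: -abs(v["C"] - 0.75) - abs(v["L"] - 0.75)
```

With the toy scorer, the search settles on C=0.75 and L=0.75, the values it was built to prefer; in practice the scorer would run the fine-tuned model on the validation sentences and compute SARI with EASSE.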
Model                                        ASSET                     TurkCorpus
                                             SARI↑  BLEU↑  FKGL↓      SARI↑  BLEU↑  FKGL↓
T5-small (No tokens)                         29.85  90.39  8.94       34.50  94.16  9.44
T5-small + All Tokens                        39.12  86.08  6.99       40.83  85.12  6.78
T5-base (No tokens)                          34.15  88.97  8.94       37.56  90.96  8.81
T5-base:
  +#Words                                    38.51  84.02  7.45       38.86  89.10  8.61
  +#Chars                                    39.58  79.22  6.06       38.95  84.81  7.76
  +LevSim                                    41.58  82.52  6.53       40.90  85.45  7.55
  +WordRank                                  41.40  76.75  5.85       41.44  85.46  7.67
  +DepTreeDepth                              40.08  81.94  6.56       39.18  87.60  7.81
T5-base:
  +WordRank+LevSim                           42.85  80.38  4.47       41.75  83.90  7.42
  +#Chars+WordRank+LevSim                    44.89  56.76  5.93       42.91  67.09  6.53
  +#Words+#Chars+WordRank+LevSim             44.65  58.52  5.52       43.03  68.11  5.96
  +#Chars+WordRank+LevSim+DepTreeDepth       44.91  71.96  6.32       43.31  66.23  6.17
  +All Tokens                                45.04  71.21  5.88       43.00  64.42  5.63

Table 5: Ablation study on different T5 models and different control tokens. Each model is trained and evaluated independently. We report SARI, BLEU and FKGL on the ASSET and TurkCorpus test sets. The control token values corresponding to each model are listed in Table 6.

Table 6: These are the control token values used for the ablation study in Table 5. Each model is trained and evaluated independently. The values are selected using the hyperparameter search tool mentioned in Section 4.4.
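For concreteness, the ratio-style control tokens can be computed directly from a complex-simple pair. The sketch below is our own illustration, not the authors' released code: it computes #Chars, #Words, and LevSim with a standard dynamic-programming Levenshtein distance, and omits WordRank and DepTreeDepth, which additionally require a word-frequency table and a dependency parser.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def control_tokens(source: str, target: str) -> dict:
    """#Chars, #Words, and LevSim ratios for a complex-simple pair."""
    dist = levenshtein(source, target)
    lev_sim = 1 - dist / max(len(source), len(target))   # normalized similarity
    return {
        "C": round(len(target) / len(source), 2),                   # character ratio
        "W": round(len(target.split()) / len(source.split()), 2),   # word ratio
        "L": round(lev_sim, 2),                                     # Levenshtein similarity
    }
```

During preprocessing, these values are rounded and prepended to the source sentence, as in the example of Table 1.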
Figure 1: Influence of the #Words and #Chars control tokens on the simplification outputs. Red represents the outputs of the model trained with four tokens, without the #Words control token. Blue represents the outputs of the model trained with all five tokens. Green is the reference taken from TurkCorpus. The first row shows the compression ratio (ratio of the number of characters between system outputs and source sentences), and the second row shows the Levenshtein similarity (word similarity between system outputs and source sentences) of each model. We plot the results of the 2000 validation sentences from TurkCorpus. The other control token values used here are set to 0.75, as in the example in Table 7.
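At inference time, the chosen token values are simply prepended to the input, exactly as during training (Table 1). A small sketch, our own illustration using the short token names from Table 1, of how the two configurations compared below could be encoded:

```python
def make_input(sentence: str, **tokens: float) -> str:
    """Prepend the task keyword and control-token values to a source sentence."""
    prefix = " ".join(f"{name} {value}" for name, value in tokens.items())
    return f"simplify: {prefix} {sentence}"

# Model 1: four tokens, no #Words; Model 2: all five tokens (W added).
src = "The municipality has about 5700 inhabitants."
model1_input = make_input(src, C=0.5, WR=0.75, L=0.75, DTD=0.75)
model2_input = make_input(src, W=1.0, C=0.5, WR=0.75, L=0.75, DTD=0.75)
```

The resulting strings are then fed to the fine-tuned T5 model, so the same checkpoint can be steered toward different outputs just by changing the prefix.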
the models in Table 5, which are selected using the same process and tool as mentioned in Section 4.4. Based on the results, the larger model (T5-base) performs better than the smaller one (T5-small) on both datasets (+3.06 on TurkCorpus, +4.3 on ASSET). This is due to the fact that the larger model has more capacity, which allows it to generate better and more coherent text. Moreover, when control tokens are added, the performance increases significantly. With only one token, WordRank performs best on TurkCorpus (+3.88 over T5-base) and LevSim on ASSET (+7.43 over T5-base).

Using the pre-trained model alone does not gain much improvement; only when combined with control tokens do the results improve by a big margin (+3.06 and +9.28 for T5-small with and without tokens, and +5.75 and +10.89 for T5-base with and without tokens).

6.1 Analysis on the effect of #Words

Our goal in using the #Words control token is to make the model learn to generate shorter words, whereas #Chars alone could help the model regulate the sentence length but not word length; so here we investigate how the #Words and #Chars control tokens affect the outputs.

For the model with the #Words token to work, it has to be combined with #Chars, as #Words determines the number of words and #Chars limits the number of characters in the sentence. In our examples in Table 7, we set #Words to 1.0, which means the number of words in the simplified sentence has to be similar to the original sentence, and #Chars is set to 0.5 or 0.75, which means keeping the same number of words while reducing the characters by 50% or 25%.

Figure 1 shows the differences in density distribution (first row) and similarity (second row) between model 1, in red, without the #Words token; model 2, in blue, with the #Words token; and the reference, in green. In the first column #Chars is set to 0.25, in the second column #Chars=0.5, in the third #Chars=0.75, in the fourth #Chars=1.0, and in all cases #Words is set to 1.0. From the plots, we can see that model 1 does more compression than model 2, which means model 2 preserves more words than model 1.

Table 7 shows some example sentences comparing the models with #Chars 0.75 and #Chars 0.5. When #Chars is set to 0.75, we do not see much difference between the two models, but when #Chars is set to 0.5, the two models differ in terms of sentence length and word length. For example, the word mathematics in example number one is replaced with the word math in model 2 (with #Words) and removed by model 1
Tokens    Model 1: #Chars 0.5 WordRank 0.75 LevSim 0.75 DepTreeDepth 0.75
          Model 2: #Words 1.0 #Chars 0.5 WordRank 0.75 LevSim 0.75 DepTreeDepth 0.75

Source:   In order to accomplish their objective, surveyors use elements of geometry, engineering, trigonometry, mathematics, physics, and law.
Model 1:  In order to accomplish their objective, surveyors use geometry, engineering, and law.
Model 2:  In order to do this, surveyors use geometry, engineering, trigonometry, math, physics, and law.

Source:   The municipality has about 5700 inhabitants.
Model 1:  The municipality has 5700.
Model 2:  The town has about 5700.

Source:   A hunting dog refers to any dog who assists humans in hunting.
Model 1:  A hunting dog is any dog who hunts.
Model 2:  A hunting dog is a dog who helps humans in hunting.

Tokens    Model 1: #Chars 0.75 WordRank 0.75 LevSim 0.75 DepTreeDepth 0.75
          Model 2: #Words 1.0 #Chars 0.75 WordRank 0.75 LevSim 0.75 DepTreeDepth 0.75

Source:   The park has become a traditional location for mass demonstrations.
Model 1:  The park has become a popular place for demonstrations.
Model 2:  The park has become a place for people to show things.

Source:   Frances was later absorbed by an extratropical cyclone on November 21.
Model 1:  Frances was later taken in by an extratropical cyclone.
Model 2:  Frances was later taken over by a cyclone on November 21.

Source:   There are claims that thousands of people were impaled at a single time.
Model 1:  There are claims that thousands of people were killed.
Model 2:  There are also stories that thousands of people were killed at a time.

Table 7: Examples showing the differences between the model with the number-of-words ratio and the one without it. Model 1 is trained with four tokens, without the #Words control token, and model 2 is trained with all five control tokens. All control token values used to generate the outputs are listed in the Tokens rows. We use bold to highlight the differences.
(without #Words). In the second example, the word municipality is replaced by the word town by model 2, while model 1 simply keeps the word and crops the sentence (the same problem occurs in the third example). In addition, in the fourth example, the word location is replaced by both models with the word place, while the phrase mass demonstrations is reduced to demonstrations by model 1, whereas model 2 changes it to four shorter words, people to show things.

There are many cases where model 1 and model 2 generate the same substitutions, but very often model 1 tends to crop the end of the sentence or drop some words to fulfill the length constraint, whereas model 2 tends to generate longer sentences than model 1, crops less, and very often replaces long complex words with shorter ones. Even though, based on the results from Table 2, adding the #Words control token does not significantly improve the SARI score and sometimes even lowers it, it certainly serves its purpose.

7 Conclusion

In this paper, we propose a method which leverages a large pre-trained model (T5), fine-tuning it for the Controllable Sentence Simplification task. The experiments have shown good results of 43.31 SARI on the TurkCorpus evaluation set and 45.04 on the ASSET evaluation set, outperforming the current state-of-the-art model. We have also shown that adding the control token #Words is useful for generating substitutions of shorter length.

Acknowledgments

We acknowledge support from the project Context-aware Multilingual Text Simplification (ConMuTeS) PID2019-109066GB-I00/AEI/10.13039/501100011033 awarded by Ministerio de Ciencia, Innovación y Universidades (MCIU) and by Agencia Estatal de Investigación (AEI) of Spain. Also, we would like to thank the three anonymous reviewers for their insightful suggestions.

References

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623–2631.

Fernando Alva-Manchego, Joachim Bingel, Gustavo Paetzold, Carolina Scarton, and Lucia Specia. 2017. Learning how to simplify from explicit labeling of complex-simplified text pairs. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 295–305.

Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, and Lucia Specia. 2020. ASSET: A dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4668–4679, Online. Association for Computational Linguistics.

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, and Lucia Specia. 2019. EASSE: Easier automatic sentence simplification evaluation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 49–54, Hong Kong, China. Association for Computational Linguistics.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Eduard Barbu, M. Teresa Martín-Valdivia, Eugenio Martínez-Cámara, and L. Alfonso Ureña-López. 2015. Language technologies applied to document simplification for helping autistic people. Expert Systems with Applications, 42(12):5076–5086.

Delphine Bernhard, Louis De Viron, Véronique Moriceau, and Xavier Tannier. 2012. Question generation for French: collating parsers and paraphrasing questions. Dialogue & Discourse, 3(2):43–74.

John A. Carroll, Guido Minnen, Darren Pearce, Yvonne Canning, Siobhan Devlin, and John Tait. 1999. Simplifying text for language-impaired readers. In Ninth Conference of the European Chapter of the Association for Computational Linguistics.

Raman Chandrasekar, Christine Doran, and Srinivas Bangalore. 1996. Motivations and methods for text simplification. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

William Coster and David Kauchak. 2011. Simple English Wikipedia: a new text simplification task. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 665–669.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Yue Dong, Zichao Li, Mehdi Rezagholizadeh, and Jackie Chi Kit Cheung. 2019. EditNTS: A neural programmer-interpreter model for sentence simplification through explicit editing. arXiv preprint arXiv:1906.08104.

Biljana Drndarević and Horacio Saggion. 2012. Towards automatic lexical simplification in Spanish: an empirical study. In Proceedings of the First Workshop on Predicting and Improving Text Readability for Target Reader Populations, pages 8–16.

Richard J. Evans. 2011. Comparing methods for the syntactic simplification of sentences in information extraction. Literary and Linguistic Computing, 26(4):371–388.

Angela Fan, David Grangier, and Michael Auli. 2017. Controllable abstractive summarization. arXiv preprint arXiv:1711.05217.

Daniel Ferrés, Montserrat Marimon, Horacio Saggion, et al. 2016. YATS: Yet another text simplifier. In International Conference on Applications of Natural Language to Information Systems, pages 335–342. Springer.

Jessica Ficler and Yoav Goldberg. 2017. Controlling linguistic style aspects in neural language generation. arXiv preprint arXiv:1707.02633.

Siddhartha Jonnalagadda and Graciela Gonzalez. 2010. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. In AMIA Annual Symposium Proceedings, volume 2010, page 351. American Medical Informatics Association.

Oleg Kariuk and Dima Karamshuk. 2020. CUT: Controllable unsupervised text simplification. arXiv preprint arXiv:2012.01936.

David Kauchak. 2013. Improving text simplification language modeling using unsimplified text data. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1537–1546.

Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. 2016. Controlling output length in neural encoder-decoders. arXiv preprint arXiv:1609.09552.

J. Peter Kincaid, Robert P. Fishburne Jr., Richard L. Rogers, and Brad S. Chissom. 1975. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch.

Reno Kriz, Joao Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, and Chris Callison-Burch. 2019. Complexity-weighted loss and diverse reranking for sentence simplification. arXiv preprint arXiv:1904.02767.

Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady, volume 10, pages 707–710. Soviet Union.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, and Benoît Sagot. 2020a. Multilingual unsupervised sentence simplification. arXiv preprint arXiv:2005.00352.

Louis Martin, Éric Villemonte de La Clergerie, Benoît Sagot, and Antoine Bordes. 2020b. Controllable Sentence Simplification. In LREC 2020 - 12th Language Resources and Evaluation Conference, Marseille, France. Due to the COVID-19 pandemic, the 12th edition was cancelled; the LREC 2020 proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html.

Kerstin Matausch and Birgit Peböck. 2010. EasyWeb – a study how people with specific learning difficulties can be supported on using the internet. In International Conference on Computers for Handicapped Persons, pages 641–648. Springer.

Sergiu Nisioi, Sanja Štajner, Simone Paolo Ponzetto, and Liviu P. Dinu. 2017. Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 85–91.

Gustavo H. Paetzold and Lucia Specia. 2016. Unsupervised lexical simplification for non-native speakers. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 3761–3767.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Ellie Pavlick and Chris Callison-Burch. 2016a. Simple PPDB: A paraphrase database for simplification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 143–148.

Ellie Pavlick and Chris Callison-Burch. 2016b. Simple PPDB: A paraphrase database for simplification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 143–148, Berlin, Germany. Association for Computational Linguistics.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.

Luz Rello, Ricardo Baeza-Yates, Stefan Bott, and Horacio Saggion. 2013a. Simplify or help? Text simplification strategies for people with dyslexia. In Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, pages 1–10.

Luz Rello, Susana Bautista, Ricardo Baeza-Yates, Pablo Gervás, Raquel Hervás, and Horacio Saggion. 2013b. One half or 50%? An eye-tracking study of number representation readability. In IFIP Conference on Human-Computer Interaction, pages 229–245. Springer.

Horacio Saggion. 2017. Automatic Text Simplification. Synthesis Lectures on Human Language Technologies, 10(1):1–137.

Carolina Scarton and Lucia Specia. 2018. Learning simplifications for specific target audiences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 712–718.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 35–40.

Advaith Siddharthan, Ani Nenkova, and Kathleen McKeown. 2004. Syntactic simplification for improving content selection in multi-document summarization. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 896–902, Geneva, Switzerland. COLING.

Sanja Štajner, Iacer Calixto, and Horacio Saggion. 2015. Automatic text simplification for Spanish:

ACM international conference on Design of communication, pages 29–36.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2019. Huggingface's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.

Kristian Woodsend and Mirella Lapata. 2011. Learn-
Comparative evaluation of various simplification ing to simplify sentences with quasi-synchronous
strategies. In Proceedings of the international con- grammar and integer programming. In Proceedings
ference recent advances in natural language pro- of the 2011 Conference on Empirical Methods in
cessing, pages 618–626. Natural Language Processing, pages 409–420.
Sanja Štajner and Maja Popović. 2016. Can text simpli- Sander Wubben, Antal van den Bosch, and Emiel Krah-
fication help machine translation? In Proceedings of mer. 2012. Sentence simplification by monolingual
the 19th Annual Conference of the European Associ- machine translation. In Proceedings of the 50th An-
ation for Machine Translation, pages 230–242. nual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pages 1015–
Sanja Štajner and Maja Popović. 2019. Automated 1024, Jeju Island, Korea. Association for Computa-
text simplification as a preprocessing step for ma- tional Linguistics.
chine translation into an under-resourced language.
In Proceedings of the International Conference on Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze
Recent Advances in Natural Language Processing Chen, and Chris Callison-Burch. 2016. Optimizing
(RANLP 2019), pages 1141–1150. statistical machine translation for text simplification.
Transactions of the Association for Computational
Elior Sulem, Omri Abend, and Ari Rappoport. 2018. Linguistics, 4:401–415.
BLEU is not suitable for the evaluation of text sim-
plification. In Proceedings of the 2018 Conference Xingxing Zhang and Mirella Lapata. 2017. Sentence
on Empirical Methods in Natural Language Process- simplification with deep reinforcement learning. In
ing, pages 738–744, Brussels, Belgium. Association Proceedings of the 2017 Conference on Empirical
for Computational Linguistics. Methods in Natural Language Processing, pages
584–594, Copenhagen, Denmark. Association for
Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Computational Linguistics.
and Karthik Sankaranarayanan. 2019. Unsupervised
neural text simplification. In Proceedings of the Sanqiang Zhao, Rui Meng, Daqing He, Saptono Andi,
57th Annual Meeting of the Association for Com- and Parmanto Bambang. 2018a. Integrating trans-
putational Linguistics, pages 2058–2068, Florence, former and paraphrase rules for sentence simplifica-
Italy. Association for Computational Linguistics. tion. arXiv preprint arXiv:1810.11193.
Sanqiang Zhao, Rui Meng, Daqing He, Andi Saptono,
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob
and Bambang Parmanto. 2018b. Integrating trans-
Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz
former and paraphrase rules for sentence simplifi-
Kaiser, and Illia Polosukhin. 2017. Attention is all
cation. In Proceedings of the 2018 Conference on
you need. In Advances in neural information pro-
Empirical Methods in Natural Language Processing,
cessing systems, pages 5998–6008.
pages 3164–3173, Brussels, Belgium. Association
Tu Vu, Baotian Hu, Tsendsuren Munkhdalai, and Hong for Computational Linguistics.
Yu. 2018. Sentence simplification with memory-
Yanbin Zhao, Lu Chen, Zhi Chen, and Kai Yu.
augmented neural networks. In Proceedings of the
2020. Semi-supervised text simplification with
2018 Conference of the North American Chapter of
back-translation and asymmetric denoising autoen-
the Association for Computational Linguistics: Hu-
coders. In Proceedings of the AAAI Conference on
man Language Technologies, Volume 2 (Short Pa-
Artificial Intelligence, volume 34, pages 9668–9675.
pers), pages 79–85, New Orleans, Louisiana. Asso-
ciation for Computational Linguistics. Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych.
2010. A monolingual tree-based translation model
Willian Massami Watanabe, Arnaldo Candido Junior,
for sentence simplification. In Proceedings of the
Vinı́cius Rodriguez Uzêda, Renata Pontin de Mattos
23rd International Conference on Computational
Fortes, Thiago Alexandre Salgueiro Pardo, and San-
Linguistics (Coling 2010), pages 1353–1361.
dra Maria Aluı́sio. 2009. Facilita: reading assistance
for low-literacy readers. In Proceedings of the 27th
351
A Human Evaluation Interface
Figure 2: Our interface is based on the one proposed by Kriz et al. (2019), and the consent form is based on Alva-Manchego et al. (2020).