
Transformer and seq2seq model for Paraphrase Generation

Elozino Egonmwan and Yllias Chali


University of Lethbridge
Lethbridge, AB, Canada
{elozino.egonmwan, yllias.chali}@uleth.ca

Abstract

Paraphrase generation aims to improve the clarity of a sentence by using different wording that conveys similar meaning. For better quality of generated paraphrases, we propose a framework that combines the effectiveness of two models – transformer and sequence-to-sequence (seq2seq). We design a two-layer stack of encoders. The first layer is a transformer model containing 6 stacked identical layers with multi-head self-attention, while the second layer is a seq2seq model with gated recurrent units (GRU-RNN). The transformer encoder layer learns to capture long-term dependencies, together with syntactic and semantic properties of the input sentence. This rich vector representation learned by the transformer serves as input to the GRU-RNN encoder responsible for producing the state vector for decoding. Experimental results on two datasets – QUORA and MSCOCO – show that our framework produces a new benchmark for paraphrase generation.

1 Introduction

Paraphrasing is a key abstraction technique used in Natural Language Processing (NLP). While capable of generating novel words, it also learns to compress or remove unnecessary words along the way, and it thus lends itself gainfully to abstractive summarization (Chen and Bansal, 2018; Gehrmann et al., 2018) and question generation (Song et al., 2018) for machine reading comprehension (MRC) (Dong et al., 2017). Paraphrases can also be used as simpler alternatives to input sentences for machine translation (MT) (Callison-Burch et al., 2006) as well as for the evaluation of natural language generation (NLG) texts (Apidianaki et al., 2018).

Existing methods for generating paraphrases fall into one of these broad categories – rule-based (McKeown, 1983), seq2seq (Prakash et al., 2016), reinforcement learning (Li et al., 2018), deep generative models (Iyyer et al., 2018) and a varied combination (Gupta et al., 2018; Mallinson et al., 2017) of the latter three.

In this paper, we propose a novel framework for paraphrase generation that utilizes the transformer model of Vaswani et al. (2017) and the seq2seq model of Sutskever et al. (2014), specifically GRU (Cho et al., 2014). The multi-head self-attention of the transformer complements the seq2seq model with its ability to learn long-range dependencies in the input sequence. Also, the individual attention heads in the transformer model mimic behavior related to the syntactic and semantic structure of the sentence (Vaswani et al., 2017, 2018), which is key in paraphrase generation. Furthermore, we use GRU to obtain a fixed-size state vector for decoding into variable-length sequences, given the more qualitative learned vector representations from the transformer.

The main contributions of this work are:

• We propose a novel framework for the task of paraphrase generation that produces quality paraphrases of its source sentence.

• For in-depth analysis of our results, in addition to using BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004), which are word-overlap based, we further evaluate our model using qualitative metrics with stronger correlation with human reference: Embedding Average Cosine Similarity (EACS) and Greedy Matching Score (GMS) from Sharma et al. (2017), and METEOR (Banerjee and Lavie, 2005).

2 Task Definition

Given an input sentence S = (s_1, ..., s_n) with n words, the task is to generate an alternative output sentence Y = (y_1, ..., y_m) | ∃ y_m ∉ S, with m words, that conveys similar semantics as S, where preferably, but not necessarily, m < n.
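As a concrete instance of this notation, consider one of the QUORA pairs shown later in Table 1; the lower-casing and whitespace tokenization below are illustrative assumptions, not the paper's preprocessing:

```python
# Task notation illustrated with a QUORA source/paraphrase pair from Table 1.
S = "how can i lose fat without doing any aerobic physical activity".split()
Y = "how can i lose weight without exercise".split()

n, m = len(S), len(Y)                  # n = 11 source words, m = 7 output words
assert any(y not in S for y in Y)      # ∃ y_m ∉ S: "weight" and "exercise" are novel words
assert m < n                           # preferred by the definition, though not required
```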

S: What are the dumbest questions ever asked on Quora?
G: what is the stupidest question on quora?
R: What is the most stupid question asked on Quora?

S: How can I lose fat without doing any aerobic physical activity
G: how can i lose weight without exercise?
R: How can I lose weight in a month without doing exercise?

S: How did Donald Trump won the 2016 USA presidential election?
G: how did donald trump win the 2016 presidential
R: How did Donald Trump become president?

Table 1: Examples of our generated paraphrases on the QUORA sampled test set, where S, G, R represent Source, Generated and Reference sentences respectively.

S: Three dimensional rendering of a kitchen area with various appliances.
G: a series of photographs of a kitchen
R: A series of photographs of a tiny model kitchen

S: a young boy in a soccer uniform kicking a ball
G: a young boy kicking a soccer ball
R: A young boy kicking a soccer ball on a green field.

S: The dog is wearing a Santa Claus hat.
G: a dog poses with santa hat
R: A dog poses while wearing a santa hat.

S: the people are sampling wine at a wine tasting.
G: a group of people wine tasting.
R: Group of people tasting wine next to some barrels.

Table 2: Examples of our generated paraphrases on the MSCOCO sampled test set, where S, G, R represent Source, Generated and Reference sentences respectively.

3 Method

In this section, we present our framework for paraphrase generation. It follows the popular encode-decode paradigm, but with two stacked layers of encoders. The first encoding layer is a transformer encoder, while the second encoding layer is a GRU-RNN encoder. The paraphrase of a given sentence is generated by a GRU-RNN decoder.

3.1 Stacked Encoders

3.1.1 Encoder – TRANSFORMER

We use the transformer encoder as a sort of pre-training module for our input sentence. The goal is to learn a richer representation of the input vector that better handles long-term dependencies and captures syntactic and semantic properties before obtaining a fixed-state representation for decoding into the desired output sentence. The transformer contains 6 stacked identical layers mainly driven by self-attention, as implemented by Vaswani et al. (2017, 2018).

3.1.2 Encoder – GRU-RNN

Our architecture uses a single-layer uni-directional GRU-RNN whose input is the output of the transformer. The GRU-RNN encoder (Chung et al., 2014; Cho et al., 2014) produces a fixed-state vector representation of the transformed input sequence using the following equations:

z = σ(x_t U^z + s_{t-1} W^z)                  (1)
r = σ(x_t U^r + s_{t-1} W^r)                  (2)
h = tanh(x_t U^h + (s_{t-1} ⊙ r) W^h)         (3)
s_t = (1 − z) ⊙ h + z ⊙ s_{t-1}               (4)

where r and z are the reset and update gates respectively, W and U are the network's parameters, s_t is the hidden state vector at timestep t, x_t is the input vector and ⊙ denotes the Hadamard product.
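The update in equations (1)–(4) can be read as the following minimal NumPy sketch; it is shown for illustration only, and the toy dimensions and random initialization are assumptions rather than the authors' settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, U, W):
    """One GRU update following equations (1)-(4)."""
    z = sigmoid(x_t @ U[0] + s_prev @ W[0])         # (1) update gate
    r = sigmoid(x_t @ U[1] + s_prev @ W[1])         # (2) reset gate
    h = np.tanh(x_t @ U[2] + (s_prev * r) @ W[2])   # (3) candidate state; * is the Hadamard product
    return (1.0 - z) * h + z * s_prev               # (4) new hidden state s_t

# Toy setup: 300-d inputs and a 300-d hidden state (the hidden size used in Section 4.2).
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 300, 300))       # U^z, U^r, U^h
W = rng.normal(scale=0.1, size=(3, 300, 300))       # W^z, W^r, W^h

xs = rng.normal(size=(15, 300))                     # stand-in for 15 transformer output vectors
s = np.zeros(300)
for x_t in xs:
    s = gru_step(x_t, s, U, W)
# s is the fixed-state vector handed to the GRU-RNN decoder (Section 3.2).
```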
3.2 Decoder – GRU-RNN

The fixed-state vector representation produced by the GRU-RNN encoder is used as the initial state for the decoder. At each time step, the decoder receives the previously generated word y_{t-1} and the hidden state s_{t-1} from time step t−1. The output word y_t at each time step is a softmax probability of the vector in equation (3) over the set of vocabulary words, V.
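A small sketch of one decoding step as just described; the output projection (W_out, b_out) and the greedy argmax choice are assumptions used for illustration, with beam search keeping the top-k continuations at inference instead:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def decode_step(h, W_out, b_out):
    """Project the candidate vector h from equation (3) onto the vocabulary V
    and pick the next word id greedily from the softmax distribution."""
    probs = softmax(h @ W_out + b_out)
    return int(np.argmax(probs)), probs

# Toy usage with an assumed 300-d state and a 15,000-word vocabulary (Section 4.2).
rng = np.random.default_rng(0)
h = rng.normal(size=300)
W_out, b_out = rng.normal(scale=0.1, size=(300, 15000)), np.zeros(15000)
y_t, probs = decode_step(h, W_out, b_out)
```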

50K
MODEL                               BLEU    METEOR   R-L     EACS    GMS
VAE-SVG-EQ (Gupta et al., 2018)     17.4    22.2     -       -       -
RbM-SL (Li et al., 2018)            35.81   28.12    -       -       -
TRANS (ours)                        35.56   33.89    27.53   79.72   62.91
SEQ (ours)                          34.88   32.10    29.91   78.66   61.45
TRANSEQ (ours)                      37.06   33.73    30.89   80.81   63.63
TRANSEQ + beam (size=6) (ours)      37.12   33.68    30.72   81.03   63.50

100K
MODEL                               BLEU    METEOR   R-L     EACS    GMS
VAE-SVG-EQ (Gupta et al., 2018)     22.90   25.50    -       -       -
RbM-SL (Li et al., 2018)            43.54   32.84    -       -       -
TRANS (ours)                        37.46   36.04    29.73   80.61   64.81
SEQ (ours)                          36.98   34.71    32.06   79.65   63.49
TRANSEQ (ours)                      38.75   35.84    33.23   81.50   65.52
TRANSEQ + beam (size=6) (ours)      38.77   35.86    33.07   81.64   65.42

150K
MODEL                               BLEU    METEOR   R-L     EACS    GMS
VAE-SVG-EQ (Gupta et al., 2018)     38.30   33.60    -       -       -
TRANS (ours)                        39.00   38.68    32.05   81.90   65.27
SEQ (ours)                          38.50   36.89    34.35   80.95   64.13
TRANSEQ (ours)                      40.36   38.49    35.84   82.84   65.99
TRANSEQ + beam (size=6) (ours)      39.82   38.48    35.40   82.48   65.54

Table 3: Performance of our model against various models on the QUORA dataset with 50k, 100k and 150k training examples. R-L refers to the ROUGE-L F1 score with 95% confidence interval.

4 Experiments

We describe baselines, our implementation settings, datasets and evaluation of our proposed model.

4.1 Baselines

We compare our model with very recent models (Gupta et al., 2018; Li et al., 2018; Prakash et al., 2016) including the current state-of-the-art (Gupta et al., 2018) in the field. To further highlight the gain of stacking 2 encoders we use each component – Transformer (TRANS) and seq2seq (SEQ) – as baselines.

• VAE-SVG-EQ (Gupta et al., 2018): This is the current state-of-the-art in the field, with a variational autoencoder as its main component.

• RbM-SL (Li et al., 2018): Different from the encoder-decoder framework, this is a generator-evaluator framework, with the evaluator trained by reinforcement learning.

• Residual LSTM (Prakash et al., 2016): This implements stacked residual long short term memory networks (LSTM).

• TRANS: Encoder-decoder framework as described in Section 3 but with a single transformer encoder layer.

• SEQ: Encoder-decoder framework as described in Section 3 but with a single GRU-RNN encoder layer.

4.2 Implementation

We used pre-trained 300-dimensional GloVe word-embeddings (Pennington et al., 2014; https://nlp.stanford.edu/projects/glove/) as the distributed representation of our input sentences. We set the maximum sentence length to 15 and 10 respectively for our input and target sentences, following the statistics of our dataset.
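A brief sketch of this input representation (an illustration under stated assumptions, not the authors' preprocessing code); the GloVe file path is hypothetical:

```python
import numpy as np

def load_glove(path="glove.6B.300d.txt"):  # hypothetical local copy of the GloVe vectors
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def embed(sentence, vectors, max_len=15, dim=300):
    """Truncate/pad a sentence to a [max_len, dim] matrix of word vectors,
    using the maximum source length of 15 reported above."""
    rows = [vectors.get(w, np.zeros(dim, dtype=np.float32))   # zeros for out-of-vocabulary words
            for w in sentence.lower().split()[:max_len]]
    rows += [np.zeros(dim, dtype=np.float32)] * (max_len - len(rows))
    return np.stack(rows)
```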

MODEL                                BLEU    METEOR   R-L     EACS    GMS
Residual LSTM (Prakash et al., 2016) 37.0    27.0     -       -       -
VAE-SVG-EQ (Gupta et al., 2018)      41.7    31.0     -       -       -
TRANS (ours)                         41.8    38.5     33.4    79.6    70.3
SEQ (ours)                           40.7    36.9     35.8    78.9    70.0
TRANSEQ (ours)                       43.4    38.3     37.4    80.5    71.1
TRANSEQ + beam (size=10) (ours)      44.5    40.0     38.4    81.9    71.3

Table 4: Performance of our model against various models on the MSCOCO dataset. R-L refers to the ROUGE-L F1 score with 95% confidence interval.

For the transformer encoder, we used the transformer_base hyperparameter setting from the tensor2tensor library (Vaswani et al., 2018; https://github.com/tensorflow/tensor2tensor), but set the hidden size to 300. We set dropout to 0.0 and 0.7 for the MSCOCO and QUORA datasets respectively. We used a large dropout for QUORA because the model tends to over-fit to the training set. Both the GRU-RNN encoder and decoder contain 300 hidden units.

We pre-process our datasets, and do not use the pre-processed/tokenized versions of the datasets from the tensor2tensor library. Our target vocabulary is a set of approximately 15,000 words. It contains words in our target training and test sets that occur at least twice. Using this subset of vocabulary words, as opposed to the over 320,000 vocabulary words contained in GloVe, improves both training time and performance of the model.

We train and evaluate our model after each epoch with a fixed learning rate of 0.0005, and stop training when the validation loss does not decrease after 5 epochs. The model learns to minimize the seq2seq loss implemented in the tensorflow API (https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/sequence_loss) with AdamOptimizer. We use greedy-decoding during training and validation and set the maximum number of iterations to 5 times the target sentence length. For testing/inference we use beam-search decoding.
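A hedged sketch of this objective (not the authors' code): the TensorFlow 1.x sequence loss referenced above, optimized with Adam at the fixed learning rate; the tensor shapes follow the reported target length and vocabulary size but are otherwise assumptions:

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API

TARGET_LEN, HIDDEN, VOCAB = 10, 300, 15000
dec_out = tf.placeholder(tf.float32, [None, TARGET_LEN, HIDDEN])  # GRU decoder outputs (stand-in)
targets = tf.placeholder(tf.int32, [None, TARGET_LEN])            # gold word ids
weights = tf.placeholder(tf.float32, [None, TARGET_LEN])          # 1.0 for real tokens, 0.0 for padding

logits = tf.layers.dense(dec_out, VOCAB)  # trainable projection onto the ~15,000-word vocabulary
loss = tf.contrib.seq2seq.sequence_loss(logits, targets, weights)
train_op = tf.train.AdamOptimizer(learning_rate=0.0005).minimize(loss)
# Early stopping: training halts once the validation loss has not decreased for 5 epochs.
```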
4.3 Datasets

We evaluate our model on two standard datasets for paraphrase generation – QUORA (https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) and MSCOCO (Lin et al., 2014) – as described in Gupta et al. (2018), and used similar settings. The QUORA dataset contains over 120k examples with an 80k and 40k split on the training and test sets respectively. As seen in Tables 1 and 2, while the QUORA dataset contains question pairs, MSCOCO contains free-form texts which are human annotations of images. Subjective observation of the MSCOCO dataset reveals that most of its paraphrase pairs contain more novel words as well as syntactic manipulations than the QUORA pairs, making it a more interesting paraphrase generation corpus. We split the QUORA dataset into 50k, 100k and 150k training samples and 4k testing samples in order to align with baseline models for comparative purposes.

4.4 Evaluation

For quantitative analysis of our model, we use popular automatic metrics such as BLEU, ROUGE and METEOR. Since BLEU and ROUGE both measure n-gram word-overlap with a difference in brevity penalty, we report just the ROUGE-L value. We also use 2 additional recent metrics – GMS and EACS by Sharma et al. (2017) (https://github.com/Maluuba/nlg-eval) – that measure the similarity between the reference and generated paraphrases based on the cosine similarity of their embeddings on word and sentence levels respectively.

4.5 Result Analysis

Tables 3 and 4 report scores of our model on both datasets. Our model pushes the benchmark on all evaluation metrics compared against current published top models evaluated on the same datasets. Since several words could connote similar meaning, it is more logical to evaluate with metrics that match with embedding vectors capable of measuring this similarity. Hence we also report GMS and EACS scores as a basis of comparison for future work in this direction.
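A minimal NumPy sketch of the two embedding-based metrics as they are commonly defined (the paper relies on the nlg-eval implementation linked above); the random vectors stand in for GloVe embeddings and are an assumption:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def eacs(ref, hyp, emb):
    """Embedding Average Cosine Similarity: cosine between the mean word vectors."""
    return cosine(np.mean([emb[w] for w in ref], axis=0),
                  np.mean([emb[w] for w in hyp], axis=0))

def gms(ref, hyp, emb):
    """Greedy Matching Score: greedily match each word with its most similar
    word on the other side, then average over both directions."""
    def one_way(a, b):
        return np.mean([max(cosine(emb[w], emb[v]) for v in b) for w in a])
    return 0.5 * (one_way(ref, hyp) + one_way(hyp, ref))

# Toy usage on the MSCOCO example from Table 2, with random stand-in vectors.
rng = np.random.default_rng(0)
ref = "a dog poses while wearing a santa hat".split()
hyp = "a dog poses with santa hat".split()
emb = {w: rng.normal(size=300) for w in set(ref + hyp)}
print(round(eacs(ref, hyp, emb), 3), round(gms(ref, hyp, emb), 3))
```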

Besides quantitative values, Tables 1 and 2 show that our paraphrases are well formed, abstractive (e.g. dumbest – stupidest, dog is wearing – dog poses), capable of performing syntactic manipulations (e.g. in a soccer uniform kicking a ball – kicking a soccer ball) and compression. Some of our paraphrased sentences even have more brevity than the reference, and still remain very meaningful.

5 Related Work

Our baseline models – VAE-SVG-EQ (Gupta et al., 2018) and RbM-SL (Li et al., 2018) – are both deep learning models. While the former uses a variational autoencoder and is capable of generating multiple paraphrases of a given sentence, the latter uses deep reinforcement learning. In tune with part of our approach, i.e. seq2seq, there exist ample models with interesting variants – residual LSTM (Prakash et al., 2016), bi-directional GRU with attention and special decoding tweaks (Cao et al., 2017), and attention from the perspective of semantic parsing (Su and Yan, 2017).

MT has been greatly used to generate paraphrases (Quirk et al., 2004; Zhao et al., 2008) due to the availability of large corpora, while much earlier works explored the use of manually drafted rules (Hassan et al., 2007; Kozlowski et al., 2003).

Similar to our model architecture, Chen et al. (2018) combined transformers and RNN-based encoders for MT. Zhao et al. (2018) recently used the transformer model for paraphrasing on different datasets. We experimented using solely a transformer but got better results with TRANSEQ. To the best of our knowledge, our work is the first to cross-breed the transformer and seq2seq for the task of paraphrase generation.

6 Conclusions

We proposed a novel framework, TRANSEQ, that combines the efficiency of a transformer and a seq2seq model and improves the current state-of-the-art on the QUORA and MSCOCO paraphrasing datasets. Besides quantitative results, we presented examples that highlight the syntactic and semantic quality of our generated paraphrases. In the future, it will be interesting to apply this framework to the task of abstractive text summarization and other NLG-related problems.

Acknowledgments

We would like to thank the anonymous reviewers for their useful comments. The research reported in this paper was conducted at the University of Lethbridge and supported by Alberta Innovates and Alberta Education.

References

Marianna Apidianaki, Guillaume Wisniewski, Anne Cocos, and Chris Callison-Burch. 2018. Automated paraphrase lattice creation for HyTER machine translation evaluation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 480–485.

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72.

Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 17–24. Association for Computational Linguistics.

Ziqiang Cao, Chuwei Luo, Wenjie Li, and Sujian Li. 2017. Joint copying and restricted generation for paraphrase. In Thirty-First AAAI Conference on Artificial Intelligence.

Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, et al. 2018. The best of both worlds: Combining recent advances in neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 76–86.

Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–686.

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Li Dong, Jonathan Mallinson, Siva Reddy, and Mirella Lapata. 2017. Learning to paraphrase for question answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 875–886.

Sebastian Gehrmann, Yuntian Deng, and Alexander Rush. 2018. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4098–4109.

Ankush Gupta, Arvind Agarwal, Prawaan Singh, and Piyush Rai. 2018. A deep generative framework for paraphrase generation. In Thirty-Second AAAI Conference on Artificial Intelligence.

Samer Hassan, Andras Csomai, Carmen Banea, Ravi Sinha, and Rada Mihalcea. 2007. UNT: Subfinder: Combining knowledge sources for automatic lexical substitution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 410–413.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial example generation with syntactically controlled paraphrase networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885.

Raymond Kozlowski, Kathleen F McCoy, and K Vijay-Shanker. 2003. Generation of single-sentence paraphrases from predicate/argument structure using lexico-grammatical resources. In Proceedings of the Second International Workshop on Paraphrasing - Volume 16, pages 1–8. Association for Computational Linguistics.

Zichao Li, Xin Jiang, Lifeng Shang, and Hang Li. 2018. Paraphrase generation with deep reinforcement learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3865–3878.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer.

Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2017. Paraphrasing revisited with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 881–893.

Kathleen R McKeown. 1983. Paraphrasing questions using given and new information. Computational Linguistics, 9(1):1–10.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Aaditya Prakash, Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, and Oladimeji Farri. 2016. Neural paraphrase generation with stacked residual LSTM networks. arXiv preprint arXiv:1610.03098.

Chris Quirk, Chris Brockett, and William Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 142–149.

Shikhar Sharma, Layla El Asri, Hannes Schulz, and Jeremie Zumer. 2017. Relevance of unsupervised metrics in task-oriented dialogue for evaluating natural language generation. arXiv preprint arXiv:1706.09799.

Linfeng Song, Zhiguo Wang, Wael Hamza, Yue Zhang, and Daniel Gildea. 2018. Leveraging context information for natural question generation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 569–574.

Yu Su and Xifeng Yan. 2017. Cross-domain semantic parsing via paraphrasing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1235–1246.

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, et al. 2018. Tensor2tensor for neural machine translation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pages 193–199.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Sanqiang Zhao, Rui Meng, Daqing He, Andi Saptono, and Bambang Parmanto. 2018. Integrating transformer and paraphrase rules for sentence simplification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3164–3173, Brussels, Belgium. Association for Computational Linguistics.

Shiqi Zhao, Cheng Niu, Ming Zhou, Ting Liu, and Sheng Li. 2008. Combining multiple resources to improve SMT-based paraphrasing model. In Proceedings of ACL-08: HLT, pages 1021–1029.
