A Sentiment-Controllable Topic-to-Essay Generator With Topic Knowledge Graph

Abstract

Generating a vivid, novel, and diverse essay with only several given topic words is a challenging task of natural language generation. In previous work, two problems are left unsolved: the neglect of the sentiment beneath the text and the insufficient utilization of topic-related knowledge. Therefore, we propose a novel Sentiment-Controllable topic-to-essay generator ...

[Figure 1: example essays for the topics "Love", "Emotion", and "Experience" under different sentence-level sentiment labels (positive / negative), e.g., a positive essay ("It's been half a year since I fell in love with my boyfriend. He treats me very well...") versus one that turns negative ("It's been half a year since I fell in love with my boyfriend. But these few months my boyfriend rarely contacted me...") and another example ("I get addicted to smoking after broke up with him. I hope someone can comfort and encourage me...").]
1 Introduction

... polarity for each sentence. Therefore, the ability to control sentiment is essential for improving discourse-level diversity in the topic-to-essay generation (TEG) task.

As for the other problem, when we humans are asked to write articles on some topics, we rely heavily on commonsense knowledge related to those topics. The proper usage of knowledge therefore plays a vital role in topic-to-essay generation. The previous state-of-the-art method (Yang et al., 2019) extracts topic-related concepts from a commonsense knowledge base to enrich the input information. However, it ignores the graph structure of the knowledge base: it merely refers to the concepts in the knowledge graph and fails to consider their correlations, leaving the concepts isolated from each other. For instance, given the two knowledge triples (law, antonym, disorder) and (law, part of, theory) about the topic word law, Yang et al. (2019) simply use the neighboring concepts disorder and theory as a supplement to the input information. However, their method fails to learn that disorder has the opposite meaning to law while theory is a hypernym of law, which can be learned from their edges (correlations) in the knowledge graph. Intuitively, lacking the correlation information between concepts in the knowledge graph hinders a model from generating appropriate and informative essays.

To address these issues, we propose a novel Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph enhanced decoder, named SCTKG, which is based on the conditional variational auto-encoder (CVAE) framework. To control the sentiment of the text, we inject sentiment information into the encoder and the decoder of our model, controlling the sentiment at both the sentence level and the word level. The sentiment labels are provided by a sentiment classifier during training. To fully utilize the knowledge, the model retrieves a topic knowledge graph from the large-scale commonsense knowledge base ConceptNet (Speer and Havasi, 2012). Different from Yang et al. (2019), we preserve the graph structure of the knowledge base and propose a novel Topic Graph Attention (TGA) mechanism. TGA attentively reads the knowledge graph and makes full use of its structured, connected semantic information for better generation. In the meantime, to make the generated essays surround the semantics of all input topics more closely, we adopt adversarial training based on a multi-label discriminator. The discriminator provides the reward to the generator based on how well the output covers the given topics.

Our contributions can be summarized as follows:

1. We propose a sentiment-controllable topic-to-essay generator based on CVAE, which can generate high-quality essays as well as control the sentiment. To the best of our knowledge, we are the first to control sentiment in TEG, and we demonstrate the potential of our model to generate diverse essays by controlling the sentiment.

2. We equip our decoder with a topic knowledge graph and propose a novel Topic Graph Attention (TGA) mechanism. TGA makes full use of the structured, connected semantic information in the topic knowledge graph to generate more appropriate and informative essays.

3. We conduct extensive experiments showing that our model accurately controls the sentiment and outperforms state-of-the-art methods in both automatic and human evaluations.

2 Task Formulation

The traditional TEG task takes as input a topic sequence X = (x1, ..., xm) with m words and aims to generate an essay with M sentences (L1, ..., LM) corresponding to the topic sequence X. In this paper, we additionally provide a sentiment sequence S = (s1, ..., sM), each element of which corresponds to a target sentence in the essay. Each sentiment can be positive, negative, or neutral.

Essays are generated in a sentence-by-sentence manner. The first sentence L1 is generated conditioned only on the topic sequence X; the model then takes all previously generated sentences as well as the topic sequence to generate the next sentence, until the entire essay is completed. In this paper, we denote the previous sentences L1:i−1 as the context.
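To make the formulation concrete, here is a minimal sketch of this sentence-by-sentence generation loop; generate_sentence is a hypothetical placeholder for the model described in the next section, not part of any released code.

```python
from typing import Callable, List

def generate_essay(topics: List[str],
                   sentiments: List[str],
                   generate_sentence: Callable[..., str]) -> List[str]:
    """Sentence-by-sentence TEG with per-sentence sentiment labels.

    topics:      topic sequence X = (x1, ..., xm)
    sentiments:  sentiment sequence S = (s1, ..., sM), one label per target sentence
    generate_sentence: hypothetical model call taking (topics, context, sentiment)
    """
    essay = []                      # previously generated sentences L_{1:i-1}
    for s_i in sentiments:          # each label is "positive", "negative", or "neutral"
        # L1 is conditioned only on X (empty context); later sentences also
        # condition on all previously generated sentences.
        next_sentence = generate_sentence(topics, context=essay, sentiment=s_i)
        essay.append(next_sentence)
    return essay
```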
3 Model Description

In this section, we give an overview of our proposed model. Our SCTKG generator is based on a CVAE architecture and consists of an encoder and a topic knowledge graph enhanced decoder.
Figure 2: The architecture of our model. ⊕ denotes the vector concatenation operation. Only the part with solid lines and the red dotted arrow is applied at inference, while the entire CVAE except the red dotted arrow is used during training. The sentiment label s with blue arrows denotes sentiment control. Red solid lines denote TGA at each decoding step. The text generated by the SCTKG generator is fed to the topic label discriminator; the m blue circles above represent the probabilities that the text is real text matching each of the m input topics, and the green circle represents the probability that the given text is generated.
The encoder encodes the topic sequence, the sentiment, and the context and regards them as the conditional variable c. A latent variable z is then computed from c through a recognition network (during training) or a prior network (during inference). The decoder attaches a topic knowledge graph and a sentiment label to generate the text. At each decoding step, TGA is used to enrich the input topic information by effectively utilizing the topic knowledge graph.

We adopt a two-stage training approach: (1) train the SCTKG generator with the conventional CVAE loss; (2) after the first stage is done, introduce a topic label discriminator to evaluate the output of the SCTKG generator, and adopt adversarial training to alternately train the generator and the discriminator, further enhancing the performance of the SCTKG generator.

3.1 SCTKG Generator

3.1.1 Encoder

As shown in Figure 2, the utterance encoder is a bidirectional GRU (Chung et al., 2014) that encodes an input sequence into a fixed-size vector by concatenating the last hidden states of the forward and backward GRUs. We use the utterance encoder to encode the topic sequence X into hx ∈ R^d (the concatenation of the forward and backward final states), where d is the dimension of the vector. The next sentence Li is also encoded by the utterance encoder into hi ∈ R^d. For the context encoder, we use a hierarchical encoding strategy: each sentence in the context L1:i−1 is first encoded by the utterance encoder to obtain a fixed-size vector, so that the context L1:i−1 is encoded into hcontext = [h1, h2, ..., hi−1]; then a single-layer forward GRU encodes the sentence representations hcontext into a final state vector hc ∈ R^d.

The concatenation of hc, hx, and e(s) then serves as the conditional vector c = [e(s); hc; hx], where e(s) is the embedding of the sentiment label s. We assume that z follows a multivariate Gaussian distribution with a diagonal covariance matrix, so the recognition network qφ(z|hi, c) and the prior network pθ(z|c) follow N(µ, σ²I) and N(µ′, σ′²I), respectively, where I is the identity matrix, and

µ, σ² = MLP_recognition(hi, c),
µ′, σ′² = MLP_prior(c).        (1)

Additionally, we use the reparametrization trick (Kingma and Welling, 2013) to sample z from the recognition network during training and from the prior network during testing.
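For illustration, a minimal PyTorch sketch of the recognition and prior networks with reparametrized sampling (Eq. 1); the module layout and layer sizes are assumptions made for this example, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    """Recognition and prior networks of a CVAE (Eq. 1) with reparameterized sampling."""

    def __init__(self, d_cond: int, d_sent: int, d_z: int):
        super().__init__()
        # recognition net sees the encoded target sentence h_i and the condition c
        self.recognition = nn.Linear(d_sent + d_cond, 2 * d_z)
        # prior net sees only the condition c
        self.prior = nn.Linear(d_cond, 2 * d_z)

    def forward(self, h_i, c, training: bool = True):
        if training:   # q_phi(z | h_i, c)
            mu, log_var = self.recognition(torch.cat([h_i, c], dim=-1)).chunk(2, dim=-1)
        else:          # p_theta(z | c)
            mu, log_var = self.prior(c).chunk(2, dim=-1)
        # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return z, mu, log_var
```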
3.1.2 Decoder

A general Seq2seq model tends to emit generic and meaningless sentences. To create more meaningful essays, we propose a topic knowledge graph enhanced decoder. The decoder is based on a one-layer GRU network with initial state d0 = Wd[z, c, e(s)] + bd, where Wd and bd are trainable decoder parameters and e(s) is the sentiment embedding mentioned above. As shown in Figure 2, we equip the decoder with a topic knowledge graph to incorporate commonsense knowledge from ConceptNet (https://conceptnet.io). ConceptNet is a semantic network consisting of triples R = (head, rel, tail): the head concept head has the relation rel with the tail concept tail. We use word vectors to represent the head and tail concepts and learn a trainable vector r for each relation rel, which is randomly initialized. Each word in the topic sequence is used as a query to retrieve a subgraph from ConceptNet, and the topic knowledge graph is constituted by these subgraphs.
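As a rough illustration of the retrieval step just described, the sketch below queries the public ConceptNet web API for the edges around one topic word and keeps them as (head, rel, tail) triples; the paper does not specify how the subgraphs were extracted, so the endpoint, the English-language prefix (the corpus used later is Chinese), and the edge limit are assumptions made for this example.

```python
import requests

def retrieve_subgraph(topic_word: str, limit: int = 50):
    """Fetch (head, rel, tail) triples around one topic word from ConceptNet."""
    url = f"http://api.conceptnet.io/c/en/{topic_word.lower()}"
    data = requests.get(url, params={"limit": limit}).json()
    triples = []
    for edge in data.get("edges", []):
        head = edge["start"]["label"]   # e.g. "law"
        rel = edge["rel"]["label"]      # e.g. "Antonym", "PartOf"
        tail = edge["end"]["label"]     # e.g. "disorder"
        triples.append((head, rel, tail))
    return triples

# The topic knowledge graph is then the union of the per-topic subgraphs, e.g.:
# topic_graph = [t for word in topics for t in retrieve_subgraph(word)]
```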
We then use the Topic Graph Attention (TGA) mechanism to read from the topic knowledge graph at each generation step.

Topic Graph Attention. As previously stated, proper usage of the external knowledge plays a vital role in our task. TGA takes as input the retrieved topic knowledge graph and a query vector q and produces a graph vector gt. We set q = [dt−1; c; z], where dt−1 is the decoder hidden state at step t − 1. At each decoding step, we calculate a correlation score between q and each triple in the graph, and then use these scores to compute a weighted sum of all neighboring concepts of the topic words, which forms the final graph vector gt. Neighboring concepts are entities that directly link to topic words. (As shown in Figure 2, in the topic knowledge graph, red circles denote the topic words and blue circles denote their neighboring concepts. Since the topic information is already encoded in the encoder, the graph vector gt mainly focuses on the neighboring concepts to assist generation.) We formalize the computation as follows:

gt = Σ_{n=1}^{N} αn on,                              (2)

αn = exp(βn) / Σ_{j=1}^{N} exp(βj),                  (3)

βn = (W1 q)^T tanh(W2 rn + W3 on)   if on ∈ S1,
     (W1 q)^T tanh(W2 rn + W4 on)   if on ∈ S2,      (4)

where on is the embedding of the n-th neighboring concept and rn is the embedding of the relation in the n-th triple of the topic knowledge graph. W1, W2, W3, and W4 are weight matrices for the query, the relations, the head entities, and the tail entities, respectively. S1 contains the neighboring concepts that are the head concepts of their triples, while S2 contains those that are the tail concepts. The matching score βn represents the correlation between the query q and the neighboring concept on. Essentially, the graph vector gt is a weighted sum of the neighboring concepts of the topic words. Note that we use different weight matrices to distinguish neighboring concepts in different positions (head or tail). This distinction is necessary: given the two knowledge triples (Big Ben, part of, London) and (London, part of, England), even though the concepts Big Ben and England are both neighboring concepts of London with the same relation part of, they have different meanings with regard to London. We model this difference with W3 and W4.
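A minimal PyTorch sketch of TGA (Eqs. 2–4) is given below; the tensor shapes and the module layout are assumptions for the example rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicGraphAttention(nn.Module):
    """Attention over the neighboring concepts of the topic words (Eqs. 2-4)."""

    def __init__(self, d_query: int, d_rel: int, d_concept: int, d_att: int):
        super().__init__()
        self.W1 = nn.Linear(d_query, d_att, bias=False)    # query
        self.W2 = nn.Linear(d_rel, d_att, bias=False)      # relations
        self.W3 = nn.Linear(d_concept, d_att, bias=False)  # neighbors that are heads (S1)
        self.W4 = nn.Linear(d_concept, d_att, bias=False)  # neighbors that are tails (S2)

    def forward(self, q, rel_emb, concept_emb, is_head):
        """
        q:           [d_query]      query vector [d_{t-1}; c; z]
        rel_emb:     [N, d_rel]     relation embedding r_n of each triple
        concept_emb: [N, d_concept] embedding o_n of each neighboring concept
        is_head:     [N] bool       True if o_n is the head concept of its triple
        """
        proj_head = self.W2(rel_emb) + self.W3(concept_emb)   # used when o_n in S1
        proj_tail = self.W2(rel_emb) + self.W4(concept_emb)   # used when o_n in S2
        proj = torch.where(is_head.unsqueeze(-1), proj_head, proj_tail)
        beta = torch.tanh(proj) @ self.W1(q)                  # [N] matching scores (Eq. 4)
        alpha = F.softmax(beta, dim=0)                        # Eq. 3
        g_t = alpha @ concept_emb                             # Eq. 2: weighted sum of o_n
        return g_t
```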
The final probability of generating a word is then computed as

Pt = softmax(Wo[dt; e(s); gt] + bo),

where dt is the decoder state at step t and Wo ∈ R^{dmodel×|V|}, bo ∈ R^{|V|} are trainable decoder parameters; dmodel is the dimension of [dt; e(s); gt] and |V| is the vocabulary size.

3.2 Topic Label Discriminator

Another concern is that the generated texts should be closely related to the topic words. To this end, at the second training stage, a topic label discriminator is introduced to perform adversarial training with the SCTKG generator. In a max-min game, the SCTKG generator generates essays so as to make the discriminator consider them semantically matched with the given topics, while the discriminator tries to distinguish the generated essays from real essays. In detail, suppose there are m topics in total; the discriminator then produces a sigmoid probability distribution over (m + 1) classes. The score at the (m + 1)-th index represents the probability that the sample is generated text, and the score at the j-th index (j ∈ {1, ..., m}) represents the probability that it belongs to real text with the j-th topic. The discriminator is a CNN text classifier (Kim, 2014).

3.3 Training

We introduce our two-stage training method in this section. Stage 1: Similar to a conventional CVAE model, the loss of our SCTKG generator, −log p(Y|c), can be expressed as

−L(θ, φ; c, Y)_cvae = L_KL + L_decoder = KL(qφ(z|Y, c) ‖ pθ(z|c)) − E_{qφ(z|Y,c)}[log pD(Y|z, c)].    (5)

Here, θ and φ are the parameters of the prior network and the recognition network, respectively. Intuitively, L_decoder maximizes the sentence generation probability after sampling from the recognition network, while L_KL minimizes the distance between the prior and recognition networks. Besides, we use the annealing trick and the BOW loss (Zhao et al., 2017) to alleviate the vanishing latent variable problem in VAE training.

Stage 2: After training the SCTKG generator with Equation (5), inspired by SeqGAN (Yu et al., 2017), we adopt adversarial training between the generator and the topic label discriminator described in Section 3.2. We refer the reader to Yu et al. (2017) and Yang et al. (2019) for more details.

4 Experiments

4.1 Datasets

We conduct experiments on the ZHIHU corpus (Feng et al., 2018), which consists of Chinese essays whose length is between 50 and 100 (the dataset can be downloaded from https://pan.baidu.com/s/17pcfWUuQTbcbniT0tBdwFQ). We select topic words based on frequency and remove rare topic words; the total number of topic labels is set to 100. The sizes of the training set and the test set are 27,000 and 2,500 essays, respectively. For tuning hyperparameters, we set aside 10% of the training samples as a validation set.

Sentence-level sentiment labels are required for training our model. To this end, we sample 5,000 sentences from the dataset and annotate them manually with three categories: positive, negative, and neutral. This annotated set is divided into a training set, a validation set, and a test set. We fine-tune the open-source Chinese sentiment classifier Senta (https://github.com/baidu/Senta) on our manually labeled training set; the resulting classifier achieves an accuracy of 0.83 on the test set. During training, the target sentiment label s is computed automatically by this sentiment classifier. During inference, users can input arbitrary sentiment labels to control the sentiment of each generated sentence.

4.2 Implementation Details

We use the 200-dimensional pre-trained word embeddings provided by Song et al. (2018), and the dimension of the sentiment embeddings is 32. The vocabulary size is 50,000 and the batch size is 64. We tune the hyperparameters manually, using BLEU (Papineni et al., 2002a) as the selection criterion. We use GRUs with hidden size 512 for both the encoder and the decoder, and the size of the latent variable is 300. We implement the model in TensorFlow (https://github.com/tensorflow/tensorflow). The number of parameters is 68M, and the parameters are randomly initialized from a uniform distribution over [-0.08, 0.08]. We pre-train our model for 80 epochs with the MLE method and run adversarial training for 30 epochs. The average runtime is 30 hours on a Tesla P40 GPU machine, of which adversarial training takes most of the time. The optimizer is Adam (Kingma and Ba, 2014) with a learning rate of 10^-3 for pre-training and 10^-5 for adversarial training. Besides, we apply dropout on the output layer (dropout rate = 0.2) to avoid over-fitting (Srivastava et al., 2014) and clip the gradients to a maximum norm of 10. Decoding uses greedy search, and the average length of the generated essays is 79.3.

4.3 Evaluation

To comprehensively evaluate the generated essays, we rely on a combination of automatic evaluation and human evaluation.

Automatic Evaluation. Following previous work (Yang et al., 2019), we consider the following metrics (https://github.com/libing125/CTEG):

BLEU: The BLEU score (Papineni et al., 2002b) is widely used in machine translation, dialogue, and other text generation tasks; it measures word overlap between the ground truth and the generated sentences.

Dist-1, Dist-2 (Li et al., 2015): We calculate the proportion of distinct 1-grams and 2-grams in the generated essays to evaluate the diversity of the outputs (a short computation sketch follows this metric list).

Consistency (Yang et al., 2019): An ideal essay should closely surround the semantics of all input topics. Therefore, we pre-train a multi-label classifier to evaluate the topic-consistency of the output. A higher Consistency score means the generated essays are more closely related to the given topics.

Novelty (Yang et al., 2019): We calculate novelty as the difference between the output and essays with similar topics in the training corpus. A higher Novelty score means the output essays differ more from the essays in the training corpus.

Precision, Recall, and Senti-F1: These metrics measure sentiment control accuracy. If the sentiment label of a generated sentence is consistent with the ground truth, the generated result is counted as correct, and as wrong otherwise. The sentiment label is predicted by the sentiment classifier mentioned above (see Section 4.1 for details).
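A minimal sketch of the Dist-n computation (distinct n-grams divided by all n-grams over the generated essays); the exact tokenization used in the original evaluation scripts may differ.

```python
from typing import List

def dist_n(essays: List[List[str]], n: int) -> float:
    """Dist-n: ratio of distinct n-grams to all n-grams in the generated essays."""
    all_ngrams = []
    for tokens in essays:  # each essay given as a list of tokens
        all_ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)

# Example: dist_n(tokenized_outputs, 1) gives Dist-1, dist_n(tokenized_outputs, 2) gives Dist-2.
```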
Human Evaluation. We also perform human evaluation to assess the quality of the generated essays more accurately. Each item contains the input topics and the outputs of the different models. 200 items are distributed to 3 annotators, who do not know in advance which model each generated essay comes from. Each annotator scores all 200 items, and we average the scores from the three annotators. They are required to score each generated essay from 1 to 5 in terms of three criteria: Novelty, Fluency, and Topic-Consistency. For novelty, we use the TF-IDF features of the topic words to retrieve the 10 most similar training samples as references for the annotators. To assess the paragraph-level diversity of our model, we further propose an Essay-Diversity criterion: each model generates three essays with the same input topics, and annotators score the diversity by considering the three essays together.

4.4 Baselines

TAV (Feng et al., 2018) represents topic semantics as the average of all topic embeddings and then uses an LSTM to generate each word. Their work also includes the following two baselines.

TAT (Feng et al., 2018) extends the LSTM with an attention mechanism to model the semantic relatedness of each topic word with the generator's output.

MTA (Feng et al., 2018) maintains a topic coverage vector to guarantee that all topic information is expressed during generation through an LSTM decoder.

CTEG (Yang et al., 2019) adopts commonsense knowledge and adversarial training to improve generation. It achieves state-of-the-art performance on the topic-to-essay generation task.

Methods           | Automatic: BLEU  Consistency  Novelty  Dist-1  Dist-2 | Human: Con.  Nov.  E-div.  Flu.
TAV               | 6.05   16.59   70.32   2.69   14.25 | 2.32  2.19  2.58  2.76
TAT               | 6.32   9.19    68.77   2.25   12.17 | 1.76  2.07  2.32  2.93
MTA               | 7.09   25.73   70.68   2.24   11.70 | 3.14  2.87  2.17  3.25
CTEG              | 9.72   39.42   75.71   5.19   20.49 | 3.74  3.34  3.08  3.59
SCTKG(w/o-Senti)  | 9.97   43.84   78.32   5.73   23.16 | 3.89  3.35  3.90  3.71
SCTKG(Ran-Senti)  | 9.64   41.89   79.54   5.84   23.10 | 3.80  3.48  4.29  3.67
SCTKG(Gold-Senti) | 11.02  42.57   78.87   5.92   23.07 | 3.81  3.37  3.94  3.75
Table 1: Automatic and human evaluation results. In the human evaluation, Con., Nov., E-div., and Flu. denote topic-consistency, novelty, essay-diversity, and fluency, respectively. The best performance is highlighted in bold.

5 Results and Analysis

In this section, we present our experimental results and analysis in two parts, text quality and sentiment control, and then show a case study of our model.

5.1 Results on Text Quality

The automatic and human evaluation results are shown in Table 1. We present three versions of our model for a comprehensive comparison: (1) SCTKG(w/o-Senti) does not attach any sentiment label to the model; (2) SCTKG(Ran-Senti) randomly sets the sentiment label of each generated sentence; (3) SCTKG(Gold-Senti) uses the gold sentiment label for each generated sentence. By investigating the results in Table 1, we have the following observations.

First, all versions of our SCTKG model outperform the baselines on all evaluation metrics (except the BLEU score of SCTKG(Ran-Senti)). This demonstrates that SCTKG generates better essays than the baseline models, whether it uses the true sentiment, a random sentiment, or no sentiment at all.

Second, the comparison between SCTKG(w/o-Senti) and the baselines shows the superiority of the basic architecture of our model. In human evaluation, SCTKG(w/o-Senti) outperforms CTEG in topic-consistency, essay-diversity, and fluency by +0.15 (3.74 vs. 3.89), +0.82 (3.08 vs. 3.90), and +0.12 (3.59 vs. 3.71), respectively. Similar improvements can also be observed in the automatic evaluation.
The improvement in essay-diversity is the most significant. This improvement comes from our CVAE architecture: the sentence representation is sampled from a continuous latent variable, and this sampling operation introduces more randomness than the baselines.

Third, as stated above, each model generates three essays that are considered as a whole when computing E-div. When given random and diverse sentiment label sequences, our SCTKG(Ran-Senti) achieves the highest E-div score (4.29). Considering that the CVAE architecture alone already improves diversity over the baselines, randomizing the sentiment of each sentence further boosts this improvement (from +0.82 to +1.21 relative to CTEG). This result demonstrates the potential of our model to generate discourse-level diverse essays by using diverse sentiment sequences, supporting our claim in the introduction.

Fourth, when using the gold sentiment label, SCTKG(Gold-Senti) achieves the best BLEU (11.02). However, SCTKG(Gold-Senti) does not significantly outperform the other SCTKG models on the other metrics. The results show that the true sentiment label of the target sentence helps SCTKG(Gold-Senti) fit the test set better, but does not obviously help other important metrics such as diversity and topic-consistency.

Fifth, interestingly, when the sentiment label is removed, SCTKG(w/o-Senti) achieves the best topic-consistency score. We conjecture that the sentiment label may interfere with the topic information in the latent variable to some extent, but the effect of this interference is trivial: comparing SCTKG(w/o-Senti) and SCTKG(Gold-Senti), topic-consistency drops by only 0.08 (3.89 vs. 3.81) in human evaluation and 1.27 (43.84 vs. 42.57) in automatic evaluation, which is completely acceptable for a sentiment-controllable model.

Ablation study on text quality. To understand how each component of our model contributes to the task, we train two ablated versions of our model: without adversarial training ("w/o AT") and without TGA ("w/o TGA"). Note that in the "w/o TGA" experiment, we implement a memory network as in Yang et al. (2019), which uses the concepts in ConceptNet but disregards their correlations. All models use gold sentiment labels. Table 2 presents the BLEU scores and human evaluation results of the ablation study.

Methods    | BLEU  | Con. | Nov. | E-div. | Flu.
Full model | 11.02 | 3.81 | 3.37 | 3.94   | 3.75
w/o TGA    | 10.34 | 3.54 | 3.17 | 3.89   | 3.38
w/o AT     | 9.85  | 3.37 | 3.20 | 3.92   | 3.51
Table 2: Ablation study on text quality. "w/o AT" means without adversarial training and "w/o TGA" means without TGA. Con., Nov., E-div., and Flu. denote topic-consistency, novelty, essay-diversity, and fluency, respectively. Full model denotes SCTKG(Gold-Senti) in this table.

Comparing the full model and "w/o TGA", we find that without TGA the model performance drops on all metrics. In particular, topic-consistency drops by 0.27, which shows that by directly learning the correlation between the topic words and their neighboring concepts, the concepts more closely related to the topic words receive higher attention during generation. Novelty drops by 0.2, because TGA expands the information drawn from the external knowledge graph and thus makes the output essays more novel and informative. Fluency drops by 0.37, because TGA helps our model choose a more suitable concept from the topic knowledge graph according to the current context. The BLEU drop of 0.68 shows that TGA helps our model fit the dataset better by modeling the relations between topic words and neighboring concepts.

Comparing the full model and "w/o AT", we find that adversarial training improves BLEU, topic-consistency, and fluency. The reason is that the discriminative signal enhances the topic-consistency and authenticity of the generated texts.

5.2 Results on Sentiment Control

In this section, we investigate whether the model accurately controls the sentiment and how each component affects sentiment control performance. We train three ablated versions of our model: without the sentiment label in the encoder, without the sentiment label in the decoder, and without TGA. We randomly sample 50 essays with 250 sentences in total from our test set. Instead of using gold sentiment labels, the sentiment labels are randomly assigned in this section: predicting the gold sentiment is relatively easy, because the sentiment can sometimes be derived directly from the coherence between contexts.
We therefore adopt a more difficult experimental setting that aims to generate sentences following arbitrarily given sentiment labels. The results are shown in Table 3.

Methods       | Precision | Recall | Senti-F1
Full model    | 0.68      | 0.66   | 0.67
w/o Enc-senti | 0.56      | 0.55   | 0.56
w/o Dec-senti | 0.59      | 0.62   | 0.61
w/o TGA       | 0.62      | 0.64   | 0.63
Table 3: Ablation study on sentiment control. "w/o Enc-senti" means removing the sentiment embedding on the encoder side and "w/o Dec-senti" means removing it from the decoder. Full model denotes SCTKG(Ran-Senti) in this table.

We can see that removing the sentiment label from either the encoder or the decoder leads to an obvious decrease in control performance (-11% / -6% Senti-F1), and that the sentiment label in the encoder is the most important, since removing it leads to the largest decline (-11% Senti-F1). Although TGA does not directly impose sentiment information, it still helps to improve the control ability (4% Senti-F1), which shows that learning correlations among concepts in the topic knowledge graph strengthens the sentiment control ability of the model. For instance, when given a positive label, concepts connected by the relation "desire of" are more likely to receive more attention, because concepts with this relation tend to carry more positive meaning.

5.3 Case Study

Table 4 presents an example of our output essay with a random sentiment sequence. Positive sentences are shown in red and negative sentences in blue. We can see that the output essay is not only closely related to the topics "Law" and "Education", but also corresponds to the randomly given sentiment labels. Meanwhile, our model makes full use of commonsense knowledge with the help of TGA; for example, "high school student" and "right" are neighboring concepts related to the topic words "Education" and "Law".

Input topics: Law, Education
Sentiment labels: neu. pos. neg. neg. neu.
Output essay: I am a senior high school student. I am in the best high school in our town. But bullying still exists on our campus. Teachers always ignore this phenomenon. What should we do to protect our rights?
Table 4: Given the topics "Law" and "Education", and a randomly set sentiment label for each sentence, we generate an essay according to the topics and sentiment labels. "neu.", "pos.", and "neg." represent neutral, positive, and negative, respectively. We have translated the original Chinese output into English.

6 Related Work

Topic-to-Text Generation. Automatically generating an article is a challenging task in natural language processing. Feng et al. (2018) are the first to propose the TEG task; they utilize a coverage vector to integrate topic information. Yang et al. (2019) use extra commonsense knowledge to enrich the input information and adopt adversarial training to enhance topic-consistency. However, both fail to consider the sentiment factor in essay generation and to fully utilize the external knowledge base, which hinders them from generating high-quality texts.

Besides, Chinese poetry generation is similar to our task and can also be regarded as a topic-to-sequence learning task. Li et al. (2018) adopt CVAE and adversarial training to generate diverse poetry. Yang et al. (2017) use a CVAE with hybrid decoders to generate Chinese poems. Yi et al. (2018) use reinforcement learning to directly improve diversity criteria. However, these models are not directly applicable to the TEG task: because they do not take knowledge into account, they cannot generate long and meaningful unstructured essays.

Controllable Text Generation. Some work has explored style control mechanisms for text generation tasks. For example, Zhou and Wang (2017) use naturally annotated emoji Twitter data for emotional response generation. Wang and Wan (2018) propose adversarial training to control the sentiment of texts. Chen et al. (2019) propose a semi-supervised CVAE to generate poetry and derive a different lower bound to capture generalized sentiment-related semantics. Different from their work, we inject the sentiment label into both the encoder and the decoder of a CVAE, and show that modeling a topic knowledge graph can further enhance the sentiment control ability.

7 Conclusions

In this paper, we take a further step in the challenging topic-to-essay generation task by proposing SCTKG, a novel sentiment-controllable topic-to-essay generator with a topic knowledge graph enhanced decoder.
To obtain better representations from external knowledge, we present TGA, a novel topic knowledge graph representation mechanism. Experiments show that our model can not only generate sentiment-controllable essays but also outperform competitive baselines in text quality.

References

Huimin Chen, Xiaoyuan Yi, Maosong Sun, Wenhao Li, and Zhipeng Guo. 2019. Sentiment-controllable Chinese poetry generation. Pages 4925–4931.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

Xiaocheng Feng, Ming Liu, Jiahao Liu, Bing Qin, Yibo Sun, and Ting Liu. 2018. Topic-to-essay generation with neural networks. In IJCAI, pages 4078–4084.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Leo Leppänen, Myriam Munezero, Mark Granroth-Wilding, and Hannu Toivonen. 2017. Data-driven news generation for automated journalism. In Proceedings of the 10th International Conference on Natural Language Generation, pages 188–197.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2015. A diversity-promoting objective function for neural conversation models.

Juntao Li, Yan Song, Haisong Zhang, Dongmin Chen, Shuming Shi, Dongyan Zhao, and Rui Yan. 2018. Generating classical Chinese poems via conditional variational autoencoder and adversarial training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3890–3900.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002a. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002b. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional skip-gram: Explicitly distinguishing left and right context for word embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 175–180.

Robert Speer and Catherine Havasi. 2012. Representing general relational knowledge in ConceptNet 5. In LREC, pages 3679–3686.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Ke Wang and Xiaojun Wan. 2018. SentiGAN: Generating sentimental texts via mixture adversarial networks. In IJCAI, pages 4446–4452.

Pengcheng Yang, Lei Li, Fuli Luo, Tianyu Liu, and Xu Sun. 2019. Enhancing topic-to-essay generation with external commonsense knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2002–2012.

Xiaopeng Yang, Xiaowen Lin, Shunda Suo, and Ming Li. 2017. Generating thematic Chinese poetry using conditional variational autoencoders with hybrid decoders. arXiv preprint arXiv:1711.07632.

Xiaoyuan Yi, Maosong Sun, Ruoyu Li, and Wenhao Li. 2018. Automatic poetry generation with mutual reinforcement learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3143–3153.

Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence.

Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. arXiv preprint arXiv:1703.10960.

Xianda Zhou and William Yang Wang. 2017. MojiTalk: Generating emotional responses at scale. arXiv preprint arXiv:1711.04090.