Li J, et al. (2022) - A Survey of Discourse Parsing. Frontiers of Computer Science
REVIEW ARTICLE
Abstract Discourse parsing is an important research area in natural language processing (NLP), which aims to parse the discourse structure of coherent sentences. In this survey, we introduce several different kinds of discourse parsing tasks, mainly including RST-style discourse parsing, PDTB-style discourse parsing, and discourse parsing for multiparty dialogue. For these tasks, we introduce classical and recent methods, especially neural network approaches. After that, we describe the applications of discourse parsing to other NLP tasks, such as machine reading comprehension and sentiment analysis. Finally, we discuss future trends of the task.

Keywords discourse parsing, discourse structure, RST, PDTB, STAC

Received October 9, 2020; accepted March 9, 2021
E-mail: [email protected]

1 Introduction
Discourse parsing is the task of automatically parsing the discourse structure of a text, including identifying the discourse structure and labeling the discourse relations. As a fundamental task in natural language processing (NLP), discourse parsing has been successfully applied in many other NLP tasks, such as question answering [1], machine reading comprehension [2], sentiment classification [3], language modeling [4], machine translation [5], and text categorization [6].

In this survey, we classify discourse parsing (DP) tasks into three main categories: RST-style DP, PDTB-style DP, and dialogue DP. An overview of discourse parsing is shown in Fig. 1. Among them, RST-style and PDTB-style discourse parsing process passages, whereas the inputs of dialogue discourse parsing are dialogue utterances. An RST-style discourse parser aims to obtain the hierarchical rhetorical tree structure of an input document, while a PDTB-style discourse parser produces a flat discourse structure between sentences or clauses rather than a tree. A discourse parser for multiparty dialogue parses the input dialogue into a discourse dependency graph, in which discourse relations can exist between non-adjacent utterances, which is different from RST-style DP.

Rhetorical Structure Theory (RST) is a theory that represents a document as a tree structure in which elementary discourse units (EDUs) are the vertexes of the tree [7]. Discourse treebanks play an important role in discourse parsing. Inspired by RST, the Rhetorical Structure Theory Discourse Treebank (RST-DT) [8] was released as a typical English discourse treebank. Instead of a tree structure, [9] adopts a graph structure to represent discourse and released the Discourse Graphbank on LDC, which contains 135 documents (105 documents from AP Newswire and 30 documents from the Wall Street Journal (WSJ)).

Different from RST, which builds the full structure of a discourse into an RST tree, PDTB-style discourse parsing mainly focuses on the discourse relations within the local structure between two arguments (Arg1 and Arg2). The Penn Discourse Treebank (PDTB) [10] dataset is based on the discourse lexicalized tree-adjoining grammar (D-LTAG) [11] and was released as a large-scale discourse treebank on LDC. The senses in PDTB form a three-layer hierarchy of classes, types, and subtypes. The relations can be divided into two classes, explicit and non-explicit, depending on whether a connective is present. RST-DT and PDTB have promoted most of the research on discourse parsing. One similarity between the two datasets is that they both derive from the WSJ, a typical well-written text corpus. Figure 2 shows an example from both the RST-DT and PDTB datasets. From Fig. 2, we can find that the RST discourse parser generates a hierarchical discourse tree, while the PDTB discourse parser only detects the connectives, the arguments, and the sense between the arguments.

Fig. 2 An example of RST-style and PDTB-style discourse parsing

On the other hand, models trained on well-written passage datasets may not be appropriate for spoken language or dialogue text. Furthermore, there are obvious differences in linguistic properties between passages and dialogues. A passage is a continuous text in which there is a discourse relation between every two adjacent sentences. In contrast, there may be no discourse relation between adjacent utterances in a multiparty dialogue; utterances of a multiparty dialogue are much less locally coherent than prose passages. Research has begun to focus on discourse parsing for multi-party dialogue, including handcrafted features with shallow models [12,13] and deep sequential models [14].

There are two main challenges in discourse parsing.
● The first challenge is the difficulty of detecting discourse structure. Similar to other structure prediction
2.3 Discourse parsing for multiparty dialogue
Most existing research on discourse parsing is about news text. However, models trained on written datasets may not be appropriate for spoken language. Therefore, an annotation scheme was proposed for spoken language, including telephone conversations and broadcast interviews, in the style of PDTB 3.0 and CCR [20]. They explored the differences between discourse relations in written language and spoken language. In written text, the distributions of explicit and implicit discourse relations are almost the same, but in spoken text explicit relations are almost twice as frequent as implicit relations. Another important finding is that more than 70% of the discourse relations of PDTB 3.0 and CCR can be mapped onto one another.

Different from previous work on news or monologue datasets, there is little research on discourse parsing for multi-party dialogue; the existing work includes handcrafted features with shallow models [12,13] and deep sequential models [14].

An example from STAC is shown in Fig. 4. The left side of the figure shows a multi-party dialogue, and the right side provides the ground truth of the dialogue. In the dialogue there are three interlocutors, A, B, and C, who sent six messages in total. Each utterance is an EDU and can be regarded as a vertex in the directed acyclic graph on the right side of the figure. A directed edge between two EDUs is a discourse dependency relation. For example, there is an Elaboration relation from u1 to u2. Different from discourse parsing on monologues or passages, there are non-projective relations between non-adjacent EDUs, such as the QAP relation from u1 to u3 and the QAP relation from u4 to u6.

We formulate the task of discourse parsing on multi-party chat dialogue as follows (a code sketch of these structures is given after the list):
● Input: D = {u1, u2, ..., un}, where D is a multiparty chat dialogue with n utterances and ui is the ith utterance in the dialogue. Each utterance is regarded as an elementary discourse unit (EDU) in multiparty dialogue discourse parsing.
● Output: G(V, E, R), where V is the vertex set consisting of the EDUs with |V| = n, E is the set of edges between EDUs, and R is the set of discourse relations.
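To make the input and output concrete, the following minimal Python sketch (our own illustration; the class names, field names, and toy utterances are invented and do not come from any of the cited parsers) represents a dialogue as a list of EDUs and its discourse structure as a set of labeled dependency edges, mirroring the u1 to u2 Elaboration and the non-adjacent QAP examples above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EDU:
    index: int        # position in the dialogue (1-based, as u1, u2, ...)
    speaker: str      # interlocutor who produced the utterance
    text: str         # utterance text

@dataclass
class DialogueDiscourseGraph:
    edus: List[EDU]                                                   # vertex set V, |V| = n
    edges: List[Tuple[int, int, str]] = field(default_factory=list)   # (head, dependent, relation)

    def attach(self, head: int, dependent: int, relation: str) -> None:
        """Add a labeled dependency edge; non-adjacent attachments are allowed."""
        self.edges.append((head, dependent, relation))

# Toy dialogue with labeled dependencies, in the spirit of the STAC example above.
dialogue = DialogueDiscourseGraph(edus=[
    EDU(1, "A", "anyone got wood to trade?"),
    EDU(2, "A", "I can give sheep"),
    EDU(3, "B", "sorry, no wood here"),
])
dialogue.attach(1, 2, "Elaboration")   # u1 -> u2
dialogue.attach(1, 3, "QAP")           # u1 -> u3 (non-adjacent question-answer pair)
```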
The first dataset for discourse parsing of multiparty dialogue is the STAC corpus [21]. The corpus derives from an online game, The Settlers of Catan, a multi-party, win-lose game. More details on the STAC corpus are described in [21].

An overview of the STAC and Molweni corpora is shown in Table 1. From Table 1 we can see that there are more than 10K EDUs and relations and that most of the EDUs are weakly connected.

The Molweni corpus is another dataset for multiparty dialogue discourse parsing [22]. The Molweni dataset is derived from the large-scale multiparty dialogue Ubuntu Chat Corpus [23]. The name Molweni is the plural form of "Hello" in the Xhosa language, representing multiparty dialogue in the same language as Ubuntu. The Molweni dataset contains 10,000 dialogues with 88,303 utterances and 32,700 questions, including answerable and unanswerable questions. All answerable questions are extractive questions whose answer is a span in the source dialogue. For unanswerable questions, we annotate plausible answers from the dialogue. Most questions in Molweni are 5W1H questions: Why, What, Who, Where, When, and How. For each dialogue in the corpus, annotators propose three questions and find the answer span (if answerable) in the input dialogue.

2.4 Comparisons
Table 2 compares the different discourse parsing tasks and existing datasets. For RST-style and PDTB-style discourse parsing, the datasets are RST-DT [8] and PDTB [10]. For multiparty dialogue discourse parsing, there are two datasets, STAC [21] and Molweni [22]. From Table 2, we can find the source, theory, and sub-tasks of each dataset, and we also list the statistical information for all datasets.

3 Existing methods
In this section, we introduce existing methods for the different styles of discourse parsing, including RST-style, PDTB-style, and multiparty dialogue discourse parsing.
via incorporating syntactic and lexical features. As we know, [25] first proposed a neural network for segmenting discourse into EDUs, training a multi-layer perceptron binary classifier with lexical and context features. [26] trained a classifier using finite-state and context-free derived features. [27] applied the Dynamic Conditional Random Field (DCRF), a probabilistic discriminative model that addresses the independence assumptions and sub-optimality of greedy algorithms.

The second kind of method regards discourse segmentation as a sequence labeling task. The model labels each token to indicate whether the token is the boundary of an EDU. Usually, models assign a B or C label to each token: if the token is the beginning of an EDU, it is labeled B; otherwise, it is labeled C [28−31]. [32] proposes a BiLSTM-CRF based model that achieves the state of the art in F1 measure. Compared to the two-pass model [31] and the SPADE model [24], the BiLSTM-CRF based model saves more time and obtains better results.
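As a small illustration of the B/C labeling scheme described above (a sketch of our own, not code from the cited segmenters), the function below converts a token-level label sequence into EDU spans; in practice the labels would come from a trained tagger such as a BiLSTM-CRF.

```python
from typing import List, Tuple

def labels_to_edu_spans(tokens: List[str], labels: List[str]) -> List[Tuple[int, int]]:
    """Convert B/C token labels into (start, end) EDU spans over token indices.

    'B' marks the first token of an EDU; 'C' marks a continuation token.
    """
    assert len(tokens) == len(labels)
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:
                spans.append((start, i))    # close the previous EDU
            start = i
    if start is not None:
        spans.append((start, len(tokens)))  # close the last EDU
    return spans

tokens = ["Although", "it", "rained", ",", "we", "went", "out", "."]
labels = ["B", "C", "C", "C", "B", "C", "C", "C"]
print(labels_to_edu_spans(tokens, labels))   # [(0, 4), (4, 8)]
```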
Models for the discourse segmentation task have achieved more than 95% in F1 measure, which is quite close to human performance (98% F1). Therefore, most RST-style discourse parsing researchers pay more attention to the RST tree building task.

3.1.2 RST tree building
Traditional methods for building RST trees adopt statistical machine learning with hand-crafted features, such as context surface features and constituent features. [33,34] propose two heuristic rules to convert the RST tree building task into a discourse dependency parsing task, and [35] proves that the rule in [33] is more useful for the text summarization task. [36] proposed two RST parsers that respectively adopt a constituent tree and a dependency tree, and both parsers achieved the state of the art. [37] proposed a dependency perspective on RST discourse parsing and evaluation. They examine the similarities between dependency parsers and shift-reduce constituency parsers, and their experimental results prove the effect of dependency parsing for RST discourse parsing. [38] proposed the CODRA model, which adopts a binary classifier to detect the boundaries of elementary discourse units and two Conditional Random Fields (CRFs) to build both intra-sentential and multi-sentential discourse trees.

The first neural deep model for RST-style discourse parsing outperformed statistical methods [15], and other recursive deep models followed closely [39]. [40] proposed an attention-based hierarchical Bi-LSTM model with a tensor-based transformation module to learn more feature interactions. In recent years, there have been more transition-based models for RST-style discourse parsing [41−43]. To address the limited amount of training data in RST-DT, [44] proposed a multi-view and multi-task framework to combine related tasks. Considering the similarities of RST-style discourse parsing across languages, [45] proposed the first cross-lingual RST discourse parser, which also achieves the state of the art for English RST parsing.

Recent results on RST tree building are shown in Table 3. In the table, S, N, R, and F respectively represent Span, Nuclearity, Relation, and Full parsing. From Table 3, we can find that most neural network-based models do not show significant advantages over traditional hand-crafted feature-based models. One reason for this could be that the scale of RST-DT limits the training of complex neural models. How to effectively train a neural model on the limited RST-DT dataset is still challenging for RST-style discourse parsing researchers.

Table 3 The micro-F1 score of the RST tree building task [42]
Model                        S     N     R     F
Feature-based models
Hayashi et al., 2016         82.6  66.6  54.6  54.3
Surdeanu et al., 2015        82.6  67.1  55.4  54.9
Joty et al., 2015            82.6  68.3  55.8  55.4
Feng and Hirst, 2014         84.3  69.4  56.9  56.2
Neural network-based models
Braud et al., 2016           79.7  63.6  47.7  47.5
Li et al., 2016              82.2  66.5  51.4  50.6
Braud et al., 2017           81.3  68.1  56.3  56.0
Ji and Eisenstein, 2014      82.0  68.2  57.8  57.6
Yu et al., 2018              85.5  73.1  60.2  59.9
Human                        88.3  77.3  65.4  64.7
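Some of the systems in Table 3, for example Yu et al., 2018 [42], are transition-based parsers. The following toy sketch (our own illustration, not any cited system) shows only the core shift-reduce loop: a reduce action pops the top two subtrees off the stack and merges them under a nuclearity/relation label, so a sequence of gold actions deterministically rebuilds an RST tree; a real parser would instead score the actions with a learned model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str                      # EDU text for leaves, "nuclearity:relation" otherwise
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def parse(edus: List[str], actions: List[Tuple[str, str]]) -> Node:
    """Build an RST tree from EDUs and a sequence of (action, label) pairs.

    action is "shift" (push the next EDU) or "reduce" (merge the top two subtrees).
    """
    buffer = list(edus)
    stack: List[Node] = []
    for action, label in actions:
        if action == "shift":
            stack.append(Node(buffer.pop(0)))
        else:  # reduce: the right child is the most recently pushed subtree
            right, left = stack.pop(), stack.pop()
            stack.append(Node(label, left, right))
    assert len(stack) == 1 and not buffer
    return stack[0]

tree = parse(
    ["[e1] It rained,", "[e2] so we stayed in", "[e3] and read."],
    [("shift", ""), ("shift", ""), ("shift", ""),
     ("reduce", "NN:Joint"),          # merge e2 and e3
     ("reduce", "SN:Cause")],         # merge e1 with the (e2, e3) subtree
)
print(tree.label)   # SN:Cause
```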
3.2 PDTB-style discourse parsing
As mentioned, PDTB-style discourse parsing contains several tasks, including connective detection, argument labeling, discourse relation recognition, and attribute labeling. Most of the research on the PDTB dataset can be divided into two classes: explicit discourse parsing and implicit discourse parsing. For explicit discourse parsing, the task aims to detect connectives, label arguments, and recognize explicit discourse relations. For implicit discourse parsing, two arguments are given, and the
task is to classify the implicit discourse relation between them.

3.2.1 Explicit discourse parsing
In explicit discourse relation recognition, since the connective can indicate the discourse relation, recent methods have achieved good performance. Pitler used an unsupervised method and obtained good results using only the connective [46]. Besides, there are supervised methods for recognizing explicit discourse relations. For instance, Pitler used an approach based on syntactic features related to the connective and obtained an improvement in explicit discourse relation recognition [46]. To reduce error propagation, a joint learning approach via a structured perceptron was proposed for explicit discourse parsing; it obtained comparable results on relation classification and an improvement on argument labeling [47].

3.2.2 Implicit discourse parsing
There are mainly three kinds of methods for recognizing implicit discourse relations. The first kind of method models the two arguments separately. Early research was mainly based on surface features and statistical machine learning methods [17,18,48−50]. With the success of neural networks, [51,52] respectively propose recursive and recurrent models to learn the representations of the arguments. [53] compares several different representations for implicit discourse classification and obtains better results when additional features are added. Ji proposed a novel method for implicit discourse relation classification based on a latent variable recurrent neural network [4]. To prove the effect of BERT, [54] applies the next sentence prediction task to implicit discourse relations and achieves great improvements on the implicit discourse relation recognition task. Furthermore, [55] adopts the BERT model to represent the arguments with a focus on the connectives. The BERT-based model achieves the state of the art and obtains obvious improvements on the Temporal and Comparison types (which have very few instances in the dataset) of PDTB.
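As a hedged illustration of this line of work (not the exact setup of [54] or [55]), the sketch below feeds an argument pair into a pretrained Transformer encoder with a sequence-pair classification head over the four top-level PDTB senses; fine-tuning on annotated argument pairs is assumed, and the checkpoint name is only a placeholder.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

SENSES = ["Comparison", "Contingency", "Expansion", "Temporal"]  # top-level PDTB classes

# Placeholder checkpoint; any BERT-style encoder fine-tuned on PDTB pairs would do.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=len(SENSES))

arg1 = "The company reported a loss for the quarter."
arg2 = "Its shares fell sharply after the announcement."

# Encode the pair as [CLS] Arg1 [SEP] Arg2 [SEP] and classify the implicit sense.
inputs = tokenizer(arg1, arg2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(SENSES[logits.argmax(dim=-1).item()])   # head is untrained here, so the output is arbitrary
```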
The second kind of method models not only each argument but also the interactions between the two arguments. Many papers have proved the effect of word pairs between the two arguments for classifying implicit discourse relations [56−58]. To address data sparsity while still using the word-pair feature, Chen proposed a new deep architecture with a gated relevance network (GRN) [59]. [60] proposed a generative-discriminative framework with a new method of semantic representation and obtained good results. [61] considers linguistic characteristics, including semantic interaction and cohesion devices (topic continuity and attribution), for three important discourse relations: Comparison, Contingency, and Expansion. [62] proposed a neural tensor network with a sparse constraint to obtain deeper and more indicative pair patterns. [63] proposed a multi-level argument representation model that learns representations of characters, sub-words, words, sentences, and sentence pairs. [64] adopts two factored tensor networks (FTN) to model the interactions between the two arguments and to incorporate topic representations.

The third kind of method adopts a joint learning or multi-task architecture. In 2013, [50] first proposed a method for implicit discourse relation classification based on multi-task learning. Inspired by this work, Liu trained a multi-task neural network that uses not only PDTB as experimental data but also RST-DT and other data in auxiliary tasks [65]. [66] improved the inference of implicit discourse relations by classifying explicit discourse connectives. Different from common methods that ignore implicit discourse connectives, [67] proposed a novel model containing a discourse relation classifier and a sequence-to-sequence model to predict the implicit discourse connectives. [68] incorporated event knowledge and coreference relations into neural discourse parsing. [69] introduces knowledge from WordNet to help classify discourse relations and proves the effect of the knowledge. To integrate the annotation information more deeply, [70] proposed a TransS-based method that learns a translation from Arg1 + relation to Arg2.

The latest results on implicit discourse relation recognition for the top four-class classification are shown in Table 4. From Table 4, with the introduction of pretrained models for encoding the arguments, the performance for detecting Temporal and Comparison relations has been significantly improved, and the classifier achieves more than 70% F1 for all four relations.

Table 4 The performance (one-versus-all F1) of implicit discourse relation recognition on PDTB
Model                       Comp.  Cont.  Expa.  Temp.
Rutherford and Xue, 2015    41.0   53.8   69.4   33.3
Lei et al., 2018            43.24  57.82  72.88  29.1
Bai and Zhao, 2018          47.85  54.47  70.6   36.87
Shi and Demberg, 2019       41.83  62.07  69.58  35.72
Dai and Huang, 2019         45.34  51.8   68.5   45.93
Kishimoto et al., 2020      77.28  73.85  73.4   79.41

3.3 Dialogue discourse parsing
The first discourse parsing model for multi-party dialogue was proposed in 2015 [12]. As mentioned above, the task aims to parse the discourse dependency structure of multi-party chat dialogue. The authors adopted maximum entropy (MaxEnt) models with hand-crafted features to learn the local distributions. Instead of directly using the probabilities from MaxEnt to classify binary attachments and discourse relations, they used Maximum Spanning Trees (MST) for decoding.

The authors adopted three categories of features: positional features, lexical features, and parsing features.
● Positional features: whether the speaker initiated the dialogue, whether the utterance is the speaker's first in the dialogue, position in the dialogue, distance between EDUs, and whether the EDUs have the same speaker.
● Lexical features: ends with an exclamation mark, ends with a question mark, contains possessive pronouns, contains modal modifiers, contains words in lexicons, contains question words, contains a player's name, contains emoticons, and the first and last words.
● Parsing features: subject lemmas given by syntactic
dependency parsing, and dialogue acts given by a prediction model.

Because the MST-based method can only predict tree-structured discourse dependency structures, about 9% of the structures in the dataset cannot be predicted. To predict the non-tree structures in the DAGs, an integer linear programming (ILP) based method was proposed [13]. Besides the local distributions between EDUs, the ILP-based method also computes a global representation for decoding. The constraints in the ILP are implemented as the following equations:

\sum_{i=1}^{n} h_i = 1,

\forall j,\quad 1 \leqslant n\,h_j + \sum_{i=1}^{n} a_{ij} \leqslant n,

where h_j indicates that EDU j is the head (root) of the structure and a_ij indicates an attachment from EDU i to EDU j: the first constraint enforces a unique head, and the second ensures that the head receives no incoming edges while every other EDU receives at least one.
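To make the constraints concrete, here is a small self-contained check (an illustration of the constraints only, not the decoder of [13], which additionally optimizes attachment scores with an ILP solver): given candidate head indicators and an attachment matrix, it verifies the two conditions above.

```python
from typing import List

def satisfies_ilp_constraints(h: List[int], a: List[List[int]]) -> bool:
    """Check the two ILP constraints for a candidate dependency structure.

    h[j] = 1 if EDU j is the head (root); a[i][j] = 1 if EDU i attaches to EDU j.
    """
    n = len(h)
    if sum(h) != 1:                       # exactly one head
        return False
    for j in range(n):
        incoming = sum(a[i][j] for i in range(n))
        if not (1 <= n * h[j] + incoming <= n):
            return False
    return True

# Three EDUs: EDU 0 is the head; edges 0->1, 0->2, 1->2 form a DAG rather than a tree.
h = [1, 0, 0]
a = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(satisfies_ilp_constraints(h, a))   # True
```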
Different from previous work on discourse parsing of multi-party dialogue, Shi and Huang first adopted a deep sequential model for discourse parsing of multi-party chat dialogue [14]. They also adopt an iterative algorithm to learn structured representations and highlight the speakers' information in the dialogue. Ablation experiments prove the effect of the structured representation and of the speaker highlighting mechanism. The architecture of their model is shown in Fig. 2. The model learns the dependency structure and the discourse relations jointly and alternately. The structured representations are computed as follows:

g^{S}_{i,a} = \begin{cases} 0, & i = 0,\\ \mathrm{GRU}_{hl}\left(g^{S}_{j,a},\ h_i \oplus r_{ji}\right), & a_i = a,\ i > 0,\\ \mathrm{GRU}_{gen}\left(g^{S}_{j,a},\ h_i \oplus r_{ji}\right), & a_i \neq a,\ i > 0, \end{cases} \qquad (1)

where hl and gen denote the highlighted and the general GRU, respectively.

The results of existing models for discourse parsing of multiparty dialogue on the STAC dataset are shown in Table 5. Similar to the dependency parsing task, we adopt UAS and LAS, short for unlabeled attachment score and labeled attachment score, to report the performance of the models. For the multiparty dialogue discourse parsing task, UAS and LAS respectively measure how well a model identifies the discourse structure and how well it both identifies the structure and labels the relations.

Table 5 The performance of discourse parsing on multi-party dialogues
Model                  UAS   LAS
MST [12]               68.8  50.4
ILP [13]               68.6  52.1
Deep Sequential [14]   73.2  55.7
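UAS and LAS as reported in Table 5 can be computed directly from predicted and gold attachments. The small helper below (our own illustration, assuming for simplicity that each EDU has exactly one gold and one predicted head) counts an EDU as correct for UAS when its predicted head matches the gold head, and for LAS only when the relation label matches as well.

```python
from typing import Dict, List, Tuple

def attachment_scores(gold: List[Tuple[int, int, str]],
                      pred: List[Tuple[int, int, str]]) -> Dict[str, float]:
    """Compute UAS/LAS from (head, dependent, relation) triples."""
    gold_by_dep = {dep: (head, rel) for head, dep, rel in gold}
    pred_by_dep = {dep: (head, rel) for head, dep, rel in pred}
    uas = las = 0
    for dep, (g_head, g_rel) in gold_by_dep.items():
        p_head, p_rel = pred_by_dep.get(dep, (None, None))
        if p_head == g_head:
            uas += 1
            if p_rel == g_rel:
                las += 1
    n = len(gold_by_dep)
    return {"UAS": uas / n, "LAS": las / n}

gold = [(1, 2, "Elaboration"), (1, 3, "QAP")]
pred = [(1, 2, "Elaboration"), (2, 3, "QAP")]
print(attachment_scores(gold, pred))   # {'UAS': 0.5, 'LAS': 0.5}
```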
4 Applications
As a fundamental task in natural language processing (NLP), discourse parsing has been successfully applied to other NLP tasks, such as question answering (QA) [71], text summarization [72−74], sentiment classification [3], language modeling [4], machine translation [5,75,76], and text categorization [6]. In this section, we briefly introduce the applications of discourse parsing to QA, MRC, sentiment analysis, text summarization, and machine translation.

4.1 Question answering
Discourse information has been explored for question answering (QA) systems. Considering that questions are often related to each other in real QA systems, discourse information is used to model the relations between context questions. [77] first proposes a discourse-aware model for context question answering. To explore the use and role of discourse in context QA, they propose that the discourse status relates to the discourse role of entities and to discourse transitions. [78] examined three models based on Centering Theory that treat a question sequence as a discourse for question answering and achieved obvious improvements.

The above two papers both model a question sequence as a discourse and adopt the discourse structure for answering questions. Another application of discourse is ranking answers for non-factoid QA systems [1]. They combine lexical semantics with discourse information and adopt two different methods to represent discourse information: a shallow discourse marker model and an RST discourse parser model. Experiments demonstrate the effect of the two representations and prove that modeling discourse structure is helpful for non-factoid questions.

4.2 Machine reading comprehension (MRC)
Different from the question answering task, the machine reading comprehension (MRC) task aims to let the machine answer questions given input passages or dialogues. Discourse structure has been used for modeling the input passages and for detecting relations between passages and questions, and its effectiveness for MRC has been proved. [2] incorporates discourse relations into machine reading tasks. In the paper, they adopt a hidden variable to represent discourse relations, including Causal, Temporal, Explanation, and Other. [79] proposed a novel method using an answer-entailing structure that models the discourse structure within the text with RST and word alignments between the text and the hypothesis.

4.3 Sentiment analysis
Sentiment analysis is a classical text classification task that detects the sentiments or emotions of an input text, such as positive and negative, or happiness and sadness. Discourse structure has been successfully applied to sentiment analysis. [3] adopts discourse structure for classifying sentiment. In RST-style discourse parsing, nucleus nodes play more important roles than satellite nodes in hypotactic RST relations (RST subtrees with two nodes). Considering the difference in sentiment between the satellite node and the nucleus node, the final sentiment of the input should be affected more by nucleus nodes than by satellite nodes. [80] proposed another sentiment analysis neural model, Discourse-LSTM, based on RST.
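A minimal way to use this observation (our own illustration, not the model of [3] or [80]) is to aggregate EDU-level sentiment scores with a larger weight on nucleus EDUs than on satellite EDUs; the weights below are arbitrary placeholders.

```python
from typing import List, Tuple

def document_sentiment(edus: List[Tuple[str, float]],
                       nucleus_weight: float = 1.0,
                       satellite_weight: float = 0.4) -> float:
    """Weighted average of EDU sentiment scores in [-1, 1].

    edus: list of (nuclearity, score) pairs, nuclearity in {"nucleus", "satellite"}.
    """
    weights = [nucleus_weight if nuc == "nucleus" else satellite_weight
               for nuc, _ in edus]
    total = sum(w * s for w, (_, s) in zip(weights, edus))
    return total / sum(weights)

# "Although the service was slow (satellite), the food was excellent (nucleus)."
print(document_sentiment([("satellite", -0.6), ("nucleus", 0.9)]))  # about 0.47, i.e., positive
```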
4.4 Text summarization
Text summarization is the task of summarizing an input document into a summary. For neural network-based models of text summarization, modeling the input document is an essential step, and discourse structure has been shown to bring improvements to text summarization.
As we know, [72] first proposed a discourse-based framework for document summarization. [81] proved the benefits of discourse structure for content selection in the text summarization task, including RST-based structural features and PDTB-based semantic features. [33] adopted an RST discourse parser to obtain the discourse dependency relations of a document and trimmed the discourse dependency tree as a tree knapsack problem. [73] extracted the discourse structure of product reviews with an off-the-shelf RST-style discourse parser to build an aspect rhetorical tree, and selected important aspects for generating the summary via a template-based framework. Different from [33], [82] can directly generate a discourse dependency tree for text summarization without transforming rhetorical discourse trees into dependency-based trees. [83] adopted both anaphora constraints and grammatical constraints, including RST and syntactic trees. [84] examined the role of the EDUs produced by an RST discourse parser and proved the benefits of EDU segmentation for content selection in text summarization. [85] adopted an RST discourse parser to segment discourse units and selected content using different models, including an RNN, a Transformer, and BERT. [74] proposes a discourse-aware neural model that captures the discourse structure using the RST tree and encodes discourse units with a graph neural network.

4.5 Machine translation
Machine translation is a traditional natural language processing task that aims to translate a source language into a target language. Discourse structure is used to model the semantic relations between discourse units and has been applied to machine translation [86,87]. Considering the importance and ambiguity of discourse connectives, Meyer introduced connectives to help machine translation [5,75,76]. After decades of development, research on machine translation has achieved great progress, moving from word-level and phrase-level translation to sentence-level translation. Document-level machine translation will be an important future trend, where the discourse structure of the input document will play a more essential role in modeling the text.

5 Future trends
5.1 Building a large-scale corpus
As mentioned above, the datasets for dialogue discourse parsing have been the bottleneck of the task. To further promote the development of the task, it is necessary to build a large-scale, high-quality corpus. Two points need more attention.
● Scale: Deep learning models are data-driven, and enough training data is necessary. For example, for multiparty dialogue discourse parsing, the numbers of dialogues, EDUs, and discourse relations should be big enough to train a powerful model. Besides, there should be enough instances of each discourse relation to avoid few-shot situations.
● Consistency: Due to the difficulty of annotating discourse structure and relations, it is challenging to ensure the consistency of the annotation. The annotators should be well trained, and fewer annotators would be better.

5.2 Deep graph-based method
There are two types of methods for the task of semantic dependency parsing: the transition-based approach and the graph-based approach. Because the task of discourse parsing for multiparty dialogue is non-projective, transition-based methods will not work for this task. There has been literature that adopted a transition-based method for semantic dependency graph parsing, but it did not achieve good results [88]. Furthermore, with the success of graph neural networks (GNNs) in NLP, it will be worth investigating GNN-based approaches for RST-style or multiparty dialogue discourse parsing.

5.3 Meta-learning based method
Due to the limitations of the discourse treebanks, there are not enough instances for many discourse relations. For example, there are few instances of Background and Alternation in the STAC dataset. Meta-learning methods can be a good solution, as they are naturally applicable to few-shot or one-shot phenomena. Meta-learning can be regarded as learning to learn, which aims to adapt quickly to new training data. We can learn meta-knowledge on other big datasets and then apply the meta-knowledge to the discourse treebanks. Meta-learning methods have been successfully applied to many tasks, including regression, classification, and reinforcement learning. However, there are not many studies that apply meta-learning to natural language processing, especially structural prediction and text categorization.

5.4 Exploring pre-trained representations
Two pre-training approaches, ELMo and BERT, have attracted widespread attention because these models can significantly improve the performance of many NLP tasks. Different from other approaches, ELMo is a general method to learn context-dependent representations from a BiLSTM [89]. Following ELMo, BERT was proposed to learn word representations based on a bidirectional Transformer [90]. Because RST-DT, PDTB, and STAC are all small-scale datasets, using pre-trained representations trained on a large corpus may significantly improve the performance of discourse parsing for multi-party dialogue. It is difficult to build large-scale datasets for NLP tasks with complex structures, so using pre-trained representations on small datasets will become a trend in NLP, and exploring pre-trained representations for multi-party discourse parsing should be worth studying. The effect of pretrained models has been proved on the PDTB dataset, where they achieved great improvements [55]. Because of the scale of the existing corpora for discourse parsing, the use of pretrained models should be further explored.

5.5 Multitask architecture with RST and PDTB
There is a large body of literature about discourse parsing on RST-DT and PDTB, and there has been work combining RST and PDTB [65]. It is necessary to combine previous methods and expand the dataset at the same time for the task of discourse parsing for multi-party dialogue.
There are two subtasks in discourse parsing for multi-party dialogue: predicting edges between EDUs and labeling the discourse relation on each edge. For predicting edges between EDUs, structural prediction on RST-DT can be an auxiliary task for our main task. RST aims to build a document into a tree structure, while discourse parsing on STAC needs to construct a graph structure. Therefore, common approaches for RST parsing are not directly applicable to this task, but we can expand our dataset using a multitask architecture, considering the limited instances in the datasets.

For labeling discourse relations, both RST-DT and PDTB can be used in a multitask architecture. The relations in STAC are quite similar to those in RST and PDTB, so discourse parsing on RST-DT and PDTB would improve the accuracy of labeling relations in STAC. In particular, labeling relations in a dialogue is related to implicit discourse relation recognition in PDTB. Therefore, different relevant tasks could be beneficial to one another.

6 Conclusion
In this survey, we introduce the task of discourse parsing and related datasets, mainly including RST-DT, PDTB, and STAC. Furthermore, we introduce existing methods for discourse parsing. We describe the applications of discourse parsing and present our opinions on the task. At last, we introduce future trends and related work.

Acknowledgements The research in this article is supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AA0101901), the National Key Research and Development Project (2018YFB1005103), the National Natural Science Foundation of China (Grant Nos. 61772156 and 61976073), Shenzhen Foundational Research Funding (JCYJ20200109113441941), and the Foundation of Heilongjiang Province (F2018013).

Appendix: Sense hierarchy
The sense hierarchies of RST-DT, PDTB 3.0, and STAC are respectively shown in Tables A1−A3.

Table A1 The sense hierarchy of the RST-DT corpus
Class           Relations
Attribution     attribution, attribution-negative
Background      background, circumstance
Cause           cause, result, consequence
Comparison      comparison, preference, analogy, proportion
Condition       condition, hypothetical, contingency, otherwise
Contrast        contrast, concession, antithesis
Elaboration     elaboration-additional, elaboration-general-specific, elaboration-part-whole, elaboration-process-step, elaboration-object-attribution, elaboration-set-member, example, definition
Enablement      purpose, enablement
Evaluation      evaluation, interpretation, conclusion, comment
Explanation     evidence, explanation-argumentative, reason
Joint           list, disjunction
Manner-Means    manner, means
Topic-Comment   problem-solution, question-answer, statement-response, topic-comment, comment-topic, rhetorical-question
Summary         summary, restatement
Temporal        temporal-before, temporal-after, temporal-same-time, sequence, inverted-sequence
Topic-Change    topic-shift, topic-drift

Table A2 The sense hierarchy of PDTB v3.0
Level-1        Level-2                          Level-3
Temporal       Synchronous                      −
               Asynchronous                     Precedence, Succession
Contingency    Cause                            Reason, Result, NegResult
               Cause+Belief                     Reason+Belief, Result+Belief
               Cause+SpeechAct                  Reason+SpeechAct, Result+SpeechAct
               Condition                        Arg1-as-cond, Arg2-as-cond
               Condition+SpeechAct              −
               Negative-condition               Arg1-as-negCond, Arg2-as-negCond
               Negative-condition+SpeechAct     −
               Purpose                          Arg1-as-goal, Arg2-as-goal
Comparison     Concession                       Arg1-as-denier, Arg2-as-denier
               Concession+SpeechAct             Arg2-as-denier+SpeechAct
               Contrast                         −
               Similarity                       −
Expansion      Conjunction                      −
               Disjunction                      −
               Equivalence                      −
               Exception                        Arg1-as-excpt, Arg2-as-excpt
               Instantiation                    Arg1-as-instance, Arg2-as-instance
               Level-of-detail                  Arg1-as-detail, Arg2-as-detail
               Manner                           Arg1-as-manner, Arg2-as-manner
               Substitution                     Arg1-as-subst, Arg2-as-subst

Table A3 The sense hierarchy and distribution of the STAC corpus
Relation                   All     Train   Test
Comment                    1851    1684    167
Clarification_question     260     240     20
Elaboration                869     771     98
Acknowledgment             1010    893     117
Continuation               987     873     114
Explanation                437     407     30
Conditional                124     105     19
Question-answer_pair       2541    2236    305
Alternation                146     128     18
Q-Elab                     599     525     74
Result                     578     551     27
Background                 61      58      3
Narration                  130     116     14
Correction                 212     189     23
Parallel                   215     196     19
Contrast                   493     449     44
Total                      10513   9421    1092

References
1. Jansen P, Surdeanu M, Clark P. Discourse complements lexical semantics for non-factoid answer reranking. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. 2014
38. Joty S, Carenini G, Ng R T. CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 2015, 41(3): 385–435
39. Li J, Li R, Hovy E. Recursive deep models for discourse parsing. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, 2061−2069
40. Li Q, Li T, Chang B. Discourse parsing with attention-based hierarchical neural networks. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016, 362–371
41. Jia Y, Ye Y, Feng Y, Lai Y, Yan R, Zhao D. Modeling discourse cohesion for discourse parsing via memory network. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018, 438–443
42. Yu N, Zhang M, Fu G. Transition-based neural RST parsing with implicit syntax features. In: Proceedings of the 27th International Conference on Computational Linguistics. 2018, 559–570
43. Jia Y, Feng Y, Ye Y, Lv C, Shi C, Zhao D. Improved discourse parsing with two-step neural transition-based model. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2018, 17(2): 11
44. Braud C, Plank B, Søgaard A. Multi-view and multi-task training of RST discourse parsers. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2016, 1903−1913
45. Braud C, Coavoux M, Søgaard A. Cross-lingual RST discourse parsing. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. 2017, 292–304
46. Pitler E, Nenkova A. Using syntax to disambiguate explicit discourse connectives in text. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. 2009, 13–16
47. Li S, Kong F, Zhou G. A joint learning approach to explicit discourse parsing via structured perceptron. In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer, Cham, 2014, 70−82
48. Marcu D, Echihabi A. An unsupervised approach to recognizing discourse relations. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002, 368–375
49. Wang X, Li S, Li J, Li W. Implicit discourse relation recognition by selecting typical training examples. In: COLING. 2012, 2757−2772
50. Lan M, Xu Y, Niu Z Y. Leveraging synthetic discourse data via multi-task learning for implicit discourse relation recognition. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2013, 476–485
51. Ji Y, Eisenstein J. One vector is not enough: Entity-augmented distributed semantics for discourse relations. Transactions of the Association for Computational Linguistics, 2015, 3: 329–344
52. Rutherford A T, Demberg V, Xue N. Neural network models for implicit discourse relation classification in English and Chinese without surface features. 2016, arXiv preprint arXiv:1606.01990
53. Braud C, Denis P. Comparing word representations for implicit discourse relation classification. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2015). 2015
54. Shi W, Demberg V. Next sentence prediction helps implicit discourse relation classification within and across domains. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, 5794−5800
55. Kishimoto Y, Murawaki Y, Kurohashi S. Adapting BERT to implicit discourse relation classification with a focus on discourse connectives. In: Proceedings of the 12th Language Resources and Evaluation Conference. 2020, 1152−1158
56. Rutherford A, Xue N. Discovering implicit discourse relations through brown cluster pair representation and coreference patterns. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014, 645−654
57. McKeown K, Biran O. Aggregated word pair features for implicit discourse relation disambiguation. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013, 69–73
58. Lei W, Wang X, Liu M, Ilievski I, He X, Kan M Y. SWIM: A simple word interaction model for implicit discourse relation recognition. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017, 4026−4032
59. Chen J, Zhang Q, Liu P, Qiu X, Huang X. Implicit discourse relation detection via a deep architecture with gated relevance network. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016, 1726−1735
60. Chen J, Zhang Q, Liu P, Huang X. Discourse relations detection via a mixed generative-discriminative framework. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. 2016, 30(1)
61. Lei W, Xiang Y, Wang Y, Zhong Q, Liu M, Kan M Y. Linguistic properties matter for implicit discourse relation recognition: Combining semantic interaction, topic continuity and attribution. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 2018, 32(1)
62. Guo F, He R, Jin D, Dang J, Wang L, Li X. Implicit discourse relation recognition using neural tensor network with interactive attention and sparse learning. In: Proceedings of the 27th International Conference on Computational Linguistics. 2018, 547–558
63. Bai H, Zhao H. Deep enhanced representation for implicit discourse relation recognition. In: Proceedings of the 27th International Conference on Computational Linguistics. 2018, 571–583
64. Xu S, Li P, Kong F, Zhu Q, Zhou G. Topic tensor network for implicit discourse relation recognition in Chinese. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 608–618
65. Liu Y, Li S, Zhang X, Sui Z. Implicit discourse relation classification via multi-task neural networks. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. 2016, 2750−2756
66. Rutherford A, Xue N. Improving the inference of implicit discourse relations via classifying explicit discourse connectives. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015, 799–808
67. Shi W, Demberg V. Learning to explicitate connectives with seq2seq network for implicit discourse relation classification. In: Proceedings of the 13th International Conference on Computational Semantics, Long Papers. 2019, 188–199
68. Dai Z, Huang R. A regularization approach for incorporating event knowledge and coreference relations into neural discourse parsing. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, 2967−2978
69. Guo F, He R, Dang J, Wang J. Working memory-driven neural networks with a novel knowledge enhancement paradigm for implicit discourse relation recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 7822−7829
70. He R, Wang J, Guo F, Han Y. TransS-driven joint learning architecture for implicit discourse relation recognition. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 139–148
71. Verberne S, Boves L, Oostdijk N, Coppen P A. Evaluating discourse-based answer extraction for why-question answering. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, 735–736
72. Marcu D. The Theory and Practice of Discourse Parsing and Summarization. MIT Press, 2000
73. Gerani S, Mehdad Y, Carenini G, Ng R, Nejat B. Abstractive summarization of product reviews using discourse structure. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, 1602−1613
74. Xu J, Gan Z, Cheng Y, Liu J. Discourse-aware neural extractive text summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 5021−5031
75. Meyer T. Disambiguating temporal-contrastive connectives for machine translation. In: Proceedings of the ACL 2011 Student Session. 2011, 46–51
76. Meyer T, Popescu-Belis A, Zufferey S, Cartoni B. Multilingual annotation and disambiguation of discourse connectives for machine translation. In: Proceedings of the 12th SIGdial Meeting on Discourse and Dialogue. 2011
77. Chai J, Jin R. Discourse structure for context question answering. In: Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004. 2004, 23–30
78. Sun M, Chai J Y. Discourse processing for context question answering based on linguistic knowledge. Knowledge-Based Systems, 2007, 20(6): 511–526
79. Sachan M, Dubey K, Xing E, Richardson M. Learning answer-entailing structures for machine comprehension. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015, 239–249
80. Kraus M, Feuerriegel S. Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees. Expert Systems with Applications, 2019, 118: 65–79
81. Louis A, Joshi A, Nenkova A. Discourse indicators for content selection in summarization. In: Proceedings of the SIGDIAL 2010 Conference. 2010, 147–156
82. Yoshida Y, Suzuki J, Hirao T, Nagata M. Dependency-based discourse parser for single-document summarization. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, 1834−1839
83. Durrett G, Berg-Kirkpatrick T, Klein D. Learning-based single-document summarization with compression and anaphoricity constraints. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016, 1998−2008
84. Li J J, Thadani K, Stent A. The role of discourse units in near-extractive summarization. In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2016, 137–147
85. Liu Z, Chen N. Exploiting discourse-level segmentation for extractive summarization. In: Proceedings of the 2nd Workshop on New Frontiers in Summarization. 2019, 116–121
86. Haenelt K. Towards a quality improvement in machine translation: Modelling discourse structure and including discourse development in the determination of translation equivalents. In: Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation. Morristown: Association for Computational Linguistics. 1992, 205–212
87. Mitkov R. How could rhetorical relations be used in machine translation? In: Proceedings of Intentionality and Structure in Discourse Relations. 1993
88. Wang Y, Che W, Guo J, Liu T. A neural transition-based approach for semantic dependency graph parsing. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1)
89. Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018, 2227−2237
90. Devlin J, Chang M W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, 4171−4186

Jiaqi Li received the BS degree from the School of Computer Science and Technology, Heilongjiang University, China in 2015. He is currently working toward the PhD degree at the Harbin Institute of Technology, China. His research interests include discourse parsing for multiparty dialogues and its applications.

Ming Liu received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2010. He is a full professor and PhD supervisor of the Department of Computer Science, and a faculty member of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. His research interests include knowledge graphs and machine reading comprehension.

Bing Qin received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2005. She is a full professor of the Department of Computer Science, and the director of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. Her research interests include natural language processing, information extraction, document-level discourse analysis, and sentiment analysis.

Ting Liu received the PhD degree from the Department of Computer Science, Harbin Institute of Technology, China in 1998. He is a full professor of the School of Computer Science and Technology, and the director of the Faculty of Computing, Harbin Institute of Technology, China. His research interests include information retrieval, natural language processing, and social media analysis.