Cho, Y. M., et al. (2022). Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection. EMNLP.


Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection

Young-Min Cho Li Zhang Chris Callison-Burch


University of Pennsylvania
{jch0, zharry, ccb}@seas.upenn.edu

Abstract
Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component for language understanding. We address two challenges in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioning on the mention, and then casts the task to a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on existing datasets.

Figure 1: Example of an Entity Linking problem.

1 Introduction

Entity linking (EL) is an important Natural Language Processing (NLP) task that associates ambiguous mentions to corresponding entities in a knowledge base (KB, also called knowledge graph). EL is a crucial component of many NLP applications, such as question answering (Yih et al., 2015) and information extraction (Hoffart et al., 2011).

Although there have been significant and continuous developments in EL, most work requires sufficient labeled data and a well-developed KB (Zhang et al., 2021; Mulang' et al., 2020; van Hulst et al., 2020; Raiman and Raiman, 2018). However, many real-world applications, especially those in specific domains, suffer from scarcity of both training data and a fully-populated KB. Previous research has tackled this problem by learning EL models without data labeled with entity links, but requires indirect supervision in the form of textual descriptions attached to entities in KBs, drawn from sources such as Wikipedia (Cao et al., 2017; Logeswaran et al., 2019). However, such descriptions may not be available in KBs in low-resource domains such as medicine or law. Thus, we focus on fully unsupervised EL, which only has access to the entities' names and their KB relations like subclass-of (Le and Titov, 2019; Arora et al., 2021).

One challenge of unsupervised EL is leveraging useful information from potentially noisy and misleading context (Pan et al., 2015). Specifically, a local context (the sentence containing the mention) may not be sufficient for disambiguating the target mention without the global context (other sentences in the document). For example, in Figure 1, the target mention 'band' cannot be disambiguated solely with the local context "This band is so lovely", but needs to consider the global context that also includes "I can't wait for my wedding."

To address this problem, we introduce an unsupervised approach to EL that builds on the strengths of large neural language models like GPT-3 (Brown et al., 2020). We use zero-shot GPT-3 prompting for two sub-tasks. First, we perform guided summarization, which summarizes the input document conditioned on the target mention and outputs a condensed global context. Then, we cast EL to a multiple-choice selection problem where the model chooses an entity from a list of candidates. We refer to our unsupervised EL model as SumMC (Summarization+Multiple-Choice).

With a few exceptions (Ratinov et al., 2011; Cheng and Roth, 2013), the majority of EL work targets named entities, such as names of people
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9394 - 9401
December 7-11, 2022 ©2022 Association for Computational Linguistics
Figure 2: Pipeline of SumMC. Texts highlighted with green are machine generated.

and organizations (Mulang' et al., 2020; van Hulst et al., 2020), neglecting entities such as physical objects or concepts. To comprehensively evaluate our model, we create the first EL dataset on procedural texts, WikiHow-Wikidata, which links noun phrases from WikiHow [1] to Wikidata [2] entities (Vrandečić and Krötzsch, 2014).

Our SumMC model outperforms current state-of-the-art (SoTA) unsupervised EL models on our new WikiHow-Wikidata data, as well as existing benchmarks including the AIDA-CoNLL (Hoffart et al., 2011), WNED-Wiki, and WNED-Clueweb datasets (Guo and Barbosa, 2018). In addition, we provide ablation studies to show the positive influence of generating guided summaries. [3]

2 Methodology

Fully unsupervised EL is the task of linking a target mention from a given document to some entity in a KB without requiring any text data labeled with explicit links to the KB. The only available information in the KB is the names of the entities and the relations among them. In this paper, we follow previous work (Le and Titov, 2019; Arora et al., 2021) and use Wikidata as our target KB, which defines instance-of and subclass-of relations between entities. Wikidata can be seen as a knowledge graph with entities as nodes and relations as edges, and the popularity of an entity can be represented by its degree.

We now introduce SumMC, our proposed unsupervised EL model, which consists of two instances of a generative language model. The first performs guided summarization by generating a summary of the document conditioned on a mention. The second casts EL to a multiple-choice selection problem and chooses an appropriate entity from a list of candidates generated by some heuristics. In our work, we use GPT-3 as the language model due to its superior performance on various NLP tasks (Brown et al., 2020).

Candidate Generation. Following previous work (Le and Titov, 2019; Arora et al., 2021), we first select all entities from Wikidata whose name or alias contains all tokens in a mention. Then, we narrow the list down to the top 20 entities with the highest degree (in-degree + out-degree) in the KB. For each entity in the final list, we produce a textual representation by concatenating the names of all related entities. For example, the representation of the candidate ribbon in Figure 1 is "ribbon: costume component, textile."

SumMC. The first application of GPT-3 performs a guided summarization of the input document. With zero-shot prompting, GPT-3 summarizes the texts using the prompt "[D] Summarize the text above in one sentence: [M]", where [D] is the input document and [M] is the target mention. Here, we force GPT-3's summarization to start with the mention to ensure that the conditioned summary contains both the target mention and related global context. At this point, the generated summary serves as a global context while the sentence containing the mention serves as a local context, both of which help disambiguate the target mention.

The second application of GPT-3 casts the task to multiple-choice selection, following many successful cases (Ouyang et al., 2022). With the two contexts, GPT-3 transforms EL into a multiple-choice question using the prompt "According to the context above, which of the following best describes [M]?", followed by the representations of the mention [M]'s candidates as choices.

[1] https://www.wikihow.com/Main-Page
[2] https://www.wikidata.org/wiki/Wikidata:Main_Page
[3] The code and data are available at https://github.com/JeffreyCh0/SumMC
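The candidate-generation heuristic above can be sketched as follows. This is a minimal illustration, not the authors' code: the `entities` records (with `name`, `aliases`, `degree`, and `related` fields) are hypothetical stand-ins for data extracted from a Wikidata dump.

```python
def generate_candidates(mention, entities, top_k=20):
    """Select entities whose name or an alias contains all mention tokens,
    then keep the top-k by degree (in-degree + out-degree)."""
    tokens = set(mention.lower().split())
    matches = [
        e for e in entities
        if any(tokens <= set(name.lower().split())
               for name in [e["name"], *e.get("aliases", [])])
    ]
    # Rank by KB popularity, approximated by node degree.
    matches.sort(key=lambda e: e["degree"], reverse=True)
    return matches[:top_k]

def candidate_text(entity):
    """Textual representation: the entity name followed by the names of its
    related entities, e.g. 'ribbon: costume component, textile'."""
    return f"{entity['name']}: " + ", ".join(entity["related"])
```

Note that the token-subset test mirrors the paper's "name or alias contains all tokens in a mention" condition, so multi-word entity names that merely contain the mention token (e.g. "ribbon cable" for the mention "ribbon") also qualify as candidates.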
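The two zero-shot prompts can be assembled as plain strings. Only the two prompt templates are taken from the paper; the helper names and the lettered-choice formatting are our own assumptions for illustration.

```python
def summarization_prompt(document, mention):
    # Guided summary: the completion is forced to start with the mention,
    # so the prompt ends with it.
    return f"{document} Summarize the text above in one sentence: {mention}"

def multiple_choice_prompt(summary, local_sentence, mention, candidate_texts):
    # Global context (the guided summary) plus local context (the sentence
    # containing the mention), followed by candidates as lettered choices.
    context = f"{summary}\n{local_sentence}"
    choices = "\n".join(
        f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(candidate_texts)
    )
    return (f"{context}\nAccording to the context above, "
            f"which of the following best describes {mention}?\n{choices}")
```

In use, the completion of the first prompt is fed into the second as the global context, and the model's selected choice is read off as the predicted entity.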
3 WikiHow-Wikidata Dataset

Most work on EL has targeted named entities, especially in the news. To account for more diverse entities in different styles of texts, we create a human-annotated dataset called WikiHow-Wikidata that links noun phrases in procedural texts to Wikidata. Research on entities in procedural texts has long received much attention in the community (Dalvi et al., 2018; Zhang et al., 2020; Tandon et al., 2020; Zhang, 2022), yet no existing large-scale dataset provides entity links for this style of text.

To create the dataset, we first extract 40,000 articles from the WikiHow corpus (Zhang et al., 2020) detailing everyday procedures. To select mentions to link, we choose the top 3 most-frequently-occurring nouns from each article using a part-of-speech tagger, assuming that most mentions in a document share the same word sense (Gale et al., 1992). Then, we ask students from a university in the U.S. to manually link these mentions to Wikidata entities. Finally, to measure and control annotation quality, we manually annotate a subset of examples beforehand as control questions. Details about our data collection process, interface, and measures for quality control can be found in Appendix B. Eventually, WikiHow-Wikidata consists of 11,287 triples of a WikiHow article, a target mention, and a Wikidata entity.

4 Experiments

We evaluate our SumMC model along with other strong baselines on widely used EL datasets and our WikiHow-Wikidata dataset.

4.1 Models

τMIL-ND: Le and Titov (2019) introduced the first EL model that did not require an annotated dataset. Their model casts the EL task to a binary multi-instance learning (Dietterich et al., 1997) problem along with a noise-detecting classifier.

Eigentheme: Arora et al. (2021) created Eigentheme, the current state of the art among fully unsupervised EL models. By representing each entity with its graph embedding, the model identifies a low-rank subspace using SVD on the embedding matrix and ranks candidates by their distance to this hyperplane.

To analyze the effect of using global context in our SumMC model, we report evaluation results using three variations.

SumMC: Our proposed model integrates GPT-3 guided summarization and multiple-choice selection. We use the Curie model for summarization conditioned on the target mention and the Davinci model for multiple-choice selection. As discussed before, both global and local contexts are provided.

–Guide: This is an ablated version of SumMC that generates summaries without conditioning on the target mention. While both global and local contexts are provided, the global context is not guaranteed to be related to the target mention.

–Sum: This is another ablated version that does not generate summaries of the whole document but directly performs multiple-choice selection, given only the local context of the mention.

4.2 Dataset

We choose AIDA-CoNLL-testb (AIDA-B), WNED-Wiki, and WNED-Clueweb (WNED-Cweb) to measure models' performance on disambiguating named entities, and use our WikiHow-Wikidata (WikiWiki) dataset for evaluating on noun phrases.

Following previous settings (Tsai and Roth, 2016; Guo and Barbosa, 2018; Arora et al., 2021), we report micro precision@1 (P@1) and categorize each mention as 'easy' or 'hard' by whether the candidate entity with the highest degree in the knowledge graph is the correct answer. Performance on 'hard' mentions is important since it shows the model's ability on highly ambiguous mentions. 'Not-found' is for mentions whose candidate list does not contain the correct answer. 'Overall' performance is reported considering all mentions, treating 'Not-found' mentions as false predictions. The distribution of each dataset is shown in Table 1.

Dataset      #Easy        #Hard        #Not-found   #Documents
WikiWiki     2,727 (24%)  8,560 (76%)  0            7,097
AIDA-B       2,555 (57%)  1,136 (25%)  787 (18%)    230
WNED-Wiki    2,731 (41%)  1,475 (22%)  2,488 (37%)  318
WNED-Cweb    4,667 (42%)  3,056 (28%)  3,317 (30%)  320

Table 1: Statistics of datasets showing distributions of mention difficulty.

5 Results and Discussion

We show our results in Table 2. Our SumMC model achieves significantly better results than other unsupervised EL models on all evaluation datasets. Specifically, SumMC has a strong performance on
WikiHow-Wikidata AIDA-B WNED-Wiki WNED-Clueweb
Overall Easy Hard Overall Easy Hard Overall Easy Hard Overall Easy Hard
τ MIL-ND - - - 0.45 0.70 0.19 0.13 - - 0.27 - -
Eigentheme 0.50 0.61 0.53 0.62 0.86 0.50 0.44 0.82 0.47 0.41 0.77 0.29
SumMC (ours) 0.76 0.62 0.80 0.64 0.80 0.71 0.47 0.81 0.65 0.48 0.75 0.60
Improvement over SoTA +0.26 +0.01 +0.27 +0.02 -0.06 +0.21 +0.03 -0.01 +0.18 +0.07 -0.02 +0.31

Table 2: Performance comparison across SoTA models. Results are reported as Precision@1. We take the results of τMIL-ND and Eigentheme on public datasets from Arora et al. (2021). 'Overall' shows results considering 'Not-found' mentions.

'hard' mentions. In comparison, Eigentheme, the current SoTA model, has slightly higher scores on 'easy' mentions on most datasets but performs worse on 'hard' mentions.

Comparison with Previous Models. Overall, SumMC achieves 63% precision, while Eigentheme scores 47%. Although SumMC has 1% less precision on 'easy' cases (75% vs. 76%), it outperforms Eigentheme on 'hard' cases by 26% (73% vs. 47%). Eigentheme assumes that gold entities in a document are topically related (Arora et al., 2021). It captures global context only using the relations between mentions, neglecting the texts in the document. However, this assumption might not always hold. Our model, in contrast, removes this assumption by producing a guided summary of the texts in the document.

Effect of Global Context. We show the results of our ablation study in Table 3. On all datasets, SumMC outperforms the variation without having the summary guided by the mention (–Guide), which in turn outperforms the variation without summarization (–Sum). This result shows the efficacy of not only using summaries as global contexts, but also forcing the summaries to contain information about the mention. Indeed, in many cases, we find that the mention might not be central to the document, so a standard summary might contain noise or insufficient signal for disambiguating the mention.

                 –Guide   –Sum
WikiWiki Easy    -0.02    -0.01
AIDA-B Easy      -0.02    -0.03
WNED-Wiki Easy   -0.01    -0.07
WNED-Cweb Easy   -0.02    -0.03
Average Easy     -0.02    -0.04
WikiWiki Hard    -0.01    -0.00
AIDA-B Hard      -0.04    -0.08
WNED-Wiki Hard   -0.01    -0.06
WNED-Cweb Hard   -0.01    -0.02
Average Hard     -0.02    -0.04

Table 3: Ablation study showing the effects on our SumMC model of removing the mention condition on the summary, or removing the global context entirely.

Interestingly, we observe that the performance gap between variations on WikiHow-Wikidata is relatively small. We speculate that WikiHow's instructional sentences are usually self-explanatory, so the local context often provides enough information to disambiguate the mention.

Effect of Multiple-Choice Selection. Using similarity measures to link a mention to an entity is one of the most successful EL methods (Pan et al., 2015). We also examine this approach, using Sentence-BERT (Reimers and Gurevych, 2019) and cosine similarity instead of the multiple-choice selection model. This approach achieves only 42% P@1 on the AIDA-B dataset. The text-based embedding approach might not be practical in our setting because entity candidates can only be represented by minimal texts, making text embeddings unstable.

Error Analysis. In some cases, common sense is required to disambiguate mentions. For example, "Japan" in an article about a soccer tournament should be linked to the entity "Japan national football team" instead of the country "Japan." The correct answer can be inferred from the term "Asian Cup" in the text. However, our model fails in such cases when the word 'soccer' is not included in the context.

Currently, each of our multiple choices is a concatenation of the target entity and its related entities based on two KB relations: instance-of and subclass-of. However, these might be insufficient. For example, most person entities have 'human' as the only related entity, which is uninformative. Conversely, considering other relations might also introduce unnecessary noise.

6 Conclusion

We introduce SumMC, a fully unsupervised Entity Linking model that first produces a summary of the document guided by the mention, and then casts the task to a multiple-choice format. Our model achieves new state-of-the-art performance on various benchmarks, including our new WikiHow-Wikidata, the first EL dataset on procedural texts. Notably, our approach of guided summarization may be applied to other tasks that benefit from global contexts. Future work might also extend our methods to supervised settings.

Limitations

Because we focus on fully unsupervised models, we do not consider fine-tuning GPT-3, nor do we provide a direct comparison with other supervised approaches.

A potential criticism of this work is our use of GPT-3. Although GPT-3 is publicly available to everyone, it is not an open-source model and can be expensive to use at scale.

For direct comparison, we use the candidate generation method from Le and Titov (2019) and Arora et al. (2021), which has low recall on the datasets. Although there are better methods (Sil et al., 2012; Charton et al., 2014), we do not consider them in this work.

Acknowledgements

This research is based upon work supported in part by the DARPA KAIROS Program (contract FA8750-19-2-1004), the DARPA LwLL Program (contract FA8750-19-2-0201), the IARPA BETTER Program (contract 2019-19051600004), and the NSF (Award 1928631). Approved for Public Release, Distribution Unlimited. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, IARPA, NSF, or the U.S. Government. We thank the students from the CIS-421/521 course in 2021 at the University of Pennsylvania for annotating the WikiHow-Wikidata dataset.

References

Akhil Arora, Alberto Garcia-Duran, and Robert West. 2021. Low-rank subspaces for unsupervised entity linking. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8037–8054, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.

Yixin Cao, Lifu Huang, Heng Ji, Xu Chen, and Juanzi Li. 2017. Bridge text and knowledge by learning multi-prototype entity mention embedding. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1623–1633, Vancouver, Canada. Association for Computational Linguistics.

Eric Charton, Marie-Jean Meurs, Ludovic Jean-Louis, and Michel Gagnon. 2014. Improving entity linking using surface form refinement. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4609–4615, Reykjavik, Iceland. European Language Resources Association (ELRA).

Xiao Cheng and Dan Roth. 2013. Relational inference for wikification. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1787–1796, Seattle, Washington, USA. Association for Computational Linguistics.

Bhavana Dalvi, Lifu Huang, Niket Tandon, Wen-tau Yih, and Peter Clark. 2018. Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1595–1604, New Orleans, Louisiana. Association for Computational Linguistics.

Thomas G Dietterich, Richard H Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71.

William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. One sense per discourse. In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992.

Zhaochen Guo and Denilson Barbosa. 2018. Robust named entity disambiguation with random walks. Semantic Web, 9(4):459–479.

Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 782–792, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Filip Ilievski, Daniel Garijo, Hans Chalupsky, Naren Teja Divvala, Yixiang Yao, Craig Rogers, Ronpeng Li, Jun Liu, Amandeep Singh, Daniel Schwabe, and Pedro Szekely. 2020. KGTK: A toolkit for large knowledge graph manipulation and analysis. In International Semantic Web Conference, pages 278–293. Springer.
Phong Le and Ivan Titov. 2019. Distant learning for entity linking with automatic noise detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4081–4090, Florence, Italy. Association for Computational Linguistics.

Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, and Honglak Lee. 2019. Zero-shot entity linking by reading entity descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3449–3460, Florence, Italy. Association for Computational Linguistics.

Isaiah Onando Mulang', Kuldeep Singh, Chaitali Prabhu, Abhishek Nadgeri, Johannes Hoffart, and Jens Lehmann. 2020. Evaluating the impact of knowledge graph context on entity disambiguation models. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 2157–2160.

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback.

Xiaoman Pan, Taylor Cassidy, Ulf Hermjakob, Heng Ji, and Kevin Knight. 2015. Unsupervised entity linking with Abstract Meaning Representation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1130–1139, Denver, Colorado. Association for Computational Linguistics.

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710.

Jonathan Raiman and Olivier Raiman. 2018. DeepType: multilingual entity linking by neural type system evolution. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1375–1384, Portland, Oregon, USA. Association for Computational Linguistics.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

Avirup Sil, Ernest Cronin, Penghai Nie, Yinfei Yang, Ana-Maria Popescu, and Alexander Yates. 2012. Linking named entities to any database. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 116–127, Jeju Island, Korea. Association for Computational Linguistics.

Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, and Eduard Hovy. 2020. A dataset for tracking entities in open domain procedural text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6408–6417, Online. Association for Computational Linguistics.

Chen-Tse Tsai and Dan Roth. 2016. Cross-lingual wikification using multilingual embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 589–598, San Diego, California. Association for Computational Linguistics.

Johannes M van Hulst, Faegheh Hasibi, Koen Dercksen, Krisztian Balog, and Arjen P de Vries. 2020. REL: An entity linker standing on the shoulders of giants. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2197–2200.

Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85.

Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1321–1331, Beijing, China. Association for Computational Linguistics.

Li Zhang. 2022. Reasoning about procedures with natural language processing: A tutorial.

Li Zhang, Qing Lyu, and Chris Callison-Burch. 2020. Reasoning about goals, steps, and temporal ordering with WikiHow. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4630–4639, Online. Association for Computational Linguistics.

Wenzheng Zhang, Wenyue Hua, and Karl Stratos. 2021. EntQA: Entity linking as question answering. arXiv preprint arXiv:2110.02369.
Document
SOCCER - JAPAN GET LUCKY WIN, CHINA IN SURPRISE DEFEAT. Nadim Ladki AL-AIN, United Arab Emirates
1996-12-06 Japan began the defence of their Asian Cup title with a lucky 2-1 win against Syria in a Group C championship
match on Friday. But China saw their luck desert them in the second match of the group, crashing to a surprise 2-0 defeat to
newcomers Uzbekistan. China controlled most of the match and saw several chances missed until the 78th minute when Uzbek
striker Igor Shkvyrin took advantage of a misdirected defensive header to lob the ball over the advancing Chinese keeper and
into an empty net. Oleg Shatskiku made sure of the win in injury time, hitting an unstoppable left foot shot from just outside the
area. The former Soviet republic was playing in an Asian Cup finals tie for the first time. Despite winning the Asian Games title
two years ago, Uzbekistan are in the finals as outsiders. Two goals from defensive errors in the last six minutes allowed Japan to
come from behind and collect all three points from their opening meeting against Syria. Takuya Takagi scored the winner in the
88th minute, rising to head a Hiroshige Yanagimoto cross towards the Syrian goal which goalkeeper Salem Bitar appeared to
have covered but then allowed to slip into the net. It was the second costly blunder by Syria in four minutes. Defender Hassan
Abbas rose to intercept a long ball into the area in the 84th minute but only managed to divert it into the top corner of Bitar’s goal.
Nader Jokhadar had given Syria the lead with a well-struck header in the seventh minute. Japan then laid siege to the Syrian
penalty area for most of the game but rarely breached the Syrian defence. Bitar pulled off fine saves whenever they did. Japan
coach Shu Kamo said: "The Syrian own goal proved lucky for us. The Syrians scored early and then played defensively and
adopted long balls which made it hard for us." Japan, co-hosts of the World Cup in 2002 and ranked 20th in the world by FIFA,
are favourites to regain their title here. Hosts UAE play Kuwait and South Korea take on Indonesia on Saturday in Group A
matches. All four teams are level with one point each from one game.
Mention      Summary
–            Japan began the defence of their Asian Cup title with a lucky 2-1 win against Syria in a Group C championship match on Friday.
Japan        Japan won 2-1 against Syria in the first game of the Asian Cup, while China lost 2-0 to Uzbekistan in the second game of the group.
Syria        Syria lost to Japan 2-1 in the Asian Cup championship, with two late goals coming from defensive errors.
Uzbekistan   Uzbekistan defeated China 2-0 in their first match of the Asian Cup, surprising many observers.

Table 4: Examples of guided summarization on the '1163testb_soccer' document in the AIDA-B dataset.

A Examples of Guided Summarization

Based on the document '1163testb_soccer' in the AIDA-B dataset, we show examples of guided summarization in Table 4. In the first example, the model generates a general document summary since it is not guided by a mention. Thus, information about Uzbekistan is not shown in the summary. The latter three examples are guided by 'Japan', 'Syria', and 'Uzbekistan', and give corresponding summaries specific to the mention. We also provide example guided summaries for the AIDA-B dataset, which can be found in the uploaded file.

B Creation of WikiHow-Wikidata

Our annotation interface shows example sentences from a WikiHow article and asks the annotator to select the correct sense of one of the three most frequent nouns. Our inventory of senses is a numbered list of possible Wikidata candidate entities, along with a short description of each sense. Participants read the article and select the word sense by picking the closest match from the candidate list, or choosing "No Answer" if there is none. Annotators can also input multiple answers if more than one candidate matches the correct sense inferred from the example sentences. We do not force participants to input only one answer because it is common in Wikidata for multiple entities to describe the same meaning. Our program records the WikiHow article URL, the target mention, and the Wikidata QID the students selected. We manually annotated 30 questions as control questions. The program shows a random control question every ten questions without telling participants. The annotation program is available in the uploaded file.

Eventually, we collect 31,354 responses from 521 participants. We then filter for qualifying participants, so that only those with more than 95% accuracy on the control questions remain. Hence, we end up with a cleaned set of 23,352 responses.

To make the dataset applicable to the different models examined in our paper, we do further filtering on the cleaned set. We run the candidate generation described in Section 2, and exclude entities that cannot be found in the DeepWalk (Perozzi et al., 2014) graph embeddings trained on Wikidata by Arora et al. (2021). Also, we drop mentions whose candidate list does not contain the gold entity or contains only one entity. As a result, we get a final set of 11,287 mentions.
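The quality-control filtering described above amounts to keeping only responses from annotators whose control-question accuracy exceeds 95%. A minimal sketch; the record fields (`annotator`, `is_control`, `correct`) are hypothetical stand-ins for the annotation program's output format.

```python
from collections import defaultdict

def filter_responses(responses, threshold=0.95):
    """Keep non-control responses from annotators whose accuracy on
    control questions exceeds the threshold."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in responses:
        if r["is_control"]:
            totals[r["annotator"]] += 1
            hits[r["annotator"]] += int(r["correct"])
    # Annotators with no control questions answered are excluded by default.
    qualified = {a for a, n in totals.items() if hits[a] / n > threshold}
    return [r for r in responses
            if r["annotator"] in qualified and not r["is_control"]]
```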
C Effect of GPT-3 Engine Size
We also compare the impact of GPT-3 engine size on our SumMC model. Guided summarization is effective regardless of the engine: changing only the engine size, our model achieves 0.631 P@1 with Ada and 0.633 P@1 with Babbage on AIDA-B, which ties with the 0.636 P@1 achieved by Curie. This gives an alternative option to users with a limited budget who still want moderate performance: Ada is priced 87% cheaper than Curie, yet its result is essentially equivalent to Curie's.

On the other hand, multiple-choice selection requires a large model. Compared with the 0.633 P@1 on AIDA-B with the Davinci engine, Curie and Babbage score only 0.204 and 0.196 P@1, respectively, while the Ada engine fails to complete the evaluation.

Using our model's settings, it costs around $0.002 for guided summarization and $0.01 for multiple-choice selection.

D Model Setting Details


Since most of our code consists of GPT-3 API calls, SumMC does not have strong computational resource requirements.

In our model, we used the default hyperparameter settings for both guided summarization and multiple-choice selection. In detail, we set temperature=0.7, max_tokens=256, top_p=1, frequency_penalty=0, and presence_penalty=0.

Due to the input token limit of GPT-3 engines, we truncated the input document to the 512 words surrounding the target mention during guided summarization.
We used the ‘2021-09-13’ dump of Wikidata
in our model, and used Knowledge Graph Toolkit
(Ilievski et al., 2020) to extract entities and their
relations.
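The entity degree used for candidate ranking (Section 2) can be recovered from the extracted relations as a simple edge count. A generic sketch over (subject, relation, object) triples, independent of the KGTK API:

```python
from collections import Counter

def entity_degrees(triples):
    """Degree = in-degree + out-degree per entity, counted over the
    instance-of / subclass-of edges of a (subject, relation, object) list."""
    degree = Counter()
    for subj, rel, obj in triples:
        if rel in ("instance-of", "subclass-of"):
            degree[subj] += 1  # outgoing edge
            degree[obj] += 1   # incoming edge
    return degree
```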
