Zhao 2023 Panacea
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
System Demonstrations, pages 67–74
May 2-4, 2023 ©2023 Association for Computational Linguistics
cannot cover: first, true or unverified information with intent to harm; second, scenarios where no verified knowledge database is available. Rumour detection cannot prove the truth of a claim, but it may alert the user to claims with a high risk of being misinformation.

Previous work has either retrieved tweets from a short fixed time period (Sharma et al., 2020) or searched recent tweets (Finn et al., 2014), which Twitter limits to the last 7 days. We instead maintain an updated database consisting of an annotated tweet dataset with popular claims and an unlabelled stream of COVID-19 related tweets that are crawled and selected periodically to update the dataset. Besides building on the various analytic functionalities used in previous work, PANACEA improves the architecture of these elements and adds extra features to the updated dataset for more efficient results.

A screencast video introducing the system³, illustrating its use in checking a COVID-19 claim, and the demo⁴ are also available online. The system can be easily adapted to other claim topics.

³ https://www.youtube.com/watch?v=D1PN8_9oYso
⁴ https://panacea2020.github.io/

PANACEA covers various types of misinformation detection related to COVID-19 with the following contributions:

• We built a new web-based system, PANACEA, which is able to perform both fact-checking and rumour detection with natural language claims submitted by users. The system includes visualisations of various statistical analyses of the results for a better user understanding.

• PANACEA performs automated veracity assessment and provides supporting evidence that can be ranked by various criteria, supported by novel natural language inference methods. The system is able to manage multiple user requests with low latency thanks to our development of a queuing system.

• PANACEA is able to perform automated rumour detection by exploiting state-of-the-art research on propagation patterns. The system uses an annotated dataset, and streams of COVID-19 tweets are collected to maintain an updated database.

2 Datasets

The following datasets are used in the project:

Knowledge Database This is used for fact-checking and includes COVID-19 related documents from selected reliable sources⁵. The documents were cleaned and split into 300-token paragraphs to construct a reliable knowledge database, whose supporting documents are retrieved and visualised in our system.

⁵ Centers for Disease Control and Prevention (CDC), European Centre for Disease Prevention and Control (ECDC), WebMD and World Health Organisation (WHO)

PANACEA Dataset (Arana-Catania et al., 2022), constructed from COVID-19 related data sources⁶, using BM25 and MonoT5 (Nogueira et al., 2020) to remove duplicate claims. This dataset includes 5,143 labelled claims (1,810 False and 3,333 True), and their respective text, source and claim sub-type.

⁶ Corona VirusFacts Database, CoAID dataset (Cui and Lee, 2020), MM-COVID (Li et al., 2020), CovidLies (Hossain et al., 2020), TREC Health Misinformation track and TREC COVID challenge (Voorhees et al., 2021)

COVID-RV dataset In order to fine-tune our model, we constructed a new COVID-19 related propagation tree dataset for rumour detection. Similar previous datasets are Twitter15 and Twitter16 (Ma et al., 2018), which contain propagation trees of widespread tweets with rumour labels; however, they are not COVID-19 related. Our dataset has been constructed by extending COVID-RV (Kochkina et al., 2023), including the number of retweets, user id, post time, text, location and tweet reply ids as metadata for each tweet. Each tree is annotated with a related claim chosen from our claim dataset and a stance label (Support or Refute) towards its related claim. The stance label for each tree is based purely on the content of the source tweet. In COVID-RV the conversations are annotated as either True or False based on the veracity of the claim and the stance of the source tweet towards it. Tweets supporting a false claim or challenging a true claim are annotated as False; tweets supporting a true claim or challenging a false claim are annotated as True. The Twitter15 and Twitter16 datasets also contain Unverified conversations, which discuss claims that are neither confirmed nor denied.

COVID Twitter Propagation Tree (Live) Besides the last dataset constructed for fine-tuning,
PANACEA also runs a crawler to collect a stream of COVID-19 tweets that are used to maintain an updated database. This live dataset is not annotated; instead, it is labelled by the pre-trained rumour detection model. As Twitter's search API does not allow retrieval of tweets beyond a one-week window, we retrieve COVID-19 related historical tweets based on the widely used COVID-19-TweetIDs dataset (Chen et al., 2020), which contains more than 1 billion tweet IDs. Considering the size of the dataset, and for storage and retrieval efficiency, we filtered out the less popular tweets with limited impact. To date, more than 12k propagation trees have been collected, starting from January 2020. For each tweet, its pseudo rumour label is generated by the trained model.

3 Architecture of PANACEA

Figure 1 shows an overview of PANACEA, including two functions: fact-checking and rumour detection for COVID-19. For fact-checking, there are three modules: (1) resource allocation system; (2) veracity assessment; and (3) supporting evidence retrieval. PANACEA also supports a unique function, rumour detection by propagation patterns, which has the following modules: (1) tweet retrieval; (2) rumour detection; and (3) tweet meta-information analysis.

[Figure 1: Overview of PANACEA. Panel titles: FACT CHECKING (Veracity Assessment; Ranked Supporting Evidence; Stance/Relevance Analysis) and RUMOUR DETECTION (Rumour Assessment; User Stance/Sentiment Analysis; Propagation visualisation). Other labels: Input Claim; GPU Resource Allocation; Claims Queuing; Tweets Retrieval; Tweet Database.]

retrieval are based on our natural language inference model NLI-SAN (Arana-Catania et al., 2022), which needs GPU resources to run. Therefore we built a queuing system that manages the resources and queues the claims while the GPUs are being used. The results are sent to the user. To avoid duplicate searches, a temporary copy of this result is saved in our database, keyed on the user's IP address, until the user searches for a new claim or the saved period expires.

Veracity Assessment PANACEA is supported by NLI-SAN (Arana-Catania et al., 2022), which incorporates natural language inference results of claim-evidence pairs into a self-attention network. The input claim c is paired with each retrieved relevant evidence e_i to form claim-evidence pairs, where the relevant evidence consists of the retrieved sentences described in the following paragraph. Each claim-evidence pair (c, e_i) is fed into both a RoBERTa-large⁷ model to get a representation S_i and a RoBERTa-large-MNLI⁷ model to get a probability triplet I_i of stance (contradiction, neutrality, or entailment) between the pair. Next, S_i is mapped to a Key K and a Value V, while I_i is mapped onto a Query Q. (Q, K, V)_i forms the input of the self-attention layer, and the outputs O_i for all the claim-evidence pairs are concatenated together. The output is then passed to an MLP layer to get the veracity assessment result (True or False), as shown in Figure 2.

Supporting Evidence Retrieval This module includes three parts: document retrieval, sentence retrieval and corresponding meta-data generation.
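The NLI-SAN computation described under Veracity Assessment can be written out as a short worked formulation. This is only a sketch: the projection matrices W_Q, W_K, W_V, the dimension d_k, the treatment of S_i as a sequence of token vectors, and the scaled dot-product form of the attention are assumptions made for illustration, since the text above only states which inputs are mapped to the Query, Key and Value.

```latex
% Assumed shapes (not given in the text above):
%   S_i \in \mathbb{R}^{L \times d}  token representations of the pair (c, e_i)
%   I_i \in \mathbb{R}^{1 \times 3}  (contradiction, neutrality, entailment) probabilities
%   W_Q \in \mathbb{R}^{3 \times d_k}, \quad W_K, W_V \in \mathbb{R}^{d \times d_k}
\begin{align*}
Q_i &= I_i W_Q, \qquad K_i = S_i W_K, \qquad V_i = S_i W_V, \\
O_i &= \operatorname{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \\
\hat{y} &= \operatorname{MLP}\big([\,O_1; O_2; \dots; O_n\,]\big) \in \{\text{True}, \text{False}\}.
\end{align*}
```

Under these assumptions the stance probabilities decide which parts of the pair representation are attended to, and the concatenation over the n retrieved evidence sentences is what the MLP sees.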
Figure 2: Fact checking result with input claim: coronavirus is genetically engineered.
filter or re-rank the result using the metadata. An example of documents retrieved is shown in Figure 2 and the corresponding detailed information visualisation is shown in Figure 3. On the details page, the whole document text is shown with the top 3 relevant sentences highlighted by their stance towards the input claim. The stance distribution described in the veracity assessment module is also visualised.

Claim-related tweets retrieval Similar to the fact-checking module, this module includes an autocomplete function for the user's natural language input claim that guesses the input from our claims dataset. The results for existing claims are also pre-computed to retrieve tweets faster. For a claim that is not in our claim dataset, we use BM25 to retrieve the related propagation trees from the large Twitter propagation tree database maintained by the active Twitter crawler.
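The BM25 retrieval step for claims outside the claim dataset can be illustrated with a minimal, self-contained sketch. It is not the system's implementation: the function name `bm25_rank`, the whitespace tokeniser, the parameter values k1 = 1.5 and b = 0.75, and the choice to represent each propagation tree by the text of its source tweet are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenisation.
    return text.lower().split()


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending Okapi BM25 score.

    `documents` is a list of strings; here each string is assumed to be the
    source-tweet text of one propagation tree.
    """
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `bm25_rank(claim, source_tweets)` returns tree indices ordered by lexical relevance to the claim; a production system would more likely rely on an inverted index than on this linear scan.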
Figure 4: Rumour detection result with input claim: vitamin c cures coronavirus.
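The COVID-RV labelling rule described in Section 2 (a conversation is True when its source tweet supports a true claim or refutes a false one, and False otherwise) reduces to a small truth table. The sketch below only restates that rule; the function name `covid_rv_label` and the string encoding of stances and labels are hypothetical.

```python
def covid_rv_label(claim_is_true: bool, stance: str) -> str:
    """Combine claim veracity and source-tweet stance into a conversation label.

    `stance` must be "Support" or "Refute", the two stance labels named in
    the dataset description. Names and encodings here are illustrative.
    """
    if stance not in ("Support", "Refute"):
        raise ValueError(f"unknown stance: {stance!r}")
    supports = stance == "Support"
    # Supporting a true claim or refuting a false claim yields "True";
    # supporting a false claim or refuting a true claim yields "False".
    return "True" if supports == claim_is_true else "False"
```

For example, a source tweet that refutes a false claim is labelled True, because the conversation itself spreads accurate information.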
4 Evaluation Results
Fact-Checking We evaluate the performance of our system in document retrieval and veracity assessment in Arana-Catania et al. (2022). Table 1 shows that, of the selected techniques, combining BM25 and MonoT5 is the most effective approach for document retrieval. In addition, Figure 5 shows that NLI-SAN achieves performance similar to KGAT (Liu et al., 2020), while having a simpler architecture for the application, and outperforms GEAR (Zhou et al., 2019).

Figure 6: Cross-dataset evaluation of models trained and tested on different datasets, such as training on PHEME and testing on Twitter15/Twitter16, and vice versa.
⁹ https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2
5 Conclusion

This paper introduces a web-based system for fact-checking and rumour detection based on novel natural language processing models for COVID-19 misinformation detection. Going forward, we will keep updating the data and explore other methods for misinformation identification to improve the current system and introduce more functions to the system as part of our continuing efforts to support the general public in identifying misinformation.

Acknowledgements

This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1). YH and ML are each supported by a Turing AI Fellowship funded by UK Research and Innovation (grant no. EP/V020579/1, EP/V030302/1).

References

Miguel Arana-Catania, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata, Rob Procter, and Yulan He. 2022. Natural language inference with self-attention for veracity assessment of pandemic claims. In Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 1496–1511.
Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. 2020. Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the 34th AAAI Conference on Artificial Intelligence.
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set. JMIR Public Health Surveill, 6(2).
Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. CoRR abs/2006.00885.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, pages 4171–4186.
John Dougrez-Lewis, Maria Liakata, Elena Kochkina, and Yulan He. 2021. Learning disentangled latent topics for twitter rumour veracity classification. In Findings of the association for computational linguistics: ACL-IJCNLP 2021, pages 3902–3908.
Samantha Finn, Panagiotis Takis Metaxas, and Eni Mustafaraj. 2014. Investigating rumor propagation with twittertrails. CoRR abs/1411.3550.
Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. Detecting covid-19 misinformation on social media. In the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020.
Md Saiful Islam, Tonmoy Sarkar, Sazzad Hossain Khan, Abu-Hena Mostofa Kamal, SM Murshid Hasan, Alamgir Kabir, Dalia Yeasmin, Mohammad Ariful Islam, Kamal Ibne Amin Chowdhury, Kazi Selim Anwar, et al. 2020. Covid-19–related infodemic and its impact on public health: A global social media analysis. The American journal of tropical medicine and hygiene, 103(4):1621.
Elena Kochkina, Tamanna Hossain, Robert L. Logan IV, Miguel Arana-Catania, Rob Procter, Arkaitz Zubiaga, Sameer Singh, Yulan He, and Maria Liakata. 2023. Evaluating the generalisability of neural rumour verification models. Information Processing & Management, 60(1).
Elena Kochkina and Maria Liakata. 2020. Estimating predictive uncertainty for rumour verification models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6964–6981.
Elena Kochkina, Maria Liakata, and Arkaitz Zubiaga. 2018. All-in-one: Multi-task learning for rumour stance classification, detection and verification. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3402–3413.
Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation. CoRR abs/2011.04088.
Zhenghao Liu, Chenyan Xiong, Maosong Sun, and Zhiyuan Liu. 2020. Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7342–7351.
Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor detection on twitter with tree-structured recursive neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 1980–1989.
Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, and Douwe Kiela. 2020. Adversarial nli: A new benchmark for natural language understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4885–4901.
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pretrained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.
Gordon Pennycook, Ziv Epstein, Mohsen Mosleh, Antonio Arechar, Dean Eckles, and David Rand. 2020. Understanding and reducing the spread of misinformation online. NA - Advances in Consumer Research, 48:863–867.
Karishma Sharma, Sungyong Seo, Chuizheng Meng, Sirisha Rambhatla, and Yan Liu. 2020. Covid-19 on social media: Analyzing misinformation in twitter conversations. CoRR abs/2003.12309.
Lin Tian, Xiuzhen Zhang, and Jey Han Lau. 2022. Duck: Rumour detection on social media by modelling user and comment propagation networks. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 4939–4949.
Yariv Tsfati, H. G. Boomgaarden, J. Strömbäck, R. Vliegenthart, A. Damstra, and E. Lindgren. 2020. Causes and consequences of mainstream media dissemination of fake news: literature review and synthesis. Annals of the International Communication Association, 44(2):157–173.
Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum.
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020a. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. CoRR abs/2002.10957.
Xuan Wang, Yingjun Guan, Weili Liu, Aabhas Chauhan, Enyi Jiang, Qi Li, David Liem, Dibakar Sigdel, J. Harry Caufield, Peipei Ping, and Jiawei Han. 2020b. Evidenceminer: Textual evidence discovery for life sciences. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 56–62.
Zilong Zhao, Jichang Zhao, Yukie Sano, Orr Levy, Hideki Takayasu, Misako Takayasu, Daqing Li, Junjie Wu, and Shlomo Havlin. 2020. Fake news propagates differently from real news even at early stages of spreading. EPJ Data Science, 9(7).
Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2019. Gear: Graph-based evidence aggregating and reasoning for fact verification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 892–901.
Zhengyuan Zhu, Kevin Meng, Josue Caraballo, Israa Jaradat, Xiao Shi, Zeyu Zhang, Farahnaz Akrami, Fatma Arslan, Haojin Liao, Damian Jimenez, Mohammed Samiul Saeef, Paras Pathak, and Chengkai Li. 2021. A dashboard for mitigating the covid-19 misinfodemic. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 99–105.