Graph-Based Modeling of Online Communities For Fake News Detection
Figure 1: Visual representation of the proposed SAFER framework. Graph and text encoders are trained independently followed
by training of a logistic regression (LR) classifier. During inference, the text of the article as well as information about its social
network of users are encoded by the trained text and graph encoders respectively. Finally, the social-context and textual features
of the article are concatenated for classification using the trained LR classifier.
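To make the two-stage design in Figure 1 concrete, the following is a minimal sketch of the SAFER inference path. It is not the authors' released implementation: the encoder objects and their `encode` interface are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def safer_features(article_text, user_subgraph, text_encoder, graph_encoder):
    """Concatenate the textual and social-context features of one article."""
    text_feats = text_encoder.encode(article_text)     # trained text encoder (assumed API)
    graph_feats = graph_encoder.encode(user_subgraph)  # trained graph encoder (assumed API)
    return np.concatenate([text_feats, graph_feats])

# Final stage: a logistic regression over the frozen, concatenated features.
# X_train = np.stack([safer_features(t, g, te, ge) for t, g in train_pairs])
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# y_hat = clf.predict(safer_features(new_text, new_subgraph, te, ge)[None, :])
```

The key design point is that the two encoders are trained independently and then frozen, so the LR classifier only learns how to weigh textual against social-context evidence.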
Setting     ρ      Model    SAFER F1
excl. >5%   0.71   R-GCN    77.16
                   R-GAT    77.32
                   Hy-GCN   77.13
                   Hy-GAT   77.01
excl. >1%   0.65   R-GCN    65.89
                   R-GAT    65.32
                   Hy-GCN   71.99
                   Hy-GAT   72.05

Table 2: Results of SAFER variants on varying subsets of user nodes on GossipCop. ρ denotes relative graph density.

[Figure 3: performance (y-axis, ticks 45.0–52.5) plotted over user subsets (x-axis, "Top N user subsets": All, Top60k, Top40k, Top20k, Top8k, Top6k).]
Figure 3: Validation and test set performance of the SAFER (GCN) framework over varying subsets of most active users on HealthStory.
Figure 4: Article sharing behavior of 3 kinds of users (left) and average of real and fake article shares of type (c) users (right).

… for GossipCop.

6.3 Effect of article sharing patterns

As discussed earlier, the results in Table 1 show that there is a difference in the article-sharing behavior of users between the two datasets. To understand user characteristics better, we visualize the article-sharing behavior of users for both datasets in Figure 4. We visualize the composition of three types of users in the datasets: (a) users that share only real articles, (b) users that share only fake articles, and (c) users that share articles from both classes. We see that the majority of users are of type (c) in both datasets (57.18% for GossipCop and 74.15% for HealthStory). However, 38% of the users are of type (b) in GossipCop, compared to just 9.96% in HealthStory. Furthermore, we visualize the average numbers of real and fake articles shared by type (c) users on the right in Figure 4. From these observations, we note that the GNNs are better positioned to learn user representations for detecting fake articles in the case of GossipCop, since: (1) the community graph has sufficient support from type (b) users (38%), which helps the GNNs learn rich community-level features of users that aid in detecting fake articles; (2) even the 57% of type (c) users are much more likely to share articles of a single class (here, real). This again helps the network learn distinct features for these users and assign them to a specific community.

However, in the case of HealthStory, the GNNs struggle to learn equally rich user representations for detecting fake articles, since: (1) the community graph has only around 10% of type (b) users. This limits the GNNs in learning expressive community-level features for users that are more likely to share fake articles, so these features cannot be used for accurate prediction. (2) A vast majority of users (74%) share articles of both classes. To add to that, this bulk of users is considerably less …

7 Conclusion

We presented a graph-based approach to fake news detection which leverages the information-spreading behaviour of social media users. Our results demonstrate that incorporating community-based modeling leads to substantially improved performance on this task as compared to purely text-based models. The proposed relational GNNs for user/community modeling outperformed the traditional GNNs, indicating the importance of explicitly modeling the relations in a heterogeneous graph. Meanwhile, the proposed hyperbolic GNNs performed on par with the other GNNs, and we leave their application to user/community modeling on truly hierarchical social network datasets as future work. In the future, it would be interesting to apply these techniques to other tasks, such as rumour detection and modeling changes in public beliefs.
References

Uri Alon and Eran Yahav. 2020. On the bottleneck of graph neural networks and its practical implications. arXiv preprint arXiv:2006.05205.

Meital Balmas. 2014. When fake news becomes real: Combined exposure to multiple news sources and political attitudes of inefficacy, alienation, and cynicism. Communication Research, 41(3):430–454.

Gary Bécigneul and Octavian-Eugen Ganea. 2018. Riemannian adaptive optimization methods. arXiv preprint arXiv:1810.00760.

Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, pages 675–684.

Ines Chami, Rex Ying, Christopher Ré, and Jure Leskovec. 2019. Hyperbolic graph convolutional neural networks.

Wei Chen, Wenjie Fang, Guangda Hu, and Michael W. Mahoney. 2013. On the hyperbolicity of small-world and treelike random graphs. Internet Mathematics, 9(4):434–491.

Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 257–266.

Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational fact checking from knowledge networks. PLoS ONE, 10(6).

Enyan Dai, Yiwei Sun, and Suhang Wang. 2020. Ginger cannot cure cancer: Battling fake health news with a comprehensive data repository. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 853–862.

Marco Del Tredici, Diego Marcheggiani, Sabine Schulte im Walde, and Raquel Fernández. 2019. You shall know a user by the company it keeps: Dynamic representations for social media users in NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4707–4717. ACL.

Maurice Fréchet. 1948. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l'institut Henri Poincaré, volume 10, pages 215–310.

Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. 2018. Hyperbolic neural networks. In Advances in Neural Information Processing Systems, pages 5345–5355.

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.

Yi Han, Shanika Karunasekera, and Christopher Leckie. 2020. Graph neural networks with continual learning for fake news detection from social media.

George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392.

Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. 2003. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 321–328.

Junaed Younus Khan, Md Tawkat Islam Khondaker, Anindya Iqbal, and Sadia Afroz. 2019. A benchmark study on machine learning methods for fake news detection. arXiv preprint arXiv:1905.04749.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. ACL.

Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

Chang Li and Dan Goldwasser. 2019. Encoding social information with graph convolutional networks for political perspective detection in news media. In Proceedings of the 57th Annual Meeting of the ACL, pages 2594–2604. ACL.

Qi Liu, Maximilian Nickel, and Douwe Kiela. 2019a. Hyperbolic graph neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8230–8241. Curran Associates, Inc.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019b. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization.

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI-16, pages 3818–3824. AAAI Press.

Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor detection on Twitter with tree-structured recursive neural networks. In Proceedings of the 56th Annual Meeting of the ACL (Volume 1: Long Papers), pages 1980–1989. ACL.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, and Ekaterina Shutova. 2019. Abusive language detection with graph convolutional networks. In Proceedings of the 2019 Conference of the North American Chapter of the ACL: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2145–2150. ACL.

Pushkar Mishra, Aleksandra Piktus, Gerard Goossen, and Fabrizio Silvestri. 2020. Node masking: Making graph neural networks generalize and scale better. arXiv preprint arXiv:2001.07524.

Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, and Min-Yen Kan. 2020. FANG: Leveraging social context for fake news detection using graph representation.

Ben Norton and Glenn Greenwald. 2016. Washington Post disgracefully promotes a McCarthyite Blacklist from a hidden, new and very shady group.

Hoang NT and Takanori Maehara. 2019. Revisiting graph neural networks: All we have is low-pass filters. arXiv preprint arXiv:1905.09550.

Emerson Yoshiaki Okano, Zebin Liu, Donghong Ji, and Evandro Eduardo Seron Ruiz. 2020. Fake news detection on Fake.br using hierarchical attention networks. In Computational Processing of the Portuguese Language, pages 143–152. Springer International Publishing.

Kenta Oono and Taiji Suzuki. 2019. Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947.

Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2017. Automatic detection of fake news. arXiv preprint arXiv:1708.07104.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237. ACL.

Kashyap Popat. 2017. Assessing the credibility of claims on the web. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 735–739.

Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638.

Erzsébet Ravasz and Albert-László Barabási. 2003. Hierarchical organization in complex networks. Physical Review E, 67(2):026112.

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.

Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems, 104:123–133.

Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019a. dEFEND: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, pages 395–405. Association for Computing Machinery.

Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media. arXiv preprint arXiv:1809.01286.

Kai Shu, Suhang Wang, and Huan Liu. 2019b. Beyond news contents: The role of social context for fake news detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 312–320.

Kai Shu, Guoqing Zheng, Yichuan Li, Subhabrata Mukherjee, Ahmed Hassan Awadallah, Scott Ruston, and Huan Liu. 2020. Leveraging multi-source weak social supervision for early detection of fake news. arXiv preprint arXiv:2004.01732.

Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani, and Huan Liu. 2019c. The role of user profiles for fake news detection. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 436–439.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 849–857. ACM.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Ke Wu, Song Yang, and Kenny Q. Zhu. 2015. False rumors detection on Sina Weibo by propagation structures. In 2015 IEEE 31st International Conference on Data Engineering, pages 651–662. IEEE.

John Zarocostas. 2020. How to fight an infodemic. The Lancet, 395(10255):676.

Xinyi Zhou, Jindi Wu, and Reza Zafarani. 2020. SAFE: Similarity-aware multi-modal fake news detection. arXiv preprint arXiv:2003.04981.

Xinyi Zhou and Reza Zafarani. 2018. Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315.
A Appendix

A.1 Text preprocessing

We clean the raw text of the crawled articles of the GossipCop dataset before using them for training. More specifically, we replace any URLs and hashtags in the text with the tokens [url] and [hashtag] respectively. We also replace newline characters with a blank space and make sure that class distributions across the train-val-test splits are the same. A minimal sketch of this cleaning step is given below.
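The sketch assumes simple regular expressions for URLs and hashtags; the exact patterns used in the authors' code are not specified here.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # assumed URL pattern
HASHTAG_RE = re.compile(r"#\w+")               # assumed hashtag pattern

def clean_article(text: str) -> str:
    """Replace URLs/hashtags with placeholder tokens and drop newlines."""
    text = URL_RE.sub("[url]", text)
    text = HASHTAG_RE.sub("[hashtag]", text)
    return text.replace("\n", " ")

print(clean_article("Shocking!\nRead more at https://example.com #fakenews"))
# -> "Shocking! Read more at [url] [hashtag]"
```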
A.2 Hyper-parameters

All our code is in PyTorch and we use the HuggingFace library (Wolf et al., 2019) to train the transformer models. We grid-search over the following values of the parameters for the respective models and choose the best setting based on the best F1 score on the test set (a sketch of such a sweep follows the list):

1. CNN: learning rate = [5e-3, 1e-3, 5e-4, 1e-4], dropout = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], weight decay = [1e-3, 2e-3]

2. Transformers: learning rate = [5e-3, 1e-3, 5e-4, 1e-4], weight decay = [1e-3, 1e-2, 1e-1, 5e-1], hidden dropout = [0.1, 0.2, 0.3, 0.4, 0.5], attention dropout = [0.1, 0.2, 0.3, 0.4, 0.5]

3. GNNs: learning rate = [5e-3, 1e-3, 5e-4, 1e-4], weight decay = [1e-3, 2e-3], hidden dropout = [0.1, 0.2, 0.3, 0.4, 0.5], node mask = [0.1, 0.2, 0.3, 0.4, 0.5], hidden dimension = [128, 256, 512]

The set of best hyper-parameters for all models is reported in Table 3.
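As an illustration of the sweep over the GNN grid above, the following sketch exhaustively tries every combination; `train_and_eval` is a hypothetical stand-in for the actual training loop, assumed to return the test-set F1 for a given setting.

```python
from itertools import product

GNN_GRID = {
    "learning_rate": [5e-3, 1e-3, 5e-4, 1e-4],
    "weight_decay": [1e-3, 2e-3],
    "hidden_dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "node_mask": [0.1, 0.2, 0.3, 0.4, 0.5],
    "hidden_dim": [128, 256, 512],
}

def grid_search(train_and_eval, grid=GNN_GRID):
    """Try every setting in the grid and keep the one with the best F1."""
    best_f1, best_params = float("-inf"), None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        f1 = train_and_eval(params)  # hypothetical: trains a model, returns F1
        if f1 > best_f1:
            best_f1, best_params = f1, params
    return best_params, best_f1
```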
A.3 Hardware and Run Times

We use NVIDIA TitanRTX 2080Ti GPUs for training the multi-GPU models and a 1080Ti for the single-GPU ones. In Table 4 we report the run times (per epoch) for each model.
A.4 Evaluation Metric

We use the F1 score (of the target class, i.e., the fake class) to report all our performance. F1 is defined as:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

where Precision and Recall are defined as:

Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)

A minimal code equivalent follows.
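The same quantity can be computed from raw counts, or directly from label vectors with sklearn's `f1_score` (setting `pos_label` to the fake class):

```python
def f1_fake(tp: int, fp: int, fn: int) -> float:
    """F1 of the fake class from true-positive/false-positive/false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Or directly from label vectors:
# from sklearn.metrics import f1_score
# f1_score(y_true, y_pred, pos_label=1)  # assuming 1 encodes "fake"
```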
A.5 Training Details

1. To leverage effective batching of graph data during training, we cluster the graph into 300 dense sub-graphs using the METIS (Karypis and Kumar, 1998) graph clustering algorithm. We then train all the GNN networks with a batch size of 16, i.e., 16 of these sub-graphs are sampled at each pass, as detailed in Chiang et al. (2019). This vastly reduces the time, memory and computation complexity of training on large sparse graphs.

2. Additionally, for GCN we adopt "diagonal enhancement" by adding the identity to the original adjacency matrix A (Chiang et al., 2019) and perform the normalization as Ã = (D + I)^{-1}(A + I).

3. For SAGE we use "mean" aggregation and normalize the output features as x′_i / ∥x′_i∥_2, where x′_i = W_1 x_i + W_2 · mean_{j∈N(i)} x_j.

4. For GAT, we use 3 attention heads with an attention dropout of 0.1 to stabilize training. We concatenate their linear combinations instead of aggregating them, so that the output of each layer is 3 × hidden_dim.

A condensed sketch of these four settings is given after this list.
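The sketch below illustrates the four settings with PyTorch Geometric (≥2.0) on a toy graph. The library choice, the toy dimensions, and the reduced number of clusters are our assumptions; the paper only states that the code is in PyTorch.

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.nn import SAGEConv, GATConv
from torch_geometric.utils import to_undirected

# Toy stand-in for the full community graph.
edge_index = to_undirected(torch.randint(0, 1000, (2, 5000)))
data = Data(x=torch.randn(1000, 256), edge_index=edge_index)

# (1) METIS partitioning into dense clusters, sampled 16 at a time per batch,
# following Cluster-GCN (Chiang et al., 2019). The paper uses num_parts=300
# on the full graph; 50 here only because the toy graph is small.
cluster_data = ClusterData(data, num_parts=50)
loader = ClusterLoader(cluster_data, batch_size=16, shuffle=True)

# (2) "Diagonal enhancement" for GCN, written densely for clarity:
# Ã = (D + I)^{-1}(A + I).
def diag_enhanced_norm(A: torch.Tensor) -> torch.Tensor:
    deg = A.sum(dim=1)                       # diagonal of D
    inv = torch.diag(1.0 / (deg + 1.0))      # (D + I)^{-1}
    return inv @ (A + torch.eye(A.size(0)))  # (D + I)^{-1}(A + I)

# (3) SAGE with mean aggregation; normalize=True applies the L2
# normalization x'_i / ||x'_i||_2 described above.
sage = SAGEConv(in_channels=256, out_channels=256, aggr="mean", normalize=True)

# (4) GAT with 3 concatenated heads, so each layer outputs 3 × hidden_dim
# features; dropout=0.1 is the attention dropout.
gat = GATConv(in_channels=256, out_channels=256, heads=3, concat=True, dropout=0.1)
```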
A.6 Results with CNN text encoder

The results of the proposed SAFER framework with CNN used as the text encoder are reported in Table 5. We can note similar trends in the performance, although the scores are slightly lower as compared to GossipCop.

A.7 Effect of graph sparsity and frequent users

In Table 6 we report the performance of all the GNN variants of the proposed SAFER framework for different subsets of highly active users.

A.8 Community Graph

A portion of the community graph is visualized in Figure 5.
GossipCop
                     GCN     GAT     SAGE    R-GCN   R-GAT   Hy-GCN  Hy-GAT  CNN     RoBERTa
Learning rate        5e-4    5e-4    1e-4    1e-3    1e-4    5e-3    5e-3    5e-4    5e-4
Weight decay         1e-3    1e-3    2e-3    2e-3    1e-3    1e-3    1e-3    1e-3    5e-1
Attention dropout    NA      0.1     NA      NA      0.1     NA      NA      NA      0.4
Hidden dropout       0.1     0.4     0.2     0.4     0.2     0.1     0.1     0.5     0.1
Node masking prob.   0.1     0.1     0.1     0.1     0.1     0.1     0.1     NA      NA
Hidden dimension     256     512     128     512     512     256     512     384     1024

HealthStory
                     GCN     GAT     SAGE    R-GCN   R-GAT   Hy-GCN  Hy-GAT  CNN     RoBERTa
Learning rate        5e-3    5e-4    1e-4    5e-4    1e-3    5e-3    5e-3    5e-4    5e-4
Weight decay         1e-3    2e-3    2e-3    2e-3    1e-3    1e-3    1e-3    1e-3    5e-1
Attention dropout    NA      NA      NA      NA      NA      NA      NA      NA      0.4
Hidden dropout       0.4     0.2     0.2     0.2     0.2     0.1     0.1     0.5     0.1
Node masking prob.   0.1     0.2     0.1     0.2     0.1     0.3     0.3     NA      NA
Hidden dimension     512     512     128     512     512     512     256     384     1024

Table 3: Best hyper-parameters for all the models (GCN through Hy-GAT are graph encoders; CNN and RoBERTa are text encoders) on GossipCop (top) and HealthStory (bottom).
           GossipCop                            HealthStory
Method     No. of GPUs  Run time (per epoch)    No. of GPUs  Run time (per epoch)
CNN        4            15 mins                 4            1.25 mins
RoBERTa    4            6 mins                  4            3 mins
SAGE       1            8.77 secs               1            1.49 secs
GCN        1            6.06 secs               1            1.91 secs
GAT        1            6.76 secs               1            1.96 secs
R-GCN      1            6.92 secs               1            1.40 secs
R-GAT      1            7.88 secs               1            2.16 secs
Hy-GCN     1            10.39 secs              1            1.64 secs
Hy-GAT     1            16.50 secs              1            2.97 secs

Table 4: Number of GPUs and run time (per epoch) for each model on GossipCop and HealthStory.
Figure 5: Visualization of a small portion of the fake news community graph. Green nodes represent the articles of the dataset
while red nodes represent users that shared them.
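For orientation, the structure shown in Figure 5 is a bipartite article-user graph: an edge connects an article to each user who shared it. A toy slice might be built as follows (node names are hypothetical, purely for illustration):

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["article_1", "article_2"], kind="article")   # green nodes
G.add_nodes_from(["user_a", "user_b", "user_c"], kind="user")  # red nodes
G.add_edges_from([
    ("article_1", "user_a"), ("article_1", "user_b"),  # users who shared article_1
    ("article_2", "user_b"), ("article_2", "user_c"),
])
# user_b bridges the two articles: shared-audience structure of this kind
# is the community signal the graph encoders aggregate over.
```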
A.9 t-SNE visualizations

A.10 Qualitative Analysis

We assess the performance of the SAFER (GCN) variant on GossipCop in Figure 7a. We see that the first article is a fake article which RoBERTa incorrectly classifies as real. However, looking at the content-sharing behavior of the users that shared this article, we see that on average these users shared 5.8 fake articles and just 0.45 real ones (13 times more likely to share fake content than real), strongly indicating that the community of users involved in sharing this article is responsible for propagating fake news. Taking this strong community-based information into consideration, SAFER is able to correctly classify this article as fake.
Model         GossipCop   HealthStory
Text
  HAN†        67.20       –
  dEFEND†     75.00       –
  SAFE‡       89.50       –

Table 5: Results of the SAFER framework and baselines using CNN as the text encoder. (†) denotes results reported from Shu et al. (2019a) and (‡) from Zhou et al. (2020). Bold …

[Figure 6: t-SNE visualizations, panels (a) and (b).]
[Figure 7(b), example article from HealthStory:]

"The plant extract resveratrol, found in the skin of red grapes, appears to suppress inflammation and may fight aging in humans, according to a new study.... apparently because resveratrol affects a gene associated with longevity....they have found that resveratrol reduces inflammation in humans that could lead to heart disease, stroke, and type 2 diabetes.... 20 people and put them at random into two groups, one receiving a placebo and the other a supplement containing 40 milligrams of resveratrol...fasting blood samples were taken at the start of the trial and then at intervals of one, three, and six weeks...people taking resveratrol also showed suppression of the.. TNF...blood samples from those on placebo showed no significant change in pro-inflammatory markers."
Figure 7: Demonstrating the importance of community-based features of the proposed method on (a) GossipCop and (b) HealthStory. Text in red denotes a fake article, while text in green denotes a real one. The black central node denotes the target article node that we are trying to classify, blue nodes denote the users that shared this article, and red and green nodes denote the other fake and real articles these users have interacted with, respectively. Predictions by the different models are stated on the right.