Fake News Detection
(https://youtu.be/voFIKCTHCRA)
Anushya Subbiah Divya Sudhakar Kenny Hsu
Introduction
We'd like to reliably and scalably detect fake news (articles masquerading as news that contain intentional misinformation) using linguistic features of the news content as well as user engagement features from Twitter.
We experimented with a variety of deep neural networks, such as MLPs, RNNs, and transfer learning using BERT.
These models achieved higher accuracy than a human. They achieve good accuracy with the news features alone; user engagement features improve accuracy, but only marginally.

Model: BERT + DNN

Discussion
Human baseline: manually labeling approximately 360 examples in the test set achieved an accuracy of 79.1% and a recall of 36%.
An RNN using only the title features was able to fit the training set fairly well. In general, the news features alone are sufficient for good accuracy; the accuracy gain from user engagement features was marginal.
User engagement data was valuable in accurately classifying some examples: without it, the model struggled to correctly classify articles from Wikipedia. The model still struggles with celebrity gossip, which even humans find hard to classify without additional research.
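The poster names its architecture only as BERT + DNN and does not give layer sizes or say how news and engagement features are combined. The following is a minimal sketch of one plausible reading, written in pure Python so it runs without a deep learning framework. It assumes a precomputed 768-dimensional BERT [CLS] embedding (the hidden size of BERT-base), concatenated with a few numeric engagement features and passed through one hidden ReLU layer and a sigmoid output. The class name `BertDnnHead`, the layer sizes, and the concatenation step are illustrative assumptions. The weights are random, not trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: computes weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small Gaussian weights and zero biases (untrained, illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class BertDnnHead:
    """Hypothetical DNN head: [BERT text embedding ; engagement features] -> score in (0, 1)."""

    def __init__(self, text_dim: int, engagement_dim: int, hidden_dim: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(hidden_dim, text_dim + engagement_dim, rng)
        self.w2, self.b2 = init_layer(1, hidden_dim, rng)

    def predict_proba(self, text_embedding: Vector, engagement: Vector) -> float:
        x = list(text_embedding) + list(engagement)  # fuse the two inputs by concatenation
        h = relu(dense(x, self.w1, self.b1))         # hidden layer
        logit = dense(h, self.w2, self.b2)[0]        # single output unit
        return sigmoid(logit)                        # read as P(fake) once the head is trained


if __name__ == "__main__":
    head = BertDnnHead(text_dim=768, engagement_dim=3)
    cls_embedding = [0.01] * 768  # stand-in for a real BERT [CLS] vector
    # Hypothetical engagement features: log followers, log friends, tweet count.
    engagement = [math.log1p(1500), math.log1p(200), 42.0]
    print(head.predict_proba(cls_embedding, engagement))
```

In a real system, the embedding would come from a BERT model, and the head would be trained with a binary cross-entropy loss in a framework such as PyTorch. The sketch shows only the forward pass.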
Data
FakeNewsNet: labeled news datasets from PolitiFact and GossipCop, joined with data on user engagement with the news articles from Twitter.
● 22,233 articles total.
● Approximately 75% real news, 25% fake.
● 1.8 million tweets along with user data.

Results
● Accuracy: 84%
● F1 score: 0.89
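Roughly 75% of the articles are real, so a classifier that labels every article real already reaches about 75% accuracy. Its F1 also depends heavily on which class counts as positive, and the poster does not say which class its F1 and recall figures refer to. Below is a minimal stdlib sketch that computes these metrics alongside the majority-class baseline. The function name `binary_metrics` and the toy labels are illustrative, not taken from the project.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the chosen positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test labels with a 75/25 real/fake split (1 = fake, 0 = real).
y_true = [0] * 75 + [1] * 25
always_real = [0] * 100  # majority-class baseline: predict "real" for every article

print(binary_metrics(y_true, always_real))
# {'accuracy': 0.75, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(round(binary_metrics(y_true, always_real, positive=0)["f1"], 3))
# 0.857
```

With fake as the positive class, the always-real baseline scores 0.75 accuracy but zero recall and F1. With real as the positive class, the same trivial predictor gets an F1 of about 0.857. Stating which class is positive, and reporting the majority baseline, makes the 84% accuracy and 0.89 F1 easier to interpret.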
Features
● Title and text of news content
● Source (domain name) of news article
● Text of the tweet
● Screen name, ID, and location of the user who tweeted
● Social data on the user (number of followers/friends, number of tweets, etc.)

Future Work
● Incorporating images from news articles as features.
● Incorporating retweets and the spread/virality of the news on Twitter as features.
● Other approaches to aggregating the tweets associated with a news article.
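Each article is linked to many tweets (1.8 million in total). A per-article classifier, however, needs one fixed-length vector per article, so the tweet-level user features in the Features list must be aggregated somehow. The poster does not state how it aggregated them. The minimal sketch below shows one simple option: count tweets and unique users, then average log-scaled user counts per article. The `Tweet` fields and the `aggregate_engagement` name are hypothetical and do not reflect FakeNewsNet's actual schema.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Tweet:
    # Hypothetical field names; the real dataset schema may differ.
    article_id: str
    user_id: str
    followers: int
    friends: int
    statuses: int  # number of tweets the user has posted


def aggregate_engagement(tweets: Iterable[Tweet]) -> Dict[str, List[float]]:
    """Map each article id to a fixed-length engagement vector:
    [tweet count, unique users, mean log followers, mean log friends, mean log statuses]."""
    by_article: Dict[str, List[Tweet]] = defaultdict(list)
    for t in tweets:
        by_article[t.article_id].append(t)

    features: Dict[str, List[float]] = {}
    for article_id, group in by_article.items():
        n = len(group)
        features[article_id] = [
            float(n),
            float(len({t.user_id for t in group})),
            # log1p compresses heavy-tailed counts and maps 0 to 0.
            sum(math.log1p(t.followers) for t in group) / n,
            sum(math.log1p(t.friends) for t in group) / n,
            sum(math.log1p(t.statuses) for t in group) / n,
        ]
    return features
```

Articles with no tweets do not appear in the result, so a caller would need to supply a default (for example, an all-zero vector) for them. The log scaling is a design choice for illustration, not something the poster specifies.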