Knowledge and The Web 2017/18 (2) Overview of Fake-News Detection

Knowledge and the Web 2017/18
(2) Overview of fake-news detection

Bettina Berendt
Last updated: 2017-10-06

Plan Lecture (Wed 16+) Exercise session (Thu 10.30+) Your individual & team work
Overview, warm-up, Entry test Do the entry test
Fake news detection methods: Establish team + research
overview question for project (version 1)
Semantic Web + Linked Data SPARQL Work on your project around
the task of
Fake news detection methods: Methods continued
research methods overview and: DM / ML test
Journalistic fact-checking (IT) Data quality task
Trust and reputation: Project consultancy
computational approaches (IT)
Project consultancy
SW/LD in industry (IT) Project consultancy
LD Fragments (IT) Peer-reviewing of projects
Ethics of web data mining: Peer-reviewing of projects
issues and approaches
(incl. privacy, discrimination) Peer-reviewing of projects
Project consultancy
Project presentations
Key:
• IT = invited talk
• may be skipped by 4-ECTS students
Structure for today’s lecture:
The process of knowledge discovery
CRISP-DM
• CRoss Industry Standard Process for Data Mining
• a data mining process model that describes commonly used approaches that expert data miners use to tackle problems.
http://www.crisp-dm.org/Images/187343_CRISPart.jpg
• 1. “Business understanding”
• What IS fake news?
Fake news – a definition
• Fake news […] is where individuals or organisations intentionally
publish hoaxes, propaganda and other misinformation and
present it as factual.
• This can include blog and social media posts and fake online
media releases.
• It does not include news satire sites such as The Onion or The
Shovel as they are not presenting their content as legitimate
factual news. Their intention is satire rather than misinformation.
• It also does not include articles that are written from the
perspective of a particular opinion or editorial standpoint,
provided the information included is factually correct.
Slide from (Melbourne Atheneum Library, n.d.)

• Actually, “fake news” was often used to
denote precisely these news outlets before
Trump made the term’s current meaning
popular …
Weekend Update
• Beginning in 1975 with Chevy Chase, Weekend Update quickly became a favorite skit among all of Saturday Night Live’s infamous sketches.
“Good night, and have a pleasant tomorrow.”
-Chevy Chase
Slide from (Bennett et al., 2016)

Weekend Update
• Focusing on satirical commentary of actual events, Weekend Update also features complete fabrication.
• See Tina Fey in Weekend Update

The Daily Show
• First aired on July 22nd, 1996, hosted by Craig Kilborn, who was later replaced by current host and “anchorman” Jon Stewart
• The Daily Show exists as a news parody program, and has gained a reputation as one of the sharpest political commentaries on television

The Daily Show
• Winner of 9 Emmys, 9 other wins, and 23 nominations
• Focuses on humorous retellings of actual current events
• Though intentionally unreliable as a news source, many young Americans admit to gaining the majority of their current events news from The Daily Show.
• Click to see video clips of The Daily Show

The Colbert Report
• Starring Stephen Colbert, existing as a political satire
• A spin-off of The Daily Show, The Colbert Report parodies personality-driven political pundit programs like The O’Reilly Factor
• Notorious for its “truthiness” and faux-conservative tinge

The Onion
• Weekly published parody newspaper
• Satirizes current events which are both real and made up.
• Formed in 1988 in Madison, WI. Gained popularity with its website in 1996. Started to break into the mainstream in 2000.
• Has an A.V. section which covers the arts and entertainment truthfully, but humorously.

My FN, your FN?
http://insider.foxnews.com/2017/02/24/wash-post-stands-behind-9-source-story-after-trump-calls-it-fake-news
Facts?
Opinions?
FN?
https://www.nytimes.com/2017/10/04/opinion/vegas-gun-control-debate.html
Intention?
• The intentionality of deception is also a
requirement in Rubin et al.’s (2015) definition
• Whose intentionality?
– Creator of the news?
• E.g. Governments (→ WMD example)
• The press
– Purveyor of the news?
• The press
• Social media → you!
• How to capture the intention as a DM/ML feature?
https://fas.org/irp/cia/product/image016.jpg
https://en.wikipedia.org/wiki/Iraq_and_weapons_of_mass_destruction
Weapons of Mass Destruction and the 2003 Iraq war
• If A, knowing that an item X is satire, retweets
X “without the metadata that it is satire”,
• And B reads and believes it
• Then did A create and/or spread FN?
"On Bullshit" (2005), by philosopher
Harry G. Frankfurt, is an essay that presents
a theory of bullshit that defines the concept
and analyzes the applications of bullshit in
the contexts of communication. Frankfurt
determines that bullshit is speech intended
to persuade (a.k.a. rhetoric), without regard
for truth.
• The liar cares about the truth and attempts
to hide it;
• the bullshitter doesn't care if what they
say is true or false, but rather only cares
whether or not their listener is persuaded.
CS approach: Define a “ground truth”
http://www.fakenewschallenge.org
• 2. + 3. Data understanding and preparation
http://www.fakenewschallenge.org/
• More options and steps possible / necessary
• Use off-the-shelf tools for NLP processing and
feature extraction
• Some pointers will be published on Toledo
• 4. Modelling
• 4.1. Approach: what do humans do to debunk
FN?
• 4. Modelling
• 4.2. Approach: How to formalise FN detection?
• → What is the task?
→ (Human) strategies
Human strategies translate to various
machine tasks
Strategy “Read past the headline”
• The goal of the Fake News Challenge is to explore how artificial intelligence technologies,
particularly machine learning and natural language processing, might be leveraged to
combat the fake news problem. We believe that these AI technologies hold promise for
significantly automating parts of the procedure human fact checkers use today to
determine if a story is real or a hoax.
• Assessing the veracity of a news story is a complex and cumbersome task, even for trained
experts. Fortunately, the process can be broken down into steps or stages. A helpful first
step towards identifying fake news is to understand what other news organizations are
saying about the topic. We believe automating this process, called Stance Detection,
could serve as a useful building block in an AI-assisted fact-checking pipeline. So stage #1
of the Fake News Challenge (FNC-1) focuses on the task of Stance Detection.
• Stance Detection involves estimating the relative perspective (or stance) of two pieces of
text relative to a topic, claim or issue. The version of Stance Detection we have selected
for FNC-1 extends the work of Ferreira & Vlachos. For FNC-1 we have chosen the task of
estimating the stance of a body text from a news article relative to a headline. Specifically,
the body text may agree, disagree, discuss or be unrelated to the headline.
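One way to picture the task in code is as a labelled pair of texts. The following is a minimal sketch, not part of the challenge's own tooling: the names `STANCES`, `StanceExample` and `is_related` are hypothetical, and the headline and body are invented for illustration.

```python
from dataclasses import dataclass

# The four stance labels named in the task description above.
STANCES = ("agree", "disagree", "discuss", "unrelated")


@dataclass(frozen=True)
class StanceExample:
    """One instance: a headline, an article body, and the body's stance."""
    headline: str
    body: str
    stance: str  # expected to be one of STANCES

    def __post_init__(self):
        # Reject labels outside the four-way scheme.
        if self.stance not in STANCES:
            raise ValueError(f"unknown stance: {self.stance!r}")


def is_related(example: StanceExample) -> bool:
    """Collapse the four labels into a coarser related/unrelated split."""
    return example.stance != "unrelated"


# Invented example, for illustration only.
example = StanceExample(
    headline="City council approves new bike lanes",
    body="The council voted on Tuesday to fund ten kilometres of bike lanes.",
    stance="agree",
)
print(is_related(example))
```

Splitting "related" from "unrelated" first is one possible design choice: the coarse decision can often be made from surface similarity, while telling agree, disagree and discuss apart needs more than that.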
Task “stance detection”
(à la Fake News Challenge)
Stance detection: example
Strategies “Follow links and check sources”
and “Check other news outlets”
Task: Claim validation
Slide from (Hanselowski & Gurevych, 2017)

Task “veracity assessment”
(via article classification or regression)
“the prediction of the chances of a particular news article (news report, editorial, expose, etc.) being intentionally deceptive”
(Rubin, Conroy, & Chen, 2015)
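The "classification or regression" distinction can be made concrete: a regression-style model outputs a score, for instance an estimated chance that an article is deceptive, and thresholding that score yields a class label. This is a minimal sketch under stated assumptions: the function name `to_label`, the label strings and the default threshold of 0.5 are illustrative choices, and the scores are made-up inputs, not results.

```python
def to_label(deception_score: float, threshold: float = 0.5) -> str:
    """Turn a regression-style score in [0, 1] into a class label."""
    if not 0.0 <= deception_score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "deceptive" if deception_score >= threshold else "truthful"


# Hypothetical model outputs, for illustration only.
scores = [0.1, 0.5, 0.9]
print([to_label(s) for s in scores])
```

Raising the threshold trades missed deceptive articles for fewer false alarms, so the appropriate value depends on the application.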

• 4. Modelling
• 4.2. Approach: How to formalise FN detection?
• → How to do this?
Example Fake News Challenge 1
Cisco’s SOLAT in the SWEN (1)
http://blog.talosintelligence.com/2017/06/talos-fake-news-challenge.html
SOLAT in the SWEN (2)
SOLAT in the SWEN (3)
Example Claim Validation
• (work in progress)
Classification by Conroy et al. (2015)
• Linguistic approaches → more details: 18 October
– Mainly word-based
– Syntax-based approaches
– Semantic analysis
• Compare “profile” of document with others known to be genuine
– Rhetorical structure and discourse analysis
• Systematic differences between deceptive and truthful messages in terms of their coherence and
structure
– Classifiers
• Classification of sentiment: assumption that deceivers use unintended emotional communication
– In sum, linguistic approaches most suited to domain-specific studies (e.g. product reviews,
business), may have limited generalizability
• Network approaches
– Linked data → next week
– Social network behaviour → invited lecture on 8 November
• Hybrids → 18 October
• 4. Modelling
• 4.3. Beyond data mining / machine learning
What about the other strategies?
→ citation analysis + reputation?!
I would add: Don’t assume something is true just because it is entertaining.
Maybe satire news points to some other cause?
• News consumption as entertainment
• Including satire-news?
http://www.fipp.com/news/insightnews/chart-millennials-pay-for-entertainment-not-news
https://www.pinterest.co.uk/meyerlinger/head-down-generation-smartphone-zombies
Awareness tools / nudges?!
Example: The FB audience nudge (better example: the
timer nudge, but I didn’t find a picture of it)
HCI over and above DM/ML!
(Wang et al., 2013)

Strategy “prevention”?!
https://image.slidesharecdn.com/megenerationsreport2017final-170911182329/95/adi-media-entertainment-generations-report-2017-10-638.jpg
www.businessinsider.com/us-millennials-pay-for-entertainment-not-news-2015-11
Are you a subscriber to a “real” newspaper
(paper or electronic)?
Do you know what it costs?
Just one example
https://image.slidesharecdn.com/zuoraguardiankeynote0320-v3-130325161057-phpapp01/95/paywall-20-the-reinvention-of-media-15-638.jpg
• 5. Evaluation
Important:
observations about the data and the solution
(from the FNC-1’s SOLAT in the SWEN team)
After exploring the dataset, a few features that are likely to be informative of
headline/body relationships became obvious -- for example:
• The number of overlapping words between the headline and body text;
• Similarities measured between the word count, 2-grams and 3-grams; and
• Similarities measured after transforming these counts with term frequency-inverse
document frequency (TF-IDF) weighting and Singular Value Decomposition (SVD).
Using these features, it is not necessary to use a powerful and expressive model to
learn the complex mapping from these features to the stance label.
For this, Gradient-Boosted Decision Trees were chosen because of the model’s
robustness with regard to the different scales of our feature vectors. Specifically, no
normalization is needed and it can be regularized in several different ways to avoid
overfitting. Furthermore, XGBoost is a very efficient, open-source implementation that
was easily applied to the handcrafted features.
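The kinds of features listed above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the team's implementation: all function names are hypothetical, the tokeniser and the smoothed IDF formula are simple choices made here, the texts are invented, and the SVD step and the gradient-boosted model itself are left out.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(toks, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def overlap_count(headline, body):
    """Number of distinct words shared by headline and body."""
    return len(set(tokens(headline)) & set(tokens(body)))


def count_cosine(headline, body, n=1):
    """Similarity between raw n-gram count vectors (n=1, 2, 3, ...)."""
    return cosine(Counter(ngrams(tokens(headline), n)),
                  Counter(ngrams(tokens(body), n)))


def tfidf_cosine(headline, body, corpus):
    """Similarity after TF-IDF weighting; IDF is estimated from `corpus`."""
    docs = [set(tokens(d)) for d in corpus]

    def idf(term):
        df = sum(term in d for d in docs)
        return math.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF

    def weigh(text):
        return {t: c * idf(t) for t, c in Counter(tokens(text)).items()}

    return cosine(weigh(headline), weigh(body))


# Invented texts, for illustration only.
headline = "Council approves bike lanes"
related_body = "The council approves new bike lanes downtown"
unrelated_body = "Rain expected this weekend"
corpus = [related_body, unrelated_body]

for body in (related_body, unrelated_body):
    print(overlap_count(headline, body),
          round(count_cosine(headline, body, n=2), 3),
          round(tfidf_cosine(headline, body, corpus), 3))
```

Each headline/body pair becomes a short numeric feature vector, which a classifier could then map to a stance label.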
Human performance as an upper bound?
• Bond & DePaulo (2006)
• Meta-analysis of >200 experiments
• How good are humans at detecting lies in text?
• 4% better than chance
Laypeople and experts?!
Another corpus: LIAR
We collected a decade-long, 12.8K manually
labeled short statements in various contexts
from POLITIFACT.COM, which provides
detailed analysis report and links to source
documents for each case.
William Yang Wang, "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News
Detection, to appear in Proceedings of the 55th Annual Meeting of the Association for
Computational Linguistics (ACL 2017), short paper, Vancouver, BC, Canada,
July 30-August 4, ACL.
And more
• https://www.kaggle.com/arminehn/rumor-citation
• A Snopes dataset from MPI
– http://resources.mpi-inf.mpg.de/impact/web_credibility_analysis/README
– http://resources.mpi-inf.mpg.de/impact/web_credibility_analysis/
– http://resources.mpi-inf.mpg.de/impact/credibilityanalysis/
• The Fake News Challenge dataset (see their site)
• Note: I have not inspected any of these for quality yet!
• 6. Deployment
Thank you!
References
Bennett, R., Gustafson, G., & Paul, S. (2016). Fake News. http://teachingmedialiteracy.pbworks.com/f/Fake+News.ppt
Bond, C.F. Jr. & DePaulo, B.M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10 (3), 214-234. http://journals.sagepub.com/doi/abs/10.1207/s15327957pspr1003_2
Conroy, N.J., Rubin, V.L., & Chen, Y. (2015). Automatic deception detection: Methods for finding fake news. ASIST 2015. https://www.asist.org/files/meetings/am15/proceedings/submissions/posters/193poster.pdf
Frankfurt, H.G. (2005). On Bullshit. Princeton University Press. https://www.stoa.org.uk/topics/bullshit/pdf/on-bullshit.pdf
Hanselowski, A. & Gurevych, I. (2017). NLP approaches to fact checking and fake news detection. Presentation at Dagstuhl Seminar “User-Generated Content in Social Media”. July 2017. http://materials.dagstuhl.de/files/17/17301/17301.IrynaGurevych.Slides.pdf
Melbourne Atheneum Library (n.d.). Credible Sources. http://www.melbourneathenaeum.org.au/images/Website/esmart/credible_sources.pps
Rubin, V., Conroy, N., & Chen, Y. (2015). Towards news verification: Deception detection methods for news discourse. Hawaii International Conference on Systems Sciences. http://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=1046&context=fimspres
Wang, Y., Leon, P.G., Chen, X., Komanduri, S., & Norcie, G. (2013). From Facebook regrets to Facebook privacy nudges. Ohio State Law Journal, 74, 1307-1335. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1335&context=heinzworks
