
Received 4 March 2023, accepted 20 March 2023, date of publication 22 March 2023, date of current version 28 March 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3260763

Big Data ML-Based Fake News Detection Using Distributed Learning
ALAA ALTHENEYAN AND ASEEL ALHADLAQ
Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, Riyadh 11495, Saudi Arabia
Corresponding author: Alaa Altheneyan ([email protected])
This work was supported by Researchers Supporting Project number (RSPD2023R532), King Saud University, Riyadh, Saudi Arabia.

ABSTRACT Users rely heavily on social media to consume and share news, facilitating the mass dissemination of genuine and fake stories. The proliferation of misinformation on social media platforms has serious consequences for society. The inability to differentiate between the various forms of false news on Twitter is a major obstacle to effective fake news detection. Researchers have made progress toward a solution by emphasizing methods for identifying fake news. This study uses the FNC-1 dataset, which includes four stance categories for identifying false news. State-of-the-art methods for spotting fake news are evaluated and compared using big data technology (Spark) and machine learning. The methodology of this study employed a distributed Spark cluster to train a stacked ensemble model. Following feature extraction using N-grams, Hashing TF-IDF, and a count vectorizer, we applied the proposed stacked ensemble classification model. The results show that the proposed model achieves a superior classification performance of 92.45% F1 score compared to the 83.10% F1 score of the baseline approach, an improvement of 9.35% in F1 score over state-of-the-art techniques.

INDEX TERMS Big data, machine learning, fake news, ensemble learning, social media.

I. INTRODUCTION

The use of social media platforms to disseminate and digest media has increased in recent years. Social networking sites like Facebook and Twitter generate vast amounts of data daily [1]. It is no secret that the internet is a goldmine of information, especially recent news [2]. The proliferation of fake news is directly attributable to the internet's user-friendly nature. Since fake news is often presented as factual, it is often shared on social media. Often, this data is spread for profit or to influence politics. The effects of fake news on society as a whole are profound, and in light of these impacts, fixing this issue is crucial [3]. Multiple instances of false news were reported to have spread on social media during the 2016 US elections, including the presidential election and the nomination of a new Air Marshal in India [4]. The dissemination of false information has negatively affected people's mental health and society as a whole [5].

Many automatically assume that the news is either bogus or legitimate based on the article's content. Techniques based on news content use methods for collecting data and tone from fake news stories. The goal of style-based methods for detecting false news is to exploit the manipulators' writing styles. By examining certain language features, we can distinguish fake news from the real thing [3]. However, false news is created with the intent of fooling readers; thus, improving the detection of false news using news content style is a difficult problem. To help avoid the difficult and time-consuming human work of fact-checking, the Natural Language Processing (NLP) community has shown considerable interest in the automatic recognition of fake news [6], [7]. Determining the integrity of news is a difficult task, even for automated approaches [8]. Becoming familiar with what other news outlets say on the same issue might be a useful starting point for recognizing false news. Identifying a person's position is the purpose of this phase. Multiple tasks, such as evaluating online arguments [9], [10], verifying the integrity of Twitter rumors [11], [12],

The associate editor coordinating the review of this manuscript and approving it for publication was Chong Leong Gan.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 29447

FIGURE 1. Overview of the headline and text bodies with their respective stances.

or understanding the argumentation structure of seminal works [13], [14], have traditionally relied on position identification.

The first Fake News Challenge (FNC-1) was organized to foster the creation of automated fake news detection systems using AI technology and machine learning. Almost fifty groups from industry and academia worked on this problem. One of the objectives of the FNC-1 challenge is to track down media coverage dealing with a certain title: it might support, challenge, or have nothing to do with the title. There are four potential stances from which an article may treat the title. The guidelines, dataset, and grading criteria for the FNC-1 challenge are all available on its site. These stances are further shown in Figure 1, which depicts four distinct examples.

Multiple deep learning models and Recurrent Neural Networks (RNN), as well as their modifications, including Convolutional Neural Networks (CNN) [15], are often employed for NLP tasks and have been shown to perform very well on NLP-related tasks [16], [17], [18].

A. OVERVIEW OF FAKE NEWS DETECTION
In 2017, Facebook released a white paper that explored the risks of online communication and the challenges of managing one of the most prominent social media platforms today. Weedon, Nuland, and Stamos also noticed the growing challenge of using the enigmatic phrase "fake news," and proclaimed that "the overuse and misapplication of the term 'fake news' might be challenging since we cannot understand or adequately address these concerns without shared definitions" [19]. The word can apply to anything from virtually incorrect news articles to deceptions, April Fools' jokes, rumors, clickbait, or stated opinions posted online with incorrect facts.

In this research work, "fake news" is defined as a written article that is manifestly untrue and falsely disseminated without being authentic, mostly accompanied by malicious intent. This definition includes three important bases: textual, visual, and audio. Elements such as video-based and audio-based fake news are typically ignored when referring to textual fake news; additionally, each element has its own linguistic complexities that necessitate different machine learning and deep learning algorithms to detect and solve problems such as 'Deep Fake,' etc. The notion also implies that fake news might be fact-checked, an important characteristic: the claims may be checked to see if they are true or false. Because rumors are usually hard to verify, they are excluded from the definition. Conspiracy theories are classed as rumors because they are persistent rumors that are difficult to refute. False information concerning the entertainment sector, including hoaxes and April Fools' gags, is also excluded, because the objective must be harmful. Furthermore, the intent is malicious, as it seeks to affect public opinion in favor of a specific message. The definition also excludes text bits that were mistakenly published incorrectly, such as transposed numbers.

A model of the connection between headlines and news content is necessary for identifying clickbait. It is also crucial to tell the difference between false news and clickbait. The term "clickbait" refers to articles with enticing headlines written to attract an online audience or traffic; when people click on such a headline, they end up at a different website with poorly written articles that have nothing to do with the subject line. So, clickbait is written with one goal: getting more people to visit a website that relies on advertising to make


money. The motive is monetary gain rather than furthering a political agenda via disseminating false information.

A prominent example is the deliberate spread of false news about Hillary Clinton by Russian trolls during the 2016 presidential election campaign, which was designed to shift people's voting choices away from Hillary and toward Donald Trump. This instance demonstrates how dangerous it can be when false information spreads on critical issues. Of course, there is another problem with false news: toxic information is spread for no reason other than to sow doubt, stir up chaos, and make it difficult for readers to tell fact from fiction.

1) SOCIAL MEDIA AND FAKE NEWS
Global knowledge dissemination has been democratized because of technological advancements and the emergence of social media. Important news organizations have invested heavily in digital journalism, generating content for media platforms and growing their reach via social media and online tools. Furthermore, online social media platforms are becoming the most important sites for information spreading. Dissemination of information allows for the exchange of ideas and the connectivity of previously inaccessible locations. It enables users to form opinions about the information platforms offer from many perspectives.

In the past, media companies have invested heavily in creating their presence online, with online media networking sites playing a significant role. They use social media platforms such as Facebook and Twitter to promote their material, spread information/news, and develop a network of individuals they may engage with. On the other hand, users benefit from social media's technical developments, since people now have access to a wide range of information sources.

The current digital landscape for information dissemination and the challenges that media organizations face in an ever-present media environment have resulted in substantial changes in how news organizations operate. Economic, technical, and social pressures have combined with the desire to be always visible, to report with ever greater speed and excitement, and to gain followers, creating an atmosphere where fake news is prevalent.

The latest technological advancements in social media have undoubtedly provided a fertile environment for spreading online lies in a primarily deregulated media landscape financed and driven by advertising. The motivation for good is usually overshadowed by the desire for profit, which significantly influences how the medium changes over time. Accordingly, fake news exists on social media alongside real news, and the difficulty appears to be distinguishing them. While fake news is not a new phenomenon, the speed at which it travels, the worldwide reach of the instruments that can distribute it, and the quantity in which it is distributed are unprecedented: social media platforms such as Twitter, Facebook, and Instagram provide an ideal ground for quickly transmitting fake news. Furthermore, bots are increasingly being utilized to distort information, disrupt social media conversations, and draw users' attention, according to the same author.

2) USERS' RE-SHARING BEHAVIOR AND FAKE NEWS
From the perspectives discussed so far, it can be deduced that social media sites play an essential role in disseminating false information. Furthermore, internet users themselves contribute to spreading false information. There are two main types of data sharing on online sharing sites: self-disclosure, in which a user voluntarily discloses private information, and re-sharing, in which a user distributes material already created by another user of the site or a third party. Distributing low-quality, erroneous, or purposefully misleading material may have negative implications, such as spreading false news, but spreading high-quality information can assist in the development of a more informed community. One of the most common ways information is disseminated online is by re-sharing, which includes retweeting, re-posting, re-vining, and re-blogging. In social media, for instance, it is common practice for users to write articles, distribute them among their networks, and engage in related online discourse. Social media users may engage in this practice with various apps. Sharing information rapidly is essential in many situations, including political campaigns and times of crisis, and therefore sites like Twitter, YouTube, and Facebook have become more important. Individuals are also using social media accounts for news production and dissemination.

In the case of social media, for instance, someone may spread false information (or even create a fake tale and post it). Resharing is a feature of many social media sites, so if one person shares a story, it increases the likelihood that others will do the same. Several remedies have been proposed, but there is still much disagreement over what constitutes "fake news", how it spreads, and how it affects social and political outcomes. Multiple major actors, including social media platforms, users, and groups against the spread of fake news, may be able to control the spread of false information on the internet. This brief theoretical overview of the Uses and Gratifications Theory (UGT), the filter bubble phenomenon, and social media re-sharing behavior provides important context for the current investigation. According to UGT research, the Ellinika-Hoaxes Facebook demographic represents an engaged audience searching for high-quality news and information from sources outside their echo chamber via media consumption. This demographic is engaged, actively looking for information and trying to confirm the integrity of rumours they may have seen on social media. Users' familiarity with the Internet, social media, and other media is crucial for identifying the prevalence of false news on these platforms and stopping its spread. To properly answer the research question (RQ) and draw conclusions on how members of the Ellinika-Hoaxes Facebook group use particular media to


FIGURE 2. Category of fake news on social media.

detect and prevent the spread of false news, it is necessary to conduct research into their online behavior.

B. FAKE NEWS CHARACTERIZATION
The principle of fake news has two components: authenticity and purpose. The word "authenticity" refers to the fact that misleading news often contains false information that may be demonstrated to be untrue. Conspiracy theories, for example, are not included in the definition of fake news, since it is nearly impossible to tell whether they are real or false in most situations. According to the second component, the erroneous material's objective is to deceive the reader. Figure 2 represents the categories of fake news on social media. The characterization module represents fake news belonging to traditional media and social media. The second module shows the fake news detection techniques used for both traditional and social media.

First, to identify fake news, one must understand the text context and the procedure to categorize it. It is vital to begin with characterization when developing detection models, and it is also necessary to grasp what fake news is before attempting to identify it. It is also not easy to develop a universally agreed definition of "fake news: stories that are purposely and verifiably misleading and mislead readers". As per Wikipedia, deliberate misinformation or hoaxes spread via multiple online platforms and news channels or digital social media constitute a sort of fake journalism or propaganda [20]. Today's fake news is manipulative and diversified in topics, techniques, and platforms. It consists of two components: authenticity and intent. Fake news material that contains inaccuracies that may be verified falls under authenticity. However, it excludes conspiracy theories, because they are difficult to prove true or false in most circumstances. The second part refers to the misleading material written to deceive the reader.

C. TRADITIONAL MEDIA FAKE NEWS
The media ecosystem supporting the spread of false information has grown and evolved throughout time, including print, broadcast, social media, and digital platforms. Even before the rise of social media, this was seen as a concern because of its role in disseminating false information. Multiple psychological and social scientific foundations are used to characterize the effects of false news on individuals and the social knowledge environment. Humans are not great at distinguishing believable stories from those that are not. Several psychological and perceptual theories explain this phenomenon and the impact of misleading information. Traditional false news exploits readers' emotional vulnerabilities. Incorrect information is more likely to mislead consumers due to the following two major factors:
• Consumers with naive realism believe that their view of the world is valid and that others who disagree with them are irrational or dishonest [21].
• People are more likely to be presented with data that backs their existing worldview. The cognitive biases that are part of the human condition lead consumers to regularly confuse fake news with the genuine thing [22].
By analysing the news ecosystem as a whole, we may be able to pinpoint some of the societal factors that fuel the spread of disinformation. Theories of Social Identity [23] and Normative Influence [5] argue that the need for others' approval is central to a person's sense of self and identity, which increases the likelihood that users will prefer the anonymity and security of online platforms when obtaining and sharing news content, even if it is false.

D. THE EXTRACTION OF FEATURES
Unlike social media, where additional social data may help identify false news, conventional news organizations rely on content like text and photographs to spot and identify fake news. Some representative features of false news are shown in Figure 3. We will next examine how to extract relevant data from the media.

1) TEXTUAL CONTEXT BASED
Three important components make up news content:


FIGURE 3. Feature representation of fake news.

• Source: where the news comes from, who published it, and whether the source is authentic or not.
• Headline: a summary of the news designed to entice readers.
• Body Text: the actual story/content of the news.
The most common method for detecting false information is to look at the content of the news piece. The substance of a news report is generally separated into two types: textual and visual. Much of the news material is presented in the textual mode. As previously said, fake news aims to manipulate the audience, and it does so via the use of specific terminology. Non-fake news, however, is usually associated with a different language profile, since it is more legitimate. Attribute-based language characteristics and structure-related language features are two common categories.

2) ATTRIBUTE-BASED LANGUAGE FEATURES
These involve the ten parallel aspects of content style's linguistic elements. These aspects include volume, uncertainty, objectivity, emotions, diversity, and readability [24]. Although attribute-based language characteristics are generally very important, explainable, and predictable, they are often less useful for assessing deception style compared to structure-based features. Furthermore, attribute-based features require extra resources for deception detection, which may take longer and demands significant focus on correct feature evaluation and filtering.

3) STRUCTURE-BASED LANGUAGE FEATURES
Content style is defined by structure-based linguistic properties and covers four levels of language: the first is the lexicon, the second is semantics, then discourse and syntax. Structure-related features are also technique-oriented features, because most quantification depends on NLP-based methods. The critical challenge at the lexical level is identifying the frequency statistics of words, letters, or other entities, which may be done correctly by applying n-gram models. Part-of-Speech (POS) taggers execute shallow syntax tasks at the syntax level, making tagging and assessment of POS easier. Probabilistic Context-Free Grammars (PCFG) analyse Context-Free Grammars (CFG) by performing deep syntax-level operations with parse trees. On the semantic level, word count (WC) and linguistic inquiry are also utilized to create semantic classes for semantic features.

E. PROBLEM FORMULATION
Developing a Spark distributed cluster-based environment for efficiently detecting fake news articles via a supervised learning paradigm necessitated solving two sub-problems. First, our model needed to learn how to recognize and capture the necessary information in lengthy textual news articles for categorizing the association between news item titles and related meta descriptions.

F. RESEARCH OBJECTIVES
In the first section of this research, we examine the effectiveness of Recurrent Neural Networks (RNN) in modeling news articles to identify the link between an article's body content and its title. As part of our research, we use the dataset made available for the FNC-1 competition to train and assess a classifier. We want the classifier to be able to do the following.

FIGURE 4. Graphical representation of proposed approach.

• Use the Spark framework to research, assess, and compare several machine learning classification techniques on four classes from the FNC-1 dataset.
• Given a title and an article, determine if the article agrees with, disagrees with, discusses, or is irrelevant to the assertion made in the headline.
• Propose an efficient, systematic, and functional approach based on machine learning algorithms for detecting fake news using Spark, and design an efficient stacked ensemble classifier for fake news detection.
In an experiment, we demonstrate that the recommended method can accurately identify fake news and outperforms current state-of-the-art algorithms.

G. PAPER LAYOUT
The remaining paper contains the following sections. Related work is reviewed in Section II. The dataset used for


experimentation and preliminaries is discussed in Section III. The experimental results and discussion are articulated in Section IV. Finally, Section V presents the conclusion and future work.

II. LITERATURE REVIEW
This section provides an overview of previous research's difficulties in identifying fake news. To identify fabricated news stories, it is necessary to perform rumor detection and identification. It is important to distinguish between real and fake news, since fake news is based on deliberate fabrication. Fake news identification is particularly difficult when detecting news based on characteristics. Tweets and social context can be used to generate features. As a result, we assess prior work based on single-modality and stance identification.

A. TEXTUAL CONTENT BASED
Most earlier news identification studies relied mainly on textual elements and user metadata. Text-based features are statistically extracted from message text content and have been extensively discussed in the literature on fake news identification. The textual component extracts unique writing styles [15], [19], [20] and emotional sensations [18] that are prominent in fake news.

Network connections, style analysis, and individual emotions have all been proven to contribute to detecting fake news [19]. After reading these posts, [20] explored the writing style and its effects on readers' viewpoints and attitudes. Emotion is a significant predictor in many fake news detection studies, and most rely on user positions or simple statistical emotional features to convey emotion. In [15] the authors introduced a novel dual emotion-based method for identifying fake news that can learn from publishers' and users' content, user comments, and emotional representation. Reference [25] employed an ML model for identifying fake news that uses convolution filters to distinguish between different granularities of text information. They investigated the issue of stance categorization in an innovative approach to consumer health information inquiries and achieved 84% accuracy using the SVM model.

B. SOCIAL CONTEXT BASED
User-generated social media interactions with news stories may give additional information, in addition to aspects directly relevant to the substance of the stories. In [26] the authors proposed a novel approach employing a knowledge graph to identify fake news based on actual content. A graph-kernel-based approach was used by [27] to discover propagation patterns and attitudes. On the other hand, social context features are difficult to gather, because they are noisy, unstructured, and time-consuming to collect [28].

C. STANCE DETECTION OVERVIEW
From a broad viewpoint, stance detection can be elaborated as the problem of determining an author's or text's point of view concerning a specified target, such as a single topic, headline, or even a person [15], [29]. Consequently, there are three factors, and a machine-learning-based categorization technique determines how the comparison occurs. The group's labels (for example: help, against, for, or neutral) are determined by the issue. Political arguments [30], [31], articles [32], [33], and even internal company dialogues [25], [34], which stretch across a wide range of fields, may be referred to as categories. Detecting the stance of Tweets or short texts such as hearsay [35] or microblogging accounts has gotten much attention in opinion mining. "Hillary Clinton" as a celebrity, "Atheism" as a specific issue, or the claim that "E-cigarettes are safer than regular cigarettes" are examples of targets presented in the available datasets. Shared tasks for providing such datasets and promoting research have emerged in several languages.

The sub-task for exposing stance in Tweets [26] was presented at SemEval-2016, with roughly 5,000 tweets in English, covering five familiar subjects. The task has initiated a variety of approaches, including conventional techniques (for example, KNN [36], SVM [22], or essential attributes given by methods [34]) and deep learning approaches (e.g., BiLSTM [37], Bidirectional Conditional Encoding [27], [34]). Furthermore, there are public datasets, for instance, the Multi-Perspective Consumer Health Query dataset [38], dedicated to exposing the stance of sentences taken from high-quality articles on five separate assertions, like "Sun exposure causes skin cancer"; the dataset is available for the development of new work. It contains an in-depth examination of various approaches to the two goals listed above. The need for well-interpreted data in languages other than English has rapidly increased annotation efforts and collaborative tasks aimed at furthering research. There are efforts like Stance-Cat, which aims at identifying attitudes in Spanish and Catalan tweets [39], a proposal and database of brief statements in Russian online forums [40], and even projects that integrate several languages [41].

A group of volunteers from industry and academia launched the Fake News Challenge in December 2016 [10]. Using Machine Learning, Natural Language Processing (NLP), and Artificial Intelligence (AI), this competition aimed to encourage the development of technologies that could assist human fact-checkers in detecting deliberate deception in news reporting. As a first step, the organizers decided to research what other media outlets have to say about the topic. Consequently, they decided to introduce the event with a stance detection challenge in the first round of competition. The organizers collected data on headlines and body text before the event. In the competition, they asked participants to create classifiers that could reliably classify a body text's viewpoint on a given headline into one of four categories: "disagree", "agree", "discuss" or "unrelated". On this task's test set, the top three teams achieved accuracy rates greater than or equal to 80%. The top team's model combined Gradient Boosted Decision Trees and Deep Convolutional Neural Networks.
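The hand-coded features behind the strongest simple baselines for this task can be made concrete. A classic example is lexical overlap between a headline and a body text, which on its own separates many "unrelated" pairs from the three related stances. The helper names and the 0.05 threshold below are illustrative assumptions for a minimal sketch, not the official FNC-1 baseline code.

```python
def jaccard_overlap(headline, body):
    """Jaccard similarity between the headline's and body's word sets."""
    h, b = set(headline.lower().split()), set(body.lower().split())
    return len(h & b) / len(h | b) if h | b else 0.0

def related_or_unrelated(headline, body, threshold=0.05):
    # Pairs sharing almost no vocabulary are very likely "unrelated";
    # the remaining pairs still need a trained classifier to separate
    # "agree", "disagree", and "discuss".
    return "unrelated" if jaccard_overlap(headline, body) < threshold else "related"

print(related_or_unrelated("Apple launches new phone",
                           "Bananas are yellow fruit"))  # unrelated
```

In a full pipeline, features like this would be concatenated with n-gram and TF-IDF similarities and fed to a classifier such as the Gradient Boosting baseline published by the organizers.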


D. MISLEADING HEADLINES
Identifying misleading headlines in this research required classifying each article's treatment of the assertion made in the title into one of four categories: (a) agrees, (b) discusses, (c) disagrees, and (d) irrelevant (the headline and body text discuss different topics). As a result of the proliferation of annotated corpora and the increased use of new technologies to combat the fake news pandemic, a new obstacle has recently presented itself to the field of fake news analysis [8]. In this setting, several research challenges and competitions have been presented; the most recent and important ones are dissected in detail here. The Emergent dataset [18] was used to create the Fake News Challenge (FNC-1) [42]. The goal of FNC-1 is to serve as a benchmark for research into AI-based technologies, machine learning, and natural language processing as they apply to the detection of false news. The organizers decided to begin with stance detection as the first step of this macro-challenge. The FNC-1 dataset, which includes over 75,000 instances labelled as either "agreeing," "discussing," "disagreeing," or "unrelated," was made publicly available. Given the headline "Robert Plant Ripped up $800M Led Zeppelin Reunion Contract," excerpts annotated in the FNC-1 dataset illustrate the categories mentioned. Body content that conforms to the headline is an instance of the agree class. The discuss class covers cases where the article's main body addresses the same issue as the title but does not take a position on the matter. When the headline and body content address different topics, the pair belongs to the unrelated class. The FNC-1 competition had 200 entries, the top 10% of which averaged 82% relative points. The organizers developed a baseline using just hand-coded features and a Gradient Boosting Classifier, both freely accessible on GitHub. Top systems were UCLMR [43], Talos [44], and the Athene system [23]. The CNNs utilised by Talos [44] were one-dimensional, active at the word level, and trained using Google News topic vectors for the article's main body and title. The data

news title and article content, outside the FNC-1 Challenge and dataset. Several writers have compiled claims and criticisms [21], [47] to help with identification. Some analytic effort is devoted to "argument mining," in which the headline presents an argument not supported by the content. While argument mining is effective in solving the problem of stance identification, other tasks that discover semantic relationships within the text, such as inconsistency detection [48], contrast detection [49], and synthesis detection [50], may also be useful. Mishra et al. provided a comprehensive taxonomy for spotting false news, outlining the many forms of disinformation and what sets them apart. Multiple mechanisms exist to track down those who propagate false information. Multiple datasets, including LIAR and other false news corpora, have been used to compare traditional machine and deep learning techniques. This study demonstrated that deep learning methods outperformed more conventional machine learning strategies. Bi-LSTM outperforms the competition in detecting bogus news with an F1 score of 96.

In [43] the authors introduced the Multi-integrated Domain Adaptive Supervision (MIDAS) system to automatically choose the model that best fits a particular collection of data drawn from random distributions. By using local smoothness as a proxy for accuracy and the relevance of training data, MIDAS can increase generalization accuracy across nine distinct fake news datasets. MIDAS has a greater than 10% success rate in recognizing bogus news linked to COVID-19, compared to other labelling methods [43]. The results of the literature review are summarized in Table 1.

III. PROPOSED METHODOLOGY
This section describes the proposed approach in comprehensive detail. The proposed approach comprises multiple steps: data analysis, feature extraction, single-classifier classification, and ensemble-classifier classification, as shown in Figure 4. In stage 1, a particular purpose and dataset are presented to handle the difficulty of identifying
from the CNN is then fed into a multi-layer perceptron fake news. The challenge’s primary motivation is to build
(MLP) model that generates one of four possible classes of a semi-automated pipeline that examines the attitude of
results. Next, it undergoes a comprehensive, start-to-finish several news items on a specific topic. Thus, the dataset
training process. The system won the FNC-1 competition comprises occurrences with a title, article body, and one
with its superior performance using the CNN-MLP combo. of the four labels ‘‘Disagree’’, ‘‘Agree’’, ‘‘Unrelated’’, and
In recent trials, several research have employed FNC-1 with ‘‘Discuss’’. Figure 4 summarizes our proposed approach,
encouraging outcomes. For instance, [45] suggested a tree- which consists of the steps to achieve fake news classification
like structure for the linked classes by combining the existing by solving multi-class labels. The first part explains the
disagree, agree, and discuss ones. This approach uses a two- corpus creation technique by combining stances and bodies
layer neural network to learn a hierarchical representation of based on news article ids. The second phase describes the
classes, achieving a weighted accuracy of 88.0%. preprocessing processes done on news article text. The
Additionally, scholars built a stance detection model third phase demonstrates techniques to feature selection or
using accomplishment transfer learning on a Roberta Deep dimensionality reduction. The fourth stage describes each
Bidirectional Transformer Language Model. They achieved ML and ensemble model used in this study. Finally, the last
a weighted accuracy of 90.01% by employing Bidirectional phase outlines this study’s various ensemble learning models.
Cross Attention between claim article pairings via pair We divide the dataset into two parts for experiments: training
encoding with self-attention [46]. Further work should be and testing. The training dataset comprises 75% of the data,
done on posture identification problems, such as linking a whereas the testing dataset contains 25%.
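A 75/25 split like the one described above can be produced with Spark's DataFrame.randomSplit. The plain-Python sketch below mimics its per-row behaviour (this is our own illustrative helper, not the authors' code; note that, as in Spark, each row is assigned independently at random, so the resulting sizes are approximate rather than exact):

```python
import random

def random_split(rows, weights, seed=42):
    """Assign each row independently to a partition with probability proportional
    to its weight, mimicking Spark's DataFrame.randomSplit (sizes are approximate)."""
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)          # cumulative probability boundaries, ending at 1.0
    rng = random.Random(seed)
    parts = [[] for _ in weights]
    for row in rows:
        r = rng.random()
        for i, bound in enumerate(bounds):
            if r <= bound:          # first boundary that covers the draw
                parts[i].append(row)
                break
    return parts

train, test = random_split(list(range(10_000)), weights=[0.75, 0.25])
print(len(train), len(test))  # roughly 7,500 / 2,500
```

Because the assignment is per row, re-running with the same seed reproduces the same split, which is the property the experiments below rely on.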

29454 VOLUME 11, 2023


A. Altheneyan, A. Alhadlaq: Big Data ML-Based Fake News Detection Using Distributed Learning

TABLE 1. Literature review summary.

A. DATASET
Carnegie Mellon University adjunct professor Dean Pomerleau and Delip Rao, founder of Joostware and the AI Research Corporation, hosted a competition called the Fake News Challenge Stage 1 (FNC-1) to investigate the potential of machine learning and natural language processing in the fight against fake news [27]. This issue was the driving force for the competition, which focused on stance detection. This section


provides an overview of the competition dataset, the baseline used by the FNC-1 organisers, and the winning strategies used throughout the competition.
The dataset was built by pairing each news story with a headline and annotating the stance of the story toward the claim introduced in that headline. For the original stance annotation exercise there were three possible labels: ''for,'' ''against,'' and ''observing.'' The Emergent dataset [27] is the basis for the FNC-1 competition dataset. To create the FNC-1 dataset, headlines and articles from the Emergent dataset are randomly matched depending on their stance toward the linked claim. First, the headline-article pairs are separated into related and unrelated groups. Second, and more difficult, the collection of related headline-article pairs is further split into the three classes disagree, agree, and discuss, allowing supervision of the task of evaluating the stance of an article relative to the claim presented in the associated headline. There are 49,972 headline-article pairs in the training set of the FNC-1 dataset, and another set of pairs in the test set. There are 1,689 distinct headlines and 1,648 unique articles used to build the headline-article pairs that make up the training set. The test set includes 904 distinct articles and 894 unique headlines. In the training set, 73 percent of pairs are classified as unrelated, 7.4 percent as agree, 1.7 percent as disagree, and 17.8 percent as discuss. About 72.2 percent of the test data is unrelated, 7.4 percent agree, 2.7 percent disagree, and 17.6 percent discuss. The training set has 40,350 headline-article pairs, the hold-out set has 9,622, and the test set has 25,413 pairs.

B. CORPUS DESIGN
The FNC-1 dataset has four distinct classes (agree, disagree, discuss, unrelated). In pre-processing, labels are encoded into numeric target values and several pre-processing steps are performed. The preprocessed data is split into 75% for training and 25% for testing.
This study used the FNC-1 dataset, consisting of two CSV files containing the stances and the bodies of text news stories written in English. Collecting news stories from multiple sources is difficult due to a lack of linguistic resources. Furthermore, annotating these news pieces based on their contents necessitates specialist expertise, a significant amount of time, and substantial money. As a result, augmented corpus design is the only practical way to conduct fake news detection research. Our augmented corpus is created by combining 49,972 stances with 1,683 bodies based on ids. The corpus has four distinct classes (agree, disagree, discuss, unrelated). It contains 8,909 discuss stances, 36,545 unrelated stances, 3,678 agree stances, and 840 disagree stances. After gathering headlines and articles in one column, the final corpus contains the text and the stances.

C. PRE-PROCESSING
Data mining relies heavily on pre-processing. It converts inconsistent and incomplete raw data into a machine-readable representation. Various text preprocessing activities were conducted on the FNC-1 dataset. To complete these tasks, NLP techniques such as conversion to lowercase, stop word elimination, stemming, and tokenization, as well as algorithms from the Keras library, were used. Stop words, which comprise words like ''the'', ''of'', and ''there'', are the most commonly used words in our daily language and typically carry relatively little significance for the overall context of a phrase. By removing the stop words, we save the time and space that would otherwise be consumed by these uninformative tokens. Words with comparable meanings may appear in the text many times; for example, ''eating'' in any sentence will become ''eat''. Reducing each word to its most basic form helps in such cases. This operation, known as stemming [51], uses an open-source version of the NLTK's Porter stemmer. The main preprocessing steps are as follows:
1) Stop Word Removal: Languages commonly use a group of terms collectively known as ''stop words.'' The words ''a,'' ''the,'' ''is,'' and ''are'' are all examples of stop words in English. Stop word removal is common in text mining and natural language processing (NLP) to weed out overused words that contain little useful information. NLTK provides the stop word dictionary


in this instance. To begin, the text is cleaned by removing all stop words. Stop words can be removed because they are frequent and carry little useful information; common examples include the conjunctions ''and'', ''or'', and ''but''. This step matters in natural language processing because processing these frequent but uninformative words consumes a significant amount of time.
2) Punctuation Removal: Punctuation provides the grammatical context of a sentence. A comma, for example, may not add anything to the understanding of the statement.
3) Link Removal: This step removes hypertext links from social media posts. Regular expressions are used to do this.
4) Lemmatization or Stemming: Either lemmatization or stemming is done during this step. The NLTK's WordNet lemmatizer is used for lemmatization, while the NLTK's Snowball stemmer implementation, based on the Porter2 stemming algorithm [52], is used for stemming.
5) Reply Removal: Apart from the above-mentioned stages, every social media post must also go through reply removal: words beginning with @ (primarily used for Twitter replies) are eliminated in this phase, again using regular expressions.
6) Lowercase Transformation: Every word is converted to lowercase in this phase to remove variation due to capitalization.

D. FEATURE EXTRACTION
Feature extraction transforms raw data into numerical features that can be processed further while preserving the information in the original data set. It is more effective than training a machine on raw data directly.

1) HASHINGTF
HashingTF transforms a set of terms into a feature vector of fixed length; in text processing, a ''term set'' is typically a collection of words. It employs the hashing trick: a hash function maps each raw feature (term) to an index, and the mapped indices are then used to calculate the term frequencies. The hash function used is Austin Appleby's MurmurHash3 (MurmurHash3_x86_32). Because the hash value is translated to a column index using a simple modulo, the features will not be mapped evenly to columns unless the numFeatures parameter is a power of two. When working with large datasets, avoiding the creation of a global term-to-index map is preferable, because building one can be time-consuming and expensive. However, this method is vulnerable to hash collisions, which occur when different raw features are hashed to the same index; increasing the number of buckets in the hash table is recommended to reduce the likelihood of collisions. There is also a binary toggle parameter that controls how term frequencies are counted: when it is set to true, all nonzero frequency counts are reset to 1, which suits discrete probability models that use binary rather than integer counts.

2) IDF
Inverse Document Frequency (IDF) is a calculation frequently employed in association with term frequency. The issue with term frequency alone is that frequent terms are not necessarily the most significant; for example, ''content'' will appear on almost every web page. IDF is a method for lowering the weight of frequently occurring words in a corpus (collection of documents). IDF is determined from the total number of documents divided by the number of documents containing the term. In Spark, IDF is an Estimator that produces an IDF model after being fitted to a dataset; the model then rescales feature vectors (typically created by HashingTF or a count vectorizer), downweighting features that are common across the corpus [46].

E. CLASSIFICATION MODELS AND PARAMETER SETTINGS
We use the following machine learning techniques to detect irregularities, break down unusual events, and investigate the effectiveness of our advanced method:
Random Forest (RF): a supervised learning technique that may be used for classification, regression, and other tasks. It generates a number of trees to aid in decision-making: it takes random samples of the data, constructs many decision trees, obtains a forecast from each tree, and then votes on the best option. The parameters for our RF method are n-estimators = 200, bootstrap = True, criterion = gini, min-samples-split = 2, random-state = 0, and min-samples-leaf = 1.
Logistic Regression (LR): a supervised learning model for discrete targets. It is a very straightforward ML algorithm for problems such as noise detection, diabetes prediction, cancer detection, etc. LR is used to predict the probability of the target variable [47]. In our application, the parameters of the LR algorithm are penalty = l2, C = 1.0, solver = lbfgs, max-iter = 100, and verbose = 0.
Decision Tree (DT): decision trees are extensively used in decision analysis and machine learning [21]. A DT is a decision-making tool that uses a tree-like graph of decisions and consequences, such as random event outcomes, resource costs, and utility, to make judgments. Internal nodes in a DT express a


condition about an attribute. Each internal node divides into branches depending on the condition's outcome, until a point is reached where it no longer splits and leads to leaf nodes, which indicate the class label that will be applied [48].
Ensemble Classifier: In addition to the individual classifiers, an ensemble technique was developed that combines the three classifiers. The objective is to develop a voting classifier that calculates the weights to apply to each classifier's prediction [53]. The probabilities computed by the classifiers are first stored in a matrix for each training instance, so that each training case is linked with a probability vector. The weights are calculated from this matrix of vectors, which is then fed into a meta-classifier that produces the final label (0, 1, 2, or 3).
In contrast to the ensemble model, a voting classifier was also constructed to perform simple majority voting among the models' predictions. Ensemble classification is generally divided into two stages: base level and ensemble level. The base predictors take the HashingTF-IDF features extracted from the news articles as input. The output predictions from these base predictors are fed into the ensemble-level model. The ensemble model's main purpose is to improve the overall prediction F1 score by overcoming the shortcomings of the primary predictors. We have used stacking ensemble models for ensemble classification [54].

F. EVALUATION METRICS
The main concern is determining the model's ability to discern true and false news. We used several metrics to properly examine the model's efficiency on this difficult challenge. Model selection and implementation are essential but should not take precedence over the rest of the project. Various assessment measures are applied to the test data to assess the model's capacity to detect false news. Multiple evaluation metrics, such as the classification report (accuracy, precision, recall, F1-score) and the confusion matrix, may be used to assess machine learning models. The following paragraphs go through each of the assessment measures in detail. The pre-processed fake news data is fed into a strong algorithm, producing impressive results [49].
Observations that match the model's predictions are true positives and true negatives, respectively; false positives and false negatives are the two kinds of error we would like to minimize. Each of these terms can be defined precisely.
A True Positive (TP) is a correctly predicted positive result, where the actual and predicted class values are both yes: for instance, both the expected and actual class values suggest that the passenger survived, and indeed they did. When both the actual and predicted class values are negative, the value is a True Negative (TN): for instance, both the actual and predicted classes suggest that the passenger did not survive, and indeed they did not. False positives and false negatives occur when the actual class differs from the predicted class. When the predicted class is yes but the real class is no, this is called a False Positive (FP): for example, the actual class shows that the passenger did not survive, but the predicted class says that they did. When the true class is yes but the predicted class is no, a False Negative (FN) has occurred: for example, the actual class value reveals that the passenger survived, whereas the predicted class value says that they died.
To verify the usefulness of the model, the following assessment criteria are used:
Accuracy is the proportion of test instances that were predicted correctly, calculated by dividing the number of correct predictions by the total number of predictions:

Acc = (TP + TN) / (TP + TN + FP + FN)

Precision: to calculate a classifier's precision, divide the number of true positive outcomes by the total number of positive predictions:

Pr = TP / (TP + FP)

Recall: the number of true positive outcomes divided by the total number of actual positive instances:

Re = TP / (TP + FN)

F1-score: the harmonic mean of precision and recall, a convenient way to account for both at once:

F1 = 2 x (Precision x Recall) / (Precision + Recall)

The quality of a model's predictions may be summarized with a classification report (CR). The correct and incorrect classifications for each category are used to determine the totals, based on the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts. Several metrics may be used to evaluate a model's efficacy, but accuracy is often prioritized. The report incorporates a range of assessment measures, including accuracy, precision, recall, F1 score, and support. Precision indicates how much of what the model predicts as positive is correct, and recall indicates how much of each class was actually retrieved; the F1-score combines the two, and a model with an F1-score of 1 is ideal. ''Support'' refers to the number of occurrences of each class in a given dataset [50], and ''accuracy'' refers to the proportion of correct predictions relative to the total number of predictions.
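The four formulas above can be computed directly from the confusion-matrix counts. The helper below is a small plain-Python illustration (our own sketch, not the evaluation code used in the experiments):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: 80 true positives, 90 true negatives, 10 false positives, 20 false negatives.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(m)  # accuracy 0.85, precision = 80/90, recall 0.8, F1 about 0.842
```

For the multi-class FNC-1 setting, these per-class values are then combined with macro or weighted averaging, as discussed in the results section.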


TABLE 2. Proposed approach results.
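To make the HashingTF and IDF mechanics described in the feature-extraction subsection concrete before turning to the results, here is a plain-Python toy version (our own sketch: Spark's implementation uses MurmurHash3 and its own Estimator API, whereas this illustration uses CRC32 and hypothetical helper names):

```python
import math
import zlib

def hashing_tf(tokens, num_features=16):
    """Map tokens to term-frequency buckets with the hashing trick (no global vocabulary)."""
    vec = [0] * num_features
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features  # hash value -> column index
        vec[idx] += 1
    return vec

def idf_weights(doc_vectors):
    """Smoothed inverse document frequency per bucket: log((n_docs + 1) / (df + 1))."""
    n_docs = len(doc_vectors)
    num_features = len(doc_vectors[0])
    df = [sum(1 for v in doc_vectors if v[i] > 0) for i in range(num_features)]
    return [math.log((n_docs + 1) / (df[i] + 1)) for i in range(num_features)]

docs = [["fake", "news", "spreads", "fast"],
        ["news", "stance", "detection"],
        ["fake", "stance"]]
tf = [hashing_tf(d) for d in docs]          # fixed-length term-frequency vectors
idf = idf_weights(tf)                       # downweights buckets common to many docs
tfidf = [[t * w for t, w in zip(vec, idf)] for vec in tf]
print(tfidf[0])
```

With num_features a power of two, the modulo spreads hash values evenly across buckets, which is the power-of-two recommendation made in the HashingTF subsection; a bucket hit by every document receives an IDF weight of zero under the smoothed formula.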

IV. EXPERIMENTS AND RESULTS
A. CLASSIFICATION RESULTS
The experimental results of the Term Frequency-Inverse Document Frequency (TF-IDF) and HashingTF feature extraction techniques with the ensemble models are presented in Table 2. The results using HashingTF and IDF features in terms of accuracy, precision, recall, and F1-score are 93.45%, 92.03%, 92.45%, and 92.25%. The accuracy of LR with HashingTF-IDF is 93.45%, the highest among all the experiments. Furthermore, bigram logistic regression exhibits 88.45% accuracy, 87.02% precision, 88.01% recall, and 87.06% F1-score. We also performed experiments using GloVe word embeddings with logistic regression; the GloVe-based results are not as high, with an accuracy of 73.25% and 63.12%, 73.25%, and 62.45% as the precision, recall, and F1-score. To make a broader comparison, we also include features from the count vectorizer technique. The count vectorizer features were passed to logistic regression to detect fake news; with them, the logistic model achieved 88.45% accuracy, 82.12% precision, 88.45% recall, and 87.35% F1-score. Moreover, we merged the count vectorizer and TF-IDF features to obtain better results, but failed to achieve any improvement, at a high computational cost: the accuracy, precision, recall, and F1-score using count vectorizer plus TF-IDF features with logistic regression are 84.54%, 83.12%, 84.25%, and 83.26%. We also tested the Support Vector Machine (SVM) model using count vectorizer features, and the SVM model obtained improved results with 91.75% accuracy, 91.25% precision, 91.24% recall, and 90.45% F1-score; compared to LR with the count vectorizer, the SVM obtained higher results. We also employed the LR and SVM models with HashingTF-IDF features. Here the results of LR are better than those of the SVM model: the SVM with HashingTF-IDF achieved 90.75% accuracy, while the LR model with HashingTF-IDF obtained 93.78% accuracy. Finally, we utilized Trigram, Unigram + Bigram + Trigram, Unigram + Bigram + Trigram + 16,000 limited top features, and Unigram + Bigram + Trigram + CV + IDF + Chi-square features with logistic regression to detect fake news efficiently. LR with trigrams obtains significant results: accuracy is 83.47%, precision is 82.01%, recall is 83.45%, and F1-score is 82.64%. Compared to individual trigram features, the LR model with uni-, bi-, and trigrams obtained better results, with 88.64% accuracy. However, when running tests with uni-, bi-, and trigrams plus the 16,000 limited top features, the LR model obtained a lower accuracy of 83.78%. Ultimately, we merged all the features (Unigram + Bigram + Trigram + CV + IDF + Chi-square), applied LR to them, and obtained promising results with 83.45% accuracy and an 82.45% F1-score.
Figure 5 (a) shows the classification report of the ensemble model. The support column presents the number of instances of each class in the testing set; 12,403 instances are used as testing data. We used weighted averaging to calculate the precision, recall, and F1-score because it deals with the class imbalance problem. The mean of the precision, recall, and F1-score over all classes is calculated using the macro average, while the weighted


average is the total number of TP divided by the entire number of objects in all classes. The weighted average score is higher due to the class imbalance in the dataset. We also construct the ensemble model's confusion matrix, as shown in Figure 5(b). A confusion matrix, also known as an error matrix, is a table that visually depicts the performance of a supervised classification machine learning system. Figure 5(b) shows that the model made multiple incorrect classifications. The ensemble model's final accuracy on the testing data is 93%.

TABLE 3. Comparative analysis of proposed and baseline approaches.

B. PERFORMANCE COMPARISON OF DIFFERENT APPROACHES
The comparative analysis of the proposed approaches with various baseline approaches is presented in Table 3. The bold values mark the highest achieved scores of the proposed and baseline approaches. The experimental setting of the proposed approaches resembled the baselines. Table 3 shows that the proposed approach with TF-IDF features and the LR model outperforms the baseline's highest F1 score of 83.10%, with the proposed approach obtaining the highest F1 score of 93.84%. In addition, for the class-wise scores, the baseline approach of [46] exhibits the best score for the Agree class with 73.76%, whereas the proposed approach with TF-IDF features and the LR model achieved the highest Agree class score of 80.23%. The proposed approach outperforms the baseline regarding the F1 score, with the highest F1 score of 92.45%, an improvement of 9.35%.

C. DISCUSSION
The FNC-1 dataset, which contains 49,972 headline articles and four distinct categories (discuss, agree, unrelated, and disagree), was used to achieve the investigation's objectives and obtain the desired results. The proposed system comprises numerous components, such as data pre-processing, visualization, exploratory analysis, feature extraction, and classification using machine learning strategies. We proposed classifying data with an ensemble model influenced by


machine learning in real time during the experiment. As a direct result, a more rapid interpretation of the findings is now possible. Instead of just one, two, or three separate classification methods, the proposed ensemble model employs three distinct machine learning approaches (Random Forest, Logistic Regression, and Decision Tree). This ensemble model was created as part of our efforts to improve on our previous investigations into identifying and categorizing fake news.
Several different factors influence the current situation. Several experiments were carried out using the Apache Spark framework to handle big data and perform the classification task; these experiments were carried out to improve our ability to detect fake news, and as a result our ability to recognize hoaxes and other forms of disinformation should be enhanced. The model's performance was one of the aspects considered during the evaluation process for this particular piece of research: its accuracy was considered as part of the evaluation, in addition to its performance against five other distinct criteria. The evaluation metrics include accuracy, precision, recall, the F1-score, and the confusion matrix.
PySpark was chosen because it uses RDDs, which significantly accelerate computation. As a result, the computations finished significantly faster than they otherwise would have; this was the essential consideration in deciding whether or not to employ PySpark. Compared to the other approaches utilized during this inquiry and the previous baseline studies, the suggested ensemble model had the greatest F1 score. This model achieved the highest F1 score of 92.45% due to the HashingTF-IDF features that were added during development. We boosted the F1 score by 9.35%, a sufficient gain to demonstrate the novelty of this research.
In the future, one of our long-term goals is to use Spark to implement deep learning models in a multi-agent distributed learning environment. These algorithms will be used to detect instances of fake news, so that we can assess the effectiveness of a wide range of machine learning and deep learning algorithms on a diverse set of fabricated news stories. Furthermore, we intend to create a feature ensemble of different embedding techniques alongside different machine learning and deep learning models capable of accurately recognizing and categorizing various hoaxes and fake news. This will not only aid in understanding the patterns of hoax and fake news detection but also help in developing a cutting-edge real-time fake news detection system.

FIGURE 5. (a): Classification report of final ensemble model. (b): Confusion matrix of final ensemble model.

V. CONCLUSION
The headline stance checker has been shown to be a helpful method for exposing falsehood in the news, particularly when a headline is contrasted with its content body. To demonstrate the applicability of the headline stance checker, various tests were conducted in the context of an existing task (Fake News Challenge FNC-1), in which the stance of a headline had to be categorized into one of the following classes: disagree, agree, unrelated, and discuss. The studies included verifying each of the suggested classification steps separately, and the overall method is evaluated by comparison with the state of the art on this task. In this study, we used the FNC-1 dataset, which categorizes fake news into four categories, while using big data technology (Spark) to perform machine learning analysis for assessment and comparison with other state-of-the-art approaches in fake news identification. The suggested approach created a stacked ensemble model and experimented with it on a distributed Spark cluster. We used N-grams, HashingTF-IDF, and a count vectorizer for feature extraction, followed by the suggested stacked ensemble classification model. Compared to the baseline techniques' results, the suggested model has a high classification performance of 92.45% in F1-score. It outperforms the previous baseline techniques and improves the F1 score significantly, by 9.35%.

A. RECOMMENDATIONS FOR FURTHER WORK
We currently work with a supervised approach, but researchers can work with unsupervised fake news detection in the future. This proposed work can also be extended


using various neural network-based models, which are better suited to unsupervised fake news detection. Spark also took too much training time with the standalone cluster, roughly twice as long to train. In future work, experiments can be performed with a cluster created across different machines, and we will try to build such a cluster on separate computers.
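As a sketch of that direction, Spark's standalone-mode scripts can spread the cluster across machines. The hostname, memory setting, and script name below are placeholders rather than our actual configuration (and on Spark versions before 3.1 the worker script is named start-slave.sh):

```shell
# On the master machine: start the standalone master
# (it listens at spark://<master-host>:7077 by default).
$SPARK_HOME/sbin/start-master.sh

# On each separate worker machine: register with the master.
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

# Submit the training job against the distributed cluster
# instead of a single-machine standalone cluster.
$SPARK_HOME/bin/spark-submit \
    --master spark://master-host:7077 \
    --executor-memory 4G \
    train_fake_news.py
```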
ACKNOWLEDGMENT
We would like to thank Researchers Supporting Project number (RSPD2023R532), King Saud University, Riyadh, Saudi Arabia.
multi-perspective consumer health information,’’ in Proc. ACM India Joint
Int. Conf. Data Sci. Manage. Data, Jan. 2018, pp. 273–281.
REFERENCES [26] S. V. Vychegzhanin and E. V. Kotelnikov, ‘‘Stance detection based
[1] P. H. A. Faustini and T. F. Covões, ‘‘Fake news detection in multiple on ensembles of classifiers,’’ Program. Comput. Softw., vol. 45, no. 5,
platforms and languages,’’ Expert Syst. Appl., vol. 158, Nov. 2020, pp. 228–240, Sep. 2019.
Art. no. 113503. [27] C. Silverman, ‘‘Lies, damn lies and viral content,’’ Tow Center Digit.
[2] M. D. Vicario, W. Quattrociocchi, A. Scala, and F. Zollo, ‘‘Polarization Journalism, Columbia Univ., New York, NY, USA, 2015.
and fake news: Early warning of potential misinformation targets,’’ ACM [28] S. Harabagiu, A. Hickl, and F. Lacatusu, ‘‘Negation, contrast and
Trans. Web, vol. 13, no. 2, pp. 1–22, May 2019. contradiction in text processing,’’ in Proc. AAAI, vol. 6, 2006, pp. 755–762.
[3] Y. Liu and Y.-F.-B. Wu, ‘‘FNED: A deep network for fake news early [29] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry,
detection on social media,’’ ACM Trans. Inf. Syst., vol. 38, no. 3, pp. 1–33, ‘‘SemEval-2016 task 6: Detecting stance in tweets,’’ in Proc. 10th Int.
Jul. 2020. Workshop Semantic Eval. (SemEval). San Diego, CA, USA: Association
[4] J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, for Computational Linguistics, 2016, pp. 31–41.
‘‘Supervised learning for fake news detection,’’ IEEE Intell. Syst., vol. 34, [30] B. G. Patra, D. Das, and S. Bandyopadhyay, ‘‘JU_NLP at SemEval-2016
no. 2, pp. 76–81, Mar. 2019. task 6: Detecting stance in tweets using support vector machines,’’ in Proc.
10th Int. Workshop Semantic Eval. (SemEval), 2016, pp. 440–444.
[5] M. Z. Asghar, A. Habib, A. Habib, A. Khan, R. Ali, and A. Khattak,
‘‘Exploring deep neural networks for rumor detection,’’ J. Ambient Intell. [31] H. Elfardy and M. Diab, ‘‘CU-GWU perspective at SemEval-2016 task 6:
Hum. Comput., vol. 12, no. 4, pp. 4315–4333, Apr. 2021. Ideological stance detection in informal text,’’ in Proc. 10th Int. Workshop
Semantic Eval. (SemEval), 2016, pp. 434–439.
[6] R. K. Kaliyar, A. Goswami, and P. Narang, ‘‘DeepFakE: Improving fake
news detection using tensor decomposition-based deep neural network,’’ [32] I. Augenstein, T. Rocktäschel, A. Vlachos, and K. Bontcheva,
J. Supercomput., vol. 77, no. 2, pp. 1015–1037, Feb. 2021. ‘‘Stance detection with bidirectional conditional encoding,’’ 2016,
arXiv:1606.05464.
[7] S. S. Jadhav and S. D. Thepade, ‘‘Fake news identification and classifica-
[33] P. Wei, W. Mao, and D. Zeng, ‘‘A target-guided neural memory model
tion using DSSM and improved recurrent neural network classifier,’’ Appl.
for stance detection in Twitter,’’ in Proc. Int. Joint Conf. Neural Netw.
Artif. Intell., vol. 33, no. 12, pp. 1058–1068, Oct. 2019.
(IJCNN), Jul. 2018, pp. 1–8.
[8] A. Vereshchaka, S. Cosimini, and W. Dong, ‘‘Analyzing and distinguishing
[34] S. Zhou, J. Lin, L. Tan, and X. Liu, ‘‘Condensed convolution neural
fake and real news to mitigate the problem of disinformation,’’ Comput.
network by attention over self-attention for stance detection in Twitter,’’
Math. Org. Theory, vol. 26, no. 3, pp. 350–364, Sep. 2020.
in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–8.
[9] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, ‘‘Fake
[35] M. Taulé, M. A. Martí, F. M. Rangel, P. Rosso, C. Bosco, and V. Patti,
news detection on social media using geometric deep learning,’’ 2019,
‘‘Overview of the task on stance and gender detection in tweets on Catalan
arXiv:1902.06673.
independence at IberEval 2017,’’ in Proc. 2nd Workshop Eval. Hum. Lang.
[10] M. H. Goldani, S. Momtazi, and R. Safabakhsh, ‘‘Detecting fake news with Technol. Iberian Lang. (CEUR-WS), vol. 1881, 2017, pp. 157–177.
capsule neural networks,’’ 2020, arXiv:2002.01030.
[36] M. Lai, A. T. Cignarella, D. I. Hernández Farías, C. Bosco, V. Patti, and
[11] S. Shellenbarger, ‘‘Most students don’t know when news is fake, Stanford P. Rosso, ‘‘Multilingual stance detection in social media political debates,’’
study finds,’’ Wall Street J., vol. 21, 2016. Comput. Speech Lang., vol. 63, Sep. 2020, Art. no. 101075.
[12] D. Pierson, ‘‘Facebook and Google pledged to stop fake news. So why did [37] S. Sommariva, C. Vamos, A. Mantzarlis, L. U.-L. Dào, and D. M. Tyson,
they promote Las Vegas-shooting hoaxes?’’ Los Angeles Times, Oct. 2017. ‘‘Spreading the (fake) news: Exploring health messages on social media
[13] G. Zarrella and A. Marsh, ‘‘MITRE at SemEval-2016 task 6: Transfer and the implications for health professionals using a case study,’’ Amer. J.
learning for stance detection,’’ 2016, arXiv:1606.03784. Health Educ., vol. 49, no. 4, pp. 246–255, Jul. 2018.
[14] S. Ghosh, P. Singhania, S. Singh, K. Rudra, and S. Ghosh, ‘‘Stance [38] B. Riedel, I. Augenstein, G. P. Spithourakis, and S. Riedel, ‘‘A simple but
detection in web and social media: A comparative study,’’ in Proc. Int. tough-to-beat baseline for the Fake News Challenge stance detection task,’’
Conf. Cross-Lang. Eval. Forum Eur. Lang. Cham, Switzerland: Springer, pp. 1–6, May 2018, arXiv:1707.03264.
2019, pp. 75–87. [39] Q. Zhang, S. Liang, A. Lipani, Z. Ren, and E. Yilmaz, ‘‘From stances’
[15] A. I. Al-Ghadir, A. M. Azmi, and A. Hussain, ‘‘A novel approach to stance imbalance to their hierarchical representation and detection,’’ in Proc.
detection in social media tweets by fusing ranked lists and sentiments,’’ World Wide Web Conf., May 2019, pp. 2323–2332.
Inf. Fusion, vol. 67, pp. 29–40, Mar. 2021. [40] C. Dulhanty, J. L. Deglint, I. B. Daya, and A. Wong, ‘‘Taking a stance
[16] S. Somasundaran and J. Wiebe, ‘‘Recognizing stances in ideological on- on fake news: Towards automatic disinformation assessment via deep
line debates,’’ in Proc. NAACL HLT Workshop Comput. Approaches Anal. bidirectional transformer language models for stance detection,’’ 2019,
Gener. Emotion Text, 2010, pp. 116–124. arXiv:1911.11951.
[17] A. Konjengbam, S. Ghosh, N. Kumar, and M. Singh, ‘‘Debate stance [41] B. Pouliquen, R. Steinberger, and C. Best, ‘‘Automatic detection of
classification using word embeddings,’’ in Proc. Int. Conf. Big Data Anal. quotations in multilingual news,’’ in Proc. Recent Adv. Natural Lang.
Knowl. Discovery. Cham, Switzerland: Springer, 2018, pp. 382–395. Process., 2007, pp. 487–492.
[18] A. Faulkner, ‘‘Automated classification of stance in student essays: An [42] M.-C. De Marneffe, A. N. Rafferty, and C. D. Manning, ‘‘Finding
approach using stance target information and the Wikipedia link-based contradictions in text,’’ in Proc. Assoc. Comput. Linguistics, 2008,
measure,’’ in Proc. 27th Int. Flairs Conf., May 2014. pp. 1039–1047.




ALAA ALTHENEYAN received the B.Ed. degree in computer and education and the M.S. and Ph.D. degrees in computer science from King Saud University, Riyadh, Saudi Arabia, in 2006, 2012, and 2020, respectively. From 2011 to 2021, she was a Lecturer with King Saud University, where she has been an Assistant Professor with the Computer and Engineering Department, since 2021. Her research interests include natural language processing and machine learning.

ASEEL ALHADLAQ received the B.S. and M.S. degrees in computer science from King Saud University, Riyadh, Saudi Arabia, in 2006 and 2013, respectively, and the Ph.D. degree in computing from Newcastle University, Newcastle upon Tyne, U.K., in 2021. From 2011 to 2021, she was a Lecturer with King Saud University, where she has been an Assistant Professor with the Computer and Engineering Department, since 2021. Her research interests include human–computer interaction, social media, and designs.

