
Received November 10, 2021, accepted November 16, 2021, date of publication November 18, 2021, date of current version November 30, 2021.


Digital Object Identifier 10.1109/ACCESS.2021.3129329

A Comprehensive Review on Fake News Detection With Deep Learning
M. F. MRIDHA 1, (Senior Member, IEEE), ASHFIA JANNAT KEYA 1, MD. ABDUL HAMID 2, MUHAMMAD MOSTAFA MONOWAR 2, AND MD. SAIFUR RAHMAN 1


1 Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
2 Department of Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Corresponding author: M. F. Mridha ([email protected])

ABSTRACT A prominent issue of the present time is that organizations from different domains are struggling to obtain effective solutions for detecting online fake news. Distinguishing fake information on the Internet is challenging because it is often written to deceive users. Compared with many machine learning techniques, deep learning-based techniques are capable of detecting fake news more accurately. Previous review papers were based on data mining and machine learning techniques and scarcely explored deep learning techniques for fake news detection. In particular, emerging deep learning-based approaches such as Attention, Generative Adversarial Networks, and Bidirectional Encoder Representations from Transformers are absent from previous surveys. This study investigates advanced and state-of-the-art fake news detection mechanisms in depth. We begin by highlighting the consequences of fake news. Then, we discuss the datasets used in previous research and their NLP techniques. A comprehensive overview of deep learning-based techniques is presented, organizing representative methods into various categories. The prominent evaluation metrics in fake news detection are also discussed. Finally, we suggest recommendations to improve fake news detection mechanisms in future research directions.

INDEX TERMS Natural language processing, machine learning, deep learning, fake news.

The associate editor coordinating the review of this manuscript and approving it for publication was Sergio Consoli.

I. INTRODUCTION
The Internet has changed the ways people interact and communicate through low cost, simple access, and fast information dissemination. As a result, social media and online portals have become more popular than traditional newspapers for searching and reading news. Although social media has become a powerful means of information, it can harm society by influencing major events. The issue of online false news has gained particular attention since the U.S. presidential election in 2016 [1], [2]. According to Zhang and Ghorbani [3], voters might be easily controlled by deceptive political statements and claims. Studies show that false news or lies propagate more quickly among people than original information and cause tremendous effects [4].

The terms rumor and fake news are closely interrelated. Fake news or disinformation is intentionally created. On the other hand, rumors are unconfirmed and questionable information that is spread without the aim to deceive [15]. On social media sites, spreaders’ intentions might be difficult to determine. As a result, any false or incorrect information is typically branded as misinformation on the Internet. Distinguishing real and fake information is challenging. However, many approaches have been adopted to address this issue. Various machine learning (ML) methods have been used to detect false information spread online in the case of knowledge verification [16], natural language processing (NLP) [16]–[18] and sentiment analysis [19]. Early research concentrated on leveraging textual information derived from the article’s content, such as statistical text features [20] and emotional information [21]–[23].

Deep learning (DL) has recently become an emerging technology among the research community and has proven to be more effective in recognizing fake news than traditional ML methods. DL has some particular advantages over ML, such as a) automated feature extraction, b) light dependence on data pre-processing, c) ability to

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 156151
M. F. Mridha et al.: Comprehensive Review on Fake News Detection With Deep Learning

TABLE 1. A comparison of existing surveys based on fake news detection.

extract high-dimensional features, and d) better accuracy. Further, the current wide availability of data and programming frameworks has boosted the usage and robustness of DL-based approaches. Hence, in the last five years, numerous articles have been published on fake news detection, mostly based on DL strategies [24]. A dedicated effort has been made here to review the current literature and compare the extensive body of DL-based fake news detection research.

A number of surveys on fake news detection have been published [5], [25], [26]. Our investigation reveals that existing studies do not provide a thorough overview of deep learning-based architectures for detecting fake news. The existing survey papers mostly cover ML strategies for detecting fake news and scarcely explore DL strategies [3], [9], [10]. We provide a complete list of NLP techniques and describe their benefits and drawbacks. In this survey, we perform an in-depth analysis of current DL-based studies. Table 1 provides a brief overview of the existing survey papers and our research contributions. The present study aims to address the weaknesses and strengths of previous research by conducting a systematic survey on fake news detection. First, we divide existing fake news detection research into two main categories: (1) Natural Language Processing (NLP) and (2) Deep Learning (DL). We discuss NLP techniques such as data pre-processing, data vectorizing, and feature extraction. Second, we analyze the fake news detection architectures based on different DL architectures. Finally, we discuss the evaluation metrics used in fake news detection. Figure 1 depicts an overall taxonomy of fake news detection approaches. We also include Table 2, which lists the acronyms used throughout the survey, to assist readers.

The rest of the paper is organized as follows. Section II highlights the consequences of fake news. Section III describes the datasets used. Section IV explains the Natural Language Processing techniques in fake news detection. Section V contains an in-depth analysis of deep learning strategies. Section VI presents the evaluation metrics used in previous studies. Section VII discusses the challenges and future research directions. Finally, Section VIII concludes the paper.

II. FAKE NEWS CONSEQUENCES
Fake news has existed since the beginning of human civilization. However, modern technologies and the transformation of the global media landscape have increased its spread. Fake news may have major consequences for social, political, and economic environments. Fake information and fake news have various faces. As information molds our view of the world, fake news has a huge impact. We make critical decisions based on information, and we form impressions of situations and people from the information we obtain. We cannot make good decisions if the information we find on the Internet is fake, false, distorted, or fabricated. The primary impacts of fake news are as follows:

Impact on Innocent People: Rumors can have a major impact on specific people. These people may be harassed on social media. They may also face insults and threats that may have real-life consequences. People must not believe invalid information on social media or use it to judge a person.

Impact on Health: The number of people searching for health-related news on the Internet is continuously increasing. Fake health news has a potential impact on people’s lives [36], which makes it one of the major challenges today. Misinformation about health has had a tremendous impact in the last year [37]. Social media platforms have made some policy changes to ban or limit the spread of health misinformation as they face pressure from doctors, lawmakers, and health advocates.

Financial Impact: Fake news is currently a crucial problem in industries and the business world. Dishonest businesspeople spread fake news or reviews to raise their profits. Fake information can cause stock prices to fall. It can ruin the reputation of a business. Fake news also has an impact on customer expectations. Fake news can create an unethical business mentality.

Democratic Impact: The media has discussed the fake news phenomenon significantly because fake news played a


TABLE 2. The table contains the acronyms used in this survey.

FIGURE 1. A taxonomy of deep learning-based fake news detection.

vital role in the last American presidential election. This is a major democratic problem. We must stop spreading fake news as it has a real impact.

III. BENCHMARK DATASET
In this section, we discuss the datasets used in various studies. For both training and testing, benchmark datasets


TABLE 3. The table provides details of publicly available datasets and corresponding URLs.

were utilized. One of the difficulties in identifying fake news is the shortage of large, labeled benchmark datasets with trustworthy ground-truth labels, from which researchers can obtain practical features and construct models [38]. Such datasets have been collected over the last few years for several uses in DL and ML. The datasets differ vastly from one another because of different study agendas. For instance, a few datasets are made up entirely of political statements (such as PolitiFact), while others are made up entirely of news articles (FNC-1) or social media posts (Twitter). Datasets can differ in their modality, labels, and size. Therefore, we categorize these datasets in Table 3 based on these characteristics. Fake articles are frequently collected from fraudulent websites designed intentionally to disseminate disinformation. These false news stories are eventually shared on social media platforms by their creators. Malicious individuals or bots, as well as inattentive users who do not check the source of a story before sharing it, assist in spreading fake news through social media. However, most datasets contain only news content, and language features and writing style alone are not sufficient for developing an efficient detection model.

Fake news, Twitter15, and Liar are the most popular publicly available datasets. Some studies, however, trained their models on datasets they created themselves [39]. We refer to these datasets as self-collected. Since sufficient information is not provided about the self-collected datasets, we find it difficult to compare these studies properly with others. Using a benchmark dataset, a comparative study can be established with current state-of-the-art methods for detecting fake news. Kaliyar et al. [40] conducted a comparative study of their suggested model with existing methods using the Kaggle dataset and reported an accuracy of 93.50%, which is the highest among methods utilizing the same dataset for fake news detection. A pie chart of the benchmark datasets used is given in Figure 2.

FIGURE 2. A pie chart of the benchmark datasets used in the studies of fake news detection.

IV. NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) is an area of machine learning concerned with the capability of a computer to understand, analyze, manipulate, and potentially generate human language. The NLP technique consists of data pre-processing and word embedding. By utilizing deep learning techniques, NLP has seen colossal advancements in recent years [41]. Natural language must be transformed into a mathematical structure to give machines a sense of it. NLP techniques are discussed in Sections IV-A, IV-B, and IV-C.

A. DATA PRE-PROCESSING
Data pre-processing is utilized to represent complex structures with attributes, binarize attributes, convert discrete attributes, and handle missing and unknown attributes.
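As a rough illustration of the attribute-level operations mentioned above, the following minimal Python sketch binarizes a label attribute and handles missing or unknown values. The record layout, the field names `text` and `label`, and the helper names are hypothetical; they are not taken from any of the surveyed studies or datasets.

```python
from typing import Dict, List, Optional

# Hypothetical raw records with a text attribute and a label attribute.
RAW: List[Dict[str, Optional[str]]] = [
    {"text": "Scientists confirm water is wet", "label": "real"},
    {"text": None, "label": "fake"},                  # missing text
    {"text": "Miracle cure found!", "label": "FAKE"},  # inconsistent casing
    {"text": "Local council meets", "label": None},    # unknown label
]


def binarize_label(label: Optional[str]) -> Optional[int]:
    """Map a textual label to 1 (fake) or 0 (real); return None if unknown."""
    if label is None:
        return None
    value = label.strip().lower()
    if value == "fake":
        return 1
    if value == "real":
        return 0
    return None


def preprocess(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, object]]:
    """Drop records without a usable label and fill missing text with ''."""
    cleaned = []
    for rec in records:
        y = binarize_label(rec.get("label"))
        if y is None:
            continue  # no ground truth, so the record cannot be used for training
        text = rec.get("text") or ""  # assumption: empty string stands in for missing text
        cleaned.append({"text": text, "label": y})
    return cleaned


CLEANED = preprocess(RAW)
```

Whether missing text should be filled or the record dropped is a design choice that depends on the dataset; the sketch fills it only to show both kinds of handling side by side.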


During data pre-processing, different visualization procedures are helpful. A careful pre-processing strategy is required to feed the data into a neural network for fake news detection because social media data sources are fragmented, unstructured, and noisy. It is well known that data pre-processing saves computational time and space during the learning stage. In addition, text pre-processing limits the impact of artifacts during the learning process by keeping noisy data from being ingested. After proper text pre-processing, the data becomes a logical representation that retains the most representative descriptive words. Umer et al. [42] experimented with a fake news detection model whose accuracy was only 78% when the features were used without data cleaning or pre-processing. After performing the pre-processing steps and removing unnecessary data, the accuracy increased to 93.0%. Data quality assessment, dimensionality reduction, and splitting of the dataset are the data pre-processing steps used in various studies [39], [41], [43]. The pre-processing steps are elaborated in Sections IV-A1, IV-A2, and IV-A3.

1) DATA QUALITY ASSESSMENT
Data are frequently taken from numerous sources that are ordinarily reliable and are in completely different formats. When working on a machine learning problem, much of the time is invested in managing data quality issues. It is unreasonable to expect the data to be perfect. There may be issues due to human error, defects within the data collection process, or limitations of measuring devices. The quality of a dataset is often responsible for the poor performance of fake news detection models. For this reason, the quality of the data used in any machine learning project has a huge effect on the chances of success. However, only a few studies ensure the quality of the datasets they use. S and Chitturi [41] collected the George McIntire dataset from GitHub and dropped the rows without labels during cleaning, a step that likely contributed to their success in fake news detection. To ensure the quality of the entire dataset, Wang et al. [44] removed duplicate and low-quality images. Alsaeedi and Al-Sarem [45] extended the data cleaning process with URL removal, lowercasing, and the removal of hashtag characters (#), mention characters (@), and numbers. They also handled words with recurring characters such as “Likkke” and replaced positive emoticons with the word “positive” and negative emoticons with the word “negative”.

2) TRAIN/VALIDATION/TEST SPLIT
The dataset may be divided into training, validation, and test sets. The sample of data used to fit the model parameters is called the training set. The validation set is a set of examples used to tune the hyperparameters of a model. A set of examples used only to assess the performance of a fully specified model is the test set. Although many studies on fake news detection have divided their dataset into training, validation, and test sets, a few studies have used only training and test sets [46], [47]. The split ratios 60:20:20, 70:30, and 80:20 are very common in fake news detection. The Pareto principle (for many outcomes, roughly 80% of consequences come from 20% of the causes) is often used to describe the 80:20 ratio. It is typically a safe choice to use a ratio that earlier studies have applied. Mandical et al. [48] applied ratios of 90:5:5 and 80:10:10 when the number of articles in the dataset was less than 10,000 and greater than 10,000, respectively. However, they did not specify the reasoning behind this choice. Jadhav and Thepade [49] compared their model performance across data splitting ratios and showed that a 75:25 split performed better than the other splits. With less training data, the model parameter estimates exhibit greater variance; with less testing data, the performance statistics exhibit greater variance. Studies should split data so that neither variance is too large, which depends more on the total number of instances in each category than on the percentage. The optimal split of the training, validation, and test sets is determined by the hyperparameters, model architecture, data dimension, etc. Table 4 provides an overview of the advantages and disadvantages of the splitting ratios used in most studies.

TABLE 4. The table gives an overview of common dataset partitioning based on training, validation, and testing with advantages and disadvantages. Few studies mentioned their data partitioning, and only those references are given in the table.

3) TOKENIZATION, STEMMING AND LEMMATIZATION
Tokenization is a method of breaking down a text into words. The split can be made on any character, and splitting on the space character is the most common approach. Chopping off the end of a word to obtain its base form is called stemming. Stemming usually includes the removal of derivational affixes. A derivational affix is an affix by which one word is derived from another. The derived word usually belongs to a different word class from the original.

Lemmatization is a text normalization procedure that morphologically analyzes words, generates the root form of inflected words, and is normally intended to remove inflectional endings [64]. A group of letters applied to the end of a word to modify its meaning is known as an inflectional


ending. For example, the inflectional ending s turns bat into bats.

Rusli et al. [52] performed two experiments to detect fake news, with and without stemming and stop-word removal. They used stemming and stop-word removal to remove all affixes and stop words. They achieved a macro-averaged F1-score of 0.82 with stemming and stop-word removal and 0.80 without them. Performing stemming and stop-word removal in the text pre-processing phase was time-consuming, yet it made only a small difference in the results. Although tokenization, stemming, and lemmatization improve the performance of the classifier, many researchers have not used these techniques [4], [65]. Jain and Kasbe [66] presented a simple technique with web scraping for detecting fake news. They showed that a model’s truthfulness can be checked by regularly updating the dataset through web scraping. The authors achieved an accuracy of 91% based on text. The result may be improved further with some extra pre-processing, such as stemming and omitting stop words.

B. WORD VECTORIZING
Word vectorizing involves mapping words or text to a list of vectors. TF-IDF and Bag of Words (BoW) vectorization techniques are commonly used in machine learning strategies to identify fake news [4], [53], [63]. In term frequency-inverse document frequency (TF-IDF), the value rises proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Although this vectorization is successful, the semantic sense of the words is lost in the translation to numbers [48]. The BoW technique considers every news article to be a document and computes the frequency count of each word within this document, which is then used to produce a numeric representation of the data. In addition to data loss, this approach has other limitations. The relative position of the words is overlooked, and contextual information is lost. This loss can be costly at times when measured against the benefit of computing convenience and ease of use [46]. Rusli et al. [52] used TF-IDF and Bag of Words feature extraction methods to detect fake news. However, this approach may suffer from the loss of information.

Neural network-based models have succeeded on diverse language-related tasks, as opposed to traditional machine learning-based models such as logistic regression or support vector machine (SVM), by utilizing word embeddings in fake news detection. A word embedding maps words or text to a list of vectors. These vectors are low-dimensional, distributed feature representations that are appropriate for natural languages. The term “word embedding” refers to a combination of language modeling and feature learning in which words or expressions from the lexicon are mapped to vectors of real numbers. Neural network models essentially utilize this method for fake news detection [42], [96]. In word embedding, words are represented using dense vectors. These vectors represent the mapping of words onto a continuous vector space. This is considered an improvement over the BoW model, wherein large sparse vectors of vocabulary size were used as word vectors. These large vectors also provided no information about how two words were interrelated or any other useful information [50]. Recently, fake news detection researchers have used pre-trained word-embedding models such as global vectors for word representation (GloVe) and Word2vec. The primary benefit of using these models is their ability to train with large datasets [40]. Unlike Word2vec, GloVe supports parallel implementation, making it easier to train the model on huge datasets. Table 5 gives a summary of the NLP techniques and word vector models used in deep learning-based fake news detection papers.

C. FEATURE EXTRACTION
A huge amount of computational power and memory is required to analyze a large number of variables. Classification algorithms may overfit the training samples and generalize poorly to new samples. Feature extraction is a process of building combinations of variables to overcome these difficulties while still representing the data with adequate precision. Feature extraction and feature selection are frequently used in text mining [69], [97].

Fake news detection strategies concentrate on applying news content and social context features [98]. News content features depict the meta-information relevant to a piece of news [5]. Commonly, in news validation, news content (linguistic and visual information) is used as a feature [99], [100]. Textual features comprise the writing style and emotion [101], [102]. Furthermore, hidden textual representations are generated using tensor factorization [103]–[105] and deep neural networks [106]–[108], achieving high performance in detecting false news with news contents. Visual features are retrieved from visual components such as images and videos, but only a few studies have utilized visual features in fake news detection [109], [110]. Social context information can also be aggregated for detecting fake news on social media. There are three main perspectives of social context: a) users, b) produced posts, and c) networks (connections among the users who distributed relevant posts) [5]. User-based features are typically taken from the user profile on social media [98], [111]. Users’ social responses in terms of stances [42], [64], topics [112], or credibility [113]–[115] are represented via post-based features. Recently, several studies have focused on stance features to detect fake news [64]. Stance features can help human fact-checkers to distinguish false claims [113], [114]. To check the authenticity of a claim, report, or headline, it is essential to understand what different news agencies are declaring about it [116]. Network-based features are retrieved by creating specialized networks, such as diffusion networks, interaction networks, and propagation networks [117]–[119]. The propagation network contains rich information about user interactions (likes, comments, responses, or shares) that


TABLE 5. The table provides the advantages and disadvantages of Word Vector Models, along with the references.

show the direction of information flow, timestamp details about interactions, textual information about user interactions, and user profile information about the users who are interacting [120]. Figure 3 depicts the important features that have been utilized to detect fake news.

FIGURE 3. The infographic illustrates the social content/context features such as user, post and network elaborately.

It is pivotal to choose the correct algorithm for reducing features because feature reduction has a considerable effect on text classification results. Some common feature reduction algorithms include Gini Coefficient (GI), Term Frequency-Inverse Document Frequency (TF-IDF), Information Gain (IG), Mutual Information (MI), Principal Component Analysis (PCA), and Chi-Square Statistics (CHI). In content classification, linear classification models work well with the TF-IDF model [121]. PCA and Chi-square have been utilized to improve the adaptability of text classifiers combined with deep learning models. A number of studies compared their model accuracy with and without feature extraction and found that the success rate is higher with feature extraction. Umer et al. [42] compared feature reduction methods (PCA and Chi-square) applied with two deep learning models. When the proposed model is utilized with the reduced feature set, the F1-score and accuracy increase by 20% and 4%, respectively, compared to the other techniques. However, many studies did not perform feature extraction, although it has a significant impact on the result [16], [122]. Neural networks are considered very powerful machine learning tools due to their ability to perform complex feature extraction. Instead of relying on manual feature selection and other existing techniques, researchers are currently focusing on neural networks for feature extraction [123]. Yang et al. [124] employed the TI-CNN (Text and Image information based Convolutional Neural Network) model to extract latent features from both visual and textual information and achieved promising results. Another study [107] used a deep recurrent neural network model to extract a collection of latent features for news producers, posts, and topics.

V. DEEP LEARNING APPROACH FOR FAKE NEWS DETECTION
Deep learning models have seen exceptional growth in recent times owing to their promising success in several fields, including communication and networking [125], [126], computer vision [127], [128], intelligent transportation [129], speech recognition [130], as well as NLP. Deep learning systems have advantages over traditional machine learning methods. Deep learning is a subfield of machine learning that displays high precision and accuracy in fake news detection. Generally, ML methods are based on hand-crafted features. Biased features may appear because feature extraction tasks are challenging and slow. ML approaches have failed to achieve prominent results in fake news detection because they produce high-dimensional representations of linguistic information, resulting in the curse of dimensionality. Existing neural network-based models have outperformed the traditional models owing to their exceptional feature extraction ability [62]. DL systems can acquire hidden representations from less complex inputs. The hidden features can be extracted from both the news content and its context. A study by Hiramath and Deshpande [78] showed that deep neural networks (DNNs) require less time than other ML-based classification algorithms such as logistic regression, random forest (RF), and SVM. However, DNNs use more memory. Convolutional neural network (CNN) and recurrent neural network (RNN) are two widely used models for deep learning in


in the NLP technique too. It is utilized for mapping the


features of n-gram patterns. The CNN is similar to a multi-
layer perceptron (MLP) as it is an unsupervised multilayer
feed-forward neural network [45]. The CNN consists of an
input layer, an output layer, and a sequence of hidden layers.
CNNs are mostly used for picture recognition and classi-
fication. Neural networks with 100 or more hidden layers
have been reported in recent studies. Backward-propagation
and forward-propagation algorithms are utilized in neural
networks. These algorithms are used to train neural networks
by updating the weights of each layer. The gradient (deriva-
tive) of the cost function is utilized to update the weights.
FIGURE 4. A nested pie chart illustrating the percentage of published
articles and popular models each year. When the sigmoid activation function is applied, the value of
the gradient decreases per layer. This lengthens the training
time. This problem is called the vanishing-gradient problem.
A deeper CNN or a direct connection in dense solves this
problem. Compared to a normal CNN, a deeper CNN is also
less vulnerable to overfitting [67]. Kaliyar et al. [40] proposed
a model FNDNet (deep CNN), which is designed to learn
the discriminatory features for fake news detection using
multiple hidden layers. The model is less prone to overfitting
but takes a longer time to train. The convolutional layer,
pooling layer, and regularization layer are the most utilized
layers in CNNs for fake news detection. The input data can
FIGURE 5. The diagram illustrates the general deep learning-based be manipulated through pooling and convolution operations.
architecture that was used in most studies. Sections V-A1, V-A2, and V-A3 describe the popular layers
used in CNN.
cutting-edge artificial neural networks. Therefore, we provide
Figure 4, which shows the percentage of DL-based fake news
detection papers with used classifiers in recent years.
After inspecting previous studies, we found a general
framework for deep learning-based fake news detection. The
first step was to collect a dataset or create one. Most studies
have used news articles collected from publicly available
datasets. The pre-processing technique was applied after collecting the dataset to feed the data into a neural network [42], [96], [131]. Word2vec and GloVe word embedding methods have mostly been used in previous studies to map words into vectors [41], [78], [80]. We present an overall process for fake news identification with deep learning in Figure 5, based on various studies [40], [42], [61].
148 DL-based studies were examined to provide a detailed description of these architectures: CNN in Section V-A, RNN in Section V-B, Graph Neural Network in Section V-C, Generative Adversarial Network in Section V-D, Attention Mechanism in Section V-E, Bidirectional Encoder Representations from Transformers in Section V-F, and Ensemble Approach in Section V-G.

A. CONVOLUTIONAL NEURAL NETWORK (CNN)
A few deep learning models have been introduced to handle ambiguous detection issues. CNNs and RNNs are the most interesting models [77]. Researchers are trying to boost the performance of fake news detectors with CNNs by exploiting their strong feature extraction and classification [132]. However, CNNs are also gaining popularity

FIGURE 6. The figure shows the architecture of CNN. Here, an input picture of a snowflake is given to the CNN picture classifier. The input goes through a series of convolution layers, pooling layers, and fully connected layers, and the object is classified based on learned features.

1) CONVOLUTION LAYER
CNNs work very well for image classification and computer vision because of the convolution operation, and their ability to extract features from inputs for better representation makes them very efficient. These properties also make CNNs powerful in sequence processing [131]. Fernández-Reyes and Shinde [77] proposed a CNN architecture called StackedCNN, which uses 2-dimensional rather than 1-dimensional convolution layers. Fusing pre-trained word embeddings with 2-dimensional convolutional layers was shown to help in finding patterns in text data, but the performance of StackedCNN is poor compared to the state-of-the-art CNN. Another study by Li et al. [132] adopted a novel approach that combines a multilevel CNN (MCNN) with a method for calculating the weight of sensitive words (TFW). MCNN-TFW successfully captured semantic information from the article text content. For this reason, it outperforms the compared methods, including

156158 VOLUME 9, 2021


M. F. Mridha et al.: Comprehensive Review on Fake News Detection With Deep Learning

CNN. Their work did not consider latent-based features.


Alsaeedi and Al-Sarem [45] added more convolution layers, which affected the performance of the proposed model: according to their results, the model's performance dropped by about 0.014.
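To make the convolution operation on text more concrete, the sketch below slides a single filter over a short sequence of word vectors, which is the 1-dimensional case mentioned above. It is an illustrative toy, not the implementation of any cited model: the two-dimensional "embeddings", the filter values, and the function name conv1d are assumptions chosen so that the arithmetic can be followed by hand.

```python
def conv1d(embeddings, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) over word vectors.

    embeddings: list of T word vectors, each of length D
    kernel:     list of K weight vectors, each of length D
                (one filter with window size K)
    Returns T - K + 1 feature values after a ReLU activation.
    """
    num_tokens, window = len(embeddings), len(kernel)
    features = []
    for start in range(num_tokens - window + 1):
        total = bias
        for offset in range(window):
            word = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, word))
        features.append(max(0.0, total))  # ReLU
    return features


# Hypothetical 2-dimensional embeddings for a five-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
# One filter spanning a window of three tokens.
kernel = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

features = conv1d(tokens, kernel)
print(features)  # [1.0, 1.0, 0.0]
```

A real model would learn many such filters (the studies above report settings such as 100 filters) and operate on pre-trained embeddings such as Word2vec or GloVe.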

2) POOLING LAYER
A pooling operation that chooses the greatest component
from each patch of each feature map covered by the filter is
called max pooling. A pooling layer is a new layer attached
to the convolutional layer. Its purpose is to continuously
diminish the spatial size of the representation in order to
decrease the number of parameters and the calculation inside
the network. The pooling layer operates autonomously on
each feature map. Max pooling or average pooling is the most
commonly used function in fake news detection. Alsaeedi
and Al-Sarem [45] adjusted the hyperparameter settings in
a CNN. They found the best parameter settings that gave an
improvement in the model's performance. The recommended CNN model performs best when the number of units in the dense layer is set to 100, the number of filters is set to 100, and the window size is set to 5. The GlobalMaxPooling1D method achieved the highest scores, showing that it works well for fake news detection when compared to other pooling methods [45].

3) REGULARIZATION LAYER
The most crucial problem of classification is to reduce the training and test errors of the classifier. Another common issue is overfitting, that is, a large gap between training and test errors. Overfitting makes it difficult to generalize the model because it becomes too closely fitted to the training set. Regularization is a solution to the overfitting problem. It is applied to the model to lessen overfitting and decrease the generalization error, but not the training error [45]. The dropout regularization method is mostly used for fake news detection [133]. Other methods such as early stopping and weight penalties were not used in previous studies on fake news detection. Dropout avoids overfitting by randomly dropping out neurons during training, so that the weights are effectively averaged and no single neuron receives too high a weight.

B. RECURRENT NEURAL NETWORK (RNN)
The RNN is a type of neural network in which nodes are sequentially connected to construct a directed graph. The output from the earlier step serves as the input to the current step. RNNs are effective in time and sequence-based predictions. RNN is less compatible with features compared to CNN. RNNs are suitable for studying sequential texts and expressions. However, an RNN cannot process very long sequences when tanh or ReLU is used as an activation function.
The backward-propagation algorithm is utilized in the RNN for training. While training the neural network, it is required to take small steps repeatedly in the direction of the negative error derivative with respect to the network weights in order to reach a minimum of the error function. The size of the gradients becomes tiny for each subsequent layer. Thus, the RNN suffers from a vanishing gradient issue in the bottom layers of the network. The vanishing gradient problem can be dealt with in three ways: (1) using the rectified linear unit (ReLU) activation function, (2) using the RMSProp optimization algorithm, and (3) using a different network architecture such as long short-term memory networks (LSTM) or gated recurrent units (GRU). Previous studies therefore focused on LSTM and GRU rather than the standard RNN [80], [96], [134]. Bugueño et al. [80] proposed a model based on RNN for propagation tree classification. The authors used RNN for sequence analysis. The number of epochs was set to 200, which is relatively high in comparison to the number of training examples. To predict fake news articles, authors have proposed distinctive RNN models, specifically LSTM, GRU, tanh-RNN, unidirectional LSTM-RNN, and vanilla RNN. RNNs, and LSTM in particular, are especially successful in processing sequential data (human language) and in capturing significant features from diverse data sources. LSTM and GRU are discussed in Sections V-B1 and V-B2.

FIGURE 7. The figure shows an architecture of basic RNN with n sequential layers. x represents the inputs and y represents the output generated by the RNN.

1) LONG SHORT-TERM MEMORY (LSTM)
LSTM models are front runners in NLP problems. LSTM is an artificial recurrent neural network framework used in deep learning and an advanced variant of the RNN [41]. RNNs are not capable of learning long-term dependencies because back-propagation in recurrent networks takes a very long time, particularly because of the backflow of the error signal. However, LSTM can keep "short-term memories" for "long periods." The LSTM is made up of three gates (an input gate, an output gate, and a forget gate) and a cell. Through a combination of the three gates, it calculates the hidden state. The cell can recall values over a large time interval. The word's connection within the


beginning of the content can therefore impact the output of a word later in the sentence [67]. LSTM is an exceptionally viable solution for addressing the vanishing gradient issue. Bahad et al. [61] proposed an RNN model that suffers from the vanishing gradient issue. To tackle this issue, they implemented an LSTM-RNN. Still, LSTM could not solve the vanishing gradient issue completely. The LSTM-RNN model had a higher precision compared to the initial state-of-the-art CNN. Asghar et al. [135] proposed bidirectional LSTM (Bi-LSTM) with CNN for rumor detection. The model preserves the sequence information in both directions. The Bi-LSTM layer is effective in remembering long-term dependencies. Even though the BiLSTM-CNN beat the other models, the suggested approach is computationally expensive.
A study by Ruchansky et al. [123] suggested a model called CSI, which comprises three modules: Capture, Score, and Integrate. The capture module extracts features from the article, and the score module extracts features from the user. By integrating article and user-based features, the CSI model then performs the prediction for fake news detection. The CSI model has fewer parameters than other RNN-based models. Another study by Sahoo and Gupta [136] proposed an approach that uses both user profile and news content features for detecting false news on Facebook. The authors used LSTM to identify fake news, and a set of new features was extracted by Facebook crawling and the Facebook API. The suggested model requires more time to train and test. Liao et al. [137] proposed a novel model called fake news detection multi-task learning (FDML). The model explores the influence of topic labels for fake news while also using contextual news information to improve detection performance on short false news. The FDML model, in particular, is made up of representation learning and multi-task learning components that train the false news detection task and the news topic categorization task at the same time. However, the performance of the model decreases without the author's information.

2) GATED RECURRENT UNIT (GRU)
In terms of structure and capabilities, GRU is comparatively simpler and more efficient than LSTM, because it has only two gates: reset and update. The GRU manages the information flow in the same manner as the LSTM unit does, but without the use of a memory unit. It exposes its entire hidden content without any control. When it comes to learning long-term dependencies, GRU is reported to perform better than LSTM. Hence, it is a promising candidate for NLP applications [41]. GRU is still in its early stages, and it has only lately been used to identify false news. GRU is a newer algorithm with a performance comparable to that of LSTM but greater computational efficiency. Li et al. [134] used a deep bidirectional GRU neural network (two-layer bidirectional GRU) as a rumor detection model. The model suffers from slow convergence. S and Chitturi [41] showed that it is difficult to determine whether one of the gated RNNs (LSTM, GRU) is more successful, and they are usually chosen based on the available computing resources. Girgis et al. [96] experimented with CNN, LSTM, vanilla RNN, and GRU. The vanilla RNN suffers from the vanishing gradient problem, but GRU solves this issue. Although GRU gave the best outcome in their study, it takes more training time. A bidirectional GRU was utilized by Singhania et al. [87] for word-by-word annotation. Using preceding and subsequent words, it captures the word's meaning within the sentence. A study by Shu et al. [100] proposed a sentence-comment co-attention subnetwork model named dEFEND (Explainable fake news detection) utilizing news content and user comments for fake news detection. The authors considered textual information with bidirectional GRU (Bi-GRU) to achieve better performance. However, the model has a low learning efficiency.

C. GRAPH NEURAL NETWORK (GNN)
A Graph Neural Network is a form of neural network that operates on the graph structure directly. Node classification is a common application of GNN. Essentially, every node in the network has a label, and the network predicts the labels of the nodes without using the ground truth. The network extends recursive neural networks by processing a broader class of graphs, including cyclic, directed, and undirected graphs, and it can handle node-focused applications without requiring any pre-processing steps [138]. GNN captures global structural features from graphs or trees better than the deep-learning models discussed above [139]. GNNs are prone to noise in the datasets. Adding a small amount of noise to the graph via node perturbation or edge deletion and addition has an adverse effect on the GNN output. The graph convolutional network (GCN) is considered one of the basic graph neural network variants.
A study by Huang et al. [140] claimed to be the first that experimented using a rich structure of user behavior for rumor detection. The user encoder uses graph convolutional networks (GCN) to learn a representation of the user from a graph created from user behavioral information. The authors used two recursive neural networks based on tree structure: a bottom-up RvNN encoder and a top-down RvNN encoder. The tree structure is shown in Figure 8. The proposed model performed worse for the non-rumor class because user behavior information brings some interference in non-rumor detection. Another study by Bian et al. [139] proposed top-down GCN and bottom-up GCN using a novel method, DropEdge [141], for reducing over-fitting of GCNs. In addition, a root feature enhancement operation is utilized to improve the performance of rumor detection. Although it performed well on three datasets (Weibo, Twitter15, Twitter16), the outliers in the dataset affected the model's performance.
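As a rough illustration of how a graph convolutional layer propagates information between neighbouring nodes, the sketch below implements one commonly used propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), on a tiny propagation graph. It is an assumption-laden toy rather than the method of any study cited here: the three-node graph, the single input feature, and the weight matrix are all hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step on a dense adjacency matrix.

    adj:    n x n adjacency matrix (list of lists, 0/1 entries)
    feats:  n x d_in node feature matrix
    weight: d_in x d_out weight matrix
    """
    n = len(adj)
    # Add self-loops so that every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation by node degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(feats[0]), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Hypothetical propagation chain: source post - reply - reply.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]   # only the source post carries a signal
weights = [[1.0]]

hidden = gcn_layer(adjacency, features, weights)
print(hidden)
```

After one layer the signal of the source post has reached its direct neighbour but not the node two hops away; stacking layers widens this neighbourhood. The dense n x n matrices used here also illustrate why storing the full adjacency matrix becomes costly on large graphs.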


On the other hand, GCNs incur a significant memory footprint in storing the complete adjacency matrix. Furthermore, GCNs are transductive, which implies that inferred nodes must be present at training time, and they do not guarantee generalizable representations [142]. Wu et al. [143] proposed a representation learning algorithm with a gated graph neural network named PGNN (propagation graph neural network). The suggested technique can incorporate structural and textual features into high-level representations by propagating information among neighbor nodes throughout the propagation network. In order to obtain considerable performance improvements, they also added an attention mechanism. The propagation graph is built using the who-replies-to-whom structure, but the follower-followee and forward relationships are omitted. Zhang et al. [144] presented a simplified aggregation graph neural network (SAGNN) based on efficient aggregation layers. Experiments on publicly accessible Twitter datasets show that the proposed network outperforms state-of-the-art graph convolutional networks while considerably lowering computational costs.

D. GENERATIVE ADVERSARIAL NETWORK (GAN)
Generative Adversarial Networks (GANs) are deep learning-based generative models. The GAN model architecture consists of two sub-models: a generator model for creating new instances and a discriminator model for determining whether the produced examples are genuine or fake, that is, generated by the generator model. Existing adversarial networks are often employed to create images that may be matched to observed samples using a minimax game framework [44]. The generator model produces new images, from the features learned from the training data, that resemble the original image. The discriminator model predicts whether the generated image is fake or real. GANs are extremely successful in generative modeling and are used to train discriminators in a semi-supervised context to help eliminate human participation in data labeling. Furthermore, GANs are useful when the data have imbalanced classes or underrepresented samples. GANs produce synthetic data only if the data are based on continuous numbers. GANs are therefore not directly applicable to NLP data, because NLP data are based on discrete values such as words, letters, or bytes [145]. To train GANs for text data, novel techniques are required.
A study by Hiriyannaiah et al. [145] proposed sequence GAN (SeqGAN), which is a GAN architecture that overcomes the problem of gradient descent in GANs for discrete outputs by employing a reinforcement learning (RL) based approach and Monte Carlo search. The authors provide actual news content to the GAN. Then a classifier based on Google's BERT model was trained to distinguish the real samples from the samples generated by the GAN. The architecture of SeqGAN is provided in Figure 9.
The principle of adversarial learning originated in generative adversarial networks. The adversarial learning concept has produced outstanding results in a wide range of topics, including information retrieval [146], text classification [147], and network embedding [148]. A particular problem in detecting fake news is the recognition of false news on recently emerged events on social media. To solve this problem, Wang et al. [44] suggested an end-to-end architecture called event adversarial neural network (EANN). This architecture is used to extract event-invariant characteristics and, therefore, aids in the identification of false news on newly arrived events. It is made up of three major components: a multimodal feature extractor, a fake news detector, and an event discriminator. Another study by Le et al. [149] introduced Malcom, which generates malicious comments that fooled five popular fake news detectors (CSI, dEFEND, etc.) into classifying fake news as real news, with 94% and 90% attack success rates. The authors showed that existing methods are not resilient against potential attacks. Though the model performed well, it was not evaluated against defense mechanisms, namely adversarial learning.

E. ATTENTION MECHANISM BASED
The attention-related approach is another notable advancement. In deep neural networks, the attention mechanism is an effort to implement the behavior of selectively focusing on a few important items while ignoring others. Attention is a bridge that connects the encoder and decoder and provides the decoder with information from every hidden state of the encoder. Using this framework, the model selectively concentrates on the valuable components of the input and is thus able to discover the associations among them. This allows the model to deal with lengthy input sentences more effectively. Unlike RNNs or CNNs, attention mechanisms maintain word dependencies in a sentence regardless of the distance between the words. The primary downside of the attention mechanism is that it adds additional weight parameters to the model, which might lengthen the training time, especially if the model's input data are long sequences.
A study by Long [150] proposed attention-based LSTM with speaker profile features, and their experimental findings suggest that employing speaker profiles can help enhance fake news identification. Recently, attention techniques have been used to efficiently extract information related to a mini query (article headline) from a long text (news content) [47], [87]. A study by Singhania et al. [87] used an automated detector based on a three-level hierarchical attention network (3HAN). Three levels exist in 3HAN: one for words, one for sentences, and one for the headline. Because of its three levels of attention, 3HAN assigns different weights to different sections of an article. In contrast to other deep learning models, 3HAN yields understandable results. While 3HAN only uses textual information, a study by Jin et al. [47] used image features, social context, and text features, as well as attention on RNN (att-RNN). Another study used RNNs with a soft-attention mechanism to filter out unique linguistic features [151]. However, this method is based on distinct domain and community features without any external evidence. Thus, it provides a restricted context for credibility analysis.
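The core computation behind the soft-attention mechanisms discussed above can be sketched in a few lines. The example below is a simplified, parameter-free variant (scaled dot-product scoring followed by a softmax); the learned projection weights that real attention layers add, and that account for the extra parameters mentioned above, are deliberately omitted. The query, the hidden states, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Soft attention of one query vector over a list of hidden states.

    Returns the attention weights (one per state) and the context
    vector, i.e. the weighted sum of the states.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, state)) / scale
              for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(dim)]
    return weights, context


# Hypothetical encoder states for three words and a headline-like query.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [2.0, 0.0]

weights, context = attend(query, states)
print(weights)
print(context)
```

The first and third states, which align with the query, receive equal and larger weights than the second, regardless of how far apart they are in the sequence. This distance-independence is the property contrasted with RNNs and CNNs in the text.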


FIGURE 8. This figure illustrates the propagation tree structure encoder taken from Huang et al. [140].

FIGURE 9. A basic SeqGAN architecture. The figure is taken from Hiriyannaiah et al. [145].

To overcome the shortcomings of previous works, Aloshban [152] proposed automatic fake news classification through self-attention (ACT). Their principle is inspired by the fact that claim texts are fairly short and hence cannot be used for classification efficiently. Their suggested framework makes use of mutual interactions between a claim and many supporting responses. The LSTM neural network was applied to the article input. The outcome of the final step of the LSTM may not completely reflect the semantics of the article, and connecting all vector representations of the words in the text would lead to a massive vector dimension, so the internal connections between the article's words may be overlooked. As a result, employing the self-attention function on the LSTM model extracts key parts of the article through several feature vectors. Their strategy is heavily reliant on self-attention and an article representation matrix. Graph-aware co-attention networks (GCAN) is an innovative approach for detecting fake news [153]. The authors predict whether a source tweet article is false based just on its brief text content and user retweet sequence, as well as user profiles. Given the chronology of its retweeters, GCAN can determine whether a short-text tweet is fraudulent. However, this model is not suitable for long text, as it is difficult to find the relationship between a long tweet and retweet propagation.

F. BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS (BERT)
BERT is a deep learning model that has shown cutting-edge results across a wide variety of natural language processing applications. BERT incorporates pre-training language representations developed by Google. BERT is a sophisticated pre-trained word-embedding model built on a Transformer encoder architecture [89]. The BERT method is distinctive in its capacity to identify and capture contextual meaning in a sentence or text [90]. The main restriction of conventional language models is that they are unidirectional, which restricts the architectures that can be utilized during pre-training. The BERT model eliminates the unidirectional limitation by using a masked language model (MLM). BERT employs the next sentence prediction (NSP) task in addition to the masked language model to jointly pre-train text-pair representations. BERT consists of two stages: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data using a variety of pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and then all of the parameters are fine-tuned using labeled data from the downstream tasks. The architecture of the BERT model is shown in Figure 10.
The data utilized in the BERT model are generic data gathered from Wikipedia and the BooksCorpus. While these data contain a wide range of information, specific information on individual domains is still lacking. To overcome this problem, a study by Jwa et al. [75] incorporated news data in the pre-training phase to boost fake news identification skills. When compared to the state-of-the-art model stackLSTM,


the proposed model named exBAKE (BERT with extra unlabeled news corpora) outperformed it by 0.137 in F1-score. Ding et al. [154] discovered that including mental features such as a speaker's credit history at the language level might considerably improve BERT model performance. The history feature further helps to construct the relationship between the event and the person in reality. However, these studies did not consider any pre-processing methods.

FIGURE 10. The BERT architecture taken from Devlin et al. [89].

Zhang et al. [91] presented a BERT-based domain-adaption neural network for multimodal false news detection (BDANN). BDANN is made up of three major components: a multimodal feature extractor, a domain classifier, and a false news detector. In the multimodal feature extractor, the pre-trained BERT model was used to extract text features, whereas the pre-trained VGG-19 model was used to extract image features. The extracted features are then concatenated and sent to the detector to differentiate between fake and real news. However, the existence of noisy images in the Weibo dataset affected the BDANN results. Kaliyar et al. [92] proposed a BERT-based deep convolutional approach (fakeBERT) for fake news detection. FakeBERT combines BERT with different parallel blocks of a one-dimensional deep convolutional neural network (1d-CNN) with different kernel sizes and filters. Different filters can extract useful information from the training dataset. The combination of BERT with 1d-CNN can deal with both large-scale structured and unstructured text. Therefore, the combination is beneficial in dealing with ambiguity.

G. ENSEMBLE APPROACH
Ensemble approaches are strategies that generate several models and combine them to achieve better results. Ensemble models typically yield more precise solutions than a single model does. An ensemble reduces the spread or dispersion of the predictions and of the model performance. Ensembling can be applied to supervised and unsupervised learning activities [86]. Many researchers have used an ensemble approach to boost their performance [42], [133]. Agarwal and Dixit [63] combined two datasets, namely Liar and Kaggle, to evaluate the performance of LSTM and achieved an accuracy of 97%. They also used various models such as CNN, LSTM, SVM, naive Bayes (NB), and k-nearest neighbour (KNN) for building an ensemble model. The authors reported an average accuracy score of the algorithms they used but did not report the accuracy of their ensemble model, which is a limitation of their work.
The CNN-LSTM ensemble approach has often been used in previous DL-based studies. Kaliyar [67] used an ensemble of CNN and LSTM, and the accuracy was slightly lower than that of the state-of-the-art CNN model. However, the precision and recall were effectively improved. Asghar et al. [135] obtained an increase in the efficiency of their model by using Bi-LSTM. The Bi-LSTM retains knowledge from both preceding and upcoming contexts before passing its output to the CNN model. Even though CNN and RNN typically require huge datasets to function successfully, Ajao et al. [133] trained LSTM-CNN with a smaller dataset. The above-mentioned works considered just text-based features for fake news classification, whereas the addition of new features may generate a more significant result. While most studies used CNN with LSTM, a study by Amine et al. [131] merged two convolutional neural networks to integrate metadata with text. They illustrate that integrating metadata with text results in substantial improvements in fine-grained fake news detection. Furthermore, when tested on real-world datasets, this approach shows improvements compared to the text-only deep learning model. Going further, Kumar et al. [86] employed an attention layer. It assists the CNN+LSTM model in learning to pay attention to particular regions of the input sequences rather than to the full input sequences. Utilizing the attention mechanism with CNN+LSTM was reported to be more efficient by a small margin. Result analysis of DL-based studies is presented in Table 7.
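Two simple ways of combining the outputs of several classifiers, in the spirit of the ensemble approaches above, are hard (majority) voting on predicted labels and soft voting on predicted probabilities. The sketch below illustrates both; it is not the combination scheme of any particular cited study, and the three base models, their predictions, and the "fake"/"real" labels are made-up values for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: predictions is a list of per-model label lists."""
    num_samples = len(predictions[0])
    combined = []
    for i in range(num_samples):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def soft_vote(probabilities, threshold=0.5):
    """Soft voting: average each sample's P(fake) over all models."""
    num_models = len(probabilities)
    num_samples = len(probabilities[0])
    combined = []
    for i in range(num_samples):
        mean = sum(model[i] for model in probabilities) / num_models
        combined.append("fake" if mean >= threshold else "real")
    return combined


# Hypothetical outputs of three base models on four articles.
cnn_labels = ["fake", "real", "fake", "real"]
lstm_labels = ["fake", "fake", "real", "real"]
svm_labels = ["real", "fake", "fake", "real"]
hard = majority_vote([cnn_labels, lstm_labels, svm_labels])

cnn_probs = [0.9, 0.4, 0.6, 0.1]
lstm_probs = [0.8, 0.7, 0.3, 0.2]
svm_probs = [0.4, 0.6, 0.7, 0.3]
soft = soft_vote([cnn_probs, lstm_probs, svm_probs])

print(hard)  # ['fake', 'fake', 'fake', 'real']
print(soft)  # ['fake', 'fake', 'fake', 'real']
```

With an odd number of models and two classes, hard voting cannot tie. Each base model above misclassifies a different article relative to the combined output, which is the situation in which ensembling tends to help.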


TABLE 6. The table contains the strength and limitation of popular existing studies with reference and used classifier.
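As a companion to the evaluation metrics defined in Section VI (Equations (1) to (5)), the following minimal sketch computes accuracy, precision, recall, F1-score, and the false positive rate from the counts of a binary confusion matrix. The label names "fake" and "real", the toy predictions, and the function names are assumptions made for illustration; they are not taken from the reviewed studies.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-score and FPR from the counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical ground truth and predictions for eight articles.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "fake", "real", "real", "fake"]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(*counts)
print(counts)
print(scores)
```

Plotting recall against the false positive rate while sweeping the decision threshold of a probabilistic classifier yields the ROC curve, and the area under that curve is the AUC.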

VI. EVALUATION METRICS B. PRECISION


A key step in a predictive modeling pipeline is to evaluate the Precision (P) is defined as the number of actual positive find-
output of a machine-learning model. Although a model may ings divided by the total number of positive results, including
have a higher classification result once constructed, it must be incorrectly recognized ones. The precision can be computed
determined whether it can address the specific problem in dif- using Equation (2).
ferent circumstances. Classification accuracy alone is usually TruePositive
insufficient to make this judgment. Other assessment met- P= (2)
rics are necessary for proper evaluation. Since a promising Positive + FalsePositive
method is required to pass the assessment metric’s evaluation, C. RECALL
it is easy to create a model, but it is more challenging to create When the total number of samples that should have been
a promising strategy. Diverse evaluation metrics are used to identified as positive is used to divide, the number of true
evaluate the model’s efficiency. The evaluation matrix is an positive results is referred to as recall (R). The recall can be
essential device for arranging and organizing an evaluation. computed using Equation (3).
The confusion matrix shows an overview of model perfor-
mance on the testing dataset from the known true values. TruePositive
R= (3)
It provides a review of the model’s success and useful results TruePositive + FalseNegative
of true positive, true negative, false positive, and false nega-
D. F1-SCORE
tive. To test their models, researchers considered distinctive
sorts of metrics such as accuracy (A), precision (P), and recall The model’s accuracy for each class is defined by the
(R) [40], [54], [58]. The selection of metrics relies entirely on F1-score (F1). If the dataset is not balanced, the F1-score
the model form and its implementation strategy. We provide metric is typically used. The F1-score is often used as an
some evaluation metrics that were widely used in previous assessment matrix in fake news detection [41], [157], [158].
studies: F1-score computation can be performed using Equation (4).
precision × recall
F1 = 2 × (4)
A. ACCURACY precision + recall
The accuracy score, also known as the classification accuracy
rating, is determined as the percentage of accurate predictions E. ROC CURVE AND AUC
in proportion to the total predictions made by the model.
The Receiver Operating Characteristics (ROC) curve shows
The accuracy (A) can be depicted by the given formula in
the success of a classification model across several classifica-
Equation (1).
tion thresholds. True Positive Rate (Recall) and False Positive
TruePositive + TrueNegative Rate (FPR) are used in this curve. AUC is an abbreviation for
A= (1)
TotalNumberofPredictions ‘‘Area Under the ROC curve.’’ In other words, AUC tests the

156164 VOLUME 9, 2021


M. F. Mridha et al.: Comprehensive Review on Fake News Detection With Deep Learning

TABLE 7. The table contains the result in accuracy of DL-based studies along with used method and NLP techniques.

entire two-dimensional area under the ROC curve. The FPR is defined in Equation (5).

$$\text{FPR} = \frac{\text{FalsePositive}}{\text{FalsePositive} + \text{TrueNegative}} \qquad (5)$$

VII. CHALLENGES AND RESEARCH DIRECTION
Despite the numerous studies conducted on the identification of fake news, there is always room for further advancement and investigation. For the task of recognizing fake news, we highlight challenges and several research areas for future studies. Although DL-based methods provide higher accuracy than other methods, there is still scope to make them more acceptable.
• The selection of features and classifiers greatly influences the efficiency of the model. Previous studies did not place a high priority on this selection. Researchers should focus on determining which classifier is most suitable for particular features. Long textual features require the use of sequence models (RNNs), but few research works have taken this into account. We believe that studies concentrating on the selection of features and classifiers could improve performance.
• Feature engineering is not common in deep learning-based studies. News content and headline features are the most widely used features in fake news detection, but several other features, such as user behavior [154], user profile, and social network behavior, need to be explored. Political or religious bias in profile features, as well as lexical, syntactic, and statistical features, can increase the detection rate. A fusion of deeply hidden text features with other statistical features may result in a better outcome.
• Propagation-based studies are scarce in this domain [117]. Network-based patterns of news propagation are a source of information that has not been comprehensively utilized for fake news detection [159]. Thus, we suggest considering news propagation for fake news identification. Meta-data and additional information can increase the robustness and reduce the noise of a single textual claim, but they must be handled with caution.
• Existing studies have largely focused on text data for fake news detection, whereas fake news is generated in sophisticated ways, with text or images that have been purposefully altered [95]. Only a few studies have used image features [109], [110]. Thus, we recommend the use of visual data (videos and images). Examining video and image features is a research direction for building a stronger and more robust system.
• Studies that use a fusion of features are scarce in this domain [160]. Combining information from multiple sources may be extremely beneficial in detecting whether Internet articles are fake [95]. We suggest utilizing multi-model-based approaches with later pre-trained word embeddings. Many other hidden features may have a great impact on fake news detection. Hence, we encourage researchers to investigate hidden features.
• Fake news detection models that learn from newly emerging web articles in real time could enhance detection results. Another promising direction for future work is the use of a transfer-learning approach for training a neural network with online data streams.
• More data, covering a larger number of fake news items, should be released, since the lack of data is the major problem in fake news classification. We assume that more training data will improve model performance.


Datasets focused on news content are publicly available. In contrast, datasets based on other textual features are limited. Thus, research utilizing additional textual features is scarce.
• Using an ensemble method instead of a single classifier produces better results [49]. Constructing an ensemble model with DL and ML algorithms, in which an LSTM identifies the original article while auxiliary features pass through a second model, can yield better results [41]. A simpler GRU model has been reported to perform better than an LSTM [80]. Therefore, we recommend combining GRU and CNN models to obtain the best results.
• Many researchers have achieved high accuracy by using CNN, LSTM, and ensemble models [42], [64]. SeqGAN and Deep Belief Networks (DBNs) have not been explored in this domain. We encourage researchers to experiment with these models.
• Transformers have replaced RNN models such as LSTM as the model of choice for NLP tasks. BERT has been used in the identification of fake news, but the Generative Pre-trained Transformer (GPT) has not been used in this domain. We suggest fine-tuning GPT for fake news detection tasks.
• Existing algorithms make critical decisions without providing precise information about the reasoning that leads to specific decisions, predictions, recommendations, or actions [161]. Explainable Artificial Intelligence (XAI) is a field of study that tries to make the outcomes of AI systems more understandable to humans [162]. XAI can be a valuable approach for making progress in this area.

VIII. CONCLUSION
Fake news is escalating as social media grows, and researchers are working to find solutions that keep society safe from it. This survey covers the overall analysis of fake news classification by discussing major studies. A thorough understanding of recent approaches in fake news detection is essential because advanced frameworks are the front-runners in this domain. Thus, we analyzed fake news identification methods based on NLP and advanced DL strategies. We presented a taxonomy of fake news detection approaches. We explored different NLP techniques and DL architectures and discussed their strengths and shortcomings. We also reviewed diverse evaluation metrics and gave a short description of the experimental findings of previous studies. Finally, we briefly outlined possible directions for future research in this field. Fake news identification will remain an active research field for some time with the emergence of novel deep learning network architectures. Deep learning-based models are less prone to inaccurate results than other approaches. We strongly believe that this review will assist researchers in fake news detection to gain a better, concise perspective of existing problems, solutions, and future directions.

ACKNOWLEDGMENT
The authors would like to thank the Advanced Machine Learning (AML) Lab for resource sharing and valuable opinions.

REFERENCES
[1] H. Allcott and M. Gentzkow, "Social media and fake news in the 2016 election," J. Econ. Perspect., vol. 31, no. 2, pp. 211–236, 2017.
[2] T. Rasool, W. H. Butt, A. Shaukat, and M. U. Akram, "Multi-label fake news detection using multi-layered supervised learning," in Proc. 11th Int. Conf. Comput. Autom. Eng., 2019, pp. 73–77.
[3] X. Zhang and A. A. Ghorbani, "An overview of online fake news: Characterization, detection, and discussion," Inf. Process. Manage., vol. 57, no. 2, Mar. 2020, Art. no. 102025. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0306457318306794
[4] Abdullah-All-Tanvir, E. M. Mahir, S. Akhter, and M. R. Huq, "Detecting fake news using machine learning and deep learning algorithms," in Proc. 7th Int. Conf. Smart Comput. Commun. (ICSCC), Jun. 2019, pp. 1–5.
[5] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake news detection on social media: A data mining perspective," ACM SIGKDD Explorations Newslett., vol. 19, no. 1, pp. 22–36, 2017.
[6] R. Oshikawa, J. Qian, and W. Y. Wang, "A survey on natural language processing for fake news detection," 2018, arXiv:1811.00770.
[7] S. B. Parikh and P. K. Atrey, "Media-rich fake news detection: A survey," in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Apr. 2018, pp. 436–441.
[8] A. Habib, M. Z. Asghar, A. Khan, A. Habib, and A. Khan, "False information detection in online content and its role in decision making: A systematic literature review," Social Netw. Anal. Mining, vol. 9, no. 1, pp. 1–20, Dec. 2019.
[9] M. K. Elhadad, K. F. Li, and F. Gebali, "Fake news detection on social media: A systematic survey," in Proc. IEEE Pacific Rim Conf. Commun., Comput. Signal Process. (PACRIM), Aug. 2019, pp. 1–8.
[10] A. Bondielli and F. Marcelloni, "A survey on fake news and rumour detection techniques," Inf. Sci., vol. 497, pp. 38–55, Sep. 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020025519304372
[11] P. Meel and D. K. Vishwakarma, "Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities," Expert Syst. Appl., vol. 153, Sep. 2020, Art. no. 112986.
[12] K. Sharma, F. Qian, H. Jiang, N. Ruchansky, M. Zhang, and Y. Liu, "Combating fake news: A survey on identification and mitigation techniques," ACM Trans. Intell. Syst. Technol., vol. 10, no. 3, pp. 1–42, May 2019.
[13] X. Zhou and R. Zafarani, "A survey of fake news: Fundamental theories, detection methods, and opportunities," ACM Comput. Surv., vol. 53, no. 5, pp. 1–40, 2020.
[14] B. Collins, D. T. Hoang, N. T. Nguyen, and D. Hwang, "Trends in combating fake news on social media – A survey," J. Inf. Telecommun., vol. 5, no. 2, pp. 247–266, 2021.
[15] A. Zubiaga, A. Aker, K. Bontcheva, M. Liakata, and R. Procter, "Detection and resolution of rumours in social media: A survey," ACM Comput. Surveys, vol. 51, no. 2, pp. 1–36, Jun. 2018.
[16] M. D. Ibrishimova and K. F. Li, "A machine learning approach to fake news detection using knowledge verification and natural language processing," in Proc. Int. Conf. Intell. Netw. Collaborative Syst. Cham, Switzerland: Springer, 2019, pp. 223–234.
[17] H. Ahmed, I. Traore, and S. Saad, "Detecting opinion spams and fake news using text classification," Secur. Privacy, vol. 1, no. 1, p. e9, Jan. 2018.
[18] H. Ahmed, I. Traore, and S. Saad, "Detection of online fake news using N-gram analysis and machine learning techniques," in Proc. Int. Conf. Intell., Secure, Dependable Syst. Distrib. Cloud Environ. Switzerland: Springer, 2017, pp. 127–138.
[19] B. Bhutani, N. Rastogi, P. Sehgal, and A. Purwar, "Fake news detection using sentiment analysis," in Proc. 12th Int. Conf. Contemp. Comput. (IC), Aug. 2019, pp. 1–5.
[20] C. Castillo, M. Mendoza, and B. Poblete, "Information credibility on Twitter," in Proc. 20th Int. Conf. World Wide Web, Mar. 2011, pp. 675–684, doi: 10.1145/1963405.1963500.


[21] O. Ajao, D. Bhowmik, and S. Zargari, "Sentiment aware fake news detection on online social networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 2507–2511.
[22] B. Ghanem, P. Rosso, and F. Rangel, "An emotional analysis of false information in social media and news articles," ACM Trans. Internet Technol., vol. 20, no. 2, pp. 1–18, May 2020.
[23] A. Giachanou, P. Rosso, and F. Crestani, "Leveraging emotional signals for credibility detection," in Proc. 42nd Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2019, pp. 877–880.
[24] D. Khattar, J. S. Goud, M. Gupta, and V. Varma, "MVAE: Multimodal variational autoencoder for fake news detection," in Proc. World Wide Web Conf., May 2019, pp. 2915–2921.
[25] N. J. Conroy, V. L. Rubin, and Y. Chen, "Automatic deception detection: Methods for finding fake news," in Proc. 78th ASIST Annu. Meeting, Inf. Sci. Impact, Res. Community, vol. 52, no. 1, pp. 1–4, 2015.
[26] A. R. Pathak, A. Mahajan, K. Singh, A. Patil, and A. Nair, "Analysis of techniques for rumor detection in social media," Proc. Comput. Sci., vol. 167, pp. 2286–2296, Jan. 2020.
[27] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, "Detecting rumors from microblogs with recurrent neural networks," in Proc. 25th Int. Joint Conf. Artif. Intell. (IJCAI). Res. Collection School Comput. Inf. Syst., 2016, pp. 3818–3824.
[28] J. Ma, W. Gao, and K.-F. Wong, "Detect rumors in microblog posts using propagation structure via kernel learning," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics (ACL). Vancouver, BC, Canada: Res. Collection School Comput. Inf. Syst., Jul./Aug. 2017, pp. 708–717.
[29] W. Y. Wang, "'Liar, liar pants on fire': A new benchmark dataset for fake news detection," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, Vancouver, BC, Canada, Jul. 2017, pp. 422–426. [Online]. Available: https://www.aclweb.org/anthology/P17-2067
[30] A. Zubiaga, M. Liakata, and R. Procter, "Learning reporting dynamics during breaking news for rumour detection in social media," 2016, arXiv:1610.07363.
[31] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media," Big Data, vol. 8, no. 3, pp. 171–188, Jun. 2020.
[32] M. Amjad, G. Sidorov, A. Zhila, H. Gómez-Adorno, I. Voronkov, and A. Gelbukh, "'Bend the truth': Benchmark dataset for fake news detection in Urdu language and its evaluation," J. Intell. Fuzzy Syst., vol. 39, no. 2, pp. 2457–2469, 2020.
[33] E. Tacchini, G. Ballarin, M. L. Della Vedova, S. Moret, and L. de Alfaro, "Some like it hoax: Automated fake news detection in social networks," 2017, arXiv:1704.07506.
[34] C. Boididou, S. Papadopoulos, and M. Zampoglou, "Detection and visualization of misleading content," Int. J. Multimedia Inf. Retr., vol. 7, no. 1, pp. 71–86, 2018.
[35] J. Golbeck, M. Mauriello, B. Auxier, K. H. Bhanushali, C. Bonk, M. A. Bouzaghrane, C. Buntain, R. Chanduka, P. Cheakalos, J. B. Everett, and W. Falak, "Fake news vs satire: A dataset and analysis," in Proc. 10th ACM Conf. Web Sci., 2018, pp. 17–21.
[36] P. M. Waszak, W. Kasprzycka-Waszak, and A. Kubanek, "The spread of medical fake news in social media – The pilot quantitative study," Health Policy Technol., vol. 7, no. 2, pp. 115–118, Jun. 2018.
[37] (2020). The Year of Fake News Covid Related Scams and Ransomware. Accessed: Mar. 12, 2021. [Online]. Available: https://www.prnewswire.com/news-releases/2020-the-year-of-fake-news-covid-related-scams-and-ransomware-301180568
[38] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context and spatial-temporal information for studying fake news on social media," 2018, arXiv:1809.01286.
[39] Y.-C. Ahn and C.-S. Jeong, "Natural language contents evaluation system for detecting fake news using deep learning," in Proc. 16th Int. Joint Conf. Comput. Sci. Softw. Eng. (JCSSE), Jul. 2019, pp. 289–292.
[40] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, "FNDNet – A deep convolutional neural network for fake news detection," Cognit. Syst. Res., vol. 61, pp. 32–44, Jun. 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389041720300085
[41] S. Deepak and B. Chitturi, "Deep neural approach to Fake-News identification," Proc. Comput. Sci., vol. 167, pp. 2236–2243, Jan. 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050920307420
[42] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W. On, "Fake news stance detection using deep learning architecture (CNN-LSTM)," IEEE Access, vol. 8, pp. 156695–156706, 2020.
[43] N. Aslam, I. U. Khan, F. S. Alotaibi, L. A. Aldaej, and A. K. Aldubaikil, "Fake detect: A deep learning ensemble model for fake news detection," Complexity, vol. 2021, pp. 1–8, Apr. 2021.
[44] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, "EANN: Event adversarial neural networks for multi-modal fake news detection," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 849–857.
[45] A. Alsaeedi and M. Al-Sarem, "Detecting rumors on social media based on a CNN deep learning technique," Arabian J. Sci. Eng., vol. 45, no. 12, pp. 1–32, 2020.
[46] A. Thota, P. Tilak, S. Ahluwalia, and N. Lohia, "Fake news detection: A deep learning approach," SMU Data Sci. Rev., vol. 1, no. 3, p. 10, 2018.
[47] Z. Jin, J. Cao, H. Guo, Y. Zhang, and J. Luo, "Multimodal fusion with recurrent neural networks for rumor detection on microblogs," in Proc. 25th ACM Int. Conf. Multimedia, Oct. 2017, pp. 795–816.
[48] R. R. Mandical, N. Mamatha, N. Shivakumar, R. Monica, and A. N. Krishna, "Identification of fake news using machine learning," in Proc. IEEE Int. Conf. Electron., Comput. Commun. Technol. (CONECCT), Jul. 2020, pp. 1–6.
[49] S. S. Jadhav and S. D. Thepade, "Fake news identification and classification using DSSM and improved recurrent neural network classifier," Appl. Artif. Intell., vol. 33, no. 12, pp. 1058–1068, Oct. 2019, doi: 10.1080/08839514.2019.1661579.
[50] A. S. K. Shu, D. M. K. Shu, L. G. M. Mittal, L. G. M. Mittal, and M. M. J. K. Sethi, "Fake news detection using a blend of neural networks: An application of deep learning," Social Netw. Comput. Sci., vol. 1, no. 3, pp. 1–9, Jan. 1970. [Online]. Available: https://link.springer.com/article/10.1007/s42979-020-00165-4
[51] A. P. S. Bali, M. Fernandes, S. Choubey, and M. Goel, "Comparative performance of machine learning algorithms for fake news detection," in Proc. Int. Conf. Adv. Comput. Data Sci. Switzerland: Springer, 2019, pp. 420–430.
[52] A. Rusli, J. C. Young, and N. M. S. Iswari, "Identifying fake news in Indonesian via supervised binary text classification," in Proc. IEEE Int. Conf. Ind. 4.0, Artif. Intell., Commun. Technol. (IAICT), Jul. 2020, pp. 86–90.
[53] V. Tiwari, R. G. Lennon, and T. Dowling, "Not everything you read is true! Fake news detection using machine learning algorithms," in Proc. 31st Irish Signals Syst. Conf. (ISSC), Jun. 2020, pp. 1–4.
[54] A. Verma, V. Mittal, and S. Dawn, "FIND: Fake information and news detections using deep learning," in Proc. 12th Int. Conf. Contemp. Comput. (IC), Aug. 2019, pp. 1–7.
[55] M. Z. Hossain, M. A. Rahman, M. S. Islam, and S. Kar, "BanFakeNews: A dataset for detecting fake news in Bangla," in Proc. 12th Lang. Resour. Eval. Conf. Marseille, France: European Language Resources Association, May 2020, pp. 2862–2871. [Online]. Available: https://www.aclweb.org/anthology/2020.lrec-1.349
[56] P. Savyan and S. M. S. Bhanu, "UbCadet: Detection of compromised accounts in Twitter based on user behavioural profiling," Multimedia Tools Appl., vol. 79, pp. 1–37, Jul. 2020.
[57] J. Kapusta and J. Obonya, "Improvement of misleading and fake news classification for flective languages by morphological group analysis," in Informatics, vol. 7, no. 1. Switzerland: Multidisciplinary Digital Publishing Institute, 2020, p. 4.
[58] S. Hakak, M. Alazab, S. Khan, T. R. Gadekallu, P. K. R. Maddikunta, and W. Z. Khan, "An ensemble machine learning approach through effective feature extraction to classify fake news," Future Gener. Comput. Syst., vol. 117, pp. 47–58, Apr. 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X20330466
[59] M. G. Hussain, M. Rashidul Hasan, M. Rahman, J. Protim, and S. A. Hasan, "Detection of Bangla fake news using MNB and SVM classifier," in Proc. Int. Conf. Comput., Electron. Commun. Eng. (iCCECE), Aug. 2020, pp. 81–85.
[60] G. Gravanis, A. Vakali, K. Diamantaras, and P. Karadais, "Behind the cues: A benchmarking study for fake news detection," Expert Syst. Appl., vol. 128, pp. 201–213, Aug. 2019.
[61] P. Bahad, P. Saxena, and R. Kamal, "Fake news detection using bi-directional LSTM-recurrent neural network," Proc. Comput. Sci., vol. 165, pp. 74–82, Jan. 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050920300806


[62] E. Qawasmeh, M. Tawalbeh, and M. Abdullah, "Automatic identification of fake news using deep learning," in Proc. 6th Int. Conf. Social Netw. Anal., Manage. Secur. (SNAMS), Oct. 2019, pp. 383–388.
[63] A. Agarwal and A. Dixit, "Fake news detection: An ensemble learning approach," in Proc. 4th Int. Conf. Intell. Comput. Control Syst. (ICICCS), May 2020, pp. 1178–1183.
[64] S. M. Padnekar, G. S. Kumar, and P. Deepak, "BiLSTM-autoencoder architecture for stance prediction," in Proc. Int. Conf. Data Sci. Eng. (ICDSE), Dec. 2020, pp. 1–5.
[65] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," in Proc. IEEE 1st Ukraine Conf. Electr. Comput. Eng. (UKRCON), May 2017, pp. 900–903.
[66] A. Jain and A. Kasbe, "Fake news detection," in Proc. IEEE Int. Students' Conf. Electr., Electron. Comput. Sci. (SCEECS), 2018, pp. 1–5.
[67] R. K. Kaliyar, "Fake news detection using a deep neural network," in Proc. 4th Int. Conf. Comput. Commun. Autom. (ICCCA), Dec. 2018, pp. 1–7.
[68] G. Bhatt, A. Sharma, S. Sharma, A. Nagpal, B. Raman, and A. Mittal, "Combining neural, statistical and external features for fake news stance identification," in Proc. Companion The Web Conf. Web Conf. (WWW), 2018, pp. 1353–1357, doi: 10.1145/3184558.3191577.
[69] F. A. Ozbay and B. Alatas, "Fake news detection within online social media using supervised artificial intelligence algorithms," Phys. A, Stat. Mech. Appl., vol. 540, Feb. 2020, Art. no. 123174.
[70] B. Al-Ahmad, A. M. Al-Zoubi, R. A. Khurma, and I. Aljarah, "An evolutionary fake news detection method for COVID-19 pandemic information," Symmetry, vol. 13, no. 6, p. 1091, Jun. 2021.
[71] S. Shabani and M. Sokhn, "Hybrid machine-crowd approach for fake news detection," in Proc. IEEE 4th Int. Conf. Collaboration Internet Comput. (CIC), Oct. 2018, pp. 299–306.
[72] C. M. M. Kotteti, X. Dong, N. Li, and L. Qian, "Fake news detection enhancement with data imputation," in Proc. IEEE 16th Int. Conf. Dependable, Autonomic Secure Comput., 16th Int. Conf. Pervasive Intell. Comput., 4th Int. Conf. Big Data Intell. Comput. Cyber Sci. Technol. Congr. (DASC/PiCom/DataCom/CyberSciTech), Aug. 2018, pp. 187–192.
[73] X. Zhou, A. Jain, V. V. Phoha, and R. Zafarani, "Fake news early detection: A theory-driven model," Digit. Threats, Res. Pract., vol. 1, no. 2, pp. 1–25, Jul. 2020.
[74] P. H. A. Faustini and T. F. Covões, "Fake news detection in multiple platforms and languages," Expert Syst. Appl., vol. 158, Nov. 2020, Art. no. 113503.
[75] H. Jwa, D. Oh, K. Park, J. Kang, and H. Lim, "ExBAKE: Automatic fake news detection model based on bidirectional encoder representations from transformers (BERT)," Appl. Sci., vol. 9, no. 19, p. 4062, Sep. 2019.
[76] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[77] F. C. Fernández-Reyes and S. Shinde, "Evaluating deep neural networks for automatic fake news detection in political domain," in Proc. Ibero-Amer. Conf. Artif. Intell., Nov. 2018, pp. 206–216. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-03928-8_17
[78] C. K. Hiramath and G. C. Deshpande, "Fake news detection using deep learning techniques," in Proc. 1st Int. Conf. Adv. Inf. Technol. (ICAIT), Jul. 2019, pp. 411–415.
[79] A. P. B. Veyseh, M. T. Thai, T. H. Nguyen, and D. Dou, "Rumor detection in social networks via deep contextual modeling," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2019, pp. 113–120.
[80] M. Bugueño, G. Sepulveda, and M. Mendoza, "An empirical analysis of rumor detection on microblogs with recurrent neural networks," in Proc. Int. Conf. Hum.-Comput. Interact., Jul. 2019, pp. 293–310. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-21902-4_21
[81] E. Providel and M. Mendoza, "Using deep learning to detect rumors in Twitter," in Proc. Int. Conf. Hum.-Comput. Interact. Switzerland: Springer, 2020, pp. 321–334.
[82] Q. Le and T. Mikolov, "Distributed representations of sentences and documents," in Proc. Int. Conf. Mach. Learn., 2014, pp. 1188–1196.
[83] S. Sangamnerkar, R. Srinivasan, M. R. Christhuraj, and R. Sukumaran, "An ensemble technique to detect fabricated news article using machine learning and natural language processing techniques," in Proc. Int. Conf. Emerg. Technol. (INCET), Jun. 2020, pp. 1–7.
[84] S. Helmstetter and H. Paulheim, "Weakly supervised learning for fake news detection on Twitter," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2018, pp. 274–277.
[85] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532–1543.
[86] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep learning models: A novel approach," Trans. Emerg. Telecommun. Technol., vol. 31, no. 2, p. e3767, Feb. 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.3767
[87] S. Singhania, N. Fernandez, and S. Rao, "3HAN: A deep neural network for fake news detection," in Proc. Int. Conf. Neural Inf. Process. Switzerland: Springer, 2017, pp. 572–581.
[88] J. A. Nasir, O. S. Khan, and I. Varlamis, "Fake news detection: A hybrid CNN-RNN based deep learning approach," Int. J. Inf. Manage. Data Insights, vol. 1, no. 1, Apr. 2021, Art. no. 100007.
[89] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[90] S. Kula, M. Choraś, and R. Kozik, "Application of the BERT-based architecture in fake news detection," in Proc. Comput. Intell. Secur. Inf. Syst. Conf. Switzerland: Springer, 2019, pp. 239–249.
[91] T. Zhang, D. Wang, H. Chen, Z. Zeng, W. Guo, C. Miao, and L. Cui, "BDANN: BERT-based domain adaptation neural network for multi-modal fake news detection," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
[92] R. K. Kaliyar, A. Goswami, and P. Narang, "FakeBERT: Fake news detection in social media with a BERT-based deep learning approach," Multimedia Tools Appl., vol. 80, no. 8, pp. 11765–11788, Mar. 2021.
[93] W. Shishah, "Fake news detection using BERT model with joint learning," Arabian J. Sci. Eng., vol. 46, pp. 1–13, Jun. 2021.
[94] H. Yuan, J. Zheng, Q. Ye, Y. Qian, and Y. Zhang, "Improving fake news detection with domain-adversarial and graph-attention neural network," Decis. Support Syst., vol. 151, Dec. 2021, Art. no. 113633.
[95] A. Giachanou, G. Zhang, and P. Rosso, "Multimodal multi-image fake news detection," in Proc. IEEE 7th Int. Conf. Data Sci. Adv. Anal. (DSAA), Oct. 2020, pp. 647–654.
[96] S. Girgis, E. Amer, and M. Gadallah, "Deep learning algorithms for detecting fake news in online text," in Proc. 13th Int. Conf. Comput. Eng. Syst. (ICCES), Dec. 2018, pp. 93–97.
[97] H. Reddy, N. Raj, M. Gala, and A. Basava, "Text-mining-based fake news detection using ensemble methods," Int. J. Autom. Comput., vol. 17, pp. 1–12, Apr. 2020.
[98] K. Shu, S. Wang, and H. Liu, "Understanding user profiles on social media for fake news detection," in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Apr. 2018, pp. 430–435.
[99] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro, and L. de Alfaro, "Automatic online fake news detection combining content and social signals," in Proc. 22nd Conf. Open Innov. Assoc. (FRUCT), May 2018, pp. 272–279.
[100] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, "DEFEND: Explainable fake news detection," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 395–405.
[101] M. Potthast, J. Kiesel, K. Reinartz, J. Bevendorff, and B. Stein, "A stylometric inquiry into hyperpartisan and fake news," 2017, arXiv:1702.05638.
[102] X. Zhang, J. Cao, X. Li, Q. Sheng, L. Zhong, and K. Shu, "Mining dual emotion for fake news detection," 2019, arXiv:1903.01728.
[103] S. Hosseinimotlagh and E. E. Papalexakis, "Unsupervised content-based identification of fake news articles with tensor decomposition ensembles," in Proc. Workshop Misinformation Misbehavior Mining Web (MIS), 2018, pp. 1–8.
[104] R. K. Kaliyar, A. Goswami, and P. Narang, "DeepFakE: Improving fake news detection using tensor decomposition-based deep neural network," J. Supercomput., vol. 77, no. 2, pp. 1015–1037, Feb. 2021.
[105] R. K. Kaliyar, A. Goswami, and P. Narang, "EchoFakeD: Improving fake news detection in social media with an efficient deep neural network," Neural Comput. Appl., vol. 33, pp. 1–17, Jan. 2021.
[106] M. Dong, L. Yao, X. Wang, B. Benatallah, Q. Z. Sheng, and H. Huang, "DUAL: A deep unified attention model with latent relation representations for fake news detection," in Proc. Int. Conf. Web Inf. Syst. Eng. Switzerland: Springer, 2018, pp. 199–209.


[107] J. Zhang, B. Dong, and P. S. Yu, "FakeDetector: Effective fake news detection with deep diffusive neural network," in Proc. IEEE 36th Int. Conf. Data Eng. (ICDE), Apr. 2020, pp. 1826–1829.
[108] H. Karimi, P. Roy, S. Saba-Sadiya, and J. Tang, "Multi-source multi-class fake news detection," in Proc. 27th Int. Conf. Comput. Linguistics, 2018, pp. 1546–1557.
[109] D. Mangal and D. K. Sharma, "Fake news detection with integration of embedded text cues and image features," in Proc. 8th Int. Conf. Rel., INFOCOM Technol. Optim., Trends Future Directions (ICRITO), Jun. 2020, pp. 68–72.
[110] P. Qi, J. Cao, T. Yang, J. Guo, and J. Li, "Exploiting multi-domain visual information for fake news detection," in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2019, pp. 518–527.
[111] K. Shu, X. Zhou, S. Wang, R. Zafarani, and H. Liu, "The role of user profiles for fake news detection," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2019, pp. 436–439.
[112] H. Guo, J. Cao, Y. Zhang, J. Guo, and J. Li, "Rumor detection with hierarchical social attention network," in Proc. 27th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2018, pp. 943–951.
[113] J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, "Explainable machine learning for fake news detection," in Proc. 10th ACM Conf. Web Sci. (WebSci), New York, NY, USA, 2019, pp. 17–26, doi: 10.1145/3292522.3326027.
[114] J. Kim, B. Tabibian, A. Oh, B. Schölkopf, and M. Gomez-Rodriguez, "Leveraging the crowd to detect and reduce the spread of fake news and misinformation," in Proc. 11th ACM Int. Conf. Web Search Data Mining, Feb. 2018, pp. 324–332.
[115] K. Popat, S. Mukherjee, A. Yates, and G. Weikum, "DeClarE: Debunking fake news and false claims using evidence-aware deep learning," 2018, arXiv:1809.06416.
[116] T. Saikh, A. Anand, A. Ekbal, and P. Bhattacharyya, "A novel approach towards fake news detection: Deep learning augmented with textual entailment features," in Proc. Int. Conf. Appl. Natural Lang. Inf. Syst. Switzerland: Springer, 2019, pp. 345–358.
[117] L. Wu and H. Liu, "Tracing fake-news footprints: Characterizing social media messages by how they propagate," in Proc. 11th ACM Int. Conf. Web Search Data Mining, Feb. 2018, pp. 637–645.
[118] K. Shu, S. Wang, and H. Liu, "Beyond news contents: The role of social context for fake news detection," in Proc. 12th ACM Int. Conf. Web Search Data Mining, Jan. 2019, pp. 312–320.
[130] U. Kamath, J. Liu, and J. Whitaker, Deep Learning for NLP and Speech Recognition, vol. 84. Switzerland: Springer, 2019.
[131] B. M. Amine, A. Drif, and S. Giordano, "Merging deep learning model for fake news detection," in Proc. Int. Conf. Adv. Electr. Eng. (ICAEE), Nov. 2019, pp. 1–4.
[132] Q. Li, Q. Hu, Y. Lu, Y. Yang, and J. Cheng, "Multi-level word features based on CNN for fake news detection in cultural communication," Pers. Ubiquitous Comput., vol. 24, no. 2, pp. 1–14, 2019.
[133] O. Ajao, D. Bhowmik, and S. Zargari, "Fake news identification on Twitter with hybrid CNN and RNN models," in Proc. 9th Int. Conf. Social Media Soc., New York, NY, USA, Jul. 2018, pp. 226–230, doi: 10.1145/3217804.3217917.
[134] L. Li, G. Cai, and N. Chen, "A rumor events detection method based on deep bidirectional GRU neural network," in Proc. IEEE 3rd Int. Conf. Image, Vis. Comput., Jun. 2018, pp. 755–759.
[135] M. Z. Asghar, A. Habib, A. Habib, A. Khan, R. Ali, and A. Khattak, "Exploring deep neural networks for rumor detection," J. Ambient Intell. Humanized Comput., vol. 12, no. 4, pp. 1–19, 2019.
[136] S. R. Sahoo and B. B. Gupta, "Multiple features based approach for automatic fake news detection on social networks using deep learning," Appl. Soft Comput., vol. 100, Mar. 2021, Art. no. 106983.
[137] Q. Liao, H. Chai, H. Han, X. Zhang, X. Wang, W. Xia, and Y. Ding, "An integrated multi-task model for fake news detection," IEEE Trans. Knowl. Data Eng., early access, Jan. 28, 2021, doi: 10.1109/TKDE.2021.3054993.
[138] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2008.
[139] T. Bian, X. Xiao, T. Xu, P. Zhao, W. Huang, Y. Rong, and A. Huang, "Rumor detection on social media with bi-directional graph convolutional networks," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 1, pp. 549–556.
[140] Q. Huang, C. Zhou, J. Wu, M. Wang, and B. Wang, "Deep structure learning for rumor detection on Twitter," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–8.
[141] Y. Rong, W. Huang, T. Xu, and J. Huang, "DropEdge: Towards deep graph convolutional networks on node classification," 2019, arXiv:1907.10903.
[142] Y. Ren, B. Wang, J. Zhang, and Y. Chang, "Adversarial active learning based heterogeneous graph neural network for fake news detection," in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2020, pp. 452–461.
[119] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, ‘‘Fake [143] Z. Wu, D. Pi, J. Chen, M. Xie, and J. Cao, ‘‘Rumor detection based on
news detection on social media using geometric deep learning,’’ 2019, propagation graph neural network with attention mechanism,’’ Expert
arXiv:1902.06673. Syst. Appl., vol. 158, Nov. 2020, Art. no. 113595. [Online]. Available:
[120] M. Albahar, ‘‘A hybrid model for fake news detection: Leveraging news http://www.sciencedirect.com/science/article/pii/S095741742030419X
content and user comments in fake news,’’ IET Inf. Secur., vol. 15, no. 2, [144] L. Zhang, J. Li, B. Zhou, and Y. Jia, ‘‘Rumor detection based on SAGNN:
pp. 169–177, Mar. 2021. Simplified aggregation graph neural networks,’’ Mach. Learn. Knowl.
[121] B. Al Asaad and M. Erascu, ‘‘A tool for fake news detection,’’ in Proc. Extraction, vol. 3, no. 1, pp. 84–94, Jan. 2021. [Online]. Available:
20th Int. Symp. Symbolic Numeric Algorithms Sci. Comput. (SYNASC), https://www.mdpi.com/2504-4990/3/1/5
[145] S. Hiriyannaiah, A. Srinivas, G. K. Shetty, G. Siddesh, and K. Srinivasa,
Sep. 2018, pp. 379–386.
‘‘A computationally intelligent agent for detecting fake news using gen-
[122] S. Aphiwongsophon and P. Chongstitvatana, ‘‘Detecting fake news
erative adversarial networks,’’ in Hybrid Computational Intelligence:
with machine learning method,’’ in Proc. 15th Int. Conf. Electr. Eng.,
Challenges and Applications. Amsterdam, The Netherlands: Elsevier,
Electron., Comput., Telecommun. Inf. Technol. (ECTI-CON), Jul. 2018,
2020, p. 69.
pp. 528–531.
[146] J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, and
[123] N. Ruchansky, S. Seo, and Y. Liu, ‘‘CSI: A hybrid deep model for fake
D. Zhang, ‘‘IRGAN: A minimax game for unifying generative and dis-
news detection,’’ in Proc. ACM Conf. Inf. Knowl. Manage., New York,
criminative information retrieval models,’’ in Proc. 40th Int. ACM SIGIR
NY, USA, Nov. 2017, pp. 797–806, doi: 10.1145/3132847.3132877.
Conf. Res. Develop. Inf. Retr., Aug. 2017, pp. 515–524.
[124] Y. Yang, L. Zheng, J. Zhang, Q. Cui, Z. Li, and P. S. Yu, ‘‘TI- [147] Y. Li and J. Ye, ‘‘Learning adversarial networks for semi-supervised text
CNN: Convolutional neural networks for fake news detection,’’ CoRR, classification via policy gradient,’’ in Proc. 24th ACM SIGKDD Int. Conf.
vol. abs/1806.00749, pp. 1–11, Jun. 2018. Knowl. Discovery Data Mining, Jul. 2018, pp. 1715–1723.
[125] T. O’Shea and J. Hoydis, ‘‘An introduction to deep learning for the physi- [148] B. Hu, Y. Fang, and C. Shi, ‘‘Adversarial learning on heterogeneous
cal layer,’’ IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, information networks,’’ in Proc. 25th ACM SIGKDD Int. Conf. Knowl.
Dec. 2017. Discovery Data Mining, Jul. 2019, pp. 120–129.
[126] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, ‘‘Mobile encrypted [149] T. Le, S. Wang, and D. Lee, ‘‘MALCOM: Generating malicious com-
traffic classification using deep learning: Experimental evaluation, ments to attack neural fake news detection models,’’ in Proc. IEEE Int.
lessons learned, and challenges,’’ IEEE Trans. Netw. Service Manag., Conf. Data Mining (ICDM), Nov. 2020, pp. 282–291.
vol. 16, no. 2, pp. 445–458, Feb. 2019. [150] Y. Long, Q. Lu, R. Xiang, M. Li, and C.-R. Huang, ‘‘Fake news
[127] P. Yildirim and D. Birant, ‘‘The relative performance of deep learning and detection through multi-perspective speaker profiles,’’ in Proc. 8th Int.
ensemble learning for textile object classification,’’ in Proc. 3rd Int. Conf. Joint Conf. Natural Lang. Process., vol. 2. Taipei, Taiwan: Asian Fed.
Comput. Sci. Eng. (UBMK), Sep. 2018, pp. 22–26. Natural Lang. Process., Nov. 2017, pp. 252–256. [Online]. Available:
[128] D. Shen, G. Wu, and H. Suk, ‘‘Deep learning in medical image analysis,’’ https://aclanthology.org/I17-2043/
Annu. Rev. Biomed. Eng., vol. 19, pp. 221–248, Jun. 2017. [151] T. Chen, X. Li, H. Yin, and J. Zhang, ‘‘Call attention to rumors: Deep
[129] M. Veres and M. Moussa, ‘‘Deep learning for intelligent transportation attention based recurrent neural networks for early rumor detection,’’ in
systems: A survey of emerging trends,’’ IEEE Trans. Intell. Transp. Syst., Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Switzerland:
vol. 21, no. 8, pp. 3152–3168, Aug. 2020. Springer, 2018, pp. 40–52.


M. F. MRIDHA (Senior Member, IEEE) received the Ph.D. degree in AI/ML from Jahangirnagar University, in 2017. He joined the Department of Computer Science and Engineering, Stamford University Bangladesh, as a Lecturer in June 2007. He was promoted to Senior Lecturer in the same department in October 2010 and to Assistant Professor in October 2011. He then joined the University of Asia Pacific (UAP) as an Assistant Professor in May 2012, where he worked as a CSE Department Faculty Member and a Graduate Coordinator from 2012 to 2019. He is currently working as an Associate Professor with the Department of Computer Science and Engineering, Bangladesh University of Business and Technology. His research experience, within both academia and industry, has resulted in over 80 journal and conference publications. For more than ten years, he has supervised master's and undergraduate students in their thesis work. His research interests include artificial intelligence (AI), machine learning, deep learning, natural language processing (NLP), and big data analysis. He has served as a program committee member for several international conferences/workshops. He has served as an associate editor for several journals.

ASHFIA JANNAT KEYA was born in Dhaka, Bangladesh. She received the B.Sc. degree in computer science and engineering from the Bangladesh University of Business and Technology (BUBT), in 2021. She is currently working as a Research Assistant with the Department of CSE, BUBT. She also works as a Researcher with the Advanced Machine Learning Lab. Her research interests include deep learning, natural language processing (NLP), and computer vision. She has experience working with C++, Python, Keras, TensorFlow, Sklearn, NumPy, Pandas, and Matplotlib.

MD. ABDUL HAMID was born in Sonatola, Pabna, Bangladesh. He received the Bachelor of Engineering degree in computer and information engineering from the International Islamic University Malaysia (IIUM), in 2001, and the combined master's and Ph.D. degree from the Computer Engineering Department, Kyung Hee University, South Korea, in August 2009, majoring in information communication. His education spans several countries. From 1989 to 1995, he attended high school and college at the Rajshahi Cadet College, Bangladesh. He has been in the teaching profession throughout his career, which also spans different parts of the globe. From 2002 to 2004, he was a Lecturer with the Computer Science and Engineering Department, Asian University of Bangladesh, Dhaka, Bangladesh. From 2009 to 2012, he was an Assistant Professor with the Department of Information and Communications Engineering, Hankuk University of Foreign Studies (HUFS), South Korea. From 2012 to 2013, he was an Assistant Professor with the Department of Computer Science and Engineering, Green University of Bangladesh. From 2013 to 2016, he was an Assistant Professor with the Department of Computer Engineering, Taibah University, Madinah, Saudi Arabia. From 2016 to 2017, he was an Associate Professor with the Department of Computer Science, Faculty of Science and Information Technology, American International University-Bangladesh, Dhaka. From 2017 to 2019, he was an Associate Professor and a Professor with the Department of Computer Science and Engineering, University of Asia Pacific, Dhaka. Since 2019, he has been a Professor with the Department of Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include network/cyber-security, natural language processing, machine learning, wireless communications, and networking protocols.

MUHAMMAD MOSTAFA MONOWAR received the B.Sc. degree in computer science and information technology from the Islamic University of Technology (IUT), Bangladesh, in 2003, and the Ph.D. degree in computer engineering from Kyung Hee University, South Korea, in 2011. He worked as a Faculty Member at the Department of Computer Science and Engineering, University of Chittagong, Bangladesh. He is currently working as an Associate Professor at the Department of Information Technology, King Abdulaziz University, Saudi Arabia. His research interests include wireless networks (mostly ad hoc, sensor, and mesh networks), including routing protocols, MAC mechanisms, IP and transport layer issues, cross-layer design, and QoS provisioning; security and privacy issues; and natural language processing. He has served as a program committee member for several international conferences/workshops. He has served as an editor for a couple of books published by CRC Press and Taylor & Francis Group. He has also served as a guest editor for several journals.

MD. SAIFUR RAHMAN is currently working as an Assistant Professor at the Department of Computer Science and Engineering, Bangladesh University of Business and Technology. He has expertise in software development and has developed numerous management systems. He was a successful Director of the International Collegiate Programming Contest (ICPC), Dhaka Regional Contest, in 2014. Beyond collaboration and development, his skills include a theoretical background in computer engineering. His research interests include system design and artificial intelligence-based systems. He has received coach awards at ICPC Dhaka Regional Contests.
