A Comprehensive Review On Fake News Detection With Deep Learning
ABSTRACT A prominent issue of the present time is that organizations from different domains are struggling to obtain effective solutions for detecting online fake news. It is quite challenging to distinguish fake information on the Internet, as it is often written to deceive users. Compared with many machine learning techniques, deep learning-based techniques are capable of detecting fake news more accurately. Previous review papers were based on data mining and machine learning techniques, scarcely exploring deep learning techniques for fake news detection. Moreover, emerging deep learning-based approaches such as attention mechanisms, Generative Adversarial Networks, and Bidirectional Encoder Representations from Transformers are absent from previous surveys. This study carefully investigates advanced and state-of-the-art fake news detection mechanisms. We begin by highlighting the consequences of fake news. Then, we discuss the datasets used in previous research and their NLP techniques. A comprehensive overview of deep learning-based techniques is presented, organizing representative methods into various categories. The prominent evaluation metrics in fake news detection are also discussed. Finally, we offer recommendations for improving fake news detection mechanisms as future research directions.
INDEX TERMS Natural language processing, machine learning, deep learning, fake news.
extract high-dimensional features, and c) better accuracy. Further, the current wide availability of data and programming frameworks has boosted the usage and robustness of DL-based approaches. Hence, in the last five years, numerous articles have been published on fake news detection, mostly based on DL strategies [24]. An enthusiastic effort has been made to review the current literature and compare the extensive body of DL-based fake news detection research.

A number of research works have been published surveying fake news detection [5], [25], [26]. Our investigation reveals that existing studies do not provide a thorough overview of deep learning-based architectures for detecting fake news. The existing survey papers mostly cover the ML strategies for detecting fake news, scarcely exploring the DL strategies [3], [9], [10]. We provide a complete list of NLP techniques and describe their benefits and drawbacks. In what follows, in this survey, we perform an in-depth analysis of current DL-based studies. Table 1 provides a brief overview of the existing survey papers and our research contributions. The present study aims to address the weaknesses and strengths of previous research by conducting a systematic survey on fake news detection. First, we divide existing fake news detection research into two main categories: (1) Natural Language Processing (NLP) and (2) Deep Learning (DL). We discuss NLP techniques such as data pre-processing, data vectorizing, and feature extraction. Second, we analyze fake news detection architectures based on different DL architectures. Finally, we discuss the evaluation metrics used in fake news detection. Figure 1 depicts an overall taxonomy of fake news detection approaches. We also include Table 2, which lists the acronyms used throughout the survey, to assist readers when they encounter unfamiliar acronyms.

The rest of the paper is organized as follows. Section II highlights the consequences of fake news. Section III describes the used datasets. Section IV explains the Natural Language Processing techniques in fake news detection. Section V contains an in-depth analysis of deep learning strategies. Section VI presents the evaluation metrics used in previous studies. Section VII narrates the challenges and future research directions. Finally, Section VIII concludes the paper.

II. FAKE NEWS CONSEQUENCES
There has always been fake news since the beginning of human civilization. However, the spread of fake news has been amplified by modern technologies and the transformation of the global media landscape. Fake news may cause major consequences for social, political, and economic environments. Fake information and fake news have various faces. As information molds our view of the world, fake news has a huge impact. We make critical decisions based on information. By obtaining information, we develop an impression of a situation or a person. We cannot make good decisions if we find fake, false, distorted, or fabricated information on the Internet. The primary impacts of fake news are as follows:

Impact on Innocent People: Rumors can have a major impact on specific people. These people may be harassed on social media. They may also face insults and threats that may have real-life consequences. People must not believe invalid information on social media or judge a person based on it.

Impact on Health: The number of people searching for health-related news on the Internet is continuously increasing. Fake news in health has a potential impact on people's lives [36]. Therefore, this is one of the major challenges today. Misinformation about health has had a tremendous impact in the last year [37]. Social media platforms have made some policy changes to ban or limit the spread of health misinformation as they face pressure from doctors, lawmakers, and health advocates.

Financial Impact: Fake news is currently a crucial problem in industries and the business world. Dishonest businessmen spread fake news or reviews to raise their profits. Fake information can cause stock prices to fall. It can ruin the reputation of a business. Fake news also has an impact on customer expectations. Fake news can create an unethical business mentality.

Democratic Impact: The media has discussed the fake news phenomenon significantly because fake news played a vital role in the last American presidential election. This is a major democratic problem. We must stop spreading fake news as it has a real impact.

III. BENCHMARK DATASET
In this section, we discuss the datasets used in various studies. For both training and testing, benchmark datasets
TABLE 3. The table provides details of publicly available datasets and corresponding URLs.
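As a small, concrete companion to the dataset discussion in this section, the following is a minimal Python sketch for inspecting a locally downloaded fake news corpus before modeling. The file name fake_news.csv and the column names text and label are illustrative assumptions, not the schema of any particular benchmark listed in Table 3.

```python
# Minimal sketch: load a locally downloaded fake news corpus and inspect it.
# The file name and column names below are placeholders, not a specific
# benchmark's schema.
import pandas as pd

df = pd.read_csv("fake_news.csv")        # assumed columns: "text", "label"

print(df.shape)                          # number of articles and columns
print(df["label"].value_counts())        # class balance between fake and real items
print(df["text"].str.len().describe())   # rough distribution of article lengths
```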
During data pre-processing, different visualization procedures are helpful. A careful pre-processing strategy is required before ingesting the data into a neural network for fake news detection because social media data sources are fragmented, unstructured, and noisy. It is well known that data pre-processing saves computational time and space during the learning stage. In addition, by limiting the impact of artifacts during the learning process, text pre-processing avoids the ingestion of noisy data. After proper text pre-processing, the data becomes a logical representation that retains the most representative descriptive words. Umer et al. [42] experimented with a fake news detection model whose accuracy was a surprisingly poor 78% when the features were used without data cleaning or pre-processing. After performing the pre-processing steps and removing unnecessary data, the accuracy increased dramatically to 93.0%. Data quality assessment, dimensionality reduction, and splitting of the dataset are the data pre-processing steps used in various studies [39], [41], [43]. The pre-processing steps are elaborated in Sections IV-A1, IV-A2, and IV-A3.

1) DATA QUALITY ASSESSMENT
Data are frequently taken from numerous sources that are ordinarily reliable and are in completely different formats. When working on a machine learning problem, much time is invested in managing data quality issues. It is unreasonable to anticipate that the data will be perfect. There may be some issues due to human error, defects within the data collection process, or restrictions of measuring devices. The quality of a dataset is often responsible for the poor performance of fake news detection models. For this reason, the quality of the data used in any machine learning project has a huge effect on the chances of success. However, only a few studies ensure the quality of their used datasets. Deepak and Chitturi [41] collected the George McIntire dataset from GitHub and dropped the rows that did not have labels during the cleaning process, which surely had a huge impact on their success in fake news detection. To ensure the quality of the entire dataset, Wang et al. [44] removed duplicate and low-quality images. Alsaeedi and Al-Sarem [45] extended the data cleaning process with URL removal, lowercasing, hashtag character (#) removal, mention character (@) removal, and number removal. They also considered words with recurring characters such as "Likkke" and handled emoticons by replacing positive emoticons with the word "positive" and negative emoticons with the word "negative".

2) TRAIN/VALIDATION/TEST SPLIT
The dataset may be divided into train, test, and validation sets. The sample of data that is utilized to adjust the parameters is called the training set. The validation set is a series of examples used to fine-tune the hyperparameters of a model. A set of examples applied only for assessing a fully-specified model's performance is regarded as the test set. Although many studies on fake news detection have divided their dataset into training, validation, and test sets, a few studies have used only training and test sets [46], [47]. The data split ratios 60:20:20, 70:30, and 80:20 are very common in fake news detection. The Pareto principle (for many outcomes, roughly 80% of consequences come from 20% of the causes) is used to describe the 80:20 ratio. It is typically a safe bet to use the ratio that most studies applied. Mandical et al. [48] applied the ratios of 90:5:5 and 80:10:10 when the number of articles in the dataset was less than 10,000 and greater than 10,000, respectively. However, they did not specify the purpose behind it. Jadhav and Thepade [49] compared their model performance based on the data splitting ratio and showed that a 75%:25% data split has more prominent performance than models possessing diverse splits. The model parameter estimates exhibit more variation with smaller training data. Performance statistics exhibit more variation with smaller testing data. Studies should be careful with splitting data so that neither variation is too large or too small, and this has more to do with the total number of instances in each category than with the percentage. The optimal split of the test, validation, and train sets is determined by the hyperparameters, model architecture, data dimension, etc. Table 4 provides an overview of the advantages and disadvantages of the splitting ratios used in most studies.

TABLE 4. The table gives an overview of common dataset partitioning based on training, validation, and testing with advantages and disadvantages. Few studies mentioned their data partitioning, and only those references are given in the table.

3) TOKENIZATION, STEMMING AND LEMMATIZATION
Tokenization is a method of breaking down a text into words. It can be applied at any character; performing tokenization on the space character is the most common approach. Chopping off the end of a word to obtain the base word is called stemming. The removal of derivational affixes is usually included in stemming. A derivational affix is an affix by which one word is obtained from another; the derived word is usually a distinct class of words from the original. Lemmatization is a text normalization procedure that morphologically analyzes words, generates the root form of inflected words, and is normally intended to remove inflectional endings [64]. A group of letters applied to the end of a word to modify its meaning is known as an inflectional
ending. An example of an inflectional ending is the plural -s, which turns bat into bats.

Rusli et al. [52] performed two experiments to detect fake news, with and without stemming and stop-word removal. They used stemming and stop-word removal to remove all affixes and stop-words. They achieved a 0.82 macro-averaged F1-score when performing the stemming and stop-word removal processes, and a 0.8 macro-averaged F1-score without performing them. Performing the stemming and stop-word removal processes in the text preprocessing phase was time-consuming, but there was only a small difference in the results. Although tokenization, stemming, and lemmatization improve the performance of the classifier, many researchers have not used these techniques [4], [65]. Jain and Kasbe [66] presented a simple technique with web scraping for detecting fake news. They showed that, by regularly updating the dataset through web scraping, a model's truthfulness can be checked. The authors achieved an accuracy of 91% based on text. The result could be improved greatly with some extra preprocessing, such as stemming and omitting stop words.

B. WORD VECTORIZING
Word vectorizing involves mapping a word/text to a list of vectors. TF-IDF and Bag of Words (BoW) vectorization techniques are commonly used in machine learning strategies to identify fake news [4], [53], [63]. In term frequency-inverse document frequency (TF-IDF), the value rises proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Although this vectorization is successful, the semantic sense of the words is lost in the attempt to translate them to numbers [48]. The BoW technique considers every news article to be a document and computes the frequency count of each word within this document, which is then used to produce a numeric representation of the data. In addition to data loss, this approach has other limitations: the relative location of the words is overlooked, and contextual information is lost. This loss can at times be costly when weighed against the benefits of computational convenience and ease of use [46]. Rusli et al. [52] used TF-IDF and Bag of Words feature extraction methods to detect fake news. However, this approach may suffer from loss of information.

Neural network-based models have achieved success on diverse language-related tasks, as opposed to traditional machine learning-based models such as logistic regression or support vector machines (SVM), by utilizing word embeddings in fake news detection. Word embedding maps words or text to a list of vectors. These low-dimensional, distributed feature representations are appropriate for natural languages. The term "word embedding" refers to a combination of language modeling and feature learning. Words or expressions from the lexicon are allocated to real-number vectors. Neural network models essentially utilize this method for fake news detection [42], [96]. In word embedding, word representation is performed using dense vectors. These vectors represent the word mapping onto a continuous, dense vector space. This is considered an improvement over the BoW model, wherein large sparse vectors of vocabulary size were used as word vectors. These large vectors also provided no information about how two words were interrelated or any other useful information [50]. Recently, fake news detection researchers have used pre-trained word-embedding models such as global vectors for word representation (GloVe) and Word2vec. The primary benefit of using these models is their ability to train with large datasets [40]. Unlike Word2vec, GloVe supports parallel implementation, making it easier to train the model on huge datasets. Table 5 gives a summary of the NLP techniques and word vector models used in deep learning-based fake news detection papers.

C. FEATURE EXTRACTION
A huge amount of computational power and memory is required to analyze a large number of variables. Classification algorithms may overfit the training samples and generalize poorly to new samples. Feature extraction is a process of building combinations of variables to overcome these difficulties while still representing the data with adequate precision. Feature extraction and feature selection are frequently used in text mining [69], [97].

Fake news detection strategies concentrate on applying news content and social context features [98]. News content features depict the meta-information relevant to a piece of news [5]. Commonly, in news validation, news content (linguistic and visual information) is used as a feature [99], [100]. Textual features comprise the writing style and emotion [101], [102]. Furthermore, hidden textual representations are generated using tensor factorization [103]–[105] and deep neural networks [106]–[108], achieving high performance in detecting false news with news contents. Visual features are retrieved from visual components such as images and videos, but only a few studies utilized visual features in fake news detection [109], [110]. In contrast, social context information can also be aggregated for detecting fake news in social media. There are three main perspectives of social content: a) users, b) produced posts, and c) networks (connections among the users who distributed relevant posts) [5]. User-based features are typically derived from the user profile in social media [98], [111]. Users' social responses in terms of stances [42], [64], topics [112], or credibility [113]–[115] are represented via post-based features. Recently, several studies have focused on stance features to detect fake news [64]; such features can be effective for human fact-checkers in distinguishing false claims [113], [114]. To check the authenticity of a claim/report/headline, it is essential to understand what different news agencies are declaring about that particular claim/report/headline [116]. Network-based features are retrieved by creating specialized networks, such as diffusion networks, interaction networks, and propagation networks [117]–[119]. The propagation network contains rich information about user interactions (likes, comments, responses, or shares) that
TABLE 5. The table provides the advantages and disadvantages of Word Vector Models, along with the references.
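To tie together the pre-processing, data-splitting, and vectorizing steps reviewed in this section, the following is a minimal scikit-learn/NLTK sketch. It assumes the same illustrative fake_news.csv file with text and label columns as in the earlier sketch, and the stop-word removal, stemming, 80:20 split, and TF-IDF weighting mirror the choices most commonly reported in the surveyed studies rather than any single paper's exact pipeline.

```python
# Minimal sketch of a common NLP pipeline for fake news detection:
# tokenization + stop-word removal + stemming, an 80:20 train/test split,
# and TF-IDF vectorizing. File and column names are illustrative placeholders.
import re

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv("fake_news.csv")             # assumed columns: "text", "label"

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))  # requires nltk.download("stopwords")

def preprocess(text: str) -> str:
    # Keep alphabetic tokens, drop stop-words, and reduce each token to its stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

X = df["text"].astype(str).apply(preprocess)
y = df["label"]

# 80:20 split, the ratio most frequently reported in the surveyed papers.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TF-IDF turns each article into a sparse vector of weighted term frequencies.
vectorizer = TfidfVectorizer(max_features=50000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
print(X_train_tfidf.shape, X_test_tfidf.shape)
```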
2) POOLING LAYER
A pooling operation that chooses the greatest component from each patch of each feature map covered by the filter is called max pooling. A pooling layer is a new layer attached to the convolutional layer. Its purpose is to continuously diminish the spatial size of the representation in order to decrease the number of parameters and the amount of calculation inside the network. The pooling layer operates autonomously on each feature map. Max pooling and average pooling are the most commonly used functions in fake news detection. Alsaeedi and Al-Sarem [45] adjusted the hyperparameter settings in a CNN. They found the best parameter settings that gave an improvement in the model's performance. The recommended CNN model performs best when the number of units in the dense layer is set to 100, the number of filters is set to 100, and the window size is set to 5. The GlobalMaxPooling1D method achieved the highest scores, showing that it works well for fake news detection when compared to other pooling methods [45].

3) REGULARIZATION LAYER
The most crucial problem of classification is to reduce the training and test errors of the classifier. Another common issue is the over-fitting problem (the gap between training and testing errors is huge). Overfitting makes it difficult to generalize the model as it becomes overly fitted to the training set. Regularization is a solution to the overfitting problem. Regularization is applied to the model to lessen the problem of overfitting and decrease the generalization error, but not the training error [45]. The dropout regularization method is mostly used for fake news detection [133]. Other methods such as early stopping and weight penalties were not used in previous studies on fake news detection. Dropout avoids overfitting by randomly dropping out neurons during training; eventually, the weights are averaged so that the weight of a single neuron does not become too high.

B. RECURRENT NEURAL NETWORK (RNN)
The RNN is a type of neural network. In an RNN, nodes are sequentially connected to construct a directed graph. The output from the earlier step serves as the input to the current step. RNNs are effective for time- and sequence-based predictions. The RNN is less compatible with features compared with the CNN, but RNNs are suitable for studying sequential texts and expressions. However, an RNN cannot process very long sequences when tanh or ReLU is used as the activation function.

The back-propagation algorithm is utilized to train the RNN. While training neural networks, small steps must frequently be taken in the direction of the negative error derivative with respect to the network weights to reach a minimum of the error function. The size of the gradients becomes tiny for each subsequent layer. Thus, the RNN suffers from a vanishing gradient issue in the bottom layers of the network. We can deal with the vanishing gradient problem using three solutions: (1) using the rectified linear unit (ReLU) activation function, (2) using the RMSProp optimization algorithm, and (3) using a different network architecture such as long short-term memory networks (LSTM) or gated recurrent units (GRU). Hence, previous studies focused on LSTM and GRU rather than the standard RNN [80], [96], [134]. Bugueño et al. [80] proposed a model based on an RNN for propagation tree classification. The authors used the RNN for sequence analysis. The number of epochs was set to 200, which is relatively high in comparison to their number of training examples. To predict fake news articles, authors have proposed distinct RNN models, specifically LSTM, GRU, tanh-RNN, unidirectional LSTM-RNN, and vanilla RNN. RNNs, and LSTM in particular, are especially successful in processing sequential data (human language) and extracting significant features from diverse data sources. Further, in Sections V-B1 and V-B2, we discuss LSTM and GRU.

FIGURE 7. The figure shows an architecture of a basic RNN with n sequential layers. x represents the inputs and y represents the output generated by the RNN.

1) LONG SHORT-TERM MEMORY (LSTM)
LSTM models are front runners in NLP problems. LSTM is an artificial recurrent neural network framework used in deep learning. LSTM is an advanced variant of the RNN [41]. RNNs are not capable of learning long-term dependencies because back-propagation through recurrent networks takes a long time and the backward flow of error degrades over long sequences. However, LSTM can keep "short-term memories" for "long periods." The LSTM is made up of three gates, an input gate, an output gate, and a forget gate, together with a cell. Through a combination of the three, it calculates the hidden state. The cell can recall values over a large time interval. For this reason, a word's connection near the
beginning of the content can impact the output of a word appearing later in the sentence [67]. LSTM is an exceptionally viable solution for addressing the vanishing gradient issue. Bahad et al. [61] proposed an RNN model that suffers from the vanishing gradient issue. To tackle this issue, they implemented an LSTM-RNN; but still, the LSTM could not solve the vanishing gradient issue completely. The LSTM-RNN model had a higher precision compared to the initial state-of-the-art CNN. Asghar et al. [135] proposed a bidirectional LSTM (Bi-LSTM) with a CNN for rumor detection. The model preserves the sequence information in both directions. The Bi-LSTM layer is effective in remembering long-term dependencies. Even though the BiLSTM-CNN beat the other models, the suggested approach is computationally expensive.

A study by Ruchansky et al. [123] suggested a model called CSI, which comprises three modules: Capture, Score, and Integrate. The capture module extracts features from the article, and the score module extracts features from the user. Then, by integrating article and user-based features, the CSI model performs the prediction for fake news detection. The CSI model has fewer parameters than other RNN-based models. Another study by Sahoo and Gupta [136] proposed an approach with both user profile and news content features for detecting false news on Facebook. The authors used LSTM to identify fake news, and a set of new features is extracted by crawling Facebook and using the Facebook API. It requires more time to train and test the suggested model. Liao et al. [137] proposed a novel model called fake news detection multi-task learning (FDML). The model explores the influence of topic labels for fake news while also using contextual news information to improve detection performance on short false news. The FDML model, in particular, is made up of representation learning and multi-task learning components that train both the false news detection task and the news topic categorization task at the same time. However, the performance of the model decreases without the author information.

2) GATED RECURRENT UNIT (GRU)
In terms of structure and capabilities, the GRU is comparatively simpler and more proficient than the LSTM. This is because there are only two gates, namely reset and update. The GRU manages the information flow in the same manner as the LSTM unit does, but without the use of a memory unit. It exposes the entire hidden content without any control. When it comes to learning long-term dependencies, the quality of the GRU is better than that of the LSTM. Hence, it is a promising candidate for NLP applications [41]. GRUs are more straightforward as well as more proficient compared to LSTM. The GRU is still in its early stages; thus, it has only recently been used to identify false news. The GRU is a newer algorithm with a performance comparable to that of the LSTM but greater computational efficiency. Li et al. [134] used a deep bidirectional GRU neural network (a two-layer bidirectional GRU) as a rumor detection model. The model suffers from slow convergence. Deepak and Chitturi [41] showed that it is difficult to determine whether one of the gated RNNs (LSTM, GRU) is more successful, and they are usually chosen on the basis of the available computing resources. Girgis et al. [96] experimented with CNN, LSTM, vanilla RNN, and GRU. The vanilla RNN suffers from a vanishing gradient problem, but the GRU solves this issue. Though the GRU achieved the best outcome in their study, it takes more training time. A bidirectional GRU was utilized by Singhania et al. [87] for word-by-word annotation. With preceding and subsequent words, it captures the word's meaning within the sentence. A study by Shu et al. [100] proposed a sentence-comment co-attention subnetwork model named dEFEND (Explainable fake news detection) utilizing news content and user comments for fake news detection. The authors considered textual information with a bidirectional GRU (Bi-GRU) to achieve better performance. However, the model has a low learning efficiency.

C. GRAPH NEURAL NETWORK (GNN)
A Graph Neural Network is a form of neural network that operates on the graph structure directly. Node classification is a common application of GNNs. Essentially, every node in the network has a label, and the network predicts the labels of the nodes without using the ground truth. The GNN extends recursive neural networks by processing a broader class of graphs, including cyclic, directed, and undirected graphs, and it can handle node-focused applications without requiring any pre-processing steps [138]. A GNN captures global structural features from graphs or trees better than the deep-learning models discussed above [139]. However, GNNs are prone to noise in the datasets: adding a small amount of noise to the graph via node perturbation or edge deletion and addition has an adverse effect on the GNN output. The graph convolutional network (GCN) is considered one of the basic GNN variants.

A study by Huang et al. [140] claimed to be the first to exploit the rich structure of user behavior for rumor detection. The user encoder uses graph convolutional networks (GCN) to learn a representation of the user from a graph created from user behavioral information. The authors used two recursive neural networks based on the tree structure: a bottom-up RvNN encoder and a top-down RvNN encoder. The tree structure is shown in Figure 8. The proposed model performed worse for the non-rumor class because user behavior information introduces some interference in non-rumor detection. Another study by Bian et al. [139] proposed a top-down GCN and a bottom-up GCN using a novel method, DropEdge [141], for reducing over-fitting of GCNs. In addition, a root feature enhancement operation is utilized to improve the performance of rumor detection. Although it performed well on three datasets (Weibo, Twitter15, Twitter16), the outliers in the dataset affected the models' performance.
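To make the graph-convolutional aggregation behind these rumor-detection models concrete, the following is a minimal single-layer GCN sketch in PyTorch. It is a didactic simplification operating on a toy propagation graph, not the Bi-GCN, PGNN, or SAGNN architectures of the cited studies.

```python
# Minimal single GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
# A didactic sketch of neighborhood aggregation on a toy propagation graph.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # Add self-loops so each node keeps its own features.
        a_hat = adj + torch.eye(adj.size(0))
        # Symmetric degree normalization of the adjacency matrix.
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        # Aggregate neighbor features, then apply the linear transform.
        return torch.relu(self.linear(norm_adj @ feats))

# Toy who-replies-to-whom graph: 4 posts with 8-dimensional node features.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 1.],
                    [0., 1., 0., 0.],
                    [0., 1., 0., 0.]])
feats = torch.randn(4, 8)
layer = GCNLayer(8, 16)
print(layer(adj, feats).shape)  # torch.Size([4, 16])
```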
On the other hand, GCNs incur a significant memory footprint in storing the complete adjacency matrix. Furthermore, GCNs are transductive, which implies that the nodes to be inferred must be present at training time, and they do not guarantee generalizable representations [142]. Wu et al. [143] proposed a representation-learning algorithm with a gated graph neural network named PGNN (propagation graph neural network). The suggested technique can incorporate structural and textual features into high-level representations by propagating information among neighbor nodes throughout the propagation network. In order to obtain considerable performance improvements, they also added an attention mechanism. The propagation graph is built using the who-replies-to-whom structure, but the follower-followee and forward relationships are omitted. Zhang et al. [144] presented a simplified aggregation graph neural network (SAGNN) based on efficient aggregation layers. Experiments on publicly accessible Twitter datasets show that the proposed network outperforms state-of-the-art graph convolutional networks while considerably lowering computational costs.

D. GENERATIVE ADVERSARIAL NETWORK (GAN)
Generative Adversarial Networks (GANs) are deep learning-based generative models. The GAN model architecture consists of two sub-models: a generator model for creating new instances and a discriminator model for determining whether the produced examples are genuine or fake, i.e., generated by the generator model. Existing adversarial networks are often employed to create images that may be matched to observed samples using a minimax game framework [44]. The generator model produces new images, resembling the original images, from the features learned from the training data. The discriminator model predicts whether the generated image is fake or real. GANs are extremely successful in generative modeling and are used to train discriminators in a semi-supervised context to assist in eliminating human participation in data labeling. Furthermore, GANs are useful when the data have imbalanced classes or underrepresented samples. GANs produce synthetic data only when the data are based on continuous values. Standard GANs are therefore inapplicable to NLP data, because text is based on discrete values such as words, letters, or bytes [145]. To train GANs for text data, novel techniques are required.

A study by Long [145] proposed sequence GAN (SeqGAN), which is a GAN architecture that overcomes the problem of gradient descent in GANs for discrete outputs by employing a reinforcement learning (RL)-based approach and Monte Carlo search. The authors provide actual news content to the GAN. Then, a classifier based on Google's BERT model was trained to identify the real samples from the samples generated by the GAN. The architecture of SeqGAN is provided in Figure 9.

The principle of adversarial learning originated in generative adversarial networks. The adversarial learning concept has produced outstanding results in a wide range of topics, including information retrieval [146], text classification [147], and network embedding [148]. A unique problem in detecting fake news is the recognition of false news on recently emergent events on social media. To solve this problem, Wang et al. [44] suggested an end-to-end architecture called the event adversarial neural network (EANN). This architecture is used to extract event-invariant characteristics and, therefore, aids in the identification of false news on newly incoming events. It is made up of three major components: a multimodal feature extractor, a fake news detector, and an event discriminator. Another study by Le et al. [149] introduced Malcom, which generates malicious comments that fooled five popular fake news detectors (CSI, dEFEND, etc.) into classifying fake news as real news with 94% and 90% attack success rates. The authors showed that existing methods are not resilient against potential attacks. Though the model performed well, it was not evaluated against defense mechanisms such as adversarial learning.

E. ATTENTION MECHANISM BASED
The attention-related approach is another notable advancement. In deep neural networks, the attention mechanism is an effort to implement the same behavior of selectively focusing on a few important items while ignoring others. Attention is a bridge that connects the encoder and decoder, providing information to the decoder from each of the encoder's hidden states. Using this framework, the model selectively concentrates on the valuable components of the input and can thus discover the associations among them. This allows the model to deal with lengthy input sentences more effectively. Unlike RNNs or CNNs, attention mechanisms maintain word dependencies in a sentence regardless of the distance between the words. The primary downside of the attention mechanism is that it adds additional weight parameters to the model, which might lengthen the training time, especially if the model's input data are long sequences.

A study by Long [150] proposed an attention-based LSTM with speaker profile features, and their experimental findings suggest that employing speaker profiles can help enhance fake news identification. Recently, attention techniques have been used to efficiently extract information related to a mini query (article headline) from a long text (news content) [47], [87]. A study by Singhania et al. [87] used an automated detector based on a three-level hierarchical attention network (3HAN). Three levels exist in 3HAN: one for words, one for sentences, and one for the headline. Because of its three levels of attention, 3HAN assigns different weights to different sections of an article. In contrast to other deep learning models, 3HAN yields understandable results. While 3HAN only uses textual information, a study by Jin et al. [47] used image features, including social context and text features, as well as attention on an RNN (att-RNN). Another study used RNNs with a soft-attention mechanism to filter out unique linguistic features [151]. However, this method is based on distinct domain and community features without any external evidence. Thus, it provides a restricted context for credibility analysis.
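As a concrete companion to this description, the following is a minimal PyTorch sketch of the generic scaled dot-product attention operation. The headline-as-query setup is an illustrative assumption in the spirit of the headline/content matching described above, not the exact att-RNN or 3HAN formulation.

```python
# Minimal scaled dot-product attention, the generic building block behind the
# attention-based detectors discussed above (not the exact att-RNN or 3HAN models).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    # query: (batch, q_len, d); key and value: (batch, k_len, d)
    d = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d ** 0.5  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                 # attention distribution over the input
    return weights @ value, weights                     # weighted summary and the weights

# Example: attend over a 20-token news item with a single headline query vector.
tokens = torch.randn(1, 20, 64)    # encoded news content
headline = torch.randn(1, 1, 64)   # encoded headline acting as the query
context, attn = scaled_dot_product_attention(headline, tokens, tokens)
print(context.shape, attn.shape)   # torch.Size([1, 1, 64]) torch.Size([1, 1, 20])
```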
FIGURE 8. This figure illustrates the propagation tree structure encoder taken from Huang et al. [140].
FIGURE 10. The BERT architecture taken from Devlin et al. [89].
the proposed model named exBAKE (BERT with extra unlabeled news corpora) outperformed by a 0.137 F1-score. Ding et al. [154] discovered that including mental features such as a speaker's credit history at the language level might considerably improve BERT model performance. The history feature further helps construct the relationship between the event and the person in reality. However, these studies did not consider any pre-processing methods.

Zhang et al. [91] presented a BERT-based domain-adaptation neural network for multimodal false news detection (BDANN). BDANN is made up of three major components: a multimodal feature extractor, a domain classifier, and a false news detector. The pre-trained BERT model was used to extract text features, whereas the pre-trained VGG-19 model was used to extract image features in the multimodal feature extractor. The extracted features are then concatenated and sent to the detector to differentiate between fake and real news. However, the existence of noisy images in the Weibo dataset has affected the BDANN results. Kaliyar et al. [92] proposed a BERT-based deep convolutional approach (fakeBERT) for fake news detection. FakeBERT combines BERT with different parallel blocks of a one-dimensional deep convolutional neural network (1d-CNN) that have different kernel sizes and numbers of filters. Different filters can extract useful information from the training dataset. The combination of BERT with the 1d-CNN can deal with both large-scale structured and unstructured text. Therefore, the combination is beneficial in dealing with ambiguity.

G. ENSEMBLE APPROACH
Ensemble approaches are strategies that generate several models and combine them to achieve better results. Ensemble models typically yield more precise solutions than a single model does. An ensemble reduces the spread or dispersion of the predictions and improves model efficiency. Ensembling can be applied to supervised and unsupervised learning activities [86]. Many researchers have used an ensemble approach to boost their performance [42], [133]. Agarwal and Dixit [63] combined two datasets, namely Liar and Kaggle, to evaluate the performance of LSTM and achieved an accuracy of 97%. They also used various models such as CNN, LSTM, SVM, naive Bayes (NB), and k-nearest neighbor (KNN) for building an ensemble model. The authors showed the average accuracy of the algorithms they used but did not show the accuracy of their ensemble model, which is a limitation of their work.

Often, the CNN-LSTM ensemble approach has been used in previous DL-based studies. Kaliyar [67] used an ensemble of CNN and LSTM, and the accuracy was slightly lower than that of the state-of-the-art CNN model. However, the precision and recall were effectively improved. Asghar et al. [135] obtained an increase in the efficiency of their model by using a Bi-LSTM. The Bi-LSTM retains knowledge from both former and upcoming contexts before rendering its input to the CNN model. Even though CNNs and RNNs typically require huge datasets to function successfully, Ajao et al. [133] trained an LSTM-CNN with a smaller dataset. The above-mentioned works considered just text-based features for fake news classification, whereas the addition of new features may generate a more significant result. While most studies used a CNN with an LSTM, a study by Amine et al. [131] merged two convolutional neural networks to integrate metadata with text. They illustrate that integrating metadata with text results in substantial improvements in fine-grained fake news detection. Furthermore, when tested on real-world datasets, this approach shows improvements compared to the text-only deep learning model. Moving further, Kumar et al. [86] employed an attention layer. It assists the CNN+LSTM model in learning to pay attention to particular regions of the input sequences rather than the full series of input sequences. Utilizing the attention mechanism with CNN+LSTM was reported to improve results by a small margin. A result analysis of the DL-based studies is presented in Table 7.
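To illustrate the kind of stacked CNN+LSTM text classifier repeatedly referenced above, the following is a minimal Keras sketch. The vocabulary size, sequence length, filter count, kernel size, and dropout rate are illustrative assumptions, not the hyperparameters of any cited study.

```python
# Minimal Keras sketch of a CNN+LSTM fake news classifier in the spirit of the
# combinations surveyed above; all hyperparameters are illustrative choices.
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 20000, 300, 100   # assumed, not from any cited study

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),                               # padded token-id sequences
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),                      # learned word embeddings
    layers.Conv1D(filters=100, kernel_size=5, activation="relu"), # local n-gram features
    layers.MaxPooling1D(pool_size=2),                             # downsample the feature maps
    layers.LSTM(64),                                              # long-range sequence modeling
    layers.Dropout(0.5),                                          # regularization against overfitting
    layers.Dense(1, activation="sigmoid"),                        # fake (1) vs. real (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```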
TABLE 6. The table contains the strength and limitation of popular existing studies with reference and used classifier.
TABLE 7. The table contains the result in accuracy of DL-based studies along with used method and NLP techniques.
whole two-dimensional field under the entire ROC curve. The FPR can be defined as in Equation (5):

FPR = FalsePositive / (FalsePositive + TrueNegative)    (5)

VII. CHALLENGES AND RESEARCH DIRECTION
Despite the fact that numerous studies have been conducted on the identification of fake news, there is always space for future advancement and investigation. In the sense of recognizing fake news, we highlight challenges and several unique exploration areas for future studies. Although DL-based methods provide higher accuracy compared to other methods, there is scope to make them more acceptable.
• The feature and classifier selection greatly influences the efficiency of the model. Previous studies did not place a high priority on the selection of features and classifiers. Researchers should focus on determining which classifier is most suitable for particular features. Long textual features require the use of sequence models (RNNs), but only limited research has taken this into account. We believe that studies that concentrate on the selection of features and classifiers might potentially improve performance.
• The feature engineering concept is not common in deep learning-based studies. News content and headline features are the widely used features in fake news detection, but several other features such as user behavior [154], user profile, and social network behavior need to be explored. Political or religious bias in profile features and lexical, syntactic, and statistical-based features can increase the detection rate. A fusion of deeply hidden text features with other statistical features may result in a better outcome.
• Propagation-based studies are scarce in this domain [117]. Network-based patterns of news propagation are a piece of information that has not been comprehensively utilized for fake news detection [159]. Thus, we suggest considering news propagation for fake news identification. Meta-data and additional information can increase the robustness and reduce the noise of a single textual claim, but they must be handled with caution.
• Studies have focused only on text data for fake news detection, whereas fake news is generated in sophisticated ways, with text or images that have been purposefully altered [95]. Only a few studies have used image features [109], [110]. Thus, we recommend the use of visual data (videos and images). Investigating video and image features would be a promising direction for building a stronger and more robust system.
• Studies that use a fusion of features are scarce in this domain [160]. Combining information from multiple sources may be extremely beneficial in detecting whether Internet articles are fake [95]. We suggest utilizing multi-model-based approaches with the latest pre-trained word embeddings. Many other hidden features may have a great impact on fake news detection. Hence, we encourage researchers to investigate hidden features.
• Fake news detection models that learn from newly emerging web articles in real time could enhance detection results. Another promising future work is the use of a transfer-learning approach for training a neural network with online data streams.
• More data covering a larger number of fake news items should be released, since the lack of data is the major problem in fake news classification. We assume that more training data will improve model performance.
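To complement the evaluation-metric definitions reviewed earlier (including the FPR of Equation (5)), the following is a minimal scikit-learn sketch computed on toy labels; the dummy predictions are purely illustrative.

```python
# Minimal sketch computing common fake news detection metrics, including the
# FPR of Equation (5), on illustrative dummy labels and predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fake, 0 = real (toy ground truth)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model outputs

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)                # Equation (5): FalsePositive / (FalsePositive + TrueNegative)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("FPR      :", fpr)
```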
[21] O. Ajao, D. Bhowmik, and S. Zargari, "Sentiment aware fake news detection on online social networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 2507–2511.
[22] B. Ghanem, P. Rosso, and F. Rangel, "An emotional analysis of false information in social media and news articles," ACM Trans. Internet Technol., vol. 20, no. 2, pp. 1–18, May 2020.
[23] A. Giachanou, P. Rosso, and F. Crestani, "Leveraging emotional signals for credibility detection," in Proc. 42nd Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2019, pp. 877–880.
[24] D. Khattar, J. S. Goud, M. Gupta, and V. Varma, "MVAE: Multimodal variational autoencoder for fake news detection," in Proc. World Wide Web Conf., May 2019, pp. 2915–2921.
[25] N. J. Conroy, V. L. Rubin, and Y. Chen, "Automatic deception detection: Methods for finding fake news," in Proc. 78th ASIST Annu. Meeting, Inf. Sci. Impact, Res. Community, vol. 52, no. 1, pp. 1–4, 2015.
[26] A. R. Pathak, A. Mahajan, K. Singh, A. Patil, and A. Nair, "Analysis of techniques for rumor detection in social media," Proc. Comput. Sci., vol. 167, pp. 2286–2296, Jan. 2020.
[27] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, "Detecting rumors from microblogs with recurrent neural networks," in Proc. 25th Int. Joint Conf. Artif. Intell. (IJCAI), Res. Collection School Comput. Inf. Syst., 2016, pp. 3818–3824.
[28] J. Ma, W. Gao, and K.-F. Wong, "Detect rumors in microblog posts using propagation structure via kernel learning," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics (ACL), Vancouver, BC, Canada: Res. Collection School Comput. Inf. Syst., Jul./Aug. 2017, pp. 708–717.
[29] W. Y. Wang, "'Liar, liar pants on fire': A new benchmark dataset for fake news detection," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, Vancouver, BC, Canada, Jul. 2017, pp. 422–426. [Online]. Available: https://www.aclweb.org/anthology/P17-2067
[30] A. Zubiaga, M. Liakata, and R. Procter, "Learning reporting dynamics during breaking news for rumour detection in social media," 2016, arXiv:1610.07363.
[31] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media," Big Data, vol. 8, no. 3, pp. 171–188, Jun. 2020.
[32] M. Amjad, G. Sidorov, A. Zhila, H. Gómez-Adorno, I. Voronkov, and A. Gelbukh, "'Bend the truth': Benchmark dataset for fake news detection in Urdu language and its evaluation," J. Intell. Fuzzy Syst., vol. 39, no. 2, pp. 2457–2469, 2020.
[33] E. Tacchini, G. Ballarin, M. L. Della Vedova, S. Moret, and L. de Alfaro, "Some like it hoax: Automated fake news detection in social networks," 2017, arXiv:1704.07506.
[34] C. Boididou, S. Papadopoulos, and M. Zampoglou, "Detection and visualization of misleading content," Int. J. Multimedia Inf. Retr., vol. 7, no. 1, pp. 71–86, 2018.
[35] J. Golbeck, M. Mauriello, B. Auxier, K. H. Bhanushali, C. Bonk, M. A. Bouzaghrane, C. Buntain, R. Chanduka, P. Cheakalos, J. B. Everett, and W. Falak, "Fake news vs satire: A dataset and analysis," in Proc. 10th ACM Conf. Web Sci., 2018, pp. 17–21.
[36] P. M. Waszak, W. Kasprzycka-Waszak, and A. Kubanek, "The spread of medical fake news in social media—The pilot quantitative study," Health Policy Technol., vol. 7, no. 2, pp. 115–118, Jun. 2018.
[37] (2020). The Year of Fake News Covid Related Scams and Ransomware. Accessed: Mar. 12, 2021. [Online]. Available: https://www.prnewswire.com/news-releases/2020-the-year-of-fake-news-covid-related-scams-and-ransomware-301180568
[38] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context and spatial-temporal information for studying fake news on social media," 2018, arXiv:1809.01286.
[39] Y.-C. Ahn and C.-S. Jeong, "Natural language contents evaluation system for detecting fake news using deep learning," in Proc. 16th Int. Joint Conf. Comput. Sci. Softw. Eng. (JCSSE), Jul. 2019, pp. 289–292.
[40] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, "FNDNet—A deep convolutional neural network for fake news detection," Cognit. Syst. Res., vol. 61, pp. 32–44, Jun. 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389041720300085
[41] S. Deepak and B. Chitturi, "Deep neural approach to Fake-News identification," Proc. Comput. Sci., vol. 167, pp. 2236–2243, Jan. 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050920307420
[42] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W. On, "Fake news stance detection using deep learning architecture (CNN-LSTM)," IEEE Access, vol. 8, pp. 156695–156706, 2020.
[43] N. Aslam, I. U. Khan, F. S. Alotaibi, L. A. Aldaej, and A. K. Aldubaikil, "Fake detect: A deep learning ensemble model for fake news detection," Complexity, vol. 2021, pp. 1–8, Apr. 2021.
[44] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, "EANN: Event adversarial neural networks for multi-modal fake news detection," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 849–857.
[45] A. Alsaeedi and M. Al-Sarem, "Detecting rumors on social media based on a CNN deep learning technique," Arabian J. Sci. Eng., vol. 45, no. 12, pp. 1–32, 2020.
[46] A. Thota, P. Tilak, S. Ahluwalia, and N. Lohia, "Fake news detection: A deep learning approach," SMU Data Sci. Rev., vol. 1, no. 3, p. 10, 2018.
[47] Z. Jin, J. Cao, H. Guo, Y. Zhang, and J. Luo, "Multimodal fusion with recurrent neural networks for rumor detection on microblogs," in Proc. 25th ACM Int. Conf. Multimedia, Oct. 2017, pp. 795–816.
[48] R. R. Mandical, N. Mamatha, N. Shivakumar, R. Monica, and A. N. Krishna, "Identification of fake news using machine learning," in Proc. IEEE Int. Conf. Electron., Comput. Commun. Technol. (CONECCT), Jul. 2020, pp. 1–6.
[49] S. S. Jadhav and S. D. Thepade, "Fake news identification and classification using DSSM and improved recurrent neural network classifier," Appl. Artif. Intell., vol. 33, no. 12, pp. 1058–1068, Oct. 2019, doi: 10.1080/08839514.2019.1661579.
[50] A. S. K. Shu, D. M. K. Shu, L. G. M. Mittal, L. G. M. Mittal, and M. M. J. K. Sethi, "Fake news detection using a blend of neural networks: An application of deep learning," Social Netw. Comput. Sci., vol. 1, no. 3, pp. 1–9, Jan. 1970. [Online]. Available: https://link.springer.com/article/10.1007/s42979-020-00165-4
[51] A. P. S. Bali, M. Fernandes, S. Choubey, and M. Goel, "Comparative performance of machine learning algorithms for fake news detection," in Proc. Int. Conf. Adv. Comput. Data Sci. Switzerland: Springer, 2019, pp. 420–430.
[52] A. Rusli, J. C. Young, and N. M. S. Iswari, "Identifying fake news in Indonesian via supervised binary text classification," in Proc. IEEE Int. Conf. Ind. 4.0, Artif. Intell., Commun. Technol. (IAICT), Jul. 2020, pp. 86–90.
[53] V. Tiwari, R. G. Lennon, and T. Dowling, "Not everything you read is true! Fake news detection using machine learning algorithms," in Proc. 31st Irish Signals Syst. Conf. (ISSC), Jun. 2020, pp. 1–4.
[54] A. Verma, V. Mittal, and S. Dawn, "FIND: Fake information and news detections using deep learning," in Proc. 12th Int. Conf. Contemp. Comput. (IC), Aug. 2019, pp. 1–7.
[55] M. Z. Hossain, M. A. Rahman, M. S. Islam, and S. Kar, "BanFakeNews: A dataset for detecting fake news in Bangla," in Proc. 12th Lang. Resour. Eval. Conf. Marseille, France: European Language Resources Association, May 2020, pp. 2862–2871. [Online]. Available: https://www.aclweb.org/anthology/2020.lrec-1.349
[56] P. Savyan and S. M. S. Bhanu, "UbCadet: Detection of compromised accounts in Twitter based on user behavioural profiling," Multimedia Tools Appl., vol. 79, pp. 1–37, Jul. 2020.
[57] J. Kapusta and J. Obonya, "Improvement of misleading and fake news classification for flective languages by morphological group analysis," in Informatics, vol. 7, no. 1. Switzerland: Multidisciplinary Digital Publishing Institute, 2020, p. 4.
[58] S. Hakak, M. Alazab, S. Khan, T. R. Gadekallu, P. K. R. Maddikunta, and W. Z. Khan, "An ensemble machine learning approach through effective feature extraction to classify fake news," Future Gener. Comput. Syst., vol. 117, pp. 47–58, Apr. 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X20330466
[59] M. G. Hussain, M. Rashidul Hasan, M. Rahman, J. Protim, and S. A. Hasan, "Detection of Bangla fake news using MNB and SVM classifier," in Proc. Int. Conf. Comput., Electron. Commun. Eng. (iCCECE), Aug. 2020, pp. 81–85.
[60] G. Gravanis, A. Vakali, K. Diamantaras, and P. Karadais, "Behind the cues: A benchmarking study for fake news detection," Expert Syst. Appl., vol. 128, pp. 201–213, Aug. 2019.
[61] P. Bahad, P. Saxena, and R. Kamal, "Fake news detection using bi-directional LSTM-recurrent neural network," Proc. Comput. Sci., vol. 165, pp. 74–82, Jan. 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050920300806
[62] E. Qawasmeh, M. Tawalbeh, and M. Abdullah, "Automatic identification of fake news using deep learning," in Proc. 6th Int. Conf. Social Netw. Anal., Manage. Secur. (SNAMS), Oct. 2019, pp. 383–388.
[63] A. Agarwal and A. Dixit, "Fake news detection: An ensemble learning approach," in Proc. 4th Int. Conf. Intell. Comput. Control Syst. (ICICCS), May 2020, pp. 1178–1183.
[64] S. M. Padnekar, G. S. Kumar, and P. Deepak, "BiLSTM-autoencoder architecture for stance prediction," in Proc. Int. Conf. Data Sci. Eng. (ICDSE), Dec. 2020, pp. 1–5.
[65] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," in Proc. IEEE 1st Ukraine Conf. Electr. Comput. Eng. (UKRCON), May 2017, pp. 900–903.
[66] A. Jain and A. Kasbe, "Fake news detection," in Proc. IEEE Int. Students' Conf. Electr., Electron. Comput. Sci. (SCEECS), 2018, pp. 1–5.
[67] R. K. Kaliyar, "Fake news detection using a deep neural network," in Proc. 4th Int. Conf. Comput. Commun. Autom. (ICCCA), Dec. 2018, pp. 1–7.
[68] G. Bhatt, A. Sharma, S. Sharma, A. Nagpal, B. Raman, and A. Mittal, "Combining neural, statistical and external features for fake news stance identification," in Proc. Companion Web Conf. (WWW), 2018, pp. 1353–1357, doi: 10.1145/3184558.3191577.
[69] F. A. Ozbay and B. Alatas, "Fake news detection within online social media using supervised artificial intelligence algorithms," Phys. A, Stat. Mech. Appl., vol. 540, Feb. 2020, Art. no. 123174.
[70] B. Al-Ahmad, A. M. Al-Zoubi, R. A. Khurma, and I. Aljarah, "An evolutionary fake news detection method for COVID-19 pandemic information," Symmetry, vol. 13, no. 6, p. 1091, Jun. 2021.
[71] S. Shabani and M. Sokhn, "Hybrid machine-crowd approach for fake news detection," in Proc. IEEE 4th Int. Conf. Collaboration Internet Comput. (CIC), Oct. 2018, pp. 299–306.
[72] C. M. M. Kotteti, X. Dong, N. Li, and L. Qian, "Fake news detection enhancement with data imputation," in Proc. IEEE 16th Int. Conf. Dependable, Autonomic Secure Comput., 16th Int. Conf. Pervasive Intell. Comput., 4th Int. Conf. Big Data Intell. Comput. Cyber Sci. Technol. Congr. (DASC/PiCom/DataCom/CyberSciTech), Aug. 2018, pp. 187–192.
[73] X. Zhou, A. Jain, V. V. Phoha, and R. Zafarani, "Fake news early detection: A theory-driven model," Digit. Threats, Res. Pract., vol. 1, no. 2, pp. 1–25, Jul. 2020.
[74] P. H. A. Faustini and T. F. Covões, "Fake news detection in multiple platforms and languages," Expert Syst. Appl., vol. 158, Nov. 2020, Art. no. 113503.
[75] H. Jwa, D. Oh, K. Park, J. Kang, and H. Lim, "ExBAKE: Automatic fake news detection model based on bidirectional encoder representations from transformers (BERT)," Appl. Sci., vol. 9, no. 19, p. 4062, Sep. 2019.
[76] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[77] F. C. Fernández-Reyes and S. Shinde, "Evaluating deep neural networks for automatic fake news detection in political domain," in Proc. Ibero-Amer. Conf. Artif. Intell., Nov. 2018, pp. 206–216. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-03928-8_17
[78] C. K. Hiramath and G. C. Deshpande, "Fake news detection using deep learning techniques," in Proc. 1st Int. Conf. Adv. Inf. Technol. (ICAIT), Jul. 2019, pp. 411–415.
[79] A. P. B. Veyseh, M. T. Thai, T. H. Nguyen, and D. Dou, "Rumor detection in social networks via deep contextual modeling," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2019, pp. 113–120.
[80] M. Bugueño, G. Sepulveda, and M. Mendoza, "An empirical analysis of
[84] S. Helmstetter and H. Paulheim, "Weakly supervised learning for fake news detection on Twitter," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2018, pp. 274–277.
[85] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532–1543.
[86] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep learning models: A novel approach," Trans. Emerg. Telecommun. Technol., vol. 31, no. 2, p. e3767, Feb. 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.3767
[87] S. Singhania, N. Fernandez, and S. Rao, "3HAN: A deep neural network for fake news detection," in Proc. Int. Conf. Neural Inf. Process. Switzerland: Springer, 2017, pp. 572–581.
[88] J. A. Nasir, O. S. Khan, and I. Varlamis, "Fake news detection: A hybrid CNN-RNN based deep learning approach," Int. J. Inf. Manage. Data Insights, vol. 1, no. 1, Apr. 2021, Art. no. 100007.
[89] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[90] S. Kula, M. Choraś, and R. Kozik, "Application of the BERT-based architecture in fake news detection," in Proc. Comput. Intell. Secur. Inf. Syst. Conf. Switzerland: Springer, 2019, pp. 239–249.
[91] T. Zhang, D. Wang, H. Chen, Z. Zeng, W. Guo, C. Miao, and L. Cui, "BDANN: BERT-based domain adaptation neural network for multimodal fake news detection," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
[92] R. K. Kaliyar, A. Goswami, and P. Narang, "FakeBERT: Fake news detection in social media with a BERT-based deep learning approach," Multimedia Tools Appl., vol. 80, no. 8, pp. 11765–11788, Mar. 2021.
[93] W. Shishah, "Fake news detection using BERT model with joint learning," Arabian J. Sci. Eng., vol. 46, pp. 1–13, Jun. 2021.
[94] H. Yuan, J. Zheng, Q. Ye, Y. Qian, and Y. Zhang, "Improving fake news detection with domain-adversarial and graph-attention neural network," Decis. Support Syst., vol. 151, Dec. 2021, Art. no. 113633.
[95] A. Giachanou, G. Zhang, and P. Rosso, "Multimodal multi-image fake news detection," in Proc. IEEE 7th Int. Conf. Data Sci. Adv. Anal. (DSAA), Oct. 2020, pp. 647–654.
[96] S. Girgis, E. Amer, and M. Gadallah, "Deep learning algorithms for detecting fake news in online text," in Proc. 13th Int. Conf. Comput. Eng. Syst. (ICCES), Dec. 2018, pp. 93–97.
[97] H. Reddy, N. Raj, M. Gala, and A. Basava, "Text-mining-based fake news detection using ensemble methods," Int. J. Autom. Comput., vol. 17, pp. 1–12, Apr. 2020.
[98] K. Shu, S. Wang, and H. Liu, "Understanding user profiles on social media for fake news detection," in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Apr. 2018, pp. 430–435.
[99] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro, and L. de Alfaro, "Automatic online fake news detection combining content and social signals," in Proc. 22nd Conf. Open Innov. Assoc. (FRUCT), May 2018, pp. 272–279.
[100] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, "DEFEND: Explainable fake news detection," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 395–405.
[101] M. Potthast, J. Kiesel, K. Reinartz, J. Bevendorff, and B. Stein, "A stylometric inquiry into hyperpartisan and fake news," 2017, arXiv:1702.05638.
[102] X. Zhang, J. Cao, X. Li, Q. Sheng, L. Zhong, and K. Shu, "Mining dual emotion for fake news detection," 2019, arXiv:1903.01728.
[103] S. Hosseinimotlagh and E. E. Papalexakis, "Unsupervised content-
rumor detection on microblogs with recurrent neural networks,’’ in Proc. based identification of fake news articles with tensor decomposition
Int. Conf. Hum.-Comput. Interact., Jul. 2019, pp. 293–310. [Online]. ensembles,’’ in Proc. Workshop Misinformation Misbehavior Mining Web
Available: https://link.springer.com/chapter/10.1007/978-3-030-21902- (MIS), 2018, pp. 1–8.
4_21 [104] R. K. Kaliyar, A. Goswami, and P. Narang, ‘‘DeepFakE: Improving fake
[81] E. Providel and M. Mendoza, ‘‘Using deep learning to detect rumors news detection using tensor decomposition-based deep neural network,’’
in Twitter,’’ in Proc. Int. Conf. Hum.-Comput. Interact. Switzerland: J. Supercomput., vol. 77, no. 2, pp. 1015–1037, Feb. 2021.
Springer, 2020, pp. 321–334. [105] R. K. Kaliyar, A. Goswami, and P. Narang, ‘‘EchoFakeD: Improving fake
[82] Q. Le and T. Mikolov, ‘‘Distributed representations of sentences and news detection in social media with an efficient deep neural network,’’
documents,’’ in Proc. Int. Conf. Mach. Learn., 2014, pp. 1188–1196. Neural Comput. Appl., vol. 33, pp. 1–17, Jan. 2021.
[83] S. Sangamnerkar, R. Srinivasan, M. R. Christhuraj, and R. Sukumaran, [106] M. Dong, L. Yao, X. Wang, B. Benatallah, Q. Z. Sheng, and H. Huang,
‘‘An ensemble technique to detect fabricated news article using machine ‘‘DUAL: A deep unified attention model with latent relation represen-
learning and natural language processing techniques,’’ in Proc. Int. Conf. tations for fake news detection,’’ in Proc. Int. Conf. Web Inf. Syst. Eng.
Emerg. Technol. (INCET), Jun. 2020, pp. 1–7. Switzerland: Springer, 2018, pp. 199–209.
[107] J. Zhang, B. Dong, and P. S. Yu, ‘‘FakeDetector: Effective fake news detection with deep diffusive neural network,’’ in Proc. IEEE 36th Int. Conf. Data Eng. (ICDE), Apr. 2020, pp. 1826–1829.
[108] H. Karimi, P. Roy, S. Saba-Sadiya, and J. Tang, ‘‘Multi-source multi-class fake news detection,’’ in Proc. 27th Int. Conf. Comput. Linguistics, 2018, pp. 1546–1557.
[109] D. Mangal and D. K. Sharma, ‘‘Fake news detection with integration of embedded text cues and image features,’’ in Proc. 8th Int. Conf. Rel., INFOCOM Technol. Optim., Trends Future Directions (ICRITO), Jun. 2020, pp. 68–72.
[110] P. Qi, J. Cao, T. Yang, J. Guo, and J. Li, ‘‘Exploiting multi-domain visual information for fake news detection,’’ in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2019, pp. 518–527.
[111] K. Shu, X. Zhou, S. Wang, R. Zafarani, and H. Liu, ‘‘The role of user profiles for fake news detection,’’ in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2019, pp. 436–439.
[112] H. Guo, J. Cao, Y. Zhang, J. Guo, and J. Li, ‘‘Rumor detection with hierarchical social attention network,’’ in Proc. 27th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2018, pp. 943–951.
[113] J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, ‘‘Explainable machine learning for fake news detection,’’ in Proc. 10th ACM Conf. Web Sci. (WebSci), New York, NY, USA, 2019, pp. 17–26, doi: 10.1145/3292522.3326027.
[114] J. Kim, B. Tabibian, A. Oh, B. Schölkopf, and M. Gomez-Rodriguez, ‘‘Leveraging the crowd to detect and reduce the spread of fake news and misinformation,’’ in Proc. 11th ACM Int. Conf. Web Search Data Mining, Feb. 2018, pp. 324–332.
[115] K. Popat, S. Mukherjee, A. Yates, and G. Weikum, ‘‘DeClarE: Debunking fake news and false claims using evidence-aware deep learning,’’ 2018, arXiv:1809.06416.
[116] T. Saikh, A. Anand, A. Ekbal, and P. Bhattacharyya, ‘‘A novel approach towards fake news detection: Deep learning augmented with textual entailment features,’’ in Proc. Int. Conf. Appl. Natural Lang. Inf. Syst. Switzerland: Springer, 2019, pp. 345–358.
[117] L. Wu and H. Liu, ‘‘Tracing fake-news footprints: Characterizing social media messages by how they propagate,’’ in Proc. 11th ACM Int. Conf. Web Search Data Mining, Feb. 2018, pp. 637–645.
[118] K. Shu, S. Wang, and H. Liu, ‘‘Beyond news contents: The role of social context for fake news detection,’’ in Proc. 12th ACM Int. Conf. Web Search Data Mining, Jan. 2019, pp. 312–320.
[119] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, ‘‘Fake news detection on social media using geometric deep learning,’’ 2019, arXiv:1902.06673.
[120] M. Albahar, ‘‘A hybrid model for fake news detection: Leveraging news content and user comments in fake news,’’ IET Inf. Secur., vol. 15, no. 2, pp. 169–177, Mar. 2021.
[121] B. Al Asaad and M. Erascu, ‘‘A tool for fake news detection,’’ in Proc. 20th Int. Symp. Symbolic Numeric Algorithms Sci. Comput. (SYNASC), Sep. 2018, pp. 379–386.
[122] S. Aphiwongsophon and P. Chongstitvatana, ‘‘Detecting fake news with machine learning method,’’ in Proc. 15th Int. Conf. Electr. Eng., Electron., Comput., Telecommun. Inf. Technol. (ECTI-CON), Jul. 2018, pp. 528–531.
[123] N. Ruchansky, S. Seo, and Y. Liu, ‘‘CSI: A hybrid deep model for fake news detection,’’ in Proc. ACM Conf. Inf. Knowl. Manage., New York, NY, USA, Nov. 2017, pp. 797–806, doi: 10.1145/3132847.3132877.
[124] Y. Yang, L. Zheng, J. Zhang, Q. Cui, Z. Li, and P. S. Yu, ‘‘TI-CNN: Convolutional neural networks for fake news detection,’’ CoRR, vol. abs/1806.00749, pp. 1–11, Jun. 2018.
[125] T. O’Shea and J. Hoydis, ‘‘An introduction to deep learning for the physical layer,’’ IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[126] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, ‘‘Mobile encrypted traffic classification using deep learning: Experimental evaluation, lessons learned, and challenges,’’ IEEE Trans. Netw. Service Manag., vol. 16, no. 2, pp. 445–458, Feb. 2019.
[127] P. Yildirim and D. Birant, ‘‘The relative performance of deep learning and ensemble learning for textile object classification,’’ in Proc. 3rd Int. Conf. Comput. Sci. Eng. (UBMK), Sep. 2018, pp. 22–26.
[128] D. Shen, G. Wu, and H. Suk, ‘‘Deep learning in medical image analysis,’’ Annu. Rev. Biomed. Eng., vol. 19, pp. 221–248, Jun. 2017.
[129] M. Veres and M. Moussa, ‘‘Deep learning for intelligent transportation systems: A survey of emerging trends,’’ IEEE Trans. Intell. Transp. Syst., vol. 21, no. 8, pp. 3152–3168, Aug. 2020.
[130] U. Kamath, J. Liu, and J. Whitaker, Deep Learning for NLP and Speech Recognition, vol. 84. Switzerland: Springer, 2019.
[131] B. M. Amine, A. Drif, and S. Giordano, ‘‘Merging deep learning model for fake news detection,’’ in Proc. Int. Conf. Adv. Electr. Eng. (ICAEE), Nov. 2019, pp. 1–4.
[132] Q. Li, Q. Hu, Y. Lu, Y. Yang, and J. Cheng, ‘‘Multi-level word features based on CNN for fake news detection in cultural communication,’’ Pers. Ubiquitous Comput., vol. 24, no. 2, pp. 1–14, 2019.
[133] O. Ajao, D. Bhowmik, and S. Zargari, ‘‘Fake news identification on Twitter with hybrid CNN and RNN models,’’ in Proc. 9th Int. Conf. Social Media Soc., New York, NY, USA, Jul. 2018, pp. 226–230, doi: 10.1145/3217804.3217917.
[134] L. Li, G. Cai, and N. Chen, ‘‘A rumor events detection method based on deep bidirectional GRU neural network,’’ in Proc. IEEE 3rd Int. Conf. Image, Vis. Comput., Jun. 2018, pp. 755–759.
[135] M. Z. Asghar, A. Habib, A. Habib, A. Khan, R. Ali, and A. Khattak, ‘‘Exploring deep neural networks for rumor detection,’’ J. Ambient Intell. Humanized Comput., vol. 12, no. 4, pp. 1–19, 2019.
[136] S. R. Sahoo and B. B. Gupta, ‘‘Multiple features based approach for automatic fake news detection on social networks using deep learning,’’ Appl. Soft Comput., vol. 100, Mar. 2021, Art. no. 106983.
[137] Q. Liao, H. Chai, H. Han, X. Zhang, X. Wang, W. Xia, and Y. Ding, ‘‘An integrated multi-task model for fake news detection,’’ IEEE Trans. Knowl. Data Eng., early access, Jan. 28, 2021, doi: 10.1109/TKDE.2021.3054993.
[138] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, ‘‘The graph neural network model,’’ IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2008.
[139] T. Bian, X. Xiao, T. Xu, P. Zhao, W. Huang, Y. Rong, and A. Huang, ‘‘Rumor detection on social media with bi-directional graph convolutional networks,’’ in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 1, pp. 549–556.
[140] Q. Huang, C. Zhou, J. Wu, M. Wang, and B. Wang, ‘‘Deep structure learning for rumor detection on Twitter,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–8.
[141] Y. Rong, W. Huang, T. Xu, and J. Huang, ‘‘DropEdge: Towards deep graph convolutional networks on node classification,’’ 2019, arXiv:1907.10903.
[142] Y. Ren, B. Wang, J. Zhang, and Y. Chang, ‘‘Adversarial active learning based heterogeneous graph neural network for fake news detection,’’ in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2020, pp. 452–461.
[143] Z. Wu, D. Pi, J. Chen, M. Xie, and J. Cao, ‘‘Rumor detection based on propagation graph neural network with attention mechanism,’’ Expert Syst. Appl., vol. 158, Nov. 2020, Art. no. 113595. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S095741742030419X
[144] L. Zhang, J. Li, B. Zhou, and Y. Jia, ‘‘Rumor detection based on SAGNN: Simplified aggregation graph neural networks,’’ Mach. Learn. Knowl. Extraction, vol. 3, no. 1, pp. 84–94, Jan. 2021. [Online]. Available: https://www.mdpi.com/2504-4990/3/1/5
[145] S. Hiriyannaiah, A. Srinivas, G. K. Shetty, G. Siddesh, and K. Srinivasa, ‘‘A computationally intelligent agent for detecting fake news using generative adversarial networks,’’ in Hybrid Computational Intelligence: Challenges and Applications. Amsterdam, The Netherlands: Elsevier, 2020, p. 69.
[146] J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, and D. Zhang, ‘‘IRGAN: A minimax game for unifying generative and discriminative information retrieval models,’’ in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Aug. 2017, pp. 515–524.
[147] Y. Li and J. Ye, ‘‘Learning adversarial networks for semi-supervised text classification via policy gradient,’’ in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 1715–1723.
[148] B. Hu, Y. Fang, and C. Shi, ‘‘Adversarial learning on heterogeneous information networks,’’ in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 120–129.
[149] T. Le, S. Wang, and D. Lee, ‘‘MALCOM: Generating malicious comments to attack neural fake news detection models,’’ in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2020, pp. 282–291.
[150] Y. Long, Q. Lu, R. Xiang, M. Li, and C.-R. Huang, ‘‘Fake news detection through multi-perspective speaker profiles,’’ in Proc. 8th Int. Joint Conf. Natural Lang. Process., vol. 2. Taipei, Taiwan: Asian Fed. Natural Lang. Process., Nov. 2017, pp. 252–256. [Online]. Available: https://aclanthology.org/I17-2043/
[151] T. Chen, X. Li, H. Yin, and J. Zhang, ‘‘Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection,’’ in Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Switzerland: Springer, 2018, pp. 40–52.
[152] N. Aloshban, ‘‘ACT: Automatic fake news classification through self-attention,’’ in Proc. 12th ACM Conf. Web Sci., Jul. 2020, pp. 115–124.
[153] Y.-J. Lu and C.-T. Li, ‘‘GCAN: Graph-aware co-attention networks for explainable fake news detection on social media,’’ 2020, arXiv:2004.11648.
[154] J. Ding, Y. Hu, and H. Chang, ‘‘BERT-based mental model, a better fake news detector,’’ in Proc. 6th Int. Conf. Comput. Artif. Intell., New York, NY, USA, Apr. 2020, pp. 396–400, doi: 10.1145/3404555.3404607.
[155] L. Wu, Y. Rao, H. Yu, Y. Wang, and A. Nazir, ‘‘False information detection on social media via a hybrid deep model,’’ in Proc. Int. Conf. Social Inform., Sep. 2018, pp. 323–333, doi: 10.1007/978-3-030-01159-8_31.
[156] A. Choudhary and A. Arora, ‘‘Linguistic feature based learning model for fake news detection and classification,’’ Expert Syst. Appl., vol. 169, May 2021, Art. no. 114171.
[157] D. K. Vishwakarma, D. Varshney, and A. Yadav, ‘‘Detection and veracity analysis of fake news via scrapping and authenticating the web search,’’ Cognit. Syst. Res., vol. 58, pp. 217–229, Dec. 2019.
[158] Z. Jin, J. Cao, Y. Zhang, and J. Luo, ‘‘News verification by exploiting conflicting social viewpoints in microblogs,’’ in Proc. 13th AAAI Conf. Artif. Intell. (AAAI), 2016, pp. 2972–2978.
[159] X. Zhou and R. Zafarani, ‘‘Fake news detection: An interdisciplinary research,’’ in Proc. Companion World Wide Web Conf., May 2019, p. 1292.
[160] R. Kumari and A. Ekbal, ‘‘AMFB: Attention based multimodal factorized bilinear pooling for multimodal fake news detection,’’ Expert Syst. Appl., vol. 184, Dec. 2021, Art. no. 115412.
[161] A. Nascita, A. Montieri, G. Aceto, D. Ciuonzo, V. Persico, and A. Pescapé, ‘‘XAI meets mobile traffic classification: Understanding and improving multimodal deep learning architectures,’’ IEEE Trans. Netw. Service Manage., early access, Jul. 19, 2021, doi: 10.1109/TNSM.2021.3098157.
[162] A. Adadi and M. Berrada, ‘‘Peeking inside the black-box: A survey on explainable artificial intelligence (XAI),’’ IEEE Access, vol. 6, pp. 52138–52160, 2018.

MD. ABDUL HAMID was born in Sonatola, Pabna, Bangladesh. He received the Bachelor of Engineering degree in computer and information engineering from the International Islamic University Malaysia (IIUM), in 2001, and the combined master’s and Ph.D. degree from the Computer Engineering Department, Kyung Hee University, South Korea, in August 2009, majoring in information communication. His education spans several countries; from 1989 to 1995, he completed his high school and college education at Rajshahi Cadet College, Bangladesh. He has been in the teaching profession throughout his career, in different parts of the globe. From 2002 to 2004, he was a Lecturer with the Computer Science and Engineering Department, Asian University of Bangladesh, Dhaka, Bangladesh. From 2009 to 2012, he was an Assistant Professor with the Department of Information and Communications Engineering, Hankuk University of Foreign Studies (HUFS), South Korea. From 2012 to 2013, he was an Assistant Professor with the Department of Computer Science and Engineering, Green University of Bangladesh. From 2013 to 2016, he was an Assistant Professor with the Department of Computer Engineering, Taibah University, Madinah, Saudi Arabia. From 2016 to 2017, he was an Associate Professor with the Department of Computer Science, Faculty of Science and Information Technology, American International University-Bangladesh, Dhaka. From 2017 to 2019, he was an Associate Professor and a Professor with the Department of Computer Science and Engineering, University of Asia Pacific, Dhaka. Since 2019, he has been a Professor with the Department of Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include network/cyber-security, natural language processing, machine learning, wireless communications, and networking protocols.