
Received 15 November 2023, accepted 29 November 2023, date of publication 5 December 2023, date of current version 15 December 2023.


Digital Object Identifier 10.1109/ACCESS.2023.3339621

Enhancing Fake News Detection by Multi-Feature Classification

AHMED HASHIM JAWAD ALMARASHY 1, MOHAMMAD-REZA FEIZI-DERAKHSHI 1, AND PEDRAM SALEHPOUR 2, (Member, IEEE)


1 ComInSys Laboratory, Department of Computer Engineering, University of Tabriz, Tabriz 51666, Iran
2 Department of Computer Engineering, University of Tabriz, Tabriz 51666, Iran
Corresponding author: Mohammad-Reza Feizi-Derakhshi ([email protected])

ABSTRACT The proliferation of social media platforms has significantly accelerated our access to news,
but it has also facilitated the rapid dissemination of fake news. Automatic fake news detection systems can
help solve this problem. Although there is much research in this area, getting an accurate detection system
is still a challenge. This article proposes a novel model to increase the accuracy of fake news detection.
The theory behind the proposed model is to extract and combine global, spatial, and temporal features of
text to use in a new fast classifier. The proposed model consists of two phases: first, global features are
extracted by TF-IDF, spatial features by a convolutional neural network (CNN), and temporal features by
bi-directional long short-term memory (BiLSTM) simultaneously. Then a fast learning network (FLN) is
used to efficiently classify the features. Extensive experiments were conducted using two publicly available
fake news datasets: ISOT and FA-KES. Because the two datasets differ in size, they allow a more thorough
evaluation of the proposed architecture (CNN+BiLSTM+FLN). Results demonstrate the proposed model's superiority
in comparison with previous works.

INDEX TERMS Fake news, social media platforms, distinguishing real and fake news, global,
temporal, spatial features, novel architecture: CNN+BiLSTM+FLN, convolutional neural network (CNN),
bi-directional long short-term memory (BiLSTM), fast learning network (FLN).

The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Guidi.
2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023, page 139601

I. INTRODUCTION
The exponential growth of online news platforms, including social media, digital news sources, and traditional print media, has facilitated the rapid dissemination of fake news. This issue arises from the ease of uploading content onto these platforms, leading to a significant portion of the global population relying on social media channels such as Twitter, Facebook, Instagram, and YouTube as their primary source of news and information. This reliance is particularly prominent in developing nations where access to traditional news outlets is limited. Consequently, individuals across different geographical locations exploit these ubiquitous social media platforms to spread false information through various networking channels, often with illicit intentions. The increasing prevalence of social media usage has profound impacts on society, commerce, and culture, yielding both advantageous and disadvantageous outcomes. One of the most significant risks associated with this phenomenon is the dissemination of false information, which can have far-reaching consequences, including undermining worldwide trade, journalism, and democratic processes. The issue of fake news gained significant attention in 2016 following the former presidential election in the United States [1]. For instance, a fabricated news story in 2013, falsely claiming that President Barack Obama had been injured in an explosion, resulted in a stock market deficit of 130 billion dollars [2]. Scholars at Stanford University have provided statistical data indicating that a substantial proportion of fake news, amounting to 72.3 percent, can be attributed to both traditional news sources and digital social media platforms. The detrimental impact of misinformation on the general populace is widely acknowledged and poses a formidable obstacle to global trade, journalism, and governance. Given the peril posed by the proliferation of fabricated information, it becomes imperative to develop effective mechanisms

for identifying fake news. Fake news encompasses various categories such as misinformation, disinformation, satire or parody, clickbait, conspiracy theories, political propaganda, and hoaxes, each with its own characteristics and intent. These categories may overlap, and sometimes it can be challenging to precisely classify fake news into a single category due to the complexity of the content. The impact of different categories of fake news on the performance of the detection system is an essential aspect that warrants thorough investigation. Addressing this concern provides a more nuanced understanding of the detection system's strengths and limitations through categorization of fake news, performance evaluation by category, comparison of categories, and analysis of the practical implications of performance variations across different categories [3].
The field of computer science is witnessing an upsurge in experimentation as Artificial Intelligence progresses rapidly. Researchers are now addressing the relatively new challenge of detecting fake news. Machine Learning (ML)-based automatic detection approaches have been extensively studied to combat the spread of fake news. These systems utilize ML techniques to aid consumers in evaluating the veracity of the content they encounter, enabling them to determine whether a given news piece is genuine or not. Recent advancements in Deep Learning (DL) techniques have further improved the effectiveness and efficiency of fake news detection, surpassing the capabilities of traditional ML methods.
The fake news detection process can be considered a classification system in artificial intelligence in which a classifier tries to classify input text into "fake" or "real" news. Previous research has aimed to create accurate classification systems, but achieving optimal performance remains an ongoing pursuit. This article focuses on developing a reliable and precise automated system for detecting fake news on social media.
In the proposed deep neural network architecture, the convolutional neural network (CNN) and bi-directional long short-term memory (BiLSTM) are employed to simultaneously extract spatial and temporal features. Also, the TF-IDF (Term Frequency-Inverse Document Frequency) technique is used to capture global features. These features are then combined and classified using a Fast Learning Network (FLN) [4]. The FLN algorithm, which is known for its dual parallel forward neural network configuration, demonstrates superior regression accuracy, generalization performance, stability, and quick convergence.
This approach ensures the extraction of a diverse range of features, making it highly effective in detecting fake news. Two well-known and publicly available fake news datasets, ISOT and FA-KES, are used in this study.
Both the title and body text of these two datasets are used as input in this system. Various evaluation metrics, including accuracy, recall, F1, and precision, are employed to assess the performance of the proposed method. The experimental results are presented using both tabular and graphical formats, aligning with the latest research findings for better comprehension.
The contributions of this paper are as follows:
Integration of CNN+BiLSTM and TF-IDF within the deep neural network architecture: The theory behind this integration is that CNN deals with spatial features, BiLSTM is well-known for extracting temporal features, and the traditional TF-IDF technique can extract global features of text. Using these three components at the same time gives the classifier a better view of the input text, resulting in improved detection accuracy. The results of experiments confirm that the effectiveness of detection increases with the incorporation of various features.
Using two datasets of different sizes: Testing the model on two different datasets, which have different sizes and features, helps us evaluate the method much better and shows its ability to deal with datasets of different scales and formats.
Utilization of the FLN algorithm: FLN is used for the classification phase to leverage the effectiveness of deep neural networks. The FLN algorithm is well-known for its superior regression accuracy, generalization performance, stability, and quick convergence. The results of experiments confirm that the FLN algorithm outperforms comparable methods in terms of regression accuracy, generalization performance, and stability while maintaining rapid convergence.
The subsequent sections of this work are organized as follows: Section II provides a comprehensive review of the relevant literature. Section III discusses the datasets and methodologies employed in this research. The experimental findings are presented in Section IV. Finally, Section V concludes the paper and provides recommendations for future work. The study uses two datasets, ISOT and FA-KES, to facilitate the research analysis.

II. LITERATURE REVIEW
The surge in deceptive content on social media has prompted researchers to intensify their efforts to find solutions. Several studies have been conducted in this field, and we highlight a selection of them below. Earlier studies in this field, like many other fields of NLP, were focused on probabilistic methods like SVM. However, after the introduction of deep learning, many researchers moved forward with DL. First, studies focused on comparing probabilistic methods and DL methods. Then they moved towards proposing more complicated DL models. For example, Article [5] provides a framework for selecting between DL and ML methods for problem-solving. It examines the accuracy of techniques like Naive Bayes and clustering and compares them against traditional methods. The study analyses the advantages and disadvantages of various DL techniques and their performance compared to conventional methods. In [6], a paradigm and procedure for identifying fake news are presented. The authors employ ML and NLP to gather news and utilize support vector machines to determine the veracity of the news. Reference [9] presents a system for identifying fake news based on feature extraction, feature selection, and vote classifiers. The


TABLE 1. Survey of several recent fake news detection publications.

proposed system distinguishes between fake and real news and outperforms previous works in terms of accuracy for the ISOT dataset. Deep learning models gain better performance in comparison with probabilistic models. The authors of [8] propose a method for automatically detecting fake news on Facebook using deep learning. They utilize Facebook account-related features and news content attributes to assess account behavior.
The suggested approach surpasses current techniques in terms of accuracy when evaluated on real-world data. To combat the issue of fake news, [10] suggests new methods based on machine learning and deep learning. The authors propose an enhanced convolutional neural network architecture called OPCNN-FAKE, which achieves exceptional performance across different datasets and outperforms other models in detecting fake news. Research in [7] suggests a novel deep learning architecture that combines recurrent and convolutional neural networks for the automated detection of fake news using machine learning and artificial intelligence. The model exhibits superior diagnostic results compared to non-hybrid baseline techniques and demonstrates promising generalizability across different datasets. In [11], a deep learning approach called FakeBERT is introduced, which addresses the challenge of ambiguity in natural language processing. By combining BERT with deep learning techniques, the proposed model


achieves a high accuracy of 98.90% and outperforms existing models.
Other studies, such as [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], and [25], also present various approaches and architectures for fake news detection, showcasing their performance on different datasets and achieving high levels of accuracy.
One of the most up-to-date models is presented in [26]. That paper proposes HyproBert, a state-of-the-art model that adopts placements and hyperparameter tuning for effective fake news detection. It was evaluated using two fake news datasets (ISOT and FA-KES) and achieved higher performance compared to other baseline and state-of-the-art models. Also, [27] proposed (DeepCnnLstm + DeepCnnBilstm), which achieved the best level of accuracy in the detection of fake news on the FA-KES dataset. We have selected these recent papers to compare them with our method. Despite the promising results demonstrated by current deep learning algorithms in fake news detection, there is still room for improvement in this field. Table 1 provides an overview of state-of-the-art models, their contributions, and the datasets used in the related works, offering further insights into fake news detection.

III. MATERIALS AND METHODS
The process of the proposed architecture comprises two primary phases. The initial phase involves the extraction of major features that are deemed to have a significant impact on the performance of the classifier. The subsequent phase is classification, wherein the features are categorized into two distinct classes, one denoting real and the other fake, as illustrated in Figure 1.

A. DATASET
The proposed study utilized two distinct datasets: ISOT [34] and FA-KES [35]. Neither dataset contains missing or outlier values.
The ISOT dataset consists of real and fake news articles obtained from authentic real-world sources. The authors crawled articles from Reuters.com, a reputable news source, to collect real news articles, while fake news articles were gathered from websites that were deemed unreliable and had been flagged by Politifact and Wikipedia. These articles were primarily published between 2016 and 2017. The dataset contains a total of 44,898 instances. It is approximately balanced, with 21,417 instances labeled as real news (labeled as 1) and 23,481 instances labeled as fake news (labeled as 0). A sample of the ISOT dataset can be found in Table 2.

TABLE 2. A sample of the ISOT dataset.

The FA-KES dataset focuses on news articles related to the Syrian war. It consists of 804 news articles, each including the headline, date, location, news sources, and the complete article body. The class label is set to '0' for fake news and '1' for real news. It is approximately balanced, with 53% of the articles labeled as true and 47% labeled as fake. A sample of the FA-KES dataset can be found in Table 3.

TABLE 3. A sample of the FA-KES dataset.

B. PRE-PROCESSING
The textual data undergoes several preprocessing steps in which it is cleaned, standardized, and transformed into a suitable format for subsequent feature extraction and analysis.
The following steps are included in the preprocessing phase: Lowercasing: All text is converted to lowercase to ensure consistency and avoid treating words with different cases as distinct. Cleaning up: Various cleaning operations are performed to remove unwanted elements from the text. This includes deleting URLs, punctuation marks (including the hash character #), and special characters specific to platforms like Twitter ($, &, %, etc.). Non-ASCII characters are also removed so that only English-language text remains. Replacing: Certain textual elements are replaced with their related words or simplified forms. This includes replacing contractions with their extended words (e.g., replacing "I'll" with "I will"), converting emoji to their corresponding words, and reducing repetitive occurrences of a character to a single occurrence (e.g., converting


FIGURE 1. Overall proposed method architecture.

'happppppy' to 'happy'). Tokenization: The cleaned text is then split into individual tokens or words to create a tokenized representation of the text. This step helps with further processing and analysis. Lemmatization/Stemming: Each token is further processed by applying lemmatization or stemming techniques. Lemmatization aims to convert the tokens to their base or root form, such as converting the token "interesting" to "interest." Stemming, on the other hand, involves reducing the tokens to their stem form by removing prefixes or suffixes.

C. GLOBAL FEATURES EXTRACTION PHASE
The TF-IDF (Term Frequency-Inverse Document Frequency) method is used to extract global features [28]. Here is an overview of the TF-IDF process:
Pre-processing: The text documents undergo pre-processing steps, including cleaning, lowercasing, removing special characters, and tokenization on the sentence level.
Term Frequency (TF) computation: TF represents the frequency of a term (word) in a document. It is the ratio of the number of times a specific word appears in a document to the total number of words in that document.
Document-Term Matrix: A table is generated to display the frequency of each word in each sentence. This matrix captures the occurrence of words in the documents.
Inverse Document Frequency (IDF) computation: IDF represents the significance of a term across the entire document collection. It is the logarithm of the total number of documents divided by the number of documents that contain the specific word.
TF-IDF computation: The TF-IDF score for each word in a document is obtained by multiplying the TF value and IDF value as presented in Figure 2. This step assigns higher scores to words that are frequent in a document but relatively rare in the entire document collection.
Threshold determination: The average score of all words is computed, and a threshold value


is determined. Words with scores higher than the threshold value are considered eligible for selection.
Global feature extraction: The TF-IDF process extracts global features by considering the significance of words across the entire dataset, rather than just within individual documents. The overall procedure of the TF-IDF process is summarized in Figure 3, which illustrates the steps involved in converting text documents into vector representations using TF-IDF.

FIGURE 2. TF-IDF (term frequency-inverse document frequency).

FIGURE 3. Global features extraction process.

D. LOCAL AND TEMPORAL FEATURES EXTRACTION PHASE
In the feature extraction phase, the study leverages the power of deep neural networks, particularly a hybrid model called CNN+BiLSTM [29], to capture both spatial and sequential features from the text data. The CNN+BiLSTM architecture consists of two main components:
Convolutional Neural Network (CNN): A one-dimensional CNN (Conv1D) is utilized to process the input vectors, which represent the pre-processed and transformed text data. The Conv1D layer is capable of extracting local features by applying filters over the input data and capturing patterns that are relevant to the task at hand. In this case, the CNN focuses on capturing spatial features from the text.
Bidirectional Long Short-Term Memory (BiLSTM): A BiLSTM layer is employed to capture temporal features by modeling the sequential relationships between words and phrases in both forward and backward directions. The BiLSTM layer is a type of recurrent neural network that is well-suited for processing sequential data. It is able to remember and propagate information over long distances, making it effective in capturing contextual information and dependencies in text.
By combining the CNN and BiLSTM layers in the architecture, the model is able to extract both local (spatial) and temporal features simultaneously. The CNN focuses on capturing patterns and features within short windows of the input, while the BiLSTM captures the sequential relationships and context across the entire text. This hybrid approach allows the model to capture a comprehensive range of features from the text data, enabling more effective and accurate analysis and classification.

1) CNN CONTRIBUTION
This subsection summarizes the advantages of using CNNs for feature extraction in Natural Language Processing (NLP) tasks. CNNs, although widely recognized for their success in computer vision, have also demonstrated their utility in NLP applications, including text categorization and sentiment analysis. One of the main advantages of CNNs in NLP is their ability to capture local patterns and dependencies within the text. By treating words or characters as one-dimensional signals, CNNs can apply one-dimensional convolutions (Conv1D) to extract local features. This is particularly useful for tasks like sentiment analysis or identifying key features in text classification, where capturing n-grams or short phrases is important. CNNs are also flexible in handling variable-length inputs, making them suitable for processing documents of different lengths. This flexibility allows them to be applied to a wide range of NLP tasks without requiring fixed input sizes. The process of using CNNs for feature extraction in NLP typically involves the following steps:
Application of one-dimensional convolutions to the input representations, where filters slide over the input and perform element-wise multiplications, followed by summation, to create feature maps. Multiple filters can be used to capture different types of features.
Utilization of pooling layers, such as max pooling, to down-sample the feature maps and reduce their dimensionality while retaining important information. Pooling helps capture significant features and introduces a degree of translation invariance.
Application of activation functions, such as ReLU, element-wise to introduce non-linearity into the network, enabling the learning of complex patterns.
Configuring the CNN involves specifying the number of filters and the kernel size. Conv1D is commonly used in text categorization and NLP tasks, as it is designed to process one-dimensional sequences of word vectors. Figure 4 depicts the procedure of Conv1D graphically.
By stacking multiple convolutional layers with different filter sizes and hyperparameters, CNNs can learn hierarchical


representations of the text and capture local features at different levels. This allows them to extract meaningful information from the input data.

FIGURE 4. A 1-D convolutional operation.

Furthermore, CNNs can be combined with other architectures, such as recurrent or transformer-based models, to leverage their respective strengths.
For example, in a hierarchical CNN, lower layers may capture word-level features, while higher layers capture sentence-level features.
While CNNs may not capture long-term dependencies as effectively as recurrent or transformer models, they still offer value in feature extraction for specific NLP tasks, especially when local patterns and relationships are critical. The design and architecture of the CNN can be tailored to meet the requirements of the task at hand.

2) BILSTM CONTRIBUTION
This subsection describes Bidirectional Long Short-Term Memory (BiLSTM) networks and their significance in Natural Language Processing (NLP) tasks for feature extraction.
BiLSTMs are a variant of recurrent neural networks (RNNs) that are widely used in NLP due to their ability to capture both past and future context of the input sequence. They process the input sequence in both forward and backward directions simultaneously, allowing them to capture dependencies and context from both directions.
In contrast to standard LSTMs, which only process the input sequence in a forward manner, BiLSTMs incorporate an additional LSTM layer that operates in reverse, enabling the flow of information from the end of the sequence to the beginning. This bidirectional flow of information enables the model to capture contextual information from both past and future dependencies, leading to a richer representation of the input sequence.
The primary role of BiLSTMs in NLP feature extraction is to capture bidirectional contextual information and encode it into feature representations. By considering both the past and future context, BiLSTMs can capture important patterns and relationships within the text, making them suitable for various NLP tasks such as sentiment analysis, machine translation, named entity recognition, and classification.
In a BiLSTM network, the input sequence is processed by two LSTM layers, one in the forward direction and the other in the backward direction. The outputs of both LSTM layers can be combined using different methods such as averaging, summing, multiplying, or concatenating to create a unified representation that captures both forward and backward context.
Figure 5 illustrates the structure of a BiLSTM network, showcasing the forward and backward LSTM layers and their combination to generate the final representation of the input sequence.

FIGURE 5. BiLSTM network.

Overall, BiLSTMs are powerful tools for NLP feature extraction as they leverage bidirectional context to capture dependencies and contextual information from both past and future perspectives, leading to improved representation learning in various NLP tasks. BiLSTMs are well-suited for NLP tasks that involve processing input sequences where the order of words or characters is important. They are designed to handle sequential data and can efficiently process input sequences of varying lengths. BiLSTMs capture contextual information by considering both the past and future context of each word or character in the input sequence. The forward pass captures past context, while the backward pass captures future context, enabling the model to have a comprehensive understanding of the input. Unlike traditional LSTMs that can only capture dependencies in the past, BiLSTMs address this limitation by incorporating information from both directions [30].
This makes them effective in capturing long-term dependencies in the input sequence, which is crucial for various NLP tasks such as sentiment analysis, machine translation, and named entity recognition. The hidden states of the BiLSTM at each time step serve as rich feature representations of the input sequence. These hidden states encode contextual information and capture important patterns and relationships within the text. These representations can be used as inputs for subsequent layers or fed into classification or regression models for different NLP tasks.
BiLSTMs are often used to generate contextual word embeddings, such as ELMo, which enrich word representations with contextual information from surrounding words. This helps in capturing multiple meanings (polysemy) and resolving ambiguities in language understanding tasks. BiLSTMs trained on large-scale language modeling tasks can be used as powerful feature extractors in transfer learning settings.


The lower layers of a pre-trained BiLSTM can be used as feature extractors, while the higher layers can be fine-tuned or replaced to adapt to specific downstream tasks.
Overall, BiLSTMs offer significant benefits in NLP feature
extraction by capturing both past and future context, handling
sequential data, and providing rich feature representations
that can be utilized for a variety of NLP tasks.
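
As a rough illustration of the two extractors described in this section, the following dependency-free Python sketch applies a one-dimensional convolution with ReLU and max pooling, and runs a forward and a backward LSTM pass whose final hidden states are concatenated. All sizes, the random (untrained) weights, and the function names are hypothetical and are not taken from the proposed architecture; feeding the same embedded tokens to both branches is likewise an assumption.

```python
import math
import random


def conv1d_relu_maxpool(seq, filters):
    """Conv1D over time, then ReLU, then global max pooling (one value per filter)."""
    features = []
    for kernel in filters:  # kernel: k weight vectors, one per window position
        k = len(kernel)
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
        for t in range(len(seq) - k + 1):
            # element-wise multiply the window with the kernel and sum the products
            s = sum(w * x
                    for w_vec, x_vec in zip(kernel, seq[t:t + k])
                    for w, x in zip(w_vec, x_vec))
            best = max(best, s)  # ReLU is folded into the running maximum
        features.append(best)
    return features


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_last_hidden(seq, params, hidden):
    """Run one LSTM layer over seq and return its final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = list(x) + h  # current input joined with the previous hidden state
        pre = {name: [sum(w * v for w, v in zip(row, z)) + b
                      for row, b in zip(W, bias)]
               for name, (W, bias) in params.items()}
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def random_lstm_params(rng, input_dim, hidden):
    """Hypothetical untrained weights: one (matrix, bias) pair per gate."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden)]
                for _ in range(hidden)]
    return {name: (matrix(), [0.0] * hidden) for name in "ifog"}


def bilstm_features(seq, fwd_params, bwd_params, hidden):
    """Concatenate the final states of a forward and a backward LSTM pass."""
    forward = lstm_last_hidden(seq, fwd_params, hidden)
    backward = lstm_last_hidden(seq[::-1], bwd_params, hidden)
    return forward + backward


rng = random.Random(0)
EMB, HIDDEN, N_FILTERS, KERNEL = 4, 3, 5, 2  # hypothetical sizes
tokens = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(6)]
filters = [[[rng.uniform(-0.5, 0.5) for _ in range(EMB)] for _ in range(KERNEL)]
           for _ in range(N_FILTERS)]

spatial = conv1d_relu_maxpool(tokens, filters)
temporal = bilstm_features(tokens,
                           random_lstm_params(rng, EMB, HIDDEN),
                           random_lstm_params(rng, EMB, HIDDEN),
                           HIDDEN)
print(len(spatial), len(temporal))  # 5 6
```

Concatenation is only one of the merge options mentioned above; averaging, summing, or multiplying the two directions would keep the temporal output at HIDDEN values instead of 2 x HIDDEN.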

E. FUSION FEATURES
In the study, a comprehensive feature set was created by
combining features extracted from different sources using
early fusion.
Early fusion refers to the process of combining features at
an early stage of the model architecture [31]. The extracted
features from different sources, such as TF-IDF repre-
sentations, CNN-based local features, and BiLSTM-based
temporal features, were fused together to create a unified FIGURE 6. Structure of fast learning network [4].
representation for each input instance [32].
Early fusion can lead to better exploitation of syner-
weight initialization is common in neural networks to
gies among data sources, potentially yielding improved accuracy and robustness [33]. However, it requires careful consideration of data preprocessing, feature selection, and fusion techniques to ensure optimal results. Additionally, early fusion may face challenges related to handling data heterogeneity, addressing feature misalignment, and dealing with missing or noisy data.
In summary, early fusion is a powerful technique for integrating information from multiple sources at an early stage, providing a comprehensive and enriched representation that can lead to improved performance in various applications.
By performing early fusion, the model can leverage the combined information from multiple feature sources to make predictions or perform downstream tasks. This approach aims to capture complementary information and potentially enhance the overall performance of the model.
The specific details of how the features are combined or fused depend on the architecture and design choices of the model. Fusion could involve concatenating the feature vectors, applying element-wise operations, or employing other techniques to merge the information from different feature sources.
Overall, early fusion of features extracted from different sources allows the model to leverage diverse information and create a more comprehensive representation of the input data, potentially improving the performance of the model on the given NLP task.

F. CLASSIFICATION PHASE
We used a neural network structure called the Fast Learning Network (FLN) [4], as shown in Figure 6, for the classification phase of our proposed method.
The FLN is distinguished by its weight initialization process and its weighted connections.
Weight Initialization: The input weights and biases of the hidden layer in the FLN are randomly generated. Random
introduce diversity in the initial weights and avoid getting stuck in local optima during training. By starting with random weights, the network has a chance to explore different regions of the weight space and find better solutions.
Weighted Connections: The FLN has weighted connections between the output layer and the input layer, as well as between the output nodes and the hidden nodes.
These weighted connections determine the flow of information between the layers and nodes of the network. Their values are determined using least squares methods, which are analytical techniques for finding the best-fit line or curve to a set of data points.
By connecting the input nodes not only to the hidden layer but also directly to the output nodes, the FLN aims to increase the learning speed and improve the accuracy of the network. This connectivity pattern allows information to flow more directly from the input to the output, potentially enabling the network to capture important features and make accurate predictions.
Suppose there are $N$ arbitrary distinct samples $\{x_i, y_i\}$, in which $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ is the $n$-dimensional feature vector of the $i$th sample, and $y_i = [y_{i1}, y_{i2}, \ldots, y_{il}]^T \in R^l$ is the corresponding $l$-dimensional output vector.
The FLN has $m$ hidden layer nodes. $W^{in}$ is the $m \times n$ input weight matrix, $b = [b_1, b_2, \ldots, b_m]$ is the vector of hidden layer biases, and $W^{oh}$ is an $l \times m$ matrix of the weights connecting the output layer and the hidden layer.
$W^{oi}$ is an $l \times n$ matrix of the weights connecting the output layer and the input layer. $c = [c_1, c_2, \ldots, c_l]^T$ is the vector of output layer biases. $g(\cdot)$ and $f(\cdot)$ are the activation functions of the hidden nodes and output nodes, respectively.
Then, the FLN is mathematically modeled as Equation 1:
$$y_j = f\Big(W^{oi} x_j + c + \sum_{k=1}^{m} W^{oh}_k \, g\big(W^{in}_k x_j + b_k\big)\Big) \qquad (1)$$
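As an illustration of the classification phase, the following is a minimal sketch of early fusion by concatenation followed by an FLN, written in standard-library Python rather than the MATLAB used for the experiments. It is not the authors' implementation. Several choices are assumptions made for the sketch: the hidden activation g is tanh, the output activation f is the identity (so the targets are fitted directly), and the least-squares step is solved through regularised normal equations instead of a Moore-Penrose pseudoinverse. All function names are hypothetical.

```python
import math
import random


def early_fusion(*blocks):
    """Early fusion by concatenation: join per-source feature lists into one vector."""
    return [v for block in blocks for v in block]


def hidden_output(x, W_in, b):
    """Hidden-layer response g(W_in x + b), with g = tanh (an assumed activation)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b_k)
            for row, b_k in zip(W_in, b)]


def solve_least_squares(H, T, ridge=1e-6):
    """Solve (H^T H + ridge*I) W = H^T T by Gauss-Jordan elimination.

    Stand-in for a pseudoinverse-based solution; the small ridge term is an
    assumption added for numerical stability.
    """
    d, n_out = len(H[0]), len(T[0])
    A = []
    for p in range(d):
        gram = [sum(r[p] * r[q] for r in H) + (ridge if p == q else 0.0)
                for q in range(d)]
        rhs = [sum(r[p] * t[o] for r, t in zip(H, T)) for o in range(n_out)]
        A.append(gram + rhs)  # augmented row [gram | rhs]
    for col in range(d):
        # Partial pivoting, then normalise the pivot row and clear the column.
        pivot_row = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [row[d:] for row in A]  # shape (n + m + 1) x l


def fln_fit(X, Y, m, seed=0):
    """Fit an FLN: random input weights and biases, least-squares output weights."""
    rng = random.Random(seed)
    n = len(X[0])
    W_in = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    # One row [x_j, g(W_in x_j + b), 1] per sample, so the fitted weights cover
    # the input-to-output links, the hidden-to-output links and the output bias.
    H = [list(x) + hidden_output(x, W_in, b) + [1.0] for x in X]
    W_out = solve_least_squares(H, Y)
    return W_in, b, W_out


def fln_predict(x, model):
    """Evaluate Equation 1 for one sample (identity f) and return l outputs."""
    W_in, b, W_out = model
    h = list(x) + hidden_output(x, W_in, b) + [1.0]
    return [sum(h_p * row[o] for h_p, row in zip(h, W_out))
            for o in range(len(W_out[0]))]
```

A typical use would build each sample with `early_fusion(tfidf_vec, cnn_vec, bilstm_vec)`, encode the real/fake labels as one-hot rows of `Y`, call `fln_fit(X, Y, m)`, and take the index of the largest value returned by `fln_predict` as the predicted class. Only the output weights are solved for; the input weights and hidden biases stay at their random values, which is what makes training a single linear solve.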


In Equation 1, the parameters are:
$j = 1, 2, \ldots, N$
$W^{oi} = [W^{oi}_1, W^{oi}_2, \ldots, W^{oi}_l]$: the weight vectors connecting the output nodes and the input nodes.
$W^{oh}_k = [W^{oh}_{k1}, W^{oh}_{k2}, \ldots, W^{oh}_{kl}]^T$: the weight vector connecting the $k$th node of the hidden layer and the nodes of the output layer.
$W^{in}_k = [W^{in}_{k1}, W^{in}_{k2}, \ldots, W^{in}_{kn}]^T$: the weight vector connecting the $k$th node of the hidden layer and the nodes of the input layer.
Then Equation 1 can be written compactly as Equation 2.
$$Y = f\big(W^{oi} X + W^{oh} G + c\big) = f\left(W \begin{bmatrix} X \\ G \\ I \end{bmatrix}\right) \qquad (2)$$
Learning algorithm for FLN: Suppose we have a training set $s$ consisting of inputs $x$ and targets $y$, where the number of hidden layer nodes is $m$ and the activation function of the hidden nodes is $g(\cdot)$. The FLN algorithm can then be defined as follows. The input weight matrix $W^{in}$ and the bias vector $b$ are randomly generated. The hidden layer output matrix $G$ of the FLN is calculated using Equation 3.
$$G = \begin{bmatrix} g(W^{in}_1 x_1 + b_1) & \cdots & g(W^{in}_1 x_N + b_1) \\ \vdots & \ddots & \vdots \\ g(W^{in}_m x_1 + b_m) & \cdots & g(W^{in}_m x_N + b_m) \end{bmatrix}_{m \times N} \qquad (3)$$
$$W = [W^{oi} \;\; W^{oh} \;\; c]_{l \times (n+m+1)} \qquad (4)$$
The matrix $W = [W^{oi} \;\; W^{oh} \;\; c]$ is called the output weight matrix.
$$I = [1 \;\; 1 \;\; \cdots \;\; 1]_{1 \times N} \qquad (5)$$
The minimum-norm least-squares solution of the linear system can be written as Equation 6.
$$\hat{W} = f^{-1}(Y) \begin{bmatrix} X \\ G \\ I \end{bmatrix}^{+} = f^{-1}(Y)\, H^{+} \qquad (6)$$
where $H = [X^T \;\; G^T \;\; I^T]^T$, the superscript $+$ denotes the Moore-Penrose generalized inverse, and $f^{-1}(\cdot)$ is the inverse of the activation function $f(\cdot)$ of the output nodes.
Finally, the weights of the FLN are obtained from Equations 7, 8, and 9:
$$W^{oi} = \hat{W}(1:l,\; 1:n) \qquad (7)$$
$$W^{oh} = \hat{W}(1:l,\; n+1:(n+m)) \qquad (8)$$
$$c = \hat{W}(1:l,\; n+m+1) \qquad (9)$$
In addition, in the FLN the input weights and hidden layer biases are generated randomly, whereas the other weights are found analytically using least squares methods. Therefore, the FLN overcomes most of the drawbacks of traditional learning methods while also learning extremely quickly.

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The study began by cleansing the dataset, ensuring it was free from any irrelevant or noisy information. The text was then tokenized to represent it and capture its characteristics. The model simultaneously extracted global, spatial (local), and temporal features. These features were combined and classified by the FLN classifier.

A. EXPERIMENTAL AND HYPERPARAMETER SETTINGS
The datasets used in the study, namely ISOT and FA-KES, were divided into training and testing sets. 80% of the instances from each dataset were allocated for training, while the remaining 20% were used for testing.
For the implementation of our models, we worked on an HP Core i7 computer with 4 GB of RAM and a 64-bit operating system. The experiments were conducted using MATLAB R2022a. The learning algorithm for the CNN is stochastic gradient descent; it evaluates the gradient and updates the parameters using a subset of the training data. A different subset, called a mini-batch, is used at each iteration. One full pass of the training algorithm over the entire training set using mini-batches is one epoch. Stochastic gradient descent is stochastic because the parameter update computed from a mini-batch is a noisy estimate of the update that would result from using the full data set.
In the BiLSTM, the Adam optimization algorithm is used to adapt the learning rate of each parameter to achieve better and faster convergence of the training process. Table 4 shows the network's hyperparameters.
The proposed deep network was trained on both the ISOT and FA-KES datasets. For the ISOT dataset, training consisted of 1000 iterations, while for the FA-KES dataset it ran for 1200 iterations. After training, the network was evaluated using the test data.
The average time required to train the proposed network was 45 seconds for the ISOT dataset and 52 seconds for the FA-KES dataset. Figure 7 depicts the classification accuracy in evaluation mode and the training error for the ISOT dataset. It can be observed that the network converged to 99% accuracy after 100 iterations, and the training error remained below 0.01 within the same iterations.
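The 80/20 split and the mini-batch, iteration, and epoch terminology used in the settings above can be made concrete with a short sketch. This is an illustrative Python example, not the MATLAB code used in the experiments; the function names, the batch size of 16, and the seeds are hypothetical, since the actual hyperparameter values are those of Table 4.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle once, then cut into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def epoch_mini_batches(train, batch_size, rng):
    """Yield the mini-batches of one epoch (one full shuffled pass over train)."""
    order = list(train)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def count_iterations(n_train, batch_size, epochs):
    """Number of parameter updates: one per mini-batch, ceil(n / batch) per epoch."""
    per_epoch = -(-n_train // batch_size)  # ceiling division
    return per_epoch * epochs
```

With 100 samples, `train_test_split` returns 80 training and 20 test items, and one epoch with a hypothetical batch size of 16 yields five mini-batches, so each epoch corresponds to five iterations.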


TABLE 4. Network's hyperparameters.

FIGURE 7. Training accuracy and loss for proposed network on ISOT dataset.

On the other hand, Figure 8 illustrates the convergence results and training error of the proposed network on the FA-KES database.

FIGURE 8. Training accuracy and loss for proposed network on FA-KES dataset.

Although the network did not converge as well on the FA-KES database as on the ISOT database, it still exhibited excellent performance.
This fluctuation in convergence can be attributed to the limited number of training samples available in the FA-KES dataset. Overall, the proposed network demonstrated strong performance on both the ISOT and FA-KES databases, achieving high accuracy even with the challenges posed by smaller datasets.

B. MODEL EVALUATION CRITERIA
The evaluation of the model was conducted using accuracy (Acc), precision (Pr), recall (Re), and F1 statistics.
Based on the confusion matrix in Table 5, we compute these metrics as expressed in Equations 10, 11, 12, and 13.

TABLE 5. Confusion matrix.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)$$
Accuracy refers to the proportion of correct predictions.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (11)$$
Recall refers to the classifier's capacity to identify all positive samples.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (12)$$
Precision measures the accuracy of positive predictions.
$$F_1 = 2\,\frac{Re \cdot Pr}{Re + Pr} \qquad (13)$$
F1 is the harmonic mean of precision and recall and takes values between 0 and 1.

C. RESULTS AND DISCUSSION ON THE ISOT DATASET
Table 6 and Figure 9 present the results of various methods applied to the ISOT dataset, including the proposed method as well as previous methods discussed in the literature review. The comparison reveals that our proposed method performs better than other methods in terms of accuracy.
All models trained on the ISOT dataset demonstrate exceptional performance in detecting fake news. This can be attributed to the dataset's inclusion of long texts, which provide more prominent features that contribute to the superior results achieved compared to the other datasets.
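The four evaluation criteria used in these comparisons can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from the paper. Precision is computed with the standard definition TP / (TP + FP), and the denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # share of actual positives that were found
    precision = tp / (tp + fp)       # share of predicted positives that were right
    f1 = 2 * recall * precision / (recall + precision)  # harmonic mean
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these hypothetical counts, accuracy is 85/100 and recall is 40/50. F1 can also be written as 2TP / (2TP + FP + FN), which gives 80/95 here and provides a quick consistency check on the harmonic-mean form.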


TABLE 6. Highest results of various methods on the ISOT.

FIGURE 9. Comparison of different methods in terms of accuracy for ISOT dataset.

It is worth noting that the performance of the proposed method on the ISOT database is comparable to the results reported in [7], [26], and [27]. This similarity in performance may be attributed to the large number of training samples available in the ISOT dataset.
The findings indicate that the proposed method exhibits a high level of performance on the ISOT dataset, achieving accuracy levels comparable to state-of-the-art approaches. This demonstrates the efficacy of the hybrid CNN+BiLSTM+FLN approach in accurately classifying fake news, even when dealing with datasets that contain a large number of training samples.

D. RESULTS AND DISCUSSION ON THE FA-KES DATASET

TABLE 7. Highest results of various methods on the FA-KES.

FIGURE 10. Comparison of different methods in terms of accuracy for FA-KES dataset.

FIGURE 11. Comparison of the performance of our proposed method with other methods on the ISOT dataset.

The results presented in Table 7 and Figure 10 indicate that the hybrid CNN+BiLSTM+FLN approach outperforms other methods in terms of F1, accuracy, precision, and recall on the FA-KES dataset. Although modest in size, this database is used to evaluate the system's responsiveness and robustness with databases of various sizes. Despite the FA-KES dataset having fewer training samples than the ISOT dataset, the proposed method demonstrates superior performance by effectively combining the CNN, BiLSTM, and FLN networks. This combination enables the learning process to be completed with a limited number of training samples while achieving high accuracy and speed in classification. The comparison of our proposed method with the approaches in [7], [26], and [27] demonstrates its superiority in terms of F1, accuracy, precision, and recall across a wide range of database sizes and formats. This is particularly noteworthy for small datasets, where the proposed method excels. Figures 11 and 12 display the comparison of the proposed method with other approaches in terms of precision, recall, and F1 criteria for both the ISOT


and FA-KES databases, further highlighting the superior performance of the hybrid CNN+BiLSTM+FLN approach. (We noticed two typos in the F1 measure reported in [7], but we left them unchanged to preserve the results of the original paper.)

FIGURE 12. Comparison of the performance of our proposed method with other methods on the FA-KES dataset.

Overall, these results validate the effectiveness of the proposed method in achieving high accuracy and performance in fake news detection, particularly on datasets with limited training samples.

V. CONCLUSION AND FUTURE WORK
In this paper, we focused on evaluating the effectiveness of a deep neural network architecture for fake news detection using two different datasets. The main contributions of the research include the adoption of deep neural networks and TF-IDF for feature extraction, testing the model on datasets of different sizes, and utilizing the FLN during the classification phase. The theory behind using deep neural networks for feature extraction lies in the ability of the CNN to extract spatial features and the ability of the BiLSTM to extract temporal features, and we used a parallel combination of them in our proposed method. The proposed approach achieved high levels of accuracy on both datasets, surpassing the performance of other comparable methods.
The experiments were carried out on two different datasets. Most methods, including the proposed method, obtain good results on the ISOT dataset, which includes long text. However, other methods obtain poor results on FA-KES, while the proposed model performs well. It can be concluded that the proposed method is able to work on different types of datasets. It also shows that the theory behind the proposed model, which is to simultaneously extract spatial, temporal, and global features of text and use them for classification, works well.
The limitation of our work is that it is tested on English datasets only. Therefore, our future research will focus on adopting non-English datasets in order to build a more comprehensive system.
Our future work will also include considering alternative data sources, including social media and user-generated content in diverse languages. While these sources may present challenges such as interference and bias, they can provide valuable insights into the dissemination and consequences of fake news in virtual societies. The study also suggests exploring novel methodologies to further improve the precision of fake news detection systems. One potential avenue is the use of transfer learning techniques that leverage pre-existing language models, which have shown exceptional performance in various natural language processing tasks. As further future work, the impact of different categories of fake news, such as propaganda, satire, and clickbait, on the efficacy of identification mechanisms could also be investigated to enhance system efficiency.
Overall, the study highlights the potential of deep neural network architectures, along with feature extraction techniques and diverse datasets, in effectively detecting fake news.

REFERENCES
[1] H. Ahmed, I. Traore, and S. Saad, "Detecting opinion spams and fake news using text classification," Secur. Privacy, vol. 1, no. 1, p. 9, Jan. 2018.
[2] S. Vosoughi, D. Roy, and S. Aral, "The spread of true and false news online," Science, vol. 359, no. 6380, pp. 1146–1151, Mar. 2018.
[3] V. L. Rubin, Y. Chen, and N. K. Conroy, "Deception detection for news: Three types of fakes," Proc. Assoc. Inf. Sci. Technol., vol. 52, no. 1, pp. 1–4, Jan. 2015.
[4] G. Li, P. Niu, X. Duan, and X. Zhang, "Fast learning network: A novel artificial neural network with a fast learning speed," Neural Comput. Appl., vol. 24, nos. 7–8, pp. 1683–1695, Jun. 2014.
[5] W. Han and V. Mehta, "Fake news detection in social networks using machine learning and deep learning: Performance evaluation," in Proc. IEEE Int. Conf. Ind. Internet (ICII), Nov. 2019, pp. 375–380.
[6] A. Jain, A. Shakya, H. Khatter, and A. K. Gupta, "A smart system for fake news detection using machine learning," in Proc. Int. Conf. Issues Challenges Intell. Comput. Techn. (ICICT), vol. 1, Sep. 2019, pp. 1–4.
[7] J. A. Nasir, O. S. Khan, and I. Varlamis, "Fake news detection: A hybrid CNN-RNN based deep learning approach," Int. J. Inf. Manag. Data Insights, vol. 1, no. 1, Apr. 2021, Art. no. 100007.
[8] S. R. Sahoo and B. B. Gupta, "Multiple features based approach for automatic fake news detection on social networks using deep learning," Appl. Soft Comput., vol. 100, Mar. 2021, Art. no. 106983.
[9] E. Elsaeed, O. Ouda, M. M. Elmogy, A. Atwan, and E. El-Daydamony, "Detecting fake news in social media using voting classifier," IEEE Access, vol. 9, pp. 161909–161925, 2021.
[10] H. Saleh, A. Alharbi, and S. H. Alsamhi, "OPCNN-FAKE: Optimized convolutional neural network for fake news detection," IEEE Access, vol. 9, pp. 129471–129489, 2021.
[11] R. K. Kaliyar, A. Goswami, and P. Narang, "FakeBERT: Fake news detection in social media with a BERT-based deep learning approach," Multimedia Tools Appl., vol. 80, no. 8, pp. 11765–11788, Mar. 2021.
[12] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep learning models: A novel approach," Trans. Emerg. Telecommun. Technol., vol. 31, no. 2, p. e3767, Feb. 2020, doi: 10.1002/ett.3767.
[13] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, "FNDNet – A deep convolutional neural network for fake news detection," Cognit. Syst. Res., vol. 61, pp. 32–44, Jun. 2020.
[14] M. K. Elhadad, K. F. Li, and F. Gebali, "A novel approach for selecting hybrid features from online news textual metadata for fake news detection," in Advances on P2P, Parallel, Grid, Cloud and Internet Computing. Cham, Switzerland: Springer, 2020, pp. 914–925.
[15] L. Borges, B. Martins, and P. Calado, "Combining similarity features and deep representation learning for stance detection in the context of checking fake news," J. Data Inf. Qual., vol. 11, no. 3, pp. 1–26, Sep. 2019.


[16] B. Ghanem, P. Rosso, and F. Rangel, "Stance detection in fake news a combined feature representation," in Proc. 1st Workshop Fact Extraction Verification, Brussels, Belgium, 2018, pp. 66–71.
[17] A. Hanselowski, P. V. S. Avinesh, B. Schiller, F. Caspelherr, D. Chaudhuri, C. M. Meyer, and I. Gurevych, "A retrospective analysis of the fake news challenge stance-detection task," in Proc. 27th Int. Conf. Comput. Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics, 2018, pp. 1859–1874.
[18] Q. Liao, H. Chai, H. Han, X. Zhang, X. Wang, W. Xia, and Y. Ding, "An integrated multi-task model for fake news detection," IEEE Trans. Knowl. Data Eng., vol. 34, no. 11, pp. 5154–5165, Nov. 2022.
[19] M. Mohtarami, R. Baly, J. Glass, P. Nakov, L. Marquez, and A. Moschitti, "Automatic stance detection using end-to-end memory networks," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., New Orleans, LA, USA, 2018, pp. 767–776.
[20] C. Dulhanty, J. L. Deglint, I. B. Daya, and A. Wong, "Taking a stance on fake news: Towards automatic disinformation assessment via deep bidirectional transformer language models for stance detection," 2019, arXiv:1911.11951.
[21] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W. On, "Fake news stance detection using deep learning architecture (CNN-LSTM)," IEEE Access, vol. 8, pp. 156695–156706, 2020.
[22] T. Jiang, J. P. Li, A. U. Haq, A. Saboor, and A. Ali, "A novel stacking approach for accurate detection of fake news," IEEE Access, vol. 9, pp. 22626–22639, 2021.
[23] A. Choudhary and A. Arora, "Linguistic feature based learning model for fake news detection and classification," Exp. Syst. Appl., vol. 169, May 2021, Art. no. 114171.
[24] M. Choudhary, S. S. Chouhan, E. S. Pilli, and S. K. Vipparthi, "BerConvoNet: A deep learning framework for fake news classification," Appl. Soft Comput., vol. 110, Oct. 2021, Art. no. 107614.
[25] A. J. Keya, M. A. H. Wadud, M. F. Mridha, M. Alatiyyah, and M. A. Hamid, "AugFake-BERT: Handling imbalance through augmentation of fake news using BERT to enhance the performance of fake news classification," Appl. Sci., vol. 12, no. 17, p. 8398, Aug. 2022.
[26] M. I. Nadeem, S. A. H. Mohsan, K. Ahmed, D. Li, Z. Zheng, M. Shafiq, F. K. Karim, and S. M. Mostafa, "HyproBERT: A fake news detection model based on deep hypercontext," Symmetry, vol. 15, no. 2, p. 296, Jan. 2023, doi: 10.3390/sym15020296.
[27] F. W. R. Tokpa, B. H. Kamagaté, V. Monsan, and S. Oumtanaga, "Fake news detection in social media: Hybrid deep learning approaches," J. Adv. Inf. Technol., vol. 14, no. 3, pp. 606–615, 2023.
[28] M. Asgari-Chenaghlu, M.-R. Feizi-Derakhshi, L. Farzinvash, M.-A. Balafar, and C. Motamed, "Topic detection and tracking techniques on Twitter: A systematic review," Complexity, vol. 2021, pp. 1–15, Jun. 2021.
[29] F. Ebrahimzadeh et al., "A hybrid recurrent neural network approach for detecting abnormal user behavior in social networks," Res. Square, Aug. 2023, doi: 10.21203/rs.3.rs-3242416/v1.
[30] P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, "Attention-based bidirectional long short-term memory networks for relation classification," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, Berlin, Germany, 2016, pp. 207–212.
[31] N. Cecillon, V. Labatut, R. Dufour, and G. Linarès, "Abusive language detection in online conversations by combining content- and graph-based features," Front. Big Data, vol. 2, p. 8, Jun. 2019.
[32] S. Lu, M. Liu, L. Yin, Z. Yin, X. Liu, and W. Zheng, "The multi-modal fusion in visual question answering: A review of attention mechanisms," PeerJ Comput. Sci., vol. 9, p. e1400, May 2023.
[33] S. Lu, Y. Ding, M. Liu, Z. Yin, L. Yin, and W. Zheng, "Multiscale feature extraction and fusion of image and text in VQA," Int. J. Comput. Intell. Syst., vol. 16, no. 1, p. 54, Apr. 2023.
[34] University of Victoria. (2021). Fake News Detection Datasets. [Online]. Available: https://onlineacademiccommunity.uvic.ca/isot/2022/11/27/fake-news-detection-datasets
[35] F. K. A. Salem, R. A. Feel, S. Elbassuoni, M. Jaber, and M. Farah, "FA-KES: A fake news dataset around the Syrian war," in Proc. Int. AAAI Conf. Web Social Media, 2019, pp. 573–582. [Online]. Available: https://zenodo.org/record/2607278#ZBMr4RRBxPY

AHMED HASHIM JAWAD ALMARASHY received the B.S. degree in control and systems engineering from the University of Technology, Baghdad, Iraq, in 2003, and the M.Tech. degree in computer science engineering from Acharya Nagarjuna University, India, in 2016. He is currently pursuing the Ph.D. degree with the Department of Computer Engineering, University of Tabriz, Iran. His research interests include natural language processing, deep learning, image processing, and smart cities.

MOHAMMAD-REZA FEIZI-DERAKHSHI received the B.S. degree in software engineering from the University of Isfahan, Iran, and the M.Sc. and Ph.D. degrees in artificial intelligence from the Iran University of Science and Technology, Tehran, Iran. He is currently a Professor with the Faculty of Computer Engineering, University of Tabriz, Iran. His research interests include natural language processing, optimization algorithms, deep learning, social network analysis, and intelligent databases.

PEDRAM SALEHPOUR (Member, IEEE) received the B.S. and M.Sc. degrees in computer science and the Ph.D. degree in electrical engineering from the University of Tabriz, in 2007, 2009, and 2015, respectively. He is currently an Assistant Professor with the Faculty of Electrical and Computer Engineering, University of Tabriz. His research interests include distributed computing, machine learning, image processing, and deep learning.
